Oracle-aware cache for variable-size slabs.

#include <OracularVariableSlabCache.hpp>

Public Member Functions
	OracularVariableSlabCache (std::shared_ptr< const tatami::Oracle< Index_ > > oracle, std::size_t max_size)

	OracularVariableSlabCache (const OracularVariableSlabCache &)=delete

OracularVariableSlabCache &	operator= (const OracularVariableSlabCache &)=delete

Index_	next ()

template<class Ifunction_ , class Ufunction_ , class Afunction_ , class Cfunction_ , class Pfunction_ >
std::pair< const Slab_ *, Index_ >	next (Ifunction_ identify, Ufunction_ upper_size, Afunction_ actual_size, Cfunction_ create, Pfunction_ populate)

auto	get_max_size () const

auto	get_used_size () const

auto	get_num_slabs () const

Detailed Description

template<typename Id_, typename Index_, class Slab_, typename Size_>
class tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >

Oracle-aware cache for variable-size slabs.

Template Parameters

Id_	Type of slab identifier, typically integer.
Index_	Type of row/column index produced by the oracle.
Slab_	Class for a single slab.
Size_	Numeric type for the slab size.

Implements an oracle-aware cache for variable-size slabs. Each slab is defined as the set of chunks required to read an element of the target dimension (or a contiguous block/indexed subset thereof) from a tatami::Matrix. This cache is similar to OracularSlabCache but improves cache utilization when the slabs vary in size. For example, the number of non-zero entries in a sparse matrix might vary between slabs, so the cache can fit more slabs into memory when they have fewer non-zeros.
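To make the utilization argument concrete, the sketch below compares how many slabs fit under the same budget when every slot must be sized for the largest slab (roughly what a fixed-size cache has to assume) versus when each slab is charged only its own size. The helper names and the greedy, in-order admission are illustrative assumptions, not the eviction policy used by tatami_chunked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a fixed-size cache must reserve room for the largest
// slab in every slot, so the slot count is max_size / largest.
std::size_t fixed_capacity(const std::vector<std::size_t>& sizes, std::size_t max_size) {
    if (sizes.empty()) {
        return 0;
    }
    std::size_t largest = *std::max_element(sizes.begin(), sizes.end());
    if (largest == 0) {
        return sizes.size(); // zero-sized slabs never consume the budget
    }
    return std::min(sizes.size(), max_size / largest);
}

// Hypothetical helper: a variable-size cache charges each slab its own size,
// so slabs are admitted in prediction order until the budget is exhausted.
std::size_t variable_capacity(const std::vector<std::size_t>& sizes, std::size_t max_size) {
    std::size_t used = 0, count = 0;
    for (auto s : sizes) {
        if (used + s > max_size) {
            break;
        }
        used += s;
        ++count;
    }
    return count;
}
```

With per-slab sizes of {10, 2, 3, 1} (e.g., non-zero counts) and a budget of 16, `fixed_capacity()` yields 1 slab while `variable_capacity()` yields 4.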

The size of each slab is defined by Size_, which can be any non-negative measure of slab size. This could be the number of non-zero elements, or the number of dimension elements, or the size of the slab in bytes, etc., as long as its interpretation is consistent between slabs and with the max_size used in the constructor. Users can also differentiate between the estimated and actual size of the slab, if the latter is not known until after the slab has been loaded into memory, e.g., the number of non-zero entries in a file-backed sparse matrix.
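As an illustration of the estimated/actual distinction, suppose (hypothetically) that `Size_` counts the stored non-zero entries of a file-backed sparse matrix chunked along rows. Before loading, the only safe upper bound is the dense extent of the slab, i.e., its number of rows times the number of columns. After loading, the actual count can be read off the populated slab. `SparseSlab`, `upper_size_for()` and `actual_size_of()` are illustrative names, not tatami_chunked types; in practice they would be wrapped in lambdas and passed as the `upper_size` and `actual_size` arguments of `next()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sparse slab: non-zero values plus their column indices.
struct SparseSlab {
    std::vector<double> values;
    std::vector<int> columns;
};

// Upper bound usable before loading: every element of the slab might be non-zero.
// The last chunk is shorter when nrow is not a multiple of chunk_length.
std::size_t upper_size_for(int slab_id, int chunk_length, int nrow, int ncol) {
    int start = slab_id * chunk_length;
    int end = std::min(start + chunk_length, nrow);
    return static_cast<std::size_t>(end - start) * static_cast<std::size_t>(ncol);
}

// Actual size, only known once the slab is populated: the number of stored non-zeros.
std::size_t actual_size_of(const SparseSlab& slab) {
    return slab.values.size();
}
```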

When implementing Slab_, we generally suggest using a common memory pool that is referenced by each Slab_ instance. This guarantees that the actual cache size does not exceed the limit associated with max_size when Slab_ instances are re-used for different slabs. (Otherwise, if each Slab_ allocates its own memory, re-use of an instance may cause its allocation to increase to the size of the largest encountered slab.) Callers may need to occasionally defragment the pool to ensure that enough memory is available for loading new slabs.
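One way to follow this advice is sketched below. Each hypothetical `PooledSlab` stores only an offset and a length into a single buffer owned by a hypothetical `SlabPool`, so re-using a `PooledSlab` for a different slab never grows its own allocation. The bump-pointer allocation and the `compact()` defragmentation routine are assumptions for illustration and are not part of tatami_chunked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool: a single buffer of max_size elements shared by all slabs.
struct SlabPool {
    std::vector<double> buffer;
    std::size_t used = 0; // bump pointer; [0, used) has been handed out

    explicit SlabPool(std::size_t max_size) : buffer(max_size) {}

    // Reserves `n` contiguous elements after the used region.
    // Returns false if the tail is too small, in which case compact() may help.
    bool allocate(std::size_t n, std::size_t& offset) {
        if (n > buffer.size() - used) {
            return false;
        }
        offset = used;
        used += n;
        return true;
    }
};

// Hypothetical slab: a view into the pool rather than its own allocation.
struct PooledSlab {
    std::size_t offset = 0;
    std::size_t length = 0;
};

// Defragmentation: slides the slabs that are still needed towards the front
// of the pool so that all free space forms one contiguous block at the end.
void compact(SlabPool& pool, std::vector<PooledSlab*> live) {
    std::sort(live.begin(), live.end(),
              [](const PooledSlab* a, const PooledSlab* b) { return a->offset < b->offset; });
    double* data = pool.buffer.data();
    std::size_t cursor = 0;
    for (PooledSlab* slab : live) {
        assert(cursor <= slab->offset); // holds if live slabs do not overlap
        if (slab->offset != cursor) {
            // Destination starts before the source, so a forward copy is safe.
            std::copy(data + slab->offset, data + slab->offset + slab->length, data + cursor);
            slab->offset = cursor;
        }
        cursor += slab->length;
    }
    pool.used = cursor;
}
```

For example, if a 4-element slab at the front of a 10-element pool is evicted while a 3-element slab behind it is kept, a new 6-element slab does not fit at the tail until `compact()` moves the surviving slab to offset 0.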

Constructor & Destructor Documentation

◆ OracularVariableSlabCache() [1/2]

template<typename Id_ , typename Index_ , class Slab_ , typename Size_ >

tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >::OracularVariableSlabCache	(	std::shared_ptr< const tatami::Oracle< Index_ > >	oracle,
		std::size_t	max_size )

inline

Parameters

oracle	Pointer to a `tatami::Oracle` to be used for predictions.
max_size	Total size of all slabs to store in the cache. This may be zero, in which case no caching should be performed.

◆ OracularVariableSlabCache() [2/2]

template<typename Id_ , typename Index_ , class Slab_ , typename Size_ >

tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >::OracularVariableSlabCache ( const OracularVariableSlabCache< Id_, Index_, Slab_, Size_ > & )

delete

Deleted as the cache holds persistent pointers.
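The problem with copying can be illustrated with a stripped-down, hypothetical class that keeps a raw pointer to the most recently returned slab inside its own `std::vector`. A member-wise copy would duplicate that pointer verbatim, leaving the new object referring to the original's storage. This is a simplified model of the persistent pointers mentioned here, not the actual member layout of OracularVariableSlabCache.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical miniature cache that remembers the last slab it handed out.
template<class Slab_>
class PointerHoldingCache {
    std::vector<Slab_> my_slabs;
    const Slab_* my_last = nullptr; // points into my_slabs

public:
    explicit PointerHoldingCache(std::size_t n) : my_slabs(n) {}

    // A defaulted copy would copy my_last as-is, so the copy would silently
    // refer to the original object's slabs; both operations are deleted instead.
    PointerHoldingCache(const PointerHoldingCache&) = delete;
    PointerHoldingCache& operator=(const PointerHoldingCache&) = delete;

    const Slab_* fetch(std::size_t i) {
        assert(i < my_slabs.size());
        my_last = &my_slabs[i];
        return my_last;
    }

    const Slab_* last() const { return my_last; }
};

static_assert(!std::is_copy_constructible<PointerHoldingCache<int> >::value, "copy construction is disabled");
static_assert(!std::is_copy_assignable<PointerHoldingCache<int> >::value, "copy assignment is disabled");
```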

Member Function Documentation

◆ operator=()

template<typename Id_ , typename Index_ , class Slab_ , typename Size_ >

OracularVariableSlabCache & tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >::operator= ( const OracularVariableSlabCache< Id_, Index_, Slab_, Size_ > & )

delete

Deleted as the cache holds persistent pointers.

◆ next() [1/2]

template<typename Id_ , typename Index_ , class Slab_ , typename Size_ >

Index_ tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >::next ( )

inline

This method is intended to be called when max_size = 0, to provide callers with the oracle predictions for non-cached extraction of data. Calls to this method should not be intermingled with calls to its overload below; the latter should only be called when max_size > 0.

Returns: The next prediction from the oracle.
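When max_size = 0, the caller simply walks through the oracle's predictions and extracts each element without caching. The sketch below models this with a hypothetical `VectorOracle` exposing `total()` and `get(i)` (loosely mirroring the shape of tatami::Oracle) and a hypothetical `UncachedStream` whose `next()` returns the predictions in order. It is an illustration of the calling pattern, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an oracle: a fixed list of predicted indices.
template<typename Index_>
struct VectorOracle {
    std::vector<Index_> predictions;
    std::size_t total() const { return predictions.size(); }
    Index_ get(std::size_t i) const { return predictions[i]; }
};

// Hypothetical uncached stream: each call to next() consumes one prediction.
template<typename Index_>
class UncachedStream {
    std::shared_ptr<const VectorOracle<Index_> > my_oracle;
    std::size_t my_counter = 0;

public:
    explicit UncachedStream(std::shared_ptr<const VectorOracle<Index_> > oracle) : my_oracle(std::move(oracle)) {}

    Index_ next() {
        assert(my_counter < my_oracle->total()); // callers must not read past the predictions
        return my_oracle->get(my_counter++);
    }
};
```

Each index returned by `next()` would then be passed to whatever non-cached extraction routine the caller uses.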

◆ next() [2/2]

template<typename Id_ , typename Index_ , class Slab_ , typename Size_ >

template<class Ifunction_ , class Ufunction_ , class Afunction_ , class Cfunction_ , class Pfunction_ >

std::pair< const Slab_ *, Index_ > tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >::next	(	Ifunction_	identify,
		Ufunction_	upper_size,
		Afunction_	actual_size,
		Cfunction_	create,
		Pfunction_	populate )

inline

Fetch the next slab according to the stream of predictions provided by the tatami::Oracle. This method should only be called if max_size > 0 in the constructor; otherwise, no slabs are available to return.

Template Parameters

Ifunction_	Function to identify the slab containing each predicted row/column.
Ufunction_	Function to compute the estimated size of a slab.
Afunction_	Function to compute the actual size of a slab.
Cfunction_	Function to create a new slab.
Pfunction_	Function to populate zero, one or more slabs with their contents.

Parameters

identify	Function that accepts `i`, an `Index_` containing the predicted index of a single element on the target dimension. This should return a pair containing:
- An `Id_`, the identifier of the slab containing `i`. This is typically defined as the index of the slab on the target dimension. For example, if each chunk takes up 10 rows, accessing row 21 would require retrieval of slab 2.
- An `Index_`, the index of row/column `i` inside that slab. For example, if each chunk takes up 10 rows, accessing row 21 would yield an offset of 1.
upper_size	Function that accepts `j`, an `Id_` containing the slab identifier. It should return the upper bound on the size of the slab as a non-negative `Size_`. This upper bound is typically different from the actual size when the latter is not known a priori, e.g., because the size is only known after populating the slab contents. However, if the latter is known, `upper_size()` may be a trivial function that returns the same value as `actual_size()`.
actual_size	Function that accepts `j`, an `Id_` containing the slab identifier; and `slab`, a populated `const Slab_&` instance corresponding to `j`. It should return the actual size of the slab as a non-negative `Size_` that is no greater than `upper_size(j)`.
create	Function that accepts no arguments and returns a `Slab_` object with sufficient memory to hold a slab's contents when used in `populate()`. This may also return a default-constructed `Slab_` object if the allocation is done dynamically per slab in `populate()`.
populate	Function that accepts three arguments, `to_populate`, `to_reuse` and `all_slabs`:
- `to_populate` is a `std::vector<std::pair<Id_, SlabIndex> >&` specifying the slabs to be populated. The first (`Id_`) element of each pair is the slab identifier, i.e., the first element returned by `identify`. The second (`SlabIndex`) element is an unsigned integer giving the index of the entry of `all_slabs` that holds the corresponding `Slab_` instance, as returned by `create()`. This argument may be modified in any manner. It is guaranteed to be non-empty but is not guaranteed to be sorted.
- `to_reuse` is a `std::vector<std::pair<Id_, SlabIndex> >&` specifying the cached slabs that are re-used in the upcoming set of predictions. The elements of each pair are interpreted in the same manner as for `to_populate`. This argument may be modified in any manner. It may be empty and is not guaranteed to be sorted.
- `all_slabs` is a `std::vector<Slab_>&` containing all slabs in the cache. This may include instances that are not referenced by `to_populate` or `to_reuse`. Each element may be modified, but the length of the vector should not change.
The `populate` function should iterate over `to_populate` and fill each `Slab_` with the contents of the corresponding slab. Optionally, callers may use `to_reuse` to defragment the already-in-use parts of the cache, in order to free up enough space for the new data in `to_populate`.
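A minimal `populate` callback might look like the following. It assumes that `SlabIndex` is `std::size_t`, that `Slab_` is simply a `std::vector<double>`, and that a hypothetical `load_slab(id, out)` reads a slab's contents from some backing store (here it just writes synthetic values). Since `to_populate` may be reordered freely, the sketch sorts it by identifier, which can make reads from a file-backed store more sequential.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using SlabIndex = std::size_t;    // assumption: an unsigned index into all_slabs
using Slab = std::vector<double>; // assumption: a slab is just its values

// Hypothetical loader: fills `out` with synthetic values derived from the slab id.
void load_slab(int id, Slab& out) {
    out.assign(3, static_cast<double>(id));
}

// Fills every slab listed in to_populate. Slabs listed in to_reuse already hold
// valid contents, so they are left untouched (no defragmentation is done here).
void populate(std::vector<std::pair<int, SlabIndex> >& to_populate,
              std::vector<std::pair<int, SlabIndex> >& to_reuse,
              std::vector<Slab>& all_slabs) {
    (void)to_reuse;
    std::sort(to_populate.begin(), to_populate.end()); // order by slab id
    for (const auto& entry : to_populate) {
        assert(entry.second < all_slabs.size());
        load_slab(entry.first, all_slabs[entry.second]);
    }
}
```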

Returns: Pair containing (1) a pointer to a slab's contents and (2) the index of the next predicted row/column inside the retrieved slab.
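To connect `identify` with the returned pair, the sketch below assumes (hypothetically) chunks of 10 rows, `int` for both `Id_` and `Index_`, and a dense row-major `DenseSlab`. `identify_row()` reproduces the row-21 example from the parameter description, and `extract_row()` shows how a caller might use the slab pointer and offset returned by `next()`. The real layout depends entirely on how `Slab_` is defined and populated.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical identify(): chunks span `chunk_length` rows, so the slab id is
// the chunk index and the offset is the position within that chunk.
std::pair<int, int> identify_row(int i, int chunk_length) {
    assert(i >= 0 && chunk_length > 0);
    return std::make_pair(i / chunk_length, i % chunk_length);
}

// Hypothetical dense slab storing its rows in row-major order.
struct DenseSlab {
    std::size_t ncol = 0;
    std::vector<double> values; // (rows in slab) * ncol entries
};

// Copies out the row at `result.second` from the slab at `result.first`,
// mirroring the (pointer, offset) pair returned by next().
std::vector<double> extract_row(const std::pair<const DenseSlab*, int>& result) {
    const DenseSlab* slab = result.first;
    assert(slab != nullptr && result.second >= 0);
    std::size_t start = static_cast<std::size_t>(result.second) * slab->ncol;
    assert(start + slab->ncol <= slab->values.size());
    const double* row = slab->values.data() + start;
    return std::vector<double>(row, row + slab->ncol);
}
```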

◆ get_max_size()

template<typename Id_ , typename Index_ , class Slab_ , typename Size_ >

auto tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >::get_max_size ( ) const

inline

Returns: Maximum total size of the cache. This has the same value as the max_size used in the constructor. The return type is the unsigned integer type std::vector::size_type.

◆ get_used_size()

template<typename Id_ , typename Index_ , class Slab_ , typename Size_ >

auto tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >::get_used_size ( ) const

inline

Returns: Current usage across all slabs in the cache. This should be interpreted as an upper bound on usage if there is a difference between estimated and actual slab sizes. The return type is the unsigned integer type std::vector::size_type.

◆ get_num_slabs()

template<typename Id_ , typename Index_ , class Slab_ , typename Size_ >

auto tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >::get_num_slabs ( ) const

inline

Returns: Number of slabs currently in the cache. The return type is the unsigned integer type std::vector::size_type.

The documentation for this class was generated from the following file:

tatami_chunked/OracularVariableSlabCache.hpp
