template<typename Id_, typename Index_, class Slab_, typename Size_>
class tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >
Oracle-aware cache for variable-size slabs.
- Template Parameters
-
Id_ | Type of slab identifier, typically integer. |
Index_ | Type of row/column index produced by the oracle. |
Size_ | Numeric type for the slab size. |
Slab_ | Class for a single slab. |
Implement an oracle-aware cache for variable-size slabs. Each slab is defined as the set of chunks required to read an element of the target dimension (or a contiguous block/indexed subset thereof) from a tatami::Matrix. This cache is similar to OracularSlabCache but enables improved cache utilization when the slabs vary in size. For example, the number of non-zero entries in a sparse matrix might vary between slabs, so the cache could be optimized to fit more slabs into memory when they have fewer non-zeros.
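The benefit of size-aware caching can be illustrated with a small calculation. The sketch below is not the cache's eviction logic; `count_fitting` is a hypothetical helper that counts how many consecutive slabs fit under a limit when each slab is charged a given size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of tatami_chunked): count how many of the
// leading slabs fit within 'max_size' when slab k is charged 'sizes[k]'.
inline std::size_t count_fitting(const std::vector<std::size_t>& sizes, std::size_t max_size) {
    std::size_t used = 0, count = 0;
    for (auto s : sizes) {
        // 'used' never exceeds 'max_size', so the subtraction cannot wrap.
        if (s > max_size - used) {
            break;
        }
        used += s;
        ++count;
    }
    return count;
}
```

With a limit of 200 non-zeros, charging every slab the worst case of 100 admits two slabs, whereas charging per-slab sizes of 100, 20, 30 and 50 admits all four.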
The size of each slab is defined by Size_, which can be any non-negative measure of slab size. This could be the number of non-zero elements, or the number of dimension elements, or the size of the slab in bytes, etc., as long as its interpretation is consistent between slabs and with the max_size used in the constructor. Users can also differentiate between the estimated and actual size of the slab, if the latter is not known until after the slab has been loaded into memory, e.g., the number of non-zero entries in a file-backed sparse matrix.
When implementing Slab_, we generally suggest using a common memory pool that is referenced by each Slab_ instance. This guarantees that the actual cache size does not exceed the limit associated with max_size when Slab_ instances are re-used for different slabs. (Otherwise, if each Slab_ allocates its own memory, re-use of an instance may cause its allocation to increase to the size of the largest encountered slab.) Callers may need to occasionally defragment the pool to ensure that enough memory is available for loading new slabs.
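A minimal sketch of the pooled layout described above, assuming slab contents are stored as double. The names `Pool`, `PooledSlab`, `place` and `defragment` are hypothetical and not part of tatami_chunked; a real implementation would choose its own layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shared pool: a single allocation sized to the cache limit.
struct Pool {
    explicit Pool(std::size_t capacity) : data(capacity) {}
    std::vector<double> data;
    std::size_t used = 0; // elements handed out so far, counted from the front
};

// Hypothetical slab: a view into the pool that owns no memory of its own.
struct PooledSlab {
    std::size_t offset = 0;
    std::size_t length = 0;
};

// Place a slab of 'length' elements directly after the used region.
inline void place(Pool& pool, PooledSlab& slab, std::size_t length) {
    assert(length <= pool.data.size() - pool.used);
    slab.offset = pool.used;
    slab.length = length;
    pool.used += length;
}

// Defragment: shift the retained slabs to the front of the pool, in order of
// increasing offset, so that all free space is contiguous at the end.
inline void defragment(Pool& pool, std::vector<PooledSlab*>& retained) {
    std::sort(retained.begin(), retained.end(),
        [](const PooledSlab* a, const PooledSlab* b) { return a->offset < b->offset; });
    std::size_t dest = 0;
    for (auto* slab : retained) {
        if (slab->offset != dest) {
            // The destination always starts before the source, so a forward copy is safe.
            auto src = pool.data.begin() + slab->offset;
            std::copy(src, src + slab->length, pool.data.begin() + dest);
            slab->offset = dest;
        }
        dest += slab->length;
    }
    pool.used = dest;
}
```

Because every slab is only a view, re-using a `PooledSlab` for a larger slab cannot grow the total allocation beyond the pool's capacity.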
template<typename Id_, typename Index_, class Slab_, typename Size_>
template<class Ifunction_, class Ufunction_, class Afunction_, class Cfunction_, class Pfunction_>
std::pair<const Slab_*, Index_> tatami_chunked::OracularVariableSlabCache<Id_, Index_, Slab_, Size_>::next(
    Ifunction_ identify,
    Ufunction_ upper_size,
    Afunction_ actual_size,
    Cfunction_ create,
    Pfunction_ populate)
inline
Fetch the next slab according to the stream of predictions provided by the tatami::Oracle. This method should only be called if max_size > 0 in the constructor; otherwise, no slabs are available to be returned.
- Template Parameters
-
Ifunction_ | Function to identify the slab containing each predicted row/column. |
Ufunction_ | Function to compute the estimated size of a slab. |
Afunction_ | Function to compute the actual size of a slab. |
Cfunction_ | Function to create a new slab. |
Pfunction_ | Function to populate zero, one or more slabs with their contents. |
- Parameters
-
identify | Function that accepts i, an Index_ containing the predicted index of a single element on the target dimension. This should return a pair containing:
- An Id_, the identifier of the slab containing i. This is typically defined as the index of the slab on the target dimension. For example, if each chunk takes up 10 rows, attempting to access row 21 would require retrieval of slab 2.
- An Index_, the index of row/column i inside that slab. For example, if each chunk takes up 10 rows, attempting to access row 21 would yield an offset of 1.
|
upper_size | Function that accepts j, an Id_ containing the slab identifier. It should return the upper bound on the size of the slab as a non-negative Size_. This upper bound typically differs from the actual size when the latter is not known a priori, e.g., because the size is only known after populating the slab contents. However, if the actual size is known in advance, upper_size() may be a trivial function that returns the same value as actual_size(). |
actual_size | Function that accepts j, an Id_ containing the slab identifier; and slab, a populated const Slab_& instance corresponding to j. It should return the actual size of the slab as a non-negative Size_ that is no greater than upper_size(j). |
create | Function that accepts no arguments and returns a Slab_ object with sufficient memory to hold a slab's contents when used in populate(). This may also return a default-constructed Slab_ object if the allocation is done dynamically per slab in populate(). |
populate | Function that accepts three arguments: to_populate, to_reuse and all_slabs.
- The to_populate argument is a std::vector<std::pair<Id_, size_t> >& specifying the slabs to be populated. The first Id_ element of each pair contains the slab identifier, i.e., the first element returned by the identify function. The second size_t element is the index of the entry of all_slabs containing the corresponding Slab_ instance, as returned by create(). This argument can be modified in any manner. It is guaranteed to be non-empty but is not guaranteed to be sorted.
- The to_reuse argument is a std::vector<std::pair<Id_, size_t> >& specifying the cached slabs that are re-used for the upcoming set of predictions. The elements of each pair are interpreted in the same manner as to_populate. This argument can be modified in any manner. It is not guaranteed to be non-empty or sorted.
- The all_slabs argument is a std::vector<Slab_>& containing all slabs in the cache. This may include instances that are not referenced by to_populate or to_reuse. Each element of this argument can be modified but the length should not change.
The populate function should iterate over to_populate and fill each Slab_ with the contents of the corresponding slab. Optionally, callers may use to_reuse to defragment the already-in-use parts of the cache, in order to free up enough space for new data from to_populate. |
- Returns
- Pair containing (1) a pointer to a slab's contents and (2) the index of the next predicted row/column inside the retrieved slab.
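The five callbacks described above can be sketched as plain functions. This is a minimal illustration under stated assumptions: Id_ and Index_ are int, Size_ is std::size_t counting non-zero entries, and `SparseSlab`, `chunk_length`, `num_columns` and `load_slab` are hypothetical names that stand in for an application's own slab type and file reader. In practice these would usually be lambdas passed directly to next().

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical slab type holding the non-zero entries of one slab.
struct SparseSlab {
    std::vector<double> values;
    std::vector<int> indices;
};

constexpr int chunk_length = 10; // assumption: each chunk spans 10 rows
constexpr int num_columns = 50;  // assumption: the matrix has 50 columns

// Hypothetical stand-in for reading slab 'id' from a file; it writes
// 'id + 1' synthetic non-zero entries into 'slab'.
inline void load_slab(int id, SparseSlab& slab) {
    slab.values.assign(static_cast<std::size_t>(id) + 1, 1.0);
    slab.indices.clear();
    for (int k = 0; k <= id; ++k) {
        slab.indices.push_back(k);
    }
}

// identify: map a predicted row to (slab identifier, offset inside that slab).
inline std::pair<int, int> identify(int i) {
    return { i / chunk_length, i % chunk_length };
}

// upper_size: worst case where every entry of the slab is non-zero.
inline std::size_t upper_size(int /* j */) {
    return static_cast<std::size_t>(chunk_length) * num_columns;
}

// actual_size: number of non-zero entries found after loading.
inline std::size_t actual_size(int /* j */, const SparseSlab& slab) {
    return slab.values.size();
}

// create: default-constructed slab; memory is allocated in populate().
inline SparseSlab create() {
    return SparseSlab();
}

// populate: fill each requested slab; to_reuse is ignored in this sketch
// because each slab owns its memory and nothing needs defragmenting.
inline void populate(
    std::vector<std::pair<int, std::size_t> >& to_populate,
    std::vector<std::pair<int, std::size_t> >& /* to_reuse */,
    std::vector<SparseSlab>& all_slabs)
{
    for (const auto& p : to_populate) {
        load_slab(p.first, all_slabs[p.second]);
    }
}
```

With these definitions, identify(21) yields slab 2 and offset 1, matching the example given for the identify parameter. Note that this sketch lets each slab own its vectors for brevity, so it does not bound the true allocation in the way a shared memory pool would.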