tatami_chunked
Helpers to create custom chunked tatami matrices
Oracle-aware cache for variable-size slabs.
#include <OracularVariableSlabCache.hpp>
Public Member Functions
OracularVariableSlabCache (std::shared_ptr< const tatami::Oracle< Index_ > > oracle, size_t max_size)
OracularVariableSlabCache (const OracularVariableSlabCache &)=delete
OracularVariableSlabCache & | operator= (const OracularVariableSlabCache &)=delete
Index_ | next ()
template<class Ifunction_, class Ufunction_, class Afunction_, class Cfunction_, class Pfunction_>
std::pair< const Slab_ *, Index_ > | next (Ifunction_ identify, Ufunction_ upper_size, Afunction_ actual_size, Cfunction_ create, Pfunction_ populate)
size_t | get_max_size () const
size_t | get_used_size () const
size_t | get_num_slabs () const
Oracle-aware cache for variable-size slabs.
Template Parameters
Id_ | Type of slab identifier, typically integer.
Index_ | Type of row/column index produced by the oracle.
Size_ | Numeric type for the slab size.
Slab_ | Class for a single slab.
Implement an oracle-aware cache for variable-size slabs. Each slab is defined as the set of chunks required to read an element of the target dimension (or a contiguous block/indexed subset thereof) from a tatami::Matrix. This cache is similar to OracularSlabCache but enables improved cache utilization when the slabs vary in size. For example, the number of non-zero entries in a sparse matrix might vary between slabs, so the cache could be optimized to fit more slabs into memory when they have fewer non-zeros.
The size of each slab is defined by Size_, which can be any non-negative measure of slab size. This could be the number of non-zero elements, or the number of dimension elements, or the size of the slab in bytes, etc., as long as its interpretation is consistent between slabs and with the max_size used in the constructor. Users can also differentiate between the estimated and actual size of the slab, if the latter is not known until after the slab has been loaded into memory, e.g., the number of non-zero entries in a file-backed sparse matrix.
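To make the benefit of per-slab sizes concrete, the following toy accounting loop compares how many slabs fit into a budget when each slab is charged its own size versus the size of the largest slab. The function names `count_fitting` and `count_fitting_fixed` are hypothetical, and this is only an illustration of the budgeting idea, not the eviction logic used by the cache.

```cpp
#include <cstddef>
#include <vector>

// Count how many leading slabs fit within 'max_size' when each slab is
// charged its own size (e.g., its number of non-zero entries).
inline std::size_t count_fitting(const std::vector<std::size_t>& slab_sizes, std::size_t max_size) {
    std::size_t used = 0, count = 0;
    for (auto s : slab_sizes) {
        if (s > max_size - used) { // 'used' never exceeds 'max_size', so no underflow.
            break;
        }
        used += s;
        ++count;
    }
    return count;
}

// The same budget when every slab is charged the size of the largest slab,
// as would happen with a fixed-size accounting scheme.
inline std::size_t count_fitting_fixed(const std::vector<std::size_t>& slab_sizes, std::size_t max_size) {
    std::size_t largest = 0;
    for (auto s : slab_sizes) {
        if (s > largest) {
            largest = s;
        }
    }
    if (largest == 0) {
        return slab_sizes.size(); // all slabs are empty, so all of them fit.
    }
    std::size_t capacity = max_size / largest;
    return capacity < slab_sizes.size() ? capacity : slab_sizes.size();
}
```

With slab sizes of 10, 20, 5 and 40 and a budget of 50, the per-slab accounting fits the first three slabs, whereas the fixed-size accounting fits only one.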
When implementing Slab_, we generally suggest using a common memory pool that is referenced by each Slab_ instance. This guarantees that the actual cache size does not exceed the limit associated with max_size when Slab_ instances are re-used for different slabs. (Otherwise, if each Slab_ allocates its own memory, re-use of an instance may cause its allocation to increase to the size of the largest encountered slab.) Callers may need to occasionally defragment the pool to ensure that enough memory is available for loading new slabs.
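One possible shape for such a pool is sketched below. The `Pool` and `PoolSlab` types, the bump-style `allocate()` and the `defragment()` method are all hypothetical names chosen for illustration; the library does not prescribe any of them, and a real implementation would likely need to handle more cases.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical slab: a view into a shared pool rather than an owner of memory.
struct PoolSlab {
    std::size_t offset = 0; // start of this slab's data within the pool.
    std::size_t length = 0; // number of elements currently in use.
};

// Hypothetical pool with bump allocation, sized once from the cache's max_size.
struct Pool {
    std::vector<double> buffer;
    std::size_t next_free = 0;

    explicit Pool(std::size_t max_size) : buffer(max_size) {}

    // Reserve 'n' elements after the used region; returns false if they do not fit.
    bool allocate(PoolSlab& slab, std::size_t n) {
        if (n > buffer.size() - next_free) {
            return false;
        }
        slab.offset = next_free;
        slab.length = n;
        next_free += n;
        return true;
    }

    // Slide the retained slabs to the front of the buffer so that all free
    // space becomes one contiguous region at the end.
    void defragment(std::vector<PoolSlab*>& retained) {
        std::sort(retained.begin(), retained.end(),
            [](const PoolSlab* a, const PoolSlab* b) { return a->offset < b->offset; });
        std::size_t dest = 0;
        for (auto* s : retained) {
            if (s->offset != dest) {
                // The destination starts before the source, so a forward copy is safe.
                auto first = buffer.begin() + s->offset;
                std::copy(first, first + s->length, buffer.begin() + dest);
                s->offset = dest;
            }
            dest += s->length;
        }
        next_free = dest;
    }
};
```

Because every slab draws from the same buffer, the total memory stays fixed at the size chosen up front, regardless of how often individual `PoolSlab` instances are re-used for slabs of different sizes.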
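OracularVariableSlabCache (std::shared_ptr< const tatami::Oracle< Index_ > > oracle, size_t max_size) | inline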
oracle | Pointer to a tatami::Oracle to be used for predictions. |
max_size | Total size of all slabs to store in the cache. This may be zero, in which case no caching should be performed. |
OracularVariableSlabCache (const OracularVariableSlabCache &) | delete
Deleted as the cache holds persistent pointers.
OracularVariableSlabCache & operator= (const OracularVariableSlabCache &) | delete
Deleted as the cache holds persistent pointers.
Index_ next ()
This method is intended to be called when max_size = 0, to provide callers with the oracle predictions for non-cached extraction of data. Calls to this method should not be intermingled with calls to its overload below; the latter should only be called when max_size > 0.
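A minimal sketch of the non-cached path is shown below, assuming a cache constructed with max_size = 0. The `extract_uncached` helper and the `extract` callback are hypothetical, and `MockCache` is only a stand-in that replays a fixed sequence so that the sketch compiles without the library; any object with a zero-argument `next()` could be used in its place.

```cpp
#include <cstddef>
#include <vector>

// Pull 'n' predictions via the zero-argument next() and extract each element
// directly. 'extract' is a hypothetical caller-supplied function that reads a
// single row/column without any caching.
template<typename Index_, class Cache_, class Extract_>
std::vector<Index_> extract_uncached(Cache_& cache, std::size_t n, Extract_ extract) {
    std::vector<Index_> visited;
    visited.reserve(n);
    for (std::size_t k = 0; k < n; ++k) {
        Index_ i = cache.next(); // next prediction from the oracle.
        extract(i);
        visited.push_back(i);
    }
    return visited;
}

// Stand-in for the cache so that this sketch is self-contained.
struct MockCache {
    std::vector<int> predictions;
    std::size_t position = 0;
    int next() {
        return predictions[position++];
    }
};
```

The caller is responsible for requesting no more predictions than the oracle provides, and for never mixing this path with the slab-returning overload on the same cache instance.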
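std::pair< const Slab_ *, Index_ > next (Ifunction_ identify, Ufunction_ upper_size, Afunction_ actual_size, Cfunction_ create, Pfunction_ populate) | inline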
Fetch the next slab according to the stream of predictions provided by the tatami::Oracle. This method should only be called if max_size > 0 in the constructor; otherwise, no slabs are available to return.
Ifunction_ | Function to identify the slab containing each predicted row/column. |
Ufunction_ | Function to compute the estimated size of a slab. |
Afunction_ | Function to compute the actual size of a slab. |
Cfunction_ | Function to create a new slab. |
Pfunction_ | Function to populate zero, one or more slabs with their contents. |
identify | Function that accepts i, an Index_ containing the predicted index of a single element on the target dimension. This should return a pair containing:
|
upper_size | Function that accepts j, an Id_ containing the slab identifier. It should return the upper bound on the size of the slab as a non-negative Size_. This upper bound is typically different from the actual size when the latter is not known a priori, e.g., because the size is only known after populating the slab contents. However, if the latter is known, upper_size() may be a trivial function that returns the same value as actual_size(). |
actual_size | Function that accepts j, an Id_ containing the slab identifier; and slab, a populated const Slab_& instance corresponding to j. It should return the actual size of the slab as a non-negative Size_ that is no greater than upper_size(j). |
create | Function that accepts no arguments and returns a Slab_ object with sufficient memory to hold a slab's contents when used in populate(). This may also return a default-constructed Slab_ object if the allocation is done dynamically per slab in populate(). |
populate | Function that accepts three arguments: to_populate, to_reuse and all_slabs.
The populate function should iterate over to_populate and fill each Slab_ with the contents of the corresponding slab. Optionally, callers may use to_reuse to defragment the already-in-use parts of the cache, in order to free up enough space for new data from to_populate. |
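The size-related callbacks might look like the following sketch for a sparse matrix where size is measured in non-zero entries. Everything here is an assumption made for illustration: `SparseSlab`, `Layout`, and the three functor types are hypothetical, Id_ is taken to be `int`, and Size_ is taken to be `std::size_t`. The identify and populate callbacks are omitted because their exact signatures depend on details not covered here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical slab holding the non-zero values of one block of rows.
struct SparseSlab {
    std::vector<double> values;
};

// Hypothetical layout: each slab spans 'rows_per_slab' rows of a matrix with 'ncols' columns.
struct Layout {
    std::size_t rows_per_slab;
    std::size_t ncols;
};

// upper_size: before loading, assume that every entry in the slab could be non-zero.
struct UpperSize {
    Layout layout;
    std::size_t operator()(int /* slab identifier j */) const {
        return layout.rows_per_slab * layout.ncols;
    }
};

// actual_size: after loading, report the number of non-zero entries actually stored.
struct ActualSize {
    std::size_t operator()(int /* slab identifier j */, const SparseSlab& slab) const {
        return slab.values.size();
    }
};

// create: return a default-constructed slab, leaving allocation to the populate step.
struct Create {
    SparseSlab operator()() const {
        return SparseSlab();
    }
};
```

In this sketch the actual size can never exceed the upper bound, provided that a populated slab stores at most one value per matrix entry.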
size_t get_max_size () const | inline
Returns the max_size used in the constructor.