tatami_chunked
Helpers to create custom chunked tatami matrices
tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ > Class Template Reference

Oracle-aware cache for variable-size slabs.

#include <OracularVariableSlabCache.hpp>

Public Member Functions

 OracularVariableSlabCache (std::shared_ptr< const tatami::Oracle< Index_ > > oracle, size_t max_size)
 
 OracularVariableSlabCache (const OracularVariableSlabCache &)=delete
 
OracularVariableSlabCache & operator= (const OracularVariableSlabCache &)=delete
 
Index_ next ()
 
template<class Ifunction_ , class Ufunction_ , class Afunction_ , class Cfunction_ , class Pfunction_ >
std::pair< const Slab_ *, Index_ > next (Ifunction_ identify, Ufunction_ upper_size, Afunction_ actual_size, Cfunction_ create, Pfunction_ populate)
 
size_t get_max_size () const
 
size_t get_used_size () const
 
size_t get_num_slabs () const
 

Detailed Description

template<typename Id_, typename Index_, class Slab_, typename Size_>
class tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >

Oracle-aware cache for variable-size slabs.

Template Parameters
Id_: Type of slab identifier, typically an integer.
Index_: Type of row/column index produced by the oracle.
Size_: Numeric type for the slab size.
Slab_: Class for a single slab.

Implement an oracle-aware cache for variable-size slabs. Each slab is defined as the set of chunks required to read an element of the target dimension (or a contiguous block/indexed subset thereof) from a tatami::Matrix. This cache is similar to OracularSlabCache but enables improved cache utilization when the slabs vary in size. For example, the number of non-zero entries in a sparse matrix might vary between slabs, so the cache could be optimized to fit more slabs into memory when they have fewer non-zeros.

The size of each slab is defined by Size_, which can be any non-negative measure of slab size. This could be the number of non-zero elements, or the number of dimension elements, or the size of the slab in bytes, etc., as long as its interpretation is consistent between slabs and with the max_size used in the constructor. Users can also differentiate between the estimated and actual size of the slab, if the latter is not known until after the slab has been loaded into memory, e.g., the number of non-zero entries in a file-backed sparse matrix.

When implementing Slab_, we generally suggest using a common memory pool that is referenced by each Slab_ instance. This guarantees that the actual cache size does not exceed the limit associated with max_size when Slab_ instances are re-used for different slabs. (Otherwise, if each Slab_ allocates its own memory, re-use of an instance may cause its allocation to increase to the size of the largest encountered slab.) Callers may need to occasionally defragment the pool to ensure that enough memory is available for loading new slabs.
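The pooling suggestion above can be illustrated with a minimal sketch. The names `Pool`, `PooledSlab` and `assign_region` are hypothetical and are not part of tatami_chunked; the sketch also assumes that slab contents are `double` values and that the pool is allocated once at the cache's size limit.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical shared pool: a single allocation sized to the cache limit.
struct Pool {
    explicit Pool(std::size_t capacity) : buffer(capacity) {}
    std::vector<double> buffer;
};

// Hypothetical Slab_ class: a view into the shared pool, not an owner of memory.
struct PooledSlab {
    std::shared_ptr<Pool> pool;
    std::size_t offset = 0; // start of this slab's region in the pool
    std::size_t length = 0; // number of values currently held by this slab

    double* data() { return pool->buffer.data() + offset; }
    const double* data() const { return pool->buffer.data() + offset; }
};

// Point a (possibly re-used) slab instance at a different region of the pool.
// No allocation happens here, so total memory stays at the pool's capacity.
inline void assign_region(PooledSlab& slab, std::size_t offset, std::size_t length) {
    assert(offset + length <= slab.pool->buffer.size());
    slab.offset = offset;
    slab.length = length;
}
```

Under this design, re-using a `PooledSlab` for a larger slab only changes its offset and length, so the memory footprint is bounded by the pool rather than by the largest slab that each instance has ever held.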

Constructor & Destructor Documentation

◆ OracularVariableSlabCache() [1/2]

tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >::OracularVariableSlabCache ( std::shared_ptr< const tatami::Oracle< Index_ > >  oracle,
size_t  max_size 
)
inline
Parameters
oracle: Pointer to a tatami::Oracle to be used for predictions.
max_size: Total size of all slabs to store in the cache. This may be zero, in which case no caching should be performed.

◆ OracularVariableSlabCache() [2/2]

Deleted as the cache holds persistent pointers.

Member Function Documentation

◆ operator=()

Deleted as the cache holds persistent pointers.

◆ next() [1/2]

This method is intended to be called when max_size = 0, to provide callers with the oracle predictions for non-cached extraction of data. Calls to this method should not be intermingled with calls to its overload below; the latter should only be called when max_size > 0.

Returns
The next prediction from the oracle.

◆ next() [2/2]

std::pair< const Slab_ *, Index_ > tatami_chunked::OracularVariableSlabCache< Id_, Index_, Slab_, Size_ >::next ( Ifunction_  identify,
Ufunction_  upper_size,
Afunction_  actual_size,
Cfunction_  create,
Pfunction_  populate 
)
inline

Fetch the next slab according to the stream of predictions provided by the tatami::Oracle. This method should only be called if max_size > 0 in the constructor; otherwise, the cache holds no slabs and there is nothing to return.

Template Parameters
Ifunction_: Function to identify the slab containing each predicted row/column.
Ufunction_: Function to compute the estimated size of a slab.
Afunction_: Function to compute the actual size of a slab.
Cfunction_: Function to create a new slab.
Pfunction_: Function to populate zero, one or more slabs with their contents.
Parameters
identify: Function that accepts i, an Index_ containing the predicted index of a single element on the target dimension. This should return a pair containing:
  1. An Id_, the identifier of the slab containing i. This is typically defined as the index of the slab on the target dimension. For example, if each chunk takes up 10 rows, attempting to access row 21 would require retrieval of slab 2.
  2. An Index_, the index of row/column i inside that slab. For example, if each chunk takes up 10 rows, attempting to access row 21 would yield an offset of 1.
upper_size: Function that accepts j, an Id_ containing the slab identifier. It should return the upper bound on the size of the slab as a non-negative Size_. This upper bound typically differs from the actual size when the latter is not known a priori, e.g., because it can only be determined after populating the slab contents. However, if the actual size is known in advance, upper_size() may be a trivial function that returns the same value as actual_size().
actual_size: Function that accepts j, an Id_ containing the slab identifier; and slab, a populated const Slab_& instance corresponding to j. It should return the actual size of the slab as a non-negative Size_ that is no greater than upper_size(j).
create: Function that accepts no arguments and returns a Slab_ object with sufficient memory to hold a slab's contents when used in populate(). This may also return a default-constructed Slab_ object if the allocation is done dynamically per slab in populate().
populate: Function that accepts three arguments: to_populate, to_reuse and all_slabs.
  • The to_populate argument is a std::vector<std::pair<Id_, size_t> >& specifying the slabs to be populated. The first Id_ element of each pair contains the slab identifier, i.e., the first element returned by the identify function. The second size_t element is the index of the entry of all_slabs containing the corresponding Slab_ instance, as returned by create(). This argument can be modified in any manner. It is guaranteed to be non-empty but is not guaranteed to be sorted.
  • The to_reuse argument is a std::vector<std::pair<Id_, size_t> >& specifying the cached slabs that are re-used in the upcoming set of predictions. The elements of each pair are interpreted in the same manner as to_populate. This argument can be modified in any manner. It is not guaranteed to be non-empty or sorted.
  • The all_slabs argument is a std::vector<Slab_>& containing all slabs in the cache. This may include instances that are not referenced by to_populate or to_reuse. Each element of this argument can be modified but the length should not change.
The populate function should iterate over to_populate and fill each Slab_ with the contents of the corresponding slab. Optionally, callers may use to_reuse to defragment the already-in-use parts of the cache, in order to free up enough space for new data from to_populate.
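As a rough illustration of the populate behavior described above, the following sketch compacts the re-used slabs to the front of a shared pool and then places the new slabs after them. It is not the library's implementation. `PoolSlab`, `populate_with_defrag`, `slab_length` and `load` are hypothetical names, and the sketch assumes that slabs store `double` values in one shared `std::vector<double>` that is large enough for every slab passed in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Id = int; // stand-in for the Id_ template parameter

// Hypothetical Slab_ class: a region inside a shared pool.
struct PoolSlab {
    std::size_t offset = 0;
    std::size_t length = 0;
};

// slab_length(id) returns the number of values in slab `id` (assumed to be known);
// load(id, dest, n) writes the n values of slab `id` starting at `dest`.
template<class Length_, class Load_>
void populate_with_defrag(
    std::vector<std::pair<Id, std::size_t> >& to_populate,
    std::vector<std::pair<Id, std::size_t> >& to_reuse,
    std::vector<PoolSlab>& all_slabs,
    std::vector<double>& pool,
    Length_ slab_length,
    Load_ load)
{
    // Order the re-used slabs by their current position, so that compaction
    // only ever moves data towards the front of the pool.
    std::sort(to_reuse.begin(), to_reuse.end(), [&](const auto& l, const auto& r) {
        return all_slabs[l.second].offset < all_slabs[r.second].offset;
    });

    double* base = pool.data();
    std::size_t used = 0;

    // Defragment: shift each re-used slab down to close the gaps left by evicted slabs.
    for (const auto& p : to_reuse) {
        auto& slab = all_slabs[p.second];
        if (slab.offset != used) {
            // The destination starts before the source, so a forward copy is safe.
            std::copy(base + slab.offset, base + slab.offset + slab.length, base + used);
            slab.offset = used;
        }
        used += slab.length;
    }

    // Fill each new slab into the contiguous free space after the re-used slabs.
    for (const auto& p : to_populate) {
        auto& slab = all_slabs[p.second];
        slab.offset = used;
        slab.length = slab_length(p.first);
        assert(used + slab.length <= pool.size());
        load(p.first, base + used, slab.length);
        used += slab.length;
    }
}
```

To match the three-argument signature expected by next(), a function like this would typically be wrapped in a lambda that captures the pool and the two helper callables. Compacting on every call is the simplest policy; a caller might instead defragment only when the free space is too fragmented to hold the new slabs.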
Returns
Pair containing (1) a pointer to a slab's contents and (2) the index of the next predicted row/column inside the retrieved slab.
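The identify, upper_size and actual_size callables described above might look like the following sketch for a sparse matrix, where slab size is measured in non-zero entries. All names here (`SlabLayout`, `SparseSlab` and the concrete `Id`, `Index` and `Size` aliases) are assumptions for illustration and are not defined by tatami_chunked. The upper bound simply assumes that every entry in the slab could be non-zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Concrete stand-ins for the Id_, Index_ and Size_ template parameters.
using Id = int;
using Index = int;
using Size = std::size_t;

// Hypothetical Slab_ class holding sparse contents.
struct SparseSlab {
    std::vector<double> values;
    std::vector<Index> indices;
};

// Hypothetical description of how the matrix is divided into slabs.
struct SlabLayout {
    Index chunk_length;      // target-dimension elements per slab
    Index target_extent;     // total number of target-dimension elements
    Index non_target_extent; // number of elements along the other dimension
};

// identify: map a predicted index to (slab identifier, offset inside that slab).
inline std::pair<Id, Index> identify(const SlabLayout& layout, Index i) {
    assert(i >= 0 && i < layout.target_extent);
    return std::make_pair(static_cast<Id>(i / layout.chunk_length), i % layout.chunk_length);
}

// upper_size: the non-zero count cannot exceed the dense element count of the slab.
// The last slab may cover fewer target-dimension elements than chunk_length.
inline Size upper_size(const SlabLayout& layout, Id j) {
    Index start = j * layout.chunk_length;
    Index length = std::min(layout.chunk_length, layout.target_extent - start);
    return static_cast<Size>(length) * static_cast<Size>(layout.non_target_extent);
}

// actual_size: the non-zero count, which is only known once the slab is populated.
inline Size actual_size(Id /* j */, const SparseSlab& slab) {
    return slab.values.size();
}
```

In practice, each function would be wrapped in a lambda that captures the layout, e.g., `[&](Index i) { return identify(layout, i); }`, so that the signatures match those expected by next(). With a chunk length of 10, `identify(layout, 21)` yields slab 2 and offset 1, matching the example in the parameter description.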

◆ get_max_size()

Returns
Maximum total size of the cache. This is the same as the max_size used in the constructor.

◆ get_used_size()

Returns
Current usage across all slabs in the cache. This should be interpreted as an upper bound on usage if there is a difference between estimated and actual slab sizes.

◆ get_num_slabs()

Returns
Number of slabs currently in the cache.
