tatami_hdf5
tatami bindings for HDF5-backed matrices
Representations for matrix data in HDF5 files.
Classes
class | CompressedSparseMatrix
Compressed sparse matrix in an HDF5 file.
struct | CompressedSparseMatrixOptions
Options for HDF5 extraction.
class | DenseMatrix
Dense matrix backed by a DataSet in an HDF5 file.
struct | DenseMatrixOptions
Options for DenseMatrix extraction.
struct | WriteCompressedSparseMatrixOptions
Parameters for write_compressed_sparse_matrix().
Enumerations
enum class | WriteStorageLayout { AUTOMATIC, COLUMN, ROW }
enum class | WriteStorageType { AUTOMATIC, INT8, UINT8, INT16, UINT16, INT32, UINT32, DOUBLE }
Functions
template<typename Value_, typename Index_, class ValueStorage_ = std::vector<Value_>, class IndexStorage_ = std::vector<Index_>, class PointerStorage_ = std::vector<size_t>>
std::shared_ptr<tatami::Matrix<Value_, Index_>> | load_compressed_sparse_matrix(size_t nr, size_t nc, const std::string &file, const std::string &vals, const std::string &idx, const std::string &ptr, bool row)
template<typename Value_, typename Index_, class ValueStorage_ = std::vector<Value_>>
std::shared_ptr<tatami::Matrix<Value_, Index_>> | load_dense_matrix(const std::string &file, const std::string &name, bool transpose)
auto & | get_default_hdf5_lock()
template<class Function_>
void | serialize(Function_ f)
template<typename Value_, typename Index_>
void | write_compressed_sparse_matrix(const tatami::Matrix<Value_, Index_> *mat, H5::Group &location, const WriteCompressedSparseMatrixOptions &params)
template<typename Value_, typename Index_>
void | write_compressed_sparse_matrix(const tatami::Matrix<Value_, Index_> *mat, H5::Group &location)
Representations for matrix data in HDF5 files.
enum class tatami_hdf5::WriteStorageLayout
Layout to use when saving the matrix inside the HDF5 group.
enum class tatami_hdf5::WriteStorageType
Numeric type for writing data into an HDF5 dataset.
std::shared_ptr<tatami::Matrix<Value_, Index_>> tatami_hdf5::load_compressed_sparse_matrix(size_t nr, size_t nc, const std::string &file, const std::string &vals, const std::string &idx, const std::string &ptr, bool row)
Create a tatami::CompressedSparseMatrix from an HDF5 group containing compressed sparse data.
Value_ | Type of the matrix values in the Matrix interface. |
Index_ | Type of the row/column indices. |
ValueStorage_ | Vector type for storing the values of the non-zero elements. Elements of this vector may be of a different type than Value_ for more efficient storage. |
IndexStorage_ | Vector type for storing the indices. Elements of this vector may be of a different type than Index_ for more efficient storage. |
PointerStorage_ | Vector type for storing the index pointers. |
nr | Number of rows in the matrix. |
nc | Number of columns in the matrix. |
file | Path to the file. |
vals | Name of the 1D dataset inside file containing the non-zero elements. |
idx | Name of the 1D dataset inside file containing the indices of the non-zero elements. If row = true, this should contain column indices sorted within each row, otherwise it should contain row indices sorted within each column. |
ptr | Name of the 1D dataset inside file containing the index pointers for the start and end of each row (if row = true) or column (otherwise). This should have length equal to the number of rows (if row = true) or columns (otherwise) plus 1. |
row | Whether the matrix is stored on disk in compressed sparse row format. If false, the matrix is assumed to be stored in compressed sparse column format. |
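The layout expected of the vals, idx and ptr datasets can be illustrated with a small in-memory sketch. The helpers below are hypothetical and are not part of tatami_hdf5; they only check the constraints described above and expand one row (or column) into dense form. Two assumptions go beyond the text: the first pointer is taken to be zero, and "sorted" is taken to mean strictly increasing indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a tatami_hdf5 function). `nprimary` is the number
// of rows if row = true, or columns otherwise; `nsecondary` is the extent of
// the other dimension.
inline bool is_valid_compressed_sparse(
    std::size_t nprimary,
    std::size_t nsecondary,
    const std::vector<double>& vals,
    const std::vector<int>& idx,
    const std::vector<std::size_t>& ptr)
{
    // One pointer per primary element, plus one final end marker.
    if (ptr.size() != nprimary + 1) return false;
    if (vals.size() != idx.size()) return false;
    // Assumption: pointers start at zero and end at the number of non-zeros.
    if (ptr.front() != 0 || ptr.back() != vals.size()) return false;

    for (std::size_t p = 0; p < nprimary; ++p) {
        if (ptr[p] > ptr[p + 1] || ptr[p + 1] > vals.size()) return false;
        for (std::size_t k = ptr[p]; k < ptr[p + 1]; ++k) {
            if (idx[k] < 0 || static_cast<std::size_t>(idx[k]) >= nsecondary) return false;
            // Assumption: sorted means strictly increasing within each row/column.
            if (k > ptr[p] && idx[k] <= idx[k - 1]) return false;
        }
    }
    return true;
}

// Expands primary element `p` (a row if row = true) into a dense vector.
// The caller is expected to have validated the arrays first.
inline std::vector<double> expand_primary(
    std::size_t p,
    std::size_t nsecondary,
    const std::vector<double>& vals,
    const std::vector<int>& idx,
    const std::vector<std::size_t>& ptr)
{
    assert(p + 1 < ptr.size());
    std::vector<double> out(nsecondary, 0.0);
    for (std::size_t k = ptr[p]; k < ptr[p + 1]; ++k) {
        out[static_cast<std::size_t>(idx[k])] = vals[k];
    }
    return out;
}
```

For a 3-by-3 matrix in row format, ptr has four entries, and an empty row shows up as two equal consecutive pointers.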
Returns a pointer to a tatami::CompressedSparseMatrix containing all values and indices in memory. This differs from a tatami_hdf5::CompressedSparseMatrix, where the loading of data is deferred until requested.
std::shared_ptr<tatami::Matrix<Value_, Index_>> tatami_hdf5::load_dense_matrix(const std::string &file, const std::string &name, bool transpose)
Create a tatami::DenseMatrix from an HDF5 DataSet.
Value_ | Type of the matrix values in the tatami::Matrix interface. |
Index_ | Type of the row/column indices. |
ValueStorage_ | Vector type for storing the matrix values. This may be different from Value_ for more efficient storage. |
file | Path to the HDF5 file. |
name | Name of the dataset inside the file. This should refer to a 2-dimensional dataset of integer or floating-point type. |
transpose | Whether the dataset is transposed in its storage order, i.e., rows in HDF5 are columns in the matrix. This may be true for HDF5 files generated by frameworks that use column-major matrices, where preserving the data layout between memory and disk is more efficient (see, e.g., the rhdf5 Bioconductor package). |
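The effect of the transpose flag can be sketched without HDF5 at all. The struct below is a hypothetical view over a flattened 2-dimensional dataset, assuming the usual row-major flattening of the on-disk extents; it is not a tatami_hdf5 type and only illustrates how matrix coordinates map onto the stored values in each case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view (illustrative only). `dim0` and `dim1` are the dataset
// extents as stored on disk; `flat` holds dim0 * dim1 values in row-major order.
struct DatasetView {
    std::size_t dim0;
    std::size_t dim1;
    std::vector<double> flat;
    bool transpose;

    // With transpose = true, rows on disk are columns in the matrix,
    // so the matrix dimensions are swapped relative to the dataset.
    std::size_t nrow() const { return transpose ? dim1 : dim0; }
    std::size_t ncol() const { return transpose ? dim0 : dim1; }

    // Value at matrix coordinates (r, c).
    double at(std::size_t r, std::size_t c) const {
        assert(r < nrow() && c < ncol());
        return transpose ? flat[c * dim1 + r] : flat[r * dim1 + c];
    }
};
```

With a 2-by-3 dataset, the same stored values are seen as a 2-by-3 matrix when transpose is false and as a 3-by-2 matrix when it is true.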
Returns a pointer to a tatami::DenseMatrix where all values are in memory. This differs from a tatami_hdf5::DenseMatrix, where the loading of data is deferred until requested.
void tatami_hdf5::serialize(Function_ f)
Serialize a function's execution to avoid simultaneous calls to the (non-thread-safe) HDF5 library. This is primarily intended for use inside tatami::parallelize() but can also be called anywhere that uses the same parallelization scheme. Also check out the subpar library, which implements the default parallelization scheme for tatami::parallelize().
The default serialization mechanism is automatically determined from the definition of the SUBPAR_USES_OPENMP_RANGE macro. If defined (i.e., OpenMP is used), f is executed in OpenMP critical regions named "hdf5". Otherwise, a global mutex from <mutex> is used to guard the execution of f.
If a custom parallelization scheme is defined via TATAMI_CUSTOM_PARALLEL or SUBPAR_CUSTOM_PARALLELIZE_RANGE, the default serialization mechanism may not be appropriate. Users should instead define a TATAMI_HDF5_PARALLEL_LOCK function-like macro that accepts f and executes it in a serial section appropriate to the custom scheme. Once defined, this user-defined lock will be used in all calls to serialize().
f | Function to be run in a serial section. This accepts no arguments and returns no outputs. |
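The mutex-based fallback described above can be sketched with the standard library alone. The names example_lock, example_serialize and run_workers are hypothetical stand-ins and do not reflect the library's internals; the sketch only shows the general pattern of guarding a callable with one global mutex so that worker threads never run it at the same time.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical global lock (illustrative only): one mutex shared by all callers.
inline std::mutex& example_lock() {
    static std::mutex lock;
    return lock;
}

// Hypothetical analogue of a serial section: runs `f` while holding the lock.
// `f` accepts no arguments and returns nothing.
template<class Function_>
void example_serialize(Function_ f) {
    std::lock_guard<std::mutex> guard(example_lock());
    f();
}

// Launches `nthreads` workers that each increment a shared counter `niter`
// times inside the serial section, then returns the final count.
inline int run_workers(int nthreads, int niter) {
    assert(nthreads >= 0 && niter >= 0);
    int counter = 0; // deliberately not atomic: protected only by the lock
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&counter, niter]() {
            for (int i = 0; i < niter; ++i) {
                example_serialize([&counter]() { ++counter; });
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Because every increment happens under the lock, no updates are lost. In real use the guarded callable would contain the HDF5 calls rather than a counter update.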
void tatami_hdf5::write_compressed_sparse_matrix(const tatami::Matrix<Value_, Index_> *mat, H5::Group &location, const WriteCompressedSparseMatrixOptions &params)
Write a sparse matrix inside an HDF5 group. On return, location will be populated with three datasets containing the matrix contents in a compressed sparse format. Storage of dimensions and other metadata (e.g., related to column versus row layout) is left to the caller.
Value_ | Type of the matrix values. |
Index_ | Type of the row/column indices. |
mat | Pointer to the (presumably sparse) matrix to be written. If a dense matrix is supplied, only the non-zero elements will be saved. |
location | Handle to an HDF5 group in which to write the matrix contents. |
params | Parameters to use when writing the matrix. |
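What "only the non-zero elements will be saved" amounts to can be sketched in memory, without any HDF5 calls. The struct and function below are hypothetical and are not part of tatami_hdf5. They assume a row-major dense input and a column layout, and produce three arrays analogous to the three datasets mentioned above; the field names are illustrative, not the names written by the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three arrays of a compressed sparse matrix.
struct CompressedTriplet {
    std::vector<double> data;          // non-zero values
    std::vector<int> indices;          // row index of each non-zero value
    std::vector<std::size_t> pointers; // start/end offsets for each column
};

// Converts a row-major dense matrix with `nr` rows and `nc` columns into
// compressed sparse column form, dropping all zeros.
inline CompressedTriplet compress_by_column(
    const std::vector<double>& dense,
    std::size_t nr,
    std::size_t nc)
{
    assert(dense.size() == nr * nc);
    CompressedTriplet out;
    out.pointers.reserve(nc + 1);
    out.pointers.push_back(0);
    for (std::size_t c = 0; c < nc; ++c) {
        for (std::size_t r = 0; r < nr; ++r) {
            const double v = dense[r * nc + c];
            if (v != 0) {
                out.data.push_back(v);
                out.indices.push_back(static_cast<int>(r));
            }
        }
        // Each column ends where the next one starts.
        out.pointers.push_back(out.data.size());
    }
    return out;
}
```

The matrix dimensions are not recoverable from the three arrays alone (trailing empty rows leave no trace), which is consistent with the note that storing dimensions is left to the caller.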
void tatami_hdf5::write_compressed_sparse_matrix(const tatami::Matrix<Value_, Index_> *mat, H5::Group &location)
Write a sparse matrix inside an HDF5 group. On return, location will be populated with three datasets containing the matrix contents in a compressed sparse format. Storage of dimensions and other metadata (e.g., related to column versus row layout) is left to the caller.
Value_ | Type of the matrix values. |
Index_ | Type of the row/column indices. |
mat | Pointer to the (presumably sparse) matrix to be written. |
location | Handle to an HDF5 group in which to write the matrix contents. |