tatami_hdf5
tatami bindings for HDF5-backed matrices
tatami_hdf5 Namespace Reference

Representations for matrix data in HDF5 files.

Classes

class  CompressedSparseMatrix
 Compressed sparse matrix in a HDF5 file.

struct  CompressedSparseMatrixOptions
 Options for HDF5 extraction.

class  DenseMatrix
 Dense matrix backed by a DataSet in a HDF5 file.

struct  DenseMatrixOptions
 Options for DenseMatrix extraction.

struct  WriteCompressedSparseMatrixOptions
 Parameters for write_compressed_sparse_matrix().
 

Enumerations

enum class  WriteStorageLayout { AUTOMATIC , COLUMN , ROW }
 
enum class  WriteStorageType {
  AUTOMATIC , INT8 , UINT8 , INT16 ,
  UINT16 , INT32 , UINT32 , DOUBLE
}
 

Functions

template<typename Value_ , typename Index_ , class ValueStorage_ = std::vector<Value_>, class IndexStorage_ = std::vector<Index_>, class PointerStorage_ = std::vector<size_t>>
std::shared_ptr< tatami::Matrix< Value_, Index_ > > load_compressed_sparse_matrix (size_t nr, size_t nc, const std::string &file, const std::string &vals, const std::string &idx, const std::string &ptr, bool row)
 
template<typename Value_ , typename Index_ , class ValueStorage_ = std::vector<Value_>>
std::shared_ptr< tatami::Matrix< Value_, Index_ > > load_dense_matrix (const std::string &file, const std::string &name, bool transpose)
 
auto & get_default_hdf5_lock ()
 
template<class Function_ >
void serialize (Function_ f)
 
template<typename Value_ , typename Index_ >
void write_compressed_sparse_matrix (const tatami::Matrix< Value_, Index_ > *mat, H5::Group &location, const WriteCompressedSparseMatrixOptions &params)
 
template<typename Value_ , typename Index_ >
void write_compressed_sparse_matrix (const tatami::Matrix< Value_, Index_ > *mat, H5::Group &location)
 

Detailed Description

Representations for matrix data in HDF5 files.

Enumeration Type Documentation

◆ WriteStorageLayout

Layout to use when saving the matrix inside the HDF5 group.

◆ WriteStorageType

enum class tatami_hdf5::WriteStorageType

Numeric type for writing data into a HDF5 dataset.
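
This section does not describe how AUTOMATIC is resolved to a concrete type. The sketch below shows one plausible rule, under the assumption that the narrowest type able to hold the observed range of values is preferred. The enumeration `SketchStorageType` and the helper `choose_storage_type()` are hypothetical names introduced for illustration and are not part of tatami_hdf5.

```cpp
#include <cstdint>
#include <limits>

// Mirrors the enumerator names listed above; redefined so the sketch is standalone.
enum class SketchStorageType { AUTOMATIC, INT8, UINT8, INT16, UINT16, INT32, UINT32, DOUBLE };

// Hypothetical helper (an assumption, not the library's rule): pick the
// narrowest type that can hold every value in [lower, upper].
// 'all_integer' should be false if any value has a fractional part.
inline SketchStorageType choose_storage_type(double lower, double upper, bool all_integer) {
    if (!all_integer) {
        return SketchStorageType::DOUBLE;
    }
    if (lower >= 0) {
        // No negative values, so unsigned types are tried in order of width.
        if (upper <= std::numeric_limits<std::uint8_t>::max()) return SketchStorageType::UINT8;
        if (upper <= std::numeric_limits<std::uint16_t>::max()) return SketchStorageType::UINT16;
        if (upper <= std::numeric_limits<std::uint32_t>::max()) return SketchStorageType::UINT32;
    } else {
        // Negative values present, so only signed types are candidates.
        if (lower >= std::numeric_limits<std::int8_t>::min() &&
            upper <= std::numeric_limits<std::int8_t>::max()) return SketchStorageType::INT8;
        if (lower >= std::numeric_limits<std::int16_t>::min() &&
            upper <= std::numeric_limits<std::int16_t>::max()) return SketchStorageType::INT16;
        if (lower >= std::numeric_limits<std::int32_t>::min() &&
            upper <= std::numeric_limits<std::int32_t>::max()) return SketchStorageType::INT32;
    }
    // Fall back to floating point when no listed integer type is wide enough.
    return SketchStorageType::DOUBLE;
}
```

Under this assumed rule, a matrix of non-negative counts below 256 would be stored as UINT8, while a single negative value would move the choice to a signed type.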

Function Documentation

◆ load_compressed_sparse_matrix()

template<typename Value_ , typename Index_ , class ValueStorage_ = std::vector<Value_>, class IndexStorage_ = std::vector<Index_>, class PointerStorage_ = std::vector<size_t>>
std::shared_ptr< tatami::Matrix< Value_, Index_ > > tatami_hdf5::load_compressed_sparse_matrix ( size_t  nr,
size_t  nc,
const std::string &  file,
const std::string &  vals,
const std::string &  idx,
const std::string &  ptr,
bool  row 
)

Create a tatami::CompressedSparseMatrix from a HDF5 group containing compressed sparse data.

Template Parameters
Value_: Type of the matrix values in the Matrix interface.
Index_: Type of the row/column indices.
ValueStorage_: Vector type for storing the values of the non-zero elements. Elements of this vector may be of a different type than Value_ for more efficient storage.
IndexStorage_: Vector type for storing the indices. Elements of this vector may be of a different type than Index_ for more efficient storage.
PointerStorage_: Vector type for storing the index pointers.
Parameters
nr: Number of rows in the matrix.
nc: Number of columns in the matrix.
file: Path to the file.
vals: Name of the 1D dataset inside file containing the non-zero elements.
idx: Name of the 1D dataset inside file containing the indices of the non-zero elements. If row = true, this should contain column indices sorted within each row, otherwise it should contain row indices sorted within each column.
ptr: Name of the 1D dataset inside file containing the index pointers for the start and end of each row (if row = true) or column (otherwise). This should have length equal to the number of rows (if row = true) or columns (otherwise) plus 1.
row: Whether the matrix is stored on disk in compressed sparse row format. If false, the matrix is assumed to be stored in compressed sparse column format.
Returns
Pointer to a tatami::CompressedSparseMatrix containing all values and indices in memory. This differs from a tatami_hdf5::CompressedSparseMatrix, where the loading of data is deferred until requested.
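
To make the roles of vals, idx, ptr and row concrete, the following sketch expands the three arrays into a dense row-major buffer. It uses plain std::vector objects in place of HDF5 datasets, and `expand_compressed_sparse()` is a hypothetical helper for illustration only; it is not part of tatami_hdf5 and does not reflect how the library loads data.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of tatami_hdf5: expand compressed sparse
// arrays into a dense row-major buffer with nr * nc entries.
inline std::vector<double> expand_compressed_sparse(
    std::size_t nr,
    std::size_t nc,
    const std::vector<double>& vals,      // non-zero values
    const std::vector<int>& idx,          // column indices if row = true, else row indices
    const std::vector<std::size_t>& ptr,  // length nr + 1 if row = true, else nc + 1
    bool row)
{
    std::vector<double> dense(nr * nc, 0.0);
    const std::size_t nprimary = row ? nr : nc;
    for (std::size_t p = 0; p < nprimary; ++p) {
        // The non-zero elements of row/column p occupy positions [ptr[p], ptr[p + 1]).
        for (std::size_t k = ptr[p]; k < ptr[p + 1]; ++k) {
            const std::size_t s = static_cast<std::size_t>(idx[k]);
            const std::size_t r = row ? p : s;
            const std::size_t c = row ? s : p;
            dense[r * nc + c] = vals[k];
        }
    }
    return dense;
}
```

For the 2-by-3 matrix with rows (1, 0, 2) and (0, 3, 0), the row-compressed arrays would be vals = {1, 2, 3}, idx = {0, 2, 1} and ptr = {0, 2, 3}, while the column-compressed arrays would be vals = {1, 3, 2}, idx = {0, 1, 0} and ptr = {0, 1, 2, 3}.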

◆ load_dense_matrix()

template<typename Value_ , typename Index_ , class ValueStorage_ = std::vector<Value_>>
std::shared_ptr< tatami::Matrix< Value_, Index_ > > tatami_hdf5::load_dense_matrix ( const std::string &  file,
const std::string &  name,
bool  transpose 
)

Create a tatami::DenseMatrix from a HDF5 DataSet.

Template Parameters
Value_: Type of the matrix values in the tatami::Matrix interface.
Index_: Type of the row/column indices.
ValueStorage_: Vector type for storing the matrix values. This may be different from Value_ for more efficient storage.
Parameters
file: Path to the HDF5 file.
name: Name of the dataset inside the file. This should refer to a 2-dimensional dataset of integer or floating-point type.
transpose: Whether the dataset is transposed in its storage order, i.e., rows in HDF5 are columns in the matrix. This may be true for HDF5 files generated by frameworks that use column-major matrices, where preserving the data layout between memory and disk is more efficient (see, e.g., the rhdf5 Bioconductor package).
Returns
Pointer to a tatami::DenseMatrix where all values are in memory. This differs from a tatami_hdf5::DenseMatrix, where the loading of data is deferred until requested.
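
The effect of transpose on dimensions and element lookup can be sketched as follows, assuming the dataset has been read into a flat buffer in HDF5's row-major order (the last dimension varies fastest). The helpers `matrix_dims()` and `matrix_value()` are hypothetical names for illustration and are not part of tatami_hdf5.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: (rows, columns) of the matrix for a dataset with
// extents d0 x d1. With transpose = true, dataset rows become matrix columns.
inline std::pair<std::size_t, std::size_t> matrix_dims(std::size_t d0, std::size_t d1, bool transpose) {
    return transpose ? std::make_pair(d1, d0) : std::make_pair(d0, d1);
}

// Hypothetical helper: value at matrix position (r, c), where 'buffer' holds
// the dataset contents in row-major order and d1 is the dataset's second extent.
inline double matrix_value(const std::vector<double>& buffer, std::size_t d1, bool transpose,
                           std::size_t r, std::size_t c) {
    // With transpose = true, matrix element (r, c) is dataset element (c, r).
    return transpose ? buffer[c * d1 + r] : buffer[r * d1 + c];
}
```

For a dataset with extents 2 x 3, transpose = false gives a matrix with 2 rows and 3 columns, and transpose = true gives a matrix with 3 rows and 2 columns over the same bytes on disk.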

◆ serialize()

template<class Function_ >
void tatami_hdf5::serialize ( Function_  f)

Serialize a function's execution to avoid simultaneous calls to the (non-thread-safe) HDF5 library. This is primarily intended for use inside tatami::parallelize() but can also be called anywhere that uses the same parallelization scheme. Also check out the subpar library, which implements the default parallelization scheme for tatami::parallelize().

The default serialization mechanism is automatically determined from the definition of the SUBPAR_USES_OPENMP_RANGE macro. If defined (i.e., OpenMP is used), f is executed in OpenMP critical regions named "hdf5". Otherwise, a global mutex from <mutex> is used to guard the execution of f.

If a custom parallelization scheme is defined via TATAMI_CUSTOM_PARALLEL or SUBPAR_CUSTOM_PARALLELIZE_RANGE, the default serialization mechanism may not be appropriate. Users should instead define a TATAMI_HDF5_PARALLEL_LOCK function-like macro that accepts f and executes it in a serial section appropriate to the custom scheme. Once defined, this user-defined lock will be used in all calls to serialize().

Parameters
f: Function to be run in a serial section. This accepts no arguments and returns no outputs.
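
The mutex-based fallback and the custom-lock override described above can be sketched as follows. This is a minimal illustration rather than the library's implementation: `sketch_default_lock()`, `serialize_sketch()` and the `SKETCH_PARALLEL_LOCK` macro are hypothetical stand-ins for the real lock, serialize() and TATAMI_HDF5_PARALLEL_LOCK, and the OpenMP critical-region branch is omitted.

```cpp
#include <mutex>

// Hypothetical stand-in for the default lock: one mutex shared by all callers.
inline std::mutex& sketch_default_lock() {
    static std::mutex lock;
    return lock;
}

// Hypothetical stand-in for serialize(): run f in a serial section.
template<class Function_>
void serialize_sketch(Function_ f) {
#ifdef SKETCH_PARALLEL_LOCK
    // A user-supplied function-like macro takes over responsibility for
    // running f in a serial section suited to the custom parallel scheme.
    SKETCH_PARALLEL_LOCK(f);
#else
    // Default: hold the global mutex for the duration of f.
    std::lock_guard<std::mutex> guard(sketch_default_lock());
    f();
#endif
}
```

A caller would typically pass a lambda that captures its working variables by reference, e.g., `serialize_sketch([&]() { /* HDF5 calls here */ });`, so that only the HDF5 calls are serialized while the rest of each thread's work proceeds in parallel.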

◆ write_compressed_sparse_matrix() [1/2]

template<typename Value_ , typename Index_ >
void tatami_hdf5::write_compressed_sparse_matrix ( const tatami::Matrix< Value_, Index_ > *  mat,
H5::Group &  location,
const WriteCompressedSparseMatrixOptions &  params 
)

Write a sparse matrix inside a HDF5 group. On return, location will be populated with three datasets containing the matrix contents in a compressed sparse format. Storage of dimensions and other metadata (e.g., related to column versus row layout) is left to the caller.

Template Parameters
Value_: Type of the matrix values.
Index_: Type of the row/column indices.
Parameters
mat: Pointer to the (presumably sparse) matrix to be written. If a dense matrix is supplied, only the non-zero elements will be saved.
location: Handle to a HDF5 group in which to write the matrix contents.
params: Parameters to use when writing the matrix.
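
The statement that only non-zero elements of a dense matrix are saved can be illustrated by compressing a dense buffer into the three arrays of a compressed sparse row layout. The sketch below keeps the arrays in memory rather than writing HDF5 datasets; `SketchCompressedSparse` and `compress_by_row()` are hypothetical names for illustration and are not part of tatami_hdf5.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the three arrays of a compressed sparse layout.
struct SketchCompressedSparse {
    std::vector<double> values;         // non-zero values
    std::vector<int> indices;           // column index of each non-zero value
    std::vector<std::size_t> pointers;  // start/end offsets per row, length nr + 1
};

// Hypothetical helper: compress a dense row-major nr x nc buffer by row,
// keeping only the non-zero values.
inline SketchCompressedSparse compress_by_row(const std::vector<double>& dense,
                                              std::size_t nr, std::size_t nc) {
    SketchCompressedSparse out;
    out.pointers.reserve(nr + 1);
    out.pointers.push_back(0);
    for (std::size_t r = 0; r < nr; ++r) {
        for (std::size_t c = 0; c < nc; ++c) {
            const double v = dense[r * nc + c];
            if (v != 0) {
                out.values.push_back(v);
                out.indices.push_back(static_cast<int>(c));
            }
        }
        // Each row ends where the running count of non-zero values stands.
        out.pointers.push_back(out.values.size());
    }
    return out;
}
```

Note that the arrays alone do not record the matrix dimensions or whether compression was by row or by column, which is why storing that metadata is left to the caller.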

◆ write_compressed_sparse_matrix() [2/2]

template<typename Value_ , typename Index_ >
void tatami_hdf5::write_compressed_sparse_matrix ( const tatami::Matrix< Value_, Index_ > *  mat,
H5::Group &  location 
)

Write a sparse matrix inside a HDF5 group. On return, location will be populated with three datasets containing the matrix contents in a compressed sparse format. Storage of dimensions and other metadata (e.g., related to column versus row layout) is left to the caller.

Template Parameters
Value_: Type of the matrix values.
Index_: Type of the row/column indices.
Parameters
mat: Pointer to the (presumably sparse) matrix to be written.
location: Handle to a HDF5 group in which to write the matrix contents.