tatami_hdf5
tatami bindings for HDF5-backed matrices
tatami for HDF5 matrices


Overview

This repository implements tatami bindings for HDF5-backed matrices, allowing some level of random access without loading the entire dataset into memory. Matrices can be conventionally stored as 2-dimensional HDF5 datasets, or in an ad hoc compressed sparse format with a 1-dimensional dataset for each component (data, indices, pointers).
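To make the compressed sparse layout concrete, here is a minimal sketch of how a single row can be recovered from the three components without touching the rest of the matrix. It uses plain in-memory vectors as stand-ins for the 1-dimensional HDF5 datasets. The names CsrComponents, extract_dense_row and example_matrix are hypothetical and are not part of tatami_hdf5, which reads the corresponding slices from the file instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three 1-dimensional datasets
// (data, indices, pointers) of a compressed sparse row (CSR) matrix.
struct CsrComponents {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> data;       // non-zero values, stored row by row
    std::vector<int> indices;       // column index of each non-zero value
    std::vector<std::size_t> ptrs;  // row r occupies [ptrs[r], ptrs[r + 1]); length nrow + 1
};

// Expand one row into a dense vector, reading only that row's slice
// of 'data' and 'indices'.
std::vector<double> extract_dense_row(const CsrComponents& x, std::size_t r) {
    assert(r < x.nrow);
    assert(x.ptrs.size() == x.nrow + 1);
    std::vector<double> out(x.ncol, 0.0);
    for (std::size_t k = x.ptrs[r]; k < x.ptrs[r + 1]; ++k) {
        out[static_cast<std::size_t>(x.indices[k])] = x.data[k];
    }
    return out;
}

// A small 3 x 4 example matrix:
// [ 1 0 0 2 ]
// [ 0 0 0 0 ]
// [ 0 3 4 0 ]
CsrComponents example_matrix() {
    CsrComponents x;
    x.nrow = 3;
    x.ncol = 4;
    x.data = { 1.0, 2.0, 3.0, 4.0 };
    x.indices = { 0, 3, 1, 2 };
    x.ptrs = { 0, 2, 2, 4 };
    return x;
}
```

Only the pointers and the slice belonging to the requested row are consulted. This is the property that lets a file-backed implementation offer random access along the compressed dimension without loading the whole dataset.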

Quick start

tatami_hdf5 is a header-only library, so it can be used by simply #include-ing the relevant header files:

#include "tatami_hdf5/tatami_hdf5.hpp"

// Dense HDF5 datasets.
tatami_hdf5::DenseMatrix<double, int> dense_mat(
"some_file.h5",
"dataset_name",
/* transposed = */ false
);

// Compressed sparse data stored in an ad hoc group.
tatami_hdf5::CompressedSparseMatrix<double, int> sparse_mat(
nrow,
ncol,
"some_file.h5",
"group_name/data",
"group_name/index",
"group_name/ptrs",
/* csr = */ true
);

In cases where performance is more important than memory consumption, we also provide some utilities to quickly create in-memory tatami matrices from their HDF5 representations:

auto dense_mat_mem = tatami_hdf5::load_dense_matrix<double, int>(
"some_file.h5",
"dataset_name"
);
auto sparse_mat_mem = tatami_hdf5::load_compressed_sparse_matrix<double, int>(
nrow,
ncol,
"some_file.h5",
"group_name/data",
"group_name/index",
"group_name/ptrs",
/* csr = */ true
);

We can also write a tatami sparse matrix into an HDF5 file:

H5::H5File fhandle("some_file2.h5", H5F_ACC_TRUNC);
auto ghandle = fhandle.createGroup("group_name");
// Assumes sparse_mat_mem is the pointer-like object returned by load_compressed_sparse_matrix() above.
tatami_hdf5::WriteCompressedSparseMatrixOptions params; // default options
tatami_hdf5::write_compressed_sparse_matrix(sparse_mat_mem.get(), ghandle, params);

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
tatami_hdf5
GIT_REPOSITORY https://github.com/tatami-inc/tatami_hdf5
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(tatami_hdf5)

Then you can link to tatami_hdf5 to make the headers available during compilation:

# For executables:
target_link_libraries(myexe tatami_hdf5)
# For libraries:
target_link_libraries(mylib INTERFACE tatami_hdf5)

CMake using find_package()

You can install the library by cloning a suitable version of this repository and running the following commands:

mkdir build && cd build
cmake .. -DTATAMI_HDF5_TESTS=OFF
cmake --build . --target install

Then you can use find_package() as usual:

find_package(tatami_tatami_hdf5 CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE tatami::tatami_hdf5)

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DTATAMI_HDF5_FETCH_EXTERN=OFF. See extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simplest approach is to copy the header files, either directly or with Git submodules, and add their path to the include search path during compilation with, e.g., GCC's -I. The external dependencies listed in extern/CMakeLists.txt also need to be made available during compilation. You'll need to link to the HDF5 library yourself (version 1.10 or higher). Specific frameworks may come with their own HDF5 binaries, e.g., Rhdf5lib, h5wasm.