tatami_hdf5
tatami bindings for HDF5-backed matrices
This repository implements tatami bindings for HDF5-backed matrices, allowing random access to rows and columns without loading the entire dataset into memory. Matrices can be stored conventionally as 2-dimensional HDF5 datasets, or in an ad hoc compressed sparse format with a separate 1-dimensional dataset for each component (data, indices, pointers).
tatami_hdf5 is a header-only library, so it can be used by simply #include-ing the relevant header files.
In cases where performance is more important than memory consumption, we also provide some utilities to quickly create in-memory tatami matrices from their HDF5 representations.
We can also write a tatami sparse matrix into an HDF5 file.
Check out the reference documentation for more details.
If you're using CMake with FetchContent, you just need to declare and fetch tatami_hdf5 in your CMakeLists.txt. Then you can link to tatami_hdf5 to make the headers available during compilation.
CMake with find_package()
You can install the library by cloning a suitable version of this repository and running the standard CMake configure, build and install steps. Then you can use find_package() as usual to make tatami_hdf5 available to your project.
By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DTATAMI_HDF5_FETCH_EXTERN=OFF. See extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to copy the files, either directly or with Git submodules, and include their path during compilation with, e.g., GCC's -I. The external dependencies listed in extern/CMakeLists.txt must also be made available during compilation, and you'll need to link to the HDF5 library (version 1.10 or higher) yourself. Some frameworks come with their own HDF5 binaries, e.g., Rhdf5lib, h5wasm.