tatami_stats
Matrix statistics for tatami
Local output buffer for running calculations.
#include <utils.hpp>
Public Member Functions
template<typename Index_> LocalOutputBuffer (size_t thread, Index_ start, Index_ length, Output_ *output, Output_ fill)
template<typename Index_> LocalOutputBuffer (size_t thread, Index_ start, Index_ length, Output_ *output)
LocalOutputBuffer ()=default
Output_ * data ()
const Output_ * data () const
void transfer ()
A typical parallelization scenario divides the set of objective vectors into contiguous blocks and assigns each block to a single thread. In running calculations, however, a thread updates the statistics for its entire block every time it processes an observed vector. If these statistics are stored directly in a shared global buffer, false sharing can occur at the block boundaries: elements belonging to adjacent blocks may share a cache line, so each write by one thread invalidates that line for its neighbor, degrading performance.
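The contiguous-block scheme described above can be sketched as follows. The helper `block_for_thread` is hypothetical and not part of tatami_stats; it assumes the blocks are formed by ceiling division so that every objective vector lands in exactly one block. With `double` statistics and a typical 64-byte cache line, the last elements of one block and the first elements of the next can sit on the same line, which is where false sharing arises.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of tatami_stats): splits `num_vectors`
// objective vectors into contiguous blocks, one per thread, using ceiling
// division so that each vector belongs to exactly one block.
// Returns {start, length} for the given thread.
std::pair<std::size_t, std::size_t> block_for_thread(std::size_t num_vectors, std::size_t num_threads, std::size_t thread) {
    assert(num_threads > 0 && thread < num_threads);
    std::size_t per_thread = (num_vectors + num_threads - 1) / num_threads;
    std::size_t start = std::min(thread * per_thread, num_vectors);
    std::size_t length = std::min(per_thread, num_vectors - start);
    return {start, length};
}
```

For example, 10 objective vectors over 3 threads gives blocks starting at 0, 4 and 8 with lengths 4, 4 and 2; trailing threads may receive an empty block when there are more threads than vectors.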
To avoid this false sharing, the LocalOutputBuffer class provides thread-local storage for the output statistics. Once a thread has finished its calculations, the caller should use transfer() to copy the local statistics into the global buffer. The first thread is the exception: it is allowed to write directly to the global output buffer.
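The storage strategy can be illustrated with the minimal sketch below. The class name `LocalBufferSketch` is hypothetical and this is not the library's implementation; it assumes that threads other than the first keep a `std::vector` initialized with `fill`, that the first thread's slice of the global buffer (starting at `output + start`) is filled and written directly, and it folds the two documented constructors into one with a default `fill = 0`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the thread-local buffering pattern;
// not the tatami_stats implementation.
template<typename Output_>
class LocalBufferSketch {
public:
    // Assumption: one constructor with a default `fill` stands in for the
    // two documented overloads.
    template<typename Index_>
    LocalBufferSketch(std::size_t thread, Index_ start, Index_ length, Output_* output, Output_ fill = 0) :
        my_output(output + start),
        my_use_local(thread > 0),
        my_buffer(my_use_local ? static_cast<std::size_t>(length) : 0, fill)
    {
        assert(output != nullptr);
        if (!my_use_local) {
            // The first thread fills its slice of the global buffer directly.
            std::fill_n(my_output, length, fill);
        }
    }

    // Points at `length` elements; element j belongs to objective vector start + j.
    Output_* data() {
        return my_use_local ? my_buffer.data() : my_output;
    }

    // Copies the local results into the global buffer; a no-op for thread 0
    // in this sketch because it already wrote there.
    void transfer() {
        if (my_use_local) {
            std::copy(my_buffer.begin(), my_buffer.end(), my_output);
        }
    }

private:
    Output_* my_output;
    bool my_use_local;
    std::vector<Output_> my_buffer;
};
```

Because the first thread's pointer aliases the global buffer in this sketch, calling `transfer()` unconditionally for every thread keeps the caller's code uniform without an extra copy for thread 0.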
Template Parameters
Output_ : Type of the result.
template<typename Index_> LocalOutputBuffer (size_t thread, Index_ start, Index_ length, Output_ *output, Output_ fill) [inline]
Template Parameters
Index_ : Type of the start index and length.
Parameters
thread : Identity of the thread, ranging from zero to the total number of threads minus one.
start : Index of the first objective vector in the contiguous block for this thread.
length : Number of objective vectors in the contiguous block for this thread.
[out] output : Pointer to the global output buffer.
fill : Initial value to fill the buffer.
template<typename Index_> LocalOutputBuffer (size_t thread, Index_ start, Index_ length, Output_ *output) [inline]
Overloaded constructor that uses a default of fill = 0.
Template Parameters
Index_ : Type of the start index and length.
Parameters
thread : Identity of the thread, ranging from zero to the total number of threads minus one.
start : Index of the first objective vector in the contiguous block for this thread.
length : Number of objective vectors in the contiguous block for this thread.
[out] output : Pointer to the global output buffer.
LocalOutputBuffer () [default]
Default constructor.
Output_ * data () [inline]
Returns: Pointer to an array of length addressable elements (see the argument of the same name in the constructor).
const Output_ * data () const [inline]
Returns: Const pointer to an array of length addressable elements (see the argument of the same name in the constructor).
void transfer () [inline]
Transfer results from the local buffer to the global buffer (i.e., output in the constructor).
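A complete running calculation might look like the sketch below. It computes column sums of a row-major matrix, treating columns as the objective vectors and rows as the observed vectors, and writes the buffering logic inline with `std::thread` so that the block stands alone. The function name `running_column_sums` and the row-major layout are assumptions for illustration only. Threads other than the first accumulate into a private vector and copy it to the global buffer at the end, which is the step that transfer() performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical example: running column sums of a row-major `nrow` x `ncol`
// matrix, parallelized over contiguous blocks of columns (objective vectors).
// Each thread visits every row (observed vector) and updates its own block.
std::vector<double> running_column_sums(const std::vector<double>& matrix, std::size_t nrow, std::size_t ncol, std::size_t num_threads) {
    assert(matrix.size() == nrow * ncol && num_threads > 0);
    std::vector<double> output(ncol, 0.0);
    std::size_t per_thread = (ncol + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t]() {
            std::size_t start = std::min(t * per_thread, ncol);
            std::size_t length = std::min(per_thread, ncol - start);

            // Thread 0 writes into the global buffer; the others use local storage.
            std::vector<double> local(t > 0 ? length : 0, 0.0);
            double* buffer = (t > 0 ? local.data() : output.data() + start);

            for (std::size_t r = 0; r < nrow; ++r) {
                const double* row = matrix.data() + r * ncol + start;
                for (std::size_t j = 0; j < length; ++j) {
                    buffer[j] += row[j]; // index j maps to column start + j
                }
            }

            // Equivalent of transfer(): copy local results into the global buffer.
            if (t > 0) {
                std::copy(local.begin(), local.end(), output.begin() + static_cast<std::ptrdiff_t>(start));
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return output;
}
```

During the row loop only the first thread writes to `output`, so no two threads repeatedly update neighboring elements of the shared buffer; each remaining thread touches `output` once, when it copies its finished block.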