tatami_stats
Matrix statistics for tatami
tatami_stats::LocalOutputBuffer< Output_ > Class Template Reference

Local output buffer for running calculations.

#include <utils.hpp>

Public Member Functions

template<typename Index_ >
 LocalOutputBuffer (size_t thread, Index_ start, Index_ length, Output_ *output, Output_ fill)
 
template<typename Index_ >
 LocalOutputBuffer (size_t thread, Index_ start, Index_ length, Output_ *output)
 
 LocalOutputBuffer ()=default
 
Output_ * data ()
 
const Output_ * data () const
 
void transfer ()
 

Detailed Description

template<typename Output_>
class tatami_stats::LocalOutputBuffer< Output_ >

Local output buffer for running calculations.

A common parallelization scheme involves dividing the set of objective vectors into contiguous blocks, where each thread operates on a block at a time. However, in running calculations, an entire block's statistics are updated when its corresponding thread processes an observed vector. If these statistics are stored in a global output buffer, false sharing at the boundaries of the blocks can degrade performance.

To mitigate false sharing, we create a separate std::vector in each thread to store its output statistics. The aim is to give the memory allocator an opportunity to store each thread's vector contents at non-contiguous addresses on the heap. (While not guaranteed, well-separated addresses are observed on many compiler/architecture combinations, presumably due to the use of multiple arenas - see https://github.com/tatami-inc/tatami_stats/issues/9 for testing.) Once the calculations are finished, each thread can transfer its statistics to the global buffer.
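The per-thread storage scheme described above can be sketched without the library. The following is a minimal illustration, not tatami_stats code: the function name `running_column_sums`, the row-major layout, and the choice of columns as objective vectors are all assumptions made for the example. Each thread accumulates into its own std::vector and writes to the global buffer only once, at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of tatami_stats): running column sums of a
// row-major matrix. Columns play the role of objective vectors and rows the
// role of observed vectors. Assumes nthreads > 0.
std::vector<double> running_column_sums(const std::vector<double>& matrix,
                                        std::size_t nrow, std::size_t ncol,
                                        std::size_t nthreads) {
    std::vector<double> global(ncol, 0.0);
    std::vector<std::thread> workers;
    std::size_t per_thread = (ncol + nthreads - 1) / nthreads; // ceiling division

    for (std::size_t t = 0; t < nthreads; ++t) {
        // Contiguous block of columns for this thread (possibly empty).
        std::size_t start = std::min(t * per_thread, ncol);
        std::size_t length = std::min(per_thread, ncol - start);

        workers.emplace_back([&matrix, &global, nrow, ncol, start, length]() {
            // Local vector allocated inside the thread, so the allocator has a
            // chance to place it away from the other threads' vectors.
            std::vector<double> local(length, 0.0);

            // Every row updates the whole block's statistics; all of these
            // writes go to the local vector rather than the shared buffer.
            for (std::size_t r = 0; r < nrow; ++r) {
                for (std::size_t c = 0; c < length; ++c) {
                    local[c] += matrix[r * ncol + start + c];
                }
            }

            // Single transfer to this thread's disjoint slice of the global buffer.
            std::copy(local.begin(), local.end(), global.begin() + start);
        });
    }

    for (auto& w : workers) {
        w.join();
    }
    return global;
}
```

Calling `running_column_sums(matrix, nrow, ncol, nthreads)` returns one sum per column. The threads write to disjoint slices of the global buffer, so no locking is needed; only the final copy touches memory near a block boundary.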

The LocalOutputBuffer is just a wrapper around a std::vector with some special behavior for the first thread. Specifically, the first thread is allowed to directly write to the global buffer. This avoids any extra allocation in the serial case where there is no need to protect against false sharing.
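To make the first-thread behavior concrete, here is a minimal sketch of a wrapper with the same documented interface (constructor, data(), transfer()). The class name `SketchLocalBuffer`, its member names, and the `demo` function are hypothetical; this is an illustration of the described behavior under those assumptions, not the tatami_stats implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for illustration only; not the library's code.
template<typename Output_>
class SketchLocalBuffer {
public:
    SketchLocalBuffer(std::size_t thread, std::size_t start, std::size_t length,
                      Output_* output, Output_ fill = 0) :
        my_output(output + start),
        my_use_local(thread > 0),
        my_buffer(my_use_local ? length : 0, fill) // no allocation for thread 0
    {
        if (!my_use_local) {
            // Thread 0 writes straight into the global buffer, so fill it in place.
            std::fill_n(my_output, length, fill);
        }
    }

    // Pointer to the storage that this thread should write to.
    Output_* data() {
        return my_use_local ? my_buffer.data() : my_output;
    }

    // Copy local results into the global buffer; nothing to do for thread 0.
    void transfer() {
        if (my_use_local) {
            std::copy(my_buffer.begin(), my_buffer.end(), my_output);
        }
    }

private:
    Output_* my_output;             // start of this thread's slice of the global buffer
    bool my_use_local;              // false only for thread 0
    std::vector<Output_> my_buffer; // empty for thread 0
};

// Serial walk-through with two "threads" sharing a global buffer of 4 elements.
std::vector<int> demo() {
    std::vector<int> global(4, -1);
    SketchLocalBuffer<int> first(0, 0, 2, global.data(), 0);
    SketchLocalBuffer<int> second(1, 2, 2, global.data(), 0);

    first.data()[0] = 10;  // lands directly in global[0]
    first.data()[1] = 11;  // lands directly in global[1]
    second.data()[0] = 20; // held locally until transfer()
    second.data()[1] = 21;

    first.transfer();  // no-op
    second.transfer(); // copies into global[2] and global[3]
    return global;
}
```

In this sketch the thread-0 object never allocates, which matches the stated goal of avoiding extra work in the serial case. Any other thread pays for one allocation and one copy in exchange for keeping its intermediate writes out of the shared buffer.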

Template Parameters
Output_  Type of the result.

Constructor & Destructor Documentation

◆ LocalOutputBuffer() [1/3]

template<typename Output_ >
template<typename Index_ >
tatami_stats::LocalOutputBuffer< Output_ >::LocalOutputBuffer ( size_t  thread,
Index_  start,
Index_  length,
Output_ *  output,
Output_  fill 
)
inline
Template Parameters
Index_  Type of the start index and length.
Parameters
thread  Identity of the thread, ranging from zero up to (but not including) the total number of threads.
start  Index of the first objective vector in the contiguous block for this thread.
length  Number of objective vectors in the contiguous block for this thread.
[out] output  Pointer to the global output buffer.
fill  Initial value to fill the buffer.

◆ LocalOutputBuffer() [2/3]

template<typename Output_ >
template<typename Index_ >
tatami_stats::LocalOutputBuffer< Output_ >::LocalOutputBuffer ( size_t  thread,
Index_  start,
Index_  length,
Output_ *  output 
)
inline

Overloaded constructor that uses a default fill of 0.

Template Parameters
Index_  Type of the start index and length.
Parameters
thread  Identity of the thread, ranging from zero up to (but not including) the total number of threads.
start  Index of the first objective vector in the contiguous block for this thread.
length  Number of objective vectors in the contiguous block for this thread.
[out] output  Pointer to the global output buffer.

◆ LocalOutputBuffer() [3/3]

template<typename Output_ >
tatami_stats::LocalOutputBuffer< Output_ >::LocalOutputBuffer ( )
default

Default constructor.

Member Function Documentation

◆ data() [1/2]

template<typename Output_ >
Output_ * tatami_stats::LocalOutputBuffer< Output_ >::data ( )
inline
Returns
Pointer to an output buffer to use for this thread. This contains at least length addressable elements (see the argument of the same name in the constructor). For thread = 0, this will be equal to output + start.

◆ data() [2/2]

template<typename Output_ >
const Output_ * tatami_stats::LocalOutputBuffer< Output_ >::data ( ) const
inline
Returns
Const pointer to an output buffer to use for this thread. This contains at least length addressable elements (see the argument of the same name in the constructor). For thread = 0, this will be equal to output + start.

◆ transfer()

template<typename Output_ >
void tatami_stats::LocalOutputBuffer< Output_ >::transfer ( )
inline

Transfer results from the local buffer to the global buffer (i.e., output in the constructor). For thread = 0, this will be a no-op.

