Detailed Description

OpenCL-based reduction derived from the reduction sample of the NVIDIA OpenCL SDK.

For small arrays (n <= 128) the GPU code is not used, because the block size is fixed in the kernel. Setting the block size dynamically would require recompiling the kernel, which is unlikely to be efficient. The sum is therefore computed on the CPU in these cases.
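The size check described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the names kCpuThreshold, useCpuPath and sumOnCpu are hypothetical, and only the limit of 128 elements is taken from the description.

```cpp
// Hypothetical constant: arrays with n <= 128 are summed on the CPU.
constexpr int kCpuThreshold = 128;

// Hypothetical helper: true if the array is small enough for the CPU path.
inline bool useCpuPath(int size)
{
    return size <= kCpuThreshold;
}

// Hypothetical helper: plain CPU sum, accumulated in double to limit
// rounding error before converting back to float.
inline float sumOnCpu(const float* data, int size)
{
    double acc = 0.0;
    for (int i = 0; i < size; ++i)
        acc += data[i];
    return static_cast<float>(acc);
}
```

A caller would test useCpuPath(size) first and launch the GPU kernels only when it returns false.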

Public Member Functions
	ClReduction (ClEnvironment *env=nullptr)
	Create OpenCL environment, or use the existing context and command queue if env is provided.
	~ClReduction ()
float	reduce (float *data, int size)
float	reduce (const ClImage &data, int size, ClImage *outputBuffer=nullptr)
	Reduce contents of data.
Eigen::VectorXd	reduce (const ClImage &data)

Protected Attributes
ClEnvironment *	m_env
ClProgram *	m_program
ClKernel *	m_kernelReduction
ClKernel *	m_kernelFinal
ClImage *	m_dataOut
float *	m_dataHost
bool	m_ownContext
int	m_maxBlocks
int	m_maxThreadsReduce
int	m_maxThreadsFinal
unsigned int	m_dataType

Member Function Documentation

◆ reduce()

float ImFusion::ClReduction::reduce	(	const ClImage &	data,
		int	size,
		ClImage *	outputBuffer = nullptr )

Reduce contents of data.

data must be a 1D buffer of floats. If outputBuffer is provided, it must be a 1D buffer of 64 floats. In this case the return value is always zero, and the result of the reduction is obtained by summing over the 64 entries of outputBuffer. If outputBuffer is not provided, the buffer is downloaded and reduced internally.
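When outputBuffer is used, the caller has to perform the final summation. The sketch below shows only that last step, assuming the 64 floats have already been copied to host memory; the download itself depends on the ClImage API and is not shown. The names kNumPartialSums and finalizeReduction are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Number of entries in outputBuffer, as stated in the description above.
constexpr std::size_t kNumPartialSums = 64;

// Hypothetical helper: sums the 64 entries of a host-side copy of
// outputBuffer to obtain the result of the reduction.
inline float finalizeReduction(const std::array<float, kNumPartialSums>& partialSums)
{
    // Accumulate in double, then convert back to float.
    const double total = std::accumulate(partialSums.begin(), partialSums.end(), 0.0);
    return static_cast<float>(total);
}
```

Since reduce() returns zero in this mode, its return value must not be used as the result.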

The documentation for this class was generated from the following file:

ImFusion/CL/ClReduction.h
