#include "config.h"
#include <cassert>
#include "pme-gpu-constants.h"
This file defines small PME GPU inline host/device functions. Note that OpenCL device-side functions can't use C++ features, so they are located in a similar file pme-gpu-utils.clh. Be sure to keep the logic in sync in both files when changing it!
- Author
- Aleksei Iupinov <a.yupinov@gmail.com>
#define INLINE_EVERYWHERE __host__ __device__ __forceinline__
    A macro for inline GPU functions.
template<int order, int atomsPerWarp>
int __host__ __device__ __forceinline__ getSplineParamIndexBase(int warpIndex, int atomWarpIndex)
    Gets the base of the unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The base corresponds only to the atom index within the execution block. Feed the result into getSplineParamIndex() to get a full index. TODO: it is likely that both parameters can be replaced with a single atom index, as they are derived from it. Do that, verifying that the generated code is not bloated, and/or revise the spline indexing scheme. Removing the warp dependency would also be nice (and would probably coincide with removing c_pmeSpreadGatherAtomsPerWarp). More...

template<int order, int atomsPerWarp>
int __host__ __device__ __forceinline__ getSplineParamIndex(int paramIndexBase, int dimIndex, int splineIndex)
    Gets a unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The index is relative to the execution block, in the range [0, atomsPerBlock * order * DIM). This function consumes the result of getSplineParamIndexBase() and adjusts it for dimIndex and splineIndex. More...

int __device__ __forceinline__ pme_gpu_check_atom_data_index(const int atomDataIndex, const int nAtomData)
    An inline CUDA function for checking the global atom data indices against the atom data array sizes. More...

int __device__ __forceinline__ pme_gpu_check_atom_charge(const float coefficient)
    An inline CUDA function for skipping the zero-charge atoms. More...
template<int order, int atomsPerWarp>
int __host__ __device__ __forceinline__ getSplineParamIndex(int paramIndexBase, int dimIndex, int splineIndex)
Gets a unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The index is relative to the execution block, in the range [0, atomsPerBlock * order * DIM). This function consumes the result of getSplineParamIndexBase() and adjusts it for dimIndex and splineIndex.
- Template Parameters
-
order | PME order |
atomsPerWarp | Number of atoms processed by a warp |
- Parameters
-
[in] | paramIndexBase | Must be the result of getSplineParamIndexBase(). |
[in] | dimIndex | Dimension index (from 0 to 2). |
[in] | splineIndex | Spline contribution index (from 0 to order - 1). |
- Returns
- Index into the theta or dtheta array using the GPU layout.
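The documented range suggests the spline parameters for one atom are interleaved by atomsPerWarp. The following host-side C++ sketch illustrates one such layout; the exact (splineIndex * DIM + dimIndex) interleaving is an assumption for illustration, not the authoritative implementation.

```cpp
#include <cassert>

constexpr int DIM = 3; // x, y, z dimensions

// Sketch of an index consistent with the documented range
// [0, atomsPerBlock * order * DIM): offset the per-atom base by the
// (splineIndex, dimIndex) pair, strided by atomsPerWarp (assumed layout).
template<int order, int atomsPerWarp>
int getSplineParamIndex(int paramIndexBase, int dimIndex, int splineIndex)
{
    assert((dimIndex >= 0) && (dimIndex < DIM));
    assert((splineIndex >= 0) && (splineIndex < order));
    return paramIndexBase + (splineIndex * DIM + dimIndex) * atomsPerWarp;
}
```

With order 4 and 2 atoms per warp, successive (splineIndex, dimIndex) pairs for the same atom are thus 2 elements apart, which keeps same-slot accesses of neighboring atoms adjacent in memory.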
template<int order, int atomsPerWarp>
int __host__ __device__ __forceinline__ getSplineParamIndexBase(int warpIndex, int atomWarpIndex)
Gets the base of the unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The base corresponds only to the atom index within the execution block. Feed the result into getSplineParamIndex() to get a full index. TODO: it is likely that both parameters can be replaced with a single atom index, as they are derived from it. Do that, verifying that the generated code is not bloated, and/or revise the spline indexing scheme. Removing the warp dependency would also be nice (and would probably coincide with removing c_pmeSpreadGatherAtomsPerWarp).
- Template Parameters
-
order | PME order |
atomsPerWarp | Number of atoms processed by a warp |
- Parameters
-
[in] | warpIndex | Warp index wrt the block. |
[in] | atomWarpIndex | Atom index wrt the warp (from 0 to atomsPerWarp - 1). |
- Returns
- Index into the theta or dtheta array using the GPU layout.
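Since the base depends only on which atom of which warp is being processed, it can be sketched as a per-warp slab offset plus the atom's slot within the warp. This host-side C++ sketch assumes each warp owns a contiguous slab of order * DIM * atomsPerWarp parameters, with the within-warp atom index as the fastest-varying dimension; the slab layout is an assumption consistent with the description above, not the authoritative code.

```cpp
#include <cassert>

constexpr int DIM = 3; // x, y, z dimensions

// Sketch: per-warp slab offset plus the atom's slot inside the warp
// (assumed layout). getSplineParamIndex() then adds the per-dimension
// and per-spline-contribution offsets on top of this base.
template<int order, int atomsPerWarp>
int getSplineParamIndexBase(int warpIndex, int atomWarpIndex)
{
    assert((atomWarpIndex >= 0) && (atomWarpIndex < atomsPerWarp));
    return warpIndex * (order * DIM * atomsPerWarp) + atomWarpIndex;
}
```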
int __device__ __forceinline__ pme_gpu_check_atom_charge(const float coefficient)
An inline CUDA function for skipping the zero-charge atoms.
- Parameters
-
[in] | coefficient | The atom charge. |
- Returns
- Non-0 if the atom should be processed, 0 otherwise.

This is called from the spline_and_spread and gather PME kernels.
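Logically this predicate is just a comparison of the charge coefficient against zero, so that kernels can skip atoms that contribute nothing. A minimal host-side sketch, with the CUDA execution-space qualifiers dropped:

```cpp
// Host-side sketch of the zero-charge predicate (CUDA qualifiers dropped).
// Returns non-0 if the atom should be processed, 0 otherwise.
int pme_gpu_check_atom_charge(const float coefficient)
{
    return (coefficient != 0.0f) ? 1 : 0;
}
```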
int __device__ __forceinline__ pme_gpu_check_atom_data_index(const int atomDataIndex, const int nAtomData)
An inline CUDA function for checking the global atom data indices against the atom data array sizes.
- Parameters
-
[in] | atomDataIndex | The atom data index. |
[in] | nAtomData | The atom data array element count. |
- Returns
- Non-0 if the index is within bounds (or if PME data padding is enabled), 0 otherwise.

This is called from the spline_and_spread and gather PME kernels. The goal is to isolate the global range checks, and to allow avoiding them when c_usePadding is enabled.
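The described behavior can be sketched as follows: when padding is enabled, out-of-range threads are assumed to touch only safely padded memory, so the bounds check is short-circuited. Here c_usePadding is a hypothetical stand-in for the real compile-time constant, and the CUDA qualifiers are dropped for a host-side sketch.

```cpp
// Assumption for illustration: stand-in for the real compile-time
// padding flag mentioned in the description above.
constexpr bool c_usePadding = false;

// Host-side sketch (CUDA qualifiers dropped). With padding enabled the
// range check is skipped entirely; otherwise the index must be in bounds.
// Returns non-0 if the index may be used, 0 otherwise.
int pme_gpu_check_atom_data_index(const int atomDataIndex, const int nAtomData)
{
    return c_usePadding ? 1 : ((atomDataIndex < nAtomData) ? 1 : 0);
}
```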