Gromacs  2020.4
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
Macros | Functions
pme_gpu_utils.h File Reference
#include "config.h"
#include <cassert>
#include "pme_gpu_constants.h"
+ Include dependency graph for pme_gpu_utils.h:
+ This graph shows which files directly or indirectly include this file:

Description

This file defines small PME GPU inline host/device functions. Note that OpenCL device-side functions can't use C++ features, so they are located in a similar file pme_gpu_utils.clh. Be sure to keep the logic in sync in both files when changing it!

Author
Aleksei Iupinov a.yup.nosp@m.inov.nosp@m.@gmai.nosp@m.l.co.nosp@m.m

Macros

#define INLINE_EVERYWHERE   __host__ __device__ __forceinline__
 A macro for inline GPU functions.
 

Functions

template<int order, int atomsPerWarp>
int __host__ __device__
__forceinline__ 
getSplineParamIndexBase (int warpIndex, int atomWarpIndex)
 Gets a base of the unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The base only corresponds to the atom index within the execution block. Feed the result into getSplineParamIndex() to get a full index. TODO: it's likely that both parameters can be just replaced with a single atom index, as they are derived from it. Do that, verifying that the generated code is not bloated, and/or revise the spline indexing scheme. Removing warp dependency would also be nice (and would probably coincide with removing c_pmeSpreadGatherAtomsPerWarp). More...
 
template<int order, int atomsPerWarp>
int __host__ __device__
__forceinline__ 
getSplineParamIndex (int paramIndexBase, int dimIndex, int splineIndex)
 Gets a unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The index is wrt to the execution block, in range(0, atomsPerBlock * order * DIM). This function consumes result of getSplineParamIndexBase() and adjusts it for dimIndex and splineIndex. More...
 
int __device__ __forceinline__ pme_gpu_check_atom_data_index (const int atomDataIndex, const int nAtomData)
 An inline CUDA function for checking the global atom data indices against the atom data array sizes. More...
 
int __device__ __forceinline__ pme_gpu_check_atom_charge (const float coefficient)
 An inline CUDA function for skipping the zero-charge atoms. More...
 

Function Documentation

template<int order, int atomsPerWarp>
int __host__ __device__ __forceinline__ getSplineParamIndex ( int  paramIndexBase,
int  dimIndex,
int  splineIndex 
)

Gets a unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The index is wrt to the execution block, in range(0, atomsPerBlock * order * DIM). This function consumes result of getSplineParamIndexBase() and adjusts it for dimIndex and splineIndex.

Template Parameters
orderPME order
atomsPerWarpNumber of atoms processed by a warp
Parameters
[in]paramIndexBaseMust be result of getSplineParamIndexBase().
[in]dimIndexDimension index (from 0 to 2)
[in]splineIndexSpline contribution index (from 0 to order - 1)
Returns
Index into theta or dtheta array using GPU layout.
template<int order, int atomsPerWarp>
int __host__ __device__ __forceinline__ getSplineParamIndexBase ( int  warpIndex,
int  atomWarpIndex 
)

Gets a base of the unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The base only corresponds to the atom index within the execution block. Feed the result into getSplineParamIndex() to get a full index. TODO: it's likely that both parameters can be just replaced with a single atom index, as they are derived from it. Do that, verifying that the generated code is not bloated, and/or revise the spline indexing scheme. Removing warp dependency would also be nice (and would probably coincide with removing c_pmeSpreadGatherAtomsPerWarp).

Template Parameters
orderPME order
atomsPerWarpNumber of atoms processed by a warp
Parameters
[in]warpIndexWarp index wrt the block.
[in]atomWarpIndexAtom index wrt the warp (from 0 to atomsPerWarp - 1).
Returns
Index into theta or dtheta array using GPU layout.
int __device__ __forceinline__ pme_gpu_check_atom_charge ( const float  coefficient)

An inline CUDA function for skipping the zero-charge atoms.

Returns
Non-0 if atom should be processed, 0 otherwise.
Parameters
[in]coefficientThe atom charge.

This is called from the spline_and_spread and gather PME kernels.

int __device__ __forceinline__ pme_gpu_check_atom_data_index ( const int  atomDataIndex,
const int  nAtomData 
)

An inline CUDA function for checking the global atom data indices against the atom data array sizes.

Parameters
[in]atomDataIndexThe atom data index.
[in]nAtomDataThe atom data array element count.
Returns
Non-0 if index is within bounds (or PME data padding is enabled), 0 otherwise.

This is called from the spline_and_spread and gather PME kernels. The goal is to isolate the global range checks, and allow avoiding them with c_usePadding enabled.