#include "config.h"
#include <cassert>
#include "pme-gpu-constants.h"
This file defines small PME GPU inline host/device functions. Note that OpenCL device-side functions can't use C++ features, so they are located in a similar file pme-gpu-utils.clh. Be sure to keep the logic in sync in both files when changing it!
- Author
- Aleksei Iupinov <a.yupinov@gmail.com>
#define INLINE_EVERYWHERE __host__ __device__ __forceinline__
    A macro for inline GPU functions.
template<int order, int atomsPerWarp>
int __host__ __device__ __forceinline__ getSplineParamIndexBase(int warpIndex, int atomWarpIndex)
    Gets the base of the unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The base corresponds only to the atom index within the execution block. Feed the result into getSplineParamIndex() to get a full index. TODO: it is likely that both parameters can be replaced with a single atom index, as they are derived from it. Do that, verifying that the generated code is not bloated, and/or revise the spline indexing scheme. Removing the warp dependency would also be nice (and would probably coincide with removing c_pmeSpreadGatherAtomsPerWarp). More...

template<int order, int atomsPerWarp>
int __host__ __device__ __forceinline__ getSplineParamIndex(int paramIndexBase, int dimIndex, int splineIndex)
    Gets a unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The index is relative to the execution block, in the range [0, atomsPerBlock * order * DIM). This function consumes the result of getSplineParamIndexBase() and adjusts it for dimIndex and splineIndex. More...

int __device__ __forceinline__ pme_gpu_check_atom_data_index(const int atomDataIndex, const int nAtomData)
    An inline CUDA function for checking the global atom data indices against the atom data array sizes. More...

int __device__ __forceinline__ pme_gpu_check_atom_charge(const float coefficient)
    An inline CUDA function for skipping the zero-charge atoms. More...
template<int order, int atomsPerWarp>
int __host__ __device__ __forceinline__ getSplineParamIndex(int paramIndexBase, int dimIndex, int splineIndex)
Gets a unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The index is relative to the execution block, in the range [0, atomsPerBlock * order * DIM). This function consumes the result of getSplineParamIndexBase() and adjusts it for dimIndex and splineIndex.
- Template Parameters
-
order | PME order |
atomsPerWarp | Number of atoms processed by a warp |
- Parameters
-
[in] | paramIndexBase | Must be the result of getSplineParamIndexBase(). |
[in] | dimIndex | Dimension index (from 0 to 2). |
[in] | splineIndex | Spline contribution index (from 0 to order - 1). |
- Returns
- Index into the theta or dtheta array using the GPU layout.
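The documented range suggests the spline parameters for one atom are interleaved by atomsPerWarp. The following host-side C++ sketch illustrates one such layout; the exact (splineIndex * DIM + dimIndex) interleaving is an assumption for illustration, not the authoritative implementation.

```cpp
#include <cassert>

constexpr int DIM = 3; // x, y, z dimensions

// Sketch of an index consistent with the documented range
// [0, atomsPerBlock * order * DIM): offset the per-atom base by the
// (splineIndex, dimIndex) pair, strided by atomsPerWarp (assumed layout).
template<int order, int atomsPerWarp>
int getSplineParamIndex(int paramIndexBase, int dimIndex, int splineIndex)
{
    assert((dimIndex >= 0) && (dimIndex < DIM));
    assert((splineIndex >= 0) && (splineIndex < order));
    return paramIndexBase + (splineIndex * DIM + dimIndex) * atomsPerWarp;
}
```

With order 4 and 2 atoms per warp, successive (splineIndex, dimIndex) pairs for the same atom are thus 2 elements apart, which keeps same-slot accesses of neighboring atoms adjacent in memory.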
template<int order, int atomsPerWarp>
int __host__ __device__ __forceinline__ getSplineParamIndexBase(int warpIndex, int atomWarpIndex)
Gets the base of the unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The base corresponds only to the atom index within the execution block. Feed the result into getSplineParamIndex() to get a full index. TODO: it is likely that both parameters can be replaced with a single atom index, as they are derived from it. Do that, verifying that the generated code is not bloated, and/or revise the spline indexing scheme. Removing the warp dependency would also be nice (and would probably coincide with removing c_pmeSpreadGatherAtomsPerWarp).
- Template Parameters
-
order | PME order |
atomsPerWarp | Number of atoms processed by a warp |
- Parameters
-
[in] | warpIndex | Warp index wrt the block. |
[in] | atomWarpIndex | Atom index wrt the warp (from 0 to atomsPerWarp - 1). |
- Returns
- Index into the theta or dtheta array using the GPU layout.
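Since the base depends only on which atom of which warp is being processed, it can be sketched as a per-warp slab offset plus the atom's slot within the warp. This host-side C++ sketch assumes each warp owns a contiguous slab of order * DIM * atomsPerWarp parameters, with the within-warp atom index as the fastest-varying dimension; the slab layout is an assumption consistent with the description above, not the authoritative code.

```cpp
#include <cassert>

constexpr int DIM = 3; // x, y, z dimensions

// Sketch: per-warp slab offset plus the atom's slot inside the warp
// (assumed layout). getSplineParamIndex() then adds the per-dimension
// and per-spline-contribution offsets on top of this base.
template<int order, int atomsPerWarp>
int getSplineParamIndexBase(int warpIndex, int atomWarpIndex)
{
    assert((atomWarpIndex >= 0) && (atomWarpIndex < atomsPerWarp));
    return warpIndex * (order * DIM * atomsPerWarp) + atomWarpIndex;
}
```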
int __device__ __forceinline__ pme_gpu_check_atom_charge(const float coefficient)
An inline CUDA function for skipping the zero-charge atoms.
- Parameters
-
[in] | coefficient | The atom charge. |
- Returns
- Non-0 if the atom should be processed, 0 otherwise.

This is called from the spline_and_spread and gather PME kernels.
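Logically this predicate is just a comparison of the charge coefficient against zero, so that kernels can skip atoms that contribute nothing. A minimal host-side sketch, with the CUDA execution-space qualifiers dropped:

```cpp
// Host-side sketch of the zero-charge predicate (CUDA qualifiers dropped).
// Returns non-0 if the atom should be processed, 0 otherwise.
int pme_gpu_check_atom_charge(const float coefficient)
{
    return (coefficient != 0.0f) ? 1 : 0;
}
```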
int __device__ __forceinline__ pme_gpu_check_atom_data_index(const int atomDataIndex, const int nAtomData)
An inline CUDA function for checking the global atom data indices against the atom data array sizes.
- Parameters
-
[in] | atomDataIndex | The atom data index. |
[in] | nAtomData | The atom data array element count. |
- Returns
- Non-0 if the index is within bounds (or if PME data padding is enabled), 0 otherwise.

This is called from the spline_and_spread and gather PME kernels. The goal is to isolate the global range checks, and to allow avoiding them when c_usePadding is enabled.
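The described behavior can be sketched as follows: when padding is enabled, out-of-range threads are assumed to touch only safely padded memory, so the bounds check is short-circuited. Here c_usePadding is a hypothetical stand-in for the real compile-time constant, and the CUDA qualifiers are dropped for a host-side sketch.

```cpp
// Assumption for illustration: stand-in for the real compile-time
// padding flag mentioned in the description above.
constexpr bool c_usePadding = false;

// Host-side sketch (CUDA qualifiers dropped). With padding enabled the
// range check is skipped entirely; otherwise the index must be in bounds.
// Returns non-0 if the index may be used, 0 otherwise.
int pme_gpu_check_atom_data_index(const int atomDataIndex, const int nAtomData)
{
    return c_usePadding ? 1 : ((atomDataIndex < nAtomData) ? 1 : 0);
}
```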