Gromacs  2026.0-dev-20251119-5f0a571d
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
Functions | Variables
pme_gpu_calculate_splines_hip.h File Reference
#include "gmxpre.h"
#include <cassert>
#include "gromacs/gpu_utils/hip_kernel_utils.h"
#include "gromacs/gpu_utils/vectype_ops_hip.h"
#include "pme_gpu_constants.h"
#include "pme_gpu_types.h"
#include "pme_grid.h"
+ Include dependency graph for pme_gpu_calculate_splines_hip.h:
+ This graph shows which files directly or indirectly include this file:

Description

Implements helper routines for PME gather and spline routines.

Author
Paul Bauer pul.b.nosp@m.auer.nosp@m..q@gm.nosp@m.ail..nosp@m.com
Julio Maia julio.nosp@m..mai.nosp@m.a@amd.nosp@m..com
Aleksei Iupinov a.yup.nosp@m.inov.nosp@m.@gmai.nosp@m.l.co.nosp@m.m

Functions

template<int order, int atomsPerWarp>
static int __device__
__forceinline__ 
getSplineParamIndexBase (int warpIndex, int atomWarpIndex)
 Gets a base of the unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The base only corresponds to the atom index within the execution block. Feed the result into getSplineParamIndex() to get a full index. TODO: it's likely that both parameters can be just replaced with a single atom index, as they are derived from it. Do that, verifying that the generated code is not bloated, and/or revise the spline indexing scheme. Removing warp dependency would also be nice (and would probably coincide with removing c_pmeSpreadGatherAtomsPerWarp). More...
 
template<int order, int atomsPerWarp>
static int __device__
__forceinline__ 
getSplineParamIndex (int paramIndexBase, int dimIndex, int splineIndex)
 Gets a unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The index is wrt to the execution block, in range(0, atomsPerBlock * order * DIM). This function consumes result of getSplineParamIndexBase() and adjusts it for dimIndex and splineIndex. More...
 
static bool __device__
__forceinline__ 
pme_gpu_check_atom_charge (const float coefficient)
 An inline HIP function for skipping the zero-charge atoms. More...
 
template<typename T >
static __device__ void assertIsFinite (T arg)
 Asserts if the argument is finite. More...
 
template<>
__device__ void assertIsFinite (float3 gmx_unused arg)
 
template<typename T >
static __device__ void assertIsFinite (T gmx_unused arg)
 
template<typename T , int atomsPerBlock, int dataCountPerAtom>
static __device__
__forceinline__ void 
pme_gpu_stage_atom_data (T *__restrict__ sm_destination, const T *__restrict__ gm_source)
 General purpose function for loading atom-related data from global to shared memory. More...
 
template<int order, int atomsPerBlock, int atomsPerWarp, bool writeSmDtheta, bool writeGlobal, int numGrids, int parallelExecutionWidth>
static __device__
__forceinline__ void 
calculate_splines (const PmeGpuKernelParams kernelParams, const int atomIndexOffset, const float3 atomX, const float atomCharge, float *__restrict__ sm_theta, float *__restrict__ sm_dtheta, int *__restrict__ sm_gridlineIndices)
 PME GPU spline parameter and gridline indices calculation. This corresponds to the CPU functions calc_interpolation_idx() and make_bsplines(). First stage of the whole kernel. More...
 

Variables

static constexpr bool c_useAtomDataPrefetch = false
 Controls if the atom and charge data is prefeched into shared memory or loaded per thread from global.
 

Function Documentation

template<typename T >
static __device__ void assertIsFinite ( arg)
inlinestatic

Asserts if the argument is finite.

The function works for any data type, that can be casted to float. Note that there is also a specialized implementation for float3 data type.

Parameters
[in]argArgument to check.
template<int order, int atomsPerBlock, int atomsPerWarp, bool writeSmDtheta, bool writeGlobal, int numGrids, int parallelExecutionWidth>
static __device__ __forceinline__ void calculate_splines ( const PmeGpuKernelParams  kernelParams,
const int  atomIndexOffset,
const float3  atomX,
const float  atomCharge,
float *__restrict__  sm_theta,
float *__restrict__  sm_dtheta,
int *__restrict__  sm_gridlineIndices 
)
static

PME GPU spline parameter and gridline indices calculation. This corresponds to the CPU functions calc_interpolation_idx() and make_bsplines(). First stage of the whole kernel.

Template Parameters
orderPME interpolation order.
atomsPerBlockNumber of atoms processed by a block - should be accounted for in the sizes of the shared memory arrays.
atomsPerWarpNumber of atoms processed by a warp
writeSmDthetaBool controlling if the theta derivative should be written to shared memory. Enables calculation of dtheta if set.
writeGlobalA boolean which tells if the theta values and gridlines should be written to global memory. Enables calculation of dtheta if set.
numGridsThe number of grids using the splines.
Parameters
[in]kernelParamsInput PME HIP data in constant memory.
[in]atomIndexOffsetStarting atom index for the execution block w.r.t. global memory.
[in]atomXAtom coordinate of atom processed by thread.
[in]atomChargeAtom charge/coefficient of atom processed by thread.
[out]sm_thetaAtom spline values in the shared memory.
[out]sm_dthetaDerivative of atom spline values in shared memory.
[out]sm_gridlineIndicesAtom gridline indices in the shared memory.
template<int order, int atomsPerWarp>
static int __device__ __forceinline__ getSplineParamIndex ( int  paramIndexBase,
int  dimIndex,
int  splineIndex 
)
static

Gets a unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The index is wrt to the execution block, in range(0, atomsPerBlock * order * DIM). This function consumes result of getSplineParamIndexBase() and adjusts it for dimIndex and splineIndex.

Template Parameters
orderPME order
atomsPerWarpNumber of atoms processed by a warp
Parameters
[in]paramIndexBaseMust be result of getSplineParamIndexBase().
[in]dimIndexDimension index (from 0 to 2)
[in]splineIndexSpline contribution index (from 0 to order - 1)
Returns
Index into theta or dtheta array using GPU layout.
template<int order, int atomsPerWarp>
static int __device__ __forceinline__ getSplineParamIndexBase ( int  warpIndex,
int  atomWarpIndex 
)
static

Gets a base of the unique index to an element in a spline parameter buffer (theta/dtheta), which is laid out for GPU spread/gather kernels. The base only corresponds to the atom index within the execution block. Feed the result into getSplineParamIndex() to get a full index. TODO: it's likely that both parameters can be just replaced with a single atom index, as they are derived from it. Do that, verifying that the generated code is not bloated, and/or revise the spline indexing scheme. Removing warp dependency would also be nice (and would probably coincide with removing c_pmeSpreadGatherAtomsPerWarp).

Template Parameters
orderPME order
atomsPerWarpNumber of atoms processed by a warp
Parameters
[in]warpIndexWarp index wrt the block.
[in]atomWarpIndexAtom index wrt the warp (from 0 to atomsPerWarp - 1).
Returns
Index into theta or dtheta array using GPU layout.
static bool __device__ __forceinline__ pme_gpu_check_atom_charge ( const float  coefficient)
static

An inline HIP function for skipping the zero-charge atoms.

Returns
Non-0 if atom should be processed, 0 otherwise.
Parameters
[in]coefficientThe atom charge.

This is called from the spline_and_spread and gather PME kernels.

template<typename T , int atomsPerBlock, int dataCountPerAtom>
static __device__ __forceinline__ void pme_gpu_stage_atom_data ( T *__restrict__  sm_destination,
const T *__restrict__  gm_source 
)
static

General purpose function for loading atom-related data from global to shared memory.

Template Parameters
TData type (float/int/...)
atomsPerBlockNumber of atoms processed by a block - should be accounted for in the size of the shared memory array.
dataCountPerAtomNumber of data elements per single atom (e.g. DIM for an rvec coordinates array).
Parameters
[out]sm_destinationShared memory array for output.
[in]gm_sourceGlobal memory array for input.