Gromacs 2024.4
PmeGpuProgramImpl Struct Reference

#include <gromacs/ewald/pme_gpu_program_impl.h>

Description

PME GPU persistent host program/kernel data, which should be initialized once for the whole execution.

The primary purpose of this is to avoid recompiling GPU kernels for each OpenCL unit test while the relevant GPU context (e.g. cl_context) instance persists. In CUDA, this just assigns the kernel function pointers. This also implicitly relies on the fact that a reasonable share of the kernels is always used. If there were more template parameters, an even smaller share of all possible kernels would be used.

Todo:
If, in the future, we need to react to user input or auto-tuning by compiling different kernels, we might wish to revisit the number of kernels we pre-compile and/or the management of their lifetime.

This also doesn't manage cuFFT/clFFT kernels, which depend on the PME grid dimensions.

TODO: pass cl_context to the constructor and not create it inside. See also Issue #2522.
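
For orientation, a minimal sketch of the intended one-time setup follows. The helper function name is hypothetical and not necessarily the real GROMACS entry point; only the constructor signature comes from this page.

#include <memory>
#include <gromacs/ewald/pme_gpu_program_impl.h>

// Hypothetical helper: construct the persistent kernel-program data exactly once
// per device context and share it across all PME runs and unit tests that use
// this context. For OpenCL this compiles all PME kernels; for CUDA it merely
// assigns the kernel function pointers (see the description above).
std::unique_ptr<PmeGpuProgramImpl> makePmeGpuProgramOnce(const DeviceContext& deviceContext)
{
    return std::make_unique<PmeGpuProgramImpl>(deviceContext);
}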

Public Types

using PmeKernelHandle = ISyclKernelFunctor *
 Conveniently, all the PME kernels use the same single argument type.
 

Public Member Functions

 PmeGpuProgramImpl (const DeviceContext &deviceContext)
 Constructor for the given device.
 
 GMX_DISALLOW_COPY_AND_ASSIGN (PmeGpuProgramImpl)
 
int warpSize () const
 Return the warp size for which the kernels were compiled.
 

Public Attributes

const DeviceContext & deviceContext_
 This is a handle to the GPU context, which is just a dummy in CUDA, but is created/destroyed by this class in OpenCL.
 
size_t warpSize_
 Maximum synchronous GPU thread group execution width. "Warp" is a CUDA term which we end up reusing in OpenCL kernels as well. For CUDA, this is a static value that comes from gromacs/gpu_utils/cuda_arch_utils.cuh; for OpenCL, we have to query it dynamically.
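
As a rough illustration of the dynamic query on the OpenCL side (the function name below is hypothetical, and using the preferred work-group size multiple as the warp width is an assumption, not a description of the actual GROMACS code):

#include <cstddef>
#include <CL/cl.h>

// Query the execution width ("warp" size) from an already-built kernel on the
// target device; in CUDA this value is instead a compile-time constant taken
// from gromacs/gpu_utils/cuda_arch_utils.cuh.
std::size_t queryExecutionWidth(cl_kernel kernel, cl_device_id device)
{
    std::size_t width = 0;
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                             sizeof(width), &width, nullptr);
    return width;
}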
 
PmeKernelHandle nvshmemSignalKern
 
size_t spreadWorkGroupSize
 Spread/spline kernels are compiled only for order of 4.
 
PmeKernelHandle splineKernelSingle
 
PmeKernelHandle splineKernelThPerAtom4Single
 
PmeKernelHandle spreadKernelSingle
 
PmeKernelHandle spreadKernelThPerAtom4Single
 
PmeKernelHandle splineAndSpreadKernelSingle
 
PmeKernelHandle splineAndSpreadKernelThPerAtom4Single
 
PmeKernelHandle splineAndSpreadKernelWriteSplinesSingle
 
PmeKernelHandle splineAndSpreadKernelWriteSplinesThPerAtom4Single
 
PmeKernelHandle splineKernelDual
 
PmeKernelHandle splineKernelThPerAtom4Dual
 
PmeKernelHandle spreadKernelDual
 
PmeKernelHandle spreadKernelThPerAtom4Dual
 
PmeKernelHandle splineAndSpreadKernelDual
 
PmeKernelHandle splineAndSpreadKernelThPerAtom4Dual
 
PmeKernelHandle splineAndSpreadKernelWriteSplinesDual
 
PmeKernelHandle splineAndSpreadKernelWriteSplinesThPerAtom4Dual
 
size_t gatherWorkGroupSize
 Same for gather: hardcoded X/Y unwrap parameters, order of 4, plus it can either reduce with previous forces in the host buffer, or ignore them.
 
PmeKernelHandle gatherKernelSingle
 
PmeKernelHandle gatherKernelThPerAtom4Single
 
PmeKernelHandle gatherKernelReadSplinesSingle
 
PmeKernelHandle gatherKernelReadSplinesThPerAtom4Single
 
PmeKernelHandle gatherKernelDual
 
PmeKernelHandle gatherKernelThPerAtom4Dual
 
PmeKernelHandle gatherKernelReadSplinesDual
 
PmeKernelHandle gatherKernelReadSplinesThPerAtom4Dual
 
size_t solveMaxWorkGroupSize
 Solve kernel doesn't care about the interpolation order, but can optionally compute energy and virial, and supports XYZ and YZX grid orderings.
 
PmeKernelHandle solveYZXKernelA
 
PmeKernelHandle solveXYZKernelA
 
PmeKernelHandle solveYZXEnergyKernelA
 
PmeKernelHandle solveXYZEnergyKernelA
 
PmeKernelHandle solveYZXKernelB
 
PmeKernelHandle solveXYZKernelB
 
PmeKernelHandle solveYZXEnergyKernelB
 
PmeKernelHandle solveXYZEnergyKernelB
 

Member Data Documentation

size_t PmeGpuProgramImpl::gatherWorkGroupSize

Same for gather: hardcoded X/Y unwrap parameters, order of 4, plus it can either reduce with previous forces in the host buffer, or ignore them.

Also, similarly to the spread, we can use either order (4) or order*order (16) threads per atom, and either recalculate the splines or read the ones written by the spread kernel. The kernels are templated separately for using one or two grids (required for calculating energies and the virial).
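
A hedged sketch of how the gather variants listed above could be dispatched at run time; the selection function and its boolean parameters are hypothetical, and only the kernel-handle member names come from this struct.

#include <gromacs/ewald/pme_gpu_program_impl.h>

// Hypothetical dispatch over the gather kernel handles.
PmeGpuProgramImpl::PmeKernelHandle selectGatherKernel(const PmeGpuProgramImpl& program,
                                                      bool readSplinesFromGlobal,  // splines were written by spread
                                                      bool useOrderThreadsPerAtom, // 4 instead of 16 threads per atom
                                                      bool useTwoGrids)            // energy/virial calculation
{
    if (useTwoGrids)
    {
        if (readSplinesFromGlobal)
        {
            return useOrderThreadsPerAtom ? program.gatherKernelReadSplinesThPerAtom4Dual
                                          : program.gatherKernelReadSplinesDual;
        }
        return useOrderThreadsPerAtom ? program.gatherKernelThPerAtom4Dual : program.gatherKernelDual;
    }
    if (readSplinesFromGlobal)
    {
        return useOrderThreadsPerAtom ? program.gatherKernelReadSplinesThPerAtom4Single
                                      : program.gatherKernelReadSplinesSingle;
    }
    return useOrderThreadsPerAtom ? program.gatherKernelThPerAtom4Single : program.gatherKernelSingle;
}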

size_t PmeGpuProgramImpl::solveMaxWorkGroupSize

Solve kernel doesn't care about the interpolation order, but can optionally compute energy and virial, and supports XYZ and YZX grid orderings.

The kernels are templated separately for grids in state A and B.
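
Again a hedged sketch of the corresponding run-time selection; the function and its parameters are hypothetical, and only the kernel-handle member names come from this struct.

#include <gromacs/ewald/pme_gpu_program_impl.h>

// Hypothetical dispatch over the solve kernel handles.
PmeGpuProgramImpl::PmeKernelHandle selectSolveKernel(const PmeGpuProgramImpl& program,
                                                     bool gridOrderingIsYZX,
                                                     bool computeEnergyAndVirial,
                                                     bool useGridB) // grid in state B
{
    if (useGridB)
    {
        if (computeEnergyAndVirial)
        {
            return gridOrderingIsYZX ? program.solveYZXEnergyKernelB : program.solveXYZEnergyKernelB;
        }
        return gridOrderingIsYZX ? program.solveYZXKernelB : program.solveXYZKernelB;
    }
    if (computeEnergyAndVirial)
    {
        return gridOrderingIsYZX ? program.solveYZXEnergyKernelA : program.solveXYZEnergyKernelA;
    }
    return gridOrderingIsYZX ? program.solveYZXKernelA : program.solveXYZKernelA;
}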

size_t PmeGpuProgramImpl::spreadWorkGroupSize

Spread/spline kernels are compiled only for order of 4.

There are multiple versions of each kernel, parametrized according to the number of threads per atom (using either order (4) or order*order (16) threads per atom is supported) and according to whether the spline data is written in the spline/spread kernel and loaded in the gather, or recalculated in the gather. Spreading kernels also have hardcoded X/Y index wrapping parameters, as a placeholder for implementing 1/2D decomposition. The kernels are templated separately for spreading on one grid (with one or two sets of coefficients) or on two grids (required for energy and virial calculations).
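
The same idea, sketched for the fused spline-and-spread variants (the separate spline-only and spread-only handles follow the same pattern); the function and its parameters are hypothetical, and only the kernel-handle member names come from this struct.

#include <gromacs/ewald/pme_gpu_program_impl.h>

// Hypothetical dispatch over the fused spline-and-spread kernel handles.
PmeGpuProgramImpl::PmeKernelHandle selectSplineAndSpreadKernel(const PmeGpuProgramImpl& program,
                                                               bool writeSplinesToGlobal, // gather will read them back
                                                               bool useOrderThreadsPerAtom,
                                                               bool useTwoGrids)
{
    if (useTwoGrids)
    {
        if (writeSplinesToGlobal)
        {
            return useOrderThreadsPerAtom ? program.splineAndSpreadKernelWriteSplinesThPerAtom4Dual
                                          : program.splineAndSpreadKernelWriteSplinesDual;
        }
        return useOrderThreadsPerAtom ? program.splineAndSpreadKernelThPerAtom4Dual
                                      : program.splineAndSpreadKernelDual;
    }
    if (writeSplinesToGlobal)
    {
        return useOrderThreadsPerAtom ? program.splineAndSpreadKernelWriteSplinesThPerAtom4Single
                                      : program.splineAndSpreadKernelWriteSplinesSingle;
    }
    return useOrderThreadsPerAtom ? program.splineAndSpreadKernelThPerAtom4Single
                                  : program.splineAndSpreadKernelSingle;
}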

