GROMACS 2020.1
PmeGpuProgramImpl Struct Reference

#include <gromacs/ewald/pme_gpu_program_impl.h>

Description

PME GPU persistent host program/kernel data, which should be initialized once for the whole execution.

The primary purpose of this is to avoid recompiling the GPU kernels for each OpenCL unit test while the relevant GPU context (e.g. cl_context) instance persists. In CUDA, this just assigns the kernel function pointers. This also implicitly relies on the fact that a reasonable share of the kernels is always used. If there were more template parameters, an even smaller share of all possible kernels would be used.

Todo:
In the future, if we need to react to user input or auto-tuning by compiling different kernels, we might wish to revisit the number of kernels we pre-compile and/or the management of their lifetime.

This also doesn't manage cuFFT/clFFT kernels, which depend on the PME grid dimensions.

TODO: pass the cl_context to the constructor rather than creating it inside. See also Redmine #2522.

Public Types

using PmeKernelHandle = void(*)(const struct PmeGpuCudaKernelParams)
 Conveniently, all the PME kernels use the same single argument type.
 

Public Member Functions

 PmeGpuProgramImpl (const gmx_device_info_t *deviceInfo)
 Constructor for the given device.
 

Public Attributes

DeviceContext context
 This is a handle to the GPU context, which is just a dummy in CUDA, but is created/destroyed by this class in OpenCL. TODO: Later we want to be able to own the context at a higher level and not here, but this class would still need the non-owning context handle to build the kernels.
 
size_t warpSize
 Maximum synchronous GPU thread group execution width. "Warp" is a CUDA term which we end up reusing in OpenCL kernels as well. For CUDA, this is a static value that comes from gromacs/gpu_utils/cuda_arch_utils.cuh; for OpenCL, we have to query it dynamically.
 
size_t spreadWorkGroupSize
 Spread/spline kernels are compiled only for order of 4; see the detailed documentation below.
 
PmeKernelHandle splineKernel
 
PmeKernelHandle splineKernelThPerAtom4
 
PmeKernelHandle spreadKernel
 
PmeKernelHandle spreadKernelThPerAtom4
 
PmeKernelHandle splineAndSpreadKernel
 
PmeKernelHandle splineAndSpreadKernelThPerAtom4
 
PmeKernelHandle splineAndSpreadKernelWriteSplines
 
PmeKernelHandle splineAndSpreadKernelWriteSplinesThPerAtom4
 
size_t gatherWorkGroupSize
 Same for gather: hardcoded X/Y unwrap parameters, order of 4, plus it can either reduce with previous forces in the host buffer or ignore them; see the detailed documentation below.
 
PmeKernelHandle gatherReduceWithInputKernel
 
PmeKernelHandle gatherReduceWithInputKernelThPerAtom4
 
PmeKernelHandle gatherKernel
 
PmeKernelHandle gatherKernelThPerAtom4
 
PmeKernelHandle gatherReduceWithInputKernelReadSplines
 
PmeKernelHandle gatherReduceWithInputKernelReadSplinesThPerAtom4
 
PmeKernelHandle gatherKernelReadSplines
 
PmeKernelHandle gatherKernelReadSplinesThPerAtom4
 
size_t solveMaxWorkGroupSize
 The solve kernel doesn't care about the interpolation order, but can optionally compute energy and virial, and supports XYZ and YZX grid orderings; a selection sketch follows this list.
 
PmeKernelHandle solveYZXKernel
 
PmeKernelHandle solveXYZKernel
 
PmeKernelHandle solveYZXEnergyKernel
 
PmeKernelHandle solveXYZEnergyKernel
 

Member Data Documentation

size_t PmeGpuProgramImpl::gatherWorkGroupSize

Same for gather: hardcoded X/Y unwrap parameters, order of 4, plus it can either reduce with previous forces in the host buffer, or ignore them.

Also, similarly to the spread, we can use either order (4) or order*order (16) threads per atom, and either recalculate the splines or read the ones written by the spread.

size_t PmeGpuProgramImpl::spreadWorkGroupSize

Spread/spline kernels are compiled only for order of 4.

There are multiple versions of each kernel, parametrized according to the number of threads per atom: either order (4) or order*order (16) threads per atom are supported. The spline data can either be written by the spline/spread kernel and loaded in the gather, or recalculated in the gather itself. Spreading kernels also have hardcoded X/Y index-wrapping parameters, as a placeholder for implementing 1/2D decomposition.

