Gromacs
2019-beta2
#include <gromacs/ewald/pme-gpu-program-impl.h>
PME GPU persistent host program/kernel data, which should be initialized once for the whole execution.
The primary purpose is to avoid recompiling the GPU kernels for each OpenCL unit test while the relevant GPU context (e.g. cl_context) instance persists. In CUDA, this just assigns the kernel function pointers. This also implicitly relies on a reasonable share of the kernels always being used, so building all of them up front is not wasteful; with more template parameters, an even smaller share of all possible kernels would be used.
This class also does not manage the cuFFT/clFFT kernels, since those depend on the PME grid dimensions.
TODO: pass cl_context to the constructor and not create it inside. See also Redmine #2522.
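As a rough illustration of "initialize once, then reuse the kernel handles", here is a host-only C++ sketch. It assumes nothing about the real GROMACS code. `KernelParams`, `ProgramData`, `getProgramData` and the two stand-in kernels are hypothetical. The function-local static is just one way to get once-per-process construction.

```cpp
#include <cassert>

// Hypothetical stand-in for PmeGpuCudaKernelParams: every kernel takes this one argument type.
struct KernelParams
{
    int* result; // where the stand-in kernel writes its output
    int  input;
};

// Same shape as PmeKernelHandle: pointer to a function taking the shared parameter struct.
using KernelHandle = void (*)(const KernelParams);

// Host-side stand-ins for device kernels (hypothetical; no PME math here).
void splineKernel(const KernelParams p) { *p.result = 2 * p.input; }
void spreadKernel(const KernelParams p) { *p.result = p.input + 100; }

// Counts how often the program data is built, to show that it happens only once.
int g_constructionCount = 0;

// Persistent kernel handles. In CUDA terms this only assigns function pointers;
// an OpenCL build would compile the program at this point instead.
struct ProgramData
{
    KernelHandle spline;
    KernelHandle spread;

    ProgramData() : spline(&splineKernel), spread(&spreadKernel)
    {
        assert(spline != nullptr && spread != nullptr);
        ++g_constructionCount;
    }
};

// Function-local static: constructed on the first call only (thread-safe since C++11),
// so later callers, e.g. further unit tests, reuse the same handles.
ProgramData& getProgramData()
{
    static ProgramData data;
    return data;
}
```

Each test would call `getProgramData()` and launch through the stored handles. Only the first call pays the construction cost.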
Public Types

| Type | Member | Description |
|---|---|---|
| using | PmeKernelHandle = void(*)(const struct PmeGpuCudaKernelParams) | Conveniently, all the PME kernels use the same single argument type. |

Public Member Functions

| Member | Description |
|---|---|
| PmeGpuProgramImpl(const gmx_device_info_t *deviceInfo) | Constructor for the given device. |

Public Attributes

| Type | Member | Description |
|---|---|---|
| Context | context | Handle to the GPU context, which is just a dummy in CUDA but is created and destroyed by this class in OpenCL. TODO: Later we want to own the context at a higher level rather than here, but this class would still need a non-owning context handle to build the kernels. |
| size_t | warpSize | Maximum synchronous GPU thread group execution width. "Warp" is a CUDA term that we also reuse in the OpenCL kernels. For CUDA, this is a static value that comes from gromacs/gpu_utils/cuda_arch_utils.cuh; for OpenCL, it has to be queried at run time. |
| size_t | spreadWorkGroupSize | Spread/spline kernels are compiled only for order 4 (see details below). |
| PmeKernelHandle | splineKernel | |
| PmeKernelHandle | spreadKernel | |
| PmeKernelHandle | splineAndSpreadKernel | |
| size_t | gatherWorkGroupSize | As for spreading, the gather kernels use hardcoded X/Y unwrap parameters and order 4; in addition, gathering can either reduce with the previous forces in the host buffer or ignore them. |
| PmeKernelHandle | gatherReduceWithInputKernel | |
| PmeKernelHandle | gatherKernel | |
| size_t | solveMaxWorkGroupSize | The solve kernel does not depend on the interpolation order, but can optionally compute energy and virial, and supports both XYZ and YZX grid orderings. |
| PmeKernelHandle | solveYZXKernel | |
| PmeKernelHandle | solveXYZKernel | |
| PmeKernelHandle | solveYZXEnergyKernel | |
| PmeKernelHandle | solveXYZEnergyKernel | |
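The attribute list implies that every kernel variant is its own handle, picked from run-time flags. The C++ sketch below shows one way this could look on the host. Each combination of compile-time flags is a separate template instantiation, and small selector functions return the matching handle.

`Params`, `KernelTable`, `solveKernel`, `gatherKernel`, `selectSolveKernel` and `selectGatherKernel` are all hypothetical. The stand-in kernels only record which solve variant ran, or perform the reduce-versus-overwrite step of gathering. They do no PME math.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical single parameter struct shared by all stand-in kernels.
struct Params
{
    float*       forces;        // gather: host force buffer, possibly holding previous forces
    const float* contributions; // gather: freshly interpolated forces
    std::size_t  count;         // gather: number of force components
    int*         variant;       // solve: records which instantiation ran
};

using KernelHandle = void (*)(const Params);

// Solve stand-in: one instantiation per (grid ordering, energy/virial) combination.
template<bool yzxOrdering, bool computeEnergyAndVirial>
void solveKernel(const Params p)
{
    *p.variant = (yzxOrdering ? 1 : 0) + (computeEnergyAndVirial ? 2 : 0);
}

// Gather stand-in: either add to the forces already in the buffer or overwrite them.
template<bool reduceWithInput>
void gatherKernel(const Params p)
{
    assert(p.count == 0 || (p.forces != nullptr && p.contributions != nullptr));
    for (std::size_t i = 0; i < p.count; ++i)
    {
        p.forces[i] = (reduceWithInput ? p.forces[i] : 0.0f) + p.contributions[i];
    }
}

// All variants are instantiated up front, analogous to the attribute list above.
struct KernelTable
{
    KernelHandle solveXYZ              = &solveKernel<false, false>;
    KernelHandle solveYZX              = &solveKernel<true, false>;
    KernelHandle solveXYZEnergy        = &solveKernel<false, true>;
    KernelHandle solveYZXEnergy        = &solveKernel<true, true>;
    KernelHandle gatherReduceWithInput = &gatherKernel<true>;
    KernelHandle gather                = &gatherKernel<false>;
};

// Picks the solve handle matching the run-time flags.
KernelHandle selectSolveKernel(const KernelTable& t, bool yzxOrdering, bool computeEnergyAndVirial)
{
    if (yzxOrdering)
    {
        return computeEnergyAndVirial ? t.solveYZXEnergy : t.solveYZX;
    }
    return computeEnergyAndVirial ? t.solveXYZEnergy : t.solveXYZ;
}

// Picks the gather handle depending on whether previous forces must be kept.
KernelHandle selectGatherKernel(const KernelTable& t, bool reduceWithInput)
{
    return reduceWithInput ? t.gatherReduceWithInput : t.gather;
}
```

Because every handle shares one signature, the caller can launch whichever handle the selector returns without further branching.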
Member Data Documentation

size_t PmeGpuProgramImpl::spreadWorkGroupSize
Spread/spline kernels are compiled only for order 4.
Spreading kernels also have hardcoded X/Y index-wrapping parameters, as a placeholder for implementing 1D/2D decomposition.
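The index-wrapping flags can be pictured with the following CPU-only sketch. It assumes that "wrapping" means folding stencil indices that run past the last grid line back to the start of a periodic grid. It also assumes that a non-wrapping (decomposed) grid is padded by order - 1 lines instead.

Everything here is hypothetical, including `spreadCharge2D`, `c_pmeOrder` and the padded layout. The sketch is reduced to two dimensions for brevity and is not the GROMACS kernel.

```cpp
#include <cassert>
#include <vector>

constexpr int c_pmeOrder = 4; // spline order fixed at compile time, like the spread kernels

// Spreads one charge with a separable order-4 stencil starting at (ixBase, iyBase).
// wrapX/wrapY are compile-time flags: when true, stencil points past the end of the grid
// fold back periodically; when false, the grid must be padded by (c_pmeOrder - 1) lines
// in that dimension to hold the overhang, i.e. sized (nx + 3) * (ny + 3) if neither wraps.
template<bool wrapX, bool wrapY>
void spreadCharge2D(std::vector<float>& grid, int nx, int ny, int ixBase, int iyBase,
                    const float (&thetaX)[c_pmeOrder], const float (&thetaY)[c_pmeOrder],
                    float charge)
{
    assert(ixBase >= 0 && ixBase < nx && iyBase >= 0 && iyBase < ny);
    const int rowLength = wrapY ? ny : ny + c_pmeOrder - 1;
    for (int i = 0; i < c_pmeOrder; ++i)
    {
        int ix = ixBase + i;
        if (wrapX && ix >= nx)
        {
            ix -= nx; // periodic fold in X
        }
        for (int j = 0; j < c_pmeOrder; ++j)
        {
            int iy = iyBase + j;
            if (wrapY && iy >= ny)
            {
                iy -= ny; // periodic fold in Y
            }
            grid[ix * rowLength + iy] += charge * thetaX[i] * thetaY[j];
        }
    }
}
```

Only `spreadCharge2D<true, true>` would be instantiated, mirroring the hardcoded flags. A decomposed setup could later instantiate the non-wrapping variants without changing the loop body.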