GROMACS 2024.4

PmeGpu Struct Reference

#include <gromacs/ewald/pme_gpu_types_host.h>

The main PME GPU host structure, included in the PME CPU structure by pointer.
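To make the "included by pointer" relation concrete, here is a minimal sketch of a CPU-side structure holding the GPU state; the CPU-side struct and field names are illustrative assumptions, not the actual GROMACS internals:

    // Sketch only: the real PME CPU structure is an internal GROMACS type.
    struct PmeCpuStructure
    {
        // ... CPU-side PME state ...
        PmeGpu* gpu = nullptr; // non-null only when PME offloading to a GPU is active
    };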
Public Attributes

std::shared_ptr<PmeShared> common
    The information copied once per reinit from the CPU structure.

const PmeGpuProgram* programHandle_
    A handle to the program created by buildPmeGpuProgram().

std::unique_ptr<gmx::ClfftInitializer> initializedClfftLibrary_
    Handle that ensures the clFFT library has been initialized once per process.

PmeGpuSettings settings
    The settings.

PmeGpuStaging staging
    The host-side buffers. The device-side buffers are buried in kernelParams, but that will have to change.
int nAtomsAlloc
    Number of local atoms, padded to be divisible by c_pmeAtomDataAlignment.

std::intmax_t maxGridWidthX
    Kernel scheduling grid width limit in X, derived from the device's compute capability in CUDA. Declared as a very large integer type so it stays useful in computations with type promotion, avoiding overflows (see the sketch after this list). OpenCL does not seem to expose a readily available global work size limit, so a large arbitrary constant is assigned instead. TODO: this should be in PmeGpuProgram(Impl).

int minParticleCountToRecalculateSplines = 23000
    Minimum particle count to prefer recalculating splines.

std::shared_ptr<PmeGpuKernelParams> kernelParams
    A single structure encompassing all the PME data used on the GPU. Its value is the only argument to all the PME GPU kernels.
std::shared_ptr<PmeGpuSpecific> archSpecific
    The pointer to GPU-framework-specific host-side data, such as CUDA streams and events.

std::unique_ptr<PmeGpuHaloExchange> haloExchange
    The pointer to PME halo-exchange specific host-side data.

bool useNvshmem = false

std::unique_ptr<PmeNvshmemHost> nvshmemParams
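The sketch below illustrates why maxGridWidthX is stored as a wide integer: a block count can be compared against it, and folded into a 2D launch grid, without the comparison overflowing, because the narrower operand is promoted. Function and variable names are illustrative assumptions, not the actual GROMACS scheduling code:

    #include <cstdint>

    // Sketch: split a 1D block count into a 2D launch grid whose X dimension
    // does not exceed the device limit. Because maxGridWidthX is std::intmax_t,
    // the comparison promotes blockCount rather than overflowing.
    void chooseLaunchGrid(std::intmax_t maxGridWidthX, int blockCount, int* gridSizeX, int* gridSizeY)
    {
        if (blockCount <= maxGridWidthX)
        {
            *gridSizeX = blockCount;
            *gridSizeY = 1;
        }
        else
        {
            // Fold the excess blocks into the Y dimension (ceiling division).
            *gridSizeX = static_cast<int>(maxGridWidthX);
            *gridSizeY = (blockCount - 1) / *gridSizeX + 1;
        }
    }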
Member Data Documentation

std::shared_ptr<PmeGpuKernelParams> PmeGpu::kernelParams

A single structure encompassing all the PME data used on the GPU. Its value is the only argument to all the PME GPU kernels.
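As an illustration of this single-argument design, the hedged sketch below passes one parameter struct by value to a kernel; the struct fields, kernel name, and launch line are illustrative assumptions, not the actual PmeGpuKernelParams layout or the GROMACS kernels:

    // Sketch only: all device pointers and constants live in one POD struct
    // that is passed to every kernel by value, so no separate parameter copy
    // is needed on the host side.
    struct ExampleKernelParams
    {
        const float* coordinates; // device pointer to atom coordinates
        float*       grid;        // device pointer to the spread grid
        int          nAtoms;      // actual atom count (cf. kernelParams.atoms.nAtoms)
    };

    __global__ void exampleSpreadKernel(const ExampleKernelParams params)
    {
        const int atomIndex = blockIdx.x * blockDim.x + threadIdx.x;
        if (atomIndex < params.nAtoms)
        {
            // ... spread the charge of atom atomIndex onto params.grid ...
        }
    }

    // Host-side launch, analogous to dereferencing the shared_ptr kernelParams:
    // exampleSpreadKernel<<<numBlocks, threadsPerBlock>>>(hostCopyOfParams);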
int PmeGpu::minParticleCountToRecalculateSplines = 23000

Minimum particle count to prefer recalculating splines.

The gather kernel can either recalculate the splines or load those saved during the spline (and spread) kernel. Recalculating is advantageous when there are enough particles; in that case it is also best to use fewer threads per atom in the spline and spread kernels.

This feature is supported by the CUDA and SYCL backends. Spread pipelining requires spline recalculation.
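A hedged sketch of the decision this threshold drives is given below; the function and parameter names are illustrative assumptions, not the actual GROMACS logic:

    // Sketch: decide whether the gather kernel should recalculate splines or
    // load the ones saved by the spline-and-spread kernel.
    bool chooseRecalculateSplines(int  nAtoms,
                                  int  minParticleCountToRecalculateSplines, // 23000 by default
                                  bool backendSupportsRecalculation,         // CUDA or SYCL
                                  bool useSpreadPipelining)
    {
        if (!backendSupportsRecalculation)
        {
            return false;
        }
        if (useSpreadPipelining)
        {
            return true; // spread pipelining requires spline recalculation
        }
        // With enough particles, recalculating is preferable to reloading.
        return nAtoms >= minParticleCountToRecalculateSplines;
    }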
int PmeGpu::nAtomsAlloc

Number of local atoms, padded to be divisible by c_pmeAtomDataAlignment.

Used only as the basic size for almost all per-atom data allocations (spline parameter data is additionally aligned by PME_SPREADGATHER_PARTICLES_PER_WARP). kernelParams.atoms.nAtoms is the actual atom count to be used for most data copying.

TODO: memory allocation/padding properties should be handled by something like a container.
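The padding rule described above amounts to rounding the atom count up to the next multiple of the alignment. A minimal sketch follows; the helper name and the example alignment value are illustrative assumptions:

    // Sketch: round the local atom count up to a multiple of the alignment so
    // that per-atom device buffers can be allocated with a uniform padded size.
    int computePaddedAtomCount(int nAtoms, int alignment /* c_pmeAtomDataAlignment */)
    {
        return ((nAtoms + alignment - 1) / alignment) * alignment;
    }

    // Example (alignment value chosen for illustration only):
    // nAtoms = 1000, alignment = 32  ->  nAtomsAlloc = 1024,
    // while kernelParams.atoms.nAtoms stays 1000 for data copies.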