Gromacs 2026.0
#include "config.h"
This file (pme_gpu_constants.h) defines the PME GPU compile-time constants/macros, used both in device and host code.
As OpenCL C is not aware of constexpr, most of this file is forwarded to the OpenCL kernel compilation as defines with the same names, for the sake of code similarity.
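The forwarding described above can be pictured as building a string of `-D` options for the OpenCL compiler. The following is only a minimal sketch: `makeDefine` and `buildPmeKernelDefines` are hypothetical helper names that do not come from this file, and the real build code may assemble its options differently. The constant values are the ones documented on this page.

```cpp
#include <string>

// Values copied from the constants documented on this page.
constexpr int  c_pmeGpuOrder          = 4;
constexpr int  c_virialAndEnergyCount = 7;
constexpr bool c_skipNeutralAtoms     = false;

// Hypothetical helper: formats one " -Dname=value" compiler option.
inline std::string makeDefine(const std::string& name, int value)
{
    return " -D" + name + "=" + std::to_string(value);
}

// Hypothetical helper: collects the options that would be handed to the
// OpenCL kernel compilation so that kernels see the same names as C++ code.
inline std::string buildPmeKernelDefines()
{
    std::string options;
    options += makeDefine("c_pmeGpuOrder", c_pmeGpuOrder);
    options += makeDefine("c_virialAndEnergyCount", c_virialAndEnergyCount);
    // OpenCL C has no bool literals in the preprocessor, so pass 0 or 1.
    options += makeDefine("c_skipNeutralAtoms", c_skipNeutralAtoms ? 1 : 0);
    return options;
}
```

Because the define names match the constexpr names, kernel source can refer to `c_pmeGpuOrder` unchanged whether it is compiled as C++ or as OpenCL C.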
Enumerations | |
| enum | ThreadsPerAtom : int { ThreadsPerAtom::Order, ThreadsPerAtom::OrderSquared, ThreadsPerAtom::Count } |
| The number of GPU threads used for computing spread/gather contributions of a single atom, which relates to the PME order. | |
Variables | |
| constexpr bool | c_skipNeutralAtoms = false |
| false: Atoms with zero charges are processed by PME. Could introduce some overhead. true: Atoms with zero charges are not processed by PME. Adds branching to the spread/gather. Could be good for performance in specific systems with lots of neutral atoms. | |
| constexpr int | c_virialAndEnergyCount = 7 |
| Number of PME solve output floating point numbers. 6 for symmetric virial matrix + 1 for reciprocal energy. | |
| constexpr int | c_pmeGpuOrder = 4 |
| PME order parameter. | |
| constexpr int | c_spreadMaxWarpsPerBlock = 8 |
| Spreading max block width in warps picked among powers of 2 (2, 4, 8, 16) for max. occupancy and min. runtime in most cases. | |
| constexpr int | c_solveMaxWarpsPerBlock = 8 |
| Solving kernel max block width in warps picked among powers of 2 (2, 4, 8, 16) for max. occupancy and min. runtime. | |
| constexpr int | c_gatherMaxWarpsPerBlock = 4 |
| Gathering max block width in warps - picked empirically among 2, 4, 8, 16 for max. occupancy and min. runtime. | |
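The value of c_virialAndEnergyCount follows from simple counting: a symmetric 3x3 matrix has 6 independent entries, plus one value for the reciprocal energy. A small sketch of that arithmetic, where `symmetricMatrixEntries` is a hypothetical helper introduced only for illustration:

```cpp
constexpr int c_virialAndEnergyCount = 7;

// Hypothetical helper: number of independent entries in a symmetric
// dim x dim matrix (the diagonal plus one triangle).
constexpr int symmetricMatrixEntries(int dim)
{
    return dim * (dim + 1) / 2;
}

// 6 virial components + 1 reciprocal energy value.
static_assert(symmetricMatrixEntries(3) + 1 == c_virialAndEnergyCount,
              "virial (6) plus energy (1) should match the documented count");
```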
| enum ThreadsPerAtom : int [strong] |
The number of GPU threads used for computing spread/gather contributions of a single atom, which relates to the PME order.
TODO: this assumption leads to minimum execution width of 16. See Issue #2516
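To illustrate how the enumeration relates to the PME order, here is a minimal sketch. It assumes, based only on the enumerator names, that `Order` means one thread per spline order (4) and `OrderSquared` means order squared (16); `threadsPerAtom` is a hypothetical helper, not a function declared in this file.

```cpp
constexpr int c_pmeGpuOrder = 4;

enum class ThreadsPerAtom : int
{
    Order,
    OrderSquared,
    Count
};

// Hypothetical helper: maps the enumeration to a thread count.
// Any value other than Order falls through to order squared; Count is a
// sentinel and is not expected as an argument here.
constexpr int threadsPerAtom(ThreadsPerAtom mode)
{
    return mode == ThreadsPerAtom::Order ? c_pmeGpuOrder : c_pmeGpuOrder * c_pmeGpuOrder;
}

static_assert(threadsPerAtom(ThreadsPerAtom::Order) == 4, "order threads per atom");
static_assert(threadsPerAtom(ThreadsPerAtom::OrderSquared) == 16, "order^2 threads per atom");
```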
| constexpr int c_pmeGpuOrder = 4 |
PME order parameter.
Note that the GPU code, unlike the CPU code, only supports order 4.
| constexpr bool c_skipNeutralAtoms = false |
false: Atoms with zero charges are processed by PME. Could introduce some overhead. true: Atoms with zero charges are not processed by PME. Adds branching to the spread/gather. Could be good for performance in specific systems with lots of neutral atoms.
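The trade-off can be sketched on the host side as follows. This is an illustration only, not the kernel code: `countProcessedAtoms` is a hypothetical function, and the flag is passed as a template parameter so that both settings can be compared. With the flag off the loop has no charge test; with it on, each atom costs one extra branch but neutral atoms are skipped.

```cpp
#include <vector>

// Hypothetical sketch: counts how many atoms a spread/gather-style loop
// would actually process for a given setting of the skip flag.
template<bool skipNeutralAtoms>
int countProcessedAtoms(const std::vector<float>& charges)
{
    int processed = 0;
    for (float charge : charges)
    {
        // This is the extra branch that the "true" setting introduces.
        if (skipNeutralAtoms && charge == 0.0F)
        {
            continue;
        }
        ++processed; // stand-in for the per-atom spread/gather work
    }
    return processed;
}
```

For a system with charges `{1.0F, 0.0F, -1.0F}`, the `false` variant processes all three atoms, while the `true` variant processes two.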
| constexpr int c_solveMaxWarpsPerBlock = 8 |
Solving kernel max block width in warps picked among powers of 2 (2, 4, 8, 16) for max. occupancy and min. runtime (560Ti (CC2.1), 660Ti (CC3.0) and 750 (CC5.0)).
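The block widths on this page are given in warps, so the thread count per block depends on the warp size of the device. A minimal sketch of the conversion, assuming a warp size of 32 threads (typical for NVIDIA hardware, but an assumption here); `maxThreadsPerBlock` and `c_assumedWarpSize` are hypothetical names:

```cpp
// Values copied from the constants documented on this page.
constexpr int c_spreadMaxWarpsPerBlock = 8;
constexpr int c_solveMaxWarpsPerBlock  = 8;
constexpr int c_gatherMaxWarpsPerBlock = 4;

// Assumption: 32 threads per warp; other devices may differ.
constexpr int c_assumedWarpSize = 32;

// Hypothetical helper: converts a block width in warps to threads.
constexpr int maxThreadsPerBlock(int maxWarpsPerBlock, int warpSize)
{
    return maxWarpsPerBlock * warpSize;
}

static_assert(maxThreadsPerBlock(c_solveMaxWarpsPerBlock, c_assumedWarpSize) == 256,
              "8 warps of 32 threads");
static_assert(maxThreadsPerBlock(c_gatherMaxWarpsPerBlock, c_assumedWarpSize) == 128,
              "4 warps of 32 threads");
```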