Gromacs  2024.3
pme_gpu_constants.h File Reference
#include "config.h"

Description

This file defines the PME GPU compile-time constants/macros, used both in device and host code.

As OpenCL C is not aware of constexpr, most of this file is forwarded to the OpenCL kernel compilation as defines with the same names, for the sake of code similarity.
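The forwarding described above can be pictured as building a string of preprocessor definitions for the OpenCL compiler. The following is only a minimal sketch: the helper names `makeDefine` and `buildPmeOpenClDefines` are hypothetical and not part of GROMACS, and the selection of constants is illustrative.

```cpp
#include <string>

// Values mirror the constants documented on this page.
constexpr int  c_pmeGpuOrder          = 4;
constexpr int  c_virialAndEnergyCount = 7;
constexpr bool c_skipNeutralAtoms     = false;

// Hypothetical helper: formats a single "-D name=value" compiler option.
inline std::string makeDefine(const std::string& name, int value)
{
    return "-D " + name + "=" + std::to_string(value);
}

// Hypothetical helper: gathers the options so that kernel code can use
// the same identifiers as the host-side constexpr values.
inline std::string buildPmeOpenClDefines()
{
    return makeDefine("c_pmeGpuOrder", c_pmeGpuOrder) + " "
           + makeDefine("c_virialAndEnergyCount", c_virialAndEnergyCount) + " "
           + makeDefine("c_skipNeutralAtoms", c_skipNeutralAtoms ? 1 : 0);
}
```

A string of this shape could then be passed as the build options when the kernels are compiled, so that the device code sees `c_pmeGpuOrder` and the other names as plain macros.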

Todo:
The values are currently common to both CUDA and OpenCL implementations, but should be reconsidered when we tune the OpenCL implementation. See Issue #2528.
Author
Aleksei Iupinov a.yupinov@gmail.com

Enumerations

enum  ThreadsPerAtom : int { ThreadsPerAtom::Order, ThreadsPerAtom::OrderSquared, ThreadsPerAtom::Count }
 The number of GPU threads used for computing spread/gather contributions of a single atom, which relates to the PME order.
 

Variables

constexpr bool c_skipNeutralAtoms = false
 false: Atoms with zero charges are processed by PME. Could introduce some overhead. true: Atoms with zero charges are not processed by PME. Adds branching to the spread/gather. Could be good for performance in specific systems with lots of neutral atoms.
 
constexpr int c_virialAndEnergyCount = 7
 Number of PME solve output floating point numbers. 6 for symmetric virial matrix + 1 for reciprocal energy.
 
constexpr int c_pmeGpuOrder = 4
 PME order parameter.
 
constexpr int c_spreadMaxWarpsPerBlock = 8
 Spreading max block width in warps picked among powers of 2 (2, 4, 8, 16) for max. occupancy and min. runtime in most cases.
 
constexpr int c_solveMaxWarpsPerBlock = 8
 Solving kernel max block width in warps picked among powers of 2 (2, 4, 8, 16) for max. occupancy and min. runtime.
 
constexpr int c_gatherMaxWarpsPerBlock = 4
 Gathering max block width in warps - picked empirically among 2, 4, 8, 16 for max. occupancy and min. runtime.
 

Enumeration Type Documentation

enum ThreadsPerAtom : int
strong

The number of GPU threads used for computing spread/gather contributions of a single atom, which relates to the PME order.

TODO: this assumption leads to a minimum execution width of 16. See Issue #2516

Enumerator
Order 

Use a number of threads equal to the PME order (i.e. 4)

Only CUDA implements this. See Issue #2516

OrderSquared 

Use a number of threads equal to the square of the PME order (i.e. 16)

Count 

Size of the enumeration.
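As an illustration of how the enumerators relate to the PME order, the sketch below maps each mode to a thread count. The function `threadsPerAtomCount` is a hypothetical helper written for this example and is not declared in this header; the enum and `c_pmeGpuOrder` are redeclared locally only to keep the snippet self-contained.

```cpp
constexpr int c_pmeGpuOrder = 4;

enum class ThreadsPerAtom : int
{
    Order,        // one thread per PME order value
    OrderSquared, // order * order threads
    Count         // size of the enumeration, not a valid mode
};

// Hypothetical helper: number of threads that cooperate on one atom.
// Returns 0 for the Count sentinel, which does not describe a real mode.
constexpr int threadsPerAtomCount(ThreadsPerAtom mode)
{
    return (mode == ThreadsPerAtom::Order)          ? c_pmeGpuOrder
           : (mode == ThreadsPerAtom::OrderSquared) ? c_pmeGpuOrder * c_pmeGpuOrder
                                                    : 0;
}

static_assert(threadsPerAtomCount(ThreadsPerAtom::Order) == 4, "order 4 gives 4 threads");
static_assert(threadsPerAtomCount(ThreadsPerAtom::OrderSquared) == 16, "order 4 gives 16 threads");
```

Because the function is constexpr, the result can be used for compile-time sizing, for example as a template argument or an array extent.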

Variable Documentation

constexpr int c_pmeGpuOrder = 4

PME order parameter.

Note that the GPU code, unlike the CPU, only supports order 4.

constexpr bool c_skipNeutralAtoms = false

false: Atoms with zero charges are processed by PME. Could introduce some overhead.

true: Atoms with zero charges are not processed by PME. Adds branching to the spread/gather. Could be good for performance in specific systems with lots of neutral atoms.

Todo:
Estimate performance differences.
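The trade-off between the two settings can be sketched on the host side as follows. This is a simplified stand-in, not the actual spread/gather code: `countProcessedAtoms` is a hypothetical function that only counts which atoms would be processed, and the flag is exposed as a template parameter so that both settings can be compared.

```cpp
#include <vector>

constexpr bool c_skipNeutralAtoms = false;

// Hypothetical helper: counts the atoms a spread/gather pass would process.
// With skipNeutral == true, every atom pays for an extra branch, but atoms
// with zero charge do no further work.
template<bool skipNeutral = c_skipNeutralAtoms>
int countProcessedAtoms(const std::vector<float>& charges)
{
    int processed = 0;
    for (float charge : charges)
    {
        if (skipNeutral && charge == 0.0F)
        {
            continue; // neutral atom skipped
        }
        ++processed; // stands in for the per-atom spread/gather work
    }
    return processed;
}
```

For a charge list such as `{1.0F, 0.0F, -1.0F, 0.0F}`, `countProcessedAtoms<false>` visits all four atoms while `countProcessedAtoms<true>` visits two, at the cost of the additional test per atom.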
constexpr int c_solveMaxWarpsPerBlock = 8

Solving kernel max block width in warps picked among powers of 2 (2, 4, 8, 16) for max. occupancy and min. runtime (560Ti (CC2.1), 660Ti (CC3.0) and 750 (CC5.0)).
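To show what the warps-per-block limits mean in threads and atoms, here is a small arithmetic sketch. It assumes a warp size of 32 threads, which is not stated on this page; `c_assumedWarpSize`, `maxThreadsPerBlock` and `atomsPerBlock` are hypothetical names introduced only for this example.

```cpp
constexpr int c_pmeGpuOrder            = 4;
constexpr int c_spreadMaxWarpsPerBlock = 8;
constexpr int c_solveMaxWarpsPerBlock  = 8;
constexpr int c_gatherMaxWarpsPerBlock = 4;

// Assumption for this sketch only: 32 threads per warp.
constexpr int c_assumedWarpSize = 32;

// Hypothetical helper: upper bound on threads in one block.
constexpr int maxThreadsPerBlock(int maxWarpsPerBlock)
{
    return maxWarpsPerBlock * c_assumedWarpSize;
}

// Hypothetical helper: atoms that fit in one block when each atom
// is handled by threadsPerAtom threads.
constexpr int atomsPerBlock(int maxWarpsPerBlock, int threadsPerAtom)
{
    return maxThreadsPerBlock(maxWarpsPerBlock) / threadsPerAtom;
}

// With order * order = 16 threads per atom:
static_assert(atomsPerBlock(c_spreadMaxWarpsPerBlock, c_pmeGpuOrder * c_pmeGpuOrder) == 16, "");
static_assert(atomsPerBlock(c_gatherMaxWarpsPerBlock, c_pmeGpuOrder * c_pmeGpuOrder) == 8, "");
```

Under the same assumption, the solve kernel limit corresponds to at most 256 threads per block.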