Common functions for the different NBNXN GPU implementations.
- Author
- Szilard Pall pall.szilard@gmail.com
static void | validateGpuAtomLocality (int atomLocality) | Check that atom locality values are valid for the GPU module. |
static int | gpuAtomToInteractionLocality (int atomLocality) | Convert atom locality to interaction locality. |
template<typename AtomDataT > static void | getGpuAtomRange (const AtomDataT *atomData, int atomLocality, int *atomRangeBegin, int *atomRangeLen) | Calculate atom range and return start index and length. |
template<typename GpuTimers > static void | countPruneKernelTime (GpuTimers *timers, gmx_wallclock_gpu_nbnxn_t *timings, const int iloc) | Count pruning kernel time if either kernel has been triggered. |
template<typename StagingData > static void | nbnxn_gpu_reduce_staged_outputs (const StagingData &nbst, int iLocality, bool reduceEnergies, bool reduceFshift, real *e_lj, real *e_el, rvec *fshift) | Reduce data staged internally in the nbnxn module. |
template<typename GpuTimers , typename GpuPairlist > static void | nbnxn_gpu_accumulate_timings (gmx_wallclock_gpu_nbnxn_t *timings, GpuTimers *timers, const GpuPairlist *plist, int atomLocality, bool didEnergyKernels, bool doNeighbourSearch, bool doTiming) | Do the per-step timing accounting of the nonbonded tasks. |
bool | nbnxn_gpu_try_finish_task (gmx_nbnxn_gpu_t *nb, int flags, int aloc, bool haveOtherWork, real *e_lj, real *e_el, rvec *fshift, GpuTaskCompletion completionKind) | Attempts to complete nonbonded GPU task. |
void | nbnxn_gpu_wait_finish_task (gmx_nbnxn_gpu_t *nb, int flags, int aloc, bool haveOtherWork, real *e_lj, real *e_el, rvec *fshift) | Wait for the asynchronously launched nonbonded tasks and data transfers to finish. |
template<typename GpuTimers , typename GpuPairlist >
static void nbnxn_gpu_accumulate_timings (gmx_wallclock_gpu_nbnxn_t *timings, GpuTimers *timers, const GpuPairlist *plist, int atomLocality, bool didEnergyKernels, bool doNeighbourSearch, bool doTiming)
inline static
Do the per-step timing accounting of the nonbonded tasks.
Does timing accumulation and call-count increments for the nonbonded kernels. Note that this function should be called after the current step's nonbonded tasks have completed, with the exception of the rolling pruning kernels, which are accounted for during the following step.
NOTE: if timing with multiple GPUs (streams) becomes possible, the counters could end up inconsistent because they are not incremented on some of the nodes when this call is skipped on empty local domains!
- Template Parameters
-
GpuTimers | GPU timers type |
GpuPairlist | Pair list type |
- Parameters
-
[out] | timings | Pointer to the NB GPU timings data |
[in] | timers | Pointer to GPU timers data |
[in] | plist | Pointer to the pair list data |
[in] | atomLocality | Atom locality specifier |
[in] | didEnergyKernels | True if energy kernels have been called in the current step |
[in] | doNeighbourSearch | True if this is a neighbour search step. |
[in] | doTiming | True if timing is enabled. |
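The accounting described above can be illustrated with a minimal sketch. Every type and name below (SketchKernelTiming, SketchTimings, SketchTimers, accumulateTimingsSketch) is a hypothetical stand-in invented for this example, and the indexing of the kernel counters by the two flags is an assumption; this is not the GROMACS implementation, and it leaves out the pair list and atom locality arguments.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are NOT the GROMACS types.
struct SketchKernelTiming
{
    double timeMs = 0.0; // accumulated kernel time
    int    count  = 0;   // number of accumulated kernel calls
};

struct SketchTimings
{
    // Indexed as [neighbour-search step?][energies computed?] (assumed layout).
    SketchKernelTiming kernel[2][2];
    double             d2hTimeMs = 0.0; // accumulated device-to-host copy time
    int                stepCount = 0;   // number of steps accounted for
};

struct SketchTimers
{
    double lastKernelMs; // elapsed time of this step's nonbonded kernel
    double lastD2hMs;    // elapsed time of this step's result copy-back
};

// Accumulate one step's timings; a no-op when timing is disabled.
inline void accumulateTimingsSketch(SketchTimings*      timings,
                                    const SketchTimers* timers,
                                    bool                didEnergyKernels,
                                    bool                doNeighbourSearch,
                                    bool                doTiming)
{
    if (timings == nullptr || !doTiming)
    {
        return;
    }
    assert(timers != nullptr);

    // Pick the counter matching the kind of kernel that ran this step.
    SketchKernelTiming& k = timings->kernel[doNeighbourSearch ? 1 : 0][didEnergyKernels ? 1 : 0];
    k.timeMs += timers->lastKernelMs;
    k.count += 1;

    timings->d2hTimeMs += timers->lastD2hMs;
    timings->stepCount += 1;
}
```

Keeping separate counters per flag combination lets a report distinguish the cost of energy and neighbour-search steps from that of plain force-only steps.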
template<typename StagingData >
static void nbnxn_gpu_reduce_staged_outputs (const StagingData &nbst, int iLocality, bool reduceEnergies, bool reduceFshift, real *e_lj, real *e_el, rvec *fshift)
inline static
Reduce data staged internally in the nbnxn module.
Shift forces and electrostatic/LJ energies copied from the GPU into a module-internal staging area are reduced into the CPU-side buffers passed in, immediately after the wait for the transfers' completion.
Note that this function should always be called after the transfers into the staging buffers have completed.
- Template Parameters
-
StagingData | Type of staging data |
- Parameters
-
[in] | nbst | Nonbonded staging data |
[in] | iLocality | Interaction locality specifier |
[in] | reduceEnergies | True if energy reduction should be done |
[in] | reduceFshift | True if shift force reduction should be done |
[out] | e_lj | Variable to accumulate LJ energy into |
[out] | e_el | Variable to accumulate electrostatic energy into |
[out] | fshift | Pointer to the array of shift forces to accumulate into |
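As a rough illustration of the reduction step, the sketch below accumulates staged energies and shift forces into caller-provided buffers. SketchReal, SketchVec3, SketchStaging and reduceStagedOutputsSketch are hypothetical names made up for this example (they stand in for real, rvec and the StagingData template parameter), and the sketch omits the interaction locality argument. It is not the GROMACS code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the GROMACS types.
using SketchReal = float;
using SketchVec3 = std::array<SketchReal, 3>;

struct SketchStaging
{
    SketchReal              e_lj = 0; // staged LJ energy
    SketchReal              e_el = 0; // staged electrostatic energy
    std::vector<SketchVec3> fshift;   // staged shift forces
};

// Accumulate staged values into the output buffers.
// Must only be called once the transfers into the staging area are complete.
inline void reduceStagedOutputsSketch(const SketchStaging& nbst,
                                      bool                 reduceEnergies,
                                      bool                 reduceFshift,
                                      SketchReal*          e_lj,
                                      SketchReal*          e_el,
                                      SketchVec3*          fshift)
{
    if (reduceEnergies)
    {
        assert(e_lj != nullptr && e_el != nullptr);
        *e_lj += nbst.e_lj; // accumulate, do not overwrite
        *e_el += nbst.e_el;
    }
    if (reduceFshift)
    {
        assert(fshift != nullptr);
        // The output array is assumed to hold at least nbst.fshift.size() entries.
        for (std::size_t i = 0; i < nbst.fshift.size(); i++)
        {
            for (std::size_t d = 0; d < 3; d++)
            {
                fshift[i][d] += nbst.fshift[i][d];
            }
        }
    }
}
```

Because the outputs are accumulated into rather than overwritten, the caller decides when they are zeroed, which allows several contributions to be summed into the same buffers.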
Attempts to complete nonbonded GPU task.
This function attempts to complete the nonbonded task (both GPU and CPU auxiliary work). Success, i.e. that the tasks completed and results are ready to be consumed, is signaled by the return value (always true if blocking wait mode requested).
The completionKind parameter controls whether the behavior is non-blocking (achieved by passing GpuTaskCompletion::Check) or a blocking wait until the results are ready (when GpuTaskCompletion::Wait is passed). Because in "Check" mode the function returns immediately if the GPU stream still contains tasks that have not completed, it allows more flexible overlapping of work on the CPU with GPU execution.
Note that it is only safe to use the results, and to continue to the next MD step, when this function has returned true, which indicates successful completion of:
- all nonbonded GPU tasks: both compute and device transfer(s)
- auxiliary tasks: updating the internal module state (timing accumulation, list pruning states) and
- internal staging reduction of (fshift, e_el, e_lj).
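The check-versus-wait behavior and the caller-side overlap it enables can be sketched as follows. SketchCompletion, SketchStream, tryFinishSketch and overlapCpuWorkSketch are hypothetical names invented for this example; the stream is a plain counter rather than a real GPU stream, and GPU progress is simulated by decrementing it. This is an illustration of the pattern under those assumptions, not the GROMACS implementation.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; none of these are GROMACS types.
enum class SketchCompletion
{
    Check, // non-blocking: report whether the work has finished
    Wait   // blocking: return only once the work has finished
};

struct SketchStream
{
    int pendingTasks = 0;

    bool hasPendingTasks() const { return pendingTasks > 0; }
    // Stands in for a blocking stream synchronization.
    void synchronize() { pendingTasks = 0; }
};

// Returns true only when the results are ready to be consumed.
inline bool tryFinishSketch(SketchStream* stream, SketchCompletion kind)
{
    assert(stream != nullptr);
    if (kind == SketchCompletion::Check)
    {
        if (stream->hasPendingTasks())
        {
            return false; // not done yet; the caller may do other CPU work
        }
    }
    else
    {
        stream->synchronize(); // blocking mode always completes
    }
    // Timing accumulation and staged-output reduction would happen here.
    return true;
}

// Caller-side pattern: poll, do a chunk of CPU work, then fall back to waiting.
// Returns the number of CPU work chunks overlapped with the (simulated) GPU work.
inline int overlapCpuWorkSketch(SketchStream* stream, int cpuChunksAvailable)
{
    int  chunksDone = 0;
    bool finished   = tryFinishSketch(stream, SketchCompletion::Check);
    while (!finished && chunksDone < cpuChunksAvailable)
    {
        ++chunksDone;           // pretend to do one chunk of CPU work
        --stream->pendingTasks; // simulate the GPU finishing one task meanwhile
        finished = tryFinishSketch(stream, SketchCompletion::Check);
    }
    if (!finished)
    {
        // Out of CPU work: block until the results are ready.
        finished = tryFinishSketch(stream, SketchCompletion::Wait);
    }
    assert(finished);
    return chunksDone;
}
```

In this sketch the finalization step runs exactly once, on the call that returns true, which mirrors the rule that results may only be used after a successful return.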
TODO: improve the handling of outputs, e.g. by ensuring that this function explicitly returns the force buffer (instead of that being passed only to nbnxn_gpu_launch_cpyback()) and by returning the energy and Fshift contributions for some external/centralized reduction.
- Parameters
-
[in] | nb | The nonbonded data GPU structure |
[in] | flags | Force flags |
[in] | aloc | Atom locality identifier |
[in] | haveOtherWork | Tells whether there is other work than non-bonded work in the nbnxn stream(s) |
[out] | e_lj | Pointer to the LJ energy output to accumulate into |
[out] | e_el | Pointer to the electrostatics energy output to accumulate into |
[out] | fshift | Pointer to the shift force buffer to accumulate into |
[in] | completionKind | Indicates whether nonbonded task completion should only be checked rather than waited for |
- Returns
- True if the nonbonded tasks associated with the aloc locality have completed