Performance improvements#
Increased default T- and P-coupling intervals#
The default maximum values temperature and pressure coupling intervals have been increased from 10 to 100 steps. These values are used when the default value of -1 is specified in the mdp file and a lower value is used when required for accurate integration. The improves the performance of both GPU runs and parallel runs.
The global communication frequency is independent of nstlist#
The global communication frequency no longer depends on nstlist. This can improve performance in simulations using GPUs in particular.
PME decomposition support with CUDA backend#
PME decomposition support has been added to the CUDA backend. With PME offloaded to the GPU, the number of PME ranks can
now be configured with -npme
option (previously limited to 1). The implementation requires building GROMACS
with CUDA-aware MPI and with NVIDIA’s cuFFTMp library. GPU-based PME decomposition support still lacks substantial testing,
hence is included in the current release as an experimental feature and should be used with caution (with results compared to
those from equivalent runs using a single PME GPU). This feature can be enabled using the GMX_GPU_PME_DECOMPOSITION
environment
variable. The GROMACS development team welcomes any feedback to help mature this feature.
CUDA Graphs for GPU-resident Steps#
New CUDA functionality has been introduced, allowing GPU activities
to be launched as a single CUDA graph on each step rather than multiple
activities scheduled to multiple CUDA streams. It only works for those
cases which already support GPU-resident steps (where all force and
update calculations are GPU-accelerated). This offers performance
advantages, especially for small cases, through reduction in both CPU
and GPU side scheduling overheads. The feature can optionally be
activated via the GMX_CUDA_GRAPH
environment variable.