New features

Added support for OpenCL acceleration

StreamComputing has implemented OpenCL support for accelerating the short-ranged non-bonded interactions, which were previously accelerated in GROMACS with CUDA. Supported OpenCL 1.1 devices include GCN-based AMD GPUs, where performance is good, and NVIDIA GPUs, where performance with PME is currently not good. Currently only a one-to-one mapping of PP ranks to GPUs is supported (i.e. with thread-MPI). With real MPI, only one PP rank per node (and thus one GPU) is supported. We expect to be able to improve support and performance in future GROMACS versions.

Added support for inter-molecular bonded interactions

The .top file can now contain an [ intermolecular_interactions ] directive, after which bonded interactions can be entered using global atom indices. See the topologies chapter in the reference manual for details.

Added flat-bottomed potential to pull-coordinate types
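
As a rough illustration of what "flat-bottomed" means, the sketch below evaluates a generic one-sided restraint: no energy or force while the coordinate is at or below a reference value, and a harmonic penalty above it. The function name, the one-sided form and the parameter names `x0` and `k` are assumptions made for this example, not taken from the GROMACS source; the reference manual describes the form actually implemented.

```python
def flat_bottom_potential(x, x0, k):
    """Return (energy, force) for a generic one-sided flat-bottomed restraint.

    Assumed form (illustrative only): zero for x <= x0,
    harmonic with force constant k for x > x0.
    """
    dx = x - x0
    if dx <= 0.0:
        # Inside the flat region: no energy, no force.
        return 0.0, 0.0
    # Outside the flat region: harmonic penalty pulling back towards x0.
    return 0.5 * k * dx * dx, -k * dx
```

With `x0 = 1.0` and `k = 1000.0`, a coordinate at 0.5 feels nothing, while one at 1.2 is pushed back towards the reference value.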

Allowed pull groups of 1 atom with mass 0

Unless constraint pulling is used, a pull group consisting of 1 atom can have mass 0, since the mass of the COM is irrelevant. This is useful for pulling on a virtual site.

Changed the way pull directions are selected

Pull types and geometries are now selected on a per-coordinate basis, so that different pull coordinates can have different pull types and geometries. The pull .mdp option now takes simply yes or no, and pull-coord1-type and pull-coord1-geometry play the role of the old pull and pull-geometry options.

Improved pull geometry cylinder

Cylinder geometry now uses a smooth radial weight function and adds radial forces, so it now conserves energy. Issue 1590.
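
To show why smoothness matters here, the sketch below uses one polynomial weight that goes to zero at the cylinder radius together with its first derivative, so atoms crossing the boundary do not cause a jump in the weight or in its radial derivative. The specific polynomial and the names `radial_weight` and `r_c` are assumptions chosen for illustration; consult the reference manual for the function GROMACS actually uses.

```python
def radial_weight(r, r_c):
    """Smooth radial weight for an atom at distance r from the cylinder axis.

    Assumed illustrative form: w = 1 - 2 (r/r_c)^2 + (r/r_c)^4 inside the
    cylinder, 0 outside. Both w and dw/dr vanish at r = r_c.
    """
    if r >= r_c:
        return 0.0
    s = (r / r_c) ** 2
    return 1.0 - 2.0 * s + s * s
```

Because the weight depends on the radial position, its derivative contributes a radial force term, which is the part a hard cut-off weighting cannot account for.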

Expanded pulling output options

There is now also an option for printing the COM of the second group, as well as the reference value for all pull coordinates. Writing the distance components to the pull coordinate output is now optional (off by default).

Allowed configure-time specification of target GPU architectures

With CUDA support enabled, by default the GROMACS build system will compile CUDA kernels for all supported hardware. However, this takes a long time and many of the flavours will not be used. For convenience in cases where the GPU model is known in advance, the cmake variables GMX_CUDA_TARGET_SM and GMX_CUDA_TARGET_COMPUTE allow setting the GPU architectures and virtual architectures, respectively. For details of what to put here, see the background information.

Added single-accuracy SIMD double math functions

Apart from double-precision SIMD variables typically being half the width of single, the math functions are considerably more expensive due to higher-order polynomials, which can drop the throughput to 25% of single. In some cases we do not need the full double precision in SIMD operations, so these new math functions use double-precision SIMD variables but only target single-precision accuracy, which can improve performance twofold. The patch also makes the target precision in single and double SIMD an advanced CMake variable, and the unit test tolerance is set based on these variables. This can be used (decided by the user) for a few platforms where the rsqrt/inv table lookups provide one bit too little to get by with a single N-R iteration based on our default target accuracy of 22 bits.
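
The "one bit too little" remark can be checked with a small numerical sketch. One Newton-Raphson (N-R) step for 1/sqrt(x) roughly doubles the number of accurate bits but loses a fraction of a bit, so a lookup that is accurate to 11 bits lands just short of 22 bits. The code below is plain Python rather than SIMD code, and the way the table lookup is emulated (an exact value with a fixed relative error) is an assumption made for the example.

```python
import math


def rsqrt_newton_step(x, y):
    """One Newton-Raphson refinement of y as an approximation of 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)


def accurate_bits(x, y):
    """Number of accurate bits in y, measured by relative error to 1/sqrt(x)."""
    exact = 1.0 / math.sqrt(x)
    rel_err = abs(y - exact) / exact
    return -math.log2(rel_err)


def bits_after_one_step(x, lookup_bits):
    """Emulate a table lookup with the given accuracy, then refine it once."""
    exact = 1.0 / math.sqrt(x)
    # Hypothetical lookup result: exact value with relative error 2^-lookup_bits.
    lookup = exact * (1.0 + 2.0 ** -lookup_bits)
    return accurate_bits(x, rsqrt_newton_step(x, lookup))
```

Under these assumptions, `bits_after_one_step(2.0, 11)` falls between 21 and 22 bits, while `bits_after_one_step(2.0, 12)` exceeds 22 bits, which is why lowering the target accuracy slightly can save a second iteration on such platforms.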

Added new tests

Several new test cases have been added, including tests that plain multi-simulation, replica exchange and empty domains work.

Improved parallel distribution and thread count reporting

The domain decomposition no longer prints the atom count for all ranks (which gets far too long at high parallelization), but rather average, standard deviation, min and max.
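
The kind of summary described above can be sketched as follows. The function name, the dictionary layout and the use of the population standard deviation are assumptions for illustration; they do not reproduce the actual mdrun log format or code.

```python
import statistics


def summarize_atom_counts(counts):
    """Condense per-rank atom counts into four summary statistics."""
    return {
        "average": statistics.mean(counts),
        # Population standard deviation; assumed here, mdrun may differ.
        "stddev": statistics.pstdev(counts),
        "min": min(counts),
        "max": max(counts),
    }
```

For example, `summarize_atom_counts([100, 110, 90, 100])` reduces four per-rank values to one short record, and the output stays the same size however many ranks are used.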

Implemented zsh shell completions for gmx command

New fatal error to stop benchmarking runs if tuning is still active

The reset of mdrun counters (which can be triggered in various ways) could occur while tuning was still active, which is useless and potentially wrong. This now emits a fatal error, so the user can judge how to run the benchmark more stably or for a longer time.

Issue 1781