New features
============

Added support for OpenCL acceleration
-------------------------------------

StreamComputing (http://www.streamcomputing.eu) has implemented OpenCL support for the short-ranged non-bonded interactions that GROMACS previously accelerated with CUDA. Supported OpenCL 1.1 devices include GCN-based AMD GPUs, where performance is good, and NVIDIA GPUs, where performance with PME is currently poor. Currently, only a one-to-one mapping of PP ranks to GPUs is supported (i.e. with thread-MPI). With real MPI, only one PP rank per node (and thus one GPU) is supported. We expect to improve support and performance in future GROMACS versions.

Added support for inter-molecular bonded interactions
-----------------------------------------------------

The .top file can now contain an ``[ intermolecular_interactions ]`` directive, after which bonded interactions can be entered using global atom indices. See the topologies chapter in the reference manual for details.

Added flat-bottomed potential to pull-coordinate types
------------------------------------------------------

Allowed pull groups of 1 atom with mass 0
-----------------------------------------

Unless constraint pulling is used, a pull group consisting of 1 atom can have mass 0, since the center of mass (COM) of such a group is simply the position of its atom and the mass is irrelevant. This is useful for pulling on a virtual site.

Changed the way pull directions are selected
--------------------------------------------

Pull types and geometries are now selected on a per-coordinate basis, so that different pull coordinates can have different pull types and geometries. The ``pull`` .mdp option now simply takes ``yes`` or ``no``, and ``pull-coord1-type`` and ``pull-coord1-geometry`` take over the roles of the old ``pull`` and ``pull-geometry`` options.
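To illustrate the per-coordinate settings, the following C sketch writes the relevant .mdp lines for pull coordinates that each carry their own type and geometry. The ``pull_coord_t`` struct and the ``write_pull_mdp`` function are hypothetical helpers, not part of GROMACS, and the output is only a fragment: a usable .mdp file also needs the pull-group and coordinate-group options, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of one pull coordinate: each coordinate has its
 * own type and geometry instead of sharing global pull settings. */
typedef struct {
    const char *type;     /* e.g. "umbrella" or "constraint" */
    const char *geometry; /* e.g. "distance" or "direction" */
} pull_coord_t;

/* Writes the per-coordinate .mdp lines for coords[0..ncoords-1] into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
static int write_pull_mdp(char *buf, size_t size,
                          const pull_coord_t *coords, int ncoords)
{
    assert(buf != NULL && coords != NULL && ncoords >= 0);

    /* "pull" is now just an on/off switch */
    int n = snprintf(buf, size, "pull = yes\npull-ncoords = %d\n", ncoords);
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    size_t used = (size_t)n;

    for (int i = 0; i < ncoords; i++) {
        /* .mdp coordinate indices start at 1 */
        n = snprintf(buf + used, size - used,
                     "pull-coord%d-type = %s\npull-coord%d-geometry = %s\n",
                     i + 1, coords[i].type, i + 1, coords[i].geometry);
        if (n < 0 || (size_t)n >= size - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return (int)used;
}
```

For example, passing an ``{ "umbrella", "distance" }`` coordinate followed by a ``{ "constraint", "direction" }`` coordinate yields ``pull-coord1-type = umbrella`` and ``pull-coord2-type = constraint`` in the same fragment, which the old global ``pull`` option could not express.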
Improved pull geometry cylinder
-------------------------------

Cylinder geometry now uses a smooth radial weight function and adds radial forces, so that it now conserves energy. :issue:`1590`

Expanded pulling output options
-------------------------------

There are now also options for printing the COM of the second group and the reference value of all pull coordinates. Writing the distance components to the pull coordinate output is now optional (off by default).

Allowed configure-time specification of target GPU architectures
----------------------------------------------------------------

With CUDA support enabled, the GROMACS build system by default compiles CUDA kernels for all supported hardware. However, this takes a long time, and many of the compiled flavours will never be used. For convenience when the GPU model is known in advance, the CMake variables ``GMX_CUDA_TARGET_SM`` and ``GMX_CUDA_TARGET_COMPUTE`` allow setting the GPU architectures and virtual architectures, respectively. For details of what to put here, see the `background information `_.

Added single-accuracy SIMD double math functions
------------------------------------------------

Apart from double-precision SIMD variables typically being half the width of single-precision ones, the double-precision math functions are considerably more expensive because they use higher-order polynomials, which can drop the throughput to 25% of single precision. In some cases we do not need full double precision in SIMD operations, so these new math functions use double-precision SIMD variables but only target single-precision accuracy, which can improve performance twofold.

This change also makes the target precision in single and double SIMD an advanced CMake variable, and the unit-test tolerances are set based on these variables. Users can choose to adjust it on the few platforms where the rsqrt/inv table lookups provide one bit too few to reach our default target accuracy of 22 bits with a single Newton-Raphson iteration.
Added new tests
---------------

Several new test cases have been added, including tests that plain multi-simulation, replica exchange and empty domains work.

Improved parallel distribution and thread count reporting
---------------------------------------------------------

The domain decomposition no longer prints the atom count for every rank (which becomes far too long at high parallelization), but rather the average, standard deviation, minimum and maximum.

Implemented zsh shell completions for gmx command
-------------------------------------------------

New fatal error to stop benchmarking runs if tuning is still active
-------------------------------------------------------------------

The mdrun counters could be reset (which can be triggered in various ways) while tuning was still active, which is useless and potentially gives wrong results. This now emits a fatal error, so that the user can judge how to run the benchmark more stably or for longer. :issue:`1781`
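The new check can be pictured as a guard in front of the counter reset. In the C sketch below, ``run_state_t`` and ``reset_counters`` are hypothetical names, not GROMACS internals. The sketch returns an error code instead of aborting, so that the behaviour is easy to test, whereas mdrun itself stops with a fatal error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical run state; names are illustrative only. */
typedef struct {
    bool tuning_active;     /* true while a tuner is still adjusting the run */
    long long step_counter; /* performance counter that a reset would zero */
} run_state_t;

/* Resets the counters and returns 0, or prints an error and returns -1
 * (leaving the counters untouched) if tuning is still active. */
static int reset_counters(run_state_t *state)
{
    assert(state != NULL);
    if (state->tuning_active) {
        fprintf(stderr,
                "Fatal error: counter reset requested while tuning is "
                "still active; run longer or reset the counters later\n");
        return -1;
    }
    state->step_counter = 0;
    return 0;
}
```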