These document fixes for issues that have been fixed for the 2016 release, but which have not been back-ported to other branches.
Fixed Verlet buffer calculation with nstlist=1¶
Under rare circumstances the Verlet buffer calculation code was called with nstlist=1, which caused a division by zero. The division by zero is now avoided. Furthermore, grompp now also determines and prints the Verlet buffer sizes with nstlist=1, which provider the user information and adds consistency checks.
Fixed large file issue on 32-bit platforms¶
At some point gcc started to issue a warning instead of a fatal error for the checking code; fixed to really generate an error now.
Avoided using abort() for fatal errors¶
This avoids situations that produce useless core dumps.
Fixed possible division by zero in polarization code¶
Avoided numerical overflow with overlapping atoms in Verlet scheme¶
The Verlet-scheme kernels did not allow overlapping atoms, even if they were not interacting (in contrast to the group kernels). Fixed by clamping the interaction distance so it can not become smaller than ~6e-4 in single and ~1e-18 in double, and when this number is later multiplied by zero parameters it will not influence forces. The clamping should never affect normal interactions; mdrun would previously crash for distances that were this small. On Haswell, RF and PME kernels get 3% and 1% slower, respectively. On CUDA, RF and PME kernels get 1% and 2% faster, respectively.
Relax pull PBC check¶
The check in the pull code for COM distances close to half the box was too strict for directional pulling. Now dimensions orthogonal to the pull vector are no longer checked. (The check was actually not strict enough for directional pulling along x or y in triclinic units cells, but that is a corner case.) Furthermore, the direction-periodic hint is now only printed with geometry direction.
Add detection for ARMv7 cycle counter support¶
ARMv7 requires special kernel settings to allow cycle counters to be read. This change adds a cmake setting to enable/disable counters. On all architectures but ARMv7 it is enabled by default, and on ARMv7 we run a small test program to see if the can be executed successfully. When cross-compiling to ARMv7 counters will be disabled, but either choice can be overridden by setting a value for GMX_CYCLECOUNTERS in cmake.
Introduced fatal error for too few frames in
gmx dos from crashing with an incomprehensible error
message when there are too few frames, test for this.
Part of Issue 1813
Properly reset CUDA application clocks¶
We now store the application clock values we read when starting mdrun and reset to these values, but only when clocks have not been changed (by another process) in the meantime.
Fixed replica-exchange debug output to all go to the debug file¶
mdrun -debug was selected with replica exchange, some of the
order description was printed to mdrun’s log file, but it looks like the
actual numbers were being printed to the debug log. This puts them
both in the debug log.
Fixed gmx mdrun -membed to always run on a single rank¶
This used to give a fatal error if default thread-MPI mdrun had chosen more than one rank, but it will now correctly choose to use a single rank.
Fixed issues with using int for number of simulation steps¶
Mostly we use a 64-bit integer, but we messed up a few things.
During mdrun -rerun, edr writing complained about the negative step number, implied it might be working around it, and threatened to crash, which it can’t do. Silenced the complaint during writing, and reduced the scope of the message when reading.
Fixed TNG wrapper routines to pass a 64-bit integer like they should.
Made various infrastructure use gmx_int64_t for consistency, and noted where in a few places the practical range of the value stored in such a type is likely to be smaller. We can’t extend the definition of XTC or TRR, so there is no proper solution available. TNG is already good, though.
Fixed trr magic-number reading¶
The trr header-reading routine returned an “OK” value even if the magic number was wrong, which might lead to chaotic results everywhere. This led to problems if other code (e.g. cpptraj) mistakenly wrote a wrong-endian trr file, which was then used with GROMACS. (This should never be a thing for XDR files, which are defined to be big endian, but such code has existed.)
Changed to use only legal characters in OpenCL cache filename¶
The option to cache JIT-compiled OpenCL short-ranged kernels needed to be hardened, so that mdrun would write files whose names would usually be specific to the device, but also only contain filenames that would work everywhere, ie only alphanumeric characters from the current locale.
Fixes for bugs introduced during development¶
These document fixes for issues that were identified as having been introduced into the release-2016 branch since it diverged from release-5-1. These will not appear in the final release notes, because no formal release is thought to have had the problem. Of course, the tracked issues remain available should further discussion arise.
Fixed bug in v-rescale thermostat & replica exchange¶
Commit 2d0247f6 made random numbers for the v-rescale thermostat that did not vary over MD steps, and similarly the replica-exchange random number generator was being reset in the wrong place.
Fixed vsite bug with MPI+OpenMP¶
The recent commit b7e4f30d caused non-local virtual sites not be treated when using OpenMP. This means their coordinates lagged one step behind and their forces are not spread to the atoms, leading to small errors in the forces. Note that non-local virtual sites are only used when local virtual sites use them as a constructing atom; the most common case is a C/N in a CH3/NH3 group with vsite H’s. Also added a check on the vsite count for debug builds.
Fixed some thread affinity cases¶
Fixed one deadlock in newly refactored thread-affinity code, which happened with automatic pinning, if only part of the nodes were full.
There is one deadlock still theoretically possible: if thread-MPI reports that setting the affinity is not possible only on a subset of ranks, the code deadlocks. This has always been there and might never happen, so it is not fixed here.
Removed OpenMP overhead at high parallelization¶
Commit 6d98622d introduced OpenMP parallelization for for loops clearing rvecs of increasing rvecs. For small numbers of atoms per MPI rank this can increase the cost of the loop by up to a factor 10. This change disables OpenMP parallelization at low atom count.
We should not use std::thread::hardware_concurrency() for determining the logical processor count, since it only provides a hint. Note that we still have 3 different sources for this count left.
Fixed data race in hwinfo with thread-MPI¶
Fixes for Power7 big-endian¶
Now compiles and passes all tests in both double and single precision with gcc 4.9.3, 5.4.0 and 6.1.0 for big-endian VSX.
The change for the code in incrStoreU and decrStoreU addresses an apparent regression in 6.1.0, where the compiler thinks the type returned by vec_extract is a pointer-to-float, but my attempts a reduced test case haven’t reproduced the issue.
Added some test cases that might hit more endianness cases in future.
We have not been able to test this on little-endian Power8; there is a risk the gcc-specific permutations could be endian-sensitive. We’ll test this when we have hardware access, or if somebody runs the tests for us.
Reduce hwloc & cpuid test requirements¶
On some non-x86 linux platforms hwloc does not report caches, which means it will fail our strict test requirements of full topology support. There is no problem whatsoever with this, so we reduce the test to only require basic support from hwloc - this is still better than anything we can get ourselves. Similarly for CPUID, it is not an error for an architecture to not provide any of the specific flags we have defined, so avoid marking it as such.
Work around compilation issue with random test on 32-bit machines¶
gcc 4.8.4 running on 32-bit Linux fails a few tests for random distributions. This seems to be caused by the compiler doing something strange (that can lead to differences in the lsb) when we do not use the result as floating-point values, but rather do exact binary comparisions. This is valid C++, and bad behaviour of the compiler (IMHO), but technically it is not required to produce bitwise identical results at high optimization. However, by using floating-point tests with zero ULP tolerance the problem appears to go away.
gmx wham for the new pull setup¶
gmx wham up to date with the new pull setup where the pull
type and geometry can now be set per coordinate and the pull
coordinate has changed and is more configurable.
Fix membed with partial revert of 29943f¶
The membrane embedding algorithm must be initialized before we call init_forcerec(), so it cannot trivially be moved into do_md(). This has to be cleaned up anyway for release-2017 since we will remove the group scheme be then, but for now this fix will allow us have the method working in release-2016.