GROMACS programs may be influenced by the use of
environment variables. First of all, the variables set in the
GMXRC file are essential for running and
compiling GROMACS. Some other useful environment variables are
listed in the following sections. Most environment variables function
by being set in your shell to any non-NULL value. Specific
requirements are described below if other values need to be set. You
should consult the documentation for your shell for instructions on
how to set environment variables in the current shell, or in configuration
files for future shells. Note that requirements for exporting
environment variables to jobs run under batch control systems vary and
you should consult your local documentation for details.
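As a minimal sketch of the "set to any non-NULL value" convention, the Python snippet below copies the current environment, sets a flag-style variable to `1`, and starts a child process that reports what it sees. The child is a Python one-liner standing in for a GROMACS tool, and `GMX_ENABLE_DIRECT_GPU_COMM` is used only because it is one of the flag variables named later in this document; the helper names `env_with_flags` and `child_sees` are hypothetical.

```python
import os
import subprocess
import sys


def env_with_flags(*names, base=None):
    """Return a copy of the environment with each named flag set to "1"."""
    env = dict(os.environ if base is None else base)
    for name in names:
        env[name] = "1"  # any non-empty value enables a flag-style variable
    return env


def child_sees(name, env):
    """Run a stand-in child process and return the value it sees for `name`."""
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    env = env_with_flags("GMX_ENABLE_DIRECT_GPU_COMM")
    print(child_sees("GMX_ENABLE_DIRECT_GPU_COMM", env))
```

The variable is set on a copy, so the parent process is left unchanged; only the launched child inherits the flag, which mirrors setting a variable for a single command in a shell.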
GROMACS automatically backs up old copies of files when trying to write a new file of the same name, and this variable controls the maximum number of backups that will be made (default 99). If set to 0, GROMACS fails to run if any output file already exists. If set to -1, it overwrites any output file without making a backup.
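The three cases described above can be modelled in a few lines of Python. This is only a sketch of the documented policy, not GROMACS code: the function name `backup_action` and its return labels are hypothetical, and the outcome when the backup limit is already reached is not stated in the text, so it is marked as an assumption.

```python
def backup_action(max_backups, existing_backups):
    """Decide what happens when the output file name is already taken.

    max_backups: value of the backup-limit variable (99 if unset).
    existing_backups: number of backup copies already on disk.
    """
    if max_backups == -1:
        return "overwrite"      # replace the file, keep no backup
    if max_backups == 0:
        return "fail"           # refuse to run because the output exists
    if existing_backups < max_backups:
        return "backup"         # move the old file aside, then write
    return "limit-reached"      # ASSUMPTION: not specified in the text above
```

For example, `backup_action(99, 3)` yields `"backup"`, while `backup_action(0, 0)` yields `"fail"`.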
if this is explicitly set, no cool quotes will be printed at the end of a program.
prevent dumping of step files when, for example, the simulation blows up because a constraint algorithm fails.
dump all configurations to a pdb file that have an interaction energy less than the value set in this environment variable.
GMX_VIEW_PDB, commands used to automatically view xvg, eps and pdb file types, respectively; they default to
rasmol. Set to empty to disable automatic viewing of a particular file type. The command will be forked off and run in the background at the same priority as the GROMACS tool (which might not be what you want). Be careful not to use a command which blocks the terminal (e.g.
vi), since multiple instances might be run.
the size of the buffer for file I/O. When set to 0, all file I/O will be unbuffered and therefore very slow. This can be handy for debugging purposes, because it ensures that all files are always totally up-to-date.
set display color for logo in gmx view.
use long float format when printing decimal values.
Applies to computational electrophysiology setups only (see reference manual). The initial structure is dumped to a pdb file, which makes it possible to check whether multimeric channels have the correct PBC representation.
Defaults to 1, which prints frame count e.g. when reading trajectory files. Set to 0 for quiet operation.
Enables GPU timings in the log file for CUDA and SYCL. Note that CUDA timings are incorrect with multiple streams, as happens with domain decomposition or with both non-bondeds and PME on the GPU (this is also the main reason why they are not turned on by default).
Disables GPU timings in the log file for OpenCL.
number of steps that elapse between dumping the current DD to a PDB file (default 0). This only takes effect during domain decomposition, so it should typically be 0 (never), 1 (every DD phase) or a multiple of
number of steps that elapse between dumping the current DD grid to a PDB file (default 0). This only takes effect during domain decomposition, so it should typically be 0 (never), 1 (every DD phase) or a multiple of
general debugging trigger for every domain decomposition (default 0, meaning off). Currently only checks global-local atom index mapping for consistency.
over-ride the number of DD pulses used (default 0, meaning no over-ride). Normally 1 or 2.
disables the specialized polling wait path used to wait for completion of the PME and nonbonded GPU tasks, which overlaps the wait with the reduction of whichever resulting forces arrive first. Setting this variable switches to the generic path with a fixed waiting order.
sets the number of GPUs required by the test suite. By default, the test suite falls back to using the CPU if GPUs cannot be detected. Set it to a positive integer value to ensure that at least this number of usable GPUs are detected. Default: 0 (not testing GPU availability).
There are a number of extra environment variables like these that are used in debugging - check the code!
Performance and Run Control
planetary simulations are made possible (just for fun) by setting this environment variable, which allows setting epsilon-r to -1 in the mdp file. Normally, epsilon-r must be greater than zero to prevent a fatal error. See webpage for example input files for a planetary simulation.
Value of the number of threads per rank from which to switch from uniform to localized bonded interaction distribution; optimal value dependent on system and hardware, default value is 4.
Controls the use of the domain decomposition machinery when using a single MPI rank. Value 0 turns DD off, 1 turns DD on. Default is automated choice based on heuristics.
force the use of twin-range cutoff kernel even if
rcoulomb after PP-PME load balancing. The switch to twin-range kernels is automated, so this variable should be used only for benchmarking.
force the use of analytical Ewald kernels. Should be used only for benchmarking.
force the use of tabulated Ewald kernels. Should be used only for benchmarking.
Removed, use GMX_ENABLE_DIRECT_GPU_COMM instead.
Removed, use GMX_ENABLE_DIRECT_GPU_COMM instead.
Enable direct GPU communication in multi-rank parallel runs. Use as an override for cases which do not default to using this feature; currently used to enable GPU communication with CUDA-aware MPI.
Disable direct GPU communication in multi-rank parallel runs. Use as an override for cases which default to using this feature; currently used to disable GPU communication with thread-MPI.
disable synchronizations between different GPU streams in SYCL build, instead relying on SYCL runtime to do scheduling based on data dependencies. Experimental.
partition the GPUs that support it into sub-devices, and treat each one as an independent device. GPUs that can not be split are ignored. Intended for use with multi-tile GPUs.
times all code during runs. Incompatible with threads.
calls MPI_Barrier before each cycle start/stop call.
build domain decomposition cells in the order (z, y, x) rather than the default (x, y, z).
during constraint and vsite communication, use a pair of MPI_Sendrecv calls instead of two simultaneous non-blocking calls (default 0, meaning off). Might be faster on some MPI implementations.
do domain-decomposition dynamic load balancing based on flop count rather than measured time elapsed (default 0, meaning off). This makes the load balancing reproducible, which can be useful for debugging purposes. A value of 1 uses the flops; a value > 1 adds (value - 1)*5% of noise to the flops to increase the imbalance and the scaling.
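The noise rule stated above is simple arithmetic, and a short Python sketch makes it concrete. The function name `flop_noise_fraction` is hypothetical, and treating values below 1 as "feature off" follows the stated default of 0.

```python
def flop_noise_fraction(value):
    """Return the noise fraction added to the flop count, or None if off.

    value: integer setting of the flop-based load-balancing variable.
    """
    if value < 1:
        return None              # default 0: flop-based balancing is off
    return (value - 1) * 0.05    # value 1 -> no noise, each step adds 5%
```

With this rule a setting of 1 gives reproducible balancing with no noise, a setting of 3 adds 10% noise, and a setting of 5 adds 20%.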
maximum percentage box scaling permitted per domain-decomposition load-balancing step (default 10)
record DD load statistics for reporting at end of the run (default 1, meaning on)
when set, print slightly more detailed performance information to the log file. The resulting output matches the way the performance summary was reported in versions 4.5.x and thus may be useful for anyone using scripts to parse log files or standard output.
disables architecture-specific SIMD-optimized (SSE2, SSE4.1, AVX, etc.) non-bonded kernels thus forcing the use of plain C kernels.
timing of asynchronously executed GPU operations can have a non-negligible overhead with short step times. Disabling timing can improve performance in these cases. Timings are disabled by default with CUDA and SYCL.
when set, disables GPU detection even if gmx mdrun was compiled with GPU support.
the number of systems for distance restraint ensemble averaging. Takes an integer value.
emulate GPU runs by using algorithmically equivalent CPU reference code instead of GPU-accelerated functions. As the CPU code is slow, it is intended to be used only for debugging purposes.
disable exiting upon encountering a corrupted frame in an edr file, allowing the use of all frames up until the corruption.
update forces when invoking
Override the result of build- and runtime CUDA-aware MPI detection and force the use of direct GPU MPI communication. Aimed at cases where the user knows that the MPI library is CUDA-aware, but GROMACS is not able to detect this.
Force update to run on the GPU by default, overriding the
mdrun -update auto option. Works similarly to setting mdrun -update gpu, but (1) falls back to the CPU code path if set with input that is not supported and (2) can be used to run update on GPUs in multi-rank cases. The latter case should be considered experimental since it lacks substantial testing. Also, GPU update is only supported with GPU direct communication, so the GMX_FORCE_UPDATE_DEFAULT_GPU variable should be set together with the GMX_ENABLE_DIRECT_GPU_COMM environment variable in multi-rank cases using library-MPI. Does not override mdrun -update cpu.
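The interplay described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not GROMACS logic: the helper names `gpu_update_env` and `requested_update` and the returned labels are hypothetical, and the flag value `1` is arbitrary since any non-empty value counts as set.

```python
def gpu_update_env(multi_rank_library_mpi, base=None):
    """Build the environment for forcing update onto the GPU by default."""
    env = dict(base or {})
    env["GMX_FORCE_UPDATE_DEFAULT_GPU"] = "1"
    if multi_rank_library_mpi:
        # Multi-rank runs with library-MPI also need direct GPU communication.
        env["GMX_ENABLE_DIRECT_GPU_COMM"] = "1"
    return env


def requested_update(update_option, env):
    """Model which update path is requested for a given -update option."""
    if update_option in ("cpu", "gpu"):
        return update_option               # explicit options are not overridden
    if env.get("GMX_FORCE_UPDATE_DEFAULT_GPU"):
        return "gpu-with-cpu-fallback"     # only the "auto" default is changed
    return "auto"
```

For instance, `requested_update("cpu", gpu_update_env(True))` still returns `"cpu"`, reflecting that the variable does not override an explicit CPU request.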
set in the same way as
GMX_GPU_ID allows the user to specify different GPU IDs for different ranks, which can be useful for selecting different devices on different compute nodes in a cluster. Cannot be used in conjunction with
set in the same way as
GMX_GPUTASKS allows the mapping of GPU tasks to GPU device IDs to be different on different ranks, if e.g. the MPI runtime permits this variable to be different for different ranks. Cannot be used in conjunction with
mdrun -gputasks. Has all the same requirements as
Disables the hardware compatibility check in OpenCL and SYCL. Useful for developers and allows testing the OpenCL/SYCL kernels on non-supported platforms without source code modification.
allow gmx mdrun to continue even if a file is missing.
when set to a floating-point value, overrides the default tolerance of 1e-5 for force-field floating-point parameters.
if set to -1, gmx mdrun will not exit if it produces too many LINCS warnings.
neighbor list balancing parameter used when running on GPU. Sets the target minimum number of pair-lists in order to improve multi-processor load balance for better performance with small simulation systems. Must be set to a non-negative integer; the value 0 disables list splitting. The default value is optimized for supported GPUs, so changing it is not necessary for normal usage, but it can be useful on future architectures.
when set, print detailed neighbor search cycle counting.
force the use of analytical Ewald non-bonded kernels, mutually exclusive of
force the use of tabulated Ewald non-bonded kernels, mutually exclusive of
force the use of 2x(N+N) SIMD CPU non-bonded kernels, mutually exclusive of
force the use of 4xN SIMD CPU non-bonded kernels, mutually exclusive of
used in initializing domain decomposition communicators. Rank reordering is default, but can be switched off with this environment variable.
force the use of LJ parameter lookup instead of using combination rules in the non-bonded kernels.
disable signal handlers for SIGINT, SIGTERM, and SIGUSR1, respectively.
do not use separate inter- and intra-node communicators.
skip non-bonded calculations; can be used to estimate the possible performance gain from adding a GPU accelerator to the current hardware setup – assuming that this is fast enough to complete the non-bonded calculations while the CPU does bonded force and PME computation. Freezing the particles will be required to stop the system blowing up.
disable the default heuristic for when to use a separate pull MPI communicator (at >=32 ranks).
shell positions are not predicted.
turns off update groups. May allow for a decomposition of more domains for small systems at the cost of communication during update.
set the number of OpenMP or PME threads; overrides the default set by gmx mdrun; can be used instead of the
-npme command line option, also useful to set heterogeneous per-process/-node thread count.
use P3M-optimized influence function instead of smooth PME B-spline interpolation.
PME thread division in the format “x y z” for all three dimensions. The sum of the threads in each dimension must equal the total number of PME threads (set in
if the number of domain decomposition cells is set to 1 for both x and y, decompose PME in one dimension.
require that shell positions are initiated.
should contain multiple masses used for test particle insertion into a cavity. The center of mass of the last atoms is used for insertion into the cavity.
resolution of buffer size in Verlet cutoff scheme. The default value is 0.001, but can be overridden with this environment variable.
Not strictly a GROMACS environment variable, but on large machines the hwloc detection can take a few seconds if you have lots of MPI processes. If you run the hwloc command lstopo out.xml and set this environment variable to point to the location of this file, the hwloc library will use the cached information instead, which can be faster.
mpirun command used by gmx tune_pme.
disables dynamic pair-list pruning. Note that gmx mdrun will still tune nstlist to the optimal value picked assuming dynamic pruning. Thus for good performance the -nstlist option should be used.
overrides the dynamic pair-list pruning interval chosen heuristically by mdrun. Values should be between the pruning frequency value (1 for CPU and 2 for GPU) and
Currently, several environment variables exist that help customize some aspects of the OpenCL version of GROMACS. They are mostly related to the runtime compilation of OpenCL kernels, but they are also used in device selection.
Enable OpenCL binary caching. Only intended to be used for development and (expert) testing as neither concurrency nor cache invalidation is implemented safely!
If set, generate and compile all algorithm flavors, otherwise only the flavor required for the simulation is generated and compiled.
Prevents the use of
-cl-fast-relaxed-math compiler option. Note: fast math is always disabled on Intel devices due to instability.
If defined, the OpenCL build log is always written to the mdrun log file. Otherwise, the build log is written to the log file only when an error occurs.
If defined, it enables verbose mode for OpenCL kernel build. Currently available only for NVIDIA GPUs. See
GMX_OCL_DUMP_LOG for details about how to obtain the OpenCL build log.
If defined, intermediate language code corresponding to the OpenCL build process is saved to file. Caching has to be turned off in order for this option to take effect.
NVIDIA GPUs: PTX code is saved in the current directory with the name
.IL/.ISA files will be created for each OpenCL kernel built. For details about where these files are created check AMD documentation for
Use in conjunction with
OCL_FORCE_CPU or with an AMD device. It adds the debug flag to the compiler options (-g).
Disable optimisations. Adds the option
-cl-opt-disable to the compiler options.
Force the selection of a CPU device instead of a GPU. This exists only for debugging purposes. Do not expect GROMACS to function properly with this option on; it exists solely to make it simple to step through a kernel and see what is happening.
Disables i-atom data (type or LJ parameter) prefetch allowing testing.
Enables i-atom data (type or LJ parameter) prefetch allowing testing on platforms where this behavior is not default.
Use this parameter to force GROMACS to load the OpenCL kernels from a custom location. Use it only if you want to override GROMACS default behavior, or if you want to test your own kernels.
Use Intel OpenCL extension to show additional runtime performance diagnostics.
Analysis and Core Functions
used by gmx do_dssp to point to the
dssp executable (not just its path).
spacing used by gmx dipoles.
sets the maximum number of residues to be renumbered by gmx grompp. A value of -1 indicates all residues should be renumbered.
Some force fields (like AMBER) use specific names for N- and C- terminal residues (NXXX and CXXX) as rtp entries that are normally renamed. Setting this environment variable disables this renaming.
name of X11 font used by gmx view.
the time unit used in output files, can be anything in fs, ps, ns, us, ms, s, m or h.
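A wrapper script that sets this variable could check the value against the units listed above before launching a tool. The sketch below assumes nothing beyond that list; the names `ALLOWED_TIME_UNITS` and `check_time_unit` are hypothetical.

```python
# Units accepted for output files, as listed in the description above.
ALLOWED_TIME_UNITS = ("fs", "ps", "ns", "us", "ms", "s", "m", "h")


def check_time_unit(unit):
    """Return `unit` unchanged if it is an accepted time unit, else raise."""
    if unit not in ALLOWED_TIME_UNITS:
        raise ValueError(
            f"unsupported time unit {unit!r}; "
            f"expected one of {', '.join(ALLOWED_TIME_UNITS)}"
        )
    return unit
```

Validating up front gives a clear error in the wrapper instead of relying on how the tool reacts to an unexpected value.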
where to find VMD plug-ins. Needed to be able to read file formats recognized only by a VMD plug-in.
base path of VMD installation.
sets viewer to
xmgr (deprecated) instead of