The accelerated weight histogram method 185, 137 calculates the PMF along a reaction coordinate by adding
an adaptively determined biasing potential. AWH flattens free energy
barriers along the reaction coordinate by applying a history-dependent
potential to the system that “fills up” free energy minima. This is
similar in spirit to other adaptive biasing potential methods, e.g. the
Wang-Landau 138, local
elevation 139 and
metadynamics 140 methods.
The initial sampling stage of AWH makes the method robust against the
choice of input parameters. Furthermore, the target distribution along
the reaction coordinate may be chosen freely.
Rather than biasing the reaction coordinate ξ(x) directly, AWH
acts on a reference coordinate λ. The fundamentals of the
method are based on the connection between the atom coordinates and λ and
are established through the extended ensemble 68,
P(x,λ) ∝ exp(g(λ) − Q(ξ(x),λ) − V(x)),
where g(λ) is a bias function (a free variable) and
V(x) is the unbiased potential energy of the system. The
distribution along λ can be tuned to be any predefined
target distribution ρ(λ) (often chosen to be flat) by
choosing g(λ) wisely. This is evident from the marginal distribution
P(λ) ∝ exp(g(λ) − F(λ)), where F(λ) = −ln ∫ exp(−Q(ξ(x),λ) − V(x)) dx
is the free energy at fixed λ. The reaction coordinate ξ(x) is coupled to λ
with a harmonic potential, Q(ξ,λ) = ½βk(ξ − λ)²,
so that for large force constants k,
ξ ≈ λ. Note the use of dimensionless energies for
compatibility with previously published work. Units of energy are
obtained by multiplication with kBT = 1/β. In the simulation,
λ samples the user-defined sampling interval I.
Being the convolution of the PMF with the Gaussian defined by the
harmonic potential, F(λ) is a smoothed version of the
PMF. (339) shows that in order to obtain
P(λ) = ρ(λ), F(λ) needs to be
determined accurately. Thus, AWH adaptively calculates F(λ)
and simultaneously converges P(λ)
toward ρ(λ).
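Spelled out in the notation above (a restatement for clarity, not a quotation of (338)-(341)), the bias that reproduces the target exactly is

    g(\lambda) = \ln \rho(\lambda) + F(\lambda)
    \quad\Longrightarrow\quad
    P(\lambda) \propto e^{g(\lambda) - F(\lambda)} = \rho(\lambda).

Since only an estimate of F(λ) is available during the run, P(λ) approaches ρ(λ) only as that estimate converges, which is what the update scheme described below provides.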
It is also possible to let λ directly control the state
of, e.g., alchemical free energy perturbations 187. In that case there is no harmonic
potential and λ changes in discrete steps along the reaction coordinate,
depending on the biased free energy difference between the states.
N.b., it is not yet possible to use AWH in combination with perturbed masses or
constraints.
For a multidimensional reaction coordinate ξ, the sampling
interval I is the Cartesian product of the one-dimensional sampling
intervals I_d (a rectangular domain).
AWH is initialized with an estimate of the free energy,
F_0(λ). At regular time intervals this estimate is updated
using data collected in between the updates. At update n, the
applied bias g_n(λ) is a function of the current free
energy estimate F_n(λ) and target distribution
ρ_n(λ), g_n(λ) = ln ρ_n(λ) + F_n(λ),
which is consistent with (339). Note that the
target distribution may also be updated during the simulation (see examples
in section Choice of target distribution). Substituting this choice of
g = g_n back into (339) yields the simple free energy update
Accumulating these probability weights yields an estimate of the
distribution P(λ), where
ω(λ|x) ≡ P(λ|x) has been used. The
weights ω(λ|x) are thus the samples of the AWH
method. With the limited amount of sampling one has in practice, update
scheme (343) yields very noisy results. AWH instead applies a
free energy update that has the same form but which can be applied
repeatedly with limited and localized sampling,
Here W_n(λ) is the reference weight histogram
representing prior sampling. The update for W(λ),
disregarding the initial stage (see section The initial stage), is
Thus, the weight histogram equals the targeted, “ideal” history of
samples. There are two important things to note about the free energy
update. First, sampling is driven away from oversampled, currently local
regions. For such λ values, the weight accumulated from the actual samples
exceeds that prescribed by the target distribution, and the free energy estimate
decreases there, which by (342)
implies that the bias g_n(λ) decreases as well (assuming
the target distribution is not changed in the update). Thus, the probability to sample λ
decreases after the update (see (339)).
Secondly, the normalization of the histogram,
N = Σ_λ W(λ), determines the update size.
For instance, for a single sample
ω(λ|x), and using a harmonic potential
(see (341)),
the shape of the update is approximately a
Gaussian function whose width is set by the harmonic force constant k and whose height
scales as 1/N 137.
Therefore, in both cases, as samples accumulate in W(λ) and
N grows, the updates get smaller, allowing for the free energy to
converge.
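The update described above can be condensed into a short sketch. The following Python fragment is illustrative only: it is not the GROMACS implementation, the function and variable names are ours, and it assumes a discretized λ grid, a normalized target ρ, and per-sample weights ω(λ|x) that each sum to one over the grid, with the update read as ΔF = −ln[(W + Σ_t ω)/(W + ΔN ρ)] and W ← W + ΔN ρ:

    import numpy as np

    def awh_update(F, W, rho, accumulated_omega):
        """One illustrative AWH update on a discretized lambda grid.

        F                 : current free energy estimate F_n(lambda), shape (M,)
        W                 : reference weight histogram W_n(lambda), shape (M,)
        rho               : target distribution rho_n(lambda), sums to 1, shape (M,)
        accumulated_omega : sum_t omega(lambda|x_t) collected since the last update
        """
        dN = accumulated_omega.sum()  # number of samples in this update window
        # Oversampled points (sum_t omega > dN*rho) get their free energy lowered,
        # which lowers the bias g = ln(rho) + F there and drives sampling away.
        F = F - np.log((W + accumulated_omega) / (W + dN * rho))
        # The reference histogram grows with the ideal, target-distributed history.
        W = W + dN * rho
        return F, W

In the initial stage (next section) the histogram would instead be rescaled after each update so that its size N = Σ_λ W(λ) stays constant between coverings.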
Note that the quantity of interest to the user is not F(λ) but
the PMF along ξ. The PMF is extracted by reweighting
samples on the fly 137 (see
also section Reweighting and combining biased data) and will converge at the same rate as
F(λ), see Fig. 41. The PMF will be
written to output (see section Usage).
The bias potential can be applied to the system in two ways. The first is to
apply a harmonic potential centered at the current value of λ, which is
sampled using (rejection-free) Monte-Carlo sampling from the conditional
distribution ω(λ|x) = P(λ|x), see
(344). This is also called Gibbs sampling or independence
sampling. Alternatively, and by default in the code, the following
convolved bias potential can be applied,
These two approaches are equivalent in the sense that they give rise to
the same biased probabilities
(cf. (338)) while the dynamics are clearly
different in the two cases. This choice does not affect the internals of
the AWH algorithm, only what force and potential AWH returns to the MD
engine.
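For reference, the convolved bias mentioned above is a log-sum-exp of the bias over the λ grid; in the dimensionless notation of this section, and assuming the harmonic coupling Q(ξ,λ), a sketch of its form is

    U_n(\xi) = -\ln \sum_{\lambda} e^{\,g_n(\lambda) - Q(\xi,\lambda)} .

Marginalizing λ out of the extended ensemble adds exactly such an effective potential to V(x), which is why the convolved and Monte-Carlo variants yield the same biased distribution while producing different dynamics.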
Along a bias dimension directly controlling the λ
state of the system, such as when controlling free energy
perturbations, the Monte-Carlo sampling alternative is always used, even if
a convolved bias potential is chosen to be used along the other dimensions
(if there is more than one).
Initially, when the bias potential is far from optimal, samples will be
highly correlated. In such cases, letting W(λ) accumulate
samples as prescribed by (346) entails
too rapid a decay of the free energy update size. This motivates
splitting the simulation into an initial stage where the weight
histogram grows according to a more restrictive and robust protocol, and
a final stage where the weight histogram grows linearly at the
sampling rate ((346)). The AWH initial
stage takes inspiration from the well-known Wang-Landau algorithm 138,
although there are differences in the details.
In the initial stage the update size is kept constant (by keeping
N constant) until a transition across the sampling interval
has been detected, a “covering”. For the definition of a covering, see
(350) below. After a covering has
occurred, N is scaled up by a constant “growth factor”
γ, chosen heuristically as γ = 3. Thus, in the
initial stage N is set dynamically as
N_m = γ^m N_0, where m is the number of
coverings. Since the update size scales as 1/N (
(347)), this leads to a close to
exponential decay of the update size in the initial stage, see
Fig. 41.
The update size directly determines the rate of change of
F_n(λ) and hence, from
(342), also the rate of change of
the bias function g_n(λ). Thus initially, when N
is kept small and updates are large, the system will be driven along the
reaction coordinate by the constantly fluctuating bias. If N_0
is set small enough, the first transition will typically be fast because
of the large update size and will quickly give a first rough estimate of
the free energy. The second transition, using N_1 = γN_0,
refines this estimate further. Thus, rather than very carefully filling
free energy minima using a small initial update size, the sampling
interval is swept back and forth multiple times, using a wide range of
update sizes, see Fig. 41. This
way, the initial stage also makes AWH robust against the choice of
N_0.
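The initial-stage schedule can be made concrete with a small sketch (γ = 3 as above; the function name is ours and the fragment is illustrative, not the GROMACS code):

    GAMMA = 3.0  # heuristic growth factor gamma

    def initial_stage_histogram_size(N0, num_coverings):
        """N_m = gamma**m * N0: histogram size while in the initial stage.
        The free energy update size scales as 1/N, so every detected covering
        shrinks the update size by a factor gamma (roughly exponential decay)."""
        return N0 * GAMMA ** num_coverings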
In the general case of a multidimensional reaction coordinate,
the sampling interval I is
considered covered when all dimensions have been covered. A dimension
d is covered if all points λ_d in the
one-dimensional sampling interval I_d have been “visited”.
Finally, a point λ_d has been visited if there is
at least one point λ* in the multidimensional grid with λ*_d = λ_d
that since the last covering has
accumulated probability weight corresponding to the peak of a
multidimensional Gaussian distribution
For longer times, when major free energy barriers have largely been
flattened by the converging bias potential, the histogram W(λ)
should grow at the actual sampling rate and the
initial stage needs to be exited 141.
There are multiple reasonable (heuristic) ways of determining when this
transition should take place. One option is to postulate that the number
of samples N in the weight histogram should never exceed the
actual number of collected samples, and exit the initial stage when this
condition breaks 137. In the initial stage,
N grows close to exponentially while the collected number of
samples grows linearly, so an exit will surely occur eventually. Here we
instead apply an exit criterion based on the observation that
“artificially” keeping N constant while continuing to collect
samples corresponds to scaling down the relative weight of old samples
relative to new ones. Similarly, the subsequent scaling up of N
by a factor γ corresponds to scaling up the weight of old
data. Briefly, the exit criterion is devised such that the weight of a
sample collected after the initial stage is always larger than or equal to
the weight of a sample collected during the initial stage, see
Fig. 41. This is consistent with
scaling down early, noisy data.
The initial stage exit criterion will now be described in detail. We
start out at the beginning of a covering stage, so that N has
just been scaled by γ and is now kept constant. Thus, the
first sample of this stage has the weight s = 1/γ relative
to the last sample of the previous covering stage. We assume that
ΔN samples are collected and added to W(λ) for each
update. To keep N constant, W(λ) needs to be scaled down
by a factor N/(N + ΔN) after every update. Equivalently,
this means that new data is scaled up relative to old data by the
inverse factor. Thus, after Δm updates a new sample has
the relative weight s = (1/γ)[(N + ΔN)/N]^Δm. Now assume that a
covering occurs at this point. To continue to the next covering stage,
N should be scaled by γ, which corresponds to again
multiplying s by 1/γ. If at this point
s ≥ γ, then after rescaling s ≥ 1; i.e. overall
the relative weight of a new sample relative to an old sample is still
growing fast. If on the contrary s < γ, and this defines
the exit from the initial stage, then the initial stage is over and from
now on N simply grows at the sampling rate (see
(346)). To really ensure that s ≥ 1
holds before exiting, so that samples after the exit have
at least the sample weight of older samples, the last covering stage is
extended by a sufficient number of updates.
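The bookkeeping behind the exit test can be written out explicitly. The sketch below tracks the relative weight s of a new sample exactly as in the paragraph above; it reflects our reading of the criterion rather than the GROMACS code, and the names are ours:

    def initial_stage_should_exit(N, dN, updates_since_covering, gamma=3.0):
        """Decide, when a covering is detected, whether to leave the initial stage.

        While N is held constant, each update scales old data down by N/(N + dN),
        i.e. a new sample gains a factor (N + dN)/N relative to older ones; the
        stage itself started with a factor 1/gamma from the last rescaling of N.
        """
        s = (1.0 / gamma) * ((N + dN) / N) ** updates_since_covering
        # Scale N by gamma and continue only if the new-sample weight would still
        # be >= 1 after that rescaling (s >= gamma); otherwise exit the stage.
        return s < gamma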
The target distribution ρ(λ) is traditionally chosen to be uniform.
This choice exactly flattens F(λ) in the user-defined
sampling interval I. Generally,
ρ(λ) = 0 for λ outside of I. In certain cases other choices
may be preferable. For instance, in the multidimensional case the
rectangular sampling interval is likely to contain regions of very high
free energy, e.g. where atoms are clashing. To exclude such regions,
ρ(λ) can be specified by the following function of the
free energy ((352)),
where F_c is a free energy cutoff (relative to
min_λ F(λ)). Thus, regions of the sampling interval
where F(λ) > F_c will be exponentially
suppressed (in a smooth fashion). Alternatively, very high free energy
regions could be avoided while still flattening more moderate free
energy barriers by targeting a Boltzmann distribution corresponding to
scaling β by a factor 0 < s_β < 1.
The parameter s_β determines to what degree the free energy
landscape is flattened; the lower s_β, the flatter. Note
that both the cutoff and the Boltzmann-scaled target distributions
depend on F(λ),
which needs to be substituted by the current best estimate
F_n(λ). Thus, the target distribution is also updated
(consistently with (342)).
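In the notation used here, the two modified targets can be summarized as follows (F_c is the free energy cutoff and s_β the scaling factor; the functional forms should be read as sketches of the description above rather than reproductions of (352) and the Boltzmann target equation):

    \rho_{\text{cut}}(\lambda) \propto \frac{1}{1 + e^{\,F(\lambda) - F_c}},
    \qquad
    \rho_{\text{Boltz}}(\lambda) \propto e^{-s_\beta F(\lambda)}, \quad 0 < s_\beta < 1 .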
There is in fact an alternative approach to obtaining the
Boltzmann-scaled distribution as the limiting target
distribution in AWH, which is particular in the way the weight histogram
W(λ) and the target distribution ρ(λ) are updated
and coupled to each other. This yields an evolution of the bias
potential which is very similar to that of well-tempered
metadynamics 142,
see 137 for details. Because of the popularity and
success of well-tempered metadynamics, this is a special case worth
considering. In this case the target ρ_n(λ) is a function of the reference
weight histogram W_n(λ).
Thus, here the weight histogram equals the real history of samples, but
scaled by s_β. This target distribution is called local
Boltzmann since the target is only modified locally, where sampling has
taken place. We see that when s_β is small the histogram
essentially does not grow and the size of the free energy update will
stay at a constant value (as in the original formulation of
metadynamics). Thus, the free energy estimate will not converge, but
continue to fluctuate around the correct value. This illustrates the
inherent coupling between the convergence and choice of target
distribution for this special choice of target. Furthermore note that
when using the local Boltzmann target there is no initial
stage (section The initial stage). The rescaling of the weight
histogram applied in the initial stage is a global operation, which is
incompatible with the weight histogram depending only locally on
the sampling history.
Lastly, the target distribution can be modulated by arbitrary
probability weights
Multiple independent bias potentials may be applied within one
simulation. This only makes sense if the biased coordinates
evolve essentially
independently of one another. A typical example of this would be when
applying an independent bias to each monomer of a protein. Furthermore,
multiple AWH simulations can be launched in parallel, each with a (set
of) independent biases.
If the defined sampling interval is large relative to the diffusion time
of the reaction coordinate, traversing the sampling interval multiple
times as is required by the initial stage
(section The initial stage) may take an infeasible amount of
simulation time. In these cases it could be advantageous to parallelize
the work and have a group of multiple “walkers”
share a single bias potential. This can be achieved by collecting
samples from all walkers of the same sharing group into a
single histogram and updating a common free energy estimate. Samples can
be shared between walkers within the simulation and/or between multiple
simulations. However, currently only sharing between simulations is
supported in the code while all biases within a simulation are
independent.
Note that when attempting to shorten the simulation time by using
bias-sharing walkers, care must be taken to ensure the simulations are
still long enough to properly explore and equilibrate all regions of the
sampling interval. To begin, the walkers in a group should be
decorrelated and distributed approximately according to the target
distribution before starting to refine the free energy. This can be
achieved e.g. by “equilibrating” the shared weight histogram before
letting it grow; for instance, by requiring that W(λ)/N ≈ ρ(λ) holds
to within some tolerance.
Furthermore, the “covering” or transition criterion of the initial stage
should be generalized to detect when the sampling interval has been
collectively traversed. One alternative is to just use the same
criterion as for a single walker (but now with more samples), see
(350). However, in contrast to the
single walker case this does not ensure that any real transitions across
the sampling interval have taken place; in principle all walkers could be
sampling only very locally and still cover the whole interval. Just as
with a standard umbrella sampling procedure, the free energy may appear
to be converged while in reality simulations sampling nearby λ
values are sampling disconnected regions of phase space.
A stricter criterion, which helps avoid such issues, is to require that
before a simulation marks a point λ_d along dimension d
as visited, and shares this with the other walkers, all
points within a certain cover diameter around it should also have
been visited (i.e. fulfill (350)).
Increasing this diameter increases robustness, but may slow
down convergence. For the maximum value of the cover diameter,
equal to the length of the sampling interval, the sampling interval is
considered covered when at least one walker has independently traversed
the sampling interval.
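For a single dimension, the stricter check can be sketched as follows: a walker only marks a grid point as visited for the shared covering test if every point within the chosen cover diameter around it has been visited locally. This reflects our reading of the criterion (interpreting “within a diameter” as a half-diameter radius on either side); the names are ours:

    import numpy as np

    def shareable_visits(visited_local, grid, cover_diameter):
        """Boolean mask of grid points this walker may report as visited.

        A point qualifies only if all points within cover_diameter/2 of it have
        been visited by this walker, so that shared coverings reflect real local
        traversals rather than isolated, disconnected visits."""
        shareable = np.zeros_like(visited_local, dtype=bool)
        for i, x in enumerate(grid):
            window = np.abs(grid - x) <= 0.5 * cover_diameter
            shareable[i] = visited_local[window].all()
        return shareable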
In practice biases are shared by setting awh-share-multisim to true
and awh1-share-group (for bias 1) to a non-zero value. Here, bias 1
will be shared between simulations that have the same share group value.
Sharing can be different for bias 1, 2, etc. (although there are
few use cases where this is useful). Technically there are no restrictions
on sharing, apart from the requirement that shared biases have the same
number of grid points and matching update intervals.
Note that biases cannot be shared within a simulation.
The latter could be useful, especially for multimeric proteins, but this
is more difficult to implement.
Often one may want to, post-simulation, calculate the unbiased PMF
of another variable u(x). This PMF can be
estimated using ξ-biased data by reweighting (“unbiasing”) the
trajectory using the bias potential, see
(349). Essentially, one bins the
biased data along u and removes the effect of the bias potential
by weighting each sample with the inverse of its bias
Boltzmann factor,
Here the indicator function denotes the binning procedure:
it equals 1 if the sample falls into the given bin
and 0 otherwise. The normalization factor is the
partition function of the extended ensemble. As can be seen,
this factor depends on the PMF of the
(biased) reaction coordinate ξ (which is calculated and
written to file by the AWH simulation). It is advisable to use only
final stage data in the reweighting procedure due to the rapid change of
the bias potential during the initial stage. If one were to include
initial stage data, one should use the sample weights that are inferred
by the repeated rescaling of the histogram in the initial stage, for the
sake of consistency. Initial stage samples would then in any case be
heavily scaled down relative to final stage samples. Note that
(357) can also be used to combine data
from multiple simulations (by adding another sum also over the
trajectory set). Furthermore, when multiple independent AWH biases have
generated a set of PMF estimates, a
combined best estimate can be obtained by
applying self-consistent exponential averaging. More details on this
procedure and a derivation of (357)
(using slightly different notation) can be found in 143.
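A minimal reweighting sketch in the spirit of (357): bin the trajectory along u and weight each frame by the inverse of its bias Boltzmann factor. It assumes the sampled ensemble is proportional to e^{−(V+U)} with a dimensionless bias U available per frame (e.g. extracted from the AWH output), and it ignores the slow drift of the bias, which is why final-stage data should be used; the expression is an illustration, not the exact one, and all names are ours:

    import numpy as np

    def reweighted_pmf(u_traj, bias_traj, bins):
        """Estimate the unbiased PMF along u from AWH-biased (final stage) data.

        u_traj    : u(x_t) for each stored frame t
        bias_traj : dimensionless bias potential acting on each frame
        bins      : bin edges along u
        Returns bin centers and the PMF in units of kT (up to a constant).
        """
        # Frames were sampled from ~exp(-(V + U)), so unbias with weight exp(+U);
        # subtracting the maximum is for numerical stability and drops out of the PMF.
        weights = np.exp(bias_traj - bias_traj.max())
        hist, edges = np.histogram(u_traj, bins=bins, weights=weights)
        centers = 0.5 * (edges[1:] + edges[:-1])
        pmf = -np.log(hist / hist.sum())          # empty bins give +inf
        return centers, pmf - pmf[np.isfinite(pmf)].min()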
Here, the force along dimension μ is that of a harmonic potential
centered at λ_μ, and its deviation from the mean defines the force
fluctuation entering the correlation. The factors ω(λ|x),
see (344), reweight the samples.
The resulting quantity η_μν(λ) is a friction
tensor 186, 144. Its matrix elements are inversely proportional to local
diffusion coefficients. A measure of sampling (in)efficiency at each
λ is given by
AWH stores data in the energy file (edr) with a frequency set by the
user. The data – the PMF, the convolved bias, distributions of the
λ and ξ coordinates, etc. – can be extracted
after the simulation using the gmx awh tool. Furthermore, the trajectory
of the reaction coordinate is printed to the pull output
file (pullx.xvg). The log file (log) also contains
information; check for messages starting with “awh”, they will tell you
about covering and potential sampling issues.
The initial value N_0 of the weight histogram size N sets the
initial update size (and the rate of change of the bias). When N
is kept constant, as in the initial stage, the average variance of the
free energy scales inversely with N 137, for a simple model system with constant diffusion
along the reaction coordinate. This provides a ballpark
estimate used by AWH to initialize N_0 in terms of more meaningful
quantities: the length L_d of the sampling interval, the diffusion
constant D_d along each dimension d of the AWH bias, and the
initial error ε_0.
For one dimension, L²/D is of the order of the average time to diffuse
over a distance of L. We then take the maximum crossing
time over all dimensions involved in the bias.
Essentially, this estimate tells us that a slower system (smaller D)
requires more samples (larger N_0) to attain the same level of
accuracy (ε_0) at a given sampling rate. Conversely,
for a system of given diffusion, how to choose the initial biasing rate
depends on how good the initial accuracy is. Both the initial error
ε_0 and the diffusion D only need to be
roughly estimated or guessed. In the typical case, one would only tweak
the diffusion parameter, and use a default value for
ε_0. For good convergence, D should be chosen
as large as possible (while maintaining a stable system), giving large
initial bias updates and fast initial transitions. Choosing D
too small can lead to slow initial convergence. It may be a good idea to
run a short trial simulation and after the first covering check the
maximum free energy difference of the PMF estimate. If this is much
larger than the expected magnitude of the free energy barriers that
should be crossed, then the system is probably being pulled too hard and
D should be decreased. An accurate estimate of the diffusion
can be obtained from an AWH simulation with the gmx awh tool.
ε_0, on the other hand, should be a rough estimate
of the initial error.
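As a rough numerical illustration of the reasoning above (not the exact expression AWH uses internally), one can combine the diffusive crossing times of all dimensions with the initial error estimate; the final proportionality below is an assumption on our part and only meant to convey the scaling:

    def ballpark_initial_histogram_size(lengths, diffusions, error_init, dt_sample):
        """Order-of-magnitude N_0 from interval lengths L_d, diffusion constants D_d,
        an initial error estimate (in kT) and the time between samples.

        Takes the maximum diffusive crossing time over all dimensions, as in the
        text; the proportionality N_0 ~ tau_max / (error_init**2 * dt_sample) is
        an assumed reading of the scaling, not the formula used by GROMACS."""
        tau_max = max(L * L / (2.0 * D) for L, D in zip(lengths, diffusions))
        return tau_max / (error_init ** 2 * dt_sample)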
The force constant k should be larger than the curvature of the
PMF landscape. If this is not the case, the distributions of the
reaction coordinate ξ and the reference coordinate
λ will differ significantly and warnings will be printed
in the log file. One can choose k as large as the time step
supports. This will necessarily increase the number of points of the
discretized sampling interval I. In general however, it should
not affect the performance of the simulation noticeably because the AWH
update is implemented such that only sampled points are accessed at free
energy update time.
As with any method, the choice of reaction coordinate(s) is critical. If
a single reaction coordinate does not suffice, identifying a second
reaction coordinate and sampling the two-dimensional landscape may help.
In this case, using a target distribution with a free energy cutoff (see
(352)) might be required to avoid
sampling uninteresting regions of very high free energy. Obtaining
accurate free energies for reaction coordinates of much higher
dimensionality than 3 or possibly 4 is generally not feasible.
Monitoring the transition rate of ξ(t) across the sampling
interval is also advisable. For reliable statistics (e.g. when
reweighting the trajectory as described in section Reweighting and combining biased data),
one would generally want to observe at least a few transitions after
having exited the initial stage. Furthermore, if the dynamics of the
reaction coordinate suddenly changes, this may be a sign of e.g. a
reaction coordinate problem.
Difficult regions of sampling may also be detected by calculating the
friction tensor η_μν(λ) in the sampling interval,
see section The friction metric. The friction tensor as well
as the sampling efficiency measure
((359)) are written to the energy file and can be
extracted with gmx awh. A high peak in the sampling efficiency measure
indicates that this region requires a
longer time to sample properly.