<—
—<
Multiple Time Step Algorithms for
Molecular Dynamics Simulations of
Proteins: How Good Are They?
¨
HELMUT GRUBMULLER,*
PAUL TAVAN
Institut fur
Oettingenstrasse 67,
¨ Medizinische Optik, Ludwig]Maximilians-Universitat
¨ Munchen,
¨
D-80538 Munchen,
Germany
¨
Received 12 May 1997; accepted 19 May 1998
ABSTRACT: We evaluate several multiple time step ŽMTS. molecular
dynamics ŽMD. methods with respect to their suitability for protein dynamics
simulations. In contrast to the usual check of conservation of total energy or
comparisons of trajectory details, we chose a problem-oriented approach and
selected a set of relevant observables computed from extended test simulations.
We define relevance of observables with respect to their role in the description
of protein function. Accordingly, the use of quantities that exhibit chaotic
behavior, like trajectory details, is shown to be inappropriate for the sake of the
evaluation of methods. The accuracy of a cutoff method and of six MTS methods
is evaluated, which differ in their treatment of the computationally crucial
long-ranged Coulomb interactions. For each of the observables considered, the
size of purely statistical fluctuations is determined to allow identification of
algorithmic artifacts. The obtained ranking of the considered MD methods
differs significantly from that obtained by the usual measures of algorithmic
accuracy. One particular distance class method, DC-1d, is shown to be clearly
superior in that no algorithmic artifacts were detected. Q 1998 John Wiley &
Sons, Inc. J Comput Chem 19: 1534]1552, 1998
Keywords: molecular dynamics simulation; protein dynamics; problem adapted
accuracy; multiple time step method; relevant observables
*Present address: Max-Planck-Institut fur
¨ Biophysikalische
Chemie, Am Faßberg 11, D-37077 Gottingen,
Germany
¨
Correspondence to: H. Grubmuller;
e-mail: hgrubmu@
¨
gwdg.de
Contractrgrant sponsor: Deutsche Forschungsgemeinschaft;
contractrgrant numbers: SFB 143rC1 and SFB 533rC1
Journal of Computational Chemistry, Vol. 19, No. 13, 1534]1552 (1998)
Q 1998 John Wiley & Sons, Inc.
CCC 0192-8651 / 98 / 131534-19
MTS ALGORITHMS FOR MD SIMULATIONS OF PROTEINS
Introduction
T
he function of many proteins is connected to
dynamical processes within these macromolecules.1 ] 3 The method of molecular dynamics
ŽMD. simulation 4 ] 7 has recently received particular attention8, 9 due to a number of correct predictions10 ] 14 and is currently considered to be one of
the most promising tools to describe these dynamical processes. However, the MD method exhibits
two severe constraints,15 a physical one, which
excludes certain molecular properties from a
proper description, and a computational one, which
limits its applicability to small systems, as well as
to short time scales.
The physical constraint originates from the approximate description of atomic motions in terms
of classical mechanics by numerically solving Newton’s equations for an effective interaction potential that partially models the quantum mechanics
of the electronic degrees of freedom. The applicability and quality of these approximations are the
subject of current discussions, but they are outside
the scope of this article.
Instead, we will focus on the computational
problems posed by the method of MD simulation
that make up its second constraint. This constraint
is set by the enormous size of the computational
task associated with MD simulations of proteins
due to the large number of particles involved
Ž10 4 ]10 5 ., as well as the large number of integration steps that have to be carried out Že.g., 10 6 for
a nanosecond simulation..
The demand to speed up simulations led to the
development and application of special purpose
computers16 ] 18 and methods to reduce the number
of floating point operations required for a certain
simulation time span. Examples of such methods
are the conventional truncation methods19, 20
Žcutoff., advanced integration schemes and multiple time stepping,21 ] 41 multipole methods,42 ] 48 as
well as the grid and Ewald methods.49 ] 52 For
reviews of these methods, see refs. 22 and 41. In
this article we will ask: according to what criteria
should such methods to compared, evaluated, and
optimized?
At first sight it may seem that efficient MD
methods should be designed to solve the equations of motion as fast and as accurate as possible.
However, for a description of protein structure
and function one generally does not have to con-
JOURNAL OF COMPUTATIONAL CHEMISTRY
sider all details of atomic motion. Due to the
approximate character of molecular models, usually no physical relevance can be attributed to
many of those details. Instead, one typically has to
reliably compute statistical properties like mean
atomic positions and fluctuations, spectra, or correlations of atomic motions. Accordingly, we will
denote such quantities as relevant. The essence of
this application-oriented approach can be summarized by the requirement that MD methods should
provide accurate descriptions of relevant observables. Algorithmic accuracy in the description of
irrelevant atomic details can be sacrificed.† Because MD algorithms generally exhibit a trade-off
between efficiency and accuracy, those MD algorithms should be employed, which in that respect
are as inaccurate as possible.
In this article we will evaluate the accuracy
of six different multiple time step ŽMTS.
methods,25, 29, 33, 53 ] 55 as well as a conventional cutoff method19 in line with the above requirements.
Applying the concept of appropriate accuracy within
the framework of MD, we evaluate the various
methods by computing a set of relevant observables from extended test simulations of a simplified protein model and, subsequently, by comparing these observables with those derived from
reference simulations. Another class of quite efficient methods, the fast multipole methods,48, 56, 57
was studied in a previous article 57 in a similar
manner.
ERROR ESTIMATION FOR
MD SIMULATIONS
Evaluations of algorithmic accuracy generally
compare selected quantities, which have been
computed using the particular MD algorithm to be
tested, with reference quantities that have either
been computed from an MD algorithm considered
to be more accurate or measured. We develop a
scheme that allows the classification of procedures
for the evaluation of algorithmic accuracy according to the types of quantities that they compare.
The method of MD simulation is based on two
subsequent steps involving assumptions and approximations.
†
Clearly, the above definition of ‘‘relevant’’ observables
depends on the particular question one seeks to answer with a
simulation. The crucial point here is that there are many details
of the MDs that can be safely labeled irrelevant for any physical
question one might wish to address.
1535
¨
GRUBMULLER
AND TAVAN
The first step is the description of protein
dynamics in terms of a physical model Žsee, e.g.,
ref. 15. defined by a large set of coupled differential equations of motion with given initial conditions. These have the detailed atomic motions, the
trajectory, as their unique and exact solution.
A numerical approximation‡ has to be computed
in a second step. This task consists of the numerical integration of Newton’s equations and requires
a frequent determination of the forces acting on
the particles. A method accomplishing the force
computation and the numerical integration will be
referred to as an MD algorithm.
The steps of physical modeling and numerical
integration imply approximations and therefore are
both subject to errors. Approximations inherent in
the physical model cause discrepancies between its
exact solution and the actual MD. Additional deviations of purely numerical origin are introduced at
the second step. Here, the major contribution to
numerical errors is typically due to an approximate treatment of long-range interactions, in particular Coulomb forces, which by far represent the
computationally most extensive task of MD simulations. Accordingly, we are mainly concerned
with the evaluation of those MD algorithms, which
aim at an efficient computation of long-range
interactions.§
For a further analysis we will distinguish two
groups of quantities: chaotic quantities and regular
quantities. This distinction is motivated by the observation that the dynamics of a protein at room
temperature is chaotic: identical systems starting
at almost identical positions in phase space quickly
become decorrelated Ži.e., well separated in phase
space within a few picoseconds..34 All quantities
that share that sensitivity to slight variations of
initial conditions will be termed chaotic quantities.
Examples are the positions and velocities of particular atoms or the exact timings of conformational
transitions. Those few quantities that do not show
such chaotic behavior will be referred to as regular
quantities. These include averaged quantities, such
‡
We call this task ‘‘numerical approximation’’ in order to
distinguish it from the physical task of developing a molecular
model. Here we include approximations, which rely on or are
derived from specific physical properties of proteins, in our
consideration.
§
The reader might object that for MTS algorithms, for
which resonance phenomena are well known,22, 29, 41, 80 discretization errors are also of concern. But in our nomenclature
these errors will be covered under a different headline: we
interpret MTS methods as approximate treatments of long-range
forces rather than as more coarse grained discretizations.
1536
as mean atomic positions, radii of gyration, mean
temperatures, vibrational spectra, or free energy
differences.
Obviously, this is not a clearcut distinction; the
assignment may depend on the model employed
or the duration of a simulation. However, experience shows that, at a given time scale, there are
quantities that can be safely labeled chaotic or
regular. These are the ones considered within the
present context.
Applying the distinction between chaotic and
regular quantities to each of the three following
levels of actual protein dynamics, exact solution,
and approximate solution, one obtains a total of
six distinct types of quantities. These, in turn,
allow six possibilities for comparison. Two of them
are inaccessible, however, because they involve
actual chaotic quantities, which cannot be measured. The remaining four combinations are all in
common use for quality estimation.
The comparison of chaotic quantities obtained
from the approximate solution with chaotic quantities obtained from the exact solution is expressed
as D c . Most estimates of algorithmic accuracy of
type D c employ comparisons of trajectories or the
time development of selected atom positions or
bond or dihedral angles Že.g., refs. 28, 35, 58]60..
Here the deviation of the approximate solution
from the exact solution is studied. An analysis of
the error in the force evaluation for a hierarchical
monopole approximation is given in refs. 42
and 56.
The comparison of regular quantities obtained
from the approximate solution with regular quantities obtained from experiments, bridging both
the physical and the numerical task is expressed as
D pn . This approach has been applied to study the
effect of truncating long-range forces on the dynamical behavior of liquids 61 and proteins.62 Here
the reference quantities are observables accessible
by experiment such as average atomic positions
and fluctuations as well as thermodynamic properties like compressibility, specific heat, or diffusion
coefficients.
The comparison of regular quantities obtained
from the approximate solution with regular quantities obtained from the exact solution, bridging
only the numerical task, is expressed as D n . The
most frequently employed test of this type is based
on the total energy of the system as a regular
quantity,6, 19, 60 which is constant for conservative
systems. Other test methods of the type D n are
based on comparisons of temperature and pressure
VOL. 19, NO. 13
MTS ALGORITHMS FOR MD SIMULATIONS OF PROTEINS
of a van der Waals fluid 58 ; on the velocity autocorrelation function of atomic position30 ; or on radial
distribution functions, compressibility, specific
heat, and self-diffusion coefficients.61
The comparison of regular quantities obtained
from the exact solution with regular quantities
obtained from experiments, bridging only the
physical task, is expressed as D p . This type of
comparison is of no direct concern within the present context, because it provides information on
the quality of molecular models, not on algorithmic accuracy.
Test methods of type D c are necessarily based
on comparisons of irrelevant quantities. By accepting only unnecessarily accurate MD algorithms,
most efficient algorithms would be rejected by
such tests. Furthermore, comparisons of chaotic
quantities may not even provide information on
algorithmic accuracy because minute, irrelevant
algorithmic deviations may be largely amplified
due to the chaoticity inherent in protein dynamics.
Therefore we will base our evaluations on comparisons of regular quantities. Recall that these arguments apply solely to the MD description of protein dynamics. For the study of other many-body
systems, D c results may prove quite useful, especially in cases in which the detailed trajectory is of
interest Že.g., in certain astrophysical computations..
Two approaches remain to be considered: those
based on D n and D pn . The latter type of comparison, however, involves quantities separated by two
levels of approximation, the physical and the numerical ones, respectively, and therefore the opportunity to separately optimize the physical
model, as well as the MD algorithm, is lost.
This disadvantage is avoided by an approach of
type D n : only comparisons of regular quantities
obtained using the particular MD method to be
evaluated with corresponding reference quantities
enable a separation of artifacts caused by such an
approximate method from incompatibilities of
simulation results and experimental data and an
optimization of MD algorithms without relying on
the quality of a particular physical model. Specifically, our test simulations described below, which
aim at an evaluation of approximation for the
Coulomb interaction, are based on reference quantities, which were obtained by exactly computing
that interaction, using an otherwise identical MD
algorithm.
Note that regular quantities are typically averages of chaotic quantities and are therefore subject
JOURNAL OF COMPUTATIONAL CHEMISTRY
to statistical fluctuations. Accordingly, in our comparisons of type D n we will have to determine the
size of these fluctuations in order to separate them
from the algorithmic artifacts that we wish to
detect. In particular, the ensemble from which the
average is taken must be sufficiently large, that, in
turn, requires extended test simulations.
From the above discussion it is clear that our
application-oriented approach cannot be expected
to determine the optimal MD algorithm. Rather,
problem-adapted mesures of accuracy are always
defined with respect to a particular set of quantities
and, probably, with respect to the particular physical model that has been used to carry out the test
simulations. Nevertheless, test simulations on similar models should provide qualitatively similar
evaluation results. Accordingly, results from test
simulations on properly chosen protein model systems should actually provide information on the
suitability of MD methods for the study of protein
dynamics in general.
Methods
Following the above approach, we studied seven
different MD algorithms, which all aim at an efficient, approximate computation of long-range
forces. We proceeded in three steps.
First, a typical test system was selected. By
typical we mean that the system should be similar
enough to proteins so that it ensures relevance of
our evaluation for MD studies of proteins.
Second, each algorithm was used to perform
several MD simulations of the test system. Each
simulation covered the time span of 1 ns. In addition, reference simulations were carried out with
exact computation of long-range forces.
Finally, selected observables were computed
from the obtained trajectories. As a measure of
problem-adapted accuracy, these observables were
compared with corresponding ones determined
from the reference simulations.
TEST SYSTEM
In principle, any protein model could serve as a
test system. However, to provide significant results that allow generalizations, the test system
should represent a ‘‘hard’’ test case: in our example, in which we evaluate approximations for the
Coulomb interaction, that interaction should sig-
1537
¨
GRUBMULLER
AND TAVAN
nificantly contribute to the structure and dynamics
of the test system.
Most quantities that are studied in typical MD
studies of proteins are averages. Therefore, they
are subject to statistical errors. To keep these errors
small enough to enable a separation from the algorithmic deviations we intend to study, extended
simulations are required. The computational effort
for such extended simulation enforces the use of a
small test system.
These criteria led us to the decision to use a
simplified protein model instead of a detailed one.
In a study of the low-frequency conformational
dynamics of proteins, we presented that model 63
in a detailed discussion of its design and properties in that article. Here, we solely want to rehash
those of its aspects that are of relevance for the
present study.
The ‘‘primary’’ structure of the model consists
of 100 heterogeneously charged residues. The internal structure of the residues is neglected such
that the polypeptide is represented by a covalently
interconnected chain of van der Waals spheres.
Figure 1 shows the primary structure of the model
in a stretched, unfolded conformation; the inset
shows its detailed structure. The force field includes bond-stretch, bond-angle, van der Waals,
and Coulomb interaction as defined in ref. 19.
ŽParticle masses and force parameters were chosen
from those of CH 2 ‘‘extended atoms’’ and of associated single bonds.. The heterogeneous charge
distribution along the chain shown in Figure 1 by
the bold, wavy curve was chosen to represent
hydrophobic and hydrophilic interactions and, accordingly, to enable formation of a stable tertiary
structure. Because, in our case, the forces that
maintain the tertiary structure are exclusively of a
Coulombic nature, the model should actually rep-
resent a ‘‘tough’’ test case for approximations, particularly of these forces.
Starting from the stretched configuration depicted in Figure 1, a folding process was carried
out, including a subsequent relaxation and equilibration. The stability of the obtained tertiary structure on a nanosecond time scale was verified by
dynamics simulations covering several nanoseconds. The resulting conformation of the protein
model is depicted in Figure 2. This structure was
used as the initial configuration for all test simulations.
Despite the simplicity of our model, its relevant
structural and dynamical properties were shown
to be quite similar to those of more realistic protein models.63 Accordingly, the results of our test
simulation should indeed provide information,
whether or not the considered MD algorithms are
suitable for MD studies of proteins.
FIGURE 1. Protein model in a stretched, unfolded
FIGURE 2. Equilibrated structure of the protein model
configuration, consisting of 100 CH 2-like ‘‘atoms’’; their
partial charges are represented by the bold curve; the
inset shows a zoom of the detailed structure.
described in the text, shown as a ‘‘ribbon-plot’’; the bold
lines represent chemical bonds, four labeled circles mark
atoms referred to in the text.
1538
TESTED MD ALGORITHMS
Test simulations on our simplified protein model
were carried out using seven different MD algorithms. Each of them employs a particular approximation for the computation of the long-range
VOL. 19, NO. 13
MTS ALGORITHMS FOR MD SIMULATIONS OF PROTEINS
Coulomb interaction and aims at a reduction of the
corresponding computational effort. Thus, all
tested algorithms provide an approximation for
the sum
Ý
Eel A
²i , j:
qi q j
Ž1.
ri j
over all atom pairs ² i, j :, as well as for its spatial
derivatives that are required for computation of
the corresponding forces. Here Eel denotes the
Coulomb energy of the system, qi and q j are the
partial charges of atoms i and j, respectively, and
ri j denotes the distance between the two atoms. In
contrast to the tested MD algorithms, the reference
method, denoted by REF, performs an exact computation of the sum Ž1.. Furthermore, it employs
the well-known Verlet algorithm58 for integration
of the equations of motion.
Approximation of the Coulomb interaction is
the key to reducing the total computational work
that has to be spent in REF: the number of floating
point operations necessary for evaluation of Eel
scales with N 2 , where N is the number of atoms.
Because the computational effort for the shortrange interaction scales linearly with N, by are
most of the computational work, particularly for
large systems has to spent on the determination of
the Coulomb interaction.
The first approximate method we studied is the
widely used ‘‘cutoff’’ method,19 denoted as CUT.
This method neglects pair interaction of atoms separated further than a certain distance, the cutoff
radius rcut . The cutoff is achieved by multiplying
the interaction energy with a ‘‘switching function’’
f Ž r ., which is unity for r s 0 and reduces the
forces to zero, usually in a continuous differentiable manner. Accordingly, the sum in eq. Ž1. is
approximated by
Eel f
Ý
² i , j :cut
f Ž ri j .
qi q j
ri j
.
Due to the switching function only a small subset
² i, j :cut of all atom pairs ² i, j : remains to be
considered, namely, those for which ri j - rcut .
Commonly used values for rcut are in the range of
˚ In our simulation we used the switching
8]15 A.
2 .2
function19 f Ž r . s Ž1 y r 2rrcut
and a cutoff radius
˚
rcut of 10 A.
The other six algorithms investigated were distance-class ŽDC. methods. By approximately ac-
JOURNAL OF COMPUTATIONAL CHEMISTRY
counting for pair interaction beyond rcut , they are
designed to avoid artifacts caused by the cutofftype neglect of these interactions. Nevertheless,
they achieve a comparably large reduction of the
computational effort by employing an MTS scheme.
That scheme rests on the observation that the time
derivative of pair interactions decreases with increasing distance ri j . Accordingly, during numerical integration, the slowly fluctuating, long-range
force contributions need not be explicitly computed as often as the rapidly varying short-range
interactions. Instead, the long-range forces can be
estimated by extrapolation procedures at many
intermediate integration steps.
The frequency of explicit force computation is
defined for each atom pair by its assignment to a
particular DC, which is chosen according to ri j as
follows: let R 0 , . . . , R n 4 be a set of radii with
R j - R jq1 for all j s 0, . . . , n y 1 and R 0 s 0. Then
the set of atom pairs ² i, k : that satisfy R j F ri k R jq1 is called DC j. Using eq. Ž1. and the respective spatial derivatives, the force F Ž j. acting on a
particular atom and originating from all other
atoms within DC j is computed explicitly once
every n [ 2 j integration steps. All other forces are
estimated from previously computed forces according to the linear extrapolation formula
Ž j.
Ž j. Ž j.
Ž j. Ž j.
Fiqk
n f a i Fk n q bi FŽ ky1. n ,
Ž2.
where i s 0, . . . , n y 1 and i q kn is the number
of the current integration step. The principles of
this method and their justification are outlined in
ref. 29. For our simulations, we used four DCs
˚ .: w 0 ??? 4x , w 4 ??? 7x ,
covering the intervals Žin A
w 7 ??? 11x , and w 11 ??? `x .
The six DC algorithms considered here, DC-0,
DC-i, DC-1a, DC-1b, DC-1c, and DC-1d, respectively, differ in the choice of the ‘‘extrapolation
coefficients’’ aŽi j. and biŽ j.. Table I lists the extrapolation coefficients that we used. Here, i is the
number of integration steps carried out since the
most recent exact force computation of atom pairs
within class j. As an illustration, Figure 3 shows
for all six DC algorithms under consideration the
time development of extrapolated forces Žvertical
lines. for the fourth DC Žin our simulations, this is
the outermost class. that enter into the integration
of the Newtonian equations of motion. The corresponding exact forces are drawn as bold lines.
Method DC-i Žimpulse., which is closely related
to the RESPA algorithm proposed in ref. 30, was
1539
¨
GRUBMULLER
AND TAVAN
TABLE I.
Extrapolation Coefficients a(i j ) and b i( j ) Used for Six Distance-Class Algorithms.
Algorithm
DC-i
DC-0
DC-1a
DC-1b
DC-1c
DC-1d
a (i j )
b i( j )
n for i s 0, 0 otherwise
1
( n 2 y n q 1) y i ( n y 2)
6n
( n q 1 )(2 n 2 q 1 )
1qi/n
( n q 1) / 2 for i s 0, 1 otherwise
3n2 y 2n q 1
ny1
y 3i
(
)
(
n nq1
n n q 1)
0
0
2 (y2 n 2 q 3 n y 1 ) q 6 i ( n y 1 )
2n2 q 1
yi / n
(1 y n ) / 2 for i s 0, 0 otherwise
3n2 y 2n q 1
ny1
1y
q 3i
(
)
(
n nq1
n n q 1)
We used the abbreviation n [ 2 j; i is the number of integration steps carried out since the most recent exact force computation for
class j.
derived in ref. 29 Žcalled VERLET-I therein. using
the so-called ‘‘Verlet criterion’’ that ensures equivalence to exact forced computation in those cases
in which only one DC Žnot necessarily the innermost. is populated by atom pairs. In fact, no force
extrapolation occurs here. Rather, all forces from n
subsequent integration steps are combined and
applied instantaneously Žcf. the large spikes depicted in Fig. 3, DC-i.. Because the magnitude of
these forces increases exponentially with the number of DCs, DC-i tends to cause severe artifacts if
more than three DCs are employed.29
Method DC-0, first applied to MD simulations
in ref. 27 Žsee also ref. 64., avoids this disadvan-
FIGURE 3. Extrapolated forces (vertical lines) for
distance class j = 3, as defined for the six distance-class
algorithms referred to in the text [cf. eq. (2), Table I]. For
the numerical integration of the Newtonian equations of
motion, these extrapolated forces are used as an
approximation for the exact forces (bold lines).
1540
tage. Here, exact forces computed every nth integration step are constantly applied within all
subsequent n y 1 integration steps. This can be
considered a 0th order extrapolation, which appears as stair steps in Figure 3, DC-0.
The above two methods estimate forces on the
basis of the most recently computed force; all biŽ j.
vanish in these cases. Better approximations may
be obtained if, in addition to the most recently
computed exact forces, ‘‘older’’ forces are considered, which have been computed n steps before.29
In this case, many possible choices for the extrapolation coefficients exist. The most obvious choice,
which to our knowledge has not yet been used for
MD simulations, is a linear extrapolation ŽDC-1b..
The corresponding picture in Figure 3 shows the
resulting force development in time.
However, method DC-1b does not obey the
Verlet criterion.29 Among all sets of extrapolation
coefficients, which do fulfill the Verlet criterion,
those that deviate as little as possible from a linear
extrapolation constitute method DC-1a, as derived
in ref. 29. Note that, despite this minimal principle,
the coefficients differ considerably from the ones
in DC-1b: in contrast to the linear extrapolation,
here the a coefficients decrease whereas the b
coefficients rise. This fact becomes apparent in
Figure 3, DC-1a, where increased discontinuities in
the force development can be observed.
A third set of coefficients ŽDC-1c. was proposed
by Skeel and Biesiadecki.65, 66 This method also
obeys the Verlet criterion. In addition, one can
show that DC-1c minimizes the orce discontinuities Žnot shown in Fig. 3. that arise if atom pairs
move from one DC to another in the course of a
simulation.34 However, this method may suffer
VOL. 19, NO. 13
MTS ALGORITHMS FOR MD SIMULATIONS OF PROTEINS
from artifacts, which are smaller than, but of similar origin as those apparent in DC-i Žcf. the corresponding pictures in Fig. 3, where sharp peaks are
visible..
To combine the advantages of the above algorithms while avoiding possible resonant artifacts,
we propose a new method DC-1d, which is the last
method considered here. Although it deviates only
slightly from method DC-1a, we expected it to be
superior, because it additionally avoids, like DC-1c,
force discontinuities caused by atom pairs crossing
DC boundaries. This algorithm is derived and discussed in detail in ref. 34.
TEST SIMULATIONS
A total of 232 test simulations, each of one
nanosecond duration, were carried out using the
eight MD algorithms described above. Integration
step sizes of 0.5, 1.0, and 2.0 fs were employed.¶
All 232 simulation started with almost identical
initial conditions, which were derived from the
model protein structure described above by applying minute random modifications of the order of
˚ to atomic positions. The chaoticity inherent
10y6 A
in the dynamics leads to a rapid decorrelation of
the trajectories after a few picoseconds and therefore guarantees that the 232 simulation are essentially independent of each other.
A described elsewhere,63 our model system was
found to exhibit several conformational states, between which rare transitions occur on a time scale
of several hundred picoseconds. That conformational dynamics, although initially unexpected
by us, resembles a ubiquitous property of lowfrequency protein dynamics67, 68 : in many cases
macroscopic observables have been found to be
strongly influenced by the underlying conformational state Žsee, e.g., refs. 69, 70.. It was not our
intention to demonstrate that well-known influence of molecular conformation on a variety of
observables. Instead, we wanted to concentrate on
the influence of different approximation schemes
on the dynamics of proteins in those cases in which
the molecular structure is unaffected by these approximations. Therefore, those trajectories were selected from our test simulations for further analysis, that did not exhibit conformational transitions.
¶
For the MTS algorithms this choice implies force update
periods of 1, 2, and 4 fs for the second distance class Ž i s 1.; of
2, 4, and 8 fs for the third distance class Ž i s 2.; and of 4, 8, and
16 fs for the fourth distance class Ž i s 3..
JOURNAL OF COMPUTATIONAL CHEMISTRY
The system was considered to stay within the
initial conformation state if the average structure
in the test simulation did not considerably deviate
from the reference structure and if no jumps of
selected atomic distances occurred.
The first criterion was assessed by computing
the maximum deviation of mean atomic positions
from those of the reference simulation. This devia˚ For the second
tion had to be smaller than 2.5 A.
criterion we selected four atoms, a12, a36, a63,
and a87, the location of which within the molecular structure is indicated by four labeled circles in
Figure 2. Because conformational changes during
the simulation could be monitored by fast, significant changes of the distances among these four
atoms, we selected only those simulations that did
not exhibit such distance jumps.
The above selection excludes those trajectories
from our analysis in which deviations from the
reference simulation are simply caused by occasional conformational transitions or by structural
changes induced by algorithmic artifacts. Therefore, any observed significant deviation actually
reflects an artifact concerning the dynamics of our
system. w A similar approach was taken in ref. 57,
where a more detailed protein model was used.x
RELEVANT QUANTITIES
For our application-oriented evaluation of the
seven MD algorithms sketched above, we chose a
set of regular quantities that enable comparisons of
the type D n . Our choice was guided by the requirement that the respective regular quantities
should be useful for explanatory purposes in MD
studies of protein function: these quantities should
be relevant. As explained in detail below, these
quantities include atomic fluctuations, vibrational
spectra, autocorrelation functions, and cross-correlations of atomic motions. We also considered configuration space density distributions due to their
close relation to free energy computation and reaction rates. For comparison, we included in our
analysis the drift of total energy as a frequently
employed measure of algorithmic accuracy, although in our view that quantity is of no particular functional relevance.
Despite the considerable length of the test simulations, certain observables exhibited statistical
errors. To separate deviation caused by the particular approximation scheme Ž‘‘algorithmic deviations’’. from statistical errors, the latter were
1541
¨
GRUBMULLER
AND TAVAN
estimated by comparing independent reference
simulations with each other.
Note the following abbreviations: r s Ž t i . is the
position of atom s at time t i at the ith integration
step in the simulation 5 ; ²r s : [ Ý t i r s Ž t i .rNt is the
average position of atom s, determined from a
trajectory of Nt coordinate sets Ži.e., the number of
integration steps.; and length T s 1,048,576 fs; the
integration step size is D t ; the superscript ref denotes an observable computed from a reference
simulation.
Low-Frequency Spectra of Atomic Vibrations. The
low-frequency Fourier transforms were computed
according to
Nsmy1
Ý
ˆr s Ž v . s
rs Ž t . s
10 s rD t
1
Ý
Z
The spectrum ˆ
r s Ž v . of atomic vibrations, derived by a Fourier transform of the trajectory of
atom s, determines the contribution of the motion
of atom s to the infrared spectrum of a protein.
For our evaluation of algorithm is accuracy, we
used the average power spectrum SŽ v . defined by
Ý ˆrs Ž v .
2
.
Ž3.
s
Spectra were computed for two different frequency ranges, a high-frequency range Ž0.5]250.0 psy1 .
and a low-frequency range Ž0.001]4.0 psy1 ..
High-Frequency Spectra of Atomic Vibrations. For
the computation of the high-frequency spectra,
each 1-ns trajectory was partitioned into M s 511
consecutively overlapping segments r Ž mD M ., . . . ,
r ŽŽ m q 2. D M .4 , m s 0 ??? M y 1, of length 2D M s
4096 fs each. According to the procedure described
in ref. 71, all segments were multiplied with a
Hanning window function and then subject to a
Fourier transform. From the resulting M spectra
an average was taken.
For technical reasons, all sums over s take only every fifth
atom along the chain into account: s s 1, 6, 11, . . . , 91, 96.
1542
Ž t y kD t . 2
2s 2
ksy10 s rD t
Ý
Zs
exp y
Ž t y kD t . 2
2s 2
ksy10 s rD t
,
.
The sampling rate of the smoothened trajectory
was chosen in correspondence to the width of the
smoothing window, s s 128 fs. Nsm [ TrNt denotes the number of smoothened coordinate sets.
Atomic RMS Fluctuations
The average displacement ss of an atom s from
its average position ²r s : was computed by
1r2
Average Spectrum of Atomic Vibrations
5
r s Ž t . exp y
10 s rD t
Due to algorithmic noise forces, we expected
some of the algorithms to cause considerable energy drifts, amounting to several hundred kilocaloriesrmole during a 1-ns simulation. Such drift
would imply a large change of temperature and
prevent further comparisons. Therefore, we decided to compensate energy drifts by a weak coupling to a heat bath. Accordingly, the energy flow
to or from the system was used as a measure for
the algorithmic heat production and, hence, total
energy drift.
tk s k s ,
with smoothened trajectories r s Ž t .,
Drift of Total Energy
SŽ v . [
r s Ž t k . exp Ž i v t k . ,
ks0
ss s
Ý ž Žrs Ž t i . y ²rs :.
ti
2
/ rN
t
.
Ž4.
Here ss is a measure for atomic mobility and
usually depends on the molecular environment of
the particular atom under consideration: atoms, for
example, that belong to solvent-exposed side
groups of proteins, typically show a higher mobility than backbone atoms.72 Because mobilities can
be determined by X-ray or neutron scattering experiments, that observable is frequently used as a
check for the quality of molecular models.
Correlations of Atomic Motions
Correlation function serve to characterize atomic
motions. Autocorrelation functions of atomic positions or velocities, on the one hand, provide information on the diffusive character of atomic motion
and enable the determination of friction coefficients.72 Cross-correlations of atomic motions, on
the other hand, serve to detect collective motion
and thus reveal functional, probably causal, interrelations between distant parts of a protein. Due to
the widespread use of these quantities, we studied
displacement autocorrelation function C sdis Žt . and
velocity autocorrelation function C svel Žt ., as well
VOL. 19, NO. 13
MTS ALGORITHMS FOR MD SIMULATIONS OF PROTEINS
as cross correlation functions K s1 s 2 , as relevant
quantities. They were computed according to
C sdis Ž t j . s
N jy1
Ý is0
Ž rs Ž ti . y ²rs : . Ž rs Ž ti q t j . y ²rs : .
N jy1 Ž Ž .
Ý is0
r s t i y ²r s :.
2
,
low-dimensional subspace spanned by M degrees
of freedom, c [ c i Žr N , p N .4 , i s 1, . . . , M, and obtains a projected phase space density rc Žc.:
rc Ž c . s
H dr9
N
dp9 N r Ž r9 N , p9 N . d Ž c y c9 . .
Ž5.
C svel Ž t j .
s
N jy1 Ž . Ž
Ý is0
˙r s t i ˙r s Ti q t j .
K s1 s 2 s
N jy1 Ž .
Ý is0
˙r s t i
k s1 s 2
Ž ks
k
2 s2 s2 s2
. 1r2
2
Ž6.
,
Ž7.
,
with the covariance matrix
k s1 s 2 s
Ý Ž rs Ž t i . y ²rs : .Ž rs Ž t i . y ²rs : . .
ti
1
1
2
2
In the above equations, Nj s ŽT y t j .rD t and t j s
jD t ; the overdot denotes the time derivative.
Projected Configuration Space Density
A fundamental thermodynamic quantity of
many-body systems is the phase space density
r Žr N , p N ., which is generated by their dynamics.
Here, r N and p N denote the 3 N-dimensional vectors of all N atomic position and momenta, respectively; r is rarely used and is determined on its
own, but it is closely related to statistical observables like entropy, free energy, or reaction rates.
An approximation for r can be computed using an
ensemble, which is generated by means of an MD
simulation, by dividing phase space into volume
elements V k : for long simulation times the fraction
of trajectory points within V k approaches phase
space density at the location of V k .
However, due to the limited length of MD simulations, as well as due to the high dimensionality
of phase space, such a density estimate generally
does not represent a regular quantity. Even for our
small test system the dimension of phase space is
6 N s 600; therefore, the number K of volume elements V k is extremely large even for coarse resolutions, thus, the average number of points per
volume element is small. Correspondingly large
statistical errors inhibit a straightforward use of
phase space density as a measure for algorithmic
accuracy.
However, if instead of taking all 6 N degrees of
freedom into account, only a few degrees of freedom are considered, regular quantities can be derived. For this purpose one projects r onto a
JOURNAL OF COMPUTATIONAL CHEMISTRY
The density rc Žc. can be considered a regular
quantity if the number of trajectory points per
volume element within the subspace is large enough
to allow a statistical analysis. We restrict our discussion to projections onto conformational degrees
of freedom, c s c i Žr N .4 , usually referred to as
‘‘conformational coordinates.’’73 For a discussion
of their central role in the statistical mechanics of
conformational transitions, see ref. 74. In particular, the strict definition of conformational substates
in terms of minima of free energy as a function of
suitably projected configuration space densities led
to an MD based method, ‘‘conformational flooding,’’ to predict microsecond conformational
transitions.71
For our test simulations, we selected projections
onto a number of P s 50 2-, 3-, and 4-dimensional
conformational subspaces, defined by pairs, triples,
and quadruples of interatomic distances, respectively. For the 2-dimensional case, we used distance pairs c s Ž d n 1 n 2 , d n 3 n 4 . among four selected
atoms, n1 , n 2 , n 3 , and n 4 , respectively. Figure 2
shows the locations of the four selected atoms
Ža12, a36, a63, and a87. within the protein model.
Projections onto all 15 2-dimensional subspaces,
which can be defined using the six distances d1 s
dŽa12; a36., d 2 s dŽa12; a63., d 3 s dŽa12; a87.,
d 4 s dŽa36; a63., d 5 s dŽa36; a87., and d 6 s
dŽa63; a87., were computed, yielding a total of 15
different density distributions. Twenty 3- and 15
4-dimensional projections were similarly defined.
The decision to use projections defined by interatomic distances was inspired by the possibility of
computing conformation-controlled reaction rates
from such projected densities and, corresponding,
of explaining the kinetics of protein function on
the basis of MD simulations. As an example, consider the hypothetical docking process of two enzymes A and B depicted in Figure 4. That process
exhibits similarities to the docking reactions of G
proteins Že.g., transducin. 75 to photoreceptors. The
two enzymes are supposed to have three docking
sites. Their mutual distances are d12 , d 23 , D12 , and
D 23 , as shown in Figure 4. For simplicity, we
assume enzyme A to be flexible, such that d12 and
d 23 show considerable fluctuations, and enzyme B
to be rigid, such that D 12 and D 23 are constant.
1543
¨
GRUBMULLER
AND TAVAN
These properties are indicated by the shapes of the
two enzymes. As the system moves, the probability of the occurrence of a docking reaction will be
large if all three active sites simultaneously fit
together, that is, if d12 f D12 and d 23 f D 23 ; otherwise the docking probability will be small. Hence,
the corresponding reaction rates depend on the
frequency of distance matchings and is thus determined by the projected configuration space density pŽ d12 , d 23 . in the vicinity of location Ž D 12 , D 23 ..
A similar approach was employed in ref. 76,
where electron transfer rates within a protein were
computed from MD simulations. In that work,
following Marcus theory,77 the probability of an
electron transfer at each instance of time was assumed to be large whenever the corresponding
change of electrostatic energy was small. Because
that energy difference is a function of atomic positions, the obtained transfer rate is determined by
the projected phase space density onto a conformational subspace that in this case is a 1-dimensional
one.
The above examples illustrate that projected
configuration space densities can be considered to
be a relevant quantity for a wide range of MD
applications.
For each algorithm a and each test simulation,
we determined rc by extracting ensembles S a consisting of a total of Nc s 131,072 configurations.
For this purpose, each of the P s 50 subspaces
was subdivided into K rectangular volume elements V k, p , k s 1 ??? K, p s 1 ??? P. The subdivisions were set up in such a way that in case of the
ensemble R1 obtained from the reference simulation, all rectangles V k, p contained an identical
number NR 1 [ NcrK of configurations. As an illustration, Figure 5 shows such a subdivision into
K s 16 rectangles V 16, p that are separated by bold
lines. The ensemble R1 is depicted as a cloud of
dots, each dot representing one particular configuration.**
Using this subdivision, deviations D Sp ayR 1 of the
projected phase space density from the reference
density were computed by
D Sp ayR 1 s
1
Ky1
K
Ý
ks1
ž
NSka, p y NR 1
NR 1
/
2 1r2
. Ž8.
where NSk,a p is the number of configurations out of
ensemble S a that fall into rectangle V k, p for projection p.
Nonvanishing deviations D Sp ayR 1 may result
from statistical fluctuations of the finite configuration counts NSk,a p or be caused by the particular
MD algorithm to be tested. To decide whether
observed deviations significantly point to an algorithmic artifact, the purely statistical contribution
of D Sp ayR 1 has to be estimated. w Note that these
statistical errors of the NSk,a p must be expected to
be considerably larger than Ž NSk,a p .1r2 , because subsequent configurations, temporally separated by
TrNc s 8 fs, are strongly correlated and therefore
do not represent independent random events.x
We estimated the fluctuations by comparing
ensemble R 1 with a second reference ensemble,
** For technical reasons, only a fraction of all Nc configurations is shown.
FIGURE 5. Ensemble R1 of configurations generated
FIGURE 4. Schematic enzymatic docking process to
illustrate the functional relevance of configuration space
density projections; an explanation is given in the text.
1544
by a reference simulation of the protein model. Points in
configuration space, projected onto the 2-dimensional
subspace defined by two interatomic distances d12;87
and d3 6;6 3 , are represented by dots; a subdivision V 16, p
of this subspace (see text) is depicted using bold lines.
VOL. 19, NO. 13
MTS ALGORITHMS FOR MD SIMULATIONS OF PROTEINS
R 2 . For the second reference simulation the initial
atomic velocities were slightly varied to generate
an independent trajectory. Thus, deviation D Rp 2yR 1
should solely result from statistical fluctuations.
Based on that assumption, the average deviation of the two reference ensembles
²D R
2 yR 1
:[
ž
P
Ý
ps1
D Rp 2yR 1
/
Ž9.
P
was used to estimate the statistical contribution to
the observed deviations. As a measure for the
error of that estimate we used the variance sR 2yR 1
of the D Rp 2yR 1,
sR 2yR 1 [
1
Py1
ÝŽ
ps1
D Rp 2yR 1
y ² D R 2yR 1 : .
2
.
Similarly, for each algorithm to be tested, a separate sS ayR 1 was computed on the basis of deviations D Sp ayR 1. The significant contribution to the
average deviation ² D Sp ayR 1 :, the algorithmic accuracy D a , can now be readily obtained by
D a s ² D S ayR 1 : y ² D R 2yR 1 : " sa
Ž 11.
with a certain error range sa .
If the P s 50 phase space projections were independent of each other, the error range sa could be
estimated by
Ž sR2 yR
2
1
1r2
q sS2ayR 1 . rP
. Ž 12.
However, because statistical independence cannot
be assumed, one has to consider the upper limit
samax [
The 232 test simulations of 1-ns duration each
were analyzed. According to the criteria outlined
further above, we selected for each of the eight
MD algorithms ŽREF, CUT, DC-i, DC-0, DC-1a,
DC-1b, DC-1c, and DC-1d. and for each of the
three integration step sizes of 0.5, 1.0, and 2.0 fs,
respectively, a suitable-1]ns trajectory for further
analysis. Wherever appropriate, results of all 24
trajectories are shown, but in most cases only the
eight simulations obtained with a 1.0-fs step size
are considered.
1r2
P
Ž 10.
sa f samin [
Results
Ž sR2 yR
2
1
q sS2ayR 1 .
1r2
.
DRIFT OF TOTAL ENERGY
Figure 6 shows the drift of total energy per
picosecond for all 24 1-ns simulations. One can
observe that, for each of the algorithms, the energy
drift increases with the integration step size. CUT
does not significantly affect conservation of total
energy that quite accurately results in the case of
the reference simulation, REF. In contrast, all DC
algorithms are associated with considerable heat
production. Without coupling to a heat bath, that
heat production would cause a denaturation of the
system with a 1-ns simulation.
The large energy drift observed for the DC
algorithms is due to errors associated with the
force extrapolations, as illustrated in Figure 3.
These deviations from the exact forces represent
‘‘noise’’ forces, which transfer kinetic energy into
the molecular system. Note that the magnitude of
noise, which may be estimated by inspecting the
force deviations apparent in Figure 3, does not
Ž 13.
Then one may safely assume sa to be in the range
between samin and sama x .
It is not easy to determine a priori which level
of coarse graining should be used for our comparisons of rc : few large volume elements, on the
one hand, minimize the statistical fluctuations
² D R yR : but provide no detailed picture of the
2
1
density distribution. Many small volume elements,
on the other hand, provide a high resolution picture of the density distribution but suffer from
large statistical fluctuations. To provide an unbiased analysis, we compared projected phase space
densities using various resolutions Ži.e., various
numbers K of volume elements V k, p ..
JOURNAL OF COMPUTATIONAL CHEMISTRY
FIGURE 6. Drift of total energy per picosecond for the
reference simulation REF, as well as for the seven
investigated algorithms. All eight algorithms were applied
with integration step sizes of 0.5, 1.0, and 2.0 fs.
1545
¨
GRUBMULLER
AND TAVAN
completely correspond to the observed energy drift
in Figure 6: large noise may be expected for DC-i,
DC-1c, DC-1a, or DC-1d; however, the latter three
algorithms that all obey the Verlet criterion exhibit
a moderate energy drift only. In contrast, the
largest energy drift is caused by the relatively
small noise forces of DC-0. One may conclude that
the Verlet criterion serves to reduce the total energy drift caused by algorithmic noise forces in DC
methods, in line with the objectives that led to the
formulation of that criterion.
AVERAGE HIGH-FREQUENCY SPECTRUM OF
ATOMIC VIBRATIONS
As is apparent in Figure 3, the force discontinuities caused by DC algorithms occur at every n s
2 j th integration step in our simulations. Accordingly, their influence on the atomic motion should
become apparent in averaged Fourier spectra of
atomic vibrations at high frequencies ŽFig. 7.. We
computed these spectra according to eq. Ž3. for the
integration step sizes of 0.5 and 1.0 fs, respectively.
The two spectra obtained from REF are plotted as
dashed lines in the upper left picture in Figure 7.
Due to the absence of algorithmic noise, they vanish at frequencies above 100 psy1 . In contrast, the
corresponding two spectra obtained from DC-i Žsolid lines. exhibit sharp artificial resonance peaks at
250 psy1 and, in the case of 1.0-fs step size, additionally at 125 psy1 . These are the frequencies
expected for the algorithmic noise forces, because
they represent the durations of four and eight
integration steps, respectively.
The upper right plot in Figure 7 shows the same
data, this time plotted as the relative error of DC-i
with respect to REF, again using 0.5-fs, as well as
1.0-fs, integration step sizes. In a corresponding
fashion, the bottom four plots of Figure 7 show
reltaive errors associated with DC-0, DC-1a, DC-1b,
and DC-1c, respectively. Note, however, the diminished error scales caused by the reduced force
discontinuities in these algorithms. Not shown are
high-frequency spectra of CUT, as well as DC-1d:
the former is identical to the reference spectrum,
whereas the latter spectrum is similar to DC-1c.
Due to the extended length of the simulations, no
statistical fluctuation could be observed in the
high-frequency spectra. For all methods, the magnitudes of the algorithmic noise peaks match
the sizes of the force discontinuities illustrated in
Figure 3 quite well.
Are these partially pronounced artifacts relevant? For an answer, first note the peak at
1546
FIGURE 7. High-frequency spectra of atomic mobilities;
the upper left shows the reference spectrum (dashed
lines) and DC-i spectrum (solid lines) that were both
obtained with integration step sizes of 0.5 fs (thin lines)
and 1.0 fs (bold lines); the remaining graphs show the
relative errors S a / S r e f of the spectra S a obtained using
the algorithms a g {DC-i, DC-0, DC-1a, DC-1b, DC-1c}
with integration step sizes of 0.5 fs (thin lines) and 1.0 fs
(bold lines); note the different scale for graph DC-k in the
upper right; no significant relative error was found for
CUT (results not shown); the relative error of DC-1d (not
shown) is similar to that of DC-1c.
f 30 psy1 in the upper left graph of Figure 7,
which originates from bond stretch vibrations.
These represent the fastest physical degrees of freedom in the system. Therefore, at higher frequencies, the spectrum does not represent physical
properties. Because no deviations of the spectra are
observed in the frequency range below 30 psy1 in
which physical properties are monitored, the apparent algorithmic artifacts are of no direct concern. It remains to be clarified, however, whether
the high-frequency noise and the necessary compensation of the energy drift cause errors in the
VOL. 19, NO. 13
MTS ALGORITHMS FOR MD SIMULATIONS OF PROTEINS
computation of relevant quantities. All further
comparisons serve to answer that question.
AVERAGE LOW-FREQUENCY SPECTRUM OF
ATOMIC VIBRATIONS
We first focus on low-frequency atomic motions: our extended simulations permitted the
computation of low-frequency spectra, including
frequencies as low as 0.05 psy1 with small statistical error. The upper part of Figure 8 shows three
spectra, each of which was obtained by averaging
over the three simulations with integration step
sizes 0.5, 1.0, and 2.0 fs. The dashed line represents
the reference spectrum.
None of the DC algorithms shows significant
deviations from REF; the spectrum obtained from
algorithm DC-1d is shown as an example Žthin,
solid line.. ON the contrary, CUT causes a pronounced suppression of atomic mobilities at frequencies below 0.2 psy1 . Inspection of the relative
deviations, S arS ref Žbottom of Fig. 8. reveals a
cutoff-induced suppression of up to 40% Žbold
line. with respect to REF. A noteworthy consequence of this finding is that apparently long-range
˚ which are absent in CUT
interaction above 10 A,
but present in REF, have a significant impact on
the low-frequency dynamics of our protein model.
In that respect our model reproduces well-known
properties of more realistic protein models.
ATOMIC RMS FLUCTUATIONS
To reduce statistical errors, the above spectra
were averaged over all atoms w cf. eq. Ž3.x , thereby
probably hiding algorithmic artifacts related to
motion of individual atoms. For a more detailed
measure of accuracy, we computed the atomic
mobilities ss as defined in eq. Ž4.. From these we
determined the RMS deviation D rms from REF,
D rms [
¦Ž s y s
s
s
;
ref . 2
s
1r2
.
The results for the seven algorithms using the
1-fs step size are shown in Figure 9. Note that the
observed deviation are not exclusively caused by
algorithmic approximations: due to the limited trajectory length, statistical fluctuations contribute to
the observed D rms . To estimate that statistical contribution, two reference simulations were compared with each other. The leftmost column in
Figure 9 shows this estimate; its value is marked
by the dashed line. Only deviations, which clearly
exceed that value, as is the case for DC-i and DC-0,
point toward algorithmic artifacts. For the other
algorithms the observed deviations are insignificant.
One might expect that the large random forces
caused by DC-i and DC-0 Žcf. Fig. 3. should increase atomic mobilities. However, a closer analysis of the data Žnot shown. reveals that these two
methods actually decrease the ss . We attribute this
effect to a decorrelation of atomic motions due to
the algorithmis random forces, resulting in a reduction of the inertial character and, hence, of the
amplitude of these motions.
FIGURE 8. Low-frequency spectra of atomic mobilities;
(top) spectrum obtained using REF (dashed), CUT (bold,
solid), and DC-1d (thin, solid); (bottom) relative errors
S a / S ref of the spectra S a for the algorithms a g
{CUT, DC-1d}.
JOURNAL OF COMPUTATIONAL CHEMISTRY
FIGURE 9. Root mean squared deviations of atomic
mobilities from reference mobilities, computed for the
seven different MD algorithms.
1547
¨
GRUBMULLER
AND TAVAN
CORRELATION OF ATOMIC MOTIONS
This speculation can be readily verified by
studying correlations of atomic motions. For that
purpose, displacement autocorrelation functions
C sdis were determined according to eq. Ž5.. Averages over 20 atoms are shown in Figure 10. These
autocorrelation functions, like D rms , are subject to
statistical fluctuations. To estimate the statistical
FIGURE 10. Average displacement autocorrelation
functions ² C dis ( t ):s of atomic positions; the results of
two reference simulations (thin, solid lines) are plotted
for an estimate of statistical fluctuations.
error, autocorrelation functions were computed
from two reference simulation Žthin, solid lines..
According to the results in Fig. 10, the tested
algorithms can be classified into three groups: in
agreement with the speculation voiced above, the
DC algorithms DC-i and DC-0 underestimate autocorrelations. The other algorithms DC-1a, DC-1b,
DC-1c, and CUT entail slight overestimates. No
significant deviation occurs for DC-1d. Thus, the
latter algorithm showed the best performance in
the tests of relevant observables considered so far.
Velocity autocorrelation functions were derived
according to eq. Ž6.. For the investigated algorithms there were no significant deviations from
REF observed. This result is explained by the close
relation of autocorrelation functions to spectra of
atomic motions, which are Fourier transformed
C vel. The velocity autocorrelation functions we
studied are determined by the physically relevant
part of the high-frequency spectrum discussed
above, which also did not exhibit any artifacts.
So far, we merely considered dynamical properties of single atoms but no interatomic relations.
To check whether correlated atomic motion are
described reliably by the various algorithms,
cross-correlation functions K s1 s 2 for 190 atom pairs
were computed according to eq. Ž7.. In the eight
pictures in Fig. 11, each of the diamonds represents one atom pair. The vertical axis measures the
FIGURE 11. Cross-correlation functions of atomic positions; each of the 190 diamonds represents one atom pair; the
vertical axis measures the value of its cross-correlation, the horizontal axis its average distance; in each of the eight
pictures the overall RMSD of the cross-correlation from REF is indicated at the bottom; the RMSD between two
reference simulations, indicated in the upper left picture of the figure, provides an estimate for the statistical error.
1548
VOL. 19, NO. 13
MTS ALGORITHMS FOR MD SIMULATIONS OF PROTEINS
value of its cross-correlation, the horizontal axis its
average distance. In addition, the overall RMS
deviation ŽRMSD. of the cross-correlation from REF
is indicated in each of the eight pictures. The
RMSD between two reference simulations, indicated in the upper left picture of Figure 11, provides an estimate for the statistical error.
Inspection of the REF cross-correlations shows
that in our protein model motions of close atoms
tend to be correlated, whereas distant atoms exhibit anticorrelations. Whereas this behavior is
qualitatively reproduced by all algorithms, quantitative differences can be readily observed: particularly at large distances, the noisy algorithms
DC-i and DC-0 tend to suppress Žanti-.correlations.
Considering the overall RMSDs from REF, one
observes that, in addition to DC-i and DC-0, CUT
and DC-1c also provide cross-correlations that differ significantly from the reference values Žalthough the scatterplots look almost identical.. No
significant deviations are seen for DC-1a, DC-1b,
and DC-1d. Hence, cross-correlations provide an
accuracy measure that is unable to discriminate
these DC algorithms from REF. As we will show
below, the projected configuration space densities
introduced in the Methods Section provide a more
sensitive measure.
PROJECTED CONFIGURATION
SPACE DENSITY
In our particular choice for the conformational
coordinates defined the projected configuration
space densities rc , we took care to monitor lowfrequency motions of our protein model. To this
aim we selected atoms, which in the tertiary structure of our model Žcf. Fig. 2. exhibit large relative
distances, for the definition of the conformational
coordinates. Thus, the temporal evolution of these
conformational coordinates should be dominated
by collective motions of the backbone and, hence,
should be of low-frequency character. As already
noted in connection with Figure 8, such degrees of
freedom are particularly influenced by long-range
forces.
We determined P s 50 density distributions rc
for each of the two reference simulations and each
of the seven tested MD algorithms. All comparisons of density distributions were carried out using nine different subdivisions V k 4 of conformational spaces ranging from coarse grained Ž K s 2.
to fine grained resolution Ž K s 512..
Additionally, a comparison of the two reference
simulations served to estimate statistical errors
JOURNAL OF COMPUTATIONAL CHEMISTRY
according to eqs. Ž8. and Ž9.. The upper left picture
of Figure 12 displays that estimate ² D R 2yR 1 : Žsolid
line.. The horizontal axis measures the level K of
graining. The increase of ² D R 2yR 1 : with K merely
reflects enhanced statistical fluctuations in the frequency counts of trajectory points within the conformation space volumes V k 4 . The standard deviation sR 2yR 1 of our error estimate, determined by
eq. Ž10., is plotted as dashed lines. The small
deviation range indicates that, in our case, projected configuration space densities actually represent regular quantities.
The other seven pictures of Figure 12 show the
averaged deviations D a of projected phase space
densities obtained by the seven MD algorithms a
with respect to REF. Here, according to eq. Ž11.,
the statistical contribution ² D R 2yR 1 : was subtracted from the observed deviations ² D S ayR 1 :.
Two error ranges are depicted, a lower 2 s limit
Ždashed. obtained by eq. Ž12., as well as an upper
limit Ždashed-dotted. obtained by eq. Ž13..
In all cases, the most significant deviations are
observed at intermediate levels K of coarse graining. These levels appear to provide a sufficiently
detailed description of the conformation space
densities rc to allow an identification of algorithmic artifacts. In contrast, the coarse grained view
on rc at small K levels, as well as the increased
statistical errors at large K, prevent a clearcut
distinction of algorithms.
Highly significant deviations D a are observed
for CUT, as well as for DC-1b. The most significant
FIGURE 12. Algorithmic deviations of projected phase
space densities and their statistical significance; the
upper left graph provides an estimate of the statistical
error; the horizontal axis measures the degree K of
coarse graining of conformational space.
1549
¨
GRUBMULLER
AND TAVAN
deviations for DC-i, DC-0, DC-1a, and DC-1c are
smaller by a factor of 4]5 and approximately of
size sama x , which is 7 times the size of samin . To
comment on the statistical significance of the latter
result, we note that the error range sama x represents an estimate for the worst case only, in which
all P density distributions employed for the evaluation of D a are completely correlated. Therefore,
one may assume that the actual sa is smaller than
sama x and we may attribute statistical significance
to the rc deviations of DC-i, DC-0, DC-1a, and
DC-1c, too. If we consider sama xr2 a reasonably
safe estimate for sa , then the hypothesis that algorithmic artifacts are absent in those four cases can
be rejected at a level of significance below 5%.
DC-1d is the only method for which the deviation
stays well even within the samin range; therefore,
no significant artifact is detected.
Discussion
Aiming at an evaluation of MD methods particularly designed for a description of protein dynamics, we presented problem-adapted measures
for algorithmic accuracy. A closer inspection of the
principles of MD simulation led us to the proposal
that such measures of accuracy should be based on
comparisons of regular, on nonchaotic, and relevant quantities. These comparisons should be of
type D n : the quantities derived from test simulations employing the MD algorithm to be evaluated
should be compared with ‘‘exact’’ reference quantities.
The requirement to compare regular quantities,
which generally are ensemble or time averages
and therefore show slow convergence demanding
extended test simulations, enforced the use of a
simplified protein model as a test system. This
allowed us to separate purely statistical fluctuations from the deviations caused by algorithmic
artifacts.
The requirement to use relevant quantities led
us to select a set of quantities that are considered
useful for descriptions of protein function on the
basis of MD simulations. The chosen quantities
included atomic fluctuations, vibrational spectra,
and various correlation functions of atomic motions, as well as projections of configuration space
density distributions. In addition, conservation of
total energy representing a conventional, although
in our view irrelevant, measure of accuracy was
studied.
We applied these measures of accuracy to seven
MD algorithms that all achieve a reduction of
computational effort by different types of approximation for the long-range Coulomb interaction.
The studied algorithms include a conventionally
used cutoff method, as well as a variety of recently
proposed DC algorithms, that belong to the family
of MTS methods. For a majority of the integration
steps, the latter replace exact long-range forces by
extrapolations using forces computed at previous
steps. They differ from each other by the respective extrapolation method.
As summarized in Table II, we observed that
DC methods, particularly an algorithm denoted as
DC-1d, reproduce the chosen relevant quantities
more accurately than the cutoff method. Specifically, the latter algorithm was found to artificially
suppress low-frequency atomic motions at frequencies around 0.1 psy1 by up to 40%, in addi-
TABLE II.
Observed Artifacts for Relevant Observables.
Observable
DE
S hf
S lf
ss
C sdis
C sv el
K s1 s 2
rc
CUT
—
—
DC-i
DC-0
DC-1a
DC-1b
DC-1c
vv
vv
v
v
v
v
vv
v
vv
v
vv
vv
—
—
—
—
—
—
—
—
—
—
—
—
vv
—
—
—
v
v
v
vv
vv
v
v
v
—
—
—
v
v
—
—
—
—
—
v
vv
v
v
v
vv
v
v
DC-1d
Total energy drift ( D E ), high frequency spectra ( S h f ), low frequency spectra ( S lf ), atomic fluctuations ( ss ), displacement
autocorrelation ( C sdis ), velocity autocorrelation ( C sv el ), cross-correlation of atomic motions ( K s 1s 2 ), and projected configuration
space densities ( rc ) are characterized by ( — ) no significant deviation from the reference simulation, (v) deviation observed, and
Žvv) severe deviation; the three vertical lines group methods that show similar deviation patterns.
1550
VOL. 19, NO. 13
MTS ALGORITHMS FOR MD SIMULATIONS OF PROTEINS
tion to the known structural artifacts caused by
this method.78 None of the DC methods showed
that artifact.
By construction, DC methods exhibit a different
type of artifact: as they approximate the long-range
forces, they introduce small errors into the simulations. That algorithmic noise causes a significant
drift of total energy, which was found to be larger
by 3]5 orders of magnitude than the one present
in an ‘‘exact’’ force computation. However, such
energy drift can easily be suppressed by a weak
coupling to a heat bath.41, 79 Our test simulations
served to address the question of whether the
algorithmic noise forces and the necessary temperature correction cause relevant artifacts.
Two of the investigated DC methods, DC-i and
DC-0, exhibited considerably larger noise than the
others. We found that in these two methods noise
decorrelates atomic motions, an effect that becomes apparent by inspecting autocorrelation
functions, as well as cross-correlations.
As a particularly sensitive tool to measure algorithmic accuracy with respect to low-frequency
collective motions, we studied projected configuration space densities. Because such quantities are
directly related to statistical observables like entropy, free energy, or reaction rates, we considered
them to be most relevant for protein function.
With respect to the projected configuration space
densities, DC-1d was the only approximation
method that did not show any deviation from an
exact force computation. In contrast, particularly
large artifacts were detected for the cutoff method,
as well as for algorithm DC-1b. The large deviations displayed by the latter method are somewhat
surprising, because they are in contrast to the
other measures of accuracy.
Considering the whole set of relevant quantities
that we studied, DC-1d turned out to be the only
MD algorithm that did not exhibit any algorithmic
artifacts. Because DC methods are generally as
efficient as the cutoff method, algorithm DC-1d
should be preferred for protein dynamics simulations.
Table II also shows that the conventional criterion of total energy conservation yields a significantly different ranking of the seven algorithms
considered. Accordingly, our problem-adapted approach employing regular and functionally relevant quantities also demonstrates that focusing on
energy conservation may provide misleading information on the suitability of MD methos for
protein dynamics simulations and therefore is in
appropriate for that purpose. Generally, not a sin-
JOURNAL OF COMPUTATIONAL CHEMISTRY
gle observable, but rather the combined consideration of a number of relevant quantities, provides a
comprehensive picture of algorithmic suitability.
Acknowledgments
The authors would like to thank C. Niedermeier
for helpful discussions. This work was supported
by the Deutsche Forschungsgemeinschaft Grants
SFB 143rC1 and SFB 533rC1.
References
1. M. Levitt, J. Mol. Biol., 168, 621 Ž1983..
2. H. Fraunfelder, Biophys. J., 47, 35 Ž1985..
3. J. A. McCammon and M. Karplus, Proc. Natl. Acad. Sci.
USA, 76, 3585 Ž1979..
4. M. Levitt and S. Lifson, J. Mol. Biol., 46, 269 Ž1969..
5. J. A. McCammon, B. R. Gelin, and M. Karplus, Nature
(Lond. ), 267, 585 Ž1977..
6. W. F. van Gunsteren and H. J. C. Berendsen, Mol. Phys., 34,
1311 Ž1977..
7. J. A. McCammon and S. C. Harvey, Dynamics of Proteins
and Nucleic Acids, Cambridge University Press, Cambridge,
U.K., 1987.
8. G. Petsko, Nature (Lond. ), 371, 740 Ž1994..
9. H. J. C. Berendsen, Science, 271, 954 Ž1996..
10. H. Heller, M. Schaefer, and K. Schulten, J. Phys. Chem., 97,
8343 Ž1993..
11. F. Zhou, A. Windemuth, and K. Schulten, Biochemistry, 32,
2291 Ž1993..
12. O. Edholm, O. Berger, and F. Jahnig,
J. Mol. Biol., 250, 94
¨
Ž1995..
13. H. Grubmuller,
B. Heymann, and P. Tavan, Science, 271, 997
¨
Ž1996..
14. M. Rief, F. Oesterhelt, B. Heymann, and H. E. Gaub, Science,
275, 1295 Ž1997..
15. W. F. van Gunsteren and H. J. C. Berendsen, Angew. Chem.
Int. Ed. Engl., 29, 992 Ž1990..
16. N. S. Ostlund and R. A. Whiteside, Ann. N.Y. Acad. Sci.,
439, 195 Ž1985..
17. H. Heller, H. Grubmuller,
and K. Schulten, Mol. Simul., 5,
¨
133 Ž1990..
18. B. R. Brooks and M. Hodoscek,
˘˘ Chem. Design Autom. News,
7, 16 Ž1992..
19. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States,
S. Swaminathan, and M. Karplus, J. Comput. Chem., 4, 187
Ž1983..
20. W. F. van Gunsteren and H. J. C. Berendsen, GROMOS
Manual, BIOMOS b.v., Biomolecular Software, Groningen,
The Netherlands, 1987.
21. B. Leimkuhler and R. D. Skeel, J. Comput. Phys., 112, 117
Ž1994..
22. B. Leimkuhler, S. Reich, and R. D. Skeel, In IMA Volumes
in Mathematics and Its Applications, Vol. 82, K. Schulten
and J. P. Mesivav,
Eds., Springer, New York, 1996, p. 161.
´
1551
¨
GRUBMULLER
AND TAVAN
23. R. D. Skeel, G. H. Zhang, and T. Schlick, SIAM J. Sci.
Comput., 18, 203 Ž1997..
24. A. Ahmad and L. Cohen, J. Comput. Phys., 12, 389 Ž1973..
25. W. B. Streett, D. J. Tildesley, and G. Saville, Mol. Phys., 35,
639 Ž1978..
26. R. C. Y. Chin, G. W. Hedstrom, and F. A. Howes, Considerations on Solving Problems with Multiple Scales, 1st ed., Academic Press, Orlando, FL, 1985, p. 1.
27. A. Windemuth, Diploma thesis, Technical University of
Munich, Munich, 1988.
28. M. E. Tuckerman, G. J. Martyna, and B. J. Berne, J. Chem.
Phys., 93, 1287 Ž1990..
29. H. Grubmuller,
H. Heller, A. Windemuth, and K. Schulten,
¨
Mol. Simul., 6, 121 Ž1991..
30. M. E. Tuckerman and B. J. Berne, J. Chem. Phys., 94, 1465
Ž1991..
31. M. E. Tuckerman, B. J. Berne, and G. J. Martyna, J. Chem.
Phys., 94, 6811 Ž1991..
32. R. D. Skeel, J. J. Biesiadecki, and D. Okunbor, In Proceedings
of the International Conference on Computation of Differential
Equations and Dynamical Systems, World Scientific, Singapore, 1992.
33. R. D. Skeel and J. J. Biesiadecki, Ann. Num. Math., 1, 191
Ž1994..
34. H. Grubmuller,
Ph.D. thesis, Technische Universitat,
¨
¨
Munchen,
1994.
¨
35. D. Okunbor and R. D. Skeel, Explicit Canonical Methods for
Hamiltonian Systems w Working documentx , Numerical Computing Group, University of Illinois at Urbana]Champaign,
1991.
36. A. Windemuth, Advanced Algorithms for Molecular Dynamics
Simulation: The Program PMD, ACS Books, Washington,
D.C., 1995.
37. D. D. Humphreys, R. A. Friesner, and B. J. Berne, J. Phys.
Chem., 99, 10674 Ž1995..
38. P. Procacci, T. Darden, and M. Marchi, J. Phys. Chem., 100,
10464 Ž1996..
39. S. J. Stuart, R. Zhou, and B. J. Berne, J. Chem. Phys., 105,
1426 Ž1996..
40. R. Zhou and B. J. Berne, J. Phys. Chem., 103, 9444 Ž1995..
41. T. Schlick, E. Bartha, and M. Mandziuk, Annu. Rev. Biophys. Biomol. Struct., 26, 181 Ž1997..
42. A. W. Appel, SIAM J. Sci. Stat. Comput., 6, 85 Ž1985..
43. J. Barnes and P. Hut, Nature (Lond. ), 324, 446 Ž1986..
44. L. Greengard and V. Rokhlin, Chem. Scr., 29A, 139 Ž1989..
45. J. F. Leathrum and J. A. Board, The Parallel Fast Multipole
Algorithm in Three Dimensions w Technical reportx , Dept. of
Electrical Engineering, Duke University, Durham, NC, 1992.
46. C. Niedermeier and P. Tavan, J. Chem. Phys., 101, 734
Ž1994..
47. C. Niedermeier, Ph.D. thesis, Ludwig-Maximilians-Universitat,
Germany, 1995.
¨ Munchen,
¨
48. C. Niedermeier and P. Tavan, Mol. Simul., 17, 57 Ž1996..
49. B. A. Luty, I. G. Tironi, and W. F. van Gunsteren, J. Chem.
Phys., 103, 3014 Ž1995..
50. U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee,
and L. G. Pedersen, J. Chem. Phys., 103, 8577 Ž1995..
1552
51. B. A. Luty, I. G. Tironi, and W. F. van Gunsteren, J. Chem.
Phys., 103, 3014 Ž1995..
52. B. A. Luty and W. F. van Gunsteren, J. Phys. Chem., 100,
2581 Ž1996..
53. S. J. Aarseth, Direct Methods for N-Body Simulations, 1st ed.,
Academic Press, Orlando, FL, 1985, p. 378.
54. H. Grubmuller,
Diploma thesis, Technische Universitat
¨
¨
Munchen,
Germany, 1989.
¨
55. M. E. Tuckerman and B. J. Berne, J. Chem. Phys., 95, 8362
Ž1991..
56. L. Greengard and V. Rokhlin, J. Comput. Phys., 73, 325
Ž1987..
57. M. Eichinger, H. Grubmuller,
H. Heller, and P. Tavan,
¨
J. Comput. Chem., 18, 1729 Ž1997..
58. L. Verlet, Phys. Rev., 159, 98 Ž1967..
59. G. D. Quinlan and S. Tremaine, Astron. J., 100, 1694 Ž1990..
60. K. D. Gibson and H. A. Scheraga, J. Comput. Chem., 11, 468
Ž1990..
61. M. Prevost,
D. van Belle, G. Lippens, and S. Woodak, Mol.
´
Phys., 71, 587 Ž1990..
62. R. J. Loncharich and B. R. Brooks, Proteins, 6, 32 Ž1989..
63. H. Grubmuller
and P. Tavan, J. Chem. Phys., 101, 5047
¨
Ž1994..
64. A. Windemuth and K. Schulten, Mol. Simul., 5, 353 Ž1991..
65. R. D. Skeel, private communication, 1991.
66. R. D. Skeel and J. J. Biesiadecki, In Proceedings of SCADE93,
Univ. of Auckland, New Zealand, January 4]8, 1993.
67. H. Frauenfelder, S. G. Sligar, and P. G. Wolynes, Science,
254, 1598 Ž1991..
68. W. Doster, S. Cusack, and W. Petry, Nature (Lond. ), 337, 754
Ž1989..
69. H. Frauenfelder, N. A. Alberding, A. Ansari, D. Braunstein,
B. R. Cowen, M. K. Hong, I. E. T. Iben, J. B. Johnson, S.
Luck, M. C. Marden, J. R. Mourant, P. Ormos, L. Reinisch,
R. Scholl, A. Schulte, E. Shymasunder, L. B. Sorensen, P. J.
Steinbach, A. Xie, R. D. Young, and K. T. Yue, J. Phys.
Chem., 94, 1024 Ž1990..
70. J. Zollfrank, J. Friedrich, and F. Parak, Biophys. J., 61, 716
Ž1992..
71. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T.
Vetterling, Numerical Recipes in C, Cambridge University
Press, Cambridge, U.K., 1988.
72. W. Nadler, A. T. Brunger,
K. Schulten, and M. Karplus,
¨
Proc. Natl. Acad. Sci. USA, 84, 7933 Ž1987..
73. H. Frauenfelder, P. J. Steinbach, and R. D. Young, Chem.
Scr., 29A, 145 Ž1989..
74. H. Grubmuller,
Phys. Rev. E, 52, 2893 Ž1995..
¨
75. P. A. Hargrave, H. E. Hamm, and K. P. Hofmann, BioEssays, 15, 43 Ž1993..
76. K. Schulten and M. Tesch, Chem. Phys., 158, 421 Ž1991..
77. R. A. Marcus and N. Sutin, Biocheim. Biophys. Acta, 811, 265
Ž1985..
78. C. L. Brooks III, B. M. Pettitt, and M. Karplus, J. Chem.
Phys., 83, 5897 Ž1985..
79. H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren,
A. DiNola, and J. R. Haak, J. Chem. Phys., 81, 3684 Ž1984..
80. O. Gonzalez and J. Simo, Comput. Methods Appl. Mech.
Eng., 134, 197 Ž1996..
VOL. 19, NO. 13