2016 Comput Biol Chem 64 403-13
2016 Comput Biol Chem 64 403-13
2016 Comput Biol Chem 64 403-13
Department of Physics and Astronomy, Louisiana State University, Baton Rouge, LA 70803, USA
School of Electrical Engineering and Computer Science, Louisiana State University, Baton Rouge, LA 70803, USA
Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
d
Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, USA
b
c
A R T I C L E I N F O
Article history:
Received 20 May 2016
Received in revised form 17 August 2016
Accepted 25 August 2016
Available online 6 September 2016
Keywords:
Contact Mode Score
CMS
eXtended Contact Mode Score
XCMS
Root-mean-square deviation
RMSD
Molecular docking
Ligand binding conformations
A B S T R A C T
Structural and computational biologists often need to measure the similarity of ligand binding
conformations. The commonly used root-mean-square deviation (RMSD) is not only ligand-size
dependent, but also may fail to capture biologically meaningful binding features. To address these issues,
we developed the Contact Mode Score (CMS), a new metric to assess the conformational similarity based
on intermolecular protein-ligand contacts. The CMS is less dependent on the ligand size and has the
ability to include exible receptors. In order to effectively compare binding poses of non-identical ligands
bound to different proteins, we further developed the eXtended Contact Mode Score (XCMS). We believe
that CMS and XCMS provide a meaningful assessment of the similarity of ligand binding conformations.
CMS and XCMS are freely available at http://brylinski.cct.lsu.edu/content/contact-mode-score and http://
geaux-computational-bio.github.io/contact-mode-score/.
2016 Elsevier Ltd. All rights reserved.
1. Introduction
Molecular docking is a computational technique routinely used
in protein function analysis and drug discovery (Cheng et al., 2012;
Yuriev et al., 2015). Docking calculations usually consist of two
successive stages, the prediction of the favorable orientation of a
small molecule when bound to its target protein followed by the
estimation of binding afnity and/or free energy of binding.
Scoring functions widely used in molecular docking evaluate
protein-ligand conformations in terms of the shape and electrostatic complementarity, as well as the presence of stabilizing
interactions such as hydrogen bonds, salt bridges, and hydrophobic
contacts (Yusuf et al., 2008). Since these factors hinge on the ligand
binding mode, accurately predicted protein-ligand conformations
are required for meaningful scoring.
A common practice in benchmarking docking programs is to
evaluate predicted conformations against experimentally solved
404
and the binding pocket center was set to the geometric center of
the compound bound in the experimental complex. This procedure
produced 2200 docked conformations of query ligands. For each
simulated conformation, RMSD and CMS were calculated against
the experimental structure, whereas the XCMS was calculated
using a template. Similar to the experimental BioLiP dataset, we
included only those templates having more than 6 heavy atoms, a
PS-score of <0.9, and a 1D-TC of >0.5. For the template-based
assessment with XCMS, suitable templates were identied for a
subset of 695 targets.
2.3. Molecular representation
Fast computation without compromising molecular details is
achieved by describing protein-ligand complex structures at a
mixed-resolution. A heavy-atom representation is used for ligands
with the following chemical types according to SYBYL (Clark et al.,
1989): carbon sp (C.1), carbon sp2 (C.2), carbon sp3 (C.3), aromatic
carbon (C.ar), carbocation in guadinium groups (C.cat), nitrogen sp
(N.1), nitrogen sp2 (N.2), nitrogen sp3 (N.3), positively charged
nitrogen sp3 (N.4), amide nitrogen (N.am), aromatic nitrogen (N.
ar), trigonal planar nitrogen (N.pl3), oxygen sp2 (O.2), oxygen sp3
(O.3), oxygen in carboxylate and phosphate groups (O.co2),
phosphorous sp3 (P.3), sulfur sp2 (S.2), sulfur sp3 (S.3), sulfoxide
sulfur (S.O), sulfone sulfur (S.O2), and halogens (Br, Cl, F, I). Proteins
are represented at the coarse-grained level. In CMS, two effective
backbone points per residue are placed at the position of its Ca
atom (CA) and the geometrical center of the peptide plane (PP).
Small side chains of Ala, Asn, Asp, Cys, Ile, Leu, Pro, Ser, Thr and Val
are reduced to one pseudo atom located at the geometric center,
whereas longer side chains of Arg, Gln, Glu, His, Lys, Met, Phe, Trp
and Tyr are described by two effective points corresponding to the
middle of a virtual Cb-Cg bond and the geometric center of the
remaining side-chain atoms (Zacharias, 2003). It is noteworthy
that this model is already implemented in a molecular docking
program, GeauxDock (Ding et al., 2015). In XCMS, two effective
points per residue are used at the positions of its Ca and Cb atoms
(CA and CB, respectively), except for glycine that has only the CA
atom.
2.4. Intermolecular contacts
Contacts between ligand heavy atoms and protein effective
points in the mixed-resolution model are calculated using type-
405
Fig. 1. Calculation of the Contact Mode Score (CMS). First, intermolecular contacts calculated between ligand atoms L and protein effective points P are stored in binary
matrices (1contact, 0no contact). Contact matrices for two arbitrary ligand conformations are shown in A and C, whereas B is a contact matrix constructed for the reference
conformation. Next, a confusion table is computed for a pair of contact matrices; tables D and E are calculated for pairs AB and CB, respectively. Finally, CMS is calculated as
the Matthews correlation coefcient for a given confusion table.
406
Fig. 2. Calculation of the eXtended Contact Mode Score (XCMS). First, Cartesian distances calculated between ligand atoms L and protein effective points P are stored in
distance matrices. Matrices for two arbitrary ligand conformations are shown in A and C, whereas B is a distance matrix for the reference conformation (distances are given in
). Next, two matrices are converted to distance vectors whose elements correspond to pairs of protein effective points and ligand atoms (P:L). Finally, XCMS is computed as
Spearmans rank correlation coefcient for a given set of vectors.
407
Fig. 3. Parameterization of mixed-resolution intermolecular contacts. The distribution of (A) contact distance thresholds Dcnt
ip and (B) the Matthews correlation coefcient
(MCC) values calculated vs. exact interatomic contacts across the eFindSite dataset.
Table 1
Dependence of RMSD and CMS on the ligand size. Ligand conformations from the
Astex/CCDC dataset were subjected to one round of perturbation comprising a set of
forward translations and clockwise rotations. The mean values of RMSD and CMS
are reported for each size range.
Ligand sizea
RMSD []
CMS
617
1828
2939
4050
5162
0.527
0.851
0.961
1.334
1.541
0.879
0.793
0.757
0.666
0.625
408
Fig. 4. Comparison of RMSD and CMS in the perturbation experiment. (A) Scatter plot of RMSD (dark gray squares) and CMS (light gray circles) vs. the number of ligand atoms
after a single perturbation round. Boxplots of (B) CMS and (C) RMSD calculated for ligand conformations generated through multiple perturbation rounds. Boxes end at the 25
and 75 percentiles, a horizontal line in a box is the 50 percentile (median).
Table 2
Changes in RMSD and CMS in the perturbation experiment. Ligand conformations
from the Astex/CCDC dataset were subjected to multiple rounds of perturbation,
each comprising a set of forward translations and clockwise rotations. 25, 50, and 75
percentiles as well as the quartile coefcient of dispersion (QCD) calculated across
the dataset are reported for each perturbation round.
Perturbation
round
CMS
25%
50%
75%
QCD
25%
50%
75%
QCD
1
2
3
4
5
0.738
0.605
0.503
0.415
0.357
0.793
0.673
0.561
0.485
0.426
0.837
0.734
0.638
0.564
0.493
0.063
0.096
0.118
0.152
0.161
0.708
1.175
1.599
2.018
2.489
0.839
1.368
1.876
2.387
2.872
1.001
1.613
2.237
2.808
3.394
0.171
0.157
0.166
0.164
0.154
RMSD
409
Fig. 5. Analysis of docking trajectories with the CMS. Docking simulations were conducted using GeauxDock for (A, B) penicillopepsin/pepstatin analogue (PDB-ID: 1apt,
chain A) and (C, D) plasminogen activator/inhibitor (PDB-ID: 1c5x, chain B). (A, C) Metropolis Monte Carlo trajectories for CMS (green) and pseudo-energy (E, blue). (B, D)
Scatter plots of CMS vs. the pseudo-energy; each dot represents an accepted protein-ligand conformation. (For interpretation of the references to colour in this gure legend,
the reader is referred to the web version of this article.)
Fig. 6. Examples of docking poses from GeauxDock simulations. (A-C) penicillopepsin/pepstatin analogue (PDB-ID: 1apt, chain A) and (DF) plasminogen activator/inhibitor
(PDB-ID: 1c5x, chain B). Three docking poses are shown in blue for each system, (A, D) initial, (B, E) intermediate, and (C, F) nal conformations. The corresponding
experimental complex structures are colored in orange. (For interpretation of the references to colour in this gure legend, the reader is referred to the web version of this
article.)
quartile) RMSD, CMS, and XCMS for Vina is 2.89 , 0.574, and 0.694,
respectively, compared to 7.60 , 0.152, and 0.198 for random
conformations. Overall, these results demonstrate that when
suitable templates can be identied in the BioLiP database, a
retrospective assessment with RMSD and CMS against experimental structures can be replaced with a template-based evaluation
using the XCMS.
410
Fig. 7. XCMS and its statistical signicance for the BioLiP dataset. Query-template pairs are grouped based on the similarity between their ligands (measured by the 2D
Tanimoto coefcient) and pockets (measured by PS-score). Heat maps of (A) the arithmetic mean values of XCMS and (B) the geometric mean of the p-value for positive XCMS.
Fig. 8. Correlation between RMSD, CMS, and XCMS. Docking conformations generated for the BioLiP dataset by AutoDock Vina are used to calculate RMSD and CMS against
experimental binding poses. XCMS was computed against a holo template selected from the BioLiP database based on the highest value of the product of PS-score and the 2D
Tanimoto coefcient. Scatter plots of (A) CMS vs. RMSD and (B) CMS vs. XCMS.
Fig. 9. Assessment of docked and randomized ligand conformations across the BioLiP dataset. The similarity to experimental binding poses is assessed with (A) RMSD, (B)
CMS, and (C) XCMS. RMSD and CMS were calculated against experimental complex structures. XCMS was calculated against a holo template selected from the BioLiP database
based on the highest value of the product of PS-score and the 2D Tanimoto coefcient. Dark gray violins correspond to ligands docked by AutoDock Vina, whereas light gray
violins are calculated for randomized ligand conformations. Black horizontal lines are median values.
mean
25%
50%
75%
AutoDock Vina
Random
RMSD []
CMS
XCMS
RMSD []
CMS
XCMS
3.66
1.40
2.89
5.29
0.548
0.308
0.574
0.798
0.545
0.203
0.694
0.912
8.03
5.49
7.60
10.02
0.191
0.070
0.152
0.279
0.194
0.036
0.198
0.366
Table 4
Assessment of ligand binding poses docked by AutoDock Vina. Two case studies are
presented, MAPK14 complexed with triazolopyridine inhibitor (PDB-ID: 2yiw,
ligand YIW, chain A) and ribose-5-phosphate isomerase complexed with the
inhibitor arabinose-5-phosphate (PDB-ID: 1o8b, ligand ABF, chain A).
Metric/info
Case study
2YIW_YIW_A
1O8B_ABF_A
1.58
0.77
Template-based assessment
Template
TM-scorea
PS-scoreb
p-value of PS-score
2D-TCc
Query EC#
Template EC#
XCMS
p-value of XCMS
3BXH_F6P_A
0.27
0.46
4.90E-05
0.88
5.3.1.6
Non-enzyme
0.76
1.56E-63
3F3U_1AW_A
0.76
0.7
6.28E-09
0.41
2.7.11.24
2.7.10.2
0.96
0
411
Fig. 10. Examples of the superposition of query and template structures. The query protein is ice blue with its binding residues marked by red dots and the bound ligand
shown as red sticks. The template protein is cyan with its binding residues marked by green dots and the bound ligand shown as green sticks. (A, B) The superposition of
MAPK14 (PDB-ID: 2yiw, chain A) and c-Src (PDB-ID: 3f3u, chain A). (C, D) The superposition of ribose-5-phosphate isomerase (PDB-ID: 1o8b, chain A) and central glycolytic
gene regulator (PDB-ID: 3bxh, chain A). For each pair, two superpositions are shown, (A, C) the global structure alignment by Fr-TM-align and (B, D) the local pocket alignment
by APoc. (For interpretation of the references to colour in this gure legend, the reader is referred to the web version of this article.)
412
413