Abstract
X-ray crystallography provides the most accurate models of protein–ligand structures. These models serve as the foundation of many computational methods, including structure prediction, molecular modelling, and structure-based drug design. The success of these computational methods ultimately depends on the quality of the underlying protein–ligand models. X-ray crystallography offers the unparalleled advantage of a clear mathematical formalism relating the experimental data to the protein–ligand model. In X-ray crystallography, the primary experimental evidence is the electron density of the molecules forming the crystal. The first step in generating an accurate and precise crystallographic model is the interpretation of this electron density, typically carried out by constructing an atomic model. The atomic model must then be validated both for fit to the experimental electron density and for agreement with prior expectations of stereochemistry. Stringent validation of protein–ligand models has become possible as a result of the mandatory deposition of primary diffraction data, and many computational tools are now available to aid in the validation process. Validation of protein–ligand complexes has revealed some instances of overenthusiastic interpretation of ligand density. We discuss fundamental concepts and metrics of protein–ligand quality validation and highlight software tools that assist in this process. It is essential that end users select high-quality protein–ligand models for their computational and biological studies, and we provide an overview of how this can be achieved.
Acknowledgments
MCD acknowledges support from the NIH, National Institute of General Medical Sciences, Protein Structure Initiative under Grant Number U54 GM094586. BR acknowledges support from the European Union under a FP7 Marie Curie People Action, Grant PIIF-GA-2011–300025 (SAXCESS).
Cite this article
Deller, M.C., Rupp, B. Models of protein–ligand crystal structures: trust, but verify. J Comput Aided Mol Des 29, 817–836 (2015). https://doi.org/10.1007/s10822-015-9833-8
DOI: https://doi.org/10.1007/s10822-015-9833-8