Nothing Special   »   [go: up one dir, main page]

Academia.eduAcademia.edu
ChemComm Open Access Article. Published on 20 January 2016. Downloaded on 05/01/2017 10:20:31. This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. COMMUNICATION Cite this: Chem. Commun., 2016, 52, 3312 Received 16th November 2015, Accepted 11th January 2016 DOI: 10.1039/c5cc09499b View Article Online View Journal | View Issue The SQM/COSMO filter: reliable native pose identification based on the quantum-mechanical description of protein–ligand interactions and implicit COSMO solvation† Adam Pecina,‡a René Meier,‡b Jindřich Fanfrlı́k,a Martin Lepšı́k,a Jan Řezáč,a Pavel Hobza*ac and Carsten Baldauf*d www.rsc.org/chemcomm Current virtual screening tools are fast, but reliable scoring is elusive. Here, we present the ‘SQM/COSMO filter’, a novel scoring function featuring a quantitative semiempirical quantum mechanical (SQM) description of all types of noncovalent interactions coupled with implicit COSMO solvation. We show unequivocally that it outperforms eight widely used scoring functions. The accuracy and chemical generality of the SQM/COSMO filter make it a perfect tool for late stages of virtual screening. Despite the enormous advances in method development for structure-based in silico drug design, reliable predictions of the structures (docking) and affinities (scoring) of protein–ligand (P–L) complexes still remain an unsolved task.1 A plethora of scoring functions (SFs) have been devised by utilising experimental data for regression analyses, by constructing knowledgebased potentials, or based on physical laws.2,3 As none of the SFs is general enough to perform equally strongly for a diverse set of P–L complexes, utilising several SFs at once (consensus scoring) holds promise.4 Regression analysis and knowledge-based approaches to scoring are trained on a set of P–L complexes and rely on variable master equation terms. Their validity is limited to complexes similar to the training set. In principle, this problem has been overcome in physics-based methods. Because of computational cost, preference has been given to molecular mechanics (MM) methods, such as the combination of MM interaction energies with implicit solvation free energy terms (generalised Born, GB, or Poisson–Boltzmann, PB) to a Institute of Organic Chemistry and Biochemistry (IOCB) and Gilead Sciences and IOCB Research Center, Flemingovo nám. 2, 16610 Prague 6, Czech Republic. E-mail: hobza@uochb.cas.cz b Institut für Biochemie, Fakultät für Biowissenschaften, Pharmazie und Psychologie, Universität Leipzig, Brüderstrasse 34, D-04109 Leipzig, Germany c Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Palacký University, 77146 Olomouc, Czech Republic d Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, D-14195 Berlin, Germany. E-mail: baldauf@fhi-berlin.mpg.de † Electronic supplementary information (ESI) available. See DOI: 10.1039/c5cc09499b ‡ These authors have contributed equally to this work. 3312 | Chem. Commun., 2016, 52, 3312--3315 estimate affinities.2 Additionally, the wide coverage of organic chemical space in the GAFF (general AMBER force field)5 has made the parameterisation of ligands for MM straightforward. However, an explicit description of quantum mechanical (QM) effects in P–L interactions, such as charge transfer, polarisation, covalent-bond formation or s-hole bonding, was missing. QM methods, which describe these effects qualitatively better than the energy functions used in MM-based SFs, were thus introduced into computational drug design.6,7 Recent developments in QM methods and algorithms as well as the availability of a powerful computing infrastructure have paved the way to apply them for P–L complexes in numerous setups: linear scaling or efficient parallelisation of semi-empirical QM (SQM) methods,7–10 QM/MM,7,8,11,12 DFT-D3 on truncated P–L complexes13 or various fragmentation methods.11,14 Specifically, AM1, RM1, PM6 or DF–TB SQM methods have been used7–9,12,15 as such or with empirical corrections for dispersion, hydrogen- and halogen-bonding16 to describe the P–L noncovalent interactions. Merz et al. pioneered this area by introducing a QM-based SF (QMScore), a combination of the AM1 SQM method with an empirical dispersion (D) and the PB implicit solvent [eqn (1)].17 The method was useful for describing metalloprotein–ligand binding, but further corrections were needed, especially for a quantitative treatment of dispersion and hydrogen bonding.10 Score = DEint + DDGsolv + DG0w conf TDS (1) The above equation is a general physics-based SF. The terms are the gas-phase interaction energy (DEint), the change of solvation free energy upon complex formation (DDGsolv), the change of conformational ‘free’ energy (DG0w conf) and the change of entropy upon ligand binding ( TDS). Our approach is systematic. Using accurate calculations in small model systems as a benchmark, we developed corrections for SQM methods that provide reliable and accurate description of a wide range of noncovalent interactions including dispersion, hydrogen- and halogen-bonding.16 Coupled with the PM6 SQM method,18 the resulting PM6-D3H4X approach is applicable to a wide chemical space and does not require any This journal is © The Royal Society of Chemistry 2016 View Article Online Open Access Article. Published on 20 January 2016. Downloaded on 05/01/2017 10:20:31. This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. Communication ChemComm system-specific parameterisation. We use it here to calculate the DEint term. Subsequently, we compared MM-based (PB or GB) and QM-based (COSMO19 or SMD) implicit solvent models and found the latter group to be more accurate.20 These are therefore used for the DDGsolv term. These two dominant terms, DEint and DDGsolv, are at the heart of our SQM-based SF.15 We have demonstrated its generality in various noncovalent P–L complexes, such as aldose reductase or carbonic anhydrase and moreover extended it to treat covalent inhibitor binding (ref. 15, 21 and 22). In this work, we adapt our SQM-based SF to make it usable in virtual screening on the basis of our previous experience. By taking the two dominant terms only, DEint and DDGsolv, we define the ‘SQM/COSMO filter’ energy. Its performance is tested here against eight widely used SFs. GlideScore XP (GlideXP),23 PLANTS PLP (PLP),24 AutoDock Vina (Vina),25 Chemscore (CS),26 Goldscore (GS)27 and ChemPLP24 are empirical, regression-based functions which use different terms to describe vdW contacts, lipophilic surface coverage, hydrogen bonding, ligand strain, and desolvation. The Astex Statistical Potential (ASP)28 is a knowledge-based potential. The classical physics-based AMBER/GB SF combines the ff03-GAFF MM force fields with the GB implicit solvent.5,29 The goal is ‘cognate docking’,30 i.e. the ability to identify sharply the known native X-ray P–L binding pose from a set of decoy structures generated by docking (Fig. 1). To understand our results in detail, we have not opted for treating them in a statistical manner31 as in the pose decoy test sets available.32 Instead we cautiously selected four unrelated difficult-to-handle P–L systems, which comply with strict criteria for the selection of crystallographic structures for docking (details in the ESI†).33 These systems are acetylcholine esterase (AChE, PDB: 1E66),34 TNF-a converting enzyme (TACE, PDB: 3B92),35 aldose reductase (AR, PDB: 2IKJ)36 and HIV-1 protease (HIV PR; PDB: 1NH0).37 For the latter, the protonation of the active site is inferred from ultrahigh resolution X-ray crystallography. Based on these P–L crystal structures, we have created a set of non-redundant poses (2865 in total) by docking with four popular docking programs (Glide, PLANTS, AutoDock Vina and GOLD) coupled to seven widely used SFs23–28 (Fig. 1, Table S2, ESI†). All the poses were re-scored by all nine SFs. For the seven regression- and knowledge-based SFs, we used the recommended protocols. For the two physics-based SFs, only hydrogen atoms and close contacts were relaxed by the AMBER/GB method. RMSD values of the poses relative to the crystal were measured (details in S1.6, ESI†). The scores were normalised and are shown relative to the score of the crystal pose. The identification of the X-ray pose as the minimum-freeenergy structure is an unambiguous criterion for the performance of any SF. The ideal behaviour of such a score vs. RMSD curve (Fig. 2) is characterised by the positive values of energies for the decoy poses. Small deviations (negative energies for very small RMSD values) are acceptable and might be explained by inaccuracies of the crystal structure. These conditions are met by the SQM/COSMO filter, unlike the other SFs (Fig. 2). The numbers of false-positive solutions as well as the maximum RMSD (RMSDmax) from the X-ray pose within a defined interval of the normalised score quantify the virtually ideal behaviour of the SQM/COSMO filter in comparison to the other SFs. The number of false-positives is lowest for the SQM/COSMO filter, even zero for three P–L systems (Table 1). CS and ASP perform slightly worse. AMBER/GB performs satisfyingly well for three systems but yields 171 false-positives for TACE. For AChE, all the SFs perform satisfyingly well. For AR and HIV PR, GlideXP generates the highest number of false-positive solutions and also shape-wise the free energy landscape looks ill-defined (Fig. 2). In the case of AR, a plateau of negative relative scores is observed for GlideXP. The hardest case is the TACE metalloprotein. Here, all the SFs produce false-positive solutions but to a different extent. The SQM/COSMO filter performs best, followed by CS. This example in particular shows the strength of an electronic-structure theory description of P–L binding. The presence of the metal cation in this protein and the associated charge-transfer effects between the ligand and the cation are not adequately described by classical force-fields Fig. 1 The ligand poses generated by the four docking programs. Ligand poses are color-coded by RMSD. Fig. 2 The plots of normalised scores against RMSD values for all four P–L systems. This journal is © The Royal Society of Chemistry 2016 Chem. Commun., 2016, 52, 3312--3315 | 3313 View Article Online ChemComm Table 1 Communication The numbers of false-positive solutions, i.e. solutions that are scored better than the X-ray pose and have RMSD 4 0.5 Å Open Access Article. Published on 20 January 2016. Downloaded on 05/01/2017 10:20:31. This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. Scoring function AChE AR TACE HIV PR Total Table 2 Glide PLANTS AutoDock Gold SQM/COSMO AMBER/GB XP PLP Vina ASP CS GS ChemPLP 0 0 39 0 39 0 1 171 0 172 4 67 181 98 350 12 0 294 0 306 0 10 63 7 80 2 1 56 0 59 3 0 49 2 54 0 1 78 1 80 0 0 111 8 119 The maximum RMSD [Å] within all the poses in the defined range of the relative normalised score Scoring function SQM/COSMO AMBER/GB Glide PLANTS AutoDock Gold XP PLP Vina ASP CS GS ChemPLP 0.78 1.14 2.91 12.60 4.61 0.78 3.54 7.13 11.62 5.77 1.78 2.32 2.01 1.00 1.78 1.43 1.15 1.54 1.01 1.28 1.14 2.21 2.44 12.60 4.60 0.78 1.49 2.40 11.62 4.55 Maximal RMSD within a window of 5 of the normalised score AchE 0.47 0.57 2.13 AR 0.19 0.19 7.54 TACE 1.91 4.76 3.02 HIV PR 0.94 0.94 17.26 Average 0.88 1.62 7.49 or statistical potentials, but they are well represented by the SQM/COSMO filter. The second criterion, RMSDmax, is shown for the interval of the normalised relative scores below 5 (Table 2). The SQM/ COSMO filter shows the lowest RMSDmax of 0.88 Å on average. CS follows with 1.28 Å on average. ASP and AMBER/GB satisfy the conditions of an averaged RMSDmax up to 2 Å. AMBER/GB, however, fails in the difficult case of TACE with RMSDmax of 4.76 Å. Analogous analyses at greater intervals have revealed a similar ordering of the SFs (Table S4, ESI†). The SQM/COSMO filter enables us not only to recognise the correct binding pose (RMSD below 2 Å) but also to go beyond this limit and evaluate even small changes in the geometry of the ligand binding. The price for such a high accuracy is the increased computational time requirements. The SQM/COSMO filter is ca. 100-times slower than the statistics- and knowledge-based SFs and about 10-times slower than the classical physics-based AMBER/GB. However, compared to the standard SQM-based SF, it is ca. 100-times faster. The speed can be further enhanced by parallelisation. To summarise, we have pushed the limits of the accuracy of SFs to judge the energetics of P–L noncovalent interactions. Based on our development and the extensive experience with SQM-based scoring functions,3,21 the SQM/COSMO filter has been introduced. It features two dominant terms to describe P–L interaction, namely the DEint term at the PM6-D3H4X level for gas-phase noncovalent interactions and the DDGsolv term at the COSMO level for implicit solvation. We showed previously that both these methods are very accurate at a reasonable speed.16,20 The SQM/COSMO energy is calculated in four unrelated P–L complexes. The SQM/COSMO filter is compared to eight widely used SFs, which are statistics-, knowledge- or force-field-based. The SQM/COSMO scheme exhibits a superior performance as 3314 | Chem. Commun., 2016, 52, 3312--3315 judged by two criteria, the number of false positives and RMSDmax. In contrast to standard SFs, no fitting against data sets has been involved. Furthermore, it offers generality and comparability across the chemical space and no system-specific parameterisations have to be performed. The time requirements allow for calculations of thousands of docking poses as we have demonstrated in this pilot study. We propose the SQM/COSMO filter as a tool for accurate medium-throughput refinement in later stages of virtual screening or as a reference method for judging the performance of other scoring functions. The proof of concept that reliable QM calculations can now be performed for tens of thousands of large biochemical entities opens a way to progress in closely related disciplines such as materials design. This work was supported by research projects RVO 61388963 awarded to the Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic. We acknowledge the financial support of the Czech Science Foundation (grant number P208/12/G016). The authors acknowledge the support by the project L01305 of the Ministry of Education, Youth and Sports of the Czech Republic. The computations were performed at the Center for Information Services and High Performance Computing (ZIH) at TU Dresden. References 1 G. L. Warren, C. W. Andrews, A. M. Capelli, B. Clarke, J. LaLonde, M. H. Lambert, M. Lindvall, N. Nevis, S. F. Semus and S. Senger, et al., J. Med. Chem., 2006, 49, 5912; A. R. Leach, B. K. Shoichet and C. E. Peishoff, J. Med. Chem., 2006, 49, 5851. 2 H. Gohlke and G. Klebe, Angew. Chem., Int. Ed., 2002, 41, 2644. 3 R. Meier, M. Pippel, F. Brandt, W. Sippl and C. Baldauf, J. Chem. Inf. Model., 2010, 50, 879. 4 P. S. Charifson, J. J. Corkery, M. A. Murcko and W. P. Walters, J. Med. Chem., 1999, 42, 5100; R. Wang and S. J. Wang, J. Chem. Inf. Comput. Sci., 2001, 41, 1422. This journal is © The Royal Society of Chemistry 2016 View Article Online Open Access Article. Published on 20 January 2016. Downloaded on 05/01/2017 10:20:31. This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. Communication 5 J. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman and D. A. Case, J. Comput. Chem., 2004, 25, 1157. 6 K. Raha, M. B. Peters, B. Wang, N. Yu, A. M. Wollacott, L. M. Westerhoff and K. M. Merz, Drug Discovery Today, 2007, 12, 725; M. Xu and M. A. Lill, Drug Discovery Today: Technol., 2013, 10, 411. 7 D. Mucs and R. A. Bryce, Expert Opin. Drug Discovery, 2013, 8, 263. 8 S. A. Hayik, R. Dunbrack and K. M. Merz, J. Chem. Theory Comput., 2010, 6, 3079. 9 M. Hennemann and T. Clark, J. Mol. Model., 2014, 20, 2331. 10 H. S. Muddana and M. K. Gilson, J. Chem. Theory Comput., 2012, 8, 2023; P. Mikulskis, S. Genheden, K. Wichmann and U. Ryde, J. Comput. Chem., 2012, 33, 1179. 11 P. Soderhjelm, J. Kongsted and U. Ryde, J. Chem. Theory Comput., 2010, 6, 1726. 12 K. Wichapong, A. Rohe, C. Platzer, I. Slynko, F. Erdmann, M. Schmidt and W. Sippl, J. Chem. Inf. Model., 2014, 54, 881; P. Chaskar, V. Zoete and U. F. Röhrig, J. Chem. Inf. Model., 2014, 54, 3137; S. K. Burger, D. C. Thompson and P. W. Ayers, J. Chem. Inf. Model., 2011, 51, 93. 13 J. Antony, S. Grimme, D. G. Liakos and F. Neese, J. Phys. Chem. A, 2011, 115, 11210. 14 M. S. Gordon, D. G. Fedorov, S. R. Pruitt and L. V. Slipchenko, Chem. Rev., 2012, 112, 632; J. Antony and S. Grimme, J. Comput. Chem., 2012, 33, 1730. 15 M. Lepšı́k, J. Řezáč, M. Kolář, A. Pecina, P. Hobza and J. Fanfrlı́k, ChemPlusChem, 2013, 78, 921. 16 J. Řezáč, J. Fanfrlı́k, D. Salahub and P. Hobza, J. Chem. Theory Comput., 2009, 5, 1749; J. Řezáč and P. Hobza, Chem. Phys. Lett., 2011, 506, 286; J. Řezáč and P. Hobza, J. Chem. Theory Comput., 2012, 8, 141. 17 K. Raha and K. M. Merz, J. Am. Chem. Soc., 2004, 126, 1020; K. Raha and K. M. Merz, J. Med. Chem., 2005, 48, 4558. 18 J. J. P. Stewart, J. Mol. Model., 2007, 13, 1173. 19 A. Klamt and G. Schüürmann, J. Chem. Soc., Perkin Trans. 2, 1993, 799. 20 M. Kolář, J. Fanfrlı́k, M. Lepšı́k, F. Forti, F. J. Luque and P. Hobza, J. Phys. Chem. B, 2013, 117, 5950. 21 J. Fanfrlı́k, A. K. Bronowska, J. Řezáč, O. Přenosil, J. Konvalinka and P. Hobza, J. Phys. Chem. B, 2010, 114, 12666; P. Dobeš, J. Řezáč, J. Fanfrlı́k, M. Otyepka and P. Hobza, J. Phys. Chem. B, 2011, This journal is © The Royal Society of Chemistry 2016 ChemComm 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 115, 8581; A. Pecina, O. Přenosil, J. Fanfrlı́k, J. Řezáč, J. Granatier, P. Hobza and M. Lepšı́k, Collect. Czech. Chem. Commun., 2011, 76, 457; A. Pecina, M. Lepšı́k, J. Řezáč, J. Brynda, P. Mader, P. Řezáčová, P. Hobza and J. Fanfrlı́k, J. Phys. Chem. B, 2013, 117, 16096; J. Fanfrlı́k, M. Kolář, M. Kamlar, D. Hurn, F. X. Ruiz, A. Cousido-Siah, A. Mitschler, J. Řezáč, E. Munusamy and M. Lepšı́k, et al., ACS Chem. Biol., 2013, 8, 2484; J. Fanfrlı́k, F. X. Ruiz, A. Kadlčı́ková, J. Řezáč, A. Cousido-Siah, A. Mitschler, S. Haldar, M. Lepšı́k, M. H. Kolář and P. Majer, et al., ACS Chem. Biol., 2015, 10, 1637. J. Fanfrlı́k, P. S. Brahmkshatriya, J. Řezáč, A. Jı́lková, M. Horn, M. Mareš, P. Hobza and M. Lepšı́k, J. Phys. Chem. B, 2013, 117, 14973. R. A. Friesner, R. B. Murphy, M. P. Repasky, L. L. Frye, J. R. Greenwood, T. A. Halgren, P. C. Sanschagrin and D. T. Mainz, J. Med. Chem., 2006, 49, 6177. O. Korb, T. Stützle and T. E. Exner, J. Chem. Inf. Model., 2009, 49, 84. O. Trott and A. J. Olson, J. Comput. Chem., 2010, 31, 455. M. D. Eldridge, C. W. Murray, T. R. Auton, G. V. Paolini and R. P. Mee, J. Comput.-Aided Mol. Des., 1997, 11, 425. G. Jones, P. Willett, R. C. Glen, A. R. Leach and R. Taylor, J. Mol. Biol., 1997, 267, 727. W. T. M. Mooij and M. L. Verdonk, Proteins, 2005, 61, 272. Y. Duan, C. Wu, S. Chowdhury, M. C. Lee, G. Xiong, W. Zhang, R. Yang, P. Cieplak, R. Luo and R. T. Lee, J. Comput. Chem., 2003, 24, 1999; V. Tsui and D. A. Case, Biopolymers, 2001, 56, 275. A. Nicholls and A. N. Jain, J. Comput.-Aided Mol. Des., 2008, 22, 133. B. Liu, S. Wang and X. Wang, Sci. Rep., 2015, 50, 15479. E. Perola, W. P. Walters and P. S. Charifson, Proteins, 2004, 56, 235; J. W. M. Nissink, C. Murray, M. Hartshorn, M. L. Verdonk, J. C. Cole and R. Taylor, Proteins, 2002, 49, 457; P. Ferrara, H. Gohlke, D. J. Price, G. Klebe and C. L. Brooks, J. Med. Chem., 2004, 47, 3032. G. Klebe, Drug Discovery Today, 2006, 11, 580. H. Dvir, D. M. Wong, M. Harel, X. Barril, M. Orozco, F. J. Luque, D. Munoz-Torrero, P. Camps, T. L. Rosenberry and I. Silman, et al., Biochemistry, 2002, 41, 2970. U. K. Bandarage, T. Wang, J. H. Come, E. Perola, Y. Wei and B. G. Rao, Bioorg. Med. Chem. Lett., 2008, 18, 44. H. Steuber, A. Heine and G. Klebe, J. Mol. Biol., 2007, 368, 618. J. Brynda, P. Řezáčová, M. Fábry, M. Hořejšı́, R. Štouračová, J. Sedláček, M. Souček, M. Hradı́lek, M. Lepšı́k and J. Konvalinka, J. Med. Chem., 2004, 47, 2030. Chem. Commun., 2016, 52, 3312--3315 | 3315