Abstract
Highly effective de novo design is a grand challenge of computer-aided drug discovery. Practical structure-specific three-dimensional molecule generation methods have started to emerge in recent years, but most approaches treat the target structure as a conditional input that biases molecule generation and do not fully learn the detailed atomic interactions that govern the molecular conformation and stability of the binding complexes. Because these fine details are omitted, many models have difficulty producing reasonable molecules for a variety of therapeutic targets. Here, to address this challenge, we formulate a model, called SurfGen, that designs molecules in a fashion closely resembling the figurative key-and-lock principle. SurfGen comprises two equivariant neural networks, Geodesic-GNN and Geoatom-GNN, which capture the topological interactions on the pocket surface and the spatial interactions between ligand atoms and surface nodes, respectively. SurfGen outperforms other methods in a number of benchmarks, and its high sensitivity to the pocket structures enables an effective generative-model-based solution to the thorny issue of mutation-induced drug resistance.
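To make the term "equivariant" concrete, the following is a minimal sketch of a toy vector-valued message-passing step, together with a numerical check that rotating the input and then updating gives the same result as updating and then rotating. It is not the Geodesic-GNN or Geoatom-GNN architecture; the function names `equivariant_update` and `rotate`, the exponential distance weighting, and the sample coordinates are all assumptions chosen for illustration.

```python
import math

def equivariant_update(pos, feats):
    """Toy equivariant step (hypothetical, not SurfGen's layer).

    pos:   list of (x, y, z) node coordinates
    feats: list of rotation-invariant scalar features, one per node
    Returns one 3D vector per node that rotates together with the input.
    """
    out = []
    for i, p_i in enumerate(pos):
        acc = [0.0, 0.0, 0.0]
        for j, p_j in enumerate(pos):
            if i == j:
                continue
            rel = [b - a for a, b in zip(p_i, p_j)]    # vector from node i to node j
            dist = math.sqrt(sum(c * c for c in rel))  # unchanged by rotation
            w = feats[j] * math.exp(-dist)             # invariant scalar weight
            acc = [a + w * c for a, c in zip(acc, rel)]
        out.append(acc)
    return out

def rotate(points, angle_z, angle_x):
    """Rotate points about the z axis, then about the x axis (radians)."""
    cz, sz = math.cos(angle_z), math.sin(angle_z)
    cx, sx = math.cos(angle_x), math.sin(angle_x)
    rotated = []
    for x, y, z in points:
        x, y = cz * x - sz * y, sz * x + cz * y
        y, z = cx * y - sx * z, sx * y + cx * z
        rotated.append([x, y, z])
    return rotated

# Arbitrary sample data (assumed, for the check only).
pos = [(0.0, 0.0, 0.0), (1.0, 0.5, -0.2), (-0.7, 1.3, 0.9), (0.4, -1.1, 0.6)]
feats = [0.3, -1.2, 0.8, 2.0]

a = equivariant_update(rotate(pos, 0.7, -1.1), feats)  # rotate, then update
b = rotate(equivariant_update(pos, feats), 0.7, -1.1)  # update, then rotate
max_gap = max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))
print(f"largest difference between the two orders: {max_gap:.2e}")
```

The two orders agree up to floating-point error because the scalar weights depend only on distances and invariant features, while the relative position vectors rotate with the input.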
Data availability
The data are available at Zenodo (https://doi.org/10.5281/zenodo.8307911; ref. 44). PDB IDs 1ZYU and 6LU7 are available in the PDB (https://www.rcsb.org/). Source data are available with this paper.
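As a convenience for retrieving the two PDB entries named above, the following sketch builds download URLs and defines an optional fetch helper. It assumes the RCSB file service pattern `https://files.rcsb.org/download/<ID>.pdb`, which is not stated in the paper; the helper names `rcsb_pdb_url` and `fetch_pdb` are hypothetical.

```python
import pathlib
import urllib.request

PDB_IDS = ("1ZYU", "6LU7")  # IDs listed in the data availability statement

def rcsb_pdb_url(pdb_id):
    """Return the assumed RCSB download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb"

def fetch_pdb(pdb_id, dest_dir="."):
    """Download one entry to dest_dir and return the local path.

    Requires network access; it is defined here but not called.
    """
    target = pathlib.Path(dest_dir) / f"{pdb_id.strip().upper()}.pdb"
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), str(target))
    return target

urls = [rcsb_pdb_url(pdb_id) for pdb_id in PDB_IDS]
for url in urls:
    print(url)
```

Calling `fetch_pdb("6LU7")` would then save the structure file to the current directory, provided the assumed URL pattern is still served.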
Code availability
The code is available at GitHub (https://github.com/HaotianZhangAI4Science/SurfGen; ref. 45).
References
Ferreira, L. G., Dos Santos, R. N., Oliva, G. & Andricopulo, A. D. Molecular docking and structure-based drug design strategies. Molecules 20, 13384–13421 (2015).
Anderson, A. C. The process of structure-based drug design. Chem. Biol. 10, 787–797 (2003).
Shoichet, B. K. Virtual screening of chemical libraries. Nature 432, 862–865 (2004).
Böhm, H.-J. The computer program LUDI: a new method for the de novo design of enzyme inhibitors. J. Comput. Aided Mol. Des. 6, 61–78 (1992).
Wang, R., Gao, Y. & Lai, L. LigBuilder: a multi-purpose program for structure-based drug design. Mol. Model. Annu. 6, 498–516 (2000).
David, L., Nielsen, P. A., Hedstrom, M. & Norden, B. Scope and limitation of ligand docking: methods, scoring functions and protein targets. Curr. Comput. Aided Drug Design 1, 275–306 (2005).
Jorgensen, W. L. Rusting of the lock and key model for protein-ligand binding. Science 254, 954–955 (1991).
Ain, Q. U., Aleksandrova, A., Roessler, F. D. & Ballester, P. J. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdiscip. Rev. Comput. Mol. Sci. 5, 405–424 (2015).
McNutt, A. T. et al. GNINA 1.0: molecular docking with deep learning. J. Cheminformatics 13, 1–20 (2021).
Shen, C. et al. Boosting protein–ligand binding pose prediction and virtual screening based on residue–atom distance likelihood potential and graph transformer. J. Med. Chem. 65, 10691–10706 (2022).
Jiang, D. et al. Interactiongraphnet: a novel and efficient deep graph representation learning framework for accurate protein–ligand interaction predictions. J. Med. Chem. 64, 18209–18232 (2021).
Moon, S., Zhung, W., Yang, S., Lim, J. & Kim, W. Y. PIGNet: a physics-informed deep learning model toward generalized drug–target interaction predictions. Chem. Sci. 13, 3661–3673 (2022).
Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).
Deng, C. et al. Vector neurons: a general framework for SO(3)-equivariant networks. Proc. IEEE/CVF International Conference on Computer Vision 12200–12209 (2021).
Zang, C. & Wang, F. Moflow: an invertible flow model for generating molecular graphs. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 617–626 (2020).
Peng, X. et al. Pocket2Mol: efficient molecular sampling based on 3D protein pockets. International Conference on Machine Learning 17644–17655 (2022).
Ragoza, M., Masuda, T. & Koes, D. R. Generating 3D molecules conditional on receptor binding sites with deep generative models. Chem. Sci. 13, 2701–2713 (2022).
Liu, M., Luo, Y., Uchino, K., Maruhashi, K. & Ji, S. Generating 3D molecules for target protein binding. International Conference on Machine Learning 13912–13924 (2022).
Jeon, W. & Kim, D. Autonomous molecule generation using reinforcement learning and docking to develop potential novel inhibitors. Sci. Rep. 10, 22104 (2020).
Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).
Wang, R., Liu, L., Lai, L. & Tang, Y. SCORE: a new empirical method for estimating the binding affinity of a protein–ligand complex. Mol. Model. Annu. 4, 379–394 (1998).
Francoeur, P. G. et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inf. Model. 60, 4200–4215 (2020).
Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Preprint at https://arxiv.org/abs/2210.13695 (2022).
Martin, Y. C., Kofron, J. L. & Traphagen, L. M. Do structurally similar molecules have similar biological activity? J. Med. Chem. 45, 4350–4358 (2002).
Yang, J., Cai, Y., Zhao, K., Xie, H. & Chen, X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov. Today 27, 103356 (2022).
Kang, S.-G. et al. In-pocket 3D graphs enhance ligand–target compatibility in generative small-molecule creation. Preprint at https://arxiv.org/abs/2204.02513 (2022).
Wang, M. et al. Relation: a deep generative model for structure-based de novo drug design. J. Med. Chem. 65, 9478–9492 (2022).
Gan, J., Gu, Y., Li, Y., Yan, H. & Ji, X. Crystal structure of Mycobacterium tuberculosis shikimate kinase in complex with shikimic acid and an ATP analogue. Biochemistry 45, 8539–8545 (2006).
Pereira, J. H. et al. Shikimate kinase: a potential target for development of novel antitubercular agents. Curr. Drug Targets 8, 459–468 (2007).
Jing, B., Eismann, S., Suriana, P., Townshend, R. J. & Dror, R. Learning from protein structure with geometric vector perceptrons. Preprint at https://arxiv.org/abs/2009.01411 (2020).
Lamm, G. The Poisson–Boltzmann equation. Rev. Comput. Chem. 19, 147–365 (2003).
Kortemme, T., Morozov, A. V. & Baker, D. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein–protein complexes. J. Mol. Biol. 326, 1239–1259 (2003).
Hagemans, D., Van Belzen, I. A., Morán Luengo, T. & Rüdiger, S. G. A script to highlight hydrophobicity and charge on protein surfaces. Front. Mol. Biosci. 2, 56 (2015).
Shi, C. et al. Graphaf: a flow-based autoregressive model for molecular graph generation. International Conference on Learning Representations (ICLR) (2020).
Lin, H. et al. DiffBP: generative diffusion of 3D molecules for target protein binding. Preprint at https://arxiv.org/abs/2211.11214 (2022).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Lu, W. et al. TANKBind: Trigonometry-aware neural networks for drug-protein binding structure prediction. Adv. Neural Inf. Process. Syst. 35, 7236–7249 (2022).
Luo, S., Guan, J., Ma, J. & Peng, J. A 3D generative model for structure-based drug design. Adv. Neural Inf. Process. Syst. 34, 6229–6239 (2021).
Burley, S. K. et al. Protein Data Bank (PDB): the single global macromolecular structure archive. Protein Crystallogr. 1607, 627–641 (2017).
Liu, T., Lin, Y., Wen, X., Jorissen, R. N. & Gilson, M. K. BindingDB: a web-accessible database of experimentally determined protein–ligand binding affinities. Nucleic Acids Res. 35, D198–D201 (2007).
Jin, Z. et al. Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature 582, 289–293 (2020).
Tanimoto, T. T. Elementary Mathematical Theory of Classification and Prediction, IBM Internal Report (1958).
Landrum, G. RDKit documentation. Release 1, 4 (2013).
Odin, Z. CrossDock processed data. Zenodo https://doi.org/10.5281/zenodo.7751348 (2023).
Odin, Z. SurfGenV1. Zenodo https://doi.org/10.5281/zenodo.8307911 (2023).
Clark, D. E. & Pickett, S. D. Computational methods for the prediction of ‘drug-likeness’. Drug Discov. Today 5, 49–58 (2000).
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminformatics 1, 1–11 (2009).
Ganesan, A. The impact of natural products upon modern drug discovery. Curr. Opin. Chem. Biol. 12, 306–317 (2008).
Sangster, J. Octanol–water partition coefficients of simple organic compounds. J. Phys. Chem. Ref. Data 18, 1111–1229 (1989).
Acknowledgements
This work was financially supported by the National Key Research and Development Program of China (2022YFF1203003), the National Natural Science Foundation of China (22220102001, 82373791, and 81973281) and the Natural Science Foundation of Zhejiang Province (LD22H300001).
Author information
Authors and Affiliations
Contributions
O.Z. contributed to the main idea and code. T.W. and N.W. contributed to the paper writing and code reorganization. G.W. contributed to the collection of the dataset and the corresponding experiment. D.J. contributed to the real-world case of the COVID-19 target experiment. X.W., H.Z. and J.W. contributed to the data analysis and drawing. N.W. contributed to the assessment of the LigBuilder and Morld methods. E.W. contributed to the instruction in physical concepts. G.C. and Y.D. contributed to the visualization and technical support. P.P. contributed to the suggestion of the mutation experiment with the molecular generation protocol. Y.K. and C.-Y.H. contributed to the paper revision and experimental design. T.H. contributed to the essential financial support and conception, and was responsible for the overall quality.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Huziel Sauceda and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Sections 1–7, Figs. 1–3 and Tables 1–6.
Source data
Source Data Fig. 1
The corresponding metrics of the visualized examples in Fig. 1.
Source Data Fig. 2
Unprocessed raw data to draw the distribution plot in Fig. 2a.
Source Data Fig. 3
The molecular and pocket volumes shown in Fig. 3c.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, O., Wang, T., Weng, G. et al. Learning on topological surface and geometric structure for 3D molecular generation. Nat Comput Sci 3, 849–859 (2023). https://doi.org/10.1038/s43588-023-00530-2
DOI: https://doi.org/10.1038/s43588-023-00530-2