Abstract
Molecular property prediction is crucial for early drug candidate screening and optimization. Although deep learning-based methods have advanced considerably, they often fall short of fully leveraging 3D spatial information. In particular, current molecular encoding techniques tend to extract spatial information inadequately, leading to ambiguous representations in which a single embedding may correspond to multiple distinct molecules. Moreover, existing molecular modeling methods focus predominantly on the most stable 3D conformation, neglecting other viable conformations present in reality. To address these issues, we propose 3D-Mol, a novel approach designed for more accurate spatial structure representation. It deconstructs each molecule into three hierarchical graphs to better extract geometric information. In addition, 3D-Mol leverages contrastive learning for pretraining on 20 million unlabeled molecules, treating conformations of the same topological structure as weighted positive pairs and conformations of different molecules as negatives, with weights derived from the similarity of their 3D conformation descriptors and fingerprints. We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate its outstanding performance.
Data availability
The unlabeled datasets ZINC20 and PubChem, used in the pretraining stage, can be accessed at https://zinc20.docking.org/tranches/home/ and https://pubchem.ncbi.nlm.nih.gov/docs/downloads. The downstream benchmarks can be downloaded from MoleculeNet (https://moleculenet.org/datasets-1). The data are available for non-commercial use.
Code availability
The software can be accessed at https://github.com/AI-HPC-Research-Team/3D-Mol.
Acknowledgements
The research was supported by the Peng Cheng Cloud-Brain.
Funding
This work is supported by Peng Cheng Laboratory and by the Major Key Project of PCL (PCL2021A13).
Contributions
Taojie Kuang: Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - original draft, Writing - review & editing. Yiming Ren: Validation, Writing - review & editing. Zhixiang Ren: Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing - original draft, Writing - review & editing.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendices
Appendix A: 3D conformation descriptor and fingerprint
A.1 Fingerprint
In our study, we use molecular fingerprints, specifically Morgan fingerprints, to compute weights for negative pairs in our model. These fingerprints provide a compact numerical representation of molecular structure and are widely used in computational chemistry. The Morgan algorithm iteratively updates each atom’s representation based on its chemical surroundings and hashes the result into a binary vector describing the molecule. By evaluating the similarity between Morgan fingerprints, we derive a weighting mechanism for negative pairs, which improves the model’s ability to distinguish molecular structures and, in turn, its predictive performance.
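As an illustration, the snippet below computes Morgan fingerprints with RDKit (the toolkit listed in Appendix D) and measures their Tanimoto similarity. This is a minimal sketch: the radius, bit-vector length, and the way a similarity is turned into a negative-pair weight are illustrative assumptions, not the exact settings used in 3D-Mol.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Two example molecules (aspirin and paracetamol); any pair of SMILES would do.
mol_a = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
mol_b = Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1")

# Morgan (circular) fingerprints; radius and bit length are illustrative choices.
fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, radius=2, nBits=2048)
fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, radius=2, nBits=2048)

# Tanimoto similarity between the two bit vectors.
sim = DataStructs.TanimotoSimilarity(fp_a, fp_b)

# Hypothetical weighting: the more dissimilar two molecules are,
# the larger the weight assigned to them as a negative pair.
negative_pair_weight = 1.0 - sim
print(f"Tanimoto similarity: {sim:.3f}, negative-pair weight: {negative_pair_weight:.3f}")
```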
A.2 3D conformation descriptor
Molecular 3D conformation descriptors are computational representations of the three-dimensional arrangement of atoms within a molecule, capturing key aspects of its spatial geometry. They are important for understanding how molecular shape influences chemical and biological properties, and they play a significant role in fields such as drug design and materials science. The 3D-MoRSE descriptor, in particular, encodes the spatial distribution of atoms using the electron diffraction scattering function, providing a detailed, fixed-length representation of a conformation that is well suited to computational chemistry and cheminformatics. In our research, we employ 3D-MoRSE descriptors to measure the similarity of molecular 3D conformations, enabling us to compare and analyze molecular structures and identify potential similarities in their biological or chemical behavior. Such similarity information is especially valuable in drug discovery, where it can support the identification of new therapeutic compounds and the prediction of their activities.
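To make the comparison concrete, the sketch below embeds two conformers of the same molecule with RDKit, computes their 3D-MoRSE descriptors, and compares them with cosine similarity. The embedding settings and the choice of cosine similarity are assumptions for illustration and are not necessarily the exact procedure used in 3D-Mol.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolDescriptors

# Example molecule (ibuprofen); hydrogens are added before 3D embedding.
mol = Chem.AddHs(Chem.MolFromSmiles("CC(C)Cc1ccc(cc1)C(C)C(=O)O"))

# Generate two conformers of the same topology and relax them with MMFF.
conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=2, randomSeed=42)
AllChem.MMFFOptimizeMoleculeConfs(mol)

# 3D-MoRSE descriptor for each conformer (one fixed-length vector per conformer).
morse = [np.array(rdMolDescriptors.CalcMORSE(mol, confId=cid)) for cid in conf_ids]

# Cosine similarity between the two descriptor vectors (illustrative metric).
cos_sim = float(np.dot(morse[0], morse[1]) /
                (np.linalg.norm(morse[0]) * np.linalg.norm(morse[1])))
print(f"3D-MoRSE cosine similarity between conformers: {cos_sim:.3f}")
```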
Appendix B: The contribution of pretraining method
In this section, we discuss the contributions of contrastive learning and supervised pretraining to our pretraining approach. We pretrained our model in three ways: with contrastive learning only, with supervised pretraining only, and with the complete pretraining method, and compared their performance on the 7 benchmark datasets. As shown in Table 4, each component alone yields smaller gains than the complete method. These findings indicate that while both contrastive learning and supervised pretraining contribute positively to the model’s performance, their combination is crucial for achieving the best results.
Appendix C: Finetuning details
During finetuning on each downstream task, we perform a random search over hyper-parameters to find the best-performing setting on the validation set and report the results of that setting on the test set. Table 5 lists the hyper-parameter combinations searched.
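A minimal sketch of this random-search loop is shown below; the search space, the trial budget, and the `finetune_and_evaluate` stub are hypothetical placeholders, with the actual hyper-parameter combinations being those listed in Table 5.

```python
import random

# Hypothetical search space; the real candidate values are those in Table 5.
SEARCH_SPACE = {
    "learning_rate": [1e-3, 5e-4, 1e-4],
    "batch_size": [32, 64, 128],
    "dropout": [0.0, 0.1, 0.2],
}

def finetune_and_evaluate(config):
    # Placeholder: finetune the pretrained encoder with `config` on the training
    # split and return the metric on the validation split (e.g., ROC-AUC for
    # classification tasks). Here it just returns a random score.
    return random.random()

rng = random.Random(0)
best_config, best_score = None, float("-inf")
for _ in range(20):  # the number of random trials is also an assumption
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    score = finetune_and_evaluate(config)
    if score > best_score:
        best_config, best_score = config, score

print("Best validation score:", best_score, "with config:", best_config)
# The configuration selected on the validation set is then evaluated once on the test set.
```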
Appendix D: Environment
CPU:
- Architecture: x86_64
- Number of CPUs: 96
- Model: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
GPU:
- Type: Tesla V100-SXM2-32GB
- Count: 8
- Driver Version: 450.80.02
- CUDA Version: 11.7
Software Environment:
- Operating System: Ubuntu 20.04.6 LTS
- Python Version: 3.10.9
- Paddle Version: 2.4.2
- PGL Version: 2.2.5
- RDKit Version: 2023.3.2