Seq3seq Fingerprint: Towards End-to-end Semi-supervised Deep Drug Discovery

Published: 15 August 2018

Abstract

With the recent progress in deep learning, the use of AI to accelerate drug discovery and cut R&D costs has surged in the last few years. However, the success of deep learning is attributed to large-scale, clean, high-quality labeled data, which is generally unavailable in drug discovery practice. In this paper, we address this issue by proposing an end-to-end deep learning framework that works in a semi-supervised fashion; that is, the proposed approach can utilize both labeled and unlabeled data. While labeled data is of very limited availability, the amount of available unlabeled data is generally huge. The proposed framework, named seq3seq fingerprint, automatically learns a strong representation of each molecule in an unsupervised way from a huge training data pool containing a mixture of both unlabeled and labeled molecules. At the same time, the representation is adjusted to further help predictive tasks, e.g., acidity, alkalinity, or solubility classification. The entire framework is trained end-to-end and simultaneously learns the representation and the inference results. Extensive experiments support the superiority of the proposed framework.
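The abstract does not state the training objective, so the following is only a minimal sketch of how a semi-supervised objective of this kind could be combined for a single molecule. It assumes a reconstruction term that applies to every molecule and a classification term that applies only when a label exists. The function names, the `weight` trade-off parameter, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def semi_supervised_loss(recon_probs, tokens, class_probs, label, weight=1.0):
    """Joint loss for one molecule (illustrative assumption, not the paper's loss).

    recon_probs: per-position probability vectors from a hypothetical decoder
    tokens:      token ids of the input sequence the decoder should reproduce
    class_probs: class probabilities from a hypothetical classifier head
    label:       class index, or None when the molecule is unlabeled
    weight:      assumed trade-off between the two terms
    """
    # Unsupervised term: mean reconstruction error, available for every molecule.
    recon = sum(cross_entropy(p, t) for p, t in zip(recon_probs, tokens)) / len(tokens)
    if label is None:
        return recon
    # Supervised term: only added when a label is available.
    return recon + weight * cross_entropy(class_probs, label)


# Toy example with a two-token sequence and two classes.
recon_probs = [[0.5, 0.5], [0.25, 0.75]]
tokens = [0, 1]
class_probs = [0.5, 0.5]

unlabeled = semi_supervised_loss(recon_probs, tokens, class_probs, label=None)
labeled = semi_supervised_loss(recon_probs, tokens, class_probs, label=1)
```

In this sketch an unlabeled molecule contributes only to the representation, while a labeled one also contributes a task signal, which is one way to read "trained end-to-end" on a mixed data pool.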

Published In

BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2018
727 pages
ISBN: 9781450357944
DOI: 10.1145/3233547

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computational biology
  2. deep learning
  3. drug discovery
  4. imaging
  5. learning representation
  6. molecular representation
  7. semi-supervised learning
  8. sequence to sequence learning
  9. structured prediction
  10. unsupervised learning
  11. virtual screening

Qualifiers

  • Research-article

Conference

BCB '18

Acceptance Rates

BCB '18 Paper Acceptance Rate 46 of 148 submissions, 31%;
Overall Acceptance Rate 254 of 885 submissions, 29%

Article Metrics

  • Downloads (Last 12 months): 196
  • Downloads (Last 6 weeks): 32
Reflects downloads up to 28 Sep 2024

Cited By

  • (2024) QuanDB: a quantum chemical property database towards enhancing 3D molecular representation learning. Journal of Cheminformatics 16:1. DOI: 10.1186/s13321-024-00843-y. Online publication date: 29-Apr-2024
  • (2024) Deep Learning Methods for Small Molecule Drug Discovery: A Survey. IEEE Transactions on Artificial Intelligence 5:2 (459-479). DOI: 10.1109/TAI.2023.3251977. Online publication date: Feb-2024
  • (2024) Toward Robust Self-Training Paradigm for Molecular Prediction Tasks. Journal of Computational Biology 31:3 (213-228). DOI: 10.1089/cmb.2023.0187. Online publication date: 1-Mar-2024
  • (2024) A Comprehensive Survey on Deep Graph Representation Learning. Neural Networks 173 (106207). DOI: 10.1016/j.neunet.2024.106207. Online publication date: May-2024
  • (2024) INTransformer: Data augmentation-based contrastive learning by injecting noise into transformer for molecular property prediction. Journal of Molecular Graphics and Modelling 128 (108703). DOI: 10.1016/j.jmgm.2024.108703. Online publication date: May-2024
  • (2024) Multi-scale cross-attention transformer via graph embeddings for few-shot molecular property prediction. Applied Soft Computing 153:C. DOI: 10.1016/j.asoc.2024.111268. Online publication date: 1-Mar-2024
  • (2023) Molecular Descriptors Property Prediction Using Transformer-Based Approach. International Journal of Molecular Sciences 24:15 (11948). DOI: 10.3390/ijms241511948. Online publication date: 26-Jul-2023
  • (2023) Knowledge-Enhanced Attributed Multi-Task Learning for Medicine Recommendation. ACM Transactions on Information Systems 41:1 (1-24). DOI: 10.1145/3527662. Online publication date: 10-Jan-2023
  • (2023) Few-shot graph and smiles learning for molecular property prediction. Third International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023) (26). DOI: 10.1117/12.3005809. Online publication date: 10-Oct-2023
  • (2023) Adaptive Transfer of Graph Neural Networks for Few-Shot Molecular Property Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 20:6 (3863-3875). DOI: 10.1109/TCBB.2023.3327452. Online publication date: Nov-2023
