Seq3seq Fingerprint: Towards End-to-end Semi-supervised Deep Drug Discovery

Authors:

Junzhou Huang

BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

Pages 404 - 413

https://doi.org/10.1145/3233547.3233548

Published: 15 August 2018

Abstract

Driven by recent progress in deep learning, the use of AI to accelerate drug discovery and cut R&D costs has surged over the last few years. However, the success of deep learning depends on large-scale, clean, high-quality labeled data, which is generally unavailable in drug discovery practice. In this paper, we address this issue by proposing an end-to-end deep learning framework trained in a semi-supervised fashion; that is, the proposed approach can utilize both labeled and unlabeled data. While labeled data is of very limited availability, the amount of available unlabeled data is generally huge. The proposed framework, named seq3seq fingerprint, automatically learns a strong representation of each molecule in an unsupervised way from a huge training data pool containing a mixture of unlabeled and labeled molecules. At the same time, the representation is adjusted to further help predictive tasks, e.g., acidity, alkalinity, or solubility classification. The entire framework is trained end-to-end and simultaneously learns the representation and the inference results. Extensive experiments support the superiority of the proposed framework.
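The joint training idea in the abstract can be illustrated with a small sketch. The abstract does not state the actual objective, so everything below is an assumption made for illustration: the function name `semi_supervised_loss`, the weighted-sum form, the `weight` parameter, and the use of per-token cross-entropy as the reconstruction term are all hypothetical. The sketch only shows the general pattern in which every molecule contributes a reconstruction loss while only labeled molecules contribute a classification loss.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])


def semi_supervised_loss(batch, weight=1.0):
    """Hypothetical joint objective (an assumption, not the paper's formula).

    Each item in `batch` is a dict with:
      'token_probs': list of (probs, target_index) pairs, one per sequence
                     token the decoder tries to reconstruct;
      'class_probs': predicted class probabilities for the property task;
      'label':       class index, or None for an unlabeled molecule.
    Returns (total, reconstruction, classification).
    """
    recon_terms, class_terms = [], []
    for mol in batch:
        # Unsupervised part: every molecule is reconstructed, labeled or not.
        token_losses = [cross_entropy(p, t) for p, t in mol["token_probs"]]
        recon_terms.append(sum(token_losses) / len(token_losses))
        # Supervised part: only labeled molecules contribute.
        if mol["label"] is not None:
            class_terms.append(cross_entropy(mol["class_probs"], mol["label"]))
    recon = sum(recon_terms) / len(recon_terms)
    clf = sum(class_terms) / len(class_terms) if class_terms else 0.0
    return recon + weight * clf, recon, clf


# Toy batch with made-up probabilities: one labeled and one unlabeled molecule.
batch = [
    {"token_probs": [([0.5, 0.5], 0), ([0.25, 0.75], 1)],
     "class_probs": [0.5, 0.5], "label": 1},
    {"token_probs": [([0.5, 0.5], 1)],
     "class_probs": [0.9, 0.1], "label": None},
]
total, recon, clf = semi_supervised_loss(batch, weight=0.5)
print(total, recon, clf)
```

In this sketch the unlabeled molecule still shapes the representation through the reconstruction term, which is the property the abstract attributes to semi-supervised training. A real implementation would compute these terms from encoder and decoder network outputs and minimize the total by gradient descent.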

Information & Contributors

Information

Published In

BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

August 2018

727 pages

ISBN:9781450357944

DOI:10.1145/3233547

General Chairs:
Amarda Shehu, George Mason University, USA
Cathy Wu, University of Delaware, USA

Program Chairs:
Christina Boucher, University of Florida, USA
Jing Li, Case Western Reserve University, USA
Hongfang Liu, Mayo Clinic, USA
Mihai Pop, University of Maryland, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGBio: ACM Special Interest Group on Bioinformatics

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSF CAREER grant
National Science Foundation

Conference

BCB '18

Sponsor:

SIGBio

BCB '18: 9th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

August 29 - September 1, 2018

Washington, DC, USA

Acceptance Rates

BCB '18 Paper Acceptance Rate 46 of 148 submissions, 31%;

Overall Acceptance Rate 254 of 885 submissions, 29%

Bibliometrics & Citations

Article Metrics

Total Citations: 23
Total Downloads: 863
Downloads (Last 12 months): 196
Downloads (Last 6 weeks): 32

Reflects downloads up to 28 Sep 2024

Cited By

Yang Z, Huang T, Pan L, Wang J, Wang L, Ding J, Xiao J (2024). QuanDB: a quantum chemical property database towards enhancing 3D molecular representation learning. Journal of Cheminformatics 16:1. Online publication date: 29-Apr-2024.
https://doi.org/10.1186/s13321-024-00843-y

Hu W, Liu Y, Chen X, Chai W, Chen H, Wang H, Wang G (2024). Deep Learning Methods for Small Molecule Drug Discovery: A Survey. IEEE Transactions on Artificial Intelligence 5:2 (459-479). Online publication date: Feb-2024.
https://doi.org/10.1109/TAI.2023.3251977

Ma H, Jiang F, Rong Y, Guo Y, Huang J (2024). Toward Robust Self-Training Paradigm for Molecular Prediction Tasks. Journal of Computational Biology 31:3 (213-228). Online publication date: 1-Mar-2024.
https://doi.org/10.1089/cmb.2023.0187

Ju W, Fang Z, Gu Y, Liu Z, Long Q, Qiao Z, Qin Y, Shen J, Sun F, Xiao Z, Yang J, Yuan J, Zhao Y, Wang Y, Luo X, Zhang M (2024). A Comprehensive Survey on Deep Graph Representation Learning. Neural Networks 173 (106207). Online publication date: May-2024.
https://doi.org/10.1016/j.neunet.2024.106207

Jiang J, Li Y, Zhang R, Liu Y (2024). INTransformer: Data augmentation-based contrastive learning by injecting noise into transformer for molecular property prediction. Journal of Molecular Graphics and Modelling 128 (108703). Online publication date: May-2024.
https://doi.org/10.1016/j.jmgm.2024.108703

Torres L, Ribeiro B, Arrais J (2024). Multi-scale cross-attention transformer via graph embeddings for few-shot molecular property prediction. Applied Soft Computing 153:C. Online publication date: 1-Mar-2024.
https://dl.acm.org/doi/10.1016/j.asoc.2024.111268

Tran T, Ekenna C (2023). Molecular Descriptors Property Prediction Using Transformer-Based Approach. International Journal of Molecular Sciences 24:15 (11948). Online publication date: 26-Jul-2023.
https://doi.org/10.3390/ijms241511948

Zhang Y, Wu X, Fang Q, Qian S, Xu C (2023). Knowledge-Enhanced Attributed Multi-Task Learning for Medicine Recommendation. ACM Transactions on Information Systems 41:1 (1-24). Online publication date: 10-Jan-2023.
https://dl.acm.org/doi/10.1145/3527662

Sun D (2023). Few-shot graph and smiles learning for molecular property prediction. Third International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023) (26). Online publication date: 10-Oct-2023.
https://doi.org/10.1117/12.3005809

Zhang B, Luo C, Jiang H, Feng S, Li X, Zhang B, Ye Y (2023). Adaptive Transfer of Graph Neural Networks for Few-Shot Molecular Property Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 20:6 (3863-3875). Online publication date: Nov-2023.
https://doi.org/10.1109/TCBB.2023.3327452
