DOI: 10.1145/3233547.3233601
research-article

A Distributed Semi-Supervised Platform for DNase-Seq Data Analytics using Deep Generative Convolutional Networks

Published: 15 August 2018

Abstract

We present a deep learning approach for analyzing DNase-seq datasets that shows promise for unraveling the biological underpinnings of transcription regulation mechanisms. A better understanding of these mechanisms can lead to important advances in the life sciences in general and in drug discovery, biomarker discovery, and cancer research in particular. Motivated by recent advances in deep learning, we developed a platform, Deep Semi-Supervised DNase-seq Analytics (DSSDA). DSSDA is powered primarily by deep generative Convolutional Networks (ConvNets), and its most notable capability is semi-supervised learning, which is highly beneficial in common biological settings where labeled data are scarce. In addition, we investigated a k-mer based continuous vector space representation, which aims to improve learning power by reflecting the nature of biological sequences, in particular features associated with locality-based relationships between neighboring nucleotides. DSSDA employs a modified Ladder Network as its underlying generative model architecture, and we demonstrate its performance on a cell type classification task using sequences from large-scale DNase-seq experiments. We report the performance of DSSDA in two settings: a fully supervised setting, in which DSSDA outperforms widely known ConvNet models (94.6% classification accuracy), and a semi-supervised setting, in which, even with less than 10% of the data labeled, DSSDA performs comparably to other ConvNets trained on the full data set. Our results underscore the need for better deep learning methods that learn latent features and representations when dealing with challenging genomic sequence datasets.
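As a rough illustration of what a k-mer based continuous vector representation involves, the sketch below splits a DNA sequence into overlapping k-mers and maps each one to a dense vector. It is not the DSSDA implementation: the function names, the choice of k = 3, the stride, and the embedding dimension are all assumptions made for this example, and the embedding table is filled with random placeholder values where a real system would use vectors learned from data.

```python
from itertools import product
import random


def kmer_tokens(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mers (hypothetical helper)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to an integer index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def random_embeddings(vocab_size, dim=8, seed=0):
    """Placeholder embedding table; real vectors would be learned, not random."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
            for _ in range(vocab_size)]


def embed(seq, vocab, table, k=3):
    """Turn a sequence into a list of vectors, skipping k-mers outside the
    vocabulary (for example, those containing the ambiguity code N)."""
    return [table[vocab[t]] for t in kmer_tokens(seq, k) if t in vocab]


if __name__ == "__main__":
    vocab = build_vocab(k=3)
    table = random_embeddings(len(vocab), dim=8)
    vectors = embed("ACGTAC", vocab, table, k=3)
    print(kmer_tokens("ACGTAC", k=3))
    print(len(vectors), "vectors of dimension", len(vectors[0]))
```

Because neighboring k-mers overlap, adjacent positions in the resulting matrix share nucleotides, which is one way a representation can carry locality information into a convolutional model.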

Cited By

  • (2019) Deep Learning-Based Spatial Analytics for Disaster-Related Tweets: An Experimental Study. 2019 20th IEEE International Conference on Mobile Data Management (MDM), 337-342. DOI: 10.1109/MDM.2019.00-40. Online publication date: Jun-2019

      Published In

      BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
      August 2018
      727 pages
      ISBN: 9781450357944
      DOI: 10.1145/3233547
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. continuous vector representation
      2. convolutional networks
      3. deep learning
      4. dnase-seq
      5. generative models
      6. semi-supervised learning

      Qualifiers

      • Research-article

      Conference

      BCB '18
      Acceptance Rates

      BCB '18 Paper Acceptance Rate 46 of 148 submissions, 31%;
      Overall Acceptance Rate 254 of 885 submissions, 29%
