Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters

Yifeng Li,
Chih-Yu Chen &
Wyeth W. Wasserman

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9029)

Included in the following conference series:

International Conference on Research in Computational Molecular Biology


Abstract

Sparse linear models approximate one or more target variables by a sparse linear combination of input variables. The sparseness is realized through a regularization term. Because they are simple, fast, and able to select features, they are widely used in classification and regression. Essentially, linear models are shallow feed-forward neural networks, which have three limitations: (1) inability to model non-linearity among features, (2) inability to learn high-level features, and (3) unnatural extensions for selecting features in the multi-class case. Deep neural networks are models structured by multiple hidden layers with non-linear activation functions. Compared with linear models, they have two distinctive strengths: the capability to (1) model complex systems with non-linear structures and (2) learn high-level representations of features. Deep learning has been applied to many large and complex systems, where deep models significantly outperform shallow ones. However, feature selection at the input level, which is very helpful for understanding the nature of a complex system, is still not well studied. In genome research, the cis-regulatory elements in non-coding DNA sequences play a key role in the expression of genes. Since the activity of regulatory elements involves highly interactive factors, a deep tool is strongly needed to discover informative features. To address the above limitations of shallow and deep models for selecting features of a complex system, we propose a deep feature selection model that (1) takes advantage of deep structures to model non-linearity and (2) conveniently selects a subset of features right at the input level for multi-class data. We applied this model to the identification of active enhancers and promoters by integrating multiple sources of genomic information. Results show that our model outperforms elastic net in terms of both the size of the discriminative feature subset and classification accuracy.
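As an illustration of the shallow baseline the abstract starts from, the following minimal sketch shows how a regularization term produces a sparse linear model and thereby selects features. It is not the authors' deep feature selection model. It assumes an L1 penalty alone (the L1 part of an elastic net), a plain proximal gradient solver, and synthetic data; the names `lasso_ista`, `soft_threshold`, and `lam` are hypothetical and chosen only for this example.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by proximal gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n          # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * lam)  # L1 step sets small weights to 0
    return w


# Synthetic data (an assumption for illustration): 3 informative features out of 20.
rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
true_w = np.zeros(d)
true_w[[0, 3, 7]] = [2.0, -3.0, 1.5]
y = X @ true_w + 0.1 * rng.standard_normal(n)

w_hat = lasso_ista(X, y, lam=0.2)
selected = np.flatnonzero(w_hat)  # indices of features with non-zero weight
print("selected features:", selected)
```

Features whose weights are driven exactly to zero are discarded, which is the sense in which the regularization term performs selection. Because the model is linear, it cannot capture the non-linear interactions between features that motivate the deep model proposed in the abstract.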


Author information

Authors and Affiliations

Centre for Molecular Medicine and Therapeutics, University of British Columbia, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
Yifeng Li, Chih-Yu Chen & Wyeth W. Wasserman

Corresponding author

Correspondence to Wyeth W. Wasserman.

Editor information

Editors and Affiliations

National Center of Biotechnology Information, Bethesda, Maryland, USA
Teresa M. Przytycka

Cite this paper

Li, Y., Chen, CY., Wasserman, W.W. (2015). Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters. In: Przytycka, T. (eds) Research in Computational Molecular Biology. RECOMB 2015. Lecture Notes in Computer Science, vol 9029. Springer, Cham. https://doi.org/10.1007/978-3-319-16706-0_20

DOI: https://doi.org/10.1007/978-3-319-16706-0_20
Published: 26 March 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16705-3
Online ISBN: 978-3-319-16706-0
eBook Packages: Computer Science, Computer Science (R0)
