Motif Discovery Through Predictive Modeling of Gene Regulation

Manuel Middendorf,
Anshul Kundaje,
Mihir Shah,
Yoav Freund,
Chris H. Wiggins &
Christina Leslie

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3500)

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology

Abstract

We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence (k-mer), a dimer, or a position-specific scoring matrix (PSSM) that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature.
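
The selection step described in the abstract (a motif present in a promoter, paired with an active regulator, predicting up- or down-regulation) can be illustrated with a small Python sketch of a single boosting round. This is not the authors' implementation. The function names, the toy promoters and labels, the binary notion of regulator activity, and the choice of a confidence-rated boosting loss with abstention, Z = W_abstain + 2·sqrt(W_up · W_down), are all assumptions made for illustration, and the sketch covers only k-mers, not dimers or PSSMs.

```python
import math
from itertools import product


def kmers(seq, k):
    """Return the set of all length-k substrings of a promoter sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_weak_rule(promoters, regulator_active, labels, weights, k=3, eps=1e-3):
    """Pick the (k-mer, regulator) pair with the smallest boosting loss Z.

    promoters:        gene -> promoter sequence
    regulator_active: experiment -> set of regulators active in it
    labels:           (gene, experiment) -> +1 (up) or -1 (down)
    weights:          (gene, experiment) -> current boosting weight
    Returns (z, motif, regulator, alpha).
    """
    gene_kmers = {g: kmers(s, k) for g, s in promoters.items()}
    candidates = set().union(*gene_kmers.values())
    regulators = set().union(*regulator_active.values())
    best = None
    for motif, reg in product(sorted(candidates), sorted(regulators)):
        w_up = w_down = w_abstain = 0.0
        for (g, e), y in labels.items():
            # The rule fires only if the motif is in the promoter AND
            # the regulator is active in that experiment.
            fires = motif in gene_kmers[g] and reg in regulator_active[e]
            if not fires:
                w_abstain += weights[(g, e)]
            elif y > 0:
                w_up += weights[(g, e)]
            else:
                w_down += weights[(g, e)]
        z = w_abstain + 2.0 * math.sqrt(w_up * w_down)
        if best is None or z < best[0]:
            # Smoothed vote weight; the sign gives the predicted direction.
            alpha = 0.5 * math.log((w_up + eps) / (w_down + eps))
            best = (z, motif, reg, alpha)
    return best


# Hypothetical toy data: three promoters share "ACG"; R1 is active in e1 only.
promoters = {"g1": "ACGTAA", "g2": "TTACGA", "g3": "GACGCC", "g4": "CCCGGG"}
regulator_active = {"e1": {"R1"}, "e2": {"R2"}}
labels = {
    ("g1", "e1"): 1, ("g2", "e1"): 1, ("g3", "e1"): 1, ("g4", "e1"): -1,
    ("g1", "e2"): 1, ("g2", "e2"): -1, ("g3", "e2"): -1, ("g4", "e2"): 1,
}
weights = {key: 1.0 / len(labels) for key in labels}

z, motif, regulator, alpha = best_weak_rule(
    promoters, regulator_active, labels, weights
)
print(motif, regulator, round(z, 3), alpha > 0)
```

On this toy input the rule pairing `ACG` with `R1` has the lowest loss, because it fires only on examples that are all up-regulated. In a full boosting loop the example weights would then be updated and the search repeated, which is what lets later rounds focus on examples the earlier rules mispredict.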

Author information

Authors and Affiliations

Department of Physics, Columbia University, New York, NY, 10027, USA
Manuel Middendorf
Department of Computer Science, Columbia University, New York, NY, 10027, USA
Anshul Kundaje, Mihir Shah, Yoav Freund & Christina Leslie
Department of Applied Mathematics, Columbia University, New York, NY, 10027, USA
Chris H. Wiggins
Center for Computational Biology and Bioinformatics, Columbia University, New York, NY, 10027, USA
Yoav Freund, Chris H. Wiggins & Christina Leslie
Center for Computational Learning Systems, Columbia University, New York, NY, 10027, USA
Yoav Freund & Christina Leslie

Editor information

Editors and Affiliations

Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, 108-8639, Minato-ku, Tokyo, Japan
Satoru Miyano
Broad Institute of MIT and Harvard, 320 Charles Street, 02141-2023, Cambridge, MA, USA
Jill Mesirov
Computational Genomics Laboratory, Department of Bioengineering, Boston University, 44 Cummington St., 02215, Boston, MA, USA
Simon Kasif
Center for Molecular Biology and Computer Science Department, Brown University, 115 Waterman St., 02912, Providence, RI, USA
Sorin Istrail
University of California, San Diego, USA
Pavel A. Pevzner
Department of Molecular and Computational Biology, University of Southern California, 1050 Childs Way, 90089-2910, Los Angeles, CA, USA
Michael Waterman

Cite this paper

Middendorf, M., Kundaje, A., Shah, M., Freund, Y., Wiggins, C.H., Leslie, C. (2005). Motif Discovery Through Predictive Modeling of Gene Regulation. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science, vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_41

DOI: https://doi.org/10.1007/11415770_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25866-7
Online ISBN: 978-3-540-31950-4
eBook Packages: Computer Science, Computer Science (R0)
