Nothing Special   »   [go: up one dir, main page]

Skip to main content

Motif Discovery Through Predictive Modeling of Gene Regulation

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2005)

Abstract

We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites PSSMs by incorporating promoter sequence and transcriptome gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Bailey, T.L., Elkan, C.P.: Fitting a mixture model by expectation-maximization to discover motifs in biopolymers. In: Altman, R., Brutlag, D., Karp, P., Lathrop, R., Searls, D. (eds.) Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28–36. AAAI Press, Menlo Park (1994)

    Google Scholar 

  2. Hertz, G.Z., Stormo, G.D.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577 (1999)

    Article  Google Scholar 

  3. Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F., Wootton, J.C.: Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science 262, 208–214 (1993)

    Article  Google Scholar 

  4. Hughes, J.D., Estep, P.W., Tavazoie, S., Church, G.M.: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol. 296, 1205–1214 (2000)

    Article  Google Scholar 

  5. Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., Friedman, N.: Module networks: Identifying regulatory modules and their condition specific regulators from gene expression data. Nature Genetics 34, 166–176 (2003)

    Article  Google Scholar 

  6. Battle, A., Segal, E., Koller, D.: Probabilistic discovery of overlapping cellular processes and their regulation. In: Proceedings of the eighth annual international conference on Computational molecular biology, pp. 167–176. ACM Press, New York (2004)

    Google Scholar 

  7. Bussemaker, H.J., Li, H., Siggia, E.D.: Regulatory element detection using correlation with expression. Nature Genetics 27, 167–171 (2001)

    Article  Google Scholar 

  8. Conlon, E.M., Liu, X.S., Lieb, J.D., Liu, J.S.: Integrating regulatory motif discovery and genome-wide expression analysis. Proceedings of the National Academy of Sciences USA 100, 3339–3344 (2003)

    Article  Google Scholar 

  9. Zilberstein, C.B.Z., Eskin, E., Yakhini, Z.: Sequence motifs in ranked expression data. In: Proceedings of the First RECOMB Satellite Workshop on Regulatory Genomics (2004)

    Google Scholar 

  10. Middendorf, M., Kundaje, A., Wiggins, C., Freund, Y., Leslie, C.: Predicting genetic regulatory response using classification. In: Proceedings of the Twelfth International Conference on Intelligent Systems for Molecular Biology, ISMB 2004 (2004)

    Google Scholar 

  11. Segal, E., Yelensky, R., Koller, D.: Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics 19, 273–282 (2003)

    Article  Google Scholar 

  12. Schapire, R.E.: The boosting approach to machine learning: An overview. In: MSRI Workshop on Nonlinear Estimation and Classification (2002)

    Google Scholar 

  13. Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics 26, 1651–1686 (1998)

    Article  MATH  MathSciNet  Google Scholar 

  14. Freund, Y., Mason, L.: The alternating decision tree learning algorithm. In: Proceedings of the Sixteenth International Conference on Machine Learning, pp. 124–133 (1999)

    Google Scholar 

  15. Cover, T., Thomas, J.: Elements of Information Theory. John Wiley, New York (1990)

    Google Scholar 

  16. Slonim, N., Friedman, N., Tishby, N.: Unsupervised document classification using sequential information maximization. In: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 129–136. ACM Press, New York (2002)

    Chapter  Google Scholar 

  17. Gasch, A.P., Spellman, P.T., Kao, C.M., Carmel-Harel, O., Eisen, M.B., Storz, G., Botstein, D., Brown, P.O.: Genomic expression programs in the response of yeast cells to environmental changes. Molecular Biology of the Cell 11, 4241–4257 (2000)

    Google Scholar 

  18. Lee, T.I., Rinaldi, N.J., Robert, F., Odom, D.T., Bar-Joseph, Z., Gerber, G.K., Hannett, N.M., Harbison, C.R., Thompson, C.M., Simon, I., Zeitlinger, J., Jennings, E.G., Murray, H.L., Gordon, D.B., Ren, B., Wyrick, J.J., Tagne, J., Volkert, T.L., Fraenkel, E., Gifford, D.K., Young, R.A.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002)

    Article  Google Scholar 

  19. Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., Meinhardt, T., Prüss, M., Reuter, I., Schacherer, F.: TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Research 28, 316–319 (2000)

    Article  Google Scholar 

  20. Pilpel, Y., Sudarsanam, P., Church, G.M.: Identifying regulatory networks by combinatorial analysis of promoter elements. Nature Genetics 2, 153–159 (2001)

    Article  Google Scholar 

  21. Kundaje, A., Middendorf, M., Shah, M., Wiggins, C., Freund, Y., Leslie, C.: A classification-based framework for predicting and analyzing gene regulatory response. Web supplement, http://www.cs.columbia.edu/compbio/robust-geneclass

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Middendorf, M., Kundaje, A., Shah, M., Freund, Y., Wiggins, C.H., Leslie, C. (2005). Motif Discovery Through Predictive Modeling of Gene Regulation. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science(), vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_41

Download citation

  • DOI: https://doi.org/10.1007/11415770_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25866-7

  • Online ISBN: 978-3-540-31950-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics