DOI: 10.1145/1854776.1854817

Promoter prediction based on a multiple instance learning scheme

Published: 02 August 2010

Abstract

Core promoters are crucial regions for the initiation of gene transcription. Identifying core promoters is important for understanding transcriptional regulation and for elucidating relationships among the genes of an organism. Experimentally locating core promoters is laborious and costly, so it is desirable to develop computational approaches that support and complement experimental methods. However, computational prediction of core promoters in eukaryotic species is challenging. In this paper, we first formulate the core promoter prediction problem as a variation of the multiple instance learning problem. We then develop a new algorithm for identifying core promoters with a high positive prediction rate and high sensitivity. Because many computational biology problems can be formulated under the multiple instance learning paradigm, our approach may inspire future research on applying multiple instance learning techniques to complex biological problems, and our method may have broad potential applications.
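
The abstract does not spell out the paper's formulation or algorithm, so the following is only a minimal sketch of the standard multiple instance assumption that such a formulation would vary: a bag (here, a sequence region) counts as positive if at least one of its instances (here, fixed-size windows) scores as positive. The window size, the threshold, and the GC-fraction scorer are illustrative assumptions, not the authors' features or method.

```python
from typing import Callable, List, Sequence


def gc_fraction(window: str) -> float:
    """Hypothetical instance scorer: fraction of G/C bases in a window."""
    if not window:
        return 0.0
    return sum(base in "GC" for base in window.upper()) / len(window)


def windows(sequence: str, size: int, step: int) -> List[str]:
    """Split a sequence into overlapping windows; each window is one instance."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]


def bag_score(bag: Sequence[str], scorer: Callable[[str], float]) -> float:
    """Standard MIL assumption: a bag is as positive as its best instance."""
    return max((scorer(instance) for instance in bag), default=0.0)


def predict_bag(bag: Sequence[str], scorer: Callable[[str], float],
                threshold: float = 0.7) -> bool:
    """Label the bag positive if any instance reaches the (assumed) threshold."""
    return bag_score(bag, scorer) >= threshold


# Toy sequences, invented for illustration only.
positive_like = windows("ATATATGCGCGCGCATAT", size=6, step=3)
negative_like = windows("ATATATATTTAAATATAT", size=6, step=3)

print(predict_bag(positive_like, gc_fraction))
print(predict_bag(negative_like, gc_fraction))
```

The point of the sketch is the labeling structure: only the bag carries a label during training, while the instance that makes it positive is unknown. That matches a setting where a region is known to contain a promoter but its exact position is not.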



Published In

BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
August 2010
705 pages
ISBN: 9781450304382
DOI: 10.1145/1854776

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
