Abstract
The accuracy of promoter recognition depends not only on an appropriate representation of the promoter sequence but also on the selection of its essential features. This paper addresses both issues. First, a promoter sequence is captured in the form of a Chaos Game Representation (CGR). Then, based on the concept of Mahalanobis distance, a new statistical feature extraction method is introduced to select a set of the most significant pixels from the CGR. Recognition is performed by a supervised neural network. The proposed technique achieved 100% accuracy when tested on E. coli promoter sequences using the leave-one-out method. The approach also outperforms other techniques.
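The first two steps of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. The corner layout of the CGR square, the grid resolution `k`, and the function names `cgr_grid`, `pixel_scores`, and `select_pixels` are all assumptions made for illustration. The per-pixel score is a simplified stand-in for the Mahalanobis-based criterion: the absolute difference of the class means divided by a pooled standard deviation, which equals the Mahalanobis distance between class means only in the one-dimensional, per-pixel case.

```python
import numpy as np

# Assumed corner layout (a common convention, not taken from the paper).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_grid(seq, k=3):
    """Count CGR points that fall into each cell of a 2^k x 2^k pixel grid."""
    n = 2 ** k
    grid = np.zeros((n, n))
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        grid[row, col] += 1
    return grid


def pixel_scores(pos, neg, eps=1e-9):
    """Score each pixel by |mean difference| / pooled standard deviation.

    pos, neg: arrays of shape (samples, pixels), one flattened CGR per row.
    """
    mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
    pooled_var = (pos.var(axis=0) + neg.var(axis=0)) / 2.0
    return np.abs(mu_p - mu_n) / np.sqrt(pooled_var + eps)


def select_pixels(pos, neg, m):
    """Return the indices of the m highest-scoring pixels."""
    scores = pixel_scores(pos, neg)
    return np.argsort(scores)[::-1][:m]
```

In use, each promoter and non-promoter sequence would be converted with `cgr_grid(seq).ravel()`, the rows stacked into the `pos` and `neg` arrays, and the pixels returned by `select_pixels` passed on as inputs to a classifier.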
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tinnungwattana, O., Lursinsap, C. (2006). Statistical Feature Selection from Chaos Game Representation for Promoter Recognition. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758525_112
DOI: https://doi.org/10.1007/11758525_112
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34381-3
Online ISBN: 978-3-540-34382-0
eBook Packages: Computer Science (R0)