Abstract
Predicting the tertiary structure of a protein from its primary structure (its amino acid sequence) without relying on sequence similarity is a challenging task in bioinformatics and the biological sciences. Protein fold prediction can be framed as a classification problem and solved with machine learning techniques. In this paper, a new method based on an ensemble of five classifiers (Naïve Bayes, Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), LogitBoost and AdaBoost.M1) is proposed for the protein fold prediction problem. The experiments use the standard dataset provided by Ding and Dubchak. Experimental results show that the proposed method raises the prediction accuracy to up to 64% on an independent test set, which is the highest prediction accuracy compared with other methods reported in the literature.
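The abstract does not say how the five classifiers' outputs are combined. As a minimal sketch of one common option, the snippet below assumes plurality (majority) voting over predicted fold labels. The function names, the tie-breaking rule, and the toy labels are all illustrative assumptions, not the authors' method or data.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine label predictions from several classifiers by plurality vote.

    predictions_per_classifier[c][i] is classifier c's predicted label for
    sample i. Ties go to the tied label voted for by the earliest classifier
    in the list (an assumed rule, chosen only to make the result deterministic).
    """
    if not predictions_per_classifier:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions_per_classifier[0])
    if any(len(p) != n_samples for p in predictions_per_classifier):
        raise ValueError("all classifiers must predict the same samples")

    combined = []
    # zip(*...) regroups the predictions sample by sample.
    for votes in zip(*predictions_per_classifier):
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy predictions for four proteins; labels are placeholders.
toy_predictions = [
    ["a", "b", "c", "a"],  # Naive Bayes
    ["a", "b", "b", "c"],  # MLP
    ["a", "c", "b", "c"],  # SVM
    ["b", "b", "c", "d"],  # LogitBoost
    ["a", "c", "c", "d"],  # AdaBoost.M1
]
toy_truth = ["a", "b", "c", "d"]

ensemble_labels = majority_vote(toy_predictions)
ensemble_accuracy = accuracy(ensemble_labels, toy_truth)
```

Voting on hard labels needs nothing from the base classifiers beyond their predictions, which is convenient when they come from different toolkits. Averaging class probabilities is another common choice, and it may behave differently when the classifiers disagree.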
References
Stanley Shi, Y.M., Suganthan, P.N.: Multiclass protein fold recognition using multiobjective evolutionary algorithms. In: Computational Intelligence in Bioinformatics and Computational Biology (2004), 0-7803-8728-7
Ding, C., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17(4), 349–358 (2001)
Bologna, G., Appel, R.D.: A comparison study on protein fold recognition. In: Ninth International Conference on Neural Information Processing, November 2002, vol. 5, pp. 2492–2496 (2002)
Bittencourt, V.G., Abreu, M.C.C., de Souto, M.C.P., Canuto, A.M.P.: An empirical comparison of individual machine learning techniques and ensemble approaches in protein structural class prediction. In: International Joint Conference on Neural Networks, 0-7803-9048-2 (2005)
Krishnaraj, Y., Reddy, C.K.: Boosting methods for Protein Fold Recognition: An Empirical Comparison. In: IEEE International Conference on Bioinformatics (2008) 978-0-7695-3452-7
Hobohm, U., Scharf, M., Schneider, R., Sander, C.: Selection of a representative set of structures from the Brookhaven Protein Data Bank. Protein Science 1, 409–417 (1992)
Lo Conte, L., Ailey, B., Hubbard, T.J.P., Brenner, S.E., Murzin, A.G., Chothia, C.: SCOP: a structural classification of proteins database. Nucleic Acids Research 28(1), 257–259 (2000)
Huang, C.D., Lin, C.T., Pal, N.R.: Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification. IEEE Transactions on NanoBioscience 2(4), 221–232 (2003)
Duwairi, R., Kassawneh, A.: A Framework for Predicting Proteins 3D Structures. In: Computer Systems and Applications, AICCSA 2008 (2008), 978-1-4244-1968
Miller, D.J., Pal, S.: Transductive Methods for the Distributed Ensemble Classification Problem. Neural Computation 19, 856–884 (2007)
Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms, pp. 1045–9227 (1997)
Platt, J.: Fast Training of Support Vector Machines using Sequential Minimal Optimization. In: Schoelkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods-Support Vector Learning, p. 185. MIT Press, Cambridge (1998)
Friedman, J., Hastie, T., Tibshirani, R.: Additive Logistic Regression: a Statistical View of Boosting. Annals of Statistics 28(2), 337–407 (2001) (Published version)
Friedman, N., Goldszmidt, M.: Learning Bayesian networks with local structure. In: Proc. UAI 1996, pp. 252–262 (1996)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. (2001), 978-0-471-05669-0
Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. (1998), 978-0-471-05669-0
Viola, P., Jones, M.: Rapid Object Detection using a Boosted Cascade of Simple Features. Computer Vision and Pattern Recognition (2001), 0-7695-1272-0
Schapire, R.E.: The strength of weak learnability. Machine Learning 5, 197–227 (1990)
Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Annals of Statistics 28(2), 337–407 (2000)
Breiman, L.: Bagging Predictors. Machine Learning 24(2), 123–140 (1996)
Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
MacKay, D.J.C.: Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge (2003)
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dehzangi, A., Phon Amnuaisuk, S., Ng, K.H., Mohandesi, E. (2009). Protein Fold Prediction Problem Using Ensemble of Classifiers. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10684-2_56
DOI: https://doi.org/10.1007/978-3-642-10684-2_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10682-8
Online ISBN: 978-3-642-10684-2
eBook Packages: Computer Science, Computer Science (R0)