Abstract
Protein secondary structure prediction (PSSP) is one of the main tasks in computational biology. Over the last few decades, considerable effort has gone into this problem using a variety of approaches, chiefly artificial neural networks (ANNs). ANNs for PSSP are generally trained on the CB513 data set. Like protein structure databases, this data set is imbalanced, which can lead to a low error rate for the majority class and an undesirably high error rate for the minority class. In this paper we evaluate how an imbalanced data set affects the training and learning of neural networks applied to protein secondary structure prediction. To this end, we applied resampling methods to address the class imbalance problem. The results show that imbalanced data sets lower the prediction rates for helices. However, the class distribution of the protein data set does not significantly affect the global accuracy (Q3).
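To make the distinction between global and per-class accuracy concrete, the following is a minimal sketch, not the authors' implementation. It assumes random oversampling as the resampling method (the abstract does not say which methods were used), and the function names `random_oversample`, `q3`, and `per_class_accuracy`, as well as the toy label strings, are hypothetical. It shows how a predictor can reach a reasonable Q3 while failing completely on a class that is rare in the data.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class is as large as the largest one (assumed method)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def q3(true, pred):
    """Global three-state accuracy: fraction of residues whose
    label (H, E or C) is predicted correctly."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def per_class_accuracy(true, pred):
    """Fraction of correctly predicted residues within each true class."""
    totals = Counter(true)
    hits = Counter(t for t, p in zip(true, pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical toy data: 6 coil (C), 2 helix (H), 2 strand (E) residues,
# and a predictor that always answers with the majority class.
true_labels = "CCCCCCHHEE"
pred_labels = "CCCCCCCCCC"

print(q3(true_labels, pred_labels))                  # global accuracy
print(per_class_accuracy(true_labels, pred_labels))  # accuracy per class

_, balanced = random_oversample(list(range(10)), list(true_labels))
print(Counter(balanced))                             # class sizes after resampling
```

In this toy case the global score hides the fact that no helix or strand residue is predicted correctly, which is why per-class rates are worth reporting alongside Q3 when the training data are imbalanced.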
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Palodeto, V., Terenzi, H., Marques, J.L.B. (2009). Training Neural Networks for Protein Secondary Structure Prediction: The Effects of Imbalanced Data Set. In: Huang, DS., Jo, KH., Lee, HH., Kang, HJ., Bevilacqua, V. (eds) Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence. ICIC 2009. Lecture Notes in Computer Science, vol 5755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04020-7_28
DOI: https://doi.org/10.1007/978-3-642-04020-7_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04019-1
Online ISBN: 978-3-642-04020-7
eBook Packages: Computer Science (R0)