Abstract
Protein structure prediction is an important step toward understanding protein structure and function. Accurate prediction of protein secondary structure helps in understanding protein folding, and many applications, such as drug discovery, require predicting the secondary structure of unknown proteins. In this paper we report our first attempt at secondary structure prediction. We approach it as a sequence classification problem, in which the task is to assign a sequence of labels (i.e., helix, sheet, or coil) to a given protein sequence. We propose an ensemble technique based on two stochastic supervised machine learning algorithms, namely the Maximum Entropy Markov Model (MEMM) and the Conditional Random Field (CRF). We identify and implement a set of features that mostly capture contextual information. The proposed approach is evaluated on a benchmark dataset, and the results are encouraging enough to warrant further exploration: the highest predictive accuracy obtained is 61.26%, with a segment overlap (SOV) score of 52.30%.
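To make the sequence-labelling framing concrete, the following is a minimal sketch of what contextual features and per-residue accuracy might look like. It is an illustration under stated assumptions, not the feature set or evaluation code of the paper: the window half-width, the feature names, the `PAD` symbol, and the one-letter labels H (helix), E (sheet), and C (coil) are all hypothetical choices made here.

```python
from typing import Dict


def window_features(sequence: str, i: int, half_width: int = 2) -> Dict[str, str]:
    """Contextual features for residue i: the residues within +/- half_width.

    The window size and feature naming are illustrative assumptions.
    """
    feats = {}
    for offset in range(-half_width, half_width + 1):
        j = i + offset
        # Positions that fall outside the chain get a padding symbol.
        inside = 0 <= j < len(sequence)
        feats[f"res[{offset:+d}]"] = sequence[j] if inside else "PAD"
    return feats


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical toy input: one feature dictionary per residue.
sequence = "MKTAYIAK"
features = [window_features(sequence, i) for i in range(len(sequence))]
```

Each per-residue feature dictionary is the kind of input a sequence labeller such as an MEMM or a CRF consumes, one dictionary per position. The accuracy function only counts matching positions; the SOV score additionally measures how well predicted segments overlap observed ones and is not reproduced here.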
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saha, S., Ekbal, A., Sharma, S., Bandyopadhyay, S., Maulik, U. (2013). Protein Secondary Structure Prediction Using Machine Learning. In: Abraham, A., Thampi, S. (eds) Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32063-7_7
DOI: https://doi.org/10.1007/978-3-642-32063-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32062-0
Online ISBN: 978-3-642-32063-7
eBook Packages: Engineering, Engineering (R0)