Abstract
Computational methods are needed to predict protein subcellular location (SCL), and many studies have addressed this problem. Among them, methods that incorporate Gene Ontology (GO) have achieved higher prediction accuracy. However, the earlier method of extracting features from GO has some disadvantages. In this paper, to increase prediction accuracy, we present a novel method that extracts features from GO by semantic similarity measurement, which we expect to overcome the disadvantages of the earlier method. Tests on a publicly available dataset show satisfactory results. The method can also be applied in similar scenarios in other bioinformatics research or data mining processes.
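The abstract does not specify which semantic similarity measure is used or how the feature vector is assembled, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a tiny hypothetical ontology (term IDs such as `A1` and `B1` are placeholders, not real GO identifiers), a simple ancestor-set Jaccard similarity standing in for whatever measure the paper actually uses, and a hypothetical fixed list of reference terms that defines the feature dimensions. For contrast, it also shows a plain binary presence/absence encoding; this is an illustrative exact-match baseline, not a description of the earlier method the abstract refers to.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy ontology fragment: each term maps to its direct parents.
# Term IDs are illustrative placeholders, not real GO identifiers.
TOY_GO_PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself together with all of its ancestors in the DAG."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def term_similarity(t1: str, t2: str, parents: Dict[str, List[str]]) -> float:
    """Assumed measure: Jaccard overlap of ancestor sets (1.0 for identical terms)."""
    a1, a2 = ancestors(t1, parents), ancestors(t2, parents)
    return len(a1 & a2) / len(a1 | a2)


def binary_features(annotations: Iterable[str], reference_terms: List[str]) -> List[float]:
    """Exact-match encoding: 1.0 only if the protein carries that exact term."""
    ann = set(annotations)
    return [1.0 if r in ann else 0.0 for r in reference_terms]


def similarity_features(
    annotations: Iterable[str],
    reference_terms: List[str],
    parents: Dict[str, List[str]],
) -> List[float]:
    """For each reference term, keep the best similarity to any annotated term."""
    ann = list(annotations)
    return [
        max((term_similarity(a, r, parents) for a in ann), default=0.0)
        for r in reference_terms
    ]


if __name__ == "__main__":
    protein = {"A1"}           # hypothetical annotation of one protein
    reference = ["A2", "B1"]   # hypothetical feature dimensions
    print(binary_features(protein, reference))
    print(similarity_features(protein, reference, TOY_GO_PARENTS))
```

With a protein annotated only with `A1`, the binary encoding scores zero against both reference terms, whereas the similarity encoding gives 0.5 for the sibling term `A2` and 0.2 for the more distant `B1`. Related but non-identical annotations therefore still contribute to the feature vector, which could then be passed to any standard classifier.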
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, G., Sheng, H. (2007). Extracting Features from Gene Ontology for the Identification of Protein Subcellular Location by Semantic Similarity Measurement. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_13
DOI: https://doi.org/10.1007/978-3-540-77018-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77016-9
Online ISBN: 978-3-540-77018-3
eBook Packages: Computer Science, Computer Science (R0)