Abstract
Domains act as structural and functional units of proteins and play an essential role in functional genomics. Investigating the annotation of finite protein domains is important because the functions of a protein can be inferred directly once the functions of its component domains are determined. In this paper, we propose PDAMIML, a method based on a novel multi-instance multi-label learning framework combined with auto-cross covariance transformation and SVM, which can effectively annotate functions for protein domains. We evaluate the performance of PDAMIML using a benchmark of 100 protein domains and 10 high-cycle functional labels. The experimental results show that PDAMIML yields significant performance gains over state-of-the-art approaches. Furthermore, we combine PDAMIML with two other existing methods by majority voting and obtain encouraging results.
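The majority-voting combination mentioned above can be illustrated with a minimal sketch. The abstract does not specify the voting scheme, so the following assumes that each of the three methods outputs a binary domain-by-label matrix and that a label is assigned when more than half of the methods assign it. The function name `majority_vote` and the toy inputs are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[int]]  # rows: protein domains, columns: functional labels


def majority_vote(predictions: List[Matrix]) -> Matrix:
    """Combine binary multi-label predictions from several methods.

    Assumption (not from the paper): every method returns a 0/1 matrix of
    the same shape, and a label is kept if strictly more than half of the
    methods predict it.
    """
    n_methods = len(predictions)
    n_domains = len(predictions[0])
    n_labels = len(predictions[0][0])

    combined: Matrix = []
    for d in range(n_domains):
        row = []
        for l in range(n_labels):
            # Count how many methods assign label l to domain d.
            votes = sum(p[d][l] for p in predictions)
            row.append(1 if 2 * votes > n_methods else 0)
        combined.append(row)
    return combined


# Toy predictions for 2 domains and 3 labels from three hypothetical methods.
method_a = [[1, 0, 1], [0, 0, 1]]
method_b = [[1, 1, 0], [0, 1, 1]]
method_c = [[0, 1, 1], [0, 0, 0]]

combined = majority_vote([method_a, method_b, method_c])
```

With three methods, a label needs at least two votes to be kept, so ties cannot occur. A combination with an even number of methods would need an explicit tie-breaking rule.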
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Meng, Y. et al. (2014). A Multi-Instance Multi-Label Learning Approach for Protein Domain Annotation. In: Huang, DS., Han, K., Gromiha, M. (eds) Intelligent Computing in Bioinformatics. ICIC 2014. Lecture Notes in Computer Science, vol 8590. Springer, Cham. https://doi.org/10.1007/978-3-319-09330-7_13
DOI: https://doi.org/10.1007/978-3-319-09330-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09329-1
Online ISBN: 978-3-319-09330-7
eBook Packages: Computer Science (R0)