Abstract
In bioinformatics, much of the knowledge needed to solve biological problems is locked in textual documents such as scientific publications. Hence there is an increasing focus on extracting information from this vast body of scientific literature. In this paper, we present an information extraction system that employs a semantic parser based on the Hidden Vector State (HVS) model to extract protein-protein interactions. Unlike other hierarchical parsing models, which require fully annotated treebank data for training, the HVS model can be trained using only lightly annotated data while retaining sufficient ability to capture the hierarchical structure needed to robustly extract task domain semantics. When applied to extracting protein-protein interaction information from medical literature, it performed better than other established statistical methods, achieving 47.9% recall and 72.8% precision.
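The abstract reports recall and precision separately. As a rough illustration of how the two figures combine, the sketch below computes the standard balanced F-measure (the harmonic mean of precision and recall). The abstract does not state an F-score, so the function name `f1_score` and the choice of the balanced F-measure are assumptions made here for illustration, not the authors' evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract: 72.8% precision, 47.9% recall.
precision = 0.728
recall = 0.479

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.1f}%")
```

Because the harmonic mean is pulled toward the lower of the two values, the combined score sits closer to the recall figure than to the precision figure.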
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhou, D., He, Y., Kwoh, C.K. (2006). Extracting Protein-Protein Interactions from the Literature Using the Hidden Vector State Model. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758525_97
DOI: https://doi.org/10.1007/11758525_97
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34381-3
Online ISBN: 978-3-540-34382-0
eBook Packages: Computer Science, Computer Science (R0)