Abstract
Linking the biomedical literature to other data resources is notoriously difficult and requires text mining, which aims to automatically extract facts from the literature. Because authors write in natural language, text mining is a major natural language processing challenge that is far from solved. We propose an alternative: if authors and editors summarize the main facts in a controlled natural language, text mining becomes easier and more powerful. To demonstrate this approach, we use Attempto Controlled English (ACE) and define a simple model that captures the main aspects of protein interactions. To evaluate the approach, we collected a dataset of 459 paragraph headings about protein interactions from the literature. Of these headings, 56% can be represented exactly in ACE and another 23% partially. These results indicate that our approach is feasible.
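The following minimal sketch illustrates why a controlled language can make fact extraction easier than mining free text. It is not ACE and does not use the Attempto tools: the tiny sentence grammar, the closed verb list, the `Interaction` type, the `parse_fact` helper, and the placeholder protein names `P1` and `P2` are all assumptions made up for illustration. The only point is that a restricted grammar lets a fact be read off deterministically, while a free-text sentence is rejected rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class Interaction(NamedTuple):
    """One extracted fact: subject protein, relation, target protein."""
    subject: str
    relation: str
    target: str


# Hypothetical toy grammar (NOT the ACE grammar): "<Protein> <verb> <Protein>."
# Only a small, closed set of verbs is accepted.
_VERBS = ("interacts-with", "binds", "phosphorylates", "inhibits")
_PATTERN = re.compile(
    r"^(?P<subject>[A-Za-z0-9-]+) (?P<relation>%s) (?P<target>[A-Za-z0-9-]+)\.$"
    % "|".join(re.escape(v) for v in _VERBS)
)


def parse_fact(sentence: str) -> Optional[Interaction]:
    """Return the fact stated by a controlled sentence, or None if the
    sentence falls outside the toy grammar."""
    match = _PATTERN.match(sentence.strip())
    if match is None:
        return None
    return Interaction(match["subject"], match["relation"], match["target"])


if __name__ == "__main__":
    # A sentence inside the toy grammar yields a structured fact.
    print(parse_fact("P1 phosphorylates P2."))
    # A free-text sentence is outside the grammar and is not interpreted.
    print(parse_fact("P1 was found to associate with P2 in vivo."))
```

A real controlled language such as ACE covers far more syntax than this pattern, so the sketch should be read only as an intuition for the trade-off between expressiveness and ease of extraction.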
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuhn, T., Royer, L., Fuchs, N.E., Schröder, M. (2006). Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_7
DOI: https://doi.org/10.1007/11799511_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36593-8
Online ISBN: 978-3-540-36595-2
eBook Packages: Computer Science, Computer Science (R0)