DOI: 10.1145/1568296.1568313
research-article

Using domain knowledge for ontology-guided entity extraction from noisy, unstructured text data

Published: 23 July 2009

Abstract

Domain-specific knowledge is often recorded by experts in the form of unstructured text. For example, in the medical domain, clinical notes from electronic health records contain a wealth of information. Similar practices are found in other domains. The challenge we discuss in this paper is how to identify and extract part names from technicians' repair notes, a noisy, unstructured text data source from General Motors' archives of solved vehicle repair problems, with the goal of developing a robust and dynamic reasoning system to be used as a repair adviser by service technicians.
In the present work, we discuss two approaches to this problem. We present an algorithm for ontology-guided entity disambiguation that uses existing knowledge sources such as domain-specific ontologies and other structured data. We illustrate its use in the automotive domain, using the GM parts ontology and the unit structure of repair manual text to build context models, which are then used to disambiguate mentions of part-related entities in the text. We also describe extraction of part names with a small amount of annotated data using Hidden Markov Models (HMM) with shrinkage, achieving an F-score of approximately 80%. Next, we used linear-chain Conditional Random Fields (CRF) to model observation dependencies present in the repair notes. Using CRF did not lead to improved performance, but a slight improvement over the HMM results was obtained by using a weighted combination of the HMM and CRF models.
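
The abstract does not say how the HMM and CRF outputs were weighted, so the following is only an illustrative sketch under stated assumptions: each model is assumed to emit a per-token probability distribution over labels, the combination is assumed to be a linear interpolation with a hypothetical weight `w`, and the F-score is computed at the token level for a hypothetical `PART` label. The function names, labels, and probabilities are invented for illustration and are not taken from the paper.

```python
from typing import Dict, List

# One token's label distribution, e.g. {"PART": 0.8, "O": 0.2}.
Posterior = Dict[str, float]


def combine(hmm: List[Posterior], crf: List[Posterior], w: float) -> List[str]:
    """For each token, pick the label maximizing w*P_hmm + (1 - w)*P_crf.

    Linear interpolation is an assumption; the abstract only says
    "weighted combination".
    """
    labels = []
    for p_hmm, p_crf in zip(hmm, crf):
        candidates = sorted(set(p_hmm) | set(p_crf))
        scores = {
            lab: w * p_hmm.get(lab, 0.0) + (1.0 - w) * p_crf.get(lab, 0.0)
            for lab in candidates
        }
        labels.append(max(candidates, key=lambda lab: scores[lab]))
    return labels


def f_score(gold: List[str], pred: List[str], target: str = "PART") -> float:
    """Token-level F1 for one label (harmonic mean of precision and recall)."""
    tp = sum(g == target and p == target for g, p in zip(gold, pred))
    fp = sum(g != target and p == target for g, p in zip(gold, pred))
    fn = sum(g == target and p != target for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up posteriors for the four tokens "replaced fuel pump relay".
hmm_post = [
    {"PART": 0.1, "O": 0.9},
    {"PART": 0.8, "O": 0.2},
    {"PART": 0.7, "O": 0.3},
    {"PART": 0.4, "O": 0.6},
]
crf_post = [
    {"PART": 0.2, "O": 0.8},
    {"PART": 0.4, "O": 0.6},
    {"PART": 0.9, "O": 0.1},
    {"PART": 0.7, "O": 0.3},
]
gold = ["O", "PART", "PART", "PART"]

hmm_only = combine(hmm_post, crf_post, w=1.0)   # HMM decisions alone
crf_only = combine(hmm_post, crf_post, w=0.0)   # CRF decisions alone
combined = combine(hmm_post, crf_post, w=0.5)   # equal-weight mixture

print(f_score(gold, hmm_only), f_score(gold, crf_only), f_score(gold, combined))
```

In this toy example each model misses a different token, so the interpolated distribution recovers both. In practice the weight would be tuned on held-out annotated notes, and an entity-level rather than token-level F-score may be the more appropriate measure.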

      Published In

      AND '09: Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
      July 2009
      127 pages
      ISBN:9781605584966
      DOI:10.1145/1568296
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. information extraction
      2. language models
      3. ontology-guided search
      4. text analysis

      Qualifiers

      • Research-article

      Conference

      AND '09

      Acceptance Rates

      AND '09 Paper Acceptance Rate 15 of 22 submissions, 68%;
      Overall Acceptance Rate 15 of 22 submissions, 68%

      Cited By

      • (2013) A novel semantic information retrieval system based on a three-level domain model. Journal of Systems and Software 86:5 (1426-1452). DOI: 10.1016/j.jss.2013.01.029. Online publication date: 1-May-2013
      • (2011) Discovering context. Proceedings of the 6th international conference on Foundations of augmented cognition: directing the future of adaptive systems (484-492). DOI: 10.5555/2021773.2021833. Online publication date: 9-Jul-2011
      • (2011) Discovering Context: Classifying Tweets through a Semantic Transform Based on Wikipedia. Foundations of Augmented Cognition. Directing the Future of Adaptive Systems (484-492). DOI: 10.1007/978-3-642-21852-1_55. Online publication date: 2011
      • (2010) Discovering users' topics of interest on twitter. Proceedings of the fourth workshop on Analytics for noisy unstructured text data (73-80). DOI: 10.1145/1871840.1871852. Online publication date: 26-Oct-2010
