research-article

Using domain knowledge for ontology-guided entity extraction from noisy, unstructured text data

Authors:

Sergey Bratus, Anna Rumshisky, Rajendra Magar, Paul Thompson

AND '09: Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data

Pages 101 - 106

https://doi.org/10.1145/1568296.1568313

Published: 23 July 2009


Abstract

Domain-specific knowledge is often recorded by experts in the form of unstructured text. For example, in the medical domain, clinical notes in electronic health records contain a wealth of information, and similar practices are found in other domains. The challenge we discuss in this paper is how to identify and extract part names from technicians' repair notes, a noisy, unstructured text source drawn from General Motors' archives of solved vehicle repair problems. The goal is to develop a robust and dynamic reasoning system that service technicians can use as a repair adviser.

In the present work, we discuss two approaches to this problem. First, we present an algorithm for ontology-guided entity disambiguation that uses existing knowledge sources such as domain-specific ontologies and other structured data. We illustrate its use in the automotive domain, using the GM parts ontology and the unit structure of repair manual text to build context models, which are then used to disambiguate mentions of part-related entities in the text. Second, we describe the extraction of part names from a small amount of annotated data using Hidden Markov Models (HMMs) with shrinkage, which achieves an F-score of approximately 80%. We then apply linear-chain Conditional Random Fields (CRFs) to model the observation dependencies present in the repair notes. Using CRFs did not improve performance, but a weighted combination of the HMM and CRF models yielded a slight improvement over the HMM results.
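
The context-model disambiguation step can be illustrated with a minimal sketch. It assumes that each candidate part from the ontology has a bag-of-words context model built from the repair-manual units that describe it, and that an ambiguous mention is resolved to the candidate whose model is most similar, by cosine similarity, to the words of the note. The part names, context words, and the names `CONTEXT_MODELS`, `cosine`, and `disambiguate` are hypothetical, and cosine over raw counts is a stand-in scoring choice, not necessarily the one used in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical context models: for each candidate part from the ontology, word
# counts gathered from the repair-manual units that describe that part.
CONTEXT_MODELS = {
    "fuel pump": Counter("fuel pump module tank pressure relay replace".split()),
    "water pump": Counter("water pump coolant engine belt leak replace".split()),
}

def tokenize(text):
    """Lowercase and keep alphabetic runs; a crude stand-in for real preprocessing."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def disambiguate(note, candidates, models=CONTEXT_MODELS):
    """Return the candidate part whose context model best matches the note.

    For simplicity the whole note is used as context; restricting it to a
    window around the mention would be a natural refinement.
    """
    context = Counter(tokenize(note))
    return max(candidates, key=lambda part: cosine(context, models[part]))

print(disambiguate("cust states coolant leak near pump, replaced pump and belt",
                   ["fuel pump", "water pump"]))
```

In this example the words *coolant*, *leak*, and *belt* in the note overlap with the water-pump model, so the ambiguous mention of *pump* is resolved to `water pump`.
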

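Shrinkage, as used for HMM-based extraction, smooths a state's sparse emission estimates by interpolating them with estimates pooled over progressively more general groups of states, ending with a uniform distribution over the vocabulary. The sketch below assumes a two-level hierarchy (the state itself and a hypothetical parent that pools related states) with fixed mixture weights; in practice the weights are typically estimated from held-out data, for example with EM. The counts, weights, and the names `mle` and `shrunk_emission` are illustrative assumptions, not values or code from the paper.

```python
from collections import Counter

def mle(counts):
    """Maximum-likelihood word distribution from raw counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def shrunk_emission(word, levels, weights, vocab_size):
    """P(word | state) under shrinkage.

    levels:  emission counts ordered from most specific (the state itself)
             to most general.
    weights: one mixture weight per level plus a final weight for the
             uniform distribution over a vocabulary of vocab_size words.
    """
    if len(weights) != len(levels) + 1 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need len(levels) + 1 weights summing to 1")
    prob = sum(lam * mle(counts).get(word, 0.0)
               for lam, counts in zip(weights, levels))
    return prob + weights[-1] / vocab_size

# Hypothetical counts for a "part name" state and a parent pooling related states.
state = Counter({"pump": 3, "sensor": 1})
parent = Counter({"pump": 3, "sensor": 1, "valve": 4})
weights = [0.6, 0.3, 0.1]

for w in ["pump", "valve", "gasket"]:
    print(w, round(shrunk_emission(w, [state, parent], weights, vocab_size=100), 4))
```

A word the state never emitted, such as *valve*, still receives probability through the parent level (0.3 × 0.5 + 0.1/100 = 0.151), and a word unseen at every level, such as *gasket*, falls back to the uniform floor of 0.001.
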
Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval

Information & Contributors

Information

Published In

AND '09: Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data

July 2009

127 pages

ISBN:9781605584966

DOI:10.1145/1568296

Program Chairs:
Daniel Lopresti, Lehigh University
Shourya Roy, Xerox India Innovation Hub
Klaus Schulz, University of Munich
L. Venkata Subramaniam, IBM India Research Lab

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

AND '09

AND '09: Third Workshop on Analytics for Noisy Unstructured Text Data

July 23 - 24, 2009

Barcelona, Spain

Acceptance Rates

AND '09 Paper Acceptance Rate 15 of 22 submissions, 68%;

Overall Acceptance Rate 15 of 22 submissions, 68%
