Automatic document indexing in large medical collections

Published: 11 November 2006
DOI: 10.1145/1183568.1183570

Abstract

Term extraction is the task of identifying the most characteristic or important terms (words or phrases) in a document. The extracted terms are commonly used to improve the accuracy of document indexing and retrieval in large text collections. They also allow a faster and better understanding of the contents of a document collection without first browsing through its documents. This paper presents AMTEx, an automatic term extraction method designed specifically for the automatic indexing of documents in large medical collections such as MEDLINE, the premier bibliographic database of the U.S. National Library of Medicine (NLM). AMTEx combines MeSH, the terminological thesaurus resource of NLM, with a well-established method for the extraction of domain terms, the C/NC-value method. The performance of various AMTEx configurations in the indexing task is evaluated against the current state of the art, the MMTx method. Experimental results on a subset of MEDLINE documents demonstrate that AMTEx achieves better precision and recall than MMTx.
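
As a rough illustration of the C-value part of the C/NC-value method named above, the following Python sketch scores candidate multi-word terms using the C-value formula as published by Frantzi, Ananiadou, and Mima (2000). It is not the AMTEx implementation: the function names, the tuple-of-words representation, and the toy frequencies are assumptions made for this example, and the linguistic filtering, NC-value context weighting, and MeSH lookup steps are left out.

```python
import math

def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside the
    strictly longer word tuple `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))

def c_value(freq):
    """Compute C-value scores.

    freq maps each candidate term (a tuple of words) to its corpus frequency.
    For a term a that is not nested in any longer candidate:
        C-value(a) = log2(|a|) * f(a)
    Otherwise, with T_a the set of longer candidates containing a:
        C-value(a) = log2(|a|) * (f(a) - sum(f(b) for b in T_a) / |T_a|)
    Single-word candidates score 0 because log2(1) = 0; the formula is
    meant for multi-word terms.
    """
    scores = {}
    for a, f_a in freq.items():
        longer = [f_b for b, f_b in freq.items() if contains(b, a)]
        if longer:
            adjusted = f_a - sum(longer) / len(longer)
        else:
            adjusted = f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores

# Hypothetical toy frequencies, not taken from MEDLINE.
freq = {
    ("soft", "contact", "lens"): 5,
    ("contact", "lens"): 8,
}
scores = c_value(freq)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

In this toy input, "contact lens" occurs 8 times but 5 of those occurrences are inside "soft contact lens", so its score is discounted while the longer term keeps its full frequency weighted by its length.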



Published In

HIKM '06: Proceedings of the international workshop on Healthcare information and knowledge management
November 2006
66 pages
ISBN:1595935282
DOI:10.1145/1183568
Program Chairs: Li Xiong, Yuni Xia

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document indexing
  2. medical document retrieval
  3. term extraction

Qualifiers

  • Article

Conference

CIKM06: Conference on Information and Knowledge Management
November 11, 2006
Arlington, Virginia, USA

Acceptance Rates

Overall Acceptance Rate 32 of 70 submissions, 46%

Cited By

  • (2022) Keyword Extraction for Medium-Sized Documents Using Corpus-Based Contextual Semantic Smoothing. Complexity, 2022. DOI: 10.1155/2022/7015764. Online publication date: 1-Jan-2022
  • (2015) A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters. Health Information Science and Systems, 3:1. DOI: 10.1186/s13755-015-0013-y. Online publication date: 9-Dec-2015
  • (2012) Creating and Using the Knowledge Archive in the Internet Medical Consultant for Decision Support at the Point of Care. International Journal of E-Health and Medical Communications, 3:3 (72-85). DOI: 10.4018/jehmc.2012070106. Online publication date: 1-Jul-2012
  • (2009) Internet medical consultant: A knowledge-sharing system. Proceedings of the ITI 2009 31st International Conference on Information Technology Interfaces (79-86). DOI: 10.1109/ITI.2009.5196058. Online publication date: Jun-2009
  • (2009) The AMTEx approach in the medical document indexing and retrieval application. Data & Knowledge Engineering, 68:3 (380-392). DOI: 10.1016/j.datak.2008.11.002. Online publication date: 1-Mar-2009
  • (2009) A Term-Based Driven Clustering Approach for Name Disambiguation. Proceedings of the Joint International Conferences on Advances in Data and Web Management (320-331). DOI: 10.1007/978-3-642-00672-2_29. Online publication date: 22-Mar-2009
  • (2007) Report on ACM Workshop on Health Information and Knowledge Management (HIKM 2006). ACM SIGMOD Record, 36:2 (39-42). DOI: 10.1145/1328854.1328863. Online publication date: 1-Jun-2007
