Automatic document indexing in large medical collections

Authors:

Angelos Hliaoutakis,

Kalliopi Zervanou,

Euripides G.M. Petrakis,

Evangelos E. Milios

HIKM '06: Proceedings of the international workshop on Healthcare information and knowledge management

Pages 1 - 8

https://doi.org/10.1145/1183568.1183570

Published: 11 November 2006

Abstract

Term extraction is the task of identifying the most characteristic or important terms (words or phrases) in a document. These terms are commonly used to improve the accuracy of document indexing and retrieval in large text collections. They also allow a faster and better understanding of the contents of a document collection without first browsing through its documents. This paper presents AMTEx, an automatic term extraction method designed specifically for the automatic indexing of documents in large medical collections such as MEDLINE, the premier bibliographic database of the U.S. National Library of Medicine (NLM). AMTEx combines MeSH, the terminological thesaurus resource of NLM, with a well-established method for the extraction of domain terms, the C/NC-value method. The performance of various AMTEx configurations in the indexing task is evaluated against the current state of the art, the MMTx method. Experimental results on a subset of MEDLINE documents demonstrate that AMTEx achieves better precision and recall than MMTx.
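The abstract compares the two methods by precision and recall on the indexing task. As a minimal sketch of how these two measures can be computed for one document, the Python snippet below assumes exact, case-insensitive matching of extracted terms against a reference set of index terms. The function name `precision_recall` and the sample term lists are hypothetical illustrations, and the abstract does not describe the matching protocol the authors actually used.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted index terms against a reference set.

    Assumption: terms match when they are equal after trimming whitespace
    and lowercasing. This is a simplification for illustration only.
    """
    extracted_set = {t.strip().lower() for t in extracted}
    reference_set = {t.strip().lower() for t in reference}
    correct = extracted_set & reference_set

    # Guard against empty sets to avoid division by zero.
    precision = len(correct) / len(extracted_set) if extracted_set else 0.0
    recall = len(correct) / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical example data, not taken from the paper.
extracted_terms = ["knee", "osteoarthritis", "pain", "recruitment"]
reference_terms = ["Osteoarthritis", "Knee", "Pain", "Cohort Studies", "Aged"]

p, r = precision_recall(extracted_terms, reference_terms)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Here three of the four extracted terms appear in the reference set (precision 3/4) and three of the five reference terms are recovered (recall 3/5). Averaging such per-document scores over a test collection is one common way to compare two indexing methods.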

Published In

HIKM '06: Proceedings of the international workshop on Healthcare information and knowledge management

November 2006

66 pages

ISBN: 1595935282

DOI: 10.1145/1183568

Program Chairs: Li Xiong (Emory University), Yuni Xia (Indiana University - Purdue University Indianapolis)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CIKM06: Conference on Information and Knowledge Management

November 11, 2006

Arlington, Virginia, USA

Acceptance Rates

Overall Acceptance Rate 32 of 70 submissions, 46%

Cited By

Khan O, Wasi S, Siddiqui M, Karim A (2022). Keyword Extraction for Medium-Sized Documents Using Corpus-Based Contextual Semantic Smoothing. Complexity 2022. DOI: 10.1155/2022/7015764. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1155/2022/7015764

Liu W, Chung B, Wang R, Ng J, Morlet N (2015). A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters. Health Information Science and Systems 3:1. DOI: 10.1186/s13755-015-0013-y. Online publication date: 9-Dec-2015.
https://doi.org/10.1186/s13755-015-0013-y

Nakic D, Loškovska S (2012). Creating and Using the Knowledge Archive in the Internet Medical Consultant for Decision Support at the Point of Care. International Journal of E-Health and Medical Communications 3:3 (72-85). DOI: 10.4018/jehmc.2012070106. Online publication date: 1-Jul-2012.
https://dl.acm.org/doi/10.4018/jehmc.2012070106

Nakic D, Loskovska S (2009). Internet medical consultant - A knowledge-sharing system. Proceedings of the ITI 2009 31st International Conference on Information Technology Interfaces (79-86). DOI: 10.1109/ITI.2009.5196058. Online publication date: Jun-2009.
https://doi.org/10.1109/ITI.2009.5196058

Hliaoutakis A, Zervanou K, Petrakis E (2009). The AMTEx approach in the medical document indexing and retrieval application. Data & Knowledge Engineering 68:3 (380-392). DOI: 10.1016/j.datak.2008.11.002. Online publication date: 1-Mar-2009.
https://dl.acm.org/doi/10.1016/j.datak.2008.11.002

Zhu J, Zhou X, Fung G (2009). A Term-Based Driven Clustering Approach for Name Disambiguation. Proceedings of the Joint International Conferences on Advances in Data and Web Management (320-331). DOI: 10.1007/978-3-642-00672-2_29. Online publication date: 22-Mar-2009.
https://dl.acm.org/doi/10.1007/978-3-642-00672-2_29

Xiong L, Xia Y (2007). Report on ACM Workshop on Health Information and Knowledge Management (HIKM 2006). ACM SIGMOD Record 36:2 (39-42). DOI: 10.1145/1328854.1328863. Online publication date: 1-Jun-2007.
https://dl.acm.org/doi/10.1145/1328854.1328863
