Abstract
Latent Semantic Indexing (LSI) is a popular information retrieval model for concept-based searching. As with many vector space IR models, LSI requires an existing term-document association structure such as a term-by-document matrix. The term-by-document matrix, constructed during document parsing, can only capture weighted vocabulary occurrence patterns in the documents. However, for many knowledge domains there are pre-existing semantic structures that could be used to organize and categorize information. The goals of this study are (i) to demonstrate how such semantic structures can be automatically incorporated into the LSI vector space model, and (ii) to measure the effect of these structures on query matching performance. The new approach, referred to as Knowledge-Enhanced LSI, is applied to documents in the OHSUMED medical abstracts collection using the semantic structures provided by the UMLS Semantic Network and MeSH. Results based on precision-recall data (11-point average precision values) indicate that a MeSH-enhanced search index can deliver a noticeable incremental performance gain (as much as 35%) over the original LSI for modest constraints on precision. This gain is achieved by replacing the original query with the MeSH heading extracted from the query text via regular-expression matching.
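The results above are reported as 11-point average precision values. As a rough illustration of that measure, the following sketch computes the interpolated precision at the eleven recall levels 0.0, 0.1, ..., 1.0 and averages them. The function name, the input format (a ranked list of unique document ids plus a set of relevant ids), and the toy data are assumptions made for this example; the evaluation procedure actually used in the study may differ in its details.

```python
def eleven_point_average_precision(ranked_ids, relevant_ids):
    """Average of interpolated precision at recall 0.0, 0.1, ..., 1.0.

    ranked_ids   -- document ids in ranked order (assumed unique)
    relevant_ids -- ids judged relevant for the query (assumed non-empty)
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant document is required")

    # Record (recall, precision) each time a relevant document is retrieved.
    hits = 0
    points = []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    # Interpolated precision at a recall level is the highest precision
    # observed at any recall greater than or equal to that level
    # (0.0 if that recall is never reached).
    interpolated = []
    for i in range(11):
        level = i / 10
        reachable = [p for r, p in points if r >= level - 1e-12]
        interpolated.append(max(reachable, default=0.0))

    return sum(interpolated) / len(interpolated)


# Toy example (hypothetical ids): two relevant documents, at ranks 1 and 3.
score = eleven_point_average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
print(round(score, 4))
```

A relative gain such as the 35% quoted above would then be the difference between two such scores divided by the baseline score.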
Cite this article
Guo, D., Berry, M.W., Thompson, B.B. et al. Knowledge-Enhanced Latent Semantic Indexing. Information Retrieval 6, 225–250 (2003). https://doi.org/10.1023/A:1023984205118
DOI: https://doi.org/10.1023/A:1023984205118