Integration of document detection and information extraction

Authors:

Louise Guthrie,

Tomek Strzalkowski,

Wang Jin,

Fang Lin

TIPSTER '96: Proceedings of a workshop held at Vienna, Virginia: May 6-8, 1996

Pages 195 - 199

https://doi.org/10.3115/1119018.1119058

Published: 06 May 1996

Abstract

We have conducted a number of experiments to evaluate various modes of building an integrated detection/extraction system. The experiments were performed using the SMART system as a baseline. The goal was to determine whether advanced information extraction methods can improve the recall and precision of document detection. We identified the following two modes of integration:

I. Extraction to Detection: broad-coverage extraction
1. Extraction step: identify concepts for indexing
2. Detection step 1: low recall, high initial precision
3. Detection step 2: automatic relevance feedback using top N retrieved documents to regain recall

II. Detection to Extraction: query-specific extraction
1. Detection step 1: high recall, low precision run
2. Extraction step: learn concept(s) from query and retrieved subcollection
3. Detection step 2: re-rank the subcollection to increase precision

Our integration effort concentrated on mode I, and the following issues:
1. use of shallow but fast NLP for phrase extraction and disambiguation in place of a full syntactic parser
2. use of existing MUC-6 extraction capabilities to index a retrieval collection
3. a mixed Boolean/soft match retrieval model
4. creation of a Universal Spotter algorithm for learning arbitrary concepts
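The three steps of mode I (extraction, a precise first detection pass, then automatic relevance feedback) can be illustrated with a small sketch. This is not the authors' system and does not use SMART or any MUC-6 component: the toy documents are invented, adjacent-word bigrams stand in for extracted concepts, and every function name (`extract_concepts`, `expand_query`, `run_mode_one`) and parameter (`top_n`, `n_terms`, `beta`) is a hypothetical choice made for illustration.

```python
import math
from collections import Counter

# Invented toy collection, for illustration only.
DOCS = {
    "d1": "acme globex joint venture semiconductor plant",
    "d2": "initech joint venture semiconductor plant japan",
    "d3": "semiconductor plant opens in japan",
    "d4": "weather report for vienna virginia",
}

def extract_concepts(text):
    """Stand-in for the extraction step: words plus adjacent-word bigrams."""
    words = text.split()
    return words + [f"{a}_{b}" for a, b in zip(words, words[1:])]

def build_index(docs):
    """Index every document by a bag of its extracted concepts."""
    return {doc_id: Counter(extract_concepts(t)) for doc_id, t in docs.items()}

def idf_table(index):
    """Smoothed inverse document frequency for every indexed concept."""
    n = len(index)
    df = Counter(term for bag in index.values() for term in bag)
    return {term: math.log(1 + n / count) for term, count in df.items()}

def rank(query, index, idf):
    """Return ids of documents with a positive score, best first."""
    scored = []
    for doc_id, bag in index.items():
        s = sum(w * bag[t] * idf.get(t, 0.0) for t, w in query.items())
        if s > 0:
            scored.append((-s, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]

def expand_query(query, top_docs, index, n_terms=3, beta=0.5):
    """Feedback step: add the concepts most frequent in the top documents."""
    pooled = Counter()
    for doc_id in top_docs:
        pooled.update(index[doc_id])
    candidates = [(t, c) for t, c in pooled.items() if t not in query]
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))  # deterministic tie-break
    expanded = dict(query)
    for term, _ in candidates[:n_terms]:
        expanded[term] = beta
    return expanded

def run_mode_one(query, docs, top_n=2):
    """Extraction, precise first pass, then feedback from the top_n documents."""
    index = build_index(docs)
    idf = idf_table(index)
    initial = rank(query, index, idf)
    expanded = expand_query(query, initial[:top_n], index)
    return initial, rank(expanded, index, idf)

if __name__ == "__main__":
    first_pass, second_pass = run_mode_one({"joint_venture": 1.0}, DOCS)
    print("first pass: ", first_pass)
    print("second pass:", second_pass)
```

On this toy data, the phrase concept `joint_venture` matches only the two documents that contain the exact phrase, and the feedback pass adds concepts shared by those two documents, which brings in a third document that never mentions the phrase. The term-selection rule (raw pooled frequency) is a deliberately simple assumption; a weighted scheme would be a reasonable alternative.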

Published In

TIPSTER '96: Proceedings of a workshop held at Vienna, Virginia: May 6-8, 1996

May 1996

450 pages

Publisher

Association for Computational Linguistics

United States
