DOI: 10.1145/3133811.3133817

Evaluation of Full-Text Retrieval System Using Collection of Serially Evolved Documents

Published: 17 August 2017

Abstract

Finding documents similar to a given query document within a large document database is an important problem in the Big Data era, as most available data takes the form of unstructured text. Our test collection consists of two parts. In the first part, the texts were produced by hand, using an artificial-plagiarism approach applied in a linear, pipelined procedure. In the second part, the texts were generated by software that inserts, deletes, and substitutes parts of an input document to produce a similar derived document. This document set is known as Serially Evolved Documents (SED). We propose two new measures, Order Preserving Precision (OPP) and Order Preserving Recall (OPR), which quantify how well the evolutionary order is preserved among the documents returned by the IR system under test. Using this test collection, we evaluated KONAN, a document retrieval system for Korean documents.
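
The abstract does not spell out the formulas for OPP and OPR, so the sketch below is only an assumed pairwise interpretation, not the paper's actual definitions: OPP is read as the fraction of retrieved-document pairs ranked in the same order as their evolutionary order, and OPR as the fraction of all ordered pairs in the SED chain that the output preserves. The toy evolve() generator and every identifier here are hypothetical.

import random
from itertools import combinations

def evolve(tokens, n_ops=3, vocab=("alpha", "beta", "gamma"), seed=0):
    """Toy stand-in for the SED generator described in the abstract:
    randomly insert, delete, or substitute tokens to derive the next
    document in the evolutionary chain."""
    rng = random.Random(seed)
    out = list(tokens)
    for _ in range(n_ops):
        op = rng.choice(("insert", "delete", "substitute"))
        i = rng.randrange(len(out)) if out else 0
        if op == "insert":
            out.insert(i, rng.choice(vocab))
        elif op == "delete" and out:
            del out[i]
        elif out:  # substitute
            out[i] = rng.choice(vocab)
    return out

def order_preserving_precision(ranked, chain):
    """Assumed OPP: fraction of retrieved-document pairs whose ranking
    agrees with their evolutionary order in `chain`."""
    pos = {doc: i for i, doc in enumerate(chain)}
    retrieved = [d for d in ranked if d in pos]      # ignore off-chain hits
    pairs = list(combinations(retrieved, 2))         # pairs in ranked order
    if not pairs:
        return 0.0
    return sum(pos[a] < pos[b] for a, b in pairs) / len(pairs)

def order_preserving_recall(ranked, chain):
    """Assumed OPR: fraction of all ordered pairs in the chain that
    appear, in the correct order, in the ranked output."""
    rank = {doc: i for i, doc in enumerate(ranked)}
    pairs = list(combinations(chain, 2))             # pairs in true order
    if not pairs:
        return 0.0
    kept = sum(a in rank and b in rank and rank[a] < rank[b]
               for a, b in pairs)
    return kept / len(pairs)

# Derive one step of a toy chain from a seed document.
d_next = evolve("the quick brown fox jumps".split(), seed=42)

# Chain d0 -> d1 -> d2 -> d3; the system retrieves d1, d0, d3.
chain = ["d0", "d1", "d2", "d3"]
ranked = ["d1", "d0", "d3"]
print(order_preserving_precision(ranked, chain))  # 2/3: only (d1, d0) is inverted
print(order_preserving_recall(ranked, chain))     # 1/3: pairs involving d2 and the inverted (d0, d1) are lost

Under this reading, a system that returns every chain document in its evolutionary order scores OPP = 1.0, while OPR additionally penalizes chain documents missing from the output.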



Published In

ICIBE '17: Proceedings of the 3rd International Conference on Industrial and Business Engineering
August 2017
107 pages
ISBN:9781450353519
DOI:10.1145/3133811
© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • Waseda University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Document Searching
  2. Information Retrieval
  3. Performance Evaluation
  4. Text Similarity

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICIBE 2017
