DOI: 10.1145/1458082.1458236
research-article

A generative retrieval model for structured documents

Published: 26 October 2008

Abstract

Structured documents contain elements defined by the author(s) and annotations assigned by other people or processes. They pose challenges for probabilistic retrieval models when the structure specified in a query does not match the actual structure of a relevant document, or when an annotator introduces erroneous structure. This paper makes three contributions. First, a new generative retrieval model is proposed to deal with the mismatch problem; it extends the basic keyword language model by treating structure as a hidden variable during the generation process. Second, variations of the model are compared. Third, term-level and structure-level smoothing strategies are studied. Evaluation was conducted on INEX XML retrieval and question-answering (QA) retrieval tasks. Experimental results indicate that the optimal structured retrieval model is task dependent; that two-level Dirichlet smoothing significantly outperforms two-level Jelinek-Mercer smoothing; and that, with accurate structured queries, the proposed structured retrieval model significantly outperforms keyword retrieval on both the QA and INEX datasets.
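For readers unfamiliar with the two smoothing families named in the abstract, the following is a minimal sketch of standard single-level Jelinek-Mercer and Dirichlet smoothing inside a plain keyword query-likelihood scorer. It is not the paper's two-level (term-level plus structure-level) scheme and not its structured model; the function names, the toy documents, and the parameter defaults (`lam=0.5`, `mu=2000.0`) are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def jelinek_mercer(tf, doc_len, p_coll, lam=0.5):
    """P(w|D) = (1 - lam) * tf/|D| + lam * P(w|C). lam is an assumed default."""
    p_ml = tf / doc_len if doc_len else 0.0
    return (1.0 - lam) * p_ml + lam * p_coll


def dirichlet(tf, doc_len, p_coll, mu=2000.0):
    """P(w|D) = (tf + mu * P(w|C)) / (|D| + mu). mu is an assumed default."""
    return (tf + mu * p_coll) / (doc_len + mu)


def query_log_likelihood(query, doc, collection, smooth):
    """Sum of log P(w|D) over the query terms, using the given smoothing function."""
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    coll_len = len(collection)
    score = 0.0
    for w in query:
        p_coll = coll_tf[w] / coll_len
        p = smooth(doc_tf[w], len(doc), p_coll)
        if p <= 0.0:
            # Term unseen in the whole collection: the likelihood is zero.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection (hypothetical data, not from the paper's experiments).
docs = [
    ["xml", "retrieval", "model"],
    ["question", "answering", "retrieval"],
]
collection = [w for d in docs for w in d]

for name, smooth in (("jelinek-mercer", jelinek_mercer), ("dirichlet", dirichlet)):
    scores = [query_log_likelihood(["xml", "retrieval"], d, collection, smooth) for d in docs]
    print(name, scores)
```

Both functions interpolate the document's maximum-likelihood estimate with a collection model. The difference is that Jelinek-Mercer uses a fixed weight, while the Dirichlet prior makes the effective weight depend on document length, so shorter documents are smoothed more heavily.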



Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN: 9781595939913
DOI: 10.1145/1458082
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. generative model
  2. Indri query language
  3. language model
  4. question answering
  5. structured retrieval
  6. XML retrieval

Qualifiers

  • Research-article

Conference

CIKM08: Conference on Information and Knowledge Management
October 26 - 30, 2008
Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Cited By

  • (2021) Towards adaptive structured Dirichlet smoothing model for digital resource objects. Multimedia Tools and Applications. DOI: 10.1007/s11042-020-10305-w. Online publication date: 9-Jan-2021
  • (2012) A schema-driven approach for knowledge-oriented retrieval and query formulation. Proceedings of the Third International Workshop on Keyword Search on Structured Data, pp. 39-46. DOI: 10.1145/2254736.2254746. Online publication date: 20-May-2012
  • (2011) Ranking support for keyword search on structured data using relevance models. Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 1669-1678. DOI: 10.1145/2063576.2063818. Online publication date: 24-Oct-2011
  • (2010) Shopping for top forums. Proceedings of the First Workshop on Social Media Analytics, pp. 23-30. DOI: 10.1145/1964858.1964862. Online publication date: 25-Jul-2010
  • (2009) Effective and efficient structured retrieval. Proceedings of the 18th ACM conference on Information and knowledge management, pp. 1573-1576. DOI: 10.1145/1645953.1646175. Online publication date: 2-Nov-2009
  • (2009) Retrieval experiments using pseudo-desktop collections. Proceedings of the 18th ACM conference on Information and knowledge management, pp. 1297-1306. DOI: 10.1145/1645953.1646117. Online publication date: 2-Nov-2009
