research-article

Document Retrieval Using Entity-Based Language Models

Authors:

David Carmel

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval

Pages 65 - 74

https://doi.org/10.1145/2911451.2911508

Published: 07 July 2016

Abstract

We address the ad hoc document retrieval task by devising novel types of entity-based language models. The models utilize information about single terms in the query and documents as well as term sequences marked as entities by some entity-linking tool. The key principle of the language models is accounting, simultaneously, for the uncertainty inherent in the entity-markup process and the balance between using entity-based and term-based information. Empirical evaluation demonstrates the merits of using the language models for retrieval. For example, the performance transcends that of a state-of-the-art term proximity method. We also show that the language models can be effectively used for cluster-based document retrieval and query expansion.
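The abstract names two ingredients: uncertainty in the entity markup, and a balance between entity-based and term-based evidence. The sketch below is one simple way to illustrate both; it is not the paper's model. Every name in it is hypothetical, and it assumes an entity linker that returns `(entity_id, confidence)` pairs, Dirichlet smoothing, and a log-linear interpolation weight `lam`. Entity counts are weighted by linker confidence, so low-confidence markups contribute less.

```python
import math
from collections import Counter


def soft_counts(terms, entity_links):
    """Return (term counts, confidence-weighted entity counts).

    entity_links is a list of (entity_id, confidence) pairs, assumed to come
    from some entity-linking tool; confidence is in [0, 1].
    """
    term_tf = Counter(terms)
    entity_tf = Counter()
    for entity_id, confidence in entity_links:
        entity_tf[entity_id] += confidence  # uncertain markups count for less
    return term_tf, entity_tf


def dirichlet_prob(token, tf, length, collection_prob, mu):
    """Dirichlet-smoothed probability of a token under a document model."""
    background = collection_prob.get(token, 1e-9)  # small floor for unseen tokens
    return (tf.get(token, 0.0) + mu * background) / (length + mu)


def score(query, doc, coll_term_prob, coll_entity_prob, lam=0.5, mu=10.0):
    """Interpolate term-based and entity-based query log-likelihoods.

    lam = 0 uses terms only; lam = 1 uses entities only (an assumed knob).
    """
    q_terms, q_entities = soft_counts(*query)
    d_terms, d_entities = soft_counts(*doc)
    d_term_len = sum(d_terms.values())
    d_entity_len = sum(d_entities.values())

    term_ll = sum(
        weight * math.log(dirichlet_prob(t, d_terms, d_term_len, coll_term_prob, mu))
        for t, weight in q_terms.items()
    )
    entity_ll = sum(
        weight * math.log(dirichlet_prob(e, d_entities, d_entity_len, coll_entity_prob, mu))
        for e, weight in q_entities.items()
    )
    return (1.0 - lam) * term_ll + lam * entity_ll


# Toy data: both documents share the same terms but link "apple" differently.
coll_terms = {"apple": 0.01, "stock": 0.01, "price": 0.01}
coll_entities = {"Apple_Inc.": 0.001, "Apple_(fruit)": 0.001}
query = (["apple", "stock"], [("Apple_Inc.", 0.9)])
doc_company = (["apple", "stock", "price"], [("Apple_Inc.", 0.8)])
doc_fruit = (["apple", "stock", "price"], [("Apple_(fruit)", 0.8)])

score_company = score(query, doc_company, coll_terms, coll_entities)
score_fruit = score(query, doc_fruit, coll_terms, coll_entities)
```

In this toy setup the two documents are indistinguishable on terms alone, so any difference between `score_company` and `score_fruit` comes from the entity component. Setting `lam=0` reduces the score to a plain term-based query likelihood.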



Index Terms

Document Retrieval Using Entity-Based Language Models
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking


Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval

July 2016

1296 pages

ISBN:9781450340694

DOI:10.1145/2911451

General Chairs:
Raffaele Perego, ISTI-CNR, Italy
Fabrizio Sebastiani, Qatar Computing Research Institute, HBKU, Qatar

Program Chairs:
Javed Aslam, Northeastern University, US
Ian Ruthven, University of Strathclyde, UK
Justin Zobel, University of Melbourne, Australia

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Technion-Israel Institute of Technology
Yahoo!

Conference

SIGIR '16

Sponsor:

SIGIR

SIGIR '16: The 39th International ACM SIGIR conference on research and development in Information Retrieval

July 17 - 21, 2016

Pisa, Italy

Acceptance Rates

SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%
