DOI: 10.1145/2911451.2911508
research-article

Document Retrieval Using Entity-Based Language Models

Published: 07 July 2016

Abstract

We address the ad hoc document retrieval task by devising novel types of entity-based language models. The models utilize information about single terms in the query and documents as well as term sequences marked as entities by some entity-linking tool. The key principle of the language models is accounting, simultaneously, for the uncertainty inherent in the entity-markup process and the balance between using entity-based and term-based information. Empirical evaluation demonstrates the merits of using the language models for retrieval. For example, the performance transcends that of a state-of-the-art term proximity method. We also show that the language models can be effectively used for cluster-based document retrieval and query expansion.
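
The abstract does not give the estimators themselves, so the following is only a minimal sketch of the general idea, not the paper's models. It assumes, for illustration, that each entity annotation carries a confidence score from the entity-linking tool, that this confidence is used as a fractional count, that both token types use Dirichlet smoothing, and that a single hypothetical parameter `lam` balances entity-based against term-based evidence. All function names, parameters, and toy documents are invented for this example.

```python
import math
from collections import Counter


def soft_counts(annotations):
    """Sum confidence-weighted counts; plain terms carry confidence 1.0."""
    counts = Counter()
    for token, confidence in annotations:
        counts[token] += confidence
    return counts


def smoothed_prob(token, doc_counts, coll_counts, mu):
    """Dirichlet-smoothed probability of a token under a document model."""
    coll_total = sum(coll_counts.values())
    p_coll = coll_counts.get(token, 0.0) / coll_total if coll_total > 0 else 0.0
    doc_total = sum(doc_counts.values())
    return (doc_counts.get(token, 0.0) + mu * p_coll) / (doc_total + mu)


def log_likelihood(query, doc_counts, coll_counts, mu, floor=1e-12):
    """Weighted query log-likelihood; weights are query-side confidences."""
    total = 0.0
    for token, weight in query:
        p = smoothed_prob(token, doc_counts, coll_counts, mu)
        total += weight * math.log(max(p, floor))
    return total


def score(query, doc, collection, lam=0.5, mu=10.0):
    """Interpolate term-based and entity-based evidence (assumed form)."""
    term_ll = log_likelihood(query["terms"], doc["terms"], collection["terms"], mu)
    entity_ll = log_likelihood(
        query["entities"], doc["entities"], collection["entities"], mu
    )
    return (1.0 - lam) * term_ll + lam * entity_ll


def make_doc(terms, entities):
    """Build a toy document from plain terms and (entity, confidence) pairs."""
    return {
        "terms": soft_counts((t, 1.0) for t in terms),
        "entities": soft_counts(entities),
    }


# Toy data: both documents contain the term "apple", but the (hypothetical)
# entity linker resolved it to different entities with different confidence.
doc_a = make_doc(["apple", "releases", "new", "phone"], [("Apple_Inc.", 0.9)])
doc_b = make_doc(["apple", "pie", "recipe", "new"], [("Apple_pie", 0.8)])
collection = {
    "terms": doc_a["terms"] + doc_b["terms"],
    "entities": doc_a["entities"] + doc_b["entities"],
}
query = {"terms": [("apple", 1.0)], "entities": [("Apple_Inc.", 0.7)]}

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, score(query, doc, collection, lam=0.5))
```

With `lam=0.0` the two toy documents tie, because they match the query term equally; raising `lam` lets the entity evidence separate them, while the confidence weights keep a low-confidence annotation from counting as much as a certain one.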


    Published In

    SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
    July 2016
    1296 pages
    ISBN: 9781450340694
    DOI: 10.1145/2911451

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. document retrieval
    2. entity-based language models

    Qualifiers

    • Research-article

    Funding Sources

    • Technion-Israel Institute of Technology
    • Yahoo!

    Conference

    SIGIR '16

    Acceptance Rates

    SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months): 45
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 26 Sep 2024

    Cited By

    • (2024) Knowledge Graph Embedding: A Survey from the Perspective of Representation Spaces. ACM Computing Surveys 56:6 (1-42). DOI: 10.1145/3643806. Online publication date: 13-Mar-2024
    • (2024) DREQ: Document Re-ranking Using Entity-Based Query Understanding. Advances in Information Retrieval (210-229). DOI: 10.1007/978-3-031-56027-9_13. Online publication date: 24-Mar-2024
    • (2023) Entity-Based Relevance Feedback for Document Retrieval. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval (177-187). DOI: 10.1145/3578337.3605128. Online publication date: 9-Aug-2023
    • (2023) Entity Relation Aware Graph Neural Ranking for Biomedical Information Retrieval. 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (1118-1124). DOI: 10.1109/BIBM58861.2023.10385584. Online publication date: 5-Dec-2023
    • (2023) A framework for manufacturing system reconfiguration and optimisation utilising digital twins and modular artificial intelligence. Robotics and Computer-Integrated Manufacturing 82:C. DOI: 10.1016/j.rcim.2022.102524. Online publication date: 1-Aug-2023
    • (2023) SSAR-GNN. Future Generation Computer Systems 144:C (230-241). DOI: 10.1016/j.future.2023.03.003. Online publication date: 1-Jul-2023
    • (2023) A discovery system for narrative query graphs: entity-interaction-aware document retrieval. International Journal on Digital Libraries 25:1 (3-24). DOI: 10.1007/s00799-023-00356-3. Online publication date: 24-Apr-2023
    • (2022) Predicting Guiding Entities for Entity Aspect Linking. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (3848-3852). DOI: 10.1145/3511808.3557671. Online publication date: 17-Oct-2022
    • (2022) Early Stage Sparse Retrieval with Entity Linking. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (4464-4469). DOI: 10.1145/3511808.3557588. Online publication date: 17-Oct-2022
    • (2022) Query Interpretations from Entity-Linked Segmentations. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (449-457). DOI: 10.1145/3488560.3498532. Online publication date: 11-Feb-2022
