Ranking Entities for Web Queries Through Text and Knowledge

Authors: Michael Schuhmacher, Simone Paolo Ponzetto

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management

Pages 1461 - 1470

https://doi.org/10.1145/2806416.2806480

Published: 17 October 2015

Abstract

When humans explain complex topics, they naturally talk about the entities involved, such as people, locations, or events. In this paper, we aim to automate this process by retrieving and ranking entities that are relevant for understanding free-text, web-style queries like Argentine British relations. Such queries typically demand as an answer a set of heterogeneous entities with no specific target type, for instance Falklands_War or Margaret_Thatcher. Standard approaches to entity retrieval rely purely on features from the knowledge base. We approach the problem from the opposite direction, namely by analyzing web documents that are found to be query-relevant. Our approach hinges on entity linking technology that identifies entity mentions and links them to a knowledge base like Wikipedia. We use a learning-to-rank approach and study different features that use documents, entity mentions, and knowledge base entities, thus bridging document and entity retrieval. Since established benchmarks for this problem do not exist, we use TREC test collections for document ranking and collect custom relevance judgments for entities. Experiments on TREC Robust04 and TREC Web13/14 data show that: i) single entity features, like the frequency of occurrence within the top-ranked documents, or the query retrieval score against a knowledge base, perform generally well; ii) the best overall performance is achieved when combining different features that relate an entity to the query, its document mentions, and its knowledge base representation.
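
To make the two single-entity features named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the top-ranked documents have already been entity-linked (each document is a list of knowledge base entity IDs) and that a knowledge base retrieval score per entity is available. The function names, the toy data, and the fixed linear weights are all hypothetical; in the paper the combination is learned with a learning-to-rank model rather than set by hand.

```python
from collections import Counter


def mention_frequency(linked_docs):
    """Count entity mentions across the (already entity-linked) top-ranked documents."""
    counts = Counter()
    for entities in linked_docs:
        counts.update(entities)
    return counts


def rank_entities(linked_docs, kb_scores, weights=(1.0, 1.0)):
    """Rank entities by a weighted sum of two features.

    Feature 1: relative mention frequency in the top-ranked documents.
    Feature 2: query retrieval score against the knowledge base.
    The weights are hand-set placeholders (an assumption); a learning-to-rank
    model would normally estimate them from relevance judgments.
    """
    freq = mention_frequency(linked_docs)
    total = sum(freq.values()) or 1  # avoid division by zero on empty input
    w_freq, w_kb = weights
    candidates = set(freq) | set(kb_scores)
    scored = {
        e: w_freq * freq[e] / total + w_kb * kb_scores.get(e, 0.0)
        for e in candidates
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda e: (-scored[e], e))


# Toy, made-up data for the query "Argentine British relations".
top_docs = [
    ["Falklands_War", "Margaret_Thatcher", "Argentina"],
    ["Falklands_War", "Argentina"],
    ["Falklands_War", "United_Kingdom"],
]
kb_scores = {"Argentina": 0.4, "United_Kingdom": 0.4, "Falklands_War": 0.2}

ranking = rank_entities(top_docs, kb_scores)
print(ranking)
```

With equal weights, an entity that is both frequently mentioned and well matched in the knowledge base rises to the top, while an entity seen only once in the documents and absent from the knowledge base results falls to the bottom.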

Index Terms

Ranking Entities for Web Queries Through Text and Knowledge
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Semantic networks
2. Information systems
  1. Information retrieval

Published In

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management

October 2015

1998 pages

ISBN:9781450337946

DOI:10.1145/2806416

General Chairs: James Bailey (The University of Melbourne), Alistair Moffat (The University of Melbourne)

Program Chairs: Charu C. Aggarwal (IBM), Maarten de Rijke (University of Amsterdam), Ravi Kumar (Google), Vanessa Murdock (Microsoft), Timos Sellis (RMIT University), Jeffrey Xu Yu (Chinese University of Hong Kong)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Deutsche Forschungsgemeinschaft
Amazon
MWK Baden-Württemberg

Conference

CIKM'15

CIKM'15: 24th ACM International Conference on Information and Knowledge Management

October 18 - 23, 2015

Melbourne, Australia

Acceptance Rates

CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 37
Total Downloads: 574
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 1

Reflects downloads up to 16 Dec 2024

Cited By

Chatterjee S, Mackie I, Dalton J (2024). DREQ: Document Re-ranking Using Entity-Based Query Understanding. Advances in Information Retrieval (210-229). DOI: 10.1007/978-3-031-56027-9_13. Online publication date: 24-Mar-2024
https://dl.acm.org/doi/10.1007/978-3-031-56027-9_13
Sidi M, Gunal S (2023). A Purely Entity-Based Semantic Search Approach for Document Retrieval. Applied Sciences 13:18 (10285). DOI: 10.3390/app131810285. Online publication date: 14-Sep-2023
https://doi.org/10.3390/app131810285
Rogushina J (2023). A three-dimensional model of semantic search: queries, resources, and results. PROBLEMS IN PROGRAMMING (39-55). DOI: 10.15407/pp2023.04.039. Online publication date: Dec-2023
https://doi.org/10.15407/pp2023.04.039
Guo M, Zhou Z, Gotz D, Wang Y (2023). GRAFS: Graphical Faceted Search System to Support Conceptual Understanding in Exploratory Search. ACM Transactions on Interactive Intelligent Systems 13:2 (1-36). DOI: 10.1145/3588319. Online publication date: 31-Mar-2023
https://dl.acm.org/doi/10.1145/3588319
Oza P, Dietz L (2023). Entity Embeddings for Entity Ranking: A Replicability Study. Advances in Information Retrieval (117-131). DOI: 10.1007/978-3-031-28241-6_8. Online publication date: 2-Apr-2023
https://dl.acm.org/doi/10.1007/978-3-031-28241-6_8
Chatterjee S, Dietz L, Al Hasan M, Xiong L (2022). Predicting Guiding Entities for Entity Aspect Linking. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (3848-3852). DOI: 10.1145/3511808.3557671. Online publication date: 17-Oct-2022
https://dl.acm.org/doi/10.1145/3511808.3557671
Chatterjee S, Dietz L, Amigo E, Castells P, Gonzalo J, Carterette B, Culpepper J, Kazai G (2022). BERT-ER. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (1466-1477). DOI: 10.1145/3477495.3531944. Online publication date: 6-Jul-2022
https://dl.acm.org/doi/10.1145/3477495.3531944
Sansone C, Sperlí G (2022). Legal Information Retrieval systems. Information Systems 106:C. DOI: 10.1016/j.is.2021.101967. Online publication date: 12-May-2022
https://dl.acm.org/doi/10.1016/j.is.2021.101967
Chatterjee S (2022). An Entity-Oriented Approach for Answering Topical Information Needs. Advances in Information Retrieval (463-472). DOI: 10.1007/978-3-030-99739-7_57. Online publication date: 10-Apr-2022
https://dl.acm.org/doi/10.1007/978-3-030-99739-7_57
Mahjoob M, Ensan F, Keshvari S, Jafarzadeh P, keyvanzad M (2021). Extraction of Effective Textual and Semantic Features in Learning to Rank for Web Document Retrieval. Iranian Journal of Information Processing and Management 36:4 (1081-1112). DOI: 10.52547/jipm.36.4.1081. Online publication date: 1-Jul-2021
https://doi.org/10.52547/jipm.36.4.1081
