research-article

Entity ranking and relationship queries using an extended graph model

Authors:

Adil Anis Sandalwala,

Prashant Jaiswal

COMAD '12: Proceedings of the 18th International Conference on Management of Data

Pages 80 - 91

Published: 14 December 2012

Abstract

There is a large amount of textual data on the Web and in Wikipedia, where mentions of entities (such as Gandhi) are annotated with a link to the disambiguated entity (such as M. K. Gandhi). Such annotation may have been done manually (as in Wikipedia) or can be done using named entity recognition/disambiguation techniques. Such an annotated corpus allows queries to return entities, instead of documents. Entity ranking queries retrieve entities that are related to keywords in the query and belong to a given type/category specified in the query; entity ranking has been an active area of research in the past few years. More recently, there have been extensions to allow entity-relationship queries, which allow specification of multiple sets of entities as well as relationships between them.

In this paper we address the problem of entity ranking ("near") queries and entity-relationship queries on the Wikipedia corpus. We first present an extended graph model that combines the power of graph models used earlier for structured/semi-structured data with information from textual data. Based on this model, we show how to specify entity and entity-relationship queries, and define scoring methods for ranking answers. Finally, we provide efficient algorithms for answering such queries, exploiting a space-efficient in-memory graph structure. A performance comparison with the ERQ system proposed earlier shows significant improvement in answer quality for most queries, while also handling a much larger set of entity types.
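To make the idea of an entity ranking ("near") query concrete, the following is a minimal illustrative sketch, not the paper's graph model, scoring method, or algorithm, none of which are specified in the abstract. It assumes a toy annotated corpus in which each document is a token list and entity mentions are tagged tuples. The names `DOCS`, `ENTITY_TYPES`, and `near_query`, the `window` parameter, and the `1 / (1 + distance)` proximity score are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: a plain word is a string, an annotated entity
# mention is a tuple ("ENT", entity_id) pointing at the disambiguated entity.
DOCS = [
    [("ENT", "M_K_Gandhi"), "returned", "to", "india"],
    [("ENT", "Jawaharlal_Nehru"), "was", "prime", "minister", "of", "india"],
    [("ENT", "Taj_Mahal"), "is", "a", "monument", "in", "india"],
]

# Hypothetical type/category assignment for each entity.
ENTITY_TYPES = {
    "M_K_Gandhi": {"person"},
    "Jawaharlal_Nehru": {"person"},
    "Taj_Mahal": {"building"},
}


def near_query(docs, entity_types, wanted_type, keywords, window=5):
    """Rank entities of `wanted_type` by proximity to the query keywords.

    Assumed scoring (not from the paper): each (mention, keyword occurrence)
    pair within `window` tokens contributes 1 / (1 + token distance).
    """
    keywords = {k.lower() for k in keywords}
    scores = defaultdict(float)
    for tokens in docs:
        # Positions of plain-word tokens that match a query keyword.
        kw_positions = [
            i for i, tok in enumerate(tokens)
            if isinstance(tok, str) and tok.lower() in keywords
        ]
        for i, tok in enumerate(tokens):
            is_mention = isinstance(tok, tuple) and tok[0] == "ENT"
            if not is_mention:
                continue
            entity = tok[1]
            if wanted_type not in entity_types.get(entity, ()):
                continue
            for j in kw_positions:
                distance = abs(i - j)
                if distance <= window:
                    scores[entity] += 1.0 / (1 + distance)
    # Highest score first; ties broken by entity id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for entity, score in near_query(DOCS, ENTITY_TYPES, "person", ["india"]):
        print(entity, round(score, 3))
```

In this sketch the query returns entities rather than documents: only mentions whose entity carries the requested type are scored, and mentions closer to a keyword score higher. An entity-relationship query would additionally constrain pairs of such entity sets, which this sketch does not attempt.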




Published In

COMAD '12: Proceedings of the 18th International Conference on Management of Data

December 2012

101 pages

General Chair: Chandrashekhar Sahasrabudhe (Persistent Systems, Pune)
Program Chairs: Amr El Abbadi (University of California, Santa Barbara), Karin Murthy (IBM Research - Bangalore)

Sponsors

IIIT: International Institute of Information Technology
Infosys
SAP
Persistent Systems
Aerospike
Yahoo! India Research & Development
IBM

In-Cooperation

ACM: Association for Computing Machinery
ACM India

Publisher

Computer Society of India, Mumbai, Maharashtra, India
