Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3448016.3457309acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Proportionality in Spatial Keyword Search

Published: 18 June 2021 Publication History

Abstract

More often than not, spatial objects are associated with some context, in the form of text, descriptive tags (e.g. points of interest, flickr photos), or linked entities in semantic graphs (e.g. Yago2, DBpedia). Hence, location-based retrieval should be extended to consider not only the locations but also the context of the objects, especially when the retrieved objects are too many and the query result is overwhelming. In this paper, we study the problem of selecting a subset of the query result, which is the most representative. We argue that objects with similar context and nearby locations should proportionally be represented in the selection. Proportionality dictates the pairwise comparison of all retrieved objects and hence bears a high cost. We propose novel algorithms which greatly reduce the cost of proportional object selection in practice. Extensive empirical studies on real datasets show that our algorithms are effective and efficient. A user evaluation verifies that proportional selection is more preferable than random selection and selection based on object diversification.

Supplementary Material

MP4 File (3448016.3457309.mp4)
More often than not, spatial objects are associated with some context, in the form of text, descriptive tags (e.g. points of interest, flickr photos), or linked entities in semantic graphs (e.g. YAGO2, DBpedia). Hence, location-based retrieval should be extended to consider not only the locations but also the context of the objects,especially when the retrieved objects are too many and the query result is overwhelming. In this paper, we study the problem of selecting a subset of the query result, which is the most representative. We argue that objects with similar context and nearby locations should proportionally be represented in the selection. Proportionality dictates the pairwise comparison of all retrieved objects andhence bears a high cost. We propose novel algorithms which greatlyreduce the cost of proportional object selection in practice. Extensive empirical studies on real datasets show that our algorithms are effective and efficient. A user evaluation verifies that proportional selection is more preferable than random selection and selection based on object diversification.

References

[1]
Pritom Ahmed, Mahbub Hasan, Abhijith Kashyap, Vagelis Hristidis, and Vassilis J. Tsotras. 2017. Efficient Computation of Top-k Frequent Terms over Spatio-temporal Ranges. In SIGMOD. 1227--1241.
[2]
Ritesh Ahuja, Nikos Armenatzoglou, Dimitris Papadias, and George J. Fakas. 2015. Geo-Social Keyword Search. In SSTD. 431--450.
[3]
Avi Arampatzis and Georgios Kalamatianos. 2018. Suggesting Points-of-Interest via Content-Based, Collaborative, and Hybrid Fusion Methods in Mobile Devices. TOIS, Vol. 36, 3 (2018), 23:1--23:28.
[4]
Roberto J. Bayardo, Yiming Ma, and Ramakrishnan Srikant. 2007. Scaling up all pairs similarity search. In WWW. 131--140.
[5]
Zhi Cai, Georgios Kalamatianos, Georgios J. Fakas, Nikos Mamoulis, and Dimitris Papadias. 2020. Diversified spatial keyword search on RDF data. VLDB J., Vol. 29, 5 (2020), 1171--1189.
[6]
Lisi Chen, Shuo Shang, Chengcheng Yang, and Jing Li. 2020. Spatial keyword search: a survey. GeoInformatica, Vol. 24, 1 (2020), 85--106.
[7]
Shiwen Cheng, Anastasios Arvanitis, Marek Chrobak, and Vagelis Hristidis. 2014. Multi-Query Diversification in Microblogging Posts. In EDBT. 133--144.
[8]
Shiwen Cheng, Marek Chrobak, and Vagelis Hristidis. 2016. Slowing the Firehose: Multi-Dimensional Diversity on Social Post Streams. In EDBT. 17--28.
[9]
Gao Cong, Christian S. Jensen, and Dingming Wu. 2009. Efficient Retrieval of the Top-k Most Relevant Spatial Web Objects. PVLDB, Vol. 2, 1 (2009), 337--348.
[10]
Van Dang and W. Bruce Croft. 2012a. Diversity by proportionality: an election-based approach to search result diversification. In SIGIR. 65--74.
[11]
Van Dang and W. Bruce Croft. 2012b. Diversity by Proportionality: An Election-based Approach to Search Result Diversification. In SIGIR. 65--74.
[12]
Elena Demidova, Peter Fankhauser, Xuan Zhou, and Wolfgang Nejdl. 2010. DivQ: diversification for keyword search over structured databases. In SIGIR. 331--338.
[13]
Georgios J. Fakas. 2008. Automated generation of object summaries from relational databases: A novel keyword searching paradigm. In ICDE Workshops. IEEE Computer Society, 564--567.
[14]
Georgios J. Fakas. 2011. A Novel Keyword Search Paradigm in Relational Databases: Object Summaries. DKE, Vol. 70, 2 (2011), 208 -- 229.
[15]
Georgios J. Fakas, Yilun Cai, Zhi Cai, and Nikos Mamoulis. 2018. Thematic ranking of object summaries for keyword search. DKE, Vol. 113 (2018), 1--17.
[16]
Georgios J. Fakas and Zhi Cai. 2009. Ranking of Object Summaries. In ICDE. 1580--1583.
[17]
Georgios J. Fakas, Zhi Cai, and Nikos Mamoulis. 2011a. Size-l Object Summaries for Relational Keyword Search. PVLDB, Vol. 5, 3 (2011), 229--240.
[18]
Georgios J. Fakas, Zhi Cai, and Nikos Mamoulis. 2014. Versatile Size-l Object Summaries for Relational Keyword Search. TKDE, Vol. 26, 4 (2014), 1026 -- 1038.
[19]
Georgios J. Fakas, Zhi Cai, and Nikos Mamoulis. 2015. Diverse and Proportional Size-l Object Summaries for Keyword Search. In SIGMOD. 363--375.
[20]
Georgios J. Fakas, Zhi Cai, and Nikos Mamoulis. 2016. Diverse and proportional size-l object summaries using pairwise relevance. VLDB J., Vol. 25, 6 (2016), 791 -- 816.
[21]
Georgios J. Fakas, Ben Cawley, and Zhi Cai. 2011b. Automated Generation of Personal Data Reports from Relational Databases. JIKM, Vol. 10, 2 (2011), 193--208.
[22]
Marios Hadjieleftheriou, George Kollios, Vassilis J. Tsotras, and Dimitrios Gunopulos. 2006. Indexing spatiotemporal archives. VLDB J., Vol. 15, 2 (2006), 143--164.
[23]
Jungkyu Han and Hayato Yamana. 2017. Geographical Diversification in POI Recommendation: Toward Improved Coverage on Interested Areas. In RecSys. ACM, 224--228.
[24]
Jayant R. Haritsa. 2009. The KNDN Problem: A Quest for Unity in Diversity. IEEE Data Eng. Bull., Vol. 32, 4 (2009), 15--22.
[25]
Mahbub Hasan, Abhijith Kashyap, Vagelis Hristidis, and Vassilis J. Tsotras. 2014. User effort minimization through adaptive diversification. In KDD. 203--212.
[26]
Refael Hassin, Shlomi Rubinstein, and Arie Tamir. 1997. Approximation algorithms for maximum dispersion. Operations Research Letters, Vol. 21, 3 (1997).
[27]
Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artif. Intell., Vol. 194 (2013), 28--61.
[28]
Vagelis Hristidis, Luis Gravano, and Yannis Papakonstantinou. 2003. Efficient IR-Style Keyword Search over Relational Databases. In VLDB. 850--861.
[29]
Vagelis Hristidis and Yannis Papakonstantinou. 2002. Discover: Keyword Search in Relational Databases. In VLDB. 670--681.
[30]
Anoop Jain, Parag Sarda, and Jayant R Haritsa. 2004. Providing diversity in k-nearest neighbor query results. In PAKDD. 404--413.
[31]
Mehdi Kargar, Aijun An, and Xiaohui Yu. 2014. Efficient Duplication Free and Minimal Keyword Search in Graphs. TKDE, Vol. 26, 7 (2014), 1657 -- 1669.
[32]
George Kollios, Vassilis J. Tsotras, and Dimitrios Gunopulos. 2017. Mobile Object Indexing. In Encyclopedia of GIS. 1256--1266.
[33]
Wangchao Le, Feifei Li, Anastasios Kementsietsidis, and Songyun Duan. 2014. Scalable Keyword Search on Large RDF Data. TKDE, Vol. 26, 11 (2014), 2774--2788.
[34]
Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets 2 ed.). Cambridge University Press.
[35]
Michael Levandowsky and David Winter. 1971. Distance between sets. Nature, Vol. 234, 5323 (1971), 34--35.
[36]
S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. 1991. Facility dispersion problems: Heuristics and special cases. 431--450.
[37]
Dimitris Sacharidis, Paras Mehta, Dimitrios Skoutas, Kostas Patroumpas, and Agnè s Voisard. 2018. Selecting representative and diverse spatio-textual posts over sliding windows. In SSDBM. 17:1--17:12.
[38]
Jieming Shi, Dingming Wu, and Nikos Mamoulis. 2016. Top-k Relevant Semantic Place Retrieval on Spatial RDF Data. In SIGMOD. 1977--1990.
[39]
A. B. Siddique, Ahmed Eldawy, and Vagelis Hristidis. 2019. Euler+: Improved Selectivity Estimation for Rectangular Spatial Records. In BigData. 4129--4133.
[40]
Souvik Brata Sinha, Xinge Lu, and Dimitri Theodoratos. 2018. Personalized Keyword Search on Large RDF Graphs based on Pattern Graph Similarity. In IDEAS. 12--21.
[41]
Kostas Stefanidis, Marina Drosou, and Evaggelia Pitoura. 2010. PerK: personalized keyword search in relational databases through preferences. In EDBT. 585--596.
[42]
Julia Stoyanovich, Ke Yang, and H. V. Jagadish. 2018. Online Set Selection with Fairness and Diversity Constraints. In EDBT. 241--252.
[43]
Jiayu Tang and Mark Sanderson. 2010. Evaluation and User Preference Study on Spatial Diversity. In ECIR, Vol. 5993. 179--190.
[44]
Marc Van Kreveld, Iris Reinbacher, Avi Arampatzis, and Roelof Van Zwol. 2005. Multi-dimensional scattered ranking methods for geographic information retrieval. GeoInformatica, Vol. 9, 1 (2005), 61--84.
[45]
Marcos R. Vieira, Humberto L. Razente, Maria C. N. Barioni, Marios Hadjieleftheriou, Divesh Srivastava, Caetano Traina, and Vassilis J. Tsotras. 2011. On Query Result Diversification. In ICDE. 1163--1174.
[46]
Lin Wu, Yang Wang, John Shepherd, and Xiang Zhao. 2013. An Optimization Method for Proportionally Diversifying Search Results. In PAKDD (1), Vol. 7818. 390--401.
[47]
Xiaolu Xing, Chaofeng Sha, and Junyu Niu. 2017. Improving Topic Diversity in Recommendation Lists: Marginally or Proportionally?. In APWeb/WAIM (2), Vol. 10367. 142--150.
[48]
Chengyuan Zhang, Ying Zhang, Wenjie Zhang, Xuemin Lin, Muhammad Aamir Cheema, and Xiaoyang Wang. 2014. Diversified Spatial Keyword Search On Road Networks. In EDBT. 367--378.

Cited By

View all
  • (2024)Spatio-Temporal Keyword Query Processing Based on Key-Value StoresData Science and Engineering10.1007/s41019-024-00265-8Online publication date: 5-Dec-2024
  • (2023)Proportionality on Spatial Data with ContextACM Transactions on Database Systems10.1145/358843448:2(1-37)Online publication date: 13-May-2023
  • (2023)Approximate Reverse Top-k Spatial-Keyword Queries2023 24th IEEE International Conference on Mobile Data Management (MDM)10.1109/MDM58254.2023.00026(96-105)Online publication date: Jul-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Ptolemy's spatial diversity
  2. diversity
  3. keyword search
  4. proportionality
  5. ranking
  6. spatial data

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)21
  • Downloads (Last 6 weeks)1
Reflects downloads up to 13 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Spatio-Temporal Keyword Query Processing Based on Key-Value StoresData Science and Engineering10.1007/s41019-024-00265-8Online publication date: 5-Dec-2024
  • (2023)Proportionality on Spatial Data with ContextACM Transactions on Database Systems10.1145/358843448:2(1-37)Online publication date: 13-May-2023
  • (2023)Approximate Reverse Top-k Spatial-Keyword Queries2023 24th IEEE International Conference on Mobile Data Management (MDM)10.1109/MDM58254.2023.00026(96-105)Online publication date: Jul-2023
  • (2023)Efficient m-closest entity matching over heterogeneous information networksKnowledge-Based Systems10.1016/j.knosys.2023.110299263:COnline publication date: 5-Mar-2023
  • (2022)Exploiting Pareto distribution for user modeling in location-based information retrievalExpert Systems with Applications: An International Journal10.1016/j.eswa.2021.116275192:COnline publication date: 15-Apr-2022
  • (2022)Continuous spatial keyword query processing over geo-textual data streamsWorld Wide Web10.1007/s11280-022-01062-x26:3(889-903)Online publication date: 11-May-2022

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media