DOI: 10.1145/3221269.3221290
research-article

Selecting representative and diverse spatio-textual posts over sliding windows

Published: 09 July 2018

Abstract

Thousands of posts are generated constantly by millions of users in social media, with an increasing portion of this content being geotagged. Keeping track of the whole stream of this spatio-textual content can easily become overwhelming for the user. In this paper, we address the problem of selecting a small, representative and diversified subset of posts, which is continuously updated over a sliding window. Each such subset can be considered as a concise summary of the stream's contents within the respective time interval, being dynamically updated every time the window slides to reflect newly arrived and expired posts. We define the criteria for selecting the contents of each summary, and we present several alternative strategies for summary construction and maintenance that provide different trade-offs between information quality and performance. Furthermore, we optimize the performance of our methods by partitioning the newly arriving posts spatio-textually and computing bounds for the coverage and diversity of the posts in each partition. The proposed methods are evaluated experimentally using real-world datasets containing geotagged tweets and photos.
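To make the selection problem concrete, the following is a minimal sketch of one possible greedy strategy, not the paper's actual algorithms, criteria, or bounds. Everything in it is an assumption made for illustration: the `Post` fields, the blended spatio-textual `distance` (normalized Euclidean distance mixed with Jaccard distance via a hypothetical weight `alpha`), the coverage and diversity scores, the trade-off weight `lam`, and the naive `slide` helper that rebuilds the window.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """A geotagged post; all field names here are illustrative."""
    pid: int
    x: float
    y: float
    keywords: frozenset
    ts: int  # arrival time in arbitrary ticks


def distance(a, b, max_dist, alpha=0.5):
    """Blend of normalized Euclidean and Jaccard distance, in [0, 1]."""
    spatial = min(math.hypot(a.x - b.x, a.y - b.y) / max_dist, 1.0)
    union = a.keywords | b.keywords
    textual = 1.0 - len(a.keywords & b.keywords) / len(union) if union else 0.0
    return alpha * spatial + (1.0 - alpha) * textual


def summarize(window, k, max_dist, lam=0.5):
    """Greedily pick up to k posts, trading off coverage against diversity."""
    selected = []
    candidates = list(window)
    while candidates and len(selected) < k:
        def gain(p):
            # Coverage: average similarity of p to every post in the window.
            cov = sum(1.0 - distance(p, q, max_dist) for q in window) / len(window)
            # Diversity: distance from p to its closest already-selected post.
            div = min((distance(p, s, max_dist) for s in selected), default=1.0)
            return lam * cov + (1.0 - lam) * div
        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


def slide(window, arrivals, now, length):
    """Drop posts older than the window length, then append new arrivals."""
    return [p for p in window if p.ts > now - length] + list(arrivals)


# Usage: two spatio-textual clusters; a summary of size 2 takes one post from each.
posts = [
    Post(1, 0.0, 0.0, frozenset({"rain"}), 1),
    Post(2, 0.0, 0.0, frozenset({"rain"}), 2),
    Post(3, 0.0, 0.0, frozenset({"rain"}), 3),
    Post(4, 10.0, 10.0, frozenset({"music"}), 4),
    Post(5, 10.0, 10.0, frozenset({"music"}), 5),
]
summary = summarize(posts, k=2, max_dist=15.0)
```

This sketch recomputes the summary from scratch and scores every candidate against the whole window, which is quadratic in the window size. The partitioning and bounding described in the abstract are aimed at avoiding this kind of exhaustive evaluation.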


Cited By

  • (2023) Proportionality on Spatial Data with Context. ACM Transactions on Database Systems 48:2 (1-37). DOI: 10.1145/3588434. Online publication date: 13-May-2023


Published In

SSDBM '18: Proceedings of the 30th International Conference on Scientific and Statistical Database Management
July 2018
314 pages
ISBN:9781450365055
DOI:10.1145/3221269
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. continuous diversification
  2. spatio-textual streams
  3. summarization

Qualifiers

  • Research-article

Conference

SSDBM '18

Acceptance Rates

SSDBM '18 paper acceptance rate: 30 of 75 submissions (40%)
Overall acceptance rate: 56 of 146 submissions (38%)
