Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3274895.3274932acmconferencesArticle/Chapter ViewAbstractPublication PagesgisConference Proceedingsconference-collections
research-article

Adaptive processing of spatial-keyword data over a distributed streaming cluster

Published: 06 November 2018 Publication History

Abstract

The widespread use of GPS-enabled smartphones along with the popularity of micro-blogging and social networking applications, e.g., Twitter and Facebook, has resulted in the generation of huge streams of geo-tagged textual data. Many applications require real-time processing of these streams. For example, location-based ad-targeting systems enable advertisers to register millions of ads to millions of users based on the users' location and textual profile. Existing streaming systems are either centralized or are not spatial-keyword aware, and hence these systems cannot efficiently support the processing of rapidly arriving spatial-keyword data streams. In this paper, we introduce a two-layered indexing scheme for the distributed processing of spatial-keyword data streams. We realize this indexing scheme in Tornado, a distributed spatial-keyword streaming system. The first layer, termed the routing layer, is used to fairly distribute the workload, and furthermore, co-locate the data objects and the corresponding queries at the same processing units. The routing layer uses the Augmented-Grid, a novel structure that is equipped with an efficient search algorithm for distributing the data objects and queries. The second layer, termed the evaluation layer, resides within each processing unit to reduce the processing overhead. The two-layered index adapts to changes in the workload by applying a cost formula that continuously represents the processing overhead at each processing unit. Extensive experimental evaluation using real Twitter data indicates that Tornado achieves high scalability and more than 2x improvement over the baseline approach in terms of the overall system throughput.

References

[1]
2018. Hadoop. http://hadoop.apache.org/.
[2]
2018. Internet live stats. https://internetlivestats.com/.
[3]
2018. Keyword search statistics. http://www.keyworddiscovery.com/keyword-stats.html.
[4]
2018. The size of Hadoop clusters in Yahoo! https://wiki.apache.org/hadoop/PoweredBy.
[5]
Ahmed M. Aly Ahmed R. Mahmood and Walid G. Aref. 2018. FAST: Frequency-Aware Indexing for Spatio-Textual Data Streams. In ICDE.
[6]
Ahmed M Aly, Ahmed R Mahmood, Mohamed S Hassan, Walid G Aref, Mourad Ouzzani, Hazem Elmeleegy, and Thamir Qadah. 2015. AQWA: adaptive query workload aware partitioning of big spatial data. PVLDB 8, 13 (2015), 2062--2073.
[7]
Walid G Aref and Hanan Samet. 1990. Efficient processing of window queries in the pyramid data structure. In PODS. ACM, 265--272.
[8]
Lisi Chen, Gao Cong, Christian S Jensen, and Dingming Wu. 2013. Spatial keyword query processing: An experimental evaluation. In VLDB, Vol. 6. 217--228.
[9]
Zhida Chen, Gao Cong, Zhenjie Zhang, Tom ZJ Fuz, and Lisi Chen. 2017. Distributed publish/subscribe query processing on the spatio-textual data stream. In ICDE. IEEE, 1095--1106.
[10]
Michelangelo Grigni and Fredrik Manne. 1996. On the complexity of the generalized block distribution. In Parallel Algorithms for Irregularly Structured Problems. Springer, 319--326.
[11]
Antonin Guttman. 1984. R-trees: a dynamic index structure for spatial searching. Vol. 14. ACM.
[12]
Zeinab Hmedeh, Harris Kourdounakis, Vassilis Christophides, Cédric Du Mouza, Michel Scholl, and Nicolas Travers. 2012. Subscription indexes for web syndication systems. In EDBT. ACM, 312--323.
[13]
Ahmed R Mahmood, Ahmed M Aly, Thamir Qadah, El Kindi Rezig, Anas Daghistani, Amgad Madkour, Ahmed S Abdelhamid, Mohamed S Hassan, Walid G Aref, and Saleh Basalamah. 2015. Tornado: A distributed spatio-textual stream processing system. PVLDB 8, 12 (2015), 2020--2023.
[14]
Beng Chin Ooi, Ken J McDonell, and Ron Sacks-Davis. 1987. Spatial kd-tree: An indexing mechanism for spatial databases. In IEEE COMPSAC, Vol. 87. 85.
[15]
Hanan Samet. 1990. The design and analysis of spatial data structures. Vol. 85. Addison-Wesley Reading, MA.
[16]
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, et al. 2014. Storm@twitter. In SIGMOD. ACM, 147--156.
[17]
Subodh Vaid, Christopher B Jones, Hideo Joho, and Mark Sanderson. 2005. Spatio-textual indexing for geographical search on the web. In Advances in Spatial and Temporal Databases. 218--235.
[18]
Xiang Wang, Ying Zhang, Wenjie Zhang, Xuemin Lin, and Wei Wang. 2015. Ap-tree: Efficiently support continuous spatial-keyword queries over stream. In ICDE. 1107--1118.
[19]
Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets., 10--10 pages.
[20]
Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized streams: Fault-tolerant streaming computation at scale. In ACM Symposium on Operating Systems Principles. ACM, 423--438.
[21]
Yu Zhang, Youzhong Ma, and Xiaofeng Meng. 2014. Efficient Spatio-textual Similarity Join Using MapReduce. In IAT, Vol. 1. 52--59.
[22]
Justin Zobel and Alistair Moffat. 2006. Inverted files for text search engines. ACM computing surveys 38, 2 (2006), 6.

Cited By

View all
  • (2024)Mobility Data Science: Perspectives and ChallengesACM Transactions on Spatial Algorithms and Systems10.1145/365215810:2(1-35)Online publication date: 1-Jul-2024
  • (2024)A Distributed and Scalable Framework for Low-Latency Continuous Trajectory Stream ProcessingIEEE Access10.1109/ACCESS.2024.348443312(159426-159444)Online publication date: 2024
  • (2023)STAR: A Cache-based Stream Warehouse System for Spatial DataACM Transactions on Spatial Algorithms and Systems10.1145/36059449:4(1-27)Online publication date: 27-Jun-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGSPATIAL '18: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 2018
655 pages
ISBN:9781450358897
DOI:10.1145/3274895
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 November 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. distributed streaming
  2. spatial-keyword processing

Qualifiers

  • Research-article

Conference

SIGSPATIAL '18
Sponsor:

Acceptance Rates

SIGSPATIAL '18 Paper Acceptance Rate 30 of 150 submissions, 20%;
Overall Acceptance Rate 257 of 1,238 submissions, 21%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)11
  • Downloads (Last 6 weeks)2
Reflects downloads up to 21 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Mobility Data Science: Perspectives and ChallengesACM Transactions on Spatial Algorithms and Systems10.1145/365215810:2(1-35)Online publication date: 1-Jul-2024
  • (2024)A Distributed and Scalable Framework for Low-Latency Continuous Trajectory Stream ProcessingIEEE Access10.1109/ACCESS.2024.348443312(159426-159444)Online publication date: 2024
  • (2023)STAR: A Cache-based Stream Warehouse System for Spatial DataACM Transactions on Spatial Algorithms and Systems10.1145/36059449:4(1-27)Online publication date: 27-Jun-2023
  • (2023)Approximate Reverse Top-k Spatial-Keyword Queries2023 24th IEEE International Conference on Mobile Data Management (MDM)10.1109/MDM58254.2023.00026(96-105)Online publication date: Jul-2023
  • (2022)Example-based spatial pattern matchingProceedings of the VLDB Endowment10.14778/3551793.355181515:11(2572-2584)Online publication date: 1-Jul-2022
  • (2022)Guard: Attack-Resilient Adaptive Load Balancing in Distributed Streaming SystemsIEEE Transactions on Dependable and Secure Computing10.1109/TDSC.2021.312307119:6(4172-4186)Online publication date: 1-Nov-2022
  • (2022)GeoFlink: An Efficient and Scalable Spatial Data Stream Management SystemIEEE Access10.1109/ACCESS.2022.315406310(24909-24935)Online publication date: 2022
  • (2022)Streaming Big Spatial DataEncyclopedia of Big Data Technologies10.1007/978-3-319-63962-8_70-3(1-9)Online publication date: 26-Nov-2022
  • (2022)BMTP: Combining Backward Matching with Tree-Based Pruning for Large-Scale Content-Based Pub/Sub SystemsAlgorithms and Architectures for Parallel Processing10.1007/978-3-030-95391-1_10(152-166)Online publication date: 23-Feb-2022
  • (2021)SWARM: Adaptive Load Balancing in Distributed Streaming Systems for Big Spatial DataACM Transactions on Spatial Algorithms and Systems10.1145/34600137:3(1-43)Online publication date: 8-Jun-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media