Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

AQWA: adaptive query workload aware partitioning of big spatial data

Published: 01 September 2015 Publication History

Abstract

The unprecedented spread of location-aware devices has resulted in a plethora of location-based services in which huge amounts of spatial data need to be efficiently processed by large-scale computing clusters. Existing cluster-based systems for processing spatial data employ static data-partitioning structures that cannot adapt to data changes, and that are insensitive to the query workload. Hence, these systems are incapable of consistently providing good performance. To close this gap, we present AQWA, an adaptive and query-workload-aware mechanism for partitioning large-scale spatial data. AQWA does not assume prior knowledge of the data distribution or the query workload. Instead, as data is consumed and queries are processed, the data partitions are incrementally updated. With extensive experiments using real spatial data from Twitter, and various workloads of range and k-nearest-neighbor queries, we demonstrate that AQWA can achieve an order of magnitude enhancement in query performance compared to the state-of-the-art systems.

References

[1]
https://github.com/ahmed-m-aly/AQWA.git.
[2]
Apache storm. https://storm.apache.org, 2015.
[3]
Twitter statistics. http://www.internetlivestats.com/twitter-statistics/, 2015.
[4]
D. Achakeev, B. Seeger, and P. Widmayer. Sort-based query-adaptive loading of r-trees. In CIKM, pages 2080--2084, 2012.
[5]
A. Aji, F. Wang, H. Vo, R. Lee, Q. Liu, X. Zhang, and J. H. Saltz. Hadoop-GIS: A high performance spatial data warehousing system over MapReduce. PVLDB, 6(11):1009--1020, 2013.
[6]
A. M. Aly, W. G. Aref, and M. Ouzzani. Cost estimation of spatial k-nearest-neighbor operators. In EDBT, pages 457--468, 2015.
[7]
N. An, Z. Yang, and A. Sivasubramaniam. Selectivity estimation for spatial joins. In ICDE, pages 368--375, 2001.
[8]
R. Beigel and E. Tanin. The geometry of browsing. In LATIN, pages 331--340, 1998.
[9]
J. L. Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509--517, 1975.
[10]
M. d. Berg, O. Cheong, M. v. Kreveld, and M. Overmars. Computational Geometry: Algorithms and Applications. Springer-Verlag TELOS, 3rd ed. edition, 2008.
[11]
S. Chen. Cheetah: A high performance, custom data warehouse on top of MapReduce. PVLDB, 3(2):1459--1468, 2010.
[12]
D. Comer. The ubiquitous B-tree. ACM Comput. Surv., 11(2):121--137, 1979.
[13]
C. Curino, Y. Zhang, E. P. C. Jones, and S. Madden. Schism: A workload-driven approach to database replication and partitioning. PVLDB, 3(1):48--57, 2010.
[14]
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004.
[15]
G. Dröge and H. Schek. Query-adaptive data space partitioning using variable-size storage clusters. In SSD, pages 337--356, 1993.
[16]
A. Eldawy, L. Alarabi, and M. F. Mokbel. Spatial partitioning techniques in SpatialHadoop. PVLDB, 8(12):1602--1613, 2015.
[17]
A. Eldawy and M. F. Mokbel. A demonstration of SpatialHadoop: An efficient MapReduce framework for spatial data. PVLDB, 6(12):1230--1233, 2013.
[18]
A. Eldawy and M. F. Mokbel. Pigeon: A spatial MapReduce language. In ICDE, pages 1242--1245, 2014.
[19]
A. Eldawy and M. F. Mokbel. SpatialHadoop: A MapReduce framework for spatial data. In ICDE, pages 1352--1363, 2015.
[20]
M. Y. Eltabakh, F. Özcan, Y. Sismanis, P. J. Haas, H. Pirahesh, and J. Vondrák. Eagle-eyed elephant: Split-oriented indexing in Hadoop. In EDBT, pages 89--100, 2013.
[21]
M. Y. Eltabakh, Y. Tian, F. Özcan, R. Gemulla, A. Krettek, and J. McPherson. CoHadoop: Flexible data placement and its exploitation in Hadoop. PVLDB, 4(9):575--585, 2011.
[22]
A. Floratou, J. M. Patel, E. J. Shekita, and S. Tata. Column-oriented storage techniques for MapReduce. PVLDB, 4(7):419--429, 2011.
[23]
S. Geffner, D. Agrawal, A. El Abbadi, and T. R. Smith. Relative prefix sums: An efficient approach for querying dynamic OLAP data cubes. In ICDE, pages 328--335, 1999.
[24]
A. Guttman. R-trees: A dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984.
[25]
C. Ho, R. Agrawal, N. Megiddo, and R. Srikant. Range queries in OLAP data cubes. In SIGMOD, pages 73--88, 1997.
[26]
L. Jiang, B. Li, and M. Song. The optimization of HDFS based on small files. In IC-BNMT, pages 912--915, 2010.
[27]
H. Liao, J. Han, and J. Fang. Multi-dimensional index on Hadoop distributed file system. In NAS, pages 240--249, 2010.
[28]
G. Mackey, S. Sehrish, and J. Wang. Improving metadata management for small files in HDFS. In CLUSTER, pages 1--4, 2009.
[29]
A. R. Mahmood, A. M. Aly, T. Qadah, E. K. Rezig, A. Daghistani, A. Madkour, A. S. Abdelhamid, M. S. Hassan, W. G. Aref, and S. Basalamah. Tornado: A distributed spatio-textual stream processing system. PVLDB, 8(12):2020--2031, 2015.
[30]
S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: Interactive analysis of web-scale datasets. PVLDB, 3(1):330--339, 2010.
[31]
G. Moerkotte. Small materialized aggregates: A light weight index structure for data warehousing. In VLDB, pages 476--487, 1998.
[32]
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: A not-so-foreign language for data processing. In SIGMOD, pages 1099--1110, 2008.
[33]
A. Pavlo, C. Curino, and S. B. Zdonik. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In SIGMOD, pages 61--72, 2012.
[34]
S. Prabhakar, K. A. S. Abdel-Ghaffar, D. Agrawal, and A. El Abbadi. Cyclic allocation of two-dimensional data. In ICDE, pages 94--101, 1998.
[35]
S. Prabhakar, D. Agrawal, and A. El Abbadi. Efficient disk allocation for fast similarity searching. In SPAA, pages 78--87, 1998.
[36]
M. Riedewald, D. Agrawal, and A. El Abbadi. Flexible data cubes for online aggregation. In ICDT, pages 159--173, 2001.
[37]
M. Riedewald, D. Agrawal, A. El Abbadi, and R. Pajarola. Space-efficient data cubes for dynamic environments. In DaWak, pages 24--33, 2000.
[38]
N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In SIGMOD, pages 71--79, 1995.
[39]
H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., 2006.
[40]
C. Sun, D. Agrawal, and A. El Abbadi. Selectivity estimation for spatial joins with geometric selections. In EDBT, pages 609--626, 2002.
[41]
A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive - A warehousing solution over a Map-Reduce framework. PVLDB, 2(2):1626--1629, 2009.
[42]
K. Tzoumas, M. L. Yiu, and C. S. Jensen. Workload-aware indexing of continuously moving objects. PVLDB, 2(1):1186--1197, 2009.
[43]
T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 1st edition, 2009.
[44]
S. Zhang, L. Miao, D. Zhang, and Y. Wang. A strategy to deal with mass small files in HDFS. In IHMSC, volume 1, pages 331--334, 2014.

Cited By

View all
  • (2024)A Generic Machine Learning Model for Spatial Query Optimization based on Spatial EmbeddingsACM Transactions on Spatial Algorithms and Systems10.1145/365763310:4(1-33)Online publication date: 13-Apr-2024
  • (2024)Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning TechniquesJournal of Computer Science and Technology10.1007/s11390-024-3538-139:2(346-368)Online publication date: 1-Mar-2024
  • (2023)Adaptive Indexing of Objects with Spatial ExtentProceedings of the VLDB Endowment10.14778/3598581.359859616:9(2248-2260)Online publication date: 1-May-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 8, Issue 13
Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii
September 2015
144 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 September 2015
Published in PVLDB Volume 8, Issue 13

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)2
Reflects downloads up to 09 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)A Generic Machine Learning Model for Spatial Query Optimization based on Spatial EmbeddingsACM Transactions on Spatial Algorithms and Systems10.1145/365763310:4(1-33)Online publication date: 13-Apr-2024
  • (2024)Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning TechniquesJournal of Computer Science and Technology10.1007/s11390-024-3538-139:2(346-368)Online publication date: 1-Mar-2024
  • (2023)Adaptive Indexing of Objects with Spatial ExtentProceedings of the VLDB Endowment10.14778/3598581.359859616:9(2248-2260)Online publication date: 1-May-2023
  • (2023)STAR: A Cache-based Stream Warehouse System for Spatial DataACM Transactions on Spatial Algorithms and Systems10.1145/36059449:4(1-27)Online publication date: 27-Jun-2023
  • (2023)Learned Spatial Data PartitioningProceedings of the Sixth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management10.1145/3593078.3593932(1-8)Online publication date: 18-Jun-2023
  • (2023)SynopsisDB: Distributed Synopsis-based Data Processing SystemCompanion of the 2023 International Conference on Management of Data10.1145/3555041.3589394(289-291)Online publication date: 4-Jun-2023
  • (2023)SAT: sampling acceleration tree for adaptive database repartitionWorld Wide Web10.1007/s11280-023-01199-326:5(3503-3533)Online publication date: 3-Aug-2023
  • (2022)Incremental partitioning for efficient spatial data analyticsProceedings of the VLDB Endowment10.14778/3494124.349415015:3(713-726)Online publication date: 4-Feb-2022
  • (2022)Flexible Computation of Multidimensional HistogramsSpatial Gems, Volume 110.1145/3548732.3548746(119-130)Online publication date: 5-Aug-2022
  • (2022)GeoBalance: workload-aware partitioning of real-time spatiotemporal dataGeoinformatica10.1007/s10707-021-00444-z26:1(67-94)Online publication date: 1-Jan-2022
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media