Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2833312.2833314acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicdcnConference Proceedingsconference-collections
research-article

D8-tree: a de-normalized approach for multidimensional data analysis on key-value databases

Published: 04 January 2016 Publication History

Abstract

The shift to more parallel and distributed computer architectures changed how data is managed consequently giving birth to a new generation of software products, namely NoSQL. These products offer a scalable and reliable solution for "Big Data", but none of them solves the problem of analyzing and visualizing multidimensional data. There are solutions for scaling analytic workloads, for creating distributed databases and for indexing multidimensional data, but there is no single solution that points to all three goals together.
We propose the D8-tree, a De-normalized Octa-tree index that supports all three goals. It works with both analytical and data-thinning queries on multidimensional data ensuring, at the same time, low latency and a linear scalability. We have implemented a D8-tree prototype, and we compared it with PostGIS on a set of queries required by an in-house HPC application. We found the D8-tree outperforming PostGIS in all tested queries, with a performance gain up to 47 times.

References

[1]
Scalable Nearest Neighbour Algorithms for High Dimensional Data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11):2227--2240, 2014.
[2]
L. Arge, M. de Berg, H. J. Haverkort, and K. Yi. The Priority R-tree: A Practically Efficient and Worst-case Optimal R-tree. ACM Transactions on Algorithms, 2008.
[3]
N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles, 1990.
[4]
G. Cong, C. S. Jensen, and D. Wu. Efficient retrieval of the top-k most relevant spatial web objects. Proceedings of the VLDB Endowment, 2(1), 2009.
[5]
C. Cugnasco, R. Hernandez, Y. Becerra, J. Torres, and E. Ayguadé. Aeneas: A tool to enable applications to effectively use non-relational databases. In Procedia Computer Science, volume 18. Elsevier, 2013.
[6]
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences, 66(4), 2003.
[7]
R. Finkel and J. Bentley. Quad trees a data structure for retrieval on composite keys. Acta Informatica, 4(1):1--9, 1974.
[8]
S. F. Frisken and R. N. Perry. Simple and Efficient Traversal Methods for Quadtrees and Octrees. Journal of Graphics Tools, 7, 2002.
[9]
A. Guttman. R-trees: a dynamic index structure for spatial searching. ACM Press, 1984.
[10]
D. Han and E. Stroulia. HGrid: A Data Model for Large Geospatial Data Sets in HBase. 2013 IEEE Sixth International Conference on Cloud Computing.
[11]
R. Hernandez, C. Cugnasco, Y. Becerra, J. Torres, and E. Ayguade. Experiences of using cassandra for molecular dynamics simulations. In Proceedings of PDP 2015, March 2015.
[12]
I. Kamel and C. Faloutsos. Hilbert R-tree: An Improved R-tree using Fractals. International Conference on Very Large Databases (VLDB), 1994.
[13]
R. K. V. Kothuri, S. Ravada, and D. Abugov. Quadtree and R-tree indexes in oracle spatial: a comparison using GIS data. the 2002 ACM SIGMOD international conference on Management of data, 2002.
[14]
A. Lakshman and P. Malik. Cassandra. ACM SIGOPS Operating Systems Review, 2010.
[15]
S. Nishimura, S. Das, D. Agrawal, and A. El Abbadi. MD -HBase: Design and implementation of an elastic data infrastructure for cloud-scale location services. Distributed and Parallel Databases, 31, 2013.
[16]
A. D. Sarma, H. Lee, H. Gonzalez, J. Madhavan, and A. Halevy. Consistent thinning of large geographical data for map visualization. ACM Transactions on Database Systems, 38(4), nov 2013.
[17]
T. K. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A Dynamic Index for Multi-dimensional Objects. In International Conference on Very Large Databases (VLDB), 1987.
[18]
F. Tauheed, L. Biveinis, T. Heinis, F. Schurmann, H. Markram, and A. Ailamaki. Accelerating Range Queries for Brain Simulations. 2012 IEEE 28th International Conference on Data Engineering.
[19]
E. Tejedor, Y. Becerra, G. Alomar, A. Queralt, R. M. Badia, J. Torres, T. Cortes, and J. Labarta. Pycompss: Parallel computational workflows in python. International Journal of High Performance Computing Applications, 2015.
[20]
M. Theobald, G. Weikum, and R. Schenkel. Top-k query evaluation with probabilistic guarantees.... on Very large data bases-Volume ..., 2004.
[21]
A. Toshniwal, J. Donham, N. Bhagat, S. Mittal, D. Ryaboy, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, and M. Fu. Storm@twitter. Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14, pages 147--156, 2014.
[22]
M. Vazquez, G. Houzeaux, S. Koric, A. Artigues, J. Aguado-Sierra, R. Aris, D. Mira, C. Hadrien, F. Cucchietti, H. Owen, A. Taha, and J. M. Cela. Alya: Towards Exascale for Engineering Simulation Codes. Supercomputing, 2014.
[23]
L.-Y. Wei, Y.-T. Hsu, W.-C. Peng, and W.-C. Lee. Indexing spatial data in cloud data managements. Pervasive and Mobile Computing, (1), 2013.
[24]
M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing with Working Sets. HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing, 2010.
[25]
X. Zhang, J. Ai, Z. Wang, J. Lu, and X. Meng. An efficient multi-dimensional index for cloud data management. Proceeding of the first international workshop on Cloud data management - CloudDB 2009.
[26]
Y. Zhou, X. Xie, C. Wang, Y. Gong, and W.-y. Ma. Hybrid Index Structures for Location-based Web Search*. Cikm, (49), 2005.

Cited By

View all
  • (2019)The OTree: Multidimensional Indexing with efficient data Sampling for HPC2019 IEEE International Conference on Big Data (Big Data)10.1109/BigData47090.2019.9006121(433-440)Online publication date: Dec-2019
  • (2019)Evaluating the Benefits of Key-Value Databases for Scientific ApplicationsComputational Science – ICCS 201910.1007/978-3-030-22734-0_30(412-426)Online publication date: 8-Jun-2019
  • (2017)Exploiting Key-Value Data Stores Scalability for HPC2017 46th International Conference on Parallel Processing Workshops (ICPPW)10.1109/ICPPW.2017.25(85-94)Online publication date: Aug-2017
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICDCN '16: Proceedings of the 17th International Conference on Distributed Computing and Networking
January 2016
370 pages
ISBN:9781450340328
DOI:10.1145/2833312
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 January 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. NoSQL
  2. big data
  3. data visualization
  4. key-value database
  5. multidimensional index

Qualifiers

  • Research-article

Funding Sources

Conference

ICDCN '16

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)6
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2019)The OTree: Multidimensional Indexing with efficient data Sampling for HPC2019 IEEE International Conference on Big Data (Big Data)10.1109/BigData47090.2019.9006121(433-440)Online publication date: Dec-2019
  • (2019)Evaluating the Benefits of Key-Value Databases for Scientific ApplicationsComputational Science – ICCS 201910.1007/978-3-030-22734-0_30(412-426)Online publication date: 8-Jun-2019
  • (2017)Exploiting Key-Value Data Stores Scalability for HPC2017 46th International Conference on Parallel Processing Workshops (ICPPW)10.1109/ICPPW.2017.25(85-94)Online publication date: Aug-2017
  • (2017)ParaView + Alya + D8tree: Integrating High Performance Computing and High Performance Data AnalyticsProcedia Computer Science10.1016/j.procs.2017.05.170108(465-474)Online publication date: 2017

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media