Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2619228.2619232acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Database cracking: fancy scan, not poor man's sort!

Published: 23 June 2014 Publication History

Abstract

Database Cracking is an appealing approach to adaptive indexing: on every range-selection query, the data is partitioned using the supplied predicates as pivots. The core of database cracking is, thus, pivoted partitioning. While pivoted partitioning, like scanning, requires a single pass through the data it tends to have much higher costs due to lower CPU efficiency. In this paper, we conduct an in-depth study of the reasons for the low CPU efficiency of pivoted partitioning. Based on the findings, we develop an optimized version with significantly higher (single-threaded) CPU efficiency. We also develop a number of multi-threaded implementations that are effectively bound by memory bandwidth. Combining all of these optimizations we achieve an implementation that has costs close to or better than an ordinary scan on a variety of systems ranging from low-end (cheaper than $300) desktop machines to high-end (above $60,000) servers.

References

[1]
Intel 64 and IA-32 Architectures Optimization Reference Manual. Appendix B: Using Performance Monitoring Events. Intel Corporation, June 2013.
[2]
S. Agrawal, S. Chaudhuri, L. Kollár, A. P. Marathe, V. R. Narasayya, and M. Syamala. Database Tuning Advisor for Microsoft SQL Server 2005. In VLDB, pages 1110--1121, 2004.
[3]
A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a Modern Processor: Where Does Time Go? In VLDB, pages 266--277, 1999.
[4]
C. Balkesen, G. Alonso, J. Teubner, and M. T. Özsu. Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited. PVLDB, 7(1):85--96, 2013.
[5]
P. A. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-Pipelining Query Execution. In CIDR, pages 225--237, 2005.
[6]
N. Bruno and S. Chaudhuri. An Online Approach to Physical Design Tuning. In ICDE, pages 826--835, 2007.
[7]
J. Chhugani, A. D. Nguyen, V. W. Lee, W. Macy, M. Hagog, Y.-K. Chen, A. Baransi, S. Kumar, and P. Dubey. Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture. PVLDB, 1(2):1313--1324, 2008.
[8]
J. Cieslewicz and K. A. Ross. Adaptive Aggregation on Chip Multi-processors. In VLDB, pages 339--350, 2007.
[9]
B. Dageville, D. Das, K. Dias, K. Yagoub, M. Zaït, and M. Ziauddin. Automatic SQL Tuning in Oracle 10g. In VLDB, pages 1098--1109, 2004.
[10]
G. Graefe, F. Halim, S. Idreos, H. A. Kuno, and S. Manegold. Concurrency Control for Adaptive Indexing. PVLDB, 5(7):656--667, 2012.
[11]
G. Graefe, F. Halim, S. Idreos, H. A. Kuno, S. Manegold, and B. Seeger. Transactional Support for Adaptive Indexing. VLDBJ, 23(2):303--328, 2014.
[12]
G. Graefe, S. Idreos, H. A. Kuno, and S. Manegold. Benchmarking Adaptive Indexing. In TPCTC, pages 169--184, 2010.
[13]
G. Graefe and H. Kuno. Adaptive Indexing for Relational Keys. In ICDE Workshops, pages 69--74, 2010.
[14]
G. Graefe and H. A. Kuno. Self-Selecting, Self-Tuning, Incrementally Optimized Indexes. In EDBT, pages 371--381, 2010.
[15]
F. Halim, S. Idreos, P. Karras, and R. H. C. Yap. Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores. PVLDB, 5(6):502--513, 2012.
[16]
J. L. Hennessy and D. A. Patterson. Computer Architecture - A Quantitative Approach. Morgan Kaufmann, 5th edition, 2012.
[17]
C. A. R. Hoare. Algorithm 64: Quicksort. CACM, 4(7), 1961.
[18]
S. Idreos. Database Cracking: Towards Auto-tuning Database Kernels. PhD thesis, CWI, June 2010.
[19]
S. Idreos, M. L. Kersten, and S. Manegold. Database Cracking. In CIDR, pages 68--78, 2007.
[20]
S. Idreos, M. L. Kersten, and S. Manegold. Updating a Cracked Database. In SIGMOD, pages 413--424, 2007.
[21]
S. Idreos, M. L. Kersten, and S. Manegold. Self-Organizing Tuple Reconstruction in Column-Stores. In SIGMOD, pages 297--308, 2009.
[22]
S. Idreos, S. Manegold, H. A. Kuno, and G. Graefe. Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Stores. PVLDB, 4(9):585--597, 2011.
[23]
M. Johnson. Superscalar Microprocessor Design. Prentice Hall, 1991.
[24]
C. Kim, E. Sedlar, J. Chhugani, T. Kaldewey, A. D. Nguyen, A. D. Blas, V. W. Lee, N. Satish, and P. Dubey. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. PVLDB, 2(2):1378--1389, 2009.
[25]
M. Lühring, K.-U. Sattler, K. Schmidt, and E. Schallehn. Autonomous Management of Soft Indexes. In SMDB, pages 450--458, 2007.
[26]
S. Richter, J.-A. Quiané-Ruiz, S. Schuh, and J. Dittrich. Towards zero-overhead static and adaptive indexing in hadoop. VLDBJ, 23(3):469--494, 2014.
[27]
K. A. Ross. Selection Conditions in Main Memory. TODS, pages 132--161, 2004.
[28]
K. Schnaitter, S. Abiteboul, T. Milo, and N. Polyzotis. COLT: Continuous On-Line Tuning. In SIGMOD, pages 793--795, 2006.
[29]
F. M. Schuhknecht, A. Jindal, and J. Dittrich. The Uncracked Pieces in Database Cracking. PVLDB, 7(2):97--108, 2013.
[30]
J. Zhou and K. A. Ross. Implementing Database Operations Using SIMD Instructions. In SIGMOD, pages 145--156, 2002.
[31]
D. C. Zilio, J. Rao, S. Lightstone, G. M. Lohman, A. J. Storm, C. Garcia-Arellano, and S. Fadden. DB2 Design Advisor: Integrated Automatic Physical Database Design. In VLDB, pages 1087--1097, 2004.

Cited By

View all
  • (2024)Optimizing the B+tree Index with Hotness Awareness and AdaptivityAdvanced Intelligent Computing Technology and Applications10.1007/978-981-97-5581-3_29(356-367)Online publication date: 1-Aug-2024
  • (2022)Cracking in-memory database indexInformation Systems10.1016/j.is.2021.101913104:COnline publication date: 1-Feb-2022
  • (2022)Data Management in Machine Learning SystemsundefinedOnline publication date: 26-Feb-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
DaMoN '14: Proceedings of the Tenth International Workshop on Data Management on New Hardware
June 2014
71 pages
ISBN:9781450329712
DOI:10.1145/2619228
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 June 2014

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'14
Sponsor:

Acceptance Rates

Overall Acceptance Rate 94 of 127 submissions, 74%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)30
  • Downloads (Last 6 weeks)8
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Optimizing the B+tree Index with Hotness Awareness and AdaptivityAdvanced Intelligent Computing Technology and Applications10.1007/978-981-97-5581-3_29(356-367)Online publication date: 1-Aug-2024
  • (2022)Cracking in-memory database indexInformation Systems10.1016/j.is.2021.101913104:COnline publication date: 1-Feb-2022
  • (2022)Data Management in Machine Learning SystemsundefinedOnline publication date: 26-Feb-2022
  • (2022)Data Exploration Using Example-Based MethodsundefinedOnline publication date: 25-Feb-2022
  • (2022)Databases on Modern HardwareundefinedOnline publication date: 25-Feb-2022
  • (2021)Indexing in Big Data Mining and AnalyticsMachine Learning and Data Mining for Emerging Trend in Cyber Dynamics10.1007/978-3-030-66288-2_5(123-143)Online publication date: 2-Apr-2021
  • (2019)Efficient Evaluation of Multi-Column Selection Predicates in Main-MemoryIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2018.282534931:7(1296-1311)Online publication date: 1-Jul-2019
  • (2019)Interactive Data Exploration of Distributed Raw Files: A Systematic Mapping StudyIEEE Access10.1109/ACCESS.2018.28822447(10691-10717)Online publication date: 2019
  • (2019)Past and Future Steps for Adaptive Storage Data Systems: From Shallow to Deep AdaptivityReal-Time Business Intelligence and Analytics10.1007/978-3-030-24124-7_6(85-94)Online publication date: 11-Oct-2019
  • (2018)An analysis and comparison of database cracking kernelsProceedings of the 14th International Workshop on Data Management on New Hardware10.1145/3211922.3211930(1-10)Online publication date: 11-Jun-2018
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media