research-article

Quickstep: a data platform based on the scaling-up approach

Editors: Jian Pei, Sihem Amer-Yahia Authors:

Jignesh M. Patel,

Harshad Deshmukh,

Marc Spehlmann,

Hakan Memisoglu,

Saket SaurabhAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 11, Issue 6

Pages 663 - 676

https://doi.org/10.14778/3184470.3184471

Published: 01 February 2018 Publication History

Abstract

Modern servers pack enough storage and computing power that just a decade ago was spread across a modest-sized cluster. This paper presents a prototype system, called Quickstep, to exploit the large amount of parallelism that is packed inside modern servers. Quickstep builds on a vast body of previous methods for organizing data, optimizing, scheduling and executing queries, and brings them together in a single system. Quickstep also includes new query processing methods that go beyond previous approaches. To keep the project focused, the project's initial target is read-mostly in-memory data warehousing workloads in single-node settings. In this paper, we describe the design and implementation of Quickstep for this target application space. We also present experimental results comparing the performance of Quickstep to a number of other systems, demonstrating that Quickstep is often faster than many other contemporary systems, and in some cases faster by orders-of-magnitude. Quickstep is an Apache (incubating) project.

References

[1]

D. J. Abadi, S. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, pages 671?682, 2006.

Digital Library

[2]

D. J. Abadi, S. Madden, and N. Hachem. Column-stores vs. row-stores: how different are they really? In SIGMOD, pages 967?980, 2008.

Digital Library

[3]

L. Abraham, J. Allen, O. Barykin, V. R. Borkar, B. Chopra, C. Gerea, D. Merl, J. Metzler, D. Reiss, S. Subramanian, J. L. Wiener, and O. Zed. Scuba: Diving into data at facebook. PVLDB, 6(11):1057?1067, 2013.

Digital Library

[4]

M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi, and M. Zaharia. Spark SQL: relational data processing in spark. In SIGMOD, pages 1383?1394, 2015.

Digital Library

[5]

C. Beeri and R. Ramakrishnan. On the power of magic. In PODS, pages 269?284, 1987.

Digital Library

[6]

P. A. Bernstein and D.-M. W. Chiu. Using semi-joins to solve relational queries. J. ACM, 28(1):25?40, Jan. 1981.

Digital Library

[7]

S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core cpus. In SIGMOD, pages 37?48, 2011.

Digital Library

[8]

B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. CACM, 13:422?426, 1970.

Digital Library

[9]

P. A. Boncz, M. L. Kersten, and S. Manegold. Breaking the memory wall in MonetDB. Commun. ACM, 51(12):77?85, 2008.

Digital Library

[10]

P. A. Boncz, T. Neumann, and O. Erling. TPC-H analyzed: Hidden messages and lessons learned from an influential benchmark. In 5th TPC Technology Conference, TPCTC, pages 61?76, 2013.

Digital Library

[11]

P. Bonnet, S. Manegold, M. Bjørling, W. Cao, J. Gonzalez, J. A. Granados, N. Hall, S. Idreos, M. Ivanova, R. Johnson, D. Koop, T. Kraska, R. Müller, D. Olteanu, P. Papotti, C. Reilly, D. Tsirogiannis, C. Yu, J. Freire, and D. E. Shasha. Repeatability and workability evaluation of SIGMOD 2011. SIGMOD Record, 40(2):45?48, 2011.

Digital Library

[12]

F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: A distributed storage system for structured data. OSDI, pages 205?218, 2006.

Digital Library

[13]

C. Chasseur and J. M. Patel. Design and evaluation of storage organizations for read-optimized main memory databases. PVLDB, 6(13):1474?1485, 2013.

Digital Library

[14]

Citus Data. https://www.citusdata.com, 2016.

[15]

D. L. Davison and G. Graefe. Memory-contention responsive hash joins. In VLDB, 1994.

Digital Library

[16]

J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. OSDI, pages 10?10, 2004.

Digital Library

[17]

H. Deshmukh, H. Memisoglu, and J. M. Patel. Adaptive concurrent query execution framework for an analytical in-memory database system. IEEE BigData Congress, 2017.

[18]

J. Fan, A. G. S. Raj, and J. M. Patel. The case against specialized graph analytics engines. In CIDR, 2015.

[19]

F. Färber, N. May, W. Lehner, P. Große, I. Müller, H. Rauhe, and J. Dees. The SAP HANA database ? an architecture overview. IEEE Data Eng. Bull., 35(1):28?33, 2012.

[20]

Z. Feng, E. Lo, B. Kao, and W. Xu. Byteslice: Pushing the envelop of main memory data processing with a new storage layout. In SIGMOD, pages 31?46, 2015.

Digital Library

[21]

G. Graefe. Encapsulation of parallelism in the volcano query processing system. In SIGMOD, pages 102?111, 1990.

Digital Library

[22]

G. Graefe, A. C. König, H. A. Kuno, V. Markl, and K.-U. Sattler. 10381 Summary and Abstracts Collection ? Robust Query Processing. Dagstuhl Seminar Proceedings, Dagstuhl, Germany, 2011.

[23]

Greenplum database. http://greenplum.org, 2016.

[24]

Harshad Deshmukh. Storage Formats in Quickstep. http://quickstep.incubator.apache.org/guides/2017/03/30/storage-formats-quickstep.html, 2017.

[25]

J. M. Hellerstein, M. Stonebraker, and J. R. Hamilton. Architecture of a database system. Foundations and Trends in Databases, 1(2):141?259, 2007.

Digital Library

[26]

IBM Corp. Database design with denormalization. http://ibm.co/2eKWmW1.

[27]

S. Idreos, F. Groffen, N. Nes, S. Manegold, K. S. Mullender, and M. L. Kersten. MonetDB: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull., 35(1):40?45, 2012.

[28]

Z. G. Ives and N. E. Taylor. Sideways information passing for push-style query processing. In ICDE, pages 774?783, 2008.

Digital Library

[29]

R. Johnson, V. Raman, R. Sidle, and G. Swart. Row-wise parallel predicate evaluation. PVLDB, 1(1):622?634, 2008.

Digital Library

[30]

A. Kemper and T. Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In ICDE, pages 195?206, 2011.

Digital Library

[31]

B. W. Lampson and H. E. Sturgis. Reflections on an operating system design. Commun. ACM, 1976.

Digital Library

[32]

P. Larson, C. Clinciu, C. Fraser, E. N. Hanson, M. Mokhtar, M. Nowakiewicz, V. Papadimos, S. L. Price, S. Rangarajan, R. Rusanu, and M. Saubhasik. Enhancements to SQL server column stores. In SIGMOD, pages 1159?1168, 2013.

Digital Library

[33]

V. Leis, P. A. Boncz, A. Kemper, and T. Neumann. Morsel-driven parallelism: a numa-aware query evaluation framework for the many-core age. In SIGMOD, pages 743?754, 2014.

Digital Library

[34]

Y. Li and J. M. Patel. Bitweaving: Fast scans for main memory data processing. In SIGMOD, pages 289?300, 2013.

Digital Library

[35]

Y. Li and J. M. Patel. WideTable: An accelerator for analytical data processing. PVLDB, 7(10):907?918, 2014.

Digital Library

[36]

S. Manegold, I. Manolescu, L. Afanasiev, J. Feng, G. Gou, M. Hadjieleftheriou, S. Harizopoulos, P. Kalnis, K. Karanasos, D. Laurent, M. Lupu, N. Onose, C. Ré, V. Sans, P. Senellart, T. Wu, and D. E. Shasha. Repeatability & workability evaluation of SIGMOD 2009. SIGMOD Record, 38(3):40?43, 2009.

Digital Library

[37]

I. Manolescu, L. Afanasiev, A. Arion, J. Dittrich, S. Manegold, N. Polyzotis, K. Schnaitter, P. Senellart, S. Zoupanos, and D. E. Shasha. The repeatability experiment of SIGMOD 2008. SIGMOD Record, 37(1):39?45, 2008.

Digital Library

[38]

Microsoft. Implied predicates and query hints. https://blogs.msdn.microsoft.com/craigfr/2009/04/28/implied-predicates-and-query-hints/, 2009.

[39]

Microsoft Corp. Optimizing the Database Design by Denormalizing. https://msdn.microsoft.com/en-us/library/cc505841.aspx.

[40]

F. Nagel, G. M. Bierman, and S. D. Viglas. Code generation for efficient query processing in managed runtimes. PVLDB, 7(12):1095?1106, 2014.

Digital Library

[41]

T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539?550, 2011.

Digital Library

[42]

E. J. O'Neil, P. E. O'Neil, and G. Weikum. An optimality proof of the lru-K page replacement algorithm. J. ACM, 46(1):92?112, 1999.

Digital Library

[43]

P. O'Neil, E. O'Neil, and X. Chen. The star schema benchmark. http://www.cs.umb.edu/poneil/StarSchemaB.pdf, Jan 2007.

[44]

Oracle. Push-down part 2. https://blogs.oracle.com/in-memory/push-down:-part-2, 2015.

[45]

Oracle. White paper. http://www.oracle.com/technetwork/database/in-memory/overview/twp-oracle-database-in-memory-2245633.pdf, 2017.

[46]

Pamela Vagata and Kevin Wilfong. Scaling the Facebook data warehouse to 300 PB. https://code.facebook.com/posts/229861827208629, 2014.

[47]

J. M. Patel, H. Deshmukh, J. Zhu, H. Memisoglu, N. Potti, S. Saurabh, M. Spehlmann, and Z. Zhang. Quickstep: A data platform based on the scaling-in approach. Technical Report 1847, University of Wisconsin-Madison, 2017.

[48]

H. Pirk, O. Moll, M. Zaharia, and S. Madden. Voodoo - A vector algebra for portable database performance on modern hardware. PVLDB, 9(14):1707?1718, 2016.

Digital Library

[49]

PostgreSQL. http://www.postgresql.org, 2016.

[50]

PostgreSQL. Parallel Query. https://wiki.postgresql.org/wiki/Parallel_Query.

[51]

L. Qiao, V. Raman, F. Reiss, P. J. Haas, and G. M. Lohman. Main-memory scan sharing for multi-core cpus. PVLDB, 1(1):610?621, 2008.

Digital Library

[52]

B. Raducanu, P. A. Boncz, and M. Zukowski. Micro adaptivity in vectorwise. In SIGMOD, pages 1231?1242, 2013.

Digital Library

[53]

V. Raman, G. K. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, S. Liu, G. M. Lohman, T. Malkemus, R. Müller, I. Pandis, B. Schiefer, D. Sharpe, R. Sidle, A. J. Storm, and L. Zhang. DB2 with BLU acceleration: So much more than just a column store. PVLDB, 6(11):1080?1091, 2013.

Digital Library

[54]

V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-time query processing. In ICDE, pages 60?69, 2008.

Digital Library

[55]

Amazon Redshift. https://aws.amazon.com/redshift/, 2016.

[56]

Reynold Xin. Technical Preview of Apache Spark 2.0. https://databricks.com/blog/2016/05/11.

[57]

R. Ricci, E. Eide, and The CloudLab Team. Introducing CloudLab: Scientific infrastructure for advancing cloud architectures and applications. USENIX;login:, 39(6), Dec. 2014.

[58]

L. Shrinivas, S. Bodagala, R. Varadarajan, A. Cary, V. Bharathan, and C. Bear. Materialization strategies in the vertica analytic database: Lessons learned. In ICDE, pages 1196?1207. IEEE, 2013.

Digital Library

[59]

K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The hadoop distributed file system. In 26th Symposium on Mass Storage Systems and Technologies (MSST), MSST '10, pages 1?10, Washington, DC, USA, 2010. IEEE Computer Society.

Digital Library

[60]

Standard Performance Evaluation Corporation. INT2006 (Integer Component of SPEC CPU2006). https://www.spec.org/cpu2006/CINT2006, 2016.

[61]

Statistic Brain Research Institute. Google Annual Search Statistics. http://www.statisticbrain.com/google-searches, 2016.

[62]

M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-store: A column-oriented DBMS. In VLDB, pages 553?564, 2005.

Digital Library

[63]

Sybase Inc. Denormalizing Tables and Columns. http://infocenter.sybase.com.

[64]

T. Willhalm, I. Oukid, I. Müller, and F. Faerber. Vectorizing database column scans with complex predicates. In ADMS, pages 1?12, 2013.

[65]

T. Willhalm, N. Popovici, Y. Boshmaf, H. Plattner, A. Zeier, and J. Schaffner. SIMD-scan: Ultra fast in-memory table scan using on-chip vector processing units. PVLDB, 2(1):385?394, 2009.

Digital Library

[66]

R. S. Xin, J. Rosen, M. Zaharia, M. J. Franklin, S. Shenker, and I. Stoica. Shark: SQL and rich analytics at scale. In SIGMOD, pages 13?24, 2013.

Digital Library

[67]

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauly, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In USENIX, pages 15?28, 2012.

[68]

Q. Zeng, J. M. Patel, and D. Page. Quickfoil: Scalable inductive logic programming. PVLDB, 8(3):197?208, 2014.

Digital Library

[69]

J. Zhou and K. A. Ross. Implementing database operations using SIMD instructions. In SIGMOD, pages 145?156, 2002.

Digital Library

[70]

J. Zhu, N. Potti, S. Saurabh, and J. M. Patel. Looking ahead makes query plans robust. PVLDB, 10(8):889?900, 2017.

Digital Library

[71]

M. Zukowski and P. A. Boncz. Vectorwise: Beyond column stores. IEEE Data Eng. Bull., 35(1):21?27, 2012.

Cited By

Wu ZMarcus RLiu ZNegi PNathan VPfeil PSaxena GRahman MNarayanaswamy BKraska TBarcelo PSanchez-Pi NMeliou ASudarshan S(2024)Stage: Query Execution Time Prediction in Amazon RedshiftCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653391(280-294)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3653391
Prammer MPatel J(2023)Rethinking the Encoding of Integers for Scans on Skewed DataProceedings of the ACM on Management of Data10.1145/36267511:4(1-27)Online publication date: 12-Dec-2023
https://dl.acm.org/doi/10.1145/3626751
Gaffney KPrammer MBrasfield LHipp DKennedy DPatel J(2022)SQLiteProceedings of the VLDB Endowment10.14778/3554821.355484215:12(3535-3547)Online publication date: 1-Aug-2022
https://dl.acm.org/doi/10.14778/3554821.3554842
Show More Cited By

Quickstep: a data platform based on the scaling-up approach
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Recommendations

Quickstep: a data platform based on the scaling-up approach

Modern servers pack enough storage and computing power that just a decade ago was spread across a modest-sized cluster. This paper presents a prototype system, called Quickstep, to exploit the large amount of parallelism that is packed inside modern ...
Quickstep: a data platform based on the scaling-up approach

Modern servers pack enough storage and computing power that just a decade ago was spread across a modest-sized cluster. This paper presents a prototype system, called Quickstep, to exploit the large amount of parallelism that is packed inside modern ...
Adobe Flash Professional CS5 Classroom in a Book

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 11, Issue 6

February 2018

210 pages

ISSN:2150-8097

Editors:
Jian Pei
Simon Fraser University
,
Sihem Amer-Yahia
University of Grenoble Alpes, CNRS

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 February 2018

Published in PVLDB Volume 11, Issue 6

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

5
Total Citations
View Citations
267
Total Downloads

Downloads (Last 12 months)55
Downloads (Last 6 weeks)16

Reflects downloads up to 16 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wu ZMarcus RLiu ZNegi PNathan VPfeil PSaxena GRahman MNarayanaswamy BKraska TBarcelo PSanchez-Pi NMeliou ASudarshan S(2024)Stage: Query Execution Time Prediction in Amazon RedshiftCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653391(280-294)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3653391
Prammer MPatel J(2023)Rethinking the Encoding of Integers for Scans on Skewed DataProceedings of the ACM on Management of Data10.1145/36267511:4(1-27)Online publication date: 12-Dec-2023
https://dl.acm.org/doi/10.1145/3626751
Gaffney KPrammer MBrasfield LHipp DKennedy DPatel J(2022)SQLiteProceedings of the VLDB Endowment10.14778/3554821.355484215:12(3535-3547)Online publication date: 1-Aug-2022
https://dl.acm.org/doi/10.14778/3554821.3554842
Gehani AAhmad RIrshad HZhu JPatel J(2021)Digging into big provenance (with SPADE)Communications of the ACM10.1145/347535864:12(48-56)Online publication date: 19-Nov-2021
https://dl.acm.org/doi/10.1145/3475358
Ding JMinhas UChandramouli BWang CLi YLi YKossmann DGehrke JKraska TLi GLi ZIdreos SSrivastava D(2021)Instance-Optimized Data Layouts for Cloud Analytics WorkloadsProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457270(418-431)Online publication date: 9-Jun-2021
https://dl.acm.org/doi/10.1145/3448016.3457270

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents