research-article

Bringing Compiling Databases to RISC Architectures

Published: 01 February 2023

Abstract

Current hardware development greatly influences the design decisions of modern database systems. For many modern performance-focused database systems, query compilation has emerged as an integral component, and different approaches to code generation have evolved, making use of standard compilers, general-purpose compiler libraries, or domain-specific code generators. However, development has primarily focused on the dominant x86-64 server architecture and has neglected current hardware developments towards other CPU architectures such as ARM and other RISC architectures.
Therefore, we explore the design space of code generation in database systems considering a variety of state-of-the-art compilation approaches with a set of qualitative and quantitative metrics. Based on our findings, we have developed a new code generator called FireARM for AArch64-based systems in our database system, Umbra. We identify general as well as architecture-specific challenges for custom code generation in databases and provide potential solutions to abstract or handle them.
Furthermore, we present an extensive evaluation of different compilation approaches in Umbra on a wide variety of x86-64 and ARM machines. In particular, we compare quantitative performance characteristics such as compilation latency and query throughput.
Our results show that using standard languages and compiler infrastructures reduces the barrier to employing query compilation and allows for high performance on big data sets, while domain-specific code generators can achieve a significantly lower compilation overhead and allow for better targeting of new architectures.

Cited By

  • (2024) Query Compilation Without Regrets. Proceedings of the ACM on Management of Data 2:3 (1-28). DOI: 10.1145/3654968. Online publication date: 30-May-2024
  • (2024) Compile-Time Analysis of Compiler Frameworks for Query Compilation. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization (233-244). DOI: 10.1109/CGO57630.2024.10444856. Online publication date: 2-Mar-2024
  • (2023) Analyzing Vectorized Hash Tables across CPU Architectures. Proceedings of the VLDB Endowment 16:11 (2755-2768). DOI: 10.14778/3611479.3611485. Online publication date: 24-Aug-2023

Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 6
February 2023
393 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
