Nothing Special   »   [go: up one dir, main page]

skip to main content
article

HorseIR: bringing array programming languages together with database query processing

Published: 06 April 2020 Publication History

Abstract

Relational database management systems (RDBMS) are operationally similar to a dynamic language processor. They take SQL queries as input, dynamically generate an optimized execution plan, and then execute it. In recent decades, the emergence of in-memory databases with columnar storage, which use array-like storage structures, has shifted the focus on optimizations from the traditional I/O bottleneck to CPU and memory. However, database research so far has primarily focused on CPU cache optimizations. The similarity in the computational characteristics of such database workloads and array programming language optimizations are largely unexplored. We believe that these database implementations can benefit from merging database optimizations with dynamic array-based programming language approaches. Therefore, in this paper, we propose a novel approach to optimize database query execution using a new array-based intermediate representation, HorseIR, that resides between database queries and compiled code. Furthermore, we provide a translator to generate HorseIR from database execution plans and a compiler that optimizes HorseIR and generates efficient code. We compare HorseIR with the MonetDB RDBMS, by testing standard SQL queries, and show how our approach and compiler optimizations improve the runtime of complex queries.

References

[1]
Daniel J Abadi, Samuel R Madden, and Nabil Hachem. 2008. Column-Stores vs. Row-Stores: How Different Are They Really?. In Special Interest Group on Management of Data (SIGMOD). ACM, 967-980.
[2]
Yanif Ahmad and Christoph Koch. 2009. DBToaster: A SQL Compiler for High-Performance Delta Processing in Main-Memory Databases. The Proceedings of the VLDB Endowment (PVLDB) 2 (2009), 1566-1569.
[3]
Anastassia Ailamaki, David J DeWitt, Mark D Hill, and David A Wood. 1999. DBMSs On A Modern Processor: Where Does Time Go?. In Conference on Very Large Data Bases (VLDB). 266-277.
[4]
Peter A. Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. In Conference on Innovative Data Systems Research (CIDR). 225-237.
[5]
Qiang Cao, Pedro Trancoso, J-L Larriba-Pey, Josep Torrellas, Robert Knighten, and Youjip Won. 1999. Detailed Characterization of a Quad Pentium Pro Server Running TPC-D. In International Conference on Computer Design (ICCD). IEEE, 108-115.
[6]
Stefano Ceri and Georg Gottlob. 1985. Translating SQL into Relational Algebra: Optimization, Semantics, and Equivalence of SQL Queries. Transactions on Software Engineering 4 (1985), 324-345.
[7]
Donald D Chamberlin and Raymond F Boyce. 1974. SEQUEL: A Structured English Query Language. In Proc. ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control. ACM, 249-264.
[8]
Hanfeng Chen and Wai-Mee Ching. 2013. ELI: A Simple System for Array Programming. Vector, the Journal of the British APL Association 26, 1 (2013), 94-103.
[9]
Hanfeng Chen, Alexander Krolik, Erick Lavoie, and Laurie J. Hendren. 2016. Automatic Vectorization for MATLAB. In Workshop on Languages and Compilers for Parallel Computing (LCPC). 171-187.
[10]
Wai-Mee Ching and Da Zheng. 2012. Automatic Parallelization of Array-oriented Programs for a Multi-core Machine. International Journal of Parallel Programming 40, 5 (2012), 514-531.
[11]
Edgar F Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13, 6 (1970), 377-387.
[12]
Vincent Foley-Bourgon and Laurie J. Hendren. 2016. Efficiently Implementing the Copy Semantics of MATLAB's Arrays in JavaScript. In Dynamic Languages Symposium (DLS). 72-83.
[13]
Stavros Harizopoulos, Daniel J Abadi, Samuel Madden, and Michael Stonebraker. 2008. OLTP through the Looking Glass, and What We Found There. In Special Interest Group on Management of Data (SIGMOD). ACM, 981-992.
[14]
Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, Sjoerd Mullender, and Martin Kersten. 2012. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 35, 1 (2012), 40-45.
[15]
Yannis E Ioannidis. 1996. Query Optimization. Comput. Surveys 28, 1 (1996), 121-123.
[16]
ISO/IEC 9075-1:2016 2016. Information technology - Database languages - SQL - Part 1: Framework (SQL/Framework). Standard. International Organization for Standardization.
[17]
Matthias Jarke and Jurgen Koch. 1984. Query Optimization in Database Systems. Comput. Surveys 16, 2 (1984), 111-152.
[18]
Michael A. Jenkins. 1989. Q'Nial; A Portable Interpreter for the Nested Interactive Array Language, Nial. Software: Practice and Experience 19, 2 (1989), 111-126.
[19]
Alfons Kemper and Thomas Neumann. 2011. HyPer: A Hybrid OLTP & OLAP Main Memory Database System Based on Virtual Memory Snapshots. In International Conference on Data Engineering (ICDE). 195-206.
[20]
Ken Kennedy. 2001. Fast Greedy Weighted Fusion. International Journal of Parallel Programming 29, 5 (2001), 463-491.
[21]
Christoph Koch. 2014. Abstraction Without Regret in Database Systems Building: a Manifesto. IEEE Data Eng. Bull. 37, 1 (2014), 70-79.
[22]
Vineet Kumar and Laurie J. Hendren. 2014. MIX10: compiling MATLAB to X10 for high performance. In Conference on Object-oriented Programming, Systems, and Applications. 617-636.
[23]
kx. 2018. KDB+/Q Database System. Retrieved June 2018 from https://kx.com/.
[24]
Vijay Menon and Keshav Pingali. 1999. A Case for Source-level Transformations in MATLAB. In Conference on Domain-specific Languages (DSL). ACM, 53-65.
[25]
MonetDB. 2018. MonetDB Optimizer Pipelines. Retrieved June 2018 from https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/OptimizerPipelines.
[26]
Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. The Proceedings of the VLDB Endowment (PVLDB) 4, 9 (2011), 539-550.
[27]
Sriram Padmanabhan, Timothy Malkemus, Anant Jhingran, and Ramesh Agarwal. 2001. Block Oriented Processing of Relational Database Operations in Modern Computer Architectures. In International Conference on Data Engineering (ICDE). IEEE, 567-574.
[28]
David Lorge Parnas. 1972. On the Criteria to Be Used in Decomposing Systems into Modules. Commun. ACM 15, 12 (1972), 1053-1058.
[29]
Mark Raasveldt and Hannes Mühleisen. 2016. Vectorized UDFs in Column-Stores. In Proceedings of the 28th International Conference on Scientific and Statistical Database Management, SSDBM 2016, Budapest, Hungary, July 18-20, 2016. 16:1-16:12.
[30]
Amir Shaikhha, Yannis Klonatos, Lionel Parreaux, Lewis Brown, Mohammad Dashti, and Christoph Koch. 2016. How to Architect a Query Compiler. In Special Interest Group on Management of Data (SIGMOD). 1907-1922.
[31]
John Miles Smith and Philip Yen-Tang Chang. 1975. Optimizing the Performance of a Relational Algebra Database Interface. Commun. ACM 18, 10 (1975), 568-579.
[32]
Transaction Processing Performance Council. 2017. TPC Benchmark H.
[33]
Patrick Valduriez. 1987. Join Indices. ACM Transactions on Database Systems (TODS) 12, 2 (1987), 218-246.
[34]
Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. IEEE Trans. on Knowl. and Data Eng. 27, 7 (2015), 1920-1948.
[35]
Ying Zhang, Martin L. Kersten, and Stefan Manegold. 2013. SciQL: Array Data Processing Inside an RDBMS. In Special Interest Group on Management of Data (SIGMOD). 1049-1052.
[36]
Jingren Zhou and Kenneth A Ross. 2004. Buffering Databse Operations for Enhanced Instruction Cache Performance. In Special Interest Group on Management of Data (SIGMOD). 191-202.

Index Terms

  1. HorseIR: bringing array programming languages together with database query processing

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 53, Issue 8
      DLS '18
      August 2018
      100 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/3393673
      Issue’s Table of Contents
      • cover image ACM Conferences
        DLS 2018: Proceedings of the 14th ACM SIGPLAN International Symposium on Dynamic Languages
        October 2018
        100 pages
        ISBN:9781450360302
        DOI:10.1145/3276945
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 April 2020
      Published in SIGPLAN Volume 53, Issue 8

      Check for updates

      Author Tags

      1. Array programming
      2. Compiler optimizations
      3. IR
      4. SQL database queries

      Qualifiers

      • Article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)17
      • Downloads (Last 6 weeks)2
      Reflects downloads up to 13 Nov 2024

      Other Metrics

      Citations

      View Options

      Get Access

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media