research-article

Open access

Kepler: Robust Learning for Parametric Query Optimization

Authors:

Vincent Zhuang,

Deniz Altinbüken,

Campbell FraserAuthors Info & Claims

Proceedings of the ACM on Management of Data, Volume 1, Issue 1

Article No.: 109, Pages 1 - 25

https://doi.org/10.1145/3588963

Published: 30 May 2023 Publication History

Abstract

Most existing parametric query optimization (PQO) techniques rely on traditional query optimizer cost models, which are often inaccurate and result in suboptimal query performance. We propose Kepler, an end-to-end learning-based approach to PQO that demonstrates significant speedups in query latency over a traditional query optimizer. Central to our method is Row Count Evolution (RCE), a novel plan generation algorithm based on perturbations in the sub-plan cardinality space. While previous approaches require accurate cost models, we bypass this requirement by evaluating candidate plans via actual execution data and training anML model to predict the fastest plan given parameter binding values. Our models leverage recent advances in neural network uncertainty in order to robustly predict faster plans while avoiding regressions in query performance. Experimentally, we show that Kepler achieves significant improvements in query runtime on multiple datasets on PostgreSQL.

Supplemental Material

MP4 File

Presentation video for SIGMOD 2023

Download
44.81 MB

References

[1]

2022. Introduction to Aurora PostgreSQL Query Plan Management. https://aws.amazon.com/blogs/database/introduction-to-aurora-postgresql-query-plan-management/

[2]

2022. Oracle: Improving Real-World Performance Through Cursor Sharing. https://docs.oracle.com/en/database/oracle/oracle-database/18/tgsql/improving-rwp-cursor-sharing.html

[3]

2022. Parameter Sensitivity Plan optimization. https://docs.microsoft.com/en-us/sql/relational-databases/performance/parameter-sensitivity-plan-optimization?view=sql-server-ver16

[4]

2022. Skewed Data Generator for TPCH. https://github.com/gunaprsd/SkewedDataGenerator

[5]

2022. TPCH Benchmark. https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp

[6]

Mert Akdere, Ugur Çetintemel, Matteo Riondato, Eli Upfal, and Stanley B Zdonik. 2012. Learning-based query performance modeling and prediction. In 2012 IEEE 28th International Conference on Data Engineering. IEEE, 390--401.

Digital Library

[7]

Gunes Aluç, David E DeHaan, and Ivan T Bowman. 2012. Parametric plan caching using density-based clustering. In 2012 IEEE 28th International Conference on Data Engineering. IEEE, 402--413.

Digital Library

[8]

Alejandro Correa Bahnsen, Djamia Aouada, and Björn Ottersten. 2014. Example-dependent cost-sensitive logistic regression for credit scoring. In 2014 13th International conference on machine learning and applications. IEEE, 263--269.

Digital Library

[9]

Surajit Chaudhuri, Hongrae Lee, and Vivek R Narasayya. 2010. Variance aware optimization of parameterized queries. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. 531--542.

Digital Library

[10]

Surajit Chaudhuri, Vivek Narasayya, and Ravi Ramamurthy. 2009. Exact cardinality query optimization for optimizer testing. Proceedings of the VLDB Endowment 2, 1 (2009), 994--1005.

Digital Library

[11]

Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, and Philippe Cudré-Mauroux. 2013. OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases. PVLDB 7, 4 (2013), 277--288. http://www.vldb.org/pvldb/vol7/p277-difallah.pdf

Digital Library

[12]

Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R Narasayya. 2019. Ai meets ai: Leveraging query executions to improve index recommendations. In Proceedings of the 2019 International Conference on Management of Data. 1241--1258.

Digital Library

[13]

Anshuman Dutt, Vivek Narasayya, and Surajit Chaudhuri. 2017. Leveraging re-costing for online optimization of parameterized queries with guarantees. In Proceedings of the 2017 ACM International Conference on Management of Data. 1539--1554.

Digital Library

[14]

Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al . 2021. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. arXiv preprint arXiv:2109.05877 (2021).

[15]

Naveen Reddy Jayant R Haritsa. 2005. Analyzing plan diagrams of database query optimizers. In Proceedings of the 31st international conference on Very large data bases. VLDB Endowment. 1228--1239.

[16]

Alexander Hepburn, Ryan McConville, Raúl Santos-Rodríguezo, Jesús Cid-Sueiro, and Dario García-García. 2018. Proper losses for learning with example-dependent costs. In Second International Workshop on Learning with Imbalanced Domains: Theory and Applications. PMLR, 52--66.

[17]

Arvind Hulgeri and S Sudarshan. 2002. Parametric query optimization for linear and piecewise linear cost functions. In VLDB'02: Proceedings of the 28th International Conference on Very Large Databases. Elsevier, 167--178.

[18]

Yannis E Ioannidis, Raymond T Ng, Kyuseok Shim, and Timos K Sellis. 1997. Parametric query optimization. The VLDB Journal 6, 2 (1997), 132--151.

Digital Library

[19]

Kyoungmin Kim, Jisung Jung, In Seo, Wook-Shin Han, Kangwoo Choi, and Jaehyok Chong. 2022. Learned Cardinality Estimation: An In-depth Study. In Proceedings of the 2022 International Conference on Management of Data. 1214--1227.

[20]

Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. Learned cardinalities: Estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677 (2018).

[21]

Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196 (2018).

[22]

Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How good are query optimizers, really? Proceedings of the VLDB Endowment 9, 3 (2015), 204--215.

Digital Library

[23]

Jeremiah Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax Weiss, and Balaji Lakshminarayanan. 2020. Simple and principled uncertainty estimation with deterministic deep learning via distance awareness. Advances in Neural Information Processing Systems 33 (2020), 7498--7512.

[24]

Yao Lu, Srikanth Kandula, Arnd Christian König, and Surajit Chaudhuri. 2021. Pre-training summarization models of structured datasets for cardinality estimation. Proceedings of the VLDB Endowment 15, 3 (2021), 414--426.

Digital Library

[25]

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making learned query optimization practical. In Proceedings of the 2021 International Conference on Management of Data. 1275--1288.

Digital Library

[26]

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A learned query optimizer. arXiv preprint arXiv:1904.03711 (2019).

[27]

Ryan Marcus and Olga Papaemmanouil. 2018. Deep reinforcement learning for join order enumeration. In Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management. 1--4.

Digital Library

[28]

Ryan Marcus and Olga Papaemmanouil. 2018. Towards a hands-free query optimizer through deep learning. arXiv preprint arXiv:1809.10212 (2018).

[29]

Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2021. Flow-Loss: learning cardinality estimates that matter. arXiv preprint arXiv:2101.04964 (2021).

[30]

Richard T Snodgrass, Sabah Currim, and Young-Kyoon Suh. 2022. Have query optimizers hit the wall? The VLDB Journal 31, 1 (2022), 181--200.

Digital Library

[31]

Ji Sun, Jintao Zhang, Zhaoyan Sun, Guoliang Li, and Nan Tang. 2021. Learned cardinality estimation: A design space exploration and a comparative evaluation. Proceedings of the VLDB Endowment 15, 1 (2021), 85--97.

Digital Library

[32]

Immanuel Trummer. 2019. Exact cardinality query optimization with bounded execution cost. In proceedings of the 2019 international conference on management of data. 2--17.

Digital Library

[33]

Grigorios Tsoumakas and Ioannis Katakis. 2007. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM) 3, 3 (2007), 1--13.

[34]

Kapil Vaidya, Anshuman Dutt, Vivek Narasayya, and Surajit Chaudhuri. 2021. Leveraging query logs and machine learning for parametric query optimization. Proceedings of the VLDB Endowment 15, 3 (2021), 401--413.

Digital Library

[35]

Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. 2020. Are we ready for learned cardinality estimation? arXiv preprint arXiv:2012.06743 (2020).

[36]

Wentao Wu, Jeffrey F Naughton, and Harneet Singh. 2016. Sampling-based query re-optimization. In Proceedings of the 2016 International Conference on Management of Data. 1721--1736.

Digital Library

[37]

Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, and Ion Stoica. 2022. Balsa: Learning a Query Optimizer Without Expert Demonstrations. arXiv preprint arXiv:2201.01441 (2022).

[38]

Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. NeuroCard: one cardinality estimator for all tables. arXiv preprint arXiv:2006.08109 (2020).

[39]

Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep unsupervised cardinality estimation. arXiv preprint arXiv:1905.04278 (2019).

Cited By

Zhang WLim WButrovich MPavlo A(2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
https://dl.acm.org/doi/10.14778/3681954.3682007
Saxena PSabek ISpedalieri F(2024)Constrained Quadratic Model for Optimizing Join OrdersProceedings of the 1st Workshop on Quantum Computing and Quantum-Inspired Technology for Data-Intensive Systems and Applications10.1145/3665225.3665447(38-44)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3665225.3665447
Wei ZTrummer I(2024)ROME: Robust Query Optimization via Parallel Multi-Plan ExecutionProceedings of the ACM on Management of Data10.1145/36549732:3(1-25)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654973
Show More Cited By

Index Terms

Kepler: Robust Learning for Parametric Query Optimization
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Query optimization

Recommendations

Multi-objective parametric query optimization

Classical query optimization compares query plans according to one cost metric and associates each plan with a constant cost value. In this paper, we introduce the multi-objective parametric query optimization (MPQO) problem where query plans are ...
Query Optimization for Ontology-Mediated Query Answering
WWW '24: Proceedings of the ACM Web Conference 2024

Ontology-mediated query answering (OMQA) consists in asking database queries on knowledge bases (KBs); a KB is a set of facts called the KB's database, which is described by domain knowledge called the KB's ontology. A widely-investigated OMQA technique ...
Query containment under bag and bag-set semantics

Conjunctive queries (CQs) are at the core of query languages encountered in many logic-based research fields such as AI, or database systems. The majority of existing work assumes set semantics but often in real applications the manipulation of ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data

Proceedings of the ACM on Management of Data Volume 1, Issue 1

PACMMOD

May 2023

2807 pages

EISSN:2836-6573

DOI:10.1145/3603164

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Issue’s Table of Contents

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023

Published in PACMMOD Volume 1, Issue 1

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

7
Total Citations
View Citations
961
Total Downloads

Downloads (Last 12 months)761
Downloads (Last 6 weeks)67

Reflects downloads up to 19 Sep 2024

Other Metrics

View Author Metrics

Citations

Cited By

Zhang WLim WButrovich MPavlo A(2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
https://dl.acm.org/doi/10.14778/3681954.3682007
Saxena PSabek ISpedalieri F(2024)Constrained Quadratic Model for Optimizing Join OrdersProceedings of the 1st Workshop on Quantum Computing and Quantum-Inspired Technology for Data-Intensive Systems and Applications10.1145/3665225.3665447(38-44)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3665225.3665447
Wei ZTrummer I(2024)ROME: Robust Query Optimization via Parallel Multi-Plan ExecutionProceedings of the ACM on Management of Data10.1145/36549732:3(1-25)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654973
Kim KLee SKim IHan W(2024)ASM: Harmonizing Autoregressive Model, Sampling, and Multi-dimensional Statistics Merging for Cardinality EstimationProceedings of the ACM on Management of Data10.1145/36393002:1(1-27)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3639300
Kamali AKantere VZuzarte C(2024)Robust Query Optimization in the Era of Machine Learning: State-of-the-Art and Future Directions2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00408(5371-5375)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00408
Ooi BCai SChen GShen YTan KWu YXiao XXing NYue CZeng LZhang MZhao Z(2024)NeurDB: an AI-powered autonomous data systemScience China Information Sciences10.1007/s11432-024-4125-967:10Online publication date: 13-Sep-2024
https://doi.org/10.1007/s11432-024-4125-9
Trummer I(2023)DB-BERT: making database tuning tools “read” the manualThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-023-00831-y33:4(1085-1104)Online publication date: 27-Dec-2023
https://dl.acm.org/doi/10.1007/s00778-023-00831-y

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents