Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Kepler: Robust Learning for Parametric Query Optimization

Published: 30 May 2023 Publication History

Abstract

Most existing parametric query optimization (PQO) techniques rely on traditional query optimizer cost models, which are often inaccurate and result in suboptimal query performance. We propose Kepler, an end-to-end learning-based approach to PQO that demonstrates significant speedups in query latency over a traditional query optimizer. Central to our method is Row Count Evolution (RCE), a novel plan generation algorithm based on perturbations in the sub-plan cardinality space. While previous approaches require accurate cost models, we bypass this requirement by evaluating candidate plans via actual execution data and training anML model to predict the fastest plan given parameter binding values. Our models leverage recent advances in neural network uncertainty in order to robustly predict faster plans while avoiding regressions in query performance. Experimentally, we show that Kepler achieves significant improvements in query runtime on multiple datasets on PostgreSQL.

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023

References

[1]
2022. Introduction to Aurora PostgreSQL Query Plan Management. https://aws.amazon.com/blogs/database/introduction-to-aurora-postgresql-query-plan-management/
[2]
2022. Oracle: Improving Real-World Performance Through Cursor Sharing. https://docs.oracle.com/en/database/oracle/oracle-database/18/tgsql/improving-rwp-cursor-sharing.html
[3]
2022. Parameter Sensitivity Plan optimization. https://docs.microsoft.com/en-us/sql/relational-databases/performance/parameter-sensitivity-plan-optimization?view=sql-server-ver16
[4]
2022. Skewed Data Generator for TPCH. https://github.com/gunaprsd/SkewedDataGenerator
[5]
2022. TPCH Benchmark. https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
[6]
Mert Akdere, Ugur Çetintemel, Matteo Riondato, Eli Upfal, and Stanley B Zdonik. 2012. Learning-based query performance modeling and prediction. In 2012 IEEE 28th International Conference on Data Engineering. IEEE, 390--401.
[7]
Gunes Aluç, David E DeHaan, and Ivan T Bowman. 2012. Parametric plan caching using density-based clustering. In 2012 IEEE 28th International Conference on Data Engineering. IEEE, 402--413.
[8]
Alejandro Correa Bahnsen, Djamia Aouada, and Björn Ottersten. 2014. Example-dependent cost-sensitive logistic regression for credit scoring. In 2014 13th International conference on machine learning and applications. IEEE, 263--269.
[9]
Surajit Chaudhuri, Hongrae Lee, and Vivek R Narasayya. 2010. Variance aware optimization of parameterized queries. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. 531--542.
[10]
Surajit Chaudhuri, Vivek Narasayya, and Ravi Ramamurthy. 2009. Exact cardinality query optimization for optimizer testing. Proceedings of the VLDB Endowment 2, 1 (2009), 994--1005.
[11]
Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, and Philippe Cudré-Mauroux. 2013. OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases. PVLDB 7, 4 (2013), 277--288. http://www.vldb.org/pvldb/vol7/p277-difallah.pdf
[12]
Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R Narasayya. 2019. Ai meets ai: Leveraging query executions to improve index recommendations. In Proceedings of the 2019 International Conference on Management of Data. 1241--1258.
[13]
Anshuman Dutt, Vivek Narasayya, and Surajit Chaudhuri. 2017. Leveraging re-costing for online optimization of parameterized queries with guarantees. In Proceedings of the 2017 ACM International Conference on Management of Data. 1539--1554.
[14]
Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al . 2021. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. arXiv preprint arXiv:2109.05877 (2021).
[15]
Naveen Reddy Jayant R Haritsa. 2005. Analyzing plan diagrams of database query optimizers. In Proceedings of the 31st international conference on Very large data bases. VLDB Endowment. 1228--1239.
[16]
Alexander Hepburn, Ryan McConville, Raúl Santos-Rodríguezo, Jesús Cid-Sueiro, and Dario García-García. 2018. Proper losses for learning with example-dependent costs. In Second International Workshop on Learning with Imbalanced Domains: Theory and Applications. PMLR, 52--66.
[17]
Arvind Hulgeri and S Sudarshan. 2002. Parametric query optimization for linear and piecewise linear cost functions. In VLDB'02: Proceedings of the 28th International Conference on Very Large Databases. Elsevier, 167--178.
[18]
Yannis E Ioannidis, Raymond T Ng, Kyuseok Shim, and Timos K Sellis. 1997. Parametric query optimization. The VLDB Journal 6, 2 (1997), 132--151.
[19]
Kyoungmin Kim, Jisung Jung, In Seo, Wook-Shin Han, Kangwoo Choi, and Jaehyok Chong. 2022. Learned Cardinality Estimation: An In-depth Study. In Proceedings of the 2022 International Conference on Management of Data. 1214--1227.
[20]
Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. Learned cardinalities: Estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677 (2018).
[21]
Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196 (2018).
[22]
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How good are query optimizers, really? Proceedings of the VLDB Endowment 9, 3 (2015), 204--215.
[23]
Jeremiah Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax Weiss, and Balaji Lakshminarayanan. 2020. Simple and principled uncertainty estimation with deterministic deep learning via distance awareness. Advances in Neural Information Processing Systems 33 (2020), 7498--7512.
[24]
Yao Lu, Srikanth Kandula, Arnd Christian König, and Surajit Chaudhuri. 2021. Pre-training summarization models of structured datasets for cardinality estimation. Proceedings of the VLDB Endowment 15, 3 (2021), 414--426.
[25]
Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making learned query optimization practical. In Proceedings of the 2021 International Conference on Management of Data. 1275--1288.
[26]
Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A learned query optimizer. arXiv preprint arXiv:1904.03711 (2019).
[27]
Ryan Marcus and Olga Papaemmanouil. 2018. Deep reinforcement learning for join order enumeration. In Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management. 1--4.
[28]
Ryan Marcus and Olga Papaemmanouil. 2018. Towards a hands-free query optimizer through deep learning. arXiv preprint arXiv:1809.10212 (2018).
[29]
Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2021. Flow-Loss: learning cardinality estimates that matter. arXiv preprint arXiv:2101.04964 (2021).
[30]
Richard T Snodgrass, Sabah Currim, and Young-Kyoon Suh. 2022. Have query optimizers hit the wall? The VLDB Journal 31, 1 (2022), 181--200.
[31]
Ji Sun, Jintao Zhang, Zhaoyan Sun, Guoliang Li, and Nan Tang. 2021. Learned cardinality estimation: A design space exploration and a comparative evaluation. Proceedings of the VLDB Endowment 15, 1 (2021), 85--97.
[32]
Immanuel Trummer. 2019. Exact cardinality query optimization with bounded execution cost. In proceedings of the 2019 international conference on management of data. 2--17.
[33]
Grigorios Tsoumakas and Ioannis Katakis. 2007. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM) 3, 3 (2007), 1--13.
[34]
Kapil Vaidya, Anshuman Dutt, Vivek Narasayya, and Surajit Chaudhuri. 2021. Leveraging query logs and machine learning for parametric query optimization. Proceedings of the VLDB Endowment 15, 3 (2021), 401--413.
[35]
Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. 2020. Are we ready for learned cardinality estimation? arXiv preprint arXiv:2012.06743 (2020).
[36]
Wentao Wu, Jeffrey F Naughton, and Harneet Singh. 2016. Sampling-based query re-optimization. In Proceedings of the 2016 International Conference on Management of Data. 1721--1736.
[37]
Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, and Ion Stoica. 2022. Balsa: Learning a Query Optimizer Without Expert Demonstrations. arXiv preprint arXiv:2201.01441 (2022).
[38]
Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. NeuroCard: one cardinality estimator for all tables. arXiv preprint arXiv:2006.08109 (2020).
[39]
Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep unsupervised cardinality estimation. arXiv preprint arXiv:1905.04278 (2019).

Cited By

View all
  • (2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
  • (2024)Constrained Quadratic Model for Optimizing Join OrdersProceedings of the 1st Workshop on Quantum Computing and Quantum-Inspired Technology for Data-Intensive Systems and Applications10.1145/3665225.3665447(38-44)Online publication date: 9-Jun-2024
  • (2024)ROME: Robust Query Optimization via Parallel Multi-Plan ExecutionProceedings of the ACM on Management of Data10.1145/36549732:3(1-25)Online publication date: 30-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1

Author Tags

  1. databases
  2. machine learning
  3. query optimization

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)778
  • Downloads (Last 6 weeks)98
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
  • (2024)Constrained Quadratic Model for Optimizing Join OrdersProceedings of the 1st Workshop on Quantum Computing and Quantum-Inspired Technology for Data-Intensive Systems and Applications10.1145/3665225.3665447(38-44)Online publication date: 9-Jun-2024
  • (2024)ROME: Robust Query Optimization via Parallel Multi-Plan ExecutionProceedings of the ACM on Management of Data10.1145/36549732:3(1-25)Online publication date: 30-May-2024
  • (2024)ASM: Harmonizing Autoregressive Model, Sampling, and Multi-dimensional Statistics Merging for Cardinality EstimationProceedings of the ACM on Management of Data10.1145/36393002:1(1-27)Online publication date: 26-Mar-2024
  • (2024)A Cause-Focused Query Optimizer Alert SystemProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679771(2981-2990)Online publication date: 21-Oct-2024
  • (2024)Robust Query Optimization in the Era of Machine Learning: State-of-the-Art and Future Directions2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00408(5371-5375)Online publication date: 13-May-2024
  • (2024)NeurDB: an AI-powered autonomous data systemScience China Information Sciences10.1007/s11432-024-4125-967:10Online publication date: 13-Sep-2024
  • (2024)SPQO: Learning to Safely Reuse Cached Plans for Dynamic WorkloadsDatabase Systems for Advanced Applications10.1007/978-981-97-5552-3_21(315-330)Online publication date: 1-Oct-2024
  • (2023)DB-BERT: making database tuning tools “read” the manualThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-023-00831-y33:4(1085-1104)Online publication date: 27-Dec-2023

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media