research-article

LEON: A New Framework for ML-Aided Query Optimization

Authors:

Kai ZhengAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 16, Issue 9

Pages 2261 - 2273

https://doi.org/10.14778/3598581.3598597

Published: 01 May 2023 Publication History

Abstract

Query optimization has long been a fundamental yet challenging topic in the database field. With the prosperity of machine learning (ML), some recent works have shown the advantages of reinforcement learning (RL) based learned query optimizer. However, they suffer from fundamental limitations due to the data-driven nature of ML. Motivated by the ML characteristics and database maturity, we propose LEON-a framework for ML-aidEd query OptimizatioN. LEON improves the expert query optimizer to self-adjust to the particular deployment by leveraging ML and the fundamental knowledge in the expert query optimizer. To train the ML model, a pairwise ranking objective is proposed, which is substantially different from the previous regression objective. To help the optimizer to escape the local minima and avoid failure, a ranking and uncertainty-based exploration strategy is proposed, which discovers the valuable plans to aid the optimizer. Furthermore, an ML model-guided pruning is proposed to increase the planning efficiency without hurting too much performance. Extensive experiments offer evidence that the proposed framework can outperform the state-of-the-art methods in terms of end-to-end latency performance, training efficiency, and stability.

References

[1]

Mert Akdere, Ugur Çetintemel, Matteo Riondato, Eli Upfal, and Stanley B Zdonik. 2012. Learning-based query performance modeling and prediction. In 2012 IEEE 28th International Conference on Data Engineering. IEEE, 390--401.

Digital Library

[2]

balsa project. 2022. Balsa source code. https://github.com/balsa-project/balsa.

[3]

Debabrota Basu, Qian Lin, Weidong Chen, Hoang Tam Vo, Zihong Yuan, Pierre Senellart, and Stéphane Bressan. 2015. Cost-model oblivious database tuning with reinforcement learning. In Database and Expert Systems Applications.

[4]

Debabrota Basu, Qian Lin, Weidong Chen, Hoang Tam Vo, Zihong Yuan, Pierre Senellart, and Stéphane Bressan. 2016. Regularized cost-model oblivious database tuning with reinforcement learning. In Transactions on Large-Scale Data-and Knowledge-Centered Systems XXVIII. Springer, 96--132.

[5]

Homanga Bharadhwaj. 2019. Meta-learning for user cold-start recommendation. In 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 1--8.

[6]

Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th international conference on Machine learning. 129--136.

Digital Library

[7]

Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R Narasayya. 2019. Ai meets ai: Leveraging query executions to improve index recommendations. In Proceedings of the 2019 International Conference on Management of Data. 1241--1258.

Digital Library

[8]

Jingtao Ding, Yuhan Quan, Quanming Yao, Yong Li, and Depeng Jin. 2020. Simplify and robustify negative sampling for implicit collaborative filtering. Advances in Neural Information Processing Systems 33 (2020), 1094--1105.

[9]

Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, and Surajit Chaudhuri. 2019. Selectivity estimation for range predicates using lightweight models. Proceedings of the VLDB Endowment (2019).

Digital Library

[10]

Lutz Fröhlich. 2022. PostgreSQL.

[11]

Goetz Graefe. 1995. The cascades framework for query optimization. IEEE Data Eng. Bull. 18, 3 (1995), 19--29.

[12]

Goetz Graefe and William J McKenna. 1993. The volcano optimizer generator: Extensibility and efficient search. In Proceedings of IEEE 9th international conference on data engineering. IEEE, 209--218.

[13]

Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al. 2021. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. arXiv preprint arXiv:2109.05877 (2021).

[14]

Laurent Valentin Jospin, Hamid Laga, Farid Boussaid, Wray Buntine, and Mohammed Bennamoun. 2022. Hands-on Bayesian neural networks---A tutorial for deep learning users. IEEE Computational Intelligence Magazine 17, 2 (2022).

[15]

Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196 (2018).

[16]

Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How good are query optimizers, really? Proceedings of the VLDB Endowment 9, 3 (2015), 204--215.

Digital Library

[17]

Guoliang Li, Xuanhe Zhou, and Lei Cao. 2021. AI meets database: AI4DB and DB4AI. In Proceedings of the 2021 International Conference on Management of Data. 2859--2866.

Digital Library

[18]

Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2019. Qtune: A query-aware database tuning system with deep reinforcement learning. Proceedings of the VLDB Endowment (2019).

Digital Library

[19]

Lihao Liu, Qi Dou, Hao Chen, Jing Qin, and Pheng-Ann Heng. 2019. Multi-task deep model with margin ranking loss for lung nodule analysis. IEEE transactions on medical imaging 39, 3 (2019), 718--728.

[20]

Lin Ma, Bailu Ding, Sudipto Das, and Adith Swaminathan. 2020. Active learning for ML enhanced database systems. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.

Digital Library

[21]

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2022. Bao: Making learned query optimization practical. ACM SIGMOD Record 51, 1 (2022), 6--13.

Digital Library

[22]

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A learned query optimizer. arXiv preprint arXiv:1904.03711 (2019).

[23]

Ryan Marcus and Olga Papaemmanouil. 2018. Deep reinforcement learning for join order enumeration. In Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management. 1--4.

Digital Library

[24]

Ryan Marcus and Olga Papaemmanouil. 2019. Plan-structured deep neural network models for query performance prediction. arXiv:1902.00132 (2019).

[25]

Volker Markl, Guy M Lohman, and Vijayshankar Raman. 2003. LEO: An autonomic query optimizer for DB2. IBM Systems Journal (2003).

[26]

Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. 2009. Preventing bad plans by bounding the impact of cardinality estimation errors. Proceedings of the VLDB Endowment (2009).

Digital Library

[27]

Vikram Mullachery, Aniruddh Khera, and Amir Husain. 2018. Bayesian neural networks. arXiv preprint arXiv:1801.07710 (2018).

[28]

Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, and Alekh Jindal. 2021. Steering query optimizers: A practical take on big data workloads. In Proceedings of the 2021 International Conference on Management of Data.

Digital Library

[29]

Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2021. Flow-Loss: learning cardinality estimates that matter. arXiv preprint arXiv:2101.04964 (2021).

[30]

Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S Sathiya Keerthi. 2019. An empirical analysis of deep learning for cardinality estimation. arXiv preprint arXiv:1905.06425 (2019).

[31]

Sinno Jialin Pan and Qiang Yang. 2009. A survey on transfer learning. IEEE Transactions on knowledge and data engineering (2009).

[32]

RyanMarcus. 2021. BAO source code. https://github.com/learnedsystems/BaoForPostgreSQL.

[33]

John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In International conference on machine learning. PMLR, 1889--1897.

[34]

P Griffiths Selinger, Morton M Astrahan, Donald D Chamberlin, Raymond A Lorie, and Thomas G Price. 1979. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD international conference on Management of data. 23--34.

Digital Library

[35]

Ankur Sharma, Felix Martin Schuhknecht, and Jens Dittrich. 2018. The case for automatic database administration using deep reinforcement learning. arXiv preprint arXiv:1801.05643 (2018).

[36]

Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, and Wangchao Le. 2020. Cost models for big data query processing: Learning, retrofitting, and our findings. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.

Digital Library

[37]

Ji Sun and Guoliang Li. 2019. An end-to-end learning-based cost estimator. arXiv preprint arXiv:1906.02560 (2019).

[38]

Dana Van Aken, Dongsheng Yang, Sebastien Brillard, Ari Fiorino, Bohan Zhang, Christian Bilien, and Andrew Pavlo. 2021. An inquiry into machine learning-based automatic configuration tuning services on real-world database management systems. Proceedings of the VLDB Endowment (2021).

Digital Library

[39]

Jiayi Wang, Chengliang Chai, Jiabin Liu, and Guoliang Li. 2021. FACE: a normalizing flow based cardinality estimator. Proceedings of the VLDB Endowment (2021).

Digital Library

[40]

Xinshao Wang, Yang Hua, Elyor Kodirov, Guosheng Hu, Romain Garnier, and Neil M Robertson. 2019. Ranked list loss for deep metric learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5207--5216.

[41]

Ziniu Wu and Amir Shaikhha. 2020. BayesCard: A Unified Bayesian Framework for Cardinality Estimation. arXiv e-prints (2020).

[42]

Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, and Ion Stoica. 2022. Balsa: Learning a Query Optimizer Without Expert Demonstrations. arXiv preprint arXiv:2201.01441 (2022).

[43]

Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. NeuroCard: one cardinality estimator for all tables. arXiv preprint arXiv:2006.08109 (2020).

[44]

Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2021. NeuroCard: One Cardinality Estimator for All Tables. Proceedings of the VLDB Endowment 14, 1, 61--73.

Digital Library

[45]

Xue Ying. 2019. An overview of overfitting and its solutions. In Journal of physics: Conference series, Vol. 1168. IOP Publishing, 022022.

[46]

Xiang Yu, Guoliang Li, Chengliang Chai, and Nan Tang. 2020. Reinforcement learning with tree-lstm for join order selection. In 2020 IEEE 36th International Conference on Data Engineering (ICDE).

[47]

Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, Zhili Xiao, Bin Cheng, Jiashu Xing, Yangtao Wang, Tianheng Cheng, Li Liu, et al. 2019. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In Proceedings of the 2019 International Conference on Management of Data.

Digital Library

[48]

Weinan Zhang, Tianqi Chen, Jun Wang, and Yong Yu. 2013. Optimizing top-n collaborative filtering via dynamic negative item sampling. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. 785--788.

Digital Library

[49]

Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, and Alekh Jindal. 2022. Deploying a Steered Query Optimizer in Production at Microsoft. In Proceedings of the 2022 International Conference on Management of Data.

Digital Library

[50]

Xuanhe Zhou, Chengliang Chai, Guoliang Li, and Ji Sun. 2020. Database meets artificial intelligence: A survey. IEEE Transactions on Knowledge and Data Engineering (2020).

[51]

Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, and Bin Cui. 2020. FLAT: fast, lightweight and accurate method for cardinality estimation. arXiv preprint arXiv:2011.09022 (2020).

Cited By

Ye CDuan HZhang HWu YDai G(2024)Learned Query Optimization by Constraint-Based Query Plan AugmentationMathematics10.3390/math1219310212:19(3102)Online publication date: 3-Oct-2024
https://doi.org/10.3390/math12193102
Lehmann CSulimov PStockinger K(2024)Is Your Learned Query Optimizer Behaving As You Expect? A Machine Learning PerspectiveProceedings of the VLDB Endowment10.14778/3654621.365462517:7(1565-1577)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.14778/3654621.3654625
Zhu RWeng LWei WWu DPeng JWang YDing BLian DZheng BZhou J(2024)PilotScope: Steering Databases with Machine Learning DriversProceedings of the VLDB Endowment10.14778/3641204.364120917:5(980-993)Online publication date: 2-May-2024
https://dl.acm.org/doi/10.14778/3641204.3641209
Show More Cited By

Recommendations

Query Optimization for Ontology-Mediated Query Answering
WWW '24: Proceedings of the ACM Web Conference 2024

Ontology-mediated query answering (OMQA) consists in asking database queries on knowledge bases (KBs); a KB is a set of facts called the KB's database, which is described by domain knowledge called the KB's ontology. A widely-investigated OMQA technique ...
Synopses for query optimization: A space-complexity perspective
Special Issue: SIGMOD/PODS 2004

Database systems use precomputed synopses of data to estimate the cost of alternative plans during query optimization. A number of alternative synopsis structures have been proposed, but histograms are by far the most commonly used. While histograms ...
Synopses for query optimization: a space-complexity perspective
PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Database systems use precomputed synopses of data to estimate the cost of alternative plans during query optimization. A number of alternative synopsis structures have been proposed, but histograms are by far the most commonly used. While histograms ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 16, Issue 9

May 2023

330 pages

ISSN:2150-8097

Editors:
Georgia Koutrika
Athena Research Center
,
Jun Yang
Duke University

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 May 2023

Published in PVLDB Volume 16, Issue 9

Check for updates

Badges

Artifacts Available / v1.1

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

10
Total Citations
View Citations
281
Total Downloads

Downloads (Last 12 months)217
Downloads (Last 6 weeks)33

Reflects downloads up to 01 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Ye CDuan HZhang HWu YDai G(2024)Learned Query Optimization by Constraint-Based Query Plan AugmentationMathematics10.3390/math1219310212:19(3102)Online publication date: 3-Oct-2024
https://doi.org/10.3390/math12193102
Lehmann CSulimov PStockinger K(2024)Is Your Learned Query Optimizer Behaving As You Expect? A Machine Learning PerspectiveProceedings of the VLDB Endowment10.14778/3654621.365462517:7(1565-1577)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.14778/3654621.3654625
Zhu RWeng LWei WWu DPeng JWang YDing BLian DZheng BZhou J(2024)PilotScope: Steering Databases with Machine Learning DriversProceedings of the VLDB Endowment10.14778/3641204.364120917:5(980-993)Online publication date: 2-May-2024
https://dl.acm.org/doi/10.14778/3641204.3641209
Ye RLiang ZChen XLiu SZheng KSerra ESpezzano F(2024)A Cause-Focused Query Optimizer Alert SystemProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679771(2981-2990)Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3679771
Chen HChen XLiang ZFeng XXie JSu HZheng KSerra ESpezzano F(2024)Towards Online and Safe Configuration Tuning with Semi-supervised Anomaly DetectionProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679700(218-227)Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3679700
Fan SHou MXi RMa WSerra ESpezzano F(2024)Precision Meets Resilience: Cross-Database Generalization with Uncertainty Quantification for Robust Cost EstimationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679632(581-590)Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3679632
Liang ZChen XXia YYe RChen HXie JZheng K(2024)DACE: A Database-Agnostic Cost Estimator2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00374(4925-4937)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00374
Feng JLi ZChen Q(2024)Towards Exploratory Query Optimization for Template-Based SQL Workloads2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00019(151-164)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00019
Liang ZChen XZhao YXie JZeng KZheng K(2023)Efficient Cardinality and Cost Estimation with Bidirectional Compressor-based Ensemble Learning2023 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM58522.2023.00048(388-397)Online publication date: 1-Dec-2023
https://doi.org/10.1109/ICDM58522.2023.00048
Wrembel R(2023)Data Integration Revitalized: From Data Warehouse Through Data Lake to Data MeshDatabase and Expert Systems Applications10.1007/978-3-031-39847-6_1(3-18)Online publication date: 28-Aug-2023
https://dl.acm.org/doi/10.1007/978-3-031-39847-6_1

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents