research-article

Eraser: Eliminating Performance Regression on Learned Query Optimizer

Authors:

Jingren ZhouAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 17, Issue 5

Pages 926 - 938

https://doi.org/10.14778/3641204.3641205

Published: 01 January 2024 Publication History

Abstract

Efficient query optimization is crucial for database management systems. Recently, machine learning models have been applied in query optimizers to generate better plans, but the unpredictable performance regressions prevent them from being truly applicable. To be more specific, while a learned query optimizer commonly outperforms the traditional query optimizer on average for a workload of queries, its performance regression seems inevitable for some queries due to model under-fitting and difficulty in generalization. In this paper, we propose a system called Eraser to resolve this problem. Eraser aims at eliminating performance regressions while still attaining considerable overall performance improvement. To this end, Eraser applies a two-stage strategy to estimate the model accuracy for each candidate plan, and helps the learned query optimizer select more reliable plans. The first stage serves as a coarse-grained filter that removes all highly risky plans with feature values that are seen for the first time. The second stage clusters plans in a more fine-grained manner and evaluates each cluster according to the prediction quality of learned query optimizers for selecting the final execution plan. Eraser can be deployed as a plugin on top of any learned query optimizer. We implement Eraser and demonstrate its superiority on PostgreSQL and Spark. In our experiments, Eraser eliminates most of the regressions while bringing very little negative impact on the overall performance of learned query optimizers, no matter whether they perform better or worse than the traditional query optimizer. Meanwhile, it is adaptive to dynamic settings and generally applicable to different database systems.

References

[1]

2022. HyperQO implementation. https://github.com/yxfish13/HyperQO.

[2]

2022. Lero implementation. https://github.com/Blondig/Lero-on-PostgreSQL.

[3]

2022. PerfGuard implementation. https://github.com/WoodyBryant/Perfguard.

[4]

Omer Achrack, Raizy Kellerman, and Ouriel Barzilay. 2020. Multi-loss sub-ensembles for accurate classification with uncertainty estimation. ArXiv Preprint ArXiv:2010.01917 (2020).

[5]

Remmelt Ammerlaan, Gilbert Antonius, Marc Friedman, HM Sajjad Hossain, Alekh Jindal, Peter Orenberg, Hiren Patel, Shi Qiao, Vijay Ramani, Lucas Rosenblatt, et al. 2021. PerfGuard: Deploying ML-for-systems without performance regressions, almost! Proceedings of the VLDB Endowment 14, 13 (2021), 3362--3375.

[6]

Surajit Chaudhuri and Vivek R Narasayya. 1997. An efficient, cost-driven index selection tool for Microsoft SQL server. In VLDB, Vol. 97. San Francisco, 146--155.

[7]

Transaction Processing Performance Council(TPC). 2021. TPC-H vesion 2 and version 3. http://www.tpc.org/tpch/.

[8]

Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R Narasayya. 2019. Ai meets ai: Leveraging query executions to improve index recommendations. In Proceedings of the 2019 International Conference on Management of Data. 1241--1258.

Digital Library

[9]

Tansel Dokeroglu, Murat Ali Bayir, and Ahmet Cosar. 2015. Robust heuristic algorithms for exploiting the common tasks of relational cloud database queries. Applied Soft Computing 30 (2015), 72--82.

Digital Library

[10]

Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Tan Wei Liang, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, and Bin Cui. 2021. Cardinality estimation in DBMS: A comprehensive benchmark evaluation. PVLDB 15, 4 (2021), 752--765.

Digital Library

[11]

Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al. 2021. Cardinality estimation in DBMS: A comprehensive benchmark evaluation. ArXiv Preprint ArXiv:2109.05877 (2021).

[12]

Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2020. DeepDB: Learn from data, not from queries! PVLDB 13, 7, 992--1005.

[13]

Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E Hopcroft, and Kilian Q Weinberger. 2017. Snapshot ensembles: Train 1, get m for free. ArXiv Preprint ArXiv.00109 (2017).

[14]

Harish D Pooja N Darera Jayant and R Haritsa. 2008. Identifying robust plans through plan diagram reduction. In VLDB, Vol. 24. Citeseer, 25.

[15]

Alekh Jindal, Konstantinos Karanasos, Sriram Rao, and Hiren Patel. 2018. Selecting subexpressions to materialize at datacenter scale. Proceedings of the VLDB Endowment 11, 7 (2018), 800--812.

Digital Library

[16]

Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. Learned cardinalities: Estimating correlated joins with deep learning. ArXiv Preprint ArXiv:1809.00677 (2018).

[17]

Mayuresh Kunjir and Shivnath Babu. 2020. Black or white? How to develop an autotuner for memory-based analytics. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1667--1683.

Digital Library

[18]

Meghdad Kurmanji and Peter Triantafillou. 2023. Detect, distill and update: Learned DB systems facing out of distribution data. Proceedings of the ACM on Management of Data 1, 1 (2023), 1--27.

Digital Library

[19]

Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. 2017. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems 30 (2017).

[20]

Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How good are query optimizers, really? Proceedings of the VLDB Endowment 9, 3 (2015), 204--215.

Digital Library

[21]

Beibin Li, Yao Lu, and Srikanth Kandula. 2022. Warper: Efficiently adapting learned cardinality estimators to data and workload drifts. In Proceedings of the 2022 International Conference on Management of Data. 1920--1933.

Digital Library

[22]

Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2019. Qtune: A query-aware database tuning system with deep reinforcement learning. Proceedings of the VLDB Endowment 12, 12 (2019), 2118--2130.

Digital Library

[23]

Martin Luhring, Kai-Uwe Sattler, Karsten Schmidt, and Eike Schallehn. 2007. Autonomous management of soft indexes. In 2007 IEEE 23rd International Conference on Data Engineering Workshop. IEEE, 450--458.

Digital Library

[24]

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making learned query optimization practical. In SIGMOD. 1275--1288.

[25]

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A learned query optimizer. ArXiv Preprint ArXiv:1904.03711 (2019).

[26]

Ryan Marcus and Olga Papaemmanouil. 2019. Plan-structured deep neural network models for query performance prediction. ArXiv Preprint ArXiv:1902.00132 (2019).

[27]

Lili Mou, Ge Li, Lu Zhang, Tao Wang, and Zhi Jin. 2016. Convolutional neural networks over tree structures for programming language processing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30.

[28]

Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S Sathiya Keerthi. 2019. An empirical analysis of deep learning for cardinality estimation. ArXiv Preprint ArXiv:1905.06425 (2019).

[29]

Wendel Góes Pedrozo, Júlio Cesar Nievola, and Deborah Carvalho Ribeiro. 2018. An adaptive approach for index tuning with learning classifier systems on hybrid storage environments. In Hybrid Artificial Intelligent Systems: 13th International Conference, HAIS 2018, Oviedo, Spain, June 20--22, 2018, Proceedings 13. Springer, 716--729.

Digital Library

[30]

Zahra Sadri, Le Gruenwald, and Eleazar Leal. 2020. Online index selection using deep reinforcement learning for a cluster database. In 2020 IEEE 36th International Conference on Data Engineering Workshops (ICDEW). IEEE, 158--161.

[31]

Karl Schnaitter, Serge Abiteboul, Tova Milo, and Neoklis Polyzotis. 2007. On-line index selection for shifting workloads. In 2007 IEEE 23rd International Conference on Data Engineering Workshop. IEEE, 459--468.

Digital Library

[32]

P Griffiths Selinger, Morton M Astrahan, Donald D Chamberlin, Raymond A Lorie, and Thomas G Price. 1979. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data. 23--34.

Digital Library

[33]

Ji Sun and Guoliang Li. 2019. An end-to-end learning-based cost estimator. ArXiv Preprint ArXiv:1906.02560 (2019).

[34]

Jian Tan, Tieying Zhang, Feifei Li, Jie Chen, Qixing Zheng, Ping Zhang, Honglin Qiao, Yue Shi, Wei Cao, and Rui Zhang. 2019. Ibtune: Individualized buffer tuning for large-scale cloud databases. Proceedings of the VLDB Endowment 12, 10 (2019), 1221--1234.

Digital Library

[35]

Kostas Tzoumas, Amol Deshpande, and Christian S Jensen. 2011. Lightweight graphical models for selectivity estimation without independence assumptions. Proceedings of the VLDB Endowment 4, 11 (2011), 852--863.

Digital Library

[36]

Matias Valdenegro-Toro. 2019. Deep sub-ensembles for fast uncertainty estimation in image classification. ArXiv Preprint ArXiv:1910.08168 (2019).

[37]

Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 11 (2008).

[38]

Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. 2020. Are we ready for learned cardinality estimation? ArXiv Preprint ArXiv:2012.06743 (2020).

[39]

Yeming Wen, Dustin Tran, and Jimmy Ba. 2020. Batchensemble: An alternative approach to efficient ensemble and lifelong learning. ArXiv Preprint ArXiv:2002.06715 (2020).

[40]

Lianggui Weng, Rong Zhu, Di Wu, Bolin Ding, Bolong Zheng, and Jingren Zhou. 2023. Eraser: Eliminating performance regression on learned query optimizer. https://github.com/duoyw/Eraser/tree/main/paper (2023).

[41]

Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, and Ion Stoica. 2022. Balsa: Learning a query optimizer without expert demonstrations. In SIGMOD Conference. ACM, 931--944.

Digital Library

[42]

Xiang Yu, Chengliang Chai, Guoliang Li, and Jiabin Liu. 2022. Cost-based or learning-based? A hybrid query optimizer for query plan selection. Proc. VLDB Endow. 15, 13 (2022), 3924--3936.

Digital Library

[43]

Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, Ion Stoica, et al. 2010. Spark: Cluster computing with working sets. HotCloud 10, 10--10 (2010), 95.

Digital Library

[44]

Erkang Zhu, Dong Deng, Fatemeh Nargesian, and Renée J Miller. 2019. Josie: Overlap set similarity search for finding joinable tables in data lakes. In Proceedings of the 2019 International Conference on Management of Data. 847--864.

Digital Library

[45]

Rong Zhu, Wei Chen, Bolin Ding, Xingguang Chen, Andreas Pfadler, Ziniu Wu, and Jingren Zhou. 2022. Lero: A learning-to-rank query optimizer. Proc. VLDB Endow. 16, 6 (2022), 1466--1479.

Digital Library

[46]

Rong Zhu, Ziniu Wu, Chengliang Chai, Andreas Pfadler, Bolin Ding, Guoliang Li, and Jingren Zhou. 2022. Learned query optimizer: At the forefront of AI-driven databases. In EDBT. 1--4.

[47]

Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, and Bin Cui. 2021. FLAT: Fast, lightweight and accurate method for cardinality estimation. PVLDB 14, 9 (2021), 1489--1502.

Digital Library

[48]

Daniel C Zilio, Calisto Zuzarte, Sam Lightstone, Wenbin Ma, Guy M Lohman, Roberta J Cochrane, Hamid Pirahesh, Latha Colby, Jarek Gryz, Eric Alton, et al. 2004. Recommending materialized views and indexes with the IBM DB2 design advisor. In International Conference on Autonomic Computing, 2004. Proceedings. IEEE, 180--187.

Cited By

Zhu RWeng LWei WWu DPeng JWang YDing BLian DZheng BZhou J(2024)PilotScope: Steering Databases with Machine Learning DriversProceedings of the VLDB Endowment10.14778/3641204.364120917:5(980-993)Online publication date: 2-May-2024
https://dl.acm.org/doi/10.14778/3641204.3641209
Yi ZTian YIves ZMarcus R(2024)Low Rank Approximation for Learned Query OptimizationProceedings of the Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management10.1145/3663742.3663974(1-5)Online publication date: 14-Jun-2024
https://dl.acm.org/doi/10.1145/3663742.3663974

Recommendations

Learned Query Optimizer: What is New and What is Next
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data

In recent times, learned query optimizer has becoming a hot research topic in learned databases. It serves as the most suitable experimental plots for utilizing numerous machine-learning techniques and exhibits its superiority with enough evidence. In ...
Learned Optimizer for Online Approximate Query Processing in Data Exploration
In the interactive data exploration, approximate query processing (AQP) can be used to quickly return query results at the cost of accuracy. For online AQP, the sampler can be treated as an operator in the query plan. During the query optimization for AQP,...
DeepO: A Learned Query Optimizer
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data

Query optimization is crucial for the query performance of database systems. Despite decades of efforts from both research and industrial communities, query optimization remains one of the most challenging problems. Thanks to the advances in artificial ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 17, Issue 5

January 2024

233 pages

Editors:
Meihui Zhang
Beijing Institute of Technology
,
Cyrus Shahabi
University of Southern California

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 January 2024

Published in PVLDB Volume 17, Issue 5

Check for updates

Badges

Artifacts Available / v1.1

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
227
Total Downloads

Downloads (Last 12 months)227
Downloads (Last 6 weeks)18

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Zhu RWeng LWei WWu DPeng JWang YDing BLian DZheng BZhou J(2024)PilotScope: Steering Databases with Machine Learning DriversProceedings of the VLDB Endowment10.14778/3641204.364120917:5(980-993)Online publication date: 2-May-2024
https://dl.acm.org/doi/10.14778/3641204.3641209
Yi ZTian YIves ZMarcus R(2024)Low Rank Approximation for Learned Query OptimizationProceedings of the Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management10.1145/3663742.3663974(1-5)Online publication date: 14-Jun-2024
https://dl.acm.org/doi/10.1145/3663742.3663974

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents