Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Eraser: Eliminating Performance Regression on Learned Query Optimizer

Published: 02 May 2024 Publication History

Abstract

Efficient query optimization is crucial for database management systems. Recently, machine learning models have been applied in query optimizers to generate better plans, but the unpredictable performance regressions prevent them from being truly applicable. To be more specific, while a learned query optimizer commonly outperforms the traditional query optimizer on average for a workload of queries, its performance regression seems inevitable for some queries due to model under-fitting and difficulty in generalization. In this paper, we propose a system called Eraser to resolve this problem. Eraser aims at eliminating performance regressions while still attaining considerable overall performance improvement. To this end, Eraser applies a two-stage strategy to estimate the model accuracy for each candidate plan, and helps the learned query optimizer select more reliable plans. The first stage serves as a coarse-grained filter that removes all highly risky plans with feature values that are seen for the first time. The second stage clusters plans in a more fine-grained manner and evaluates each cluster according to the prediction quality of learned query optimizers for selecting the final execution plan. Eraser can be deployed as a plugin on top of any learned query optimizer. We implement Eraser and demonstrate its superiority on PostgreSQL and Spark. In our experiments, Eraser eliminates most of the regressions while bringing very little negative impact on the overall performance of learned query optimizers, no matter whether they perform better or worse than the traditional query optimizer. Meanwhile, it is adaptive to dynamic settings and generally applicable to different database systems.

References

[1]
2022. HyperQO implementation. https://github.com/yxfish13/HyperQO.
[2]
2022. Lero implementation. https://github.com/Blondig/Lero-on-PostgreSQL.
[3]
2022. PerfGuard implementation. https://github.com/WoodyBryant/Perfguard.
[4]
Omer Achrack, Raizy Kellerman, and Ouriel Barzilay. 2020. Multi-loss sub-ensembles for accurate classification with uncertainty estimation. ArXiv Preprint ArXiv:2010.01917 (2020).
[5]
Remmelt Ammerlaan, Gilbert Antonius, Marc Friedman, HM Sajjad Hossain, Alekh Jindal, Peter Orenberg, Hiren Patel, Shi Qiao, Vijay Ramani, Lucas Rosenblatt, et al. 2021. PerfGuard: Deploying ML-for-systems without performance regressions, almost! Proceedings of the VLDB Endowment 14, 13 (2021), 3362--3375.
[6]
Surajit Chaudhuri and Vivek R Narasayya. 1997. An efficient, cost-driven index selection tool for Microsoft SQL server. In VLDB, Vol. 97. San Francisco, 146--155.
[7]
Transaction Processing Performance Council(TPC). 2021. TPC-H vesion 2 and version 3. http://www.tpc.org/tpch/.
[8]
Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R Narasayya. 2019. Ai meets ai: Leveraging query executions to improve index recommendations. In Proceedings of the 2019 International Conference on Management of Data. 1241--1258.
[9]
Tansel Dokeroglu, Murat Ali Bayir, and Ahmet Cosar. 2015. Robust heuristic algorithms for exploiting the common tasks of relational cloud database queries. Applied Soft Computing 30 (2015), 72--82.
[10]
Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Tan Wei Liang, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, and Bin Cui. 2021. Cardinality estimation in DBMS: A comprehensive benchmark evaluation. PVLDB 15, 4 (2021), 752--765.
[11]
Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al. 2021. Cardinality estimation in DBMS: A comprehensive benchmark evaluation. ArXiv Preprint ArXiv:2109.05877 (2021).
[12]
Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2020. DeepDB: Learn from data, not from queries! PVLDB 13, 7, 992--1005.
[13]
Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E Hopcroft, and Kilian Q Weinberger. 2017. Snapshot ensembles: Train 1, get m for free. ArXiv Preprint ArXiv.00109 (2017).
[14]
Harish D Pooja N Darera Jayant and R Haritsa. 2008. Identifying robust plans through plan diagram reduction. In VLDB, Vol. 24. Citeseer, 25.
[15]
Alekh Jindal, Konstantinos Karanasos, Sriram Rao, and Hiren Patel. 2018. Selecting subexpressions to materialize at datacenter scale. Proceedings of the VLDB Endowment 11, 7 (2018), 800--812.
[16]
Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. Learned cardinalities: Estimating correlated joins with deep learning. ArXiv Preprint ArXiv:1809.00677 (2018).
[17]
Mayuresh Kunjir and Shivnath Babu. 2020. Black or white? How to develop an autotuner for memory-based analytics. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1667--1683.
[18]
Meghdad Kurmanji and Peter Triantafillou. 2023. Detect, distill and update: Learned DB systems facing out of distribution data. Proceedings of the ACM on Management of Data 1, 1 (2023), 1--27.
[19]
Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. 2017. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems 30 (2017).
[20]
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How good are query optimizers, really? Proceedings of the VLDB Endowment 9, 3 (2015), 204--215.
[21]
Beibin Li, Yao Lu, and Srikanth Kandula. 2022. Warper: Efficiently adapting learned cardinality estimators to data and workload drifts. In Proceedings of the 2022 International Conference on Management of Data. 1920--1933.
[22]
Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2019. Qtune: A query-aware database tuning system with deep reinforcement learning. Proceedings of the VLDB Endowment 12, 12 (2019), 2118--2130.
[23]
Martin Luhring, Kai-Uwe Sattler, Karsten Schmidt, and Eike Schallehn. 2007. Autonomous management of soft indexes. In 2007 IEEE 23rd International Conference on Data Engineering Workshop. IEEE, 450--458.
[24]
Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making learned query optimization practical. In SIGMOD. 1275--1288.
[25]
Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A learned query optimizer. ArXiv Preprint ArXiv:1904.03711 (2019).
[26]
Ryan Marcus and Olga Papaemmanouil. 2019. Plan-structured deep neural network models for query performance prediction. ArXiv Preprint ArXiv:1902.00132 (2019).
[27]
Lili Mou, Ge Li, Lu Zhang, Tao Wang, and Zhi Jin. 2016. Convolutional neural networks over tree structures for programming language processing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30.
[28]
Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S Sathiya Keerthi. 2019. An empirical analysis of deep learning for cardinality estimation. ArXiv Preprint ArXiv:1905.06425 (2019).
[29]
Wendel Góes Pedrozo, Júlio Cesar Nievola, and Deborah Carvalho Ribeiro. 2018. An adaptive approach for index tuning with learning classifier systems on hybrid storage environments. In Hybrid Artificial Intelligent Systems: 13th International Conference, HAIS 2018, Oviedo, Spain, June 20--22, 2018, Proceedings 13. Springer, 716--729.
[30]
Zahra Sadri, Le Gruenwald, and Eleazar Leal. 2020. Online index selection using deep reinforcement learning for a cluster database. In 2020 IEEE 36th International Conference on Data Engineering Workshops (ICDEW). IEEE, 158--161.
[31]
Karl Schnaitter, Serge Abiteboul, Tova Milo, and Neoklis Polyzotis. 2007. On-line index selection for shifting workloads. In 2007 IEEE 23rd International Conference on Data Engineering Workshop. IEEE, 459--468.
[32]
P Griffiths Selinger, Morton M Astrahan, Donald D Chamberlin, Raymond A Lorie, and Thomas G Price. 1979. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data. 23--34.
[33]
Ji Sun and Guoliang Li. 2019. An end-to-end learning-based cost estimator. ArXiv Preprint ArXiv:1906.02560 (2019).
[34]
Jian Tan, Tieying Zhang, Feifei Li, Jie Chen, Qixing Zheng, Ping Zhang, Honglin Qiao, Yue Shi, Wei Cao, and Rui Zhang. 2019. Ibtune: Individualized buffer tuning for large-scale cloud databases. Proceedings of the VLDB Endowment 12, 10 (2019), 1221--1234.
[35]
Kostas Tzoumas, Amol Deshpande, and Christian S Jensen. 2011. Lightweight graphical models for selectivity estimation without independence assumptions. Proceedings of the VLDB Endowment 4, 11 (2011), 852--863.
[36]
Matias Valdenegro-Toro. 2019. Deep sub-ensembles for fast uncertainty estimation in image classification. ArXiv Preprint ArXiv:1910.08168 (2019).
[37]
Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 11 (2008).
[38]
Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. 2020. Are we ready for learned cardinality estimation? ArXiv Preprint ArXiv:2012.06743 (2020).
[39]
Yeming Wen, Dustin Tran, and Jimmy Ba. 2020. Batchensemble: An alternative approach to efficient ensemble and lifelong learning. ArXiv Preprint ArXiv:2002.06715 (2020).
[40]
Lianggui Weng, Rong Zhu, Di Wu, Bolin Ding, Bolong Zheng, and Jingren Zhou. 2023. Eraser: Eliminating performance regression on learned query optimizer. https://github.com/duoyw/Eraser/tree/main/paper (2023).
[41]
Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, and Ion Stoica. 2022. Balsa: Learning a query optimizer without expert demonstrations. In SIGMOD Conference. ACM, 931--944.
[42]
Xiang Yu, Chengliang Chai, Guoliang Li, and Jiabin Liu. 2022. Cost-based or learning-based? A hybrid query optimizer for query plan selection. Proc. VLDB Endow. 15, 13 (2022), 3924--3936.
[43]
Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, Ion Stoica, et al. 2010. Spark: Cluster computing with working sets. HotCloud 10, 10--10 (2010), 95.
[44]
Erkang Zhu, Dong Deng, Fatemeh Nargesian, and Renée J Miller. 2019. Josie: Overlap set similarity search for finding joinable tables in data lakes. In Proceedings of the 2019 International Conference on Management of Data. 847--864.
[45]
Rong Zhu, Wei Chen, Bolin Ding, Xingguang Chen, Andreas Pfadler, Ziniu Wu, and Jingren Zhou. 2022. Lero: A learning-to-rank query optimizer. Proc. VLDB Endow. 16, 6 (2022), 1466--1479.
[46]
Rong Zhu, Ziniu Wu, Chengliang Chai, Andreas Pfadler, Bolin Ding, Guoliang Li, and Jingren Zhou. 2022. Learned query optimizer: At the forefront of AI-driven databases. In EDBT. 1--4.
[47]
Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, and Bin Cui. 2021. FLAT: Fast, lightweight and accurate method for cardinality estimation. PVLDB 14, 9 (2021), 1489--1502.
[48]
Daniel C Zilio, Calisto Zuzarte, Sam Lightstone, Wenbin Ma, Guy M Lohman, Roberta J Cochrane, Hamid Pirahesh, Latha Colby, Jarek Gryz, Eric Alton, et al. 2004. Recommending materialized views and indexes with the IBM DB2 design advisor. In International Conference on Autonomic Computing, 2004. Proceedings. IEEE, 180--187.

Cited By

View all
  • (2024)PilotScope: Steering Databases with Machine Learning DriversProceedings of the VLDB Endowment10.14778/3641204.364120917:5(980-993)Online publication date: 2-May-2024
  • (2024)Low Rank Approximation for Learned Query OptimizationProceedings of the Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management10.1145/3663742.3663974(1-5)Online publication date: 14-Jun-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 17, Issue 5
January 2024
233 pages
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 02 May 2024
Published in PVLDB Volume 17, Issue 5

Check for updates

Badges

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)153
  • Downloads (Last 6 weeks)28
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)PilotScope: Steering Databases with Machine Learning DriversProceedings of the VLDB Endowment10.14778/3641204.364120917:5(980-993)Online publication date: 2-May-2024
  • (2024)Low Rank Approximation for Learned Query OptimizationProceedings of the Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management10.1145/3663742.3663974(1-5)Online publication date: 14-Jun-2024

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media