Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Bao: Making Learned Query Optimization Practical

Published: 01 June 2022 Publication History

Abstract

Recent efforts applying machine learning techniques to query optimization have shown few practical gains due to substantive training overhead, inability to adapt to changes, and poor tail performance. Motivated by these difficulties, we introduce Bao (the Bandit optimizer). Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints. Bao combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm. As a result, Bao automatically learns from its mistakes and adapts to changes in query workloads, data, and schema. Experimentally, we demonstrate that Bao can quickly learn strategies that improve end-to-end query execution performance, including tail latency, for several workloads containing longrunning queries. In cloud environments, we show that Bao can offer both reduced costs and better performance compared with a commercial system.

References

[1]
Google Cloud Platform, https://cloud.google.com/.
[2]
C. Anagnostopoulos and P. Triantafillou. Learning to accurately COUNT with query-driven predictive analytics. In 2015 IEEE International Conference on Big Data (Big Data), Big Data '15, pages 14--23, Oct. 2015.
[3]
O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, NIPS'11, 2011.
[4]
M. Collier and H. U. Llorens. Deep Contextual Multi-armed Bandits. arXiv:1807.09809 [cs, stat], July 2018.
[5]
B. Ding, S. Das, R. Marcus, W. Wu, S. Chaudhuri, and V. R. Narasayya. AI Meets AI: Leveraging Query Executions to Improve Index Recommendations. In 38th ACM Special Interest Group in Data Management, SIGMOD '19, 2019.
[6]
J. Duggan, O. Papaemmanouil, U. Cetintemel, and E. Upfal. Contender: A Resource Modeling Approach for Concurrent Query Performance Prediction. In Proceedings of the 14th International Conference on Extending Database Technology, EDBT '14, pages 109--120, 2014.
[7]
R. C. Fernandez and S. Madden. Termite: A System for Tunneling Through Heterogeneous Data. In AIDM @ SIGMOD 2019, aiDM '19, 2019.
[8]
J. Gottschlich, A. Solar-Lezama, N. Tatbul, M. Carbin, M. Rinard, R. Barzilay, S. Amarasinghe, J. B. Tenenbaum, and T. Mattson. The three pillars of machine programming. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, MAPL 2018, pages 69--80, Philadelphia, PA, USA, June 2018. Association for Computing Machinery.
[9]
R. B. Guo and K. Daudjee. Research challenges in deep reinforcement learning-based join query optimization. In Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM '20, pages 1--6, Portland, Oregon, June 2020. Association for Computing Machinery.
[10]
S. Jain, B. Howe, J. Yan, and T. Cruanes. Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics. arXiv:1801.05613 [cs], Feb. 2018.
[11]
T. Kaftan, M. Balazinska, A. Cheung, and J. Gehrke. Cuttlefish: A Lightweight Primitive for Adaptive Query Processing. arXiv preprint, Feb. 2018.
[12]
A. Kipf, T. Kipf, B. Radke, V. Leis, P. Boncz, and A. Kemper. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In 9th Biennial Conference on Innovative Data Systems Research, CIDR '19, 2019.
[13]
T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, New York, NY, USA, 2018. ACM.
[14]
S. Krishnan, Z. Yang, K. Goldberg, J. Hellerstein, and I. Stoica. Learning to Optimize Join Queries With Deep Reinforcement Learning. arXiv:1808.03196 [cs], Aug. 2018.
[15]
V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. How Good Are Query Optimizers, Really? PVLDB, 9(3):204--215, 2015.
[16]
H. Liu, M. Xu, Z. Yu, V. Corvinelli, and C. Zuzarte. Cardinality Estimation Using Neural Networks. In Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, CASCON '15, pages 53--59, Riverton, NJ, USA, 2015. IBM Corp.
[17]
G. Lohman. Is Query Optimization a ?"Solved" Problem? In ACM SIGMOD Blog, ACM Blog '14, 2014.
[18]
R. Marcus, P. Negi, H. Mao, N. Tatbul, M. Alizadeh, and T. Kraska. Bao: Making Learned Query Optimization Practical. In Proceedings of the 2021 International Conference on Management of Data, SIGMOD '21, China, June 2021.
[19]
R. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Tatbul. Neo: A Learned Query Optimizer. PVLDB, 12(11):1705--1718, 2019.
[20]
R. Marcus and O. Papaemmanouil. Deep Reinforcement Learning for Join Order Enumeration. In First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM @ SIGMOD '18, Houston, TX, 2018.
[21]
T. M. Mitchell. The Need for Biases in Learning Generalizations. Technical report, 1980.
[22]
L. Mou, G. Li, L. Zhang, T. Wang, and Z. Jin. Convolutional Neural Networks over Tree Structures for Programming Language Processing. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI '16, pages 1287--1293, Phoenix, Arizona, 2016. AAAI Press.
[23]
P. Negi, M. Interlandi, R. Marcus, M. Alizadeh, T. Kraska, M. Friedman, and A. Jindal. Steering Query Optimizers: A Practical Take on Big Data Workloads. In Proceedings of the 2021 International Conference on Management of Data, SIGMOD '21, pages 2557--2569, Virtual Event China, June 2021. ACM.
[24]
P. Negi, R. Marcus, H. Mao, N. Tatbul, T. Kraska, and M. Alizadeh. Cost-Guided Cardinality Estimation: Focus Where it Matters. In Workshop on Self-Managing Databases, SMDB @ ICDE '20, 2020.
[25]
J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi. Learning State Representations for Query Optimization with Deep Reinforcement Learning. In 2nd Workshop on Data Managmeent for End-to-End Machine Learning, DEEM '18, 2018.
[26]
J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi. An Empirical Analysis of Deep Learning for Cardinality Estimation. arXiv:1905.06425 [cs], Sept. 2019.
[27]
Y. Park, S. Zhong, and B. Mozafari. QuickSel: Quick Selectivity Learning with Mixture Models. arXiv:1812.10568 [cs], Dec. 2018.
[28]
A. Pavlo, E. P. C. Jones, and S. Zdonik. On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems. PVLDB, 5(2):86--96, 2011.
[29]
A. G. Read. DeWitt clauses: Can we protect purchasers without hurting Microsoft. Rev. Litig., 25:387, 2006.
[30]
P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access Path Selection in a Relational Database Management System. In J. Mylopolous and M. Brodie, editors, SIGMOD '79, SIGMOD '79, pages 511--522, San Francisco (CA), 1979. Morgan Kaufmann.
[31]
Shrainik Jain, Jiaqi Yan, Thiery Cruanes, and Bill Howe. Database-Agnostic Workload Management. In 9th Biennial Conference on Innovative Data Systems Research, CIDR '19, 2019.
[32]
M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - DB2's LEarning Optimizer. In VLDB, VLDB '01, pages 19--28, 2001.
[33]
J. Sun and G. Li. An end-to-end learning-based cost estimator. Proceedings of the VLDB Endowment, 13(3):307--319, Nov. 2019.
[34]
W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika, 1933.
[35]
I. Trummer, S. Moseley, D. Maram, S. Jo, and J. Antonakakis. SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning. PVLDB, 11(12):2074--2077, 2018.
[36]
K. Tzoumas, T. Sellis, and C. Jensen. A Reinforcement Learning Approach for Adaptive Query Processing. Technical Reports, June 2008.
[37]
Z. Yang, A. Kamsetty, S. Luan, E. Liang, Y. Duan, X. Chen, and I. Stoica. NeuroCard: One Cardinality Estimator for All Tables. arXiv:2006.08109 [cs], June 2020.
[38]
Z. Yang, E. Liang, A. Kamsetty, C. Wu, Y. Duan, X. Chen, P. Abbeel, J. M. Hellerstein, S. Krishnan, and I. Stoica. Deep unsupervised cardinality estimation. Proceedings of the VLDB Endowment, 13(3):279--292, Nov. 2019.
[39]
L. Zhou. A Survey on Contextual Multi-armed Bandits. arXiv:1508.03326 [cs], Feb. 2016.

Cited By

View all
  • (2024)Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRADProceedings of the VLDB Endowment10.14778/3681954.368202617:11(3629-3643)Online publication date: 1-Jul-2024
  • (2024)Sample-Efficient Cardinality Estimation Using Geometric Deep LearningProceedings of the VLDB Endowment10.14778/3636218.363622917:4(740-752)Online publication date: 5-Mar-2024
  • (2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 51, Issue 1
March 2022
90 pages
ISSN:0163-5808
DOI:10.1145/3542700
Issue’s Table of Contents
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2022
Published in SIGMOD Volume 51, Issue 1

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)70
  • Downloads (Last 6 weeks)19
Reflects downloads up to 01 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRADProceedings of the VLDB Endowment10.14778/3681954.368202617:11(3629-3643)Online publication date: 1-Jul-2024
  • (2024)Sample-Efficient Cardinality Estimation Using Geometric Deep LearningProceedings of the VLDB Endowment10.14778/3636218.363622917:4(740-752)Online publication date: 5-Mar-2024
  • (2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
  • (2024)Sibyl: Forecasting Time-Evolving Query WorkloadsProceedings of the ACM on Management of Data10.1145/36393082:1(1-27)Online publication date: 26-Mar-2024
  • (2024)Modeling Shifting Workloads for Learned Database SystemsProceedings of the ACM on Management of Data10.1145/36392932:1(1-27)Online publication date: 26-Mar-2024
  • (2024)Demonstrating λ-Tune: Exploiting Large Language Models for Workload-Adaptive Database System TuningCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654751(508-511)Online publication date: 9-Jun-2024
  • (2024)Learned Optimizer for Online Approximate Query Processing in Data ExplorationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.336198936:8(3977-3991)Online publication date: 1-Aug-2024
  • (2024)LBSC: A Cost-Aware Caching Framework for Cloud Databases2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00373(4911-4924)Online publication date: 13-May-2024
  • (2024)GLO: Towards Generalized Learned Query Optimization2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00368(4843-4855)Online publication date: 13-May-2024
  • (2024)A survey on hybrid transactional and analytical processingThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00858-933:5(1485-1515)Online publication date: 1-Sep-2024
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media