research-article

Open access

Steering Query Optimizers: A Practical Take on Big Data Workloads

Authors:

Alekh JindalAuthors Info & Claims

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

Pages 2557 - 2569

https://doi.org/10.1145/3448016.3457568

Published: 18 June 2021 Publication History

PDF eReader

Abstract

In recent years, there has been tremendous interest in research that applies machine learning to database systems. Being one of the most complex components of a DBMS, query optimizers could benefit from adaptive policies that are learned systematically from the data and the query workload. Recent research has brought up novel ideas towards a learned query optimizer, however these ideas have not been evaluated on a commercial query processor or on large scale, real-world workloads. In this paper, we take the approach used by Marcus et al. in Bao and adapt it to SCOPE, a big data system used internally at Microsoft. Along the way, we solve multiple new challenges: we define how optimizer rules affect final query plans by introducing the concept of a rule signature, we devise a pipeline computing interesting rule configurations for recurring jobs, and we define a new learning problem allowing us to apply such interesting rule configurations to previously unseen jobs. We evaluate the efficacy of the approach on production workloads that include 150K daily jobs. Our results show that alternative rule configurations can generate plans with lower costs, and this can translate to runtime latency savings of 7-30% on average and up to 90% for a non trivial subset of the workload.

Supplementary Material

MP4 File (3448016.3457568.mp4)

In recent years, there has been tremendous interest in research that applies machine learning to database systems. Being one of the most complex components of a DBMS, query optimizers could benefit from adaptive policies that are learned systematically from the data and the query workload. Recent research has brought up novel ideas towards a learned query optimizer, however these ideas have not been evaluated on a commercial query processor or on large scale, real-world workloads. In this paper, we take the approach used by Marcus et al. in Bao and adapt it to SCOPE, a big data system used internally at Microsoft. Along the way, we solve multiple new challenges: we define how optimizer rules affect final query plans by introducing the concept of a Rule Signature; we devise a pipeline computing interesting rules configuration for recurring jobs; and we define a new learning problem allowing us to apply such interesting rule configurations to previously unseen jobs. We evaluate the efficacy of the approach on production workloads that include 150 jobs. Our results show that alternative rule configurations can generate plans with lower costs, and this can translate to runtime latency savings on average of 7 30% and up to 90% for a non trivial subset of the workload.

Download
52.01 MB

References

[1]

Sameer Agarwal, Srikanth Kandula, Nicolas Bruno, Ming-Chuan Wu, Ion Stoica, and Jingren Zhou. 2012. Re-Optimizing Data-Parallel Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (San Jose, CA) (NSDI'12). USENIX Association, USA, 21.

Abstract

Supplementary Material

References

Cited By

Index Terms

Recommendations

Query optimization in multidatabase systems

Building query optimizers with combinators

Dynamic Materialization of Query Views for Data Warehouse Workloads

Comments

Information

Published In

Sponsors

Publisher

Publication History

Check for updates

Badges

Author Tags

Qualifiers

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Get Access

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations