DOI: 10.1145/3183713.3196908
research-article

Query-based Workload Forecasting for Self-Driving Database Management Systems

Published: 27 May 2018

Abstract

The first step towards an autonomous database management system (DBMS) is the ability to model the target application's workload. This is necessary to allow the system to anticipate future workload needs and select the proper optimizations in a timely manner. Previous forecasting techniques model the resource utilization of the queries. Such metrics, however, change whenever the physical design of the database and the hardware resources change, thereby rendering previous forecasting models useless.
We present a robust forecasting framework called QueryBot 5000 that allows a DBMS to predict the expected arrival rate of queries in the future based on historical data. To better support highly dynamic environments, our approach uses the logical composition of queries in the workload rather than the amount of physical resources used for query execution. It provides multiple horizons (short- vs. long-term) with different aggregation intervals. We also present a clustering-based technique for reducing the total number of forecasting models to maintain. To evaluate our approach, we compare our forecasting models against other state-of-the-art models on three real-world database traces. We implemented our models in an external controller for PostgreSQL and MySQL and demonstrate their effectiveness in selecting indexes.
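As a rough illustration of forecasting on the logical composition of queries rather than on resource metrics, the sketch below groups queries by a literal-free template and counts arrivals per time interval. Everything in it is an assumption made for illustration: the regex-based `to_template` helper, the `arrival_series` function, and the `(seconds, sql)` log format are hypothetical and are not taken from QueryBot 5000. A real system would need a proper SQL parser.

```python
import re
from collections import Counter

# Hypothetical literal patterns; a regex is only a stand-in for real SQL parsing.
_STRING = re.compile(r"'(?:[^']|'')*'")      # single-quoted string literals
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")   # standalone numeric literals
_SPACE = re.compile(r"\s+")


def to_template(sql):
    """Replace literals with '?' and normalize case and whitespace."""
    sql = _STRING.sub("?", sql)
    sql = _NUMBER.sub("?", sql)
    return _SPACE.sub(" ", sql).strip().lower()


def arrival_series(log, interval_s):
    """Count arrivals per template in fixed-width time buckets.

    log: iterable of (seconds, sql) pairs, where seconds is a numeric timestamp.
    interval_s: bucket width in seconds (the aggregation interval).
    Returns {template: [count per bucket]}, aligned to the earliest bucket seen.
    """
    counts = Counter()
    for seconds, sql in log:
        counts[(to_template(sql), int(seconds // interval_s))] += 1
    if not counts:
        return {}
    first = min(bucket for _, bucket in counts)
    last = max(bucket for _, bucket in counts)
    series = {}
    for (template, bucket), n in counts.items():
        row = series.setdefault(template, [0] * (last - first + 1))
        row[bucket - first] = n
    return series


if __name__ == "__main__":
    log = [
        (0, "SELECT * FROM users WHERE id = 1"),
        (10, "select * from users where id = 42"),
        (70, "SELECT * FROM users WHERE id = 7"),
        (130, "UPDATE users SET name = 'bob' WHERE id = 3"),
    ]
    for template, row in arrival_series(log, 60).items():
        print(template, row)
```

Calling `arrival_series` with a larger `interval_s` (for example one hour instead of one minute) yields a coarser series, which is one simple way to picture short- versus long-term horizons with different aggregation intervals.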



Published In

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. autonomic computing
  2. autonomous dbms
  3. database management systems
  4. machine learning
  5. query forecasting
  6. workload prediction

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '18

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2024) Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management Systems. Proceedings of the VLDB Endowment 17:11 (3680-3693). DOI: 10.14778/3681954.3682030. Online publication date: 1-Jul-2024
  • (2024) The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-Actions. Proceedings of the VLDB Endowment 17:11 (3373-3387). DOI: 10.14778/3681954.3682007. Online publication date: 30-Aug-2024
  • (2024) Automating the Enterprise with Foundation Models. Proceedings of the VLDB Endowment 17:11 (2805-2812). DOI: 10.14778/3681954.3681964. Online publication date: 1-Jul-2024
  • (2024) Forecasting Algorithms for Intelligent Resource Scaling: An Experimental Analysis. Proceedings of the 2024 ACM Symposium on Cloud Computing (126-143). DOI: 10.1145/3698038.3698564. Online publication date: 20-Nov-2024
  • (2024) Vista: Machine Learning based Database Performance Troubleshooting Framework in Amazon RDS. Proceedings of the 2024 ACM Symposium on Cloud Computing (83-98). DOI: 10.1145/3698038.3698519. Online publication date: 20-Nov-2024
  • (2024) Challenges & Opportunities in Automating DBMS: A Qualitative Study. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (2013-2023). DOI: 10.1145/3691620.3695264. Online publication date: 27-Oct-2024
  • (2024) Self-tuning Database Systems: A Systematic Literature Review of Automatic Database Schema Design and Tuning. ACM Computing Surveys. DOI: 10.1145/3665323. Online publication date: 17-May-2024
  • (2024) Can Modern LLMs Tune and Configure LSM-based Key-Value Stores? Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems (116-123). DOI: 10.1145/3655038.3665954. Online publication date: 8-Jul-2024
  • (2024) ML-Powered Index Tuning: An Overview of Recent Progress and Open Challenges. ACM SIGMOD Record 52:4 (19-30). DOI: 10.1145/3641832.3641836. Online publication date: 19-Jan-2024
  • (2024) Sibyl: Forecasting Time-Evolving Query Workloads. Proceedings of the ACM on Management of Data 2:1 (1-27). DOI: 10.1145/3639308. Online publication date: 26-Mar-2024
