Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

iBTune: individualized buffer tuning for large-scale cloud databases

Published: 01 June 2019 Publication History

Abstract

Tuning the buffer size appropriately is critical to the performance of a cloud database, since memory is usually the resource bottleneck. For large-scale databases supporting heterogeneous applications, configuring the individual buffer sizes for a significant number of database instances presents a scalability challenge. Manual optimization is neither efficient nor effective, and even not feasible for large cloud clusters, especially when the workload may dynamically change on each instance. The difficulty lies in the fact that each database instance requires a different buffer size that is highly individualized, subject to the constraint of the total buffer memory space. It is imperative to resort to algorithms that automatically orchestrate the buffer pool tuning for the entire database instances.
To this end, we design iBTune that has been deployed for more than 10, 000 OLTP cloud database instances in our production system. Specifically, it leverages the information from similar workloads to find out the tolerable miss ratio of each instance. Then, it utilizes the relationship between miss ratios and allocated memory sizes to individually optimize the target buffer pool sizes.
To provide a guaranteed level of service level agreement (SLA), we design a pairwise deep neural network that uses features from measurements on pairs of instances to predict the upper bounds of the request response times. A target buffer pool size can be adjusted only when the predicted response time upper bound is in a safe limit. The successful deployment on a production environment, which safely reduces the memory footprint by more than 17% compared to the original system that relies on manual configurations, demonstrates the effectiveness of our solution.

References

[1]
Docker. https://www.docker.com.
[2]
M. Akdere, U. Çetintemel, M. Riondato, E. Upfal, and S. B. Zdonik. Learning-based query performance modeling and prediction. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, ICDE '12, pages 390--401, Washington, DC, USA, 2012. IEEE Computer Society.
[3]
M. Arlitt and L. W. C. Internet web servers: Workload characterization and performance implications. In IEEE/ACM Transaction on Networking, October 1997.
[4]
C. Berthet. Approximation of LRU caches miss rate: Application to power-law popularities. arXiv:1705.10738, 2017.
[5]
L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and zipf-like distributions: evidence and implications. In Proceedings of the 18th Conference on Information Communications, 1999.
[6]
T. Chen and C. Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785--794. ACM, 2016.
[7]
F. J. Corbato. A paging experiment with the multics system. MIT Project MAC Report, MAC-M-384, 1968.
[8]
G. Dán and N. Carlsson. Power-law revisited: Large scale measurement study of p2p content popularity. In Proceedings of the 9th International Conference on Peer-to-peer Systems, IPTPS'10, pages 12--12, Berkeley, CA, USA, 2010. USENIX Association.
[9]
S. Das, F. Li, V. R. Narasayya, and A. C. König. Automated demand-driven resource scaling in relational database-as-a-service. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16, pages 1923--1934, New York, NY, USA, 2016. ACM.
[10]
K. G. Derpanis. Overview of the ransac algorithm. Image Rochester NY, 4(1):2--3, 2010.
[11]
E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE transactions on pattern analysis and machine intelligence, 35(11):2765--2781, 2013.
[12]
C. Fricker, P. Robert, and J. Roberts. A versatile and accurate approximation for lru cache performance. In Proceedings of the 24th International Teletraffic Congress, page 8. International Teletraffic Congress, 2012.
[13]
A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Predicting multiple metrics for queries: Better decisions enabled by machine learning. In Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE '09, pages 592--603, Washington, DC, USA, 2009. IEEE Computer Society.
[14]
S. Garcia, J. Derrac, J. Cano, and F. Herrera. Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE transactions on pattern analysis and machine intelligence, 34(3):417--435, 2012.
[15]
Y. Geng, S. Liu, Z. Yin, A. Naik, B. Prabhakar, M. Rosenblum, and A. Vahdat. Exploiting a natural network effect for scalable, fine-grained clock synchronization. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), pages 81--94, Renton, WA, 2018. USENIX Association.
[16]
P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine learning, 63(1):3--42, 2006.
[17]
G. Huang, X. Cheng, J. Wang, Y. Wang, D. He, T. Zhang, F. Li, S. Wang, W. Cao, and Q. Li. X-engine: An optimized storage engine for large-scale e-commerce transaction processing. In Proceedings of the 2019 ACM International Conference on Management of Data, SIGMOD '19. ACM, 2019.
[18]
P. J. Huber. Robust estimation of a location parameter. Ann. Math. Statist., 35(1):73--101, 03 1964.
[19]
P. R. Jelenković. Least-recently-used caching with Zipfs law requests. In The Sixth INFORMS Telecommunications Conference. Boca Raton, Florida, 2002.
[20]
A. Kadiyala and A. Kumar. Applications of python to evaluate the performance of bagging methods. Environmental Progress & Sustainable Energy, 37(5):1555--1559, 2018.
[21]
T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The case for learned index structures. SIGMOD, pages 489--504, 2018.
[22]
S. Krishnan, Z. Yang, K. Goldberg, J. Hellerstein, and I. Stoica. Learning to Optimize Join Queries With Deep Reinforcement Learning. ArXiv e-prints, Aug. 2018.
[23]
L. Lamport. The part-time parliament. ACM Transactions on Computer Systems (TOCS), 16(2):133--169, 1998.
[24]
D. Lee, J. Choi, J.-H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim. On the existence of a spectrum of policies that subsumes the least recently used (lru) and least frequently used (lfu) policies. In Proceedings of the 1999 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '99, pages 134--143, New York, NY, USA, 1999. ACM.
[25]
Z. L. Li, M. C.-J. Liang, W. He, L. Zhu, W. Dai, J. Jiang, and G. Sun. Metis: Robustly tuning tail latencies of cloud systems. In ATC (USENIX Annual Technical Conference). USENIX, July 2018.
[26]
A. Liaw, M. Wiener, et al. Classification and regression by randomforest. R news, 2(3):18--22, 2002.
[27]
L. Ma, D. Van Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon. Query-based workload forecasting for self-driving database management systems. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 631--645, New York, NY, USA, 2018. ACM.
[28]
V. Narasayya, I. Menache, M. Singh, F. Li, M. Syamala, and S. Chaudhuri. Sharing buffer pool memory in multi-tenant relational database-as-a-service. PVLDB, 8(7):726--737, 2015.
[29]
D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting dbms. In 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, pages 239--248, Sept 2005.
[30]
E. J. O'neil, P. E. O'neil, and G. Weikum. The lru-k page replacement algorithm for database disk buffering. ACM SIGMOD Record, 22(2):297--306, 1993.
[31]
A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, S. Toor, D. V. Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang. Self-driving database management systems. In Proceedings of the 2017 Conference on Innovative Data Systems Research, CIDR '17, 2017.
[32]
J. Petrovic. Using Memcached for data distribution in industrial environment. In Proceeding ICONS '08 Proceedings of the Third International Conference on Systems, pages 368--372, April 2008.
[33]
S. Podlipnig and L. Böszörmenyi. A survey of web cache replacement strategies. ACM Computing Surveys (CSUR), 35(4):374--398, Dec. 2003.
[34]
L. Rokach and O. Z. Maimon. Data mining with decision trees: theory and applications, volume 69. World scientific, 2008.
[35]
D. L. Shrestha and D. P. Solomatine. Experiments with adaboost. rt, an improved boosting scheme for regression. Neural computation, 18(7):1678--1710, 2006.
[36]
Y. Smaragdakis, S. Kaplan, and P. Wilson. The eelru adaptive replacement algorithm. Perform. Eval., 53(2):93--123, July 2003.
[37]
A. J. Storm, C. Garcia-Arellano, S. S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in db2. In Proceedings of the 32Nd International Conference on Very Large Data Bases, VLDB '06, pages 1081--1092. VLDB Endowment, 2006.
[38]
T. Sugimoto and N. Miyoshi. On the asymptotics of fault probability in least-recently-used caching with Zipf-type request distribution. Random Structures & Algorithms, 29(3):296--323, 2006.
[39]
R. Taft, N. El-Sayed, M. Serafini, Y. Lu, A. Aboulnaga, M. Stonebraker, R. Mayerhofer, and F. Andrade. P-store: An elastic database system with predictive provisioning. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 205--219, New York, NY, USA, 2018. ACM.
[40]
J. Tan, G. Quan, K. Ji, and N. Shroff. On resource pooling and separation for LRU caching. In Proceedings of the 2018 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science. ACM, 2018.
[41]
D. N. Tran, P. C. Huynh, Y. C. Tay, and A. K. H. Tung. A new approach to dynamic self-tuning of database buffers. Trans. Storage, 4(1):3:1--3:25, May 2008.
[42]
D. Van Aken, A. Pavlo, G. J. Gordon, and B. Zhang. Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, pages 1009--1024, New York, NY, USA, 2017. ACM.
[43]
J. Wang. A survey of web caching schemes for the internet. SIGCOMM Computer Communication Review, 29(5):36--46, Oct. 1999.
[44]
W. Wu, Y. Chi, H. Hacígümüş, and J. F. Naughton. Towards predicting query execution time for concurrent and dynamic database workloads. PVLDB, 6(10):925--936, 2013.
[45]
Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny. Characterizing facebook's memcached workload. IEEE Internet Computing, 18(2):41--49, 2014.
[46]
Y. Yang and J. Zhu. Write skew and zipf distribution: Evidence and implications. ACM Trans. Storage, 12(4):21:1--21:19, June 2016.
[47]
J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 2061--2064. ACM, 2009.
[48]
H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301--320, 2005.

Cited By

View all
  • (2024)Eraser: Eliminating Performance Regression on Learned Query OptimizerProceedings of the VLDB Endowment10.14778/3641204.364120517:5(926-938)Online publication date: 1-Jan-2024
  • (2024)KnobTune: A Dynamic Database Configuration Tuning Strategy Leveraging Historical Workload SimilaritiesProceedings of the International Conference on Computing, Machine Learning and Data Science10.1145/3661725.3661734(1-8)Online publication date: 12-Apr-2024
  • (2024)Nautilus: A Benchmarking Platform for DBMS Knob TuningProceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning10.1145/3650203.3663336(72-76)Online publication date: 9-Jun-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 12, Issue 10
June 2019
177 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 June 2019
Published in PVLDB Volume 12, Issue 10

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)58
  • Downloads (Last 6 weeks)4
Reflects downloads up to 24 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Eraser: Eliminating Performance Regression on Learned Query OptimizerProceedings of the VLDB Endowment10.14778/3641204.364120517:5(926-938)Online publication date: 1-Jan-2024
  • (2024)KnobTune: A Dynamic Database Configuration Tuning Strategy Leveraging Historical Workload SimilaritiesProceedings of the International Conference on Computing, Machine Learning and Data Science10.1145/3661725.3661734(1-8)Online publication date: 12-Apr-2024
  • (2024)Nautilus: A Benchmarking Platform for DBMS Knob TuningProceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning10.1145/3650203.3663336(72-76)Online publication date: 9-Jun-2024
  • (2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
  • (2024)SimCost: cost-effective resource provision prediction and recommendation for spark workloadsDistributed and Parallel Databases10.1007/s10619-023-07436-y42:1(73-102)Online publication date: 1-Mar-2024
  • (2023)A Sample-Aware Database Tuning System With Deep Reinforcement LearningJournal of Database Management10.4018/JDM.33351935:1(1-25)Online publication date: 9-Nov-2023
  • (2023)MagicScaler: Uncertainty-Aware, Predictive AutoscalingProceedings of the VLDB Endowment10.14778/3611540.361156616:12(3808-3821)Online publication date: 1-Aug-2023
  • (2023)Auto-Tuning with Reinforcement Learning for Permissioned Blockchain SystemsProceedings of the VLDB Endowment10.14778/3579075.357907616:5(1000-1012)Online publication date: 1-Jan-2023
  • (2023)Workload-Aware Performance Tuning for Multimodel Databases Based on Deep Reinforcement LearningInternational Journal of Intelligent Systems10.1155/2023/88351112023Online publication date: 1-Jan-2023
  • (2023)EFTuner: A Bi-Objective Configuration Parameter Auto-Tuning Method Towards Energy-Efficient Big Data ProcessingProceedings of the 14th Asia-Pacific Symposium on Internetware10.1145/3609437.3609443(292-301)Online publication date: 4-Aug-2023
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media