research-article

iBTune: individualized buffer tuning for large-scale cloud databases

Authors:

Rui ZhangAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 12, Issue 10

Pages 1221 - 1234

https://doi.org/10.14778/3339490.3339503

Published: 01 June 2019 Publication History

Abstract

Tuning the buffer size appropriately is critical to the performance of a cloud database, since memory is usually the resource bottleneck. For large-scale databases supporting heterogeneous applications, configuring the individual buffer sizes for a significant number of database instances presents a scalability challenge. Manual optimization is neither efficient nor effective, and even not feasible for large cloud clusters, especially when the workload may dynamically change on each instance. The difficulty lies in the fact that each database instance requires a different buffer size that is highly individualized, subject to the constraint of the total buffer memory space. It is imperative to resort to algorithms that automatically orchestrate the buffer pool tuning for the entire database instances.

To this end, we design iBTune that has been deployed for more than 10, 000 OLTP cloud database instances in our production system. Specifically, it leverages the information from similar workloads to find out the tolerable miss ratio of each instance. Then, it utilizes the relationship between miss ratios and allocated memory sizes to individually optimize the target buffer pool sizes.

To provide a guaranteed level of service level agreement (SLA), we design a pairwise deep neural network that uses features from measurements on pairs of instances to predict the upper bounds of the request response times. A target buffer pool size can be adjusted only when the predicted response time upper bound is in a safe limit. The successful deployment on a production environment, which safely reduces the memory footprint by more than 17% compared to the original system that relies on manual configurations, demonstrates the effectiveness of our solution.

References

[1]

Docker. https://www.docker.com.

[2]

M. Akdere, U. Çetintemel, M. Riondato, E. Upfal, and S. B. Zdonik. Learning-based query performance modeling and prediction. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, ICDE '12, pages 390--401, Washington, DC, USA, 2012. IEEE Computer Society.

Digital Library

[3]

M. Arlitt and L. W. C. Internet web servers: Workload characterization and performance implications. In IEEE/ACM Transaction on Networking, October 1997.

Digital Library

[4]

C. Berthet. Approximation of LRU caches miss rate: Application to power-law popularities. arXiv:1705.10738, 2017.

[5]

L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and zipf-like distributions: evidence and implications. In Proceedings of the 18th Conference on Information Communications, 1999.

[6]

T. Chen and C. Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785--794. ACM, 2016.

Digital Library

[7]

F. J. Corbato. A paging experiment with the multics system. MIT Project MAC Report, MAC-M-384, 1968.

[8]

G. Dán and N. Carlsson. Power-law revisited: Large scale measurement study of p2p content popularity. In Proceedings of the 9th International Conference on Peer-to-peer Systems, IPTPS'10, pages 12--12, Berkeley, CA, USA, 2010. USENIX Association.

Digital Library

[9]

S. Das, F. Li, V. R. Narasayya, and A. C. König. Automated demand-driven resource scaling in relational database-as-a-service. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16, pages 1923--1934, New York, NY, USA, 2016. ACM.

Digital Library

[10]

K. G. Derpanis. Overview of the ransac algorithm. Image Rochester NY, 4(1):2--3, 2010.

[11]

E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE transactions on pattern analysis and machine intelligence, 35(11):2765--2781, 2013.

Digital Library

[12]

C. Fricker, P. Robert, and J. Roberts. A versatile and accurate approximation for lru cache performance. In Proceedings of the 24th International Teletraffic Congress, page 8. International Teletraffic Congress, 2012.

Digital Library

[13]

A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Predicting multiple metrics for queries: Better decisions enabled by machine learning. In Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE '09, pages 592--603, Washington, DC, USA, 2009. IEEE Computer Society.

Digital Library

[14]

S. Garcia, J. Derrac, J. Cano, and F. Herrera. Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE transactions on pattern analysis and machine intelligence, 34(3):417--435, 2012.

Digital Library

[15]

Y. Geng, S. Liu, Z. Yin, A. Naik, B. Prabhakar, M. Rosenblum, and A. Vahdat. Exploiting a natural network effect for scalable, fine-grained clock synchronization. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), pages 81--94, Renton, WA, 2018. USENIX Association.

Digital Library

[16]

P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine learning, 63(1):3--42, 2006.

Digital Library

[17]

G. Huang, X. Cheng, J. Wang, Y. Wang, D. He, T. Zhang, F. Li, S. Wang, W. Cao, and Q. Li. X-engine: An optimized storage engine for large-scale e-commerce transaction processing. In Proceedings of the 2019 ACM International Conference on Management of Data, SIGMOD '19. ACM, 2019.

Digital Library

[18]

P. J. Huber. Robust estimation of a location parameter. Ann. Math. Statist., 35(1):73--101, 03 1964.

[19]

P. R. Jelenković. Least-recently-used caching with Zipfs law requests. In The Sixth INFORMS Telecommunications Conference. Boca Raton, Florida, 2002.

[20]

A. Kadiyala and A. Kumar. Applications of python to evaluate the performance of bagging methods. Environmental Progress & Sustainable Energy, 37(5):1555--1559, 2018.

[21]

T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The case for learned index structures. SIGMOD, pages 489--504, 2018.

Digital Library

[22]

S. Krishnan, Z. Yang, K. Goldberg, J. Hellerstein, and I. Stoica. Learning to Optimize Join Queries With Deep Reinforcement Learning. ArXiv e-prints, Aug. 2018.

[23]

L. Lamport. The part-time parliament. ACM Transactions on Computer Systems (TOCS), 16(2):133--169, 1998.

Digital Library

[24]

D. Lee, J. Choi, J.-H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim. On the existence of a spectrum of policies that subsumes the least recently used (lru) and least frequently used (lfu) policies. In Proceedings of the 1999 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '99, pages 134--143, New York, NY, USA, 1999. ACM.

Digital Library

[25]

Z. L. Li, M. C.-J. Liang, W. He, L. Zhu, W. Dai, J. Jiang, and G. Sun. Metis: Robustly tuning tail latencies of cloud systems. In ATC (USENIX Annual Technical Conference). USENIX, July 2018.

Digital Library

[26]

A. Liaw, M. Wiener, et al. Classification and regression by randomforest. R news, 2(3):18--22, 2002.

[27]

L. Ma, D. Van Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon. Query-based workload forecasting for self-driving database management systems. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 631--645, New York, NY, USA, 2018. ACM.

Digital Library

[28]

V. Narasayya, I. Menache, M. Singh, F. Li, M. Syamala, and S. Chaudhuri. Sharing buffer pool memory in multi-tenant relational database-as-a-service. PVLDB, 8(7):726--737, 2015.

Digital Library

[29]

D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting dbms. In 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, pages 239--248, Sept 2005.

Digital Library

[30]

E. J. O'neil, P. E. O'neil, and G. Weikum. The lru-k page replacement algorithm for database disk buffering. ACM SIGMOD Record, 22(2):297--306, 1993.

Digital Library

[31]

A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, S. Toor, D. V. Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang. Self-driving database management systems. In Proceedings of the 2017 Conference on Innovative Data Systems Research, CIDR '17, 2017.

[32]

J. Petrovic. Using Memcached for data distribution in industrial environment. In Proceeding ICONS '08 Proceedings of the Third International Conference on Systems, pages 368--372, April 2008.

Digital Library

[33]

S. Podlipnig and L. Böszörmenyi. A survey of web cache replacement strategies. ACM Computing Surveys (CSUR), 35(4):374--398, Dec. 2003.

Digital Library

[34]

L. Rokach and O. Z. Maimon. Data mining with decision trees: theory and applications, volume 69. World scientific, 2008.

Digital Library

[35]

D. L. Shrestha and D. P. Solomatine. Experiments with adaboost. rt, an improved boosting scheme for regression. Neural computation, 18(7):1678--1710, 2006.

Digital Library

[36]

Y. Smaragdakis, S. Kaplan, and P. Wilson. The eelru adaptive replacement algorithm. Perform. Eval., 53(2):93--123, July 2003.

Digital Library

[37]

A. J. Storm, C. Garcia-Arellano, S. S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in db2. In Proceedings of the 32Nd International Conference on Very Large Data Bases, VLDB '06, pages 1081--1092. VLDB Endowment, 2006.

Digital Library

[38]

T. Sugimoto and N. Miyoshi. On the asymptotics of fault probability in least-recently-used caching with Zipf-type request distribution. Random Structures & Algorithms, 29(3):296--323, 2006.

Digital Library

[39]

R. Taft, N. El-Sayed, M. Serafini, Y. Lu, A. Aboulnaga, M. Stonebraker, R. Mayerhofer, and F. Andrade. P-store: An elastic database system with predictive provisioning. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 205--219, New York, NY, USA, 2018. ACM.

Digital Library

[40]

J. Tan, G. Quan, K. Ji, and N. Shroff. On resource pooling and separation for LRU caching. In Proceedings of the 2018 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science. ACM, 2018.

Digital Library

[41]

D. N. Tran, P. C. Huynh, Y. C. Tay, and A. K. H. Tung. A new approach to dynamic self-tuning of database buffers. Trans. Storage, 4(1):3:1--3:25, May 2008.

Digital Library

[42]

D. Van Aken, A. Pavlo, G. J. Gordon, and B. Zhang. Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, pages 1009--1024, New York, NY, USA, 2017. ACM.

Digital Library

[43]

J. Wang. A survey of web caching schemes for the internet. SIGCOMM Computer Communication Review, 29(5):36--46, Oct. 1999.

Digital Library

[44]

W. Wu, Y. Chi, H. Hacígümüş, and J. F. Naughton. Towards predicting query execution time for concurrent and dynamic database workloads. PVLDB, 6(10):925--936, 2013.

Digital Library

[45]

Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny. Characterizing facebook's memcached workload. IEEE Internet Computing, 18(2):41--49, 2014.

[46]

Y. Yang and J. Zhu. Write skew and zipf distribution: Evidence and implications. ACM Trans. Storage, 12(4):21:1--21:19, June 2016.

Digital Library

[47]

J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 2061--2064. ACM, 2009.

Digital Library

[48]

H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301--320, 2005.

Cited By

Weng LZhu RWu DDing BZheng BZhou J(2024)Eraser: Eliminating Performance Regression on Learned Query OptimizerProceedings of the VLDB Endowment10.14778/3641204.364120517:5(926-938)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.14778/3641204.3641205
Zhan YXi RLiao JFan SHou M(2024)KnobTune: A Dynamic Database Configuration Tuning Strategy Leveraging Historical Workload SimilaritiesProceedings of the International Conference on Computing, Machine Learning and Data Science10.1145/3661725.3661734(1-8)Online publication date: 12-Apr-2024
https://dl.acm.org/doi/10.1145/3661725.3661734
Kanellis KFreischuetz JVenkataraman S(2024)Nautilus: A Benchmarking Platform for DBMS Knob TuningProceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning10.1145/3650203.3663336(72-76)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3650203.3663336
Show More Cited By

Recommendations

SQUID: a practical 100% throughput scheduler for crosspoint buffered switches

Crosspoint buffered switches are emerging as the focus of research in high-speed routers. They have simpler scheduling algorithms and achieve better performance than bufferless crossbar switches. Crosspoint buffered switches have a buffer at each ...
Excess buffer requirement for EPD schemes in ATM networks

It is known that the performance of TCP over ATM can be significantly degraded if the bandwidth is occupied by cells belonging to packets that are already corrupted by cell loss due to buffer overflow. The Early Packet Discard (EPD) mechanism is a well ...
The impact of bursty traffic on FPCF packet switch performance

This paper analyses and compares the performance of forward planning conflict-free (FPCF), virtual output queuing-partitioned (VOQ-P) and virtual output queuing-shared (VOQ-S) packet switches. The influence of packet burst size, offered switch load and ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 12, Issue 10

June 2019

177 pages

ISSN:2150-8097

Editors:
Lei Chen
HKUST
,
Fatma Özcan
IBM Research - Almaden

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 June 2019

Published in PVLDB Volume 12, Issue 10

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

34
Total Citations
View Citations
447
Total Downloads

Downloads (Last 12 months)58
Downloads (Last 6 weeks)4

Reflects downloads up to 24 Sep 2024

Other Metrics

View Author Metrics

Citations

Cited By

Weng LZhu RWu DDing BZheng BZhou J(2024)Eraser: Eliminating Performance Regression on Learned Query OptimizerProceedings of the VLDB Endowment10.14778/3641204.364120517:5(926-938)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.14778/3641204.3641205
Zhan YXi RLiao JFan SHou M(2024)KnobTune: A Dynamic Database Configuration Tuning Strategy Leveraging Historical Workload SimilaritiesProceedings of the International Conference on Computing, Machine Learning and Data Science10.1145/3661725.3661734(1-8)Online publication date: 12-Apr-2024
https://dl.acm.org/doi/10.1145/3661725.3661734
Kanellis KFreischuetz JVenkataraman S(2024)Nautilus: A Benchmarking Platform for DBMS Knob TuningProceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning10.1145/3650203.3663336(72-76)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3650203.3663336
Siddiqui TWu W(2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
https://dl.acm.org/doi/10.1145/3641832.3641836
Chen YHoque MXu PLu JTarkoma S(2024)SimCost: cost-effective resource provision prediction and recommendation for spark workloadsDistributed and Parallel Databases10.1007/s10619-023-07436-y42:1(73-102)Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.1007/s10619-023-07436-y
Ma ZLi ZTu Y(2023)A Sample-Aware Database Tuning System With Deep Reinforcement LearningJournal of Database Management10.4018/JDM.33351935:1(1-25)Online publication date: 9-Nov-2023
https://dl.acm.org/doi/10.4018/JDM.333519
Pan ZWang YZhang YYang SCheng YChen PGuo CWen QTian XDou YZhou ZYang CZhou AYang B(2023)MagicScaler: Uncertainty-Aware, Predictive AutoscalingProceedings of the VLDB Endowment10.14778/3611540.361156616:12(3808-3821)Online publication date: 1-Aug-2023
https://dl.acm.org/doi/10.14778/3611540.3611566
Li MWang YMa SLiu CHuo DWang YXu Z(2023)Auto-Tuning with Reinforcement Learning for Permissioned Blockchain SystemsProceedings of the VLDB Endowment10.14778/3579075.357907616:5(1000-1012)Online publication date: 1-Jan-2023
https://dl.acm.org/doi/10.14778/3579075.3579076
Sun JYe FNedjah NZhang MXu D(2023)Workload-Aware Performance Tuning for Multimodel Databases Based on Deep Reinforcement LearningInternational Journal of Intelligent Systems10.1155/2023/88351112023Online publication date: 1-Jan-2023
https://dl.acm.org/doi/10.1155/2023/8835111
Dou HWei XWang KZhang YChen PHuang Y(2023)EFTuner: A Bi-Objective Configuration Parameter Auto-Tuning Method Towards Energy-Efficient Big Data ProcessingProceedings of the 14th Asia-Pacific Symposium on Internetware10.1145/3609437.3609443(292-301)Online publication date: 4-Aug-2023
https://dl.acm.org/doi/10.1145/3609437.3609443
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents