BestConfig: tapping the performance potential of systems via automatic configuration tuning

Authors:

Yingchun Yang

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing

Pages 338 - 350

https://doi.org/10.1145/3127479.3128605

Published: 24 September 2017

Abstract

An ever-increasing number of configuration parameters is provided to system users. Yet many users apply a single configuration setting across different workloads, leaving the performance potential of their systems untapped. A good configuration setting can greatly improve the performance of a deployed system under certain workloads. But with tens or hundreds of parameters, deciding which configuration setting leads to the best performance becomes a highly costly task. Such a task requires strong expertise in both the system and the application, which users commonly lack.

To help users tap the performance potential of systems, we present BestConfig, a system for automatically finding a best configuration setting within a resource limit for a deployed system under a given application workload. BestConfig is designed with an extensible architecture to automate configuration tuning for general systems. To tune system configurations within a resource limit, we propose the divide-and-diverge sampling method and the recursive bound-and-search algorithm. BestConfig can improve the throughput of Tomcat by 75%, that of Cassandra by 63%, and that of MySQL by 430%, and reduce the running time of a Hive join job by about 50% and that of a Spark join job by about 80%, solely by configuration adjustment.
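The abstract names the two techniques without describing them, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that divide-and-diverge sampling splits each parameter range into k intervals and draws k points so that every interval of every parameter is covered once, and that bound-and-search shrinks the search space around the best point found so far, returning to the full space when a round brings no improvement. All names (`divide_and_diverge`, `bound_around`, `bound_and_search`, `toy_throughput`) are hypothetical, and the toy objective stands in for a real benchmark run.

```python
import random


def divide_and_diverge(bounds, k, rng):
    """Draw k points so each of the k intervals of every parameter is hit once.

    Assumption: this stratified scheme is one reading of "divide and diverge".
    """
    # One independent shuffle of interval indices per parameter.
    perms = [rng.sample(range(k), k) for _ in bounds]
    samples = []
    for i in range(k):
        point = []
        for d, (lo, hi) in enumerate(bounds):
            width = (hi - lo) / k
            start = lo + perms[d][i] * width
            point.append(start + rng.random() * width)
        samples.append(tuple(point))
    return samples


def bound_around(best, samples, bounds):
    """Shrink each dimension to the nearest sampled values on either side of best."""
    new_bounds = []
    for d, (lo, hi) in enumerate(bounds):
        below = [s[d] for s in samples if s[d] < best[d]]
        above = [s[d] for s in samples if s[d] > best[d]]
        new_bounds.append((max(below, default=lo), min(above, default=hi)))
    return new_bounds


def bound_and_search(measure, bounds, k, budget, seed=0):
    """Spend at most `budget` measurements, k per round, maximizing `measure`."""
    rng = random.Random(seed)
    full = list(bounds)
    current = full
    best_point, best_score = None, float("-inf")
    used = 0
    while used + k <= budget:
        samples = divide_and_diverge(current, k, rng)
        scored = [(measure(p), p) for p in samples]
        used += k
        round_score, round_point = max(scored)
        if round_score > best_score:
            # Improvement: keep the point and search its bounded neighborhood next.
            best_score, best_point = round_score, round_point
            current = bound_around(best_point, samples, current)
        else:
            # No improvement: assume the search restarts from the whole space.
            current = full
    return best_point, best_score, used


def toy_throughput(p):
    """Hypothetical stand-in for a benchmark run; peaks at (7, 1)."""
    x, y = p
    return -((x - 7.0) ** 2) - (y - 1.0) ** 2
```

Calling `bound_and_search(toy_throughput, [(0.0, 10.0), (-5.0, 5.0)], k=5, budget=42)` performs eight rounds of five measurements each. The `budget` argument plays the role of the resource limit mentioned in the abstract: in a real deployment each call to `measure` would be a full benchmark run, so the number of calls is the dominant cost.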

Published In

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing

September 2017

672 pages

ISBN: 9781450350280

DOI: 10.1145/3127479

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
State Key Development Program for Basic Research of China

Conference

SoCC '17

SoCC '17: ACM Symposium on Cloud Computing

September 24 - 27, 2017

Santa Clara, California

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Bibliometrics & Citations

Total Citations: 172
Total Downloads: 1,288
Downloads (Last 12 months): 119
Downloads (Last 6 weeks): 8

Reflects downloads up to 14 Feb 2025

Cited By

Chen P, Gong J, Chen T. (2025). Accuracy Can Lie: On the Impact of Surrogate Model in Configuration Tuning. IEEE Transactions on Software Engineering 51:2 (548-580). Online publication date: Feb-2025. https://doi.org/10.1109/TSE.2025.3525955
Gong J, Chen T, Bahsoon R. (2025). Dividable Configuration Performance Learning. IEEE Transactions on Software Engineering 51:1 (106-134). Online publication date: Jan-2025. https://doi.org/10.1109/TSE.2024.3491945
Li Y, Bao L, Huang K, Wu C. (2025). CSAT: Configuration structure-aware tuning for highly configurable software systems. Journal of Systems and Software 222 (112316). Online publication date: Apr-2025. https://doi.org/10.1016/j.jss.2024.112316
Pei Y, Zhu M, Zhu C, Song W, Sun Y, Li L, Zhu H. (2025). Meta Reinforcement Learning Based Dynamic Tuning for Blockchain Systems in Diverse Network Environments. Blockchain: Research and Applications (100261). Online publication date: Jan-2025. https://doi.org/10.1016/j.bcra.2024.100261
Chow M, Wang Y, Wang W, Hailu A, Bopardikar R, Zhang B, Qu J, Meisner D, Sonawane S, Zhang Y, Paim R, Ward M, Huang I, McNally M, Hodges D, Farkas Z, Gocmen C, Huang E, Tang C, Gavrilovska A, Terry D. (2024). ServiceLab. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation (545-562). Online publication date: 10-Jul-2024. https://dl.acm.org/doi/10.5555/3691938.3691967
Guo M, Demetriou S, Yang J, Leighton M, Hu D, Bao T, Adhikari A, Kooburat T, Kim A, Tang C, Vanbever L, Zhang I. (2024). MobileConfig. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (1867-1882). Online publication date: 16-Apr-2024. https://dl.acm.org/doi/10.5555/3691825.3691927
Öztürk M. (2024). MFRLMO: Model-free reinforcement learning for multi-objective optimization of apache spark. ICST Transactions on Scalable Information Systems 11:5. Online publication date: 20-Feb-2024. https://doi.org/10.4108/eetsis.4764
Lekkala C. (2024). Leveraging Reinforcement Learning for Autonomous Data Pipeline Optimization and Management. SSRN Electronic Journal. Online publication date: 2024. https://doi.org/10.2139/ssrn.4908414
Wu Y, Huang X, Wei Z, Cheng H, Xin C, Chen Z, Chen B, Wu Y, Wang H, Zhang T, Shi R, Gao X, Liang Y, Zhao P, Chen G. (2024). Towards Resource Efficiency: Practical Insights into Large-Scale Spark Workloads at ByteDance. Proceedings of the VLDB Endowment 17:12 (3759-3771). Online publication date: 8-Nov-2024. https://dl.acm.org/doi/10.14778/3685800.3685804
Lyu C, Fan Q, Guyard P, Diao Y. (2024). A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning. Proceedings of the VLDB Endowment 17:11 (3565-3579). Online publication date: 1-Jul-2024. https://dl.acm.org/doi/10.14778/3681954.3682021
