Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Mastering uncertainty in performance estimations of configurable software systems

Published: 19 January 2023 Publication History

Abstract

Understanding the influence of configuration options on the performance of a software system is key for finding optimal system configurations, system understanding, and performance debugging. In the literature, a number of performance-influence modeling approaches have been proposed, which model a configuration option’s influence and a configuration’s performance as a scalar value. However, these point estimates falsely imply a certainty regarding an option’s influence that neglects several sources of uncertainty within the assessment process, such as (1) measurement bias, choices of model representation and learning process, and incomplete data. This leads to the situation that different approaches and even different learning runs assign different scalar performance values to options and interactions among them. The true influence is uncertain, though. There is no way to quantify this uncertainty with state-of-the-art performance modeling approaches.
We propose a novel approach, P4, which is based on probabilistic programming, that explicitly models uncertainty for option influences and consequently provides a confidence interval for each prediction alongside a scalar. This way, we can explain, for the first time, why predictions may be erroneous and which option’s influence may be unreliable. An evaluation on 13 real-world subject systems shows that P4’s accuracy is in line with the state of the art while providing reliable confidence intervals, in addition to scalar predictions. We qualitatively explain how uncertain influences of individual options and interactions cause inaccurate predictions.

References

[1]
Aken Dana Van, Pavlo Andrew, Gordon Geoffrey J, Zhang Bohan (2017) Automatic database management system tuning through large-scale machine learning. In: Proceedings of the international conference on management of data (SIGMOD). ACM, pp 1009–1024. ISBN 978-1-4503-4197-4.
[2]
Alan M (2002) Subset selection in regression. CRC Press,
[3]
Antonelli F, Cortellessa V, Gribaudo M, Pinciroli R, Trivedi KS, and Trubiani CAnalytical modeling of performance indices under epistemic uncertainty applied to cloud computing systemsFuture Gener Comput Syst2020102746-761ISSN 0167-739X. https://doi.org/10.1016/j.future.2019.09.006
[4]
Arcaini Paolo, Inverso Omar, and Trubiani CatiaAutomated model-based performance analysis of software product lines under uncertaintyJ Inform Software Technol (IST)2020127106371ISSN 0950-5849. https://doi.org/10.1016/j.infsof.2020.106371
[5]
Bergstra J, Bardenet R, Bengio Y, Kégl B (2011) Algorithms for hyper-parameter optimization. In: Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ (eds) Advances in neural information processing systems, volume 24. Curran Associates, Inc
[6]
Cheng J, Gao C, Zheng Z (2022) HINNPerf: Hierarchical Interaction Neural Network for Performance Prediction of Configurable Systems. ACM Transactions on Software Engineering and Methodology
[7]
Dorn J, Apel S, Siegmund N (2020) Mastering Uncertainty in Performance Estimations of Configurable Software Systems. In: ASE’20. Association for Computing Machinery, New York,
[8]
Dubslaff C, Weis K, Baier C, Apel S (2022) Causality in Configurable Software Systems, p 13
[9]
Elbaum S, Rosenblum DS (2014) Known Unknowns: Testing in the Presence of Uncertainty. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering, FSE 2014. Association for Computing Machinery, New York, pp 833–836,
[10]
Farrar DE and Glauber RR Multicollinearity in Regression Analysis: The Problem Revisited Rev Econ Stat 1967 49 92-107
[11]
Gareth J, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning. Springer texts in statistics. Springer. ISBN 978-1-4614-7137-0 978-1-4614-7138-7.
[12]
Gogate V, Dechter R (2006) A New Algorithm for Sampling CSP Solutions Uniformly at Random. In: Principles and Practice of Constraint Programming - CP 2006, Springer, pp 711–715,
[13]
Guo J, Czarnecki K, Apel S, Siegmund N, Wasowski A (2013) Variability-Aware Performance Prediction: A Statistical Learning Approach. In: Proceedings of the International Conference on Automated Software Engineering (ASE), IEEE, pp 301–311,
[14]
Guo J, Yang D, Siegmund N, Apel S, Sarkar A, Valov P, Czarnecki K, Wasowski A, and Yu H Data-Efficient Performance Learning for Configurable Systems Empir Softw Eng 2018 23 1826-1867
[15]
Ha H, Zhang H (2019a) DeepPerf: Performance Prediction for Configurable Software with Deep Sparse Neural Network. In: Proceedings of the International Conference on Software Engineering (ICSE), IEEE, pp 1095–1106,
[16]
Ha H, Zhang H (2019b) Performance- Influence Model for Highly Configurable Software with Fourier Learning and Lasso Regression. In: Proceedings of the International Conference on Software Maintenance and Evolution (ICSME), IEEE, pp 470–480,
[17]
Hartigan JA and Hartigan PM The Dip Test of Unimodality Annal Stat 1985 13 1 70-84
[18]
Henard C, Papadakis M, Harman M, Traon LE (2015) Combining Multi- Objective Search and Constraint Solving for Configuring Large Software Product Lines. In: Proceedings of the International Conference on Software Engineering (ICSE), IEEE/ACM, pp 517–528,
[19]
Herodotou H, Lim H, Luo G, Borisov N, Dong L, Cetin FB, Babu S (2011) Starfish: a self-tuning system for big data analytics. In: Proceedings of the Conference on Innovative Data Systems Research (CIDR), pp 261–272. www.cidrdb.org
[20]
Hill RC, Adkins LC (2007) Collinearity. In: A Companion to Theoretical Econometrics, chapter 12, Wiley, pp 256–278,
[21]
Hoerl AE and Kennard RW Ridge Regression: Biased Estimation for Nonorthogonal Problems Technometrics 1970 12 55-67
[22]
Hoffman MD and Gelman A The No- U- Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo J Mach Learn Res 2014 15 1593-1623
[23]
Horcas J-M, Galindo JA, Heradio R, Fernandez-Amoros D, Benavides D (2021) Monte Carlo Tree Search for Feature Model Analyses: A General Framework for Decision-Making. In: Proceedings of the 25th ACM international systems and software product line conference- Volume A, SPLC’21. Association for Computing Machinery, pp 190–201, New York,
[24]
Iqbal MdS, Krishna R, Javidian MA, Ray B, Jamshidi P (2022) Unicorn: reasoning about configurable system performance through the lens of causality. In: Proceedings of the seventeenth european conference on computer systems, EuroSys ’22. Association for Computing Machinery, pp 199–217, New York,
[25]
Jamshidi P, Casale G (2016) An uncertainty-aware approach to optimal configuration of stream processing systems. In: Proceedings of the international symposium on modeling, analysis and simulation of computer and telecommunication systems (MASCOTS). IEEE, pp 39–48,
[26]
Johansen MF, Haugen Ø, Fleurey F (2012) An algorithm for generating T-wise covering arrays from large feature models. In: Proceedings of the international software product line conference (SPLC). ACM, p 46,
[27]
Kaltenecker Christian, Grebhahn Alexander, Siegmund Norbert, and Apel SvenThe interplay of sampling and machine learning for software performance predictionIEEE Softw202037458-66ISSN 0740-7459, 1937-4194. https://doi.org/10.1109/MS.2020.2987024
[28]
Kaltenecker C, Grebhahn A, Siegmund N, Guo J, Apel S (2019) Distance-based sampling of software configuration spaces. In: Proceedings of the international conference on software engineering (ICSE). IEEE, pp 1084–1094,
[29]
Kendall MGA new measure of rank correlationBiometrika1938301-281-93ISSN 0006-3444. https://doi.org/10.1093/biomet/30.1-2.81
[30]
Kendall Alex, Gal Yarin (2017) What uncertainties do we need in Bayesian deep learning for computer vision?. In: Proceedings of the international conference on neural information processing systems (NIPS). Curran Associates Inc., pp 5580–5590
[31]
Kiureghian AD and Ditlevsen OAleatory or Epistemic? Does It Matter?Structural Safety, 31:105–112, 2009. ISSN 01674730. 10.1016/j.strusafe.200806020
[32]
Kolesnikov S, Siegmund N, Kästner C, and Apel S On the relation of control-flow and performance feature interactions: a case study Empirical Software Eng (EMSE) 2019 24 4 2410-2437
[33]
Krishna Rahul, Tang Chong, Sullivan Kevin, Ray Baishakhi (2020) Conex: efficient exploration of big-data system configurations for better performance. IEEE Trans Software Eng, pp 1–1.
[34]
Maaten LVD, Postma E, den Herik JV (2009) Dimensionality reduction: a comparative review. J Mach Learn Res, pp 10
[35]
Mandrioli C, Maggio M (2021) Testing self- adaptive software with probabilistic guarantees on performance metrics: extended and comparative results. IEEE Trans Software Eng, pp 1–1,. ISSN 0098-5589, 1939-3520, 2326-3881
[36]
Murphy KP (2012) Machine learning: a probabilistic perspective. Adaptive computation and machine learning series. MIT Press. ISBN 978-0-262-01802-9
[37]
Nair Vivek, Menzies Tim, Siegmund Norbert, and Apel SvenFaster discovery of faster system configurations with spectral learningAutom Softw Eng201725247-277https://doi.org/10.1007/s10515-017-0225-2
[38]
Nair Vivek, Menzies Tim, Siegmund Norbert, Apel Sven (2017b) Using bad learners to find good configurations. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, ESEC/FSE 2017. ACM, pp 257–267. ISBN 9781450351058
[39]
Nair V, Zhe Yu, Menzies T, Siegmund N, and Apel S Finding faster configurations using FLASH Trans Software Eng 2020 46 794-811
[40]
Neal RM (1993) Probabilistic inference using markov chain monte carlo methods. Department Of Computer Science University Of Toronto
[41]
Werner N (2019) Energy and performance evolution of configurable systems case studies and experiments. Master thesis, University of Passau
[42]
O’Brien RMA caution regarding rules of thumb for variance inflation factorsQuality Quantity200741673-690ISSN 0033-5177. https://doi.org/10.1007/s11135-006-9018-6
[43]
Oh J, Batory D, Myers M, Siegmund N (2017) Finding near-optimal configurations in product lines by random sampling. In: Proceedings of the joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering (ESEC/FSE). ACM, pp 61–71. ISBN 978-1-4503-5105-8.
[44]
Rice JR (1976) The algorithm selection problem. volume 15 of Advances in Computers, pages 65–118. Elsevier,
[45]
Robbins HE (1956) An empirical bayes approach to statistics. In: Proceedings of the third berkeley symposium on mathematical statistics and probability, Vol 1: contributions to the theory of statistics. The Regents of the University of California,
[46]
Roeder G, Yuhuai W (2017) David k duvenaud. Sticking the landing simple, lower- variance gradient estimators for variational inference. In: Proceedings of the international conference on neural information processing systems (NIPS). Curran Associates Inc., pp 6928–6937
[47]
Salvatier J, Wiecki TV, Fonnesbeck C (2016) Probabilistic programming in Python using PyMC3. PeerJ computer science, 2:e55. ISSN 2376-5992. 10.7717/peerj-cs.55
[48]
Shannon CEA mathematical theory of communicationBell Syst Tech J194827379-423ISSN 0005-8580. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
[49]
Shapiro SS and Wilk MBAn analysis of variance test for normality (complete samples)‡Biometrika1965523-4591-611ISSN 0006-3444. https://doi.org/10.1093/biomet/52.3-4.591
[50]
Siegmund Norbert, Grebhahn Alexander, Apel Sven, Kästner Christian (2015) Performance- influence models for highly configurable systems. In: Proceedings of the joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering (ESEC/FSE). ACM, pp 284–294. ISBN 978-1-4503-3675-8.
[51]
Siegmund Norbert, Kolesnikov Sergiy S, Kastner Christian, Apel Sven, Batory Don, Rosenmuller Marko, Saake Gunter (2012a) Predicting performance via automated feature-interaction detection. In: Proceedings of the international conference on software engineering (ICSE). IEEE, pp 167–177. ISBN 978-1-4673-1066-6 978-1-4673-1067-3.
[52]
Siegmund Norbert, Rosenmüller M, Kuhlemann Martin, Kästner C, Apel Sven, and Saake GunterSPL conqueror: toward optimization of non-functional properties in software product linesSoftware Qual J201220487-517ISSN 0963-9314, 1573-1367. https://doi.org/10.1007/s11219-011-9152-9
[53]
Smith R (2013) Uncertainty quantification: theory, implementation, and applications. Soc Indust Appl Math. ISBN 978-1-61197-321-1
[54]
Taylor JR (1997) An introduction to error analysis: the study of uncertainties in physical measurements University Science Books, 2nd edn.
[55]
Tibshirani R Regression Shrinkage and selection via the lasso J Royal Stat Soc . Series B (Methodol) 1996 58 267-288 ISSN 0035-9246
[56]
Trubiani Catia, Apel Sven (2019) PLUS: performance learning for uncertainty of software. In: Proceedings of the international conference on software engineering: new ideas and emerging results. IEEE, pp 77–80,
[57]
Velez M, Jamshidi P, Sattler F, Siegmund N, Apel S, and Christian Kästner Configcrusher: towards white-box performance analysis for configurable systems Autom Softw Eng 2020 27 3 265-300
[58]
Velez Miguel, Jamshidi Pooyan, Siegmund Norbert, Apel Sven, Kästner C (2021) White-box analysis over machine learning: modeling performance of configurable systems. In: 2021 IEEE/ACM 43rd international conference on software engineering (ICSE), pp 1072–1084.
[59]
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, Walt SJ, Brett M, Wilson J, Millman JK, Mayorov N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, Polat I, Feng Yu, Moore EW, erPlas JV, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero EA, Harris CR, Archibald AM, Ribeiro AH, Pedregosa F, and Mulbregt Pv Scipy 1. 0 Contributors. SciPy 1.0: fundamental algorithms for scientific computing in python Nat Methods 2020 17 261-272
[60]
Weber Max, Apel Sven, Siegmund Norbert (2021) White-box performance-influence models: a profiling and learning approach. In: 2021 IEEE/ACM 43rd international conference on software engineering (ICSE), pp-1059–1071.
[61]
Wooldridge Jeffrey (2012) Introductory econometrics: a modern approach. South-Western College Pub, 5 edn. ISBN 978-1-111-53104-1
[62]
Xu Tianyin, Jin Long, Fan Xuepeng, Zhou Yuanyuan, Pasupathy Shankar, Talwadker Rukma (2015) Hey, You Have given Me Too Many Knobs!: understanding and dealing with over-designed configuration in system software. In: Proceedings of the joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering (ESEC/FSE). ACM, pp 307–319. ISBN 978-1-4503-3675-8.
[63]
Zhang Yi, Guo Jianmei, Blais Eric, Czarnecki Krzysztof (2015) Performance prediction of configurable software systems by fourier learning. In: Proceedings of the international conference on automated software engineering (ASE). IEEE, pp 365–373,
[64]
Zhu Y, Liu J, Guo M, Bao Y, Ma W, Liu Z, Song K, Yang Y (2017) BestConfig: tapping the performance potential of systems via automatic configuration tuning. In: Proceedings of the symposium on cloud computing (SoCC). ACM, pp 338–350. ISBN 9781450350280.
[65]
Zou H and Hastie TRegularization and variable selection via the elastic netJ Royal Stat Soc: Series B (Stat Method)201767301-320ISSN 1369-7412. https://doi.org/10.1111/j.1467-9868.2005.00503.x
[66]
Zwillinger D, Kokoska S (1999) CRC standard probability and statistics tables and formulae CRC Press

Cited By

View all
  • (2023)Performance evolution of configurable software systems: an empirical studyEmpirical Software Engineering10.1007/s10664-023-10338-328:6Online publication date: 13-Nov-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Empirical Software Engineering
Empirical Software Engineering  Volume 28, Issue 2
Mar 2023
1389 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 19 January 2023
Accepted: 09 October 2022

Author Tags

  1. Probabilistic programming
  2. Performance-influence modeling
  3. Configurable software systems
  4. Bayesian inference
  5. P4

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Performance evolution of configurable software systems: an empirical studyEmpirical Software Engineering10.1007/s10664-023-10338-328:6Online publication date: 13-Nov-2023

View Options

View options

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media