
An Exploratory Study of the Impact of Parameterization on JMH Measurement Results in Open-Source Projects

Published: 09 April 2021
DOI: 10.1145/3427921.3450243

Abstract

The Java Microbenchmarking Harness (JMH) is a widely used tool for testing performance-critical code at a low level. One of the key features of JMH is its support for user-defined parameters, which allows executing the same benchmark with different workloads. However, a benchmark configured with n parameters with m different values each requires JMH to execute the benchmark m^n times (once for each combination of configured parameter values). Consequently, even fairly modest parameterization leads to a combinatorial explosion of benchmarks that have to be executed, dramatically increasing execution time. However, so far no research has investigated how this type of parameterization is used in practice, or how important different parameters are to benchmarking results. In this paper, we statistically study how strongly different user parameters impact benchmark measurements for 126 JMH benchmarks from five well-known open-source projects. We show that 40% of the studied metric parameters have no correlation with the resulting measurement, i.e., testing with different values for these parameters does not lead to any insights. If there is a correlation, it is often strongly predictable, following a power-law, linear, or step-function curve. Our results provide a first understanding of the practical usage of user-defined JMH parameters and how they correlate with the measurements produced by benchmarks. We further show that a machine learning model based on Random Forest ensembles can predict the measured performance of an untested metric parameter value with an accuracy of 93% or higher for all but one benchmark class, demonstrating that, given sufficient training data, JMH performance test results for different parameterizations are highly predictable.
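
To make the parameterization mechanism concrete, the following minimal sketch shows how user-defined parameters are declared in JMH via the @Param annotation on fields of a @State class. The benchmark class, parameter names, and values are hypothetical illustrations and are not taken from the studied projects; with two parameters of three values each, JMH executes the benchmark for all 3^2 = 9 combinations.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Hypothetical benchmark: 2 user-defined parameters with 3 values each,
// so JMH measures all 3^2 = 9 parameter combinations.
@State(Scope.Benchmark)
public class ArraySortBenchmark {

    // Metric (numeric) parameter: input size.
    @Param({"1000", "10000", "100000"})
    public int size;

    // Nominal parameter: initial ordering of the input.
    @Param({"sorted", "reversed", "random"})
    public String order;

    private int[] data;

    @Setup
    public void setUp() {
        data = new int[size];
        for (int i = 0; i < size; i++) {
            switch (order) {
                case "sorted":   data[i] = i; break;
                case "reversed": data[i] = size - i; break;
                default:         data[i] = ThreadLocalRandom.current().nextInt();
            }
        }
    }

    @Benchmark
    public int[] sortCopy() {
        // Return the result so the JIT cannot eliminate the work as dead code.
        int[] copy = data.clone();
        Arrays.sort(copy);
        return copy;
    }
}
```

Each of the nine combinations is executed with its own forks, warmup, and measurement iterations, which is why even modest parameterization multiplies total execution time and why it matters to know which parameters actually influence the measurement.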

Cited By

  • (2024) Evaluating Search-Based Software Microbenchmark Prioritization. IEEE Transactions on Software Engineering, 50(7):1687-1703. https://doi.org/10.1109/TSE.2024.3380836
  • (2023) Towards effective assessment of steady state performance in Java software: are we there yet? Empirical Software Engineering, 28(1). https://doi.org/10.1007/s10664-022-10247-x

    Published In

    ICPE '21: Proceedings of the ACM/SPEC International Conference on Performance Engineering
    April 2021
    301 pages
    ISBN:9781450381949
    DOI:10.1145/3427921

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. benchmark measurements
    2. benchmark parametrization
    3. java microbenchmarking harness (JMH)
    4. machine learning

    Qualifiers

    • Research-article

    Conference

    ICPE '21

    Acceptance Rates

    ICPE '21 Paper Acceptance Rate: 16 of 61 submissions (26%)
    Overall Acceptance Rate: 252 of 851 submissions (30%)

