
DOI: 10.1145/3588195.3592989
Research article · Open access

Thicket: Seeing the Performance Experiment Forest for the Individual Run Trees

Published: 07 August 2023

Abstract

Thicket is an open-source Python toolkit for Exploratory Data Analysis (EDA) of multi-run performance experiments. It enables an understanding of optimal performance configurations for large-scale application codes. Most performance tools focus on a single execution (e.g., single platform, single measurement tool, single scale). Thicket bridges the gap to convenient analysis of multi-dimensional, multi-scale, multi-architecture, and multi-tool performance datasets by providing an interface for interacting with the performance data. Thicket has a modular structure composed of three components. The first component is a data structure for multi-dimensional performance data, which is composed automatically using call trees as a portable basis and accommodates any subset of dimensions present in the dataset. The second is the metadata, enabling distinction and sub-selection of dimensions in performance data. The third is a dimensionality reduction mechanism, enabling analyses such as computing aggregated statistics on a given data dimension. Extensible mechanisms are available for applying analyses (e.g., top-down on Intel CPUs), data science techniques (e.g., K-means clustering from scikit-learn), performance modeling (e.g., Extra-P), and interactive visualization. We demonstrate the power and flexibility of Thicket through two case studies: one with the open-source RAJA Performance Suite on CPU and GPU clusters, and another with a large physics simulation run on both a traditional HPC cluster and an AWS ParallelCluster instance.
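The workflow the abstract describes — aggregating per-call-tree-node measurements across runs, then applying a data science technique such as K-means — can be illustrated with a minimal sketch. This is not the Thicket API; it uses pandas and scikit-learn directly, and the run names, node names, and synthetic timings below are hypothetical.

```python
# Generic sketch (NOT the Thicket API) of the analysis style the abstract
# describes: aggregate multi-run call-tree data into one feature vector per
# run, then cluster runs into performance regimes with scikit-learn K-means.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical multi-run dataset: one row per (run, call-tree node),
# with a measured runtime metric. Six runs fall into two configurations.
rng = np.random.default_rng(0)
runs = pd.DataFrame({
    "run": np.repeat([f"run{i}" for i in range(6)], 4),
    "node": ["init", "solve", "comm", "io"] * 6,
    "time": np.concatenate([
        rng.normal(1.0, 0.05, 12),   # three "fast-config" runs
        rng.normal(2.0, 0.05, 12),   # three "slow-config" runs
    ]),
})

# Dimensionality reduction: mean time per call-tree node, yielding one
# feature vector per run.
features = runs.pivot_table(index="run", columns="node", values="time")

# Cluster the runs into two performance regimes.
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(features.index, labels)))
```

With the synthetic gap between the two configurations, the clustering separates the fast-config runs from the slow-config runs; in a real Thicket analysis, the feature table would come from the toolkit's call-tree-based data structure rather than a hand-built DataFrame.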






Published In

HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
August 2023
350 pages
ISBN:9798400701559
DOI:10.1145/3588195
  • General Chair: Ali R. Butt
  • Program Chairs: Ningfang Mi, Kyle Chard
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 August 2023


Author Tags

  1. exploratory data analysis
  2. hpc
  3. multi-dimensional
  4. parallel profile
  5. performance analysis

Qualifiers

  • Research-article

Conference

HPDC '23

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)247
  • Downloads (Last 6 weeks)34
Reflects downloads up to 19 Nov 2024

Cited By
  • (2024) TinyProf: Towards Continuous Performance Introspection through Scalable Parallel I/O. ISC High Performance 2024 Research Paper Proceedings (39th International Conference), pp. 1-12 (May 2024). https://doi.org/10.23919/ISC.2024.10528932
  • (2024) Refining HPCToolkit for application performance analysis at exascale. The International Journal of High Performance Computing Applications 38(6), 612-632 (Aug 2024). https://doi.org/10.1177/10943420241277839
  • (2024) Empirical Study of Molecular Dynamics Workflow Data Movement: DYAD vs. Traditional I/O Systems. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 543-553 (May 2024). https://doi.org/10.1109/IPDPSW63119.2024.00111
  • (2023) Towards Collaborative Continuous Benchmarking for HPC. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 627-635 (Nov 2023). https://doi.org/10.1145/3624062.3624135
