Abstract
High-performance computing is not only a race towards the fastest supercomputers but also the science of using such massive machines productively to acquire valuable results – outlining the importance of performance modelling and optimization. However, it appears that more than punctual optimization is required for current architectures, with users having to choose between multiple intertwined parallelism possibilities, dedicated accelerators, and I/O solutions. Witnessing this challenging context, our paper establishes an automatic feedback loop between how applications run and how they are launched, with a specific focus on I/O. One goal is to optimize how applications are launched through moldability (launch-time malleability). As a first step in this direction, we propose a new, always-on measurement infrastructure based on state-of-the-art cloud technologies adapted for HPC. In this paper, we present the measurement infrastructure and associated design choices. Moreover, we leverage an existing performance modelling tool to generate I/O performance models. We outline sample modelling capabilities, as derived from our measurement chain showing the critical importance of the measurement in future HPC systems, especially concerning resource configurations. Thanks to this precise performance model infrastructure, we can improve moldability and malleability on HPC systems.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Ahn, D.H., Garlick, J., Grondona, M., Lipari, D., Springmeyer, B., Schulz, M.: Flux: a next-generation resource management framework for large HPC centers. In: 2014 43rd International Conference on Parallel Processing Workshops, pp. 9–17. IEEE (2014)
Arima, E., Comprés, A.I., Schulz, M.: On the convergence of malleability and the HPC PowerStack: exploiting dynamism in over-provisioned and power-constrained HPC systems. In: Anzt, H., Bienz, A., Luszczek, P., Baboulin, M. (eds.) ISC High Performance 2022. LNCS, vol. 13387, pp. 206–217. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-23220-6_14
Balaprakash, P., et al.: Autotuning in high-performance computing applications. Proc. IEEE 106(11), 2068–2083 (2018)
Besnard, J.B., Malony, A.D., Shende, S., Pérache, M., Carribault, P., Jaeger, J.: Towards a better expressiveness of the speedup metric in MPI context. In: 2017 46th International Conference on Parallel Processing Workshops (ICPPW), pp. 251–260. IEEE (2017)
Calotoiu, A., Hoefler, T., Poke, M., Wolf, F.: Using automated performance modeling to find scalability bugs in complex codes. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 45 (2013). tex.organization: ACM Citation Key: CA13
Cantalupo, C., et al.: A strawman for an HPC PowerStack. Technical report, Intel Corporation, United States; Lawrence Livermore National Lab. (LLNL) (2018)
Carns, P.H., et al.: Understanding and improving computational science storage access through continuous characterization. ACM Trans. Storage 7(3), 8:1–8:26 (2011). https://doi.org/10.1145/2027066.2027068
Carretero, J., Jeannot, E., Pallez, G., Singh, D.E., Vidal, N.: Mapping and scheduling HPC applications for optimizing I/O. In: Proceedings of the 34th ACM International Conference on Supercomputing, pp. 1–12 (2020)
Cascajo, A., Singh, D.E., Carretero, J.: LIMITLESS-light-weight monitoring tool for large scale systems. Microprocess. Microsyst. 93, 104586 (2022)
Cera, M.C., Georgiou, Y., Richard, O., Maillard, N., Navaux, P.O.A.: Supporting malleability in parallel architectures with dynamic CPUSETs mapping and dynamic MPI. In: Kant, K., Pemmaraju, S.V., Sivalingam, K.M., Wu, J. (eds.) ICDCN 2010. LNCS, vol. 5935, pp. 242–257. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11322-2_26
D’Amico, M., Jokanovic, A., Corbalan, J.: Holistic slowdown driven scheduling and resource management for malleable jobs. In: ACM International Conference Proceeding Series (2019). https://doi.org/10.1145/3337821.3337909
Denoyelle, N., Goglin, B., Ilic, A., Jeannot, E., Sousa, L.: Modeling large compute nodes with heterogeneous memories with cache-aware roofline model. In: Jarvis, S., Wright, S., Hammond, S. (eds.) PMBS 2017. LNCS, vol. 10724, pp. 91–113. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-72971-8_5
Dorier, M., Dreher, M., Peterka, T., Wozniak, J.M., Antoniu, G., Raffin, B.: Lessons learned from building in situ coupling frameworks. In: Proceedings of the First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, pp. 19–24 (2015)
Duro, F.R., Blas, J.G., Isaila, F., Carretero, J., Wozniak, J., Ross, R.: Exploiting data locality in Swift/T workflows using Hercules. In: Proceedings of NESUS Workshop (2014)
Goglin, B., Moreaud, S.: Dodging non-uniform I/O access in hierarchical collective operations for multicore clusters. In: 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp. 788–794. IEEE (2011)
Gupta, R., Laguna, I., Ahn, D., Gamblin, T., Bagchi, S., Lin, F.: STATuner: efficient tuning of CUDA kernels parameters. In: Supercomputing Conference (SC 2015), Poster (2015)
Hoefler, T., Gropp, W., Kramer, W., Snir, M.: Performance modeling for systematic performance tuning. In: State of the Practice Reports, SC 2011, pp. 1–12. Association for Computing Machinery, New York (2011). https://doi.org/10.1145/2063348.2063356
Hu, W., Liu, G., Li, Q., Jiang, Y., Cai, G.: Storage wall for exascale supercomputing. Front. Inf. Technol. Electron. Eng. 17(11), 1154–1175 (2016). https://doi.org/10.1631/FITEE.1601336
Huber, D., Streubel, M., Comprés, I., Schulz, M., Schreiber, M., Pritchard, H.: Towards dynamic resource management with MPI sessions and PMIx. In: Proceedings of the 29th European MPI Users’ Group Meeting, pp. 57–67 (2022)
Klein, C., Pérez, C.: An RMS for non-predictably evolving applications. In: Proceedings - IEEE International Conference on Cluster Computing, ICCC, pp. 326–334 (2011). https://doi.org/10.1109/CLUSTER.2011.56
Kumar, R., Vadhiyar, S.: Identifying quick starters: towards an integrated framework for efficient predictions of queue waiting times of batch parallel jobs. In: Cirne, W., Desai, N., Frachtenberg, E., Schwiegelshohn, U. (eds.) JSSPP 2012. LNCS, vol. 7698, pp. 196–215. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35867-8_11
Martí Fraiz, J.: dataClay: next generation object storage (2017)
Miranda, A., Jackson, A., Tocci, T., Panourgias, I., Nou, R.: NORNS: extending Slurm to support data-driven workflows through asynchronous data staging. In: 2019 IEEE International Conference on Cluster Computing (CLUSTER), USA, pp. 1–12. IEEE (2019). https://doi.org/10.1109/CLUSTER.2019.8891014
Mu’alem, A.W., Feitelson, D.G.: Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel Distrib. Syst. 12(6), 529–543 (2001). https://doi.org/10.1109/71.932708
Netti, A., et al.: DCDB wintermute: enabling online and holistic operational data analytics on HPC systems. In: Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, pp. 101–112 (2020)
Nikitenko, D.A., et al.: Influence of noisy environments on behavior of HPC applications. Lobachevskii J. Math. 42(7), 1560–1570 (2021). https://doi.org/10.1134/S1995080221070192
Patki, T., Thiagarajan, J.J., Ayala, A., Islam, T.Z.: Performance optimality or reproducibility: that is the question. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Denver Colorado, pp. 1–30. ACM (2019). https://doi.org/10.1145/3295500.3356217
Petrini, F., Kerbyson, D., Pakin, S.: The case of the missing supercomputer performance: achieving optimal performance on the 8,192 processors of ASCI Q. In: SC 2003: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, p. 55 (2003). https://doi.org/10.1145/1048935.1050204
Prabhakaran, S., Neumann, M., Rinke, S., Wolf, F., Gupta, A., Kale, L.V.: A batch system with efficient adaptive scheduling for malleable and evolving applications. In: 2015 IEEE International Parallel and Distributed Processing Symposium, pp. 429–438. IEEE (2015)
Ritter, M., Calotoiu, A., Rinke, S., Reimann, T., Hoefler, T., Wolf, F.: Learning cost-effective sampling strategies for empirical performance modeling. In: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 884–895 (2020). https://doi.org/10.1109/IPDPS47924.2020.00095
Ritter, M., et al.: Noise-resilient empirical performance modeling with deep neural networks. In: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 23–34 (2021). https://doi.org/10.1109/IPDPS49936.2021.00012
Schulz, M., Kranzlmüller, D., Schulz, L.B., Trinitis, C., Weidendorfer, J.: On the inevitability of integrated HPC systems and how they will change HPC system operations. In: Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, pp. 1–6 (2021)
Smith, W., Taylor, V., Foster, I.: Using run-time predictions to estimate queue wait times and improve scheduler performance. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999. LNCS, vol. 1659, pp. 202–219. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-47954-6_11
Sudarsan, R., Ribbens, C.J.: ReSHAPE: a framework for dynamic resizing and scheduling of homogeneous applications in a parallel environment. In: Proceedings of the International Conference on Parallel Processing (2007). https://doi.org/10.1109/ICPP.2007.73
Vef, M.A., et al.: GekkoFS-a temporary distributed file system for HPC applications. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 319–324. IEEE (2018)
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009). https://doi.org/10.1145/1498765.1498785
Wood, C., et al.: Artemis: automatic runtime tuning of parallel execution parameters using machine learning. In: Chamberlain, B.L., Varbanescu, A.-L., Ltaief, H., Luszczek, P. (eds.) ISC High Performance 2021. LNCS, vol. 12728, pp. 453–472. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-78713-4_24
Wu, X., et al.: Toward an end-to-end auto-tuning framework in HPC PowerStack. In: 2020 IEEE International Conference on Cluster Computing (CLUSTER), pp. 473–483. IEEE (2020)
Acknowledgment
This work has been partially funded by the European Union’s Horizon 2020 under the ADMIRE project, grant Agreement number: 956748-ADMIRE-H2020-JTI-EuroHPC-2019-1.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Besnard, JB. et al. (2023). Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling. In: Bienz, A., Weiland, M., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13999. Springer, Cham. https://doi.org/10.1007/978-3-031-40843-4_6
Download citation
DOI: https://doi.org/10.1007/978-3-031-40843-4_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40842-7
Online ISBN: 978-3-031-40843-4
eBook Packages: Computer ScienceComputer Science (R0)