Abstract
The next generation of supercomputers will require HPC applications to handle failures. This paper presents, through an example application, the benefits of logging messages at the application level. The proposed method both provides resilience to failures and improves performance.
Acknowledgments
This work was partially supported by a machine allocation on Argonne Leadership Computing Facility awarded by the U.S. Department of Energy under contract DE-AC02-06CH11357.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Meneses, E. (2018). Exploring Application-Level Message-Logging in Scalable HPC Programs. In: Mocskos, E., Nesmachnow, S. (eds) High Performance Computing. CARLA 2017. Communications in Computer and Information Science, vol 796. Springer, Cham. https://doi.org/10.1007/978-3-319-73353-1_17
DOI: https://doi.org/10.1007/978-3-319-73353-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73352-4
Online ISBN: 978-3-319-73353-1
eBook Packages: Computer Science, Computer Science (R0)