Abstract
Collective I/O is widely used to transform small, non-contiguous accesses into large, contiguous accesses for parallel I/O optimization. Existing collective I/O techniques were designed under the assumption that compute-node memory is volatile, and their effectiveness is limited by the size of collective I/O buffers and by communication overhead. In this paper, we propose PMIO, a novel collective I/O framework that employs node-local persistent memory on compute nodes to optimize the I/O of HPC applications. First, PMIO uses a log-structured buffer to exploit the high bandwidth of persistent memory and to enforce crash consistency, which allows the buffer size to be increased. Second, because persistent memory is less space-constrained than more expensive DRAM, PMIO can buffer data across multiple collective I/O calls before writing it back to the parallel file system, further improving I/O performance. Third, we design a two-level log merging approach to reduce the communication overhead of shuffling data among MPI processes on compute nodes. Our experimental results with representative MPI-IO benchmarks show that PMIO improves I/O throughput by up to 121X for writes and 151X for reads on the Perlmutter supercomputer.
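For context, the sketch below illustrates the collective I/O call pattern that frameworks such as PMIO sit beneath: every MPI process participates in a single collective write so the MPI-IO layer can aggregate many small requests into large, contiguous file accesses. This is a minimal, generic MPI-IO example, not code from PMIO; the file name out.dat and the per-rank block size are illustrative assumptions.

```c
/* Minimal MPI-IO collective write sketch (illustration only; not the PMIO
 * implementation). Each rank writes one contiguous block at its own offset
 * through MPI_File_write_at_all, the kind of collective call that a
 * collective I/O framework can intercept and buffer. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1024;                 /* elements per rank (assumed) */
    int *buf = malloc(count * sizeof(int));
    for (int i = 0; i < count; i++) buf[i] = rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",          /* hypothetical file */
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: all ranks call it together, letting the MPI-IO
     * layer aggregate the per-rank requests into large file accesses. */
    MPI_Offset offset = (MPI_Offset)rank * count * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, count, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

In a PMIO-like design, as described in the abstract, such collective writes would be absorbed into node-local persistent-memory logs and flushed to the parallel file system later, rather than reaching the file system on every call.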
Acknowledgements
This work was supported in part by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists (WDTS) under the Visiting Faculty Program (VFP), and in part by the Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. It also used resources of the National Energy Research Scientific Computing Center (NERSC) and was supported in part by NSF CNS-2216108.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sanchez, K., Gavin, A., Byna, S., Wu, K., Zhang, X. (2024). A High-Performance Collective I/O Framework Leveraging Node-Local Persistent Memory. In: Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M. (eds) Euro-Par 2024: Parallel Processing. Euro-Par 2024. Lecture Notes in Computer Science, vol 14802. Springer, Cham. https://doi.org/10.1007/978-3-031-69766-1_13
DOI: https://doi.org/10.1007/978-3-031-69766-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-69765-4
Online ISBN: 978-3-031-69766-1
eBook Packages: Computer Science, Computer Science (R0)