Abstract
Collective I/O is widely used to transform small, non-contiguous accesses into large, contiguous accesses for parallel I/O optimization. Existing collective I/O techniques were designed under the assumption that compute-node memory is volatile, and their effectiveness is limited by the size of collective I/O buffers and by communication overhead. In this paper, we propose PMIO, a novel collective I/O framework that employs node-local persistent memory on compute nodes to optimize the I/O of HPC applications. First, PMIO uses a log-structured buffer to exploit the high bandwidth of persistent memory and to enforce crash consistency, which allows the buffer size to be increased. Second, because persistent memory is less space-constrained than more expensive DRAM, PMIO can buffer data across multiple collective I/O calls before writing it back to the parallel file system, further improving I/O performance. Third, we design a two-level log merging approach to reduce the communication overhead of shuffling data among MPI processes on compute nodes. Our experimental results with representative MPI-IO benchmarks show that PMIO improves I/O throughput by up to 121X for writes and 151X for reads on the Perlmutter supercomputer.
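For context, the sketch below illustrates the collective I/O call pattern that frameworks such as PMIO sit beneath: every MPI process participates in a single collective write so the MPI-IO layer can aggregate many small requests into large, contiguous file accesses. This is a minimal, generic MPI-IO example, not code from PMIO; the file name out.dat and the per-rank block size are illustrative assumptions.

```c
/* Minimal MPI-IO collective write sketch (illustration only; not the PMIO
 * implementation). Each rank writes one contiguous block at its own offset
 * through MPI_File_write_at_all, the kind of collective call that a
 * collective I/O framework can intercept and buffer. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1024;                 /* elements per rank (assumed) */
    int *buf = malloc(count * sizeof(int));
    for (int i = 0; i < count; i++) buf[i] = rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",          /* hypothetical file */
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: all ranks call it together, letting the MPI-IO
     * layer aggregate the per-rank requests into large file accesses. */
    MPI_Offset offset = (MPI_Offset)rank * count * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, count, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

In a PMIO-like design, as described in the abstract, such collective writes would be absorbed into node-local persistent-memory logs and flushed to the parallel file system later, rather than reaching the file system on every call.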
Acknowledgements
This work was supported in part by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists (WDTS) under the Visiting Faculty Program (VFP), and in part by the Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. It also used resources of the National Energy Research Scientific Computing Center (NERSC) and was supported in part by NSF CNS-2216108.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sanchez, K., Gavin, A., Byna, S., Wu, K., Zhang, X. (2024). A High-Performance Collective I/O Framework Leveraging Node-Local Persistent Memory. In: Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M. (eds) Euro-Par 2024: Parallel Processing. Euro-Par 2024. Lecture Notes in Computer Science, vol 14802. Springer, Cham. https://doi.org/10.1007/978-3-031-69766-1_13
DOI: https://doi.org/10.1007/978-3-031-69766-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-69765-4
Online ISBN: 978-3-031-69766-1
eBook Packages: Computer Science, Computer Science (R0)