DOI: 10.1145/3578178.3578236
research-article
Open access

LibCOS: Enabling Converged HPC and Cloud Data Stores with MPI

Published: 27 February 2023

Abstract

Federated HPC and cloud resources are becoming increasingly strategic for providing diversified and geographically distributed computing resources. However, accessing data stores across HPC and cloud storage systems is challenging. Many cloud providers use object storage systems to let their clients store and retrieve data over the internet, most commonly through REST APIs atop HTTP, with Amazon’s S3 API supported by most vendors. In contrast, HPC systems are contained within their own networks and tend to use parallel file systems with POSIX-like interfaces. This work addresses the challenge of diverse data stores on HPC and cloud systems by providing native object storage support through the unified MPI I/O interface in HPC applications. In particular, we provide a prototype library called LibCOS that transparently enables MPI applications running on HPC systems to access object storage on remote cloud systems. We evaluated LibCOS on a Ceph object storage system and a traditional HPC system, and conducted a performance characterization of the core S3 operations that enable individual and collective MPI I/O. Our evaluation with HACC, IOR, and BigSort shows that bridging diverse data stores across HPC and cloud storage is feasible and can be achieved transparently through the widely adopted MPI I/O interface. We also show that a native object storage system like Ceph can improve the scalability of I/O operations in parallel applications.
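The core translation such a library must perform — mapping each MPI rank's file offset and length onto an HTTP ranged GET against the backing object (S3 supports partial reads via the `Range: bytes=start-end` request header) — can be sketched as follows. This is an illustrative sketch only: the function name and the contiguous per-rank block partitioning are our own assumptions for exposition, not LibCOS's actual implementation.

```python
def rank_byte_range(object_size: int, nranks: int, rank: int) -> tuple[int, int]:
    """Partition an object of object_size bytes into contiguous per-rank
    blocks and return the inclusive byte range (start, end) that `rank`
    would fetch with an HTTP ranged GET. This mirrors an independent
    MPI_File_read_at where each rank reads its own contiguous block."""
    base, rem = divmod(object_size, nranks)
    # The first `rem` ranks each take one extra byte, so block sizes
    # differ by at most one and the ranges tile the object exactly.
    start = rank * base + min(rank, rem)
    length = base + (1 if rank < rem else 0)
    return start, start + length - 1

# Example: a 1000-byte object read by 3 ranks.
for r in range(3):
    s, e = rank_byte_range(1000, 3, r)
    print(f"rank {r}: Range: bytes={s}-{e}")
# rank 0: Range: bytes=0-333
# rank 1: Range: bytes=334-666
# rank 2: Range: bytes=667-999
```

A collective read (e.g. MPI_File_read_at_all) could then be served by issuing these ranged GETs concurrently, one per rank, which is where a scalable object store like Ceph can help I/O scalability.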


Cited By

  • (2024)IO-SEA: Storage I/O and Data Management for Exascale ArchitecturesProceedings of the 21st ACM International Conference on Computing Frontiers: Workshops and Special Sessions10.1145/3637543.3654620(94-100)Online publication date: 7-May-2024
  • (2024)Basis path coverage testing of MPI programs based on multi-task evolutionary optimizationExpert Systems with Applications10.1016/j.eswa.2024.124557255(124557)Online publication date: Dec-2024

Published In

HPCAsia '23: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
February 2023
161 pages
ISBN:9781450398053
DOI:10.1145/3578178
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Ceph
  2. MPI
  3. S3
  4. object storage
  5. parallel computing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • European Commission Horizon 2020

Conference

HPC ASIA 2023

Acceptance Rates

HPCAsia '23 paper acceptance rate: 15 of 34 submissions (44%)
Overall acceptance rate: 69 of 143 submissions (48%)
