NVMM-Oriented Hierarchical Persistent Client Caching for Lustre

Published: 18 January 2021

Abstract

In high-performance computing (HPC), data and metadata are stored on dedicated server nodes, and client applications access them over a network, which introduces network latency and resource contention. These server nodes are typically equipped with (slow) magnetic disks, while the client nodes store temporary data on fast SSDs or even on non-volatile main memory (NVMM). The full potential of parallel file systems can therefore only be reached if fast client-side storage devices are integrated into the overall storage architecture.
In this article, we propose an NVMM-based hierarchical persistent client cache for the Lustre file system (NVMM-LPCC for short). NVMM-LPCC implements two caching modes: a read-write mode (RW-NVMM-LPCC for short) and a read-only mode (RO-NVMM-LPCC for short). NVMM-LPCC integrates with the Lustre Hierarchical Storage Management (HSM) solution and the Lustre layout lock mechanism to provide consistent persistent caching services for I/O applications running on client nodes, while maintaining a global unified namespace of the entire Lustre file system. The evaluation results presented in this article show that NVMM-LPCC increases the average read throughput by up to 35.80 times and the average write throughput by up to 9.83 times compared with the native Lustre system, while providing excellent scalability.
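To make the caching idea concrete, the sketch below shows, in plain C, the read path that such a hierarchical client cache implies: a file whose cache copy exists on a locally mounted NVMM file system is served from there without touching the network, and only a cache miss falls through to the Lustre servers. This is a minimal conceptual sketch, not the authors' kernel-level implementation; the cache mount point /mnt/nvmm-pcc, the FID-to-path mapping, and the helpers lpcc_cache_path() and lpcc_read() are hypothetical names chosen for illustration.

/*
 * Conceptual read path of a hierarchical persistent client cache
 * (illustration only; the real NVMM-LPCC lives inside the Lustre
 * client and relies on HSM and layout locks for consistency).
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define NVMM_CACHE_DIR "/mnt/nvmm-pcc"  /* assumed DAX-mounted NVMM file system */

/* Map a Lustre file identifier (FID) to its assumed cache-copy path. */
static void lpcc_cache_path(const char *fid, char *buf, size_t len)
{
    snprintf(buf, len, "%s/%s", NVMM_CACHE_DIR, fid);
}

/* Serve the read from the NVMM cache copy if present, else from Lustre. */
static ssize_t lpcc_read(const char *fid, const char *lustre_path,
                         void *buf, size_t count, off_t offset)
{
    char cpath[4096];
    lpcc_cache_path(fid, cpath, sizeof(cpath));

    int fd = open(cpath, O_RDONLY);        /* hit: local NVMM, no network I/O */
    if (fd < 0)
        fd = open(lustre_path, O_RDONLY);  /* miss: go to the Lustre servers */
    if (fd < 0)
        return -1;

    ssize_t n = pread(fd, buf, count, offset);
    close(fd);
    return n;
}

int main(void)
{
    char buf[4096];
    /* The FID string and Lustre path below are illustrative only. */
    ssize_t n = lpcc_read("0x200000401:0x1:0x0",
                          "/mnt/lustre/data/file.dat", buf, sizeof(buf), 0);
    printf("read %zd bytes\n", n);
    return n < 0;
}

In the RW mode, one would expect writes to land in the NVMM cache copy first and to reach the server-side storage later through the HSM machinery, consistent with the integration with HSM and layout locks described above.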

Information

Published In

ACM Transactions on Storage, Volume 17, Issue 1
Special Section on USENIX FAST 2020
February 2021, 165 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3446939
Editor: Sam H. Noh

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 January 2021
Accepted: 01 June 2020
Revised: 01 March 2020
Received: 01 November 2019
Published in TOS Volume 17, Issue 1

Author Tags

  1. Lustre
  2. direct access
  3. hierarchical storage management
  4. non-volatile memory
  5. persistent caching

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Hubei Natural Science Foundation

Cited By

  • (2024) Study on tiered storage algorithm based on heat correlation of astronomical data. Frontiers in Astronomy and Space Sciences, 11. DOI: 10.3389/fspas.2024.1371249. Online publication date: 14-Mar-2024.
  • (2023) Re-architecting I/O Caches for Emerging Fast Storage Devices. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 542-555. DOI: 10.1145/3582016.3582041. Online publication date: 25-Mar-2023.
  • (2022) The State of the Art of Metadata Managements in Large-Scale Distributed File Systems — Scalability, Performance and Availability. IEEE Transactions on Parallel and Distributed Systems, 33(12), 3850-3869. DOI: 10.1109/TPDS.2022.3170574. Online publication date: 1-Dec-2022.
  • (2022) Convergence of HPC and Big Data in extreme-scale data analysis through the DCEx programming model. 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 130-139. DOI: 10.1109/SBAC-PAD55451.2022.00024. Online publication date: Nov-2022.
  • (2022) Fine-Grained I/O Traffic Control Middleware for I/O Fairness in Virtualized System. IEEE Access, 10, 73122-73144. DOI: 10.1109/ACCESS.2022.3187731. Online publication date: 2022.
  • (2022) FastCache: A write-optimized edge storage system via concurrent merging cache for IoT applications. Journal of Systems Architecture, 131, 102718. DOI: 10.1016/j.sysarc.2022.102718. Online publication date: Oct-2022.
  • (2022) FastCache: A Client-Side Cache with Variable-Position Merging Schema in Network Storage System. Algorithms and Architectures for Parallel Processing, 144-160. DOI: 10.1007/978-3-030-95388-1_10. Online publication date: 23-Feb-2022.
