Pinpointing crash-consistency bugs in the HPC I/O stack: a cross-layer approach

Published: 13 November 2021

Abstract

We present ParaCrash, a testing framework for studying crash recovery in a typical HPC I/O stack, and demonstrate its use by identifying 15 new crash-consistency bugs in various parallel file systems (PFS) and I/O libraries. ParaCrash uses a "golden version" approach to test the entire HPC I/O stack: storage state after recovery from a crash is correct if it matches the state that can be achieved by a partial execution with no crashes. It supports systematic testing of a multilayered I/O stack while properly identifying the layer responsible for the bugs.
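
The "golden version" check described in the abstract can be illustrated with a small sketch. This is not ParaCrash's implementation: the in-memory `State` model, the `write` and `rename` helpers, and the assumption that "partial execution" means a prefix of a single sequential operation list are all hypothetical simplifications chosen for illustration. A real HPC I/O stack involves many processes and servers, so the set of legal partial executions would be larger than a simple list of prefixes.

```python
from typing import Callable, Dict, List

# Hypothetical storage model: file path -> file contents.
State = Dict[str, bytes]
# An operation mutates the storage state in place.
Op = Callable[[State], None]


def write(path: str, data: bytes) -> Op:
    """Hypothetical operation: create or overwrite a file."""
    def apply(state: State) -> None:
        state[path] = data
    return apply


def rename(src: str, dst: str) -> Op:
    """Hypothetical operation: atomically rename a file if it exists."""
    def apply(state: State) -> None:
        if src in state:
            state[dst] = state.pop(src)
    return apply


def golden_states(ops: List[Op]) -> List[State]:
    """Return the states reached by crash-free execution of every prefix of ops.

    Assumption: a 'partial execution' is modeled as a prefix of one
    sequential operation list, including the empty prefix.
    """
    state: State = {}
    states = [dict(state)]          # state before any operation runs
    for op in ops:
        op(state)
        states.append(dict(state))  # snapshot after each completed operation
    return states


def is_crash_consistent(recovered: State, ops: List[Op]) -> bool:
    """A recovered state is accepted if it equals some golden state."""
    return any(recovered == golden for golden in golden_states(ops))


# A checkpoint written to a temporary file and then renamed into place.
workload = [write("ckpt.tmp", b"v2"), rename("ckpt.tmp", "ckpt")]

ok_state = {"ckpt": b"v2"}   # matches the full execution
bad_state = {"ckpt": b""}    # rename persisted but the data did not
```

Under these assumptions, `is_crash_consistent(ok_state, workload)` accepts the recovered state, while `bad_state` is rejected because no crash-free prefix of the workload produces an empty `ckpt` file.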

Supplementary Material

MP4 File (Pinpointing Crash-Consistency Bugs in the HPC I_O Stack_ A Cross-Layer Approach.mp4.mp4)
SC21 Paper: Pinpointing Crash-Consistency Bugs in the HPC I/O Stack: A Cross-Layer Approach. Corresponding Author: Jinghan Sun

Cited By

  • (2023) FaultyRank: A Graph-based Parallel File System Checker. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 200--210. DOI: 10.1109/IPDPS54959.2023.00029. Online publication date: May 2023.
  • (2023) Optimizing the Analysis and Evaluation of Logic Simulation Workloads in HPC Systems. 2023 IEEE 17th International Conference on Application of Information and Communication Technologies (AICT), pages 1--6. DOI: 10.1109/AICT59525.2023.10313156. Online publication date: 18 October 2023.

Published In

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021
1493 pages
ISBN: 9781450384421
DOI: 10.1145/3458817

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. I/O library
  2. crash consistency
  3. parallel file systems

Qualifiers

  • Research-article

Funding Sources

  • NSF

Conference

SC '21

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 74
  • Downloads (Last 6 weeks): 5

Reflects downloads up to 19 Nov 2024.
