DOI: 10.1145/3477132.3483592
Research article

Crash Consistent Non-Volatile Memory Express

Published: 26 October 2021

Abstract

This paper presents crash consistent Non-Volatile Memory Express (ccNVMe), a novel extension of NVMe that defines how host software communicates with non-volatile memory (e.g., a solid-state drive) across a PCI Express bus with both crash consistency and performance efficiency. Existing storage systems pay a heavy tax for crash consistency and thus cannot fully exploit the multi-queue parallelism and low latency of the NVMe interface. ccNVMe alleviates this bottleneck by coupling crash consistency to data dissemination. This idea lets the storage system achieve crash consistency by taking a free ride on NVMe's data dissemination mechanism, using only two lightweight memory-mapped I/Os (MMIOs), unlike traditional systems that rely on complex update protocols and heavyweight block I/Os. ccNVMe introduces transaction-aware MMIO and doorbell to reduce PCIe traffic and to provide atomicity. We show how to build a high-performance, crash-consistent file system, MQFS, atop ccNVMe. Experiments show that MQFS increases the IOPS of RocksDB by 36% and 28% compared to a state-of-the-art file system and Ext4 without journaling, respectively.
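
The abstract's point about a transaction-aware doorbell can be illustrated with a toy model. The sketch below is not the ccNVMe protocol and is not taken from the paper; `ToyQueue`, `submit_per_request`, and `submit_as_transaction` are hypothetical names, and the model assumes only that queue entries become visible to the device when the host publishes the queue tail with a doorbell write (one MMIO). Under that assumption, ringing the doorbell once per transaction both reduces the MMIO count and makes the transaction all-or-nothing with respect to a host crash.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Toy submission queue: the device only sees entries below `doorbell`."""
    entries: list = field(default_factory=list)
    doorbell: int = 0      # tail value last published to the device
    mmio_writes: int = 0   # number of doorbell (MMIO) writes issued

    def enqueue(self, request):
        # Filling a queue slot is modeled as an ordinary memory write.
        self.entries.append(request)

    def ring(self):
        # Publishing the tail is the only MMIO in this toy model.
        self.doorbell = len(self.entries)
        self.mmio_writes += 1

    def visible(self):
        # What the device would observe if the host crashed right now.
        return self.entries[:self.doorbell]


def submit_per_request(queue, requests):
    """Baseline (assumed): one doorbell write per request."""
    for request in requests:
        queue.enqueue(request)
        queue.ring()


def submit_as_transaction(queue, requests, crash_before_ring=False):
    """Enqueue the whole transaction, then ring the doorbell once."""
    for request in requests:
        queue.enqueue(request)
    if crash_before_ring:
        return  # simulated crash: the tail was never published
    queue.ring()
```

In this model a three-request transaction costs three doorbell writes with `submit_per_request` and one with `submit_as_transaction`; with `crash_before_ring=True`, `visible()` is empty, so the device never observes a partial transaction.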

Cited By

  • (2023) When Database Meets New Storage Devices: Understanding and Exposing Performance Mismatches via Configurations. Proceedings of the VLDB Endowment 16:7 (1712-1725). DOI: 10.14778/3587136.3587145. Online publication date: 1-Mar-2023
  • (2023) Efficient Crash Consistency for NVMe over PCIe and RDMA. ACM Transactions on Storage 19:1 (1-35). DOI: 10.1145/3568428. Online publication date: 11-Jan-2023

    Published In

    SOSP '21: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles
    October 2021
    899 pages
    ISBN:9781450387095
    DOI:10.1145/3477132
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. NVMe
    2. SSD
    3. crash consistency
    4. file system
    5. storage protocol

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate 174 of 961 submissions, 18%
