Open access

The Case for Custom Storage Backends in Distributed Storage Systems

Published: 18 May 2020

Abstract

For a decade, the Ceph distributed file system followed the conventional wisdom of building its storage backend on top of local file systems. This is a preferred choice for most distributed file systems today, because it allows them to benefit from the convenience and maturity of battle-tested code. Ceph’s experience, however, shows that this comes at a high price. First, developing a zero-overhead transaction mechanism is challenging. Second, metadata performance at the local level can significantly affect performance at the distributed level. Third, supporting emerging storage hardware is painstakingly slow.
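To make the first of these pain points concrete, the sketch below (C++, using a hypothetical interface rather than Ceph's actual ObjectStore API) shows the kind of compound update a distributed storage backend must apply atomically: a data write, an extended attribute, and a key-value metadata entry for the same object. POSIX file systems offer no primitive that commits all three together, so a backend built on one typically layers its own write-ahead log on top of the file system's journal and pays for consistency twice.

// Hypothetical sketch of a backend transaction (not Ceph's actual
// ObjectStore API): all queued operations must become durable together.
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Transaction {
    struct Op {
        enum class Type { Write, SetAttr, SetOmapKey };
        Type type;
        std::string object;           // target object name
        std::string key;              // attribute or omap key (empty for writes)
        uint64_t offset;              // byte offset for data writes
        std::vector<uint8_t> value;   // data payload or attribute/key value
    };
    std::vector<Op> ops;

    void write(std::string obj, uint64_t off, std::vector<uint8_t> bytes) {
        ops.push_back({Op::Type::Write, std::move(obj), "", off, std::move(bytes)});
    }
    void set_attr(std::string obj, std::string name, std::vector<uint8_t> val) {
        ops.push_back({Op::Type::SetAttr, std::move(obj), std::move(name), 0, std::move(val)});
    }
    void set_omap_key(std::string obj, std::string k, std::vector<uint8_t> val) {
        ops.push_back({Op::Type::SetOmapKey, std::move(obj), std::move(k), 0, std::move(val)});
    }
};

// A correct backend guarantees that, after a crash, either every op in the
// transaction is durable or none is; emulating that over POSIX files requires
// an extra write-ahead log in front of the file system's own journal.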
Ceph addressed these issues with BlueStore, a new backend designed to run directly on raw storage devices. In only two years since its inception, BlueStore has outperformed established backends and has been adopted by 70% of users in production. By running in user space and fully controlling the I/O stack, it has enabled space-efficient metadata and data checksums, fast overwrites of erasure-coded data, inline compression, and reduced performance variability, while avoiding a series of performance pitfalls of local file systems. Finally, it makes the adoption of backward-incompatible storage hardware possible, an important trait in a changing storage landscape that is learning to embrace hardware diversity.
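As an illustration of what "running in user space and fully controlling the I/O stack" can look like, the following sketch writes one aligned block to a raw device with O_DIRECT and records the block's checksum in the backend's own metadata, so every later read can be verified. It is a simplified illustration under assumed conditions (Linux, a 4 KiB block, zlib's crc32 standing in for the crc32c BlueStore actually uses, an in-memory map standing in for the RocksDB metadata store), not BlueStore's real code.

// Minimal illustrative sketch, not BlueStore's actual code.
// Build with: g++ -std=c++17 example.cc -lz
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            // for O_DIRECT
#endif
#include <fcntl.h>
#include <unistd.h>
#include <zlib.h>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <map>
#include <string>
#include <vector>

constexpr size_t kBlock = 4096;

// Stand-in for the key-value store a backend keeps its metadata in.
std::map<std::string, uint32_t> g_checksums;

// Write one aligned block directly to the device and remember its checksum,
// so a later read can be verified end to end.
bool write_block(int fd, uint64_t offset, const void* src) {
    void* buf = nullptr;                        // O_DIRECT needs aligned memory
    if (posix_memalign(&buf, kBlock, kBlock) != 0) return false;
    std::memcpy(buf, src, kBlock);

    ssize_t n = pwrite(fd, buf, kBlock, static_cast<off_t>(offset));
    bool ok = (n == static_cast<ssize_t>(kBlock));
    if (ok) {
        uint32_t csum = crc32(0L, static_cast<const unsigned char*>(buf), kBlock);
        g_checksums["blk:" + std::to_string(offset)] = csum;
    }
    free(buf);
    return ok;
}

int main() {
    // "/dev/sdX" is a placeholder for the raw device the backend would own.
    int fd = open("/dev/sdX", O_WRONLY | O_DIRECT);
    if (fd < 0) return 1;
    std::vector<char> payload(kBlock, 'a');
    bool ok = write_block(fd, 0, payload.data());
    close(fd);
    return ok ? 0 : 1;
}

Because both the block write and the checksum update flow through the backend's own code paths, features such as per-block checksums or inline compression can be added without waiting for a kernel file system to support them, which is the flexibility the abstract alludes to.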


Published In

ACM Transactions on Storage  Volume 16, Issue 2
SOSP 2019 Special Section and Regular Papers
May 2020
194 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/3399155
Editor: Sam H. Noh
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 May 2020
Online AM: 07 May 2020
Accepted: 01 March 2020
Received: 01 January 2020
Published in TOS Volume 16, Issue 2


Author Tags

  1. Ceph
  2. distributed file system
  3. file system
  4. object storage
  5. storage backend

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • NDSEG Fellowship


Cited By

  • (2025) Efficient security interface for high-performance Ceph storage systems. Future Generation Computer Systems, 164, 107571. https://doi.org/10.1016/j.future.2024.107571 (Online publication date: Mar-2025)
  • (2023) The Open-source DeLiBA2 Hardware/Software Framework for Distributed Storage Accelerators. ACM Transactions on Reconfigurable Technology and Systems, 17(2), 1-32. https://doi.org/10.1145/3624482 (Online publication date: 14-Sep-2023)
  • (2023) KV-CSD: A Hardware-Accelerated Key-Value Store for Data-Intensive Applications. 2023 IEEE International Conference on Cluster Computing (CLUSTER), 132-144. https://doi.org/10.1109/CLUSTER52292.2023.00019 (Online publication date: 31-Oct-2023)
  • (2022) Research on distributed competition big data hierarchical storage method based on Ceph architecture. International Conference on Signal Processing and Communication Security (ICSPCS 2022), 39. https://doi.org/10.1117/12.2655365 (Online publication date: 2-Nov-2022)
  • (2022) DeLiBA: An Open-Source Hardware/Software Framework for the Development of Linux Block I/O Accelerators. 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), 183-191. https://doi.org/10.1109/FPL57034.2022.00038 (Online publication date: Aug-2022)
  • (2022) CephArmor: A Lightweight Cryptographic Interface for Secure High-Performance Ceph Storage Systems. IEEE Access, 10, 127911-127927. https://doi.org/10.1109/ACCESS.2022.3227384 (Online publication date: 2022)
  • (2022) IMSS: In-Memory Storage System for Data Intensive Applications. High Performance Computing: ISC High Performance 2022 International Workshops, 190-205. https://doi.org/10.1007/978-3-031-23220-6_13 (Online publication date: 29-May-2022)
  • (2021) Don't be a blockhead. Proceedings of the Workshop on Hot Topics in Operating Systems, 144-151. https://doi.org/10.1145/3458336.3465300 (Online publication date: 1-Jun-2021)
  • (2021) Using ceph's BlueStore as object storage in HPC storage framework. Proceedings of the Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems, 1-6. https://doi.org/10.1145/3439839.3458734 (Online publication date: 26-Apr-2021)
