Abstract
Fast storage devices are increasingly deployed in social network services, cloud platforms, and similar environments. Unfortunately, the traditional Linux I/O stack is designed to maximize performance on disk-based storage. Emerging byte-addressable, low-latency non-volatile memory technologies (e.g., phase-change memory, MRAM, and the memristor) have very different characteristics, so a disk-based I/O stack cannot deliver high performance on them. This paper presents a high-performance I/O stack for fast storage devices. Our scheme removes the concept of a block and simplifies the whole I/O path and software stack, leaving only two layers: a byte-capable interface and a byte-aware file system called BAFS. We aim to minimize I/O latency and maximize bandwidth by eliminating unnecessary layers and by supporting byte-addressable I/O without requiring changes to applications. We have implemented a prototype and evaluated its performance with multiple benchmarks. The experimental results show that our I/O stack achieves performance gains of 6.2 times on average and up to 17.5 times compared to the existing Linux I/O stack.
References
Axboe, J.: Fio benchmark, April (1998)
Card, R., Tso, T., Tweedie, S.: Design and implementation of the second extended filesystem. In: Proceedings of the First Dutch International Symposium on Linux, pp. 1–6. Monterey (1994)
Caulfield, A.M., De, A., Coburn, J., Mollov, T.I., Gupta, R.K., Swanson, S.: Moneta: a high-performance storage array architecture for next-generation, non-volatile memories. In: Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 385–395. IEEE Computer Society, Washington, DC (2010)
Caulfield, A.M., Mollov, T.I., Eisner, L.A., De, A., Coburn, J., Swanson, S.: Providing safe, user space access to fast, solid state disks. SIGARCH Comput. Archit. News 40(1), 387–400 (2012)
Condit, J., Nightingale, E.B., Frost, C., Ipek, E., Lee, B., Burger, D., Coetzee, D.: Better I/O through byte-addressable, persistent memory. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP ’09, pp. 133–146. ACM, New York (2009)
Katti, R.R., Stadler, H.L., Wu, J.-C.: Non-volatile magnetic random access memory. US Patent 5,289,410, 22 Feb 1994
Kim, H., Seshadri, S., Dickey, C.L., Chiu, L.: Evaluating phase change memory for enterprise storage systems: a study of caching and tiering approaches. In: Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST 14), pp. 33–45. USENIX, Santa Clara, CA (2014)
Lu, L., Arpaci-Dusseau, A.C., Arpaci-Dusseau, R.H., Lu, S.: A study of Linux file system evolution. Trans. Storage 10(1), 3:1–3:32 (2014)
Mathur, A., Cao, M., Bhattacharya, S., Dilger, A., Tomas, A., Vivier, L.: The new ext4 filesystem: current status and future plans. In: Ottawa Linux Symposium. http://ols.108.redhat.com/2007/Reprints/mathur-Reprint.pdf (2007)
Norcott, W.D.: IOzone file system benchmark (2011)
Oi, H.: A case study: performance evaluation of a DRAM-based solid state disk. In: Japan–China Joint Workshop on Frontier of Computer Science and Technology, FCST 2007, pp. 57–60 (2007)
Raoux, S., Burr, G., Breitwisch, M., Rettner, C., Chen, Y., Shelby, R., Salinga, M., Krebs, D., Chen, S.H., Lung, H.L., Lam, C.: Phase-change random access memory: a scalable technology. IBM J. Res. Dev. 52(4.5), 465–479 (2008)
Rodeh, O.: B-trees, shadowing, and clones. Trans. Storage 3(4), 2:1–2:27 (2008)
Rodeh, O., Bacik, J., Mason, C.: The Linux B-tree filesystem. Trans. Storage 9(3), 9:1–9:32 (2013)
Seppanen, E., O’Keefe, M., Lilja, D.: High performance solid state storage under Linux. In: IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–12 (2010)
Shin, D.I., Yu, Y.J., Kim, H.S., Choi, J.W., Jung, D.Y., Yeom, H.Y.: Dynamic interval polling and pipelined post I/O processing for low-latency storage class memory. In: Proceedings of the 5th USENIX Conference on Hot Topics in Storage and File Systems, USENIX Association, pp. 5–5 (2013)
Son, Y., Choi, J.W., Eom, H., Yeom, H.Y.: Optimizing the file system with variable-length I/O for fast storage devices. In: Proceedings of the 4th Asia-Pacific Workshop on Systems, APSys ’13, pp. 14:1–14:6. ACM, New York (2013)
Son, Y., Song, N.Y., Eom, H., Yeom, H.Y.: A user-level file system for fast storage devices. In: Workshop on Autonomic Management of High Performance Grid and Cloud Computing (AMGCC 2014), London (2014)
Sweeney, A., Doucette, D., Hu, W., Anderson, C., Nishimoto, M., Peck, G.: Scalability in the xfs file system. In: USENIX Annual Technical Conference, vol. 15 (1996)
TAILWINDSTORAGE: Extreme s3804 (2014)
Worthington, B.L., Ganger, G.R., Patt, Y.N.: Scheduling algorithms for modern disk drives. In: Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’94, pp. 241–251. ACM, New York (1994)
Wu, X., Reddy, A.L.N.: SCMFS: a file system for storage class memory. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’11, pp. 39:1–39:11. ACM, New York (2011)
Yang, J., Minturn, D.B., Hady, F.: When poll is better than interrupt. In: Proceedings of the 10th USENIX Conference on File and Storage Technologies, FAST’12, pp. 3–3. USENIX Association, Berkeley (2012)
Yu, Y.J., Shin, D.I., Shin, W., Song, N.Y., Choi, J.W., Kim, H.S., Eom, H., Yeom, H.Y.: Optimizing the block I/O subsystem for fast storage devices. ACM Trans. Comput. Syst. 32(2), 6:1–6:48 (2014)
Yu, Y.J., Shin, D.I., Shin, W., Song, N.Y., Eom, H., Yeom, H.Y.: Exploiting peak device throughput from random access workload. In: Proceedings of the 4th USENIX Conference on Hot Topics in Storage and File Systems, USENIX Association, pp. 7–7 (2012)
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 0421-20150075) and partly supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2014R1A1A2055032).
Additional information
A preliminary version [18] of this paper was presented at AMGCC 2014, London, England.
About this article
Cite this article
Son, Y., Song, N.Y., Han, H. et al. Design and evaluation of a user-level file system for fast storage devices. Cluster Comput 18, 1075–1086 (2015). https://doi.org/10.1007/s10586-015-0465-5
DOI: https://doi.org/10.1007/s10586-015-0465-5