Distributed File Systems

Yuchong Hu

Definitions

A file system is the subsystem of an operating system that organizes, stores, and retrieves data files. A distributed file system (DFS) is a file system whose files are shared across storage resources dispersed over a network. A DFS lets clients share files in a controlled and authorized manner, and clients benefit because they can locate all shared files through a single server or domain name.

Overview

Many big data analytics applications rely on distributed environments to analyze large amounts of data. As data volumes grow, providing reliable and efficient storage has become one of the main concerns of big data infrastructure administrators. Traditional storage systems and methods are suboptimal because of their cost or performance limitations, whereas DFS, a paradigm shift in the storage arena, has been developed to make file sharing convenient in both local area networks...

Author information

Authors and Affiliations

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Yuchong Hu

Corresponding author

Correspondence to Yuchong Hu.

Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr
School of Information Technologies, Building J12, University of Sydney, Sydney, Australia
Albert Zomaya

Section Editor information

School of Computing, Engineering and Mathematics, Western Sydney University, Locked Bag 1797, 2751, Penrith, NSW, Australia
Rodrigo N. Calheiros
Inria, LIP, ENS Lyon, 46 allee d'Italie, 69364, Lyon, France
Marcos Dias de Assuncao Ph.D

About this entry

Cite this entry

Hu, Y. (2018). Distributed File Systems. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_44-1

DOI: https://doi.org/10.1007/978-3-319-63962-8_44-1
Published: 01 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Living Reference Mathematics, Reference Module Computer Science and Engineering

Chapter history

Latest: Distributed File Systems
Published: 24 May 2022
DOI: https://doi.org/10.1007/978-3-319-63962-8_44-2

Original: Distributed File Systems
Published: 01 February 2018
DOI: https://doi.org/10.1007/978-3-319-63962-8_44-1