Definitions
A file system is a subsystem of an operating system that organizes, stores, and retrieves data files. A distributed file system (DFS) is a file system whose files are shared across storage resources dispersed over a network. A DFS makes it convenient to share files among clients in a controlled and authorized manner, and clients benefit because they can locate all file shares under a single server or domain name.
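The idea of locating every file share under a single name can be illustrated with a small sketch. It is not taken from any particular DFS: the `Namespace` class, its methods, and the server names are hypothetical, and the longest-prefix lookup is only one plausible way to map a logical path to the server that stores it.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Toy DFS namespace: one logical root, many backing servers (hypothetical)."""

    # Maps a logical folder prefix to the server that actually stores it.
    targets: dict[str, str] = field(default_factory=dict)

    def add_share(self, prefix: str, server: str) -> None:
        """Register a share so that paths under `prefix` are served by `server`."""
        self.targets[prefix.rstrip("/")] = server

    def resolve(self, path: str) -> tuple[str, str]:
        """Return (server, path relative to the share) for the longest matching prefix."""
        matches = [p for p in self.targets if path == p or path.startswith(p + "/")]
        if not matches:
            raise FileNotFoundError(path)
        best = max(matches, key=len)
        return self.targets[best], path[len(best):].lstrip("/")


# Assumed example layout: two servers behind one logical tree.
ns = Namespace()
ns.add_share("/projects", "storage-a.example")
ns.add_share("/projects/archive", "storage-b.example")

print(ns.resolve("/projects/report.txt"))
print(ns.resolve("/projects/archive/2017/log.txt"))
```

In this sketch a client only ever sees paths under one root, while the mapping decides which storage server holds the data. Access control, which the definition also mentions, is deliberately left out.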
Overview
Many big data analytics applications rely on distributed environments to analyze large amounts of data. As data volumes grow, providing reliable and efficient storage has become one of the main concerns of big data infrastructure administrators. Traditional storage systems and methods are suboptimal because of their cost or performance limitations. DFS, a paradigm shift in storage, was developed to make file sharing convenient in both local area networks...
Copyright information
© 2018 Springer International Publishing AG
About this entry
Cite this entry
Hu, Y. (2018). Distributed File Systems. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_44-1
DOI: https://doi.org/10.1007/978-3-319-63962-8_44-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
Chapter history
Latest: Distributed File Systems. Published: 24 May 2022. DOI: https://doi.org/10.1007/978-3-319-63962-8_44-2
Original: Distributed File Systems. Published: 01 February 2018. DOI: https://doi.org/10.1007/978-3-319-63962-8_44-1