Distributed File Systems

Encyclopedia of Big Data Technologies

Definitions

A file system is the subsystem of an operating system that organizes, stores, and retrieves data files. A distributed file system (DFS) is a file system whose files are shared across storage resources dispersed over a network. A DFS makes it convenient to share files among clients in a controlled and authorized manner, and clients benefit because they can locate all file shares through a single server or domain name.
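
The idea of locating every share through a single entry point can be illustrated with a toy lookup table. This is a minimal sketch under stated assumptions, not how any particular DFS is implemented: the class `ToyNamespace`, its methods, and the node and user names are all hypothetical, and real systems add replication, caching, and failure handling that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNamespace:
    """Hypothetical single-entry namespace: clients ask one object where a file lives."""

    # logical path prefix -> name of the storage node holding that share
    shares: dict = field(default_factory=dict)
    # logical path prefix -> set of users authorized to access that share
    acl: dict = field(default_factory=dict)

    def add_share(self, prefix, node, users):
        """Register a share stored on `node` and the users allowed to read it."""
        self.shares[prefix] = node
        self.acl[prefix] = set(users)

    def resolve(self, user, path):
        """Return (storage node, path relative to the share) for an authorized user."""
        # A prefix matches only on whole path components.
        matches = [
            p for p in self.shares
            if path == p or path.startswith(p.rstrip("/") + "/")
        ]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)  # the most specific share wins
        if user not in self.acl[prefix]:
            raise PermissionError(f"{user} may not access {prefix}")
        return self.shares[prefix], path[len(prefix):].lstrip("/")


# Usage: two shares on different (hypothetical) storage nodes, one namespace.
ns = ToyNamespace()
ns.add_share("/projects", "node-a", ["alice"])
ns.add_share("/projects/logs", "node-b", ["alice", "bob"])
node, relative_path = ns.resolve("bob", "/projects/logs/app.log")
```

The client never needs to know which node holds a file; it names a logical path, and the namespace both enforces authorization and returns the physical location.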

Overview

Many big data analytics applications rely on distributed environments to analyze large amounts of data. As data volumes grow, providing reliable and efficient storage has become one of the main concerns of big data infrastructure administrators. Traditional storage systems and methods are suboptimal because of their cost or performance limitations, whereas DFS, a paradigm shift in storage, has been developed to make file sharing convenient in both local area networks...

Author information

Corresponding author

Correspondence to Yuchong Hu.

Copyright information

© 2018 Springer International Publishing AG

About this entry

Cite this entry

Hu, Y. (2018). Distributed File Systems. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_44-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_44-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering

Chapter history

  1. Latest

    Distributed File Systems
    Published:
    24 May 2022

    DOI: https://doi.org/10.1007/978-3-319-63962-8_44-2

  2. Original

    Distributed File Systems
    Published:
    01 February 2018

    DOI: https://doi.org/10.1007/978-3-319-63962-8_44-1