Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2901318.2901328acmotherconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

Partial-parallel-repair (PPR): a distributed technique for repairing erasure coded storage

Published: 18 April 2016 Publication History

Abstract

With the explosion of data in applications all around us, erasure coded storage has emerged as an attractive alternative to replication because even with significantly lower storage overhead, they provide better reliability against data loss. Reed-Solomon code is the most widely used erasure code because it provides maximum reliability for a given storage overhead and is flexible in the choice of coding parameters that determine the achievable reliability. However, reconstruction time for unavailable data becomes prohibitively long mainly because of network bottlenecks. Some proposed solutions either use additional storage or limit the coding parameters that can be used. In this paper, we propose a novel distributed reconstruction technique, called Partial Parallel Repair (PPR), which divides the reconstruction operation to small partial operations and schedules them on multiple nodes already involved in the data reconstruction. Then a distributed protocol progressively combines these partial results to reconstruct the unavailable data blocks and this technique reduces the network pressure. Theoretically, our technique can complete the network transfer in ⌈(log2(k + 1))⌉ time, compared to k time needed for a (k, m) Reed-Solomon code. Our experiments show that PPR reduces repair time and degraded read time significantly. Moreover, our technique is compatible with existing erasure codes and does not require any additional storage overhead. We demonstrate this by overlaying PPR on top of two prior schemes, Local Reconstruction Code and Rotated Reed-Solomon code, to gain additional savings in reconstruction time.

References

[1]
Ceph http://ceph.com/.
[2]
Google Colossus File System: http://static.googleusercontent.com/media/research.google.com/en/university/relations/facultysummit2010/storage_architecture_and_challenges.pdf.
[3]
Erasure Coding Support inside HDFS: https://issues.apache.org/jira/browse/HDFS-7285.
[4]
OpenStack: Open source software for creating private and public clouds: http://www.openstack.org/.
[5]
Quantcast File System http://quantcast.github.io/qfs/.
[6]
OpenStack Object Storage (Swift): http://swift.openstack.org.
[7]
Big Data and What it Means: http://www.uschamberfoundation.org/bhq/big-data-and-what-it-means.
[8]
Yahoo Cloud Object Store: http://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at.
[9]
A. Akella, T. Benson, B. Chandrasekaran, C. Huang, B. Maggs, and D. Maltz. A universal approach to data center network design. In ICDCN, 2015.
[10]
M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. 2008.
[11]
L. Atieno, J. Allen, D. Goeckel, and R. Tessier. An adaptive reed-solomon errors-and-erasures decoder. In ACM/SIGDA FPGA, 2006.
[12]
R. Bhagwan, K. Tati, Y. Cheng, S. Savage, and G. M. Voelker. Total recall: System support for automated availability management. In NSDI, 2004.
[13]
k. Björck and V. Pereyra. Solution of vandermonde systems of equations. Mathematics of Computation, 1970.
[14]
B. Calder, J. Wang, A. Ogus, N. Nilakantan, et al. Windows azure storage: a highly available cloud storage service with strong consistency. In SOSP, 2011.
[15]
X. Chu, C. Liu, K. Ouyang, L. S. Yung, H. Liu, and Y.-W. Leung. Perasure: a parallel cauchy reed-solomon coding library for gpus. In IEEE ICC, 2015.
[16]
B.-G. Chun, F. Dabek, A. Haeberlen, E. Sit, H. Weatherspoon, M. F. Kaashoek, J. Kubiatowicz, and R. Morris. Efficient replica maintenance for distributed storage systems. In NSDI, 2006.
[17]
A. G. Dimakis, P. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Transactions on Information Theory, 2010.
[18]
K. S. Esmaili, L. Pamies-Juarez, and A. Datta. Core: Cross-object redundancy for efficient data repair in storage systems. In IEEE International Conference on Big Data, 2013.
[19]
D. Ford, F. Labelle, F. I. Popovici, M. Stokely, V.-A. Truong, L. Barroso, C. Grimes, and S. Quinlan. Availability in globally distributed storage systems. In OSDI, 2010.
[20]
A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. Vl2: a scalable and flexible data center network. In SIGCOMM, 2009.
[21]
Y. Hu, H. C. Chen, P. P. Lee, and Y. Tang. Nccloud: applying network coding for the storage repair in a cloud-of-clouds. In FAST, 2012.
[22]
C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, and S. Yekhanin. Erasure coding in windows azure storage. In ATC, 2012.
[23]
H. M. Ji. An optimized processor for fast reed-solomon encoding and decoding. In IEEE ICASSP, 2002.
[24]
O. Khan, R. C. Burns, J. S. Plank, W. Pierce, and C. Huang. Rethinking erasure codes for cloud file systems: minimizing i/o for recovery and degraded reads. In FAST, 2012.
[25]
J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, et al. Oceanstore: An architecture for global-scale persistent storage. In ASPLOS, 2000.
[26]
J. Li and B. Li. Beehive: erasure codes for fixing multiple failures in distributed storage systems. In HotStorage, 2015.
[27]
F. J. MacWilliams and N. J. A. Sloane. The theory of error correcting codes. North-Holland, 1977.
[28]
S. Mitra, I. Laguna, D. H. Ahn, S. Bagchi, M. Schulz, and T. Gamblin. Accurate application progress analysis for large-scale parallel debugging. In PLDI, 2014.
[29]
S. Muralidhar, W. Lloyd, S. Roy, C. Hill, E. Lin, W. Liu, S. Pan, S. Shankar, V. Sivakumar, L. Tang, et al. F4: Facebooks warm blob storage system. In OSDI, 2014.
[30]
M. Ovsiannikov, S. Rus, D. Reeves, P. Sutter, S. Rao, and J. Kelly. The quantcast file system. VLDB, 2013.
[31]
D. S. Papailiopoulos and A. G. Dimakis. Locally repairable codes. IEEE Transactions on Information Theory, 2014.
[32]
J. S. Plank and L. Xu. Optimizing cauchy reed-solomon codes for fault-tolerant network storage applications. In IEEE International Symposium on Network Computing and Applications (NCA), 2006.
[33]
J. S. Plank, S. Simmerman, and C. D. Schuman. Jerasure: A library in c/c++ facilitating erasure coding for storage applications-version 1.2. Technical report, Technical Report CS-08-627, University of Tennessee, 2008.
[34]
J. S. Plank, J. Luo, C. D. Schuman, L. Xu, Z. Wilcox-O'Hearn, et al. A performance evaluation and examination of open-source erasure coding libraries for storage. In FAST, 2009.
[35]
K. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran. A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the facebook warehouse cluster. In HotStorage, 2013.
[36]
K. Rashmi, P. Nakkiran, J. Wang, N. B. Shah, and K. Ramchandran. Having your cake and eating it too: Jointly optimal erasure codes for i/o, storage, and network-bandwidth. In FAST, 2015.
[37]
K. V. Rashmi, N. B. Shah, and P. V. Kumar. Optimal exact-regenerating codes for distributed storage at the msr and mbr points via a product-matrix construction. IEEE Transactions on Information Theory, 2011.
[38]
K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran. A "hitchhiker's" guide to fast and efficient data reconstruction in erasure-coded data centers. In SIGCOMM, 2014.
[39]
I. S. Reed and G. Solomon. Polynomial codes over certain finite fields. SIAM, 1960.
[40]
S. C. Rhea, P. R. Eaton, D. Geels, H. Weatherspoon, B. Y. Zhao, and J. Kubiatowicz. Pond: The oceanstore prototype. In FAST, 2003.
[41]
R. Rodrigues and B. Liskov. High availability in dhts: Erasure coding vs. replication. In Peer-to-Peer Systems. Springer, 2005.
[42]
M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur. Xoring elephants: Novel erasure codes for big data. In VLDB, 2013.
[43]
K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The hadoop distributed file system. In Mass Storage Systems and Technologies (MSST), 2010.
[44]
M. Silberstein, L. Ganesh, Y. Wang, L. Alvisi, and M. Dahlin. Lazy means smart: Reducing repair bandwidth costs in erasure-coded distributed storage. In SYSTOR, 2014.
[45]
R. Thakur, R. Rabenseifner, and W. Gropp. Optimization of collective communication operations in mpich. International Journal of High Performance Computing Applications, 2005.
[46]
R. Van Renesse and F. B. Schneider. Chain replication for supporting high throughput and availability. In OSDI, 2004.
[47]
Z. Wang, A. G. Dimakis, and J. Bruck. Rebuilding for array codes in distributed storage systems. In GLOBECOM Workshop, 2010.
[48]
H. Weatherspoon and J. D. Kubiatowicz. Erasure coding vs. replication: A quantitative comparison. In Peer-to-Peer Systems. Springer, 2002.
[49]
S. A. Weil, S. A. Brandt, E. L. Miller, and C. Maltzahn. Crush: Controlled, scalable, decentralized placement of replicated data. In ACM/IEEE conference on Supercomputing, 2006.
[50]
M. Xia, M. Saxena, M. Blaum, and D. A. Pease. A tale of two erasure codes in hdfs. In FAST, 2015.
[51]
L. Xiang, Y. Xu, J. Lui, and Q. Chang. Optimal recovery of single disk failure in rdp code storage systems. In SIGMETRICS, 2010.

Cited By

View all
  • (2024)Symmetrical Data Recovery: FPGA-Based Multi-Dimensional Elastic Recovery Acceleration for Multiple Block Failures in Ceph SystemsSymmetry10.3390/sym1606067216:6(672)Online publication date: 30-May-2024
  • (2024)Enabling Efficient Erasure Coding in Disaggregated Memory SystemsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.333278235:1(154-168)Online publication date: Jan-2024
  • (2024)Parallelized In-Network Aggregation for Failure Repair in Erasure-Coded Storage SystemsIEEE/ACM Transactions on Networking10.1109/TNET.2024.336799532:4(2888-2903)Online publication date: Aug-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
EuroSys '16: Proceedings of the Eleventh European Conference on Computer Systems
April 2016
605 pages
ISBN:9781450342407
DOI:10.1145/2901318
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 April 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. distributed storage
  2. erasure code
  3. network transfer
  4. reconstruction
  5. repair
  6. utilization

Qualifiers

  • Research-article

Conference

EuroSys '16
EuroSys '16: Eleventh EuroSys Conference 2016
April 18 - 21, 2016
London, United Kingdom

Acceptance Rates

EuroSys '16 Paper Acceptance Rate 38 of 180 submissions, 21%;
Overall Acceptance Rate 241 of 1,308 submissions, 18%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)41
  • Downloads (Last 6 weeks)4
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Symmetrical Data Recovery: FPGA-Based Multi-Dimensional Elastic Recovery Acceleration for Multiple Block Failures in Ceph SystemsSymmetry10.3390/sym1606067216:6(672)Online publication date: 30-May-2024
  • (2024)Enabling Efficient Erasure Coding in Disaggregated Memory SystemsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.333278235:1(154-168)Online publication date: Jan-2024
  • (2024)Parallelized In-Network Aggregation for Failure Repair in Erasure-Coded Storage SystemsIEEE/ACM Transactions on Networking10.1109/TNET.2024.336799532:4(2888-2903)Online publication date: Aug-2024
  • (2024)A Fast Location-Aware Repair Strategy for Mobile Grouped Storage ClustersIEEE Internet of Things Journal10.1109/JIOT.2024.336386811:12(20885-20898)Online publication date: 15-Jun-2024
  • (2024)Boosting Correlated Failure Repair in SSD Data CentersIEEE Internet of Things Journal10.1109/JIOT.2023.333997911:8(14228-14240)Online publication date: 15-Apr-2024
  • (2024)Greedy Transfer Planning Search For Improving Repair Throughput of RDP-like Coded Storage Clusters2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS)10.1109/IWQoS61813.2024.10682840(1-10)Online publication date: 19-Jun-2024
  • (2024)HGR: A Hybrid Global Graph-Based Recovery Approach for Cloud Storage Systems with Failure and Straggler Nodes2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS)10.1109/ICDCS60910.2024.00075(750-761)Online publication date: 23-Jul-2024
  • (2024)Rapper: A Parameter-Aware Repair-in-Memory Accelerator for Blockchain Storage Platform2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00042(468-482)Online publication date: 2-Mar-2024
  • (2024)HSM: A Hybrid Storage Method Based on the Heat of Data and Global Disk Space UtilizationIEEE Access10.1109/ACCESS.2024.338298712(48630-48639)Online publication date: 2024
  • (2024)ACPR: Adaptive Classification Predictive Repair Method for Different Fault ScenariosIEEE Access10.1109/ACCESS.2023.334688112(4631-4641)Online publication date: 2024
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media