Research article, ICPP Conference Proceedings. DOI: 10.1145/3337821.3337831

Fast Recovery Techniques for Erasure-coded Clusters in Non-uniform Traffic Network

Published: 05 August 2019

Abstract

Nowadays, many practical systems adopt erasure codes to ensure reliability and reduce storage overhead. However, erasure codes also suffer from poor recovery performance. In practice, network links, such as those in peer-to-peer and cross-data-center networks, have non-uniform bandwidth for various reasons. To reduce recovery time, we propose Parallel Pipeline Tree (PPT) and Parallel Pipeline Cross-Tree (PPCT) to speed up single-node and multiple-node recovery, respectively, in non-uniform traffic network environments. By exploiting the bandwidth gap among links, PPT constructs a bandwidth-based tree path and pipelines the data in parallel. By letting helpers share the traffic pressure of the requesters, PPCT constructs a tree-like path and pipelines the data in parallel without additional helpers. We also theoretically explain the effect of PPT and PPCT when used in a uniform network environment. Experiments on geo-distributed Amazon EC2 show that PPCT reduces recovery time by up to 37.2% compared with the traditional technique, and PPT reduces it by up to 89.2%, 76.4%, and 21.6% compared with the traditional technique, Partial-Parallel-Repair, and Repair Pipelining, respectively. PPT and PPCT significantly improve the recovery performance of erasure codes.
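The abstract does not spell out how PPT builds its bandwidth-based tree. As a hedged illustration of the general idea only, the Python sketch below assumes a symmetric matrix `bw` of pairwise link bandwidths (node 0 is the requester, the other nodes are helpers) and builds a maximum spanning tree with Prim's algorithm. A maximum spanning tree maximizes its smallest link bandwidth, and that smallest link bounds the rate of a pipelined transfer along the tree. The function names `max_bandwidth_tree` and `bottleneck` and the example bandwidths are hypothetical; this is a generic graph technique, not necessarily the tree construction used by PPT.

```python
import heapq
from typing import List, Optional


def max_bandwidth_tree(bw: List[List[float]], root: int) -> List[Optional[int]]:
    """Prim's algorithm for a maximum spanning tree rooted at `root`.

    bw[u][v] is the (assumed symmetric) bandwidth of the link between u and v.
    Returns parent[v] for every node (parent[root] is None); repair data would
    flow from each node to its parent, toward the requester at the root.
    """
    n = len(bw)
    parent: List[Optional[int]] = [None] * n
    in_tree = [False] * n
    # heapq is a min-heap, so bandwidths are negated to pop the widest link first.
    # Entries are (-bandwidth, node, parent); -1 marks "no parent" for the root.
    heap = [(0.0, root, -1)]
    while heap:
        _, v, p = heapq.heappop(heap)
        if in_tree[v]:
            continue  # stale entry: v was already attached via a wider link
        in_tree[v] = True
        parent[v] = None if p < 0 else p
        for u in range(n):
            if not in_tree[u]:
                heapq.heappush(heap, (-bw[v][u], u, v))
    return parent


def bottleneck(bw: List[List[float]], parent: List[Optional[int]]) -> float:
    """Smallest link bandwidth used by the tree, i.e. its pipeline rate limit."""
    return min(bw[v][p] for v, p in enumerate(parent) if p is not None)


if __name__ == "__main__":
    # Hypothetical non-uniform bandwidths (e.g. MB/s); node 0 is the requester.
    bw = [
        [0, 10, 2, 1],
        [10, 0, 8, 3],
        [2, 8, 0, 9],
        [1, 3, 9, 0],
    ]
    parent = max_bandwidth_tree(bw, root=0)
    print(parent)                                     # [None, 0, 1, 2]
    print(bottleneck(bw, parent))                     # 8
    print(min(bw[0][h] for h in range(1, len(bw))))   # 1: direct links only
```

With this example matrix, the tree routes helpers 2 and 3 through helper 1 instead of over their slow direct links to the requester, raising the bottleneck bandwidth from 1 to 8. The model ignores per-node NIC limits, partial decoding cost, and slice-level pipelining, so it only shows why a bandwidth-aware path can help in a non-uniform network.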

References

[1] 2015. iPerf - The ultimate speed test tool for TCP, UDP and SCTP. https://iperf.fr/iperf-doc.php
[2] 2015. Under the hood: Facebook's cold storage system. https://code.fb.com/core-data/under-the-hood-facebook-s-cold-storage-system/
[3] 2016. Ceph Erasure Coding. http://docs.ceph.com/docs/mimic/rados/operations/erasure-code/
[4] 2017. HDFS Erasure Coding. https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html
[5] 2018. The Digitization of the World From Edge to Core. https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
[6] Ranjita Bhagwan, Kiran Tati, Yuchung Cheng, Stefan Savage, and Geoffrey M Voelker. 2004. Total Recall: System Support for Automated Availability Management. In NSDI, Vol. 4. 25--25.
[7] Brad Calder, Ju Wang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, Yikang Xu, Shashwat Srivastav, Jiesheng Wu, Huseyin Simitci, et al. 2011. Windows Azure Storage: a highly available cloud storage service with strong consistency. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 143--157.
[8] Yu Lin Chen, Shuai Mu, Jinyang Li, Cheng Huang, Jin Li, Aaron Ogus, and Douglas Phillips. 2017. Giza: Erasure coding objects across global data centers. In Proc. USENIX Annu. Tech. Conf. (USENIX ATC).
[9] Toni Ernvall, Salim El Rouayheb, Camilla Hollanti, and H Vincent Poor. 2013. Capacity and security of heterogeneous distributed storage systems. IEEE Journal on Selected Areas in Communications 31, 12 (2013), 2701--2709.
[10] Daniel Ford, François Labelle, Florentina I Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan. 2010. Availability in Globally Distributed Storage Systems. In OSDI, Vol. 10. 1--7.
[11] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google file system. Vol. 37. ACM.
[12] Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, Sergey Yekhanin, et al. 2012. Erasure Coding in Windows Azure Storage. In USENIX Annual Technical Conference. Boston, MA, 15--26.
[13] Osama Khan, Randal C Burns, James S Plank, William Pierce, and Cheng Huang. 2012. Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads. In FAST. 20.
[14] John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, et al. 2000. OceanStore: An architecture for global-scale persistent storage. In ACM SIGARCH Computer Architecture News, Vol. 28. ACM, 190--201.
[15] James F Kurose and Keith W Ross. 2012. Computer Networking: A Top-Down Approach (6th ed.).
[16] Jun Li and Baochun Li. 2017. Beehive: erasure codes for fixing multiple failures in distributed storage systems. IEEE Transactions on Parallel and Distributed Systems 28, 5 (2017), 1257--1270.
[17] Jun Li, Shuang Yang, Xin Wang, and Baochun Li. 2010. Tree-structured data regeneration in distributed storage systems with regenerating codes. In 2010 Proceedings IEEE INFOCOM. IEEE, 1--9.
[18] Jun Li, Shuang Yang, Xin Wang, Xiangyang Xue, and Baochun Li. 2009. Tree-structured data regeneration with network coding in distributed storage systems. In 2009 17th International Workshop on Quality of Service. IEEE, 1--9.
[19] Runhui Li, Xiaolu Li, Patrick PC Lee, and Qun Huang. 2017. Repair pipelining for erasure-coded storage. In Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC'17). 567--579.
[20] WK Lin, C Ye, and DM Chiu. 2007. Decentralized replication algorithms for improving file availability in P2P networks. In Quality of Service, 2007 Fifteenth IEEE International Workshop on. IEEE, 29--37.
[21] Ao Ma, Rachel Traylor, Fred Douglis, Mark Chamness, Guanlin Lu, Darren Sawyer, Surendar Chandra, and Windsor Hsu. 2015. RAIDShield: characterizing, monitoring, and proactively protecting against disk failures. ACM Transactions on Storage (TOS) 11, 4 (2015), 17.
[22] Yadi Ma, Thyaga Nandagopal, Krishna PN Puttaswamy, and Suman Banerjee. 2013. An ensemble of replication and erasure codes for cloud file systems. In INFOCOM, 2013 Proceedings IEEE. IEEE, 1276--1284.
[23] Subrata Mitra, Rajesh Panta, Moo-Ryong Ra, and Saurabh Bagchi. 2016. Partial-parallel-repair (PPR): a distributed technique for repairing erasure coded storage. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 30.
[24] Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, et al. 2014. f4: Facebook's warm blob storage system. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation. USENIX Association, 383--398.
[25] James S Plank, Scott Simmerman, and Catherine D Schuman. 2008. Jerasure: A library in C/C++ facilitating erasure coding for storage applications-Version 1.2. University of Tennessee, Tech. Rep. CS-08-627 23 (2008).
[26] KV Rashmi, Nihar B Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, and Kannan Ramchandran. 2015. A hitchhiker's guide to fast and efficient data reconstruction in erasure-coded data centers. ACM SIGCOMM Computer Communication Review 44, 4 (2015), 331--342.
[27] Korlakai Vinayak Rashmi, Nihar B Shah, and P Vijay Kumar. 2011. Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction. IEEE Transactions on Information Theory 57, 8 (2011), 5227--5239.
[28] Irving S Reed and Gustave Solomon. 1960. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics 8, 2 (1960), 300--304.
[29] Birenjith Sasidharan, Myna Vajha, and P Vijay Kumar. 2016. An explicit, coupled-layer construction of a high-rate MSR code with low sub-packetization level, small field size and all-node repair. arXiv preprint arXiv:1607.07335 (2016).
[30] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. The Hadoop distributed file system. In Mass storage systems and technologies (MSST), 2010 IEEE 26th symposium on. IEEE, 1--10.
[31] Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck. 2013. Zigzag codes: MDS array codes with optimal rebuilding. IEEE Transactions on Information Theory 59, 3 (2013), 1597--1616.
[32] Hakim Weatherspoon and John D Kubiatowicz. 2002. Erasure coding vs. replication: A quantitative comparison. In International Workshop on Peer-to-Peer Systems. Springer, 328--337.

Cited By

  • (2024) Stripe-schedule Aware Repair in Erasure-coded Clusters with Heterogeneous Star Networks. ACM Transactions on Architecture and Code Optimization 21:3 (1-24). DOI: 10.1145/3664926. Online publication date: 13-May-2024
  • (2024) A Fast Location-Aware Repair Strategy for Mobile Grouped Storage Clusters. IEEE Internet of Things Journal 11:12 (20885-20898). DOI: 10.1109/JIOT.2024.3363868. Online publication date: 15-Jun-2024
  • (2024) Boosting Correlated Failure Repair in SSD Data Centers. IEEE Internet of Things Journal 11:8 (14228-14240). DOI: 10.1109/JIOT.2023.3339979. Online publication date: 15-Apr-2024


Information

Published In

ACM Other conferences
ICPP '19: Proceedings of the 48th International Conference on Parallel Processing
August 2019
1107 pages
ISBN:9781450362955
DOI:10.1145/3337821
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Erasure code
  2. Non-uniform traffic
  3. Recovery
  4. Transmitting path

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2019

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 44
  • Downloads (last 6 weeks): 3
Reflects downloads up to 16 Nov 2024

Citations

  • (2023) ParaRC. Proceedings of the 21st USENIX Conference on File and Storage Technologies (17-31). DOI: 10.5555/3585938.3585940. Online publication date: 21-Feb-2023
  • (2023) Boosting Multi-Block Repair in Cloud Storage Systems with Wide-Stripe Erasure Coding. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (279-289). DOI: 10.1109/IPDPS54959.2023.00036. Online publication date: May-2023
  • (2023) FullRepair: Towards Optimal Repair Pipelining in Erasure-Coded Clustered Storage Systems. 2023 IEEE International Conference on Cluster Computing (CLUSTER) (107-117). DOI: 10.1109/CLUSTER52292.2023.00017. Online publication date: 31-Oct-2023
  • (2023) Optimized Proactive Recovery in Erasure-Coded Cloud Storage Systems. IEEE Access 11 (38226-38239). DOI: 10.1109/ACCESS.2023.3267106. Online publication date: 2023
  • (2023) BPR: An Erasure Coding Batch Parallel Repair Approach in Distributed Storage Systems. IEEE Access 11 (44509-44518). DOI: 10.1109/ACCESS.2023.3257404. Online publication date: 2023
  • (2022) LEGOStore. Proceedings of the VLDB Endowment 15:10 (2201-2215). DOI: 10.14778/3547305.3547323. Online publication date: 7-Sep-2022
  • (2022) Exploiting Parallelism of Disk Failure Recovery via Partial Stripe Repair for an Erasure-Coded High-Density Storage Server. Proceedings of the 51st International Conference on Parallel Processing (1-11). DOI: 10.1145/3545008.3545014. Online publication date: 29-Aug-2022