research-article

Toward Efficient Block Replication Management in Distributed Storage

Published: 19 October 2020

Abstract

Distributed and parallel file systems commonly suffer from load imbalance and resource contention because of the bursty behavior exhibited by scientific applications. This article presents an adaptive scheme that supports dynamic block data replication, together with an efficient replica placement policy, to improve the I/O performance of a distributed file system. Our goal is to achieve not only balanced data replication among storage servers but also a high degree of data access parallelism for applications. We first present mathematical cost models that formulate the cost of data block replication by considering both the replication overhead and the reduction in access time to the replicated data. To verify the validity and feasibility of the proposed cost models, we implement our proposal in a prototype distributed file system and evaluate it using a set of representative database-relevant application benchmarks. Our results demonstrate that the proposed approach can boost the usage efficiency of the data replicas with acceptable replication management overhead. Consequently, the overall data throughput of the storage system can be noticeably improved. In summary, the proposed replication management scheme works well, especially for database-relevant applications that exhibit uneven access frequencies and patterns across different parts of files.
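The abstract does not state the cost models themselves, so the following is only a rough illustration of the trade-off it describes: weighing replication overhead against the access time a replica would save, then placing the replica to keep servers balanced. Every name, parameter, and formula here (`Block`, `replication_gain`, `pick_target`, the linear saving estimate, the least-loaded placement rule) is a hypothetical assumption for illustration and is not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical description of one data block (not the article's model)."""
    block_id: str
    size_mb: float
    expected_reads: int   # assumed number of reads over the planning window
    holders: frozenset    # servers that already store a copy


def replication_gain(block, read_time_s, replica_read_time_s, copy_mb_per_s):
    """Net benefit in seconds: total read time saved minus the copy overhead.

    Assumes a simple linear model: each expected read saves the difference
    between the current read time and the read time with the extra replica.
    """
    saved = block.expected_reads * (read_time_s - replica_read_time_s)
    overhead = block.size_mb / copy_mb_per_s
    return saved - overhead


def pick_target(block, server_load):
    """Return the least-loaded server that does not hold the block, or None."""
    candidates = {s: load for s, load in server_load.items()
                  if s not in block.holders}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

Under these assumptions, a block would be replicated only when `replication_gain` is positive, and the new copy would go to the server returned by `pick_target`. A real system would also need to account for write consistency and replica removal, which this sketch ignores.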

Cited By

  • (2024) Improving big data analytics data processing speed through map reduce scheduling and replica placement with HDFS using genetic optimization techniques. Journal of Intelligent & Fuzzy Systems 46:4 (10863-10882). DOI: 10.3233/JIFS-240069. Online publication date: 18-Apr-2024
  • (2022) Research on the Application of Distributed Key-Value Storage Technology in Computer Database Platform. 2022 IEEE 2nd International Conference on Power, Electronics and Computer Applications (ICPECA) (690-694). DOI: 10.1109/ICPECA53709.2022.9719107. Online publication date: 21-Jan-2022

    Published In

    ACM Transactions on Modeling and Performance Evaluation of Computing Systems  Volume 5, Issue 3
    September 2020
    130 pages
    ISSN:2376-3639
    EISSN:2376-3647
    DOI:10.1145/3403640
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 October 2020
    Accepted: 01 July 2020
    Revised: 01 May 2020
    Received: 01 September 2019
    Published in TOMPECS Volume 5, Issue 3

    Author Tags

    1. Distributed file systems
    2. access load balance
    3. block data replication
    4. modeling
    5. replica placement

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • Fundamental Research Funds for the Central Universities
    • Natural Science Foundation Project of CQ CSTC
    • Hunan Provincial Natural Science Foundation of China
