Placement Scheduling for Replication in HDFS Based on Probabilistic Approach

Dinh-Mao Bui & Sungyoung Lee

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9677)

Included in the following conference series:

International Conference on Smart Homes and Health Telematics

Abstract

With the rapid evolution of Big Data analysis, Apache Hadoop plays an important role in delivering high availability on top of computing clusters. To maintain high-throughput access for computation, Apache Hadoop is equipped with the Hadoop Distributed File System (HDFS) for managing file operations. HDFS ensures reliability and high availability through a specific replication mechanism. However, because the workload varies across computing nodes, applying the same replication strategy everywhere might result in imbalance. To address this drawback of the HDFS architecture, we propose an approach that adaptively chooses the placement of replicas. To do so, network status and system utilization are used to create an individual replica placement strategy for each file. As a result, the proposed approach can select suitable destinations for replicas to improve performance. Consequently, the availability of the system is enhanced while the reliability of data storage is preserved.

Acknowledgment

This work was supported by the Industrial Core Technology Development Program (10049079, Develop of mining core technology exploiting personal big data) funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea); and supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) NRF-2014R1A2A2A01003914.

Author information

Authors and Affiliations

Department of Computer Engineering, Kyung Hee University, Suwon, Korea
Dinh-Mao Bui & Sungyoung Lee

Corresponding author

Correspondence to Sungyoung Lee.

Editor information

Editors and Affiliations

Iowa State University, Ames, Iowa, USA
Carl K. Chang
University of Bologna, Bologna, Italy
Lorenzo Chiari
The University of Massachusetts, Lowell, Massachusetts, USA
Yu Cao
Huazhong Univ. of Science and Technology, Wuhan, China
Hai Jin
Institut Mines Télécom Paris/CNRS, Paris, France
Mounir Mokhtari
Institut Mines Télécom, Paris, France
Hamdi Aloulou

Cite this paper

Bui, DM., Lee, S. (2016). Placement Scheduling for Replication in HDFS Based on Probabilistic Approach. In: Chang, C., Chiari, L., Cao, Y., Jin, H., Mokhtari, M., Aloulou, H. (eds) Inclusive Smart Cities and Digital Health. ICOST 2016. Lecture Notes in Computer Science, vol 9677. Springer, Cham. https://doi.org/10.1007/978-3-319-39601-9_28

DOI: https://doi.org/10.1007/978-3-319-39601-9_28
Published: 21 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39600-2
Online ISBN: 978-3-319-39601-9
eBook Packages: Computer Science, Computer Science (R0)
