Abstract
Along with the rapid evolution of Big Data analysis, Apache Hadoop plays an important role in delivering high availability on top of computing clusters. To maintain high-throughput access for computation, Apache Hadoop is equipped with the Hadoop Distributed File System (HDFS), which manages file operations. HDFS ensures reliability and high availability through a specific replication mechanism. However, because the workload varies from one computing node to another, keeping the same replication strategy everywhere can result in load imbalance. To address this drawback of the HDFS architecture, we propose an approach that adaptively chooses the placement of replicas. To do so, network status and system utilization are used to create an individual replica placement strategy for each file. The proposed approach thereby provides suitable destinations for replicas, which improves performance. Consequently, the availability of the system is enhanced while the reliability of data storage is preserved.
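The abstract does not spell out the placement model itself. As a rough illustration only, the general idea of scoring candidate nodes by system utilization and network status and then choosing replica destinations probabilistically might be sketched in Python as below. Everything here is an assumption made for illustration: the `DataNode` fields, the linear score in `node_weight` with its `alpha` parameter, and the rule that the second replica goes to a different rack (loosely inspired by the rack awareness of HDFS's default placement, which spreads replicas over more than one rack) are hypothetical. This is not the authors' probabilistic model and not the HDFS placement-policy API.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataNode:
    """Hypothetical snapshot of one DataNode's state, for illustration only."""
    name: str
    rack: str
    utilization: float  # assumed: fraction of node capacity in use, 0.0 to 1.0
    bandwidth: float    # assumed: normalized available bandwidth to the writer, 0.0 to 1.0


def node_weight(node: DataNode, alpha: float = 0.5) -> float:
    """Hypothetical score: higher for idle nodes with better network status."""
    return alpha * (1.0 - node.utilization) + (1.0 - alpha) * node.bandwidth


def choose_replica_targets(
    nodes: List[DataNode],
    replication: int = 3,
    alpha: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[DataNode]:
    """Pick distinct replica targets by weighted random sampling without replacement.

    Nodes with a non-positive weight (e.g. fully loaded, no bandwidth) are never chosen.
    The second replica is drawn from a different rack than the first when one exists,
    so that the replicas span more than one rack.
    """
    rng = rng or random.Random()
    candidates = [n for n in nodes if node_weight(n, alpha) > 0.0]
    if len(candidates) < replication:
        raise ValueError("not enough eligible DataNodes for the requested replication")

    chosen: List[DataNode] = []
    for i in range(replication):
        pool = candidates
        if i == 1:
            # Prefer a rack different from the first replica's rack, if any remain.
            off_rack = [n for n in candidates if n.rack != chosen[0].rack]
            pool = off_rack or candidates
        weights = [node_weight(n, alpha) for n in pool]
        # random.Random.choices draws with probability proportional to the weights.
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)  # sample without replacement
    return chosen


if __name__ == "__main__":
    # Hypothetical five-node cluster on three racks.
    cluster = [
        DataNode("dn1", "rack1", utilization=0.9, bandwidth=0.2),
        DataNode("dn2", "rack1", utilization=0.2, bandwidth=0.9),
        DataNode("dn3", "rack2", utilization=0.3, bandwidth=0.8),
        DataNode("dn4", "rack2", utilization=0.5, bandwidth=0.5),
        DataNode("dn5", "rack3", utilization=1.0, bandwidth=0.0),  # saturated: weight 0
    ]
    targets = choose_replica_targets(cluster, replication=3, rng=random.Random(7))
    print([n.name for n in targets])
```

In this sketch, `alpha` shifts the balance between favoring lightly loaded nodes and favoring well-connected ones. Sampling in proportion to the score, rather than always taking the top-scoring nodes, spreads new replicas across several good candidates instead of piling them onto a single node.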
Acknowledgment
This work was supported by the Industrial Core Technology Development Program (10049079, Develop of mining core technology exploiting personal big data) funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea); and supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) NRF-2014R1A2A2A01003914.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Bui, DM., Lee, S. (2016). Placement Scheduling for Replication in HDFS Based on Probabilistic Approach. In: Chang, C., Chiari, L., Cao, Y., Jin, H., Mokhtari, M., Aloulou, H. (eds) Inclusive Smart Cities and Digital Health. ICOST 2016. Lecture Notes in Computer Science, vol 9677. Springer, Cham. https://doi.org/10.1007/978-3-319-39601-9_28
DOI: https://doi.org/10.1007/978-3-319-39601-9_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39600-2
Online ISBN: 978-3-319-39601-9
eBook Packages: Computer Science, Computer Science (R0)