research-article

A write-friendly approach to manage namespace of Hadoop distributed file system by utilizing nonvolatile memory

Authors:

Sanghyun Park

Volume 75, Issue 10

Pages 6632 - 6662

https://doi.org/10.1007/s11227-019-02876-9

Published: 01 October 2019

Abstract

With the emergence of the big data era, various technologies have been proposed to cope with data at exascale. For a sufficiently large volume of data, a single machine does not have enough resources to store the complete dataset. The Hadoop distributed file system (HDFS) enables large datasets to be stored across a big data environment consisting of several machines. Although Hadoop has become a crucial part of the big data industry, its simple architecture, which is composed of a master and slaves, leaves several problems, such as limited scalability and performance bottlenecks, still to be solved. New storage technologies offer an opportunity to solve these problems and improve HDFS. We propose a novel management scheme for the namespace metadata of HDFS that utilizes nonvolatile memory, which has been regarded as the next-generation storage device following flash memory. Nonvolatile memory, which guarantees data persistence and high performance with byte-addressable access, alleviates the NameNode bottlenecks that result from the journaling performed to preserve the file system’s metadata. Our proposed methods show a significant improvement in NameNode performance compared with block devices such as hard disk drives and solid-state drives.



Published In

The Journal of Supercomputing Volume 75, Issue 10

Oct 2019

878 pages

ISSN:0920-8542

Copyright © 2019 Springer Science+Business Media, LLC, part of Springer Nature.

Publisher

Kluwer Academic Publishers

United States
