Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

A write-friendly approach to manage namespace of Hadoop distributed file system by utilizing nonvolatile memory

Published: 01 October 2019 Publication History

Abstract

With the emergence of the big data era, various technologies have been proposed to cope with the exascale of data. For a considerably large volume of data, a single machine does not comprise enough resources to store the complete data. Hadoop distributed file system (HDFS) enables large datasets to be stored across the big data environment consisting of several machines. Although Hadoop has become a crucial part of the big data industry, because of its simple architecture which composed of master and slaves several problems such as scalability and performance bottleneck has been remained to solve. New storage technologies offer an opportunity to solve the problems and improve HDFS. We propose a novel management scheme for namespace metadata of HDFS by utilizing nonvolatile memory which has been mentioned as the next-generation device since flash memory devices. Nonvolatile memory, which can guarantee data persistence and high performance with byte-address access, alleviates Namenode bottlenecks resulting from journaling processes performed to preserve the file system’s metadata. Our proposed methods show significant improvement compared with block devices such as hard disk drive, solid-state drive in terms of NameNode performance.

References

[1]
Andrei M, Lemke C, Radestock G, Schulze R, Thiel C, Blanco R, Meghlan A, Sharique M, Seifert S, Vishnoi S et al (2017) Sap hana adoption of non-volatile memory. Proc VLDB Endow 10(12):1754–1765
[2]
[3]
[4]
[5]
[6]
Arulraj J, Pavlo A (2017) How to build a non-volatile memory database management system. In: Proceedings of the 2017 ACM International Conference on Management of Data. ACM, pp 1753–1758
[7]
Arulraj J, Perron M, Pavlo A (2016) Write-behind logging. Proc VLDB Endow 10(4):337–348
[8]
Bakratsas M, Basaras P, Katsaros D, Tassiulas L (2016) Hadoop mapreduce performance on ssds: the case of complex network analysis tasks. In: INNS Conference on Big Data. Springer, Berlin, pp 111–119
[9]
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
[10]
Gao S, Xu J, Härder T, He B, Choi B, Hu H (2015) Pcmlogging: optimizing transaction logging and recovery performance with PCM. IEEE Trans Knowl Data Eng 27(12):3332–3346
[14]
Huang S, Huang J, Dai J, Xie T, Huang B (2010) The hibench benchmark suite: characterization of the mapreduce-based data analysis. In: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010). IEEE, pp 41–51
[15]
Islam NS, Wasi-ur Rahman M, Lu X, Panda DK (2016) High performance design for HDFS with byte-addressability of NVM and RDMA. In: Proceedings of the 2016 International Conference on Supercomputing. ACM, p 8
[16]
Kambatla K, Chen Y (2014) The truth about mapreduce performance on SSDS. In: 28th Large Installation System Administration Conference (LISA14), pp 118–126
[17]
Kim M, Shin M, Park S (2016) Take me to SSD: a hybrid block-selection method on HDFS based on storage type. In: Proceedings of the 31st Annual ACM Symposium on Applied Computing. ACM, pp 965–971
[18]
Kim WH, Kim J, Baek W, Nam B, Won Y (2016) Nvwal: exploiting NVRAM in write-ahead logging. ACM SIGOPS Oper Syst Rev 50(2):385–398
[19]
Krish K, Iqbal MS, Butt AR (2014) Venu: Orchestrating SSDS in Hadoop storage. In: 2014 IEEE International Conference on Big Data (Big Data). IEEE, pp 207–212
[20]
Lee BC, Ipek E, Mutlu O, Burger D (2009) Architecting phase change memory as a scalable dram alternative. ACM SIGARCH Comput Archit News 37(3):2–13
[21]
Lee SK, Lim KH, Song H, Nam B, Noh SH (2017) WORT: write optimal radix tree for persistent memory storage systems. In: 15th USENIX Conference on File and Storage Technologies (FAST 17), pp 257–270
[22]
Lu Y, Shu J, Chen Y, Li T (2017) Octopus: an RDMA-enabled distributed persistent memory file system. In: 2017 USENIX Annual Technical Conference (USENIXATC 17), pp 773–785
[23]
Moon S, Lee J, Kee YS (2014) Introducing SSDS to the Hadoop mapreduce framework. In: 2014 IEEE 7th International Conference on Cloud Computing. IEEE, pp 272–279
[24]
Neshatpour K, Malik M, Ghodrat MA, Sasan A, Homayoun H (2015) Energy-efficient acceleration of big data analytics applications using fpgas. In: 2015 IEEE International Conference on Big Data (Big Data). IEEE, pp 115–123
[25]
Niazi S, Ismail M, Haridi S, Dowling J, Grohsschmiedt S, Ronström M (2017) Hopsfs: scaling hierarchical file system metadata using newsql databases. In: 15th USENIX Conference on File and Storage Technologies (FAST 17), pp 89–104
[26]
Oh G, Kim S, Lee SW, Moon B (2015) Sqlite optimization with phase change memory for mobile applications. Proc VLDB Endow 8(12):1454–1465
[27]
Shvachko K, Kuang H, Radia S, Chansler R et al (2010) The hadoop distributed file system. MSST 10:1–10
[28]
Wasi-ur Rahman M, Islam NS, Lu X, Panda DK (2016) Can non-volatile memory benefit mapreduce applications on hpc clusters? In: 2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS). IEEE, pp 19–24
[29]
Wasi-ur Rahman M, Islam NS, Lu X, Panda DKD (2017) Nvmd: non-volatile memory assisted design for accelerating mapreduce and dag execution frameworks on HPC systems. In: 2017 IEEE International Conference on Big Data (Big Data). IEEE, pp 369–374
[30]
Xia F, Jiang D, Xiong J, Sun N (2017) Hikv: a hybrid index key-value store for dram-NVM memory systems. In: 2017 USENIX Annual Technical Conference (USENIXATC 17), pp 349–362
[31]
Yang J, Izraelevitz J, Swanson S (2019) Orion: a distributed file system for non-volatile main memory and RDMA-capable networks. In: 17th USENIX Conference on File and Storage Technologies (FAST 19), pp 221–234
[32]
Yang J, Wei Q, Wang C, Chen C, Yong KL, He B (2016) Nv-tree: a consistent and workload-adaptive tree structure for non-volatile memory. IEEE Trans Comput 65(7):2169–2183

Index Terms

  1. A write-friendly approach to manage namespace of Hadoop distributed file system by utilizing nonvolatile memory
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image The Journal of Supercomputing
    The Journal of Supercomputing  Volume 75, Issue 10
    Oct 2019
    878 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 01 October 2019

    Author Tags

    1. Hadoop filesystem
    2. Nonvolatile memory
    3. NameNode
    4. Distributed computing

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 0
      Total Downloads
    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 13 Nov 2024

    Other Metrics

    Citations

    View Options

    View options

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media