Short paper, DOI: 10.1145/2600212.2600715

SOR-HDFS: a SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS

Published: 23 June 2014

Abstract

In this paper, we propose SOR-HDFS, a SEDA (Staged Event-Driven Architecture)-based approach to improve the performance of the HDFS Write operation. The design not only incorporates RDMA-based communication over InfiniBand but also maximizes overlapping among the different stages of data transfer and I/O. Performance evaluations show that the new design improves the aggregated write throughput of the Enhanced DFSIO benchmark in Intel HiBench by up to 64% and reduces the job execution time by 37% compared to IPoIB (IP over InfiniBand). Compared to the previous best RDMA-enhanced design [4], the improvements in throughput and execution time are 30% and 20%, respectively. Our design can also improve the performance of the HBase Put operation by up to 53% over IPoIB and 29% over the previous best RDMA-enhanced HDFS design. To the best of our knowledge, this is the first SEDA-based design of HDFS in the literature.
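To illustrate why overlapping stages helps, the following minimal Python sketch chains three SEDA-style stages [7], each with its own bounded event queue and worker thread, so that while one packet is being written, the next can be checksummed and a third received. The stage names (`receive`, `checksum`, `write`), the `Stage` class, and `run_pipeline` are hypothetical; plain byte copies and an in-memory list stand in for RDMA transfers over InfiniBand and for disk I/O, and the abstract does not specify how SOR-HDFS actually partitions its stages. Under these assumptions, with n packets and per-stage times t_1, ..., t_k, a purely sequential loop takes roughly n(t_1 + ... + t_k), whereas a full pipeline approaches (t_1 + ... + t_k) + (n - 1) max_i t_i.

```python
import queue
import threading
import zlib
from typing import Any, Callable, List, Optional, Tuple

# Sentinel telling a stage to stop after forwarding the signal downstream.
_STOP = object()


class Stage:
    """A SEDA-style stage: a bounded event queue drained by its own worker thread."""

    def __init__(self, name: str, handler: Callable[[Any], Any],
                 downstream: Optional["Stage"] = None, capacity: int = 8) -> None:
        self.name = name
        self.handler = handler
        self.downstream = downstream
        # A bounded queue blocks producers when full, giving simple back-pressure.
        self.inbox: queue.Queue = queue.Queue(maxsize=capacity)
        self.thread = threading.Thread(target=self._run, name=name, daemon=True)

    def start(self) -> None:
        self.thread.start()

    def submit(self, event: Any) -> None:
        self.inbox.put(event)

    def _run(self) -> None:
        while True:
            event = self.inbox.get()
            if event is _STOP:
                # Propagate shutdown so downstream stages drain and exit in order.
                if self.downstream is not None:
                    self.downstream.submit(_STOP)
                return
            result = self.handler(event)
            if self.downstream is not None:
                self.downstream.submit(result)


def run_pipeline(packets: List[bytes]) -> List[Tuple[bytes, int]]:
    """Push packets through hypothetical receive -> checksum -> write stages."""
    written: List[Tuple[bytes, int]] = []

    def write(event: Tuple[bytes, int]) -> None:
        written.append(event)  # stand-in for appending a packet to a block file

    writer = Stage("write", write)
    checksummer = Stage("checksum", lambda pkt: (pkt, zlib.crc32(pkt)), downstream=writer)
    # Stand-in for copying a packet out of a (hypothetical) RDMA receive buffer.
    receiver = Stage("receive", lambda pkt: bytes(pkt), downstream=checksummer)

    stages = [receiver, checksummer, writer]
    for stage in stages:
        stage.start()
    for pkt in packets:
        receiver.submit(pkt)
    receiver.submit(_STOP)
    for stage in stages:
        stage.thread.join()  # each stage exits only after its upstream has drained
    return written


if __name__ == "__main__":
    for data, crc in run_pipeline([b"packet-0", b"packet-1", b"packet-2"]):
        print(data, hex(crc))
```

Because each stage has a single worker and a FIFO queue, packets leave the pipeline in the order they entered. Bounded queues give a simple form of back-pressure: a slow `write` stage eventually blocks its upstream stages instead of letting memory grow without limit. In CPython the global interpreter lock limits overlap for CPU-bound handlers, so the sketch mainly shows the structure; the overlap pays off when stages block on network or disk I/O.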

References

[1] Gordon at San Diego Supercomputer Center. http://www.sdsc.edu/us/resources/gordon/.
[2] N. S. Islam, X. Lu, M. W. Rahman, J. Jose, H. Wang, and D. K. Panda. A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters. In Proceedings of the 2nd Workshop on Big Data Benchmarking (WBDB), India, 2012.
[3] N. S. Islam, X. Lu, M. W. Rahman, and D. K. Panda. Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? In Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects (HOTI), San Jose, CA, 2013.
[4] N. S. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy, and D. K. Panda. High Performance RDMA-based Design of HDFS over InfiniBand. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Salt Lake City, 2012.
[5] J. Shafer, S. Rixner, and A. L. Cox. The Hadoop Distributed Filesystem: Balancing Portability and Performance. In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS '10), White Plains, NY, 2010.
[6] Stampede at Texas Advanced Computing Center. http://www.tacc.utexas.edu/resources/hpc/stampede.
[7] M. Welsh, D. Culler, and E. Brewer. SEDA: An Architecture for Well-Conditioned, Scalable Internet Services. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), Banff, Alberta, Canada, 2001.

Published In

HPDC '14: Proceedings of the 23rd international symposium on High-performance parallel and distributed computing
June 2014, 334 pages
ISBN: 9781450327497
DOI: 10.1145/2600212
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. big data
  2. clusters and networks
  3. hdfs
  4. storage

Conference

HPDC '14

Acceptance Rates

HPDC '14 paper acceptance rate: 21 of 130 submissions (16%)
Overall acceptance rate: 166 of 966 submissions (17%)

Cited By

  • (2023) Characterizing Lossy and Lossless Compression on Emerging BlueField DPU Architectures. 2023 IEEE Symposium on High-Performance Interconnects (HOTI), pp. 33-40. DOI: 10.1109/HOTI59126.2023.00019. Online publication date: Aug 2023.
  • (2023) xCCL: A Survey of Industry-Led Collective Communication Libraries for Deep Learning. Journal of Computer Science and Technology 38(1), pp. 166-195. DOI: 10.1007/s11390-023-2894-6. Online publication date: 31 Mar 2023.
  • (2022) The research and analysis of efficiency of hardware usage base on HDFS. Cluster Computing 25(5), pp. 3719-3732. DOI: 10.1007/s10586-022-03597-0. Online publication date: 11 May 2022.
  • (2021) HatRPC. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.1145/3458817.3476191. Online publication date: 14 Nov 2021.
  • (2020) DCache: A Distributed Cache Mechanism for HDFS based on RDMA. 2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 283-291. DOI: 10.1109/HPCC-SmartCity-DSS50907.2020.00035. Online publication date: Dec 2020.
  • (2019) Approaches of enhancing interoperations among high performance computing and big data analytics via augmentation. Cluster Computing. DOI: 10.1007/s10586-019-02960-y. Online publication date: 3 Aug 2019.
  • (2017) Leveraging Adaptive I/O to Optimize Collective Data Shuffling Patterns for Big Data Analytics. IEEE Transactions on Parallel and Distributed Systems 28(6), pp. 1663-1674. DOI: 10.1109/TPDS.2016.2627558. Online publication date: 1 Jun 2017.
  • (2017) A Comprehensive Study of MapReduce Over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters. IEEE Transactions on Parallel and Distributed Systems 28(3), pp. 633-646. DOI: 10.1109/TPDS.2016.2591947. Online publication date: 1 Mar 2017.
  • (2017) Swift-X. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 238-247. DOI: 10.1109/CCGRID.2017.103. Online publication date: 14 May 2017.
  • (2017) NVMD: Non-volatile memory assisted design for accelerating MapReduce and DAG execution frameworks on HPC systems. 2017 IEEE International Conference on Big Data (Big Data), pp. 369-374. DOI: 10.1109/BigData.2017.8257947. Online publication date: Dec 2017.