More convenient more overhead: the performance evaluation of Hadoop streaming

Published: 02 November 2011

Abstract

Hadoop is a popular implementation of the MapReduce programming model, which has made programming on distributed systems much easier. In computing, convenience always comes at the cost of performance. Compared with MPI, Hadoop simplifies programming but degrades performance. In this work, we focus on the comparison between Hadoop and Hadoop Streaming. Hadoop Streaming is widely used because it frees programmers from the Java language, which makes the power of Hadoop easier to use. However, Hadoop Streaming also brings a performance penalty. Through a deep analysis of the Hadoop Streaming mechanism, we find that the pipe is the major bottleneck. In our experiments, we evaluate the performance of Hadoop Streaming with 6 benchmarks. The experimental results show that Hadoop Streaming degrades performance significantly only for data-intensive jobs; for computation-intensive jobs, Hadoop Streaming may even perform better because it can use a more efficient language than Java.
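To make the mechanism concrete: Hadoop Streaming runs the mapper and reducer as external programs and exchanges records with them as lines of text over standard input and output, with key and value separated by a tab by default. The following is a minimal word-count sketch in Python written in that style. It is an illustration only, not one of the paper's benchmarks, and the function names `map_lines` and `reduce_lines` are hypothetical.

```python
"""Word count in the Hadoop Streaming style (illustrative sketch only)."""
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Sum the counts for each key.

    Assumes the input is already sorted by key, which is how the framework
    delivers mapper output to a streaming reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"
```

In a real job, each function would be wrapped in a small script that iterates over `sys.stdin` and prints every record, and the two scripts would be passed to the streaming jar as the mapper and reducer. Every record then crosses a pipe as text in each direction, which is the path the abstract identifies as the major bottleneck for data-intensive jobs.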


Published In

RACS '11: Proceedings of the 2011 ACM Symposium on Research in Applied Computation
November 2011
355 pages
ISBN:9781450310871
DOI:10.1145/2103380

Sponsors

  • SIGAPP: ACM Special Interest Group on Applied Computing
  • ACCT: Association of Convergent Computing Technology
  • CUSST: Center for U-city Security & Surveillance Technology of the University of Suwon
  • KIISE: Korean Institute of Information Scientists and Engineers
  • KISTI

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Hadoop
  2. Hadoop streaming
  3. Linux kernel
  4. MapReduce

Qualifiers

  • Research-article

Conference

RACS '11: Research in Applied Computation Symposium
November 2 - 5, 2011
Miami, Florida

Acceptance Rates

Overall Acceptance Rate 393 of 1,581 submissions, 25%

Cited By

  • (2022) MIX-RS: A Multi-Indexing System Based on HDFS for Remote Sensing Data Storage. Tsinghua Science and Technology 27:6 (881-893). DOI: 10.26599/TST.2021.9010082. Online publication date: Dec-2022
  • (2022) A unified framework to improve the interoperability between HPC and Big Data languages and programming models. Future Generation Computer Systems 134 (123-139). DOI: 10.1016/j.future.2022.04.002. Online publication date: Sep-2022
  • (2021) A Digital Twin Decision Support System for the Urban Facility Management Process. Sensors 21:24 (8460). DOI: 10.3390/s21248460. Online publication date: 18-Dec-2021
  • (2020) MaRe: Processing Big Data with application containers on Apache Spark. GigaScience 9:5. DOI: 10.1093/gigascience/giaa042. Online publication date: 5-May-2020
  • (2020) Ignis: An efficient and scalable multi-language Big Data framework. Future Generation Computer Systems. DOI: 10.1016/j.future.2019.12.052. Online publication date: Jan-2020
  • (2019) Improved Programming-Language Independent MapReduce on Shared-Memory Systems. Big Data Analytics and Knowledge Discovery (206-220). DOI: 10.1007/978-3-030-27520-4_15. Online publication date: 3-Aug-2019
  • (2018) LinguaKit: A Big Data-Based Multilingual Tool for Linguistic Analysis and Information Extraction. 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS) (239-244). DOI: 10.1109/SNAMS.2018.8554689. Online publication date: Oct-2018
  • (2018) MapReduce a Comprehensive Review. 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE) (1-6). DOI: 10.1109/ICSCEE.2018.8538364. Online publication date: Jul-2018
  • (2018) XRT: Programming-Language Independent MapReduce on Shared-Memory Systems. 2018 IEEE International Congress on Big Data (BigData Congress) (182-189). DOI: 10.1109/BigDataCongress.2018.00031. Online publication date: Jul-2018
  • (2018) K-mer Counting: memory-efficient strategy, parallel computing and field of application for Bioinformatics. 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2561-2567). DOI: 10.1109/BIBM.2018.8621325. Online publication date: Dec-2018
