DOI: 10.1145/1851476.1851598
research-article

MR-scope: a real-time tracing tool for MapReduce

Published: 21 June 2010

Abstract

The MapReduce programming model is emerging as an efficient tool for data-intensive applications. Hadoop, an open-source implementation of MapReduce, has been widely adopted by both academia and enterprise. Recently, considerable effort has gone into improving the performance of MapReduce systems and into analyzing the MapReduce process based on the log files generated during Hadoop execution. Visualizing log files is a useful way to understand the behavior of the Hadoop process. In this paper, we present MR-Scope, a real-time MapReduce tracing tool. MR-Scope provides real-time insight into the MapReduce process, including the ongoing progress of every task hosted in a Task Tracker. In addition, it displays the health of the Hadoop cluster's data nodes, the distribution of the file system's blocks and their replicas, and the content of the different block splits of the file system. We implement MR-Scope in native Hadoop 0.1. Experimental results demonstrate that MR-Scope's overhead is less than 4% when running the wordcount benchmark.
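The abstract reports overhead only as a percentage and does not say how it was measured. A common way to express tracing overhead is the relative increase in job completion time with the tracer enabled. The sketch below assumes that definition; the function name and the timings in the usage line are hypothetical and are not measurements from the paper.

```python
def relative_overhead(baseline_seconds: float, traced_seconds: float) -> float:
    """Return the tracing overhead as a percentage of the baseline runtime.

    Assumed definition (not taken from the paper):
        overhead = (traced - baseline) / baseline * 100
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline runtime must be positive")
    return (traced_seconds - baseline_seconds) / baseline_seconds * 100.0


if __name__ == "__main__":
    # Hypothetical wall-clock times for one job, without and with tracing.
    overhead = relative_overhead(baseline_seconds=100.0, traced_seconds=103.5)
    print(f"overhead: {overhead:.1f}%")
```

Under this definition, a claim of "less than 4%" would mean the traced run takes less than 1.04 times as long as the untraced run.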


Published In

HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
June 2010
911 pages
ISBN:9781605589428
DOI:10.1145/1851476

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Hadoop
  2. MapReduce
  3. real-time tracing
  4. visualization

Qualifiers

  • Research-article

Conference

HPDC '10
Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%

Cited By

  • (2021) Cloud deployment of game theoretic categorical clustering using apache spark: An application to car recommendation. Machine Learning with Applications (100100). DOI: 10.1016/j.mlwa.2021.100100. Online publication date: Jul-2021
  • (2017) Enabling fast failure recovery in shared Hadoop clusters: Towards failure-aware scheduling. Future Generation Computer Systems 74 (208-219). DOI: 10.1016/j.future.2016.02.015. Online publication date: Sep-2017
  • (2016) On the Dynamic Shifting of the MapReduce Timeout. Managing and Processing Big Data in Cloud Computing (1-22). DOI: 10.4018/978-1-4666-9767-6.ch001. Online publication date: 2016
  • (2016) MapReduce Parallel Programming Model. International Journal of Parallel Programming 44:4 (832-866). DOI: 10.1007/s10766-015-0395-0. Online publication date: 1-Aug-2016
  • (2015) On Understanding the Energy Impact of Speculative Execution in Hadoop. Proceedings of the 2015 IEEE International Conference on Data Science and Data Intensive Systems (DSDIS) (396-403). DOI: 10.1109/DSDIS.2015.45. Online publication date: 11-Dec-2015
  • (2014) Peer-Comparison Based Fault Diagnosis for Hadoop Systems. Applied Mechanics and Materials 621 (235-240). DOI: 10.4028/www.scientific.net/AMM.621.235. Online publication date: Aug-2014
  • (2013) Theius. Proceedings of the 2013 IEEE International Conference on Cloud Engineering (177-182). DOI: 10.1109/IC2E.2013.36. Online publication date: 25-Mar-2013
  • (2011) Improving MapReduce energy efficiency for computation intensive workloads. Proceedings of the 2011 International Green Computing Conference and Workshops (1-8). DOI: 10.1109/IGCC.2011.6008564. Online publication date: 25-Jul-2011
