Research article

MapReduce short jobs optimization based on resource reuse

Microprocessors & Microsystems, Volume 47, Issue PA

Pages 178-187

https://doi.org/10.1016/j.micpro.2016.05.007

Published: 01 November 2016

Abstract

Hadoop is an open-source implementation of MapReduce for processing large datasets in a massively parallel manner. It was designed to execute large-scale jobs across a large number of nodes that provide both computation and storage. In practice, however, Hadoop is frequently used to process short jobs, which suffer from long response times and run inefficiently. To address this gap, this paper analyses the job execution process and identifies why short jobs run inefficiently in Hadoop. Based on the observation that tasks execute in multiple waves when the cluster is overloaded, we develop a resource-reuse mechanism to optimize the execution of short jobs. The mechanism reduces the frequency of resource allocation and reclamation. Experimental results suggest that the mechanism improves resource utilization and significantly reduces the runtime of short jobs.
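The abstract does not describe the mechanism in detail. As a rough illustration only, the hypothetical Python model below counts resource allocation and release events for one job under two policies: a baseline in which every task receives a freshly allocated container that is released when the task finishes, and a reuse policy in which the containers acquired for the first wave are kept and handed to later waves of the same job. The function names, the `Overhead` type, the one-container-per-slot simplification and the per-event costs are all assumptions made for this sketch; it is not the authors' design and not a Hadoop API.

```python
import math
from dataclasses import dataclass


@dataclass
class Overhead:
    """Number of container allocation and release events for one job (hypothetical model)."""
    allocations: int
    releases: int


def waves(num_tasks: int, slots: int) -> int:
    """Number of execution waves when num_tasks run on `slots` concurrent containers."""
    return math.ceil(num_tasks / slots)


def per_task_containers(num_tasks: int, slots: int) -> Overhead:
    """Baseline (assumed): every task gets a fresh container, released when the task ends."""
    return Overhead(allocations=num_tasks, releases=num_tasks)


def reused_containers(num_tasks: int, slots: int) -> Overhead:
    """Reuse (assumed policy): containers acquired for the first wave are kept and
    handed to the job's later tasks, so each is allocated and released once per job."""
    held = min(num_tasks, slots)
    return Overhead(allocations=held, releases=held)


def setup_time(o: Overhead, alloc_cost_s: float, release_cost_s: float) -> float:
    """Total time spent on allocation and release, given placeholder per-event costs."""
    return o.allocations * alloc_cost_s + o.releases * release_cost_s


if __name__ == "__main__":
    tasks, slots = 40, 10  # hypothetical job: 40 tasks on 10 free slots
    base = per_task_containers(tasks, slots)
    reuse = reused_containers(tasks, slots)
    print(waves(tasks, slots), base, reuse)
    print(setup_time(base, 1.0, 0.2), setup_time(reuse, 1.0, 0.2))
```

With 40 tasks on 10 free slots (four waves), this model gives 40 allocations and 40 releases for the baseline versus 10 of each with reuse. The costs passed to `setup_time` are placeholders, not measurements from the paper.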

Cited By

W. Zhang, A. Cheng, M. Geilen, Special Issue on Real-Time Scheduling on Heterogeneous Multi-core Processors, Microprocessors & Microsystems 47 (PA) (2016) 90-92, doi:10.1016/j.micpro.2016.11.005. Online publication date: 1 November 2016.
https://dl.acm.org/doi/10.1016/j.micpro.2016.11.005

Index Terms

Software and its engineering > Software organization and properties > Software system structures


Published In

Microprocessors & Microsystems, Volume 47, Issue PA, November 2016, 250 pages

ISSN: 0141-9331

Publisher: Elsevier Science Publishers B.V., Netherlands

Copyright © Elsevier B.V.
