research-article

MapReduce short jobs optimization based on resource reuse

Published: 01 November 2016

Abstract

Hadoop is an open-source implementation of MapReduce for processing large datasets in a massively parallel manner. It was designed to execute large-scale jobs across a very large number of nodes that provide both computing and storage. In practice, however, Hadoop is frequently used to process short jobs, which suffer from poor response times and run inefficiently. To address this gap, this paper analyses the job execution process and identifies the reasons why short jobs run inefficiently in Hadoop. Based on the observation that tasks execute in multiple waves when the cluster is overloaded, we develop a resource-reuse mechanism to optimize the execution of short jobs. This mechanism reduces the frequency of resource allocation and recovery. Experimental results suggest that the mechanism improves resource utilization and significantly reduces the runtime of short jobs.
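The effect of resource reuse on allocation frequency can be illustrated with a toy model. The sketch below is not the authors' implementation: the function name `run_job`, the notion of a fixed number of slots, and the policy of holding a container until no pending task needs it are all assumptions made for illustration. It only counts how many container allocations and releases a job triggers when its tasks run in several waves.

```python
from collections import deque


def run_job(num_tasks: int, num_slots: int, reuse: bool) -> dict:
    """Count allocations and releases for a job that runs in waves.

    Toy model (hypothetical, not the paper's algorithm): at most
    num_slots tasks run at once. Without reuse, every container is
    released at the end of its wave and re-allocated for the next one.
    With reuse, a container is kept for the next pending task and is
    released only when no pending task needs it.
    """
    if num_tasks < 0 or num_slots <= 0:
        raise ValueError("num_tasks must be >= 0 and num_slots must be > 0")

    pending = deque(range(num_tasks))
    allocations = releases = waves = 0
    held = 0  # containers the job currently holds

    while pending:
        waves += 1
        wave_size = min(num_slots, len(pending))

        # Allocate only the containers the job does not already hold.
        new_containers = wave_size - held
        if new_containers > 0:
            allocations += new_containers
            held += new_containers

        # Run the wave: each held container executes one task.
        for _ in range(wave_size):
            pending.popleft()

        if not reuse or not pending:
            # Give every container back to the cluster.
            releases += held
            held = 0
        elif held > len(pending):
            # Keep only as many containers as there are tasks left.
            surplus = held - len(pending)
            releases += surplus
            held -= surplus

    return {"waves": waves, "allocations": allocations, "releases": releases}


if __name__ == "__main__":
    print("no reuse:", run_job(10, 4, reuse=False))
    print("reuse:   ", run_job(10, 4, reuse=True))
```

In this model a job with 10 tasks on 4 slots runs in 3 waves either way, but with reuse it requests 4 containers instead of 10. The model ignores scheduling delay, data locality, and competing jobs, so it shows only the counting argument, not the measured speed-up.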

Published In

Microprocessors & Microsystems, Volume 47, Issue PA
November 2016
250 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands
