Abstract
With the development of content-sharing and collaborative computing services such as online social networks and scientific workflows, huge amounts of data are being generated. To process this data, multi-cloud systems, which integrate multiple clouds to provide a unified service in a collaborative manner, have been introduced. However, task scheduling in such a heterogeneous multi-cloud environment is very challenging. To reduce the response delay caused by cross-data-center file access, we propose a replica-aware task scheduling algorithm based on data replication. To speed up data access in multi-cloud cooperative caches, we present a load-balanced cache placement algorithm based on Bayesian networks. Our scheduling algorithm combines transferring computation with transferring data, and resource matching is performed according to node locality. Only the input data of non-local unassigned and failed map tasks are replicated and transferred in advance to target nodes to expedite task execution. In our cache placement method, the next task to be executed is predicted using Bayesian networks. Files to prefetch into the cache are selected according to caching profit and recycling cost. For each prefetched file, the target placement node is determined according to load balancing. Extensive experimental results show that the proposed replica-aware task scheduling algorithm outperforms benchmark scheduling algorithms in terms of node locality ratio and job response time, and that the load-balanced cache placement algorithm outperforms baseline caching algorithms in terms of prefetching hit ratio and execution time saving ratio.
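The last two cache placement steps described above (selecting prefetch files by caching profit and recycling cost, then choosing a target node by load) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no formulas, so the greedy net-gain ranking, the size budget, the least-loaded-node rule, and all names (`CandidateFile`, `select_prefetch_files`, `place_files`) are assumptions made for illustration. The Bayesian-network prediction of the next task is not modeled; its output is assumed to be the list of candidate files.

```python
from dataclasses import dataclass


@dataclass
class CandidateFile:
    name: str
    size: int              # positive size in arbitrary units (hypothetical)
    caching_profit: float  # assumed estimate of time saved if the file is cached
    recycling_cost: float  # assumed cost of reclaiming the cache space later


def select_prefetch_files(candidates, budget):
    """Greedily pick files whose profit exceeds their cost, within a size budget.

    Ranking by net gain per unit of size is an illustrative choice,
    not a rule taken from the paper.
    """
    ranked = sorted(
        (c for c in candidates if c.caching_profit > c.recycling_cost),
        key=lambda c: (c.caching_profit - c.recycling_cost) / c.size,
        reverse=True,
    )
    chosen, used = [], 0
    for c in ranked:
        if used + c.size <= budget:
            chosen.append(c)
            used += c.size
    return chosen


def place_files(files, node_loads):
    """Assign each file to the node with the lowest current load.

    node_loads maps node name to current load; the input dict is not mutated.
    """
    loads = dict(node_loads)
    placement = {}
    for f in files:
        target = min(loads, key=loads.get)  # least-loaded node so far
        placement[f.name] = target
        loads[target] += f.size             # account for the newly placed file
    return placement


if __name__ == "__main__":
    candidates = [
        CandidateFile("a", size=2, caching_profit=10.0, recycling_cost=2.0),
        CandidateFile("b", size=4, caching_profit=6.0, recycling_cost=7.0),
        CandidateFile("c", size=3, caching_profit=9.0, recycling_cost=3.0),
        CandidateFile("d", size=4, caching_profit=5.0, recycling_cost=1.0),
    ]
    chosen = select_prefetch_files(candidates, budget=5)
    print([f.name for f in chosen])
    print(place_files(chosen, {"n1": 3, "n2": 1}))
```

In this sketch a file whose recycling cost is at least its caching profit is never prefetched, and each placement updates the running load so that consecutive files spread across nodes rather than piling onto one.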
Acknowledgements
The work was supported by the National Natural Science Foundation (NSF) under Grants (Nos. 61672397, 61873341, 61472294, 61771354), Application Foundation Frontier Project of WuHan (No. 2018010401011290), the Young Teachers’ Scientific Research Ability Promotion Project of Huanghuai University (No. 2017LX09), Beijing Intelligent Logistics System Collaborative Innovation Center Open Project (No. BILSCIC-2018KF-02), Key Laboratory of Agricultural Remote Sensing [2017002], Beijing Youth Top-notch Talent Plan of High-Creation Plan (No. 2017000026833ZK25), Canal Plan-Leading Talent Project of Beijing Tongzhou District (No. YHLB2017038), and Beijing Key Laboratory of Intelligent Logistics System (No. BZ0211). Any opinions, findings, and conclusions are those of the authors and do not necessarily reflect the views of the above agencies.
Cite this article
Li, C., Zhang, J. & Tang, H. Replica-aware task scheduling and load balanced cache placement for delay reduction in multi-cloud environment. J Supercomput 75, 2805–2836 (2019). https://doi.org/10.1007/s11227-018-2695-9
DOI: https://doi.org/10.1007/s11227-018-2695-9