Abstract
This paper proposes an effective replica-aware scheduling algorithm for independent jobs in Grid and distributed systems. The proposed algorithm considers not only the execution time of jobs but also the location and transfer time of data and data replica that these jobs require. We propose a cost model to estimate the starting time and earliest completion time of a job and its associated data (original or replicated). Based on the estimated time, the scheduling algorithm finds a proper execution sequence for the jobs and the data with the goal to minimize the makespan of the jobs. Our experiment results demonstrate that the proposed algorithm is scalable and outperforms a random job selection strategy. We also show that the proposed algorithm performs well compared to a conservative theoretical lower bound, with performance within 15% of the lower bound on average and within 40% in the worst case.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Ullman, J.D.: NP-complete scheduling problems. Journal of Computer and System Sciences 10(3), 384–393 (1975)
Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of NP-Completeness. WH Freeman and Co., New York (1979)
Casanova, H., Obertelli, G., Berman, F., Wolski, R.: The apples parameter sweep template: user-level middleware for the grid. In: Proceedings of the ACM/IEEE conference on Supercomputing, pp. 75–76 (2000)
Altschul, S., Gish, W., Miller, W., Myers, E., Lipman, D.: Basic local alignment search tool. Journal of Molecular Biology 215(3), 403–410 (1990)
Casanova, H., Legrand, A., Zagorodnov, D., Berman, F.: Using simulation to evaluate scheduling heuristics for a class of applications in grid environments. Technical Report RR-1999-46, LIP, ENS (2000)
Casanova, H., Legrand, A., Zagorodnov, D., Berman, F.: Heuristics for scheduling parameter sweep applications in grid environments. In: Proceedings of Heterogeneous Computing Workshop, pp. 349–363 (2000)
Ranganathan, K., Foster, I.: Simulation studies of computation and data scheduling algorithms for data grids. Journal of Grid Computing 1(10), 53–62 (2003)
Chakrabarti, A., Dheepak, R., Sengupta, S.: Integration of scheduling and replication in data grids. In: Bougé, L., Prasanna, V.K. (eds.) HiPC 2004. LNCS, vol. 3296, pp. 375–385. Springer, Heidelberg (2004)
Tang, M., Lee, B., Tang, X., Yeo, C.: Combining data replication algorithms and job scheduling heuristics in the data grid. In: Cunha, J.C., Medeiros, P.D. (eds.) Euro-Par 2005. LNCS, vol. 3648, pp. 381–390. Springer, Heidelberg (2005)
Desprez, F., Vernois, A.: Simultaneous scheduling of replication and computation for data-intensive applications on the grid. Journal of Grid Computing 4(1), 19–31 (2006)
Beaumont, O., Legrand, A., Marchal, L., Robert, Y.: Pipelining broadcasts on heterogeneous platforms. IEEE Transactions on Parallel and Distributed Systems 16(4), 300–313 (2005)
Chen, C., Hsu, C., Wu, J., Liu, P.: GFS: A distributed file system with multi-source data access and replication for grid computing. In: The 5th Workshop on Grid Technologies and Applications (2008)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liao, WC., Wu, JJ. (2010). Replica-Aware Job Scheduling in Distributed Systems. In: Bellavista, P., Chang, RS., Chao, HC., Lin, SF., Sloot, P.M.A. (eds) Advances in Grid and Pervasive Computing. GPC 2010. Lecture Notes in Computer Science, vol 6104. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13067-0_32
Download citation
DOI: https://doi.org/10.1007/978-3-642-13067-0_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13066-3
Online ISBN: 978-3-642-13067-0
eBook Packages: Computer ScienceComputer Science (R0)