Abstract
Traditional job schedulers for grid or cluster systems are responsible for assigning incoming jobs to compute nodes in such a way that some evaluative condition is met. Such systems generally take into consideration the availability of compute cycles, queue lengths, and expected job execution times, but they typically do not account directly for data staging and thus miss significant associated opportunities for optimisation. Intuitively, a tighter integration of job scheduling and automated data replication can yield significant advantages due to the potential for optimised, faster access to data and decreased overall execution time. In this paper we consider data placement as a first-class citizen in scheduling and use an optimisation heuristic for generating schedules. We make the following two contributions. First, we identify the necessity for co-scheduling job dispatching and data replication assignments and posit that simultaneously scheduling both is critical for achieving good makespans. Second, we show that deploying a genetic search algorithm to solve the optimal allocation problem has the potential to achieve significant speed-up results versus traditional allocation mechanisms. Through simulation, we show that our algorithm provides on average an approximately 20-45% faster makespan than greedy schedulers.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Adamic, L.: Zipf, Power-laws, and Pareto – a ranking tutorial, http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
Baeck, T., Fogel, D., Michalewicz, Z. (eds.): Evolutionary Computation 1: Basic Algorithms and Operators. Institute of Physics Publishing (2000)
Braun, T., Siegel, H., Beck, N., Boloni, L., Maheswaran, M., Reuther, A., Robertson, J., Theys, M., Yao, B., Hengsen, D., Freund, R.: A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems. Journal of Parallel and Distributed Computing 61(6) (June 2001)
Casanova, H., Legrand, A., Zagorodnov, D., Berman, F.: Heuristics for Scheduling Parameter Sweep Applications in Grid Environments. In: Proceedings of the 9th Heterogeneous Computing Workshop (May 2000)
Chakrabarti, A., Dheepak, R.A., Sengupta, S.: Integration of Scheduling and Replication in Data Grids. In: Bougé, L., Prasanna, V.K. (eds.) HiPC 2004. LNCS, vol. 3296, pp. 375–385. Springer, Heidelberg (2004)
Davis, L.: Job Shop Scheduling with Genetic Algorithms. In: Proceedings of the International Conference on High Performance Computing (1985)
Deelman, E., Kosar, T., Kesselman, C., Livny, M.: What Makes Workflows Work in an Opportunistic Environment? In: Concurrency and Computation: Practice and Experience (2004)
Feitelson, D.: A Survey of Scheduling in Multiprogrammed Parallel Systems. IBM Research Report RC 19790 (87657) (1994)
Feitelson, D., Rudolph, L., Schwiegelshohn, U.: Parallel Job Scheduling – A Status Report. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 1–16. Springer, Heidelberg (2005)
The Grid Physics Project, http://www.griphyn.org
Holtman, K.: CMS Requirements for the Grid. In: Proceedings of the International Conference on Computing in High Energy and Nuclear Physics (2001)
Hu, N., Li, L., Mao, Z., Steenkiste, P., Wang, J.: Locating Internet Bottlenecks: Algorithms, Measurements, and Implications. In: Proceedings of SIGCOMM (2004)
Kosar, T., Livny, M.: Stork: Making Data Placement a First Class Citizen in the Grid. In: Proceedings of IEEE International Conference on Distributed Computing Systems (2004)
Lifka, D.: The ANL/IBM SP Scheduling System. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1995 and JSSPP 1995. LNCS, vol. 949. Springer, Heidelberg (1995)
Michaelewicz, Z., Fogel, D.: How to Solve It: Modern Heuristics. Springer, Heidelberg (2000)
Mohamed, H., Epema, D.: An Evaluation of the Close-to-Files Processor and Data Co-Allocation Policy in Multiclusters. In: Proceedings of the IEEE International Conference on Cluster Computing (2004)
Mu’alem, A., Feitelson, D.: Utilization, Predictability, Workloads,and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling. IEEE Transactions on Parallel and Distributed Systems (June 2001)
The Particle Physics Data Grid, http://www.ppdg.net
Ranganathan, K., Foster, I.: Computation Scheduling and Data Replication Algorithms for Data Grids. In: Nabrzyski, J., Schopf, J., Weglarz, J. (eds.) Grid Resource Management: State of the Art and Future Trends. Kluwer Academic Publishers, Dordrecht (2003)
Ribeiro, V., Riedi, R., Baraniuk, R.: Locating Available Bandwidth Bottlenecks. IEEE Internet Computing (September-October 2004)
Santos-Neto, E., Cirne, W., Brasileiro, F., Lima, A.: Exploiting Replication and Data Reuse to Efficiently Schedule Data-Intensive Applications on Grids. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 210–232. Springer, Heidelberg (2005)
Schmueli, E., Feitelson, D.: Backfilling with Lookahead to Optimize the Packing of Parallel Jobs. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2003. LNCS, vol. 2862, pp. 228–251. Springer, Heidelberg (2003)
Stockinger, H., Samar, A., Allcock, B., Foster, I., Holtman, K., Tierney, B.: File and Object Replication in Data Grids. In: Proceedings of the 10th International Symposium on High Performance Distributed Computing (2001)
Thain, D., Bent, J., Arpaci-Dusseau, A., Arpaci-Dusseau, R., Livny, M.: Gathering at the Well: Creating Communities for Grid I/O. In: Proceedings of Supercomputing (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Phan, T., Ranganathan, K., Sion, R. (2005). Evolving Toward the Perfect Schedule: Co-scheduling Job Assignments and Data Replication in Wide-Area Systems Using a Genetic Algorithm. In: Feitelson, D., Frachtenberg, E., Rudolph, L., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2005. Lecture Notes in Computer Science, vol 3834. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11605300_9
Download citation
DOI: https://doi.org/10.1007/11605300_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31024-2
Online ISBN: 978-3-540-31617-6
eBook Packages: Computer ScienceComputer Science (R0)