DOI: 10.5555/2391541.2391568

Scheduling file transfers for data-intensive jobs on heterogeneous clusters

Published: 28 August 2007

Abstract

This paper addresses the problem of efficient collective scheduling of file transfers requested by a batch of tasks. Our work targets a heterogeneous collection of storage and compute clusters. The goal is to minimize the overall time to transfer files to their respective destination nodes. Two scheduling schemes are proposed and experimentally evaluated against an existing approach, Insertion Scheduling. The first is a 0-1 integer programming approach based on time-expanded networks. This scheme achieves the minimum total file transfer time but incurs significant scheduling overhead. To address this overhead, we propose a heuristic based on maximum weight graph matching. The heuristic performs as well as Insertion Scheduling with much lower scheduling overhead. We conclude that the heuristic scheme is a better fit for larger workloads and systems.
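The abstract does not spell out how the matching-based heuristic works, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that each node can take part in at most one transfer per round, that every pending transfer carries a positive weight (here, a hypothetical file size), and that each round is chosen as a maximum weight matching over the pending transfers. The names `Transfer`, `max_weight_matching`, and `schedule_rounds` are invented for this example, and the exhaustive search is only practical for very small inputs.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: (source node, destination node, weight).
Transfer = Tuple[str, str, float]


def max_weight_matching(transfers: List[Transfer]) -> List[Transfer]:
    """Return a maximum weight set of transfers in which no node appears twice.

    Exhaustive search over include/skip decisions; exponential in the
    number of transfers, so suitable only for tiny illustrative inputs.
    """
    best_weight = -1.0
    best_choice: List[Transfer] = []

    def search(i: int, used: FrozenSet[str], chosen: List[Transfer], weight: float) -> None:
        nonlocal best_weight, best_choice
        if i == len(transfers):
            if weight > best_weight:
                best_weight, best_choice = weight, list(chosen)
            return
        src, dst, w = transfers[i]
        # Option 1: include this transfer if both endpoints are still free.
        if src not in used and dst not in used:
            chosen.append(transfers[i])
            search(i + 1, used | {src, dst}, chosen, weight + w)
            chosen.pop()
        # Option 2: skip this transfer.
        search(i + 1, used, chosen, weight)

    search(0, frozenset(), [], 0.0)
    return best_choice


def schedule_rounds(transfers: List[Transfer]) -> List[List[Transfer]]:
    """Repeatedly pick a maximum weight matching until nothing is pending."""
    if any(w <= 0 for _, _, w in transfers):
        raise ValueError("this sketch assumes strictly positive weights")
    pending = list(transfers)
    rounds: List[List[Transfer]] = []
    while pending:
        chosen = max_weight_matching(pending)
        rounds.append(chosen)
        for t in chosen:
            pending.remove(t)
    return rounds


# Usage: two storage nodes (s1, s2) feeding two compute nodes (c1, c2).
example = [
    ("s1", "c1", 10.0),
    ("s1", "c2", 4.0),
    ("s2", "c1", 3.0),
    ("s2", "c2", 8.0),
]
rounds = schedule_rounds(example)
for number, batch in enumerate(rounds, start=1):
    print(number, batch)
```

In a real system the exhaustive search would be replaced by a polynomial-time matching algorithm, and the weights would likely reflect link bandwidths and file sizes on the heterogeneous clusters rather than a single number per transfer.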



Published In

Euro-Par'07: Proceedings of the 13th international Euro-Par conference on Parallel Processing
August 2007
969 pages
ISBN: 3540744657
Editors: Anne-Marie Kermarrec, Luc Bougé, Thierry Priol

Sponsors

  • INRIA/IRISA
  • University of Rennes 1
  • Rennes Métropole
  • Métivier Foundation
  • Pôle de competitivité Images & Réseaux

Publisher

Springer-Verlag

Berlin, Heidelberg


Qualifiers

  • Article


Cited By

  • (2009) Workload characterization in a high-energy data grid and impact on resource management. Cluster Computing 12:2 (153-173). DOI: 10.1007/s10586-009-0081-3. Online publication date: 1-Jun-2009
  • (2008) Using overlays for efficient data transfer over shared wide-area networks. Proceedings of the 2008 ACM/IEEE conference on Supercomputing (1-12). DOI: 10.5555/1413370.1413418. Online publication date: 15-Nov-2008
  • (2008) File grouping for scientific data management. Proceedings of the 17th international symposium on High performance distributed computing (153-164). DOI: 10.1145/1383422.1383429. Online publication date: 23-Jun-2008
  • (2008) Efficient reuse of replicated parallel data segments in computational grids. Future Generation Computer Systems 24:7 (644-657). DOI: 10.1016/j.future.2008.01.001. Online publication date: 1-Jul-2008
