Research article, DOI: 10.1145/1891719.1891720

On the feasibility of dynamic rescheduling on the Intel Distributed Computing Platform

Published: 29 November 2010

Abstract

This paper examines the feasibility of dynamic rescheduling techniques for making effective use of compute resources within a data center. Our work is motivated by practical concerns arising in the Intel Distributed Computing Platform (IDCP), an Internet-scale, data-center-based distributed computing platform that Intel Corporation developed for massively parallel chip simulations within the company. IDCP has been operational for many years and is currently deployed live on tens of thousands of machines distributed across data centers worldwide. We analyze job execution traces collected over a one-year period from tens of thousands of IDCP machines in 20 different pools. Our analysis shows that IDCP currently does not make full use of its resources. In particular, job completion times can suffer severely from job suspension when high-priority jobs preempt low-priority jobs. We then develop dynamic job rescheduling strategies that adaptively restart jobs on available resources elsewhere, making better use of system resources and improving completion times. Our trace-driven evaluation shows that dynamic rescheduling enables IDCP to significantly reduce both system waste and the completion time of suspended jobs.
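The abstract does not spell out the rescheduling policy itself. As a purely illustrative sketch, assuming a simple threshold rule that is not necessarily the one the authors use, the trade-off between waiting for a preempted job to resume in place and restarting it on an idle machine elsewhere can be modeled as below. `SuspendedJob`, `finish_by_waiting`, and `finish_by_threshold_restart` are hypothetical names, and the model assumes no checkpointing, so a restarted job loses the work it had already completed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SuspendedJob:
    """Hypothetical model of a low-priority job preempted by a higher-priority one."""
    runtime: float       # total work the job needs (time units)
    progress: float      # work completed before it was suspended
    suspended_at: float  # wall-clock time at which it was preempted
    suspension: float    # how long its original machine stays occupied


def finish_by_waiting(job: SuspendedJob) -> float:
    """Completion time if the job simply resumes in place after the suspension."""
    return job.suspended_at + job.suspension + (job.runtime - job.progress)


def finish_by_threshold_restart(job: SuspendedJob, wait_threshold: float,
                                idle_machine: bool) -> tuple[float, float]:
    """Return (completion time, discarded work) under an assumed threshold rule.

    If the job is still suspended after `wait_threshold` time units and an idle
    machine is available, restart it there from scratch (no checkpointing is
    assumed), discarding the progress made on the original machine.
    """
    if not idle_machine or job.suspension <= wait_threshold:
        # Either nowhere to go, or the job resumed before the threshold expired.
        return finish_by_waiting(job), 0.0
    restart_time = job.suspended_at + wait_threshold
    return restart_time + job.runtime, job.progress


if __name__ == "__main__":
    job = SuspendedJob(runtime=10.0, progress=4.0, suspended_at=4.0, suspension=20.0)
    print(finish_by_waiting(job))                                    # 30.0
    print(finish_by_threshold_restart(job, 2.0, idle_machine=True))  # (16.0, 4.0)
```

In this toy example, waiting out a 20-unit suspension finishes at time 30, while restarting after a 2-unit threshold finishes at 16 at the cost of 4 units of discarded work. A longer threshold avoids needless restarts for short suspensions but delays recovery from long ones.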


Information

Published In

ACM Other conferences
Middleware Industrial Track '10: Proceedings of the 11th International Middleware Conference Industrial track
November 2010
45 pages
ISBN:9781450304566
DOI:10.1145/1891719
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

Sponsors

  • Professional
  • USENIX Assoc
  • IFIP

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cloud resource management
  2. distributed computing
  3. dynamic rescheduling
  4. trace-driven analysis

Conference

Middleware '10: 11th International Middleware Conference
November 29 - December 3, 2010
Bangalore, India
Sponsor:
  • USENIX Assoc

Acceptance Rates

Overall Acceptance Rate 5 of 23 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 13 Feb 2025

Cited By

  • (2016) Application Suitability Assessment for Many-Core Targets. High Performance Computing, pp. 319-338. DOI: 10.1007/978-3-319-46079-6_23. Online publication date: 6 Oct 2016.
  • (2014) Heuristics for Resource Matching in Intel’s Compute Farm. Job Scheduling Strategies for Parallel Processing, pp. 116-135. DOI: 10.1007/978-3-662-43779-7_7. Online publication date: 11 Jun 2014.
  • (2013) On-line fair allocations based on bottlenecks and global priorities. Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering, pp. 229-240. DOI: 10.1145/2479871.2479904. Online publication date: 21 Apr 2013.
  • (2013) Experiences with a Private Enterprise Cloud. Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing, pp. 770-777. DOI: 10.1109/CLOUD.2013.72. Online publication date: 28 Jun 2013.
  • (2012) Autonomic Placement of Mixed Batch and Transactional Workloads. IEEE Transactions on Parallel and Distributed Systems, 23(2), pp. 219-231. DOI: 10.1109/TPDS.2011.129. Online publication date: 1 Feb 2012.
  • (2011) CREST. Proceedings of the 2011 IEEE 8th International Conference on e-Business Engineering, pp. 311-316. DOI: 10.1109/ICEBE.2011.37. Online publication date: 19 Oct 2011.
