On the feasibility of dynamic rescheduling on the Intel Distributed Computing Platform

Authors:

Insup Lee

Middleware Industrial Track '10: Proceedings of the 11th International Middleware Conference Industrial track

Pages 4 - 10

https://doi.org/10.1145/1891719.1891720

Published: 29 November 2010

Abstract

This paper examines the feasibility of dynamic rescheduling techniques for effectively utilizing compute resources within a data center. Our work is motivated by practical concerns of the Intel Distributed Computing Platform (IDCP), an Internet-scale, data-center-based distributed computing platform developed by Intel Corporation for massively parallel chip simulations within the company. IDCP has been operational for many years and is currently deployed live on tens of thousands of machines distributed globally across various data centers. We analyze job execution traces collected over a one-year period from tens of thousands of IDCP machines in 20 different pools. Our analysis shows that IDCP currently does not make full use of its resources. Specifically, job completion time can be severely impacted by job suspension when high-priority jobs preempt low-priority jobs. We then develop dynamic job rescheduling strategies that adaptively restart jobs on available resources elsewhere, which better utilizes system resources and improves completion times. Our trace-driven evaluation shows that dynamic rescheduling enables IDCP to significantly reduce system waste and the completion time of suspended jobs.
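The trade-off behind restarting a suspended job can be illustrated with a minimal sketch. It is not the paper's strategy, which the abstract does not detail. It assumes that a restarted job loses all prior progress and that the remaining suspension time and the queue wait on another machine can be estimated. The names `SuspendedJob`, `remaining_if_waiting`, `remaining_if_restarted`, and `should_restart`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SuspendedJob:
    """Hypothetical record for a low-priority job preempted mid-run."""
    total_runtime: float        # hours of compute the job needs in total
    progress: float             # hours of compute completed before suspension
    expected_suspension: float  # estimated hours until the job can resume


def remaining_if_waiting(job: SuspendedJob) -> float:
    """Time to finish if the job stays suspended and later resumes in place."""
    return job.expected_suspension + (job.total_runtime - job.progress)


def remaining_if_restarted(job: SuspendedJob, queue_wait: float) -> float:
    """Time to finish if the job restarts from scratch on another machine.

    Assumption: progress is lost, so the full runtime is paid again
    after any queue wait on the new machine.
    """
    return queue_wait + job.total_runtime


def should_restart(job: SuspendedJob, queue_wait: float) -> bool:
    """Restart only when that is expected to finish strictly sooner."""
    return remaining_if_restarted(job, queue_wait) < remaining_if_waiting(job)


job = SuspendedJob(total_runtime=4.0, progress=1.0, expected_suspension=6.0)
print(should_restart(job, queue_wait=0.5))  # True: 4.5 h restarted vs. 9.0 h waiting
```

Under these assumptions, restarting pays off mainly for jobs that are suspended early in their run or for long periods. A job that is nearly finished is usually better left to resume in place, because restarting discards the work already done.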

Published In

Middleware Industrial Track '10: Proceedings of the 11th International Middleware Conference Industrial track

November 2010

45 pages

ISBN: 9781450304566

DOI: 10.1145/1891719

Program Chairs: Lucy Cherkasova (HP Labs), Rajeev Rastogi (Yahoo Research, India)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Division of Computer and Network Systems

Conference

Middleware '10

Sponsor:

USENIX Assoc

Middleware '10: 11th International Middleware Conference

November 29 - December 3, 2010

Bangalore, India

Acceptance Rates

Overall Acceptance Rate 5 of 23 submissions, 22%
