Adaptive work stealing with parallelism feedback

Published: 14 March 2007
DOI: 10.1145/1229428.1229448

Abstract

We present an adaptive work-stealing thread scheduler, A-Steal, for fork-join multithreaded jobs, like those written using the Cilk multithreaded language or the Hood work-stealing library. The A-Steal algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and in which the number of processors available to a particular job may vary during the job's execution. A-Steal provides continual parallelism feedback to a job scheduler in the form of processor requests, and the job must adapt its execution to the processors allotted to it. Assuming that the job scheduler never allots any job more processors than requested by the job's thread scheduler, A-Steal guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors.
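
As a rough illustration of how such parallelism feedback might be generated, the sketch below adjusts a job's processor request ("desire") once per scheduling quantum, using a multiplicative increase/decrease rule driven by measured utilization. This is a hypothetical Python sketch under assumed parameters, not the authors' exact algorithm: the threshold DELTA, the responsiveness factor RHO, and the function next_desire are illustrative names introduced here.

    # Hypothetical per-quantum feedback rule (illustrative, not the
    # paper's pseudocode): raise or lower the processor request
    # multiplicatively based on how well the last allotment was used.

    DELTA = 0.8  # assumed utilization threshold for an "efficient" quantum
    RHO = 2.0    # assumed multiplicative responsiveness factor

    def next_desire(desire, allotted, used, quantum_len):
        """Compute the processor request for the next quantum.

        desire      -- processors requested in the previous quantum
        allotted    -- processors the job scheduler actually granted
        used        -- processor-steps of useful work done this quantum
        quantum_len -- steps per quantum (L in the abstract's notation)
        """
        efficient = used >= DELTA * allotted * quantum_len
        satisfied = allotted >= desire
        if not efficient:
            return max(1.0, desire / RHO)  # processors went idle: request fewer
        if satisfied:
            return desire * RHO            # efficient and fully granted: request more
        return desire                      # efficient but deprived: hold steady

Under a rule of this kind, a job whose parallelism outstrips its allotment ramps its request up geometrically, while a job that cannot keep its allotted processors busy backs off, which is how the thread scheduler can keep utilization high on all but a small number of quanta.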
Our analysis models the job scheduler as the thread scheduler's adversary, challenging the thread scheduler to be robust to the system environment and the job scheduler's administrative policies. We analyze the performance of A-Steal using "trim analysis," which allows us to prove that our thread scheduler performs poorly on at most a small number of time steps, while exhibiting near-optimal behavior on the vast majority. To be precise, suppose that a job has work T1 and span (critical-path length) T∞. On a machine with P processors, A-Steal completes the job in expected O(T1/P̃ + T∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all but the O(T∞ + L lg P) time steps having the highest processor availability. When the job's parallelism dominates the trimmed availability, that is, P̃ ≪ T1/T∞, the job achieves nearly perfect linear speedup. Conversely, when the trimmed mean dominates the parallelism, the asymptotic running time of the job is nearly its span.
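
In LaTeX notation, the completion-time guarantee can be restated compactly (the symbol T_P for the job's completion time is introduced here for convenience; all other symbols are as defined above):

    \[
      \mathbb{E}[T_P] \;=\; O\!\left(\frac{T_1}{\widetilde{P}} \;+\; T_\infty \;+\; L \lg P\right)
    \]

The two regimes in the abstract read off directly: when \(\widetilde{P} \ll T_1/T_\infty\), the first term dominates and the running time is about \(T_1/\widetilde{P}\), a linear speedup on the trimmed availability; when \(\widetilde{P} \gg T_1/T_\infty\), the bound collapses to roughly the span \(T_\infty\) plus the \(L \lg P\) scheduling overhead.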

Published In

PPoPP '07: Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
March 2007
284 pages
ISBN: 9781595936028
DOI: 10.1145/1229428

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 March 2007


Author Tags

  1. adaptive scheduling
  2. adversary
  3. distributed scheduling
  4. job scheduling
  5. multiprogramming
  6. multithreaded languages
  7. parallel computation
  8. parallelism feedback
  9. processor allocation
  10. space sharing
  11. thread scheduling
  12. trim analysis
  13. two-level scheduling
  14. work stealing


Acceptance Rates

PPoPP '07 paper acceptance rate: 22 of 65 submissions (34%).
Overall acceptance rate: 230 of 1,014 submissions (23%).


Cited By

  • (2022) Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System. IEEE Transactions on Parallel and Distributed Systems 33(6):1303-1320. DOI: 10.1109/TPDS.2021.3104255. Online publication date: 1-Jun-2022.
  • (2022) Task-Parallel Programming with Constrained Parallelism. 2022 IEEE High Performance Extreme Computing Conference (HPEC), pages 1-7. DOI: 10.1109/HPEC55821.2022.9926348. Online publication date: 19-Sep-2022.
  • (2020) An Efficient Work-Stealing Scheduler for Task Dependency Graph. 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), pages 64-71. DOI: 10.1109/ICPADS51040.2020.00018. Online publication date: Dec-2020.
  • (2019) Practically Efficient Scheduler for Minimizing Average Flow Time of Parallel Jobs. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 134-144. DOI: 10.1109/IPDPS.2019.00024. Online publication date: May-2019.
  • (2018) Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization. International Journal of Parallel Programming 46(2):336-375. DOI: 10.1007/s10766-016-0482-x. Online publication date: 1-Apr-2018.
  • (2018) Scheduling Parallelizable Jobs Online to Maximize Throughput. LATIN 2018: Theoretical Informatics, pages 755-776. DOI: 10.1007/978-3-319-77404-6_55. Online publication date: 13-Mar-2018.
  • (2017) Brief Announcement. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 87-89. DOI: 10.1145/3087556.3087590. Online publication date: 24-Jul-2017.
  • (2016) Scheduling parallel DAG jobs online to minimize average flow time. Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 176-189. DOI: 10.5555/2884435.2884449. Online publication date: 10-Jan-2016.
  • (2016) Well-Structured Futures and Cache Locality. ACM Transactions on Parallel Computing 2(4):1-20. DOI: 10.1145/2858650. Online publication date: 9-Feb-2016.
  • (2014) Well-structured futures and cache locality. ACM SIGPLAN Notices 49(8):155-166. DOI: 10.1145/2692916.2555257. Online publication date: 6-Feb-2014.
