Adaptive work stealing with parallelism feedback

Published: 14 March 2007
DOI: 10.1145/1229428.1229448

Abstract

We present an adaptive work-stealing thread scheduler, A-Steal, for fork-join multithreaded jobs, like those written using the Cilk multithreaded language or the Hood work-stealing library. The A-Steal algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and in which the number of processors available to a particular job may vary during the job's execution. A-Steal provides continual parallelism feedback to a job scheduler in the form of processor requests, and the job must adapt its execution to the processors allotted to it. Assuming that the job scheduler never allots any job more processors than requested by the job's thread scheduler, A-Steal guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors.
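
As a rough illustration of how such parallelism feedback might be generated, the sketch below adjusts a job's processor request ("desire") once per scheduling quantum, using a multiplicative increase/decrease rule driven by measured utilization. This is a hypothetical Python sketch under assumed parameters, not the authors' exact algorithm: the threshold DELTA, the responsiveness factor RHO, and the function next_desire are illustrative names introduced here.

    # Hypothetical per-quantum feedback rule (illustrative, not the
    # paper's pseudocode): raise or lower the processor request
    # multiplicatively based on how well the last allotment was used.

    DELTA = 0.8  # assumed utilization threshold for an "efficient" quantum
    RHO = 2.0    # assumed multiplicative responsiveness factor

    def next_desire(desire, allotted, used, quantum_len):
        """Compute the processor request for the next quantum.

        desire      -- processors requested in the previous quantum
        allotted    -- processors the job scheduler actually granted
        used        -- processor-steps of useful work done this quantum
        quantum_len -- steps per quantum (L in the abstract's notation)
        """
        efficient = used >= DELTA * allotted * quantum_len
        satisfied = allotted >= desire
        if not efficient:
            return max(1.0, desire / RHO)  # processors went idle: request fewer
        if satisfied:
            return desire * RHO            # efficient and fully granted: request more
        return desire                      # efficient but deprived: hold steady

Under a rule of this kind, a job whose parallelism outstrips its allotment ramps its request up geometrically, while a job that cannot keep its allotted processors busy backs off, which is how the thread scheduler can keep utilization high on all but a small number of quanta.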
Our analysis models the job scheduler as the thread scheduler's adversary, challenging the thread scheduler to be robust to the system environment and the job scheduler's administrative policies. We analyze the performance of A-Steal using "trim analysis," which allows us to prove that our thread scheduler performs poorly on at most a small number of time steps, while exhibiting near-optimal behavior on the vast majority. To be precise, suppose that a job has work T1 and span (critical-path length) T∞. On a machine with P processors, A-Steal completes the job in expected O(T1/P̃ + T∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all but the O(T∞ + L lg P) time steps having the highest processor availability. When the job's parallelism dominates the trimmed availability, that is, P̃ ≪ T1/T∞, the job achieves nearly perfect linear speedup. Conversely, when the trimmed mean dominates the parallelism, the asymptotic running time of the job is nearly its span.
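
In LaTeX notation, the completion-time guarantee can be restated compactly (the symbol T_P for the job's completion time is introduced here for convenience; all other symbols are as defined above):

    \[
      \mathbb{E}[T_P] \;=\; O\!\left(\frac{T_1}{\widetilde{P}} \;+\; T_\infty \;+\; L \lg P\right)
    \]

The two regimes in the abstract read off directly: when \(\widetilde{P} \ll T_1/T_\infty\), the first term dominates and the running time is about \(T_1/\widetilde{P}\), a linear speedup on the trimmed availability; when \(\widetilde{P} \gg T_1/T_\infty\), the bound collapses to roughly the span \(T_\infty\) plus the \(L \lg P\) scheduling overhead.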

Published In

PPoPP '07: Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
March 2007
284 pages
ISBN: 9781595936028
DOI: 10.1145/1229428

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 March 2007


Author Tags

  1. adaptive scheduling
  2. adversary
  3. distributed scheduling
  4. job scheduling
  5. multiprogramming
  6. multithreaded languages
  7. parallel computation
  8. parallelism feedback
  9. processor allocation
  10. space sharing
  11. thread scheduling
  12. trim analysis
  13. two-level scheduling
  14. work stealing


Acceptance Rates

PPoPP '07 paper acceptance rate: 22 of 65 submissions (34%).
Overall acceptance rate: 230 of 1,014 submissions (23%).


Cited By

  • (2022) Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System. IEEE Transactions on Parallel and Distributed Systems 33(6):1303-1320. DOI: 10.1109/TPDS.2021.3104255. Online publication date: 1-Jun-2022.
  • (2022) Task-Parallel Programming with Constrained Parallelism. 2022 IEEE High Performance Extreme Computing Conference (HPEC), pages 1-7. DOI: 10.1109/HPEC55821.2022.9926348. Online publication date: 19-Sep-2022.
  • (2020) An Efficient Work-Stealing Scheduler for Task Dependency Graph. 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), pages 64-71. DOI: 10.1109/ICPADS51040.2020.00018. Online publication date: Dec-2020.
  • (2019) Practically Efficient Scheduler for Minimizing Average Flow Time of Parallel Jobs. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 134-144. DOI: 10.1109/IPDPS.2019.00024. Online publication date: May-2019.
  • (2018) Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization. International Journal of Parallel Programming 46(2):336-375. DOI: 10.1007/s10766-016-0482-x. Online publication date: 1-Apr-2018.
  • (2018) Scheduling Parallelizable Jobs Online to Maximize Throughput. LATIN 2018: Theoretical Informatics, pages 755-776. DOI: 10.1007/978-3-319-77404-6_55. Online publication date: 13-Mar-2018.
  • (2017) Brief Announcement. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 87-89. DOI: 10.1145/3087556.3087590. Online publication date: 24-Jul-2017.
  • (2016) Scheduling parallel DAG jobs online to minimize average flow time. Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 176-189. DOI: 10.5555/2884435.2884449. Online publication date: 10-Jan-2016.
  • (2016) Well-Structured Futures and Cache Locality. ACM Transactions on Parallel Computing 2(4):1-20. DOI: 10.1145/2858650. Online publication date: 9-Feb-2016.
  • (2014) Well-structured futures and cache locality. ACM SIGPLAN Notices 49(8):155-166. DOI: 10.1145/2692916.2555257. Online publication date: 6-Feb-2014.
