The Data Locality of Work Stealing

Umut A. Acar¹,
Guy E. Blelloch¹ &
Robert D. Blumofe²

Abstract

This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlled shared-memory machines, where movement of data to and from the cache is solely controlled by the hardware. We present lower and upper bounds on the number of cache misses when using work stealing, and introduce a locality-guided work-stealing algorithm and its experimental validation.

As a lower bound, we show that a work-stealing application that exhibits good data locality on a uniprocessor may exhibit poor data locality on a multiprocessor. In particular, we show a family of multithreaded computations $G_n$ whose members perform $\Theta(n)$ operations (work) and incur a constant number of cache misses on a uniprocessor, while even on two processors the total number of cache misses soars to $\Omega(n)$. On the other hand, we show a tight upper bound on the number of cache misses that nested-parallel computations, a large, important class of computations, incur due to multiprocessing. In particular, for nested-parallel computations, we show that on $P$ processors a multiprocessor execution incurs an expected $O(C \lceil m/s \rceil P T_\infty)$ more misses than the uniprocessor execution. Here $m$ is the execution time of an instruction incurring a cache miss, $s$ is the steal time, $C$ is the size of the cache, and $T_\infty$ is the number of nodes on the longest chain of dependencies. Based on this we give strong execution time bounds for nested-parallel computations using work stealing.
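To get a feel for how the upper bound scales, the short sketch below evaluates the quantity inside the O-notation, C times the ceiling of m/s times P times T_∞. The function name `extra_miss_bound` and all parameter values are hypothetical, chosen only for illustration, and the constant factor hidden by the O-notation is omitted, so the result is an order-of-magnitude indicator rather than a predicted miss count.

```python
import math

def extra_miss_bound(C, m, s, P, T_inf):
    """Return C * ceil(m/s) * P * T_inf, the term inside the O(.) bound.

    C     : cache size (assumed here to be counted in cache blocks)
    m     : time of an instruction that incurs a cache miss
    s     : steal time (same time unit as m)
    P     : number of processors
    T_inf : number of nodes on the longest dependency chain
    Constant factors hidden by the O-notation are not included.
    """
    return C * math.ceil(m / s) * P * T_inf

# Hypothetical parameters: a miss is cheaper than a steal, so ceil(m/s) = 1.
print(extra_miss_bound(C=1024, m=100, s=400, P=8, T_inf=50))
```

When a miss costs no more than a steal, the ceiling term is 1 and the expression reduces to C times P times T_∞. It grows with the processor count and the critical-path length, not with the total work.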

For the second part of our results, we present a locality-guided work-stealing algorithm that improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor. Our initial experiments on iterative data-parallel applications show that the algorithm matches the performance of static partitioning under traditional workloads but improves performance by up to 50% over static partitioning under multiprogrammed workloads. Furthermore, locality-guided work stealing improves the performance of work stealing by up to 80%.
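The idea of giving a thread an affinity for a processor can be sketched as a small sequential model. The abstract does not describe the mechanism, so everything below is an assumption made for illustration: each processor is given a per-processor mailbox next to its work-stealing deque, a task is posted to the mailbox of the processor it has affinity for, and a processor looking for work checks its mailbox before its own deque and before stealing. The names `Task`, `Processor`, `spawn`, and `get_work` are hypothetical, and the model ignores real concurrency and synchronization.

```python
import random
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Task:
    name: str
    affinity: Optional[int] = None  # preferred processor index, if any
    claimed: bool = False           # set once some processor takes the task

@dataclass
class Processor:
    pid: int
    ready: deque = field(default_factory=deque)    # work-stealing deque
    mailbox: deque = field(default_factory=deque)  # tasks with affinity for this processor

def spawn(procs, pid, task):
    """Push the task on the spawning processor's deque and, if it has an
    affinity, also post it to that processor's mailbox (assumed mechanism)."""
    procs[pid].ready.append(task)
    if task.affinity is not None:
        procs[task.affinity].mailbox.append(task)

def _take(queue, from_left):
    """Pop entries until an unclaimed task is found; claim and return it."""
    while queue:
        task = queue.popleft() if from_left else queue.pop()
        if not task.claimed:
            task.claimed = True
            return task
    return None

def get_work(procs, pid, rng):
    """Pick the next task for processor pid, or None if the attempt fails."""
    me = procs[pid]
    task = _take(me.mailbox, from_left=True)       # 1. affine work first
    if task is None:
        task = _take(me.ready, from_left=False)    # 2. bottom of own deque
    if task is None:
        victims = [p for p in procs if p.pid != pid and p.ready]
        if victims:                                # 3. steal from a victim's top
            task = _take(rng.choice(victims).ready, from_left=True)
    return task

procs = [Processor(0), Processor(1)]
rng = random.Random(0)
for t in (Task("a", affinity=1), Task("b", affinity=0), Task("c")):
    spawn(procs, 0, t)
print(get_work(procs, 1, rng).name)  # processor 1 takes its affine task "a"
```

Because a task with an affinity sits in both a deque and a mailbox, the `claimed` flag keeps it from running twice: whichever processor reaches it first claims it, and stale copies are skipped when they are popped later.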

Author information

Authors and Affiliations

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Umut A. Acar (umut@cs.cmu.edu) & Guy E. Blelloch (blelloch@cs.cmu.edu)
Department of Computer Sciences, University of Texas at Austin, Austin, TX 78712, USA
Robert D. Blumofe (rdb@cs.utexas.edu)

About this article

Cite this article

Acar, U.A., Blelloch, G.E. & Blumofe, R.D. The Data Locality of Work Stealing. Theory of Computing Systems 35, 321–347 (2002). https://doi.org/10.1007/s00224-002-1057-3

Published: 01 May 2002
Issue Date: June 2002
DOI: https://doi.org/10.1007/s00224-002-1057-3
