research-article

Synthesizing concurrent schedulers for irregular algorithms

Authors:

Keshav PingaliAuthors Info & Claims

ACM SIGARCH Computer Architecture News, Volume 39, Issue 1

Pages 333 - 344

https://doi.org/10.1145/1961295.1950404

Published: 05 March 2011 Publication History

Abstract

Scheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications in which the dependence graph is known at compile-time or on dynamic scheduling of independent loop iterations such as in OpenMP.

In irregular algorithms, dependences between activities are complex functions of runtime values so these algorithms are not amenable to compile-time analysis nor do they consist of independent activities. Moreover, the amount of work can vary dramatically with the scheduling policy. To handle these complexities, implementations of irregular algorithms employ carefully handcrafted, algorithm-specific schedulers but these schedulers are themselves parallel programs, complicating the parallel programming problem further.

In this paper, we present a flexible and efficient approach for specifying and synthesizing scheduling policies for irregular algorithms. We develop a simple compositional specification language and show how it can concisely encode scheduling policies in the literature. Then, we show how to synthesize efficient parallel schedulers from these specifications. We evaluate our approach for five irregular algorithms on three multicore architectures and show that (1) the performance of some algorithms can improve by orders of magnitude with the right scheduling policy, and (2) for the same policy, the overheads of our synthesized schedulers are comparable to those of fixed-function schedulers.

References

[1]

N. Amenta, S. Choi, and G. Rote. Incremental constructions con brio. In Proc. Symp. on Computational Geometry (SCG), pages 211--219, 2003.

Digital Library

[2]

L. O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, University of Copenhagen, May 1994. (DIKU report 94/19).

[3]

D. Bader and V. Sachdeva. A cache-aware parallel implementation of the push-relabel network flow algorithm and experimental evaluation of the gap relabeling heuristic. In Proc. 18th ISCA International Conference on Parallel and Distributed Computing Systems (PDCS), September 2005.

[4]

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. SIGPLAN Not., 30(8):207--216, 1995.

Digital Library

[5]

A. Bowyer. Computing Dirichlet tessellations. The Computer Journal, 24(2):162--166, 1981.

[6]

B. V. Cherkassy and A. V. Goldberg. On implementing push-relabel method for the maximum flow problem. In Proc. Intl. Conf. on Integer Programming and Combinatorial Optimization (IPCO), pages 157--171, London, UK, 1995.

Digital Library

[7]

L. P. Chew. Guaranteed-quality mesh generation for curved surfaces. In Proc. Symp. on Computational Geometry (SCG), 1993.

Digital Library

[8]

K. L. Clarkson and P. W. Shor. Applications of random sampling in computational geometry, ii. Discrete Comput. Geom., 4(5):287--421, 1989.

Digital Library

[9]

T. Cormen, C. Leiserson, R. Rivest, and C. Stein, editors. Introduction to Algorithms. MIT Press, 2001.

Digital Library

[10]

L. Dagum and R. Menon. OpenMP: An industry-standard API for shared-memory programming. IEEE Comput. Sci. Eng., 5(1):46--55, 1998.

Digital Library

[11]

A. Duran, J. Corbalán, and E. Ayguadé. Evaluation of OpenMP task scheduling strategies. In Proc. Conf. on OpenMP in a new era of parallelism (IWOMP), pages 100--110, 2008.

Digital Library

[12]

A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. J. ACM, 35(4):921--940, 1988.

Digital Library

[13]

R. H. Halstead, Jr. MULTILISP: a language for concurrent symbolic computation. ACM Trans. Program. Lang. Syst., 7:501--538, October 1985.

Digital Library

[14]

B. Hardekopf and C. Lin. The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. In Proc. Conf. on Programming Language Design and Implementation (PLDI), 2007.

Digital Library

[15]

M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. In Proc. of International Symposium on Computer Architecture (ISCA), 1993.

Digital Library

[16]

M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463--492, 1990.

Digital Library

[17]

Intel Corporation. Intel threading building blocks 3.0. http://threadingbuildingblocks.org.

[18]

M. Kulkarni, P. Carribault, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Scheduling strategies for optimistic parallel execution of irregular programs. In Proc. Symp. on Parallelism in algorithms and architectures (SPAA), pages 217--228, New York, NY, USA, 2008.

Digital Library

[19]

M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. SIGPLAN Not. (Proceedings of PLDI), 42(6):211--222, 2007.

Digital Library

[20]

M. Kulkarni, D. Prountzos, D. Nguyen, and K. Pingali. Defining and implementing commutativity conditions for parallel execution. regular rech report TR-ECE-09-11, School of Electrial and Computer Engineering, Purdue University, August 2009.

[21]

D. Lea. A Java fork/join framework. In Proc. Conf. on Java Grande, pages 36--43, 2000.

Digital Library

[22]

D. Leijen, W. Schulte, and S. Burckhardt. The design of a task parallel library. In Proc. Conf. on object-oriented programming, systems, languages, and applications (OOPSLA), pages 227--242, 2009.

Digital Library

[23]

K. Madduri, D. Bader, J. Berry, and J. Crobak. Parallel shortest path algorithms for solving large-scale instances. Technical Report GT-CSE-06-19, Georgia Institute of Technology, September 2006.

[24]

M. Méndez-Lojo, A. Mathew, and K. Pingali. Parallel inclusion-based points-to analysis. In Proc. Intl. Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH), 2010.

Digital Library

[25]

U. Meyer and P. Sanders. Delta-stepping: A parallel single source shortest path algorithm. In Proc. European Symp. on Algorithms (ESA), pages 393--404, 1998.

Digital Library

[26]

M. M. Michael, M. T. Vechev, and V. A. Saraswat. Idempotent work stealing. In Proc. Symp. on Principles and practice of parallel programming (PPoPP), pages 45--54, 2009.

Digital Library

[27]

G. L. Miller. A time efficient Delaunay refinement algorithm. In Proc. Symposium on Discrete Algorithms (SODA), pages 400--409, New Orleans, 2004.

Digital Library

[28]

F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1999.

Digital Library

[29]

D. J. Pearce, P. H. J. Kelly, and C. L. Hankin. Online cycle detection and difference propagation for pointer analysis. In Intl. IEEE Workshop on Source Code Analysis and Manipulation (SCAM), pages 3--12, 2003.

[30]

K. Pingali, M. Kulkarni, D. Nguyen, M. Burtscher, M. Mendez-Lojo, D. Prountzos, X. Sui, and Z. Zhong. Amorphous data-parallelism in irregular algorithms. regular tech report TR-09-05, The University of Texas at Austin, 2009.

[31]

J. Ruppert. A Delaunay refinement algorithm for quality 2-dimensional mesh generation. J. Algorithms, 18(3):548--585, 1995.

Digital Library

[32]

J. R. Shewchuk. Triangle: Engineering a 2d quality mesh generator and Delaunay triangulator. In Applied Computational Geometry: Towards Geometric Engineering, volume 1148 of Lecture Notes in Computer Science, pages 203--222. Springer-Verlag, 1996.

Digital Library

[33]

A. Solar-Lezama, C. G. Jones, and R. Bodik. Sketching concurrent data structures. In Proc. Conf. on Programming Language Design and Implementation (PLDI), pages 136--148, New York, NY, USA, 2008.

Digital Library

[34]

M. T. Vechev, E. Yahav, D. F. Bacon, and N. Rinetzky. CGCExplorer: a semi-automated search procedure for provably correct concurrent collectors. In Proc. Conf. on Programming Language Design and Implementation (PLDI), pages 456--467, New York, NY, USA, 2007.

Digital Library

[35]

D. F. Watson. Computing the n-dimensional tessellation with application to Voronoi polytopes. The Computer Journal, 24(2):167--172, 1981.

[36]

J. Wu, R. Das, J. Saltz, H. Berryman, and S. Hiranandani. Distributed memory compiler design for sparse problems. IEEE Transactions on Computers, 44, 1995.

Digital Library

Cited By

Chen LZhu JDeng YLi ZChen JJiang XYin SWei SLiu L(2021)An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable ArchitecturesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.308480432:12(3066-3080)Online publication date: 1-Dec-2021
https://doi.org/10.1109/TPDS.2021.3084804
Whang JLenharth ADhillon IPingali K(2015)Scalable Data-Driven PageRank: Algorithms, System Issues, and Lessons LearnedEuro-Par 2015: Parallel Processing10.1007/978-3-662-48096-0_34(438-450)Online publication date: 25-Jul-2015
https://doi.org/10.1007/978-3-662-48096-0_34
Behera NKumar ARajadurai T ENitish SM RNasre R(2024)StarPlat: A Versatile DSL for Graph AnalyticsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104967(104967)Online publication date: Aug-2024
https://doi.org/10.1016/j.jpdc.2024.104967
Show More Cited By

Index Terms

Synthesizing concurrent schedulers for irregular algorithms
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures

Recommendations

Synthesizing concurrent schedulers for irregular algorithms
ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems

Scheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications in which the dependence graph ...
Synthesizing concurrent schedulers for irregular algorithms
ASPLOS '11

Scheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications in which the dependence graph ...
Approximation algorithms for parallel machine scheduling with linear deterioration

This paper deals with a parallel machine scheduling problem. Different from fixed processing time assumption in the classical scheduling, a job's processing time is a simple linear increasing function of its starting time. The aim is makespan ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGARCH Computer Architecture News

ACM SIGARCH Computer Architecture News Volume 39, Issue 1

ASPLOS '11

March 2011

407 pages

ISSN:0163-5964

DOI:10.1145/1961295

Issue’s Table of Contents

ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
March 2011
432 pages
ISBN:9781450302661
DOI:10.1145/1950365
General Chair:
Rajiv Gupta
University of California, Riverside
,
Program Chair:
Todd C. Mowry
Carnegie Mellon University

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 March 2011

Published in SIGARCH Volume 39, Issue 1

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

33
Total Citations
View Citations
561
Total Downloads

Downloads (Last 12 months)5
Downloads (Last 6 weeks)0

Reflects downloads up to 20 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Chen LZhu JDeng YLi ZChen JJiang XYin SWei SLiu L(2021)An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable ArchitecturesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.308480432:12(3066-3080)Online publication date: 1-Dec-2021
https://doi.org/10.1109/TPDS.2021.3084804
Whang JLenharth ADhillon IPingali K(2015)Scalable Data-Driven PageRank: Algorithms, System Issues, and Lessons LearnedEuro-Par 2015: Parallel Processing10.1007/978-3-662-48096-0_34(438-450)Online publication date: 25-Jul-2015
https://doi.org/10.1007/978-3-662-48096-0_34
Behera NKumar ARajadurai T ENitish SM RNasre R(2024)StarPlat: A Versatile DSL for Graph AnalyticsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104967(104967)Online publication date: Aug-2024
https://doi.org/10.1016/j.jpdc.2024.104967
Arif MVandierendonck H(2021)Reducing the burden of parallel loop schedulers for many‐core processorsConcurrency and Computation: Practice and Experience10.1002/cpe.624133:13Online publication date: 5-Apr-2021
https://doi.org/10.1002/cpe.6241
Lee HWong DHoang LDathathri RGill GJatala VKuck DPingali K(2020)A Study of APIs for Graph Analytics Workloads2020 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC50251.2020.00030(228-239)Online publication date: Oct-2020
https://doi.org/10.1109/IISWC50251.2020.00030
Yesil SHeidarshenas AMorrison ATorrellas JTaufer MBalaji PPeña A(2019)Understanding priority-based scheduling of graph algorithms on a shared-memory platformProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3295500.3356160(1-14)Online publication date: 17-Nov-2019
https://dl.acm.org/doi/10.1145/3295500.3356160
Hoang LPontecorvi MDathathri RGill GYou BPingali KRamachandran VHollingsworth JKeidar I(2019)A round-efficient distributed betweenness centrality algorithmProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295729(272-286)Online publication date: 16-Feb-2019
https://dl.acm.org/doi/10.1145/3293883.3295729
Firoz JZalewski MLumsdaine ABarnas M(2018)Runtime Scheduling Policies for Distributed Graph Algorithms2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS.2018.00073(640-649)Online publication date: May-2018
https://doi.org/10.1109/IPDPS.2018.00073
Nguyen DPingali K(2017)What Scalable Programs Need from Transactional MemoryACM SIGARCH Computer Architecture News10.1145/3093337.303775045:1(105-118)Online publication date: 4-Apr-2017
https://dl.acm.org/doi/10.1145/3093337.3037750
Nguyen DPingali K(2017)What Scalable Programs Need from Transactional MemoryACM SIGPLAN Notices10.1145/3093336.303775052:4(105-118)Online publication date: 4-Apr-2017
https://dl.acm.org/doi/10.1145/3093336.3037750
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents