Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Synthesizing concurrent schedulers for irregular algorithms

Published: 05 March 2011 Publication History

Abstract

Scheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications in which the dependence graph is known at compile-time or on dynamic scheduling of independent loop iterations such as in OpenMP.
In irregular algorithms, dependences between activities are complex functions of runtime values so these algorithms are not amenable to compile-time analysis nor do they consist of independent activities. Moreover, the amount of work can vary dramatically with the scheduling policy. To handle these complexities, implementations of irregular algorithms employ carefully handcrafted, algorithm-specific schedulers but these schedulers are themselves parallel programs, complicating the parallel programming problem further.
In this paper, we present a flexible and efficient approach for specifying and synthesizing scheduling policies for irregular algorithms. We develop a simple compositional specification language and show how it can concisely encode scheduling policies in the literature. Then, we show how to synthesize efficient parallel schedulers from these specifications. We evaluate our approach for five irregular algorithms on three multicore architectures and show that (1) the performance of some algorithms can improve by orders of magnitude with the right scheduling policy, and (2) for the same policy, the overheads of our synthesized schedulers are comparable to those of fixed-function schedulers.

References

[1]
N. Amenta, S. Choi, and G. Rote. Incremental constructions con brio. In Proc. Symp. on Computational Geometry (SCG), pages 211--219, 2003.
[2]
L. O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, University of Copenhagen, May 1994. (DIKU report 94/19).
[3]
D. Bader and V. Sachdeva. A cache-aware parallel implementation of the push-relabel network flow algorithm and experimental evaluation of the gap relabeling heuristic. In Proc. 18th ISCA International Conference on Parallel and Distributed Computing Systems (PDCS), September 2005.
[4]
R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. SIGPLAN Not., 30(8):207--216, 1995.
[5]
A. Bowyer. Computing Dirichlet tessellations. The Computer Journal, 24(2):162--166, 1981.
[6]
B. V. Cherkassy and A. V. Goldberg. On implementing push-relabel method for the maximum flow problem. In Proc. Intl. Conf. on Integer Programming and Combinatorial Optimization (IPCO), pages 157--171, London, UK, 1995.
[7]
L. P. Chew. Guaranteed-quality mesh generation for curved surfaces. In Proc. Symp. on Computational Geometry (SCG), 1993.
[8]
K. L. Clarkson and P. W. Shor. Applications of random sampling in computational geometry, ii. Discrete Comput. Geom., 4(5):287--421, 1989.
[9]
T. Cormen, C. Leiserson, R. Rivest, and C. Stein, editors. Introduction to Algorithms. MIT Press, 2001.
[10]
L. Dagum and R. Menon. OpenMP: An industry-standard API for shared-memory programming. IEEE Comput. Sci. Eng., 5(1):46--55, 1998.
[11]
A. Duran, J. Corbalán, and E. Ayguadé. Evaluation of OpenMP task scheduling strategies. In Proc. Conf. on OpenMP in a new era of parallelism (IWOMP), pages 100--110, 2008.
[12]
A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. J. ACM, 35(4):921--940, 1988.
[13]
R. H. Halstead, Jr. MULTILISP: a language for concurrent symbolic computation. ACM Trans. Program. Lang. Syst., 7:501--538, October 1985.
[14]
B. Hardekopf and C. Lin. The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. In Proc. Conf. on Programming Language Design and Implementation (PLDI), 2007.
[15]
M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. In Proc. of International Symposium on Computer Architecture (ISCA), 1993.
[16]
M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463--492, 1990.
[17]
Intel Corporation. Intel threading building blocks 3.0. http://threadingbuildingblocks.org.
[18]
M. Kulkarni, P. Carribault, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Scheduling strategies for optimistic parallel execution of irregular programs. In Proc. Symp. on Parallelism in algorithms and architectures (SPAA), pages 217--228, New York, NY, USA, 2008.
[19]
M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. SIGPLAN Not. (Proceedings of PLDI), 42(6):211--222, 2007.
[20]
M. Kulkarni, D. Prountzos, D. Nguyen, and K. Pingali. Defining and implementing commutativity conditions for parallel execution. regular rech report TR-ECE-09-11, School of Electrial and Computer Engineering, Purdue University, August 2009.
[21]
D. Lea. A Java fork/join framework. In Proc. Conf. on Java Grande, pages 36--43, 2000.
[22]
D. Leijen, W. Schulte, and S. Burckhardt. The design of a task parallel library. In Proc. Conf. on object-oriented programming, systems, languages, and applications (OOPSLA), pages 227--242, 2009.
[23]
K. Madduri, D. Bader, J. Berry, and J. Crobak. Parallel shortest path algorithms for solving large-scale instances. Technical Report GT-CSE-06-19, Georgia Institute of Technology, September 2006.
[24]
M. Méndez-Lojo, A. Mathew, and K. Pingali. Parallel inclusion-based points-to analysis. In Proc. Intl. Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH), 2010.
[25]
U. Meyer and P. Sanders. Delta-stepping: A parallel single source shortest path algorithm. In Proc. European Symp. on Algorithms (ESA), pages 393--404, 1998.
[26]
M. M. Michael, M. T. Vechev, and V. A. Saraswat. Idempotent work stealing. In Proc. Symp. on Principles and practice of parallel programming (PPoPP), pages 45--54, 2009.
[27]
G. L. Miller. A time efficient Delaunay refinement algorithm. In Proc. Symposium on Discrete Algorithms (SODA), pages 400--409, New Orleans, 2004.
[28]
F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1999.
[29]
D. J. Pearce, P. H. J. Kelly, and C. L. Hankin. Online cycle detection and difference propagation for pointer analysis. In Intl. IEEE Workshop on Source Code Analysis and Manipulation (SCAM), pages 3--12, 2003.
[30]
K. Pingali, M. Kulkarni, D. Nguyen, M. Burtscher, M. Mendez-Lojo, D. Prountzos, X. Sui, and Z. Zhong. Amorphous data-parallelism in irregular algorithms. regular tech report TR-09-05, The University of Texas at Austin, 2009.
[31]
J. Ruppert. A Delaunay refinement algorithm for quality 2-dimensional mesh generation. J. Algorithms, 18(3):548--585, 1995.
[32]
J. R. Shewchuk. Triangle: Engineering a 2d quality mesh generator and Delaunay triangulator. In Applied Computational Geometry: Towards Geometric Engineering, volume 1148 of Lecture Notes in Computer Science, pages 203--222. Springer-Verlag, 1996.
[33]
A. Solar-Lezama, C. G. Jones, and R. Bodik. Sketching concurrent data structures. In Proc. Conf. on Programming Language Design and Implementation (PLDI), pages 136--148, New York, NY, USA, 2008.
[34]
M. T. Vechev, E. Yahav, D. F. Bacon, and N. Rinetzky. CGCExplorer: a semi-automated search procedure for provably correct concurrent collectors. In Proc. Conf. on Programming Language Design and Implementation (PLDI), pages 456--467, New York, NY, USA, 2007.
[35]
D. F. Watson. Computing the n-dimensional tessellation with application to Voronoi polytopes. The Computer Journal, 24(2):167--172, 1981.
[36]
J. Wu, R. Das, J. Saltz, H. Berryman, and S. Hiranandani. Distributed memory compiler design for sparse problems. IEEE Transactions on Computers, 44, 1995.

Cited By

View all
  • (2021)An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable ArchitecturesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.308480432:12(3066-3080)Online publication date: 1-Dec-2021
  • (2015)Scalable Data-Driven PageRank: Algorithms, System Issues, and Lessons LearnedEuro-Par 2015: Parallel Processing10.1007/978-3-662-48096-0_34(438-450)Online publication date: 25-Jul-2015
  • (2024)StarPlat: A Versatile DSL for Graph AnalyticsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104967(104967)Online publication date: Aug-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGARCH Computer Architecture News
ACM SIGARCH Computer Architecture News  Volume 39, Issue 1
ASPLOS '11
March 2011
407 pages
ISSN:0163-5964
DOI:10.1145/1961295
Issue’s Table of Contents
  • cover image ACM Conferences
    ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
    March 2011
    432 pages
    ISBN:9781450302661
    DOI:10.1145/1950365
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 March 2011
Published in SIGARCH Volume 39, Issue 1

Check for updates

Author Tags

  1. scheduling
  2. synthesis

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable ArchitecturesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.308480432:12(3066-3080)Online publication date: 1-Dec-2021
  • (2015)Scalable Data-Driven PageRank: Algorithms, System Issues, and Lessons LearnedEuro-Par 2015: Parallel Processing10.1007/978-3-662-48096-0_34(438-450)Online publication date: 25-Jul-2015
  • (2024)StarPlat: A Versatile DSL for Graph AnalyticsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104967(104967)Online publication date: Aug-2024
  • (2021)Reducing the burden of parallel loop schedulers for many‐core processorsConcurrency and Computation: Practice and Experience10.1002/cpe.624133:13Online publication date: 5-Apr-2021
  • (2020)A Study of APIs for Graph Analytics Workloads2020 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC50251.2020.00030(228-239)Online publication date: Oct-2020
  • (2019)Understanding priority-based scheduling of graph algorithms on a shared-memory platformProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3295500.3356160(1-14)Online publication date: 17-Nov-2019
  • (2019)A round-efficient distributed betweenness centrality algorithmProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295729(272-286)Online publication date: 16-Feb-2019
  • (2018)Runtime Scheduling Policies for Distributed Graph Algorithms2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS.2018.00073(640-649)Online publication date: May-2018
  • (2017)What Scalable Programs Need from Transactional MemoryACM SIGARCH Computer Architecture News10.1145/3093337.303775045:1(105-118)Online publication date: 4-Apr-2017
  • (2017)What Scalable Programs Need from Transactional MemoryACM SIGPLAN Notices10.1145/3093336.303775052:4(105-118)Online publication date: 4-Apr-2017
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media