Research Article  |  Open Access

User-Assisted Store Recycling for Dynamic Task Graph Schedulers

Published: 28 December 2016

Abstract

The emergence of the multi-core era has led to increased interest in designing effective yet practical parallel programming models. Models based on task graphs that operate on single-assignment data are attractive in several ways. Notably, they can support dynamic applications and precisely represent the available concurrency. However, for efficient execution, they also require nuanced algorithms for scheduling and memory management. In this article, we consider memory-efficient dynamic scheduling of task graphs. Specifically, we present a novel approach for dynamically recycling the memory locations assigned to data items as they are produced by tasks. We develop algorithms to identify memory-efficient store recycling functions by systematically evaluating the validity of a set of user-provided or automatically generated alternatives. Because recycling functions can be input data-dependent, we have also developed support for continued correct execution of a task graph in the presence of a potentially incorrect store recycling function.
Experimental evaluation demonstrates that this approach to automatic store recycling incurs little to no overheads, achieves memory usage comparable to the best manually derived solutions, often produces recycling functions valid across problem sizes and input parameters, and efficiently recovers from an incorrect choice of store recycling functions.
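
To make the abstract's idea concrete, the sketch below shows one possible shape of a bounded store driven by a candidate recycling function. It is a hypothetical illustration only: the RecycledStore type, its put/get interface, and the tag % 2 recycling function are names invented for this example and are not the paper's API (C++ is assumed here as a plausible implementation language). The sketch captures the essentials the abstract describes: a fixed pool of memory slots, a user-supplied function mapping single-assignment item tags to slots, and a runtime validity check that detects when reuse would overwrite data with outstanding readers, which is where a runtime in the spirit of the paper would fall back to safe allocation rather than fail.

// Hypothetical sketch only (not the paper's actual API): a bounded store for
// single-assignment data items, where a user-supplied recycling function maps
// item tags to a small pool of slots and validity is checked at runtime.
#include <cstdio>
#include <functional>
#include <vector>

struct RecycledStore {
    std::vector<double> slots;          // bounded memory pool
    std::vector<int>    pending_reads;  // outstanding consumers per slot
    std::function<int(int)> recycle;    // candidate recycling function: tag -> slot

    RecycledStore(int n_slots, std::function<int(int)> f)
        : slots(n_slots), pending_reads(n_slots, 0), recycle(std::move(f)) {}

    // Producer writes item 'tag'. Returns false if the recycling function is
    // invalid here, i.e., the target slot still holds unread data.
    bool put(int tag, double value, int n_consumers) {
        int s = recycle(tag);
        if (pending_reads[s] > 0) return false;   // previous occupant not fully consumed
        slots[s] = value;
        pending_reads[s] = n_consumers;
        return true;
    }

    // Consumer reads item 'tag' and releases one outstanding read on its slot.
    double get(int tag) {
        int s = recycle(tag);
        --pending_reads[s];
        return slots[s];
    }
};

int main() {
    // Toy dependence chain: item i is read only once, by the producer of item i+1,
    // so two slots suffice and "tag % 2" is a valid candidate recycling function.
    RecycledStore store(2, [](int tag) { return tag % 2; });
    store.put(0, 1.0, /*n_consumers=*/1);
    for (int i = 1; i < 8; ++i) {
        double prev = store.get(i - 1);
        if (!store.put(i, prev * 2.0, /*n_consumers=*/1)) {
            std::printf("recycling function invalid at tag %d\n", i);
            return 1;
        }
    }
    std::printf("item 7 = %f\n", store.get(7));   // 2^7 = 128
    return 0;
}

A candidate function that maps two simultaneously live items to the same slot would fail the pending-reads check in put(); that failure is the kind of condition a runtime validity test must detect so that execution can continue correctly despite an incorrect store recycling function.
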


Cited By

  • (2021) Incremental, Distributed, and Concurrent Service Coordination for Reliable and Deterministic Systems-of-Systems. IEEE Systems Journal 15, 2 (2021), 2470--2481. DOI: 10.1109/JSYST.2020.3020430. Online publication date: June 2021.


Published In

ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 4
December 2016
648 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/3012405
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 December 2016
Accepted: 01 November 2016
Revised: 01 November 2016
Received: 01 May 2016
Published in TACO Volume 13, Issue 4

Author Tags

  1. Task parallelism
  2. dynamic scheduling
  3. memory management

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • U.S. Department of Energy's (DOE) Office of Science
  • Battelle for DOE
  • Pacific Northwest National Laboratory
  • NSF
  • The Ohio State University
  • Office of Advanced Scientific Computing Research
  • Department of Energy (DOE)
