Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Generalized Task Parallelism

Published: 02 April 2015 Publication History

Abstract

Existing approaches to automatic parallelization produce good results in specific domains. Yet, it is unclear how to integrate their individual strengths to match the demands and opportunities of complex software. This lack of integration has both practical reasons, as integrating those largely differing approaches into one compiler would impose an engineering hell, as well as theoretical reasons, as no joint cost model exists that would drive the choice between parallelization methods.
By reducing the problem of generating parallel code from a program dependence graph to integer linear programming, <i>generalized task parallelization</i> integrates central aspects of existing parallelization approaches into a single unified framework. Implemented on top of LLVM, the framework seamlessly integrates enabling technologies such as speculation, privatization, and the realization of reductions.
Evaluating our implementation on various C programs from different domains, we demonstrate the effectiveness and generality of generalized task parallelization. On a quad-core machine with hyperthreading we achieve speedups of up to 4.6 ×.

Supplementary Material

TACO1201-08 (taco1201-08.pdf)
Slide deck associated with this paper

References

[1]
Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. 2008. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT’08). ACM, New York, NY, 72--81.
[2]
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1995. Cilk: An efficient multithreaded runtime system. SIGPLAN Not. 30, 8 (Aug. 1995), 207--216.
[3]
Michael Burke, Ron Cytron, Jeanne Ferrante, and Wilson Hsieh. 1989. Automatic generation of nested, fork-join parallelism. J. Supercomput. 3, 2 (1989), 71--88.
[4]
Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy M. Jones, Gu-Yeon Wei, and David Brooks. 2014. HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs. SIGARCH Comput. Archit. News 42, 3 (June 2014), 217--228.
[5]
Simone Campanoni, Timothy Jones, Glenn Holloway, Vijay Janapa Reddi, Gu-Yeon Wei, and David Brooks. 2012. HELIX: Automatic parallelization of irregular programs for chip multiprocessing. In Proceedings of the 10th International Symposium on Code Generation and Optimization (CGO’12). ACM, New York, NY, 84--93.
[6]
Michael K. Chen and Kunle Olukotun. 2003. The jrpm system for dynamically parallelizing java programs. In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA’03). ACM Press, New York, NY, 434--446.
[7]
Daniel Cordes, Peter Marwedel, and Arindam Mallik. 2010. Automatic parallelization of embedded software using hierarchical task graphs and integer linear programming. In Proceedings of the 8th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES/ISSS’10). ACM, New York, NY, 267--276.
[8]
Matthew DeVuyst, Dean M. Tullsen, and Seon Wook Kim. 2011. Runtime parallelization of legacy code on a transactional memory system. In Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC’11). ACM, New York, NY, 127--136.
[9]
Alejandro Duran, Xavier Teruel, Roger Ferrer, Xavier Martorell, and Eduard Ayguade. 2009. Barcelona OpenMP tasks suite: A set of benchmarks targeting the exploitation of task parallelism in OpenMP. In Proceedings of the 2009 International Conference on Parallel Processing (ICPP’09). IEEE Computer Society, Washington, DC, 124--131.
[10]
Tobias Edler von Koch and Björn Franke. 2014. Variability of data dependences and control flow. In Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’14). 180--189.
[11]
Yong hun Eom, Stephen Yang, James C. Jenista, and Brian Demsky. 2012. DOJ: Dynamically parallelizing object-oriented programs. SIGPLAN Not. 47, 8 (Feb. 2012), 85--96.
[12]
Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. 1987. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst. 9, 3 (July 1987), 319--349.
[13]
Samuel Z. Guyer and Calvin Lin. 2005. Error checking with client-driven pointer analysis. Sci. Comput. Prog. 58, 1--2 (Oct. 2005), 83--114.
[14]
Ben Hardekopf and Calvin Lin. 2011. Flow-sensitive pointer analysis for millions of lines of code. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO’11). IEEE Computer Society, Washington, DC, 289--298.
[15]
Ben Hertzberg and Kunle Olukotun. 2011. Runtime automatic speculative parallelization. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO’11). IEEE Computer Society, Washington, DC, 64--73.
[16]
Jialu Huang, Stephen R. Beard, Nick P. Johnson, Thomas B. Jablin, and David I. August. 2013. Automatically exploiting cross-invocation parallelism using runtime information. In Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO’13). IEEE Computer Society, Washington, DC, USA, 1--11.
[17]
Jialu Huang, Arun Raman, Thomas B. Jablin, Yun Zhang, Tzu-Han Hung, and David I. August. 2010. Decoupled software pipelining creates parallelization opportunities. In Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO’10). ACM, New York, NY, 121--130.
[18]
James Christopher Jenista, Yong hun Eom, and Brian Charles Demsky. 2011. OoOJava: Software out-of-order execution. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP’11). ACM, New York, NY, 57--68.
[19]
Nick P. Johnson, Hanjun Kim, Prakash Prabhu, Ayal Zaks, and David I. August. 2012. Speculative separation for privatization and reductions. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’12). ACM, New York, NY, 359--370.
[20]
Troy A. Johnson, Rudolf Eigenmann, and T. N. Vijaykumar. 2007. Speculative thread decomposition through empirical optimization. In Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’07). ACM, New York, NY, 205--214.
[21]
Thomas Karcher and Victor Pankratius. 2011. Run-Time automatic performance tuning for multicore applications. In Euro-Par 2011 Parallel Processing, Emmanuel Jeannot, Raymond Namyst, and Jean Roman (Eds.). Lecture Notes in Computer Science, Vol. 6852. Springer, Berlin, 3--14.
[22]
Ken Kennedy and John R. Allen. 2002. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann, San Francisco, CA.
[23]
Hanjun Kim, Nick P. Johnson, Jae W. Lee, Scott A. Mahlke, and David I. August. 2012. Automatic speculative DOALL for clusters. In Proceedings of the 10th International Symposium on Code Generation and Optimization (CGO’12). ACM, New York, NY, 94--103.
[24]
Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul Chew. 2007. Optimistic parallelism requires abstractions. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’07). ACM, New York, NY, 211--222.
[25]
Chris Lattner, Andrew Lenharth, and Vikram Adve. 2007. Making context-sensitive points-to analysis with heap cloning practical for the real world. SIGPLAN Not. 42, 6 (June 2007), 278--289.
[26]
Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, and Scott Mahlke. 2009. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory. SIGPLAN Not. 44, 6 (June 2009), 166--176.
[27]
Samuel P. Midkiff. 2012. Automatic Parallelization: An Overview of Fundamental Compiler Techniques. Morgan &amp; Claypool Publishers.
[28]
Guilherme Ottoni, Ram Rangan, Adam Stoler, and David I. August. 2005. Automatic thread extraction with decoupled software pipelining. In Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 38). IEEE Computer Society, Washington, DC, 105--118.
[29]
Louis-Noël Pouchet. 2012. PolyBench/C: The Polyhedral Benchmark Suite. Retrieved from http://www.cs.ucla.edu/&sim;pouchet/software/polybench/.
[30]
Arun Raman, Ayal Zaks, Jae W. Lee, and David I. August. 2012. Parcae: A system for flexible parallel execution. SIGPLAN Not. 47, 6 (June 2012), 133--144.
[31]
Easwaran Raman, Guilherme Ottoni, Arun Raman, Matthew J. Bridges, and David I. August. 2008. Parallel-stage decoupled software pipelining. In Proceedings of the 6th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO’08). ACM, New York, NY, 114--123.
[32]
Ram Rangan, Neil Vachharajani, Manish Vachharajani, and David I. August. 2004. Decoupled software pipelining with the synchronization array. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT’04). IEEE Computer Society, Washington, DC, 177--188.
[33]
Lawrence Rauchwerger and David Padua. 1995. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. In Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation (PLDI’95). ACM, New York, NY, 218--232.
[34]
Radu Rugina and Martin Rinard. 1999. Automatic parallelization of divide and conquer algorithms. SIGPLAN Not. 34, 8 (May 1999), 72--83.
[35]
Silvius Rus, Lawrence Rauchwerger, and Jay Hoeflinger. 2003. Hybrid Analysis: Static and dynamic memory reference analysis. Int. J. Parallel Program. 31, 4 (Aug. 2003), 251--283.
[36]
Vivek Sarkar. 1991. Automatic partitioning of a program dependence graph into parallel tasks. IBM J. Res. Dev. 35, 5--6 (Sept. 1991), 779--804.
[37]
Kevin Streit, Clemens Hammacher, Andreas Zeller, and Sebastian Hack. 2013. Sambamba: Runtime adaptive parallel execution. In Proceedings of the 3rd International Workshop on Adaptive Self-Tuning Computing Systems (ADAPT’13). ACM, New York, NY, Article 7, 6 pages.
[38]
Sid Ahmed Ali Touati, Julien Worms, and Sébastien Briais. 2013. The Speedup-Test: A statistical methodology for programme speedup analysis and computation. Concurr. Comp.: Pract. Exper. 25, 10 (2013), 1410--1426.
[39]
Peng Tu and David A. Padua. 1994. Automatic array privatization. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, London, UK, 500--521. http://dl.acm.org/citation.cfm&quest;id=645671.665384
[40]
Neil Vachharajani, Ram Rangan, Easwaran Raman, Matthew J. Bridges, Guilherme Ottoni, and David I. August. 2007. Speculative decoupled software pipelining. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT’07). IEEE Computer Society, Washington, DC, 49--59.
[41]
Hans Vandierendonck, Sean Rul, and Koen De Bosschere. 2010. The paralax infrastructure: Automatic parallelization with a helping hand. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT’10). ACM, New York, NY, 389--400.
[42]
Hongtao Zhong, M. Mehrara, S. Lieberman, and S. Mahlke. 2008. Uncovering hidden loop level parallelism in sequential applications. In Proceedings of the IEEE 14th International Symposium on High Performance Computer Architecture (HPCA’08). 290--301.

Cited By

View all
  • (2019)White-box program tuningProceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization10.5555/3314872.3314889(122-135)Online publication date: 16-Feb-2019
  • (2019)Sparse computation data dependence simplification for efficient compiler-generated inspectorsProceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3314221.3314646(594-609)Online publication date: 8-Jun-2019
  • (2019)White-Box Program Tuning2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2019.8661177(122-135)Online publication date: Feb-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization
ACM Transactions on Architecture and Code Optimization  Volume 12, Issue 1
April 2015
201 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2744295
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 April 2015
Accepted: 01 January 2015
Revised: 01 December 2014
Received: 01 June 2014
Published in TACO Volume 12, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Automatic parallelization
  2. integer linear programming
  3. just-in-time compilation

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)88
  • Downloads (Last 6 weeks)15
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2019)White-box program tuningProceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization10.5555/3314872.3314889(122-135)Online publication date: 16-Feb-2019
  • (2019)Sparse computation data dependence simplification for efficient compiler-generated inspectorsProceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3314221.3314646(594-609)Online publication date: 8-Jun-2019
  • (2019)White-Box Program Tuning2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2019.8661177(122-135)Online publication date: Feb-2019
  • (2018)Advances in Engineering Software for Multicore SystemsDependability Engineering10.5772/intechopen.72784Online publication date: 6-Jun-2018
  • (2016)Thread-level speculation with kernel supportProceedings of the 25th International Conference on Compiler Construction10.1145/2892208.2892221(1-11)Online publication date: 17-Mar-2016
  • (2016)Automatic Parallel Pattern Detection in the Algorithm Structure Design Space2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS.2016.60(43-52)Online publication date: May-2016

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media