Tapir: Embedding Recursive Fork-join Parallelism into LLVM’s Intermediate Representation

Published: 17 December 2019

Abstract

Tapir (pronounced TAY-per) is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler’s IR. Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations on and across parallel control constructs. Remedying this situation has generally been thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics. Tapir leverages the “serial-projection property,” which is commonly satisfied by task-parallel programs, to handle the semantics of these programs without an extensive rework of the compiler.
For recursive fork-join programs that satisfy the serial-projection property, Tapir enables effective compiler optimization of parallel programs with only minor changes to existing compiler analyses and code transformations. Tapir uses the serial-projection property to order logically parallel fine-grained tasks in the program’s control-flow graph. This ordered representation of parallel tasks allows the compiler to optimize parallel codes effectively with only minor modifications. For example, to implement Tapir/LLVM, a prototype of Tapir in the LLVM compiler, we added or modified less than 3,000 lines of LLVM’s half-million-line core middle-end functionality.
These changes sufficed to enable LLVM’s existing compiler optimizations for serial code—including loop-invariant-code motion, common-subexpression elimination, and tail-recursion elimination—to work with parallel control constructs such as parallel loops and Cilk’s cilk_spawn keyword. Tapir also supports parallel optimizations, such as loop scheduling, which restructure the parallel control flow of the program. By making use of existing LLVM optimizations and new parallel optimizations, Tapir/LLVM can optimize recursive fork-join programs more effectively than traditional compilation methods. On a suite of 35 Cilk application benchmarks, Tapir/LLVM produces more efficient executables for 30 benchmarks, with faster 18-core running times for 26 of them, compared to a nearly identical compiler that compiles parallel linguistic constructs the traditional way.




      Reviews

      William M. Waite

      A typical compiler has a front end that analyzes the source text and converts it to a language-independent intermediate representation (IR) whose structure and operations support general-purpose optimization in the compiler's middle end. The quality of the optimization depends on the IR's ability to express the intent of source-language constructs in a way that the middle end understands and can improve. Although some compilers now support task-parallel linguistic constructs, their front ends convert those constructs into more primitive IR sequences. Often, those sequences involve opaque procedure calls that block further optimization. The authors took a different approach with Tapir. They chose to extend the existing LLVM IR to encode logical parallelism and expose opportunities for standard compiler optimizations to work on parallel code. Integration of these extensions was relatively simple, and they turned out to be easy to maintain in response to LLVM changes. The extensions opened parallel code to a wide variety of standard optimizations by avoiding the problems with opaque procedure calls. According to the authors, the Tapir approach is premised on the notion that parallel programs can usually be executed serially and that there is a normative serial execution order for tasks. This allows them to define serial projection: a serial projection transforms a given program to replace parallel constructs with serial constructs. If the serial projection produces one of the valid behaviors of the original program, then the compiler and runtime system can execute parallel tasks serially. The authors argue that the serial-projection property is supported by most task-parallel programming models, but not necessarily by arbitrary concurrency mechanisms. They therefore strongly advocate that task parallelism and concurrency be treated as separate concepts.
The paper describes how Tapir represents logically parallel tasks, how LLVM's analysis is adapted to Tapir programs, and how the authors evaluated Tapir in practice. Each of these topics is covered in detail, with examples and discussion. The authors discuss related work and present a comprehensive retrospective on where they have been and where they are going. I found the paper easy to read and engrossing; it should be accessible to anyone with a basic understanding of code optimization.


      Published In

      ACM Transactions on Parallel Computing, Volume 6, Issue 4
      December 2019
      188 pages
      ISSN:2329-4949
      EISSN:2329-4957
      DOI:10.1145/3372747

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 December 2019
      Accepted: 01 April 2019
      Revised: 01 January 2019
      Received: 01 April 2018
      Published in TOPC Volume 6, Issue 4

      Author Tags

      1. Cilk
      2. LLVM
      3. OpenMP
      4. Tapir
      5. compiling
      6. control-flow graph
      7. fork-join parallelism
      8. multicore
      9. optimization
      10. parallel computing
      11. serial-projection property


