article

Free access

The implementation of the Cilk-5 multithreaded language

Authors:

Charles E. Leiserson,

Keith H. RandallAuthors Info & Claims

ACM SIGPLAN Notices, Volume 33, Issue 5

Pages 212 - 223

https://doi.org/10.1145/277652.277725

Published: 01 May 1998 Publication History

Abstract

The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the scheduling algorithm: concentrate on minimizing overheads that contribute to the work, even at the expense of overheads that contribute to the critical path. Although it may seem counterintuitive to move overheads onto the critical path, this "work-first" principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared to equivalent C programs. This paper describes how the work-first principle was exploited in the design of Cilk-5's compiler and its runtime system. In particular, we present Cilk-5's novel "two-clone" compilation strategy and its Dijkstra-like mutual-exclusion protocol for implementing the ready deque in the work-stealing scheduler.

References

[1]

Andrew W. Appel and Zhong Shao. Empirical and analytic study of stack versus heap cost for languages with closures. Journal of Functional Programming, 6(1):47-74, 1996.

[2]

Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. Thread scheduling for multiprogrammed multiprocessors, in Proceedings of the Tenth Annual A CM Symposium on Parallel Algorithms and Architectures (SPAA), Puerto Vallarta, Mexico, June 1998. To appear.

Digital Library

[3]

Robert D. Blumofe. Executing Multithreaded Programs Efficiently. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, September 1995.

Digital Library

[4]

Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: An efficient multithreaded runtime system. Journal of Parallel and Distributed Computing, 37(1):55-69, August 1996.

Digital Library

[5]

Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. In Proceed* ings of the 35th Annual Symposium on Foundations of Corn* puter Science (FOCS), pages 356-368, Santa Fe, New Mexico, November 1994.

[6]

Richard P. Brent. The parallel evaluation of general arithmetic expressions. Journal of the A CM, 21(2):201-206, April 1974.

Digital Library

[7]

Guang-Ien Cheng, Mingdong Feng, Charles E. Leiserson, Keith H. Randall, and Andrew F. Stark. Detecting data races in Cilk programs that use locks. In Proceedings of the Tenth Annual A CM Symposium on Parallel Algorithms and Architectures (SPAA), Puerto Vallarta, Mexico, June 1998. To appear.

Digital Library

[8]

Cilk-5.1 (Beta 1) Reference Manual. Available on the Internet from http://theory.lcs .mit .edu/'cilk.

[9]

Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, Massachusetts, 1990.

Digital Library

[10]

David E. Culler, Anurag Sah, Klaus Erik Schauser, Thorsten yon Eicken, and John Wawrzynek. Fine-grain parallelism with minimal hardware support: A compiler-controlled threaded abstract machine. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 164-175, Santa Clara, California, April 1991.

Digital Library

[11]

E. W. Dijkstra. Solution of a problem in concurrent programming control. Communications of the A CM, 8(9):569, September 1965.

Digital Library

[12]

Marc Feeley. Polling efficiently on stock hardware. In Proceedings of the 1993 A CM SIGPLAN Conference on Functional Programming and Computer Architecture, pages 179- 187, Copenhagen, Denmark, June 1993.

Digital Library

[13]

Mingdong Feng and Charles E. Leiserson. Efficient detection of determinacy races in Cilk programs. In Proceedings of the Ninth Annual A CM Symposium on Parallel Algorithms and Architectures (SPAA), pages 1-11, Newport, Rhode Island, June 1997.

Digital Library

[14]

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5-20, August 1996.

Digital Library

[15]

R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17(2):416-429, March 1969.

Digital Library

[16]

Dirk Grunwald. Heaps o' stacks: Time and space efficient threads without operating system support. Technical Report CU-CS-750-94, University of Colorado, November 1994.

[17]

Dirk Grunwald and Richard Neves. Whole-program optimization for time and space efficient threads. In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 50-59, Cambridge, Massachusetts, October 1996.

Digital Library

[18]

Christopher F. Joerg. The Cilk System for Parallel Multithreaded Computing. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, January 1996.

Digital Library

[19]

Robert H. Halstead Jr. Multilisp: A language for concurrent symbolic computation. A CM Transactions on Programming Languages and Systems, 7(4):501-538, October 1985.

Digital Library

[20]

Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE 7~nsactions on Computers, C-28(9):690-691, September 1979.

[21]

Phillip Lisiecki and Alberto Medina. Personal communication.

[22]

James S. Miller and Guillermo J. Rozas. Garbage collection is fast, but a stack is faster. Technical Report Memo 1462, MIT Artificial Intelligence Laboratory, Cambridge, MA, 1994.

Digital Library

[23]

Robert C. Miller. A type-checking preprocessor for Cilk 2, a multithreaded C language. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 1995.

[24]

Eric Mohr, David A. Kranz, and Robert H. Halstead, Jr. Lazy task creation: A technique for increasing the granularity of parallel programs. IEEE Transactions on Parallel and Distributed Systems, 2(3):264-280, July 1991.

Digital Library

[25]

Joel Moses. The function of FUNCTION in LISP or why the FUN ARG problem should be called the environment problem. Technical Report memo AI-199, MIT Artificial intelligence Laboratory, June 1970.

Digital Library

[26]

PJshiyur Sivaswami Nikhil. Parallel Symbolic Computing in Cid. In Proc. Wkshp. on Parallel Symbolic Computing, Beaune, P~nce, Springer-Verla~ LNCS 1068, pages 217- 242, October 1995.

Digital Library

[27]

Per Stenstr6m. VLSI support for a cactus stack oriented memory organization. Proceedings of the Twentit-First Annual Hawaii International Conference on System Sciences, volume 1, pages 211-220, January 1988.

Digital Library

[28]

Leslie G. Valiant. A bridging model for parallel computation. Communications of the A CM, 33(8):103-111, August 1990.

Digital Library

Cited By

Kaler TChen XWheatman BCurtis DHoppe BSchardl TLeiserson C(2024)Speedcode: Software Performance Engineering Education via the Coding of Didactic Exercises2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00087(391-394)Online publication date: 27-May-2024
https://doi.org/10.1109/IPDPSW63119.2024.00087
Chen JManivannan MGoel BPericàs M(2024)SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS57955.2024.00036(325-336)Online publication date: 27-May-2024
https://doi.org/10.1109/IPDPS57955.2024.00036
Sales EInverso OTuosto E(2024)Accurate Static Data Race Detection for CFormal Methods10.1007/978-3-031-71162-6_23(443-462)Online publication date: 11-Sep-2024
https://doi.org/10.1007/978-3-031-71162-6_23
Show More Cited By

Index Terms

The implementation of the Cilk-5 multithreaded language
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Control structures
      2. Language types
        Concurrent programming languages
        Distributed programming languages
        Parallel programming languages

Recommendations

The implementation of the Cilk-5 multithreaded language
PLDI '98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation

The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of ...
Programming with exceptions in JCilk
Special issue: Synchronization and concurrency in object-oriented languages

JCilk extends the serial subset of the Java language by importing the fork-join primitives spawn and sync from the Cilk multithreaded language, thereby providing call-return semantics for multithreaded subcomputations. In addition, JCilk transparently ...
Exploring the performance of massively multithreaded architectures

We present a new scheme for evaluating the performance of multithreaded computers and demonstrate its application to the Cray MTA-2 and XMT supercomputers. Our scheme is based on the concept of clock cycles per element, \documentclass{article}\footskip=...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 33, Issue 5

May 1998

358 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/277652

Chairmen:
Jack W. Davidson
Univ. of Virginia
,
Keith D. Cooper
Rice Univ., Houston, TX
,
Editor:
A. Michael Berman
Rowan Univ., Glassboro, NJ

Issue’s Table of Contents

PLDI '98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
May 1998
357 pages
ISBN:0897919874
DOI:10.1145/277650
Chairmen:
Jack W. Davidson
Univ. of Virginia
,
Keith D. Cooper
Rice Univ. Houston, TX
,
Editor:
A. Michael Berman
Rowan Univ. Glassboro, NJ

Copyright © 1998 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 1998

Published in SIGPLAN Volume 33, Issue 5

Check for updates

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1,095
Total Citations
View Citations
3,686
Total Downloads

Downloads (Last 12 months)283
Downloads (Last 6 weeks)75

Reflects downloads up to 02 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Kaler TChen XWheatman BCurtis DHoppe BSchardl TLeiserson C(2024)Speedcode: Software Performance Engineering Education via the Coding of Didactic Exercises2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00087(391-394)Online publication date: 27-May-2024
https://doi.org/10.1109/IPDPSW63119.2024.00087
Chen JManivannan MGoel BPericàs M(2024)SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS57955.2024.00036(325-336)Online publication date: 27-May-2024
https://doi.org/10.1109/IPDPS57955.2024.00036
Sales EInverso OTuosto E(2024)Accurate Static Data Race Detection for CFormal Methods10.1007/978-3-031-71162-6_23(443-462)Online publication date: 11-Sep-2024
https://doi.org/10.1007/978-3-031-71162-6_23
Mishra PNandivada V(2023)COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel LoopACM Transactions on Architecture and Code Optimization10.1145/363333121:1(1-26)Online publication date: 18-Nov-2023
https://dl.acm.org/doi/10.1145/3633331
Cheng LRuttenberg MJung DRichmond DTaylor MOskin MBatten CAamodt TJerger NSwift M(2023)Beyond Static Parallel Loops: Supporting Dynamic Task Parallelism on Manycore Architectures with Software-Managed Scratchpad MemoriesProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3582016.3582020(46-58)Online publication date: 25-Mar-2023
https://dl.acm.org/doi/10.1145/3582016.3582020
Benoit NLouise S(2023)Runtime support for automatic placement of workloads on heterogeneous processors2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)10.1109/MCSoC60832.2023.00039(210-217)Online publication date: 18-Dec-2023
https://doi.org/10.1109/MCSoC60832.2023.00039
Khaund ASharma ATiwari AGarg SKailasam S(2023)RD-FCA: A resilient distributed framework for formal concept analysisJournal of Parallel and Distributed Computing10.1016/j.jpdc.2023.04.011179(104710)Online publication date: Sep-2023
https://doi.org/10.1016/j.jpdc.2023.04.011
Adnan (2023)Performance evaluation on work-stealing featured parallel programs on asymmetric performance multicore processorsArray10.1016/j.array.2023.10031119(100311)Online publication date: Sep-2023
https://doi.org/10.1016/j.array.2023.100311
Subramanian HGurumurthy BDurand GBroneske DSaake G(2023)Out-of-the-box library support for DBMS operations on GPUsDistributed and Parallel Databases10.1007/s10619-023-07431-341:3(489-509)Online publication date: 10-May-2023
https://doi.org/10.1007/s10619-023-07431-3
Shiki YYasugi MHiraishi T(2023)Evaluating and Analyzing Irregular Tree Search in the Tascell and HOPE Parallel Programming LanguagesParallel and Distributed Computing, Applications and Technologies10.1007/978-3-031-29927-8_21(262-272)Online publication date: 8-Apr-2023
https://doi.org/10.1007/978-3-031-29927-8_21
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents