Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

The implementation of the Cilk-5 multithreaded language

Published: 01 May 1998 Publication History

Abstract

The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the scheduling algorithm: concentrate on minimizing overheads that contribute to the work, even at the expense of overheads that contribute to the critical path. Although it may seem counterintuitive to move overheads onto the critical path, this "work-first" principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared to equivalent C programs. This paper describes how the work-first principle was exploited in the design of Cilk-5's compiler and its runtime system. In particular, we present Cilk-5's novel "two-clone" compilation strategy and its Dijkstra-like mutual-exclusion protocol for implementing the ready deque in the work-stealing scheduler.

References

[1]
Andrew W. Appel and Zhong Shao. Empirical and analytic study of stack versus heap cost for languages with closures. Journal of Functional Programming, 6(1):47-74, 1996.
[2]
Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. Thread scheduling for multiprogrammed multiprocessors, in Proceedings of the Tenth Annual A CM Symposium on Parallel Algorithms and Architectures (SPAA), Puerto Vallarta, Mexico, June 1998. To appear.
[3]
Robert D. Blumofe. Executing Multithreaded Programs Efficiently. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, September 1995.
[4]
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: An efficient multithreaded runtime system. Journal of Parallel and Distributed Computing, 37(1):55-69, August 1996.
[5]
Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. In Proceed* ings of the 35th Annual Symposium on Foundations of Corn* puter Science (FOCS), pages 356-368, Santa Fe, New Mexico, November 1994.
[6]
Richard P. Brent. The parallel evaluation of general arithmetic expressions. Journal of the A CM, 21(2):201-206, April 1974.
[7]
Guang-Ien Cheng, Mingdong Feng, Charles E. Leiserson, Keith H. Randall, and Andrew F. Stark. Detecting data races in Cilk programs that use locks. In Proceedings of the Tenth Annual A CM Symposium on Parallel Algorithms and Architectures (SPAA), Puerto Vallarta, Mexico, June 1998. To appear.
[8]
Cilk-5.1 (Beta 1) Reference Manual. Available on the Internet from http://theory.lcs .mit .edu/'cilk.
[9]
Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, Massachusetts, 1990.
[10]
David E. Culler, Anurag Sah, Klaus Erik Schauser, Thorsten yon Eicken, and John Wawrzynek. Fine-grain parallelism with minimal hardware support: A compiler-controlled threaded abstract machine. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 164-175, Santa Clara, California, April 1991.
[11]
E. W. Dijkstra. Solution of a problem in concurrent programming control. Communications of the A CM, 8(9):569, September 1965.
[12]
Marc Feeley. Polling efficiently on stock hardware. In Proceedings of the 1993 A CM SIGPLAN Conference on Functional Programming and Computer Architecture, pages 179- 187, Copenhagen, Denmark, June 1993.
[13]
Mingdong Feng and Charles E. Leiserson. Efficient detection of determinacy races in Cilk programs. In Proceedings of the Ninth Annual A CM Symposium on Parallel Algorithms and Architectures (SPAA), pages 1-11, Newport, Rhode Island, June 1997.
[14]
S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5-20, August 1996.
[15]
R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17(2):416-429, March 1969.
[16]
Dirk Grunwald. Heaps o' stacks: Time and space efficient threads without operating system support. Technical Report CU-CS-750-94, University of Colorado, November 1994.
[17]
Dirk Grunwald and Richard Neves. Whole-program optimization for time and space efficient threads. In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 50-59, Cambridge, Massachusetts, October 1996.
[18]
Christopher F. Joerg. The Cilk System for Parallel Multithreaded Computing. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, January 1996.
[19]
Robert H. Halstead Jr. Multilisp: A language for concurrent symbolic computation. A CM Transactions on Programming Languages and Systems, 7(4):501-538, October 1985.
[20]
Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE 7~nsactions on Computers, C-28(9):690-691, September 1979.
[21]
Phillip Lisiecki and Alberto Medina. Personal communication.
[22]
James S. Miller and Guillermo J. Rozas. Garbage collection is fast, but a stack is faster. Technical Report Memo 1462, MIT Artificial Intelligence Laboratory, Cambridge, MA, 1994.
[23]
Robert C. Miller. A type-checking preprocessor for Cilk 2, a multithreaded C language. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 1995.
[24]
Eric Mohr, David A. Kranz, and Robert H. Halstead, Jr. Lazy task creation: A technique for increasing the granularity of parallel programs. IEEE Transactions on Parallel and Distributed Systems, 2(3):264-280, July 1991.
[25]
Joel Moses. The function of FUNCTION in LISP or why the FUN ARG problem should be called the environment problem. Technical Report memo AI-199, MIT Artificial intelligence Laboratory, June 1970.
[26]
PJshiyur Sivaswami Nikhil. Parallel Symbolic Computing in Cid. In Proc. Wkshp. on Parallel Symbolic Computing, Beaune, P~nce, Springer-Verla~ LNCS 1068, pages 217- 242, October 1995.
[27]
Per Stenstr6m. VLSI support for a cactus stack oriented memory organization. Proceedings of the Twentit-First Annual Hawaii International Conference on System Sciences, volume 1, pages 211-220, January 1988.
[28]
Leslie G. Valiant. A bridging model for parallel computation. Communications of the A CM, 33(8):103-111, August 1990.

Cited By

View all
  • (2024)Speedcode: Software Performance Engineering Education via the Coding of Didactic Exercises2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00087(391-394)Online publication date: 27-May-2024
  • (2024)SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS57955.2024.00036(325-336)Online publication date: 27-May-2024
  • (2024)Accurate Static Data Race Detection for CFormal Methods10.1007/978-3-031-71162-6_23(443-462)Online publication date: 11-Sep-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 33, Issue 5
May 1998
358 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/277652
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI '98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
    May 1998
    357 pages
    ISBN:0897919874
    DOI:10.1145/277650
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 1998
Published in SIGPLAN Volume 33, Issue 5

Check for updates

Author Tags

  1. critical path
  2. multithreading
  3. parallel computing
  4. programming language
  5. runtime system
  6. work

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)283
  • Downloads (Last 6 weeks)75
Reflects downloads up to 02 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Speedcode: Software Performance Engineering Education via the Coding of Didactic Exercises2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00087(391-394)Online publication date: 27-May-2024
  • (2024)SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS57955.2024.00036(325-336)Online publication date: 27-May-2024
  • (2024)Accurate Static Data Race Detection for CFormal Methods10.1007/978-3-031-71162-6_23(443-462)Online publication date: 11-Sep-2024
  • (2023)COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel LoopACM Transactions on Architecture and Code Optimization10.1145/363333121:1(1-26)Online publication date: 18-Nov-2023
  • (2023)Beyond Static Parallel Loops: Supporting Dynamic Task Parallelism on Manycore Architectures with Software-Managed Scratchpad MemoriesProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3582016.3582020(46-58)Online publication date: 25-Mar-2023
  • (2023)Runtime support for automatic placement of workloads on heterogeneous processors2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)10.1109/MCSoC60832.2023.00039(210-217)Online publication date: 18-Dec-2023
  • (2023)RD-FCA: A resilient distributed framework for formal concept analysisJournal of Parallel and Distributed Computing10.1016/j.jpdc.2023.04.011179(104710)Online publication date: Sep-2023
  • (2023)Performance evaluation on work-stealing featured parallel programs on asymmetric performance multicore processorsArray10.1016/j.array.2023.10031119(100311)Online publication date: Sep-2023
  • (2023)Out-of-the-box library support for DBMS operations on GPUsDistributed and Parallel Databases10.1007/s10619-023-07431-341:3(489-509)Online publication date: 10-May-2023
  • (2023)Evaluating and Analyzing Irregular Tree Search in the Tascell and HOPE Parallel Programming LanguagesParallel and Distributed Computing, Applications and Technologies10.1007/978-3-031-29927-8_21(262-272)Online publication date: 8-Apr-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media