article

Public Access

Responsive parallel computation: bridging competitive and cooperative threading

Authors:

Stefan K. Muller,

Robert HarperAuthors Info & Claims

ACM SIGPLAN Notices, Volume 52, Issue 6

Pages 677 - 692

https://doi.org/10.1145/3140587.3062370

Published: 14 June 2017 Publication History

Abstract

Competitive and cooperative threading are widely used abstractions in computing. In competitive threading, threads are scheduled preemptively with the goal of minimizing response time, usually of interactive applications. In cooperative threading, threads are scheduled non-preemptively with the goal of maximizing throughput or minimizing the completion time, usually in compute-intensive applications, e.g. scientific computing, machine learning and AI.

Although both of these forms of threading rely on the same abstraction of a thread, they have, to date, remained largely separate forms of computing. Motivated by the recent increase in the mainstream use of multicore computers, we propose a threading model that aims to unify competitive and cooperative threading. To this end, we extend the classic graph-based cost model for cooperative threading to allow for competitive threading, and describe how such a cost model may be used in a programming language by presenting a language and a corresponding cost semantics. Finally, we show that the cost model and the semantics are realizable by presenting an operational semantics for the language that specifies the behavior of an implementation, as well as an implementation and a small empirical evaluation.

References

[1]

U. A. Acar, G. E. Blelloch, and R. D. Blumofe. The data locality of work stealing. Theory of Computing Systems (TOCS), 35(3):321–347, 2002.

[2]

U. A. Acar, A. Charguéraud, and M. Rainey. Scheduling parallel programs by work stealing with private deques. In PPoPP ’13, 2013.

Digital Library

[3]

U. A. Acar, A. Charguéraud, and M. Rainey. Oracle-guided scheduling for controlling granularity in implicitly parallel languages. Journal of Functional Programming (JFP), 26:e23, 2016.

[4]

N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems, 34(2):115–144, 2001.

[5]

Arvind and K. P. Gostelow. The Id report: An asychronous language and computing machine. Technical Report TR-114, Department of Information and Computer Science, University of California, Irvine, Sept. 1978.

[6]

A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The multikernel: A new OS architecture for scalable multicore systems. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP ’09, pages 29–44, 2009.

Digital Library

[7]

G. Blake, R. G. Dreslinski, T. Mudge, and K. Flautner. Evolution of thread-level parallelism in desktop applications. In Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’10, pages 302–313, 2010.

Digital Library

[8]

G. Blelloch and J. Greiner. Parallelism in sequential functional languages. In Proceedings of the 7th International Conference on Functional Programming Languages and Computer Architecture, FPCA ’95, pages 226–237. ACM, 1995.

Digital Library

[9]

G. E. Blelloch and J. Greiner. A provable time and space e fficient implementation of NESL. In Proceedings of the 1st ACM SIGPLAN International Conference on Functional Programming, pages 213–225. ACM, 1996.

Digital Library

[10]

G. E. Blelloch, J. C. Hardwick, J. Sipelstein, M. Zagha, and S. Chatterjee. Implementation of a portable nested data-parallel language. J. Parallel Distrib. Comput., 21(1):4–14, 1994.

Digital Library

[11]

G. E. Blelloch, P. B. Gibbons, and Y. Matias. Provably e fficient scheduling for languages with fine-grained parallelism. J. ACM, 46:281–321, Mar. 1999.

Digital Library

[12]

R. D. Blumofe and C. E. Leiserson. Space-e fficient scheduling of multithreaded computations. SIAM Journal on Computing, 27(1):202–229, 1998.

Digital Library

[13]

R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. J. ACM, 46:720–748, Sept. 1999.

Digital Library

[14]

S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. Corey: An operating system for many cores. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, pages 43–57, 2008.

Digital Library

[15]

R. P. Brent. The parallel evaluation of general arithmetic expressions. J. ACM, 21(2):201–206, 1974.

Digital Library

[16]

F. W. Burton and M. R. Sleep. Executing functional programs on a virtual tree of processors. In Functional Programming Languages and Computer Architecture (FPCA ’81), pages 187– 194. ACM Press, Oct. 1981.

Digital Library

[17]

M. M. T. Chakravarty, R. Leshchinskiy, S. L. Peyton Jones, G. Keller, and S. Marlow. Data parallel Haskell: a status report. In Proceedings of the POPL 2007 Workshop on Declarative Aspects of Multicore Programming, DAMP 2007, Nice, France, January 16, 2007, pages 10–18, 2007.

Digital Library

[18]

P. Charles, C. Grotho ff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: an objectoriented approach to non-uniform cluster computing. In Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, OOPSLA ’05, pages 519–538. ACM, 2005.

Digital Library

[19]

R. Davies. A temporal-logic approach to binding-time analysis. In LICS, pages 184–195, 1996.

Digital Library

[20]

D. L. Eager, J. Zahorjan, and E. D. Lazowska. Speedup versus e fficiency in parallel systems. IEEE Transactions on Computing, 38(3):408–423, 1989.

Digital Library

[21]

Y. Endo, Z. Wang, J. B. Chen, and M. Seltzer. Using latency to evaluate interactive system performance. In Proceedings of the Second USENIX Symposium on Operating Systems Design and Implementation, OSDI ’96, pages 185–199, New York, NY, USA, 1996. ACM.

Digital Library

[22]

D. G. Feitelson, L. Rudolph, and U. Schwiegelshohn. Parallel job scheduling - A status report. In Job Scheduling Strategies for Parallel Processing (JSSPP), 10th International Workshop, pages 1–16, 2004.

Digital Library

[23]

N. Feltman, C. Angiuli, U. A. Acar, and K. Fatahalian. Automatically splitting a two-stage lambda calculus. In Proceedings of the 25 European Symposium on Programming, ESOP, pages 255–281, 2016.

Digital Library

[24]

K. Flautner, R. Uhlig, S. Reinhardt, and T. Mudge. Threadlevel parallelism and interactive performance of desktop applications. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IX, pages 129–138, 2000.

Digital Library

[25]

M. Fluet, M. Rainey, J. Reppy, and A. Shaw. Implicitly threaded parallelism in Manticore. Journal of Functional Programming, 20(5-6):1–40, 2011.

Digital Library

[26]

M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI, pages 212–223, 1998.

Digital Library

[27]

C. Gao, A. Gutierrez, R. G. Dreslinski, T. Mudge, K. Flautner, and G. Blake. A study of thread level parallelism on mobile devices. In Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on, pages 126– 127, March 2014.

[28]

R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17(2):416–429, 1969.

Digital Library

[29]

J. Greiner and G. E. Blelloch. A provably time-e fficient parallel implementation of full speculation. ACM Transactions on Programming Languages and Systems, 21(2):240–285, Mar. 1999.

Digital Library

[30]

R. H. Halstead. Multilisp: a language for concurrent symbolic computation. ACM Transactions on Programming Languages and Systems, 7:501–538, 1985.

Digital Library

[31]

M. Harchol-Balter. Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press, 2013.

Digital Library

[32]

R. Harper. Practical Foundations for Programming Languages. Cambridge University Press, New York, NY, USA, 2012.

Digital Library

[33]

C. Hauser, C. Jacobi, M. Theimer, B. Welch, and M. Weiser. Using threads in interactive systems: A case study. In Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, SOSP ’93, pages 94–105, New York, NY, USA, 1993. ACM.

Digital Library

[34]

S. Imam and V. Sarkar. Load balancing prioritized tasks via work-stealing. In Euro-Par 2015: Parallel Processing - 21st International Conference on Parallel and Distributed Computing, pages 222–234, 2015.

[35]

S. M. Imam and V. Sarkar. Habanero-Java library: a Java 8 framework for multicore programming. In 2014 International Conference on Principles and Practices of Programming on the Java Platform Virtual Machines, Languages and Tools, PPPJ ’14, pages 75–86, 2014.

[36]

Intel. Intel threading building blocks, 2011.

[37]

https://www.threadingbuildingblocks.org/.

[38]

S. Jagannathan, A. Navabi, K. Sivaramakrishnan, and L. Ziarek. The design rationale for Multi-MLton. In ML ’10: Proceedings of the ACM SIGPLAN Workshop on ML. ACM, 2010.

[39]

J. Jaja. An introduction to parallel algorithms. Addison Wesley Longman Publishing Company, 1992.

Digital Library

[40]

G. Keller, M. M. Chakravarty, R. Leshchinskiy, S. Peyton Jones, and B. Lippmeier. Regular, shape-polymorphic, parallel arrays in Haskell. In Proceedings of the 15th ACM SIGPLAN international conference on Functional programming, ICFP ’10, pages 261–272, 2010.

Digital Library

[41]

T. B. Knoblock and E. Ruf. Data specialization. In Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, PLDI ’96, pages 215– 225, 1996.

Digital Library

[42]

D. Lea. A Java fork /join framework. In Proceedings of the ACM 2000 conference on Java Grande, JAVA ’00, pages 36–43, 2000.

Digital Library

[43]

D. Leijen, W. Schulte, and S. Burckhardt. The design of a task parallel library. In Proceedings of the 24th ACM SIGPLAN conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’09, pages 227–242, 2009.

Digital Library

[44]

R. Ley-Wild, U. A. Acar, and M. Fluet. A cost semantics for self-adjusting computation. In Proceedings of the 26th Annual ACM Symposium on Principles of Programming Languages, POPL ’09, 2009.

Digital Library

[45]

MLton. MLton web site. http://www.mlton.org.

[46]

S. K. Muller and U. A. Acar. Latency-hiding work stealing: Scheduling interacting parallel computations with work stealing. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’16, pages 71–82, 2016.

Digital Library

[47]

S. K. Muller, U. A. Acar, and R. Harper. Responsive parallel computation: Bridging competitive and cooperative threading. Technical Report TBD, Carnegie Mellon University School of Computer Science, Apr. 2017.

Digital Library

[48]

T. Murphy, VII, K. Crary, R. Harper, and F. Pfenning. A symmetric modal lambda calculus for distributed computing. In Proceedings of the 19th IEEE Symposium on Logic in Computer Science (LICS), pages 286–295. IEEE Press, 2004.

Digital Library

[49]

A. Nanevski and F. Pfenning. Staged computation with names and necessity. J. Funct. Program., 15(5):893–939, 2005.

Digital Library

[50]

G. J. Narlikar and G. E. Blelloch. Space-e fficient scheduling of nested parallelism. ACM Transactions on Programming Languages and Systems, 21, 1999.

Digital Library

[51]

F. Pfenning and R. Davies. A judgmental reconstruction of modal logic. Mathematical Structures in Computer Science, 11:511–540, 2001.

Digital Library

[52]

R. Raghunathan, S. K. Muller, U. A. Acar, and G. Blelloch. Hierarchical memory management for parallel programs. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, pages 392–406, New York, NY, USA, 2016. ACM.

Digital Library

[53]

M. Rosendahl. Automatic complexity analysis. In FPCA ’89: Functional Programming Languages and Computer Architecture, pages 144–156. ACM, 1989.

Digital Library

[54]

A. Saifullah, D. Ferry, J. Li, K. Agrawal, C. Lu, and C. D. Gill. Parallel real-time scheduling of DAGs. IEEE Trans. Parallel Distrib. Syst., 25(12):3242–3252, 2014.

[55]

D. Sands. Complexity analysis for a lazy higher-order language. In ESOP ’90: Proceedings of the 3rd European Symposium on Programming, pages 361–376, London, UK, 1990. Springer-Verlag.

Digital Library

[56]

P. M. Sansom and S. L. Peyton Jones. Time and space profiling for non-strict, higher-order functional languages. In Principles of Programming Languages, pages 355–366, 1995.

Digital Library

[57]

A. Silberschatz, P. B. Galvin, and G. Gagne. Operating system concepts (7. ed.). Wiley, 2005.

Digital Library

[58]

D. C. Smith, C. Irby, R. Kimball, B. Verplank, and E. Harslem. Designing the star user interface. BYTE Magazine, 7(4):242– 282, 1982.

[59]

D. Spoonhower. Scheduling Deterministic Parallel Programs. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2009.

Digital Library

[60]

D. Spoonhower, G. E. Blelloch, R. Harper, and P. B. Gibbons. Space profiling for parallel functional programs. In International Conference on Functional Programming, 2008.

Digital Library

[61]

D. C. Swinehart, P. T. Zellweger, R. J. Beach, and R. B. Hagmann. A structural view of the cedar programming environment. ACM Trans. Program. Lang. Syst., 8(4):419– 490, Aug. 1986.

Digital Library

[62]

W. Taha and T. Sheard. MetaML and multi-stage programming with explicit annotations. Theoretical Computer Science, 248 (1):211 – 242, 2000.

Digital Library

[63]

J. Ullman. NP-complete scheduling problems. Journal of Computer and System Sciences, 10(3):384 – 393, 1975.

Digital Library

[64]

M. Wimmer, D. Cederman, J. L. Trä ff, and P. Tsigas. Workstealing with configurable scheduling strategies. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, pages 315–316, 2013.

Digital Library

[65]

M. Wimmer, F. Versaci, J. L. Trä ff, D. Cederman, and P. Tsigas. Data structures for task-based priority scheduling. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’14, pages 379–380, 2014.

Digital Library

Cited By

Arora JWestrick SAcar U(2021)Provably space-efficient parallel functional programmingProceedings of the ACM on Programming Languages10.1145/34342995:POPL(1-33)Online publication date: 4-Jan-2021
https://doi.org/10.1145/3434299
Arora JMuller SAcar U(2024)Disentanglement with Futures, State, and InteractionProceedings of the ACM on Programming Languages10.1145/36328958:POPL(1569-1599)Online publication date: 5-Jan-2024
https://dl.acm.org/doi/10.1145/3632895
Westrick SFluet MRainey MAcar U(2024)Automatic Parallelism ManagementProceedings of the ACM on Programming Languages10.1145/36328808:POPL(1118-1149)Online publication date: 5-Jan-2024
https://dl.acm.org/doi/10.1145/3632880
Show More Cited By

Index Terms

Responsive parallel computation: bridging competitive and cooperative threading
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Concurrent programming languages
        Functional languages
        Parallel programming languages

Recommendations

Responsive parallel computation: bridging competitive and cooperative threading
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation

Competitive and cooperative threading are widely used abstractions in computing. In competitive threading, threads are scheduled preemptively with the goal of minimizing response time, usually of interactive applications. In cooperative threading, ...
PAIS: Parallelism-aware interconnect scheduling in multicores
Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers

Multicore processors have the potential to deliver scalable performance by distributing computation across multiple cores. However, the communication cost of parallel application thread execution may significantly limit the performance achievable due to ...
Exploiting Parallelism on GPUs and FPGAs with OmpSs
ANDARE '17: Proceedings of the 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems

This paper presents the OmpSs approach to deal with heterogeneous programming on GPU and FPGA accelerators. The OmpSs programming model is based on the Mercurium compiler and the Nanos++ runtime. Applications are annotated with compiler directives ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 52, Issue 6

PLDI '17

June 2017

708 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/3140587

Editor:
Matthew Fluet

Issue’s Table of Contents

PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2017
708 pages
ISBN:9781450349888
DOI:10.1145/3062341
General Chair:
Albert Cohen
Inria, France
,
Program Chair:
Martin Vechev
DeepCode, Switzerland / ETH Zurich, Switzerland

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2017

Published in SIGPLAN Volume 52, Issue 6

Check for updates

Author Tags

Qualifiers

Article

Funding Sources

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

14
Total Citations
View Citations
713
Total Downloads

Downloads (Last 12 months)122
Downloads (Last 6 weeks)29

Reflects downloads up to 13 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Arora JWestrick SAcar U(2021)Provably space-efficient parallel functional programmingProceedings of the ACM on Programming Languages10.1145/34342995:POPL(1-33)Online publication date: 4-Jan-2021
https://doi.org/10.1145/3434299
Arora JMuller SAcar U(2024)Disentanglement with Futures, State, and InteractionProceedings of the ACM on Programming Languages10.1145/36328958:POPL(1569-1599)Online publication date: 5-Jan-2024
https://dl.acm.org/doi/10.1145/3632895
Westrick SFluet MRainey MAcar U(2024)Automatic Parallelism ManagementProceedings of the ACM on Programming Languages10.1145/36328808:POPL(1118-1149)Online publication date: 5-Jan-2024
https://dl.acm.org/doi/10.1145/3632880
Arora JWestrick SAcar U(2023)Efficient Parallel Functional Programming with EffectsProceedings of the ACM on Programming Languages10.1145/35912847:PLDI(1558-1583)Online publication date: 6-Jun-2023
https://dl.acm.org/doi/10.1145/3591284
Muller SSinger KKeeney DNeth AAgrawal KLee IAcar U(2023)Responsive Parallelism with SynchronizationProceedings of the ACM on Programming Languages10.1145/35912497:PLDI(712-735)Online publication date: 6-Jun-2023
https://dl.acm.org/doi/10.1145/3591249
Singer KAgrawal KLee IAgrawal KShun J(2023)An Efficient Scheduler for Task-Parallel Interactive ApplicationsProceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3558481.3591092(27-38)Online publication date: 17-Jun-2023
https://dl.acm.org/doi/10.1145/3558481.3591092
Westrick SArora JAcar U(2022)Entanglement detection with near-zero costProceedings of the ACM on Programming Languages10.1145/35476466:ICFP(679-710)Online publication date: 31-Aug-2022
https://dl.acm.org/doi/10.1145/3547646
Arora JWestrick SAcar U(2021)Provably space-efficient parallel functional programmingProceedings of the ACM on Programming Languages10.1145/34342995:POPL(1-33)Online publication date: 4-Jan-2021
https://dl.acm.org/doi/10.1145/3434299
Singer KGoldstein NMuller SAgrawal KLee IAcar UScheideler CSpear M(2020)Priority Scheduling for Interactive ApplicationsProceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3350755.3400236(465-477)Online publication date: 6-Jul-2020
https://dl.acm.org/doi/10.1145/3350755.3400236
Westrick SYadav RFluet MAcar U(2019)Disentanglement in nested-parallel programsProceedings of the ACM on Programming Languages10.1145/33711154:POPL(1-32)Online publication date: 20-Dec-2019
https://dl.acm.org/doi/10.1145/3371115
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents