research-article

Efficient deterministic multithreading without global barriers

Authors:

Xiaoping WangAuthors Info & Claims

ACM SIGPLAN Notices, Volume 49, Issue 8

Pages 287 - 300

https://doi.org/10.1145/2692916.2555252

Published: 06 February 2014 Publication History

Abstract

Multithreaded programs execute nondeterministically on conventional architectures and operating systems. This complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the output of a multithreaded program depend on its inputs only, which can totally solve the above problem. However, current DMT implementations suffer from a common inefficiency: they use frequent global barriers to enforce a deterministic ordering on memory accesses. In this paper, we eliminate that inefficiency using an execution model we call deterministic lazy release consistency (DLRC). Our execution model uses the Kendo algorithm to enforce a deterministic ordering on synchronization, and it uses a deterministic version of the lazy release consistency memory model to propagate memory updates across threads. Our approach guarantees that programs execute deterministically even when they contain data races. We implemented a DMT system based on these ideas (RFDet) and evaluated it using 16 parallel applications. Our implementation targets C/C++ programs that use POSIX threads. Results show that RFDet gains nearly 2x speedup compared with DThreads-a start-of-the-art DMT system.

References

[1]

S. V. Adve and J. K. Aggarwal, "A Unified Formalization of Four Shared-Memory Models," IEEE Trans. Parallel Distrib. Syst., vol. 4, pp. 613--624, 1993.

Digital Library

[2]

S. V. Adve and M. D. Hill, "Weak ordering--a new definition," presented at the Proceedings of the 17th annual international symposium on Computer Architecture, Seattle, Washington, USA, 1990.

Digital Library

[3]

A. Amittai, W. Shu-Chun, H. Sen, and F. Bryan, "Efficient system-enforced deterministic parallelism," presented at the Proceedings of the 9th USENIX conference on Operating systems design and implementation, Vancouver, BC, Canada, 2010.

Digital Library

[4]

T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman, "CoreDet: a compiler and runtime system for deterministic multithreaded execution," presented at the Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, Pittsburgh, Pennsylvania, USA, 2010.

Digital Library

[5]

T. Bergan, L. Ceze, and D. Grossman, "Input-Covering Schedules for Multithreaded Programs," in Proceedings of the Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA), Indianapolis, Indiana, USA, 2013.

Digital Library

[6]

T. Bergan, J. Devietti, N. Hunt, and L. Ceze, "The Deterministic Execution Hammer: How Well Does it Actually Pound Nails?," in WoDET, 2011.

[7]

T. Bergan, N. Hunt, L. Ceze, and S. D. Gribble, "Deterministic process groups in dOS," in Proceedings of the 9th USENIX conference on Operating systems design and implementation, 2010.

Digital Library

[8]

E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson. "Hoard: A scalable memory allocator for multithreaded applications," in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), pages 117--128, Cambridge, MA, Nov. 2000.

Digital Library

[9]

E. D. Berger, T. Yang, T. Liu, and G. Novark, "Grace: Safe multithreaded programming for C/C+," in OOPSLA, 2009, pp. 81--96.

Digital Library

[10]

C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The PARSEC Benchmark Suite: Characterization and Architectural Implications," in Proceedings of the 17th international conference on Parallel architectures and compilation techniques, 2008.

Digital Library

[11]

R. L. Bocchino Jr, V. S. Adve, S. V. Adve, and M. Snir, "Parallel programming must be deterministic by default," in Proceedings of the First USENIX conference on Hot topics in parallelism, 2009, pp. 4--4.

Digital Library

[12]

H.-J. Boehm, "Position Paper: Nondeterminism is Unavoidable, but Data Races are Pure Evil," in Proceedsing of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability (RACES), 2012.

Digital Library

[13]

H.-J. Boehm and S. V. Adve, "Foundations of the C+ concurrency memory model," presented at the Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, Tucson, AZ, USA, 2008.

Digital Library

[14]

H. Cui, J. Simsa, H. Li, B. Blum, X. Xu, J. Yang, G. A. Gibson, and R. E. Bryant, "Parrot: A Practical Runtime for Deterministic, Stable, and Reliable Threads," in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, Farmington, PA, USA, 2013.

Digital Library

[15]

H. Cui, J. Wu, and J. Yang, "Stable deterministic multithreading through schedule memoization," in Proceedings of the 9th USENIX conference on Operating systems design and implementation, 2010.

Digital Library

[16]

H. Cui, J. Wu, J. Gallagher, H. Guo, and J. Yang, "Efficient Deterministic Multithreading through Schedule Relaxation," in Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, Cascais, Portugal, 2011.

Digital Library

[17]

J. Devietti, B. Lucia, L. Ceze, M. Oskin, "DMP: deterministic shared memory multiprocessing,"presented at the Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, Washington, DC, USA, 2009.

Digital Library

[18]

J. Devietti, J. Nelson, T. Bergan, L. Ceze, and D. Grossman, "RCDC: a relaxed consistency deterministic computer," in Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems, Newport Beach, California, USA, 2011, pp. 67--78.

Digital Library

[19]

C. J. Fidge., "Partial orders for parallel debugging," in ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, January 1989, pp. 24(1): 183--194.

Digital Library

[20]

M. Hill and M. Xu. Racey: A Stress Test for Deterministic Execution. Available: http://www.cs.wisc.edu/~markhill/racey.html

[21]

D. R. Hower, P. Dudnik, M. D. Hill, and D. A. Wood, "Calvin: Deterministic or not? Free will to choose," in High Performance Computer Architecture (HPCA), 2011, pp. 333--334.

Digital Library

[22]

B. Kasikci, C. Zamfir, and G. Candea. "Data Races vs. Data Race Bugs: Telling the Difference with Portend," in Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2012.

Digital Library

[23]

P. Keleher, A. L. Cox, and W. Zwaenepoel, "Lazy release consistency for software distributed shared memory," SIGARCH Comput. Archit. News, vol. 20, pp. 13--21, 1992.

Digital Library

[24]

P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel, "TreadMarks: distributed shared memory on standard workstations and operating systems," presented at the Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference, San Francisco, California, 1994.

Digital Library

[25]

T. J. LeBlanc and J. M. Mellor-Crummey, "Debugging parallel programs with instant replay," Computers, IEEE Transactions on, vol. 100, pp. 471--482, 1987.

Digital Library

[26]

E. A. Lee, "The problem with threads," Computer, vol. 39, pp. 33--42, 2006.

Digital Library

[27]

D. Lee, P. M. Chen, J. Flinn, and S. Narayanasamy, "Chimera: Hybrid Program Analysis for Determinism," presented at the Proceedings of the 2012 ACM SIGPLAN conference on Programming language design and implementation, Beijing, China, 2012.

Digital Library

[28]

T. Liu, C. Curtsinger, and E. D. Berger, "DTHREADS: Efficient Deterministic Multithreading," in Proceedings of the 22nd ACM Symposium on Operating Systems Principles, 2011.

Digital Library

[29]

T. Merrifield, and J. Eriksson, "Conversion: Multi-Version Concurrency Control for Main Memory Segments," in EuroSys, 2013.

Digital Library

[30]

M. Olszewski, J. Ansel, and S. Amarasinghe, "Kendo: efficient deterministic multithreading in software," in Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, 2009, pp. 97--108.

Digital Library

[31]

C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis, "Evaluating MapReduce for Multi-core and Multiprocessor Systems," in Proceedings of the 13th International Symposium on High Performance Computer Architecture, Washington, DC, USA, 2007, pp. 13--24.

Digital Library

[32]

D. Subhraveti and J. Nieh, "Record and transplay: partial checkpointing for replay debugging across heterogeneous systems," in SIGMETRICS 2011, pp. 109--120.

Digital Library

[33]

K. Veeraraghavan, D. Lee, B. Wester, J. Ouyang, P. M. Chen, J. Flinn, et al., "DoublePlay: Parallelizing Sequential Logging and Replay," in Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems, Newport Beach, California, USA, 2011.

Digital Library

[34]

V. M. Weaver and S. A. McKee, "Can hardware performance counters be trusted?," in IISWC, 2008, pp. 141--150.

[35]

S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, "The SPLASH-2 programs: Characterization and methodological considerations," in Proceedings of the 22nd annual international symposium on Computer architecture, 1995, pp. 24--36.

Digital Library

[36]

W. Xiong, S. Park, J. Zhang, Y. Zhou, and Z. Ma, "Ad hoc synchronization considered harmful," in Proceedings of the 9th USENIX conference on Operating Systems Design and Implementation, 2010, pp. 163--176.

Digital Library

[37]

X. Zhou, K. Lu, X. Wang, and X. Li, "Exploiting parallelism in deterministic shared memory multiprocessing," J. Parallel Distrib. Comput., pp. 72(2012)716--727, 2012.

Digital Library

Cited By

Basu KHussain SGupta UKarri R(2020)COPPTCHA: COPPA Tracking by Checking Hardware-Level ActivityIEEE Transactions on Information Forensics and Security10.1109/TIFS.2020.298328715(3213-3226)Online publication date: 2020
https://doi.org/10.1109/TIFS.2020.2983287
Das SWerner JAntonakakis MPolychronakis MMonrose F(2019)SoK: The Challenges, Pitfalls, and Perils of Using Hardware Performance Counters for Security2019 IEEE Symposium on Security and Privacy (SP)10.1109/SP.2019.00021(20-38)Online publication date: May-2019
https://doi.org/10.1109/SP.2019.00021
Lau P(2019)Static Detection of Event-Driven Races in HTML5-Based Mobile AppsVerification and Evaluation of Computer and Communication Systems10.1007/978-3-030-35092-5_3(32-46)Online publication date: 12-Nov-2019
https://doi.org/10.1007/978-3-030-35092-5_3
Show More Cited By

Index Terms

Efficient deterministic multithreading without global barriers
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language types
        Parallel programming languages

Recommendations

Efficient deterministic multithreading without global barriers
PPoPP '14: Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

Multithreaded programs execute nondeterministically on conventional architectures and operating systems. This complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the output of a multithreaded program depend ...
Efficient deterministic multithreading through schedule relaxation
SOSP '11: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Deterministic multithreading (DMT) eliminates many pernicious software problems caused by nondeterminism. It works by constraining a program to repeat the same thread interleavings, or schedules, when given same input. Despite much recent research, it ...
What is the cost of weak determinism?
PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

We analyze the fundamental performance impact of enforcing a fixed order of synchronization operations to achieve weak deterministic execution. Our analysis is in three parts, performed on a real system using the SPLASH-2 and PARSEC benchmarks. First, ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 49, Issue 8

PPoPP '14

August 2014

390 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2692916

Editors:
Mark W. Bailey
Hamilton College, Clinton, NY
,
Rajeev Balasubramonian
University of Utah
,
Al Davis
University of Utah
,
Sarita Adve
University of Illinois at Urbana-Champ

Issue’s Table of Contents

PPoPP '14: Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2014
412 pages
ISBN:9781450326568
DOI:10.1145/2555243
General Chair:
José Moreira
IBM Research, USA
,
Program Chair:
James Larus
EPFL, Switzerland

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 February 2014

Published in SIGPLAN Volume 49, Issue 8

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

38
Total Citations
View Citations
578
Total Downloads

Downloads (Last 12 months)27
Downloads (Last 6 weeks)1

Reflects downloads up to 13 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Basu KHussain SGupta UKarri R(2020)COPPTCHA: COPPA Tracking by Checking Hardware-Level ActivityIEEE Transactions on Information Forensics and Security10.1109/TIFS.2020.298328715(3213-3226)Online publication date: 2020
https://doi.org/10.1109/TIFS.2020.2983287
Das SWerner JAntonakakis MPolychronakis MMonrose F(2019)SoK: The Challenges, Pitfalls, and Perils of Using Hardware Performance Counters for Security2019 IEEE Symposium on Security and Privacy (SP)10.1109/SP.2019.00021(20-38)Online publication date: May-2019
https://doi.org/10.1109/SP.2019.00021
Lau P(2019)Static Detection of Event-Driven Races in HTML5-Based Mobile AppsVerification and Evaluation of Computer and Communication Systems10.1007/978-3-030-35092-5_3(32-46)Online publication date: 12-Nov-2019
https://doi.org/10.1007/978-3-030-35092-5_3
Su DBao WLiu JGong C(2018)An efficient simulation of the fractional chaotic system and its synchronizationJournal of the Franklin Institute10.1016/j.jfranklin.2016.10.045355:18(9072-9084)Online publication date: Dec-2018
https://doi.org/10.1016/j.jfranklin.2016.10.045
Fang HGong CYu CMin CZhang XLiu JXiao L(2017)Efficient mesh deformation based on Cartesian background meshComputers & Mathematics with Applications10.1016/j.camwa.2016.10.02373:1(71-86)Online publication date: Jan-2017
https://doi.org/10.1016/j.camwa.2016.10.023
CHEN CLU KWANG XZHOU XWU Z(2015)A Load-Balanced Deterministic Runtime for Pipeline ParallelismIEICE Transactions on Information and Systems10.1587/transinf.2014EDL8171E98.D:2(433-436)Online publication date: 2015
https://doi.org/10.1587/transinf.2014EDL8171
Lu KZhou XWang XBergan TChen C(2015)An Efficient and Flexible Deterministic Framework for Multithreaded ProgramsJournal of Computer Science and Technology10.1007/s11390-015-1503-830:1(42-56)Online publication date: 25-Jan-2015
https://doi.org/10.1007/s11390-015-1503-8
Jiang WHe YGu HLiu KHu YLi JAi W(2024)Memento: A New Multithread Deterministic Replay Debugging Method2024 5th International Conference on Artificial Intelligence and Computer Engineering (ICAICE)10.1109/ICAICE63571.2024.10864045(310-325)Online publication date: 8-Nov-2024
https://doi.org/10.1109/ICAICE63571.2024.10864045
Zhao QQiu ZShao SHui XKhan HJin GKloeckner AMoreira J(2022)Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization DeterminismProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569669(223-238)Online publication date: 8-Oct-2022
https://dl.acm.org/doi/10.1145/3559009.3569669
Albaghajati AAhmed M(2021)CPN.Net: An Automated Colored Petri Nets Model Extraction From .Net Based Source Code2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA)10.1109/CAIDA51941.2021.9425201(245-250)Online publication date: 6-Apr-2021
https://doi.org/10.1109/CAIDA51941.2021.9425201
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents