Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Efficient deterministic multithreading without global barriers

Published: 06 February 2014 Publication History

Abstract

Multithreaded programs execute nondeterministically on conventional architectures and operating systems. This complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the output of a multithreaded program depend on its inputs only, which can totally solve the above problem. However, current DMT implementations suffer from a common inefficiency: they use frequent global barriers to enforce a deterministic ordering on memory accesses. In this paper, we eliminate that inefficiency using an execution model we call deterministic lazy release consistency (DLRC). Our execution model uses the Kendo algorithm to enforce a deterministic ordering on synchronization, and it uses a deterministic version of the lazy release consistency memory model to propagate memory updates across threads. Our approach guarantees that programs execute deterministically even when they contain data races. We implemented a DMT system based on these ideas (RFDet) and evaluated it using 16 parallel applications. Our implementation targets C/C++ programs that use POSIX threads. Results show that RFDet gains nearly 2x speedup compared with DThreads-a start-of-the-art DMT system.

References

[1]
S. V. Adve and J. K. Aggarwal, "A Unified Formalization of Four Shared-Memory Models," IEEE Trans. Parallel Distrib. Syst., vol. 4, pp. 613--624, 1993.
[2]
S. V. Adve and M. D. Hill, "Weak ordering--a new definition," presented at the Proceedings of the 17th annual international symposium on Computer Architecture, Seattle, Washington, USA, 1990.
[3]
A. Amittai, W. Shu-Chun, H. Sen, and F. Bryan, "Efficient system-enforced deterministic parallelism," presented at the Proceedings of the 9th USENIX conference on Operating systems design and implementation, Vancouver, BC, Canada, 2010.
[4]
T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman, "CoreDet: a compiler and runtime system for deterministic multithreaded execution," presented at the Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, Pittsburgh, Pennsylvania, USA, 2010.
[5]
T. Bergan, L. Ceze, and D. Grossman, "Input-Covering Schedules for Multithreaded Programs," in Proceedings of the Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA), Indianapolis, Indiana, USA, 2013.
[6]
T. Bergan, J. Devietti, N. Hunt, and L. Ceze, "The Deterministic Execution Hammer: How Well Does it Actually Pound Nails?," in WoDET, 2011.
[7]
T. Bergan, N. Hunt, L. Ceze, and S. D. Gribble, "Deterministic process groups in dOS," in Proceedings of the 9th USENIX conference on Operating systems design and implementation, 2010.
[8]
E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson. "Hoard: A scalable memory allocator for multithreaded applications," in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), pages 117--128, Cambridge, MA, Nov. 2000.
[9]
E. D. Berger, T. Yang, T. Liu, and G. Novark, "Grace: Safe multithreaded programming for C/C+," in OOPSLA, 2009, pp. 81--96.
[10]
C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The PARSEC Benchmark Suite: Characterization and Architectural Implications," in Proceedings of the 17th international conference on Parallel architectures and compilation techniques, 2008.
[11]
R. L. Bocchino Jr, V. S. Adve, S. V. Adve, and M. Snir, "Parallel programming must be deterministic by default," in Proceedings of the First USENIX conference on Hot topics in parallelism, 2009, pp. 4--4.
[12]
H.-J. Boehm, "Position Paper: Nondeterminism is Unavoidable, but Data Races are Pure Evil," in Proceedsing of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability (RACES), 2012.
[13]
H.-J. Boehm and S. V. Adve, "Foundations of the C+ concurrency memory model," presented at the Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, Tucson, AZ, USA, 2008.
[14]
H. Cui, J. Simsa, H. Li, B. Blum, X. Xu, J. Yang, G. A. Gibson, and R. E. Bryant, "Parrot: A Practical Runtime for Deterministic, Stable, and Reliable Threads," in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, Farmington, PA, USA, 2013.
[15]
H. Cui, J. Wu, and J. Yang, "Stable deterministic multithreading through schedule memoization," in Proceedings of the 9th USENIX conference on Operating systems design and implementation, 2010.
[16]
H. Cui, J. Wu, J. Gallagher, H. Guo, and J. Yang, "Efficient Deterministic Multithreading through Schedule Relaxation," in Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, Cascais, Portugal, 2011.
[17]
J. Devietti, B. Lucia, L. Ceze, M. Oskin, "DMP: deterministic shared memory multiprocessing,"presented at the Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, Washington, DC, USA, 2009.
[18]
J. Devietti, J. Nelson, T. Bergan, L. Ceze, and D. Grossman, "RCDC: a relaxed consistency deterministic computer," in Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems, Newport Beach, California, USA, 2011, pp. 67--78.
[19]
C. J. Fidge., "Partial orders for parallel debugging," in ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, January 1989, pp. 24(1): 183--194.
[20]
M. Hill and M. Xu. Racey: A Stress Test for Deterministic Execution. Available: http://www.cs.wisc.edu/~markhill/racey.html
[21]
D. R. Hower, P. Dudnik, M. D. Hill, and D. A. Wood, "Calvin: Deterministic or not? Free will to choose," in High Performance Computer Architecture (HPCA), 2011, pp. 333--334.
[22]
B. Kasikci, C. Zamfir, and G. Candea. "Data Races vs. Data Race Bugs: Telling the Difference with Portend," in Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2012.
[23]
P. Keleher, A. L. Cox, and W. Zwaenepoel, "Lazy release consistency for software distributed shared memory," SIGARCH Comput. Archit. News, vol. 20, pp. 13--21, 1992.
[24]
P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel, "TreadMarks: distributed shared memory on standard workstations and operating systems," presented at the Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference, San Francisco, California, 1994.
[25]
T. J. LeBlanc and J. M. Mellor-Crummey, "Debugging parallel programs with instant replay," Computers, IEEE Transactions on, vol. 100, pp. 471--482, 1987.
[26]
E. A. Lee, "The problem with threads," Computer, vol. 39, pp. 33--42, 2006.
[27]
D. Lee, P. M. Chen, J. Flinn, and S. Narayanasamy, "Chimera: Hybrid Program Analysis for Determinism," presented at the Proceedings of the 2012 ACM SIGPLAN conference on Programming language design and implementation, Beijing, China, 2012.
[28]
T. Liu, C. Curtsinger, and E. D. Berger, "DTHREADS: Efficient Deterministic Multithreading," in Proceedings of the 22nd ACM Symposium on Operating Systems Principles, 2011.
[29]
T. Merrifield, and J. Eriksson, "Conversion: Multi-Version Concurrency Control for Main Memory Segments," in EuroSys, 2013.
[30]
M. Olszewski, J. Ansel, and S. Amarasinghe, "Kendo: efficient deterministic multithreading in software," in Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, 2009, pp. 97--108.
[31]
C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis, "Evaluating MapReduce for Multi-core and Multiprocessor Systems," in Proceedings of the 13th International Symposium on High Performance Computer Architecture, Washington, DC, USA, 2007, pp. 13--24.
[32]
D. Subhraveti and J. Nieh, "Record and transplay: partial checkpointing for replay debugging across heterogeneous systems," in SIGMETRICS 2011, pp. 109--120.
[33]
K. Veeraraghavan, D. Lee, B. Wester, J. Ouyang, P. M. Chen, J. Flinn, et al., "DoublePlay: Parallelizing Sequential Logging and Replay," in Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems, Newport Beach, California, USA, 2011.
[34]
V. M. Weaver and S. A. McKee, "Can hardware performance counters be trusted?," in IISWC, 2008, pp. 141--150.
[35]
S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, "The SPLASH-2 programs: Characterization and methodological considerations," in Proceedings of the 22nd annual international symposium on Computer architecture, 1995, pp. 24--36.
[36]
W. Xiong, S. Park, J. Zhang, Y. Zhou, and Z. Ma, "Ad hoc synchronization considered harmful," in Proceedings of the 9th USENIX conference on Operating Systems Design and Implementation, 2010, pp. 163--176.
[37]
X. Zhou, K. Lu, X. Wang, and X. Li, "Exploiting parallelism in deterministic shared memory multiprocessing," J. Parallel Distrib. Comput., pp. 72(2012)716--727, 2012.

Cited By

View all
  • (2020)COPPTCHA: COPPA Tracking by Checking Hardware-Level ActivityIEEE Transactions on Information Forensics and Security10.1109/TIFS.2020.298328715(3213-3226)Online publication date: 2020
  • (2019)SoK: The Challenges, Pitfalls, and Perils of Using Hardware Performance Counters for Security2019 IEEE Symposium on Security and Privacy (SP)10.1109/SP.2019.00021(20-38)Online publication date: May-2019
  • (2019)Static Detection of Event-Driven Races in HTML5-Based Mobile AppsVerification and Evaluation of Computer and Communication Systems10.1007/978-3-030-35092-5_3(32-46)Online publication date: 12-Nov-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 49, Issue 8
PPoPP '14
August 2014
390 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2692916
Issue’s Table of Contents
  • cover image ACM Conferences
    PPoPP '14: Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
    February 2014
    412 pages
    ISBN:9781450326568
    DOI:10.1145/2555243
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 February 2014
Published in SIGPLAN Volume 49, Issue 8

Check for updates

Author Tags

  1. deterministic execution
  2. lazy release consistency
  3. multithreading

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)27
  • Downloads (Last 6 weeks)1
Reflects downloads up to 13 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2020)COPPTCHA: COPPA Tracking by Checking Hardware-Level ActivityIEEE Transactions on Information Forensics and Security10.1109/TIFS.2020.298328715(3213-3226)Online publication date: 2020
  • (2019)SoK: The Challenges, Pitfalls, and Perils of Using Hardware Performance Counters for Security2019 IEEE Symposium on Security and Privacy (SP)10.1109/SP.2019.00021(20-38)Online publication date: May-2019
  • (2019)Static Detection of Event-Driven Races in HTML5-Based Mobile AppsVerification and Evaluation of Computer and Communication Systems10.1007/978-3-030-35092-5_3(32-46)Online publication date: 12-Nov-2019
  • (2018)An efficient simulation of the fractional chaotic system and its synchronizationJournal of the Franklin Institute10.1016/j.jfranklin.2016.10.045355:18(9072-9084)Online publication date: Dec-2018
  • (2017)Efficient mesh deformation based on Cartesian background meshComputers & Mathematics with Applications10.1016/j.camwa.2016.10.02373:1(71-86)Online publication date: Jan-2017
  • (2015)A Load-Balanced Deterministic Runtime for Pipeline ParallelismIEICE Transactions on Information and Systems10.1587/transinf.2014EDL8171E98.D:2(433-436)Online publication date: 2015
  • (2015)An Efficient and Flexible Deterministic Framework for Multithreaded ProgramsJournal of Computer Science and Technology10.1007/s11390-015-1503-830:1(42-56)Online publication date: 25-Jan-2015
  • (2024)Memento: A New Multithread Deterministic Replay Debugging Method2024 5th International Conference on Artificial Intelligence and Computer Engineering (ICAICE)10.1109/ICAICE63571.2024.10864045(310-325)Online publication date: 8-Nov-2024
  • (2022)Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization DeterminismProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569669(223-238)Online publication date: 8-Oct-2022
  • (2021)CPN.Net: An Automated Colored Petri Nets Model Extraction From .Net Based Source Code2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA)10.1109/CAIDA51941.2021.9425201(245-250)Online publication date: 6-Apr-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media