Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Kendo: efficient deterministic multithreading in software

Published: 07 March 2009 Publication History

Abstract

Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for parallel programmers by hindering their ability to create parallel applications with repeatable results. As a consequence, parallel applications are significantly harder to debug, test, and maintain than sequential programs.
This paper introduces Kendo: a new software-only system that provides deterministic multithreading of parallel applications. Kendo enforces a deterministic interleaving of lock acquisitions and specially declared non-protected reads through a novel dynamically load-balanced deterministic scheduling algorithm. The algorithm tracks the progress of each thread using performance counters to construct a deterministic logical time that is used to compute an interleaving of shared data accesses that is both deterministic and provides good load balancing. Kendo can run on today's commodity hardware while incurring only a modest performance cost. Experimental results on the SPLASH-2 applications yield a geometric mean overhead of only 16% when running on 4 processors. This low overhead makes it possible to benefit from Kendo even after an application is deployed. Programmers can start using Kendo today to program parallel applications that are easier to develop, debug, and test.

References

[1]
Claudio Basile, Keith Whisnant, Zbigniew Kalbarczyk, and Ravi Iyer. Loose synchronization of multithreaded replicas. pages 250--255, 2002.
[2]
R. Bocchino, V. Adve, S. Adve, and M. Snir. Parallel programming must be deterministic by default. Technical Report UIUCDCS-R-2008-3012, University of Illinois at Urbana-Champaign, 2008. URL http://dpj.cs.uiuc.edu/DPJ/Publications_files/paper.pdf.
[3]
Guang-Ien Cheng, Mingdong Feng, Charles E. Leiserson, Keith H. Randall, and Andrew F. Stark. Detecting data races in cilk programs that use locks. In proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA'98), pages 298--309, Puerto Vallarta, Mexico, June 28-July 2, 1998.
[4]
Joe Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin. Explicitly parallel programming with shared-memory is insane: At least make it deterministic! In proceedings of SHCMP 2008: Workshop on Software and Hardware Challenges of Manycore Platforms, Beijing, China, June 22, 2008.
[5]
Rachna Dhamija and Adrian Perrig. Deja vu: a user study using images for authentication. In SSYM'00: Proceedings of the 9th conference on USENIX Security Symposium, pages 4--4, Berkeley, CA, USA, 2000. USENIX Association.
[6]
Jorg Domaschka, Franz J. Hauck, Hans P. Reiser, and Rutiger Kapitza. Deterministic multithreading for java-based replicated objects. In proceedings of the International Conference on Parallel and Distributed Computing and Systems, 2006.
[7]
Jorg Domaschka, Andreas I. Schmied, Hans P. Reiser, and Franz J. Hauck. Revisiting deterministic multithreading strategies. International Parallel and Distributed Processing Symposium, pages 1--8, 2007.
[8]
George W. Dunlap, Dominic G. Lucchetti, Michael A. Fetterman, and Peter M. Chen. Execution replay of multiprocessor virtual machines. In VEE'08: Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, pages 121--130, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-796-4.
[9]
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In proceedings of the ACM SIGPLAN-98 Conference on Programming Language Design and Implementation, pages 212--223, Montreal, Quebec, Canada, June 1998. Proceedings published ACMSIGPLAN Notices, Vol. 33, No. 5, May, 1998.
[10]
Derek R. Hower and Mark D. Hill. Rerun: Exploiting episodes for lightweight memory race recording. In proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008.
[11]
Wolfgang Karl, Markus Leberecht, and Michael Oberhuber. Forcing deterministic execution of parallel programs -- debugging support through the smile monitoring approach. In proceedings of the SCI-Europe, September 1998.
[12]
Milind Kulkarni, Keshav Pingali, Ganesh Ramanarayanan, Bruce Walter, Kavita Bala, and L. Paul Chew. Optimistic parallelism benefits from data partitioning. SIGARCH Comput. Archit. News, 36(1):233--243, 2008. ISSN 0163-5964.
[13]
Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558--565, 1978. ISSN 0001-0782.
[14]
T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Trans. Comput., 36(4):471--482, 1987. ISSN 0018-9340.
[15]
Edward A. Lee. The problem with threads. Computer, 39(5):33--42, May 2006. ISSN 0018-9162.
[16]
Shan Lu, Joe Tucek, Feng Qin, and Yuanyuan Zhou. Avio: Detecting atomicity violations via access-interleaving invariants. In proceedings of the International Conference on Architecture Support for Programming Languages and Operating Systems, October 2006.
[17]
Shan Lu, Weihang Jiang, and Yuanyuan Zhou. A study of interleaving coverage criteria. In proceedings of ESEC-FSE, pages 533--536, New York, NY, USA, 2007a. ACM. ISBN 978-1-59593-811-4.
[18]
Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weihang Jiang, Zhenmin Li, Raluca A. Popa, and Yuanyuan Zhou. Muvi: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. In proceedings of the 21st ACM Symposium on Operating Systems Principles, October 2007b.
[19]
Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In ASPLOS XIII: proceedings of the 13th international conference on Architectural support for programming languages and operating systems, pages 329--339, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-958-6.
[20]
Pablo Montesinos, Luis Ceze, and Josep Torrellas. Delorean: Recording and deterministically replaying shared-memory multiprocessor execution efficiently. In proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008.
[21]
Hans P. Reiser, Jorg Domaschka, Franz J. Hauck, Rudiger Kapitza, and Wolfgang Schroder-Preikschat. Consistent replication of multithreaded distributed objects. 25th IEEE Symposium on Reliable Distributed Systems, pages 257--266, 2006. ISSN 1060-9857.
[22]
Jonathan Rose. Locusroute: a parallel global router for standard cells. In DAC-88: Proceedings of the 25th ACM/IEEE conference on Design automation, pages 189--195, Los Alamitos, CA, USA, 1988. IEEE Computer Society Press. ISBN 0-8186-8864-5.
[23]
Mark Russinovich and Bryce Cogswell. Replay for concurrent non-deterministic shared-memory applications. In proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, pages 258--266, New York, NY, USA, 1996. ACM. ISBN 0-89791-795-2.
[24]
Debashis Saha and Sourav K. Dutta. Specification of deterministic execution timing schema for parallel programs on a multiprocessor. Computer, Communication, Control and Power Engineering, pages 114--116 vol.1, 1993.
[25]
Jaswinder Pal Singh, Anoop Gupta, and Marc Levoy. Parallel visualization algorithms: Performance and architectural implications. Computer, 27(7):45--55, 1994. ISSN 0018-9162.
[26]
William Thies, Michal Karczmarek, and Saman Amarasinghe. Streamit: A language for streaming applications. In International Conference on Compiler Construction, Grenoble, France, April 2002.
[27]
Joseph Tucek, Shan Lu, Chengdu Huang, Spiros Xanthos, and Yuanyuan Zhou. Triage: diagnosing production run failures at the user's site. In proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, pages 131--144, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-591-5.
[28]
Larry D. Wittie. Debugging distributed C programs by real time reply. SIGPLAN Not., 24(1):57--67, 1989. ISSN 0362-1340.
[29]
S.C. Woo, M. Ohara, E. Torrie, J.P. Singh, and A. Gupta. The SPLASH-2 programs: characterization and methodological considerations. In proceedings of 22nd Annual International Symposium on Computer Architecture News, pages 24--36, June 1995.
[30]
Min Xu, Rastislav Bodik, and Mark D. Hill. A "flight data recorder" for enabling full-system multiprocessor deterministic replay. SIGARCH Comput. Archit. News, 31(2):122--135, 2003. ISSN 0163-5964.

Cited By

View all
  • (2022)Exploiting Concurrency in Sharded Parallel State Machine ReplicationIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.313576133:9(2133-2147)Online publication date: 1-Sep-2022
  • (2022)Early scheduling on steroids: Boosting parallel state machine replicationJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.02.001163(269-282)Online publication date: May-2022
  • (2021)Frequent background polling on a shared thread, using light-weight compiler interruptsProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454107(1249-1263)Online publication date: 19-Jun-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 44, Issue 3
ASPLOS 2009
March 2009
346 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1508284
Issue’s Table of Contents
  • cover image ACM Conferences
    ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
    March 2009
    358 pages
    ISBN:9781605584065
    DOI:10.1145/1508244
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 March 2009
Published in SIGPLAN Volume 44, Issue 3

Check for updates

Author Tags

  1. debugging
  2. determinism
  3. deterministic multithreading
  4. multicore
  5. parallel programming

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)46
  • Downloads (Last 6 weeks)6
Reflects downloads up to 24 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Exploiting Concurrency in Sharded Parallel State Machine ReplicationIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.313576133:9(2133-2147)Online publication date: 1-Sep-2022
  • (2022)Early scheduling on steroids: Boosting parallel state machine replicationJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.02.001163(269-282)Online publication date: May-2022
  • (2021)Frequent background polling on a shared thread, using light-weight compiler interruptsProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454107(1249-1263)Online publication date: 19-Jun-2021
  • (2021)CPN.Net: An Automated Colored Petri Nets Model Extraction From .Net Based Source Code2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA)10.1109/CAIDA51941.2021.9425201(245-250)Online publication date: 6-Apr-2021
  • (2021)Exploring Domain-Specific Architectures for Energy-Efficient Wearable ComputingJournal of Signal Processing Systems10.1007/s11265-021-01682-y94:6(559-577)Online publication date: 24-Jul-2021
  • (2020)Parallel State Machine Replication from Generalized Consensus2020 International Symposium on Reliable Distributed Systems (SRDS)10.1109/SRDS51746.2020.00021(133-142)Online publication date: Sep-2020
  • (2020)Dynamic Analysis Method for Concurrency Bugs in Multi-process/Multi-thread EnvironmentsInternational Journal of Parallel Programming10.1007/s10766-020-00661-348:6(1032-1060)Online publication date: 1-Dec-2020
  • (2019)Distributed Management Systems for Infocommunication Networks: A Model Based on TM Forum FrameworxComputers10.3390/computers80200458:2(45)Online publication date: 4-Jun-2019
  • (2019)TapirACM Transactions on Parallel Computing10.1145/33656556:4(1-33)Online publication date: 17-Dec-2019
  • (2019)Resource Utilization Analysis of Early Scheduling in Parallel State Machine Replication2019 9th Latin-American Symposium on Dependable Computing (LADC)10.1109/LADC48089.2019.8995730(1-10)Online publication date: Nov-2019
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media