Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1736020.1736031acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

Respec: efficient online multiprocessor replayvia speculation and external determinism

Published: 13 March 2010 Publication History

Abstract

Deterministic replay systems record and reproduce the execution of a hardware or software system. While it is well known how to replay uniprocessor systems, replaying shared memory multiprocessor systems at low overhead on commodity hardware is still an open problem. This paper presents Respec, a new way to support deterministic replay of shared memory multithreaded programs on commodity multiprocessor hardware. Respec targets online replay in which the recorded and replayed processes execute concurrently.
Respec uses two strategies to reduce overhead while still ensuring correctness: speculative logging and externally deterministic replay. Speculative logging optimistically logs less information about shared memory dependencies than is needed to guarantee deterministic replay, then recovers and retries if the replayed process diverges from the recorded process. Externally deterministic replay relaxes the degree to which the two executions must match by requiring only their system output and final program states match. We show that the combination of these two techniques results in low recording and replay overhead for the common case of data-race-free execution intervals and still ensures correct replay for execution intervals that have data races.
We modified the Linux kernel to implement our techniques. Our software system adds on average about 18% overhead to the execution time for recording and replaying programs with two threads and 55% overhead for programs with four threads.

References

[1]
G. Altekar and I. Stoica. ODR: Output-deterministic replay for multicore debugging. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles, October 2009.
[2]
D. F. Bacon and S. C. Goldstein. Hardware assisted replay of multiprocessor programs. In Proceedings of the 1991 ACM/ONR Workshop on Parallel and Distributed Debugging, pages 194--206. ACM Press, 1991.
[3]
S. Bhansali, W. Chen, S. de Jong, A. Edwards, and M. Drinic. Framework for instruction-level tracing and analysis of programs. In Second International Conference on Virtual Execution Environments, June 2006.
[4]
C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, October 2008.
[5]
H. J. Boehm and S. Adve. Foundations of the c concurrency memory model. In Proceedings of PLDI, pages 68--78. ACM, 2008.
[6]
B. Boothe. Efficient algorithms for bidirectional debugging. In Proceedings of the ACM SIGPLAN conference on programming language design and implementation, pages 299--310, 2000.
[7]
T. C. Bressoud and F. B. Schneider. Hypervisor-based fault tolerance. ACM Transactions on Computer Systems, 14(1):80--107, February 1996.
[8]
J. D. Choi, B. Alpern, T. Ngo, and M. Sridharan. A perturbation free replay platform for cross-optimized multithreaded applications. In Proceedings of the 15th International Parallel and Distributed Processing Symposium, April 2001.
[9]
J. Chow, T. Garfinkel, and P. M. Chen. Decoupling dynamic program analysis from execution in virtual environments. In Proceedings of the 2008 USENIX Technical Conference, pages 1--14, June 2008.
[10]
G. W. Dunlap, S. T. King, S. Cinar, M. A. Basrai, and P. M. Chen. ReVirt: Enabling intrusion analysis through virtual-machine logging and replay. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation, pages 211--224, Boston, MA, December 2002.
[11]
G. W. Dunlap, D. G. Lucchetti, M. Fetterman, and P. M. Chen. Execution replay on multiprocessor virtual machines. In Proceedings of the 2008 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), pages 121--130, March 2008.
[12]
S. I. Feldman and C. B. Brown. Igor: a system for program debugging via reversible execution. In PADD '88: Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging, pages 112--123, 1988.
[13]
K. Fraser and F. Chang. Operating system I/O speculation: How two invocations are faster than one. In Proceedings of the 2003 USENIX Technical Conference, pages 325--338, San Antonio, TX, June 2003.
[14]
A. Georges, M. Christiaens, M. Ronsse, and K. D. Bosschere. Jarec: A portable record/replay environment for multi-threaded java applications. In Software: Practice and Experience, 2004.
[15]
D. R. Hower and M. D. Hill. Rerun: Exploiting Episodes for Lightweight memory Race Recording. In Proceedings of the 2008 International Symposium on Computer Architecture, pages 265--276, June 2008.
[16]
S. T. King, G. W. Dunlap, and P. M. Chen. Debugging operating systems with time-traveling virtual machines. In Proceedings of the 2005 USENIX Technical Conference, pages 1--15, April 2005.
[17]
T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Transaction on Computers, 36(4):471--482, 1987.
[18]
D. Lee, M. Said, S. Narayanasamy, Z. J. Yang, and C. Pereira. Offline Symbolic Analysis for Multi-Processor Execution Replay. In International Symposium on Microarchitecture (MICRO), 2009.
[19]
D. E. Lowell, S. Chandra, and P. M. Chen. Exploring failure transparency and the limits of generic recovery. In Proceedings of the 4th Symposium on Operating Systems Design and Implementation, San Diego, CA, October 2000.
[20]
J. Manson, W. Pugh, and S. Adve. The java memory model. In Proceedings of POPL, pages 378--391. ACM, 2005.
[21]
P. Montesinos, L. Ceze, and J. Torrellas. DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently . In Proceedings of the 2008 International Symposium on Computer Architecture, pages 289--300, June 2008.
[22]
P. Montesinos, M. Hicks, S. T. King, and J. Torrellas. Capo: a software-hardware interface for practical deterministic multiprocessor replay. In Proceedings of the 14th International conference on Architectural support for programming languages and operating systems (ASPLOS), pages 73--84, 2009.
[23]
S. Narayanasamy, C. Pereira, and B. Calder. Recording shared memory dependencies using strata. In ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, pages 229--240, 2006.
[24]
S. Narayanasamy, C. Pereira, H. Patil, R. Cohn, and B. Calder. Automatic logging of operating system effects to guide application-level architecture simulation. In International Conference on Measurements and Modeling of Computer Systems (SIGMETRICS), June 2006.
[25]
S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, and B. Calder. Automatically classifying benign and harmful data races using replay analysis. In PLDI, June 2007.
[26]
R. H. B. Netzer. Optimal tracing and replay for debugging shared-memory parallel programs. In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, pages 1--11, 1993.
[27]
E. B. Nightingale, P. M. Chen, and J. Flinn. Speculative execution in a distributed file system. In Proceedings of the 20th ACM Symposium on Operating Systems Principles, pages 191--205, Brighton, United Kingdom, October 2005.
[28]
E. B. Nightingale, D. Peek, P. M. Chen, and J. Flinn. Parallelizing security checks on commodity hardware. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, Seattle, WA, March 2008.
[29]
E. B. Nightingale, K. Veeraraghavan, P. M. Chen, and J. Flinn. Rethink the sync. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pages 1--14, Seattle, WA, October 2006.
[30]
M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: efficient deterministic multithreading in software. In Proceedings of the 2009 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2009.
[31]
S. Osman, D. Subhraveti, G. Su, and J. Nieh. The design and implementation of Zap: A system for migrating computing environments. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation, pages 361--376, Boston, MA, December 2002.
[32]
S. Park, W. Xiong, Z. Yin, R. Kaushik, K. H. Lee, S. Lu, and Y. Zhou. Do you have to reproduce the bug at the first replay attempt? -- PRES: Probabilistic replay with execution sketching on multiprocessors. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles, October 2009.
[33]
F. Qin, J. Tucek, J. Sundaresan, and Y. Zhou. Rx: Treating bugs as allergies -- a safe method to survive software failures. In Proceedings of the 20th ACM Symposium on Operating Systems Principles, pages 235--248, Brighton, United Kingdom, October 2005.
[34]
M. Ronsse and K. D. Bosschere. RecPlay: A Full Integrated Practical Record/Replay System. ACM Transactions on Computer Systems, 17(2):133--152, May 1999.
[35]
S. Sarangi, S. Narayanasamy, B. Carneal, A. Tiwari, B. Calder, and J. Torrellas. Patching processor design errors with programmable hardware. IEEE Micro Top Picks, 27(1):12--25, 2007.
[36]
S. Srinivasan, C. Andrews, S. Kandula, and Y. Zhou. Flashback: A light-weight extension for rollback and deterministic replay for software debugging. In Proceedings of the 2004 USENIX Technical Conference, Boston, MA, June 2004.
[37]
J. Steven, P. Chandra, B. Fleck, and A. Podgurski. jrapture: A capture replay tool for observation-based testing. In Proceedings of the International Symposium on Software Testing and Analysis, 2000.
[38]
J. Tucek, S. Lu, C. Huang, S. Xanthos, and Y. Zhou. Triage: Diagnosing Production Run Failures at the User's Site. In Proceedings of the 2007 Symposium on Operating Systems Principles, pages 131--144, October 2007.
[39]
S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In Proceedings of the 22nd International Symposium on Computer Architecture, pages 24--36, June 1995.
[40]
M. Xu, R. Bodik, and M. D. Hill. A Flight Data Recorder for Enabling Full-system Multiprocessor Deterministic Replay. In Proceedings of the 2003 International Symposium on Computer Architecture, June 2003.
[41]
M. Xu, M. D. Hill, and R. Bodik. A regulated transitive reduction (RTR) for longer memory race recording. In ASPLOS--XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, pages 49--60, 2006.
[42]
M. Xu, V. Malyugin, J. Sheldon, G. Venkitachalam, and B. Weissman. ReTrace: Collecting Execution Trace with Virtual Machine Deterministic Replay. In Proceedings of the 2007 Workshop on Modeling, Benchmarking and Simulation (MoBS), June 2007.

Cited By

View all
  • (2024)Trinity: High-Performance and Reliable Mobile Emulation through Graphics ProjectionACM Transactions on Computer Systems10.1145/364302942:3-4(1-33)Online publication date: 20-Sep-2024
  • (2024)Differential Analysis for System Provenance2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00455(5649-5653)Online publication date: 13-May-2024
  • (2022)Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization DeterminismProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569669(223-238)Online publication date: 8-Oct-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
March 2010
422 pages
ISBN:9781605588391
DOI:10.1145/1736020
  • General Chair:
  • James C. Hoe,
  • Program Chair:
  • Vikram S. Adve
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 45, Issue 3
    ASPLOS '10
    March 2010
    399 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1735971
    Issue’s Table of Contents
  • cover image ACM SIGARCH Computer Architecture News
    ACM SIGARCH Computer Architecture News  Volume 38, Issue 1
    ASPLOS '10
    March 2010
    399 pages
    ISSN:0163-5964
    DOI:10.1145/1735970
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 March 2010

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. external determinism
  2. replay
  3. speculative execution

Qualifiers

  • Research-article

Conference

ASPLOS '10

Acceptance Rates

ASPLOS XV Paper Acceptance Rate 32 of 181 submissions, 18%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)31
  • Downloads (Last 6 weeks)3
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Trinity: High-Performance and Reliable Mobile Emulation through Graphics ProjectionACM Transactions on Computer Systems10.1145/364302942:3-4(1-33)Online publication date: 20-Sep-2024
  • (2024)Differential Analysis for System Provenance2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00455(5649-5653)Online publication date: 13-May-2024
  • (2022)Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization DeterminismProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569669(223-238)Online publication date: 8-Oct-2022
  • (2022)ClusterRR: a record and replay framework for virtual machine clusterProceedings of the 18th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments10.1145/3516807.3516819(31-44)Online publication date: 25-Feb-2022
  • (2021)Postmortem accurate IR-level state recovery for deployed concurrent programsACM SIGAPP Applied Computing Review10.1145/3493499.349350221:3(33-48)Online publication date: 20-Oct-2021
  • (2021)STRABProceedings of the 36th Annual ACM Symposium on Applied Computing10.1145/3412841.3442028(1532-1541)Online publication date: 22-Mar-2021
  • (2020)Improving reproducibility of data science pipelines through transparent provenance captureProceedings of the VLDB Endowment10.14778/3415478.341555613:12(3354-3368)Online publication date: 14-Sep-2020
  • (2020)Extending Coverage of Stationary Sensing Systems with Mobile Sensing Systems for Human Mobility ModelingProceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies10.1145/34118274:3(1-21)Online publication date: 4-Sep-2020
  • (2020)Semantic-aware Spatio-temporal App Usage Representation via Graph Convolutional NetworkProceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies10.1145/34118174:3(1-24)Online publication date: 4-Sep-2020
  • (2020)SmartTrack: efficient predictive race detectionProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3385993(747-762)Online publication date: 11-Jun-2020
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media