Transparent, lightweight application execution replay on commodity multiprocessor operating systems

Published: 14 June 2010

Abstract

We present Scribe, the first system to provide transparent, low-overhead application record-replay and the ability to go live from replayed execution. Scribe introduces new lightweight operating system mechanisms, rendezvous and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Rendezvous points make a partial ordering of execution based on system call dependencies sufficient for replay, avoiding the recording overhead of maintaining an exact execution ordering. Sync points convert asynchronous interactions that can occur at arbitrary times into synchronous events that are much easier to record and replay.
We have implemented Scribe without changing, relinking, or recompiling applications, libraries, or operating system kernels, and without any specialized hardware support such as hardware performance counters. It works on commodity Linux operating systems, and commodity multi-core and multiprocessor hardware. Our results show for the first time that an operating system mechanism can correctly and transparently record and replay multi-process and multi-threaded applications on commodity multiprocessors. Scribe recording overhead is less than 2.5% for server applications including Apache and MySQL, and less than 15% for desktop applications including Firefox, Acrobat, OpenOffice, parallel kernel compilation, and movie playback.
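
The partial ordering that the abstract attributes to rendezvous points can be pictured with a small user-space sketch. It is illustrative only and is not Scribe's implementation, which works inside the operating system: the `Resource` class, the `record_access` and `replay_access` helpers, and the actors "A" and "B" are all hypothetical names. The assumption is that each shared object carries its own sequence counter, so replay only has to reproduce the order of accesses to the same object, not one global order of all events.

```python
import threading


class Resource:
    """Hypothetical shared object (e.g. a pipe or file) with its own counter."""

    def __init__(self, name):
        self.name = name
        self.seq = 0
        self.cond = threading.Condition()


def record_access(res, log, actor):
    """Record mode: log the per-resource sequence number seen by this access."""
    with res.cond:
        log.append((actor, res.name, res.seq))
        res.seq += 1


def replay_access(res, expected_seq, action):
    """Replay mode: wait until the resource reaches the recorded number."""
    with res.cond:
        res.cond.wait_for(lambda: res.seq == expected_seq)
        action()
        res.seq += 1
        res.cond.notify_all()


def record_demo():
    """Record a tiny run: A then B on the pipe, B then A on the file."""
    pipe, file = Resource("pipe"), Resource("file")
    log = []
    record_access(pipe, log, "A")
    record_access(pipe, log, "B")
    record_access(file, log, "B")
    record_access(file, log, "A")
    return log


def replay(log):
    """Replay the log with one thread per actor, started in reverse order."""
    resources, order, per_actor = {}, {}, {}
    for actor, name, seq in log:
        resources.setdefault(name, Resource(name))
        order.setdefault(name, [])
        per_actor.setdefault(actor, []).append((name, seq))

    def run(actor, entries):
        for name, seq in entries:
            # The action runs synchronously while the resource lock is held.
            replay_access(resources[name], seq,
                          lambda: order[name].append(actor))

    threads = [threading.Thread(target=run, args=item)
               for item in reversed(list(per_actor.items()))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

Calling `replay(record_demo())` reproduces the recorded order on each resource even though the threads start in a different order. Accesses to different resources are left unordered relative to each other, which is the sense in which a partial order can be enough.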

Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 38, Issue 1
June 2010
382 pages
ISSN: 0163-5999
DOI: 10.1145/1811099

SIGMETRICS '10: Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
June 2010
398 pages
ISBN: 9781450300384
DOI: 10.1145/1811039

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2010
Published in SIGMETRICS Volume 38, Issue 1

Author Tags

  1. debugging
  2. fault-tolerance
  3. record-replay
  4. virtualization

Qualifiers

  • Research-article

Cited By

  • (2024) Jmvx: Fast Multi-threaded Multi-version Execution and Record-Replay for Managed Languages. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (1641-1669). DOI: 10.1145/3689769. Online publication date: 8-Oct-2024
  • (2018) Debugging Nondeterministic Failures in Linux Programs through Replay Analysis. Scientific Programming 2018. DOI: 10.1155/2018/8939027. Online publication date: 1-Jan-2018
  • (2018) Replay without recording of production bugs for service oriented applications. Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (452-463). DOI: 10.1145/3238147.3238186. Online publication date: 3-Sep-2018
  • (2018) FTvNF. Proceedings of the 2018 Symposium on Architectures for Networking and Communications Systems (141-147). DOI: 10.1145/3230718.3230731. Online publication date: 23-Jul-2018
  • (2018) Exploring OS-based full-system deterministic replay. Proceedings of the 33rd Annual ACM Symposium on Applied Computing (1077-1086). DOI: 10.1145/3167132.3167247. Online publication date: 9-Apr-2018
  • (2018) Nondeterministic Event Sequence Reduction for Android Applications. 2018 5th International Conference on Dependable Systems and Their Applications (DSA) (96-101). DOI: 10.1109/DSA.2018.00026. Online publication date: Sep-2018
  • (2017) Towards Practical Default-On Multi-Core Record/Replay. ACM SIGARCH Computer Architecture News 45:1 (693-708). DOI: 10.1145/3093337.3037751. Online publication date: 4-Apr-2017
  • (2017) Towards Practical Default-On Multi-Core Record/Replay. ACM SIGPLAN Notices 52:4 (693-708). DOI: 10.1145/3093336.3037751. Online publication date: 4-Apr-2017
  • (2017) Towards Practical Default-On Multi-Core Record/Replay. Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (693-708). DOI: 10.1145/3037697.3037751. Online publication date: 4-Apr-2017
  • (2017) Deterministic Replay for Multi-Core VxWorks Applications. 2017 International Conference on Dependable Systems and Their Applications (DSA) (118-125). DOI: 10.1109/DSA.2017.27. Online publication date: Oct-2017
