research-article

Toward generating reducible replay logs

Published: 04 June 2011

Abstract

Logging and replay are important for reproducing software failures and recovering from them. Replaying a long execution is time consuming, especially when replay is further integrated with runtime techniques that require expensive instrumentation, such as dependence detection. In this paper, we propose a technique to reduce a replay log while retaining its ability to reproduce a failure. While traditional logging records only system calls and signals, our technique leverages the compiler to selectively collect additional information on the fly. Upon a failure, the log can be reduced by analyzing the log itself. The collection is highly optimized. The additional runtime overhead of our technique, compared to a plain logging tool, is small (2.61% on average), and the size of the additional log is comparable to that of the original log. Substantial reduction can be achieved cost-effectively through a search-based algorithm. The reduced log is guaranteed to reproduce the failure.
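The abstract does not spell out the search procedure, so the following is only a generic illustration of what search-based log reduction can look like. It is a minimal sketch in the style of delta debugging (ddmin), not the paper's algorithm, and it does not use the compiler-collected information the paper relies on. The names `reduce_log` and `toy_replay_fails`, the event strings, and the failure condition are all hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

E = TypeVar("E")


def reduce_log(log: Sequence[E], reproduces: Callable[[List[E]], bool]) -> List[E]:
    """Delta-debugging-style reduction (an assumption, not the paper's method).

    Repeatedly tries to drop contiguous chunks of the log and keeps any
    smaller log for which `reproduces` still reports the failure.
    """
    current = list(log)
    if not reproduces(current):
        raise ValueError("the full log must reproduce the failure")
    n = 2  # number of chunks the log is currently split into
    while len(current) >= 2:
        chunk = max(1, len(current) // n)
        reduced = False
        for start in range(0, len(current), chunk):
            # Candidate log with one chunk removed.
            candidate = current[:start] + current[start + chunk:]
            if candidate and reproduces(candidate):
                current = candidate
                n = max(n - 1, 2)  # coarsen again after a successful cut
                reduced = True
                break
        if not reduced:
            if chunk == 1:
                break  # no single event can be removed any more
            n = min(n * 2, len(current))  # refine the granularity
    return current


def toy_replay_fails(events: List[str]) -> bool:
    # Hypothetical oracle: the "failure" needs "open" to occur before "bad_read".
    # A real oracle would replay the candidate log and check for the failure.
    return (
        "open" in events
        and "bad_read" in events
        and events.index("open") < events.index("bad_read")
    )


log = ["getpid", "open", "stat", "write", "bad_read", "close"]
reduced = reduce_log(log, toy_replay_fails)
print(reduced)  # ['open', 'bad_read']
```

In this sketch every candidate is checked by the oracle, so the returned log reproduces the toy failure by construction. With a real replayer each oracle call is a replay run, which is why the cost of the search matters.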



Information & Contributors

Information

Published In

ACM SIGPLAN Notices, Volume 46, Issue 6
PLDI '11
June 2011
652 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1993316

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011
668 pages
ISBN: 9781450306638
DOI: 10.1145/1993498
General Chair: Mary Hall
Program Chair: David Padua

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. debugging
  2. instrumentation
  3. log reduction
  4. replay
  5. software reliability

Qualifiers

  • Research-article

Cited By

  • (2013) Guided Algebraic Specification Mining for Failure Simplification. Testing Software and Systems, pp. 223-238. DOI: 10.1007/978-3-642-41707-8_15. Online publication date: 2013
  • (2020) RSX: Reproduction Scenario Extraction Technique for Business Application Workloads in DBMS. 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 91-96. DOI: 10.1109/ISSREW51248.2020.00043. Online publication date: Oct-2020
  • (2017) Characterizing and taming non-deterministic bugs in JavaScript applications. Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pp. 1006-1009. DOI: 10.5555/3155562.3155696. Online publication date: 30-Oct-2017
  • (2017) Characterizing and taming non-deterministic bugs in Javascript applications. 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 1006-1009. DOI: 10.1109/ASE.2017.8115720. Online publication date: Oct-2017
  • (2016) Minimizing faulty executions of distributed systems. Proceedings of the 13th Usenix Conference on Networked Systems Design and Implementation, pp. 291-309. DOI: 10.5555/2930611.2930631. Online publication date: 16-Mar-2016
  • (2015) Pegasus: automatic barrier inference for stable multithreaded systems. Proceedings of the 2015 International Symposium on Software Testing and Analysis, pp. 153-164. DOI: 10.1145/2771783.2771813. Online publication date: 13-Jul-2015
  • (2015) Fast reproducing web application errors. Proceedings of the 2015 IEEE 26th International Symposium on Software Reliability Engineering (ISSRE), pp. 530-540. DOI: 10.1109/ISSRE.2015.7381845. Online publication date: 2-Nov-2015
  • (2014) Troubleshooting blackbox SDN control software with minimal causal sequences. ACM SIGCOMM Computer Communication Review 44:4, pp. 395-406. DOI: 10.1145/2740070.2626304. Online publication date: 17-Aug-2014
  • (2014) Troubleshooting blackbox SDN control software with minimal causal sequences. Proceedings of the 2014 ACM conference on SIGCOMM, pp. 395-406. DOI: 10.1145/2619239.2626304. Online publication date: 17-Aug-2014
  • (2014) DrDebug. Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 98-108. DOI: 10.1145/2581122.2544152. Online publication date: 15-Feb-2014
