DOI: 10.1145/1629575.1629594
research-article

ODR: output-deterministic replay for multicore debugging

Published: 11 October 2009

Abstract

Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zero in on bugs. Unfortunately, existing replay systems for multiprocessor programs fall short: they either incur high overheads, rely on non-standard multiprocessor hardware, or fail to reliably reproduce executions. Their primary stumbling block is data races, a source of nondeterminism that must be captured if executions are to be faithfully reproduced.
In this paper, we present ODR, a software-only replay system that reproduces bugs and provides low-overhead multiprocessor recording. The key observation behind ODR is that, for debugging purposes, a replay system does not need to generate a high-fidelity replica of the original execution. Instead, it suffices to produce any execution that exhibits the same outputs as the original. Guided by this observation, ODR relaxes its fidelity guarantees to avoid the problem of reproducing data races altogether. The result is a system that replays real multiprocessor applications, such as Apache, MySQL, and the Java Virtual Machine, and provides low record-mode overhead.
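
The idea of output determinism can be illustrated with a toy sketch. The code below is not ODR's implementation or its inference technique. It is a hypothetical two-thread racy counter in which only the final output is recorded, and a brute-force search then finds some interleaving that produces the same output. All names (`interleavings`, `run`, `find_output_equivalent_schedule`, `THREAD_A`, `THREAD_B`) are assumptions made for illustration.

```python
# Toy illustration of output-deterministic replay (NOT ODR's actual algorithm).
# Two threads each perform a racy "load; store(load + 1)" on a shared counter.
# Only the program output is recorded; replay searches for ANY schedule that
# reproduces that output, without knowing how the original race resolved.

THREAD_A = [("A", "load"), ("A", "store")]
THREAD_B = [("B", "load"), ("B", "store")]


def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    if not a:
        yield tuple(b)
        return
    if not b:
        yield tuple(a)
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def run(schedule):
    """Execute the toy program under the given schedule and return its output."""
    shared = 0
    local = {}  # per-thread register holding the last loaded value
    for tid, op in schedule:
        if op == "load":
            local[tid] = shared
        else:  # "store": write back the stale-or-fresh loaded value plus one
            shared = local[tid] + 1
    return shared


def find_output_equivalent_schedule(recorded_output):
    """Return the first schedule whose output matches, or None if none does."""
    for schedule in interleavings(THREAD_A, THREAD_B):
        if run(schedule) == recorded_output:
            return schedule
    return None


if __name__ == "__main__":
    # The "original" run lost an update; only its output is kept.
    original = (("A", "load"), ("B", "load"), ("B", "store"), ("A", "store"))
    recorded = run(original)
    replayed = find_output_equivalent_schedule(recorded)
    print("recorded output:", recorded)
    print("replayed schedule:", replayed)
    print("same schedule as original?", replayed == original)
```

In this sketch the replayed schedule differs from the original one yet yields the same output, which is the relaxed guarantee the abstract describes. Exhaustive enumeration is used here only because the toy program is tiny; it should not be read as how a real system would search.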


Published In

SOSP '09: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
October 2009
346 pages
ISBN:9781605587523
DOI:10.1145/1629575
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. debugging
  2. deterministic replay
  3. inference
  4. multicore

Qualifiers

  • Research-article

Conference

SOSP09
Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%

Cited By

  • (2024) DoppelGanger++ in Action: A Database Replay System with Fast Dependency Graph Generation. Proceedings of the VLDB Endowment 17:12 (4313-4316). DOI: 10.14778/3685800.3685863. Online publication date: 1-Aug-2024
  • (2024) Demystifying the Fight Against Complexity: A Comprehensive Study of Live Debugging Activities in Production Cloud Systems. Proceedings of the 2024 ACM Symposium on Cloud Computing (341-360). DOI: 10.1145/3698038.3698568. Online publication date: 20-Nov-2024
  • (2024) Efficient Reproduction of Fault-Induced Failures in Distributed Systems with Feedback-Driven Fault Injection. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles (46-62). DOI: 10.1145/3694715.3695979. Online publication date: 4-Nov-2024
  • (2024) DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay. Proceedings of the ACM on Management of Data 2:1 (1-26). DOI: 10.1145/3639322. Online publication date: 26-Mar-2024
  • (2024) Enoki: High Velocity Linux Kernel Scheduler Development. Proceedings of the Nineteenth European Conference on Computer Systems (962-980). DOI: 10.1145/3627703.3629569. Online publication date: 22-Apr-2024
  • (2024) Zoomie: A Software-like Debugging Tool for FPGAs. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (1048-1062). DOI: 10.1145/3620666.3651356. Online publication date: 27-Apr-2024
  • (2024) Enhanced S2E for Analysis of Multi-Thread Software. Programming and Computer Software 49:S1 (S39-S44). DOI: 10.1134/S0361768823090074. Online publication date: 26-Jan-2024
  • (2023) Alligator in Vest: A Practical Failure-Diagnosis Framework via Arm Hardware Features. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (917-928). DOI: 10.1145/3597926.3598106. Online publication date: 12-Jul-2023
  • (2023) Vidi: Record Replay for Reconfigurable Hardware. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (806-820). DOI: 10.1145/3582016.3582040. Online publication date: 25-Mar-2023
  • (2022) Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization Determinism. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (223-238). DOI: 10.1145/3559009.3569669. Online publication date: 8-Oct-2022
