Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

DoublePlay: Parallelizing Sequential Logging and Replay

Published: 01 February 2012 Publication History

Abstract

Deterministic replay systems record and reproduce the execution of a hardware or software system. In contrast to replaying execution on uniprocessors, deterministic replay on multiprocessors is very challenging to implement efficiently because of the need to reproduce the order of or the values read by shared memory operations performed by multiple threads. In this paper, we present DoublePlay, a new way to efficiently guarantee replay on commodity multiprocessors. Our key insight is that one can use the simpler and faster mechanisms of single-processor record and replay, yet still achieve the scalability offered by multiple cores, by using an additional execution to parallelize the record and replay of an application. DoublePlay timeslices multiple threads on a single processor, then runs multiple time intervals (epochs) of the program concurrently on separate processors. This strategy, which we call uniparallelism, makes logging much easier because each epoch runs on a single processor (so threads in an epoch never simultaneously access the same memory) and different epochs operate on different copies of the memory. Thus, rather than logging the order of shared-memory accesses, we need only log the order in which threads in an epoch are timesliced on the processor. DoublePlay runs an additional execution of the program on multiple processors to generate checkpoints so that epochs run in parallel. We evaluate DoublePlay on a variety of client, server, and scientific parallel benchmarks; with spare cores, DoublePlay reduces logging overhead to an average of 15% with two worker threads and 28% with four threads.

References

[1]
Altekar, G. and Stoica, I. 2009. ODR: Output-deterministic replay for multicore debugging. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles. 193--206.
[2]
Aviram, A., Weng, S.-C., Hu, S., and Ford, B. 2010. Efficient system-enforced deterministic parallelism. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation.
[3]
Bacon, D. F. and Goldstein, S. C. 1991. Hardware assisted replay of multiprocessor programs. In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging. ACM Press, 194--206.
[4]
Bergan, T., Anderson, O., Devietti, J., Ceze, L., and Grossman, D. 2010a. Coredet: A compiler and runtime system for deterministic multithreaded execution. In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems. 53--64.
[5]
Bergan, T., Hunt, N., Ceze, L., and Gribble, S. D. 2010b. Deterministic Process Groups in dOS. In Proceedings of the Symposium on Operating Systems Design and Implementation.
[6]
Berger, E. D., Yang, T., Liu, T., and Novark, G. 2009. Grace: Safe multithreaded programming for C/C++. In Proceedings of OOPSLA. 81--96.
[7]
Bhansali, S., Chen, W., de Jong, S., Edwards, A., and Drinic, M. 2006. Framework for instruction-level tracing and analysis of programs. In Proceedings of the 2nd International Conference on Virtual Execution Environments. 154--163.
[8]
Bocchino Jr., R. L., Adve, V. S., Dig, D., Adve, S. V., Heumann, S., Komuravelli, R., Overbey, J., Simmons, P., Sung, H., and Vakilian, M. 2009. A type and effect system for deterministic parallel java. In Proceedings of OOPSLA. 97--116.
[9]
Bressoud, T. C. and Schneider, F. B. 1996. Hypervisor-based fault tolerance. ACM Trans. Comput. Syst. 14, 1, 80--107.
[10]
Choi, J. D., Alpern, B., Ngo, T., and Sridharan, M. 2001. A perturbation free replay platform for cross-optimized multithreaded applications. In Proceedings of the 15th International Parallel and Distributed Processing Symposium.
[11]
Chow, J., Garfinkel, T., and Chen, P. M. 2008. Decoupling dynamic program analysis from execution in virtual environments. In Proceedings of the USENIX Technical Conference. 1--14.
[12]
Devietti, J., Lucia, B., Ceze, L., and Oskin, M. 2009. DMP: Deterministic shared memory multiprocessing. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 85--96.
[13]
Dunlap, G. W., King, S. T., Cinar, S., Basrai, M. A., and Chen, P. M. 2002. ReVirt: Enabling intrusion analysis through virtual-machine logging and replay. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation. 211--224.
[14]
Dunlap, G. W., Lucchetti, D. G., Fetterman, M., and Chen, P. M. 2008. Execution replay on multiprocessor virtual machines. In Proceedings of the 2008 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE). 121--130.
[15]
Feldman, S. I. and Brown, C. B. 1988. IGOR: A system for program debugging via reversible execution. In Proceedings of the ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging (PADD’88). 112--123.
[16]
Hower, D. R. and Hill, M. D. 2008. Rerun: Exploiting episodes for lightweight memory race recording. In Proceedings of the International Symposium on Computer Architecture. 265--276.
[17]
Kelsey, K., Bai, T., Ding, C., and Zhang, C. 2009. Fast Track: A software system for speculative program optimization. In Proceedings of the International Symposium on Code Generation and Optimization (CGO). 157--168.
[18]
King, S. T., Dunlap, G. W., and Chen, P. M. 2005. Debugging operating systems with time-traveling virtual machines. In Proceedings of the USENIX Technical Conference. 1--15.
[19]
Laadan, O., Viennot, N., and Nieh, J. 2010. Transparent, lightweight application execution replay on commodity multiprocessor operating systems. In Proceedings of the International Conference on Measurements and Modeling of Computer Systems (SIGMETRICS). 155--166.
[20]
LeBlanc, T. J. and Mellor-Crummey, J. M. 1987. Debugging parallel programs with instant replay. IEEE Trans. Comput. 36, 4, 471--482.
[21]
Lee, D., Said, M., Narayanasamy, S., Yang, Z. J., and Pereira, C. 2009. Offline symbolic analysis for multi-processor execution replay. In Proceedings of the International Symposium on Microarchitecture (MICRO).
[22]
Lee, D., Wester, B., Veeraraghavan, K., Chen, P. M., Flinn, J., and Narayanasamy, S. 2010. Respec: Efficient online multiprocessor replay via speculation and external determinism. In Proceedings of ASPLOS. 77--89.
[23]
Lucia, B., Ceze, L., Strauss, K., Qadeer, S., and Boehm, H.-J. 2010. Conflict Exceptions: Simplifying Concurrent Language Semantics with Precise Hardware Exceptions for Data-Races. In Proceedings of the International Symposium on Computer Architecture. 210--221.
[24]
Mellor-Crummey, J. M. and LeBlanc, T. J. 1989. A Software Instruction Counter. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. 78--86.
[25]
Montesinos, P., Ceze, L., and Torrellas, J. 2008. DeLorean: Recording and deterministically replaying shared-memory multiprocessor execution efficiently. In Proceedings of the International Symposium on Computer Architecture. 289--300.
[26]
Musuvathi, M., Qadeer, S., Ball, T., Basler, G., Nainar, P. A., and Neamtiu, I. 2008. Finding and reproducing heisenbugs in concurrent programs. In Proceedings of the Symposium on Operating Systems Design and Implementation. 267--280.
[27]
Narayanasamy, S., Pokam, G., and Calder, B. 2005. BugNet: Continuously recording program execution for deterministic replay debugging. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA). 284--295.
[28]
Narayanasamy, S., Pereira, C., and Calder, B. 2006a. Recording shared memory dependencies using Strata. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems. 229--240.
[29]
Narayanasamy, S., Pereira, C., Patil, H., Cohn, R., and Calder, B. 2006b. Automatic logging of operating system effects to guide application-level architecture simulation. In Proceedings of the International Conference on Measurements and Modeling of Computer Systems (SIGMETRICS). 216--227.
[30]
Netzer, R. H. B. 1993. Optimal tracing and replay for debugging shared-memory parallel programs. In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging. 1--11.
[31]
Nightingale, E. B., Chen, P. M., and Flinn, J. 2005. Speculative execution in a distributed file system. In Proceedings of the ACM Symposium on Operating Systems Principles. 191--205.
[32]
Nightingale, E. B., Veeraraghavan, K., Chen, P. M., and Flinn, J. 2006. Rethink the sync. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation. 1--14.
[33]
Nightingale, E. B., Peek, D., Chen, P. M., and Flinn, J. 2008. Parallelizing security checks on commodity hardware. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems. 308--318.
[34]
Olszewski, M., Ansel, J., and Amarasinghe, S. 2009. Kendo: Efficient deterministic multithreading in software. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 97--108.
[35]
Oplinger, J. and Lam, M. S. 2002. Enhancing software reliability using speculative threads. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 184--196.
[36]
Park, S., Zhou, Y., Xiong, W., Yin, Z., Kaushik, R., Lee, K. H., and Lu, S. 2009. PRES: Probabilistic replay with execution sketching on multiprocessors. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles. 177--191.
[37]
Peterson, Z. N. J. and Burns, R. 2005. Ext3cow: A time-shifting file system for regulatory compliance. ACM Trans. Storage 1, 2, 190--212.
[38]
Purser, Z., Sundaramoorthy, K., and Rotenberg, E. 2000. Slipstream processors: Improving both performance and fault tolerance. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems. 257--268.
[39]
Ronsse, M. and Bosschere, K. D. 1999. RecPlay: A full integrated practical record/replay system. ACM Trans. Comput. Syst. 17, 2, 133--152.
[40]
Russinovich, M. and Cogswell, B. 1996. Replay for concurrent non-deterministic shared-memory applications. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. 258--266.
[41]
Santry, D. S., Feeley, M. J., Hutchinson, N. C., Veitch, A. C., Carton, R. W., and Ofir, J. 1999. Deciding when to forget in the Elephant file system. SIGOPS Oper. Syst. Rev. 33, 5, 110--123.
[42]
Sohi, G. S., Breach, S. E., and Vijaykumar, T. N. 1995. Multiscalar processors. In Proceedings of the International Symposium on Computer Architecture. 414--425.
[43]
Srinivasan, S., Andrews, C., Kandula, S., and Zhou, Y. 2004. Flashback: A light-weight extension for rollback and deterministic replay for software debugging. In Proceedings of the USENIX Technical Conference. 29--44.
[44]
Steffan, J. G. and Mowry, T. C. 1998. The potential for using thread-level data speculation to facilitate automatic parallelization. In Proceedings of the Symposium on High Performance Computer Architecture. 2--13.
[45]
Süßkraut, M., Knauth, T., Weigert, S., Schiffel, U., Meinhold, M., Fetzer, C., Bai, T., Ding, C., and Zhang, C. 2010. Prospect: A compiler framework for speculative parallelization. In Proceedings of the International Symposium on Code Generation and Optimization (CGO). 131--140.
[46]
Tucek, J., Lu, S., Huang, C., Xanthos, S., and Zhou, Y. 2007. Triage: Diagnosing production run failures at the user’s site. In Proceedings of the 21st ACM Symposium on Operating Systems Principles. 131--144.
[47]
Veeraraghavan, K., Flinn, J., Nightingale, E. B., and Noble, B. 2010. quFiles: The right file at the right time. In Proceedings of the 8th USENIX Conference on File and Storage Technologies. 1--14.
[48]
Veeraraghavan, K., Chen, P. M., Flinn, J., and Narayanasamy, S. 2011. Surviving and detecting data races using complementary schedules. In Proceedings of the Symposium on Operating Systems Principles (SOSP).
[49]
Vlachos, E., Goodstein, M. L., Kozuch, M. A., Chen, S., Falsafi, B., Gibbons, P. B., and Mowry, T. C. 2010. ParaLog: Enabling and accelerating online parallel monitoring of multithreaded applications. In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems. 271--284.
[50]
Weeratunge, D., Zhang, X., and Jagannathan, S. 2010. Analyzing multicore dumps to facilitate concurrency bug reproduction. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 155--166.
[51]
Woo, S. C., Ohara, M., Torrie, E., Singh, J. P., and Gupta, A. 1995. The SPLASH-2 programs: Characterization and methodological considerations. In Proceedings of the 22nd International Symposium on Computer Architecture. 24--36.
[52]
Xu, M., Bodik, R., and Hill, M. D. 2003. A “flight data recorder” for enabling full-system multiprocessor deterministic replay. In Proceedings of the International Symposium on Computer Architecture. 122--135.
[53]
Xu, M., Malyugin, V., Sheldon, J., Venkitachalam, G., and Weissman, B. 2007. ReTrace: Collecting execution trace with virtual machine deterministic replay. In Proceedings of the Workshop on Modeling, Benchmarking and Simulation (MoBS).
[54]
Zamfir, C. and Candea, G. 2010. Execution synthesis: A technique for automated software debugging. In Proceedings of the European Conference on Computer Systems (EuroSys). 321--334.
[55]
Zilles, C. and Sohi, G. 2002. Master/slave speculative parallelization. In Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO). 85--96.

Cited By

View all
  • (2023)Logging Practices in Software Engineering: A Systematic Mapping StudyIEEE Transactions on Software Engineering10.1109/TSE.2022.316692449:2(902-923)Online publication date: 1-Feb-2023
  • (2022)EZEE: Epoch Parallel Zero Knowledge for ANSI C2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P)10.1109/EuroSP53844.2022.00015(109-123)Online publication date: Jun-2022
  • (2021)Parallel shadow execution to accelerate the debugging of numerical errorsProceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3468264.3468585(615-626)Online publication date: 20-Aug-2021
  • Show More Cited By

Recommendations

Reviews

Bayard Kohlhepp

There are some computer fields, like computer forensics and security analysis, that require the replaying of interesting operational sequences. On uniprocessor systems, this is not a problem. Critical events are logged and can then be replayed as needed. But what do you do on contemporary multiprocessor systems running distributed parallel applications__?__ The standard logging approach requires single-threaded log entries, which turns multiprocessing back into uniprocessing, defeating the performance advantages of multiprocessing. The few record-and-replay solutions that have been created for parallel processing systems thus have prohibitive performance penalties as a result of such sequentialization. The DoublePlay solution presented in this paper minimizes sequentialization and is now the most efficient deterministic replay solution for multiprocessors. Overhead ranges from 20 to 100 percent, as opposed to other solutions that run as high as 1100 percent. This innovation can make record-and-replay practical for many parallel applications. Overall execution is subdivided into time slices called epochs, which are bounded by critical events. The 2D matrix created by epochs and processors is then swapped, exchanging columns for rows. This translation re-orders the epoch/processor segments so that they can be sequenced in parallel, but deterministically, preserving the order of critical events. The application is rerun following this reordered execution sequence, putting the "double" in DoublePlay. The replay log is created by this reordered run, and a deterministic sequence is then available for future replays. I found the subject matter to be complex, and the paper is relatively long at 24 pages. However, the writing, style, and presentation of the paper were excellent and made for easy reading. The paper itself did not get in the way, which happens all too frequently, even with simple subjects. As an apparent breakthrough in its field, this paper should interest anyone researching or developing deterministic replay systems. It may have broader application, though, in operating system design, fault tolerance, or even computer forensics. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Computer Systems
ACM Transactions on Computer Systems  Volume 30, Issue 1
Special Issue APLOS 2011
February 2012
137 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/2110356
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 2012
Accepted: 01 October 2011
Received: 01 August 2011
Published in TOCS Volume 30, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Deterministic replay
  2. uniparallelism

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)42
  • Downloads (Last 6 weeks)8
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Logging Practices in Software Engineering: A Systematic Mapping StudyIEEE Transactions on Software Engineering10.1109/TSE.2022.316692449:2(902-923)Online publication date: 1-Feb-2023
  • (2022)EZEE: Epoch Parallel Zero Knowledge for ANSI C2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P)10.1109/EuroSP53844.2022.00015(109-123)Online publication date: Jun-2022
  • (2021)Parallel shadow execution to accelerate the debugging of numerical errorsProceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3468264.3468585(615-626)Online publication date: 20-Aug-2021
  • (2021)RAProducer: efficiently diagnose and reproduce data race bugs for binaries via trace analysisProceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3460319.3464831(593-606)Online publication date: 11-Jul-2021
  • (2019)RENNProceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering10.1109/ASE.2019.00090(924-935)Online publication date: 10-Nov-2019
  • (2019)Efficient Methods for Trace Analysis ParallelizationInternational Journal of Parallel Programming10.1007/s10766-019-00631-4Online publication date: 9-Feb-2019
  • (2019)Causal-Consistent Replay Debugging for Message Passing ProgramsFormal Techniques for Distributed Objects, Components, and Systems10.1007/978-3-030-21759-4_10(167-184)Online publication date: 17-Jun-2019
  • (2018)Debugging Nondeterministic Failures in Linux Programs through Replay AnalysisScientific Programming10.1155/2018/89390272018Online publication date: 1-Jan-2018
  • (2018)Replay without recording of production bugs for service oriented applicationsProceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering10.1145/3238147.3238186(452-463)Online publication date: 3-Sep-2018
  • (2018)Leveraging Hardware-Assisted Virtualization for Deterministic Replay on Commodity Multi-Core ProcessorsIEEE Transactions on Computers10.1109/TC.2017.272749267:1(45-58)Online publication date: 1-Jan-2018
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media