Abstractions for Practical Virtual Machine Replay

Published: 25 March 2016

Abstract

Efficient deterministic replay of whole operating systems is feasible and useful, so why isn't replay a default part of the software stack? While implementing deterministic replay is hard, we argue that the main reason is the lack of general abstractions for understanding and addressing the significant engineering challenges involved in the development of a replay engine for a modern VMM. We present a design blueprint (a set of abstractions, general principles, and low-level implementation details) for efficient deterministic replay in a modern hypervisor. We build and evaluate our architecture in Xen, a full-featured hypervisor. Our architecture can be readily followed and adopted, enabling replay as a ubiquitous part of a modern virtualization stack.



Published In

ACM SIGPLAN Notices, Volume 51, Issue 7
VEE '16
July 2016
167 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3007611
  • VEE '16: Proceedings of the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
    March 2016
    186 pages
    ISBN:9781450339476
    DOI:10.1145/2892242
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2016
Published in SIGPLAN Volume 51, Issue 7

Author Tags

  1. deterministic replay
  2. execution replay
  3. hypervisor
  4. virtualization
  5. xentt

Qualifiers

  • Research-article
