research-article
Open access

Optimizing Control Transfer and Memory Virtualization in Full System Emulators

Published: 08 December 2015

Abstract

Full system emulators provide virtual platforms for several important applications, such as kernel and system software development, co-verification with cycle-accurate CPU simulators, and application development for hardware that is not yet available. Full system emulators usually rely on dynamic binary translation to obtain reasonable performance. This paper focuses on optimizing the performance of full system emulators. First, we enable classic control transfer optimizations of dynamic binary translation, such as indirect branch target caching and block chaining, in full system emulation. Second, we speed up memory virtualization in cross-ISA virtual machines by making the software translation lookaside buffer (software TLB) more efficient. We implement our optimizations in QEMU, an industrial-strength full system emulator, and in the Android emulator. Experimental results show that our optimizations achieve an average speedup of 1.98X for ARM-to-X86-64 QEMU running the SPEC CINT2006 benchmarks with train inputs, and average speedups of 1.44X and 1.40X for IA32-to-X86-64 QEMU and AArch64-to-X86-64 QEMU, respectively, on SPEC CINT2006. For the Android emulator, we use a set of real applications downloaded from Google Play as benchmarks; on these applications, our optimizations achieve an average speedup of 1.43X.
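To illustrate the two control transfer optimizations the abstract names, here is a toy Python model of a translated-code dispatcher. It is a sketch under stated assumptions, not QEMU's or the authors' implementation: every name (`Block`, `ToyDispatcher`, `IBTC_SIZE`, the slot hash) is hypothetical. Block chaining is modeled as a link patched into a block the first time its direct successor is looked up; the indirect branch target cache (IBTC) is a small direct-mapped table indexed by the guest target PC. The model deliberately ignores full-system concerns such as invalidating links and cache entries when guest code or mappings change.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Block:
    """A translated block in a toy model: its guest PC and how it exits."""
    guest_pc: int
    exit_kind: str                # "direct" or "indirect" (hypothetical encoding)
    direct_target: int = 0        # successor guest PC when exit_kind == "direct"
    # Block-chaining link, patched on first use; repr=False avoids recursing through cycles.
    chained: Optional["Block"] = field(default=None, repr=False)


class ToyDispatcher:
    IBTC_SIZE = 64  # assumed power-of-two number of IBTC entries

    def __init__(self, code_cache: dict[int, Block]):
        self.code_cache = code_cache  # stands in for the translation-cache hash table
        self.ibtc: list[Optional[Block]] = [None] * self.IBTC_SIZE
        self.slow_lookups = 0         # trips back to the dispatcher

    def _slow_lookup(self, pc: int) -> Block:
        # In a real translator this means leaving translated code and searching a hash table.
        self.slow_lookups += 1
        return self.code_cache[pc]

    def _ibtc_lookup(self, pc: int) -> Block:
        slot = (pc >> 2) & (self.IBTC_SIZE - 1)  # assumes 4-byte-aligned guest PCs
        entry = self.ibtc[slot]
        if entry is not None and entry.guest_pc == pc:
            return entry                          # hit: go straight to the target block
        block = self._slow_lookup(pc)             # miss: fall back, then fill the slot
        self.ibtc[slot] = block
        return block

    def run(self, entry_pc: int, indirect_targets: Iterator[int], steps: int) -> list[int]:
        """Follow `steps` block exits and return the guest PCs of the blocks entered."""
        block = self._slow_lookup(entry_pc)
        trace = [block.guest_pc]
        for _ in range(steps):
            if block.exit_kind == "direct":
                if block.chained is None:         # first time: look up and patch the link
                    block.chained = self._slow_lookup(block.direct_target)
                block = block.chained             # afterwards: no dispatcher involved
            else:
                # Runtime target of the indirect branch, supplied by the caller.
                block = self._ibtc_lookup(next(indirect_targets))
            trace.append(block.guest_pc)
        return trace
```

For example, take a block at 0x1004 that jumps directly to 0x2008, which returns indirectly to either 0x1004 or 0x300C, where 0x300C jumps directly back to 0x2008. Following eight exits with indirect targets 0x1004, 0x300C, 0x1004, 0x300C enters nine blocks but makes only five slow lookups; once links and IBTC entries are warm, the remaining entries bypass the dispatcher entirely.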
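As background for the memory virtualization part, a software TLB caches recent guest-virtual-to-host address translations so that most emulated loads and stores skip the guest page-table walk. The sketch below is a plain direct-mapped software TLB in the style often used to describe QEMU's softmmu (a per-entry page tag plus an addend added to the guest address). It is the baseline structure, not the improved design this paper proposes; the 4 KiB page size, the 256-entry size, and the `page_walk` callback are assumptions for illustration.

```python
from typing import Callable, Optional

PAGE_BITS = 12                         # assumed 4 KiB guest pages
PAGE_MASK = ~((1 << PAGE_BITS) - 1)    # clears the in-page offset bits
TLB_SIZE = 256                         # assumed number of entries (power of two)


class SoftTLB:
    """Direct-mapped software TLB: guest virtual address -> host address."""

    def __init__(self, page_walk: Callable[[int], int]):
        # page_walk is a hypothetical slow path: guest virtual page -> host page base.
        self.page_walk = page_walk
        self.tags: list[Optional[int]] = [None] * TLB_SIZE  # guest page held by each entry
        self.addends = [0] * TLB_SIZE                       # host page base minus guest page
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpage = vaddr & PAGE_MASK
        index = (vaddr >> PAGE_BITS) & (TLB_SIZE - 1)
        if self.tags[index] == vpage:
            self.hits += 1                 # fast path: one compare plus one add
        else:
            self.misses += 1               # slow path: walk the guest page table, refill
            self.tags[index] = vpage
            self.addends[index] = self.page_walk(vpage) - vpage
        return vaddr + self.addends[index]

    def flush(self) -> None:
        # Needed, for example, when the guest switches page tables and entries carry no ASID.
        self.tags = [None] * TLB_SIZE
```

Because the index comes from the low bits of the page number, guest pages 0x4000 and 0x104000 (exactly 256 x 4 KiB = 1 MiB apart) share entry 4, so alternating accesses to them miss every time. Conflict misses like this, together with flushes, are part of what makes the efficiency of the software TLB matter for emulation speed.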

Cited By

  • (2023) Towards Efficient Dynamic Binary Translation Optimizations Based on RISC Architectural Features. Journal of Circuits, Systems and Computers 33:06. DOI: 10.1142/S0218126624501044. Online publication date: 26-Oct-2023.
  • (2021) BTMMU: An efficient and versatile cross-ISA memory virtualization. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 71-83. DOI: 10.1145/3453933.3454015. Online publication date: 7-Apr-2021.
  • (2019) Cross-ISA machine instrumentation using fast and scalable dynamic binary translation. Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 74-87. DOI: 10.1145/3313808.3313811. Online publication date: 14-Apr-2019.
  • (2016) Hardware-Accelerated Cross-Architecture Full-System Virtualization. ACM Transactions on Architecture and Code Optimization 13:4, 1-25. DOI: 10.1145/2996798. Online publication date: 25-Oct-2016.

Information

Published In

ACM Transactions on Architecture and Code Optimization, Volume 12, Issue 4
January 2016
848 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/2836331
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2015
Accepted: 01 October 2015
Revised: 01 October 2015
Received: 01 May 2015
Published in TACO Volume 12, Issue 4

Author Tags

  1. Control transfer optimizations
  2. Memory virtualization optimizations

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Ministry of Science and Technology of Taiwan
