DOI: 10.1109/ISCA.2006.33

Reducing Startup Time in Co-Designed Virtual Machines

Published: 01 May 2006

Abstract

A Co-Designed Virtual Machine allows designers to implement a processor via a combination of hardware and software. Dynamic binary translation converts code written for a conventional (legacy) ISA into optimized code for an underlying implementation-specific ISA. Because translation is done dynamically, an important consideration in such systems is the startup time for performing the initial translations. Beginning with a previously proposed co-designed VM that implements the x86 ISA, we study runtime binary translation overhead effects. The co-designed x86 virtual machine is based on an adaptive translation system that uses a basic block translator for initial emulation and a superblock translator for hotspot optimization. We analyze and model VM startup performance via simulation. We observe that non-hotspot emulation via basic block translation is the major part of the startup overhead. To reduce startup translation overhead, we follow the co-designed hardware / software philosophy and propose hardware assists to dramatically accelerate basic block translations. By combining hardware assists with balanced translation strategies, the co-designed translation system reduces runtime overhead significantly and demonstrates very competitive startup performance when compared with conventional processors running a set of Windows application benchmarks.
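The two-stage translation flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hotspot threshold, the per-block translation costs, and the names `StagedTranslator` and `run_trace` are all hypothetical, chosen only to show how a cheap basic block translator handles first-time code while a costlier superblock translator is reserved for blocks that become hot.

```python
from dataclasses import dataclass, field

# All constants are illustrative assumptions, not values from the paper.
HOT_THRESHOLD = 3   # executions before a block is treated as a hotspot
BBT_COST = 10       # hypothetical cost of one basic block translation
SBT_COST = 100      # hypothetical cost of one superblock optimization


@dataclass
class StagedTranslator:
    exec_counts: dict = field(default_factory=dict)  # block address -> executions
    code_cache: dict = field(default_factory=dict)   # block address -> "bbt" or "sbt"
    translation_overhead: int = 0                    # accumulated translation cost

    def execute(self, block_addr: int) -> str:
        """Run one block, translating or re-optimizing it first if needed."""
        if block_addr not in self.code_cache:
            # First encounter: pay the cheap basic block translation cost.
            self.code_cache[block_addr] = "bbt"
            self.translation_overhead += BBT_COST

        count = self.exec_counts.get(block_addr, 0) + 1
        self.exec_counts[block_addr] = count

        if count == HOT_THRESHOLD and self.code_cache[block_addr] == "bbt":
            # The block became hot: pay the higher optimization cost once.
            self.code_cache[block_addr] = "sbt"
            self.translation_overhead += SBT_COST

        return self.code_cache[block_addr]


def run_trace(trace):
    """Feed a sequence of block addresses through a fresh translator."""
    vm = StagedTranslator()
    for addr in trace:
        vm.execute(addr)
    return vm
```

In this toy model, a trace that touches many blocks only once or twice accumulates nothing but basic block translation cost, which mirrors the abstract's observation that non-hotspot emulation accounts for the major part of startup overhead.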

Published In

ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture
June 2006
383 pages
ISBN: 076952608X
  • ACM SIGARCH Computer Architecture News, Volume 34, Issue 2
    May 2006
    383 pages
    ISSN: 0163-5964
    DOI: 10.1145/1150019

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

ISCA06

Acceptance Rates

ISCA '06 Paper Acceptance Rate 31 of 234 submissions, 13%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%
