DOI: 10.5555/776261.776264

Dynamic binary translation for accumulator-oriented architectures

Published: 23 March 2003

Abstract

A dynamic binary translation system for a co-designed virtual machine is described and evaluated. The underlying hardware directly executes an accumulator-oriented instruction set that exposes instruction dependence chains (strands) to a distributed microarchitecture containing a simple instruction pipeline. To support conventional program binaries, a source instruction set (Alpha in our study) is dynamically translated to the target accumulator instruction set. The binary translator identifies chains of inter-instruction dependences and assigns them to dependence-carrying accumulators. Because the underlying superscalar microarchitecture is capable of dynamic instruction scheduling, the binary translation system does not perform aggressive optimizations or re-schedule code; this significantly reduces binary translation overhead.

Detailed timing simulation of the dynamically translated code running on an accumulator-based distributed microarchitecture shows that the overall system can achieve performance similar to that of an ideal out-of-order superscalar processor, ignoring the significant clock frequency advantages that the accumulator-based hardware is likely to have. As part of the study, we evaluate an instruction set modification that simplifies precise trap implementation. This approach significantly reduces the number of instructions required for register state copying, thereby improving performance. We also observe that translation chaining methods can have a substantial impact on performance, and we evaluate a number of chaining methods.
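
To make the strand idea concrete, the following is a minimal sketch of how a translator might group a straight-line block into dependence chains and map each chain to an accumulator. It is not the algorithm from the paper: the `Instr` record, the greedy "extend the strand whose latest value you read" rule, and the first-free accumulator allocation are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """A source instruction reduced to its register dataflow (hypothetical form)."""
    dest: str               # destination register, "" if the instruction has none
    srcs: Tuple[str, ...]   # source registers


def form_strands(block: List[Instr]) -> List[int]:
    """Return one strand id per instruction of a straight-line block.

    Greedy rule (an assumption of this sketch): an instruction extends the
    strand whose most recent value it reads; otherwise it starts a new strand.
    """
    tail_of: Dict[str, int] = {}   # register -> strand whose latest value it holds
    strand_ids: List[int] = []
    next_id = 0
    for ins in block:
        sid: Optional[int] = None
        for reg in ins.srcs:
            if reg in tail_of:
                # Consume the tail so each strand stays a chain, not a tree.
                sid = tail_of.pop(reg)
                break
        if sid is None:
            sid = next_id
            next_id += 1
        if ins.dest:
            # Writing dest also replaces any older strand tail in that register.
            tail_of[ins.dest] = sid
        strand_ids.append(sid)
    return strand_ids


def assign_accumulators(strand_ids: List[int], num_acc: int) -> Dict[int, int]:
    """Map each strand to an accumulator that is free for the strand's lifetime."""
    last_use = {sid: i for i, sid in enumerate(strand_ids)}
    acc_of: Dict[int, int] = {}
    free = list(range(num_acc))
    for i, sid in enumerate(strand_ids):
        if sid not in acc_of:
            if not free:
                # Running out of accumulators is not handled in this sketch.
                raise RuntimeError("no free accumulator")
            acc_of[sid] = free.pop(0)
        if last_use[sid] == i:
            free.append(acc_of[sid])   # strand finished; accumulator is reusable
    return acc_of
```

For a block such as `r1 = r2 + r3; r4 = r1 + r5; r6 = r7 + r8; r9 = r6 + r4; store r9`, the first two instructions form one strand and the last three another. Because the first strand ends before the second begins, a single accumulator is enough under this allocation rule. Values that cross strands, such as `r4` here, are assumed to travel through ordinary registers.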

Published In

CGO '03: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
March 2003
349 pages
ISBN: 076951913X

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

CGO03
Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%
