Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1250662.1250715acmconferencesArticle/Chapter ViewAbstractPublication PagesiscaConference Proceedingsconference-collections
Article

VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization

Published: 09 June 2007 Publication History

Abstract

Indirect branches have become increasingly common in modular programs written in modern object-oriented languages and virtual machine based runtime systems. Unfortunately, the prediction accuracy of indirect branches has not improved as much as that of conditional branches. Furthermore, previously proposed indirect branch predictors usually require a significant amount of extra hardware storage and complexity, which makes them less attractive to implement.
This paper proposes a new technique for handling indirect branches, called Virtual Program Counter (VPC) prediction. The key idea of VPC prediction is to treat a single indirect branch as multiple virtual conditional branches in hardware for prediction purposes. Our technique predicts each of the virtual conditional branches using the existing conditional branch prediction hardware. Thus, no separate storage structure is required for predicting indirect branch targets.
Our evaluation shows that VPC prediction improves average performance by 26.7% compared to a commonly-used branch target buffer based predictor on 12 indirect branch intensive applications. VPC prediction achieves the performance improvement provided by at least a 12KB (and usually a 192KB) tagged target cache predictor on half of the examined applications. We show that VPC prediction can be used with any existing conditional branch prediction mechanism and that the accuracy of VPC prediction improves when a more accurate conditional branch predictor is used.

References

[1]
Advanced Micro Devices, Inc. AMD Athlon(TM) XP Processor Model 10 Data Sheet, Feb. 2003.
[2]
B. Calder and D. Grunwald. Reducing indirect function call overhead in C++ programs. In POPL-21, 1994.
[3]
B. Calder, D. Grunwald, and B. Zorn. Quantifying behavioral differences between C and C++ programs. Journal of Programming Languages, 2(4):323--351, 1995.
[4]
L. Cardelli and P. Wegner. On understanding types, data abstraction, and polymorphism. ACM Computing Surveys, 17(4):471--523, Dec. 1985.
[5]
P.-Y. Chang, M. Evers, and Y. N. Patt. Improving branch prediction accuracy by reducing pattern history table interference. In PACT, 1996.
[6]
P.-Y. Chang, E. Hao, and Y. N. Patt. Target prediction for indirect jumps. In ISCA-24, 1997.
[7]
L. P. Deutsch and A. M. Schiffman. Efficient implementation of the Smalltalk-80 system. In POPL, 1984.
[8]
K. Driesen and U. Hölzle. Accurate indirect branch prediction. In ISCA-25, 1998.
[9]
K. Driesen and U. Hölzle. The cascaded predictor: Economical and adaptive branch target prediction. In MICRO-31, 1998.
[10]
K. Driesen and U. Hölzle. Multi-stage cascaded prediction. In European Conference on Parallel Processing, 1999.
[11]
M. Evers, S. J. Patel, R. S. Chappell, and Y. N. Patt. An analysis of correlation and predictability: What makes two-level branch predictors work. In ISCA-25, 1998.
[12]
The GAP Group. GAP System for Computational Discrete Algebra. http://www.gap--system.org/.
[13]
C. Garrett, J. Dean, D. Grove, and C. Chambers. Measurement and application of dynamic receiver class distributions. Technical Report UW-CS 94-03-05, University of Washington, Mar. 1994.
[14]
GCC-4.0. GNU compiler collection. http://gcc.gnu.org/.
[15]
S. Gochman, R. Ronen, I. Anati, A. Berkovits, T. Kurts, A. Naveh, A. Saeed, Z. Sperber, and R. C. Valentine. The Intel Pentium M processor: Microarchitecture and performance. Intel Technology Journal, 7(2), May 2003.
[16]
D. Grove, J. Dean, C. Garrett, and C. Chambers. Profile-guided receiver class prediction. In OOPSLA-10, 1995.
[17]
A. Hartstein and T. R. Puzak. The optimum pipeline depth for a microprocessor. In ISCA-29, 2002.
[18]
G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. The microarchitecture of the Pentium 4 processor. Intel Technology Journal, Feb. 2001. Q1 2001 Issue.
[19]
U. Hölzle and D. Ungar. Optimizing dynamically-dispatched calls with run-time type feedback. In PLDI, 1994.
[20]
IBM Corporation. IBM zSeries mainframe servers. http://www.ibm.com/systems/z/.
[21]
Intel Corporation. ICC 9.1 for Linux. http://www.intel.com/cd/software/products/asmo--na/eng/compilers/284264.htm.
[22]
Intel Corporation. Intel Core Duo Processor T2500. http://processorfinder.intel.com/Details.aspx?sSpec=SL8VT.
[23]
Intel Corporation. Intel VTune Performance Analyzers. http://www.intel.com/vtune/.
[24]
K. Ishizaki, M. Kawahito, T. Yasue, H. Komatsu, and T. Nakatani. A study of devirtualization techniques for a Java Just-In-Time compiler. In OOPSLA-15, 2000.
[25]
D. A. Jiménez and C. Lin. Dynamic branch prediction with perceptrons. In HPCA-7, 2001.
[26]
D. Kaeli and P. Emma. Branch history table predictions of moving target branches due to subroutine returns. In ISCA-18, 1991.
[27]
J. Kalamatianos and D. R. Kaeli. Predicting indirect branches via data compression. In MICRO-31, 1998.
[28]
R. E. Kessler. The Alpha 21264 microprocessor. IEEE Micro, 19(2):24-36, 1999.
[29]
H. Kim, J. A. Joao, O. Mutlu, C. J. Lee, Y. N. Patt, and R. Cohn. VPC prediction: Reducing the cost of indirect branches via hardware-based dynamic devirtualization. Technical Report TR-HPS-2007-001, The University of Texas at Austin, Mar. 2007.
[30]
J. K. F. Lee and A. J. Smith. Branch prediction strategies and branch target buffer design. IEEE Computer, pages 6--22, Jan. 1984.
[31]
C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In PLDI, 2005.
[32]
P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. IEEE Computer, 35(2):50--58, Feb. 2002.
[33]
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Western Research Laboratory, June 1993.
[34]
Microsoft Research. Bartok compiler. http://research.microsoft.com/act/.
[35]
D. Novillo, Mar. 2007. Personal communication.
[36]
H. Patil, R. Cohn, M. Charney, R. Kapoor, A. Sun, and A. Karunanidhi. Pinpointing representative portions of large Intel Itanium programs with dynamic instrumentation. In MICRO-37, 2004.
[37]
A. Roth, A. Moshovos, and G. S. Sohi. Improving virtual function call target prediction via dependence-based pre-computation. In ICS-13, 1999.
[38]
D. Sehr, Nov. 2006. Personal communication.
[39]
A. Seznec. Analysis of the O-GEometric History Length branch predictor. In ISCA-32, 2005.
[40]
A. Seznec and P. Michaud. A case for (partially) TAgged GEometric history length branch prediction. Journal of Instruction-Level Parallelism (JILP), 8, Feb. 2006.
[41]
D. Tarditi, Nov. 2006. Personal communication.
[42]
J. Tendler, S. Dodson, S. Fields, H. Le, and B. Sinharoy. POWER4 system microarchitecture. IBM Technical White Paper, Oct. 2001.
[43]
M. Wolczko. Benchmarking Java with the Richards benchmark. http://research.sun.com/people/mario/java_benchmarking/richards/richards.html.
[44]
T.-Y. Yeh, D. Marr, and Y. N. Patt. Increasing the instruction fetch rate via multiple branch prediction and branch address cache. In ICS, 1993.

Cited By

View all
  • (2021)Judging a type by its pointer: optimizing GPU virtual functionsProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3445814.3446734(241-254)Online publication date: 19-Apr-2021
  • (2021)Characterizing Massively Parallel Polymorphism2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)10.1109/ISPASS51385.2021.00037(205-216)Online publication date: Mar-2021
  • (2019)JumpswitchesProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358832(285-299)Online publication date: 10-Jul-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
June 2007
542 pages
ISBN:9781595937063
DOI:10.1145/1250662
  • General Chair:
  • Dean Tullsen,
  • Program Chair:
  • Brad Calder
  • cover image ACM SIGARCH Computer Architecture News
    ACM SIGARCH Computer Architecture News  Volume 35, Issue 2
    May 2007
    527 pages
    ISSN:0163-5964
    DOI:10.1145/1273440
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. devirtualization
  2. indirect branch prediction
  3. virtual functions

Qualifiers

  • Article

Conference

SPAA07
Sponsor:

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Upcoming Conference

ISCA '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)22
  • Downloads (Last 6 weeks)2
Reflects downloads up to 25 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Judging a type by its pointer: optimizing GPU virtual functionsProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3445814.3446734(241-254)Online publication date: 19-Apr-2021
  • (2021)Characterizing Massively Parallel Polymorphism2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)10.1109/ISPASS51385.2021.00037(205-216)Online publication date: Mar-2021
  • (2019)JumpswitchesProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358832(285-299)Online publication date: 10-Jul-2019
  • (2019)Bit-level perceptron prediction for indirect branchesProceedings of the 46th International Symposium on Computer Architecture10.1145/3307650.3322217(27-38)Online publication date: 22-Jun-2019
  • (2018)Quantitative Overhead Analysis for Python2018 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC.2018.8573512(36-47)Online publication date: Sep-2018
  • (2017)ShortCutACM SIGARCH Computer Architecture News10.1145/3140659.308023745:2(494-506)Online publication date: 24-Jun-2017
  • (2017)A Schur Complement Preconditioner for Scalable Parallel Fluid SimulationACM Transactions on Graphics10.1145/309281836:5(1-11)Online publication date: 25-Jul-2017
  • (2017)ShortCutProceedings of the 44th Annual International Symposium on Computer Architecture10.1145/3079856.3080237(494-506)Online publication date: 24-Jun-2017
  • (2017)Efficient Solver for Spacetime Control of SmokeACM Transactions on Graphics10.1145/301696336:5(1-13)Online publication date: 25-Jul-2017
  • (2016)YakProceedings of the 12th USENIX conference on Operating Systems Design and Implementation10.5555/3026877.3026905(349-365)Online publication date: 2-Nov-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media