Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Improving the performance of object-oriented languages with dynamic predication of indirect jumps

Published: 01 March 2008 Publication History

Abstract

Indirect jump instructions are used to implement increasingly-common programming constructs such as virtual function calls, switch-case statements, jump tables, and interface calls. The performance impact of indirect jumps is likely to increase because indirect jumps with multiple targets are difficult to predict even with specialized hardware.
This paper proposes a new way of handling hard-to-predict indirect jumps: dynamically predicating them. The compiler (static or dynamic) identifies indirect jumps that are suitable for predication along with their control-flow merge (CFM) points. The hardware predicates theinstructions between different targets of the jump and its CFM point if the jump turns out to be hard-to-predict at run time. If the jump would actually have been mispredicted, its dynamic predication eliminates a pipeline flush, thereby improving performance.
Our evaluations show that Dynamic Indirect jump Predication (DIP) improves the performance of a set of object-oriented applications including the Java DaCapo benchmark suite by 37.8% compared to a commonly-used branch target buffer based predictor, while also reducing energy consumption by 24.8%. We compare DIP to three previously proposed indirect jump predictors and find that it provides the best performance and energy-efficiency.

Supplementary Material

JPG File (1346293.jpg)
index.html (index.html)
Slides from the presentation
ZIP File (p80-joao-slides.zip)
Supplemental material for Improving the performance of object-oriented languages with dynamic predication of indirect jumps
Audio only (1346293.mp3)
Video (1346293.mp4)

References

[1]
J.R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In POPL-10, 1983.
[2]
B. Alpern, A. Cocchi, S. Fink, D. Grove, and D. Lieber. Efficient implementation of Java interfaces: Invokeinterface considered harmless. In OOPSLA, 2001.
[3]
S. Bhansali, W.-K. Chen, S.D. Jong, A. Edwards, and M. Drinic. Framework for instruction-level tracing and analysis of programs. In VEE, 2006.
[4]
N.L. Binkert, R.G. Dreslinski, L.R. Hsu, K.T. Lim, A.G. Saidi, and S.K. Reinhardt. The M5 simulator: Modeling networked systems. IEEE Micro, 26(4):52--60, 2006.
[5]
S.M. Blackburn et al. The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA'06, 2006.
[6]
D. Brooks, V. Tiwari, and M. Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In ISCA-27, 2000.
[7]
B. Calder, D. Grunwald, and B. Zorn. Quantifying behavioral differences between C and C++ programs. Journal of Programming Languages, 2(4):323--351, 1995.
[8]
L. Cardelli and P. Wegner. On understanding types, data abstraction, and polymorphism. ACM Computing Surveys, 17(4):471--523, Dec. 1985.
[9]
P. Chang, E. Hao, and Y.N. Patt. Target prediction for indirect jumps. In ISCA, 1997.
[10]
R. Cytron et al. Efficiently computing static single assignment form and the control dependence graph. ACM TOPLAS, 13(4):451--490, Oct. 1991.
[11]
L.P. Deutsch and A.M. Schiffman. Efficient implementation of the Smalltalk-80 system. In POPL, 1984.
[12]
K. Driesen and U. Holzle. Accurate indirect branch prediction. In ISCA-25, 1998.
[13]
K. Driesen and U. Holzle. Multi-stage cascaded prediction. In Euro-Par, 1999.
[14]
M.A. Ertl and D. Gregg. Optimizing indirect branch prediction accuracy in virtual machine interpreters. In PLDI, 2003.
[15]
M. Farrens, T. Heil, J.E. Smith, and G. Tyson. Restricted dual path execution. Technical Report CSE-97-18, University of California at Davis, Nov. 1997.
[16]
S. Gochman, R. Ronen, I. Anati, A. Berkovits, T. Kurts, A. Naveh, A. Saeed, Z. Sperber, and R.C. Valentine. The Intel PentiumM processor: Microarchitecture and performance. Intel Technology Journal, 7(2), May 2003.
[17]
T. Heil and J.E. Smith. Selective dual path execution. Technical report, University of Wisconsin-Madison, Nov. 1996.
[18]
G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. The microarchitecture of the Pentium 4 processor. Intel Technology Journal, Feb. 2001.
[19]
U. Holzle, C. Chambers, and D. Ungar. Optimizing dynamically-typed object-oriented languages with polymorphic inline caches. In ECOOP, 1991.
[20]
U. Holzle and D. Ungar. Optimizing dynamically-dispatched calls with run-time type feedback. In PLDI, 1994.
[21]
Intel Corp. ICC 9.1 for Linux. http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284264.htm.
[22]
Intel Corp. Intel Core2 Duo Desktop Processor E6600. http://processorfinder.intel.com/details.aspx?sSpec=SL9ZL.
[23]
Intel Corp. Intel VTune Performance Analyzers. http://www.intel.com/vtune/.
[24]
K. Ishizaki, M. Kawahito, T. Yasue, H. Komatsu, and T. Nakatani. A study of devirtualization techniques for a java just-in-time compiler. In OOPSLA-15, 2000.
[25]
E. Jacobsen, E. Rotenberg, and J.E. Smith. Assigning confidence to conditional branch predictions. In MICRO-29, 1996.
[26]
D. Jimenez and C. Lin. Dynamic branch prediction with perceptrons. In HPCA, 2001.
[27]
J.A. Joao, O. Mutlu, H. Kim, and Y.N. Patt. Dynamic predication of indirect jumps. IEEE Computer Architecture Letters, May 2007.
[28]
J. Kalamatianos and D.R. Kaeli. Predicting indirect branches via data compression. In MICRO-31.
[29]
R.E. Kessler. The Alpha 21264 microprocessor. IEEE Micro, 19(2):24--36, 1999.
[30]
H. Kim, J.A. Joao, O. Mutlu, C.J. Lee, Y.N. Patt, and R.S. Cohn. VPC Prediction: Reducing the cost of indirect branches via hardware-based dynamic devirtualization. In ISCA-34, 2007.
[31]
H. Kim, J.A. Joao, O. Mutlu, and Y.N. Patt. Diverge-merge processor (DMP): Dynamic predicated execution of complex control-flow graphs based on frequently executed paths. In MICRO-39, 2006.
[32]
H. Kim, J.A. Joao, O. Mutlu, and Y.N. Patt. Diverge-merge processor: Generalized and energy-efficient dynamic predication. IEEE Micro, 27(1):94--104, 2007.
[33]
H. Kim, J.A. Joao, O. Mutlu, and Y.N. Patt. Profile-assisted compiler support for dynamic predication in diverge-merge processors. In CGO-5, 2007.
[34]
A. Klauser, T. Austin, D. Grunwald, and B. Calder. Dynamic hammock predication for non-predicated instruction set architectures. In PACT, 1998.
[35]
A. Klauser, A. Paithankar, and D. Grunwald. Selective eager execution on the polypath architecture. In ISCA-25, 1998.
[36]
B.R. Rau, D.W.L. Yen, W. Yen, and R.A. Towle. The Cydra 5 departmental supercomputer. IEEE Computer, 22:12--35, Jan. 1989.
[37]
E.M. Riseman and C.C. Foster. The inhibition of potential parallelism by conditional jumps. IEEE Transactions on Computers, C-21(12):1405--1411, 1972.
[38]
A. Roth, A. Moshovos, and G.S. Sohi. Improving virtual function call target prediction via dependence-based pre-computation. In ICS-13, 1999.
[39]
A. Seznec and P. Michaud. A case for (partially) TAgged GEometric history length branch prediction. JILP, Feb. 2006.
[40]
E.H. Sussenguth. Instruction Control Sequence. U.S. Patent 3559183, Jan. 26, 1971.
[41]
D. Tarditi, July 2007. Personal communication.
[42]
J. Tendler, S. Dodson, S. Fields, H. Le, and B. Sinharoy. POWER4 system microarchitecture. IBM Technical White Paper, Oct. 2001.
[43]
P.H. Wang, H. Wang, R.M. Kling, K. Ramakrishnan, and J.P. Shen. Register renaming and scheduling for dynamic execution of predicated code. In HPCA-7, 2001.
[44]
M. Wolczko. Benchmarking Java with the Richards benchmark. http://research.sun.com/people/mario/java_benchmarking/richards/richards.html.

Cited By

View all
  • (2018)Cost effective speculation with the omnipredictorProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243208(1-13)Online publication date: 1-Nov-2018
  • (2021)Characterizing Massively Parallel Polymorphism2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)10.1109/ISPASS51385.2021.00037(205-216)Online publication date: Mar-2021
  • (2019)JumpswitchesProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358832(285-299)Online publication date: 10-Jul-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 43, Issue 3
ASPLOS '08
March 2008
339 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1353536
Issue’s Table of Contents
  • cover image ACM Conferences
    ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
    March 2008
    352 pages
    ISBN:9781595939586
    DOI:10.1145/1346281
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2008
Published in SIGPLAN Volume 43, Issue 3

Check for updates

Author Tags

  1. dynamic predication
  2. indirect jumps
  3. object-oriented languages
  4. predicated execution
  5. virtual functions

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)23
  • Downloads (Last 6 weeks)1
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2018)Cost effective speculation with the omnipredictorProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243208(1-13)Online publication date: 1-Nov-2018
  • (2021)Characterizing Massively Parallel Polymorphism2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)10.1109/ISPASS51385.2021.00037(205-216)Online publication date: Mar-2021
  • (2019)JumpswitchesProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358832(285-299)Online publication date: 10-Jul-2019
  • (2018)Cost effective speculation with the omnipredictorProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243208(1-13)Online publication date: 1-Nov-2018
  • (2018)Quantitative Overhead Analysis for Python2018 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC.2018.8573512(36-47)Online publication date: Sep-2018
  • (2017)ShortCutACM SIGARCH Computer Architecture News10.1145/3140659.308023745:2(494-506)Online publication date: 24-Jun-2017
  • (2017)ShortCutProceedings of the 44th Annual International Symposium on Computer Architecture10.1145/3079856.3080237(494-506)Online publication date: 24-Jun-2017
  • (2016)YakProceedings of the 12th USENIX conference on Operating Systems Design and Implementation10.5555/3026877.3026905(349-365)Online publication date: 2-Nov-2016
  • (2015)Bungee jumpsProceedings of the 48th International Symposium on Microarchitecture10.1145/2830772.2830781(370-382)Online publication date: 5-Dec-2015
  • (2014)On-device objective-C application optimization framework for high-performance mobile processorsProceedings of the conference on Design, Automation & Test in Europe10.5555/2616606.2616711(1-6)Online publication date: 24-Mar-2014
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media