Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1109/MICRO.2006.20acmconferencesArticle/Chapter ViewAbstractPublication PagesmicroConference Proceedingsconference-collections
Article

Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths

Published: 09 December 2006 Publication History

Abstract

This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor (DMP). The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by dynamically predicating them without requiring ISA support for predicate registers and predicated instructions. To achieve this without incurring large hardware cost and complexity, the compiler provides control-flow information by hints and the processor dynamically predicates instructions only on frequently executed program paths. The key insight behind DMP is that most control-flow graphs look and behave like simple hammock (if-else) structures when only frequently executed paths in the graphs are considered. Therefore, DMP can dynamically predicate a much larger set of branches than simple hammock branches. This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor (DMP). The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by dynamically predicating them without requiring ISA support for predicate registers and predicated instructions. To achieve this without incurring large hardware cost and complexity, the compiler provides control-flow information by hints and the processor dynamically predicates instructions only on frequently executed program paths. The key insight behind DMP is that most control-flow graphs look and behave like simple hammock (if-else) structures when only frequently executed paths in the graphs are considered. Therefore, DMP can dynamically predicate a much larger set of branches than simple hammock branches.

References

[1]
{1} P. S. Ahuja, K. Skadron, M. Martonosi, and D. W. Clark. Multipath execution: opportunities and limits. In ICS-12, 1998.
[2]
{2} J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In POPL-10, 1983.
[3]
{3} D. I. August, D. A. Connors, J. C. Gyllenhaal, and W. W. Hwu. Architectural support for compiler-synthesized dynamic branch prediction strategies: Rationale and initial results. In HPCA-3, 1997.
[4]
{4} D. I. August, W. W. Hwu, and S. A. Mahlke. A framework for balancing control flow and predication. In MICRO-30, 1997.
[5]
{5} D. Brooks, V. Tiwari, and M. Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In ISCA-27, 2000.
[6]
{6} P.-Y. Chang, E. Hao, Y. N. Patt, and P. P. Chang. Using predicated execution to improve the performance of a dynamically scheduled machine with speculative execution. In PACT, 1995.
[7]
{7} S. Chaudhry, P. Caprioli, S. Yip, and M. Tremblay. High-performance throughput computing. IEEE Micro, 25(3):32-45, May 2005.
[8]
{8} C.-Y. Cher and T. N. Vijaykumar. Skipper: a microarchitecture for exploiting control-flow independence. In MICRO-34, 2001.
[9]
{9} Y. Choi, A. Knies, L. Gerke, and T.-F. Ngai. The impact of if-conversion and branch prediction on program execution on the Intel Itanium processor. In MICRO-34, 2001.
[10]
{10} Y. Chou, B. Fahs, and S. Abraham. Microarchitecture optimizations for exploiting memory-level parallelism. In ISCA-31, 2004.
[11]
{11} Y. Chou, J. Fung, and J. P. Shen. Reducing branch misprediction penalties via dynamic control independence detection. In ICS-13, 1999.
[12]
{12} J. D. Collins, D. M. Tullsen, and H. Wang. Control flow optimization via dynamic reconvergence prediction. In MICRO-37, 2004.
[13]
{13} A. Cristal, O. J. Santana, F. Cazorla, M. Galluzzi, T. Ramirez, M. Pericas, and M. Valero. Kilo-instruction processors: Overcoming the memory wall. IEEE Micro, 25(3):48-57, May 2005.
[14]
{14} R. Cytron, J. Ferrante, B. K. Rosen, M. N.Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems , 13(4):451-490, Oct. 1991.
[15]
{15} M. Farrens, T. Heil, J. E. Smith, and G. Tyson. Restricted dual path execution. Technical Report CSE-97-18, University of California at Davis, Nov. 1997.
[16]
{16} A. Gandhi, H. Akkary, and S. T. Srinivasan. Reducing branch misprediction penalty via selective recovery. In HPCA-10, 2004.
[17]
{17} D. Grunwald, A. Klauser, S. Manne, and A. Pleszkun. Confidence estimation for speculation control. In ISCA-25, 1998.
[18]
{18} T. Heil and J. E. Smith. Selective dual path execution. Technical report, University of Wisconsin-Madison, Nov. 1996.
[19]
{19} E. Jacobsen, E. Rotenberg, and J. E. Smith. Assigning confidence to conditional branch predictions. In MICRO-29, 1996.
[20]
{20} D. A. Jiménez and C. Lin. Dynamic branch prediction with perceptrons. In HPCA-7, 2001.
[21]
{21} H. Kim, J. A. Joao, O. Mutlu, and Y. N. Patt. Diverge-merge processor (DMP): Dynamic predicated execution of complex control-flow graphs based on frequently executed paths. Technical Report TR-HPS-2006-008, The University of Texas at Austin, Sept. 2006.
[22]
{22} H. Kim, O. Mutlu, J. Stark, and Y. N. Patt. Wish branches: Combining conditional branching and predication for adaptive predicated execution. In MICRO-38, 2005.
[23]
{23} A. Klauser, T. Austin, D. Grunwald, and B. Calder. Dynamic hammock predication for non-predicated instruction set architectures. In PACT, 1998.
[24]
{24} A. Klauser and D. Grunwald. Instruction fetch mechanisms for multipath execution processors. In MICRO-32, 1999.
[25]
{25} A. Klauser, A. Paithankar, and D. Grunwald. Selective eager execution on the polypath architecture. In ISCA-25, 1998.
[26]
{26} A. KleinOsowski and D. J. Lilja. MinneSPEC: A new SPEC benchmark workload for simulation-based computer architecture research. Computer Architecture Letters, 1, June 2002.
[27]
{27} M. S. Lam and R. P. Wilson. Limits of control flow on parallelism. In ISCA-19, 1992.
[28]
{28} S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher, and W. W. Hwu. Characterizing the impact of predicated execution on branch prediction. In MICRO-27, 1994.
[29]
{29} S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann. Effective compiler support for predicated execution using the hyperblock. In MICRO-25, 1992.
[30]
{30} O. Mutlu, J. Stark, C. Wilkerson, and Y. N. Patt. Runahead execution: An alternative to very large instruction windows for out-of-order processors. In HPCA-9, 2003.
[31]
{31} ORC. Open research compiler for Itanium processor family. http://ipforc.sourceforge.net/.
[32]
{32} J. C. H. Park and M. Schlansker. On predicated execution. Technical Report HPL-91-58, Hewlett-Packard Labs, Palo Alto CA, May 1991.
[33]
{33} D. N. Pnevmatikatos and G. S. Sohi. Guarded execution and dynamic branch prediction in dynamic ILP processors. In ISCA-21, 1994.
[34]
{34} E. M. Riseman and C. C. Foster. The inhibition of potential parallelism by conditional jumps. IEEE Transactions on Computers, C-21(12):1405- 1411, 1972.
[35]
{35} E. Rotenberg, Q. Jacobson, and J. E. Smith. A study of control independence in superscalar processors. In HPCA-5, 1999.
[36]
{36} E. Rotenberg and J. Smith. Control independence in trace processors. In MICRO-32, 1999.
[37]
{37} A. Seznec. Analysis of the O-GEometric History Length branch predictor. In ISCA-32, 2005.
[38]
{38} J. W. Sias, S. Ueng, G. A. Kent, I. M. Steiner, E. M. Nystrom, and W. W. Hwu. Field-testing IMPACT EPIC research results in Itanium 2. In ISCA- 31, 2004.
[39]
{39} B. Simon, B. Calder, and J. Ferrante. Incorporating predicate information into branch predictors. In HPCA-9, 2003.
[40]
{40} K. Skadron, P. S. Ahuja, M. Martonosi, and D. W. Clark. Branch prediction, instruction-window size, and cache size: Performance trade-offs and simulation techniques. ACM Transactions on Computer Systems, 48(11):1260-1281, Nov. 1999.
[41]
{41} E. Sprangle and D. Carmean. Increasing processor performance by implementing deeper pipelines. In ISCA-29, 2002.
[42]
{42} S. T. Srinivasan, R. Rajwar, H. Akkary, A. Gandhi, and M. Upton. Continual flow pipelines. In ASPLOS-XI, 2004.
[43]
{43} J. M. Tendler, J. S. Dodson, J. S. Fields, H. Le, and B. Sinharoy. POWER4 system microarchitecture. IBM Technical White Paper, Oct. 2001.
[44]
{44} G. S. Tyson. The effects of predication on branch prediction. In MICRO- 27, 1994.
[45]
{45} P. H. Wang, H. Wang, R. M. Kling, K. Ramakrishnan, and J. P. Shen. Register renaming and scheduling for dynamic execution of predicated code. In HPCA-7, 2001.
[46]
{46} N. J. Warter, S. A. Mahlke, W. W. Hwu, and B. R. Rau. Reverse if-conversion. In PLDI, 1993.

Cited By

View all
  • (2021)Enabling Branch-Mispredict Level Parallelism by Selectively Flushing InstructionsMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480045(767-778)Online publication date: 18-Oct-2021
  • (2015)A profiling based task scheduling approach for multicore network processorsConcurrency and Computation: Practice & Experience10.1002/cpe.284627:4(855-869)Online publication date: 25-Mar-2015
  • (2014)Efficient Out-of-Order Execution of Guarded ISAsACM Transactions on Architecture and Code Optimization10.1145/267703711:4(1-21)Online publication date: 8-Dec-2014
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
December 2006
493 pages
ISBN:0769527329

Sponsors

Publisher

IEEE Computer Society

United States

Publication History

Published: 09 December 2006

Check for updates

Qualifiers

  • Article

Conference

Micro-39
Sponsor:

Acceptance Rates

MICRO 39 Paper Acceptance Rate 42 of 174 submissions, 24%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)0
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Enabling Branch-Mispredict Level Parallelism by Selectively Flushing InstructionsMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480045(767-778)Online publication date: 18-Oct-2021
  • (2015)A profiling based task scheduling approach for multicore network processorsConcurrency and Computation: Practice & Experience10.1002/cpe.284627:4(855-869)Online publication date: 25-Mar-2015
  • (2014)Efficient Out-of-Order Execution of Guarded ISAsACM Transactions on Architecture and Code Optimization10.1145/267703711:4(1-21)Online publication date: 8-Dec-2014
  • (2012)Mixed speculative multithreaded execution modelsACM Transactions on Architecture and Code Optimization10.1145/2355585.23555919:3(1-26)Online publication date: 5-Oct-2012
  • (2012)Control-Flow DecouplingProceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture10.1109/MICRO.2012.38(329-340)Online publication date: 1-Dec-2012
  • (2008)Improving the performance of object-oriented languages with dynamic predication of indirect jumpsACM SIGPLAN Notices10.1145/1353536.134629343:3(80-90)Online publication date: 1-Mar-2008
  • (2008)Improving the performance of object-oriented languages with dynamic predication of indirect jumpsACM SIGOPS Operating Systems Review10.1145/1353535.134629342:2(80-90)Online publication date: 1-Mar-2008
  • (2008)Improving the performance of object-oriented languages with dynamic predication of indirect jumpsACM SIGARCH Computer Architecture News10.1145/1353534.134629336:1(80-90)Online publication date: 1-Mar-2008
  • (2008)Improving the performance of object-oriented languages with dynamic predication of indirect jumpsProceedings of the 13th international conference on Architectural support for programming languages and operating systems10.1145/1346281.1346293(80-90)Online publication date: 1-Mar-2008
  • (2007)GingerACM SIGARCH Computer Architecture News10.1145/1273440.125071635:2(436-447)Online publication date: 9-Jun-2007
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media