research-article

Exploiting Program Branch Probabilities in Hardware Compilation

Authors:

Wayne LukAuthors Info & Claims

IEEE Transactions on Computers, Volume 53, Issue 11

Pages 1408 - 1419

https://doi.org/10.1109/TC.2004.96

Published: 01 November 2004 Publication History

Abstract

This paper explores using information about program branch probabilities to optimize the results of hardware compilation. The basic premise is to promote utilization by dedicating more resources to branches which execute more frequently. A new hardware compilation and flow control scheme are presented which enable the computation rate of different branches to be matched to the observed branch probabilities. We propose an analytical queuing network performance model to determine the optimal settings for basic block computation rates given a set of observed branch probabilities. An experimental hardware compilation system has been developed to evaluate this approach. The branch optimization design space is characterized in an experimental study for Xilinx Virtex FPGAs of two complex applications: video feature extraction and progressive refinement radiosity. For designs of equal performance, branch-optimized designs require 24 percent and 27.5 percent less area. For designs of equal area, branch optimized designs run up to three times faster. Our analytical performance model is shown to be highly accurate with relative error between 0.12 and 1.1\times10^{-4}.

References

[1]

Elixent d-fabrix and rap, http://www.elixent.com/products/white_papers.htm, 2003]]

[2]

Celoxica Limited, Handel-C Language Reference Manual, version 3.1, document number RM-1003-3.0, 2002.]]

[3]

“A Look into Quicksilver's Acm Architecture,” http://www.qstech.com/pdfs/a_look_into_quicksilver.pdf, 2002]]

[4]

M. Budiu and S.C. Goldstein, “Compiling Application-Specific Hardware,” Field-Programmable Logic and Applications, pp. 853-863, Springer, 2002.]]

Digital Library

[5]

W. Luk N. Shirazi and P.Y.K. Cheung, “Modelling and Optimising Run-Time Reconfigurable Systems,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 1996.]]

[6]

D. Cottet R. Enzler T. Jeger and G. Tröster, “High-Level Area and Performance Estimation of Hardware Building Blocks on FPGAs,” Field Programmable Logic and Applications, pp. 525-534, Springer, 2000.]]

Digital Library

[7]

P.J. Denning, “The Working Set Model for Program Behavior,” Proc. ACM Symp. Operating System Principles, pp. 15.1-15.12, 1967.]]

Digital Library

[8]

G. De Micheli, Synthesis and Optimization of Digital Circuits. McGraw-Hill, 1994.]]

Digital Library

[9]

A.S. Dhodapkar and J.E. Smith, “Managing Multi-Configuration Hardware via Dynamic Working Set Analysis,” Proc. 29th IEEE/ACM Int'l Symp. Computer Architecture, pp. 233-244, 2002.]]

Digital Library

[10]

A.J. Field and P.G. Harrison, “A Methodology for the Performance Modelling of Distributed Cache Coherent Multiprocessors,” The State-of-the-Art in Performance Modeling and Simulation, K. Bagchi, J.nbspWalrand, and G.W. Zobrist, eds., pp.nbsp55-92, Gordon and Breach, 1998.]]

[11]

M.J. Flynn, Computer Architecture: Pipelined and Parallel Processor Design. Sudbury, Mass.: Jones and Bartlett, 1995.]]

Digital Library

[12]

T. Harriss R. Walke B. Kienhuis and E. Deprettere, “Compilation from Matlab to Process Networks,” Design Automation for Embedded Systems, vol. 7, pp. 385-403, 2002.]]

Digital Library

[13]

P.A. Jackson J.L. Tripp and B.L. Hutchings, “Sea Cucumber: A Synthesizing Compiler for FPGAs,” Field Programmable Logic and Applications, pp. 875-885, Springer, 2002.]]

Digital Library

[14]

D. Cottet T. Jeger R. Enzler and G. Tröster, “The Performance Prediction Model,” technical report, Electronics Lab, Swiss Federal Inst. of Technology (ETH) Zurich, 2000.]]

[15]

L. Kleinrock, Queueing Systems. Volume I: Theory. John Wiley & Sons, Inc., 1975.]]

Digital Library

[16]

L. Kleinrock, Queueing Systems. Volume II: Computer Applications. John Wiley & Sons, Inc., 1976.]]

[17]

M.S. Lam R.P. Wilson R.S. French and J.L. Hennessy, “Suif: An Infrastructure for Research on Parallelizing And Optimizing Compilers,” SIGPLAN Notices, vol. 29, no. 12, pp. 31-37, 1994.]]

Digital Library

[18]

R. Leupers and P. Marwedel, Retargetable Compiler Technology for Embedded Systems: Tools and Applications, Kluwer Academic, 2001.]]

Digital Library

[19]

W. Luk T.K. Lee J.R. Rice N. Shirazi and P.Y.K. Cheung, “Reconfigurable Computing for Augmented Reality,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 1999.]]

Digital Library

[20]

X.-J. Zhang K.-W. Ng and W. Luk, “A Combined Approach to High-Level Synthesis for Dynamically Reconfigurable Systems,” Field Programmable Logic and Applications, pp. 361-370, Springer, 2000.]]

Digital Library

[21]

S. McMillan and S. Guccione, “Partial Run-Time Reconfiguration Using jrtr,” Field Programmable Logic and Applications, Springer, 2000.]]

Digital Library

[22]

K.N. McNall and A.E. Casavant, “Automatic Operator Configuration in the Synthesis of Pipelined Architectures,” Proc. 27th ACM/IEEE Design Automation Conf., pp. 174-179, 1990.]]

Digital Library

[23]

O. Mencer H Hübert M. Morf and M.J. Flynn, “StReAm: Object-Oriented Programming of Stream Architectures Using PAM-Blox,” Field-Programmable Logic: the Roadmap to Reconfigurable Systems, pp. 595-604, Springer, 2000.]]

Digital Library

[24]

I. Mitrani, Probabalistic Modelling. Cambridge Univ. Press, 1998.]]

Digital Library

[25]

T. Moller and B. Trumbore, “Fast, Minimum Storage Ray-Triangle Intersection,” J. Graphics Tools, vol. 2, no. 1, pp. 21-28, 1997.]]

Digital Library

[26]

S. Ong, et al., “Automatic Mapping of Multiple Applications to Multiple Adaptive Computing Systems,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 218-227, 2001.]]

Digital Library

[27]

U. Prabhu and B.M. Pangrle, “Global Mobility Based Scheduling,” Proc. IEEE Int'l Conf. Computer Design: VLSI in Computers and Processors, pp. 370-373, 1993.]]

[28]

R.O. Onvural, “Survey of Closed Queueing Networks with Blocking,” ACM Computing Surveys, vol. 22, no. 2, pp. 83-121, June 1990.]]

Digital Library

[29]

J. Park K.R. Shesha Shayee and P.C. Diniz, “Performance and Area Modeling of Complete FPGA Designs in the Presence of Loop Transformations,” Field Programmable Logic and Applications, Springer, 2003.]]

Digital Library

[30]

H. Schmit, “Incremental Reconfiguration for Pipeline Applications,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 1997.]]

Digital Library

[31]

S. Singh and P. James-Roxby, “Lava and jbits: From Hdl to Bitstream in Seconds,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 2001.]]

Digital Library

[32]

H. Styles and W. Luk, “Accelerating Radiosity Calculations Using Reconfigurable Platforms,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 279-281, 2002.]]

Digital Library

[33]

A.H. Veen, “Dataflow Machine Architecture,” ACM Computing Surveys, vol. 18, no. 4, pp. 365-396, 1986.]]

Digital Library

[34]

M. Weinhardt and W. Luk, “Pipeline Vectorisation,” IEEE Trans. Computer-Aided Design, vol. 20, no. 2, pp. 234-248, Feb. 2001.]]

Digital Library

[35]

J.M. Stone, et al., “Stream-Oriented Fpga Computing in the Stream-C High Level Language,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 2000.]]

Digital Library

[36]

W. Wang A. Raghunathan N.K. Jha and S. Dey, “High-Level Synthesis of Multi-Process Behavioral Descriptions,” Proc. IEEE Int'l Conf. VLSI Design, pp. 467-473, 2003.]]

Digital Library

[37]

H. Ziegler B. So M. Hall and P.C. Diniz, “Coarse-Grain Pipelining on Multiple FPGA Architectures,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 77-86, 2002.]]

Digital Library

Cited By

(2018)Using program branch probability for the thread parallelisation of branch divergence on the CUDA platformInternational Journal of Autonomous and Adaptive Communications Systems10.1504/IJAACS.2018.09203111:2(171-191)Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.1504/IJAACS.2018.092031
Osborne WLuk WCoutinho JMencer O(2011)Energy reduction by systematic run-time reconfigurable hardware deactivationTransactions on High-Performance Embedded Architectures and Compilers IV10.5555/2172445.2172467(354-369)Online publication date: 1-Jan-2011
https://dl.acm.org/doi/10.5555/2172445.2172467

Index Terms

Exploiting Program Branch Probabilities in Hardware Compilation
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Data flow architectures
2. Hardware
  1. Electronic design automation
    1. Logic synthesis
      1. Circuit optimization
  2. Very large scale integration design
    1. Application-specific VLSI designs

Recommendations

An Asynchronous Dataflow FPGA Architecture

We discuss the design of a high-performance field programmable gate array (FPGA) architecture that efficiently prototypes asynchronous (clockless) logic. In this FPGA architecture, low-level application logic is described using asynchronous dataflow ...
Automated Custom Instruction Generation for Domain-Specific Processor Acceleration

Application-specific extensions to the computational capabilities of a processor provide an efficient mechanism to meet the growing performance and power demands of embedded applications. Hardware, in the form of new function units (or coprocessors), ...
The MOLEN Polymorphic Processor

In this paper, we present a polymorphic processor paradigm incorporating both general purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/designers, and ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Computers

IEEE Transactions on Computers Volume 53, Issue 11

November 2004

144 pages

ISSN:0018-9340

Issue’s Table of Contents

Copyright © 2004.

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 November 2004

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 20 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

(2018)Using program branch probability for the thread parallelisation of branch divergence on the CUDA platformInternational Journal of Autonomous and Adaptive Communications Systems10.1504/IJAACS.2018.09203111:2(171-191)Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.1504/IJAACS.2018.092031
Osborne WLuk WCoutinho JMencer O(2011)Energy reduction by systematic run-time reconfigurable hardware deactivationTransactions on High-Performance Embedded Architectures and Compilers IV10.5555/2172445.2172467(354-369)Online publication date: 1-Jan-2011
https://dl.acm.org/doi/10.5555/2172445.2172467

View Options

View options

Media

Figures

Other

Tables

View Issue’s Table of Contents