Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Exploiting Program Branch Probabilities in Hardware Compilation

Published: 01 November 2004 Publication History

Abstract

This paper explores using information about program branch probabilities to optimize the results of hardware compilation. The basic premise is to promote utilization by dedicating more resources to branches which execute more frequently. A new hardware compilation and flow control scheme are presented which enable the computation rate of different branches to be matched to the observed branch probabilities. We propose an analytical queuing network performance model to determine the optimal settings for basic block computation rates given a set of observed branch probabilities. An experimental hardware compilation system has been developed to evaluate this approach. The branch optimization design space is characterized in an experimental study for Xilinx Virtex FPGAs of two complex applications: video feature extraction and progressive refinement radiosity. For designs of equal performance, branch-optimized designs require 24 percent and 27.5 percent less area. For designs of equal area, branch optimized designs run up to three times faster. Our analytical performance model is shown to be highly accurate with relative error between 0.12 and 1.1\times10^{-4}.

References

[1]
Elixent d-fabrix and rap, http://www.elixent.com/products/white_papers.htm, 2003]]
[2]
Celoxica Limited, Handel-C Language Reference Manual, version 3.1, document number RM-1003-3.0, 2002.]]
[3]
“A Look into Quicksilver's Acm Architecture,” http://www.qstech.com/pdfs/a_look_into_quicksilver.pdf, 2002]]
[4]
M. Budiu and S.C. Goldstein, “Compiling Application-Specific Hardware,” Field-Programmable Logic and Applications, pp. 853-863, Springer, 2002.]]
[5]
W. Luk N. Shirazi and P.Y.K. Cheung, “Modelling and Optimising Run-Time Reconfigurable Systems,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 1996.]]
[6]
D. Cottet R. Enzler T. Jeger and G. Tröster, “High-Level Area and Performance Estimation of Hardware Building Blocks on FPGAs,” Field Programmable Logic and Applications, pp. 525-534, Springer, 2000.]]
[7]
P.J. Denning, “The Working Set Model for Program Behavior,” Proc. ACM Symp. Operating System Principles, pp. 15.1-15.12, 1967.]]
[8]
G. De Micheli, Synthesis and Optimization of Digital Circuits. McGraw-Hill, 1994.]]
[9]
A.S. Dhodapkar and J.E. Smith, “Managing Multi-Configuration Hardware via Dynamic Working Set Analysis,” Proc. 29th IEEE/ACM Int'l Symp. Computer Architecture, pp. 233-244, 2002.]]
[10]
A.J. Field and P.G. Harrison, “A Methodology for the Performance Modelling of Distributed Cache Coherent Multiprocessors,” The State-of-the-Art in Performance Modeling and Simulation, K. Bagchi, J.nbspWalrand, and G.W. Zobrist, eds., pp.nbsp55-92, Gordon and Breach, 1998.]]
[11]
M.J. Flynn, Computer Architecture: Pipelined and Parallel Processor Design. Sudbury, Mass.: Jones and Bartlett, 1995.]]
[12]
T. Harriss R. Walke B. Kienhuis and E. Deprettere, “Compilation from Matlab to Process Networks,” Design Automation for Embedded Systems, vol. 7, pp. 385-403, 2002.]]
[13]
P.A. Jackson J.L. Tripp and B.L. Hutchings, “Sea Cucumber: A Synthesizing Compiler for FPGAs,” Field Programmable Logic and Applications, pp. 875-885, Springer, 2002.]]
[14]
D. Cottet T. Jeger R. Enzler and G. Tröster, “The Performance Prediction Model,” technical report, Electronics Lab, Swiss Federal Inst. of Technology (ETH) Zurich, 2000.]]
[15]
L. Kleinrock, Queueing Systems. Volume I: Theory. John Wiley & Sons, Inc., 1975.]]
[16]
L. Kleinrock, Queueing Systems. Volume II: Computer Applications. John Wiley & Sons, Inc., 1976.]]
[17]
M.S. Lam R.P. Wilson R.S. French and J.L. Hennessy, “Suif: An Infrastructure for Research on Parallelizing And Optimizing Compilers,” SIGPLAN Notices, vol. 29, no. 12, pp. 31-37, 1994.]]
[18]
R. Leupers and P. Marwedel, Retargetable Compiler Technology for Embedded Systems: Tools and Applications, Kluwer Academic, 2001.]]
[19]
W. Luk T.K. Lee J.R. Rice N. Shirazi and P.Y.K. Cheung, “Reconfigurable Computing for Augmented Reality,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 1999.]]
[20]
X.-J. Zhang K.-W. Ng and W. Luk, “A Combined Approach to High-Level Synthesis for Dynamically Reconfigurable Systems,” Field Programmable Logic and Applications, pp. 361-370, Springer, 2000.]]
[21]
S. McMillan and S. Guccione, “Partial Run-Time Reconfiguration Using jrtr,” Field Programmable Logic and Applications, Springer, 2000.]]
[22]
K.N. McNall and A.E. Casavant, “Automatic Operator Configuration in the Synthesis of Pipelined Architectures,” Proc. 27th ACM/IEEE Design Automation Conf., pp. 174-179, 1990.]]
[23]
O. Mencer H Hübert M. Morf and M.J. Flynn, “StReAm: Object-Oriented Programming of Stream Architectures Using PAM-Blox,” Field-Programmable Logic: the Roadmap to Reconfigurable Systems, pp. 595-604, Springer, 2000.]]
[24]
I. Mitrani, Probabalistic Modelling. Cambridge Univ. Press, 1998.]]
[25]
T. Moller and B. Trumbore, “Fast, Minimum Storage Ray-Triangle Intersection,” J. Graphics Tools, vol. 2, no. 1, pp. 21-28, 1997.]]
[26]
S. Ong, et al., “Automatic Mapping of Multiple Applications to Multiple Adaptive Computing Systems,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 218-227, 2001.]]
[27]
U. Prabhu and B.M. Pangrle, “Global Mobility Based Scheduling,” Proc. IEEE Int'l Conf. Computer Design: VLSI in Computers and Processors, pp. 370-373, 1993.]]
[28]
R.O. Onvural, “Survey of Closed Queueing Networks with Blocking,” ACM Computing Surveys, vol. 22, no. 2, pp. 83-121, June 1990.]]
[29]
J. Park K.R. Shesha Shayee and P.C. Diniz, “Performance and Area Modeling of Complete FPGA Designs in the Presence of Loop Transformations,” Field Programmable Logic and Applications, Springer, 2003.]]
[30]
H. Schmit, “Incremental Reconfiguration for Pipeline Applications,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 1997.]]
[31]
S. Singh and P. James-Roxby, “Lava and jbits: From Hdl to Bitstream in Seconds,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 2001.]]
[32]
H. Styles and W. Luk, “Accelerating Radiosity Calculations Using Reconfigurable Platforms,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 279-281, 2002.]]
[33]
A.H. Veen, “Dataflow Machine Architecture,” ACM Computing Surveys, vol. 18, no. 4, pp. 365-396, 1986.]]
[34]
M. Weinhardt and W. Luk, “Pipeline Vectorisation,” IEEE Trans. Computer-Aided Design, vol. 20, no. 2, pp. 234-248, Feb. 2001.]]
[35]
J.M. Stone, et al., “Stream-Oriented Fpga Computing in the Stream-C High Level Language,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, 2000.]]
[36]
W. Wang A. Raghunathan N.K. Jha and S. Dey, “High-Level Synthesis of Multi-Process Behavioral Descriptions,” Proc. IEEE Int'l Conf. VLSI Design, pp. 467-473, 2003.]]
[37]
H. Ziegler B. So M. Hall and P.C. Diniz, “Coarse-Grain Pipelining on Multiple FPGA Architectures,” Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 77-86, 2002.]]

Cited By

View all
  • (2018)Using program branch probability for the thread parallelisation of branch divergence on the CUDA platformInternational Journal of Autonomous and Adaptive Communications Systems10.1504/IJAACS.2018.09203111:2(171-191)Online publication date: 1-Jan-2018
  • (2011)Energy reduction by systematic run-time reconfigurable hardware deactivationTransactions on High-Performance Embedded Architectures and Compilers IV10.5555/2172445.2172467(354-369)Online publication date: 1-Jan-2011

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Computers
IEEE Transactions on Computers  Volume 53, Issue 11
November 2004
144 pages

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 November 2004

Author Tags

  1. 65
  2. Index Terms- Automatic synthesis
  3. dataflow architectures
  4. queuing theory.

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2018)Using program branch probability for the thread parallelisation of branch divergence on the CUDA platformInternational Journal of Autonomous and Adaptive Communications Systems10.1504/IJAACS.2018.09203111:2(171-191)Online publication date: 1-Jan-2018
  • (2011)Energy reduction by systematic run-time reconfigurable hardware deactivationTransactions on High-Performance Embedded Architectures and Compilers IV10.5555/2172445.2172467(354-369)Online publication date: 1-Jan-2011

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media