FRoC 2.0: Automatic BRAM and Logic Testing to Enable Dynamic Voltage Scaling for FPGA Applications

Published: 09 September 2019

Abstract

In earlier technology nodes, FPGAs had low power consumption compared to other compute chips such as CPUs and GPUs. However, in the 14nm technology node, FPGAs consume unprecedented power in the 100+W range, making power consumption a pressing concern. To reduce FPGA power consumption, several researchers have proposed deploying dynamic voltage scaling (DVS). While the previously proposed solutions show promising results, they have difficulty guaranteeing safe operation at reduced voltages for applications that use the FPGA hard blocks. In this work, we present the first DVS solution that is able to fully handle FPGA applications that use block RAMs (BRAMs). Our solution not only robustly tests the soft logic component of the application but also tests all components connected to the BRAMs. We extend a previously proposed CAD tool, FRoC, to automatically generate calibration bitstreams that are used to measure the application’s critical path delays on silicon. The calibration bitstreams also include testers that ensure all used SRAM cells operate safely while scaling Vdd. We experimentally show that, using our DVS solution, we can save 32% of the total power consumed by a discrete Fourier transform application running with the fixed nominal supply voltage and clocked at the Fmax reported by static timing analysis.
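
The calibration idea described in the abstract can be illustrated with a minimal sketch. This is not FRoC's implementation: the function names, the step size, the guardband, and the toy delay and BRAM models below are all assumptions made for illustration. The sketch steps Vdd down from nominal and keeps the last voltage at which both the measured critical path delay fits within the clock period and the BRAM tester passes.

```python
from typing import Callable, Optional


def lowest_safe_vdd(
    measure_delay_ns: Callable[[float], float],  # hypothetical on-chip delay measurement
    bram_test_passes: Callable[[float], bool],   # hypothetical BRAM tester result
    clock_period_ns: float,
    v_nominal: float,
    v_floor: float,
    step: float = 0.01,       # assumed voltage step, in volts
    guardband: float = 0.05,  # assumed 5% timing margin
) -> Optional[float]:
    """Step Vdd down from nominal; return the last voltage that passed both checks."""
    last_safe = None
    n_steps = int(round((v_nominal - v_floor) / step))
    for i in range(n_steps + 1):
        vdd = round(v_nominal - i * step, 6)
        delay_ok = measure_delay_ns(vdd) * (1.0 + guardband) <= clock_period_ns
        if not (delay_ok and bram_test_passes(vdd)):
            break  # stop at the first failing voltage
        last_safe = vdd
    return last_safe


# Toy stand-ins for silicon measurements (made-up numbers, not from the paper).
def toy_delay_ns(vdd: float) -> float:
    return 3.9 / vdd  # delay grows as the supply voltage drops


def toy_bram_ok(vdd: float) -> bool:
    return vdd >= 0.75  # pretend the SRAM cells fail below 0.75 V


if __name__ == "__main__":
    print(lowest_safe_vdd(toy_delay_ns, toy_bram_ok,
                          clock_period_ns=5.0, v_nominal=1.2, v_floor=0.7))
```

In this toy setup the search is limited by whichever check fails first, logic delay or BRAM operation, which is why a DVS scheme that tests only the soft logic may pick a voltage that is unsafe for the memories.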

Cited By

  • (2023) Field-Programmable Gate Array Architecture. Handbook of Computer Architecture, 1-47. DOI: 10.1007/978-981-15-6401-7_49-1. Online publication date: 7-Jan-2023
  • (2022) FODM: A Framework for Accurate Online Delay Measurement Supporting All Timing Paths in FPGA. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30:4, 502-514. DOI: 10.1109/TVLSI.2022.3144321. Online publication date: Apr-2022

Published In

ACM Transactions on Reconfigurable Technology and Systems  Volume 12, Issue 4
December 2019
163 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/3361265
  • Editor: Deming Chen
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 September 2019
Accepted: 01 July 2019
Revised: 01 May 2019
Received: 01 December 2018
Published in TRETS Volume 12, Issue 4

Author Tags

  1. BRAM interface logic delay testing
  2. BRAM testing
  3. FPGA DVS

Qualifiers

  • Research-article
  • Research
  • Refereed
