research-article

Fine-Grained Module-Based Error Recovery in FPGA-Based TMR Systems

Published: 24 January 2018

Abstract

Space processing applications deployed on SRAM-based Field Programmable Gate Arrays (FPGAs) are vulnerable to radiation-induced Single Event Upsets (SEUs). Compared with the well-known SEU mitigation solution, Triple Modular Redundancy (TMR) with configuration memory scrubbing, TMR with module-based error recovery (MER) is notably more energy efficient and responsive in repairing soft errors in the system. Unfortunately, TMR-MER systems must still resort to scrubbing when errors occur between sub-components, such as in interconnection nets, because MER does not recover them. This article addresses this problem by proposing a fine-grained module-based error recovery technique that localizes and corrects errors classic MER cannot, without requiring additional system hardware. We evaluate our proposal via fault-injection campaigns on three types of circuits implemented in Xilinx 7-Series devices. With respect to scrubbing, we observed reductions of between 48.5% and 89.4% in the mean time to repair configuration memory errors, and we estimated reductions of between 77.4% and 96.1% in the energy used to recover from configuration memory errors. These improvements give systems employing TMR with fine-grained reconfiguration higher reliability than equivalent systems that rely on scrubbing for configuration error recovery.
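To make the module-based recovery idea more concrete, here is a minimal sketch of how the outputs of a TMR voter could be used to localize a fault to a single replica. It is an illustration under stated assumptions, not the authors' implementation. The sketch assumes three things. First, each replica produces an integer output word. Second, a bitwise 2-of-3 majority voter is used. Third, a disagreement confined to one replica identifies the module whose partition would be repaired by partial reconfiguration. The function names `tmr_vote` and `recovery_action` are hypothetical. When a mismatch cannot be attributed to a single replica, the sketch falls back to a coarser recovery such as scrubbing. This is a simplified stand-in for the cases the abstract says classic MER cannot handle, not a model of interconnect errors.

```python
from typing import Optional, Sequence, Tuple


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica output words."""
    return (a & b) | (a & c) | (b & c)


def recovery_action(outputs: Sequence[int]) -> Tuple[str, Optional[int]]:
    """Hypothetical MER decision based on one vote of three replica outputs.

    Returns ("none", None) when all replicas agree,
    ("reconfigure", i) when only replica i differs from the voted value,
    and ("fallback", None) when the mismatch cannot be pinned to one replica.
    """
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    voted = tmr_vote(*outputs)
    # Indices of replicas whose output differs from the majority word.
    mismatches = [i for i, value in enumerate(outputs) if value != voted]
    if not mismatches:
        return ("none", None)
    if len(mismatches) == 1:
        # A single faulty replica: repair only its module (assumed to be
        # done by partially reconfiguring that module's region).
        return ("reconfigure", mismatches[0])
    # Two or more replicas differ from the voted word (e.g. errors in
    # different bits of different replicas), so no single module can be
    # blamed; defer to a coarser recovery such as scrubbing.
    return ("fallback", None)


if __name__ == "__main__":
    print(recovery_action([0x5A, 0x5A, 0x5A]))  # ('none', None)
    print(recovery_action([0x5A, 0x7A, 0x5A]))  # ('reconfigure', 1)
    print(recovery_action([0b01, 0b00, 0b10]))  # ('fallback', None)
```

The design choice worth noting is that localization falls out of the vote itself. Comparing each replica against the voted word costs no extra redundancy, which is consistent with the abstract's emphasis on recovery without additional system hardware.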

Information

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 1
Special Section on FCCM 2016 and Regular Papers
March 2018
183 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/3178391
  • Editor: Steve Wilton
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 January 2018
Accepted: 01 December 2017
Revised: 01 October 2017
Received: 01 June 2017
Published in TRETS Volume 11, Issue 1

Author Tags

  1. SRAM FPGA
  2. configuration memory errors
  3. dynamic reconfiguration
  4. mean time to recover
  5. partial reconfiguration
  6. radiation-induced errors
  7. recovery energy
  8. reliability

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Discovery
  • Australian Research Council's Linkage

Article Metrics

  • Downloads (last 12 months): 22
  • Downloads (last 6 weeks): 7
Reflects downloads up to 22 Nov 2024

Cited By

  • (2022) Optimized Fault-Tolerant Adder Design Using Error Analysis. Journal of Circuits, Systems and Computers 32:06. DOI: 10.1142/S0218126623500913. Online publication date: 25-Oct-2022.
  • (2021) Reconfigurable Framework for Resilient Semantic Segmentation for Space Applications. ACM Transactions on Reconfigurable Technology and Systems 14:4, 1-32. DOI: 10.1145/3472770. Online publication date: 13-Sep-2021.
  • (2021) FTT-NAS: Discovering Fault-tolerant Convolutional Neural Architecture. ACM Transactions on Design Automation of Electronic Systems 26:6, 1-24. DOI: 10.1145/3460288. Online publication date: 12-Aug-2021.
  • (2020) A Real-Time Fault Location Mechanism Combining CGP Code and Deep Learning. 2019 6th International Conference on Dependable Systems and Their Applications (DSA), 311-316. DOI: 10.1109/DSA.2019.00047. Online publication date: Jan-2020.
  • (2019) Hybrid scheduling to enhance reliability of real-time tasks running on reconfigurable devices. The Journal of Supercomputing 76:6, 4701-4730. DOI: 10.1007/s11227-019-02976-6. Online publication date: 27-Aug-2019.
  • (2018) Energy Optimization and Fault Tolerance to Embedded System Based on Adaptive Heterogeneous Multi-Core Hardware Architecture. 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), 316-323. DOI: 10.1109/QRS-C.2018.00063. Online publication date: Jul-2018.
  • (2018) IPRDF: An Isolated Partial Reconfiguration Design Flow for Xilinx FPGAs. 2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 36-43. DOI: 10.1109/MCSoC2018.2018.00018. Online publication date: Sep-2018.
