Low-Cost Error Detection in Deep Neural Network Accelerators with Linear Algorithmic Checksums

Authors:

Alex Orailoglu

Journal of Electronic Testing, Volume 36, Issue 6

Pages 703 - 718

https://doi.org/10.1007/s10836-020-05920-2

Published: 01 December 2020

Abstract

The widespread adoption of deep neural networks (DNNs) in safety-critical systems makes it necessary to examine the safety issues raised by hardware errors. We confirm that this concern is warranted by demonstrating the potentially catastrophic impact of hardware bit errors on DNN accuracy. Consumer deep learning applications operate on tight margins, so they need fault tolerance methods that are comprehensive yet low-cost; such methods can be obtained through a rigorous exploration of the mathematical properties of DNN computations. Our novel technique, Sanity-Check, detects errors in fully-connected and convolutional layers through linear algorithmic checksums. Its purely software-based implementation requires no hardware modification, which facilitates adoption on a variety of off-the-shelf execution platforms. We further propose a dedicated hardware unit that integrates seamlessly with modern deep learning accelerators and eliminates the performance overhead of the software-based implementation at the cost of a negligible area and power budget in a DNN accelerator. Sanity-Check delivers perfect critical error coverage in our error injection experiments and offers a promising alternative for low-cost error detection in safety-critical DNN applications.
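
To illustrate the general idea of a linear algorithmic checksum on a fully-connected layer, the following is a minimal NumPy sketch and not the paper's implementation. It assumes the check is applied to the pre-activation output y = Wx + b, where linearity holds: the sum of the outputs must equal (1ᵀW)x + 1ᵀb, and the checksum parameters 1ᵀW and 1ᵀb can be precomputed offline. The function names, layer sizes, tolerance, and the way the error is injected are all hypothetical choices made for illustration.

```python
import numpy as np

def make_checksum(W, b):
    """Precompute checksum parameters offline: column sums of W and the sum of b."""
    return W.sum(axis=0), b.sum()

def fc_layer(W, b, x):
    """Pre-activation output of a fully-connected layer: y = W @ x + b."""
    return W @ x + b

def check_output(y, x, w_check, b_check, tol=1e-6):
    """Return True when sum(y) matches the independently computed checksum.

    The tolerance absorbs floating-point rounding; its value is an assumption.
    """
    expected = w_check @ x + b_check
    return bool(abs(y.sum() - expected) <= tol * max(1.0, abs(expected)))

# Hypothetical layer: 16 inputs, 8 outputs, random parameters.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
b = rng.standard_normal(8)
x = rng.standard_normal(16)

w_check, b_check = make_checksum(W, b)

# Error-free execution: the checksum should agree with the output sum.
y = fc_layer(W, b, x)
clean_ok = check_output(y, x, w_check, b_check)

# Emulate a hardware error by corrupting a single output element.
y_faulty = y.copy()
y_faulty[3] += 4.0
faulty_ok = check_output(y_faulty, x, w_check, b_check)

print("clean output passes check:", clean_ok)
print("faulty output passes check:", faulty_ok)
```

In this sketch the check costs one extra dot product per layer invocation, compared with one dot product per output neuron for the layer itself. The comparison has to happen before any nonlinear activation, because the equality relies on linearity.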




Information & Contributors

Information

Published In

Journal of Electronic Testing: Theory and Applications, Volume 36, Issue 6

Dec 2020

113 pages

ISSN:0923-8174

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 December 2020

Accepted: 26 November 2020

Received: 27 June 2020

Qualifiers

Research-article


