survey

Classification Framework for Analysis and Modeling of Physically Induced Reliability Violations

Published: 17 February 2015

Abstract

Technology downscaling is expected to amplify a variety of reliability concerns in future digital systems. A good understanding of reliability threats is crucial for creating efficient mitigation techniques. This survey systematically classifies the state of the art in the analysis and modeling of such threats, which physical mechanisms induce in digital systems. Its purpose is to provide a classification tool that aids navigation across the entire landscape of reliability analysis and modeling. The classification framework is constructed top-down from complementary categories, each addressing one approach to reliability analysis and modeling. In contrast to other classifications, the proposed methodology covers the target research domain completely, without suppressing hybrid works that fall under multiple categories. To substantiate the usability of the framework, representative works from the state of the art are mapped to the appropriate categories and briefly analyzed. In this way, research trends and opportunities for novel approaches can be identified.



Published In

ACM Computing Surveys, Volume 47, Issue 3
April 2015
602 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/2737799
Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 February 2015
Accepted: 01 October 2014
Revised: 01 December 2013
Received: 01 November 2012
Published in CSUR Volume 47, Issue 3


Author Tags

  1. Reliability analysis
  2. classification framework
  3. error
  4. failure
  5. fault

Qualifiers

  • Survey
  • Research
  • Refereed


Cited By

  • (2020) Binary Tree Classification of Rigid Error Detection and Correction Techniques. ACM Computing Surveys 53:4 (1-38). DOI: 10.1145/3397268. Online publication date: 20-Aug-2020
  • (2019) Parametric and Functional Degradation Analysis of Complete 14-nm FinFET SRAM. IEEE Transactions on Very Large Scale Integration (VLSI) Systems (1-14). DOI: 10.1109/TVLSI.2019.2902881. Online publication date: 2019
  • (2017) Runtime Slack Creation for Processor Performance Variability using System Scenarios. ACM Transactions on Design Automation of Electronic Systems 23:2 (1-23). DOI: 10.1145/3152158. Online publication date: 21-Dec-2017
  • (2017) Classification of Resilience Techniques Against Functional Errors at Higher Abstraction Layers of Digital Systems. ACM Computing Surveys 50:4 (1-38). DOI: 10.1145/3092699. Online publication date: 4-Oct-2017
  • (2017) Will Chips of the Future Learn How to Feel Pain and Cure Themselves? IEEE Design & Test 34:5 (80-87). DOI: 10.1109/MDAT.2017.2730841. Online publication date: Oct-2017
