The Case for Lifetime Reliability-Aware Microprocessors

Published: 02 March 2004

Abstract

Ensuring long processor lifetimes by limiting failures due to wear-out related hard errors is a critical requirement for all microprocessor manufacturers. We observe that continuous device scaling and increasing temperatures are making lifetime reliability targets even harder to meet. However, current methodologies for qualifying lifetime reliability are overly conservative since they assume worst-case operating conditions. This paper makes the case that the continued use of such methodologies will significantly and unnecessarily constrain performance. Instead, lifetime reliability awareness at the microarchitectural design stage can mitigate this problem, by designing processors that dynamically adapt in response to the observed usage to meet a reliability target.

We make two specific contributions. First, we describe an architecture-level model and its implementation, called RAMP, that can dynamically track lifetime reliability, responding to changes in application behavior. RAMP is based on state-of-the-art device models for different wear-out mechanisms. Second, we propose dynamic reliability management (DRM) - a technique where the processor can respond to changing application behavior to maintain its lifetime reliability target. In contrast to current worst-case behavior based reliability qualification methodologies, DRM allows processors to be qualified for reliability at lower (but more likely) operating points than the worst case. Using RAMP, we show that this can save cost and/or improve performance, that dynamic voltage scaling is an effective response technique for DRM, and that dynamic thermal management neither subsumes nor is subsumed by DRM.
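
As a rough illustration of the DRM idea described in the abstract, the sketch below runs a control loop that lowers a voltage/frequency level when the running-average wear-out rate exceeds a target and raises it when there is slack. It is not RAMP: the single Arrhenius-style temperature term, the activation energy, the reference temperature, the per-level temperatures, and all function names are assumptions made up for this example.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def relative_wearout_rate(temp_k, ref_temp_k=355.0, activation_energy_ev=0.9):
    """Arrhenius-style wear-out rate relative to the rate at ref_temp_k.

    The reference temperature and activation energy are illustrative
    assumptions, not values taken from the paper.
    """
    exponent = activation_energy_ev / BOLTZMANN_EV * (1.0 / ref_temp_k - 1.0 / temp_k)
    return math.exp(exponent)


def drm_step(level, avg_rate, target_rate, num_levels):
    """Choose the next DVS level: slow down when over budget, speed up when under."""
    if avg_rate > target_rate and level > 0:
        return level - 1
    if avg_rate < target_rate and level < num_levels - 1:
        return level + 1
    return level


def simulate(temps_by_level, target_rate, intervals, start_level=0):
    """Run the loop; temps_by_level[i] is the assumed temperature (K) at DVS level i."""
    level, total, history = start_level, 0.0, []
    for n in range(1, intervals + 1):
        total += relative_wearout_rate(temps_by_level[level])
        history.append(level)
        # Decide on the running average so short hot phases can be offset later.
        level = drm_step(level, total / n, target_rate, len(temps_by_level))
    return total / intervals, history


if __name__ == "__main__":
    # Hypothetical temperatures for four DVS levels, slowest to fastest.
    avg, levels = simulate([335.0, 345.0, 355.0, 365.0], target_rate=1.0, intervals=200)
    print(f"average relative wear-out rate: {avg:.3f}")
    print("last ten levels:", levels[-10:])
```

Because the controller acts on the running average rather than the instantaneous rate, it can spend time at the hottest level as long as cooler intervals compensate, which is the sense in which a processor could be qualified below the worst-case operating point.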

Published In

ISCA '04: Proceedings of the 31st Annual International Symposium on Computer Architecture
June 2004
373 pages
ISBN: 0769521436

ACM SIGARCH Computer Architecture News, Volume 32, Issue 2
ISCA 2004
March 2004
373 pages
ISSN: 0163-5964
DOI: 10.1145/1028176

Publisher

IEEE Computer Society, United States

Acceptance Rates

ISCA '04 Paper Acceptance Rate: 31 of 217 submissions, 14%
Overall Acceptance Rate: 512 of 2,969 submissions, 17%
