Mechanisms for bounding vulnerabilities of processor structures

Published: 09 June 2007

Abstract

Concern for the increasing susceptibility of processor structures to transient errors has led to several recent research efforts that propose architectural techniques to enhance reliability. However, real systems are typically required to satisfy hard reliability budgets, and barring expensive full-redundancy approaches, none of the proposed solutions treat any reliability budgets or bounds as hard constraints. Meeting vulnerability bounds requires monitoring vulnerabilities of processor structures and taking appropriate actions whenever these bounds are violated. This mandates treating reliability as a first-order microarchitecture design constraint, while optimizing performance as long as reliability requirements are satisfied. This paper makes three key contributions towards this goal: (i) we present a simple infrastructure to monitor and provide upper bounds on the vulnerabilities of key processor structures at cycle-level fidelity; (ii) we propose two distinct control mechanisms - throttling and selective redundancy - to proactively and/or reactively bound the vulnerabilities to any limit specified by the system designer; (iii) within this framework, we propose a novel adaptation of Out-of-Order Commit for vulnerability reduction, which automatically provides additional leverage for the control mechanisms to boost performance while remaining within the reliability budget.
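
As a rough illustration of the monitor-and-throttle idea in the abstract, the sketch below models a single queue-like structure. It is not the paper's infrastructure: the class `VulnerabilityMonitor`, the function `simulate`, and all parameters are hypothetical, and vulnerability is approximated by occupancy (the fraction of entry-cycles holding any instruction), which is assumed here to be a conservative upper bound. The controller is purely reactive: it stops admitting new entries whenever the running bound exceeds the designer's budget.

```python
from dataclasses import dataclass


@dataclass
class VulnerabilityMonitor:
    """Running upper bound on one structure's vulnerability.

    Assumption: an empty entry cannot hold state that matters, so the
    occupied fraction of entry-cycles bounds the vulnerability from above.
    """
    num_entries: int
    occupied_entry_cycles: int = 0
    cycles: int = 0

    def tick(self, occupied_entries: int) -> None:
        """Record the occupancy observed in one cycle."""
        if not 0 <= occupied_entries <= self.num_entries:
            raise ValueError("occupancy out of range")
        self.occupied_entry_cycles += occupied_entries
        self.cycles += 1

    def bound(self) -> float:
        """Average occupied fraction so far (0.0 before any cycle)."""
        if self.cycles == 0:
            return 0.0
        return self.occupied_entry_cycles / (self.cycles * self.num_entries)


def simulate(arrivals, num_entries, drain_per_cycle, budget):
    """Toy queue with reactive throttling (hypothetical model).

    arrivals: instructions wanting to enter the structure each cycle.
    Returns (final vulnerability bound, number of throttled cycles).
    """
    monitor = VulnerabilityMonitor(num_entries)
    occupancy = 0
    throttled_cycles = 0
    for incoming in arrivals:
        if monitor.bound() > budget:
            # Over budget: admit nothing this cycle and let entries drain.
            throttled_cycles += 1
        else:
            occupancy += min(incoming, num_entries - occupancy)
        monitor.tick(occupancy)
        occupancy -= min(drain_per_cycle, occupancy)
    return monitor.bound(), throttled_cycles


if __name__ == "__main__":
    demand = [4] * 100  # steady demand of 4 instructions per cycle
    print("no budget :", simulate(demand, 8, 1, budget=1.0))
    print("budget 0.5:", simulate(demand, 8, 1, budget=0.5))
```

Because this controller only reacts after the bound is exceeded, the running average can overshoot the budget briefly before settling back toward it; a proactive variant would throttle before admitting entries that would push the bound over the limit. Selective redundancy, the second mechanism named in the abstract, is not modeled here.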



    Information

    Published In

    ACM SIGARCH Computer Architecture News, Volume 35, Issue 2
    May 2007
    527 pages
    ISSN: 0163-5964
    DOI: 10.1145/1273440

    ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
    June 2007
    542 pages
    ISBN: 9781595937063
    DOI: 10.1145/1250662
    General Chair: Dean Tullsen
    Program Chair: Brad Calder

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. microarchitecture
    2. redundant threading
    3. transient faults

    Qualifiers

    • Article


