Dynamic prediction of architectural vulnerability from microarchitectural state

Published: 09 June 2007

Abstract

Transient faults due to particle strikes are a key challenge in microprocessor design. Driven by exponentially increasing transistor counts, per-chip fault rates are a growing burden. To protect against soft errors, redundancy techniques such as redundant multithreading (RMT) are often used. However, these techniques assume that the probability that a structural fault will result in a soft error (the Architectural Vulnerability Factor, or AVF) is 100 percent, which unnecessarily drains processor resources. Because redundancy is costly, there have been efforts to throttle RMT at runtime. To date, these methods have not incorporated an AVF model and therefore tend to be ad hoc. Unfortunately, computing the AVF of complex microprocessor structures (e.g., the instruction issue queue, or ISQ) can be quite involved.
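
The AVF definition above can be made concrete with a short calculation. The following is a minimal sketch, assuming the standard ACE-bit accounting (AVF is the fraction of a structure's bit-cycles that hold bits required for architecturally correct execution) and a hypothetical per-bit raw fault rate expressed in FIT. The function names are illustrative and do not come from the paper.

```python
from typing import Sequence


def structure_avf(ace_bits_per_cycle: Sequence[int], total_bits: int) -> float:
    """AVF of one structure over a window of cycles.

    AVF = (sum over cycles of resident ACE bits) / (total_bits * cycles),
    i.e. the fraction of bit-cycles whose corruption would become a
    user-visible soft error if the structure were unprotected.
    """
    if total_bits <= 0 or len(ace_bits_per_cycle) == 0:
        raise ValueError("need a positive structure size and at least one cycle")
    if any(not 0 <= ace <= total_bits for ace in ace_bits_per_cycle):
        raise ValueError("ACE bit count per cycle must lie in [0, total_bits]")
    return sum(ace_bits_per_cycle) / (total_bits * len(ace_bits_per_cycle))


def effective_fit(raw_fit_per_bit: float, total_bits: int, avf: float) -> float:
    """Soft-error rate (in FIT) the structure contributes once derated by AVF.

    raw_fit_per_bit is a hypothetical input. Passing avf = 1.0 reproduces the
    worst-case assumption that every structural fault becomes a soft error.
    """
    return raw_fit_per_bit * total_bits * avf
```

For a 4096-bit structure that holds 1024, 2048, 0, and 1024 ACE bits over four cycles, `structure_avf` returns 0.25, so only a quarter of the raw fault rate surfaces as soft errors; treating AVF as 100 percent would overstate that rate fourfold.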
To provide probabilistic guarantees about fault tolerance, we have created a rigorous characterization of AVF behavior that can be easily implemented in hardware. We experimentally demonstrate AVF variability within and across the SPEC2000 benchmarks and identify strong correlations between structural AVF values and a small set of processor metrics. Using these simple indicators as predictors, we create a proof-of-concept RMT implementation that demonstrates that AVF prediction can be used to keep vulnerability at a low target level without significant performance impact.
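
The prediction-and-throttling idea can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes a linear model fitted by least squares from a few per-interval processor metrics (which metrics to use is left open), and a hypothetical threshold-with-hysteresis policy that enables redundancy only while predicted AVF is high. All names and thresholds are assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


def fit_linear_avf_model(samples: Sequence[Sequence[float]],
                         avfs: Sequence[float]) -> List[float]:
    """Least-squares fit of AVF ~ w0 + w1*m1 + ... + wk*mk.

    Each sample holds k processor metrics measured over one interval; avfs
    holds the AVF measured (e.g., in simulation) for the same interval.
    Returns [w0, w1, ..., wk] by solving the normal equations
    (X^T X) w = X^T y with Gauss-Jordan elimination.
    """
    if len(samples) != len(avfs) or not samples:
        raise ValueError("need one AVF value per metric sample")
    rows = [[1.0, *map(float, s)] for s in samples]  # prepend intercept term
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, avfs))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("metrics are collinear or there are too few samples")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict_avf(weights: Sequence[float], metrics: Sequence[float]) -> float:
    """Predicted AVF for one interval, clamped to the valid range [0, 1]."""
    raw = weights[0] + sum(w * m for w, m in zip(weights[1:], metrics))
    return min(1.0, max(0.0, raw))


@dataclass
class RmtThrottle:
    """Hypothetical policy: run redundantly only while predicted AVF is high.

    Using on_threshold > off_threshold (hysteresis) stops the controller from
    toggling redundancy every interval when AVF hovers near one value.
    """
    on_threshold: float
    off_threshold: float
    redundant: bool = False

    def __post_init__(self) -> None:
        if not 0.0 <= self.off_threshold < self.on_threshold <= 1.0:
            raise ValueError("require 0 <= off_threshold < on_threshold <= 1")

    def update(self, predicted_avf: float) -> bool:
        """Feed one interval's prediction; return whether RMT is enabled."""
        if not self.redundant and predicted_avf >= self.on_threshold:
            self.redundant = True
        elif self.redundant and predicted_avf <= self.off_threshold:
            self.redundant = False
        return self.redundant
```

Fitted on samples generated exactly from AVF = 0.1 + 0.5*m1 + 0.2*m2, `fit_linear_avf_model` recovers weights close to [0.1, 0.5, 0.2]. The hysteresis band is a design choice that avoids switching RMT on and off at every sampling interval when the predicted AVF sits near a single threshold.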

Cited By

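• (2020) A machine learning approach for reliability-aware application mapping for heterogeneous multicores. Proceedings of the 57th ACM/EDAC/IEEE Design Automation Conference, pp. 1-6. DOI: 10.5555/3437539.3437633. Online publication date: 20-Jul-2020.
• (2020) Gem5Panalyzer: A Light-weight tool for Early-stage Architectural Reliability Evaluation & Prediction. 2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 482-485. DOI: 10.1109/MWSCAS48704.2020.9184536. Online publication date: Aug-2020.
• (2019) Ensemble learning based Architecture Vulnerability Factor calculation using partial feature set in processors. Journal of Physics: Conference Series, 1195, 012020. DOI: 10.1088/1742-6596/1195/1/012020. Online publication date: 29-May-2019.
• (2018) Online Soft-Error Vulnerability Estimation for Memory Arrays and Logic Cores. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37(2), pp. 499-511. DOI: 10.1109/TCAD.2017.2706558. Online publication date: 1-Feb-2018.
• (2018) Reliability-Aware Data Placement for Heterogeneous Memory Architecture. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 583-595. DOI: 10.1109/HPCA.2018.00056. Online publication date: Feb-2018.
• (2016) Online soft-error vulnerability estimation for memory arrays. 2016 IEEE 34th VLSI Test Symposium (VTS), pp. 1-6. DOI: 10.1109/VTS.2016.7477301. Online publication date: Apr-2016.
• (2016) A Survey of Techniques for Modeling and Improving Reliability of Computing Systems. IEEE Transactions on Parallel and Distributed Systems, 27(4), pp. 1226-1238. DOI: 10.1109/TPDS.2015.2426179. Online publication date: 1-Apr-2016.
• (2016) Fast and accurate architectural vulnerability analysis for embedded processors using Instruction Vulnerability Factor. Microprocessors and Microsystems, 42, pp. 113-126. DOI: 10.1016/j.micpro.2016.01.012. Online publication date: May-2016.
• (2015) Partial triplication of a SPARC-V8 microprocessor using fault injection. 2015 IEEE 6th Latin American Symposium on Circuits & Systems (LASCAS), pp. 1-4. DOI: 10.1109/LASCAS.2015.7250415. Online publication date: Feb-2015.
• (2015) A cross-layer approach to online adaptive reliability prediction of transient faults. 2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), pp. 215-220. DOI: 10.1109/DFT.2015.7315165. Online publication date: Oct-2015.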

Published In

ACM SIGARCH Computer Architecture News, Volume 35, Issue 2
May 2007
527 pages
ISSN: 0163-5964
DOI: 10.1145/1273440

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
June 2007
542 pages
ISBN: 9781595937063
DOI: 10.1145/1250662
General Chair: Dean Tullsen
Program Chair: Brad Calder

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. architecture vulnerability factor
2. microarchitecture
3. performance
4. redundant multithreading
5. reliability

Article Metrics

• Downloads (last 12 months): 25
• Downloads (last 6 weeks): 6
Reflects downloads up to 22 Nov 2024.


    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media