DOI: 10.1145/2148600.2148626
poster

Poster: a tunable, software-based DRAM error detection and correction library for HPC

Published: 12 November 2011

Abstract

Proposed exascale systems will present considerable resiliency challenges. In particular, DRAM soft errors, or bit flips, are expected to increase greatly because of the higher memory density of these systems. Current hardware-based fault-tolerance methods will be unsuitable for the expected soft error rate, so additional software will be needed to address this challenge. In this paper we introduce LIBSDC, a tunable, transparent silent data corruption (SDC) detection and correction library for HPC applications. LIBSDC provides comprehensive SDC protection for program memory by using the MMU to implement on-demand page integrity verification. Experimental benchmarks with Mantevo HPCCG show that, once tuned, LIBSDC achieves SDC protection with less than 100% resource overhead.
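
The abstract gives no implementation details, so the following is only a toy illustration of what on-demand page integrity verification could look like. It simulates the idea in plain Python: a "locked" flag stands in for an MMU access fault, and CRC32 stands in for whatever integrity code the library actually uses. The class `ProtectedMemory`, its methods, the 4096-byte page size, and the `relock` interval are all hypothetical names and assumptions, not LIBSDC's API. The sketch covers detection only, not correction.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


class ProtectedMemory:
    """Toy model of checksum-guarded pages; not LIBSDC's actual API."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.checksums = [zlib.crc32(p) for p in self.pages]
        # A "locked" page stands in for one the MMU would fault on.
        self.locked = [True] * num_pages
        self.dirty = [False] * num_pages

    def _fault(self, page):
        """Simulated page fault: verify the page before unlocking it."""
        if zlib.crc32(self.pages[page]) != self.checksums[page]:
            raise RuntimeError(f"silent data corruption detected in page {page}")
        self.locked[page] = False

    def read(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        if self.locked[page]:
            self._fault(page)
        return self.pages[page][offset]

    def write(self, addr, value):
        page, offset = divmod(addr, PAGE_SIZE)
        if self.locked[page]:
            self._fault(page)
        self.pages[page][offset] = value
        self.dirty[page] = True

    def relock(self):
        """Re-checksum modified pages and lock every unlocked page again.

        How often this runs is the tunable knob in this toy model.
        """
        for page in range(len(self.pages)):
            if not self.locked[page]:
                if self.dirty[page]:
                    self.checksums[page] = zlib.crc32(self.pages[page])
                    self.dirty[page] = False
                self.locked[page] = True

    def flip_bit(self, addr, bit):
        """Inject a soft error, bypassing the checks (testing only)."""
        page, offset = divmod(addr, PAGE_SIZE)
        self.pages[page][offset] ^= 1 << bit


mem = ProtectedMemory(num_pages=2)
mem.write(10, 7)      # first touch verifies and unlocks page 0
mem.relock()          # page 0 is re-checksummed and locked again
mem.flip_bit(10, 0)   # simulated bit flip while the page is locked
try:
    mem.read(10)      # next access re-verifies and finds the mismatch
except RuntimeError as err:
    print(err)
```

In this model a page is checked only on its first access after each `relock`, so verification cost scales with the pages an application actually touches. A flip that lands while a page is unlocked goes undetected, which is the kind of overhead-versus-coverage trade-off a tunable interval would control.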

    Published In

    SC '11 Companion: Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion
    November 2011
    166 pages
    ISBN:9781450310307
    DOI:10.1145/2148600

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. resiliency
    2. silent data corruption

    Qualifiers

    • Poster

    Conference

    SC '11
    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Cited By

    • (2016) FlipSphere. Proceedings of the 20th International Symposium on Distributed Simulation and Real-Time Applications, pp. 19-28. DOI: 10.1109/DS-RT.2016.27. Online publication date: 21-Sep-2016
    • (2013) Online-ABFT. ACM SIGPLAN Notices 48:8, pp. 167-176. DOI: 10.1145/2517327.2442533. Online publication date: 23-Feb-2013
    • (2013) kMemvisor. Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pp. 251-262. DOI: 10.1145/2493123.2462910. Online publication date: 17-Jun-2013
    • (2013) kMemvisor. Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pp. 251-262. DOI: 10.1145/2462902.2462910. Online publication date: 17-Jun-2013
    • (2013) Online-ABFT. Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 167-176. DOI: 10.1145/2442516.2442533. Online publication date: 23-Feb-2013
    • (2013) Battling Bad Bits with Checksums in the Loris Page Cache. Proceedings of the 2013 Sixth Latin-American Symposium on Dependable Computing, pp. 68-77. DOI: 10.1109/LADC.2013.10. Online publication date: 1-Apr-2013
    • (2012) Evaluating operating system vulnerability to memory errors. Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers, pp. 1-8. DOI: 10.1145/2318916.2318930. Online publication date: 29-Jun-2012
    • (2012) Detection and correction of silent data corruption for large-scale high-performance computing. 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1109/SC.2012.49. Online publication date: Nov-2012
