DOI: 10.1145/2464996.2465022

Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement

Published: 10 June 2013

Abstract

State-of-the-art graphics processing units (GPUs) employ single-instruction multiple-data (SIMD) execution to achieve both high computational throughput and energy efficiency. As prior work has shown, SIMD execution contains significant computational redundancy: different execution lanes often operate on the same operand values. Such value locality is referred to as uniform vectors. In this paper, we first show that, besides the redundancy within a uniform vector, different uniform vectors can also hold identical values. We then propose detailed architecture designs to exploit both types of redundancy. For redundancy within a uniform vector, we propose either extending the vector register file with token bits or adding a small separate scalar register file, eliminating both redundant computation and redundant data storage. For redundancy across different uniform vectors, we adopt instruction reuse, originally proposed for CPU architectures, to detect and eliminate the redundancy. Eliminating redundant computation and data storage yields both significant energy savings and performance improvement. Furthermore, we propose leveraging this redundancy to protect arithmetic-logic units (ALUs) and register files against hardware errors. Our detailed evaluation shows that the proposed design has low hardware overhead and achieves performance gains of up to 23.9% (12.0% on average), energy savings of up to 24.8% (12.6% on average), and protection coverage of 21.1% for ALUs and 14.1% for register files.
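
As a software illustration of the uniform-vector notion described above (a minimal sketch only, not the paper's hardware design: the kernel name, input data, and launch configuration are arbitrary choices for demonstration), the CUDA program below counts how many warps operate on an operand whose value is identical across all 32 lanes, using warp shuffle and vote intrinsics:

// Illustrative only: counts warps whose operand vector is "uniform",
// i.e. every lane holds the same value. Assumes blockDim.x is a
// multiple of 32 so the full-warp mask 0xffffffff is valid.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void count_uniform_warps(const int *in, int *uniform_warps, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int v = in[tid];
    // Broadcast lane 0's operand and test whether every lane matches it.
    int lane0_val = __shfl_sync(0xffffffffu, v, 0);
    int is_uniform = __all_sync(0xffffffffu, v == lane0_val);
    // Lane 0 records whether this warp's operand vector was uniform.
    if ((threadIdx.x & 31) == 0 && is_uniform)
        atomicAdd(uniform_warps, 1);
}

int main() {
    const int n = 1 << 20;                    // 32768 warps of 32 lanes
    int *d_in = nullptr, *d_count = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemset(d_in, 0, n * sizeof(int));     // all-zero input: every warp is uniform
    cudaMemset(d_count, 0, sizeof(int));
    count_uniform_warps<<<(n + 255) / 256, 256>>>(d_in, d_count, n);
    int h_count = 0;
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    printf("uniform warps: %d of %d\n", h_count, n / 32);
    cudaFree(d_in);
    cudaFree(d_count);
    return 0;
}

In the paper's proposals, by contrast, this uniformity is detected in hardware, so a single scalar computation and a single stored copy of the value can serve the entire warp instead of being recomputed and replicated across all lanes.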

    Information

    Published In

    ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing
    June 2013
    512 pages
    ISBN:9781450321303
    DOI:10.1145/2464996
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 June 2013

    Author Tags

    1. GPGPU
    2. redundancy

    Qualifiers

    • Research-article

    Conference

    ICS'13: International Conference on Supercomputing
    June 10 - 14, 2013
    Eugene, Oregon, USA

    Acceptance Rates

    ICS '13 Paper Acceptance Rate: 43 of 202 submissions, 21%
    Overall Acceptance Rate: 629 of 2,180 submissions, 29%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (Last 12 months): 10
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 09 Nov 2024

    Citations

    Cited By

    • (2023) R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-14. DOI: 10.1145/3579371.3589039. Online publication date: 17-Jun-2023.
    • (2022) ValueExpert: exploring value patterns in GPU-accelerated applications. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 171-185. DOI: 10.1145/3503222.3507708. Online publication date: 28-Feb-2022.
    • (2020) Approximate Cache in GPGPUs. ACM Transactions on Embedded Computing Systems, 19(5), pp. 1-22. DOI: 10.1145/3407904. Online publication date: 26-Sep-2020.
    • (2020) GVPROF: A Value Profiler for GPU-Based Clusters. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16. DOI: 10.1109/SC41405.2020.00093. Online publication date: Nov-2020.
    • (2020) Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 725-737. DOI: 10.1109/MICRO50266.2020.00065. Online publication date: Oct-2020.
    • (2020) DC-Patch: A Microarchitectural Fault Patching Technique for GPU Register Files. IEEE Access, 8, pp. 173276-173288. DOI: 10.1109/ACCESS.2020.3025899. Online publication date: 2020.
    • (2019) An Aging-Aware GPU Register File Design Based on Data Redundancy. IEEE Transactions on Computers, 68(1), pp. 4-20. DOI: 10.1109/TC.2018.2849376. Online publication date: 1-Jan-2019.
    • (2018) An efficient control flow validation method using redundant computing capacity of dual-processor architecture. PLOS ONE, 13(8), e0201127. DOI: 10.1371/journal.pone.0201127. Online publication date: 1-Aug-2018.
    • (2018) Efficiently Managing the Impact of Hardware Variability on GPUs’ Streaming Processors. ACM Transactions on Design Automation of Electronic Systems, 24(1), pp. 1-15. DOI: 10.1145/3287308. Online publication date: 21-Dec-2018.
    • (2018) Scratch That (But Cache This): A Hybrid Register Cache/Scratchpad for GPUs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37(11), pp. 2779-2789. DOI: 10.1109/TCAD.2018.2857043. Online publication date: Nov-2018.