DOI: 10.1109/HPCA.2012.6168948
Article

CPU-assisted GPGPU on fused CPU-GPU architectures

Published: 25 February 2012

Abstract

This paper presents a novel approach that uses the CPU to facilitate the execution of GPGPU programs on fused CPU-GPU architectures. In our model of fused architectures, the GPU and the CPU are integrated on the same die and share the on-chip L3 cache and off-chip memory, similar to the latest Intel Sandy Bridge and AMD Accelerated Processing Unit (APU) platforms. In our proposed CPU-assisted GPGPU, after the CPU launches a GPU program, it executes a pre-execution program, which is generated automatically from the GPU kernel using our proposed compiler algorithms and contains the memory access instructions of the GPU kernel for multiple thread blocks. The CPU pre-execution program runs ahead of the GPU threads because (1) the CPU pre-execution thread contains only the memory fetch instructions from the GPU kernel, not its floating-point computations, and (2) the CPU runs at a higher frequency and exploits a higher degree of instruction-level parallelism than the GPU scalar cores. We also leverage the prefetcher at the L2 cache on the CPU side to increase the memory traffic from the CPU. As a result, the memory accesses of GPU threads hit in the L3 cache and their latency can be drastically reduced. Since our pre-execution is directly controlled by user-level applications, it enjoys both high accuracy and flexibility. Our experiments on a set of benchmarks show that the proposed pre-execution improves performance by up to 113%, and by 21.4% on average.
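The pre-execution idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's compiler output or simulator; it assumes a hypothetical vector-add kernel in which each thread loads one element from each of two arrays, a 64-byte cache line, 4-byte elements, and an unbounded shared L3 modeled as a Python set of line numbers. All function names and constants are illustrative assumptions.

```python
CACHE_LINE = 64   # bytes per L3 line (assumed)
ELEM_SIZE = 4     # bytes per element, e.g. float32 (assumed)


def kernel_loads(block_id, thread_id, block_dim, bases):
    """Byte addresses loaded by one thread of a hypothetical vector-add kernel."""
    i = block_id * block_dim + thread_id  # global thread index
    return [base + i * ELEM_SIZE for base in bases]


def cpu_pre_execute(l3, blocks, block_dim, bases):
    """Replay only the kernel's loads for several thread blocks.

    No arithmetic is performed; the loads just pull lines into the shared L3.
    """
    for block_id in blocks:
        for thread_id in range(block_dim):
            for addr in kernel_loads(block_id, thread_id, block_dim, bases):
                l3.add(addr // CACHE_LINE)


def gpu_run(l3, blocks, block_dim, bases):
    """Count shared-L3 hits and misses for the GPU threads' loads."""
    hits = misses = 0
    for block_id in blocks:
        for thread_id in range(block_dim):
            for addr in kernel_loads(block_id, thread_id, block_dim, bases):
                line = addr // CACHE_LINE
                if line in l3:
                    hits += 1
                else:
                    misses += 1
                    l3.add(line)  # a miss fills the line
    return hits, misses


if __name__ == "__main__":
    bases = [0, 1 << 20]   # start addresses of the two input arrays (assumed)
    blocks = range(4)      # thread blocks covered by the pre-execution

    cold = gpu_run(set(), blocks, 256, bases)

    warm_l3 = set()
    cpu_pre_execute(warm_l3, blocks, 256, bases)
    warm = gpu_run(warm_l3, blocks, 256, bases)

    print("without pre-execution (hits, misses):", cold)
    print("with pre-execution (hits, misses):", warm)
```

In this model the GPU misses once per cache line when the L3 starts empty, and every load hits once the CPU has replayed the same address stream first. A real system would also have to deal with finite cache capacity and with how far the CPU runs ahead of the GPU, which this sketch deliberately ignores.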

    Information
    Published In

    HPCA '12: Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
    February 2012
    457 pages
    ISBN: 9781467308274

    Publisher

    IEEE Computer Society

    United States


    Cited By

    • (2017) Decoupled Affine Computation for SIMT GPUs. ACM SIGARCH Computer Architecture News 45:2 (295-306). DOI: 10.1145/3140659.3080205. Online publication date: 24-Jun-2017
    • (2017) Decoupled Affine Computation for SIMT GPUs. Proceedings of the 44th Annual International Symposium on Computer Architecture (295-306). DOI: 10.1145/3079856.3080205. Online publication date: 24-Jun-2017
    • (2017) Massive parallelization of approximate nearest neighbor search on KD-tree for high-dimensional image descriptor matching. Journal of Visual Communication and Image Representation 44:C (106-115). DOI: 10.1016/j.jvcir.2017.01.013. Online publication date: 1-Apr-2017
    • (2016) Cooperative Caching for GPUs. ACM Transactions on Architecture and Code Optimization 13:4 (1-25). DOI: 10.1145/3001589. Online publication date: 12-Dec-2016
    • (2015) iConn. ACM Journal on Emerging Technologies in Computing Systems 11:4 (1-23). DOI: 10.1145/2700238. Online publication date: 27-Apr-2015
    • (2015) Accelerating aerial image simulation using improved CPU/GPU collaborative computing. Computers and Electrical Engineering 46:C (176-189). DOI: 10.1016/j.compeleceng.2015.05.018. Online publication date: 1-Aug-2015
    • (2015) Performance and power consumption evaluation of concurrent queue implementations in embedded systems. Computer Science - Research and Development 30:2 (165-175). DOI: 10.1007/s00450-014-0261-0. Online publication date: 1-May-2015
    • (2015) Communication and computation optimization of concurrent kernels using kernel coalesce on a GPU. Concurrency and Computation: Practice & Experience 27:1 (47-68). DOI: 10.1002/cpe.3194. Online publication date: 1-Jan-2015
    • (2014) In-cache query co-processing on coupled CPU-GPU architectures. Proceedings of the VLDB Endowment 8:4 (329-340). DOI: 10.14778/2735496.2735497. Online publication date: 1-Dec-2014
    • (2014) ad-heap. Proceedings of Workshop on General Purpose Processing Using GPUs (54-63). DOI: 10.1145/2588768.2576786. Online publication date: 1-Mar-2014
