Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1058129.1058148acmconferencesArticle/Chapter ViewAbstractPublication PageshpgConference Proceedingsconference-collections
Article

Understanding the efficiency of GPU algorithms for matrix-matrix multiplication

Published: 29 August 2004 Publication History

Abstract

Utilizing graphics hardware for general purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model's constraint on input reuse and perform an in-depth analysis of dense matrix-matrix multiplication, which reuses each element of input matrices O(n) times. Its regular data access pattern and highly parallel computational requirements suggest matrix-matrix multiplication as an obvious candidate for efficient evaluation on GPUs but, surprisingly we find even near-optimal GPU implementations are pronouncedly less efficient than current cache-aware CPU approaches. We find the key cause of this inefficiency is that the GPU can fetch less data and yet execute more arithmetic operations per clock than the CPU when both are operating out of their closest caches. The lack of high bandwidth access to cached data will impair the performance of GPU implementations of any computation featuring significant input reuse.

References

[1]
{ABB*99} Anderson E., Bai Z., Bischof C., Blackford S., Demmel J., Dongarra J., Du Croz J., Greenbaum A., Hammarling S., McKenney A., Sorensen D.: LAPACK Users' Guide, third ed. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999.
[2]
{BFGS03} Bolz J., Farmer I., Grinspun E., Schroder P.: Sparse matrix solvers on the GPU: conjugate gradients and multigrid. ACM Trans. Graph, 22, 3 (2003). 917--924.
[3]
{BFH*04} Buck I., Foley T., Horn D., Sugerman J., Fatahalian K., Houston M., Hanrahan P.: Brook for GPUs: Stream computing on graphics hardware. In Proceedings of ACM SIGGRAPH 2004 (to appear) (2004).
[4]
{HCH03} Hall J. D., Carr N. A., Hart J. C.: Cache and bandwidth aware matrix multiplication on the GPU. UIUC Technical Report UIUCDCSR-2003-2328 (2003).
[5]
{HCSL02} Harris M. J., Coombe G., Scheuermann T., Lastra A.: Physically-based visual simulation on graphics hardware. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware (2002), Eurographics Association, pp. 109--118.
[6]
{HSU*01} Hinton G., Sager D., Upton M., BOGGS D., CARMEAN D., KYKER A., ROUSSEL P.: The microarchitecture of the pentium 4 processor. Intel Technology Journal 22, 1 (2001).
[7]
{KW03} Kruger J., Westermann R.: Linear algebra operators for GPU implementation of numerical algorithms. ACM Trans. Graph. 22, 3 (2003), 908--916.
[8]
{LM01} Larsen E. S., McAllister D.: Fast matrix multiplies using graphics hardware. In Proc. Supercomputing 2001 (2001).
[9]
{MA03} Moreland K., Angel E.: The FFT on a GPU. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware (2003), Eurographics Association, pp. 112--119.
[10]
{Mor03} Moravanszky A.: Dense matrix algebra on the GPU, 2003. http://www.shaderx2.com/shaderx.PDF.
[11]
{TCL98} Thottethodi M., Chatterjee S., Lebeck A. R.: Tuning strassen's matrix multiplication for memory efficiency. In Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM) (1998), IEEE Computer Society, pp. 1--14.
[12]
{WPD01} Whaley R. C., Petitet A., Dongarra J. J.: Automated empirical optimizations of software and the ATLAS project. Parallel Computing 27, 1-2 (2001), 3--35.

Cited By

View all
  • (2024)Distributed Predictive QoS in Automotive Environments Under Concept Drift2024 IFIP Networking Conference (IFIP Networking)10.23919/IFIPNetworking62109.2024.10619805(549-554)Online publication date: 3-Jun-2024
  • (2024)GUMSO: Gating Unnecessary On-Chip Memory Slices for Power Optimization on GPUsProceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design10.1145/3665314.3670800(1-6)Online publication date: 5-Aug-2024
  • (2024)Sparse Spiking Neural-like Membrane Systems on Graphics Processing UnitsInternational Journal of Neural Systems10.1142/S0129065724500382Online publication date: 5-Apr-2024
  • Show More Cited By

Index Terms

  1. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    HWWS '04: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
    August 2004
    142 pages
    ISBN:3905673150
    DOI:10.1145/1058129
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 August 2004

    Permissions

    Request permissions for this article.

    Check for updates

    Qualifiers

    • Article

    Conference

    GH04
    Sponsor:
    GH04: Graphics Hardware 2004
    August 29 - 30, 2004
    Grenoble, France

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)246
    • Downloads (Last 6 weeks)42
    Reflects downloads up to 13 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Distributed Predictive QoS in Automotive Environments Under Concept Drift2024 IFIP Networking Conference (IFIP Networking)10.23919/IFIPNetworking62109.2024.10619805(549-554)Online publication date: 3-Jun-2024
    • (2024)GUMSO: Gating Unnecessary On-Chip Memory Slices for Power Optimization on GPUsProceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design10.1145/3665314.3670800(1-6)Online publication date: 5-Aug-2024
    • (2024)Sparse Spiking Neural-like Membrane Systems on Graphics Processing UnitsInternational Journal of Neural Systems10.1142/S0129065724500382Online publication date: 5-Apr-2024
    • (2024)PRIEST: Projection Guided Sampling-Based Optimization for Autonomous NavigationIEEE Robotics and Automation Letters10.1109/LRA.2024.33573119:3(2630-2637)Online publication date: Mar-2024
    • (2024)eRPCAStatistical Analysis and Data Mining10.1002/sam.1167017:2Online publication date: 27-Mar-2024
    • (2023)SWARM parallelismProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619631(29416-29440)Online publication date: 23-Jul-2023
    • (2023)Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUsACM Transactions on Architecture and Code Optimization10.1145/360009220:3(1-26)Online publication date: 19-Jul-2023
    • (2023)Kernel-as-a-ServiceProceedings of the 24th International Middleware Conference10.1145/3590140.3629115(192-206)Online publication date: 27-Nov-2023
    • (2023)An Empirical Performance Comparison between Matrix Multiplication Join and Hash Join on GPUs2023 IEEE 39th International Conference on Data Engineering Workshops (ICDEW)10.1109/ICDEW58674.2023.00034(184-190)Online publication date: Apr-2023
    • (2023)Fast matrix multiplication via compiler‐only layered data reorganization and intrinsic loweringSoftware: Practice and Experience10.1002/spe.321453:9(1793-1814)Online publication date: 14-May-2023
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media