Article
DOI: 10.1145/514191.514233

Experiences tuning SMG98: a semicoarsening multigrid benchmark based on the hypre library

Published: 22 June 2002

Abstract

LLNL's hypre library is an object-oriented library for the solution of sparse linear systems on parallel computers. While hypre facilitates rapid prototyping of complex parallel applications, our experience is that without careful attention to temporal data locality, node performance of applications developed using hypre will fall significantly short of peak performance on architectures based on modern microprocessors. In this paper, we describe our experiences analyzing and tuning the performance of smg98, a benchmark that exercises hypre's semicoarsening multigrid solver. In the original code, the lack of temporal data reuse in the registers and caches significantly hurts performance. We describe a variety of techniques we applied to hand-tune the performance of hypre's semicoarsening multigrid solver. We expect that similar strategies will be applicable to other solvers and codes based on hypre as well. We present performance measurements of smg98 on both SGI Origin and Compaq Alpha platforms. Overall, our optimizations improve the node performance of smg98 by nearly a factor of two on large problems.
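
The abstract does not list the specific transformations the authors applied, so the following is only a generic illustration of what "temporal data reuse" means for a stencil code, not a reconstruction of their method. It is a minimal sketch assuming a 1D three-point stencil. The function names and the coefficient arrays `lower`, `diag`, and `upper` are hypothetical and are not taken from hypre or smg98. The first version computes a residual in one sweep and its squared norm in a second sweep, so every residual value is re-read long after it was produced. The second version fuses the two sweeps so each value is consumed while it is still held in a scalar. Python is used only for readability; the performance benefit applies to compiled code such as C, not to Python itself.

```python
def residual_then_norm(lower, diag, upper, x, b):
    """Two sweeps: compute r = b - A x, then sum r[i]**2 in a second pass."""
    n = len(x)
    r = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0        # zero outside the domain
        right = x[i + 1] if i < n - 1 else 0.0
        r[i] = b[i] - (lower[i] * left + diag[i] * x[i] + upper[i] * right)
    norm_sq = 0.0
    for i in range(n):                           # r is streamed through again
        norm_sq += r[i] * r[i]
    return r, norm_sq


def residual_and_norm_fused(lower, diag, upper, x, b):
    """One sweep: each residual value is reused immediately after it is computed."""
    n = len(x)
    r = [0.0] * n
    norm_sq = 0.0
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        ri = b[i] - (lower[i] * left + diag[i] * x[i] + upper[i] * right)
        r[i] = ri
        norm_sq += ri * ri                       # reuse while ri is still in a scalar
    return r, norm_sq
```

Both functions perform the same floating-point operations in the same order, so they return identical results; only the distance between producing a value and reusing it changes.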

    Published In

    ICS '02: Proceedings of the 16th international conference on Supercomputing
    June 2002
    338 pages
    ISBN:1581134835
    DOI:10.1145/514191
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cache blocking
    2. memory hierarchy
    3. multigrid
    4. performance tuning
    5. stencils
    6. time skewing

    Qualifiers

    • Article

    Conference

    ICS02: International Conference on Supercomputing
    June 22 - 26, 2002
    New York, New York, USA

    Acceptance Rates

    ICS '02 Paper Acceptance Rate 31 of 144 submissions, 22%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Cited By

    • (2023) XHYPRE: a reliable parallel numerical algorithm library for solving large-scale sparse linear equations. CCF Transactions on High Performance Computing 5:2 (191-209). DOI: 10.1007/s42514-023-00141-3. Online publication date: 4-Apr-2023
    • (2009) Effective source-to-source outlining to support whole program empirical optimization. Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing (308-322). DOI: 10.1007/978-3-642-13374-9_21. Online publication date: 8-Oct-2009
    • (2007) Methodology for modelling SPMD hybrid parallel computation. Concurrency and Computation: Practice and Experience 20:8 (903-940). DOI: 10.1002/cpe.1214. Online publication date: Oct-2007
    • (2006) Introducing the open trace format (OTF). Proceedings of the 6th international conference on Computational Science - Volume Part II (526-533). DOI: 10.1007/11758525_71. Online publication date: 28-May-2006
    • (2005) Improving the computational intensity of unstructured mesh applications. Proceedings of the 19th annual international conference on Supercomputing (341-350). DOI: 10.1145/1088149.1088195. Online publication date: 20-Jun-2005
    • (2005) Self-Adapting Linear Algebra Algorithms and Software. Proceedings of the IEEE 93:2 (293-312). DOI: 10.1109/JPROC.2004.840848. Online publication date: Feb-2005
