Bernaschi et al., 2016 - Google Patents
A factored sparse approximate inverse preconditioned conjugate gradient solver on graphics processing unitsBernaschi et al., 2016
View PDF- Document ID
- 10072827351681719385
- Author
- Bernaschi M
- Bisson M
- Fantozzi C
- Janna C
- Publication year
- Publication venue
- SIAM Journal on Scientific Computing
External Links
Snippet
Graphics Processing Units (GPUs) exhibit significantly higher peak performance than conventional CPUs. However, in general only highly parallel algorithms can exploit their potential. In this scenario, the iterative solution to sparse linear systems of equations could …
- 238000011030 bottleneck 0 abstract description 3
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/12—Simultaneous equations, e.g. systems of linear equations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5045—Circuit design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | GPU-accelerated preconditioned iterative linear solvers | |
Lutz et al. | PARTANS: An autotuning framework for stencil computation on multi-GPU systems | |
Bell et al. | Efficient sparse matrix-vector multiplication on CUDA | |
Wyant et al. | Computing performance benchmarks among cpu, gpu, and fpga | |
Georgescu et al. | GPU acceleration for FEM-based structural analysis | |
Agullo et al. | Task‐based FMM for heterogeneous architectures | |
Bernaschi et al. | A factored sparse approximate inverse preconditioned conjugate gradient solver on graphics processing units | |
Calore et al. | Performance and portability of accelerated lattice Boltzmann applications with OpenACC | |
Fernandez et al. | Alternate parallel processing approach for FEM | |
Economon et al. | Towards high-performance optimizations of the unstructured open-source SU2 suite | |
Sun et al. | An I/O bandwidth-sensitive sparse matrix-vector multiplication engine on FPGAs | |
Gao et al. | A multi-GPU parallel optimization model for the preconditioned conjugate gradient algorithm | |
Ziane Khodja et al. | Parallel sparse linear solver with GMRES method using minimization techniques of communications for GPU clusters | |
Smith et al. | Accelerating scientific applications with the SRC-6 reconfigurable computer: Methodologies and analysis | |
Magoulès et al. | Auto-tuned Krylov methods on cluster of graphics processing unit | |
Rossinelli et al. | Multicore/multi-gpu accelerated simulations of multiphase compressible flows using wavelet adapted grids | |
AlAhmadi et al. | Performance characteristics for sparse matrix-vector multiplication on GPUs | |
Tian et al. | swSuperLU: A highly scalable sparse direct solver on Sunway manycore architecture | |
Halbiniak et al. | Exploration of OpenCL heterogeneous programming for porting solidification modeling to CPU‐GPU platforms | |
Davis et al. | Paradigmatic shifts for exascale supercomputing | |
Zhang et al. | Mixed-precision block incomplete sparse approximate preconditioner on Tensor core | |
Ohshima et al. | Optimization of numerous small dense-matrix–vector multiplications in H-matrix arithmetic on GPU | |
Zhang et al. | Implementing sparse matrix-vector multiplication with QCSR on GPU | |
Al-Mouhamed et al. | SpMV and BiCG-Stab optimization for a class of hepta-diagonal-sparse matrices on GPU | |
Bylina et al. | Explicit Fourth-Order Runge–Kutta Method on Intel Xeon Phi Coprocessor |