Cattaneo et al., 2015 - Google Patents
On how to accelerate iterative stencil loops: a scalable streaming-based approachCattaneo et al., 2015
View PDF- Document ID
- 9913768749978432149
- Author
- Cattaneo R
- Natale G
- Sicignano C
- Sciuto D
- Santambrogio M
- Publication year
- Publication venue
- ACM Transactions on Architecture and Code Optimization (TACO)
External Links
Snippet
In high-performance systems, stencil computations play a crucial role as they appear in a variety of different fields of application, ranging from partial differential equation solving, to computer simulation of particles' interaction, to image processing and computer vision. The …
- 238000000034 method 0 abstract description 61
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5045—Circuit design
- G06F17/505—Logic synthesis, e.g. technology mapping, optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5068—Physical circuit design, e.g. layout for integrated circuits or printed circuit boards
- G06F17/5072—Floorplanning, e.g. partitioning, placement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5068—Physical circuit design, e.g. layout for integrated circuits or printed circuit boards
- G06F17/5081—Layout analysis, e.g. layout verification, design rule check
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G06F17/504—Formal methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2217/00—Indexing scheme relating to computer aided design [CAD]
- G06F2217/78—Power analysis and optimization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2217/00—Indexing scheme relating to computer aided design [CAD]
- G06F2217/12—Design for manufacturability
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Cattaneo et al. | On how to accelerate iterative stencil loops: a scalable streaming-based approach | |
Zhou et al. | An FPGA framework for edge-centric graph processing | |
US8930926B2 (en) | System, methods and apparatus for program optimization for multi-threaded processor architectures | |
Natale et al. | A polyhedral model-based framework for dataflow implementation on FPGA devices of iterative stencil loops | |
Guo et al. | A graph covering algorithm for a coarse grain reconfigurable system | |
Wang et al. | A comprehensive framework for synthesizing stencil algorithms on FPGAs using OpenCL model | |
Zuo et al. | A polyhedral-based systemc modeling and generation framework for effective low-power design space exploration | |
Korolija et al. | Dataflow-based parallelization of control-flow algorithms | |
Tithi et al. | Exploiting spatial architectures for edit distance algorithms | |
Tian et al. | SASA: A scalable and automatic stencil acceleration framework for optimized hybrid spatial and temporal parallelism on HBM-based FPGAs | |
Hussain et al. | Implementation of a reverse time migration kernel using the hce high level synthesis tool | |
Deest et al. | Towards scalable and efficient FPGA stencil accelerators | |
Pnevmatikatos et al. | FASTER: Facilitating analysis and synthesis technologies for effective reconfiguration | |
Deest et al. | One size does not fit all: Implementation trade-offs for iterative stencil computations on FPGAs | |
Zhao et al. | HLPerf: Demystifying the Performance of HLS-based Graph Neural Networks with Dataflow Architectures | |
WO2010121228A2 (en) | System, methods and apparatus for program optimization for multi-threaded processor architectures | |
Pajuelo-Holguera et al. | Recommender system implementations for embedded collaborative filtering applications | |
Canis | Legup: open-source high-level synthesis research framework | |
Sharma et al. | Array interleaving—an energy-efficient data layout transformation | |
Voss et al. | On predictable reconfigurable system design | |
Szőke et al. | Performance evaluation of a two-dimensional lattice Boltzmann solver using CUDA and PGAS UPC based parallelisation | |
Sen et al. | Speeding up cycle based logic simulation using graphics processing units | |
Cong et al. | Synthesis of reconfigurable high-performance multicore systems | |
Jin et al. | Nuclear Reactor Simulation on OpenCL FPGA: a Case Study of RSBench | |
Schmid et al. | High-level synthesis revised-Generation of FPGA accelerators from a domain-specific language using the polyhedron model |