Tseng et al., 2014 - Google Patents

Automatic data layout transformation for heterogeneous many-core systems

Tseng et al., 2014

Document ID: 11980604711650493275
Author: Tseng Y; Huang Y; Lai B; Lin J
Publication year: 2014
Publication venue: Network and Parallel Computing: 11th IFIP WG 10.3 International Conference, NPC 2014, Ilan, Taiwan, September 18-20, 2014. Proceedings 11

External Links

Cited by

Snippet

Applying appropriate data structures is critical to attain superior performance in heterogeneous many-core systems. A heterogeneous many-core system is comprised of a host for control flow management, and a device for massive parallel data processing …

Continue reading at inria.hal.science (PDF) (other versions)

230000001131 transforming 0 title abstract description 41

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining

Similar Documents

Publication	Publication Date	Title
CN107347253B (en)	2021-07-06	Hardware instruction generation unit for special purpose processor
Lee et al.	2016	Openacc to fpga: A framework for directive-based high-performance reconfigurable computing
US9448779B2 (en)	2016-09-20	Execution of retargetted graphics processor accelerated code by a general purpose processor
Singh	2011	Computing without Processors: Heterogeneous systems allow us to target our programming to the appropriate environment.
Liu et al.	2015	Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors
Krommydas et al.	2016	Opendwarfs: Characterization of dwarf-based benchmarks on fixed and reconfigurable architectures
Vishkin	2011	Using simple abstraction to reinvent computing for parallelism
Baskaran et al.	2008	Optimizing sparse matrix-vector multiplication on GPUs using compile-time and run-time strategies
Ukidave et al.	2015	Nupar: A benchmark suite for modern gpu architectures
Hormati et al.	2010	Macross: Macro-simdization of streaming applications
Coarfa et al.	2004	Co-array Fortran performance and potential: An NPB experimental study
Papakonstantinou et al.	2013	Efficient compilation of CUDA kernels for high-performance computing on FPGAs
Sunitha et al.	2017	Performance improvement of CUDA applications by reducing CPU-GPU data transfer overhead
Desnos et al.	2015	Memory Analysis and Optimized Allocation of Dataflow Applications on Shared-Memory MPSoCs: In-Depth Study of a Computer Vision Application
Oancea et al.	2012	Financial software on GPUs: between Haskell and Fortran
Verma et al.	2016	Accelerating workloads on fpgas via opencl: A case study with opendwarfs
Buchty	2014	High-performance and hardware-aware computing
Owaida et al.	2011	Massively parallel programming models used as hardware description languages: The OpenCL case
Burgio et al.	2012	OpenMP-based synergistic parallelization and HW acceleration for on-chip shared-memory clusters
Xue et al.	2010	Iterational retiming with partitioning: Loop scheduling with complete memory latency hiding
Huynh et al.	2013	Mapping streaming applications onto GPU systems
Tseng et al.	2014	Automatic data layout transformation for heterogeneous many-core systems
Kim et al.	2013	Memory performance estimation of CUDA programs
Jin et al.	2018	Nuclear Reactor Simulation on OpenCL FPGA: a Case Study of RSBench
Conti et al.	2016	He-P2012: Performance and energy exploration of architecturally heterogeneous many-cores