Tseng et al., 2014 - Google Patents
Automatic data layout transformation for heterogeneous many-core systemsTseng et al., 2014
View PDF- Document ID
- 11980604711650493275
- Author
- Tseng Y
- Huang Y
- Lai B
- Lin J
- Publication year
- Publication venue
- Network and Parallel Computing: 11th IFIP WG 10.3 International Conference, NPC 2014, Ilan, Taiwan, September 18-20, 2014. Proceedings 11
External Links
Snippet
Applying appropriate data structures is critical to attain superior performance in heterogeneous many-core systems. A heterogeneous many-core system is comprised of a host for control flow management, and a device for massive parallel data processing …
- 230000001131 transforming 0 title abstract description 41
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107347253B (en) | Hardware instruction generation unit for special purpose processor | |
Lee et al. | Openacc to fpga: A framework for directive-based high-performance reconfigurable computing | |
US9448779B2 (en) | Execution of retargetted graphics processor accelerated code by a general purpose processor | |
Singh | Computing without Processors: Heterogeneous systems allow us to target our programming to the appropriate environment. | |
Liu et al. | Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors | |
Krommydas et al. | Opendwarfs: Characterization of dwarf-based benchmarks on fixed and reconfigurable architectures | |
Vishkin | Using simple abstraction to reinvent computing for parallelism | |
Baskaran et al. | Optimizing sparse matrix-vector multiplication on GPUs using compile-time and run-time strategies | |
Ukidave et al. | Nupar: A benchmark suite for modern gpu architectures | |
Hormati et al. | Macross: Macro-simdization of streaming applications | |
Coarfa et al. | Co-array Fortran performance and potential: An NPB experimental study | |
Papakonstantinou et al. | Efficient compilation of CUDA kernels for high-performance computing on FPGAs | |
Sunitha et al. | Performance improvement of CUDA applications by reducing CPU-GPU data transfer overhead | |
Desnos et al. | Memory Analysis and Optimized Allocation of Dataflow Applications on Shared-Memory MPSoCs: In-Depth Study of a Computer Vision Application | |
Oancea et al. | Financial software on GPUs: between Haskell and Fortran | |
Verma et al. | Accelerating workloads on fpgas via opencl: A case study with opendwarfs | |
Buchty | High-performance and hardware-aware computing | |
Owaida et al. | Massively parallel programming models used as hardware description languages: The OpenCL case | |
Burgio et al. | OpenMP-based synergistic parallelization and HW acceleration for on-chip shared-memory clusters | |
Xue et al. | Iterational retiming with partitioning: Loop scheduling with complete memory latency hiding | |
Huynh et al. | Mapping streaming applications onto GPU systems | |
Tseng et al. | Automatic data layout transformation for heterogeneous many-core systems | |
Kim et al. | Memory performance estimation of CUDA programs | |
Jin et al. | Nuclear Reactor Simulation on OpenCL FPGA: a Case Study of RSBench | |
Conti et al. | He-P2012: Performance and energy exploration of architecturally heterogeneous many-cores |