Keyword: VLIW : Search

research-article

Optimizing VLIW Instruction Scheduling via a Two-Dimensional Constrained Dynamic Programming

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 29, Issue 5, Article No.: 83, Pages 1–20, https://doi.org/10.1145/3643135

Typical embedded processors, such as Digital Signal Processors (DSPs), usually adopt Very Long Instruction Word (VLIW) architecture to improve computing efficiency. The performance of VLIW processors heavily relies on Instruction-Level Parallelism (ILP). ...

Article

nAIxt: A Light-Weight Processor Architecture for Efficient Computation of Neuron Models

Architecture of Computing Systems, Pages 3–17, https://doi.org/10.1007/978-3-031-66146-4_1

Abstract

The simulation of biological neural networks holds immense promise for advancing both neuroscience and artificial intelligence. Due to its high complexity, it requires powerful computers. However, the high proportion of communication and routing ...

Article

Adaptive Low-Cost Loop Expansion for Modulo Scheduling

Network and Parallel Computing, Pages 30–41, https://doi.org/10.1007/978-3-031-21395-3_3

Abstract

This paper presents a novel modulo scheduling method, which is called Expanded Modulo Scheduling (EMS). Unlike existing methods which regard loop unrolling and scheduling respectively, EMS supports adaptive loop expansion, and provides a unified ...

research-article

Evolutionary Algorithms for Instruction Scheduling, Operation Merging, and Register Allocation in VLIW Compilers

Journal of Signal Processing Systems (JSPS), Volume 92, Issue 7, Pages 655–678, https://doi.org/10.1007/s11265-019-01493-2

Abstract

Code generation for VLIW processors includes several optimization problems like code optimization, instruction scheduling, and register allocation. The high complexity of these problems usually does not allow the computation of the optimal ...

research-article

A design of EPIC type processor based on MIPS architecture

Artificial Life and Robotics (SPALR), Volume 25, Issue 1, Pages 59–63, https://doi.org/10.1007/s10015-019-00554-w

Abstract

This paper proposes an EPIC (Explicitly Parallel Instruction Computing Architecture) type processor based on MIPS. VLIW processors can execute multiple instructions simultaneously, but due to dependency of instructions, it is often impossible to ...

Article

Evaluation of Different Processor Architecture Organizations for On-site Electronics in Harsh Environments

Embedded Computer Systems: Architectures, Modeling, and Simulation, Pages 3–17, https://doi.org/10.1007/978-3-030-27562-4_1

Abstract

Microcontroller units used in harsh environmental conditions are manufactured using large semiconductor technology nodes in order to provide reliable operation, even at high temperatures or increased radiation exposition. These large technology ...

article

A templated programmable architecture for highly constrained embedded HD video processing

Journal of Real-Time Image Processing (SPJRTIP), Volume 16, Issue 1, Pages 143–160, https://doi.org/10.1007/s11554-018-0808-6

The implementation of a video reconstruction pipeline is required to improve the quality of images delivered by highly constrained devices. These algorithms require high computing capacities--several dozens of GOPs for real-time HD 1080p video streams. ...

research-article

A Study of Techniques to Increase Instruction Level Parallelisms

ISCSIC '18: Proceedings of the 2nd International Symposium on Computer Science and Intelligent Control, Article No.: 41, Pages 1–5, https://doi.org/10.1145/3284557.3284562

Instruction Level Parallelism (ILP) is the number of instructions in a program that can be executed simultaneously in a clock cycle. Microprocessors exploit ILP by means of several techniques that have been implemented in the last decades and ...

research-article

Adaptive and polymorphic VLIW processor to optimize fault tolerance, energy consumption, and performance

CF '18: Proceedings of the 15th ACM International Conference on Computing Frontiers, Pages 54–61, https://doi.org/10.1145/3203217.3203238

Because most traditional homogeneous and heterogeneous processors have a fixed design that limits its runtime adaptability, they are not able to cope with the varying application behavior when one considers the axes of fault tolerance, performance, and ...

article

Generating ASIPs with Reduced Number of Connections to the Register-File

International Journal of Parallel Programming (IJPP), Volume 45, Issue 6, Pages 1461–1487, https://doi.org/10.1007/s10766-017-0491-4

We propose automatic synthesis of application specific instruction set processors (ASIPs). We use pipeline execution of multi-op machine-instructions, e.g., *(reg1*reg2) = (*reg3)+(*reg4) (C-syntax) an ...

research-article

Domain specific compiler for coordinated signal processing in 5G testbed

SmartIoT '17: Proceedings of the Workshop on Smart Internet of Things, Article No.: 8, Pages 1–5, https://doi.org/10.1145/3132479.3132487

In the past years, the Internet of Things (IoT) is changing our working mode and life. In order to address the demand of massive data, low latency and low power consumption in IoT applications, the wireless industry is moving to its fifth generation (5G)...

research-article

Open Access

Extending Halide to Improve Software Development for Imaging DSPs

ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 3, Article No.: 21, Pages 1–25, https://doi.org/10.1145/3106343

Specialized Digital Signal Processors (DSPs), which can be found in a wide range of modern devices, play an important role in power-efficient, high-performance image processing. Applications including camera sensor post-processing and computer vision ...

research-article

Design and Implementation of Configurable SHIFT Instructions Targeted at Symmetrical Cipher Processing

Procedia Computer Science (PROCS), Volume 107, Issue C, Pages 225–230, https://doi.org/10.1016/j.procs.2017.03.083

High-performance and flexible configurable SHIFT instructions targeted at symmetrical cipher processing are proposed in this paper, in order to dispel the bottleneck of symmetrical cipher algorithms realized by universal processors. Through analyzing ...

research-article

Novel parallel Givens QR decomposition implementation on VLIW architecture with Efficient memory access for real time image processing applications

BDCA'17: Proceedings of the 2nd international Conference on Big Data, Cloud and Applications, Article No.: 89, Pages 1–6, https://doi.org/10.1145/3090354.3090445

Compressed Sensing (CS) methods impact is important in the health care systems. The acquisition and the processing speed of many medical imaging applications is highly improved using this technique. Orthogonal Matching Pursuit (OMP) is one of the widely ...

research-article

Exploiting Idle Hardware to Provide Low Overhead Fault Tolerance for VLIW Processors

ACM Journal on Emerging Technologies in Computing Systems (JETC), Volume 13, Issue 2, Article No.: 13, Pages 1–21, https://doi.org/10.1145/3001935

Because of technology scaling, the soft error rate has been increasing in digital circuits, which affects system reliability. Therefore, modern processors, including VLIW architectures, must have means to mitigate such effects to guarantee reliable ...

article

IEEE 802.11ac MIMO Transceiver Baseband Processing on a VLIW Processor

Journal of Signal Processing Systems (JSPS), Volume 85, Issue 1, Pages 167–182, https://doi.org/10.1007/s11265-015-1032-2

Wireless standards are evolving rapidly due to the exponential growth in the number of portable devices along with the applications with high data rate requirements. Adaptable software based signal processing implementations for these devices can make ...

research-article

Balanced loop retiming to effectively architect STT-RAM-based hybrid cache for VLIW processors

SAC '16: Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pages 1710–1716, https://doi.org/10.1145/2851613.2851670

Loop retiming has been extensively studied to maximize instruction-level parallelism (ILP) of multiple function units by rearranging the dependence delays in a uniform loop. Recently loop retiming technique has been proposed to mitigate the migration ...

research-article

Customizing VLIW processors from dynamically profiled execution traces

Microprocessors & Microsystems (MSYS), Volume 39, Issue 8, Pages 656–673, https://doi.org/10.1016/j.micpro.2015.09.005

The design philosophy of VLIW processors is to maximize instruction level parallelism (ILP) starting from compiler and machine code level to all the way down to memory and computational blocks. For this purpose, VLIW tailoring has been an important ...

article

pocl: A Performance-Portable OpenCL Implementation

International Journal of Parallel Programming (IJPP), Volume 43, Issue 5, Pages 752–785, https://doi.org/10.1007/s10766-014-0320-y

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the ...

research-article

Open Access

Aging-Aware Compilation for GP-GPUs

ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 2, Article No.: 24, Pages 1–20, https://doi.org/10.1145/2778984

General-purpose graphic processing units (GP-GPUs) offer high computational throughput using thousands of integrated processing elements (PEs). These PEs are stressed during workload execution, and negative bias temperature instability (NBTI) adversely ...
