- research-article, September 2024
Optimizing VLIW Instruction Scheduling via a Two-Dimensional Constrained Dynamic Programming
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 29, Issue 5, Article No.: 83, Pages 1–20, https://doi.org/10.1145/3643135
Typical embedded processors, such as Digital Signal Processors (DSPs), usually adopt Very Long Instruction Word (VLIW) architecture to improve computing efficiency. The performance of VLIW processors heavily relies on Instruction-Level Parallelism (ILP). ...
- Article, August 2024
nAIxt: A Light-Weight Processor Architecture for Efficient Computation of Neuron Models
Abstract: The simulation of biological neural networks holds immense promise for advancing both neuroscience and artificial intelligence. Due to its high complexity, it requires powerful computers. However, the high proportion of communication and routing ...
- Article, September 2022
Adaptive Low-Cost Loop Expansion for Modulo Scheduling
Abstract: This paper presents a novel modulo scheduling method, which is called Expanded Modulo Scheduling (EMS). Unlike existing methods which regard loop unrolling and scheduling respectively, EMS supports adaptive loop expansion, and provides a unified ...
- research-article, July 2020
Evolutionary Algorithms for Instruction Scheduling, Operation Merging, and Register Allocation in VLIW Compilers
Journal of Signal Processing Systems (JSPS), Volume 92, Issue 7, Pages 655–678, https://doi.org/10.1007/s11265-019-01493-2
Abstract: Code generation for VLIW processors includes several optimization problems like code optimization, instruction scheduling, and register allocation. The high complexity of these problems usually does not allow the computation of the optimal ...
- research-article, February 2020
A design of EPIC type processor based on MIPS architecture
Artificial Life and Robotics (SPALR), Volume 25, Issue 1, Pages 59–63, https://doi.org/10.1007/s10015-019-00554-w
Abstract: This paper proposes an EPIC (Explicitly Parallel Instruction Computing Architecture) type processor based on MIPS. VLIW processors can execute multiple instructions simultaneously, but due to dependency of instructions, it is often impossible to ...
- Article, July 2019
Evaluation of Different Processor Architecture Organizations for On-site Electronics in Harsh Environments
- Sven Gesper,
- Moritz Weißbrich,
- Stephan Nolting,
- Tobias Stuckenberg,
- Pekka Jääskeläinen,
- Holger Blume,
- Guillermo Payá-Vayá
Embedded Computer Systems: Architectures, Modeling, and Simulation, Pages 3–17, https://doi.org/10.1007/978-3-030-27562-4_1
Abstract: Microcontroller units used in harsh environmental conditions are manufactured using large semiconductor technology nodes in order to provide reliable operation, even at high temperatures or increased radiation exposure. These large technology ...
- article, February 2019
A templated programmable architecture for highly constrained embedded HD video processing
Journal of Real-Time Image Processing (SPJRTIP), Volume 16, Issue 1, Pages 143–160, https://doi.org/10.1007/s11554-018-0808-6
The implementation of a video reconstruction pipeline is required to improve the quality of images delivered by highly constrained devices. These algorithms require high computing capacities--several dozens of GOPs for real-time HD 1080p video streams. ...
- research-article, September 2018
A Study of Techniques to Increase Instruction Level Parallelisms
ISCSIC '18: Proceedings of the 2nd International Symposium on Computer Science and Intelligent Control, Article No.: 41, Pages 1–5, https://doi.org/10.1145/3284557.3284562
Instruction Level Parallelism (ILP) is the number of instructions in a program that can be executed simultaneously in a clock cycle. Microprocessors exploit ILP by means of several techniques that have been implemented in the last decades and ...
- research-article, May 2018
Adaptive and polymorphic VLIW processor to optimize fault tolerance, energy consumption, and performance
CF '18: Proceedings of the 15th ACM International Conference on Computing Frontiers, Pages 54–61, https://doi.org/10.1145/3203217.3203238
Because most traditional homogeneous and heterogeneous processors have a fixed design that limits their runtime adaptability, they are not able to cope with the varying application behavior when one considers the axes of fault tolerance, performance, and ...
- article, December 2017
Generating ASIPs with Reduced Number of Connections to the Register-File
International Journal of Parallel Programming (IJPP), Volume 45, Issue 6, Pages 1461–1487, https://doi.org/10.1007/s10766-017-0491-4
We propose automatic synthesis of application specific instruction set processors (ASIPs). We use pipeline execution of multi-op machine-instructions, e.g., *(reg1*reg2) = (*reg3)+(*reg4) (C-syntax) an ...
- research-article, October 2017
Domain specific compiler for coordinated signal processing in 5G testbed
SmartIoT '17: Proceedings of the Workshop on Smart Internet of Things, Article No.: 8, Pages 1–5, https://doi.org/10.1145/3132479.3132487
In recent years, the Internet of Things (IoT) has been changing the way we work and live. To address the demand for massive data, low latency, and low power consumption in IoT applications, the wireless industry is moving to its fifth generation (5G)...
- research-article, August 2017
Extending Halide to Improve Software Development for Imaging DSPs
ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 3, Article No.: 21, Pages 1–25, https://doi.org/10.1145/3106343
Specialized Digital Signal Processors (DSPs), which can be found in a wide range of modern devices, play an important role in power-efficient, high-performance image processing. Applications including camera sensor post-processing and computer vision ...
- research-article, April 2017
Design and Implementation of Configurable SHIFT Instructions Targeted at Symmetrical Cipher Processing
Procedia Computer Science (PROCS), Volume 107, Issue C, Pages 225–230, https://doi.org/10.1016/j.procs.2017.03.083
High-performance and flexible configurable SHIFT instructions targeted at symmetrical cipher processing are proposed in this paper, in order to dispel the bottleneck of symmetrical cipher algorithms realized by universal processors. Through analyzing ...
- research-article, March 2017
Novel parallel Givens QR decomposition implementation on VLIW architecture with Efficient memory access for real time image processing applications
BDCA'17: Proceedings of the 2nd international Conference on Big Data, Cloud and Applications, Article No.: 89, Pages 1–6, https://doi.org/10.1145/3090354.3090445
The impact of Compressed Sensing (CS) methods is important in health care systems. The acquisition and processing speed of many medical imaging applications is greatly improved using this technique. Orthogonal Matching Pursuit (OMP) is one of the widely ...
- research-article, January 2017
Exploiting Idle Hardware to Provide Low Overhead Fault Tolerance for VLIW Processors
- Anderson L. Sartor,
- Arthur F. Lorenzon,
- Luigi Carro,
- Fernanda Kastensmidt,
- Stephan Wong,
- Antonio C. S. Beck
ACM Journal on Emerging Technologies in Computing Systems (JETC), Volume 13, Issue 2, Article No.: 13, Pages 1–21, https://doi.org/10.1145/3001935
Because of technology scaling, the soft error rate has been increasing in digital circuits, which affects system reliability. Therefore, modern processors, including VLIW architectures, must have means to mitigate such effects to guarantee reliable ...
- article, October 2016
IEEE 802.11ac MIMO Transceiver Baseband Processing on a VLIW Processor
Journal of Signal Processing Systems (JSPS), Volume 85, Issue 1, Pages 167–182, https://doi.org/10.1007/s11265-015-1032-2
Wireless standards are evolving rapidly due to the exponential growth in the number of portable devices along with the applications with high data rate requirements. Adaptable software based signal processing implementations for these devices can make ...
- research-article, April 2016
Balanced loop retiming to effectively architect STT-RAM-based hybrid cache for VLIW processors
SAC '16: Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pages 1710–1716, https://doi.org/10.1145/2851613.2851670
Loop retiming has been extensively studied to maximize instruction-level parallelism (ILP) of multiple function units by rearranging the dependence delays in a uniform loop. Recently, a loop retiming technique has been proposed to mitigate the migration ...
- research-article, November 2015
Customizing VLIW processors from dynamically profiled execution traces
Microprocessors & Microsystems (MSYS), Volume 39, Issue 8, Pages 656–673, https://doi.org/10.1016/j.micpro.2015.09.005
The design philosophy of VLIW processors is to maximize instruction level parallelism (ILP), from the compiler and machine code level all the way down to memory and computational blocks. For this purpose, VLIW tailoring has been an important ...
- article, October 2015
pocl: A Performance-Portable OpenCL Implementation
International Journal of Parallel Programming (IJPP), Volume 43, Issue 5, Pages 752–785, https://doi.org/10.1007/s10766-014-0320-y
OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the ...
- research-article, July 2015
Aging-Aware Compilation for GP-GPUs
ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 2, Article No.: 24, Pages 1–20, https://doi.org/10.1145/2778984
General-purpose graphic processing units (GP-GPUs) offer high computational throughput using thousands of integrated processing elements (PEs). These PEs are stressed during workload execution, and negative bias temperature instability (NBTI) adversely ...