- research-article, August 2023
CustomHalide – A new plugin of clang for loop optimization
ICCAI '23: Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence, Pages 557–567, https://doi.org/10.1145/3594315.3594372
Nowadays, Polyhedral and Halide techniques optimize complex nested loops in deep learning frameworks’ basic operator compilation process. However, faced with many algorithm platforms, a single framework cannot be used for deployment purposes. In some ...
- research-article, February 2023
Accelerating OpenVX Application Kernels Using Halide Scheduling
Journal of Signal Processing Systems (JSPS), Volume 95, Issue 5, Pages 623–642, https://doi.org/10.1007/s11265-023-01851-1
Abstract: In this study, we investigate how to use a Domain-Specific Language, Halide, to accelerate and optimize OpenVX graphs. Halide is a new high-level image processing pipeline language. It offers developers to separate the program into algorithms and ...
- research-article, February 2023
Accelerating OpenVX through Halide and MLIR
Journal of Signal Processing Systems (JSPS), Volume 95, Issue 5, Pages 571–584, https://doi.org/10.1007/s11265-022-01826-8
Abstract: In recent years, as many social media and AI-enabled applications have become increasingly ubiquitous, camera-centric applications have emerged as the most popular category of apps on mobile phones. A programmer can develop a camera application in ...
- Article, May 2023
Halide Code Generation Framework in Phylanx
Abstract: Separating algorithms from their computation schedule has become a de facto solution to tackle the challenges of developing high performance code on modern heterogeneous architectures. Common approaches include Domain-specific languages (DSLs) ...
- research-article, June 2022
Efficient microscopy image analysis on CPU-GPU systems with cost-aware irregular data partitioning
- Willian Barreiros,
- Alba C.M.A. Melo,
- Jun Kong,
- Renato Ferreira,
- Tahsin M. Kurc,
- Joel H. Saltz,
- George Teodoro
Journal of Parallel and Distributed Computing (JPDC), Volume 164, Issue C, Pages 40–54, https://doi.org/10.1016/j.jpdc.2022.02.004
Abstract: The analysis of high resolution whole slide tissue images is a computationally expensive task, which adversely impacts effective use of pathology imaging data in research. We propose runtime solutions to enable efficient execution of ...
Highlights:
- Pathology image analysis is computed intensive due to the high-resolution images.
- research-article, May 2022
A fast and concise parallel implementation of the 8x8 2D forward and inverse DCTs using halide
Journal of Parallel and Distributed Computing (JPDC), Volume 163, Issue C, Pages 20–29, https://doi.org/10.1016/j.jpdc.2022.01.014
Highlights:
- IDCT and FDCT written in Halide.
- 200 Lines of Halide replace over 20,000 lines ...
The Discrete Cosine Transform (DCT) is commonly used for image and video coding and very efficient implementations of the forward and inverse transforms are of great importance. The popular libjpeg-turbo library contains handwritten, ...
- research-article, October 2021
Efficient automatic scheduling of imaging and vision pipelines for the GPU
Proceedings of the ACM on Programming Languages (PACMPL), Volume 5, Issue OOPSLA, Article No.: 109, Pages 1–28, https://doi.org/10.1145/3485486
We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or hand-optimized kernels. ...
- research-article, August 2020
Schedule Synthesis for Halide Pipelines on GPUs
ACM Transactions on Architecture and Code Optimization (TACO), Volume 17, Issue 3, Article No.: 23, Pages 1–25, https://doi.org/10.1145/3406117
The Halide DSL and compiler have enabled high-performance code generation for image processing pipelines targeting heterogeneous architectures through the separation of algorithmic description and optimization schedule. However, automatic schedule ...
- research-article, May 2020
Programming tensor cores from an image processing DSL
SCOPES '20: Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems, Pages 36–41, https://doi.org/10.1145/3378678.3391880
Tensor Cores (TCUs) are specialized units first introduced by NVIDIA in the Volta microarchitecture in order to accelerate matrix multiplications for deep learning and linear algebra workloads. While these units have proved to be capable of providing ...
- research-article, August 2019
Accelerate DNN Performance with Sparse Matrix Compression in Halide
ICPP Workshops '19: Workshop Proceedings of the 48th International Conference on Parallel Processing, Article No.: 14, Pages 1–6, https://doi.org/10.1145/3339186.3339194
Machine learning nowadays is profoundly impacting every aspect of our lives. With the evolution of the machine learning, many techniques, such as deep learning, improve the accuracy and performance of machine learning. Deep learning is a set of ML ...
- poster, May 2019
Sparse-Matrix Compression Primitives with OpenCL Framework to Support Halide
IWOCL '19: Proceedings of the International Workshop on OpenCL, Article No.: 24, Pages 1–2, https://doi.org/10.1145/3318170.3318179
Halide and OpenCL now play important roles for heterogeneous multi-core computing. OpenCL provides vendor-level support and Halide provides domain-specific support such as vision processing and AI model (TVM Halide IR). Halide also provides flexible ...
- research-article, April 2019
Schedule Synthesis for Halide Pipelines through Reuse Analysis
ACM Transactions on Architecture and Code Optimization (TACO), Volume 16, Issue 2, Article No.: 10, Pages 1–22, https://doi.org/10.1145/3310248
Efficient code generation for image processing applications continues to pose a challenge in a domain where high performance is often necessary to meet real-time constraints. The inherently complex structure found in most image-processing pipelines, the ...
- article, March 2019
A Halide-based Synergistic Computing Framework for Heterogeneous Systems
Journal of Signal Processing Systems (JSPS), Volume 91, Issue 3-4, Pages 219–233, https://doi.org/10.1007/s11265-017-1283-1
New programming models have been developed to embrace contemporary heterogeneous machines, each of which may contain several types of processors, e.g., CPUs, GPUs, FPGAs and ASICs. Unlike the conventional ones, which use separate programming schemes for ...
- research-article, February 2018
Loop transformations leveraging hardware prefetching
CGO '18: Proceedings of the 2018 International Symposium on Code Generation and Optimization, Pages 254–264, https://doi.org/10.1145/3168823
Memory-bound applications heavily depend on the bandwidth of the system in order to achieve high performance. Improving temporal and/or spatial locality through loop transformations is a common way of mitigating this dependency. However, choosing the ...
- research-article, November 2017
A profile-guided synergistic computation framework for Halide
Recently, heterogeneous computing that incorporates the main processor(s) with accelerator(s) for boosting the performance of applications becomes popular. While joining forces of the accelerators could help improve performance, it may also sometimes ...
- research-article, August 2017
Extending Halide to Improve Software Development for Imaging DSPs
ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 3, Article No.: 21, Pages 1–25, https://doi.org/10.1145/3106343
Specialized Digital Signal Processors (DSPs), which can be found in a wide range of modern devices, play an important role in power-efficient, high-performance image processing. Applications including camera sensor post-processing and computer vision ...