Keyword: Halide : Search

research-article

CustomHalide – A new plugin of clang for loop optimization

ICCAI '23: Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence, Pages 557–567, https://doi.org/10.1145/3594315.3594372

Nowadays, Polyhedral and Halide techniques optimize complex nested loops in deep learning frameworks’ basic operator compilation process. However, faced with many algorithm platforms, a single framework cannot be used for deployment purposes. In some ...

research-article

Accelerating OpenVX Application Kernels Using Halide Scheduling

Journal of Signal Processing Systems (JSPS), Volume 95, Issue 5, Pages 623–642, https://doi.org/10.1007/s11265-023-01851-1

Abstract

In this study, we investigate how to use a Domain-Specific Language—Halide to accelerate and optimize OpenVX graphs. Halide is a new high-level image processing pipeline language. It offers developers to separate the program into algorithms and ...

research-article

Accelerating OpenVX through Halide and MLIR

Journal of Signal Processing Systems (JSPS), Volume 95, Issue 5, Pages 571–584, https://doi.org/10.1007/s11265-022-01826-8

Abstract

In recent years, as many social media and AI-enabled applications have become increasingly ubiquitous, camera-centric applications have emerged as the most popular category of apps on mobile phones. A programmer can develop a camera application in ...

Article

Halide Code Generation Framework in Phylanx

Euro-Par 2022: Parallel Processing Workshops, Pages 32–45, https://doi.org/10.1007/978-3-031-31209-0_3

Abstract

Separating algorithms from their computation schedule has become a de facto solution to tackle the challenges of developing high performance code on modern heterogeneous architectures. Common approaches include Domain-specific languages (DSLs) ...

research-article

Efficient microscopy image analysis on CPU-GPU systems with cost-aware irregular data partitioning

Journal of Parallel and Distributed Computing (JPDC), Volume 164, Issue C, Pages 40–54, https://doi.org/10.1016/j.jpdc.2022.02.004

Abstract

The analysis of high resolution whole slide tissue images is a computationally expensive task, which adversely impacts effective use of pathology imaging data in research. We propose runtime solutions to enable efficient execution of ...

Highlights

Pathology image analysis is computed intensive due to the high-resolution images.

research-article

A fast and concise parallel implementation of the 8x8 2D forward and inverse DCTs using halide

Journal of Parallel and Distributed Computing (JPDC), Volume 163, Issue C, Pages 20–29, https://doi.org/10.1016/j.jpdc.2022.01.014

Highlights

IDCT and FDCT written in Halide.
200 Lines of Halide replace over 20,000 lines ...

Abstract

The Discrete Cosine Transform (DCT) is commonly used for image and video coding and very efficient implementations of the forward and inverse transforms are of great importance. The popular libjpeg-turbo library contains handwritten, ...

research-article

Open Access

Efficient automatic scheduling of imaging and vision pipelines for the GPU

Proceedings of the ACM on Programming Languages (PACMPL), Volume 5, Issue OOPSLA, Article No.: 109, Pages 1–28, https://doi.org/10.1145/3485486

We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or hand-optimized kernels. ...

research-article

Open Access

Schedule Synthesis for Halide Pipelines on GPUs

ACM Transactions on Architecture and Code Optimization (TACO), Volume 17, Issue 3, Article No.: 23, Pages 1–25, https://doi.org/10.1145/3406117

The Halide DSL and compiler have enabled high-performance code generation for image processing pipelines targeting heterogeneous architectures through the separation of algorithmic description and optimization schedule. However, automatic schedule ...

research-article

Programming tensor cores from an image processing DSL

SCOPES '20: Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems, Pages 36–41, https://doi.org/10.1145/3378678.3391880

Tensor Cores (TCUs) are specialized units first introduced by NVIDIA in the Volta microarchitecture in order to accelerate matrix multiplications for deep learning and linear algebra workloads. While these units have proved to be capable of providing ...

research-article

Accelerate DNN Performance with Sparse Matrix Compression in Halide

ICPP Workshops '19: Workshop Proceedings of the 48th International Conference on Parallel Processing, Article No.: 14, Pages 1–6, https://doi.org/10.1145/3339186.3339194

Machine learning nowadays is profoundly impacting every aspect of our lives. With the evolution of the machine learning, many techniques, such as deep learning, improve the accuracy and performance of machine learning. Deep learning is a set of ML ...

poster

Sparse-Matrix Compression Primitives with OpenCL Framework to Support Halide

IWOCL '19: Proceedings of the International Workshop on OpenCL, Article No.: 24, Pages 1–2, https://doi.org/10.1145/3318170.3318179

Halide and OpenCL now play important roles for heterogeneous multi-core computing. OpenCL provides vendor-level support and Halide provides domain-specific support such as vision processing and AI model (TVM Halide IR). Halide also provides flexible ...

research-article

Open Access

Schedule Synthesis for Halide Pipelines through Reuse Analysis

ACM Transactions on Architecture and Code Optimization (TACO), Volume 16, Issue 2, Article No.: 10, Pages 1–22, https://doi.org/10.1145/3310248

Efficient code generation for image processing applications continues to pose a challenge in a domain where high performance is often necessary to meet real-time constraints. The inherently complex structure found in most image-processing pipelines, the ...

article

A Halide-based Synergistic Computing Framework for Heterogeneous Systems

Journal of Signal Processing Systems (JSPS), Volume 91, Issue 3-4, Pages 219–233, https://doi.org/10.1007/s11265-017-1283-1

New programming models have been developed to embrace contemporary heterogeneous machines, each of which may contain several types of processors, e.g., CPUs, GPUs, FPGAs and ASICs. Unlike the conventional ones, which use separate programming schemes for ...

research-article

Loop transformations leveraging hardware prefetching

CGO '18: Proceedings of the 2018 International Symposium on Code Generation and Optimization, Pages 254–264, https://doi.org/10.1145/3168823

Memory-bound applications heavily depend on the bandwidth of the system in order to achieve high performance. Improving temporal and/or spatial locality through loop transformations is a common way of mitigating this dependency. However, choosing the ...

research-article

A profile-guided synergistic computation framework for Halide

Journal of Systems Architecture: the EUROMICRO Journal (JOSA), Volume 81, Issue C, Pages 54–61

Recently, heterogeneous computing that incorporates the main processor(s) with accelerator(s) for boosting the performance of applications becomes popular. While joining forces of the accelerators could help improve performance, it may also sometimes ...

research-article

Open Access

Extending Halide to Improve Software Development for Imaging DSPs

ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 3, Article No.: 21, Pages 1–25, https://doi.org/10.1145/3106343

Specialized Digital Signal Processors (DSPs), which can be found in a wide range of modern devices, play an important role in power-efficient, high-performance image processing. Applications including camera sensor post-processing and computer vision ...
