research-article

Accelerating OpenVX through Halide and MLIR

Authors:

Shih-Wei Liao

Journal of Signal Processing Systems, Volume 95, Issue 5

Pages 571 - 584

https://doi.org/10.1007/s11265-022-01826-8

Published: 01 February 2023

Abstract

In recent years, as social media and AI-enabled applications have become ubiquitous, camera-centric applications have emerged as the most popular category of apps on mobile phones. Using the APIs that frameworks provide, a programmer can develop a camera application in an hour or less without any domain knowledge, which has allowed the technology to spread rapidly. OpenVX is a computer vision framework designed with performance and portability as central considerations. This paper proposes a new framework that effectively accelerates OpenVX with Halide and MLIR. Our framework inherits Halide’s decoupling of algorithms from schedules, together with scheduling facilities such as an auto-scheduler, and MLIR’s multi-level dialects, which structure the operations and the data accesses. To generate more efficient programs, we propose a bridge that transforms programs written in OpenVX into Halide and then translates them from Halide to MLIR. In the process, our framework draws on both Halide’s scheduling and MLIR’s dialects to generate binary code that executes faster.
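
The decoupling of an algorithm from its schedule, which the abstract attributes to Halide, can be illustrated with a minimal sketch. It is plain Python, not Halide, OpenVX, or MLIR, and it is not the paper's framework. Every name in it (`blur_x`, `schedule_row_major`, `schedule_tiled`) is hypothetical and chosen only for illustration. The algorithm says what each output pixel is; each schedule decides only the order in which pixels are computed, so swapping schedules should not change the result.

```python
# Illustrative sketch only: plain Python, not the Halide or MLIR API.
# All function names here are hypothetical.

def blur_x(image, x, y):
    """Algorithm: value of one output pixel (3-tap horizontal box blur,
    with x clamped at the image borders)."""
    width = len(image[0])
    left = image[y][max(x - 1, 0)]
    right = image[y][min(x + 1, width - 1)]
    return (left + image[y][x] + right) // 3


def schedule_row_major(algorithm, image):
    """Schedule 1: evaluate the algorithm row by row."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = algorithm(image, x, y)
    return out


def schedule_tiled(algorithm, image, tile=2):
    """Schedule 2: evaluate the same algorithm tile by tile.
    Only the loop order differs from schedule 1."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    out[y][x] = algorithm(image, x, y)
    return out


image = [[0, 3, 6, 9], [12, 15, 18, 21], [24, 27, 30, 33]]
row_major = schedule_row_major(blur_x, image)
tiled = schedule_tiled(blur_x, image)
```

Both schedules produce the same output image. In a real compiler the choice between them would affect locality and parallelism, not correctness, which is what makes automatic schedule search possible.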

Published In

Journal of Signal Processing Systems Volume 95, Issue 5

May 2023

100 pages

ISSN:1939-8018

EISSN:1939-8115

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 February 2023

Accepted: 30 November 2022

Revision received: 11 November 2022

Received: 08 July 2022
