research-article

Enhancing LLVM Optimizations for Linear Recurrence Programs on RVV

Authors:

Yuan-Shin HwangAuthors Info & Claims

ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops

Pages 79 - 87

https://doi.org/10.1145/3605731.3605904

Published: 07 September 2023 Publication History

Abstract

The RISC-V Vector Extension (RVV) has emerged as a promising vector architecture for high-performance computing. It enables parallel computing capability for RISC-V CPUs by introducing additional vector instructions and vector registers. To fully utilize the potential of RVV, it is crucial to optimize the auto-vectorization capabilities of compilers. Auto-vectorization is a widely used method for parallelizing scalar programs on vector machines. It examines the input program to identify the parallelizable section and then maps it to the underlying vector architecture. This paper focuses on enhancing LLVM’s auto-vectorization for linear recurrence programs on RVV, with a specific emphasis on a well-studied computation pattern called scan (all-prefix-sums). Linear recurrence programs are prevalent in diverse computational domains, and their efficient execution is vital for achieving optimal performance. However, the auto-vectorization on current LLVM’s vectorizer is hindered by the loop-carried data dependence of recurrence programs. In this work, we propose novel techniques to address the challenges of auto-vectorizing linear recurrence programs on RVV, leveraging the unique features of RVV and extending LLVM’s vectorization capabilities. The experiment shows that our auto-vectorized plus-scan can achieved 17.64x speedup on RVV compared to LLVM 16.0.0.

References

[1]

1986. The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range. (1986).

[2]

Neil Adit and Adrian Sampson. 2022. Performance Left on the Table: An Evaluation of Compiler Autovectorization for RISC-V. IEEE Micro 42, 5 (2022), 41–48. https://doi.org/10.1109/MM.2022.3184867

Digital Library

[3]

Guy E Blelloch. 1990. Vector models for data-parallel computing. Vol. 2. MIT press Cambridge.

[4]

Guy E. Blelloch. 2004. Prefix sums and their applications. (5 2004). https://doi.org/10.1184/R1/6608579.v1

[5]

D. Callahan, J. Dongarra, and D. Levine. 1988. Vectorizing compilers: a test suite and results. In Supercomputing ’88:Proceedings of the 1988 ACM/IEEE Conference on Supercomputing, Vol. I. 98–105. https://doi.org/10.1109/SUPERC.1988.44642

[6]

Allan L. Fisher and Anwar M. Ghuloum. 1994. Parallelizing Complex Scans and Reductions. SIGPLAN Not. 29, 6 (jun 1994), 135–146. https://doi.org/10.1145/773473.178255

Digital Library

[7]

Mark Harris 2007. Optimizing parallel reduction in CUDA. Nvidia developer technology 2, 4 (2007), 70.

[8]

W Daniel Hillis and Guy L Steele Jr. 1986. Data parallel algorithms. Commun. ACM 29, 12 (1986), 1170–1183.

Digital Library

[9]

Peng Jiang, Linchuan Chen, and Gagan Agrawal. 2018. Revealing Parallel Scans and Reductions in Recurrences through Function Reconstruction. In Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (Limassol, Cyprus) (PACT ’18). Association for Computing Machinery, New York, NY, USA, Article 10, 13 pages. https://doi.org/10.1145/3243176.3243204

Digital Library

[10]

Ralf Karrenberg and Sebastian Hack. 2011. Whole-Function Vectorization. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization(CGO ’11). IEEE Computer Society, USA, 141–150.

[11]

Peter M. Kogge and Harold S. Stone. 1973. A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. IEEE Trans. Comput. C-22, 8 (1973), 786–793. https://doi.org/10.1109/TC.1973.5009159

Digital Library

[12]

Richard E Ladner and Michael J Fischer. 1980. Parallel prefix computation. Journal of the ACM (JACM) 27, 4 (1980), 831–838.

Digital Library

[13]

Hung-Ming Lai and Jenq-Kuen Lee. 2023. Efficient Support of the Scan Vector Model for RISC-V Vector Extension. In Workshop Proceedings of the 51st International Conference on Parallel Processing (Bordeaux, France) (ICPP Workshops ’22). Association for Computing Machinery, New York, NY, USA, Article 15, 8 pages. https://doi.org/10.1145/3547276.3548518

Digital Library

[14]

Yu-Te Lin and Jenq-Kuen Lee. 2015. Vector data flow analysis for SIMD optimizations on OpenCL programs. Concurrency and Computation: Practice and Experience 28 (01 2015), n/a–n/a. https://doi.org/10.1002/cpe.3714

Digital Library

[15]

D. Nuzman and R. Henderson. 2006. Multi-platform auto-vectorization. In International Symposium on Code Generation and Optimization (CGO’06). 11 pp.–294. https://doi.org/10.1109/CGO.2006.25

Digital Library

[16]

G. Ren, P. Wu, and D. Padua. 2005. An empirical study on the vectorization of multimedia applications for multimedia extensions. In 19th IEEE International Parallel and Distributed Processing Symposium. 10 pp.–. https://doi.org/10.1109/IPDPS.2005.94

Digital Library

[17]

Y. Tanaka, K. Iwasawa, S. Gotoo, and Y. Umetani. 1988. Compiling techniques for first-order linear recurrences on a vector computer. In Supercomputing ’88:Proceedings of the 1988 ACM/IEEE Conference on Supercomputing, Vol. I. 174–181. https://doi.org/10.1109/SUPERC.1988.44651

[18]

Xinmin Tian, Hideki Saito, Ernesto Su, Jin Lin, Satish Guggilla, Diego Caballero, Matt Masten, Andrew Savonichev, Michael Rice, Elena Demikhovsky, Ayal Zaks, Gil Rapaport, Abhinav Gaba, Vasileios Porpodas, and Eric Garcia. 2017. LLVM Compiler Implementation for Explicit Parallelization and SIMD Vectorization(LLVM-HPC’17). Association for Computing Machinery, New York, NY, USA, Article 4, 11 pages. https://doi.org/10.1145/3148173.3148191

Digital Library

Index Terms

Enhancing LLVM Optimizations for Linear Recurrence Programs on RVV
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

VW-SLP: auto-vectorization with adaptive vector width
PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques

Auto-vectorization techniques allow the compiler to automatically generate SIMD vector code out of scalar code. SLP is a commonly-used algorithm for converting straight-line code into vector code, which complements the loop-based traditional ...
Vectorization-aware loop unrolling with seed forwarding
CC 2020: Proceedings of the 29th International Conference on Compiler Construction

Loop unrolling is a widely adopted loop transformation, commonly used for enabling subsequent optimizations. Straight-line-code vectorization (SLP) is an optimization that benefits from unrolling. SLP converts isomorphic instruction sequences into ...
Multi-dimensional Vectorization in LLVM
WPMVP'19: Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing

Loop vectorization is a classic technique to exploit SIMD instructions in a productive way. In multi-dimensional vectorization, multiple loops of a loop nest are vectorized at once. This exposes opportunities for data reuse, register tiling and more ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops

August 2023

217 pages

ISBN:9798400708428

DOI:10.1145/3605731

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 September 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

RealTek
Taiwan NSTC
Google

Conference

ICPP-W 2023

ICPP-W 2023: 52nd International Conference on Parallel Processing Workshops

August 7 - 10, 2023

UT, Salt Lake City, USA

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
130
Total Downloads

Downloads (Last 12 months)106
Downloads (Last 6 weeks)18

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents