Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3605731.3605904acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article

Enhancing LLVM Optimizations for Linear Recurrence Programs on RVV

Published: 07 September 2023 Publication History

Abstract

The RISC-V Vector Extension (RVV) has emerged as a promising vector architecture for high-performance computing. It enables parallel computing capability for RISC-V CPUs by introducing additional vector instructions and vector registers. To fully utilize the potential of RVV, it is crucial to optimize the auto-vectorization capabilities of compilers. Auto-vectorization is a widely used method for parallelizing scalar programs on vector machines. It examines the input program to identify the parallelizable section and then maps it to the underlying vector architecture. This paper focuses on enhancing LLVM’s auto-vectorization for linear recurrence programs on RVV, with a specific emphasis on a well-studied computation pattern called scan (all-prefix-sums). Linear recurrence programs are prevalent in diverse computational domains, and their efficient execution is vital for achieving optimal performance. However, the auto-vectorization on current LLVM’s vectorizer is hindered by the loop-carried data dependence of recurrence programs. In this work, we propose novel techniques to address the challenges of auto-vectorizing linear recurrence programs on RVV, leveraging the unique features of RVV and extending LLVM’s vectorization capabilities. The experiment shows that our auto-vectorized plus-scan can achieved 17.64x speedup on RVV compared to LLVM 16.0.0.

References

[1]
1986. The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range. (1986).
[2]
Neil Adit and Adrian Sampson. 2022. Performance Left on the Table: An Evaluation of Compiler Autovectorization for RISC-V. IEEE Micro 42, 5 (2022), 41–48. https://doi.org/10.1109/MM.2022.3184867
[3]
Guy E Blelloch. 1990. Vector models for data-parallel computing. Vol. 2. MIT press Cambridge.
[4]
Guy E. Blelloch. 2004. Prefix sums and their applications. (5 2004). https://doi.org/10.1184/R1/6608579.v1
[5]
D. Callahan, J. Dongarra, and D. Levine. 1988. Vectorizing compilers: a test suite and results. In Supercomputing ’88:Proceedings of the 1988 ACM/IEEE Conference on Supercomputing, Vol. I. 98–105. https://doi.org/10.1109/SUPERC.1988.44642
[6]
Allan L. Fisher and Anwar M. Ghuloum. 1994. Parallelizing Complex Scans and Reductions. SIGPLAN Not. 29, 6 (jun 1994), 135–146. https://doi.org/10.1145/773473.178255
[7]
Mark Harris 2007. Optimizing parallel reduction in CUDA. Nvidia developer technology 2, 4 (2007), 70.
[8]
W Daniel Hillis and Guy L Steele Jr. 1986. Data parallel algorithms. Commun. ACM 29, 12 (1986), 1170–1183.
[9]
Peng Jiang, Linchuan Chen, and Gagan Agrawal. 2018. Revealing Parallel Scans and Reductions in Recurrences through Function Reconstruction. In Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (Limassol, Cyprus) (PACT ’18). Association for Computing Machinery, New York, NY, USA, Article 10, 13 pages. https://doi.org/10.1145/3243176.3243204
[10]
Ralf Karrenberg and Sebastian Hack. 2011. Whole-Function Vectorization. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization(CGO ’11). IEEE Computer Society, USA, 141–150.
[11]
Peter M. Kogge and Harold S. Stone. 1973. A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. IEEE Trans. Comput. C-22, 8 (1973), 786–793. https://doi.org/10.1109/TC.1973.5009159
[12]
Richard E Ladner and Michael J Fischer. 1980. Parallel prefix computation. Journal of the ACM (JACM) 27, 4 (1980), 831–838.
[13]
Hung-Ming Lai and Jenq-Kuen Lee. 2023. Efficient Support of the Scan Vector Model for RISC-V Vector Extension. In Workshop Proceedings of the 51st International Conference on Parallel Processing (Bordeaux, France) (ICPP Workshops ’22). Association for Computing Machinery, New York, NY, USA, Article 15, 8 pages. https://doi.org/10.1145/3547276.3548518
[14]
Yu-Te Lin and Jenq-Kuen Lee. 2015. Vector data flow analysis for SIMD optimizations on OpenCL programs. Concurrency and Computation: Practice and Experience 28 (01 2015), n/a–n/a. https://doi.org/10.1002/cpe.3714
[15]
D. Nuzman and R. Henderson. 2006. Multi-platform auto-vectorization. In International Symposium on Code Generation and Optimization (CGO’06). 11 pp.–294. https://doi.org/10.1109/CGO.2006.25
[16]
G. Ren, P. Wu, and D. Padua. 2005. An empirical study on the vectorization of multimedia applications for multimedia extensions. In 19th IEEE International Parallel and Distributed Processing Symposium. 10 pp.–. https://doi.org/10.1109/IPDPS.2005.94
[17]
Y. Tanaka, K. Iwasawa, S. Gotoo, and Y. Umetani. 1988. Compiling techniques for first-order linear recurrences on a vector computer. In Supercomputing ’88:Proceedings of the 1988 ACM/IEEE Conference on Supercomputing, Vol. I. 174–181. https://doi.org/10.1109/SUPERC.1988.44651
[18]
Xinmin Tian, Hideki Saito, Ernesto Su, Jin Lin, Satish Guggilla, Diego Caballero, Matt Masten, Andrew Savonichev, Michael Rice, Elena Demikhovsky, Ayal Zaks, Gil Rapaport, Abhinav Gaba, Vasileios Porpodas, and Eric Garcia. 2017. LLVM Compiler Implementation for Explicit Parallelization and SIMD Vectorization(LLVM-HPC’17). Association for Computing Machinery, New York, NY, USA, Article 4, 11 pages. https://doi.org/10.1145/3148173.3148191

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops
August 2023
217 pages
ISBN:9798400708428
DOI:10.1145/3605731
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 September 2023

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. RISC-V vector extension
  2. SIMD
  3. auto-vectorization
  4. recurrence
  5. scan

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

ICPP-W 2023

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 130
    Total Downloads
  • Downloads (Last 12 months)106
  • Downloads (Last 6 weeks)18
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media