DOI: 10.1145/3302516.3307357
research-article
Open access

Revec: program rejuvenation through revectorization

Published: 16 February 2019

Abstract

Modern microprocessors are equipped with Single Instruction Multiple Data (SIMD) or vector instructions, which expose data-level parallelism at a fine granularity. Programmers exploit this parallelism by using low-level vector intrinsics in their code. However, once programs are written using vector intrinsics of a specific instruction set, the code becomes non-portable. Modern compilers are unable to analyze and retarget the code to newer vector instruction sets. Hence, programmers have to manually rewrite the same code using vector intrinsics of a newer generation to exploit the higher data widths and capabilities of new instruction sets. This process is tedious and error-prone, and it requires maintaining multiple code bases. We propose Revec, a compiler optimization pass that revectorizes already vectorized code by retargeting it to use vector instructions of newer generations. The transformation is transparent, happening at the compiler intermediate representation level, and enables performance portability of hand-vectorized code.
Revec can achieve performance improvements in real-world performance-critical kernels. In particular, Revec achieves geometric mean speedups of 1.160× and 1.430× on fast integer unpacking kernels, and speedups of 1.145× and 1.195× on hand-vectorized x265 media codec kernels, when retargeting their SSE-series implementations to use AVX2 and AVX-512 vector instructions respectively. We also extensively test Revec’s impact on 216 intrinsic-rich implementations of image processing and stencil kernels relative to hand-retargeting.
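
The retargeting idea described in the abstract can be illustrated with a small model. The sketch below is not Revec's implementation: it simulates vector registers as Python lists, and the helper names (`vadd`, `kernel`) are hypothetical. It assumes 32-bit integer lanes, so a 128-bit (SSE-style) register holds 4 lanes and a 256-bit (AVX2-style) register holds 8. It only shows that two adjacent narrow additions can be replaced by one wide addition without changing the result.

```python
MASK32 = 0xFFFFFFFF  # assumption: lanes are 32-bit integers with wraparound


def vadd(a, b):
    """Lane-wise add of two equal-width simulated vectors."""
    assert len(a) == len(b)
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def kernel(a, b, lanes):
    """Add arrays a and b in chunks of `lanes` elements.

    Returns the result array and the number of simulated vector operations.
    """
    assert len(a) == len(b) and len(a) % lanes == 0
    out, ops = [], 0
    for i in range(0, len(a), lanes):
        out.extend(vadd(a[i:i + lanes], b[i:i + lanes]))
        ops += 1
    return out, ops


a = list(range(16))
b = [10 * x for x in range(16)]

narrow, narrow_ops = kernel(a, b, lanes=4)  # 128-bit style: 4 lanes per op
wide, wide_ops = kernel(a, b, lanes=8)      # 256-bit style: 8 lanes per op

print(narrow == wide, narrow_ops, wide_ops)  # prints: True 4 2
```

The operation count halves in this toy model, while the measured speedups quoted above are more modest; instruction count is only one of several factors in real kernels.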


Information & Contributors

Information

Published In

CC 2019: Proceedings of the 28th International Conference on Compiler Construction
February 2019
204 pages
ISBN: 9781450362771
DOI: 10.1145/3302516
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Single Instruction Multiple Data (SIMD)
  2. optimizing compilation
  3. program rejuvenation
  4. vectorization

Qualifiers

  • Research-article

Conference

CC '19

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 102
  • Downloads (Last 6 weeks): 12
Reflects downloads up to 19 Sep 2024

Cited By

  • (2024) Rewriting and Optimizing Vector Length Agnostic Intrinsics from Arm SVE to RVV. Workshop Proceedings of the 53rd International Conference on Parallel Processing, 38-47. DOI: 10.1145/3677333.3678151. Online publication date: 12-Aug-2024
  • (2024) MIMD Programs Execution Support on SIMD Machines: A Holistic Survey. IEEE Access 12, 34354-34377. DOI: 10.1109/ACCESS.2024.3372990. Online publication date: 2024
  • (2023) Java Vector API: Benchmarking and Performance Analysis. Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 1-12. DOI: 10.1145/3578360.3580265. Online publication date: 17-Feb-2023
  • (2023) Vector-Processing for Mobile Devices: Benchmark and Analysis. 2023 IEEE International Symposium on Workload Characterization (IISWC), 15-27. DOI: 10.1109/IISWC59245.2023.00036. Online publication date: 1-Oct-2023
  • (2022) An SLP Vectorization Method Based on Equivalent Extended Transformation. Wireless Communications & Mobile Computing 2022. DOI: 10.1155/2022/1832522. Online publication date: 1-Jan-2022
  • (2022) Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 439-450. DOI: 10.1145/3559009.3569663. Online publication date: 8-Oct-2022
  • (2022) Lasagne: a static binary translator for weak memory model architectures. Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 888-902. DOI: 10.1145/3519939.3523719. Online publication date: 9-Jun-2022
  • (2021) Effective exploitation of SIMD resources in cross-ISA virtualization. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 84-97. DOI: 10.1145/3453933.3454016. Online publication date: 7-Apr-2021
  • (2021) PostSLP: Cross-Region Vectorization of Fully or Partially Vectorized Code. Languages and Compilers for Parallel Computing, 15-31. DOI: 10.1007/978-3-030-72789-5_2. Online publication date: 26-Mar-2021
