DOI: 10.1109/CGO.2005.18
Article

Efficient SIMD Code Generation for Runtime Alignment and Length Conversion

Published: 20 March 2005

Abstract

When generating code for today's multimedia extensions, one of the major challenges is dealing with memory alignment. While hand programming still yields the best-performing SIMD code, it is both time-consuming and error-prone. Compiler technology has improved greatly, including techniques that simdize loops with misaligned accesses by automatically rearranging misaligned memory streams in registers. Current techniques apply to runtime alignments, but they aggressively reduce the alignment overhead only when all alignments are known at compile time. This paper presents two major enhancements to the state of the art, improving both performance and coverage. First, we propose a novel technique to simdize loops with runtime alignment nearly as efficiently as those with compile-time misalignment. Runtime alignment is pervasive in real applications, either because it is inherent in the algorithm or because the compiler cannot extract accurate alignment information from complex applications. Second, we incorporate length conversion operations, i.e., conversions between data of different sizes, into the alignment handling framework. Length conversions are pervasive in multimedia applications, where mixed integer types are often used, so supporting them can greatly improve the coverage of simdizable loops. Experimental results indicate that our runtime alignment technique achieves a 19% to 32% speedup increase over prior art for a benchmark stressing the impact of misaligned data. We also demonstrate speedup factors of up to 8.11 over sequential execution on real benchmarks.
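
The general idea of rearranging a misaligned memory stream in registers can be illustrated with a small simulation. This is a hedged sketch, not the authors' algorithm: it models 16-byte SIMD registers as Python bytes, assumes an AltiVec-style aligned load that ignores the low four address bits (as lvx does), and realigns a source stream whose offset is known only at runtime by combining each pair of adjacent aligned loads with a shift computed once before the loop (on AltiVec, a permute whose mask comes from lvsl). A second helper illustrates a length conversion: one vector of eight 16-bit values becomes two vectors of four 32-bit values, so input and output streams advance at different byte rates, which is why length conversion interacts with alignment handling. The names vload, vshiftpair, copy_stream and widen_i16_to_i32 are hypothetical, and little-endian element layout is an assumption.

```python
import struct

VLEN = 16  # bytes per simulated SIMD register (AltiVec-sized; an assumption)


def vload(mem, addr):
    """Aligned load: drop the low address bits, as AltiVec's lvx does."""
    base = addr - addr % VLEN
    return bytes(mem[base:base + VLEN])


def vshiftpair(lo, hi, shift):
    """Return VLEN bytes starting at byte `shift` of lo || hi.

    Models a permute whose mask is derived from the runtime shift amount.
    """
    assert 0 <= shift < VLEN
    return (lo + hi)[shift:shift + VLEN]


def copy_stream(mem, dst, src, n):
    """Simdized mem[dst:dst+n] = mem[src:src+n], where the alignment of src
    is known only at runtime. Assumes dst is VLEN-aligned, n is a multiple
    of VLEN, and the two regions do not overlap."""
    assert dst % VLEN == 0 and n % VLEN == 0
    assert src - src % VLEN + n + VLEN <= len(mem) and dst + n <= len(mem)
    shift = src % VLEN               # misalignment computed at runtime
    prev = vload(mem, src)           # prologue: first aligned load
    for i in range(0, n, VLEN):
        nxt = vload(mem, src + i + VLEN)
        mem[dst + i:dst + i + VLEN] = vshiftpair(prev, nxt, shift)
        prev = nxt                   # reuse: one new aligned load per iteration


def widen_i16_to_i32(vec):
    """Length conversion: sign-extend one VLEN-byte vector of eight int16
    values into two vectors of four int32 values (little-endian assumed)."""
    vals = struct.unpack("<8h", vec)
    return struct.pack("<4i", *vals[:4]), struct.pack("<4i", *vals[4:])


# Usage: realign a stream that starts 5 bytes past an aligned boundary.
mem = bytearray(range(128))
copy_stream(mem, 64, 5, 32)
print(bytes(mem[64:96]) == bytes(range(5, 37)))  # prints True
```

Because `shift` is loop-invariant, this single-stream loop costs one aligned load, one shift and one aligned store per vector whether or not the offset was known at compile time; the sketch does not attempt the overhead-reduction optimizations the paper proposes.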

Published In

CGO '05: Proceedings of the international symposium on Code generation and optimization
March 2005
313 pages
ISBN:076952298X

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

CGO05

Acceptance Rates

CGO '05 paper acceptance rate: 26 of 75 submissions (35%)
Overall acceptance rate: 312 of 1,061 submissions (29%)

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 20 Nov 2024

Cited By

  • (2021) Temporal vectorization for stencils. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13. DOI: 10.1145/3458817.3476149. Online publication date: 14-Nov-2021
  • (2017) Improving the effectiveness of searching for isomorphic chains in superword level parallelism. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 718-729. DOI: 10.1145/3123939.3124554. Online publication date: 14-Oct-2017
  • (2017) Improving Loop Dependence Analysis. ACM Transactions on Architecture and Code Optimization, vol. 14, no. 3, pp. 1-24. DOI: 10.1145/3095754. Online publication date: 22-Aug-2017
  • (2016) Effective SIMD vectorization for Intel Xeon Phi coprocessors. Scientific Programming, vol. 2015, pp. 1-1. DOI: 10.1155/2015/269764. Online publication date: 1-Jan-2016
  • (2016) Vectorization in PyPy's Tracing Just-In-Time Compiler. Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems, pp. 67-76. DOI: 10.1145/2906363.2906384. Online publication date: 23-May-2016
  • (2016) An evaluation of current SIMD programming models for C++. Proceedings of the 3rd Workshop on Programming Models for SIMD/Vector Processing, pp. 1-8. DOI: 10.1145/2870650.2870653. Online publication date: 13-Mar-2016
  • (2015) PSLP. Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 190-201. DOI: 10.5555/2738600.2738625. Online publication date: 7-Feb-2015
  • (2015) Compiling Vector Pascal to the XeonPhi. Concurrency and Computation: Practice & Experience, vol. 27, no. 17, pp. 5060-5075. DOI: 10.1002/cpe.3509. Online publication date: 10-Dec-2015
  • (2015) Evaluating vector data type usage in OpenCL kernels. Concurrency and Computation: Practice & Experience, vol. 27, no. 17, pp. 4586-4602. DOI: 10.1002/cpe.3424. Online publication date: 10-Dec-2015
  • (2013) From relational verification to SIMD loop synthesis. ACM SIGPLAN Notices, vol. 48, no. 8, pp. 123-134. DOI: 10.1145/2517327.2442529. Online publication date: 23-Feb-2013