Performance Evaluation of Matrix-Matrix Multiplications Using Intel's Advanced Vector Extensions (AVX)

Published: 01 November 2016

Abstract

Intel's Advanced Vector Extensions (AVX) is a single-instruction, multiple-data (SIMD) instruction set introduced with the second-generation Intel Core processor family and supported by subsequent generations of Intel and AMD processors. AVX exploits SIMD computing units for fine-grained parallelism: each instruction processes multiple data elements simultaneously and independently. Many applications, such as signal processing, recognition, visual processing, scientific and engineering computation, and physics, need the vector floating-point performance that AVX provides. Matrix-matrix multiplication is at the core of many of these algorithms, so accelerating its implementation is essential.
It is very important to use compilers that can optimally exploit the new features of evolving processors, which requires a clear view of each compiler's effect on the performance characteristics of AVX code. Choosing the appropriate programming method is equally important for obtaining the best performance. This paper reports a performance evaluation of dense matrix-matrix multiplication kernels in three forms (C = A·B, C = A·Bᵀ, and C = Aᵀ·B) implemented with Intel's AVX instruction set. The results obtained with inline assembly are compared against those obtained with intrinsic functions, and the effects of two widely used C++ compilers, the Intel C++ compiler (ICC) in Intel Parallel Studio XE 2016 and the Microsoft Visual Studio C++ compiler 2015 (MSVC++), are investigated. The results are evaluated on an Intel Core i7 Broadwell system for square matrices of various large sizes. The Intel compiler outperforms the MSVC++ compiler by factors of 1.34, 1.32, and 1.22 using inline assembly and by 1.36, 1.19, and 1.25 using intrinsic functions for C = A·B, C = A·Bᵀ, and C = Aᵀ·B, respectively. Intrinsic functions outperform inline assembly by factors of 2.1, 2.13, and 2.18 using the Intel compiler and by 2.08, 2.49, and 2.11 using the MSVC++ compiler for the same three forms.


Cited By

  • (2024) Efficient and Accurate PageRank Approximation on Large Graphs. Proceedings of the ACM on Management of Data 2:4, 1-26. DOI: 10.1145/3677132. Online publication date: 30-Sep-2024.
  • (2020) SIMD programming using Intel vector extensions. Journal of Parallel and Distributed Computing 135:C, 83-100. DOI: 10.1016/j.jpdc.2019.09.012. Online publication date: 1-Jan-2020.
  • (2018) Effective Implementation of Matrix-Vector Multiplication on Intel's AVX multicore Processor. Computer Languages, Systems and Structures 51:C, 158-175. DOI: 10.1016/j.cl.2017.06.003. Online publication date: 1-Jan-2018.


    Published In

    Microprocessors & Microsystems  Volume 47, Issue PB
    November 2016
    241 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands


    Author Tags

    1. Advanced vector extension (AVX)
    2. Inline assembly
    3. Intel C++ compiler
    4. Intrinsic functions
    5. Matrix-matrix multiplications
    6. Microsoft VC++ compiler

    Qualifiers

    • Research-article

