Demos of SIMD tricks, cache-blocking tricks, and more to be added.
Set the three switches at the beginning of the Makefile to 'yes' or 'no'. 'VECREPORT' works only with Intel compilers.
Demonstrates memcpy() versus a simple copy loop. The compiler is smart enough to recognize the simple loop pattern and optimize it automatically.
No external dependencies.
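A minimal sketch of the two copy variants being compared. The function names `copy_loop`, `copy_memcpy`, and `copies_agree` are made up for illustration and are not taken from the demo sources. Whether the loop is actually turned into a library call or a vectorized copy depends on the compiler and the optimization flags, so it is worth checking the generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Element-by-element copy. Optimizing compilers commonly recognize this
// pattern and may emit vectorized code or a call to memcpy.
void copy_loop(const double* src, double* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// The same copy through the library call. The two buffers must not overlap.
void copy_memcpy(const double* src, double* dst, std::size_t n) {
    assert(src != nullptr && dst != nullptr);
    std::memcpy(dst, src, n * sizeof(double));
}

// Hypothetical helper: returns true when both variants reproduce the source.
bool copies_agree(std::size_t n) {
    std::vector<double> src(n), a(n, -1.0), b(n, -1.0);
    for (std::size_t i = 0; i < n; ++i) {
        src[i] = 0.5 * static_cast<double>(i);
    }
    copy_loop(src.data(), a.data(), n);
    copy_memcpy(src.data(), b.data(), n);
    return a == src && b == src;
}
```

Timing both functions on buffers larger than the last-level cache is one way to see whether the compiler closed the gap.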
Demonstrates different array operations. The simplest version can be the fastest, because the compiler may automatically interchange the two nested loops, which significantly improves the reuse of data in cache.
No external dependencies.
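To illustrate why loop order matters, here is a sketch that sums a row-major 2D array in both orders. It is an assumed example, not the array operation used in the demo, and the names `sum_row_order` and `sum_col_order` are hypothetical. The row-order version walks memory contiguously. The column-order version jumps by `cols` elements on every step unless the compiler interchanges the loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inner loop runs over the contiguous index j: unit-stride access.
double sum_row_order(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double sum = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            sum += m[i * cols + j];
        }
    }
    return sum;
}

// Inner loop runs over i: each access is cols elements away from the last.
// A compiler may interchange these loops when it can prove that is safe.
double sum_col_order(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double sum = 0.0;
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            sum += m[i * cols + j];
        }
    }
    return sum;
}
```

Both functions compute the same value for exactly representable inputs; only the memory access pattern differs.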
Demonstrates different fast inverse square root routines.
No external dependencies.
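One widely known routine of this kind is the bit-level approximation followed by a Newton-Raphson step. The sketch below is that classic scalar trick, shown as an assumed example; it is not necessarily one of the routines in the demo. The names `fast_rsqrt` and `rel_error` are hypothetical, and the code assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Approximates 1/sqrt(x) for positive, finite x.
float fast_rsqrt(float x) {
    assert(x > 0.0f);
    const float half_x = 0.5f * x;

    // Reinterpret the float bits as an integer (memcpy avoids aliasing issues).
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);

    // Initial guess from the well-known magic constant.
    bits = 0x5f3759dfu - (bits >> 1);

    float y;
    std::memcpy(&y, &bits, sizeof y);

    // One Newton-Raphson step for f(y) = 1/y^2 - x.
    y = y * (1.5f - half_x * y * y);
    return y;
}

// Hypothetical helper: relative error against the library sqrt.
float rel_error(float x) {
    return std::fabs(fast_rsqrt(x) * std::sqrt(x) - 1.0f);
}
```

Each additional Newton step improves accuracy at the cost of a few more multiplications, which is the trade-off such routines tune.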
Demonstrates cache blocking.
The Eigen dgemm function requires Eigen to compile.
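A minimal sketch of cache blocking applied to a square matrix multiply, assuming row-major storage. The names `Matrix`, `matmul_naive`, `matmul_blocked`, and `max_abs_diff` and the block-size parameter `bs` are illustrative and not taken from the demo. The idea is to work on `bs` x `bs` tiles small enough that the tiles in use stay in cache while they are reused.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // row-major, n x n

// Reference triple loop: b is read with stride n in the inner loop.
void matmul_naive(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Blocked version: the three outer loops pick tiles, the inner loops
// multiply one tile pair and accumulate into the matching tile of c.
void matmul_blocked(const Matrix& a, const Matrix& b, Matrix& c,
                    std::size_t n, std::size_t bs) {
    assert(bs > 0 && a.size() == n * n && b.size() == n * n && c.size() == n * n);
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs) {
        const std::size_t i_end = std::min(ii + bs, n);
        for (std::size_t kk = 0; kk < n; kk += bs) {
            const std::size_t k_end = std::min(kk + bs, n);
            for (std::size_t jj = 0; jj < n; jj += bs) {
                const std::size_t j_end = std::min(jj + bs, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
        }
    }
}

// Hypothetical helper: largest difference between the two results.
double max_abs_diff(std::size_t n, std::size_t bs) {
    Matrix a(n * n), b(n * n), c0(n * n), c1(n * n);
    for (std::size_t i = 0; i < n * n; ++i) {
        a[i] = static_cast<double>(i % 7);
        b[i] = static_cast<double>(i % 5);
    }
    matmul_naive(a, b, c0, n);
    matmul_blocked(a, b, c1, n, bs);
    double worst = 0.0;
    for (std::size_t i = 0; i < n * n; ++i)
        worst = std::max(worst, std::fabs(c0[i] - c1[i]));
    return worst;
}
```

A good `bs` depends on the cache sizes of the target machine, so it is usually found by measurement rather than fixed in advance.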
Demonstrates the kernel sum performance.
The pvfmm kernel sum function requires pvfmm to compile.
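For readers unfamiliar with the term, a kernel sum evaluates, for every target point, a sum over all source points of a kernel function weighted by a source strength. The text above does not say which kernel the demo uses, so the sketch below assumes an unscaled Laplace-style 1/r kernel purely for illustration. It does not use pvfmm, and the names `Point`, `kernel_sum`, and `example_potential` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Direct O(N*M) evaluation: potential[i] = sum_j charges[j] / |t_i - s_j|.
// Coincident source/target pairs are skipped to avoid dividing by zero.
std::vector<double> kernel_sum(const std::vector<Point>& targets,
                               const std::vector<Point>& sources,
                               const std::vector<double>& charges) {
    assert(sources.size() == charges.size());
    std::vector<double> potential(targets.size(), 0.0);
    for (std::size_t i = 0; i < targets.size(); ++i) {
        double acc = 0.0;
        for (std::size_t j = 0; j < sources.size(); ++j) {
            const double dx = targets[i].x - sources[j].x;
            const double dy = targets[i].y - sources[j].y;
            const double dz = targets[i].z - sources[j].z;
            const double r2 = dx * dx + dy * dy + dz * dz;
            if (r2 > 0.0) {
                acc += charges[j] / std::sqrt(r2);  // the 1/sqrt hot spot
            }
        }
        potential[i] = acc;
    }
    return potential;
}

// Hypothetical usage: one target at the origin, two sources.
double example_potential() {
    const std::vector<Point> targets = {{0.0, 0.0, 0.0}};
    const std::vector<Point> sources = {{1.0, 0.0, 0.0}, {0.0, 2.0, 0.0}};
    const std::vector<double> charges = {2.0, 4.0};
    return kernel_sum(targets, sources, charges)[0];  // 2/1 + 4/2
}
```

The inner loop is dominated by the inverse square root, which is why fast rsqrt routines and SIMD matter for this kind of computation.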
Demonstrates the effect of HornerForm in polynomial evaluations.
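A short sketch contrasting term-by-term evaluation with Horner form, assuming coefficients are stored lowest degree first. The names `poly_naive` and `poly_horner` are illustrative and not taken from the demo. Horner form rewrites c0 + c1 x + c2 x^2 + ... as c0 + x (c1 + x (c2 + ...)), so each coefficient costs one multiply and one add.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Term-by-term evaluation: one pow() call per coefficient.
double poly_naive(const std::vector<double>& coeffs, double x) {
    double sum = 0.0;
    for (std::size_t k = 0; k < coeffs.size(); ++k) {
        sum += coeffs[k] * std::pow(x, static_cast<double>(k));
    }
    return sum;
}

// Horner form: start from the highest-degree coefficient and fold inward.
double poly_horner(const std::vector<double>& coeffs, double x) {
    assert(!coeffs.empty());
    double result = 0.0;
    for (std::size_t k = coeffs.size(); k-- > 0;) {
        result = result * x + coeffs[k];
    }
    return result;
}
```

For example, with coefficients {1, 2, 3} and x = 2, both functions evaluate 1 + 2*2 + 3*4 = 17.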
- What Every Programmer Should Know About Memory
- Intel® 64 and IA-32 Architectures Optimization Reference Manual
- Intel intrinsics guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
- Agner Fog’s website: http://www.agner.org/
- Online Compiler Explorer: https://gcc.godbolt.org/