This set of scripts benchmarks the performance of KeOps against other solutions. We compare several computing libraries on a simple benchmark: a Gaussian kernel matrix-vector product with a growing number of points N, in dimension D = 3. All experiments are performed with float32 precision on an Nvidia RTX 2080 Ti GPU, except for the PyTorch-TPU column, which was run in Google Colab.
| | PyTorch | PyTorch-TPU | TF-XLA | Halide | TVM | PyKeOps | KeOps++ |
|---|---|---|---|---|---|---|---|
| N = 10k | 9 ms | 10 ms | 13 ms | 1.0 ms | 3.8 ms | 0.7 ms | 0.40 ms |
| N = 100k | out of mem | out of mem | 89 ms | 34.1 ms | 36.8 ms | 15.0 ms | 14.6 ms |
| N = 1M | out of mem | out of mem | out of mem | 3.8 s | 2.79 s | 1.39 s | 1.38 s |
| Lines of code | 5 | 5 | 5 | 15 | 17 | 5 | 55 |
| Interface | NumPy-like | NumPy-like | NumPy-like | C++ | low-level Python | NumPy-like | C++ |
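For reference, the reduction being benchmarked computes, for each point x_i, the sum over j of exp(-|x_i - y_j|^2) * b_j. A minimal NumPy sketch of this computation (illustrative only, not one of the benchmarked scripts; it materializes the full N-by-N kernel matrix, which is exactly what the memory-efficient backends avoid):

```python
import numpy as np

def gauss_conv(x, y, b):
    """Gaussian kernel matrix-vector product: a_i = sum_j exp(-|x_i - y_j|^2) b_j."""
    # Pairwise squared distances, shape (N, M) -- the dense kernel matrix.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2) @ b

rng = np.random.default_rng(0)
N, D = 1000, 3
x = rng.standard_normal((N, D)).astype(np.float32)
y = rng.standard_normal((N, D)).astype(np.float32)
b = rng.standard_normal((N, 1)).astype(np.float32)
a = gauss_conv(x, y, b)
print(a.shape)  # (1000, 1)
```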
To compute the Gaussian convolution on GPU using PyKeOps with the PyTorch backend:
- Install pykeops:
$ pip install pykeops
- Run:
$ python KeOps.py
To compute the Gaussian convolution on GPU using the KeOps C++ backend:
$ mkdir build
$ cd build
$ cmake ../../keops
$ make test_fromdevice
$ example/test_fromdevice 100000
To compute the Gaussian convolution on GPU using Halide (C++ code):
- Download Halide and extract it in the current folder, e.g.
$ wget https://github.com/halide/Halide/releases/download/release_2019_08_27/halide-linux-64-gcc53-800-65c26cba6a3eca2d08a0bccf113ca28746012cc3.tgz
$ tar xvf halide-linux-64-gcc53-800-65c26cba6a3eca2d08a0bccf113ca28746012cc3.tgz
- Compile the sources and launch the binary:
$ g++ gauss_conv_halide.cpp -g -std=c++11 -I halide/tutorial -I halide/include -I halide/tools -L halide/bin -lHalide -lpthread -ldl -o gauss_conv_halide
$ LD_LIBRARY_PATH=halide/bin ./gauss_conv_halide 100000
To compute the Gaussian convolution on GPU using TF-XLA:
- Install TensorFlow-XLA 2.xx with GPU support. On Google Colab,
!pip install tensorflow-gpu==2.0.0
should do the trick. On a Linux system, you may need to define the XLA_FLAGS environment variable:
$ export XLA_FLAGS="--xla_gpu_cuda_data_dir=/path/to/cuda"
- Run:
$ python TF_XLA.py
To compute the Gaussian convolution on GPU using PyTorch:
$ python Pytorch_GPU.py
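A hypothetical sketch of what Pytorch_GPU.py computes (illustrative only): the dense PyTorch formulation materializes the full (N, N) kernel matrix, which explains the "out of mem" entries for N >= 100k in the table above.

```python
import torch

N, D = 1000, 3
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(N, D, device=device)
y = torch.randn(N, D, device=device)
b = torch.randn(N, 1, device=device)

# Dense (N, N) kernel matrix: memory grows quadratically with N.
d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
a = torch.exp(-d2) @ b
print(a.shape)  # torch.Size([1000, 1])
```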
To compute the Gaussian convolution on TPU using PyTorch: this code should be run in a Google Colab session with TPU acceleration. Copy/paste the content of PyTorch_TPU.py into the Colab notebook.
To compute the Gaussian convolution on GPU using TVM:
- Install TVM using this tutorial.
- If you are running on a Linux system, export the TVM_HOME and PYTHONPATH variables.
- Run:
$ python TVM.py