Benchmarks
We compare:

- a Fortran implementation due to Sébastien Motsch using the Verlet list method with double-precision floating-point numbers, run on the NextGen compute cluster at Imperial College London;
- the CPU version of SiSyPHE with double-precision tensors (float64) and the block-sparse reduction method (BSR), run on an Intel MacBook Pro (2 GHz Intel Core i5 processor with 8 GB of memory);
- the GPU version of SiSyPHE with single-precision tensors (float32), with and without the block-sparse reduction method (BSR), run on a GPU cluster at Imperial College London using an Nvidia GTX 2080 Ti GPU;
- the GPU version of SiSyPHE with double-precision tensors (float64), with and without the block-sparse reduction method (BSR), run on the same GPU cluster and GPU.
Note
The choice of single- or double-precision tensors is made automatically according to the floating-point precision of the input tensors.
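This mechanism relies on PyTorch's dtype propagation: arithmetic operations preserve the floating-point type of their input tensors, so a float64 initial condition yields a float64 simulation throughout. A minimal CPU illustration (not SiSyPHE code):

```python
import torch

# Initial conditions in the two precisions.
pos32 = torch.rand((5, 2))           # float32, the PyTorch default
pos64 = torch.rand((5, 2)).double()  # float64

# Elementwise operations preserve the dtype of their inputs, so every
# tensor produced downstream inherits the precision of the initial data.
print(pos32.dtype)                   # torch.float32
print((2.0 * pos64 + 1.0).dtype)     # torch.float64
```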
We run a Vicsek model in a square periodic box with fixed parameters \(L=100\), \(\nu=5\), \(\sigma=1\), \(c=R\), \(dt=0.01\) and various choices of \(N\) and \(R\). The simulation is run for 10 units of time (i.e. 1000 iterations).
For each value of \(N\), three values of \(R\) are tested, corresponding to a dilute regime, a moderate regime and a dense mean-field regime. When the particles are uniformly scattered in the box, the average number of neighbours in the three regimes is respectively \(\sim3\), \(\sim30\) and \(\sim300\). The regime has a strong effect on the efficiency of the Verlet list and block-sparse reduction methods.
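The quoted neighbour counts follow from the uniform density: a disc of radius \(R\) contains on average \(N \pi R^2 / L^2\) particles. A quick check for \(N = 10^4\) and \(L = 100\):

```python
import math

def mean_neighbours(N, R, L):
    """Expected number of neighbours of a particle when N particles
    are uniformly scattered in a periodic box of side L."""
    return N * math.pi * R**2 / L**2

# The three regimes for N = 10^4, L = 100.
for R in (1.0, 3.0, 10.0):
    print(R, round(mean_neighbours(10_000, R, 100.0)))
# -> 1.0 3, 3.0 28, 10.0 314, i.e. roughly 3, 30 and 300.
```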
In the table below, the total computation times are given in seconds, except when they exceed 12 hours. The best time in each row is indicated in bold and the worst in italic. The times of the GPU versions without BSR do not depend on \(R\) and are therefore reported once per value of \(N\).

| | Fortran | sisyphe64 CPU BSR | sisyphe32 GPU | sisyphe32 GPU BSR | sisyphe64 GPU | sisyphe64 GPU BSR |
|---|---|---|---|---|---|---|
| N = 10k, R = 1 | 19s | *26s* | **2.1s** | 3.3s | 13s | 3.6s |
| N = 10k, R = 3 | *59s* | 29s | | **3.4s** | | 3.9s |
| N = 10k, R = 10 | *494s* | 69s | | **3.4s** | | 7.9s |
| N = 100k, R = 0.3 | 309s | 323s | 29s | **4.3s** | *973s* | 9.3s |
| N = 100k, R = 1 | *1522s* | 384s | | **4.5s** | | 11s |
| N = 100k, R = 3 | *3286s* | 796s | | **4.9s** | | 28s |
| N = 1M, R = 0.1 | *>12h* | 6711s | 2738s | **22s** | *>12h* | 120s |
| N = 1M, R = 0.3 | *>12h* | 6992s | | **23s** | | 135s |
| N = 1M, R = 1 | *>12h* | 9245s | | **26s** | | 194s |
The GPU implementation is at least five times faster in the dilute regime and outperforms the other methods by three orders of magnitude in the mean-field regime with large values of \(N\). Without the block-sparse reduction method, the GPU implementation suffers from the quadratic complexity. The block-sparse reduction method is less sensitive to the density of particles than the traditional Verlet list method. It comes at the price of a higher memory cost (see Tutorial 04: Block-sparse reduction) but makes it possible to run large-scale simulations even on a CPU. On a CPU, performance is not critically affected by the floating-point precision, so only simulations with double-precision tensors are shown.
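The idea behind block-sparse reduction is a classical cell-list decomposition: particles are first binned into a regular grid of square cells, and the interaction sum is then computed blockwise between adjacent cells only, skipping the vast majority of far-apart pairs. The binning step can be sketched in NumPy as follows (a simplified illustration, not SiSyPHE's actual implementation):

```python
import numpy as np

def cell_counts(pos, L, n_cells_per_dim):
    """Assign each 2D particle position to a square cell of side
    L / n_cells_per_dim and return the particle count per cell.
    Neighbour search then only compares particles in adjacent cells."""
    h = L / n_cells_per_dim
    ij = np.floor(pos / h).astype(int) % n_cells_per_dim  # (i, j) cell indices
    flat = ij[:, 0] * n_cells_per_dim + ij[:, 1]          # flattened cell index
    return np.bincount(flat, minlength=n_cells_per_dim**2)

rng = np.random.default_rng(0)
pos = 100.0 * rng.random((10_000, 2))   # N = 10^4 particles in a box of side 100
counts = cell_counts(pos, 100.0, 42)
print(counts.sum())  # 10000: every particle lands in exactly one cell
```

The memory cost mentioned above comes from padding these per-cell blocks to a common size before the dense blockwise computation.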
Note
The parameters of the block-sparse reduction method are chosen using the method best_blocksparse_parameters()
(see Tutorial 04: Block-sparse reduction). As a guideline, in this example and on the GPU, the best number of cells is respectively \(\sim 15^2\), \(\sim 42^2\) and \(\sim 70^2\) for \(N=10^4\), \(N=10^5\) and \(N=10^6\) with sisyphe32, and \(\sim 25^2\), \(\sim 60^2\) and \(\sim 110^2\) for sisyphe64. Note that the number of cells is automatically capped at \((L/R)^2\). For the last case (sisyphe64 with \(R=1\)), the maximal number of cells depends on the available memory.
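The cap at \((L/R)^2\) cells means a cell is never narrower than the interaction radius, so all neighbours of a particle are guaranteed to lie in adjacent cells. A small helper (hypothetical, for illustration only) reproduces the capping arithmetic:

```python
def capped_cells_per_dim(requested, L, R):
    """Cap the number of cells per dimension so that the cell side
    L / n is never smaller than the interaction radius R.
    Illustrative helper, not a SiSyPHE function."""
    return min(requested, int(L / R))

# N = 10^6 with sisyphe64: ~110^2 cells are requested, but with
# L = 100 and R = 1 the grid is capped at (100/1)^2 = 100^2 cells.
print(capped_cells_per_dim(110, 100.0, 1.0) ** 2)  # 10000
```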
As an example, the time for the case sisyphe32 GPU BSR with \(N=10^5\) and \(R=1\) is obtained with the following script.
```python
import time

import torch

import sisyphe.models as models
from sisyphe.display import save

dtype = torch.cuda.FloatTensor
# Replace by the following line for double-precision tensors.
# dtype = torch.cuda.DoubleTensor

N = 100000
L = 100.
dt = .01
nu = 5.
sigma = 1.
R = 1.
c = R

# The type of the initial condition determines the floating-point precision.
pos = L * torch.rand((N, 2)).type(dtype)
vel = torch.randn(N, 2).type(dtype)
vel = vel / torch.norm(vel, dim=1).reshape((N, 1))

simu = models.Vicsek(
    pos=pos,
    vel=vel,
    v=c,
    sigma=sigma,
    nu=nu,
    interaction_radius=R,
    box_size=L,
    dt=dt,
    block_sparse_reduction=True,
    number_of_cells=42**2)

next(simu)  # GPU warmup

s = time.time()
data = save(simu, [10.], [], [])  # run until t = 10
e = time.time()

print(e - s)  # total computation time in seconds
```
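A general caveat when timing GPU code by hand: CUDA kernels are launched asynchronously, so `time.time()` can return before pending kernels have finished unless the work forces a transfer back to the CPU. When in doubt, synchronize before reading the clock. A small helper sketch (not part of SiSyPHE) that works on both CPU and GPU:

```python
import time

import torch

def timed(fn):
    """Run fn() and return (result, elapsed seconds), synchronizing the
    GPU before and after so asynchronous kernels are fully accounted for.
    On a CPU-only machine the synchronizations are no-ops."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.time()
    result = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return result, time.time() - start

out, elapsed = timed(lambda: torch.ones(1000).sum().item())
print(out)  # 1000.0
```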