The physics of the particles is computed using the Navier-Stokes momentum equation:

ρ(∂v/∂t + v·∇v) = −∇p + ∇·T + f

where v is the flow velocity, ρ is the fluid density, p is the pressure, T represents the drag (stress) term, and f represents the external forces acting upon the fluid, e.g. gravity.
Our GPU implementation was written in CUDA, using the CUDA code samples from the NVIDIA website as starter code. The main SPH algorithm we used is based on the paper by Goswami et al., "Interactive SPH Simulation and Rendering on the GPU." It makes use of CUDA data structures and Z-indexing to speed up the computation of the interactions between the particles. Z-indexing (Morton ordering) generates a compact representation of the particles: each particle position is packed into a single 32-bit integer by interleaving the bits of its quantized coordinates, which preserves the spatial locality of the data in the resulting one-dimensional ordering.
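The bit interleaving behind the Z-index can be sketched as follows. This is a host-side illustration, not our CUDA kernel; the magic constants are the standard ones for spreading 10 bits per axis into a 30-bit 3D Morton code:

```cpp
#include <cstdint>

// Spread the lower 10 bits of v so that two zero bits separate each bit.
// This is the "bit interleaving" step of a 3D Morton (Z-index) code.
static uint32_t spread_bits(uint32_t v) {
    v &= 0x3FF;                        // keep 10 bits per axis (10 * 3 = 30 bits total)
    v = (v | (v << 16)) & 0x030000FF;
    v = (v | (v << 8))  & 0x0300F00F;
    v = (v | (v << 4))  & 0x030C30C3;
    v = (v | (v << 2))  & 0x09249249;
    return v;
}

// Interleave quantized grid coordinates (x, y, z) into one 32-bit Z-index.
// Nearby cells in 3D map to nearby indices, which preserves locality.
uint32_t z_index(uint32_t x, uint32_t y, uint32_t z) {
    return spread_bits(x) | (spread_bits(y) << 1) | (spread_bits(z) << 2);
}
```

For example, the cell (1, 1, 1) interleaves to the index 0b111 = 7, and (3, 3, 3) to 0b111111 = 63, so cells that are close in space stay close in the sorted index order.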
After computing the Z-indices of the particles, we use a radix sort from the Thrust library to sort the particles by their Z-indices. We then determine, for each particle, the block in which its computations will be performed.
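On the GPU this sorting step is a single call to Thrust's sort (e.g. `thrust::sort_by_key` over the Z-indices and particle ids). A host-side analogue of what that step produces, using the standard library instead of Thrust, might look like:

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Host-side sketch of the GPU sorting step: return the particle ordering
// that sorts particles by their Z-index, so particles in the same spatial
// cell become contiguous in memory.
std::vector<size_t> sort_by_zindex(const std::vector<uint32_t>& zindices) {
    std::vector<size_t> order(zindices.size());
    std::iota(order.begin(), order.end(), 0);   // 0, 1, ..., n-1
    std::stable_sort(order.begin(), order.end(),
                     [&](size_t a, size_t b) { return zindices[a] < zindices[b]; });
    return order;
}
```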
Next, we generate CUDA blocks, each storing an index to its first particle in the Z-index array along with the number of particles it contains. We then filter out the empty blocks, and split any block containing more than a certain number of particles, based on the number of threads per CUDA block.
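The block-building step can be sketched as a host-side loop over the sorted cell ids (the real version runs on the GPU; `Block` and `build_blocks` are hypothetical names for illustration). Empty cells never produce a run in the sorted array, so they are filtered out for free, and oversized runs are split to fit the thread count:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Block { size_t first; size_t count; };  // first particle index + particle count

// Given cell ids already sorted by Z-index, emit one block per non-empty
// cell, splitting any run larger than threads_per_block into several blocks.
std::vector<Block> build_blocks(const std::vector<uint32_t>& sorted_cells,
                                size_t threads_per_block) {
    std::vector<Block> blocks;
    size_t i = 0;
    while (i < sorted_cells.size()) {
        size_t j = i;
        while (j < sorted_cells.size() && sorted_cells[j] == sorted_cells[i]) ++j;
        for (size_t s = i; s < j; s += threads_per_block)   // split large runs
            blocks.push_back({s, std::min(threads_per_block, j - s)});
        i = j;
    }
    return blocks;
}
```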
This reduces the complexity of the algorithm from O(n^2) to O(mn), where n is the total number of particles and m is the number of neighboring particles. The grid size of our implementation also changes with the number of particles to optimize the use of CUDA threads.
We then perform the physics computations (density, force, and surface tension) followed by the collisions that act upon the particles. We run about 10 iterations each frame to improve the accuracy of the simulation. Finally, we render the particles at the resulting positions, passing a pointer to those positions to OpenGL. We had researched fluid rendering algorithms such as marching cubes and raycasting to see whether we could use them to render on the GPU, but unfortunately did not have the time to implement them.
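The density part of the physics step can be illustrated with the standard poly6 smoothing kernel from the SPH literature. This is a hedged host-side sketch: our actual code runs as a CUDA kernel and only visits particles in neighboring blocks, whereas this plain O(n^2) loop visits every pair:

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Sketch of per-particle SPH density: rho_i = sum_j m_j * W_poly6(r_ij, h),
// with W_poly6(r, h) = 315 / (64 * pi * h^9) * (h^2 - r^2)^3 for r < h.
std::vector<float> compute_densities(const std::vector<Vec3>& pos,
                                     float mass, float h) {
    const float pi = 3.14159265f;
    const float h2 = h * h;
    const float poly6 = 315.0f / (64.0f * pi * std::pow(h, 9.0f));
    std::vector<float> density(pos.size(), 0.0f);
    for (size_t i = 0; i < pos.size(); ++i) {
        for (size_t j = 0; j < pos.size(); ++j) {
            float dx = pos[i].x - pos[j].x;
            float dy = pos[i].y - pos[j].y;
            float dz = pos[i].z - pos[j].z;
            float r2 = dx * dx + dy * dy + dz * dz;
            if (r2 < h2) {
                float d = h2 - r2;                      // (h^2 - r^2)
                density[i] += mass * poly6 * d * d * d; // kernel contribution
            }
        }
    }
    return density;
}
```

Note that a particle contributes to its own density (r = 0), which keeps the density field strictly positive.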
After implementing this algorithm, we also attempted to improve our simulator with a hybrid SPH algorithm based on the paper "Hybrid Smoothed Particle Hydrodynamics," which reportedly improves the stability of the simulation as the timestep increases; however, we did not see a significant improvement when we tried to implement it.
| Number of Particles | Densities and Forces | Collisions | Rendering | Total |
|---:|---:|---:|---:|---:|
| 10000 | 8.01 ms | 17.74 ms | 2.83 ms | 28.58 ms |
| 15000 | 13.550 ms | 32.636 ms | 4.366 ms | 50.552 ms |
| 20000 | 20.80 ms | 53.71 ms | 10.81 ms | 89.686 ms |
| 25000 | 30.520 ms | 82.582 ms | 113.103 ms | 249.987 ms |
| 30000 | 41.063 ms | 113.681 ms | 155.491 ms | 310.235 ms |
| Number of Particles | Data transfer | Z-indexing | Sorting | Densities/Forces | Collisions | Rendering | Total |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 10000 | 10.716 ms | 0.0279 ms | 2.460 ms | 2.520 ms | 0.004 ms | 0.547 ms | 16.678 ms |
| 15000 | 10.688 ms | 0.027 ms | 3.334 ms | 1.680 ms | 0.004 ms | 0.531 ms | 16.322 ms |
| 20000 | 10.148 ms | 0.028 ms | 3.257 ms | 1.830 ms | 0.004 ms | 0.547 ms | 15.878 ms |
| 25000 | 8.299 ms | 0.028 ms | 3.261 ms | 1.770 ms | 0.004 ms | 0.509 ms | 13.930 ms |
| 30000 | 7.028 ms | 0.029 ms | 3.363 ms | 1.740 ms | 0.004 ms | 0.585 ms | 12.807 ms |
| 100000 | 12.999 ms | 0.028 ms | 6.017 ms | 1.770 ms | 0.004 ms | 0.707 ms | 21.586 ms |
| 200000 | 20.474 ms | 0.028 ms | 9.13 ms | 1.830 ms | 0.004 ms | 1.693 ms | 33.222 ms |
| 300000 | 27.749 ms | 0.029 ms | 10.133 ms | 1.860 ms | 0.004 ms | 2.365 ms | 42.203 ms |
| 400000 | 33.031 ms | 0.029 ms | 11.112 ms | 2.070 ms | 0.004 ms | 2.000 ms | 48.316 ms |
| 500000 | 40.473 ms | 0.029 ms | 24.956 ms | 2.820 ms | 0.004 ms | 0.751 ms | 56.552 ms |
| 1000000 | 48.011 ms | 0.032 ms | 26.037 ms | 1.740 ms | 0.005 ms | 0.987 ms | 76.905 ms |
With a lower number of particles, constant overheads affect the timings a lot more, which is why there is some fluctuation in the data from 10000 to 30000 particles. Some of the variation in timings is also due to the positions of the particles: when a large number of particles fall in the same grid cell, many more calculations must be performed for that cell. Investigating a technique to adapt to this would be a worthwhile optimization for the future. The density/force computation and the collision computation stay relatively constant as the number of particles increases.
(In the accompanying charts, the x-axis denotes the number of particles.)