GPU-Accelerated SPH for Fluid Pouring Simulation

Yashoditya Watal & Athan Ferber

URL: https://www.andrew.cmu.edu/user/athanf/SPH.html

Summary

We will implement a GPU-accelerated, CUDA-based Smoothed Particle Hydrodynamics (SPH) fluid simulator for modeling liquid pouring scenarios, targeting the GPUs on the GHC machines. Our focus is on parallelizing particle interactions and analyzing performance challenges such as memory bandwidth limitations, warp divergence, and load imbalance. We will evaluate how different GPU implementations and batching strategies affect scalability and efficiency.

Background

SPH is a particle-based method in which a fluid is represented as a set of discrete particles. Rather than using a fixed grid, SPH estimates fluid properties by combining contributions from nearby particles using a weighted average. This makes it well suited for splashing and pouring flows.

At a high level, each simulation timestep consists of several stages. First, for every particle, the simulator finds nearby particles within a fixed radius. Using those neighbors, it computes local density and pressure. Then it computes forces on each particle, including pressure forces, viscosity, gravity, and collisions with container boundaries. Finally, it updates each particle’s velocity and position based on these forces, moving the simulation forward in time. This process repeats every timestep, and in a pouring scenario, it captures behavior like liquid forming a stream, hitting a surface, and spreading or splashing.

This workload has clear opportunities for parallelism. Most of the work is done per particle, so we can assign one particle to each thread. A key operation is neighbor search: finding which particles are close enough to affect a given particle. To avoid checking every particle against every other particle, the simulation space is divided into a uniform grid accessed through a spatial hash, and each particle is placed into a grid cell based on its position. When processing one particle, we then only need to check particles in its own cell and the adjacent cells, which makes the computation much more efficient.

In our implementation, we plan to parallelize the main SPH stages on the GPU: grid construction, density and pressure computation, force computation, and the final timestep integration. However, although SPH appears highly parallel, the workload is not uniform. In a pouring simulation, particles are unevenly distributed. Some regions are sparse (like a falling stream), while others are very dense (like where the liquid lands). This means some threads do much more work than others, and memory access becomes irregular. These effects can lead to warp divergence, load imbalance, and poor memory performance, which are the key challenges we aim to analyze in this project.

Pseudocode

for each timestep:
    // Spatial Hashing
    for each particle i:
        cell_i = cell containing particle i
        add particle i to cell_i

    // Compute density and pressure for all particles
    for each particle i:
        neighbors = find_neighbors(i, spatial_grid)
        density_i = sum(neighbor_contributions)
        pressure_i = compute_pressure(density_i)

    // Compute forces using already-computed densities/pressures
    for each particle i:
        neighbors = find_neighbors(i, spatial_grid)
        force_i = sum(pressure_viscosity_interactions over neighbors)
        force_i += gravity and wall collisions

    // Move particles forward
    for each particle i:
        velocity_i += force_i * dt
        position_i += velocity_i * dt
    
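The density/pressure stage of the pseudocode can be sketched in plain C++; a CUDA kernel would execute the outer loop with one thread per particle. This is a sketch under assumptions: the poly6 smoothing kernel and the linear equation of state are standard SPH choices rather than decisions we have committed to, the constants are placeholder values, and the brute-force inner loop stands in for the spatial-hash neighbor scan for clarity.

```cpp
#include <cmath>
#include <vector>

struct Particle { float x, y, z; float density, pressure; };

const float h    = 0.1f;     // smoothing radius (placeholder value)
const float m    = 0.02f;    // particle mass
const float k    = 1000.0f;  // stiffness constant in the equation of state
const float rho0 = 1000.0f;  // rest density of water, kg/m^3

// Standard poly6 kernel: weights a neighbor by squared distance r2,
// falling smoothly to zero at the smoothing radius h.
float poly6(float r2) {
    float h2 = h * h;
    if (r2 >= h2) return 0.0f;
    float diff = h2 - r2;
    return 315.0f / (64.0f * M_PI * std::pow(h, 9)) * diff * diff * diff;
}

void computeDensityPressure(std::vector<Particle>& ps) {
    for (auto& pi : ps) {                 // one GPU thread per particle
        pi.density = 0.0f;
        for (const auto& pj : ps) {       // real code: spatial-hash neighbors only
            float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
            pi.density += m * poly6(dx * dx + dy * dy + dz * dz);
        }
        // Simple equation of state: pressure grows with compression.
        pi.pressure = k * (pi.density - rho0);
    }
}
```

Note that density must be complete for every particle before the force stage reads it, which is why the pseudocode keeps them as separate passes (separate kernel launches on the GPU).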

The Challenge

An SPH fluid simulation targeted specifically at pouring introduces a unique set of challenges and constraints that set this project apart from classic parallel SPH projects.

Warp Divergence & Load Imbalance

Pouring a thin stream of liquid from one cup into another creates a severe warp divergence problem: threads operating on sparse regions of the grid, such as the thin stream in mid-air, will likely finish their calculations far sooner than threads operating on denser regions, such as the point where the stream lands in the pool inside the receiving cup, or the dense fluid bodies inside both cups. In addition, liquid poured from cup to cup splashes significantly, which forces us to balance the tradeoff of how frequently we rebuild the spatial hash to maintain load balance.

Neighbor search and spatial hash rebuilding

SPH relies heavily on finding nearby particles efficiently. As noted above, particle distributions may change rapidly from dense fluid to a thin stream and then to splashing regions, which quickly makes the spatial grid outdated. Rebuilding it too often adds overhead, while rebuilding too infrequently makes neighbor searches slower and increases imbalance. Choosing when and how often to rebuild becomes a key challenge.

Memory Bandwidth & Cache Pressure

To push the memory hierarchy even further, we propose running a number of these simulations concurrently on the same GPU in batches (motivated by the need to evaluate multiple pour angles for MPC). While parallelizing independent trajectories is traditionally trivial, forcing them to share the cache and memory bandwidth while dynamically allocating spatial grids for SPH introduces severe cache thrashing and contention, which we will have to actively manage.

Goals and Deliverables

Plan to Achieve

Hope to Achieve

Fallback Goals

Performance Goals

Demo plan

Analysis Goals

System Capabilities and Expected Outcomes

Resources

Platform Choice

We will implement our system on the GHC lab machines, which provide NVIDIA GPUs with CUDA support. This platform is well suited for our workload because SPH fluid simulation is highly data parallel, with most computation occurring independently per particle. CUDA allows us to naturally assign particles to GPU threads and execute density and force computations in parallel at large scale. Additionally, GPUs are designed for high-throughput workloads with many concurrent threads, which matches the structure of SPH, where thousands of particles must be processed each timestep.

Our project specifically targets challenges such as irregular particle distributions, memory bandwidth pressure, and warp divergence, all of which are important performance considerations on modern GPU architectures. Using this platform allows us to both exploit the available parallelism and study how well the workload maps to the GPU’s execution and memory model.

Schedule

Milestone Report