Fast Cloth Simulation with CUDA Acceleration

By Ian Heath


Final Report:


Cloth simulation is integral to realism in today’s movies, computer games, and other simulations (some uses reach as far as the fashion industry or even heart surgery). I plan to create a high-resolution cloth simulation that leverages CUDA for a speedup over a serial implementation.


Particle-spring systems have been used as physical models of cloth in computer simulation as far back as 1995, when the original paper on the topic was published. Generally, the cloth is modeled as a mesh of particles, each connected to eight of its neighbors with springs, and the system is advanced with an implicit Euler integration scheme. The particles obey gravity but also collide with other geometry, and they must satisfy constraints on how far they can travel from their neighbors and on the spring forces acting on them.
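As a rough illustration of the particle-spring model described above, the following C++ sketch accumulates gravity and Hooke's-law spring forces from a particle's eight grid neighbors. It is a minimal sketch, not the project's actual code: the names `Vec3` and `netForce`, the row-major grid layout, and the constants `kStiffness` and `kMass` are all assumptions made for the example, and collisions and distance constraints are left out.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

// Hypothetical constants chosen only for illustration.
constexpr float kStiffness = 50.0f;            // spring constant
constexpr float kMass = 0.01f;                 // mass of one particle
constexpr Vec3 kGravity{0.0f, -9.81f, 0.0f};   // gravitational acceleration

// Net force on the particle at (row, col) of a width x height grid stored
// row-major in `pos`. `spacing` is the rest distance between particles that
// are horizontally or vertically adjacent.
Vec3 netForce(const std::vector<Vec3>& pos, int width, int height,
              int row, int col, float spacing) {
    Vec3 force = kGravity * kMass;
    const Vec3 p = pos[row * width + col];
    for (int dr = -1; dr <= 1; ++dr) {
        for (int dc = -1; dc <= 1; ++dc) {
            if (dr == 0 && dc == 0) continue;  // skip the particle itself
            const int r = row + dr;
            const int c = col + dc;
            if (r < 0 || r >= height || c < 0 || c >= width) continue;
            const Vec3 d = pos[r * width + c] - p;
            const float len = length(d);
            if (len == 0.0f) continue;         // avoid dividing by zero
            // Diagonal springs have a longer rest length than axis-aligned ones.
            const float rest = spacing * std::sqrt(float(dr * dr + dc * dc));
            // Hooke's law: pull toward the neighbor when stretched,
            // push away from it when compressed.
            force = force + d * (kStiffness * (len - rest) / len);
        }
    }
    return force;
}
```

On a flat grid at rest spacing every spring term vanishes and only gravity remains, which makes a convenient sanity check for an implementation along these lines.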

While this style of cloth simulation has existed for quite a while, recent hardware should allow a substantial speedup, enabling cloth simulations that are both real-time and high-resolution (many particles). I will explore the capabilities of today’s top-end GPUs, leveraging CUDA, to simulate cloth in this manner.

The Challenge

Generally speaking, the particles need only reference their eight neighbors to determine the forces acting upon them and the positions they will occupy in the next time step. However, as the cloth becomes more detailed and incorporates more particles, it becomes more difficult to ensure locality of the computation. That is, the challenge is in determining how to exploit locality when updating the positions of many different 3D particles. This will become especially important as physics calculations are done on the GPU, which tends to have a relatively small cache.
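To make the locality concern concrete, here is a small C++ sketch that lists the flat array indices of a particle's eight neighbors. It assumes the particles are stored row-major in a single array, which is only one possible layout; the function name `neighborIndices` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Flat indices of the (up to) eight neighbors of the particle at (row, col)
// in a width x height grid stored row-major. Neighbors that would fall
// outside the grid are omitted.
std::vector<std::size_t> neighborIndices(int width, int height,
                                         int row, int col) {
    std::vector<std::size_t> out;
    out.reserve(8);
    for (int dr = -1; dr <= 1; ++dr) {
        for (int dc = -1; dc <= 1; ++dc) {
            if (dr == 0 && dc == 0) continue;  // skip the particle itself
            const int r = row + dr;
            const int c = col + dc;
            if (r < 0 || r >= height || c < 0 || c >= width) continue;
            out.push_back(static_cast<std::size_t>(r) *
                              static_cast<std::size_t>(width) +
                          static_cast<std::size_t>(c));
        }
    }
    return out;
}
```

Under this layout the left and right neighbors sit at index ±1, while the rows above and below are a full `width` elements away. For a 1024-wide grid of 12-byte positions, that is a stride of 12,288 bytes between rows, so each particle update touches three separate regions of memory.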

[UPDATE] The challenge turned out to be very different from what I had initially anticipated. Read the final report to learn more.


I will be using my own machine, along with C++, CUDA, and OpenGL for rendering. Additionally, I will be making use of the information presented in the following papers:

Deformation Constraints in a Mass-Spring Model to Describe Rigid Cloth Behavior

Parallel techniques in irregular codes: cloth simulation as case of study

And I may also make use of the tutorial and / or starter code presented here:

Mosegaards Cloth Simulation Coding Tutorial

Machine Specifications:

Nvidia GTX 980 @ 1.2 GHz

Intel Core i5-3570K @ 3.8 GHz

4 GB RAM @ 1.86 GHz

Goals and Deliverables

Plan to Achieve:

I aim to implement a CUDA cloth simulation that achieves a significant speedup over the sequential version. I believe a well-designed, locality-conscious kernel should enable real-time rendering of fairly detailed cloth, but at this time it is hard to say how many cloth nodes (and therefore what resolution) a real-time simulation can be expected to support.

Hope to Achieve:

If I find that I am ahead of schedule, it could be pretty cool to add frictional forces or self-collisions (where the cloth can collide with itself).


The plan is to show my cloth simulation in real time at the competition (as long as you don’t mind me hauling my enormous desktop in… videos might also work!). I should be able to demonstrate the speedup between the two versions visually by running the sequential and parallel versions successively. Additionally, I should be able to provide something along the lines of a speedup graph or FPS counter to quantify the difference.

Platform Choice:

I think my own machine is fairly representative of the kind of computing resources available to the average consumer. Since I’m a gamer at heart, I’d like to see what today’s hardware can do!

(Revised) Schedule

April 1-8: Initial research, paper exploration, familiarization with general approach.

April 9-15: Sequential Implementation, OpenGL Visualization

April 20-23: Read papers on cloth self-collision

April 24-27: CUDA implementation / parallelization of Verlet-integration-based cloth

April 28 - May 1: More time for CUDA implementation if need be, otherwise start on self-collision

May 2 - May 6: Self-collision algorithm, begin performance analysis

May 7 - May 11: Work on writeup and presentation; if possible, multi-thread the CPU implementation and include the results in the writeup

[UPDATE: The CUDA algorithm for parallelizing the cloth simulation turned out to be non-trivial, and no time was left over for cloth features like self-collision or friction]