Sunday, December 4, 2011

Profiling the CPU fluid simulation

The suspended particle explosion simulation doesn't actually require a very high resolution fluid grid: while a higher resolution one will allow for slightly nicer detailed particle behaviour, much of the detail of the simulation is in the movement of the fuel/soot particles.

I had originally thought that the most critical thing to optimize would be the conjugate gradient solver in the projection step, since it may take many steps to converge, and each loop of it is doing a pretty decent exponent on the number of grid cells. I believe that part of the reason why projection isn't the most expensive step is a result of the high tolerance we use to make sure that the solver doesn't freak out, as previously mentioned.

Roughly profiling a CPU-based update step, the pieces of the update step rank in the following way:
- vorticity confinement
- velocity advection
- density advection
- temperature advection
- buoyancy
- projection
- update constant sources

There are three particle update steps that don't generally fit nicely in that list:
- particle position update
- heat transfer to particles
- burning particles

The particle position update is flatly O(N), but eventually the number of particles in the simulation can be a few orders of magnitude larger than the total number of grid cells used to simulate the fluid, and once that happens this update becomes more expensive than the advection steps (naturally). Perhaps the most surprising one is the vorticity confinement step: it is regularly an order of magnitude slower than the next slowest step, whether the total number of grid cells is in the 100s or 10,000s, and this can probably be attributed to the fact that it does 7 square root operations per grid cell, even though it is an O(N) function. It's difficult to improve on this operation on a CPU-based implementation.

Thankfully, we have a GPU, but let's save that post for later.