Applications of Parallel Computers
Particle Systems
Prof David Bindel
Welcome to our discussion of a second class of physical
simulations in our lightning introduction to locality and
parallelism in simulations. In this deck, we are talking about
particle systems.
Particle systems
Billiards, electrons, galaxies, ...
Ants, cars, agents, ...?
When I say particles, I mean all types of discrete interacting
entities. That might include simple physical things, from billiard balls
to electrons to galaxies; or it might include ants in a colony,
cars on a highway, or agents in the Matrix (or in an agent-based
simulation, if you're not a Keanu Reeves fan).
Particle simulation
Particles move via Newton (\(F = ma\)), with
External forces: ambient gravity, currents, etc.
Local forces: collisions, van der Waals (\(1/r^6\)), etc.
Far-field forces: gravity and electrostatics (\(1/r^2\)), etc.
Simple approximations often apply (Saint-Venant)
When we are dealing with physical particles, we generally have
them moving around according to Newton's force balance law. The
interesting thing is where the forces come from! Sometimes they
are external, things like gravity or currents that come from
outside the scope of the things we are trying to simulate.
Sometimes, they involve interactions between particles that are
very close to each other. And sometimes they involve far-field
forces like gravitational attraction or electrostatic attraction
or repulsion - though, as we have mentioned before, these types of
far-field forces sometimes admit convenient approximations.
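To make the structure concrete, the generic equation of motion we have in mind looks something like the following, with the force on particle \(i\) split into the three categories above (the cutoff radius \(r_c\) on the local part is just an illustrative way of writing it):
\[
  m_i \ddot{x}_i = f_i^{\mathrm{ext}}(x_i, t)
    + \sum_{j \,:\, r_{ij} \le r_c} f^{\mathrm{local}}(x_i, x_j)
    + \sum_{j \ne i} f^{\mathrm{far}}(x_i, x_j),
  \qquad r_{ij} = \|x_i - x_j\|.
\]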
By the way: the "far field vs near field" distinction is
ubiquitous in physics. In some areas, like mechanics, it has
special names - St. Venant's principle, for example, in solid
mechanics. Mathematicians refer to the property as elliptic
regularity, usually in the context of talking about PDEs. But
even if you only care about discrete systems like social
networks, the same effect can occur - just with less
directly-physical notions of what "near" and "far" mean.
A forced example
Example force: \[f_i = \sum_j Gm_i m_j
\frac{(x_j-x_i)}{r_{ij}^3}
\left(1 - \left(\frac{a}{r_{ij}}\right)^{4} \right), \qquad
r_{ij} = \|x_i-x_j\|\]
Long-range attractive force (\(r^{-2}\))
Short-range repulsive force (\(r^{-6}\))
Go from attraction to repulsion at radius \(a\)
So here's the type of thing we might consider. Suppose that our
particles have a short-range repulsive force and a long-range
attractive force. They can't get away from each other, and can't
get too close. What happens? Interesting things! Depends a lot
on how much energy there is in the initial state of the system.
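As a concrete sketch (my own, not the official course code), here is how the pairwise force above might be accumulated in C for particles in 2D; the interleaved coordinate layout, the function name, and the use of Newton's third law to halve the work are choices I'm making purely for illustration.
#include <math.h>

/* Sketch: accumulate f_i = sum_j G m_i m_j (x_j - x_i)/r^3 (1 - (a/r)^4).
 * Positions and forces are n interleaved 2D points (x[2*i], x[2*i+1]);
 * this layout and the names are illustrative, not from the course codes. */
void accum_forces(int n, const double* x, const double* m,
                  double G, double a, double* f)
{
    for (int k = 0; k < 2*n; ++k)
        f[k] = 0.0;
    for (int i = 0; i < n; ++i) {
        for (int j = i+1; j < n; ++j) {
            double dx = x[2*j+0] - x[2*i+0];
            double dy = x[2*j+1] - x[2*i+1];
            double r2 = dx*dx + dy*dy;
            double r  = sqrt(r2);
            double a2r2 = (a*a)/r2;                       /* (a/r)^2 */
            double s = G*m[i]*m[j]/(r2*r) * (1.0 - a2r2*a2r2);
            f[2*i+0] += s*dx;  f[2*i+1] += s*dy;          /* force on i */
            f[2*j+0] -= s*dx;  f[2*j+1] -= s*dy;          /* reaction on j */
        }
    }
}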
A simple serial simulation
In MATLAB, we can write
% Solve the n-body ODE with ode113 (assumes tfinal, initial positions x,
% velocities v, masses m, parameter g, and force function fnbody exist).
npts = 100;
t = linspace(0, tfinal, npts);
[tout, xyv] = ode113(@fnbody, t, [x; v], [], m, g);
xout = xyv(:,1:length(x))';
... but I can’t call ode113 in C in parallel (or can I?)
At the end of the day, the particles are governed by a collection
of ordinary differential equations. So maybe one could consider
using a standard ODE solver, like one of the ones in MATLAB, to
solve the system. This is not such a stupid idea, really. But it
turns out that we might want to think again, both for parallelism
and to get the properties that we want out of our ODE solver.
A simple serial simulation
Maybe a fixed step leapfrog will do?
% Fixed-step integration (same workspace assumptions as above; x and v
% hold the coordinates and velocities for all particles).
npts = 100;
steps_per_pt = 10;
dt = tfinal/(steps_per_pt*(npts-1));
xout = zeros(2*n, npts);
xout(:,1) = x;
for i = 1:npts-1
  for ii = 1:steps_per_pt
    x = x + v*dt;          % update positions with current velocities
    a = fnbody(x, m, g);   % recompute accelerations
    v = v + a*dt;          % update velocities with new accelerations
  end
  xout(:,i+1) = x;
end
The standard ODE solvers are pretty accurate over moderate stretches of
time, but they do have some error from approximating an ODE by
something that involves discrete time steps. Unfortunately,
varying the step size to control the local error often doesn't get
us quite what we want. It is OK if we mess up some details, as
long as others are well preserved. In physics, for example, we
usually want conservation of energy and momentum, and some simple
fixed-step integrators - like the leapfrog scheme shown above -
may preserve those even better than the MATLAB ODE solvers do.
Fixed-step leapfrog is easier to code, too!
Plotting particles
Smoothed Particle Hydrodynamics (SPH) simulation
You'll get to play with this integrator soon enough for yourself.
The plan is that the next project will involve a smoothed particle
hydrodynamics or SPH simulation, and we'll use leapfrog for the
time integration.
Pondering particles
Where do particles live (distributed mem)?
Decompose in space? By particle number?
What about clumping?
How are long-range force computations organized?
How are short-range force computations organized?
How is force computation load balanced?
What are the boundary conditions?
How are potential singularities handled?
What integrator is used? What step control?
We'll say more about time steppers on Thursday. But the more
interesting thing to talk about at the moment is how we manage the
data for the particles and the computation of the forces between
them at any given step. We also have to worry about things like
what happens at boundaries - do we bounce off walls, or does
something else happen? And sometimes we have to worry about the
possibility that our model has singular behavior, like infinite
attraction between particles that are zero distance apart. Of
course, that type of interaction usually means that we have
neglected some important part of the physics, but that's cold
comfort when we're trying to sort out why NaNs are suddenly
appearing in our simulations!
External forces
Simplest case: no particle interactions.
Embarrassingly parallel (like Monte Carlo)!
Could just split particles evenly across processors
Is it that easy?
Maybe some trajectories need short time steps?
Even with MC, load balance may not be trivial!
All right, let's turn now to the topic of dealing with forces.
By far the easiest class of forces to deal with is external
forces. Think here of the Earth's gravitational field in a
physical context, or the lure of students (or faculty!) to free food
if we are modeling crowd dynamics. If external forces are the
only thing we have to worry about, we have pleasing (or
embarrassing) parallelism: each
trajectory can be computed independently of the others. Of
course, it can't be that easy, because there is usually some
inhomogeneity in how much each simulation costs, and that
affects things like load balance. But still, this is much
easier to deal with than cases where the particles actually interact.
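For a minimal sketch of what "split particles evenly" might mean in a distributed-memory code (the block-partition convention and the name my_block are mine, not the course's):
/* Sketch: give each of p ranks a contiguous block of roughly N/p
 * particles; with external forces only, each rank can then integrate
 * its block with no communication at all. */
void my_block(long N, int rank, int p, long* start, long* end)
{
    long base = N / p, rem = N % p;          /* first rem ranks get one extra */
    *start = rank*base + (rank < rem ? rank : rem);
    *end   = *start + base + (rank < rem ? 1 : 0);
}
Of course, as noted above, equal particle counts need not mean equal work if some trajectories demand smaller time steps than others.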
Local forces
Simplest all-pairs check is \(O(n^2)\) (expensive)
Or only check close pairs (via binning, quadtrees?)
Communication required for pairs checked
Usual model: domain decomposition
Things get more interesting when particles interact locally.
The naive thing to do here would be to check every pair of
particles for interaction, which is O(n^2) where n is the number
of particles. But we can go much faster if we instead partition
space into bins, and only have to check for interactions between
particles in neighboring bins.
Often, we would assign big regions of space (maybe broken into
smaller bins internally) to each processor; this is a spatial
decomposition of the problem. We could also choose to partition
the problem by assigning a fixed subset of the particles to each
processor, or by assigning a subset of the particle-particle
interactions to each processor. The fastest particle
simulations out there often end up using a combination of these
approaches.
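Here is a rough sketch of the binning idea, with names and data layout of my own choosing: build a linked-cell structure with bin width at least the interaction cutoff, so each particle only needs to be checked against particles in its own bin and the eight neighboring bins.
#include <math.h>

/* Sketch: sort particles into nx-by-ny bins of width h using a
 * head/next linked-list representation.  Positions are interleaved
 * 2D points as before; layout and names are illustrative. */
void build_cells(int n, const double* x, double h, int nx, int ny,
                 int* head /* length nx*ny */, int* next /* length n */)
{
    for (int c = 0; c < nx*ny; ++c) head[c] = -1;
    for (int i = 0; i < n; ++i) {
        int bx = (int) floor(x[2*i+0]/h);
        int by = (int) floor(x[2*i+1]/h);
        if (bx < 0) bx = 0;  if (bx >= nx) bx = nx-1;   /* clamp to domain */
        if (by < 0) by = 0;  if (by >= ny) by = ny-1;
        int c = bx + nx*by;
        next[i] = head[c];   /* push particle i onto bin c's list */
        head[c] = i;
    }
}
To find interaction partners for a particle in bin (bx, by), we would then loop over the 3-by-3 block of bins around (bx, by) and walk the corresponding lists.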
Local forces: Communication
Minimize communication:
Send particles that might affect a neighbor “soon”
Trade extra computation against communication
Want low surface area-to-volume ratios on domains
If we use a spatial decomposition, we have to communicate the
position of any particles that are near the boundaries of the
domains owned by each processor. As with the Game of Life
example, we like decomposing into big blocks because they have
a low surface-area-to-volume ratio, with communication
proportional to surface area and computation proportional to
volume.
We could communicate just those particles that are within an
interaction radius of the boundary, but then we'd have to send new
updates at every time step. Instead, we often communicate particles
that are within a somewhat larger radius of the boundary. That way,
we only have to communicate once every few steps. Each message might
involve more particles, but often that's a small price to pay
for reducing the number of messages (and the associated
latencies).
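As a rough rule of thumb (my back-of-the-envelope, not from the slides): if we send particles within an extra "skin" of width \(s\) beyond the interaction radius, and particle speeds are bounded by \(v_{\max}\), then two particles can close that extra gap only after roughly
\[
  k \approx \frac{s}{2 \, v_{\max} \, \Delta t}
\]
steps, so the communicated neighbor information can be reused for about \(k\) steps before we need to exchange again.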
Local forces: Load balance
Are particles evenly distributed?
Do particles remain evenly distributed?
Can divide space unevenly (e.g. quadtree/octree)
Decomposing space into big blocks seems great when we think that
particles might be spread out evenly. But what if they are not
evenly distributed? And what if the distribution changes over
time? In our hypothetical simulation of students and faculty
swarming to the refreshments table before a colloquium, we are
likely to see a concentration around the refreshments table
after a short time, regardless of the initial distribution.
One way to deal with uneven distributions of particles is to
consider uneven partitions of space. There are classic spatial
data structures that do this, such as quadtrees (in 2D) and
octrees (in 3D). We'll talk more about these later in the course.
Far-field forces
Every particle affects every other particle
All-to-all communication required
Overlap communication with computation
Poor memory scaling if everyone keeps everything!
Idea: pass particles in a round-robin manner
Far-field forces are more complicated, because everything
affects everything else. If we don't know about simplifying
approximations for the far-field, this looks like an all-to-all
communication pattern, and that's expensive. Of course, we
could have every processor keep track of every particle, and
split up the interactions processor-by-processor; but it's not
good for memory scalability if everyone has to store
everything.
So here's an idea: what if we run our computation like we would
run a potluck lunch, and pass the dishes (or particles)?
Passing particles (far-field forces)
copy local particles to current buf
for phase = 1:p
send current buf to rank+1 (mod p)
recv next buf from rank-1 (mod p)
interact local particles with current buf
swap current buf with next buf
end
Here's that idea in pictures and pseudocode. We start off with
each processor holding a set of particles that it owns. The
processor copies those particles into a buffer that holds the
particles with which we want to compute interactions. When we
are done with one batch of particles, we send the current set
that we're interacting with on to the next processor, and
receive a new set of particles from the previous processor (in a
ring ordering). If we have some memory to spare, we can do
these data transfers at the same time that we are computing,
hiding the communication costs (at least partly). This strategy
of overlapping communication and computation is a good approach
across a lot of different types of parallel patterns.
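Here is one way the pseudocode above might look in MPI, with the overlap made explicit; the packed double buffers, the interact() callback, and the assumption that every rank holds the same number of values are simplifications of mine, not the official course interface.
#include <mpi.h>
#include <string.h>

/* Sketch of the ring ("potluck") pass: in each phase, start shipping the
 * current buffer to the next rank, compute interactions with it while the
 * message is in flight, then swap in the buffer that just arrived. */
void ring_pass(int count, const double* my_buf, double* cur, double* nxt,
               void (*interact)(const double* mine, const double* others, int count),
               MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);
    int right = (rank+1) % p, left = (rank+p-1) % p;

    memcpy(cur, my_buf, count * sizeof(double));   /* start with our own particles */
    for (int phase = 0; phase < p; ++phase) {
        MPI_Request req[2];
        MPI_Isend(cur, count, MPI_DOUBLE, right, phase, comm, &req[0]);
        MPI_Irecv(nxt, count, MPI_DOUBLE, left,  phase, comm, &req[1]);
        interact(my_buf, cur, count);              /* overlap compute with comm */
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        double* tmp = cur; cur = nxt; nxt = tmp;   /* swap current and next */
    }
}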
Passing particles (far-field forces)
Suppose \(n = N/p\) particles in buffer. At each phase \[\begin{aligned}
t_{\mathrm{comm}} & \approx \alpha + \beta n \\
t_{\mathrm{comp}} & \approx \gamma n^2
\end{aligned}\] So we can mask communication with computation if \[n \geq
\frac{1}{2\gamma} \left( \beta + \sqrt{\beta^2 + 4 \alpha \gamma} \right)
> \frac{\beta}{\gamma}\]
Let's do a little back-of-the-envelope modeling here. Suppose we
are doing our round-robin protocol with n = N/p particles in each
buffer. Each of the p phases has a message cost involving a latency
alpha and an inverse bandwidth beta per particle, plus a local
computation cost of gamma for each of the n^2 interactions we need
to accumulate. To find where the computation cost exceeds the
communication cost, we can solve a little quadratic inequality. An
unsurprising lower bound comes from dropping the latency term to
find that we need at least beta over gamma particles per processor
in order to avoid being communication bound. But, of course, things
get worse when there is a nontrivial amount of latency, as there
often is.
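Filling in the algebra that the slide skips: per phase, we want
\[
  \gamma n^2 \geq \alpha + \beta n
  \quad\Longleftrightarrow\quad
  n \geq \frac{\beta + \sqrt{\beta^2 + 4\alpha\gamma}}{2\gamma},
\]
and dropping the latency term \(\alpha\) gives the weaker requirement \(n \geq \beta/\gamma\).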
Passing particles (far-field forces)
More efficient serial code
\(\implies\) larger \(n\) needed to mask communication!
\(\implies\) worse speed-up as \(p\) gets larger (fixed \(N\))
but scaled speed-up (\(n\) fixed) remains unchanged.
One of the takeaways from our back-of-the-envelope analysis is
that the better we are at optimizing the local computation
(decreasing gamma), the more particles per processor we need to
mask communication costs! And things look even worse for strong
scaling experiments, since the number of particles per processor
decreases as we increase the number of processors. Depending on how
we set it up, things might look better for a weak scaling
experiment - that depends on whether we try to keep the same work
per processor (where the total work scales like N^2) or the same
number of particles per processor.
Far-field forces: particle-mesh methods
Consider \(r^{-2}\) electrostatic potential interaction
Enough charges looks like a continuum!
Poisson maps charge distribution to potential
Fast Poisson for regular grids (FFT, multigrid)
Approx depends on mesh and particle density
Can clean up leading part of approximation error
OK, now let's talk about what happens when we have long-range
forces where we can get some type of simplification in the far
field. This happens, for example, for electrostatic and
gravitational forces. When we have enough charges in a region,
it starts to look less like a discrete set of charges and more
like a continuous distribution. This is great, because the
electrostatic potential from a continuous distribution of
charges can be computed by solving a Poisson equation, one of
the standard classes of partial differential equations that we
know how to solve fairly quickly, using methods like fast
Fourier transforms or multigrid solvers that we will talk about
later in the semester.
Long-range forces: particle-mesh
Map particles to mesh points (multiple strategies)
Solve potential PDE on mesh
Interpolate potential to particles
Add correction term – acts like local force
The basic strategy of particle-mesh methods, then, is to
approximate the distribution of particles by "virtual" particles
sitting on a mesh, and then to compute interactions for that
distribution of virtual particles via a fast Poisson solve.
Of course, this is an approximation, and how good it is depends
some on the mesh size and the particle density. Usually, we
would improve the approximation with a local correction for
interactions between particles that are close to each other.
We know how to handle that local correction efficiently using
the types of techniques we talked about earlier in the deck
for dealing with short-range forces.
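To give a flavor of the first step, here is a minimal sketch of particle-to-mesh deposition in 1D with linear ("cloud-in-cell") weights; the 1D setting, the names, and the handling of out-of-range particles are simplifications of mine, and the fast Poisson solve itself is not shown.
#include <math.h>

/* Sketch: deposit particle charges q[i] at positions x[i] onto a uniform
 * 1D mesh with spacing h and nodes at k*h, k = 0..nmesh-1, using linear
 * cloud-in-cell weights.  rho is the resulting charge density. */
void deposit_cic(int n, const double* x, const double* q,
                 int nmesh, double h, double* rho)
{
    for (int k = 0; k < nmesh; ++k) rho[k] = 0.0;
    for (int i = 0; i < n; ++i) {
        double s = x[i]/h;
        int k = (int) floor(s);
        double w = s - k;                    /* fraction toward node k+1 */
        if (k < 0 || k+1 >= nmesh) continue; /* ignore out-of-range here  */
        rho[k]   += q[i]*(1.0-w)/h;          /* split the charge between  */
        rho[k+1] += q[i]*w/h;                /* the two neighboring nodes */
    }
}
Interpolating the mesh potential back to the particles typically uses the same weights, which helps keep the overall scheme consistent.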
Far-field forces: tree methods
Distance simplifies things
Andromeda looks like a point mass from here?
Build a tree, approx descendants at each node
Variants: Barnes-Hut, FMM, Anderson’s method
More on this later in the semester
Particle-mesh methods work pretty well when there are a lot of
particles and they are fairly evenly distributed in space. But
those aren't always valid assumptions. An alternative that
works even with less even particle distributions is the idea of
a so-called tree code. In tree codes, we organize our
approximation in boxes of increasing size, and keep a "summary"
approximation of the potential from things inside the box on
things that are sufficiently far away from the box (relative to
the box size). For example, in the solar system, we might need
to keep track of the sun and all the different planets and their
interactions; but if we care about the pull of the Andromeda
galaxy at all, we will probably approximate it as a single giant
mass at a point very far away.
There are a variety of these tree-code approximation schemes out
there, and we'll try to talk about some of them in more detail
later in the semester.
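For concreteness: in Barnes-Hut, for example, the usual way to make "sufficiently far away relative to the box size" precise is an opening-angle test of the form
\[
  \frac{s}{d} < \theta,
\]
where \(s\) is the width of the box, \(d\) is the distance from the evaluation point to the box's center of mass, and \(\theta\) is a tolerance (often around 0.5); if the test fails, we descend to the box's children instead of using its summary.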
Summary of particle example
Model: Continuous motion of particles
Could be electrons, cars, whatever...
Step through discretized time
OK, what have we talked about in this deck? We've been talking
about continuous motions of "particles" for a very loose meaning
of particles - they could be electrons, cars, or people, and the
pattern still holds. We have a set of differential equations
that govern the motion of these particles, and use some
time-stepping method to approximate the solutions of these
differential equations (where we'll talk more about
time-stepping in the Thursday deck).
Summary of particle example
Local interactions
Relatively cheap
Load balance a pain
All-pairs interactions
Obvious algorithm is expensive (\(O(n^2)\) )
Particle-mesh and tree-based algorithms help
An important special case of lumped/ODE models.
The key to making particle simulations go fast is the
organization of data and computations in order to quickly
compute the forces acting on the particles. External forces are
the easy part; the harder part is computing particle-particle
interactions. We can be pretty efficient about short-range
interactions by using appropriate spatial data structures in
order to only compute interactions that matter, but the
all-to-all computation involved in long-range forces is more
expensive. The obvious algorithms are O(n^2), but there are
alternative methods that give a very good approximation that
cost less; these include particle-mesh and tree-based
algorithms.
Particle systems are a special case of an ODE-based model. The
structure of the force interactions (which often varies over
time) is interesting enough for us to treat these as a thing on
their own. But there are some concerns in common with other
ODE-based simulations, as we will see when we pick up with our
discussion in the slides for Thursday.