Parallelism and Locality in Simulations
2024-09-17
The world exhibits parallelism and locality
Get more parallelism / locality through model
Often get parallelism at multiple levels
Often more than one type of simulation is appropriate.
(Sometimes more than one at a time!)
May be discrete or continuous time.
Game of life (John Conway):
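For reference, the standard rule is that a live cell survives with two or three live neighbors and a dead cell becomes live with exactly three. A minimal serial sketch of one generation follows. The name `life_step`, the one-byte-per-cell layout, and the periodic (wrap-around) boundary are assumptions made for illustration, not something the slides specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: advance an n-by-n board by one generation.
// board[i*n + j] is 1 for a live cell and 0 for a dead one.
// Assumption: periodic boundaries (the board wraps around at the edges).
std::vector<std::uint8_t> life_step(const std::vector<std::uint8_t>& board, int n)
{
    assert(n > 0 && board.size() == static_cast<std::size_t>(n) * n);
    std::vector<std::uint8_t> next(board.size(), 0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            // Count the eight neighbors of cell (i, j).
            int count = 0;
            for (int di = -1; di <= 1; ++di) {
                for (int dj = -1; dj <= 1; ++dj) {
                    if (di == 0 && dj == 0) continue;
                    int ii = (i + di + n) % n;
                    int jj = (j + dj + n) % n;
                    count += board[ii * n + jj];
                }
            }
            bool alive = board[i * n + j] != 0;
            // Birth on exactly three neighbors; survival on two or three.
            next[i * n + j] = (count == 3 || (alive && count == 2)) ? 1 : 0;
        }
    }
    return next;
}
```

Each cell reads only its eight neighbors from the previous generation, which is the locality that domain decomposition and tiling rely on.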
What to do if I really cared?
Before doing anything with OpenMP/MPI!
Easy to parallelize by domain decomposition
Also works with tiling.
Sketch of a kernel for tiled implementation:
[Kernel sketch garbled in extraction; surviving fragments: uint64_t; ref and tmp; ref to tmp; "four back"]
Some areas are more eventful than others!
What if pattern is dilute?
How do we manage events?
How do we manage load balance?
Particles move via Newton (\(F = ma\)) with
\[\begin{aligned} f_i &= \sum_j G m_i m_j \frac{(x_j-x_i)}{r_{ij}^3} \left( 1-\left( \frac{a}{r_{ij}} \right)^4 \right), \\ r_{ij} &= \|x_i-x_j\| \end{aligned}\]
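The force law above can be evaluated directly with a double loop over pairs. The following is a sketch under stated assumptions: the name `compute_forces`, the 2D `Vec2` type, and the array layout are hypothetical, and all particle positions are assumed distinct so that \(r_{ij} > 0\).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec2 = std::array<double, 2>;  // assumption: two space dimensions

// Hypothetical helper: direct O(N^2) evaluation of
//   f_i = sum_j G m_i m_j (x_j - x_i) / r_ij^3 * (1 - (a / r_ij)^4).
// Assumes no two particles coincide (r_ij > 0).
std::vector<Vec2> compute_forces(const std::vector<Vec2>& x,
                                 const std::vector<double>& m,
                                 double G, double a)
{
    assert(x.size() == m.size());
    const std::size_t N = x.size();
    std::vector<Vec2> f(N, Vec2{0.0, 0.0});
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = i + 1; j < N; ++j) {
            double dx = x[j][0] - x[i][0];
            double dy = x[j][1] - x[i][1];
            double r2 = dx * dx + dy * dy;
            double r = std::sqrt(r2);
            double ar = a / r;
            // Scalar factor shared by both components of the pair force.
            double s = G * m[i] * m[j] / (r2 * r) * (1.0 - ar * ar * ar * ar);
            f[i][0] += s * dx;
            f[i][1] += s * dy;
            // Equal and opposite force on j, so each pair is visited once.
            f[j][0] -= s * dx;
            f[j][1] -= s * dy;
        }
    }
    return f;
}
```

The pair force is attractive for \(r_{ij} > a\), vanishes at \(r_{ij} = a\), and is repulsive for \(r_{ij} < a\).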
Using Boost.Numeric.Odeint, we can write
integrate(particle_system, x0, tinit, tfinal, h0,
          [](const auto& x, double t) {
              std::cout << "t=" << t << ": x=" << x << std::endl;
          });
where
particle_system defines the ODE system
x0 is the initial condition
tinit and tfinal are start and end times
h0 is the initial step size
and the final lambda is an observer function.
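Odeint calls the system as `sys(x, dxdt, t)` and expects it to fill in `dxdt`, so Newton's second-order equations have to be written as a first-order system in positions and velocities. One possible shape for `particle_system` is sketched below. The struct name `ParticleSystem1D`, the restriction to one space dimension, and the state layout (positions first, then velocities) are assumptions for illustration. The sketch does not depend on Boost itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using state_type = std::vector<double>;

// Hypothetical system functor for N particles on a line.
// Assumed state layout: x[0..N-1] are positions, x[N..2N-1] are velocities.
struct ParticleSystem1D {
    std::vector<double> m;  // particle masses
    double G;               // interaction strength
    double a;               // distance at which the pair force changes sign

    void operator()(const state_type& x, state_type& dxdt, double /*t*/) const
    {
        const std::size_t N = m.size();
        assert(x.size() == 2 * N);
        dxdt.assign(2 * N, 0.0);
        for (std::size_t i = 0; i < N; ++i) {
            dxdt[i] = x[N + i];  // d(position)/dt = velocity
            for (std::size_t j = 0; j < N; ++j) {
                if (j == i) continue;
                double d = x[j] - x[i];
                double r = std::fabs(d);
                double ar = a / r;
                // d(velocity)/dt = f_i / m_i, so the factor m_i cancels.
                dxdt[N + i] += G * m[j] * d / (r * r * r)
                               * (1.0 - ar * ar * ar * ar);
            }
        }
    }
};
```

With Boost available, an instance of this struct should be usable as the `particle_system` argument, with `x0` holding the initial positions followed by the initial velocities.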
Can parallelize in
Smooth Particle Hydrodynamics (SPH) – Project 2
Simplest case: no particle interactions.
Minimize communication:
copy particles to current buf
for phase = 1 to p
    send current buf to rank+1 (mod p)
    recv next buf from rank-1 (mod p)
    interact local particles with current buf
    swap current buf with next buf
end
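To check that the ring schedule pairs every local particle with every other particle exactly once, the passes can be emulated serially. The sketch below is not MPI code. It keeps one buffer per emulated rank in ordinary vectors, and the name `ring_pass_counts` and the integer particle ids are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical serial emulation of the ring pass above.
// owner[r] lists the ids (0 .. nparticles-1) of particles owned by rank r.
// Returns counts[i][j]: how often local particle i met particle j in a buffer.
std::vector<std::vector<int>>
ring_pass_counts(const std::vector<std::vector<int>>& owner, int nparticles)
{
    const std::size_t p = owner.size();
    assert(p > 0 && nparticles >= 0);
    std::vector<std::vector<int>> counts(
        nparticles, std::vector<int>(nparticles, 0));
    std::vector<std::vector<int>> current = owner;  // copy particles to current buf
    std::vector<std::vector<int>> next(p);
    for (std::size_t phase = 0; phase < p; ++phase) {
        // Rank r sends its current buffer to rank r+1 (mod p);
        // equivalently, each rank receives from rank r-1 (mod p).
        for (std::size_t r = 0; r < p; ++r)
            next[(r + 1) % p] = current[r];
        // Interact local particles with the buffer currently held.
        for (std::size_t r = 0; r < p; ++r)
            for (int i : owner[r])
                for (int j : current[r])
                    if (i != j) ++counts[i][j];
        std::swap(current, next);
    }
    return counts;
}
```

After \(p\) phases each rank has held every buffer once, so every ordered pair \((i, j)\) with \(i \neq j\) is counted exactly once. In a real MPI code the send and receive would be posted as nonblocking operations so that they overlap with the interaction step.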
Suppose \(n = N/p\) particles in buffer. At each phase \[\begin{aligned} t_{\mathrm{comm}} & \approx \alpha + \beta n \\ t_{\mathrm{comp}} & \approx \gamma n^2 \end{aligned}\]
So mask communication with computation if \[ n \geq \frac{1}{2\gamma} \left( \beta + \sqrt{\beta^2 + 4 \alpha \gamma} \right). \]
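The bound comes from requiring \(t_{\mathrm{comp}} \geq t_{\mathrm{comm}}\), that is \(\gamma n^2 - \beta n - \alpha \geq 0\), and taking the positive root of the quadratic. A small sketch that evaluates it is given below. The function name `min_block_size` is hypothetical, and any parameter values passed to it are illustrative rather than measured.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: smallest real n with gamma*n^2 >= alpha + beta*n,
// i.e. the positive root of gamma*n^2 - beta*n - alpha = 0.
// alpha: per-message latency, beta: per-particle transfer time,
// gamma: time per pairwise interaction (all in the same time unit).
double min_block_size(double alpha, double beta, double gamma)
{
    assert(alpha >= 0.0 && beta >= 0.0 && gamma > 0.0);
    return (beta + std::sqrt(beta * beta + 4.0 * alpha * gamma))
           / (2.0 * gamma);
}
```

For example, with \(\alpha = 2\), \(\beta = 1\), \(\gamma = 1\) the threshold is \(n = 2\), and halving \(\gamma\) raises it to \(1 + \sqrt{5} \approx 3.24\).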
More efficient serial code (smaller \(\gamma\))
\(\implies\) larger \(n\) needed to mask communication!
\(\implies\) worse speed-up as \(p\) gets larger (fixed \(N\)),
but scaled speed-up (\(n\) fixed) remains unchanged.
Consider \(r^{-2}\) electrostatic potential interaction
An important special case of lumped/ODE models.