Due: Wednesday, March 10 by 5 pm.
Problem
The purpose of this assignment is to introduce you to programming in
shared and distributed memory models. Your goal is to parallelize a toy
particle simulator with a short-range potential force (Lennard-Jones) and
an external gravitational field. Some starting point code is documented
here. Our current version uses a naive
algorithm that does not exploit spatial locality. Your mission, should you
choose to accept it, is to fix that. You should report on your progress
in two weeks, including three things:
- Characterize the performance of all three basic codes as a function
of the number of particles in the system and the number of processors.
Simple models are nice here, but mostly I want to see empirical timing
data.
- Improve the complexity by changing from the naive O(n²)
algorithm to a (roughly) O(n) force-evaluation algorithm based on
spatial partitioning; a minimal serial binning sketch follows this list.
You will want to modify the parallel algorithms
to use this spatial decomposition, and you will probably want to do
something with the communication to keep it from dominating the
computational cost in your new code!
- If you have time, play a little! Improve or extend the code in some
way that appeals to you, either by doing something clever with the time
integrator, adding error diagnostics (is monitoring conservation of
energy and momentum enough?), doing some dynamic load balancing, or
doing some performance tuning on the serial implementation; a small
conservation-check sketch for the diagnostics option also follows this
list. Feel free to suggest your own ideas as well!
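
Here is a minimal serial sketch of the binning idea referenced above. It assumes a square domain of side `size`, an interaction range `cutoff` beyond which the short-range force is negligible, and unit-mass particles; the struct layout, parameter names, and Lennard-Jones constants are illustrative stand-ins for whatever the starter code actually uses.

    #include <stdlib.h>

    typedef struct { double x, y, vx, vy, ax, ay; } particle_t;

    /* Illustrative 12-6 Lennard-Jones pair force on `a` due to `b`
       (unit mass, so the force adds directly to the acceleration).
       Substitute the starter code's own force routine here. */
    static void apply_force(particle_t *a, const particle_t *b,
                            double eps, double sigma, double cutoff)
    {
        double dx = b->x - a->x, dy = b->y - a->y;
        double r2 = dx * dx + dy * dy;
        if (r2 > cutoff * cutoff || r2 == 0.0) return;
        double s2 = sigma * sigma / r2;
        double s6 = s2 * s2 * s2;
        /* -dU/dr / r for U(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6) */
        double coef = -24.0 * eps * (2.0 * s6 * s6 - s6) / r2;
        a->ax += coef * dx;
        a->ay += coef * dy;
    }

    /* Bin particles into a uniform grid whose cells are at least `cutoff`
       wide, then evaluate the short-range force only against the 3x3 block
       of neighboring cells.  With a bounded number of particles per cell,
       this is O(n) work per step instead of O(n^2). */
    void compute_forces_binned(particle_t *p, int n, double size, double cutoff,
                               double eps, double sigma, double gravity)
    {
        int nb = (int)(size / cutoff);
        if (nb < 1) nb = 1;
        double w = size / nb;

        /* head[c] = first particle in cell c; next[i] = next particle in i's cell */
        int *head = malloc((size_t)nb * nb * sizeof(int));
        int *next = malloc((size_t)n * sizeof(int));
        for (int c = 0; c < nb * nb; c++) head[c] = -1;
        for (int i = 0; i < n; i++) {
            int bx = (int)(p[i].x / w); if (bx >= nb) bx = nb - 1;
            int by = (int)(p[i].y / w); if (by >= nb) by = nb - 1;
            int c = by * nb + bx;
            next[i] = head[c];
            head[c] = i;
        }

        for (int i = 0; i < n; i++) {
            p[i].ax = 0.0;
            p[i].ay = -gravity;                 /* external gravitational field */
            int bx = (int)(p[i].x / w); if (bx >= nb) bx = nb - 1;
            int by = (int)(p[i].y / w); if (by >= nb) by = nb - 1;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int cx = bx + dx, cy = by + dy;
                    if (cx < 0 || cx >= nb || cy < 0 || cy >= nb) continue;
                    for (int j = head[cy * nb + cx]; j != -1; j = next[j])
                        if (j != i)
                            apply_force(&p[i], &p[j], eps, sigma, cutoff);
                }
        }
        free(head);
        free(next);
    }

Rebuilding the bins every step is itself O(n), so the cost per step stays linear as long as the particle density stays bounded.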
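For the error-diagnostics suggestion in the last item, one simple check (using the same illustrative particle_t) is to print total kinetic energy and total momentum every few hundred steps and watch for drift. Note that with an external gravitational field and bouncing walls these quantities are not all strictly conserved, which is part of what that item's question is getting at; a fuller diagnostic would also track the Lennard-Jones pair potential and the gravitational potential energy.

    #include <stdio.h>

    /* Print total kinetic energy and momentum (unit masses assumed) so
       drift is easy to spot over a long run.  Add the pair potential and
       the gravitational potential energy for a full energy balance. */
    static void report_conservation(const particle_t *p, int n, int step)
    {
        double ke = 0.0, px = 0.0, py = 0.0;
        for (int i = 0; i < n; i++) {
            ke += 0.5 * (p[i].vx * p[i].vx + p[i].vy * p[i].vy);
            px += p[i].vx;
            py += p[i].vy;
        }
        printf("step %d: KE = %g  momentum = (%g, %g)\n", step, ke, px, py);
    }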
Source Code
You may start with the serial and parallel implementations supplied
below. All of them run in O(n²) time.
You may consider using the following Java visualization program to
check the correctness of the result produced by your code: Bouncy.jar. If you feel like hacking on it, here is
Bouncy.java.
Submission
You may work in groups of 2 or 3. One person in your group should be a
non-CS student (if possible), but otherwise you are responsible for
finding your own group. You do not need to keep the same group as last
time.
Here is a list of items you might show in your report:
- A plot in log-log scale that shows that your serial and parallel
codes run in O(n) time and a description of the data
structures that you used to achieve it.
- A description of the synchronization you used in the shared memory
implementation (a minimal OpenMP sketch follows this list).
- A description of the communication you used in the distributed
memory implementation (a minimal MPI ghost-exchange sketch follows this list).
- A description of the design choices that you tried and how they
affected performance.
- Speedup plots that show how closely your parallel codes approach
the idealized p-times speedup and a discussion on whether it is
possible to do better.
- Where does the time go? Consider breaking down the runtime into
computation time, synchronization time, and/or communication time. How
do they scale with p? (A short timing snippet follows this list.)
- A discussion on using OpenMP and MPI.
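
To make the shared-memory synchronization item concrete, here is a minimal OpenMP skeleton; the helper names are assumptions, not the starter code's. Each iteration of the force loop writes only particle i and merely reads its neighbors, so the loop needs no locks, and the implicit barrier at the end of each `omp parallel for` is the only synchronization between phases. Exploiting Newton's third law (updating both particles of a pair) would instead require `#pragma omp atomic` updates or one lock per bin.

    #include <omp.h>

    /* Skeleton of one shared-memory time step.  compute_force_on() is
       assumed to do what the i-loop body of the binned routine above does
       (scan the 3x3 neighborhood of particle i's bin); move() is the
       starter code's integrator; rebuild_bins() re-bins after particles
       move.  Bin-array and physics parameters are omitted for brevity. */
    void rebuild_bins(particle_t *p, int n);
    void compute_force_on(particle_t *p, int n, int i);
    void move(particle_t *pi);

    void step_openmp(particle_t *p, int n)
    {
        rebuild_bins(p, n);            /* serial, or parallelized over rows of bins */

        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++)
            compute_force_on(p, n, i); /* reads neighbors, writes only p[i].ax/ay */

        /* implicit barrier here keeps the force and move phases ordered */

        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++)
            move(&p[i]);
    }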
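For the distributed-memory communication item, here is a sketch under a 1-D slab decomposition: each rank owns the particles whose y-coordinate lies in [lo, hi) and, before the force phase, trades with its two neighboring ranks the particles within `cutoff` of the shared boundaries (the "ghost" particles). The buffer sizes and the PARTICLE datatype (e.g. built with MPI_Type_contiguous over the six doubles of particle_t) are assumptions of the sketch.

    #include <mpi.h>

    /* Exchange ghost particles with the ranks below (rank-1) and above
       (rank+1).  `mine`/`nmine` are the particles this rank owns, [lo, hi)
       is its slab, and `sendbuf`/`ghost` are caller-provided buffers assumed
       large enough.  The received ghosts are appended to `ghost`. */
    void exchange_ghosts(const particle_t *mine, int nmine, double lo, double hi,
                         double cutoff, MPI_Datatype PARTICLE,
                         int rank, int nranks,
                         particle_t *sendbuf, particle_t *ghost, int *nghost)
    {
        int up   = (rank + 1 < nranks) ? rank + 1 : MPI_PROC_NULL;
        int down = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;

        *nghost = 0;
        for (int dir = 0; dir < 2; dir++) {       /* 0: send down, 1: send up */
            int dest = dir ? up   : down;
            int src  = dir ? down : up;

            /* Pack the particles within `cutoff` of the boundary being sent. */
            int nsend = 0;
            for (int i = 0; i < nmine; i++) {
                double y = mine[i].y;
                if ((dir == 0 && y < lo + cutoff) || (dir == 1 && y > hi - cutoff))
                    sendbuf[nsend++] = mine[i];
            }

            /* Trade counts first, then the particles themselves. */
            int nrecv = 0;
            MPI_Sendrecv(&nsend, 1, MPI_INT, dest, 0,
                         &nrecv, 1, MPI_INT, src,  0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(sendbuf, nsend, PARTICLE, dest, 1,
                         ghost + *nghost, nrecv, PARTICLE, src, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            *nghost += nrecv;
        }
    }

After the exchange, each rank evaluates forces on its own particles using its own particles plus the ghosts, moves only its own particles, and reassigns any particle that crosses a slab boundary with a similar exchange.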
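Finally, for the "where does the time go" item, the simplest instrumentation is to bracket each phase of the step loop with MPI_Wtime() (or omp_get_wtime() in the shared-memory code) and accumulate per-phase totals; reducing the totals with MPI_MAX over ranks at the end gives a breakdown to plot against p.

    /* Inside the main simulation loop: accumulate per-phase wall-clock time. */
    double t_comm = 0.0, t_force = 0.0, t_move = 0.0;
    for (int step = 0; step < nsteps; step++) {
        double t0 = MPI_Wtime();
        /* ghost exchange / synchronization phase */
        double t1 = MPI_Wtime();
        /* force evaluation phase */
        double t2 = MPI_Wtime();
        /* move / integration phase */
        double t3 = MPI_Wtime();
        t_comm  += t1 - t0;
        t_force += t2 - t1;
        t_move  += t3 - t2;
    }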
Resources
OpenMP tutorial (LLNL), OpenMP tutorial, OpenMP specifications,
MPI tutorial (LLNL), and MPI specifications.