Meeting notes (2020-10-01)
Logistics (5 minutes)
- Please use n2-highcpu-2 for timing and tuning on matmul.
- Accounts will be set up soon on Graphite (CPU and GPU).
  - Priority partitions: cs5220 and cs5220-gpu
Quick discussion of matrix multiply status (10 minutes)
Examples: 1D waves in C and Python (10 minutes)
I will probably run this part locally on my laptop, but it is also
possible to run it remotely on a Google Compute Engine machine.
I do this with SSH tunneling, a technique that makes traffic sent to a
port on my local machine travel across SSH and magically arrive at a
port on the remote machine. This is a standard bit of SSH trickery,
and with the Google systems it can be set up by running the gcloud
command reported by clicking the “SSH” menu to the right of the
instance information in the Compute Engine instance list. For me,
this brings up something like:
gcloud beta compute ssh --zone "us-central1-a" "instance-1" --project "cs-5220-289518"
Typing that command at the command line of my Mac gets me an ordinary SSH session to a Google VM. To add tunneling, I run
gcloud beta compute ssh --zone "us-central1-a" "instance-1" --project "cs-5220-289518" -- -L 8888:localhost:8888
Then, when I run jupyter notebook on the VM (after installing all the
packages needed to do such a thing!), the notebook server by default
listens on port 8888 of the Google VM, and I can reach it through port
8888 of my local host. That is, I can access the Jupyter notebook on my local
machine by pointing my web browser to
http://localhost:8888/?token=RANDOM-LOOKING-STUFF-HERE
and interact with the remote Jupyter notebook from the comfort of my local web browser.
Breakout (35 minutes)
- Suppose you have a tuned single-core dot product that is limited by memory bandwidth (with memory at 12.4 GB/s for one core), and sending a message between processors takes 10 microseconds. If a parallel dot product implementation requires p-1 messages, what is the speedup curve for running a dot product on double precision vectors of dimension one million?
- Consider a spatial decomposition of “Game of Life” on an n-by-n grid with periodic boundary conditions in distributed memory. Assume we have a p-by-q grid of processors, and exchange a “halo” of d layers of boundary cells every d steps of the simulation. How would we model the communication and computation costs at each step? Under what circumstances is it possible to “hide” the communication under the computation? Use a simple model of the type discussed toward the end of the particle lecture.
- Address the questions at the end of the waves demo.
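One way to set up the dot product speedup curve is the simple latency/bandwidth model sketched below. Beyond what the question states, it assumes that each processor streams its n/p share of both vectors at the full 12.4 GB/s, that the p-1 messages are sent one after another so their latencies add up, and that the final additions of partial sums cost nothing. The helper name dot_speedup is hypothetical.

```python
def dot_speedup(p, n=1_000_000, bandwidth=12.4e9, latency=10e-6):
    """Modeled speedup of a parallel dot product on p processors.

    Assumptions (not from the notes): every processor reads its n/p share of
    both vectors at the full single-core bandwidth, the p-1 messages are not
    overlapped (total cost (p-1)*latency), and summing partial results is free.
    """
    bytes_read = 2 * 8 * n               # two vectors of 8-byte doubles
    t_serial = bytes_read / bandwidth    # about 1.29 ms for n = 10^6
    t_parallel = t_serial / p + (p - 1) * latency
    return t_serial / t_parallel


if __name__ == "__main__":
    for p in (1, 2, 4, 8, 11, 16, 32, 64):
        print(f"p = {p:3d}  speedup = {dot_speedup(p):5.2f}")
    best = max(range(1, 129), key=dot_speedup)
    print("best p:", best)
```

Under these assumptions the modeled time T1/p + (p-1)·10 us is smallest near p = sqrt(T1 / 10 us) ≈ 11, where the speedup is only about 6; beyond that, message latency dominates and the curve falls again.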
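For the Game of Life question, here is a minimal alpha-beta (latency plus per-byte cost) sketch; it is an assumption, not necessarily the exact model from the particle lecture. It further assumes that each processor owns an (n/p)-by-(n/q) block, that one exchange costs 8 messages (4 edges and 4 corners, since periodic boundaries give every block 8 neighbors) at alpha seconds each plus beta seconds per byte, that each cell takes 1 byte, and that one cell update costs t_cell seconds. As one possible criterion for hiding communication, it compares the cost of an exchange with the work that needs no new halo data. All function names and parameter values are hypothetical.

```python
def halo_cells(a, b, d):
    """Ghost cells one processor receives per exchange for an a-by-b block."""
    return (a + 2 * d) * (b + 2 * d) - a * b


def comm_time(a, b, d, alpha, beta, bytes_per_cell=1):
    """One halo exchange modeled as 8 messages (4 edges + 4 corners)."""
    return 8 * alpha + beta * bytes_per_cell * halo_cells(a, b, d)


def compute_time(a, b, d, t_cell):
    """Work for the d steps between exchanges, including redundant ghost updates.

    On step k (k = 1..d) a processor can still update d - k ghost layers
    around its block, so it touches (a + 2(d-k)) * (b + 2(d-k)) cells.
    """
    return t_cell * sum((a + 2 * (d - k)) * (b + 2 * (d - k)) for k in range(1, d + 1))


def overlappable_time(a, b, d, t_cell):
    """Work that needs no new halo: a cell at least k cells from the block
    edge can be advanced k steps using only locally owned data."""
    return t_cell * sum(max(a - 2 * k, 0) * max(b - 2 * k, 0) for k in range(1, d + 1))


def life_model(n, p, q, d, alpha, beta, t_cell):
    """Return (time per step without overlap, whether communication can be hidden)."""
    a, b = n // p, n // q  # block size; assumes p, q divide n and d <= min(a, b)
    t_comm = comm_time(a, b, d, alpha, beta)
    t_comp = compute_time(a, b, d, t_cell)
    per_step = (t_comm + t_comp) / d  # one exchange amortized over d steps
    hideable = t_comm <= overlappable_time(a, b, d, t_cell)
    return per_step, hideable


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    for d in (1, 2, 4, 8):
        per_step, hideable = life_model(n=4096, p=8, q=8, d=d,
                                        alpha=10e-6, beta=1e-9, t_cell=1e-9)
        print(f"d = {d}: {per_step * 1e6:8.2f} us/step, hideable: {hideable}")
```

In this model a larger d divides the per-step message latency by d but adds redundant work in the ghost layers, so sweeping d shows the trade-off.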