Logistics (5 minutes)

  • Project 1 release
  • Alternate times (TR 1:25-2:40 in the Gather.town office)

Discussion of project 1 (15 minutes)

  • Walkthrough of assignment logistics and code

Collaborating on code (15 minutes)

Breakout groups (25 minutes)

  1. Open Google Cloud Shell and type cat /proc/cpuinfo. What are the processor family, clock rate, and L1 cache size?

  2. What is the arithmetic intensity of the centroid code?

  3. What is the effective flop rate ceiling for the centroid code if you ignore memory traffic and consider only the peak arithmetic rate? What if there is no vectorization? Use a GCP E2 instance as the baseline (this is what you get with Cloud Shell).

  4. The STREAM triad benchmark shows an effective memory bandwidth of about 12.4 GB/s. Sketch a roofline diagram for yourself with the two effective flop rates as horizontal lines and the 12.4 GB/s memory bandwidth as the sloped line. Where are the crossover points on this plot? Is the centroid code bottlenecked by memory or by flop rate when it is properly vectorized? What about when only scalar arithmetic operations are used?
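
As a starting point for questions 3 and 4, here is a minimal Python sketch of the roofline arithmetic. Only the 12.4 GB/s bandwidth comes from the STREAM figure above. The two peak flop rates and the arithmetic intensity are hypothetical placeholders, not answers; replace them with your own estimates for the E2 instance and the centroid code.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: the lower of the compute ceiling and bandwidth * intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (flops/byte) at which the two bounds cross."""
    return peak_gflops / bandwidth_gbs


STREAM_BW_GBS = 12.4  # effective bandwidth from the STREAM triad figure

# Hypothetical placeholder values (assumptions, not measurements).
PEAK_VECTOR_GFLOPS = 32.0  # replace with your vectorized ceiling
PEAK_SCALAR_GFLOPS = 4.0   # replace with your scalar ceiling
INTENSITY = 0.125          # replace with your flops/byte estimate

for label, peak in [("vectorized", PEAK_VECTOR_GFLOPS),
                    ("scalar", PEAK_SCALAR_GFLOPS)]:
    ridge = ridge_point(peak, STREAM_BW_GBS)
    bound = attainable_gflops(peak, STREAM_BW_GBS, INTENSITY)
    limiter = "memory" if INTENSITY < ridge else "flop rate"
    print(f"{label}: crossover at {ridge:.2f} flops/byte, "
          f"bound {bound:.2f} GFlop/s, limited by {limiter}")
```

A code whose arithmetic intensity lies to the left of a ceiling's crossover point is memory-bound under that ceiling; to the right, it is bound by flop rate.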

Report out (5 minutes)