Meeting notes (2020-09-24)
Logistics (5 minutes)
- Buddy check!
- Call for part-time TAs
- Status of course computing platforms
- Outline of the next few weeks
- Note on timing of materials
Mini-breakout (10 minutes)
- Where are you confused right now?
- What do you want to know more about that it looks like I may not cover?
Addressing confusions (10 minutes)
- Group 1
- Talked about speedup on a single core; still uncomfortable with how it looks in practice. Would like more examples of basic cases, and even some more complex ones.
- Group 2
- Basic block and loop vectorization
- Modified code a little bit, loop vectorization
- Group 3
- Sounds similar to what group 1 said! How to put things into practice (one example was roofline – taking a code and doing this in practice)
- Group 4
- Would be nice to have completed versions of problem sets (didn’t get to all of them together). Nice to see others after the fact, plus additional examples.
- Group 5
- Also about how to make the principles work in practice; e.g. cache size, getting compiler to optimize memory fetching. Something more complex than matrix-matrix. Maybe better after proj 1
- Why C and not C++
- Group 6
- Repeat what group 5 said
- Group 7
- Not being able to tell in slides if things are toys or not? Example of what you could do?
- Just more examples, specifically memory alignment, padding
Vectorization demo (20 minutes)
- Comments about intrinsics (Intel guide)
- GCC vector extension
- GCC alignment attributes; __builtin_assume_aligned
- C11 alignment notation (with alignas)
- OpenMP SIMD tutorial and ORNL doc
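The sketch below is not the code from the demo. It is a minimal illustration of how three of the items above can fit together in one dot product: C11 alignas on the storage, the GCC/Clang builtin __builtin_assume_aligned on the pointers, and an OpenMP SIMD pragma on the loop. The names VEC_ALIGN, dot_aligned, demo_x, demo_y, and demo_dot are made up for this example, and the 32-byte alignment is an assumption chosen to match 256-bit AVX registers.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment: 32 bytes, matching 256-bit AVX registers. */
#define VEC_ALIGN 32

/* Dot product of two arrays that the caller promises are VEC_ALIGN-aligned. */
double dot_aligned(const double *x, const double *y, size_t n)
{
    /* Check the promise in debug builds: a false alignment claim is
     * undefined behavior once the compiler relies on it. */
    assert((uintptr_t)x % VEC_ALIGN == 0);
    assert((uintptr_t)y % VEC_ALIGN == 0);

    /* GCC/Clang builtin: returns the same pointer, annotated as aligned. */
    const double *xa = __builtin_assume_aligned(x, VEC_ALIGN);
    const double *ya = __builtin_assume_aligned(y, VEC_ALIGN);

    double sum = 0.0;
    /* Honored with -fopenmp or -fopenmp-simd, ignored otherwise. The
     * reduction clause permits the compiler to reorder the additions. */
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += xa[i] * ya[i];
    return sum;
}

/* C11 alignment notation on static storage. The GCC attribute spelling
 * would be __attribute__((aligned(VEC_ALIGN))). */
static alignas(VEC_ALIGN) double demo_x[1024];
static alignas(VEC_ALIGN) double demo_y[1024];

/* Fill the aligned arrays with constants and take their dot product. */
double demo_dot(void)
{
    for (size_t i = 0; i < 1024; ++i) {
        demo_x[i] = 1.0;
        demo_y[i] = 2.0;
    }
    return dot_aligned(demo_x, demo_y, 1024);
}
```

Compiling with something like gcc -O3 -fopenmp-simd and asking for a vectorization report (-fopt-info-vec in GCC) is one way to check whether the loop was actually vectorized. Whether the alignment hints change the generated code depends on the compiler and target.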
Breakout (20 minutes)
Suppose you have a tuned single-core dot product that is limited by memory bandwidth (with memory at 12.4 GB/s for one core), and sending a message between processors takes 10 microseconds. If a parallel dot product implementation requires p-1 messages, what is the speedup curve for running a dot product on double precision vectors of dimension one million?
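One possible way to set up the model, offered as a sketch and not as an official solution: assume each processor streams its share of the two vectors at the full 12.4 GB/s (taking 1 GB as 10^9 bytes), that the p-1 messages are sent one after another, and that arithmetic time is negligible. Then T(1) = 2 * 8 * n / bandwidth and T(p) = T(1)/p + (p-1) * 10 microseconds. The function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Figures from the breakout statement. */
#define BANDWIDTH_BYTES_PER_S 12.4e9  /* 12.4 GB/s, assuming 1 GB = 1e9 bytes */
#define MESSAGE_TIME_S        10e-6   /* 10 microseconds per message */

/* Serial time: stream two vectors of n doubles through memory once. */
double serial_time(size_t n)
{
    return 2.0 * (double)n * sizeof(double) / BANDWIDTH_BYTES_PER_S;
}

/* Modeled parallel time on p processors. Assumes every processor gets
 * the full per-core bandwidth and the p-1 messages are serialized. */
double parallel_time(size_t n, int p)
{
    assert(p >= 1);
    return serial_time(n) / p + (p - 1) * MESSAGE_TIME_S;
}

/* Modeled speedup S(p) = T(1) / T(p). */
double speedup(size_t n, int p)
{
    return serial_time(n) / parallel_time(n, p);
}

/* Processor count in [1, max_p] with the largest modeled speedup. */
int best_processor_count(size_t n, int max_p)
{
    int best = 1;
    for (int p = 2; p <= max_p; ++p) {
        if (speedup(n, p) > speedup(n, best))
            best = p;
    }
    return best;
}
```

Under these assumptions, the serial time for n = 1,000,000 is about 1.29 ms, and the modeled speedup peaks near p = 11 at a little under 6 before declining as message time dominates. A different communication pattern (for example, a tree reduction) or shared memory bandwidth would change the curve.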
Afternotes
We spent more time than expected walking through the vectorization example, and so did not do the second breakout.
Thanks to all who participated for the helpful suggestions about the class. We will do some more hands-on examples in the coming week.