Single Core Architecture
2024-09-03
Searching for a study buddy or partner? Looking to meet a new friend? Are you taking CS, INFO, or ORIE classes? If so, the CIS Partner Finding Social is for you! It is the perfect opportunity to find a partner and meet other students in your classes, so join us on September 11th from 5 to 7 pm in Duffield Atrium!
You will be using three systems (if enrolled!) – see email.
Perlmutter and GCP are Unix environments. Recommended for local work:
You can also develop remotely.
You will want to know:
Resources:
Memory operations are not all the same!
Instructions are non-obvious!
Goal: Understand how to help the compiler.
Today, a play in two acts:
Serial execution:
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| wash | dry | fold | | | | | | |
| | | | wash | dry | fold | | | |
| | | | | | | wash | dry | fold |
Pipelined execution:
| 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| wash | dry | fold | | |
| | wash | dry | fold | |
| | | wash | dry | fold |
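The two laundry tables reduce to a little arithmetic. With k stages of equal length and n loads, serial execution takes n*k time units, while pipelined execution takes k + n - 1: k steps to fill the pipeline, then one load finishes per step. A minimal sketch, assuming every stage takes exactly one time unit (the function names are hypothetical):

```c
#include <assert.h>

/* Time units to push n jobs through k unit-length stages, one job at a time. */
long serial_time(long n, long k)
{
    assert(n >= 0 && k > 0);
    return n * k;
}

/* Time units with pipelining: the first job needs k steps to fill the
   pipeline, after which one job completes every step. */
long pipelined_time(long n, long k)
{
    assert(n >= 0 && k > 0);
    return n == 0 ? 0 : k + n - 1;
}
```

For three loads and three stages this gives 9 versus 5, matching the tables. As n grows, the speedup n*k / (k + n - 1) approaches k; pipelining improves throughput, not latency, since each load still takes three steps.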
Classic five-stage pipeline (MIPS and company)
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| IF | ID | EX | MEM | WB | | | | |
| | IF | ID | EX | MEM | WB | | | |
| | | IF | ID | EX | MEM | WB | | |
| | | | IF | ID | EX | MEM | WB | |
| | | | | IF | ID | EX | MEM | WB |
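The diagonal pattern in the table follows from one rule: in an ideal pipeline, instruction i occupies stage c - i during cycle c. The sketch below regenerates the table from that rule. It assumes one instruction issues per cycle with no stalls or hazards, which real pipelines do not guarantee; `stage_at` and `print_pipeline` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Stage names of the classic five-stage pipeline. */
enum { NSTAGES = 5 };
static const char *const STAGES[NSTAGES] = {"IF", "ID", "EX", "MEM", "WB"};

/* Stage occupied by instruction i (0-based) during cycle c (0-based),
   or NULL if the instruction is not in the pipeline that cycle.
   Assumes an ideal pipeline: one issue per cycle, no stalls. */
const char *stage_at(int i, int c)
{
    assert(i >= 0 && c >= 0);
    int s = c - i;
    return (s >= 0 && s < NSTAGES) ? STAGES[s] : NULL;
}

/* Print the occupancy table for n instructions, one row per instruction. */
void print_pipeline(int n)
{
    int ncycles = NSTAGES + n - 1;   /* fill time plus one cycle per instruction */
    for (int i = 0; i < n; ++i) {
        for (int c = 0; c < ncycles; ++c) {
            const char *s = stage_at(i, c);
            printf("%-4s", s ? s : "");
        }
        printf("\n");
    }
}
```

Calling `print_pipeline(5)` prints five rows over nine cycles; from cycle 5 onward, one instruction reaches WB every cycle.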
Current pipelines are much more complicated!
Fetch/decode or retire multiple ops at once
Support multiple HW threads per core
Different pipelines for different units
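A core that can keep several operations in flight only benefits when those operations are independent. A plain reduction `s += x[i]` forms a single dependency chain, so each add waits for the previous one. The sketch below splits the sum into four independent accumulators to expose instruction-level parallelism. The choice of four accumulators is an assumption for illustration; the best number depends on the add latency and the number of floating-point units. Compilers typically will not make this transformation for floating point on their own (GCC and Clang require something like `-ffast-math`), because it reassociates the sum and can change rounding.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each add depends on the previous one, so the loop is
   limited by floating-point add latency. */
double sum_serial(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Four independent accumulators: adds into s0..s3 do not depend on each
   other, so the hardware can overlap them. This reassociates the sum, so
   the result may differ from sum_serial in the last bits. */
double sum_4acc(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i)          /* leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```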
Modern single-core architecture is complex!
Desiderata
Compiler understands CPU in principle
Compiler needs help in practice
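One common way to help the compiler is to state facts it cannot prove for itself. In the sketch below (an axpy-style update, chosen here for illustration), the compiler must assume that `x` and `y` might overlap, which can block vectorization or force runtime overlap checks. The C99 `restrict` qualifier promises there is no such overlap; whether a given compiler then generates better code is not guaranteed, and passing overlapping arrays to the `restrict` version is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Without restrict, a store to y[i] might modify some later x[j], so the
   compiler must be conservative about reordering loads and stores. */
void axpy(size_t n, double a, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* restrict promises that, inside this function, the memory reached through
   x and the memory reached through y do not overlap. */
void axpy_restrict(size_t n, double a, const double *restrict x,
                   double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```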
The goal:
Note that memory layouts are part of your job!
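Layout is a decision the programmer makes and the compiler generally will not change. As a hypothetical example, suppose a loop only needs the x coordinate of each particle. With an array of structs (AoS), the loop strides over fields it never uses; with a struct of arrays (SoA), the x values are contiguous. The `particle` types below are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Array of structs (AoS): a particle's fields are adjacent, so reading
   only x still drags y, z, and mass through the cache. */
struct particle { double x, y, z, mass; };

double sum_x_aos(const struct particle *p, size_t n)
{
    assert(p != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += p[i].x;            /* stride of sizeof(struct particle) bytes */
    return s;
}

/* Struct of arrays (SoA): each field is its own contiguous array, so the
   same loop reads x with unit stride. */
struct particles { double *x, *y, *z, *mass; size_t n; };

double sum_x_soa(const struct particles *p)
{
    assert(p != NULL);
    double s = 0.0;
    for (size_t i = 0; i < p->n; ++i)
        s += p->x[i];           /* stride of sizeof(double) bytes */
    return s;
}
```

Neither layout is always better: AoS suits code that touches all fields of one particle at a time, while SoA suits loops that sweep one field across many particles.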
Programs usually have locality:
The cache hierarchy is built to take advantage of locality.
This is mostly automatic and implicit.
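Loop order is the simplest way to see locality at work. C stores matrices in row-major order, so element (i, j) of an m-by-n matrix sits at offset i*n + j. The two functions below compute the same sum, but the first walks memory with unit stride (spatial locality), while the second jumps n doubles per step. For matrices much larger than cache, the column-order version is typically much slower; this sketch does not measure that, and the size of the gap depends on the machine.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in an m-by-n row-major matrix stored flat. */
static inline size_t idx(size_t i, size_t j, size_t n) { return i * n + j; }

/* Inner loop over j: consecutive iterations touch adjacent addresses. */
double sum_row_order(const double *a, size_t m, size_t n)
{
    assert(a != NULL || m * n == 0);
    double s = 0.0;
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            s += a[idx(i, j, n)];
    return s;
}

/* Inner loop over i: consecutive iterations are n doubles apart. */
double sum_col_order(const double *a, size_t m, size_t n)
{
    assert(a != NULL || m * n == 0);
    double s = 0.0;
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            s += a[idx(i, j, n)];
    return s;
}
```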
Where can data go in cache?
Higher associativity costs more in hardware.
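Where a line can go is decided by address bits: the low bits select a byte within the line, the next bits select a set, and the remaining high bits form the tag stored alongside the line. A direct-mapped cache has one line per set; a fully associative cache has a single set. The sketch below assumes a hypothetical 32 KiB, 8-way cache with 64-byte lines (so 64 sets); these numbers are illustrative, not the parameters of any particular chip.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache geometry, chosen for illustration only. */
enum {
    LINE_BYTES  = 64,                               /* bytes per line */
    WAYS        = 8,                                /* lines per set */
    CACHE_BYTES = 32 * 1024,                        /* total capacity */
    NUM_SETS    = CACHE_BYTES / (LINE_BYTES * WAYS) /* = 64 */
};
static_assert(NUM_SETS * LINE_BYTES * WAYS == CACHE_BYTES,
              "geometry must divide capacity evenly");

/* Byte position within the cache line. */
uint64_t line_offset(uint64_t addr) { return addr % LINE_BYTES; }

/* Set the line must go in; it may occupy any of the WAYS slots there. */
uint64_t set_index(uint64_t addr) { return (addr / LINE_BYTES) % NUM_SETS; }

/* Remaining high bits, compared on lookup to identify the line. */
uint64_t tag_of(uint64_t addr)
{
    return addr / ((uint64_t)LINE_BYTES * NUM_SETS);
}
```

With these parameters, addresses 4096 bytes apart land in the same set with different tags, so at most eight such lines can be cached at once. Keeping capacity fixed while raising WAYS means fewer sets and fewer conflicts, but each lookup must compare more tags in parallel, which is one source of the extra hardware cost.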
Apple M1 Pro (Firestorm core?)
Even for simple programs, performance is a complicated function of architecture!