Accelerators
2024-10-08
Different fundamental goals
Graphics (and ML) are high throughput!
Five-stage RISC basically has
Modern machines have pipeline and
Mostly to find parallelism and reduce latency.
Simplify, simplify, simplify!
Simplify: Fetch/decode
Simplify: Branches
Simplify: Branches
Simplify: Latency tolerance
No caches, but still:
Therefore:
(From PMPP book)
Much is not so dissimilar to multicore CPU!
CPU
GPU
GPUs are great for
… but you still want a CPU, too!
GPU nodes have:
Each SM has
CPU code calls GPU kernels
“Hello world”: vector addition in CUDA
Serial version of add:
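The serial code for this slide is not reproduced here. As a minimal sketch, assuming the same update as the GPU version (each y[i] is incremented by x[i]), a plain C++ loop might look like the following. The name add_serial and the equal-length precondition are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference: y[i] += x[i], one element after another on the CPU.
// add_serial is a hypothetical name, chosen so it does not clash with add().
void add_serial(const std::vector<float>& x, std::vector<float>& y)
{
    assert(x.size() == y.size());  // assumed precondition: equal lengths
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] += x[i];
}
```

Every iteration is independent of the others, which is what makes this loop a natural candidate for one-thread-per-element execution on the GPU.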
CPU code calls GPU kernels
__global__ for host-callable kernel on GPU
__device__ or __host__ for functions that run on the device or on the host
First, allocate memory on GPU:
int size = n * sizeof(float);
float *d_x, *d_y;
cudaMalloc((void**)& d_x, size);
cudaMalloc((void**)& d_y, size);
cudaMalloc(void** p, size_t size);
p refers to a device (global) memory location
vector, for ex)
cudaFree(void* p) to free
Should probably be more careful:
// cudaMemcpy(void* dst, const void* src, size_t count, cudaMemcpyKind kind);
cudaMemcpy(d_x, x.data(), size, cudaMemcpyHostToDevice);
cudaMemcpy(d_y, y.data(), size, cudaMemcpyHostToDevice);
data() for C pointer to vector storage
Copy y back at end
No need to copy x back (only y updated)
void add(const std::vector<float>& x, std::vector<float>& y)
{
    int n = x.size();
    // Allocate GPU buffers and transfer data in
    int size = n * sizeof(float);
    float *d_x, *d_y;
    cudaMalloc((void**)& d_x, size); cudaMalloc((void**)& d_y, size);
    cudaMemcpy(d_x, x.data(), size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), size, cudaMemcpyHostToDevice);
    // Call kernel on the GPU (1 block, 1 thread)
    gpu_add<<<1,1>>>(n, d_x, d_y);
    // Copy data back and free GPU memory
    cudaMemcpy(y.data(), d_y, size, cudaMemcpyDeviceToHost);
    cudaFree(d_x); cudaFree(d_y);
}
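One way to make the allocate/free pairing in the function above less error-prone is to tie the free to a scope with std::unique_ptr and a custom deleter. The sketch below is host-only so that it builds without the CUDA toolkit: it uses std::malloc and std::free as stand-ins, and the names BufferFree, FloatBuffer, and make_buffer are hypothetical. A CUDA version would presumably call cudaMalloc in the helper and cudaFree in the deleter, and check their return codes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical deleter. This host-only sketch calls std::free;
// a CUDA version would call cudaFree(p) here instead.
struct BufferFree {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Owning pointer that releases its buffer automatically at scope exit.
using FloatBuffer = std::unique_ptr<float[], BufferFree>;

// Hypothetical helper. Allocates room for n floats with std::malloc;
// a CUDA version would call cudaMalloc and check the returned error code.
FloatBuffer make_buffer(std::size_t n)
{
    void* p = std::malloc(n * sizeof(float));
    if (!p && n > 0)
        throw std::bad_alloc();
    return FloatBuffer(static_cast<float*>(p));
}
```

With this pattern, a buffer created as `FloatBuffer d = make_buffer(n);` is released when d goes out of scope, including on early return or exception, so no explicit free call is needed at the end of the function.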
malloc/free/memcpy thing is not very C++!
__global__
void gpu_add(int n, float* x, float* y)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < n)
        y[i] += x[i];
}
// Call looks like
gpu_add<<<n_blocks,block_size>>>(n, x, y);
blockDim.x threads per block (set by block_size)
At least n total threads needed (n_blocks * block_size >= n)
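The launch needs enough blocks to cover all n elements, and the i < n check in the kernel handles any leftover threads in the last block. As a rough host-only sketch, the block count can be computed with a rounded-up division, and the index mapping can be checked by emulating the launch with two nested loops on the CPU. The names blocks_needed and emulate_gpu_add are hypothetical, and the loops run sequentially, so this illustrates the indexing only, not the parallel execution.

```cpp
#include <cassert>
#include <vector>

// Smallest number of blocks such that n_blocks * block_size >= n.
int blocks_needed(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);  // assumed preconditions
    return (n + block_size - 1) / block_size;
}

// CPU emulation of the launch: visit every (block, thread) pair in turn and
// apply the same index formula and bounds check as the gpu_add kernel.
// block plays the role of blockIdx.x, thread of threadIdx.x,
// and block_size of blockDim.x.
void emulate_gpu_add(int n_blocks, int block_size,
                     int n, const float* x, float* y)
{
    for (int block = 0; block < n_blocks; ++block) {
        for (int thread = 0; thread < block_size; ++thread) {
            int i = thread + block_size * block;
            if (i < n)          // extra threads in the last block do nothing
                y[i] += x[i];
        }
    }
}
```

For n = 1000 and block_size = 256, blocks_needed returns 4, so 1024 thread indices are generated and the last 24 fail the i < n check.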