The following list is by no means complete. Please suggest references
on Piazza! For more general references on C and UNIX programming,
makefiles, version control, and software design ideas, go here.
Other classes
Serial architecture and tuning
Profilers
- perf is a Linux tool built on the kernel's perf_events subsystem,
which provides access to CPU performance counters.
- OProfile is
another Linux tool based on CPU performance counters. However, OProfile
requires a custom kernel module, and can only be turned on and off with
root access. Thus, it's probably only useful if you have your own Linux
box.
- HPCToolkit is a suite of
sampling-based measurement and analysis tools for serial or parallel
codes. Runs on everything from single-core machines to the big DOE
machines.
- Google Perftools provides a sampling-based CPU profiler, a heap
profiler, and a memory allocator (TCMalloc). The profiler can generate
output that works with KCachegrind/QCachegrind.
- Cachegrind is part of the Valgrind toolkit. Rather than reading
hardware counters, it simulates the CPU's caches, which means that it
runs relatively slowly, but it gives fairly accurate information about
where you'll see cache misses.
- KCachegrind is not a profiler, but it is a tool for visualizing the
output of profilers. Though it started off as just a front-end for
Cachegrind, it also works with the output of OProfile and the Google
perftools. If you're using OS X and don't feel like installing all of
KDE just to get this interface, you may want to try building just
QCachegrind, the Qt-only version of KCachegrind.
- VTune is a commercial profiler from Intel. It's expensive, but surely
worth some attention if you work on a system where it is installed.
- AMD CodeAnalyst is a profiler from AMD. Unlike VTune, it is freely
available, but I think it is specific to AMD processors.
- Shark is the OS X analogue of VTune or CodeAnalyst. It's freely
available, though you might have to register for a developer account
(also free).
Automated tuning
- ATLAS (Automatically Tuned Linear Algebra Software) is a package that
automatically generates BLAS libraries tuned for the machine it is
installed on. Clint Whaley's papers on ATLAS have some pretty good
information about tuning.
- FFTW (Fastest Fourier Transform in the West) is an automatically
generated, cache-oblivious, high-performance Fourier transform library.