Using VT to Show Cause of Load Imbalance
Here we are running on 3 processors with a 500-element array,
block-partitioned. However, the problem being solved is only 100x100,
so the main loop executes only 100 times. Since node 0 "owns"
the first 134 elements, it executes all 100 iterations of the loop
while the other two nodes have no iterations to run. This shows up
clearly in the display here, where gray is busy and bright pink is
blocked on a receive:
The light pink double lines are "user markers" that
delimit the main iterations of
the program, of which there are three.
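
The imbalance described above can be reproduced with a small sketch. This is not the program from the talk; it only assumes that each loop iteration i is executed by the node that owns element i, and it takes the block size of 134 directly from the text. The names owner and iterations_per_node are hypothetical, chosen for illustration.

```python
def owner(index, block_size):
    """Return the node that owns element `index` under a block partition."""
    return index // block_size


def iterations_per_node(n_iterations, block_size, n_nodes):
    """Count how many loop iterations each node executes.

    Assumption: iteration i runs on the node that owns element i.
    """
    counts = [0] * n_nodes
    for i in range(n_iterations):
        counts[owner(i, block_size)] += 1
    return counts


# Values from the text: 3 nodes, block size 134, 100 loop iterations.
counts = iterations_per_node(n_iterations=100, block_size=134, n_nodes=3)
print(counts)  # [100, 0, 0]
```

Because all 100 iteration indices fall inside node 0's block, the other two nodes get no iterations, which matches the busy (gray) and blocked (bright pink) pattern in the display.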