Using VT to Show Cause of Load Imbalance

Here we are running on 3 processors with a 500-element array under a block partition. However, the problem being solved is only 100x100, so the main loop executes only 100 times. Since node 0 "owns" the first 134 elements, which include all 100 that the loop touches, it executes all 100 iterations itself while the other two nodes wait. This shows up clearly in the display, where gray is busy and bright pink is blocked on a receive: [VT shows load imbalance]

The light pink double lines are "user markers" that delimit the main iterations of the program, of which there are three.
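The ownership argument can be checked with a small sketch. It assumes an owner-computes rule and an even, ceiling-style block split (about 167 elements per node for 500 elements on 3 nodes); the function names are hypothetical and are not part of VT or the program being traced. The actual block boundaries in the run may differ (the text quotes 134 elements for node 0), but any first block that covers the 100 loop indices gives the same distribution of work.

```python
from collections import Counter

def block_owner(index, n_elements, n_procs):
    """Node that owns a 0-based index under an assumed ceiling-style block partition."""
    block = -(-n_elements // n_procs)  # ceiling division: 167 for 500 elements on 3 nodes
    return index // block

def iterations_per_node(n_iterations, n_elements, n_procs):
    """Count loop iterations per node, assuming each iteration runs on the owner of its index."""
    counts = Counter()
    for i in range(n_iterations):
        counts[block_owner(i, n_elements, n_procs)] += 1
    return [counts[p] for p in range(n_procs)]

# 100 loop iterations over a 500-element array split across 3 nodes
work = iterations_per_node(100, 500, 3)
print(work)  # [100, 0, 0]
```

Under these assumptions node 0 does all of the work and nodes 1 and 2 do none, which matches the busy and blocked pattern in the display.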
