Looking at Idle Time in ntv

Good MPI trace tools start with an overview of the entire execution. Here is a 4-way HPF run that generated a trace in native IBM format, which ntv can read directly. The program is a matrix multiplication with A, B, and C all partitioned (BLOCK,BLOCK). The time line is color-coded, with a legend below showing what each state means.

[overview of time line]

If you zoom in and add the communications display, it is clear that far more time is spent in Blocking Recv than in Running:

[Zoomed in time line]
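
The imbalance visible in the time line can also be expressed as a number. The following is a minimal Python sketch, assuming the trace has already been exported as (process, state, start, end) records. That record layout and the function name time_per_state are hypothetical; this is not the native IBM trace format or anything ntv itself provides. The sample values are invented for illustration and are not measurements from this run.

```python
from collections import defaultdict

def time_per_state(intervals):
    """Return the fraction of total traced time spent in each state.

    intervals: iterable of (process, state, start, end) tuples
    (a hypothetical export layout, not the native trace format).
    """
    totals = defaultdict(float)
    for _proc, state, start, end in intervals:
        totals[state] += end - start
    grand_total = sum(totals.values())
    return {state: t / grand_total for state, t in totals.items()}

# Invented sample data for two processes (illustration only).
sample = [
    (0, "Running", 0.0, 1.0),
    (0, "Blocking Recv", 1.0, 4.0),
    (1, "Running", 0.0, 2.0),
    (1, "Blocking Recv", 2.0, 4.0),
]
fractions = time_per_state(sample)
for state, frac in sorted(fractions.items()):
    print(f"{state}: {frac:.1%}")
```

A Blocking Recv share well above the Running share, as in the zoomed time line, suggests the processes spend most of their time waiting for data.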

On the basis of this trace, we decide to partition our arrays more efficiently so that less data needs to be communicated.
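
To see why the partitioning matters, here is a rough Python sketch that counts how many elements of A and B each processor must fetch from other processors to compute its own part of C = A * B. It assumes an owner-computes rule, a matrix size divisible by the processor grid, and it ignores how the HPF compiler actually schedules messages. All function names are hypothetical. The alternative layout shown (A and C distributed (BLOCK,*) with B replicated) is only one illustrative option and not necessarily the one adopted in the talk; it trades extra memory for B against communication.

```python
from collections import defaultdict

def block_block(n, q):
    """Owner map for (BLOCK,BLOCK) on a q x q processor grid (n divisible by q)."""
    b = n // q
    return lambda i, j: (i // b) * q + (j // b)

def block_star(n, p):
    """Owner map for (BLOCK,*): contiguous row blocks on p processors (n divisible by p)."""
    b = n // p
    return lambda i, j: i // b

def replicated(i, j):
    """Every processor holds a copy, so the element is never remote."""
    return None

def remote_elements(n, nprocs, own_a, own_b, own_c):
    """Per-processor count of A and B elements that live on another processor."""
    rows = defaultdict(set)  # rows of C (and so of A) each processor works on
    cols = defaultdict(set)  # columns of C (and so of B) each processor works on
    for i in range(n):
        for j in range(n):
            p = own_c(i, j)
            rows[p].add(i)
            cols[p].add(j)
    counts = []
    for p in range(nprocs):
        need = 0
        for k in range(n):
            # C[i][j] needs A[i][k] and B[k][j] for every k.
            need += sum(1 for i in rows[p] if own_a(i, k) not in (p, None))
            need += sum(1 for j in cols[p] if own_b(k, j) not in (p, None))
        counts.append(need)
    return counts

n, nprocs = 8, 4
bb = block_block(n, 2)
rb = block_star(n, nprocs)
print(remote_elements(n, nprocs, bb, bb, bb))          # all three (BLOCK,BLOCK)
print(remote_elements(n, nprocs, rb, replicated, rb))  # A, C (BLOCK,*); B replicated
```

With all three arrays (BLOCK,BLOCK) on a 2 x 2 grid, each processor owns only a quarter of the row block of A and the column block of B that it needs, so the rest must arrive as messages.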

(Back to main talk)