Several tests are used to measure the asymptotic network bandwidth
($r_\infty$) and the data size at which the transfer rate is half
the asymptotic rate ($n_{1/2}$). $r_\infty$ indicates how
fast the network adapter moves data from the virtual buffers to the
network, while $n_{1/2}$ characterizes the performance of bulk
transfers for small messages.
The bandwidth benchmarks involve two processing nodes and measure the
one-way bandwidth for data sizes varying from 16 bytes to
1 Mbyte. They were run using SP AM bulk transfer
primitives as well as IBM MPL send and receive primitives for
comparison. The blocking transfer bandwidth test measures
synchronous transfer requests by issuing blocking requests (
am_store and am_get) and waiting for their completion. For
MPL an mpc_bsend is followed by 0-byte mpc_brecv. The
pipelined asynchronous transfer bandwidth uses a number of small
requests to transfer a large block. This benchmark sends N bytes of
data using $N/n$ transfers of n bytes, where N
is 1 MByte and n varies from 64 bytes to 1 MByte, using
am_store_async and mpc_send respectively.
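A simple cost model shows why larger per-request sizes help in this benchmark: each asynchronous request carries a fixed software overhead, amortized over more bytes as n grows. The overhead value and the model are illustrative assumptions, not measurements from this paper.

```python
# Illustrative model (assumption): sending N bytes as N/n pipelined
# requests costs a fixed per-request overhead o plus the wire time
# N / r_inf, giving an effective bandwidth of N / total_time.
def pipelined_bandwidth(N, n, o, r_inf):
    """Effective bandwidth (bytes/s) for N bytes sent as N/n requests."""
    total_time = (N / n) * o + N / r_inf
    return N / total_time

N = 1 << 20      # 1 MByte, as in the benchmark
o = 5e-6         # hypothetical 5 us per-request software overhead
r_inf = 34.3e6   # bytes/s

# Effective bandwidth rises toward r_inf as the per-request size grows.
assert pipelined_bandwidth(N, 64, o, r_inf) < pipelined_bandwidth(N, 8192, o, r_inf)
```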
Figure 3: Bandwidth of blocking and non-blocking bulk transfers.
Figure 3 shows the results. The $r_\infty$ achieved by
pipelining am_store_async and am_get is 34.3 MBytes/s,
compared to MPL's 34.6 MBytes/s using mpc_send. The
$n_{1/2}$ value of about 260 bytes for am_store_async
(slightly higher for am_get), compared to about 450 bytes for
mpc_send, indicates that SP AM achieves better performance with
small messages.
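Plugging the two $n_{1/2}$ values into a half-power bandwidth model makes the small-message advantage concrete; the model form and the chosen message size are assumptions for illustration only.

```python
# Half-power model (assumption): B(n) = r_inf * n / (n + n_half).
def bandwidth(n, r_inf, n_half):
    """Delivered bandwidth (bytes/s) for an n-byte transfer."""
    return r_inf * n / (n + n_half)

n = 256  # a small message size, chosen for illustration
sp_am = bandwidth(n, 34.3e6, 260)  # SP AM am_store_async parameters
mpl = bandwidth(n, 34.6e6, 450)    # MPL mpc_send parameters

# Despite MPL's slightly higher r_inf, SP AM's lower n_half gives it
# the higher delivered rate at this small message size.
assert sp_am > mpl
```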
The bandwidth of SP AM's synchronous stores and gets also converges to
34.3 MBytes/s but at a slower rate due to the round-trip latency as
the sender blocks after every transfer waiting for an acknowledgement.
Also, for smaller transfer sizes, the performance for gets is slightly
lower than for stores because of the overhead of the get request.
Consequently, the bandwidth curve for synchronous gets shows an
$n_{1/2}$ of 3000 bytes compared to the 2800 bytes for stores.
The effect of this overhead on the bandwidth vanishes as the transfer
size increases, explaining the overlapping of both curves for sizes
larger than 4 KBytes. Despite a higher $r_\infty$ of 34.6 MBytes/s,
synchronous transfers using MPL's sends and receives have an
$n_{1/2}$ greater than 3500 bytes.
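For blocking transfers, where the sender waits out a full round trip per transfer, the half-power point is set by that round-trip time. The following back-of-the-envelope model, and the round trips back-solved from the measured $n_{1/2}$ values, are illustrative assumptions rather than figures from the paper.

```python
# Assumed model for blocking transfers: each n-byte transfer pays the
# full round-trip time R before the next can start, so
#   B_sync(n) = n / (R + n / r_inf)   and   n_half = R * r_inf.
def sync_bandwidth(n, r_inf, rtt):
    """Delivered bandwidth (bytes/s) for blocking n-byte transfers."""
    return n / (rtt + n / r_inf)

r_inf = 34.3e6
# Round trips implied by the measured n_half values (illustrative):
rtt_store = 2800 / r_inf  # roughly 82 us for synchronous stores
rtt_get = 3000 / r_inf    # roughly 87 us for gets, which add a request

# The longer effective round trip of gets lowers small-message bandwidth...
assert sync_bandwidth(1024, r_inf, rtt_get) < sync_bandwidth(1024, r_inf, rtt_store)
# ...while both converge toward r_inf for large transfers.
assert sync_bandwidth(1 << 22, r_inf, rtt_get) > 0.95 * r_inf
```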
Figure 3 also shows that SP AM's asynchronous transfers perform no better than their blocking counterparts for message sizes larger than one chunk (8064 bytes), the point at which flow control takes effect.