I want to determine the overhead of non-blocking point-to-point communication in MPI. There are some benchmarks available (such as the Sandia MPI Micro-Benchmark Suite or the OSU micro-benchmarks), but for some reason they do not distinguish between the send modes MPI offers (standard, ready, buffered, synchronous) and use only the standard mode. About the standard mode, the MPI report states:
"In this mode, it is up to MPI to decide whether outgoing messages will be buffered. MPI may buffer outgoing messages. In such a case, the send call may complete before a matching receive is invoked. On the other hand, buffer space may be unavailable, or MPI may choose not to buffer outgoing messages, for performance reasons. In this case, the send call will not complete until a matching receive has been posted, and the data has been moved to the receiver."
I would assume that copying the message into a local buffer can perform differently from sending it directly to the receiver (which could be physically far away, connected via a low-bandwidth link, etc.). So my question is: are my assumptions wrong, and there is never any significant performance difference between, say, a buffered send and a ready send (and if so, why)? Or do these benchmarks simply ignore such differences (and if so, why)?