Just tested latency from Java on my Core i5 2.8 GHz, with only a single byte sent/received,
2 Java processes freshly spawned, without assigning specific CPU cores with taskset:
TCP - 25 microseconds
Named pipes - 15 microseconds
Now explicitly specifying core masks, like taskset 1 java Srv or taskset 2 java Cli:
TCP, same cores: 30 microseconds
TCP, explicit different cores: 22 microseconds
Named pipes, same core: 4-5 microseconds !!!!
Named pipes, taskset different cores: 7-8 microseconds !!!!
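(For reference, each test is a plain single-byte ping-pong; below is a rough sketch of the TCP server side - the port number, class name and the TCP_NODELAY call are arbitrary choices for the sketch, not necessarily the exact code:)

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpPingPongSrv {               // name and port are arbitrary for the sketch
    public static void main(String[] args) throws Exception {
        ServerSocket ss = new ServerSocket(9999);
        Socket s = ss.accept();
        s.setTcpNoDelay(true);              // stop Nagle from delaying the 1-byte replies
        InputStream in = s.getInputStream();
        OutputStream out = s.getOutputStream();
        int b;
        while ((b = in.read()) != -1) {     // read the client's byte...
            out.write(b);                   // ...and echo it straight back
        }
    }
}

The client just connects, loops write-one-byte / read-one-byte and divides the elapsed System.nanoTime() by the iteration count; the named pipe case can be measured the same way over a FIFO created with mkfifo.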
So:
TCP overhead is visible
scheduling overhead (or core caches?) is also a culprit
At the same time Thread.sleep(0) (which, as strace shows, causes a single sched_yield() Linux kernel call) takes 0.3 microseconds - so named pipes scheduled onto a single core still carry a lot of overhead.
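(A figure like that is easy to reproduce with a trivial timing loop - class name and iteration count are arbitrary:)

public class YieldCost {
    public static void main(String[] args) throws Exception {
        int n = 1000000;                               // arbitrary iteration count
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) Thread.sleep(0);   // each call issues one sched_yield()
        long t1 = System.nanoTime();
        System.out.println((t1 - t0) / (double) n + " ns per Thread.sleep(0)");
    }
}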
A shared memory measurement for comparison:
September 14, 2009 – Solace Systems announced today that its Unified Messaging Platform API can achieve an average latency of less than 700 nanoseconds using a shared memory transport.
http://solacesystems.com/news/fastest-ipc-messaging/
P.S. - tried shared memory the next day in the form of memory-mapped files:
if busy waiting is acceptable, we can reduce the latency to 0.3 microseconds
for passing a single byte, with code like this:
// server side; runs in a main() that declares throws Exception,
// needs java.io.RandomAccessFile, java.nio.MappedByteBuffer, java.nio.channels.FileChannel
MappedByteBuffer mem = new RandomAccessFile("/tmp/mapped.txt", "rw")
        .getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1);
while (true) {
    while (mem.get(0) != 5) Thread.sleep(0); // busy-wait for the client's request byte (5)
    mem.put(0, (byte) 10);                   // write the reply byte (10)
}
Notes: Thread.sleep(0) is needed so the 2 processes can see each other's changes
(I don't know of another way yet). If the 2 processes are forced onto the same core with taskset,
the latency becomes 1.5 microseconds - that's the context switch delay.
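The client side is just the mirror image - a sketch like this (class name and iteration count are arbitrary), writing the request byte, busy-waiting for the reply and averaging the round trip:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class Cli {
    public static void main(String[] args) throws Exception {
        MappedByteBuffer mem = new RandomAccessFile("/tmp/mapped.txt", "rw")
                .getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1);
        int n = 1000000;                              // arbitrary number of round trips
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            mem.put(0, (byte) 5);                     // send the request byte
            while (mem.get(0) != 10) Thread.sleep(0); // busy-wait for the server's reply byte
        }
        long t1 = System.nanoTime();
        System.out.println("avg round trip: " + (t1 - t0) / (double) n + " ns");
    }
}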
P.P.S. - and 0.3 microseconds is a good number! The following code takes exactly 0.1 microseconds, while doing only a primitive string concatenation:
int j=123456789;
String ret = "my-record-key-" + j + "-in-db";
P.P.P.S. - hope this is not too much off-topic, but finally I tried replacing Thread.sleep(0) with incrementing a static volatile int variable (the JVM happens to flush CPU caches when doing so) and obtained - a record! - 72 nanoseconds latency for java-to-java process communication!
When forced onto the same CPU core, however, volatile-incrementing JVMs never yield control to each other, thus producing exactly 10 milliseconds of latency - the Linux time quantum seems to be 5 ms... So this should be used only if there is a spare core - otherwise sleep(0) is safer.
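The change is only in the wait loop - roughly like this (sketch; class and field names are arbitrary):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class SpinSrv {
    static volatile int spin = 0;   // per the observation above, the volatile increment keeps the mapped byte's changes visible

    public static void main(String[] args) throws Exception {
        MappedByteBuffer mem = new RandomAccessFile("/tmp/mapped.txt", "rw")
                .getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1);
        while (true) {
            while (mem.get(0) != 5) spin++;  // spin instead of Thread.sleep(0): no sched_yield, no yielding to the other process
            mem.put(0, (byte) 10);           // reply byte
        }
    }
}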