When I compile C code with a recent compiler on an amd64 or x86 system, functions are aligned to a multiple of 16 bytes. How much does this alignment actually matter on modern processors? Is there a huge performance penalty associated with calling an unaligned function?
Benchmark
I ran the following microbenchmark (call.S):
// Benchmark the performance penalty of function alignment.
#include <sys/syscall.h>
#ifndef SKIP
# error "SKIP undefined"
#endif
#define COUNT 1073741824
        .globl _start
        .type _start,@function
_start: mov $COUNT,%rcx         /* loop counter: 2^30 calls */
0:      call test
        dec %rcx
        jnz 0b
        mov $SYS_exit,%rax      /* exit(0) */
        xor %edi,%edi
        syscall
        .size _start,.-_start
        .align 16
        .space SKIP             /* start test SKIP bytes past a 16-byte boundary */
test:   nop
        rep                     /* rep + ret: two-byte return */
        ret
        .size test,.-test
with the following shell script:
#!/bin/sh
for i in $(seq 0 15); do
    echo SKIP=$i
    cc -c -DSKIP=$i call.S
    ld -o call call.o
    time -p ./call
done
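To double-check that each build really places test at the requested offset, the symbol address can be reduced modulo 16. The following is only a sketch: addr_mod is a hypothetical helper name, and the address passed to it at the end is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a hexadecimal address (as printed by nm,
# without a 0x prefix) modulo a given alignment.
addr_mod() {
    echo $(( 0x$1 % $2 ))
}

# Example with a made-up address. For a real build, something like
#   addr_mod "$(nm call | awk '$3 == "test" { print $1 }')" 16
# should print the SKIP value the binary was built with.
addr_mod 000000000040100e 16
```

For the made-up address this prints 14, which is what a binary built with SKIP=14 would be expected to show.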
On a CPU that identifies itself as Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz according to /proc/cpuinfo, the offset didn't make a difference for me: the benchmark took a constant 1.9 seconds to run.
On the other hand, on another system with a CPU that reports itself as an Intel(R) Core(TM) i7 CPU L 640 @ 2.13GHz, the benchmark takes 6.3 seconds, except with an offset of 14 or 15, where it takes 7.2 seconds. I think that's because at those offsets the 3-byte function starts to span a 16-byte boundary (and possibly a cache line boundary, depending on where the linker places it).
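The arithmetic behind that guess can be checked with a small sketch. It assumes test is 3 bytes long (nop is 1 byte, rep ret is 2 bytes) and starts SKIP bytes past a 16-byte boundary; crosses_boundary is a hypothetical helper name, not part of the benchmark.

```shell
#!/bin/sh
# Hypothetical helper: does a function of LEN bytes that starts SKIP bytes
# past an aligned address cross a BLOCK-byte boundary?
crosses_boundary() {
    skip=$1
    len=$2
    block=$3
    first=$(( skip / block ))              # block holding the first byte
    last=$(( (skip + len - 1) / block ))   # block holding the last byte
    if [ "$first" -ne "$last" ]; then
        echo yes
    else
        echo no
    fi
}

# Assumption: test occupies 3 bytes (nop + rep ret).
for skip in $(seq 0 15); do
    echo "SKIP=$skip crosses a 16-byte boundary: $(crosses_boundary "$skip" 3 16)"
done
```

Under these assumptions only SKIP=14 and SKIP=15 report "yes", which matches the two slow cases. Whether a 64-byte cache line is crossed as well depends on the absolute address the linker assigns, so this sketch cannot settle that part.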