Note that `gcc -O0` really only disables optimization *across* statements, and disables only some optimizations *within* statements. See "Disable all optimization options in GCC".
Within a single statement, it still does some of its usual optimizations, including using multiplicative inverses for division by non-power-of-2 constants.
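As a rough illustration of that trick (a sketch, not GCC's exact output), unsigned 32-bit division by 10 can be done with a widening multiply by the fixed-point reciprocal `0xCCCCCCCD` (i.e. ceil(2^35 / 10)) followed by a right shift by 35. The helper names `div10_by_multiply` and `check_div10` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x / 10 for any uint32_t, with no divide instruction.
 * 0xCCCCCCCD is ceil(2^35 / 10). Multiplying in 64 bits and shifting right
 * by 35 scales x by (almost exactly) 1/10; the rounding error is small
 * enough that the result equals floor(x / 10) for every 32-bit input. */
static uint32_t div10_by_multiply(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Spot-check against the ordinary division operator. */
static void check_div10(void)
{
    for (uint32_t x = 0; x < 1000000u; x++)
        assert(div10_by_multiply(x) == x / 10);
    assert(div10_by_multiply(UINT32_MAX) == UINT32_MAX / 10);
}
```

The constant and shift count depend on the divisor; for some divisors (7, for example) the exact reciprocal needs 33 bits, so compilers add an extra subtract/shift/add fix-up step.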
Some other compilers do a more braindead transliteration of C into asm with optimization disabled; e.g. MSVC will sometimes put a constant into a register and compare it against another constant, with two immediates. GCC never does anything that dumb; it evaluates constant expressions as far as possible and removes always-false branches.
If you want a very literal-minded compiler, have a look at TinyCC, a one-pass compiler.
In this case, the ISO C standard defines all of those in terms of `x+1`. `x[y]` is syntactic sugar for `*(x+y)`, so ISO C only has to define the rules for pointer math: the `+` operator between pointer and integer types. `+` is commutative (`x+y` and `y+x` are exactly equivalent), so it's not surprising that variations on that boil down to the same thing. In your case, `T x[10]` decays to a `T*` for the pointer math.
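A minimal check of that equivalence, using the `a`/`b`/`c`/`d` declarations that appear as source comments in the compiler listings; the function name `check_equivalent_forms` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the four spellings all reduce to x + 1. */
static void check_equivalent_forms(void)
{
    unsigned int x[10] = {0};

    unsigned int *a = &x[1];        /* x[1] means *(x+1), so this is &*(x+1) */
    unsigned int *b = &(*(x + 1));  /* & and * cancel, leaving x + 1 */
    unsigned int *c = x + 1;        /* x decays to unsigned int*, then pointer math */
    unsigned int *d = &1[x];        /* 1[x] means *(1+x); + is commutative */

    assert(a == b && b == c && c == d);

    /* Adding 1 to an unsigned int* advances by sizeof(unsigned int) bytes. */
    assert((char *)c - (char *)x == (ptrdiff_t)sizeof(unsigned int));
}
```

Compiling this at `-O0` and comparing the asm for the four initializers is one way to reproduce the observation in the question.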
`&*x` "cancels out": the ISO C abstract machine never truly references the `*x` object (C11 6.5.3.2p3: neither the `&` nor the `*` is evaluated, and the result is as if both were omitted), so this is safe even if `x` is a NULL pointer or points past the end of an array. That's why this takes the address of the array element, not of some temporary `*x` object. So this is the kind of thing compilers need to sort out before doing code-gen, rather than just evaluating `*x` with a `mov` load. Because then what? Having the value in a register doesn't help you take the address of the original location.
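A sketch of what C11 6.5.3.2p3 permits, in a hypothetical `check_addr_of_deref` helper. None of these expressions actually loads from memory, so forming the address is fine even where a real dereference would not be.

```c
#include <assert.h>
#include <stddef.h>

static void check_addr_of_deref(void)
{
    unsigned int x[10] = {0};

    /* One-past-the-end pointer: valid to form and compare, not to load from.
     * &*E is evaluated as just E, so no load happens here. */
    unsigned int *end = &*(x + 10);
    assert(end == x + 10);

    /* &E1[E2] is evaluated as E1 + E2, so &x[10] is also just x + 10. */
    assert(&x[10] == end);

    /* Null pointer: &*p is just p; the abstract machine never touches *p. */
    unsigned int *p = NULL;
    unsigned int *q = &*p;
    assert(q == NULL);
}
```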
Nobody expects truly efficient code from `-O0` (part of the goal is fast compilation, as well as consistent debugging), but gratuitous extra instructions would be unwelcome even in cases where they're not dangerous.
GCC actually transforms source through GIMPLE and RTL internal representations of the program logic. It's probably during those passes that different C ways of expressing the same logic tend to become identical.
That said, it's somewhat surprising that GCC does `lea rax, [rbp-80]` / `add rax, 4` instead of folding the `+ 1*sizeof(unsigned)` into the LEA. It would of course do that with optimization enabled. (Declare the pointers as `unsigned int *volatile` to force it to still materialize the otherwise-unused variables, if you want that to work without the code bloat of the `printf` calls.)
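A sketch of such a test function; the name `materialize_addresses` is hypothetical. The `volatile` qualifies the pointer objects themselves, so each store to them has to be kept even at `-O2`. A plain `volatile unsigned int *` only makes the pointed-to `unsigned int` volatile, so an unused pointer of that type can still be optimized away.

```c
/* Hypothetical function for inspecting optimized asm (e.g. on Godbolt).
 * Each pointer is a volatile object, so the compiler must emit a store
 * for every initializer, without any printf calls keeping them "used". */
int materialize_addresses(void)
{
    unsigned int x[10];  /* only its address is taken; never read */

    unsigned int *volatile a = &x[1];
    unsigned int *volatile b = &(*(x + 1));
    unsigned int *volatile c = x + 1;
    unsigned int *volatile d = &1[x];

    /* Reading them back (volatile loads) lets the function also be run. */
    return a == b && b == c && c == d;
}
```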
Other compilers:
MSVC does have some differences: https://godbolt.org/z/xoMfT4
```asm
;; x86-64 MSVC
   sub     rsp, 88          ; Windows x64 doesn't have a red zone
   ...
// unsigned int* a = &x[1]; // Get address of dereferenced x[1]
   mov     eax, 4                    ; even dumber than GCC
   imul    rax, rax, 1               ; sizeof(unsigned) * 1 I guess?
   lea     rax, QWORD PTR x$[rsp+rax]
   mov     QWORD PTR a$[rsp], rax
// unsigned int* b = &(*(x+1)); // Get address of dereferenced *(x+1)
   lea     rax, QWORD PTR x$[rsp+4]   ; smarter than GCC
   mov     QWORD PTR b$[rsp], rax
// unsigned int* c = x+1; // Get address x+1
   lea     rax, QWORD PTR x$[rsp+4]
   mov     QWORD PTR c$[rsp], rax
   ...
```
`c$[rsp]` is just `[16 + rsp]`, given the `c$ = 16` assemble-time constant it defined earlier.
ICC and clang compile all versions the same way.
MSVC for AArch64 avoids the multiply (and uses hex literals instead of decimal). But like x86-64 GCC, it gets the array base address into a register and then adds 4. https://godbolt.org/z/ThPxx9
```asm
@@ AArch64 MSVC
   ...
    sub         sp,sp,#0x40
   ...
// unsigned int* a = &x[1]; // Get address of dereferenced x[1]
    add         x8,sp,#0x20
    add         x8,x8,#4
    str         x8,[sp]
// unsigned int* b = &(*(x+1)); // Get address of dereferenced *(x+1)
    add         x8,sp,#0x20
    add         x8,x8,#4
    str         x8,[sp,#8]
// unsigned int* c = x+1; // Get address x+1
    add         x8,sp,#0x20
    add         x8,x8,#4
    str         x8,[sp,#0x10]
// unsigned int* d = &1[x];
    add         x8,sp,#0x20
    add         x8,x8,#4
    str         x8,[sp,#0x18]
```
Clang uses the interesting strategy of getting the array base address into a register once, and adding to it for each statement. I guess it considers that x86-64 `lea` or AArch64 `add x9, sp, #36` part of its prologue, if it wants to support debuggers that use `jump` to move between source lines, and maybe won't do it if there's any non-linear control flow in the function?