Second question first: div
is a very slow instruction (more than 20 clock cycles). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.
The first five instructions (up to and including the shrl
) compute i/10 (I'll explain how in a minute).
The next few instructions multiply the result by 10 again, but avoiding the mul
/imul
instructions (whether this is a win or not depends on the exact processor you're targeting - newer x86s have very fast multipliers, but older ones don't).
movl %edx, %eax ; eax=i/10
sall $2, %eax ; eax=(i/10)*4
addl %edx, %eax ; eax=(i/10)*4 + (i/10) = (i/10)*5
addl %eax, %eax ; eax=(i/10)*5*2 = (i/10)*10
This is then subtracted from i
again to obtain i - (i/10)*10
which is i % 10
(for unsigned numbers).
Finally, on the computation of i/10: The basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying with (2**35 / 10 + 1) - that's the magic value loaded into edx
, though it's output as a signed value even though it's really unsigned - and right-shifting the result by 35. This turns out to give the right result for all 32-bit integers.
There's algorithms to determine this kind of approximation which guarantee that the error is less than 1 (which for integers means it's the right value) and GCC obviously uses one :)
Final remark: If you want to actually see GCC compute a modulo, make the divisor variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div
. div
expects the 64-bit dividend in edx:eax
(high 32 bits in edx, low 32 bits in eax - clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. div ebx
divides edx:eax
by ebx
). It returns the quotient in eax
and the remainder in edx
. idiv
does the same for signed values.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…