That's a mistake. Only imul
has immediate and 2-register forms.
mul, div, or idiv still only exist in the one-operand form introduced with 8086, using RDX:RAX as the implicit double-width operand for output (and input for division).
Or EDX:EAX, DX:AX, or AH:AL, depending on operand-size of course. Consult an ISA reference like Intel's manual, not this book! https://www.felixcloutier.com/x86/idiv
Also see When and why do we sign extend and use cdq with mul/div? and Why should EDX be 0 before using the DIV instruction?
x86-64's only hardware division instructions are idiv
and div
. 64-bit mode removed aam
, which does 8-bit division by an immediate. (Dividing in Assembler x86 and Displaying Time in Assembly has an example of using aam
in 16-bit mode).
Of course for division by constants idiv
and div
(and aam
) are very inefficient. Use shifts for powers of 2, or a multiplicative inverse otherwise, unless you're optimizing for code-size instead of performance.
CS:APP 3e Global Edition apparently has multiple serious x86-64 instruction-set mistakes like this in practice problems, claiming that GCC emits impossible instructions. Not just typos or subtle mistakes, but misleading nonsense that's very obviously wrong to people familiar with the x86-64 instruction set. It's not just a syntax mistake, it's trying to use instructions that aren't encodeable (no syntax can exist to express them, other than a macro that expands to multiple instructions. Defining idivq
as a pseudo-instruction using a macro would be pretty weird).
e.g. I correctly guessed missing part of a function, but gcc generated assembly code doesn't match the answer is another one where it suggests that (%rbx, %rdi, %rsi)
and (%rsi, %rsi, 9)
are valid addressing modes! The scale factor is actually a 2-bit shift count so these are total garbage and a sign of a serious lack of knowledge by the authors about the ISA they're teaching, not a typo.
Their code won't assemble with any AT&T syntax assembler.
Also What does this x86-64 addq instruction mean, which only have one operand? (From CSAPP book 3rd Edition) is another example, where they have a nonsensical addq %eax
instead of inc %rdx
, and a mismatched operand-size in a mov
store.
It seems that they're just making stuff up and claiming it was emitted by GCC. IDK if they start with real GCC output and edit it into what they think is a better example, or actually write it by hand from scratch without testing it.
GCC's actual output would have used multiplication by a magic constant (fixed-point multiplicative inverse) to divide by 9 (even at -O0
, but this is clearly not debug-mode code. They could have used -Os
).
Presumably they didn't want to talk about Why does GCC use multiplication by a strange number in implementing integer division? and replaced that block of code with their made-up instruction. From context you can probably figure out where they expect the output to go; perhaps they mean rcx /= 9
.
These errors are from 3rd-party practice problems in the Global Edition
From the publisher's web site (https://csapp.cs.cmu.edu/3e/errata.html)
Note on the Global Edition: Unfortunately, the publisher arranged for the generation of a different set of practice and homework problems in the global edition. The person doing this didn't do a very good job, and so these problems and their solutions have many errors. We have not created an errata for this edition.
So CS:APP 3e is probably a good textbook, as long as you get the North American edition, or ignore the practice / homework problems. This explains the huge disconnect between the textbook's reputation and wide use vs. the serious and obvious (to people familiar with x86-64 asm) errors like this one that go beyond sloppy into don't-know-the-language territory.
How a hypothetical idiv reg, reg
or idiv $imm, reg
would be designed
Also, the dividend should be given from the quantity in registers %rdx (high-order 64 bits) and %rax (low-order 64 bits) - so if this is defined in the architecture then it does not seem possible that the second operand could be a specified dividend.
If Intel or AMD had introduced a new convenient forms for div
or idiv
, they would have designed it to use a single-width dividend because that's how compilers always use it.
Most languages are like C and implicitly promote both operands for + - * / to the same type and produce a result of that width. Of course if the inputs are known to be narrow that can be optimized away. (e.g. using one imul r32
to implement a * (int64_t)b
).
But div
and idiv
fault if the quotient overflows so it's not safe to use a single 32-bit idiv
when compiling int32_t q = (int64_t)a / (int32_t)b
.
Compilers always use xor edx,edx
before DIV or cdq
or cqo
before IDIV to actually do n / n => n-bit division.
Real full-width division using a dividend that isn't just zero- or sign-extended is only done by hand with intrinsics or asm (because gcc/clang and other compilers don't know when the optimization is safe), or in gcc helper functions that do e.g. 64-bit / 64-bit division in 32-bit code. (Or 128-bit division in 64-bit code).
So what would be most helpful is a div/idiv that avoids the extra instruction to set up RDX, too, as well as minimizing the number of implicit register operands. (Like imul r32, r/m32
and imul r32, r/m32, imm
do: making the common case of non-widening multiplication more convenient with no implicit registers. That's Intel-syntax like the manuals, destination first)
The simplest way would be a 2-operand instruction that did dst /= src
. Or maybe replaced both operands with quotient and remainder. Using a VEX encoding for 3 operands like BMI1 andn
, you could maybe have
idivx remainder_dst, dividend, divisor
. With the 2nd operand also an output for the quotient. Or you could have the remainder written to RDX with a non-destructive destination for the quotient.
Or more likely to optimize for the simple case where only the quotient is needed, idivx quot, dividend, divisor
and not store the remainder anywhere. You can always use regular idiv
when you want the quotient.
BMI2 mulx
uses an implicit rdx
input operand because its purpose is to allow multiple dep chains of add-with-carry for extended-precision multiply. So it still has to produce 2 outputs. But this hypothetical new form of idiv
would exist to save code-size and uops around normal uses of idiv
that aren't widening. So 386 imul reg, reg/mem
is the point of comparison, not BMI2 mulx
.
IDK if it would make sense to introduce an immediate form of idivx
as well; you'd only use it for code-size reasons. Multiplicative inverses are more efficient division by constants so there's very little real-world use-case for such an instruction.