Intel's vol.2 manual has details on the encoding of operands for each form of each instruction. For example, taking just the 8-bit operand-size versions of the well-known add instruction, there are two reg, r/m forms; an r/m, immediate form; and a no-ModRM 2-byte short form for add al, imm8:
Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description
04 ib | ADD AL, imm8 | I | Valid | Valid | Add imm8 to AL.
80 /0 ib | ADD r/m8, imm8 | MI | Valid | Valid | Add imm8 to r/m8.
00 /r | ADD r/m8, r8 | MR | Valid | Valid | Add r8 to r/m8.
02 /r | ADD r8, r/m8 | RM | Valid | Valid | Add r/m8 to r8.
And below that, the Instruction Operand Encoding table details what those I / MI / MR / RM codes from the Op/En (operand encoding) column above mean:
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
MR | ModRM:r/m (r, w) | ModRM:reg (r) | NA | NA
MI | ModRM:r/m (r, w) | imm8/16/32 | NA | NA
I | AL/AX/EAX/RAX | imm8/16/32 | NA | NA
Notice that the "I" operand form doesn't mention a ModRM, so there isn't one. But MI does have one. (With the /r field of the ModRM byte, i.e. its reg field, being filled in with the /0 from the 80 /0 in the opcode table: full explanation with 83 /0 add r/m64, imm8 as an example.)
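To make the I vs. MI difference concrete, here is a minimal Python sketch that builds the bytes by hand. The helper names (`modrm`, `add_al_imm8`, `add_reg8_imm8`) and the `REG8` table are made up for this illustration; only the opcodes and the ModRM layout (mod in the top 2 bits, reg in the middle 3, r/m in the low 3) come from Intel's manual. It only handles register operands (mod = 11) and the low four byte registers.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

# Register numbers of four 8-bit registers (hypothetical helper table).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3}

def add_al_imm8(imm: int) -> bytes:
    """04 ib: the I form. Opcode, then the immediate; no ModRM byte at all."""
    return bytes([0x04, imm & 0xFF])

def add_reg8_imm8(reg: str, imm: int) -> bytes:
    """80 /0 ib: the MI form. The /0 goes in ModRM.reg, the register in ModRM.r/m."""
    return bytes([0x80, modrm(0b11, 0, REG8[reg]), imm & 0xFF])

print(add_al_imm8(5).hex(" "))          # 04 05     (add al, 5, short form)
print(add_reg8_imm8("al", 5).hex(" "))  # 80 c0 05  (add al, 5, generic form)
print(add_reg8_imm8("cl", 5).hex(" "))  # 80 c1 05  (add cl, 5)
```

Both of the first two byte sequences are add al, 5; the I form just saves the ModRM byte.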
Notice that RM and MR differ only in whether the r/m operand (that can be memory) is the destination or source.
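With two register operands, either opcode can encode the same instruction; only the roles of the ModRM reg and r/m fields swap. A small sketch, again with hypothetical helper names (`modrm`, `REG8`, `add_r8_r8`) and register-direct operands only:

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

# Register numbers of four 8-bit registers (hypothetical helper table).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3}

def add_r8_r8(dst: str, src: str, form: str = "MR") -> bytes:
    """Encode add dst, src for two 8-bit registers using the chosen operand encoding."""
    if form == "MR":
        # 00 /r  ADD r/m8, r8: destination in ModRM.r/m, source in ModRM.reg
        return bytes([0x00, modrm(0b11, REG8[src], REG8[dst])])
    if form == "RM":
        # 02 /r  ADD r8, r/m8: destination in ModRM.reg, source in ModRM.r/m
        return bytes([0x02, modrm(0b11, REG8[dst], REG8[src])])
    raise ValueError(f"unknown operand encoding: {form}")

print(add_r8_r8("cl", "dl", "MR").hex(" "))  # 00 d1
print(add_r8_r8("cl", "dl", "RM").hex(" "))  # 02 ca
```

Both outputs are valid encodings of add cl, dl. The choice only matters when one operand is memory, because memory can only go in the r/m slot.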
Most x86 ALU instructions have four reg, r/m opcodes: one for each direction (MR vs. RM) for each of 8-bit and non-8-bit. The non-8-bit size is determined by a 66 operand-size prefix to flip between 16 and 32, by REX.W for 64-bit, or by no prefix at all for the default operand-size (32 in modes other than 16-bit).
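As a sketch of how the prefixes select the size, assuming 64-bit mode and only the registers numbered 0 to 3 (so no REX.R/REX.B bits are needed): the 8-bit MR opcode is 00 as in the table above, and the non-8-bit MR opcode is 01 (ADD r/m16/32/64, r16/32/64 in Intel's ADD entry). The function and its parameters are invented for this example.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

def add_mr(dst: int, src: int, size: int) -> bytes:
    """Encode add dst, src (register numbers 0..3) in 64-bit mode using the MR form."""
    prefix = {
        8: b"",       # 8-bit has its own opcode, no prefix
        16: b"\x66",  # operand-size prefix flips the 32-bit default to 16
        32: b"",      # default operand-size
        64: b"\x48",  # REX prefix with only the W bit set
    }[size]
    opcode = 0x00 if size == 8 else 0x01
    return prefix + bytes([opcode, modrm(0b11, src, dst)])

# Register 1 is cl/cx/ecx/rcx, register 2 is dl/dx/edx/rdx.
print(add_mr(1, 2, 8).hex(" "))   # 00 d1     add cl, dl
print(add_mr(1, 2, 16).hex(" "))  # 66 01 d1  add cx, dx
print(add_mr(1, 2, 32).hex(" "))  # 01 d1     add ecx, edx
print(add_mr(1, 2, 64).hex(" "))  # 48 01 d1  add rcx, rdx
```

The ModRM byte is identical in all four cases; only the opcode (8-bit vs. not) and the prefix change.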
On top of those four reg, r/m opcodes, there are the standard immediate forms:
- r/m8 with 8-bit immediate (sharing an opcode byte overloaded via /digit)
- r/m 16/32/64-bit with 8-bit sign-extended immediate (sharing an opcode byte overloaded via /digit)
- r/m 16/32/64-bit with 16/32/sign_extended_32 bit immediate (sharing an opcode byte overloaded via /digit)
- AL no modrm with 8-bit immediate (whole opcode byte to itself)
- AX/EAX/RAX no modrm, imm16 / imm32 / sign_extended_imm32 (whole opcode byte to itself)
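For add with 32-bit operand-size, those forms are 83 /0 ib, 81 /0 id, and 05 id (per Intel's ADD entry). Here is a rough sketch of how an assembler might pick the shortest one for a register destination. The selection logic and the function name are assumptions for illustration, not how any particular assembler is implemented.

```python
import struct

def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

def add_r32_imm(reg: int, imm: int) -> bytes:
    """Encode add r32, imm for register number reg (0 = eax ... 7 = edi)."""
    if -128 <= imm <= 127:
        # 83 /0 ib: 8-bit immediate, sign-extended to the operand size (3 bytes)
        return bytes([0x83, modrm(0b11, 0, reg), imm & 0xFF])
    imm32 = struct.pack("<I", imm & 0xFFFFFFFF)  # little-endian 4-byte immediate
    if reg == 0:
        # 05 id: eax-only short form, no ModRM (5 bytes)
        return bytes([0x05]) + imm32
    # 81 /0 id: generic form with a full 32-bit immediate (6 bytes)
    return bytes([0x81, modrm(0b11, 0, reg)]) + imm32

print(add_r32_imm(1, 5).hex(" "))     # 83 c1 05           add ecx, 5
print(add_r32_imm(0, 1000).hex(" "))  # 05 e8 03 00 00     add eax, 1000
print(add_r32_imm(1, 1000).hex(" "))  # 81 c1 e8 03 00 00  add ecx, 1000
```

Note that for small immediates the imm8 form beats the eax-only form, so the no-ModRM opcode only saves space when the immediate doesn't fit in a signed byte.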
This is a lot of opcodes for every mnemonic, and is why 8086 didn't have room for more instructions following the same pattern as the usual ones. (See: Why are there no NAND, NOR and XNOR instructions in X86?)
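To put a number on it: the eight classic ALU mnemonics (add, or, adc, sbb, and, sub, xor, cmp) each take six dedicated opcode bytes, laid out at base = 8 * /digit, and share the 80 / 81 / 83 immediate opcodes via that /digit. A short sketch that rebuilds that corner of the one-byte opcode map; the table and form labels are just this example's notation:

```python
# Order matters: the index is both the /digit for 80/81/83 and base_opcode / 8.
ALU_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]

# The six dedicated opcodes per mnemonic, as offsets 0..5 from the base opcode.
FORMS = ["r/m8, r8", "r/m, r", "r8, r/m8", "r, r/m", "al, imm8", "eax, imm"]

def alu_opcodes() -> dict:
    """Map mnemonic -> {operand form -> opcode byte}."""
    table = {}
    for digit, name in enumerate(ALU_OPS):
        base = digit * 8
        table[name] = {form: base + i for i, form in enumerate(FORMS)}
    return table

table = alu_opcodes()
for digit, (name, forms) in enumerate(table.items()):
    print(f"{name:4} /{digit}  " + " ".join(f"{op:02x}" for op in forms.values()))

used = sum(len(forms) for forms in table.values())
print(used, "of 256 one-byte opcodes")  # 48 of 256 one-byte opcodes
```

That is nearly a fifth of the one-byte opcode space for just eight operations, before counting the shared immediate opcodes.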
See also https://wiki.osdev.org/X86-64_Instruction_Encoding which covers things more concisely than Intel's manual. Also note that you can check your understanding by assembling something with an assembler like NASM or GAS and looking at the machine code. Or just look at the disassembly of an existing program, e.g. objdump -drwC -Mintel /bin/ls | less
Some disassemblers even group bytes together in the machine code for each instruction, keeping a 4-byte immediate together as a group separate from opcode and modrm for example. (Agner Fog's objconv is like this.)