Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


assembly - Do terms like direct/indirect addressing mode actually exist in the Intel x86 manuals?

To give a little bit of background, I wanted to study how x86 instructions are encoded/decoded manually. I came across the ModR/M and SIB bytes and it seems that understanding the x86 addressing modes is fundamental to understanding the instruction encoding scheme.

Hence, I did a Google search for x86 addressing modes. Most of the blogs and videos the search returned covered addressing modes for the 8086 processor. Going through some of them, the addressing modes listed were Register, Direct, Indirect, Indexed, Based, and some more. But the blogs use inconsistent names when referring to these addressing modes, and different sources list different sets of modes. These terms are not even mentioned in the Intel manual here. For example, I can't seem to find anywhere in the Intel manual an addressing mode called Direct or Indirect. Also, the Mod field in the ModRM byte is only 2 bits wide, which makes me wonder whether more than 4 addressing modes are possible.

My question is: are terms like Direct addressing mode and Indirect addressing mode older terms that are no longer used in the Intel manuals but are still used by the general public? If the terms technically do exist, where can I find a reference to them in the manuals?



1 Reply


There aren't really official names for most forms of x86 addressing modes. They all have the form [base + index*scale + disp8/disp32] (or a subset of any 1 or 2 components of that), except for 64-bit RIP-relative addressing. See also: Referencing the contents of a memory location (x86 addressing modes).
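As a rough illustration of that general form, here is a minimal Python sketch that computes an effective address from the three components. The function name and its parameters are made up for this example; it only models the arithmetic, not the instruction encoding.

```python
def effective_address(base=0, index=0, scale=1, disp=0, bits=64):
    """Return base + index*scale + disp, wrapped to the address size."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & ((1 << bits) - 1)

# [rdi + rcx*4 + 16] with rdi=0x1000 and rcx=3
print(hex(effective_address(base=0x1000, index=3, scale=4, disp=16)))
```

Leaving out any component (its default contributes nothing) gives the subsets mentioned above, such as [base] or [index*scale + disp].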

Intel does officially name those components of addressing modes, in section 3.7.5 of volume 1. They also use Register vs. Immediate vs. Memory, but usually don't make a big deal about different forms of addressing mode.

The Mod bits in the ModRM byte is a 2 bit field, which makes me wonder if more than 4 addressing modes are possible.

Mod chooses Register vs. Memory with disp0/8/32. There are "escape" codes for more modes:

  • The encoding that would be [rbp] with no displacement (Mod=00, r/m=101) instead means there's a disp32 with no base register; in 64-bit mode, that ModR/M form without a SIB byte means RIP-relative addressing. This is why you see [rbp+0] in disassembly: the best encoding for [rbp] is base=rbp with a disp8 of 0. (Note that [rbp] isn't useful when it's a frame pointer.)
  • The ModR/M encodings that would be base=rsp mean there's a SIB byte.
  • The SIB encodings that would be index=RSP mean no index. (Given the previous rule, this makes it possible to encode [rsp], instead of the less-useful [rsp+rsp].)
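The three escape rules above can be sketched as a small decoder. This is a simplified illustration, not a full disassembler: it assumes 64-bit mode with default address size, ignores REX prefixes (so only registers 0-7), and the function name decode_rm is invented for the example.

```python
import struct

# Register numbers 0-7; REX extensions (r8-r15) are ignored in this sketch.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_rm(code):
    """Describe the r/m operand whose ModR/M byte is code[0] (64-bit mode)."""
    mod, rm = code[0] >> 6, code[0] & 7
    if mod == 3:                         # Mod=11: register operand, no memory
        return REGS[rm]
    pos, parts = 1, []
    disp_size = {0: 0, 1: 1, 2: 4}[mod]  # Mod=00/01/10: disp0/disp8/disp32
    if rm == 4:                          # escape: "base=rsp" means a SIB byte follows
        sib = code[pos]
        pos += 1
        scale, idx, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
        if base == 5 and mod == 0:       # escape: no base, disp32 instead
            disp_size = 4
        else:
            parts.append(REGS[base])
        if idx != 4:                     # escape: "index=rsp" means no index
            parts.append(f"{REGS[idx]}*{scale}")
    elif rm == 5 and mod == 0:           # escape: RIP-relative disp32 in 64-bit mode
        parts.append("rip")
        disp_size = 4
    else:
        parts.append(REGS[rm])
    text = " + ".join(parts)
    if disp_size:
        disp = struct.unpack_from("<b" if disp_size == 1 else "<i", code, pos)[0]
        text = f"{text} {disp:+d}" if text else f"{disp:#x}"
    return f"[{text}]"

for raw in (b"\x45\x00", b"\x04\x24", b"\x04\x85\x00\x00\x00\x00"):
    print(raw.hex(), decode_rm(raw))
```

So the 2-bit Mod field only selects the displacement size (or register-direct); the variety of modes comes from the r/m field, the optional SIB byte, and these escape encodings.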

When writing in English about assembly language, it's natural to use terms with obvious meanings, including some that you mentioned. For example, Intel's optimization manual says (my emphasis):

2.3.2.4 Micro-op Queue and the Loop Stream Detector (LSD)

... (micro-fused uops with indexed addressing modes are un-laminated in the IDQ on SnB)

... For code that is dominated by indexed addressing (as often happens with array processing), recoding algorithms to use base (or base+displacement) addressing can sometimes improve performance by keeping the load plus operation and store instructions fused.

Indexed addressing modes include any combination that uses idx*scale, regardless of whether it's with a base reg or with a disp32, or both. (idx alone is not encodable; [rax*1] is actually encoded as disp32+idx*1 with disp32=0.) At some point they say "any addressing mode with an index" or similar; otherwise it might not be clear exactly what they meant. Of course, testing with performance counters can verify the interpretation.

But they don't over-do it with making up names for things. When there isn't an obvious English phrase they can stick on something, they write (still in the Sandybridge section):

The common load latency is five cycles. When using a simple addressing mode, base plus offset that is smaller than 2048, the load latency can be four cycles.

In table 2-19, they have two columns, one for Base + Offset > 2048 or Base + Index [+ Offset], and another for Base + Offset < 2048 with latencies 1 cycle lower (except for 256b AVX loads). (Fun fact: [rdi+8] has 1 cycle lower latency than [rdi-8].)

(Technically they probably should have said "displacement", because the whole addressing mode forms an offset or effective-address in x86 terminology, which forms a linear address when added to the segment base. But "offset" is also used to describe immediate constant parts of addressing modes in non-x86 generic terminology. And x86 segmentation is fortunately not something you usually have to think about these days.)
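The latency rule quoted above can be written down as a toy model. This is only a sketch of the two cases the manual describes for Sandybridge integer loads; the function name is hypothetical, and treating negative displacements as the slow case is an assumption based on the [rdi+8] vs. [rdi-8] remark. Real hardware has further conditions this ignores.

```python
def snb_load_latency_cycles(has_index: bool, disp: int) -> int:
    """Best-case load-use latency per the quoted rule (toy model)."""
    # "Simple" mode: base plus a non-negative offset smaller than 2048, no index.
    simple = (not has_index) and 0 <= disp < 2048
    return 4 if simple else 5

print(snb_load_latency_cycles(False, 8))    # [rdi+8]
print(snb_load_latency_cycles(False, -8))   # [rdi-8]
print(snb_load_latency_cycles(True, 8))     # [rdi+rcx*4+8]
```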


In the vol.1 manual, Intel does sort of use some of the terminology you describe. They describe an addressing mode with just a displacement component as "direct" (sort of), and [reg] as "indirect", because those terms do get used when talking about instruction-sets and what kind of addressing modes they support.

vol.1 3.7.5 Specifying an Offset

The following addressing modes suggest uses for common combinations of address components.

  • Displacement: A displacement alone represents a direct (uncomputed) offset to the operand. Because the displacement is encoded in the instruction, this form of an address is sometimes called an absolute or static address. It is commonly used to access a statically allocated scalar operand.

  • Base: A base alone represents an indirect offset to the operand. ...

  • (Index * Scale) + Displacement: This address mode offers an efficient way to index into a static array...
  • Base + Index + Displacement ...
  • Base + (Index * Scale) + Displacement: Using all the addressing components together allows efficient indexing of a two-dimensional array when the elements of the array are 2, 4, or 8 bytes in size.

But as you saw, they don't make up names for the more complex forms.
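To make the last combination in the quoted list concrete, here is a small sketch of how the three components might map onto a static two-dimensional array. The split chosen here (displacement = array start, base = byte offset of the row, index = column) is one plausible assignment, and all names are invented for the example.

```python
def element_address(array_start, row, col, cols, elem_size):
    """Address of a[row][col] for a static row-major array."""
    if elem_size not in (1, 2, 4, 8):
        raise ValueError("element size must fit the scale factor")
    disp = array_start                  # static address, encoded in the instruction
    base = row * cols * elem_size       # byte offset of the row, held in a register
    index = col                         # column number, held in a register
    return base + index * elem_size + disp

# a[2][3] in an array of 4-byte elements with 10 columns, starting at 0x4000
print(hex(element_address(0x4000, row=2, col=3, cols=10, elem_size=4)))
```

The scale factor does the multiply by the element size for free, which is why the manual restricts the claim to element sizes that are valid scale factors.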

They do distinguish between Immediate vs. Register vs. Memory operands, though. (3.7 OPERAND ADDRESSING). They usually make little or no distinction between an r/m32 operand that uses a register encoding, vs. the other operand that has to be a register, though.


Branch instruction terminology

Direct vs. indirect also comes up for branches. It's a bit like talking about the addressing mode for reaching the code bytes that will be run next.

6.3.7 Branch Functions in 64-Bit Mode

...

Address sizes affect the size of RCX used for JCXZ and LOOP; they also impact the address calculation for memory indirect branches. Such addresses are 64 bits by default; but they can be overridden to 32 bits by an address size prefix.

Memory indirect is jmp [rax], where the final value of RIP comes from memory, vs. a register-indirect branch like jmp rax that sets RIP=RAX. x86 doesn't have a memory-indirect addressing mode for loads/stores; in the branch terminology, the extra level of indirection comes (sort of) from the code-fetch after the branch is taken.
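The difference between the two branch forms can be shown with a toy simulation, where registers and memory are plain dictionaries. The helper names are hypothetical and the model ignores operand sizes and segmentation.

```python
def jmp_register_indirect(regs, reg):
    """jmp rax: the new RIP is the register's value."""
    return regs[reg]

def jmp_memory_indirect(regs, memory, reg):
    """jmp [rax]: the register holds the address of a pointer; RIP is loaded from it."""
    return memory[regs[reg]]

regs = {"rax": 0x2000}
memory = {0x2000: 0x401000}     # a code pointer stored at address 0x2000

print(hex(jmp_register_indirect(regs, "rax")))
print(hex(jmp_memory_indirect(regs, memory, "rax")))
```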

The vol.2 manual entry for jmp does talk about indirect vs. relative or absolute jumps. Note, though, that x86 doesn't have absolute direct near jumps (put an address in a register for that). The only absolute jumps with the target in the instruction are far jumps, with the address specified as an immediate pointer (ptr16:16 or ptr16:32); far jumps can also take their target indirectly from a memory location.

When describing near indirect jumps, jmp r/m32 (or 64), they say "absolute offset specified indirectly in a GP reg or memory". ("absolute offset" is relative to the CS segment base).

Segmentation makes x86 addressing more complicated to talk about, especially when comparing special addressing modes that can include a segment explicitly vs. ones that don't.


Naming addressing modes is over-rated

It's far easier to remember what x86 addressing modes can do in terms of subsets of the general case, rather than memorizing all the different possibilities separately with names like Indexed, Based, or whatever.

You see that kind of thing in tutorials like https://www.tutorialspoint.com/microprocessor/microprocessor_8086_addressing_modes.htm or http://www.geeksforgeeks.org/addressing-modes/ that make a big deal out of classifying the addressing modes. The latter even has a quiz asking you to match C statements with some addressing-mode names.

With the less-flexible 16-bit addressing modes, there are few enough that you can try to name them, and Based vs. Indexed does actually give you a different choice of registers. But when you're programming, all you really need to remember is that it's your choice of any subset of [bx|bp] + [di|si] + disp0/8/16. This is how di/si (dst/src index) and maybe bx/bp got their names.
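For comparison, the complete set of 16-bit memory addressing modes is small enough to enumerate. The sketch below follows the 16-bit ModR/M r/m table, including its one escape (Mod=00 with r/m=110 means a bare disp16 instead of [bp]); the function name is made up for this example.

```python
# r/m field values 0-7 in 16-bit addressing
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]

def decode_rm16(mod, rm):
    """Describe a 16-bit ModR/M operand given its Mod and r/m fields."""
    if mod == 3:
        return "register"               # Mod=11 selects a register operand
    if mod == 0 and rm == 6:
        return "[disp16]"               # escape: no [bp] without a displacement
    disp = {0: "", 1: "+disp8", 2: "+disp16"}[mod]
    return f"[{RM16[rm]}{disp}]"

for mod in range(3):
    print(", ".join(decode_rm16(mod, rm) for rm in range(8)))
```

Every entry is a subset of [bx|bp] + [si|di] + disp0/8/16, which is the only thing you need to remember.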


Terminology like this can be useful in comparing the capabilities of different ISAs. For example, Wikipedia says that old ISAs like PDP-8 made a lot of use of memory-indirect because they had few registers and only 8 bit addressing range with registers.

Wikipedia also says:

Note that there is no generally accepted way of naming the various addressing modes.

There's no sense making a big deal out of naming of modes. If you're writing something, make sure it's clear what you mean without depending on a specific technical meaning for certain terms. For example, if you say "an index addressing mode", make sure the reader knows from context whether you're including base+index*scale or not.

I wonder if some of the desire to name modes originated with 8-bit micros that predate 8086. You might want to ask about this over on https://retrocomputing.stackexchange.com/. I don't know much about addressing modes available on 8-bit CPUs with mostly fixed one-byte instructions.

