x86 - Assembly: What is the purpose of movl data_items(,%edi,4), %eax in this program

Question

Welcome To Ask or Share your Answers For Others

x86 - Assembly: What is the purpose of movl data_items(,%edi,4), %eax in this program

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

x86 - Assembly: What is the purpose of movl data_items(,%edi,4), %eax in this program

This program (from Jonathan Bartlett's Programming From the Ground Up) cycles through all the numbers stored in memory with .long and puts the largest number in the EBX register for viewing when the program completes.

.section .data
data_items:
    .long 3, 67, 34, 222, 45, 75, 54, 34, 44, 33, 22, 11, 66, 0

.section .text
.globl _start

_start:
    movl $0, %edi
    movl data_items (,%edi,4), %eax
    movl %eax, %ebx
start_loop:
    cmpl $0, %eax
    je loop_exit
    incl %edi
    movl data_items (,%edi,4), %eax
    cmpl %ebx, %eax
    jle start_loop
    movl %eax, %ebx
    jmp start_loop
loop_exit:
    movl $1, %eax
    int $0x80

I'm not certain about the purpose of (,%edi,4) in this program. I've read that the commas are for separation, and that the 4 is for reminding our computer that each number in data items is 4 bytes long. Since we've already declared that each number is 4 bytes with .long, why do we need to do it again here? Also, could someone explain in more detail what purpose the two commas serve in this situation?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:25:44+0000

In AT&T syntax, memory operands have the following syntax¹:

displacement(base_register, index_register, scale_factor)

The base, index and displacement components can be used in any combination, and every component can be omitted

but obviously the commas must be retained if you omit the base register, otherwise it would be impossible for the assembler to understand which of those components you are leaving out.

All this data gets combined to calculate the address you are specifying, with the following formula:

effective_address = displacement + base_register + index_register*scale_factor

(which incidentally is almost exactly how you would specify this in Intel syntax).

So, armed with this knowledge we can decode your instruction:

movl data_items (,%edi,4), %eax

Matching the syntax above, you see that:

data_items is the displacement;
base_register is omitted, so is not put into the formula above;
%edi is index_register;
4 is scale_factor.

So, you are telling the CPU to move a long from the location data_items+%edi*4 to the register %eax.

The *4 is necessary because each element of your array is 4-bytes wide, so to transform the index (in %edi) to an offset (in bytes) from the start of the array you have to multiply it by 4.

Since we've already declared that each number is 4 bytes with .long, why do we need to do it again here?

Assemblers are low level tools that knows nothing about types.

.long is not an array declaration, is just a directive to the assembler to emit the bytes corresponding to the 32-bit representation of its parameters;
data_items is not an array, is just a symbol that gets resolved to some memory location, exactly as the other labels; the fact that you placed a .long directive after it is of no particular significance to the assembler.

Notes

Technically, there would also be the segment specifier, but given that we are talking about 32 bit code on Linux I'll omit segments entirely, as they would only add confusion.

Categories

x86 - Assembly: What is the purpose of movl data_items(,%edi,4), %eax in this program

x86 - Assembly: What is the purpose of movl data_items(,%edi,4), %eax in this program

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags