Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

x86 - Assembly: What is the purpose of movl data_items(,%edi,4), %eax in this program

This program (from Jonathan Bartlett's Programming From the Ground Up) cycles through all the numbers stored in memory with .long and puts the largest number in the EBX register for viewing when the program completes.

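.section .data
data_items:                         # label marking the first byte of the list
    .long 3, 67, 34, 222, 45, 75, 54, 34, 44, 33, 22, 11, 66, 0   # 0 terminates the list

.section .text
.globl _start

_start:
    movl $0, %edi                   # %edi = index of the current item
    movl data_items(,%edi,4), %eax  # %eax = first item
    movl %eax, %ebx                 # %ebx = largest value seen so far
start_loop:
    cmpl $0, %eax                   # reached the terminating 0?
    je loop_exit
    incl %edi                       # advance to the next index
    movl data_items(,%edi,4), %eax  # %eax = item at data_items + %edi*4
    cmpl %ebx, %eax                 # compare the new item with the current maximum
    jle start_loop                  # not larger: keep the old maximum
    movl %eax, %ebx                 # larger: record the new maximum
    jmp start_loop
loop_exit:
    movl $1, %eax                   # system call number 1 = exit
    int $0x80                       # exit status is the value in %ebx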

I'm not certain about the purpose of (,%edi,4) in this program. I've read that the commas are separators, and that the 4 reminds the computer that each number in data_items is 4 bytes long. Since we've already declared that each number is 4 bytes with .long, why do we need to do it again here? Also, could someone explain in more detail what purpose the two commas serve in this situation?

1 Reply

In AT&T syntax, memory operands have the following syntax [1]:

displacement(base_register, index_register, scale_factor)

The displacement, base and index components can be used in any combination, and any of them can be omitted. The scale factor must be 1, 2, 4 or 8 and defaults to 1 when left out. If you omit the base register, however, the commas must be kept; otherwise the assembler could not tell which component you left out.

These components are combined to compute the address you are referring to, using the following formula:

effective_address = displacement + base_register + index_register*scale_factor
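As a rough illustration, the formula can be modelled in C. This is only a sketch: effective_address is a hypothetical helper name, an omitted component is represented by passing 0 (or 1 for the scale), and the arithmetic wraps at 32 bits on the assumption that we are modelling 32-bit code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of any real API): models the address
 * computed for an operand of the form displacement(base, index, scale).
 * An omitted base or index is passed as 0, an omitted scale as 1.
 * uint32_t arithmetic wraps modulo 2^32, like 32-bit addresses. */
uint32_t effective_address(uint32_t displacement, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    /* Only these four scale factors can be encoded in an instruction. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return displacement + base + index * scale;
}
```

For instance, with a made-up displacement of 0x1000, no base, an index of 3 and a scale of 4, the helper returns 0x1000 + 3*4 = 0x100C.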

(which incidentally is almost exactly how you would specify this in Intel syntax).

So, armed with this knowledge we can decode your instruction:

movl data_items(,%edi,4), %eax

Matching the syntax above, you see that:

  • data_items is the displacement;
  • base_register is omitted, so it contributes nothing to the formula above;
  • %edi is index_register;
  • 4 is scale_factor.

So, you are telling the CPU to copy a long (4 bytes) from the memory location data_items + %edi*4 into the register %eax.

The *4 is necessary because each element of your array is 4 bytes wide, so to turn the index (in %edi) into an offset in bytes from the start of the array you have to multiply it by 4.
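To make the index-to-offset conversion concrete, here is a C sketch of the same loop. It is an approximation rather than a translation of what the assembler produces: load_item and find_max are hypothetical names, the local variables are simply named after the registers they stand for, and the byte-pointer arithmetic is written out by hand to show the multiplication that the scale factor performs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same values as the .long directive in the question. */
static const int32_t data_items[] = {3, 67, 34, 222, 45, 75, 54,
                                     34, 44, 33, 22, 11, 66, 0};

/* Hypothetical helper mirroring "movl data_items(,%edi,4), %eax":
 * copy 4 bytes starting at byte offset edi*4 from the start of the data. */
int32_t load_item(uint32_t edi)
{
    const unsigned char *start = (const unsigned char *)data_items;
    int32_t eax;

    assert(edi < sizeof data_items / sizeof data_items[0]);
    memcpy(&eax, start + edi * 4, sizeof eax);  /* index -> byte offset */
    return eax;
}

/* Hypothetical C rendering of the loop: returns what ends up in %ebx. */
int32_t find_max(void)
{
    uint32_t edi = 0;
    int32_t eax = load_item(edi);
    int32_t ebx = eax;

    while (eax != 0) {          /* cmpl $0, %eax / je loop_exit */
        edi++;                  /* incl %edi */
        eax = load_item(edi);   /* movl data_items(,%edi,4), %eax */
        if (eax > ebx)          /* cmpl %ebx, %eax / jle start_loop */
            ebx = eax;          /* movl %eax, %ebx */
    }
    return ebx;
}
```

In ordinary C you would write data_items[edi] and let the compiler do the scaling, because it knows the element type. In assembly nothing knows the element size, so the scale has to appear in the operand.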

"Since we've already declared that each number is 4 bytes with .long, why do we need to do it again here?"

Assemblers are low-level tools that know nothing about types.

  • .long is not an array declaration; it is just a directive telling the assembler to emit the bytes corresponding to the 32-bit representation of its parameters;
  • data_items is not an array; it is just a symbol that gets resolved to some memory location, exactly like the other labels. The fact that you placed a .long directive after it is of no particular significance to the assembler.
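The consequence can be sketched in C by treating the data as the raw bytes it really is. The example below is an illustration under stated assumptions: image holds the bytes that ".long 3, 67, 34" would emit on x86 (little-endian), and load_long is a hypothetical model of a 4-byte load at index*scale. Nothing stops you from choosing a scale that does not match how the data was emitted; you simply read different bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The bytes that ".long 3, 67, 34" would emit on x86 (little-endian). */
static const unsigned char image[] = {
     3, 0, 0, 0,
    67, 0, 0, 0,
    34, 0, 0, 0,
};

/* Hypothetical model of "movl image(,index,scale), %eax": build a 32-bit
 * little-endian value from the 4 bytes starting at offset index*scale. */
uint32_t load_long(uint32_t index, uint32_t scale)
{
    size_t offset = (size_t)index * scale;

    assert(offset + 4 <= sizeof image);  /* stay inside the byte image */
    return (uint32_t)image[offset]
         | (uint32_t)image[offset + 1] << 8
         | (uint32_t)image[offset + 2] << 16
         | (uint32_t)image[offset + 3] << 24;
}
```

With index 1 and scale 4 this reads the second number, 67. With index 1 and scale 1 it reads the four bytes starting one byte into the first number, which yields 0x43000000, a value that was never written in the source.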

Notes

  1. Technically, there is also the segment specifier, but given that we are talking about 32-bit code on Linux I'll omit segments entirely, as they would only add confusion.
