Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
879 views
in Technique[技术] by (71.8m points)

assembly - Reading from memory in 8086 real mode while using 'ORG 0x0000'

I've been messing around with x86-16 assembly and running it with VirtualBox. For some reason when I read from memory and try to print it as a character, I get completely different results from what I was expecting. However when I hard-code the character as part of the instruction, it works fine. Here's the code:

ORG 0
BITS 16

push word 0xB800        ; Address of text screen video memory in real mode for colored monitors
push cs
pop ds                  ; ds = cs
pop es                  ; es = 0xB800
jmp start

; input = di (position*2), ax (character and attributes)
putchar:
    stosw
    ret

; input = si (NUL-terminated string)
print:
    cli
    cld
    .nextChar:
        lodsb   ; mov al, [ds:si] ; si += 1
        test al, al
        jz .finish
        call putchar
        jmp .nextChar
    .finish:
        sti
        ret

start:
    mov ah, 0x0E
    mov di, 8

    ; should print P
    mov al, byte [msg]
    call putchar

    ; should print A
    mov al, byte [msg + 1]
    call putchar

    ; should print O
    mov al, byte [msg + 2]
    call putchar

    ; should print !
    mov al, byte [msg + 3]
    call putchar

    ; should print X
    mov al, 'X'
    call putchar

    ; should print Y
    mov al, 'Y'
    call putchar

    cli
    hlt

msg: db 'PAO!', 0

; Fill the rest of the bytes upto byte 510 with 0s
times 510 - ($ - $$) db 0

; Header
db 0x55
db 0xAA

The print label and instructions in it can be ignored since I haven't used it yet because of the problem I've been having trying to print a character stored in memory. I've assembled it with both FASM and NASM and have the same problem meaning it's obviously my fault.

It prints something like: VirtualBox

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The ORG Directive

When you specify an ORG directive like ORG 0x0000 at the top of your assembler program, and use BITS 16 you are informing NASM that when resolving labels to Code and Data, that the absolute offsets that will be generated will be based on the starting offset specified in ORG (16-bit code will be limited to an offset being a WORD/2 bytes) .

If you have ORG 0x0000 at the start and place a label start: at the beginning of the code, start will have an absolute offset of 0x0000. If you use ORG 0x7C00 then the label start will have an absolute offset of 0x7c00. This will apply to any data labels and code labels.

We can simplify your example to see what is going on in the generated code when dealing with a data variable and a hard coded character. Although this code doesn't exactly perform the same actions as your code, it is close enough to show what works and what doesn't.

Example using ORG 0x0000:

BITS 16
ORG 0x0000

start:
    push cs
    pop  ds      ; DS=CS
    push 0xb800
    pop es       ; ES = 0xB800 (video memory)
    mov ah, 0x0E ; AH = Attribute (yellow on black)

    mov al, byte [msg]
    mov [es:0x00], ax   ; This should print letter 'P'
    mov al, byte [msg+1]
    mov [es:0x02], ax   ; This should print letter 'A'
    mov al, 'O'
    mov [es:0x04], ax   ; This should print letter 'O'
    mov al, '!'
    mov [es:0x06], ax   ; This should print letter '!'


    cli
    hlt

msg: db "PA"

; Bootsector padding
times 510-($-$$) db 0
dw 0xAA55

If you were to run this on VirtualBox the first 2 characters would be garbage while O! should display correctly. I will use this example through the rest of this answer.


VirtualBox / CS:IP / Segment:Offset Pairs

In the case of Virtual Box, it will effectively do the equivalent of a FAR JMP to 0x0000:0x7c00 after loading the boot sector at physical address 0x00007c00. A FAR JMP (or equivalent) will not only jump to a given address, it sets CS and IP to the values specified. A FAR JMP to 0x0000:0x7c00 will set CS = 0x0000 and IP = 0x7c00 .

If one is unfamiliar with the calculations behind 16-bit segment:offset pairs and how they map to a physical address then this document is a reasonably good starting point to understanding the concept. The general equation to get a physical memory address from a 16-bit segment:offset pair is (segment<<4)+offset = 20-bit physical address .

Since VirtualBox uses CS:IP of 0x0000:0x7c00 it would start executing code at a physical address of (0x0000<<4)+0x7c00 = 20-bit physical address 0x07c00 . Please be aware that this isn't guaranteed to be the case in all environments. Because of the nature of segment:offset pairs, there is more than one way to reference physical address 0x07c00. See the section at the end of this answer on ways to handle this properly.


What is Going Wrong with Your Bootloader?

Assuming we are using VirtualBox and the information above in the previous section is considered correct, then CS = 0x0000 and IP = 0x7c00 upon entry to our bootloader. If we take the example code (Using ORG 0x0000) I wrote in the first section of this answer and look at the disassembled information (I'll use objdump output) we'd see this:

objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x0000 boot.bin

00000000 <.data>:
   0:   0e                      push   cs
   1:   1f                      pop    ds
   2:   68 00 b8                push   0xb800
   5:   07                      pop    es
   6:   b4 0e                   mov    ah,0xe
   8:   a0 24 00                mov    al,ds:0x24
   b:   26 a3 00 00             mov    es:0x0,ax
   f:   a0 25 00                mov    al,ds:0x25
  12:   26 a3 02 00             mov    es:0x2,ax
  16:   b0 4f                   mov    al,0x4f
  18:   26 a3 04 00             mov    es:0x4,ax
  1c:   b0 21                   mov    al,0x21
  1e:   26 a3 06 00             mov    es:0x6,ax
  22:   fa                      cli
  23:   f4                      hlt
  24:   50                      push   ax          ; Letter 'P'
  25:   41                      inc    cx          ; Letter 'A'
        ...
 1fe:   55                      push   bp
 1ff:   aa                      stos   BYTE PTR es:[di],al

Since the ORG information is lost when assembling to a binary file, I use --adjust-vma=0x0000 so that the first column of values (memory address) start at 0x0000. I want to do this because I used ORG 0x0000 in the original assembler code. I have also added some comments in the code to show where our data section is (and where the letters P and A were placed after the code).

If you were to run this program in VirtualBox the first 2 characters will come out as gibberish. So why is that? First recall VirtualBox reached our code by setting CS to 0x0000 and IP to 0x7c00. This code then copied CS to DS:

   0:   0e                      push   cs
   1:   1f                      pop    ds

Since CS was zero, then DS is zero. Now let us look at this line:

   8:   a0 24 00                mov    al,ds:0x24

ds:0x24 is actually the encoded address for the msg variable in our data section. The byte at offset 0x24 has the value P in it (0x25 has A). You might see where things might go wrong. Our DS = 0x0000 so mov al,ds:0x24 is really the same as mov al,0x0000:0x24. This syntax isn't valid but I'm replacing DS with 0x0000 to make a point. 0x0000:0x24 is where our code while executing will attempt to read our letter P from. But wait! That is physical address (0x0000<<4)+0x24 = 0x00024. This memory address happens to be at the bottom of memory in the middle of the interrupt vector table. Clearly this is not what we intended!

There are a couple ways to tackle this issue. The easiest (and preferred method) is to actually place the proper segment into DS, and not rely on what CS might be when our program runs. Since we set an ORG of 0x0000 we need to have a Data Segment(DS) = 0x07c0 . A segment:offset pair of 0x07c0:0x0000 = physical address 0x07c00 . Which is what the address of our bootloader is at. So all we have to do is amend the code by replacing:

    push cs
    pop  ds      ; DS=CS

With:

    push 0x07c0
    pop  ds      ; DS=0x07c0 

This change should provide the correct output when run in VirtualBox . Now let us see why. This code didn't change:

   8:   a0 24 00                mov    al,ds:0x24

Now when executed DS=0x07c0. This would have been like saying mov al,0x07c0:0x24. 0x07c0:0x24, which would translate into a physical address of (0x07c0<<4)+0x24 = 0x07c24 . This is what we want since our bootloader was physically placed into memory by the BIOS starting at that location and so it should reference our msg variable correctly.

Moral of the story? What ever you use for ORG there should be an applicable value in the DS register when we start our program.We should set it explicitly, and not rely on what is in CS.


Why Do Immediate Values Print?

With the original code, the first 2 characters printed gibberish, but the last two didn't. As was discussed in the previous section there was a reason the first 2 character wouldn't print, but what about the last 2 characters that did?

Let us examine the disassembly of the 3rd character O more carefully:

  16:   b0 4f                   mov    al,0x4f        ; 0x4f = 'O'

Since we used an immediate (constant) value and moved it into register AL, the character itself is encoded as part of the instruction. It doesn't rely on a memory access via the DS register. Because of this the last 2 characters displayed properly.


Ross Ridge's Suggestion and Why it Works in VirtualBox

Ross Ridge suggested we use ORG 0x7c00, and you observed that it worked. Why did that happen? And is that solution ideal?

Using my very first example and modify ORG 0x0000 to ORG 0x7c00, and then assemble it. objdump would have provided this disassembly:

objdump -Mintel -mi8086 -D -b binary  --adjust-vma=0x7c00 boot.bin

boot.bin:     file format binary   
Disassembly of section .data:

00007c00 <.data>:
    7c00:       0e                      push   cs
    7c01:       1f                      pop    ds
    7c02:       68 00 b8                push   0xb800
    7c05:       07                      pop    es
    7c06:       b4 0e                   mov    ah,0xe
    7c08:       a0 24 7c                mov    al,ds:0x7c24
    7c0b:       26 a3 00 00             mov    es:0x0,ax
    7c0f:       a0 25 7c                mov    al,ds:0x7c25
    7c12:       26 a3 02 00             mov    es:0x2,ax
    7c16:       b0 4f                   mov    al,0x4f
    7c18:       26 a3 04 00             mov    es:0x4,ax
    7c1c:       b0 21                   mov    al,0x21
    7c1e:       26 a3 06 00             mov    es:0x6,ax
    7c22:       fa                      cli
    7c23:       f4                      hlt
    7c24:       50                      push   ax          ; Letter 'P'
    7c25:       41                      inc    cx          ; Letter 'A'
        ...
    7dfe:       55                      push   bp
    7dff:       aa                      stos   BYTE PTR es:[di],al

VirtualBox set CS to 0x0000 when it jumped to our bootloader. Our original code then copied CS to DS, so DS = 0x0000. Now observe what the ORG 0x7c00 directive has done to our generated code:

    7c08:       a0 24 7c                mov    al,ds:0x7c24

Notice how we are now using an offset of 0x7c24! This would be like mov al,0x0000:0x7c24 which is physical address (0x0000<<4)+0x7c24 = 0x07c24. That is the right memory location where the bootloader was loaded, and is the proper position of our msg string. So it works.

Is using an ORG 0x7c00 a bad idea? No. It is fine. But we have a subtle issue to contend with. What happens if another Virtual PC environment or real hardware doesn't FAR JMP to our bootloader using a CS:IP of 0x0000:0x7c00? This is possible. There are many physical PCs with a BIOS that actually does the equivalent of a far jump to 0x07c0:0x0000. That too is physical address 0x07c00 as we have already seen. In that environment, when our code runs CS = 0x07c0. If we use the original code that copies CS to DS, DS now has 0x07c0 too. Now observe what would happen to this code in that situation:

    7c08:       a0 24 7c                mov    al

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...