The ORG Directive
When you specify an ORG directive like ORG 0x0000
at the top of your assembler program, and use BITS 16
you are informing NASM that when resolving labels to Code and Data, that the absolute offsets that will be generated will be based on the starting offset specified in ORG (16-bit code will be limited to an offset being a WORD/2 bytes) .
If you have ORG 0x0000
at the start and place a label start:
at the beginning of the code, start
will have an absolute offset of 0x0000. If you use ORG 0x7C00
then the label start
will have an absolute offset of 0x7c00. This will apply to any data labels and code labels.
We can simplify your example to see what is going on in the generated code when dealing with a data variable and a hard coded character. Although this code doesn't exactly perform the same actions as your code, it is close enough to show what works and what doesn't.
Example using ORG 0x0000:
BITS 16
ORG 0x0000
start:
push cs
pop ds ; DS=CS
push 0xb800
pop es ; ES = 0xB800 (video memory)
mov ah, 0x0E ; AH = Attribute (yellow on black)
mov al, byte [msg]
mov [es:0x00], ax ; This should print letter 'P'
mov al, byte [msg+1]
mov [es:0x02], ax ; This should print letter 'A'
mov al, 'O'
mov [es:0x04], ax ; This should print letter 'O'
mov al, '!'
mov [es:0x06], ax ; This should print letter '!'
cli
hlt
msg: db "PA"
; Bootsector padding
times 510-($-$$) db 0
dw 0xAA55
If you were to run this on VirtualBox the first 2 characters would be garbage while O!
should display correctly. I will use this example through the rest of this answer.
VirtualBox / CS:IP / Segment:Offset Pairs
In the case of Virtual Box, it will effectively do the equivalent of a FAR JMP to 0x0000:0x7c00 after loading the boot sector at physical address 0x00007c00. A FAR JMP (or equivalent) will not only jump to a given address, it sets CS and IP to the values specified. A FAR JMP to 0x0000:0x7c00 will set CS = 0x0000 and IP = 0x7c00 .
If one is unfamiliar with the calculations behind 16-bit segment:offset pairs and how they map to a physical address then this document is a reasonably good starting point to understanding the concept. The general equation to get a physical memory address from a 16-bit segment:offset pair is (segment<<4)+offset = 20-bit physical address
.
Since VirtualBox uses CS:IP of 0x0000:0x7c00 it would start executing code at a physical address of (0x0000<<4)+0x7c00 = 20-bit physical address 0x07c00 . Please be aware that this isn't guaranteed to be the case in all environments. Because of the nature of segment:offset pairs, there is more than one way to reference physical address 0x07c00. See the section at the end of this answer on ways to handle this properly.
What is Going Wrong with Your Bootloader?
Assuming we are using VirtualBox and the information above in the previous section is considered correct, then CS = 0x0000 and IP = 0x7c00 upon entry to our bootloader. If we take the example code (Using ORG 0x0000
) I wrote in the first section of this answer and look at the disassembled information (I'll use objdump output) we'd see this:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x0000 boot.bin
00000000 <.data>:
0: 0e push cs
1: 1f pop ds
2: 68 00 b8 push 0xb800
5: 07 pop es
6: b4 0e mov ah,0xe
8: a0 24 00 mov al,ds:0x24
b: 26 a3 00 00 mov es:0x0,ax
f: a0 25 00 mov al,ds:0x25
12: 26 a3 02 00 mov es:0x2,ax
16: b0 4f mov al,0x4f
18: 26 a3 04 00 mov es:0x4,ax
1c: b0 21 mov al,0x21
1e: 26 a3 06 00 mov es:0x6,ax
22: fa cli
23: f4 hlt
24: 50 push ax ; Letter 'P'
25: 41 inc cx ; Letter 'A'
...
1fe: 55 push bp
1ff: aa stos BYTE PTR es:[di],al
Since the ORG information is lost when assembling to a binary file, I use --adjust-vma=0x0000
so that the first column of values (memory address) start at 0x0000. I want to do this because I used ORG 0x0000
in the original assembler code. I have also added some comments in the code to show where our data section is (and where the letters P
and A
were placed after the code).
If you were to run this program in VirtualBox the first 2 characters will come out as gibberish. So why is that? First recall VirtualBox reached our code by setting CS to 0x0000 and IP to 0x7c00. This code then copied CS to DS:
0: 0e push cs
1: 1f pop ds
Since CS was zero, then DS is zero. Now let us look at this line:
8: a0 24 00 mov al,ds:0x24
ds:0x24
is actually the encoded address for the msg variable in our data section. The byte at offset 0x24 has the value P
in it (0x25 has A
). You might see where things might go wrong. Our DS = 0x0000 so mov al,ds:0x24
is really the same as mov al,0x0000:0x24
. This syntax isn't valid but I'm replacing DS with 0x0000 to make a point. 0x0000:0x24
is where our code while executing will attempt to read our letter P
from. But wait! That is physical address (0x0000<<4)+0x24 = 0x00024. This memory address happens to be at the bottom of memory in the middle of the interrupt vector table. Clearly this is not what we intended!
There are a couple ways to tackle this issue. The easiest (and preferred method) is to actually place the proper segment into DS, and not rely on what CS might be when our program runs. Since we set an ORG of 0x0000 we need to have a Data Segment(DS) = 0x07c0 . A segment:offset pair of 0x07c0:0x0000 = physical address 0x07c00 . Which is what the address of our bootloader is at. So all we have to do is amend the code by replacing:
push cs
pop ds ; DS=CS
With:
push 0x07c0
pop ds ; DS=0x07c0
This change should provide the correct output when run in VirtualBox . Now let us see why. This code didn't change:
8: a0 24 00 mov al,ds:0x24
Now when executed DS=0x07c0. This would have been like saying mov al,0x07c0:0x24
. 0x07c0:0x24
, which would translate into a physical address of (0x07c0<<4)+0x24 = 0x07c24 . This is what we want since our bootloader was physically placed into memory by the BIOS starting at that location and so it should reference our msg variable correctly.
Moral of the story? What ever you use for ORG there should be an applicable value in the DS register when we start our program.We should set it explicitly, and not rely on what is in CS.
Why Do Immediate Values Print?
With the original code, the first 2 characters printed gibberish, but the last two didn't. As was discussed in the previous section there was a reason the first 2 character wouldn't print, but what about the last 2 characters that did?
Let us examine the disassembly of the 3rd character O
more carefully:
16: b0 4f mov al,0x4f ; 0x4f = 'O'
Since we used an immediate (constant) value and moved it into register AL, the character itself is encoded as part of the instruction. It doesn't rely on a memory access via the DS register. Because of this the last 2 characters displayed properly.
Ross Ridge's Suggestion and Why it Works in VirtualBox
Ross Ridge suggested we use ORG 0x7c00
, and you observed that it worked. Why did that happen? And is that solution ideal?
Using my very first example and modify ORG 0x0000
to ORG 0x7c00
, and then assemble it. objdump
would have provided this disassembly:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x7c00 boot.bin
boot.bin: file format binary
Disassembly of section .data:
00007c00 <.data>:
7c00: 0e push cs
7c01: 1f pop ds
7c02: 68 00 b8 push 0xb800
7c05: 07 pop es
7c06: b4 0e mov ah,0xe
7c08: a0 24 7c mov al,ds:0x7c24
7c0b: 26 a3 00 00 mov es:0x0,ax
7c0f: a0 25 7c mov al,ds:0x7c25
7c12: 26 a3 02 00 mov es:0x2,ax
7c16: b0 4f mov al,0x4f
7c18: 26 a3 04 00 mov es:0x4,ax
7c1c: b0 21 mov al,0x21
7c1e: 26 a3 06 00 mov es:0x6,ax
7c22: fa cli
7c23: f4 hlt
7c24: 50 push ax ; Letter 'P'
7c25: 41 inc cx ; Letter 'A'
...
7dfe: 55 push bp
7dff: aa stos BYTE PTR es:[di],al
VirtualBox set CS to 0x0000 when it jumped to our bootloader. Our original code then copied CS to DS, so DS = 0x0000. Now observe what the ORG 0x7c00
directive has done to our generated code:
7c08: a0 24 7c mov al,ds:0x7c24
Notice how we are now using an offset of 0x7c24! This would be like mov al,0x0000:0x7c24
which is physical address (0x0000<<4)+0x7c24 = 0x07c24. That is the right memory location where the bootloader was loaded, and is the proper position of our msg string. So it works.
Is using an ORG 0x7c00
a bad idea? No. It is fine. But we have a subtle issue to contend with. What happens if another Virtual PC environment or real hardware doesn't FAR JMP to our bootloader using a CS:IP of 0x0000:0x7c00? This is possible. There are many physical PCs with a BIOS that actually does the equivalent of a far jump to 0x07c0:0x0000
. That too is physical address 0x07c00
as we have already seen. In that environment, when our code runs CS = 0x07c0. If we use the original code that copies CS to DS, DS now has 0x07c0 too. Now observe what would happen to this code in that situation:
7c08: a0 24 7c mov al