On Linux, the kernel always passes the familiar argc and argv values from C on the stack, so they are available even to standalone assembly programs that don't link with the startup code in the C library. This is documented in the i386 System V ABI, along with other details of the process startup environment (register values, stack alignment).
At the ELF entry point (a.k.a. _start) of an x86 Linux executable:
- ESP points to argc
- ESP + 4 points to argv[0], the start of the array (i.e. the value you should pass to main as char **argv is lea eax, [esp+4], not mov eax, [esp+4])
How a Minimal Assembly Program Obtains argc and argv
I'll show how to read argc and argv[0] in GDB.
cmdline-x86.S
#include <sys/syscall.h>
.global _start
_start:
/* Cause a breakpoint trap */
int $0x03
/* exit_group(0) */
mov $SYS_exit_group, %eax
mov $0, %ebx
int $0x80
cmdline-x86.gdb
set confirm off
file cmdline-x86
run
# We'll regain control here after the breakpoint trap
printf "argc: %d\n", *(int*)$esp
printf "argv[0]: %s\n", ((char**)($esp + 4))[0]
quit
Sample Session
$ cc -nostdlib -g3 -m32 cmdline-x86.S -o cmdline-x86
$ gdb -q -x cmdline-x86.gdb cmdline-x86
<...>
Program received signal SIGTRAP, Trace/breakpoint trap.
_start () at cmdline-x86.S:8
8 mov $SYS_exit_group, %eax
argc: 1
argv[0]: /home/scottt/Dropbox/stackoverflow/cmdline-x86
Explanation
- I placed a software breakpoint (int $0x03) so that the program traps back into the debugger right after the ELF entry point (_start).
- I then used printf in the GDB script to print:
  - argc, with the expression *(int*)$esp
  - argv[0], with the expression ((char**)($esp + 4))[0]
x86-64 version
The differences are minimal:
- Replace ESP with RSP
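- Change the pointer (stack slot) size from 4 bytes to 8, so the argv array starts at RSP + 8
- Follow the x86-64 Linux syscall calling convention (syscall instruction, number in RAX, first argument in RDI) when calling exit_group(0) to properly terminate the process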
cmdline.S
#include <sys/syscall.h>
.global _start
_start:
/* Cause a breakpoint trap */
int $0x03
/* exit_group(0) */
mov $SYS_exit_group, %rax
mov $0, %rdi
syscall
cmdline.gdb
set confirm off
file cmdline
run
printf "argc: %d\n", *(int*)$rsp
printf "argv[0]: %s\n", ((char**)($rsp + 8))[0]
quit
How Regular C Programs Obtain argc and argv
You can disassemble _start from a regular C program to see how it obtains argc and argv from the stack and passes them on when it calls __libc_start_main. Using the /bin/true program on my x86-64 machine as an example:
$ gdb -q /bin/true
Reading symbols from /usr/bin/true...Reading symbols from /usr/lib/debug/usr/bin/true.debug...done.
done.
(gdb) disassemble _start
Dump of assembler code for function _start:
0x0000000000401580 <+0>: xor %ebp,%ebp
0x0000000000401582 <+2>: mov %rdx,%r9
0x0000000000401585 <+5>: pop %rsi
0x0000000000401586 <+6>: mov %rsp,%rdx
0x0000000000401589 <+9>: and $0xfffffffffffffff0,%rsp
0x000000000040158d <+13>: push %rax
0x000000000040158e <+14>: push %rsp
0x000000000040158f <+15>: mov $0x404040,%r8
0x0000000000401596 <+22>: mov $0x403fb0,%rcx
0x000000000040159d <+29>: mov $0x4014c0,%rdi
0x00000000004015a4 <+36>: callq 0x401310 <__libc_start_main@plt>
0x00000000004015a9 <+41>: hlt
0x00000000004015aa <+42>: xchg %ax,%ax
0x00000000004015ac <+44>: nopl 0x0(%rax)
The first three arguments to __libc_start_main() are:
- RDI: pointer to main()
- RSI: argc; you can see that it is the first thing popped off the stack
- RDX: argv, the value of RSP right after argc was popped (ubp_av in the GLIBC source)
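The x86 _start is very similar, except that the arguments are pushed onto the stack in reverse order instead of being passed in registers: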
Dump of assembler code for function _start:
0x0804842c <+0>: xor %ebp,%ebp
0x0804842e <+2>: pop %esi
0x0804842f <+3>: mov %esp,%ecx
0x08048431 <+5>: and $0xfffffff0,%esp
0x08048434 <+8>: push %eax
0x08048435 <+9>: push %esp
0x08048436 <+10>: push %edx
0x08048437 <+11>: push $0x80485e0
0x0804843c <+16>: push $0x8048570
0x08048441 <+21>: push %ecx
0x08048442 <+22>: push %esi
0x08048443 <+23>: push $0x80483d0
0x08048448 <+28>: call 0x80483b0 <__libc_start_main@plt>
0x0804844d <+33>: hlt
0x0804844e <+34>: xchg %ax,%ax
End of assembler dump.