Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

linux - Execute in memory binary with c

I'm currently stuck on my C project. I'm trying to run a program from memory in C, but I always get a segmentation fault.

First, I have a simple hello_world.c, which I compile to produce the executable hello_world.

Second, I convert hello_world into a C header like this:

xxd -i hello_world > hello.h

(Contents of hello.h:)

unsigned char hello[] = {
...
};
unsigned int hello_len = 16696;

Finally, I wrote main.c, which looks like this:

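#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include "hello.h"

int main(void)
{
    // allocate a readable, writable and executable buffer
    unsigned char *buf = mmap(NULL, sizeof(hello), PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // copy the embedded file into the buffer and flush the instruction cache
    // for the whole copied range (not just sizeof(pointer) bytes)
    memcpy(buf, hello, sizeof(hello));
    __builtin___clear_cache((char *)buf, (char *)buf + sizeof(hello));  // GNU C

    // run code: jump to the first byte of the buffer
    int (*my_hello)(void) = (int (*)(void))buf;
    my_hello();

    return 0;
}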

I based this code on these two topics:

Here I'm trying it with hello_world, but as you can guess, I'd like to do the same with other programs (such as Linux utilities).

I'm not very good at C, so any help is welcome!

Question from: https://stackoverflow.com/questions/66051960/execute-in-memory-binary-with-c

Answer:

This is a much more complicated problem than you think it is :/

The only "simple" option would be to write() that embedded array to a file in /tmp and execve it, to take advantage of the kernel's ELF program loader and the system's dynamic linker. This would effectively be creating a self-extracting executable wrapper. (Typically you'd want to compress the file with zstd or gzip / deflate, at least that's the normal reason for having a self-extracting wrapper.)
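A minimal sketch of that "simple" option, assuming Linux with a writable /tmp that is not mounted noexec. The helper names write_temp_executable and run_embedded are made up for illustration; in the question, data and len would be hello and hello_len from hello.h. Unlike a bare execve, this version forks and waits so the wrapper keeps running afterwards; if the wrapper has nothing left to do, calling execv directly instead of forking is simpler.

```c
#define _GNU_SOURCE  // expose mkstemp/fchmod/fork even under strict -std= modes
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Writes len bytes of data to a new private file under /tmp and makes it
// executable. path must have room for at least 32 bytes.
// Returns 0 on success, -1 on failure (the file is removed on failure).
static int write_temp_executable(const unsigned char *data, size_t len, char *path)
{
    strcpy(path, "/tmp/embedded-XXXXXX");
    int fd = mkstemp(path);  // creates the file with mode 0600
    if (fd < 0)
        return -1;
    for (size_t done = 0; done < len;) {
        ssize_t n = write(fd, data + done, len - done);
        if (n < 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        done += (size_t)n;
    }
    if (fchmod(fd, 0700) < 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    // Close before exec: executing a file still open for writing fails with ETXTBSY.
    if (close(fd) < 0) {
        unlink(path);
        return -1;
    }
    return 0;
}

// Runs the embedded program with argv and waits for it. Returns the child's
// exit status (127 if execv itself failed), or -1 if writing, forking or
// waiting failed or the child was killed by a signal.
static int run_embedded(const unsigned char *data, size_t len, char *const argv[])
{
    char path[32];
    if (write_temp_executable(data, len, path) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        execv(path, argv);  // only returns on error
        _exit(127);
    }
    int status = 0;
    pid_t r = waitpid(pid, &status, 0);
    unlink(path);  // the kernel keeps the inode alive while it is executing
    if (r < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Usage would look like char *args[] = { "hello_world", NULL }; followed by run_embedded(hello, hello_len, args). Compression (zstd, gzip) is left out to keep the sketch short.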


A Linux executable starts with metadata; you can't just load the whole thing into memory and jump to the first byte. (This is true for everything except DOS .com).
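To see that metadata, here is a sketch that checks the ELF magic bytes and reads the entry point from the file header, using the Elf64_Ehdr type from glibc's <elf.h>. It assumes a 64-bit ELF with the same byte order as the host; elf64_entry is a hypothetical helper name. The first bytes of the buffer are the header itself ("\x7f" "ELF" followed by more fields), not machine code, and the real entry point is the e_entry value.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Returns e_entry of a 64-bit ELF image held in memory, or 0 if the buffer
// does not look like one. Assumes the image uses the host's byte order.
static uint64_t elf64_entry(const unsigned char *image, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return 0;
    memcpy(&eh, image, sizeof eh);  // copy out to avoid unaligned access
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;  // missing "\x7f" "ELF" magic
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;  // 32-bit ELF uses Elf32_Ehdr instead
    return eh.e_entry;  // a virtual address, not a file offset
}
```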

The ELF program-headers say which parts of the executable file need to be mapped to what relative position in memory (normally by the kernel's ELF program-loader which sets up the mmap-like mappings).
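The program headers can be walked with the same <elf.h> types. This sketch, using a hypothetical helper elf64_load_segments and assuming the ELF magic has already been checked, collects the PT_LOAD entries. Each one gives p_offset/p_filesz (where the bytes live in the file) and p_vaddr/p_memsz (where and how much they occupy in memory), which is what a loader turns into mmap calls.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

// Copies up to max PT_LOAD program headers from an in-memory ELF64 image into
// out. Returns how many PT_LOAD entries were stored, or -1 if the program
// header table is malformed or does not fit inside the buffer.
static int elf64_load_segments(const unsigned char *image, size_t len,
                               Elf64_Phdr *out, int max)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phoff > len ||
        (size_t)eh.e_phnum * sizeof(Elf64_Phdr) > len - eh.e_phoff)
        return -1;
    int count = 0;
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, image + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_LOAD && count < max)
            out[count++] = ph;  // other types (PT_INTERP, PT_DYNAMIC, ...) are skipped
    }
    return count;
}
```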

e.g. .data might be next to .text on disk, but the headers will have them mapped some distance apart, so an out-of-bounds array access is likely to fault instead of reading code. Also, the .bss section needs to be allocated separately, either past the end of .data or as a separate MAP_ANONYMOUS mapping so it's initially filled with zeros.
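A sketch of the per-segment arithmetic, assuming 4096-byte pages (real code would ask sysconf(_SC_PAGESIZE)); the struct seg_layout and the function segment_layout are hypothetical names. Bytes from p_filesz up to p_memsz are the .bss and must read as zero. Because mappings are page-granular, a loader also has to zero the rest of the last file-backed page instead of leaving whatever file bytes happen to follow.

```c
#include <elf.h>
#include <stdint.h>

// Page-aligned layout for one PT_LOAD segment, in the segment's own vaddrs.
struct seg_layout {
    uint64_t map_start;   // page-aligned start of the mapping
    uint64_t map_end;     // page-aligned end covering all of p_memsz
    uint64_t file_end;    // vaddr where bytes copied from the file stop
    uint64_t zero_bytes;  // bytes past the file data that must read as 0 (.bss)
};

// page must be a power of two.
static struct seg_layout segment_layout(const Elf64_Phdr *ph, uint64_t page)
{
    struct seg_layout l;
    l.map_start = ph->p_vaddr & ~(page - 1);
    l.map_end = (ph->p_vaddr + ph->p_memsz + page - 1) & ~(page - 1);
    l.file_end = ph->p_vaddr + ph->p_filesz;
    l.zero_bytes = ph->p_memsz - ph->p_filesz;
    return l;
}
```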

Your wrapper program already has a stack, but the _start entry point of the executable will be expecting argc, argv[], and envp[] to be on the stack as specified in the System V ABI doc. (e.g. x86-64 System V).

Also, the x86-64 and i386 System V ABIs used on Linux expect the stack pointer to be 16-byte aligned on entry to _start. That's unlike a function call, where the stack is aligned before the call instruction pushes the return address, leaving RSP % 16 == 8 at a function's entry point. If you get this wrong, expect some x86-64 library functions that rely on the incoming stack alignment (e.g. for aligned SSE loads/stores like movaps) to crash.
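A sketch of building that initial stack image in a buffer, assuming x86-64 Linux (8-byte words). build_initial_stack is a hypothetical name, and the auxiliary vector here contains only the AT_NULL terminator; a real static glibc binary will likely also want entries such as AT_PHDR, AT_PHNUM, AT_PAGESZ and AT_RANDOM. Actually switching RSP to the returned pointer and jumping to the entry point needs a few lines of inline asm, which are not shown.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

// Lays out argc, argv[], NULL, envp[], NULL and a minimal auxv just below top,
// the way _start expects to find them. Returns the value RSP should hold on
// entry to _start, aligned so that RSP % 16 == 0.
static uintptr_t *build_initial_stack(uintptr_t *top, int argc, char **argv, char **envp)
{
    size_t envc = 0;
    while (envp[envc])
        envc++;
    // argc + argv[] + NULL + envp[] + NULL + one auxv pair (AT_NULL, 0)
    size_t words = 1 + (size_t)argc + 1 + envc + 1 + 2;
    uintptr_t *sp = top - words;
    sp = (uintptr_t *)((uintptr_t)sp & ~(uintptr_t)15);  // align down to 16 bytes
    size_t i = 0;
    sp[i++] = (uintptr_t)argc;
    for (int a = 0; a < argc; a++)
        sp[i++] = (uintptr_t)argv[a];
    sp[i++] = 0;  // end of argv[]
    for (size_t e = 0; e < envc; e++)
        sp[i++] = (uintptr_t)envp[e];
    sp[i++] = 0;  // end of envp[]
    sp[i++] = AT_NULL;  // auxv terminator: type
    sp[i++] = 0;        // auxv terminator: value
    return sp;
}
```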


It's also probably dynamically linked, so you'd need to process the relocation metadata to hook up its GOT (and PLT) entries to symbols from libc.so.6 and any other libraries it uses. i.e. you'd need to implement an ELF program-loader, and the dynamic linking functionality of ld.so.
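One way to tell which case you're in is to look for a PT_INTERP program header, which names the dynamic linker (ld.so) the kernel would otherwise load first. As far as I know, a static-pie still has a PT_DYNAMIC segment for its own relocations but no PT_INTERP, so PT_INTERP is the better test. elf64_needs_interpreter is a hypothetical helper and assumes the ELF magic was already checked.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

// Returns 1 if the ELF64 image has a PT_INTERP header (needs ld.so, i.e. it is
// dynamically linked), 0 if it has none, -1 if the header table is malformed.
static int elf64_needs_interpreter(const unsigned char *image, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phoff > len ||
        (size_t)eh.e_phnum * sizeof(Elf64_Phdr) > len - eh.e_phoff)
        return -1;
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, image + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_INTERP)
            return 1;
    }
    return 0;
}
```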

You could avoid that part if you statically link libraries, e.g. compile hello_world with gcc -static-pie -fPIE to link a static but Position-Independent Executable, so it could hopefully work at any load address randomly chosen by mmap.
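For a PIE (e_type == ET_DYN, which includes static-pie), every p_vaddr and e_entry is relative to the load bias, i.e. the address where vaddr 0 ends up after mmap picks a location. For a non-PIE (ET_EXEC) they are absolute. A sketch, with runtime_entry as a hypothetical helper:

```c
#include <elf.h>
#include <stdint.h>

// Computes where the entry point lands in memory. load_bias is the address
// corresponding to vaddr 0 (mapping start minus the page-aligned p_vaddr of
// the first PT_LOAD segment). It is ignored for ET_EXEC, whose addresses are fixed.
static uint64_t runtime_entry(const Elf64_Ehdr *eh, uint64_t load_bias)
{
    if (eh->e_type == ET_DYN)
        return load_bias + eh->e_entry;  // PIE / static-pie: relative address
    return eh->e_entry;                  // non-PIE: absolute address
}
```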

PIE code may still contain absolute addresses as static initializers, like static char *ptr = foo;. The CRT _start code is responsible for applying those relocation fixups in a static-pie (How are relocations supposed to work in static PIE binaries?). Jumping to the ELF entry point will run that code. It should work by reading metadata mapped into memory as per the ELF headers, so if you got that right it might well Just Work. But if not, you might have to code carefully to avoid addresses as static initializers.
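To illustrate what those fixups are (the static-pie's own startup code is supposed to do this, so a wrapper normally wouldn't), here is a sketch that applies x86-64 R_X86_64_RELATIVE entries from an Elf64_Rela table: each one means "store load_bias + r_addend at load_bias + r_offset". apply_relative_relocs is a hypothetical name, and it treats the image base as the load bias. Any other relocation type refers to a symbol and would need a real dynamic linker, so it is only counted.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Applies R_X86_64_RELATIVE relocations to an image mapped at base (used as
// the load bias). Returns how many entries were skipped because they were
// symbol-based or pointed outside the image.
static size_t apply_relative_relocs(unsigned char *base, size_t image_len,
                                    const Elf64_Rela *rela, size_t count)
{
    size_t skipped = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t value = (uint64_t)(uintptr_t)base + (uint64_t)rela[i].r_addend;
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE ||
            image_len < sizeof value || rela[i].r_offset > image_len - sizeof value) {
            skipped++;
            continue;
        }
        memcpy(base + rela[i].r_offset, &value, sizeof value);  // may be unaligned
    }
    return skipped;
}
```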

IDK if the static libc.a you'd be linking would have been compiled with -fPIE, though, so it might itself contain absolute addresses (maybe even as 32-bit absolute immediates in machine code, which would make linking it into a PIE impossible on x86-64: 32-bit absolute addresses no longer allowed in x86-64 Linux?).


I guess another option would be to make your embedded executable a non-PIE (gcc -fno-pie -no-pie -static), and use mmap(MAP_FIXED) to map it into the right place in virtual address space. (Or use the "hint" address for mmap without MAP_FIXED, and return an error if it doesn't put it where you asked: that means something else is already using that space in your virtual memory space. Of course if you're going to jump to its _start, it's effectively taking over your process and will not return. It will eventually _exit, or execve something else. But you'd still have a problem if you blew away the code that copies and jumps.)
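A sketch of the "hint address without MAP_FIXED" approach; map_at_hint is a hypothetical helper. Because the kernel treats the address only as a hint, an existing mapping there is never clobbered; if it chooses a different address, the sketch undoes the mapping and reports failure.

```c
#define _GNU_SOURCE  // MAP_ANONYMOUS under strict -std= modes
#include <stddef.h>
#include <sys/mman.h>

// Maps len bytes of anonymous memory exactly at want, or not at all.
// Returns want on success, NULL if mmap failed or placed it elsewhere
// (meaning something else already occupies that address range).
static void *map_at_hint(void *want, size_t len, int prot)
{
    void *got = mmap(want, len, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED)
        return NULL;
    if (got != want) {
        munmap(got, len);  // not where the non-PIE needs to be
        return NULL;
    }
    return got;
}
```

On Linux 4.17 and later, MAP_FIXED_NOREPLACE asks the kernel to do this check itself and fail with EEXIST instead of relocating the mapping. Older kernels don't recognize that flag and fall back to treating the address as a hint, so keeping the address comparison is still worthwhile.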

So to avoid address-space conflicts, make your wrapper program a PIE (the default for GCC on modern Linux distros), so it will get mapped to 0x555... instead of 0x400000, or use a custom linker script for one or the other so they can both be non-PIE but use different base addresses.
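To check for such a conflict at run time, a Linux-only sketch can scan /proc/self/maps (one "start-end perms ..." line per mapping, addresses in hex) and test whether a range such as the non-PIE's fixed load addresses overlaps anything the wrapper already has mapped. range_in_use is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Returns true if [start, end) overlaps any mapping listed in /proc/self/maps.
// Returns true as well if the file can't be read, to err on the safe side.
static bool range_in_use(uintptr_t start, uintptr_t end)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return true;
    char line[512];
    bool used = false;
    while (!used && fgets(line, sizeof line, f)) {
        unsigned long lo, hi;
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 && start < hi && lo < end)
            used = true;
        // skip the rest of an over-long line so it isn't parsed as a new entry
        if (!strchr(line, '\n')) {
            int c;
            while ((c = fgetc(f)) != EOF && c != '\n') {}
        }
    }
    fclose(f);
    return used;
}
```

For example, range_in_use(0x400000, 0x401000) tells you whether a default non-PIE text address is already taken in the wrapper's address space.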

More about PIE (Position-Independent Executable):


None of this avoids having to parse the ELF headers somehow, both to find the entry point to jump to (instead of the start of the buffer) and to map the segments properly.

You could do that alongside xxd, though, with a script to parse readelf output and emit a C var into your .h like const size_t entry_offset = 1234;.

Note that the Entry point address: 0x5b20 in readelf output is a memory address, not a file-offset, although for a PIE they might be the same thing. Check the program headers to see what mapping contains that virtaddr, and compare the (file) "Offset". With standard GNU/Linux toolchain defaults, usually the virtual memory address is 0x1000 higher than the file offset.
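The same lookup can be done in C instead of parsing readelf output: find the PT_LOAD segment whose [p_vaddr, p_vaddr + p_filesz) range contains the address, then offset = vaddr - p_vaddr + p_offset. elf64_vaddr_to_offset is a hypothetical helper, assuming a 64-bit ELF with host byte order; a small offline tool could print its result as the const size_t entry_offset line for the .h.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Translates a virtual address from the ELF headers (e.g. e_entry) into an
// offset into the file, using the PT_LOAD segment that contains it.
// Returns -1 if no segment's file-backed part contains the address.
static int64_t elf64_vaddr_to_offset(const unsigned char *image, size_t len, uint64_t vaddr)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phoff > len ||
        (size_t)eh.e_phnum * sizeof(Elf64_Phdr) > len - eh.e_phoff)
        return -1;
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, image + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_LOAD && vaddr >= ph.p_vaddr &&
            vaddr - ph.p_vaddr < ph.p_filesz)
            return (int64_t)(vaddr - ph.p_vaddr + ph.p_offset);
    }
    return -1;
}
```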

(That's if you're looking at a non-running PIE executable or library. When it runs, those addresses will be relative to a base randomly chosen by the kernel, typically starting with 0x5555.... For example, compare typical code addresses in GDB, like disas main, before and after using the GDB start command.)
