A bit of troubleshooting shows the following:
- Of course, none of the userspace programs stopped using read(). They still keep calling it.
- There is no "memory isolation". The syscall table is successfully modified during module initialization, and the pointer to sys_read() is successfully replaced with a pointer to hacked_read_test().
- When the module is loaded, the read() syscall works as if it were the original one.
- The change in behavior happened between kernels 4.16 and 4.16.2 (i.e. between April 1, 2018 and April 12, 2018).
Considering this, we have a pretty narrow list of commits to check, and the changes are likely to be in the syscall mechanism. It looks like this commit is what we are looking for (along with a few more around it).
The crucial part of this commit is that it changes the signatures of the functions defined by SYSCALL_DEFINEx so that they accept a pointer to struct pt_regs instead of the syscall arguments, i.e. sys_read(unsigned int fd, char __user * buf, size_t count) becomes sys_read(const struct pt_regs *regs). This means that hacked_read_test(unsigned int fd, char *buf, size_t count) is no longer a valid replacement for sys_read()!
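To make the signature change concrete, here is a minimal userspace sketch of the idea, not the kernel's actual code. The names mock_pt_regs, impl_read and stub_read are hypothetical, and the struct models only the three register fields that carry the read() arguments. The typed function plays the role of the old-style sys_read(), and the stub plays the role of the new-style entry that the syscall table points to.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the x86_64 struct pt_regs;
 * only the fields that carry the three read() arguments are modeled. */
struct mock_pt_regs {
    unsigned long di;  /* 1st syscall argument */
    unsigned long si;  /* 2nd syscall argument */
    unsigned long dx;  /* 3rd syscall argument */
};

/* Old-style signature: the arguments arrive as ordinary C parameters.
 * This toy body "reads" count bytes from any fd except 99. */
static long impl_read(unsigned int fd, char *buf, size_t count)
{
    (void)buf;
    return fd == 99 ? -1 : (long)count;
}

/* New-style signature: the entry point receives one pointer and
 * unpacks the arguments from the saved registers itself. */
static long stub_read(const struct mock_pt_regs *regs)
{
    return impl_read((unsigned int)regs->di,
                     (char *)regs->si,
                     (size_t)regs->dx);
}
```

A replacement installed into the table has to match the stub_read() shape, because the caller only ever supplies the single regs pointer.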
So, with new kernels you replace sys_read(const struct pt_regs *regs) with hacked_read_test(unsigned int fd, char *buf, size_t count). Why does this not crash, and instead work as if it were the original sys_read()? Consider the simplified version of hacked_read_test() again:
    unsigned long hacked_read_test( unsigned int fd, char *buf, size_t count ) {
        if ( fd != 0 ) {
            return original_read( fd, buf, count );
        } else {
            // ...
        }
    }
The first function argument is passed via the %rdi register. The caller of sys_read() places a pointer to struct pt_regs into %rdi and performs a call. Execution enters hacked_read_test(), and the first argument, fd, is checked for not being zero. Since this argument contains a valid pointer instead of a file descriptor, the condition succeeds and control flow goes directly to original_read(), which receives the fd value (i.e., actually, the pointer to struct pt_regs) as its first argument, which in turn gets used as it was originally meant to be. So, since kernel 4.16.2 your hacked_read_test() effectively works as follows:
    unsigned long hacked_read_test( const struct pt_regs *regs ) {
        return original_read( regs );
    }
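The fd != 0 check described above can be illustrated in userspace. This is a rough sketch under stated assumptions: unsigned int is 32 bits wide (as on x86_64 Linux), the helper names as_fd and takes_passthrough_branch are hypothetical, and the addresses used with them are made-up values, not real kernel pointers. It only models what the comparison sees when %rdi holds a pointer. It does not model how the full 64-bit value reaches original_read().

```c
#include <stdint.h>

/* What the old-style hook sees as `fd` when the register really holds
 * a 64-bit pointer: only the low 32 bits (assuming 32-bit unsigned int). */
static unsigned int as_fd(uint64_t rdi)
{
    return (unsigned int)rdi;
}

/* Mirrors the `if ( fd != 0 )` test: nonzero means the hook takes the
 * branch that forwards the call straight to original_read(). */
static int takes_passthrough_branch(uint64_t rdi)
{
    return as_fd(rdi) != 0;
}
```

For a made-up address such as 0xffffc90000a0dc9e, as_fd() yields 0x00a0dc9e, so the pass-through branch is taken. Only a pointer whose low 32 bits happen to be all zero would fall into the else branch.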
To verify this, you can try an alternative version of hacked_read_test():
    unsigned long hacked_read_test( void *ptr ) {
        if ( ptr != 0 ) {
            info( "invocation of hacked_read_test(): 1st arg is %d (%p)", ptr, ptr );
            return original_read( ptr );
        } else {
            return -EINVAL;
        }
    }
After compiling and insmoding this version, you get the following:
invocation of hacked_read_test(): 1st arg is 35569496 (00000000c3a0dc9e)
You can create a working version of hacked_read_test(), but the implementation will be platform-dependent, because you have to extract the arguments from the appropriate register fields of regs (for x86_64 these are %rdi, %rsi and %rdx for the 1st, 2nd and 3rd syscall arguments respectively).
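The register mapping can be written down as a small lookup. The sketch below is a userspace illustration only: mock_regs6 and syscall_arg are hypothetical names, and the struct is a cut-down stand-in for the real struct pt_regs. The first three registers are the ones named above. The remaining three (%r10, %r8, %r9) are taken from the x86_64 Linux syscall convention and are an addition not covered by the text.

```c
#include <stddef.h>

/* Hypothetical stand-in holding only the six registers that carry
 * syscall arguments on x86_64 Linux. */
struct mock_regs6 {
    unsigned long di, si, dx, r10, r8, r9;
};

/* Returns the n-th syscall argument (0-based); 0 for an invalid index. */
static unsigned long syscall_arg(const struct mock_regs6 *regs, unsigned int n)
{
    switch (n) {
    case 0: return regs->di;   /* 1st argument */
    case 1: return regs->si;   /* 2nd argument */
    case 2: return regs->dx;   /* 3rd argument */
    case 3: return regs->r10;  /* 4th argument (not %rcx, unlike function calls) */
    case 4: return regs->r8;   /* 5th argument */
    case 5: return regs->r9;   /* 6th argument */
    default: return 0;
    }
}
```

For read(), only indices 0 to 2 matter: fd, buf and count.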
The working x86_64 implementation is below (tested on kernel 4.19).
    #include <asm/ptrace.h>
    // ...
    unsigned long ( *original_read ) ( const struct pt_regs *regs );
    // ...
    unsigned long hacked_read_test( const struct pt_regs *regs ) {
        unsigned int fd = regs->di;       // 1st syscall argument
        char *buf = (char*) regs->si;     // 2nd syscall argument
        unsigned long r = 1;
        if ( fd != 0 ) { // fd == 0 --> stdin (sh, sshd)
            return original_read( regs );
        } else {
            icounter++;
            if ( icounter % 1000 == 0 ) {
                info( "test2 icounter = %ld\n", icounter );
                info( "strlen( debug_buffer ) = %ld\n", strlen( debug_buffer ) );
            }
            r = original_read( regs );
            strncat( debug_buffer, buf, 1 );
            // reset the buffer before it fills up
            if ( strlen( debug_buffer ) > BUFFER_SIZE - 100 )
                debug_buffer[0] = '\0';
            return r;
        }
    }