using qemu-user emulation to reverse engineer binaries

QEMU is primarily known as the software which provides full system emulation under Linux’s KVM.  Also, it can be used without KVM to do full emulation of machines from the hardware level up.  Finally, there is qemu-user, which allows for emulation of individual programs.  That’s what this blog post is about.

The main use case for qemu-user is actually not reverse-engineering, but simply running programs for one CPU architecture on another.  For example, Alpine developers leverage qemu-user when they use dabuild(1) to cross-compile Alpine packages for other architectures: qemu-user is used to run the configure scripts, test suites and so on.  For those purposes, qemu-user works quite well: we are even considering using it to build the entire riscv64 architecture in the 3.15 release.

However, most people don’t realize that you can run a qemu-user emulator which targets the same architecture as the host.  After all, that would be a little weird, right?  Most also don’t know that you can control the emulator using gdb, which lets you debug binaries that try to detect whether they are being debugged.

You don’t need gdb for this to be a powerful reverse engineering tool, however.  The emulator itself includes many powerful tracing features.  Let’s look into them by writing and compiling a sample program that does some recursion by inefficiently calculating whether a number is even or odd:

#include <stdbool.h> 
#include <stdio.h> 

bool isOdd(int x); 
bool isEven(int x); 

bool isOdd(int x) { 
   return x != 0 && isEven(x - 1); 
} 

bool isEven(int x) { 
   return x == 0 || isOdd(x - 1); 
} 

int main(void) { 
   printf("isEven(%d): %d\n", 1025, isEven(1025));
   return 0; 
}

Compile this program with gcc by running gcc -ggdb3 -Os example.c -o example.

The next step is to install the qemu-user emulator for your architecture.  In this case, we want the qemu-x86_64 package:

$ doas apk add qemu-x86_64
(1/1) Installing qemu-x86_64 (6.0.0-r1)
$

Normally, you would also want to install the qemu-openrc package and start the qemu-binfmt service to allow for the emulator to handle any program that couldn’t be run natively, but that doesn’t matter here as we will be running the emulator directly.

The first thing we will do is check to make sure the emulator can run our sample program at all:

$ qemu-x86_64 ./example 
isEven(1025): 0

Alright, all seems to be well.  Before we jump into using gdb with the emulator, let’s play around a bit with the tracing features.  When reverse engineering a program, it is common to use tracing programs like strace.  These tracing programs are quite useful, but they suffer from a design flaw: they use ptrace(2) to accomplish the tracing, which can be detected by the program being traced.  However, we can use qemu-user to do the tracing in a way that is transparent to the program being analyzed:

$ qemu-x86_64 -d strace ./example 
22525 arch_prctl(4098,274903714632,136818691500777464,274903714112,274903132960,465) = 0 
22525 set_tid_address(274903715728,274903714632,136818691500777464,274903714112,0,465) = 22525 
22525 brk(NULL) = 0x0000004000005000 
22525 brk(0x0000004000007000) = 0x0000004000007000 
22525 mmap(0x0000004000005000,4096,PROT_NONE,MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,-1,0) = 0x0000004000005000 
22525 mprotect(0x0000004001899000,4096,PROT_READ) = 0 
22525 mprotect(0x0000004000003000,4096,PROT_READ) = 0 
22525 ioctl(1,TIOCGWINSZ,0x00000040018052b8) = 0 ({55,236,0,0}) 
isEven(1025): 0 
22525 writev(1,0x4001805250,0x2) = 16 
22525 exit_group(0)
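
To make the detection problem concrete, here is a minimal sketch of one check a program might use to notice a ptrace-based tracer.  On Linux, the TracerPid field of /proc/self/status is non-zero while a tracer such as strace or gdb is attached.  The function names parse_tracer_pid and current_tracer_pid are my own, chosen for illustration, and are not part of any library:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the "TracerPid:" field out of the text of /proc/<pid>/status.
 * Returns the PID of the attached tracer, 0 if there is none, or -1 if
 * the field is missing.  (Hypothetical helper, for illustration only.) */
long parse_tracer_pid(const char *status_text) {
    const char *key = "TracerPid:";
    const char *line = status_text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, strlen(key)) == 0) {
            /* strtol skips the tab that separates key and value. */
            return strtol(line + strlen(key), NULL, 10);
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL) {
            line++;
        }
    }
    return -1;
}

/* Read /proc/self/status (Linux-specific) and report who is tracing us.
 * Returns -1 if the file cannot be read or the field is missing. */
long current_tracer_pid(void) {
    char buf[4096];
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL) {
        return -1;
    }
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_tracer_pid(buf);
}
```

Calling current_tracer_pid() from a small main and running the result under strace should show a non-zero PID.  Under qemu-x86_64 -d strace I would expect 0, since the emulator does not attach to anything with ptrace, but treat that as an assumption to verify on your own system.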

But we can do even more.  For example, we can learn how a CPU would hypothetically break a program down into translation blocks full of micro-ops (these are TCG micro-ops, but real CPUs are similar enough to gain a general understanding of the concept):

$ qemu-x86_64 -d op ./example
OP: 
ld_i32 tmp11,env,$0xfffffffffffffff0 
brcond_i32 tmp11,$0x0,lt,$L0 

---- 000000400185eafb 0000000000000000 
discard cc_dst 
discard cc_src 
discard cc_src2 
discard cc_op 
mov_i64 tmp0,$0x0 
mov_i64 rbp,tmp0 

---- 000000400185eafe 0000000000000031 
mov_i64 tmp0,rsp 
mov_i64 rdi,tmp0 

---- 000000400185eb01 0000000000000031 
mov_i64 tmp2,$0x4001899dc0 
mov_i64 rsi,tmp2 

---- 000000400185eb08 0000000000000031 
mov_i64 tmp1,$0xfffffffffffffff0 
mov_i64 tmp0,rsp 
and_i64 tmp0,tmp0,tmp1 
mov_i64 rsp,tmp0 
mov_i64 cc_dst,tmp0 

---- 000000400185eb0c 0000000000000019 
mov_i64 tmp0,$0x400185eb11 
sub_i64 tmp2,rsp,$0x8 
qemu_st_i64 tmp0,tmp2,leq,0 
mov_i64 rsp,tmp2 
mov_i32 cc_op,$0x19 
goto_tb $0x0 
mov_i64 tmp3,$0x400185eb11 
st_i64 tmp3,env,$0x80 
exit_tb $0x7f72ebafc040 
set_label $L0 
exit_tb $0x7f72ebafc043
[...]

If you want to trace the actual CPU registers for every instruction executed, that’s possible too:

$ qemu-x86_64 -d cpu ./example
RAX=0000000000000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000000 
RSI=0000000000000000 RDI=0000000000000000 RBP=0000000000000000 RSP=0000004001805690 
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000 
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000 
RIP=000000400185eafb RFL=00000202 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0033 0000000000000000 ffffffff 00effb00 DPL=3 CS64 [-RA]
SS =002b 0000000000000000 ffffffff 00cff300 DPL=3 DS   [-WA]
DS =0000 0000000000000000 00000000 00000000 
FS =0000 0000000000000000 00000000 00000000 
GS =0000 0000000000000000 00000000 00000000 
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT 
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy 
GDT=     000000400189f000 0000007f 
IDT=     000000400189e000 000001ff 
CR0=80010001 CR2=0000000000000000 CR3=0000000000000000 CR4=00000220 
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400 
CCS=0000000000000000 CCD=0000000000000000 CCO=EFLAGS 
EFER=0000000000000500
[...]

You can also trace with disassembly for each translation block generated:

$ qemu-x86_64 -d in_asm ./example
---------------- 
IN:  
0x000000400185eafb:  xor    %rbp,%rbp 
0x000000400185eafe:  mov    %rsp,%rdi 
0x000000400185eb01:  lea    0x3b2b8(%rip),%rsi        # 0x4001899dc0 
0x000000400185eb08:  and    $0xfffffffffffffff0,%rsp 
0x000000400185eb0c:  callq  0x400185eb11 

---------------- 
IN:  
0x000000400185eb11:  sub    $0x190,%rsp 
0x000000400185eb18:  mov    (%rdi),%eax 
0x000000400185eb1a:  mov    %rdi,%r8 
0x000000400185eb1d:  inc    %eax 
0x000000400185eb1f:  cltq    
0x000000400185eb21:  mov    0x8(%r8,%rax,8),%rcx 
0x000000400185eb26:  mov    %rax,%rdx 
0x000000400185eb29:  inc    %rax 
0x000000400185eb2c:  test   %rcx,%rcx 
0x000000400185eb2f:  jne    0x400185eb21
[...]

All of these options, and more, can also be stacked.  For more ideas, look at qemu-x86_64 -d help.  Now, let’s talk about using this with gdb through qemu-user’s gdbserver functionality, which allows gdb to control a remote machine.

To start a program under gdbserver mode, we use the -g argument with a port number.  For example, qemu-x86_64 -g 1234 ./example will start our example program with a gdbserver listening on port 1234.  We can then connect to that gdbserver with gdb:

$ gdb ./example
[...]
Reading symbols from ./example... 
(gdb) target remote localhost:1234 
Remote debugging using localhost:1234 
0x000000400185eafb in ?? ()
(gdb) br isEven 
Breakpoint 1 at 0x4000001233: file example.c, line 12.
(gdb) c 
Continuing. 

Breakpoint 1, isEven (x=1025) at example.c:12 
12          return x == 0 || isOdd(x - 1);
(gdb) bt full 
#0  isEven (x=1025) at example.c:12 
No locals. 
#1  0x0000004000001269 in main () at example.c:16 
No locals.

All of this is happening without any knowledge or cooperation of the program.  As far as it’s concerned, it’s running as normal: there is no ptrace or any other weirdness.

However, this is not 100% perfect: a program could be clever and run the cpuid instruction and check for GenuineIntel or AuthenticAMD and crash out if it doesn’t see that it is running on a legitimate CPU.  Thankfully, qemu-user has the ability to spoof CPUs with the -cpu option.

If you find yourself needing to spoof the CPU, you’ll probably have the best results with a simple CPU type like -cpu Opteron_G1-v1 or similar.  That CPU type spoofs an Opteron 240 processor, which was one of the first x86_64 CPUs on the market.  You can get a full list of CPUs supported by your copy of the qemu-user emulator by doing qemu-x86_64 -cpu help.
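
For reference, the kind of vendor check described above could look like the following sketch.  It assumes GCC or Clang on x86, whose cpuid.h header provides __get_cpuid; the function names cpu_vendor and is_expected_vendor are hypothetical, and a real program may well check other things too:

```c
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Fill `vendor` (at least 13 bytes) with the 12-character CPUID vendor
 * string.  Returns 1 on success, 0 if CPUID leaf 0 is unavailable or
 * this is not an x86 build.  (Hypothetical helper, for illustration.) */
int cpu_vendor(char vendor[13]) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        vendor[0] = '\0';
        return 0;
    }
    /* The vendor string is stored in EBX, EDX, ECX, in that order. */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return 1;
#else
    vendor[0] = '\0';
    return 0;
#endif
}

/* The naive allow-list check: accept only the two vendor strings
 * mentioned in the text. */
int is_expected_vendor(const char *vendor) {
    return strcmp(vendor, "GenuineIntel") == 0 ||
           strcmp(vendor, "AuthenticAMD") == 0;
}
```

Printing the result of cpu_vendor() under a few different -cpu settings is a quick way to see what a given CPU model reports to the program.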

There’s a lot more qemu-user emulation can do to help with reverse engineering.  For some ideas, look at qemu-x86_64 -h or similar.