Toilet lanterns. 2015/11/19 14:16

Author:[email protected]

0x00 What is Code virtualization


Code virtualization replaces a program's native instructions with a custom bytecode that an interpreter embedded in the program decodes and executes at run time. Only that interpreter understands the bytecode, so ordinary tools such as disassemblers cannot read it, and this is why virtual machine-based protection is harder to crack than most other protections. The interpreter itself is usually native code, since something has to run directly on the processor in order to interpret the bytecode. The relationship is the same as in interpreted languages such as Python: a script is not an executable that the system can run directly, and it needs an interpreter to run it.

0x01 Why research Code virtualization


Virtualization is used in many places, such as sandboxes and software protectors (packers). To keep malicious code from damaging the system, a program can be run inside a sandbox: even if the malicious code breaks something, it only damages the sandbox and not the real system. Protectors such as VMProtect (VMP) and Shielden embed a virtual machine to protect program code. VM-based protection is harder to crack than other protections because existing tools cannot recognize the virtual machine's bytecode. After seeing how powerful this kind of protector is, I wanted to write one myself, which led to this article.

0x02 Virtual Machine Based Code Obfuscation


Virtual machine-based code protection can also be regarded as a code obfuscation technique. The purpose of obfuscation is to prevent reverse engineering, but no obfuscation technique makes analysis impossible; it only raises the difficulty or lengthens the time needed. These techniques protect code well, but they have side effects, chiefly some loss of run-time efficiency. The loss is especially pronounced with VM-based protection, so most VM-based protectors virtualize only the important parts of a program. VM-based code protection can be broadly divided into two types:

  1. The virtual machine interprets the unpacking stub. The goal is to hide how the stub encrypts and decrypts the original code. This works well against static analysis but poorly against dynamic debugging, because a debugger can simply wait until the stub has decrypted the original code. It gives strong protection only when combined with other techniques.

  2. The code of the program to be protected is itself converted into custom bytecode, which the virtual machine interprets at run time; the original code never appears in the protected program. This protects against both static analysis and dynamic debugging.

As you can see, the difference is that the first type protects only the unpacking stub, while the second protects the program's own code directly, so the first is weaker than the second. This article takes the second approach.

In VM-based protection there is usually a mapping between custom bytecodes and native instructions: one or more bytecodes correspond to one native instruction. Letting several bytecodes map to the same native instruction makes the protection harder to break: when the protected code is converted, the converter can randomly generate many programs whose bytecode differs but whose behavior is identical, which increases the difficulty of reverse analysis.
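The one-to-many mapping can be sketched in a few lines of C. This is only a minimal illustration and not the scheme of any real protector: the alias values 0xb0, 0xc7 and 0xb1 and the names `alias_table`, `canonical_opcode` and `pick_alias` are invented for this example; only 0xa0 (MOV) and 0xa1 (XOR) come from this article.

```c
#include <stddef.h>
#include <stdlib.h>

/* Canonical opcodes, using the values defined later in this article. */
enum { OP_MOV = 0xa0, OP_XOR = 0xa1 };

/* Hypothetical alias table: several byte values decode to the same
 * canonical opcode. The extra alias values are made up for illustration. */
typedef struct {
    unsigned char alias;
    unsigned char canonical;
} alias_entry;

static const alias_entry alias_table[] = {
    { 0xa0, OP_MOV }, { 0xb0, OP_MOV }, { 0xc7, OP_MOV },
    { 0xa1, OP_XOR }, { 0xb1, OP_XOR },
};

#define ALIAS_COUNT (sizeof alias_table / sizeof alias_table[0])

/* Interpreter side: map any alias back to its canonical opcode.
 * Returns -1 for a byte that is not a known opcode. */
int canonical_opcode(unsigned char byte)
{
    for (size_t i = 0; i < ALIAS_COUNT; i++) {
        if (alias_table[i].alias == byte)
            return alias_table[i].canonical;
    }
    return -1;
}

/* Converter side: pick one alias of `canonical` at random, so that two
 * conversions of the same program can contain different bytes.
 * Returns -1 if the opcode has no entry in the table. */
int pick_alias(unsigned char canonical)
{
    unsigned char candidates[ALIAS_COUNT];
    size_t n = 0;

    for (size_t i = 0; i < ALIAS_COUNT; i++) {
        if (alias_table[i].canonical == canonical)
            candidates[n++] = alias_table[i].alias;
    }
    if (n == 0)
        return -1;
    return candidates[(size_t)rand() % n];
}
```

The interpreter would call `canonical_opcode` before dispatching, while the converter would call `pick_alias` each time it emits an instruction. A real protector would typically also randomize the table itself per build.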

0x03 What needs to be implemented?

  • As explained above, code virtualization means defining a custom set of bytecodes and then using an interpreter to interpret and run them. The work therefore splits into two parts:

    1. Defining bytecode

      A bytecode is just an identifier and can be defined at will. Here is my definition, in which each instruction identifier is one byte:

      ```c
      /*
       * opcode enum
       */
      enum OPCODES {
          MOV = 0xa0,       // MOV instruction: bytecode 0xa0
          XOR = 0xa1,       // XOR instruction: bytecode 0xa1
          CMP = 0xa2,       // CMP instruction: bytecode 0xa2
          RET = 0xa3,       // RET instruction: bytecode 0xa3
          SYS_READ = 0xa4,  // read system call: bytecode 0xa4
          SYS_WRITE = 0xa5, // write system call: bytecode 0xa5
          JNZ = 0xa6        // JNZ instruction: bytecode 0xa6
      };
      ```

      Since my demo is just a simple crackme, I define only a few commonly used instructions. More bytecodes can be defined to enrich the virtual machine if needed.

    2. Implementing the interpreter

      With the bytecodes defined, an interpreter can be implemented to interpret them. Before implementing it, we need to decide what to virtualize. A virtual machine is a virtual environment in which a program (here, custom bytecode) runs, and the way it interprets bytecode closely resembles how a real processor executes instructions. A program on a physical machine needs a processor, a stack, a heap, and so on in order to run, so the first thing to virtualize is a processor, and a processor needs some registers to compute with. Here is my definition of the virtual processor:

      ```c
      /*
       * opcode struct
       */
      typedef struct opcode_t {
          unsigned char opcode;   // bytecode
          void (*func)(void *);   // handler for this bytecode
      } vm_opcode;

      /*
       * virtual processor
       */
      typedef struct processor_t {
          int r1;     // virtual register r1
          int r2;     // virtual register r2
          int r3;     // virtual register r3
          int r4;     // virtual register r4
          int flag;   // virtual flag register, similar to eflags
          unsigned char *eip; // virtual eip: address of the bytecode being interpreted
          vm_opcode op_table[OPCODE_NUM]; // handlers for all bytecodes
      } vm_processor;
      ```
  • In the above structure, r1 to r4 are four general-purpose registers used to pass arguments and return values. eip points to the bytecode currently being executed, and op_table holds the handlers for all bytecode instructions. These two structures are the core of the virtual machine, and the interpreter works around them. Because the program logic is simple, only a processor needs to be virtualized, and neither a heap nor a stack is required. I use a single buffer to store the program's data; it can be regarded as the virtual machine's heap or stack.

  • With these two structures in place, the interpreter can be written. Its job is to check whether the bytecode currently being interpreted is one it can handle and, if so, to pass the processor state to the corresponding handler, which interprets and executes the instruction. Here is the interpreter code:

    ```c
    void vm_interp(vm_processor *proc)
    {
        /* eip points to the first byte of the protected code.
         * target_func + 4 skips the function entry code generated by the compiler */
        proc->eip = (unsigned char *) target_func + 4;

        // Loop until eip points to a RET instruction; otherwise call
        // exec_opcode to interpret the current bytecode
        while (*proc->eip != RET) {
            exec_opcode(proc);
        }
    }
    ```
  • Here target_func is the target function written in custom bytecode. eip is set to the first bytecode byte of the target function, ready for interpretation. The loop terminates when it reaches a RET instruction; otherwise it calls exec_opcode to execute the bytecode. Below is the code of exec_opcode:

    ```c
    void exec_opcode(vm_processor *proc)
    {
        int flag = 0;
        int i = 0;

        // Find the handler for the bytecode that eip points to
        while (!flag && i < OPCODE_NUM) {
            if (*proc->eip == proc->op_table[i].opcode) {
                flag = 1;
                proc->op_table[i].func((void *) proc);
            } else {
                i++;
            }
        }
    }
    ```
  • Interpreting a bytecode means first determining which instruction it is and then calling its handler. Below is the pseudocode for target_func. Its logic is to read 0x12 bytes from standard input, XOR each of the first 8 bytes with 0x29, and compare the results byte by byte with 8 bytes in memory. If they all match, the success message is printed; otherwise the error message is printed. The code could have been written as a loop, but I was lazy here and just copied and pasted.

    ```c
    /*
        mov r1, 0x00000000
        mov r2, 0x12
        call vm_read        ; read the input

        mov r1, input[0]
        mov r2, 0x29
        xor r1, r2          ; XOR
        cmp r1, flag[0]     ; compare
        jnz ERROR           ; if not equal, jump to the code that prints the error

        mov r1, input[1]
        xor r1, r2
        cmp r1, flag[1]
        jnz ERROR

        mov r1, input[2]
        xor r1, r2
        cmp r1, flag[2]
        jnz ERROR

        mov r1, input[3]
        xor r1, r2
        cmp r1, flag[3]
        jnz ERROR

        mov r1, input[4]
        xor r1, r2
        cmp r1, flag[4]
        jnz ERROR

        mov r1, input[5]
        xor r1, r2
        cmp r1, flag[5]
        jnz ERROR

        mov r1, input[6]
        xor r1, r2
        cmp r1, flag[6]
        jnz ERROR

        mov r1, input[7]
        xor r1, r2
        cmp r1, flag[7]
        jnz ERROR
    */
    ```
  • The handler functions are shown in the complete code below. With the key functions above, a simple virtual machine can already run. A virtual machine stack and a fuller set of registers can be added to enrich the instructions the virtual machine supports. Because this program is simple, there is no stack; all arguments are passed through registers or embedded in the bytecode. Feel free to modify it if you are interested.
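As a plain-C reference for what the target_func pseudocode computes, the eight copy-pasted blocks collapse into one loop. This is a sketch for reading along, not part of the demo: the function name `check_input` and the explicit length check are additions of this example, and the expected string "syclover" is the one the complete code stores at offset 0x20 of its buffer.

```c
#include <stddef.h>

/* Reference version of the check performed by target_func:
 * XOR each of the first 8 input bytes with 0x29 and compare the result
 * with the 8 bytes the demo keeps at heap_buf + 0x20 ("syclover").
 * Returns 1 on a match (the "success" path) and 0 otherwise ("error"). */
int check_input(const char *input, size_t len)
{
    static const char expected[8] = { 's', 'y', 'c', 'l', 'o', 'v', 'e', 'r' };

    /* The bytecode has no such check; it is added here so that the
     * sketch never reads past the end of a short input. */
    if (len < 8)
        return 0;

    for (size_t i = 0; i < 8; i++) {
        if ((input[i] ^ 0x29) != expected[i])
            return 0;
    }
    return 1;
}
```

Calling `check_input("ZPJEF_L[", 8)` returns 1, which matches the flag noted in the complete code, because each of those characters XORed with 0x29 gives the corresponding letter of "syclover".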

0x04 How the interpreter executes bytecode


vm_interp sets eip to target_func + 4, which is the first byte, 0xa0, of the bytes defined in target_func's inline assembly. It then checks whether the bytecode at eip is the RET instruction. RET is 0xa3, so the byte at eip is not RET, and exec_opcode is called to interpret it.

exec_opcode searches the virtual processor's op_table for the bytecode at eip, currently 0xa0, and calls its handler once it is found.

The bytecodes and their handlers are registered in init_vm_proc.

It can be seen that 0xa0 corresponds to the MOV instruction, so when the interpreter encounters 0xa0, it will call the vm_mov function to interpret the MOV instruction.

In vm_mov, dest points to the byte at eip + 1 and src points to the four bytes at eip + 2. dest is the register identifier, and the switch that follows decides which register it names; in the case 0x10 branch, r1 is assigned *src. So the first 6 bytes of the bytecode form the first MOV instruction, mov r1, xxxx, where xxxx is the last 4 of those 6 bytes, in this case 0x00000000.
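The 6-byte MOV layout (opcode, destination identifier, 32-bit immediate) can be decoded in isolation as follows. This is a minimal sketch: the type `mov_operands` and the function `decode_mov` are names made up for this example. It assembles the immediate byte by byte in little-endian order, which gives the same value as the demo's `*(int *)(eip + 2)` on a little-endian host but does not depend on host byte order or alignment.

```c
#include <stdint.h>

/* Decoded form of a 6-byte MOV: opcode (1) + destination id (1) + imm32 (4). */
typedef struct {
    unsigned char dest;  /* 0x10..0x13 = r1..r4, 0x14 = load a byte from the buffer */
    int32_t value;       /* immediate, or buffer offset when dest == 0x14 */
} mov_operands;

/* Decode the MOV instruction that starts at `code` (at least 6 readable bytes).
 * Returns 0 on success, -1 if the first byte is not the MOV opcode 0xa0. */
int decode_mov(const unsigned char *code, mov_operands *out)
{
    if (code[0] != 0xa0)
        return -1;

    out->dest = code[1];

    /* Little-endian immediate: the lowest byte comes first. */
    uint32_t v = (uint32_t)code[2]
               | ((uint32_t)code[3] << 8)
               | ((uint32_t)code[4] << 16)
               | ((uint32_t)code[5] << 24);
    out->value = (int32_t)v;
    return 0;
}
```

For the first six bytes of target_func (0xa0, 0x10, 0x00, 0x00, 0x00, 0x00) this yields dest 0x10 and value 0, that is, mov r1, 0x00000000.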

This example shows how the interpreter executes bytecode. Dispatch can be as simple as looking up the handler through a table that maps bytecodes to handler functions, or it can be a long switch that identifies each bytecode and calls the corresponding function. Each handler simulates one instruction by performing the equivalent operation, and the instructions strung together make up the complete program logic.
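For comparison with the table lookup used in the demo, the switch-based alternative might look like the sketch below. It is deliberately reduced and is not the demo's interpreter: `mini_vm`, `read_imm32` and `run_switch` are hypothetical names, only two registers are modeled, and only MOV with an immediate, XOR and RET are handled. The opcode values are the ones defined in this article.

```c
#include <stdint.h>

/* Reduced processor state for this sketch: two registers and an
 * instruction pointer into the bytecode. */
typedef struct {
    int32_t r1, r2;
    const unsigned char *ip;   /* next bytecode byte to interpret */
} mini_vm;

/* Read a little-endian 32-bit immediate. */
static int32_t read_imm32(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Interpret `code` with one big switch until RET (0xa3) is reached.
 * Returns 0 on RET and -1 on an unknown opcode or register id.
 * The bytecode is trusted to be well formed and RET-terminated. */
int run_switch(mini_vm *vm, const unsigned char *code)
{
    vm->ip = code;
    for (;;) {
        switch (*vm->ip) {
        case 0xa0: {                       /* MOV reg, imm32 (6 bytes) */
            int32_t imm = read_imm32(vm->ip + 2);
            if (vm->ip[1] == 0x10)
                vm->r1 = imm;
            else if (vm->ip[1] == 0x11)
                vm->r2 = imm;
            else
                return -1;
            vm->ip += 6;
            break;
        }
        case 0xa1:                         /* XOR r1, r2 (1 byte) */
            vm->r1 ^= vm->r2;
            vm->ip += 1;
            break;
        case 0xa3:                         /* RET */
            return 0;
        default:                           /* unknown bytecode: stop */
            return -1;
        }
    }
}
```

One design difference from the table version is visible here: an unrecognized byte ends interpretation with an error code instead of being silently skipped by the lookup loop.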

0x05 Code execution effect


0x06 VM Protection Effect


Static analysis

When code protected by a virtual machine is analyzed statically, ordinary tools are ineffective, because the bytecode is defined by us and recognized only by our interpreter. To IDA, the bytecode is just an unrecognizable piece of data.

Looking at what IDA makes of target_func, the goal of resisting static analysis has been achieved. The interpreter, however, can still be analyzed statically. Its control flow is much more complex than that of the original program, which also makes analysis harder.

Dynamic debugging

The bytecode remains unrecognizable during dynamic debugging, because the real processor never executes it directly. It is executed by our virtual processor through the interpreter, and since the interpreter is native code, it can be analyzed statically or debugged dynamically. But debugging only ever steps through the interpreter: all that can be seen is one handler call after another. To truly recover the original code, the analyst has to work out, during debugging, which native instruction each bytecode corresponds to, and then use that mapping to translate the bytecode back into native instructions. A fully unpacked, runnable native program can also be rebuilt this way, but the process is tedious.
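Once the mapping has been recovered, it can be written down as a small disassembler. The sketch below covers the seven bytecodes of this article's demo, with instruction lengths taken from how far each handler advances eip. The names `reg_name` and `disasm_one` and the output syntax (for example `byte [buf+0x20]`) are choices made for this example, not something the demo defines.

```c
#include <stdio.h>
#include <stddef.h>

/* Name of a register id used by MOV, or NULL if it is not r1..r4. */
static const char *reg_name(unsigned char id)
{
    switch (id) {
    case 0x10: return "r1";
    case 0x11: return "r2";
    case 0x12: return "r3";
    case 0x13: return "r4";
    default:   return NULL;
    }
}

/* Disassemble the instruction at code[pc] into `out`.
 * Returns the instruction length in bytes, or 0 if the opcode is
 * unknown or the instruction is truncated. */
size_t disasm_one(const unsigned char *code, size_t len, size_t pc,
                  char *out, size_t out_size)
{
    if (pc >= len)
        return 0;

    const unsigned char *p = code + pc;
    size_t left = len - pc;

    switch (p[0]) {
    case 0xa0: {                                   /* MOV: 6 bytes */
        if (left < 6)
            return 0;
        unsigned long imm = (unsigned long)p[2] | ((unsigned long)p[3] << 8)
                          | ((unsigned long)p[4] << 16) | ((unsigned long)p[5] << 24);
        if (p[1] == 0x14)                          /* load a buffer byte into r1 */
            snprintf(out, out_size, "mov r1, byte [buf+0x%lx]", imm);
        else if (reg_name(p[1]) != NULL)
            snprintf(out, out_size, "mov %s, 0x%lx", reg_name(p[1]), imm);
        else
            return 0;
        return 6;
    }
    case 0xa1: snprintf(out, out_size, "xor r1, r2"); return 1;
    case 0xa2:                                     /* CMP: 2 bytes */
        if (left < 2)
            return 0;
        snprintf(out, out_size, "cmp r1, byte [buf+0x%x]", (unsigned)p[1]);
        return 2;
    case 0xa3: snprintf(out, out_size, "ret"); return 1;
    case 0xa4: snprintf(out, out_size, "sys_read"); return 1;
    case 0xa5: snprintf(out, out_size, "sys_write"); return 1;
    case 0xa6:                                     /* JNZ: 2 bytes */
        if (left < 2)
            return 0;
        /* vm_jnz adds the offset byte and then the 2-byte instruction
         * length, so the target is pc + offset + 2. */
        snprintf(out, out_size, "jnz 0x%zx", pc + p[1] + 2);
        return 2;
    default:
        return 0;
    }
}
```

Calling `disasm_one` in a loop, advancing `pc` by the returned length until it returns 0, prints a listing comparable to the pseudocode of target_func. Against a real protector this step is far harder, because the mapping is unknown in advance and often differs per build.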

0x07 Complete code


Below is the complete demo code, which has been tested on Linux.

xvm.h

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>     // read, write

#define OPCODE_NUM 7    // number of opcodes
#define HEAP_SIZE_MAX 1024

char *heap_buf; // vm heap

/*
 * opcode enum
 */
enum OPCODES {
    MOV = 0xa0,       // MOV instruction: bytecode 0xa0
    XOR = 0xa1,       // XOR instruction: bytecode 0xa1
    CMP = 0xa2,       // CMP instruction: bytecode 0xa2
    RET = 0xa3,       // RET instruction: bytecode 0xa3
    SYS_READ = 0xa4,  // read system call: bytecode 0xa4
    SYS_WRITE = 0xa5, // write system call: bytecode 0xa5
    JNZ = 0xa6        // JNZ instruction: bytecode 0xa6
};

enum REGISTERS {
    R1 = 0x10,
    R2 = 0x11,
    R3 = 0x12,
    R4 = 0x13,
    EIP = 0x14,
    FLAG = 0x15
};

/*
 * opcode struct
 */
typedef struct opcode_t {
    unsigned char opcode;   // bytecode
    void (*func)(void *);   // handler for this bytecode
} vm_opcode;

/*
 * virtual processor
 */
typedef struct processor_t {
    int r1;     // virtual register r1
    int r2;     // virtual register r2
    int r3;     // virtual register r3
    int r4;     // virtual register r4
    int flag;   // virtual flag register, similar to eflags
    unsigned char *eip; // virtual eip: address of the bytecode being interpreted
    vm_opcode op_table[OPCODE_NUM]; // handlers for all bytecodes
} vm_processor;
```

xvm.c

```c
#include "xvm.h"

void target_func()
{
    __asm__ __volatile__(
        ".byte 0xa0, 0x10, 0x00, 0x00, 0x00, 0x00\n\t" /* mov r1, 0x00000000 */
        ".byte 0xa0, 0x11, 0x12, 0x00, 0x00, 0x00\n\t" /* mov r2, 0x12 */
        ".byte 0xa4\n\t"                               /* call vm_read */
        ".byte 0xa0, 0x14, 0x00, 0x00, 0x00, 0x00\n\t" /* mov r1, input[0] */
        ".byte 0xa0, 0x11, 0x29, 0x00, 0x00, 0x00\n\t" /* mov r2, 0x29 */
        ".byte 0xa1, 0xa2, 0x20, 0xa6, 0x5b\n\t"       /* xor r1, r2; cmp r1, flag[0]; jnz ERROR */
        ".byte 0xa0, 0x14, 0x01, 0x00, 0x00, 0x00\n\t" /* mov r1, input[1] */
        ".byte 0xa1, 0xa2, 0x21, 0xa6, 0x50\n\t"       /* xor; cmp flag[1]; jnz ERROR */
        ".byte 0xa0, 0x14, 0x02, 0x00, 0x00, 0x00\n\t" /* mov r1, input[2] */
        ".byte 0xa1, 0xa2, 0x22, 0xa6, 0x45\n\t"       /* xor; cmp flag[2]; jnz ERROR */
        ".byte 0xa0, 0x14, 0x03, 0x00, 0x00, 0x00\n\t" /* mov r1, input[3] */
        ".byte 0xa1, 0xa2, 0x23, 0xa6, 0x3a\n\t"       /* xor; cmp flag[3]; jnz ERROR */
        ".byte 0xa0, 0x14, 0x04, 0x00, 0x00, 0x00\n\t" /* mov r1, input[4] */
        ".byte 0xa1, 0xa2, 0x24, 0xa6, 0x2f\n\t"       /* xor; cmp flag[4]; jnz ERROR */
        ".byte 0xa0, 0x14, 0x05, 0x00, 0x00, 0x00\n\t" /* mov r1, input[5] */
        ".byte 0xa1, 0xa2, 0x25, 0xa6, 0x24\n\t"       /* xor; cmp flag[5]; jnz ERROR */
        ".byte 0xa0, 0x14, 0x06, 0x00, 0x00, 0x00\n\t" /* mov r1, input[6] */
        ".byte 0xa1, 0xa2, 0x26, 0xa6, 0x19\n\t"       /* xor; cmp flag[6]; jnz ERROR */
        ".byte 0xa0, 0x14, 0x07, 0x00, 0x00, 0x00\n\t" /* mov r1, input[7] */
        ".byte 0xa1, 0xa2, 0x27, 0xa6, 0x0e\n\t"       /* xor; cmp flag[7]; jnz ERROR */
        ".byte 0xa0, 0x10, 0x30, 0x00, 0x00, 0x00\n\t" /* mov r1, 0x30 (success message) */
        ".byte 0xa0, 0x11, 0x09, 0x00, 0x00, 0x00\n\t" /* mov r2, 9 */
        ".byte 0xa5, 0xa3\n\t"                         /* call vm_write; ret */
        ".byte 0xa0, 0x10, 0x40, 0x00, 0x00, 0x00\n\t" /* ERROR: mov r1, 0x40 (error message) */
        ".byte 0xa0, 0x11, 0x07, 0x00, 0x00, 0x00\n\t" /* mov r2, 7 */
        ".byte 0xa5, 0xa3"                             /* call vm_write; ret */
    );
}

/*
 * xor instruction handler
 */
void vm_xor(vm_processor *proc)
{
    // The two operands are held in registers r1 and r2
    int arg1 = proc->r1;
    int arg2 = proc->r2;

    // The result is stored in r1
    proc->r1 = arg1 ^ arg2;

    // xor takes 1 byte, so eip advances by 1
    proc->eip += 1;
}

/*
 * cmp instruction handler
 */
void vm_cmp(vm_processor *proc)
{
    // Compare r1 with the buffer byte whose offset follows the opcode
    int arg1 = proc->r1;
    char *arg2 = *(proc->eip + 1) + heap_buf;

    // flag is 1 if the two are equal, 0 otherwise
    if (arg1 == *arg2) {
        proc->flag = 1;
    } else {
        proc->flag = 0;
    }

    // cmp takes 2 bytes, so eip advances by 2
    proc->eip += 2;
}

/*
 * jnz instruction handler
 */
void vm_jnz(vm_processor *proc)
{
    // Offset of the jump target from the current bytecode address
    unsigned char arg1 = *(proc->eip + 1);

    // flag == 0 means the last comparison did not match, so take the jump
    if (proc->flag == 0) {
        proc->eip += arg1;
    } else {
        proc->flag = 0;
    }

    // jnz takes 2 bytes, so eip advances by 2
    proc->eip += 2;
}

/*
 * ret instruction handler
 */
void vm_ret(vm_processor *proc)
{
}

/*
 * read system call handler
 */
void vm_read(vm_processor *proc)
{
    // r1 holds the buffer offset to read into, r2 the number of bytes
    char *arg2 = heap_buf + proc->r1;
    int arg3 = proc->r2;

    // Call read directly
    read(0, arg2, arg3);

    // read takes 1 byte, so eip advances by 1
    proc->eip += 1;
}

/*
 * write system call handler
 */
void vm_write(vm_processor *proc)
{
    // As with read: r1 holds the buffer offset of the data, r2 the length
    char *arg2 = heap_buf + proc->r1;
    int arg3 = proc->r2;

    // Call write directly
    write(1, arg2, arg3);

    // write takes 1 byte, so eip advances by 1
    proc->eip += 1;
}

/*
 * mov instruction handler
 */
void vm_mov(vm_processor *proc)
{
    // mov has two operands that follow the opcode: a 1-byte destination
    // identifier and a 4-byte source value
    unsigned char *dest = proc->eip + 1;
    int *src = (int *) (proc->eip + 2);

    // In the first 4 cases the register is assigned *src directly.
    // In the last case *src is a buffer offset, and the byte at that
    // offset is loaded into r1
    switch (*dest) {
    case 0x10:
        proc->r1 = *src;
        break;
    case 0x11:
        proc->r2 = *src;
        break;
    case 0x12:
        proc->r3 = *src;
        break;
    case 0x13:
        proc->r4 = *src;
        break;
    case 0x14:
        proc->r1 = *(heap_buf + *src);
        break;
    }

    // mov takes 6 bytes, so eip advances by 6
    proc->eip += 6;
}

/*
 * execute bytecode
 */
void exec_opcode(vm_processor *proc)
{
    int flag = 0;
    int i = 0;

    // Find the handler for the bytecode that eip points to
    while (!flag && i < OPCODE_NUM) {
        if (*proc->eip == proc->op_table[i].opcode) {
            flag = 1;
            proc->op_table[i].func((void *) proc);
        } else {
            i++;
        }
    }
}

/*
 * the virtual machine interpreter
 */
void vm_interp(vm_processor *proc)
{
    /* eip points to the first byte of the protected code.
     * target_func + 4 skips the function entry code generated by the compiler */
    proc->eip = (unsigned char *) target_func + 4;

    // Loop until eip points to a RET instruction; otherwise call
    // exec_opcode to interpret the current bytecode
    while (*proc->eip != RET) {
        exec_opcode(proc);
    }
}

/*
 * initialize the vm processor
 */
void init_vm_proc(vm_processor *proc)
{
    proc->r1 = 0;
    proc->r2 = 0;
    proc->r3 = 0;
    proc->r4 = 0;
    proc->flag = 0;

    // Register each bytecode together with its handler
    proc->op_table[0].opcode = MOV;
    proc->op_table[0].func = (void (*)(void *)) vm_mov;

    proc->op_table[1].opcode = XOR;
    proc->op_table[1].func = (void (*)(void *)) vm_xor;

    proc->op_table[2].opcode = CMP;
    proc->op_table[2].func = (void (*)(void *)) vm_cmp;

    proc->op_table[3].opcode = SYS_READ;
    proc->op_table[3].func = (void (*)(void *)) vm_read;

    proc->op_table[4].opcode = SYS_WRITE;
    proc->op_table[4].func = (void (*)(void *)) vm_write;

    proc->op_table[5].opcode = RET;
    proc->op_table[5].func = (void (*)(void *)) vm_ret;

    proc->op_table[6].opcode = JNZ;
    proc->op_table[6].func = (void (*)(void *)) vm_jnz;

    // Create the buffer
    heap_buf = (char *) malloc(HEAP_SIZE_MAX);

    // Initialize the buffer
    memcpy(heap_buf + 0x20, "syclover", 8);
    memcpy(heap_buf + 0x30, "success!\n", 9);
    memcpy(heap_buf + 0x40, "error!\n", 7);
}

// flag: ZPJEF_L[
int main()
{
    vm_processor proc = {0};

    // initialize the vm processor
    init_vm_proc(&proc);

    // execute the target function
    vm_interp(&proc);
    return 0;
}
```

0x08 Summary


The above is a summary of what I learned while studying code virtualization. It surely contains misunderstandings, and I hope more experienced readers will point them out. This is only the simplest possible implementation, meant for learning. Virtualization technology proper is much more complex and takes a good deal more accumulated knowledge to understand well; this article is only meant as a starting point. Many problems also remain unsolved. For example, to build a real VM-based protector, the native instructions of the original program must first be converted into custom bytecode, and I do not yet know the best way to do that conversion.

Many foreign articles also describe another kind of virtual machine protection, based on LLVM IR. Readers who are interested can study it further.
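Translating native instructions automatically is the hard part and is not attempted here. A smaller step in that direction is to stop hand-writing `.byte` lists and emit the demo's bytecode from helper calls, as in the sketch below. All names (`code_buf`, `emit_byte`, `emit_mov`, `emit_simple`, `emit_cmp`, `emit_jnz`) are hypothetical, and the encodings simply mirror what the demo's handlers expect.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-size output buffer for emitted bytecode. */
typedef struct {
    unsigned char bytes[256];
    size_t len;
} code_buf;

/* Append one byte. Returns 0 on success, -1 if the buffer is full. */
static int emit_byte(code_buf *b, unsigned char v)
{
    if (b->len >= sizeof b->bytes)
        return -1;
    b->bytes[b->len++] = v;
    return 0;
}

/* mov <dest>, imm32: opcode 0xa0, destination id, little-endian immediate. */
int emit_mov(code_buf *b, unsigned char dest, uint32_t imm)
{
    if (b->len + 6 > sizeof b->bytes)
        return -1;
    emit_byte(b, 0xa0);
    emit_byte(b, dest);
    for (int i = 0; i < 4; i++)
        emit_byte(b, (unsigned char)(imm >> (8 * i)));
    return 0;
}

/* One-byte instructions: XOR 0xa1, RET 0xa3, SYS_READ 0xa4, SYS_WRITE 0xa5. */
int emit_simple(code_buf *b, unsigned char opcode)
{
    return emit_byte(b, opcode);
}

/* cmp r1, byte at buffer offset: opcode 0xa2 followed by the offset. */
int emit_cmp(code_buf *b, unsigned char buf_offset)
{
    if (b->len + 2 > sizeof b->bytes)
        return -1;
    emit_byte(b, 0xa2);
    emit_byte(b, buf_offset);
    return 0;
}

/* jnz to the absolute bytecode position `target` (forward jumps only).
 * vm_jnz computes eip + offset + 2, so offset = target - (position + 2). */
int emit_jnz(code_buf *b, size_t target)
{
    size_t next = b->len + 2;
    if (next > sizeof b->bytes || target < next || target - next > 0xff)
        return -1;
    emit_byte(b, 0xa6);
    emit_byte(b, (unsigned char)(target - next));
    return 0;
}
```

For example, a jnz emitted at position 28 with target 121 gets the offset byte 0x5b, which is the value that appears in the first jnz of target_func. A fuller tool would add labels with back-patching so that targets do not have to be computed by hand.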

0x09 References


www.cs.rhul.ac.uk/home/kinder…