As a C/C++ engineer, you will run into all kinds of problems during development, and the most common are memory problems such as out-of-bounds accesses and leaks. In the past, the most commonly used tool for finding them was Valgrind. Its biggest drawback is that it greatly slows down the program; a rough estimate is a 10x slowdown. AddressSanitizer, a tool developed by Google, addresses this performance penalty well. It is very fast, slowing the program down by only about 2x.

AddressSanitizer overview

AddressSanitizer is a compiler-based testing tool that detects a range of memory errors in C/C++ code at run time. Strictly speaking, it is a compiler plug-in made up of two modules: an instrumentation module in the compiler and a run-time library that replaces malloc/free.

The instrumentation module works at the LLVM compiler level and adds checks to memory access operations (store, load, alloc, and so on). The run-time library provides the more complex run-time functionality (such as poisoning and unpoisoning shadow memory) and hooks functions such as malloc/free.

AddressSanitizer basic use

According to the AddressSanitizer wiki, the following memory errors can be detected:

  • Use after free: access to heap memory that has already been freed
  • Heap buffer overflow: buffer overflow on the heap
  • Stack buffer overflow: buffer overflow on the stack
  • Global buffer overflow: buffer overflow on a global variable
  • Use after return: access to a function's stack memory after the function has returned
  • Use after scope: use of a stack object outside the scope in which it is defined
  • Initialization order bugs: errors in the initialization order of global objects
  • Memory leaks
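
As a rough illustration of the heap buffer overflow entry in the list above, here is a minimal sketch. ReadElement is a hypothetical helper written for this article, not part of any library. Built with -fsanitize=address, a call such as ReadElement(100) reads one element past the end of the array and should be reported as a heap-buffer-overflow; calls with an index below 100 are valid.

```cpp
#include <cstddef>

// Hypothetical helper for illustration: fills a 100-element heap array
// and returns the element at `index`.
int ReadElement(std::size_t index) {
  constexpr std::size_t kCount = 100;
  int* array = new int[kCount];
  for (std::size_t i = 0; i < kCount; ++i) {
    array[i] = static_cast<int>(i) * 2;
  }
  int value = array[index];  // out of bounds when index >= kCount
  delete[] array;
  return value;
}
```

Without AddressSanitizer, the out-of-bounds read is undefined behavior and often goes unnoticed, which is exactly why this class of bug is hard to find by hand.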

This article covers only basic usage. For details, see the official compiler documentation, such as the Clang documentation: clang.llvm.org/docs/Addres…

Use After Free

The following code is a very simple example of Use after free:

//use_after_free.cpp
#include <iostream>
int main(int argc, char **argv) {
  int *array = new int[100];
  delete [] array;
  std::cout << array[0] << std::endl;
  return 1;
}

Compile the code and run it. As you can see here, you just need to compile it with the -fsanitize=address option.

clang++  -O -g -fsanitize=address ./use_after_free.cpp
./a.out

We end up with the following output:

==10960==ERROR: AddressSanitizer: heap-use-after-free on address 0x614000000040 at pc 0x00010d471df0 bp 0x7ffee278e6b0 sp 0x7ffee278e6a8
READ of size 4 at 0x614000000040 thread T0
    #0 0x10d471def in main use_after_free.cpp:6
    #1 0x7fff732c17fc in start (libdyld.dylib:x86_64+0x1a7fc)

0x614000000040 is located 0 bytes inside of 400-byte region [0x614000000040,0x6140000001d0)
freed by thread T0 here:
    #0 0x10d4ccced in wrap__ZdaPv (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x51ced)
    #1 0x10d471ca1 in main use_after_free.cpp:5
    #2 0x7fff732c17fc in start (libdyld.dylib:x86_64+0x1a7fc)

previously allocated by thread T0 here:
    #0 0x10d4cc8dd in wrap__Znam (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x518dd)
    #1 0x10d471c96 in main use_after_free.cpp:4
    #2 0x7fff732c17fc in start (libdyld.dylib:x86_64+0x1a7fc)

SUMMARY: AddressSanitizer: heap-use-after-free use_after_free.cpp:6 in main

The report makes it clear at a glance which line freed the memory (line 5) and which line used it again (line 6).

Memory leaks are detected as well. In the following code (leak.c), the memory that p points to is clearly never freed.

#include <stdlib.h>
void *p;
int main(void) {
        p = malloc(7);
        p = 0; // The memory is leaked here.
        return 0;
}

Compile and run

clang -fsanitize=address -g  ./leak.c
./a.out

You can see the following results

=================================================================
==17756==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 7 byte(s) in 1 object(s) allocated from:
    #0 0x4ffc80 in malloc (/home/simon.liu/workspace/a.out+0x4ffc80)
    #1 0x534ab8 in main /home/simon.liu/workspace/./leak.c:4:8
    #2 0x7f127c42af42 in __libc_start_main (/usr/lib64/libc.so.6+0x23f42)

SUMMARY: AddressSanitizer: 7 byte(s) leaked in 1 allocation(s).

Note, however, that leak detection runs only when the program is about to exit. If your program keeps allocating memory while it runs and frees it all at exit, AddressSanitizer will not report a leak. For that kind of problem you may need another tool, such as jemalloc or TCMalloc.
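
To make that limitation concrete, here is a minimal sketch of the pattern. GrowingCache is a hypothetical class invented for this illustration. Its memory use grows with every call to Handle, yet every buffer is still owned by the vector and is released when the cache is destroyed, so an exit-time leak check has nothing to report.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical cache that only ever grows. Each buffer stays owned by the
// vector, so all of them are freed when the cache object is destroyed.
class GrowingCache {
 public:
  // Allocates and keeps a new buffer of `bytes` bytes on every call.
  void Handle(std::size_t bytes) {
    buffers_.push_back(std::make_unique<char[]>(bytes));
    total_bytes_ += bytes;
  }

  std::size_t entries() const { return buffers_.size(); }
  std::size_t total_bytes() const { return total_bytes_; }

 private:
  std::vector<std::unique_ptr<char[]>> buffers_;
  std::size_t total_bytes_ = 0;
};
```

A long-running service that calls Handle for every request would keep growing until it runs out of memory, even though nothing is leaked in the sense that the exit-time check looks for.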

Basic principles of AddressSanitizer

This section gives a brief introduction to how AddressSanitizer is implemented. For the detailed algorithm, see the paper "AddressSanitizer: A Fast Address Sanity Checker": www.usenix.org/system/file…

AddressSanitizer replaces all of your malloc and free calls. The areas immediately before and after each allocated (malloc) block are marked as poisoned (mainly to catch overflows), and freed memory is also marked as poisoned (mainly to catch use after free). Every memory access in your code is then translated by the compiler as follows.

before:

*address = ...;  // or: ... = *address;

after:

shadow_address = MemToShadow(address);
if (ShadowIsPoisoned(shadow_address)) {
  ReportError(address, kAccessSize, kIsWrite);
}
*address = ...;  // or: ... = *address;

As you can see, the memory address is first translated (MemToShadow), and then the shadow value is checked to see whether the accessed memory is poisoned. If it is, an error is reported and the program exits.
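
The check above can be modeled with a toy example. The sketch below is not the real runtime: ToyShadow and its methods are hypothetical names, and the "application memory" is just 64 byte offsets. It assumes the shadow encoding described in the AddressSanitizer paper, where one shadow byte describes 8 application bytes (0 means all 8 are addressable, k from 1 to 7 means only the first k are, and a negative value means the whole group is poisoned).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMemSize = 64;  // size of the toy application memory
constexpr std::size_t kGranule = 8;   // application bytes per shadow byte

// Toy model of shadow memory; names are invented for this illustration.
class ToyShadow {
 public:
  ToyShadow() { shadow_.fill(-1); }  // everything starts out poisoned

  // Models malloc: make [begin, begin + size) addressable.
  void Unpoison(std::size_t begin, std::size_t size) {
    assert(begin % kGranule == 0 && begin + size <= kMemSize);
    std::size_t g = begin / kGranule;
    for (; size >= kGranule; size -= kGranule) shadow_[g++] = 0;
    if (size > 0) shadow_[g] = static_cast<std::int8_t>(size);
  }

  // Models free: poison every granule that overlaps [begin, begin + size).
  void Poison(std::size_t begin, std::size_t size) {
    assert(begin % kGranule == 0 && begin + size <= kMemSize);
    for (std::size_t a = begin; a < begin + size; a += kGranule) {
      shadow_[a / kGranule] = -1;
    }
  }

  // The check inserted before an access of `size` bytes at offset `addr`
  // (assumes the access does not cross a granule boundary).
  bool IsPoisoned(std::size_t addr, std::size_t size) const {
    assert(addr < kMemSize);
    const int s = shadow_[addr / kGranule];
    return s != 0 && static_cast<int>(addr % kGranule + size) > s;
  }

 private:
  std::array<std::int8_t, kMemSize / kGranule> shadow_;
};
```

For example, after Unpoison(16, 13) the offsets 16 to 28 are addressable, while offset 29 and the untouched neighbors on both sides stay poisoned, which is how an overflow by a single byte gets caught.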

The translation is needed because AddressSanitizer divides virtual memory into two parts:

  1. Main application memory (Mem): the memory used by the program itself.
  2. Shadow memory: put simply, a region of memory that holds metadata about main memory. For example, which areas of main memory are poisoned is recorded in shadow memory.
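
The mapping between the two parts is a simple arithmetic transformation. The sketch below assumes the scale and offset documented for x86-64 Linux in the AddressSanitizer algorithm description; other platforms use different offsets, so treat the constants as an illustration rather than a universal rule.

```cpp
#include <cstdint>

// Assumption: values documented for x86-64 Linux; platform-specific.
constexpr std::uint64_t kShadowScale = 3;            // 8 application bytes per shadow byte
constexpr std::uint64_t kShadowOffset = 0x7fff8000;  // start of the shadow region

// Maps an application address to the address of its shadow byte.
constexpr std::uint64_t MemToShadow(std::uint64_t addr) {
  return (addr >> kShadowScale) + kShadowOffset;
}
```

Because of the shift by 3, every aligned group of 8 application bytes shares one shadow byte, so shadow memory needs one eighth of the address space it describes.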

AddressSanitizer versus other memory detection tools

Here’s a look at AddressSanitizer versus some of the other memory detection tools:

| | AddressSanitizer | Valgrind/Memcheck | Dr. Memory | Mudflap | Guard Page | gperftools |
|---|---|---|---|---|---|---|
| Technology | CTI | DBI | DBI | CTI | Library | Library |
| ARCH | x86, ARM, PPC | x86, ARM, PPC, MIPS, S390X, TILEGX | x86 | all(?) | all(?) | all(?) |
| OS | Linux, OS X, Windows, FreeBSD, Android, iOS Simulator | Linux, OS X, Solaris, Android | Windows, Linux | Linux, Mac(?) | All (1) | Linux, Windows |
| Slowdown | 2x | 20x | 10x | 2x-40x | ? | ? |
| Detects: | | | | | | |
| Heap OOB | yes | yes | yes | yes | some | some |
| Stack OOB | yes | no | no | some | no | no |
| Global OOB | yes | no | no | ? | no | no |
| UAF | yes | yes | yes | yes | yes | yes |
| UAR | yes (see AddressSanitizerUseAfterReturn) | no | no | no | no | no |
| UMR | no (see MemorySanitizer) | yes | yes | ? | no | no |
| Leaks | yes (see LeakSanitizer) | yes | yes | ? | no | yes |

Abbreviations:

  • DBI: dynamic binary instrumentation
  • CTI: compile-time instrumentation
  • UMR: uninitialized memory reads
  • UAF: use-after-free (also known as a dangling pointer)
  • UAR: use-after-return
  • OOB: out-of-bounds
  • x86: includes 32- and 64-bit

As the table shows, AddressSanitizer slows the program down by only about 2x, far less than Valgrind. AddressSanitizer is currently supported by GCC and Clang: GCC since version 4.8 and Clang since version 3.1.
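
Since both compilers are supported, code sometimes needs to know whether it was built with AddressSanitizer. The following is a small sketch of one way to check: Clang exposes __has_feature(address_sanitizer) and GCC defines __SANITIZE_ADDRESS__. The macro BUILT_WITH_ASAN and the two helper functions are hypothetical names chosen for this example.

```cpp
// Detect AddressSanitizer at compile time on GCC and Clang.
#if defined(__SANITIZE_ADDRESS__)
#define BUILT_WITH_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define BUILT_WITH_ASAN 1
#endif
#endif

#ifndef BUILT_WITH_ASAN
#define BUILT_WITH_ASAN 0
#endif

// Hypothetical helpers, for illustration only.
constexpr bool BuiltWithAsan() { return BUILT_WITH_ASAN != 0; }

const char* AsanStatus() {
  return BuiltWithAsan() ? "ASan enabled" : "ASan disabled";
}
```

A test binary could print AsanStatus() at startup so that it is obvious from the logs whether a given run was actually instrumented.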

Considerations for using AddressSanitizer

  1. By default, AddressSanitizer does not crash the application when it finds a memory access violation; it prints a report and exits. Fuzzing tools, however, usually detect errors by checking whether the program crashed, so before fuzzing we can force the program to crash on error by setting the ASAN_OPTIONS environment variable as follows:
export ASAN_OPTIONS='abort_on_error=1'
  2. AddressSanitizer requires a lot of virtual memory (about 20 terabytes). Don't worry: this is only virtual memory, and your application can still run normally. Fuzzing tools such as American Fuzzy Lop limit the memory available to the program being fuzzed, but you can work around this by disabling the memory limit. The caveat is that this carries some risk: a test sample could make the application allocate a large amount of memory, which could make the system unstable or crash other applications. So do not disable the memory limit when fuzzing on a system that is also doing important work.
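
The abort_on_error setting from the first item can also be built into the binary instead of being set in the environment. The AddressSanitizer runtime looks for a user-defined function named __asan_default_options and uses the string it returns as default options. The sketch below assumes it is linked into a program built with -fsanitize=address; kDefaultAsanOptions is a hypothetical constant name, and depending on the toolchain the function may need to be kept uninstrumented and exported.

```cpp
// Hypothetical constant name; the option string is the one from item 1.
static const char kDefaultAsanOptions[] = "abort_on_error=1";

// Hook read by the AddressSanitizer runtime at startup. Without ASan this
// is just an ordinary function that nothing calls.
extern "C" const char* __asan_default_options() {
  return kDefaultAsanOptions;
}
```

Values set through the ASAN_OPTIONS environment variable still take precedence, so this only changes the default for runs where the variable is not set.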

Enabling AddressSanitizer in Nebula Graph

We also use AddressSanitizer in Nebula Graph, and it has helped us find many problems. Enabling it is simple: turn on the ENABLE_ASAN option when running CMake. For example:

cmake -DENABLE_ASAN=On

We recommend that all developers run the unit tests with AddressSanitizer enabled once development is finished. It can catch memory problems that are otherwise hard to find and save a lot of debugging time.

Appendix

  • Nebula Graph: an open source distributed graph database
  • GitHub: github.com/vesoft-inc/…
  • Official blog: nebula-graph.io/cn/posts/
  • Weibo: weibo.com/nebulagraph