ByteDance Terminal Technology -- Kelun CAI

ByteDance open-sources bhook

Github.com/bytedance/b…

ByteDance has open-sourced bhook, its Android PLT hook solution. bhook supports Android 4.1 to 12 (API level 16 to 31) and the armeabi-v7a, arm64-v8a, x86, and x86_64 ABIs, and is released under the MIT license.

Most of ByteDance's Android apps use bhook as their PLT hook solution in production, and more than 20 SDKs from different technical areas inside ByteDance depend on it. In production, bhook's stability, functionality, and performance have all met expectations.

Android native hook

As the Android app technology stack keeps extending into the native layer, native hooks are used in more and more scenarios. There are many ways to implement a native hook on Android; inline hooks and PLT hooks are the most widely used and the most general.

The inline hook is by far the most powerful: it has few restrictions and can hook almost anywhere, so it is widely used in offline scenarios. However, the existing open-source inline hook solutions all have stability problems to some degree and lack large-scale validation in production.

The PLT hook has the advantage of controllable stability and can be rolled out to all users in production. However, a PLT hook can only intercept function calls that jump through the PLT, which limits where it can be used.

In real production environments, PLT hooks and inline hooks are often used together, each playing its part in the scenarios it suits.

ELF

To understand how Android PLT hooks work, you need to understand the ELF file format and how the linker loads ELF files.

Both app_process and .so libraries are ELF (Executable and Linkable Format) files. For runtime native hooks, we care mainly about the final build product, the ELF file.

An ELF file begins with a fixed-length header in a fixed format. The ELF header records the starting positions and lengths of the SHT (Section Header Table) and the PHT (Program Header Table) in the file. The SHT and PHT describe the basic information of the ELF's “linking view” and “execution view,” respectively.
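As a minimal sketch of how the header points at the two tables, the following C function reads the relevant fields of a 64-bit ELF header using the standard <elf.h> definitions. The names elf_tables_t and elf64_locate_tables are hypothetical and exist only for this illustration; a real parser would also handle 32-bit ELF files and validate more fields.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: where the two header tables live in the file. */
typedef struct {
    Elf64_Off pht_offset; /* e_phoff: file offset of the Program Header Table */
    size_t    pht_size;   /* e_phnum * e_phentsize, in bytes */
    Elf64_Off sht_offset; /* e_shoff: file offset of the Section Header Table */
    size_t    sht_size;   /* e_shnum * e_shentsize, in bytes */
} elf_tables_t;

/* Returns 0 and fills *out if buf starts with a 64-bit ELF header, -1 otherwise. */
static int elf64_locate_tables(const void *buf, size_t len, elf_tables_t *out) {
    assert(buf != NULL && out != NULL);
    if (len < sizeof(Elf64_Ehdr)) return -1;

    Elf64_Ehdr ehdr;
    memcpy(&ehdr, buf, sizeof(ehdr)); /* copy out to avoid unaligned access */

    /* The first bytes must be the magic "\177ELF", followed by the class. */
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) return -1;
    if (ehdr.e_ident[EI_CLASS] != ELFCLASS64) return -1;

    out->pht_offset = ehdr.e_phoff;
    out->pht_size   = (size_t)ehdr.e_phnum * ehdr.e_phentsize;
    out->sht_offset = ehdr.e_shoff;
    out->sht_size   = (size_t)ehdr.e_shnum * ehdr.e_shentsize;
    return 0;
}
```

At run time only the PHT matters to a hook, because it describes the segments that were actually mapped into memory.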

Execution View

ELF is divided into a Linking View and an Execution View.

  • Linking view: how the data is organized, in sections, before the ELF is loaded into memory.
  • Execution view: how the data is organized, in segments, after the ELF has been loaded into memory.

PLT hooks do not modify ELF files on disk; they modify data in memory at run time. Our main concern is therefore the execution view, that is, how the data in an ELF is organized and stored once it has been loaded into memory.

The linker uses mmap to load the ELF into memory according to the execution view, performs relocation to fill the GOT and the data sections with the absolute addresses of external references, and then sets the permissions of the memory pages. Finally, it calls the initialization functions in .init_array.

A PLT hook runs after the linker has fully loaded the ELF: we parse the ELF data in memory and modify the results of relocation.

An ELF can contain many types of sections. The most important ones that are relevant to PLT hooks are described below.

Dynamic section

.dynamic is designed specifically for the linker. It contains an index of the various pieces of data that the linker needs in order to parse and load the ELF. After the linker has parsed the ELF header and the execution view, it starts parsing .dynamic.
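The "index" role of .dynamic can be sketched as a simple scan over its entries, which are (tag, value) pairs terminated by a DT_NULL tag. The helper name dyn_find is hypothetical; the types and DT_* constants come from <elf.h>, and the sketch assumes a 64-bit ELF.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Scans a DT_NULL-terminated .dynamic array for the first entry with the
 * given tag. On a match, stores the entry's value in *value and returns 1;
 * returns 0 if the tag is not present.
 */
static int dyn_find(const Elf64_Dyn *dyn, Elf64_Sxword tag, Elf64_Xword *value) {
    assert(dyn != NULL && value != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == tag) {
            *value = dyn->d_un.d_val;
            return 1;
        }
    }
    return 0;
}
```

A hook library would typically look up tags such as DT_SYMTAB, DT_STRTAB, DT_HASH, DT_GNU_HASH, and DT_JMPREL this way to locate the tables discussed in the following sections.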

Data

  • .bss: uninitialized data, for example global variables and static variables that are not given an initial value. (.bss does not take up space in the ELF file.)
  • .data: initialized, non-read-only data. For example: int g_value = 1; or size_t (*strlen_ptr)(const char *) = strlen; (the linker's relocation has to take part in this initialization, because only the linker knows the absolute address of the external strlen function).
  • .rodata: initialized read-only data. After loading, the linker sets the pages to read-only. For example: const int g_value = 1;
  • .data.rel.ro: initialized read-only data that requires linker relocation. After relocation, the linker sets the pages to read-only. For example: size_t (*const strlen_ptr)(const char *) = strlen;

Code

  • .text: where most functions are stored after being compiled into machine instructions.
  • .init_array: sometimes we need logic to run automatically as soon as the ELF is loaded. For example, if you define an instance of a global C++ class, its constructor is called from .init_array. You can also define a standalone init function with __attribute__((constructor)).
  • .plt: trampolines for calling external or internal symbols. .plt looks up the absolute address of a symbol in .got, .data, or .data.rel.ro and then jumps to it.
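To make the .init_array item concrete, here is a small sketch of an init function declared with __attribute__((constructor)), which GCC and Clang both support. The function and variable names are made up for the example.

```c
#include <assert.h>

/* Counts how many times the init function has run. */
static int g_load_count = 0;

/*
 * The compiler places a pointer to this function in .init_array, so the
 * linker calls it once, right after the ELF is loaded and relocated and
 * before main() (or before dlopen() returns, for a shared library).
 */
__attribute__((constructor))
static void on_library_load(void) {
    assert(g_load_count == 0); /* expected to run exactly once */
    g_load_count++;
}

/* Lets other code check that initialization has already happened. */
int library_load_count(void) {
    return g_load_count;
}
```

Because these functions run only after relocation has finished, a hook library that installs its hooks from such a constructor sees fully resolved GOT entries.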

Symbols

Symbols fall into two categories: “dynamic linking symbols” and “internal symbols (debug symbols)”. Neither set strictly contains the other, and a debugger generally loads both. The linker only cares about dynamic linking symbols; internal symbols are not loaded into memory by the linker. A PLT hook likewise only deals with dynamic linking symbols.

  • .dynstr: the string pool for dynamic linking symbols. It stores every string used in dynamic linking, such as function names and global variable names.
  • .dynsym: the index table for dynamic linking symbols. It serves to “associate” and “describe”.

Dynamic linking symbols are divided into “imported symbols” and “exported symbols”:

  • Exported symbols: symbols that the current ELF provides for external use. For example, open in libc.so is an exported symbol of libc.so.
  • Imported symbols: external symbols that the current ELF needs. For example, if your own libtest.so uses open, then open is defined as an imported symbol in libtest.so.

Incidentally, information about internal symbols is stored in .symtab, .strtab, and .gnu_debugdata.

Hash table

To speed up lookups of dynamic linking symbol names, an ELF contains a hash table of these strings. By consulting the hash table, you can quickly determine whether a dynamic linking symbol exists in the ELF and, if so, the offset of its entry in .dynsym.

Historically, there are two types of hash tables in Android ELF:

  • .hash: SYSV hash. It contains all the dynamic link symbols.
  • .gnu.hash: the GNU hash. Contains only exported symbols in dynamic link symbols.

An ELF may contain both .hash and .gnu.hash, or only one of them. This is controlled by the static linker option -Wl,--hash-style, which can be set to sysv, gnu, or both. Since Android 6.0, the linker has supported parsing .gnu.hash.

Linker (dynamic linker)

The main thing the linker does when loading an ELF is relocation: for each “imported symbol” of the current ELF, it finds the absolute address of the corresponding external symbol (function or data). These addresses are eventually written to the following places:

  • .got.plt: holds the absolute addresses of external functions. This is the “GOT” we often hear about.
  • .data, .data.rel.ro: hold the absolute addresses of external data, including function pointers.

Relocation relies on the following information in the ELF:

  • .rel.plt, .rela.plt: associate .dynsym with .got.plt. This is the “PLT table” we often hear about.
  • .rel.dyn, .rela.dyn, .rel.dyn.aps2, .rela.dyn.aps2: associate .dynsym with .data and .data.rel.ro.

Android uses RELA only on 64-bit platforms; compared with REL, a RELA entry has an additional r_addend field. In addition, since 6.0 Android has supported .rel.dyn and .rela.dyn data in APS2 format, which is SLEB128-encoded and requires special decoding logic to read.
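The "special decoding logic" for SLEB128 amounts to reading 7 bits per byte, least significant group first, and sign-extending from the last group. The sketch below shows only this value decoder, not the full APS2 stream layout; the function name and error convention are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decodes one SLEB128 value from buf, starting at *pos.
 * On success returns 0, stores the value in *out, and advances *pos.
 * Returns -1 if the data is truncated or does not fit in 64 bits.
 */
static int sleb128_decode(const uint8_t *buf, size_t len, size_t *pos, int64_t *out) {
    assert(buf != NULL && pos != NULL && out != NULL);
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;

    do {
        if (*pos >= len || shift >= 64) return -1;
        byte = buf[(*pos)++];
        result |= (uint64_t)(byte & 0x7f) << shift; /* low 7 bits are payload */
        shift += 7;
    } while (byte & 0x80);                          /* high bit: more bytes follow */

    /* Bit 6 of the last byte is the sign bit: extend it to 64 bits. */
    if (shift < 64 && (byte & 0x40)) {
        result |= ~(uint64_t)0 << shift;
    }
    *out = (int64_t)result;
    return 0;
}
```

For example, the byte sequence 0xc0 0xbb 0x78 decodes to -123456, and the single byte 0x02 decodes to 2.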

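After relocation is complete, a function call proceeds as follows: the call site jumps to the function's stub in .plt, the stub reads the absolute address stored in .got.plt, and then jumps to the external function.

Data references work in a similar way after relocation: the code reads the absolute address of the external data (or function pointer) from the slot in .data or .data.rel.ro that the linker filled in.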

Android PLT hook

PLT Hook fundamentals

With an understanding of the ELF format and the linker's relocation process, the PLT hook process is easy to follow. It does something similar to relocation: use the symbol name to find the symbol's entry in .dynsym through the hash table; then find the corresponding relocation entry (in .rel.plt, .rela.plt, .rel.dyn, .rela.dyn, .rel.dyn.aps2, or .rela.dyn.aps2); and finally find the slot that holds the absolute address (in .got.plt, .data, or .data.rel.ro). The last step is to overwrite that absolute address with the address of our own proxy function.

Note that the linker sets .got.plt and .data.rel.ro to read-only, so before changing the absolute address you must first make its memory page writable, for example with mprotect. After the change is made, you need to clear the CPU cache for that memory location with __builtin___clear_cache so that the change takes effect immediately.

Shortcomings of xHook

xHook is an early open-source Android PLT hook solution that has received a lot of attention. xHook implements ELF parsing and absolute address replacement reasonably well. As an engineering-grade PLT hook solution, however, it has many shortcomings, mainly:

  • Its safety net for native crashes is flawed, so crashes in production cannot be completely avoided.
  • It cannot hook newly loaded ELFs automatically. (The caller has to invoke refresh repeatedly to “discover” newly loaded ELFs. But when should refresh be called? Calling it too often hurts performance; calling it too rarely causes hooks to be missed.)
  • It relies on a chained call mechanism. If a call point is hooked multiple times, unhooking one proxy function loses the proxy functions that come after it in the chain.
  • ELFs are enumerated only by reading /proc/self/maps. Compatibility is poor on newer Android versions and on some device models, where hooks often fail to take effect.
  • The API uses regular expressions to specify which ELFs to hook, which is inefficient at run time.
  • All hook points must be registered before any hook is actually executed. Once hooking has started (that is, after refresh has been called), no more hook points can be added. This design is very unfriendly.
  • It is not compatible with the linker namespace mechanism introduced in Android 8.0 (multiple implementations of the same function symbol may exist in one process).

These stability, effectiveness, and functionality issues make it difficult to use xHook in a truly large-scale production environment.

More complete Android PLT hook scheme

We badly needed a new, more complete Android PLT hook solution. What should it look like? I think it should meet these conditions:

  • It has a truly reliable safety net for native crashes, so that avoidable crashes are prevented.
  • It can hook and unhook a single caller ELF, a subset of caller ELFs, or all caller ELFs at any time.
  • When a new ELF is loaded into memory, all predetermined hook operations are applied to it automatically.
  • If multiple users hook the same call point, each of them can unhook independently without interfering with the others.
  • To accommodate the Android linker namespace, it is possible to specify which ELF provides the hooked function (the callee).
  • It automatically avoids accidental “recursive calls” and “ring calls” caused by hooks. For example: the proxy function of open calls read, and the proxy function of read calls open again. If these two proxies live in two separate SDKs, the resulting ring call is hard to detect during SDK development. If a larger ring of proxy function calls forms across more SDKs, things can get out of hand.
  • A proxy function can obtain a backtrace in the normal way (libunwind, libunwindstack, LLVM libunwind, FP unwind, and so on). In many business scenarios, the proxy function needs to capture and save a backtrace, dump and aggregate these backtraces at specific moments, and upload the data to a server after symbolization, in order to monitor and discover problems.
  • The extra performance overhead of the hook management mechanism itself is low enough.

We designed and developed bhook with these goals in mind.

Introduction to bhook

ELF and the linker have already been covered above. Here are some of the other key modules in bhook.

DL monitor

On Android, .so libraries are ultimately loaded dynamically through dlopen and android_dlopen_ext, and unloaded through dlclose.

bhook hooks these three functions internally. When a new .so is loaded into memory, bhook notices immediately and runs the predetermined hook tasks on it. bhook also notices immediately when a .so is about to be unloaded, and synchronizes with the ELF cache and the hook execution module through an internal read/write lock to ensure that “a .so being hooked is not unloaded at the same time”.

Since 7.0, Android no longer allows apps to dlopen system libraries. Since 8.0, the linker namespace mechanism has been in place, and libdl.so is no longer a virtual entry point into the linker but a real .so file. Android 7.0 and 8.0 are therefore two important releases as far as the linker is concerned.

Here we follow the approach of ByteDance's Raphael (github.com/bytedance/m…). From Android 7.0 on, the proxies for dlopen and android_dlopen_ext do not call the original functions; instead, they call internal functions of the linker and libdl.so to bypass the restriction. The internal functions behind the following symbols are mainly used:

Android 7.x linker:

__dl__ZL10dlopen_extPKciPK17android_dlextinfoPv
__dl__Z9do_dlopenPKciPK17android_dlextinfoPv
__dl__Z23linker_get_error_bufferv
__dl__ZL23__bionic_format_dlerrorPKcS0_

Android 8.0+ libdl.so:

__loader_dlopen
__loader_android_dlopen_ext

Trampoline

Simple PLT hook solutions such as xHook do not need trampolines: they just replace the absolute addresses in .got.plt (and in .data and .data.rel.ro). However, this approach turns multiple proxy functions on the same hook point into a chain of calls (similar to signal handlers registered with sigaction on Linux). If one proxy in the chain is unhooked, the proxies after it are lost as well. xHook has exactly this problem.

When Proxy 1 is unhooked, Proxy 2 also disappears from the call chain, because Proxy 1 knows nothing about Proxy 2: when unhooking, Proxy 1 tries to restore the initial value, that is, the address of the callee.

To solve this problem, we write the address of a management entry function into the global offset table entry instead, and we maintain a list of proxy functions for each hooked call point. The management entry function traverses this list and calls the proxy functions in it.
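The idea of a per-call-point proxy list can be sketched with a small data structure. This is a single-threaded illustration under assumed names (hook_point_t and its functions are hypothetical, not bhook's API); it only shows why unhooking one proxy no longer loses the others.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROXIES 8

/* One hooked call point: the original target plus the registered proxies. */
typedef struct {
    void  *orig;                 /* address the GOT slot held before hooking */
    void  *proxies[MAX_PROXIES]; /* proxies in call order */
    int    enabled[MAX_PROXIES]; /* set to 0 when a proxy is unhooked */
    size_t count;
} hook_point_t;

static int hook_point_add(hook_point_t *hp, void *proxy) {
    assert(hp != NULL && proxy != NULL);
    if (hp->count == MAX_PROXIES) return -1;
    hp->proxies[hp->count] = proxy;
    hp->enabled[hp->count] = 1;
    hp->count++;
    return 0;
}

/* Unhooking only disables the entry; the rest of the list is untouched. */
static int hook_point_remove(hook_point_t *hp, void *proxy) {
    assert(hp != NULL);
    for (size_t i = 0; i < hp->count; i++) {
        if (hp->proxies[i] == proxy && hp->enabled[i]) {
            hp->enabled[i] = 0;
            return 0;
        }
    }
    return -1;
}

/*
 * Returns what should run after `current`: the next enabled proxy, or the
 * original function when none is left. Pass NULL to start from the head,
 * which is what the management entry function would do.
 */
static void *hook_point_next(const hook_point_t *hp, void *current) {
    assert(hp != NULL);
    size_t i = 0;
    if (current != NULL) {
        while (i < hp->count && hp->proxies[i] != current) i++;
        i++; /* continue with the entry after `current` */
    }
    for (; i < hp->count; i++) {
        if (hp->enabled[i]) return hp->proxies[i];
    }
    return hp->orig;
}
```

Because a removed proxy is only marked as disabled, a proxy that is still executing while it is being unhooked can still find its successor in the list.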

To decide where to jump at run time, we create shellcode with mmap and mprotect. By convention, we call the jump logic created this way a trampoline.

In addition, to detect and avoid ring calls, every time a trampoline starts to execute, it records the execution stack of proxy functions. While traversing the proxy function chain, it checks whether the proxy function about to be executed already appears in the execution stack. If it does, a ring call has occurred: all remaining proxy functions in the chain are skipped and the final “original function” is executed directly.
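The ring-call check can be illustrated with a per-thread stack of the proxies that are currently executing. This is only a sketch of the idea: it assumes C11 _Thread_local storage and a small fixed depth, and the names proxy_enter and proxy_leave are invented for the example. How bhook stores this stack is not described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 16

/* Per-thread stack of proxy functions that are currently executing. */
static _Thread_local void  *t_stack[MAX_DEPTH];
static _Thread_local size_t t_depth;

/*
 * Called before running `proxy`. Returns false if the proxy is already on
 * this thread's execution stack (a recursive or ring call) or if the stack
 * is full; the caller should then skip the remaining proxies and run the
 * original function. Returns true after pushing the proxy.
 */
static bool proxy_enter(void *proxy) {
    for (size_t i = 0; i < t_depth; i++) {
        if (t_stack[i] == proxy) return false;
    }
    if (t_depth == MAX_DEPTH) return false;
    t_stack[t_depth++] = proxy;
    return true;
}

/* Called after a proxy that was successfully entered has returned. */
static void proxy_leave(void) {
    assert(t_depth > 0);
    t_depth--;
}
```

With the open/read example from earlier: entering the open proxy and then the read proxy succeeds, but the nested attempt to enter the open proxy again fails, so the original open is called instead.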

The hard part of implementing the trampoline is performance. The trampoline injects extra logic into the execution path. In a multithreaded environment, the proxy function list is traversed very frequently, proxy functions can be added to and removed from it at any time, and the execution stack of proxy functions also has to be saved. None of this logic can take a lock; otherwise the performance loss would be significant when hooking frequently called functions.

Native crashes

When performing a hook operation, it is necessary to directly calculate a large number of absolute memory locations and then read and write to these memory locations, but this is not always safe. We may encounter situations like this:

  • During DL monitor initialization, before the hook on dlclose is in place, the linker executes dlclose, and the ELF being unloaded happens to be the one we are currently hooking.
  • The ELF file could be accidentally corrupted, causing Linker to load an ill-formed ELF.

In such cases, reading or writing the computed memory location may raise SIGSEGV or SIGBUS and cause a native crash. We need a mechanism similar to try-catch in Java/C++ to protect these dangerous operations from crashing:

int *p = NULL;

TRY(SIGSEGV, SIGBUS) {
    *p = 1;
} CATCH() {
    LOG("There was a problem, but it's okay.");
} EXIT

When such a crash occurs, we know that the protected code contains only memory reads or a single memory write, so ignoring the crash has no side effects. The Java virtual machine has a similar mechanism that detects native crashes and creates the corresponding Java exceptions.

bhook implements this safety net by registering signal handlers for SIGSEGV and SIGBUS. At the beginning of the try block it saves the registers and the signal mask with sigsetjmp. When a crash occurs, the signal handler jumps to the catch block with siglongjmp, which also restores the signal mask.

A few issues worth noting:

  • ART sigchain proxies the sigaction and sigprocmask functions, so we need to use dlsym to find the original functions in libc.so and call them.
  • Bionic and ART sigchain have bugs on some AOSP versions, so we need to prefer sigaction64 and sigprocmask64 over sigaction and sigprocmask.
  • It is important to set the signal mask in the right place and in the right way.
  • Our try-catch mechanism runs in a multithreaded environment, so sigjmp_buf has to be saved separately for each thread in some way.
  • For performance and to cover more usage scenarios, the whole mechanism needs to be lock-free, heap-free, TLS-free, thread-safe, and async-signal-safe.

bhook's native crash safety-net module has gone through rigorous stress testing and validation in production. Used correctly, it achieves the expected effect. As you can see in the bhook source code, we deliberately designed the module this way (just one .c file and one .h file, with no external dependencies) so that it is easy to port and reuse. If you want to use this module in your own projects, please note the following:

  • A native crash safety net is a “high-risk” operation. Used incorrectly, it causes problems that are unpredictable and hard to troubleshoot. So don't use it if you can avoid it.
  • Do not use the native crash safety net for purely business-logic native libraries. Instead, let the crash surface and fix it.
  • The less logic in the try block, the better. For example, when guarding against SIGSEGV and SIGBUS, the try block should ideally contain only memory reads and a single memory write, and should avoid calling external functions (including malloc, free, new, delete, and so on).
  • Try not to use C++ in try blocks. For some C++ syntax, the compiler generates unexpected logic (for example, when reading or writing a C++ TLS variable, the compiler generates a call to __emutls_get_address, which may call malloc).
  • In the current design, do not return from within the try block; otherwise the recovery logic in the catch or exit block is skipped, causing problems that are hard to troubleshoot. In addition, you cannot nest another try for the same signal inside a try block.

About the ByteDance Terminal Technology team

ByteDance Client Infrastructure is a global R&D team for front-end and client infrastructure technology (with R&D teams in Beijing, Shanghai, Hangzhou, Shenzhen, Guangzhou, Singapore, and Mountain View). It is responsible for building ByteDance's entire front-end and client infrastructure and for improving the performance, stability, and engineering efficiency of the company's whole product line. The products it supports include, but are not limited to, Douyin, Toutiao, Watermelon Video, Feishu, and Guagualong. The team does in-depth research on mobile, web, desktop, and other client platforms.
