Bytedance Terminal Technology — Feng Yadong

Background

What is symbol resolution

Symbol resolution (symbolication) maps the addresses in a crash log to readable symbols and to line numbers in source files, so that developers can locate and fix problems. As shown below, the completely unreadable first crash log becomes the fully readable third log after complete symbolication. ByteDance's stability monitoring platform must symbolicate iOS crash, hang, lag, custom exception, and other log types, so symbol resolution is an essential basic capability of the platform.

Native system symbolication tools

symbolicatecrash

symbolicatecrash ships with Xcode. It is located at /Applications/Xcode.app/Contents/SharedFrameworks/DVTFoundation.framework/Versions/A/Resources/symbolicatecrash and is a Perl script that wraps the step-by-step parsing (you can also copy the script elsewhere and call it directly).

Usage: symbolicatecrash log.crash -d xxx.app.dSYM

Advantages: it conveniently symbolicates an entire crash log.

Disadvantages:

  1. It takes a long time.
  2. It is coarse-grained and cannot symbolicate a single line.

atos

Usage: atos -o xxx.app.dSYM/Contents/Resources/DWARF/xxx -arch arm64/armv7 -l loadAddress runtimeAddress

Advantages: it is fast, can symbolicate a single line, and makes it easy for upper layers to cache results.

The problem with native tools

However, both tools share two major drawbacks:

  1. They are stand-alone tools and cannot be offered as an online service.
  2. They depend on macOS. ByteDance's server infrastructure is entirely Linux-based, so the company's existing platforms and frameworks cannot be reused, which brings very high machine, deployment, and operations costs.

Exploration of earlier schemes

To solve these two pain points and build an online iOS symbolication service that runs on Linux, we explored the following schemes over time:

Scheme 1: llvm-atosl

This is a customized tool based on LLVM's own symbolication tool.

The flowchart for resolving a single log line online is as follows:

This scheme worked well at first. Over time, however, parsing often timed out and failed during the evening peak, leaving only address offsets visible instead of symbols, so we had to find the bottleneck and optimize further.

Scheme 2: llvm-atosl via cgo

The llvm-atosl tool is called through cgo instead of from the command line.

After scheme 1 went live, we observed that the PCT99 latency of single-line parsing became extremely high during the evening peak, with more and more parsing failures caused by timeouts. On one occasion the whole service stalled during the evening peak, and after logging in to a production machine I saw a large number of "too many open files" errors. I suspected that fd usage had exceeded the limit. Since each execution of the llvm-atosl command occupies at least three fds (stdin, stdout, and stderr), we tried wrapping llvm-atosl as a C library, called from the Go side through cgo:

package main

/*
#cgo CFLAGS: -I./tools
#cgo LDFLAGS: -lstdc++ -lncurses -lm -L${SRCDIR}/tools/ -lllvm-atosl
#include "llvm-atosl-api.h"
#include <stdlib.h>
*/
import "C"

import (
  "fmt"
  "strconv"
  "strings"
  "unsafe"
)

func main() {
    result := symbolicate("~/dsym 7.8.0 eb7dd4d73df0329692003523fc2c9586/Aweme (78007).app.dSYM/Contents/Resources/DWARF/Aweme", "arm64", "0x100008000", "0x0000000102cff4b8")
    fmt.Println(result)
}

func symbolicate(go_path string, go_arch string, go_loadAddress string, go_address string) string {
    c_path := C.CString(go_path)
    c_arch := C.CString(go_arch)

    loadAddress := hex2int(go_loadAddress)
    c_loadAddress := C.ulong(loadAddress)

    address := hex2int(go_address)
    c_address := C.ulong(address)

    c_result := C.getSymbolicatedName(c_path, c_arch, c_loadAddress, c_address)

    result := C.GoString(c_result)

    C.free(unsafe.Pointer(c_path))
    C.free(unsafe.Pointer(c_arch))
    C.free(unsafe.Pointer(c_result))

    return result
}

func hex2int(hexStr string) uint64 {
    // remove the 0x prefix if present in the input string
    cleaned := strings.Replace(hexStr, "0x", "", -1)

    // base 16 for hexadecimal; a parse error is ignored and yields 0
    result, _ := strconv.ParseUint(cleaned, 16, 64)
    return result
}

Switching from cross-process calls to in-process calls was supposed to reduce both fd usage and inter-process communication overhead, but parsing efficiency dropped instead of improving after going live.

According to the blog post “How to Make Go call C 10 times Better?” (see Resources [1] for a link), there are two main reasons for poor cgo performance:

  1. Thread stacks in the Go runtime are relatively small, and concurrency is bounded by the number of Ps (Processors, which can be understood as the schedulers managing goroutines) and Ms (Machines, which can be understood as OS threads). Put simply, it is limited by GOMAXPROCS, which since Go 1.5 defaults to the number of CPU cores on the machine, so once the number of concurrent cgo calls exceeds GOMAXPROCS, calls block.
  2. Because both the Go and the C/C++ runtimes must be kept, cgo has to translate and coordinate between the two runtimes and the two ABIs (application binary interfaces), which carries a lot of overhead.

This shows that our assumptions about fd exhaustion and cross-process call overhead being the bottleneck were wrong, so this scheme proved unworkable.

Scheme 3: golang-atos

Go's standard library package debug/dwarf can parse DWARF files and resolve addresses to symbols, so it can replace the llvm-atosl implementation and naturally take advantage of goroutines for high concurrency. The implementation can follow this source:

package dwarfexample

import (
    "debug/dwarf"
    "debug/macho"
    "log"

    "github.com/go-errors/errors"
)

func ParseFile(path string, address int64) (err error) {
    var f *macho.FatFile
    if f, err = macho.OpenFat(path); err != nil {
        return errors.New("open file error: " + err.Error())
    }
    defer f.Close()

    // Arches[1] assumes the wanted architecture is the second slice of the fat binary.
    var d *dwarf.Data
    if d, err = f.Arches[1].DWARF(); err != nil {
        return
    }

    r := d.Reader()

    // Find the compilation unit whose PC ranges contain the address.
    var entry *dwarf.Entry
    if entry, err = r.SeekPC(uint64(address)); err != nil {
        log.Print("Not Found ...")
        return
    } else {
        log.Print("Found ...")
    }

    log.Printf("tag: %+v, lowpc: %+v", entry.Tag, entry.Val(dwarf.AttrLowpc))

    var lineReader *dwarf.LineReader
    if lineReader, err = d.LineReader(entry); err != nil {
        return
    }

    var line dwarf.LineEntry

    // The address is hard-coded in this sample; in practice it would match the one passed to SeekPC above.
    if err = lineReader.SeekPC(0x1005AC550, &line); err != nil {
        return
    }

    log.Printf("line %+v:%+v", line.File.Name, line.Line)

    return
}

However, in unit tests, it was found that golang-atos single-line parsing was 10 times slower than llvm-atosl, because the golang version of the DWARF file parsing implementation was more time-consuming than the C++ version of LLVM. Therefore, this scheme is not feasible.

The ultimate solution

Overall scheme design

Later, monitoring showed that read traffic on CephFS, the distributed file system that stores the symbol table files, was extremely high every time parsing efficiency dropped and errors spiked:

We then realized that the real bottleneck of symbol resolution was network I/O. The symbol tables of super apps such as Douyin and Toutiao often exceed 1 GB, and a very large number of internal test builds are uploaded every day. Although symbol tables were cached locally on the physical machines, some long-tail symbol tables always missed the cache and had to be synchronized to the backend container instances from the distributed file system during the evening peak. Because each symbolication request is dispatched randomly to a physical machine in the cluster, the problem was amplified: the higher the network I/O traffic, the slower symbol resolution becomes; the slower symbol resolution becomes, the more requests pile up, which in turn drives network I/O traffic even higher.

We finally adopted the following solution: when a symbol table is uploaded, fully pre-parse the mapping between addresses and symbols in the symbol table file, and at query time look the result up directly in an online cache:

Core changes:

  1. Address-to-symbol mapping used to happen at crash time, by locating the corresponding symbol table file and invoking a command-line tool. Now all address-to-symbol mappings are fully parsed when the symbol table file is uploaded and stored in structured form, and at crash time we only query the cache.
  2. To fix demangle failures for some C++ and Rust symbols and the inconsistency among per-language demangle tools, we replaced LLVM’s demangle tool with symbolic-demangle (see Resources [2]), a Rust implementation that supports demangling across all the languages involved, which greatly reduces operations and maintenance cost.
  3. Symbolication uses the new scheme first; if the new scheme has not yet been rolled out to a given request or fails to resolve it, the old scheme serves as the fallback.

Scheme implementation details

Symbol table file format

DWARF

File structure

DWARF is a debugging information format widely used for source-level debugging and by tools (such as atos) that recover the symbols and source line numbers corresponding to runtime addresses.

If Build Options -> Debug Information Format is set to DWARF with dSYM File, Xcode generates a dSYM file containing DWARF data at build time, which lets us look up function symbols, file names, line numbers, and other information to help developers troubleshoot problems after release.

Take the DWARF file in AwemeDylib.framework.dSYM as an example. First use the file command on macOS to check its file type:

As the figure above shows, the DWARF file is also a kind of Mach-O file, so it can also be opened and analyzed with the MachOView tool.

The figure above shows that its Mach-O file type is MH_DSYM. We can use the size command to view the segments and sections of the AwemeDylib DWARF file, taking arm64 as an example:

~/Downloads/dwarf/AwemeDylib.framework.dSYM/Contents/Resources/DWARF > size -x -m -l AwemeDylib
AwemeDylib (for architecture arm64):
Segment __TEXT: 0x18a4000 (vmaddr 0x0 fileoff 0)
        Section __text: 0x130fd54 (addr 0x5640 offset 0)
        Section __stubs: 0x89d0 (addr 0x1315394 offset 0)
        Section __stub_helper: 0x41c4 (addr 0x131dd64 offset 0)
        Section __const: 0x1a4358 (addr 0x1321f40 offset 0)
        Section __objc_methname: 0x47c15 (addr 0x14c6298 offset 0)
        Section __objc_classname: 0x45cd (addr 0x150dead offset 0)
        Section __objc_methtype: 0x3a0e6 (addr 0x151247a offset 0)
        Section __cstring: 0x1bf8e4 (addr 0x154c560 offset 0)
        Section __gcc_except_tab: 0x1004b8 (addr 0x170be44 offset 0)
        Section __ustring: 0x1d46 (addr 0x180c2fc offset 0)
        Section __unwind_info: 0x67c40 (addr 0x180e044 offset 0)
        Section __eh_frame: 0x2e368 (addr 0x1875c88 offset 0)
        total 0x189e992
Segment __DATA: 0x5f8000 (vmaddr 0x18a4000 fileoff 0)
        Section __got: 0x4238 (addr 0x18a4000 offset 0)
        Section __la_symbol_ptr: 0x5be0 (addr 0x18a8238 offset 0)
        Section __mod_init_func: 0x1850 (addr 0x18ade18 offset 0)
        Section __const: 0x146cb0 (addr 0x18af670 offset 0)
        Section __cfstring: 0x1b2c0 (addr 0x19f6320 offset 0)
        Section __objc_classlist: 0x1680 (addr 0x1a115e0 offset 0)
        Section __objc_nlclslist: 0x28 (addr 0x1a12c60 offset 0)
        Section __objc_catlist: 0x208 (addr 0x1a12c88 offset 0)
        Section __objc_protolist: 0x2f0 (addr 0x1a12e90 offset 0)
        Section __objc_imageinfo: 0x8 (addr 0x1a13180 offset 0)
        Section __objc_const: 0xb2dc8 (addr 0x1a13188 offset 0)
        Section __objc_selrefs: 0xf000 (addr 0x1ac5f50 offset 0)
        Section __objc_protorefs: 0x48 (addr 0x1ad4f50 offset 0)
        Section __objc_classrefs: 0x16a8 (addr 0x1ad4f98 offset 0)
        Section __objc_superrefs: 0x1098 (addr 0x1ad6640 offset 0)
        Section __objc_ivar: 0x42c4 (addr 0x1ad76d8 offset 0)
        Section __objc_data: 0xe100 (addr 0x1adb9a0 offset 0)
        Section __data: 0xc0d20 (addr 0x1ae9aa0 offset 0)
        Section HMDModule: 0x50 (addr 0x1baa7c0 offset 0)
        Section __bss: 0x1e9038 (addr 0x1baa820 offset 0)
        Section __common: 0x1058e0 (addr 0x1d93860 offset 0)
        total 0x5f511c
Segment __LINKEDIT: 0x609000 (vmaddr 0x1e9c000 fileoff 4096)
Segment __DWARF: 0x2a51000 (vmaddr 0x24a5000 fileoff 6332416)
        Section __debug_line: 0x3e96b7 (addr 0x24a5000 offset 6332416)
        Section __debug_pubnames: 0x16ca3a (addr 0x288e6b7 offset 10434231)
        Section __debug_pubtypes: 0x2e111a (addr 0x29fb0f1 offset 11927793)
        Section __debug_aranges: 0xf010 (addr 0x2cdc20b offset 14946827)
        Section __debug_info: 0x12792a4 (addr 0x2ceb21b offset 15008283)
        Section __debug_ranges: 0x567b0 (addr 0x3f644bf offset 34378943)
        Section __debug_loc: 0x674483 (addr 0x3fbac6f offset 34733167)
        Section __debug_abbrev: 0x2637 (addr 0x462f0f2 offset 41500914)
        Section __debug_str: 0x5d0e9e (addr 0x4631729 offset 41510697)
        Section __apple_names: 0x1a6984 (addr 0x4c025c7 offset 47609287)
        Section __apple_namespac: 0x1b90 (addr 0x4da8f4b offset 49340235)
        Section __apple_types: 0x137666 (addr 0x4daaadb offset 49347291)
        Section __apple_objc: 0x13680 (addr 0x4ee2141 offset 50622785)
        total 0x2a507c1
total 0x4ef6000

As you can see, there is a segment named __DWARF, which contains __debug_line, __debug_aranges, __debug_info, and many other sections. We can explore the DWARF sections with dwarfdump; for example, dwarfdump AwemeDylib --debug-info prints the formatted contents of the __debug_info section. The full usage of the dwarfdump command is described in the official documentation of the LLVM toolchain (see Resources [3] for a link).

According to the official DWARF format document (see Resources [4] for a link), the relationships between these sections are shown below:

debug_info

The debug_info section holds the core information of a DWARF file. DWARF uses the Debugging Information Entry (DIE) to describe information in a uniform form. Each DIE contains:

  • A tag that states what kind of element the DIE describes, such as DW_TAG_subprogram (function), DW_TAG_formal_parameter (formal parameter), DW_TAG_variable (variable), or DW_TAG_base_type (base type).
  • N attributes that describe the DIE.

Here’s an example:

0x0049622c:   DW_TAG_subprogram
                DW_AT_low_pc        (0x000000000030057c)
                DW_AT_high_pc        (0x0000000000300690)
                DW_AT_frame_base        (DW_OP_reg29 W29)
                DW_AT_object_pointer        (0x0049629e)
                DW_AT_name        ("+[SSZipArchive _dateWithMSDOSFormat:]")
                DW_AT_decl_file        ("/var/folders/03/2g9r4cnj3kqb5605581m1nf40000gn/T/cocoapods-uclardjg/Pods/SSZipArchive/SSZipArchive/SSZipArchive.m")
                DW_AT_decl_line        (965)
                DW_AT_prototyped        (0x01)
                DW_AT_type        (0x00498104 "NSDate*")
                DW_AT_APPLE_optimized        (0x01)

Here are some of the key attributes:

  • DW_AT_low_pc and DW_AT_high_pc are the start and end PC addresses of the function.
  • DW_AT_name gives the function name, here +[SSZipArchive _dateWithMSDOSFormat:].
  • DW_AT_decl_file says the function is declared in the file .../SSZipArchive.m.
  • DW_AT_decl_line says the function is declared at line 965 of .../SSZipArchive.m.
  • DW_AT_type is the return type of the function, which for this function is NSDate*.

It is worth noting:

  1. DWARF defines only a limited set of tags and attributes; the full list can be found under the DW_TAG entries in the LLVM API documentation (see Resources [5] for a link).
  2. The machine code addresses described by DW_AT_low_pc and DW_AT_high_pc are not the program’s runtime addresses; we call them file_address. For security reasons, the operating system applies address space layout randomization (ASLR): when it loads an executable into memory it applies a random offset, so the image ends up at what we call load_address below. Subtracting load_address from the runtime address gives an offset, and adding the vmaddr of the __TEXT segment to that offset restores the file address. The vmaddr can be obtained with the size command above or with otool -l. It is usually 0x4000 for armv7 and 0x100000000 for arm64, but not always; for AwemeDylib, for example, vmaddr is 0. We call the address of a function at app runtime runtime_address.

The calculation formula of the above addresses is as follows:

file_address = runtime_address - load_address + vm_address
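The formula can be sketched in Go as follows. This is a minimal illustration, not the platform's actual code: the function name FileAddress is hypothetical, and the sample values are those of the AwemeDylib call stack line used later in this article (vmaddr 0).

```go
package main

import "fmt"

// FileAddress maps a runtime address from a crash log back into the address
// space of the symbol file:
//   file_address = runtime_address - load_address + vm_address
// textVMAddr is the vmaddr of the __TEXT segment.
func FileAddress(runtimeAddr, loadAddr, textVMAddr uint64) uint64 {
	return runtimeAddr - loadAddr + textVMAddr
}

func main() {
	// Frame: 0x000000010035d580, image loaded at 0x10005d000, vmaddr 0x0.
	fa := FileAddress(0x000000010035d580, 0x10005d000, 0x0)
	fmt.Printf("0x%x\n", fa) // prints 0x300580
}
```

The subtraction is done first so that the intermediate value is the plain offset into the image, which is also the quantity the pre-parsing scheme stores.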

CompileUnit

CompileUnit means compilation unit. A compilation unit usually corresponds to a DIE whose tag is DW_TAG_compile_unit. It represents the __TEXT, __DATA, and other output produced by compiling one source file; put simply, it is one file that takes part in compiling our code, such as a .m, .mm, .cpp, or .c source file in the respective programming language. A compilation unit contains all the DIEs (methods, parameters, variables, and so on) declared in it. A typical example:

0x00495ea3: DW_TAG_compile_unit
              DW_AT_producer        ("Apple LLVM version 10.0.0 (clang-1000.11.45.5)")
              DW_AT_language        (DW_LANG_ObjC)
              DW_AT_name        ("/var/folders/03/2g9r4cnj3kqb5605581m1nf40000gn/T/cocoapods-uclardjg/Pods/SSZipArchive/SSZipArchive/SSZipArchive.m")
              DW_AT_stmt_list        (0x001e8f31)
              DW_AT_comp_dir        ("/private/var/folders/03/2g9r4cnj3kqb5605581m1nf40000gn/T/cocoapods-uclardjg/Pods")
              DW_AT_APPLE_optimized        (0x01)
              DW_AT_APPLE_major_runtime_vers        (0x02)
              DW_AT_low_pc        (0x00000000002fc8e8)
              DW_AT_high_pc        (0x0000000000300828)

Here are some of the key attributes:

  • DW_AT_language describes the programming language used by the current compilation unit.
  • DW_AT_stmt_list is the offset, within the debug_line section, of the line number information for the current compilation unit. The next subsection covers debug_line in more detail.
  • DW_AT_low_pc and DW_AT_high_pc are the overall start and end PC addresses of all the DW_TAG_subprogram DIEs contained in the compilation unit.
debug_line

You can view the structured data of the debug_line section with dwarfdump AwemeDylib --debug-line.

Then we search for the DW_AT_stmt_list value from the previous subsection, 0x001e8f31:

debug_line[0x001e8f31]
...
include_directories[  1] = "/var/folders/03/2g9r4cnj3kqb5605581m1nf40000gn/T/cocoapods-uclardjg/Pods/SSZipArchive/SSZipArchive"
...
file_names[  1]:
           name: "SSZipArchive.m"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000
...
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x00000000002fc8e8     46      0      1   0             0  is_stmt
0x00000000002fc908     48     32      1   0             0  is_stmt prologue_end
0x00000000002fc920      0     32      1   0             0
0x00000000002fc928     48     19      1   0             0
0x00000000002fc934     49      9      1   0             0  is_stmt
0x00000000002fc938     53     15      1   0             0  is_stmt
0x00000000002fc940     54      9      1   0             0  is_stmt
...
0x0000000000300828   1058      1      1   0             0  is_stmt end_sequence

Combining include_directories and file_names gives the absolute path of each file that took part in compilation.

The table that follows lists the file name and line number for each file_address:

  • Address: the file_address.

  • Line: the line number in the source file corresponding to the file_address.

  • Column: the column number in the source file corresponding to the file_address.

  • File: the index of the source file, matching the subscript in file_names above.

  • ISA: an unsigned integer indicating which instruction set architecture the current instruction belongs to; usually 0.

  • Discriminator: an unsigned integer identifying the block to which the current instruction belongs; when only one block exists for a source position, the value is 0.

  • Flags: flag bits. The two most important are:

    • end_sequence: the address is that of the first byte after the end of a sequence of target machine instructions, so within the current compilation unit only addresses before the end_sequence address hold valid instructions.
    • is_stmt: whether the current instruction is a recommended breakpoint location. Generally, code where is_stmt is false corresponds to compiler-optimized instructions; such instructions often have line number 0, which interferes with our analysis.
Principle of symbolic resolution

Take this call stack line as an example:

5 AwemeDylib 0x000000010035d580 0x10005d000 + 3147136

The corresponding binaryImage is:

0x10005d000 - 0x1000dffff AwemeDylib arm64

Using the formula from the file structure section, we can calculate the file_address corresponding to the crash address:

file_address = 0x000000010035d580 - 0x10005d000 + 0x0 = 0x300580

We then use dwarfdump --lookup to find the corresponding function name and line number:

The following flowchart describes how dwarfdump maps an address to a symbol (other tools such as atos work the same way):

As you can see, the result from dwarfdump --lookup matches our manual lookup exactly: every address in the range 0x30057C to 0x300593 resolves to the same file name and line number.

For symbolication based on DWARF files, the expected result format is:

func_name (in binary_name) (file_name:line_number)

So for file_address 0x300580 the result is:

+[SSZipArchive _dateWithMSDOSFormat:] (in AwemeDylib) (SSZipArchive.m:965)

To verify the manual result, we run atos:

atos -o AwemeDylib.framework.dSYM/Contents/Resources/DWARF/AwemeDylib -arch arm64 -l 0x10005d000 0x000000010035d580

+[SSZipArchive _dateWithMSDOSFormat:] (in AwemeDylib) (SSZipArchive.m:965)

The atos result is fully consistent with our manual analysis.

Symbol Table

The previous section introduced the principle of symbolication through DWARF files. However, that approach does not cover 100% of scenarios, for two reasons:

  1. If a statically linked framework was built with the GCC_GENERATE_DEBUGGING_SYMBOLS build setting set to NO, the dSYM file generated when the app is packaged contains no file name or line number information for the machine instructions generated from that code.
  2. System libraries have no dSYM files; we only have the .dylib or .framework Mach-O files, for example libobjc.A.dylib and Foundation.framework.

For symbols without DWARF data we need another means: symbolication through the Symbol Table and String Table.

File structure

The Symbol Table of a Mach-O file, as displayed by the MachOView tool, looks like this:

Key fields:

  • String Table Index: the offset into the String Table. For example, the first symbol circled above has offset 0x0048C12B; adding the String Table start address 0x02BBC360 gives 0x0304848B, where the method name _ff_stream_add_bitstream_filter is found.

  • Value: the starting file_address of the current method.
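Reading a name through the String Table Index can be sketched as below. This is an illustration under stated assumptions: the helper symbolName and the tiny string table are made up, and the only fact relied on is that Mach-O string table entries are NUL-terminated.

```go
package main

import (
	"bytes"
	"fmt"
)

// symbolName returns the NUL-terminated string that starts at offset strx
// in the string table. It reports false when strx is out of range.
func symbolName(strtab []byte, strx uint32) (string, bool) {
	if int(strx) >= len(strtab) {
		return "", false
	}
	rest := strtab[strx:]
	end := bytes.IndexByte(rest, 0)
	if end < 0 {
		end = len(rest) // tolerate a missing terminator at the very end
	}
	return string(rest[:end]), true
}

func main() {
	// Made-up string table: offset 1 holds "_main", offset 7 the next name.
	strtab := []byte("\x00_main\x00_ff_stream_add_bitstream_filter\x00")
	name, ok := symbolName(strtab, 7)
	fmt.Println(name, ok)
}
```

In a real file the string table slice would start at the String Table start address, so the index is used directly as an offset into that slice.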
Principle of symbolic resolution
  1. Sort the Symbol Table entries by value.
  2. In the sorted list, find the first index whose value is greater than the crash file_address; the crash then falls within the entry at index-1. Use the String Table Index of that entry to read the method name from the String Table. The file_address minus the value of that entry is the byte offset of the crash address from the start of the method.
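The two steps can be sketched with the standard library's binary search. The Symbol type and lookupSymbol are hypothetical names, and the start value 0x56C1DC for _ff_stream_add_bitstream_filter is an assumption derived from the "+ 2" result quoted in this article; the other two symbols are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Symbol is a simplified Symbol Table entry: a name and its start file_address.
type Symbol struct {
	Name  string
	Value uint64
}

// lookupSymbol expects symbols sorted by Value. It finds the first entry whose
// Value is greater than fileAddr and steps back one entry.
func lookupSymbol(sorted []Symbol, fileAddr uint64) (name string, offset uint64, ok bool) {
	i := sort.Search(len(sorted), func(i int) bool { return sorted[i].Value > fileAddr })
	if i == 0 {
		return "", 0, false // address lies before the first symbol
	}
	s := sorted[i-1]
	return s.Name, fileAddr - s.Value, true
}

func main() {
	syms := []Symbol{
		{"_ff_stream_add_bitstream_filter", 0x56C1DC}, // assumed start address
		{"_example_first", 0x1000},                    // made-up neighbour
		{"_example_next", 0x56C300},                   // made-up neighbour
	}
	// Step 1: sort by value.
	sort.Slice(syms, func(i, j int) bool { return syms[i].Value < syms[j].Value })
	// Step 2: binary search and step back one entry.
	name, off, ok := lookupSymbol(syms, 0x56C1DE)
	fmt.Printf("%s (in AwemeDylib) + %d %v\n", name, off, ok)
}
```

Note that this sketch cannot tell whether an address lies past the end of the last function; a real implementation would also need an upper bound, such as the end of the __text section.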

We expect the result in the following format:

func_name (in binary_name) + func_offset

So for file_address 0x56C1DE the result is:

_ff_stream_add_bitstream_filter (in AwemeDylib) + 2

Running atos for comparison gives:

atos -o AwemeDylib.framework.dSYM/Contents/Resources/DWARF/AwemeDylib -arch arm64 -l 0x0 0x56C1DE

ff_stream_add_bitstream_filter (in AwemeDylib) + 2

The only difference is that atos removes the _ prefix that the compiler adds to C functions by default.

Online pre-resolution scheme implementation

Golang native implementation

Go’s debug/dwarf library can parse DWARF files and makes it easy to print the file name and line number for an address, and Go is naturally cross-platform.

However, Go’s native implementation did not meet our needs, for several reasons:

  1. debug/dwarf has no API that directly resolves method names, so the results are incomplete.
  2. It does not handle more complex scenarios, such as the file name and line number of inlined functions.
  3. The implementation assumes the file_address is already known and provides no full pre-parsing solution.
  4. It only supports DWARF file parsing, not Symbol Table parsing.

Therefore, we had to implement DWARF file and Symbol Table parsing ourselves.

Full pre-parsing implementation

Based on the principles above, a natural first idea is to resolve every possible address in the __text section of the __TEXT segment one by one and store the results in a distributed backend cache such as HBase or Redis. Wouldn't that be enough?

In theory yes, but in practice it is not feasible.

From the size output above, the __text section is 0x130FD54 bytes, nearly 20 million in decimal. That is just one architecture of one symbol table file, while ByteDance's stability monitoring platform already hosts hundreds of thousands of symbol tables online; storage at that scale is too resource-intensive to be realistic. From the principle of symbolication, however, it is easy to see that a run of consecutive addresses may resolve to exactly the same result. For example, as mentioned above, every address from 0x30057C to 0x300593 resolves to +[SSZipArchive _dateWithMSDOSFormat:] (in AwemeDylib) (SSZipArchive.m:965), so these addresses can be merged into a single entry. This yields at least a 20x compression, and the strategy works for both DWARF files and Symbol Tables.

The next question is how to store an entry such as the range 0x30057C to 0x300593 with its result +[SSZipArchive _dateWithMSDOSFormat:] (in AwemeDylib) (SSZipArchive.m:965). The value in HBase is simple: we wrap the lowest address, highest address, method name, file name, and line number of an address range into a struct, called Unit{}. But what is the key?

This is the tricky part: pre-parsing stores address ranges, but online queries supply a single address. How do we derive the HBase key from that address? Our solution is:

hbase_key = [table_name]+image_name+uuid+chunk_index

Each part is explained as follows:

  • table_name: distinguishes the dwarf table from the symbol_table table.
  • image_name: the binary name, such as Aweme or libobjc.A.dylib.
  • uuid: the unique identifier of a symbol table file. Note that a dSYM is a multi-architecture fat binary, and the Mach-O file for each architecture has a different UUID.
  • chunk_index: the address space is split into consecutive chunks of constant length N (for example, 10000), and chunk_index is the subscript of the chunk an address falls into, that is, the address divided by N and rounded down. For a single address this is unambiguous; for an address range it is more complicated. If the lower and upper bounds of the range give the same result when divided by N and rounded down, the whole range falls into one chunk_index. If they differ, the range must be written to every chunk_index it spans, so that every address in the range can be resolved at read time.
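A minimal sketch of the key scheme follows. The function names, the "_" separator, and the placeholder UUID are assumptions made for illustration; the article only states which four parts make up the key and that N is a constant such as 10000.

```go
package main

import "fmt"

// chunkSize is the constant N from the text (10000 is the article's example).
const chunkSize = 10000

// chunkIndex is the address divided by N, rounded down.
func chunkIndex(addr uint64) uint64 { return addr / chunkSize }

// chunkIndexesForRange lists every chunk_index that the inclusive range
// [low, high] touches; the range has to be written under each of them.
func chunkIndexesForRange(low, high uint64) []uint64 {
	var out []uint64
	for c := chunkIndex(low); c <= chunkIndex(high); c++ {
		out = append(out, c)
	}
	return out
}

// hbaseKey joins table_name, image_name, uuid and chunk_index.
// The "_" separator is an assumption, not the platform's actual layout.
func hbaseKey(table, image, uuid string, chunk uint64) string {
	return fmt.Sprintf("%s_%s_%s_%d", table, image, uuid, chunk)
}

func main() {
	const uuid = "00000000-0000-0000-0000-000000000000" // placeholder UUID
	// A range of 29001~41000 touches chunks 2, 3 and 4.
	for _, c := range chunkIndexesForRange(29001, 41000) {
		fmt.Println(hbaseKey("dwarf", "AwemeDylib", uuid, c))
	}
}
```

Because the key is derived purely from the address and the image identity, an online query can compute it without knowing which range the address belongs to.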

Under this policy, the value in HBase cannot be a single address range with its result. Instead it is an array of all the address ranges that fall within the chunk, written []Unit{}. The schematic diagram is as follows:

The diagram shows that the address range 29001~41000 spans three chunk_indexes and is therefore written into three HBase cache entries. This is somewhat redundant, but it maximizes performance and throughput. To resolve a call stack address online, we divide the offset address by the constant N and round down to get the chunk_index it falls into, then use binary search to find the first unit_index whose range starts above the address, and step back one unit to get the result we need.
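The write and query paths described above can be sketched as follows. An in-memory map stands in for HBase, and the Unit fields, findUnit, and the sample function names are hypothetical; only the chunking rule and the "binary search, then step back one" lookup come from the text.

```go
package main

import (
	"fmt"
	"sort"
)

// Unit is one pre-parsed address range and its resolution result.
type Unit struct {
	Low, High  uint64 // inclusive offset range
	Func, File string
	Line       int
}

// findUnit expects units sorted by Low. It finds the first unit that starts
// above offset, steps back one, and checks that the unit really covers offset.
func findUnit(units []Unit, offset uint64) (Unit, bool) {
	i := sort.Search(len(units), func(i int) bool { return units[i].Low > offset })
	if i == 0 {
		return Unit{}, false
	}
	u := units[i-1]
	if offset > u.High {
		return Unit{}, false // offset falls in a gap between units
	}
	return u, true
}

func main() {
	const n = 10000 // chunk length N
	store := map[uint64][]Unit{} // chunk_index -> units, standing in for HBase
	units := []Unit{ // already sorted by Low
		{29001, 41000, "exampleFuncA", "a.m", 10},
		{41001, 41500, "exampleFuncB", "b.m", 20},
	}
	// Write path: store each unit under every chunk it spans.
	for _, u := range units {
		for c := u.Low / n; c <= u.High/n; c++ {
			store[c] = append(store[c], u)
		}
	}
	// Query path: pick the chunk from the address, then search inside it.
	for _, offset := range []uint64{35000, 41200} {
		u, ok := findUnit(store[offset/n], offset)
		fmt.Println(offset, u.Func, u.Line, ok)
	}
}
```

Appending units in ascending order keeps each chunk sorted, which is what the binary search relies on; a unit that starts in an earlier chunk simply appears first in every later chunk it spans.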

Note: online queries check the HBase cache in the dwarf table first and join the method name, file name, and line number into the required format. On a miss, they query the HBase cache in symbol_table and calculate the offset from the function’s start address. To keep cold data from occupying storage long after a symbol table is uploaded, each chunk in the figure above has an expiration time of 45 days; whenever a chunk is hit online, its expiration is reset to 45 days from the current time.

DWARF file parsing

Full CompileUnit parsing

From the DWARF file structure we know that resolving file names, line numbers, and function names all depends on CompileUnits. The official DWARF document tells us that the offsets of all CompileUnits within the debug_info section are stored in the debug_aranges section.

The document also gives the binary structure of debug_aranges, and based on it we parse out every debug_info_offset by hand. For reasons of space we do not post the implementation here. One thing to watch when parsing binary data by hand is endianness (byte order).
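For orientation, here is a rough sketch of walking the debug_aranges section to collect each debug_info_offset. It is not the implementation the article omits: it assumes the 32-bit DWARF set header layout (unit_length, version, debug_info_offset, address_size, segment_size), little-endian data as on iOS targets, and hypothetical names throughout.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// byteOrder is little-endian for arm64/armv7; a big-endian target would
// need binary.BigEndian here. Getting this wrong silently corrupts offsets.
var byteOrder binary.ByteOrder = binary.LittleEndian

// arangesHeader is the fixed 12-byte header of one set (32-bit DWARF).
type arangesHeader struct {
	UnitLength      uint32 // length of the set, excluding this field
	Version         uint16
	DebugInfoOffset uint32 // offset of the compile unit in debug_info
	AddressSize     uint8
	SegmentSize     uint8
}

func parseArangesHeader(b []byte) (arangesHeader, error) {
	if len(b) < 12 {
		return arangesHeader{}, errors.New("debug_aranges: truncated header")
	}
	h := arangesHeader{
		UnitLength:      byteOrder.Uint32(b[0:4]),
		Version:         byteOrder.Uint16(b[4:6]),
		DebugInfoOffset: byteOrder.Uint32(b[6:10]),
		AddressSize:     b[10],
		SegmentSize:     b[11],
	}
	if h.UnitLength == 0xffffffff {
		return arangesHeader{}, errors.New("debug_aranges: 64-bit DWARF not handled in this sketch")
	}
	return h, nil
}

// debugInfoOffsets collects the debug_info_offset of every set in the section.
func debugInfoOffsets(section []byte) ([]uint32, error) {
	var offs []uint32
	for pos := 0; pos < len(section); {
		h, err := parseArangesHeader(section[pos:])
		if err != nil {
			return nil, err
		}
		offs = append(offs, h.DebugInfoOffset)
		pos += 4 + int(h.UnitLength) // skip to the next set
	}
	return offs, nil
}

func main() {
	// One synthetic 48-byte set: header, padding, one tuple, terminator.
	set := make([]byte, 48)
	byteOrder.PutUint32(set[0:], 44)
	byteOrder.PutUint16(set[4:], 2)
	byteOrder.PutUint32(set[6:], 0x1000) // made-up compile unit offset
	set[10] = 8
	offs, err := debugInfoOffsets(append(set, set...))
	fmt.Println(offs, err)
}
```

The address range tuples that follow each header are skipped here, since only the compile unit offsets are needed for this step.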

Full address resolution flow

The following figure shows the process of full address resolution. Three points need special attention:

  1. Inline functions: the function name is taken from the declaration of the inlined function, but the file name and line number are taken from the position where the function was inlined, which is consistent with what atos reports. Otherwise, the information of two successive call stack frames may jump around, which makes the problem harder to analyze.
  2. According to the official DWARF document, when the flags of a debug_line row include is_stmt, the current instruction is a breakpoint location recommended by the compiler; otherwise the instruction is one generated by the compiler that is not a recommended breakpoint location. Since a breakpoint can only be set on a single source line, all rows from one is_stmt row up to the next is_stmt row share exactly the same source file name and line number. For an address that has no is_stmt flag, we therefore only need to find the nearest row with a smaller address that does carry is_stmt in order to obtain the correct file name and line number. The conclusion is that the is_stmt flag decides whether consecutive debug_line rows can be merged: only the rows between two consecutive is_stmt rows can be merged.
  3. The address range written to Hbase is an offset address, calculated as offset = file_address - __text.vmaddr. This removes the need to care about the start address of the __TEXT segment in the DWARF file during parsing.
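The merging rule and the offset calculation above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the production code: LineRow and merge_line_rows are hypothetical names, the rows are assumed to be already decoded from debug_line in address order, and keeping the earlier of two rows with the same address is an assumption based on the duplicate-address case discussed later in this article.

```python
from dataclasses import dataclass


@dataclass
class LineRow:
    """One decoded debug_line row (hypothetical shape for this sketch)."""
    address: int
    file: str
    line: int
    is_stmt: bool
    end_sequence: bool = False


def merge_line_rows(rows, text_vmaddr):
    """Merge rows into (start_offset, end_offset, file, line) ranges.

    A range opens at an is_stmt row and closes at the next is_stmt row
    or at end_sequence. Offsets are relative to text_vmaddr.
    """
    ranges = []
    current = None  # the is_stmt row that opened the current range
    for row in rows:
        if not (row.is_stmt or row.end_sequence):
            continue  # covered by the preceding is_stmt row
        if current is not None:
            if row.address == current.address and not row.end_sequence:
                continue  # same address twice: keep the earlier row (assumption)
            if row.address > current.address:
                ranges.append((current.address - text_vmaddr,
                               row.address - text_vmaddr,
                               current.file, current.line))
        current = None if row.end_sequence else row
    return ranges
```

Each resulting tuple is a half-open range [start_offset, end_offset), so a crash address only needs to be converted to an offset and matched against one range to get its file name and line number.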

Symbol Table parsing

It is relatively easy to parse the Symbol Table. We only need to sort the information in the Symbol Table by value, and then write the starting and ending addresses of each part and the corresponding function names into Hbase according to the above policies.
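A minimal Python sketch of this step is shown below. It assumes that each Symbol Table entry has already been reduced to a (value, name) pair and that the end of the last symbol is bounded by a text_end address; both the function name and text_end are hypothetical, since the article does not say how the final range is closed.

```python
def symbol_ranges(symbols, text_vmaddr, text_end):
    """Turn (value, name) pairs into (start_offset, end_offset, name) ranges.

    Each symbol is assumed to extend up to the next symbol's value;
    the last one extends to text_end (an assumption of this sketch).
    """
    ordered = sorted(symbols)  # sort by value, ties broken by name
    ranges = []
    for index, (value, name) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][0]
        else:
            end = text_end
        if end > value:  # skip zero-length entries that share an address
            ranges.append((value - text_vmaddr, end - text_vmaddr, name))
    return ranges
```

The offsets use the same offset = file_address - __text.vmaddr convention as the DWARF ranges, so both kinds of records can be queried in the same way.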

Pitfalls encountered

While implementing this scheme we ran into a variety of pitfalls. Here are a few typical examples for reference:

  1. The write time is much longer than expected.

Cause: The Demangle tool was invoked before each Hbase write, and each invocation costs tens of milliseconds. This overhead is magnified when the full set of symbols is written.

Solution: Move demangling from before the Hbase write to after the Hbase query, since the functions that actually appear in crashes are far fewer than the full set of functions.

  2. Failed to fetch CompileUnit.

Cause: In most cases, the CompileUnit offset fetched from the debug_aranges section has to be manually increased by 0xB to obtain the expected offset of the CompileUnit.

However, there are exceptions. In some cases the adjustment is not 0xB, and the CompileUnit offset read from the debug_aranges section is already correct, for unknown reasons.

Solution: Add a compatibility fallback. If the CompileUnit cannot be fetched with the offset plus 0xB, subtract 0xB and try again.

  3. The same address appears in two consecutive rows of debug_line, which makes the parsing result ambiguous.

Cause: Although the two consecutive rows have the same address, their file names and line numbers differ, which results in ambiguity.

Solution: Follow the atos parsing result and take the earlier of the two rows.

  4. debug_line has been read up to end_sequence, that is, the last row, but some DIEs with the tag DW_TAG_subprogram in the current CompileUnit have not been indexed by any address in debug_line, so that part of the address range is missing.

Cause: Suspected to be related to compiler optimization. The method names of these DIEs usually start with _OUTLINED_FUNCTION_.

Solution: If the end_sequence row has been parsed and some DW_TAG_subprogram DIEs are still not indexed, use the file name and line number of the end_sequence row as the file name and line number for the address ranges of those DIEs.

  5. Invalid data in the Symbol Table.

Cause: The file address of some entries in the Symbol Table is actually smaller than __text.vmaddr, which makes the offset negative. Because the address offset was originally defined as uint64, the negative value wrapped around to a very large integer, which was not what we expected.

Solution: Filter out the entries whose address offset is negative.
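Three of the fixes above (the CompileUnit offset fallback, the unindexed DW_TAG_subprogram DIEs and the negative Symbol Table offsets) are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: every function name is hypothetical, fetch_cu stands in for whatever DWARF library call returns a CompileUnit or None, and "not indexed" is interpreted here as "not overlapping any range already produced from debug_line".

```python
CU_OFFSET_ADJUST = 0xB      # adjustment applied to offsets read from debug_aranges
MASK64 = (1 << 64) - 1      # emulates uint64 arithmetic in Python


def find_compile_unit(aranges_offset, fetch_cu):
    """Try offset + 0xB first, then fall back to the raw offset.

    fetch_cu is a hypothetical callable returning a CompileUnit or None.
    """
    for candidate in (aranges_offset + CU_OFFSET_ADJUST, aranges_offset):
        cu = fetch_cu(candidate)
        if cu is not None:
            return cu
    return None


def fill_unindexed(subprograms, indexed_ranges, last_file, last_line):
    """Give uncovered subprogram ranges the end_sequence row's file and line.

    subprograms: (low_pc, high_pc, name) tuples from DW_TAG_subprogram DIEs.
    indexed_ranges: (start, end, file, line) tuples built from debug_line.
    """
    extra = []
    for low_pc, high_pc, _name in subprograms:
        covered = any(start < high_pc and low_pc < end
                      for start, end, _file, _line in indexed_ranges)
        if not covered:
            extra.append((low_pc, high_pc, last_file, last_line))
    return extra


def offset_as_uint64(file_address, text_vmaddr):
    """Show what an unsigned 64-bit subtraction yields, including wrap-around."""
    return (file_address - text_vmaddr) & MASK64


def drop_negative_offsets(symbols, text_vmaddr):
    """Compare before subtracting, so negative offsets never appear."""
    return [(value, name) for value, name in symbols if value >= text_vmaddr]
```

For example, offset_as_uint64(0xFFF, 0x1000) evaluates to 2**64 - 1, which is the kind of unexpectedly large value described above; filtering on value >= text_vmaddr before subtracting avoids it entirely.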

Online results

The solution was A/B tested for about 2 weeks before the full launch, and all known bad cases where its output differed from the old solution were fixed. The performance indicators after the full launch are as follows:

Single-line parsing time

Over the last 6h, the average elapsed time improved by a factor of 70 and the PCT99 by a factor of more than 300.

Overall crash interface time

From 7.7 to 7.10, the overall average time of the crash parsing interface decreased by 50%.

From 7.7 to 7.10, the overall PCT99 time of the crash parsing interface decreased by 70%+.

Symbol table file access magnitude

From 7.7 to 7.10, the daily symbol table file access magnitude decreased by 50%+.

Parsing errors

Since July 7, when traffic to the new solution began to ramp up, parsing errors have completely disappeared.

Physical machine performance

We picked a representative production machine to monitor. Its load, memory usage, CPU usage and network IO all improved markedly compared with the period before optimization.

Below are dashboards of some core indicators before and after optimization for comparison:

Time range before optimization: 7.3 12:00-7.5 12:00

Time range after optimization: 7.10 12:00-7.12 12:00

15 min load

15min load average: 5.76 => 0.84, which can be interpreted as the overall parsing efficiency of the cluster increased to 6.85 times.

IOWait CPU

Average IOWait CPU usage: 4.21 => 0.16, a reduction of 96%.

Memory footprint

Average memory usage: 74.4GiB => 31.7GiB, a reduction of 57%.

Network Input Traffic

Network input traffic: 13.2MB/s => 4.34MB/s, a reduction of 67%.

References

[1] my.oschina.net/linker/blog…

[2] docs.rs/crate/symbo…

[3] llvm.org/docs/Comman…

[4] www.dwarfstd.org/doc/DWARF4….

[5] formalverification.cs.utah.edu/llvm_doxy/2…

About the Bytedance Terminal Technology team

Bytedance Client Infrastructure is a global R&D team for big front-end infrastructure technology (with R&D teams in Beijing, Shanghai, Hangzhou, Shenzhen, Guangzhou, Singapore and Mountain View). It is responsible for building the whole big front-end infrastructure of Bytedance and for improving the performance, stability and engineering efficiency of the company's entire product line. The supported products include but are not limited to Douyin, Toutiao, Watermelon Video, Feishu, Guagualong, etc. We have done in-depth research on mobile, Web, Desktop and other terminals.
