This article introduces the common symbols and stack symbolization in iOS development.

dSYM and DWARF

iOS developers should be familiar with dSYM.

During compilation (the conversion of source code into machine code), the compiler generates a debug symbol table. This table maps each machine instruction in the compiled binary back to the line of source code that produced it. The debug symbol table is stored either in the compiled binary itself or in a separate debug symbol file (the dSYM file). Generally, apps built in debug mode keep the debug symbol table in the compiled binary, while apps built in release mode store it in a dSYM file to reduce the size of the binary.

On every build, the debug symbol table and the app binary are tied together by a build UUID. Each build generates a new, unique UUID, even if the source code has not changed. Only a dSYM file whose UUID matches that of the binary can be used to symbolize its stack traces.

DWARF (commonly expanded as Debugging With Arbitrary Record Formats) is a standard format for debugging information. A separate dSYM file is called a debug symbol file. Opening a binary with MachOView shows many DWARF sections, such as __DWARF,__debug_str, __DWARF,__debug_info, __DWARF,__debug_names, and so on.

Apps released to users do not contain debug symbols, so symbolizing crashes reported from production requires the matching dSYM. Platforms such as Firebase and Bugly require the dSYM files to be uploaded in order to symbolize stack traces.

/xxxxxx/Pods/Crashlytics/iOS/Crashlytics.framework/upload-symbols -a 75ef2a0601e7b1071aed828d01b73ebdda95f3b9 -p ios ./MyApp.dSYM

The -a parameter specifies the API key of the project (not the UUID of the dSYM), and -p specifies the platform.

Symbol

Variables and functions are symbols. Linking is the process of collecting and linking together individual Mach-O files, which requires reading symbol tables. When debugging with Xcode, symbols are also mapped to source files through symbol tables.

For example, if the binary main calls a function defined in binary A, main locates the implementation of that function in A through its symbol. Binary A maintains its own symbol table. The nm tool displays the symbol information in a binary.

struct nlist_64 is the data structure that describes a symbol. The name of the symbol is not stored in the Symbol Table itself but in the String Table, where all symbol names are stored. To get the name of a symbol, read the string that starts at index n_strx of the String Table.

struct nlist_64 {
    union {
        uint32_t  n_strx; /* index into the string table, where the symbol's name starts */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};

Note the n_strx field: it is the index of the symbol's name in the String Table.
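The lookup through n_strx can be sketched in C. This is a minimal illustration rather than code from the original article: the struct is redeclared locally as demo_nlist_64 so that the sketch builds without the Mach-O headers, and the helper name symbol_name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the nlist_64 layout (assumption: redeclared here so the
 * sketch compiles without <mach-o/nlist.h>). */
struct demo_nlist_64 {
    union {
        uint32_t n_strx;   /* index into the string table */
    } n_un;
    uint8_t  n_type;
    uint8_t  n_sect;
    uint16_t n_desc;
    uint64_t n_value;
};

/* Hypothetical helper: return the name of a symbol, or NULL when n_strx
 * points outside the string table. */
static const char *symbol_name(const struct demo_nlist_64 *sym,
                               const char *strtab, size_t strtab_size) {
    if (sym->n_un.n_strx >= strtab_size) {
        return NULL;
    }
    /* Names are NUL-terminated strings stored back to back. */
    return strtab + sym->n_un.n_strx;
}
```

For a string table laid out as "\0_main\0_printf", a symbol whose n_strx is 1 resolves to _main and one whose n_strx is 7 resolves to _printf.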

Symbol Table

The Symbol Table stores symbol information. Both ld and dyld read the symbol table when linking.

String Table

The names of all symbols in the binary are stored in the String Table.

Use the strings command to print the printable strings in a binary, which include the strings in the String Table.

strings - find the printable strings in a object, or other binary, file

Dynamic Symbol Table

The Dynamic Symbol Table stores only indices into the Symbol Table, not the symbol data structures themselves, because those are stored only in the Symbol Table.

Use the otool -I command to list the entries of the dynamic symbol table together with their indices into the symbol table. For this reason, dynamic symbols are also called indirect symbols.

➜ swift-hello git:(master) otool -I swift-hello.out
swift-hello.out:
Indirect symbols for (__TEXT,__stubs) 9 entries
address            index
0x0000000100000eec    10 
0x0000000100000ef2    11 
0x0000000100000ef8    15 
0x0000000100000efe    16 
0x0000000100000f04    17 
0x0000000100000f0a    18 
0x0000000100000f10    19 
0x0000000100000f16    21 
0x0000000100000f1c    22 
Indirect symbols for (__DATA_CONST,__got) 5 entries
address            index
0x0000000100001000    12 
0x0000000100001008    13 
0x0000000100001010    14 
0x0000000100001018    20 
0x0000000100001020    23 
Indirect symbols for (__DATA,__la_symbol_ptr) 9 entries
address            index
0x0000000100002000    10 
0x0000000100002008    11 
0x0000000100002010    15 
0x0000000100002018    16 
0x0000000100002020    17 
0x0000000100002028    18 
0x0000000100002030    19 
0x0000000100002038    21 
0x0000000100002040    22 

__la_symbol_ptr

The otool output above contains the line Indirect symbols for (__DATA,__la_symbol_ptr) 9 entries. The __la_symbol_ptr section holds lazy symbol pointers, that is, pointers that are bound the first time they are used.

section_64 has reserved fields. If the section is __DATA,__la_symbol_ptr, the reserved1 field stores the index in the Dynamic Symbol Table at which the entries for __la_symbol_ptr begin.

struct section_64 { /* for 64-bit architectures */
  char    sectname[16]; /* name of this section */
  char    segname[16];  /* segment this section goes in */
  uint64_t  addr;   /* memory address of this section */
  uint64_t  size;   /* size in bytes of this section */
  uint32_t  offset;   /* file offset of this section */
  uint32_t  align;    /* section alignment (power of 2) */
  uint32_t  reloff;   /* file offset of relocation entries */
  uint32_t  nreloc;   /* number of relocation entries */
  uint32_t  flags;    /* flags (section type and attributes)*/
  uint32_t  reserved1;  /* reserved (for offset or index) */
  uint32_t  reserved2;  /* reserved (for count or sizeof) */
  uint32_t  reserved3;  /* reserved */
};

Finding the symbol for a __la_symbol_ptr entry

  1. If the section is __DATA,__la_symbol_ptr, read its reserved1 field to get the index at which the section's entries start in the Dynamic Symbol Table.
  2. Iterate over the pointers in __la_symbol_ptr. The pointer at position idx corresponds to index reserved1 + idx in the Dynamic Symbol Table.
  3. Through the Dynamic Symbol Table, find the index of the symbol in the Symbol Table.
  4. Through the Symbol Table, find the index of the symbol's name in the String Table (the n_strx field of nlist_64).
  5. Finally, read the symbol name from the String Table at that index.
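The chain of lookups in the steps above can be sketched in C. This is only an illustration under simplifying assumptions: the types demo_symbol and demo_image and the function lazy_pointer_name are hypothetical, the tables are passed in as plain arrays instead of being parsed from a Mach-O file, and special indirect entries (such as local or absolute markers) are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified symbol entry: only the string-table index matters here. */
struct demo_symbol {
    uint32_t n_strx;
};

/* Hypothetical container for the pieces of one binary that the lookup needs. */
struct demo_image {
    uint32_t reserved1;                /* from the section_64 of __la_symbol_ptr */
    const uint32_t *indirect_symtab;   /* Dynamic (indirect) Symbol Table */
    size_t indirect_count;
    const struct demo_symbol *symtab;  /* Symbol Table */
    size_t symtab_count;
    const char *strtab;                /* String Table */
    size_t strtab_size;
};

/* Resolve the name behind the idx-th pointer of __la_symbol_ptr.
 * Returns NULL if any index falls outside its table. */
static const char *lazy_pointer_name(const struct demo_image *img, size_t idx) {
    size_t indirect_idx = (size_t)img->reserved1 + idx;     /* steps 1 and 2 */
    if (indirect_idx >= img->indirect_count) return NULL;

    uint32_t sym_idx = img->indirect_symtab[indirect_idx];  /* step 3 */
    if (sym_idx >= img->symtab_count) return NULL;

    uint32_t strx = img->symtab[sym_idx].n_strx;            /* step 4 */
    if (strx >= img->strtab_size) return NULL;

    return img->strtab + strx;                              /* step 5 */
}
```

Every table access is bounds-checked because all three indices come from the file being inspected, which may be truncated or malformed.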

__non_la_symbol_ptr

__non_la_symbol_ptr works in a similar way, except that its pointers are not bound lazily.

While a binary is being loaded, the lazy and non-lazy symbol pointers for the symbols it uses are first located through the chain of tables described above, and each function symbol is then resolved to its implementation. The process of tying the two together is called symbol binding.

Symbol naming rules

For more detail, see the help for the nm command and the blog post "In-depth understanding of Symbol".

  1. C symbols: the function name prefixed with an underscore. For example, the symbol of the function myFunc is _myFunc.
  2. C++ supports namespaces, function overloading, and so on, so it uses symbol mangling to avoid conflicts. In __ZN11MyNameSpace7MyClass6myFuncEd, for example, _ZN marks the beginning, followed by the length and name of the namespace, the length and name of the class, and the length and name of the function. E marks the end of the name, and what follows is the argument types, such as i for int and d for double.
  3. Objective-C symbols look like _OBJC_CLASS_$_MyViewController, _OBJC_CLASS_$_MyObject, and so on.
  4. Swift symbol names follow rules somewhat similar to those of C++. The symbol for the function sayHello, _$s4main8sayHelloyyF, starts with _$s, followed by 4main (apparently the length and name of the binary or module; to be verified) and then 8sayHello, the length and name of the function. The meaning of the trailing yyF is not covered here.
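The length-prefixed layout of the C++ example above can be decoded with a few lines of C. The function decode_nested_name is a hypothetical helper written for this illustration; it handles only the plain form shown in the list (prefix, a run of length and name pairs, then E) and is in no way a full demangler.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Decode the nested-name part of a symbol such as
 * "__ZN11MyNameSpace7MyClass6myFuncEd" into "MyNameSpace::MyClass::myFunc".
 * Returns 0 on success and -1 on any input it does not understand. */
static int decode_nested_name(const char *symbol, char *out, size_t out_size) {
    const char *p = symbol;
    size_t used = 0;

    if (out_size == 0) return -1;
    out[0] = '\0';
    if (strncmp(p, "__ZN", 4) != 0) return -1;
    p += 4;

    while (*p != 'E') {
        /* Each component starts with its decimal length. */
        if (!isdigit((unsigned char)*p)) return -1;
        size_t len = 0;
        while (isdigit((unsigned char)*p)) {
            len = len * 10 + (size_t)(*p - '0');
            p++;
        }
        if (len == 0 || strlen(p) < len) return -1;

        /* Join components with "::", keeping room for the terminator. */
        size_t sep = used ? 2 : 0;
        if (used + sep + len + 1 > out_size) return -1;
        if (sep) {
            memcpy(out + used, "::", 2);
            used += 2;
        }
        memcpy(out + used, p, len);
        used += len;
        out[used] = '\0';
        p += len;
    }
    return used ? 0 : -1;
}
```

In practice the c++filt tool or the demangler in the toolchain should be used; the sketch only makes the naming rule concrete.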

The nm command

The nm command displays the symbol table of a binary. There are two versions of the command; the nm that is normally used is actually llvm-nm.

The symbol table displayed by nm is the table of nlist structures in each binary.

As of Xcode 8.0 the default nm(1) tool is llvm-nm(1). For the most part nm(1) and llvm-nm(1) have the same options; notable exceptions include -f, -s, and -L as described below. This document explains options common between the two commands as well as some historically relevant options supported by nm-classic(1). More help on options for llvm-nm(1) is provided when running it with the --help option.

nm displays the name list (symbol table of nlist structures) of each object file in the argument list. In some cases, as with an object that has had strip(1) with its -T option used on the object, that can be different than the dyld information. For that information use dyldinfo(1).

If an argument is an archive, a listing for each object file in the archive will be produced. File can be of the form libx.a(x.o), in which case only symbols from that member of the object file are listed. (The parentheses have to be quoted to get by the shell.) If no file is given, the symbols in a.out are listed.

Each symbol name is preceded by its value (blanks if undefined). Unless the -m option is specified, this value is followed by one of the following characters, representing the symbol type: U (undefined), A (absolute), T (text section symbol), D (data section symbol), B (bss section symbol), C (common symbol), - (for debugger symbol table entries; see -a below), S (symbol in a section other than those above), or I (indirect symbol). If the symbol is local (non-external), the symbol's type is instead represented by the corresponding lowercase letter. A lower case u in a dynamic shared library indicates a undefined reference to a private external in another module in the same library.

If the symbol is a Objective-C method, the symbol name is +-[Class_name(category_name) method:name:], where `+' is for class methods, `-' is for instance methods, and (category_name) is present only when the method is in a category.

Use the nm command to view the symbol information in a Mach-O file. For example:

➜ codes git:(master) nm main
0000000000000000 T _main
                 U _printf

Uppercase letters denote global (external) symbols and lowercase letters denote local symbols. The U here stands for undefined, that is, an external symbol that is not defined in this binary.
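How nm arrives at the letter and its case can be sketched from the n_type byte of nlist_64. The mask values below are the ones defined in <mach-o/nlist.h>, redeclared with a DEMO_ prefix so that the sketch compiles anywhere. The function nm_type_letter is hypothetical and simplified: the caller passes the letter for section symbols (T, D, B or S) instead of deriving it from the section, and common symbols (C) are not handled.

```c
#include <ctype.h>
#include <stdint.h>

/* Values as defined in <mach-o/nlist.h>, redeclared for a portable sketch. */
#define DEMO_N_STAB 0xe0  /* any of these bits set: debugger (stab) entry */
#define DEMO_N_TYPE 0x0e  /* mask for the type bits */
#define DEMO_N_EXT  0x01  /* external (global) symbol */
#define DEMO_N_UNDF 0x00
#define DEMO_N_ABS  0x02
#define DEMO_N_INDR 0x0a
#define DEMO_N_SECT 0x0e

/* Hypothetical helper: map n_type to the letter printed by nm. */
static char nm_type_letter(uint8_t n_type, char section_letter) {
    char letter;

    if (n_type & DEMO_N_STAB) return '-';

    switch (n_type & DEMO_N_TYPE) {
    case DEMO_N_UNDF: letter = 'U'; break;
    case DEMO_N_ABS:  letter = 'A'; break;
    case DEMO_N_INDR: letter = 'I'; break;
    case DEMO_N_SECT: letter = section_letter; break;
    default:          letter = '?'; break;
    }

    /* Local (non-external) symbols are shown in lowercase. */
    if (!(n_type & DEMO_N_EXT)) {
        letter = (char)tolower((unsigned char)letter);
    }
    return letter;
}
```

With these rules, an external symbol in the text section (n_type 0x0f) is printed as T, and the same symbol without the external bit (0x0e) is printed as t.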

For a binary built from Swift code, nm produces the following output:

➜ swift-hello git:(master) nm swift-hello.out
0000000100002050 b _$s4main4name33_9D2E62AE399B1FA0EBB6EEB3A775C624LLSSvp
0000000100000c40 t _$s4main8sayHelloyyF
                 U _$sSS19stringInterpolationSSs013DefaultStringB0V_tcfC
                 U _$sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC
                 U _$sSSN
0000000100000e70 t _$sSSWOh
                 U _$sSSs20TextOutputStreamablesWP
                 U _$sSSs23CustomStringConvertiblesWP
                 U _$ss26DefaultStringInterpolationV06appendC0yyxs06CustomB11ConvertibleRzs20TextOutputStreamableRzlF
                 U _$ss26DefaultStringInterpolationV13appendLiteralyySSF
                 U _$ss26DefaultStringInterpolationV15literalCapacity18interpolationCountABSi_SitcfC
0000000100000e90 t _$ss26DefaultStringInterpolationVWOh
                 U _$ss27_allocateUninitializedArrayySayxG_BptBwlF
                 U _$ss5print_9separator10terminatoryypd_S2StF
0000000100000eb0 t _$ss5print_9separator10terminatoryypd_S2StFfA0_
0000000100000ed0 t _$ss5print_9separator10terminatoryypd_S2StFfA1_
                 U _$sypN
0000000100000fa0 s ___swift_reflection_version
0000000100002048 d __dyld_private
0000000100000000 T __mh_execute_header
0000000100000bf0 T _main
                 U _swift_bridgeObjectRelease
                 U _swift_bridgeObjectRetain
                 U dyld_stub_binder

As you can see, the output is more complicated, but it still follows the naming rules described above.

nm -g shows only global (external) symbols.

Visibility of symbols

By default, Xcode leaves every symbol in a library visible, unless it is obviously private (like static functions or inlined ones, or in Swift ones declared internal or private). But there is a setting to reverse that: “Symbols Hidden by Default” (Clang flag -fvisibility=hidden).

Symbols in a project are visible by default. Pass -fvisibility=hidden to hide them by default. Clang attributes can also set the visibility of individual symbols, for example:

// The symbol is exported and can be linked against from outside this binary
__attribute__((visibility("default"))) void foo(void);
// The symbol is not exported, so code outside this binary cannot link against it
__attribute__((visibility("hidden"))) void bar(int x);

Weak and strong symbols

This section draws on the blog post "In-depth understanding of Symbol".

Copyright notice: the following is from an original article by CSDN blogger “Huang Wenchen”, licensed under CC 4.0 BY-SA. When reposting, please attach the link to the original source and this statement. Link: blog.csdn.net/Hello_Hwc/a…

Symbols are strong by default. A strong symbol must have an implementation, and two strong symbols must not share the same name.

A weak symbol is one whose implementation may be absent, so the symbol is allowed not to exist at run time. A strong symbol overrides a weak symbol of the same name.

Usage scenarios:

  1. Dependency injection: a default implementation can be provided as a weak symbol, and an external strong symbol can then be supplied to replace it.
  2. Weak linking: used for version compatibility. For example, if a feature of a dynamic library is only available on iOS 10 or later, the symbol is NULL when accessed on iOS 9. Weak linking handles this case:
// Weakly imported: the address of demo is NULL if the symbol is missing at runtime.
extern void demo(void) __attribute__((weak_import));
if (demo) {
    printf("Demo is implemented\n");
} else {
    printf("Demo is not implemented\n");
}
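The first scenario, a weak default that a strong definition can replace, might look like the sketch below. The function names log_prefix and log_prefix_length are hypothetical, and the example assumes a Clang or GCC toolchain that supports __attribute__((weak)).

```c
#include <stddef.h>
#include <string.h>

/* Weak default: used only when no other object file in the link
 * defines a strong log_prefix(). */
__attribute__((weak)) const char *log_prefix(void) {
    return "[default]";
}

/* Callers simply use the symbol; the linker decides which definition wins. */
static size_t log_prefix_length(void) {
    return strlen(log_prefix());
}
```

If another object file in the same link provides a normal (strong) definition of log_prefix, the linker picks that one without reporting a duplicate symbol, and callers need no changes.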

Symbolic breakpoints in Xcode

Symbolic breakpoints are useful in some debugging scenarios:

(lldb) breakpoint set -F "-[UIViewController viewDidAppear:]"
Breakpoint 2: where = UIKitCore`-[UIViewController viewDidAppear:], address = 0x00007fff46b03dab

Viewing symbols in LLDB

The image lookup command shows symbol information during debugging:

image lookup -t symbol_name
image lookup -s symbol_name

Symbol binding

Symbol binding is the operation of binding a symbol name to its actual address, for example binding a function name to the address of the function body.

Look at this Swift code:

// swift-hello.swift

private let name = "Chris"

func sayHello() {
  print("Hello \(name)")
}

sayHello()

Run swiftc swift-hello.swift -o swift-hello.out to generate the swift-hello.out executable, then view its binding information:

➜ swift-hello git:(master) xcrun dyldinfo -bind swift-hello.out
bind information:
segment      section          address        type    addend dylib            symbol
__DATA_CONST __got            0x100001020    pointer      0 libSystem        dyld_stub_binder
__DATA_CONST __got            0x100001000    pointer      0 libswiftCore     _$sSSN
__DATA_CONST __got            0x100001008    pointer      0 libswiftCore     _$sSSs20TextOutputStreamablesWP
__DATA_CONST __got            0x100001010    pointer      0 libswiftCore     _$sSSs23CustomStringConvertiblesWP
__DATA_CONST __got            0x100001018    pointer      0 libswiftCore     _$sypN

The -bind option lists the symbols that are bound non-lazily, that is, those in the __DATA_CONST,__got section. dyld_stub_binder is the function that performs the bind operation.

In fact, most external symbols are bound on first use, through __la_symbol_ptr. These can be listed with the -lazy_bind option.

➜ swift-hello git:(master) xcrun dyldinfo -lazy_bind swift-hello.out
lazy binding information (from lazy_bind part of dyld info):
segment section          address     index  dylib        symbol
__DATA  __la_symbol_ptr  0x100002000 0x0000 libswiftCore _$sSS19stringInterpolationSSs013DefaultStringB0V_tcfC
__DATA  __la_symbol_ptr  0x100002008 0x003C libswiftCore _$sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC
__DATA  __la_symbol_ptr  0x100002010 0x0089 libswiftCore _$ss26DefaultStringInterpolationV06appendC0yyxs06CustomB11ConvertibleRzs20TextOutputStreamableRzlF
__DATA  __la_symbol_ptr  0x100002018 0x00F2 libswiftCore _$ss26DefaultStringInterpolationV13appendLiteralyySSF
__DATA  __la_symbol_ptr  0x100002020 0x012E libswiftCore _$ss26DefaultStringInterpolationV15literalCapacity18interpolationCountABSi_SitcfC
__DATA  __la_symbol_ptr  0x100002028 0x0186 libswiftCore _$ss27_allocateUninitializedArrayySayxG_BptBwlF
__DATA  __la_symbol_ptr  0x100002030 0x01BC libswiftCore _$ss5print_9separator10terminatoryypd_S2StF
__DATA  __la_symbol_ptr  0x100002038 0x01EE libswiftCore _swift_bridgeObjectRelease
__DATA  __la_symbol_ptr  0x100002040 0x020F libswiftCore _swift_bridgeObjectRetain

As you can see, these symbols are all in the __DATA,__la_symbol_ptr section, the lazy bind section.

  1. Each pointer in __la_symbol_ptr initially points into __stub_helper.
  2. The first time the function is called, dyld_stub_binder binds the pointer to the implementation of the function.
  3. When assembly code calls the function, it jumps directly to the address held by the pointer in __DATA,__la_symbol_ptr.
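The three steps above can be imitated in plain C with a function pointer that starts out pointing at a resolver. This is only a model of the mechanism, not how dyld is implemented: real_add, add_stub_helper, add_lazy_ptr, call_add and bind_count are all hypothetical names.

```c
/* Counts how many times the "binder" ran. */
static int bind_count = 0;

/* The implementation that lives in the "library". */
static int real_add(int a, int b) {
    return a + b;
}

static int add_stub_helper(int a, int b);

/* Plays the role of one __la_symbol_ptr slot: it starts out
 * pointing at the stub helper. */
static int (*add_lazy_ptr)(int, int) = add_stub_helper;

/* Plays the role of __stub_helper plus dyld_stub_binder:
 * resolve the symbol, patch the slot, then forward the call. */
static int add_stub_helper(int a, int b) {
    bind_count++;
    add_lazy_ptr = real_add;
    return add_lazy_ptr(a, b);
}

/* Plays the role of a __stubs entry: always jumps through the slot. */
static int call_add(int a, int b) {
    return add_lazy_ptr(a, b);
}
```

The first call_add goes through the helper and patches the slot; every later call reaches real_add directly, so bind_count stays at 1. Overwriting such a slot with a different function is, in essence, what rebinding a symbol means.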

In fact, fishhook relies on the principle of symbol binding: it rebinds a symbol so that the specified function symbol resolves to a new implementation defined by the caller, which is how it hooks C functions.

Linking

Static linker ld

ld is the static linker: after the compiler has turned the source files into .o files, ld links them together.

Dynamic linker dyld

Dynamic libraries such as .dylib files are linked by dyld.

dlopen and dlsym are APIs available on iOS that load dynamic libraries and look up symbols at run time, but they are not allowed in apps released to the App Store.

#include <dlfcn.h>

// Load the dynamic library at runtime, then look up the symbol in it by name.
void *handle = dlopen("my.dylib", RTLD_LAZY);
NSString *(*myFunc)(void) = handle ? dlsym(handle, "myDyFunc") : NULL;
NSString *result = myFunc ? myFunc() : nil;

Use dyld to hook

As the blog post "In-depth understanding of Symbol" shows, dyld can also be used for hooking. This is disabled on iOS devices, so it can only be used on macOS and in simulators.

C functions can be hooked with fishhook, but dyld also has symbol hooking built in. Xcode analysis tools such as Malloc History are implemented by hooking functions like malloc and free through dyld. The example below hooks NSClassFromString with dyld. An advantage of a dyld hook is that, inside the hook, the hooked function still refers to the original implementation, so it can be called directly.

The sample code provided by the authors is as follows:

#define DYLD_INTERPOSE(_replacement,_replacee) \
    __attribute__((used)) static struct{\
        const void* replacement;\
        const void* replacee;\
    } _interpose_##_replacee \
    __attribute__ ((section ("__DATA,__interpose"))) = {\
    (const void*)(unsigned long)&_replacement,\
    (const void*)(unsigned long)&_replacee\
};

Class _Nullable hooked_NSClassFromString(NSString *aClassName){
    NSLog(@"hello world");
    return NSClassFromString(aClassName);
}
DYLD_INTERPOSE(hooked_NSClassFromString, NSClassFromString);

Static libraries versus dynamic libraries

A static library (*.a file) is not produced by linking; it is created directly with ar, which works much like the tar command.

  1. When ld links against a static library (.a file), a symbol is written to the final binary only if it is referenced. Otherwise, the symbol is discarded.
  2. When a binary uses a static library, the code and data associated with the symbols it uses are copied from the static library into the binary, which increases the binary size. The binary must be rebuilt whenever the static library is updated, but it can run on its own.
  3. When a binary uses a dynamic library, the build only checks that the symbols it uses are implemented in the dynamic library, and copies none of the library's code or data. The dynamic library is also needed at run time: when a function is called, its implementation is looked up in the dynamic library. When the dynamic library is updated, the binary does not need to be recompiled.

Suppose two executables, E and F, both need to reference a function foo. If E and F both use the static library S, each contains its own copy of the foo code after the build, so there are two copies of foo. If E and F both use the dynamic library D, neither contains the foo code; at run time both simply refer to the foo code in D.

Reference: About macOS & iOS symbol.

Symbolizing tools and commands

For stack symbolization, the key point is that the app binary, the UUID, and the dSYM must correspond to one another.

  1. The UUID is the unique identifier of the binary; it is used to find the matching dSYM and DWARF files.
  2. The dSYM contains the symbol information, including the DWARF data.
  3. The crash log records the original call stack.

Symbolization means using the addresses in the crash log's stack to look up the symbol information, that is, the function that was called, in the dSYM that corresponds to the given binary.

dwarfdump

The dwarfdump command can be used to obtain the UUID of a dSYM file or perform a simple query.

dwarfdump --uuid [dSYM file]
dwarfdump --lookup [address] -arch arm64 [dSYM file]

mdfind

Use mdfind to locate dSYM files on a Mac, for example:

mdfind "com_apple_xcode_dsym_uuids == E30FC309-DF7B-3C9F-8AC5-7F0F6047D65F"

symbolicatecrash

The crash file can be symbolized using the symbolicatecrash command.

First locate symbolicatecrash, then copy it somewhere convenient (or create a symbolic link to it).

find /Applications/Xcode.app -name symbolicatecrash -type f

The usage is as follows:

./symbolicatecrash my.crash myDSYM > symbolized.crash

If the error below occurs, add export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer to your .bashrc.

Error: "DEVELOPER_DIR" is not defined at ./symbolicatecrash line 69.

If the message No symbolic information found appears, it may be related to whether Bitcode is enabled. With Bitcode enabled, Xcode generates multiple dSYM files; with Bitcode disabled, only one is generated. For details, see the article on the impact of iOS Bitcode on dSYM debugging files.

Sometimes the dSYM for certain symbols has to be obtained separately from another source, for example when a frame looks like this:

0x1001f263c _hidden#1_ + 26172 (__hidden#18_ :33)

This is where the atos command might be needed.

atos

A single address can be symbolized with the atos command: xcrun atos -o [path to DWARF file] -arch arm64 -l [loadAddress] [instructionAddress].

xcrun atos -o app.dSYM/Contents/Resources/DWARF/MyApp -arch arm64 -l 0x1006b4000 0x0000000100d382a8

In fact, knowing only the offset of the symbol within its Mach-O is enough for symbolization. Pick a loadAddress, say 0x1, and compute instructionAddress = offset + loadAddress. Oddly enough, atos does not accept the offset directly, and loadAddress cannot be 0.

xcrun atos -o app.dSYM/Contents/Resources/DWARF/MyApp -arch arm64 -l 0x1 0xF781

Here 0xF781 is the instructionAddress computed from the offset when loadAddress is 0x1.
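The address arithmetic is simple enough to write down in C. The two helper names below are hypothetical; they only restate instructionAddress = offset + loadAddress in both directions.

```c
#include <stdint.h>

/* Offset of an instruction inside its Mach-O image. */
static uint64_t offset_in_image(uint64_t load_address,
                                uint64_t instruction_address) {
    return instruction_address - load_address;
}

/* Address to pass to atos for a known offset and a chosen load address. */
static uint64_t instruction_address_for(uint64_t load_address,
                                        uint64_t offset) {
    return load_address + offset;
}
```

With the values from the first atos example, offset_in_image(0x1006b4000, 0x100d382a8) is 0x6842a8. In the second example, a loadAddress of 0x1 and an offset of 0xF780 give the instructionAddress 0xF781.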

References

  • In-depth understanding of Symbol
  • Hiding Symbols in Static Libraries with Xcode or CMake
  • About macOS & iOS symbol
  • Delve into the TBD file by linking to libstdc++ in Xcode 10