Thread Call Stack capture and parsing

Note: jdy_ and BS_ are the source prefixes of the original author.

Get the Call Stack for any thread

To get the call stack of the current thread, you can use the existing API: [NSThread callStackSymbols].

However, there is no API for fetching the call stack of any thread, so you have to code it yourself.

1. Call stack

What does a thread’s call stack look like?

My understanding is that it contains the address the thread is currently executing, and that from this address you can trace back level by level to the entry point of the thread. This forms a chain in reverse: the thread entry calls a method, which makes nested calls level by level down to the current point of execution.

As shown in the figure, each level of method invocation corresponds to an activation record, also known as an activation frame. In other words, the call stack is composed of frames, which are called stack frames. Each stack frame corresponds to a function call: the blue part is the stack frame of the DrawSquare function, which calls the DrawLine function (the green part) during execution.

You can see that the stack frame consists of three parts: function parameters, the return address, and the variables local to the frame. First, the function arguments are pushed. The return address is pushed next; it is the address to return to once the current activation record has finished executing. Finally, there are the variables defined inside the function.

The Stack Pointer marks the top of the current stack. Since the stack grows downward on most operating systems, it actually holds the lowest address in use on the stack. As described above, the location the Frame Pointer points to stores the previous value of the Stack Pointer, which corresponds to the return address.

On most operating systems, each stack frame also stores the Frame Pointer of the previous stack frame. So, knowing the Stack Pointer and Frame Pointer of the current frame, we can recover the Stack Pointer and Frame Pointer of the previous frame, and repeat this recursively down to the frame at the bottom of the stack. The frames form a chain.

Obviously when a function call ends, its stack frame no longer exists.

Therefore, the call stack is really an abstraction built on top of the stack that represents the calling relationships between methods. Generally speaking, the call stack can be reconstructed from the stack.

So, once we have a stack frame, we can walk back through the callers by following the return addresses.
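The chain described above can be sketched in plain C. This is a minimal illustration, not the original author's code: FrameRecord and walk_frame_chain are hypothetical names, and the frames are ordinary structs built by hand rather than memory read from the stack of a real thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record: a pointer to the caller's record plus the
 * return address into the caller. */
typedef struct FrameRecord {
    const struct FrameRecord *previous; /* caller's record, NULL at the thread entry */
    uintptr_t return_address;           /* where execution resumes in the caller */
} FrameRecord;

/* Follow the chain from the newest frame toward the thread entry and store
 * at most `limit` return addresses in `out`. Returns the number stored. */
int walk_frame_chain(const FrameRecord *newest, uintptr_t *out, int limit) {
    assert(out != NULL);
    int count = 0;
    for (const FrameRecord *f = newest; f != NULL && count < limit; f = f->previous) {
        out[count++] = f->return_address;
    }
    return count;
}
```

Given three hand-built records linked entry, middle, newest, walk_frame_chain(&newest, out, 8) stores the three return addresses newest-first, which is the order a backtrace is normally printed in.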

2. Instruction pointer and base address pointer

We have identified two things we need: ① the instruction currently being executed; ② the structure of the current stack frame.

Using x86 as an example, registers are used as follows:

SP/ESP/RSP: Stack pointer for top address of the stack.
BP/EBP/RBP: Stack base pointer for holding the address of the current stack frame.
IP/EIP/RIP: Instruction pointer. Holds the program counter, the current instruction address.

As you can see, we can get the current instruction address from the instruction pointer and the current stack frame address from the stack base address pointer.

So, how do we get the relevant registers?

3. Thread execution status

When a thread is suspended, its context must be saved so that execution can resume later, for example which instruction it was executing.

So a structure is needed to store the state of the thread at runtime. After some searching, I found the following:

The function thread_get_state returns the execution state (e.g. the machine registers) of target_thread as specified by flavor.

Function - Return the execution state for a thread.

SYNOPSIS

kern_return_t thread_get_state(
    thread_act_t            target_act,
    thread_state_flavor_t   flavor,
    thread_state_t          old_state,
    mach_msg_type_number_t  *old_stateCnt);

/*
 * THREAD_STATE_FLAVOR_LIST 0
 * these are the supported flavors. These enumerated values were not found and may be custom for the author.
 */
#define x86_THREAD_STATE32      1
#define x86_FLOAT_STATE32       2
#define x86_EXCEPTION_STATE32   3
#define x86_THREAD_STATE64      4
#define x86_FLOAT_STATE64       5
#define x86_EXCEPTION_STATE64   6
#define x86_THREAD_STATE        7
#define x86_FLOAT_STATE         8
#define x86_EXCEPTION_STATE     9
#define x86_DEBUG_STATE32       10
#define x86_DEBUG_STATE64       11
#define x86_DEBUG_STATE         12
#define THREAD_STATE_NONE       13
/* 14 and 15 are used for the internal x86_SAVED_STATE flavours */
#define x86_AVX_STATE32         16
#define x86_AVX_STATE64         17
#define x86_AVX_STATE           18

So we can use this API to get the desired register information with related parameters:

bool jdy_fillThreadStateIntoMachineContext(thread_t thread, _STRUCT_MCONTEXT * machineContext) {
    mach_msg_type_number_t state_count = x86_THREAD_STATE64_COUNT;
    kern_return_t kr = thread_get_state(thread, x86_THREAD_STATE64, (thread_state_t)&machineContext->__ss, &state_count);
    return (kr == KERN_SUCCESS);
}

The value of state_count differs between architectures, so the macro x86_THREAD_STATE64_COUNT is used here. The code also introduces a structure called _STRUCT_MCONTEXT.

4. Registers for different platforms

x86_64, such as the iPhone 6 simulator:

_STRUCT_MCONTEXT64
{
    _STRUCT_X86_EXCEPTION_STATE64   __es;
    _STRUCT_X86_THREAD_STATE64  __ss;
    _STRUCT_X86_FLOAT_STATE64   __fs;
};

_STRUCT_X86_THREAD_STATE64
{
    __uint64_t  __rax;
    __uint64_t  __rbx;
    __uint64_t  __rcx;
    __uint64_t  __rdx;
    __uint64_t  __rdi;
    __uint64_t  __rsi;
    __uint64_t  __rbp;
    __uint64_t  __rsp;
    __uint64_t  __r8;
    __uint64_t  __r9;
    __uint64_t  __r10;
    __uint64_t  __r11;
    __uint64_t  __r12;
    __uint64_t  __r13;
    __uint64_t  __r14;
    __uint64_t  __r15;
    __uint64_t  __rip;
    __uint64_t  __rflags;
    __uint64_t  __cs;
    __uint64_t  __fs;
    __uint64_t  __gs;
};

x86_32, such as the iPhone 4s simulator:

_STRUCT_MCONTEXT32
{
    _STRUCT_X86_EXCEPTION_STATE32   __es;
    _STRUCT_X86_THREAD_STATE32  __ss;
    _STRUCT_X86_FLOAT_STATE32   __fs;
};

_STRUCT_X86_THREAD_STATE32
{
    unsigned int    __eax;
    unsigned int    __ebx;
    unsigned int    __ecx;
    unsigned int    __edx;
    unsigned int    __edi;
    unsigned int    __esi;
    unsigned int    __ebp;
    unsigned int    __esp;
    unsigned int    __ss;
    unsigned int    __eflags;
    unsigned int    __eip;
    unsigned int    __cs;
    unsigned int    __ds;
    unsigned int    __es;
    unsigned int    __fs;
    unsigned int    __gs;
};

ARM64, such as the iPhone 5s:

_STRUCT_MCONTEXT64
{
	_STRUCT_ARM_EXCEPTION_STATE64	__es;
	_STRUCT_ARM_THREAD_STATE64	__ss;
	_STRUCT_ARM_NEON_STATE64	__ns;
};

_STRUCT_ARM_THREAD_STATE64
{
	__uint64_t    __x[29];	/* General purpose registers x0-x28 */
	void*         __opaque_fp;	/* Frame pointer x29 */
	void*         __opaque_lr;	/* Link register x30 */
	void*         __opaque_sp;	/* Stack pointer x31 */
	void*         __opaque_pc;	/* Program counter */
	__uint32_t    __cpsr;	/* Current program status register */
	__uint32_t    __opaque_flags;	/* Flags describing structure format */
};

ARMv7/v6, such as the iPhone 4s:

_STRUCT_MCONTEXT32
{
	_STRUCT_ARM_EXCEPTION_STATE	__es;
	_STRUCT_ARM_THREAD_STATE	__ss;
	_STRUCT_ARM_VFP_STATE		__fs;
};

_STRUCT_ARM_THREAD_STATE
{
	__uint32_t	__r[13];	/* General purpose register r0-r12 */
	__uint32_t	__sp;		/* Stack pointer r13 */
	__uint32_t	__lr;		/* Link register r14 */
	__uint32_t	__pc;		/* Program counter r15 */
	__uint32_t	__cpsr;		/* Current program status register */
};

In ARM64:

The frame pointer register (x29) must always address a valid frame record, although some functions (such as leaf functions or tail calls) may elect not to create an entry in this list. As a result, stack traces will always be meaningful, even without debug information.

In ARMv7/v6:

The function calling conventions used in the ARMv6 environment are the same as those used in the Procedure Call Standard for the ARM Architecture (release 1.07), with the following exceptions:

The stack is 4-byte aligned at the point of function calls. Large data types (larger than 4 bytes) are 4-byte aligned. Register R7 is used as a frame pointer. Register R9 has special usage.

Therefore, by understanding the register structure of the above different platforms, we can write a more general traceback function.
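As a summary of the structures above, the fields a backtrace needs on each platform can be collected in a small lookup table. This is only a sketch: BacktraceRegisters and backtrace_registers_for are hypothetical names, the architecture strings are illustrative, and real code would pick the fields at compile time with preprocessor conditionals rather than at run time. The field names are the ones listed in the thread-state structures above, with R7 as the ARMv7 frame pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Thread-state field names needed for a backtrace on one architecture. */
typedef struct BacktraceRegisters {
    const char *program_counter;
    const char *frame_pointer;
    const char *link_register; /* NULL on x86, where the return address is on the stack */
} BacktraceRegisters;

/* Look up the field names for an architecture string (hypothetical helper).
 * Unknown architectures yield a record whose members are all NULL. */
BacktraceRegisters backtrace_registers_for(const char *arch) {
    static const struct { const char *arch; BacktraceRegisters regs; } table[] = {
        { "x86_64", { "__rip", "__rbp", NULL } },
        { "i386",   { "__eip", "__ebp", NULL } },
        { "arm64",  { "__opaque_pc", "__opaque_fp", "__opaque_lr" } },
        { "armv7",  { "__pc", "__r[7]", "__lr" } },
    };
    const BacktraceRegisters unknown = { NULL, NULL, NULL };
    assert(arch != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].arch, arch) == 0) return table[i].regs;
    }
    return unknown;
}
```

The link register is only meaningful on ARM, which is why a general traceback function has to treat it as optional.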

5. Algorithm implementation

/**
 * The layout of the stack frame can be referenced:
 * https://en.wikipedia.org/wiki/Call_stack
 * http://www.cs.cornell.edu/courses/cs412/2008sp/lectures/lec20.pdf
 * http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
 */
typedef struct JDYStackFrame {
    const struct JDYStackFrame* const previous; // previous stack frame
    uintptr_t returnAddress;                    // return address: where execution resumes in the caller
} JDYStackFrame;

// Assumed signature: the name jdy_backtraceThread is illustrative, and the
// thread and limit parameters are inferred from how the body uses them.
int jdy_backtraceThread(thread_t thread, uintptr_t *backtraceBuffer, int limit) {
    if (limit <= 0) return 0;

    _STRUCT_MCONTEXT mcontext; // Get context
    if (!jdy_fillThreadStateIntoMachineContext(thread, &mcontext)) {
        return 0;
    }

    int i = 0;
    uintptr_t pc = jdy_programCounterOfMachineContext(&mcontext);
    backtraceBuffer[i++] = pc;
    if (i == limit) return i;

    uintptr_t lr = jdy_linkRegisterOfMachineContext(&mcontext);
    if (lr != 0) {
        backtraceBuffer[i++] = lr;
        if (i == limit) return i;
    }

    JDYStackFrame frame = {0};
    uintptr_t fp = jdy_framePointerOfMachineContext(&mcontext);
    if (fp == 0 || jdy_copyMemory((void *)fp, &frame, sizeof(frame)) != KERN_SUCCESS) {
        return i;
    }

    while (i < limit) {
        backtraceBuffer[i++] = frame.returnAddress;
        if (frame.returnAddress == 0
            || frame.previous == NULL
            || jdy_copyMemory((void *)frame.previous, &frame, sizeof(frame)) != KERN_SUCCESS) {
            break;
        }
    }
    return i;
}

2. Why the traditional approaches fail

What about using dispatch_async or performSelectorOnMainThread together with callStackSymbols to hop back to the main thread and capture its call stack? Is this feasible?

A thread first has to be running, and then (if necessary) it starts a runloop to stay alive. As we know, a runloop is essentially an infinite loop that repeatedly calls a set of functions to check whether source0, source1, timers, dispatch queues, and so on have anything to process.

All UI-related events are source0, so __CFRunLoopDoSources0 is executed, and the runloop goes to sleep once the events have been processed.

If we use dispatch_async, it wakes the runloop and the block is handled, but by then __CFRunLoopDoSources0 has finished executing, so the call stack of viewDidLoad can no longer be captured.

The performSelector family of methods also relies on the runloop underneath: it merely submits a task to the runloop, which still waits for the existing work to complete, so it cannot capture a real-time call stack.

In short, any scheme that involves runloop, or waits for viewDidLoad to complete, is unlikely to succeed.

3. Mach threads

Recall from the introduction to the stack that the Stack Pointer and Frame Pointer are enough to fully determine the call stack. Is there a way to get the Stack Pointer and Frame Pointer for all threads?

The answer is yes. First, the task_threads method fetches all threads. Note that these are Mach threads, the lowest level of thread; their relationship to NSThread is explained later.

For each thread, thread_get_state returns all of its state, which is written into a structure of type _STRUCT_MCONTEXT. Two parameters of this method vary with the CPU architecture, so the macros BS_THREAD_STATE_COUNT and BS_THREAD_STATE are defined to hide the differences between CPUs.

The _STRUCT_MCONTEXT structure stores the Stack Pointer of the thread and the Frame Pointer of the topmost stack frame, from which the call stack of the entire thread can be retrieved.

In the project, the call stack is stored in the backtraceBuffer array, where each pointer corresponds to a stack frame, each stack frame corresponds to a function call, and each function has its own symbol name.

The next task is to get the symbol name of each function call from the address recorded for its stack frame.

4. Related APIs and data structures

/*
 * Structure filled in by dladdr().
 */
typedef struct dl_info {
        const char      *dli_fname;     /* Pathname of shared object */
        void            *dli_fbase;     /* Base address of shared object */
        const char      *dli_sname;     /* Name of nearest symbol */
        void            *dli_saddr;     /* Address of nearest symbol */
} Dl_info;


extern int dladdr(const void *, Dl_info *);

DESCRIPTION
     These routines provide additional introspection of dyld beyond that provided by dlopen() and dladdr()

     _dyld_image_count() returns the current number of images mapped in by dyld. Note that using this count
     to iterate all images is not thread safe, because another thread may be adding or removing images during
     the iteration.

     _dyld_get_image_header() returns a pointer to the mach header of the image indexed by image_index.  If
     image_index is out of range, NULL is returned.

     _dyld_get_image_vmaddr_slide() returns the virtual memory address slide amount of the image indexed by
     image_index. If image_index is out of range zero is returned.

     _dyld_get_image_name() returns the name of the image indexed by image_index. The C-string continues to
     be owned by dyld and should not be deleted.  If image_index is out of range NULL is returned.

In order to determine whether the resolution is successful, the interface design is as follows:

bool jdy_symbolicateAddress(const uintptr_t addr, Dl_info *info)

Dl_info is used to populate the results of parsing.

5. Algorithm outline

The following describes the general direction only and does not cover every detail, such as the ASLR-based offset (slide):

// ASLR-based offset: https://en.wikipedia.org/wiki/Address_space_layout_randomization
/**
 * When the dynamic linker loads an image,
 * the image must be mapped into the virtual address space of the process at an unoccupied address.
 * The dynamic linker accomplishes this by adding a value "the virtual memory slide amount" to the base address of the image.
 */

① Find the target image that contains the address

extern bool _dyld_image_containing_address(const void * address) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_3,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);

This API is deprecated on macOS and marked as not available on iPhone, so we have to do the check ourselves.

A segment defines a range of bytes in a Mach-O file and the addresses and memory protection attributes at which those bytes are mapped into virtual memory when the dynamic linker loads the application. As such, segments are always virtual memory page aligned. A segment contains zero or more sections.

By traversing each segment, determine whether the destination address falls within the range contained in the segment:

/*
 * The segment load command indicates that a part of this file is to be
 * mapped into the task's address space. The size of this segment in memory,
 * vmsize, may be equal to or larger than the amount to map from this file,
 * filesize. The file is mapped starting at fileoff to the beginning of
 * the segment in memory, vmaddr. The rest of the memory of the segment,
 * if any, is allocated zero fill on demand. The segment's maximum virtual
 * memory protection and initial virtual memory protection are specified
 * by the maxprot and initprot fields. If the segment has sections then the
 * section structures directly follow the segment command and their size is
 * reflected in cmdsize.
 */
struct segment_command { /* for 32-bit architectures */
    uint32_t    cmd;          /* LC_SEGMENT */
    uint32_t    cmdsize;      /* includes sizeof section structs */
    char        segname[16];  /* segment name */
    uint32_t    vmaddr;       /* memory address of this segment */
    uint32_t    vmsize;       /* memory size of this segment */
    uint32_t    fileoff;      /* file offset of this segment */
    uint32_t    filesize;     /* amount to map from the file */
    vm_prot_t   maxprot;      /* maximum VM protection */
    vm_prot_t   initprot;     /* initial VM protection */
    uint32_t    nsects;       /* number of sections in segment */
    uint32_t    flags;        /* flags */
};

/**
 * @brief check whether a segment_command contains the address addr.
 */
bool jdy_segmentContainsAddress(const struct load_command *cmdPtr, const uintptr_t addr) {
    // 64-bit images use LC_SEGMENT_64 and struct segment_command_64 instead.
    if (cmdPtr->cmd == LC_SEGMENT) {
        const struct segment_command *segPtr = (const struct segment_command *)cmdPtr;
        if (addr >= segPtr->vmaddr && addr < (segPtr->vmaddr + segPtr->vmsize)) {
            return true;
        }
    }
    return false; // not a segment command, or addr is outside the segment
}

This way, we can find the image file containing the destination address.
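Putting the slide and the segment check together, the search for the owning image can be sketched as follows. This is an illustration under simplifying assumptions, not the original implementation: SegmentRange is a hypothetical stand-in for the vmaddr and vmsize fields of segment_command, and the segments are passed in as an array instead of being read from the load commands of a real image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the vmaddr/vmsize pair of a
 * segment_command. */
typedef struct SegmentRange {
    uintptr_t vmaddr; /* address recorded in the file, before the slide */
    uintptr_t vmsize; /* size of the segment in memory */
} SegmentRange;

/* Returns true when the runtime address `addr` falls inside one of the
 * segments of an image that was loaded with the given slide. */
bool image_contains_address(const SegmentRange *segments, size_t count,
                            intptr_t slide, uintptr_t addr) {
    assert(segments != NULL || count == 0);
    /* Remove the slide first: segment addresses are the unslid ones. */
    uintptr_t unslid = addr - (uintptr_t)slide;
    for (size_t i = 0; i < count; i++) {
        if (unslid >= segments[i].vmaddr &&
            unslid - segments[i].vmaddr < segments[i].vmsize) {
            return true;
        }
    }
    return false;
}
```

Running this check against every loaded image, with each image's own slide, identifies the image the address belongs to.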

② Locate the symbol table of the target image

Symbol collection and symbol table creation run through the compile and link phases, so we will not go into them here. It is enough to know that besides the code segment __TEXT and the data segment __DATA, there is also a __LINKEDIT segment that contains the symbol table:

The __LINKEDIT segment contains raw data used by the dynamic linker, such as symbol, string, and relocation table entries.

So now we need to locate the __LINKEDIT segment. Again from the official Apple documentation:

Segments and sections are normally accessed by name. Segments, by convention, are named using all uppercase letters preceded by two underscores (for example, __TEXT); sections should be named using all lowercase letters preceded by two underscores (for example, __text). This naming convention is standard, although not required for the tools to operate correctly.

We compare whether the segment name is the same as __LINKEDIT by iterating through each segment:

/usr/include/mach-o/loader.h

#define SEG_LINKEDIT   "__LINKEDIT"

Next, the symbol table:

From The Mac Hacker’s Handbook: The LC_SYMTAB load command describes where to find the string and symbol tables within the __LINKEDIT segment. The offsets given are file offsets, so you subtract the file offset of the __LINKEDIT segment to obtain the virtual memory offset of the string and symbol tables. Adding the virtual memory offset to the virtual-memory address where the __LINKEDIT segment is loaded will give you the in-memory location of the string and symbol tables.

That is, we need to combine the segment_command of __LINKEDIT (see the structure description above) with the LC_SYMTAB load_command (see the structure description below) to locate the symbol table:

/*
 * The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
 * "stab" style symbol table information as described in the header files
 * <nlist.h> and <stab.h>.
 */
struct symtab_command {
	uint32_t	cmd;		/* LC_SYMTAB */
	uint32_t	cmdsize;	/* sizeof(struct symtab_command) */
	uint32_t	symoff;		/* symbol table offset */
	uint32_t	nsyms;		/* number of symbol table entries */
	uint32_t	stroff;		/* string table offset */
	uint32_t	strsize;	/* string table size in bytes */
};

As the reference above describes, the offsets in LC_SYMTAB and __LINKEDIT are file offsets. To get the in-memory addresses of the symbol table and string table, we first subtract the fileoff of __LINKEDIT from the symoff and stroff of LC_SYMTAB to get their offsets within the segment, and then add the vmaddr of __LINKEDIT to get the virtual addresses. Finally, to get the actual memory address, we add the ASLR-based offset.
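The address arithmetic in this step is small enough to write out. The helper below is a sketch with a hypothetical name, linkedit_address, and plain integer parameters; in real code the inputs would come from the __LINKEDIT segment_command (vmaddr, fileoff), the symtab_command (symoff or stroff), and _dyld_get_image_vmaddr_slide().

```c
#include <assert.h>
#include <stdint.h>

/* In-memory address of data that sits at `file_offset` in the Mach-O file
 * and belongs to the __LINKEDIT segment (hypothetical helper). */
uintptr_t linkedit_address(uintptr_t linkedit_vmaddr, uintptr_t linkedit_fileoff,
                           uintptr_t file_offset, intptr_t slide) {
    /* The table must lie at or after the start of __LINKEDIT in the file. */
    assert(file_offset >= linkedit_fileoff);
    /* Step 1: file offset -> offset inside the __LINKEDIT segment. */
    uintptr_t offset_in_segment = file_offset - linkedit_fileoff;
    /* Step 2: add the segment's virtual address, then the ASLR slide. */
    return linkedit_vmaddr + offset_in_segment + (uintptr_t)slide;
}
```

For example, with vmaddr 0x8000, fileoff 0x6000, symoff 0x6100, and a slide of 0x10000, the symbol table is at 0x8000 + 0x100 + 0x10000 = 0x18100. The same call with stroff gives the string table.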

With the symbol table located, here is the code:

/**
 * @brief Find the best-matching symbol for the address in the given symbol table.
 *        The vmaddr slide must already have been subtracted from addr.
 */
const JDY_SymbolTableEntry *jdy_findBestMatchSymbolForAddress(uintptr_t addr,
                                                              JDY_SymbolTableEntry *symbolTable,
                                                              uint32_t nsyms) {
    // 1. addr >= symbol.value: addr is the address of an instruction inside a function,
    //    so it must be greater than or equal to the function's entry address,
    //    which is the value of the corresponding symbol.
    // 2. symbol.value is nearest to addr: the function entry address closest to
    //    the instruction address addr is the most accurate match.
    const JDY_SymbolTableEntry *nearestSymbol = NULL;
    uintptr_t currentDistance = UINT32_MAX;

    for (uint32_t symIndex = 0; symIndex < nsyms; symIndex++) {
        uintptr_t symbolValue = symbolTable[symIndex].n_value;
        if (symbolValue > 0) {
            uintptr_t symbolDistance = addr - symbolValue;
            if (symbolValue <= addr && symbolDistance <= currentDistance) {
                currentDistance = symbolDistance;
                nearestSymbol = symbolTable + symIndex;
            }
        }
    }
    return nearestSymbol;
}

/*
 * This is the symbol table entry structure for 64-bit architectures.
 */
struct nlist_64 {
    union {
        uint32_t  n_strx;   /* index into the string table */
    } n_un;
    uint8_t   n_type;       /* type flag, see below */
    uint8_t   n_sect;       /* section number or NO_SECT */
    uint16_t  n_desc;       /* see <mach-o/stab.h> */
    uint64_t  n_value;      /* value of this symbol (or stab offset) */
};

Once the matching nlist structure is found, we can locate the corresponding symbol name in the string table via n_un.n_strx.
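The string table is a block of NUL-terminated names, and n_strx is a byte offset into it, so the lookup is a bounds check plus pointer arithmetic. The sketch below uses a hypothetical helper, symbol_name_at; returning NULL for index 0 (the entry reserved for the empty name) is a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the name stored at byte offset `n_strx` in the string table, or
 * NULL when the index is 0 (no name) or lies outside the table. */
const char *symbol_name_at(const char *string_table, uint32_t string_table_size,
                           uint32_t n_strx) {
    assert(string_table != NULL);
    if (n_strx == 0 || n_strx >= string_table_size) {
        return NULL;
    }
    return string_table + n_strx;
}
```

With a table laid out as "\0_main\0_helper", index 1 yields "_main" and index 7 yields "_helper". C function names appear with a leading underscore in Mach-O symbol tables, so a caller may want to skip the first character before printing.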

Demystifying NSThread

We can now get all threads and their call stacks, but what if we want the stack of one particular thread? How do we connect an NSThread object to a kernel thread?

The GNUstep-base source code includes an implementation of the Foundation library. There is no guarantee that NSThread in the Apple frameworks uses the same implementation, but NSThread.m still yields a lot of useful information.

Many articles mention that NSThread is a wrapper around pthread, which raises two questions:

What is a pthread?
How does NSThread wrap pthread?

The letter P in pthread is short for POSIX, which stands for Portable Operating System Interface.

Each operating system has its own threading model and provides different APIs for manipulating threads, which makes cross-platform thread management difficult. POSIX aims to provide an abstract pthread and a set of APIs that are implemented differently on each operating system but behave the same.

Mach provides methods such as thread_get_state and task_threads that operate on kernel threads. Each kernel thread is uniquely identified by an ID of type thread_t. A pthread is uniquely identified by a value of type pthread_t.

Converting between kernel threads and pthreads (thread_t and pthread_t) is easy, because pthreads are designed as an abstraction over kernel threads (on Darwin, pthread_mach_thread_np and pthread_from_mach_thread_np do this).

It is not accurate to say that NSThread wraps pthread: pthread is used in only a few places inside NSThread. A simplified version of the start method of NSThread looks like this:

- (void) start
{
     pthread_attr_t attr;
     pthread_t thr;
     errno = 0;
     pthread_attr_init(&attr);
     if (pthread_create(&thr, &attr, nsthreadLauncher, self)) {
         // Error Handling
     }
}

NSThread does not even store the pthread_t of the newly created pthread.

The other place where pthread is used is thread exit: when an NSThread exits, it calls pthread_exit(). Apart from that, pthread is barely visible.

In fact, every method in the performSelector family ends up in the following general-purpose method:

- (void)performSelector:(SEL)aSelector onThread:(NSThread *)thr withObject:(nullable id)arg waitUntilDone:(BOOL)wait modes:(nullable NSArray<NSString *> *)array API_AVAILABLE(macos(10.5), ios(2.0), watchos(2.0), tvos(9.0));

This is just a wrapper: once it has the runloop of the thread, it calls the following NSRunLoop method:

- (void) performSelector:(SEL)aSelector
                  target:(id)target
                argument:(id)argument
                   order:(NSUInteger)order
                   modes:(NSArray*)modes;

This information is wrapped in a performer object and added to the runloop to await execution.

NSThread to kernel thread

Since the system provides no conversion method, and NSThread does not retain the pthread_t of its thread, conventional means cannot meet the requirement.

One idea is to use performSelector to run code on the given thread and record its thread_t there. That code must not run too late: if it runs only when the call stack is being printed, it destroys the very call stack we want. The best moment is when the thread is created. As mentioned above, pthread_create is used to create the thread, and its callback, nsthreadLauncher, looks like this:

static void *nsthreadLauncher(void* thread)
{
    NSThread *t = (NSThread*)thread;
    [nc postNotificationName: NSThreadDidStartNotification object:t userInfo: nil];
    [t _setName: [t name]];
    [t main];
    [NSThread exit];
    return NULL;
}

Surprisingly, the system posts a notification here. The notification name is not public, but by observing all notifications you can learn that it is @"_NSThreadDidStartNotification", so we can listen for this notification and call performSelector.

Normally an NSThread object is created with initWithTarget:selector:object:. The selector is executed inside the main method, and when main finishes, the thread exits. To keep the thread alive, you need to start a runloop in the selector you pass in, as described in my article on Runloop and thread survival.

As you can see, this approach is not realistic: as explained earlier, performSelector relies on a running runloop, and the runloop is not started until main runs.

Looking at the problem again, what we need is a link between the NSThread object and the kernel thread. That means finding some unique value of the NSThread object that the kernel thread also has.

Looking at NSThread, the only unique values are the address of the object, the sequence number of the object, and the thread name:

<NSThread: 0x144d095e0>{number = 1, name = main}

The address is allocated on the heap, so it is of no use here. How the sequence number is computed is unclear, so only the name is left. Fortunately, pthread also provides a function, pthread_getname_np, to get the name of a thread, and the two names agree. Interested readers can read the implementation of the setName method, which calls the interface provided by pthread.

The np suffix means non-POSIX (non-portable), so it cannot be relied on across platforms.

The solution is then simple: for the NSThread passed in, change its name to some random number (I chose a timestamp), then iterate over the pthreads and look for one with a matching name. Once the search is complete, restore the original name.
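The matching step can be illustrated with a simulation. In the sketch below, ThreadRecord and find_kernel_thread_by_name are hypothetical: the array stands in for the threads returned by task_threads, and each name field stands in for what pthread_getname_np would report for that thread. No real threads are touched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulated view of one thread: its kernel identifier and its pthread name. */
typedef struct ThreadRecord {
    unsigned int kernel_thread; /* stands in for thread_t */
    char name[64];              /* stands in for the pthread_getname_np result */
} ThreadRecord;

/* Scan all threads for the temporary tag that was set as the NSThread name.
 * Returns the matching kernel thread, or 0 when no thread carries the tag. */
unsigned int find_kernel_thread_by_name(const ThreadRecord *threads, size_t count,
                                        const char *tag) {
    assert(tag != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(threads[i].name, tag) == 0) {
            return threads[i].kernel_thread;
        }
    }
    return 0;
}
```

The tag only has to be unique among the names of the live threads for the duration of the scan, which is why a timestamp is good enough.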

Convert the main thread to the kernel thread

However, pthread_getname_np cannot read the name of the main thread.

Fortunately, we can solve the problem in a roundabout way: get the thread_t of the main thread in advance and compare against it.

This requires us to run code on the main thread to obtain its thread_t, and the best place to do that is clearly the load method:

static mach_port_t main_thread_id;
+ (void)load {
    main_thread_id = mach_thread_self();
}

9. References

Capture and Parse of thread Call Stack in iOS
Capture and Parse of thread Call Stack in iOS (2)
Get those things on the call stack of any thread
BSBacktraceLogger