Building a full-featured client-side APM monitoring system

APM, short for Application Performance Monitoring, monitors and manages the performance and availability of software applications. Application performance management is critical to keeping an application running continuously and stably. This article looks at it from the perspective of iOS app performance management: how to monitor accurately, how to report data, and other technical points.

App performance is one of the most important factors affecting user experience. Performance issues include crashes, network request errors or timeouts, slow UI responses, main thread lags, high CPU and memory usage, high power consumption, and so on. Most of these problems come from developers misusing thread locks or system functions, ignoring programming conventions, choosing poor data structures, and the like. The key to solving a problem is to find and locate it as early as possible.

This article focuses on why APM is needed and how to collect the data. Once APM data has been collected, a data reporting mechanism uploads it to the server according to certain policies. The server consumes this information and produces reports. Read this together with the companion article, which summarizes how to build a flexible, configurable, and powerful data reporting component.

1. Lag monitoring

A lag is the main thread failing to respond to user interaction. It directly affects the user's experience, so lag monitoring is an important part of APM.

FPS (frames per second) is typically 60 on iPhone and 120 on some iPad models, and it serves as a reference parameter for lag monitoring. Why only a reference? Because it is not accurate. Let's start with how FPS is obtained. CADisplayLink is a system timer that fires at the same rate as the screen refresh. It is created with [CADisplayLink displayLinkWithTarget:self selector:@selector(###:)]. As for why it is inaccurate, take a look at the sample code below.

_displayLink = [CADisplayLink displayLinkWithTarget:self selector:@selector(p_displayLinkTick:)];
[_displayLink setPaused:YES];
[_displayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSRunLoopCommonModes];

As the code shows, the CADisplayLink object is added to a mode of the specified RunLoop, so it only reflects the CPU side. The lag a user experiences, however, is the result of the entire rendering pipeline: CPU + GPU. Read on.
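To make the FPS measurement concrete, here is a minimal sketch in portable C of what a tick callback such as p_displayLinkTick: typically does: count ticks and divide by the elapsed time once at least one second has passed. The type fps_counter and the function fps_counter_tick are hypothetical names introduced for illustration, and the timestamp is assumed to be in seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: accumulates display-link ticks over a time window. */
typedef struct {
    double window_start;  /* timestamp (seconds) of the first tick in the window */
    int    ticks;         /* ticks counted since window_start */
    int    started;       /* 0 until the first tick arrives */
} fps_counter;

/* Feed one tick timestamp. Returns the measured FPS once at least one
 * second has elapsed since the window started, otherwise -1.0. */
double fps_counter_tick(fps_counter *c, double timestamp) {
    assert(c != NULL);
    if (!c->started) {
        /* The first tick only opens the window. */
        c->started = 1;
        c->window_start = timestamp;
        c->ticks = 0;
        return -1.0;
    }
    c->ticks++;
    double elapsed = timestamp - c->window_start;
    if (elapsed < 1.0) {
        return -1.0;
    }
    double fps = c->ticks / elapsed;
    /* Start a new window at the current tick. */
    c->window_start = timestamp;
    c->ticks = 0;
    return fps;
}
```

Because the callback is driven by the RunLoop, a value computed this way only tells you how often the main thread got around to handling the tick. It says nothing about GPU work.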

1. Principle of screen rendering

Let's start with how an old CRT monitor works. The CRT's electron gun scans line by line from top to bottom. When the scan completes, the display shows one frame, and the gun returns to its initial position for the next scan. To synchronize the display with the system's video controller, the display (or other hardware) uses a hardware clock to generate a series of timing signals. When the gun moves to a new line and is ready to scan it, the monitor emits a horizontal synchronization signal, or HSync. When a frame has been drawn and the gun has returned to its original position, before the next frame is drawn, the monitor emits a vertical synchronization signal, or VSync. The display usually refreshes at a fixed rate, and that rate is the frequency at which VSync signals are generated. Although today's displays are mostly liquid crystal displays, the principle remains the same.

In general, the CPU, the GPU, and the display work together to put a picture on the screen. The CPU computes the display content (view creation, layout calculation, image decoding, text rendering, and so on) based on the code the engineer wrote, then submits the results to the GPU. The GPU composites layers and renders textures, then submits the rendered result to the frame buffer. The video controller reads the frame buffer line by line, paced by the VSync signal, and passes the data to the display, after digital-to-analog conversion where needed.

With only one frame buffer, reading and refreshing the buffer is inefficient. To solve this, display systems introduce a second buffer, which is the double-buffering mechanism. The GPU pre-renders a frame and places it in one buffer for the video controller to read. Once the next frame has been rendered, the GPU simply points the video controller at the second buffer, which improves efficiency.

Double buffering improves efficiency but brings a new problem. Suppose the video controller has not finished reading, that is, only part of the frame is on screen, when the GPU submits a newly rendered frame to the other buffer and repoints the video controller to it. The video controller then draws the lower part of the screen from the new frame, and the picture tears.

To solve this problem, GPUs usually provide a mechanism called vertical synchronization (V-Sync). With V-Sync turned on, the GPU waits for the video controller's VSync signal before rendering a new frame and updating the frame buffer. This eliminates tearing and makes the picture smoother, but it requires more computing resources.

A common question

Some readers see the statement "with V-Sync turned on, the GPU waits for the video controller's VSync signal before rendering a new frame and updating the frame buffer" and wonder: if the GPU renders a new frame and updates the buffer only after it receives VSync, what is the second buffer for?

Imagine a monitor displaying the first and second frames. With double buffering, the GPU first renders one frame into a frame buffer and points the video controller at that buffer so the first frame is displayed. After the first frame has been displayed, the video controller sends the VSync signal. On receiving it, the GPU renders the second frame and points the video controller at the second buffer.

So it appears that the video controller sends VSync only after the first frame has been displayed, and only then does work on the second frame begin. Is that right? 😭 Of course not. 🐷 Otherwise double buffering would be pointless.

Here is what actually happens.

When the first V-SYNC signal arrives, a frame of image is rendered and placed in the frame buffer, but it is not displayed. When the second V-SYNC signal is received, the first rendered result is read (the video controller’s pointer points to the first frame buffer), and the new frame is rendered and stored in the second frame buffer. Upon receipt of the third V-SYNC signal, the contents of the second frame buffer are read (the video controller’s pointer points to the second frame buffer), and the rendering of the third frame image is started and fed into the first frame buffer, and so on and so on.
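The sequence just described can be traced with a small simulation in C. This is only a sketch under a simplifying assumption: every frame finishes rendering within one VSync interval. The names display_model, display_init, and display_on_vsync are hypothetical and do not correspond to any system API.

```c
#include <assert.h>

/* Hypothetical model of a double-buffered display driven by VSync. */
typedef struct {
    int buffers[2];  /* frame number stored in each buffer, -1 if empty */
    int front;       /* buffer the video controller reads, -1 before the first flip */
    int ready;       /* buffer holding a finished but not yet displayed frame, -1 if none */
    int next_frame;  /* number of the next frame the GPU will render */
} display_model;

void display_init(display_model *d) {
    d->buffers[0] = -1;
    d->buffers[1] = -1;
    d->front = -1;
    d->ready = -1;
    d->next_frame = 1;
}

/* Handle one VSync: flip to the frame rendered during the previous interval
 * (if any), then render the next frame into the buffer that is not on screen.
 * Returns the frame number now displayed, or -1 if nothing is displayed yet. */
int display_on_vsync(display_model *d) {
    if (d->ready != -1) {
        d->front = d->ready;  /* the video controller's pointer switches buffers */
    }
    int target = (d->front == 0) ? 1 : 0;
    d->buffers[target] = d->next_frame++;
    d->ready = target;
    return (d->front == -1) ? -1 : d->buffers[d->front];
}
```

Calling display_on_vsync repeatedly shows the one-interval delay between rendering and display: nothing is shown at the first signal, frame 1 at the second, frame 2 at the third.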

For further reading (may require a VPN), see: Multiple Buffering

2. Causes of lag

After a VSync signal arrives, the system graphics service notifies the app through mechanisms such as CADisplayLink, and the app's main thread starts computing the display content on the CPU (view creation, layout calculation, image decoding, text rendering, and so on). The computed content is then submitted to the GPU, which transforms, composites, and renders the layers and submits the result to the frame buffer, where it waits for the next VSync signal before being displayed. With V-Sync enabled, if the CPU or GPU fails to finish its work within one VSync interval, that frame is dropped and shown at the next opportunity, while the screen keeps showing the previously rendered content. This is why the interface lags.
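As a rough illustration of the deadline described above, the following C sketch counts how many VSync deadlines a series of frames would miss. It assumes, for simplicity, that each frame starts preparing exactly at a VSync boundary and that the given times cover CPU plus GPU work. The function name count_dropped_frames is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the VSync deadlines missed by frames with the given preparation
 * times (CPU + GPU, in microseconds). Simplifying assumption: each frame
 * starts right at a VSync boundary. */
size_t count_dropped_frames(const uint32_t *frame_us, size_t n, uint32_t vsync_us) {
    assert(vsync_us > 0);
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        /* Number of VSync intervals the frame occupies, rounded up. */
        uint32_t intervals = (frame_us[i] + vsync_us - 1) / vsync_us;
        if (intervals > 1) {
            /* Every interval beyond the first shows a stale frame. */
            dropped += intervals - 1;
        }
    }
    return dropped;
}
```

At 60 Hz the interval is about 16667 microseconds, so a frame that takes 20 ms misses one deadline and a frame that takes 40 ms misses two.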

iOS devices currently use double buffering, and triple buffering as well. Android now mostly uses triple buffering, and its early versions used a single buffer.

[Figure: example of triple buffering on iOS]

CPU and GPU resources are consumed for many reasons: frequent object creation, property changes, file reads, view hierarchy changes, layout calculation (with AutoLayout, the more views there are, the harder the system of linear equations is to solve), image decoding (optimize how large images are read), image rendering, text rendering, database reads (optimistic versus pessimistic locking for read-heavy or write-heavy scenarios), lock usage (for example, improper use of spin locks wastes CPU), and so on. Developers find the best solution based on their own experience (this is not the focus of this article).

3. How does APM monitor and report the lag

CADisplayLink is certainly not the tool for this, since its FPS is for reference only. Generally speaking, there are two ways to monitor lag: observing RunLoop state callbacks, and pinging the main thread from a child thread.

3.1 How RunLoop status is monitored

The RunLoop listens to input sources and schedules their handling: networks, input devices, periodic or delayed events, asynchronous callbacks, and so on. A RunLoop receives two types of sources: input sources, which deliver asynchronous messages from another thread or a different application (source0 events), and timer sources, which deliver events at a scheduled time or a repeating interval.

The RunLoop state is shown in the figure below

Step 1: Notify observers that the RunLoop is about to enter the loop, then enter the loop.

if (currentMode->_observerMask & kCFRunLoopEntry)
    // Notify observers: the RunLoop is about to enter the loop
    __CFRunLoopDoObservers(rl, currentMode, kCFRunLoopEntry);
// Enter the loop
result = __CFRunLoopRun(rl, currentMode, seconds, returnAfterSourceHandled, previousMode);

Step 2: Start the do-while loop that keeps the thread alive. Notify observers that the RunLoop is about to trigger Timer callbacks and Source0 callbacks, then execute the blocks that were added.

if (rlm->_observerMask & kCFRunLoopBeforeTimers)
    // Notify observers: the RunLoop is about to trigger a Timer callback
    __CFRunLoopDoObservers(rl, rlm, kCFRunLoopBeforeTimers);
if (rlm->_observerMask & kCFRunLoopBeforeSources)
    // Notify observers: the RunLoop is about to trigger a Source callback
    __CFRunLoopDoObservers(rl, rlm, kCFRunLoopBeforeSources);
// Execute the blocks that were added
__CFRunLoopDoBlocks(rl, rlm);

Step 3: After the Source0 callbacks have fired, if a Source1 is ready, jump to handle_msg to process the message.

// If a Source1 (port-based) is ready, process it directly and jump to handle the message
if (MACH_PORT_NULL != dispatchPort && !didDispatchPortLastTime) {
#if DEPLOYMENT_TARGET_MACOSX || DEPLOYMENT_TARGET_EMBEDDED || DEPLOYMENT_TARGET_EMBEDDED_MINI
    msg = (mach_msg_header_t *)msg_buffer;
    if (__CFRunLoopServiceMachPort(dispatchPort, &msg, sizeof(msg_buffer), &livePort, 0, &voucherState, NULL)) {
        goto handle_msg;
    }
#elif DEPLOYMENT_TARGET_WINDOWS
    if (__CFRunLoopWaitForMultipleObjects(NULL, &dispatchPort, 0, 0, &livePort, NULL)) {
        goto handle_msg;
    }
#endif
}

Step 4: After the callbacks have fired, notify observers that the RunLoop is about to sleep.

Boolean poll = sourceHandledThisLoop || (0ULL == timeout_context->termTSR);
// Notify observers: the RunLoop's thread is about to sleep
if (!poll && (rlm->_observerMask & kCFRunLoopBeforeWaiting)) __CFRunLoopDoObservers(rl, rlm, kCFRunLoopBeforeWaiting);
__CFRunLoopSetSleeping(rl);

Step 5: Once asleep, the RunLoop waits for a mach_port message to wake it up. Only the following four events can wake it:

A port-based Source event
A Timer firing
The RunLoop timing out
An explicit wake-up by a caller

do {
    if (kCFUseCollectableAllocator) {
        // objc_clear_stack(0);
        // <rdar://problem/16393959>
        memset(msg_buffer, 0, sizeof(msg_buffer));
    }
    msg = (mach_msg_header_t *)msg_buffer;
    __CFRunLoopServiceMachPort(waitSet, &msg, sizeof(msg_buffer), &livePort, poll ? 0 : TIMEOUT_INFINITY, &voucherState, &voucherCopy);
    if (modeQueuePort != MACH_PORT_NULL && livePort == modeQueuePort) {
        // Drain the internal queue. If one of the callout blocks sets the timerFired flag, break out and service the timer.
        while (_dispatch_runloop_root_queue_perform_4CF(rlm->_queue));
        if (rlm->_timerFired) {
            // Leave livePort as the queue port, and service timers below
            rlm->_timerFired = false;
            break;
        } else {
            if (msg && msg != (mach_msg_header_t *)msg_buffer) free(msg);
        }
    } else {
        // Go ahead and leave the inner loop.
        break;
    }
} while (1);

Step 6: Notify observers that the RunLoop's thread has just been woken up.

// Notify observers: the RunLoop's thread has just been woken up
if (!poll && (rlm->_observerMask & kCFRunLoopAfterWaiting)) __CFRunLoopDoObservers(rl, rlm, kCFRunLoopAfterWaiting);
// Handle the message
handle_msg:;
__CFRunLoopSetIgnoreWakeUps(rl);

Step 7: After the RunLoop wakes up, process the message that woke it:

If a Timer's time is up, trigger the Timer callback
If a block was dispatched to the main queue, execute the block
If it is a Source1 event, handle the event

#if USE_MK_TIMER_TOO
else if (rlm->_timerPort != MACH_PORT_NULL && livePort == rlm->_timerPort) {
    CFRUNLOOP_WAKEUP_FOR_TIMER();
    // On Windows, we have observed an issue where the timer port is set before the time which we requested it to be set. For example, we set the fire time to be TSR 167646765860, but it is actually observed firing at TSR 167646764145, which is 1715 ticks early. The result is that, when __CFRunLoopDoTimers checks to see if any of the run loop timers should be firing, it appears to be 'too early' for the next timer, and no timers are handled.
    // In this case, the timer port has been automatically reset (since it was returned from MsgWaitForMultipleObjectsEx), and if we do not re-arm it, then no timers will ever be serviced again unless something adjusts the timer list (e.g. adding or removing timers). The fix for the issue is to reset the timer here if CFRunLoopDoTimers did not handle a timer itself. 9308754
    if (!__CFRunLoopDoTimers(rl, rlm, mach_absolute_time())) {
        // Re-arm the next timer
        __CFArmNextTimerInMode(rlm, rl);
    }
}
#endif
// If a block was dispatched to main_queue, execute it
else if (livePort == dispatchPort) {
    CFRUNLOOP_WAKEUP_FOR_DISPATCH();
    __CFRunLoopModeUnlock(rlm);
    __CFRunLoopUnlock(rl);
    _CFSetTSD(__CFTSDKeyIsInGCDMainQ, (void *)6, NULL);
#if DEPLOYMENT_TARGET_WINDOWS
    void *msg = 0;
#endif
    __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__(msg);
    _CFSetTSD(__CFTSDKeyIsInGCDMainQ, (void *)0, NULL);
    __CFRunLoopLock(rl);
    __CFRunLoopModeLock(rlm);
    sourceHandledThisLoop = true;
    didDispatchPortLastTime = true;
}
// If a Source1 (port-based) emitted an event, handle the event
else {
    CFRUNLOOP_WAKEUP_FOR_SOURCE();
    // If we received a voucher from this mach_msg, then put a copy of the new voucher into TSD. CFMachPortBoost will look in the TSD for the voucher. By using the value in the TSD we tie the CFMachPortBoost to this received mach_msg explicitly without a chance for anything in between the two pieces of code to set the voucher again.
    voucher_t previousVoucher = _CFSetTSD(__CFTSDKeyMachMessageHasVoucher, (void *)voucherCopy, os_release);
    CFRunLoopSourceRef rls = __CFRunLoopModeFindSourceForMachPort(rl, rlm, livePort);
    if (rls) {
#if DEPLOYMENT_TARGET_MACOSX || DEPLOYMENT_TARGET_EMBEDDED || DEPLOYMENT_TARGET_EMBEDDED_MINI
        mach_msg_header_t *reply = NULL;
        sourceHandledThisLoop = __CFRunLoopDoSource1(rl, rlm, rls, msg, msg->msgh_size, &reply) || sourceHandledThisLoop;
        if (NULL != reply) {
            (void)mach_msg(reply, MACH_SEND_MSG, reply->msgh_size, 0, MACH_PORT_NULL, 0, MACH_PORT_NULL);
            CFAllocatorDeallocate(kCFAllocatorSystemDefault, reply);
        }
#elif DEPLOYMENT_TARGET_WINDOWS
        sourceHandledThisLoop = __CFRunLoopDoSource1(rl, rlm, rls) || sourceHandledThisLoop;
#endif

Step 8: Based on the current state of the RunLoop, decide whether to enter the next loop. If the RunLoop was stopped externally or the loop has timed out, it does not continue. Otherwise it enters the next loop.

if (sourceHandledThisLoop && stopAfterHandle) {
    // The caller asked to return as soon as an event has been handled
    retVal = kCFRunLoopRunHandledSource;
} else if (timeout_context->termTSR < mach_absolute_time()) {
    retVal = kCFRunLoopRunTimedOut;
} else if (__CFRunLoopIsStopped(rl)) {
    __CFRunLoopUnsetStopped(rl);
    retVal = kCFRunLoopRunStopped;
} else if (rlm->_stopped) {
    rlm->_stopped = false;
    retVal = kCFRunLoopRunStopped;
} else if (__CFRunLoopModeIsEmpty(rl, rlm, previousMode)) {
    // There is not a single source or timer left
    retVal = kCFRunLoopRunFinished;
}

See the complete, annotated RunLoop code here. Source1 is used by the RunLoop to handle system events arriving on Mach ports, and Source0 is used to handle user events. When a Source1 system event is received, it is in essence handled by calling a Source0 event handler.

RunLoop has 6 states

typedef CF_OPTIONS(CFOptionFlags, CFRunLoopActivity) {
    kCFRunLoopEntry,          // entering the loop
    kCFRunLoopBeforeTimers,   // about to trigger a Timer callback
    kCFRunLoopBeforeSources,  // about to trigger a Source0 callback
    kCFRunLoopBeforeWaiting,  // about to wait for a mach_port message
    kCFRunLoopAfterWaiting,   // just received a mach_port message
    kCFRunLoopExit,           // exiting the loop
    kCFRunLoopAllActivities   // all state changes of the loop
};

A thread is blocked when the RunLoop spends too long executing a method before going to sleep, or spends too long handling a received message after waking so that it cannot move on to the next step. When that thread is the main thread, the user sees a lag.

If the state stays at kCFRunLoopBeforeSources (before sleep) or kCFRunLoopAfterWaiting (after waking) without changing for longer than a set threshold, the main thread can be judged to be lagging. At that point, dump the stack information to reconstruct the scene of the crime, and then fix the lag.
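The activity values are bit flags, which is why the RunLoop source tests them with expressions like _observerMask & activity. The sketch below mirrors that check in plain C. The bit positions are the ones declared in CFRunLoop.h, while the ACT_ names and the two helper functions are hypothetical stand-ins introduced here for illustration.

```c
#include <assert.h>

typedef unsigned long activity_mask;

/* Stand-ins for the CFRunLoopActivity flags (bit positions as in CFRunLoop.h). */
enum {
    ACT_ENTRY          = 1 << 0,
    ACT_BEFORE_TIMERS  = 1 << 1,
    ACT_BEFORE_SOURCES = 1 << 2,
    ACT_BEFORE_WAITING = 1 << 5,
    ACT_AFTER_WAITING  = 1 << 6,
    ACT_EXIT           = 1 << 7,
    ACT_ALL            = 0x0FFFFFFF
};

/* An observer is notified only if its mask includes the activity. */
int observer_wants(activity_mask observer_mask, activity_mask activity) {
    return (observer_mask & activity) != 0;
}

/* The two states in which the lag monitor treats a long stay as a lag. */
int is_lag_sensitive(activity_mask activity) {
    return activity == ACT_BEFORE_SOURCES || activity == ACT_AFTER_WAITING;
}
```

An observer registered with the all-activities mask therefore sees every transition, and the monitor itself filters for the two states it cares about.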

Start a child thread that keeps looping to check whether the main thread is lagging. If the threshold is exceeded n times in a row, a lag is recorded, followed by a stack dump and a report (the mechanism for handling the data is covered in the next part).

The watchdog timeout differs by state:

Launch: 20s
Resume: 10s
Suspend: 10s
Quit: 6s
Background: 3min (before iOS 7 an app could request 10min; this was later changed to 3min, extendable by repeated requests up to 10min)

The lag threshold is chosen with the watchdog mechanism in mind. The threshold in the APM system must be smaller than the watchdog values, so it falls in the range [1, 6] seconds. The industry usually chooses 3 seconds.

long dispatch_semaphore_wait(dispatch_semaphore_t dsema, dispatch_time_t timeout) is used to determine whether the main thread is blocked. It returns zero on success, or non-zero if the timeout occurred. A non-zero return means the wait timed out, that is, the main thread has been blocked for longer than the threshold.

Many people may wonder why kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting are chosen when the RunLoop has so many states. The reason is that most lags occur between kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting, for example while handling Source0-type events internal to the app.

The flow chart of RunLoop detection lag is as follows:

The key code is as follows:

CFRunLoopObserverContext context = {0, (__bridge void *)self, NULL, NULL};
// Create the RunLoop observer
_observer = CFRunLoopObserverCreate(kCFAllocatorDefault, kCFRunLoopAllActivities, YES, 0, &runLoopObserverCallBack, &context);
// Add the observer to the main thread's RunLoop
CFRunLoopAddObserver(CFRunLoopGetMain(), _observer, kCFRunLoopCommonModes);

// Create the semaphore
_semaphore = dispatch_semaphore_create(0);

__weak __typeof(self) weakSelf = self;
dispatch_async(dispatch_get_global_queue(0, 0), ^{
    __strong __typeof(weakSelf) strongSelf = weakSelf;
    if (!strongSelf) {
        return;
    }
    while (YES) {
        if (strongSelf.isCancel) {
            return;
        }
        // N consecutive waits that exceed threshold T are recorded as one lag
        long semaphoreWait = dispatch_semaphore_wait(strongSelf->_semaphore, dispatch_time(DISPATCH_TIME_NOW, strongSelf.limitMillisecond * NSEC_PER_MSEC));
        if (semaphoreWait != 0) {
            if (strongSelf->_activity == kCFRunLoopBeforeSources || strongSelf->_activity == kCFRunLoopAfterWaiting) {
                if (++strongSelf.countTime < strongSelf.standstillCount) {
                    continue;
                }
                // Dump the stack and upload the data to the server according to the policy. Stack dumps are explained below.
                // Data reporting is covered in "Building a powerful, flexible, configurable data reporting component":
                // https://github.com/FantasticLBP/knowledge-kit/blob/master/Chapter1%20-%20iOS/1.80.md
            }
        }
        strongSelf.countTime = 0;
    }
});
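The counting logic in the key code above can be isolated as a small state machine, which makes the "N consecutive timeouts" rule easier to reason about and to test. This is a sketch in portable C, not the component's actual implementation. The names lag_detector and lag_detector_feed are hypothetical, and the two activity constants only reuse the bit positions of kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting.

```c
#include <assert.h>
#include <stddef.h>

/* Same bit positions as kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting. */
enum { ACT_BEFORE_SOURCES = 1 << 2, ACT_AFTER_WAITING = 1 << 6 };

typedef struct {
    int consecutive_timeouts;  /* plays the role of countTime */
    int required_timeouts;     /* plays the role of standstillCount */
} lag_detector;

/* Feed the outcome of one semaphore wait. timed_out is nonzero when the wait
 * expired before the observer signalled a state change. Returns 1 when a lag
 * should be reported (the point at which the stack would be dumped). */
int lag_detector_feed(lag_detector *d, int timed_out, unsigned long activity) {
    assert(d != NULL && d->required_timeouts > 0);
    if (timed_out &&
        (activity == ACT_BEFORE_SOURCES || activity == ACT_AFTER_WAITING)) {
        if (++d->consecutive_timeouts < d->required_timeouts) {
            return 0;  /* not enough consecutive timeouts yet */
        }
        d->consecutive_timeouts = 0;
        return 1;
    }
    d->consecutive_timeouts = 0;  /* the RunLoop made progress, start over */
    return 0;
}
```

With required_timeouts set to 3 and a 1-second wait, for example, a lag is reported only after the main thread has stayed in one of the two states for about three seconds.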

3.2 Pinging the main thread from a child thread

Start a child thread that creates a semaphore with an initial value of 0 and a Boolean flag with an initial value of YES. Dispatch a task to the main thread that sets the flag to NO. The child thread sleeps for the threshold time and then checks whether the main thread managed to set the flag (value NO). If it did not, the main thread is lagging: dump the stack information and, through the data reporting mechanism, upload the data to the server according to the policy. Data reporting is covered in the article on building a powerful, flexible, configurable data reporting component.

while (self.isCancelled == NO) {
    @autoreleasepool {
        __block BOOL isMainThreadNoRespond = YES;
        dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
        // Ask the main thread to clear the flag and signal the semaphore
        dispatch_async(dispatch_get_main_queue(), ^{
            isMainThreadNoRespond = NO;
            dispatch_semaphore_signal(semaphore);
        });
        // Give the main thread the threshold time to respond
        [NSThread sleepForTimeInterval:self.threshold];
        if (isMainThreadNoRespond) {
            if (self.handlerBlock) {
                self.handlerBlock();
            }
        }
        // Wait until the main thread has handled the ping before sending the next one
        dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);
    }
}

4. Stack dump

Getting the method call stack takes some work, so let's go through it step by step. [NSThread callStackSymbols] returns the call stack of the current thread. But when a lag is detected, what we need is the main thread's stack, and this API cannot help: there is no way to switch from an arbitrary thread to the main thread to capture it. So let's first review some background.

In computer science, a call stack is a stack-type data structure used to store thread information about a computer program. This stack is also known as the execution stack, program stack, control stack, runtime stack, machine stack, and so on. The call stack is used to track the point at which each active subroutine should return control after completion of execution.

A Wikipedia search for the “Call Stack” shows a graph and an example, as shown below

The figure above represents a stack. It is divided into several stack frames, each of which corresponds to one function call. The blue part at the bottom is the frame of the DrawSquare function, which calls the DrawLine function during execution. DrawLine's frame is shown in green.

As you can see, a stack frame consists of three parts: function parameters, the return address, and local variables. For example, when DrawSquare calls DrawLine: first, the parameters DrawLine needs are pushed onto the stack; second, the return address is pushed (this is control information: if function A calls function B, the address of the next line of code after the call to B is the return address); third, the local variables of the called function are also stored on the stack.

The stack pointer indicates the top of the current stack. On most operating systems the stack grows downward, so the stack pointer holds the lowest address in use. The address the frame pointer points to stores the value of the previous stack pointer.

On most operating systems, each stack frame also saves the frame pointer of the previous frame. So, knowing the stack pointer and frame pointer of the current frame, we can walk back frame by frame, recursively, until we reach the frame at the bottom of the stack.

The next step is to get the Stack Pointer and Frame Pointer of all the threads. And then go back and retrace the scene of the crime.
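The backtracking idea can be shown with a small C sketch that walks a chain of frame records. It assumes the layout used on arm64 and on x86_64 when frame pointers are kept: each record stores the caller's frame pointer followed by the return address. The type frame_record and the function walk_frames are hypothetical names, and the test data is synthetic rather than read from a real thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one frame record: the caller's frame pointer followed
 * by the return address. */
typedef struct frame_record {
    const struct frame_record *previous;  /* caller's record, NULL at the bottom */
    uintptr_t return_address;
} frame_record;

/* Follow the chain of frame pointers starting at fp and collect up to
 * max_depth return addresses into out. Returns the number collected. */
size_t walk_frames(const frame_record *fp, uintptr_t *out, size_t max_depth) {
    size_t depth = 0;
    while (fp != NULL && depth < max_depth) {
        out[depth++] = fp->return_address;
        fp = fp->previous;
    }
    return depth;
}
```

A real implementation that inspects another thread cannot dereference these pointers blindly. It would need to read the memory defensively and stop at the first invalid address, which this sketch leaves out.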

5. Mach Task knowledge

Mach task:

When an app runs, it corresponds to a Mach task, under which multiple threads may execute at the same time. A Mach task is a machine-independent abstraction of the execution environment for threads: a container object that manages virtual memory and other resources, including devices and other handles.

Role: a task can be thought of as a process, and it contains the list of its threads.

task_threads stores all the threads of target_task in the act_list array and their number in act_listCnt.

kern_return_t task_threads
(
  task_t target_task,
  thread_act_array_t *act_list,         // list of thread pointers
  mach_msg_type_number_t *act_listCnt   // number of threads
);

thread_info:

kern_return_t thread_info
(
  thread_act_t target_act,
  thread_flavor_t flavor,
  thread_info_t thread_info_out,
  mach_msg_type_number_t *thread_info_outCnt
);

How to get thread stack data:

The system function kern_return_t task_threads(task_inspect_t target_task, thread_act_array_t *act_list, mach_msg_type_number_t *act_listCnt); returns all the threads, but what it returns are the lowest-level Mach threads.

For each thread, kern_return_t thread_get_state(thread_act_t target_act, thread_state_flavor_t flavor, thread_state_t old_state, mach_msg_type_number_t *old_stateCnt); retrieves all of its state. The state is written into a structure of type _STRUCT_MCONTEXT, and two of the arguments differ by CPU architecture, so macros are needed to hide the differences between CPUs.

The _STRUCT_MCONTEXT structure holds the current thread's stack pointer and the frame pointer of the topmost stack frame, from which the thread's entire call stack can be traced back.

However, the threads obtained this way are kernel threads, while the information we need is about NSThread objects, so kernel threads have to be mapped to NSThreads.

The p in pthread stands for POSIX, the Portable Operating System Interface. Every system has its own threading model, and different systems offer different thread APIs, so POSIX provides an abstract pthread and related APIs. These APIs are implemented differently on different operating systems, but they do the same thing.

The task_threads and thread_get_state functions provided by the Unix system operate on kernel threads, and each kernel thread is uniquely identified by an ID of type thread_t. A pthread is uniquely identified by a value of type pthread_t. Converting between kernel threads and pthreads (that is, between thread_t and pthread_t) is easy, because pthreads were designed precisely to abstract kernel threads.

The callback function with which pthread_create starts the thread is nsthreadLauncher.

static void *nsthreadLauncher(void* thread)  
{
    NSThread *t = (NSThread*)thread;
    [nc postNotificationName: NSThreadDidStartNotification object:t userInfo: nil];
    [t _setName: [t name]];
    [t main];
    [NSThread exit];
    return NULL;
}

NSThreadDidStartNotification is the string @"_NSThreadDidStartNotification".

<NSThread: 0x... >{number = 1, name = main}

The only way to match an NSThread to a kernel thread is by name. The pthread API pthread_getname_np can also retrieve the name of the kernel thread. The np suffix means non-portable (not part of POSIX), so the function cannot be used across platforms.

The idea in summary: save the NSThread's original name, change the name to a random value (such as a timestamp), then iterate over the pthreads of the kernel threads and compare names. If a name matches, that NSThread and kernel thread correspond. Once found, restore the thread's original name. For the main thread, pthread_getname_np cannot be used, so its thread_t is captured in the +load method of the current code and used for the match.

static mach_port_t main_thread_id;  
+ (void)load {
    main_thread_id = mach_thread_self();
}

2. Monitoring of APP startup time

1. Monitor the APP startup time

Application startup time is one of the most important factors affecting user experience, so we need to quantify how fast an app starts. Starts are divided into cold starts and hot starts.

Cold start: the app is not yet running, so the entire application must be loaded and built and its initialization completed. Cold start leaves a lot of room for optimization. Cold start time is counted from the application:didFinishLaunchingWithOptions: method, where an app generally performs the basic initialization of its SDKs and of the app itself.

Hot start: the app is already running in the background (a common scenario: the user pressed the Home button while using the app and then reopens it) and is brought back to the foreground by some event. The app learns that it is entering the foreground in the applicationWillEnterForeground: method.

The idea is simple:

In the +load method of the monitoring class, record the current time.
Listen for UIApplicationDidFinishLaunchingNotification, which is posted when app launch completes.
Record the current time when the notification is received.
The difference between the times from steps 1 and 3 is the app's startup time.
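The four steps above can be sketched in portable C. The assumptions are stated loudly: a function marked __attribute__((constructor)) stands in for the monitoring class's +load method, clock_gettime with CLOCK_MONOTONIC stands in for the platform clock, and the caller is expected to invoke startup_elapsed_seconds when its own "launch finished" event arrives. All names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical global holding the time recorded before main() runs. */
static struct timespec g_load_time;

/* Declared as a constructor so it runs before main(), standing in for +load. */
static void record_load_time(void) __attribute__((constructor));

static void record_load_time(void) {
    int rc = clock_gettime(CLOCK_MONOTONIC, &g_load_time);  /* step 1 */
    assert(rc == 0);
    (void)rc;
}

static double seconds_between(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Call this when the launch-finished event is received (steps 2 and 3).
 * Returns the elapsed startup time in seconds (step 4). */
double startup_elapsed_seconds(void) {
    struct timespec now;
    int rc = clock_gettime(CLOCK_MONOTONIC, &now);
    assert(rc == 0);
    (void)rc;
    return seconds_between(g_load_time, now);
}
```

The monitor deliberately hooks in as early as possible, which is the one place where running code before main is the point rather than a cost to avoid.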

mach_absolute_time is a CPU/bus-dependent function that returns a count of CPU clock ticks. The count does not advance while the system is asleep, and its resolution is at the nanosecond level. The difference between two readings has to be converted: first to nanoseconds, using the system timebase obtained from mach_timebase_info, and then to seconds.

mach_timebase_info_data_t g_apmmStartupMonitorTimebaseInfoData = {0};
mach_timebase_info(&g_apmmStartupMonitorTimebaseInfoData);
uint64_t timelapse = mach_absolute_time() - g_apmmLoadTime;
// ticks * numer / denom gives nanoseconds; dividing by 1e9 gives seconds
double timeSpan = (timelapse * g_apmmStartupMonitorTimebaseInfoData.numer) / (g_apmmStartupMonitorTimebaseInfoData.denom * 1e9);
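The conversion itself is plain arithmetic and can be checked in isolation. In the sketch below, the numer and denom parameters play the role of the two fields of mach_timebase_info_data_t. The function name ticks_to_seconds is hypothetical, and the timebase 125/3 in the usage note is only a sample value.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick difference to seconds using a timebase fraction:
 * ticks * numer / denom yields nanoseconds, and 1e9 ns make one second. */
double ticks_to_seconds(uint64_t ticks, uint32_t numer, uint32_t denom) {
    assert(denom != 0);
    return ((double)ticks * (double)numer) / ((double)denom * 1e9);
}
```

For example, with a timebase of 125/3, a difference of 24,000,000 ticks corresponds to 24,000,000 × 125 / 3 = 1,000,000,000 ns, which is one second. Doing the arithmetic in double also avoids the integer overflow that a very large tick count multiplied by numer could cause.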

2. Monitoring startup time online is useful, but startup time also needs to be optimized during development

To optimize startup time, you need to know exactly what happens at launch and plan around it.

The pre-main phase runs from the moment the app is launched until the system calls the main function. The main phase runs from the entry of the main function until viewDidAppear of the main UI's view controller.

App startup process:

Info.plist: load information such as the launch screen; set up the sandbox and check permissions.
Mach-O loading: for a fat binary, find the slice that matches the current CPU architecture; load all dependent Mach-O files (calling the Mach-O loading routine recursively); resolve internal and external pointer references, such as strings and functions; load the methods in categories; load C++ static objects and call the ObjC +load methods; run C functions declared with __attribute__((constructor)).
Program execution: call main(); call UIApplicationMain(); call applicationWillFinishLaunching.

The pre-main stage

The main stage

2.1 Loading dylibs

To load each dynamic library, dyld needs to:

Analyze the dynamic libraries it depends on
Locate the Mach-O file of the dynamic library
Open the file
Verify the file
Register the file signature with the kernel
Call mmap() for each segment of the dynamic library

Optimization:

Reduce dependencies on non-system libraries
Use static libraries instead of dynamic libraries
Merge non-system dynamic libraries into a single dynamic library

2.2 Rebase && Binding

Optimization:

Reduce the number of ObjC classes and selectors, and remove unused classes and functions
Reduce the number of C++ virtual functions
Switch to Swift structs (in essence, this reduces the number of symbols)

2.3 Initializers

Optimization:

Use +initialize instead of +load
Do not use __attribute__((constructor)) to mark a function explicitly as an initializer. Instead, have the initialization run when the function is first called, for example with dispatch_once, pthread_once(), or std::call_once(). Initializing on first use defers part of the work.
Avoid C++ static objects where possible
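As an illustration of initializing on first use, here is a minimal C sketch built on pthread_once. The lookup table, its contents, and all function names are hypothetical. The call counter exists only to make the one-time behaviour observable and would not be needed in real code.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t g_table_once = PTHREAD_ONCE_INIT;
static int g_table[256];
static int g_init_calls;  /* only here to make the behaviour observable */

/* Runs at most once, no matter how many threads call table_lookup. */
static void build_table(void) {
    for (int i = 0; i < 256; i++) {
        g_table[i] = i * i;
    }
    g_init_calls++;
}

/* The first caller pays the initialization cost; later calls skip it. */
int table_lookup(int index) {
    assert(index >= 0 && index < 256);
    pthread_once(&g_table_once, build_table);
    return g_table[index];
}

int table_init_calls(void) {
    return g_init_calls;
}
```

Compared with a constructor function, nothing here runs before main. The table is built only if and when some code path actually needs it.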

2.4 Factors influencing the pre-main stage

The more dynamic libraries are loaded, the slower the startup.
The more ObjC classes and functions, the slower the startup.
The larger the executable, the slower the startup.
The more C constructor functions, the slower the startup.
The more C++ static objects, the slower the startup.
The more ObjC +load methods, the slower the startup.

Optimization measures:

Reduce dependencies on unnecessary libraries, whether dynamic or static; where possible, convert dynamic libraries to static ones; if dynamic libraries are required, merge multiple non-system dynamic libraries into one
Check whether each framework should be set to optional or required. If the framework exists on every iOS version the App supports, set it to required; otherwise set it to optional. Optional linking adds some extra checks, so do not use it where it is not needed
Merge or delete some Objective-C classes and functions. To clean up unused classes in the project, you can use the code inspection feature of AppCode to find classes that the current project does not use (the LinkMap file can also be analyzed, but the accuracy is not very high). There is also an open source project called fui that finds classes that are no longer used with very high accuracy. Its only limitations are that it cannot handle classes provided by dynamic and static libraries, nor C++ class templates
Eliminate unwanted static variables
Delete methods that are never called or that have been deprecated
Defer to +initialize anything that does not have to happen in +load, and try not to use C++ virtual functions (creating virtual tables has overhead)
Class and method names should not be too long: every class and method name in iOS has a corresponding string in the __cstring section, so the length of class and method names also affects the size of the executable. This follows from the dynamic nature of Objective-C, which looks up a class or method by name through reflection before calling it, so the Objective-C object model stores the class and method name strings
Use dispatch_once() in place of all __attribute__((constructor)) functions, C++ static object initialization, and ObjC +load methods
Compress images to the smallest size the designers will accept; it pays off. Why does compressing images speed up startup? Because an App normally loads a dozen or two images of various sizes at launch. Smaller images mean less I/O, so startup is naturally faster. A reliable compression tool is TinyPNG.

2.5 Main stage optimization

Reduce the work done in startup initialization. Lazy-load whatever can be lazy-loaded, initialize in the background whatever can be initialized in the background, and defer whatever can be deferred, so that nothing blocks the main thread during startup. Delete outright the code of business features that have been taken offline
Optimize code logic. Remove unnecessary logic and code to reduce the time each step consumes
Use multiple threads for initialization during startup to make full use of the CPU
Use pure code instead of XIBs or storyboards to describe the UI, especially the main UI framework, such as the TabBarController. XIBs and storyboards still have to be parsed into code before the page can be rendered, which is one extra step.

3. Accelerated startup time

What is a page fault? When a process accesses a page of virtual memory whose corresponding physical page does not exist (it has not yet been loaded into physical memory), a page fault occurs. Handling one takes a few milliseconds.

When do page faults occur in large numbers? When an application has just been launched.

If the code needed at startup is scattered across page 1, page 2, page 3... of virtual memory, each of those pages has to be faulted in and startup time suffers badly. The solution is to pack the code needed at startup into as few pages as possible (binary reordering), which cuts the number of page faults and therefore shortens App startup time.

In other words, binary reordering speeds up App startup by reducing page faults (each of which can cost milliseconds).

The moment an App hits the most page faults is right after launch. So the optimization is to gather the methods involved in launch and place them together on one page or a few pages of virtual memory. Xcode lets developers specify an “Order File”, and the binary is laid out in the order of the methods listed in that file. You can check the result in the Link Map file (set the Order File and Write Link Map File options under “Build Settings” in Xcode).

The hard part is obtaining the list of methods called during startup. The code may be Swift, blocks, C, or Objective-C, so method hooking cannot cover it and neither can fishhook; Clang instrumentation can meet the requirement.
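The effect of reordering can be estimated with simple arithmetic. The sketch below, in C, counts how many distinct pages a set of startup functions touches, given their byte offsets in the binary. It is a simplified model under stated assumptions: pages are 16 KB, each function is assumed to fit inside the page where it starts, and the names PAGE_SIZE_BYTES, MAX_PAGES, and pages_touched are hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 16384u  /* assumed 16 KB page size */
#define MAX_PAGES 1024          /* illustrative upper bound on binary size */

/* Counts the distinct pages touched by functions starting at the given
   byte offsets. Offsets beyond MAX_PAGES pages are ignored. */
size_t pages_touched(const unsigned long *offsets, size_t count) {
    unsigned char seen[MAX_PAGES] = {0};
    size_t pages = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned long page = offsets[i] / PAGE_SIZE_BYTES;
        if (page < MAX_PAGES && !seen[page]) {
            seen[page] = 1;   /* first function seen on this page */
            pages++;
        }
    }
    return pages;
}
```

With four startup functions at offsets 0, 20000, 40000, and 70000, four pages must be faulted in; if an order file packs the same functions at offsets 0, 512, 1024, and 1536, a single page is enough.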

Three, CPU utilization monitoring

1. The CPU architecture

The main architectures in the market include ARM (ARM64), Intel (x86), AMD and so on. Intel uses CISC (Complex Instruction Set Computer) and ARM uses RISC (Reduced Instruction Set Computer). The difference lies in different CPU design concepts and approaches.

Early CPUs were all CISC designs, built to accomplish the required computation with as few machine language instructions as possible. Take multiplication: on a CISC CPU, the single instruction MUL ADDRA, ADDRB multiplies the numbers stored at memory addresses ADDRA and ADDRB and stores the result in ADDRA. Reading the data at ADDRA and ADDRB into registers and writing the product back to memory are all handled inside the CPU, so the CISC architecture increases CPU complexity and the demands on CPU manufacturing technology.

The RISC architecture requires software to spell out each operational step. The multiplication above is implemented as MOVE A, ADDRA; MOVE B, ADDRB; MUL A, B; STR ADDRA, A;. This architecture reduces CPU complexity and allows more powerful CPUs to be produced at the same level of manufacturing technology, but it places higher demands on compiler design.

Currently, the majority of iPhones in the market are based on ARM64 architecture. And the ARM architecture has low energy consumption.

2. Get thread information

So how do we monitor CPU usage?

Start a timer and run the following logic at the configured interval
Get the current task, and from it all thread information (the thread count and the thread array)
Iterate over all threads and check whether any thread’s CPU usage exceeds the configured threshold
If a thread’s usage exceeds the threshold, dump the stack
Assemble the data and report it

Thread information structure

struct thread_basic_info {
    time_value_t    user_time;      /* user run time */
    time_value_t    system_time;    /* system run time */
    integer_t       cpu_usage;      /* scaled CPU usage percentage (full scale is 1000) */
    policy_t        policy;         /* scheduling policy in effect */
    integer_t       run_state;      /* run state */
    integer_t       flags;          /* various flags */
    integer_t       suspend_count;  /* suspend count for thread */
    integer_t       sleep_time;     /* number of seconds that thread
                                     * has been sleeping */
};

The code below relies on stack restoration (the backtrace); if you have forgotten how that works, see the analysis above.

thread_act_array_t threads;
mach_msg_type_number_t threadCount = 0;
const task_t thisTask = mach_task_self();
kern_return_t kr = task_threads(thisTask, &threads, &threadCount);
if (kr != KERN_SUCCESS) {
    return;
}
for (int i = 0; i < threadCount; i++) {
    thread_info_data_t threadInfo;
    thread_basic_info_t threadBaseInfo;
    // thread_info() reads this value as the buffer capacity, so it must be initialized
    mach_msg_type_number_t threadInfoCount = THREAD_INFO_MAX;
    kern_return_t kr = thread_info((thread_inspect_t)threads[i], THREAD_BASIC_INFO, (thread_info_t)threadInfo, &threadInfoCount);
    if (kr == KERN_SUCCESS) {
        threadBaseInfo = (thread_basic_info_t)threadInfo;
        // Only look at threads that are not idle
        if (!(threadBaseInfo->flags & TH_FLAGS_IDLE)) {
            // cpu_usage is scaled to a maximum of 1000, so dividing by 10 gives a percentage
            integer_t cpuUsage = threadBaseInfo->cpu_usage / 10;
            if (cpuUsage > CPUMONITORRATE) {
                NSMutableDictionary *CPUMetaDictionary = [NSMutableDictionary dictionary];
                NSData *CPUPayloadData = [NSData data];
                NSString *backtraceOfAllThread = [BacktraceLogger backtraceOfAllThread];
                // 1. Assemble the meta information
                CPUMetaDictionary[@"MONITOR_TYPE"] = APMMonitorCPUType;
                // 2. Assemble the payload (a JSON object whose key is the agreed STACK_TRACE and whose value is the base64-encoded stack information)
                NSData *CPUData = [SAFE_STRING(backtraceOfAllThread) dataUsingEncoding:NSUTF8StringEncoding];
                NSString *CPUDataBase64String = [CPUData base64EncodedStringWithOptions:0];
                NSDictionary *CPUPayloadDictionary = @{@"STACK_TRACE": SAFE_STRING(CPUDataBase64String)};
                NSError *error;
                // NSJSONWritingOptions must be 0 because the server processes the data based on \n; passing 0 produces a JSON string without \n
                NSData *parsedData = [NSJSONSerialization dataWithJSONObject:CPUPayloadDictionary options:0 error:&error];
                if (error) {
                    APMMLog(@"%@", error);
                    return;
                }
                CPUPayloadData = [parsedData copy];
                // 3. Report the data; see [Build a powerful, flexible, configurable data reporting component](https://github.com/FantasticLBP/knowledge-kit/blob/master/Chapter1%20-%20iOS/1.80.md)
                [[HermesClient sharedInstance] sendWithType:APMMonitorCPUType meta:CPUMetaDictionary content:CPUPayloadData];
            }
        }
    }
}
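The division by 10 in the code above follows from the scale of the cpu_usage field. As a small portable sketch in C (no Mach headers, so it can be tried anywhere), the helpers below convert a scaled value into a percentage and find the first thread over a threshold. USAGE_SCALE mirrors TH_USAGE_SCALE (1000) from the Mach headers; the function names and the sample values are hypothetical.

```c
#include <stddef.h>

#define USAGE_SCALE 1000  /* mirrors TH_USAGE_SCALE: 1000 means 100% */

/* Converts the scaled cpu_usage field into a percentage. */
double usage_percent(int scaled_usage) {
    return (double)scaled_usage * 100.0 / USAGE_SCALE;
}

/* Returns the index of the first thread whose usage exceeds the
   threshold (in percent), or -1 if no thread does. */
int first_thread_over(const int *scaled_usages, size_t count,
                      double threshold_percent) {
    for (size_t i = 0; i < count; i++) {
        if (usage_percent(scaled_usages[i]) > threshold_percent) {
            return (int)i;
        }
    }
    return -1;
}
```

For the scaled samples {50, 120, 850, 300} and an 80% threshold, the thread at index 2 (85%) is the one whose stack would be dumped.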

Four, OOM problem

1. Basic knowledge preparation

Hard disk: Also called a disk, used to store data. Your songs, pictures, and videos are stored on your hard drive.

Memory: Because hard disks are slow to read, it would be very inefficient if the CPU read all its data directly from the hard disk while running a program. So the data needed to run the program is first read from the hard disk into memory, and the CPU then computes on and exchanges data with memory. Memory is volatile storage: when the power is turned off, the data disappears. It is the computer’s internal storage (on the motherboard) and holds the intermediate data and results of CPU operations. Memory is the bridge between programs and the CPU: it holds the data read from the hard disk and the programs for the CPU to run.

Virtual memory is a memory management technique in computer systems. It makes a program believe it has contiguous available memory, when in fact that memory is usually split into multiple physical fragments, some of which may be temporarily stored on external disk storage (hard disk) and swapped back into memory when needed. This is called “virtual memory” on Windows and “swap space” on Linux/UNIX.

Why does iOS not support swap space? It is not just iOS; most mobile systems do not. Mass storage on mobile devices is flash memory, whose read and write speed is far lower than that of the hard disks used in computers. Even if a phone used swap space, the slow flash memory would keep it from improving performance, so swap space is simply not provided.

2. Knowledge of iOS memory

Memory (RAM), like the CPU, is one of the scarcest resources in the system and is easily contended for. An application’s memory usage is directly related to its performance. iOS has no swap space as a fallback, so memory is especially important.

What is OOM? It is short for “Out of Memory”. It is divided into FOOM (Foreground Out Of Memory: the application crashes while running in the foreground; this kind of crash causes the loss of active users and is very undesirable for the business) and BOOM (Background Out Of Memory: the application’s process is killed while running in the background). It is an unconventional crash caused by iOS’s Jetsam mechanism and cannot be caught by signal-based monitoring.

What is the Jetsam mechanism? It can be understood as a management mechanism the system uses to control excessive memory usage. Jetsam runs in a separate process; every process has a memory threshold, and Jetsam kills a process as soon as it exceeds that threshold.

Why is Jetsam needed? The memory of a device is limited, so memory is a precious resource. System processes and other Apps in use all compete for it. Since iOS does not support swap space, once a low-memory event is triggered, Jetsam frees as much memory as it can for the App currently in use. As a result, when iOS runs short of memory, Apps are killed by the system, which shows up as a crash.

Two situations trigger OOM: overall system memory usage is high, so the system kills lower-priority Apps according to its priority policy; or the current App reaches the “high water mark”, that is, it exceeds the system’s memory limit for a single App, and the system force-kills it.

If you read the source code (xnu/bsd/kern/kern_memorystatus.c), you will find the two corresponding memory-kill mechanisms, as follows

Highwater handling -> our App may not use more memory than the per-process limit

Loop through the priority list looking for processes
Check whether the p_memstat_memlimit constraint is met
Filter on diagnose active and FREEZE
Kill the process; exit on success, otherwise keep looping

memorystatus_act_aggressive handling -> overall memory usage is high, so processes are killed by priority

Determine jld_bucket_count according to the policy
Start killing from JETSAM_PRIORITY_ELEVATED_INACTIVE
Use old_bucket_count and memorystatus_jld_eval_period_msecs to decide whether to start killing
Kill in order of priority from low to high until memorystatus_avail_pages_below_pressure is no longer true

Several cases of excessive memory

The App uses little memory, and other Apps also manage memory well. Even if the user switches to another App, our App stays “alive” and keeps its user state. The experience is good
The App uses little memory, but other Apps use too much (because of poor memory management or because they are inherently resource-hungry, such as games). Every App other than the one in the foreground is then killed by the system, and its memory is reclaimed for the active process
The App uses a lot of memory. After the user switches to another App, even if that App requests little memory, the system kills the Apps with the largest footprint first because memory is tight. The user sends the App to the background, opens it again later, and finds that it reloads from scratch
The App uses so much memory that the system kills it while it is running in the foreground, and the user sees a crash

When memory runs short, the system follows a strategy to free up space. A common approach is to move some low-priority data to disk, an operation called page out. When that data is accessed again later, the system moves it back into memory, which is called page in.

A memory page is the smallest unit of memory management and is allocated by the system. One page may hold several objects, and a large object may span several pages. A page is typically 16 KB in size, and there are three types of pages.

Clean Memory

Clean memory includes three categories: memory that can be paged out, memory-mapped files, and the frameworks used by the App (each framework has a __DATA_CONST segment, which is usually clean but becomes dirty when runtime swizzling is used).

Freshly allocated pages are clean (except for objects allocated on the heap) and become dirty once the App writes data to them. Files read from disk into memory are read-only and are also clean pages.

Dirty Memory

Dirty memory includes four categories: memory written to by the App, all objects allocated on the heap, image decoding buffers, and frameworks (a framework has __DATA and __DATA_DIRTY segments, both of which are dirty).

Using singletons or global initializers helps reduce the dirty memory produced by using a framework (a singleton is not destroyed once created and stays in memory, so the system does not count it as dirty memory).

Compressed Memory

Because of flash capacity and read/write limitations, iOS has no swap mechanism. Instead, the memory compressor was introduced in iOS 7. When memory is tight, it compresses memory objects that have not been used recently, freeing up more pages. When the data is needed again, the memory compressor decompresses it for reuse. This saves memory while keeping response times fast.

For example, the App uses a framework that holds an NSDictionary property for storing data, occupying 3 pages of memory. When the dictionary has not been accessed recently, the memory compressor compresses it to 1 page; when it is used again, it is restored to 3 pages.

App running memory = pageNumbers * pageSize. Because compressed memory counts as dirty memory, memory footprint = dirtySize + compressedSize.
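The footprint formula can be checked against the NSDictionary example with a few lines of C. This is only an arithmetic sketch: footprint_bytes is a hypothetical helper, and the page counts are the ones from the example (3 dirty pages before compression, 1 compressed page afterwards) with an assumed 16 KB page size.

```c
#include <stdint.h>

/* Footprint in bytes: dirty pages and compressed pages both count. */
uint64_t footprint_bytes(uint64_t dirty_pages, uint64_t compressed_pages,
                         uint64_t page_size) {
    return (dirty_pages + compressed_pages) * page_size;
}
```

Before compression the dictionary contributes footprint_bytes(3, 0, 16384), which is 49152 bytes; once compressed it contributes footprint_bytes(0, 1, 16384), which is 16384 bytes, so the memory is still counted, just at its compressed size.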

The memory limit differs from device to device. The limit for an App is higher and the limit for an extension is lower; when the limit is exceeded, the process crashes with EXC_RESOURCE_EXCEPTION.

Next, let’s look at how to obtain the memory limit and how to detect that an App was force-killed for using too much memory.

3. Get memory information

3.1 Calculate the memory limit value from the JetSamEvent log

When an App is killed by Jetsam, the phone generates a system log. On the phone, go to Settings > Privacy > Analytics & Improvements > Analytics Data, where you will find logs whose names begin with JetsamEvent, such as JetsamEvent-2020-03-14-161828.ips. The iOS kernel leaves these JetsamEvent logs when it kills Apps that do not have high priority (idle, frontmost, suspended) and use more memory than the system limit.

The log contains the App’s memory information. Find the pageSize field near the top of the log, then find the process entry whose reason is per-process-limit and read its rpages field; rpages * pageSize gives the OOM threshold.

The largestProcess field in the log is the App name; the reason field is the reason for the memory kill; the states field is the state of the App when it was killed (idle, suspended, frontmost...).

To check the accuracy of the data, I quit every App on two devices (iPhone 6s Plus / 13.3.1 and iPhone 11 Pro / 13.3.1), leaving only a demo App for testing the memory threshold. The App allocates memory in a loop; the ViewController code is as follows

- (void)viewDidLoad {
    [super viewDidLoad];
    NSMutableArray *array = [NSMutableArray array];
    for (NSInteger index = 0; index < 10000000; index++) {
        UIImageView *imageView = [[UIImageView alloc] initWithFrame:CGRectMake(0, 0, 100, 100)];
        UIImage *image = [UIImage imageNamed:@"AppIcon"];
        imageView.image = image;
        [array addObject:imageView];
    }
}

Here’s the iPhone 6S Plus /13.3.1 data:

{"bug_type":"298","timestamp":"2020-03-19 17:23:45.94 +0800","os_version":"iPhone OS 13.3.1 (17D50)","incident_id":"DA8AF66D-24E8-458C-8734-981866942168"}
{
  "crashReporterKey" : "fc9b659ce486df1ed1b8062d5c7c977a7eb8c851",
  "kernel" : "Darwin Kernel Version 19.3.0: Thu Jan 9 21:10:44 PST 2020; root:xnu-6153.82.3~1\/RELEASE_ARM64_S8000",
  "product" : "iPhone8,2",
  "incident" : "DA8AF66D-24E8-458C-8734-981866942168",
  "date" : "2020-03-19 17:23:45.93 +0800",
  "build" : "iPhone OS 13.3.1 (17D50)",
  "timeDelta" : 332,
  "memoryStatus" : {
    "compressorSize" : 48499,
    "compressions" : 7458651,
    "decompressions" : 5190200,
    "zoneMapCap" : 744407040,
    "largestZone" : "APFS_4K_OBJS",
    "largestZoneSize" : 41402368,
    "pageSize" : 16384,
    "uncompressed" : 104065,
    "zoneMapSize" : 141606912,
    "memoryPages" : { "active" : 26214, "throttled" : 0, "fileBacked" : 14903, "wired" : 20019, "anonymous" : 37140, "purgeable" : 142, "inactive" : 23669, "free" : 2967, "speculative" : 2160 }
  },
  "largestProcess" : "Test",
  "genCounter" : 0,
  "processes" : [
    {
      "uuid" : "39c5738b-b321-3865-a731-68064c4f7a6f",
      "states" : [ "daemon", "idle" ],
      "lifetimeMax" : 188, "age" : 948223699030, "purgeable" : 0, "fds" : 25, "coalition" : 422,
      "rpages" : 177, "pid" : 282, "idleDelta" : 824711280,
      "name" : "com.apple.Safari.SafeBrowsing.Se",
      "cpuTime" : 10.275422000000001
    },
    // ...
    {
      "uuid" : "83dbf121-7c0c-3ab5-9b66-77ee926e1561",
      "states" : [ "frontmost" ],
      "killDelta" : 2592, "genCount" : 0, "age" : 1531004794, "purgeable" : 0, "fds" : 50, "coalition" : 1047,
      "rpages" : 92806,
      "reason" : "per-process-limit",
      "pid" : 2384, "cpuTime" : 59.464373999999999,
      "name" : "Test",
      "lifetimeMax" : 92806
    },
    // ...
  ]
}

The OOM threshold on iPhone 6s Plus / 13.3.1 is (16384 * 92806) / (1024 * 1024) = 1450.09375 MB

Here’s the iPhone 11 Pro/13.3.1 data:

{"bug_type":"298","timestamp":"2020-03-19 17:30:28.39 +0800","os_version":"iPhone OS 13.3.1 (17D50)","incident_id":"7F111601-BC7A-4BD7-A468-CE3370053057"}
{
  "crashReporterKey" : "bc2445adc164c399b330f812a48248e029e26276",
  "kernel" : "Darwin Kernel Version 19.3.0: Thu Jan 9 21:11:10 PST 2020; root:xnu-6153.82.3~1\/RELEASE_ARM64_T8030",
  "product" : "iPhone12,3",
  "incident" : "7F111601-BC7A-4BD7-A468-CE3370053057",
  "date" : "2020-03-19 17:30:28.39 +0800",
  "build" : "iPhone OS 13.3.1 (17D50)",
  "timeDelta" : 189,
  "memoryStatus" : {
    "compressorSize" : 66443,
    "compressions" : 25498129,
    "decompressions" : 15532621,
    "zoneMapCap" : 1395015680,
    "largestZone" : "APFS_4K_OBJS",
    "largestZoneSize" : 41222144,
    "pageSize" : 16384,
    "uncompressed" : 127027,
    "zoneMapSize" : 169639936,
    "memoryPages" : { "active" : 58652, "throttled" : 0, "fileBacked" : 20291, "wired" : 45838, "anonymous" : 96445, "purgeable" : 4, "inactive" : 54368, "free" : 5461, "speculative" : 3716 }
  },
  "largestProcess" : "Test",
  "genCounter" : 0,
  "processes" : [
    {
      "uuid" : "2dd5eb1e-fd31-36c2-99d9-bcbff44efbb7",
      "states" : [ "daemon", "idle" ],
      "lifetimeMax" : 171, "age" : 5151034269954, "purgeable" : 0, "fds" : 50, "coalition" : 66,
      "rpages" : 164, "pid" : 11276, "idleDelta" : 3801132318,
      "name" : "wcd",
      "cpuTime" : 3.430787
    },
    // ...
    {
      "uuid" : "63158edc-915f-3a2b-975c-0e0ac4ed44c0",
      "states" : [ "frontmost" ],
      "killDelta" : 4345, "genCount" : 0, "age" : 654480778, "purgeable" : 0, "fds" : 50, "coalition" : 1718,
      "rpages" : 134278,
      "reason" : "per-process-limit",
      "pid" : 14206, "cpuTime" : 23.955463999999999,
      "name" : "Test",
      "lifetimeMax" : 134278
    },
    // ...
  ]
}

The OOM threshold on iPhone 11 Pro / 13.3.1 is (16384 * 134278) / (1024 * 1024) = 2098.09375 MB
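The same calculation can be written as a small C helper, which is handy when processing many JetsamEvent logs. This is a minimal sketch: jetsam_limit_mib is a hypothetical name, and the inputs are the rpages and pageSize values read from the two logs above.

```c
#include <stdint.h>

/* Converts the rpages value of the killed process into MB,
   using the pageSize reported in the same log. */
double jetsam_limit_mib(uint64_t rpages, uint64_t page_size) {
    return (double)(rpages * page_size) / (1024.0 * 1024.0);
}
```

jetsam_limit_mib(92806, 16384) gives 1450.09375 and jetsam_limit_mib(134278, 16384) gives 2098.09375, matching the two thresholds computed by hand.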

How does iOS discover Jetsam?

macOS/iOS is a BSD-derived system with Mach at its core, but the interfaces exposed to upper layers are generally BSD-layer wrappers around Mach. Mach is a microkernel architecture in which the real virtual memory management takes place, while BSD provides the upper-layer interface for memory management. Jetsam events are also generated by BSD. The bsd_init function is the entry point; it basically initializes the various subsystems, such as virtual memory management.

// 1. Initialize the BSD memory allocator, which is built on the Mach kernel zones
kmeminit();

// 2. Initialise background freezing
#if CONFIG_FREEZE
#ifndef CONFIG_MEMORYSTATUS
    #error "CONFIG_FREEZE defined without matching CONFIG_MEMORYSTATUS"
#endif
    /* Initialise background freezing */
    bsd_init_kprintf("calling memorystatus_freeze_init\n");
    memorystatus_freeze_init();
#endif

// 3. iOS only
#if CONFIG_MEMORYSTATUS
    /* Initialize kernel memory status notifications */
    bsd_init_kprintf("calling memorystatus_init\n");
    memorystatus_init();
#endif /* CONFIG_MEMORYSTATUS */

The main effect is to start the two highest priority threads, to monitor the memory situation of the entire system.

When CONFIG_FREEZE is turned on, the kernel freezes processes instead of killing them. The freezing function is carried out by starting a memorystatus_freeze_thread in the kernel, which calls the memorystatus_freeze_top_process after receiving a signal to freeze.

iOS starts the highest-priority thread, vm_pressure_monitor, to monitor the system’s memory pressure, and maintains all App processes in a stack. iOS also maintains a memory snapshot table that records the memory pages consumed by each process. The Jetsam logic, also known as memorystatus, can be read in kern_memorystatus.h and kern_memorystatus.c in the XNU project.

The JetsamEvent log is generated in the window of at least 6 seconds before iOS kills the App for high memory usage.

As mentioned above, iOS has no swap space, so memorystatus (also known as Jetsam) was introduced. On iOS it frees as much memory as possible for the current App. In terms of priority, background applications are killed first; if memory is still insufficient, the current application is force-killed. On macOS, memorystatus only force-quits processes that are marked as idle.

The memorystatus mechanism starts a memorystatus_jetsam_thread thread, which is responsible for killing Apps and writing logs. It does not send messages, so the memory pressure monitoring thread cannot learn that an App has been killed.

When the monitoring thread finds that an App is under memory pressure, it notifies the App, which then runs the didReceiveMemoryWarning delegate method. At that point we still have a chance to release memory resources, which may save the App from being killed by the system.

Source view of the problem

The iOS kernel has an array that maintains the priority of threads. Each item of the array is a structure that contains a linked list of processes. The structure is as follows:

#define MEMSTAT_BUCKET_COUNT (JETSAM_PRIORITY_MAX + 1)

typedef struct memstat_bucket {
    TAILQ_HEAD(, proc) list;
    int count;
} memstat_bucket_t;

memstat_bucket_t memstat_bucket[MEMSTAT_BUCKET_COUNT];

You can see the process priority information in kern_memorystatus.h

#define JETSAM_PRIORITY_IDLE_HEAD                -2
/* The value -1 is an alias to JETSAM_PRIORITY_DEFAULT */
#define JETSAM_PRIORITY_IDLE                      0
#define JETSAM_PRIORITY_IDLE_DEFERRED          1 /* Keeping this around till all xnu_quick_tests can be moved away from it.*/
#define JETSAM_PRIORITY_AGING_BAND1          JETSAM_PRIORITY_IDLE_DEFERRED
#define JETSAM_PRIORITY_BACKGROUND_OPPORTUNISTIC  2
#define JETSAM_PRIORITY_AGING_BAND2          JETSAM_PRIORITY_BACKGROUND_OPPORTUNISTIC
#define JETSAM_PRIORITY_BACKGROUND                3
#define JETSAM_PRIORITY_ELEVATED_INACTIVE      JETSAM_PRIORITY_BACKGROUND
#define JETSAM_PRIORITY_MAIL                      4
#define JETSAM_PRIORITY_PHONE                     5
#define JETSAM_PRIORITY_UI_SUPPORT                8
#define JETSAM_PRIORITY_FOREGROUND_SUPPORT        9
#define JETSAM_PRIORITY_FOREGROUND               10
#define JETSAM_PRIORITY_AUDIO_AND_ACCESSORY      12
#define JETSAM_PRIORITY_CONDUCTOR                13
#define JETSAM_PRIORITY_HOME                     16
#define JETSAM_PRIORITY_EXECUTIVE                17
#define JETSAM_PRIORITY_IMPORTANT                18
#define JETSAM_PRIORITY_CRITICAL                 19

#define JETSAM_PRIORITY_MAX                      21

As you can clearly see, the background App priority JETSAM_PRIORITY_BACKGROUND is 3, and the foreground App priority JETSAM_PRIORITY_FOREGROUND is 10.
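To see why a background App goes before a foreground one, here is a simplified model in C of an array of priority buckets, each holding a linked list of processes, with the victim taken from the lowest non-empty band. It is a sketch of the idea only, not the kernel's implementation: the demo_* types, bucket_insert, and pick_victim are hypothetical, and the real code uses TAILQ lists and additional policies.

```c
#include <stddef.h>

#define BUCKET_COUNT 22  /* priorities 0..21, like JETSAM_PRIORITY_MAX + 1 */

typedef struct demo_proc {
    int pid;
    struct demo_proc *next;
} demo_proc_t;

typedef struct {
    demo_proc_t *head;   /* singly linked list of processes in this band */
    int count;
} demo_bucket_t;

/* Pushes a process onto the front of the bucket for its priority. */
void bucket_insert(demo_bucket_t *buckets, int priority, demo_proc_t *p) {
    if (priority < 0 || priority >= BUCKET_COUNT) return;
    p->next = buckets[priority].head;
    buckets[priority].head = p;
    buckets[priority].count++;
}

/* Removes and returns a process from the lowest non-empty priority band,
   or NULL when every bucket is empty. */
demo_proc_t *pick_victim(demo_bucket_t *buckets) {
    for (int prio = 0; prio < BUCKET_COUNT; prio++) {
        demo_proc_t *p = buckets[prio].head;
        if (p != NULL) {
            buckets[prio].head = p->next;
            buckets[prio].count--;
            p->next = NULL;
            return p;
        }
    }
    return NULL;
}
```

With one process inserted at priority 10 (foreground) and another at priority 3 (background), the first call to pick_victim() returns the background process and only the second call reaches the foreground one.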

The priority rule is: kernel thread priority > operating system priority > App priority, and foreground Apps have higher priority than background Apps. Among threads of equal priority, a thread that uses more CPU has its priority lowered.

The possible reasons for OOM can be seen in kern_memorystatus.c:

/* For logging clarity */
static const char *memorystatus_kill_cause_name[] = {
    ""                                ,        /* kMemorystatusInvalid                            */
    "jettisoned"                    ,        /* kMemorystatusKilled                            */
    "highwater"                        ,        /* kMemorystatusKilledHiwat                        */
    "vnode-limit"                    ,        /* kMemorystatusKilledVnodes                    */
    "vm-pageshortage"                ,        /* kMemorystatusKilledVMPageShortage            */
    "proc-thrashing"                ,        /* kMemorystatusKilledProcThrashing                */
    "fc-thrashing"                    ,        /* kMemorystatusKilledFCThrashing                */
    "per-process-limit"                ,        /* kMemorystatusKilledPerProcessLimit            */
    "disk-space-shortage"            ,        /* kMemorystatusKilledDiskSpaceShortage            */
    "idle-exit"                        ,        /* kMemorystatusKilledIdleExit                    */
    "zone-map-exhaustion"            ,        /* kMemorystatusKilledZoneMapExhaustion            */
    "vm-compressor-thrashing"        ,        /* kMemorystatusKilledVMCompressorThrashing        */
    "vm-compressor-space-shortage"    ,        /* kMemorystatusKilledVMCompressorSpaceShortage    */
};

Look at the key code in the memorystatus_init function to initialize the JetSam thread

__private_extern__ void memorystatus_init(void) {
    // ...
    /* Initialize the jetsam_threads state array */
    jetsam_threads = kalloc(sizeof(struct jetsam_thread_state) * max_jetsam_threads);

    /* Initialize all the jetsam threads */
    for (i = 0; i < max_jetsam_threads; i++) {
        result = kernel_thread_start_priority(memorystatus_thread, NULL, 95 /* MAXPRI_KERNEL */, &jetsam_threads[i].thread);
        if (result == KERN_SUCCESS) {
            jetsam_threads[i].inited = FALSE;
            jetsam_threads[i].index = i;
            thread_deallocate(jetsam_threads[i].thread);
        } else {
            panic("Could not create memorystatus_thread %d", i);
        }
    }
}

/*
 *    High-level priority assignments
 *
 *************************************************************************
 * 127        Reserved (real-time)
 *                A
 *                +
 *            (32 levels)
 *                +
 *                V
 * 96        Reserved (real-time)
 * 95        Kernel mode only
 *                A
 *                +
 *            (16 levels)
 *                +
 *                V
 * 80        Kernel mode only
 * 79        System high priority
 *                A
 *                +
 *            (16 levels)
 *                +
 *                V
 * 64        System high priority
 * 63        Elevated priorities
 *                A
 *                +
 *            (12 levels)
 *                +
 *                V
 * 52        Elevated priorities
 * 51        Elevated priorities (incl. BSD +nice)
 *                A
 *                +
 *            (20 levels)
 *                +
 *                V
 * 32        Elevated priorities (incl. BSD +nice)
 * 31        Default (default base for threads)
 * 30        Lowered priorities (incl. BSD -nice)
 *                A
 *                +
 *            (20 levels)
 *                +
 *                V
 * 11        Lowered priorities (incl. BSD -nice)
 * 10        Lowered priorities (aged pri's)
 *                A
 *                +
 *            (11 levels)
 *                +
 *                V
 * 0        Lowered priorities (aged pri's / idle)
 *************************************************************************
 */

As you can see, threads of a user-mode application cannot have higher priority than those of the operating system and kernel. Thread priorities also differ between user applications: foreground applications have higher priority than background ones. On iOS the application with the highest priority is SpringBoard. In addition, a thread’s priority is not fixed. Mach adjusts it dynamically based on the thread’s utilization and overall system load: the priority is lowered if the thread consumes too much CPU and raised if the thread is starved. Either way, a program cannot go beyond the priority range its threads belong to.

As you can see, the system starts max_jetsam_threads Jetsam threads (usually 1, in special cases 3) depending on kernel boot arguments and device capability, and these threads have priority 95, that is, MAXPRI_KERNEL. (Note that 95 here is a thread priority, and XNU thread priorities range from 0 to 127. The macros above are process priorities, ranging from -2 to 19.)

Next, look at the memorystatus_thread function, which mainly handles the initialization done when the thread starts

static void
memorystatus_thread(void *param __unused, wait_result_t wr __unused)
{
    // ...
    while (memorystatus_action_needed()) {
        boolean_t killed;
        int32_t priority;
        uint32_t cause;
        uint64_t jetsam_reason_code = JETSAM_REASON_INVALID;
        os_reason_t jetsam_reason = OS_REASON_NULL;

        cause = kill_under_pressure_cause;
        switch (cause) {
        case kMemorystatusKilledFCThrashing:
            jetsam_reason_code = JETSAM_REASON_MEMORY_FCTHRASHING;
            break;
        case kMemorystatusKilledVMCompressorThrashing:
            jetsam_reason_code = JETSAM_REASON_MEMORY_VMCOMPRESSOR_THRASHING;
            break;
        case kMemorystatusKilledVMCompressorSpaceShortage:
            jetsam_reason_code = JETSAM_REASON_MEMORY_VMCOMPRESSOR_SPACE_SHORTAGE;
            break;
        case kMemorystatusKilledZoneMapExhaustion:
            jetsam_reason_code = JETSAM_REASON_ZONE_MAP_EXHAUSTION;
            break;
        case kMemorystatusKilledVMPageShortage:
            /* falls through */
        default:
            jetsam_reason_code = JETSAM_REASON_MEMORY_VMPAGESHORTAGE;
            cause = kMemorystatusKilledVMPageShortage;
            break;
        }

        /* Highwater */
        boolean_t is_critical = TRUE;
        if (memorystatus_act_on_hiwat_processes(&errors, &hwm_kill, &post_snapshot, &is_critical)) {
            if (is_critical == FALSE) {
                /*
                 * For now, don't kill any other processes.
                 */
                break;
            } else {
                goto done;
            }
        }

        jetsam_reason = os_reason_create(OS_REASON_JETSAM, jetsam_reason_code);
        if (jetsam_reason == OS_REASON_NULL) {
            printf("memorystatus_thread: failed to allocate jetsam reason\n");
        }

        if (memorystatus_act_aggressive(cause, jetsam_reason, &jld_idle_kills, &corpse_list_purged, &post_snapshot)) {
            goto done;
        }

        /*
         * memorystatus_kill_top_process() drops a reference,
         * so take another one so we can continue to use this exit reason
         * even after it returns
         */
        os_reason_ref(jetsam_reason);

        /* LRU */
        killed = memorystatus_kill_top_process(TRUE, sort_flag, cause, jetsam_reason, &priority, &errors);
        sort_flag = FALSE;

        if (killed) {
            if (memorystatus_post_snapshot(priority, cause) == TRUE) {
                post_snapshot = TRUE;
            }

            /* Jetsam Loop Detection */
            if (memorystatus_jld_enabled == TRUE) {
                if ((priority == JETSAM_PRIORITY_IDLE) || (priority == system_procs_aging_band) || (priority == applications_aging_band)) {
                    jld_idle_kills++;
                } else {
                    /*
                     * We've reached into bands beyond idle deferred.
                     * We make no attempt to monitor them
                     */
                }
            }

            if ((priority >= JETSAM_PRIORITY_UI_SUPPORT) && (total_corpses_count() > 0) && (corpse_list_purged == FALSE)) {
                /*
                 * If we have jetsammed a process in or above JETSAM_PRIORITY_UI_SUPPORT
                 * then we attempt to relieve pressure by purging corpse memory.
                 */
                task_purge_all_corpses();
                corpse_list_purged = TRUE;
            }
            goto done;
        }

        if (memorystatus_avail_pages_below_critical()) {
            /*
             * Still under pressure and unable to kill a process - purge corpse memory
             */
            if (total_corpses_count() > 0) {
                task_purge_all_corpses();
                corpse_list_purged = TRUE;
            }

            if (memorystatus_avail_pages_below_critical()) {
                /*
                 * Still under pressure and unable to kill a process - panic
                 */
                panic("memorystatus_jetsam_thread: no victim! available pages:%llu\n", (uint64_t)memorystatus_available_pages);
            }
        }

done:
        // ...
    }
}

You can see that it runs a loop that keeps freeing memory for as long as the loop condition, memorystatus_action_needed(), holds.

static boolean_t
memorystatus_action_needed(void)
{
#if CONFIG_EMBEDDED
    return (is_reason_thrashing(kill_under_pressure_cause) ||
            is_reason_zone_map_exhaustion(kill_under_pressure_cause) ||
           memorystatus_available_pages <= memorystatus_available_pages_pressure);
#else /* CONFIG_EMBEDDED */
    return (is_reason_thrashing(kill_under_pressure_cause) ||
            is_reason_zone_map_exhaustion(kill_under_pressure_cause));
#endif /* CONFIG_EMBEDDED */
}

It uses the memory pressure reported by vm_pageout to decide whether memory is currently tight. There are several cases: frequent paging in and out (is_reason_thrashing), Mach zone exhaustion (is_reason_zone_map_exhaustion), and the number of available pages falling to or below the memorystatus_available_pages_pressure threshold.

Continuing through memorystatus_thread, you will find that when memory is tight a high-water-mark OOM is triggered first. That is, an OOM kill happens if a process uses more memory than its maximum allowed usage, the high water mark. In memorystatus_act_on_hiwat_processes(), memorystatus_kill_hiwat_proc() starts from the lowest-priority process in the priority bucket array memstat_bucket. If that process's memory is within its threshold (footprint_in_bytes <= memlimit_in_bytes), it moves on to the process with the next-lowest priority, until it finds a process whose memory exceeds its threshold and kills it.
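The high-water-mark search described above can be sketched in plain C. This is only an illustration of the selection rule, not XNU's implementation: the proc_sketch_t type and the pick_hiwat_victim function are hypothetical names, and the sketch assumes the processes are already sorted from lowest to highest priority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a process for this sketch only. */
typedef struct {
    int      pid;
    int32_t  priority;            /* jetsam priority band */
    uint64_t footprint_in_bytes;  /* current physical footprint */
    uint64_t memlimit_in_bytes;   /* high water mark of this process */
} proc_sketch_t;

/*
 * Walks the processes from lowest to highest priority and returns the
 * index of the first one whose footprint exceeds its limit, or -1 if
 * every process is within its limit.
 * Assumption: procs[] is already sorted by ascending priority.
 */
static int pick_hiwat_victim(const proc_sketch_t *procs, size_t count)
{
    assert(procs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (procs[i].footprint_in_bytes <= procs[i].memlimit_in_bytes) {
            continue; /* within its limit: look at the next process */
        }
        return (int)i; /* over its high water mark: this one is killed */
    }
    return -1;
}
```

When the function returns -1 no process is over its own limit, which corresponds to the case where the kernel falls back to its other kill strategies.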

In general, it is hard for a single app to reach the high water mark. If this step fails to kill any process, execution ends up in memorystatus_act_aggressive, which is where most OOM kills occur.

static boolean_t
memorystatus_act_aggressive(uint32_t cause, os_reason_t jetsam_reason, int *jld_idle_kills, boolean_t *corpse_list_purged, boolean_t *post_snapshot)
{
    // ...
    if ((jld_bucket_count == 0) ||
        (jld_now_msecs > (jld_timestamp_msecs + memorystatus_jld_eval_period_msecs))) {
        /*
         * Refresh evaluation parameters
         */
        jld_timestamp_msecs = jld_now_msecs;
        jld_idle_kill_candidates = jld_bucket_count;
        *jld_idle_kills = 0;
        jld_eval_aggressive_count = 0;
        jld_priority_band_max = JETSAM_PRIORITY_UI_SUPPORT;
    }
    // ...
}

Note the condition jld_now_msecs > (jld_timestamp_msecs + memorystatus_jld_eval_period_msecs): the kill occurs only after memorystatus_jld_eval_period_msecs has elapsed.

/* Jetsam Loop Detection */
if (max_mem <= (512 * 1024 * 1024)) {
    /* 512 MB devices */
    memorystatus_jld_eval_period_msecs = 8000;    /* 8000 msecs == 8 second window */
} else {
    /* 1GB and larger devices */
    memorystatus_jld_eval_period_msecs = 6000;    /* 6000 msecs == 6 second window */
}

memorystatus_jld_eval_period_msecs has a minimum value of 6 seconds, so we have 6 seconds in which to do something.
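The two rules quoted above (the window length chosen from the device memory size, and the comparison that decides whether the window has elapsed) can be restated as a small, self-contained C sketch. The function names jld_eval_period_msecs and jld_window_expired are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Window length in milliseconds, following the snippet quoted above:
 * 8 seconds on devices with up to 512 MB of memory, 6 seconds otherwise. */
static uint64_t jld_eval_period_msecs(uint64_t max_mem_bytes)
{
    return (max_mem_bytes <= 512ULL * 1024 * 1024) ? 8000 : 6000;
}

/* True once more than period_msecs have passed since the window started,
 * mirroring jld_now_msecs > (jld_timestamp_msecs + period). */
static bool jld_window_expired(uint64_t now_msecs, uint64_t window_start_msecs,
                               uint64_t period_msecs)
{
    assert(period_msecs > 0);
    return now_msecs > window_start_msecs + period_msecs;
}
```

Because the comparison is strict, a check made exactly 6000 ms after the window started still counts as inside the window on a 1 GB or larger device.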

3.2 Thresholds collected by developers

StackOverflow has a list of the OOM thresholds for various devices

Device	Crash amount (MB)	Total amount (MB)	Percentage of total
iPad 1	127	256	49%
iPad 2	275	512	53%
iPad 3	645	1024	62%
iPad 4 (iOS 8.1)	585	1024	57%
iPad Mini 1st generation	297	512	58%
iPad Mini Retina (iOS 7.1)	696	1024	68%
iPad Air	697	1024	68%
iPad Air 2 (iOS 10.2.1)	1383	2048	68%
iPad Pro 9.7″ (iOS 10.0.2, 14A456)	1395	1971	71%
iPad Pro 10.5″ (iOS 11 Beta4)	3057	4000	76%
iPad Pro 12.9″ (2015) (iOS 11.2.1)	3058	3999	76%
iPad 10.2 (iOS 13.2.3)	1844	2998	62%
iPod touch 4th gen (iOS 6.1.1)	130	256	51%
iPod touch 5th gen	286	512	56%
iPhone 4	325	512	63%
iPhone 4s	286	512	56%
iPhone 5	645	1024	62%
iPhone 5s	646	1024	63%
iPhone 6 (iOS 8.x)	645	1024	62%
iPhone 6 Plus (iOS 8.x)	645	1024	62%
iPhone 6s (iOS 9.2)	1396	2048	68%
iPhone 6s Plus (iOS 10.2.1)	1396	2048	68%
iPhone SE (iOS 9.3)	1395	2048	68%
iPhone 7 (iOS 10.2)	1395	2048	68%
iPhone 7 Plus (iOS 10.2.1)	2040	3072	66%
iPhone 8 (iOS 12.1)	1364	1990	70%
iPhone X (iOS 11.2.1)	1392	2785	50%
iPhone XS (iOS 12.1)	2040	3754	54%
iPhone XS Max (iOS 12.1)	2039	3735	55%
iPhone XR (iOS 12.1)	1792	2813	63%
iPhone 11 (iOS 13.1.3)	2068	3844	54%
iPhone 11 Pro Max (iOS 13.2.3)	2067	3740	55%

3.3 Trigger the High Water Mark of the current App

We can write a timer that keeps allocating memory and prints the memory currently in use, read from phys_footprint. In theory, allocating memory continuously will eventually trigger the Jetsam mechanism and force-kill the app, so the last value printed is the memory upper limit for the app on the current device.

#import <mach/mach.h>

timer = [NSTimer scheduledTimerWithTimeInterval:0.01 target:self selector:@selector(allocateMemory) userInfo:nil repeats:YES];

// Called every 10 ms: keeps one more image view alive and logs the footprint.
- (void)allocateMemory {
    UIImageView *imageView = [[UIImageView alloc] initWithFrame:CGRectMake(0, 0, 100, 100)];
    UIImage *image = [UIImage imageNamed:@"AppIcon"];
    imageView.image = image;
    [array addObject:imageView];

    memoryLimitSizeMB = [self usedSizeOfMemory];
    if (memoryWarningSizeMB && memoryLimitSizeMB) {
        NSLog(@"----- memory warning:%dMB, memory limit:%dMB", memoryWarningSizeMB, memoryLimitSizeMB);
    }
}

// Returns the app's physical memory footprint in MB, or 0 on failure.
- (int)usedSizeOfMemory {
    task_vm_info_data_t taskInfo;
    mach_msg_type_number_t infoCount = TASK_VM_INFO_COUNT;
    kern_return_t kernReturn = task_info(mach_task_self(), TASK_VM_INFO, (task_info_t)&taskInfo, &infoCount);
    if (kernReturn != KERN_SUCCESS) {
        return 0;
    }
    return (int)(taskInfo.phys_footprint / 1024.0 / 1024.0);
}

3.4 An approach available from iOS 13

Starting with iOS 13, <os/proc.h> declares size_t os_proc_available_memory(void);, which reports the memory currently available to the app.

Return Value

The number of bytes that the app may allocate before it hits its memory limit. If the calling process isn’t an app, or if the process has already exceeded its memory limit, this function returns 0.

Discussion

Call this function to determine the amount of memory available to your app. The returned value corresponds to the current memory limit minus the memory footprint of your app at the time of the function call. Your app’s memory footprint consists of the data that you allocated in RAM, and that must stay in RAM (or the equivalent) at all times. Memory limits can change during the app life cycle and don’t necessarily correspond to the amount of physical memory available on the device.

Use the returned value as advisory information only and don’t cache it. The precise value changes when your app does any work that affects memory, which can happen frequently.

Although this function lets you determine the amount of memory your app may safely consume, don’t use it to maximize your app’s memory usage. Significant memory use, even when under the current memory limit, affects system performance. For example, when your app consumes all of its available memory, the system may need to terminate other apps and system processes to accommodate your app’s requests. Instead, always consume the smallest amount of memory you need to be responsive to the user’s needs.

If you need more detailed information about the available memory resources, you can call task_info. However, be aware that task_info is an expensive call, whereas this function is much more efficient.

if (@available(iOS 13.0, *)) {
    return os_proc_available_memory() / 1024.0 / 1024.0;
}

The APIs for an app's memory information live in the Mach layer. The mach_task_basic_info structure stores a Mach task's memory usage, where resident_size is the resident physical memory size and virtual_size is the virtual memory size. The phys_footprint field, the physical memory footprint of the application, is part of the task_vm_info structure instead.

#define MACH_TASK_BASIC_INFO     20         /* always 64-bit basic info */
struct mach_task_basic_info {
    mach_vm_size_t  virtual_size;       /* virtual memory size (bytes) */
    mach_vm_size_t  resident_size;      /* resident memory size (bytes) */
    mach_vm_size_t  resident_size_max;  /* maximum resident memory size (bytes) */
    time_value_t    user_time;          /* total user run time for
                                            terminated threads */
    time_value_t    system_time;        /* total system run time for
                                            terminated threads */
    policy_t        policy;             /* default policy for new threads */
    integer_t       suspend_count;      /* suspend count for task */
};

So the fetch code is

task_vm_info_data_t vmInfo;
mach_msg_type_number_t count = TASK_VM_INFO_COUNT;
kern_return_t kr = task_info(mach_task_self(), TASK_VM_INFO, (task_info_t)&vmInfo, &count);
if (kr != KERN_SUCCESS) {
    return;
}
// Physical footprint of the app, in MB
CGFloat memoryUsed = (CGFloat)(vmInfo.phys_footprint / 1024.0 / 1024.0);

Some readers may wonder whether resident_size should be the field used to read memory usage. Initial testing showed that resident_size differs significantly from what Xcode reports, while phys_footprint is close to Xcode's figure. This can also be confirmed from the WebKit source code.

So on iOS 13 we can use os_proc_available_memory to get the memory currently available and phys_footprint to get the memory the app currently occupies. The sum of the two is the memory limit for the app on the current device, and exceeding it triggers the Jetsam mechanism.

// Returns the app's memory limit in MB (footprint + available), or 0 if unknown.
- (CGFloat)limitSizeOfMemory {
    if (@available(iOS 13.0, *)) {
        task_vm_info_data_t taskInfo;
        mach_msg_type_number_t infoCount = TASK_VM_INFO_COUNT;
        kern_return_t kernReturn = task_info(mach_task_self(), TASK_VM_INFO, (task_info_t)&taskInfo, &infoCount);
        if (kernReturn != KERN_SUCCESS) {
            return 0;
        }
        return (CGFloat)((taskInfo.phys_footprint + os_proc_available_memory()) / (1024.0 * 1024.0));
    }
    return 0;
}

Memory currently available: 1435.936752 MB; memory currently occupied by the app: 14.5 MB; critical value: 1435.936752 MB + 14.5 MB = 1450.436752 MB. This is essentially the same as the memory threshold obtained with the method in 3.1: "iPhone 6s Plus / iOS 13.3.1 OOM critical value: (16384 * 92806) / (1024 * 1024) = 1450.09375 MB".
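The arithmetic behind both figures can be checked with a short C sketch. The function names are hypothetical, and reading 16384 as the page size in bytes and 92806 as the page count is an assumption about the quoted formula.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_MB (1024.0 * 1024.0)

/* Limit estimate in MB: current footprint plus memory still available. */
static double limit_mb_from_footprint(uint64_t footprint_bytes,
                                      uint64_t available_bytes)
{
    return (double)(footprint_bytes + available_bytes) / BYTES_PER_MB;
}

/* Limit in MB from a page size (bytes) and a page count.
 * Assumption: this is how the figure quoted from 3.1 was derived. */
static double limit_mb_from_pages(uint64_t page_size, uint64_t page_count)
{
    assert(page_size > 0);
    return (double)(page_size * page_count) / BYTES_PER_MB;
}
```

With the values from the text, limit_mb_from_pages(16384, 92806) evaluates to 1450.09375, which differs from the 1450.436752 MB obtained by addition by less than 0.4 MB.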

3.5 Get the memory limit value via XNU

XNU provides functions and macros, built around the memorystatus_priority_entry structure, for getting the priority and memory limit of every process.

typedef struct memorystatus_priority_entry {
  pid_t pid;
  int32_t priority;
  uint64_t user_data;
  int32_t limit;
  uint32_t state;
} memorystatus_priority_entry_t;

The priority field is the process's priority and the limit field is the process's memory limit. Reading them requires root access, which I have not tried because I do not have a jailbroken device.

See the kern_memorystatus.h file for the relevant code. The function to call is int memorystatus_control(uint32_t command, int32_t pid, uint32_t flags, void *buffer, size_t buffersize);

/* Commands */
#define MEMORYSTATUS_CMD_GET_PRIORITY_LIST            1
#define MEMORYSTATUS_CMD_SET_PRIORITY_PROPERTIES      2
#define MEMORYSTATUS_CMD_GET_JETSAM_SNAPSHOT          3
#define MEMORYSTATUS_CMD_GET_PRESSURE_STATUS          4
#define MEMORYSTATUS_CMD_SET_JETSAM_HIGH_WATER_MARK   5    /* Set active memory limit = inactive memory limit, both non-fatal    */
#define MEMORYSTATUS_CMD_SET_JETSAM_TASK_LIMIT          6    /* Set active memory limit = inactive memory limit, both fatal    */
#define MEMORYSTATUS_CMD_SET_MEMLIMIT_PROPERTIES      7    /* Set memory limits plus attributes independently            */
#define MEMORYSTATUS_CMD_GET_MEMLIMIT_PROPERTIES      8    /* Get memory limits plus attributes                    */
#define MEMORYSTATUS_CMD_PRIVILEGED_LISTENER_ENABLE   9    /* Set the task's status as a privileged listener w.r.t memory notifications  */
#define MEMORYSTATUS_CMD_PRIVILEGED_LISTENER_DISABLE  10   /* Reset the task's status as a privileged listener w.r.t memory notifications  */
#define MEMORYSTATUS_CMD_AGGRESSIVE_JETSAM_LENIENT_MODE_ENABLE  11   /* Enable the 'lenient' mode for aggressive jetsam. See comments in kern_memorystatus.c near the top. */
#define MEMORYSTATUS_CMD_AGGRESSIVE_JETSAM_LENIENT_MODE_DISABLE 12   /* Disable the 'lenient' mode for aggressive jetsam. */
#define MEMORYSTATUS_CMD_GET_MEMLIMIT_EXCESS          13   /* Compute how much a process's phys_footprint exceeds inactive memory limit */
#define MEMORYSTATUS_CMD_ELEVATED_INACTIVEJETSAMPRIORITY_ENABLE     14 /* Set the inactive jetsam band for a process to JETSAM_PRIORITY_ELEVATED_INACTIVE */
#define MEMORYSTATUS_CMD_ELEVATED_INACTIVEJETSAMPRIORITY_DISABLE     15 /* Reset the inactive jetsam band for a process to the default band (0)*/
#define MEMORYSTATUS_CMD_SET_PROCESS_IS_MANAGED       16   /* (Re-)Set state on a process that marks it as (un-)managed by a system entity e.g. assertiond */
#define MEMORYSTATUS_CMD_GET_PROCESS_IS_MANAGED       17   /* Return the 'managed' status of a process */
#define MEMORYSTATUS_CMD_SET_PROCESS_IS_FREEZABLE     18   /* Is the process eligible for freezing? Apps and extensions can pass in FALSE to opt out of freezing, i.e.,

Pseudo code

struct memorystatus_priority_entry memstatus[NUM_ENTRIES];
size_t count = sizeof(struct memorystatus_priority_entry) * NUM_ENTRIES;
// rc is the number of bytes written to the buffer, or negative on failure
int rc = memorystatus_control(MEMORYSTATUS_CMD_GET_PRIORITY_LIST, 0, 0, memstatus, count);
if (rc < 0) {
    NSLog(@"memorystatus_control");
    return;
}

int entry = 0;
for (; rc > 0; rc -= sizeof(struct memorystatus_priority_entry)) {
    printf("PID: %5d\tPriority:%2d\tUser Data: %llx\tLimit:%2d\tState:%s\n",
           memstatus[entry].pid,
           memstatus[entry].priority,
           memstatus[entry].user_data,
           memstatus[entry].limit,
           state_to_text(memstatus[entry].state));
    entry++;
}

The for loop prints the PID, priority, user data, limit, and state of every process (i.e. app). In the log, find the process with priority 10, which is the app running in the foreground. Why 10? Because of #define JETSAM_PRIORITY_FOREGROUND 10. Our goal is to get the foreground app's memory upper limit.
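Instead of reading the log by eye, the lookup can be done in code. The following C sketch filters a list of entries for the foreground band. It redeclares the structure shown earlier so that it compiles on its own, and find_foreground_entry is a hypothetical helper, not an XNU function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h> /* pid_t */

#define JETSAM_PRIORITY_FOREGROUND 10

/* Same layout as the structure quoted from kern_memorystatus.h above. */
typedef struct memorystatus_priority_entry {
    pid_t    pid;
    int32_t  priority;
    uint64_t user_data;
    int32_t  limit;
    uint32_t state;
} memorystatus_priority_entry_t;

/* Returns the first entry in the foreground band, or NULL if there is none. */
static const memorystatus_priority_entry_t *
find_foreground_entry(const memorystatus_priority_entry_t *entries, size_t count)
{
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (entries[i].priority == JETSAM_PRIORITY_FOREGROUND) {
            return &entries[i];
        }
    }
    return NULL;
}
```

The limit field of the returned entry is then the value of interest. On a real device the entries would come from memorystatus_control, which, as noted above, requires root access.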

4. How to determine OOM

Will an app receive a low memory warning before crashing in OOM?

Two groups of comparative experiments were done:

// 1: allocate on the main thread (in viewDidLoad)
NSMutableArray *array = [NSMutableArray array];
for (NSInteger index = 0; index < 10000000; index++) {
    NSString *filePath = [[NSBundle mainBundle] pathForResource:@"Info" ofType:@"plist"];
    NSData *data = [NSData dataWithContentsOfFile:filePath];
    [array addObject:data];
}

// 2: allocate on a background thread
// ViewController.m
- (void)viewDidLoad {
    [super viewDidLoad];
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSMutableArray *array = [NSMutableArray array];
        for (NSInteger index = 0; index < 10000000; index++) {
            NSString *filePath = [[NSBundle mainBundle] pathForResource:@"Info" ofType:@"plist"];
            NSData *data = [NSData dataWithContentsOfFile:filePath];
            [array addObject:data];
        }
    });
}

- (void)didReceiveMemoryWarning {
    NSLog(@"2");
}

// AppDelegate.m
- (void)applicationDidReceiveMemoryWarning:(UIApplication *)application {
    NSLog(@"1");
}

Phenomenon:

In the first experiment, which runs in viewDidLoad on the main thread, the system does not deliver a low memory warning and the app simply crashes: memory grows too fast and the main thread is busy.
In the second experiment, which runs on a background thread, the app does receive a low memory warning because memory is growing too fast. applicationDidReceiveMemoryWarning in the AppDelegate executes first, followed by didReceiveMemoryWarning in the current VC.

Conclusion:

Receiving a low memory warning does not necessarily lead to a crash, because the system has a 6-second window in which to determine whether memory usage drops. Conversely, an OOM is not necessarily preceded by a low memory warning.

5. Memory information collection

To locate the problem accurately, you need to dump all objects and their memory information. When memory usage approaches the system's upper limit, collect and record the required information, then upload it to the server through a data reporting mechanism for analysis and repair.

You also need to know exactly in which function each object was created in order to reconstruct the scene of the crash.
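Deciding when memory is "close to the upper limit" needs a concrete rule. One minimal option is a ratio check, sketched below in C. The function name should_dump_memory and the idea of a configurable trigger ratio are assumptions made for this illustration; the article does not prescribe a specific threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the footprint has reached trigger_ratio of the limit.
 * trigger_ratio is a hypothetical tuning knob, e.g. 0.8 for 80 percent.
 * A limit of 0 means "unknown", in which case no dump is triggered.
 */
static bool should_dump_memory(uint64_t footprint_bytes, uint64_t limit_bytes,
                               double trigger_ratio)
{
    assert(trigger_ratio > 0.0 && trigger_ratio <= 1.0);
    if (limit_bytes == 0) {
        return false;
    }
    return (double)footprint_bytes >= (double)limit_bytes * trigger_ratio;
}
```

The footprint and limit would come from phys_footprint and the limit estimate discussed earlier. A lower ratio leaves more headroom for the dump itself, which also consumes memory.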

According to the source code (libmalloc/malloc), memory allocation functions such as malloc and calloc use nano_zone by default. nano_zone serves allocations smaller than 256 B, and scalable_zone serves allocations larger than 256 B.

We mainly monitor large allocations. The malloc function calls malloc_zone_malloc, and calloc calls malloc_zone_calloc.

Every function that allocates memory through scalable_zone calls the malloc_logger function, because the system wants a single place to count and manage memory allocations. This design also follows the principle of funneling all calls through a single entry point.

void *
malloc(size_t size)
{
    void *retval;
    retval = malloc_zone_malloc(default_zone, size);
    if (retval == NULL) {
        errno = ENOMEM;
    }
    return retval;
}

void *
calloc(size_t num_items, size_t size)
{
    void *retval;
    retval = malloc_zone_calloc(default_zone, num_items, size);
    if (retval == NULL) {
        errno = ENOMEM;
    }
    return retval;
}

Let's first look at what default_zone is. The code is as follows:

typedef struct {
    malloc_zone_t malloc_zone;
    uint8_t pad[PAGE_MAX_SIZE - sizeof(malloc_zone_t)];
} virtual_default_zone_t;

static virtual_default_zone_t virtual_default_zone
__attribute__((section("__DATA,__v_zone")))
__attribute__((aligned(PAGE_MAX_SIZE))) = {
    NULL,
    NULL,
    default_zone_size,
    default_zone_malloc,
    default_zone_calloc,
    default_zone_valloc,
    default_zone_free,
    default_zone_realloc,
    default_zone_destroy,
    DEFAULT_MALLOC_ZONE_STRING,
    default_zone_batch_malloc,
    default_zone_batch_free,
    &default_zone_introspect,
    10,
    default_zone_memalign,
    default_zone_free_definite_size,
    default_zone_pressure_relief,
    default_zone_malloc_claimed_address,
};

static malloc_zone_t *default_zone = &virtual_default_zone.malloc_zone;

static void *
default_zone_malloc(malloc_zone_t *zone, size_t size)
{
    zone = runtime_default_zone();
    
    return zone->malloc(zone, size);
}


MALLOC_ALWAYS_INLINE
static inline malloc_zone_t *
runtime_default_zone() {
    return (lite_zone) ? lite_zone : inline_malloc_default_zone();
}

You can see that the default_zone is initialized in this way

static inline malloc_zone_t *
inline_malloc_default_zone(void)
{
    _malloc_initialize_once();
    // malloc_report(ASL_LEVEL_INFO, "In inline_malloc_default_zone with %d %d\n", malloc_num_zones, malloc_has_debug_zone);
    return malloc_zones[0];
}

The call chain that follows is _malloc_initialize -> create_scalable_zone -> create_scalable_szone. In the end an object of type szone_t is created, and a type cast turns it into our default_zone.

malloc_zone_t *
create_scalable_zone(size_t initial_size, unsigned debug_flags) {
    return (malloc_zone_t *) create_scalable_szone(initial_size, debug_flags);
}

void *
malloc_zone_malloc(malloc_zone_t *zone, size_t size)
{
    MALLOC_TRACE(TRACE_malloc | DBG_FUNC_START, (uintptr_t)zone, size, 0, 0);

    void *ptr;
    if (malloc_check_start && (malloc_check_counter++ >= malloc_check_start)) {
        internal_check();
    }
    if (size > MALLOC_ABSOLUTE_MAX_SIZE) {
        return NULL;
    }

    ptr = zone->malloc(zone, size);
    // After the zone has allocated the memory, record it through malloc_logger
    if (malloc_logger) {
        malloc_logger(MALLOC_LOG_TYPE_ALLOCATE | MALLOC_LOG_TYPE_HAS_ZONE, (uintptr_t)zone, (uintptr_t)size, 0, (uintptr_t)ptr, 0);
    }

    MALLOC_TRACE(TRACE_malloc | DBG_FUNC_END, (uintptr_t)zone, size, (uintptr_t)ptr, 0);
    return ptr;
}

The allocation itself is done by zone->malloc, which, according to the analysis above, is the malloc implementation stored in the szone_t object.

After the szone is created, a series of initializations follows:

// Initialize the security token.
szone->cookie = (uintptr_t)malloc_entropy[0];

szone->basic_zone.version = 12;
szone->basic_zone.size = (void *)szone_size;
szone->basic_zone.malloc = (void *)szone_malloc;
szone->basic_zone.calloc = (void *)szone_calloc;
szone->basic_zone.valloc = (void *)szone_valloc;
szone->basic_zone.free = (void *)szone_free;
szone->basic_zone.realloc = (void *)szone_realloc;
szone->basic_zone.destroy = (void *)szone_destroy;
szone->basic_zone.batch_malloc = (void *)szone_batch_malloc;
szone->basic_zone.batch_free = (void *)szone_batch_free;
szone->basic_zone.introspect = (struct malloc_introspection_t *)&szone_introspect;
szone->basic_zone.memalign = (void *)szone_memalign;
szone->basic_zone.free_definite_size = (void *)szone_free_definite_size;
szone->basic_zone.pressure_relief = (void *)szone_pressure_relief;
szone->basic_zone.claimed_address = (void *)szone_claimed_address;

Other functions that allocate memory through scalable_zone work in a similar way, so large allocations end up calling the malloc_logger function no matter how the outer functions wrap them. We can therefore use fishhook to hook this function, record the allocations, and upload them to the server through a data reporting mechanism for analysis and repair.

// For logging VM allocation and deallocation, arg1 here
// is the mach_port_name_t of the target task in which the
// alloc or dealloc is occurring. For example, for mmap()
// that would be mach_task_self(), but for a cross-task-capable
// call such as mach_vm_map(), it is the target task.

typedef void (malloc_logger_t)(uint32_t type, uintptr_t arg1, uintptr_t arg2, uintptr_t arg3, uintptr_t result, uint32_t num_hot_frames_to_skip);

extern malloc_logger_t *__syscall_logger;

When the malloc_logger and __syscall_logger function pointers are not NULL, memory allocation and release through malloc/free and vm_allocate/vm_deallocate is reported to the upper layer through these two pointers. This is also how the malloc stack memory debugging tool is implemented. With these two function pointers we can easily record the allocations of currently live objects, including the allocation size and the allocation stack. The allocation stack can be captured with the backtrace function, but the captured addresses are runtime virtual addresses, which cannot be used directly to look up symbols in the dSYM symbol table. So we also record the slide (offset) of each image when it is loaded, so that symbol table address = stack address - slide.
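The bookkeeping side of such a hook can be sketched in portable C. The callback below takes the same parameter list as the malloc_logger_t typedef quoted above, but everything else is an assumption made for illustration: the SKETCH_ flag values are defined locally rather than taken from libmalloc's headers, a free is assumed to pass the freed pointer in arg2, realloc is ignored, and the fixed-size table is neither thread-safe nor suitable for production.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values local to this sketch; the real ones live in libmalloc's headers. */
#define SKETCH_LOG_TYPE_ALLOCATE   2u
#define SKETCH_LOG_TYPE_DEALLOCATE 4u

#define MAX_LIVE 1024

typedef struct {
    uintptr_t address;
    size_t    size;
} live_alloc_t;

static live_alloc_t g_live[MAX_LIVE];
static size_t g_live_count;

static void record_alloc(uintptr_t address, size_t size)
{
    if (address == 0 || g_live_count == MAX_LIVE) {
        return; /* failed allocation or table full: drop the record */
    }
    g_live[g_live_count].address = address;
    g_live[g_live_count].size = size;
    g_live_count++;
}

static void record_free(uintptr_t address)
{
    for (size_t i = 0; i < g_live_count; i++) {
        if (g_live[i].address == address) {
            g_live[i] = g_live[g_live_count - 1]; /* swap-remove */
            g_live_count--;
            return;
        }
    }
}

/* Assumption: on allocation arg2 is the size and result the pointer,
 * as in the malloc_zone_malloc call above; on free arg2 is the pointer. */
static void sketch_logger(uint32_t type, uintptr_t arg1, uintptr_t arg2,
                          uintptr_t arg3, uintptr_t result,
                          uint32_t num_hot_frames_to_skip)
{
    (void)arg1; (void)arg3; (void)num_hot_frames_to_skip;
    if (type & SKETCH_LOG_TYPE_ALLOCATE) {
        record_alloc(result, (size_t)arg2);
    } else if (type & SKETCH_LOG_TYPE_DEALLOCATE) {
        record_free(arg2);
    }
}

/* Total size of all allocations that are still live. */
static size_t live_bytes(void)
{
    size_t total = 0;
    assert(g_live_count <= MAX_LIVE);
    for (size_t i = 0; i < g_live_count; i++) {
        total += g_live[i].size;
    }
    return total;
}
```

A real recorder would also store the captured stack for each allocation and would only keep allocations above some size threshold, since the goal here is to monitor large allocations.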

Small tips:

ASLR (address space layout randomization) is a computer security technique that guards against exploitation of memory corruption vulnerabilities. It places key data areas at random positions in the process's address space, so that an attacker cannot reliably jump to a particular location in memory to call a function. This mechanism is present in most modern operating systems.

add: the actual (runtime) address of the function's implementation;

vm_add: the virtual address of the function;

slide: under ASLR, an image is loaded into process memory at a random offset from its virtual address, and this offset differs for each Mach-O image. vm_add + slide = add. *(base + offset) = imp.
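The relation vm_add + slide = add can be turned around to map a captured stack address back to a symbol table address. The C sketch below shows that step only. The image_sketch_t type and the symbol_address_for function are hypothetical, and the numbers in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kept for every loaded image. */
typedef struct {
    uint64_t load_address; /* where the image starts at runtime */
    uint64_t size;         /* size of the image in bytes */
    uint64_t slide;        /* random offset applied at load time */
} image_sketch_t;

/*
 * If runtime_address lies inside the image, stores
 * runtime_address - slide in *symbol_address and returns 1.
 * Returns 0 if the address belongs to some other image.
 */
static int symbol_address_for(const image_sketch_t *image,
                              uint64_t runtime_address,
                              uint64_t *symbol_address)
{
    assert(image != NULL && symbol_address != NULL);
    if (runtime_address < image->load_address ||
        runtime_address >= image->load_address + image->size) {
        return 0;
    }
    *symbol_address = runtime_address - image->slide;
    return 1;
}
```

For an image loaded at 0x100004000 with a slide of 0x4000, the runtime address 0x100005234 maps back to 0x100001234, which is the address to look up in the symbol table.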

Tencent has open-sourced its OOM location scheme, OOMDetector, so with a ready-made wheel available it is enough to use it well. The memory monitoring approach is therefore: find the upper memory limit the system gives the app, then dump the memory state when usage approaches that limit. Assemble the basic data into a well-formed report and upload it to the server according to a data reporting strategy. The server consumes the data, analyzes it, and generates reports, and client engineers analyze problems based on those reports. Data from each project is sent to that project's owner and developers by email, SMS, Enterprise WeChat, and so on. (In serious cases the developer is phoned directly and the result of each step is followed up with the supervisor.) After the problem is analyzed and fixed, either release a new version or ship a hotfix.

6. What can we do about memory during development

Image scaling

See WWDC 2018 Session 416, iOS Memory Deep Dive. Scaling an image directly with UIImage consumes memory to read the file while decoding it, and the intermediate bitmap it generates also consumes a lot of memory. ImageIO has neither drawback: it only takes up the memory of the final image size.

Two groups of contrast experiments were done: showing an image to the App

#import <ImageIO/ImageIO.h>

// Method 1: 19.6 MB
UIImage *imageResult = [self scaleImage:[UIImage imageNamed:@"test"] newSize:CGSizeMake(self.view.frame.size.width, self.view.frame.size.height)];
self.imageView.image = imageResult;

// Method 2: 14 MB
NSData *data = UIImagePNGRepresentation([UIImage imageNamed:@"test"]);
UIImage *imageResult = [self scaledImageWithData:data withSize:CGSizeMake(self.view.frame.size.width, self.view.frame.size.height) scale:3 orientation:UIImageOrientationUp];
self.imageView.image = imageResult;

// Scales by drawing the decoded image into a new bitmap context.
- (UIImage *)scaleImage:(UIImage *)image newSize:(CGSize)newSize {
    UIGraphicsBeginImageContextWithOptions(newSize, NO, 0);
    [image drawInRect:CGRectMake(0, 0, newSize.width, newSize.height)];
    UIImage *newImage = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    return newImage;
}

// Scales with ImageIO by asking for a thumbnail of at most maxPixelSize.
- (UIImage *)scaledImageWithData:(NSData *)data withSize:(CGSize)size scale:(CGFloat)scale orientation:(UIImageOrientation)orientation {
    CGFloat maxPixelSize = MAX(size.width, size.height);
    CGImageSourceRef sourceRef = CGImageSourceCreateWithData((__bridge CFDataRef)data, NULL);
    NSDictionary *options = @{
        (__bridge id)kCGImageSourceCreateThumbnailFromImageAlways : (__bridge id)kCFBooleanTrue,
        (__bridge id)kCGImageSourceThumbnailMaxPixelSize : [NSNumber numberWithFloat:maxPixelSize]
    };
    CGImageRef imageRef = CGImageSourceCreateThumbnailAtIndex(sourceRef, 0, (__bridge CFDictionaryRef)options);
    UIImage *resultImage = [UIImage imageWithCGImage:imageRef scale:scale orientation:orientation];
    CGImageRelease(imageRef);
    CFRelease(sourceRef);
    return resultImage;
}

You can see that using ImageIO takes less memory than using UIImage direct scaling.

Use AutoReleasePool properly

We know that autoreleased objects are released when the autorelease pool drains at the end of a RunLoop iteration. Under ARC, code that allocates memory continuously, such as a long loop, needs an explicit @autoreleasepool block to keep memory from spiking into an OOM within a short time.

Contrast experiment

// 1: no autorelease pool inside the loop
NSMutableArray *array = [NSMutableArray array];
for (NSInteger index = 0; index < 10000000; index++) {
    NSString *indexString = [NSString stringWithFormat:@"%zd", index];
    NSString *resultString = [NSString stringWithFormat:@"%zd-%@", index, indexString];
    [array addObject:resultString];
}

// 2: one autorelease pool per iteration
NSMutableArray *array = [NSMutableArray array];
for (NSInteger index = 0; index < 10000000; index++) {
    @autoreleasepool {
        NSString *indexString = [NSString stringWithFormat:@"%zd", index];
        NSString *resultString = [NSString stringWithFormat:@"%zd-%@", index, indexString];
        [array addObject:resultString];
    }
}

Experiment 1 consumed 739.6M of memory, while Experiment 2 consumed 587M of memory.

UIGraphicsBeginImageContext and UIGraphicsEndImageContext must appear in pairs, otherwise the context leaks. Xcode's Analyze also catches this kind of problem.
Whether opening a web page or executing JS, use WKWebView. UIWebView uses a lot of memory, which raises the chance of an app OOM, whereas WKWebView is a multi-process component: network loading and UI rendering happen in other processes, so its memory overhead is lower than UIWebView's.
When building an SDK or app, prefer NSCache over NSMutableDictionary for cache-related scenarios. NSCache allocates purgeable memory, which the system can free automatically. Combining NSCache with NSPurgeableData lets the system reclaim memory as appropriate and remove objects during memory cleanup.

Other development habits are not described one by one here. Good habits and code awareness come from paying attention in daily practice.

7. Status quo and improvements

After using a number of the industry's best memory monitoring tools, I found some problems with MLeaksFinder, OOMDetector, FBRetainCycleDetector, and others. For example, MLeaksFinder produces false positives because it detects memory leaks purely from VC push and pop behavior. FBRetainCycleDetector traverses objects depth-first, which is costly and affects app performance. OOMDetector lacks an appropriate trigger point.

There are two ways of thinking:

Combine MLeaksFinder and FBRetainCycleDetector to improve accuracy.
Draw on Toutiao's implementation: an online scheme based on memory snapshot technology, which they call the online Memory Graph. (Quoted below.)

Retain-cycle detection can only handle part of the circular reference problem, whereas memory problems are usually more complex: memory accumulation, root leaks, and C/C++ layer problems cannot be solved this way.

A scheme based on clustering allocation stack information has to run permanently, which consumes a lot of memory, CPU, and other resources. It cannot target only the users who have memory problems and has to cast a wide net, which hurts the user experience considerably. At the same time, memory allocated from some very common stacks cannot be traced to the actual usage scenario, and common leaks such as circular references cannot be analyzed.

The core principle is: scan all the Dirty memory in the process, and build a directed graph of reference relationship between memory nodes by the address value of other memory nodes stored in the memory node.
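The scanning idea in the quoted principle can be sketched in plain C. This is only an illustration under simplifying assumptions: the `MemNode` type and both function names are hypothetical, the list of live allocations is handed in by the caller, and every pointer-sized word is treated as a potential reference. A real Memory Graph would enumerate VM regions and malloc zones instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one live allocation: the range [base, base + size). */
typedef struct {
    uintptr_t base;
    size_t size;
} MemNode;

/* Return the index of the node whose range contains addr, or -1 if none does. */
static int node_containing(const MemNode *nodes, int count, uintptr_t addr) {
    for (int i = 0; i < count; i++) {
        if (addr >= nodes[i].base && addr - nodes[i].base < nodes[i].size) {
            return i;
        }
    }
    return -1;
}

/*
 * Scan every pointer-sized word of every node. If a word holds an address that
 * falls inside another node, record a directed edge from -> to in `edges`,
 * a count * count adjacency matrix in row-major order.
 * Returns the number of distinct edges found.
 */
static int build_reference_graph(const MemNode *nodes, int count, unsigned char *edges) {
    int total = 0;
    memset(edges, 0, (size_t)count * (size_t)count);
    for (int from = 0; from < count; from++) {
        size_t words = nodes[from].size / sizeof(uintptr_t);
        for (size_t w = 0; w < words; w++) {
            uintptr_t value;
            memcpy(&value, (const void *)(nodes[from].base + w * sizeof(uintptr_t)), sizeof value);
            int to = node_containing(nodes, count, value);
            if (to >= 0 && to != from && !edges[from * count + to]) {
                edges[from * count + to] = 1;
                total++;
            }
        }
    }
    return total;
}
```

Given three buffers that point at each other in a ring, the matrix ends up with exactly three edges, which is the kind of cycle a later graph analysis would report as a circular reference.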

The full explanation can be seen here. For those interested in the implementation details of the Memory Graph, see this article.

5. APP network monitoring

The mobile network environment has always been complex: WiFi, 2G, 3G, 4G, 5G, and so on. Users may switch between these network types while using the APP, which is one difference between mobile networks and traditional networks, known as "Connection Migration". On top of that, there are problems such as slow DNS resolution, a high failure rate, and carrier hijacking. When users have a poor experience for any of these reasons, clear monitoring is needed before the network situation can be improved.

1. APP network request process

APP will generally go through the following key steps to send a network request:

DNS resolution

The Domain Name System is essentially a distributed database that maps domain names and IP addresses to each other, making it easier for people to access the Internet. The local DNS cache is queried first; if that lookup fails, the DNS server is queried, which may pass through many nodes and involves both recursive and iterative queries. Carriers can also misbehave. One case is carrier hijacking: you visit a web page in an App and see an ad unrelated to the content. Another is that your request is sent to a very distant base station for DNS resolution, which makes DNS resolution slow and the App's network inefficient. HTTPDNS is the usual solution to these DNS problems.
TCP three-way handshake

Check out this article on why the TCP handshake takes three steps instead of two or four.
TLS handshake

For HTTPS requests, there is also a TLS handshake, which is the process of key negotiation.
Send the request

After the connection is established, the request can be sent. The request start time can be recorded at this point.
Wait for the response

Wait for the server to return a response. This time depends on the size of the resource and is the most time-consuming stage of the network request.
Return the response

The server returns the response to the client, which determines from the status code in the HTTP headers whether the request succeeded, was served from cache, or needs to be redirected.
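The last step, deciding the outcome from the status code, can be sketched as a small classifier. The enum and function names here are hypothetical, and treating 304 as the "cached" case is an assumption about what the text means by cached; a monitoring SDK may prefer different buckets.

```c
/* Hypothetical outcome buckets for a finished HTTP request. */
typedef enum {
    RESPONSE_SUCCESS,
    RESPONSE_CACHED,
    RESPONSE_REDIRECTED,
    RESPONSE_CLIENT_ERROR,
    RESPONSE_SERVER_ERROR,
    RESPONSE_OTHER
} ResponseKind;

/* Map an HTTP status code to an outcome bucket. */
static ResponseKind classify_status(int status) {
    if (status == 304) {
        return RESPONSE_CACHED;          /* Not Modified: the cached copy is still valid */
    }
    if (status >= 200 && status < 300) {
        return RESPONSE_SUCCESS;
    }
    if (status >= 300 && status < 400) {
        return RESPONSE_REDIRECTED;      /* e.g. 301, 302, 307 */
    }
    if (status >= 400 && status < 500) {
        return RESPONSE_CLIENT_ERROR;
    }
    if (status >= 500 && status < 600) {
        return RESPONSE_SERVER_ERROR;
    }
    return RESPONSE_OTHER;               /* informational or out-of-range codes */
}
```

Reporting the bucket alongside the raw code makes it easy for the server side to aggregate error rates without re-parsing every status code.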

2. Monitoring principle

Name	Description
NSURLConnection	Deprecated. Simple to use
NSURLSession	Introduced in iOS 7.0, with more powerful features
CFNetwork	The layer underneath the NSURL APIs, implemented in pure C

The hierarchy of iOS network framework is as follows:

The iOS network stack currently consists of four layers. BSD Sockets and SecureTransport are at the bottom. The next layer up is CFNetwork. NSURLSession, NSURLConnection, and WebView are implemented in Objective-C and call CFNetwork. AFNetworking is an application-layer framework built on NSURLSession and NSURLConnection.

At present, the industry mainly monitors the network in two ways: through NSURLProtocol, or through Hook. Several ways of monitoring network requests are described below, each with its own advantages and disadvantages.

2.1 Scheme 1: NSURLProtocol monitors APP network requests

As an upper-layer interface, NSURLProtocol is relatively simple to use, but it belongs to the URL Loading System, so its support for application protocols is limited. It supports application-layer protocols such as FTP, HTTP, and HTTPS, but other protocols cannot be monitored. If you monitor the underlying network library, CFNetwork, you do not have this restriction.

The specific approach with NSURLProtocol is described in this article: subclass the abstract class, implement the required methods, and issue the network request yourself so that it can be monitored.

Since iOS 10, a new delegate method has been added to NSURLSessionTaskDelegate:

/*
 * Sent when complete statistics information has been collected for the task.
 */
- (void)URLSession:(NSURLSession *)session task:(NSURLSessionTask *)task didFinishCollectingMetrics:(NSURLSessionTaskMetrics *)metrics API_AVAILABLE(macosx(10.12), ios(10.0), watchos(3.0), tvos(10.0));

NSURLSessionTaskMetrics gives you indicators of the network situation. Its parameters are as follows:

@interface NSURLSessionTaskMetrics : NSObject

/*
 * transactionMetrics array contains the metrics collected for every request/response transaction created during the task execution.
 */
@property (copy, readonly) NSArray<NSURLSessionTaskTransactionMetrics *> *transactionMetrics;

/*
 * Interval from the task creation time to the task completion time.
 * Task creation time is the time when the task was instantiated.
 * Task completion time is the time when the task is about to change its internal state to completed.
 */
@property (copy, readonly) NSDateInterval *taskInterval;

/*
 * redirectCount is the number of redirects that were recorded.
 */
@property (assign, readonly) NSUInteger redirectCount;

- (instancetype)init API_DEPRECATED("Not supported", macos(10.12,10.15), ios(10.0,13.0), watchos(3.0,6.0), tvos(10.0,13.0));
+ (instancetype)new API_DEPRECATED("Not supported", macos(10.12,10.15), ios(10.0,13.0), watchos(3.0,6.0), tvos(10.0,13.0));

@end

Where: taskInterval is the total time from task creation to completion. Task creation time is the time when the task is instantiated, and task completion time is the time when the task is about to change its internal state to completed. redirectCount is the number of redirects recorded. The transactionMetrics array contains the metrics collected for each request/response transaction during the execution of the task, with the following parameters:

/*
 * This class defines the performance metrics collected for a request/response transaction during the task execution.
 */
API_AVAILABLE(macosx(10.12), ios(10.0), watchos(3.0), tvos(10.0))
@interface NSURLSessionTaskTransactionMetrics : NSObject

/*
 * Represents the transaction request.
 */
@property (copy, readonly) NSURLRequest *request;

/*
 * Represents the transaction response. Can be nil if error occurred and no response was generated.
 */
@property (nullable, copy, readonly) NSURLResponse *response;

/*
 * For all NSDate metrics below, if that aspect of the task could not be completed, then the corresponding “EndDate” metric will be nil.
 * For example, if a name lookup was started but the name lookup timed out, failed, or the client canceled the task before the name could be resolved -- then while domainLookupStartDate may be set, domainLookupEndDate will be nil along with all later metrics.
 */

/*
 * Time at which the client starts the request, whether the resource is fetched from the server or from the local cache.
 * fetchStartDate returns the time when the user agent started fetching the resource, whether or not the resource was retrieved from the server or local resources.
 *
 * The following metrics will be set to nil, if a persistent connection was used or the resource was retrieved from local resources:
 *
 *   domainLookupStartDate
 *   domainLookupEndDate
 *   connectStartDate
 *   connectEndDate
 *   secureConnectionStartDate
 *   secureConnectionEndDate
 */
@property (nullable, copy, readonly) NSDate *fetchStartDate;

/*
 * domainLookupStartDate returns the time immediately before the user agent started the name lookup for the resource. (Time when DNS resolution starts.)
 */
@property (nullable, copy, readonly) NSDate *domainLookupStartDate;

/*
 * domainLookupEndDate returns the time after the name lookup was completed. (Time when DNS resolution completes.)
 */
@property (nullable, copy, readonly) NSDate *domainLookupEndDate;

/*
 * connectStartDate is the time immediately before the user agent started establishing the connection to the server.
 *
 * For example, this would correspond to the time immediately before the user agent started trying to establish the TCP connection. (Time when the client starts establishing the TCP connection with the server.)
 */
@property (nullable, copy, readonly) NSDate *connectStartDate;

/*
 * If an encrypted connection was used, secureConnectionStartDate is the time immediately before the user agent started the security handshake to secure the current connection. (Time when the TLS handshake of an HTTPS request starts.)
 *
 * For example, this would correspond to the time immediately before the user agent started the TLS handshake. 
 *
 * If an encrypted connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSDate *secureConnectionStartDate;

/*
 * If an encrypted connection was used, secureConnectionEndDate is the time immediately after the security handshake completed. (Time when the TLS handshake of an HTTPS request ends.)
 *
 * If an encrypted connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSDate *secureConnectionEndDate;

/*
 * connectEndDate is the time immediately after the user agent finished establishing the connection to the server, including completion of security-related and other handshakes. (Time when the client finishes establishing the TCP connection with the server, including the TLS handshake.)
 */
@property (nullable, copy, readonly) NSDate *connectEndDate;

/*
 * requestStartDate is the time immediately before the user agent started requesting the source, regardless of whether the resource was retrieved from the server or local resources.
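 * Time when the client request starts; it can be understood as the time at which the first byte of the HTTP request header starts to be transmitted.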
 *
 * For example, this would correspond to the time immediately before the user agent sent an HTTP GET request.
 */
@property (nullable, copy, readonly) NSDate *requestStartDate;

/*
 * requestEndDate is the time immediately after the user agent finished requesting the source, regardless of whether the resource was retrieved from the server or local resources.
 * Time when the client request ends; it can be understood as the time at which the last byte of the HTTP request finishes transmitting.
 *
 * For example, this would correspond to the time immediately after the user agent finished sending the last byte of the request.
 */
@property (nullable, copy, readonly) NSDate *requestEndDate;

/*
 * responseStartDate is the time immediately after the user agent received the first byte of the response from the server or from local resources.
 * Time when the client receives the first byte of the response from the server.
 *
 * For example, this would correspond to the time immediately after the user agent received the first byte of an HTTP response.
 */
@property (nullable, copy, readonly) NSDate *responseStartDate;

/*
 * responseEndDate is the time immediately after the user agent received the last byte of the resource. (Time when the client receives the last byte of the response from the server.)
 */
@property (nullable, copy, readonly) NSDate *responseEndDate;

/*
 * The network protocol used to fetch the resource, as identified by the ALPN Protocol ID Identification Sequence [RFC7301].
 * E.g., h2, http/1.1, spdy/3.1.
 * Network protocol name, e.g. http/1.1, spdy/3.1.
 *
 * When a proxy is configured AND a tunnel connection is established, then this attribute returns the value for the tunneled protocol.
 *
 * For example:
 * If no proxy were used, and HTTP/2 was negotiated, then h2 would be returned.
 * If HTTP/1.1 were used to the proxy, and the tunneled connection was HTTP/2, then h2 would be returned.
 * If HTTP/1.1 were used to the proxy, and there were no tunnel, then http/1.1 would be returned.
 *
 */
@property (nullable, copy, readonly) NSString *networkProtocolName;

/*
 * This property is set to YES if a proxy connection was used to fetch the resource.
 * Whether the connection used a proxy.
 */
@property (assign, readonly, getter=isProxyConnection) BOOL proxyConnection;

/*
 * This property is set to YES if a persistent connection was used to fetch the resource.
 * Whether an existing connection was reused.
 */
@property (assign, readonly, getter=isReusedConnection) BOOL reusedConnection;

/*
 * Indicates whether the resource was loaded, pushed or retrieved from the local cache.
 * Where the resource was fetched from.
 */
@property (assign, readonly) NSURLSessionTaskMetricsResourceFetchType resourceFetchType;

/*
 * countOfRequestHeaderBytesSent is the number of bytes transferred for request header.
 * Number of bytes in the request header.
 */
@property (readonly) int64_t countOfRequestHeaderBytesSent API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * countOfRequestBodyBytesSent is the number of bytes transferred for request body.
 * Number of bytes in the request body.
 * It includes protocol-specific framing, transfer encoding, and content encoding.
 */
@property (readonly) int64_t countOfRequestBodyBytesSent API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * countOfRequestBodyBytesBeforeEncoding is the size of upload body data, file, or stream.
 * Size of the uploaded body data, file, or stream.
 */
@property (readonly) int64_t countOfRequestBodyBytesBeforeEncoding API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * countOfResponseHeaderBytesReceived is the number of bytes transferred for response header.
 * Number of bytes in the response header.
 */
@property (readonly) int64_t countOfResponseHeaderBytesReceived API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * countOfResponseBodyBytesReceived is the number of bytes transferred for response body.
 * Number of bytes in the response body.
 * It includes protocol-specific framing, transfer encoding, and content encoding.
 */
@property (readonly) int64_t countOfResponseBodyBytesReceived API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * countOfResponseBodyBytesAfterDecoding is the size of data delivered to your delegate or completion handler.
 * Size of the data handed to the delegate method or the completion handler.
 */
@property (readonly) int64_t countOfResponseBodyBytesAfterDecoding API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * localAddress is the IP address string of the local interface for the connection.
 * IP address of the local interface for the current connection.
 *
 * For multipath protocols, this is the local address of the initial flow.
 *
 * If a connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSString *localAddress API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * localPort is the port number of the local interface for the connection.
 * Local port number for the current connection.
 *
 * For multipath protocols, this is the local port of the initial flow.
 *
 * If a connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSNumber *localPort API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * remoteAddress is the IP address string of the remote interface for the connection.
 * Remote IP address for the current connection.
 *
 * For multipath protocols, this is the remote address of the initial flow.
 *
 * If a connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSString *remoteAddress API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * remotePort is the port number of the remote interface for the connection.
 * Remote port number for the current connection.
 *
 * For multipath protocols, this is the remote port of the initial flow.
 *
 * If a connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSNumber *remotePort API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * negotiatedTLSProtocolVersion is the TLS protocol version negotiated for the connection.
 * TLS protocol version negotiated for the connection.
 * It is a 2-byte sequence in host byte order.
 *
 * Please refer to tls_protocol_version_t enum in Security/SecProtocolTypes.h
 *
 * If an encrypted connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSNumber *negotiatedTLSProtocolVersion API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * negotiatedTLSCipherSuite is the TLS cipher suite negotiated for the connection.
 * TLS cipher suite negotiated for the connection.
 * It is a 2-byte sequence in host byte order.
 *
 * Please refer to tls_ciphersuite_t enum in Security/SecProtocolTypes.h
 *
 * If an encrypted connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSNumber *negotiatedTLSCipherSuite API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * Whether the connection is established over a cellular interface.
 * Whether the connection was established over a cellular network.
 */
@property (readonly, getter=isCellular) BOOL cellular API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * Whether the connection is established over an expensive interface.
 * Whether the connection was established over an expensive interface.
 */
@property (readonly, getter=isExpensive) BOOL expensive API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * Whether the connection is established over a constrained interface.
 * Whether the connection was established over a constrained interface.
 */
@property (readonly, getter=isConstrained) BOOL constrained API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * Whether a multipath protocol is successfully negotiated for the connection.
 * Whether a multipath protocol was successfully negotiated for the connection.
 */
@property (readonly, getter=isMultipath) BOOL multipath API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));


- (instancetype)init API_DEPRECATED("Not supported", macos(10.12,10.15), ios(10.0,13.0), watchos(3.0,6.0), tvos(10.0,13.0));
+ (instancetype)new API_DEPRECATED("Not supported", macos(10.12,10.15), ios(10.0,13.0), watchos(3.0,6.0), tvos(10.0,13.0));

@end

Simple network monitoring code:

// Basic monitoring information
@interface NetworkMonitorBaseDataModel : NSObject
// URL of the request
@property (nonatomic, strong) NSString *requestUrl;
// Request headers
@property (nonatomic, strong) NSArray *requestHeaders;
// Response headers
@property (nonatomic, strong) NSArray *responseHeaders;
// Query parameters of a GET request
@property (nonatomic, strong) NSString *getRequestParams;
// HTTP method, e.g. POST
@property (nonatomic, strong) NSString *httpMethod;
// Protocol name, e.g. http1.0 / http1.1 / http2.0
@property (nonatomic, strong) NSString *httpProtocol;
// Whether a proxy is used
@property (nonatomic, assign) BOOL useProxy;
// IP address after DNS resolution
@property (nonatomic, strong) NSString *ip;
@end

// Monitoring data model
@interface NetworkMonitorDataModel : NetworkMonitorBaseDataModel
// Time at which the client issued the request
@property (nonatomic, assign) UInt64 requestDate;
// Wait between the start of the request and the start of DNS resolution, in ms
@property (nonatomic, assign) int waitDNSTime;
// DNS resolution time
@property (nonatomic, assign) int dnsLookupTime;
// TCP three-way handshake time, in ms
@property (nonatomic, assign) int tcpTime;
// SSL handshake time
@property (nonatomic, assign) int sslTime;
// Duration of one complete request, in ms
@property (nonatomic, assign) int requestTime;
// HTTP response code
@property (nonatomic, assign) NSUInteger httpCode;
// Number of bytes sent
@property (nonatomic, assign) UInt64 sendBytes;
// Number of bytes received
@property (nonatomic, assign) UInt64 receiveBytes;
@end

// Error information model
@interface NetworkMonitorErrorModel : NetworkMonitorBaseDataModel
// Error code
@property (nonatomic, assign) NSInteger errorCode;
// Number of errors
@property (nonatomic, assign) NSUInteger errCount;
// Exception name
@property (nonatomic, strong) NSString *exceptionName;
// Exception details
@property (nonatomic, strong) NSString *exceptionDetail;
// Exception stack trace
@property (nonatomic, strong) NSString *stackTrace;
@end

  
// Subclass the NSURLProtocol abstract class, implement the required methods, and proxy the network request
@interface CustomURLProtocol () <NSURLSessionTaskDelegate>

@property (nonatomic, strong) NSURLSessionDataTask *dataTask;
@property (nonatomic, strong) NSOperationQueue *sessionDelegateQueue;
@property (nonatomic, strong) NetworkMonitorDataModel *dataModel;
@property (nonatomic, strong) NetworkMonitorErrorModel *errModel;

@end

// Issue the network request with NSURLSessionDataTask
- (void)startLoading {
    NSURLSessionConfiguration *configuration = [NSURLSessionConfiguration defaultSessionConfiguration];
    // Serial queue on which the session delivers its delegate callbacks
    self.sessionDelegateQueue = [[NSOperationQueue alloc] init];
    self.sessionDelegateQueue.maxConcurrentOperationCount = 1;
    self.sessionDelegateQueue.name = @"com.networkMonitor.session.queue";
    NSURLSession *session = [NSURLSession sessionWithConfiguration:configuration
                                                          delegate:self
                                                     delegateQueue:self.sessionDelegateQueue];
    self.dataTask = [session dataTaskWithRequest:self.request];
    [self.dataTask resume];
}

#pragma mark - NSURLSessionTaskDelegate
- (void)URLSession:(NSURLSession *)session task:(NSURLSessionTask *)task didCompleteWithError:(NSError *)error {
    if (error) {
        [self.client URLProtocol:self didFailWithError:error];

        // Record the failed request in the error model
        NSURLRequest *request = task.currentRequest;
        if (request) {
            self.errModel.requestUrl = request.URL.absoluteString;
            self.errModel.httpMethod = request.HTTPMethod;
            self.errModel.getRequestParams = request.URL.query;
        }
        self.errModel.errorCode = error.code;
        self.errModel.exceptionName = error.domain;
        self.errModel.exceptionDetail = error.description;
        // Upload the Network data to the data reporting component. Data reporting is covered in [Building a powerful, flexible, and configurable data reporting component](https://github.com/FantasticLBP/knowledge-kit/blob/master/Chapter1%20-%20iOS/1.80.md)
    } else {
        [self.client URLProtocolDidFinishLoading:self];
    }
    self.dataTask = nil;
}


- (void)URLSession:(NSURLSession *)session task:(NSURLSessionTask *)task didFinishCollectingMetrics:(NSURLSessionTaskMetrics *)metrics {
    // @available must be the only condition of the if statement to guard availability
    if (@available(iOS 10.0, *)) {
        if ([metrics.transactionMetrics count] == 0) {
            return;
        }
        [metrics.transactionMetrics enumerateObjectsUsingBlock:^(NSURLSessionTaskTransactionMetrics *_Nonnull obj, NSUInteger idx, BOOL *_Nonnull stop) {
            if (obj.resourceFetchType == NSURLSessionTaskMetricsResourceFetchTypeNetworkLoad) {
                if (obj.fetchStartDate) {
                    self.dataModel.requestDate = [obj.fetchStartDate timeIntervalSince1970] * 1000;
                }
                if (obj.domainLookupStartDate && obj.domainLookupEndDate) {
                    if (obj.fetchStartDate) {
                        self.dataModel.waitDNSTime = ceil([obj.domainLookupStartDate timeIntervalSinceDate:obj.fetchStartDate] * 1000);
                    }
                    self.dataModel.dnsLookupTime = ceil([obj.domainLookupEndDate timeIntervalSinceDate:obj.domainLookupStartDate] * 1000);
                }
                if (obj.connectStartDate) {
                    if (obj.secureConnectionStartDate) {
                        // With TLS, the TCP handshake ends where the TLS handshake starts
                        self.dataModel.tcpTime = ceil([obj.secureConnectionStartDate timeIntervalSinceDate:obj.connectStartDate] * 1000);
                    } else if (obj.connectEndDate) {
                        self.dataModel.tcpTime = ceil([obj.connectEndDate timeIntervalSinceDate:obj.connectStartDate] * 1000);
                    }
                }
                if (obj.secureConnectionEndDate && obj.secureConnectionStartDate) {
                    self.dataModel.sslTime = ceil([obj.secureConnectionEndDate timeIntervalSinceDate:obj.secureConnectionStartDate] * 1000);
                }

                if (obj.fetchStartDate && obj.responseEndDate) {
                    self.dataModel.requestTime = ceil([obj.responseEndDate timeIntervalSinceDate:obj.fetchStartDate] * 1000);
                }

                self.dataModel.httpProtocol = obj.networkProtocolName;

                NSHTTPURLResponse *response = (NSHTTPURLResponse *)obj.response;
                if ([response isKindOfClass:NSHTTPURLResponse.class]) {
                    self.dataModel.httpCode = response.statusCode;
                    // expectedContentLength is -1 (NSURLResponseUnknownLength) when the length is unknown
                    if (response.expectedContentLength > 0) {
                        self.dataModel.receiveBytes = response.expectedContentLength;
                    }
                }

                // The selectors below are private and may be missing on some system versions
                if ([obj respondsToSelector:@selector(_remoteAddressAndPort)]) {
                    self.dataModel.ip = [obj valueForKey:@"_remoteAddressAndPort"];
                }

                if ([obj respondsToSelector:@selector(_requestHeaderBytesSent)]) {
                    self.dataModel.sendBytes = [[obj valueForKey:@"_requestHeaderBytesSent"] unsignedIntegerValue];
                }
                if ([obj respondsToSelector:@selector(_responseHeaderBytesReceived)]) {
                    // Add the header bytes to the body bytes instead of overwriting them
                    self.dataModel.receiveBytes += [[obj valueForKey:@"_responseHeaderBytesReceived"] unsignedIntegerValue];
                }

                self.dataModel.requestUrl = [obj.request.URL absoluteString];
                self.dataModel.httpMethod = obj.request.HTTPMethod;
                self.dataModel.useProxy = obj.isProxyConnection;
            }
        }];
        // Upload the Network data to the data reporting component. Data reporting is covered in [Building a powerful, flexible, and configurable data reporting component](https://github.com/FantasticLBP/knowledge-kit/blob/master/Chapter1%20-%20iOS/1.80.md)
    }
}
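The phase arithmetic in the delegate method above is easier to check in isolation. The following is a minimal sketch in plain C, not the SDK's code: the struct and function names are hypothetical, timestamps are assumed to be integer microseconds, and `TS_ABSENT` stands in for a nil NSDate (for example when a persistent connection is reused and no DNS or handshake takes place).

```c
#include <stdint.h>

#define TS_ABSENT (-1)  /* stands in for a nil NSDate */

/* Timestamps of one transaction, in microseconds since an arbitrary epoch. */
typedef struct {
    int64_t fetchStart, domainLookupStart, domainLookupEnd;
    int64_t connectStart, secureConnectionStart, secureConnectionEnd, connectEnd;
    int64_t responseEnd;
} TransactionTimes;

/* Phase durations in whole milliseconds; -1 means the phase was not observed. */
typedef struct {
    int waitDNS, dns, tcp, tls, total;
} PhaseMillis;

/* Duration between two timestamps, rounded up to whole ms, or -1 if unknown. */
static int span_ms(int64_t start_us, int64_t end_us) {
    if (start_us == TS_ABSENT || end_us == TS_ABSENT || end_us < start_us) {
        return -1;
    }
    return (int)((end_us - start_us + 999) / 1000);
}

static PhaseMillis compute_phases(const TransactionTimes *t) {
    PhaseMillis p;
    p.waitDNS = span_ms(t->fetchStart, t->domainLookupStart);
    p.dns = span_ms(t->domainLookupStart, t->domainLookupEnd);
    /* With TLS, the TCP handshake ends where the TLS handshake starts. */
    if (t->secureConnectionStart != TS_ABSENT) {
        p.tcp = span_ms(t->connectStart, t->secureConnectionStart);
    } else {
        p.tcp = span_ms(t->connectStart, t->connectEnd);
    }
    p.tls = span_ms(t->secureConnectionStart, t->secureConnectionEnd);
    p.total = span_ms(t->fetchStart, t->responseEnd);
    return p;
}
```

Returning -1 for an unobserved phase, rather than 0, lets the reporting side tell "no DNS lookup happened" apart from "the DNS lookup was instantaneous".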

2.2 Scheme 2: NSURLProtocol monitoring of APP network requests, the black-magic version

In 2.1 of the article, we analyzed that NSURLSessionTaskMetrics is not perfect for network monitoring because of compatibility problems. While searching for material later, however, I found an article that, while analyzing WebView network monitoring, came across the following code in the WebKit source:

#if !HAVE(TIMINGDATAOPTIONS)
void setCollectsTimingData()
{
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        [NSURLConnection _setCollectsTimingData:YES];
        ...
    });
}
#endif

NSURLConnection itself has an API for collecting TimingData, but it is not exposed to developers; Apple uses it internally. In the runtime headers you can find the NSURLConnection APIs _setCollectsTimingData: and _timingData (available from iOS 8).

Before iOS 9, NSURLSession can also obtain TimingData by calling _setCollectsTimingData:.

Note:

Because it is a private API, obfuscate the selector name when using it, for example [[@"_setC" stringByAppendingString:@"ollectsT"] stringByAppendingString:@"imingData:"].
Private APIs are not recommended. APM is generally built by a shared infrastructure team. Even though your SDK achieves network monitoring, it may cause the business line's App to be rejected at App Store review, in which case the loss outweighs the gain. This kind of opportunistic trick, which is not 100% certain to work, is best kept to the toy stage.

@interface _NSURLConnectionProxy : DelegateProxy

@end

@implementation _NSURLConnectionProxy

- (BOOL)respondsToSelector:(SEL)aSelector
{
    if ([NSStringFromSelector(aSelector) isEqualToString:@"connectionDidFinishLoading:"]) {
        return YES;
    }
    return [self.target respondsToSelector:aSelector];
}

- (void)forwardInvocation:(NSInvocation *)invocation
{
    [super forwardInvocation:invocation];
    if ([NSStringFromSelector(invocation.selector) isEqualToString:@"connectionDidFinishLoading:"]) {
        __unsafe_unretained NSURLConnection *conn;
        [invocation getArgument:&conn atIndex:2];
        SEL selector = NSSelectorFromString([@"_timin" stringByAppendingString:@"gData"]);
        NSDictionary *timingData = [conn performSelector:selector];
        [[NTDataKeeper shareInstance] trackTimingData:timingData request:conn.currentRequest];
    }
}

@end

@implementation NSURLConnection(tracker)

+ (void)load
{
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        Class class = [self class];
        
        SEL originalSelector = @selector(initWithRequest:delegate:);
        SEL swizzledSelector = @selector(swizzledInitWithRequest:delegate:);
        
        Method originalMethod = class_getInstanceMethod(class, originalSelector);
        Method swizzledMethod = class_getInstanceMethod(class, swizzledSelector);
        method_exchangeImplementations(originalMethod, swizzledMethod);
        
        NSString *selectorName = [[@"_setC" stringByAppendingString:@"ollectsT"] stringByAppendingString:@"imingData:"];
        SEL selector = NSSelectorFromString(selectorName);
        [NSURLConnection performSelector:selector withObject:@(YES)];
    });
}

- (instancetype)swizzledInitWithRequest:(NSURLRequest *)request delegate:(id<NSURLConnectionDelegate>)delegate
{
    if (delegate) {
        _NSURLConnectionProxy *proxy = [[_NSURLConnectionProxy alloc] initWithTarget:delegate];
        objc_setAssociatedObject(delegate ,@"_NSURLConnectionProxy" ,proxy, OBJC_ASSOCIATION_RETAIN_NONATOMIC);
        return [self swizzledInitWithRequest:request delegate:(id<NSURLConnectionDelegate>)proxy];
    }else{
        return [self swizzledInitWithRequest:request delegate:delegate];
    }
}

@end

2.3 Scheme 3: Hook

There are two kinds of hook technology in iOS: NSProxy, and Method Swizzling (isa swizzling).

2.3.1 Method 1

When writing an SDK, manually modifying the business code is out of the question (you do not have commit rights to the production code 😂), so both APM and no-trace event tracking are implemented through Hook.

Aspect-Oriented Programming (AOP) is a programming paradigm in computer science that further separates cross-cutting concerns from the business logic to improve the modularity of the code. It adds functionality to a program dynamically without modifying the source code. The core idea is to separate business logic (core concerns, the main functions of the system) from common functions (cross-cutting concerns, such as a logging system), which reduces complexity and keeps the system modular, maintainable, and reusable. It is often used for logging, performance statistics, security control, transaction processing, exception handling, and similar scenarios.

On iOS, AOP is implemented on top of the Runtime mechanism. There are currently three ways: Method Swizzling, NSProxy, and fishhook (mainly used to hook C code).
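The "exchange two implementations" idea behind Method Swizzling can be illustrated without the Objective-C runtime. The sketch below is an analogy in plain C, not how the runtime is implemented: the `MethodTable` type and all function names are hypothetical, and function pointers stand in for method implementations.

```c
#include <string.h>

typedef int (*SendImp)(const char *url);

/* Hypothetical dispatch table: each slot maps a "selector" to an implementation. */
typedef struct {
    SendImp send;         /* the slot that callers use */
    SendImp hooked_send;  /* the slot that initially holds the replacement */
} MethodTable;

static MethodTable g_methods;
static int g_hook_calls;  /* the cross-cutting concern: count every call */

/* Original behaviour: pretend to send, return the URL length (or -1 for NULL). */
static int original_send(const char *url) {
    return url ? (int)strlen(url) : -1;
}

/* Replacement: do the monitoring work, then call through the other slot.
 * After the exchange that slot holds the original implementation, which is why
 * a swizzled Objective-C method appears to "call itself". */
static int monitored_send(const char *url) {
    g_hook_calls++;
    return g_methods.hooked_send(url);
}

/* Fill the table, then swap the two slots, like method_exchangeImplementations. */
static void install_hook(void) {
    g_methods.send = original_send;
    g_methods.hooked_send = monitored_send;

    SendImp tmp = g_methods.send;
    g_methods.send = g_methods.hooked_send;
    g_methods.hooked_send = tmp;
}
```

Callers keep using `g_methods.send` and get the same results as before; the only visible difference is that `g_hook_calls` grows, which is the point of AOP: behaviour is added without touching the calling code.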

As discussed in 2.1 above, NSURLProtocol monitors the network requests of NSURLConnection and NSURLSession. By proxying the request itself, it can issue the network request and obtain information such as the request start time, request end time, and headers. However, it cannot obtain very detailed network performance data, such as when DNS resolution starts, how long DNS resolution takes, when the response starts to return, and how long the response takes. Since iOS 10, the new NSURLSessionTaskDelegate method - (void)URLSession:(NSURLSession *)session task:(NSURLSessionTask *)task didFinishCollectingMetrics:(NSURLSessionTaskMetrics *)metrics API_AVAILABLE(macosx(10.12), ios(10.0), watchos(3.0), tvos(10.0)); provides accurate network data, but it has compatibility limits. Section 2.2 above covers what was learned from the WebKit source: TimingData can be obtained through the private methods _setCollectsTimingData: and _timingData.

However, neither approach is enough if you need to monitor all network requests. After looking through more material, I found that Alibaba Baichuan has an APM solution, which led to Scheme 3. For network monitoring, the following processing is needed.

If you are unfamiliar with CFNetwork, take a look at its hierarchy and simple usage first.

The foundation of CFNetwork is CFSocket and CFStream.

CFSocket: Sockets are the foundation of network communication, allowing two socket endpoints to send data to each other. The most common socket abstraction on iOS is the BSD socket. CFSocket is a Core Foundation wrapper around BSD sockets that implements almost all of their functionality and adds RunLoop integration.

CFStream: provides a device-independent way to read and write data, to and from memory, files, and the network (using sockets). Because it is a stream, the data does not have to be loaded into memory all at once. The CFStream API provides abstractions for two CFType objects: CFReadStream and CFWriteStream. It is also the basis of CFHTTP and CFFTP.

A simple Demo

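// Forward declarations, so the callback and the helper can be referenced before their definitions
void CFNetworkRequestCallback (CFReadStreamRef _Null_unspecified stream, CFStreamEventType type, void * _Null_unspecified clientCallBackInfo);
void printResponseData (CFDataRef responseData);

- (void)testCFNetwork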
{
    CFURLRef urlRef = CFURLCreateWithString(kCFAllocatorDefault, CFSTR("https://httpbin.org/get"), NULL);
    CFHTTPMessageRef httpMessageRef = CFHTTPMessageCreateRequest(kCFAllocatorDefault, CFSTR("GET"), urlRef, kCFHTTPVersion1_1);
    CFRelease(urlRef);
    
    CFReadStreamRef readStream = CFReadStreamCreateForHTTPRequest(kCFAllocatorDefault, httpMessageRef);
    CFRelease(httpMessageRef);
    
    CFReadStreamScheduleWithRunLoop(readStream, CFRunLoopGetCurrent(), kCFRunLoopCommonModes);
    
    CFOptionFlags eventFlags = (kCFStreamEventHasBytesAvailable | kCFStreamEventErrorOccurred | kCFStreamEventEndEncountered);
    CFStreamClientContext context = {
        0,
        NULL,
        NULL,
        NULL,
       NULL
    } ;
    // Assigns a client to a stream, which receives callbacks when certain events occur.
    CFReadStreamSetClient(readStream, eventFlags, CFNetworkRequestCallback, &context);
    // Opens a stream for reading.
    CFReadStreamOpen(readStream);
}
// callback
void CFNetworkRequestCallback (CFReadStreamRef _Null_unspecified stream, CFStreamEventType type, void * _Null_unspecified clientCallBackInfo) {
    CFMutableDataRef responseBytes = CFDataCreateMutable(kCFAllocatorDefault, 0);
    CFIndex numberOfBytesRead = 0;
    do {
        UInt8 buffer[2014];
        numberOfBytesRead = CFReadStreamRead(stream, buffer, sizeof(buffer));
        if (numberOfBytesRead > 0) {
            CFDataAppendBytes(responseBytes, buffer, numberOfBytesRead);
        }
    } while (numberOfBytesRead > 0);
    
    
    CFHTTPMessageRef response = (CFHTTPMessageRef)CFReadStreamCopyProperty(stream, kCFStreamPropertyHTTPResponseHeader);
    if (responseBytes) {
        if (response) {
            CFHTTPMessageSetBody(response, responseBytes);
        }
        CFRelease(responseBytes);
    }
    
    // close and cleanup
    CFReadStreamClose(stream);
    CFReadStreamUnscheduleFromRunLoop(stream, CFRunLoopGetCurrent(), kCFRunLoopCommonModes);
    CFRelease(stream);
    
    // print response
    if (response) {
        CFDataRef reponseBodyData = CFHTTPMessageCopyBody(response);
        CFRelease(response);
        
        printResponseData(reponseBodyData);
        CFRelease(reponseBodyData);
    }
}

void printResponseData (CFDataRef responseData) {
    CFIndex dataLength = CFDataGetLength(responseData);
    UInt8 *bytes = (UInt8 *)malloc(dataLength);
    CFDataGetBytes(responseData, CFRangeMake(0, CFDataGetLength(responseData)), bytes);
    CFStringRef responseString = CFStringCreateWithBytes(kCFAllocatorDefault, bytes, dataLength, kCFStringEncodingUTF8, TRUE);
    CFShow(responseString);
    CFRelease(responseString);
    free(bytes);
}
// console
{
  "args": {}, 
  "headers": {
    "Host": "httpbin.org", 
    "User-Agent": "Test/1 CFNetwork/1125.2 Darwin/19.3.0", 
    "X-Amzn-Trace-Id": "Root=1-5e8980d0-581f3f44724c7140614c2564"
  }, 
  "origin": "183.159.122.102", 
  "url": "https://httpbin.org/get"
}

We know that NSURLSession, NSURLConnection and CFNetwork all require calling a series of methods for setup, then setting a delegate object and implementing the delegate methods. So the first idea for monitoring is to hook these methods with the runtime. But the delegate methods themselves cannot be hooked directly, because we do not know which class the delegate object belongs to. Instead, we can hook the step that sets the delegate, replace the delegate with an object of a class we designed, and let that class implement the delegate methods of NSURLConnection, NSURLSession and CFNetwork. Inside each of these methods, the original delegate's implementation is then called. This satisfies our requirements: in each method we can collect monitoring data such as the request start time, end time, status code and content size.

NSURLSession and NSURLConnection hooks are shown below.

There are APM solutions for CFNetwork in the industry, which are summarized and described as follows:

CFNetwork is implemented in C. To hook C functions, the dynamic loader hook library fishhook is needed.

The dynamic loader (dyld) binds symbols by updating pointers stored in particular sections of the Mach-O file, which makes it possible to change at runtime which function a C call resolves to. fishhook works as follows: it iterates over the symbols in the __nl_symbol_ptr and __la_symbol_ptr sections of the __DATA segment and, using the Indirect Symbol Table, Symbol Table and String Table together, finds the function to be replaced and rewrites its pointer, which achieves the hook.
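To make the pointer-rewriting idea concrete, here is a minimal sketch in plain C. It does not use fishhook or Mach-O at all: the symbol pointer table is modeled as a small array, and `symbol_slot`, `rebind`, `real_read` and `hooked_read` are hypothetical names invented for this illustration (with `real_read` standing in for a function such as CFReadStreamRead).

```c
#include <stddef.h>
#include <string.h>

/* Signature of the function being hooked (a stand-in for CFReadStreamRead). */
typedef long (*read_fn)(const char *src, char *buf, long len);

/* One slot of a simplified "symbol pointer table": callers jump through ptr. */
typedef struct {
    const char *name;
    read_fn     ptr;
} symbol_slot;

static long    g_bytes_seen = 0;   /* monitoring counter */
static read_fn g_original_read;    /* saved original implementation */

/* Original implementation: copies up to len bytes and returns the count. */
static long real_read(const char *src, char *buf, long len) {
    long n = (long)strlen(src);
    if (n > len) n = len;
    memcpy(buf, src, (size_t)n);
    return n;
}

/* Replacement: call the original, then record how many bytes were read. */
static long hooked_read(const char *src, char *buf, long len) {
    long n = g_original_read(src, buf, len);
    if (n > 0) g_bytes_seen += n;
    return n;
}

/* Find the slot by name, save the old pointer and install the replacement.
   Returns 0 on success, -1 if the symbol is not in the table. */
static int rebind(symbol_slot *table, size_t count, const char *name,
                  read_fn replacement, read_fn *original) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *original = table[i].ptr;
            table[i].ptr = replacement;
            return 0;
        }
    }
    return -1;
}
```

After `rebind` succeeds, every call made through the table slot goes to `hooked_read`, which still delegates to the saved original. That is the property an APM hook relies on: the caller's behavior is unchanged while the monitor observes the byte counts.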

/* Returns the number of bytes read, or -1 if an error occurs preventing any
   bytes from being read, or 0 if the stream's end was encountered.
   It is an error to try and read from a stream that hasn't been opened first.
   This call will block until at least one byte is available; it will NOT block
   until the entire buffer can be filled. To avoid blocking, either poll using
   CFReadStreamHasBytesAvailable() or use the run loop and listen for the
   kCFStreamEventHasBytesAvailable event for notification of data available. */
CF_EXPORT
CFIndex CFReadStreamRead(CFReadStreamRef _Null_unspecified stream, UInt8 * _Null_unspecified buffer, CFIndex bufferLength);

CFNetwork uses CFReadStreamRef to pass data and receives the server’s response in the form of callback functions. When a callback function gets

The specific steps and their key code are as follows, with NSURLConnection as an example.

Because there are many hooks, first write a utility category for method swizzling.

// .h
#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface NSObject (Hook)

/**
 Hook an instance method

 @param originalSelector the instance method to be replaced
 @param swizzledSelector the instance method that replaces it
 */
+ (void)apm_swizzleMethod:(SEL)originalSelector swizzledSelector:(SEL)swizzledSelector;

/**
 Hook a class method

 @param originalSelector the class method to be replaced
 @param swizzledSelector the class method that replaces it
 */
+ (void)apm_swizzleClassMethod:(SEL)originalSelector swizzledSelector:(SEL)swizzledSelector;

@end

NS_ASSUME_NONNULL_END

// .m
#import <objc/runtime.h>

void class_swizzleInstanceMethod(Class class, SEL originalSEL, SEL replacementSEL)
{
    Method originMethod = class_getInstanceMethod(class, originalSEL);
    Method replaceMethod = class_getInstanceMethod(class, replacementSEL);

    // If the class does not implement originalSEL itself, add it first; otherwise exchange the two implementations.
    if (class_addMethod(class, originalSEL, method_getImplementation(replaceMethod), method_getTypeEncoding(replaceMethod))) {
        class_replaceMethod(class, replacementSEL, method_getImplementation(originMethod), method_getTypeEncoding(originMethod));
    } else {
        method_exchangeImplementations(originMethod, replaceMethod);
    }
}

@implementation NSObject (Hook)

+ (void)apm_swizzleMethod:(SEL)originalSelector swizzledSelector:(SEL)swizzledSelector
{
    class_swizzleInstanceMethod(self, originalSelector, swizzledSelector);
}

+ (void)apm_swizzleClassMethod:(SEL)originalSelector swizzledSelector:(SEL)swizzledSelector
{
    // Class methods are stored in the class object's class (the metaclass), so a class method is
    // equivalent to an instance method of the metaclass. Passing in the metaclass is enough; the rest
    // of the logic is the same as for instance methods.
    Class class2 = object_getClass(self);
    class_swizzleInstanceMethod(class2, originalSelector, swizzledSelector);
}

@end

Create a class that inherits from the NSProxy abstract class and implement the corresponding methods.

// .h
#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

// Set up delegate forwarding for the NSURLConnection, NSURLSession and CFNetwork delegates
@interface NetworkDelegateProxy : NSProxy

+ (instancetype)setProxyForObject:(id)originalTarget withNewDelegate:(id)newDelegate;

@end

NS_ASSUME_NONNULL_END

// .m
@interface NetworkDelegateProxy () {
    id _originalTarget;
    id _NewDelegate;
}

@end

@implementation NetworkDelegateProxy

#pragma mark - life cycle

+ (instancetype)sharedInstance
{
    static NetworkDelegateProxy *_sharedInstance = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        _sharedInstance = [NetworkDelegateProxy alloc];
    });
    return _sharedInstance;
}

#pragma mark - public Method

+ (instancetype)setProxyForObject:(id)originalTarget withNewDelegate:(id)newDelegate
{
    NetworkDelegateProxy *instance = [NetworkDelegateProxy sharedInstance];
    instance->_originalTarget = originalTarget;
    instance->_NewDelegate = newDelegate;
    return instance;
}

- (void)forwardInvocation:(NSInvocation *)invocation
{
    if ([_originalTarget respondsToSelector:invocation.selector]) {
        [invocation invokeWithTarget:_originalTarget];
        [((NSURLSessionAndConnectionImplementor *)_NewDelegate) invoke:invocation];
    }
}

- (nullable NSMethodSignature *)methodSignatureForSelector:(SEL)sel
{
    return [_originalTarget methodSignatureForSelector:sel];
}

@end

Create an object that implements the NSURLConnection, NSURLSession, and NSInputStream delegate methods

// NetworkImplementor.m

#pragma mark - NSURLConnectionDelegate

- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error {
    NSLog(@"%s", __func__);
}

- (nullable NSURLRequest *)connection:(NSURLConnection *)connection willSendRequest:(NSURLRequest *)request redirectResponse:(nullable NSURLResponse *)response {
    NSLog(@"%s", __func__);
    return request;
}

#pragma mark - NSURLConnectionDataDelegate

- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response {
    NSLog(@"%s", __func__);
}

- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {
    NSLog(@"%s", __func__);
}

- (void)connection:(NSURLConnection *)connection didSendBodyData:(NSInteger)bytesWritten totalBytesWritten:(NSInteger)totalBytesWritten totalBytesExpectedToWrite:(NSInteger)totalBytesExpectedToWrite {
    NSLog(@"%s", __func__);
}

- (void)connectionDidFinishLoading:(NSURLConnection *)connection {
    NSLog(@"%s", __func__);
}

#pragma mark - NSURLConnectionDownloadDelegate

- (void)connection:(NSURLConnection *)connection didWriteData:(long long)bytesWritten totalBytesWritten:(long long)totalBytesWritten expectedTotalBytes:(long long)expectedTotalBytes {
    NSLog(@"%s", __func__);
}

- (void)connectionDidResumeDownloading:(NSURLConnection *)connection totalBytesWritten:(long long)totalBytesWritten expectedTotalBytes:(long long)expectedTotalBytes {
    NSLog(@"%s", __func__);
}

- (void)connectionDidFinishDownloading:(NSURLConnection *)connection destinationURL:(NSURL *)destinationURL {
    NSLog(@"%s", __func__);
}

// Record the data items that need to be monitored in these methods, according to the requirements

Add a category to NSURLConnection that hooks the NSURLConnection instance methods and installs the hooked delegate object

// NSURLConnection+Monitor.m
@implementation NSURLConnection (Monitor)

+ (void)load
{
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        @autoreleasepool {
            [[self class] apm_swizzleMethod:@selector(apm_initWithRequest:delegate:) swizzledSelector:@selector(initWithRequest:delegate:)];
        }
    });
}

- (_Nonnull instancetype)apm_initWithRequest:(NSURLRequest *)request delegate:(nullable id)delegate
{
    /*
     1. Replace the delegate at the point where it is set.
     2. Because data must be monitored in every delegate method, all delegate methods need to be hooked.
     */
    NSString *traceId = @"traceId";
    NSMutableURLRequest *rq = [request mutableCopy];
    NSString *preTraceId = [request.allHTTPHeaderFields valueForKey:@"head_key_traceid"];
    if (preTraceId) {
        // Call the pre-hook initializer and return the NSURLConnection
        return [self apm_initWithRequest:rq delegate:delegate];
    } else {
        [rq setValue:traceId forHTTPHeaderField:@"head_key_traceid"];

        NSURLSessionAndConnectionImplementor *mockDelegate = [NSURLSessionAndConnectionImplementor new];
        [self registerDelegateMethod:@"connection:didFailWithError:" originalDelegate:delegate newDelegate:mockDelegate flag:"v@:@@"];
        [self registerDelegateMethod:@"connection:didReceiveResponse:" originalDelegate:delegate newDelegate:mockDelegate flag:"v@:@@"];
        [self registerDelegateMethod:@"connection:didReceiveData:" originalDelegate:delegate newDelegate:mockDelegate flag:"v@:@@"];
        [self registerDelegateMethod:@"connectionDidFinishLoading:" originalDelegate:delegate newDelegate:mockDelegate flag:"v@:@"];
        // Three object arguments and an object return value, hence "@@:@@@"
        [self registerDelegateMethod:@"connection:willSendRequest:redirectResponse:" originalDelegate:delegate newDelegate:mockDelegate flag:"@@:@@@"];

        delegate = [NetworkDelegateProxy setProxyForObject:delegate withNewDelegate:mockDelegate];
        // Call the pre-hook initializer and return the NSURLConnection
        return [self apm_initWithRequest:rq delegate:delegate];
    }
}

- (void)registerDelegateMethod:(NSString *)methodName originalDelegate:(id<NSURLConnectionDelegate>)originalDelegate newDelegate:(NSURLSessionAndConnectionImplementor *)newDelegate flag:(const char *)flag
{
    if ([originalDelegate respondsToSelector:NSSelectorFromString(methodName)]) {
        IMP originalMethodImp = class_getMethodImplementation([originalDelegate class], NSSelectorFromString(methodName));
        IMP newMethodImp = class_getMethodImplementation([newDelegate class], NSSelectorFromString(methodName));
        if (originalMethodImp != newMethodImp) {
            [newDelegate registerSelector:methodName];
            NSLog(@"");
        }
    } else {
        class_addMethod([originalDelegate class], NSSelectorFromString(methodName), class_getMethodImplementation([newDelegate class], NSSelectorFromString(methodName)), flag);
    }
}

@end

In this way the network information can be monitored. The data is then handed to the data reporting SDK, which reports it according to the reporting policy delivered to it.

2.3.2 Method 2

In fact, there is another way to meet these requirements: isa swizzling. Besides hooking the NSURLConnection, NSURLSession and NSInputStream delegates as above and forwarding the delegate calls with NSProxy, isa swizzling can achieve the same goal.

Method swizzling principle

struct old_method {
    SEL method_name;
    char *method_types;
    IMP method_imp;
};

An improved version of Method Swizzling is shown below

Method originalMethod = class_getInstanceMethod(aClass, aSEL);
IMP originalIMP = method_getImplementation(originalMethod);
const char *cd = method_getTypeEncoding(originalMethod);
IMP newIMP = imp_implementationWithBlock(^(id self) {
  // Cast the saved IMP to the real function type before calling it
  void (*tmp)(id self, SEL _cmd) = (void (*)(id, SEL))originalIMP;
  tmp(self, aSEL);
});
class_replaceMethod(aClass, aSEL, newIMP, cd);

isa swizzling

/// Represents an instance of a class.
struct objc_object {
    Class _Nonnull isa  OBJC_ISA_AVAILABILITY;
};

/// A pointer to an instance of a class.
typedef struct objc_object *id;

So let's look at why changing isa is useful here.

The people who write the APM monitor have no way of knowing how the business code is written
It is not realistic to write replacement classes and ask business developers to stop using the system NSURLSession and NSURLConnection classes just to make APM monitoring convenient

How does KVO work? Combine this with the figure above

Create a subclass of the observed object's class
Override the getters and setters of the observed properties in the subclass
Point the isa pointer of the observed object to the newly created subclass
Intercept changes in the subclass's getters and setters and notify the observer of the value change
Restore the isa of the observed object when observation ends
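The steps above can be modeled in a few lines of plain C. This is only a sketch of the mechanism, not the Objective-C runtime: `class_t`, `object`, `send_set_value`, `start_observing` and `stop_observing` are hypothetical names, and a counter stands in for notifying the observer.

```c
#include <stddef.h>

struct object;

/* A "class" is just a table of method pointers plus a link to its superclass. */
typedef struct class_t {
    const struct class_t *super;
    void (*set_value)(struct object *self, int value);
} class_t;

/* Every object starts with an isa pointer that selects its class. */
typedef struct object {
    const class_t *isa;
    int value;
    int change_count;   /* stands in for "notify the observer" */
} object;

static void base_set_value(object *self, int value) {
    self->value = value;
}

/* Overridden setter: call the superclass implementation, then notify. */
static void observed_set_value(object *self, int value) {
    self->isa->super->set_value(self, value);
    self->change_count++;
}

static const class_t base_class     = { NULL, base_set_value };
static const class_t observed_class = { &base_class, observed_set_value };

/* Message send: dispatch through whatever class isa currently points to. */
static void send_set_value(object *self, int value) {
    self->isa->set_value(self, value);
}

/* Start observing by pointing isa at the subclass; stop by restoring it. */
static void start_observing(object *self) { self->isa = &observed_class; }
static void stop_observing(object *self)  { self->isa = &base_class; }
```

The caller keeps using `send_set_value` throughout. Only the isa pointer changes, so one particular instance gains the monitoring behavior without the base class or other instances being touched.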

Following this idea, in the load method of NSURLConnection and NSURLSession we can also dynamically create subclasses and override methods in them, for example - (nullable instancetype)initWithRequest:(NSURLRequest *)request delegate:(nullable id)delegate startImmediately:(BOOL)startImmediately; and then point the isa of NSURLSession and NSURLConnection to the dynamically created subclass. After these methods have been processed, the original isa pointer is restored.

However, isa swizzling still only does what method swizzling does, and the class of the delegate object remains unknown, so NSProxy is still needed for dynamic forwarding.

As for how to modify isa, here is a simple demo that simulates KVO:

- (void)lbpKVO_addObserver:(NSObject *)observer forKeyPath:(NSString *)keyPath options:(NSKeyValueObservingOptions)options context:(nullable void *)context
{
    // Generate the custom subclass name
    NSString *className = NSStringFromClass(self.class);
    NSString *currentClassName = [@"LBPKVONotifying_" stringByAppendingString:className];
    // Dynamically create and register the subclass
    Class myClass = objc_allocateClassPair(self.class, [currentClassName UTF8String], 0);
    objc_registerClassPair(myClass);
    // 2. Override the setter
    class_addMethod(myClass, @selector(say), (IMP)say, "v@:");
    // class_addMethod(myClass, @selector(setName:), (IMP)setName, "v@:@");
    // Change isa to point to the subclass
    object_setClass(self, myClass);
    // Save the observer and context on the current object
    objc_setAssociatedObject(self, "observer", observer, OBJC_ASSOCIATION_ASSIGN);
    objc_setAssociatedObject(self, "context", (__bridge id _Nullable)(context), OBJC_ASSOCIATION_RETAIN);
}

void say(id self, SEL _cmd)
{
    struct objc_super superclass = {self, [self superclass]};
    ((void (*)(struct objc_super *, SEL))objc_msgSendSuper)(&superclass, @selector(say));
    NSLog(@"%s", __func__);
    // Class class = [self class];
    // object_setClass(self, class_getSuperclass(class));
    // objc_msgSend(self, @selector(say));
}

void setName(id self, SEL _cmd, NSString *name)
{
    NSLog(@"come here");
    // First switch to the parent class of the current class, send the setName: message, then switch back to the subclass
    Class class = [self class];
    object_setClass(self, class_getSuperclass(class));
    ((void (*)(id, SEL, NSString *))objc_msgSend)(self, @selector(setName:), name);

    // 3. Call the observer
    id observer = objc_getAssociatedObject(self, "observer");
    id context = objc_getAssociatedObject(self, "context");
    if (observer) {
        ((void (*)(id, SEL, NSString *, id, NSDictionary *, id))objc_msgSend)(observer, @selector(observeValueForKeyPath:ofObject:change:context:), @"name", self, @{@"new": name, @"kind": @1}, context);
    }
    // Switch back to the subclass
    object_setClass(self, class);
}

@end

2.4 Scheme 4: Monitor the app's common network requests

To keep the cost down, and because most projects do their networking through AFNetworking, the network monitoring in this article can be implemented quickly on top of it.

AFNetworking posts notifications when a request starts and when it completes: AFNetworkingTaskDidResumeNotification and AFNetworkingTaskDidCompleteNotification. Network information can be obtained from the parameters carried by these notifications.

__weak __typeof(self)weakSelf = self;

self.didResumeObserver = [[NSNotificationCenter defaultCenter] addObserverForName:AFNetworkingTaskDidResumeNotification object:nil queue:self.queue usingBlock:^(NSNotification * _Nonnull note) {
    // start
    __strong __typeof(weakSelf)strongSelf = weakSelf;
    NSURLSessionTask *task = note.object;
    NSString *requestId = [[NSUUID UUID] UUIDString];
    task.apm_requestId = requestId;
    [strongSelf.networkRecoder recordStartRequestWithRequestID:requestId task:task];
}];

self.didCompleteObserver = [[NSNotificationCenter defaultCenter] addObserverForName:AFNetworkingTaskDidCompleteNotification object:nil queue:self.queue usingBlock:^(NSNotification * _Nonnull note) {
    __strong __typeof(weakSelf)strongSelf = weakSelf;
    NSError *error = note.userInfo[AFNetworkingTaskDidCompleteErrorKey];
    NSURLSessionTask *task = note.object;
    if (!error) {
        // success
        [strongSelf.networkRecoder recordFinishRequestWithRequestID:task.apm_requestId task:task];
    } else {
        // failure
        [strongSelf.networkRecoder recordResponseErrorWithRequestID:task.apm_requestId task:task error:error];
    }
}];

Assemble the data in the networkRecoder methods, hand it to the data reporting component, and wait for the appropriate reporting policy to send it.

Because a network request is asynchronous, each request needs a unique identifier assigned when it starts. When the request completes, that identifier is used to work out how long the request took and whether it succeeded. So the solution is to add a category to NSURLSessionTask and use the runtime to add a property to it that holds the unique identifier.
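The bookkeeping behind that identifier can be sketched in plain C as follows. This is an illustration of matching start and finish events by id, not the recorder used above: `request_record`, `record_start`, `record_finish` and the fixed table size are assumptions, and timestamps are passed in as milliseconds by the caller.

```c
#include <stdint.h>
#include <string.h>

#define MAX_INFLIGHT 16
#define ID_LEN 37   /* 36-character UUID string plus terminating NUL */

typedef struct {
    char     id[ID_LEN];
    uint64_t start_ms;
    int      used;
} request_record;

static request_record g_records[MAX_INFLIGHT];

/* Record the start of a request. Returns 0 on success, -1 if the table is full. */
static int record_start(const char *request_id, uint64_t now_ms) {
    for (int i = 0; i < MAX_INFLIGHT; i++) {
        if (!g_records[i].used) {
            strncpy(g_records[i].id, request_id, ID_LEN - 1);
            g_records[i].id[ID_LEN - 1] = '\0';
            g_records[i].start_ms = now_ms;
            g_records[i].used = 1;
            return 0;
        }
    }
    return -1;
}

/* Match a finished request to its start by id, free the slot and return the
   elapsed milliseconds, or -1 if no start was recorded for that id. */
static int64_t record_finish(const char *request_id, uint64_t now_ms) {
    for (int i = 0; i < MAX_INFLIGHT; i++) {
        if (g_records[i].used && strcmp(g_records[i].id, request_id) == 0) {
            g_records[i].used = 0;
            return (int64_t)(now_ms - g_records[i].start_ms);
        }
    }
    return -1;
}
```

Requests may finish in any order, which is why the lookup is by id rather than by position. A finish without a matching start returns -1 instead of producing a bogus duration.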

Be careful when naming categories and their properties and methods. What happens if you are not? Suppose you need to mask the middle digits of an ID card number stored in an NSString. Veteran developer A, who has been coding for a long time, adds a method called getMaskedIdCardNumber to NSString, and his requirement is to mask the 4 characters in the range [9, 12]. A few days later colleague B gets a similar requirement. Also a veteran, he adds a method to NSString that is also called getMaskedIdCardNumber, but his requirement is to mask the 4 characters in the range [8, 11]. The unit test he wrote for the method fails, and he assumes he got the substring logic wrong. He checks it several times before discovering that the project already includes another NSString category with a method of the same name 😂 A real trap.

The following example is an SDK, but the same goes for daily development.

Category name: it is recommended to use the abbreviation of the current SDK name as a prefix, followed by an underscore and the function of the category, i.e. class name + SDK name abbreviation_function name. If the current SDK is called JuhuasuanAPM, the NSURLSessionTask category file would be NSURLSessionTask+JuHuaSuanAPM_NetworkMonitor.h
Category property name: it is recommended to use the abbreviation of the current SDK name as a prefix, followed by an underscore and the property name, i.e. SDK name abbreviation_property name, such as JuhuaSuanAPM_requestId
Category method name: it is recommended to use the abbreviation of the current SDK name as a prefix, followed by an underscore and the method name, i.e. SDK name abbreviation_method name, such as -(BOOL)JuhuaSuanAPM_isGzippedData

Examples are as follows:

#import <Foundation/Foundation.h>

@interface NSURLSessionTask (JuHuaSuanAPM_NetworkMonitor)

@property (nonatomic, copy) NSString* JuhuaSuanAPM_requestId;

@end

#import "NSURLSessionTask+JuHuaSuanAPM_NetworkMonitor.h"
#import <objc/runtime.h>

@implementation NSURLSessionTask (JuHuaSuanAPM_NetworkMonitor)

- (NSString*)JuhuaSuanAPM_requestId
{
    return objc_getAssociatedObject(self, _cmd);
}

- (void)setJuhuaSuanAPM_requestId:(NSString*)requestId
{
    objc_setAssociatedObject(self, @selector(JuhuaSuanAPM_requestId), requestId, OBJC_ASSOCIATION_COPY_NONATOMIC);
}
@end

2.5 iOS traffic monitoring

2.5.1 HTTP request and response data structure

HTTP request packet structure

The structure of the response message

HTTP messages are formatted blocks of data. Each message consists of three parts: a start line describing the message, a header block containing attributes, and, optionally, a body containing the data.
The start line and the headers are line-delimited ASCII text. Each line ends with a two-character line terminator: a carriage return followed by a line feed (CRLF).
The entity body, or message body, is an optional block of data. Unlike the start line and the headers, the body can contain text or binary data, or it can be empty.
The HTTP headers should always end with a blank line, even if there is no entity body. The browser sends a blank line to notify the server that it has finished sending the headers.

The format of the request message

<method> <request-URI> <version>
<headers>

<entity-body>

The format of the response message

<version> <status> <reason-phrase>
<headers>

<entity-body>
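Since traffic monitoring boils down to adding up the sizes of these parts, the following C sketch splits a raw message into start line, headers, blank line and body. It is an illustration under the assumption that the whole message is available in one buffer with CRLF line endings; `http_sizes` and `measure_http_message` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t line_length;        /* start line, including its CRLF */
    size_t header_length;      /* all header lines, including their CRLFs */
    size_t empty_line_length;  /* the blank line (CRLF) that ends the headers */
    size_t body_length;        /* everything after the blank line */
} http_sizes;

/* Split a raw HTTP message of `len` bytes into its parts.
   Returns 0 on success, -1 if the message is not terminated as expected. */
static int measure_http_message(const char *msg, size_t len, http_sizes *out) {
    const char *end = msg + len;
    const char *p = msg;
    const char *line_end = NULL;

    /* Find the end of the start line. */
    for (; p + 1 < end; p++) {
        if (p[0] == '\r' && p[1] == '\n') { line_end = p + 2; break; }
    }
    if (!line_end) return -1;

    /* Walk header lines until an empty line (a CRLF at the start of a line). */
    p = line_end;
    while (p + 1 < end) {
        if (p[0] == '\r' && p[1] == '\n') {
            out->line_length = (size_t)(line_end - msg);
            out->header_length = (size_t)(p - line_end);
            out->empty_line_length = 2;
            out->body_length = (size_t)(end - (p + 2));
            return 0;
        }
        /* Skip to the start of the next line. */
        const char *q = p;
        while (q + 1 < end && !(q[0] == '\r' && q[1] == '\n')) q++;
        if (q + 1 >= end) return -1;
        p = q + 2;
    }
    return -1;
}
```

The four fields add up to the full message length, so omitting any one of them in a traffic report (for example the start line or the blank line) undercounts the bytes on the wire.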

The figure below shows a request viewed in Chrome's Network panel, including the response line, response headers, response body and other information.

The figure below shows a complete request and response viewed with curl in a terminal.

In HTTP communication, response data is usually compressed with gzip or another method. When monitoring with a scheme such as NSURLProtocol, the data is delivered as NSData that has already been decompressed. Computing traffic from the length of that NSData is therefore inaccurate: the result is larger than what was actually transferred.

2.5.2 Problems

Request and Response do not necessarily exist in pairs

For example, the network may drop or the app may crash suddenly, so Request and Response monitoring should not be stored in a single record.

The request traffic calculation is not accurate

The main reasons are:
- The monitoring scheme ignores the data size of the request line and the request headers
- The monitoring scheme ignores the data size of the Cookie part
- The monitoring scheme uses HTTPBody.length directly when calculating the request body size, which is inaccurate

The response traffic calculation is not accurate

The main reasons are:
- The monitoring scheme ignores the data size of the status line and the response headers
- The monitoring scheme uses expectedContentLength for the byte size of the body, which is not accurate enough
- The monitoring scheme ignores that the response body is compressed with gzip. In real network communication, the client uses the Accept-Encoding field in the request header to state which compression methods it supports (that is, which ones it can decode and use normally). The server then processes the data according to the compression methods the client accepts and the ones the server itself supports, and uses the Content-Encoding field in the response header to state which compression method it actually used.

2.5.3 Technical Implementation

The fifth part describes various principles and technical solutions for network interception. Here, NSURLProtocol is used to implement traffic monitoring (with the hook approach). Knowing what we need from the above, let's get there step by step.

2.5.3.1 Response part

First, following the network monitoring scheme, NSURLProtocol is used to take over the app's network requests
The required parameters are recorded within each method (NSURLProtocol cannot measure the data size and time cost of things such as the TCP handshake and connection teardown, because that requires going down to the socket layer, but it is sufficient for normal interface traffic analysis).
```
@property(nonatomic, strong) NSURLConnection *internalConnection;
@property(nonatomic, strong) NSURLResponse *internalResponse;
@property(nonatomic, strong) NSMutableData *responseData;
@property (nonatomic, strong) NSURLRequest *internalRequest;
```

- (void)startLoading
{
    NSMutableURLRequest *mutableRequest = [[self request] mutableCopy];
    self.internalConnection = [[NSURLConnection alloc] initWithRequest:mutableRequest delegate:self];
    self.internalRequest = self.request;
}

- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response
{
    [self.client URLProtocol:self didReceiveResponse:response cacheStoragePolicy:NSURLCacheStorageNotAllowed];
    self.internalResponse = response;
}

- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data 
{
    [self.responseData appendData:data];
    [self.client URLProtocol:self didLoadData:data];
}

The Status Line section

NSURLResponse has no property or interface for the Status Line, nor for the HTTP version, so to get the Status Line we have to go down to the CFNetwork layer. There turns out to be a private API that makes this possible.

Idea: convert the NSURLResponse to a CFTypeRef through _CFURLResponse, then convert the CFTypeRef to a CFHTTPMessageRef, and then use CFHTTPMessageCopyResponseStatusLine to obtain the Status Line from the CFHTTPMessageRef.

Add the Status Line reading function to a category of NSURLResponse.

// NSURLResponse+apm_FetchStatusLineFromCFNetwork.h
#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface NSURLResponse (apm_FetchStatusLineFromCFNetwork)

- (NSString *)apm_fetchStatusLineFromCFNetwork;

@end

NS_ASSUME_NONNULL_END

// NSURLResponse+apm_FetchStatusLineFromCFNetwork.m
#import "NSURLResponse+apm_FetchStatusLineFromCFNetwork.h"
#import <dlfcn.h>

#define SuppressPerformSelectorLeakWarning(Stuff) \
    do { \
        _Pragma("clang diagnostic push") \
        _Pragma("clang diagnostic ignored \"-Warc-performSelector-leaks\"") \
        Stuff; \
        _Pragma("clang diagnostic pop") \
    } while (0)

typedef CFHTTPMessageRef (*APMURLResponseFetchHTTPResponse)(CFURLRef response);

@implementation NSURLResponse (apm_FetchStatusLineFromCFNetwork)

- (NSString *)apm_fetchStatusLineFromCFNetwork
{
    NSString *statusLine = @"";
    NSString *funcName = @"CFURLResponseGetHTTPResponse";
    APMURLResponseFetchHTTPResponse originalURLResponseFetchHTTPResponse = dlsym(RTLD_DEFAULT, [funcName UTF8String]);

    SEL getSelector = NSSelectorFromString(@"_CFURLResponse");
    if ([self respondsToSelector:getSelector] && NULL != originalURLResponseFetchHTTPResponse) {
        CFTypeRef cfResponse;
        SuppressPerformSelectorLeakWarning(
            cfResponse = CFBridgingRetain([self performSelector:getSelector]);
        );
        if (NULL != cfResponse) {
            CFHTTPMessageRef messageRef = originalURLResponseFetchHTTPResponse(cfResponse);
            statusLine = (__bridge_transfer NSString *)CFHTTPMessageCopyResponseStatusLine(messageRef);
            CFRelease(cfResponse);
        }
    }
    return statusLine;
}

@end

Convert the acquired Status Line into NSData, and then calculate the size

- (NSUInteger)apm_getStatusLineLength {
    NSString *statusLineString = @"";
    if ([self isKindOfClass:[NSHTTPURLResponse class]]) {
        statusLineString = [self apm_fetchStatusLineFromCFNetwork];
    }
    NSData *lineData = [statusLineString dataUsingEncoding:NSUTF8StringEncoding];
    return lineData.length;
}

The Header section

allHeaderFields returns an NSDictionary. Join each entry into a string in the form key: value, then convert the result to NSData to calculate the size.

Note: in key: value, the colon after the key is followed by a space. curl or Chrome's Network panel can be used to verify this.

- (NSUInteger)apm_getHeadersLength
{
    NSUInteger headersLength = 0;
    if ([self isKindOfClass:[NSHTTPURLResponse class]]) {
        NSHTTPURLResponse *httpResponse = (NSHTTPURLResponse *)self;
        NSDictionary *headerFields = httpResponse.allHeaderFields;
        NSString *headerString = @"";
        for (NSString *key in headerFields.allKeys) {
            headerString = [headerString stringByAppendingString:key];
            headerString = [headerString stringByAppendingString:@": "];
            if ([headerFields objectForKey:key]) {
                headerString = [headerString stringByAppendingString:headerFields[key]];
            }
            // Each header line ends with the two-character CRLF terminator
            headerString = [headerString stringByAppendingString:@"\r\n"];
        }
        NSData *headerData = [headerString dataUsingEncoding:NSUTF8StringEncoding];
        headersLength = headerData.length;
    }
    return headersLength;
}

The Body part

The body size cannot be taken directly from expectedContentLength. The official documentation states that it is not accurate and should be treated as advisory only. The Content-Length value in allHeaderFields is not reliable either.

/*!
    @abstract Returns the expected content length of the receiver.
    @discussion Some protocol implementations report a content length
    as part of delivering load metadata, but not all protocols
    guarantee the amount of data that will be delivered in actuality.
    Hence, this method returns an expected amount. Clients should use
    this value as an advisory, and should be prepared to deal with
    either more or less data.
    @result The expected content length of the receiver, or -1 if
    there is no expectation that can be arrived at regarding expected
    content length.
*/
@property (readonly) long long expectedContentLength;

- HTTP 1.1 specifies that if Transfer-Encoding: chunked is present, the header must not contain Content-Length; if it does, it is ignored.
- In HTTP 1.0 and earlier, the Content-Length field is optional
- In HTTP 1.1 and later, for keep-alive connections one of Content-Length and chunked must be present. For connections that are not keep-alive, it is the same as HTTP 1.0: Content-Length is optional.

What is Transfer-Encoding: chunked

The data is sent as a series of chunks, and the Content-Length header is not sent in this case. Each chunk begins with the length of the current chunk in hexadecimal, followed by \r\n, then the chunk data itself, followed by another \r\n. The terminating chunk is a regular chunk, except that its length is 0.
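To illustrate the framing, the following C sketch walks a chunked body and sums the payload bytes. It is a simplified reading of the format described above: chunk extensions and trailer headers are not handled, and `chunked_payload_length` and `hex_value` are hypothetical helper names.

```c
#include <stddef.h>

/* Convert one hexadecimal digit to its value, or -1 if it is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Sum the payload bytes of a chunked body occupying `len` bytes on the wire.
   Returns the decoded payload length, or -1 if the framing is malformed.
   Chunk extensions and trailers are not handled in this sketch. */
static long chunked_payload_length(const char *body, size_t len) {
    size_t pos = 0;
    long total = 0;

    for (;;) {
        /* Parse the hexadecimal chunk size. */
        size_t size = 0;
        size_t digits = 0;
        while (pos < len && hex_value(body[pos]) >= 0) {
            size = size * 16 + (size_t)hex_value(body[pos]);
            pos++;
            digits++;
        }
        if (digits == 0) return -1;

        /* The size line ends with CRLF. */
        if (pos + 2 > len || body[pos] != '\r' || body[pos + 1] != '\n') return -1;
        pos += 2;

        /* A zero-length chunk terminates the body. */
        if (size == 0) {
            if (pos + 2 > len || body[pos] != '\r' || body[pos + 1] != '\n') return -1;
            return total;
        }

        /* Skip the chunk data and its trailing CRLF. */
        if (size > len - pos) return -1;
        pos += size;
        if (pos + 2 > len || body[pos] != '\r' || body[pos + 1] != '\n') return -1;
        pos += 2;
        total += (long)size;
    }
}
```

The bytes on the wire (size lines and CRLFs included) are always more than the decoded payload, which is one more reason a traffic monitor should be explicit about which of the two it reports.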

We recorded the data earlier with NSMutableData, so we can calculate the body size in the stopLoading method. Here are the steps:
- In didReceiveData, keep appending the received data
```
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
{
    [self.responseData appendData:data];
    [self.client URLProtocol:self didLoadData:data];
}
```

In the stopLoading method, get the allHeaderFields dictionary and read the value of the Content-Encoding key. If it is gzip, gzip-compress the NSData in stopLoading and then calculate its size. (This tool can be used for gzip related functions.)

The length of the blank line also needs to be counted

- (void)stopLoading
{
    [self.internalConnection cancel];

    HCTNetworkTrafficModel *model = [[HCTNetworkTrafficModel alloc] init];
    model.path = self.request.URL.path;
    model.host = self.request.URL.host;
    model.type = DMNetworkTrafficDataTypeResponse;
    model.lineLength = [self.internalResponse apm_getStatusLineLength];
    model.headerLength = [self.internalResponse apm_getHeadersLength];
    model.emptyLineLength = [self.internalResponse apm_getEmptyLineLength];
    if ([self.dm_response isKindOfClass:[NSHTTPURLResponse class]]) {
        NSHTTPURLResponse *httpResponse = (NSHTTPURLResponse *)self.dm_response;
        NSData *data = self.dm_data;
        if ([[httpResponse.allHeaderFields objectForKey:@"Content-Encoding"] isEqualToString:@"gzip"]) {
            data = [self.dm_data gzippedData];
        }
        model.bodyLength = data.length;
    }
    model.length = model.lineLength + model.headerLength + model.bodyLength + model.emptyLineLength;
    NSDictionary *networkTrafficDictionary = [model convertToDictionary];
    [[HermesClient sharedInstance] sendWithType:APMMonitorNetworkTrafficType meta:networkTrafficDictionary payload:nil];
}

2.5.3.2 Request part

First, following the network monitoring scheme, NSURLProtocol is used to take over the app's network requests.
The required parameters are recorded in each method. (NSURLProtocol cannot measure the data size and time consumed by the connection handshake and teardown, but it is sufficient for normal interface traffic analysis; going any lower requires working at the socket layer.)
```
@property(nonatomic, strong) NSURLConnection *internalConnection;
@property(nonatomic, strong) NSURLResponse *internalResponse;
@property(nonatomic, strong) NSMutableData *responseData;
@property (nonatomic, strong) NSURLRequest *internalRequest;
```

- (void)startLoading
{
    NSMutableURLRequest *mutableRequest = [[self request] mutableCopy];
    self.internalConnection = [[NSURLConnection alloc] initWithRequest:mutableRequest delegate:self];
    self.internalRequest = self.request;
}

- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response
{
    [self.client URLProtocol:self didReceiveResponse:response cacheStoragePolicy:NSURLCacheStorageNotAllowed];
    self.internalResponse = response;
}

- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data 
{
    [self.responseData appendData:data];
    [self.client URLProtocol:self didLoadData:data];
}

The Status Line section

Unlike NSURLResponse, NSURLRequest offers no way to obtain its first line. Therefore, the fallback scheme is to construct one manually according to the structure of the Status Line. The Status Line structure is: protocol version + space + status code + space + status text + newline. For a request, the corresponding first line is: method + space + path + space + protocol version + newline, which is what the code below builds.

Add a category to the NSURLRequest that specifically gets the Status Line.
```
// NSURLRequest+apm_FetchStatusLineFromCFNetwork.m
- (NSUInteger)apm_fetchStatusLineLength
{
    // Build "<method> <path> <version>\n" and measure its UTF-8 byte length
    NSString *statusLineString = [NSString stringWithFormat:@"%@ %@ %@\n", self.HTTPMethod, self.URL.path, @"HTTP/1.1"];
    NSData *statusLineData = [statusLineString dataUsingEncoding:NSUTF8StringEncoding];
    return statusLineData.length;
}
```

The Header section

An HTTP request first checks whether a cached response exists, then performs DNS resolution to obtain the server IP address for the requested domain name. Next, a TCP connection is established with the server using that IP address. If the request protocol is HTTPS, a TLS connection also has to be established. After the connection is established, the browser builds the request line, the request headers, and other information, appends data related to the domain name, such as cookies, to the request headers, and then sends the constructed request to the server.

So if network monitoring ignores cookies 😂, then, to borrow a line from Wang Duoyu, “that's game over”.

I've read some articles saying that NSURLRequest cannot provide the complete request header information. In practice this is not a big problem; it does not matter if a few fields are missing. The point of the monitoring scheme is to see whether the data consumption of an interface is abnormal across versions or in certain situations, and whether WebView resource requests are too large, similar to the idea of the control variable method.

So after getting allHTTPHeaderFields from NSURLRequest, add the cookie information to calculate the complete header size.

// NSURLRequest+apm_FetchHeaderWithCookies.m
- (NSUInteger)apm_fetchHeaderLengthWithCookie
{
    NSDictionary *headerFields = self.allHTTPHeaderFields;
    NSDictionary *cookiesHeader = [self apm_fetchCookies];

    // Merge the cookie header into the request headers
    if (cookiesHeader.count) {
        NSMutableDictionary *headerDictionaryWithCookies = [NSMutableDictionary dictionaryWithDictionary:headerFields];
        [headerDictionaryWithCookies addEntriesFromDictionary:cookiesHeader];
        headerFields = [headerDictionaryWithCookies copy];
    }

    // Serialize every field as "key: value\n"
    NSString *headerString = @"";
    for (NSString *key in headerFields.allKeys) {
        headerString = [headerString stringByAppendingString:key];
        headerString = [headerString stringByAppendingString:@": "];
        if ([headerFields objectForKey:key]) {
            headerString = [headerString stringByAppendingString:headerFields[key]];
        }
        headerString = [headerString stringByAppendingString:@"\n"];
    }

    // Return the UTF-8 byte length of the serialized headers
    NSData *headerData = [headerString dataUsingEncoding:NSUTF8StringEncoding];
    return headerData.length;
}

- (NSDictionary *)apm_fetchCookies
{
    NSDictionary *cookiesHeaderDictionary;
    NSHTTPCookieStorage *cookieStorage = [NSHTTPCookieStorage sharedHTTPCookieStorage];
    NSArray<NSHTTPCookie *> *cookies = [cookieStorage cookiesForURL:self.URL];
    if (cookies.count) {
        cookiesHeaderDictionary = [NSHTTPCookie requestHeaderFieldsWithCookies:cookies];
    }
    return cookiesHeaderDictionary;
}

The Body part

The HTTPBody of an NSURLRequest may not be available, for example for Ajax requests issued from a WebView. In that case, the body size can be calculated by reading the stream from HTTPBodyStream.

- (NSUInteger)apm_fetchRequestBody
{
    NSDictionary *headerFields = self.allHTTPHeaderFields;
    NSUInteger bodyLength = [self.HTTPBody length];

    if ([headerFields objectForKey:@"Content-Encoding"]) {
        NSData *bodyData;
        if (self.HTTPBody == nil) {
            uint8_t d[1024] = {0};
            NSInputStream *stream = self.HTTPBodyStream;
            NSMutableData *data = [[NSMutableData alloc] init];
            [stream open];
            while ([stream hasBytesAvailable]) {
                NSInteger len = [stream read:d maxLength:1024];
                if (len > 0 && stream.streamError == nil) {
                    [data appendBytes:(void *)d length:len];
                }
            }
            bodyData = [data copy];
            [stream close];
        } else {
            bodyData = self.HTTPBody;
        }
        bodyLength = [[bodyData gzippedData] length];
    }
    return bodyLength;
}

The data is reported in the - (NSURLRequest *)connection:(NSURLConnection *)connection willSendRequest:(NSURLRequest *)request redirectResponse:(NSURLResponse *)response method. Reporting itself is covered in the article on building a powerful, flexible, and configurable data reporting component.

- (NSURLRequest *)connection:(NSURLConnection *)connection willSendRequest:(NSURLRequest *)request redirectResponse:(NSURLResponse *)response
{
    if (response != nil) {
        self.internalResponse = response;
        [self.client URLProtocol:self wasRedirectedToRequest:request redirectResponse:response];
    }

    HCTNetworkTrafficModel *model = [[HCTNetworkTrafficModel alloc] init];
    model.path = request.URL.path;
    model.host = request.URL.host;
    model.type = DMNetworkTrafficDataTypeRequest;
    model.lineLength = [connection.currentRequest dgm_getLineLength];
    model.headerLength = [connection.currentRequest dgm_getHeadersLengthWithCookie];
    model.bodyLength = [connection.currentRequest dgm_getBodyLength];
    model.emptyLineLength = [self.internalResponse apm_getEmptyLineLength];
    model.length = model.lineLength + model.headerLength + model.bodyLength + model.emptyLineLength;

    NSDictionary *networkTrafficDictionary = [model convertToDictionary];
    [[HermesClient sharedInstance] sendWithType:APMMonitorNetworkTrafficType meta:networkTrafficDictionary payload:nil];
    return request;
}
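The size calculation for a request (first line + headers + empty line + body) can be sketched independently of Foundation. The following C sketch uses hypothetical names and is not part of the monitoring component. Note one deliberate difference: the Objective-C snippets above end each line with "\n", while HTTP/1.1 terminates lines with CRLF on the wire, so this sketch counts 2 bytes per line ending.

```c
#include <stddef.h>
#include <string.h>

/* One header field; both strings are NUL-terminated. Hypothetical type. */
typedef struct {
    const char *name;
    const char *value;
} http_header;

/* "<method> SP <path> SP <version> CRLF" */
size_t request_line_length(const char *method, const char *path, const char *version)
{
    return strlen(method) + 1 + strlen(path) + 1 + strlen(version) + 2;
}

/* Every field is serialized as "<name>: <value> CRLF". */
size_t headers_length(const http_header *headers, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += strlen(headers[i].name) + 2 + strlen(headers[i].value) + 2;
    }
    return total;
}

/* Request line + header fields + empty line (CRLF) + body. */
size_t request_wire_length(const char *method, const char *path, const char *version,
                           const http_header *headers, size_t count,
                           size_t body_length)
{
    return request_line_length(method, path, version)
         + headers_length(headers, count)
         + 2
         + body_length;
}
```

For a GET of /index.html with the headers Host: example.com and Cookie: a=1 and no body, this gives 26 + 32 + 2 + 0 = 60 bytes.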

6. Power consumption

Power consumption on mobile devices has always been a sensitive issue. If users find that an app drains the battery heavily and makes the phone heat up, they are likely to uninstall it immediately. So power consumption needs attention during the development phase.

Generally speaking, when we encounter high power consumption, we immediately ask whether we are using location services, whether we are issuing frequent network requests, and whether we are doing something in a continuous loop.

During development this is mostly manageable, and we can use the Energy Log tool in Instruments to locate problems. But online problems require code that monitors power consumption, which can be one of the capabilities of an APM.

1. How to get the battery level

In iOS, IOKit is a private framework for obtaining hardware and device details, and it is also the underlying framework for communication between hardware and kernel services. Therefore, we can obtain hardware information through IOKit, and from it the battery level. Here are the steps:

- First, find IOPowerSources.h and IOPSKeys.h in Apple's open source code (opensource). Then find IOKit.framework in Xcode's Package Contents. The path is /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk/System/Library/Frameworks/IOKit.framework
- Then import IOPowerSources.h, IOPSKeys.h, and IOKit.framework into the project
- Set UIDevice's batteryMonitoringEnabled to true
- The battery level obtained this way is accurate to 1%

2. Locating the problem

Usually, after we have solved many problems with the Energy Log in Instruments and the app has been released, power consumption problems in production have to be solved with APM. The power drain may come from a second-party library, a third-party library, or a colleague's code.

After detecting abnormal power consumption, find the problematic thread and dump its stack to reconstruct the scene.

In the previous section we saw the structure of thread information. thread_basic_info has a field, cpu_usage, that records the percentage of CPU usage. So we can find the problematic thread by iterating over the current threads and checking which ones have high CPU usage. The stack is then dumped to locate the code that is consuming power. See section 3.2 for details.
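The selection step itself is simple once the per-thread cpu_usage values have been collected. The sketch below, in C, shows only that step on plain data; the struct and function names are hypothetical, and the local constant mirrors the scale Mach uses for cpu_usage (TH_USAGE_SCALE, which is 1000) so that the sketch does not depend on the Mach headers.

```c
#include <stddef.h>

/* Local stand-in for Mach's TH_USAGE_SCALE: cpu_usage is reported in 1/1000. */
#define SKETCH_USAGE_SCALE 1000

/* Hypothetical sample: one entry per thread, filled from thread_basic_info. */
typedef struct {
    unsigned int thread_id;  /* illustrative identifier */
    int cpu_usage;           /* scaled value, 0..SKETCH_USAGE_SCALE per core */
} thread_sample;

/* Convert the scaled usage value to a percentage. */
double usage_percent(int cpu_usage)
{
    return cpu_usage * 100.0 / SKETCH_USAGE_SCALE;
}

/*
 * Write the indices of all samples above threshold_percent into out_indices
 * (at most out_capacity of them) and return how many were written.
 */
size_t find_hot_threads(const thread_sample *samples, size_t count,
                        double threshold_percent,
                        size_t *out_indices, size_t out_capacity)
{
    size_t found = 0;
    for (size_t i = 0; i < count && found < out_capacity; i++) {
        if (usage_percent(samples[i].cpu_usage) > threshold_percent) {
            out_indices[found++] = i;
        }
    }
    return found;
}
```

The threads returned here are the candidates whose stacks would then be dumped. The threshold is a tuning parameter; this section does not prescribe a value.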

- (double)fetchBatteryCostUsage
{
    // Returns a blob of power source information in an opaque CFTypeRef
    CFTypeRef blob = IOPSCopyPowerSourcesInfo();
    // Returns a CFArray of power source handles, each of type CFTypeRef
    CFArrayRef sources = IOPSCopyPowerSourcesList(blob);
    CFDictionaryRef pSource = NULL;
    const void *psValue;
    double percentage = -1.0f;
    // Returns the number of values currently in the array
    int numOfSources = (int)CFArrayGetCount(sources);
    // Error in CFArrayGetCount
    if (numOfSources == 0) {
        NSLog(@"Error in CFArrayGetCount");
    }

    // Calculate the remaining energy
    for (int i = 0; i < numOfSources; i++) {
        // Returns a CFDictionary with readable information about the specific power source
        pSource = IOPSGetPowerSourceDescription(blob, CFArrayGetValueAtIndex(sources, i));
        if (!pSource) {
            NSLog(@"Error in IOPSGetPowerSourceDescription");
            break;
        }
        psValue = (CFStringRef)CFDictionaryGetValue(pSource, CFSTR(kIOPSNameKey));

        int curCapacity = 0;
        int maxCapacity = 0;

        psValue = CFDictionaryGetValue(pSource, CFSTR(kIOPSCurrentCapacityKey));
        CFNumberGetValue((CFNumberRef)psValue, kCFNumberSInt32Type, &curCapacity);

        psValue = CFDictionaryGetValue(pSource, CFSTR(kIOPSMaxCapacityKey));
        CFNumberGetValue((CFNumberRef)psValue, kCFNumberSInt32Type, &maxCapacity);

        percentage = ((double)curCapacity / (double)maxCapacity * 100.0f);
        NSLog(@"curCapacity : %d / maxCapacity: %d , percentage: %.1f ", curCapacity, maxCapacity, percentage);
        // Only the first power source is used
        break;
    }

    // Objects returned by the Copy functions are owned by the caller and must be released
    CFRelease(sources);
    CFRelease(blob);
    return percentage;
}

3. What can we do about power consumption during development

CPU-intensive computation is the main cause of power consumption, so we need to be careful about how we use the CPU. Try to avoid making the CPU do useless work. For complex operations on large amounts of data, the capabilities of the server and the GPU can be leveraged. If the design requires the computation to be done on the CPU, you can use GCD: create the block with dispatch_block_create_with_qos_class(flags, qos_class, relative_priority, block) and specify the QoS as QOS_CLASS_UTILITY, then submit the task in that block to a queue. In QOS_CLASS_UTILITY mode, the system optimizes power consumption for computation over large amounts of data.

Besides heavy CPU computation, I/O is also a major cause of power consumption. The common solution in the industry is to defer the operation of writing fragmented data to disk: the fragments are first aggregated in memory and then written to disk together. For aggregating data in memory, iOS provides NSCache.

NSCache is thread-safe. NSCache evicts cached objects when the configured cache limit is reached. This triggers the - (void)cache:(NSCache *)cache willEvictObject:(id)obj; callback. Performing the I/O for the data inside this method achieves the goal of deferring and aggregating data I/O. Fewer I/O operations consume less power.
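The aggregation idea does not depend on NSCache. As a minimal sketch in C, the following write coalescer buffers small records in memory and calls a flush callback only when the buffer would overflow. All names, the tiny capacity, and the counting callback are hypothetical and exist only for illustration; a real callback would write to disk.

```c
#include <stddef.h>
#include <string.h>

#define COALESCE_CAPACITY 64  /* hypothetical threshold, tiny for illustration */

/* Called once per real I/O operation. */
typedef void (*flush_fn)(const char *data, size_t length, void *context);

typedef struct {
    char buffer[COALESCE_CAPACITY];
    size_t used;
    flush_fn flush;
    void *context;
} write_coalescer;

void coalescer_init(write_coalescer *c, flush_fn flush, void *context)
{
    c->used = 0;
    c->flush = flush;
    c->context = context;
}

/* Write out whatever is buffered as a single I/O operation. */
void coalescer_flush(write_coalescer *c)
{
    if (c->used > 0) {
        c->flush(c->buffer, c->used, c->context);
        c->used = 0;
    }
}

/* Buffer a small record; touch the backing store only when the buffer is full. */
void coalescer_append(write_coalescer *c, const char *data, size_t length)
{
    if (length > COALESCE_CAPACITY) {
        /* Too large to buffer: flush pending data, then write it through. */
        coalescer_flush(c);
        c->flush(data, length, c->context);
        return;
    }
    if (c->used + length > COALESCE_CAPACITY) {
        coalescer_flush(c);
    }
    memcpy(c->buffer + c->used, data, length);
    c->used += length;
}

/* Example callback that only counts I/O operations and bytes. */
typedef struct {
    size_t calls;
    size_t bytes;
} flush_stats;

void counting_flush(const char *data, size_t length, void *context)
{
    (void)data;
    flush_stats *stats = context;
    stats->calls++;
    stats->bytes += length;
}
```

With a 64-byte buffer, ten 10-byte appends followed by one final coalescer_flush result in two flush calls instead of ten writes.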

For an example of NSCache usage, see the image loading framework SDWebImage. When reading a cached image, it does not read the disk file (I/O) directly; it uses the system's NSCache.

- (nullable UIImage *)imageFromMemoryCacheForKey:(nullable NSString *)key {
    return [self.memoryCache objectForKey:key];
}

- (nullable UIImage *)imageFromDiskCacheForKey:(nullable NSString *)key {
    UIImage *diskImage = [self diskImageForKey:key];
    if (diskImage && self.config.shouldCacheImagesInMemory) {
        NSUInteger cost = diskImage.sd_memoryCost;
        [self.memoryCache setObject:diskImage forKey:key cost:cost];
    }

    return diskImage;
}

As you can see, the main logic is to first read the image from disk and, if the configuration allows memory caching, store the image in NSCache, so that it can be read from NSCache the next time it is used. NSCache's totalCostLimit and countLimit properties, together with the - (void)setObject:(ObjectType)obj forKey:(KeyType)key cost:(NSUInteger)g; method, are used to set the caching criteria. So when we write file operations against disk and memory, we can borrow this strategy to optimize power consumption.

Crash monitor

1. Review of exception-related knowledge

1.1 Exception handling by the Mach layer

Mach implements its own exception handling mechanism on top of message passing. Mach exception handling was designed with the following in mind:

- A single exception handling facility with consistent semantics: Mach provides only one exception handling mechanism for all types of exceptions (including user-defined exceptions, platform-independent exceptions, and platform-specific exceptions). Exceptions are grouped by type, and specific platforms can define specific subtypes.
- Clear and concise: the exception handling interface relies on Mach's existing, well-defined message and port architecture and is therefore elegant (without compromising efficiency). This allows debuggers and external handlers to be extended, and even, in theory, allows network-based exception handling.

In Mach, exceptions are handled through the kernel's basic facility: the messaging mechanism. An exception is not much more complex than a message. It is raised by the faulting thread or task (via msg_send()) and then caught by a handler (via msg_recv()). The handler can handle the exception, clear the exception (mark it as completed and continue), or decide to terminate the thread.

Mach's exception handling model differs from other exception handling models, in which the exception handler runs in the context of the faulting thread. In Mach, the faulting thread sends a message to a pre-specified exception port and waits for a reply. Each task can register an exception port, which takes effect for all threads in the task. In addition, each thread can register its own exception port via thread_set_exception_ports(thread, exception_mask, new_port, behavior, new_flavor). In general, the exception ports of tasks and threads are NULL, meaning that exceptions are not handled. Once created, these ports can be handed over to other tasks or other hosts, just like any other port in the system. With ports, exceptions can even be forwarded over the network (using UDP) so that applications on other hosts can handle them.

When an exception occurs, the kernel first tries to deliver it to the thread's exception port, then to the task's exception port, and finally to the host's exception port (that is, the default port registered by the host). If none of the ports returns KERN_SUCCESS, the entire task is terminated. In other words, Mach does not provide exception handling logic, only a framework for delivering exception notifications.
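The delivery order can be illustrated with a small model in C. This is a simplified simulation, not the Mach API: handlers are plain function pointers standing in for exception ports, a return value of true stands in for KERN_SUCCESS, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* A handler stands in for an exception port; true plays the role of KERN_SUCCESS. */
typedef bool (*exception_handler)(int exception_type);

typedef enum {
    LEVEL_THREAD = 0,
    LEVEL_TASK = 1,
    LEVEL_HOST = 2,
    LEVEL_NONE = 3   /* nobody handled it: the task would be terminated */
} handled_level;

typedef struct {
    exception_handler thread_port;  /* NULL means no port is registered */
    exception_handler task_port;
    exception_handler host_port;
} exception_ports;

/* Try the thread port first, then the task port, then the host port. */
handled_level deliver_exception(const exception_ports *ports, int exception_type)
{
    const exception_handler order[3] = {
        ports->thread_port, ports->task_port, ports->host_port
    };
    for (int level = 0; level < 3; level++) {
        if (order[level] != NULL && order[level](exception_type)) {
            return (handled_level)level;
        }
    }
    return LEVEL_NONE;
}

/* Two trivial handlers for experimenting with the model. */
bool always_handles(int exception_type) { (void)exception_type; return true; }
bool never_handles(int exception_type)  { (void)exception_type; return false; }
```

With no thread port registered and a task port that handles the exception, deliver_exception returns LEVEL_TASK and the host port is never consulted.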

Exceptions are first thrown by processor traps. To handle traps, every modern kernel has a trap handler installed. These low-level functions are installed by the assembly part of the kernel.

1.2 Exception handling by BSD layer

The BSD layer is the main XNU interface used in user mode, and it presents a POSIX-compliant interface. Developers can use all the functionality of a UNIX system without needing to know the details of the Mach layer implementation.

Mach already provides low-level trap handling through its exception mechanism, while BSD builds signal handling on top of that mechanism. Hardware-generated signals are captured by the Mach layer and then converted to the corresponding UNIX signals. To keep the mechanism uniform, signals generated by the operating system and by users are also first converted to Mach exceptions and then to signals.

All Mach exceptions are converted by ux_exception at the host layer to the corresponding UNIX signal, which is delivered to the faulting thread via threadsignal.
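To give a feel for the conversion, here is a simplified mapping in C. It is a sketch under stated assumptions, not the kernel's ux_exception: only the common cases are covered, and the SKETCH_ constants are local stand-ins for the values in the Mach headers so that the code also compiles outside Apple platforms.

```c
#include <signal.h>

/* Local stand-ins for values from the Mach headers (exception types). */
enum {
    SKETCH_EXC_BAD_ACCESS      = 1,
    SKETCH_EXC_BAD_INSTRUCTION = 2,
    SKETCH_EXC_ARITHMETIC      = 3,
    SKETCH_EXC_SOFTWARE        = 5,
    SKETCH_EXC_BREAKPOINT      = 6
};

/* Local stand-in for KERN_INVALID_ADDRESS. */
enum { SKETCH_KERN_INVALID_ADDRESS = 1 };

/*
 * Simplified exception-to-signal mapping. Returns 0 for cases this sketch
 * does not model (for example EXC_SOFTWARE, where the signal depends on
 * the exception code).
 */
int signal_for_exception(int exception, int code)
{
    switch (exception) {
    case SKETCH_EXC_BAD_ACCESS:
        /* Unmapped address: SIGSEGV; other access faults: SIGBUS. */
        return code == SKETCH_KERN_INVALID_ADDRESS ? SIGSEGV : SIGBUS;
    case SKETCH_EXC_BAD_INSTRUCTION:
        return SIGILL;
    case SKETCH_EXC_ARITHMETIC:
        return SIGFPE;
    case SKETCH_EXC_BREAKPOINT:
        return SIGTRAP;
    default:
        return 0;
    }
}
```

This is the same pairing that shows up in crash logs, for example EXC_BAD_ACCESS reported together with SIGSEGV.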

2. Crash collection approaches

Apple's Crash Reporter in iOS records crash logs, which can be viewed in Settings. Let's look at a crash log first.

Incident Identifier: 7FA6736D-09E8-47A1-95EC-76C4522BDE1A
CrashReporter Key: 4e2d36419259f14413c3229e8b7235bcc74847f3
Hardware Model: iPhone7,1
Process: APMMonitorExample [3608]
Path: /var/containers/Bundle/Application/9518A4F4-59B7-44E9-BDDA-9FBEE8CA18E5/APMMonitorExample.app/APMMonitorExample
Identifier: com.wacai.APMMonitorExample
Version: 1.0 (1)
Code Type: ARM-64
Parent Process: ? [1]
Date/Time: 2017-01-03 11:43:03.000 +0800
OS Version: iOS 10.2 (14C92)
Report Version: 104
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x0000000000000000
Crashed Thread: 0
Application Specific Information:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[__NSSingleObjectArrayI objectForKey:]: unrecognized selector sent to instance 0x174015060'
Thread 0 Crashed:
0 CoreFoundation 0x0000000188f291b8 0x188df9000 + 1245624 (<redacted> + 124)
1 libobjc.A.dylib 0x000000018796055c 0x187958000 + 34140 (objc_exception_throw + 56)
2 CoreFoundation 0x0000000188f30268 0x188df9000 + 1274472 (<redacted> + 140)
3 CoreFoundation 0x0000000188f2d270 0x188df9000 + 1262192 (<redacted> + 916)
4 CoreFoundation 0x0000000188e2680c 0x188df9000 + 186380 (_CF_forwarding_prep_0 + 92)
5 APMMonitorExample 0x000000010004c618 0x100044000 + 34328 (-[MakeCrashHandler throwUncaughtNSException] + 80)

You can see that the Exception Type entry in the Crash log consists of two parts: the Mach Exception + the UNIX signal.

So Exception Type: EXC_CRASH (SIGABRT) means that an EXC_CRASH exception occurred at the Mach layer and was converted at the host layer into a SIGABRT signal, which was delivered to the faulting thread.
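When crash logs are processed by tooling, the two parts can be separated with a few lines of C. This is a hypothetical helper for illustration only; it assumes the field has exactly the "Exception Type: MACH_NAME (SIGNAL_NAME)" shape shown in the log above.

```c
#include <stdio.h>

/*
 * Split an "Exception Type:" line into the Mach exception name and the UNIX
 * signal name. Both output buffers must hold at least 32 bytes.
 * Returns 1 when both parts were found, 0 otherwise.
 */
int parse_exception_type(const char *line, char *mach_name, char *signal_name)
{
    /* %31s reads up to the space, %31[^)] reads up to the closing parenthesis. */
    int matched = sscanf(line, "Exception Type: %31s (%31[^)])",
                         mach_name, signal_name);
    return matched == 2;
}
```

Calling it with the line from the log above fills the two buffers with EXC_CRASH and SIGABRT.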

Question: To catch crashes, should we catch Mach layer exceptions or register UNIX signal handlers?

Answer: Prefer intercepting Mach layer exceptions. As described in 1.2 above, the Mach layer handles the exception earlier. If the Mach layer exception handler makes the process exit, the UNIX signal will never occur.

There are many open source projects in the industry that collect crash logs, notably KSCrash and PLCrashReporter, as well as one-stop services such as Bugly and Umeng. We typically build our own crash collection tools on top of open source projects to meet the company's internal needs. After some comparison, we chose KSCrash. Why we chose KSCrash is not the point of this article.

KSCrash is fully featured and can catch the following types of crashes:

Mach kernel exceptions
Fatal signals
C++ exceptions
Objective-C exceptions
Main thread deadlock (experimental)
Custom crashes (e.g. from scripting languages)

Therefore, analyzing the crash collection scheme on iOS means analyzing how KSCrash implements crash monitoring.

2.1. Mach layer exception handling

The general idea is to create an exception port, request rights for this port, set it as the exception port, create a new kernel thread, and wait for exceptions in a loop on that thread. However, to prevent our Mach layer exception handling from preempting the logic registered by other SDKs or by business-line developers, we need to save the previously registered exception ports at the beginning and hand exception handling over to the logic behind those ports after our own logic has run. After collecting the crash information, assemble the data and write it to a JSON file.

The flow chart is as follows:

For Mach exception capture, you can register an exception port that is responsible for listening on all threads for the current task.

Here’s the key code:

static bool installExceptionHandler()
{
    KSLOG_DEBUG("Installing mach exception handler.");

    bool attributes_created = false;
    pthread_attr_t attr;

    kern_return_t kr;
    int error;
    // Get the current task
    const task_t thisTask = mach_task_self();
    exception_mask_t mask = EXC_MASK_BAD_ACCESS |
    EXC_MASK_BAD_INSTRUCTION |
    EXC_MASK_ARITHMETIC |
    EXC_MASK_SOFTWARE |
    EXC_MASK_BREAKPOINT;

    KSLOG_DEBUG("Backing up original exception ports.");
    // Get the exception ports already registered on the task
    kr = task_get_exception_ports(thisTask,
                                  mask,
                                  g_previousExceptionPorts.masks,
                                  &g_previousExceptionPorts.count,
                                  g_previousExceptionPorts.ports,
                                  g_previousExceptionPorts.behaviors,
                                  g_previousExceptionPorts.flavors);
    // On failure, jump to the failed label
    if(kr != KERN_SUCCESS)
    {
        KSLOG_ERROR("task_get_exception_ports: %s", mach_error_string(kr));
        goto failed;
    }
    // If KSCrash's exception port is still null, allocate one
    if(g_exceptionPort == MACH_PORT_NULL)
    {
        KSLOG_DEBUG("Allocating new port with receive rights.");
        kr = mach_port_allocate(thisTask,
                                MACH_PORT_RIGHT_RECEIVE,
                                &g_exceptionPort);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("mach_port_allocate: %s", mach_error_string(kr));
            goto failed;
        }

        KSLOG_DEBUG("Adding send rights to port.");
        // Request the right for the exception port: MACH_MSG_TYPE_MAKE_SEND
        kr = mach_port_insert_right(thisTask,
                                    g_exceptionPort,
                                    g_exceptionPort,
                                    MACH_MSG_TYPE_MAKE_SEND);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("mach_port_insert_right: %s", mach_error_string(kr));
            goto failed;
        }
    }

    KSLOG_DEBUG("Installing port as exception handler.");
    kr = task_set_exception_ports(thisTask,
                                  mask,
                                  g_exceptionPort,
                                  EXCEPTION_DEFAULT,
                                  THREAD_STATE_NONE);
    if(kr != KERN_SUCCESS)
    {
        KSLOG_ERROR("task_set_exception_ports: %s", mach_error_string(kr));
        goto failed;
    }

    KSLOG_DEBUG("Creating secondary exception thread (suspended).");
    pthread_attr_init(&attr);
    attributes_created = true;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    error = pthread_create(&g_secondaryPThread,
                           &attr,
                           &handleExceptions,
                           kThreadSecondary);
    if(error != 0)
    {
        KSLOG_ERROR("pthread_create_suspended_np: %s", strerror(error));
        goto failed;
    }
    g_secondaryMachThread = pthread_mach_thread_np(g_secondaryPThread);
    ksmc_addReservedThread(g_secondaryMachThread);

    KSLOG_DEBUG("Creating primary exception thread.");
    error = pthread_create(&g_primaryPThread,
                           &attr,
                           &handleExceptions,
                           kThreadPrimary);
    if(error != 0)
    {
        KSLOG_ERROR("pthread_create: %s", strerror(error));
        goto failed;
    }
    pthread_attr_destroy(&attr);
    g_primaryMachThread = pthread_mach_thread_np(g_primaryPThread);
    ksmc_addReservedThread(g_primaryMachThread);

    KSLOG_DEBUG("Mach exception handler installed.");
    return true;

failed:
    KSLOG_DEBUG("Failed to install mach exception handler.");
    if(attributes_created)
    {
        pthread_attr_destroy(&attr);
    }
    // Restore the previously registered exception ports and hand control back
    uninstallExceptionHandler();
    return false;
}

Handle exception logic, assemble crash information

/** Our exception handler thread routine.
 * Wait for an exception message, uninstall our exception port, record the
 * exception information, and write a report.
 */
static void* handleExceptions(void* const userData)
{
    MachExceptionMessage exceptionMessage = {{0}};
    MachReplyMessage replyMessage = {{0}};
    char* eventID = g_primaryEventID;

    const char* threadName = (const char*) userData;
    pthread_setname_np(threadName);
    if(threadName == kThreadSecondary)
    {
        KSLOG_DEBUG("This is the secondary thread. Suspending.");
        thread_suspend((thread_t)ksthread_self());
        eventID = g_secondaryEventID;
    }
    // Loop, reading from the exception port
    for(;;)
    {
        KSLOG_DEBUG("Waiting for mach exception");

        // Wait for a message.
        kern_return_t kr = mach_msg(&exceptionMessage.header,
                                    MACH_RCV_MSG,
                                    0,
                                    sizeof(exceptionMessage),
                                    g_exceptionPort,
                                    MACH_MSG_TIMEOUT_NONE,
                                    MACH_PORT_NULL);
        // A message was received: leave the loop
        if(kr == KERN_SUCCESS)
        {
            break;
        }

        // Loop and try again on failure.
        KSLOG_ERROR("mach_msg: %s", mach_error_string(kr));
    }

    KSLOG_DEBUG("Trapped mach exception code 0x%x, subcode 0x%x",
                exceptionMessage.code[0], exceptionMessage.code[1]);
    if(g_isEnabled)
    {
        // Suspend all threads
        ksmc_suspendEnvironment();
        g_isHandlingCrash = true;
        // Notify that a fatal exception was captured
        kscm_notifyFatalExceptionCaptured(true);

        KSLOG_DEBUG("Exception handler is installed. Continuing exception handling.");

        // Switch to the secondary thread if necessary, or uninstall the handler
        // to avoid a death loop.
        if(ksthread_self() == g_primaryMachThread)
        {
            KSLOG_DEBUG("This is the primary exception thread. Activating secondary thread.");
            // TODO: This was put here to avoid a freeze. Does secondary thread ever fire?
            restoreExceptionPorts();
            if(thread_resume(g_secondaryMachThread) != KERN_SUCCESS)
            {
                KSLOG_DEBUG("Could not activate secondary thread. Restoring original exception ports.");
            }
        }
        else
        {
            KSLOG_DEBUG("This is the secondary exception thread. Restoring original exception ports.");
            // restoreExceptionPorts();
        }

        // Fill out crash information
        KSLOG_DEBUG("Fetching machine state.");
        KSMC_NEW_CONTEXT(machineContext);
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        crashContext->offendingMachineContext = machineContext;
        kssc_initCursor(&g_stackCursor, NULL, NULL);
        if(ksmc_getContextForThread(exceptionMessage.thread.name, machineContext, true))
        {
            kssc_initWithMachineContext(&g_stackCursor, 100, machineContext);
            KSLOG_TRACE("Fault address 0x%x, instruction address 0x%x",
                        kscpu_faultAddress(machineContext), kscpu_instructionAddress(machineContext));
            if(exceptionMessage.exception == EXC_BAD_ACCESS)
            {
                crashContext->faultAddress = kscpu_faultAddress(machineContext);
            }
            else
            {
                crashContext->faultAddress = kscpu_instructionAddress(machineContext);
            }
        }

        KSLOG_DEBUG("Filling out context.");
        crashContext->crashType = KSCrashMonitorTypeMachException;
        crashContext->eventID = eventID;
        crashContext->registersAreValid = true;
        crashContext->mach.type = exceptionMessage.exception;
        crashContext->mach.code = exceptionMessage.code[0];
        crashContext->mach.subcode = exceptionMessage.code[1];
        if(crashContext->mach.code == KERN_PROTECTION_FAILURE && crashContext->isStackOverflow)
        {
            // A stack overflow should return KERN_INVALID_ADDRESS, but
            // when a stack blasts through the guard pages at the top of the stack,
            // it generates KERN_PROTECTION_FAILURE. Correct for this.
            crashContext->mach.code = KERN_INVALID_ADDRESS;
        }
        crashContext->signal.signum = signalForMachException(crashContext->mach.type, crashContext->mach.code);
        crashContext->stackCursor = &g_stackCursor;

        kscm_handleException(crashContext);

        KSLOG_DEBUG("Crash handling complete. Restoring original handlers.");
        g_isHandlingCrash = false;
        ksmc_resumeEnvironment();
    }

    KSLOG_DEBUG("Replying to mach exception message.");
    // Send a reply saying "I didn't handle this exception".
    replyMessage.header = exceptionMessage.header;
    replyMessage.NDR = exceptionMessage.NDR;
    replyMessage.returnCode = KERN_FAILURE;

    mach_msg(&replyMessage.header,
             MACH_SEND_MSG,
             sizeof(replyMessage),
             0,
             MACH_PORT_NULL,
             MACH_MSG_TIMEOUT_NONE,
             MACH_PORT_NULL);

    return NULL;
}

Restore the exception-handling port and transfer control

/** Restore the original mach exception ports.
 */
static void restoreExceptionPorts(void)
{
    KSLOG_DEBUG("Restoring original exception ports.");
    if(g_previousExceptionPorts.count == 0)
    {
        KSLOG_DEBUG("Original exception ports were already restored.");
        return;
    }

    const task_t thisTask = mach_task_self();
    kern_return_t kr;

    // Reinstall old exception ports.
    for(mach_msg_type_number_t i = 0; i < g_previousExceptionPorts.count; i++)
    {
        KSLOG_TRACE("Restoring port index %d", i);
        kr = task_set_exception_ports(thisTask,
                                      g_previousExceptionPorts.masks[i],
                                      g_previousExceptionPorts.ports[i],
                                      g_previousExceptionPorts.behaviors[i],
                                      g_previousExceptionPorts.flavors[i]);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("task_set_exception_ports: %s",
                        mach_error_string(kr));
        }
    }
    KSLOG_DEBUG("Exception ports restored.");
    g_previousExceptionPorts.count = 0;
}

Signal exception handling

The operating system converts Mach exceptions into the corresponding UNIX signals, so developers can also handle them by registering a signal handler.

The processing logic of KSCrash here is as follows:

Take a look at the key code:

Set the signal handler function

static bool installSignalHandler()
{
    KSLOG_DEBUG("Installing signal handler.");

#if KSCRASH_HAS_SIGNAL_STACK
    // Allocate a chunk of memory on the heap
    if(g_signalStack.ss_size == 0)
    {
        KSLOG_DEBUG("Allocating signal stack area.");
        g_signalStack.ss_size = SIGSTKSZ;
        g_signalStack.ss_sp = malloc(g_signalStack.ss_size);
    }
    // The signal handler's stack is moved to the heap instead of being shared with the process.
    // sigaltstack() takes two stack_t pointers. The first describes the location and
    // properties of the new "alternate signal stack". The second may be NULL; if it is not,
    // it receives the information of the previously established alternate signal stack.
    // The function returns 0 on success and -1 on failure.
    KSLOG_DEBUG("Setting signal stack area.");
    if(sigaltstack(&g_signalStack, NULL) != 0)
    {
        KSLOG_ERROR("signalstack: %s", strerror(errno));
        goto failed;
    }
#endif

    const int* fatalSignals = kssignal_fatalSignals();
    int fatalSignalsCount = kssignal_numFatalSignals();

    if(g_previousSignalHandlers == NULL)
    {
        KSLOG_DEBUG("Allocating memory to store previous signal handlers.");
        g_previousSignalHandlers = malloc(sizeof(*g_previousSignalHandlers)
                                          * (unsigned)fatalSignalsCount);
    }
    // The second argument passed to sigaction() below, of type struct sigaction
    struct sigaction action = {{0}};
    // Setting SA_ONSTACK in sa_flags tells the kernel to create the stack frames
    // of the signal handler on the "alternate signal stack".
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;
#if KSCRASH_HOST_APPLE && defined(__LP64__)
    action.sa_flags |= SA_64REGSET;
#endif
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = &handleSignal;

    for(int i = 0; i < fatalSignalsCount; i++)
    {
        // Bind each fatal signal to the action declared above, and save the handler
        // currently installed for that signal in g_previousSignalHandlers
        KSLOG_DEBUG("Assigning handler for signal %d", fatalSignals[i]);
        if(sigaction(fatalSignals[i], &action, &g_previousSignalHandlers[i]) != 0)
        {
            char sigNameBuff[30];
            const char* sigName = kssignal_signalName(fatalSignals[i]);
            if(sigName == NULL)
            {
                snprintf(sigNameBuff, sizeof(sigNameBuff), "%d", fatalSignals[i]);
                sigName = sigNameBuff;
            }
            KSLOG_ERROR("sigaction (%s): %s", sigName, strerror(errno));
            // Try to reverse the damage
            for(i--; i >= 0; i--)
            {
                sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
            }
            goto failed;
        }
    }
    KSLOG_DEBUG("Signal handlers installed.");
    return true;

failed:
    KSLOG_DEBUG("Failed to install signal handlers.");
    return false;
}

While the signal is being handled, context information such as the thread state is recorded:

static void handleSignal(int sigNum, siginfo_t* signalInfo, void* userContext)
{
    KSLOG_DEBUG("Trapped signal %d", sigNum);
    if(g_isEnabled)
    {
        ksmc_suspendEnvironment();
        kscm_notifyFatalExceptionCaptured(false);

        KSLOG_DEBUG("Filling out context.");
        KSMC_NEW_CONTEXT(machineContext);
        ksmc_getContextForSignal(userContext, machineContext);
        kssc_initWithMachineContext(&g_stackCursor, 100, machineContext);

        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        memset(crashContext, 0, sizeof(*crashContext));
        crashContext->crashType = KSCrashMonitorTypeSignal;
        crashContext->eventID = g_eventID;
        crashContext->offendingMachineContext = machineContext;
        crashContext->registersAreValid = true;
        crashContext->faultAddress = (uintptr_t)signalInfo->si_addr;
        crashContext->signal.userContext = userContext;
        crashContext->signal.signum = signalInfo->si_signo;
        crashContext->signal.sigcode = signalInfo->si_code;
        crashContext->stackCursor = &g_stackCursor;

        kscm_handleException(crashContext);
        ksmc_resumeEnvironment();
    }

    KSLOG_DEBUG("Re-raising signal for regular handlers to catch.");
    // This is technically not allowed, but it works in OSX and iOS.
    raise(sigNum);
}

After KSCrash has finished handling the signal, the previously installed signal handlers are restored:

static void uninstallSignalHandler(void)
{
    KSLOG_DEBUG("Uninstalling signal handlers.");

    const int* fatalSignals = kssignal_fatalSignals();
    int fatalSignalsCount = kssignal_numFatalSignals();

    for(int i = 0; i < fatalSignalsCount; i++)
    {
        KSLOG_DEBUG("Restoring original handler for signal %d", fatalSignals[i]);
        sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
    }

    KSLOG_DEBUG("Signal handlers uninstalled.");
}

Description:

A block of memory is allocated on the heap to serve as the "alternate signal stack". The stack frames of the signal handler are then created in this heap block instead of on the stack of the thread that received the signal.

Why do this? A process may have N threads, each with its own task, and if one thread fails, the entire process crashes. For the signal handler to work reliably, it needs its own separate space to run in. A typical case is a recursive function that exhausts the system's default stack space: because the signal handler runs on the stack it allocated on the heap rather than on the exhausted default stack, it still works.

int sigaltstack(const stack_t * __restrict, stack_t * __restrict): both parameters are pointers to a stack_t structure, which stores information about an alternate signal stack (start address, length, state). The first parameter describes the location and properties of the new alternate signal stack. The second parameter, if not NULL, receives information about the alternate signal stack that was established previously.
```
_STRUCT_SIGALTSTACK
{
    void            *ss_sp;     /* signal stack base */
    __darwin_size_t ss_size;    /* signal stack length */
    int             ss_flags;   /* SA_DISABLE and/or SA_ONSTACK */
};
typedef _STRUCT_SIGALTSTACK stack_t; /* [???] signal stack */
```

For a newly created alternate signal stack, ss_flags must be set to 0. The system defines the SIGSTKSZ constant, which is large enough for most alternate signal stacks.

/*
 * Structure used in sigaltstack call.
 */

#define SS_ONSTACK      0x0001  /* take signal on signal stack */
#define SS_DISABLE      0x0004  /* disable taking signals on alternate stack */
#define MINSIGSTKSZ     32768   /* (32K)minimum allowable stack */
#define SIGSTKSZ        131072  /* (128K)recommended stack size */

The sigaltstack system call informs the kernel that an alternate signal stack has been established.

When the returned ss_flags is SS_ONSTACK, the process is currently executing on the alternate signal stack; if it tries to establish a new alternate signal stack at that moment, the call fails with EPERM. A returned SS_DISABLE means that no alternate signal stack is currently established; passing SS_DISABLE in disables the alternate signal stack.
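To make the mechanism concrete, here is a minimal sketch in plain C, independent of KSCrash. It allocates an alternate signal stack on the heap, installs a handler with SA_ONSTACK, raises a harmless signal, and checks that a local variable of the handler really lives inside the heap block. The names g_demoStack, demoHandler and demoAlternateSignalStack are made up for this illustration, and SIGUSR1 is used only because it is safe to raise in a demo.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose sigaltstack/sigaction under strict -std modes */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static stack_t g_demoStack;                        /* hypothetical global, not from KSCrash */
static volatile sig_atomic_t g_ranOnAltStack = 0;  /* set by the handler */

/* Records whether the handler's own stack frame lies inside the heap block. */
static void demoHandler(int sigNum, siginfo_t* info, void* userContext)
{
    (void)sigNum; (void)info; (void)userContext;
    char marker;
    uintptr_t address = (uintptr_t)&marker;
    uintptr_t base = (uintptr_t)g_demoStack.ss_sp;
    g_ranOnAltStack = (address >= base && address < base + g_demoStack.ss_size);
}

/* Returns 1 if the handler ran on the alternate stack, 0 if not, -1 on setup failure. */
int demoAlternateSignalStack(void)
{
    g_demoStack.ss_size = SIGSTKSZ;
    g_demoStack.ss_sp = malloc(g_demoStack.ss_size);
    g_demoStack.ss_flags = 0;                /* must be 0 for a new stack */
    if(g_demoStack.ss_sp == NULL)
    {
        return -1;
    }
    assert(g_demoStack.ss_size >= MINSIGSTKSZ);
    if(sigaltstack(&g_demoStack, NULL) != 0)
    {
        return -1;
    }

    struct sigaction action;
    struct sigaction previous;
    memset(&action, 0, sizeof(action));
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;   /* run the handler on the alternate stack */
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = demoHandler;
    if(sigaction(SIGUSR1, &action, &previous) != 0)
    {
        return -1;
    }

    raise(SIGUSR1);                          /* the handler runs before raise() returns */
    sigaction(SIGUSR1, &previous, NULL);     /* put the old handler back */
    return g_ranOnAltStack ? 1 : 0;
}
```

Calling demoAlternateSignalStack() from main() should return 1 on a POSIX system. Removing SA_ONSTACK from sa_flags makes the handler run on the normal thread stack again, which is the situation the alternate stack is meant to avoid.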

int sigaction(int, const struct sigaction * __restrict, struct sigaction * __restrict);

The first parameter is the number of the signal to handle. It cannot be SIGKILL or SIGSTOP: the default handling of these two signals cannot be overridden, because they give the superuser a reliable way to kill or stop a process, so they cannot be caught, blocked, or ignored.

The second and third parameters are pointers to sigaction structures. If the second argument is not NULL, it specifies the new handling for the signal. If the third argument is not NULL, the previous handling is stored in the structure it points to. If the second argument is NULL and the third is not, the current handling is retrieved without being changed.
```
/*
 * Signal vector "template" used in sigaction call.
 */
struct  sigaction {
    union __sigaction_u __sigaction_u;  /* signal handler */
    sigset_t sa_mask;               /* signal mask to apply */
    int     sa_flags;               /* see signal options below */
};
```
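The three uses of sigaction described above (install, install while saving the old action, query only) can be sketched in a few lines of C. This is an illustration under assumptions, not KSCrash code: firstHandler, secondHandler, fillAction, demoSaveQueryRestore and errnoForSigkillOverride are hypothetical names, and the demo is meant to be run with a harmless signal such as SIGUSR2.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose sigaction under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

static void firstHandler(int sigNum)  { (void)sigNum; }
static void secondHandler(int sigNum) { (void)sigNum; }

/* Builds a minimal struct sigaction around a plain handler function. */
static void fillAction(struct sigaction* action, void (*handler)(int))
{
    memset(action, 0, sizeof(*action));
    sigemptyset(&action->sa_mask);
    action->sa_handler = handler;
}

/* Returns 1 if every step behaved as described in the text, 0 otherwise. */
int demoSaveQueryRestore(int sigNum)
{
    assert(sigNum != SIGKILL && sigNum != SIGSTOP);

    struct sigaction first, second, previous, current;
    fillAction(&first, firstHandler);
    fillAction(&second, secondHandler);

    /* Install firstHandler; the third argument is NULL, so nothing is saved. */
    if(sigaction(sigNum, &first, NULL) != 0) return 0;

    /* Install secondHandler and save the action it replaces. */
    if(sigaction(sigNum, &second, &previous) != 0) return 0;
    if(previous.sa_handler != firstHandler) return 0;

    /* Second argument NULL: only query the current action. */
    if(sigaction(sigNum, NULL, &current) != 0) return 0;
    if(current.sa_handler != secondHandler) return 0;

    /* Restore the saved action, the same pattern uninstallSignalHandler() uses. */
    if(sigaction(sigNum, &previous, NULL) != 0) return 0;
    if(sigaction(sigNum, NULL, &current) != 0) return 0;
    return current.sa_handler == firstHandler;
}

/* Returns the errno produced by trying to replace the SIGKILL handling. */
int errnoForSigkillOverride(void)
{
    struct sigaction action;
    fillAction(&action, firstHandler);
    errno = 0;
    if(sigaction(SIGKILL, &action, NULL) == 0) return 0;
    return errno;
}
```

demoSaveQueryRestore(SIGUSR2) is expected to return 1, and errnoForSigkillOverride() is expected to return EINVAL, which is how the system reports that SIGKILL cannot be caught.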

The sa_flags member of the sigaction structure must include the SA_ONSTACK flag, which tells the kernel to create the signal handler's stack frames on the alternate signal stack.

2.3. C++ exception handling

C++ exception handling relies on the standard library function std::set_terminate(CPPExceptionTerminate).

Some functionality in an iOS project may be implemented in C, C++, and so on. If a C++ exception is thrown and it can be converted to an NSException, the OC exception-catching mechanism is used. If it cannot, the C++ exception flow continues, that is, default_terminate_handler. This default terminate function calls abort_message internally, which ends in a call to abort, and the system generates a SIGABRT signal.

After a C++ exception is thrown, the system wraps it in a try...catch to decide whether it can be converted to an NSException, and then rethrows the C++ exception. By that time the stack at the point where the exception occurred has already disappeared, so the upper layer cannot reconstruct the scene of the exception by capturing the SIGABRT signal. In other words, the exception stack is missing.

Why is that? Inside the try...catch statement, __cxa_rethrow() is called to rethrow the exception, and __cxa_rethrow() in turn calls unwind. Unwind can be thought of as the reverse of function calling: it cleans up the local variables created by each function along the call chain, all the way out to the function containing the outermost catch statement, and then transfers control to that catch statement. This is why the stack of a C++ exception disappears.

static void setEnabled(bool isEnabled)
{
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            initialize();

            ksid_generate(g_eventID);
            g_originalTerminateHandler = std::set_terminate(CPPExceptionTerminate);
        }
        else
        {
            std::set_terminate(g_originalTerminateHandler);
        }
        g_captureNextStackTrace = isEnabled;
    }
}

static void initialize()
{
    static bool isInitialized = false;
    if(!isInitialized)
    {
        isInitialized = true;
        kssc_initCursor(&g_stackCursor, NULL, NULL);
    }
}

void kssc_initCursor(KSStackCursor *cursor,
                     void (*resetCursor)(KSStackCursor*),
                     bool (*advanceCursor)(KSStackCursor*))
{
    cursor->symbolicate = kssymbolicator_symbolicate;
    cursor->advanceCursor = advanceCursor != NULL ? advanceCursor : g_advanceCursor;
    cursor->resetCursor = resetCursor != NULL ? resetCursor : kssc_resetCursor;
    cursor->resetCursor(cursor);
}

static void CPPExceptionTerminate(void)
{
    ksmc_suspendEnvironment();
    KSLOG_DEBUG("Trapped c++ exception");
    const char* name = NULL;
    std::type_info* tinfo = __cxxabiv1::__cxa_current_exception_type();
    if(tinfo != NULL)
    {
        name = tinfo->name();
    }

    if(name == NULL || strcmp(name, "NSException") != 0)
    {
        kscm_notifyFatalExceptionCaptured(false);
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        memset(crashContext, 0, sizeof(*crashContext));

        char descriptionBuff[DESCRIPTION_BUFFER_LENGTH];
        const char* description = descriptionBuff;
        descriptionBuff[0] = 0;

        KSLOG_DEBUG("Discovering what kind of exception was thrown.");
        g_captureNextStackTrace = false;
        try
        {
            throw;
        }
        catch(std::exception& exc)
        {
            strncpy(descriptionBuff, exc.what(), sizeof(descriptionBuff));
        }
#define CATCH_VALUE(TYPE, PRINTFTYPE) \
catch(TYPE value)\
{ \
    snprintf(descriptionBuff, sizeof(descriptionBuff), "%" #PRINTFTYPE, value); \
}
        CATCH_VALUE(char,                 d)
        CATCH_VALUE(short,                d)
        CATCH_VALUE(int,                  d)
        CATCH_VALUE(long,                ld)
        CATCH_VALUE(long long,          lld)
        CATCH_VALUE(unsigned char,        u)
        CATCH_VALUE(unsigned short,       u)
        CATCH_VALUE(unsigned int,         u)
        CATCH_VALUE(unsigned long,       lu)
        CATCH_VALUE(unsigned long long, llu)
        CATCH_VALUE(float,                f)
        CATCH_VALUE(double,               f)
        CATCH_VALUE(long double,         Lf)
        CATCH_VALUE(char*,                s)
        catch(...)
        {
            description = NULL;
        }
        g_captureNextStackTrace = g_isEnabled;

        // TODO: Should this be done here? Maybe better in the exception handler?
        KSMC_NEW_CONTEXT(machineContext);
        ksmc_getContextForThread(ksthread_self(), machineContext, true);

        KSLOG_DEBUG("Filling out context.");
        crashContext->crashType = KSCrashMonitorTypeCPPException;
        crashContext->eventID = g_eventID;
        crashContext->registersAreValid = false;
        crashContext->stackCursor = &g_stackCursor;
        crashContext->CPPException.name = name;
        crashContext->exceptionName = name;
        crashContext->crashReason = description;
        crashContext->offendingMachineContext = machineContext;

        kscm_handleException(crashContext);
    }
    else
    {
        KSLOG_DEBUG("Detected NSException. Letting the current NSException handler deal with it.");
    }
    ksmc_resumeEnvironment();

    KSLOG_DEBUG("Calling original terminate handler.");
    g_originalTerminateHandler();
}

2.4. Objective-C exception handling

Handling NSException at the OC level is relatively easy: register an NSUncaughtExceptionHandler to capture the exception, collect the crash information from the NSException parameter, and hand it to the data reporting component.

static void setEnabled(bool isEnabled)
{
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            KSLOG_DEBUG(@"Backing up original handler.");
            // Record the OC exception handler that was installed before
            g_previousUncaughtExceptionHandler = NSGetUncaughtExceptionHandler();

            KSLOG_DEBUG(@"Setting new handler.");
            // Install the new OC exception handler
            NSSetUncaughtExceptionHandler(&handleException);
            KSCrash.sharedInstance.uncaughtExceptionHandler = &handleException;
        }
        else
        {
            KSLOG_DEBUG(@"Restoring original handler.");
            NSSetUncaughtExceptionHandler(g_previousUncaughtExceptionHandler);
        }
    }
}

2.5. Main thread deadlock

Detecting a main thread deadlock is somewhat similar to detecting an ANR.

A monitoring thread is created whose run method executes its logic in a do...while loop, wrapped in an @autoreleasepool to keep memory usage from growing.

The monitor has an awaitingResponse property and a watchdogPulse method. watchdogPulse sets awaitingResponse to YES and then switches to the main thread, where watchdogAnswer sets awaitingResponse back to NO:

- (void) watchdogPulse
{
    __block id blockSelf = self;
    self.awaitingResponse = YES;
    dispatch_async(dispatch_get_main_queue(), ^
                   {
                       [blockSelf watchdogAnswer];
                   });
}

The thread's run method loops continuously. After sleeping for the configured g_watchdogInterval, it checks whether awaitingResponse is back at its initial value (NO). If it is not, the main thread never answered the last pulse and a deadlock is assumed.

- (void) runMonitor
{
    BOOL cancelled = NO;
    do
    {
        // Only do a watchdog check if the watchdog interval is > 0.
        // If the interval is <= 0, just idle until the user changes it.
        @autoreleasepool {
            NSTimeInterval sleepInterval = g_watchdogInterval;
            BOOL runWatchdogCheck = sleepInterval > 0;
            if(!runWatchdogCheck)
            {
                sleepInterval = kIdleInterval;
            }
            [NSThread sleepForTimeInterval:sleepInterval];
            cancelled = self.monitorThread.isCancelled;
            if(!cancelled && runWatchdogCheck)
            {
                if(self.awaitingResponse)
                {
                    [self handleDeadlock];
                }
                else
                {
                    [self watchdogPulse];
                }
            }
        }
    } while (!cancelled);
}
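The pulse/answer logic can be reduced to a small state machine. The following is a simplified, single-threaded sketch in C that models only the decision made on each pass of the monitor loop. Threads, sleeping and the dispatch to the main queue are deliberately left out, and the names Watchdog, WatchdogResult, watchdogAnswer and watchdogCheck are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct
{
    bool awaitingResponse;   /* true while a pulse to the main thread is unanswered */
} Watchdog;                  /* hypothetical type, not from KSCrash */

typedef enum
{
    WatchdogPulseSent,
    WatchdogDeadlockDetected
} WatchdogResult;

/* What the main thread does when it gets around to processing the pulse. */
void watchdogAnswer(Watchdog* watchdog)
{
    assert(watchdog != NULL);
    watchdog->awaitingResponse = false;
}

/* One pass of the monitor loop, run after sleeping for the watchdog interval. */
WatchdogResult watchdogCheck(Watchdog* watchdog)
{
    assert(watchdog != NULL);
    if(watchdog->awaitingResponse)
    {
        return WatchdogDeadlockDetected;   /* the previous pulse was never answered */
    }
    watchdog->awaitingResponse = true;     /* real code would now dispatch to the main queue */
    return WatchdogPulseSent;
}
```

If watchdogAnswer() runs between two calls to watchdogCheck(), the second call sends a new pulse. If it does not, the second call reports a deadlock, which is the point where the monitor above calls handleDeadlock.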

2.6 Crash generation and saving

2.6.1 Crash log generation logic

In the previous section, we discussed the various Crash monitoring logic in iOS application development. Now we should examine how to record Crash information once Crash is captured, i.e. save it in the application sandbox.

Take the main thread deadlock as an example and see how KSCrash records the crash information.

// KSCrashMonitor_Deadlock.m
- (void) handleDeadlock
{
    ksmc_suspendEnvironment();
    kscm_notifyFatalExceptionCaptured(false);

    KSMC_NEW_CONTEXT(machineContext);
    ksmc_getContextForThread(g_mainQueueThread, machineContext, false);
    KSStackCursor stackCursor;
    kssc_initWithMachineContext(&stackCursor, 100, machineContext);
    char eventID[37];
    ksid_generate(eventID);

    KSLOG_DEBUG(@"Filling out context.");
    KSCrash_MonitorContext* crashContext = &g_monitorContext;
    memset(crashContext, 0, sizeof(*crashContext));
    crashContext->crashType = KSCrashMonitorTypeMainThreadDeadlock;
    crashContext->eventID = eventID;
    crashContext->registersAreValid = false;
    crashContext->offendingMachineContext = machineContext;
    crashContext->stackCursor = &stackCursor;
    
    kscm_handleException(crashContext);
    ksmc_resumeEnvironment();

    KSLOG_DEBUG(@"Calling abort()");
    abort();
}

The other crash types work the same way: the exception information is packaged into a context and passed to kscm_handleException(). Every other crash monitor calls this function once it has captured a crash.

/** Start general exception processing.
 *
 * @param context Contextual information about the exception.
 */
void kscm_handleException(struct KSCrash_MonitorContext* context)
{
    context->requiresAsyncSafety = g_requiresAsyncSafety;
    if(g_crashedDuringExceptionHandling)
    {
        context->crashedDuringCrashHandling = true;
    }
    for(int i = 0; i < g_monitorsCount; i++)
    {
        Monitor* monitor = &g_monitors[i];
        // Check whether this crash monitor is enabled
        if(isMonitorEnabled(monitor))
        {
            // Let each enabled monitor add its own contextual information
            addContextualInfoToEvent(monitor, context);
        }
    }

    // Save the crash information in JSON format
    g_onExceptionEvent(context);

    if(g_handlingFatalException && !g_crashedDuringExceptionHandling)
    {
        KSLOG_DEBUG("Exception is fatal. Restoring original handlers.");
        kscm_setActiveMonitors(KSCrashMonitorTypeNone);
    }
}

g_onExceptionEvent is a function pointer, declared as static void (*g_onExceptionEvent)(struct KSCrash_MonitorContext* monitorContext); it is assigned in KSCrashMonitor.c:

void kscm_setEventCallback(void (*onEvent)(struct KSCrash_MonitorContext* monitorContext))
{
    g_onExceptionEvent = onEvent;
}
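The registration pattern is an ordinary C function-pointer callback. As a rough sketch of the idea only (DemoContext, g_onEvent, demoSetEventCallback, demoHandleException and demoOnCrash are invented stand-ins, not KSCrash symbols):

```c
#include <assert.h>
#include <stddef.h>

struct DemoContext { int eventCount; };   /* hypothetical stand-in for the monitor context */

/* The stored callback; NULL until someone registers one. */
static void (*g_onEvent)(struct DemoContext* context) = NULL;

/* Registration: remember which function to call when an event happens. */
void demoSetEventCallback(void (*onEvent)(struct DemoContext* context))
{
    assert(onEvent != NULL);
    g_onEvent = onEvent;
}

/* Dispatch: the generic handler forwards the context to whoever registered. */
void demoHandleException(struct DemoContext* context)
{
    if(g_onEvent != NULL)
    {
        g_onEvent(context);
    }
}

/* A sample callback, playing the role of the crash-writing function. */
void demoOnCrash(struct DemoContext* context)
{
    context->eventCount++;
}
```

The benefit of this design is that the monitoring layer does not need to know how a report is written. It only calls whatever function the installation code registered.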

The kscm_setEventCallback() function is called in KSCrashC.c:

KSCrashMonitorType kscrash_install(const char* appName, const char* const installPath)
{
    KSLOG_DEBUG("Installing crash reporter.");

    if(g_installed)
    {
        KSLOG_DEBUG("Crash reporter already installed.");
        return g_monitoring;
    }
    g_installed = 1;

    char path[KSFU_MAX_PATH_LENGTH];
    snprintf(path, sizeof(path), "%s/Reports", installPath);
    ksfu_makePath(path);
    kscrs_initialize(appName, path);

    snprintf(path, sizeof(path), "%s/Data", installPath);
    ksfu_makePath(path);
    snprintf(path, sizeof(path), "%s/Data/CrashState.json", installPath);
    kscrashstate_initialize(path);

    snprintf(g_consoleLogPath, sizeof(g_consoleLogPath), "%s/Data/ConsoleLog.txt", installPath);
    if(g_shouldPrintPreviousLog)
    {
        printPreviousLog(g_consoleLogPath);
    }
    kslog_setLogFilename(g_consoleLogPath, true);

    ksccd_init(60);
    // Set the callback that is invoked when a crash occurs
    kscm_setEventCallback(onCrash);
    KSCrashMonitorType monitors = kscrash_setMonitoring(g_monitoring);

    KSLOG_DEBUG("Installation complete.");
    return monitors;
}

/** Called when a crash occurs.
 *
 * This function gets passed as a callback to a crash handler.
 */
static void onCrash(struct KSCrash_MonitorContext* monitorContext)
{
    KSLOG_DEBUG("Updating application state to note crash.");
    kscrashstate_notifyAppCrash();
    monitorContext->consoleLogPath = g_shouldAddConsoleLogToReport ? g_consoleLogPath : NULL;

    // A second crash happened while the first crash was being handled
    if(monitorContext->crashedDuringCrashHandling)
    {
        kscrashreport_writeRecrashReport(monitorContext, g_lastCrashReportFilePath);
    }
    else
    {
        // 1. Create the path for the new crash report file
        char crashReportFilePath[KSFU_MAX_PATH_LENGTH];
        kscrs_getNextCrashReportPath(crashReportFilePath);
        // 2. Copy the new path into g_lastCrashReportFilePath
        strncpy(g_lastCrashReportFilePath, crashReportFilePath, sizeof(g_lastCrashReportFilePath));
        // 3. Pass the newly generated path to the function that writes the report
        kscrashreport_writeStandardReport(monitorContext, crashReportFilePath);
    }
}

The following functions implement the actual writing of the log file. Both do similar things: they format the data as JSON and write it to a file. The difference is that if a second crash occurs while a crash is being handled, the simplified write logic kscrashreport_writeRecrashReport() is used; otherwise the standard write logic kscrashreport_writeStandardReport() is used.

bool ksfu_openBufferedWriter(KSBufferedWriter* writer, const char* const path, char* writeBuffer, int writeBufferLength)
{
    writer->buffer = writeBuffer;
    writer->bufferLength = writeBufferLength;
    writer->position = 0;
    /*
     open() flags:
     #define O_RDONLY   0x0000   open for reading only
     #define O_WRONLY   0x0001   open for writing only
     #define O_RDWR     0x0002   open for reading and writing
     #define O_ACCMODE  0x0003   mask for above mode
     #define O_CREAT    0x0200   create if nonexistant
     #define O_TRUNC    0x0400   truncate to zero length
     #define O_EXCL     0x0800   error if already exists

     Permission bits:
     0755: the owner has read/write/execute permission; group and other users have read and execute permission.
     0644: the owner has read and write permission; group and other users have read-only permission.

     On success a file descriptor is returned; on error -1 is returned.
     */
    writer->fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
    if(writer->fd < 0)
    {
        KSLOG_ERROR("Could not open crash report file %s: %s", path, strerror(errno));
        return false;
    }
    return true;
}
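The combination O_CREAT | O_EXCL means "create the file, and fail if it already exists", so an existing report is never silently overwritten. A minimal sketch of that behaviour, using only POSIX calls (createReportFile is a hypothetical helper name, and the path used to try it out is arbitrary):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 0 if the file was created, otherwise the errno reported by open(). */
int createReportFile(const char* path)
{
    assert(path != NULL);
    /* 0644: owner read/write, group and others read-only. */
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
    if(fd < 0)
    {
        return errno;
    }
    close(fd);
    return 0;
}
```

Calling createReportFile() twice with the same fresh path should return 0 the first time and EEXIST the second time. This also explains why the recrash path, which reuses the name of the last report, first renames the existing file out of the way.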

/**
 * Write a standard crash report to a file.
 *
 * @param monitorContext Contextual information about the crash and environment.
 *                       The caller must fill this out before passing it in.
 *
 * @param path The file to write to.
 */
void kscrashreport_writeStandardReport(const struct KSCrash_MonitorContext* const monitorContext, const char* path)
{
    KSLOG_INFO("Writing crash report to %s", path);
    char writeBuffer[1024];
    KSBufferedWriter bufferedWriter;

    if(!ksfu_openBufferedWriter(&bufferedWriter, path, writeBuffer, sizeof(writeBuffer)))
    {
        return;
    }

    ksccd_freeze();

    KSJSONEncodeContext jsonContext;
    jsonContext.userData = &bufferedWriter;
    KSCrashReportWriter concreteWriter;
    KSCrashReportWriter* writer = &concreteWriter;
    prepareReportWriter(writer, &jsonContext);

    ksjson_beginEncode(getJsonContext(writer), true, addJSONData, &bufferedWriter);

    writer->beginObject(writer, KSCrashField_Report);
    {
        writeReportInfo(writer,
                        KSCrashField_Report,
                        KSCrashReportType_Standard,
                        monitorContext->eventID,
                        monitorContext->System.processName);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeBinaryImages(writer, KSCrashField_BinaryImages);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeProcessState(writer, KSCrashField_ProcessState, monitorContext);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeSystemInfo(writer, KSCrashField_System, monitorContext);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writer->beginObject(writer, KSCrashField_Crash);
        {
            writeError(writer, KSCrashField_Error, monitorContext);
            ksfu_flushBufferedWriter(&bufferedWriter);
            writeAllThreads(writer,
                            KSCrashField_Threads,
                            monitorContext,
                            g_introspectionRules.enabled);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        writer->endContainer(writer);

        if(g_userInfoJSON != NULL)
        {
            addJSONElement(writer, KSCrashField_User, g_userInfoJSON, false);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        else
        {
            writer->beginObject(writer, KSCrashField_User);
        }
        if(g_userSectionWriteCallback != NULL)
        {
            ksfu_flushBufferedWriter(&bufferedWriter);
            g_userSectionWriteCallback(writer);
        }
        writer->endContainer(writer);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeDebugInfo(writer, KSCrashField_Debug, monitorContext);
    }
    writer->endContainer(writer);

    ksjson_endEncode(getJsonContext(writer));
    ksfu_closeBufferedWriter(&bufferedWriter);
    ksccd_unfreeze();
}

/** Write a minimal crash report to a file.
 *
 * @param monitorContext Contextual information about the crash and environment.
 *                       The caller must fill this out before passing it in.
 *
 * @param path The file to write to.
 */
void kscrashreport_writeRecrashReport(const struct KSCrash_MonitorContext* const monitorContext, const char* path)
{
    char writeBuffer[1024];
    KSBufferedWriter bufferedWriter;
    static char tempPath[KSFU_MAX_PATH_LENGTH];
    // Take the path of the last crash report
    // (/var/mobile/Containers/Data/Application/******/Library/Caches/KSCrash/Test/Reports/Test-report-******.json),
    // strip ".json" and append ".old", which gives the new path
    // /var/mobile/Containers/Data/Application/******/Library/Caches/KSCrash/Test/Reports/Test-report-******.old
    strncpy(tempPath, path, sizeof(tempPath) - 10);
    strncpy(tempPath + strlen(tempPath) - 5, ".old", 5);
    KSLOG_INFO("Writing recrash report to %s", path);

    if(rename(path, tempPath) < 0)
    {
        KSLOG_ERROR("Could not rename %s to %s: %s", path, tempPath, strerror(errno));
    }
    // Open the file to write to at the incoming path
    if(!ksfu_openBufferedWriter(&bufferedWriter, path, writeBuffer, sizeof(writeBuffer)))
    {
        return;
    }

    ksccd_freeze();
    // JSON encoding, implemented in C
    KSJSONEncodeContext jsonContext;
    jsonContext.userData = &bufferedWriter;
    KSCrashReportWriter concreteWriter;
    KSCrashReportWriter* writer = &concreteWriter;
    prepareReportWriter(writer, &jsonContext);

    ksjson_beginEncode(getJsonContext(writer), true, addJSONData, &bufferedWriter);

    writer->beginObject(writer, KSCrashField_Report);
    {
        writeRecrash(writer, KSCrashField_RecrashReport, tempPath);
        ksfu_flushBufferedWriter(&bufferedWriter);
        if(remove(tempPath) < 0)
        {
            KSLOG_ERROR("Could not remove %s: %s", tempPath, strerror(errno));
        }
        writeReportInfo(writer,
                        KSCrashField_Report,
                        KSCrashReportType_Minimal,
                        monitorContext->eventID,
                        monitorContext->System.processName);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writer->beginObject(writer, KSCrashField_Crash);
        {
            writeError(writer, KSCrashField_Error, monitorContext);
            ksfu_flushBufferedWriter(&bufferedWriter);
            int threadIndex = ksmc_indexOfThread(monitorContext->offendingMachineContext,
                                                 ksmc_getThreadFromContext(monitorContext->offendingMachineContext));
            writeThread(writer,
                        KSCrashField_CrashedThread,
                        monitorContext,
                        monitorContext->offendingMachineContext,
                        threadIndex,
                        false);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        writer->endContainer(writer);
    }
    writer->endContainer(writer);

    ksjson_endEncode(getJsonContext(writer));
    ksfu_closeBufferedWriter(&bufferedWriter);
    ksccd_unfreeze();
}

2.6.2 Crash log read logic

When the App crashes, KSCrash saves the data to the App's sandbox directory. The next time the App launches, the stored crash files are read, processed, and uploaded.

Call chain after the App starts:

[KSCrashInstallation sendAllReportsWithCompletion:] -> [KSCrash sendAllReportsWithCompletion:] -> [KSCrash allReports] -> [KSCrash reportWithIntID:] -> [KSCrash loadCrashReportJSONWithID:] -> kscrs_readReport

sendAllReportsWithCompletion: reads the crash data from the sandbox.

// Count the crash report files in the reports folder
static int getReportCount()
{
    int count = 0;
    DIR* dir = opendir(g_reportsPath);
    if(dir == NULL)
    {
        KSLOG_ERROR("Could not open directory %s", g_reportsPath);
        goto done;
    }
    struct dirent* ent;
    while((ent = readdir(dir)) != NULL)
    {
        if(getReportIDFromFilename(ent->d_name) > 0)
        {
            count++;
        }
    }

done:
    if(dir != NULL)
    {
        closedir(dir);
    }
    return count;
}

// Get the report IDs from the file names in the reports folder, then load each report
- (NSArray*) allReports
{
    int reportCount = kscrash_getReportCount();
    int64_t reportIDs[reportCount];
    reportCount = kscrash_getReportIDs(reportIDs, reportCount);
    NSMutableArray* reports = [NSMutableArray arrayWithCapacity:(NSUInteger)reportCount];
    for(int i = 0; i < reportCount; i++)
    {
        NSDictionary* report = [self reportWithIntID:reportIDs[i]];
        if(report != nil)
        {
            [reports addObject:report];
        }
    }

    return reports;
}

// Load the crash report for a report ID
- (NSDictionary*) reportWithIntID:(int64_t) reportID
{
    NSData* jsonData = [self loadCrashReportJSONWithID:reportID];
    if(jsonData == nil)
    {
        return nil;
    }

    NSError* error = nil;
    NSMutableDictionary* crashReport = [KSJSONCodec decode:jsonData
                                                   options:KSJSONDecodeOptionIgnoreNullInArray |
                                                           KSJSONDecodeOptionIgnoreNullInObject |
                                                           KSJSONDecodeOptionKeepPartialObject
                                                     error:&error];
    if(error != nil)
    {
        KSLOG_ERROR(@"Encountered error loading crash report %" PRIx64 ": %@", reportID, error);
    }
    if(crashReport == nil)
    {
        KSLOG_ERROR(@"Could not load crash report");
        return nil;
    }
    [self doctorReport:crashReport];

    return crashReport;
}

// Read the crash content for a report ID and convert it to NSData
- (NSData*) loadCrashReportJSONWithID:(int64_t) reportID
{
    char* report = kscrash_readReport(reportID);
    if(report != NULL)
    {
        return [NSData dataWithBytesNoCopy:report length:strlen(report) freeWhenDone:YES];
    }
    return nil;
}

// Read the crash data for a report ID into a char buffer
char* kscrash_readReport(int64_t reportID)
{
    if(reportID <= 0)
    {
        KSLOG_ERROR("Report ID was %" PRIx64, reportID);
        return NULL;
    }

    char* rawReport = kscrs_readReport(reportID);
    if(rawReport == NULL)
    {
        KSLOG_ERROR("Failed to load report ID %" PRIx64, reportID);
        return NULL;
    }

    char* fixedReport = kscrf_fixupCrashReport(rawReport);
    if(fixedReport == NULL)
    {
        KSLOG_ERROR("Failed to fixup report ID %" PRIx64, reportID);
    }

    free(rawReport);
    return fixedReport;
}

// Lock for thread safety, resolve the file path with getCrashReportPathByID(),
// then read the whole file into result
char* kscrs_readReport(int64_t reportID)
{
    pthread_mutex_lock(&g_mutex);
    char path[KSCRS_MAX_PATH_LENGTH];
    getCrashReportPathByID(reportID, path);
    char* result;
    ksfu_readEntireFile(path, &result, NULL, 2000000);
    pthread_mutex_unlock(&g_mutex);
    return result;
}

int kscrash_getReportIDs(int64_t* reportIDs, int count)
{
    return kscrs_getReportIDs(reportIDs, count);
}

int kscrs_getReportIDs(int64_t* reportIDs, int count)
{
    pthread_mutex_lock(&g_mutex);
    count = getReportIDs(reportIDs, count);
    pthread_mutex_unlock(&g_mutex);
    return count;
}

// Loop through the folder and call getReportIDFromFilename() on each ent->d_name
static int getReportIDs(int64_t* reportIDs, int count)
{
    int index = 0;
    DIR* dir = opendir(g_reportsPath);
    if(dir == NULL)
    {
        KSLOG_ERROR("Could not open directory %s", g_reportsPath);
        goto done;
    }

    struct dirent* ent;
    while((ent = readdir(dir)) != NULL && index < count)
    {
        int64_t reportID = getReportIDFromFilename(ent->d_name);
        if(reportID > 0)
        {
            reportIDs[index++] = reportID;
        }
    }

    qsort(reportIDs, (unsigned)count, sizeof(reportIDs[0]), compareInt64);

done:
    if(dir != NULL)
    {
        closedir(dir);
    }
    return index;
}

// sprintf(buffer, format, ...) writes the formatted string into buffer.
// sscanf(string, format, &value) then parses string according to format and stores the result in value.
static int64_t getReportIDFromFilename(const char* filename)
{
    char scanFormat[100];
    sprintf(scanFormat, "%s-report-%%" PRIx64 ".json", g_appName);

    int64_t reportID = 0;
    sscanf(filename, scanFormat, &reportID);
    return reportID;
}
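The two-step trick in getReportIDFromFilename() (build a scan format with the app name, then scan the file name with it) is easy to try in isolation. The sketch below is a standalone variant, not the KSCrash function itself: the app name is passed in as a parameter instead of coming from g_appName, snprintf is used instead of sprintf, and the scanf-side macro SCNx64 is used in place of PRIx64. It also assumes the app name contains no '%' character.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the hexadecimal report ID embedded in "<appName>-report-<id>.json",
 * or 0 if the file name does not match that pattern. */
int64_t reportIDFromFilename(const char* appName, const char* filename)
{
    assert(appName != NULL && filename != NULL);

    /* Step 1: build the scan format, e.g. "Test-report-%lx.json".
     * "%%" becomes a literal '%' so the conversion survives into the format. */
    char scanFormat[100];
    snprintf(scanFormat, sizeof(scanFormat), "%s-report-%%" SCNx64 ".json", appName);

    /* Step 2: scan the file name with that format. */
    uint64_t reportID = 0;
    if(sscanf(filename, scanFormat, &reportID) != 1)
    {
        return 0;
    }
    return (int64_t)reportID;
}
```

For example, with the app name "Test", the file name "Test-report-1a2b.json" yields 0x1a2b, while "CrashState.json" yields 0 and is therefore skipped when the reports folder is enumerated.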

2.7 Monitoring of front-end JS related Crash

2.7.1 JavaScript Core Exception Monitoring

This part is straightforward: exceptions are monitored directly through the exceptionHandler property of the JSContext object, as in the code below.

jsContext.exceptionHandler = ^(JSContext *context, JSValue *exception) {
    // Handle the JavaScriptCore exception information
};

2.7.2 Exception monitoring on H5 pages

When JavaScript in an H5 page throws at runtime, the window object fires an error event of the ErrorEvent interface and executes window.onerror().

window.onerror = function (msg, url, lineNumber, columnNumber, error) {
    // Handle the exception information
};

2.7.3 React Native Exception Monitoring

Small experiment: the following is an RN demo project. A press handler was added to the Debug Text control, and a crash is triggered artificially inside it.

<Text style={styles.sectionTitle} onPress={()=>{1+qw; }}>Debug</Text>

Comparison group 1:

Condition: iOS project debug mode. Added exception handling code in RN side.

Press Command + D in the simulator to bring up the developer menu, select Debug, open the Chrome browser, and press Command + Option + J to open the developer tools. You can then debug the RN code just as you would debug React.

View the crash stack and click an entry to jump to the original source through the source map.

Tips: Release the RN project

Create a folder (release_iOS) under the project root to be used as the output folder for the resources

Switch to the project directory at the terminal and execute the following code

react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_ios/main.jsbundle --assets-dest release_iOS --sourcemap-output release_ios/index.ios.map;

Drag the .jsbundle file and the contents of the assets folder from the release_iOS folder into the iOS project.

Comparison group 2:

Conditions: iOS project release mode. No exception handling code is added on the RN side

Operation: Run the iOS project and click the button to simulate Crash

Symptoms: the iOS app crashes. The screenshots and logs are below.

2020-06-22 22:26:03.318 [info][tid:main][RCTRootView.m:294] Running application todos ({ initialProps = { }; rootTag = 1; })
2020-06-22 22:26:03.490 [info][tid:com.facebook.react.JavaScript] Running "todos" with {"rootTag":1,"initialProps":{}}
2020-06-22 22:27:38.673 [error][tid:com.facebook.react.JavaScript] ReferenceError: Can't find variable: qw
2020-06-22 22:27:38.675 [fatal][tid:com.facebook.react.ExceptionsManagerQueue] Unhandled JS Exception: ReferenceError: Can't find variable: qw
2020-06-22 22:27:38.691300+0800 todos[16790:314161] *** Terminating app due to uncaught exception 'RCTFatalException: Unhandled JS Exception: ReferenceError: Can't find variable: qw', reason: 'Unhandled JS Exception: ReferenceError: Can't find variable: qw, stack:
onPress@397:1821
<unknown>@203:3896
_performSideEffectsForTransition@210:9689
_performSideEffectsForTransition@(null):(null)
_receiveSignal@210:8425
_receiveSignal@(null):(null)
touchableHandleResponderRelease@210:5671
touchableHandleResponderRelease@(null):(null)
onResponderRelease@203:3006
b@97:1125
S@97:1268
w@97:1322
R@97:1617
M@97:2401
forEach@(null):(null)
U@97:2201
<unknown>@97:13818
Pe@97:90199
Re@97:13478
Ie@97:13664
receiveTouches@97:14448
value@27:3544
<unknown>@27:840
value@27:2798
value@27:812
value@(null):(null)
'
*** First throw call stack:
(
    0   CoreFoundation          0x00007fff23e3cf0e __exceptionPreprocess + 350
    1   libobjc.A.dylib         0x00007fff50ba89b2 objc_exception_throw + 48
    2   todos                   0x00000001017b0510 RCTFormatError + 0
    3   todos                   0x000000010182d8ca -[RCTExceptionsManager reportFatal:stack:exceptionId:suppressRedBox:] + 503
    4   todos                   0x000000010182e34e -[RCTExceptionsManager reportException:] + 1658
    5   CoreFoundation          0x00007fff23e43e8c __invoking___ + 140
    6   CoreFoundation          0x00007fff23e41071 -[NSInvocation invoke] + 321
    7   CoreFoundation          0x00007fff23e41344 -[NSInvocation invokeWithTarget:] + 68
    8   todos                   0x00000001017e07fa -[RCTModuleMethod invokeWithBridge:module:arguments:] + 578
    9   todos                   0x00000001017e2a84 _ZN8facebook5reactL11invokeInnerEP9RCTBridgeP13RCTModuleDatajRKN5folly7dynamicE + 246
    10  todos                   0x00000001017e280c ___ZN8facebook5react15RCTNativeModule6invokeEjON5folly7dynamicEi_block_invoke + 78
    11  libdispatch.dylib       0x00000001025b5f11 _dispatch_call_block_and_release + 12
    12  libdispatch.dylib       0x00000001025b6e8e _dispatch_client_callout + 8
    13  libdispatch.dylib       0x00000001025bd6fd _dispatch_lane_serial_drain + 788
    14  libdispatch.dylib       0x00000001025be28f _dispatch_lane_invoke + 422
    15  libdispatch.dylib       0x00000001025c9b65 _dispatch_workloop_worker_thread + 719
    16  libsystem_pthread.dylib 0x00007fff51c08a3d _pthread_wqthread + 290
    17  libsystem_pthread.dylib 0x00007fff51c07b77 start_wqthread + 15
)
libc++abi.dylib: terminating with uncaught exception of type NSException
(lldb)

Tips: How to debug in RN Release mode (see Console information on JS side)

In AppDelegate.m, add #import <React/RCTLog.h>
In - (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions, add RCTSetLogThreshold(RCTLogLevelTrace);

Comparison group 3:

Conditions: iOS project release mode. Add exception handling code on RN side.

global.ErrorUtils.setGlobalHandler((e) => {
  console.log(e);
  let message = { name: e.name, message: e.message, stack: e.stack };
  // Report the error to a test server
  axios.get('http://192.168.1.100:8888/test.php', {
    params: { message: JSON.stringify(message) }
  }).then(function (response) {
    console.log(response)
  }).catch(function (error) {
    console.log(error)
  });
}, true)

Operation: Run the iOS project and click the button to simulate Crash.

Symptoms: The iOS project does not crash. The log is shown below; compare it with the JS in the bundle.

Conclusion:

A crash in an RN project surfaces on the Native side. If the RN side installs code that captures the error, the Native side will not crash. If the error on the RN side is not captured, the Native side crashes directly.

After capturing the error and printing its stack, we found that the corresponding JS had been bundled and minified, which makes crash analysis very difficult. So for RN crashes we need to add monitoring code on the RN side and report the error once it is captured. In addition, we need a dedicated tool that restores the captured crash information to the original source, that is, source map parsing.

2.7.3.1 JS logic error

Anyone who has written RN knows that a problem in the JS code produces a red screen in DEBUG mode, and a white screen or a crash in RELEASE mode. For the sake of user experience and quality control, exception monitoring is required.

When looking at the RN source code, I found ErrorUtils, which can be set to handle error messages.

/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 *
 * @format
 * @flow strict
 * @polyfill
 */

let _inGuard = 0;

type ErrorHandler = (error: mixed, isFatal: boolean) => void;
type Fn<Args, Return> = (...Args) => Return;

/**
 * This is the error handler that is called when we encounter an exception
 * when loading a module. This will report any errors encountered before
 * ExceptionsManager is configured.
 */
let _globalHandler: ErrorHandler = function onError(
  e: mixed,
  isFatal: boolean,
) {
  throw e;
};

/**
 * The particular require runtime that we are using looks for a global
 * `ErrorUtils` object and if it exists, then it requires modules with the
 * error handler specified via ErrorUtils.setGlobalHandler by calling the
 * require function with applyWithGuard. Since the require module is loaded
 * before any of the modules, this ErrorUtils must be defined (and the handler
 * set) globally before requiring anything.
 */
const ErrorUtils = {
  setGlobalHandler(fun: ErrorHandler): void {
    _globalHandler = fun;
  },
  getGlobalHandler(): ErrorHandler {
    return _globalHandler;
  },
  reportError(error: mixed): void {
    _globalHandler && _globalHandler(error, false);
  },
  reportFatalError(error: mixed): void {
    // NOTE: This has an untyped call site in Metro.
    _globalHandler && _globalHandler(error, true);
  },
  applyWithGuard<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    context?: ?mixed,
    args?: ?TArgs,
    // Unused, but some code synced from www sets it to null.
    unused_onError?: null,
    // Some callers pass a name here, which we ignore.
    unused_name?: ?string,
  ): ?TOut {
    try {
      _inGuard++;
      // $FlowFixMe: TODO T48204745 (1) apply(context, null) is fine. (2) array -> rest array should work
      return fun.apply(context, args);
    } catch (e) {
      ErrorUtils.reportError(e);
    } finally {
      _inGuard--;
    }
    return null;
  },
  applyWithGuardIfNeeded<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    context?: ?mixed,
    args?: ?TArgs,
  ): ?TOut {
    if (ErrorUtils.inGuard()) {
      // $FlowFixMe: TODO T48204745 (1) apply(context, null) is fine. (2) array -> rest array should work
      return fun.apply(context, args);
    } else {
      ErrorUtils.applyWithGuard(fun, context, args);
    }
    return null;
  },
  inGuard(): boolean {
    return !!_inGuard;
  },
  guard<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    name?: ?string,
    context?: ?mixed,
  ): ?(...TArgs) => ?TOut {
    // TODO: (moti) T48204753 Make sure this warning is never hit and remove it - types
    // should be sufficient.
    if (typeof fun !== 'function') {
      console.warn('A function must be passed to ErrorUtils.guard, got ', fun);
      return null;
    }
    const guardName = name ?? fun.name ?? '<generated guard>';
    function guarded(...args: TArgs): ?TOut {
      return ErrorUtils.applyWithGuard(
        fun,
        context ?? this,
        args,
        null,
        guardName,
      );
    }

    return guarded;
  },
};

global.ErrorUtils = ErrorUtils;

export type ErrorUtilsT = typeof ErrorUtils;

So RN exceptions can be handled by installing an error handler through global.ErrorUtils. For example

global.ErrorUtils.setGlobalHandler(e => {
   // e.name e.message e.stack
}, true);
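
As a minimal sketch of what such a handler might do, the snippet below normalizes the thrown value and hands a small payload to a reporting callback. The names makeGlobalHandler and report are hypothetical and not part of RN; the (error, isFatal) signature follows the ErrorHandler type in the source above.

```javascript
// Hypothetical helper: builds a handler suitable for ErrorUtils.setGlobalHandler.
// `report` is an assumed callback that passes the payload to your own uploader.
function makeGlobalHandler(report) {
  return function handleError(error, isFatal) {
    // Anything can be thrown in JS, so normalize non-Error values defensively.
    const err = error instanceof Error ? error : new Error(String(error));
    report({
      name: err.name,
      message: err.message,
      stack: err.stack || '',
      isFatal: Boolean(isFatal),
      time: Date.now(),
    });
  };
}

// ErrorUtils only exists inside the RN runtime, so guard the installation.
if (typeof global !== 'undefined' && global.ErrorUtils) {
  global.ErrorUtils.setGlobalHandler(
    makeGlobalHandler(payload => console.log(payload)),
  );
}
```

Keeping the handler free of network code makes it easy to unit test and lets the reporting component decide when and how to upload.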

2.7.3.2 Component problems

One more thing to note about RN crash handling is React Error Boundaries; see the React documentation for details.

In the past, JavaScript errors within a component have caused the internal state of React to be corrupted and produced potentially untraceable errors on the next render. These errors are basically caused by earlier errors in other code (non-React component code), but React does not provide an elegant way to handle these errors in a component, nor does it provide a way to recover from them.

JavaScript errors in one part of the UI should not crash the entire app. To address this, React 16 introduced a concept called error boundaries.

An error boundary is a React component that catches JavaScript errors anywhere in its child component tree, logs them, and renders a fallback UI instead of the crashed subtree. Error boundaries catch errors during rendering, in lifecycle methods, and in the constructors of the whole tree below them.

Error boundaries do not catch the following errors:

Event handlers
Asynchronous code (e.g. setTimeout, Promise, etc.)
Server-side rendering
Errors thrown in the error boundary itself (rather than its children)

Therefore, all exceptions thrown within the component life cycle can be captured by an error boundary component, which then renders a fallback component. This prevents the App from crashing and improves the user experience. The fallback UI can also guide users to send feedback, which makes troubleshooting and fixing easier.
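
Since error boundaries do not see errors thrown in event handlers or asynchronous code, those paths need their own try/catch. A minimal sketch, assuming a hypothetical guardHandler wrapper and a report callback of your own:

```javascript
// Hypothetical wrapper for code paths an error boundary cannot observe.
// `report` is an assumed callback into your own reporting component.
function guardHandler(fn, report) {
  return function guarded(...args) {
    try {
      const result = fn.apply(this, args);
      // Async handlers return a promise; observe its rejection as well.
      if (result && typeof result.then === 'function') {
        return result.catch(error => {
          report(error);
        });
      }
      return result;
    } catch (error) {
      report(error); // synchronous throw from an event handler
      return undefined;
    }
  };
}
```

A handler would then be attached as onPress={guardHandler(handlePress, reportError)}, where handlePress and reportError are placeholders for your own functions.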

So far, the two kinds of RN crashes, JS logic errors and component JS errors, are both monitored and handled. The next question is how to solve these problems at the engineering level.

2.7.4 RN Crash restore

The source map file is essential for parsing front-end logs. The fields of a source map file and the way positions are calculated from them are described in a separate article.

With the SourceMap file, you can restore the RN Crash log with Mozilla’s source-map project.

I wrote a NodeJS script with the following code

var fs = require('fs');
var sourceMap = require('source-map');
// Command-line arguments: the line and column numbers from the crash log
var args = process.argv.slice(2);

function parseJSError(aLine, aColumn) {
    fs.readFile('./index.ios.map', 'utf8', function (err, data) {
        if (err) {
            console.log(err);
            return;
        }
        sourceMap.SourceMapConsumer.with(data, null, consumer => {
            // Look up the original position for the line and column from the crash log
            let parseData = consumer.originalPositionFor({
                line: parseInt(aLine, 10),
                column: parseInt(aColumn, 10)
            });
            // Print to the console
            console.log(parseData);
            // Write to a file (writeFileSync takes no callback; it throws on failure)
            fs.writeFileSync('./parse.txt', JSON.stringify(parseData) + '\n', 'utf8');
        });
    });
}

var line = args[0];
var column = args[1];
parseJSError(line, column);

Now let’s do an experiment, again with the above TODOS project.

Simulate crash on the Text click event

<Text style={styles.sectionTitle} onPress={()=>{1+qw; }}>Debug</Text>

Bundle the RN project and produce the sourceMap file. Execute the command,

react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_ios/main.jsbundle --assets-dest release_iOS --sourcemap-output release_ios/index.ios.map;

If you run this often, add an alias to the .zshrc file (for example when using iTerm2)

alias RNRelease='react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_ios/main.jsbundle --assets-dest release_iOS --sourcemap-output release_ios/index.ios.map;' # RN release bundle

Copy the JS bundle and image resources to your Xcode project
Click Simulate Crash and copy the line and column numbers below the log. Under the Node project, execute the following command
```
node index.js 397 1822
```
Compare the line numbers, column numbers, and file information parsed by the script to the source code file, and the results are correct.
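
Copying the line and column by hand does not scale once crashes are reported automatically. As a rough sketch, the frames in the JS stack of the crash log above (for example onPress@397:1821) can be extracted with a small parser. The function name parseStackFrames is hypothetical, and the name@line:column format is assumed from that log; other JS engines format stacks differently.

```javascript
// Parses frames such as "onPress@397:1821" from a release-mode RN stack string.
// Assumption: frames look like name@line:column, as in the crash log above.
function parseStackFrames(stack) {
  const framePattern = /^(.*)@(\d+):(\d+)$/;
  return stack
    .split(/\s+/)
    .map(token => framePattern.exec(token))
    .filter(match => match !== null) // drops frames like "value@(null):(null)"
    .map(match => ({
      functionName: match[1],
      line: Number(match[2]),
      column: Number(match[3]),
    }));
}
```

Each returned frame can then be passed to the source map lookup instead of typing the numbers on the command line.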

2.7.5 Design of SourceMap parsing system

Objectives: On the platform, an online crash of an RN project can be restored to the specific file, line number, and column number. You can see the specific code, view the RN stack trace, and download the source files.

Managed on the server side by the packaging system:
- Source map files are generated only when packaging for production
- All files are stored before packaging (install)
Develop the RN analysis page on the product side. Clicking a collected RN crash opens a details page that shows the specific file, line number, and column number. You can see the specific code, the RN stack trace, and the Native stack trace. (The technical implementation is described above.)
Source map files are large. Parsing one does not take long, but it consumes computing resources, so an efficient way of reading them needs to be designed.
The source maps for iOS and Android differ, so source map storage needs to be separated by OS.
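
One possible way to meet the last two points is to cache parsed source maps in memory, keyed by platform and app version, so that repeated crashes from the same build do not reparse the file. The sketch below is only an illustration: SourceMapCache, loadMap, and the key format are assumptions, not part of the system described here.

```javascript
// Hypothetical cache for parsed source maps, keyed by platform and app version.
// `loadMap` is an assumed async function that reads and parses one map file.
class SourceMapCache {
  constructor(loadMap, capacity = 4) {
    this.loadMap = loadMap;
    this.capacity = capacity;
    this.entries = new Map(); // insertion order doubles as LRU order
  }

  get(platform, version) {
    const key = `${platform}@${version}`;
    if (this.entries.has(key)) {
      const cached = this.entries.get(key);
      this.entries.delete(key); // re-insert to mark as most recently used
      this.entries.set(key, cached);
      return cached;
    }
    // Cache the promise itself so concurrent callers share a single load.
    const pending = Promise.resolve(this.loadMap(platform, version));
    this.entries.set(key, pending);
    // Do not keep failed loads around.
    pending.catch(() => {
      if (this.entries.get(key) === pending) this.entries.delete(key);
    });
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest); // evict the least recently used entry
    }
    return pending;
  }
}
```

The capacity bounds memory use, which matters because each parsed map can be large.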

3. Wrapping KSCrash

You can then encapsulate your own crash handling logic. For example, the things to do are:

Subclass the abstract class KSCrashInstallation and do the initialization work there (like NSURLProtocol, an abstract class must be subclassed before it can be used), then implement the sink method declared in the abstract class.

/**
 * Crash system installation which handles backend-specific details.
 *
 * Only one installation can be installed at a time.
 *
 * This is an abstract class.
 */
@interface KSCrashInstallation : NSObject

#import "APMCrashInstallation.h"
#import <KSCrash/KSCrashInstallation+Private.h>
#import "APMCrashReporterSink.h"

@implementation APMCrashInstallation

+ (instancetype)sharedInstance {
    static APMCrashInstallation *sharedInstance = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        sharedInstance = [[APMCrashInstallation alloc] init];
    });
    return sharedInstance;
}

- (id)init {
    return [super initWithRequiredProperties: nil];
}

- (id<KSCrashReportFilter>)sink {
    APMCrashReporterSink *sink = [[APMCrashReporterSink alloc] init];
    return [sink defaultCrashReportFilterSetAppleFmt];
}

@end

The APMCrashReporterSink class used inside the sink method conforms to the KSCrashReportFilter protocol and declares the public method defaultCrashReportFilterSetAppleFmt

// .h
#import <Foundation/Foundation.h>
#import <KSCrash/KSCrashReportFilter.h>

@interface APMCrashReporterSink : NSObject<KSCrashReportFilter>

- (id <KSCrashReportFilter>) defaultCrashReportFilterSetAppleFmt;

@end

// .m
#pragma mark - public Method

- (id <KSCrashReportFilter>) defaultCrashReportFilterSetAppleFmt
{
    return [KSCrashReportFilterPipeline filterWithFilters:
            [APMCrashReportFilterAppleFmt filterWithReportStyle:KSAppleReportStyleSymbolicatedSideBySide],
            self,
            nil];
}

Internally, the defaultCrashReportFilterSetAppleFmt method returns the result of the KSCrashReportFilterPipeline class method filterWithFilters:.

APMCrashReportFilterAppleFmt is a class that inherits from KSCrashReportFilterAppleFmt and conforms to the KSCrashReportFilter protocol. The protocol method lets developers process the crash data format.

/** Filter the specified reports.
 *
 * @param reports The reports to process.
 * @param onCompletion Block to call when processing is complete.
 */
- (void) filterReports:(NSArray*) reports
          onCompletion:(KSCrashReportFilterCompletion) onCompletion;

#import <KSCrash/KSCrashReportFilterAppleFmt.h>

@interface APMCrashReportFilterAppleFmt : KSCrashReportFilterAppleFmt<KSCrashReportFilter>

@end

// .m
- (void) filterReports:(NSArray*)reports onCompletion:(KSCrashReportFilterCompletion)onCompletion
{
    NSMutableArray* filteredReports = [NSMutableArray arrayWithCapacity:[reports count]];
    for(NSDictionary *report in reports){
        if([self majorVersion:report] == kExpectedMajorVersion){
            id monitorInfo = [self generateMonitorInfoFromCrashReport:report];
            if(monitorInfo != nil){
                [filteredReports addObject:monitorInfo];
            }
        }
    }
    kscrash_callCompletion(onCompletion, filteredReports, YES, nil);
}

/**
 @brief Extract the crash time, Mach exception name, signal name, and Apple-format report from the crash JSON
 */
- (NSDictionary *)generateMonitorInfoFromCrashReport:(NSDictionary *)crashReport
{
    NSDictionary *infoReport = [crashReport objectForKey:@"report"];
    // ...
    id appleReport = [self toAppleFormat:crashReport];

    NSMutableDictionary *info = [NSMutableDictionary dictionary];
    [info setValue:crashTime forKey:@"crashTime"];
    [info setValue:appleReport forKey:@"appleReport"];
    [info setValue:userException forKey:@"userException"];
    [info setValue:userInfo forKey:@"custom"];

    return [info copy];
}

/**
 * A pipeline of filters. Reports get passed through each subfilter in order.
 *
 * Input: Depends on what's in the pipeline.
 * Output: Depends on what's in the pipeline.
 */
@interface KSCrashReportFilterPipeline : NSObject <KSCrashReportFilter>
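
To make the "passed through each subfilter in order" idea concrete, here is a conceptual model written in JavaScript. It is not KSCrash code: makePipeline and the callback shape (reports, completed, error) are assumptions that only mirror the completion block used above.

```javascript
// Conceptual model only: this is not KSCrash code. Each filter is a function
// (reports, onCompletion) and onCompletion receives (reports, completed, error).
function makePipeline(filters) {
  return function filterReports(reports, onCompletion) {
    const runFrom = (index, current) => {
      if (index === filters.length) {
        onCompletion(current, true, null); // every filter succeeded
        return;
      }
      filters[index](current, (filtered, completed, error) => {
        if (!completed || error) {
          onCompletion(filtered, false, error); // stop at the first failing filter
          return;
        }
        runFrom(index + 1, filtered); // feed the output into the next filter
      });
    };
    runFrom(0, reports);
  };
}
```

In this model, a formatting filter followed by a sink corresponds to the two entries passed to filterWithFilters: above: the first converts each report, the second receives the converted reports and uploads them.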

Set up a starter for the Crash module of the APM component. Internally, the starter initializes KSCrash and assembles the data to record when a crash is triggered, for example: SESSION_ID, App start time, App name, crash time, App version number, current page information, and other basic information.
```
/** C Function to call during a crash report to give the callee an opportunity to
 * add to the report. NULL = ignore.
 *
 * WARNING: Only call async-safe functions from this function! DO NOT call
 * Objective-C methods!!!
 */
@property(atomic,readwrite,assign) KSReportWriteCallback onCrash;
```

+ (instancetype)sharedInstance
{
    static APMCrashMonitor *_sharedManager = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        _sharedManager = [[APMCrashMonitor alloc] init];
    });
    return _sharedManager;
}


#pragma mark - public Method

- (void)startMonitor
{
    APMMLog(@"crash monitor started");

#ifdef DEBUG
    BOOL _trackingCrashOnDebug = [APMMonitorConfig sharedInstance].trackingCrashOnDebug;
    if (_trackingCrashOnDebug) {
        [self installKSCrash];
    }
#else
    [self installKSCrash];
#endif
}

#pragma mark - private method

static void onCrash(const KSCrashReportWriter* writer)
{
    NSString *sessionId = [NSString stringWithFormat:@"\"%@\"", ***]];
    writer->addJSONElement(writer, "SESSION_ID", [sessionId UTF8String], true);
    
    NSString *appLaunchTime = ***;
    writer->addJSONElement(writer, "USER_APP_START_DATE", [[NSString stringWithFormat:@"\"%@\"", appLaunchTime] UTF8String], true);
    // ...
}

- (void)installKSCrash
{
    [[APMCrashInstallation sharedInstance] install];
    [[APMCrashInstallation sharedInstance] sendAllReportsWithCompletion:nil];
    [APMCrashInstallation sharedInstance].onCrash = onCrash;
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(5.f * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
        _isCanAddCrashCount = NO;
    });
}

The installKSCrash method calls [[APMCrashInstallation sharedInstance] sendAllReportsWithCompletion:nil], whose internal implementation is as follows

- (void) sendAllReportsWithCompletion:(KSCrashReportFilterCompletion) onCompletion
{
    NSError* error = [self validateProperties];
    if(error != nil)
    {
        if(onCompletion != nil)
        {
            onCompletion(nil, NO, error);
        }
        return;
    }

    id<KSCrashReportFilter> sink = [self sink];
    if(sink == nil)
    {
        onCompletion(nil, NO, [NSError errorWithDomain:[[self class] description]
                                                  code:0
                                           description:@"Sink was nil (subclasses must implement method \"sink\")"]);
        return;
    }

    sink = [KSCrashReportFilterPipeline filterWithFilters:self.prependedFilters, sink, nil];

    KSCrash* handler = [KSCrash sharedInstance];
    handler.sink = sink;
    [handler sendAllReportsWithCompletion:onCompletion];
}

This method assigns the sink of the KSCrashInstallation to the KSCrash object, and then calls the sendAllReportsWithCompletion: method of KSCrash, which is implemented as follows

- (void) sendAllReportsWithCompletion:(KSCrashReportFilterCompletion) onCompletion
{
    NSArray* reports = [self allReports];

    KSLOG_INFO(@"Sending %d crash reports", [reports count]);

    [self sendReports:reports
         onCompletion:^(NSArray* filteredReports, BOOL completed, NSError* error)
     {
         KSLOG_DEBUG(@"Process finished with completion: %d", completed);
         if(error != nil)
         {
             KSLOG_ERROR(@"Failed to send reports: %@", error);
         }
         if((self.deleteBehaviorAfterSendAll == KSCDeleteOnSucess && completed) ||
            self.deleteBehaviorAfterSendAll == KSCDeleteAlways)
         {
             kscrash_deleteAllReports();
         }
         kscrash_callCompletion(onCompletion, filteredReports, completed, error);
     }];
}

That method in turn calls the instance method sendReports:onCompletion:, as shown below

- (void) sendReports:(NSArray*) reports onCompletion:(KSCrashReportFilterCompletion) onCompletion
{
    if([reports count] == 0)
    {
        kscrash_callCompletion(onCompletion, reports, YES, nil);
        return;
    }
    
    if(self.sink == nil)
    {
        kscrash_callCompletion(onCompletion, reports, NO,
                                 [NSError errorWithDomain:[[self class] description]
                                                     code:0
                                              description:@"No sink set. Crash reports not sent."]);
        return;
    }
    
    [self.sink filterReports:reports
                onCompletion:^(NSArray* filteredReports, BOOL completed, NSError* error)
     {
         kscrash_callCompletion(onCompletion, filteredReports, completed, error);
     }];
}

The [self.sink filterReports:onCompletion:] call inside this method goes to the sink returned by the sink getter in APMCrashInstallation, that is, the return value of the defaultCrashReportFilterSetAppleFmt method of APMCrashReporterSink. Its implementation is as follows

- (id <KSCrashReportFilter>) defaultCrashReportFilterSetAppleFmt
{
    return [KSCrashReportFilterPipeline filterWithFilters:
            [APMCrashReportFilterAppleFmt filterWithReportStyle:KSAppleReportStyleSymbolicatedSideBySide],
            self,
            nil];
}

Because the pipeline contains the APMCrashReporterSink object itself (self), [self.sink filterReports:onCompletion:] ends up calling the data processing method in APMCrashReporterSink. After that, kscrash_callCompletion(onCompletion, reports, YES, nil); tells KSCrash that the crash logs saved locally have been processed and can be deleted.

- (void)filterReports:(NSArray *)reports onCompletion:(KSCrashReportFilterCompletion)onCompletion
{
    for (NSDictionary *report in reports) {
        // Process the crash data and hand it over to the unified data reporting component...
    }
    kscrash_callCompletion(onCompletion, reports, YES, nil);
}

To summarize what KSCrash does: it provides monitoring for many kinds of crashes. After a crash, it efficiently converts the process information, basic information, exception information, thread information, and so on into JSON and writes it to a file using C. On the next launch of the App, it reads the crash logs in the local crash folder, lets developers add custom keys and values, reports the logs to the APM system, and then deletes the logs from the local crash folder.

4. Symbolication

After an application crashes, the system generates a crash log, which can be found in Settings. The log records the running state of the application, the call stack, the threads, and other information. But these logs consist of addresses and are not readable, so they need to be symbolicated.

4.1 .dSYM file

The .dSYM (debugging symbols) file is an intermediate file that stores the mapping between hexadecimal addresses and function symbols. The Xcode project generates a new .dSYM file every time it is compiled and run. By default, debug mode does not generate a .dSYM. You can change the value from DWARF to DWARF with dSYM File in Build Settings -> Build Options -> Debug Information Format. The .dSYM file will then be generated the next time you compile and run.

So you need to save the .dSYM file of each version every time the App is packaged.

The .dSYM bundle contains a DWARF file: show the package contents and you will find it at Test.app.dSYM/Contents/Resources/DWARF/Test.

The .dSYM file is a directory that holds the debugging information extracted from the Mach-O file. For a release build, the debugging information is stored in a separate file for safety. The .dSYM file is actually a file directory with the following structure:

4.2 DWARF file

DWARF is a debugging file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.

DWARF stands for Debugging With Attributed Record Formats.

DWARF is a compact representation of an executable’s relationship to source code.

Most modern programming languages have a block structure: each entity (a class, a function) is contained within another entity. A C program may contain multiple data definitions, variables, and functions per file, so DWARF follows this model and is also block structured. The basic descriptor in DWARF is the Debugging Information Entry (DIE). A DIE has a tag that indicates what the DIE describes and a list of attributes (similar to HTML or XML structures) that fill in the details to further describe the item. A DIE (except the top level one) is contained by a parent DIE and may have sibling DIEs or child DIEs. Attributes may contain a variety of values: constants (such as a function name), variables (such as the starting address of a function), or references to another DIE (such as the return value type of a function).

The data in the DWARF file is as follows:

Section	Description
.debug_loc	Location lists used in the DW_AT_location attribute
.debug_macinfo	Macro information
.debug_pubnames	Lookup table for global objects and functions
.debug_pubtypes	Lookup table for global types
.debug_ranges	Address ranges used in the DW_AT_ranges attribute
.debug_str	String table used by .debug_info
.debug_types	Type descriptions

Common tags and attributes are as follows:

Tag or attribute	Description
DW_TAG_class_type	Represents the class name and type information
DW_TAG_structure_type	Represents the structure name and type information
DW_TAG_union_type	Represents union name and type information
DW_TAG_enumeration_type	Represents enumeration name and type information
DW_TAG_typedef	Represents the name and type information of a typedef
DW_TAG_array_type	Represents the array name and type information
DW_TAG_subrange_type	Represents the size information of an array
DW_TAG_inheritance	Represents inherited class name and type information
DW_TAG_member	Represents a member of a class
DW_TAG_subprogram	Represents the name information of a function
DW_TAG_formal_parameter	Represents the parameter information of a function
DW_AT_name	Represents a name string
DW_AT_type	Represents type information
DW_AT_artificial	Set by the compiler at creation time
DW_AT_sibling	Represents sibling location information
DW_AT_data_member_location	Represents location information
DW_AT_virtuality	Set when virtual

As a brief example of DWARF, parse the DWARF file in the .dSYM folder of the test project with the following command

dwarfdump -F --debug-info Test.app.dSYM/Contents/Resources/DWARF/Test > debug-info.txt

The output looks like this

Test.app.dSYM/Contents/Resources/DWARF/Test:    file format Mach-O arm64

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004f version = 0x0004 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x00000053)

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer [DW_FORM_strp]    ("Apple clang version 11.0.3 (clang-1103.0.32.62)")
              DW_AT_language [DW_FORM_data2]    (DW_LANG_ObjC)
              DW_AT_name [DW_FORM_strp]    ("_Builtin_stddef_max_align_t")
              DW_AT_stmt_list [DW_FORM_sec_offset]    (0x00000000)
              DW_AT_comp_dir [DW_FORM_strp]    ("/Users/lbp/Desktop/Test")
              DW_AT_APPLE_major_runtime_vers [DW_FORM_data1]    (0x02)
              DW_AT_GNU_dwo_id [DW_FORM_data8]    (0x392b5344d415340c)

0x00000027:   DW_TAG_module
                DW_AT_name [DW_FORM_strp]    ("_Builtin_stddef_max_align_t")
                DW_AT_LLVM_config_macros [DW_FORM_strp]    ("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                DW_AT_LLVM_include_path [DW_FORM_strp]    ("/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include")
                DW_AT_LLVM_isysroot [DW_FORM_strp]    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x00000038:     DW_TAG_typedef
                  DW_AT_type [DW_FORM_ref4]    (0x0000004b "long double")
                  DW_AT_name [DW_FORM_strp]    ("max_align_t")
                  DW_AT_decl_file [DW_FORM_data1]    ("/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include/__stddef_max_align_t.h")
                  DW_AT_decl_line [DW_FORM_data1]    (16)

0x00000043:     DW_TAG_imported_declaration
                  DW_AT_decl_file [DW_FORM_data1]    ("/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include/__stddef_max_align_t.h")
                  DW_AT_decl_line [DW_FORM_data1]    (27)
                  DW_AT_import [DW_FORM_ref_addr]    (0x0000000000000027)

0x0000004a:     NULL

0x0000004b:   DW_TAG_base_type
                DW_AT_name [DW_FORM_strp]    ("long double")
                DW_AT_encoding [DW_FORM_data1]    (DW_ATE_float)
                DW_AT_byte_size [DW_FORM_data1]    (0x08)

0x00000052:   NULL
0x00000053: Compile Unit: length = 0x000183dc version = 0x0004 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x00018433)

0x0000005e: DW_TAG_compile_unit
              DW_AT_producer [DW_FORM_strp]    ("Apple clang version 11.0.3 (clang-1103.0.32.62)")
              DW_AT_language [DW_FORM_data2]    (DW_LANG_ObjC)
              DW_AT_name [DW_FORM_strp]    ("Darwin")
              DW_AT_stmt_list [DW_FORM_sec_offset]    (0x000000a7)
              DW_AT_comp_dir [DW_FORM_strp]    ("/Users/lbp/Desktop/Test")
              DW_AT_APPLE_major_runtime_vers [DW_FORM_data1]    (0x02)
              DW_AT_GNU_dwo_id [DW_FORM_data8]    (0xa4a1d339379e18a5)

0x0000007a:   DW_TAG_module
                DW_AT_name [DW_FORM_strp]    ("Darwin")
                DW_AT_LLVM_config_macros [DW_FORM_strp]    ("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                DW_AT_LLVM_include_path [DW_FORM_strp]    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include")
                DW_AT_LLVM_isysroot [DW_FORM_strp]    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x0000008b:     DW_TAG_module
                  DW_AT_name [DW_FORM_strp]    ("C")
                  DW_AT_LLVM_config_macros [DW_FORM_strp]    ("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                  DW_AT_LLVM_include_path [DW_FORM_strp]    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include")
                  DW_AT_LLVM_isysroot [DW_FORM_strp]    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x0000009c:       DW_TAG_module
                    DW_AT_name [DW_FORM_strp]    ("fenv")
                    DW_AT_LLVM_config_macros [DW_FORM_strp]    ("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                    DW_AT_LLVM_include_path [DW_FORM_strp]    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include")
                    DW_AT_LLVM_isysroot [DW_FORM_strp]    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x000000ad:         DW_TAG_enumeration_type
                      DW_AT_type [DW_FORM_ref4]    (0x00017276 "unsigned int")
                      DW_AT_byte_size [DW_FORM_data1]    (0x04)
                      DW_AT_decl_file [DW_FORM_data1]    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/fenv.h")
                      DW_AT_decl_line [DW_FORM_data1]    (154)

0x000000b5:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]    ("__fpcr_trap_invalid")
                        DW_AT_const_value [DW_FORM_udata]    (256)

0x000000bc:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]    ("__fpcr_trap_divbyzero")
                        DW_AT_const_value [DW_FORM_udata]    (512)

0x000000c3:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]    ("__fpcr_trap_overflow")
                        DW_AT_const_value [DW_FORM_udata]    (1024)

0x000000ca:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]    ("__fpcr_trap_underflow")
// ......
0x000466ee:   DW_TAG_subprogram
                DW_AT_name [DW_FORM_strp]    ("CFBridgingRetain")
                DW_AT_decl_file [DW_FORM_data1]    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSObject.h")
                DW_AT_decl_line [DW_FORM_data1]    (105)
                DW_AT_prototyped [DW_FORM_flag_present]    (true)
                DW_AT_type [DW_FORM_ref_addr]    (0x0000000000019155 "CFTypeRef")
                DW_AT_inline [DW_FORM_data1]    (DW_INL_inlined)

0x000466fa:     DW_TAG_formal_parameter
                  DW_AT_name [DW_FORM_strp]    ("X")
                  DW_AT_decl_file [DW_FORM_data1]    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSObject.h")
                  DW_AT_decl_line [DW_FORM_data1]    (105)
                  DW_AT_type [DW_FORM_ref4]    (0x00046706 "id")

0x00046705:     NULL

0x00046706:   DW_TAG_typedef
                DW_AT_type [DW_FORM_ref4]    (0x00046711 "objc_object*")
                DW_AT_name [DW_FORM_strp]    ("id")
                DW_AT_decl_file [DW_FORM_data1]    ("/Users/lbp/Desktop/Test/Test/NetworkAPM/NSURLResponse+apm_FetchStatusLineFromCFNetwork.m")
                DW_AT_decl_line [DW_FORM_data1]    (44)

0x00046711:   DW_TAG_pointer_type
                DW_AT_type [DW_FORM_ref4]    (0x00046716 "objc_object")

0x00046716:   DW_TAG_structure_type
                DW_AT_name [DW_FORM_strp]    ("objc_object")
                DW_AT_byte_size [DW_FORM_data1]    (0x00)

0x0004671c:     DW_TAG_member
                  DW_AT_name [DW_FORM_strp]    ("isa")
                  DW_AT_type [DW_FORM_ref4]    (0x00046727 "objc_class*")
                  DW_AT_data_member_location [DW_FORM_data1]    (0x00)
// ......

I won't paste the whole thing here (it's too long). You can see that a DIE contains the function's start address, end address, function name, file name, and line number. For a given address, if you find the DIE whose start and end addresses enclose it, you can recover the function name and file name.

The debug_line section is used to recover information such as line numbers within a file.

dwarfdump -F --debug-line Test.app.DSYM/Contents/Resources/DWARF/Test > debug-inline.txt

Part of the output:

Test.app.DSYM/Contents/Resources/DWARF/Test:    file format Mach-O arm64

.debug_line contents:
debug_line[0x00000000]
Line table prologue:
    total_length: 0x000000a3
         version: 4
 prologue_length: 0x0000009a
 min_inst_length: 1
max_ops_per_inst: 1
 default_is_stmt: 1
       line_base: -5
      line_range: 14
     opcode_base: 13
standard_opcode_lengths[DW_LNS_copy] = 0
standard_opcode_lengths[DW_LNS_advance_pc] = 1
standard_opcode_lengths[DW_LNS_advance_line] = 1
standard_opcode_lengths[DW_LNS_set_file] = 1
standard_opcode_lengths[DW_LNS_set_column] = 1
standard_opcode_lengths[DW_LNS_negate_stmt] = 0
standard_opcode_lengths[DW_LNS_set_basic_block] = 0
standard_opcode_lengths[DW_LNS_const_add_pc] = 0
standard_opcode_lengths[DW_LNS_fixed_advance_pc] = 1
standard_opcode_lengths[DW_LNS_set_prologue_end] = 0
standard_opcode_lengths[DW_LNS_set_epilogue_begin] = 0
standard_opcode_lengths[DW_LNS_set_isa] = 1
include_directories[  1] = "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include"
file_names[  1]:
           name: "__stddef_max_align_t.h"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000

Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000      1      0      1   0             0  is_stmt end_sequence
debug_line[0x000000a7]
Line table prologue:
    total_length: 0x0000230a
         version: 4
 prologue_length: 0x00002301
 min_inst_length: 1
max_ops_per_inst: 1
 default_is_stmt: 1
       line_base: -5
      line_range: 14
     opcode_base: 13
standard_opcode_lengths[DW_LNS_copy] = 0
standard_opcode_lengths[DW_LNS_advance_pc] = 1
standard_opcode_lengths[DW_LNS_advance_line] = 1
standard_opcode_lengths[DW_LNS_set_file] = 1
standard_opcode_lengths[DW_LNS_set_column] = 1
standard_opcode_lengths[DW_LNS_negate_stmt] = 0
standard_opcode_lengths[DW_LNS_set_basic_block] = 0
standard_opcode_lengths[DW_LNS_const_add_pc] = 0
standard_opcode_lengths[DW_LNS_fixed_advance_pc] = 1
standard_opcode_lengths[DW_LNS_set_prologue_end] = 0
standard_opcode_lengths[DW_LNS_set_epilogue_begin] = 0
standard_opcode_lengths[DW_LNS_set_isa] = 1
include_directories[  1] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include"
include_directories[  2] = "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include"
include_directories[  3] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/sys"
include_directories[  4] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach"
include_directories[  5] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/libkern"
include_directories[  6] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/architecture"
include_directories[  7] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/sys/_types"
include_directories[  8] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/_types"
include_directories[  9] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/arm"
include_directories[ 10] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/sys/_pthread"
include_directories[ 11] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach/arm"
include_directories[ 12] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/libkern/arm"
include_directories[ 13] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/uuid"
include_directories[ 14] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/netinet"
include_directories[ 15] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/netinet6"
include_directories[ 16] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/net"
include_directories[ 17] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/pthread"
include_directories[ 18] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach_debug"
include_directories[ 19] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/os"
include_directories[ 20] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/malloc"
include_directories[ 21] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/bsm"
include_directories[ 22] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/machine"
include_directories[ 23] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach/machine"
include_directories[ 24] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/secure"
include_directories[ 25] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/xlocale"
include_directories[ 26] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/arpa"
file_names[  1]:
           name: "fenv.h"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000
file_names[  2]:
           name: "stdatomic.h"
      dir_index: 2
       mod_time: 0x00000000
         length: 0x00000000
file_names[  3]:
           name: "wait.h"
      dir_index: 3
       mod_time: 0x00000000
         length: 0x00000000
// ......
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x000000010000b588     14      0      2   0             0  is_stmt
0x000000010000b5b4     16      5      2   0             0  is_stmt prologue_end
0x000000010000b5d0     17     11      2   0             0  is_stmt
0x000000010000b5d4      0      0      2   0             0 
0x000000010000b5d8     17      5      2   0             0 
0x000000010000b5dc     17     11      2   0             0 
0x000000010000b5e8     18      1      2   0             0  is_stmt
0x000000010000b608     20      0      2   0             0  is_stmt
0x000000010000b61c     22      5      2   0             0  is_stmt prologue_end
0x000000010000b628     23      5      2   0             0  is_stmt
0x000000010000b644     24      1      2   0             0  is_stmt
0x000000010000b650     15      0      1   0             0  is_stmt
0x000000010000b65c     15     41      1   0             0  is_stmt prologue_end
0x000000010000b66c     11      0      2   0             0  is_stmt
0x000000010000b680     11     17      2   0             0  is_stmt prologue_end
0x000000010000b6a4     11     17      2   0             0  is_stmt end_sequence
debug_line[0x0000def9]
Line table prologue:
    total_length: 0x0000015a
         version: 4
 prologue_length: 0x000000eb
 min_inst_length: 1
max_ops_per_inst: 1
 default_is_stmt: 1
       line_base: -5
      line_range: 14
     opcode_base: 13
standard_opcode_lengths[DW_LNS_copy] = 0
standard_opcode_lengths[DW_LNS_advance_pc] = 1
standard_opcode_lengths[DW_LNS_advance_line] = 1
standard_opcode_lengths[DW_LNS_set_file] = 1
standard_opcode_lengths[DW_LNS_set_column] = 1
standard_opcode_lengths[DW_LNS_negate_stmt] = 0
standard_opcode_lengths[DW_LNS_set_basic_block] = 0
standard_opcode_lengths[DW_LNS_const_add_pc] = 0
standard_opcode_lengths[DW_LNS_fixed_advance_pc] = 1
standard_opcode_lengths[DW_LNS_set_prologue_end] = 0
standard_opcode_lengths[DW_LNS_set_epilogue_begin] = 0
standard_opcode_lengths[DW_LNS_set_isa] = 1
include_directories[  1] = "Test"
include_directories[  2] = "Test/NetworkAPM"
include_directories[  3] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/objc"
file_names[  1]:
           name: "AppDelegate.h"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000
file_names[  2]:
           name: "JMWebResourceURLProtocol.h"
      dir_index: 2
       mod_time: 0x00000000
         length: 0x00000000
file_names[  3]:
           name: "AppDelegate.m"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000
file_names[  4]:
           name: "objc.h"
      dir_index: 3
       mod_time: 0x00000000
         length: 0x00000000
// ......

You can see that debug_line maps each code address to a line number. The excerpt above includes the part that belongs to AppDelegate.
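To make the address-to-line mapping concrete, here is a minimal sketch in Python. The rows are copied from the line table excerpt above; the function name `lookup_line` is hypothetical, and the lookup rule (take the last row whose address is not greater than the target, and stop at the `end_sequence` row) is the usual way such tables are read, not something taken from this article's tooling.

```python
from bisect import bisect_right

# (address, line, file index) rows copied from the line table excerpt above.
ROWS = [
    (0x10000B588, 14, 2), (0x10000B5B4, 16, 2), (0x10000B5D0, 17, 2),
    (0x10000B5D4, 0, 2), (0x10000B5D8, 17, 2), (0x10000B5DC, 17, 2),
    (0x10000B5E8, 18, 2), (0x10000B608, 20, 2), (0x10000B61C, 22, 2),
    (0x10000B628, 23, 2), (0x10000B644, 24, 2), (0x10000B650, 15, 1),
    (0x10000B65C, 15, 1), (0x10000B66C, 11, 2), (0x10000B680, 11, 2),
]
# Address of the end_sequence row: the first address past this sequence.
END_OF_SEQUENCE = 0x10000B6A4
ADDRESSES = [row[0] for row in ROWS]


def lookup_line(address):
    """Return (line, file index) for an address, or None if it is outside the table."""
    if not ROWS[0][0] <= address < END_OF_SEQUENCE:
        return None
    # The last row whose address is <= the target covers the target.
    index = bisect_right(ADDRESSES, address) - 1
    _, line, file_index = ROWS[index]
    return line, file_index


print(lookup_line(0x10000B5C0))
```

The file index still has to be resolved through the `file_names` list of the same line table, which is omitted here.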

4.3 Symbols

In linking, functions and variables are collectively called symbols, and the function name or variable name is the symbol name. We can regard symbols as the glue of linking: the whole linking process relies on symbols to complete correctly.

The text above is from the book Programmer's Self-Cultivation. So "symbol" is a general term for a function, a variable, or a class.

By type, symbols fall into three categories:

Global symbol: a symbol visible outside its object file. It can be referenced by other object files, or it needs to be defined by another object file.
Local symbol: a symbol visible only inside its object file, that is, functions and variables that are visible only in that object file.
Debug symbol: debugging information that includes line number information, which records the file and line number of functions and variables.

Symbol table: a mapping table from memory addresses to function names, file names, and line numbers. Each defined symbol has a corresponding value, called the symbol value. For variables and functions, the symbol value is the address. Each entry in the symbol table has the following form:

< start address > < end address > < function > [< filename: line number >]
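As a rough illustration of how such a table is used, the following Python sketch looks up the entry whose address range contains a given address. The `Symbol` type, the `find_symbol` helper, and all entries are hypothetical, and treating the end address as exclusive is an assumption.

```python
from typing import NamedTuple, Optional, Sequence


class Symbol(NamedTuple):
    start: int      # start address of the function
    end: int        # end address (assumed exclusive in this sketch)
    function: str
    location: str   # "<filename>:<line number>"


# Hypothetical entries, for illustration only.
SYMBOLS = [
    Symbol(0x5900, 0x5A00, "example_function_a", "Example.m:10"),
    Symbol(0x5A00, 0x5A80, "example_function_b", "Example.m:42"),
]


def find_symbol(table: Sequence[Symbol], address: int) -> Optional[Symbol]:
    """Return the entry whose [start, end) range contains the address."""
    for symbol in table:
        if symbol.start <= address < symbol.end:
            return symbol
    return None


print(find_symbol(SYMBOLS, 0x592C))
```

A linear scan keeps the sketch short; a real table with many entries would more likely be sorted and searched by bisection.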

4.4 How do I get an address?

When an image is loaded, it is relocated relative to a base address, and the base address is different on every launch. The addresses in the stack frames are absolute addresses after relocation, whereas what we need are the relative addresses before relocation.

Binary Images

Take the test project's crash log as an example. Part of its Binary Images section is shown below:

// ...
Binary Images:
0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test
0x1030e0000 - 0x1030ebfff libobjc-trampolines.dylib arm64  <181f3aa866d93165ac54344385ac6e1d> /usr/lib/libobjc-trampolines.dylib
0x103204000 - 0x103267fff dyld arm64  <6f1c86b640a3352a8529bca213946dd5> /usr/lib/dyld
0x189a78000 - 0x189a8efff libsystem_trace.dylib arm64  <b7477df8f6ab3b2b9275ad23c6cc0b75> /usr/lib/system/libsystem_trace.dylib
// ...

You can see that the Binary Images section of the crash log contains, for each image, the load start address, end address, image name, CPU architecture, UUID, and image path.

Information in the Crash log

Last Exception Backtrace:
// ...
5   Test                              0x102fe592c -[ViewController testMonitorCrash] + 22828 (ViewController.mm:58)

Binary Images:
0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test

So the relative address of frame 5 is 0x102fe592c - 0x102fe0000 = 0x592c. Use the following command to restore the symbol information.

atos is used for symbolication. 0x102fe0000 is the load address of the image and 0x102fe592c is the address of the frame to be restored.

atos -o Test.app.DSYM/Contents/Resources/DWARF/Test -arch arm64 -l 0x102fe0000 0x102fe592c
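The arithmetic and the command line can be derived mechanically from the crash log. The Python sketch below parses the frame line and the Binary Images line quoted above, computes the relative address, and assembles the atos argument list. The helper names and regular expressions are hypothetical and only cover lines shaped like the ones in this example (for instance, image names without spaces).

```python
import re

FRAME = ("5   Test                              0x102fe592c "
         "-[ViewController testMonitorCrash] + 22828 (ViewController.mm:58)")
IMAGE = ("0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> "
         "/var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test")

IMAGE_RE = re.compile(
    r"^(0x[0-9a-fA-F]+)\s+-\s+(0x[0-9a-fA-F]+)\s+(\S+)\s+(\S+)\s+<([0-9a-fA-F]+)>\s+(.+)$")
FRAME_RE = re.compile(r"^(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s")


def parse_image(line):
    """Split one Binary Images line into its fields."""
    match = IMAGE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a Binary Images line: " + line)
    start, end, name, arch, uuid, path = match.groups()
    return {"start": int(start, 16), "end": int(end, 16),
            "name": name, "arch": arch, "uuid": uuid, "path": path}


def parse_frame(line):
    """Extract frame index, image name, and absolute address from a backtrace line."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        raise ValueError("not a backtrace line: " + line)
    index, image, address = match.groups()
    return {"index": int(index), "image": image, "address": int(address, 16)}


def atos_args(dwarf_path, image, frame):
    """Build the atos argument list: load address after -l, then the frame address."""
    return ["atos", "-o", dwarf_path, "-arch", image["arch"],
            "-l", hex(image["start"]), hex(frame["address"])]


image, frame = parse_image(IMAGE), parse_frame(FRAME)
print(hex(frame["address"] - image["start"]))
print(" ".join(atos_args("Test.app.DSYM/Contents/Resources/DWARF/Test", image, frame)))
```

Note that the relative address 0x592c equals the decimal offset 22828 printed in the frame line.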

4.5 UUID

The UUID in the crash file

grep --after-context=2 "Binary Images:" *.crash

Test  5-28-20, 7-47 PM.crash:Binary Images:
Test  5-28-20, 7-47 PM.crash-0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test
Test  5-28-20, 7-47 PM.crash-0x1030e0000 - 0x1030ebfff libobjc-trampolines.dylib arm64  <181f3aa866d93165ac54344385ac6e1d> /usr/lib/libobjc-trampolines.dylib
--
Test.crash:Binary Images:
Test.crash-0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test
Test.crash-0x1030e0000 - 0x1030ebfff libobjc-trampolines.dylib arm64  <181f3aa866d93165ac54344385ac6e1d> /usr/lib/libobjc-trampolines.dylib

The UUID of the Test app is 37eaa57df2523d95969e47a9a1d69ce5.

The UUID of the .dSYM file
```
dwarfdump --uuid Test.app.DSYM
```

The result is:

UUID: 37EAA57D-F252-3D95-969E-47A9A1D69CE5 (arm64) Test.app.DSYM/Contents/Resources/DWARF/Test

The UUID of the app
```
dwarfdump --uuid Test.app/Test
```

The result is:

UUID: 37EAA57D-F252-3D95-969E-47A9A1D69CE5 (arm64) Test.app/Test
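The crash log prints the UUID in lowercase without dashes and inside angle brackets, while dwarfdump prints it in uppercase with dashes, so the two have to be normalized before they can be compared. A minimal Python sketch follows; the function names are hypothetical.

```python
def normalize_uuid(text):
    """Lowercase a UUID and drop dashes and surrounding angle brackets."""
    return text.strip().strip("<>").replace("-", "").lower()


def dwarfdump_uuid(line):
    """Extract the UUID from one line of `dwarfdump --uuid` output."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "UUID:":
        raise ValueError("unexpected dwarfdump output: " + line)
    return normalize_uuid(parts[1])


crash_uuid = normalize_uuid("<37eaa57df2523d95969e47a9a1d69ce5>")
dsym_uuid = dwarfdump_uuid(
    "UUID: 37EAA57D-F252-3D95-969E-47A9A1D69CE5 (arm64) "
    "Test.app.DSYM/Contents/Resources/DWARF/Test")
print(crash_uuid == dsym_uuid)
```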

4.6 Symbolication (parsing the crash log)

The sections above analyzed how to capture the various types of crash. When the app is in users' hands, we can capture the crash-scene information by technical means and report it through a reporting mechanism. However, the captured stack consists of hexadecimal addresses, which cannot be used to locate a problem directly, so it has to be symbolicated.

The role of the .dSYM file was also explained above: restoring file names, line numbers, and function names from addresses with the help of the .dSYM file is called symbolication. However, the .dSYM file must correspond to the bundle ID and version of the crash log.

To get a crash log, open Xcode -> Window -> Devices and Simulators, select the corresponding device, and find the crash log file by time and app name.

The .app and .dSYM files can be obtained from the archive produced by packaging. The path is ~/Library/Developer/Xcode/Archives.

There are two common ways to symbolicate:

Using symbolicatecrash

symbolicatecrash is the crash log analysis tool that ships with Xcode. First determine its path by running the following command in the terminal:
```
find /Applications/Xcode.app -name symbolicatecrash -type f
```

It returns several paths. Find the line that contains iPhoneSimulator.platform:

/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/Library/PrivateFrameworks/DVTFoundation.framework/symbolicatecrash

Copy symbolicatecrash to a working folder (the folder that holds the .app, .dSYM, and .crash files).

Execute the command

./symbolicatecrash Test.crash Test.DSYM > Test.symbolicated.crash

If this fails with the error "DEVELOPER_DIR" is not defined at ./symbolicatecrash line 69, run the following command in the terminal first:

export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer

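Using atos

Unlike symbolicatecrash, atos is more flexible: it works as long as the .crash file matches the .dSYM file, or the .crash file matches the .app file.

The usage is as follows. -l is followed by the load address of the image, and the addresses to symbolicate come after it (if none are given, atos reads them from standard input).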
```
xcrun atos -o Test.app.DSYM/Contents/Resources/DWARF/Test -arch armv7 -l 0x1023c592c
```

You can also symbolicate against the .app file (when no .dSYM file exists), where xxx is the load address of the image and xx is the address to symbolicate

atos -arch architecture -o binary -l xxx xx

Because we may have many apps, and each app may have several versions in users' hands, after APM captures a crash the crash file and the .dSYM file must be matched one to one before symbolication can be correct. The matching rule is that their UUIDs are identical.

4.7 Symbolication of system libraries

Every time we connect a real device to Xcode to run a program, Xcode prompts us to wait. In fact, in order to symbolicate stacks, Xcode is automatically importing the system library symbols for the device's current OS version into the /Users/<your user name>/Library/Developer/Xcode/iOS DeviceSupport directory, which ends up holding the symbol files of many system libraries. You can visit the following directory to have a look:

/Users/<your user name>/Library/Developer/Xcode/iOS DeviceSupport/

5. Server processing

5.1 ELK logging system

Log monitoring systems in the industry are generally built on ELK. ELK is an acronym for Elasticsearch, Logstash, and Kibana. Elasticsearch is a distributed, near-real-time search platform with a RESTful interface. Logstash is a central data-flow engine that collects data in different formats from different sources (files, data stores, MQ) and, after filtering, outputs it to different destinations (files, MQ, Redis, Elasticsearch, Kafka). Kibana presents Elasticsearch data in friendly pages and provides visual analysis. ELK can therefore be used to build an efficient, enterprise-grade log analysis system.

In the early era of monolithic applications, almost all of an application's functions ran on a single machine. When something went wrong, operations staff opened a terminal, typed commands to view the system log directly, and then located and solved the problem. As systems grew more complex and user numbers grew larger, a monolith could hardly meet demand, so the architecture evolved: it scaled horizontally to support a large user base, the monolith was split into multiple applications, each application was deployed as a cluster, and load balancing handled scheduling. If a sub-module has a problem, do we go to a terminal on each of those servers to analyze logs? Obviously that is far too inefficient, so log management platforms came into being. Logstash collects and parses the log files on each server, filters them according to the defined regular-expression templates, and transfers them to Kafka or Redis. Another Logstash then reads the logs from Kafka or Redis and stores them in ES, where indexes are created. Finally, Kibana is used for visual analysis. In addition, the collected data can be analyzed further to support maintenance and decision making.

The diagram above shows a log architecture diagram for ELK. A brief explanation:

In front of Logstash and ES there is a Kafka layer. Logstash is deployed on the data source servers and filters the collected data in real time, which takes time and memory, so Kafka is added as a data buffer, because Kafka has very good read and write performance.
The next step is for Logstash to read data from Kafka, filter and process it, and send the results to ES.
This design not only performs well and has low coupling, but is also extensible. For example, N Logstash instances can write to N Kafka instances, and N Logstash instances can then do the filtering. There can also be M log sources, such as App logs, Tomcat logs, Nginx logs, and so on.

A screenshot from an Elastic APM hands-on session shared by the Elasticsearch community.

5.2 Server side

Crash logs are not symbolicated when they enter Kibana, so they need to be symbolicated in order to locate problems, generate crash reports, and support follow-up processing.

So the whole process is as follows: the client APM SDK collects the crash log -> Kafka storage -> a Mac machine runs a scheduled task to symbolicate it -> the data flows back to Kafka -> the product side (display side) classifies the data, produces reports, raises alarms, and so on.

Because the company has multiple product lines and multiple apps, and users run different versions of each app, crash log analysis requires the correct .dSYM file. Managing the .dSYM files of the different app versions automatically is therefore very important.

There are two ways to automate this. Smaller companies, or those who want to keep things simple, can add a Run Script to Xcode that automatically uploads the dSYM in release mode.

Because we have a front-end engineering system that can simultaneously manage iOS SDK, iOS App, Android SDK, Android App, Node, React, and React Native projects, covering project initialization, dependency management, build (continuous integration, unit tests, Lint, unified-jump checks), testing, packaging, deployment, and dynamic capabilities (hot update, delivery of unified-jump routes), capabilities can be inserted at each stage. So in the packaging system, after packaging is invoked, the packaging machine can upload the .dSYM file to Qiniu Cloud storage (the rule can be AppName + Version as the key and the .dSYM file as the value).

Much of today's architecture is built on microservices. Why that is so is beyond the scope of this article. The symbolication of crash logs is therefore designed as a microservice. The architecture diagram is as follows

Description:

As an integral part of the overall monitoring system, the Symbolication Service is a microservice focused on crash report symbolication.
It receives requests from the task scheduling framework containing the pre-processed crash report and the dSYM index, pulls the corresponding dSYM from Qiniu, symbolicates the crash report, calculates a hash, and returns the hash to the “Data Processing and Task Scheduling Framework”.
It receives requests from the APM management system containing the original crash report and the dSYM index, pulls the corresponding dSYM from Qiniu, symbolicates the crash report, and returns the symbolicated crash report to the APM management system.
The scaffolding CLI has a capability that calls the packaging system. Based on the characteristics of the project, the packaging system selects a suitable packaging machine (the packaging platform maintains multiple packaging tasks and distributes them to different packaging machines according to their characteristics; the task detail page shows the dependency download, compilation, and run process; the packaged products include the binary package, a download QR code, and so on).
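The article does not say how the hash of a symbolicated crash report is calculated. One plausible use is grouping identical crashes, and under that assumption a minimal Python sketch could hash only the app's own symbolicated frames and ignore the absolute addresses, which change with every load. The function name, the choice of SHA-256, and the frame format handling are all assumptions made for illustration.

```python
import hashlib
import re

ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


def crash_fingerprint(symbolicated_frames, app_image="Test"):
    """Hash the app's own symbolicated frames, ignoring load-dependent addresses."""
    kept = []
    for frame in symbolicated_frames:
        parts = frame.split()
        # parts[0] is the frame index, parts[1] the image name.
        if len(parts) < 3 or parts[1] != app_image:
            continue
        kept.append(ADDRESS.sub("", " ".join(parts[2:])).strip())
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()


first = ["5   Test   0x102fe592c -[ViewController testMonitorCrash] + 22828 (ViewController.mm:58)"]
second = ["5   Test   0x104a1592c -[ViewController testMonitorCrash] + 22828 (ViewController.mm:58)"]
print(crash_fingerprint(first) == crash_fingerprint(second))
```

With this scheme two reports of the same crash taken at different load addresses produce the same fingerprint, so the product side could count them as one issue.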

The symbolication service is a product of the big front-end team, so it is implemented in Node.js (which is single-threaded, so multi-process capability has to be enabled to improve machine utilization). The iOS symbolication machine is a dual-core Mac mini, so experiments were needed to evaluate how many worker processes to start for the symbolication service. The result: two processes handled crash logs nearly twice as efficiently as one process, while four processes were not significantly more efficient than two, which is consistent with the dual-core Mac mini. Therefore, two worker processes are started for symbolication.

The complete design is shown below

To summarize, symbolication uses a master-slave setup with a cache of .dSYM files and crash results. The “Data Processing and Task Scheduling Framework” schedules the symbolication service (2 symbolicate workers internally), which fetches the .dSYM files from Qiniu Cloud at the same time.

The system architecture diagram is as follows

8. APM summary

Generally speaking, monitoring capabilities differ from platform to platform, and the technical implementation details are not uniform. Therefore, the monitoring capabilities should be aligned and unified during the technical proposal review. The data fields of each capability on each platform must be aligned (number of fields, names, data types, and precision), because APM itself is a closed loop: after monitoring, the data has to be symbolicated, consolidated, and turned into product features, and finally shown on monitoring dashboards.
For some crashes or ANRs, stakeholders need to be notified by email, SMS, or the company's internal messaging tool according to severity, followed by a quick release or hot fix.
Monitoring capabilities need to be configurable, so that they can be switched on and off flexibly.
Monitoring data has to be written from memory to file, and the write policy needs attention. Monitoring data also has to be stored in a database, so database size and design rules must be considered. How is the data reported after it has been stored? The reporting mechanism is discussed in another article: Creating a universal, configurable data reporting SDK

As far as possible, after the technical review, the technical implementation on each platform should be written into a document and synchronized to the relevant people. For example, the implementation of ANR (lag) monitoring:

/* Android: the threshold is graded by device; generally more than 300 ms is regarded as a lag. Hook the system Looper and instrument before and after message processing to compute how long each message takes; start another thread to dump the stack. */
new ExceptionProcessor().init(this, new Runnable() {
    @Override
    public void run() {
        try {
            ProxyPrinter proxyPrinter = new ProxyPrinter(PerformanceMonitor.this);
            Looper.getMainLooper().setMessageLogging(proxyPrinter);
            mWeakPrinter = new WeakReference<ProxyPrinter>(proxyPrinter);
        } catch (FileNotFoundException e) {
            // Ignored, as in the original snippet.
        }
    }
});

/* iOS: a child thread pings the main thread to check whether it is currently stuck. The lag threshold is 300 ms; beyond it the main thread is considered stuck, and its stack is captured, stored, and uploaded. */
// Assumes this class subclasses NSThread and self.semaphore is a dispatch_semaphore_t.
- (void)main {
    while (!self.isCancelled) {
        self.isMainThreadBlocked = YES;
        dispatch_async(dispatch_get_main_queue(), ^{
            // The main thread got to run this block, so it is responsive.
            self.isMainThreadBlocked = NO;
            dispatch_semaphore_signal(self.semaphore);
        });
        [NSThread sleepForTimeInterval:0.3];
        if (self.isMainThreadBlocked) {
            [self handleMainThreadBlock];
        }
        // Wait until the ping has been answered before sending the next one.
        dispatch_semaphore_wait(self.semaphore, DISPATCH_TIME_FOREVER);
    }
}

The architecture diagram of the entire APM is shown below

Description:
- Tracking SDK: associates log data through the session ID

APM technical solutions are constantly adjusted and upgraded as technical means and analysis requirements change. Some of the diagrams above are from earlier versions; the current ones have been updated on that basis. A few keywords for reference: Hermes, Flink SQL, and InfluxDB.