Types of problems

Time complexity

The effect of time complexity on performance seems minimal when a collection holds only a small amount of data. But optimizing complexity matters when the function you are writing is a public function and the amount of data the caller passes in is unpredictable.

Typical cases cover a wide range of time complexities; efficient sorting algorithms, for example, are generally O(n log n).

O(n) is a watershed: anything above it has great potential impact on performance. If a function is a public interface, its complexity should be documented, and callers should know how it is meant to be used. Better still, trade space for time through algorithmic optimization or by using the appropriate system APIs.

Here’s an example of checking whether a value exists in a collection:

// O(1): direct access by index
return array[idx] == value;

// O(n): linear scan
for (int i = 0; i < count; i++) {
    if (array[i] == value) {
        return YES;
    }
}
return NO;

// O(n^2): checking for duplicate elements with nested loops
for (int i = 0; i < count; i++) {
    for (int j = 0; j < count; j++) {
        if (i != j && array[i] == array[j]) {
            return YES;
        }
    }
}
return NO;

What about the time complexity of the interface methods provided by the common collection classes in Objective-C?

NSArray / NSMutableArray

First of all, arrays are ordered and allow duplicate elements, so their storage cannot rely on hashing elements as keys for fast lookups. As a result, the performance of the different interface methods varies greatly.

  • containsObject:, indexOfObject:, removeObject: have to traverse the elements looking for a match, so they are O(n) or worse.
  • objectAtIndex:, firstObject, lastObject, addObject:, removeLastObject operate on a known index or the end of the array, so they are O(1).
  • indexOfObject:inSortedRange:options:usingComparator: uses binary search, so it is O(log n) (see the sketch below).
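
For instance, here is a minimal sketch of the binary-search interface (the array and comparator are made up for illustration):

NSArray *sorted = @[@1, @3, @5, @9, @20];
NSUInteger idx = [sorted indexOfObject: @9
                         inSortedRange: NSMakeRange(0, sorted.count)
                               options: NSBinarySearchingFirstEqual
                       usingComparator: ^NSComparisonResult(NSNumber *a, NSNumber *b) {
    return [a compare: b];
}];
if (idx != NSNotFound) {
    NSLog(@"found at index %lu", (unsigned long)idx);  // O(log n) instead of O(n)
}

This only works if the searched range is already sorted with the same comparator.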

NSSet / NSMutableSet / NSCountedSet

These collection types are unordered and contain no duplicate elements, which allows them to use hash tables for fast operations. So addObject:, removeObject:, and containsObject: are all O(1). Note that converting an array to a set merges duplicate elements and loses the ordering, as the sketch below shows.
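
A minimal sketch of that trade-off (the values are made up):

NSArray *array = @[@"c", @"a", @"b", @"a"];
NSSet *set = [NSSet setWithArray: array];     // 3 elements: duplicates merged, order lost
BOOL inSet = [set containsObject: @"b"];      // O(1) hash lookup
BOOL inArray = [array containsObject: @"b"];  // O(n) linear scan
NSLog(@"%lu %d %d", (unsigned long)set.count, inSet, inArray);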

NSDictionary / NSMutableDictionary

Dictionaries are much like sets, except that they map keys to values. Add, delete, and find are all O(1). Note that keys must conform to NSCopying.

Different implementations of containsObject in NSArray and NSSet

Implementation in arrays

- (BOOL) containsObject: (id)anObject
{
  return ([self indexOfObject: anObject] != NSNotFound);
}

- (NSUInteger) indexOfObject: (id)anObject
{
  unsigned c = [self count];

  if (c > 0 && anObject != nil)
    {
      unsigned i;
      IMP get = [self methodForSelector: oaiSel];
      BOOL (*eq)(id, SEL, id) = (BOOL (*)(id, SEL, id))[anObject methodForSelector: eqSel];

      for (i = 0; i < c; i++)
        if ((*eq)(anObject, eqSel, (*get)(self, oaiSel, i)) == YES)
          return i;
    }
  return NSNotFound;
}

As you can see, it traverses the elements one by one until it finds a match.

Here’s how containsObject is implemented in Set:

- (BOOL) containsObject: (id)anObject
{
  return (([self member: anObject]) ? YES : NO);
}

// GSSet.m contains the implementation of member:
- (id) member: (id)anObject
{
  if (anObject != nil)
    {
      GSIMapNode node = GSIMapNodeForKey(&map, (GSIMapKey)anObject);

      if (node != 0)
        {
          return node->key.obj;
        }
    }
  return nil;
}

The element is looked up in the map table by key. Since elements in a set are unique, the element itself can serve as the hash key, so the value is obtained quickly.

Use GCD for optimization

We can move time-consuming operations onto non-main threads with the methods GCD provides, so that the App runs more smoothly and responds faster. When using GCD, though, be careful not to cause thread explosion or deadlocks. Moving work off the main thread is also not a cure-all: if a task consumes a lot of memory or CPU, GCD cannot help you.

Asynchronously processing events

Below is a sketch of the most typical pattern for handling events asynchronously.
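
A minimal sketch of the pattern (processData and updateUI are placeholders for the real work):

dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
    NSData *result = processData();           // time-consuming work off the main thread
    dispatch_async(dispatch_get_main_queue(), ^{
        updateUI(result);                     // UI updates must happen on the main thread
    });
});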

Time-consuming tasks

Use dispatch_block_create_with_qos_class to give the block a QoS of QOS_CLASS_UTILITY. With this QoS the system optimizes power for large computations, I/O, networking, and complex data processing.
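
A minimal sketch, assuming exportLargeFile stands in for the real long-running work:

dispatch_block_t block = dispatch_block_create_with_qos_class(0, QOS_CLASS_UTILITY, 0, ^{
    exportLargeFile();   // heavy computation / I/O / networking
});
dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), block);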

Avoid thread explosion

  • Using serial queues
  • Using NSOperationQueue's maxConcurrentOperationCount to limit concurrency (see the sketch after this list)
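
A minimal sketch of the NSOperationQueue option (the limit of 4 is arbitrary):

NSOperationQueue *queue = [[NSOperationQueue alloc] init];
queue.maxConcurrentOperationCount = 4;   // at most 4 operations run concurrently
for (int i = 0; i < 999; i++) {
    [queue addOperationWithBlock:^{
        // one unit of work
    }];
}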

For example, the following is dangerous and can cause thread explosions and deadlocks

for (int i = 0; i < 999; i++) {
    dispatch_async(q, ^{...});
}
dispatch_barrier_sync(q, ^{});

So how can this be avoided? One option is dispatch_apply:

dispatch_apply(999, q, ^(size_t i){...});

Or use dispatch_semaphore

#define CONCURRENT_TASKS 4

sema = dispatch_semaphore_create(CONCURRENT_TASKS);
for (int i = 0; i < 999; i++) {
    dispatch_async(q, ^{
        // do the work, then release one slot
        dispatch_semaphore_signal(sema);
    });
    dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER);
}

GCD related Crash logs

The dispatch manager thread

Thread 1:: Dispatch queue: com.apple.libdispatch-manager
0   libsystem_kernel.dylib   0x00007fff8967e08a kevent_qos + 10
1   libdispatch.dylib        0x00007fff8be05811 _dispatch_mgr_invoke + 251
2   libdispatch.dylib        0x00007fff8be05465 _dispatch_mgr_thread + 52

Idle thread

Thread 6:
0   libsystem_kernel.dylib       0x00007fff8967d772 __workq_kernreturn + 10
1   libsystem_pthread.dylib      0x00007fff8fd317d9 _pthread_wqthread + 1283
2   libsystem_pthread.dylib      0x00007fff8fd2ed95 start_wqthread + 13

When a thread is active

Thread 3 Crashed:: Dispatch queue: <queue name>
<my code>
7   libdispatch.dylib        0x07fff8fcfd323 _dispatch_call_block_and_release
8   libdispatch.dylib        0x07fff8fcf8c13 _dispatch_client_callout + 8
9   libdispatch.dylib        0x07fff8fcfc365 _dispatch_queue_drain + 1100
10  libdispatch.dylib        0x07fff8fcfdecc _dispatch_queue_invoke + 202
11  libdispatch.dylib        0x07fff8fcfb6b7 _dispatch_root_queue_drain + 463
12  libdispatch.dylib        0x07fff8fd09fe4 _dispatch_worker_thread3 + 91
13  libsystem_pthread.dylib  0x07fff93c17637 _pthread_wqthread + 729
14  libsystem_pthread.dylib  0x07fff93c1540d start_wqthread + 13

The main thread is idle

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib     0x00007fff906614de mach_msg_trap + 10
1   libsystem_kernel.dylib     0x00007fff9066064f mach_msg + 55
2   com.apple.CoreFoundation   0x00007fff9a8c1eb4 __CFRunLoopServiceMachPort
3   com.apple.CoreFoundation   0x00007fff9a8c137b __CFRunLoopRun + 1371
4   com.apple.CoreFoundation   0x00007fff9a8c0bd8 CFRunLoopRunSpecific + 296
...
10  com.apple.AppKit           0x00007fff8e823c03 -[NSApplication run] + 594
11  com.apple.AppKit           0x00007fff8e7a0354 NSApplicationMain + 1832
12  com.example                0x00000001000013b4 start + 52

The main queue

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
<my code>
12  com.apple.Foundation       0x00007fff931157e8 __NSBLOCKOPERATION_IS_CALLING_OUT_TO_A_BLOCK__ + 7
13  com.apple.Foundation       0x00007fff931155b5 -[NSBlockOperation main] + 9
14  com.apple.Foundation       0x00007fff93114a6c -[__NSOperationInternal _start:] + 653
15  com.apple.Foundation       0x00007fff93114543 __NSOQSchedule_f + 184
16  libdispatch.dylib          0x00007fff935d6c13 _dispatch_client_callout + 8
17  libdispatch.dylib          0x00007fff935e2cbf _dispatch_main_queue_callback_4CF + 861
18  com.apple.CoreFoundation   0x00007fff8d9223f9 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__
19  com.apple.CoreFoundation   0x00007fff8d8dd68f __CFRunLoopRun + 2159
20  com.apple.CoreFoundation   0x00007fff8d8dcbd8 CFRunLoopRunSpecific + 296
...
26  com.apple.AppKit           0x00007fff999a1bd3 -[NSApplication run] + 594
27  com.apple.AppKit           0x00007fff9991e324 NSApplicationMain + 1832
28  libdyld.dylib              0x00007fff9480f5c9 start + 1

I/O performance optimization

I/O is one of the biggest consumers of performance: any I/O operation breaks the low-power state, so reducing the number of I/O operations is the key optimization here. Some ways to achieve this:

  • Batch small, scattered writes into one larger write
  • Use appropriate I/O APIs
  • Use appropriate threads
  • Use NSCache to cache data and reduce I/O

NSCache

Why not just use a dictionary instead? NSCache offers everything a dictionary does, plus the following features (a short usage sketch follows this list):

  • It automatically evicts objects when the system is low on memory
  • NSCache is thread safe
  • - (void)cache:(NSCache *)cache willEvictObject:(id)obj; is called back just before a cached object is evicted
  • evictsObjectsWithDiscardedContent controls whether objects whose content has been discarded are evicted
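
A minimal usage sketch, assuming self owns a cache property and implements NSCacheDelegate (the limits are illustrative, not recommendations):

- (void)setUpCache {
    self.cache = [[NSCache alloc] init];
    self.cache.countLimit = 100;                   // evict once more than 100 objects are cached
    self.cache.totalCostLimit = 10 * 1024 * 1024;  // evict once the summed costs exceed ~10 MB
    self.cache.delegate = self;

    NSData *data = [@"payload" dataUsingEncoding:NSUTF8StringEncoding];
    [self.cache setObject:data forKey:@"feed" cost:data.length];
}

// Called just before an object is evicted; persist it here if it must survive eviction.
- (void)cache:(NSCache *)cache willEvictObject:(id)obj {
    // e.g. write obj to disk
}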

So how does NSCache achieve these features?

Now let’s look at how NSCache works, using the GNUstep implementation as a reference. First, NSCache holds an NSMutableDictionary internally.

@implementation NSCache
- (id) init
{
  if (nil == (self = [super init]))
    {
      return nil;
    }
  _objects = [NSMutableDictionary new];
  _accesses = [NSMutableArray new];
  return self;
}

A cached-object structure is needed to hold some additional bookkeeping information:

@interface _GSCachedObject : NSObject
{
  @public
  id object;
  NSString *key;        // the cache key
  int accessCount;      // access count, used for automatic eviction
  NSUInteger cost;      // set by setObject:forKey:cost:
  BOOL isEvictable;     // whether the object may be evicted (also guards thread-safe eviction)
}
@end

The isEvictable flag is used to keep eviction thread safe. When the cache is read, the object's position in the _accesses array is updated and its accessCount is incremented by one, preparing the data needed later for the automatic-eviction condition. The concrete implementation is as follows:

- (id) objectForKey: (id)key
{
  _GSCachedObject *obj = [_objects objectForKey: key];

  if (nil == obj)
    {
      return nil;
    }
  if (obj->isEvictable)
    {
      // Move the object to the end of the access list
      [_accesses removeObjectIdenticalTo: obj];
      [_accesses addObject: obj];
    }
  obj->accessCount++;
  _totalAccesses++;
  return obj->object;
}

Every time an object is added, the cache first checks whether it needs to evict objects to make space. It then creates a cached object that records the key, the object, and the cost, and stores it in the _accesses array and the _objects dictionary.

- (void) setObject: (id)obj forKey: (id)key cost: (NSUInteger)num
{
  _GSCachedObject *oldObject = [_objects objectForKey: key];
  _GSCachedObject *newObject;

  if (nil != oldObject)
    {
      [self removeObjectForKey: oldObject->key];
    }
  [self _evictObjectsToMakeSpaceForObjectWithCost: num];
  newObject = [_GSCachedObject new];
  // Retained here, released when obj is dealloc'd
  newObject->object = RETAIN(obj);
  newObject->key = RETAIN(key);
  newObject->cost = num;
  if ([obj conformsToProtocol: @protocol(NSDiscardableContent)])
    {
      newObject->isEvictable = YES;
      [_accesses addObject: newObject];
    }
  [_objects setObject: newObject forKey: key];
  RELEASE(newObject);
  _totalCost += num;
}

So how is the automatic eviction mentioned above implemented? Automatic eviction needs both a trigger and a condition check. It is triggered either when content is added to the cache or when a memory warning occurs. The condition check looks like this:

if (_costLimit > 0 && _totalCost + cost > _costLimit)
  {
    spaceNeeded = _totalCost + cost - _costLimit;
  }
// Eviction only happens when the added cost pushes the total past the limit set by the caller,
// or when a count limit is set and exceeded; with no limits configured, nothing is evicted.
if (count > 0 && (spaceNeeded > 0 || count >= _countLimit))

So each time an object is added, NSCache compares the accumulated cost against totalCostLimit; once the limit is exceeded, eviction is triggered.

The eviction strategy computes an average access count from _totalAccesses and the number of cached objects; only evictable objects whose access count falls below that average are cleaned up:

NSUInteger averageAccesses = (_totalAccesses / count * 0.2) + 1;

if (obj->accessCount < averageAccesses && obj->isEvictable)

A few things need to happen before an object is evicted, including clearing the isEvictable flag of the cached object to prevent unsafe concurrent operations later. Once enough space has been freed, no more objects are added to the eviction list; the objects on that list are then evicted one by one.

NSUInteger cost = obj->cost;

obj->cost = 0;
obj->isEvictable = NO;
// Add the key to the removal list
if (_evictsObjectsWithDiscardedContent)
  {
    [evictedKeys addObject: obj->key];
  }
_totalCost -= cost;
// Stop once enough space has been freed
if (cost > spaceNeeded)
  {
    break;
  }
spaceNeeded -= cost;

The delegate callback is invoked during eviction, so if some cached data needs to be persisted, it can be handled there.

- (void) removeObjectForKey: (id)key
{
    _GSCachedObject *obj = [_objects objectForKey: key];
    
    if (nil != obj)
    {
        [_delegate cache: self willEvictObject: obj->object];
        _totalAccesses -= obj->accessCount;
        [_objects removeObjectForKey: key];
        [_accesses removeObjectIdenticalTo: obj];
    }
}

The full implementation can be found in the GNUstep Base NSCache.m file.

Here’s how NSCache is used in SDWebImage:

- (UIImage *)imageFromMemoryCacheForKey:(NSString *)key {
    return [self.memCache objectForKey:key];
}

- (UIImage *)imageFromDiskCacheForKey:(NSString *)key {
    // First check the in-memory NSCache
    UIImage *image = [self imageFromMemoryCacheForKey:key];
    if (image) {
        return image;
    }

    // Otherwise read from disk and, if allowed, put it back into the memory cache
    UIImage *diskImage = [self diskImageForKey:key];
    if (diskImage && self.shouldCacheImagesInMemory) {
        NSUInteger cost = SDCacheCostForImage(diskImage);
        [self.memCache setObject:diskImage forKey:key cost:cost];
    }

    return diskImage;
}

When memory runs low, NSCache automatically frees the memory held by the cached images, evicting the ones that have not been used. While an image is still in the cache its data is returned directly; once it has been evicted, the image is read from disk again. This trades space for fewer disk operations while keeping memory usage under control.

Control how often the App wakes the device

Notifications, VoIP, location, Bluetooth, and so on all wake the device from standby. Waking the device is costly and should not happen frequently. Notifications are mainly a product-level decision. For location, let's look at several of the location APIs, see how differently they affect power, and pick the appropriate one.

Continuous position updates

[locationManager startUpdatingLocation]

This method keeps the device active.

Deferred location updates

[locationManager allowDeferredLocationUpdatesUntilTraveled: timeout:]

An efficient, energy-saving way to get locations: updates are buffered on the location hardware and delivered in batches. Apps that need location updates while running in the background should prefer this approach.

Significant location change

[locationManager startMonitoringSignificantLocationChanges]

This is more energy efficient and suits apps that only need a callback when the location changes significantly, such as weather apps.

Region monitoring

[locationManager startMonitoringForRegion:(CLRegion *)]

This is also an energy-saving way to use location. Applications such as museum guides, which monitor regions and display different information in each area, are a good fit for it. A sketch follows:
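
A minimal sketch (the coordinates, radius, and identifier are made up):

CLLocationCoordinate2D center = CLLocationCoordinate2DMake(39.9, 116.4);
CLCircularRegion *region = [[CLCircularRegion alloc] initWithCenter:center
                                                             radius:100.0
                                                         identifier:@"exhibit-hall-1"];
region.notifyOnEntry = YES;
region.notifyOnExit = YES;
[locationManager startMonitoringForRegion:region];
// The delegate then receives locationManager:didEnterRegion: / locationManager:didExitRegion: callbacks.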

Frequently visited places

// Start monitoring
locationManager.startMonitoringVisits()
// Stop monitoring when no longer needed
locationManager.stopMonitoringVisits()

In general, don’t use startUpdatingLocation() unless you absolutely have to, and call stopUpdatingLocation() as soon as possible to stop locating and give the power savings back to the user.

Memory impact on performance

First, reclaiming memory takes time, and a sudden spike in memory demand will hurt responsiveness.

How can these performance problems be prevented, and do you need to deliberately prevent them?

Try to avoid performance problems during the coding phase by sticking to the following principles.

  • Optimize computational complexity to reduce CPU usage
  • Stop unnecessary task processing while the application responds to interactions
  • Set an appropriate QoS
  • Combine timer tasks so that the CPU is idle more often

If you are too busy writing features to notice these problems in time and cannot prevent them up front, can code review be automated to catch them?

How to check

Should you search the code by hand, write a tool, or automate the whole thing? It is possible, but there are too many cases to cover: existing tools don't support them well, and writing your own to handle every case would take too long. So what's a better approach?

Monitor by listening on the main thread

Use CFRunLoopObserverCreate to create an observer that receives CFRunLoopActivity callbacks, then use CFRunLoopAddObserver to add it to the kCFRunLoopCommonModes of the main thread's RunLoop obtained from CFRunLoopGetMain().

Then create a child thread for monitoring and use dispatch_semaphore_wait to define the check interval; 16 or 20 milliseconds is a common standard. A stall is reported when, within that interval, the main RunLoop stays in the kCFRunLoopBeforeSources or kCFRunLoopAfterWaiting state. A minimal sketch of the approach follows.
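
A minimal sketch of this approach (the names observer, semaphore, and runLoopActivity are assumptions; the real SMLagMonitor has more detail, such as requiring several consecutive timeouts):

static CFRunLoopObserverRef observer;
static dispatch_semaphore_t semaphore;
static CFRunLoopActivity runLoopActivity;

static void runLoopObserverCallBack(CFRunLoopObserverRef obs, CFRunLoopActivity activity, void *info) {
    runLoopActivity = activity;            // remember the most recent activity
    dispatch_semaphore_signal(semaphore);  // wake the monitoring thread
}

- (void)beginMonitor {
    semaphore = dispatch_semaphore_create(0);
    CFRunLoopObserverContext context = {0, (__bridge void *)self, NULL, NULL, NULL};
    observer = CFRunLoopObserverCreate(kCFAllocatorDefault, kCFRunLoopAllActivities,
                                       YES, 0, &runLoopObserverCallBack, &context);
    CFRunLoopAddObserver(CFRunLoopGetMain(), observer, kCFRunLoopCommonModes);

    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        while (YES) {
            // Wait up to ~20 ms for the main RunLoop to report new activity
            long timedOut = dispatch_semaphore_wait(semaphore,
                                dispatch_time(DISPATCH_TIME_NOW, 20 * NSEC_PER_MSEC));
            if (timedOut != 0 &&
                (runLoopActivity == kCFRunLoopBeforeSources ||
                 runLoopActivity == kCFRunLoopAfterWaiting)) {
                // The main thread appears stuck: capture its stack here
            }
        }
    });
}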

How to capture the stack and preserve the scene

The overall idea is: obtain each thread's state from its thread information, walk the thread's stack to collect all frame pointers, then look the return addresses up in the symbol table (symbolication) to produce a readable stack trace. How is this implemented? Here are the details:

Gets information about the thread

The first step is to fetch all threads via task_threads,

thread_act_array_t threads;                 // array of thread ports, e.g. threads[1] = 5635
mach_msg_type_number_t thread_count = 0;    // mach_msg_type_number_t is an int
const task_t this_task = mach_task_self();
// Get all threads of the current task
kern_return_t kr = task_threads(this_task, &threads, &thread_count);

Then traverse the threads and get the details of each one from thread_info:

SMThreadInfoStruct threadInfoSt = {0};
thread_info_data_t threadInfo;
thread_basic_info_t threadBasicInfo;
mach_msg_type_number_t threadInfoCount = THREAD_INFO_MAX;

if (thread_info((thread_act_t)thread, THREAD_BASIC_INFO, (thread_info_t)threadInfo, &threadInfoCount) == KERN_SUCCESS) {
    threadBasicInfo = (thread_basic_info_t)threadInfo;
    if (!(threadBasicInfo->flags & TH_FLAGS_IDLE)) {
        threadInfoSt.cpuUsage = threadBasicInfo->cpu_usage / 10;
        threadInfoSt.userTime = threadBasicInfo->system_time.microseconds;
    }
}

uintptr_t buffer[100];
int i = 0;
NSMutableString *reStr = [NSMutableString stringWithFormat:@"Stack of thread: %u:\n CPU used: %.1f percent\n user time: %d second\n", thread, threadInfoSt.cpuUsage, threadInfoSt.userTime];

Gets information about all stacks in a thread

thread_get_state returns the machine context, which contains the thread's stack pointers.

_STRUCT_MCONTEXT machineContext;
// Fill in the complete machine context for the thread via thread_get_state
mach_msg_type_number_t state_count = smThreadStateCountByCPU();
kern_return_t kr = thread_get_state(thread, smThreadStateByCPU(), (thread_state_t)&machineContext.__ss, &state_count);

Create a stack structure to hold stack data

// A stack frame: a pointer to the previous frame plus the return address
typedef struct SMStackFrame {
    const struct SMStackFrame *const previous;
    const uintptr_t return_address;
} SMStackFrame;

SMStackFrame stackFrame = {0};
// Get the current frame address from the frame (base) pointer register
const uintptr_t framePointer = smMachStackBasePointerByCPU(&machineContext);
if (framePointer == 0 || smMemCopySafely((void *)framePointer, &stackFrame, sizeof(stackFrame)) != KERN_SUCCESS) {
    return @"Fail frame pointer";
}
for (; i < 32; i++) {
    buffer[i] = stackFrame.return_address;
    if (buffer[i] == 0 || stackFrame.previous == 0 ||
        smMemCopySafely(stackFrame.previous, &stackFrame, sizeof(stackFrame)) != KERN_SUCCESS) {
        break;
    }
}

Symbolication

The main idea of symbolication is to subtract the image's slide from the stack address to undo the ASLR offset, then locate the string table and symbol table through the __LINKEDIT segment and find the closest matching symbol. The specific code is as follows:

info->dli_fname = NULL;
info->dli_fbase = NULL;
info->dli_sname = NULL;
info->dli_saddr = NULL;

// Find which image the address belongs to
const uint32_t idx = smDyldImageIndexFromAddress(address);
if (idx == UINT_MAX) {
    return false;
}
/*
    Header
    ------------------
    Load commands
    Segment command 1 -------------|
    Segment command 2              |
    ------------------             |
    Data                           |
    Section 1 data  |segment 1 <---|
    Section 2 data  |          <---|
    Section 3 data  |          <---|
    Section 4 data  |segment 2
    Section 5 data  |
    ...             |
    Section n data  |
*/
/*---------- Mach Header ----------*/
// Get the mach_header from the image index
const struct mach_header *machHeader = _dyld_get_image_header(idx);
// When the dynamic linker loads an image, it must map it into virtual address space of the
// process that is not yet occupied. It does this by adding a slide value to the image's base address.
const uintptr_t imageVMAddressSlide = (uintptr_t)_dyld_get_image_vmaddr_slide(idx);
/*----------- ASLR offset ---------*/
// https://en.wikipedia.org/wiki/Address_space_layout_randomization
const uintptr_t addressWithSlide = address - imageVMAddressSlide;
// A segment defines a range of bytes in the Mach-O file and the address and memory protection
// attributes with which it is mapped into virtual memory when the dynamic linker loads the
// application. Segments are always virtual-memory page aligned. A segment contains zero or more sections.
const uintptr_t segmentBase = smSegmentBaseOfImageIndex(idx) + imageVMAddressSlide;
if (segmentBase == 0) {
    return false;
}
info->dli_fname = _dyld_get_image_name(idx);
info->dli_fbase = (void*)machHeader;
/*-------------- Mach Segment -------------*/
// The symbol that best matches the address
const nlistByCPU *bestMatch = NULL;
uintptr_t bestDistance = ULONG_MAX;
uintptr_t cmdPointer = smCmdFirstPointerFromMachHeader(machHeader);
if (cmdPointer == 0) {
    return false;
}
for (uint32_t iCmd = 0; iCmd < machHeader->ncmds; iCmd++) {
    const struct load_command* loadCmd = (struct load_command*)cmdPointer;
    /*---------- the symbol table of the target image ----------*/
    // This segment contains the raw data used by the dynamic linker, such as symbol,
    // string, and relocation table entries.
    if (loadCmd->cmd == LC_SYMTAB) {
        // Get the virtual-memory offsets of the string table and the symbol table
        const struct symtab_command* symtabCmd = (struct symtab_command*)cmdPointer;
        const nlistByCPU* symbolTable = (nlistByCPU*)(segmentBase + symtabCmd->symoff);
        const uintptr_t stringTable = segmentBase + symtabCmd->stroff;
        for (uint32_t iSym = 0; iSym < symtabCmd->nsyms; iSym++) {
            // If n_value is 0, the symbol refers to an external object
            if (symbolTable[iSym].n_value != 0) {
                uintptr_t symbolBase = symbolTable[iSym].n_value;
                uintptr_t currentDistance = addressWithSlide - symbolBase;
                // Keep the smallest distance: addressWithSlide is an instruction address
                // inside a method, so it is larger than the method's entry point.
                if ((addressWithSlide >= symbolBase) && (currentDistance <= bestDistance)) {
                    bestMatch = symbolTable + iSym;
                    bestDistance = currentDistance;
                }
            }
        }
        if (bestMatch != NULL) {
            // Add the slide back to get the in-memory addresses of the symbol and its name
            info->dli_saddr = (void*)(bestMatch->n_value + imageVMAddressSlide);
            info->dli_sname = (char*)((intptr_t)stringTable + (intptr_t)bestMatch->n_un.n_strx);
            if (*info->dli_sname == '_') {
                info->dli_sname++;
            }
            // This happens if all symbols have been stripped
            if (info->dli_saddr == info->dli_fbase && bestMatch->n_type == 3) {
                info->dli_sname = NULL;
            }
            break;
        }
    }
    cmdPointer += loadCmd->cmdsize;
}

Something to watch out for

Note that calling thread_get_state itself costs CPU, so the monitoring code can show up in its own results; stacks like that can simply be filtered out.

A way to get more information

What more information could we get, such as the full hierarchy of method calls and the time consumed by each method, and what would be the benefit?

Time consumption can be measured in a more detailed way, and time-consuming methods can be found. Faster interactive operations can provide better user experience. The following are some scenarios that can be measured:

  • Responsiveness
  • Button taps
  • Gestures
  • Tab switching
  • View controller transitions

You can set optimization goals, such as 60fps for scrolling and animation, and 100ms for responding to user actions, then check each item against the goal and fix it.

How to get more information?

Hooking objc_msgSend makes it possible to capture every method call; recording the call depth yields the tree structure of the calls, and logging the time before and after each call yields each method's duration, giving a complete picture of where the time goes.

C functions can be hooked with Facebook's fishhook, and the method-call tree can be built the way InspectiveC does. Their implementations are described below:

Gets the method call tree structure

First, two structures are designed. CallRecord records the details of a method call, including the receiver obj and the selector SEL; ThreadCallStack uses index to record the depth of the current call tree. From the SEL you can get the method name with NSStringFromSelector, and from obj you can get the Class with object_getClass and then the class name with NSStringFromClass.

// Shared structures.
typedef struct CallRecord_ {
    id obj;           // object_getClass gets the Class, NSStringFromClass gets the class name
    SEL _cmd;         // NSStringFromSelector gets the method name
    uintptr_t lr;
    int prevHitIndex;
    char isWatchHit;
} CallRecord;

typedef struct ThreadCallStack_ {
    FILE *file;
    char *spacesStr;
    CallRecord *stack;
    int allocatedLength;
    int index;
    int numWatchHits;
    int lastPrintedIndex;
    int lastHitIndex;
    char isLoggingEnabled;
    char isCompleteLoggingEnabled;
} ThreadCallStack;

Storing and reading ThreadCallStack

pthread_setspecific() stores thread-private data and pthread_getspecific() reads it back, which lets you bind the ThreadCallStack to its thread and access it at any time. The code is as follows:

static inline ThreadCallStack * getThreadCallStack() {
    ThreadCallStack *cs = (ThreadCallStack *)pthread_getspecific(threadKey);  // read the thread-local data
    if (cs == NULL) {
        cs = (ThreadCallStack *)malloc(sizeof(ThreadCallStack));
#ifdef MAIN_THREAD_ONLY
        cs->file = (pthread_main_np()) ? newFileForThread() : NULL;
#else
        cs->file = newFileForThread();
#endif
        cs->isLoggingEnabled = (cs->file != NULL);
        cs->isCompleteLoggingEnabled = 0;
        cs->spacesStr = (char *)malloc(DEFAULT_CALLSTACK_DEPTH + 1);
        memset(cs->spacesStr, ' ', DEFAULT_CALLSTACK_DEPTH);
        cs->spacesStr[DEFAULT_CALLSTACK_DEPTH] = '\0';
        cs->stack = (CallRecord *)calloc(DEFAULT_CALLSTACK_DEPTH, sizeof(CallRecord)); // allocate the default CallRecord space
        cs->allocatedLength = DEFAULT_CALLSTACK_DEPTH;
        cs->index = cs->lastPrintedIndex = cs->lastHitIndex = -1;
        cs->numWatchHits = 0;
        pthread_setspecific(threadKey, cs);  // store the thread-local data
    }
    return cs;
}

Record method call depth

Since the depth has to be recorded and methods call other methods, pushCallRecord is executed at the start of each call and popCallRecord at the end: the depth is incremented by one on push and decremented on pop.

// Called at the start of a method call: push a record and increase the depth
static inline void pushCallRecord(id obj, uintptr_t lr, SEL _cmd, ThreadCallStack *cs) {
    int nextIndex = (++cs->index);
    if (nextIndex >= cs->allocatedLength) {  // grow the stack if needed
        cs->allocatedLength += CALLSTACK_DEPTH_INCREMENT;
        cs->stack = (CallRecord *)realloc(cs->stack, cs->allocatedLength * sizeof(CallRecord));
        cs->spacesStr = (char *)realloc(cs->spacesStr, cs->allocatedLength + 1);
        memset(cs->spacesStr, ' ', cs->allocatedLength);
        cs->spacesStr[cs->allocatedLength] = '\0';
    }
    CallRecord *newRecord = &cs->stack[nextIndex];
    newRecord->obj = obj;
    newRecord->_cmd = _cmd;
    newRecord->lr = lr;
    newRecord->isWatchHit = 0;
}

// Called at the end of a method call: pop the record and decrease the depth
static inline CallRecord * popCallRecord(ThreadCallStack *cs) {
    return &cs->stack[cs->index--];
}

Insert execution methods before and after objc_msgSend

Finally, hooking objc_msgSend requires running pushCallRecord before the original call and popCallRecord after it. Because it is impossible in C to write a function that preserves unknown arguments and then jumps to an arbitrary function pointer, assembly is needed here.

arm64 has 31 general-purpose 64-bit integer registers (x0 to x30). Arguments are passed in x0-x7; for objc_msgSend, x0 holds the receiver and x1 holds the selector _cmd, and x8 (the indirect result register) is saved as well. The hook saves the registers, moves the link register lr into an argument register, calls pushCallRecord, executes the original objc_msgSend, saves its return value, and finally calls popCallRecord. The specific code is as follows:

static void replacementObjc_msgSend() {
    __asm__ volatile (
        // sp is the stack pointer; each store pre-decrements it
        // Save {q0-q7}
        "stp q6, q7, [sp, #-32]!\n"
        "stp q4, q5, [sp, #-32]!\n"
        "stp q2, q3, [sp, #-32]!\n"
        "stp q0, q1, [sp, #-32]!\n"
        // Save {x0-x8, lr}
        "stp x8, lr, [sp, #-16]!\n"
        "stp x6, x7, [sp, #-16]!\n"
        "stp x4, x5, [sp, #-16]!\n"
        "stp x2, x3, [sp, #-16]!\n"
        "stp x0, x1, [sp, #-16]!\n"
        // Prepare the arguments: x2 = _cmd, x1 = lr (return address), x3 = sp
        "mov x2, x1\n"
        "mov x1, lr\n"
        "mov x3, sp\n"
        // Call preObjc_msgSend using the bl label syntax. bl performs a branch with link;
        // the label is an unconditional, PC-relative branch with a range of -128MB to +128MB.
        "bl __Z15preObjc_msgSendP11objc_objectmP13objc_selectorP9RegState_\n"
        "mov x9, x0\n"
        "mov x10, x1\n"
        "tst x10, x10\n"
        // Restore {x0-x8, lr}
        "ldp x0, x1, [sp], #16\n"
        "ldp x2, x3, [sp], #16\n"
        "ldp x4, x5, [sp], #16\n"
        "ldp x6, x7, [sp], #16\n"
        "ldp x8, lr, [sp], #16\n"
        // Restore {q0-q7}
        "ldp q0, q1, [sp], #32\n"
        "ldp q2, q3, [sp], #32\n"
        "ldp q4, q5, [sp], #32\n"
        "ldp q6, q7, [sp], #32\n"
        "b.eq Lpassthrough\n"
        // Invoke the original objc_msgSend using the blr xN syntax. blr behaves like bl,
        // except that the new PC value is read from the specified register.
        "blr x9\n"
        // Save {x0-x9}
        "stp x0, x1, [sp, #-16]!\n"
        "stp x2, x3, [sp, #-16]!\n"
        "stp x4, x5, [sp, #-16]!\n"
        "stp x6, x7, [sp, #-16]!\n"
        "stp x8, x9, [sp, #-16]!\n"
        // Save {q0-q7}
        "stp q0, q1, [sp, #-32]!\n"
        "stp q2, q3, [sp, #-32]!\n"
        "stp q4, q5, [sp, #-32]!\n"
        "stp q6, q7, [sp, #-32]!\n"
        // Call the postObjc_msgSend hook.
        "bl __Z16postObjc_msgSendv\n"
        "mov lr, x0\n"
        // Restore {q0-q7}
        "ldp q6, q7, [sp], #32\n"
        "ldp q4, q5, [sp], #32\n"
        "ldp q2, q3, [sp], #32\n"
        "ldp q0, q1, [sp], #32\n"
        // Restore {x0-x9}
        "ldp x8, x9, [sp], #16\n"
        "ldp x6, x7, [sp], #16\n"
        "ldp x4, x5, [sp], #16\n"
        "ldp x2, x3, [sp], #16\n"
        "ldp x0, x1, [sp], #16\n"
        "ret\n"
        "Lpassthrough:\n"
        // br: unconditional branch to the address in the register
        "br x9"
    );
}

A method of recording time

To record time, capture a timestamp in pushCallRecord and another in popCallRecord. Here are several ways to measure how long a piece of code takes.

The first: NSDate, microsecond precision

NSDate* tmpStartData = [NSDate date];
//some code need caculate
double deltaTime = [[NSDate date] timeIntervalSinceDate:tmpStartData];
NSLog(@"cost time: %f s", deltaTime);Copy the code

The second: clock_t, microsecond precision. clock_t counts the clock ticks of CPU time consumed.

clock_t start = clock();
//some code need caculate
clock_t end = clock();
NSLog(@"cost time: %f s", (double)(end - start)/CLOCKS_PER_SEC);Copy the code

The third: CFAbsoluteTime, microsecond precision

CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
//some code need caculate
CFAbsoluteTime end = CFAbsoluteTimeGetCurrent();
NSLog(@"cost time = %f s", end - start); //sCopy the code

The fourth: CACurrentMediaTime, nanosecond precision

CFTimeInterval start = CACurrentMediaTime();
//some code need caculate
CFTimeInterval end = CACurrentMediaTime();
NSLog(@"cost time: %f s", end - start);

The fifth: mach_absolute_time, nanosecond precision

uint64_t start = mach_absolute_time();
//some code need caculate
uint64_t end = mach_absolute_time();
uint64_t elapsed = 1e-9 * (end - start);

The last two are the ones to use. The essential difference is that NSDate and CFAbsoluteTimeGetCurrent() return wall-clock time, which is adjusted when the clock synchronizes with network time, while mach_absolute_time() and CACurrentMediaTime() are based on the device's internal clock. Pick one, record it in pushCallRecord and popCallRecord, and subtract to get the elapsed time.
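
One caveat worth noting: mach_absolute_time() returns ticks, and the 1e-9 conversion above assumes one tick equals one nanosecond, which is not true on every device. A minimal sketch of the proper conversion using mach_timebase_info:

#include <mach/mach_time.h>

static double machTicksToSeconds(uint64_t ticks) {
    static mach_timebase_info_data_t timebase;
    if (timebase.denom == 0) {
        mach_timebase_info(&timebase);   // numer/denom convert ticks to nanoseconds
    }
    return (double)ticks * timebase.numer / timebase.denom / NSEC_PER_SEC;
}

// Usage
uint64_t start = mach_absolute_time();
// ... code to measure ...
uint64_t end = mach_absolute_time();
double seconds = machTicksToSeconds(end - start);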

How to hook a C function such as objc_msgSend

So how is objc_msgSend, a C function, hooked? dyld binds lazy and non-lazy symbols by updating pointers in particular sections of the __DATA segment of the Mach-O binary. rebind_symbols works out, from each symbol's name, where its pointer is stored and swaps it for the replacement. The following is an analysis of the key code:

Traversing the dyld images

The first step is to iterate over all the images in dyld and retrieve each image's header and slide. Note that the callback is registered only on the first call.

if (!_rebindings_head->next) {
    _dyld_register_func_for_add_image(_rebind_symbols_for_image);
} else {
    uint32_t c = _dyld_image_count();
    for (uint32_t i = 0; i < c; i++) {
        _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
    }
}

Find the symbol table associated commands

Next you need to find the symbol table-related commands, including Linkedit Segment Command, symtab Command, and dysymtab Command. The method is as follows:

segment_command_t *cur_seg_cmd;
segment_command_t *linkedit_segment = NULL;
struct symtab_command* symtab_cmd = NULL;
struct dysymtab_command* dysymtab_cmd = NULL;

uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
        if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
            linkedit_segment = cur_seg_cmd;
        }
    } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
        symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
        dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
}

Get the Base and indirect symbol tables

// Find base symbol/string table addresses
uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);

// Get indirect symbol table (array of uint32_t indices into symbol table)
uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

Method substitution

With the symbol tables and the array of replacements passed in, the indirect symbol pointers can be replaced as follows:

uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
for (uint i = 0; i < section->size / sizeof(void *); i++) {
    uint32_t symtab_index = indirect_symbol_indices[i];
    if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
        symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
        continue;
    }
    uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
    char *symbol_name = strtab + strtab_offset;
    if (strnlen(symbol_name, 2) < 2) {
        continue;
    }
    struct rebindings_entry *cur = rebindings;
    while (cur) {
        for (uint j = 0; j < cur->rebindings_nel; j++) {
            if (strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
                if (cur->rebindings[j].replaced != NULL &&
                    indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
                    *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
                }
                indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
                goto symbol_loop;
            }
        }
        cur = cur->next;
    }
symbol_loop:;
}
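
As a usage sketch, registering the hook through fishhook's public rebind_symbols API might look like this (origObjc_msgSend is a name introduced here for illustration; replacementObjc_msgSend is the assembly trampoline shown earlier):

#import "fishhook.h"

static id (*origObjc_msgSend)(id, SEL, ...);

__attribute__((constructor)) static void installMsgSendHook(void) {
    struct rebinding msgSendRebinding = {
        "objc_msgSend",                    // symbol to rebind
        (void *)replacementObjc_msgSend,   // new implementation
        (void **)&origObjc_msgSend         // receives a pointer to the original implementation
    };
    rebind_symbols(&msgSendRebinding, 1);
}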

Count method call frequency

In some scenarios there are methods that are called very frequently, and some of those calls are actually unnecessary. Finding the frequently called methods makes it easier to locate potential performance waste. So how do you find them?

The general idea: building on the call-depth recording described above, save the call path of every method, and for each repeated call of the same method along the same path, increment a counter in a database. Finally, a view sorts methods by call count so the frequently called ones stand out.

Let’s look at the implementation

Design the structure of method call frequency records

Add path, frequency, and related fields to the earlier time-cost model:

@property (nonatomic, strong) NSString *className;        // class name
@property (nonatomic, strong) NSString *methodName;       // method name
@property (nonatomic, assign) BOOL isClassMethod;         // whether it is a class method
@property (nonatomic, assign) NSTimeInterval timeCost;    // time cost
@property (nonatomic, assign) NSUInteger callDepth;       // call depth
@property (nonatomic, copy) NSString *path;               // call path
@property (nonatomic, assign) BOOL lastCall;              // whether this is the last call on the path
@property (nonatomic, assign) NSUInteger frequency;       // call count
@property (nonatomic, strong) NSArray <SMCallTraceTimeCostModel *> *subCosts;  // sub-method calls

Assembling the method path

While traversing the method models recorded by SMCallTrace, and each method's sub-calls, the path is assembled and written to the database:

for (SMCallTraceTimeCostModel *model in arr) {
    // Record the method's path
    model.path = [NSString stringWithFormat:@"[%@ %@]", model.className, model.methodName];
    [self appendRecord:model to:mStr];
}

+ (void)appendRecord:(SMCallTraceTimeCostModel *)cost to:(NSMutableString *)mStr {
    [mStr appendFormat:@"%@\n path%@\n", [cost des], cost.path];
    if (cost.subCosts.count < 1) {
        cost.lastCall = YES;
    }
    // Record into the database
    [[[SMLagDB shareInstance] increaseWithClsCallModel:cost] subscribeNext:^(id x) {}];
    for (SMCallTraceTimeCostModel *model in cost.subCosts) {
        // Record the path of each sub-method
        model.path = [NSString stringWithFormat:@"%@ - [%@ %@]", cost.path, model.className, model.methodName];
        [self appendRecord:model to:mStr];
    }
}

Recording call frequency in a database

Creating a database

Here lastCall records whether a method is the last one on its call path. Only the last method needs to be displayed, because its path already contains the parent methods all the way back to the origin.

_clsCallDBPath = [PATH_OF_DOCUMENT stringByAppendingPathComponent:@"clsCall.sqlite"];
if ([[NSFileManager defaultManager] fileExistsAtPath:_clsCallDBPath] == NO) {
    FMDatabase *db = [FMDatabase databaseWithPath:_clsCallDBPath];
    if ([db open]) {
        /* cid: primary key
           fid: parent id (not used yet)
           cls: class name
           mtd: method name
           path: full call path
           timecost: method time cost
           calldepth: call depth
           frequency: call count
           lastcall: whether this is the last call on the path */
        NSString *createSql = @"create table clscall (cid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, fid integer, cls text, mtd text, path text, timecost integer, calldepth integer, frequency integer, lastcall integer)";
        [db executeUpdate:createSql];
    }
}

Add records

When adding a record, first check whether the database already holds the same method call with the same path; if so, increment its frequency field by one. This is how the frequency is recorded.

FMResultSet *rsl = [db executeQuery:@"select cid,frequency from clscall where path = ?", model.path];
if ([rsl next]) {
    // A record with the same path already exists: increase its frequency by one
    int fq = [rsl intForColumn:@"frequency"] + 1;
    int cid = [rsl intForColumn:@"cid"];
    [db executeUpdate:@"update clscall set frequency = ? where cid = ?", @(fq), @(cid)];
} else {
    // Otherwise insert a new record
    NSNumber *lastCall = @0;
    if (model.lastCall) {
        lastCall = @1;
    }
    [db executeUpdate:@"insert into clscall (cls, mtd, path, timecost, calldepth, frequency, lastcall) values (?, ?, ?, ?, ?, ?, ?)", model.className, model.methodName, model.path, @(model.timeCost), @(model.callDepth), @1, lastCall];
}
[db close];
[subscriber sendCompleted];

To retrieve the records

Note that the retrieval can be sorted according to the call frequency field.

FMResultSet *rs = [db executeQuery:@"select * from clscall where lastcall=? order by frequency desc limit ?, 50",@1, @(page * 50)];
NSUInteger count = 0;
NSMutableArray *arr = [NSMutableArray array];
while ([rs next]) {
    SMCallTraceTimeCostModel *model = [self clsCallModelFromResultSet:rs];
    [arr addObject:model];
    count ++;
}
if (count > 0) {
    [subscriber sendNext:arr];
} else {
    [subscriber sendError:nil];
}
[subscriber sendCompleted];
[db close];

Finding threads with high CPU usage and recording their stacks

thread_info can be used to check each thread's CPU usage. High CPU usage off the main thread does not necessarily cause stutters, but it still burns power, so the CPU usage of every thread is polled, and whenever a thread exceeds a threshold such as 70% its stack is recorded, making it possible to track down the power-hungry methods.

With the groundwork above in place, the implementation is straightforward:

+ (void)updateCPU {
    thread_act_array_t threads;
    mach_msg_type_number_t threadCount = 0;
    const task_t thisTask = mach_task_self();
    kern_return_t kr = task_threads(thisTask, &threads, &threadCount);
    if (kr != KERN_SUCCESS) {
        return;
    }
    for (int i = 0; i < threadCount; i++) {
        thread_info_data_t threadInfo;
        thread_basic_info_t threadBaseInfo;
        mach_msg_type_number_t threadInfoCount = THREAD_INFO_MAX;
        if (thread_info((thread_act_t)threads[i], THREAD_BASIC_INFO, (thread_info_t)threadInfo, &threadInfoCount) == KERN_SUCCESS) {
            threadBaseInfo = (thread_basic_info_t)threadInfo;
            if (!(threadBaseInfo->flags & TH_FLAGS_IDLE)) {
                integer_t cpuUsage = threadBaseInfo->cpu_usage / 10;
                if (cpuUsage > 70) {
                    // CPU usage exceeds 70%: capture this thread's stack
                    NSString *reStr = smStackOfThread(threads[i]);
                    // Save it to the database
                    [[[SMLagDB shareInstance] increaseWithStackString:reStr] subscribeNext:^(id x) {}];
                    NSLog(@"CPU usage overload thread stack:\n%@", reStr);
                }
            }
        }
    }
}

Demo

The tool was integrated into the GCDFetchFeed we did earlier.

  • To start stutter monitoring, simply call [[SMLagMonitor shareInstance] beginMonitor];.
  • To trace all method calls, call [SMCallTrace start]; where tracing should begin, then call stop, or save to print the results, when done. You can also set a maximum depth and a minimum time threshold to filter out information you don't need to see.
  • For call-frequency statistics, start with [SMCallTrace startWithMaxDepth:3]; and finish with [SMCallTrace stopSaveAndClean];, which writes the records to the database and frees the memory used. You can hook a view controller's viewWillAppear and viewWillDisappear, start recording on appear, and save to the database and clean up on disappear. The results view controller is SMClsCallViewController; push it to see the list.

References

WWDC

  • WWDC 2013 224 Designing Code for Performance
  • WWDC 2013 408 Optimizing Your Code Using LLVM
  • WWDC 2013 712 Energy Best Practices
  • WWDC 2014 710 Writing Energy Efficient Code, Part 1
  • WWDC 2014 710 Writing Energy Efficient Code, Part 2
  • WWDC 2015 230 Performance on iOS and watchOS
  • WWDC 2015 707 Achieving All-Day Battery Life
  • WWDC 2015 708 Debugging Energy Issues
  • WWDC 2015 718 Building Responsive and Efficient Apps with GCD
  • WWDC 2016 406 Optimizing App Startup Time
  • WWDC 2016 719 Optimizing I/O for Performance and Battery Life
  • WWDC 2017 238 Writing Energy Efficient Apps
  • WWDC 2017 706 Modernizing Grand Central Dispatch Usage

PS:

[email protected]