Author: Yang Jin, senior engineer, Tencent mobile client development. Please contact Tencent WeTest for authorization. Original link: wetest.qq.com/lab/view/36…


WeTest takeaway

The current mainstream memory monitoring tool for iOS is Instruments Allocations, but it can only be used during development. This article describes how to implement an offline memory monitoring tool that detects memory problems after the app has shipped.

Foreground Out Of Memory (FOOM) means the app consumes too much memory in the foreground and is force-killed by the system. To the user, it looks just like a crash. Facebook proposed a FOOM detection method as early as August 2015. The general principle is elimination: rule out every other cause of a restart, and whatever remains is a FOOM. See: code.facebook.com/posts/11469…


WeChat has been reporting FOOM data since the end of 2015. According to the initial data, the ratio of daily FOOM occurrences to logged-in users was close to 3%, while the crash rate in the same period was below 1%. At the beginning of 2016, an executive reported that WeChat kept quitting unexpectedly. Only after painstakingly pulling more than 2 GB of logs did we find that KV reporting was writing logs so frequently that it caused FOOM. Then in August 2016, many external users reported that WeChat quit right after launch. Even after analyzing a large number of logs, the cause of the FOOM could not be found. WeChat urgently needed an effective memory monitoring tool to find such problems.


First, the implementation principle

The initial version of WeChat memory monitoring used Facebook's FBAllocationTracker to monitor OC object allocation and used fishhook to hook the malloc/free interfaces to monitor heap memory allocation. Every second, it wrote the current count of all OC objects, plus the top 200 largest heap allocations and their allocation stacks, to a local text log. This scheme is simple to implement and was finished within a day. By shipping it to users through TestFlight, we found that the contacts module loaded a large number of contacts during a DB migration, which caused FOOM.


But this scheme has a number of drawbacks:

1. The monitoring granularity is too coarse: a qualitative change caused by a large number of small allocations cannot be detected. In addition, fishhook can only hook C calls made by the app's own binary and has no effect on system libraries;

2. The logging interval is hard to tune: if it is too long, an intermediate peak may be missed; if it is too short, it causes power drain and frequent IO;

3. The raw logs were reported and analyzed by hand, with no good web tool to display and classify problems.

So the second version takes Instruments Allocations as its model and focuses on optimizing four areas: data collection, storage, reporting, and presentation.


1. Data collection

While investigating the iOS 10 nano crash, I studied the libmalloc source code and came across these interfaces by accident:




When the malloc_logger and __syscall_logger function pointers are non-null, allocation and release events from malloc/free and vm_allocate/vm_deallocate are passed to the upper layer through them. This is how the Malloc Stack memory debugging tool works. With these two function pointers, it is easy to record the allocation information (size and allocation stack) of every currently live object. The allocation stack can be captured with the backtrace function, but the captured addresses are virtual memory addresses and cannot be looked up directly in the dSYM symbol table. So the slide (load offset) of each image must also be recorded when the image is loaded, so that symbol table address = stack address - slide.
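The slide arithmetic above can be sketched in portable C. This is only an illustration under stated assumptions: `image_record`, `find_image`, and `symbol_table_address` are hypothetical names that do not come from the article, and the image list is assumed to be filled in when each image loads (on Apple platforms, for example, from a callback registered with `_dyld_register_func_for_add_image`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kept for every loaded image (main binary or dylib). */
typedef struct {
    uintptr_t load_start; /* first runtime address covered by the image */
    uintptr_t load_end;   /* one past the last runtime address */
    intptr_t  slide;      /* runtime address minus link-time address */
} image_record;

/* Returns the image whose address range contains addr, or NULL if none does. */
static const image_record *find_image(const image_record *images, size_t count,
                                      uintptr_t addr) {
    assert(images != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (addr >= images[i].load_start && addr < images[i].load_end) {
            return &images[i];
        }
    }
    return NULL;
}

/* symbol table address = stack address - slide.
 * Returns 0 when the address does not belong to any recorded image. */
static uintptr_t symbol_table_address(const image_record *images, size_t count,
                                      uintptr_t stack_addr) {
    const image_record *img = find_image(images, count, stack_addr);
    if (img == NULL) {
        return 0;
    }
    return stack_addr - (uintptr_t)img->slide;
}
```

With an image loaded at 0x5000 to 0x9000 and a slide of 0x4000, the captured address 0x5234 maps back to 0x1234, which is the value to look up in the symbol table.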




In addition, to classify data better, each memory object should have its own Category, as shown in the figure above. For heap memory objects, the Category name is "Malloc " plus the allocation size, such as "Malloc 48.00KiB". For a virtual memory object, the last parameter of vm_allocate, flags, indicates what kind of virtual memory it is, and this parameter corresponds to the first parameter, type, of the __syscall_logger function pointer. The meaning of each flag can be found in the header file mach/vm_statistics.h. For an OC object, the Category name is the OC class name, which we can obtain by hooking the OC method +[NSObject alloc]:




But it turns out that the static factory methods of NSData do not call +[NSObject alloc]. Instead, they create the object by calling the C function NSAllocateObject, so the class name of an OC object created this way cannot be obtained through the hook. The answer was finally found in Apple's open source CF-1153.18: when __CFOASafe is true and __CFObjectAllocSetLastAllocEventNameFunction != NULL, CoreFoundation uses this function pointer, after creating an object, to tell the upper layer what type the object is:




In this way, our monitoring data sources are essentially the same as those of Allocations, with the help of private APIs of course. Without enough "tricks", private APIs cannot get into the App Store, so we have to settle for the next best thing. Replacing the malloc, free, and other function pointers in the malloc_zone_t structure returned by malloc_default_zone also makes it possible to monitor heap memory allocation, with the same effect as malloc_logger. Virtual memory allocation can only be monitored through fishhook.
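As a small illustration of the heap Category naming described earlier in this section ("Malloc" plus the allocation size), the following C sketch formats a Category name from a size. The function name `malloc_category` and the thresholds for switching between Bytes, KiB, and MiB are assumptions made for the example. The article only shows the "Malloc 48.00KiB" form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes a Category name such as "Malloc 48.00KiB" into out.
 * The unit thresholds below are an assumption, not taken from the article. */
static void malloc_category(size_t size, char *out, size_t out_len) {
    assert(out != NULL && out_len > 0);
    if (size < 1024) {
        snprintf(out, out_len, "Malloc %zu Bytes", size);
    } else if (size < 1024 * 1024) {
        snprintf(out, out_len, "Malloc %.2fKiB", (double)size / 1024.0);
    } else {
        snprintf(out, out_len, "Malloc %.2fMiB",
                 (double)size / (1024.0 * 1024.0));
    }
}
```

Grouping by the formatted name rather than by the raw byte count keeps the Category list short, because all allocations of the same size fall into one bucket.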


2. Data storage

Live object management

The app allocates and frees a great deal of memory while it is running. In the example above, within 10 seconds of launching WeChat, 800,000 objects had been created and 500,000 released, so performance is a challenge. The storage process itself should also allocate and free as little memory as possible. So instead of SQLite, we use a more lightweight balanced binary tree for storage.


A splay tree is a kind of binary search tree. It does not guarantee that the tree is balanced, but the amortized time complexity of its operations is O(log N), so it can be treated as approximately balanced. Compared with other balanced binary trees (such as red-black trees), it has a smaller memory footprint because it does not need to store extra information in each node. The splay tree is motivated by the principle of locality (a node that has just been accessed is likely to be accessed again, and a node that has been accessed many times is likely to be accessed next). To reduce lookup time, frequently queried nodes are moved closer to the root through the "splay" operation. In most cases, allocated memory is released soon afterwards, as with autoreleased objects and temporary variables, and an OC object updates its Category right after its memory is allocated. So a splay tree fits this workload well.


A traditional binary tree is implemented with linked nodes, so every time a node is added or deleted, memory is allocated or freed. To reduce memory operations, the binary tree can be implemented on top of an array. Specifically, the left and right children of a node change from pointers to integers that hold the children's array indices. When a node is deleted, it stores the array index of the previously freed node, so the freed slots form a list that can be reused.
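The array-backed node storage can be sketched as follows. This is a minimal illustration, not WeChat's implementation: all names (`tree_pool`, `pool_alloc`, `pool_free`, `tree_insert`) and the fixed capacity are hypothetical, and a plain binary search tree insert stands in for the splay operation to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_CAPACITY 1024
#define NIL (-1)

typedef struct {
    uintptr_t key; /* e.g. the address of a live object */
    int left;      /* array index of the left child; reused as free-list link */
    int right;     /* array index of the right child */
} tree_node;

typedef struct {
    tree_node nodes[POOL_CAPACITY];
    int used;      /* number of slots handed out from the array so far */
    int free_head; /* index of the most recently freed slot, or NIL */
    int root;      /* index of the tree root, or NIL */
} tree_pool;

static void pool_init(tree_pool *p) {
    p->used = 0;
    p->free_head = NIL;
    p->root = NIL;
}

/* Takes a slot from the free list if possible, else from the array tail. */
static int pool_alloc(tree_pool *p, uintptr_t key) {
    int idx;
    if (p->free_head != NIL) {
        idx = p->free_head;
        p->free_head = p->nodes[idx].left; /* pop the free list */
    } else if (p->used < POOL_CAPACITY) {
        idx = p->used++;
    } else {
        return NIL; /* pool exhausted */
    }
    p->nodes[idx].key = key;
    p->nodes[idx].left = NIL;
    p->nodes[idx].right = NIL;
    return idx;
}

/* The freed slot records the index of the previously freed slot. */
static void pool_free(tree_pool *p, int idx) {
    assert(idx >= 0 && idx < p->used);
    p->nodes[idx].left = p->free_head;
    p->free_head = idx;
}

/* Plain BST insert (no splaying). Returns the node index, or NIL if full. */
static int tree_insert(tree_pool *p, uintptr_t key) {
    int *link = &p->root;
    while (*link != NIL) {
        tree_node *n = &p->nodes[*link];
        if (key == n->key) {
            return *link;
        }
        link = (key < n->key) ? &n->left : &n->right;
    }
    *link = pool_alloc(p, key);
    return *link;
}
```

Because children are indices rather than pointers, the whole tree lives in one contiguous block, which also makes it straightforward to back the array with a memory-mapped file.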




Stack storage

According to statistics, WeChat produces millions of backtrace stacks while it is running. With the maximum captured stack depth set to 64, the average stack depth is 35. If each address is stored in 36 bits (ARMv8 virtual memory addresses are at most 48 bits, but 36 bits is enough in practice), the average stack takes 157.5 bytes (35 × 36 bits = 1,260 bits), and 1M stacks require 157.5 MB of storage. But inspecting stacks at breakpoints shows that most stacks actually share a common suffix. For example, the following two stacks have the same last seven addresses:




To take advantage of this, the stacks can be stored in a hash table. The idea is to insert the whole stack into the table as a linked list, where each list node stores the current address and the table index of the previous address. Each time an address is inserted, its hash value is computed and used as the index in the table. If the slot at that index is empty, the list node is stored there. If the slot already holds data identical to the list node, the hash hits and processing moves on to the next address. If the data differs, there is a hash conflict, and the hash value is recomputed until one of these conditions is met. Here is an example (with simplified hash computation):




1) G, F, E, D, C, and A of Stack1 are inserted into the hash table in turn. Slots 1 through 6 hold (G, 0), (F, 1), (E, 2), (D, 3), (C, 4), (A, 5). The index entry for Stack1 is 6.


2) Stack2 is inserted next. G, F, E, D, and C match the first five nodes of Stack1, so the hash hits. B is inserted at a new position, 7, as (B, 5). The index entry for Stack2 is 7.


3) Finally, Stack3 is inserted. G, F, E, and D hit. The next address, A, has D (index 4) as its previous address, which does not match the existing node (A, 5), so the hash misses and A is stored in the next empty slot, 8, as (A, 4). B then has A (index 8) as its previous address, which does not match the existing node (B, 5), so the hash misses again and B is stored in the next empty slot, 9, as (B, 8). The index entry for Stack3 is 9.
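The suffix-sharing table from this example can be sketched in C as follows. The sketch makes several assumptions that are not from the article: every name (`stack_table`, `insert_stack`, `read_stack`, and so on) is hypothetical, the hash mixes the address with the parent index, conflicts are resolved by linear probing, and address and parent index are kept in separate fields instead of being packed into 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024u /* slot 0 is reserved and means "no parent" */

typedef struct {
    uintptr_t address; /* 0 marks an empty slot */
    uint32_t  parent;  /* slot of the previous (outer) address, 0 if none */
} stack_node;

typedef struct {
    stack_node slots[TABLE_SIZE];
    uint32_t   used;   /* number of occupied slots */
} stack_table;

static void table_init(stack_table *t) {
    memset(t, 0, sizeof *t);
}

/* Illustrative hash; the result is always in 1..TABLE_SIZE-1. */
static uint32_t hash_node(uintptr_t address, uint32_t parent) {
    uint64_t h = (uint64_t)address * 0x9E3779B97F4A7C15ull + parent;
    h ^= h >> 32;
    return (uint32_t)(h % (TABLE_SIZE - 1)) + 1;
}

/* Stores (address, parent) or finds the identical existing node.
 * Returns its slot index, or 0 when the table is full. */
static uint32_t insert_node(stack_table *t, uintptr_t address, uint32_t parent) {
    uint32_t idx = hash_node(address, parent);
    for (uint32_t probes = 0; probes < TABLE_SIZE - 1; probes++) {
        stack_node *n = &t->slots[idx];
        if (n->address == 0) {            /* empty slot: record the node */
            n->address = address;
            n->parent = parent;
            t->used++;
            return idx;
        }
        if (n->address == address && n->parent == parent) {
            return idx;                   /* hash hit: shared suffix node */
        }
        idx = idx % (TABLE_SIZE - 1) + 1; /* conflict: try the next slot */
    }
    return 0;
}

/* frames[0] is the innermost address, frames[count-1] the outermost.
 * Addresses must be non-zero. Returns the stack's index entry, 0 on failure. */
static uint32_t insert_stack(stack_table *t, const uintptr_t *frames, size_t count) {
    assert(frames != NULL || count == 0);
    uint32_t parent = 0;
    for (size_t i = count; i > 0; i--) {
        parent = insert_node(t, frames[i - 1], parent);
        if (parent == 0) {
            return 0;
        }
    }
    return parent;
}

/* Walks parent links from an index entry; writes innermost address first. */
static size_t read_stack(const stack_table *t, uint32_t entry,
                         uintptr_t *out, size_t max) {
    size_t n = 0;
    while (entry != 0 && n < max) {
        out[n++] = t->slots[entry].address;
        entry = t->slots[entry].parent;
    }
    return n;
}
```

Inserting the three stacks from the example occupies 9 slots instead of 18, and each stack is then identified by a single index entry from which the full stack can be read back.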


With this suffix-compressed storage, the average stack length drops from 35 to fewer than 5 nodes. Each node takes 64 bits (36 bits for the address and 28 bits for the parent index), and the hash table's space utilization is 60%+, so the average stack takes only 66.7 bytes (5 nodes × 8 bytes ÷ 0.6), about 42% of the original 157.5 bytes.


Performance data

After the above optimizations, the CPU usage of the memory monitoring tool on an iPhone 6 Plus is below 13%. Of course, this depends on the amount of data, and heavy users (with many groups, frequent messages, and so on) may see slightly higher CPU usage. The stored data occupies about 20 MB of memory and is mapped to a file with mmap. More about the benefits of mmap can be found through a web search.




3. Data reporting

Because memory monitoring stores the allocation information of all currently live objects, the amount of data is very large. So when a FOOM occurs, the data cannot be reported in full. Instead, it is reported selectively according to a set of rules.


First, all objects are grouped by Category, and the object count and allocated memory size of each Category are computed. This list is small enough to be reported in full. Then identical stacks within each Category are merged, and the object count and memory size of each stack are computed. For certain Categories, such as the top N by allocation size or UI-related ones (UIViewController, UIView, and so on), only the top M allocation stacks are reported. The report format looks like this:
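Separately from the report format itself, the merge step described in this paragraph (grouping live objects by Category and allocation stack, then counting objects and summing sizes) can be sketched in C. Everything here is an illustrative assumption: the record layout, the names `live_object`, `stack_summary`, and `summarize`, and the choice to sort and then scan rather than use a hash map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one live object. */
typedef struct {
    const char *category; /* e.g. "UIView" or "Malloc 48.00KiB" */
    uint32_t    stack_id; /* index entry of the allocation stack */
    size_t      size;     /* allocated bytes */
} live_object;

/* One merged row: all objects sharing a Category and a stack. */
typedef struct {
    const char *category;
    uint32_t    stack_id;
    size_t      count;
    size_t      total_size;
} stack_summary;

/* Orders objects by Category name, then by stack id. */
static int compare_objects(const void *a, const void *b) {
    const live_object *x = a;
    const live_object *y = b;
    int c = strcmp(x->category, y->category);
    if (c != 0) {
        return c;
    }
    return (x->stack_id > y->stack_id) - (x->stack_id < y->stack_id);
}

/* Sorts objects in place and merges equal (Category, stack) runs into out.
 * Returns the number of rows written (at most out_cap). */
static size_t summarize(live_object *objects, size_t n,
                        stack_summary *out, size_t out_cap) {
    assert(out != NULL || out_cap == 0);
    if (n == 0) {
        return 0;
    }
    qsort(objects, n, sizeof *objects, compare_objects);
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        if (m > 0 && out[m - 1].stack_id == objects[i].stack_id &&
            strcmp(out[m - 1].category, objects[i].category) == 0) {
            out[m - 1].count++;
            out[m - 1].total_size += objects[i].size;
        } else {
            if (m == out_cap) {
                break; /* no room for another row */
            }
            out[m].category = objects[i].category;
            out[m].stack_id = objects[i].stack_id;
            out[m].count = 1;
            out[m].total_size = objects[i].size;
            m++;
        }
    }
    return m;
}
```

Per-Category totals follow by summing adjacent rows with the same Category, and the top M stacks of a Category can be picked by sorting its rows by total_size.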




4. Page display

The page display follows Allocations: you can see which Categories exist, the allocation size and object count of each Category, and, for some Categories, the allocation stacks.




To highlight problems and make them faster to solve, the backend first uses a set of rules to identify the Categories that may have caused the FOOM (the Suspect Categories in the figure above). The rules include:

● The number of UIViewControllers is abnormal

● The number of UIViews is abnormal

● The number of UIImages is abnormal

● The allocation size or object count of any other Category is abnormal


Then a feature value is computed for each suspect Category. It represents the OOM reason. The feature value consists of "Caller1", "Caller2", and "Category, Reason". Caller1 is the point where the memory was requested, and Caller2 is the specific scenario or business feature. Both are extracted from the stack with the largest allocation size under the Category. Caller1 should be as meaningful as possible, not simply the address immediately above the allocation function. For example:




After feature values have been computed for all reports, the reports can be classified. The primary classification can be by Caller1 or by Category, and the secondary classification aggregates the feature values related to that Caller1/Category. The effect is as follows:

Primary classification




Secondary classification




5. Operational strategy

As mentioned above, memory monitoring incurs some performance cost, and each report is about 300 KB. Reporting from every user would put pressure on the backend, so monitoring is enabled by sampling for production users and is enabled for 100% of beta (grayscale) package users, internal company users, and whitelisted users. Only the data from the three most recent runs is kept locally.


Second, reducing misjudgments

Let's review how Facebook determines whether the previous run ended in a FOOM:




1. The app was not upgraded

2. The app did not call exit() or abort()

3. The app did not crash

4. The user did not force-quit the app

5. The system was not upgraded or restarted

6. The app was not running in the background

7. The app was killed by FOOM
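The elimination steps in this list can be sketched as a single C function. This is only an illustration: the struct fields, enum values, and function name are hypothetical, and each flag is assumed to have been persisted during the previous run (or derived at launch) by the app itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical facts known at launch about the previous run. */
typedef struct {
    bool app_upgraded;            /* app version changed since the last run */
    bool called_exit_or_abort;    /* app ended itself with exit() or abort() */
    bool crashed;                 /* crash reporter recorded a crash */
    bool user_force_quit;         /* user force-quit the app */
    bool os_upgraded_or_rebooted; /* OS version or boot time changed */
    bool was_in_background;       /* last known state was background */
} last_run_state;

typedef enum {
    LAST_EXIT_APP_UPGRADE,
    LAST_EXIT_SELF_EXIT,
    LAST_EXIT_CRASH,
    LAST_EXIT_USER_QUIT,
    LAST_EXIT_OS_RESTART,
    LAST_EXIT_BACKGROUND_KILL,
    LAST_EXIT_FOOM
} last_exit_reason;

/* Rules out every other explanation in order; what remains is FOOM. */
static last_exit_reason classify_last_exit(const last_run_state *s) {
    assert(s != NULL);
    if (s->app_upgraded)            return LAST_EXIT_APP_UPGRADE;
    if (s->called_exit_or_abort)    return LAST_EXIT_SELF_EXIT;
    if (s->crashed)                 return LAST_EXIT_CRASH;
    if (s->user_force_quit)         return LAST_EXIT_USER_QUIT;
    if (s->os_upgraded_or_rebooted) return LAST_EXIT_OS_RESTART;
    if (s->was_in_background)       return LAST_EXIT_BACKGROUND_KILL;
    return LAST_EXIT_FOOM;
}
```

Because FOOM is simply the fall-through case, any flag that is recorded incorrectly turns directly into a false FOOM report.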


Conditions 1, 2, 4, and 5 are easy to determine, 3 relies on the crash callback of our own CrashReport component, and 6 and 7 rely on ApplicationState and foreground/background switch notifications. Since WeChat began reporting FOOM data, there have been many misjudgments. The main cases are:

ApplicationState is inaccurate

On some system versions, the app is briefly woken up in the background. ApplicationState is Active, yet the launch is not a BackgroundFetch. After didFinishLaunchingWithOptions returns, a BecomeActive notification is also received, but the app exits soon afterwards. The whole launch lasts 5 to 8 seconds. The workaround is to treat a launch as a normal foreground launch only one second after the BecomeActive notification is received. This only lowers the probability of misjudgment and cannot eliminate it.


Group-control plug-ins

This kind of plug-in is software that remotely controls iPhones. Typically, one computer controls multiple phones, and the computer screen and phone screens are operated in sync in real time, for example to open WeChat, automatically add friends, post to Moments, and force-quit WeChat. This process easily leads to misjudgment. The only solution is a security crackdown on such tools.


The CrashReport component crashes without calling back to the upper layer

At the end of May 2017, WeChat saw a surge of GIF-related crashes caused by out-of-bounds memory access. But when the crash signal arrived and the component tried to write the crash log, the damaged memory pool prevented it from writing the log properly and even caused a second crash. The upper layer never received the crash notification, so the restart was misjudged as a FOOM. The current fix is to treat the restart as caused by a crash whenever a crash log from the previous run exists locally, whether it is complete or not.


Foreground freezes that lead to a system watchdog kill

This is the familiar 0x8badf00d, usually caused by too many foreground threads, deadlocks, or persistently high CPU usage. The app cannot catch this kind of kill. So we combined FOOM detection with our existing stall monitoring system: if a stall was captured at the last moment the app was running in the foreground, we consider that run to have been killed by the watchdog. We also split a new restart reason out of FOOM, called "restart caused by app foreground freeze", and track it as a key metric.


Third, results

Since memory monitoring went live in March 2017, WeChat has fixed more than 30 memory problems of various sizes, involving chat, search, Moments, and other features. The FOOM rate dropped from 3% at the beginning of 2017 to 0.67%, and the foreground freeze rate dropped from 0.6% to 0.3%. The effect is especially clear.






Fourth, common problems

UIGraphicsEndImageContext

UIGraphicsBeginImageContext and UIGraphicsEndImageContext must be called in pairs, or the context will leak. Xcode's Analyze can also detect this kind of problem.


UIWebView

Whether it is opening a web page or executing a simple piece of JS code, UIWebView takes up a lot of the app's memory. WKWebView not only has excellent rendering performance but also runs in its own separate process, so some web-related memory consumption moves to that process. It is the most suitable replacement for UIWebView.


autoreleasepool

Autoreleased objects are usually released at the end of the current runloop iteration. If a loop generates a large number of autoreleased objects, memory will spike and may even cause OOM. Adding an autoreleasepool where appropriate releases memory promptly and lowers the peak.


Mutual references

One of the easiest places to create a mutual reference is using self inside a block while self holds the block. This can only be avoided through coding conventions. In addition, NSTimer's target and CAAnimation's delegate hold strong references to the object. WeChat currently works around this problem with its own MMNoRetainTimer and MMDelegateCenter.


Large image processing

For example, the image scaling interface used to be written like this:




When [UIImage drawInRect:] draws, it first decodes the image and generates a bitmap at the original resolution, which consumes a lot of memory. The solution is to use the lower-level ImageIO interfaces, which avoid generating the intermediate bitmap:




Large view

A large view is a View whose size is too large and that itself contains the content to be rendered. Extremely long text is a common kind of "bomb" group message in WeChat, often thousands or even tens of thousands of lines long. Drawing it all in one View consumes a lot of memory and causes serious stalls. The best practice is to split the text across multiple Views and use the TableView reuse mechanism to reduce unnecessary rendering and memory usage.


Recommended reading

Here are a few iOS memory-related links:

Memory Usage Performance Guidelines

No pressure, Mon!



