Component introduction

OOMDetector is an iOS memory monitoring component developed by Tencent. Several apps have already integrated it. OOMDetector provides the following two functions:

  • Memory burst stack statistics: records the allocation stack and block size of each memory allocation in the process, and dumps the stack data to disk when memory usage bursts

  • Memory leak detection: detects memory leaks, currently for malloc memory blocks and Objective-C objects

OOMDetector helps developers quickly find and locate memory problems and memory leaks in their apps. The component is now open source on GitHub: https://github.com/Tencent/OOMDetector.

Background

Several iOS memory analysis tools are available in the industry. Below are the features and limitations of some of them.

Allocations

As iOS developers, we are all familiar with Allocations, the memory analysis tool provided by Apple. During development and debugging, Allocations can analyze the memory usage of each app module in detail, and it comprehensively monitors all heap allocations and some VM allocations. Although Allocations is powerful, it has obvious limitations:

  • It cannot run standalone inside the app; it can only be used during debugging, with the device connected to a Mac

  • Its performance is poor: large apps easily stutter once it is enabled

These two limitations mean that Allocations is only suitable for helping analyze memory problems in code during development; it cannot directly monitor or locate the memory problems of online users.

FBAllocationTracker

FBAllocationTracker is Facebook's open-source memory analysis tool. It uses method swizzling to replace the original alloc method, so it can record allocation information for instances created through alloc while the app runs, helping find abnormal growth of Objective-C objects at runtime. Compared with Allocations, FBAllocationTracker has less impact on app performance and can run standalone inside the app. However, it has some obvious drawbacks:

  • The monitoring scope is incomplete: only Objective-C objects can be monitored; C++ objects, malloc memory blocks, and VM memory cannot

  • It records no allocation stack: without stack information for each allocation, it is hard for developers to pinpoint the cause of memory growth from object types and counts alone

In summary, although FBAllocationTracker can run standalone inside the app, the range of memory it monitors is too narrow and the information it records per object is too sparse, so it offers limited help for analyzing memory problems.

Memory problems have always been a key concern for Q. To safeguard memory quality for online users, we wanted a tool that could monitor and locate their memory problems. Against this background, our team developed the OOMDetector component. OOMDetector hooks the system's low-level memory allocation methods to record the allocation stack of every allocation in the process. At the same time, it has little impact on performance and smoothness, so it can run standalone inside the app and be used to analyze and monitor online users' memory problems (memory bursts or memory leaks).

Component principles

Memory burst stack statistics

How burst stack monitoring works

The principle of memory burst stack monitoring is shown in Figure 1. By hooking the iOS system's low-level memory allocation methods (the heap allocation functions associated with malloc_zone, and the VM allocation method vm_allocate), the component tracks and records allocation information for every object in the process. For each allocation stack, it caches the stack itself, the total number of allocations, and the total amount of memory allocated in process memory. Once memory usage approaches the limit, the component periodically dumps the stack information to local disk, so that if the app is terminated for running out of memory, the stack data dumped before the burst can be reported to the backend server for analysis.

Figure 1. Working principle of memory burst monitoring
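The hooking pattern can be illustrated with a minimal, portable sketch. It does not use the real malloc_zone or vm_allocate APIs. Instead, it assumes a hypothetical table of allocator function pointers (`allocator`, `install_hook`, and the counters are all illustrative names) and shows the general shape: save the original entry point, install a wrapper, call through to the original, then update per-process statistics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator table, loosely modeled on the malloc/free
   function-pointer fields of a malloc zone. */
typedef struct {
    void *(*malloc_fn)(size_t size);
    void (*free_fn)(void *ptr);
} allocator;

static void *system_malloc(size_t size) { return malloc(size); }
static void system_free(void *ptr) { free(ptr); }

static allocator g_alloc = { system_malloc, system_free }; /* the "live" table */
static allocator g_original;                              /* saved originals */

/* Statistics the hook maintains (not thread-safe in this sketch). */
static size_t g_alloc_count;
static size_t g_alloc_bytes;

/* Wrapper: forward to the original allocator, then record the allocation. */
static void *hooked_malloc(size_t size) {
    void *p = g_original.malloc_fn(size);
    if (p != NULL) {
        g_alloc_count++;
        g_alloc_bytes += size;
        /* A real hook would also capture the call stack here. */
    }
    return p;
}

/* Save the original entry points, then redirect malloc to the wrapper. */
static void install_hook(void) {
    g_original = g_alloc;
    g_alloc.malloc_fn = hooked_malloc;
}
```

In a real hook, the wrapper must also avoid allocating through the hooked path itself, or it will recurse into its own hook.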

Performance challenge

App memory allocation methods are called at very high frequency, up to 100,000 times per second in large apps. Hooking methods like these poses a huge performance challenge: if the hook itself is slow, it can easily make the app stall or even freeze. In OOMDetector, we strictly control the execution cost of the hook code, and we use several strategies to optimize its two most expensive parts, stack backtracing and lock waiting:

  • Optimizing stack backtracing

For stack backtracing, the system provides backtrace_symbols to obtain stack information directly, but this method is very time-consuming. We therefore implemented a more efficient backtracing method based on how stack unwinding works. At runtime, the optimized method only collects the addresses of the functions on the stack; when writing to disk, it assembles them into the stack format shown in Figure 2 (similar to a crash stack) according to the address ranges of the loaded dynamic libraries. The backend server can then restore the symbolic stack contents using the atos command and symbol table files. This moves the expensive symbolication work to the server side, leaving the client with only the cheap address backtracing, which takes less than 1 µs after optimization.

Figure 2. Stack format
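The split between a cheap hot path and an expensive cold path can be sketched as follows. The sketch uses the standard `backtrace()` from `<execinfo.h>` as a stand-in for OOMDetector's own backtracer: it collects raw return addresses without symbolicating them. It also assumes a hypothetical caller-supplied `image_range` table describing where each binary is loaded. The output line layout is an assumption as well, only loosely modeled on a crash log.

```c
#include <assert.h>
#include <execinfo.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_FRAMES 64

/* Hypothetical description of one loaded image (executable or dylib). */
typedef struct {
    const char *name;
    uintptr_t start;   /* load address */
    uintptr_t end;     /* one past the last mapped byte */
} image_range;

/* Hot path: capture raw return addresses only, with no symbolication. */
static int capture_stack(void **frames, int max) {
    return backtrace(frames, max);
}

/* Cold path (dump time): render each address as "index image addr base + offset"
   so a server can symbolicate it later with atos and the matching symbol file. */
static size_t format_stack(void *const *frames, int n,
                           const image_range *images, int nimages,
                           char *out, size_t outlen) {
    assert(out != NULL && outlen > 0);
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        uintptr_t addr = (uintptr_t)frames[i];
        const image_range *img = NULL;
        for (int j = 0; j < nimages; j++) {
            if (addr >= images[j].start && addr < images[j].end) {
                img = &images[j];
                break;
            }
        }
        int w = img
            ? snprintf(out + used, outlen - used,
                       "%d %s 0x%" PRIxPTR " 0x%" PRIxPTR " + %" PRIuPTR "\n",
                       i, img->name, addr, img->start, addr - img->start)
            : snprintf(out + used, outlen - used, "%d ??? 0x%" PRIxPTR "\n", i, addr);
        if (w < 0 || (size_t)w >= outlen - used) {
            out[used] = '\0';   /* drop the partial line and stop */
            break;
        }
        used += (size_t)w;
    }
    return used;
}
```

Recording the offset from each image's load address, rather than the absolute address, is what lets the server resolve symbols regardless of where the image was loaded in a given run.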

  • Optimizing lock wait time

Because memory is allocated from multiple threads, stack insertion must be locked to ensure thread safety. For such a frequently invoked method, lock performance is our top concern. NSLock and @synchronized are commonly used in iOS development, but how well do these two locks perform?

We benchmarked the locks commonly used in iOS with test code and summarized their performance in Figure 3. According to the results in Figure 3, NSLock and @synchronized perform worse than pthread_mutex, and the spin lock OSSpinLock performs best.

A spin lock works as follows: if the lock is already held by another execution unit, the caller waits in a loop until it is released. Unlike a mutex, a spin lock does not put the caller to sleep, which saves the cost of switching the thread into and out of the sleep state, so it is more efficient, at the cost of higher CPU usage. In our scenario, the locked critical section is very short, so OSSpinLock does not noticeably increase CPU usage; we therefore chose OSSpinLock to prioritize locking efficiency.

Figure 3. Performance comparison of various locks
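A spin lock of this kind can be sketched with C11 atomics. This is a portable stand-in, not OSSpinLock itself. On Apple platforms, OSSpinLock has been deprecated since iOS 10 because of priority-inversion problems, and os_unfair_lock is the recommended replacement. The `alloc_table` struct and `record_allocation` function are hypothetical names that stand in for the stack table and the insertion step.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal test-and-set spin lock built on C11 atomic_flag. */
typedef struct { atomic_flag flag; } spinlock;
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock *l) {
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire)) {
        /* busy-wait: the caller never sleeps */
    }
}

static void spin_unlock(spinlock *l) {
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical stand-in for the shared stack table. */
typedef struct {
    spinlock lock;
    long records;
} alloc_table;

static void record_allocation(alloc_table *t) {
    spin_lock(&t->lock);
    t->records++;            /* keep the critical section as short as possible */
    spin_unlock(&t->lock);
}

/* Each worker simulates a thread that performs 100,000 allocations. */
static void *worker(void *arg) {
    alloc_table *t = arg;
    for (int i = 0; i < 100000; i++) {
        record_allocation(t);
    }
    return NULL;
}
```

The design only pays off while the locked section stays tiny. If the critical section grows (for example, by doing I/O under the lock), waiting threads burn CPU, and a sleeping lock becomes the better choice.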

Stack clustering and compression

As mentioned earlier, our hook caches stack data for every memory allocation. Suppose the app has 250,000 live memory blocks, the average stack depth is 20 frames, and each stack address is stored as an 8-byte integer. The stack data alone then occupies 250,000 × 20 × 8 bytes = 40 MB. Such memory growth is clearly unacceptable for any app, so we need to reduce the component's memory footprint.

When analyzing memory bursts, we only need to look at the stacks that account for large amounts of memory; stacks with small memory usage do not matter. So the optimization idea is clear: keep only the large stacks. To do this, we must first cluster identical stacks together and count the accumulated memory of each stack.

The specific strategy is shown in Figure 4. For each recorded allocation stack, the stack data is first compressed into a 16-byte MD5 digest, and stacks are clustered by their MD5 values. Only the 16-byte MD5 is cached for each stack; the original stack is retained only once that stack's accumulated memory exceeds a certain threshold. Because the number of stacks above the threshold is limited, the space taken by the retained original stacks is almost negligible.

Figure 4. Stack clustering and compression
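The clustering strategy can be sketched as a hash table keyed by a stack digest. To keep the example short and self-contained, it swaps MD5 for a 64-bit FNV-1a hash. The fixed table size, the threshold field, and all names (`stack_table`, `record_stack`, `stack_digest`) are assumptions for illustration. Every stack stores only its digest and counters; the raw frames are copied once the stack's accumulated size reaches the threshold.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 1024   /* hypothetical capacity, must be a power of two */

typedef struct {
    uint64_t digest;      /* 0 marks an empty slot */
    size_t total_bytes;   /* accumulated memory allocated from this stack */
    uint32_t count;       /* number of allocations from this stack */
    void **frames;        /* original stack, kept only above the threshold */
    int depth;
} stack_entry;

typedef struct {
    stack_entry slots[TABLE_SIZE];
    size_t threshold;
} stack_table;

/* FNV-1a over the raw frame addresses (stand-in for the MD5 digest). */
static uint64_t stack_digest(void *const *frames, int depth) {
    uint64_t h = 1469598103934665603ULL;
    const unsigned char *p = (const unsigned char *)frames;
    for (size_t i = 0; i < (size_t)depth * sizeof(void *); i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h ? h : 1;     /* reserve 0 for empty slots */
}

/* Cluster one allocation under its stack digest (linear probing).
   Returns the entry, or NULL if the table is full. */
static stack_entry *record_stack(stack_table *t, void *const *frames,
                                 int depth, size_t size) {
    uint64_t d = stack_digest(frames, depth);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        stack_entry *e = &t->slots[(d + i) & (TABLE_SIZE - 1)];
        if (e->digest == 0) e->digest = d;   /* claim an empty slot */
        if (e->digest != d) continue;        /* different stack: probe on */
        e->total_bytes += size;
        e->count++;
        if (e->frames == NULL && e->total_bytes >= t->threshold) {
            /* A real hook would allocate from a private zone here,
               not malloc, to avoid re-entering its own hook. */
            e->frames = malloc((size_t)depth * sizeof(void *));
            if (e->frames != NULL) {
                memcpy(e->frames, frames, (size_t)depth * sizeof(void *));
                e->depth = depth;
            }
        }
        return e;
    }
    return NULL;
}
```

Identical stacks always hash to the same digest, so they accumulate into one entry. Small, rarely used stacks never pay for more than a fixed-size entry.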

With these two techniques (clustering and compression), stack memory drops to about 1/40 of its size before optimization, so the optimized component has little impact on app memory.

Data dump scheme

As mentioned earlier, once memory usage approaches the limit, the in-memory stack data is periodically dumped to disk. The conventional approach is to write the data to disk directly through file I/O interfaces. Because dumps happen very frequently, such frequent I/O can make the app lag, so we adopted the more efficient mmap approach.

mmap is a method of memory-mapping files: it maps a file or other object into the address space of a process. Once the mapping is established, writing to the file involves no extra copy of the file data and avoids frequent switching between kernel space and user space, as shown in Figure 5. In our tests, writing data into the mmap-mapped region performs the same as writing directly to memory, and is much faster than regular file I/O.

Figure 5. Principle of memory mapping
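The mmap dump path can be sketched with standard POSIX calls. The `mmap_dump` type and its helper functions are hypothetical. The sketch maps a fixed-size file with `MAP_SHARED`, appends records with a plain `memcpy` (no `write()` call per record), and uses `msync` for an explicit flush.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical fixed-size dump file mapped into the process. */
typedef struct {
    int fd;
    char *base;      /* start of the mapping */
    size_t size;     /* mapped length */
    size_t offset;   /* next write position */
} mmap_dump;

static int dump_open(mmap_dump *d, const char *path, size_t size) {
    d->fd = open(path, O_RDWR | O_CREAT, 0644);
    if (d->fd < 0) return -1;
    if (ftruncate(d->fd, (off_t)size) != 0) { close(d->fd); return -1; }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, d->fd, 0);
    if (p == MAP_FAILED) { close(d->fd); return -1; }
    d->base = p;
    d->size = size;
    d->offset = 0;
    return 0;
}

/* Appending is just a memory copy into the shared mapping. */
static int dump_append(mmap_dump *d, const void *data, size_t len) {
    if (len > d->size - d->offset) return -1;   /* mapping is full */
    memcpy(d->base + d->offset, data, len);
    d->offset += len;
    return 0;
}

/* Explicitly flush dirty pages to the file. */
static int dump_sync(mmap_dump *d) {
    return msync(d->base, d->size, MS_SYNC);
}

static void dump_close(mmap_dump *d) {
    munmap(d->base, d->size);
    close(d->fd);
}
```

Because the mapping is `MAP_SHARED`, the written bytes live in the kernel's page cache for that file rather than in a private process buffer.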

So when does mmap write data back to disk? According to the official documentation, mainly in the following cases:

  • When system memory is low

  • When the process crashes

  • When msync is called explicitly

mmap writes data back proactively when memory is low. This mechanism ensures that our monitoring component can flush its cached data to disk before the app runs out of memory. From this point of view, mmap is more reliable than conventional I/O.

Memory leak detection

In addition to burst memory stack monitoring, OOMDetector integrates memory leak detection, which detects "ownerless" memory leaks of malloc memory blocks and Objective-C objects. An "ownerless" memory leak is a memory block that is no longer referenced anywhere in the process but has not been freed.

As described earlier, OOMDetector records the allocation stack of every object. To find the "leaked objects" among them, we need to know whether any "pointer variable" in the memory the program can access points to each block. A block that no pointer anywhere in the process's memory points to is a leaked block. As shown in Figure 6, on iOS the memory regions that may contain pointer variables are the heap, the stacks, the global data area, and the registers. OOMDetector scans these regions to find all possible "pointer variables"; after the whole scan, any block that no pointer variable points to is a leaked block.

To avoid memory access conflicts, all threads are suspended during the scan, which freezes the app for 1 to 2 seconds. Because the scan is time-consuming, this function is currently used mainly during the app testing phase. Combined with automated testing, it can find leaks quickly and efficiently.

Figure 6. Principle of memory leak detection
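The scan described above is a conservative pointer scan, which can be sketched as follows. The sketch assumes the caller has already suspended the other threads and supplies the regions to scan (stacks, global data, a register snapshot, and heap blocks) as plain address ranges. `tracked_block`, `scan_region`, and `find_leaks` are hypothetical names. Every pointer-aligned word is treated as a possible pointer, and a word that falls anywhere inside a tracked block marks that block as referenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A block recorded by the allocation hook. */
typedef struct {
    uintptr_t start;
    size_t size;
    int referenced;
} tracked_block;

/* A memory range that may contain pointer variables. */
typedef struct {
    const void *base;
    size_t size;
} scan_region;

/* Treat every pointer-aligned word in the region as a potential pointer. */
static void scan_region_words(const scan_region *r, tracked_block *blocks, size_t nblocks) {
    uintptr_t p = (uintptr_t)r->base;
    uintptr_t end = p + r->size;
    p = (p + sizeof(void *) - 1) & ~(uintptr_t)(sizeof(void *) - 1);  /* align up */
    for (; p + sizeof(void *) <= end; p += sizeof(void *)) {
        uintptr_t v;
        memcpy(&v, (const void *)p, sizeof v);
        for (size_t i = 0; i < nblocks; i++) {
            /* Interior pointers count as references too. */
            if (v >= blocks[i].start && v < blocks[i].start + blocks[i].size) {
                blocks[i].referenced = 1;
                break;
            }
        }
    }
}

/* Returns the number of blocks that no scanned word points into. */
static size_t find_leaks(const scan_region *regions, size_t nregions,
                         tracked_block *blocks, size_t nblocks) {
    for (size_t i = 0; i < nblocks; i++) blocks[i].referenced = 0;
    for (size_t r = 0; r < nregions; r++) scan_region_words(&regions[r], blocks, nblocks);
    size_t leaks = 0;
    for (size_t i = 0; i < nblocks; i++) {
        if (!blocks[i].referenced) leaks++;
    }
    return leaks;
}
```

Because any integer that happens to look like an address counts as a reference, a conservative scan can occasionally hide a real leak. This is the usual trade-off of scanning memory without type information.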

Looking ahead

Open-sourcing is just the beginning. We will keep improving OOMDetector, and we welcome your feedback on the component. If your iOS app suffers from memory problems, or you are interested in iOS memory monitoring, check out our component.