Performance issues are one of the biggest causes of App user churn. They include crashes, network request failures or timeouts, slow responses, janky list scrolling, heavy traffic consumption, high power drain, and more. The causes are numerous: beyond external factors such as device hardware and operating system, most stem from developers' misuse of threads, locks, system functions, programming paradigms, data structures, and so on. Even the most experienced programmer can hardly avoid every pitfall that leads to poor performance during development, so the key to solving performance problems is finding and locating them early.

By summarizing common performance problems encountered in practice and studying the performance-monitoring techniques used in the industry by WeChat, 360, and others, Meituan Waimai developed its own mobile performance-monitoring solution, Hertz. Hertz aims to provide three capabilities:

  • During development, detect performance anomalies and notify developers;
  • During testing, generate performance test reports in conjunction with existing test tools;
  • Online, report performance data through the monitoring platform to locate and trace problems in production.

To deliver these three capabilities, we must first collect measurable, meaningful performance data, so data collection is one of the core problems we focus on.

Data collection

While users perceive a wide variety of performance issues, these can still be abstracted into concrete monitoring metrics. Hertz monitors FPS, CPU usage, memory usage, lag, page load time, network request traffic, and more. Some indicators, such as FPS, CPU usage, and memory usage, are easy to obtain; others, such as lag, page load time, and network request traffic, are harder.

For example, on iOS we can obtain the FPS via CADisplayLink like this:

// Called once per screen refresh by a CADisplayLink
- (void)tick:(CADisplayLink *)link
{
    NSTimeInterval deltaTime = link.timestamp - self.lastTime;
    self.currentFPS = 1 / deltaTime;
    self.lastTime = link.timestamp;
}
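On Android, the analogous per-frame signal comes from Choreographer rather than CADisplayLink. A minimal sketch with the FPS arithmetic isolated so it can run anywhere (class and method names are ours, not Hertz's):

```java
// Illustrative sketch: compute FPS from per-frame timestamps, as delivered
// in nanoseconds by Choreographer.FrameCallback#doFrame on Android.
public class FpsMeter {
    private long lastFrameNanos = -1;
    private double currentFps = 0.0;

    /** Feed each frame timestamp, e.g. from Choreographer.FrameCallback. */
    public void doFrame(long frameTimeNanos) {
        if (lastFrameNanos >= 0) {
            currentFps = 1e9 / (frameTimeNanos - lastFrameNanos);
        }
        lastFrameNanos = frameTimeNanos;
    }

    public double fps() {
        return currentFps;
    }
}
```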

On Android we can get the memory footprint as follows:

public long useSize() {
    Runtime runtime = Runtime.getRuntime();
    long totalSize = runtime.maxMemory() >> 10; // KB
    this.memoryUsage = (runtime.totalMemory() - runtime.freeMemory()) >> 10; // KB
    this.memoryUsageRate = this.memoryUsage * 100 / totalSize;
    return this.memoryUsage;
}
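CPU usage is similarly cheap to sample. One common approach on Android is to read `/proc/self/stat`, where fields 14 and 15 (utime, stime) hold the process's CPU time in clock ticks. A hypothetical parsing sketch (this is not Hertz's actual code; field numbering follows proc(5), and parsing starts after the `)` that ends the comm field because comm may itself contain spaces):

```java
// Illustrative sketch: extract the app's accumulated CPU time (utime + stime,
// in clock ticks) from one line of /proc/self/stat.
public class CpuStatReader {

    public static long appCpuTicks(String statLine) {
        int close = statLine.lastIndexOf(')');
        String[] fields = statLine.substring(close + 2).split(" ");
        // fields[0] is field 3 ("state"), so utime (field 14) is fields[11]
        long utime = Long.parseLong(fields[11]);
        long stime = Long.parseLong(fields[12]);
        return utime + stime;
    }
}
```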

The examples above just illustrate how easy it is to obtain metrics such as FPS, memory, and CPU, but these metrics only become meaningful when combined with other data: the current page, the App's running time, or the execution stack and run logs when a lag occurs. For example, combining CPU usage with the current page measures the computational complexity of each page; combining memory with running time reveals whether memory is leaking; combining FPS with lag information assesses how badly performance degrades when a lag happens.

Traffic consumption

Mobile users are very sensitive to data traffic, and Meituan Waimai occasionally receives complaints from users who consumed a large amount of traffic in a short period. We therefore considered counting users' traffic consumption locally in the App and reporting it to the backend. The statistics need not be accurate per API; a rough categorization of total consumption is enough. The dimensions are: calendar day + request source + network type. Why monitor traffic locally on the client when server-side traffic monitoring (such as CAT) already exists? Local statistics cover every network request the client makes, which is difficult to achieve on the server: not all requests are reported to server-side monitoring, and for network reasons a user may consume upstream traffic on requests that never reach the server at all.
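Once the dimensions are fixed, the aggregation itself is simple. A sketch of a local accumulator keyed by day + source + network type (class and method names are illustrative, not Hertz's API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: accumulate traffic bytes per (day, source, network type).
public class TrafficRecorder {
    private final Map<String, Long> usage = new HashMap<>();

    static String key(String day, String source, String network) {
        return day + "|" + source + "|" + network;
    }

    /** Adds one request's byte count to the matching bucket. */
    public void add(String day, String source, String network, long bytes) {
        usage.merge(key(day, source, network), bytes, Long::sum);
    }

    public long total(String day, String source, String network) {
        return usage.getOrDefault(key(day, source, network), 0L);
    }
}
```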

On iOS we implement traffic statistics by registering an NSURLProtocol subclass:

- (void)connectionDidFinishLoading:(NSURLConnection *)connection
{
    [self.client URLProtocolDidFinishLoading:self];

    self.data = nil;
    if (connection.originalRequest) {
        WMNetworkUsageDataInfo *info = [[WMNetworkUsageDataInfo alloc] init];
        self.connectionEndTime = [[NSDate date] timeIntervalSince1970];
        info.responseSize = self.responseDataLength;
        info.requestSize = connection.originalRequest.HTTPBody.length;
        info.contentType = [WMNetworkUsageURLProtocol getContentTypeByURL:connection.originalRequest.URL andMIMEType:self.MIMEType];
        [[WMNetworkMeter sharedInstance] setLastDataInfo:info];
        [[WMNetworkUsageManager sharedManager] recordNetworkUsageDataInfo:info];
    }
}

On Android we implement traffic statistics with AspectJ-based AOP, intercepting the network request APIs:

@Pointcut("target(java.net.URLConnection)"
        + " && !within(retrofit.appengine.UrlFetchClient)"
        + " && !within(okio.Okio)"
        + " && !within(butterknife.internal.ButterKnifeProcessor)"
        + " && !within(com.flurry.sdk.hb)"
        + " && !within(rx.internal.util.unsafe.*)"
        + " && !within(net.sf.cglib..*)"
        + " && !within(com.huawei.android..*)"
        + " && !within(com.sankuai.android.nettraffic..*)"
        + " && !within(roboguice..*)"
        + " && !within(com.alipay.sdk..*)")
protected void baseCondition() {
}

@Pointcut("call(org.apache.http.HttpResponse org.apache.http.client.HttpClient.execute(org.apache.http.client.methods.HttpUriRequest))"
        + " && target(org.apache.http.client.HttpClient)"
        + " && args(request)"
        + " && !within(com.sankuai.android.nettraffic.factory..*)"
        + " && baseClientCondition()")
void httpClientExecute(HttpUriRequest request) {
}

After counting total traffic consumption, we also want a rough classification to help locate problems. We care about two factors: the source of the request, i.e. whether the traffic comes from API requests, H5, or the CDN; and the network type, i.e. Wi-Fi, 4G, or 3G. For the request source, we first do a simple classification by domain name. Example iOS code:

- (NSString *)regApiHost {
    return _regApiHost ? _regApiHost : @"^(.*\\.)?(meituan\\.com|maoyan\\.com|dianping\\.com|kuxun\\.cn)$";
}

- (NSString *)regResHost {
    return _regResHost ? _regResHost : @"^(.*\\.)?(meituan\\.net|dpfile\\.com)$";
}

- (NSString *)regWebHost {
    return _regWebHost ? _regWebHost : @"^(.*\\.)?(meituan\\.com|maoyan\\.com|dianping\\.com|kuxun\\.cn|meituan\\.net|dpfile\\.com)$";
}
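The same host-based classification is straightforward on Android. A sketch using the regexes from the iOS example (the domain lists are copied from above; the class itself is illustrative, not Hertz's actual code):

```java
import java.util.regex.Pattern;

// Illustrative sketch: classify a request's host as API, CDN resource, or other.
public class TrafficSourceClassifier {
    private static final Pattern API_HOST = Pattern.compile(
            "^(.*\\.)?(meituan\\.com|maoyan\\.com|dianping\\.com|kuxun\\.cn)$",
            Pattern.CASE_INSENSITIVE);
    private static final Pattern RES_HOST = Pattern.compile(
            "^(.*\\.)?(meituan\\.net|dpfile\\.com)$",
            Pattern.CASE_INSENSITIVE);

    public static String classify(String host) {
        if (API_HOST.matcher(host).matches()) return "API";
        if (RES_HOST.matcher(host).matches()) return "CDN";
        return "OTHER";
    }
}
```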

However, some domains host both API services and Web services. We distinguish these further by checking the MIMEType of the response. Example iOS code:

+ (BOOL)isPermissiveWebURL:(NSURL *)URL andMIMEType:(NSString *)MIMEType
{
    NSRegularExpression *permissiveHost = [NSRegularExpression regularExpressionWithPattern:[[WMNetworkMeter sharedInstance] regWebHost]
                                                                                    options:NSRegularExpressionCaseInsensitive
                                                                                      error:nil];
    NSString *host = URL.host;
    BOOL isWebMIMEType = [MIMEType isEqualToString:@"text/css"]
        || [MIMEType isEqualToString:@"text/html"]
        || [MIMEType isEqualToString:@"application/x-javascript"]
        || [MIMEType isEqualToString:@"application/javascript"];
    return isWebMIMEType && host && [permissiveHost numberOfMatchesInString:host options:0 range:NSMakeRange(0, [host length])] > 0;
}

Page load time

To measure page load time, we must solve two problems: first, how to define the load time of a page; second, how to measure it while writing as little intrusive code as possible. Start with the first. Take Android as an example: during the creation and loading of an Activity, many operations occur, such as setting the page theme, initializing the layout, loading images, fetching network data, and reading or writing the database. A performance problem in any of them may delay the screen from being displayed, hurting user experience. Hertz abstracts these operations into the speed-measurement model shown below:

T1 is the time from page initialization to the display of the first UI element, which is usually a waiting animation shown while data loads. T2 is the network request time, which may start before T1 ends. T3 is the time to parse the data, populate the UI, and re-render. T is the total time from initialization to the final UI being drawn.
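The model reduces to simple arithmetic over the recorded timestamps. A sketch (field and parameter names are ours, not Hertz's):

```java
// Illustrative sketch of the speed-measurement model: compute T1, T2, T3 and
// total T from raw timestamps (all in ms). Note the phases can overlap: T2
// (the network request) may start before T1 ends.
public class PageLoadTimes {
    public final long t1, t2, t3, total;

    public PageLoadTimes(long init, long firstUiShown,
                         long requestStart, long requestEnd, long renderEnd) {
        t1 = firstUiShown - init;       // init -> first UI element visible
        t2 = requestEnd - requestStart; // network request duration
        t3 = renderEnd - requestEnd;    // parse data, fill UI, re-render
        total = renderEnd - init;       // whole page load
    }
}
```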

For the second problem, manually burying instrumentation code at every time point is inefficient and error-prone, so Hertz uses a configuration file to declare the APIs of each page and unifies the instrumentation in the base class of API requests. There is still room for optimization, such as hooking key nodes to inject the instrumentation into API calls automatically. An example configuration:

[{
  "page": "MainActivity",
  "api": [
    "/poi/filter",
    "/home/head",
    "/home/rcmdboard"
  ]
},
{
  "page": "RestaurantActivity",
  "api": [
    "/poi/food"
  ]
}]
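Given such a configuration, the instrumentation in the API base class only needs to know when every configured API of a page has returned. A minimal sketch of that bookkeeping (illustrative, not Hertz's actual class):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: a page is "data-complete" once all its configured APIs return.
public class PageApiTracker {
    private final Set<String> pending;

    public PageApiTracker(Collection<String> apis) {
        pending = new HashSet<>(apis);
    }

    /** Marks one configured API as finished; returns true once all are done. */
    public boolean onApiFinished(String api) {
        pending.remove(api);
        return pending.isEmpty();
    }
}
```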

There remains the question of how to determine that UI rendering is complete. On Android, Hertz inserts a FrameLayout into the Activity's rootView and detects drawing by overriding its dispatchDraw method. The downside of this scheme is that inserting an extra level of View deepens the layout hierarchy.

@Override
protected void dispatchDraw(Canvas canvas) {
    super.dispatchDraw(canvas);
    if (!mIsComplete) {
        mIsComplete = mCallback.onDrawEnd(this, mKey);
    }
}

On iOS we take a different approach: the configuration file specifies a tag for an element of the final rendered page, and after a successful network request Hertz turns on a CADisplayLink to check whether that element has appeared under the root node.

- (void)tick:(CADisplayLink *)link
{
    [_currentTrackRecordArray enumerateObjectsUsingBlock:^(WMHertzPageTrackRecord * _Nonnull record, NSUInteger idx, BOOL * _Nonnull stop) {
        if ([self findTag:record.configItem.tag inViewHierarchy:record.rootView]) {
            [self endPageRenderEvent:record];
        }
    }];
}

Lag detection

Today's mainstream mobile devices use double buffering with vertical sync. Roughly: the display system has two buffers; the GPU renders a frame into one buffer for the video controller to read, and after the next frame is rendered the GPU switches the video controller's pointer to the second buffer. The GPU waits for the display's VSync (vertical sync) signal before rendering a new frame and updating the buffer.

Most phone screens refresh at 60 Hz. If a frame's work is not finished within 1000 / 60 = 16.67 ms, the frame is dropped, which is what users perceive as stutter. Drawing a frame involves both CPU and GPU: the CPU computes the display content (view creation, layout calculation, image decoding, text drawing, etc.), then submits it to the GPU for transformation, compositing, and rendering.

Besides UI rendering, the main thread also executes input-event callbacks, service callbacks, and any other code we place on it. Once a complex operation runs on the main thread, it can block responses to taps and swipes and block the UI drawing itself — this is the most common cause of lag.

Given how the screen is drawn and what causes lag, an obvious idea is to detect lag via FPS, measuring the drawing quality of the current page by the frame-drop rate over consecutive frames. In practice, however, FPS refreshes quickly and jitters easily, so comparing FPS values directly makes lag hard to detect reliably. Measuring the execution time of each main-thread message-loop iteration is much easier, and is a common industry approach. Hertz therefore times each iteration of the main thread's message loop and records a lag when that time exceeds a threshold.

In practice we found two patterns: some lags last a long time, such as the lag when opening a new page; others are shorter but occur more frequently, such as the lag while scrolling a list. We therefore adopted the strategy of "N lags exceeding threshold T": collection and reporting trigger only when the accumulated lag count within a period exceeds N. For example, a threshold of T = 2000 ms with N = 1 catches a single long lag, while T = 300 ms with N = 5 catches higher-frequency lags.

Runnable loopRunnable = new Runnable() {
    @Override
    public void run() {
        if (mStartedDetecting && !isCatched) {
            nowLaggyCount++;
            if (nowLaggyCount >= N) {
                blockHandler.onBlockEvent();
                isCatched = true;
                ...
            }
        }
    }
};

public void onMainLoopFinish() {
    if (isCatched) {
        blockHandler.onBlockFinishEvent(loopStartTime, loopEndTime);
    }
    resetStatus();
    ...
}
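Since the snippet above is elided, here is a self-contained sketch of the same "N lags over threshold T" strategy (class and method names are ours, not Hertz's):

```java
// Illustrative sketch: each main-loop iteration reports its duration; once N
// iterations in one window exceed threshold T, a lag event fires.
public class LagDetector {
    private final long thresholdMs; // T
    private final int triggerCount; // N
    private int overCount = 0;

    public LagDetector(long thresholdMs, int triggerCount) {
        this.thresholdMs = thresholdMs;
        this.triggerCount = triggerCount;
    }

    /** Returns true when the accumulated lag count reaches N. */
    public boolean onLoopFinished(long durationMs) {
        if (durationMs > thresholdMs) {
            overCount++;
        }
        return overCount >= triggerCount;
    }

    /** Call at the end of each collection window. */
    public void reset() {
        overCount = 0;
    }
}
```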

How do we locate the code causing a lag once one is detected? It would be ideal to capture the program's call stack and run logs at the moment the lag occurs — and indeed, grabbing the stack is very effective in helping us locate the offending code.

In practice, we found two issues to be aware of when grabbing the stack.

The first is timing. The stack must be grabbed while the lag is still happening, not afterwards; otherwise the code causing the lag cannot be caught. So a child thread grabs the main thread's stack before the lag ends.
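Capturing another thread's stack from a watchdog thread is directly supported by the JVM/ART runtime; a minimal sketch (our own helper, not Hertz's code):

```java
// Minimal sketch: a watchdog (child) thread can snapshot the main thread's
// stack while the slow operation is still in progress via Thread#getStackTrace.
public class StackGrabber {
    public static String[] grab(Thread target) {
        StackTraceElement[] frames = target.getStackTrace();
        String[] out = new String[frames.length];
        for (int i = 0; i < frames.length; i++) {
            out[i] = frames[i].toString();
        }
        return out;
    }
}
```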

The second is how to classify the stacks. Classifying lag stacks differs from classifying crash stacks. Classifying by the innermost frame alone is clearly inappropriate, because different business logic in outer frames may share the same innermost call stack; classifying by the outermost frame is also inappropriate, because it may be either business code or a system call.

Hertz currently classifies by the innermost frame that matches a set of simple rules on the class name — that is, the innermost frame belonging to our own business packages.
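A sketch of that innermost-match classification (the business-package prefix used here is an assumption for illustration):

```java
// Illustrative sketch: classify a lag stack by the innermost frame whose
// class name matches a configured business-package prefix.
public class StackClassifier {
    private final String[] appPrefixes;

    public StackClassifier(String... appPrefixes) {
        this.appPrefixes = appPrefixes;
    }

    /** frames are ordered innermost first, as in a printed stack trace. */
    public String classify(String[] frames) {
        for (String frame : frames) {
            for (String prefix : appPrefixes) {
                if (frame.startsWith(prefix)) {
                    return frame;
                }
            }
        }
        // No business frame found: fall back to the innermost frame.
        return frames.length > 0 ? frames[0] : "unknown";
    }
}
```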

Scalability and ease of use

Hertz takes the scalability and ease of use of the SDK very seriously, and we put a lot of thought into both from the beginning. The SDK is divided into three layers, as shown in the figure below: the top is the interface layer, exposing a small number of methods plus environment and configuration parameters; the middle is the business layer, containing the core logic such as page speed measurement, lag detection, and parameter collection; the bottom is the data adaptation layer, which wraps the data produced by the business layer into a unified structure and adapts it to different output channels through adapters.

The first consideration was ease of use. Hertz has three built-in operating modes: development, test, and online. A developer only needs to specify a mode and Hertz is up and running. Each mode presets the parameters the SDK needs, such as sampling frequency, lag threshold, and channel switches, while indicator collection, lag detection, page speed measurement, and the rest run automatically. Using Android as an example:

final HertzConfiguration configuration = new HertzConfiguration.Builder(this)
        .mode(HertzMode.HERTZ_MODE_DEBUG)
        .appId(APP_ID)
        .unionId(UNION_ID)
        .build();
Hertz.getInstance().init(configuration);

Our second design consideration was extensibility. Take the data adaptation layer: there are currently five built-in channels that route the collected monitoring data to different destinations. Depending on the selected working mode, data is adapted to the server monitoring channel, turned into test reports, or only logged and surfaced locally in the App. One benefit of this design is that a new output channel can be added with an interceptor or a new adapter, with minimal change to the SDK code. The performance-collection and page-speed modules follow the same idea.

Practical application

After introducing Hertz, Meituan Waimai gained an initial ability to find and locate performance problems, and validated Hertz in the development, testing, and online phases.

Application during development

During development, Hertz is integrated with an offline performance testing tool. When anomalies are detected, Hertz feeds the data directly back to developers, as shown in the figure below:

Data collected at runtime is written to the log, and a floating layer on the App page shows the current FPS, CPU, memory, and other basics. If a lag is detected, a prompt page pops up listing the current execution stack. In our lag detection so far, most stack logs clearly identify the problematic code, which can then be optimized by reading the code and analyzing the cause.

Here is an example of a complex UI initialization that causes a lag:

android.content.res.StringBlock.nativeGetString(Native Method)
android.content.res.StringBlock.get(StringBlock.java:82)
android.content.res.XmlBlock$Parser.getName(XmlBlock.java:175)
android.view.LayoutInflater.inflate(LayoutInflater.java:470)
android.view.LayoutInflater.inflate(LayoutInflater.java:420)
android.view.LayoutInflater.inflate(LayoutInflater.java:371)
com.sankuai.meituan.takeoutnew.controller.ui.PoiListAdapterController.getView(PoiListAdapterController.java:77)
com.sankuai.meituan.takeoutnew.adapter.PoiListAdapter.getView(PoiListAdapter.java:26)
android.widget.HeaderViewListAdapter.getView(HeaderViewListAdapter.java:220)

Here is an example of a lag when serializing an object to a JSON string with Gson:

com.google.gson.Gson.toJson(Gson.java:519)
com.meituan.android.common.locate.util.GoogleJsonWrapper$MyGson.toJson(GoogleJsonWrapper.java:236)
com.sankuai.meituan.location.collector.CollectorJson$MyGson.toJson(CollectorJson.java:216)
com.sankuai.meituan.location.collector.CollectorFilter.saveCurrentData(CollectorFilter.java:67)
com.sankuai.meituan.location.collector.CollectorFilter.init(CollectorFilter.java:33)
com.sankuai.meituan.location.collector.CollectorFilter.<init>(CollectorFilter.java:27)
com.sankuai.meituan.location.collector.CollectorMsgHandler.recordGps(CollectorMsgHandler.java:134)
com.sankuai.meituan.location.collector.CollectorMsgHandler.getNewLocation(CollectorMsgHandler.java:81)
com.meituan.android.common.locate.LocatorMsgHandler$1.handleMessage(LocatorMsgHandler.java:29)

The following is an example of the main thread writing to the database causing a lag:

android.database.sqlite.SQLiteConnection.nativeExecuteForLastInsertedRowId(Native Method)
android.database.sqlite.SQLiteConnection.executeForLastInsertedRowId(SQLiteConnection.java:782)
android.database.sqlite.SQLiteSession.executeForLastInsertedRowId(SQLiteSession.java:788)
android.database.sqlite.SQLiteStatement.executeInsert(SQLiteStatement.java:86)
de.greenrobot.dao.AbstractDao.executeInsert(AbstractDao.java:306)
de.greenrobot.dao.AbstractDao.insert(AbstractDao.java:276)
com.sankuai.meituan.takeoutnew.db.dao.BaseAbstractDao.insert(BaseAbstractDao.java:25)
com.sankuai.meituan.takeoutnew.log.LogDataUtil.insertIntoDb(LogDataUtil.java:243)
com.sankuai.meituan.takeoutnew.log.LogDataUtil.saveLogInfo(LogDataUtil.java:221)
com.sankuai.meituan.takeoutnew.log.LogDataUtil.saveLog(LogDataUtil.java:116)
com.sankuai.meituan.takeoutnew.log.LogDataUtil.saveLogInfo(LogDataUtil.java:112)
com.sankuai.meituan.takeoutnew.ui.page.main.order.OrderListFragment.onPageShown(OrderListFragment.java:306)
com.sankuai.meituan.takeoutnew.ui.page.main.order.OrderListFragment.init(OrderListFragment.java:151)
com.sankuai.meituan.takeoutnew.ui.page.main.order.OrderListFragment.onCreateView(OrderListFragment.java:81)


Application during testing

Traditional performance tests mostly rely on third-party tools, whose data often differs greatly from what developers observe in practice, and such tests usually only produce indicator numbers rather than helping developers locate problems. We used Hertz to collect performance data during the test phase, whether in manual, automated, or Monkey tests. The collected data is processed by a script into a simple test report.

Of course, this form of report still requires manually exporting logs and running scripts; in the future we plan to build an automated test tool on top of it.

Online application

For lag detection, beyond the immediate feedback Hertz gives developers during development and testing, Hertz also uploads data to the server during gray release and online operation. The current reporting channel is CAT (open source; for details see the article "In-depth Analysis of Open Source Distributed Monitoring CAT"). The classification and display of stacks closely resembles the crash monitoring we are all familiar with: following the classification principle above, lag stacks are sorted by occurrence count and can be filtered by version, operating system, and device, which matches developers' usage habits.

For traffic statistics, the server aggregates all users' daily traffic consumption and outputs a report listing the Top 100 consumers. If an anomaly is found, the network requests causing the abnormal traffic can be located from backend logs and client diagnostic logs.

Hertz will also report the page speed data and basic indicators such as FPS, CPU and memory to CAT to evaluate the overall performance of the App.

Conclusion

Performance optimization is a topic every mature App must take seriously, and its pain point is often that problems are not found in time, or cannot be located once found. Guided by the idea of using monitoring data to drive performance optimization, Meituan Waimai developed and refined Hertz in practice, exploring and validating both the monitoring and the application of performance data.

Hertz's monitoring indicators currently include FPS, CPU usage, memory usage, lag, page load time, and network request traffic; power consumption, App cold start, and exceptions will gradually be added to Hertz's monitoring targets. In the future, these indicators may also be reused and improved on top of existing tools.

Hertz's lag detection and stack grabbing are very effective in helping developers locate performance issues, but there is still much room for improvement — for example, setting different thresholds for different devices, or different policies for different phases of the App's lifecycle. As for stack classification, the current rules simply match class-name prefixes; classifying stacks more accurately and reasonably is something we need to think more about. Of course, these optimizations need more data samples to support them.

It is also important to build visual, easy-to-use performance testing tools, such as a Web page for viewing data in real time and browsing historical reports. Combining Hertz with automated testing methods, or generating automated test reports during the integration phase, is another natural direction, though so far we have made only initial attempts. Once performance data can be collected accurately, applying it well across the whole development process, including testing, still needs long-term exploration and practice.

This article introduced the ideas and implementation methods Meituan Waimai has summarized while building Hertz. Many interesting, deeper topics in App performance monitoring were not covered — for example, balancing the value of a monitoring tool against the performance cost of the tool itself, concrete techniques and means of performance optimization, and further analysis of performance data to build a monitoring system for abnormal devices. We will continue to explore, practice, and share on these topics.

References

  1. BlockCanary.
  2. LeakCanary.
  3. Watchdog.
  4. iOS-System-Services.
  5. Guoling, "WeChat iOS lag monitoring system".
