Author: Li Lei (Qian Nuo)

APM provides frame-rate data, namely FPS (Frames Per Second). FPS is, to some extent, an indicator of page fluency, but the FPS provided by APM is not very accurate. Coinciding with the launch of a low-end phone performance optimization project, there was an urgent need for indicators that measure improvements to the sliding experience, so we began our exploration of frame-rate data.

In our exploration practice, we encountered many problems:

  • The proportion of high-refresh-rate phones is relatively high, which skews the overall FPS data
  • Non-user-initiated scrolling is mixed into the FPS data, so it does not directly reflect the user's operating experience
  • When calculating averages, jank data is buried in a sea of normal data. Does a single jank affect only one FPS sample, or one user's experience?

After a period of exploration, we settled on several indicators: sliding frame rate, frozen frame ratio, scrollHitchRate, and jank frame rate. In addition to these frame-rate metrics, APM also provides frame-rate attribution analysis to better guide performance optimization, as well as jank stack capture to better locate jank problems.

The following is a detailed introduction to these APM platform features and our frame-rate-related exploration. I hope this article is helpful to you.

System Rendering Mechanism

Before introducing how the indicators are implemented, we first need to understand how the system renders. Only by knowing the rendering mechanism can we correctly calculate and process frame-rate data.

The rendering mechanism is an important part of Android and touches many topics: the measure/layout/draw pipeline, jank, overdraw, and so on. The goal here is only to get an overview of the rendering process and identify which parts must be obtained through system APIs in order to calculate the target data.

Rendering process

We all know that when a render is triggered, it goes through ViewRootImpl#scheduleTraversals. At this point, scheduleTraversals essentially registers a callback for the next VSync with Choreographer. When the next VSync arrives, Choreographer first switches to the main thread (the native VSync callback does not run on the main thread). Of course it does not simply call sendMessage on the Looper; it marks the message with msg.setAsynchronous(true), which improves UI responsiveness.

After cutting to the main thread, Choreographer starts executing all callbacks registered with this VSync. There are four types of callbacks:

  1. CALLBACK_INPUT, input handling
  2. CALLBACK_ANIMATION, animation processing
  3. CALLBACK_TRAVERSAL, UI traversal (measure, layout, draw)
  4. CALLBACK_COMMIT, post-draw commit

Choreographer groups all callbacks by type, storing each type as a linked list whose heads are kept in a fixed-size array (since only these four types are supported). In the message VSync posts to the main thread, the lists are fetched, executed, and cleared in order, one type at a time.

scheduleTraversals registers a CALLBACK_TRAVERSAL callback, which executes the familiar ViewRootImpl#doTraversal() method. doTraversal calls performTraversals, and performTraversals in turn calls performMeasure, performLayout, and performDraw.

For the details, read the source of android.view.Choreographer and android.view.ViewRootImpl.

From this we can see that getting one frame of data onto the screen includes at least: switching to the main thread on VSync, processing input events, processing animations, and performing the UI traversal (measure, layout, draw).

However, when the draw pass ends, only the CPU side of the work is done; the data is then handed over to the RenderThread for the GPU work.

Screen refresh

Android 4.1 introduced VSync and triple buffering. VSync tells the CPU when to start computing the next frame, and the buffers swapped among the CPU, GPU, and display make the most of the available time and reduce jank.

In the figure above, A, B, and C represent three buffers. We can see that the CPU, GPU, and display can each obtain a buffer as soon as possible, reducing unnecessary waiting. Even while the display and GPU are each holding a buffer, if the next render begins there is still a buffer left for the CPU to write to, so it can immediately start rendering the next frame, as at the first VSync in the image.

Looking more closely at the image, we can see that buffer A is ready to be displayed as soon as the third VSync arrives, but it does not reach the screen until the fourth VSync. Thus triple buffering makes good use of the time spent waiting for VSync and reduces jank, but it introduces latency.

This is just a brief review of the topic; we suggest reading up on the history of these mechanisms, so that you understand not only what they do but why.

Mining of frame data information

Now that we understand the whole rendering process, the question becomes: what do we need to monitor, and how?

The industry solution

Original APM scheme:

After receiving a Touch event, APM would count the number of draws on the page within 1s. This scheme has the advantage of low performance overhead, but it has fatal flaws. First, if the page finishes rendering in less than 1s, drawing stops early and the computed value is artificially low. Second, touching the screen does not necessarily trigger a refresh, and a refresh does not necessarily come from a Touch event; in both cases the calculated data is dirty.

Incidentally, Android has a debug FPS implementation in ViewRootImpl. The principle is similar to the scheme above: draw times are accumulated up to 1s. So if you want a low-cost, performance-lossless offline FPS measurement, this is an option.

See the ViewRootImpl#trackFPS method.

Matrix:

For the frame-rate part, Matrix creatively hooks Choreographer's CallbackQueue, adding a custom FrameCallback to the head of each queue via a reflective call to addCallbackLocked. When the callback fires, rendering of the frame has begun, and the message currently being executed by the Looper is the rendering message. This way, besides the frame rate, Matrix can also monitor the timing of each stage of the current frame.

In addition, combining the frame callback with a Looper Printer lets it dump main-thread information when a janky frame occurs, which helps the business side fix the jank, but the frequent string concatenation (performed whenever println is called) brings some performance overhead.

General:

Use the Choreographer.FrameCallback#doFrame(frameTimeNanos: Long) method, computing the difference between consecutive callbacks in each callback; from this the FPS can be derived.

Sliding frame rate

FPS is a simple and universal indicator in the industry. FPS stands for Frames Per Second: the number of frames rendered per second.

Calculating FPS itself is not our goal; what we really want is the sliding frame rate. We care most about the frame rate during user interaction, since monitoring that better reflects the user experience.

First of all, the original collection scheme cannot collect FPS that matches this definition, so it had to be abandoned and redesigned. When we looked at Matrix's scheme, we thought the idea was great, but too hacky; we preferred the system's public APIs, which have lower maintenance cost and higher stability.

So we decided to use the most common approach: Choreographer.FrameCallback. Of course it is not perfect, but we try to design around its flaws.

So how do we figure out an FPS?

When Choreographer.FrameCallback is invoked, the doFrame method receives a timestamp; computing the difference from the previous callback gives one frame's duration. Once the accumulated time exceeds 1s, an FPS value can be calculated.
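The accumulation described above can be sketched as plain Java, decoupled from the Android Choreographer so the logic is clear. Class and method names here are illustrative, not APM's actual implementation; on device, onFrame would be called from Choreographer.FrameCallback#doFrame(frameTimeNanos).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: diff each doFrame timestamp against the previous one; once the
// summed intervals reach 1s, emit one FPS value and reset the window.
public class FpsAccumulator {
    private long lastFrameTimeNanos = -1;
    private long elapsedNanos = 0;
    private int frameCount = 0;
    private final List<Integer> fpsValues = new ArrayList<>();

    // Call from Choreographer.FrameCallback#doFrame(frameTimeNanos).
    public void onFrame(long frameTimeNanos) {
        if (lastFrameTimeNanos < 0) {          // first callback: no interval yet
            lastFrameTimeNanos = frameTimeNanos;
            return;
        }
        elapsedNanos += frameTimeNanos - lastFrameTimeNanos;
        lastFrameTimeNanos = frameTimeNanos;
        frameCount++;
        if (elapsedNanos >= 1_000_000_000L) {  // accumulated at least 1s
            fpsValues.add((int) (frameCount * 1_000_000_000L / elapsedNanos));
            elapsedNanos = 0;
            frameCount = 0;
        }
    }

    public List<Integer> getFpsValues() { return fpsValues; }
}
```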

In this process, it is important to know when doFrame will call back:

First, after each callback we need to call Choreographer#postFrameCallback again, which adds a node of type CALLBACK_ANIMATION to the next frame's callback list. Therefore, the doFrame callback time is neither the start of the frame's computation nor the moment the frame reaches the screen, but a callback during the CPU's animation-processing stage.

Once an FPS is calculated, the following states need to be superimposed on top:

View slide frame rate

In the initial implementation, the frame rate is monitored as long as the View is sliding, and stops when it is not. Given this requirement, our frame-rate capture becomes the following:

So how do we detect whether a View is sliding? This is where ViewTreeObserver.OnScrollChangedListener comes in. After all, you can only decide whether to use an API once you know how it works.

// ViewRootImpl#draw
private void draw(boolean fullRedrawNeeded) {
     // ...
     if (mAttachInfo.mViewScrollChanged) {
            mAttachInfo.mViewScrollChanged = false;
            mAttachInfo.mTreeObserver.dispatchOnScrollChanged();
     }
     // ...
     mAttachInfo.mTreeObserver.dispatchOnDraw();
     // ...
 }

We can see that ViewRootImpl#draw checks whether any View recorded in mAttachInfo has scrolled, and dispatches the scroll-changed event if so. So when is that flag set? When View#onScrollChanged is called:

// View#onScrollChanged
protected void onScrollChanged(int l, int t, int oldl, int oldt) {
     // ...
     final AttachInfo ai = mAttachInfo;
     if (ai != null) {
            ai.mViewScrollChanged = true;
     }
     // ...
 }

onScrollChanged is invoked directly from View#scrollTo and View#scrollBy, which is generic enough for most scenarios.

According to the rendering process we explained earlier, the ViewTreeObserver.OnScrollChangedListener callback happens inside ViewRootImpl#draw, so within a single frame the Choreographer.FrameCallback fires before ViewTreeObserver.OnScrollChangedListener.

For a single frame, it can be expressed as follows:

In this way, each frame has a sliding state. When a frame is a sliding frame, counting starts; once the accumulated time reaches 1s, one sliding frame-rate sample is produced.
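Putting the scroll-flag and the accumulation together, the per-frame gate can be sketched as follows. This is an illustrative model, not APM's code: onScrollChanged() stands in for the ViewTreeObserver.OnScrollChangedListener hook, and onFrame() for the Choreographer.FrameCallback; only intervals ending in a frame marked as sliding are counted.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: latch a "scrolled" flag when the scroll listener fires, consume it
// in the next frame callback, and accumulate only sliding frames toward 1s.
public class SlidingFpsTracker {
    private boolean scrolledSinceLastFrame = false;
    private long lastFrameTimeNanos = -1;
    private long slidingElapsedNanos = 0;
    private int slidingFrameCount = 0;
    private final List<Integer> slidingFps = new ArrayList<>();

    // Hooked to ViewTreeObserver.OnScrollChangedListener.
    public void onScrollChanged() { scrolledSinceLastFrame = true; }

    // Hooked to Choreographer.FrameCallback#doFrame.
    public void onFrame(long frameTimeNanos) {
        boolean sliding = scrolledSinceLastFrame;
        scrolledSinceLastFrame = false;
        if (lastFrameTimeNanos >= 0 && sliding) {
            slidingElapsedNanos += frameTimeNanos - lastFrameTimeNanos;
            slidingFrameCount++;
            if (slidingElapsedNanos >= 1_000_000_000L) {   // one full second of sliding
                slidingFps.add((int) (slidingFrameCount * 1_000_000_000L / slidingElapsedNanos));
                slidingElapsedNanos = 0;
                slidingFrameCount = 0;
            }
        }
        lastFrameTimeNanos = frameTimeNanos;
    }

    public List<Integer> getSlidingFps() { return slidingFps; }
}
```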

Finger slide frame rate

In offline verification, the View sliding frame rate matched the data from the test platform and met the basic requirements, so it was accepted. Once online, it ran well and was able to support frame-rate-related work.

However, View scrolling is not necessarily caused by user action, so the data does not always reflect the user experience. So we set out to implement the finger-sliding frame rate.

For the finger-sliding frame rate, we first need to receive the finger's Touch events. Since APM already hooks the dispatchTouchEvent interface and exposes a callback, we decided to use that interface directly to recognize finger swipes.

At this point, we need to know a few timing issues:

  • A dispatchTouchEvent does not immediately produce a doFrame
  • Even when the MOVE events' elapsed time/distance computed in dispatchTouchEvent exceed TapTimeout/ScaledTouchSlop, a doFrame is not necessarily produced immediately

So, when the MOVE events' elapsed time/distance computed in dispatchTouchEvent exceed TapTimeout/ScaledTouchSlop, we only set a flag. Only doFrame callbacks that follow a ViewTreeObserver.OnScrollChangedListener callback are counted toward the finger-sliding frame rate.
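The flag-setting side of this can be sketched as below. The threshold constants here are illustrative defaults (on device the real values come from ViewConfiguration#getTapTimeout and ViewConfiguration#getScaledTouchSlop), and the class and method names are hypothetical, not APM's.

```java
// Sketch: a MOVE only marks the gesture as a finger slide once its distance
// exceeds the touch slop or its duration exceeds the tap timeout.
public class SwipeDetector {
    private static final long TAP_TIMEOUT_MS = 100;  // cf. ViewConfiguration#getTapTimeout
    private static final float TOUCH_SLOP_PX = 8f;   // cf. ViewConfiguration#getScaledTouchSlop
    private long downTimeMs;
    private float downX, downY;
    private boolean swiping;

    // Call on MotionEvent.ACTION_DOWN.
    public void onDown(long timeMs, float x, float y) {
        downTimeMs = timeMs; downX = x; downY = y; swiping = false;
    }

    // Call on MotionEvent.ACTION_MOVE; returns true once the gesture
    // qualifies as a finger slide (the "flag" described in the text).
    public boolean onMove(long timeMs, float x, float y) {
        if (!swiping) {
            float dx = x - downX, dy = y - downY;
            boolean movedPastSlop = dx * dx + dy * dy > TOUCH_SLOP_PX * TOUCH_SLOP_PX;
            boolean longEnough = timeMs - downTimeMs > TAP_TIMEOUT_MS;
            swiping = movedPastSlop || longEnough;
        }
        return swiping;
    }

    public boolean isSwiping() { return swiping; }
}
```

Once isSwiping() is true, subsequent doFrame callbacks preceded by an OnScrollChangedListener callback are attributed to this finger slide.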

Performance optimization/sliding number identification

We need to call postFrameCallback again after each doFrame callback. Each postFrameCallback registers a VSync (if one is not already registered), and when the VSync arrives it posts a message to the main thread, which puts some pressure on the main thread.

As we all know, the system does not render while the page is idle, so no VSync is registered. Do we need to post when nothing is rendering? No: it is meaningless and can be filtered out. Based on this idea, we optimized the sliding frame-rate calculation.

To reduce unnecessary frame callbacks and registrations, we need to clarify a few issues:

  1. Start point (when to start postFrameCallback): on the first scroll event received (onScrollChanged)
  2. End point (when to stop postFrameCallback): after a finger-slide FPS is calculated, if the next frame does not slide, stop registering the callback for the frame after it.

If you look carefully, you will find that the start of the finger slide can be treated as the start of rendering, and the end of the slide (including the Fling) as the end of rendering. This data is very valuable: we have effectively identified each finger slide, and can report per-slide data such as its duration.

Is this optimization perfect? No. If you look closely at the start of the calculation above, you will see that the first frame of the slide is lost. Because we compute the difference between two doFrame callbacks, even if we know the current frame should be counted, without the previous frame's timestamp we cannot compute the true duration of the frame that started the slide.

Frozen frames

Frozen frames are a type of frame officially defined by Google:

Frozen frames are UI frames that take longer than 700ms to render.

As a special kind of frame, frozen frames should be strongly avoided; Huawei's documentation mentions them as well. Once such frames appear, the page visibly freezes. Therefore, APM also includes this kind of frame in its monitoring and calculates the frozen frame ratio:

Frozen frame ratio = number of frozen frames during sliding / number of frames produced by sliding
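The formula is trivial to express in code; the sketch below uses Google's 700ms definition. The class name and array-based input are illustrative.

```java
// Sketch of the frozen-frame-ratio formula: a frame is "frozen" if it takes
// longer than 700ms to render (Google's definition of a frozen frame).
public class FrozenFrameStats {
    private static final long FROZEN_THRESHOLD_MS = 700;

    // frameDurationsMs: durations of all frames produced while sliding.
    public static double frozenFrameRatio(long[] frameDurationsMs) {
        if (frameDurationsMs.length == 0) return 0;
        int frozen = 0;
        for (long d : frameDurationsMs) {
            if (d > FROZEN_THRESHOLD_MS) frozen++;
        }
        return (double) frozen / frameDurationsMs.length;
    }
}
```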

scrollHitchRate

The concept of scrollHitchRate comes from iOS and describes the proportion of hitch time during a scroll. What does "hitch" mean? A hitch is the amount by which a frame's render time exceeds the standard frame duration.

The calculation formula is shown in the figure:

The numerator is the sum of hitch time over the entire slide, and the denominator is the total slide duration (including the Fling).
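As a sketch under the definition above (names and the 16.6ms standard-frame assumption are illustrative): each frame contributes only its overshoot beyond the standard frame duration to the numerator, while every frame contributes its full duration to the denominator.

```java
// Sketch of scrollHitchRate: total hitch time (per-frame overshoot past the
// standard frame duration) divided by the total slide duration.
public class HitchRate {
    // frameDurationsMs: per-frame durations across the whole slide (incl. Fling).
    // standardFrameMs: the display's ideal frame duration (16.6ms at 60Hz).
    public static double scrollHitchRate(double[] frameDurationsMs, double standardFrameMs) {
        double hitchSum = 0, total = 0;
        for (double d : frameDurationsMs) {
            total += d;
            if (d > standardFrameMs) hitchSum += d - standardFrameMs; // only the overshoot
        }
        return total == 0 ? 0 : hitchSum / total;
    }
}
```

Note that because the standard frame duration adapts to the display's refresh rate, the same scroll scores comparably on 60Hz and high-refresh-rate devices, which is exactly the skew problem FPS had.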

You may ask: why not use FPS? If FPS can already detect these problems, why introduce a hitch rate?

This is because FPS does not suit every situation. For example, when an animation pauses, FPS cannot measure its smoothness; and not all apps aim for 60 FPS or 120 FPS, as some games deliberately run at 30 FPS. For hitch rate, the goal is always zero.

Was scrollHitchRate introduced simply to solve the data skew from high-refresh-rate phones? No. When collecting scrollHitchRate we also implicitly record the number of slides. For example, in Mobile Taobao, a colleague once asked: does the home page get jankier the further it is scrolled down? With this data collected, that question can be answered.

Frame rate main cause analysis

Both sliding frame rate and frozen frames are primarily monitoring data. If you want to analyze the main reason for a low frame rate from the data alone, there is nowhere to start.

In the earlier rendering-process section, we covered the main steps of rendering. If we can monitor each step, then when an abnormal frame appears we can tell which stage the problem mainly occurred in. But we still hoped to avoid intruding on system code the way Matrix does. With this in mind, we found a system API that meets our needs: Window.OnFrameMetricsAvailableListener. Google Firebase also uses this API for frame monitoring, so it is unlikely to have compatibility problems later.

For FrameMetrics, see the developer documentation: developer.android.com/reference/a…

The FrameMetrics data arrives in an asynchronous callback and tells us the duration of each stage of each frame, which fits our monitoring needs very well. But two issues deserve attention:

  • The FrameMetrics API requires Android API level 24; checking our users' device data showed this coverage meets the basic need.
  • If a frame is not processed in time there is a risk of data loss, but the interface reports how many frames were dropped.

Let’s take a closer look at what render stages are defined in the FrameMetrics data:

The table is excerpted from Android 26. Besides the fields mentioned above, there are several useful timestamp fields that enable some novel uses worth exploring together.

Did you notice? It matches the rendering process exactly. Tracing the source shows that registering a listener adds no performance cost: the timestamps recorded in FrameMetrics are collected whether or not a listener is registered, so there is no additional overhead.

First, we define a frame-duration threshold for analysis; frames exceeding it are considered worth attributing. We define: if the time spent in some stage of such a frame exceeds half of the threshold, that stage is the main cause; otherwise, there is no single main cause.
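The attribution rule can be sketched as follows. The stage names mirror FrameMetrics duration constants but are plain strings here, and the class is illustrative, not APM's code.

```java
import java.util.Map;

// Sketch of the main-cause rule: for a frame whose total time exceeds the
// jank threshold, a stage is the "main cause" if it alone consumed more than
// half of the threshold; otherwise no single main cause exists.
public class FrameCauseAnalyzer {
    // stageDurationsMs: per-stage durations of one frame (e.g. "INPUT_HANDLING",
    // "LAYOUT_MEASURE", "DRAW", keyed like FrameMetrics constants).
    // Returns the main-cause stage name, or null if none dominates.
    public static String mainCause(Map<String, Long> stageDurationsMs, long thresholdMs) {
        long total = 0;
        for (long d : stageDurationsMs.values()) total += d;
        if (total <= thresholdMs) return null;            // frame is not abnormal
        for (Map.Entry<String, Long> e : stageDurationsMs.entrySet()) {
            if (e.getValue() > thresholdMs / 2.0) return e.getKey();
        }
        return null;                                      // no single main cause
    }
}
```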

In this way, for a given Activity we can determine whether the low frame rate is caused by a slow main thread, by layout problems making measure & layout slow, or by draw problems, and during performance optimization go straight at the main cause.

Jank frame rate

First of all, let's review how the human eye perceives jank. In principle, higher frame rates produce smoother, more realistic animation. To appear smooth and coherent, the frame rate should be no less than 8 FPS, and the more frames per second, the smoother the animation. The human eye retains an image for roughly 1/24 of a second, which is why films run at 24 FPS. And however high a game's frame rate is, 60 or 120, most people cannot distinguish much beyond 30 frames. A film at only 24 FPS still feels smooth because the interval between frames is a steady 1/24 of a second. But even if a game or our UI reaches 30 or 60 frames per second, if those frames are not evenly distributed within the second, then even when 59 of 60 frames are perfectly smooth, a single frame delayed by more than 1/24 of a second still makes us feel a noticeable stutter.

This is why our UI can scroll smoothly most of the time yet still occasionally feel janky. If a frame interval exceeds 41.6ms (1/24s) we clearly feel the jank; at 1/30s the frame time is 33.3ms, and if a frame is delayed beyond 33.3ms the human eye can easily perceive it. To capture this jank, we need to record something when such frames occur. But simply recording every frame that takes more than 33.3ms has problems. On the one hand, it loses the duration context, making it hard to gauge the severity of the jank (a sustained stretch of jank is obviously far worse than an occasionally dropped frame). On the other hand, because of multiple buffering, such a frame does not always cause a visible dropped frame, so flagging a single frame past a threshold is not necessarily accurate.

Based on these considerations, we use the concept of instantaneous FPS to measure jank: the FPS computed over the small time windows generated during the slide. For example, if a user slides for 500ms, this may produce several instantaneous FPS samples. How is this computed?

  1. During the slide, obtain the time interval of each frame;
  2. Refine the jank interval in windows of roughly 100ms (99.6ms, i.e. 6 frames at 60Hz);
  3. Start recording from a frame whose interval exceeds 33.3ms, taking it as the start of the window;
  4. The end point is reached when the summed frame times from the start reach 99.6ms and the following frame takes less than 17ms (or the last frame is reached); otherwise keep extending the window;
  5. The frame rate over this window is the instantaneous FPS we are looking for.
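The five steps above can be sketched directly in Java; the thresholds follow the text (33.3ms window trigger, 99.6ms minimum span, 17ms "smooth again" check) and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the instantaneous-FPS segmentation: a window opens at a frame
// longer than 33.3ms and closes once it spans at least 99.6ms and the next
// frame is back under 17ms (or the data ends).
public class InstantFps {
    public static List<Integer> compute(double[] frameDurationsMs) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < frameDurationsMs.length) {
            if (frameDurationsMs[i] <= 33.3) { i++; continue; } // step 3: find a janky frame
            double sum = 0;
            int count = 0;
            while (i < frameDurationsMs.length) {
                sum += frameDurationsMs[i];
                count++;
                i++;
                // step 4: window long enough and the next frame is smooth (or no more frames)
                if (sum >= 99.6 && (i >= frameDurationsMs.length || frameDurationsMs[i] < 17)) {
                    break;
                }
            }
            result.add((int) Math.round(count * 1000.0 / sum)); // step 5: window FPS
        }
        return result;
    }
}
```

Running it on the first worked example below (a 69ms frame followed by 16ms and 15ms frames) yields a single window of 3 frames over 100ms, i.e. 30 FPS, matching the text.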

In the figure you can see 3 frames that clearly exceed the threshold. Under the previous statistical method, the total frame time is 1535ms over 83 frames, so the FPS of this interface is 54. The FPS looks quite high and no jank is visible at all: the earlier high-cost frames are averaged out by the subsequent normal frames. So the previous statistical method fails to reflect these problems.

Under the new method, the first instantaneous FPS window starts at the 7th frame, counting at least 99.6ms from that frame: 69+16+15 reaches 100ms over 3 frames, so the FPS is 30. Because it is below 50, this sample is recorded; its longest frame took 69ms.

The second window starts at frame 17: 5 frames over 114ms, an FPS of 43, with a maximum frame interval of 61ms.

The third window starts at frame 26: 98+10=108ms, but the following frame takes 19ms, which exceeds 16.6ms, so it is still included: 3 frames over 127ms, an FPS of 23, with a maximum frame interval of 98ms.

By this statistic there are 3 instantaneous FPS samples in total, 30, 43, and 23, with a maximum frame duration of 98ms.

Jank stack

Using the main thread's Looper Printer to dump the jank stack incurs a performance cost from heavy string concatenation. Android 10 added an Observer to Looper, providing a lossless callback, but it is a hidden API and therefore unavailable. The fallback is to keep posting messages to the main thread, but posting at intervals also puts pressure on the main thread.

Is there a better way? Yes: Choreographer#postFrameCallback itself posts main-thread messages, so if the gap between two callbacks exceeds a threshold, we can treat it as jank. Moreover, this identifies jank specifically during sliding.

When do we dump? We use a watchdog mechanism to dump the jank stack: a background thread posts a delayed task that dumps the main thread's stack; if a single frame exceeds the threshold the dump fires, and if the frame completes within the allotted time the task is cancelled. When stacks are collected, we cluster them, so as to identify the main contradiction and drive alerting.
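The watchdog timing can be sketched with a ScheduledExecutorService; this is a simplified model, not APM's implementation, and the dump action is a stub (real code would capture the main thread's stack, e.g. via Thread#getStackTrace on the main thread object).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch of the watchdog: at frame start a background thread schedules a
// stack dump; if the frame finishes within the threshold, the dump is cancelled.
public class JankWatchdog {
    private final ScheduledExecutorService watcher = Executors.newSingleThreadScheduledExecutor();
    private final long thresholdMs;
    private final Runnable dumpAction;   // stub for "dump the main thread's stack"
    private ScheduledFuture<?> pendingDump;

    public JankWatchdog(long thresholdMs, Runnable dumpAction) {
        this.thresholdMs = thresholdMs;
        this.dumpAction = dumpAction;
    }

    // Call at the start of each frame (e.g. from doFrame).
    public void onFrameStart() {
        pendingDump = watcher.schedule(dumpAction, thresholdMs, TimeUnit.MILLISECONDS);
    }

    // Call when the frame completes in time; cancels the pending dump.
    public void onFrameEnd() {
        if (pendingDump != null) pendingDump.cancel(false);
    }

    public void shutdown() { watcher.shutdown(); }
}
```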

Exploration of the use of frame data

AB and APM are used together

The above mainly explained how we compute the indicators and troubleshoot problems. But for a production metric, what matters most is measuring optimization results. How do we measure optimization? The best approach is A/B testing. APM metric data is connected to the A/B test platform, so performance data is produced alongside each experiment.

The A/B platforms here include the Yihuo platform and the Magic Rabbit 2 platform. Yihuo's metric access uses custom metrics; frame rate is just one of them, alongside data such as startup and page metrics.

Yihuo is Alibaba Group's one-stop A/B experiment service platform, providing a visual operation interface, scientific data analysis, automatic experiment reports, and a one-stop experiment process for each business, driving business growth by validating the best solutions through scientific experiments and real user behavior.

When we optimize page performance, we can directly compare the baseline bucket with the optimization bucket on these metrics, making the improvement in page performance immediately and clearly visible.

Final words

For Mobile Taobao, frame-rate monitoring and jank monitoring are only a small part of performance monitoring, and polishing every detail is crucial. Beyond the A/B platform, this data has been connected with full-link tracing data, user feedback data, and the release performance gateway, supported by backend clustering, alerting, automatic email reports, and a dedicated data platform. Our attitude toward data should not be merely to have it, but to make it comprehensive and robust.

Through round after round of technical iteration, the high availability of Mobile Taobao continues to be improved and rebuilt. We hope that in the future, the client's high-availability data can better support development across all links, prevent degradation of the user experience, and help continuously improve it.

Follow the [Alibaba Mobile Technology] WeChat official account for three mobile technology practices & insights every week!