Preface

On January 16, the UC Technology Committee, together with the Juejin and Google developer communities, held the first Flutter Engine technology salon of 2021. More than 150 people applied to attend, but because on-site attendance was limited, only 50 could be accommodated in person; more than 2,000 others watched the live broadcast. During the event, five technical experts from Alibaba Group shared their Flutter-based R&D systems, their development and optimization experience, and their dynamic solutions, as well as the advantages and new features of Hummer, UC’s customized Flutter enhancement engine.

The first session was “Building UC Mobile Technology Center Based on Flutter”, presented by Hui Hong, head of the UC/Quark client and head of the UC Mobile Technology Center.

The second session was “Hummer (Flutter Custom Engine) Optimization and Systematic Construction Exploration”, presented by Lu Long, technical director of the UC Flutter Hummer engine.

The third session was “Mobile Middleware Technology System and Audio and Video Technology Based on Flutter”, presented by Li Yuan, video technology expert at UC Browser.

The fourth session covered Flutter experience optimization in practice and was presented by Yuncong, a technical expert of Flutter mobile group. This article contains about 5,600 words and 36 pictures and takes about 30 minutes to read.

Talk content

Good afternoon, everyone, and welcome to this talk. I am very glad to have been invited by the UC team to take part.

Today’s talk covers the following:

First, what challenges does the industry face with Flutter in terms of performance experience?

Second, how Xianyu (Idle Fish) optimized fluency and memory, covering our practice from metrics and tool building, through problem discovery and localization, to optimization directions and methods.

Finally, a look ahead at Xianyu’s future work on fluency and memory.

What are the challenges facing Flutter in the industry?

Flutter has always been known for high performance; the Flutter Gallery app shows how smooth it can be. In complex enterprise business scenarios, however, Flutter is not as smooth as native apps.

Dart uses a single-threaded model similar to JS, so the multi-threaded optimizations common in native app development no longer apply. Flutter’s three well-known trees (Widget, Element and RenderObject) protect performance to a certain extent, but business developers easily overlook that diffing Widget updates down to the RenderObject has a performance cost. The Dart language has a Java-like garbage collection mechanism based on reachability analysis, but business developers also easily overlook Flutter’s External Memory and the C++ part of the engine layer.

In terms of toolchain, smoothness detection through Android ADB does not work for Flutter pages, and average FPS cannot fully reflect how smooth scrolling feels to the user, so metrics and detection tools that match the user’s perception are still missing. Flutter Dart also lacks components for capturing jank stacks, both online and offline. As for memory leak detection, the major companies are still actively building solutions; Kuaishou and Alibaba, for example, each have their own.

In terms of optimization direction, native app developers can move most non-UI logic to background threads; in Flutter, that multi-threaded approach no longer applies. In complex enterprise scenarios, we fixed common performance problems following the official recommendations, and used official and self-built tools to find and optimize slow functions. Even so, smoothness still fell well short of expectations, and Flutter long lists proved difficult to optimize. On the memory side, the lack of detection tools makes it hard to detect and locate memory leaks and to reduce peak memory.


How do we optimize fluency and memory in the face of these challenges? We will introduce our approach in three steps: metrics and tools, problem localization, and optimization direction.

Fluency

Metrics and tool building

The picture on the right shows the smoothness of the Xianyu product details page on a high-end device: even there, stutter and dropped frames are obvious. The video on the left was shown at Apple’s WWDC in 2018. The video on the left clearly feels more sluggish, yet its measured FPS is higher than that on the right. So FPS, even though it is the most authoritative metric in the industry, cannot fully reflect what the user perceives. For this reason, we use two fluency metrics: average FPS and the large-jank count (the average number of stalls of 49 ms or more per second).
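As a rough illustration of the two metrics, the sketch below derives them from a list of frame durations. The exact formula used by the Xianyu tool is not given in the talk, so the definitions here (frames divided by elapsed time, and frames of 49 ms or longer divided by elapsed time) are an assumed reading, and `fluency_metrics` and the sample data are hypothetical.

```python
from typing import List, Tuple

JANK_THRESHOLD_MS = 49.0  # threshold quoted in the talk


def fluency_metrics(frame_ms: List[float]) -> Tuple[float, float]:
    """Return (average FPS, large janks per second) for a list of frame durations in ms."""
    total_s = sum(frame_ms) / 1000.0
    if total_s == 0:
        return 0.0, 0.0
    avg_fps = len(frame_ms) / total_s
    large_janks = sum(1 for d in frame_ms if d >= JANK_THRESHOLD_MS)
    return avg_fps, large_janks / total_s


# Hypothetical sample: 58 smooth frames and a single 50 ms stall.
samples = [16.0] * 58 + [50.0]
fps, janks_per_second = fluency_metrics(samples)
print(f"average FPS = {fps:.1f}, large janks per second = {janks_per_second:.2f}")
```

In this sample a single long frame barely moves the average FPS, while the large-jank count registers it directly, which is the reason for tracking both numbers.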

With metrics defined, we need monitoring tools to collect the data. We wanted the fluency tool to be non-invasive, to support multiple platforms (native, Flutter, Weex, H5, mini programs), to be multi-dimensional (multiple kinds of performance data), and to scroll automatically (eliminating the error of manual scrolling). The Xianyu team developed a standalone app on the Android platform; see the floating window in the picture on the right. After tapping “Start”, the tool automatically measures the fluency of the app under the floating window, here giving an average FPS of 57 and a large-jank count of 0.306. The frame distribution shows the raw screen statistics: the screen was held for 16.6 ms 371 times, for 16.6 ms × 2 six times, and for 16.6 ms × 3 once.

The fluency tool is based on screen recording data: we register a screen recording service with the system, then read the compressed screen data from VirtualDisplay every 16.6 ms and compute a hash of the screen. If consecutive hashes are identical, the screen has not changed and frames have been dropped.
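The hash comparison can be sketched as follows. This is a simplified model rather than the Xianyu tool itself: the screen samples are plain byte strings instead of VirtualDisplay buffers, SHA-256 is an arbitrary choice of hash, and `frame_distribution` is a hypothetical helper name.

```python
import hashlib
from collections import Counter
from itertools import groupby
from typing import Iterable


def frame_distribution(samples: Iterable[bytes]) -> Counter:
    """Map run length -> number of runs, where a run is a streak of identical screen hashes.

    With one sample every 16.6 ms, a run of length n means the same picture
    stayed on screen for n * 16.6 ms.
    """
    digests = (hashlib.sha256(sample).digest() for sample in samples)
    return Counter(len(list(group)) for _, group in groupby(digests))


# Hypothetical samples taken while the list scrolls: "b" is held for three
# sampling intervals and "d" for two.
screens = [b"a", b"b", b"b", b"b", b"c", b"d", b"d"]
distribution = frame_distribution(screens)
for run_length in sorted(distribution):
    print(f"16.6 ms x {run_length}: {distribution[run_length]} time(s)")
```

The comparison only indicates dropped frames while the content is expected to change on every frame, which is presumably one reason the tool drives the scrolling automatically.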

Problem discovery and localization

With metrics and detection tools, we know the current state of the app’s fluency. How do we optimize it? First, we need to find and locate unreasonable logic on the business side. Here we recommend several powerful tools, namely Flutter Performance, DevTools and debug flags, and show how they were applied in the Xianyu project.

On the left is the Xianyu details page, with the FDDetailBottomBar at the bottom. When we scroll the list, Flutter Performance (figure on the right) shows the FDDetailBottomBar being rebuilt constantly even though the view itself does not change. We therefore need to optimize the update logic of the bottom view to avoid useless refreshes.

Using the DevTools Timeline, we can see many ClipRRectLayer and ClipRectLayer entries on the render thread, which lowers rendering performance. The ClipRRectLayer is there because the card has rounded corners, but where does the ClipRectLayer come from?

Using the debug flags, set debugDisableClipLayers to true and look at the view again. The aspect ratio of the image turns out to differ from the aspect ratio of the Widget, so a ClipRectLayer is needed to trim the excess when the image is displayed.

Once the problem is located, the clip layers can be removed by having the Native side force the aspect ratio when the image is requested and round the corners of the image data before it is handed to the Flutter Widget. The result can be seen in the timeline diagram.

Examples of image requests that guarantee the aspect ratio:

gw.alicdn.com/bao/uploade…

gw.alicdn.com/bao/uploade…

In addition to the official tools, Xianyu also has its own tools to help us find problems.

FishRedux is a data-driven, assembled application framework for Flutter. In the debug environment, FishRedux prints performance logs of the time spent consuming each action. By modifying the source code, non-debug packages can output these logs too, so problems can be found quickly. As the figure above shows, a scroll event is broadcast while the Xianyu details page scrolls, and one of the scroll handlers takes 1.933 ms. Based on the business scenario, we eliminated needless broadcasts and stopped some components from needlessly consuming them.

flutter_trace_canary is our self-built tool for detecting Flutter jank and collecting stacks. The official DevTools can already help analyze performance problems and method timings at the statistical level, so why did Xianyu build its own? There are several reasons:

  1. Offline, to help find methods that do not take long on average (and are therefore hard to spot in DevTools) but occasionally jitter.
  2. In offline automated test scenarios, to detect jank during scrolling and collect the relevant information automatically.
  3. Online, to detect slow functions and collect their stacks.

The principle of flutter_trace_canary is shown in the figure above. A collection thread in the C layer sends a signal at a fixed interval, and the call stack of the Dart UI thread is captured each time the signal is received. If the same call stack appears several times in a row, a stall has occurred, and that stack is the stall stack.
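The aggregation step can be sketched as below. This models only the "same stack several times in a row" rule; the signal-based sampling in the C layer is not reproduced, and `find_stalls`, the `min_repeats` threshold and the sample stacks are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Stack = Tuple[str, ...]  # one sampled call stack, outermost frame first


def find_stalls(samples: Sequence[Stack], min_repeats: int = 3) -> List[Tuple[Stack, int]]:
    """Return (stack, consecutive sample count) for every streak of at least min_repeats."""
    stalls = []
    for stack, group in groupby(samples):
        count = len(list(group))
        if count >= min_repeats:
            stalls.append((stack, count))
    return stalls


# Hypothetical samples taken at a fixed interval from the UI thread.
idle: Stack = ("drawFrame",)
slow: Stack = ("drawFrame", "build", "expensiveLookup")
sampled = [idle, slow, slow, slow, slow, idle, ("drawFrame", "layout")]

stalls = find_stalls(sampled)
for stack, count in stalls:
    print(" > ".join(stack), f"seen {count} samples in a row")
```

With a fixed sampling interval, the streak length multiplied by the interval gives a lower bound on how long the UI thread stayed in that stack.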

During actual optimization, we set the signal interval to 1 ms. flutter_trace_canary showed that the FrameFpsRecorder.getScrollOffset method, which the Flutter high-availability SDK calls on every frame, was expensive: as shown in the figure above, the method occupies more than one unit of length (one unit represents 1 ms). Looking at the SDK’s implementation, we found that on every frame callback it recursively traverses the tree to find the scrolling view and computes the scroll distance in order to decide whether the page is scrolling, because FPS statistics are enabled only while scrolling. Having located the problem, we added caching to the high-availability SDK to avoid the wasted lookups.

Optimization directions and methods

On the fluency side, using official and self-built tools, we found many performance problems caused by the business code. How should we optimize fluency? We divided the optimization directions as follows:

  • Task optimization

      • Reduce cost by removing useless computation, optimizing algorithms, and so on

      • This includes the cases above, where problems were found and located with tools and then fixed

  • User response priority

      • If the update cannot be displayed within one frame, split the logic across multiple frames so that the user is served first and sees feedback on screen

      • Scroll jump optimization

Flutter list controls such as SliverList and SliverGrid divide their content into the visible area and the cache extent (cacheExtent) area. An Element is destroyed when it scrolls out of these areas; there is no reuse mechanism like that of RecyclerView or UITableView. To add one, we built a two-level mapping: index → key and key → elements. In destroyChild, we look up the key for the index and put the element about to be destroyed into the elements list for that key. When createChild needs an element, it looks for an available one in this two-level map and reuses it.
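The two-level mapping can be illustrated with a small pool, shown here outside Flutter. It is a sketch of the idea only: `ElementPool`, `key_of_index` and the "banner"/"card" keys are hypothetical, and a real implementation would also have to rebind the reused element to its new data.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ElementPool:
    """Reuse pool built from two mappings: index -> key, and key -> idle elements."""

    def __init__(self, key_of_index: Callable[[int], str]):
        self._key_of_index = key_of_index                          # first level
        self._idle: Dict[str, List[object]] = defaultdict(list)    # second level

    def destroy_child(self, index: int, element: object) -> None:
        # Instead of discarding the element, park it under its key.
        self._idle[self._key_of_index(index)].append(element)

    def create_child(self, index: int, build: Callable[[int], object]) -> object:
        # Prefer an idle element with the same key; build a new one otherwise.
        bucket = self._idle[self._key_of_index(index)]
        return bucket.pop() if bucket else build(index)


built = []  # records which indexes actually required a fresh element


def build(index: int) -> object:
    built.append(index)
    return {"created_for": index}


# Assumption for the example: item 0 is a banner, every other item is a card.
pool = ElementPool(lambda i: "banner" if i == 0 else "card")
first = pool.create_child(1, build)    # nothing idle yet, so a card is built
pool.destroy_child(1, first)           # item 1 scrolls out of the cache area
second = pool.create_child(7, build)   # item 7 has the same key and reuses it
print(second is first, built)
```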

In Xianyu’s business scenarios, we want users to browse content without interruption. To achieve this, loadMore is triggered while the list is scrolling, and loadMore in turn refreshes the list control.

This is a typical example of the cost of diffing Widget updates down to RenderObjects, mentioned earlier among the industry challenges: because the widgets are discarded and recreated, the update pass runs the full update or inflateWidget logic for every one of them. And because widgets are designed to be composed and nested, the recursive traversal is not cheap.

Xianyu developed its own PowerScrollView. On loadMore, we keep the existing widgets, so that when the old widgets are diffed against the RenderObjects, the only cost is the call overhead of updateSlotForChild.

An aside: the official three-tree design treats widgets as lightweight, simple configuration that is not directly tied to layout or rendering and can therefore be created frequently and repeatedly. In extreme performance optimization, however, and especially for fluency, all computation must finish within 16.6 ms, so this overhead cannot be ignored.

After the optimizations above, we found that the view still stuttered on high-end Android devices. The reason is that the content of the Widget is itself fairly complex, and adding the DX component for dynamic capability makes it more complex still. An overly complex Widget is hard to render within a single frame, so we can break the Widget apart and render only part of it in each frame, letting the list scroll first so that the user gets a response.

We split the Widget into one skeleton Widget and two card Widgets, and then split the FXImage out of each card Widget. The skeleton Widget responds immediately in the current frame. The two card Widgets go into a high-priority task queue, which guarantees each of them a render frame to itself. The remaining Widget builds go into a low-priority task queue, from which several tasks can be taken per frame; the number can be tuned to the performance of the device model.

With the delayed frame build solution, we split one large widget across 4 frames and reduced the maximum render time from 18-19 ms to 8-9 ms. With that headroom, we can also perform better on non-high-end devices.
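The scheduling policy described above can be simulated in a few lines. This is a sketch under stated assumptions, not the Xianyu implementation: `FrameScheduler`, `low_priority_batch` and the task names are hypothetical, and each "build" is just a function that returns its own name.

```python
from collections import deque
from typing import Callable, Deque, List


class FrameScheduler:
    """One high-priority task gets a frame to itself; low-priority tasks are batched."""

    def __init__(self, low_priority_batch: int = 2):
        self.high: Deque[Callable[[], str]] = deque()
        self.low: Deque[Callable[[], str]] = deque()
        self.low_priority_batch = low_priority_batch  # tune per device model

    def run_frame(self) -> List[str]:
        """Run the deferred work for a single frame and return what was built."""
        if self.high:
            return [self.high.popleft()()]
        batch = min(self.low_priority_batch, len(self.low))
        return [self.low.popleft()() for _ in range(batch)]


def task(name: str) -> Callable[[], str]:
    return lambda: name


scheduler = FrameScheduler(low_priority_batch=2)
frames = [["skeleton"]]  # the skeleton is built immediately in the current frame
scheduler.high.extend([task("card A"), task("card B")])
scheduler.low.extend([task("image A"), task("image B")])
while scheduler.high or scheduler.low:
    frames.append(scheduler.run_frame())

for number, built in enumerate(frames, start=1):
    print(f"frame {number}: {', '.join(built)}")
```

With these inputs the work spreads over four frames, so no single frame has to build the whole item.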

After the optimizations above, smoothness improved greatly. On high-end devices the numbers are close to those of a Native page, yet the Flutter page still felt noticeably more janky. We deliberately introduced a small stall when an item was created and plotted offset against time. Under this small stall the offset in RecyclerView did not jump, whereas the offset curve of the Flutter page did. This is understandable: the Flutter list jumps after the stall, which makes the jank feel worse.

Looking at the ClampingScrollSimulation source code, we can see that Flutter computes the offset from a d/t (distance over time) curve. When Δt doubles, the offset Δd roughly doubles as well, and the screen jumps. For the small-stall scenario, we changed the original d/t curve to a v/t (velocity over time) curve and compute the offset by accumulation.
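The difference between the two approaches can be seen in a toy simulation. Everything numeric here is an assumption: the exponential decay curve stands in for the real ClampingScrollSimulation curve, `V0` and `K` are made-up constants, and capping each accumulated step at one nominal frame is only one possible reading of "accumulation".

```python
import math
from typing import List

V0 = 3000.0     # assumed initial fling velocity, px/s
K = 2.0         # assumed friction coefficient, 1/s
FRAME = 1 / 60  # nominal frame interval, s


def position(t: float) -> float:
    """Stand-in d/t curve: distance travelled after t seconds."""
    return V0 / K * (1 - math.exp(-K * t))


def velocity(t: float) -> float:
    """Matching v/t curve: the derivative of position()."""
    return V0 * math.exp(-K * t)


def deltas_from_position(frame_times: List[float]) -> List[float]:
    """d/t approach: read the offset off the curve at wall-clock time."""
    deltas, t, previous = [], 0.0, 0.0
    for dt in frame_times:
        t += dt
        current = position(t)
        deltas.append(current - previous)
        previous = current
    return deltas


def deltas_from_velocity(frame_times: List[float]) -> List[float]:
    """v/t approach: accumulate v * dt, with dt capped at one nominal frame."""
    deltas, t = [], 0.0
    for dt in frame_times:
        step = min(dt, FRAME)
        deltas.append(velocity(t) * step)
        t += step
    return deltas


frame_times = [FRAME, FRAME, 2 * FRAME, FRAME]  # the third frame is a small stall
by_position = deltas_from_position(frame_times)
by_velocity = deltas_from_velocity(frame_times)
print("d/t deltas:", [round(d, 1) for d in by_position])
print("v/t deltas:", [round(d, 1) for d in by_velocity])
```

In the d/t version the stalled frame moves the list almost twice as far as its neighbours, while the accumulated version keeps the per-frame movement decreasing smoothly, at the cost of the fling lasting slightly longer.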

After the optimization, the offset/time curve is relatively smooth even under the deliberate small stall, and small stalls in the list are much less noticeable, especially on high-resolution models.

Optimization results

After the optimizations above, the fluency of the Xianyu details page and search results page improved significantly. Green is the curve after optimization; the further right the curve, the better the FPS. Measuring the details page offline with our self-built fluency tool, FPS rose by 3 points, and the large-jank count was halved on low-end phones and is close to 0 on high-end phones.

Memory optimization

Problem discovery and localization

The metrics for memory optimization, namely memory leaks and peak memory, will not be described here. For memory leaks, we used our customized DevTools to explore memory based on Layers, and the Observatory to locate the reference path of a leak.

A Widget update is applied to the RenderObject, which builds the Layer tree; a corresponding number of layers are created on the C++ side and submitted to Skia for rendering. Dart objects are reclaimed by garbage collection, while the C++ side is reclaimed by reference counting. The C++ side holds the pointer through a WeakPersistentHandle: when the corresponding Dart object is collected, the reference count on the C++ side is decremented, and when it reaches 0 the C++ object is released. So in theory the lifetimes of a Layer’s Dart object and C++ object are linked, and a C++ Layer that has not been released indicates that the Dart object is leaking. Compare LeakCanary on Android: after an Activity exits, it holds the Activity through a weak reference and checks the ReferenceQueue to see whether the Activity has leaked. Similarly, we record the number of layers on the C++ side (both the number of objects in memory and the number in use) when the Flutter Navigator pushes a page. When the page exits, we trigger a GC and record the number of layers again a few seconds later, then compare the two counts to determine whether a leak has occurred.
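The count-before, count-after comparison can be mimicked in plain Python, with a weak set standing in for the engine-side layer bookkeeping. This is an analogy only: the `Layer` class, `open_page` and the deliberately leaked reference are hypothetical, and Python’s collector behaves differently from the Dart VM.

```python
import gc
import weakref
from typing import List


class Layer:
    """Stand-in for an engine layer; live instances are tracked without being kept alive."""

    _live: "weakref.WeakSet[Layer]" = weakref.WeakSet()

    def __init__(self) -> None:
        Layer._live.add(self)

    @classmethod
    def live_count(cls) -> int:
        gc.collect()  # ask for a collection before counting, as the tool does on page exit
        return len(cls._live)


def open_page(layer_count: int) -> List[Layer]:
    return [Layer() for _ in range(layer_count)]


baseline = Layer.live_count()   # recorded when the page is pushed
page = open_page(5)
stray_reference = [page[0]]     # simulated bug: something outlives the page
del page                        # the page is popped
leaked = Layer.live_count() - baseline
print("layers still alive after exit:", leaked)
```

A non-zero difference only says that something leaked; finding the reference path still needs a tool such as the Observatory, as described next.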

In the figure, the blue curve is the number of layers in use and the yellow curve is the number of layers in memory. In the last segment, the two counts agree before the page is entered; after the page exits, the number of layers in memory is greater than the number in use, which indicates a memory leak.

Knowing that there is a leak, how do we locate it? One trick is to put a custom Widget in the scene and then search for its name directly in the Observatory, which quickly yields the reference path of the leaked object.

Optimization directions and methods

Once memory leaks are found they need to be fixed, which is an important part of memory optimization. In addition, the Xianyu team reduces peak memory by using image textures and memory reuse.

Flutter’s built-in Image Widget creates a RenderImage object, which holds an Image object, which in turn holds the SkImage object on the C++ side. Dart memory and External memory are both governed by the Dart VM’s GC, and GC is triggered with a lag, driven first of all by the VM’s memory watermark statistics. Some of the getAllocationSize implementations that report memory sizes are only approximate; the EngineLayer implementation is even hard-coded to 3000.

As a result, when a Flutter list scrolls rapidly, neither the Dart objects nor External memory are released in time because of the GC lag, which produces a pronounced memory peak.

The Xianyu team uses an external texture solution: it obtains a TextureId from the Native layer and displays the image texture with the Texture Widget. One advantage is that when an IFImage is disposed, we can destroy the texture memory ourselves without relying on GC. When the list scrolls quickly, the resulting memory peak no longer includes the SkImage portion, which accounted for the largest share of memory.

With external textures, our image display flow is as shown above. If multiple FxImages have the same URL, this scheme generates multiple TextureIds, occupying memory for multiple textures.

To reuse texture memory, we added reference counting for texture IDs. When a URL is passed in and the ImageCache holds no texture ID for it, a texture ID is generated through the process on the left and its reference count is set to 1. If the texture ID is already cached in the ImageCache, the reference count is incremented and the cached texture ID is returned. When a Texture Widget is destroyed, the reference count is decremented, and when it reaches 0 the texture memory is destroyed.
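The reference counting rule can be written down as a small cache. This is a minimal sketch, not the Xianyu ImageCache: `TextureCache`, `acquire`, `release` and the create/destroy callbacks are hypothetical names, and real texture creation happens on the Native side.

```python
from typing import Callable, Dict, List


class TextureCache:
    """Share one texture ID per URL and free it when the last user releases it."""

    def __init__(self, create: Callable[[str], int], destroy: Callable[[int], None]):
        self._create = create
        self._destroy = destroy
        self._texture_ids: Dict[str, int] = {}
        self._ref_counts: Dict[str, int] = {}

    def acquire(self, url: str) -> int:
        if url not in self._texture_ids:
            # Cache miss: create the texture and start counting at 1.
            self._texture_ids[url] = self._create(url)
            self._ref_counts[url] = 1
        else:
            # Cache hit: one more widget shares the existing texture.
            self._ref_counts[url] += 1
        return self._texture_ids[url]

    def release(self, url: str) -> None:
        self._ref_counts[url] -= 1
        if self._ref_counts[url] == 0:
            # Last user gone: free the texture right away, without waiting for GC.
            del self._ref_counts[url]
            self._destroy(self._texture_ids.pop(url))


created: List[str] = []
destroyed: List[int] = []


def create_texture(url: str) -> int:
    created.append(url)
    return len(created)  # fake texture ID


cache = TextureCache(create_texture, destroyed.append)
url = "https://example.com/item.png"  # placeholder URL
first_id = cache.acquire(url)
second_id = cache.acquire(url)   # same URL, so the same texture is shared
cache.release(url)               # one widget is destroyed, texture stays
cache.release(url)               # last widget is destroyed, texture is freed
print(first_id == second_id, created, destroyed)
```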

Optimization results

Excluding the external texture optimization, the memory improvements on Xianyu pages are shown above. A leak in the publish details edit page was also fixed, reducing memory by 10-30 MB.

Future outlook

That concludes Xianyu’s practice in fluency and memory optimization, which achieved a good performance improvement. There is still room to improve. First, the delayed frame build scheme requires business developers to split frames manually according to the business scenario. We hope to build automatic frame splitting similar to the React Fiber framework, dividing the work automatically according to device performance.

Second, we implemented a Flutter jank detection tool based on stack aggregation; this capability will later be applied online to build an online jank monitoring system. Third, we implemented a layer-based memory leak detection tool and will continue by building the reference chain for memory leaks: C++ Layer → Dart Layer → RenderObject → Element → BuildContext.

Finally, we will visualize engine-layer memory usage in our own DevTools, so that business developers pay more attention to External Memory usage.

That’s all. Thank you for listening.
