The Metro-code-split tool described in this article is now open source. It supports RN bundle splitting and dynamic import; stars are welcome.

https://github.com/wuba/metro…

The following is the text.

Today I would like to share with you the topic of “58 RN page second-open solutions and practice” (second-open meaning the page opens within one second). Let me introduce myself first. My name is Jiang Hongwei. I joined 58 in 2015 and began exploring RN in 2016. In recent years we have promoted many RN performance solutions, and while rolling them out, one question kept coming up:

Performance tuning takes time and effort, but what does the business gain from it?

The first time I was asked this, I had no good answer. With the question in mind, I ran an experiment to measure the relationship between first-screen time and visit churn rate, and noticed an interesting pattern.

For every 1-second decrease in first-screen time, the visit churn rate decreases by 6.9%.

In retrospect, was that 6.9% really as good as it sounded?

With the 6.9% revenue figure in hand, we were able to get a couple of business teams on board. Overall, the predicted churn benefit and the actual churn benefit were similar, but differentiated: pages that already performed well gained less than predicted, while pages that performed poorly gained more than predicted. This is easy to understand: a page with good performance already has a very low churn rate, so there is little room left to optimize.

Once we knew the benefits of performance optimization would be differentiated, it was natural to focus on the pages that had not yet been optimized. We designed several metrics, including churn rate, first-screen time, and second-open revenue.

Churn rate and first-screen time are ex post indicators. Second-open revenue is an ex ante indicator: it tells you how much your churn rate would drop if your page opened within a second. We want this set of metrics to drive performance optimization on the business side, while also providing low-cost or even zero-cost optimization solutions to reduce the businesses' optimization costs.
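As a back-of-envelope illustration of that ex ante calculation, here is a sketch that assumes the measured linear relationship (6.9% less churn per second saved) extrapolates; the function and the 1-second threshold are my own illustration, not 58's actual formula.

```ts
const CHURN_DROP_PER_SECOND = 0.069; // from the experiment above

// Expected churn-rate reduction if this page reached a 1-second first screen.
function secondOpenRevenue(firstScreenMs: number): number {
  const savedSeconds = Math.max(0, (firstScreenMs - 1000) / 1000);
  return savedSeconds * CHURN_DROP_PER_SECOND;
}

secondOpenRevenue(2280); // ≈ 0.088, i.e. an expected 8.8-point churn drop
```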

Metrics drive the business, the business chooses solutions, and solutions improve revenue: this is the revenue-driven model we envision.

The first-screen time collection scheme

This talk is organized around those solutions and metrics. Let's start with the metrics. The most important one is first-screen time: once first-screen time is measured, churn rate and second-open revenue can be derived from it. This talk is therefore divided into three parts.

  • Part one covers the first-screen time collection scheme.
  • Part two covers the performance optimization schemes.
  • Part three is a summary and outlook.

Let’s take a look at the loading process of a page, which has roughly five stages:

  1. 0 ms: User enters
  2. 410 ms: First Contentful Paint (FCP)
  3. 668 ms: Business component didUpdate
  4. 784 ms: Largest Contentful Paint (LCP)
  5. 928 ms: Viewport fully loaded

First-screen time could be defined as any of points 2, 3, 4, or 5. Different definitions give very different values, so we had to choose one metric to define the first-screen time.

We chose LCP as our first-screen time. Why?

  • First, because LCP, the Largest Contentful Paint, marks the moment the page's main elements are actually displayed.
  • Second, because LCP can be collected non-intrusively, with no manual instrumentation required.
  • Third, because LCP is a W3C draft standard, which matters: tell people your first-screen metric is LCP and they understand immediately, with no further explanation needed.

To help you understand the implementation of the LCP algorithm, let me first lay some groundwork.

Simply put, LCP is the render time of the largest element you can see. But there is a catch: the largest element in our second screenshot is not the same as the largest element in our fifth. Different elements render at different times, so the LCP value changes; in other words, a page has multiple candidate LCP values. Which one should be reported? The final, converged LCP, that is, the LCP value at the moment the viewport has fully loaded.

LCP is a Web standard with no implementation in RN. So how did we implement it?

Generally speaking, the implementation can be roughly divided into 5 steps:

  1. When the user enters, the Native thread records the Start timestamp.
  2. The Native thread injects the Start timestamp into the JS context.
  3. The JS thread listens for layout events of the rendered elements on the page.
  4. While the page renders, the JS thread keeps recomputing and updating the LCP value.
  5. The JS thread determines the End timestamp and reports the final LCP value.

At this point, the final reported LCP = End Time - Start Time.

The hard part is converging the LCP, that is, deciding that the viewport has fully loaded. The rule we used: the viewport is loaded when all elements have loaded, including the bottom element. An element has a call cycle: render first, then layout. An element that has only called render has not finished loading; an element that has called both render and layout has. Once we can tell whether a single element has loaded, we can tell whether the whole viewport has loaded. A sketch of the idea follows.
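To make the mechanics concrete, here is a minimal sketch of onLayout-based LCP tracking. It assumes Native has injected the Start timestamp as a global (step 2 above); all names here are illustrative, not 58's actual code.

```ts
import { LayoutChangeEvent } from 'react-native';

// Assumed: Native injected this timestamp into the JS context.
const startTime: number = (global as any).nativeStartTime ?? Date.now();

let lcp = { area: 0, time: 0 };

// Attach as onLayout to tracked elements; layout firing means "loaded".
export function trackLayout(event: LayoutChangeEvent): void {
  const { width, height } = event.nativeEvent.layout;
  const area = width * height;
  if (area > lcp.area) {
    // A larger element finished layout: update the candidate LCP.
    lcp = { area, time: Date.now() - startTime };
  }
}

// Call once the viewport is judged fully loaded (all elements laid out).
export function reportLCP(): void {
  console.log(`LCP: ${lcp.time} ms`);
}
```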

Performance optimization schemes

Before diving into the specific schemes, let me outline our overall approach to performance optimization.

Before doing any performance optimization, first analyze the performance structure, then find the bottlenecks, and only then design specific optimization schemes for those bottlenecks.

The performance structure of an RN application can be divided broadly into two parts, the Native part and the JS part, or more finely into six parts. Here is the time breakdown of an unoptimized, complex, dynamically updated RN application:

  1. Version request: 200 ms
  2. Resource download: 470 ms
  3. Native initialization: 350 ms
  4. JS initialization: 380 ms
  5. Business request: 420 ms
  6. Business render: 460 ms

Generally speaking, these six stages (2280 ms in total) group into three bottlenecks:

  1. Dynamic update bottleneck (stages 1 and 2), 29% of the total.
  2. Initialization bottleneck (stages 3 and 4), 32%.
  3. Business-time bottleneck (stages 5 and 6), 39%.

Bottleneck 1: Dynamic update

Internet products rely on rapid trial and error, which requires fast business iteration, which in turn requires dynamic updates. But a dynamic update means sending a request, and sending a request is slow; that is how the Web works. If resources are built in, as with Native, performance is much better, but then how do you update dynamically?

Dynamic updates and performance seem to be at odds. Is there a way to get both?

The solution we first thought of: improve page performance with built-in resources, and stay up to date with silent updates.

When the user enters for the first time, no request is needed: the built-in resources are already there, so the page renders directly. In parallel, the Native thread performs a silent update, asking the server whether a newer version exists; if so, it downloads the bundle and updates the cache. The next time the user enters, the previously cached resources render the page directly while another silent update runs in parallel. And so on: on every entry the page renders with no blocking request.

One small detail matters when designing silent updates: each entry uses the last cached resources, not the latest resources online. There is therefore a risk that a badly bugged version gets cached on users' devices and never replaced. For this we designed a forced-update mechanism: after a silent update succeeds, the Native thread notifies the JS thread, and the business decides, case by case, whether to force an update to the latest version. A sketch of that hook follows.
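Here is a hypothetical sketch of the JS side of that forced-update hook; the event name, payload, and native module are illustrative assumptions, not 58's actual API.

```ts
import { DeviceEventEmitter, NativeModules } from 'react-native';

// Assumed: Native emits this event after a silent update succeeds.
DeviceEventEmitter.addListener('silentUpdateDone', ({ version, critical }) => {
  console.log(`silently updated to ${version}`);
  // Business policy: force a reload only when the new version fixes a
  // severe bug; otherwise the cached bundle keeps serving this session.
  if (critical) {
    NativeModules.BundleManager.reloadWithLatest(); // hypothetical native call
  }
});
```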

The built-in + silent update scheme also has some disadvantages:

  1. It increases the app's size. Super apps are already very large, and adding more volume is hard to justify.
  2. New versions reach users slowly. Coverage of a new version 72 hours after release is around 60%, low compared with the Web.
  3. Version fragmentation is severe. Multiple built-in versions times multiple dynamic updates leads to fragmentation and drives up maintenance costs.

So we made some improvements.

Resource preloading replaces built-in resources. This largely avoids the package-size, coverage, and fragmentation problems. Silent updates stay in place to replace versions that turn out to be buggy.

Resource preloading as a topic has been discussed to death, so I will only analyze it from the perspective of "rights": who gets to trigger the preload.

Who should hold the right to preload, the RN framework or the specific business? Give it to the framework and the framework can preload the resources of every page, but that is obviously wasteful: a platform-level app hosts dozens or even hundreds of RN applications, and most preloaded resources are never used. Give it to the business and every business has to wire up its own preloading one by one, which is tedious.

Information confers rights: the right should go to whoever holds the information. Initially the framework has no useful information, but the business knows from its data the probability of jumping to a specific page, so the right to call preload should go to the business. Once a user has already used an RN application, the framework has that information too, so the right shifts to the framework, which can pre-request versions right after the app launches. A sketch of the two-sided split follows.
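The division of rights might look like the following sketch; the module and function names are hypothetical illustrations, not an actual API.

```ts
// Hypothetical API surface, for illustration only.
declare const RNResources: {
  preload(app: string): void;
  prefetchVersions(apps: string[]): void;
};
declare function getRecentlyUsedApps(): string[];

// Business side: it knows, e.g., that many list-page users open the
// detail page next, so the business triggers that preload.
RNResources.preload('jobDetail');

// Framework side: for RN apps the user has opened before, the framework
// pre-requests version info right after app launch.
RNResources.prefetchVersions(getRecentlyUsedApps());
```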

To remove the dynamic update bottleneck, we combined resource preloading with silent updates. The time dropped from the unoptimized 2280 ms to 1610 ms, a 29% decrease.

Bottleneck 2: Framework initialization

First, let’s examine why framework initialization is slow.

The JS thread and the Native thread communicate asynchronously, and every message is serialized and deserialized as it crosses the Bridge. Before any communication, the JS and Native threads are unaware of each other's existence because they live in different contexts. Since Native cannot know which NativeModules JS will use, Native has to initialize all NativeModules rather than initializing on demand, and that is why initialization is slow.

The new RN architecture plans to replace asynchronous Bridge communication with synchronous JSI communication and initialize modules on demand. However, on-demand initialization had not landed yet, so we still needed to optimize framework initialization ourselves.

Our approach: split out a built-in package and pre-execute the framework.

Our app is a hybrid app, and the home page does not use RN. We can therefore execute the RN built-in package right after the app starts, initializing all NativeModules ahead of time. When the user actually enters an RN page, it is naturally much faster.

The biggest difficulty in this scheme is the splitting itself. How do you correctly split a complete bundle into a built-in package and a dynamic package?

We fell into a pit at first; I hope describing it helps you avoid it.

At first we used Google's diff-match-patch algorithm, which compares old and new text and generates a patch file. By analogy, diff-match-patch can compare the business package against the built-in package and generate a patch to serve as the dynamic update package.

However, the patch it produces is a text patch, and a text patch cannot be executed on its own. It cannot satisfy the requirement of executing the built-in package first and then the dynamic update package.

Later, we modified Metro to split the bundle properly, which made framework pre-execution possible.

A complete bundle consists of many modules. How do you tell whether a module belongs to the built-in package or the dynamic update package? The paths (and hence IDs) of built-in modules share a feature: they sit under node_modules/react/xxx or node_modules/react-native/xxx. You can record all built-in module IDs in advance; at packaging time, filter out every module whose ID belongs to the built-in package, producing a dynamic update package that contains only business modules. The sketch below shows the idea.
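As a rough illustration of that filtering step inside a custom Metro serializer; this is a simplified sketch of the idea, not the actual metro-code-split implementation.

```ts
// Modules under these paths ship in the built-in package.
const BUILT_IN = /node_modules[/\\](react|react-native)[/\\]/;

// Metro's serializer hook receives the full module graph at bundle time.
function bizOnlySerializer(
  entryPoint: string,
  preModules: ReadonlyArray<{ path: string }>,
  graph: { dependencies: Map<string, { path: string }> },
  options: unknown,
): string {
  const bizModules: Array<{ path: string }> = [];
  for (const module of graph.dependencies.values()) {
    // Skip every module that already ships in the built-in package.
    if (BUILT_IN.test(module.path)) continue;
    bizModules.push(module);
  }
  // ...serialize bizModules into the dynamic update ("code patch") package...
  return bizModules.map((m) => m.path).join('\n'); // placeholder output
}
```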

The dynamic update package produced by Metro splitting is a code patch that can be executed directly, satisfying the requirement of executing the built-in package first and then the dynamic update package.

One detail: add a line of require(InitializeCore) to the built-in package so that the modules it defines are actually executed. This one line of code cuts first-screen time by about 90 ms.
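For illustration, the built-in package's entry might look like this; the exact shape of 58's built-in entry is an assumption on my part, though InitializeCore itself is a real RN internal module.

```ts
// Built-in entry: pre-executes RN's core setup (timers, error handling, ...)
// so business bundles skip it later. This line is worth roughly 90 ms.
require('react-native/Libraries/Core/InitializeCore');

// Pull the framework modules into the built-in package.
require('react');
require('react-native');
```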

To address the framework initialization bottleneck, we combined the built-in split with framework pre-execution. The time dropped from 1610 ms after the previous step to 1300 ms, a cumulative 43% reduction from the original 2280 ms.

Bottleneck 3: Business requests

With the dynamic update and framework bottlenecks addressed, let's look at the business bottleneck. It consists mainly of two parts: the business request and business rendering. The request is the easier one to optimize, so we tackled the business request bottleneck first.

There are many common solutions for optimizing business requests.

  • Caching business data
  • Preloading the next page's business data on the previous page

But not every application can use a cache, and not every application can preload data from the previous page, so we needed a more general approach. Looking closely at the structure, the initialization phase and the business request phase run serially. Can we make them parallel?

Our idea was to let Native, instead of JS, issue the business request, in parallel, the moment the user enters the page.

The specific scheme is as follows (a JS-side sketch follows the list).

  1. The resource file Native downloads contains both the biz business package and the URL of the original business request.
  2. The original URL contains dynamic business parameters, which are substituted according to pre-agreed rules. For example, 58.com/api?user=${user} is converted to 58.com/api?user=GTMC.
  3. Native executes the biz package to render the page and, in parallel, issues the URL request to fetch the business data.
  4. The JS side simply calls prefetch(cb) to obtain the data already requested by the Native side.
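On the JS side, consuming the prefetched data might look like the following sketch; the module and callback shape are hypothetical, not 58's actual interface.

```ts
import { NativeModules } from 'react-native';

declare function renderFirstScreen(data: unknown): void; // business render entry

// Instead of fetching over the network, ask Native for the response it
// requested in parallel while the page was starting up.
NativeModules.Prefetch.prefetch((json: string) => {
  renderFirstScreen(JSON.parse(json));
});
```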

To remove the business request bottleneck, we load business data in parallel. The time dropped from 1300 ms after the previous step to 985 ms, a cumulative 57% reduction from the original 2280 ms.

With all of the above schemes applied, most pages can open within a second. Is there still room for optimization?

Code execution bottleneck

Another reason RN pages render slowly is that RN must execute the complete JS file, even the parts of the code that are not needed.

Consider an example. A page contains three tabs, and the user sees only one of them on entry. In theory, only that one tab's code needs to execute. In reality, the code for the other two invisible tabs is also downloaded and executed, slowing things down.

Giving RN code the ability to lazy-load and lazy-execute, similar to dynamic import on the Web, would improve performance.

Dynamic import is not officially provided by RN, so we decided to do it ourselves.

Our dynamic import demo currently runs on RN 0.64. At business initialization, only the biz business package executes; when the user navigates to the two dynamic pages, Foo and Bar, the corresponding chunk packages are downloaded on demand. If the user leaves a dynamic page and enters it again, nothing is downloaded: the page renders directly from the existing cache.

For the RN dynamic import implementation, we followed the TC39 specification.

The business only needs to write a single line of code, import(“./Foo”), to get lazy loading and lazy execution. All the remaining work happens at the framework and platform levels.
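On the business side, usage might pair with React's lazy rendering like this; whether 58's framework integrates with React.lazy is an assumption on my part.

```tsx
import React, { Suspense } from 'react';
import { Text } from 'react-native';

// One line of import() gives the Foo tab lazy loading and lazy execution.
const Foo = React.lazy(() => import('./Foo'));

export const FooTab = () => (
  <Suspense fallback={<Text>Loading…</Text>}>
    <Foo />
  </Suspense>
);
```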

At runtime, when the business executes import(“./Foo”), the framework layer checks whether the module at the ./Foo path is already installed. If not, it looks up the URL of the corresponding chunk package by the ./Foo path, downloads and executes the chunk, and renders the Foo component. A sketch of this lookup follows.
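The runtime lookup might look like the following simplified sketch; the path-to-URL table and the module-registry calls are illustrative assumptions.

```ts
declare const pathToUrl: Record<string, string>; // compile-time Path -> URL map
declare function isInstalled(path: string): boolean;
declare function installChunk(code: string): void; // e.g. native script evaluation
declare function requireByPath(path: string): unknown;

async function dynamicImport(path: string): Promise<unknown> {
  if (!isInstalled(path)) {
    const url = pathToUrl[path];                         // look up './Foo'
    const code = await fetch(url).then((r) => r.text()); // download the chunk
    installChunk(code); // execute it: the chunk registers its modules
  }
  return requireByPath(path); // resolve the now-installed module
}
```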

The chunk package's URL is a CDN address. Obviously, uploading to the CDN and recording the Path-to-URL relationship is not done at runtime but at compile time.

During platform-level compilation, the Path-to-URL mapping table is stored in the biz package so that the runtime can find the corresponding URL by path.

The whole process breaks down into roughly five parts.

  1. Project: a project consists of a number of files with dependencies among them.
  2. Graph: each file produces a corresponding module, and all modules plus their dependencies form a graph.
  3. Modules: color the sets of dynamic modules to tell them apart.
  4. Bundles: package each set of modules into its own bundle.
  5. CDN: upload the bundles to the CDN.

The most critical step is coloring the sets of dynamic modules.

  1. Decomposing the coloring: coloring a graph can be decomposed into a number of base cases whose coloring schemes are already determined.
  2. Dynamic Map: after coloring, the root paths of the “green” and “blue” dynamic modules are recorded and, together with the CDN URLs of their bundles, form a dynamic map.
  3. Path to URL: the dynamic map is packaged into the “white” biz business package, so when import() is invoked at runtime, the corresponding URL can be found by path.
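The dynamic map bundled into the biz package might have a shape like this; the paths and URLs are made up for illustration.

```ts
// Hypothetical Path -> URL table shipped inside the biz package.
const dynamicMap: Record<string, string> = {
  './Foo': 'https://cdn.example.com/chunks/foo.abc123.bundle',
  './Bar': 'https://cdn.example.com/chunks/bar.def456.bundle',
};
```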

Many details are omitted above; if you are interested in the implementation, check out our open-source tool, metro-code-split.

metro-code-split:https://github.com/wuba/metro…

  • Based on Metro
  • Supports DLL splitting
  • Supports RN dynamic import

Summary and Outlook

By analyzing the performance structure, we found three kinds of performance bottleneck and produced a different optimization scheme for each. The figure below collects our second-open schemes, listing each one's (expected) benefit, scope, and applicable scenarios. I hope it helps with your technology selection.

In recent releases, many features of the new RN architecture have matured, and we are actively exploring them. The biggest surprise is the Hermes engine, now available on both iOS and Android. The biggest difference between Hermes and the original JSCore engine is that Hermes precompiles JS files into bytecode at build time, so at runtime the bytecode is executed directly, greatly reducing JS execution time. In our tests, a page whose JS execution took 140 ms dropped to 40 ms, a reduction of roughly 70%.

While providing performance optimization solutions to the business, we also pay attention to their adoption. To help more businesses achieve second-open pages, we collect churn rate, first-screen time, and second-open revenue through non-intrusive collection. In our practice, tying technical optimization to business benefit in this way is easier for businesses to accept and easier to drive.

Finally, I hope our second-open schemes and revenue-driven practice can inspire you. Thank you.