iQiyi's intelligent cloud platform for video production underwent a major upgrade this year. The front-end team took the opportunity to switch the underlying architecture entirely, from the three-year-old stack of arm.js (an internal MVC framework) + Java BFF + Velocity templates to Vue.js + Node.js BFF.

The new front end is a single-page application with more than ten business modules. Each module is split via lazy-loaded routes, and common third-party dependencies are extracted into a separate vendor file. However, during the early online trial, users widely reported that pages opened noticeably more slowly than in the old version, with a blank-screen wait of several seconds.

To improve user experience and efficiency, the team optimized the new front-end application in several rounds, with significant results. This article summarizes and shares our approach to analyzing and solving performance problems in middle- and back-office web applications.

Sorting out the problems

We began by listing, as questions, the possible performance bottlenecks in resource loading, page rendering, and interface response speed.

Resource loading problem

A complex web application typically depends on many JS/CSS/image resources. To fetch the minimum set of resources a page needs in the shortest time, we considered the following questions:

  • Are there redundant modules in the source code? Are files compressed and merged?

  • Are the server response and network transmission speed normal? Are you maximizing the browser’s concurrent requests?

  • Is the cache policy for resource files reasonable? Does every release force the browser to re-request all files?

  • Does the first render download unnecessary resource files? Can the resources each render needs be preloaded?

Page rendering issues

Since JavaScript runs on a single thread and Vue.js does most of its rendering in the browser, solving the blank-screen and jank problems required considering the following questions:

  • Is it possible to pre-render the core layout via skeleton screens, etc.?

  • Does the main thread have long, time-consuming tasks? Is task sharding and delayed rendering possible?

  • Are there algorithms with excessive time complexity? Is there a lot of redundant computation?

  • Do you initialize the same object repeatedly? Is there a memory leak?

Interface speed problem

Interface response speed is also critical on pages driven by back-end data, such as list queries. Since a Node.js BFF aggregates the interfaces of multiple service providers, the following problems may exist:

  • Do the back-end services respond slowly? Are the gateway, database, and index services healthy?

  • Can cache services be used for data with low real-time requirements?

  • Are concurrent requests maximized when multi-party interfaces are invoked simultaneously? Can non-essential interfaces initiate separate requests?

  • As with browser scripts, is there code for complex algorithms, memory leaks, etc.?

The solution

With these issues in mind, we began a detailed review of the existing application, gradually locating the key performance issues and addressing them one by one.

Resource loading optimization

Webpack build problem analysis

Since the project is built with Webpack 4.x, we used the webpack-bundle-analyzer plugin to analyze the number and size of the static resource files produced by the build, as shown in the figure below (which captures several large files).

According to the statistics, we found the following main problems:

  • Cache invalidation. Changing any code changes the hash in every generated JS/CSS file name, so every release forces the browser to re-request all resources.

  • File size. The chunk-vendor file generated from node_modules exceeds 1.5 MB uncompressed. ElementUI is the largest contributor at over 650 KB, followed by Moment.js at over 250 KB; the rest is made up of Vue.js, Lodash, and other base libraries.

  • Repeated packaging. The chunk files for some business modules are about 500 KB uncompressed because dependencies such as d3 and ECharts are bundled directly into the modules that use them. These third-party libraries account for about 70% of the total file size.

  • Number of resources. Webpack automatically generates shared chunks between modules, ranging from a few KB to over a hundred KB. For example, with three modules A, B, and C, the automatically generated chunks include combinations such as a~b.js, a~c.js, and b~c.js, and these files are also loaded when module A is requested. As the number of modules grows, the combinations become more complex, quietly increasing the number of requests.

Analysis of browser loading speed

Using the browser's Network panel, we found that server caching and network transmission had little impact on loading speed; the main causes of slow loading were:

  • Concurrent request limit. All static resource files produced by the build are deployed under a single static domain, so file downloads queue up.

  • Loading order. Some JS files not needed for the first render (such as the player SDK and the flowchart SDK) block loading when the page opens.

Resource construction and deployment optimization plan

To address the above problems, we made the following improvements to the Webpack configuration.

  • Deploy the base libraries to a CDN separately. In production, base libraries such as Vue.js + VueRouter + Vuex + VueCompositionAPI + ElementUI + Lodash are pre-built into library.dll.js via webpack.DllPlugin, deployed separately, and preloaded site-wide via prefetch.
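The article does not show the DLL build configuration itself; a minimal sketch of such a webpack.dll.config.js might look like the following (the entry list, output paths, and file names here are illustrative assumptions, not the project's actual config):

```javascript
// webpack.dll.config.js — hypothetical sketch of pre-building the base libraries
const path = require('path');
const webpack = require('webpack');

module.exports = {
  mode: 'production',
  entry: {
    // Base libraries bundled once and deployed to the CDN
    library: ['vue', 'vue-router', 'vuex', 'element-ui', 'lodash']
  },
  output: {
    path: path.resolve(__dirname, 'dll'),
    filename: '[name].dll.js',
    // Expose the bundle as a global the manifest can reference
    library: '[name]_[hash]'
  },
  plugins: [
    new webpack.DllPlugin({
      name: '[name]_[hash]',
      // Manifest consumed by DllReferencePlugin in the main build
      path: path.resolve(__dirname, 'dll/[name].manifest.json')
    })
  ]
};
```

The main build then points webpack.DllReferencePlugin at the generated manifest so these modules are resolved from the pre-built bundle instead of being packaged again.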

  • Deploy the theme styles to a CDN separately. The ElementUI component styles and the team's internally developed MaterialTheme styles are no longer imported as Sass source from NPM; instead, 9 color themes are pre-built, deployed to the CDN in advance, and preloaded via prefetch. Custom styles in the project generate rules for the different themes through Sass mixins.

```html
<link href="//static.iqiyi.com/lego/theme/element-ui/1.0.0/css/cyan.css" rel="prefetch" />
<link href="//static.iqiyi.com/lego/theme/element-material/2.0.0/css/cyan.css" rel="prefetch" />
```
  • Deploy the business code under a different domain than the base libraries, to increase the number of concurrent browser requests.

  • Load the JS files not needed for the first render, such as the player SDK and the flowchart SDK, asynchronously via defer or by requesting them dynamically during component initialization.

  • Remove unnecessary third-party libraries such as Moment.js. Reviewing the project source, we found only a few places using Moment.js's formatting functionality, so we chose to implement a utility function of just a few dozen lines ourselves instead. Depending on the project, a smaller library such as Day.js could also be considered.
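The article does not include the replacement utility; a minimal sketch of such a formatter (a hypothetical formatDate covering only the tokens a project typically needs) could look like:

```javascript
// Hypothetical replacement for moment(date).format(...), supporting a
// fixed set of tokens: YYYY, MM, DD, HH, mm, ss
function formatDate(date, pattern = 'YYYY-MM-DD HH:mm:ss') {
  const d = new Date(date);
  const pad = (n) => String(n).padStart(2, '0');
  const map = {
    YYYY: String(d.getFullYear()),
    MM: pad(d.getMonth() + 1),
    DD: pad(d.getDate()),
    HH: pad(d.getHours()),
    mm: pad(d.getMinutes()),
    ss: pad(d.getSeconds())
  };
  // Replace each token with its zero-padded value
  return pattern.replace(/YYYY|MM|DD|HH|mm|ss/g, (token) => map[token]);
}
```

A few dozen lines like this replace a 250 KB dependency when only formatting is needed.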

  • Optimize Webpack's splitChunks policy. Extract d3, ECharts, and other large dependencies into separate chunks. We disabled the automatic generation of small shared chunks between modules (such as a~b.js), since they mainly increase the number of requests, and instead explicitly packaged the code shared between modules (collected in the project's src/common directory) into a chunk-common file.

```js
// webpack config
{
  optimization: {
    splitChunks: {
      cacheGroups: {
        // Disable the default cache groups
        default: false,
        // Explicitly extract the project's shared code
        common: {
          name: 'chunk-common',
          test: /src[\\/]common/,
          chunks: 'all'
        },
        // Extract the d3 / echarts third-party libraries
        d3: {
          name: 'chunk-d3',
          test: /[\\/]node_modules[\\/](d3|dagre|graphlib)/,
          priority: 100,
          chunks: 'all'
        },
        echarts: {
          name: 'chunk-echarts',
          test: /[\\/]node_modules[\\/](echarts|zrender)/,
          priority: 110,
          chunks: 'all'
        }
      }
    }
  }
}
```
  • Optimize the hash in output file names. In production we name files with contenthash instead, so a new file name is generated only when a file's contents change, maximizing cache hits.

```js
// webpack config
{
  output: {
    filename: 'js/[name].[contenthash].js',
    chunkFilename: 'js/[name].[contenthash].js'
  }
}
```

After these optimizations, the final chunk-vendor is around 500 KB, a reduction of about two thirds. The newly extracted chunk-common file is about 300 KB, and each module's chunk is about 200 KB, a reduction of about three fifths. Combined with deploying the base libraries to a CDN, prefetch preloading, and contenthash cache control, resource loading speed improved dramatically.

Page rendering optimization

Considering the business scenario and development cost, the new front-end application does not implement server-side rendering, so there is a long blank-screen period. The old version rendered on the server with Java + Velocity, so the difference in user experience was obvious.

Browser rendering performance analysis

To investigate, we did a complete analysis of page rendering with the Chrome Performance panel.

Since production code is minified, we recommend recording the profile in the development environment so you can locate the source code directly. The recorded timeline is shown in the screenshot below.

We need to focus on the following dimensions:

  • Frames: the frame rate (FPS) and the rendered output at different points in time.

  • Main: the browser's main thread, including HTML parsing, JavaScript execution, and other tasks.

  • Timings: metrics such as FP, DCL, FCP, and LCP, plus durations recorded through the Performance API. In Vue.js 2.x, setting Vue.config.performance = true enables component performance recording; the screenshot below shows Vue.js component render times.

After analysis, we found the following main problems:

  • The first render task after route activation took particularly long, over 2 seconds, with the site navigation, sidebar, and similar components taking more than half of that time.

  • The authService.hasUriAuth method, used to determine link permissions, accounted for 80% of the time spent in the navigation component.

  • On dynamic form pages rendered from configuration, the core FormBuilder component also took about 2 seconds to render.

Page rendering overall optimization scheme

In view of the above problems, we have made the following improvements:

  • Render a skeleton screen on the server side, including the page's basic layout such as the navigation, to reduce the user's perceived wait.

  • Reduce the number of components in the first-screen render. Initially hidden components such as the secondary navigation menu, the site sidebar, and the list's advanced-search popup are extracted into asynchronous chunks via Webpack and rendered asynchronously on user interaction.

```js
// AppLayout.vue
{
  components: {
    AppDrawer: () =>
      import(
        /* webpackChunkName: 'chunk-async-common' */
        './AppDrawer'
      ),
    AppHeader
  }
}
```
  • Optimize time-consuming JavaScript functions. This step depends on the actual code. Taking authService.hasUriAuth as an example, the most prominent problems were functions re-executed inside loops and regular expressions created repeatedly. We applied memoization to the expensive functions, returning the cached value directly when the arguments are the same, and cached regular expression instances for reuse.
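The two techniques in this step can be sketched roughly as follows (a generic memoize helper plus a RegExp cache; the function names and the serialized-arguments cache key are illustrative assumptions, not the project's actual code):

```javascript
// Cache compiled RegExp instances so the same pattern is compiled only once
const regExpCache = new Map();
function getCachedRegExp(pattern) {
  if (!regExpCache.has(pattern)) {
    regExpCache.set(pattern, new RegExp(pattern));
  }
  return regExpCache.get(pattern);
}

// Memoize a pure function: identical arguments return the cached result
function memoize(fn) {
  const cache = new Map();
  return function (...args) {
    const key = JSON.stringify(args);
    if (!cache.has(key)) {
      cache.set(key, fn.apply(this, args));
    }
    return cache.get(key);
  };
}

// e.g. const hasUriAuth = memoize(rawHasUriAuth);
```

This kind of memoization only helps when the function is pure with respect to its arguments; permission checks that depend on mutable state need their cache invalidated on change.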

  • Manually split FormBuilder's configuration-driven rendering into multiple tasks. Because of the complexity of the business scenarios, a typical form has more than 80 fields. In Vue.js, the render task triggered by a single data change cannot be split directly, so we took a different approach: split the form configuration into segments, pass only the first segment for the initial render, and concatenate the remaining segments over subsequent render cycles.

```html
<template>
  <form-builder :config="formConfig"></form-builder>
</template>
```
```js
{
  created() {
    this.getFormConfig().then(() => {
      this.startWork();
    });
  },
  methods: {
    startWork() {
      const work = () => {
        // Task scheduler
        return scheduler.next(() => {
          // Append the next segment of the form config
          this.formConfig = this.concatNextFormConfig();
          if (!scheduler.done()) {
            // Queue the next task
            work();
          }
        });
      };
      // Start the first task
      work();
    }
  }
}
```
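The scheduler referenced in the snippet above is not shown in the article; a minimal sketch of one possible implementation (a hypothetical createScheduler, using setTimeout to defer each segment to a new macrotask so the browser can paint between segments) might look like:

```javascript
// Hypothetical task scheduler: next() defers each segment to a new macrotask,
// done() reports whether all segments have been processed
function createScheduler(totalSegments) {
  let processed = 0;
  return {
    next(task) {
      setTimeout(() => {
        processed++;
        task();
      }, 0);
    },
    done() {
      return processed >= totalSegments;
    }
  };
}
```

In a browser, requestIdleCallback or requestAnimationFrame could be used instead of setTimeout to align segments with idle time or frame boundaries.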

Interface speed optimization

BFF performance analysis

Because of the complexity of the business processes, the front end calls multiple service interfaces and post-processes the data, so the front-end team has long been responsible for developing the web layer (BFF), previously in Java. To simplify development, this upgrade introduced the TypeScript-based NestJS framework to replace Spring MVC; NestJS encapsulates the interfaces consumed by the Vue.js application. To surface potential performance issues, we added some general-purpose instrumentation:

  • Added a custom TimeMiddleware for all encapsulated interfaces to measure each interface's overall response time.

  • Added interceptors to Axios to measure the response time of BFF calls to third-party interfaces.
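Both instrumentation points can be sketched roughly as follows (function names, log formats, and the metadata field are illustrative assumptions; timeMiddleware uses the (req, res, next) signature NestJS functional middleware shares with Express, and the interceptor pair would be attached via axios.interceptors.request.use(onRequest) and axios.interceptors.response.use(onResponse)):

```javascript
// Overall interface timing: log the elapsed time once the response finishes
function timeMiddleware(req, res, next) {
  const start = Date.now();
  res.on('finish', () => {
    console.log(`[bff] ${req.method} ${req.url} ${Date.now() - start}ms`);
  });
  next();
}

// Upstream call timing: stamp the request, measure on the response
function onRequest(config) {
  config.metadata = { start: Date.now() };
  return config;
}

function onResponse(response) {
  const cost = Date.now() - response.config.metadata.start;
  console.log(`[upstream] ${response.config.url} ${cost}ms`);
  return response;
}
```

Writing the two measurements to the same log stream makes it easy to see how much of an interface's total time is spent waiting on third-party services.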

Finally, analyzing the core interfaces with logs, Apache JMeter, and other tools, we found the following main problems:

  • In interface A, which runs paged queries over an index of tens of millions of documents, the ES query speed was poor, averaging about 2.6 seconds.

  • Interface B, which calls multiple services, made some of those calls serially when they could run in parallel. In addition, one of the services, a tag query, averaged around 700 ms and was the key factor limiting speed.

  • In interface C, which fetches user information, 20% of requests took about 600 ms while the rest took only 50 ms; one server in the service cluster turned out to be located in another region.

  • Most interfaces depend on a basic service for the channel list, which has very low freshness requirements, yet it was fetched live on every request, taking about 50 ms each time.

  • The application's logging service extends NestJS's Logger service, which writes synchronously to process.stdout by default. On some machines the large volume of log output was expensive, averaging about 100 ms per request.

BFF overall optimization scheme

In view of the above problems, we have made the following improvements:

  • The back-end team optimized the ES query service and added physical machines for capacity. After optimization the average time is under 1 second, a speedup of more than 60%.

  • The back-end team added a caching mechanism to the tag query service; it now averages about 200 ms, an overall improvement of more than 70%.

  • Removed the cross-region server from the cluster to ensure all services reside in the same region or equipment room.

  • Maximize parallel requests on the critical path to reduce total request time. For one interface, the average time dropped from 1.3 seconds before optimization to 700 ms after, an improvement of about 45%.
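Parallelizing independent upstream calls is usually a one-line change in Node.js; a generic sketch (the api methods here are hypothetical stand-ins for the project's service clients):

```javascript
// Before: each call awaited serially, total time = sum of latencies.
// After: Promise.all issues them together, total time ≈ the slowest call.
async function getPageData(api) {
  const [user, tags, channels] = await Promise.all([
    api.getUser(),
    api.getTags(),
    api.getChannels()
  ]);
  return { user, tags, channels };
}
```

This only applies when the calls are independent; a call whose input depends on another call's output still has to wait for it.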

  • Cache query results in Redis for services with low freshness requirements, such as the channel query service; its average time dropped from 50 ms to 15 ms, an improvement of about 70%.
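The caching pattern here is cache-aside; a minimal sketch (the cache key, TTL, and function names are illustrative assumptions, and `cache` stands in for any client with get/set, such as a Redis client):

```javascript
// Cache-aside for low-freshness data such as the channel list:
// serve from cache when present, otherwise fetch and populate the cache
async function getChannels(cache, fetchFromService, ttlSeconds = 300) {
  const cached = await cache.get('channel:list');
  if (cached) return JSON.parse(cached);
  const channels = await fetchFromService();
  await cache.set('channel:list', JSON.stringify(channels), ttlSeconds);
  return channels;
}
```

The TTL bounds how stale the list can get, which is acceptable precisely because the data's freshness requirements are low.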

  • In production, logs are no longer printed to process.stdout but written asynchronously to a designated file using a logging framework such as Winston.

Overall results after optimization

Resource loading speed

Through reducing file size and count, caching, concurrency, preloading, and lazy loading, the overall time to fetch core resources is now under 200 ms.

Example: first-time theme style loading and theme switching (prefetch)

Example: asynchronously loaded routes and components (prefetch)

Page rendering speed

Through asynchronous rendering of hidden components, optimizing expensive functions, task sharding, and the skeleton screen, users see content as early as possible, and the first route render is kept within 1 second. Combined with the browser's own optimizations, no blank screen is perceptible under normal network speed and machine performance.

Interface speed

Through capacity expansion, caching, concurrency, and optimization of expensive functions, the response times of several core query interfaces are also kept to about 1 second.

Comparison of core data before and after optimization

| Optimization item | Before optimization | After optimization |
| --- | --- | --- |
| First script download | 7 JS files, 3.5 MB total | 4 JS files, 1.8 MB total |
| First route render time | 2.66 s average | 790 ms average |
| Index paged-query response time | 2.60 s average | 1.03 s average |

Afterword

Front-end performance optimization touches every layer, and each link in the chain has room for improvement. In this practice, targeting the project's actual scenarios, we analyzed and solved problems in three areas: resource loading, rendering performance, and interface speed, improving page open speed step by step and delivering a better user experience. Of course, optimization is never finished; we hope this article proves useful, and interested readers are welcome to leave a comment.
