Web performance, in everyday terms, is how fast a site opens, how smooth its animations are, how quickly a form submits, and how smoothly lists scroll. Performance optimization is about making your site faster.

MDN defines web performance as the objective measurements and perceived user experience of a site or application. It covers reducing overall load time (smaller files, fewer HTTP requests, preloading), making the site usable as soon as possible (lazy loading, splitting the payload), smooth interaction (preferring CSS over JS animation, reducing UI repaints), perceived performance (loading animations and indicators that make waiting feel shorter), and performance measurement (metrics, performance testing, and performance monitoring for continuous optimization; it is an ongoing process).

Page performance matters for retention, conversion rates, user experience, and how widely a site gets shared. It even affects search rankings and user complaints, and, of course, development efficiency.

Performance indicators

Before we can do performance tuning, we need to know which areas need tuning.

First, you need to understand performance metrics. How fast is fast? The performance of a website or application can be quantified with professional tools.

Based on the lifecycle of a web page response, we analyze the causes of poor performance, then apply concrete optimization measures (technical changes, feasibility analysis, and so on), iterating continuously.

In fact, performance is relative, not absolute. The same page may load at different speeds for users in different network environments, and even the same site can feel fast or slow depending on, say, whether it lazy-loads content.

Precise, quantifiable metrics are important when discussing performance. But just because a metric is objective and quantifiable does not necessarily mean it is useful. Measuring the performance of a web page has always been a challenge for web developers.

Initially, developers used Time to First Byte (TTFB). DOMContentLoaded and Load are stable markers of document loading progress, but they do not directly reflect the user's visual experience.

To measure the user's visual experience consistently, performance metrics such as First Paint and First Contentful Paint have been defined in web standards and implemented by the major browsers.

There are also several performance metrics proposed by the Web Incubator Community Group, such as Largest Contentful Paint, Time to Interactive, First Input Delay, and First CPU Idle.

Then there are First Meaningful Paint and Speed Index from Google.

And Baidu's First Screen Paint.

These metrics are not unrelated; they have evolved around user-centric goals. Some are no longer recommended, some are implemented by various testing tools, and some are exposed as standard APIs that can be used for measurement in production environments.

RAIL performance model

RAIL is an acronym for Response, Animation, Idle, and Load. It is a performance model proposed by the Google Chrome team in 2015 to improve user experience and browser performance.

The RAIL model is user-centric, and the ultimate goal is not to make your site run fast on any particular device, but to make your users happy.

Response: respond to user actions as quickly as possible; user input should get a response within 100ms.

Animation: when displaying animation, each frame should be produced within 16ms to keep the frame rate consistent and avoid jank.

Idle: when using the JS main thread, split work into chunks that each take less than 50ms, so the thread stays free for user interaction. The 50ms budget is what makes it possible to still respond within 100ms of a user action.

Fast response and smooth animation often require long processing time, but if you look at performance from the user's point of view, not all work has to happen in the response or load phase. We can use the browser's idle time to handle deferrable tasks, as long as the user never perceives a delay. Using idle time for deferred work also lets you shrink the amount of data preloaded up front, so the site or application loads faster.
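As a sketch of this idea (the helper names here are made up, not a standard API), long work can be split into small tasks that yield the main thread between chunks:

```javascript
// Sketch: run a queue of small tasks, yielding the main thread between
// chunks so user input can be handled. `shouldYield` and `schedule` are
// injected (hypothetical parameters) so the logic also runs outside a
// browser; in a real page you might yield once ~50ms have elapsed and
// schedule the continuation with setTimeout or requestIdleCallback.
function processInChunks(tasks, shouldYield, schedule) {
  return new Promise((resolve) => {
    function run() {
      while (tasks.length > 0) {
        tasks.shift()(); // execute the next task
        if (tasks.length > 0 && shouldYield()) {
          schedule(run); // give the main thread back, continue later
          return;
        }
      }
      resolve();
    }
    run();
  });
}
```

In a browser you might call it roughly as `processInChunks(tasks, () => performance.now() - chunkStart > 50, cb => setTimeout(cb, 0))`.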

Load: ideally, your site should load and be ready for interaction in under 1s. Users perceive latency differently depending on network conditions and hardware; on a 3G network everything takes longer, and there 5s is a more realistic goal.

User-experience-based performance measurement includes the following important metrics.

  1. FCP (First Contentful Paint)

First contentful paint: the time at which the browser first paints any content from the DOM. Content includes text, images, non-white canvas or SVG, and text set in a web font that is still loading. This is the first thing users see.

FCP time (seconds)   Color coding        FCP score
0-2                  Green (fast)        75-100
2-4                  Orange (moderate)   50-74
Over 4               Red (slow)          0-49
  2. LCP (Largest Contentful Paint)

Largest contentful paint: the time at which the largest content element in the viewport becomes visible, used to estimate how soon the page's main content is visible to the user. Candidates include img elements, the poster of video elements, background images loaded via URL, text nodes, and so on. For a good user experience, a site should render its largest content within 2.5s.

LCP time (seconds)   Color coding
0-2.5                Green (fast)
2.5-4                Orange (moderate)
Over 4               Red (slow)
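As a sketch of how LCP can be watched in production (the wrapper function is made up; `PerformanceObserver` and the `largest-contentful-paint` entry type are the real browser APIs), the observer constructor is passed in so the wiring can also be exercised outside a browser with a stub:

```javascript
// Sketch: observe LCP candidates via PerformanceObserver.
function observeLCP(PerfObserverCtor, onCandidate) {
  const observer = new PerfObserverCtor((list) => {
    for (const entry of list.getEntries()) {
      // Each entry is a new "largest content" candidate; the last one
      // reported before user input is the page's LCP.
      onCandidate(entry.startTime);
    }
  });
  observer.observe({ type: 'largest-contentful-paint', buffered: true });
  return observer;
}

// Browser usage:
// observeLCP(PerformanceObserver, t => console.log('LCP candidate at', t, 'ms'));
```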
  3. FID (First Input Delay)

First input delay: the time from the user's first interaction with the page to the moment the browser is actually able to respond to that interaction. Input delay occurs because the browser's main thread is busy with other work and cannot respond to the user; a common cause is that the browser is busy parsing and executing the large amounts of JavaScript the application loads.

FID time (ms)   Color coding
0-100           Green (fast)
100-300         Orange (moderate)
Over 300        Red (slow)
  4. TTI (Time to Interactive)

Time to interactive: the point at which the page first becomes fully interactive and the browser can respond to user input continuously. A page counts as fully interactive at the end of its last long task, after which the network and main thread remain idle for the next 5 seconds. Given this definition, a name like "time to sustained (or fluent) interactivity" would arguably be more accurate.

TTI time (s)   Color coding
0-3.8          Green (fast)
3.9-7.3        Orange (moderate)
Over 7.3       Red (slow)
  5. TBT (Total Blocking Time)

Total blocking time measures the total time between FCP and TTI during which the main thread was blocked long enough to prevent input from being handled. The main thread counts as blocked whenever a long task runs on it for more than 50 milliseconds.

The thread blocks because the browser cannot interrupt a task in progress, so if the user does interact with the page in the middle of a long task, the browser must wait for the task to finish before it can respond.

TBT time (ms)   Color coding
0-300           Green (fast)
300-600         Orange (moderate)
Over 600        Red (slow)
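The definition above can be written down directly (a hypothetical helper, not a standard API): each long task contributes only its time beyond the 50ms threshold.

```javascript
// Sketch: total blocking time from a list of main-thread task durations
// (in ms) recorded between FCP and TTI. Only the portion of each task
// beyond the long-task threshold (50ms) counts as blocking.
function totalBlockingTime(taskDurationsMs, thresholdMs = 50) {
  return taskDurationsMs
    .filter((d) => d > thresholdMs)
    .reduce((sum, d) => sum + (d - thresholdMs), 0);
}

// Example: tasks of 30ms, 120ms, and 70ms give a TBT of 70 + 20 = 90ms;
// the 30ms task is under the threshold and contributes nothing.
```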
  6. CLS (Cumulative Layout Shift)

Cumulative layout shift: CLS measures the sum of the individual layout-shift scores of every unexpected layout shift that occurs over the life of a page. It is a metric for ensuring the visual stability of a page and thereby improving the user experience.

In plain terms: you are about to tap an element on the page when the layout suddenly changes and your finger lands somewhere else. Say you want to click a link and a banner suddenly pushes it down. This is often caused by an image or video whose dimensions are unknown in advance.

CLS score   Color coding
0-0.1       Green (good)
0.1-0.25    Orange (moderate)
Over 0.25   Red (poor)
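As a sketch of how the score is built (hypothetical helpers following the metric's published definition): each shift's score is its impact fraction times its distance fraction, and CLS sums them over the page's lifetime.

```javascript
// Sketch: a single layout-shift score is the impact fraction (share of
// the viewport affected) times the distance fraction (how far content
// moved relative to the viewport). CLS is the sum over every
// *unexpected* shift; shifts right after user input are excluded.
function layoutShiftScore(impactFraction, distanceFraction) {
  return impactFraction * distanceFraction;
}

function cumulativeLayoutShift(shifts) {
  return shifts
    .filter((s) => !s.hadRecentInput) // expected shifts don't count
    .reduce((sum, s) => sum + layoutShiftScore(s.impact, s.distance), 0);
}
```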

Web Vitals

Web Vitals is also a Google standard for web performance metrics. Google felt the previous standards were too complex, with too many metrics, so in 2020 it reorganized and simplified them down to three: LCP for loading performance, FID for interactivity, and CLS for visual stability. Get these three right, and your site's performance is basically in good shape.

There are many tools for measuring Web Vitals, such as Lighthouse, Web-Vitals, and the browser plug-in Web Vitals.

  • Web-Vitals
// npm install web-vitals

import { getLCP, getFID, getCLS } from 'web-vitals';

getCLS(console.log);
getFID(console.log);
getLCP(console.log);
  • Browser plug-in

In Google Chrome you can find and install Web Vitals directly from the extension marketplace. Once installed, the extension's icon appears in the upper-right corner of the browser; clicking it shows the performance metrics for the current page.

Performance testing

Performance testing is part of the performance optimization process. Its purpose is to provide guidance, reference baselines, and a basis for comparison for subsequent optimization work. It is not a one-time effort; it is repeated through iterations of testing, recording, and improving, helping the site's performance move steadily toward the desired result.

  • Lighthouse

Lighthouse is an open-source Web performance testing tool developed by Google to improve the quality of web applications. It can be run as a Chrome extension or from the command line. Just give it an address to review, and Lighthouse runs a battery of tests on the page, generating a report on its performance.

The Lighthouse option is built into the browser's developer tools by default, so simply switch to the Lighthouse tab, select the desired options in the pane on the right, and click Generate report.

In this run against Taobao, the first-screen (first contentful paint) time is 0.6s, time to interactive is 1.5s, total blocking time is 10ms, and largest contentful paint is 1s. Using these metrics, you can see where the performance bottlenecks are.

Below that, a filmstrip of screenshots shows how the render progressed; if many frames are blank, the site stayed white for a long time. The report also lists optimization suggestions, for example that certain resources are too large or take too long to load. These suggestions are not always correct; treat them as suggestions, nothing more.

Finally, the report records the test environment. Do not rely on a single environment; test in several.

  • WebPageTest

An online web performance testing tool (www.webpagetest.org) that supports testing from multiple locations. It can only test sites that are already published. Enter the page address to test and click the Start Test button; you can choose the test location, the browser, and so on.

It then generates a detailed test report. (I couldn't get the site to open here, so no screenshot to fill in. Embarrassing...)

Chrome DevTools

  • Task manager for the browser

You can view the GPU, network, and memory usage of every process in the current Chrome browser, including the open tabs, installed extensions, and the browser's default processes such as GPU, network, and rendering. By monitoring this data you can locate faulty processes that may be leaking memory or loading network resources abnormally.

More tools -> Task Manager

Here you can see every running process, along with its memory footprint and network consumption.

  • Network analysis

The Network panel is one of the most frequently used tools. It shows information about every resource a site loads, including load time, size, priority settings, and HTTP caching. It helps developers spot problems such as resources that are too large due to insufficient compression, or repeat requests that load slowly because no cache policy was configured.

1. Cache testing

Use the Disable cache option to measure loads without the HTTP cache.

2. Throughput testing: throttle the connection to simulate different network speeds.

  • Coverage

Coverage monitors and reports how much of your code actually executes while the application runs.

The statistics cover JavaScript and CSS files. For each file, the results include its byte size, the number of bytes executed, and a visual coverage bar.

From the results you can identify large files with low coverage, which indicates those files contain a lot of unused code.

Press Ctrl + Shift + P and search for Coverage to bring up the panel.

You can see that 58% of the first file is unused and 95.2% of the second file is unused.

  • The Memory panel

It is mainly used to analyze memory usage. If memory leaks occur, the site may eventually crash.

For more detailed and accurate monitoring of a site's current memory usage, Chrome provides the Memory panel, which can quickly take a snapshot of the current heap memory.

You can view the memory usage and optimize the corresponding module.

  • Performance

The Performance panel is mainly used to detect and analyze the runtime Performance of web applications, including page frames per second, CPU consumption and time spent on various requests.

Click the record button, interact with the page for two or three seconds, then stop.

You then get detailed statistics about where the page spent its time.

  • FPS

Another very handy tool is the FPS meter, which provides a real-time estimate of frames per second while the page runs.

Press Ctrl + Shift + P, type FPS, and select the option to show the frames-per-second meter. The meter overlay then appears in the browser.

You can also use the performance monitor, which tracks page metrics in real time.

Ctrl + Shift + P, then enter monitor.

Performance optimization path

Before we talk about front-end optimization, let's start with a classic question: what happens from the moment you type a URL into the browser's address bar and press Enter? Performance optimization is essentially about this whole process.

Roughly: the browser receives the URL and starts a network-request thread; a complete HTTP request is issued; the server receives the request and hands it to the specific service that processes it; the front end and back end interact over HTTP, with caching mechanisms involved; the browser receives the response data and runs the critical rendering path; and the JS engine parses and executes scripts. That is the process in outline.

Let’s talk about it in detail.

The browser receives the input URL and starts a network request thread; this all happens inside the browser. So what is a thread, and what is a process?

In simple terms, a process is a running instance of a program. The operating system creates a separate block of memory for each process to hold the code and data it needs to run. A thread is a component of a process: every process has at least one main thread and possibly several child threads, all started and managed by the process as needed.

Because multiple threads can share the resources allocated by the operating system to the same process they belong to, the parallel processing of multiple threads can effectively improve the running efficiency of programs.

If one thread crashes, the entire process (and thus the program) crashes with it. Processes, by contrast, are isolated from each other, which ensures that when one process hangs or crashes it does not affect the normal operation of the others. Each process can only access the resources the system allocates to it, but processes can communicate with each other through IPC mechanisms.

The resources occupied by a process are reclaimed by the operating system after the process is shut down. Even if there is a memory leak caused by a thread in the process, the related memory resources are reclaimed when the process exits. Threads can share data of their own processes.

Early browsers were single-process: page rendering, JavaScript execution, and network requests were all handled by threads inside one process. As mentioned, a single crashed thread takes down the whole process. If you have been on the Internet long enough you have probably experienced this: one dead page kills the entire browser. Single-process browsers had many problems, including poor page fluency, weak security, and low stability.

Later, Chrome introduced the multi-process browser. The browser has a single main process, responsible for displaying the menu bar, title bar, and other browser UI, for file access, back/forward navigation, and child-process management. Alongside the main process there are the GPU process, plug-in processes, the network process, and renderer processes.

The renderer process, also known as the browser kernel, gets an independent instance for each tab by default. It is responsible for turning HTML, CSS, JavaScript, and other resources into an interactive page, and contains multiple threads: the JS engine thread, the GUI rendering thread, the event-trigger thread, the timer thread, the async HTTP request thread, and so on. The network request made when you open a tab and enter a URL starts from this process. For safety, the renderer process runs in a sandbox. You can find all these processes in Chrome's Task Manager.

Establishing an HTTP request is divided into two parts: DNS resolution and communication link establishment. In simple terms, the client browser that initiates the request must first know the address of the server to be accessed, and then establish a path to that server address.

  • DNS

DNS resolution essentially means finding the specific IP address for a host domain name. The process involves many steps.

The browser first looks in its own DNS cache. If that misses, it checks the operating system's DNS cache, then the system hosts file. If the name is still unresolved, the query goes to the local DNS server provided by your ISP, which in turn queries a root name server, then the .com (or other) top-level domain name server, and finally the authoritative name server for the domain. If none of these can resolve the name, an error is returned. This is the DNS lookup chain, and slowness at any step delays everything that follows.
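The cache-first chain above can be sketched as follows (everything here is illustrative; real resolvers live inside the OS and browser):

```javascript
// Sketch: try each resolution source in order (browser cache, OS cache,
// hosts file, ISP resolver, root/TLD/authoritative servers). Each
// source is a hypothetical function returning an IP string or null.
function resolveDomain(domain, sources) {
  for (const { name, lookup } of sources) {
    const ip = lookup(domain);
    if (ip) return { ip, resolvedBy: name }; // first hit wins
  }
  throw new Error('DNS resolution failed for ' + domain);
}
```

Because the first hit wins, a slow step early in the chain (e.g. a cold OS cache forcing a trip to the ISP resolver) delays the whole lookup.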

  • Network models

After DNS resolves the target server's IP address, a network connection can be established for resource access. This involves network architecture models; the International Organization for Standardization proposed the OSI model, and there is also TCP/IP.

OSI is a seven-layer architecture: application, presentation, session, transport, network, data link, and physical. TCP/IP simplifies this to four layers: application, transport, network, and data link. Slowness at any layer has an impact on performance.

  • TCP

Once past the network model, a TCP connection is established, primarily so that HTTP can request and send data over it.

TCP is a connection-oriented communication protocol, so a connection between the client and the server must be established before any data is transferred: the three-way handshake.

  • Front and back end data interaction

After the TCP connection is established, the front end and back end can communicate over HTTP. In practice, however, the browser usually does not talk directly to the server at a fixed IP address; a reverse proxy server is placed in between.

  • Reverse proxy server

The reverse proxy server obtains resources from back-end servers on the client's behalf and returns them to the client. Reverse proxies take care of things like load balancing, security firewalling, encryption and SSL acceleration, data compression, cross-origin handling, and static resource caching.

  • Back-end processing flow

After receiving a request, the reverse proxy server performs unified validation, such as cross-origin checks and security interception. If it detects a malformed or disallowed request, it directly returns a rejection response.

Requests that pass validation enter the application-code execution stage: the actual computation, database operations, and so on.

After the computation completes, the back end sends an HTTP response packet back to the requesting front end, answering the request.

  • HTTP protocol features

HTTP is an application-layer protocol that runs on top of TCP at the transport layer. At the TCP layer there is a distinction between long (persistent) connections and short connections.

A long connection keeps the TCP connection between client and server open so that packets can be sent over it continuously; both sides need to send heartbeat packets to keep the connection alive.

With a short connection, the client establishes a connection (over the network-layer IP protocol) whenever it needs to send a request to the server; once the request is sent and the response received, the connection is closed.

HTTP/1.0 uses short connections by default.

HTTP/1.1 uses persistent connections by default, but they have concurrency limits, so with too many requests some must still wait. Common workarounds are domain sharding and merging small icons into sprites.

HTTP/2 makes it possible to request multiple resources over a single TCP connection by splitting requests into smaller frames, for another round of performance gains.

  • Browser cache

HTTP-based caching is divided into strong caching and negotiated caching.

With strong caching, the Cache-Control max-age directive specifies how long the cached copy is valid; within that window the browser uses its local copy without contacting the server, and after it expires the cache is invalid. This is more precise and safer than the older Expires header.

A negotiated cache requires the browser to make an HTTP request so the server can determine whether the file the browser cached locally has changed.
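A minimal sketch of the negotiation from the server's side (the function is hypothetical; the header names are the real HTTP ones): compare the request's If-None-Match with the resource's current ETag.

```javascript
// Sketch: decide between 304 (browser reuses its cached copy) and 200
// (fresh copy with new validators) for a negotiated-cache request.
function handleConditionalRequest(requestHeaders, resource) {
  if (requestHeaders['if-none-match'] === resource.etag) {
    // Unchanged: empty body, the browser keeps using its local copy.
    return { status: 304 };
  }
  return {
    status: 200,
    headers: { ETag: resource.etag, 'Cache-Control': 'max-age=3600' },
    body: resource.body,
  };
}
```

The 304 path is what makes negotiated caching cheap: a round trip still happens, but no body is transferred.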

  • Key render path

When a web request is made and the page file is retrieved from the server, the browser starts rendering what the server responds to.

First, the browser builds the DOM and CSSOM by parsing HTML and CSS files.

The browser receives the HTML file as raw bytes in the encoding specified for the file. First it converts the bytes into characters, then turns the character stream into tokens per the W3C standard (HTML's rules give different tags different meanings). Lexical analysis then transforms the tokens into objects that define properties and rule values, and finally these objects are linked into a tree structure according to the parent-child relationships expressed in the HTML.

The DOM tree expresses the document's tags, attributes, and relationships, but says nothing about how each element should look after rendering; that is the job of the CSSOM. Parsing a CSS file into the CSS Object Model is similar to parsing HTML into the DOM: bytes become characters, then tokens, and after lexical analysis the result is the object model for the cascading style sheets.

Both object models take time to build; you can inspect this in the Performance tab of your browser's developer tools.

Once you have the document object model and the cascading style sheet object model, the browser can draw. Before rendering, it merges the two into a render tree. This tree contains only visible nodes; nodes styled with display: none, for example, are excluded.

Starting at the root of the generated DOM tree, the browser traverses each child node downward, ignoring invisible nodes, because invisible nodes never appear in the render tree.

For each visible node, it finds the matching rules in the CSSOM and applies them.

Layout computes each node's exact position and size on the target device based on the render tree; the output of this step is a box model. Painting then turns each node's concrete drawing instructions into actual pixels on the screen.
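The traversal described above can be sketched with plain objects (illustrative only; real browsers do this internally over their own structures, and `getStyle` stands in for style resolution via the CSSOM):

```javascript
// Sketch: build a render tree from a DOM-like tree, excluding any node
// (and its whole subtree) whose computed style is display: none.
function buildRenderTree(node, getStyle) {
  const style = getStyle(node);
  if (style.display === 'none') return null; // invisible: not rendered at all
  return {
    tag: node.tag,
    style,
    children: (node.children || [])
      .map((child) => buildRenderTree(child, getStyle))
      .filter(Boolean), // drop excluded subtrees
  };
}
```

Note that a display: none node prunes its entire subtree, which is why it never contributes to layout or paint (unlike visibility: hidden, which still occupies space).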

The time required to build the render tree, layout, and drawing process depends on the size of the actual document. The larger the document, the more tasks the browser has to handle and the more complex the style, the longer it takes to draw. Therefore, the execution speed of the key rendering path will directly affect the performance index of the first screen loading time.

After the first-screen render completes, the user may change the structure of the render tree through the interfaces JavaScript exposes while interacting with the site. Whenever the DOM structure changes, the rendering process runs again.

Key render path optimization is not only for first screen performance, but also for interactive performance.