F(X) Team – Zhen Zi

When we encounter performance optimization problems, we usually start from two directions: industry-standard optimization methods and the actual performance bottlenecks at hand. Being able both to absorb proven patterns, methods and routines from the industry and to design solutions based on real performance data is already a path toward high-quality work. But the road is full of judgments and choices, and a single misstep can still sink you into the mire, left flailing with "three thousand scattered moves": trying every trick you know without a guiding principle.

This path has three main parts: perception, healing and prevention. Perception: can the performance problem be identified and understood? Healing: once identified and understood, can it be solved? Prevention: once solved, will it regress in the future? All three parts already have a wealth of experience, articles and theory behind them. Here I want to try, from an overall and systematic perspective, to share my humble opinion on "the essence of rendering performance optimization", and to propose a path that starts from the underlying principles, so that when facing complicated rendering performance problems we have a more precise and clear basis for judgment and more valuable solutions.

Hardware perspective

Broadly speaking, the essence of performance is striking a balance between experience, processing power and power consumption. This definition is inspired by hardware chip design, which seeks a balance among three engineering constraints: area, performance and power consumption. When designing and verifying with FPGA and CPLD chips, the number of logic gates is limited by the die size and the production process, which imposes an overall area constraint. Within that constraint, combining some logic gates into dedicated circuits (so-called IP blocks) improves performance and reduces power consumption, because a dedicated circuit consumes less power than software running on general-purpose circuits; otherwise, one can only design general-purpose circuits: register operations, general processing instructions, and so on. So, when conditions permit, dedicated circuits, also known as "specialization", provide a better ratio of area, performance and power consumption.

Some students may ask: what happens when the utilization of these specialized circuits is lower than that of the general-purpose circuits?

Indeed, performance would then be poor, and this is exactly why the M1 helped Apple take the lead in the industry: Apple plans globally, from the software ecosystem to the operating system, from the underlying system APIs to the hardware drivers, and from the drivers to the electronic circuit design. This global planning ensures that the routines most frequently called by software are implemented in hardware, improving the performance-to-power ratio while also guaranteeing high utilization of those specialized circuits. This is what I call the hardware perspective.

Another student may ask: this is Apple's ecosystem, it cannot be applied to Android.

It is true that Android, as an open-source ecosystem, is not as tidy, concise and consistent as Apple's closed system, but if you care to look, you can still trace the same path through Android's open ecosystem: from the software ecosystem to the operating system, from the underlying system APIs to the hardware drivers, and from the drivers to the specialized electronic circuits provided by the hardware. Mapping that path onto your software engineering lets you view performance optimization problems from the overall hardware perspective and make full use of the underlying hardware's capabilities.

When I was in charge of the international browser, the infrastructure in India and Southeast Asia was underdeveloped: mobile network conditions were poor and bandwidth was very limited, while the mobile Internet of the multimedia era was full of images and video. My team and I worked on super resolution (a technology that uses machine-learning models to predict image detail, achieving 240p-to-720p upscaling that traditional interpolation cannot), in the hope of bringing a better experience to UC Browser users in India and Southeast Asia.

With the help of the team, the model and algorithm soon made breakthroughs, and we solved most of the display errors in the model's predicted images. But the computing power required by the whole model was beyond what mobile devices in India and Southeast Asia could support. Even after we tried reduced precision, model compression and pruning, knowledge distillation and other means, on mid-to-low-end models like the Redmi (defined here as an ARMv7 processor with one gigabyte of memory) the speed was still only a few frames per second, which was simply not usable.

So we turned to ARM NEON instruction optimization: an instruction set that accelerates parallel floating-point and vector computation. Using the open-source XNN framework, we applied targeted NEON optimizations inside the OPs to improve the forward-computation speed of the model and reduce the pressure on memory and CPU. After nearly a year of effort, we finally achieved 240p-to-720p super resolution at 24 frames per second and deployed it in UC Browser to serve users in India and Southeast Asia.

Although I had often been involved in assembly-level optimization in software engineering, this end-to-end optimization experience, from software (the algorithm model) to system APIs, from drivers to the hardware's electronic circuits, made me feel the importance of the hardware perspective. The algorithm engineer who worked late into the night teaching himself Android programming and the NEON instruction set was Zhen Qi, who became a P8 senior algorithm expert thanks to this project.

So you might ask: what does this have to do with the front end? Let me illustrate the similarity with an example. First, front-end work contains both rendering and computation. The rendering part is defined by HTML and CSS and then rendered by the browser, so the browser blocks most of the connection to the underlying capabilities, leaving the front end with little to grip. However, with new APIs like WebGL and WebGPU being exposed, the front end does get some purchase. The computation part is defined by JavaScript and executed by the script engine, so the script engine likewise blocks most of the connection to the underlying capabilities. On top of that, script engines basically use virtual machines to mask differences in instruction sets and hardware, which adds yet another layer of shielding. However, with technologies such as Node.js and WASM allowing some programs to run natively, and with V8 using special strategies such as Sparkplug to compile some JavaScript directly to machine code, this may change. (Reference: v8.dev/blog/sparkp…)


As a result, in many scenarios the front end now deals more directly with the underlying hardware through the browser/WebView, and the hardware perspective becomes the key to finding the balance between experience, processing power and power consumption. Next, the two core scenarios of rendering performance optimization, rendering and computation, are introduced in turn.

Rendering perspective

In fact, saying earlier that HTML and CSS define rendering was too crude. A more accurate statement is: HTML and CSS define the content to be rendered, while JavaScript can intervene in both the content and the rendering process. Without JavaScript intervention, rendering defined by HTML and CSS is just the rendering of static HTML pages (animations and videos are special cases and are not discussed in this article). With the disappearance of DHTML and XSLT, dynamic rendering is now mostly done by JavaScript. Understanding how these roles are decoupled shows its advantage here, and moving dynamic rendering from simple API calls to the complex logic control of a programming language opens up infinite possibilities. To sum up, I split the rendering perspective into three parts: rendered content, rendering process and JavaScript intervention, and explain my understanding of each part below.

Rendered content

First of all, the emergence of the WWW pushed humanity from information islands into the interconnected World Wide Web era, and the carrier of that information was HTML, the HyperText Markup Language. At that time the core of rendering HTML was the typesetting engine, and Web standards were developed around typesetting. As technology progressed, people gradually tired of statically typeset content. Technologies such as DHTML and XHTML (XML + XSLT), and APIs such as XMLHttpRequest, brought dynamic capabilities, while Flash, Silverlight, Java Applets and the like, together with the decentralization of Web 2.0 that broke the portals' monopoly, brought unprecedented prosperity to the whole Internet industry.

With the development of the industry and the progress of technology, rendered content has evolved from the simple "document typesetting" of information, to "rich media" carrying multimedia, and now to WebXR carrying complex blends of digital and real information. Each stage places different requirements on the rendering engine, the display and hardware acceleration. At the simplest level, every engine separates out animation-related APIs so that the heaviest rendering work can be isolated and specifically optimized both in the framework and in the underlying engine.

In addition, there are differences in display capability for rendered content: most commonly resolution, HDR and other characteristics place special demands on the display. Hardware acceleration is easier to understand: for different rendering workloads, the first step is to reduce CPU, memory and disk I/O pressure, and the second step is to hand work over to more specialized electronic circuits such as GPUs and DSPs to achieve a higher performance-to-power ratio.

To understand this is to recognize that how software engineering constructs the content has a decisive impact on rendering performance, and that this difference is bounded by the underlying hardware: CPU, GPU, DSP, specialized accelerator circuits, and so on. From content parsing down to hardware acceleration, everything is integrated and closely related. Even without the ability to directly control how content is presented (UI controls and Draw API choices on the native side; HTML markup, CSS styles and Paint choices on the Web), you still need a perspective that penetrates to the bottom layer in order to describe, beyond the "three thousand scattered moves", an optimization path closer to the essence of the problem.

Take UI controls and Draw API selection in an application: content has a very limited choice of APIs. When I first led the browser team back in 2016, I read the source code of Servo, a next-generation browser engine then being developed by Mozilla and Samsung (aside: the Servo programmers created the great Rust programming language, which I love, have been learning for a long time, and recommend). The demo in Servo's open-source project uses Android's SurfaceView to guarantee browser rendering performance. The reason for not using a View is that a View is redrawn in response to the VSYNC signal sent by the system, with a refresh interval of 16ms; if drawing cannot finish within 16ms, the interface stutters. So Servo chose SurfaceView to solve this problem. Looking deeper, it is essentially the complex and dynamic nature of HTML that makes the View inappropriate: constant local refreshes of a View would make the page flicker, whereas SurfaceView's double buffering lets you compose the image in memory and then display it on screen all at once. This solves Servo's problem of displaying HTML content. Similarly, many games choose double buffering precisely because what they display is a "game".
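To make the double-buffering idea concrete in a web context (this is an analogy, not Servo's implementation), the following TypeScript sketch draws each frame into an off-screen back buffer and presents the finished frame in one step, so partially drawn content is never visible:

// Illustrative double buffering: draw into an off-screen back buffer, then
// blit the completed frame to the visible canvas in one step.
const visible = document.querySelector<HTMLCanvasElement>('#view')!;
const front = visible.getContext('2d')!;

const back = document.createElement('canvas'); // the back buffer
back.width = visible.width;
back.height = visible.height;
const buffer = back.getContext('2d')!;

function renderFrame(time: number): void {
  // 1. Do all (possibly slow) drawing on the back buffer.
  buffer.clearRect(0, 0, back.width, back.height);
  buffer.fillStyle = '#4285f4';
  buffer.fillRect(50 + Math.sin(time / 300) * 40, 50, 100, 100);

  // 2. Present: copy the finished frame to the screen in a single call.
  front.drawImage(back, 0, 0);
  requestAnimationFrame(renderFrame);
}
requestAnimationFrame(renderFrame);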

The GPU in OpenGL has two rendering modes: rendering to the current screen and off-screen rendering. Rasterization, masks, rounded corners and shadows trigger off-screen rendering, and off-screen rendering requires creating a separate buffer and switching context multiple times (from on-screen to off-screen, and finally back from off-screen to the current screen to display the result of the off-screen buffer). All of these are just API capabilities, but the choice of content determines whether rasterization, masks, rounded corners and shadows are triggered and how much performance is dissipated; this is how rendered content affects the underlying layers and the hardware.

The principle is the same for the front end and for native applications; the difference is that the front end takes a longer path, and it is harder for its perspective to penetrate to the underlying layers and hardware, because the front end's host environment (browser / browser kernel / rendering engine) is itself a layer on top of the system's UI engine and rendering engine, and this wrapping is implemented differently by different browser vendors on different platforms. However, as technologies such as WebGL and WebGPU are applied on the front end, the difficulty of penetrating to the underlying layers and hardware has been reduced. The front end not only gains a view down to the hardware but also a certain ability to intervene, especially in hardware-accelerated rendering of content, which creates a much more relaxed environment for designing and implementing rendered content.

With control over the rendered content, the implementation needs no repetition: it is simply two steps, one "far" and one "near". The "far" step means pushing the perspective down to the underlying layers and examining the technical capabilities of the hardware in light of business needs and product design, to see what new and interesting things they can bring. The "near" step means pulling the perspective back and selecting the appropriate UI controls and Draw APIs, or HTML tags and CSS, to construct the content to be rendered. What remains in between is the rendering process.

Rendering process

From the perspective of imaging principles, the rendering process includes: CPU computation (the work of the UI engine or browser engine and the rendering engine), graphics rendering APIs (OpenGL/Metal), the GPU driver, GPU rendering, VSYNC signal emission and HSYNC signal emission. Common rendering problems include stutter, tearing and dropped frames, which are usually caused by the rendering time being too long. Most of that time is spent on CPU computation, and part of it on graphics rendering. A simple experiment: take a complex page that renders poorly, record it on a high-end machine where rendering is smooth, then on a low-end phone both play that video and open the same page in the browser. The visual complexity of the image is the same, but the video plays far more smoothly than the page renders. This shows how much of the cost lies in CPU and GPU computation and graphics rendering, since video decoding and display is much simpler and its rendering process much shorter than a browser engine's (except for special codecs and very high bit rates).
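A minimal sketch for seeing where frame time goes in practice: compare the gap between requestAnimationFrame callbacks with the roughly 16.6ms budget of a 60Hz display and flag frames that overran it (the threshold is illustrative):

// Rough jank detector: compare the gap between rAF callbacks with the
// 16.6 ms budget of a 60 Hz display and log frames that overran it.
const FRAME_BUDGET_MS = 1000 / 60;
let last = performance.now();

function onFrame(now: DOMHighResTimeStamp): void {
  const delta = now - last;
  last = now;
  if (delta > FRAME_BUDGET_MS * 1.5) {
    // The previous frame took noticeably longer than one vsync interval,
    // so at least one frame was likely dropped.
    console.warn(`Long frame: ${delta.toFixed(1)} ms`);
  }
  requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);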

Therefore, from the perspective of the rendering process, the essence of performance optimization is, first, to reduce the CPU and GPU computing load. Second, where conditions allow (the business side needs to be convinced that the difference is real), influence the rendering process through different ways of constructing the rendered content, preferring underlying APIs that use CPU/GPU optimized instructions and dedicated electronic circuits for acceleration. For example, now that H.264 hardware decoding has become widespread, whether to use x265/H.265 instead is a question worth asking.

When looking at the rendering process, fluency metrics are the first thing to focus on. Based on the 16.6ms per-frame budget at a 60Hz refresh rate, we can bound the CPU and GPU processing time at 16.6ms x 2 (double buffering) or 16.6ms x 3 (triple buffering), and compress the rendering process to stay within that budget and keep the page smooth.


OOPD (Out of Process Display Compositor): its main purpose is to migrate the Display Compositor from the Browser process to the Viz process (the former GPU process), with the Browser becoming a client of Viz. The Renderer establishes its CompositorFrame link to Viz through the Browser, but once the link is established, CompositorFrames are submitted directly to Viz. The Browser also submits its CompositorFrames to Viz, and Viz generates the final CompositorFrame, which is composited and output through the Display.

The main differences between OOPR (Out of Process Rasterization) and the current GPU rasterization mechanism are:

  1. In the current GPU rasterization mechanism, when a worker thread executes a rasterization task, it calls Skia to convert the 2D drawing instructions into GPU instructions, and the GPU instructions emitted by Skia are transmitted via the Command Buffer to the GPU thread in the Viz process for execution.
  2. In OOPR, when a worker thread executes a rasterization task, it directly serializes the 2D drawing instructions (a DisplayItemList) into the Command Buffer and sends them to the GPU thread. On the GPU thread, Skia is called to generate the corresponding GPU instructions, which are executed directly by the GPU.

In short, Skia's rasterization moves from the Renderer process to the Viz process. When OOPD, OOPR and SkiaRenderer are all enabled:

  1. Rasterization and compositing both move into the Viz process;
  2. Both rasterization and compositing use Skia for 2D drawing; in fact, all of Chromium's 2D drawing is ultimately done by Skia, which also generates the corresponding GPU instructions;
  3. For both rasterization and compositing, the GPU instructions finally emitted by Skia are issued on the GPU thread and use the same Skia GrContext (Skia's internal GPU drawing context).


This means that once Skia's support for Vulkan, Metal, DX12 and other 3D APIs is complete, Chromium will be able to decide, depending on the platform and device, which GPU API Skia uses for rasterization and compositing. Vulkan, Metal, DX12 and other lower-level APIs offer lower CPU overhead and better performance than the GL API.

Across the whole rendering process, Blink's processing of the rendered content affects how the compositor works, the compositor's work affects the rasterization process, and rasterization in turn determines how the different low-level APIs are used:


If you are interested in the rendering process, check out the document Life of a Pixel. Learning and understanding the rendering process helps you understand how different choices of rendered content affect performance, and analyzing where performance is affected lets you pinpoint performance problems. At the same time, understanding the rendering process yields many optimization methods against white screens, dropped frames, flicker, stutter and other performance and user-experience issues.

The above mostly covers static rendering, but software engineering today deals with complex, dynamic scenarios such as dynamic data loading and rendering, conditional rendering, and animation. JavaScript intervention therefore also changes the rendered content, which in turn affects the rendering process. The following section describes the issues related to JavaScript intervention.

JavaScript intervention


In principle, Blink exposes the DOM API for JavaScript to call. (There is also a CSSOM API for intervening in CSS, which I won't go into here, because most modern front-end frameworks inline that intervention directly into HTML.) For example, createElement an HTML tag and append it to childNodes[1] of document.body.firstChild: the DOM Tree changes, and that change causes the entire rendering process to change:
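Concretely, the kind of DOM intervention just described might look like the following sketch (the element and insertion point are illustrative):

// Illustrative: a single DOM API call from script dirties the DOM tree and
// forces the engine to re-run style recalculation, layout and paint for the
// affected subtree. The element and insertion point are arbitrary.
const p = document.createElement('p');
p.textContent = 'inserted by JavaScript';
document.body.firstElementChild?.appendChild(p);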


This is also the principle behind virtual-tree techniques improving browser rendering performance: changes to the DOM Tree are merged and applied in batches, reducing the probability and frequency of re-entering the rendering process.
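To make the batching idea concrete (a minimal illustration with plain DOM APIs, not how any particular virtual DOM library is implemented):

// Naive: every append touches the live DOM tree, so the engine may re-enter
// style/layout work once per iteration.
function renderNaive(items: string[], list: HTMLUListElement): void {
  for (const text of items) {
    const li = document.createElement('li');
    li.textContent = text;
    list.appendChild(li);
  }
}

// Batched: build the subtree off-document and commit it in one operation,
// so the rendering pipeline is re-entered once for the whole batch.
function renderBatched(items: string[], list: HTMLUListElement): void {
  const fragment = document.createDocumentFragment();
  for (const text of items) {
    const li = document.createElement('li');
    li.textContent = text;
    fragment.appendChild(li);
  }
  list.appendChild(fragment);
}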

In short, from Blink's point of view V8 is an outsider: the browser engine decouples V8's intervention in the DOM from the HTML tags themselves. However, since JavaScript intervention causes DOM changes, it also changes the subsequent rendering process. Merging DOM Tree changes can therefore sometimes lead to incorrect rendering results, and without understanding the rendering process, rendering problems caused by virtual-tree usage can be harder to locate and solve.

Secondly, in conditional rendering or in the routing logic of SPA applications, the selection and change of rendered content can negatively affect the rendering process, for instance causing stutter by exceeding the 16.7ms budget. Optimizing the conditions and judgment logic can alleviate some of these rendering performance issues (not expanded here, since this is JavaScript's domain), but in a nutshell: JavaScript should execute and return as soon as possible. The impact of computational complexity on rendering performance is analyzed from the computational perspective of the parser, layout and compositor in the following sections.
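One hedged sketch of "execute and return as soon as possible" is slicing a long task into small time slices so the main thread can get back to rendering between slices (items and processItem are placeholders):

// Task slicing: process a large work list in small time slices so each slice
// returns well within a frame instead of blocking rendering with one long loop.
// `items` and `processItem` are placeholders.
function processInChunks<T>(
  items: T[],
  processItem: (item: T) => void,
  sliceMs = 5,
): void {
  let index = 0;
  function runSlice(): void {
    const start = performance.now();
    while (index < items.length && performance.now() - start < sliceMs) {
      processItem(items[index++]);
    }
    if (index < items.length) {
      setTimeout(runSlice, 0); // yield so the browser can render, then continue
    }
  }
  runSlice();
}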

Computational perspective


In simple terms, the computational perspective looks at the computation-heavy parts of the rendering process: DOM, style, layout, compositing and paint (including pre-paint), because the time spent on these computations directly affects rendering performance. The industry has the concept of the CRP for this process, so let's start with CRP and examine the issues and approaches of optimizing rendering performance from the computational perspective.

An overview of the CRP

After HTML and CSS are loaded via network I/O or disk I/O (cache), the chain continues: decoding the HTML and CSS files (GZip text compression before transmission, etc.), processing (HTML and CSS parsing), DOM Tree construction, style calculation, layout, compositing and painting. This involves a great deal of parsing, computation and other processing inside the browser engine. For this, a concept needs to be introduced: the Critical Rendering Path (CRP).


  • First, once the browser gets the response, it starts parsing it. When it encounters a dependency, it tries to download it
  • If it is a style file (a CSS file), the browser must fully parse it before rendering the page (which is why CSS is render-blocking)
  • If it is a script file (a JavaScript file), the browser must stop parsing, download the script and run it; only then can it continue parsing, because JavaScript can change the content of the page (especially the HTML). This is why JavaScript blocks parsing
  • Once all the parsing is done, the browser builds the DOM tree and the CSSOM tree; putting them together yields the render tree
  • The penultimate step is to turn the render tree into the layout. This phase is also known as reflow
  • The final step is painting, which literally means coloring pixels based on the data the browser computed in the previous stages

Mapping these steps onto the rendering engine's pipeline, it becomes clear that CRP goes through the following steps:


In a nutshell, the steps of CRP are:

  • Process the HTML markup and build the DOM tree
  • Process the CSS markup and build the CSSOM tree
  • Merge the DOM tree and CSSOM tree into the render tree
  • Run layout on the render tree
  • Paint the individual nodes to the screen

Note: whenever the DOM or CSSOM changes (JavaScript can manipulate them through the DOM API and CSSOM API to change the look or content of the page), the browser has to perform the above steps again. This is where the virtual-tree rendering optimization described earlier comes from.

Optimizing the Critical Rendering Path of a page involves three things:

  • Reduce the number of critical resource requests: reduce the number of resources (CSS and JS) that block rendering. Note that not all resources are critical, especially CSS and JS (for example, CSS behind a media query and asynchronous JS are not critical)
  • Reduce the size of critical resources: use methods such as minification, compression and caching of critical resources; the smaller the amount of data, the less computation the engine has to do
  • Shorten the length of the critical rendering path

The following general steps can be followed when specifically optimizing CRP:

  • Analyze and characterize the CRP, recording the number of critical resources, their sizes and the CRP length
  • Minimize the number of critical resources: remove them, delay their download, mark them as asynchronous, and so on (a small sketch of this follows the list)
  • Optimize the bytes of critical resources to shorten download time (round trips) and reduce CRP length
  • Optimize the loading order of the remaining critical resources so that all critical resources are downloaded as early as possible, shortening the CRP length
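As a rough illustration of "delay their download, mark them as asynchronous" (the URLs are placeholders, not resources from this article):

// Keep non-critical resources off the critical rendering path. URLs are placeholders.

// Load a non-critical script without blocking HTML parsing.
const script = document.createElement('script');
script.src = '/js/analytics.js';
script.async = true;
document.head.appendChild(script);

// Hint the browser to fetch a critical stylesheet as early as possible.
const preload = document.createElement('link');
preload.rel = 'preload';
preload.as = 'style';
preload.href = '/css/critical.css';
document.head.appendChild(preload);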

Use Lighthouse or the Navigation Timing API to detect critical request chains

We need tools to help us detect the important CRP metrics, such as the number of critical resources, their sizes and the CRP length. You can use the Lighthouse plugin in Chrome, or the Node version of Lighthouse:

# install lighthouse
npm i -g lighthouse
lighthouse https://jhs.m.taobao.com/ --locale=zh-CN --preset=desktop --disable-network-throttling=true --disable-storage-reset=true --view

You can get a detailed report where you can see the key request information:


More details can be found on the Lighthouse website

In addition to the Lighthouse detection tool, you can also use the Navigation Timing API to capture and record the true CRP performance of any page.
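A minimal sketch of reading the Navigation Timing (Level 2) entry for the current page; the derived metric names are common conventions, not part of the specification:

// Read the Navigation Timing Level 2 entry for the current page and derive a
// few CRP-related numbers (the metric names are conventions, not spec terms).
const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
if (nav) {
  console.table({
    'TTFB (ms)': nav.responseStart - nav.requestStart,
    'DOM parse (ms)': nav.domInteractive - nav.responseEnd,
    'DOMContentLoaded (ms)': nav.domContentLoadedEventEnd - nav.startTime,
    'Load (ms)': nav.loadEventEnd - nav.startTime,
  });
}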


We can also use the relevant APIs in the performance monitoring specifications to monitor page performance in real user scenarios:
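For real-user monitoring, one possible sketch is a PerformanceObserver that watches paint, largest-contentful-paint and long-task entries (entry-type support varies by browser; this is not a complete RUM SDK):

// Observe paint timing, LCP and long tasks from real user sessions.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(entry.entryType, entry.name, entry.startTime, entry.duration);
  }
}).observe({ entryTypes: ['paint', 'largest-contentful-paint', 'longtask'] });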


After the results of the CRP analysis are obtained with the corresponding tools or techniques, the page can be optimized accordingly.

CRP optimization strategy

For the complete treatment, please wait for teacher Damo's article to be published; here I only cite the parts relevant to the computational perspective.

HTML from a computational perspective

  • Write a valid and readable DOM:

    • Write tags in lowercase: every tag should be lowercase, so do not use uppercase letters in HTML tags

    • Close self-closing tags

    • Avoid excessive comments (it is recommended to strip comments with an appropriate tool)

    • Organize the DOM and create only the elements that are absolutely necessary

  • Reduce the number of DOM elements (in the Blink kernel's HTMLDocumentParser, the number of tokens is closely related to the number of DOM elements; reducing the number of tokens speeds up DOM Tree construction and therefore layout and first-frame rendering). Too many DOM nodes slow down the initial page load and rendering performance, and may cause heavy memory usage. So monitor the number of DOM elements on your page and make sure that the page (a small audit sketch follows this list):

    • has no more than 1500 DOM nodes

    • nests DOM nodes no more than 32 levels deep

    • has no parent node with more than 60 child nodes

Reference: zhuanlan.zhihu.com/p/48524320… "Blink HTML Parsing", by the author "the programmer who doesn't wear plaid shirts"
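A quick way to check a live page against the thresholds above; the limits in the comments mirror the guidance in the list, they are not hard browser limits:

// Audit the live DOM against the guideline numbers above.
function auditDom(root: Element = document.documentElement) {
  let total = 0;
  let maxDepth = 0;
  let maxChildren = 0;

  function walk(el: Element, depth: number): void {
    total += 1;
    maxDepth = Math.max(maxDepth, depth);
    maxChildren = Math.max(maxChildren, el.children.length);
    for (const child of Array.from(el.children)) walk(child, depth + 1);
  }
  walk(root, 1);

  return {
    total,       // guideline: no more than 1500 nodes
    maxDepth,    // guideline: no more than 32 levels of nesting
    maxChildren, // guideline: no more than 60 children per parent
  };
}

console.log(auditDom());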

CSS from a computational perspective

  • CSS class name length: the length of class names has a minor impact on HTML and CSS file size (this is controversial in some scenarios; see CSS Selector Performance for details)
  • Critical CSS: split the CSS into critical and non-critical parts, inline the critical CSS into the page with <style> (compressed as much as possible), and use a tool such as critical to do this (see Critical CSS)
  • Use media queries: only the stylesheets matching the current media query block rendering of the page; the other stylesheets are still downloaded, just with lower priority
  • Avoid introducing CSS with @import: CSS pulled in by @import is only discovered after the containing stylesheet has been downloaded and parsed, which adds round trips to the critical path and increases the browser engine's computing load
  • Analyze stylesheet complexity: analyzing stylesheets helps identify problematic, redundant and duplicated CSS selectors, which can be removed to speed up reading and loading CSS. Tools such as TestMyCSS, analyze-css, Project Wallace and CSS Stats can help you analyze and correct CSS code. The Blink snippet below illustrates how selectors of different types are matched, and why their costs differ:
// If the node has an ID attribute
if (element.hasID()) 
  collectMatchingRulesForList(
      matchRequest.ruleSet->idRules(element.idForStyleResolution()),
      cascadeOrder, matchRequest);
// If the node has a class attribute
if (element.isStyledElement() && element.hasClass()) { 
  for (size_t i = 0; i < element.classNames().size(); ++i)
    collectMatchingRulesForList(
        matchRequest.ruleSet->classRules(element.classNames()[i]),
        cascadeOrder, matchRequest);
}
// Pseudo-class processing (omitted)...
// Tag selector processing
collectMatchingRulesForList(
    matchRequest.ruleSet->tagRules(element.localNameForSelectorMatching()),
    cascadeOrder, matchRequest);
// Finally, the universal (wildcard) selector rules are processed...

From this code you can get an intuitive feel for the difference in computing overhead caused by different CSS selectors, which provides guidance for optimizing computing performance. Reference: nextfe.com/how-chrome-… by Li Yincheng

Optimizing computation itself

The browser engine itself is software, and once you understand the rendering process, you actually understand the software details of rendering. From a software engineering perspective, the methods for optimizing computation are then quite rich. If a program is understood as algorithms + data structures, the theory is familiar to all of us. What I want to add is that, from the performance optimization perspective, algorithms + data structures can be regarded as time + space, which leads to a common performance optimization strategy: trade time for space, or trade space for time. When space pressure is high (that is, storage pressure is high), trade time for space; a typical case is file compression. When time pressure is high (that is, computation pressure is high), trade space for time; typical cases are buffers and caches, for example splitting a long, compute-intensive task into a series of subtasks that are computed in parallel and stored (buffered/cached), and then output to the display by the GPU.
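A minimal sketch of the "space for time" idea: cache (buffer) the result of an expensive, pure computation keyed by its input, trading memory for repeated computation (measureText is only a hypothetical example of such a computation):

// Space for time: cache the results of an expensive, pure computation so that
// repeated calls with the same input cost a Map lookup instead of a recomputation.
function memoize<A, R>(expensive: (arg: A) => R): (arg: A) => R {
  const cache = new Map<A, R>();
  return (arg: A): R => {
    if (!cache.has(arg)) {
      cache.set(arg, expensive(arg)); // pay with memory once...
    }
    return cache.get(arg)!;           // ...save time on every later call
  };
}

// Hypothetical use: text measurement is comparatively costly, so cache it.
const measureText = memoize((text: string) => {
  const ctx = document.createElement('canvas').getContext('2d')!;
  return ctx.measureText(text).width;
});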


Below, rendering performance is analyzed from the computational perspective using Layout and Compositing as examples.

Layout example

In addition to the CRP optimization ideas for HTML and CSS described above (which address the DOM and style parts of the figure below), the rendering pipeline includes:


The red boxes in the figure mark the time-consuming parts of the CPU and GPU rendering pipeline, which are the directions for optimizing the pipeline.

Core idea: reduce the rendering pipeline computing load



Different HTML markup and CSS style choices, and the layout methods we use with them, can inadvertently create computational load for the layout engine. Avoiding that load means letting the layout engine compute as little as possible: use the space-for-time approach mentioned above and provide deterministic values so the layout engine does not have to estimate or calculate them. Reference: www.chromium.org/developers/…
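One place where this idea shows up in script rather than in markup is forced synchronous layout: interleaving DOM writes with layout reads makes the layout engine recompute inside the loop. The sketch below contrasts that with a read-then-write batch; it is an illustration of reducing layout computation, not a quote from the Chromium documentation referenced above:

// Layout thrashing: interleaving a layout read (offsetWidth) with a style write
// forces the engine to recompute layout inside the loop.
function badResize(boxes: HTMLElement[]): void {
  for (const box of boxes) {
    box.style.width = `${box.offsetWidth / 2}px`;
  }
}

// Batched version: do all reads first, then all writes, so layout is
// recomputed at most once after the writes.
function goodResize(boxes: HTMLElement[]): void {
  const widths = boxes.map((box) => box.offsetWidth); // read phase
  boxes.forEach((box, i) => {
    box.style.width = `${widths[i] / 2}px`;           // write phase
  });
}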

Note that this is not a specific rendering performance optimization method, but an idea. Applying it in a concrete project requires additional work, including debugging capability, statistical analysis capability, and so on. The most common approach is to debug the Chromium kernel to find the source of the computing load. For example, to see how a page is opened, set a breakpoint in Blink, in third_party/blink/renderer/core/dom/document_init.cc, at DocumentInit::Type DocumentInit::ComputeDocumentType.

Reference: zhuanlan.zhihu.com/p/260645423… "Build, Debug" by Mark-0xg


“The performance goal of The Blink project is to be able to run web content at 60fps on a mobile phone, which means we have 16ms per frame to handle input, execute scripts, and execute the rendering pipeline for changes done by scripts through style recalculation, render tree building, layout, compositing, painting, and pushing changes to the graphics hardware. So style recalculation can only use a fraction of those 16ms. In order to reach that goal, the brute force solution is not good enough.

At the other end of the scale, you can minimize the number of elements having their style recalculated by storing the set of selectors that will change evaluation for each possible attribute and state change and recalculate computed style for each element that matches at least one of those selectors against the set of the descendants and the sibling forest.

At the time of writing, roughly 50% of the time used to calculate the computed style for an element is used to match selectors, and the other half of the time is used for constructing the RenderStyle (computed style representation) from the matched rules. Matching selectors to figure out exactly which elements need to have the style recalculated and then do a full match is probably too expensive too.

Space for time part:

We landed on using what we call descendant invalidation sets to store meta-data about selectors and use those sets in a Process called style invalidation to decide which elements need to have their computed styles recalculated.”

Reference: docs.google.com/document/d/… Invalidation in Blink by [email protected]

Compositing example

Core idea: find the problems that the rendering engine's designers are trying to optimize away, and avoid creating them.

To put it simply, when reading the design documents or blogs of Chromium and other browser kernels, I often see design solutions that explain the problems the Blink team encountered and their ideas for solving them. If we think about it the other way round, understanding the causes of those rendering performance problems lets us avoid those situations in our own projects, and that by itself is good optimization.

Article: "Multithreaded Rasterization", www.chromium.org/developers/…

Question:


Example provided by the teacher:

background: url(//www.textures4photoshop.com/tex/thumbs/watercolor-paint-background-free-thumb32.jpg) repeat center center;
background-size: 20%;


In addition, study documents such as www.chromium.org/developers/… Compositor Thread Architecture to understand what happens with the CPU, the GPU and memory (space size, swapping in and out, memory alignment and other memory problems that block computation also create computational load, because they lengthen computation time). What conditions cause a switch between threads? What problems cause competition for resources? Start from these questions and ask: is what we hand to the Compositor the most appropriate? Thinking in this reverse direction, you can find the computational bottlenecks and optimize them in a targeted way.
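Following that reverse thinking, a small illustrative sketch: tell the compositor ahead of time which element is about to animate so the animation can stay on the compositor thread, and release the hint afterwards because every promoted layer costs memory (the selector and properties are placeholders):

// Hint the compositor that an element is about to be animated so it can be
// promoted to its own layer before the animation starts, and release the hint
// afterwards because every extra layer costs memory. The selector is a placeholder.
const panel = document.querySelector<HTMLElement>('.panel')!;

panel.addEventListener('mouseenter', () => {
  panel.style.willChange = 'transform';
});

panel.addEventListener('transitionend', () => {
  panel.style.willChange = 'auto'; // release the layer when the transition ends
});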

At the same time, the root of these ideas is very plain software engineering and programming ability. If you cannot yet perceive and understand these problems, it is worth first shoring up those basics, for example with a book like UNIX Network Programming (book.douban.com/subject/150…).

Conclusion

There is an obvious bottleneck to the "three thousand scattered moves" approach to rendering performance optimization: you cannot do it better than anyone else. Only by going layer by layer, starting from decoding the HTML and CSS files (GZip text compression before transmission, etc.), processing (HTML and CSS parsing), DOM Tree construction, style calculation, layout, compositing and painting, then moving on to WebGL, Vulkan, Skia and other low-level programming interfaces, and finally reaching a hardware view of how your page is rendered, can you go deep enough. The more you know and the deeper you go, the more you can find deeper and more valuable problems. Combining this with your programming and software engineering ability, you can propose solutions of your own, and so do better than others!


