Let's Talk About Chromium's Rendering Mechanism

About the BeesAndroid project

The BeesAndroid project provides a set of tools, theoretical analysis, and methodology aimed at lowering the barrier to reading Android source code, so that readers can better understand the design and implementation of the Android system. If this is your first time reading this series, see the introduction and the table of contents for more articles.

Today we are going to talk about Chromium's rendering mechanism, the second part of the rendering-mechanism series. Most of my work over the past half year has been related to the H5 container, so I spent some time studying the Chromium project, and here I will focus on analyzing its rendering mechanism. From a developer's perspective, an H5 container involves the following players:

  • Software
    • Chromium: WebView, Content, Blink, V8, Net, Base, etc.
    • Android OS: View/Window, Activity, WindowManager, ActivityManager, Surface/Texture, SurfaceFlinger
    • Graphics: OpenGL ES, Skia, Vulkan
    • Binder
    • Linux kernel
  • Hardware
    • Display
    • CPU
    • GPU


As follows:

Of course, the actual structure is much more complicated. When a user taps the screen to open an H5 page, the flow generally goes through the following stages.

  1. Touch feedback: first comes the touch itself, and how the touch event is delivered to the app by Android.
  2. Container creation: how Android launches the WebView container to load the URL once the touch event reaches the app. This involves booting the Chromium kernel, among other things.
  3. Page loading: how the WebView, once started, sends the main-document request to the server and receives the main-document response.
  4. Page rendering: once the WebView receives the main document, how it turns it into a page; this is the most critical and complex part.

As you can see, a fair amount of work happens before the page is even rendered, and container startup is itself a time-consuming operation. Container startup deserves special attention because it is an important part of the H5 page experience, yet front-end developers often overlook it since it happens in native code. The container navigation phase is also a valuable preloading window in which we can do many things, such as:

  1. Interface (API) preloading
  2. HTML document preloading
  3. Resource preloading
  4. Creating a JS engine at navigation time that executes JS logic ahead of time, opening up the navigation-preload capability to the front end

Without further ado, let's talk about the rendering mechanism.

Rendering Architecture

A browser's rendering process takes a web page through a rendering pipeline and turns it into pixels that are eventually output to the screen. Three roles are involved:

  • Input: the web page, which Chromium abstracts as Content.
  • Rendering pipeline: Blink, mainly responsible for DOM parsing, style, layout, paint, and the other operations that convert web content into drawing instructions.
  • Output: mainly responsible for turning drawing instructions into pixels and displaying them on the screen.

What is Content?

We see the concept of Content a lot in the Chromium project, so what exactly is it? Content is the area in which web content is rendered; it corresponds to AwContents in the Java layer and is represented by WebContents underneath, as shown below:

In code, Content is described by content::WebContents, and its contents are rendered by Blink in a separate Render process. Concretely, Content corresponds to the HTML, CSS, JS, and images familiar from front-end development, as shown below:

What is a Rendering Pipeline?

The rendering pipeline can be understood as a decomposition of the rendering process, like a factory assembly line: the semi-finished product from one workshop is sent to the next workshop for further assembly. Decomposing the rendering process this way simplifies it and improves rendering efficiency.

Rendering is dynamic: when content changes, rendering is re-triggered and the pixels are updated. As in the Android rendering system, rendering is triggered through an invalidate mechanism, because executing the entire rendering pipeline on every change would be very expensive.

  • Triggering conditions include:
    • scrolling
    • zooming
    • animations
    • incremental loading
    • JavaScript
  • Each stage is triggered as follows:
    • Style: Node::SetNeedsStyleRecalc()
    • Layout: LayoutObject::SetNeedsLayout()
    • Paint: PaintInvalidator::InvalidatePaint()
    • Raster: RasterInvalidator::Generate()
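
To get a feel for how these dirty flags let the engine skip work, here is a toy C++ sketch (the names are invented for illustration; Blink's real invalidation is far more granular):

```cpp
#include <iostream>
#include <vector>

// Toy node with per-stage dirty bits, loosely modeled on the
// invalidation entry points listed above (illustrative, not Blink code).
struct Node {
    bool needs_style = false, needs_layout = false, needs_paint = false;

    void SetNeedsStyleRecalc() {
        // A style change usually invalidates the later stages too.
        needs_style = needs_layout = needs_paint = true;
    }
    void SetNeedsPaint() {
        // A pure repaint (e.g. a color change) skips style and layout.
        needs_paint = true;
    }
};

// One pipeline tick only revisits nodes whose flags are set, instead of
// re-running every stage for the whole tree.
void RunPipelineTick(std::vector<Node>& nodes) {
    for (auto& n : nodes) {
        if (n.needs_style)  { /* recalc style */ n.needs_style = false; }
        if (n.needs_layout) { /* relayout     */ n.needs_layout = false; }
        if (n.needs_paint)  { /* repaint      */ n.needs_paint = false; }
    }
}

int main() {
    std::vector<Node> dom(3);
    dom[1].SetNeedsPaint();  // only node 1 does any work next frame
    RunPipelineTick(dom);
    std::cout << "frame done\n";
}
```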


After the rendering pipeline converts the web page into drawing instructions, it cannot directly turn those instructions into pixels (raster) and display them on the screen (Window). At this point it needs capabilities of the operating system itself (the underlying graphics library). Most platforms offer a graphics API following the OpenGL standard; Windows additionally has DirectX, and Android has Vulkan. As shown below:

From the description above, we know where Content comes from and where it goes. Essentially, HTML, CSS, JS, and so on are converted into the correct OpenGL instructions, rendered to the screen, and made interactive for the user.



Now that we know the basics of rendering, let’s take a look at how the rendering process works, as follows:

Structure

Let's go through the structure from top to bottom, layer by layer:

  • Blink: the Render thread running in the Render process. This is Chromium's Blink rendering engine, mainly responsible for HTML/CSS parsing, JS interpretation and execution (V8), DOM manipulation, layout, and layer-tree construction and update.
  • Layer Compositor: the Compositor thread running in the Render process, responsible for receiving the Main Frames generated by Blink, managing the layer tree, handling layer scrolling, rotation, and other matrix transformations, tiling layers, rasterization, texture upload, and other tasks.
  • Display Compositor: the UI thread running in the Browser process, which receives the Compositor Frames generated by the Layer Compositor and outputs the final OpenGL drawing instructions, drawing the web content into the target window via GL operations.

The diagram also shows the Frame that each level outputs. A Frame encapsulates the drawing-related data that a lower module of the rendering pipeline passes up to the module above it.

  • Main Frame: contains a description of the web page's content, mainly in the form of drawing instructions; it can be understood as a vector snapshot of the entire page at a point in time.
  • Compositor Frame: the Layer Compositor receives the Main Frame generated by Blink, converts it into an internal compositor structure, and eventually sends it to the Browser as a Compositor Frame, which consists of two main parts:
    • Resource: an encapsulation of a Texture. The Layer Compositor divides each Layer into tiles, assigns a Resource to each tile, and schedules the rasterization tasks.
    • Draw Quad: a drawing command (a rectangle-drawing command specifying coordinates, size, transformation matrix, and so on). When the Layer Compositor receives the Browser's draw request, it generates a Draw Quad command for each tile of each Layer in the currently visible area.
  • GL Frame: the Display Compositor converts each Draw Quad in the Compositor Frame into a GL polygon-drawing instruction that draws into the target window, using the Texture wrapped by the corresponding Resource. This set of GL drawing instructions constitutes a GL Frame; finally, the GPU executes these GL instructions to draw the web page into the visible area of the window.
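
To make the relationship between Resources and Draw Quads concrete, here is a simplified sketch of what a Compositor Frame carries (the field names are invented; the real types live in Chromium's cc and viz code):

```cpp
#include <cstdint>
#include <vector>

// Illustrative only: a handle to a rasterized tile's texture.
struct Resource {
    uint32_t texture_id;      // GPU texture holding the tile's pixels
    int width, height;
};

// Illustrative only: one textured-rectangle drawing command.
struct DrawQuad {
    int resource_index;       // which Resource supplies the texture
    float transform[16];      // matrix placing the quad in the target
    int x, y, width, height;  // target rectangle
};

// A Compositor Frame bundles the resources and the quads referencing
// them; the Display Compositor turns each quad into GL draw calls.
struct CompositorFrame {
    std::vector<Resource> resources;
    std::vector<DrawQuad> quads;
};

int main() {
    CompositorFrame frame;
    frame.resources.push_back({1u, 256, 256});
    frame.quads.push_back({0,
                           {1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1},
                           0, 0, 256, 256});
}
```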

The scheduling of the entire rendering pipeline is based on requests and state-machine responses. The scheduling hub runs on the Browser UI thread and, following the display's VSync signal, sends the Layer Compositor a request to output the next frame. The Layer Compositor decides, based on its own state machine, whether to ask Blink for the next Main Frame. The Layer Compositor and the Display Compositor form a producer-consumer pair: the Display Compositor holds a queue of Compositor Frames that it continuously draws and dequeues, and its output frequency depends on the input frame rate of Compositor Frames and on the drawing frequency of its own GL Frames.

Flow

Now let's walk through the stages of the pipeline:

  1. Parse/DOM: content is parsed into a DOM tree, the basis of all subsequent rendering stages.
  2. Style: style sheets are parsed and applied.
  3. Layout: the position and size of each element are computed.
  4. Compositing Update: the page is split into independent layers according to certain rules, so that they can be updated in isolation.
  5. Prepaint: property trees are built so that a node's properties (transform, clip, effect, scroll) can be manipulated independently without affecting its children.
  6. Paint: Layout Objects in the Layout Tree are converted into drawing instructions (for example, draw a rectangle, draw text, fill a color; a bit like calling a drawing API), and these operations are encapsulated into Display Items. Nothing has actually been drawn yet.
  7. Commit: the paint data is copied to the compositor thread.
  8. Tiling: after the compositor receives the paint instructions, it first splits the layers into tiles; the tile is the basic working unit of rasterization.
  9. Raster: the tiles are rasterized, executing the drawing instructions to produce pixel values.
  10. Activate: rasterization is asynchronous, so the layer tree is split into a Pending Tree (which receives the Commit and rasterizes layers) and an Active Tree (from which rasterized layers are drawn). Copying a layer from the Pending Tree to the Active Tree is called Activate.
  11. Draw: once the tiles are rasterized, the compositor thread generates Draw Quads for each tile. These Draw Quads are encapsulated into a Compositor Frame and output to the GPU. Draw is the process of generating the Draw Quads.
  12. Display: after the Compositor Frame is generated, Viz issues GL commands to output the Draw Quads to the screen.

Let’s look at the specific process.

Rendering Pipeline

Note: The pictures in this Rendering Pipeline section are screenshots from the presentation Life of a Pixel by Chromium engineers.

Blink

01 Parse

Related documents

  • The DOM standard

Related source code

  • /blink/renderer/core/dom


When an HTML document is downloaded from the server, the first step is parsing. The HTML parser consumes the stream of tags and text (HTML is plain text) and parses the document into a DOM tree. The DOM (Document Object Model) is the browser's internal representation of the page, and it also exposes APIs to JavaScript (the V8 DOM API), allowing JavaScript programs to change the structure, style, and content of a document.



The DOM is a tree structure, and we will meet many more trees (the layout tree, the property trees, and so on) later in the rendering process, because they all derive from the DOM tree's structure (the HTML structure).

Note: HTML documents may contain multiple DOM trees, often referred to as Shadow trees, because HTML supports custom elements.

The process for parsing HTML to generate a DOM tree is as follows:

  1. HTMLDocumentParser parses the tokens in the HTML and generates an object model.
  2. HTMLTreeBuilder is responsible for building the complete DOM tree. The same HTML document can contain multiple DOM trees: custom elements have their own shadow trees, and nodes distributed into shadow-tree slots are found by traversing downward with FlatTreeTraversal (see the sketch below).
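
Conceptually, the parser is a tokenizer feeding a tree builder. Below is a toy C++ sketch that builds a tree from well-formed markup; the real HTMLDocumentParser works incrementally and is error-tolerant:

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Toy DOM node: each element owns its children, mirroring the tree
// that HTMLTreeBuilder produces from the tokenizer's output.
struct DomNode {
    std::string tag;
    std::vector<std::unique_ptr<DomNode>> children;
};

// One pass over well-formed markup such as "<html><body></body></html>".
// The real HTMLDocumentParser is incremental and repairs broken markup.
std::unique_ptr<DomNode> ParseElement(const std::string& html, size_t& pos) {
    size_t open = html.find('<', pos);        // consume "<tag>"
    size_t close = html.find('>', open);
    auto node = std::make_unique<DomNode>();
    node->tag = html.substr(open + 1, close - open - 1);
    pos = close + 1;
    while (html.compare(pos, 2, "</") != 0)   // children until "</tag>"
        node->children.push_back(ParseElement(html, pos));
    pos = html.find('>', pos) + 1;            // skip the closing tag
    return node;
}

void Print(const DomNode& node, int depth = 0) {
    std::cout << std::string(depth * 2, ' ') << node.tag << "\n";
    for (const auto& child : node.children) Print(*child, depth + 1);
}

int main() {
    std::string html = "<html><head></head><body><p></p></body></html>";
    size_t pos = 0;
    Print(*ParseElement(html, pos));
}
```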

The DOM Tree serves as the basis of the subsequent drawing stages, and various other trees are derived from it. Specifically, the following transformations take place:

Object conversion

  • DOM Tree -> Render Tree -> Layer Tree
  • DOM node -> RenderObject -> RenderLayer

DOM Tree (node is DOM node)

When an HTML document is loaded, it is parsed into a DOM tree. Each node in the DOM tree corresponds to an element of the page, and the page can manipulate the DOM tree through JavaScript.

How WebKit Works

Render Tree (node is RenderObject)

However, the DOM Tree itself is not used directly for layout and rendering; the kernel generates the Render Tree, a combination of the DOM Tree and CSS whose nodes correspond almost one-to-one with the DOM's. The Render Tree is the bridge between the layout engine and the rendering engine.

How WebKit Works

Layer Tree (node is RenderLayer)

The rendering engine does not use the Render Tree directly for rendering either. To handle operations such as positioning, clipping, and overflow scrolling more conveniently, it generates a Layer Tree. The rendering engine creates a RenderLayer for qualifying RenderObjects; a RenderObject without a RenderLayer of its own belongs to its parent's RenderLayer. The rendering engine iterates over each RenderLayer, then over the RenderObjects belonging to it, and draws each RenderObject.

The Layer Tree determines the order in which the page is drawn, and the RenderObject that is subordinate to the RenderLayer determines what is drawn at that Layer.

A RenderObject gets its own RenderLayer when (from GPU Accelerated Compositing in Chrome):

  • It's the root object for the page
  • It has explicit CSS position properties (relative, absolute or a transform)
  • It is transparent
  • Has overflow, an alpha mask or reflection
  • Has a CSS filter
  • Corresponds to a <canvas> element that has a 3D (WebGL) context or an accelerated 2D context
  • Corresponds to a <video> element

Don't worry if the above doesn't fully make sense yet; we will explain all of it below.

02 Style


Once the DOM tree has been generated, a style must be computed for each element. A style may affect only a single node, or the rendering of the entire DOM subtree beneath it (for example, a rotation transform on the node).

Related documents

Related source code

  • /blink/renderer/core/css


An element's style is usually the result of combining many style rules under complex priority semantics, and style resolution is divided into three steps:



1. Collect, partition, and index the style rules in all style sheets.

CSSParser first parses the CSS into the object model, StyleSheetContents, which contains StyleRules in a rich representation. These style-rule objects are indexed in various ways for more efficient lookup.



In addition, style properties are defined declaratively in Chromium's css_properties.json5 file, from which Python scripts generate the corresponding C++ classes.



2. Visit each DOM element and find all the rules that apply to that element.



The style engine traverses the DOM tree, computing the style for each node: the ComputedStyle maps properties to values, such as font styles, margins, and background colors. These are the outputs of the style engine.

3. Combine these rules with other information (the style engine also contains a set of default styles) to generate the final computed style.
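
Putting the three steps together, a minimal sketch of style resolution might look like this (all names invented; Blink's real style engine is vastly more elaborate):

```cpp
#include <map>
#include <string>

// property -> value, e.g. {"color", "red"}.
using Declarations = std::map<std::string, std::string>;

// Step 1: rules collected from all sheets, indexed here simply by tag
// name (Blink also indexes by id, class, and more).
std::map<std::string, Declarations> g_rules_by_tag;

// Steps 2 and 3: find the rules applying to an element and merge them
// over the engine's built-in default styles.
Declarations ComputeStyle(const std::string& tag) {
    // A fragment of the user-agent default style.
    Declarations computed = {{"color", "black"}, {"display", "block"}};
    auto it = g_rules_by_tag.find(tag);
    if (it != g_rules_by_tag.end())
        for (const auto& [property, value] : it->second)
            computed[property] = value;  // author rules win over defaults
    return computed;
}

int main() {
    g_rules_by_tag["p"] = {{"color", "red"}, {"margin", "8px"}};
    // Yields {color: red, display: block, margin: 8px}.
    Declarations style = ComputeStyle("p");
    (void)style;
}
```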

03 Layout


Once the style of each DOM node has been computed and applied, the next step is deciding where to place each node. DOM nodes are placed according to the box model (a rectangle), and layout is the computation of the coordinates of those boxes.

Layout operations are built on the CSS box model, as follows:

```
+---------------------------------------------------+
|                    margin-top                     |
|   +-------------------------------------------+   |
|   |                border-top                 |   |
|   |   +-----------------------------------+   |   |
|   |   |            padding-top            |   |   |
|   |   |   +---------------------------+   |   |   |
|   |   |   |        content box        |   |   |   |
|   |   |   +---------------------------+   |   |   |
|   |   |          padding-bottom           |   |   |
|   |   +-----------------------------------+   |   |
|   |              border-bottom                |   |
|   +-------------------------------------------+   |
|                   margin-bottom                   |
+---------------------------------------------------+
```

(The left and right margins, borders, and padding nest the same way; the scrollbar width and height sit between the padding and border edges.)

The related documents

  • Blink Layout
  • CSS Box Model Module Level 3
  • LayoutNG

Related to the source code

  • /blink/renderer/core/layout


A Layout Tree is generated from the DOM Tree, producing layout information for each node. The Layout stage traverses the entire Layout Tree, performing layout operations.



The DOM Tree and the Layout Tree are not always one-to-one: an element styled with display: none, for example, gets no layout object at all.
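
As a sketch of that layout traversal, here is a drastically simplified block-layout pass in C++ (invented types; real layout handles inline content, floats, flex, grid, writing modes, and much more):

```cpp
#include <memory>
#include <vector>

// Toy layout object performing a vertical block layout: each child is
// stacked below the previous one.
struct LayoutBox {
    int x = 0, y = 0, width = 0, height = 0;
    int intrinsic_height = 0;  // height of this box's own content
    std::vector<std::unique_ptr<LayoutBox>> children;

    void Layout(int origin_x, int origin_y, int available_width) {
        x = origin_x;
        y = origin_y;
        width = available_width;
        int child_y = y;
        for (auto& child : children) {
            child->Layout(x, child_y, width);
            child_y += child->height;  // stack children vertically
        }
        height = (child_y - y) + intrinsic_height;
    }
};

int main() {
    LayoutBox root;
    root.children.push_back(std::make_unique<LayoutBox>());
    root.children[0]->intrinsic_height = 40;
    root.Layout(0, 0, 360);  // root.height == 40
}
```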



04 Compositing Update

After Layout completes, Paint could in principle begin, but as mentioned, painting and drawing the entire interface would be very expensive. This is where accelerated layer compositing comes in. What is a compositing layer?

The basic idea of accelerated layer compositing is to split the page into multiple layers according to certain rules (like layers in Photoshop), so that rendering only has to touch the layers that changed, while the others merely need to be composited again; this improves rendering efficiency. The thread doing this work is the Compositor thread. Notably, it can also handle input events (such as scroll events), but if JavaScript has registered event listeners, it forwards the input events to the main thread for processing.

Specifically, certain RenderLayers get their own backing store, the compositing layer, for which the kernel creates a corresponding GraphicsLayer:

  • A RenderLayer that has its own GraphicsLayer draws into its own backing store.
  • A RenderLayer without its own GraphicsLayer walks up through its ancestors until it finds one with a GraphicsLayer (the root RenderLayer always has one), and draws into that ancestor's backing store.

This produces a GraphicsLayer tree corresponding to the RenderLayer tree. When a layer's content changes, only its own GraphicsLayer needs to be updated, whereas a single-cache architecture would have to update the whole cache, which is time-consuming; this is the efficiency win. However, too many GraphicsLayers also consume memory, so while compositing reduces unnecessary drawing, it can also degrade overall rendering performance through memory pressure. Accelerated layer compositing therefore seeks a dynamic balance.
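
The rule of drawing into the nearest ancestor that owns a GraphicsLayer can be sketched like this (illustrative names only, not Blink's real code):

```cpp
// Illustrative only: a RenderLayer either owns a compositing backing
// (GraphicsLayer) or borrows the nearest ancestor's.
struct RenderLayer {
    RenderLayer* parent = nullptr;
    bool has_graphics_layer = false;

    // Walk upward until a layer that owns a backing store is found;
    // the root always owns one, so the walk terminates.
    RenderLayer* CompositingContainer() {
        RenderLayer* layer = this;
        while (!layer->has_graphics_layer)
            layer = layer->parent;
        return layer;
    }
};

int main() {
    RenderLayer root, child;
    root.has_graphics_layer = true;  // the root always has a GraphicsLayer
    child.parent = &root;
    RenderLayer* backing = child.CompositingContainer();  // == &root
    (void)backing;
}
```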

A RenderLayer gets its own GraphicsLayer when (from GPU Accelerated Compositing in Chrome):

  • Layer has 3D or perspective transform CSS properties
  • Layer is used by a <video> element using accelerated video decoding
  • Layer is used by a <canvas> element with a 3D context or accelerated 2D context
  • Layer is used for a composited plugin
  • Layer uses a CSS animation for its opacity or uses an animated webkit transform
  • Layer uses accelerated CSS filters
  • Layer has a descendant that is a compositing layer
  • Layer has a sibling with a lower z-index which has a compositing layer (in other words the layer overlaps a composited layer and should be rendered on top of it)


The layering decision is made by Blink (it may move into the Layer Compositor in the future), which generates the layer tree from the DOM tree and records each layer's content in a DisplayList.



Now that we understand accelerated layer compositing, let's look at the Compositing Update that happens after Layout: it is the process of creating GraphicsLayers for the qualifying RenderLayers (per the rules above), as follows:

05 Prepaint

What is a property tree?

Previously, these properties were described through the layer tree: if a parent layer carried a matrix transform (translation, scaling, or perspective), clipping, or an effect (a filter, etc.), it had to be applied recursively to the children, for a time complexity of O(number of layers), which can become a performance problem in extreme cases.

Therefore the concept of the property tree was introduced. The compositor maintains a transform tree, a clip tree, an effect tree, and so on, and each layer simply holds the node ids of its transform node, clip node, and effect node in those trees. The time complexity becomes O(nodes to change), as shown below:


The Prepaint stage is the process of building these property trees, as follows:
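
To make the node-id indirection concrete, here is a toy transform tree (invented types; the real ones live in cc's property-tree code). Mutating one node is a single write, with no recursive walk over child layers:

```cpp
#include <vector>

// Illustrative transform (property) tree: layers reference nodes by id.
struct TransformNode {
    int parent_id;                 // -1 for the root
    float translate_x, translate_y;
};

struct TransformTree {
    std::vector<TransformNode> nodes;

    // Accumulate the translation along one node's parent chain.
    void ScreenOffset(int id, float& tx, float& ty) const {
        tx = ty = 0;
        for (int i = id; i != -1; i = nodes[i].parent_id) {
            tx += nodes[i].translate_x;
            ty += nodes[i].translate_y;
        }
    }
};

int main() {
    TransformTree tree;
    tree.nodes = {{-1, 0, 0}, {0, 10, 0}, {1, 0, 5}};
    // "Scrolling" node 1 is one write; every layer whose transform node
    // is 1 or a descendant of 1 sees the change on its next lookup.
    tree.nodes[1].translate_x += 100;
    float tx, ty;
    tree.ScreenOffset(2, tx, ty);  // tx == 110, ty == 5
}
```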

06 Paint

Once the property trees have been built (Prepaint), the Paint phase begins.

Related documents

Related source code

  • /blink/renderer/core/paint

The Paint operation converts the Layout Objects in the Layout Tree into drawing instructions (such as drawing rectangles, text, and colors; a bit like drawing API calls). These operations are then encapsulated into Display Items, which are stored in a PaintArtifact; the PaintArtifact is the output of the Paint phase. So far we have created a list of drawing operations that can be replayed, but no actual drawing has been performed.

Note: The Record & Replay mechanism is used by most graphics systems today. Separating the recording of drawing instructions from their execution improves rendering efficiency.
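
The record-and-replay idea in miniature: painting records operations instead of executing them, and rasterization replays them later, possibly on another thread. A toy sketch (not Skia's or cc's actual display-item types):

```cpp
#include <functional>
#include <iostream>
#include <vector>

// A display list is just an ordered list of recorded operations.
using DisplayItem = std::function<void()>;
using DisplayList = std::vector<DisplayItem>;

int main() {
    DisplayList list;

    // Paint phase: record, but do not execute, the drawing operations.
    list.push_back([] { std::cout << "draw rect 0,0 100x50\n"; });
    list.push_back([] { std::cout << "draw text 'hi' at 10,20\n"; });

    // Raster phase, possibly later and on another thread: replay.
    for (const DisplayItem& op : list) op();
}
```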

During painting there is a stacking-order issue: painting follows stacking order (z-index) rather than DOM order, so z-index determines the order in which things are painted. In the absence of z-index, painting proceeds in the following order:

  • background color
  • floats
  • foreground
  • outline


The Paint operation will eventually generate a Paint Tree based on the Layout Tree.

Layer Compositor

07 Commit


After the Paint phase completes, the pipeline enters the Commit phase. This phase updates the compositor thread's copies of the layers and property trees to match the committed state of the main thread; that is, the layers and properties are copied from the main thread to the compositor thread for its use.

08 Tiling

However, when the compositor thread receives the data, it does not start compositing immediately; it first splits the layers into tiles. This is the tiled-rendering technique. What is tiled rendering?

Tiled rendering divides a web page's cache into small tiles, usually 256×256 or 512×512, and renders tile by tile.

There are two main considerations behind tiled rendering:

  • GPU composition is usually implemented with OpenGL ES textures, and the cache is really a GL Texture. Many GPUs restrict texture sizes (length and width must be powers of two, with a maximum of 2048 or 4096, and so on), so arbitrarily sized caches are not supported.
  • With tiled caches, the browser can manage them through a unified buffer pool. The small buffers in the pool are shared by all WebViews: opening a web page requests buffers from the pool, and closing it returns them.
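
As an illustration of tiling, here is a sketch that splits a layer into 256-pixel tiles and orders them by distance from the viewport, which is how raster priorities are assigned, as described next (invented code, not cc's actual tiling logic):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One tile plus its distance from the viewport origin (illustrative).
struct Tile { int x, y; double distance; };

// Split a layer into fixed-size tiles and sort them so that tiles
// nearest the viewport are rasterized first.
std::vector<Tile> PlanTiles(int layer_w, int layer_h,
                            int view_x, int view_y, int tile_size = 256) {
    std::vector<Tile> tiles;
    for (int y = 0; y < layer_h; y += tile_size)
        for (int x = 0; x < layer_w; x += tile_size)
            tiles.push_back({x, y, std::hypot(double(x - view_x),
                                              double(y - view_y))});
    std::sort(tiles.begin(), tiles.end(),
              [](const Tile& a, const Tile& b) { return a.distance < b.distance; });
    return tiles;
}

int main() {
    // A 2048x4096 layer with the viewport scrolled down to y = 1024:
    // tiles.front() is now the tile nearest the visible area.
    std::vector<Tile> tiles = PlanTiles(2048, 4096, 0, 1024);
    (void)tiles;
}
```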


The tile is the basic unit of rasterization. Rasterization prioritizes tiles by their distance from the visible viewport: nearby tiles are rasterized first, and faraway tiles have their raster priority lowered. The tiles are stitched together to form the layer, as shown below:



09 Raster

Once the layers are split into tiles, rasterization (Raster) is performed. What is rasterization?

Rasterization executes drawing instructions to produce the color values of pixels. There are two rasterization strategies:

  • Synchronous rasterization: rasterization and composition happen on the same thread, or are kept in step through thread synchronization.
    • Direct rasterization: the drawing instructions in every visible layer's DisplayList are executed directly for the visible region, producing pixel color values in the target Surface's pixel buffer. With fully direct rasterization there is no layer merging involved, so no subsequent composition step is needed.
    • Indirect rasterization: extra buffers may be allocated so that a layer is rasterized into its own pixel buffer, and the rendering engine then composites these per-layer pixel buffers into the target Surface's pixel buffer (View.setLayerType lets an application assign a pixel buffer to a View). Android and Flutter mainly use direct rasterization but also support the indirect form.
  • Asynchronous tiled rasterization: tiles are rasterized asynchronously, which is the strategy described in this article.
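
As a toy view of what "executing drawing instructions to produce pixel colors" means for a single tile: in the asynchronous strategy, a function like this would run on a worker thread, one task per tile (illustrative only):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A tile's pixel buffer (ARGB), plus a recorded fill-rectangle command.
struct Pixels {
    int width, height;
    std::vector<uint32_t> argb;
    Pixels(int w, int h) : width(w), height(h), argb(w * h, 0) {}
};
struct FillRect { int x, y, w, h; uint32_t color; };

// Replay recorded drawing operations into one tile's pixels.
void RasterTile(Pixels& tile, const std::vector<FillRect>& display_list) {
    for (const auto& op : display_list)
        for (int y = std::max(op.y, 0); y < op.y + op.h && y < tile.height; ++y)
            for (int x = std::max(op.x, 0); x < op.x + op.w && x < tile.width; ++x)
                tile.argb[y * tile.width + x] = op.color;
}

int main() {
    Pixels tile(256, 256);
    RasterTile(tile, {{0, 0, 256, 256, 0xFFFFFFFFu},    // white background
                      {10, 10, 50, 20, 0xFF2196F3u}});  // a blue rectangle
}
```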

Rasterization can run in one of two places:

  1. In-Process Raster
  2. Out-of-Process Raster


1. In-Process Raster (the old architecture): Skia runs in the Renderer process and is responsible for generating GL instructions, while a separate GPU process exists for GPU work. In this mode Skia cannot make rendering system calls directly: when Skia is initialized, it is handed a table of function pointers that point at a GL API (not the real OpenGL API, but a proxy provided by Chromium). The mechanism that forwards these proxied calls to the real OpenGL API is the command buffer (GpuChannelMsg_FlushCommandBuffers).



Having a separate GPU process helps isolate GL operations, which improves stability and security; this is also known as the sandbox mechanism (unsafe operations run in a separate process).

2. Out-of-Process Raster (the new architecture): drawing operations are moved into the GPU process, and Skia runs on the GPU side, which helps improve performance.

The next step is to execute the GL instructions, which is typically done by the platform's underlying shared libraries. On Windows, OpenGL calls are additionally translated into DirectX (Microsoft's graphics-acceleration API).

10 Activate

After Commit, there is an Activate operation before Draw. Raster and Draw both operate on the compositor thread's layer tree, but Raster is asynchronous: a Draw may need to happen while rasterization has not yet finished. To solve this, the layer tree is split in two:

  • Pending Tree: receives the Commit and rasterizes its layers.
  • Active Tree: rasterized layers are drawn from here.


The copying process is called Activate and looks like this:

In fact, there are four main types of Layer Tree:

  • Main-thread layer tree: cc::Layer, always present.
  • Pending tree: cc::LayerImpl, on the compositor thread, used in the rasterization phase; optional.
  • Active tree: cc::LayerImpl, on the compositor thread, used in the drawing phase; always present.
  • Recycle tree: cc::LayerImpl, on the compositor thread; it cannot coexist with the Pending tree.


The main thread's layer tree is owned by LayerTreeHost, and each layer recursively owns its children. The Pending, Active, and Recycle trees are owned by LayerTreeHostImpl. These trees are defined in the cc/trees directory. They are called trees because they were originally implemented as tree structures; today they are implemented as lists.



11 Draw



When the tiles have been rasterized, the compositor thread generates Draw Quads for each tile and encapsulates them in a CompositorFrame object. The CompositorFrame is the output of the Render process and is submitted to the GPU process; the frames in "60 fps" refer to Compositor Frames.



The Draw operation is the process of generating Draw Quads from the rasterized tiles.

Display Compositor

12 Display

Related documents

  • Viz Doc

After the Draw operation completes, the generated Compositor Frames are output to the GPU process, which receives Compositor Frames from multiple sources across multiple Render processes:

  • The Browser process has its own Compositor that also generates Compositor Frames, typically for drawing the browser UI (navigation bar, windows, and so on).
  • Each newly created tab, and each iframe, gets its own separate Render process.

The Display Compositor runs on the Viz Compositor thread; it issues OpenGL commands to draw the Draw Quads inside each Compositor Frame, outputting pixels to the screen.



What is Viz?

Viz, short for Visuals, is an important part of Chromium's overall move toward a service-oriented architecture; it covers Compositing, GL, Hit Testing, Media, VR/AR, and many other functions.

Viz also uses double buffering: Draw Quads are drawn into the back buffer, and then a swap command is executed so that they finally appear on the screen. What is double buffering?

If rendering used only a single buffer for both reading and writing, the screen would have to wait to read while the GPU waits to write, resulting in poor performance. The natural idea is to separate reading from writing:

  • Front Buffer: The screen reads frame data from the Front Buffer for output display.
  • Back Buffer: The GPU writes frame data to the Back Buffer.

The two buffers do not copy data between each other (for performance reasons). Instead, once the back buffer has been fully written and the front buffer has been fully read, the pointers are simply swapped: the front becomes the back and the back becomes the front. So when should the swap happen? If the back buffer is ready but the screen has not finished with the front buffer, swapping would cause problems, so we must wait for the screen to finish. After the screen finishes scanning a frame, it returns to the first line to begin the next refresh; the Vertical Blank Interval in between is the time to swap. This mechanism is known as VSync.
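
A minimal sketch of the pointer swap (note that no pixels are copied):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// The GPU writes into the back buffer while the display scans out the
// front buffer; the swap exchanges pointers, never pixels.
struct FrameBuffer { std::vector<uint32_t> pixels; };

struct SwapChain {
    FrameBuffer a, b;
    FrameBuffer* front = &a;  // read by the display
    FrameBuffer* back = &b;   // written by the GPU

    void SwapOnVSync() {
        // Called during the vertical blank interval, when the display
        // is not reading the front buffer.
        std::swap(front, back);
    }
};

int main() {
    SwapChain chain;
    chain.back->pixels.assign(1920 * 1080, 0xFF000000u);  // render a frame
    chain.SwapOnVSync();  // the rendered frame becomes visible
}
```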

At this point, the rendering process is complete, and the front-end code becomes pixels that the user can interact with.