How can the front-end architecture of a search engine handling billions of visits be upgraded?

As large front ends grow more complex, many companies have begun to separate the front end from the back end and to design the front-end architecture on its own. So what is front-end architecture design? What problems arise as a once very simple front-end architecture evolves? And how can problems such as a huge front-end codebase, inefficient cross-team collaboration, code coupling, and an outdated technology stack be solved?

1. What is front-end architecture?

Many people define the term "front-end architecture" differently. Splitting the term into its parts, I understand it as "front end" + "architecture". The front end refers to the Web pages themselves, including their content, styles, and scripts, which are usually encapsulated in components: file modules of a template engine, or components in an MVVM framework. "Architecture" is easier to understand. The word comes from the construction industry, where it means the overall structure and framework of a building. Combining the two, "front-end architecture" can be understood as the abstraction and organization of the components of Web pages.

Because each company's business differs, so does the evolution of its front-end architecture. Here I will use Baidu's classic mobile search scenario as an example, in the hope of identifying some common problems from the evolution of Baidu's mobile architecture.

2. Background and problems of Baidu's mobile search

Why Baidu? Because Baidu is China's leading search engine and has long led the industry. According to StatCounter and 2019 research on China's search engine industry, Baidu accounted for 12.3% of the global search engine market, second only to Google. That makes Baidu a representative example.

Back to the topic. If you open the Baidu App, you will find that Baidu's front end is divided into the home page and the search results page. The results page is the main entry point of search and handles on the order of a billion visits every day.

Beyond that, the results page carries the requirements of many product lines and the runtimes of downstream modules. Each year, the team delivers more than 500 product requirements and provides base libraries and runtimes for more than a dozen downstream modules. It also collaborates with back-end teams. Figure 1 shows the overall architecture of the results page.

Figure 1: The overall structure of the Baidu search results page

This architecture faced the following problems:

Many fine-grained lines of business share one huge codebase:
On average, 200+ commits and 30,000+ lines of code per month;
80+ developers work in the same codebase;
No one can fully master the module's overall technology.

These problems fall into three categories:

Unclear responsibilities: a single module carries the responsibilities of several teams at once:

Search box and tabs: shared by the "All" results and adjacent searches;
Operational products: scattered throughout the results page codebase;
Others: result lists, user feedback, search recommendations, experience logs, speed logs, billing logic, and so on.

Severe code coupling:

Error-prone, fragile code logic;
A rigid structure that makes new features hard to add;
Strong dependencies that make code hard to reuse.

Outdated technology stack:

Pages are not componentized: no Vue or React, still Smarty templates;
No Node.js support: Smarty templates depend heavily on the PHP environment;
Outdated toolchain: no TypeScript, no Jest.

These three problems ultimately hurt R&D efficiency and product quality. So how did Baidu tackle them? Architecture optimization has only two goals. The first is to meet business needs. The second is to adopt frameworks and tools flexibly from a technical perspective, which also serves to keep meeting business needs. Toward the goal of meeting business needs, Baidu set directions at three levels (Figure 2):

The bottom, infrastructure layer stays close to the community. Internal research found that building your own wheels is cheap, but maintaining them is very expensive. To iterate faster, it is better to follow the community: use open-source tools or contribute to open source. This layer mainly addresses the outdated technology stack and unclear responsibilities.
The middle layer splits the code into independent modules, addressing the unclear responsibilities and low delivery efficiency mentioned earlier.
The top layer is componentization, built on top of the independent modules to accelerate business iteration.

Figure 2: The three directions for meeting business needs

3. How to solve the problems

How can these directions and goals be applied to Baidu's own architecture? First, let's review Baidu's existing architecture, shown in Figure 3.

Figure 3: The overall structure of the Baidu search results page

There are two copies of the logging code, so the same functionality is maintained in two places. Besides the duplication, the differences between them raise the cost of later maintenance;
The underlying HHVM + PHP stack conflicts with the community's broader embrace of Node.js.

Therefore, the Baidu team adjusted the target architecture as shown in Figure 4.

Figure 4: The target architecture of the results page

As you can see in Figure 4:

Logging, the search box, related searches, performance, and other functions become independent modules, each maintained and iterated independently by a dedicated team;
A rendering layer is added between the front end and the back end, separating business code from back-end logic;
Node.js support is added at the bottom layer.

With the goals and directions settled, the next question is implementation. For a small codebase, you could rebuild the architecture from scratch. At Baidu's scale, implementation is hard: besides a smooth migration with no performance regression, long-term maintainability, security, cross-platform support, and more must be considered.

As mentioned above, the basic approach has three steps: infrastructure, module splitting, and componentization. Infrastructure is the key to splitting the business into modules, because mature automation and a complete toolchain are prerequisites for modularization. Module splitting gives businesses and teams better horizontal scalability. On top of modularization, a componentization scheme can be built within each module to accelerate business iteration.

Key points for the infrastructure include:

TypeScript: essential for catching problems early in large projects, and also the foundation for cross-platform support;
Continuous integration: ensures that no new feature or bug fix introduces new problems;
Unit tests: introduced at the start of the refactoring to help prevent regressions and guide the design.

Key points for module splitting include:

Identifying and defining business boundaries, and splitting the single repository into several independent, smaller repositories;
Building automation into each submodule so that it can be developed, tested, and released independently.

Note:

Module splitting is not a technical issue but a business one. Decoupling and independent iteration are possible only when there is a vertical division by business and product. Otherwise, merely splitting coupled code on the surface will increase maintenance and communication costs.

Since componentization is a choice made within each business module, the scheme is relatively flexible, as long as it does not seriously hurt performance and the transition is smooth.

4. Implementation plan

1. Modularization

Figure 5 shows the concrete implementation plan. It has two parts: the server side and the browser side.

The server side is concerned with dividing the business into modules and composing them at runtime;
The browser side is concerned with resolving dependencies and supporting the componentization scheme.

Figure 5: The concrete implementation plan

2. The server side

Baidu split the one large module into a number of independent business modules, and the final page is composed from these modules. This requires every business module to expose a unified interface, the Molecule interface shown above, which defines how the module is rendered, what it depends on, and so on. Because rendering is encapsulated within each module, the architecture can support multiple languages and frameworks.
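The article does not publish the Molecule interface itself, so the TypeScript below is only a sketch under stated assumptions. The names `MoleculeModule` and `composePage`, the module names, and the data fields are all hypothetical. The sketch shows the idea: each module declares its dependencies and keeps its rendering private, so the page only composes opaque HTML outputs.

```typescript
// Hypothetical Molecule-style contract; the real interface is not shown in the article.
interface MoleculeModule {
  name: string;
  // Names of shared services the module needs (declared, never imported directly).
  dependencies: string[];
  // Rendering is encapsulated: callers only see data in, HTML out.
  render(data: Record<string, unknown>): string;
}

// Minimal HTML escaping so that user input is not injected as markup.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Module A renders with a plain string template.
const searchBox: MoleculeModule = {
  name: "search-box",
  dependencies: ["log"],
  render: (data) =>
    `<form><input name="wd" value="${escapeHtml(String(data.query ?? ""))}"></form>`,
};

// Module B could wrap any framework internally; a small function stands in for it here.
const resultList: MoleculeModule = {
  name: "result-list",
  dependencies: ["log", "event"],
  render: (data) => {
    const raw = data.results;
    const items: unknown[] = Array.isArray(raw) ? raw : [];
    return `<ol>${items.map((r) => `<li>${escapeHtml(String(r))}</li>`).join("")}</ol>`;
  },
};

// The page is just the composition of the modules' rendered outputs.
function composePage(modules: MoleculeModule[], data: Record<string, unknown>): string {
  return modules.map((m) => m.render(data)).join("");
}
```

Suppose `composePage([searchBox, resultList], { query: "san", results: ["a", "b"] })` is called. It produces the search box followed by a two-item list, and neither module knows how the other renders.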

You may have noticed that Molecule is very similar to microservices. The key difference is where the units run. Microservices communicate with each other through IPC, and each service can be scaled and deployed independently. Molecule's modules, by contrast, run in the same process. Despite this difference, Molecule can still provide nearly the same characteristics as microservices, as shown in Figure 6.

Figure 6: Comparison between Molecule and microservices

Figure 7 shows the server entry file of a specific business module, in which ToptipController implements the controller interface provided by Molecule. This interface requires a render function that accepts dictionary-type data and returns the rendered content. How the page is assembled is up to the caller.

Figure 7: Server-side entry file for a specific business module
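Figure 7 is an image, so its code is not reproduced here. The TypeScript below is a rough sketch of what such an entry file might contain. It assumes a controller interface whose only method is `render`, taking dictionary-type data and returning HTML. The names `MoleculeController` and `escapeText`, and the `tip` data field, are hypothetical.

```typescript
// Hypothetical controller interface in the spirit of the one Molecule provides.
interface MoleculeController {
  // Accepts dictionary-type data and returns the rendered HTML for this module.
  render(data: Record<string, unknown>): string;
}

// Escape text before embedding it in HTML.
function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Entry point of the "toptip" business module (its data shape is assumed).
class ToptipController implements MoleculeController {
  render(data: Record<string, unknown>): string {
    const raw = data.tip;
    const tip = typeof raw === "string" ? raw : "";
    // Render nothing when there is no tip; the caller decides what to do with the empty slot.
    if (tip === "") {
      return "";
    }
    return `<div class="toptip">${escapeText(tip)}</div>`;
  }
}

// The caller only knows the interface, not how the module renders.
const controller: MoleculeController = new ToptipController();
```

Because the caller holds only a `MoleculeController`, the module could switch its internal rendering to a template engine or a component framework without any change on the page side.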

That is the interface on the module provider's side. In addition, Molecule provides a convenient interface for the caller, the side that assembles the final page. Wherever a submodule is needed, the caller renders it at runtime by passing in its name and parameters. The mechanism is simple in principle, but in practice you may need to introduce namespaces, consider module versions, and so on.
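The caller-side mechanism can be pictured as a registry keyed by namespace, module name, and version. The TypeScript below is a hypothetical illustration, not Molecule's actual API. `ModuleRegistry`, `renderSubmodule`, and the `namespace/name@version` key format are all assumptions.

```typescript
type RenderFn = (params: Record<string, unknown>) => string;

// Hypothetical registry: modules register under a namespace, a name, and a version.
class ModuleRegistry {
  private modules: Record<string, RenderFn> = {};

  private key(namespace: string, name: string, version: string): string {
    return `${namespace}/${name}@${version}`;
  }

  register(namespace: string, name: string, version: string, render: RenderFn): void {
    const k = this.key(namespace, name, version);
    // Namespacing avoids clashes between teams that pick the same module name.
    if (Object.prototype.hasOwnProperty.call(this.modules, k)) {
      throw new Error(`Module already registered: ${k}`);
    }
    this.modules[k] = render;
  }

  // Called where the page needs a submodule: pass its name and parameters at runtime.
  renderSubmodule(
    namespace: string,
    name: string,
    version: string,
    params: Record<string, unknown>
  ): string {
    const k = this.key(namespace, name, version);
    if (!Object.prototype.hasOwnProperty.call(this.modules, k)) {
      throw new Error(`Unknown module: ${k}`);
    }
    return this.modules[k](params);
  }
}

// Two versions of the same module can coexist, e.g. during a migration.
// (HTML escaping is omitted here for brevity.)
const registry = new ModuleRegistry();
registry.register("search", "toptip", "1", (p) => `<div>${String(p.text)}</div>`);
registry.register("search", "toptip", "2", (p) => `<div class="toptip">${String(p.text)}</div>`);
```

The version is part of the lookup key, so a caller pins exactly the version it was built against. An unknown name fails loudly instead of silently rendering nothing.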

3. The client side

So how does the client side work? The browser-side components of each module also need to be started. The difficulty lies in the dependencies and code sharing between components. Components may live in different codebases and belong to different businesses, so the dependency mechanism must be very loosely coupled.

Our answer is a dependency injection container (Figure 8). In short, framework logic and common utilities are encapsulated as services for business modules to use, and each business module declares which services it depends on.

Figure 8: Client-side design

Figure 9 graphically depicts the relationships between components, services, and containers.

Figure 9: Relationships among components, services, and containers

Blue represents services and the other colors represent individual business modules. The runtime container resolves the dependencies of each business module and assembles everything into an interactive Web page.
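The TypeScript below is a minimal sketch of such a runtime container, not Baidu's code. It assumes that services are identified by string names and created lazily as shared singletons. The names `Container`, `ModuleDefinition`, `bootstrap`, and `LogService` are hypothetical.

```typescript
type Factory<T> = (container: Container) => T;

// Hypothetical runtime container: holds service factories and shared instances.
class Container {
  private factories: Record<string, Factory<unknown>> = {};
  private instances: Record<string, unknown> = {};

  // Framework code registers common services (the blue boxes in Figure 9).
  registerService<T>(name: string, factory: Factory<T>): void {
    this.factories[name] = factory;
  }

  // Create each service on first use and cache it, so all modules share one instance.
  resolve<T>(name: string): T {
    if (!Object.prototype.hasOwnProperty.call(this.instances, name)) {
      if (!Object.prototype.hasOwnProperty.call(this.factories, name)) {
        throw new Error(`Service not registered: ${name}`);
      }
      this.instances[name] = this.factories[name](this);
    }
    return this.instances[name] as T;
  }
}

// A business module declares the services it needs by name and receives them in order.
interface ModuleDefinition {
  name: string;
  deps: string[];
  create(services: unknown[]): { mount(): string };
}

// The container resolves every module's dependencies, then starts the modules.
function bootstrap(container: Container, modules: ModuleDefinition[]): string[] {
  return modules.map((m) => m.create(m.deps.map((d) => container.resolve<unknown>(d))).mount());
}

// Example common service.
class LogService {
  lines: string[] = [];
  send(message: string): void {
    this.lines.push(message);
  }
}

const container = new Container();
container.registerService("log", () => new LogService());

// Example business module that depends only on the "log" service.
const toptip: ModuleDefinition = {
  name: "toptip",
  deps: ["log"],
  create([log]) {
    const logger = log as LogService;
    return {
      mount() {
        logger.send("toptip mounted");
        return "toptip ready";
      },
    };
  },
};
```

Calling `bootstrap(container, [toptip])` starts the module. Any other module that lists `"log"` in its `deps` receives the very same `LogService` instance.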

Note:

Business modules are independent of each other: a business module cannot depend on other business modules, only on common services. Therefore, if product logic couples two business modules, a common service may be needed as an intermediary. One example is an EventService provided by the container that acts as an event bus.
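The following TypeScript sketch shows what such an EventService could look like as a plain publish/subscribe bus. The method names `on` and `emit`, the event name, and the two modules in the usage are hypothetical.

```typescript
type Handler = (payload: unknown) => void;

// Hypothetical EventService: a common service acting as an event bus between modules.
class EventService {
  private handlers: Record<string, Handler[]> = {};

  // Subscribe to an event; returns a function that removes the subscription.
  on(eventName: string, handler: Handler): () => void {
    if (!Object.prototype.hasOwnProperty.call(this.handlers, eventName)) {
      this.handlers[eventName] = [];
    }
    const list = this.handlers[eventName];
    list.push(handler);
    return () => {
      const i = list.indexOf(handler);
      if (i >= 0) {
        list.splice(i, 1);
      }
    };
  }

  // Notify all current subscribers; copy the list so handlers may unsubscribe safely.
  emit(eventName: string, payload?: unknown): void {
    const list = this.handlers[eventName];
    if (list === undefined) {
      return;
    }
    list.slice().forEach((h) => h(payload));
  }
}

// Two modules coupled only through the shared service, never through each other.
const events = new EventService();
const received: string[] = [];
// The "result list" module reacts to query changes...
const unsubscribe = events.on("query:change", (q) => {
  received.push(String(q));
});
// ...which the "search box" module publishes.
events.emit("query:change", "san ssr");
```

The search box never imports the result list, and vice versa. Only the event name and payload shape form the contract between them, which the container-provided service mediates.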

Figure 10 shows example client code for a business module. Its dependencies are declared through the constructor. The runtime container creates the dependencies, and the business module only uses them. Separating use from creation decouples business modules from each other and from the page framework, enabling independent development and independent testing.

Figure 10: Client-side code example for a business module
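Figure 10 is an image, so its code is not reproduced here. The TypeScript below is a sketch of constructor injection and of why it enables independent testing. The names `FeedbackModule`, `LogService`, `StorageService`, and the `feedback-disabled` flag are hypothetical. In a unit test, hand-written fakes replace the services the container would normally create.

```typescript
// Hypothetical service interfaces a business module might depend on.
interface LogService {
  send(eventName: string): void;
}
interface StorageService {
  get(key: string): string | null;
}

// The module declares dependencies in its constructor; it never creates them itself.
class FeedbackModule {
  private log: LogService;
  private storage: StorageService;

  constructor(log: LogService, storage: StorageService) {
    this.log = log;
    this.storage = storage;
  }

  // Returns false when feedback is switched off by a (hypothetical) storage flag.
  submit(text: string): boolean {
    if (this.storage.get("feedback-disabled") === "1") {
      return false;
    }
    this.log.send(`feedback:${text}`);
    return true;
  }
}

// In a unit test, simple fakes stand in for the real services.
const sent: string[] = [];
const fakeLog: LogService = {
  send: (e) => {
    sent.push(e);
  },
};
const fakeStorage: StorageService = { get: () => null };
const feedback = new FeedbackModule(fakeLog, fakeStorage);
```

Because `FeedbackModule` depends only on interfaces, the test needs neither the page framework nor any other business module.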

That is the overall module-splitting scheme. To recap: on the server side, business modules are composed through an interface called Molecule; on the browser side, dependencies are resolved through a DI container, which then starts all business modules.

4. Componentization

The componentization scheme directly affects the efficiency of business development; to some extent it determines what code business developers write. Componentization also helps with problems such as unclear responsibilities. We chose San as our component framework, but you could choose Vue or React based on your business needs or preferences. Migrating business code is straightforward: from Smarty templates to San components, and from HTML string concatenation to component structures with business semantics.

Next, we focus on two key technical issues of the componentization scheme: cross-platform support and page performance.

1) Cross-platform

We have a lot of business code (thousands of templates and hundreds of thousands of lines of code) to migrate to components. We also need to make sure it will not have to be rewritten when the back end migrates from PHP to Node.js. So how can business components work across platforms? The key is abstraction.

High-level language: business code must be written in a sufficiently high-level language, TypeScript in our case, so that it can be compiled for multiple platforms;
Dependency inversion: high-level business modules should not depend on specific low-level modules, only on interfaces, so that the low-level implementation can be swapped on different platforms;
Abstract interface: the Molecule interface must be simple enough and must not depend on any underlying implementation, such as PHP-specific APIs.
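Dependency inversion can be illustrated with a small TypeScript sketch. All names in it are hypothetical: `RequestContext`, the two adapter classes, and `renderResultTitle`. The `wd` query parameter is used purely for illustration. Both runtimes are simulated with plain objects rather than real PHP or Node.js request APIs. The business function depends only on the interface, so moving from PHP to Node.js only means supplying a different adapter.

```typescript
// Hypothetical platform-neutral interface the business code depends on.
interface RequestContext {
  getQuery(name: string): string | undefined;
  getHeader(name: string): string | undefined;
}

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// High-level business logic: knows nothing about PHP or Node.js.
function renderResultTitle(ctx: RequestContext): string {
  const query = ctx.getQuery("wd") ?? "";
  const ua = ctx.getHeader("User-Agent") ?? "";
  const device = ua.indexOf("Mobile") >= 0 ? "mobile" : "desktop";
  return `<h1 class="${device}">Results for ${escapeHtml(query)}</h1>`;
}

// Low-level adapter for a PHP-style runtime, simulated with superglobal-like maps.
class PhpStyleContext implements RequestContext {
  private getVars: Record<string, string>;
  private serverVars: Record<string, string>;

  constructor(getVars: Record<string, string>, serverVars: Record<string, string>) {
    this.getVars = getVars;
    this.serverVars = serverVars;
  }

  getQuery(name: string): string | undefined {
    return this.getVars[name];
  }

  // PHP exposes request headers as HTTP_* keys, e.g. User-Agent -> HTTP_USER_AGENT.
  getHeader(name: string): string | undefined {
    return this.serverVars["HTTP_" + name.toUpperCase().replace(/-/g, "_")];
  }
}

// Low-level adapter for a Node.js-style runtime, where header names are lower-cased.
class NodeStyleContext implements RequestContext {
  private query: Record<string, string>;
  private headers: Record<string, string>;

  constructor(query: Record<string, string>, headers: Record<string, string>) {
    this.query = query;
    this.headers = headers;
  }

  getQuery(name: string): string | undefined {
    return this.query[name];
  }

  getHeader(name: string): string | undefined {
    return this.headers[name.toLowerCase()];
  }
}
```

The same `renderResultTitle` produces identical HTML through either adapter, which is the property that lets business components survive a change of back-end platform.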

With all of the above in place, the transition can be smooth. It is divided into three phases (Figure 11).

Figure 11: Three stages of platform transition

2) Page performance

Introducing a front-end framework usually means larger bundles and lower performance, which directly affects search revenue, so page performance is key to the project's success. If performance were worse than the template engine's, the project would likely be killed. How can page performance be guaranteed? We emphasize two optimizations:

Introducing SSR: server-side rendering significantly improves first-screen performance;
Optimizing SSR: traditional SSR needs further performance optimization.

Introducing SSR. Figure 12 illustrates why SSR matters. The browser loads a page in four steps: request the page, request external resources, execute scripts, and render components. As the comparison shows, with CSR the user sees nothing during the first three steps. With SSR, the user can already see the requested page at the second step. One of SSR's biggest benefits is a faster first-screen time.

Figure 12: Comparison between CSR and SSR

Optimizing SSR. Simply introducing SSR does not deliver the expected performance. Template engines concatenate strings directly, whereas SSR must render components recursively, in particular by recursing over VNodes. San's SSR makes several improvements over Vue/React SSR:

Eliminating VNodes: VNodes are traversed at compile time, and only HTML string concatenation happens at runtime;
Compile-time computation: as much work as possible is moved to compile time to reduce runtime overhead.
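The TypeScript below is a heavily simplified illustration of these two ideas. It is not San's actual SSR compiler, and the `TNode` template format is hypothetical. `renderTree` walks the tree on every request, as a VNode-based renderer does. `compile` walks the tree once, merges adjacent static strings at compile time, and returns a function that only concatenates strings at runtime.

```typescript
// Hypothetical template tree: an element, static text, or a data-bound expression.
type TNode =
  | { kind: "element"; tag: string; children: TNode[] }
  | { kind: "text"; value: string }
  | { kind: "expr"; key: string };

type Data = Record<string, string>;

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Runtime approach: recurse over the tree on every render.
function renderTree(node: TNode, data: Data): string {
  if (node.kind === "text") return escapeHtml(node.value);
  if (node.kind === "expr") return escapeHtml(data[node.key] ?? "");
  return `<${node.tag}>${node.children.map((c) => renderTree(c, data)).join("")}</${node.tag}>`;
}

// A compiled template is a flat list of static strings and dynamic slots.
type Part = string | ((data: Data) => string);

function compile(root: TNode): (data: Data) => string {
  const parts: Part[] = [];
  // Compile-time computation: merge adjacent static strings into one.
  const emitStatic = (s: string): void => {
    const last = parts[parts.length - 1];
    if (typeof last === "string") {
      parts[parts.length - 1] = last + s;
    } else {
      parts.push(s);
    }
  };
  // The tree is traversed once, here, instead of on every request.
  const walk = (node: TNode): void => {
    if (node.kind === "text") {
      emitStatic(escapeHtml(node.value));
    } else if (node.kind === "expr") {
      const key = node.key;
      parts.push((d) => escapeHtml(d[key] ?? ""));
    } else {
      emitStatic(`<${node.tag}>`);
      node.children.forEach(walk);
      emitStatic(`</${node.tag}>`);
    }
  };
  walk(root);
  // Runtime: no tree at all, just string concatenation over the flat list.
  return (data) => {
    let html = "";
    for (const p of parts) html += typeof p === "string" ? p : p(data);
    return html;
  };
}

const tpl: TNode = {
  kind: "element",
  tag: "div",
  children: [
    { kind: "element", tag: "h3", children: [{ kind: "text", value: "Results" }] },
    { kind: "element", tag: "p", children: [{ kind: "expr", key: "query" }] },
  ],
};
const renderCompiled = compile(tpl);
```

Both functions produce the same HTML. The compiled version reduces this template to three parts: a static prefix, one dynamic slot, and a static suffix. This sketch only shows why the two optimizations help, not how large the gain is in practice.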

Figure 13 compares the performance of the final San SSR with that of the Smarty template engine used before the revamp.

Figure 13: Performance comparison between the final San SSR and the Smarty template engine used before the revamp

Smarty and San SSR perform differently in different scenarios because they render in very different ways. After the componentized SSR of the search results page launched, online experiments showed that it was about 10 ms faster than Smarty. This is a very good result: with componentization, we beat the template engine on performance.

5. Conclusion

The problems Baidu Search encountered as its architecture evolved likely have counterparts in other fields. I hope Baidu's solution offers some inspiration to those of you working on front-end architecture.

Harttle

He holds a bachelor's degree in physics and a master's degree in computer science from Peking University. He joined Baidu in 2016 and has led or taken part in developing Baidu Search's web speed-browsing framework and the MIP open-source project. He is currently responsible for the search results page and the search recommendation business. He is the author of LiquidJS and has contributed to San, RealWorld Apps, highlight.js, ALE, the HTML5 standard, and more.

Recruitment information

The Baidu Search front-end team develops the interactions of baidu.com and the underlying frameworks for its various scenarios. It is committed to letting people access information equally and conveniently and find what they are looking for. We are a group of post-90s engineers who value originality, kindness, and the pursuit of perfection, and the team's ratio of men to women is 1:1. We give outstanding people plenty of freedom. Welcome to join: [email protected].

The original link: https://mp.weixin.qq.com/s/1y…
