Since last year, the user products on Baidu’s MEG (Mobile Eco-Business Group) platform have been gradually transformed into cloud native products, and that transformation is now essentially complete. Attention has shifted to building dynamic elastic capacity and dynamic management mechanisms. In this issue of “Geek Talents”, we invited Chuanyu from the Recommendation Technology Architecture Department to talk about the phased strategy behind the cloud native transformation of Baidu’s search and recommendation engines, and his thinking on where it goes next.

About the guest: Chuanyu

Since 2012, he has focused on search engines and recommendation engines. In 2016, he took charge of the research and development of Baidu’s in-house resource scheduling and container scheduling systems. In 2019, he took charge of the research and development of some common basic components and began to comprehensively promote the cloud native architecture transformation within MEG user products.

01 A core focus on two kinds of “efficiency” to reduce costs and increase efficiency

“I agree that the goal of cloud native is to make the whole process of developing applications and bringing them online easier. Baidu’s search and recommendation engines are both large and complex, especially search, which has a history of more than 20 years and cannot be redesigned from scratch around a new set of concepts. So when we adopt cloud native, we have to combine it with our own genes and our current situation.”

The main value of cloud native architecture lies in the improvement of efficiency, including resource utilization efficiency and R&D efficiency.

From the perspective of resource utilization efficiency, the aim is to fully co-locate online services through containerization and dynamic resource scheduling, so that resource utilization across the cluster becomes more balanced and more efficient, reducing overall machine cost. In addition, delivering resources through the cloud platform as resource quotas rather than as physical machines speeds up resource turnover, so that internal resources can be moved to key businesses more quickly to support their rapid development.

In terms of R&D efficiency, on the one hand, microservice transformation decouples the iteration of different business teams and reduces the influence teams have on one another, which improves the efficiency of business iteration. On the other hand, the aim is to push common infrastructure capabilities down into the cloud native infrastructure, raising the baseline of architectural capability for new businesses.

Take local fault tolerance as an example. Some business lines already have very mature architectural mechanisms for it, but new business lines find them hard to reuse directly. They often have to learn the hard way and rebuild those mechanisms by borrowing from the experience of the mature lines. If these common capabilities can be standardized and built into the cloud native infrastructure, innovative business lines can reuse them with little effort, take fewer detours, and accumulate as little technical debt as possible.

In addition, the cloud native architecture improves R&D efficiency by reducing the manpower and time spent on handling online problems and on maintenance.

There is a common saying in the industry: whether a storage system is any good depends mainly on how well it is operated and maintained.

In fact, this holds not only for storage systems but for many innovative projects as well. If too many people are tied up operating online services and resolving online problems, less manpower is left for research and development, and development speed may suffer.

With cloud native infrastructure, we can standardize and automate various routine operations, such as automatic repair of machine failures, automatic migration of failed service instances, and automatic adjustment of service capacity. On the one hand, this reduces the labor cost of operation and maintenance. On the other hand, in many cases an automated system does a better job than a human.

“We had automation before. The benefit of a cloud native architecture is that it gives us a mechanism for doing this automation in a more formal, standard, and sustainable way, and for freeing up a large amount of manpower from maintaining online services. With the team size unchanged, less manpower goes to maintenance and more goes to research and development, which raises overall R&D efficiency.”

Overall, the greatest significance of cloud native lies in improving efficiency and raising the baseline for R&D as a whole.

Especially when building a new product, it saves on the cost of purchasing resources and avoids heavy manpower investment in basic groundwork just to get the product launched smoothly. The lower the cost, the more room there is to innovate. This keeps each new product from losing at the starting line.

02 Standardizing service design to set the rules for cloud native transformation

The MEG architecture platform was fully moved to the cloud in 2019. However, for most services the migration only changed the deployment mode, from physical machine deployment to container deployment on the PaaS platform. The services were not adapted or upgraded to take advantage of the cloud environment and cloud capabilities, so the larger cost and efficiency benefits were not obtained. To address this, the service design standards of the MEG architecture platform are being further standardized, in order to move from an architecture that merely runs on the cloud to a cloud native architecture.

“We already had a certain foundation before carrying out the cloud native transformation. First, the whole organization had embraced the idea of microservices. Second, a series of microservice best-practice standards had been developed in practice, and the ‘MEG User Product Architecture Cloud Native Internal Specification’ had been established. Third, we already had a set of public infrastructure.”

Chuanyu drew on the characteristics of cloud native applications that are widely recognized in the industry, combined with Baidu’s pioneering internal practice. To ensure that the cloud native architecture is adopted efficiently and effectively, the design of service modules is standardized in the following three aspects:

1. Microservices: the granularity of each service should be kept within a limited scope;

2. Container encapsulation: the deployment of a service should rely only on the infrastructure and on components inside the container, and should not rely on other business services;

3. Dynamic management: each service should be able to adjust its deployment dynamically without affecting its external SLA commitments.

Evaluation methods of each specification:

Overall evaluation method for a business:

1. Services that have not been moved onto PaaS are counted as not meeting the standard;

2. Compliance is assessed per service. A service is considered to meet the cloud native architecture specification only when it meets all of the above standards at the same time;

3. For each line of business, the cloud native specification index is calculated as a percentage: (quota occupied by service modules that conform to the specification / total quota). The ratio is computed for both CPU quota and MEM quota, and the lower of the two is used;

4. The individual scores serve only as internal reference indicators, used to analyze bottlenecks and drive adoption.
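The scoring rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Service` record, its field names, and the `cloud_native_index` function are hypothetical and are not taken from the MEG specification.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical record for one service module (field names are assumptions)."""
    name: str
    cpu_quota: float
    mem_quota: float
    on_paas: bool                 # rule 1: not on PaaS means not compliant
    microservice: bool
    container_encapsulated: bool
    dynamically_managed: bool

    def compliant(self) -> bool:
        # Rule 2: every standard must hold at the same time.
        return (self.on_paas and self.microservice
                and self.container_encapsulated and self.dynamically_managed)


def cloud_native_index(services: list[Service]) -> float:
    """Rule 3: percentage of quota held by compliant services, lower of CPU and MEM."""
    total_cpu = sum(s.cpu_quota for s in services)
    total_mem = sum(s.mem_quota for s in services)
    if total_cpu == 0 or total_mem == 0:
        return 0.0
    ok = [s for s in services if s.compliant()]
    cpu_ratio = sum(s.cpu_quota for s in ok) / total_cpu
    mem_ratio = sum(s.mem_quota for s in ok) / total_mem
    return 100.0 * min(cpu_ratio, mem_ratio)
```

For example, if the compliant services of a business line hold 60% of its CPU quota but only 20% of its memory quota, the index is 20, because the lower proportion is the one that counts.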

03 Key points: the phased path to cloud native transformation

Moving from simply being on the cloud to being cloud native is a very complicated process. After the cloud native transformation specification was established, the work went through four stages in turn: microservice transformation, containerization, dynamic management, and advanced cloud native. MEG’s cloud native transformation has not stopped there; it continues to explore a fifth stage, declarative architecture.

The first stage: micro-service transformation

At first, when the Baidu MEG architecture platform fully moved to the cloud, all architecture services and resources were hosted on the internal cloud platform, but resource utilization was still a problem. The first thing the MEG architecture platform did when adopting cloud native was to require every service to undergo microservice transformation and to eliminate giant services.

“These giant services fragment the overall resource allocation. For example, if a service occupies 80% of a machine’s resources, the remaining 20% may not be allocatable to anything else, so the service ends up with the machine to itself and cannot be co-located with others. There were also some services that required modifications to the whole machine’s environment before they could be deployed.

So even though all the resources were hosted on the cloud platform, we were still using them much as if we were using the machines directly. OP invested a lot of effort, and overall online resource utilization, including the resource allocation rate, was relatively low.”
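The fragmentation effect described in the quote can be illustrated with a toy packing calculation. The sketch below uses a first-fit decreasing heuristic purely for illustration; it is not Baidu’s scheduler, and the service sizes (percentages of one machine) are made-up numbers.

```python
def machines_needed(demands: list[int], capacity: int = 100) -> int:
    """First-fit decreasing: place each service on the first machine with room."""
    free: list[int] = []  # remaining capacity of each machine already in use
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError("a service does not fit on a single machine")
        for i, room in enumerate(free):
            if room >= demand:
                free[i] -= demand
                break
        else:
            # No existing machine has room, so a new machine is opened.
            free.append(capacity - demand)
    return len(free)


# Three giant services at 80% of a machine, plus three services at 40%.
giant = [80, 80, 80, 40, 40, 40]
# The same load after each giant service is split into four 20% microservices.
split = [40, 40, 40] + [20] * 12
```

With these numbers the giant layout needs five machines, because the 20% left over beside each giant service cannot hold a 40% service. The split layout carries the same total load on four machines.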

Splitting into microservices brought several changes. First, performance improved.

Although there is slightly more RPC overhead, after the split each service can be optimized for its own purpose, and scale-out can be done for just that service’s logic. As a result, overall performance, including cost and latency, improved significantly.

Second, R&D efficiency improved.

Under the original model of product and strategy iteration, a single service often needed hundreds of people working on it together, so launches took a long time. After the split there are dozens of modules, but each module needs only two or three people to iterate on it and can go online at any time. This matters a great deal for overall R&D efficiency.

“For example, our Feed video recommendation service was a giant service before it was split, and it iterated a lot. A single instance used 450 CPU and 100G of memory; more than 40 strategy RDs worked on the single module, and more than 10 features went online every day. As a result, there was a lot of resource fragmentation, launches took a long time, and memory iteration was difficult.

We did three things:

First, following the recommendation business logic, we split it into two layers: aggregation and recall.

Second, we split the recall layer into several parallel recall services according to the recall algorithm, and some of the recall services are allowed to fail without failing the request.

Third, we split the aggregation layer into a machine learning estimation service, which is accelerated with the AVX instruction set, and an aggregation service.”
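The second step, parallel recall services where some may be lost, can be sketched with `asyncio`. This is a minimal illustration, not the Feed service’s actual code: the function names, the 50 ms budget, and the idea of passing recall services as a dictionary of coroutines are all assumptions.

```python
import asyncio


async def recall_with_budget(recall_fn, query, timeout_s):
    """Run one recall service; return [] if it fails or exceeds its time budget."""
    try:
        return await asyncio.wait_for(recall_fn(query), timeout=timeout_s)
    except Exception:
        # Lossy recall: a missing source degrades the result, not availability.
        return []


async def aggregate(query, recall_services, timeout_s=0.05):
    """Fan out to all recall services in parallel and merge whatever comes back."""
    results = await asyncio.gather(
        *(recall_with_budget(fn, query, timeout_s)
          for fn in recall_services.values())
    )
    merged = []
    for items in results:
        merged.extend(items)
    return merged
```

Because each recall call is wrapped individually, a slow or failing recall service only removes its own candidates from the merged list, which is what makes it safe to iterate on the recall services independently.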

The results of the transformation of the Feed video recommendation service are as follows:

- A single large module was split into 10+ small modules, with the largest module occupying less than 40G of memory.

- Overall CPU usage decreased by 23% and memory usage decreased by 84%.

- Latency was reduced by 50+ms and stability increased from less than 99.9% to 99.97%.

- Iteration efficiency improved greatly, and changes that used to ride along in the same release no longer block one another.

The second stage: container transformation

The MEG architecture platform carried out containerization by requiring every service to put all of its dependencies inside its container, so that every service can be deployed with one click, that is, deployed automatically.

This may not be a problem at some newer Internet companies, because many of their services were built on Docker from the start. But Baidu’s search system is 20 years old, and this work takes time. A typical example in this process was the transformation of the search BS module, which is almost as old as Baidu search itself.

Twenty years ago, when designing BS, Baidu’s architects designed it to use as much of the whole machine’s resources as possible.

“SSDs were very expensive at the time, so when we designed BS we wanted to use all of the SSD. Also, for convenience, we did not declare all of the usage explicitly; for example, a service might declare 10 GB while actually using 60 GB. This was fine at the time, because there was only one service per machine, and whether it used resources explicitly or implicitly affected no one else. Today’s disk hardware is completely different from 20 years ago, and a single service’s computation is often not enough to use up a whole machine’s resources. To avoid waste, other services need to be co-located on the machine. At that point this becomes a problem, and it has to be fixed.”

The first thing is for each service to explicitly declare the resources it needs, eliminating the greedy preemption strategy.

Each service keeps all of its resource usage inside its own container. That is, BS changes from a machine-level service into a container-level service that does not occupy resources outside the container. Only then can the scheduling capabilities of the container orchestration system really come into play.

The second thing is to improve the efficiency of service deployment.

Some older services may need to pull a lot of extra data when they are deployed, or even require OP to manually adjust the machine’s parameters and configuration. As a result, deployments either cannot be automated or are too slow. To improve efficiency, the service’s dependencies on the physical environment must all be removed, so that it relies only on the environment inside the container. In addition, we need to optimize P2P downloads and convert some data to real-time updates in order to speed up service start-up.

“We once had a data storage service that was portable in theory, but in practice migrating a single instance took about 8 hours. That kind of portability does not mean much. Because a data storage service is limited by replica count, capacity, and concurrency, a cluster with hundreds or thousands of instances can migrate only a few instances per round, and each round takes 8 hours. Migrating the whole cluster then takes a very long time. Kernel upgrades, operating system upgrades, troubleshooting, and so on become very cumbersome. Therefore, we need to optimize P2P distribution and data download and distribution to speed up deployment.”
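A rough calculation shows why an 8-hour instance migration makes whole-cluster operations impractical. The figures below (1,000 instances, 5 instances per round) are illustrative assumptions, since the quote only says “hundreds or thousands” and “a few”; the function name is hypothetical.

```python
import math


def cluster_migration_hours(instances: int, per_round: int,
                            hours_per_round: float) -> float:
    """Total wall-clock time when only `per_round` instances can move at once."""
    rounds = math.ceil(instances / per_round)
    return rounds * hours_per_round


# Assumed numbers: 1,000 instances, 5 per round, 8 hours per round.
before = cluster_migration_hours(1000, 5, 8)
# The same cluster if faster data distribution cut a round to 30 minutes.
after = cluster_migration_hours(1000, 5, 0.5)
```

Under these assumptions the full migration takes 1,600 hours, roughly 67 days, while a 30-minute round would bring it down to 100 hours. The per-instance migration time is the factor that speeding up data distribution addresses.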

The third stage: dynamic management

Dynamic management is mainly about being “dynamic”: for example, whether online services can be migrated and scaled at any time.

It has two aspects:

On the one hand, from the point of view of the business itself, dynamic management requires all services to have a certain degree of elasticity and fault tolerance.

Whenever an online instance is scaled or migrated, it is restarted or briefly unavailable. This first requires all online services to have a certain degree of fault tolerance. Second, services need automatic load balancing (the simplest approach is to access the service through a naming service), so that traffic is automatically spread to new instances when they are added, and failed or exiting instances are promptly taken out of rotation.
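The two requirements above, spreading traffic to new instances and promptly blocking failed ones, can be sketched as a toy client-side balancer. This is an assumption-laden illustration rather than Baidu’s implementation: the class, the round-robin policy, and the retry helper are all hypothetical.

```python
class RoundRobinBalancer:
    """Toy balancer over the instances currently believed to be healthy."""

    def __init__(self) -> None:
        self._instances: list[str] = []
        self._next = 0

    def add(self, instance: str) -> None:
        # A newly scaled-out instance joins the rotation immediately.
        if instance not in self._instances:
            self._instances.append(instance)

    def remove(self, instance: str) -> None:
        # A failed or exiting instance is taken out of rotation right away.
        if instance in self._instances:
            self._instances.remove(instance)

    def pick(self) -> str:
        if not self._instances:
            raise RuntimeError("no healthy instances")
        instance = self._instances[self._next % len(self._instances)]
        self._next += 1
        return instance


def call_with_retry(balancer, send, attempts=2):
    """Caller-side fault tolerance: on failure, block the instance and retry."""
    last_error = None
    for _ in range(attempts):
        target = balancer.pick()
        try:
            return send(target)
        except ConnectionError as err:
            balancer.remove(target)
            last_error = err
    raise last_error
```

In a real system the instance list would usually be fed by service discovery and health checks rather than by the caller, but the contract is the same: callers must tolerate an instance disappearing in the middle of a migration.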

On the other hand, at the infrastructure level, since all services can be migrated and scaled at any time, we can automate the on-demand allocation of service capacity and traffic.

“When a business runs an operational campaign, it needs a lot of temporary capacity expansion. In the non-cloud-native mode this was very troublesome. We first had to find a batch of physical machines, then deploy the service onto the new machines to expand capacity, then take the service down after the campaign ended, and finally return the physical machines. The whole process involved a lot of manual work, and the cost was relatively high. After the dynamic transformation, the step of finding physical machines is gone. All of our services sit in one large resource pool, and any business that needs extra resources for a short period can simply scale out. And because different businesses peak at different times, they can also stagger their use of the same resources.

“Then there is the elasticity of resource use. Take my own recommendation system: if there are spare resources in the pool, the recommendation system can use them to run more complex computation and provide a better user experience. So when there are no operational campaigns, we use the idle resources to improve recommendation and retrieval quality. When there is a campaign, we give those resources back to it. This lets us balance overall resource use, and the whole process should be a very low-cost, automated operation.”

The fourth stage: advanced cloud native

To keep reducing costs and improving efficiency, starting in 2021 the cloud native transformation of the MEG architecture platform has been adding further capabilities, such as Serverless and Function as a Service, on top of dynamic management.

Before the transformation, the capacity of the whole system was essentially fixed, and when a traffic burst arrived the only option was to degrade service. With the Serverless mechanism, traffic is monitored in real time, and when an anomaly is detected, capacity expansion can be completed automatically within a few minutes.
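The core of such a mechanism is a rule that turns observed traffic into a target instance count. The sketch below is a generic illustration, not MEG’s actual algorithm: the function name, the per-instance QPS figure, the target utilization, and the instance bounds are all assumptions.

```python
import math


def desired_instances(current_qps: float, qps_per_instance: float,
                      target_utilization: float = 0.5,
                      min_instances: int = 1,
                      max_instances: int = 1000) -> int:
    """Instances needed so each one runs at about `target_utilization` of its limit."""
    needed = math.ceil(current_qps / (qps_per_instance * target_utilization))
    # Clamp so that a metrics glitch cannot scale the service to zero or to infinity.
    return max(min_instances, min(max_instances, needed))


# Assumed numbers: each instance handles 500 QPS at full load.
normal = desired_instances(10_000, 500)
burst = desired_instances(25_000, 500)
```

With these assumed numbers, normal traffic needs 40 instances and the burst needs 100. A control loop that re-evaluates this rule against live traffic, and scales the service when the result changes, is what replaces the old choice between fixed capacity and degradation.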

Function as a Service, for its part, aims to push R&D efficiency to the extreme. It lets business engineers focus only on the logic they want to implement. How to split microservices, how to control traffic, and how to manage capacity are all left to the underlying system.

The fifth stage: declarative architecture

Chuanyu mentioned that much of what is being done in the advanced cloud native stage is in fact a move toward declarative architecture. For example, Function as a Service is a key part of declarative architecture, as is the Serverless mechanism; the ultimate goal is to decouple strategy from architecture.

“Many architectures have few problems when they are first designed. But as the business develops, they often need to be restructured after running for a while, because the business scenarios the system faces have changed. Adjusting an architecture is very complex, usually as painful as breaking bones, and involves migrating a lot of business strategy logic. We want to decouple the business from the architecture as much as possible, separating business logic from architectural design. That way, business logic can be described as simply as possible, and our system can automatically split it into functions based on those descriptions and send those functions to the system for execution.”

If the architecture can be separated from the business, the MEG architecture platform becomes very flexible.

An automatic derivation system is responsible for working out which nodes and functions to run and how to connect and combine those functions. In this way, the system becomes declarative: you declare your business logic on it, and it automatically deduces what the architecture needs to be and executes it.
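A very small sketch of the idea: the business side declares only functions and their data dependencies, and the system derives an execution plan. This uses Python’s standard `graphlib.TopologicalSorter` as a stand-in for the derivation system; the function names in the example and the notion of “stages” are assumptions for illustration, not a description of the MEG platform.

```python
from graphlib import TopologicalSorter


def derive_stages(declared: dict[str, set[str]]) -> list[list[str]]:
    """Group declared functions into stages; functions in one stage can run in parallel.

    `declared` maps each function name to the set of functions it depends on.
    """
    sorter = TopologicalSorter(declared)
    sorter.prepare()  # raises graphlib.CycleError if the declaration is cyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical declaration of a recommendation request.
declaration = {
    "recall_cf": {"parse_request"},
    "recall_hot": {"parse_request"},
    "rank": {"recall_cf", "recall_hot"},
    "respond": {"rank"},
}
plan = derive_stages(declaration)
```

Here the plan comes out as four stages, with the two recall functions sharing a stage because nothing in the declaration orders them. How each stage maps onto services, instances, and traffic routing is exactly the part a declarative system would decide on behalf of the business.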

“So of course this is the ultimate ideal. We still have many, many things to do on our way to achieving this goal.”

The above are some critical paths for Baidu MEG architecture platform to complete the cloud native transformation.

In upcoming talks, Chuanyu will also focus on service governance, automation, chaos engineering, and other topics, and discuss some of the problems encountered along the way and how they were solved.

