The Turing platform is the one-stop algorithm platform built by the Meituan distribution technical team. Turing OS, the platform's online service framework, focuses on machine learning and deep learning online serving, and provides a unified platform-level solution for deploying and computing models and algorithm strategies online, effectively improving algorithm iteration efficiency. This article discusses the thinking and optimization behind the construction and practice of Turing OS, and we hope it will be helpful or inspiring to you.

0. Foreword

AI is the hottest “star” in the Internet industry. Both established giants and fast-rising newcomers are investing heavily in AI to empower their businesses. Meituan has long explored applications of machine learning models across business scenarios, from the early linear models and tree models to deep neural networks, BERT, and DQN in recent years. These have been successfully applied to search, recommendation, advertising, distribution, and other businesses, with good results.

Turing (hereinafter referred to as the Turing platform), the algorithm platform built by Meituan distribution, aims to provide one-stop services covering the whole algorithm workflow: data preprocessing, feature generation, model training, model evaluation, model deployment, online prediction, and AB experiment effect evaluation. It lowers the barrier for algorithm engineers, frees them from tedious engineering development, and lets them focus their limited energy on iterating business and algorithm logic. For details, see the earlier post “Construction Practice of a One-stop Machine Learning Platform” published by the Meituan technical team.

With the machine learning platform, feature platform, and AB platform in place, the distribution technical team found that online prediction had gradually become the bottleneck of algorithm development and iteration, so we started the overall development of the Turing online service framework. This article discusses the design and practice of Turing OS (Online Serving) in detail. We hope it will be helpful and enlightening to you.

As the Turing platform matured, more than 18 business parties, including Meituan distribution, joined it. The overall picture: 10+ BUs (business units) are connected, covering 100% of the core business scenarios of Meituan distribution; the platform supports 500+ online models, 2500+ features, 180+ algorithm strategies, and tens of billions of online predictions every day. Enabled by the Turing platform, the algorithm iteration cycle has been shortened from days to hours, greatly improving the iteration efficiency of distribution algorithms.

1. Introduction to the Turing platform

The Turing platform is a one-stop algorithm platform; its overall architecture is shown in Figure 1 below. The bottom layer relies on Kubernetes and Docker for unified scheduling and management of CPU/GPU resources, and integrates machine learning/deep learning frameworks such as Spark ML, XGBoost, and TensorFlow. It covers feature production, model training, model deployment, online inference, AB experiments, and other one-stop functions, supporting AI applications such as dispatching, time estimation, delivery range, search, and recommendation for Meituan distribution, as well as business units such as flash sale, bike, grocery shopping, and maps. The Turing platform comprises four main parts: the machine learning platform, the feature platform, the Turing online serving platform (Turing OS), and the AB experiment platform.

  • Machine learning platform: provides model training, task scheduling, model evaluation, model tuning, and other functions, and implements drag-and-drop visual model training based on DAGs.
  • Feature platform: provides online and offline feature production, feature extraction, feature aggregation, and other functions, and pushes features to the online feature store to offer a high-performance feature retrieval service.
  • Turing online serving: Turing OS provides a unified platform-level solution for feature retrieval, data preprocessing, online deployment of models and algorithm strategies, and high-performance computing.
  • AB experiment platform: provides pre-experiment AA grouping, in-experiment AB traffic splitting, and post-experiment effect evaluation, covering the complete life cycle of AB experiments.

Turing OS refers to the online service module of the Turing platform, focusing on machine learning/deep learning online serving. Its goal is to bring offline-trained models online quickly, effectively improve the algorithm iteration efficiency of each business team, deliver results quickly, and generate value for the business. The rest of this article focuses on Turing online serving.

2. Background of Turing OS construction

In the early days of Meituan's distribution business, to support rapid business development, fast algorithm launches, and quick trial and error, the engineering side of each business line independently developed its own online prediction functions, in what we call the “chimney model”. This model is self-contained and flexible enough to quickly support personalized business needs. However, as the business grew, its disadvantages became obvious, mainly in three respects:

  • Reinventing the wheel: feature retrieval and preprocessing, feature version switching, model loading and switching, online prediction, and AB experiments were all developed independently by each team, from scratch.
  • Lack of platform capability: there was no platform-level operation, maintenance, management, monitoring, or tracking for the full life cycle of feature and model iteration online, resulting in low R&D efficiency.
  • Serious coupling between algorithm and engineering: the boundary between algorithm and engineering was blurred, the coupling severe, and algorithm iteration slow.

The “chimney model” made real contributions in the early stages of the business, but as business volume grew, its marginal returns fell to an intolerable degree, and a unified online service framework was urgently needed.

Most mainstream open source machine learning online serving frameworks on the market provide only model prediction, without preprocessing or post-processing modules, as shown in Figure 2 below.

Google's TensorFlow Serving, for example, is a high-performance open source online serving framework for machine learning models. It exposes gRPC/HTTP interfaces for external invocation, supports model hot update and automatic model version management, solves the pain points of resource scheduling and service discovery, and provides a stable and reliable service. However, TensorFlow Serving includes no preprocessing or post-processing modules: the business engineering side must preprocess the input into tensors, deliver them to TensorFlow Serving for model computation, and then post-process the results. Preprocessing and post-processing logic is very important to the algorithm strategy, iterates frequently, and is tightly coupled to the model, so it is better owned by algorithm engineers. If engineering implements it instead, engineers merely transcribe the algorithm engineers' designs; the coupling is too tight, iteration is slow, and gaps between design and implementation can easily cause online incidents.
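To make that division of labor concrete, here is a minimal sketch of a caller using TensorFlow Serving's REST predict endpoint; the host, model name, and feature values are illustrative placeholders. Note that both the tensor assembly before the call and the result parsing after it must live with the caller, which is exactly the coupling described above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TfServingClient {
    public static void main(String[] args) throws Exception {
        // Pre-processing done by the caller: raw features already
        // normalized and flattened into a tensor.
        String body = "{\"instances\": [[0.42, 1.0, 0.07]]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://tf-serving-host:8501/v1/models/eta_model:predict"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Post-processing (parsing "predictions", applying business
        // rules) also falls on the caller.
        System.out.println(response.body());
    }
}
```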

To solve these problems and provide users with a more convenient and easy-to-use algorithm platform, the Turing platform built a unified online service framework that integrates model computation with preprocessing/post-processing modules and iterates them together as algorithm versions, eliminating the complex interaction between algorithms and engineering.

Here we broaden the definition of “algorithm”. In this article, an algorithm (also called an algorithm strategy) can be understood as a combination function y = f1(x) + f2(x) + … + fn(x), where each fi(x) may be a rule computation, a model computation (machine learning or deep learning), or a non-model computation (such as a genetic algorithm or operations-research optimization). Any adjustment to a factor of this combination function (such as a change to a model's input/output, a change of model type, or a rule adjustment) can be regarded as an iteration of the algorithm version. Algorithm iteration is a cycle of development, launch, effect evaluation, and improvement; the goal of Turing OS is to optimize the efficiency of that cycle.
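A minimal sketch of this factor-combination view, with all names illustrative: each factor implements one interface, and replacing any factor in the list constitutes a new algorithm version.

```java
import java.util.List;
import java.util.function.Function;

// y = f1(x) + f2(x) + ... + fn(x): each factor may be a rule, a model
// call, or a non-model solver behind the same functional interface.
public class AlgorithmVersion {
    private final List<Function<double[], Double>> factors;

    public AlgorithmVersion(List<Function<double[], Double>> factors) {
        this.factors = factors;
    }

    public double compute(double[] x) {
        // Swapping any element of `factors` yields a new algorithm version.
        return factors.stream().mapToDouble(f -> f.apply(x)).sum();
    }
}
```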

3. Turing OS 1.0

3.1 Introduction to Turing OS 1.0

To solve the reinvented wheels and lack of platform capability of the “chimney model” era, we set about building the Turing OS 1.0 framework. It integrates model computation with preprocessing and post-processing modules, and encapsulates complex logic such as feature retrieval, preprocessing, model computation, and post-processing in the Turing online service framework, delivered as an SDK. Algorithm engineers develop personalized preprocessing and post-processing logic against the Turing online serving SDK; business engineering integrates the SDK together with the algorithm package and invokes the SDK interfaces for model and algorithm computation.
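The following hypothetical sketch illustrates this 1.0 integration style; every class, method, and key name in it is illustrative rather than the real Turing SDK API. The point is that the business service, the SDK, and the algorithm package share one JVM, and the caller passes raw business inputs rather than tensors.

```java
import java.util.HashMap;
import java.util.Map;

public class DispatchService {

    // Stand-in for the SDK facade that wraps feature fetching,
    // pre-processing, model calculation, and post-processing.
    static class TuringClient {
        Map<String, Object> calculate(String algorithm, Map<String, Object> params) {
            // In the real framework this routes to the algorithm package
            // loaded in-process; here we just return a dummy result.
            Map<String, Object> result = new HashMap<>();
            result.put("etaMinutes", 30.0);
            return result;
        }
    }

    private final TuringClient turingClient = new TuringClient();

    public double estimateDeliveryTime(long orderId) {
        Map<String, Object> params = new HashMap<>();
        params.put("orderId", orderId); // raw business input, not tensors
        return (double) turingClient.calculate("eta_strategy", params).get("etaMinutes");
    }
}
```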

Through Turing OS 1.0, we solved the problem of each business side independently developing, independently iterating, and repeatedly reinventing the wheel, which greatly simplified the work of both algorithm engineers and engineering developers. Moreover, because engineering invokes algorithm preprocessing and model computation indirectly through the Turing online service framework rather than interacting with the algorithm directly, the coupling between engineering and algorithm was also reduced to some extent.

As shown in Figure 3, the Turing online service framework at this stage integrated the following functions:

3.1.1 Feature acquisition

  1. Feature aggregation, dynamic grouping, local caching, and business-line-level physical resource isolation provide highly available, high-performance online feature computing.
  2. A custom Machine Learning Definition Language (MLDL) configures and unifies the feature retrieval process, improving the usability of online serving features.
  3. Deep Learning Box (DLBox) supports local computation by placing the original vectorized features and the model on the same node, which solves the performance problem of recalling large-scale data in deep learning scenarios and supports high concurrency and fast algorithm iteration for various delivery businesses.

3.1.2 Model calculation

  1. Supports Local and Remote model deployment modes, deploying models in the business service's local process or in dedicated online serving clusters, respectively. Asynchronous multi-machine parallel computing and heterogeneous CPU/GPU resources address the performance of large-scale model computation; model sharding solves the problem of super-large models that cannot be loaded on a single machine.
  2. For deep learning model computation, HPC acceleration libraries such as MKL-DNN and compiler optimization techniques such as TVM further improve inference performance.
  3. MLDL encapsulates feature correlation and preprocessing logic, automating feature retrieval, processing, and assembly, and improving the iteration efficiency of model development.

3.1.3 Algorithm calculation

  1. Supports algorithm version management, AB routing, dynamic retrieval of the models, features, and parameters associated with an algorithm version, and hot update of models and parameters (a routing sketch follows this list).
  2. Supports AB experiments and flexible gray-scale release ramp-up, with AB experiment effect evaluation via unified event logging.
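As a hedged illustration of gray-scale AB routing (an assumption-level sketch, not Turing's actual routing logic), a stable hash of the request key can map into 100 buckets, with the new version receiving the first `grayPercent` buckets so its traffic share can be ramped smoothly while each key routes consistently.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class AbRouter {

    public static String route(String requestKey, int grayPercent) {
        CRC32 crc = new CRC32();
        crc.update(requestKey.getBytes(StandardCharsets.UTF_8));
        int bucket = (int) (crc.getValue() % 100); // stable bucket in [0, 100)
        return bucket < grayPercent ? "algo_v2" : "algo_v1";
    }

    public static void main(String[] args) {
        // Ramp V2 to 10% of traffic; the same key always routes the same
        // way, keeping a user's experience consistent during the experiment.
        System.out.println(route("user_42", 10));
    }
}
```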

3.2 Turing OS 1.0 legacy issues

Turing OS 1.0 solved the reinvented wheels, chaotic feature handling, and missing platform capabilities across business lines. By providing one-stop platform services, it supported the large-scale online prediction and high-performance computing needs of all Meituan distribution business lines, letting algorithm engineers focus on iterating and optimizing algorithm strategies themselves and improving iteration efficiency. However, the aforementioned three-way coupling among engineering, algorithm, and platform was still not well solved, mainly in that:

  1. Business engineering statically depends on the algorithm package, which is deployed inside the business project.
  2. The algorithm package runs in the same JVM as the business project. Although this saves one RPC hop, the algorithm's computation can affect the business service's performance and make its stability uncontrollable; for example, TensorFlow model computation can consume excessive CPU, and loading and switching large models can consume excessive memory.
  3. As the Turing platform provided more and more functions, the Turing online serving SDK became increasingly bloated. Business projects had to upgrade the SDK to use new platform features, yet upgrading the SDK is risky for the business project and slows down its deployment.

From the above, the algorithm, engineering, and the Turing platform remained highly coupled, producing pain points for each party, as shown in Figure 4. These problems seriously limited algorithm iteration efficiency: the launch and test cycle of each iteration was long and inefficient:

  • Algorithm pain point: iterating an algorithm package depends strongly on the business project's release, and each release requires a full R&D and test cycle, making the process long and inefficient.
  • Engineering pain point: the algorithm package and business project share a JVM, so algorithm computation can drag down the business service's performance; meanwhile, the business project must release frequently just to follow algorithm package iterations, even when the only change is an algorithm package version bump.
  • Turing platform pain point: the Turing online serving SDK is deployed inside business projects, so converging versions and maintaining compatibility is difficult; promoting new Turing features is also hard, because every new feature requires business projects to upgrade the SDK.

Therefore, the algorithm, engineering, and the Turing platform needed to be better decoupled, meeting both the algorithm side's need for rapid iteration and the business engineering side's demand for stability, achieving a win-win.

4. Turing OS 2.0

To address the tight coupling of algorithm, engineering, and platform in the Turing OS 1.0 framework, we developed Turing OS 2.0. Its goals: algorithm iteration should not depend on engineering releases, and new Turing platform features should not require business projects to upgrade the SDK, further improving both algorithm iteration efficiency and engineering development efficiency.

Centered on this decoupling goal, in Turing OS 2.0 we designed and developed the algorithm-package plugin hot-deployment framework, algorithm data channels, the algorithm orchestration framework, and other functions to support self-service algorithm iteration. We also built an algorithm verification platform with sandbox traffic diversion, real-time playback, performance stress testing, and debugging, to guarantee the performance, correctness, and stability of algorithm strategies. Turing OS 2.0 decouples the algorithm, engineering, and the Turing platform, closing the loop of algorithm iteration and engineering iteration separately. Most of the algorithm iteration process no longer requires engineering developers or test engineers, and an algorithm engineer can take an algorithm strategy iteration online within hours. With Turing OS 2.0, algorithm development and iteration efficiency has improved greatly.

Turing OS 2.0 features are as follows:

  • Standardized lightweight SDK: business projects depend only on a lightweight Turing OS SDK that rarely needs upgrading, lowering the engineering-side access barrier and decoupling business engineering from the Turing platform.
  • Algorithm plugins: a self-developed Turing algorithm plugin framework supports hot-deploying algorithm packages as plugins inside the Turing OS service, decoupling algorithm from engineering; multiple versions of multiple algorithm packages can be deployed in one Turing OS service, each with its own thread pool resources.
  • Data channels: in some complex algorithm scenarios, the algorithm strategy still depends on business engineering in two ways: 1) to obtain data, the algorithm can only have business engineering call an interface and pass the result in; 2) for an algorithm to call another algorithm, business engineering must call algorithm A and algorithm B and relay between them. To solve both, we proposed the concept of a data channel, giving the algorithm the ability to acquire data by itself instead of having all data fetched by business engineering and passed in.
  • Algorithm orchestration: multiple algorithms are combined serially or in parallel into a directed acyclic graph (DAG), which constitutes an algorithm orchestration. In the new architecture, abstracting and consolidating business algorithms corresponds to composing and orchestrating algorithms. Orchestration further empowers business launches and algorithm iteration, further improves the iteration efficiency of business algorithms, and further decouples algorithm from engineering.
  • Sandbox traffic diversion: the Turing sandbox is a service physically isolated from Turing OS but with an identical runtime environment; traffic through the sandbox has no impact on online business. The sandbox verifies the correctness of algorithm logic and evaluates computation performance, improving the efficiency of the R&D test process.
  • Turing playback and unified event logging: algorithm and model computation produce a lot of important data (algorithm strategies, models, features, parameters, data-channel data, and so on). This data not only helps quickly identify and locate system problems, but also provides an important basis for AB experiment reports, sandbox traffic diversion, performance stress testing, and other modules. To automatically record, store, and use this data, we designed a real-time playback platform and unified event logging.
  • Performance stress testing: by integrating Quake's capabilities, Turing OS reuses the traffic collected by the unified playback platform to construct requests and stress-test sandboxes running the new version of an algorithm package, guaranteeing the performance and stability of algorithm strategy iterations.

The following sections look at these features in turn and at how Turing OS 2.0 addresses the coupling of algorithms, engineering, and the Turing platform.

4.1 Standardized lightweight SDK

To resolve the coupling between business engineering and the Turing platform, i.e., the difficulty of converging SDK versions when the Turing online serving SDK is deployed inside business projects, we split and reworked the SDK along the following lines: lightweight, simple and easy to access, stable and extensible, safe and reliable.

  • Lightweight SDK: the original SDK logic sinks into the Turing OS service, and the SDK retains only a simple, universal batch prediction interface. The SDK exposes few algorithm details: algorithm version routing, real-time/offline feature retrieval, and model computation are all hidden inside Turing OS. The lightweight SDK internally implements custom routing to Turing OS, so the business side need not care which Turing OS cluster an algorithm package is deployed in; this is completely transparent to the user.
  • Simple and easy to access: a unified, universal Thrift interface serves all algorithm computation, with algorithm input and output defined in Protobuf/Thrift (a sketch of such an interface follows this list). Compared with Java-class-defined interfaces, this guarantees compatibility; once the Protobuf interface is defined, the algorithm and the project can be developed independently of each other.
  • Extensible: the lightweight SDK version is stable, so the engineering side need not upgrade repeatedly; Protobuf natively supports serialization, which later supports traffic copying and playback logging.
  • High performance: for high-availability, high-volume computation scenarios such as batch prediction for C-end users, we designed asynchronous batching and high parallelism to improve algorithm performance; for scenarios with long single-task computation times, high CPU consumption, and high availability requirements, such as dispatch route planning by city region, we designed a client-side fail-fast optimal-retry mechanism to guarantee high availability and balance Turing OS computing resources.
  • Safe and reliable: when multiple algorithm packages are deployed on one Turing OS, thread-pool-level resource isolation is provided; algorithm packages of different business lines get physical-level resource isolation according to business scenario. A circuit-breaker and degradation mechanism further keeps the computation process stable and reliable.
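As a hedged sketch (names illustrative, not the real Turing OS SDK), the surface of such a standardized lightweight SDK can shrink to a single generic batch interface over serialized payloads, which is why new platform features require no SDK upgrade:

```java
import java.util.List;

public interface TuringOsClient {

    /**
     * Batch prediction: each element of inputs is a Protobuf-encoded
     * request for one prediction, and the return value mirrors it.
     * Routing to the right Turing OS cluster and algorithm version
     * happens inside the SDK and is transparent to the caller.
     */
    List<byte[]> batchCalculate(String algorithmName, List<byte[]> inputs);
}
```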

4.2 Algorithm plug-in

By standardizing and lightening the Turing OS SDK, we resolved the coupling between business engineering and the Turing platform; by turning Turing OS into a service, we resolved the coupling between algorithm and business engineering. But the coupling between the algorithm and the Turing platform remained, and even grew: taking an algorithm iteration online required a Turing OS service release, so the goal of three-way decoupling was not yet achieved.

To resolve the remaining coupling between the algorithm and the Turing platform and further improve the iteration efficiency of algorithm strategies, we designed algorithm plugins and containerized Turing OS: an algorithm package is deployed into Turing OS as a plugin, so releasing an algorithm package requires no Turing OS release, and not even a Turing OS restart, as shown in Figure 7.

  • Algorithm plugins: we developed the Turing OS algorithm plugin framework in-house, supporting the deployment of algorithm packages into the Turing OS service as plugins. Concretely, we customize the algorithm ClassLoader: different class loaders load different versions of an algorithm package, and hot deployment is achieved by loading multiple versions and swapping the active reference (see the sketch after this list).
  • Turing OS containerization: Turing OS acts as a plugin container that loads algorithm packages of different algorithm versions, performs algorithm version routing, and executes algorithm strategy computation. After the containerization transformation: 1) if an algorithm version needs no new parameters, neither the engineering side nor Turing OS needs a release; 2) the business project's main job is passing parameters to the algorithm, and that logic is simple, so as long as the input parameters are unchanged, no business release is needed.
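Below is a minimal sketch of the class-loader-per-version idea, assuming each algorithm package is a jar whose entry class implements a shared Algorithm interface; the interface, jar layout, and entry-class convention are illustrative assumptions, not the actual Turing framework code. An atomic reference swap realizes the “pointer replacement” described above.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicReference;

public class AlgorithmPluginManager {

    public interface Algorithm {
        byte[] calculate(byte[] input);
    }

    // "Pointer replacement": requests always go through this reference.
    private final AtomicReference<Algorithm> active = new AtomicReference<>();

    public void deploy(Path jarPath, String entryClass) throws Exception {
        // A fresh class loader per package version isolates its classes
        // from other versions loaded in the same Turing OS process.
        URLClassLoader loader = new URLClassLoader(
                new URL[]{jarPath.toUri().toURL()},
                Algorithm.class.getClassLoader());
        Algorithm newVersion = (Algorithm) loader
                .loadClass(entryClass)
                .getDeclaredConstructor()
                .newInstance();
        // Atomically swap: in-flight requests finish on the old version,
        // new requests see the new one; the old loader becomes
        // GC-eligible once no request references it.
        active.set(newVersion);
    }

    public byte[] calculate(byte[] input) {
        return active.get().calculate(input);
    }
}
```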

4.3 Data Channel

Through the above means, we solved the iteration coupling among algorithm, engineering, and the Turing platform. However, in some complex algorithm scenarios, coupling between algorithm and business engineering remains, mainly because the algorithm relies on business engineering for two kinds of data:

  1. Data the algorithm needs internally: currently obtained by business engineering via interface calls and then passed to the algorithm, for example data from service interfaces or distributed KV caches. Every such iteration requires both the algorithm and business engineering to develop and release.
  2. Algorithms calling algorithms: currently business engineering calls algorithm A and algorithm B and writes the relay logic, for example when A's input needs B's result, or when the final output combines the results of A and B. These operations are generally handled by business engineering. An alternative is to merge A and B into one huge algorithm, but that increases the cost of running AB experiments and gray-scale releases on A and B independently.

To solve these two problems, we proposed the concept of a data channel, giving the algorithm the ability to acquire data autonomously. Inside an algorithm, data channels are enabled through annotations provided by Turing OS; the interface between the algorithm and business engineering then only needs to pass a few key parameters and context data, and the algorithm internally assembles the parameters the data channel requires. After this change, the algorithm interface is further simplified and the coupling between algorithm and engineering further reduced. The algorithm-calls-algorithm problem is solved by the algorithm orchestration introduced below.
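A hypothetical sketch of the annotation-driven data channel idea follows; the @DataChannel annotation, fetcher interface, and KV source name are all illustrative assumptions rather than the real Turing OS API. The business side hands over only lightweight context, and the framework injects the data access that the algorithm declared.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class RouteScoringAlgorithm {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface DataChannel {
        String source(); // e.g. a service interface or a KV cache namespace
    }

    public interface DataFetcher {
        byte[] get(String key);
    }

    // Injected by the (hypothetical) plugin framework via reflection,
    // so business engineering no longer pre-fetches and forwards this data.
    @DataChannel(source = "rider_profile_kv")
    private DataFetcher riderProfiles;

    public double score(long riderId, double[] context) {
        // The algorithm assembles its own channel parameters from the
        // lightweight context handed over by business engineering.
        byte[] profile = riderProfiles.get(String.valueOf(riderId));
        return profile == null ? 0.0 : context[0]; // placeholder logic
    }
}
```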

4.4 Algorithm orchestration

A complete algorithm process includes the algorithm computation itself, plus preprocessing logic for the input and post-processing logic for the results. It may involve N rule computations, N model computations (machine learning, deep learning, etc.), non-model computations (such as genetic algorithms or operations-research optimization), or combinations of these. We abstract any computational logic unit with independent input and output into an operator, which can be orchestrated and reused. The two general-purpose operators are:

  1. Model computation operator: the model computing engine performs model computation. Both Local and Remote modes are supported; in Remote mode, models may be deployed across different model clusters. The operator further encapsulates model computation so that the Local/Remote choice and model-cluster routing are transparent to the user; algorithm engineers need not be aware of them, and we adjust dynamically based on overall computation performance.
  2. Algorithm computation operator: the algorithm computing engine in Turing OS performs algorithm strategy computation. Different algorithm plugins may be deployed in different Turing OS instances; the Turing OS cluster routing is likewise encapsulated and transparent to the user.

Multiple operators are combined serially or in parallel into a directed acyclic graph (DAG), forming an operator orchestration. We currently realize operator orchestration in two ways:

  1. Algorithm data channels: within Turing OS, algorithm computing engines call each other, or an algorithm computing engine calls a model computing engine; the algorithm data channel is the concrete means of realizing operator orchestration.
  2. Algorithm master-control logic: we extracted a master-control logic layer above algorithm calls to handle complex algorithm scenarios where multiple algorithms depend on one another. This master-control logic is implemented by algorithm engineers inside the algorithm package; through it, they can freely orchestrate the relationships among algorithms, further decoupling algorithm from engineering.

From an algorithm engineer's point of view, Turing OS provides services like building blocks: independent sub-functions and operators are connected in a standard way to form online systems that meet a wide range of needs.

Under this architecture, the algorithm work consists of three parts: 1) the algorithm engineer abstracts and models the business process; 2) the algorithm engineer develops and tests independent operators; 3) the algorithm engineer orchestrates and combines operators based on the business-process abstraction. Operator orchestration further empowers business launches and algorithm iteration, improving the iteration efficiency of business algorithms yet again.
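Below is a minimal sketch of such an orchestration, using plain in-process operators for illustration (in the real system each operator would wrap a model computing engine or algorithm computing engine call): two operators run in parallel and a third joins their results, forming a small DAG.

```java
import java.util.concurrent.CompletableFuture;

public class OperatorDag {

    public static void main(String[] args) {
        double[] input = {1.0, 2.0};

        // Parallel branch: operators A and B are independent.
        CompletableFuture<Double> algoA =
                CompletableFuture.supplyAsync(() -> input[0] * 0.5);
        CompletableFuture<Double> algoB =
                CompletableFuture.supplyAsync(() -> input[1] + 3.0);

        // Join node: operator C combines the two upstream results --
        // the "algorithm calls algorithm" case handled by orchestration
        // instead of by business engineering.
        double result = algoA.thenCombine(algoB, (a, b) -> a + 2 * b).join();
        System.out.println("combined score = " + result);
    }
}
```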

4.5 Multi-mode integration

As described above, Turing OS acts as a container that can deploy multiple versions of multiple algorithm packages and supports hot deployment. Through plugin hot deployment and orchestration, Turing OS decouples business engineering, algorithm, and the Turing platform, greatly improving algorithm iteration efficiency. To further meet business requirements, we provide two Turing OS deployment and integration modes: Standalone and Embedded.

Standalone mode

In Standalone mode, Turing OS is deployed independently of the business service, which calls algorithms through the lightweight SDK. The lightweight SDK internally encapsulates custom routing to Turing OS and the Thrift-RPC call logic.

Embedded mode

Some complex scenarios with high concurrency and high performance requirements place higher demands on Turing OS integration and performance. In standalone deployment, every algorithm computation by business engineering costs an RPC, so we implemented a new integration mode: Embedded. In Embedded mode, we provide the Turing OS framework as a code package; business parties integrate it into their own engineering services, which then double as Turing OS containers, invoking algorithms through the lightweight SDK and computing them locally within the business service. Features of embedded Turing OS:

  1. By integrating the Turing OS framework code, the business project inherits algorithm-package plugin loading and hot deployment, taking on the dual role of business service and Turing OS container.
  2. The business project does not depend directly on the algorithm package; the Turing OS framework manages it dynamically and hot-deploys it as a plugin, decoupling algorithm from engineering.
  3. The business service computes algorithms locally, eliminating the RPC and serialization overhead of algorithm calls; reusing the business service's server resources also reduces cluster resource consumption and improves utilization.

When an algorithm package plugin is released, a business project integrated in Embedded mode loads the corresponding algorithm package as a container and routes computation locally, as shown in Figure 9 below.

The Standalone and Embedded modes each have advantages and disadvantages; neither dominates. Choose according to the specific business scenario. The two modes compare as follows:

| Deployment mode | Advantages | Disadvantages | Applicable scenarios |
| --- | --- | --- | --- |
| Standalone | Lower coupling; the business side depends only on the Turing lightweight SDK | Requires building a Turing OS cluster, occupying machine resources; RPC call overhead | Large-scale invocation and asynchronous distributed parallel computing across multiple machines |
| Embedded | Reuses business machines, high resource utilization; no RPC calls, high performance | Cannot exploit multi-machine asynchronous distributed parallelism, only single-machine parallelism | Small-batch invocation with high RT requirements for single calls |

4.6 Turing sandbox

With algorithm plugin hot deployment in Turing OS, algorithm iteration became far more efficient than before, and algorithm engineers gained much more freedom to release without scheduling business engineering development and testing. But this also introduced new problems:

  1. Before an algorithm iteration goes live, its behavior on real traffic cannot be predicted; evaluating and verifying the algorithm's effect in advance is difficult, so algorithm engineers' testing efficiency is low.
  2. Real-time online evaluation and verification is difficult; there is no automated process tooling for evaluating the online performance and effect of algorithm strategies.
  3. Frequent releases also pose a great challenge to the stability of Turing OS services and the business.

The stopgap at the time was to deploy the algorithm strategy online first, ramp a small share of gray-scale traffic, and then analyze the unified event logs to evaluate the algorithm's effect. The defect of this approach is that the effect cannot be evaluated before launch and problems are found too late: if the gray-scaled version is faulty, online business is affected, producing bad cases. Given these problems in pre-launch verification, we developed the Turing sandbox, enabling full-link simulation of an algorithm without disturbing the stability of online business.

The Turing sandbox is a service physically isolated from the Turing OS service but running an identical environment; traffic through the sandbox does not affect online services. As shown in Figure 10 below, online traffic is diverted to the sandbox in the online environment, and the configuration and data of Turing OS and the Turing sandbox are kept consistent in every respect (versions, parameters, features, models, etc.). A new algorithm version (V3 of algorithm package 1 in Figure 10) is deployed to the sandbox first, where diverted traffic verifies the correctness of the algorithm and measures its performance. As an automated tool in the algorithm verification process, the Turing sandbox improves algorithm testing efficiency and further speeds up algorithm version iteration.

4.7 Unified Playback Platform

To facilitate analysis of algorithm effects and troubleshooting of anomalies, we need to record the inputs, outputs, features, and model data used during algorithm computation, so that the scene can be reconstructed. But algorithm computation generates a large amount of data, posing storage and recording challenges:

  1. Large data volume: one request may trigger multiple algorithm and model computations and use rich feature values, producing intermediate data several times the request volume.
  2. High concurrency: data generated by every Turing OS service is collected and stored centrally, so the pipeline must carry the sum of peak-period QPS across services.
  3. Customizability: dozens of different algorithms are deployed on Turing OS, their request and response formats vary greatly, and data such as features and data sources are hard to unify.

To record and store this important data well, Turing OS designed and built a unified playback platform that solves the above problems, as shown in Figure 11 below:

  1. Playback data is stored in ES and HBase: ES holds the key index fields and HBase holds the complete records, playing to each system's strengths and meeting both fast-search and mass-storage needs.
  2. Google Protobuf's DynamicMessage capability extends the original Protobuf format, dynamically supporting playback data format definition and assembly, and synchronizing with the ES index; this preserves high serialization and storage performance while giving every algorithm's data efficient access (see the sketch after this list).
  3. Since these data queries are not time-critical, a message queue decouples sending from storage, smoothing traffic peaks. All algorithms on the Turing OS platform are automatically connected to playback through the playback client.
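As a hedged sketch of the DynamicMessage approach (assuming the Descriptor is loaded at runtime, e.g. from some schema registry; the registry and field names are assumptions), playback records can be assembled and parsed against a runtime-defined schema without any generated classes. This requires the protobuf-java library.

```java
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.DynamicMessage;

public class PlaybackCodec {

    // Assemble a playback record for an arbitrary, runtime-defined schema.
    public static byte[] encode(Descriptor schema, String fieldName, Object value) {
        FieldDescriptor field = schema.findFieldByName(fieldName);
        DynamicMessage msg = DynamicMessage.newBuilder(schema)
                .setField(field, value)
                .build();
        return msg.toByteArray(); // full record to HBase; key fields indexed in ES
    }

    // Parse a stored record back without compile-time generated classes.
    public static Object decode(Descriptor schema, byte[] bytes, String fieldName)
            throws Exception {
        DynamicMessage msg = DynamicMessage.parseFrom(schema, bytes);
        return msg.getField(schema.findFieldByName(fieldName));
    }
}
```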

4.8 Performance stress testing and tuning

With the Turing sandbox and unified playback, Turing OS can quickly verify the correctness of algorithm data, but it still lacked automated tooling for analyzing algorithm performance. Turing OS therefore integrated the capabilities of the company's full-link stress testing system Quake (see the Meituan technical team's article on Quake practice), reusing traffic collected by the unified playback platform to construct requests and stress-test either Turing OS services or Turing sandboxes running the new algorithm package.

During stress testing, the system records the algorithm's performance under different QPS scenarios, including application metrics such as CPU and memory and response-time data such as TP latency and timeout rate. It compares these with real online performance, historical stress-test data, and the service's promised SLA, then produces a stress-test report and optimization guide; obvious performance problems block the algorithm package's release process. Turing OS is also connected to Meituan's internal performance diagnosis and optimization platform Scalpel, which generates analysis reports of thread stacks and performance hotspots during stress tests, helping users quickly locate performance bottlenecks and pointing to concrete optimization directions.

5. Turing OS 2.0 construction achievements

5.1 Algorithm development process

With the algorithm plugin transformation and dynamic hot deployment of Turing OS, we decoupled the algorithm, engineering, and the Turing platform, closed the loop of algorithm and engineering iteration separately, improved R&D efficiency, and greatly shortened the launch cycle of algorithm iteration:

  • For model iteration, feature changes, and algorithm strategy iteration, algorithm engineers can complete development and testing of the whole link independently, without engineering developers or test engineers. The algorithm package can also be deployed independently, without any service release; after launch, the engineering and product sides are notified to watch the relevant metrics.
  • When a new business scenario or algorithm strategy is added, algorithm and engineering develop it together: once the Protobuf interface is defined, algorithm engineers and engineering developers can develop and release their code independently.

Using the automated tools provided by Turing OS, such as sandbox traffic-diversion verification and stress-test diagnosis, algorithm strategy iteration became still more efficient, and the launch cycle of an algorithm iteration shrank from days to hours. Algorithm engineers develop independently, deploy to Turing OS for self-testing and debugging, deploy to the sandbox for traffic-diversion testing, evaluate effect and performance through the stress-testing platform, and finally deploy online independently. The whole process requires neither engineering developers nor Turing engineers, achieving automated operation and maintenance; at the same time, multiple safeguards ensure the performance of algorithm strategies and the operational stability of Turing OS.

5.2 Turing OS 2.0 use summary

It has been more than half a year since Turing OS (i.e., the Turing online service framework 2.0) was built. The overall picture: more than 20 Turing OS clusters have been set up, and more than 25 algorithm packages and 50 algorithms have been connected, supporting tens of billions of algorithm strategy computations every day. Empowered by Turing OS, most of the algorithm iteration process requires no engineering developers or test engineers, and algorithm engineers can take an algorithm strategy iteration online within hours.

Currently, a Turing OS cluster can host multiple algorithm packages of a single business line, or the packages of multiple sub-business-lines within one department; algorithm packages and Turing OS clusters can be associated and deployed dynamically. Turing OS supports physical resource isolation at both the business-line and algorithm-package level. To ease adoption, we provide complete access documentation and video tutorials; apart from the initial setup of a Turing OS cluster on the Turing platform, any business can basically stand up its own Turing OS service within an hour. We also provide best-practice documentation and performance tuning configurations, so the business side can solve most problems without hands-on guidance. We are now building automated operation and maintenance tools to further lower the access threshold and operation costs of Turing OS.

6. Summary and future outlook

Of course, no algorithm platform or algorithm online service framework is perfect, and Turing OS still has plenty of room to grow. As we continue exploring machine learning and deep learning online services, more and more application scenarios will need Turing OS support. Going forward, we will keep building in the following areas:

  1. Build Turing OS automated operation and maintenance tools and automated testing tools, support semi-automated algorithm development, and further reduce platform access and operation costs.
  2. Further improve the Turing OS framework and its algorithm support, including running in a Spark environment, so that during algorithm iteration the correctness, performance, and effect of new algorithm functionality can be verified against massive data.
  3. Advance the construction of a fully graph-based Turing OS engine: abstract the general components of algorithm businesses, provide graphical process-orchestration tools and a graph execution engine, further empower business launches and algorithm iteration, and further improve iteration efficiency.

7. Author profile

Yongbo, Ji Shang, Yan Wei, and Fei Fei are all from the algorithm platform group of the Meituan distribution technology department, responsible for the construction of the Turing platform and related work.

8. Job information

If you want to experience the appeal of the Turing platform and Turing OS up close, you are welcome to join us. The Meituan distribution technology team sincerely invites technical experts and architects in machine learning platforms and algorithm engineering to tackle the challenges of complex business and high-concurrency traffic together, build the industry's largest instant delivery network and platform, and meet the era of fully intelligent Meituan distribution. Interested candidates can send a resume to [email protected] (subject line: Meituan Delivery Technical Team).


This article was produced by the Meituan technical team, and its copyright belongs to Meituan. You are welcome to reprint or use its content for non-commercial purposes such as sharing and communication, provided you credit “Content reprinted from the Meituan technical team”. This article may not be reproduced or used commercially without permission; for any commercial use, please email [email protected] to request authorization.