0. Foreword

AI application scenarios are rich and varied, and AI evaluation methods are just as diverse. Designing a general evaluation framework is a serious challenge: it has to accommodate different protocols, different model environments, and even different operating systems. This article shares some hands-on experience from our work on AI evaluation, focusing on our attempts to handle the uncertainty of the execution environment. Elastic container tasks are the solution that fits us best so far, and we hope our experience is useful to you.

1. What is AI evaluation?

In AI product development today, we often need to answer questions such as: which smart speaker is better, which engine performs better, or which machine learning algorithm suits my problem best?

To answer questions like these, we need a ruler, and that ruler is called a metric. We need to design the metrics, decide what data can produce them, understand the strengths and weaknesses of each kind of metric, and figure out how to improve the metrics effectively. This is what evaluation does.

2. How to evaluate AI?

AI evaluation platform architecture

The figure below shows the overall architecture of the platform, divided into an access layer, a logic layer, a data layer, and a storage layer. The architecture is purely cloud native: all services are deployed on the cloud, and the storage resources we use are provisioned there as well.

3. Uncertainty of evaluation tasks

The figure below shows the flow of an evaluation task, with key stages such as the pipeline, task scheduling, and task encapsulation. We found that scheduling and executing tasks for many different evaluation objects is a major challenge. Each task is developed collaboratively, which leads to uncertainty both in the environment a task runs in and in the resources it consumes during execution.

Here are some features of the different scenarios:

Diversity of evaluation services

For example, when evaluating a service, we may face multiple request protocols and multiple response formats.
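One common way to absorb this protocol diversity is to hide each protocol behind a small adapter with a shared interface. The sketch below is an illustration of that idea, not the platform's real API; the protocol names and the stubbed transport functions are assumptions.

```python
# Sketch: normalize different request protocols behind one call site.
# The adapter bodies are stubs; a real one would perform network I/O.

def call_http(payload: dict) -> dict:
    # In a real adapter this would POST the payload as JSON over HTTP.
    return {"protocol": "http", "result": payload}

def call_grpc(payload: dict) -> dict:
    # In a real adapter this would issue a gRPC call.
    return {"protocol": "grpc", "result": payload}

# Registry mapping a protocol name to its adapter.
ADAPTERS = {"http": call_http, "grpc": call_grpc}

def evaluate_service(protocol: str, payload: dict) -> dict:
    try:
        adapter = ADAPTERS[protocol]
    except KeyError:
        raise ValueError(f"unsupported protocol: {protocol}")
    return adapter(payload)
```

Adding support for a new protocol then only means registering one more adapter, without touching the evaluation logic.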

Complexity of evaluated models

When evaluating models, we face different deep learning frameworks, such as TensorFlow-based and PyTorch-based models.

Diversity of evaluation algorithms

When evaluating algorithms, we may face input data in many different formats.
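Data-format diversity can be handled the same way as protocol diversity: normalize every format into one internal record shape. A minimal sketch, assuming JSON and CSV inputs for illustration:

```python
# Sketch: load evaluation data in different formats into a common
# list-of-dicts representation.
import csv
import io
import json

def load_records(text: str, fmt: str) -> list:
    if fmt == "json":
        data = json.loads(text)
        # Accept either a single object or a list of objects.
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        # First row is treated as the header.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    raise ValueError(f"unsupported data format: {fmt}")
```

Downstream metric code then only ever sees the normalized records, regardless of how a team delivered its data.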

4. The evolution of the task execution architecture

So how do we resolve these uncertainties? We tried several different approaches, which I will share below.

A dedicated task-running service

The initial solution was to support all scenarios with a single dedicated service. All functionality and environments were bundled together, and the most obvious problem was extremely high coupling between tasks, which made maintenance painful and collaborative development nearly impossible. The evaluated services, models, and algorithms usually come from different teams or colleagues; when everything is packed into one service, the parts interfere with each other heavily, so overall reusability ends up very low.

Independent, long-running container services

We then explored a second solution: splitting services by category, which reduces the complexity to a degree. But the complexity within each category remains high. Taking model evaluation as an example, different models use different frameworks; if they are still packed into the same service, keeping that one service compatible with every framework is a particularly complex problem.

Solution two does have a benefit: the service categories are relatively independent. But problems remain. Complexity within a category is still high, and the services are long-running, while the evaluation frequency varies greatly even among businesses of the same category. The result is idle services and low resource utilization.

Elastic container tasks

After EKS[1] launched, we became one of the company's first adopters with a real customer-facing business, and we began exploring another solution: using elastic container tasks to run evaluation tasks. It has worked very well. Tasks are completely isolated from each other, and maintenance cost is very low: we no longer need to deploy our own services or manage our own resources.

This is also a great benefit for the developer of each evaluation task: developers can do whatever they need without worrying that the platform framework will not support it, or that their changes will affect other businesses. For the overall platform design, the biggest benefit is that resources are released as soon as a task finishes, which makes resource usage very efficient.

A comparison of several solutions

The following compares the three solutions and summarizes some of the benefits of EKS.

In solution one, although unified maintenance is possible, it is very difficult for a single service to satisfy the varied dependencies of all evaluation objects, and coupling between tasks is extremely high, so only a limited set of scenarios can be supported.

In solution two, a degree of isolation is achieved and coupling between tasks is lower. However, each service has its own pattern of busy and idle periods, so resource consumption is large. As shown in the figure, some services are fairly regular, alternating between busy and idle periods; some are busy all the time; and some are busy only occasionally and sit idle for long stretches.

Elastic container tasks, by contrast, allocate resources when a task starts and release them as soon as it ends; the maintenance cost is low, and resources are used efficiently.

5. “Elasticity” in evaluation tasks

The figure below shows the overall EKS-based task scheduling flow. We package each task as an image and push it to an image registry; the scheduler then deploys the image to EKS for execution. This is how we resolve the uncertainty the evaluation tasks face.
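Under the hood, an elastic container task on Kubernetes-style infrastructure is typically expressed as a one-off Job that names the task's image and resource needs. The sketch below builds such a manifest as a plain dictionary; the field values (name, image, resource requests) are placeholders, and this is an illustration of the general Kubernetes `batch/v1` Job shape rather than the platform's actual submission code.

```python
# Sketch: build a Kubernetes batch/v1 Job manifest for a single
# evaluation task. Each task gets its own image, so the environment
# uncertainty is packed into the image rather than into a shared service.

def make_eval_job(name, image, command, cpu="2", memory="4Gi"):
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                        },
                    }],
                    # Evaluation tasks are one-shot; do not restart on exit.
                    "restartPolicy": "Never",
                }
            },
            "backoffLimit": 0,
        },
    }
```

The manifest would then be submitted to the cluster (for example via `kubectl apply` or a Kubernetes API client), and the resources exist only for the lifetime of the Job.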

Release process of an EKS evaluation task

In more concrete terms, a task developer only needs to commit code to the code repository. A standard process then packages the code into an image; when the pipeline runs, the scheduler launches that image on EKS for execution, and the results are returned to the developer, who can quickly see how the task performed.
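The release flow above can be sketched as a short chain of steps: commit → build image → schedule on EKS → return result. The registry URL and the tagging scheme below are invented for illustration, and the scheduler is stubbed as a callable.

```python
# Toy sketch of the release flow: a commit is built into an image,
# the image is scheduled, and the outcome flows back to the developer.

def build_image(commit_id: str) -> str:
    # A real CI step would build and push the image; here we only
    # derive an image tag from the commit (assumed convention).
    return f"registry.example.com/eval-tasks:{commit_id[:8]}"

def run_release(commit_id: str, schedule) -> dict:
    image = build_image(commit_id)
    # `schedule` stands in for the EKS scheduler: it runs the image
    # and returns the task's outcome.
    outcome = schedule(image)
    return {"image": image, "outcome": outcome}
```

Because each step only depends on the previous step's output, the same flow works for any task image, which is exactly what makes the platform easy to extend.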

6. Summary

EKS has genuinely helped us support different scenarios and expand into new ones.

Previously, supporting a new evaluation scenario often required the platform developers and the developers of specific tasks to adapt to each other constantly: whatever environment a task needed, the platform had to prepare it before the task could run.

With EKS, we can focus solely on the design of the platform architecture. Which environment and which packages a task depends on is left entirely to the task's developer; we package the required environment into an image, the images of different tasks are independent of one another, and EKS schedules them. This greatly improves the extensibility of the whole platform and of our own capabilities.

References

[1] EKS: cloud.tencent.com/product/eks
