Artificial intelligence with Kubernetes

A closer combination of AI and Kubernetes ranks high on many industry predictions for Kubernetes in 2021. Kubernetes is an ideal platform for running DL/ML workloads because of its scalable, distributed nature and its powerful scheduling capabilities.

The figure above shows the architecture of WeBank's open-source machine learning platform. The green blocks are the typical functions of a machine learning platform, including training, development, modeling, data management, and application management. These machine learning platforms typically run on top of Kubernetes, shown in purple: Kubernetes sits at the bottom, and above it is a container management platform (WeBank's developers mentioned KubeSphere at the KubeSphere 2020 Meetup in Beijing). The container management platform provides storage, networking, service governance, CI/CD, and observability capabilities on top of Kubernetes.

Kubernetes is powerful, but AI workloads running on Kubernetes usually require capabilities that are not native to K8s, such as:

  • User management: multi-tenant permission management

  • Multi-cluster management

  • Graphical GPU workload scheduling

  • GPU monitoring

  • Training and inference log management

  • Kubernetes events and audits

  • Alerts and notifications

Specifically, native Kubernetes does not provide the user management capabilities that an enterprise-grade machine learning platform requires. Similarly, native K8s does not provide multi-cluster management, yet users often have many K8s clusters that need to be managed in a unified way. Running AI workloads requires GPUs, and these expensive GPUs need better monitoring and scheduling to improve utilization and save costs. AI training can take a long time to complete, from a few hours to a few days, and the logging system provided by the container platform makes it easier to track progress. Event management on the container platform helps developers locate problems, and audit management makes it easier to know who did what to which resources, giving users fine-grained control over the entire container platform.

Overall, K8s is like Linux/Unix, but users still need a distribution such as Ubuntu or macOS. KubeSphere is an enterprise-grade distributed multi-tenant container platform, which is essentially a modern distributed operating system. KubeSphere provides rich platform capabilities on top of K8s, such as user management, multi-cluster management, observability, application management, microservice governance, and CI/CD.

How to build a machine learning platform with K8s and KubeSphere

The Extreme Stack platform is a machine learning service platform for enterprises and organizations. It provides full AI lifecycle management, from data processing and model training to model testing and model inference, and is committed to helping enterprises and organizations quickly build AI algorithm development and application capabilities. The platform provides low-code development and automated testing, supports intelligent task scheduling and intelligent resource monitoring, and helps enterprises comprehensively improve the efficiency of AI algorithm development, reduce the cost of applying and managing AI algorithms, and quickly achieve intelligent upgrades.

Challenges in the iterative evolution of the Extreme Stack AI platform

Before adopting Kubernetes, the platform used Docker to mount designated GPUs to allocate computing power. A Jupyter online IDE built into the container handled interaction with developers, who wrote training and test code and trained models in the containers allocated to them. At that time there were four problems to solve:

  1. Low utilization of computing power: while developers are writing code, the GPU is only used for debugging. At the same time, developers have to turn the environment on and off manually; if a developer does not shut down the environment after training, it keeps occupying computing resources. Taking algorithm competitions as an example, the average utilization of computing power was 50%, a serious waste of GPU resources.

  2. High storage operation and maintenance cost: the platform used Ceph to store datasets and code. For example, containers mounted Ceph block storage to persist the development environment so that it could be restored on other nodes when used again. With a large number of developers, mounted volumes sometimes could not be released and containers could not be stopped, which affected developers.

  3. Dataset security could not be guaranteed: commercial algorithm datasets are often confidential, which requires separating data ownership from the right to use the data. For example, algorithm development for many large government enterprises is often outsourced to professional AI companies. Enabling the algorithm engineers of these external AI companies to complete algorithm development without having access to the datasets is an urgent demand of government and enterprise customers of the algorithm platform.

  4. High labor cost of algorithm testing: for each algorithm submitted by a developer, accuracy and performance must be evaluated, and the algorithm can only go online after reaching the accuracy and performance required by the demand side. Taking algorithm competitions as an example, the AI platform generally provides the training dataset and the test dataset to the user; after development is complete, the user runs the test set with the algorithm, writes the results into a CSV file, and submits it together with the algorithm. For the winning developers, we then have to restore each developer's test environment, run the test set with their algorithm, and compare the results with the submitted CSV file to confirm it was not modified and ensure fair play. All of this requires a lot of testing staff. As a platform positioned for AI development, with thousands of algorithms to be developed on it, the Extreme Stack platform urgently needed to reduce the labor cost of testing.

To address these issues, we decided in mid-2018 to introduce Kubernetes and refactor the platform. However, the team had no one proficient in Kubernetes at the time, and Kubernetes has a steep learning curve. Later we found KubeSphere, the open-source container management platform from QingCloud Technologies, which hides many of the underlying details of Kubernetes. Users only need to operate Kubernetes visually, much like on a public cloud, which lowers the cost of using Kubernetes. The community maintenance team is also very serious and responsible. At the beginning, after KubeSphere was deployed to our test environment, the cluster would crash after running for a while; members of the open-source community helped us solve the problem hands-on, and later QingStor NeonSAN was introduced to replace Ceph, which greatly improved platform stability.

KubeSphere v3.0.0 supports multi-cluster management and custom monitoring, and provides complete event, audit query, and alerting capabilities. KubeSphere v3.1.x added KubeEdge edge node management and multi-dimensional metering and billing, refactored the alerting system with Prometheus-compatible alert settings, added new notification channels including WeCom (Enterprise WeChat), DingTalk, email, Slack, and webhook, and refactored the App Store and DevOps. KubeSphere v3.2, which is under development, will provide better support for running AI workloads, including GPU workload scheduling and GPU monitoring. In the future, KubeSphere v4.0.0 will move to a pluggable architecture, with pluggable front end and back end, on top of which a machine learning platform can be built as a KubeSphere plugin.

Cloud-native AI platform practices

Improve the utilization of computing power resources

  1. GPU virtualization

First, to address the low utilization of computing power during coding, we separated the coding cluster from the training cluster and used GPU virtualization technology to make better use of GPU computing power; there are mature solutions for this on the market. After technical research, we chose GPUManager, open-sourced by Tencent, as the virtualization solution.

GPUManager is implemented by wrapping the GPU driver: it encapsulates and intercepts some key driver interfaces (such as video memory allocation and CUDA thread creation), and during interception it restricts the computing resources a user process can consume. It supports GPU compute and video memory isolation between containers on the same card, so that developers in low-utilization scenarios can share GPUs, and resources on the same card are not preempted during debugging.
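For illustration, a coding Pod that shares a GPU through GPUManager typically requests the vcuda extended resources. The sketch below assumes the resource names and units used by the open-source tkestack/gpu-manager project (vcuda-core in hundredths of a card, vcuda-memory in blocks of 256 MiB); the Pod name and image are placeholders.

    # A coding Pod that asks for roughly a quarter of one GPU and ~4 GiB of video memory.
    apiVersion: v1
    kind: Pod
    metadata:
      name: jupyter-dev                      # hypothetical name
    spec:
      containers:
        - name: notebook
          image: jupyter/tensorflow-notebook # placeholder image
          resources:
            requests:
              tencent.com/vcuda-core: 25     # 25/100 of a GPU
              tencent.com/vcuda-memory: 16   # 16 x 256 MiB = 4 GiB
            limits:
              tencent.com/vcuda-core: 25
              tencent.com/vcuda-memory: 16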

  2. Training cluster computing power scheduling

Training tasks are created as Kubernetes Jobs, so you only need to specify the GPU resources to be used. Combined with a message queue, the computing power utilization of the training cluster can reach full load.

    resources:
      requests:
        nvidia.com/gpu: 2
        cpu: 8
        memory: 16Gi
      limits:
        nvidia.com/gpu: 2
        cpu: 8
        memory: 16Gi
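For reference, the resources snippet above would sit inside a training Job roughly as in the sketch below; the Job name, image, and command are placeholders rather than the platform's actual configuration.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: train-job-example                          # hypothetical name
    spec:
      backoffLimit: 0
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trainer
              image: registry.example.com/train:latest # placeholder image
              command: ["python", "train.py"]          # placeholder command
              resources:
                requests:
                  nvidia.com/gpu: 2
                  cpu: 8
                  memory: 16Gi
                limits:
                  nvidia.com/gpu: 2
                  cpu: 8
                  memory: 16Gi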
  3. Resource monitoring

Resource monitoring plays a key role in guiding optimization of both the coding and training clusters. KubeSphere can monitor not only traditional resources such as CPU, but also GPUs: observability monitoring can be completed in a few steps through the custom monitoring panel and simple configuration. Because the Extreme Stack platform is also based on Kubernetes, you can limit the total GPU usage per project and the GPU resource allocation per user.
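One possible way to cap total GPU usage per project is a Kubernetes ResourceQuota in the project's namespace (in KubeSphere, a project corresponds to a namespace). This is only a sketch; the namespace name and quota value are illustrative.

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: gpu-quota
      namespace: algo-team-a           # illustrative project namespace
    spec:
      hard:
        requests.nvidia.com/gpu: "8"   # total GPUs that Pods in this project may request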

Taking the algorithm competition scenario as an example, the average GPU utilization we now monitor is 70% in the coding cluster and 95% in the training cluster.

Storage: QingStor NeonSAN RDMA

We replaced open-source Ceph with QingStor NeonSAN on NVMe SSDs + 25GbE (RDMA), and NeonSAN's performance was impressive: random read/write IOPS reaches 180K, 6 times that of Ceph, and write IOPS reaches 75.7K, 5.3 times that of Ceph. Since then, the AI platform has been able to train with up to 1,000 Pods concurrently, with no delays caused by mounted volumes failing to release.

Dataset security

We solved the dataset security problem by creating a data security sandbox, which allows algorithm developers to train models and evaluate algorithm quality on customer data without compromising the data.

The data security sandbox solves two problems:

  1. Security isolation: the training cluster must not connect to the external network, so data cannot be sent out. Within the platform, however, data could still be transferred programmatically to an environment that developers can access; for example, a developer could set up an HTTP service in the coding environment, which they control, to receive training data sent from the training cluster. KubeSphere provides visual, tenant-based network security policy management, which greatly reduces the operational burden of container platforms at the network level. Network isolation within the same cluster is implemented through network policies, which act like firewalls between Pods. If multiple policies select a Pod, the Pod is limited to the union of the Ingress/Egress rules of those policies, so they do not conflict. Here we only need to set policies that restrict access to the Internet and prohibit access to the coding Pods (a minimal sketch follows after this list).

  2. Training experience: after isolation, developers still need to know the training and testing status in real time. Training and testing must not become a black box, or the efficiency of model training and algorithm testing will suffer greatly, as shown in the figure below.

  • EFK collects and stores container logs. For different datasets, different levels of blacklist and whitelist filtering policies can be set to prevent image data from being leaked; for example, for high-security datasets, only whitelisted logs are displayed.

  • We developed the visualization toolkit EV_toolkit to view training metrics such as accuracy and loss (for example, cross-entropy). Training metrics are written to a specified location through the toolkit's API and then displayed on the interface.

  • Training monitoring: supports unattended training, error notifications, customized training progress reminders, and training completion notifications.
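The isolation described in point 1 can be expressed with a standard Kubernetes NetworkPolicy. The sketch below is illustrative only (the namespace and labels are assumptions): it denies all egress from Pods in the training namespace except to an in-cluster storage namespace, which blocks both Internet access and connections to coding Pods.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: restrict-training-egress
      namespace: training              # illustrative training namespace
    spec:
      podSelector: {}                  # applies to every Pod in the namespace
      policyTypes:
        - Egress
      egress:
        - to:
            - namespaceSelector:
                matchLabels:
                  role: storage        # illustrative: only in-cluster storage is reachable
    # Note: DNS (kube-system) would also need an explicit egress rule if name resolution is required.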

Automated testing

Three problems need to be solved to automate algorithm testing:

  1. Model formats differ across algorithm frameworks; how can invocation be standardized and unified?

  2. How can all algorithm inputs be unified and standardized?

  3. Customer requirements for algorithms keep rising, and the data structures output by algorithms are increasingly complex; how can the algorithm output be compared with the correctly annotated results?

Let's first look at EVSdk, our self-developed inference framework, which solves the first two problems. First, it defines a unified algorithm packaging standard. Unlike other AI platforms that evaluate a single model, the Extreme Stack automated test system evaluates the packaged algorithm: as algorithms become more complex, an algorithm containing multiple models is increasingly common, and evaluating only a single model cannot assess the quality of the finished product delivered to the customer. Model evaluation must also consider compatibility across the model formats of different development frameworks. Second, EVSdk abstracts the algorithm input interface. For video analysis, for example, the first parameter is the detector instance that was created, the second is the input video frame, and the third is a configurable JSON with fields such as ROI and confidence thresholds. Standardizing algorithm input is achieved through the EVSdk specification. Besides solving these two automated-testing problems, EVSdk also provides toolkits, such as algorithm authorization. In addition, with a unified inference framework, the engineering work wrapped around the algorithm can also be standardized and provided by the platform in the form of installation packages, freeing developers from tasks such as processing video streams and exposing the algorithm as a gRPC service.

We also need to solve the problem of algorithm output: the algorithm's prediction data must be located in the JSON or XML it outputs and then compared with the correctly labeled data. The automated test system introduces two concepts:

  • Template: defines the data structure the algorithm outputs. This structure contains several variables and the routing paths describing how to get specific values from the raw output. After the raw output is parsed according to the template, every variable in the template is filled with a concrete value.

  • Routing path: a routing rule for data queries that maps different data objects such as XML/JSON onto the same data structure. For test data from different sources or with different structures, you only need to change the configuration file to obtain the same data structure, so the same parser can compute the algorithm metrics for the same type of task.

The following example shows the smallest unit in a template, but it suffices to illustrate the principle of automated testing. The automated test program finds the age in the algorithm's output and compares it with the age in the label of the picture or video. The route_path field tells the system that there is a root node whose key is people; the value of the age field is the routing path we are looking for. [0] represents an array index, which makes sense because more than one person can appear in a frame; if the brackets contain no index, the field is treated as an object rather than an array element. The key name is age, and @num is its data type. Of course, there can be more complex evaluation criteria; for example, the system may consider the algorithm correct if the age error is within three years.
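Since the original snippet is not reproduced here, the block below is only a rough reconstruction of such a template unit based on the description above; the field names and syntax are assumptions, not the platform's actual template format.

    # Hypothetical template unit: extract "age" for the first person in the output
    # and compare it against the annotated ground truth.
    age:
      route_path: people.[0].age   # root node "people" -> first array element -> key "age"
      type: "@num"                 # expected data type of the extracted value
      tolerance: 3                 # e.g. an age error within three years counts as correct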

Current status of the Extreme Stack platform

The PaaS layer uses KubeSphere as the base and builds three platforms on top of it: a data platform, a development platform, and an inference platform. After raw data is collected, data categories are marked; algorithms then deduplicate the data and remove low-quality samples, followed by manual initial screening. Because AI involves large amounts of training data, we also support data lifecycle management, such as storing data according to its importance. The data then moves to annotation. Annotation is very time-consuming: it requires task distribution and performance management for annotators, plus automatic pre-labeling by retrained models with manual adjustment. Once annotated, the data is transferred to the development platform, which supports two development modes: interactive development and low-code development. Finally, the finished algorithm is published to the algorithm store on the inference platform. The inference platform client can be deployed on the user side; the user only needs to enter an activation code to install the algorithm and then analyze local real-time video streams or pictures.

The future of the Extreme Stack platform

  1. What makes a computer vision algorithm different from other software, such as KubeSphere, is that the software products we use are the same as those used by other companies, whereas an algorithm that works very well for customer A may fail for customer B if the camera is farther away, the angle is different, or the detected object has an extra color; data then has to be collected again to retrain the model. The reproducibility of algorithms is weak, which makes them hard to standardize and productize. To solve a common scenario, computing power and data costs roughly double for every percentage point of recognition rate gained, so the industry as a whole has spent billions to turn face recognition into a product. For thousands of algorithms, how to reuse them at scale is the problem we need to overcome.

  2. Since it is very difficult to adapt to every scenario, back in the real world, do we really need to keep optimizing an algorithm to cover all user scenarios? What each user really cares about is how well the algorithm works in their own scenario. In practice, for a new scene we add the new scene's dataset to the training set and retrain the algorithm. This problem can be solved if the system removes the algorithm engineer from the retraining loop and lets the customer do it through simple operations.

  3. Next we will develop industry suites and low-code tooling: algorithm developers only develop and maintain industry-leading algorithms on the platform, while users upload data describing their own scenes and train models adapted to those scenes, completing algorithm optimization without any coding; developers intervene only when the results are not up to standard. If model optimization is unavoidable, then once the algorithm is placed on the user side it becomes like a pet that the user feeds; in fact, this redefines the relationship between the user and the algorithm. Only in this way can algorithms be turned into products and thousands of scenarios be unlocked.

Serverless applications in the AI field

The application of Kubernetes in AI has been described in detail above. In fact, AI also needs Serverless technology. Specifically, AI data processing, training, and inference can be combined with Serverless technology to achieve higher efficiency and lower costs:

  • AI cannot live without data, and it costs less to process data in Serverless mode.

  • Training can run as timed or event-triggered Serverless workloads, so the expensive GPUs can be released promptly (see the sketch after this list).

  • The trained model can provide services in a Serverless manner.

  • AI inference results can trigger events that are passed to Serverless functions for subsequent processing.
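As a minimal sketch of the timed-training idea (not taken from the original text; the schedule, image, and command are placeholders), a Kubernetes CronJob can launch a training Job on a schedule so that the GPU is only occupied while the Job runs:

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: nightly-training                               # hypothetical name
    spec:
      schedule: "0 2 * * *"                                # run at 02:00 every day
      jobTemplate:
        spec:
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: trainer
                  image: registry.example.com/train:latest # placeholder image
                  command: ["python", "train.py"]          # placeholder command
                  resources:
                    limits:
                      nvidia.com/gpu: 1    # the GPU is freed when the Job completes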

Serverless is a track that cannot be missed in the cloud-native field. With this as the starting point, QingCloud Technologies has open-sourced the cloud-native FaaS platform OpenFunction.

The FaaS platform consists of three parts: Build, Serving, and Events. Build converts function code into a function image; Serving provides scalable function services based on the generated function image; Events connects to external event sources and drives functions to run.

Kubernetes has deprecated Docker as the default container runtime, so images can no longer be built inside a K8s cluster with a Docker-in-Docker style `docker build`; another approach is needed. OpenFunction currently supports Cloud Native Buildpacks by default, with Buildah, BuildKit, Kaniko, and more to follow.

Dapr is Microsoft's open-source distributed application runtime, which provides common infrastructure capabilities needed to run distributed applications. OpenFunction applies Dapr to both OpenFunction Serving and OpenFunction Events.

The most important capability of a function service is automatic scaling between 0 and N replicas. For synchronous functions, OpenFunction supports Knative Serving, with KEDA-HTTP planned for later. For asynchronous functions, OpenFunction combines Dapr and KEDA in an asynchronous function runtime called OpenFunction Async.

After investigating Knative Eventing, we found that although it is well designed, it is somewhat complicated to learn and use. Argo Events has a much simpler architecture, but it is not designed specifically for Serverless and requires all events to be sent to its own EventBus. Based on this survey, we developed our own function event management framework, OpenFunction Events.

OpenFunction Events takes full advantage of Dapr's ability to connect to many event sources and to the message queues acting as the EventBus. It consists of EventSource, EventBus, and Trigger:

  • EventSource: connects to external event sources such as Kafka, NATS, PubSub, S3, GitHub, etc. After obtaining an event, the EventSource can either invoke a synchronous function directly for processing, or send the event to the EventBus for persistence, from which synchronous or asynchronous functions are then triggered.

  • EventBus: uses Dapr's pluggable connectivity to many message queues, such as Kafka, NATS, etc.

  • Trigger: obtains events from the EventBus, filters out the events you care about, and then either triggers a synchronous function directly or sends the filtered events back to the EventBus to trigger an asynchronous function.

Serverless can also be applied to IoT device data processing, streaming data processing, web/mobile backend-for-frontend (BFF) scenarios, and more. OpenFunction is now open source on GitHub. The main repositories include OpenFunction [1], functions-framework [2], builder [3], and samples [4]. You are also welcome to join the Chinese Slack channel [5] of the KubeSphere and OpenFunction community.

Footnotes

[1]. OpenFunction:github.com/OpenFunctio…

[2]. Functions-framework:github.com/OpenFunctio…

[3]. Builder:github.com/OpenFunctio…

[4]. Samples:github.com/OpenFunctio…

[5]. Chinese Slack channel: kubesphere.slack.com/archives/CB…

Authors

Huang He, partner at Extreme Vision Technology; Huo Bingjie, senior architect for KubeSphere at QingCloud Technologies.