Author: KubeVela Community

In the current wave of machine learning, AI engineers not only need to train and debug their models, but also need to deploy them online to verify how they actually perform (although sometimes this part of the work is done by AI system engineers instead). This part of the work is tedious and costs AI engineers extra effort.

In the cloud native era, model training and model serving are usually done in the cloud. This not only improves scalability, but also improves resource utilization, which matters for machine learning scenarios that consume large amounts of computing resources.

But it is often difficult for AI engineers to take advantage of cloud native capabilities. Over time, the cloud native landscape has grown more and more complex. To deploy even a simple model service on top of it, an AI engineer may need to learn several additional concepts: Deployment, Service, Ingress, and so on.

KubeVela is a simple, easy-to-use, and highly extensible cloud native application management tool that lets developers define and deliver applications on Kubernetes quickly and easily, without needing to know any details of the underlying cloud native infrastructure. Its AI addons provide model training, model serving, A/B testing, and other capabilities, covering the basic needs of AI engineers and helping them quickly run model training and model serving in a cloud native environment.

This article introduces how to use KubeVela's AI addons to help engineers complete model training and model serving more easily.

KubeVela AI addons

The KubeVela AI capability is split into two addons: model training and model serving. The model training addon is built on the KubeFlow training-operator and supports distributed training of models in different frameworks such as TensorFlow, PyTorch, and MXNet. The model serving addon is built on Seldon Core, which makes it easy to start a model service from a trained model, and also supports advanced functions such as traffic distribution and A/B testing.

With the KubeVela AI addons, deploying model training jobs and model services becomes much simpler. At the same time, model training and model serving can be combined with KubeVela's own workflow, multi-cluster, and other features to deliver production-ready services.

Note: You can find all the source code and YAML files in KubeVela Samples [1]. If you want to use the pre-trained models in this example, style-model.yaml and color-model.yaml in that folder will copy the models into a PVC.
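As a minimal sketch (assuming you have cloned the samples repository, are inside that folder, and the two files define plain Kubernetes or KubeVela resources that can be applied directly), the models can be prepared like this before running the serving examples below:

```shell
# copy the pre-trained models into PVCs; file names come from the samples repo
kubectl apply -f color-model.yaml
kubectl apply -f style-model.yaml
```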

Model training

First, enable the two addons for model training and model serving:

```shell
vela addon enable model-training
vela addon enable model-serving
```

The model training addon provides the model-training and jupyter-notebook component types, and the model serving addon provides the model-serving component type. You can use the vela show command to see the specific parameters of these three components.

You can also consult the KubeVela AI addon documentation [2] for more information.

```shell
vela show model-training
vela show jupyter-notebook
vela show model-serving
```

Let's train a simple model with the TensorFlow framework that turns grayscale images into color ones. Deploy the following YAML file:

(The model comes from emilwallner/Coloring-greyscale-images [3].)

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: training-serving
  namespace: default
spec:
  components:
  # train the model
  - name: demo-training
    type: model-training
    properties:
      # image for the model training
      image: fogdong/train-color:v1
      # framework of the model training
      framework: tensorflow
      # declare storage to persist the model; the default storage class
      # in the cluster will be used to create a PVC
      storage:
        - name: "my-pvc"
          mountPath: "/model"
```

At this point, KubeVela will pull up a TFJob for model training.
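If you want to watch the training, a quick sketch (assuming the training-operator CRDs installed by the addon are available in your cluster):

```shell
# check the application driving the training
vela status training-serving

# inspect the underlying TFJob that KubeVela pulled up
kubectl get tfjobs -n default
```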

It is hard to see the effect with the trained model alone, so let's modify this YAML file and put a model service after the model training step. Also, because the model service serves the model directly, and the model's input and output are not intuitive (an ndarray or a tensor), we also deploy a test service that calls the model service and translates its result back into an image.

Deploy the following YAML file:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: training-serving
  namespace: default
spec:
  components:
  # train the model
  - name: demo-training
    type: model-training
    properties:
      image: fogdong/train-color:v1
      framework: tensorflow
      storage:
        - name: "my-pvc"
          mountPath: "/model"
  # start the model serving after the training is done
  - name: demo-serving
    type: model-serving
    dependsOn:
      - demo-training
    properties:
      # protocol of the model serving; can be omitted, defaults to seldon
      protocol: tensorflow
      predictors:
        - name: model
          # number of replicas of the model
          replicas: 1
          graph:
            # name of the model
            name: my-model
            # implementation of the model serving
            implementation: tensorflow
            # the model is saved in the my-pvc PVC declared above
            modelUri: pvc://my-pvc
  # test the model serving
  - name: demo-rest-serving
    type: webservice
    dependsOn:
      - demo-serving
    properties:
      image: fogdong/color-serving:v1
      # expose the address with a LoadBalancer
      exposeType: LoadBalancer
      env:
        - name: URL
          # address of the model serving
          value: http://ambassador.vela-system.svc.cluster.local/seldon/default/demo-serving/v1/models/my-model:predict
      ports:
        # port of the test service
        - port: 3333
          expose: true
```


After deployment, use vela ls to view the status of the application:

```shell
$ vela ls
APP               COMPONENT          TYPE            PHASE    HEALTHY  STATUS         CREATED-TIME
training-serving  demo-training      model-training  running  healthy  Job Succeeded  2022-03-02 17:26:40 +0800 CST
├─                demo-serving       model-serving   running  healthy  Available      2022-03-02 17:26:40 +0800 CST
└─                demo-rest-serving  webservice      running  healthy  Ready:1/1      2022-03-02 17:26:40 +0800 CST
```

As you can see, the application has started normally. Use vela status --endpoint to view the service addresses of the application:

```shell
$ vela status training-serving --endpoint

+---------+-----------------------------------+---------------------------------------------------+
| CLUSTER |     REF(KIND/NAMESPACE/NAME)      |                     ENDPOINT                      |
+---------+-----------------------------------+---------------------------------------------------+
|         | Service/default/demo-rest-serving | tcp://47.251.10.177:3333                          |
|         | Service/vela-system/ambassador    | http://47.251.36.228/seldon/default/demo-serving  |
|         | Service/vela-system/ambassador    | https://47.251.36.228/seldon/default/demo-serving |
+---------+-----------------------------------+---------------------------------------------------+
```

The application has three service addresses: the first is the address of our test service, and the second and third are the native addresses of the model. We can call the test service to see what the model does: the test service reads the content of an image, translates it into a tensor, sends the request to the model service, and then translates the tensor returned by the model service back into an image.
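For reference, a minimal sketch of calling the model service directly through the second address, using the TensorFlow Serving REST protocol that Seldon Core exposes when protocol is tensorflow; the payload shape below is only a placeholder assumption, the real model expects an image-shaped tensor:

```shell
# placeholder input shape; adjust it to what the model actually expects
curl -s -X POST \
  'http://47.251.36.228/seldon/default/demo-serving/v1/models/my-model:predict' \
  -H 'Content-Type: application/json' \
  -d '{"instances": [[[[0.5], [0.5]], [[0.5], [0.5]]]]}'
```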

We chose a black and white image of a woman as input:

After the request, you can see that a color image is output:

Model serving: canary testing

In addition to starting a model service directly, we can use multiple versions of a model within one model service and assign different traffic weights to them for canary testing.

Deploy the YAML below; you can see that both the v1 model and the v2 model are assigned 50% of the traffic. Again, we deploy a test service behind the model service:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: color-serving
  namespace: default
spec:
  components:
  - name: color-model-serving
    type: model-serving
    properties:
      protocol: tensorflow
      predictors:
        - name: model1
          replicas: 1
          # the v1 version of the model gets 50% of the traffic
          traffic: 50
          graph:
            name: my-model
            implementation: tensorflow
            # the v1 model is stored in the PVC
            modelUri: pvc://color-model/model/v1
        - name: model2
          replicas: 1
          # the v2 version of the model gets 50% of the traffic
          traffic: 50
          graph:
            name: my-model
            implementation: tensorflow
            # the v2 model is stored in the PVC
            modelUri: pvc://color-model/model/v2
  - name: color-rest-serving
    type: webservice
    dependsOn:
      - color-model-serving
    properties:
      image: fogdong/color-serving:v1
      exposeType: LoadBalancer
      env:
        - name: URL
          value: http://ambassador.vela-system.svc.cluster.local/seldon/default/color-model-serving/v1/models/my-model:predict
      ports:
        - port: 3333
          expose: true
```

After the models are deployed, use vela status --endpoint to view the addresses of the model service:

```shell
$ vela status color-serving --endpoint
+---------+------------------------------------+----------------------------------------------------------+
| CLUSTER |      REF(KIND/NAMESPACE/NAME)      |                         ENDPOINT                         |
+---------+------------------------------------+----------------------------------------------------------+
|         | Service/vela-system/ambassador     | http://47.251.36.228/seldon/default/color-model-serving  |
|         | Service/vela-system/ambassador     | https://47.251.36.228/seldon/default/color-model-serving |
|         | Service/default/color-rest-serving | tcp://47.89.194.94:3333                                  |
+---------+------------------------------------+----------------------------------------------------------+
```

Request the model with a black and white image of a city:

As you can see, in the result of the first request, the sky and the ground are rendered in color while the city itself remains black and white:

Request again, and this time the sky, the ground, and the city are all rendered in color, because this request was routed to the other version of the model:

By distributing traffic between different versions of the model, we can better evaluate the model results.
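A rough way to observe the 50/50 split is to send the same request several times and compare the responses; a sketch with the same placeholder payload as before:

```shell
# each request is assigned to one of the two model versions by its traffic weight
for i in $(seq 1 10); do
  curl -s -X POST \
    'http://47.251.36.228/seldon/default/color-model-serving/v1/models/my-model:predict' \
    -H 'Content-Type: application/json' \
    -d '{"instances": [[[[0.5], [0.5]], [[0.5], [0.5]]]]}'
  echo
done
```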

Model serving: A/B testing

For the same black and white picture, we can either turn it into a color picture with the model, or transfer the style of another picture onto it by uploading that style picture.

Are color images better for users, or are images in a different style better? We can explore this question with A/B testing.

Deploy the YAML below: by setting customRouting, requests carrying the header style: transfer are forwarded to the style-transfer model. At the same time, the style-transfer model shares an address with the colorization model.

Note: The style transfer model comes from TensorFlow Hub [4].

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: color-style-ab-serving
  namespace: default
spec:
  components:
  - name: color-ab-serving
    type: model-serving
    properties:
      protocol: tensorflow
      predictors:
        - name: model1
          replicas: 1
          graph:
            name: my-model
            implementation: tensorflow
            modelUri: pvc://color-model/model/v2
  - name: style-ab-serving
    type: model-serving
    properties:
      protocol: tensorflow
      # the style-transfer model takes a long time; set a timeout so the request is not cut off
      timeout: "10000"
      customRouting:
        # requests with this custom header
        header: "style: transfer"
        # are routed through this custom service
        serviceName: "color-ab-serving"
      predictors:
        - name: model2
          replicas: 1
          graph:
            name: my-model
            implementation: tensorflow
            modelUri: pvc://style-model/model
  - name: ab-rest-serving
    type: webservice
    dependsOn:
      - color-ab-serving
      - style-ab-serving
    properties:
      image: fogdong/style-serving:v1
      exposeType: LoadBalancer
      env:
        - name: URL
          value: http://ambassador.vela-system.svc.cluster.local/seldon/default/color-ab-serving/v1/models/my-model:predict
      ports:
        - port: 3333
          expose: true
```

After the deployment succeeds, use vela status --endpoint to view the addresses of the model services:

```shell
$ vela status color-style-ab-serving --endpoint
+---------+---------------------------------+-------------------------------------------------------+
| CLUSTER |    REF(KIND/NAMESPACE/NAME)     |                       ENDPOINT                        |
+---------+---------------------------------+-------------------------------------------------------+
|         | Service/vela-system/ambassador  | http://47.251.36.228/seldon/default/color-ab-serving  |
|         | Service/vela-system/ambassador  | https://47.251.36.228/seldon/default/color-ab-serving |
|         | Service/vela-system/ambassador  | http://47.251.36.228/seldon/default/style-ab-serving  |
|         | Service/vela-system/ambassador  | https://47.251.36.228/seldon/default/style-ab-serving |
|         | Service/default/ab-rest-serving | tcp://47.251.5.97:3333                                |
+---------+---------------------------------+-------------------------------------------------------+
```

In this application, the two model services have two addresses each, but the addresses of the second service, style-ab-serving, are not used directly: its custom route already points it at the address of color-ab-serving. Again, we see the effect of the models by requesting the test service.

First, request without the header; the image changes from black and white to color:

We then add a picture of a wave as the style image:

We add the header style: transfer to the request, and you can see that the city is rendered in the style of the waves:
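In plain HTTP terms, the routing can be sketched like this (placeholder payload again): requests that carry the style: transfer header go to the style-transfer model, and everything else goes to the colorization model:

```shell
# routed to the colorization model
curl -s -X POST \
  'http://47.251.36.228/seldon/default/color-ab-serving/v1/models/my-model:predict' \
  -H 'Content-Type: application/json' \
  -d '{"instances": [[[[0.5], [0.5]], [[0.5], [0.5]]]]}'

# routed to the style-transfer model by the custom header
curl -s -X POST \
  'http://47.251.36.228/seldon/default/color-ab-serving/v1/models/my-model:predict' \
  -H 'Content-Type: application/json' \
  -H 'style: transfer' \
  -d '{"instances": [[[[0.5], [0.5]], [[0.5], [0.5]]]]}'
```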

We can also use a picture of an ink painting as the style image:

As you can see, the city is transformed into an ink painting style:

Conclusion

KubeVela's AI addons help you train and serve models more easily.

In addition, combined with KubeVela's multi-environment capability, the same model can be distributed to different environments and tested in each of them, so as to achieve flexible deployment of the model.

Links

[1] KubeVela Samples: github.com/oam-dev/sam…

[2] KubeVela AI addon documentation: kubevela.io/en/docs/nex…

[3] emilwallner/Coloring-greyscale-images: github.com/emilwallner…

[4] TensorFlow Hub: tfhub.dev/google/mage…
