Introduction

Since the official release of Istio 1.0.0 this year, the Coohom project has been using Istio as its service mesh in production.

This article shares some of the practices and experience gained from using Istio in the Coohom project.

Coohom project

Founded in 2011 and headquartered in Hangzhou, Zhejiang Province, Hangzhou Qunhe Information Technology Co., Ltd. occupies more than 5,000 square meters of office space. With distributed parallel computing and multimedia data mining as its core technologies, the company launched Kujiale, a cloud platform for home decoration design, in 2013. Kujiale invests in cloud rendering, cloud design, BIM, VR, AR, and AI to deliver a "what you see is what you get" experience: a decoration plan in 5 minutes, a rendering in 10 seconds, and one-click generation of a VR walkthrough. As a "design entrance," Kujiale is committed to building a strong ecological platform connecting designers, home furnishing brands, decoration companies, and homeowners.

Relying on Kujiale's fast cloud rendering capabilities and mature 3D design tooling, Coohom is committed to building a cloud design platform that gives users a free editing experience and polished visual design. Coohom is a young product with no historical baggage in its architecture, and it has been deployed and running on the Kubernetes platform since the project's inception. As a piece of technical groundwork for the Coohom project, we decided to adopt a service mesh for its service governance.

Why Istio

Since Istio is a Google-led product, using it in practice means running on the Kubernetes platform. For the Coohom project this was no obstacle: Coohom was already running stably on Kubernetes before Istio entered our production environment. Let's start by listing the features Istio provides (service discovery and load balancing are already provided by Kubernetes):

  1. Traffic management: control the flow of traffic between services and the direction of API calls; circuit breaking, grayscale release, and A/B testing can all be built on this capability;

  2. Observability: Istio can map out service dependencies from traffic and provide non-intrusive monitoring (Prometheus) and tracing (Zipkin);

  3. Policy enforcement: this is the part Ops cares about. Policies such as quotas, rate limiting, and even billing can be implemented in the mesh, completely decoupled from application code;

  4. Service identity and security: provides identity and authentication for services in the mesh. This may seem unnecessary at a small scale, but it is essential in a large cluster.

However, these features are not the fundamental reason we decided to use Istio. The following is why we decided to try it:

  • First: Istio uses a new model, the sidecar, to manage and control microservices, completely decoupling the service framework from application code. Business developers do not need to learn an additional service framework and can focus solely on their own business, while the Istio layer can be operated and managed by a dedicated person or team. This greatly reduces the cost of doing microservices well.

  • Second: Istio comes out of Google, and it is the official service mesh solution on Kubernetes; all of its features work on Kubernetes out of the box, with no modification required. Going deep into Istio and following its community development greatly reduces the cost of reinventing the wheel.

Coohom's progress with Istio

At present, Coohom uses Istio as its service mesh in production clusters in multiple regions. Of the features Istio provides, network traffic management for the Coohom project has been handed over entirely to Istio, grayscale releases are performed through Istio, and traffic leaving the Kubernetes cluster is also managed through Istio.

Switching from plain Kubernetes to Kubernetes + Istio

The Coohom project was already running on the Kubernetes platform before Istio was introduced. From the perspective of the technology stack, Coohom's architecture can be divided simply into the following parts:

  • Node.js applications (Egg)

  • Java applications (Spring Boot)

From the perspective of network traffic management, services can be divided into three categories:

  • Services that accept only external traffic;

  • Services that accept only intra-cluster traffic;

  • Services that accept both external and intra-cluster traffic.

In our case, almost all Node.js applications fall into the first category, some Java applications fall into the second, and other Java applications fall into the third. For clarity, imagine a simple scenario:

In this scenario we have a page service that renders page content for the user's browser; the user then accesses both the page service and the account service. The account service stores user names, passwords, and other related information; it also calls the permission service to check whether the user has the appropriate permissions, and the page service likewise calls some of the account service's interfaces. Under the traffic taxonomy above, the page service is a category 1 service, the permission service is a category 2 service, and the account service is a category 3 service. Both the account service and the permission service also connect to an external RDS instance for storage; it is important to note that the RDS is not inside the Kubernetes cluster.
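For reference, here is a minimal sketch of what the account-service Kubernetes Service behind this scenario might look like (the app: account-service selector label and container port 8080 are assumptions for illustration; the page and permission services would be analogous):

apiVersion: v1
kind: Service
metadata:
  name: account-service
spec:
  selector:
    app: account-service    # assumed Pod label; must match the Deployment's template labels
  ports:
  - name: http
    port: 80                # the port referenced by the Ingress and VirtualServices below
    targetPort: 8080        # assumed container port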

In the past, when we used only Kubernetes, we had to write a Kubernetes Ingress so that users could reach the corresponding services correctly. It is worth noting that only the account service and the page service need to be exposed externally, so only the rules for these two services appear in the Ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - backend:
          serviceName: account-service
          servicePort: 80
        path: /api/account
      - backend:
          serviceName: page-service
          servicePort: 80
        path: /

After moving to Istio, with istio-proxy injected as a sidecar into every Pod, these three services could still use the Kubernetes Ingress above to route traffic to the corresponding services. But since we are using Istio, we want to take full advantage of its traffic management capabilities, so we hand the responsibility of routing traffic to services over to Istio's VirtualService. When we first adopted Istio, we therefore transformed the Kubernetes-only solution above into the following one:

The Ingress entry point

First, we create an Ingress under the istio-system namespace and import all traffic for the host www.example.com into istio-ingressgateway. In this way, traffic management is handed over to Istio right at the cluster's traffic entry point.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: istio-ingress
  namespace: istio-system
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - backend:
          serviceName: istio-ingressgateway
          servicePort: 80
        path: /

After handing traffic over to Istio, we need to tell Istio the specific route-to-service matching rules, which is done through a Gateway plus a VirtualService. Note that the service names below are shorthand, so both resources must be deployed in the same Kubernetes namespace as the corresponding services.

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: example-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: example-http
      protocol: HTTP
    hosts:
    - "www.example.com"

---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: example-virtualservice
spec:
  hosts:
  - "www.example.com"
  gateways:
  - example-gateway
  http:
  - match:
    - uri:
        prefix: /api/account   # more specific prefix first: Istio evaluates rules in order
    route:
    - destination:
        port:
          number: 80
        host: account-service
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        port:
          number: 80
        host: page-service

External service registration

After the steps above, when we restarted the service instances and istio-proxy was automatically injected, we found that the two back-end Java applications failed to start. The startup logs showed that they could not connect to the external RDS. This is because all of our network traffic is now controlled by Istio, and any service external to the cluster must be registered with Istio before traffic to it can be forwarded successfully. A very common case is an external RDS accessed over a TCP connection, and of course the same applies to external HTTP services.

Here is an example of registering an external RDS with Istio:

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: mysql-external
  namespace: istio-system
spec:
  hosts:
  - xxxxxxxxxxxx1.mysql.rds.xxxxxx.com
  - xxxxxxxxxxxx2.mysql.rds.xxxxxx.com

  addresses:
  - xxx.xxx.xxx.xxx/24
  ports:
  - name: tcp
    number: 3306
    protocol: TCP
  location: MESH_EXTERNAL
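As noted above, external HTTP services are registered the same way. A minimal sketch for a hypothetical external API served over HTTPS (the host api.external-example.com is an assumption for illustration):

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-api
  namespace: istio-system
spec:
  hosts:
  - api.external-example.com   # hypothetical external host
  ports:
  - name: https
    number: 443
    protocol: HTTPS
  resolution: DNS              # resolve the host via DNS rather than static addresses
  location: MESH_EXTERNAL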

Supporting grayscale release

The istio-ingress + Gateway + VirtualService solution above can replace our earlier solution that used only a Kubernetes Ingress. However, if we stopped at this step, the benefits of Istio would not be fully realized. It is worth mentioning that in the istio-ingress above we imported all traffic for www.example.com into istio-ingressgateway; from this step onward, the istio-ingressgateway logs show the status of all forwarded traffic, which is very useful in daily debugging. Still, the scheme so far does not make full use of Istio's capabilities. Next, I will describe how we modified this scheme to carry out Coohom's day-to-day grayscale releases.

Continuing with the example above, suppose all three services need to be released at the same time, all three require a grayscale release, and we have the following requirements for the release:

  • Initially, the grayscale version should be visible only to internal developers; external users must have no access to it.

  • After internal developers have verified the grayscale version, gradually open up the ratio of traffic shifted from the old instances to the new ones.

  • When an external user lands on the new (or old) version of a service, the entire service chain behind that request should consist of new (or old) instances.

To support these grayscale release requirements, we have the following work to complete:

  1. Define rules that tell Istio, for a given Kubernetes Service, which backing Deployment instances belong to the new version and which to the old;

  2. Redesign the VirtualService routing policy so that overall route management satisfies the second and third requirements above;

  3. Design a reasonable process so that, after the grayscale release completes, the final state can be restored to match the initial state.

Defining the rules

To let Istio know which instances of a service are old and which are new, we need a DestinationRule. Take the account service as an example:

As the example below shows, for the account service, all Pods labeled type: normal are grouped into the normal subset, and all Pods labeled type: grey are grouped into the grey subset. This is how Istio later tells new instances from old ones: a Pod labeled type: normal is an old instance, and a Pod labeled type: grey is a new one. The same rule applies to all three categories of service, so we will not elaborate further.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: account-service-dest
spec:
  host: account-service
  subsets:
  - name: normal
    labels:
      type: normal
  - name: grey
    labels:
      type: grey
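For these subsets to take effect, the Pods themselves must carry the type label. A minimal sketch of what the new-version Deployment's Pod template might look like (the Deployment name account-service-grey, the app label, and the image tag are assumptions for illustration):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: account-service-grey        # hypothetical name for the new-version Deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: account-service
      type: grey
  template:
    metadata:
      labels:
        app: account-service        # assumed label shared with the Service selector
        type: grey                  # marks these Pods as the new (grey) instances
    spec:
      containers:
      - name: account-service
        image: example/account-service:new   # hypothetical new-version image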

Refactoring VirtualService

As mentioned earlier, we divide all services on the Kubernetes platform into three categories by network traffic source. The reason for this distinction is that each category calls for a different VirtualService design. Let's start with the first category, services that receive external traffic, with the page service as an example:

For requests whose end-user header is exactly test, Istio routes them to the grey subset mentioned above; that is, these specific requests go to the grayscale version. All other requests are routed to the normal subset, the old instances, just as before.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: page-service-external-vsc
spec:
  hosts:
    - "www.example.com"
  gateways:
  - example-gateway
  http:
  - match:
    - headers:
        end-user:
          exact: test
      uri:
        prefix: /
    route:
    - destination:
        port:
          number: 80
        host: page-service
        subset: grey
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        port:
          number: 80
        host: page-service
        subset: normal

Next, let's look at the second category of service, the permission service. Here is a VirtualService example for the permission service:

Careful readers will notice the naming: the page service's VirtualService above was named page-service-external-vsc, while this one is named auth-service-internal-vsc. The name has no effect on the actual behavior; it is simply my personal naming habit, reminding me whether a rule applies to external traffic or to intra-cluster traffic. Here we define a rule for internal traffic: only traffic coming from Pod instances labeled type: grey may enter the grey subset. This satisfies our third requirement above: the entire service chain consists either entirely of new instances or entirely of old ones.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: auth-service-internal-vsc
spec:
  hosts:
  - auth-service
  http:
  - match:
    - sourceLabels:
        type: grey
    route:
    - destination:
        host: auth-service
        subset: grey
  - route:
    - destination:
        host: auth-service
        subset: normal

For our third category, the account service, which receives both external and intra-cluster traffic, we simply combine the two kinds of VirtualService above:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: account-service-external-vsc
spec:
  hosts:
    - "www.example.com"
  gateways:
  - example-gateway
  http:
  - match:
    - headers:
        end-user:
          exact: test
      uri:
        prefix: /api/account
    route:
    - destination:
        port:
          number: 80
        host: account-service
        subset: grey
  - match:
    - uri:
        prefix: /api/account
    route:
    - destination:
        port:
          number: 80
        host: account-service
        subset: normal

---

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: account-service-internal-vsc
spec:
  hosts:
  - "account-service"
  http:
  - match:
    - sourceLabels:
        type: grey
    route:
    - destination:
        host: account-service
        subset: grey
  - route:
    - destination:
        host: account-service
        subset: normal

At this point, we have completed the preparation for grayscale release, which is also the biggest step. When a new service version is deployed, internal developers initially reach it by adding the specific header, while all external users are guaranteed to reach only the old version. After internal staff have verified the new instances' behavior in production, we gradually open up the traffic ratio to route external user traffic to the new instances. This is achieved by changing the external-vsc of the category 1 and category 3 services. Here is an example:

The example below sends half of the external traffic to the grey subset and the other half to the normal subset. Finally, the weight of the grey subset can be changed to 100 and the weight of the normal subset to 0; at that point all traffic is routed to the grey subset and the grayscale release is complete.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: page-service-external-vsc
spec:
  hosts:
  - "www.example.com"
  gateways:
  - example-gateway
  http:
  - route:
    - destination:
        host: page-service
        subset: grey
      weight: 50
    - destination:
        host: page-service
        subset: normal
      weight: 50

Finishing touches

With the scheme above, we divided all services into three categories by network traffic source and implemented grayscale release for the whole business through Istio. However, the grayscale release is not yet completely finished; we still need a few finishing touches.

Consider the initial state of the whole business: we have 3 Kubernetes Services and 3 Kubernetes Deployments, and every Pod is labeled type: normal. After running through the scheme above, we still have 3 Kubernetes Services and 3 Kubernetes Deployments, but every Deployment's Pods are now labeled type: grey.

Therefore, after the grayscale release we need to restore the state to the initial one, which prepares us for the next grayscale release. The Coohom project uses GitLab CI for CI/CD, so the automated final step of our grayscale release is deeply bound to our GitLab CI scripts; we will not introduce it here, and readers can customize this step according to their own setup.
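As a rough, CI-agnostic illustration of what this restore step amounts to: once the grayscale release is complete, the verified version can be redeployed under the type: normal label (and the grey Deployment removed), so the next release starts from the same initial state. A minimal sketch with hypothetical names:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: account-service             # hypothetical; the long-lived Deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: account-service
      type: normal
  template:
    metadata:
      labels:
        app: account-service
        type: normal                # back to the initial label, ready for the next release
    spec:
      containers:
      - name: account-service
        image: example/account-service:new   # hypothetical: the image just verified via grey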

Conclusion

The above is some of Coohom's practice and experience with grayscale release on Istio. For the Coohom project, Istio went into production only after the official 1.0.0 release, but before that we had already been running Istio in our internal environment for almost half a year, starting from Istio 0.7.1. Admittedly, an internal environment and a production environment will diverge over the medium to long term; however, Istio is completely transparent to the business and can be treated as part of the infrastructure, so we used it in the internal environment first and gained a great deal of experience before bringing it to production.
