Background

Microservices have been around for a long time. Over that period, different companies and teams formed their own insights, but the understanding of the various aspects of microservices, such as service discovery, consistency, fault tolerance, transactions, circuit breaking, degradation, and configuration, has gradually converged. As teams adopt microservices, services are split ever more finely; the responsibilities of an individual service are simple and clear, and each service is easy to maintain. But while microservices bring great benefits to the team, they also bring some problems.

       

  1. Individual services have simple logic and clear responsibilities, but overall the business complexity has not disappeared. Where did it go? Into the direct and indirect calls between services. When the business logic is complex, the call chains between services become very long, sorting out the call relationships becomes complicated, the code becomes hard to maintain, and the learning cost for new team members stays high.
  2. Microservices also introduce additional complexity: service discovery, consistency, distributed transactions, and other concerns that did not exist in the era of monolithic applications. Even when frameworks encapsulate these capabilities well, services still end up with a large amount of boilerplate and template code, which increases development cost. Service mesh addresses part of this by moving much of the common infrastructure into a sidecar so that developers can focus on business code. Can existing solutions go a step further and let developers write as little code as possible?

So, how do you express complex business logic with as little code as possible? After investigating many solutions, we finally settled on Knative and built a Serverless workflow platform on top of it to solve the problems above.

Knative overview

Knative is the Kubernetes serverless framework announced by Google at the Google Cloud Next conference in 2018; it is currently maintained by Google, Red Hat, IBM, Pivotal, and many others. Knative extends K8S: on top of K8S and Istio, it abstracts common cloud-service capabilities such as service deployment and gray release, so that users no longer need to deal directly with K8S and Istio concepts such as Deployment, ReplicaSet, Pod, Ingress, and DestinationRule and can concentrate on business development; the integration with the internal components of K8S and Istio is handled by Knative itself. In essence, Knative is a user-facing solution for service construction, deployment, and application management. The overall architecture is shown below:

Image from the Knative official website

Knative module composition

  • Build, responsible for CI/CD of projects, has been migrated to the Tekton project and is not covered here
  • Serving is responsible for deploying and serving Serverless applications and functions
  • Eventing is responsible for event publish/subscribe and for the event-driven capabilities built on top of it

Serving main components

  • Service: an abstraction of an application. It manages the entire life cycle of the application and the creation and management of the other resources, such as Route and Configuration

  • Route: manages traffic and controls the routing rules that direct traffic to Revisions

  • Configuration: maintains the desired state of the application and keeps code and configuration decoupled; each change to the Configuration creates a new Revision

  • Revision: an immutable snapshot of the code and configuration, created on every change.

    The Serving component diagram is as follows:
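In addition to the diagram, a minimal Knative Service definition gives a concrete sense of these abstractions. The sketch below is hypothetical (the service name and image are made up); Knative derives the Configuration, Revision, and Route from this single resource:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: append1                       # hypothetical service name
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"   # allow scale-to-zero
    spec:
      containers:
        - image: registry.example.com/demo/append1:latest   # hypothetical image
          env:
            - name: LOG_LEVEL
              value: info
```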

Eventing main components

  • Event Consumer: consumers typically implement the Addressable or Callable interface to receive events and return results
  • Event Source: as the name implies, the event producer. Existing event sources are rich and include the K8S API server, GitHub, Kafka, WebSocket, and so on
  • CloudEvents: the event data specification that Knative uses for event transmission
  • Broker: an event broker that receives events and forwards them to the subscribers whose Trigger filters match
  • Trigger: defines the event filtering rules and the subscriber, and is used together with the Broker (see the sketch after this list)
  • Event Registry: a registry of the available EventTypes
  • Channel: the event persistence and delivery layer
  • Subscription: an event consumer's subscription to events, typically binding a Channel to a consumer
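As an illustration only (the names and the event type are hypothetical), a Broker plus a Trigger that routes matching events to a consumer service could be declared like this:

```yaml
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: demo
---
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: append1-trigger              # hypothetical name
  namespace: demo
spec:
  broker: default
  filter:
    attributes:
      type: dev.example.message      # only forward events of this CloudEvents type (hypothetical)
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: append1                  # hypothetical consumer service
```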

Eventing’s overall architecture is as follows:

Image from Aleksander Slominski, a member of the Knative team

Knative's event-based publish-subscribe mechanism encapsulates Channel and Subscription, so multiple worker nodes can be chained together through publish-subscribe to form higher-level workflow resources for simple workflow development. Knative provides two such high-level workflow resources (a minimal Sequence sketch follows the list):

  1. Sequence, which provides a workflow resource for sequential execution
  2. Parallel, which provides a workflow resource for branching into a list of parallel branches
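As a minimal sketch (step names are hypothetical), a Sequence chains steps that are invoked one after another, each hop passing through a Channel, with the final reply sent to an addressable target:

```yaml
apiVersion: flows.knative.dev/v1
kind: Sequence
metadata:
  name: demo-sequence                # hypothetical name
spec:
  channelTemplate:
    apiVersion: messaging.knative.dev/v1
    kind: InMemoryChannel            # each hop between steps goes through a Channel
  steps:
    - ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: append1                # hypothetical first step
    - ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: display                # hypothetical second step
  reply:
    ref:
      apiVersion: eventing.knative.dev/v1
      kind: Broker
      name: default                  # optional target for the final result
```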

Taking Sequence as an example, the process of event consumption is as follows:

Application practice

Of the three modules Knative provides, Eventing is without doubt the one that matters for workflow: its components lay the foundation for implementing workflows, and it also offers high-level workflow resources for developers to use in business development. However, what Knative provides today is only at the component level, and there is still a long way to go before it can be put into production as a workflow platform.

  1. Knative Eventing provides no UI; developers have to hand-write YAML files to define workflows, which is not developer friendly
  2. The YAML is complicated to write: it involves Broker, Trigger, Parallel, Sequence, and other components, and the learning cost is high
  3. The logic of the Sequence and Parallel workflow components is relatively simple and cannot express complex business logic
  4. There is no unified deployment scheme for business services and workflows, so the deployment cost is high

Workflow module composition

To solve the above problems, iQIYI built a production-level Serverless workflow platform. It repackages the Knative Eventing components and adds drag-and-drop workflow generation, CI/CD, and other functions so that it is easy for developers to use. The project is divided into four modules:

  1. Workflow-Dashboard: the workflow front end, responsible for the workflow list display and drag-and-drop editing
  2. Workflow-Api: the back end behind the workflow pages, responsible for generating Workflow YAML and applying it to K8S
  3. Workflow-Operator: watches Workflow resources in K8S, parses the Workflow YAML, and creates or updates Knative Eventing components to orchestrate the workflow
  4. Workflow-Syncer: watches Workflow resources in K8S and updates the workflow status

Taking workflow creation as an example, the interaction between modules is shown as follows:

A brief analysis of Workflow principles

Take a simple workflow as an example. The business scenario: the workflow receives a message from RocketMQ, appends content to the message, and prints it to the log.

1. A developer logs in to the Workflow UI, creates a workflow definition by dragging and dropping nodes, and submits it to Workflow-Api.

2. Workflow-Api creates a Git project and initializes the project directory and YAML configuration files according to the properties of the workflow nodes.

 

Overall directory structure

The cmd directory contains the entry points of the business code; a subdirectory is generated for each business node and named after it, as shown in the figure below. The service code is written in main.go.

CMD subdirectory structure

The config directory contains the configuration files of the workflow itself, of the service nodes, and of the resources the workflow depends on.

Config directory structure

The most important piece is the Workflow YAML definition, shown below:

Workflow definition Yaml

The YAML consists of two parts, triggers and steps. Triggers are the workflow's triggers and currently support RocketMQ messages and scheduled tasks. Steps define the process and are parsed into Knative Eventing components.
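The actual definition is shown above as an image; purely as a hypothetical sketch of the shape just described (the field names are illustrative, not the platform's real schema), it might look roughly like this:

```yaml
apiVersion: apps.iqiyi.com/v1alpha1
kind: FlowApp
metadata:
  name: demo-workflow                # hypothetical workflow name
spec:
  triggers:                          # workflow triggers: RocketMQ messages or scheduled tasks
    - type: rocketmq
      topic: demo-topic              # hypothetical topic
  steps:                             # parsed into Knative Eventing components by Workflow-Operator
    - name: append1                  # hypothetical node, built from cmd/append1
    - name: display                  # hypothetical node, built from cmd/display
```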

3. The developer pulls the project and develops the services under the cmd directory. After finishing the business logic and pushing the code, the developer opens the Workflow UI and clicks Deploy to package the services and deploy the workflow to K8S. CI/CD uses GitLab CI, and packaging and deployment to K8S are done with Google's ko. ko builds images from the business code in the append1 and display directories under cmd and pushes them to the Docker image registry; image references with the ko prefix are then replaced with the image versions just built.

After the image references are replaced, the YAML files in the config directory, including the Workflow YAML, are applied to K8S in sequence.
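As a rough illustration of this step (the platform's actual pipeline is not shown in the text, so the job below is hypothetical; the registry, paths, and tool installation are assumptions), a GitLab CI job driving ko could look like:

```yaml
# .gitlab-ci.yml (hypothetical sketch, not the actual pipeline)
deploy:
  stage: deploy
  image: golang:1.20                 # assumes kubectl is also available in the job environment
  variables:
    KO_DOCKER_REPO: registry.example.com/demo   # hypothetical image registry
  script:
    - go install github.com/google/ko@latest
    # Build and push images for the code under ./cmd referenced via the ko prefix in the YAML,
    # substitute the references, and apply everything under config/ to the cluster.
    - ko resolve -f config/ | kubectl apply -f -
```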

4. Workflow-Operator establishes a long-lived connection with the API server to watch Workflow resources and the resources the workflow needs to integrate, parses the Workflow YAML, and creates or updates the corresponding Knative Eventing components. The components are chained together according to the workflow definition so that data flows the way the workflow describes. Careful readers may have noticed that the Workflow resource is not a built-in K8S resource but a custom one, with apiVersion apps.iqiyi.com/v1alpha1 and kind FlowApp. Because the Knative Eventing components needed further integration, a new CRD was defined to describe the workflow. There are many ways to build a CRD; since the communication between Workflow-Operator and the API server is relatively complex, a stable, mature, and easy-to-use scaffold was needed, so Operator SDK was chosen as the scaffolding framework for the CRD (see comparisons of Kubebuilder vs Operator SDK).
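For orientation, registering such a custom kind with K8S amounts to creating a CustomResourceDefinition. The sketch below is hypothetical (in practice Operator SDK generates it, and the real field schema is omitted here):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: flowapps.apps.iqiyi.com
spec:
  group: apps.iqiyi.com
  names:
    kind: FlowApp
    listKind: FlowAppList
    plural: flowapps
    singular: flowapp
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true   # real field schema omitted in this sketch
```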

Workflow and Knative Eventing components map as follows:

5. Workflow-Syncer establishes a long-lived connection with the API server to watch the status of Workflow resources. Once a workflow is deployed, Workflow-Syncer picks up its latest status and persists it to the database for the front end to query. At this point, the workflow deployment is complete.

Monitoring and alerting

Monitoring the Serverless workflow is relatively straightforward at this stage. Data exchanged between Knative Eventing components follows the CloudEvents specification and is transported over HTTP. From this point of view, HTTP status codes, response times, and QPS are instrumented in the business service code with the Prometheus client and exposed at the URL path /metrics.

Cluster-level and project-level Prometheus monitoring is enabled in Rancher, custom metrics are configured to scrape the /metrics path, and the data is finally displayed in Rancher's Grafana, which gives HTTP interface monitoring for the business services.
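One common way to register such a scrape target is a Prometheus Operator ServiceMonitor; whether Rancher's setup uses this exact mechanism is an assumption here, and the names below are hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: append1-monitor              # hypothetical name
  namespace: demo
spec:
  selector:
    matchLabels:
      app: append1                   # hypothetical label on the business service's Service
  endpoints:
    - port: http                     # named port on the Service
      path: /metrics                 # path exposed by the Prometheus client
      interval: 30s
```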

With the monitoring data in place, alerts can be raised through Grafana, Prometheus, and other channels; we use a webhook to connect to the company's unified alerting platform. This is a fairly generic monitoring setup: it relies on the fact that Knative Eventing communicates over HTTP, monitors the HTTP metrics of the business services, and reflects the health of the workflow to a certain extent.

Summary and outlook

For a long time, developers have been troubled by complex business logic and repetitive business development, and the iQIYI Serverless workflow platform starts from these two pain points. For complex business logic, expressing it as a workflow diagram makes it clear and intuitive; combined with project management, product managers can even use a product flow chart as the requirement input, and developers implement the workflow according to that chart, which greatly reduces communication cost and project maintenance cost. For repetitive business development, the platform takes low code as its principle: through configuration and centralization it tries to let developers write only the core business code, reducing unnecessary repetitive work.

 

The iQIYI Serverless workflow platform is still in its infancy, and many functions remain to be improved. Future work will focus on the following aspects:

  1. Support for more complex workflow nodes. The Sequence and Parallel resources provided by Knative Eventing are obviously too simple to support complex business logic; control nodes such as selection (branching) and loops need to be added.
  2. Workflow monitoring. Current monitoring covers only the business services, not the workflow itself. Each execution of a workflow, and of every node within it, should be recorded in detail, and the results of each execution should be traceable.
  3. Support for multiple languages. Workflow business services are currently limited by ko's build and deployment, which supports only Go. Packaging for Java, Python, and other languages needs to be supported later.
  4. Support for synchronous invocation. The workflow is an asynchronous model, so scenarios that require a synchronous interface service are not supported yet. Synchronous scenarios are common in everyday development, and the workflow needs to support a synchronous model as well.

You may also want to read

iQIYI Microservice Standard Technical Architecture Practice

High Availability Optimization Practice Based on the Microservice Maturity Model
