<img src="https://main.qcloudimg.com/raw/f16dd23519567f4d48aa7c284cf37f62.jpg" width="700"/>

Tencent Application Service Workflow (ASW) is a service orchestration solution built on a new generation of computing architecture and used to coordinate the execution of distributed tasks. Once the task steps are defined in an ASW workflow, multiple Tencent Cloud services can be invoked in that order to implement a wide range of business scenarios. ASW takes over much of the tedious work of task coordination, state management, and error handling that developing and operating business processes normally requires, making applications simpler and faster to build. It acts as the glue between cloud products and services, providing end-to-end solutions for user scenarios.

01. Background of the Application and Service Choreography Workflow (ASW)

<img src="https://main.qcloudimg.com/raw/19fee1dce4817561398ed6ef109bdf4b.jpg" width="700"/>

As cloud computing has advanced, new-generation approaches such as Function as a Service (FaaS) and serverless computing have become the preferred way for more and more users to run workloads in the cloud. "Serverless" does not mean there are no servers; it means developers no longer have to think about them. There is no need to consider server size, storage type, network bandwidth, or auto-scaling capacity. Nor is there any need to operate and maintain servers: no continuous system patching, no data backups, no software configuration, and so on.

Serverless has natural advantages in ease of development, performance, elastic scaling, ease of deployment, and cost. Users have shifted from buying compute instances and deploying application code on them to writing functions aimed directly at the end business. Tencent Cloud's serverless function computing product, Serverless Cloud Function (SCF), makes it very convenient to process a single request or event; running, scaling, and deploying the function itself are handled by the serverless provider and are transparent to the layer above. As serverless architectures are adopted more widely, users are moving more and more of their business onto serverless.

At this point, orchestrating and composing multiple cloud functions and other cloud services becomes a new technical challenge. ASW was created to meet the need to chain together and orchestrate many atomic services.

ASW (Application Service Workflow) orchestrates cloud services, including cloud functions, in the form of workflows. It supports sequential, parallel, and loop execution, failure retries, exception catching, and input/output processing, and is oriented to the developer's end scenario, providing an end-to-end solution from input to output.
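Failure retry can be illustrated with a minimal sketch modeled on the Retry fields of Amazon States Language (ASL): an initial wait (`interval_seconds`), a maximum number of retries (`max_attempts`), and a multiplier applied to the wait after each retry (`backoff_rate`). The function and the flaky task are hypothetical and not part of ASW; the `sleep` function is injectable so the example runs instantly.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_retry(
    task: Callable[[], T],
    max_attempts: int = 3,
    interval_seconds: float = 1.0,
    backoff_rate: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on any exception with exponential backoff.

    Hypothetical sketch of ASL-style retry semantics: the first retry waits
    `interval_seconds`, each later retry waits `backoff_rate` times longer,
    and at most `max_attempts` retries follow the initial call.
    """
    delay = interval_seconds
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_attempts:
                raise  # retries exhausted: surface the error (a Catch would handle it)
            sleep(delay)
            delay *= backoff_rate
            attempt += 1


# Usage: a hypothetical task that fails twice before succeeding.
calls = {"n": 0}


def flaky_task() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"


waits: List[float] = []
result = run_with_retry(flaky_task, sleep=waits.append)
print(result, waits)  # ok [1.0, 2.0]
```

Injecting `sleep` keeps the retry policy testable without real delays; a production executor would also cap the maximum wait.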

For example, suppose a developer wants to implement OCR on video subtitles. Without ASW, they must manually chain together processing nodes such as video frame extraction, image capture, image saving, the OCR API call, and result saving. This involves a series of development and integration tasks, plus operational concerns such as maintenance, scaling, monitoring, and failure handling. With an ASW workflow, the user does not have to deal with any of this; they only need to describe the end scenario as a workflow written in TCSL to bring the business online quickly.

The workflow provides TCSL (Tencent Cloud States Language), a JSON-based structured language for describing and defining the business logic of a workflow. It is flexible and makes it easy to write readable, maintainable state machine definitions. TCSL is compatible with the ASL (Amazon States Language) syntax of AWS Step Functions and provides Task, Pass, Choice, Parallel, and Map (loop) nodes.
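To make the node types concrete, the sketch below writes a small state machine as a Python dict using ASL field names (`StartAt`, `States`, `Type`, `Next`, `End`, `Choices`, `Default`, `Result`, `Resource`), on the assumption that TCSL's ASL compatibility covers them, and walks it with a toy interpreter that supports only Pass, Task, and Choice. The interpreter, the `resources` mapping from a `Resource` string to a Python callable, and the state and resource names are all hypothetical; real Task nodes would call cloud services, and Parallel and Map are omitted.

```python
from typing import Any, Callable, Dict

Json = Dict[str, Any]

# A small state machine written with ASL field names (hypothetical example).
definition: Json = {
    "StartAt": "Normalize",
    "States": {
        "Normalize": {"Type": "Pass", "Next": "Score"},
        "Score": {"Type": "Task", "Resource": "score-fn", "Next": "Route"},
        "Route": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.score", "NumericGreaterThan": 80, "Next": "Accept"}
            ],
            "Default": "Reject",
        },
        "Accept": {"Type": "Pass", "Result": {"decision": "accept"}, "End": True},
        "Reject": {"Type": "Pass", "Result": {"decision": "reject"}, "End": True},
    },
}

# Comparison operators supported by this toy Choice implementation.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "NumericGreaterThan": lambda a, b: a > b,
    "NumericLessThan": lambda a, b: a < b,
    "StringEquals": lambda a, b: a == b,
}


def read_path(data: Json, path: str) -> Any:
    """Resolve a simple '$.a.b' path (a tiny subset of JSONPath)."""
    value: Any = data
    for key in path.removeprefix("$.").split("."):
        value = value[key]
    return value


def run(defn: Json, data: Json, resources: Dict[str, Callable[[Json], Json]]) -> Json:
    """Toy interpreter for Pass, Task and Choice states (not ASW's executor)."""
    name = defn["StartAt"]
    while True:
        state = defn["States"][name]
        kind = state["Type"]
        if kind == "Pass":
            data = state.get("Result", data)  # a Result replaces the input
        elif kind == "Task":
            data = resources[state["Resource"]](data)
        elif kind == "Choice":
            # First matching rule wins; otherwise fall back to Default.
            name = next(
                (rule["Next"] for rule in state["Choices"]
                 for op, fn in OPERATORS.items()
                 if op in rule and fn(read_path(data, rule["Variable"]), rule[op])),
                state["Default"],
            )
            continue  # Choice states only select the next state
        else:
            raise ValueError(f"unsupported state type: {kind}")
        if state.get("End"):
            return data
        name = state["Next"]


# Hypothetical Task resource: scores the input by text length.
resources = {"score-fn": lambda d: {**d, "score": len(d["text"]) * 10}}
print(run(definition, {"text": "subtitle!"}, resources))  # {'decision': 'accept'}
```

The interpreter requires Python 3.9+ for `str.removeprefix`.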

02. Analysis of technical challenges

Designing and implementing such a flexible workflow system raises many problems. This section analyzes three of them: data volume, observability, and architectural elasticity.

1. ASW is a data-intensive product

All data exchanged between the microservices a user chains together must be forwarded or transmitted through ASW, so a large volume of data flows through it. In this situation the CPU is not the first bottleneck; memory, network, and other hardware that carry heavy data I/O are. ASW therefore has to choose its database and storage middleware carefully.

The design target is 10 billion executions per day, which produces 10 billion execution records and well over 10 billion execution-history entries. This data is written far more often than it is queried, is written sequentially, and needs expiration logic.
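To put the design target in perspective, 10 billion executions per day averages out to roughly 116,000 executions per second, before accounting for peaks. The short calculation below shows the arithmetic; the 1 KiB record size used for the storage estimate is purely an illustrative assumption, not a figure from ASW.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
EXECUTIONS_PER_DAY = 10_000_000_000     # design target from the text

avg_per_second = EXECUTIONS_PER_DAY / SECONDS_PER_DAY

# Illustrative assumption only: 1 KiB per execution record.
ASSUMED_RECORD_BYTES = 1024
bytes_per_day = EXECUTIONS_PER_DAY * ASSUMED_RECORD_BYTES

print(f"average executions/s: {avg_per_second:,.0f}")             # 115,741
print(f"records/day at 1 KiB: {bytes_per_day / 2**40:.2f} TiB")   # 9.31 TiB
```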

For execution data, we write records sequentially to Redis and developed a dedicated cleanup program that removes expired data. This could be further changed to use data middleware that supports expiration natively. For execution history, ASW uses CLS (Cloud Log Service), Tencent Cloud's logging service, to store the massive volume of records.
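Because records are written in time order, expiration is cheap: the oldest records are always at the front, so a cleaner can stop at the first record that has not yet expired. The in-memory sketch below illustrates only that idea; it is not ASW's cleanup program, and the class and method names are hypothetical. In Redis, a comparable pattern would be a sorted set scored by write timestamp and trimmed with `ZREMRANGEBYSCORE`.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque


@dataclass
class Record:
    written_at: float  # seconds since epoch
    payload: Any


class ExecutionLog:
    """Append-only log with time-based expiration (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.records: Deque[Record] = deque()

    def append(self, now: float, payload: Any) -> None:
        # Sequential write: timestamps only grow, so the deque stays sorted.
        self.records.append(Record(now, payload))

    def purge_expired(self, now: float) -> int:
        """Drop records at least `ttl` seconds old; return how many were removed."""
        removed = 0
        while self.records and now - self.records[0].written_at >= self.ttl:
            self.records.popleft()
            removed += 1
        return removed


log = ExecutionLog(ttl_seconds=60)
for t in (0, 10, 50, 70):
    log.append(now=t, payload={"execution": t})
print(log.purge_expired(now=100))  # 2 (records written at t=0 and t=10)
```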

2. Workflow products need to provide adequate observability

ASW serves the user's end scenario: every workflow is part of the user's business, so jitter or unavailability in a workflow directly harms that business. Adequate observability is therefore essential. Metrics such as executions started per second, successful executions, failed executions, and execution duration should be exposed, and this data has to be reported from the ASW execution code. Adding the instrumentation points is not difficult, but handling such a huge volume of metric data is a challenge.

For observability, ASW uses Tencent Cloud Monitor (CM) to collect and aggregate the metric data generated during execution. It provides monitoring, alarms, and dashboard visualization for four metrics: executions started, executions succeeded, executions failed, and execution duration.
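One way to keep this reporting volume manageable is to aggregate the four metrics in process and report one summary per interval instead of one event per execution. The sketch below assumes that approach; it is not ASW's implementation, and the `report` callback passed to `flush` stands in for a call to the monitoring service, whose actual API is not described here.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MetricWindow:
    started: int = 0
    succeeded: int = 0
    failed: int = 0
    durations_ms: List[float] = field(default_factory=list)


class ExecutionMetrics:
    """Aggregates per-interval execution metrics in memory (hypothetical sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._window = MetricWindow()

    def on_start(self) -> None:
        with self._lock:
            self._window.started += 1

    def on_finish(self, ok: bool, duration_ms: float) -> None:
        with self._lock:
            if ok:
                self._window.succeeded += 1
            else:
                self._window.failed += 1
            self._window.durations_ms.append(duration_ms)

    def flush(self, report: Callable[[Dict[str, float]], None]) -> None:
        """Swap in a fresh window and report one summary of the old one."""
        with self._lock:
            window, self._window = self._window, MetricWindow()
        durations = window.durations_ms
        report({
            "started": window.started,
            "succeeded": window.succeeded,
            "failed": window.failed,
            "avg_duration_ms": sum(durations) / len(durations) if durations else 0.0,
        })


metrics = ExecutionMetrics()
for ok, ms in [(True, 120.0), (True, 80.0), (False, 400.0)]:
    metrics.on_start()
    metrics.on_finish(ok, ms)
metrics.flush(print)
# {'started': 3, 'succeeded': 2, 'failed': 1, 'avg_duration_ms': 200.0}
```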

3. The architecture needs sufficient elasticity

Traffic peaks can arrive at any time, and a workflow product deployed on the public cloud will see unexpected spikes, so the overall architecture must be able to scale out horizontally to absorb them.

For elasticity, all ASW services are deployed as containers on TKE (Tencent Kubernetes Engine), Tencent Cloud's container service. HPA (Horizontal Pod Autoscaler) policies are configured to handle traffic peaks, and TKE's built-in monitoring is used to observe the running state of the containers themselves.
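The Kubernetes Horizontal Pod Autoscaler derives the desired replica count from the ratio of the current metric value to its target: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below reproduces that documented rule, including the default 10% tolerance within which HPA skips rescaling, and clamps the result to the configured minimum and maximum. The numbers are illustrative only; ASW's actual HPA policy is not given in the text.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
    tolerance: float = 0.1,
) -> int:
    """HPA scaling rule: ceil(current * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 10 pods averaging 90% CPU against a 60% target -> scale out to 15.
print(desired_replicas(10, current_metric=90, target_metric=60))  # 15
```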

03. System architecture of the Application and Service Choreography Workflow (ASW)

<img src="https://main.qcloudimg.com/raw/ee03124490c7eaa24bf694ada0667196.jpg" width="700"/>

The overall ASW architecture consists of the frontend and SDK, the permission service, the scheduling service, the template service, and the executor, plus the external infrastructure and middleware that support the system as a whole.

Each module has its own role, and together they strike a good balance among performance, extensibility, and cost. The core functions of each module are briefly introduced below.

  • Permission service

Its main functions fall into two parts:

  1. Authenticating users who come from the console and verifying that the user account has the roles ASW requires;
  2. Obtaining temporary secret keys when a running state machine needs to invoke cloud resources.

For the second function, the permission service handles ticket exchange and ticket caching, including expiration and refresh logic. Executors call the permission service up to billions of times a day.
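With call volumes of that size, caching lets most requests be answered from memory, with a new exchange only when a cached key is close to expiring. The sketch below shows one way to structure such a cache; `fetch_credentials` stands in for the actual ticket exchange, and the names, the 300-second refresh margin, and keying the cache by role are assumptions, not ASW's implementation.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TempCredential:
    secret_id: str
    secret_key: str
    expires_at: float  # epoch seconds


class CredentialCache:
    """Caches temporary credentials per role, refreshing shortly before expiry."""

    def __init__(
        self,
        fetch_credentials: Callable[[str], TempCredential],  # hypothetical exchange call
        refresh_margin_s: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch_credentials
        self._margin = refresh_margin_s
        self._clock = clock
        self._lock = threading.Lock()
        self._cache: Dict[str, TempCredential] = {}

    def get(self, role: str) -> TempCredential:
        # Holding the lock during the fetch keeps the sketch simple and
        # prevents duplicate exchanges, at the cost of serializing misses.
        with self._lock:
            cred = self._cache.get(role)
            if cred is None or self._clock() >= cred.expires_at - self._margin:
                cred = self._fetch(role)  # cache miss or about to expire
                self._cache[role] = cred
            return cred


now = [0.0]
fetches = []


def fake_fetch(role: str) -> TempCredential:
    fetches.append(role)
    return TempCredential("id", "key", expires_at=now[0] + 3600)


cache = CredentialCache(fake_fetch, clock=lambda: now[0])
cache.get("example-role")
cache.get("example-role")   # served from the cache
now[0] = 3400               # within 300 s of expiry
cache.get("example-role")   # triggers a refresh
print(len(fetches))  # 2
```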

  • Template service

Interacts with the console and SDK to create, read, update, delete, and manage template data. Users' requests to create and edit state machines are served by the template service. Because this module mainly interacts with users, its concurrency is not particularly high.

  • Scheduler

When a user starts an execution (calls the StartExecution API) through the console or SDK, the request is forwarded by Tencent Cloud API and reaches the scheduler. The scheduler checks parameters, fetches the TCSL code, performs load balancing, generates the execution QRN, and writes the execution data, then sends the request to an executor selected by the load-balancing module, which actually runs the state machine. Because the user's core logic depends on starting executions, this path needs ample performance and elasticity. The scheduler also handles stopping executions, getting execution status, listing executions, checking executor heartbeats, and so on.
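The text does not say which load-balancing algorithm the scheduler uses, so the sketch below assumes a simple one for illustration: executors that have not sent a heartbeat within a timeout are treated as unhealthy, and the request goes to the healthy executor with the fewest running executions. All names, the addresses, and the least-loaded policy itself are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ExecutorInfo:
    address: str
    last_heartbeat: float  # epoch seconds of the most recent heartbeat
    running: int = 0       # executions currently assigned


class ExecutorPool:
    """Tracks executor heartbeats and picks a target (hypothetical sketch)."""

    def __init__(self, heartbeat_timeout_s: float = 10.0) -> None:
        self.timeout = heartbeat_timeout_s
        self.executors: Dict[str, ExecutorInfo] = {}

    def heartbeat(self, address: str, now: float) -> None:
        info = self.executors.setdefault(address, ExecutorInfo(address, now))
        info.last_heartbeat = now

    def pick(self, now: float) -> Optional[ExecutorInfo]:
        """Least-loaded executor among those with a recent heartbeat."""
        healthy = [e for e in self.executors.values()
                   if now - e.last_heartbeat <= self.timeout]
        if not healthy:
            return None
        target = min(healthy, key=lambda e: e.running)
        target.running += 1  # the caller would now dispatch StartExecution to it
        return target


pool = ExecutorPool(heartbeat_timeout_s=10)
pool.heartbeat("10.0.0.1:9000", now=100)
pool.heartbeat("10.0.0.2:9000", now=95)
pool.heartbeat("10.0.0.3:9000", now=80)   # stale at t=100
picks = [pool.pick(now=100).address for _ in range(3)]
print(picks)  # ['10.0.0.1:9000', '10.0.0.2:9000', '10.0.0.1:9000']
```

A real scheduler would also decrement `running` when an execution finishes.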

  • Executor

The executor is ASW's core runtime module. It interacts only with the scheduler and exposes APIs to start and stop executions, among others. Starting an execution involves checking TCSL syntax, validating input parameters, parsing the TCSL and building a directed acyclic graph, processing input and output between state machine nodes, calling cloud services via RPC, and so on. Depending on the parameters passed when the execution starts, execution history data (the input and output of each node) is reported to external data middleware. Because ASW shares the executor with the TI Matrix project of Tencent Cloud's Zhitianshu AI service platform, the executor supports Kubernetes-related service calls in addition to Tencent Cloud SDK calls.
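The step of building a directed acyclic graph from parsed TCSL can be sketched as follows. The sketch assumes ASL-style `States` whose transitions come from `Next`, `Default`, and `Choices[*].Next`, builds an adjacency list, and uses Kahn's algorithm to produce a topological order, rejecting definitions that reference undefined states or whose transitions form a cycle. It illustrates the technique only and is not ASW's parser; nested Parallel and Map branches are ignored, and the OCR pipeline states and resource names are hypothetical, loosely modeled on the video-subtitle example.

```python
from collections import deque
from typing import Any, Dict, List


def successors(state: Dict[str, Any]) -> List[str]:
    """Collect the states a node can transition to."""
    targets = [rule["Next"] for rule in state.get("Choices", [])]
    for key in ("Next", "Default"):
        if key in state:
            targets.append(state[key])
    return targets


def build_dag(definition: Dict[str, Any]) -> List[str]:
    """Return a topological order of states; raise if the graph is invalid."""
    states = definition["States"]
    edges = {name: successors(s) for name, s in states.items()}
    indegree = {name: 0 for name in states}
    for name, targets in edges.items():
        for t in targets:
            if t not in states:
                raise ValueError(f"{name} -> {t}: undefined state")
            indegree[t] += 1

    # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    queue = deque(n for n, d in indegree.items() if d == 0)
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for t in edges[node]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    if len(order) != len(states):
        raise ValueError("transitions contain a cycle")
    return order


ocr_flow = {
    "StartAt": "CaptureFrames",
    "States": {
        "CaptureFrames": {"Type": "Task", "Resource": "capture", "Next": "RunOcr"},
        "RunOcr": {"Type": "Task", "Resource": "ocr", "Next": "SaveResult"},
        "SaveResult": {"Type": "Task", "Resource": "save", "End": True},
    },
}
print(build_dag(ocr_flow))  # ['CaptureFrames', 'RunOcr', 'SaveResult']
```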

04. Directions for architecture evolution

The current architecture provides stable, observable, and elastic service under high traffic, but several areas can still be improved, including but not limited to resource isolation, private deployment, and cost reduction.

Understand “Tencent Cloud ASW Workflow” at a glance

<img src="https://main.qcloudimg.com/raw/afd5ace5211ca216709aa14ba1d8e75f.png" width="700"/>
