Abstract: ROMA Connect, the core system of the ROMA platform, originates from Huawei's process IT integration platform and embodies more than 15 years of enterprise business integration experience within Huawei.

This article is taken from the Huawei Cloud community post "ROMA Integration Key Technologies (1): A Detailed Explanation of API Flow Control Technology".

1 Overview

ROMA Connect, the core system of the ROMA platform, originates from Huawei's process IT integration platform and embodies more than 15 years of enterprise business integration experience within Huawei. It connects IoT, big data, video, unified communications, GIS, and other platforms as well as various application services; provides message, data, and API integration, adaptation, and orchestration; and shields upper-layer business interfaces from the differences between underlying platforms. By turning services, messages, and data into reusable integration capabilities, it supports the rapid rollout of new business and improves application development efficiency. It targets scenarios such as safe campuses, smart cities, and the digital transformation of enterprises. Figure 1 shows the functional view of ROMA Connect.

Figure 1. ROMA Connect functional view

API Connect (APIC), as a core component, provides the API Gateway capability and carries the API integration and opening capabilities. Flow control, as a key feature of the API Gateway, provides rapid and effective security protection for users' API integration and opening. This article describes the implementation of API Gateway flow control in detail and reveals the technical details of high-performance, second-level flow control.

2 Flow control requirements of high-concurrency, high-throughput systems

2.1 Drivers of flow control

In high-concurrency, high-throughput systems, the usual technical keywords are degradation, caching, and flow control. Flow control is the core technology among them, and these techniques complement one another.

  • Value of flow control
  1. Improve system stability/prevent avalanche
  2. Ensure high priority services
  3. Lower response latency and improve user experience
  4. Improve effective system throughput
  5. Restrict business usage, etc.
  • Target parameters for flow control
  1. Limit total concurrency (such as database connection pools, thread pools)
  2. Limit instantaneous concurrency (such as the limit_conn module of Nginx)
  3. Limit the average rate in a time window
  4. Limit the remote interface call rate
  5. Limit the consumption rate of MQ
  6. Limit network traffic
  7. Limit the CPU and memory usage

2.2 Business Challenges

In large-scale service scenarios, the main challenges are high concurrency, low latency, high precision, and the need for flexible, multi-dimensional scalability.

Figure 2. Business challenges

The specific challenges for flow control are as follows:

  • Flow control at 10 calls per day and flow control at 100,000 calls per minute must coexist
  • The flow control feedback cycle is longer than the flow control cycle
  • Flow control involves many dimensions
  • Synchronous flow control processing time affects user experience
  • Statically configured flow control thresholds end up either too high or too low
  • A flow control failure causes service failure
  • Deploying flow control nodes is complex and resource-intensive

3 Analysis of common flow control techniques

3.1 Common Flow Control Logical Architecture

Figure 3. Common flow control logic architecture

The advantages and disadvantages of the various options are shown in the following table:

3.2 Common Flow Control Algorithms

3.2.1 Counter algorithm

Advantages: 1. The algorithm is simple and easy to implement.

Disadvantages: 1. The output is not smooth. 2. It has a boundary problem: at the edge of a flow control period, a burst can slip through that far exceeds the flow control threshold and overwhelms the back-end service.
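The following is a minimal fixed-window counter sketch (illustrative Python, not ROMA Connect code):

```python
import time

class FixedWindowCounter:
    """Allow at most `limit` requests per fixed window of `window` seconds."""

    def __init__(self, limit: int, window: float = 1.0):
        self.limit = limit
        self.window = window
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        # A new window begins: reset the counter.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # over the threshold for the current window
```

Because the counter resets abruptly at the window boundary, requests packed at the end of one window and the start of the next can pass nearly twice the limit within a short interval, which is the boundary problem described above.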

3.2.2 Sliding window algorithm

Advantages: 1. It solves the boundary problem of the counter algorithm. 2. The algorithm is simple and easy to implement.

Disadvantages: 1. The higher the accuracy requirement, the more sub-windows are required and the greater the memory overhead. 2. The output is not smooth.
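A minimal sliding window sketch (illustrative Python, not ROMA Connect code), where accuracy is controlled by the number of sub-window buckets:

```python
import time
from collections import deque

class SlidingWindowCounter:
    """Sliding window of `window` seconds split into sub-window buckets.
    More buckets give higher accuracy but cost more memory."""

    def __init__(self, limit: int, window: float = 1.0, buckets: int = 10):
        self.limit = limit
        self.window = window
        self.bucket_span = window / buckets
        self.buckets = deque()  # (bucket_start_time, count) pairs

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop buckets that have slid out of the window.
        while self.buckets and now - self.buckets[0][0] >= self.window:
            self.buckets.popleft()
        if sum(count for _, count in self.buckets) >= self.limit:
            return False
        # Accumulate into the current bucket, or start a new one.
        if self.buckets and now - self.buckets[-1][0] < self.bucket_span:
            start, count = self.buckets[-1]
            self.buckets[-1] = (start, count + 1)
        else:
            self.buckets.append((now, 1))
        return True
```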

3.2.3 Leaky Bucket algorithm

Advantages: 1. The output rate is constant and independent of the input rate, so the flow control effect is smooth. 2. No boundary problem. 3. It does not rely on tokens.

Disadvantages: 1. Because the output rate of the leaky bucket is constant, it cannot absorb bursts of requests. 2. If the bucket is full, incoming requests are discarded.
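A minimal leaky bucket sketch (illustrative Python, not ROMA Connect code):

```python
import time

class LeakyBucket:
    """Requests fill a bucket of size `capacity`; the bucket leaks at a
    constant `rate` per second regardless of the input rate."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.water = 0.0
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Leak at the constant output rate for the elapsed time.
        self.water = max(0.0, self.water - (now - self.last) * self.rate)
        self.last = now
        if self.water < self.capacity:
            self.water += 1.0
            return True
        return False  # bucket full: the incoming request is discarded
```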

3.2.4 Token bucket algorithm

Advantages: 1. A certain amount of burst traffic is allowed. 2. Complex flow control policies can be built by customizing tokens. 3. No boundary problem.

Disadvantages: 1. If no token is available in the bucket, the incoming request is discarded. 2. Incoming requests cannot be processed by priority.
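A minimal token bucket sketch (illustrative Python, not ROMA Connect code):

```python
import time

class TokenBucket:
    """Tokens are refilled at `rate` per second up to `capacity`; a request is
    accepted only if a token is available, so bursts up to `capacity` pass."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time, capped at the bucket capacity.
        self.tokens = min(float(self.capacity),
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # no token available: the request is rejected
```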

4 ROMA Connect flow control technology implementation

4.1 Overall Strategy

  • Layer high precision and high throughput: distinguish the flow control of different scenarios and adopt different strategies and algorithms

    • High-precision, low-throughput flow control is persisted; high-throughput, high-frequency flow control uses a pure in-memory counting strategy
    • High-throughput, high-frequency flow control does not support HA: after a fault, the data is reset and recalculated
  • Multi-dimension, multi-priority: use multiple Policies for multi-dimensional control; a single request can trigger multiple policies

    • Decouple complex control into a mapping of several simple policies, reducing complexity for users
    • A single request can trigger every policy whose conditions it meets, achieving comprehensive flow control
  • Reduce request latency and Controller workload through mechanisms such as distributed policies, asynchronous processing, and batch declaration

    • Handle flow control requests at the Filter/SDK level as far as possible, so that flow control does not add to service latency
    • Report to the Controller as little as possible to reduce Controller load and improve Controller efficiency
  • Degrade and bypass at the Filter and algorithm-threshold level, so that a fault in the Ratelimit mechanism does not affect services
  • Use a multi-dimensional KEY/VALUE model to provide a common mechanism that adapts to the flow control requirements of different scenarios and applications

    • The API Gateway is the first application scenario
    • The Controller does not need to understand specific services; SDK-encapsulated filters adapt specific services to the flow control Controller

4.2 Logical View

  • The RateLimit SDK accesses the sharded RateLimit Controller based on consistent hashing (see the sketch after this list) and performs rate-limit calculation in Controller memory for high-throughput, high-precision flow control.
  • For high-precision, high-throughput flow control, the RateLimit Controller calculates only in local memory and does not need to retain historical rate-limiting information after a crash.
  • For high-precision, low-throughput flow control, the RateLimit Controller adopts an asynchronous persistence policy so that flow control remains accurate after a Controller crash.
  • The Ratelimit SDK supports automatic degradation when the Ratelimit Controller service is unavailable.
  • Flow control policies are adjusted dynamically based on feedback such as the API response latency collected by the API Gateway.
  • SLA-based flow control policies are supported.
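To illustrate how an SDK might pick a Controller shard via consistent hashing, here is a generic sketch; the node names and the controller_for helper are hypothetical and not ROMA Connect's actual API:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps a flow control key (e.g. "api_id:app_id") to one RateLimit
    Controller shard; virtual nodes spread keys more evenly."""

    def __init__(self, nodes, vnodes: int = 100):
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def controller_for(self, key: str) -> str:
        # First controller clockwise from the key's position on the ring.
        idx = bisect.bisect(self.hashes, self._hash(key)) % len(self.hashes)
        return self.ring[idx][1]

# The same key always lands on the same Controller shard, so that shard can
# keep the key's counters purely in its own memory.
ring = ConsistentHashRing(["controller-0", "controller-1", "controller-2"])
print(ring.controller_for("api-42:app-7"))
```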

4.3 Architecture Design

  • Use a separate Controller scheme

    • An independent Controller cluster provides globally accurate, high-throughput flow control
    • The Controller adopts a sharding mechanism internally
  • Adopt a common Policy and Key/Value model (a hypothetical policy declaration is sketched after this list)

    • An extensible Domain/Policy mechanism adapts to different service scenarios
    • Different policies are associated with different algorithms
  • Provide an SDK and tools, and develop plug-ins such as the API Gateway plug-in

    • Provide a reusable SDK and debugging tools
    • Flow control plug-ins such as the API Gateway plug-in are pre-implemented
  • External log and flow control data analysis module

    • Flow control policies are dynamically revised through data mining and prediction, fed back to the configuration/policy management module
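For illustration, a Domain/Policy, Key/Value declaration could look like the hypothetical record below; the field names and values are assumptions, not ROMA Connect's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RateLimitPolicy:
    """Hypothetical record for a Domain/Policy, Key/Value model: the Controller
    only counts against opaque keys, so it stays agnostic of specific services."""
    domain: str           # e.g. "api-gateway"
    policy_id: str        # e.g. "per-app-per-minute"
    key_pattern: str      # e.g. "app:{app_id}" -- filled in by the SDK/filter
    algorithm: str        # e.g. "token_bucket", "counter", "leaky_bucket"
    threshold: int        # allowed requests per flow control period
    period_seconds: int   # length of the flow control period

# A single request can map to several keys and therefore trigger several policies.
policies = [
    RateLimitPolicy("api-gateway", "per-app-per-minute", "app:{app_id}", "token_bucket", 6000, 60),
    RateLimitPolicy("api-gateway", "per-api-per-second", "api:{api_id}", "counter", 200, 1),
]
```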

4.4 Built-in Algorithms

4.4.1 Token bucket algorithm with cache and color

  • Problems with the token bucket algorithm:

    • When no token is available, the request is rejected immediately. The user may keep retrying until a token is available, which increases the load on both the API Gateway and the flow control service.
    • All requests obtain tokens with equal probability; priority is not supported. In practice, some requests need to be processed first while others can be delayed or rejected. For example, payment requests on an e-commerce site should be prioritized, while requests for browsing items can be deferred.
  • A token bucket algorithm that supports caching and priority is therefore designed (a sketch follows this list)

    • Cache:

      • When no token is available, the request is temporarily placed in a request queue until a token becomes available.
      • Queued requests are processed in FCFS (first come, first served) order.
      • If no cache space is available, the request is rejected.
    • Tokens:

      • Tokens are divided into colors, and different colors indicate different priorities; for example, green, yellow, and red indicate priorities from highest to lowest.
      • The priority of each API can be configured in the API configuration file, and a request is assigned a token of the corresponding color. If a request has no configured priority, the default priority is used.
      • The number of tokens is configured according to the capacity of the API Gateway system.
      • When a lower-priority request arrives and the number of higher-priority tokens exceeds their reserved amount, a higher-priority token can be assigned to it; the reservation ensures that low-priority requests cannot exhaust high-priority tokens.
      • Each token color has a separate request cache.
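A minimal sketch of the cached, colored token bucket described above, assuming three colors with green > yellow > red priority, a per-color token rate, and a per-color reservation; all names and defaults are illustrative:

```python
import time
from collections import deque

COLORS = ["green", "yellow", "red"]  # priority from highest to lowest

class ColoredTokenBucket:
    """Token bucket with per-color token pools, per-color FCFS request caches,
    and per-color reservations so low-priority requests cannot drain
    high-priority tokens."""

    def __init__(self, rates, capacities, reserves, cache_size: int = 1000):
        self.rates = rates        # dict: tokens added per second, per color
        self.caps = capacities    # dict: bucket capacity, per color
        self.reserves = reserves  # dict: tokens kept back for that color's own traffic
        self.tokens = {c: float(capacities[c]) for c in COLORS}
        self.queues = {c: deque(maxlen=cache_size) for c in COLORS}
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        for c in COLORS:
            self.tokens[c] = min(self.caps[c],
                                 self.tokens[c] + (now - self.last) * self.rates[c])
        self.last = now

    def admit(self, request, color: str) -> str:
        self._refill()
        # Try the request's own color first, then borrow from higher-priority
        # pools, but only above each pool's reserved amount.
        for c in reversed(COLORS[:COLORS.index(color) + 1]):
            reserve = 0 if c == color else self.reserves[c]
            if self.tokens[c] - reserve >= 1.0:
                self.tokens[c] -= 1.0
                return "accepted"
        # No token: cache the request until a token is available, or reject.
        queue = self.queues[color]
        if len(queue) < queue.maxlen:
            queue.append(request)
            return "queued"
        return "rejected"
```

Draining the per-color queues when tokens become available again is omitted for brevity; queued requests would be served in FCFS order as described above.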

4.4.2 Flow control algorithm with high precision and throughput

  • Problem: the contradiction between high precision and high throughput

    • To achieve high-precision flow control, the API Gateway must send a flow control request to the flow control service for every API request, which greatly reduces throughput.
    • To improve throughput, the API Gateway must send flow control requests less frequently, which reduces the accuracy of flow control: the lower the reporting frequency, the lower the accuracy.
  • A High Accuracy, High Throughput (HAT) flow control algorithm is therefore proposed (a sketch of the gateway side follows this list).

    • Flow control is divided into a self flow control stage and a flow control service stage.
    • Let the flow control threshold be L, the self flow control threshold be S, the number of API Gateway cluster nodes be N, and the number of API requests processed in the current flow control period be R.
    • The flow control service computes the self flow control threshold S = L/N and distributes it to each API Gateway node.
    • Within this threshold, each API Gateway node performs flow control on its own without sending flow control requests to the flow control service.
    • When the API request volume of a node in the API Gateway cluster exceeds S − α, the node sends a flow control request to the flow control service to apply for a new threshold. The flow control service then contacts the other API Gateway nodes to obtain the number of API requests they have processed, recalculates the threshold S = (L − R)/N, and distributes it to each API Gateway node.
    • When the remaining quota is less than δ, the self flow control threshold is no longer updated.
    • When the next flow control period starts, the flow control service resets S, and each API Gateway node contacts the flow control service to update its self flow control threshold.
  • Algorithm analysis

    • Let u be the number of self flow control threshold updates in a single flow control period, and P_i the API processing speed of the i-th API Gateway node.
    • The number of flow control requests in a single flow control period is reduced from L to u × N.
    • In the best case, every node in the API Gateway cluster performs identically and u = 1. With a flow control threshold of 10000 and 10 API Gateway nodes, the number of flow control requests in a single flow control period drops from 10000 to 10.
    • The closer the performance of the nodes in the API Gateway cluster, the closer u is to 1; the greater the performance gap between nodes, the greater u.
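A simplified gateway-side sketch of HAT under the notation above (L, S, N, R, α, δ); the controller client and its initial_threshold/report_and_refresh calls are hypothetical stand-ins for the flow control service interface:

```python
class HATGatewayNode:
    """Gateway-side view of HAT: count locally against the self flow control
    threshold S and contact the flow control service only when the local count
    approaches S (within alpha), as described above."""

    def __init__(self, controller, alpha: int = 10):
        self.controller = controller              # hypothetical flow control service client
        self.alpha = alpha
        self.s = controller.initial_threshold()   # S = L / N at the start of the period
        self.count = 0                            # API requests handled in this period

    def on_api_request(self) -> bool:
        if self.count >= self.s:
            return False                          # local quota exhausted: reject
        self.count += 1
        if self.count >= self.s - self.alpha:
            # Report the local count; the service gathers R across the cluster,
            # recalculates S = (L - R) / N and returns it, or returns None once
            # the remaining quota is below delta and S is no longer updated.
            new_s = self.controller.report_and_refresh(self.count)
            if new_s is not None:
                self.count = 0
                self.s = new_s                    # treat the new S as this node's remaining quota
        return True

    def on_new_period(self):
        # At the start of the next flow control period the service resets S.
        self.count = 0
        self.s = self.controller.initial_threshold()
```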

4.4.3 Dynamic flow control algorithm

Dynamic flow control based on health status, trend, and API call chain.

  • After a request obtains a token, the flow control service processes the request and generates a flow control response (accept/reject, degrade, or whitelist).
  • Dynamic flow control policy based on running status

    • The flow control threshold is raised or lowered dynamically, or requests are made to wait, based on the API Gateway's network status (number of available connections, network latency), request processing latency, and CPU and memory status (a sketch of the CPU/memory case follows this list).
    • When CPU and memory usage is well below the threshold, requests are processed normally.
    • When CPU and memory usage approaches the threshold, the flow control threshold is lowered (degraded) to reduce the load on the API Gateway.
    • When CPU and memory usage exceeds the threshold by a large margin, the flow control threshold is reduced more quickly.
    • When no CPU or memory headroom is left, requests are rejected.
    • When CPU and memory usage returns to the normal level, the flow control threshold is restored.
  • Dynamic flow control policy based on the trend of the running status

    • Machine learning is used to analyze historical data, build prediction models, and forecast the load of the API Gateway, so that the flow control threshold can be modified or the service degraded in advance, keeping the load of the API Gateway smooth and stable.
    • Machine learning is also used to identify which requests should be blacklisted.
  • Dynamic flow control policy based on API call flows

    • Case: API call flows.
    • A dynamic flow control strategy based on API call flows is designed.

      • API call flows are discovered using machine learning, and the flow control service holds the API call relationships.
      • When the system load is high and an API request reaches its threshold and is rate-limited, all associated API requests of the same or lower level no longer access Redis for real-time data and processing; they are directly delayed or rejected.
      • When the load of the API Gateway system is normal, this dynamic flow control policy is not enabled.
      • In this way, the load on the API Gateway and the flow control service is reduced without affecting the processing speed of APIs.
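A sketch of the running-status policy, assuming the psutil library for CPU/memory sampling; the scaling factors and limits are illustrative, not ROMA Connect's actual values:

```python
import psutil  # assumed dependency for CPU/memory sampling

def adjust_threshold(base_threshold: int,
                     cpu_limit: float = 80.0,
                     mem_limit: float = 80.0) -> int:
    """Scale the flow control threshold according to current CPU/memory usage."""
    cpu = psutil.cpu_percent(interval=None)
    mem = psutil.virtual_memory().percent
    usage = max(cpu / cpu_limit, mem / mem_limit)  # 1.0 means "at the limit"

    if usage < 0.7:
        return base_threshold                      # far below the limit: keep the threshold
    if usage < 1.0:
        # Approaching the limit: degrade the threshold gradually.
        return int(base_threshold * (1.0 - (usage - 0.7)))
    if usage < 1.2:
        # Over the limit: cut the threshold much faster.
        return int(base_threshold * 0.3)
    return 0                                       # no headroom left: reject all requests
```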
