AWS – Application Load Balancer – algorithms

Original address: medium.com/dazn-tech/a…

Note: some modifications have been made relative to the original text. If that matters to you, please read the original.

Prior to 2019, AWS Application Load Balancers (ALBs) supported only round-robin routing. AWS now offers a second algorithm: Least Outstanding Requests (LOR).

What is a load balancing algorithm?

Almost all load balancers support load balancing algorithms, which, in simple terms, decide how incoming requests are mapped to targets. Choosing the right algorithm can improve performance. The two most commonly used algorithms are round robin and least connections (which AWS calls least outstanding requests).

Round Robin algorithm

Round robin is the simplest method: it walks through the targets in sequence so that each target receives the same number of requests. Because of its simplicity, round robin is supported by almost all load balancers. A minimal sketch of the idea follows.
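
As an illustrative sketch only (not the ALB's internal implementation), round-robin selection might look like this in JavaScript:

// Minimal round-robin sketch: targets are visited in a fixed cycle,
// so each one receives the same number of requests.
const targets = ['task-1', 'task-2', 'task-3'];
let next = 0;

function pickTarget() {
  const target = targets[next];
  next = (next + 1) % targets.length; // wrap around to the first target
  return target;
}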

Advantages of the round-robin algorithm

  • Simple and easy to implement
  • Requests are distributed evenly across targets

Disadvantages of the round-robin algorithm

  • It assumes every target has similar computing power, resources, and performance
  • It assumes every request demands roughly the same amount of compute, resources, and time from its target

The last point is the most important disadvantage of the round-robin algorithm. APIs expose many kinds of endpoints, and each one generates a different load; POST requests, for example, typically require more resources than GET requests. As the figure in the original article shows, Task 1 may still be processing a previous request when it receives another, and Task 2 may be running on slower hardware. When using cloud resources, there is no guarantee that every instance will perform the same. To make matters worse, if we use a Spot Fleet, we may be running instances of different sizes, each with different performance and capacity.

Least Outstanding Requests

The least outstanding requests algorithm addresses the shortcomings of round robin. Instead of cycling through the targets, it picks, for each request, the target that currently has the fewest requests in flight. In Nginx this algorithm is called least_conn.

The figure in the original article shows how the LOR algorithm directs requests. There, OR represents the number of outstanding requests on a target. Each incoming request goes to the target with the lowest OR value, and that target's OR is then incremented by 1. When the next request arrives, the target with the smallest OR is found again, and so on. A sketch of this selection logic follows.
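
Again as an illustrative sketch (not AWS's internal implementation), tracking OR per target and always picking the lowest might look like this:

// Minimal least-outstanding-requests sketch: each target carries an
// in-flight counter, and the lowest counter wins each time.
const targets = [
  { name: 'task-1', outstanding: 0 },
  { name: 'task-2', outstanding: 0 },
  { name: 'task-3', outstanding: 0 },
];

function pickTarget() {
  // Choose the target currently waiting on the fewest requests
  return targets.reduce((min, t) => (t.outstanding < min.outstanding ? t : min));
}

async function route(handleRequest) {
  const target = pickTarget();
  target.outstanding += 1;   // OR increases by 1 when the request is dispatched
  try {
    return await handleRequest(target);
  } finally {
    target.outstanding -= 1; // ...and decreases once the response completes
  }
}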

This rebalancing of the cluster as it scales is critical for WebSocket-based services such as Pubby, our product that handles WebSockets at scale. For more information on scaling WebSocket services behind an ALB, read the load testing section of this article.

For HTTP-based services, the LOR algorithm can still have a significant impact on performance. Let's run a load test against each algorithm to see what the difference is; a sketch of how the algorithm is switched per target group follows.
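
On an ALB, the algorithm is a target group attribute (load_balancing.algorithm.type). As a hedged sketch, switching between the two algorithms with the AWS SDK for JavaScript v3 might look like this (the region and target group ARN are placeholders):

const { ElasticLoadBalancingV2Client, ModifyTargetGroupAttributesCommand } =
  require('@aws-sdk/client-elastic-load-balancing-v2');

// Placeholder region – substitute your own
const client = new ElasticLoadBalancingV2Client({ region: 'eu-west-1' });

async function setAlgorithm(targetGroupArn, algorithm) {
  // algorithm is either 'round_robin' or 'least_outstanding_requests'
  await client.send(new ModifyTargetGroupAttributesCommand({
    TargetGroupArn: targetGroupArn,
    Attributes: [{ Key: 'load_balancing.algorithm.type', Value: algorithm }],
  }));
}

Running the same call between test runs lets the identical service be measured under both algorithms.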

The load test

To see the performance differences between these algorithms, I'll use Artillery to run a load test against a simple Node.js application.

Our test service has two routes:

  • GET /test – returns success immediately
  • POST /test – returns success after a random delay of 0 to 200 ms

Artillery configuration (this config runs a 120-second load test that starts 10 scenarios per second against the service, each scenario sending three GET requests followed by one POST request):

config:
  target: ""
  phases:
    - duration: 120
      arrivalRate: 10

scenarios:
  - flow:
      - loop:
          - get:
              url: "/test"
        count: 3
      - post:
          url: "/test"
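
With Artillery installed, the test can then be started with its CLI: artillery run <config-file> (substituting the name of the saved config file).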

Application code

const http = require('http');

http.createServer((req, res) => {
  if (req.method === 'POST') {
    // Busy-wait for a random 0–200 ms to simulate a request that
    // consumes server resources
    let now = Date.now();
    const finish = now + Math.random() * 200;
    while (now < finish) {
      now = Date.now();
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ success: true }));
  }

  if (req.method === 'GET') {
    // Respond immediately
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ success: true }));
  }
}).listen(8080);

The results show that the LOR algorithm performs significantly better. Here are the test results:

Metric   Round-robin (ms)   Least Outstanding Requests (ms)   Improvement
Median   68.5               54.7                              25.23%
P95      1062.4             217.2                             389.13%
P99      2374               386.4                             514.39%

Here are the technical terms used in the results:

  • Latency: measures how quickly the server responds to a client request, usually in milliseconds (ms); it is also commonly called response time. Lower numbers mean faster responses. Latency is measured on the client side, from the moment the request is sent to the moment the response is received, so network overhead is included in the figure.
  • P99: the latency value that 99% of requests fall below; only the slowest 1% exceed it. (The stricter of the two metrics, since almost every request has to beat it.)
  • P95: the latency value that 95% of requests fall below. (The slowest 5% of requests are excluded, so it is a little more lenient.) A sketch of how these percentiles are computed follows this list.
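
To make the P95/P99 definitions concrete, here is an illustrative nearest-rank percentile calculation over some made-up sample latencies (the numbers are not from the test above):

// Nearest-rank percentile: sort the latencies, then read off the value
// below which p% of the requests fall.
function percentile(latencies, p) {
  const sorted = [...latencies].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}

const samples = [
  50, 52, 55, 58, 60, 61, 63, 65, 68, 70,
  72, 75, 80, 90, 110, 150, 210, 400, 950, 2300,
]; // latencies in ms
console.log(percentile(samples, 50)); // 70   – half of the requests were faster
console.log(percentile(samples, 95)); // 950  – only the slowest 5% exceed this
console.log(percentile(samples, 99)); // 2300 – only the slowest 1% exceeds this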

As the table above shows, once nearly all requests are counted (P99), round-robin's latency is far higher than that of Least Outstanding Requests, which suggests that round robin produces a large number of long-latency requests.

Disadvantages of the LOR algorithm

The main risk with the LOR algorithm is that a newly added target receives a flood of requests. In the LOR diagram above, Task 6 receives five requests in quick succession, which may put it under heavy load. This is risky when running WebSocket services: when we are accepting thousands of connections per second, we don't want all the new connections to suddenly be sent to a single task. HTTP services should not be affected as badly, but it is still worth considering.

The ALB supports slow start, which instructs the ALB to gradually ramp up the number of requests a new target serves. The downside is that you can't use LOR and slow start at the same time. The only way to mitigate this problem is to ensure that our auto-scaling policy launches at least a few tasks in each scaling operation.

Another potential problem is that failing targets often respond faster than healthy ones. For example, if your service cannot connect to its database, it might immediately respond with a 500 error. In that case the failing target carries fewer outstanding requests, so it receives a higher share of traffic, resulting in even more failures. It is therefore important to ensure that health checks detect failing targets quickly; even at the fastest settings this still leaves at least 10 seconds of failed requests, so it might be worth introducing an artificial delay on error responses as well. Hopefully AWS will improve their health checks so that traffic can be moved away from failing instances more quickly. A sketch of tightening the health check follows.
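
As an illustration (the region and target group ARN are placeholders), tightening the health check to the ALB's fastest settings with the AWS SDK for JavaScript v3 could look like this; 5-second intervals times 2 failed checks gives roughly the 10 seconds mentioned above:

const { ElasticLoadBalancingV2Client, ModifyTargetGroupCommand } =
  require('@aws-sdk/client-elastic-load-balancing-v2');

// Placeholder region – substitute your own
const client = new ElasticLoadBalancingV2Client({ region: 'eu-west-1' });

async function tightenHealthCheck(targetGroupArn) {
  await client.send(new ModifyTargetGroupCommand({
    TargetGroupArn: targetGroupArn,
    HealthCheckIntervalSeconds: 5, // fastest interval the ALB allows
    UnhealthyThresholdCount: 2,    // minimum failures before marking unhealthy
  }));
}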

Conclusion

In summary, the LOR algorithm generally reduces latency and helps balance resource utilization across the cluster.