LTM load balancing in front of NGINX is a typical two-layer solution, that is, an L4/L7-separated two-layer load balancing design. After deploying this solution, what should we do if the servers behind an NGINX instance become overloaded?

If the NGINX instances in the LTM pool are located in different availability zones or different DCs, LTM only performs application-layer load balancing and monitors NGINX itself. LTM has no visibility into how many business servers are still available behind each NGINX instance (in its upstream). If an NGINX upstream has only a few available servers left, LTM will still send that NGINX the same share of connection requests, overloading the servers behind it and degrading quality of service.

The roadmap of the F5 load balancing solution is as follows:

If LTM can be made aware of how many servers are currently available in an NGINX upstream, a threshold can be set below which LTM stops assigning new connections to that NGINX instance, avoiding the problem above. Based on logs reported by LTM or on Telemetry Streaming output, O&M personnel can trigger automated processes to scale out the service instances behind that NGINX. Once the number of available service instances rises back above the threshold, LTM resumes allocating new connections to it.

NGINX Plus itself exposes an API endpoint from which the number of available server instances can be obtained. On LTM, an external monitor can be used to poll and process this API automatically.

The F5 load balancing solution is implemented as follows:

1. Access the NGINX Plus API to obtain the upstream status. Note: the API version in the endpoint path (version 6 here) may vary between NGINX Plus releases.

2. The API returns output like the following; the state of each peer can be seen (here the second peer is a backup server in the unhealthy state):

```json
{
  "peers": [
    {
      "id": 0,
      "server": "10.0.0.1:8080",
      "name": "10.0.0.1:8080",
      "backup": false,
      "weight": 1,
      "state": "up",
      "active": 0,
      "requests": 3468,
      "header_time": 778,
      "response_time": 778,
      "responses": {"1xx": 0, "2xx": 3435, "3xx": 6, "4xx": 20, "5xx": 4, "total": 3465},
      "sent": 1511086,
      "received": 99693373,
      "fails": 0,
      "unavail": 0,
      "health_checks": {"checks": 1754, "fails": 0, "unhealthy": 0, "last_passed": true},
      "downtime": 0,
      "selected": "2020-01-03T07:52:57Z"
    },
    {
      "id": 1,
      "server": "10.0.0.1:8081",
      "name": "10.0.0.1:8081",
      "backup": true,
      "weight": 1,
      "state": "unhealthy",
      "active": 0,
      "requests": 0,
      "responses": {"1xx": 0, "2xx": 0, "3xx": 0, "4xx": 0, "5xx": 0, "total": 0},
      "sent": 0,
      "received": 0,
      "fails": 0,
      "unavail": 0,
      "health_checks": {"checks": 1759, "fails": 1759, "unhealthy": 1, "last_passed": false},
      "downtime": 17588406,
      "downstart": "2020-01-03T03:00:00.427Z"
    }
  ]
}
```

3. You can write the following Python script. It counts the peers whose state is `up` and prints `UP` only if the count reaches the threshold; an external monitor marks the member up when the script produces any output, and down when it prints nothing:

```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import json
import urllib2

def get_nginxapi(url):
    # Fetch the NGINX Plus API endpoint and return the raw response body
    ct_headers = {'Content-Type': 'application/json'}
    request = urllib2.Request(url, headers=ct_headers)
    response = urllib2.urlopen(request)
    return response.read()

# LTM passes the node IP and port as argv[1] and argv[2];
# the user arguments (API URL and threshold) follow.
api = sys.argv[3]
lowwater = int(sys.argv[4])

try:
    data = json.loads(get_nginxapi(api))
except Exception:
    data = ''

m = 0
try:
    for peer in data['peers']:
        if peer['state'] == 'up':
            m = m + 1
except Exception:
    m = 0

# Print 'UP' only when enough peers are available;
# printing nothing causes LTM to mark the member down.
if m >= lowwater:
    print 'UP'
```
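To sanity-check the counting rule without a live NGINX Plus instance, here is a Python 3 sketch; the sample data is trimmed to the fields the monitor uses, with illustrative values taken from the example response:

```python
import json

# Sample API response trimmed to the fields the monitor reads
# (one healthy primary, one unhealthy backup).
sample = json.loads("""
{"peers": [
  {"id": 0, "server": "10.0.0.1:8080", "backup": false, "state": "up"},
  {"id": 1, "server": "10.0.0.1:8081", "backup": true,  "state": "unhealthy"}
]}
""")

def count_up(peers):
    """Count upstream peers whose state is 'up'."""
    return sum(1 for peer in peers if peer.get("state") == "up")

available = count_up(sample["peers"])
lowwater = 1  # the threshold passed to the monitor as its second user argument

# An external monitor signals "up" by printing something to stdout
# and "down" by printing nothing at all.
if available >= lowwater:
    print("UP")
```

With this sample only one peer counts as available, so the script prints `UP` for any threshold of 1 or below and stays silent otherwise.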

4. Upload the script to LTM under System > File Management > External Monitor Program File List, and import it.

5. Configure the external monitor. In the arguments field, enter the URL of the relevant API, then a space, then the threshold.
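For reference when filling in the arguments field: BIG-IP invokes an external monitor with the node address and port as the first two arguments, followed by the user-supplied arguments, which is why the script reads the URL from `argv[3]` and the threshold from `argv[4]`. A Python 3 sketch with simulated arguments (the URL and upstream name are placeholders, and the node address may arrive in IPv6-mapped form):

```python
import sys

# BIG-IP invokes an external monitor roughly as:
#   <script> <node-address> <node-port> <user arguments...>
# Simulate that call here (placeholder values for illustration):
sys.argv = ["nginx_check.py", "::ffff:10.0.0.1", "80",
            "http://10.0.0.1:8080/api/6/http/upstreams/backend", "3"]

api_url = sys.argv[3]        # first user argument: the API URL
lowwater = int(sys.argv[4])  # second user argument: the threshold

print(api_url, lowwater)
```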

6. Then associate the monitor with the NGINX pool member.

As you can see, the member is marked Up at this point.

7. If the threshold is set to 3, LTM marks the NGINX instance Down, because only two upstream servers are available.

If the URL or the endpoint is wrong, the member is likewise marked Down, so such misconfigurations are easy to spot. Note that upstream members configured as backups are counted as available whenever their state is "up". This approach also spares the real servers from being health-checked twice by both LTM and NGINX, since LTM reads NGINX's own health-check results from the API instead of probing the back-end servers directly.
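If you would rather not count backup servers toward the threshold, the counting condition can be tightened. This is a hedged variation, not part of the original script:

```python
def count_up_primaries(peers):
    """Count peers that are 'up' and not configured as backups."""
    return sum(1 for p in peers
               if p.get("state") == "up" and not p.get("backup", False))

# Illustrative peer list: one healthy primary, one healthy backup,
# one unhealthy primary.
peers = [
    {"server": "10.0.0.1:8080", "backup": False, "state": "up"},
    {"server": "10.0.0.1:8081", "backup": True,  "state": "up"},
    {"server": "10.0.0.1:8082", "backup": False, "state": "unhealthy"},
]

print(count_up_primaries(peers))
```

Only the first peer satisfies both conditions, so the count is 1 rather than the 2 the original rule would report.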

What about the LTM settings if NGINX has multiple upstreams?

From the front-end LTM's perspective, if any of the upstreams behind the NGINX instance lacks sufficient capacity, LTM should stop assigning new connections to that NGINX instance.
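One way to implement that rule: query each upstream's API endpoint and report up only if every upstream independently meets its threshold. A Python 3 sketch with the fetch step stubbed out (the upstream names, thresholds, and data are hypothetical):

```python
def enough_capacity(upstreams, thresholds):
    """Return True only if every upstream has at least its threshold
    of peers in the 'up' state; a single weak upstream fails the check."""
    for name, peers in upstreams.items():
        up = sum(1 for p in peers if p.get("state") == "up")
        if up < thresholds.get(name, 1):
            return False
    return True

# Stubbed API results for two upstreams (illustrative data); in a real
# monitor each list would come from the per-upstream API endpoint.
upstreams = {
    "web": [{"state": "up"}, {"state": "up"}, {"state": "unhealthy"}],
    "api": [{"state": "up"}],
}
thresholds = {"web": 2, "api": 1}

if enough_capacity(upstreams, thresholds):
    print("UP")
```

Raising the "web" threshold to 3 in this example would silence the output, and LTM would mark the whole NGINX instance Down even though the "api" upstream is healthy.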

What if there is a problem with the NGINX configuration itself, so that the actual service cannot be accessed even though the upstreams look healthy?

Yes. For NGINX's own availability and configuration problems, consider adding a penetrating L7 health check on LTM. However, if the NGINX configuration contains many server/location blocks and you want to catch possible problems in all of them, that means one L7 health check per service. In multi-service scenarios, pursuing exhaustive detection can itself put significant probing pressure on the business servers. In practice, one penetrating check on LTM plus API checks on all upstreams should be sufficient for most scenarios.

In a large-scale NGINX deployment, how can the pressure that NGINX health checks place on back-end services be reduced?

Consider having NGINX perform dynamic service discovery, with application availability handled by the registry's own tooling, moving from distributed health checks to centralized health checks at the registry. Alternatively, use the NGINX Plus upstream API to update upstreams dynamically from a centralized health-check system; this also relieves pressure by eliminating frequent configuration reloads.
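With the upstream API approach, a central controller adds or removes servers over REST instead of reloading the configuration. A Python 3 sketch that only constructs the request without sending it; the endpoint path, base URL, and upstream name are assumptions based on the NGINX Plus API layout:

```python
import json
import urllib.request

def make_add_server_request(base, upstream, addr):
    """Build (but do not send) a request that adds a server to an
    NGINX Plus upstream via its REST API."""
    url = "%s/http/upstreams/%s/servers" % (base, upstream)
    body = json.dumps({"server": addr}).encode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})

# Hypothetical API base and upstream name for illustration only.
req = make_add_server_request("http://127.0.0.1:8080/api/6",
                              "backend", "10.0.0.5:80")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` against a live NGINX Plus instance would register the new server in the upstream without a reload; removal and draining follow the same pattern against the per-server endpoints.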

The above is the practical procedure for applying the F5 load balancing solution when some NGINX service instances become overloaded. We hope it helps. If you are still unable to complete these steps, contact F5 customer service, who will help you resolve the problem quickly and specifically.
