Splash under pressure? Try load balancing!

When you use Splash to render and fetch pages, a single Splash service can become a bottleneck if the crawl is large and the tasks are numerous. In that case, consider putting a load balancer in front of several Splash services so that requests are spread across multiple servers. Several machines then share the work, which reduces the pressure on any single Splash service.

1. Configure the Splash service

Load balancing requires more than one Splash service. Suppose Splash is running on port 8050 of four remote hosts, with service addresses 41.159.27.223:8050, 41.159.27.221:8050, 41.159.27.9:8050 and 41.159.117.119:8050. The four services are identical, each started from the Splash Docker image, so a request to any one of them reaches a working Splash service.
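Before putting the services behind a load balancer, it can help to confirm that each one responds. The following is a minimal sketch, assuming the instances expose Splash's `/_ping` HTTP endpoint; the helper names `ping_url` and `is_alive` are hypothetical and chosen only for illustration.

```python
from urllib.error import URLError
from urllib.request import urlopen

# The four service addresses used in this article.
SPLASH_ADDRESSES = [
    "41.159.27.223:8050",
    "41.159.27.221:8050",
    "41.159.27.9:8050",
    "41.159.117.119:8050",
]


def ping_url(address):
    """Build the health-check URL for one Splash instance (hypothetical helper)."""
    return f"http://{address}/_ping"


def is_alive(address, timeout=5):
    """Return True if the instance answers the ping with HTTP 200."""
    try:
        with urlopen(ping_url(address), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `is_alive(address)` for each entry in `SPLASH_ADDRESSES` gives a quick picture of which services are reachable before Nginx is configured.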

2. Configure load balancing

Next, pick any host with a public IP address to act as the load balancer. Install Nginx on that host, then edit the Nginx configuration file nginx.conf and add the following:

http {
    upstream splash {
        least_conn;
        server 41.159.27.223:8050;
        server 41.159.27.221:8050;
        server 41.159.27.9:8050;
        server 41.159.117.119:8050;
    }
    server {
        listen 8050;
        location / {
            proxy_pass http://splash;
        }
    }
}

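Here the upstream block defines a service cluster named splash. The least_conn directive selects least-connections load balancing, which sends each request to the server with the fewest active connections. It suits workloads where request processing times vary, because a plain rotation could pile slow requests onto one server and overload it.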
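The idea behind least-connections selection can be sketched in a few lines of Python. This is only an illustration: the connection counts are made-up values, the function name `pick_least_connections` is hypothetical, and Nginx's own implementation also takes server weights into account and breaks ties differently.

```python
def pick_least_connections(active):
    """Return the server with the fewest active connections.

    Ties go to the server listed first, because dicts keep insertion order
    and min() returns the first minimal item it encounters.
    """
    return min(active, key=active.get)


# Hypothetical snapshot of active connections per Splash service.
active = {
    "41.159.27.223:8050": 3,
    "41.159.27.221:8050": 1,
    "41.159.27.9:8050": 2,
    "41.159.117.119:8050": 1,
}

chosen = pick_least_connections(active)
active[chosen] += 1  # the new request now occupies one more connection
print(chosen)  # 41.159.27.221:8050
```

A server that is stuck on a slow render keeps a high count and is skipped until it finishes, which is why this strategy copes well with uneven request times.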

We can also leave the strategy unspecified, as follows:

upstream splash {
    server 41.159.27.223:8050;
    server 41.159.27.221:8050;
    server 41.159.27.9:8050;
    server 41.159.117.119:8050;
}

This defaults to round-robin load balancing, where requests are handed to each server in turn so that all servers receive the same share. This strategy suits stateless services whose servers have comparable hardware and whose requests complete quickly.
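Round-robin rotation is simple enough to show directly. The sketch below uses `itertools.cycle` to hand out servers in turn; it illustrates the policy only and is not how Nginx is implemented.

```python
from itertools import cycle

servers = [
    "41.159.27.223:8050",
    "41.159.27.221:8050",
    "41.159.27.9:8050",
    "41.159.117.119:8050",
]

# cycle() yields the servers in order and starts over after the last one.
rotation = cycle(servers)
assignments = [next(rotation) for _ in range(6)]

for number, server in enumerate(assignments, start=1):
    print(f"request {number} -> {server}")
```

The fifth request wraps around to the first server again, so over many requests every server sees the same number of them.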

Alternatively, we can specify weights as follows:

upstream splash {
    server 41.159.27.223:8050 weight=4;
    server 41.159.27.221:8050 weight=2;
    server 41.159.27.9:8050 weight=2;
    server 41.159.117.119:8050 weight=1;
}

The weight parameter sets the weight of each server: the higher the weight, the more requests that server is given. Use this configuration when the servers differ significantly in capacity.
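To see how weights translate into a request distribution, here is a sketch of one common approach, smooth weighted round-robin. It is offered as an illustration of the technique under the weights from the configuration above, not as a description of Nginx's internals; the function name is hypothetical.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, request_count):
    """Return the order in which servers are picked for request_count requests.

    On every request each server's running score grows by its weight, the
    highest score wins, and the winner's score drops by the total weight.
    """
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    order = []
    for _ in range(request_count):
        for server, weight in weights.items():
            current[server] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        order.append(chosen)
    return order


weights = {
    "41.159.27.223:8050": 4,
    "41.159.27.221:8050": 2,
    "41.159.27.9:8050": 2,
    "41.159.117.119:8050": 1,
}

# One full cycle is 4 + 2 + 2 + 1 = 9 requests.
order = smooth_weighted_round_robin(weights, 9)
print(Counter(order))
```

Over one full cycle of nine requests each server is chosen exactly as many times as its weight, and the heavier server's turns are spread out rather than bunched together.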

Finally, there is an IP hash load balancer with the following configuration:

upstream splash {
    ip_hash;
    server 41.159.27.223:8050;
    server 41.159.27.221:8050;
    server 41.159.27.9:8050;
    server 41.159.117.119:8050;
}

Nginx hashes the IP address of the requesting client so that requests from the same client are always answered by the same server. This strategy suits stateful services, for example when a user logs in and then accesses pages within that session. For Splash, this setting is not required.
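The mapping from client IP to server can be sketched as follows. The Nginx documentation describes ip_hash as keying on the first three octets of an IPv4 address; the MD5 hash used here is a stand-in chosen for illustration, not the hash function Nginx uses, and `pick_by_ip` is a hypothetical name.

```python
import hashlib

SERVERS = [
    "41.159.27.223:8050",
    "41.159.27.221:8050",
    "41.159.27.9:8050",
    "41.159.117.119:8050",
]


def pick_by_ip(client_ip, servers):
    """Map an IPv4 client address to one server, deterministically."""
    # Key on the first three octets, so clients in the same /24 network
    # land on the same server.
    key = ".".join(client_ip.split(".")[:3])
    digest = hashlib.md5(key.encode("ascii")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Example client addresses from the documentation-reserved ranges.
print(pick_by_ip("203.0.113.7", SERVERS))
print(pick_by_ip("203.0.113.200", SERVERS))  # same /24, same server
```

Because the choice depends only on the client address, a crawler running from a single machine would send everything to one Splash service, which is another reason this strategy is a poor fit here.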

Choose the configuration that fits your situation. After editing the file, reload Nginx so the changes take effect:

sudo nginx -s reload

Load balancing is now in effect: send requests to port 8050 of the server where Nginx is running, and Nginx distributes them across the Splash services.

3. Configure authentication

Splash is now publicly accessible. If you don't want that, you can configure authentication, again with the help of Nginx. Add the auth_basic and auth_basic_user_file directives to the location block of the server, as follows:

http {
    upstream splash {
        least_conn;
        server 41.159.27.223:8050;
        server 41.159.27.221:8050;
        server 41.159.27.9:8050;
        server 41.159.117.119:8050;
    }
    server {
        listen 8050;
        location / {
            proxy_pass http://splash;
            auth_basic "Restricted";
            auth_basic_user_file /etc/nginx/conf.d/.htpasswd;
        }
    }
}

The usernames and passwords are stored in a file named .htpasswd in the /etc/nginx/conf.d directory, which we create with the htpasswd command. For example, to create a user named admin, run the following command in that directory:

htpasswd -c .htpasswd admin

We are then prompted for a password. After we type it twice, a password file is generated with the following contents:

cat .htpasswd
admin:5ZBxQr0rCqwbc

After the configuration, reload Nginx:

sudo nginx -s reload

The access authentication has been successfully configured.
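With basic authentication enabled, every request must carry an Authorization header. The sketch below shows how that header is built from a username and password; it assumes the password chosen for the admin user is also admin, and the function name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Build the HTTP Basic Authorization header for the given credentials."""
    # The credentials are joined with a colon and Base64-encoded.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


header = basic_auth_header("admin", "admin")
print(header)  # {'Authorization': 'Basic YWRtaW46YWRtaW4='}
```

Base64 is an encoding, not encryption, so over plain HTTP the credentials can be read by anyone who can observe the traffic.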

4. Test

Finally, we can use code to test the load balancing configuration and check whether the IP changes from request to request. We use http://httpbin.org/get for the test, and the code is as follows:

import re
from urllib.parse import quote

import requests

# Lua script run by Splash: fetch httpbin and return the response body as text.
lua = '''
function main(splash, args)
    local treat = require("treat")
    local response = splash:http_get("http://httpbin.org/get")
    return treat.as_string(response.body)
end
'''

# "splash" is the host name of the Nginx load balancer.
url = 'http://splash:8050/execute?lua_source=' + quote(lua)
response = requests.get(url, auth=('admin', 'admin'))
# httpbin echoes the origin IP, which reveals the Splash host that made the request.
ip = re.search(r'(\d+\.\d+\.\d+\.\d+)', response.text).group(1)
print(ip)

Replace splash in the URL with the IP address of your own Nginx server. Here I edited the hosts file so that splash resolves to the IP address of the Nginx server.

Run the code several times and you can see that the IP changes between requests. For example, the result of the first run:

41.159.27.223

Result of the second run:

41.159.27.9

This shows that load balancing is working.
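Two runs show that the IP changes, but a tally over many requests gives a clearer picture of how the load is spread. The following is a minimal sketch; the helper names `extract_ip` and `tally_origins` are hypothetical, and the `fetch` argument is any function that returns one response body as text.

```python
import re
from collections import Counter

IP_PATTERN = re.compile(r'(\d+\.\d+\.\d+\.\d+)')


def extract_ip(text):
    """Return the first IPv4-looking string in text, or None if there is none."""
    match = IP_PATTERN.search(text)
    return match.group(1) if match else None


def tally_origins(fetch, attempts=20):
    """Call fetch() repeatedly and count the IP found in each response body."""
    counts = Counter()
    for _ in range(attempts):
        counts[extract_ip(fetch())] += 1
    return counts
```

For the setup in this section, `fetch` could be `lambda: requests.get(url, auth=('admin', 'admin')).text`, reusing the `url` from the test script. With the default round-robin strategy the counts should come out roughly equal, while a weighted configuration should skew them toward the heavier servers.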

In this section, we configured load balancing for Splash. With it in place, multiple Splash services work together and the load on any single service is reduced.
