We use load balancing a lot with the Elastic Stack. If we don’t take it into account, then when one of our links fails, it can become a single point of failure that brings down the entire data collection pipeline. Load balancing also lets us make better use of existing resources when multiple instances are deployed. In today’s article, we will cover how to use load balancing during data collection and ingestion.

 

Typical Elastic Stack architecture diagram

Let’s start by looking at a typical Elastic Stack diagram:

Above, we can see that Beats connects directly to Logstash, which processes the data for us and eventually imports it into Elasticsearch.

In the absence of load balancing, it looks like this:

 

We usually configure Beats like this:

output.logstash:
  hosts: ["mylogstash"]

Once this TCP connection is established, it is very reliable as long as nothing goes wrong. But if our Logstash fails, we may be in trouble. This is the single point of failure mentioned above: if many Beats are connected to a single Logstash, all data collection is affected.

So how can we avoid this situation?

The solution is to add one more Logstash server:

In the diagram above, we added an extra Logstash server. In actual use, if one of the Logstash servers dies, we can still complete our data collection through the other one. So how do we configure this in our Beats?

Method 1:

In the Beats configuration file, we write:

output.logstash:
  hosts: ["Logstash1", "Logstash2"]

With this configuration, Beats picks one of the Logstash hosts at random each time it sends data. If that host fails, Beats picks the other one. Note that this does not use load balancing.
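The random-pick-with-failover behavior can be sketched as follows. This is a minimal Python simulation for illustration only: `pick_host` and `is_up` are hypothetical names, and the real selection logic lives inside libbeat, not in user code.

```python
import random

def pick_host(hosts, is_up):
    """Pick a random host for each batch, skipping hosts known to be down."""
    candidates = [h for h in hosts if is_up(h)]
    if not candidates:
        raise RuntimeError("no Logstash host reachable")
    return random.choice(candidates)

hosts = ["Logstash1", "Logstash2"]
# Simulate Logstash1 being down: every batch falls back to Logstash2.
print(pick_host(hosts, lambda h: h != "Logstash1"))  # always Logstash2
```

Each batch is sent to a single randomly chosen host; traffic is not spread across both hosts at once, which is why this is failover rather than load balancing.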

Method 2:

In this case, we will use a load-balancing configuration. See the Elastic documentation: www.elastic.co/guide/en/be…

output.logstash:
  hosts: ["logstash1", "logstash2"]
  loadbalance: true

Currently for Filebeat, the load balancing option is available for Redis, Logstash and Elasticsearch outputs. The Kafka output handles load balancing internally.

In this case, Beats distributes data across the configured Logstash hosts according to load. If one of the connections breaks, Beats removes that host from its pool and does not use it again until it can reconnect, retrying with exponential backoff.
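The reconnect behavior can be sketched roughly like this. The constants below are assumptions for illustration only; Beats exposes its actual settings through the output’s `backoff.init` and `backoff.max` options.

```python
def backoff_schedule(base=1.0, cap=60.0, attempts=7):
    """Exponential backoff: wait base, 2*base, 4*base, ... capped at `cap`."""
    delay = base
    for _ in range(attempts):
        yield delay
        delay = min(delay * 2, cap)

print(list(backoff_schedule()))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

Capping the delay prevents a long-dead host from being retried too rarely once it finally comes back.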

There is a big problem with the approach above. Every time we add a new Logstash, we have to modify our configuration file so that Beats knows it is there. Likewise, if we remove one of the Logstash instances, we also need to modify the Beats configuration file. If we were maintaining only one or two Beats, this might not be a problem, since it is not much work.

However, if we have many Beats, the workload becomes very large. What do we do then?

 

Use load balancing to import data

As the number of Beats grows, one possible solution is to use a dedicated load balancer:

 

As shown above, each Beat sends its data to a dedicated load balancer, which forwards it to Logstash. After this change, our Beats output becomes very simple:

output.logstash:
  hosts: ["loadbalancer"]

Now, every time we add a new Beat or a new Logstash, we no longer need to maintain the Beats configuration. All of the configuration is done at the load balancer.

 

Hands-on practice

In our practice, I use the following configuration:

Load balancing diagram:

Above, the data Beats collects is sent to Nginx, then to Logstash, then to Elasticsearch, and finally visualized in Kibana.

 

The installation

Elasticsearch

If you haven’t already installed Elasticsearch, please refer to my previous article “How to Install Elasticsearch on Linux, MacOS, and Windows” to install your own Elasticsearch. To allow Elasticsearch to be accessed from the Logstash instance on Ubuntu, we make the following changes to the config/elasticsearch.yml file:

network.host: 0.0.0.0
discovery.type: single-node

This makes Elasticsearch bind to every network interface on Mac OS. We can check the output at http://localhost:9200/ and http://192.168.0.3:9200/ respectively:

 

Kibana

If you haven’t already installed Kibana, see my previous article “How to Install Kibana in An Elastic Stack on Linux, MacOS, and Windows” to install your own. We don’t have to make any changes. After the installation is complete, enter http://localhost:5601/ in the address bar of the browser.

Nginx

Nginx is available in Ubuntu’s default repository, so installation is very simple.

Since this is our first interaction with the APT packaging system in this session, we will update the local package index so that we can access the latest package list. After that, we can install nginx:

sudo apt-get update
sudo apt-get install nginx

Once nginx has been successfully installed, we can check whether the nginx service has been successfully started by using the following command:

sudo service nginx status

$ sudo service nginx status
● nginx.service - nginx - High performance Web server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enable>
   Active: active (running) since Wed 2020-06-17 16:44:00 CST; 5h 5min ago
     Docs: http://nginx.org/en/docs/
  Process: 1761 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, sta>
 Main PID: 1781 (nginx)
    Tasks: 2 (limit: 18985)
   Memory: 3.7M
   CGroup: /system.slice/nginx.service
           ├─1781 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
           └─ 082 nginx: worker process

It shows that nginx has been successfully installed and is running.

To configure nginx as a load balancer, configure /etc/nginx/nginx.conf as follows:

/etc/nginx/nginx.conf

user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

stream {
    upstream stream_backend {
        server 192.168.0.4:5044;
    }

    server {
        listen 12345;
        proxy_pass stream_backend;
    }
}

Here 192.168.0.4 is the address of the Ubuntu machine. Nginx listens on port 12345 and forwards traffic to 192.168.0.4:5044. After configuring nginx.conf, we restart the nginx service:

sudo service nginx restart

Install Logstash2

Follow my previous article “How to Install a Logstash in an Elastic Stack” to install Logstash. Following the architecture above, we install Logstash2 on the Ubuntu machine. In our case, we can simply download the tarball and extract it:

tar xzf logstash-7.7.1.tar.gz
cd logstash-7.7.1/

Next, we create the following logstash.conf configuration file:

logstash.conf

input {
  beats {
    port => 5044
  }
}

output {
  stdout {
    codec => dots
  }
}

Above, Logstash listens on port 5044. For every event it receives, it prints a single dot (.).
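The dot-per-event behavior is easy to model. A minimal Python sketch, where `dots` is a hypothetical helper for illustration (Logstash’s actual `dots` codec does this internally):

```python
def dots(events):
    """Return one dot per event, mimicking Logstash's `dots` codec output."""
    return "." * len(list(events))

print(dots(range(5)))  # .....
```

This makes the Logstash console a cheap throughput indicator: the faster the dots appear, the more events are flowing through.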

We use the following approach to start the Logstash:

./bin/logstash -f logstash.conf 

In the following exercise, we will also install Logstash1.

 

Metricbeat

We can open Kibana:

Click Add Metric Data:

Select System metrics

Then install it according to your platform. We need to modify metricbeat.yml.

We launch MetricBeat:

./metricbeat -e

As shown above, the connection between MetricBeat and Nginx was successful.

Let’s go back to the Logstash console:

We see a lot of dots coming up. This shows that the data passed from MetricBeat to Nginx to Logstash is successful.

Install Logstash1

The installation of Logstash1 on Mac OS is the same as the installation of Logstash2 on Ubuntu OS. We also create the following configuration file:

logstash.conf

input {
  beats {
    port => 5044
  }
}

output {
  stdout {
    codec => dots
  }
}

Let’s run this Logstash:

./bin/logstash -f logstash.conf 

We’ve got Logstash1 up and running, but we haven’t told Nginx to forward to this Logstash yet. Let’s reopen the nginx.conf file and add the Mac OS IP address information:

/etc/nginx/nginx.conf

user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

stream {
    upstream stream_backend {
        server 192.168.0.4:5044;
        server 192.168.0.3:5044;
    }

    server {
        listen 12345;
        proxy_pass stream_backend;
    }
}

Note the line added above:

server 192.168.0.3:5044;

That is, traffic arriving on port 12345 is load-balanced between the Logstash instances at 192.168.0.3 and 192.168.0.4.
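By default, nginx distributes new connections across the servers in an upstream block round-robin. A minimal Python sketch of that selection (the backend list mirrors our stream_backend config; `round_robin` is a hypothetical name for illustration):

```python
import itertools

def round_robin(backends):
    """Yield backends in turn, like nginx's default upstream balancing."""
    return itertools.cycle(backends)

# Backend list mirroring the `upstream stream_backend` block above.
rr = round_robin(["192.168.0.4:5044", "192.168.0.3:5044"])
print([next(rr) for _ in range(4)])
```

Note that nginx balances per connection, not per event; a long-lived TCP connection stays pinned to whichever backend it was first assigned, which matters in the next section.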

After the above modification, we restart nginx:

sudo service nginx restart

At this point, we go back to the Logstash1 console on Mac OS and see that there is, in fact, no output. Is something wrong with our configuration? The answer is simple: the connection between MetricBeat and Nginx is a long-lived TCP connection. Once established, it is not torn down, and Nginx does not re-balance an existing connection. We need to add some configuration to MetricBeat. Let’s repeat the exercise as follows:

1) Remove the server 192.168.0.3:5044; line from /etc/nginx/nginx.conf, then restart nginx.

2) Stop MetricBeat, edit metricbeat.yml, and add a ttl setting to the output.logstash configuration (ttl only takes effect when pipelining is disabled, which is why pipelining is set to 0):

output.logstash:
  # The Logstash hosts
  hosts: ["192.168.0.4:12345"]
  ttl: "30s"
  pipelining: 0

Restart MetricBeat:

./metricbeat -e

3) The effect should now be the same as before: only Logstash2 sees output, and Logstash1 shows none.

4) Modify the /etc/nginx/nginx.conf file and add server 192.168.0.3:5044; back:

stream {
    upstream stream_backend {
        server 192.168.0.4:5044;
        server 192.168.0.3:5044;
    }

    server {
        listen 12345;
        proxy_pass stream_backend;
    }
}

After the modification, restart the nginx service:

sudo service nginx restart

5) Let’s revisit Logstash1’s console:

This time we can see dots starting to appear. This shows that our Nginx load balancing is working. If we configure every Beat the way we just configured MetricBeat, then each time we add a new Logstash, no extra configuration is needed on the Beats side; load balancing takes effect automatically.
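The role of ttl in the exercise above can be sketched as a simple check. This is an illustrative model only, with hypothetical names: the idea behind ttl: "30s" is that the long-lived TCP connection is periodically torn down so that nginx can assign the reconnect to a (possibly different) backend.

```python
def should_reconnect(connected_at, now, ttl_seconds=30.0):
    """Return True once a connection has outlived its ttl (in seconds)."""
    return now - connected_at >= ttl_seconds

print(should_reconnect(0, 10))  # False: connection is only 10s old
print(should_reconnect(0, 45))  # True: past the 30s ttl, time to reconnect
```

Without such a ttl, a Beat that connected while only one backend existed would stay pinned to it forever, which is exactly what we observed before adding the setting.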

In this exercise, I did not add Elasticsearch to the Logstash output. I’ll leave that to you.