In the previous article, "Logstash: Loading CSV files into Elasticsearch," we showed how to load CSV files into Elasticsearch. In this article, we will show how to pass the contents of an Apache log into Elasticsearch.

Before doing this exercise, you should have already set up Logstash as described in my previous article "How to Install a Logstash in an Elastic stack," and you should have your own Elasticsearch and Kibana installed and running successfully. For Elasticsearch, we usually use a combination of Logstash and Beats:

In today's application, we'll use Filebeat to read our log and pass the data into Logstash, where it is processed by the Logstash filters and finally sent to Elasticsearch for analysis. In this configuration, the default ports are: Logstash listens for Beats input on port 5044, Elasticsearch serves on port 9200, and Kibana on port 5601.

Download the Apache example log

 

Method 1:

You can download the file from the elastic/examples repository on GitHub (the full URL is shown under Method 2 below), then save it in a local directory of your own. In my case, I saved it in a data directory under my home directory.

localhost:data liuxg$ pwd
/Users/liuxg/data
localhost:data liuxg$ ls apache_logs.txt 
apache_logs.txt

Method 2:

We use the following method in our local directory:

Debian:

wget https://raw.githubusercontent.com/elastic/examples/master/Common%20Data%20Formats/apache_logs/apache_logs

Mac:

curl -L -O  https://raw.githubusercontent.com/elastic/examples/master/Common%20Data%20Formats/apache_logs/apache_logs

In my case, I saved the apache_logs file in the data folder under my home directory.

Each entry in the log has the following format:

83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:43 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-dashboard3.png HTTP/1.1" 200 171717 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"

In order for our application to run correctly, we must also download a template file called apache_template.json and place it in the same directory as the log file. This file will be used in the Logstash configuration below. We can download it from the following address:

https://github.com/elastic/examples/blob/master/Common%20Data%20Formats/apache_logs/logstash/apache_template.json
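Note that the address above is the GitHub page for the file. To download the raw JSON directly into the data directory, a command like the following should work (the raw URL here is inferred from the repository layout, so verify it if the download fails):

curl -L -O https://raw.githubusercontent.com/elastic/examples/master/Common%20Data%20Formats/apache_logs/logstash/apache_template.json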

 

Install Filebeat

In our application we need Filebeat to read our log information and pass it on to Logstash, which in turn forwards it to Elasticsearch.

deb:

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.1-amd64.deb
sudo dpkg -i filebeat-7.3.1-amd64.deb

rpm:

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.1-x86_64.rpm
sudo rpm -vi filebeat-7.3.1-x86_64.rpm

mac:

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.1-darwin-x86_64.tar.gz
tar xzvf filebeat-7.3.1-darwin-x86_64.tar.gz

brew:

brew tap elastic/tap
brew install elastic/tap/filebeat-full

linux:

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.1-linux-x86_64.tar.gz
tar xzvf filebeat-7.3.1-linux-x86_64.tar.gz

Our Filebeat is now installed. Please note: since the Elastic Stack iterates quickly, you can simply replace version 7.3.1 above with the version you need. Don't run Filebeat just yet.

 

Configure Filebeat

We create a configuration file called filebeat_apache.yml for Filebeat. It reads as follows:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /Users/liuxg/data/apache_logs

output.logstash:
  hosts: ["localhost:5044"]
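Before starting Filebeat, we can ask it to validate this file. Filebeat ships with a test subcommand for this; the following is a sketch assuming a 7.x tar.gz installation, run from the Filebeat directory:

./filebeat test config -c filebeat_apache.yml
./filebeat test output -c filebeat_apache.yml

The second command checks the connection to the configured Logstash output on port 5044, so it will only succeed once Logstash is running.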

 

Configure Logstash

 

 

We create a Logstash configuration file in our data directory. The advantage of keeping this file in the data directory rather than in the Logstash installation directory is that we can later run it with a different Logstash version, and we won't lose the file when we delete a Logstash installation directory. We'll call this file apache_logstash.conf. It reads as follows:

input {
  beats {
    port => "5044"
  }
}

filter {
  grok {
    match => {
      "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}'
    }
  }

  date {
    match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    locale => en
  }

  geoip {
    source => "clientip"
  }

  useragent {
    source => "agent"
    target => "useragent"
  }
}

output {
  stdout {
    codec => dots {}
  }

  elasticsearch {
    index => "apache_elastic_example"
    template => "/Users/liuxg/data/apache_template.json"
    template_name => "apache_elastic_example"
    template_overwrite => true
  }
}

Description:

  • The input uses the beats plugin and listens on port 5044. Many Beats can publish data through a single port; in our example, Filebeat sends its data to this port.
  • In the filter section, all filters are executed from top to bottom.

grok

This filter matches each of our log lines against a regular expression assembled from grok patterns, structuring each message into named fields. More grok patterns can be found in the grok patterns reference.

Take the following message as an example:


83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"

 

Grok matches each of our log messages and assigns the captured values to the corresponding fields. Filebeat also adds a number of fields of its own, so a fully processed event looks like this:

{ "request" => "/presentations/logstash-monitorama-2013/images/Dreamhost_logo.svg", "log" => { "file" => { "path" => "/Users/liuxg/data/apache_logs" }, "offset" => 3260 }, "agent" => { "ephemeral_id" => "cb749ecc-852a-420a-b97a-4fe5d1b3ffb9", "hostname" => "localhost", "Id" = > "c88813ba a98 fdea - 4 - fb53566f3 a0be - 468", "type" = > "filebeat", "version" = > "7.3.0}", "the ident" = > "-", "input" => { "type" => "log" }, "The message" = > "83.149.9.216 - [17 / May / 2015:10:05:46 + 0000] \" GET / presentations/logstash - monitorama - 2013 / images/Dreamhost_logo. SVG HTTP / 1.1 \ ", 200, 2126 \ \ "http://semicomplete.com/presentations/logstash-monitorama-2013/\" "Mozilla / 5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\", "Ecs" = > {" version "= >" 1.0.1} ", "clientip" = > "83.149.9.216", "bytes = > 2126", "verb" = > "GET", "auth" = > "-", "@ version" = > "1", "@ timestamp" = > 2019-09-11 T06: the fixed order departs from 098 z, "referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"", "The host" = > {" name "= >" localhost "}, "timestamp" = > "17 / May / 2015:10:05:46 + 0000", "httpversion" = > "1.1", "tags" => [ [0] "beats_input_codec_plain_applied" ], "response" => 200 }Copy the code

If you are not familiar with grok regular expressions, there is a Grok Debugger you can use for testing. The tool is already integrated into Kibana.

We can also test it at grokdebug.herokuapp.com/.
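To experiment with the pattern outside of Filebeat, a minimal Logstash pipeline that reads log lines from stdin can also be handy. This is just a sketch for testing, using the same grok pattern as apache_logstash.conf:

# test_grok.conf - paste a log line on stdin and inspect the parsed fields
input {
  stdin {}
}

filter {
  grok {
    match => {
      "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}'
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Run it with ./bin/logstash -f test_grok.conf, paste a log line, and press Enter to see the extracted fields.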

 

date filter

The date filter converts the timestamp information extracted by the previous filter into the @timestamp field. This is very important: in the output above there are two time fields, timestamp and @timestamp. By default, @timestamp is the current time at processing; we want @timestamp to come from the time information in the log, which is what timestamp expresses.
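The relevant block from apache_logstash.conf is just this; the match pattern must agree with the HTTPDATE layout that grok extracted:

date {
  # "17/May/2015:10:05:03 +0000" -> dd/MMM/YYYY:HH:mm:ss Z
  match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
  locale => en
}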

After running the date filter above, the field changes to:

{ "auth" => "-", "log" => { "file" => { "path" => "/Users/liuxg/data/apache_logs" }, "offset" => 983 }, "tags" => [ [0] "beats_input_codec_plain_applied" ], "timestamp" => "17/May/2015:10:05:12 +0000", "Request" => "/presentations/logstash-monitorama-2013/plugin/zoom-js/zoom.js", "ecS" => {"version" => "1.0.1"}, "The message" = > "83.149.9.216 - [17 / May / 2015:10:05:12 + 0000] \" GET / presentations/logstash - monitorama - 2013 / plugin/zoom - js/zoom. HTTP / 1.1 js \ ", 200, 7697 \ \ "http://semicomplete.com/presentations/logstash-monitorama-2013/\" "Mozilla / 5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\", "bytes" => 7697, "response" => 200, "verb" => "GET", "ident" => "-", "input" => { "type" => "log" }, "agent" => { "hostname" => "localhost", "id" => "c88813ba-fdea-4a98-a0be-468fb53566f3", "Ephemeral_id" => "fa8FE907-C89F-4410-8877-00FAebeFE76e ", "type" =>" fileBeat ", "version" => "7.3.0"}, "@ version" = > "1", "the host" = > {" name "= >" localhost "}, "httpversion" = > "1.1", "referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"", "@timestamp" => 2015-05-17t10:05:12.000z, "clientip" => "83.149.9.216"}Copy the code

We can clearly see that @timestamp and timestamp now represent exactly the same time.

geoip filter

The geoip filter determines, from the client IP address, where the request came from, including its latitude and longitude. After the geoip filter has run, we see the following information:

{" verb "= >" GET ", "httpversion" = > "1.1", "@ version" = > "1", "the host" = > {" name "= >" localhost "}, "The message" = > "83.149.9.216 - [17 / May / 2015:10:05:25 + 0000] \" GET / presentations/logstash - monitorama - 2013 / images/elasticsearch. HTTP / 1.1 PNG \ ", 200, 8026 \ \ "http://semicomplete.com/presentations/logstash-monitorama-2013/\" "Mozilla / 5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\", "log" => { "file" => { "path" => "/Users/liuxg/data/apache_logs" }, "offset" => 4872 }, "input" => { "type" => "log" }, "request" => "/presentations/logstash-monitorama-2013/images/elasticsearch.png", "ident" => "-", "@timestamp" => 2015-05-17T10:05:25.000z, "agent" => {"id" => "C88813BA-FDEA-4A98-A0BE-468FB53566F3 ", "ephemeral_id" => "9afaba31-7cd0-4202-9ca8-4501742fc7a3", "hostname" => "localhost", "type" => "filebeat", "Version" = > "7.3.0"}, "bytes = > 8026", "geoip" = > {city_name "= >" Moscow ", "location" = > {" lon "= > 37.6172, "Lat" => 55.7527}, "longitude" => 37.6172, "latitude" => 55.7527, "postal_code" => "102325" "region_name" => "Moscow", "country_code3" => "RU", "country_name" => "Russia", "timezone" => "Europe/Moscow", "Country_code2" = > "RU", "region_code" = > "MOW", "continent_code" = > "EU", "IP" = > "83.149.9.216"}, "auth" = > "-", "Response" => 200, "clientip" => "83.149.9.216", "tags" => [[0] "beatS_input_codec_plain_applied"], "Referrer" = > "\" http://semicomplete.com/presentations/logstash-monitorama-2013/\ ""," ecs "= > {" version" = > "1.0.1"}, "timestamp" => "17/May/2015:10:05:25 +0000" }Copy the code

useragent filter

The useragent filter extracts browser information (such as family, operating system, version, and device) from the agent field. After the useragent filter has run, we see the following information:

{" input "= > {" type" = > "log"}, "clientip" = > "199.16.156.124." "The message" = > "199.16.156.124 - [18 / May / 2015:14:05:13 + 0000] \" GET/files/lumberjack lumberjack - 0.3.0. Exe HTTP / 1.1 \" 200 4378624 \ ", \ "\" Twitterbot / 1.0 \ ""," verb "= >" GET ", "ecs" = > {" version "= >" 1.0.1} ", "httpversion" = > "1.1", "response" => 200, "timestamp" => "18/May/2015:14:05:13 +0000", "referrer" => "\"-\"", "@timestamp" => 2015-05-18t14:05:13.000z, "log" => {"offset" => 780262, "file" => { "path" => "/Users/liuxg/data/apache_logs" } }, "@version" => "1", "Geoip = > {timezone" = > "America/Chicago", "country_code2" = > "US", "IP" = > "199.16.156.124." "Country_code3" => "US", "longitude" => -97.822, "country_name" => "United States", "latitude" => 37.751, "Continent_code" = > "NA", "location" = > {" lon "= > 97.822," lat "= > 37.751}}," the ident "= >" - ", "tags" => [ [0] "beats_input_codec_plain_applied" ], "bytes" => 4378624, "host" => { "name" => "localhost" }, "agent" => { "hostname" => "localhost", "id" => "c88813ba-fdea-4a98-a0be-468fb53566f3", Ephemeral_id => "e146fbc1-8073-404E-BC6F-D692CF9303D2 ", "type" =>" fileBeat ", "version" => "7.3.0"}, "Request" = > "/ files/lumberjack lumberjack - 0.3.0. Exe", "auth" = > "-"}Copy the code

Run Logstash and Filebeat

We can run Logstash as follows:

$ ./bin/logstash -f ~/data/apache_logstash.conf

The -f option here is used to specify our Logstash configuration file.
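If we just want to verify that the configuration parses cleanly without actually starting the pipeline, Logstash has a flag for that:

./bin/logstash -f ~/data/apache_logstash.conf --config.test_and_exit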

We then go to the Filebeat installation directory and use the following command to have Filebeat collect the data and pass it to Logstash:

$ ./filebeat -c filebeat_apache.yml

Here we use the -c option to specify our Filebeat configuration file.

If we want to re-run Filebeat and have Logstash re-process the data, we can delete the registry directory:

The Filebeat registry stores the state and location information that Filebeat uses to track the last position read. Its location depends on the installation type:

  • data/registry for the .tar.gz and .tgz archives
  • /var/lib/filebeat/registry for the DEB and RPM packages
  • C:\ProgramData\filebeat\registry for the Windows ZIP file

To process the data again, we simply go to the corresponding directory and delete the directory called registry. For the .tar.gz package, for example:

localhost:data liuxg$ pwd
/Users/liuxg/elastic/filebeat-7.3.0-darwin-x86_64/data
localhost:data liuxg$ rm -rf registry/

After deleting the registry directory, we re-run the filebeat command.

  • Output: we simply print a dot on the screen for each processed event to indicate progress, and we also export the content to Elasticsearch.
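For reference, the output section of apache_logstash.conf is reproduced here with comments:

output {
  # print one dot per processed event to show progress
  stdout {
    codec => dots {}
  }

  # index the events into Elasticsearch using our template
  elasticsearch {
    index => "apache_elastic_example"
    template => "/Users/liuxg/data/apache_template.json"
    template_name => "apache_elastic_example"
    template_overwrite => true
  }
}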

 

Check out the index in Kibana

If all went well, the apache_logs data has been successfully imported into Elasticsearch. According to our configuration file above, the index is named apache_elastic_example. We can check it in Kibana.
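We can also query Elasticsearch directly with its standard APIs (assuming Elasticsearch is running on its default port, 9200):

curl "localhost:9200/_cat/indices/apache_elastic_example?v"
curl "localhost:9200/apache_elastic_example/_count?pretty"

The first command lists the index with its document count and size; the second returns just the document count.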

From the Kibana display we can see that we already have 1,000 documents. We can also search the data:

 

We can also set up an index pattern and analyze the data:

Once the data is in Elasticsearch, we can analyze it. We will show you how to use Kibana to analyze the data in a future article.