Getting Started with Logstash Reference [6.5]: Parsing Logs with Logstash

In the section Stashing Your First Event, we created a basic Logstash pipeline to confirm that your Logstash installation was up and running. In a production environment, a Logstash pipeline can be quite complex: it typically contains one or more input, filter, and output plugins.

In this section, you will create a Logstash pipeline that uses Filebeat to take Apache web logs as input, parses those logs to create specific, named fields, and writes the parsed data to an Elasticsearch cluster. This time you will define the pipeline in a configuration file rather than on the command line.

Before you start, download the sample data set used in this example and unzip the file.

Configure Filebeat to send logs to Logstash

Before creating the Logstash pipeline, you need to configure Filebeat to send log lines to Logstash. Filebeat is a lightweight, resource-friendly log collection tool that collects logs from files on servers and forwards them to a Logstash instance for processing. Filebeat is designed for reliability and low latency. Because Filebeat uses very few host resources, the Beats input plugin effectively reduces the resource demands on the Logstash instance.

In a typical deployment, Filebeat and Logstash run on separate hosts. Here they run on the same host to keep the tutorial simple.

The Beats input plugin is installed with Logstash by default. It enables Logstash to receive events from the Elastic Beats framework, which means that any Beat written to work with that framework, such as Packetbeat or Metricbeat, can also send event data to Logstash.

To install Filebeat on the data source host, download the installation package from the Filebeat product page. You can also see Filebeat Getting Started for additional installation instructions.

After you install Filebeat, you need to configure it. Locate and open the filebeat.yml file in the Filebeat installation directory and replace its contents with the following lines. Make sure paths points to the Apache sample log file (logstash-tutorial.log) you downloaded earlier.

filebeat.prospectors:
- type: log
  paths:
    - /path/to/file/logstash-tutorial.log  # must be an absolute path
output.logstash:
  hosts: ["localhost:5044"]

Save the file. To keep the configuration simple, you do not specify the TLS/SSL settings that you would normally use in a production environment.
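
For reference, if you later wanted to secure the Filebeat-to-Logstash connection in production, the Filebeat side might look roughly like the sketch below. The hostnames and certificate paths are hypothetical placeholders, and the matching ssl options would also need to be enabled on the Beats input in Logstash.

output.logstash:
  hosts: ["logstash-host:5044"]
  # Hypothetical certificate paths; replace with the files issued for your environment.
  ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
  ssl.certificate: "/etc/pki/client/cert.pem"
  ssl.key: "/etc/pki/client/cert.key"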

On the data source host, type the following command to run Filebeat:

sudo ./filebeat -e -c filebeat.yml -d "publish"

If you run Filebeat as root, you need to change ownership of the configuration file (see Config File Ownership and Permissions).

Filebeat will attempt to connect on port 5044. Until Logstash starts with an active Beats input plugin, there will be no answer on that port, so any messages you see about failing to connect on port 5044 are normal for now.

Configure Logstash for Filebeat input

Next, you create a Logstash pipeline configuration that uses the Beats input plugin to receive events from Beats.

The following text represents the skeleton of a pipeline configuration:

# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
#}
output {
}

This skeleton is non-functional on its own, because the input and output sections have no valid options defined.

To get started, create a file named first-pipeline.conf in your Logstash home directory and paste in the pipeline configuration skeleton above.

Next, configure your Logstash instance to use the Beats input plugin by adding the following lines to the input section of the first-pipeline.conf file:

    beats {
        port => "5044"
    }

Later you will configure Logstash to write to Elasticsearch. For now, add the following line to the output section so that Logstash writes events to stdout while you test:

    stdout { codec => rubydebug }

After completing the above steps, the first-pipeline.conf file should be configured as follows:

input {
    beats {
        port => "5044"
    }
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
#}
output {
    stdout { codec => rubydebug }
}

To verify the above configuration, run the following command:

bin/logstash -f first-pipeline.conf --config.test_and_exit

The --config.test_and_exit option parses the configuration file and reports any errors.

If the configuration file passes the validation test, run the following command to start Logstash:

bin/logstash -f first-pipeline.conf --config.reload.automatic

The --config.reload.automatic option enables automatic reloading of the configuration, so you do not have to stop and restart Logstash every time you modify the configuration file.
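
As a side note, if you ever start Logstash without automatic reloading enabled, you can still trigger a one-off reload by sending the process a SIGHUP signal. A minimal sketch, where the process id placeholder is something you would look up yourself:

# Force Logstash to re-read its configuration and restart the pipeline.
kill -SIGHUP <logstash-pid>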

When Logstash starts, you may see one or more warnings about Logstash ignoring the pipelines.yml file. You can safely ignore this warning: pipelines.yml is used to run multiple pipelines in a single Logstash instance, and in this example only one pipeline is running.

If your pipeline is working correctly, you should see a series of events like the following written to the console:

{
    "@timestamp" => 2017-11-09T01:44:20.071Z,
        "offset" => 325,
      "@version" => "1",
          "beat" => {
            "name" => "My-MacBook-Pro.local",
        "hostname" => "My-MacBook-Pro.local",
         "version" => "6.0.0"
    },
          "host" => "My-MacBook-Pro.local",
    "prospector" => {
        "type" => "log"
    },
        "source" => "/path/to/file/logstash-tutorial.log",
       "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}

Parse web logs with the grok filter plugin

Now you have a working pipeline that reads log lines from Filebeat. However, you will notice that the format of the log messages is not ideal: you want to parse the messages to create specific, named fields. To do this, you use the grok filter plugin.

The Grok filter plug-in is one of several that are available by default with Logstash. For more information on how to manage the Logstash plug-in, see the Plug-in Management section in the Reference documentation.

The Grok filtering plug-in enables you to parse unstructured log data into a structured, easily searchable form.

Because the grok filter plugin looks for patterns in the incoming log data, configuring it requires you to decide how to identify the patterns that matter for your use case. A representative line from the web server log sample looks like this:

83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"

The IP address at the beginning of the line is easy to identify, as is the timestamp in square brackets. To parse the data, you can use the %{COMBINEDAPACHELOG} grok pattern, which structures lines from the Apache log using the following schema:

Information          Field Name
IP Address           clientip
User ID              ident
User Authentication  auth
Timestamp            timestamp
HTTP Verb            verb
Request body         request
HTTP Version         httpversion
HTTP Status Code     response
Bytes served         bytes
Referrer URL         referrer
User agent           agent
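
To see how a schema like this comes together, note that %{COMBINEDAPACHELOG} is itself built from smaller grok patterns, each tagged with a field name. The sketch below is illustrative only: it captures just a few of the fields above, not the full combined Apache definition, and it is not part of this tutorial's configuration.

filter {
    grok {
        # IPORHOST, USER, HTTPDATE, WORD, NOTSPACE, and NUMBER are standard grok
        # patterns shipped with Logstash; the name after the colon becomes the
        # field the matched text is stored in.
        match => {
            "message" => "%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response}"
        }
    }
}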

If you need help building grok patterns, try using the Grok debugger. The Grok debugger is a feature of X-Pack and is free to use.

Edit the first-pipeline.conf configuration file and replace the contents of the filter section with the following:

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}

After the above work is done, the first-pipeline.conf configuration file will look like this:

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}
output {
    stdout { codec => rubydebug }
}
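
Although it is not part of this tutorial's configuration, a common next step in real pipelines is to add a date filter after grok so that @timestamp reflects the time recorded in the log line rather than the time Logstash received the event. A minimal sketch:

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    # Optional: use the parsed Apache timestamp as the event's @timestamp.
    date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
}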

Save the changes. Because automatic configuration reloading is enabled, you do not have to restart Logstash to pick up the changes. However, you do need to force Filebeat to read the log file from scratch. To do this, go to the terminal window where Filebeat is running and press Ctrl+C to shut down Filebeat, then delete the Filebeat registry file. For example, run:

sudo rm data/registry

Since Filebeat stores the state of each file it has read in the registry file, deleting the registry file forces Filebeat to read the file from scratch. Next, restart Filebeat with the following command:

sudo ./filebeat -e -c filebeat.yml -d "publish"

There may be a slight delay before Filebeat begins processing events if it has to wait for Logstash to reload the configuration file. After Logstash applies the grok pattern, the events will have the following JSON representation:

{
        "request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
         "offset" => 325,
           "auth" => "-",
          "ident" => "-",
           "verb" => "GET",
     "prospector" => {
        "type" => "log"
    },
         "source" => "/path/to/file/logstash-tutorial.log",
        "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
           "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
       "referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"",
     "@timestamp" => 2017-11-09T02:51:12.416Z,
       "response" => "200",
          "bytes" => "203023",
       "clientip" => "83.149.9.216",
       "@version" => "1",
           "beat" => {
            "name" => "My-MacBook-Pro.local",
        "hostname" => "My-MacBook-Pro.local",
         "version" => "6.0.0"
    },
           "host" => "My-MacBook-Pro.local",
    "httpversion" => "1.1",
      "timestamp" => "04/Jan/2015:05:13:42 +0000"
}

Note that the event still contains the original raw message, but the log message has also been parsed into individual, specific fields.
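
If you decide you do not need to keep the raw line once it has been parsed, one option (not used in this tutorial) is to drop it with the mutate filter. A minimal sketch:

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    # Optional: remove the raw log line after it has been parsed into fields.
    mutate {
        remove_field => [ "message" ]
    }
}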

Enhance your data with the geoip filter plugin

In addition to parsing log data for better searches, filter plugins can derive supplementary information from existing data. For example, the geoip plugin looks up IP addresses, derives geographic location information from them, and adds that location information to the logs.

To configure your Logstash instance using the Geoip plugin, add the following line to the filter section of the first-pipeline.conf configuration file:

    geoip {
        source => "clientip"
    }

The geoip plugin configuration requires you to specify the name of the source field that contains the IP address to look up. In this example, the clientip field contains the IP address.

Since filters are evaluated in sequence, make sure that the geoip section comes after the grok section in the configuration file and that both are inside the filter block.

When you’re done, the first-pipeline.conf should look like this:

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
        source => "clientip"
    }
}
output {
    stdout { codec => rubydebug }
}
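
As an aside, grok tags events it cannot match with _grokparsefailure. If you wanted to be defensive, you could wrap the geoip lookup in a conditional so it only runs on events that were actually parsed; this is optional and not used in the rest of the tutorial. A sketch:

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    # Only attempt the IP lookup when grok matched and produced a clientip field.
    if "_grokparsefailure" not in [tags] {
        geoip {
            source => "clientip"
        }
    }
}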

Save the changes. To force Filebeat to read the log file from scratch, shut down Filebeat (Ctrl+C), delete the registry file, and restart Filebeat with the following command:

sudo ./filebeat -e -c filebeat.yml -d "publish"

Please note that the event now contains geolocation information:

{
        "request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
          "geoip" => {
              "timezone" => "Europe/Moscow",
                    "ip" => "83.149.9.216",
              "latitude" => 55.7485,
        "continent_code" => "EU",
             "city_name" => "Moscow",
          "country_name" => "Russia",
         "country_code2" => "RU",
         "country_code3" => "RU",
           "region_name" => "Moscow",
              "location" => {
            "lon" => 37.6184,
            "lat" => 55.7485
        },
           "postal_code" => "101194",
           "region_code" => "MOW",
             "longitude" => 37.6184
    },
    ...

Create the Elasticsearch index

Now that you have parsed the Web log data into fields, you are ready to export the data to Elasticsearch.

You can run Elasticsearch on your own hardware or use the hosted Elasticsearch Service on Elastic Cloud. The Elasticsearch Service is available on both AWS and GCP, and you can try it for free.

The Logstash pipeline will now index the data into an Elasticsearch cluster. Edit the first-pipeline.conf configuration file and replace the entire output section with the following:

output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}

With this configuration, Logstash uses the HTTP protocol to connect to Elasticsearch. The example above assumes that Logstash and Elasticsearch run on the same host. You can point at a remote Elasticsearch instance by using the hosts configuration item, for example hosts => [ "es-machine:9092" ].
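
The elasticsearch output also accepts further options. As a sketch (the hostnames below are hypothetical), you could list several nodes so that requests are distributed across them, or override the default logstash-%{+YYYY.MM.dd} index name:

output {
    elasticsearch {
        # Hypothetical remote nodes; Logstash distributes requests across the list.
        hosts => [ "es-node-1:9200", "es-node-2:9200" ]
        # Optional: write to a custom index instead of the default logstash-%{+YYYY.MM.dd}.
        index => "apache-logs-%{+YYYY.MM.dd}"
    }
}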

The input, filter, and output in the first-pipeline.conf configuration file are configured as follows:

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
        source => "clientip"
    }
}
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}

Save the changes. To force Filebeat to read the log file from scratch, shut down Filebeat (Ctrl+C), delete the registry file, and restart Filebeat with the following command:

sudo ./filebeat -e -c filebeat.yml -d "publish"

Pipeline testing

At this point, the Logstash pipeline is configured to index the data into an Elasticsearch cluster, so you can query the data in Elasticsearch.

Try a test query to Elasticsearch based on the fields created by the grok filter plugin. Replace $DATE with the current date, in the format YYYY.MM.DD:

curl -XGET 'localhost:9200/logstash-$DATE/_search?pretty&q=response=200'

The date in the index name is based on UTC, not the timezone where Logstash is running. If the query returns index_not_found_exception, make sure that logstash-$DATE reflects the actual name of the index. To see a list of available indexes, run: curl 'localhost:9200/_cat/indices?v'.

You’ll get multiple hits back, similar to:

{
  "took": 50,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 98,
    "max_score": 2.793642,
    "hits": [
      {
        "_index": "logstash-2017.11.09",
        "_type": "doc",
        "_id": "3IzDnl8BW52sR0fx5wdV",
        "_score": 2.793642,
        "_source": {
          "request": "/presentations/logstash-monitorama-2013/images/frontend-response-codes.png",
          "agent": """"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"""",
          "geoip": {
            "timezone": "Europe/Moscow",
            "ip": "83.149.9.216",
            "latitude": 55.7485,
            "continent_code": "EU",
            "city_name": "Moscow",
            "country_name": "Russia",
            "country_code2": "RU",
            "country_code3": "RU",
            "region_name": "Moscow",
            "location": {
              "lon": 37.6184,
              "lat": 55.7485
            },
            "postal_code": "101194",
            "region_code": "MOW",
            "longitude": 37.6184
          },
          "offset": 2932,
          "auth": "-",
          "ident": "-",
          "verb": "GET",
          "prospector": {
            "type": "log"
          },
          "source": "/path/to/file/logstash-tutorial.log",
          "message": """83.149.9.216 - - [04/Jan/2015:05:13:45 +0000] "GET /presentations/logstash-monitorama-2013/images/frontend-response-codes.png HTTP/1.1" 200 52878 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"""",
          "tags": [
            "beats_input_codec_plain_applied"
          ],
          "referrer": """"http://semicomplete.com/presentations/logstash-monitorama-2013/"""",
          "@timestamp": "2017-11-09T03:11:35.304Z",
          "response": "200",
          "bytes": "52878",
          "clientip": "83.149.9.216",
          "@version": "1",
          "beat": {
            "name": "My-MacBook-Pro.local",
            "hostname": "My-MacBook-Pro.local",
            "version": "6.0.0"
          },
          "host": "My-MacBook-Pro.local",
          "httpversion": "1.1",
          "timestamp": "04/Jan/2015:05:13:45 +0000"
        }
      },
    ...

Try another query that uses the geographic information derived from the IP address. Replace $DATE with the current date, in the format YYYY.MM.DD:

curl -XGET 'localhost:9200/logstash-$DATE/_search?pretty&q=geoip.city_name=Buffalo'

A few of the log entries come from Buffalo, so the query produces the following response:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2.6390574,
    "hits": [
      {
        "_index": "logstash-2017.11.09",
        "_type": "doc",
        "_id": "L4zDnl8BW52sR0fx5whY",
        "_score": 2.6390574,
        "_source": {
          "request": "/blog/geekery/disabling-battery-in-ubuntu-vms.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+semicomplete%2Fmain+%28semicomplete.com+-+Jordan+Sissel%29",
          "agent": """"Tiny Tiny RSS/1.11 (http://tt-rss.org/)"""",
          "geoip": {
            "timezone": "America/New_York",
            "ip": "198.46.149.143",
            "latitude": 42.8864,
            "continent_code": "NA",
            "city_name": "Buffalo",
            "country_name": "United States",
            "country_code2": "US",
            "dma_code": 514,
            "country_code3": "US",
            "region_name": "New York",
            "location": {
              "lon": -78.8781,
              "lat": 42.8864
            },
            "postal_code": "14202",
            "region_code": "NY",
            "longitude": -78.8781
          },
          "offset": 22795,
          "auth": "-",
          "ident": "-",
          "verb": "GET",
          "prospector": {
            "type": "log"
          },
          "source": "/path/to/file/logstash-tutorial.log",
          "message": """198.46.149.143 - - [04/Jan/2015:05:29:13 +0000] "GET /blog/geekery/disabling-battery-in-ubuntu-vms.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+semicomplete%2Fmain+%28semicomplete.com+-+Jordan+Sissel%29 HTTP/1.1" 200 9316 "-" "Tiny Tiny RSS/1.11 (http://tt-rss.org/)"""",
          "tags": [
            "beats_input_codec_plain_applied"
          ],
          "referrer": """"-"""",
          "@timestamp": "2017-11-09T03:11:35.321Z",
          "response": "200",
          "bytes": "9316",
          "clientip": "198.46.149.143",
          "@version": "1",
          "beat": {
            "name": "My-MacBook-Pro.local",
            "hostname": "My-MacBook-Pro.local",
            "version": "6.0.0"
          },
          "host": "My-MacBook-Pro.local",
          "httpversion": "1.1",
          "timestamp": "04/Jan/2015:05:29:13 +0000"
        }
      },
    ...

If you are using Kibana to visualize log data, you can also explore the data read by Filebeat in Kibana (see the Filebeat getting started documentation for how to load the Kibana index pattern for Filebeat).

You have successfully created a pipeline that uses Filebeat to read Apache web logs as input, parses those logs to create specific, named fields, and writes the parsed data to an Elasticsearch cluster. Next, you will learn how to create a pipeline that uses multiple input and output plugins.
