This is the 8th day of my participation in the First Challenge 2022. For details: First Challenge 2022.

Previous article: Initial ELK construction for distributed link tracking

1. Introduction

In the last article, we set up the ELK environment: logs dropped into a specified folder could be stored in ES. But many problems remained, and in this article we will work through them one by one.

Putting every code snippet inline would interrupt the reading flow, so snippets are marked with an @x tag; the full configuration containing each tag can be found by searching at the bottom of the article.

2. Problems

2.1 Multi-line log problem

Why it happens

As we all know, a Java exception stack trace spans multiple lines. With the previous configuration, Kibana renders each line as a separate document, which is completely unusable.

Solution

Add the following configuration under the log input node of filebeat.yml. Any line that does not match the regular expression is appended after (match: after) the preceding matching line, so the whole stack trace becomes a single message. @1

  multiline:
      pattern: '^\s*(\d{4}|\d{2})\-(\d{2}|[a-zA-Z]{3})\-(\d{2}|\d{4})'  # lines starting with a date begin a new event
      negate: true                                                      # lines NOT matching the pattern are continuations
      match: after                                                      # append continuations after the matching line
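For intuition, here is a hypothetical log excerpt (the ||-delimited layout is an assumption for illustration, chosen to match the patterns in section 2.2). Only the first line matches the date regex; with negate: true and match: after, the stack-trace lines are appended to it, so all four lines reach Logstash as one event, and Filebeat adds "multiline" to the [log][flags] field:

  2022-01-08 12:00:00.123||ERROR||com.example.demo.UserService||query failed
  java.lang.NullPointerException: id is null
      at com.example.demo.UserService.find(UserService.java:42)
      at com.example.demo.UserController.get(UserController.java:18)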

2.2 Data format problems

Why it happens

After logs are collected, the raw text needs to be cleaned and parsed into structured fields to ensure data quality.

Solution

  1. Create a patterns folder under the Logstash root directory.
  2. Create a pattern file inside it.
  3. Edit the pattern file and add the following grok patterns for parsing the log format (a worked example of these patterns follows this list).
DATETIME \d{4}-\d{1,2}-\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}(\.\d{1,3})?
STACK_TRACE ((.+Exception:.*)|(.+Error:.*)|(\s+at\s.*))?(.*\s*)*
LINE \|{2}
VLINE \s*\|{2}\s*
VLIN1 \|
NOTVLINE [^\s\|]*
NOTSQUAR [^\]]*
JAVA_LINE_LOG_SIMPLATE01 %{DATETIME:timestamp}%{LINE}%{DATA:level}%{LINE}%{DATA:logger}%{LINE}%{GREEDYDATA:more}
JAVA_LINE_LOG_MULTILINE01 %{DATETIME:timestamp}%{LINE}%{DATA:level}%{LINE}%{DATA:logger}%{LINE}%{DATA:more}[\n]+%{STACK_TRACE:stacktrace}
  4. Add the @2 configuration to Logstash's sync.conf (note: this file is created manually, see above).
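To make the patterns concrete: given a hypothetical single-line log in the ||-delimited layout from section 2.1, JAVA_LINE_LOG_SIMPLATE01 would extract fields roughly as follows (input and values illustrative only):

  # input line (hypothetical):
  #   2022-01-08 12:00:00.123||INFO||com.example.demo.UserService||user 42 logged in
  # grok result:
  #   timestamp => "2022-01-08 12:00:00.123"
  #   level     => "INFO"
  #   logger    => "com.example.demo.UserService"
  #   more      => "user 42 logged in"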

2.3 Data filtering

Why it happens

Among the many logs generated, some are useful and some are not. To preserve data integrity, nothing is filtered out: events that fail to parse are simply labeled parse_error, the same idea as a logical (soft) delete.

Solution

Add the @3 configuration to the Logstash configuration file
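For reference, this is the @3 fragment as it appears in the complete sync.conf in section 4.2. Grok tags events it cannot parse with _grokparsefailure; instead of dropping them, we only add a marker field:

  if "_grokparsefailure" in [tags] {
    # @3: keep the event, just mark it as a parse failure (logical delete)
    mutate {
      add_field => { "parse_error" => "true" }
    }
  }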

2.4 Timestamp problem

Why it happens

I believe most developers have run into time zone problems, whether with MySQL or Docker: timestamps are often off by 8 hours.

There are two timestamps in this environment: the time when the log was generated, and the time when the event is written to ES.

Solution

Add configuration @4 to the Logstash configuration file
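For reference, this is the @4 fragment from the complete sync.conf in section 4.2; it tells the date filter to interpret the parsed time as Asia/Shanghai when writing @timestamp, which removes the 8-hour offset:

  date {
    # @4: timezone configuration
    target => "@timestamp"
    timezone => "Asia/Shanghai"
    match => ["@timestamp", "MMMM dd YYYY HH:mm:ss.SSS"]
  }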

2.5 Redundant fields

Why it happens

After data cleaning, some fields are no longer needed and should be removed.

Solution

Add configuration @5 to the Logstash configuration file
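For reference, this is the @5 fragment from the complete sync.conf in section 4.2: one mutate removes the redundant fields while keeping the source file path in a new logfile field:

  mutate {
    # @5: remove redundant fields, keep the source file path
    remove_field => ["@version", "agent", "host", "ecs", "input", "log", "more", "tags"]
    add_field => { "logfile" => "%{[log][file][path]}" }
  }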

3. Wrapping up

3.1 Effect

After the above configuration, the result in ES is as follows:

3.2 Search

4. Complete configuration files

4.1 filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /Users/zyq/project/study-project/logs/*.log
  fields_under_root: true
  fields:
    app_type: Java
  ### Multiline options @1
  multiline:
    pattern: '^\s*(\d{4}|\d{2})\-(\d{2}|[a-zA-Z]{3})\-(\d{2}|\d{4})'
    negate: true
    match: after

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 1
  index.number_of_replicas: 0

setup.kibana:

output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
  # pretty: true
  # enable: true

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
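To apply the changes, restart Filebeat against this file; with a tarball install that is typically run from the Filebeat directory (install path assumed):

  ./filebeat -e -c filebeat.yml

Here -e writes Filebeat's own logs to stderr and -c points at the configuration file.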

4.2 logstash (sync.conf)

input {
  beats {
    port => 5044
  }
}

filter {
  # @2
  if [agent][type] == "filebeat" {            # check whether the source is Filebeat
    if [app_type] == "Java" {                 # check the app type
      if "multiline" in [log][flags] {        # multi-line or single-line event?
        grok {
          patterns_dir => ["../config/patterns"]
          match => ["message", "%{JAVA_LINE_LOG_MULTILINE01}"]
        }
        mutate {
          add_field => { "multiline" => "true" }
        }
      } else {
        grok {
          patterns_dir => ["../config/patterns"]
          match => ["message", "%{JAVA_LINE_LOG_SIMPLATE01}"]   # single-line pattern
        }
      }
      date {
        # timezone configuration @4
        target => "@timestamp"
        timezone => "Asia/Shanghai"
        match => ["@timestamp", "MMMM dd YYYY HH:mm:ss.SSS"]
      }
      if "_grokparsefailure" in [tags] {
        # @3: mark parse failures instead of dropping them
        mutate {
          add_field => { "parse_error" => "true" }
        }
      } else {
        mutate {
          replace => { "message" => "%{[more]}" }
        }
      }
      mutate {
        # @5: remove redundant fields
        remove_field => ["@version", "agent", "host", "ecs", "input", "log", "more", "tags"]
        add_field => { "logfile" => "%{[log][file][path]}" }
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    #user => "elastic"
    #password => "changeme"
  }
}
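Likewise, start (or restart) Logstash against this file; assuming it lives at config/sync.conf under the Logstash root:

  bin/logstash -f config/sync.conf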