Twelve: Building a Data Pipeline with Logstash and Beats

1. An Introduction to Logstash

1.1 An Introduction to the Logstash Architecture

  • Logstash is an open-source ETL tool that can collect and process data from many different kinds of sources (files, HTTP, databases, Kafka) and send the processed data to different destinations.

1.2 Logstash Concepts

  • Pipeline
    • Contains the three-stage process input → filter → output (this standardized process is what we call a Pipeline)
    • Plug-in lifecycle management (Logstash currently supports more than 200 plug-ins)
    • Queue management (Logstash has a message queue of its own)
  • Logstash Event
    • The internal representation of data as it flows through Logstash. Data is converted into an Event in the input stage, and the Event is converted into target-format data in the output stage
    • An Event is actually a Java object; in the configuration file we can add, delete, modify, and query an Event’s attributes

1.3 Overview of the Logstash Processing Architecture

  • Codec (Code / Decode): as we said before, inside Logstash data flows as Events. Converting the original data into an Event, and converting an Event into target-format data, are both done by a Codec
    • Decode: decodes raw data into an Event
    • Encode: encodes an Event into target-format data

1.4 Logstash Configuration File Structure

As we mentioned earlier, in Logstash a data-processing flow is called a Pipeline. So what exactly is a Pipeline, and how do we create one?

  • A Pipeline is described by a configuration file containing the Input/Filter/Output stages (not all of them are required)
  • In practice, all we need to do is write this configuration file, as we did in section 1
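As a sketch, a minimal pipeline configuration with all three stages might look like this (the file name and the added field are illustrative):

```conf
# minimal-pipeline.conf — an illustrative three-stage pipeline
input {
	stdin {}                                   # read lines from the console
}
filter {
	mutate {
		add_field => { "source" => "stdin" }   # illustrative field
	}
}
output {
	stdout {
		codec => rubydebug                     # pretty-print each Event
	}
}
```

It would be run with `bin/logstash -f minimal-pipeline.conf`.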

1.5 Logstash Input Plugins

  • A Pipeline can have more than one input plug-in
    • Stdin (console input) / File (file input)
    • Beats/Log4J/Elasticsearch/JDBC/Kafka/Rabbitmq/Redis
    • JMX/HTTP/Websocket/UDP/TCP
    • Google Cloud Storage/S3
    • GitHub/Twitter
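For example, a single pipeline can combine several input plug-ins; a sketch that reads from both the console and a log file (the path is illustrative):

```conf
input {
	stdin {}
	file {
		path => "/var/log/test.log"   # illustrative path
	}
}
output {
	stdout { codec => rubydebug }
}
```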

1.6 Logstash Output Plugins

  • Sends Events to a specific destination; this is the final stage of the Pipeline
  • Common Output Plugins (https://www.elastic.co/guide/en/logstash/7.1/output-plugins.html)
    • Elasticsearch
    • Email/PagerDuty
    • Influxdb/Kafka/Mongodb/Opentsdb/Zabbix
    • Http/TCP/Websocket
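The most common case is sending Events to Elasticsearch; a sketch, assuming a local node on the default port and an illustrative index pattern:

```conf
output {
	elasticsearch {
		hosts => ["http://localhost:9200"]
		index => "logstash-demo-%{+YYYY.MM.dd}"   # illustrative daily index pattern
	}
}
```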

1.7 Codec Plugins (data conversion)

  • Decodes raw data into an Event; encodes an Event into target-format data
  • Built-in Codec Plugins
    • Line/Multiline
    • JSON/Avro/Cef(ArcSight Common Event Format)
    • Dots/Rubydebug

1.8 Filter Plugins (data processing)

  • Process Events
  • Built-in Filter Plugins
    • Mutate: operates on an Event’s fields
    • Metrics: aggregates metrics
    • Ruby: executes Ruby code

1.9 Queue

We now know that one Logstash instance can support multiple inputs. To prevent data loss when Logstash restarts, Logstash introduces the concept of a Queue. All data collected by an Input is converted into Events by the Codec; the Events are then sent to the Queue, which hands them to the Filter for processing.

1.9.1 In-Memory Queue

  • If the process crashes or the machine goes down, data will be lost

1.9.2 Persistent Queue(Persistent queue)

  • queue.type: persisted (the default is memory)
    • queue.max_bytes: 4gb
  • Data is not lost when the machine goes down; data is guaranteed to be consumed; can replace a message-queue buffer such as Kafka
  • https://www.elastic.co/guide/en/logstash/7.1/persistent-queues.html
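These settings go in logstash.yml; a sketch enabling the persistent queue with the limit mentioned above:

```conf
# logstash.yml
queue.type: persisted    # default is memory
queue.max_bytes: 4gb     # cap on disk space used by the queue
```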

1.10 Pipeline Instance Settings

  • pipeline.workers: number of pipeline threads; defaults to the number of CPU cores
  • pipeline.batch.size: number of Events the Batcher processes in one batch; the default is 125, and it may need to be tuned together with jvm.options
  • pipeline.batch.delay: how long the Batcher waits for a batch to fill before dispatching it
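In logstash.yml these tuning options look like the sketch below (the worker count and delay shown are illustrative values, not recommendations):

```conf
# logstash.yml
pipeline.workers: 8        # illustrative; defaults to the number of CPU cores
pipeline.batch.size: 125   # events per batch; raise together with the heap in jvm.options
pipeline.batch.delay: 50   # illustrative; ms to wait for a batch to fill
```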

1.11 Codec Plugins in Practice

1.11.1 Codec Plugin line: single-line input

[root@hadoop101 logstash-7.1.0]# bin/logstash -e "input {stdin{codec=>line}} output {stdout{codec=>rubydebug}}"

1.11.2 Codec Plugin dots: displaying processing progress

[root@hadoop101 logstash-7.1.0]# bin/logstash -e "input {stdin{codec=>line}} output {stdout{codec=>dots}}"

1.11.3 Codec Plugin json: JSON input

[root@hadoop101 logstash-7.1.0]# bin/logstash -e "input {stdin{codec=>json}} output {stdout{codec=>rubydebug}}"

1.11.4 Codec Plugin multiline: multi-line matching (handling exception logs)

  • Settings
    • pattern: the regular expression used to match lines
    • what: when a line matches, whether it belongs to the previous event or the next event
      • previous / next
    • negate: whether to negate the result of pattern
      • true / false

input {
	stdin {
		codec => multiline {
			pattern => "^\s"
			what => "previous"
		}
	}
}
output {
	stdout {
		codec => "rubydebug"
	}
}

1.12 Input Plugins in Practice

1.12.1 Input Plugin: File (reading data from a file)

  • Supports reading data from files, such as log files
  • Problems that file reading has to solve:
    • Each line is read only once; after a restart, reading must continue from where it last stopped (implemented via sincedb, which saves the read position to a sincedb file)
    • New files must be detected
    • Archived files (file location changed by log rotation) must not affect reading of the current content
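A sketch of a file input; the path is illustrative, and sincedb_path can be set explicitly if the default location is not wanted:

```conf
input {
	file {
		path => "/var/log/nginx/access.log"               # illustrative path
		start_position => "beginning"                     # read existing content on first run
		sincedb_path => "/var/lib/logstash/sincedb-demo"  # illustrative; where read offsets are stored
	}
}
output {
	stdout { codec => rubydebug }
}
```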

1.13 Filter Plugins in Practice (important)

  • A Filter Plugin can perform various processing on a Logstash Event, such as parsing, deleting fields, and type conversion
    • date: date parsing
    • dissect: delimiter-based parsing
    • grok: regular-expression parsing
    • mutate: field processing such as rename, delete, replace
    • ruby: dynamically modify an Event with Ruby code
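As an illustration of combining these, the sketch below splits a line into fields with dissect and then parses its timestamp with date; the log layout, field names, and timestamp format are all assumptions for the example:

```conf
filter {
	dissect {
		# assumed layout: "<client> <timestamp> <message>"
		mapping => { "message" => "%{client} %{ts} %{msg}" }
	}
	date {
		match => ["ts", "dd/MMM/yyyy:HH:mm:ss Z"]   # assumed timestamp format
		target => "@timestamp"
	}
}
```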

1.13.1 Filter Plugin: Mutate (important)

  • Performs various operations on fields
    • convert: type conversion
    • gsub: string substitution
    • split/join/merge: split a string into an array, join an array into a string, merge arrays
    • rename: rename a field
    • update/replace: update or replace field contents
    • remove_field: delete a field
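A sketch covering several of these operations; the field names are illustrative:

```conf
filter {
	mutate {
		convert => { "status" => "integer" }   # type conversion
		gsub => ["message", "\t", " "]         # string substitution: tabs to spaces
		rename => { "host" => "hostname" }     # rename a field
		remove_field => ["tmp_field"]          # delete a field
	}
}
```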

2. An Introduction to Beats

2.1 What Is Beats?

Beats are used for collecting data and can be easily integrated with Logstash or Elasticsearch. Elastic provides many Beats that work out of the box.

  • Lightweight data shippers
    • Mainly used to collect data
    • Support integration with Logstash or ES
  • Full category / lightweight / out of the box / pluggable / extensible / visualizable

2.2 Metricbeat

2.2.1 Overview

  • Periodically collects metric data from operating systems and software
    • Metrics vs Logs
      • Metrics: aggregatable data, collected at regular intervals
      • Logs: text data, collected as events occur
  • Metrics are stored in Elasticsearch, where Kibana can analyze them in real time

2.2.2 Metricbeat Composition

  • Module
    • The objects to collect metrics from, covering different operating systems, databases, and application systems
  • Metricset
    • A Module can have more than one metricset
    • A concrete set of metrics; metricsets are divided so as to reduce the number of collection calls
      • Different metricsets can be given different collection intervals

2.2.3 Module

  • Metricbeat ships with a large number of Modules out of the box
  • Run `metricbeat modules list` to view them
  • Run `metricbeat modules enable module_name` to enable a module
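After a module is enabled, its collection settings live in the modules.d/ directory; a sketch for the system module (the interval and metricset selection are illustrative choices):

```conf
# modules.d/system.yml
- module: system
  period: 10s        # illustrative collection interval
  metricsets:
    - cpu
    - memory
    - filesystem
```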
