“This is the 24th day of my participation in the November Gengwen Challenge. Check out the details: The Last Gengwen Challenge of 2021.”

First, the background

The test results of our self-developed client load-testing tool are structured log files. Given the need for real-time, centralized performance monitoring, a collection agent is required to gather these structured log files periodically and in batches, and the Telegraf Logparser plugin meets this requirement exactly.

Second, the Telegraf logparser

The Logparser plugin streams and parses the given log files and currently supports parsing in “grok” and regular-expression modes.

1. Grok parser

One of the best ways to become familiar with the grok parser is to refer to the Logstash documentation: www.elastic.co/guide/en/lo… The Telegraf parser uses a slightly modified version of the Logstash “grok” pattern, in the following format:

%{<capture_syntax>[:<semantic_name>][:<modifier>]}
  • capture_syntax: defines the grok pattern used to parse the input line
  • semantic_name: used to name the field or tag
  • modifier: extends the data type or other special handling applied to the parsed item

By default, all named captures are converted to string fields. If a pattern does not have a semantic name, it is not captured. Timestamp modifiers can be used to convert a capture into the timestamp of the parsed metric. If no timestamp is parsed, the metric is created using the current time.

Note: Each line must capture at least one field. Converting all captures to tags would result in points that cannot be written to the time-series database.
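
As a hypothetical illustration (the pattern and capture names below are made up for this note, not taken from the Telegraf docs), compare:

%{WORD:app:tag} %{NUMBER:status:tag}   # every capture is a tag, the point has no fields and cannot be written
%{WORD:app:tag} %{NUMBER:status:int}   # status remains an int field, so the point is valid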

  • Available modifiers:
    • string (default if nothing is specified)
    • int
    • float
    • duration (e.g., 5.23ms gets converted to int nanoseconds)
    • tag (converts the field into a tag)
    • drop (drops the field completely)
  • Timestamp modifiers:
    • ts (This will auto-learn the timestamp format)
    • ts-ansic (“Mon Jan _2 15:04:05 2006”)
    • ts-unix (“Mon Jan _2 15:04:05 MST 2006”)
    • ts-ruby (“Mon Jan 02 15:04:05 -0700 2006”)
    • ts-rfc822 (“02 Jan 06 15:04 MST”)
    • ts-rfc822z (“02 Jan 06 15:04 -0700”)
    • ts-rfc850 (“Monday, 02-Jan-06 15:04:05 MST”)
    • ts-rfc1123 (“Mon, 02 Jan 2006 15:04:05 MST”)
    • ts-rfc1123z (“Mon, 02 Jan 2006 15:04:05 -0700”)
    • ts-rfc3339 (“2006-01-02T15:04:05Z07:00”)
    • ts-rfc3339nano (“2006-01-02T15:04:05.999999999Z07:00”)
    • ts-httpd (“02/Jan/2006:15:04:05 -0700”)
    • ts-epoch (seconds since unix epoch, may contain decimal)
    • ts-epochmilli (milliseconds since unix epoch, may contain decimal)
    • ts-epochnano (nanoseconds since unix epoch)
    • ts-syslog (“Jan 02 15:04:05”, parsed time is set to the current year)
    • ts-“CUSTOM”

A custom time format must be within quotes and must be a representation of the “reference time”, which is Mon Jan 2 15:04:05 -0700 MST 2006. To match a comma decimal point, use a period in the pattern string. For example, %{TIMESTAMP:timestamp:ts-"2006-01-02 15:04:05.000"} can be used to match “2018-01-02 15:04:05,000”.

For more details, please refer to: golang.org/pkg/time/#P…

Telegraf has many built-in patterns of its own and supports most Logstash built-in patterns. Golang regular expressions do not support lookahead or lookbehind, so Logstash patterns that rely on these are not supported.

If you need to build and debug patterns against your log lines, grokdebug.herokuapp.com is very useful!

2. Example

We can use Logparser to convert the log lines generated by Telegraf itself into metrics.

To do this, we need to configure Telegraf to write logs to a file. This can be done using the agent.logfile parameter or by configuring syslog.

[agent]
  logfile = "/var/log/telegraf/telegraf.log"

Logparser configuration:

[[inputs.logparser]]
  files = ["/var/log/telegraf/telegraf.log"]

  [inputs.logparser.grok]
    measurement = "telegraf_log"
    patterns = ['^%{TIMESTAMP_ISO8601:timestamp:ts-rfc3339} %{TELEGRAF_LOG_LEVEL:level:tag}! %{GREEDYDATA:msg}']
    custom_patterns = '''
TELEGRAF_LOG_LEVEL (?:[DIWE]+)
'''

The log content:

2018-06-14T06:41:35Z I! Starting Telegraf v1.6.4
2018-06-14T06:41:35Z I! Agent Config: Interval:3s, Quiet:false, Hostname:"archer", Flush Interval:3s
2018-02-20T22:39:20Z E! Error in plugin [inputs.docker]: took longer to collect than collection interval (10s)
2018-06-01T10:34:05Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-14T07:33:33Z D! Output [file] buffer fullness: 0 / 10000 metrics.

Data collected by InfluxDB:

telegraf_log,host=somehostname,level=I msg="Starting Telegraf v1.6.4" 1528958495000000000
telegraf_log,host=somehostname,level=I msg="Agent Config: Interval:3s, Quiet:false, Hostname:\"somehostname\", Flush Interval:3s" 1528958495001000000
telegraf_log,host=somehostname,level=E msg="Error in plugin [inputs.docker]: took longer to collect than collection interval (10s)" 1519166360000000000
telegraf_log,host=somehostname,level=W msg="Skipping a scheduled flush because there is already a flush ongoing." 1527849245000000000
telegraf_log,host=somehostname,level=D msg="Output [file] buffer fullness: 0 / 10000 metrics." 1528961613000000000
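
Because the log level was captured with the tag modifier, it is indexed and cheap to filter on; for example, an illustrative InfluxQL query for error lines only:

> select * from telegraf_log where level = 'E'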

Third, concrete practice

1. Log format

The following is an example of structured logs to be collected:

TestConfig1,5.0,2019/3/6 17:48:23,2019/3/6 17:48:30,demo_1,open,3,1,6.8270219 openscreen > validatestage
TestConfig2,5.0,2019/3/6 17:48:33,2019/3/6 17:48:40,demo_2,open,3,2,6.9179322 openscreen > validatestage
TestConfig3,5.0,2019/3/6 17:48:43,2019/3/6 17:50:23,demo_1,open,3,3,100.1237885 switchscreen > validatestag

Note: these logs are generated in batches. Each client load-test run generates a *.log file in the current directory, and column names are assigned to the corresponding columns during collection.
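
To make the mapping explicit, this is how the comma-separated columns of one line correspond to the named captures in the grok pattern configured below:

TestConfig1        -> scene (WORD)
5.0                -> version (NUMBER, float)
2019/3/6 17:48:23  -> begtime (TS_WIN, custom pattern)
2019/3/6 17:48:30  -> endtime (TS_WIN, custom pattern)
demo_1             -> canvasName (WORD)
open               -> canvasCase (WORD)
3                  -> totaltimes (NUMBER, int)
1                  -> current (NUMBER, int)
6.8270219          -> time_consuming (NUMBER, float)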

2. Telegraf configuration

Configure telegraf.conf:

[[inputs.logparser]]
  ## Log files to parse.
  ## These accept standard unix glob matching rules, but with the addition of
  ## ** as a "super asterisk". ie:
  ## /var/log/**.log -> recursively find all .log files in /var/log
  ## /var/log/*/*.log -> find all .log files with a parent dir in /var/log
  ## /var/log/apache.log -> only tail the apache log file
  files = ["C:\\Release\\TestConfigLog\\*.log"]

  ## Read files that currently exist from the beginning. Files that are created
  ## while telegraf is running (and that match the "files" globs) will always
  ## be read from the beginning.
  from_beginning = false

  ## Method used to watch for file updates. Can be either "inotify" or "poll".
  watch_method = "poll"

  ## Parse logstash-style "grok" patterns:
  ## Telegraf built-in parsing patterns: https://goo.gl/dkay10
  [inputs.logparser.grok]
    ## This is a list of patterns to check the given log file(s) for.
    ## Note that adding patterns here increases processing time. The most
    ## efficient configuration is to have one pattern per logparser.
    ## Other common built-in patterns are:
    ## %{COMMON_LOG_FORMAT} (plain apache & nginx access logs)
    ## %{COMBINED_LOG_FORMAT} (access logs + referrer & agent)
    patterns = ['%{WORD:scene},%{NUMBER:version:float},%{TS_WIN:begtime},%{TS_WIN:endtime},%{WORD:canvasName},%{WORD:canvasCase},%{NUMBER:totaltimes:int},%{NUMBER:current:int},%{NUMBER:time_consuming:float}']

    ## Name of the outputted measurement name.
    measurement = "bigscreen"

    ## Full path(s) to custom pattern files.
    ## custom_pattern_files = []

    ## Custom patterns can also be defined here. Put one pattern per line.
    custom_patterns = 'TS_WIN %{YEAR}/%{MONTHNUM}/%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?'

    ## Timezone allows you to provide an override for timestamps that
    ## don't already include an offset
    ## e.g. 04/06/2016 12:41:45 data one two 5.43µs
    ##
    ## Default: "" which renders UTC
    ## Options are as follows:
    ## 1. Local -- interpret based on machine localtime
    ## 2. "Canada/Eastern" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
    ## 3. UTC -- or blank/unspecified, will return timestamp in UTC
    timezone = "Local"

Note:

  • files = [" *.log"], to solve the current directory multiple file object matching requirements
  • watch_method = "poll", set the rotation training to obtain file updates
  • Custom_patterns, to customize a time formatting pattern match
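
Note that begtime and endtime are captured as plain string fields (no ts modifier), so InfluxDB assigns each point the collection time rather than the time inside the log line. If the log's own start time should become the point timestamp instead, a custom timestamp modifier could be attached to that capture. A minimal sketch of the idea, assuming the same TS_WIN layout expressed in Go's reference-time notation:

    patterns = ['%{WORD:scene},%{NUMBER:version:float},%{TS_WIN:timestamp:ts-"2006/1/2 15:04:05"},%{TS_WIN:endtime},%{WORD:canvasName},%{WORD:canvasCase},%{NUMBER:totaltimes:int},%{NUMBER:current:int},%{NUMBER:time_consuming:float}']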

The metric data generated in InfluxDB is as follows:

> select * from bigscreen limit 5
name: bigscreen
time                begtime           canvasCase canvasName current endtime           host            path                                                    scene       time_consuming totaltimes version
---- ------- ---------- ---------- ------- ------- ---- ---- ----- -------------- ---------- -------
1552296231630588200 2019/3/6 17:48:43 open       demo_1     3       2019/3/6 17:50:23 DESKTOP-MLD0KTS C:\Users\htsd\Desktop\VBI5\Release\TestConfigLog\1.log  TestConfig3 100.1237885    3          5
1552296231630588201 2019/3/6 17:48:43 open       demo_1     3       2019/3/6 17:50:23 DESKTOP-MLD0KTS C:\Users\htsd\Desktop\VBI5\Release\TestConfigLog\1.log  TestConfig3 100.1237885    3          5
1552296231630588202 2019/3/6 17:48:43 open       demo_1     3       2019/3/6 17:50:23 DESKTOP-MLD0KTS C:\Users\htsd\Desktop\VBI5\Release\TestConfigLog\1.log  TestConfig3 100.1237885    3          5
1552296231631587700 2019/3/6 17:48:43 open       demo_1     3       2019/3/6 17:50:23 DESKTOP-MLD0KTS C:\Users\htsd\Desktop\VBI5\Release\TestConfigLog\1.log  TestConfig3 100.1237885    3          5
1552297570005076300 2019/3/6 17:48:23 open       demo_1     1       2019/3/6 17:48:30 DESKTOP-MLD0KTS C:\Users\htsd\Desktop\VBI5\Release\TestConfigLog\12.log TestConfig1 6.8270219      3          5

The column names are all ones we defined ourselves in the grok pattern.
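
Of these, host and path are tags added automatically (by the Telegraf agent and the logparser plugin respectively), while the grok captures are stored as fields. This can be verified with queries such as:

> show tag keys from bigscreen
> show field keys from bigscreen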

Fourth, Grafana settings

The overall consideration is to use a table for data presentation and support filtering by individual fields.

Set filtering variables to meet field filtering requirements:

Create the Dashboard and select the table component:

Define data source:
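
As an illustrative sketch (assuming a Grafana template variable named scene was created in the filtering step above), the table panel's query could take roughly this shape:

select begtime, endtime, scene, canvasName, canvasCase, current, totaltimes, time_consuming from bigscreen where scene =~ /^$scene$/ order by time desc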

Set the table field styles to format the time field:

Highlight the response-time field at different levels (green, yellow, red):

The actual dynamic effect is as follows:

Fifth, the summary

This article used a simple example to show how Telegraf + InfluxDB + Grafana can monitor structured logs in real time; unstructured log collection is supported as well. If you are interested, you can try it yourself.

Related information:

  • github.com/zuozewei/bl…