
Today marks exactly one year since I joined Souche. I arrived a Node.js novice, and in many ways I still am. With millions of lines of code, hitting a problem without an error stack was a nightmare: the classic dilemma of a bug that breaks online but works fine locally, which is maddening. Out of urgency, we adopted Sentry for a period of time. Its main uses in our project were:

  • Captures and records uncaught errors
  • Email alert notifications
  • It doubles as an excellent collaboration platform in its own right

After more than half a year, some problems gradually emerged, such as:

  • It records only the error stack, with no surrounding context, which limits its usefulness for debugging.
  • Search is relatively weak, and tracing back through history is painful.
  • Teams began to need normalized logs and cluster-wide log monitoring, which Sentry clearly could not provide.

Hence the evolution to ELK + Souche-Alert.

ELK

ELK = Elasticsearch + Logstash + Kibana

Filebeat: the log harvester

Filebeat download address. No installation is required; start Filebeat directly from the extracted directory:

./filebeat -e -c filebeat.yml -d '*'

The configuration file

  • Set the log listening path
  • Set the Logstash address
  • Make sure the default Elasticsearch output has been removed or commented out
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # To fetch all ".log" files from a specific level of subdirectories
      # /var/log/*/*.log can be used.
      # For each file found under this path, a harvester is started.
      # Make sure no file is defined twice as this can lead to unexpected behaviour.
      paths:
        - /tmp/bunyan-cheniu-error.log
        # - /var/log/*.log
        #- c:\programdata\elasticsearch\logs\*

output:
  ### Logstash as output
  logstash:
    # The logstash hosts
    hosts: ["127.0.0.1:5043"]

Logstash: the log relay station

Logstash download address. Start Logstash with:

./bin/logstash -f config/xxxxx.conf

Our log format is JSON. Logstash's default template indexes and analyzes every string field. Here a customized template marks certain fields as not analyzed, which improves Elasticsearch search performance and saves storage space.

# The # character at the beginning of a line indicates a comment.
# Use comments to describe your configuration.
input {
  beats {
    # The hosts configuration in Filebeat needs to point at this port.
    type => "cheniu_api_server"
    port => "5043"
    codec => "json"
  }
}

# The filter part of this file is left empty to indicate that it is optional.
filter {
}

output {
  elasticsearch {
    index => "cheniu-api-test-%{+YYYY.MM.dd}"
    template => "/home/yourname/logdir/logstash-2.3.2/config/cheniu-api-template.json"
    template_name => "cheniu-api"
    template_overwrite => true
  }
  stdout {}
}

You can find the default template by searching for elasticsearch-template.json in the Logstash project. Below is the customized, optimized template; the major changes are in the properties section. We have no Chinese word-segmentation requirement yet; if one arises, fields containing Chinese will need a dedicated analyzer.

{
  "template" : "cheniu-api-*",
  "settings" : {
    "index.refresh_interval" : "10s"
  },
  "mappings" : {
    "_default_" : {
      "_all" : {"enabled" : true, "omit_norms" : true},
      "dynamic_templates" : [ {
        "message_field" : {
          "match" : "message",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fielddata" : { "format" : "disabled" }
          }
        }
      }, {
        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fielddata" : { "format" : "disabled" },
            "fields" : {
              "raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
            }
          }
        }
      } ],
      "properties" : {
        "name": {"type": "string", "index": "not_analyzed"},
        "req_id": {"type": "string", "index": "not_analyzed"},
        "@timestamp": { "type": "date" },
        "@version": { "type": "string", "index": "not_analyzed" },
        "geoip"  : {
          "dynamic": true,
          "properties" : {
            "ip": { "type": "ip" },
            "location" : { "type" : "geo_point" },
            "latitude" : { "type" : "float" },
            "longitude" : { "type" : "float" }
          }
        }
      }
    }
  }
}

Elasticsearch

Elasticsearch download address. It too runs without installation, and since there is no cluster requirement yet, a single node is enough.
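
For reference, and assuming the 2.x archive layout of the era, a single node starts straight from the unpacked directory with the default config:

./bin/elasticsearch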

Kibana

Kibana is a powerful log visualization panel; once configured with the Elasticsearch address, it is ready to use.
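
A minimal sketch of that configuration, for a local single-node setup; both values below are illustrative defaults, not the project's actual settings:

# kibana.yml
server.port: 5601
elasticsearch.url: "http://127.0.0.1:9200"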

Souche-Alert

Elastic has a commercial product for this, Watcher, which is flexible but paid. So we built an alert center in Node: it periodically calls Elasticsearch's search API, queries the error logs, and decides whether to send alert emails based on trigger rules (a sketch follows the list below).

The benefits are as follows:

  • The JavaScript technology stack means anyone can write a monitor or two. (I'm on the front-end team.)
  • The approach is simple and crude enough to be stable and easy to debug.
  • Free!
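
A minimal sketch of that polling loop is shown below. The index pattern, query, interval, and sendMail hook are illustrative assumptions, not the actual Souche-Alert code.

// alert-center sketch: poll Elasticsearch every minute for recent error logs
// and mail an alert when any are found
const http = require('http');

const QUERY = JSON.stringify({
  query: {
    bool: {
      must: [
        { range: { level: { gte: 50 } } },              // error (50) and fatal (60)
        { range: { '@timestamp': { gte: 'now-1m' } } }  // only the last minute
      ]
    }
  },
  size: 0 // we only need the hit count
});

function sendMail(message) {
  // hook up nodemailer or an internal mail API here (hypothetical)
  console.log('ALERT:', message);
}

function checkErrors() {
  const req = http.request({
    host: '127.0.0.1',
    port: 9200,
    path: '/cheniu-api-*/_search',
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }
  }, (res) => {
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      const total = JSON.parse(body).hits.total;
      if (total > 0) sendMail(`${total} error logs in the last minute`);
    });
  });
  req.on('error', (err) => console.error('alert check failed:', err));
  req.end(QUERY);
}

setInterval(checkErrors, 60 * 1000);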

Log generation convention

Internally we added another layer of encapsulation on top of bunyan, a private npm package called bylog, to generate JSON-formatted logs.

The following improvements have been made:

  • The node xx.js | bunyan pipeline was replaced by a configurable option for formatted log output.
  • Added a meta field for recording additional context information.
  • Added a mute channel, so a log stream can optionally be written only to file; used for recording request information.
  • To keep nested field names like name.xxx.cc out of the log index, everything below the second level of a JSON log entry is serialized with util.inspect, to at most 5 levels deep; deeper data is dropped for performance reasons. (A sketch of the idea follows this list.)
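
A minimal sketch of that flattening, assuming a hypothetical flattenMeta helper; bylog's actual implementation is private:

// Values below the top level are stringified with util.inspect (depth 5),
// so Elasticsearch never has to index nested fields like token.from
const util = require('util');

function flattenMeta(meta) {
  const out = {};
  for (const key of Object.keys(meta)) {
    const val = meta[key];
    out[key] = (val !== null && typeof val === 'object')
      ? util.inspect(val, { depth: 5 }) // nested object becomes one string field
      : val;
  }
  return out;
}

// flattenMeta({ token: { from: 'usercenter', value: 'jdi90knen' } })
// => { token: "{ from: 'usercenter', value: 'jdi90knen' }" }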

Framework integration

const bylog = require('../');
// Terminal output formatting; replaces the old `node xxx.js | bunyan` pipeline
const logOutput = require('../lib/log_output.js');

// Supports all bunyan (https://github.com/trentm/node-bunyan) options
let logOpt = {
  name: 'cheniu-pro',
  src: false,
  streams: [
    {
      level: 'info',
      path: '/tmp/bunyan-cheniu-pro.log',
      mute: true // bylog's mute option: write to file only, nothing to the terminal
    },
    {
      level: 'trace',
      stream: logOutput // process.stdout
    }
  ]
};

let byLog = bylog.createLogger(logOpt);

Level conventions

The log levels in bunyan are as follows; the level descriptions are the bunyan author's best-practice opinions.

  • “fatal” (60): The service/app is going to stop or become unusable now. An operator should definitely look into this soon.
  • “error” (50): Fatal for a particular request, but the service/app continues servicing other requests. An operator should look at this soon(ish).
  • “warn” (40): A note on something that should probably be looked at by an operator eventually.
  • “info” (30): Detail on regular operation.
  • “debug” (20): Anything else, i.e. too verbose to be included in “info” level.
  • “trace” (10): Logging from external libraries used by your app or very detailed application logging.

Logs at info level and above are recorded on the log platform. Logs with level >= 50 additionally trigger an email alert and must be handled immediately.

Within the project, logs at different levels are emitted via the following functions, and byLog is mounted on the global object.

byLog.{level}(err[, meta[, append]]);
// - {Error|String} err    an Error object or a string
// - {JSON|String}  meta   (optional) context information from when the error occurred
// - {JSON}         append (optional) currently supported parameters:
//     {String} req_id  the unique request id, available in the project as
//                      req.x_request_id; binds an error to an API request

// Examples
byLog.trace('trace message here');
byLog.debug('debug message here', 'meta message here');
byLog.info('info message here', {infoKey: 'infoValue'});
byLog.warn(new Error('warn message here'), {infoKey: 'infoValue'});
byLog.error(new Error('error message here'), {
  url: 'http://www.souche.com',
  token: {
    from: 'usercenter',
    value: 'jdi90knen'
  }
});
byLog.fatal(new Error('fatal message here'), null, {req_id: 'xxx'});

Note: not every error has to be logged at error level or higher. Developers decide at their own discretion, based on the level of impact.

Log types

Error logs

Collection channels

  • Uncaught system-level errors. The hooks are as follows, and the output level is error.
process.on('uncaughtException', processErrorHandler);  
process.on('unhandledRejection', processErrorHandler);  
  • byLog output in application code
  • console.error is aliased to byLog.error
  • For Express, the following hook can be added:
// Add error handling at the end of all routes; the handler must take all four parameters
// http://expressjs.com/en/guide/error-handling.html
app.use(function (err, req, res, next) {
  res.send({
    code: 500,
    message: `System Error! Please send serial number ${req.x_request_id} to administer`
  });
  byLog.error(err, null, { req_id: req.x_request_id });
});

Request chain information

The bylog framework ships a middleware that does the following:

  • If the request header does not specify X-Request-Id, one is filled in automatically, in the format w{week}{day of week}_{uuid}.
  • By default, the full parameters and header information of each API request are recorded, and Express's req.x_request_id is bound to the request's X-Request-Id.
  • The X-Request-Id field is returned in the response header. (A sketch of such a middleware follows this list.)
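
A minimal sketch of what such a middleware could look like; the week-number calculation here is a rough approximation, and the real bylog middleware is a private package:

const express = require('express');
const { v4: uuidv4 } = require('uuid');

const app = express();

app.use(function (req, res, next) {
  let id = req.headers['x-request-id'];
  if (!id) {
    const now = new Date();
    // w{week of year}{day of week}_uuid, e.g. w224_9666a363-...
    const dayOfYear = Math.floor((now - new Date(now.getFullYear(), 0, 0)) / 86400000);
    id = `w${Math.ceil(dayOfYear / 7)}${now.getDay()}_${uuidv4()}`;
  }
  req.x_request_id = id;             // available to later handlers and loggers
  res.setHeader('X-Request-Id', id); // returned in the response header
  next();
});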

In Express, we also patched the res.send function to log the response body in JSON format, correlated with the request via req.x_request_id.

res.send has a gotcha: if you pass it a JSON object, the function does a type check, converts the object to a string, and then calls res.send again, so a naive patch would log the same response twice; a guard is needed, as in the sketch below.
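
A hedged sketch of that patch, with the guard against the recursive second call (not the actual bylog implementation):

app.use(function (req, res, next) {
  const originalSend = res.send;
  let logged = false;
  res.send = function (body) {
    // res.send(obj) stringifies the object and calls itself again;
    // log only on the first call to avoid duplicates
    if (!logged) {
      logged = true;
      byLog.info('response body', { body: body }, { req_id: req.x_request_id });
    }
    return originalSend.call(this, body);
  };
  next();
});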

Best practice: Pass the req.x_request_id value to error output to associate the error with a request.

byLog.fatal(new Error('fatal message here'), {key: 'keyValue'}, {req_id: 'xxx'});  

Enter the following statement in the search bar of the log panel to view all log information for a given request (mind the selected search time range):

req_id: 'w224_9666a363-af8b-4c8a-a5f4-312078763961'  

In the testing phase, after capturing the request, the tester only needs to hand the X-Request-Id value to the server developer. When an online fault occurs, the complete chain of information for that request is likewise available.

Custom monitoring logic

You can use Elasticsearch's log search to customize monitoring logic in Souche-Alert for key indicators.

Queue health monitoring, key business metrics, and so on: feel free to improvise. One possible monitor is sketched below.
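
As an example, a hedged sketch of a queue-health query: alert when the worker has written no log in the last five minutes. The worker name is hypothetical; name is the not_analyzed field mapped in the template above.

// Run through the same poller as the error check; alert when hits.total === 0
const QUEUE_CHECK = JSON.stringify({
  query: {
    bool: {
      must: [
        { term: { name: 'cheniu-queue-worker' } },      // hypothetical worker name
        { range: { '@timestamp': { gte: 'now-5m' } } }  // heartbeat window
      ]
    }
  },
  size: 0 // only the count matters
});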

Resources

  • ELKstack Chinese guide
  • Elastic’s product documentation

Final words

Log normalization and analysis monitoring is a long but meaningful process. Normalization is the foundation of analysis and monitoring; a mess cannot be made sense of. It takes constant patience and pushing, but the payoff comes: it is a pleasure to see clean log output and to find problems before users do.

Some of the toolkits are still only in internal use; after much discussion with Taro, they should soon be packaged into an open-source solution.

On log standardization and analysis monitoring, Souche still has a long road ahead, for example:

  • Cluster deployment of ELK
  • Rules for cleaning and silencing logs
  • Utilization of Kibana data visualization
  • More diversified alarm channels, such as wechat and SMS
  • Compatibility with multiple technology stacks, such as Node, Java, Ruby
  • …

May there be no P0 bugs.

PS: Today is my birthday 🎂