There are few useful Node.js O&M (operations and maintenance) articles on the web. Going forward I'll publish more Node.js O&M content, so we can build a deeper understanding of server fundamentals and the basics of automated operations. Why do error log analysis at all? Because there are few online tools for it. I did find GoAccess, but it analyzes access logs and user traffic trends; after searching for a long time without finding what I wanted, I simply built my own with Node.

Error log Analysis

First we need to read the Nginx error log. One thing worth noting: Nginx error log lines all share a similar shape, because Nginx does not let you customize the error log format (you can only set the log error level), which makes them convenient to parse.
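For reference, a line typically looks roughly like this (the values below are illustrative, not taken from a real server); the comma-separated fields after the message are what the parsing code below splits on:

2019/05/10 15:12:01 [error] 28212#28212: *1001 connect() failed (111: Connection refused) while connecting to upstream, client: 114.112.163.28, server: api.example.com, request: "GET /v1/user HTTP/1.1", upstream: "http://127.0.0.1:3000/v1/user", host: "api.example.com", referrer: "https://example.com/page"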

readline

  • Reading a file line by line: for example, log analysis.
  • Autocompletion: for example, the "help init install" hints shown as you type npm.
  • Command-line tools: question-and-answer scaffolding tools like npm init. Here we mainly do log analysis; if the other uses interest you, a minimal question-and-answer sketch follows.
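As an aside, the question-and-answer style mentioned above takes only a few lines with readline. A minimal sketch (the prompt text is made up, not part of the log analysis):

const readline = require('readline');

const rl = readline.createInterface({ input: process.stdin, output: process.stdout });

// Ask one question, echo the answer, and exit
rl.question('package name: ', (name) => {
  console.log(`scaffolding ${name}...`);
  rl.close();
});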

Implementation method

const readline = require('readline');
const fs = require('fs');
const path = require('path');

console.time('readline-time');
const rl = readline.createInterface({
  input: fs.createReadStream(path.join(__dirname, '../public/api.err.log'), {
    start: 0,
    end: Infinity
  })
});
let count = 0;
rl.on('line', (line) => {
  const arr = line.split(', ');
  const time = arr[0].split('*')[0].split('[')[0].replace(/\//g, '-'); // Get the time
  const error = arr[0].split('*')[1].split(/\d\s/)[1]; // Error reason
  const client = arr[1].split(' ')[1]; // The requesting client
  const server = arr[2].split(' ')[1]; // The requested server name
  const url = arr[3].match(/\s\/(\S*)\s/)[0].trim(); // Get the request path
  const upstream = arr[4].match(/(?<=").*?(?=")/g)[0]; // Get the upstream
  const host = arr[5].match(/(?<=").*?(?=")/g)[0]; // Get the host
  const referrer = arr[6] ? arr[6].match(/(?<=").*?(?=")/g)[0] : ''; // Referrer, if present
  console.log(`time: ${time} - reason: ${error} - client: ${client} - server: ${server} - url: ${url} - upstream: ${upstream} - host: ${host} - referrer: ${referrer}`);
  count++;
});
rl.on('close', () => {
  let size = fs.statSync(path.join(__dirname, '../public/api.err.log')).size;
  console.log(`Read finished: ${count}; file position even: ${size % 2 === 0}`);
  console.timeEnd('readline-time');
});

A couple of things to note about this code. It creates a readable file stream; for the demo I point it at a local path, but in production you would fill in the path of the error log on your server. If you don't rotate your Nginx error log, it accumulates a large number of lines every day. Having createReadStream read tens of megabytes of text is no big deal, but if we read hundreds of megabytes or gigabytes of logs every time, it causes real performance problems. So there is no need to start reading from byte 0 on every run: createReadStream provides start and end options, and we can record how far we've read and resume from there.

let size = fs.statSync(path.join(__dirname, '../public/api.err.log')).size;

We can compare reading from byte 0 every time with reading from a recorded offset; the sketch below shows the resume-from-offset side.
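A minimal sketch of resuming, assuming the offset was persisted somewhere after the previous run (savedOffset and the log path here are illustrative):

const fs = require('fs');
const path = require('path');
const readline = require('readline');

const logPath = path.join(__dirname, '../public/api.err.log');
const savedOffset = 1024; // e.g. loaded from the database after the last run

const rl = readline.createInterface({
  // start skips the bytes already processed, so only new lines are parsed
  input: fs.createReadStream(logPath, { start: savedOffset })
});
rl.on('line', (line) => {
  // only lines appended since the last run arrive here
});
rl.on('close', () => {
  // persist fs.statSync(logPath).size as the next run's savedOffset
});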

Save the data for analysis

I use the node-schedule library to run the log collection as a scheduled task (it takes cron-style rules), and MongoDB to store the parsed error logs.
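As a sketch of the scheduling half (the cron rule and the analyzeErrorLog function are illustrative, not from the original code):

const schedule = require('node-schedule');

// Run the log collection once a day at 00:00; adjust the rule to taste
schedule.scheduleJob('0 0 * * *', async () => {
  await analyzeErrorLog(); // hypothetical wrapper around the readline parsing above
});

With that in place, the close handler below batches the parsed records into MongoDB: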

  rl.on('close', async () => {
    let count = 0;
    for (let i of rlist) {
      count++;
      if (count % 500 === 0) {
        // Batch insert every 500 records
        const res = await global.db.collection('logs')
          .bulkWrite(rlist.slice(count - 500, count), { ordered: false, w: 1 })
          .catch(err => { console.error(`Batch insert error: ${err}`); });
      } else if (count === rlist.length) {
        // Batch insert the remaining records
        const res = await global.db.collection('logs')
          .bulkWrite(rlist.slice(rlist.length - (rlist.length % 500), rlist.length), { ordered: false, w: 1 });
        let size = fs.statSync(addres).size;
        size = size % 2 === 0 ? size : size + 1; // Ensure the byte size is even, or the next read's content will be incomplete
        count = 0;
        rlist.length = 0;
        // Update the recorded size of the file in the database
        global.db.collection('tasks').updateOne(
          { _id: addres },
          { $set: { _id: addres, size, date: +new Date() } },
          { upsert: true }
        );
      }
    }
    resolve(true);
  });
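The manual slicing above is easy to get off by one; if you prefer, a small generic helper (my own sketch, not part of the original code) expresses the same batching:

// Split an array into fixed-size chunks so each bulkWrite stays small
function chunk(arr, size) {
  const out = [];
  for (let i = 0; i < arr.length; i += size) {
    out.push(arr.slice(i, i + size));
  }
  return out;
}

// usage sketch:
// for (const batch of chunk(rlist, 500)) {
//   await global.db.collection('logs').bulkWrite(batch, { ordered: false, w: 1 });
// }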

Because I'm using batch inserts, and MongoDB caps how much data a single insert can carry (up to 16 megabytes), it's up to you to decide how many records to insert per batch. Curious about how readline is implemented, I went and read the source code and found it's not as complex as you might think. Below is the source behind the line event; students who want to go deeper can read the full readline source.


  if (typeof s === 'string' && s) {
    var lines = s.split(/\r\n|\n|\r/);
    for (var i = 0, len = lines.length; i < len; i++) {
      if (i > 0) {
        this._line();
      }
      this._insertString(lines[i]);
    }
  }

  ...

  Interface.prototype._line = function() {
    const line = this._addHistory();
    this.clearLine();
    this._onLine(line);
  };

  ...

  Interface.prototype._onLine = function(line) {
    if (this._questionCallback) {
      var cb = this._questionCallback;
      this._questionCallback = null;
      this.setPrompt(this._oldPrompt);
      cb(line);
    } else {
      this.emit('line', line);
    }
  };

The saved data then needs to be analyzed: for example, which IPs hit the most errors, and which errors occur most often. You can use MongoDB's aggregation pipeline for this. Here is an example that analyzes, for a given day, which error causes a given client IP ran into most.

db.logs.aggregate(
  // Pipeline
  [
    // Stage 1
    {
      $group: {
        _id: { client: '114.112.163.28', server: '$server', error: '$error', url: '$url', upstream: '$upstream', date: '$date', msg: '$msg' },
        date: { $addToSet: '$date' },
        count: { $sum: 1 }
      }
    },
    // Stage 2
    {
      $match: {
        count: { $gte: 1 },
        date: ['2019-05-10']
      }
    },
    {
      $sort: {
        count: -1
      }
    }
  ],
  // Options
  {
    cursor: {
      batchSize: 50
    },
    allowDiskUse: true
  }
);

I learned a lot through building this log analysis. You're welcome to reach out and discuss; if you have questions, leave a comment below.