
This article was published by J2X in the Cloud + Community column

We have all had the experience of logging into machines one by one to check logs. As the number of machines in a cluster grows, this primitive, inefficient approach not only makes it much harder to locate problems online, it also leaves us unable to quantitatively diagnose the metrics of our service framework, let alone make targeted optimizations and improvements. At this point, building a real-time log monitoring system with search, service diagnosis, and data analysis capabilities becomes particularly important.

ELK (ELK Stack: ElasticSearch, Logstash, Kibana, Beats) is a mature logging solution. Being open source and high performance, it is widely used by major companies. So how does the service framework used by our business connect to the ELK system?

Business background

Our business framework background:

  • The business framework is a web server based on Node.js
  • The service uses the Winston logging module to write logs to local files
  • The logs generated by the services are stored on disks on their respective machines
  • The service is deployed on multiple machines in different locations

Integration steps

We can summarize the whole integration into the following steps:

  • Log structure design: Change from traditional plain text log to structured object and output as JSON.
  • Log collection: Outputs logs at key points in the framework request lifecycle
  • ES index template definition: define the mapping from the JSON fields to how they are actually stored in ES

1. Log structure design

Traditionally, when writing a log, we directly output the log level and a log message string. However, we care not only about when and what happened, but also about how many times similar logs occur, the details and context of each log, and related logs. So we do not simply structure our log as an object; we also extract the key fields of the log.

1. Abstract logs as events

We treat each log occurrence as an event. Events include:

Event meta fields

  • Time of the event: datetime, timestamp
  • Event level: level, e.g. ERROR, INFO, WARNING, DEBUG
  • Event name: event, e.g. client-request
  • Relative time of the event (in nanoseconds): reqLife, the interval between the start of the request and the moment the event occurred
  • Location of the event: line, the code location; server, the server where the event occurred

Request meta fields

  • Unique request ID: reqId, this field runs through all events that occur along the request chain
  • Request user ID: reqUid, the ID of the user, which lets us trace that user's access or request chain

Data fields

Different types of events need to output different details. We put these details (the non-meta fields) into d, short for data. This makes the event structure clearer and prevents data fields from polluting the meta fields.

For example, a client-init event is printed every time the server receives a user request. Details that are unique to this event, such as the user's IP address and the requested URL, go into the d object.

Here is a complete example:

{
    "datetime": "2018-11-07 21:38:09.271",
    "timestamp": 1541597889271,
    "level": "INFO",
    "event": "client-init",
    "reqId": "rJtT5we6Q",
    "reqLife": 5874,
    "reqUid": "999793fc03eda86",
    "d": {
        "url": "/",
        "ip": "9.9.9.9",
        "httpVersion": "1.1",
        "method": "GET",
        "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
        "headers": "*"
    },
    "browser": "{\"name\":\"Chrome\",\"version\":\"70.0.3538.77\",\"major\":\"70\"}",
    "engine": "{\"version\":\"537.36\",\"name\":\"WebKit\"}",
    "os": "{\"name\":\"Mac OS\",\"version\":\"10.14.0\"}",
    "content": "(Empty)",
    "line": "middlewares/foo.js:14",
    "server": "127.0.0.1"
}

We want the log to be as flat as possible (a maximum depth of 2) to avoid the performance penalty of unnecessary indexing in ES. So in the actual output, values nested deeper than one level are serialized as strings. Some object fields are still of interest on their own (such as browser, engine, and os in the example above), so we lift these special fields to the outer layer while keeping the output depth at no more than 2.
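As a minimal sketch of this flattening step (the helper name flattenEvent and the KEEP_AS_OBJECT list are illustrative, not the framework's actual code):

// A minimal flattening sketch: fields listed in KEEP_AS_OBJECT (such as d)
// stay as objects at the outer layer; every other object value is serialized
// to a string so the final depth of the output stays at 2 or less.
const KEEP_AS_OBJECT = ['d'];

function flattenEvent(event) {
    const out = {};
    for (const [key, value] of Object.entries(event)) {
        const isObject = value !== null && typeof value === 'object';
        out[key] = isObject && !KEEP_AS_OBJECT.includes(key)
            ? JSON.stringify(value)   // e.g. browser, engine, os in the example above
            : value;
    }
    return out;
}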

In general, when printing a log we only need to focus on the event name and the data fields. The other fields can be obtained, computed, and output uniformly inside the logging method by accessing the request context.

2. Modify log output

Earlier we described how to define a log event. So how do we upgrade the existing logging scheme while staying compatible with the logging calls in old code?

Upgrading logs at key nodes

// before the change
logger.info('client-init => ' + JSON.stringify({
    url,
    ip,
    browser,
    // ...
}));

// after the change
logger.info({
    event: 'client-init',
    url,
    ip,
    browser,
    // ...
});

Staying compatible with old logging calls

logger.debug('checkLogin');

Since Winston's logging methods already support passing in either a string or an object, for calls like this the formatter actually receives {level: 'debug', message: 'checkLogin'}. The formatter is the step in which Winston formats a log before it is written out, and it gives us the opportunity to convert logs from such calls into proper output events (we call them raw-log events) before they are written, without changing the calling code.

Change the log output format

As mentioned earlier, Winston passes logs through our predefined formatter before writing them out, so besides the compatibility handling we can also put some common processing logic there. At the call site we then only need to focus on the fields themselves.

  • Meta field extraction and processing
  • Field length control
  • Compatibility handling

Extracting meta fields involves creating and using a per-request context. Here is a brief introduction to creating and using such a context with domain.

//--- middlewares/http-context.js
const domain = require('domain');
const shortid = require('shortid');

module.exports = (req, res, next) => {
    const d = domain.create();
    d.id = shortid.generate(); // reqId
    d.req = req;

    // ...

    res.on('finish', () => process.nextTick(() => {
        d.id = null;
        d.req = null;
        d.exit();
    }));

    d.run(() => next());
};

//--- app.js
app.use(require('./middlewares/http-context.js'));

//--- formatter.js
if (process.domain) {
    reqId = process.domain.id;
}

In this way, every event output during a request carries the same reqId, which lets us associate all the events of that request.
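As a rough illustration of how the formatter can tie these pieces together, here is a minimal sketch written against winston 3.x; the structure is simplified, eventFormat is an illustrative name, and this is not the framework's actual formatter.

//--- logger.js (a minimal sketch, not the framework's actual formatter)
const winston = require('winston');

const eventFormat = winston.format((info) => {
    // Compatibility handling: old calls such as logger.debug('checkLogin')
    // arrive with a plain string message; turn them into raw-log events.
    // New-style calls like logger.info({ event: 'client-init', ... })
    // already put their fields directly on info.
    if (typeof info.message === 'string' && !info.event) {
        info.event = 'raw-log';
        info.content = info.message;
    }

    // Meta fields are computed uniformly here rather than at every call site.
    info.timestamp = Date.now();
    info.datetime = new Date(info.timestamp).toISOString();

    // Request meta fields come from the domain created by the middleware above;
    // reqLife and reqUid could be derived from the request stored on the domain.
    if (process.domain) {
        info.reqId = process.domain.id;
    }

    return info;
});

const logger = winston.createLogger({
    format: winston.format.combine(eventFormat(), winston.format.json()),
    transports: [new winston.transports.File({ filename: 'app.log' })],
});

module.exports = logger;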

2. Log collection

Now that we know how to output an event, we should consider two questions next:

  1. Where do we output events?
  2. What details does the event output?

In other words, in the whole request link, which nodes are we concerned about and which nodes can be used to quickly locate the problem? In addition, what other nodes can we use for statistical analysis?

Consider a typical request chain: the user sends a request; the server receives it; the server requests downstream servers or databases (possibly multiple times); the data is aggregated and rendered; and the server responds.

So, we can define our events like this:

User request

  • client-init: printed when the framework receives a request (before parsing), including: request URL, request headers, HTTP version and method, user IP, and browser
  • client-request: printed when the framework receives a request (after parsing), including: request URL, request headers, cookies, and request body
  • client-response: printed when the framework returns the response, including: request URL, response code, response headers, and response body

Downstream dependencies

  • http-start: printed when a downstream request starts: request URL, request body, module alias (easier to aggregate on by name than by domain)
  • http-success: printed when the request returns 200: request URL, request body, response body (code & msg & data), elapsed time
  • http-error: printed when the request returns a non-200 status, i.e. the request to the downstream server failed: request URL, request body, response body (code & message & stack), elapsed time
  • http-timeout: printed when the request times out: request URL, request body, response body (code & msg & stack), elapsed time

With so many fields, how do we choose? In a nutshell, the principle for event output fields is: output the fields you care about, the fields that are easy to retrieve, and the fields that are easy to aggregate on later.

Some advice

  1. The downstream request body and response body have fixed formats, e.g. the request is { action: 'getUserInfo', payload: {} } and the response is { code: 0, msg: '', data: {} }. We can output action, code, and so on in the event, so that we can later retrieve metrics and aggregations for a specific interface of a specific module via action.
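For illustration, here is a rough sketch of a downstream call wrapper that emits these events with action, code, and elapsed time. The function name callDownstream is hypothetical, axios is used only as an example HTTP client, and logger refers to the Winston logger sketched in the previous section.

// Illustrative sketch only: wrap a downstream request so that http-start /
// http-success / http-error / http-timeout events carry action, code and time.
const axios = require('axios');
const logger = require('./logger'); // the Winston logger sketched earlier

async function callDownstream(moduleAlias, url, body) {
    const start = Date.now();
    logger.info({ event: 'http-start', module: moduleAlias, url, action: body.action, body });

    try {
        const { data } = await axios.post(url, body, { timeout: 3000 });
        logger.info({
            event: 'http-success', module: moduleAlias, url,
            action: body.action, code: data.code, msg: data.msg,
            cost: Date.now() - start,
        });
        return data;
    } catch (err) {
        // axios reports timeouts with err.code === 'ECONNABORTED'
        const event = err.code === 'ECONNABORTED' ? 'http-timeout' : 'http-error';
        logger.error({
            event, module: moduleAlias, url, action: body.action,
            message: err.message, stack: err.stack,
            cost: Date.now() - start,
        });
        throw err;
    }
}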

Some principles

  1. Keep output field types consistent. Because all events are stored in the same ES index, a given field should have the same type everywhere, whether in the same event or across different events. For example, code should not sometimes be a number and sometimes a string; otherwise the field types conflict and some documents cannot be retrieved by the conflicting field.
  2. Control field length. A field whose ES storage type is keyword cannot exceed the ignore_above length set in the ES mapping (4096 bytes by default); otherwise it may likewise become unsearchable.
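For reference, ignore_above is set per field in the mapping, roughly like the fragment below (content is just an example field name, not a required part of the schema):

"content": {
    "type": "keyword",
    "ignore_above": 4096
}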

3. ES index template definition

Here we introduce two concepts of ES, Mapping and Template.

First of all, the basic storage types of ES are roughly enumerated as follows

  • String: keyword & text
  • Numeric: long, integer, double
  • Date: date
  • Boolean: boolean

In general, we do not need to explicitly specify the storage type of every event field in ES; ES automatically determines a field's storage type in the index based on the first value it sees for that field in a document. But sometimes we do need to explicitly specify the storage type of certain fields, and in that case we define the index Mapping to tell ES how those fields should be stored and indexed.

e.g.

Remember the timestamp field among the event meta fields? When we output timestamp, the value is a number representing the number of milliseconds since 1970/01/01 00:00:00. We expect its storage type in ES to be date so that it is easy to retrieve and visualize later, so we specify our Mapping when creating the index.

PUT my_logs
{
  "mappings": {
    "_doc": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "epoch_millis"
        }
      }
    }
  }
}

In practice, though, we usually generate log indices automatically by date. Assuming our index names follow the format my_logs_yyyyMMdd (e.g. my_logs_20181030), we need to define a Template so that the preset Mapping is applied automatically whenever such an index is created.

PUT _template/my_logs_template
{
  "index_patterns": "my_logs*",
  "mappings": {
    "_doc": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "epoch_millis"
        }
      }
    }
  }
}

Tip: keeping logs from all dates in a single index not only incurs unnecessary performance overhead, but also makes it hard to delete older logs periodically.
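With one index per day, cleaning up old logs is then just a matter of deleting the corresponding indices with the standard delete index API, for example:

DELETE my_logs_20181001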

Summary

Now that the log transformation and integration preparations are complete, all that remains is to install Filebeat, a lightweight agent that ships logs from log files to ELK. After that, we can use Kibana to quickly search our logs.
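A minimal filebeat.yml sketch for shipping these JSON log files is shown below (written against Filebeat 6.x; the log path, ES hosts, and index name are placeholders to adapt to your environment):

filebeat.inputs:
  - type: log
    paths:
      - /data/logs/*.log           # placeholder: wherever Winston writes the JSON logs
    json.keys_under_root: true     # lift the JSON fields to the top level of the event
    json.add_error_key: true

output.elasticsearch:
  hosts: ["http://your-es-host:9200"]    # placeholder
  index: "my_logs_%{+yyyyMMdd}"          # matches the my_logs_yyyyMMdd convention above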
