At Graphite, we previously used ELK to build a set of monitoring charts, but ran into problems such as:

  1. Kibana often hangs when querying logs
  2. Kibana's charts are not attractive or flexible enough

So we built a new monitoring system using StatsD + Grafana + InfluxDB.


Tool overview

StatsD






Grafana






InfluxDB





Start docker-statsd-influxdb-grafana

docker-statsd-influxdb-grafana














➜  Desktop docker-machine start
Starting "default"...
(default) Check network to re-create if needed...
(default) Waiting for an IP...
Machine "default" was started.
Waiting for SSH to be available...
Detecting the provisioner...
Started machines may have new IP addresses. You may need to re-run the 'docker-machine env' command.
➜  Desktop docker-machine env
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.100:2376"
export DOCKER_CERT_PATH="/Users/nswbmw/.docker/machine/machines/default"
export DOCKER_MACHINE_NAME="default"
# Run this command to configure your shell:
# eval $(docker-machine env)
➜  Desktop eval $(docker-machine env)
➜  Desktop docker ps








docker run -d \
  --name docker-statsd-influxdb-grafana \
  -p 3000:9000 \
  -p 8083:8083 \
  -p 8086:8086 \
  -p 22022:22 \
  -p 8125:8125/udp \
  samuelebistoletti/docker-statsd-influxdb-grafana:latest



Configure InfluxDB

Open http://<your IP>:8083




Pay attention to





Configure Grafana

Open http://<your IP>:3000


  1. Log in with root as both the user and the password
  2. Click the icon in the upper left corner -> Data Source -> Add Data Source to open the data source configuration page. Fill in the following information and click Save:




Pay attention to





Use node-statsd

node-statsd


'use strict';

const StatsD = require('node-statsd');

// host is the docker-machine IP, port is the StatsD UDP port mapped in docker run
const client = new StatsD({
  host: '192.168.99.100',
  port: 8125
});

// Send a random response time (0-99 ms) for the "api" timer once a second
setInterval(function () {
  const responseTime = Math.floor(Math.random() * 100);
  client.timing('api', responseTime, function (error, bytes) {
    if (error) {
      console.error(error);
    } else {
      console.log(`Successfully sent ${bytes} bytes, responseTime: ${responseTime}`);
    }
  });
}, 1000);
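
To make it clearer what a timing call puts on the wire, here is a minimal sketch that builds the same kind of packet by hand using the StatsD line protocol (`<name>:<value>|ms` for timers). This is only an illustration, not how node-statsd is implemented internally; `formatTiming` and `sendTiming` are hypothetical helper names, and the host and port you pass in are whatever your StatsD instance listens on.

```javascript
'use strict';

const dgram = require('dgram');

// Format one timer sample in the StatsD line protocol: <name>:<value>|ms
function formatTiming(name, ms) {
  return `${name}:${ms}|ms`;
}

// Send one timer sample over UDP. host and port are placeholders for
// your own StatsD address (e.g. the docker-machine IP and 8125).
function sendTiming(name, ms, host, port, callback) {
  const socket = dgram.createSocket('udp4');
  const payload = Buffer.from(formatTiming(name, ms));
  socket.send(payload, port, host, function (error, bytes) {
    socket.close();
    callback(error, bytes);
  });
}

console.log(formatTiming('api', 42)); // api:42|ms
```

Calling `sendTiming('api', 42, '192.168.99.100', 8125, cb)` should have roughly the same effect as `client.timing('api', 42, cb)` in the example above.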

Pay attention to











Create Grafana charts


  1. Click the icon in the upper left corner -> Dashboards -> +New to open the chart creation page
  2. Click the green block on the left -> Add Panel -> Graph to create a chart


Create an API request volume chart

  1. Go to General -> Title and change it to "API requests"
  2. Go to Metrics -> Add query, select "api.timer.count", and enter "TPS" in ALIAS BY
  3. Click save in the top left corner (or press Ctrl+S). I chose to display the last 5 minutes of data, refreshing every 5s


Create an API response time chart

  1. Click +ADD ROW -> the green block on the left -> Add Panel -> Graph to create a graph
  2. Go to General -> Title and change it to "API response time"
  3. Go to Metrics -> Add query, select "api.timer.mean", and enter "mean" in ALIAS BY
  4. Click Add query, select "api.timer.mean_90", and enter "mean_90" in ALIAS BY
  5. Click Add query, select "api.timer.upper_90", and enter "upper_90" in ALIAS BY










  1. mean: the average response time of all requests
  2. mean_90: the average response time of the remaining 90% of requests after discarding the slowest 10%
  3. upper_90: the maximum response time among the remaining 90% of requests after discarding the slowest 10%
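
To make the three values concrete, here is a small sketch that computes them for one batch of response times. It follows the definitions above and is not StatsD's own code; `timerStats` is a hypothetical helper and the sample values are made up.

```javascript
'use strict';

// Compute mean, mean_<pct> and upper_<pct> for one batch of response times.
// pct is the percentile to keep, e.g. 90 keeps the fastest 90% of samples.
function timerStats(values, pct) {
  if (values.length === 0) {
    return null;
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const count = sorted.length;
  // Number of samples left after discarding the slowest (100 - pct)%
  const keep = count - Math.round(((100 - pct) / 100) * count);
  const kept = sorted.slice(0, keep);
  const sum = (arr) => arr.reduce((total, v) => total + v, 0);
  return {
    mean: sum(sorted) / count,
    [`mean_${pct}`]: keep > 0 ? sum(kept) / keep : null,
    [`upper_${pct}`]: keep > 0 ? kept[keep - 1] : null
  };
}

// Nine ordinary requests and one very slow one
const stats = timerStats([10, 20, 30, 40, 50, 60, 70, 80, 90, 1000], 90);
console.log(stats); // { mean: 145, mean_90: 50, upper_90: 90 }
```

The single 1000 ms outlier drags mean up to 145, while mean_90 and upper_90 ignore it, which is why it is useful to plot all three curves on the same chart.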





  1. https://github.com/etsy/statsd/blob/master/docs/metric_types.md

  2. https://github.com/etsy/statsd/issues/157





Things to note

  1. The StatsD configuration in the container is at /opt/statsd/config.js, so if you change the InfluxDB database, username, or password, don't forget to update this configuration.
  2. You can run query statements on the InfluxDB web admin page. With node-statsd, client.timing('api') does not create a table named api; it creates tables such as api.timer.count. So the query select * from api returns no results. To see what was actually created, run this under the datasource: select * from /.*/
  3. With node-statsd, sending timing data is enough. A timer also produces count data, so an extra client.increment('api') is unnecessary.
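
If you would rather run that query from a script than from the web admin page, InfluxDB also accepts queries over HTTP on the 8086 port mapped earlier. The sketch below only builds the request URL. The database name `datasource` is an assumption on my part, `buildQueryUrl` is a hypothetical helper, and depending on your setup you may also need to pass credentials.

```javascript
'use strict';

// Build an InfluxDB HTTP query URL. host, port and db are assumptions;
// substitute the values for your own setup.
function buildQueryUrl(host, port, db, query) {
  return `http://${host}:${port}/query` +
    `?db=${encodeURIComponent(db)}` +
    `&q=${encodeURIComponent(query)}`;
}

// The regular expression matches every table created by StatsD
const url = buildQueryUrl('192.168.99.100', 8086, 'datasource', 'select * from /.*/');
console.log(url);
```

Opening the printed URL in a browser or with any HTTP client should return the query result as JSON.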





Use in Koa

lib/statsd.js


'use strict';

const StatsD = require('node-statsd');
const config = require('config');

module.exports = new StatsD({
  host: config.statsd.host,
  port: config.statsd.port
});

middlewares/statsd.js


'use strict';

const statsdClient = require('../lib/statsd');

module.exports = function () {
  return function *statsd(next) {
    const routerName = this.route ? this.route.handler.controller + '.' + this.route.handler.action : null;
    const start = Date.now();

    yield next;

    const spent = Date.now() - start;
    if (routerName) {
      statsdClient.timing(`api.${routerName}`, spent);
      statsdClient.timing('api', spent);
    }
  };
};

app.js


app.use(require('./middlewares/statsd')());

bay


const routerName = this.routerName;
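
The middleware above uses Koa 1 generator syntax. For readers on Koa 2, a rough equivalent with async/await might look like the sketch below. This is not what we run in production. The `client` argument is anything with a `timing(name, ms)` method (such as the node-statsd client), and `getRouteName` is a hypothetical function you supply, since how a route name is attached to the context depends on your router. The fake client and context at the bottom are only there so the sketch runs without Koa or StatsD.

```javascript
'use strict';

// Koa 2 style middleware factory.
// client: object with a timing(name, ms) method, e.g. a node-statsd client
// getRouteName: hypothetical function returning the route name for a ctx, or null
function statsd(client, getRouteName) {
  return async function statsdMiddleware(ctx, next) {
    const routeName = getRouteName(ctx);
    const start = Date.now();

    await next();

    const spent = Date.now() - start;
    if (routeName) {
      // One timer per route plus one overall "api" timer, as in the Koa 1 version
      client.timing(`api.${routeName}`, spent);
      client.timing('api', spent);
    }
  };
}

// Dry run with a fake client and a fake context
const recorded = [];
const fakeClient = { timing: (name, ms) => recorded.push({ name, ms }) };
const middleware = statsd(fakeClient, (ctx) => ctx.routerName || null);

const demo = middleware({ routerName: 'file.show' }, () => Promise.resolve())
  .then(() => recorded);

demo.then((items) => console.log(items.map((item) => item.name)));
```

With a real app this would be registered as `app.use(statsd(statsdClient, getRouteName))`.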

One-click data import

Our API has nearly a hundred endpoints, so creating and configuring a chart by hand for each one would be far too much work, especially since the configuration is almost identical every time. I went looking for shortcuts. Grafana has a Template feature, but I tried it and couldn't figure out how to use it. Grafana also has an Import feature, so I exported an already configured chart to JSON, copied, pasted, and modified it repeatedly, saved it, and imported it to see the result. Eventually it worked.

Note: in the exported JSON, rows holds the rows of the dashboard, and each row has a panels array that stores its graphs (the example dashboard below has two graphs in one row). Each graph has an id field that increments (e.g. 1, 2, 3...), and within targets the refId of each curve also increments (e.g. A, B, C...). Remember to fix these when copying, otherwise the charts will not display properly.
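
The generator script itself is not shown in this post, but the idea can be sketched roughly as follows. This is a cut-down illustration rather than the actual script: it emits only a handful of the fields Grafana exports, `buildPanel` and `buildDashboard` are hypothetical names, and the endpoint names are made up. The point is how id and refId are kept incrementing.

```javascript
'use strict';

// Build one graph panel. id must be unique within the dashboard;
// refId is assigned per curve as A, B, C, ...
function buildPanel(id, title, curves) {
  return {
    id,
    title,
    type: 'graph',
    datasource: 'api-influxdb',
    span: 6,
    targets: curves.map((curve, index) => ({
      refId: String.fromCharCode(65 + index),
      alias: curve.alias,
      dsType: 'influxdb',
      measurement: curve.measurement
    }))
  };
}

// Build a dashboard with one row per endpoint: a count graph and a timer graph
function buildDashboard(title, endpoints) {
  let nextId = 1;
  const rows = endpoints.map((endpoint) => {
    const prefix = `api.${endpoint}.timer`;
    return {
      title: 'Row',
      height: '250px',
      panels: [
        buildPanel(nextId++, `api.${endpoint}.count`, [
          { alias: 'tps', measurement: `${prefix}.count` }
        ]),
        buildPanel(nextId++, `api.${endpoint}.timer`, ['mean', 'mean_90', 'upper_90'].map(
          (name) => ({ alias: name, measurement: `${prefix}.${name}` })
        ))
      ]
    };
  });
  return { title, rows };
}

const dashboard = buildDashboard('API file', ['file.show', 'file.list']);
console.log(JSON.stringify(dashboard, null, 2));
```

A real script would fill in the remaining panel fields (legend, yaxes, groupBy, select and so on) by copying them from an exported dashboard.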


In the end I wrote a script that generates a JSON file for each endpoint. More than 30 endpoints produced more than 30 files, which meant running Import more than 30 times. Could it be simpler? While importing in the browser, I watched Grafana's network requests in the console and found that the import is done by calling:

POST https://xxx:3006/api/dashboards/import

The JSON data goes directly into the POST request body, so there is no need to generate files at all. In the end I put the generated configurations into an array and used co + co-request in a loop to call the endpoint above, which really is a one-click import.
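
The import loop can be sketched like this. Note that I use async/await here instead of co + co-request, and the function that actually sends the request is passed in as `post`, a hypothetical `(url, body) => Promise` wrapper around whatever HTTP client you use. Authentication and the exact body shape Grafana expects may differ between versions, so treat those as assumptions. The "API user" dashboard is a made-up second example.

```javascript
'use strict';

// Import dashboards one at a time by POSTing each JSON config to importUrl.
// post is an injected (url, body) => Promise function, so the HTTP client
// and authentication are left to the caller.
async function importDashboards(dashboards, importUrl, post) {
  const results = [];
  for (const dashboard of dashboards) {
    // One request per dashboard, sent sequentially
    results.push(await post(importUrl, JSON.stringify(dashboard)));
  }
  return results;
}

// Dry run with a fake sender that only records what would be posted
const sent = [];
const fakePost = (url, body) => {
  sent.push({ url, body });
  return Promise.resolve({ ok: true });
};

const dashboards = [
  { title: 'API file', rows: [] },
  { title: 'API user', rows: [] }
];

const imported = importDashboards(
  dashboards,
  'https://xxx:3006/api/dashboards/import',
  fakePost
);

imported.then((results) => console.log(`imported ${results.length} dashboards`));
```

Swapping `fakePost` for a real HTTP call is all that is needed to turn the dry run into an actual import.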

Here is a dashboard with its corresponding JSON configuration:

{
  "id": 32,
  "title": "API file",
  "tags": [],
  "style": "dark",
  "timezone": "browser",
  "editable": true,
  "hideControls": false,
  "sharedCrosshair": false,
  "rows": [
    {
      "collapse": false,
      "editable": true,
      "height": "250px",
      "panels": [
        {
          "aliasColors": {},
          "bars": false,
          "datasource": "api-influxdb",
          "editable": true,
          "error": false,
          "fill": 2,
          "grid": {
            "threshold1": null,
            "threshold1Color": "rgba(216, 200, 27, 0.27)",
            "threshold2": null,
            "threshold2Color": "rgba(234, 112, 112, 0.22)"
          },
          "id": 1,
          "isNew": true,
          "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false },
          "lines": true,
          "linewidth": 1,
          "links": [],
          "minSpan": 6,
          "nullPointMode": "connected",
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "span": 6,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "alias": "tps",
              "dsType": "influxdb",
              "groupBy": [
                { "params": [ "$interval" ], "type": "time" },
                { "params": [ "null" ], "type": "fill" }
              ],
              "measurement": "api.file.show.timer.count",
              "policy": "default",
              "refId": "A",
              "resultFormat": "time_series",
              "select": [
                [
                  { "params": [ "value" ], "type": "field" },
                  { "params": [], "type": "mean" }
                ]
              ],
              "tags": []
            }
          ],
          "timeFrom": null,
          "timeShift": null,
          "title": "api.file.show.count",
          "tooltip": { "msResolution": true, "shared": true, "sort": 0, "value_type": "cumulative" },
          "type": "graph",
          "xaxis": { "show": true },
          "yaxes": [
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true },
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }
          ]
        },
        {
          "aliasColors": {},
          "bars": false,
          "datasource": "api-influxdb",
          "editable": true,
          "error": false,
          "fill": 1,
          "grid": {
            "threshold1": null,
            "threshold1Color": "rgba(216, 200, 27, 0.27)",
            "threshold2": null,
            "threshold2Color": "rgba(234, 112, 112, 0.22)"
          },
          "id": 2,
          "isNew": true,
          "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "minSpan": 5,
          "nullPointMode": "connected",
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "span": 6,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "dsType": "influxdb",
              "groupBy": [
                { "params": [ "$interval" ], "type": "time" },
                { "params": [ "null" ], "type": "fill" }
              ],
              "measurement": "api.file.show.timer.mean",
              "policy": "default",
              "refId": "A",
              "resultFormat": "time_series",
              "select": [
                [
                  { "params": [ "value" ], "type": "field" },
                  { "params": [], "type": "mean" }
                ]
              ],
              "tags": [],
              "alias": "mean"
            },
            {
              "dsType": "influxdb",
              "groupBy": [
                { "params": [ "$interval" ], "type": "time" },
                { "params": [ "null" ], "type": "fill" }
              ],
              "measurement": "api.file.show.timer.mean_90",
              "policy": "default",
              "refId": "B",
              "resultFormat": "time_series",
              "select": [
                [
                  { "params": [ "value" ], "type": "field" },
                  { "params": [], "type": "mean" }
                ]
              ],
              "tags": [],
              "alias": "mean_90"
            },
            {
              "dsType": "influxdb",
              "groupBy": [
                { "params": [ "$interval" ], "type": "time" },
                { "params": [ "null" ], "type": "fill" }
              ],
              "measurement": "api.file.show.timer.upper_90",
              "policy": "default",
              "refId": "C",
              "resultFormat": "time_series",
              "select": [
                [
                  { "params": [ "value" ], "type": "field" },
                  { "params": [], "type": "mean" }
                ]
              ],
              "tags": [],
              "alias": "upper_90"
            }
          ],
          "timeFrom": null,
          "timeShift": null,
          "title": "api.file.show.timer",
          "tooltip": { "msResolution": true, "shared": true, "sort": 0, "value_type": "cumulative" },
          "type": "graph",
          "xaxis": { "show": true },
          "yaxes": [
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true },
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }
          ]
        }
      ],
      "title": "Row"
    }
  ],
  "time": { "from": "now-1h", "to": "now" },
  "timepicker": {
    "refresh_intervals": [ "5s", "10s", "30s", "1m", "5m", "15m", "30m", "1h", "2h", "1d" ],
    "time_options": [ "5m", "15m", "1h", "6h", "12h", "24h", "2d", "7d", "30d" ]
  },
  "templating": { "list": [] },
  "annotations": { "list": [] },
  "refresh": "5s",
  "schemaVersion": 12,
  "version": 2,
  "links": [],
  "gnetId": null
}

More in Grafana


  1. Overall average API response time
  2. TPS and average response time for each API endpoint
  3. CPU and memory usage (to be added later)





Finally

We are hiring!

[Beijing/Wuhan] Graphite Docs, building the most beautiful product: we are looking for China's most talented engineers to join us