Contents

  • Meanings of common metrics for numeric types

    • The Total pitfall
    • Margin of error
  • Template variables

    • Template variable syntax
    • Creating a template variable
    • Hidden uses of template variables
  • Grafana panel editor

    • Metrics
    • Legend
    • Display
  • Advanced functions

    • Sum the matching curves of a single query: Combine -> sumSeries
    • Transform -> timeShift
    • Remove outliers: Filter -> removeAboveValue
    • Rename functions
    • Special -> groupByNode
    • Calculate a success rate across multiple queries: Calculate -> asPercent
  • Other

    • Alerting
    • StatsD metric path limits
    • Querying Grafana data from the back end
    • Anonymous mode
  • Conclusion

Grafana is an open source time-series analytics and monitoring platform that supports many data sources, such as Elasticsearch, Graphite, and InfluxDB, and is known for its powerful interface editor. Introducing Grafana into our front-end monitoring has drawn good feedback, but many users have not worked with Grafana before and often ask how to use it. We hope this article helps you get started.

Grafana has three permission levels: a Viewer can only view existing panels and cannot edit them; an Editor can edit panels; an Admin has all permissions, including adding data sources, plug-ins, and API keys.

Viewer permissions are enough for the average user, so the rest of this article focuses on Editor permissions. Because space is limited, the examples use Graphite as the data source and cover only the configuration of the Graph panel, which is the most commonly used.

Meanings of common metrics for numeric types

  • count_ps

    • Number per second
  • count

    • The number per ten seconds
  • mean_90

    • The average after removing the highest 10% of the data
  • upper_90

    • The highest value after removing the highest 10% of data
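To make these definitions concrete, here is a minimal sketch of how the four metrics could be derived from the raw samples of one flush interval. The function name `timer_stats`, the sample values, and the ten-second interval are assumptions for illustration; the rounding rule is a simplification and is not taken from the StatsD source code.

```python
def timer_stats(samples, flush_interval_s=10, percentile=90):
    """Summarise one flush interval of samples (assumes at least one sample)."""
    values = sorted(samples)
    count = len(values)
    # Number of samples kept after discarding the highest (100 - percentile)%.
    kept = count - round(count * (100 - percentile) / 100)
    kept_values = values[:kept]
    return {
        "count": count,                               # samples in this interval
        "count_ps": count / flush_interval_s,         # samples per second
        f"mean_{percentile}": sum(kept_values) / kept,
        f"upper_{percentile}": kept_values[-1],       # largest value that was kept
    }


# Hypothetical response times in milliseconds, with one outlier (950).
samples = [12, 15, 11, 14, 13, 16, 10, 18, 17, 950]
stats = timer_stats(samples)
print(stats)
```

With these samples the outlier is discarded, so mean_90 and upper_90 describe the typical requests rather than the single slow one.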

The Total pitfall

count_ps is commonly used to chart the number of requests per second (RPS). Note that in this case the total number of requests over a period must be calculated by multiplying the average (Avg) of count_ps by the number of seconds in that period, rather than read directly from Total in the interface.

This is because a curve can only display a limited number of points, and Grafana decides how many points to return based on the width of your window. There is no way to show one point per second on screen for a range such as a day: that is 86,400 points, which would not fit even on an ultrawide monitor. Points that cannot be shown individually are merged, by default by averaging them into the returned point, as shown below:

The time range in the figure above is one day. The upper half is a curve panel and the lower half is a pie chart panel. The curve in the upper half is of the count type (aggregated once every ten seconds), and its average (Avg) is roughly 682 to 683. The total should therefore be about 682 × 6 × 60 × 24 (six ten-second intervals per minute, 60 minutes per hour, 24 hours per day), which is about 5.89 million and close to the 5.82 million shown in the lower half. The Total of 1.17 million shown on the curve is therefore completely misleading and can be ignored.
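The arithmetic above can be sketched in a few lines. This is only an illustration under stated assumptions: the traffic is a flat 682 per ten seconds, and the bucket size of 5 is a made-up stand-in for however many points Grafana merges at a given window width. The helper names are hypothetical.

```python
def consolidate_by_average(values, bucket_size):
    """Shrink a series by replacing each bucket of points with its average."""
    buckets = [values[i:i + bucket_size] for i in range(0, len(values), bucket_size)]
    return [sum(bucket) / len(bucket) for bucket in buckets]


def estimate_total(avg_per_interval, interval_s, period_s):
    """Total = average per interval x number of intervals in the period."""
    return avg_per_interval * (period_s / interval_s)


raw = [682] * 8640                       # one count value per 10 s for one day
shown = consolidate_by_average(raw, 5)   # what the panel might actually draw
naive_total = sum(shown)                 # what a legend "Total" would add up
estimate = estimate_total(sum(shown) / len(shown), interval_s=10, period_s=86400)

print(len(shown), naive_total, estimate, sum(raw))
```

The sum of the displayed points depends on how many points happen to be drawn, while the average multiplied by the number of intervals recovers the real total.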

Margin of error

There is still a small gap between the 5.89 million calculated above and the 5.82 million shown in the interface. This is acceptable, because StatsD generally sends data over UDP (it also supports TCP), so some data points can be lost. If you need exact figures, store the relevant data in a database; querying the database afterwards is completely reliable.

Template variables

Template variables let you control the queries in a panel dynamically, which is an important feature. They usually appear in the upper left corner of the panel, as shown below:

Template variable syntax

Template variables can be written as $name or [[name]]. With a Graphite data source the former is mostly used, for example stats.timers.fe.test.$key.count_ps
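As a rough mental model of what happens to such a query, the sketch below substitutes both spellings with a regular expression. It is not Grafana's implementation; the function `interpolate` and the value `login` are hypothetical.

```python
import re

# Matches either $name or [[name]].
PLACEHOLDER = re.compile(r"\$(\w+)|\[\[(\w+)\]\]")


def interpolate(query, variables):
    """Replace $name and [[name]] placeholders with values from `variables`."""
    def substitute(match):
        name = match.group(1) or match.group(2)
        # Unknown placeholders are left untouched.
        return str(variables.get(name, match.group(0)))

    return PLACEHOLDER.sub(substitute, query)


print(interpolate("stats.timers.fe.test.$key.count_ps", {"key": "login"}))
print(interpolate("stats.timers.fe.test.[[key]].count_ps", {"key": "login"}))
```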

Creating a template variable

In the Grafana interface, click the gear button -> Templating -> New, and the following interface appears:

This section describes how to write the Query type.

  • Name

    • The name of the variable. Special characters such as $ are not supported
  • Refresh

    • The options are Never, On Dashboard Load, and On Time Range Change
    • Select On Time Range Change if the values of the variable change frequently; otherwise select On Dashboard Load. Do not set a Query-type variable to Never, or its values will be refreshed only when you open the variable for editing
  • Query

    • The query statement, such as stats.timers.fe.test.*
    • Grafana does not send the request while you type; click outside the input box, and the values returned by the query are displayed below

A Query can itself contain template variables, such as stats.timers.fe.test.$key.*, and its values are refreshed automatically whenever $key changes. Chaining template variables in this way can significantly reduce the time a Grafana query takes.
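The sketch below imitates what such a query yields: the distinct node names found at the position of the trailing wildcard. It works on a hard-coded list of metric paths instead of a real Graphite server, and the function `variable_values` and the paths are made up for illustration.

```python
def variable_values(query, metric_paths):
    """Return the distinct node names at the last position of `query`."""
    pattern = query.split(".")
    values = set()
    for path in metric_paths:
        nodes = path.split(".")
        if len(nodes) < len(pattern):
            continue
        # "*" matches any single node; other nodes must match exactly.
        if all(p == "*" or p == n for p, n in zip(pattern, nodes)):
            values.add(nodes[len(pattern) - 1])
    return sorted(values)


paths = [
    "stats.timers.fe.test.login.count",
    "stats.timers.fe.test.login.mean_90",
    "stats.timers.fe.test.signup.count",
]

keys = variable_values("stats.timers.fe.test.*", paths)            # values for $key
metrics = variable_values("stats.timers.fe.test.login.*", paths)   # after $key = login
print(keys, metrics)
```

The second call shows the chaining: once $key is fixed, the dependent variable only lists what exists under that key.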

Hidden uses of template variables

Template variables can also be used in links that jump out of Grafana, a hidden trick that is not even mentioned in the documentation. Insert $name anywhere in the URL of a Link or Dashboard link, and Grafana replaces the variable when the user clicks it, so the jump lands on the correct address. This makes it possible to integrate with other systems for a good user experience, for example jumping to Kibana to query logs.

Kibana and Grafana have different time range formats, which can be fixed using the Chrome plugin in this article.

Conversely, adding var-${name} to the query string of a Grafana link sets the template variable ${name}. Together, the two techniques make it easy to jump from third-party systems to the correct Grafana panel.
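A third-party system could build such a link as sketched below. The host name, dashboard path, and variable value are placeholders, and `dashboard_url` is a hypothetical helper; only the var- prefix and the from/to time parameters are Grafana conventions.

```python
from urllib.parse import urlencode


def dashboard_url(base_url, variables, time_from=None, time_to=None):
    """Build a dashboard link that presets template variables via var-<name>."""
    params = {f"var-{name}": value for name, value in variables.items()}
    if time_from is not None:
        params["from"] = time_from
    if time_to is not None:
        params["to"] = time_to
    return f"{base_url}?{urlencode(params)}"


url = dashboard_url(
    "http://grafana.example.com/dashboard/db/fe-test",  # placeholder address
    {"key": "login"},
    time_from="now-6h",
    time_to="now",
)
print(url)
```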

Grafana panel editor

Open any panel with an account that has Editor permission, click a chart, and then click the Edit button in the popup to enter the chart editor. This article covers only the important parts of the chart configuration: Metrics, Legend, and Display.

Metrics

  • Edit mode

    • The arrow above points to toggle editor mode, which controls the editing mode. When it is off, you type the query statement by hand; when it is on, you can add, delete, and change parts of the query interactively, as shown in the figure above.
  • The data source

    • Panel data source: make sure you choose the right one, otherwise you will not find the paths you expect, and mock data may show up and confuse you.

In the interactive edit mode, click any box in the image above and Grafana automatically loads the values available at that position from the data source. You can also select template variables here for dynamic control.

Clicking the plus sign at the end brings up the functions of the data source, which provide more advanced features. They are the focus of the second half of this article. Graphite offers many functions; other data sources offer fewer.

Legend

Legend controls how the name and values of each curve are displayed

  • As Table

    • Whether to display the legend as a table
  • To the right

    • Whether the legend is placed to the right of the chart instead of below it
  • Width

    • Left blank by default, in which case the width scales automatically; otherwise the given width is forced
  • Min

    • The minimum value in the panel time range
  • Avg

    • The average value over the panel time range
  • Total

    • The sum of the values in the panel time range. As described above, it is very misleading when you want the total count of a numeric type
  • Max

    • The maximum value in the panel time range
  • Current

    • The latest value in the panel time range

Display

Display controls how points and lines are drawn on the graph. The important parameters are:

  • Draw Modes -> Lines

    • Whether to draw line segments between points
  • Draw Modes -> Points

    • Whether to draw the points themselves
  • Hover info -> Mode

    • How values are shown in the hover tooltip. The options are All series (show the values of all curves at that point in time) and Single (show only the curve the mouse is pointing at)
  • Hover info -> Sort Order

    • The order of the curves in the hover tooltip; Decreasing is usually chosen
  • Stacking & Null value -> Null value

    • This setting is important and should be chosen according to the density of the points. If points are sparse, connecting them can give the false impression that there are data points in between.
    • Select connected when there are many points
    • Select null when there are few points

Advanced functions

Take Graphite as an example. A KEY in a metric path supports only upper and lower case letters, digits, hyphens, and underscores, so front-end paths (which often contain # and :path) cannot be stored as they are. We have to translate them in advance, for example # into ANCHOR, :path to path, and / to -. The template variable then shows an odd-looking front-end path, but fortunately there are functions that can translate it back in the interface.

Click the plus sign in the Metrics panel, add the aliasSub function, and fill in the three replacement rules above. The normal path then appears, as shown in the following figure:
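The effect of chaining such replacements can be imitated with Python's re.sub, as in the sketch below. Only two of the rules are shown, because the text does not spell out the exact form of the :path rule; the rule list, the function `restore_path`, and the sample name are assumptions for illustration.

```python
import re

# Hypothetical (search pattern, replacement) pairs, applied in order,
# the way nested aliasSub calls would be.
RULES = [
    (r"ANCHOR", "#"),
    (r"-", "/"),
]


def restore_path(name, rules=RULES):
    """Turn a stored metric name back into a readable front-end path."""
    for search, replace in rules:
        name = re.sub(search, replace, name)
    return name


print(restore_path("stats.timers.fe.test.ANCHOR-user-profile.count"))
```

Note that the second rule turns every hyphen into a slash, so this only works when genuine hyphens never appear in the stored names.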

aliasSub is just one of the simple alias functions for handling curve names. Many more functions handle aggregation within a single query, aggregation across multiple curves, display of different time ranges, calculation, and filtering. This section describes some of the most commonly used ones.

Sum the matching curves of a single query: Combine -> sumSeries

Suppose stats_count.fe.test.* has dozens of matches, so the query draws dozens of curves on the graph. How do you get the total of all the curves? Use sumSeries(stats_count.fe.test.*).
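Conceptually, the function adds the matched series point by point. The sketch below shows that idea on plain lists, with None standing for a missing point; it is an illustration, not Graphite's code, and `sum_series` is a hypothetical name.

```python
def sum_series(series_list):
    """Add several equally long series point by point."""
    summed = []
    for points in zip(*series_list):
        present = [p for p in points if p is not None]
        # A point stays missing only if every series is missing it.
        summed.append(sum(present) if present else None)
    return summed


curves = [
    [1, 2, None],
    [10, None, None],
    [100, 200, None],
]
print(sum_series(curves))
```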

Transform -> timeShift

Want to show the previous day's curve alongside the current time range? Use timeShift(Query, '1d').
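The idea behind the shift is simply to move older points onto the current time axis so both curves can be overlaid. A minimal sketch, with made-up timestamps and a hypothetical `time_shift` helper:

```python
from datetime import datetime, timedelta


def time_shift(points, shift=timedelta(days=1)):
    """Move (timestamp, value) points forward by `shift`."""
    return [(ts + shift, value) for ts, value in points]


# Hypothetical points from the previous day.
yesterday = [
    (datetime(2017, 5, 1, 9, 0), 120),
    (datetime(2017, 5, 1, 9, 1), 135),
]
overlay = time_shift(yesterday)
print(overlay)
```

After the shift, yesterday's 09:00 value is drawn at today's 09:00, next to the current curve.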

Remove outliers: Filter -> removeAboveValue

If a numeric type contains abnormal values, such as samples of millions of seconds when the average is 1 second, you can filter them out in the interface with one of the filter functions instead of modifying the instrumentation code, for example removeAboveValue(Query, 10000).
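In effect, every value above the threshold is treated as a missing point. A minimal sketch of that behaviour on a plain list, using a hypothetical `remove_above_value` helper and made-up values:

```python
def remove_above_value(values, threshold):
    """Replace values above `threshold` with None (a missing point)."""
    return [None if v is not None and v > threshold else v for v in values]


timings = [900, 1200, 3_000_000, None, 1000]
print(remove_above_value(timings, 10000))
```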

Rename functions

  • alias

    • Rename the curve directly to a given name
  • aliasByNode(4, 5, 6)

    • Name each curve after nodes 4, 5, and 6 of its original path
  • aliasSub

    • Replace part of a name using a regular expression
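Node positions are counted from zero, starting at the left of the dotted path. The sketch below shows what aliasByNode(4, 5, 6) would keep for a hypothetical seven-node path; `alias_by_node` is an illustrative stand-in, not Graphite's code.

```python
def alias_by_node(path, *nodes):
    """Build a curve name from the selected (zero-based) nodes of a path."""
    parts = path.split(".")
    return ".".join(parts[n] for n in nodes)


# Nodes: stats=0, timers=1, fe=2, test=3, login=4, submit=5, count_ps=6
name = alias_by_node("stats.timers.fe.test.login.submit.count_ps", 4, 5, 6)
print(name)
```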

Special -> groupByNode

sumSeries simply adds all the matched series together into a single final value; it does not support sums per group. With groupByNode you can dynamically aggregate the series that share the same value at a given node position.
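The difference from a plain sum can be sketched as follows: series are first bucketed by the name at one node position, and each bucket is then summed separately. The metric paths and the helper `group_by_node` are made up for illustration, and missing points are not handled.

```python
from collections import defaultdict


def group_by_node(series, node):
    """Sum the series that share the same name at position `node`.

    `series` maps a metric path to a list of values of equal length.
    """
    groups = defaultdict(list)
    for path, values in series.items():
        groups[path.split(".")[node]].append(values)
    return {
        name: [sum(points) for points in zip(*members)]
        for name, members in groups.items()
    }


series = {
    "stats.timers.fe.test.login.ios.count": [1, 2],
    "stats.timers.fe.test.login.android.count": [3, 4],
    "stats.timers.fe.test.signup.ios.count": [5, 6],
}
grouped = group_by_node(series, 4)
print(grouped)
```

Here the result has one curve per value found at node 4 (login and signup), whereas a plain sum would merge all three series into one.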

Calculate a success rate across multiple queries: Calculate -> asPercent

Suppose we have the following metrics:

stats.timers.fe.test.error1.count
stats.timers.fe.test.error2.count
stats.timers.fe.test.error3.count
stats.timers.fe.test.success.count

How do you calculate the percentage of successes?

This cannot be solved with a single Query. First, create two queries as follows:

stats.timers.fe.test.*.count (query #A)
stats.timers.fe.test.success.count (query #B)

Then create a third Query: asPercent(#B, sumSeries(#A))
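The calculation behind the third query is a point-by-point division of the success curve by the sum of all curves. A minimal sketch with made-up counts; `as_percent` is a hypothetical helper, not Graphite's implementation.

```python
def as_percent(part, total):
    """Return part / total as a percentage, point by point."""
    return [
        None if p is None or not t else 100.0 * p / t
        for p, t in zip(part, total)
    ]


# Hypothetical counts per interval.
error1 = [2, 0, 1]
error2 = [1, 1, 0]
error3 = [0, 1, 1]
success = [97, 98, 48]

total = [sum(points) for points in zip(error1, error2, error3, success)]  # like sumSeries(#A)
rate = as_percent(success, total)                                         # like asPercent(#B, ...)
print(total, rate)
```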

As the examples above show, even complex calculations that previously required back-end code can be done in the interface by nesting multiple queries and functions.

Each data source documents the functions it supports, Graphite for example. Support for multiple data sources and their functions is what lets Grafana offer so much power in a single web interface.

Other

Alerting

Grafana added alerting in version 4.0, but it works by querying the data source after the fact, so it cannot meet real-time requirements. Our company has open-sourced Banshee to solve this problem.

Banshee uses the three-sigma rule, supports alerts based on thresholds and trends, and offers open APIs and webhooks, with Slack integration by default. Banshee sits at the same level as the data source (as a back end of StatsD), so timeliness is guaranteed, and because alerting is independent of Grafana it places no requirement on the Grafana version.
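For readers unfamiliar with the three-sigma rule, the sketch below shows the textbook form: a value is suspicious when it lies more than three standard deviations from the mean of recent history. This is a generic illustration with made-up numbers and a hypothetical `is_anomaly` helper, not Banshee's actual algorithm.

```python
from statistics import mean, pstdev


def is_anomaly(history, value, sigmas=3.0):
    """Flag `value` if it is more than `sigmas` standard deviations from the mean."""
    mu = mean(history)
    sigma = pstdev(history)
    return abs(value - mu) > sigmas * sigma


history = [100, 102, 98, 101, 99, 100, 103, 97]  # hypothetical recent values
print(is_anomaly(history, 104))  # within normal variation
print(is_anomaly(history, 120))  # far outside it
```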

StatsD metric path limits

Grafana relies on a time-series database that stores the data of each KEY in its own file; stats.timers.fe.test.* is equivalent to all files in the stats/timers/fe/test folder. So take care not to let the metric path have too many combinations. For example, combining province and city into keys can easily take up more than 1 GB and fill the disk.

To avoid polluting the path with too many combinations, make sure any dots inside a KEY are removed, for example by replacing them with underscores. You can also add more prefixes to the metric path, for example changing stats.timers.fe.test.* to stats.timers.fe.test.v1.*. If pollution does occur, you can delete the whole v1 folder rather than the test root path, which keeps your historical data intact.
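One way to enforce this on the sending side is to sanitize every free-form value before it becomes a node of the path. The sketch below keeps only the characters mentioned earlier (letters, digits, hyphens, underscores) and replaces everything else, dots included, with an underscore. The helper names and sample values are assumptions.

```python
import re


def sanitize_node(value):
    """Make a free-form value safe to use as a single node of a metric path."""
    # A dot inside a value would otherwise create an extra folder level.
    return re.sub(r"[^A-Za-z0-9_-]", "_", value)


def metric_path(prefix, *nodes):
    """Join a fixed prefix with sanitized nodes."""
    return ".".join([prefix, *(sanitize_node(n) for n in nodes)])


path = metric_path("stats.timers.fe.test.v1", "api.example.com", "GET /users")
print(path)
```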

Querying Grafana data from the back end

It is generally recommended to query Grafana data with an API key. An Admin account can generate API keys for any of the three permission levels above in the interface. Basic Auth is also enabled in Grafana by default, so you can authenticate with an account and password, for example http://${account}:${password}@${grafana_host}/api/org.
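Both options come down to sending an Authorization header. The sketch below only builds the requests with the standard library and does not send them; the host name, account, password, and API key are placeholders, and the two helper functions are hypothetical.

```python
import base64
import urllib.request


def basic_auth_request(url, account, password):
    """Build a request that authenticates with an account and password."""
    token = base64.b64encode(f"{account}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def api_key_request(url, api_key):
    """Build a request that authenticates with an API key (Bearer token)."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


url = "http://grafana.example.com/api/org"  # placeholder host
with_password = basic_auth_request(url, "admin", "secret")
with_key = api_key_request(url, "YOUR_API_KEY")
# urllib.request.urlopen(with_key) would send the request to a real server.
print(with_key.get_header("Authorization"))
```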

Of course, if you have read access to the data source itself, it is best to read the data from it directly.

Anonymous mode

Sometimes a user has no Grafana account and just wants to look at a panel. This is where Grafana's anonymous mode comes in.

The Grafana configuration file has an auth.anonymous section: enabled is the on/off switch, org_name selects the organization that anonymous users belong to, and org_role sets the permissions anonymous users get. Enabling anonymous access for an organization means that users who are not logged in can query its data sources through Grafana without any permission check, so make sure the data sources are secured, for example by restricting access to the intranet.
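The section described above might look like the fragment embedded in the sketch below, which parses it with Python's configparser simply to show the three settings. The organization name and role are example values, not recommendations.

```python
import configparser

# Example fragment of a Grafana configuration file (values are illustrative).
GRAFANA_INI = """
[auth.anonymous]
enabled = true
org_name = Main Org.
org_role = Viewer
"""

config = configparser.ConfigParser()
config.read_string(GRAFANA_INI)
anonymous = config["auth.anonymous"]

print(anonymous.getboolean("enabled"), anonymous["org_name"], anonymous["org_role"])
```

Giving anonymous users the Viewer role keeps them read-only, which matches the use case of someone who only wants to look at a panel.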

Conclusion

For all its power, Grafana is only a statistics and monitoring platform focused on time-series data. Features that are not time-series in nature, such as error aggregation and error logging, should be left to specialized tools such as Sentry and ELK.