Some tips for using Grafana

Table of contents

Common value types and their meanings
- The Total pitfall
- Error margin
Template variables
- How to write a template variable
- Creating a template variable
- Hidden uses of template variables
Grafana panel editor
- Metrics
- Legend
- Display
Advanced functions
- Summing all curves matched by one query: Combine -> sumSeries
- Transform -> timeShift
- Removing outliers: Filter -> removeAboveValue
- Renaming functions
- Aggregating multiple curves: Special -> groupByNode
- Calculating a success rate across multiple queries: Calculate -> asPercent
Other
- Alerting
- StatsD metric path limits
- Querying Grafana data from the back end
- Anonymous mode
Conclusion

Grafana is an open-source time-series analytics and monitoring platform that supports data sources such as Elasticsearch, Graphite, and InfluxDB, and is known for its powerful dashboard editor. Our earlier introduction to using Grafana for front-end monitoring received good feedback, but many readers who had never used Grafana before asked questions about it, so I hope this article helps you use Grafana.

Grafana has three permission levels: Viewer, Editor, and Admin. Viewers can only view existing panels and cannot edit them; Editors can edit panels; Admins have all permissions, including adding data sources, plug-ins, and API keys.

For ordinary users, Viewer permissions are sufficient; the rest of this article assumes Editor permissions. For reasons of space, this article uses Graphite as the example data source and covers only the most commonly used panel type, Graph.

Common value types and their meanings

count_ps
- Count per second
count
- Count per ten seconds
mean_90
- The average after discarding the highest 10% of the data
upper_90
- The maximum after discarding the highest 10% of the data (roughly the 90th percentile)
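To make these definitions concrete, here is a simplified sketch of how one flush interval of timer samples could be summarised into these values. It follows StatsD's general approach as I understand it (sort, drop the top 10%, then average and take the maximum of what remains), assumes StatsD's default 10-second flush interval, and is not StatsD's actual code.

```python
import math


def timer_summary(values, pct=90, flush_interval_s=10):
    """Summarise one flush interval of timer samples (simplified sketch)."""
    values = sorted(values)
    count = len(values)
    summary = {"count": count, "count_ps": count / flush_interval_s}
    # Drop the highest (100 - pct)% of samples, rounding half up like JavaScript's Math.round.
    drop = math.floor((100 - pct) / 100 * count + 0.5)
    kept = values[: count - drop]
    if kept:
        summary[f"mean_{pct}"] = sum(kept) / len(kept)  # mean_90
        summary[f"upper_{pct}"] = kept[-1]              # upper_90
    return summary
```

For ten response times such as `[120, 80, 95, 3000, 110, 100, 90, 105, 85, 115]`, the single 3000 ms outlier is discarded, so mean_90 is 100.0 and upper_90 is 120, while count is 10 and count_ps is 1.0.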

The Total pitfall

This is a common Grafana misconception. We often use the count_ps value type to get the number of hits per second. In that case, the total number of hits over a period should be calculated as the Avg of count_ps multiplied by the number of seconds in the period, instead of being read directly from the Total shown in the interface.

The reason is that a curve can only display a limited number of points. Grafana decides how many points to request based on the width of your window: for a time range such as one day, there is no way to display one point per second, because that would be 86,400 points and not even an ultrawide monitor could fit them. Points that cannot be displayed individually are consolidated, by default by averaging them, into the points that are returned, for example, as shown below:
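The effect can be reproduced with a simplified model of average consolidation (not Grafana's or Graphite's actual code). The 960-point display width and the repeating 0..9 traffic pattern are arbitrary assumptions; with equal-sized buckets, summing the displayed points badly underestimates the real total, while Avg times the number of seconds recovers it.

```python
import math


def consolidate_average(points, max_points):
    """Average consecutive buckets so that at most max_points remain (simplified)."""
    size = math.ceil(len(points) / max_points)
    buckets = [points[i : i + size] for i in range(0, len(points), size)]
    return [sum(b) / len(b) for b in buckets]


# One day of per-second counts (count_ps) with a repeating 0..9 pattern.
SECONDS_PER_DAY = 86_400
raw = [i % 10 for i in range(SECONDS_PER_DAY)]
shown = consolidate_average(raw, max_points=960)  # 960 points, 90 seconds each

true_total = sum(raw)      # the number we actually want
legend_total = sum(shown)  # a "Total" computed over the displayed points
avg_times_seconds = sum(shown) / len(shown) * SECONDS_PER_DAY
```

Here true_total is 388,800, legend_total is only 4,320, and avg_times_seconds is 388,800 again.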

The time range of the figure above is one day. The upper part shows a Graph panel and the lower part a Pie Chart panel. The curve in the upper panel uses the count type (one value every ten seconds), and its average (Avg) is 682. The total is therefore 682 times 6 (six ten-second intervals per minute; use 60 instead for count_ps) times 60 (minutes per hour) times 24 (hours per day), about 5.89 million, close to the 5.82 million shown in the lower panel. The Total of 1.17 million in the upper panel's legend is therefore completely misleading and can be ignored.
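The same arithmetic written out, for a count series (one point every 10 seconds) and for a count_ps series over one day:

$$
\text{Total}_{\text{count}} \approx \overline{\text{count}} \times \frac{86\,400\ \text{s}}{10\ \text{s}} = 682 \times 6 \times 60 \times 24 = 5\,892\,480 \approx 5.89\ \text{million}
$$

$$
\text{Total}_{\text{count\_ps}} \approx \overline{\text{count\_ps}} \times 86\,400
$$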

Error margin

The 5.89 million calculated above differs slightly from the 5.82 million shown in the interface. This is acceptable, because StatsD data is not exact to begin with: StatsD usually receives metrics over UDP (a TCP mode also exists), so some packets can be lost. If you need exact numbers, also store the relevant events in a database; only a back-end query against that database is fully reliable.
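To show why UDP reporting is lossy, here is a minimal sketch that sends one counter in the StatsD line format (`name:value|c`). The metric name and the default port 8125 are assumptions for illustration; `sendto` returns as soon as the datagram is handed to the operating system, and nothing confirms that StatsD ever received it.

```python
import socket


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget a StatsD counter over UDP; no acknowledgement comes back."""
    payload = f"{name}:{value}|c".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))  # succeeds even if nobody is listening
    return payload
```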

Template variables

Template variables let you control the queries in a panel dynamically, which makes them very important. They usually appear in the upper-left corner of the dashboard, as shown below:

How to write a template variable

Template variables can be written as $name or [[name]]. The former is the form mainly used with Graphite data sources, for example stats.timers.fe.test.$key.count_ps
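A toy model of the substitution (not Grafana's implementation) helps show when each form is useful. It assumes a multi-value selection is rendered as a Graphite glob such as `{home,list}`. The [[name]] form matters when the variable is directly followed by word characters: in `$key_v1`, the variable name would be read as `key_v1`.

```python
import re


def interpolate(query, variables):
    """Toy model of template-variable substitution; not Grafana's code."""

    def value_for(match):
        value = variables[match.group(1)]
        if isinstance(value, (list, tuple)):
            # Multi-value selections become a Graphite glob such as {home,list}.
            return value[0] if len(value) == 1 else "{" + ",".join(value) + "}"
        return value

    query = re.sub(r"\[\[(\w+)\]\]", value_for, query)  # [[name]] form
    return re.sub(r"\$(\w+)", value_for, query)          # $name form
```

For example, `interpolate("stats.timers.fe.test.[[key]]_v1.count", {"key": "home"})` yields `stats.timers.fe.test.home_v1.count`.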

Creating a template variable

In the Grafana interface, click the gear button -> Templating -> New, and you will see something like this:

This section focuses on the Query type.

Name
- The name of the variable. Special characters such as $ are not supported
Refresh
- Options: Never, On Dashboard Load, and On Time Range Change
- If new values for the variable appear frequently, use On Time Range Change; otherwise use On Dashboard Load. Never choose Never for the Query type, or the variable will only be updated when you open it for editing
Query
- The query statement, for example stats.timers.fe.test.*
- Grafana does not send the request while you type. Click outside the input box and the matching values are displayed below

A Query can itself contain template variables, for example stats.timers.fe.test.$key.*, whose values are refreshed automatically whenever $key changes. Chaining template variables this way narrows each query and can greatly reduce Grafana query time.
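The chaining can be sketched as two lookups against a list of metric paths. This is a simplified, hypothetical model of a Graphite find: each dot-separated node is matched on its own, so `*` never crosses a dot, and `fnmatch` does not support Graphite's `{a,b}` alternation. The metric paths are made up.

```python
from fnmatch import fnmatchcase


def find_nodes(paths, pattern):
    """Distinct nodes at the pattern's depth, matched node by node."""
    pattern_nodes = pattern.split(".")
    depth = len(pattern_nodes)
    found = set()
    for path in paths:
        nodes = path.split(".")
        if len(nodes) >= depth and all(
            fnmatchcase(node, pat) for node, pat in zip(nodes, pattern_nodes)
        ):
            found.add(nodes[depth - 1])
    return sorted(found)


# Hypothetical metric paths stored in Graphite.
PATHS = [
    "stats.timers.fe.test.home.count",
    "stats.timers.fe.test.home.mean_90",
    "stats.timers.fe.test.list.count",
]
keys = find_nodes(PATHS, "stats.timers.fe.test.*")            # options for $key
key = keys[0]                                                 # the user picks "home"
metrics = find_nodes(PATHS, f"stats.timers.fe.test.{key}.*")  # re-run when $key changes
```

The second lookup only scans the subtree under the selected key, which is why chained variables keep queries small.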

Hidden uses of template variables

Template variables can even be used in Grafana links, a hidden trick not even mentioned in the documentation. Insert $name anywhere in a Link or Dashboard URL; when the user clicks the link, Grafana replaces the variable so that the user lands on the correct page. This makes it easy to integrate with other systems for a good user experience, for example jumping to Kibana to query logs.

Kibana and Grafana use different time range formats; this can be addressed with the Chrome extension described in this article.
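As a rough illustration of the format mismatch: Grafana puts absolute time ranges in its URL as `from`/`to` epoch milliseconds, while older Kibana versions keep the time range in a `_g=(time:(...))` state parameter with ISO 8601 timestamps. The exact Kibana shape below is an assumption that varies between Kibana versions; treat this as a sketch, not a drop-in converter.

```python
from datetime import datetime, timezone


def to_iso_utc(epoch_ms):
    """Format epoch milliseconds as an ISO 8601 UTC timestamp with milliseconds."""
    dt = datetime.fromtimestamp(epoch_ms // 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{epoch_ms % 1000:03d}Z"


def kibana_time_state(from_ms, to_ms):
    """Build a Kibana-style _g time state from Grafana's from/to (assumed format)."""
    return (
        f"_g=(time:(from:'{to_iso_utc(from_ms)}',mode:absolute,"
        f"to:'{to_iso_utc(to_ms)}'))"
    )
```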

In addition, the Custom template variable type lets you enter the drop-down's values yourself, and it is also frequently used. The value of a template variable is synchronized with the var-${name} parameter in the query string of the current URL. Together, these features make it easy to jump from third-party systems to the correct Grafana panel.
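A third-party system can therefore preset variables simply by building the link. In this sketch the dashboard base URL is hypothetical; each value becomes a `var-<name>` parameter (repeated for multi-value selections), followed by Grafana's `from`/`to` time range.

```python
from urllib.parse import urlencode


def dashboard_url(base, variables, time_from="now-6h", time_to="now"):
    """Build a Grafana dashboard link that presets template variables via var-<name>."""
    params = []
    for name, value in variables.items():
        values = value if isinstance(value, list) else [value]
        params += [(f"var-{name}", v) for v in values]  # multi-value: repeat the parameter
    params += [("from", time_from), ("to", time_to)]
    return f"{base}?{urlencode(params)}"
```

For example, `dashboard_url("http://grafana.example.com/dashboard/db/fe-test", {"key": "home"})` produces `...?var-key=home&from=now-6h&to=now`.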

Grafana panel editor

With Editor permissions, open any dashboard, click a panel's title, and then click Edit in the pop-up menu to enter the panel editor. This article covers only the most important editor tabs: Metrics, Legend, and Display.

Metrics

Edit mode
- The arrow in the figure above points to the toggle editor mode button, which switches between editing modes. With the visual editor off, you type the query statement manually; with it on, you can add, remove, and change query segments in the interface, as shown in the figure above.
Data source
- Make sure the panel's data source is set correctly; otherwise you won't find the right metric paths and may end up with confusing data.

In the visual edit mode, click any of the segment boxes shown in the image above and Grafana automatically loads the values available at that position from the data source. You can also select template variables here to control the query dynamically.

Clicking the plus sign at the end of the query opens the functions provided by the data source. These functions perform advanced processing and are the focus of the second half of this article. Graphite offers many functions; other data sources offer fewer.

Legend

The Legend tab controls how curve names and values are displayed and is relatively simple. The options mean the following:

As Table
- Whether to display the legend as a table
To the right
- Whether the legend is placed to the right of the chart or below it
Width
- By default the width adjusts automatically; otherwise the legend is forced to the given width
Min
- Minimum value within the panel's time range
Avg
- Average value within the panel's time range
Total
- Sum of all values within the panel's time range. As described above, this is very misleading when you want the total count of a numeric type
Max
- Maximum value within the panel's time range
Current
- The most recent value within the panel's time range
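A rough model of these legend statistics makes the Total caveat visible: every value is computed over the points the panel actually received, which may already be consolidated. Ignoring null points and taking Current as the last non-null value are assumptions about Grafana's defaults.

```python
def legend_values(points):
    """Rough model of the legend statistics, ignoring null points (an assumption)."""
    values = [p for p in points if p is not None]
    return {
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "total": sum(values),   # sum of the displayed points, not of the raw data
        "current": values[-1],  # last non-null value
    }
```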

Display

Display controls how points and lines are drawn on the chart. The important options are:

Draw Modes -> Lines
- Whether to draw line segments between points
Draw Modes -> Points
- Whether to draw the points themselves
Hover info -> Mode
- What the hover tooltip shows: All series (the values of all series at that point in time) or Single (only the series under the mouse pointer)
Hover info -> Sort Order
- The order of series in the tooltip, usually Decreasing
Stacking & Null value -> Null value
- This option matters and should be chosen according to the density of points. When points are sparse, a connected line can mislead readers into thinking there is data between two points.
- For long time ranges, choose connected
- When there are few points, choose null
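The difference between the two null modes can be sketched as how a series of (timestamp, value) points is split into the polylines that get drawn. This is an illustrative model, not Grafana's rendering code.

```python
def line_segments(points, null_mode):
    """Split (timestamp, value) points into the polylines a chart would draw."""
    if null_mode == "connected":
        # Skip nulls and join the remaining points into one continuous line.
        return [[(t, v) for t, v in points if v is not None]]
    # "null": every null breaks the line, leaving a visible gap.
    segments, current = [], []
    for t, v in points:
        if v is None:
            if current:
                segments.append(current)
            current = []
        else:
            current.append((t, v))
    if current:
        segments.append(current)
    return segments
```

With points `[(0, 1), (10, None), (20, 3), (30, 4)]`, connected mode draws one line through 0, 20, and 30, while null mode leaves a gap between 0 and 20.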

Advanced functions

Take Graphite as an example: a node in a metric path may contain only letters, digits, hyphens, and underscores, so front-end paths (which often contain # and :path) cannot be stored as-is. They have to be escaped in advance, for example # to ANCHOR, :path to path, and / to -. This makes the front-end paths look odd in the template variable, but a function can reverse the substitution in the interface.
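The escaping step on the reporting side could look like the sketch below. The escape table is hypothetical and modelled only on the examples in the text (# to ANCHOR, : dropped so that :path becomes path, / to -); any other character outside Graphite's safe set, including the node separator `.`, falls back to `_`.

```python
import re

# Hypothetical escape table modelled on the examples in the text.
ESCAPES = {"#": "ANCHOR", ":": "", "/": "-"}


def escape_route(route):
    """Turn a front-end route into a single Graphite-safe metric node."""
    for raw, safe in ESCAPES.items():
        route = route.replace(raw, safe)
    # Anything else outside letters, digits, '_' and '-' becomes '_'.
    return re.sub(r"[^A-Za-z0-9_-]", "_", route)
```

For example, `/user/:id#top` becomes `-user-idANCHORtop`. Note that reversing `-` back to `/` is ambiguous if the original route already contained hyphens.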

In the Metrics tab of the panel editor, click the plus sign in edit mode, add the aliasSub function (one call per substitution rule), and fill in the three substitution rules shown in the figure above. You will then see the normal paths, as shown below:
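Because Graphite's `aliasSub(seriesList, search, replace)` takes a single regular-expression rule, several rules mean nested calls. The sketch below builds such a target and previews the renaming locally with `re.sub`. The reverse rules assume names were escaped as described above (# to ANCHOR, / to -); the rule for :path is omitted because dropping the colon cannot be undone. Also note that aliasSub rewrites the whole series name, so a `-` rule affects every hyphen in it.

```python
import re

# Hypothetical reverse rules; each (search, replace) pair becomes one aliasSub call.
REVERSE_RULES = [("ANCHOR", "#"), ("-", "/")]


def wrap_alias_sub(target, rules):
    """Nest one aliasSub call per rule around a Graphite target."""
    for search, replace in rules:
        target = f"aliasSub({target}, '{search}', '{replace}')"
    return target


def preview(name, rules):
    """Apply the same substitutions locally, in the same order."""
    for search, replace in rules:
        name = re.sub(search, replace, name)
    return name
```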

aliasSub is just one simple alias function for renaming curves. Many more functions handle aggregation within a single query, aggregation across multiple curves, display of different time periods, calculations, and filtering. This section describes some of the most frequently used ones.

Summing all curves matched by one query: Combine -> sumSeries

For example, if stats_count.fe.test.* matches dozens of metrics, the query shows dozens of curves in the graph. To get the total across all of them, use sumSeries(stats_count.fe.test.*).
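Conceptually, sumSeries adds the matched series point by point. In this sketch the series are assumed to be aligned and of equal length; a point where every input is null stays null, which mirrors how I understand Graphite's null-safe sum.

```python
def sum_series(*series):
    """Pointwise sum of aligned series; None where every input is None."""
    result = []
    for column in zip(*series):
        present = [v for v in column if v is not None]
        result.append(sum(present) if present else None)
    return result
```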

Transform -> timeShift

To show the previous day's curve over the same time range, use timeShift(Query, '1d').
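The idea behind timeShift can be modelled as reading values from one day earlier and re-stamping them onto the current window, so the two curves line up on the same x-axis. The dict-based series here is a hypothetical stand-in for real data.

```python
def time_shift(series, start, end, step, shift_s=86_400):
    """Values from shift_s seconds earlier, re-stamped onto [start, end).

    series: dict mapping epoch seconds to values (hypothetical storage).
    """
    return [(ts, series.get(ts - shift_s)) for ts in range(start, end, step)]
```

For example, with `{0: 5, 60: 7, 86400: 9, 86460: 11}`, shifting the window starting at 86400 returns yesterday's values 5 and 7 at today's timestamps.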

Removing outliers: Filter -> removeAboveValue

If a numeric type contains outliers, for example values in the millions when the average is about 1 second, you can drop them directly in the interface with one of the filter functions instead of changing the instrumentation code: removeAboveValue(Query, 10000)
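removeAboveValue does not delete points; values above the threshold are replaced with null, so the curve simply has gaps there. A minimal model, using made-up millisecond timings:

```python
def remove_above_value(series, n):
    """Replace values above n with None, leaving other points untouched."""
    return [None if v is not None and v > n else v for v in series]
```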

Renaming functions

alias
- Renames the curve to the given name
aliasByNode(4, 5, 6)
- Names each curve after nodes 4, 5, and 6 of its original name (nodes are counted from 0)
aliasSub
- Replaces part of the name using a regular expression
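A minimal model of aliasByNode, assuming a plain dotted metric name: split on dots, keep the requested zero-based nodes, and join them back with dots.

```python
def alias_by_node(name, *nodes):
    """Keep only the given zero-based nodes of a dotted series name."""
    parts = name.split(".")
    return ".".join(parts[n] for n in nodes)
```

For `stats.timers.fe.test.home.pc.count`, nodes 4, 5, and 6 give `home.pc.count`.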

Aggregating multiple curves: Special -> groupByNode

sumSeries can only add all the matched series together into a single total and supports no other aggregation, such as averaging. groupByNode instead groups the series by the node at a given position and aggregates each group with a chosen function (for example sum or average), as shown in the figure below:
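A simplified model of groupByNode: series whose names share the same value at the chosen node are aggregated point by point. It assumes aligned series without nulls and only the callbacks "sum" and "average"; the callback names Graphite accepts depend on its version (older versions expect names such as sumSeries). The series data is made up.

```python
from collections import defaultdict


def group_by_node(series_by_name, node, callback="average"):
    """Group series by one node of their name and aggregate each group pointwise."""
    aggregators = {
        "sum": sum,
        "average": lambda column: sum(column) / len(column),
    }
    aggregate = aggregators[callback]
    groups = defaultdict(list)
    for name, points in series_by_name.items():
        groups[name.split(".")[node]].append(points)
    return {
        key: [aggregate(column) for column in zip(*members)]
        for key, members in groups.items()
    }


# Hypothetical mean_90 series per page (node 4) and device (node 5).
SERIES = {
    "stats.timers.fe.test.home.pc.mean_90": [100, 200],
    "stats.timers.fe.test.home.mobile.mean_90": [300, 400],
    "stats.timers.fe.test.list.pc.mean_90": [50, 60],
}
```

`group_by_node(SERIES, 4)` averages the two home curves into one and leaves list as its own curve.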

Calculating a success rate across multiple queries: Calculate -> asPercent

Suppose we have the following points:

stats.timers.fe.test.error1.count
stats.timers.fe.test.error2.count
stats.timers.fe.test.error3.count
stats.timers.fe.test.success.count

How do you calculate the success percentage?

A single query cannot handle this case. First, create two queries:

stats.timers.fe.test.*.count (query #A)
stats.timers.fe.test.success.count (query #B)

Then create a third query with the value asPercent(#B, sumSeries(#A)). As the name suggests, it first sums the #A series to get the total and then uses asPercent to divide #B by that total.
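The calculation can be reproduced by hand with made-up counts for two intervals. Note that #A matches success as well as the error paths, so the denominator is the total of all outcomes. Returning null for a missing or zero total is an assumption of this sketch.

```python
def as_percent(series, total):
    """Pointwise series * 100 / total; None where a value is missing or total is 0."""
    return [
        None if v is None or not t else v * 100 / t
        for v, t in zip(series, total)
    ]


# Hypothetical counts per interval for each path matched by query #A.
query_a = {
    "error1": [1, 0],
    "error2": [2, 1],
    "error3": [0, 1],
    "success": [17, 18],
}
total = [sum(column) for column in zip(*query_a.values())]  # sumSeries(#A)
success_rate = as_percent(query_a["success"], total)         # asPercent(#B, ...)
```

Here the totals are 20 per interval and the success rate is 85% and then 90%.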

The examples above show how powerful these functions are: even complex logic that previously required back-end code can easily be implemented in the interface by combining multiple queries and nested functions.

Each data source has its own function documentation, for example Graphite's. Thanks to its support for so many data sources and functions, Grafana can do a great deal from a single web interface.

Other

Alerting

Grafana has offered alerting since version 4.0. However, Grafana's alerts work by querying the data source after the fact, so they are not real-time enough. Our company has open-sourced Banshee to solve this problem.

Banshee uses the three-sigma rule, supports threshold- and trend-based alerting, offers open APIs and webhooks, and integrates with Slack by default. Banshee sits alongside the data source (behind StatsD), which keeps alerts timely, and because it is independent of Grafana, it places no requirements on the Grafana version.
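The three-sigma rule itself is simple: flag a value that lies more than three standard deviations from the recent mean. The sketch below is a generic illustration of that rule, not Banshee's implementation, which adds trend handling and other details; the history values are made up.

```python
import statistics


def exceeds_three_sigma(history, value, k=3.0):
    """Flag value if it lies more than k standard deviations from the history's mean."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    return abs(value - mean) > k * std
```

With a history of `[100, 102, 98, 101, 99]` (mean 100, standard deviation about 1.41), 103 is within the band and 110 is flagged.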

StatsD metric path limits

If Grafana is backed by a time-series database such as Graphite, each metric key is stored in its own file. For example, stats.timers.fe.test.* corresponds to all the files in the stats/timers/fe/test directory. Be careful, therefore, not to create too many combinations of metric paths: using province and city combinations as keys, for example, can easily take up more than 1 GB and fill up the disk.
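A back-of-the-envelope estimate shows how quickly this grows. Graphite's Whisper format preallocates each file: a 16-byte metadata header, 12 bytes of info per archive, and 12 bytes per stored point (4-byte timestamp plus 8-byte double). The retention policy and the 1,500 keys below are hypothetical.

```python
def whisper_file_size(archives):
    """Approximate size in bytes of one preallocated Whisper file.

    archives: list of (seconds_per_point, number_of_points).
    """
    header = 16 + 12 * len(archives)
    return header + 12 * sum(points for _, points in archives)


# Hypothetical retention: 10s for 1 day, 1min for 7 days, 10min for 1 year.
per_file = whisper_file_size([(10, 8_640), (60, 10_080), (600, 52_560)])
# Hypothetical: 1,500 province/city keys, one file each.
total_bytes = 1_500 * per_file
```

Each file takes about 0.86 MB regardless of how much data it receives, so 1,500 keys already use about 1.28 GB.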

To keep too many combinations from polluting the paths, normalize each KEY before sending it (for example, replacing unsupported characters with _), and add a version prefix to the metric path where possible. For example, change stats.timers.fe.test.* to stats.timers.fe.test.v1.*. Then, if the data gets polluted, you can delete the v1 directory instead of the test root, preserving your historical and normal data.
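A client-side helper for this could look like the following sketch. The function name and the sample segments are hypothetical; characters outside letters, digits, `_` and `-` (including `.`) become `_`. A timer reported as `fe.test.v1.home_page.load_time` would, with StatsD's default prefixes, end up under stats.timers.fe.test.v1.

```python
import re


def metric_path(prefix, version, *segments):
    """Join a versioned prefix and sanitised key segments into a metric name."""
    safe = [re.sub(r"[^A-Za-z0-9_-]", "_", segment) for segment in segments]
    return ".".join([prefix, version, *safe])
```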

Querying Grafana data from the back end

We recommend querying Grafana data with an API key, which an Admin user can generate in the interface. Grafana also enables Basic Auth by default, so an account and password work as well, for example http://${account}:${password}@${grafana_host}/api/org.
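Both options boil down to an Authorization header: an API key is sent as `Bearer <key>`, and Basic Auth as base64-encoded `account:password`. This sketch only builds the requests; the host and credentials are placeholders, and you would send them with `urllib.request.urlopen(req)` and parse the JSON body.

```python
import base64
import urllib.request


def api_key_request(grafana_host, api_key, path="/api/org"):
    """Request authenticated with a Grafana API key."""
    return urllib.request.Request(
        f"http://{grafana_host}{path}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def basic_auth_request(grafana_host, account, password, path="/api/org"):
    """Request authenticated with Basic Auth (account and password)."""
    token = base64.b64encode(f"{account}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        f"http://{grafana_host}{path}",
        headers={"Authorization": f"Basic {token}"},
    )
```

Prefer the API key in scripts, since it avoids storing a personal password.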

Better still, if you have read permission on the data source, read the data from it directly.
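With Graphite, reading directly means calling its render API with the same target expressions you use in Grafana and `format=json`, which returns a list of objects with a `target` name and `[value, timestamp]` datapoints. The host name below is hypothetical.

```python
import json
from urllib.parse import urlencode


def render_url(graphite_host, targets, time_from="-1d", fmt="json"):
    """Build a Graphite render API URL for one or more targets."""
    params = [("target", t) for t in targets] + [("from", time_from), ("format", fmt)]
    return f"http://{graphite_host}/render?{urlencode(params)}"


def parse_render_json(body):
    """Map each target name to its [value, timestamp] datapoints."""
    return {series["target"]: series["datapoints"] for series in json.loads(body)}
```

Function targets such as `sumSeries(stats_count.fe.test.*)` are URL-encoded automatically by `urlencode`.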

Anonymous mode

Sometimes a user has no Grafana account but just wants to view a panel. This is where Grafana's anonymous mode comes in.

The Grafana configuration file contains an [auth.anonymous] section: enabled turns anonymous mode on or off, org_name sets the organization that anonymous users can access, and org_role sets their permissions. Enabling anonymous access for an organization means that users who are not logged in can bypass Grafana's login and query its data sources without any permission check, so make sure those data sources are safe to expose, for example by restricting access to the intranet.
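For reference, the section is plain INI. The sketch below embeds an illustrative [auth.anonymous] section and reads it back with Python's `configparser`, for example as a deployment check that anonymous users stay read-only. The values shown are examples, not recommendations for every setup.

```python
import configparser

# Illustrative [auth.anonymous] section of a Grafana .ini file.
SAMPLE_INI = """
[auth.anonymous]
enabled = true
org_name = Main Org.
org_role = Viewer
"""


def anonymous_settings(ini_text):
    """Return (enabled, org_name, org_role) from an [auth.anonymous] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["auth.anonymous"]
    return (
        section.getboolean("enabled", fallback=False),
        section.get("org_name"),
        section.get("org_role"),
    )
```

A check such as `anonymous_settings(text)[2] == "Viewer"` catches a configuration that would give anonymous users edit rights.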

Conclusion

This article has introduced some relatively advanced Grafana techniques. Powerful as Grafana is, keep in mind that it is only a time-series statistics and monitoring platform. Functions that are not time-based, such as error aggregation and error logging, are better left to dedicated tools such as Sentry and ELK.