Author | Malone, Senior Communications Engineer at Fusion Communication

Introduction: Fusion Communication Technology (Tianjin) Co., Ltd. is a leading domestic provider of enterprise mobile communication platform technology and application solutions. Its cloud communication service platform integrates SMS, MMS, voice, data traffic, and virtual agents, connects the three major carriers with Internet IP networks, and is committed to providing first-class mobile communication platforms and operation services for enterprises at home and abroad.

The company's main business is providing SMS services to corporate customers, backed by a powerful internal operation monitoring platform. Daily SMS traffic is huge, so the monitoring platform needs a time-series database to compute statistics on the massive SMS volume by region, carrier, delivery status, and other dimensions. We previously used InfluxDB, but queries spanning a slightly longer time range (such as a month of data) were very slow. When TDengine was open-sourced, I learned about this impressive project and decided to try it.

After learning the basic functions, I launched a TDengine-based version of the monitoring system. However, I found that "group by" did not work in Grafana; I could only write WHERE clauses to put data for multiple SMS statuses into one dashboard, as shown in the figure:

With more WHERE conditions, this approach becomes clumsy and inflexible.

So I began to study the official documentation carefully, learning about "super tables", "continuous queries", and so on. I ran into many problems along the way and record them here (this is a test environment; the data is simulated).

I. Installation and operation

Test environment:

CentOS Linux release 7.7.1908 (Core)

Installation is simple:

rpm -ivh tdengine-1.6.3.1-3.x86_64.rpm

Use the default configuration (without changing the configuration file)

Start the TDengine:

systemctl start taosd

II. Create the database and tables

Type “taos” on the command line

taos>

The following database and table building operations are performed at this prompt

1. Create a database

create database jk keep 365 precision 'us';

Description:

  • keep 365 means the database retains data for 365 days; data older than 365 days is deleted automatically.
  • precision 'us' means timestamps in this database have microsecond precision (the default is milliseconds; precision 'ms' can also be specified explicitly). Single and double quotes both work, but the value cannot be unquoted.
  • Time precision can no longer be configured in the configuration file; it must be specified when the database is created.
  • TDengine is designed for the Internet of Things, where millisecond precision is enough for device data acquisition, but our SMS platform receives sudden bursts of large amounts of data. To avoid possible data loss from timestamp collisions, we set the precision to "us"; after testing, the effect was very good. The simulated test data was inserted with the timestamp "now". In the figure, the left side shows millisecond precision, where "0 row(s)" means a row was not inserted; the right side shows microsecond precision, where no rows were lost. See the sketch below.
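To make the collision concrete, here is a minimal sketch (using a hypothetical throwaway table, not part of the monitoring schema): under millisecond precision, two rows written with "now" inside the same millisecond share the same timestamp primary key, so the second insert reports "0 row(s)" as described above; under microsecond precision such collisions are roughly a thousand times less likely.

create database demo_ms precision 'ms';
use demo_ms;
create table demo (ts timestamp, value int);
insert into demo values (now, 1);
insert into demo values (now, 2);   -- if this lands in the same millisecond: "0 row(s)" inserted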

2. Create a super table

Switch to the database "jk":

taos> use jk;
Database changed.

create table jiankong (ts timestamp, gatewayid binary(6), companyid binary(20), provinceid binary(10), cityid binary(10), value int, timestr binary(30)) tags (type binary(10), subtype binary(10));

There are three types and hundreds of subtypes in the SMS system, so this static information is used as tags (put simply: any field you want to group by needs to be a tag).

To explain the supertable:

An STable (super table) is an abstraction over data collection points of the same type: a collection of instances of the same type, containing multiple subtables with identical data structure. Each STable defines a table structure (the data columns and their types) and a set of tags (each with a name and type) for its subtables. The tag values record the static information of each subtable and are used to group and filter subtables.

A subtable is essentially an ordinary table, consisting of a timestamp primary key and several data columns; each row records concrete data, and queries work exactly as on an ordinary table. The difference is that every subtable belongs to a super table and carries a set of tag values defined by the STable.

Multi-table aggregation queries run over all subtables created from an STable: conditional filtering (where) on any tag value and group by on tag values are supported. Fuzzy matching on binary-type tags is not currently supported.
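For example, with the super table above (and the subtables created below), the two capabilities just described look like this (a sketch; results depend on the inserted data):

select sum(value) from jk.jiankong where type = 'undeliv';   -- conditional filtering on a tag value
select sum(value) from jk.jiankong group by subtype;         -- aggregation grouped by a tag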

Tag data (tag values) is attached directly to each subtable: a set of tag values (one or more tags, up to six) identifies a subtable (with "automatic table creation on data write", described later, the same tag values can map to multiple subtables).

Tag values support Chinese; in that case the tag type must be set to NCHAR (verified in other tests).
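For instance, a hypothetical variant of the super table (not part of this setup) whose tags hold Chinese values would be declared with nchar tags:

create table jiankong_cn (ts timestamp, value int) tags (type nchar(10), subtype nchar(10));
create table jiankong_cn_sub using jiankong_cn tags ('发送', '成功');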

3. Create subtables (a subtable is an ordinary table whose structure is completely defined by its super table):

create table jiankong_sub_send using jiankong tags ('send', 'send');
create table jiankong_sub_delivrd using jiankong tags ('delivrd', 'delivrd');
create table jiankong_sub_undeliv_db_0108 using jiankong tags ('undeliv', 'DB:0108');
create table jiankong_sub_undeliv_db_0107 using jiankong tags ('undeliv', 'DB:0107');
create table jiankong_sub_undeliv_balance using jiankong tags ('undeliv', 'BALANCE');
create table jiankong_sub_undeliv_id_0076 using jiankong tags ('undeliv', 'ID:0076');
create table jiankong_sub_undeliv_ib_0008 using jiankong tags ('undeliv', 'IB:0008');

Each distinct combination of type and subtype is created as a subtable (this is just a test, so not all of the hundreds of subtypes are created).

4. Insert data

INSERT INTO jiankong_sub_send VALUES (now, 3034, '1564', '109', '1272', '2', '201909231530')
INSERT INTO jiankong_sub_delivrd VALUES (now, 3034, '1564', '109', '1272', '2', '201909231530')
INSERT INTO jiankong_sub_undeliv_balance VALUES (now, 1179, '152', '106', '1000', '1', '201910071113')
INSERT INTO jiankong_sub_undeliv_id_0076 VALUES (now, 1165, '1785', '111', '1226', '1', '201910071415')
INSERT INTO jiankong_sub_undeliv_ib_0008 VALUES (now, 1165, '1785', '127', '1000', '2', '201910061727')
INSERT INTO jiankong_sub_undeliv_db_0108 VALUES (now, 90, '548', '123', '1237', '1', '201910061127')
INSERT INTO jiankong_sub_undeliv_db_0107 VALUES (now, 2261, '808', '116', '1314', '2', '201910032106')

The above inserts simulated data into the 7 subtables created earlier. Since a large amount of data is being simulated, a shell script (or another method) is needed to generate the inserts.
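The script itself is not shown in this article; a minimal sketch of such a generator (table names from above, field values arbitrary) might look like this:

#!/bin/bash
# Sketch: insert one simulated row per subtable per second via the taos CLI.
tables="jiankong_sub_send jiankong_sub_delivrd jiankong_sub_undeliv_balance"
while true; do
  for t in $tables; do
    taos -s "INSERT INTO jk.$t VALUES (now, 3034, '1564', '109', '1272', '$((RANDOM % 5 + 1))', '$(date +%Y%m%d%H%M)')"
  done
  sleep 1
done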

Note that you cannot write data directly into a super table; data must be written into the subtables.

5. Query the database and table structure

Query database information:

taos> show databases;
name | created time          | ntables | vgroups | replica | days | keep1,keep2,keep(D) | tables | rows | cache(b) | ablocks | tblocks | ctime(s) | clog | comp | time precision | status |
log  | 19-11-18 16:37:14.025 |       4 |       1 |       1 |   10 | 30,30,30            |     32 | 1024 |     2048 | 2.00000 |      32 |     3600 |    1 |    2 | us             | ready  |
jk   | 19-11-18 16:48:19.867 |       7 |       1 |       1 |   10 | 365,365,365         |   1024 | 4096 |    16384 | 4.00000 |     100 |     3600 |    1 |    2 | us             | ready  |
Query OK, 2 row(s) in set (0.002487s)

Query a supertable:

taos> show stables;
name     | created_time          | columns | tags | tables |
jiankong | 19-11-18 16:48:41.540 |       7 |    2 |      7 |
Query OK, 1 row(s) in set (0.002140s)

Query the table structure of a supertable:

taos> describe jiankong;
Field      | Type      | Length | Note |
ts         | TIMESTAMP |      8 |      |
gatewayid  | BINARY    |      6 |      |
companyid  | BINARY    |     20 |      |
provinceid | BINARY    |     10 |      |
cityid     | BINARY    |     10 |      |
value      | INT       |      4 |      |
timestr    | BINARY    |     30 |      |
type       | BINARY    |     10 | tag  |
subtype    | BINARY    |     10 | tag  |
Query OK, 9 row(s) in set (0.001301s)

"tag" in the Note column indicates that the column is a tag.

Query the subtables:

taos> show tables;
table_name                   | created_time          | columns | stable   |
jiankong_sub_delivrd         | 19-11-18 16:49:17.009 |       7 | jiankong |
jiankong_sub_undeliv_ib_0008 | 19-11-18 16:49:17.025 |       7 | jiankong |
jiankong_sub_undeliv_db_0108 | 19-11-18 16:49:17.016 |       7 | jiankong |
jiankong_sub_undeliv_db_0107 | 19-11-18 16:49:17.018 |       7 | jiankong |
jiankong_sub_undeliv_id_0076 | 19-11-18 16:49:17.023 |       7 | jiankong |
jiankong_sub_send            | 19-11-18 16:49:17.003 |       7 | jiankong |
jiankong_sub_undeliv_balance | 19-11-18 16:49:17.021 |       7 | jiankong |

Query the table structure of a specific subtable:

taos> describe jiankong_sub_undeliv_db_0108;
Field      | Type      | Length | Note    |
ts         | TIMESTAMP |      8 |         |
gatewayid  | BINARY    |      6 |         |
companyid  | BINARY    |     20 |         |
provinceid | BINARY    |     10 |         |
cityid     | BINARY    |     10 |         |
value      | INT       |      4 |         |
timestr    | BINARY    |     30 |         |
type       | BINARY    |     10 | undeliv |
subtype    | BINARY    |     10 | DB:0108 |

In the Note column you can see "undeliv" (the type tag of the super table) and "DB:0108" (the subtype tag), the two static tag values that identify this subtable.

6. Data query

Aggregate query grouped by type:

taos> select sum(value) from jk.jiankong group by type;
sum(value) | type    |
  11827688 | delivrd |
  55566578 | send    |
  46687487 | undeliv |
Query OK, 3 row(s) in set (0.018251s)

Aggregate query grouped by subtype:

taos> select sum(value) from jk.jiankong group by subtype;
sum(value) | subtype |
      9317 | BALANCE |
     65219 | DB:0107 |
   2077691 | DB:0108 |
   2804417 | IB:0008 |
  41730843 | ID:0076 |
  11827688 | delivrd |
  55566578 | send    |
Query OK, 7 row(s) in set (0.013978s)

Aggregate query grouped by type and subtype:

taos> select sum(value) from jk.jiankong group by type, subtype;
sum(value) | type    | subtype |
  11827688 | delivrd | delivrd |
  55566578 | send    | send    |
      9317 | undeliv | BALANCE |
     65219 | undeliv | DB:0107 |
   2077691 | undeliv | DB:0108 |
   2804417 | undeliv | IB:0008 |
  41730843 | undeliv | ID:0076 |
Query OK, 7 row(s) in set (0.732830s)

Aggregate query grouped by type and subtype, by day:

taos> select sum(value) from jk.jiankong interval(1d) group by type, subtype;
ts                       | sum(value) | type    | subtype |
19-11-18 00:00:00.000000 |    1760800 | delivrd | delivrd |
19-11-19 00:00:00.000000 |      14768 | delivrd | delivrd |
19-11-20 00:00:00.000000 |    3290720 | delivrd | delivrd |
19-11-21 00:00:00.000000 |    4973640 | delivrd | delivrd |
19-11-22 00:00:00.000000 |    1787760 | delivrd | delivrd |
19-11-18 00:00:00.000000 |   36976790 | send    | send    |
19-11-19 00:00:00.000000 |     310128 | send    | send    |
19-11-20 00:00:00.000000 |    9482760 | send    | send    |
19-11-21 00:00:00.000000 |    6470940 | send    | send    |
19-11-22 00:00:00.000000 |    2325960 | send    | send    |
19-11-18 00:00:00.000000 |       6200 | undeliv | BALANCE |
19-11-19 00:00:00.000000 |         52 | undeliv | BALANCE |
19-11-20 00:00:00.000000 |       1590 | undeliv | BALANCE |
19-11-21 00:00:00.000000 |       1085 | undeliv | BALANCE |
19-11-22 00:00:00.000000 |        390 | undeliv | BALANCE |
19-11-18 00:00:00.000000 |      43400 | undeliv | DB:0107 |
19-11-19 00:00:00.000000 |        364 | undeliv | DB:0107 |
19-11-20 00:00:00.000000 |      11130 | undeliv | DB:0107 |
19-11-21 00:00:00.000000 |       7595 | undeliv | DB:0107 |
19-11-22 00:00:00.000000 |       2730 | undeliv | DB:0107 |
19-11-18 00:00:00.000000 |    1382600 | undeliv | DB:0108 |
19-11-19 00:00:00.000000 |      11596 | undeliv | DB:0108 |
19-11-20 00:00:00.000000 |     354570 | undeliv | DB:0108 |
19-11-21 00:00:00.000000 |     241955 | undeliv | DB:0108 |
19-11-22 00:00:00.000000 |      86970 | undeliv | DB:0108 |
19-11-18 00:00:00.000000 |    1866200 | undeliv | IB:0008 |
19-11-19 00:00:00.000000 |      15652 | undeliv | IB:0008 |
19-11-20 00:00:00.000000 |     478590 | undeliv | IB:0008 |
19-11-21 00:00:00.000000 |     326585 | undeliv | IB:0008 |
19-11-22 00:00:00.000000 |     117390 | undeliv | IB:0008 |
19-11-18 00:00:00.000000 |   27769800 | undeliv | ID:0076 |
19-11-19 00:00:00.000000 |     232908 | undeliv | ID:0076 |
19-11-20 00:00:00.000000 |    7121610 | undeliv | ID:0076 |
19-11-21 00:00:00.000000 |    4859715 | undeliv | ID:0076 |
19-11-22 00:00:00.000000 |    1746810 | undeliv | ID:0076 |
Query OK, 35 row(s) in set (0.023865s)

interval is the length of the aggregation window; the minimum interval is 10 milliseconds (10a).
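For reference, interval takes a unit suffix (per the TDengine documentation: a = milliseconds, s = seconds, m = minutes, h = hours, d = days), which can be combined with the queries above, e.g.:

select sum(value) from jk.jiankong interval(10a) group by type;   -- 10 ms, the minimum window
select sum(value) from jk.jiankong interval(1h) group by type;    -- hourly windows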

Without a super table, a group-by aggregate query on ordinary tables reports an error:

taos> select sum(value) from jk.jiankong group by type;
TSDB error: invalid SQL: group by only available for STable query

7. Create sub-tables automatically when writing data

We had another requirement: hundreds of static categories must be monitored, and new ones appear unpredictably, so it is impossible to create all subtables up front when building the database and tables. This feature completely solves that problem.

Here’s an excerpt from the official document:

In some special scenarios, when the user is not sure at write time whether a device's table exists, the table structure defined by the super table can be used to automatically create the missing subtable; if the table already exists, no new table is created. Note: the auto-create clause can only create subtables, not super tables, so the super table must be defined in advance. The auto-create syntax is very similar to insert/import, except that the super table and tag information are added. The syntax is as follows:

INSERT INTO <tb_name> USING <stb_name> TAGS (<tag1_value>, <tag2_value>, ...) VALUES (field_value, ...) (field_value, ...) ...;


Compare this with creating a subtable explicitly with create:

create table jiankong_sub_send using jiankong tags ('send', 'send');
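Applying the auto-create syntax to this schema, a write for a subtype whose subtable may not exist yet would look like this (the subtype 'XY:0001' is made up for illustration):

INSERT INTO jiankong_sub_undeliv_xy_0001 USING jiankong TAGS ('undeliv', 'XY:0001') VALUES (now, 3034, '1564', '109', '1272', '1', '201911181530');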

III. Install and configure Grafana

1. Install

Download the Grafana RPM package from https://grafana.com/grafana/download and install it:

rpm -ivh grafana-6.4.4-1.x86_64.rpm

2. Copy the Grafana plugin for TDengine to the Grafana plugin directory

The TDengine Grafana plugin is included in the installation package, under /usr/local/taos/connector/grafana:

cp -r /usr/local/taos/connector/grafana/tdengine/ /var/lib/grafana/plugins

3. Start Grafana

systemctl start grafana-server

4. In a browser, open host:3000 to log in to the Grafana server

The default user name and password are both "admin".

5. Add the TDengine data source

Find “TDengine” at the bottom

Name:     TDengine
Host:     http://192.168.8.66:6020 (the test server's IP address)
User:     root (the default)
Password: taosdata (the default)

Test it out:

6. Add Folder

Put dashboards of the same type to be monitored into a Folder

7. Add the Dashboard

After entering the Folder you just created, create the Dashboard

Enter the SQL statement in INPUT; note that fill must be placed before group by, otherwise an error is reported.
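For illustration, a statement of the shape used in these panels might be (a sketch, not the article's exact query; note fill sits between interval and group by):

select sum(value) from jk.jiankong interval(1m) fill(null) group by type;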

Configure the graphical display

You can select graphs, tables, gauges, and other panel types

Here a line graph is configured. Below the graph are the specific display rules, such as labels, whether to fill, and custom curve colors for the displayed fields.

Give the dashboard an easy-to-understand name:

8. Six dashboards are configured

9. Example with and without group by

Without group by, you must write multiple SQL statements distinguished by WHERE conditions; with many categories this is troublesome and inflexible.

With group by, a single SQL statement solves the problem:
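A sketch of the contrast (the queries are illustrative, not the article's exact panel definitions):

-- without group by: one statement per status, distinguished by WHERE
select sum(value) from jk.jiankong where type = 'send' interval(1m);
select sum(value) from jk.jiankong where type = 'delivrd' interval(1m);
select sum(value) from jk.jiankong where type = 'undeliv' interval(1m);

-- with group by: a single statement covers all statuses
select sum(value) from jk.jiankong interval(1m) group by type;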

The figure below compares the curve trends with and without group by; they are exactly the same.

IV. Advanced features and some thoughts

1. Continuous query: user-level precomputation

TDengine's concept of "precomputation" is, I think, very good. Here is an excerpt from the documentation on TDengine's website:

To effectively improve query performance, and exploiting the fact that IoT data is immutable, TDengine records statistics such as the maximum and minimum of the data in each stored data block. If a query covers a data block in its entirety, the precomputed results are used directly and the block's contents are not read at all. Since the precomputed results are much smaller than the actual data stored on disk, using them can greatly reduce I/O reads and speed up query processing when disk I/O is the bottleneck.

The official documentation describes continuous queries as follows:

A continuous query is a simplified, time-driven stream query based on sliding windows, executed automatically and periodically by TDengine. For tables or super tables in a database, TDengine provides periodic, automatic continuous queries whose results can be pushed by TDengine or written back into TDengine. Each execution covers a time window that slides forward over time; when defining a continuous query, you must specify the window size (parameter interval) and the forward slide step (parameter sliding).

Writing the results back into TDengine is, in effect, user-level precomputation: TDengine performs the aggregation in the background according to the user-defined time window and slide step, and when the user queries the data it can be read directly from the result table, which is very fast.

Create a continuous query:

taos> create table test_stream_sum as select sum(value) from jiankong interval(20s) sliding(10s) group by type, subtype;
Query OK, 1 row(s) affected (0.000983s)

For comparison, run the same aggregation manually on the super table:

taos> select sum(value) from jiankong interval(20s) group by type, subtype;
ts                       | sum(value) | type    | subtype |
19-11-18 16:50:40.000000 |       9088 | delivrd | delivrd |
19-11-18 16:51:00.000000 |      31808 | delivrd | delivrd |
19-11-18 16:51:20.000000 |      15904 | delivrd | delivrd |
19-11-18 16:52:20.000000 |      12212 | delivrd | delivrd |
19-11-18 16:52:40.000000 |      31524 | delivrd | delivrd |
19-11-18 16:53:00.000000 |      31524 | delivrd | delivrd |
19-11-18 16:53:20.000000 |      31808 | delivrd | delivrd |
19-11-18 16:53:40.000000 |      31240 | delivrd | delivrd |
19-11-18 16:54:00.000000 |      31524 | delivrd | delivrd |
19-11-18 16:54:20.000000 |      31524 | delivrd | delivrd |
19-11-18 16:54:40.000000 |      31240 | delivrd | delivrd |
19-11-18 16:55:00.000000 |      31524 | delivrd | delivrd |
19-11-18 16:55:20.000000 |      28400 | delivrd | delivrd |
19-11-18 16:55:40.000000 |      31808 | delivrd | delivrd |
19-11-18 16:56:00.000000 |      31524 | delivrd | delivrd |
19-11-18 16:56:20.000000 |      31240 | delivrd | delivrd |
19-11-18 16:56:40.000000 |      31524 | delivrd | delivrd |
19-11-18 16:57:00.000000 |      32092 | delivrd | delivrd |
19-11-18 16:57:20.000000 |      31240 | delivrd | delivrd |
19-11-18 16:57:40.000000 |      32092 | delivrd | delivrd |
19-11-18 16:58:00.000000 |      31240 | delivrd | delivrd |
19-11-18 16:58:20.000000 |      22720 | delivrd | delivrd |
19-11-18 16:50:40.000000 |     190848 | send    | send    |
...

The actual data in the automatically created continuous query table is:

taos> select * from test_stream_sum;
ts                       | sum_value_ |
19-11-18 17:17:30.000000 |       2556 |
19-11-18 17:17:40.000000 |      18460 |
19-11-18 17:17:50.000000 |      15904 |
19-11-18 17:18:00.000000 |      15620 |
Query OK, 4 row(s) in set (0.000431s)

This is not the expected result: the output is not aggregated by the group by fields I defined.

So I opened a GitHub issue, and a TDengine engineer replied: "group by is not well supported by continuous query at present; this problem is on our plan list and we will improve this function in the future." The reason is that after the results are written back, the new table could have duplicate timestamp primary keys (different group by fields can produce the same time), which would conflict, so it is not supported for now.

Although the feature is not yet what I hoped for, the engineer's reply is reassuring, and I look forward to this improvement.

2. Further thoughts

TDengine is designed to serve the Internet of Things; its data is structured and time-series in nature. The official recommendation is to use it for data closely tied to time series, and not for other data.

But judging from TDengine's architecture and storage design, I don't think its use is limited to purely time-series workloads; some detail-level queries could also be stored in TDengine. I will test this later.