Here I would like to introduce a big data platform named TDengine, a mature platform designed and optimized for the Internet of Things, the Internet of Vehicles, the industrial Internet, IT operations and maintenance, and similar scenarios. Besides being more than 10 times faster than typical time-series databases, it also provides built-in caching, data subscription, and stream computing.

I first learned of this platform because it has delivered good results for many enterprises handling Internet of Vehicles and wind-power data, so I decided to try it out myself.

As for the advantages of this platform, we can see them in its documentation:

  • More than 10 times performance improvement.
  • Hardware or cloud service costs reduced to one fifth.
  • Full-stack time-series data processing engine.
  • Powerful analysis capabilities.
  • Seamless integration with third-party tools.
  • Zero operation and maintenance cost, zero learning cost.

I. Development practice

For this walkthrough I use Ubuntu 20.04 for development. First, install the tools required for the build by typing the following command:

sudo apt-get install -y gcc cmake build-essential git

Install OpenJDK 8 as follows:

sudo apt-get install -y openjdk-8-jdk

Then install Apache Maven with the following command:

sudo apt-get install -y maven

Then get the source code from GitHub by running the following command:

git clone https://github.com/taosdata/TDengine.git

Then enter the source directory with cd TDengine:

Build TDengine

On Linux, you need to log in as the root user first, because the commands used during the build and install require root permissions. Enter su and your password in the terminal to switch to root, then run the following commands to build:

mkdir debug && cd debug
cmake .. && cmake --build .

After the build is complete, we can install it with:

sudo make install

After the installation is successful, start the TDengine service in the terminal:

sudo systemctl start taosd

Users can connect to the TDengine service using the TDengine shell by typing the following in the terminal:

taos

If the TDengine shell connects to the service successfully, a welcome message and version information are printed; otherwise, an error message is printed.

Throughout this simple install-and-use process, I found the handling of version iteration and plug-in package upgrades to be quite well done.

Of course, if you find building from source tedious, you can also download a pre-built installation package directly from: www.taosdata.com/cn/all-down…

II. Operation practice

After the installation is complete, let's try it out by running taosdemo without any parameters. Enter the taosdemo command in the terminal; the output is as follows:

$ taosdemo
taosdemo is simulating data generated by power equipment monitoring...

host:                       196.168.0.169:6030
user:                       root
password:                   taosdata
configDir:
resultFile:                 ./output.txt
thread num of insert data:  8
thread num of create table: 8
top insert interval:        0
number of records per req:  30000
max sql length:             1048576
database count:             1
database[0]:
  database[0] name:         test
  drop:                     yes
  replica:                  1
  precision:                ms
  super table count:        1
  super table[0]:
    stbName:                meters
    autoCreateTable:        no
    childTblExists:         no
    childTblCount:          10000
    childTblPrefix:         d
    dataSource:             rand
    iface:                  taosc
    insertRows:             10000
    interlaceRows:          0
    disorderRange:          1000
    disorderRatio:          0
    maxSqlLen:              1048576
    timeStampStep:          1
    startTimestamp:         2022-01-14 10:04:00.000
    sampleFormat:
    sampleFile:
    tagsFile:
    columnCount:            3
column[0]:FLOAT column[1]:INT column[2]:FLOAT
    tagCount:               2
tag[0]:INT tag[1]:BINARY(16)

Press enter key to continue or Ctrl-C to stop

The parameters taosdemo uses to write data are shown above: it simulates data from a power-industry monitoring scenario, creating a database named test and a super table named meters. The structure of the super table is as follows:

taos> describe test.meters;
             Field              |     Type     |   Length    |    Note    |
=================================================================================
 ts                             | TIMESTAMP    |           8 |            |
 current                        | FLOAT        |           4 |            |
 voltage                        | INT          |           4 |            |
 phase                          | FLOAT        |           4 |            |
 groupid                        | INT          |           4 | TAG        |
 location                       | BINARY       |          64 | TAG        |
Query OK, 6 row(s) in set (0.002972s)

After pressing Enter, taosdemo will create the database test and the super table meters, and, following TDengine's data-modeling best practices, use the meters super table as a template to generate 10,000 sub-tables, representing 10,000 meter devices that independently report data.

taos> use test;
Database changed.

taos> show stables;
              name              |      created_time       | columns |  tags  |   tables    |
============================================================================================
 meters                         | 2021-08-27 11:21:01.209 |       4 |      2 |       10000 |
Query OK, 1 row(s) in set (0.001740s)

taosdemo then generates 10,000 records for each simulated meter device:

...
====thread[3] completed total inserted rows: 6250000, total affected rows: 6250000. 347626.22 records/second====
[1]:100%
====thread[1] completed total inserted rows: 6250000, total affected rows: 6250000. 347481.98 records/second====
[4]:100%
====thread[4] completed total inserted rows: 6250000, total affected rows: 6250000. 347149.44 records/second====
[8]:100%
====thread[8] completed total inserted rows: 6250000, total affected rows: 6250000. 347082.43 records/second====
[6]:99%
[6]:100%
====thread[6] completed total inserted rows: 6250000, total affected rows: 6250000. 345586.35 records/second====
Spent 18.0863 seconds to insert rows: 100000000, affected rows: 100000000 with 16 thread(s) into test.meters. 5529049.90 records/second
insert delay, avg: 28.64ms, max: 112.92ms, min: 9.35ms

The above figures were measured on an ordinary PC server with 8 CPUs and 64 GB of memory. taosdemo took about 18 seconds to insert 100 million records, an average of 5,529,049 records per second.
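As a quick sanity check (my own arithmetic, not part of taosdemo's output), the reported rate follows directly from the totals:

```shell
# Recompute the insert rate from the totals taosdemo reported:
# 100,000,000 rows in 18.0863 seconds.
rows=100000000
seconds=18.0863
rate=$(awk -v r="$rows" -v s="$seconds" 'BEGIN { printf "%d", r / s }')
echo "$rate records/second"
```

This lands at roughly 5.53 million records per second; the small difference from taosdemo's own figure comes from rounding of the elapsed time.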

After some testing, my overall impression is that TDengine, as a big data platform designed and optimized for scenarios such as the Internet of Things, the Internet of Vehicles, the industrial Internet, and IT operations and maintenance, delivers far more efficient performance than similar products, thanks to the novel data storage and query engine design in its database kernel.

III. Notes

Finally, let me briefly describe a few points that need attention during use.

I'm sure you've noticed that everything above revolves around time-series data, so let's talk about how TDengine manages data along the time dimension.

First, take a look at the description on the official website:

In addition to vnode sharding, TDengine also partitions time-series data by time period. Each data file contains data for only one time range, whose length is determined by the database configuration parameter days. Partitioning by time period also makes it easy to implement the data retention policy efficiently: data files older than the specified number of days (the configuration parameter keep) are deleted automatically. In addition, different time periods can be stored in different paths and on different storage media, facilitating hot/cold data management and multi-level storage.
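As a rough model of that time-based partitioning (a sketch of my own, not TDengine's actual file-naming code), each record's day maps to a file group by integer division on days:

```shell
# Toy model: with days=3, day numbers 0-2 land in group 0, 3-5 in group 1, etc.
days=3
day=4
group=$((day / days))
echo "day $day falls in file group $group"
```
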

It can be seen that the retention strategy for time-series data is firmly controlled by the two parameters keep and days. However, this alone is not enough if we want a deeper understanding of TDengine's time-series storage logic in order to optimize performance.

The official documentation describes keep and days as follows:

keep: the retention period of data in the database, in days. Default: 3650.
days: the length of the time range covered by a single data file, in days. Default: 10.

TDengine uses keep and days to strictly control the timestamp range of inserted data: data from the past cannot be older than the current time minus keep, and data in the future cannot be later than the current time plus days.

Suppose a database has keep set to 7 and days set to 3, and the current time is 00:00 on the 9th of some month.

Since keep is 7, data with timestamps before the 2nd (9 - 7) cannot be written; since days is 3, data with timestamps after the 12th (9 + 3) cannot be inserted either. This gives you the time range of data that TDengine can currently accept (the colored range in the original figure); when you try to write data in the gray range, you'll see a "timestamp out of time range" error.
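The writable window in this example can be sketched with a little shell arithmetic (day numbers only; keep, days, and today mirror the example above):

```shell
# Writable timestamp window under the keep/days rule, using the example values.
keep=7
days=3
today=9

earliest=$((today - keep))   # oldest writable day: 9 - 7 = 2
latest=$((today + days))     # newest writable day: 9 + 3 = 12
echo "writable window: day $earliest .. day $latest"
```
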

This set of figures shows how the distribution of data files and the writable data range change as the current time moves forward.

As time passes, each record's timestamp is compared against the system time; once it is more than keep days old, it is identified as expired data. After all the data in a data file has expired, the file is deleted from disk.

Taking the figures above as an example: since the data of the 2nd and the 4th are in the same data file (Data File 1), and the data of the 4th can be retained until the end of the 11th at the latest, the data of the 2nd is also retained until the end of the 11th. That is why, on the 12th, Data File 1 has been deleted.
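That expiry rule is easy to state in code (a sketch of my own, using the same example numbers):

```shell
# Data File 1 holds days 2-4; with keep=7 the whole file expires only after
# its newest day (the 4th) is keep days old, i.e. after day 4 + 7 = 11.
keep=7
newest_day_in_file=4
last_retained_day=$((newest_day_in_file + keep))
echo "Data File 1 is kept through day $last_retained_day and deleted on day $((last_retained_day + 1))"
```
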

At the physical level, data is still deleted at data-file granularity, but apart from users with extremely fine-grained storage requirements, the vast majority will not notice. With this design, users no longer need to worry about deletion granularity; simply tune the days parameter, according to your business type, to find the optimal performance.

In addition, given the writable time range (now - keep to now + days) and the time span of each data file (days), the automatic deletion mechanism can be considered to be working properly as long as the number of data file groups under the vnode directory is no greater than keep/days (rounded up) plus 1.
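A quick check of that bound with the example parameters (this is my reading of the formula as ceil(keep/days) + 1; the source text is ambiguous here):

```shell
# Upper bound on simultaneous data file groups per vnode: ceil(keep/days) + 1.
keep=7
days=3
max_groups=$(( (keep + days - 1) / days + 1 ))   # integer ceil division, then +1
echo "at most $max_groups file groups expected"
```
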

The above is my personal understanding and analysis of the TDengine data platform. There are surely many shortcomings, and I welcome your corrections.

Author: Bob