Authors: Huang Peijian, Chen Chi | Three Four Zero Technology Co., Ltd.

Editor's note: Shanghai Three Four Zero Technology Co., Ltd. is committed to the digital transformation of energy (gas and heat). Starting from IoT-based pressure and flow monitoring, and combining it with the enterprise's existing information data, the company uses IoT technology to address the pain points of safe operation faced by energy enterprises and to help improve their intelligent operation efficiency.

Based on a digital twin and artificial intelligence algorithm models, we built a transient simulation platform that reflects the actual operating conditions of the pipe network in real time. The platform is dedicated to providing a series of data services for energy enterprises, including customer insight, demand load analysis, pipe network status monitoring and optimization, multi-condition calculation, gas (heat) source matching, and channel optimization. Below is the actual operation scene of the platform.

In this project, two main sources of data were used to build the digital twin:

  1. IoT data: reported every 5 minutes by IoT devices;
  2. Simulation data: with a large number of IoT devices connected, simulation results are produced by aligning working-condition data in real time and uploading it to the simulation solver.
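For reference, device data like this is typically modeled in TDengine as a supertable with one sub-table per device. The sketch below is illustrative only: the table and column names are reused from the queries later in this post, but the types, and whether each field is a column or a tag, are our assumptions rather than the project's actual schema.

```sql
-- Illustrative schema sketch; not the project's actual DDL.
-- Measurements reported every 5 minutes become columns; static device
-- attributes (IDs, direction flag) become tags shared by each sub-table.
create stable if not exists slsl_digital_twin.enn_iot (
  ts          timestamp,   -- report time
  pressure    double,
  flow        double,
  temperature double,
  on_off      binary(8)
) tags (
  device_id   binary(16),
  gis_id      binary(16),
  in_out_flag binary(8)    -- e.g. 'OUT'
);

-- One sub-table per device, created with its tag values (values hypothetical):
create table if not exists slsl_digital_twin.d_347444
  using slsl_digital_twin.enn_iot
  tags ('347444', '347444', 'OUT');
```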

As the project was about to enter development and testing, we found problems that would affect its normal operation: both types of data were very large in volume, and the database we had been using could not support storage, queries, and other operations at that scale. With this problem in mind, we turned to the database products currently on the market and tried to select the one best suited to this project.

A winding road to database selection

In terms of the data types involved in the project, the IoT data has the typical characteristics of time-series data: large in volume but immutable once written. The simulation results are also large in volume. Both types of data require fast writes and real-time query responses, but the queries are not overly complex compared to the writes.

Based on the above background requirements, we began to select database products.

Our first attempt was the non-relational database MongoDB. MongoDB is an outstanding general-purpose database, but its query and storage efficiency still fell short of our expectations. After some thought, we narrowed the selection scope based on the data type and finally decided to choose from among time-series databases.

A round of tests showed that although InfluxDB could meet the business needs, its operation and maintenance costs were too high, and its cluster edition is not open source and is not cheap to use. InfluxDB was therefore also excluded from our selection range.

By chance, we learned about TDengine, a time-series database that outperforms InfluxDB and has even open-sourced its cluster edition, facilitating horizontal expansion and optimal cost control. Its lightweight design and SQL-like syntax greatly reduce operation and learning costs. With so many advantages, we put TDengine into trial use.

We were very happy to receive timely and professional technical support from the TDengine community, and we finally applied the open-source TDengine to the project successfully. TDengine did not disappoint us: it has been very efficient and stable since launch.

Specific scenarios and configurations

At present, we use TDengine version 2.2.0.1. The stand-alone edition is under no pressure for the time being, but to meet growing business demand we are also preparing to expand it horizontally into a cluster.

In practice, the current server configuration is 24 GB of memory + a 12-core 3.60 GHz CPU + a mechanical hard disk. As shown in the figure below, the database has a total of 20,000+ sub-tables and 6 supertables, 4 of which are commonly used. Data is retained for 10 years, and the growth rate is about 10 million rows per week.

According to information provided by TAOS Data, with reasonable configuration parameters the number of vnodes in the database can be made exactly equal to the number of CPU cores of the machine, so that its performance is fully utilized; on that basis, the environment was successfully built.
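As a rough illustration of that tuning (a sketch only, not our exact setup: the database name is reused from the queries in this post, and the option values are assumptions), the 10-year retention mentioned above maps to the `KEEP` option in TDengine 2.x, while the vnode count per database is governed by the `maxVgroupsPerDb` setting in `taos.cfg`:

```sql
-- Sketch only. KEEP is in days: 10 years ≈ 3650 days.
-- The vnode count per database is capped by maxVgroupsPerDb in taos.cfg,
-- which can be aligned with the machine's 12 CPU cores.
create database if not exists slsl_digital_twin keep 3650;
```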

Query results on tens of millions of rows of data

The supertable computing_result stores the results of all simulation calculations, with 21 columns and a row length of 1.8 KB. Its current data volume is in the tens of millions of rows, making it our main query target.

Against this data, the specific query results are as follows:

1. Querying all simulation results of selected devices takes 0.09 seconds. The code example is as follows:

```sql
select * from slsl_digital_twin.computing_result r
where r.batch_no in ('c080018_20211029080000')
  and r.device_id in ('347444', '73593', '18146', '235652', '350382');
```

2. Querying the latest pressure data of selected devices within a certain time range takes 7.8 milliseconds.
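The exact statement for this query is not shown in the article; the following is only a sketch of its likely shape, reusing the enn_iot supertable and columns from the grouped query below, and assuming the timestamp column is named ts. The time range and device IDs are placeholders.

```sql
-- Sketch only: statement shape, time range, and timestamp column name are assumptions.
select last(pressure) as pressure
from slsl_digital_twin.enn_iot
where ts >= '2021-10-29 00:00:00' and ts <= '2021-10-29 08:00:00'
  and device_id in ('347444', '73593', '18146', '235652', '350382')
group by device_id;
```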

3. Querying the latest data of different devices grouped by area ID takes 9 milliseconds (thanks to the nested-query capability added in version 2.1, we can implement relatively complex logic to obtain such results). A code example is as follows:

```sql
select sum(pressure) as pressure,
       sum(flow) as flow,
       sum(temperature) as temperature,
       last(on_off) as onOff,
       gis_id
from (
  select last(pressure) as pressure,
         last(flow) as flow,
         last(temperature) as temperature,
         last(on_off) as on_off
  from slsl_digital_twin.enn_iot
  where in_out_flag = 'OUT'
    and gis_id in ('347444', '73593', '18146', '235652', '350382')
  group by device_id, gis_id
)
group by gis_id;
```

It is worth noting that TDengine uses less than 500 MB of storage space while delivering this query performance. The theoretical raw size of the computing_result supertable alone is (8×2 + (40+125+31+225)×4 + 8×11 + 2 + 4×2 + 10) bytes ≈ 1,808 bytes per row × 12,409,408 rows, or about 21 GB. Not to mention that extracting static data into in-memory tags drastically reduces the amount of raw data.

However, we cannot calculate the compression rate precisely in this article because the NCHAR columns contain some null values. Even so, TDengine's performance is impressive enough.

Final thoughts

In the future, as we bring in the calculation models for pipe network simulation, explosion radiation, and leakage warning in more cities, the data volume will reach 1 billion+ rows and the number of sub-tables will reach tens of millions. The upcoming TDengine 3.0 release can easily support hundreds of millions of tables, which gives us great confidence in TDengine and anticipation of our continued cooperation. Going forward, we will keep exploring how TDengine can be applied to more business scenarios to better meet the needs of our various simulation processes.

About the author

Huang Peijian, architect at Three Four Zero, currently responsible for the company's digital twin project and overall technical architecture. Chen Chi, senior engineer at Three Four Zero, currently responsible for the overall development of the company's digital twin project.


✨ For more details on TDengine, check out the source code on GitHub. ✨