Young People Don't Follow Martial Ethics: TDengine's Edge-Side Data Storage Solution Challenges SQLite

Last week, Taos Data and EMQ jointly released an industrial Internet integration solution at a Meetup. Built on TDengine and EMQ X, it is a lightweight edge computing platform for the industrial Internet, with capabilities for industrial data collection, aggregation, cleaning, storage, analysis and visual display. Now that TDengine fully supports ARM32 and ARM64 processors, why is TDengine a more efficient storage option for edge-side data? How does it compare with SQLite? At the Meetup, Forrest, co-founder of Taos Data, shared the technical rationale behind it.

From the Internet to the mobile Internet and now to the Internet of Things, computers, mobile terminals, wearables, cars, and even the lights in homes and factories are all connected to the network. Generally speaking, a wide variety of devices continuously collect real-time status data and send it to a computing platform in the cloud for aggregation. This is the general idea of cloud computing in the Internet of Things.

The whole technology chain of the Internet of Things has a four-layer structure: sensors collect the status data of the device -> a communication module sends the data to the cloud -> the cloud stores, queries and computes on the data -> analysis and application systems finally consume the results.

However, in the cloud computing model, all data must be transferred to the cloud for centralized storage, archiving and analysis. A node on the edge could be a gateway, or it could be a terminal that we actually use. If it has no computing capacity of its own, it must send the collected data to the cloud, rely on cloud computing resources to carry out the complex calculations and reach an actionable conclusion, and then receive that conclusion back over the network. It is easy to see how heavily the terminal depends on the network in this process. If the network suffers an outage or failure, the terminal cannot interact with the cloud, and part of its work is severely affected. Therefore, centralized (cloud-side) control places very high demands on communication between the edge and the cloud, which often means using costly high-speed networks. In addition, as the volume of data continues to grow, so does the cost of storage and computing in the cloud.

A good way to solve this problem is edge computing, that is, to sink part of the storage and computing capacity down to the edge side (i.e., the device side), so that the terminal device can store data, compute, make decisions and run applications relatively independently. In this way, the edge becomes smarter and less dependent on the cloud, data is processed in a more timely manner, and the work is no longer at the mercy of the network.
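
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any TDengine or EMQ API: the `EdgeNode` class, its `threshold` parameter and the `uplink` callback are hypothetical names. The node stores every reading locally, makes the alarm decision itself, and forwards buffered data only when the uplink reports success.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Reading = Tuple[int, float]  # (timestamp in ms, value)


class EdgeNode:
    """Hypothetical edge node that stores, decides and forwards on its own."""

    def __init__(self, threshold: float, uplink: Callable[[List[Reading]], bool]):
        self.threshold = threshold              # assumed local alarm threshold
        self.uplink = uplink                    # returns True if the cloud accepted the batch
        self.pending: Deque[Reading] = deque()  # readings not yet confirmed by the cloud
        self.alarms: List[Reading] = []

    def ingest(self, ts: int, value: float) -> None:
        reading = (ts, value)
        self.pending.append(reading)            # local storage comes first
        if value > self.threshold:              # local decision, no round trip to the cloud
            self.alarms.append(reading)

    def flush(self) -> int:
        """Try to send everything pending; keep it all if the network is down."""
        if not self.pending:
            return 0
        batch = list(self.pending)
        if self.uplink(batch):
            self.pending.clear()
            return len(batch)
        return 0
```

While the uplink fails, `flush()` returns 0 and nothing is lost; alarms are still raised locally, which is the independence from the network that the paragraph above describes.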

The advantages of edge computing are summarized as follows.

However, what difficulty of edge computing remains unsolved? As we know, the edge side often consists of small intelligent terminals that are deployed in large numbers. For cost reasons, their hardware resources, such as memory and CPU computing power, are very limited. The difficulty of edge computing lies in achieving the most efficient data storage, analysis and calculation with these limited computing resources. This makes the choice of database on the edge particularly important. The data collected by terminal equipment at the edge has distinct characteristics: it is generally a structured time-series data stream with timestamps. Therefore, the requirements that edge computing places on the database are reflected in the following aspects:

Ultra-high read and write performance
Low hardware overhead
General-purpose interface to meet various computing needs at the edge
Real-time data caching and stream computing capabilities
Persistent storage of historical data with efficient compression
Historical data backtracking and statistical aggregation by time window
Cloud-edge synergy capability
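
To make the "real-time data caching and stream computing" requirement concrete, here is a minimal Python sketch of a bounded in-memory cache that maintains a running mean as values arrive. The `RollingWindow` class is a hypothetical illustration of the general technique and says nothing about how TDengine implements its cache or stream computing.

```python
from collections import deque
from typing import Deque, Optional


class RollingWindow:
    """Keeps the latest `size` values in memory and a running mean over them."""

    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("size must be positive")
        self.values: Deque[float] = deque(maxlen=size)  # bounded real-time cache
        self.total = 0.0

    def push(self, value: float) -> float:
        """Add one value and return the mean of the values currently cached."""
        if len(self.values) == self.values.maxlen:
            self.total -= self.values[0]  # oldest value is about to be evicted
        self.values.append(value)
        self.total += value
        return self.total / len(self.values)

    def latest(self) -> Optional[float]:
        """Most recent value, or None if nothing has arrived yet."""
        return self.values[-1] if self.values else None
```

Each `push` costs constant time and memory stays fixed at `size` values, which is the property that matters on a device with very limited memory and CPU.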

TDengine: a big data engine better suited to the edge side

A time-series database, which has all of the above capabilities, is the best choice for edge data storage. However, time-series databases such as OpenTSDB (built on top of HBase) and InfluxDB are too heavy for the edge side, and their runtime hardware resource cost is too high. TDengine is an extremely lightweight open source time-series database whose entire installation package is just over 2MB. Its core function is a high-performance distributed time-series database; in addition, it provides message queuing, caching, stream computing, data subscription and other functions, offering an all-in-one solution for storing structured time-series data.

The TDengine community has released versions supporting ARM32 and ARM64 processors that run smoothly on mainstream edge hardware such as the Raspberry Pi, while providing real-time data caching, historical data backtracking, aggregation by time, and more. Although a distributed cluster is rarely needed at the edge, a few Raspberry Pis, boxes or gateways can be joined into a cluster if desired.

The TDengine ARM version also supports a wide variety of interfaces, with little difference from the regular cluster version. The taos shell client is provided as well, so that anyone debugging can easily check the running status of TDengine.

TDengine's approach to edge-cloud collaboration

Edge resources are limited, and so is the amount of data that can be stored there. Therefore, data should be backed up to, and coordinated with, the cloud. There are many approaches to edge-cloud collaboration; here are some of ours.

An example makes this easier to understand. A factory has many gateways on the edge side. We can install an edge-side version of TDengine on each gateway, where it becomes the edge-side storage engine that persists the data collected by the gateway. Depending on the data collection frequency and the compression achieved, the edge side can keep the raw data for a chosen length of time (such as one month to half a year) within the storage resources available. For integer or floating-point data, TDengine can compress the data to about 10% of its original size, depending on the data type. If the values change too randomly, the compression ratio suffers, but overall it still stays around 10%. Therefore, even a 1GB or 2GB SD card in the gateway can hold on the order of 10GB of raw data. This magnitude is sufficient for edge-side real-time analysis.
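
The storage estimate above is simple arithmetic, sketched below in Python. The 10% compression ratio comes from the text; the number of measurement points, the sampling interval and the 16 bytes per record are purely hypothetical inputs, and the calculation ignores file system and index overhead.

```python
def raw_capacity_bytes(card_bytes: int, compression_ratio: float) -> float:
    """Raw data volume that fits on the card, given the compressed/raw size ratio."""
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    return card_bytes / compression_ratio


def retention_days(card_bytes: int, compression_ratio: float,
                   points: int, interval_s: float, bytes_per_record: int) -> float:
    """Days of raw history that fit for `points` series sampled every `interval_s` seconds."""
    raw_per_day = points * (86_400 / interval_s) * bytes_per_record
    return raw_capacity_bytes(card_bytes, compression_ratio) / raw_per_day


GB = 1024 ** 3

# A 1GB card at a 10% compression ratio.
print(raw_capacity_bytes(1 * GB, 0.10) / GB)

# Hypothetical workload: 100 measurement points, one 16-byte record per second each.
print(retention_days(1 * GB, 0.10, points=100, interval_s=1.0, bytes_per_record=16))
```

With these assumed inputs the card holds about 10GB of raw data, or a little under 78 days of history, which falls inside the "one month to half a year" range mentioned above.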

However, if a longer history needs to be stored, or further analysis such as big data mining is required, the data should be synchronized to the cloud data center for storage. The edge-side version of TDengine can be accessed directly by a TDengine client in the cloud (provided the network is reachable), so data synchronization from the edge to the cloud is very simple. The cloud application can pull the latest data from the edge gateway in real time through TDengine's subscription module, and then write the received incremental data into its local TDengine cluster in real time for historical archiving. Under the hood, this mechanism is essentially a periodic query, so TDengine lets users synchronize edge data selectively by adding data filters (for example, pulling only records whose collected value exceeds a certain threshold), without having to report all historical data to the cloud.
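
The "periodic query with a filter" idea can be sketched as follows. This is a stand-in written in plain Python, not TDengine's subscription API: the edge store is modeled as a list of rows, and `pull_incremental` and its parameters are hypothetical names.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, float]  # (timestamp in ms, value)


def pull_incremental(edge_rows: List[Row], last_ts: int,
                     keep: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """One polling round: filtered rows newer than last_ts, plus the new watermark."""
    new_rows = [r for r in edge_rows if r[0] > last_ts]
    # The watermark advances past every row seen, including filtered-out ones,
    # so rejected rows are not examined again on the next round.
    watermark = max((r[0] for r in new_rows), default=last_ts)
    return [r for r in new_rows if keep(r)], watermark


edge = [(1000, 20.5), (2000, 81.0), (3000, 79.9)]
rows, wm = pull_incremental(edge, last_ts=0, keep=lambda r: r[1] > 80.0)
# rows == [(2000, 81.0)], wm == 3000
```

The cloud side would call this on a timer, write `rows` into its own store, and pass `wm` back in as `last_ts` on the next round, so only incremental data that passes the filter ever crosses the network.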

Building on TDengine's edge storage advantages and this overall approach to edge-cloud collaboration, Taos Data and EMQ have jointly created an edge solution. EMQ X Neuron, EMQ X Edge, EMQ X Kuiper and TDengine are deployed on the edge-side gateway. Neuron converts the streaming data collected from devices into MQTT messages through protocol parsing and publishes them to EMQ X Edge (the edge-side MQTT broker). Kuiper then stores the data in the TDengine instance deployed on the edge. In this way, applications running at the edge can read and process data from TDengine for real-time display and alarms. The Edge Manager, which EMQ runs on the edge, provides an administrative console that makes it easy to configure the software and manage the other three components. In this solution, the coordination work is left to EMQ.

However, some users already run a TDengine cluster in the cloud and now want to access the TDengine instances on their edge-side industrial devices directly through the TDengine cluster client. This can also be achieved directly with TDengine's data subscription module: the cloud application calls the module to create a series of subscription tasks and pulls the latest incremental data from the edge-side TDengine in real time. This solution hands the collaboration work over to TDengine; of course, it requires that the network be reachable.

Building TDengine for the edge on Raspberry Pi

The following is a brief tutorial on how to compile, install, and run TDengine on a Raspberry Pi.

Preparing the environment

1. Flash the operating system

Flash the operating system to the SD card. TDengine supports Ubuntu 16.04, CentOS 7.0 and above, and other mainstream operating systems.

2. Network Settings

Configure the network environment on the Raspberry Pi, set a static IP and hostname for the development board, and connect it to the network.

3. Download and compile TDengine

Clone the TDengine source from www.github.com/taosdata/TDengine onto the Raspberry Pi, then compile and run it.

The build process

# clone the source code
$ git clone --recursive --recurse-submodules https://github.com/taosdata/TDengine.git

# check out the latest version
$ cd TDengine/
$ git checkout ver-2.0.7.0

# compile and install
$ mkdir build && cd build
$ cmake ../ -DCPUTYPE=aarch64 -DVERNUMBER=2.0.7.0 -DVERCOMPATIBLE=2.0.0.0
$ make && make install

# start taosd
$ systemctl start taosd
$ taosdemo

Once compilation and installation are complete, you will find the taosdemo program that we provide, which makes it easy to experience TDengine's speed for yourself. You can use taosdemo to test TDengine's write and query efficiency.

A simple comparison between TDengine and SQLite

No discussion of data storage on edge-side, embedded devices is complete without SQLite. SQLite is a lightweight database that needs no background server process; it can fairly be called plug and play. It is also the most widely deployed database in the world. On its official website, SQLite even benchmarks itself against fopen() rather than against other databases: "Think of SQLite not as a replacement for Oracle but as a replacement for fopen()". SQLite is a compact library. Of course, SQLite also provides a full set of relational database APIs and even supports transactions, so it is often used in the industry as an embedded relational database.

For comparison, SQLite on Linux is 1.9MB and TDengine is 2.7MB. Both are extremely lightweight. Because TDengine is designed specifically for structured time-series data, it does not support transactions or complex relations between tables, but it provides a time index over time-series data, real-time stream computing, columnar storage with a better compression ratio, downsampling and aggregation by time, a configurable data retention period, and so on. In this sense, TDengine meets the needs of processing time-series data in an edge-side production environment better than SQLite does. The edge-side version of TDengine also connects seamlessly with TDengine products in the cloud. If the network is unreliable, TDengine can cache data automatically and transmit it automatically once the connection is restored, providing edge-cloud collaboration. Here is a diagram to briefly summarize the differences between TDengine and SQLite.
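
To illustrate the "downsampling and aggregation by time" point, here is a sketch of what that operation looks like in SQLite, using Python's built-in `sqlite3` module. The table and column names (`readings`, `ts`, `value`) and the `downsample` helper are hypothetical. SQLite has no notion of time windows, so the window start is derived by integer arithmetic on the timestamp; TDengine's SQL offers windowing through its `INTERVAL` clause instead.

```python
import sqlite3
from typing import List, Tuple


def downsample(conn: sqlite3.Connection, window_ms: int) -> List[Tuple[int, float, int]]:
    """Return (window_start, average, count) per fixed-size time window."""
    return conn.execute(
        """
        SELECT (ts / ?) * ? AS window_start, AVG(value), COUNT(*)
        FROM readings
        GROUP BY window_start
        ORDER BY window_start
        """,
        (window_ms, window_ms),  # integer division truncates ts to its window
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER PRIMARY KEY, value REAL)")  # ts in ms
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(0, 1.0), (30_000, 3.0), (60_000, 10.0), (90_000, 20.0), (120_000, 5.0)],
)

result = downsample(conn, 60_000)  # one-minute windows
# result == [(0, 2.0, 2), (60000, 15.0, 2), (120000, 5.0, 1)]
```

This works, but windowing, retention (a periodic `DELETE` of old rows) and compression all have to be handled by the application, which is the gap a purpose-built time-series engine aims to close.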

As a representative of the emerging time-series databases, TDengine has many advantages, and as an edge storage choice it truly challenges SQLite, the master of its generation; this really is a case of young people not following martial ethics. However, it is important to realize that TDengine and SQLite focus on different problems. It is not necessarily a matter of choosing one over the other; they can be used flexibly according to your own business needs.

