As a branch of the Internet of Things, the Internet of Vehicles (IoV) collects data through on-board terminals, transmits it over wireless networks to cloud service platforms for persistent storage, and ultimately provides personalized services based on real-time and historical data.

At present, start-up Internet of Vehicles companies typically connect fewer than 100,000 vehicles, and the data collection interval is usually longer than 1 second. An HBase-based solution needs at least six machines with 8 cores and 32 GB of memory each. With TDengine as the storage engine, a single machine with 2 cores and 8 GB can handle this volume of data.

The technical architecture

As a time-series processing engine, TDengine can take the place of Kafka, HDFS/HBase/Spark, Redis and other software. This greatly simplifies the design of a big data platform and reduces both development and operating costs. With fewer open-source components to integrate, the system is more robust and data consistency is easier to guarantee.

  • Architecture diagram of the HBase-based solution

  • Architecture diagram of the TDengine-based solution

The data model

Vehicle terminals collect a very large number of data fields. Many companies build their data models according to the national standard ISO 22901, while others use custom data models tailored to their business needs. The collected data nevertheless usually contains the following fields, and this article builds its data model from them.

  • Collection time (timestamp)
  • Vehicle marks (string)
  • Longitude (double precision floating point)
  • Latitude (double precision floating point)
  • Altitude (floating point)
  • Direction (floating point)
  • Speed (floating point)
  • License plate number (string)
  • Vehicle model (string)
  • Vehicle VID (string)

Unlike other time-series databases, TDengine creates a separate table for each vehicle. The columns hold the time-series measurements: collection time, vehicle marks, longitude, latitude, altitude, direction, speed and other collected data. The tags hold fixed descriptive information about the vehicle, such as the license plate number and vehicle model.

One trick is worth noting: floating-point data compresses far worse than integer data. Latitude and longitude are usually accurate to 7 decimal places, so they are multiplied by 1E7 and stored as long integers (bigint). Altitude, direction and speed are multiplied by 1E2 and stored as integers (int).
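The integer-scaling trick can be sketched in C as follows. This is a minimal illustration and not code from TDengine. The helper names `encode_coord`, `decode_coord`, `encode_hundredths` and `decode_hundredths` are hypothetical. Rounding to the nearest integer, rather than truncating, is an assumption.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of TDengine): scale values to integers
 * before storage. Rounds half away from zero; assumes finite, in-range input. */

/* Latitude/longitude in degrees -> bigint column value (degrees * 1e7). */
int64_t encode_coord(double degrees) {
    double scaled = degrees * 1e7;
    return (int64_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* bigint column value (degrees * 1e7) -> degrees. */
double decode_coord(int64_t stored) {
    return (double)stored / 1e7;
}

/* Altitude, direction or speed -> int column value (value * 100). */
int32_t encode_hundredths(double value) {
    double scaled = value * 100.0;
    return (int32_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* int column value (value * 100) -> original value. */
double decode_hundredths(int32_t stored) {
    return (double)stored / 100.0;
}
```

With these helpers, a longitude of 116.3974123 is stored as 1163974123 and an altitude of 52.34 is stored as 5234. Reading a value back only requires dividing by the same factor.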

The statement to create the database is

create database db cache 8192 ablocks 2 tblocks 1000 tables 10000;

The SQL statement for creating the super table is

create table vehicles(ts timestamp, longitude bigint, latitude bigint, altitude int, direction int, velocity int) tags(card int, model binary(10));

If the vehicle's VID is used as the table name (for example, a vehicle with VID 1, license plate number 25746 and model BMW), the statement to create its table is

create table v1 using vehicles tags(25746, 'BMW');
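Creating one sub-table per vehicle is easy to automate. The C sketch below formats the `create table ... using vehicles tags(...)` statement for one vehicle. The function name `build_create_table` and its `-1` error convention are hypothetical. The 10-byte limit mirrors the `model binary(10)` tag declared for the super table, and the quote check is an assumption, because the sketch performs no SQL escaping.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes
 *   create table v<vid> using vehicles tags(<card>, '<model>')
 * into buf. Returns the statement length, or -1 if the model tag would not
 * fit the binary(10) tag, contains a single quote (no escaping is done),
 * or buf is too small. */
int build_create_table(char *buf, size_t cap, int vid, int card, const char *model) {
    if (strlen(model) > 10 || strchr(model, '\'') != NULL) {
        return -1;
    }
    int len = snprintf(buf, cap, "create table v%d using vehicles tags(%d, '%s')",
                       vid, card, model);
    if (len < 0 || (size_t)len >= cap) {
        return -1; /* encoding error or truncated output */
    }
    return len;
}
```

For the example vehicle above, `build_create_table(buf, sizeof buf, 1, 25746, "BMW")` yields `create table v1 using vehicles tags(25746, 'BMW')`.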

Writing data

Again taking vehicle v1 as an example, the SQL statement that writes one record to table v1 is

insert into v1 values (1562150939000,1,2,3,4,5);

To generate test data, rows can be written in batches, for example:

insert into v1 values (1562150939000,1,1,1,1,1) (1562150999000,2,2,2,2,2) (...) (...) ;
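The batch form can also be produced programmatically. The following C sketch appends rows to a single `insert into ... values` statement and checks the buffer bounds. The `sample_t` struct and the `build_batch_insert` function are hypothetical names used for illustration. The values are assumed to be integers that were already scaled as described in the data model section.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* One collected sample, already scaled to integers (hypothetical struct). */
typedef struct {
    int64_t ts;        /* epoch milliseconds */
    int64_t longitude; /* degrees * 1e7 */
    int64_t latitude;  /* degrees * 1e7 */
    int altitude;      /* value * 100 */
    int direction;     /* value * 100 */
    int velocity;      /* value * 100 */
} sample_t;

/* Writes "insert into v<vid> values (...) (...)" into buf.
 * Returns the statement length, or -1 if buf is too small. */
int build_batch_insert(char *buf, size_t cap, int vid, const sample_t *rows, size_t n) {
    int len = snprintf(buf, cap, "insert into v%d values", vid);
    if (len < 0 || (size_t)len >= cap) {
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {
        size_t room = cap - (size_t)len;
        int w = snprintf(buf + len, room, " (%lld,%lld,%lld,%d,%d,%d)",
                         (long long)rows[i].ts, (long long)rows[i].longitude,
                         (long long)rows[i].latitude, rows[i].altitude,
                         rows[i].direction, rows[i].velocity);
        if (w < 0 || (size_t)w >= room) {
            return -1; /* row does not fit: caller should flush and start a new batch */
        }
        len += w;
    }
    return len;
}
```

Batching many rows into one statement cuts the number of round trips to the server. This is the same idea the generator program below uses with 1,000 rows per statement.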

This article uses a C program to generate simulated vehicle data. The program creates 100,000 tables, one per vehicle. It then writes roughly one month of data into each table at one-minute intervals: 44 batches of 1,000 rows, or 44,000 rows per table.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "taos.h"

int main(void) {
    taos_init();

    /* Connect to the local server with the default credentials (TDengine 1.x C API). */
    TAOS *taos = taos_connect("127.0.0.1", "root", "taosdata", NULL, 0);
    if (taos == NULL) {
        printf("failed to connect to server, reason:%s\n", taos_errstr(taos));
        exit(1);
    }

    if (taos_query(taos, "create database db cache 8192 ablocks 2 tblocks 1000 tables 10000") != 0) {
        printf("failed to create database, reason:%s\n", taos_errstr(taos));
        exit(1);
    }

    taos_query(taos, "use db");

    /* Super table: measurements as columns, static vehicle attributes as tags. */
    char sql[65000] = "create table vehicles(ts timestamp, longitude bigint, latitude bigint, altitude int, direction int, velocity int) tags(card int, model binary(10))";
    if (taos_query(taos, sql) != 0) {
        printf("failed to create stable, reason:%s\n", taos_errstr(taos));
        exit(1);
    }

    time_t begin = time(NULL);
    for (int table = 0; table < 100000; ++table) {
        /* One sub-table per simulated vehicle: v0 ... v99999. */
        sprintf(sql, "create table v%d using vehicles tags(%d, 't%d')", table, table, table);
        if (taos_query(taos, sql) != 0) {
            printf("failed to create table v%d, reason:%s\n", table, taos_errstr(taos));
            exit(1);
        }

        /* 44 batches x 1000 rows = 44,000 rows per vehicle, one row per minute,
         * starting at 1561910400000 ms (2019-07-01 00:00:00 UTC+8). */
        for (int loop = 0; loop < 44; loop++) {
            int len = sprintf(sql, "insert into v%d values", table);
            for (int row = 0; row < 1000; row++) {
                len += sprintf(sql + len, "(%lld,%d,%d,%d,%d,%d)",
                               1561910400000LL + 60000LL * (row + loop * 1000LL),
                               row, row, row, row, row);
            }
            if (taos_query(taos, sql) != 0) {
                printf("failed to insert table v%d, reason:%s\n", table, taos_errstr(taos));
                exit(1);
            }
        }
    }
    time_t end = time(NULL);
    printf("insert finished, time spend %ld seconds\n", (long)(end - begin));

    taos_close(taos);
    return 0;
}

Save the C file as test.c and create a makefile in the same directory:

ROOT = ./
TARGET = exe
LFLAGS = -Wl,-rpath,/usr/lib/ -ltaos -lpthread -lm -lrt
CFLAGS = -O3 -g -Wall -Wno-deprecated -fPIC -Wno-unused-result -Wconversion -Wno-char-subscripts -D_REENTRANT -Wno-format -DLINUX -msse4.2 -Wno-unused-function -D_M_X64 -std=gnu99 -I/usr/local/include/taos/

.PHONY: all exe clean

all: $(TARGET)

# Recipe lines must start with a tab character.
exe: test.c
	gcc $(CFLAGS) ./test.c -o $(ROOT)test $(LFLAGS)

clean:
	rm -f $(ROOT)test

After compiling, the test program and the database were run on the same 2-core, 8 GB desktop. Writing took 3946 seconds in total. The data set is 100,000 tables × 44,000 rows = 4.4 billion rows, and 4.4 billion rows / 3946 s ≈ 1.115 million rows per second. Each row carries 5 measurements, so this equals about 1.115 million × 5 ≈ 5.57 million data points per second.

insert finished, time spend 3946 seconds
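The throughput arithmetic can be checked with a few lines of C. The five metric columns are longitude, latitude, altitude, direction and velocity from the super table. Integer division truncates the results, and `compute_throughput` is a hypothetical helper.

```c
#include <stdint.h>

/* Throughput figures for a write test (hypothetical helper). */
typedef struct {
    int64_t rows;           /* total rows written */
    int64_t rows_per_sec;   /* rows / elapsed seconds, truncated */
    int64_t points_per_sec; /* rows * metric columns / elapsed seconds, truncated */
} throughput_t;

throughput_t compute_throughput(int64_t tables, int64_t rows_per_table,
                                int64_t metric_columns, int64_t seconds) {
    throughput_t t;
    t.rows = tables * rows_per_table;
    t.rows_per_sec = t.rows / seconds;
    t.points_per_sec = t.rows * metric_columns / seconds;
    return t;
}
```

Calling `compute_throughput(100000, 44000, 5, 3946)` gives 4,400,000,000 rows, 1,115,053 rows/s and 5,575,266 points/s. These match the figures of about 1.115 million rows/s and 5.57 million points/s quoted above.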

The program runs in a single thread. A multi-threaded version would write even faster, but the current performance is already sufficient for Internet of Vehicles scenarios.
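One way to parallelize the generator is to split the 100,000 tables into contiguous ranges and give each range to its own thread. The POSIX-threads sketch below shows only that partitioning and fan-out. `write_table_stub` is a hypothetical placeholder for the per-table create and insert work. Opening one TDengine connection per thread is an assumption and was not measured in this article.

```c
#include <pthread.h>

#define MAX_THREADS 64

/* Split tables [0, total) into nthreads contiguous ranges. Thread idx gets
 * [*begin, *end); the first (total % nthreads) threads get one extra table. */
void partition_range(int total, int nthreads, int idx, int *begin, int *end) {
    int base = total / nthreads;
    int extra = total % nthreads;
    *begin = idx * base + (idx < extra ? idx : extra);
    *end = *begin + base + (idx < extra ? 1 : 0);
}

typedef struct {
    int begin, end;        /* table range owned by this thread */
    long long tables_done; /* written only by the owning thread */
} worker_arg_t;

/* Hypothetical stand-in for the single-threaded program's per-table work
 * (create table v<table>, then 44 batch inserts). A real version would use a
 * connection opened by this thread with taos_connect. */
static void write_table_stub(int table) {
    (void)table;
}

static void *worker(void *p) {
    worker_arg_t *arg = p;
    for (int t = arg->begin; t < arg->end; ++t) {
        write_table_stub(t);
        arg->tables_done++;
    }
    return NULL;
}

/* Runs nthreads workers over total tables.
 * Returns the number of tables processed, or -1 on error. */
long long run_workers(int total, int nthreads) {
    pthread_t threads[MAX_THREADS];
    worker_arg_t args[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS) {
        return -1;
    }
    for (int i = 0; i < nthreads; ++i) {
        partition_range(total, nthreads, i, &args[i].begin, &args[i].end);
        args[i].tables_done = 0;
        if (pthread_create(&threads[i], NULL, worker, &args[i]) != 0) {
            for (int j = 0; j < i; ++j) {
                pthread_join(threads[j], NULL); /* wait for threads already started */
            }
            return -1;
        }
    }
    long long done = 0;
    for (int i = 0; i < nthreads; ++i) {
        pthread_join(threads[i], NULL);
        done += args[i].tables_done;
    }
    return done;
}
```

Each thread owns a disjoint range of tables, so the threads share no mutable state and need no locking. The main thread sums the per-thread counters only after joining.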

Data query

TDengine applies many query optimizations to time-series data. The results below come from running some common SQL statements against the test data set generated above, and the performance is impressive.

  • Total record count query

  • Detailed data queries for a single vehicle
Query type                              Query time
Latest value of 1 vehicle               2.3 ms
1 vehicle, 1 hour of detail data        2.1 ms
1 vehicle, 1 day of detail data         6.3 ms
1 vehicle, 10 days of detail data       15.4 ms
1 vehicle, 31 days of detail data       31.6 ms

  • Aggregation queries for a single vehicle
Query type                              Query time
1 vehicle, 1-hour aggregation           1.9 ms
1 vehicle, 1-day aggregation            1.7 ms
1 vehicle, 10-day aggregation           2.3 ms
1 vehicle, 31-day aggregation           2.2 ms

  • One-day aggregation queries across multiple vehicles
Query type                              Query time
1 vehicle, 1-day aggregation            3.2 ms
10 vehicles, 1-day aggregation          5.1 ms
100 vehicles, 1-day aggregation         10.4 ms
1,000 vehicles, 1-day aggregation       51.4 ms
10,000 vehicles, 1-day aggregation      455.9 ms
100,000 vehicles, 1-day aggregation     2074.8 ms

  • One-month aggregation queries across multiple vehicles
Query type                              Query time
1 vehicle, 1-month aggregation          3.1 ms
10 vehicles, 1-month aggregation        4.1 ms
100 vehicles, 1-month aggregation       7.7 ms
1,000 vehicles, 1-month aggregation     33.7 ms
10,000 vehicles, 1-month aggregation    289.5 ms
100,000 vehicles, 1-month aggregation   ~1197 ms

  • One-month curve queries across multiple vehicles
Query type                              Query time
1 vehicle, 1-month curve                6.9 ms
10 vehicles, 1-month curve              13.2 ms
100 vehicles, 1-month curve             75.6 ms
1,000 vehicles, 1-month curve           710.9 ms
10,000 vehicles, 1-month curve          7137.6 ms
100,000 vehicles, 1-month curve         32130.8 ms
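The SQL statements behind these benchmarks are not listed. As an illustration only, the C sketch below builds one plausible form of a multi-vehicle aggregation query against the `vehicles` super table. It filters sub-tables by the `card` tag, which the generator sets to the table index, and bounds the time range in epoch milliseconds. The function name, the chosen aggregates and the query shape are assumptions, not the statements that produced the timings above.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical builder: aggregate velocity over the first n_vehicles
 * sub-tables (card tag 0 .. n_vehicles-1) in the window [start_ms, end_ms).
 * Returns the statement length, or -1 on bad arguments or a too-small buffer. */
int build_fleet_agg_query(char *buf, size_t cap, int n_vehicles,
                          int64_t start_ms, int64_t end_ms) {
    if (n_vehicles <= 0 || end_ms <= start_ms) {
        return -1;
    }
    int len = snprintf(buf, cap,
                       "select avg(velocity), max(velocity) from vehicles "
                       "where card < %d and ts >= %lld and ts < %lld",
                       n_vehicles, (long long)start_ms, (long long)end_ms);
    if (len < 0 || (size_t)len >= cap) {
        return -1;
    }
    return len;
}
```

Consider a one-day window that starts at the generator's first timestamp (1561910400000) and covers 100 vehicles. The builder produces `select avg(velocity), max(velocity) from vehicles where card < 100 and ts >= 1561910400000 and ts < 1561996800000`. A curve query would additionally add a downsampling clause such as `interval(1h)`.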

  • Resource consumption

The database server process consumes only about 2.7GB of memory and the CPU footprint is negligible.

Results analysis

The time-series solution provided by TDengine reached an average write speed of about one million rows per second on a single machine. All single-vehicle queries return in real time, and multi-vehicle queries are also very fast. TDengine is an essential tool for the Internet of Vehicles and for the Internet of Things in general.

About TDengine

TDengine is a high-performance, scalable, highly reliable, zero-management IoT big data platform developed by TAOS Data, which owns its full intellectual property. It integrates database, cache, message queue, stream computing and other functions. TDengine is optimized for the characteristics of IoT big data, so its insert and query performance is more than 10 times that of general-purpose big data platforms, and it greatly reduces storage space. It provides a SQL interface and integrates seamlessly with third-party software, which greatly simplifies the architecture of IoT platforms and significantly reduces the complexity and cost of development and operations. TDengine can be widely used in the Internet of Things, the Internet of Vehicles, industrial big data and other fields. TDengine was open-sourced on July 12, 2019, and topped GitHub's global trending list for several consecutive days.

TDengine currently has more than 10,000 stars on GitHub at github.com/taosdata/TD… You are welcome to star us on GitHub!