On Time Series Databases

1. The concept

"Time series database" (TSDB) is the full name of what is sometimes shortened to "timing database" or "sequential database". Compared with a conventional relational (SQL) database, the biggest difference is that a time series database stores records indexed by time, typically written at regular intervals. The figure shows how such a database lays out its data: the index is a TIMESTAMP, followed by columns that record the attribute values at that timestamp. New records are appended periodically as time passes.

A metric corresponds to a table in a relational database. A data point corresponds to a row of that table, and its timestamp is the time at which the data point was generated. A field is an attribute value within the metric that changes with the timestamp, and a tag is additional information, usually attribute information that does not depend on the timestamp.
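
As a rough illustration of these terms, the sketch below models a single data point in Python. The class name, the metric name, and the tag and field keys are all hypothetical and do not come from any particular database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataPoint:
    """One row of a metric: a timestamp plus tags and fields."""
    metric: str                                  # plays the role of the table name
    timestamp: datetime                          # when the point was generated
    tags: dict = field(default_factory=dict)     # descriptive, not time-dependent
    fields: dict = field(default_factory=dict)   # values measured at this timestamp


point = DataPoint(
    metric="indoor_climate",                     # hypothetical metric name
    timestamp=datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
    tags={"sensor_id": "s-17", "floor": "2"},    # hypothetical tags
    fields={"temperature_c": 21.5, "humidity_pct": 40.0},
)
```

The next point from the same sensor would carry the same tags, a later timestamp, and new field values.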

It is also worth looking at how time series data is characterized and classified.

  • By characteristics, time series data falls into two categories:
  1. High frequency, short retention period (data acquisition, real-time display)
  2. Low frequency, long retention period (data presentation and analysis)
  • By sampling interval:
  1. Regular intervals (data acquisition)
  2. Irregular intervals (event-driven)
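
The regular versus irregular distinction can be checked mechanically. The helper below is a minimal sketch, assuming timestamps are given in seconds and already sorted; the function name and the tolerance parameter are illustrative.

```python
def is_regular(timestamps, tolerance=0.0):
    """Return True if consecutive timestamps (in seconds) are evenly spaced."""
    if len(timestamps) < 3:
        return True  # fewer than two gaps: nothing to compare
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return max(gaps) - min(gaps) <= tolerance


sampled = [0, 900, 1800, 2700]   # one reading every 15 minutes
events = [0, 40, 1310, 1325]     # event-driven arrivals

print(is_regular(sampled))  # True
print(is_regular(events))   # False
```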

A time series database differs from a relational database in the following ways:

  • Because data is collected frequently, the data volume of a time series database is very large, and time series data has become one of the fastest-growing data types.
  • The data grows over time and is evaluated by dimension, while the dimensions of the data stay almost constant.
  • Writes are continuous and highly concurrent: the more devices, the more writes, and because sampling is periodic the write volume is stable. Updates are rare (the data a device produced at a given point in time does not change), and so is deletion of individual data points (usually only all data within an expired time range is deleted).
  • Periodic data collection keeps the volume of incoming data very stable.
  • Because every record is tied to a time, data is time-sensitive, and old data is used less and less as it ages.
  • Relational databases are dominated by query tasks, whereas time series databases are dominated by write tasks. An LSM tree is therefore often used instead of the B-tree typical of SQL databases, to improve write performance.
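
To make the LSM tree point more concrete, here is a toy sketch of the idea, not the implementation of any real database. Writes land in an in-memory table and are flushed as sorted, immutable runs, so nothing already written is modified in place. The class name and the memtable_limit parameter are hypothetical, and compaction (merging runs) is left out.

```python
import bisect


class TinyLSM:
    """Toy LSM-style store: writes go to memory, flushes produce sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}    # recent writes, cheap to insert into
        self.segments = []    # flushed runs, each a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # One sequential write of a sorted run; older runs are never rewritten.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):   # newest run first
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None


store = TinyLSM(memtable_limit=2)
for ts in [0, 900, 1800]:
    store.put(ts, 20.0 + ts / 900)   # timestamp in seconds -> made-up reading
```

A read may have to look in several runs, which is the price paid for cheap writes; a B-tree makes the opposite trade by updating pages in place.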

2. Application scenarios

I have personally worked on projects that used time series databases, and in my view they are becoming widely used as the big data era develops. Take one project I participated in as an example: a large shopping mall measures a variety of indoor parameters, including temperature, humidity, air pressure, electricity consumption, and illumination, so that it can give customers the best possible experience while using resources as efficiently as possible. The sensors send data back to the database every 15 minutes, and engineers in the back office analyze how the mall environment changes and adjust it accordingly. This data is time-sensitive and is rarely used once it is more than a week old. It also shows strong periodicity and seasonality: analyzing it readily reveals daily patterns as well as monthly and quarterly trends, which makes it possible to regulate the mall environment more precisely and effectively.
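
Since this data is rarely used after a week, one common way to handle it is a retention window. The sketch below assumes a seven-day window and simply drops older points; whether the project actually deleted or archived old data is not stated here, and the function name and readings are made up.

```python
from datetime import datetime, timedelta


def drop_expired(points, now, retention=timedelta(days=7)):
    """Keep only (timestamp, value) pairs that fall inside the retention window."""
    cutoff = now - retention
    return [(ts, value) for ts, value in points if ts >= cutoff]


now = datetime(2024, 1, 15, 12, 0)
readings = [
    (datetime(2024, 1, 1, 12, 0), 20.9),    # two weeks old: expired
    (datetime(2024, 1, 10, 12, 0), 21.4),   # five days old: kept
    (datetime(2024, 1, 15, 11, 45), 21.7),  # latest 15-minute sample: kept
]
recent = drop_expired(readings, now)
```

Expiring a whole time range at once matches the observation that individual data points are almost never deleted.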

The same reasoning extends to many other fields, such as financial analysis, the Internet industry, and environmental monitoring. Any project concerned with change over time, whether that means analyzing the patterns and characteristics of change, forecasting trends from historical data, or training time series models, needs a large amount of time series data. Because the results of such work are valuable for industry planning, cost calculation, and similar tasks, time series analysis is becoming more and more important, and so is time series data. Time series data is huge in volume and its workload is dominated by writes, so a relational database, which is designed mainly around queries, cannot store it very efficiently. Write-optimized stores such as HBase and Cassandra, which are often used as the storage layer for dedicated time series databases, handle time series data better.

3. Challenges for time series databases

The biggest challenge for a time series database is how to carry out its various tasks on massive amounts of data. This breaks down into the following aspects:

  • Time series data writes: the write frequency is sometimes very high, up to tens of thousands of writes per second, so supporting such intensive writes is a challenge for a time series database.
  • Time series data reads: because time series data is sampled at high frequency, supporting intensive grouped aggregation is also a challenge.
  • Storage cost: because of the large amount of data, we need to consider how to store it at low cost.
  • Usability: how to design a time series database that current IT staff can pick up quickly.
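
The grouped aggregation mentioned in the list above usually means bucketing points by time window and reducing each bucket. The following is a minimal in-memory sketch, assuming timestamps in seconds; the function name and sample values are illustrative, and a real database would do this over far more data than fits in a Python list.

```python
from collections import defaultdict


def bucket_mean(points, bucket_seconds):
    """Average (timestamp_seconds, value) pairs per fixed-width time bucket."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        bucket = ts - ts % bucket_seconds   # start of the bucket containing ts
        sums[bucket] += value
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sorted(sums)}


# Five 15-minute readings rolled up into hourly averages.
points = [(0, 20.0), (900, 22.0), (1800, 21.0), (2700, 23.0), (3600, 24.0)]
hourly = bucket_mean(points, 3600)
print(hourly)  # {0: 21.5, 3600: 24.0}
```

Storing such roll-ups in place of old raw points is also one way to approach the storage cost item, at the price of losing fine-grained detail.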