Compiled by Er Yue | Guest: Liao Haojun

Editor’s note: In recent years, emerging technologies have driven rapid growth in the Internet, the industrial Internet, and related sectors. The volume of time-series data they generate keeps growing, general-purpose big data solutions are increasingly inadequate for it, and a variety of time-series database products have emerged in response. Migration has therefore become one of the operational difficulties that enterprises face.

Compared with TDengine, OpenTSDB is an earlier player. Its HBase-based product model has both advantages and disadvantages: it lowers the entry threshold for enterprises that already run HBase-based services, but its heavy reliance on HBase creates bottlenecks in performance and compression. As an enterprise’s business keeps expanding, the deployment cost of the monitoring system rises and its operating efficiency becomes a growing concern. Over time, the negative effects of OpenTSDB’s shortcomings come to outweigh its benefits, as SF Express’s business case shows.

Previously, SF Technology used OpenTSDB as the storage solution for all of its monitoring data. It was costly to operate and maintain, and its performance increasingly fell short: the large-span, high-frequency queries run every day could no longer keep up with business needs, and some queries took more than ten seconds to return. As the number of users grew, OpenTSDB, which supports only a relatively low QPS, became prone to crashes that could make the entire service unavailable. After investigating the time-series database products on the market, SF Technology decided to migrate to TDengine.

To help enterprises with migration needs like SF Technology’s, we held a TDengine open technical class. This article is based on the talk “TDengine Technology Insider: OpenTSDB Compatibility” given by Liao Haojun, co-founder of Taos Data, and answers common questions at the level of concrete operations.

Considerations for migrating from OpenTSDB to TDengine

Before making the migration decision, some readers may wonder how compatible TDengine is with OpenTSDB. The question has two dimensions: full compatibility with the write protocols, and full coverage of the query functions. On the first, you can write all your data into TDengine without changing a single line of code, which is very convenient. On the second, TDengine covers all of OpenTSDB’s query functionality, but there are two important points to note: first, TDengine does not support OpenTSDB’s query syntax, and second, it does not provide equivalent metadata queries.

There are three main reasons for not supporting OpenTSDB’s query syntax:

First of all, the overall semantics of an OpenTSDB query expression are relatively complex, yet the query syntax has weak expressive power, so it cannot describe a complete set of query capabilities. Its rules are also inconsistent with SQL conventions and cannot be used for complex applications. Take the following SQL query as an example: it is an uncorrelated subquery, which cannot be expressed in OpenTSDB’s JSON query format.

SELECT id, SUM(avg_conn)
FROM 
(
  SELECT AVG(connection) as avg_conn, id 
  FROM t1 INTERVAL(10s) 
  GROUP BY id
)
GROUP BY id

Second, compared with OpenTSDB, TDengine provides richer query and aggregation functions, such as TWA, IRATE, LEASTSQUARES, TOP, BOTTOM, and LAST_ROW. If TDengine were restricted to OpenTSDB’s query syntax, you would not be able to use all of the query capabilities it provides. Third, OpenTSDB applies its own ad hoc interpolation mechanism during query processing, which makes it difficult for TDengine to return completely identical query results.

In addition to the syntax incompatibility, TDengine does not support OpenTSDB’s metadata query functions or some of its other query-related functions. The specific reasons are as follows:

  • OpenTSDB uses the /api/stats interface to monitor cluster service status. TDengine has its own mechanism for monitoring cluster status, which differs from OpenTSDB’s.
  • /api/tree is a hierarchical organization of timelines (time series) that is unique to OpenTSDB. TDengine organizes and maintains timelines in a “database → super table → sub-table” hierarchy. All timelines belonging to the same super table, that is, all of its sub-tables, sit at the same physical level in the system, so there is no hierarchy or subordinate relationship between different sub-tables. If you really need OpenTSDB’s tree behavior, you can design your tags so that they establish a logical hierarchy between sub-tables.
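As a rough illustration of the tag-based approach, the sketch below builds a logical tree over flat sub-tables from their tag values. The tag names (`region`, `site`) and table names are hypothetical, and the code is plain Python rather than any TDengine API.

```python
# Hypothetical sub-tables of one super table, each with its tag values.
# Physically they all sit at the same level; the hierarchy exists only in the tags.
child_tables = {
    "d1001": {"region": "south", "site": "shenzhen"},
    "d1002": {"region": "south", "site": "guangzhou"},
    "d1003": {"region": "north", "site": "beijing"},
}


def build_tag_tree(tables, levels):
    """Group table names into a nested dict that follows the given tag order."""
    tree = {}
    for name, tags in sorted(tables.items()):
        node = tree
        # Walk (or create) one nested dict per tag level except the last.
        for level in levels[:-1]:
            node = node.setdefault(tags[level], {})
        # The last level holds the list of sub-table names.
        node.setdefault(tags[levels[-1]], []).append(name)
    return tree


tree = build_tag_tree(child_tables, ["region", "site"])
```

Changing the order of `levels` yields a different logical view over the same flat set of sub-tables, which is the flexibility that a tag-based hierarchy offers.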

In addition, OpenTSDB provides a special feature called Rollup and Pre-Aggregates. In essence, it is an automated downsampling mechanism: data written to the system is aggregated over a preset time window, and the result is written back to the system, where it is visible to the user. The mechanism was introduced to address the performance of queries over raw data and to help users reduce query processing overhead and response time.
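The idea of aggregating written data over a preset window can be sketched as follows. This is a simplified model with a hypothetical function name and made-up sample points; it is not OpenTSDB’s actual implementation, and it uses an average as the only aggregate.

```python
from collections import defaultdict


def rollup_avg(points, window_s):
    """Aggregate (timestamp_s, value) points into fixed windows.

    Returns {window_start: average of the values in that window}.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


raw = [(0, 10.0), (5, 20.0), (10, 30.0), (19, 50.0)]
rolled = rollup_avg(raw, window_s=10)
```

A query that only needs 10-second averages can then read `rolled` instead of scanning `raw`, at the cost of the application having to know that the rolled-up series exists.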

With this capability, users can redirect a large proportion of their standing queries (sampling or aggregation queries issued periodically to fetch reports or refresh monitoring dashboards) to the downsampled results and obtain a significant performance improvement, because each query becomes a simple data read. An obvious problem, however, is that this strategy is intrusive to the application. In some scenarios, for example when the downsampled result does not meet the query requirements, the application must fall back to the raw data and issue a new query to obtain the result it needs.

Therefore, the shortcoming of Rollup and Pre-Aggregates is that it is not transparent to applications, which makes application processing logic extremely complex and hard to port. It is a compromise made when a time-series database cannot provide high-performance aggregate queries. TDengine does not support this kind of automatic downsampling at the system function level.

How is TDengine’s query performance optimization strategy implemented?

TDengine has a built-in, user-transparent, block-level pre-computation mechanism (SMA) that delivers very fast query responses and simplifies the application’s query processing logic.

The figure above shows the process. Red and green represent two timelines. Data from the two timelines arrives interleaved when it is written, and once inside the system the data for each timeline is grouped together. Every data block goes through this pre-computation step, which is carried out by the commit thread. As shown in the figure, four values are pre-computed: count, max, min, and sum. Each block is preceded by a small pre-computation section. This structure is transparent to users and applications.

Suppose we issue a query over the time range from second 3 to second 13, the part framed in red above. Because pre-computation is built at the data block level, a pre-computed result is only meaningful for a block that lies entirely inside the query range. Here only one block qualifies, so only that block’s pre-computed result is read. For the partially covered blocks at either end of the red frame, the real data must be read before the computing engine can produce a result.

The real data, however, is very large compared with the pre-computed data. If a block holds 4,000 records, all 4,000 must be read when the real data is needed. If the pre-computed data can be used instead, a single number may be enough, such as the max in the figure above. This greatly improves I/O performance and reduces I/O pressure during queries.
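The effect can be sketched in a few lines of Python. This is a simplified model under assumed data (three five-row blocks, one row per second) and hypothetical names (`Block`, `query_max`); it is not TDengine’s storage code.

```python
class Block:
    """A data block of (timestamp, value) rows with aggregates computed once at creation."""

    def __init__(self, rows):
        values = [v for _, v in rows]
        self.rows = rows
        self.start, self.end = rows[0][0], rows[-1][0]
        # The four pre-computed values named in the text.
        self.count = len(values)
        self.max_v = max(values)
        self.min_v = min(values)
        self.sum_v = sum(values)


def query_max(blocks, t0, t1):
    """Return (max value in [t0, t1], number of raw rows that had to be read)."""
    best, rows_read = None, 0
    for b in blocks:
        if b.end < t0 or b.start > t1:
            continue  # block does not overlap the query range
        if t0 <= b.start and b.end <= t1:
            candidate = b.max_v  # fully covered: the pre-computed value is enough
        else:
            rows_read += len(b.rows)  # partially covered: read the raw rows
            in_range = [v for ts, v in b.rows if t0 <= ts <= t1]
            if not in_range:
                continue
            candidate = max(in_range)
        best = candidate if best is None else max(best, candidate)
    return best, rows_read


blocks = [
    Block([(t, float(t % 7)) for t in range(0, 5)]),    # seconds 0..4
    Block([(t, float(t % 7)) for t in range(5, 10)]),   # seconds 5..9
    Block([(t, float(t % 7)) for t in range(10, 15)]),  # seconds 10..14
]
result, rows_read = query_max(blocks, 3, 13)
```

For the range from second 3 to second 13, only the two edge blocks are read row by row; the middle block contributes its stored max without touching its rows.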

TDengine ensures efficient query processing with this built-in mechanism rather than pushing the work to the application layer and leaving the complexity to the user. To sum up, when migrating from OpenTSDB to TDengine, you only need to adjust the read-side code and logic of your application to see quick benefits: more efficient query and computation functions, faster response times, and lower storage overhead, which together greatly reduce hardware and software deployment costs.

Two ways of writing you need to know

As mentioned earlier, applications that write to OpenTSDB can be switched to write to TDengine without changing a single line of code. How is this possible? There are two ways: a high-level language can call the C interface directly through its cross-language mechanism, or the application can write through the HTTP interface in a RESTful way.

As shown above, if your application is written in a high-level language such as C#, Java, Rust, or Python, you will need to recompile it against TDengine’s libraries. The application then reaches the TDengine driver through the language’s connector, and the write logic is encapsulated in the driver. All OpenTSDB write protocols are handled by a C interface called taos_schemaless_insert, which writes the data directly into TDengine in structured form.

Some of you may wonder why this logic is wrapped in the driver rather than implemented in each high-level language. The reason is simple: all the high-level language connectors call the C interface across languages, and TDengine does not provide native implementations for each language. If the parsing and conversion needed for OpenTSDB compatibility were implemented at the high-level language layer, the same complex logic would have to be written once per language. Moreover, in production the multi-threaded environment makes the problem more complex and creates many boundary cases to handle.

So we decided to push this logic down into the C layer and let all the high-level languages call the C interface directly. This not only greatly reduces the complexity of the high-level language connectors, but also simplifies the architecture and speeds up its evolution.

This is how high-level languages write data through the native interface. If you decide to go RESTful, it is even easier: just change the IP address and port in your configuration file and deploy a component called taosAdapter.

taosAdapter is an HTTP service written in Go that we recently open-sourced. As shown in the figure above, on the user side you can post data in any of the OpenTSDB protocols directly to the port opened by taosAdapter. taosAdapter then uses the underlying driver to connect to TDengine and writes the data directly, without requiring any further action from the user.
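For reference, the two OpenTSDB write formats look roughly like this. The sketch only builds the payloads using OpenTSDB’s documented telnet `put` line and JSON data point layout; the helper names and the sample metric are illustrative, and the address and port to send them to depend on your taosAdapter deployment, so no network call is shown.

```python
import json


def to_telnet_line(metric, timestamp, value, tags):
    """Format one data point as an OpenTSDB telnet-style put line."""
    tag_part = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_part}"


def to_json_point(metric, timestamp, value, tags):
    """Format one data point as an OpenTSDB JSON data point."""
    return json.dumps(
        {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags},
        sort_keys=True,
    )


# Illustrative data point; the metric and tag names are examples only.
point = ("sys.cpu.nice", 1346846400, 18, {"host": "web01", "dc": "lga"})
telnet_line = to_telnet_line(*point)
json_body = to_json_point(*point)
```

An application that already emits these formats for OpenTSDB keeps producing exactly the same payloads; only the destination changes.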

How data writing is implemented with taosAdapter

taosAdapter architecture

Users can use the services provided by taosAdapter through the JDBC-RESTful package or call taosAdapter’s HTTP interface directly. The structure is shown in the figure above: the lowest layer is the TDengine driver, followed by the Go connector, the connection pool, and the HTTP modules.

The taosAdapter has the following technical features:

  • Can be deployed separately from TDengine

Before introducing this technical feature, there is a question to consider. We all know TDengine itself provides HTTP services, so why develop a taosAdapter to receive OpenTSDB data?

First of all, the most CPU-intensive part of writing data is parsing and converting it (SQL → binary data). You can use dynamic binding to reduce CPU consumption somewhat, but the gain is very limited. When you write in native mode, SQL parsing and binary conversion are done on the client side. When you write over HTTP, the SQL statement is posted directly to the server, which must do this work itself. To reduce the burden on the server, we decided to separate this operation from the TDengine service.

This not only reduces the load on the server, but also lets you deploy as many taosAdapter instances as you need for writing. For example, if you deploy two TDengine nodes, the built-in HTTP service gives you only two HTTP endpoints, whereas you can deploy four or even five taosAdapter instances for writing. The number of taosAdapter nodes can be adjusted flexibly to match the actual write load, independently of the number of nodes in the TDengine cluster.

At the same time, taosAdapter is a stateless protocol conversion service: it converts HTTP requests into TDengine’s internal exchange protocol and writes the data into the system. Being stateless makes it easy to scale out, so a taosAdapter cluster can be deployed flexibly and efficiently. It also greatly reduces the load on the TDengine server itself, freeing resources to support heavier query processing.

  • Supports the OpenTSDB write protocol

Writing over HTTP is completely transparent to the writing application. You do not even need to recompile; just adjust the IP:PORT that the application writes to, and the data flows into TDengine seamlessly. Note that the IP:PORT here is the address of the taosAdapter service, not of TDengine. The HTTP service embedded in TDengine does not support the OpenTSDB protocols, which is an important point to keep in mind.

  • Supports parallel writes to separate databases

As we all know, OpenTSDB has no concept of selecting a database, whereas TDengine must write data into a specific database. This raises the question of which database to write to. Our solution is for taosAdapter to support parallel writes to separate databases.

Figure 1

Figure 2

The specific operation is shown in the two figures above. Through port mapping, taosAdapter lets different applications write to different databases. Multiple ports can be configured in the taosAdapter configuration file, each mapped to a different TDengine database, so that application data is automatically written to the right one. Combined with read and write permissions, access to the data of different systems can be controlled effectively.

For this to work, the administrator must create the databases manually (this is transparent to the writing application) and add them to the configuration file. Once that is done, the service can be started. Your application then only needs to point to the correct IP address and port, and its data will be written to the correct database.

The figure above shows the parallel write process in detail. First, the port-to-database mapping is configured in taosAdapter, for example 6046: DB1 and 6047: DB2. Taking 6046 as an example, data received on port 6046 is mapped directly to DB1: taosAdapter obtains a connection to TDengine, switches that connection to DB1, and writes the data. After the write completes, the database associated with the connection is cleared, so the connection returns to a state in which no database is selected. The same applies to port 6047, except that the data is written to DB2. This is how parallel writes to separate databases are implemented.
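The select, write, and clear sequence described above can be modeled as follows. `FakeConnection` and `handle_write` are hypothetical stand-ins written for this illustration; they are not taosAdapter’s code, which is implemented in Go.

```python
PORT_TO_DB = {6046: "DB1", 6047: "DB2"}  # the mapping used as the example in the text


class FakeConnection:
    """Stand-in for a pooled TDengine connection; it only records what it is asked to do."""

    def __init__(self):
        self.current_db = None
        self.written = []

    def use(self, db):
        self.current_db = db

    def write(self, line):
        if self.current_db is None:
            raise RuntimeError("no database selected")
        self.written.append((self.current_db, line))

    def reset(self):
        self.current_db = None


def handle_write(conn, port, lines):
    """Write lines received on `port` into the database mapped to that port."""
    conn.use(PORT_TO_DB[port])
    try:
        for line in lines:
            conn.write(line)
    finally:
        # Always return the connection to the "no database selected" state.
        conn.reset()


conn = FakeConnection()
handle_write(conn, 6046, ["put cpu 1 0.5 host=a"])
handle_write(conn, 6047, ["put mem 1 0.7 host=b"])
```

Clearing the database in a `finally` block is one way to make sure a reused connection never carries the previous request’s database into the next write.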

The above is the analysis of several characteristics of taosAdapter. Let’s look at the difference between TDengine’s schemaless write performance and normal SQL statement write performance.

As a simple comparison, in a test writing 10 million records, taosAdapter’s write performance was about 74.97% of normal SQL write performance. In other words, if you can write 1 million records per second with normal SQL, taosAdapter can handle roughly 750,000 records per second. With this comparison, you can estimate the approximate size of the system you need to deploy to support your business throughput.
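The arithmetic behind such an estimate is straightforward. The sketch below assumes the 74.97% ratio holds at other throughput levels, which the single test above does not guarantee; the function names are illustrative.

```python
import math

SCHEMALESS_RATIO = 0.7497  # ratio reported in the text for the 10-million-record test


def estimated_schemaless_rate(sql_records_per_sec):
    """Estimate taosAdapter throughput from a measured normal-SQL throughput."""
    return sql_records_per_sec * SCHEMALESS_RATIO


def sql_rate_needed(target_records_per_sec):
    """Normal-SQL capacity that corresponds to a target taosAdapter throughput."""
    return math.ceil(target_records_per_sec / SCHEMALESS_RATIO)


rate = estimated_schemaless_rate(1_000_000)
needed = sql_rate_needed(750_000)
```

Treat the result as a starting point for capacity planning and confirm it with a load test on your own hardware and data.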

The figure below shows the load during the write process. The yellow line is taosd and the green line is taosAdapter. As you can see, taosAdapter’s CPU consumption is high, mainly because of string-related numeric format conversion.

Closing thoughts

taosAdapter currently supports a wide range of protocols and write formats, including OpenTSDB’s JSON and telnet protocols. It also supports InfluxDB v1, StatsD, collectd, TCollector, Icinga2, and node_exporter. We will explore support for other exporters later.

In addition, we will continue to optimize taosAdapter’s performance. taosAdapter currently consumes a lot of CPU, and we will improve its architecture and performance as far as is feasible. First, some internal synchronous interfaces will be changed to asynchronous calls to raise service throughput. Second, we will further streamline its processing logic to reduce CPU overhead during writes.

