Design and implementation of Cobar SQL audit

Background

Introduction of Cobar

Cobar is an open source database middleware product of Alibaba.

As a business grows rapidly, the database often becomes the bottleneck of the whole system. Database middleware emerged as an intermediate layer to relieve that bottleneck.

There is no problem in software engineering that can’t be solved by adding an intermediate layer. If there is, add another layer.

Proxy-type database middleware (this article does not discuss client-SDK-type middleware) typically offers the following capabilities:

Transparent database proxying, so that applications are unaware of the middleware
Horizontal and vertical splitting of databases and tables, so that capacity and performance scale horizontally
Read/write splitting to reduce the load on the primary database
Connection reuse to reduce the number of database connections consumed
Detection of database cluster failures and fast failover
Sufficient reliability and performance

Cobar, the subject of this article, supports all of these features well except read/write splitting, and it is not difficult to build read/write splitting on top of Cobar.

SQL audit

I was fortunate enough to do custom development on Cobar at my company; the feature was SQL auditing.

From the operational perspective of a database product, statistical analysis of executed SQL is a necessary function. From a security perspective, information leaks and abnormal SQL also need to be audited.

What information does a SQL audit need to collect? After some investigation, we settled on roughly these dimensions: the SQL statement executed, the execution time, the source host, and the number of rows returned.

The requirements for SQL audit are very simple. But database middleware runs under high concurrency and low latency, and a single node can reach tens of thousands to hundreds of thousands of QPS, so even a very simple requirement needs careful design and strict testing.

For example, getting the operating system time in Java is normally just a call to System.currentTimeMillis(). That is fine in most programs, but in Cobar, obtaining the time this way on every request causes a very large performance loss. (How is this solved? Check out the code in Cobar’s GitHub repository.)
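One common workaround for this kind of cost is to cache the clock: a background task refreshes a shared timestamp at a fixed interval, and hot paths read the cached value instead of asking the operating system each time. The sketch below illustrates that general technique only. The class name CachedClock and the 20 ms refresh interval are assumptions made for illustration, and it may differ in detail from what Cobar actually does.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not Cobar's actual class): caches the wall-clock time
// so that hot paths read a volatile field instead of calling the OS each time.
final class CachedClock {
    // Last observed time; volatile so that all threads see the refreshed value.
    private static volatile long now = System.currentTimeMillis();

    private static final ScheduledExecutorService UPDATER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cached-clock");
            t.setDaemon(true); // do not keep the JVM alive just for the clock
            return t;
        });

    static {
        // Assumed refresh interval of 20 ms; this bounds how stale a reading can be.
        UPDATER.scheduleAtFixedRate(
            () -> now = System.currentTimeMillis(), 20, 20, TimeUnit.MILLISECONDS);
    }

    private CachedClock() {}

    // Cheap read: returns the cached timestamp, accurate to about one interval.
    static long currentTimeMillis() {
        return now;
    }
}
```

The trade-off is precision: a reading can lag the real clock by up to one refresh interval, which is usually acceptable for audit timestamps.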

Technical solution

General direction

After investigation, we found two directions for implementing SQL audit:

The more obvious one is to modify the Cobar code directly, adding instrumentation points wherever information needs to be gathered
The other is the approach offered by Alibaba Cloud’s database service, which captures the database’s network traffic and analyzes it

Considering the technical complexity, we chose the first, simpler approach.

SQL auditing is an “icing on the cake” requirement in Cobar. It must not degrade Cobar’s performance or make Cobar unavailable. Therefore, the following two rules must be followed:

Performance must stay as close as possible to the version without SQL audit
It must never make Cobar unusable in any way

As for performance degradation, you can’t optimize what you haven’t measured, so we used Sysbench, a database benchmarking tool, to stress-test the current version of Cobar.

Cobar was deployed on a 4-core, 8 GB machine and MySQL on a sufficiently powerful physical machine. The stress test produced a baseline of 55,000/s, and every subsequent version was compared against this number.

Because this approach intrudes on Cobar’s code, the modifications had to be kept minimal to limit the impact on Cobar, so we adopted an Agent design.

This keeps code changes to a minimum: Cobar only needs to collect the audit information and hand it to the Agent, and the logic for shipping it to the remote end lives entirely in the Agent. Kafka was chosen almost immediately for that remote transport. This design also keeps Cobar from taking on new third-party dependencies and keeps its code clean (Cobar’s only third-party dependency is Log4j), and running the Kafka client and Cobar in two separate JVMs gives even better isolation. This resulted in the first draft of the architecture shown below

From the figure above, two key technical points emerge: communication between threads and communication between processes.

Interprocess communication is easy to understand, but why is interthread communication involved here?

Cobar’s execute thread is the main thread that executes SQL. If interprocess communication were performed in this thread, performance would certainly suffer. That work is therefore handed off to a separate audit thread, which minimizes the impact on Cobar’s performance.

Interprocess communication

Let’s start with interprocess communication, which is a little easier. We just need to list the available methods, compare their advantages and disadvantages, and choose a suitable one.

Cobar is written in Java, so we limited the scope to TCP, UDP, Unix domain sockets, and files.

After investigation, Unix domain sockets turned out to be too platform-dependent. Java had no official implementation at the time, only third-party ones (such as junixsocket), and our tests showed inconsistent support across Linux versions, so they were ruled out.

Writing files leads to high I/O and, under such high concurrency, even risks filling the disk, so that was ruled out as well.

That left TCP and UDP. UDP performs better than TCP, and with TCP we would have to handle message framing (the “sticky packet” problem) ourselves, so we chose UDP. In fact, the SQL audit requirement resembles log collection and metric reporting, and many log collection and metric reporting systems use UDP.
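To make the UDP choice concrete, here is a minimal sketch of how an audit thread might push one record to a local Agent using the standard java.net.DatagramSocket API. The class name UdpAuditSender, the host and port, and the pipe-delimited record layout are all hypothetical; the article does not describe the actual wire format.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sender: one datagram per audit record, fire-and-forget.
final class UdpAuditSender implements AutoCloseable {
    private final DatagramSocket socket;
    private final InetAddress agentHost;
    private final int agentPort;

    UdpAuditSender(String host, int port) throws IOException {
        this.socket = new DatagramSocket();          // binds to any free local port
        this.agentHost = InetAddress.getByName(host);
        this.agentPort = port;
    }

    // Each datagram is a self-contained message, so no framing logic is needed.
    // Delivery is not guaranteed, which the audit use case tolerates.
    void send(String record) throws IOException {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(payload, payload.length, agentHost, agentPort));
    }

    @Override
    public void close() {
        socket.close();
    }
}
```

Because every datagram carries exactly one record, the receiver never has to split or reassemble a byte stream, which is the framing work that TCP would have required.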

Interthread communication

Interprocess communication could be decided quickly because it does not affect Cobar directly: it only concerns the audit thread and the Agent process. Communication between threads, however, directly determines the performance impact on Cobar, so caution is required.

Communication between threads has to go through an intermediate buffer, for which we have the following requirements:

Bounded: an unbounded buffer may lead to out-of-memory errors
Non-blocking delivery: blocking would stall the main thread and severely hurt Cobar performance
Ordering is not required, and in extreme cases some data may even be lost to preserve Cobar availability
Thread-safe: under high concurrency, an unsafe buffer would corrupt data
High performance

Java built-in queue

Java’s built-in queues could serve as this buffer.

The only bounded ones are ArrayBlockingQueue and LinkedBlockingQueue, but both are lock-based, and my gut told me they would not perform very well.

Since both ConcurrentHashMap and LongAdder in Java reduce lock contention by sharding, we decided to build the buffer from multiple ArrayBlockingQueues.

The measured result was only 47,000/s, a performance loss of about 10%.
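The sharded-queue idea can be sketched as follows. This is an illustration under stated assumptions, not the code that was benchmarked: the class name ShardedBuffer is hypothetical, and picking a shard by thread id is just one possible policy. The sketch relies on ArrayBlockingQueue.offer, which returns false immediately when the queue is full instead of blocking.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical buffer: several bounded queues, so that producers contend
// on different locks instead of a single one.
final class ShardedBuffer<T> {
    private final ArrayBlockingQueue<T>[] shards;

    @SuppressWarnings("unchecked")
    ShardedBuffer(int shardCount, int capacityPerShard) {
        shards = new ArrayBlockingQueue[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new ArrayBlockingQueue<>(capacityPerShard);
        }
    }

    // Non-blocking delivery: returns false (the item is dropped) when the
    // producer's shard is full. Shard choice by thread id is an assumption.
    boolean offer(T item) {
        int idx = (int) (Thread.currentThread().getId() % shards.length);
        return shards[idx].offer(item);
    }

    // Called by the consumer (audit) thread: moves everything buffered into sink.
    int drainTo(Collection<? super T> sink) {
        int drained = 0;
        for (ArrayBlockingQueue<T> shard : shards) {
            drained += shard.drainTo(sink);
        }
        return drained;
    }
}
```

Sharding reduces contention but does not remove the locks, which is consistent with this variant still showing a measurable loss.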

Disruptor

Java’s built-in bounded queues are lock-based. Is there a queue that is both lock-free and bounded? A search turned up Disruptor, an open source lock-free queue implementation used by a number of products such as Log4j2. It is a ring-shaped data structure that uses CAS instead of locks and includes many detailed performance optimizations, which makes it very fast.

Unfortunately, producers block once the Disruptor buffer fills up, and blocking Cobar’s main thread would be disastrous, so we abandoned it.

SkyWalking’s RingBuffer

SkyWalking is an open source application performance monitoring system that covers metrics monitoring, distributed tracing, and performance diagnosis for distributed systems.

It works by using Java bytecode modification to insert instrumentation points at call sites, then collecting and reporting the information. This is similar to Cobar’s collect-and-report flow.

So how does its RingBuffer work? It is actually very simple: the buffer is an array, and each delivery obtains the index of an array slot to write into. Under multithreading this is safe as long as no two threads obtain the same index at the same time. The write speed therefore depends on how efficiently the index is obtained, as shown below:

Obtaining the array index uses CAS, similar to Disruptor. The implementation is very simple, even a bit crude, but when the buffer is full it can be configured to block, overwrite, or ignore. We chose the overwrite strategy, discarding old data in extreme cases in exchange for Cobar availability. We tested a setup with multiple SkyWalking RingBuffers and measured only 30,000/s, a 45% performance loss.
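The mechanism described above, a CAS loop that claims an array index plus an overwrite-when-full policy, can be sketched roughly like this. It is a simplified illustration and not SkyWalking’s actual code; the class name OverwritingRingBuffer and its methods are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical ring buffer: producers claim a slot index with a CAS loop
// and overwrite whatever is still sitting in that slot.
final class OverwritingRingBuffer<T> {
    private final AtomicReferenceArray<T> slots;
    private final AtomicInteger cursor = new AtomicInteger(0);

    OverwritingRingBuffer(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    // CAS loop: retry until this thread wins the right to use 'current'.
    // Under contention many threads spin here, which is the hot spot.
    private int nextIndex() {
        for (;;) {
            int current = cursor.get();
            int next = (current + 1) % slots.length();
            if (cursor.compareAndSet(current, next)) {
                return current;
            }
        }
    }

    // Never blocks: an unconsumed old value in the slot is simply replaced.
    void put(T item) {
        slots.set(nextIndex(), item);
    }

    // Consumer side: takes every non-empty slot, in slot order (not arrival order).
    int drainTo(List<? super T> sink) {
        int drained = 0;
        for (int i = 0; i < slots.length(); i++) {
            T item = slots.getAndSet(i, null);
            if (item != null) {
                sink.add(item);
                drained++;
            }
        }
        return drained;
    }
}
```

With a capacity of 4, putting the values 1 through 6 leaves 5, 6, 3, 4 in the slots: the two oldest records are lost and the drain order differs from arrival order, both of which the requirements listed earlier allow.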

So we made some optimizations to this RingBuffer.

The main optimization replaces the CAS loop with incrementAndGet, which lets us benefit from the incrementAndGet optimization in JDK 8. Before JDK 8, incrementAndGet was itself implemented with a CAS loop; from JDK 8 onward it uses fetch-and-add (a CPU instruction) and performs much better. See “Buffering queues for Extreme Performance” for details and code.
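As a rough illustration of the difference, the index can be claimed with a single incrementAndGet call and no retry loop. The sketch below is not the contributed implementation: the class name FetchAddIndex is hypothetical, and it assumes a power-of-two capacity so that the wrap-around can be done with a bit mask.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical index allocator: one atomic increment per claim, no CAS retry loop.
final class FetchAddIndex {
    private final AtomicLong sequence = new AtomicLong(0);
    private final int mask;

    // Assumption: capacity is a power of two, so (sequence & mask) wraps correctly.
    FetchAddIndex(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.mask = capacityPowerOfTwo - 1;
    }

    // Every caller receives a distinct sequence number, so within one lap of
    // the ring no two threads are handed the same slot.
    int next() {
        return (int) (sequence.incrementAndGet() & mask);
    }
}
```

Every call completes in one atomic step regardless of contention, whereas a CAS loop may retry many times when several threads race for the same index.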

In addition to this major optimization, we added cache-line padding, following Disruptor’s example. The result was 54,000/s, a performance loss of only 1.8%. This version of the RingBuffer was adopted as the buffer for Cobar SQL audit.

The optimized RingBuffer was also contributed back to the SkyWalking community, and the SkyWalking author praised it as an “interesting contribution”.

Conclusion

Cobar SQL audit has run stably in support of all of the company’s Cobar clusters since its launch, and it is one of the company’s highest-QPS systems.

In retrospect, the pursuit of extreme performance may have been a little “paranoid”. To outsiders the benefit may not look that large: why make things so complicated when adding a machine would do? But this “paranoia” is what first drew us to technology. Life is not only about getting by; there is also poetry and faraway places.

The original address: mp.weixin.qq.com/s/OEuZIfbKb…