Brief introduction: PolarDB-X 2.0 provides a global Binlog capability to address the data silo problem, offering downstream ecosystems an incremental log consumption experience identical to MySQL Binlog. For data corruption, it provides data recovery capabilities at different granularities (instance-level, table-level, SQL-level and row-level), including consistent backup recovery, the table recycle bin, SQL Flashback and Flashback Query.

Conference portal: https://yqh.aliyun.com/live/polardbx2021

Background

As developers, we are all familiar with backend systems. A backend system can be abstractly divided into two components. One is the business system, which handles the business logic; in modern architectures this part is usually designed as stateless nodes that scale horizontally. The other is the database system, which stores the state of the system, including the most important business data. From the database's perspective, data flows in from two sources: real-time writes from the business system, which make up the bulk of the core data, and data imported once or periodically from upstream data systems. Because the core data is generated here for the first time, this database is also called the SSOT (Single Source of Truth).

The SSOT is the most important data asset in the backend system, which raises two issues that must be addressed properly. The first is that, as the most important asset, this data usually needs to be synchronized in real time to other systems for further processing, such as BI analysis; without such a real-time synchronization mechanism, the data becomes a data silo. The second is that the data may be damaged for various reasons: corruption caused by hardware failure or software bugs, loss caused by improper operations, inconsistency caused by erroneous SQL, and so on. It is therefore essential to provide multiple mechanisms to guarantee data safety.

Global Binlog

PolarDB-X is a distributed database product that is highly compatible with the MySQL ecosystem, so let’s first look at how MySQL solves the data silo problem.

As can be seen from the DB-Engines ranking, MySQL’s popularity exceeds that of the other open source databases combined, which means the MySQL ecosystem is very prosperous. Downstream systems of MySQL include Kafka, MySQL replicas, data synchronization tools such as Canal, Pulsar, and so on. MySQL implements real-time incremental data synchronization with downstream systems through the Binlog mechanism. The Binlog can be regarded as a message queue that stores detailed incremental change events of MySQL in order. By consuming this queue, downstream systems or tools achieve real-time data synchronization with MySQL. This mechanism is also called CDC (Change Data Capture), i.e. incremental data capture.
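The queue-consumption model can be sketched in a few lines of Python (a simplified illustration, not actual replication code): the binlog is an ordered queue of change events, and a downstream consumer replays them one by one to stay in sync.

```python
# Simplified model of CDC: the binlog is an ordered queue of change
# events; a downstream consumer replays them to mirror the source.
from collections import deque

binlog = deque([
    {"op": "insert", "pk": 1, "row": {"name": "Ralph"}},
    {"op": "update", "pk": 1, "row": {"name": "Mary"}},
    {"op": "delete", "pk": 1},
])

replica = {}  # downstream state, keyed by primary key

def apply(event, state):
    """Apply one binlog event to the downstream copy."""
    if event["op"] in ("insert", "update"):
        state[event["pk"]] = event["row"]
    elif event["op"] == "delete":
        state.pop(event["pk"], None)

while binlog:
    apply(binlog.popleft(), replica)

print(replica)  # {} -- the row was inserted, updated, then deleted
```

Any tool that understands the event format can subscribe to the same queue, which is why a single Binlog serves so many different downstream systems.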

Providing CDC capabilities in a distributed database is more complex than in a stand-alone database. A distributed database typically contains multiple nodes, which generate multiple incremental log queues, and downstream consumption of multiple queues raises several issues:

  1. With multiple queues, how does the downstream determine the order of change events across them when consuming?
  2. A distributed transaction may touch multiple queues. If transaction integrity is to be guaranteed at consumption time, how are change events belonging to the same transaction discovered and merged?
  3. The system can scale elastically (that is, queues can be added or removed); how does the downstream handle this correctly?
  4. A DDL change involves multiple queues. How does the downstream accurately identify, for each queue, the positions before and after the schema change, and coordinate its consumption progress?

Faced with these problems, the CDC capability of a distributed database must trade off implementation difficulty, supported features, and ease of use. Generally speaking, exposing multiple queues downstream, guaranteeing only eventual consistency rather than transaction integrity, and providing a custom incremental log format is relatively easy to implement, but this scheme places higher demands on downstream consumers: they need to develop their own consumption code or tools, handle the coordination of multiple queues, and so on. A more friendly approach is to provide a CDC experience exactly the same as MySQL Binlog, which dramatically reduces the cost of building a data synchronization link and allows the downstream to consume incremental changes of the distributed database as transparently as it consumes MySQL Binlog. This is the solution adopted by PolarDB-X 2.0, which we call the global Binlog.

PolarDB-X 2.0 adopts a horizontally scalable share-nothing architecture. The basic unit of the system is the node, and each node can be divided into two parts: the compute node (CN) and the data node (DN). To provide the global Binlog capability, PolarDB-X 2.0 adds a CDC component, itself a cluster with elastic capabilities, as shown in the figure above.

The global Binlog generation process can be divided into three stages:

  1. The CDC component pulls the incremental log, known as the physical Binlog, from each DN, then performs single-queue sorting, internal event filtering, and DDL-related shaping to provide a “clean” incremental event queue for the next stage. If the system scales in or out, the CDC component automatically senses this and handles it at this stage;
  2. The CDC component merges all the “clean” incremental event queues, combines events belonging to the same distributed transaction, and sorts them globally by transaction timestamp, yielding a globally ordered event queue that preserves transaction integrity. DDL events in the queue are also handled at this stage. The CDC component then writes this queue into files compatible with the MySQL Binlog format, that is, global Binlog files;
  3. When the CN component receives a request to subscribe to the global Binlog, it sends the global Binlog to the downstream consumer according to the MySQL Dump protocol.
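The merge-and-sort stage can be illustrated with a small sketch. For illustration only, we assume each DN queue is already sorted by commit timestamp and that events of one distributed transaction share a transaction ID; the real implementation is far more involved.

```python
import heapq

# Each DN produces a queue of (commit_ts, txn_id, event) tuples already
# sorted by commit timestamp; heapq.merge performs the global k-way sort.
dn1 = [(100, "tx1", "UPDATE t1 ..."), (300, "tx3", "INSERT t1 ...")]
dn2 = [(100, "tx1", "UPDATE t2 ..."), (200, "tx2", "DELETE t2 ...")]

merged = heapq.merge(dn1, dn2)  # globally ordered by (commit_ts, txn_id)

# Group adjacent events of the same distributed transaction so the
# global binlog preserves transaction integrity.
global_binlog = []
for ts, txn, event in merged:
    if global_binlog and global_binlog[-1]["txn"] == txn:
        global_binlog[-1]["events"].append(event)
    else:
        global_binlog.append({"txn": txn, "ts": ts, "events": [event]})

for entry in global_binlog:
    print(entry["txn"], entry["events"])
```

Note how tx1, which touched both DNs, comes out as a single entry with both of its events, answering issues 1 and 2 from the list above.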

Through the above three stages, PolarDB-X 2.0 implements a global Binlog capability that is fully compatible with MySQL Binlog. If you are interested in the detailed implementation, please follow our Zhihu column, PolarDB-X Zhihu.

Backup recovery

For data corruption, PolarDB-X 2.0 provides data recovery capabilities at different granularities: instance-level consistent backup recovery, the table-level table recycle bin, SQL-level SQL Flashback, and row-level Flashback Query. The following sections introduce the characteristics and usage scenarios of these four capabilities.

Consistent backup recovery

First, consistent backup recovery, which provides instance-level recovery of historical data at any point in time, down to the second. In a stand-alone database, both the full data and the incremental log can be considered to reside on a single machine, so consistent backup recovery only requires backing up the full data and the incremental log. In a distributed database, consistent backup recovery is more complex because the full data and the incremental logs are spread across multiple machines. PolarDB-X 2.0 achieves it by combining full backups of all DNs with the global Binlog. For example, suppose a PolarDB-X 2.0 instance takes full backups at 0:00 on Monday, Tuesday and Friday, and one day we need to restore the data to 14:25:26 on Sunday. The system selects the full backup set closest to (but not later than) the recovery point, namely the one from 0:00 on Friday, then replays the global Binlog from 0:00 on Friday to 14:25:26 on Sunday, producing the desired snapshot. PolarDB-X 2.0’s consistent backup recovery does not lock the database during a backup. The capability relies on the global Binlog, so the recoverable interval equals the retention interval of the global Binlog. It currently has some limitations, such as not being able to scale in or out during a backup, and supporting only homogeneous recovery (restoring to an instance of the same topology).
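The restore procedure described above (nearest full backup plus global Binlog replay) can be sketched as follows; the timestamps and data shapes are illustrative only.

```python
from datetime import datetime

# Full backup sets, taken at Monday, Tuesday and Friday 0:00 (illustrative).
backups = {
    datetime(2021, 5, 3): {"t": {1: "v_mon"}},
    datetime(2021, 5, 4): {"t": {1: "v_tue"}},
    datetime(2021, 5, 7): {"t": {1: "v_fri"}},
}

# Global binlog events after the Friday backup: (commit_ts, table, pk, value).
binlog = [
    (datetime(2021, 5, 8, 9, 0, 0), "t", 1, "v_sat"),
    (datetime(2021, 5, 9, 12, 0, 0), "t", 1, "v_sun_noon"),
    (datetime(2021, 5, 9, 15, 0, 0), "t", 1, "v_too_late"),
]

def restore(target):
    """Pick the newest full backup not after `target`, then replay
    global binlog events up to `target` on top of it."""
    base_ts = max(ts for ts in backups if ts <= target)
    snapshot = {tbl: dict(rows) for tbl, rows in backups[base_ts].items()}
    for ts, tbl, pk, val in binlog:
        if base_ts < ts <= target:
            snapshot[tbl][pk] = val
    return snapshot

target = datetime(2021, 5, 9, 14, 25, 26)  # Sunday 14:25:26
print(restore(target))  # {'t': {1: 'v_sun_noon'}}
```

Events committed after the recovery point (here the 15:00 change) are simply not replayed, which is what makes second-level point-in-time recovery possible.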

Table recycle bin

The second data recovery capability that PolarDB-X 2.0 provides is the table recycle bin. As the name implies, a dropped table is temporarily placed in a recycle bin; if within two hours we find that we need the table back, we can retrieve it from the bin. The table recycle bin provides complete management functions, such as viewing all tables in the recycle bin, permanently dropping a table, and restoring a table. The recycle bin currently only keeps tables dropped within the last two hours and cannot recover tables cleared via TRUNCATE TABLE.
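The recycle bin’s behavior can be modeled as a table store with a two-hour retention window. This is a toy sketch, not the actual implementation; all names and structures here are assumptions.

```python
import time

RETENTION_SECONDS = 2 * 60 * 60  # dropped tables are kept for two hours

tables = {"orders": [(1, "Ralph")]}  # live tables
recycle_bin = {}                     # dropped tables: name -> (drop_ts, data)

def drop_table(name, now=None):
    """Move a table into the recycle bin instead of destroying it."""
    now = now if now is not None else time.time()
    recycle_bin[name] = (now, tables.pop(name))

def purge(now=None):
    """Permanently remove tables dropped more than two hours ago."""
    now = now if now is not None else time.time()
    for name in [n for n, (ts, _) in recycle_bin.items()
                 if now - ts > RETENTION_SECONDS]:
        del recycle_bin[name]

def restore_table(name):
    """Bring a table back from the recycle bin."""
    _, data = recycle_bin.pop(name)
    tables[name] = data

drop_table("orders", now=0)
purge(now=60 * 60)        # one hour later: still within the window
restore_table("orders")
print(sorted(tables))     # ['orders']
```

A TRUNCATE TABLE, by contrast, rewrites the table in place rather than moving it, which is why truncated data cannot be retrieved this way.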

SQL Flashback (coming soon)

The third data recovery capability that PolarDB-X 2.0 provides is called SQL Flashback. It can precisely recover the data affected by a single erroneous SQL statement. PolarDB-X has offered this capability since its launch, and it has been widely recognized, helping many users recover data after mistaken deletions. Here is an example of how it is used. As shown in the figure above, at T1 we wanted to delete the record whose title was “Developer” and whose name was “Ralph”, but forgot to add “name=’Ralph'” to the WHERE condition, so the record whose name was “Mary” was deleted as well. These two delete events, along with the ID of the corresponding SQL, are recorded in the global Binlog. At T2, we discovered the mistaken deletion and found the corresponding SQL and its ID through the audit function of PolarDB-X. At T3, we generated the recovery SQL using the SQL ID and the SQL Flashback capability. SQL Flashback works by searching the global Binlog for all change events that correspond to the given SQL ID (in this case, the two delete events) and then generating the reverse recovery SQL for each one. SQL Flashback provides accurate data recovery for SQL misoperation scenarios; as you can see, the recoverable time range depends on the retention interval of the global Binlog.
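The core of SQL Flashback, generating a reverse statement for each change event recorded under the offending SQL’s ID, can be sketched as follows. The event shapes and the `sql-42` ID are illustrative assumptions, not the real binlog format.

```python
# Each binlog row event carries the SQL id plus the row image(s).
events = [
    {"sql_id": "sql-42", "op": "DELETE", "table": "employees",
     "before": {"id": 1, "name": "Ralph", "title": "Developer"}},
    {"sql_id": "sql-42", "op": "DELETE", "table": "employees",
     "before": {"id": 2, "name": "Mary", "title": "Developer"}},
]

def inverse_sql(event):
    """Generate the SQL that undoes one row-change event."""
    t, before = event["table"], event.get("before")
    after = event.get("after")
    if event["op"] == "DELETE":      # undo by re-inserting the old row
        cols = ", ".join(before)
        vals = ", ".join(repr(v) for v in before.values())
        return f"INSERT INTO {t} ({cols}) VALUES ({vals});"
    if event["op"] == "INSERT":      # undo by deleting the new row
        cond = " AND ".join(f"{k} = {v!r}" for k, v in after.items())
        return f"DELETE FROM {t} WHERE {cond};"
    if event["op"] == "UPDATE":      # undo by restoring the old image
        sets = ", ".join(f"{k} = {v!r}" for k, v in before.items())
        cond = " AND ".join(f"{k} = {v!r}" for k, v in after.items())
        return f"UPDATE {t} SET {sets} WHERE {cond};"

recovery = [inverse_sql(e) for e in events if e["sql_id"] == "sql-42"]
for stmt in recovery:
    print(stmt)
```

Running this over the two delete events yields two INSERT statements that restore Ralph’s and Mary’s rows exactly as they were before the erroneous DELETE.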

Flashback Query (coming soon)

The fourth data recovery capability that PolarDB-X 2.0 provides is called Flashback Query, which offers row-level, precise data recovery within a time range. Again take an erroneous SQL scenario as an example. As shown in the figure above, at T1 we wanted to update the position of the record with title “Developer” and name “Ralph” to “CTO”, but forgot to add “name=’Ralph'” to the WHERE condition, so every record whose position was “Developer” was updated to “CTO”. These changes are recorded in the undo log (a basic data structure in the database that records in detail the changes to each row). At T2, we quickly discovered the erroneous update and determined when it happened and which data it affected. At T3, we could directly query, via the Flashback Query capability, the correct values the two affected rows had at T1. At T4, we corrected the data based on the correct values returned by Flashback Query. As you can see, Flashback Query depends on the retention period of the undo log. It provides faster and more precise row-level recovery than SQL Flashback, but undo logs are generally retained for a shorter time than the global Binlog, so its recoverable range is smaller.
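Reading a row “as of” an earlier point in time can be sketched with a toy multi-version store standing in for the undo log; all names and timestamps here are illustrative.

```python
# Toy multi-version store: each row keeps (commit_ts, image) versions,
# newest last -- a simplified stand-in for the undo log.
versions = {
    ("employees", 1): [(10, {"name": "Ralph", "position": "Developer"}),
                       (20, {"name": "Ralph", "position": "CTO"})],
    ("employees", 2): [(10, {"name": "Mary", "position": "Developer"}),
                       (20, {"name": "Mary", "position": "CTO"})],  # bad update at ts=20
}

def as_of(table, pk, ts):
    """Return the newest version of the row committed at or before ts."""
    older = [img for commit_ts, img in versions[(table, pk)] if commit_ts <= ts]
    return older[-1] if older else None

# Flashback Query: read the affected rows as they were before the
# erroneous update (ts=15 lies between the good and the bad version).
print(as_of("employees", 1, 15))  # {'name': 'Ralph', 'position': 'Developer'}
print(as_of("employees", 2, 15))  # {'name': 'Mary', 'position': 'Developer'}
```

Once versions older than the undo-log retention period are purged, `as_of` can no longer answer for those timestamps, which is exactly the recoverability limit described above.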

Conclusion

PolarDB-X 2.0 provides a global Binlog capability to address the data silo problem, offering downstream ecosystems an incremental log consumption experience identical to MySQL Binlog. For data corruption, it provides data recovery capabilities at different granularities (instance-level, table-level, SQL-level and row-level), including consistent backup recovery, the table recycle bin, SQL Flashback and Flashback Query. PolarDB-X 2.0 continues to build more capabilities, so stay tuned.
