This article introduces the client-adapter in the Canal project, along with the reliability, high-availability, and monitoring/alerting concerns that need to be addressed when deploying it to production. (Based on Canal version 1.1.4.)

Canal, a real-time data subscription component for MySQL, captures MySQL binlog data.

Alibaba has also open-sourced Otter (github.com/alibaba/otter), a MySQL-to-MySQL synchronization project built on Canal that provides unidirectional and bidirectional sync. However, we often need to synchronize MySQL data to ES, HBase, and other stores, in which case users have to write their own canal-client consumers, which is quite troublesome.

Since version 1.1.1, Canal has shipped a companion landing module that consumes Canal's subscription stream for you: the client-adapter (github.com/alibaba/canal/wiki/ClientAdapter).

In the latest stable version, 1.1.4, client-adapter can synchronize data to RDB (relational databases), ES, and HBase.

1. Basic client-Adapter capabilities

Currently, Adapter has the following basic capabilities:

  • Connects to upstream message sources: Kafka, RocketMQ, or canal-server
  • Incremental synchronization of MySQL data
  • Full synchronization of MySQL data
  • Downstream writes to MySQL, ES, and HBase

2. Client-adapter architecture

The adapter essentially consumes the real-time incremental data that canal-server subscribes to, so an upstream canal-server must produce the data.

The overall structure is as follows:

3. Migration and synchronization configuration (MySQL as an example)

Official documentation: github.com/alibaba/canal/wiki/Sync-RDB

The following is a list of practical considerations.

3.1 Parameter Configuration

1) General configuration file application.yml

Description:

  • A piece of data can be consumed by multiple groups at the same time. Groups execute in parallel, while the outerAdapters within one group (for example, logger and hbase) execute serially
  • Currently the client-adapter can subscribe to data in two ways: by connecting directly to canal-server, or by subscribing to Kafka/RocketMQ messages
  • Once zookeeperHosts is configured, distributed locks are supported. If canal-server is connected in cluster mode, you still need to set this parameter; see the high-availability section below for details
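For orientation, here is a trimmed application.yml sketch along the lines of the 1.1.4 ClientAdapter wiki example; hosts, credentials, and names such as `example` and `mysql1` are placeholders to adapt:

```yaml
server:
  port: 8081
canal.conf:
  mode: tcp                              # tcp, kafka, or rocketMQ
  canalServerHost: 127.0.0.1:11111       # used in tcp mode
  # zookeeperHosts: 127.0.0.1:2181       # enables distributed locks / switches
  batchSize: 500
  syncBatchSize: 1000
  retries: 0
  srcDataSources:                        # source database(s)
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/mytest?useUnicode=true
      username: root
      password: 121212
  canalAdapters:
  - instance: example                    # canal instance name or MQ topic
    groups:
    - groupId: g1                        # groups run in parallel
      outerAdapters:                     # adapters within a group run serially
      - name: logger
      - name: rdb
        key: mysql1
        properties:
          jdbc.driverClassName: com.mysql.jdbc.Driver
          jdbc.url: jdbc:mysql://127.0.0.1:3306/mytest2?useUnicode=true
          jdbc.username: root
          jdbc.password: 121212
```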

2) Configure the adapter for the corresponding task

The task configurations for synchronizing to MySQL live under the conf/rdb path. The task configuration file used in this article is named mysql1.yml.

Note that in targetPk, the mapping between the source and target primary keys is written as srcPk: targetPk.
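For reference, a minimal conf/rdb/mysql1.yml sketch along the lines of the Sync-RDB wiki example; database, table, and key names are placeholders:

```yaml
dataSourceKey: defaultDS        # matches a key under srcDataSources in application.yml
destination: example            # canal instance or MQ topic
groupId: g1
outerAdapterKey: mysql1         # matches the rdb adapter's key
concurrent: true
dbMapping:
  database: mytest              # source database
  table: user                   # source table
  targetTable: mytest2.user     # target table
  targetPk:
    id: id                      # srcPk: targetPk
  mapAll: true                  # map all columns by name
```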

3) Log format modification

The default log level in logback.xml is DEBUG. Change it to INFO when running online; otherwise the log volume will explode.
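For example, in conf/logback.xml (the exact appender names depend on the shipped file; keep them as they are and change only the level):

```xml
<!-- conf/logback.xml: change the root level from DEBUG to INFO -->
<root level="INFO">
    <!-- keep the existing appender-ref entries unchanged -->
    <appender-ref ref="CONSOLE" />
</root>
```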

3.2 Incremental Synchronization Capability

1) DML incremental synchronization

After completing the above configuration, incremental subscription works as soon as the adapter starts: it receives messages from MQ and delivers them successfully to the target database.

The following logs are displayed.

2) DDL synchronization

To use DDL synchronization, you must set mirrorDb to true in the RDB task configuration.
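A mirror-schema task configuration then reduces to roughly the following (per the Sync-RDB wiki; names are placeholders):

```yaml
dataSourceKey: defaultDS
destination: example
groupId: g1
outerAdapterKey: mysql1
concurrent: true
dbMapping:
  mirrorDb: true        # mirror the whole schema, DDL included
  database: mytest      # no per-table mapping needed
```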

3.3 Full Synchronization Capability

The adapter also provides a full synchronization capability; for the details, see section 3.2 of the wiki page github.com/alibaba/canal/wiki/ClientAdapter.

Here we use the command:

```shell
curl http://127.0.0.1:8081/etl/rdb/mysql1/mysql1.yml -X POST
```

The output is as follows

4. Dynamic configuration

4.1 Task Switch

```shell
curl http://127.0.0.1:8081/syncSwitch/dts-dbvtest-insertdata/on -X PUT
```

If a ZK address is configured in application.yml, the switch is distributed: the task switch is registered in ZK, and toggling it on any machine starts or stops the same task on all machines.

The relevant source code works as follows:

  • Read the task's switch status from ZK
  • If it is false, disconnect the task

4.2 Configuration Change

1) Local configuration files

By default, the adapter reads its configuration from local configuration files.

A pleasant surprise: when you modify a configuration file, the running task automatically reloads it, giving dynamic configuration.

Let’s see how it works.

  • A listener inherits from FileAlterationListenerAdaptor
  • A file change is detected
  • The current canalAdapterService is destroyed
  • The contextRefresher is refreshed
  • Sleep for 2 seconds
  • The canalAdapterService is re-initialized
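The flow above can be condensed into a small sketch. This is a hypothetical class, with the destroy/refresh/init steps injected as Runnables; the real code extends commons-io's FileAlterationListenerAdaptor and works on the actual Spring context:

```java
public class ConfigReloadSketch {
    // Injected stand-ins for the real operations on the Spring context.
    private final Runnable destroyAdapterService;
    private final Runnable refreshContext;
    private final Runnable initAdapterService;

    ConfigReloadSketch(Runnable destroy, Runnable refresh, Runnable init) {
        this.destroyAdapterService = destroy;
        this.refreshContext = refresh;
        this.initAdapterService = init;
    }

    // Called when a change to the config file is detected.
    void onFileChange() {
        destroyAdapterService.run();   // stop the current canalAdapterService
        refreshContext.run();          // rebind properties from the changed file
        try {
            Thread.sleep(2000);        // let the refreshed context settle
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        initAdapterService.run();      // start a fresh canalAdapterService
    }
}
```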

Eventually the log prints the reload result.

2) MySQL-based remote configuration

If multiple adapters are deployed, you can store the configuration in MySQL for unified, global configuration management.

The implementation is also fairly simple:

  • A local asynchronous thread polls MySQL
  • If there is an update, the new configuration is written to the local configuration file
  • The local dynamic-refresh mechanism then applies it
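A sketch of that loop, with the MySQL read and the file write abstracted away (the class name and the Supplier/Consumer shape are my assumptions, not Canal's code):

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

public class RemoteConfigPoller {
    private final Supplier<String> remoteConfig;   // e.g. reads a row from the config table
    private final Consumer<String> writeLocalFile; // overwrites the local config file
    private String lastSeen;                       // last applied config content

    RemoteConfigPoller(Supplier<String> remoteConfig, Consumer<String> writeLocalFile) {
        this.remoteConfig = remoteConfig;
        this.writeLocalFile = writeLocalFile;
    }

    /** One polling tick; returns true if an update was applied. */
    boolean pollOnce() {
        String current = remoteConfig.get();
        if (current != null && !current.equals(lastSeen)) {
            writeLocalFile.accept(current);  // triggers the file-change based reload
            lastSeen = current;
            return true;
        }
        return false;
    }
}
```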

5. Data reliability analysis

5.1 Ack Mechanism

A single adapter task uses a multithreaded model:

  • The main thread fetches messages from MQ, writes them to a queue, and waits on a CountDownLatch
  • An asynchronous thread polls the queue and posts the data downstream
  • On successful delivery, the latch is released and the main thread returns an ack to MQ
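A minimal sketch of this hand-off (class and method names are mine, not Canal's): the main thread blocks on a CountDownLatch until the worker has drained the queue and written downstream, and only then acks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AckSketch {
    // Stand-in for the downstream writer (RDB/ES/HBase in the real adapter).
    static void writeDownstream(String batch) { /* no-op for the sketch */ }

    // Main thread: enqueue one batch, wait on the latch, then decide the ack.
    static boolean consumeAndAck(String batch) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        CountDownLatch latch = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            try {
                String msg = queue.take();   // asynchronous thread polls the queue
                writeDownstream(msg);        // post to the target library
                latch.countDown();           // successful delivery releases the latch
            } catch (InterruptedException ignored) { }
        });
        worker.start();

        queue.put(batch);                    // main thread hands off the batch
        boolean delivered = latch.await(3, TimeUnit.SECONDS);
        worker.join();
        return delivered;                    // true -> return ack to MQ
    }
}
```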

Note one edge case: if for some reason the target database has no row with the data's primary key and the update affects 0 rows, the delivery is still counted as successful.

5.2 Retry Mechanism

The retries parameter in application.yml controls how many times a downstream post is retried.

With a fixed retry interval of 0.5s, setting retries to x can ride out transient network jitter without losing data.

Once the retry count is exhausted, the message is acked anyway. You therefore need to collect the failure logs and alert on them promptly.
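The described behavior can be sketched as follows. This is an illustrative stand-in, not Canal's actual code; the 0.5s interval and the ack-after-exhaustion behavior come from the text above:

```java
public class RetrySketch {
    interface Writer { void write(String batch) throws Exception; }

    // Retry a downstream write up to `retries` extra times, pausing 0.5s
    // between attempts. Returns true if delivery eventually succeeded.
    static boolean writeWithRetry(Writer writer, String batch, int retries) {
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                writer.write(batch);
                return true;                 // delivered; ack normally
            } catch (Exception e) {
                try {
                    Thread.sleep(500);       // fixed 0.5s retry interval
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        // All retries exhausted: the task still acks, so this log line is the
        // only trace of the lost batch -- collect it and alert on it.
        System.err.println("SYNC FAILED, acking anyway: " + batch);
        return false;
    }
}
```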

6. Performance analysis

Concrete performance conclusions still need to come from load testing.

Here are two performance-tuning points drawn from the source code.

6.1 Full Synchronization multi-threading

In full synchronization, synchronization efficiency is a problem worth considering.

The adapter includes some design for full-sync efficiency: when more than 10,000 rows need to be synchronized, multithreading is enabled, as shown in the code below:

However, its paging suffers from MySQL's deep-paging problem, which can put significant pressure on the source database.

6.2 Full Synchronization Select *

Another full-synchronization efficiency concern is the select * query, which must not blow up the client's memory.

Looking at the source code, this has indeed been considered: a JDBC streaming query is used.
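For MySQL Connector/J, streaming is enabled by a forward-only, read-only statement with fetch size Integer.MIN_VALUE, which makes the driver stream rows one by one instead of buffering the entire result set in memory. A minimal sketch (the helper name is mine, not Canal's):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingQuerySketch {
    // Create a statement configured for row-by-row streaming on MySQL
    // Connector/J: forward-only, read-only, fetchSize = Integer.MIN_VALUE.
    static Statement streamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE); // Connector/J's streaming switch
        return stmt;
    }
}
```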

7. Monitor alarms

If the adapter is to be used in production, monitoring and alerting are essential.

Although the adapter does not expose monitoring-metric APIs the way canal-server does, we can still put some auxiliary monitoring and alerting in place.

1) Mq message accumulation alarm

If the adapter fails, messages pile up in MQ; accumulation alarms on the existing MQ topics will detect this promptly.

2) Abnormal log alarm

The adapter has its own log format. Confirm the log-collection method and log-parsing format with your existing monitoring system, then change the pattern in conf/logback.xml so the printed log format matches what the collector expects.

8. High availability

Reading the source code reveals: TCP mode supports ZK-based "HA", while MQ mode does not.

However, the "HA" that TCP mode supports is different from HA as we usually understand it.

In TCP mode the adapter connects directly to the upstream canal-server, and canal-server's own HA failover changes the active server's address. What the adapter supports is following that change: it watches ZK for the active canal-server and reconnects to the new one. It is not a high-availability architecture for the adapter itself.

MQ mode itself does not support HA.

However, when consuming from upstream MQ, we can still pull off a high-availability trick.

Currently, when canal writes binlog to MQ, a given record is delivered to a single queue within the topic (even when hashing across multiple queues, each key lands in one queue), and MQ consumption is clustered, so only one client sequentially consumes the messages in a given queue.

Thus, we deploy two adapters with two MQ consumers running at the same time. Normally only one machine consumes the task; once that machine dies, MQ automatically continues consumption on the other machine. That gives us a simple form of high availability.

The drawback is just as obvious: the task cannot be load-balanced and only ever runs on one machine.

Therefore, you need to consider multiple consumer groups for task processing.

Recommended reading:

  • Canal ten minutes entry

  • Canal Cluster edition + Admin console new setup posture

  • Canal source code analysis outline

If you have read this far: original writing is not easy, so leave a follow and a like ~

Reorganizing scattered knowledge fragments into a Java knowledge map: github.com/saigu/JavaK… (for easy access to past articles)