Introduction

If you are familiar with MySQL, you know that the data synchronization mechanism between the master and slaves of a MySQL cluster is very mature. Surprisingly, ClickHouse, one of the most popular big data analytics engines of recent years, can also be mounted as a MySQL slave, serving as MySQL's "coprocessor" for efficient data analysis in OLAP scenarios. The earlier approach was straightforward: a third-party plug-in translated all MySQL operations and replayed them one by one on the ClickHouse side to synchronize the data. Finally, in the second half of 2020, Yandex released the MaterializeMySQL engine to the ClickHouse community, which supports full and incremental real-time data synchronization from MySQL. The MaterializeMySQL engine currently supports MySQL 5.6/5.7/8.0 and is compatible with Delete/Update statements and most common DDL operations.

Basic concepts

  • MySQL & ClickHouse

MySQL generally refers to the complete MySQL RDBMS, an open-source relational database management system currently owned by Oracle Corporation. With its continually improved features and active open-source community, MySQL is attracting more and more enterprise and individual users.

ClickHouse is an open-source distributed column-oriented database for OLAP scenarios from Yandex. It offers real-time queries, a complete DBMS, and efficient data compression, and it supports batch updates and high availability. ClickHouse is also compatible with SQL syntax and has many benefits right out of the box.

  • Row Store & Column Store

MySQL uses a row store: data in a table is stored contiguously on the storage medium row by row, with the row as the logical storage unit. This storage mode is suitable for random add, delete, update, and lookup operations, and it is friendly to row-oriented queries. However, if only a few attributes of a row are the target of a query, the row store still has to traverse all rows and filter out the target attributes; when a table has many attributes, query efficiency is usually low. Although optimizations such as indexing and caching can improve efficiency in OLTP scenarios, they are not sufficient for OLAP scenarios involving massive amounts of data.

ClickHouse uses a column store: data in a table is stored contiguously on the storage medium column by column, with the column as the logical storage unit. This layout lends itself to Single Instruction Multiple Data (SIMD) processing, and query efficiency improves markedly, especially when a table has many attributes. In column mode, physically adjacent data often has the same type, which makes it naturally suitable for compression and yields extreme compression ratios.
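To make the contrast concrete, here is a sketch of an OLAP-style aggregation (using the demo table created later in this article). In a column store only the runoob_author column needs to be read from disk, while a row store would have to scan every full row:

-- Only the runoob_author column is read from storage in a column store
SELECT runoob_author, count(*) AS posts
FROM runoob_tbl
GROUP BY runoob_author;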

Usage

  • On the MySQL master, enable BinLog, set it to ROW format, and enable GTID mode.

    Key my.cnf configuration:

    gtid_mode=ON
    enforce_gtid_consistency=1
    binlog_format=ROW
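    A quick sanity check that the settings are in effect (standard MySQL statements, nothing engine-specific):

    SHOW VARIABLES LIKE 'gtid_mode';
    SHOW VARIABLES LIKE 'enforce_gtid_consistency';
    SHOW VARIABLES LIKE 'binlog_format';
    SHOW MASTER STATUS;   -- also shows the current binlog file/position and executed GTID set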

  • You are advised to use GCC 10.2.0, CMake 3.15, and Ninja 1.9.0 or later to build the ClickHouse master code.

  • Create a database and table on the MySQL master

    CREATE DATABASE master_db;
    USE master_db;
    CREATE TABLE IF NOT EXISTS runoob_tbl(
        runoob_id INT UNSIGNED AUTO_INCREMENT,
        runoob_title VARCHAR(100) NOT NULL,
        runoob_author VARCHAR(40) NOT NULL,
        submission_date DATE,
        PRIMARY KEY ( runoob_id )
    )ENGINE=InnoDB DEFAULT CHARSET=utf8;

    Insert a few rows of data:

    INSERT INTO runoob_tbl (runoob_title, runoob_author, submission_date) VALUES ("MySQL-learning", "Bob", NOW());
    INSERT INTO runoob_tbl (runoob_title, runoob_author, submission_date) VALUES ("MySQL-learning", "Tim", NOW());

  • Create a MaterializeMySQL database on the ClickHouse slave

    Enable the MaterializeMySQL synchronization feature:

    SET allow_experimental_database_materialize_mysql=1;

    Create the slave database; the parameters are ('mysqld service address', 'database name to be synchronized', 'authorized account', 'password'):

    CREATE DATABASE slave_db ENGINE = MaterializeMySQL('192.168.6.39:3306', 'master_db', 'root', '3306123456');
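    To confirm the database was created and the tables were pulled over, standard ClickHouse statements suffice:

    SHOW DATABASES;
    SHOW TABLES FROM slave_db;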

Now you can see that ClickHouse already has data synchronized from MySQL:

DESKTOP:) select * from runoob_tbl;

SELECT * FROM runoob_tbl

Query id: 6e2b5f3b-0910-4d29-9192-1b985484d7e3

┌─runoob_id─┬─runoob_title───┬─runoob_author─┬─submission_date─┐
│         1 │ MySQL-learning │ Bob           │      2021-01-06 │
└───────────┴────────────────┴───────────────┴─────────────────┘
┌─runoob_id─┬─runoob_title───┬─runoob_author─┬─submission_date─┐
│         2 │ MySQL-learning │ Tim           │      2021-01-06 │
└───────────┴────────────────┴───────────────┴─────────────────┘

2 rows in set. Elapsed: 0.056 sec.

How it works

  • BinLog Event

BinLog events in MySQL mainly include the following types:

1. MYSQL_QUERY_EVENT        -- DDL
2. MYSQL_WRITE_ROWS_EVENT   -- insert
3. MYSQL_UPDATE_ROWS_EVENT  -- update
4. MYSQL_DELETE_ROWS_EVENT  -- delete

After a transaction is committed, MySQL processes the corresponding BinLog Events and persists them to the BinLog file.
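If you want to see these events for yourself, MySQL can list them directly (the file name here is the one used in this article's example; substitute your own):

SHOW BINLOG EVENTS IN 'mysql-bin.000003' LIMIT 10;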

ClickHouse achieves data synchronization by consuming BinLog, with three main considerations in mind:

1. DDL compatibility: because ClickHouse and MySQL define data types differently, DDL statements need to be converted accordingly

2. Delete/Update support: the _version field is introduced to track version information

3. Query filtering: the _sign field is introduced to mark whether a row is valid (the two fields appear in queries as shown below)
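Note that _sign and _version are created as MATERIALIZED columns (visible in the DDL comparison below), so they are not returned by SELECT * and must be requested explicitly, which is the pattern used in the examples later in this article:

SELECT *, _sign, _version FROM runoob_tbl ORDER BY runoob_id;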

  • DDL operations

Compare the MySQL DDL statement with the DDL statement executed on the ClickHouse side:

mysql> show create table runoob_tbl\G;
*************************** 1. row ***************************
       Table: runoob_tbl
Create Table: CREATE TABLE `runoob_tbl` (
  `runoob_id` int unsigned NOT NULL AUTO_INCREMENT,
  `runoob_title` varchar(100) NOT NULL,
  `runoob_author` varchar(40) NOT NULL,
  `submission_date` date DEFAULT NULL,
  PRIMARY KEY (`runoob_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

---------------------------------------------------------------

cat /metadata/slave_db/runoob_tbl.sql
ATTACH TABLE _ UUID '14dbff59-930e-4aa8-9f20-ccfddaf78077'
(
    `runoob_id` UInt32,
    `runoob_title` String,
    `runoob_author` String,
    `submission_date` Nullable(Date),
    `_sign` Int8 MATERIALIZED 1,
    `_version` UInt64 MATERIALIZED 1
)
ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(runoob_id, 4294967)
ORDER BY tuple(runoob_id)
SETTINGS index_granularity = 8192

You can see:

1. The DDL conversion adds two hidden fields by default: _sign (-1 delete, 1 write) and _version (data version).

2. The table engine is converted to ReplacingMergeTree, with _version as the column version.

3. The original DDL primary key field runoob_id is used as the ClickHouse sort and partition key.

In addition, there is a lot of other DDL processing, such as adding columns, indexes, and so on; the corresponding code is in the Parsers/MySQL directory.
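For instance, a common schema change on the MySQL side, such as the following (an illustrative statement; the column name is hypothetical, not from the original example), is parsed, converted to the ClickHouse dialect, and replayed on the slave:

ALTER TABLE runoob_tbl ADD COLUMN runoob_category VARCHAR(20) DEFAULT NULL;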

  • Delete/Update operations

Update:

# MySQL side:
mysql> UPDATE runoob_tbl set runoob_author='Mike' where runoob_id=2;
mysql> select * from runoob_tbl;
+-----------+----------------+---------------+-----------------+
| runoob_id | runoob_title   | runoob_author | submission_date |
+-----------+----------------+---------------+-----------------+
|         1 | MySQL-learning | Bob           | 2021-01-06      |
|         2 | MySQL-learning | Mike          | 2021-01-06      |
+-----------+----------------+---------------+-----------------+
2 rows in set (0.00 sec)

---------------------------------------------------------------

# ClickHouse side:
DESKTOP:) select *, _sign, _version from runoob_tbl order by runoob_id;

SELECT *, _sign, _version FROM runoob_tbl ORDER BY runoob_id ASC

Query id: c5f4db0a-eff6-4b49-a429-b55230c26301

┌─runoob_id─┬─runoob_title───┬─runoob_author─┬─submission_date─┬─_sign─┬─_version─┐
│         1 │ MySQL-learning │ Bob           │      2021-01-06 │     1 │        2 │
│         2 │ MySQL-learning │ Mike          │      2021-01-06 │     1 │        4 │
│         2 │ MySQL-learning │ Tim           │      2021-01-06 │     1 │        3 │
└───────────┴────────────────┴───────────────┴─────────────────┴───────┴──────────┘

3 rows in set. Elapsed: 0.003 sec.

As you can see, the ClickHouse data also synchronizes updates in real time.

  • Delete:

    MySQL side

    mysql> DELETE from runoob_tbl where runoob_id=2;

    mysql> select * from runoob_tbl;
    +-----------+----------------+---------------+-----------------+
    | runoob_id | runoob_title   | runoob_author | submission_date |
    +-----------+----------------+---------------+-----------------+
    |         1 | MySQL-learning | Bob           | 2021-01-06      |
    +-----------+----------------+---------------+-----------------+
    1 row in set (0.00 sec)


    ClickHouse side

    DESKTOP:) select *, _sign, _version from runoob_tbl order by runoob_id;

    SELECT *, _sign, _version FROM runoob_tbl ORDER BY runoob_id ASC

    Query id: e9cb0574-fcd5-4336-afa3-05f0eb035d97

    ┌─runoob_id─┬─runoob_title───┬─runoob_author─┬─submission_date─┬─_sign─┬─_version─┐
    │         1 │ MySQL-learning │ Bob           │      2021-01-06 │     1 │        2 │
    └───────────┴────────────────┴───────────────┴─────────────────┴───────┴──────────┘
    ┌─runoob_id─┬─runoob_title───┬─runoob_author─┬─submission_date─┬─_sign─┬─_version─┐
    │         2 │ MySQL-learning │ Mike          │      2021-01-06 │    -1 │        5 │
    └───────────┴────────────────┴───────────────┴─────────────────┴───────┴──────────┘
    ┌─runoob_id─┬─runoob_title───┬─runoob_author─┬─submission_date─┬─_sign─┬─_version─┐
    │         2 │ MySQL-learning │ Mike          │      2021-01-06 │     1 │        4 │
    │         2 │ MySQL-learning │ Tim           │      2021-01-06 │     1 │        3 │
    └───────────┴────────────────┴───────────────┴─────────────────┴───────┴──────────┘

    4 rows in set. Elapsed: 0.002 sec.

As you can see, deleting the row with id 2 just inserts an additional row with _sign == -1, and does not actually delete it.
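The still-valid rows can be read back by forcing ReplacingMergeTree's merge-time deduplication with FINAL and filtering out the delete markers. This is a sketch of the standard pattern for the _sign/_version scheme; newer versions of the engine apply an equivalent rewrite automatically when _sign and _version are not selected explicitly:

SELECT runoob_id, runoob_title, runoob_author, submission_date
FROM runoob_tbl
FINAL                -- deduplicate by sort key, keeping the row with the highest _version
WHERE _sign > 0;     -- drop rows marked as deleted (_sign = -1)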

  • Log replay

During master-slave synchronization in MySQL, the slave node converts BinLog events into the corresponding SQL statements and replays the master's writes. Traditional third-party plug-ins follow this same master-slave BinLog consumption scheme: each Event is translated into a ClickHouse-compatible SQL statement and then executed (replayed) on ClickHouse, but the execution link is long and often costly. The MaterializeMySQL engine differs in that its built-in data parsing and write-back scheme hides that complex link: during replay, a BinLog Event is converted into the underlying Block structure and written directly to the underlying storage engine, which is close to physical replication. This is analogous to replaying BinLog events directly onto InnoDB pages.

Synchronization strategies

  • Position-based synchronization

Prior to v20.9.1, ClickHouse used position-based synchronization: the position of the last BinLog Event consumed by ClickHouse was stored in a .metadata file.

[FavonianKong@Wsl[20:42:37]slave_db]
$ cat ./.metadata
Version:        2
Binlog File:    mysql-bin.000003
Binlog Position: 355005999
Data Version:   5

So when ClickHouse starts up again, it tells the MySQL server, via the replication protocol, to start sending data from the position {'mysql-bin.000003', 355005999}:

S1 > ClickHouse sends {'mysql-bin.000003', 355005999} to MySQL
S2 > MySQL locates offset 355005999 in mysql-bin.000003, reads the next Event, and sends it to ClickHouse
S3 > ClickHouse receives the BinLog Event and completes the synchronization operation
S4 > ClickHouse updates the position in .metadata

Existing problems:

When the MaterializeMySQL database is created with host pointing to a cluster VIP, the {Binlog File, Binlog Position} tuple is not necessarily accurate after a master/slave switchover, because the BinLog can be reset:

S1 > ClickHouse sends {'mysql-bin.000003', 355005999} to the cluster's new master MySQL
S2 > The new master has executed reset master, so its binlog file is mysql-bin.000001
S3 > Replication errors occur

To solve this problem, GTID synchronization mode was introduced in v20.9.1, deprecating the unsafe position-based synchronization mode.

  • GTID synchronization

The GTID mode assigns a globally unique ID and sequence number to each event.

[FavonianKong@Wsl[21:30:19]slave_db]
$ cat ./.metadata
Version:        2
Binlog File:    mysql-bin.000003
Executed GTID:  0857c24e-4755-11eb-888c-00155dfbdec7:1-783
Binlog Position: 355005999
Data Version:   5

Here 0857c24e-4755-11eb-888c-00155dfbdec7 is the UUID of the host that generated the Event, and 1-783 is the interval of Events that have been synchronized.
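These values can be cross-checked on the MySQL side with standard statements:

SELECT @@server_uuid;            -- the UUID part of the GTID
SELECT @@GLOBAL.gtid_executed;   -- the executed interval, e.g. 0857c24e-...:1-783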

The flow then becomes:

S1 > ClickHouse sends GTID: 0857c24e-4755-11eb-888c-00155dfbdec7:1-783 to MySQL
S2 > MySQL locates the local position based on the GTID, reads the next Event, and sends it to ClickHouse
S3 > ClickHouse receives the BinLog Event and completes the synchronization operation
S4 > ClickHouse updates the GTID information in .metadata

Source code analysis

  • Overview

In the latest source code (v20.13.1.1), ClickHouse has refactored the DatabaseMaterializeMySQL database engine to accommodate the GTID synchronization mode. The entry point of the whole project is the main function in the ClickHouse/programs/main.cpp file; based on the instructions it receives, the main program dispatches tasks to the subprograms in the ClickHouse/programs directory. This analysis focuses on the workflow of the server-side MaterializeMySQL engine.

  • Source directory

The main source paths associated with MaterializeMySQL:

ClickHouse/src/Databases/MySQL    // MaterializeMySQL database engine implementation
ClickHouse/src/Storages/          // table engine implementations
ClickHouse/src/Core/MySQL*        // replication-related code
ClickHouse/src/Interpreters/      // interpreters; SQL rewriting is also handled here
ClickHouse/src/Parsers/MySQL      // parser implementation; DDL parsing and related processing are here
  • Server main process

ClickHouse uses the POCO network library to handle network requests. The client connection logic is in the handler methods of ClickHouse/src/Server/*Handler.cpp. Taking TCP as an example, the main logic can be abstracted as:

// ClickHouse/src/Server/TCPHandler.cpp
TCPHandler.runImpl()
{
    ...
    while (true)
    {
        ...
        if (!receivePacket())   // line 184
            continue;

        /// Processing Query    // line 260
        state.io = executeQuery(state.query, *query_context, ...);
        ...
    }
    ...
}
  • Data synchronization preprocessing

The executeQuery function processes the SQL sent by the Client. The logic is simplified as follows:

// ClickHouse/src/Interpreters/executeQuery.cpp
static std::tuple<ASTPtr, BlockIO> executeQueryImpl(...)
{
    ...
    ast = parseQuery(...);
    ...
    // line 503, generate the interpreter
    auto interpreter = InterpreterFactory::get(ast, context, ...);
    ...
    // line 525, the interpreter returns the result
    res = interpreter->execute();
    ...
}

There are three main points:

1. parseQuery() parses the SQL statement and generates the AST;
2. The InterpreterFactory factory class generates the corresponding interpreter from the AST (for a CREATE DATABASE statement this is InterpreterCreateQuery);
3. interpreter->execute() executes the query and returns the result.

Following up on point 3, let's see what the execute() of InterpreterCreateQuery does:

// ClickHouse/src/Interpreters/InterpreterCreateQuery.cpp
BlockIO InterpreterCreateQuery::execute()
{
    ...
    // CREATE | ATTACH DATABASE
    if (!create.database.empty() && create.table.empty())
        // line 1133, this branch is taken when using MaterializeMySQL
        return createDatabase(create);
}

Since this is a CREATE or ATTACH DATABASE, follow createDatabase():

// ClickHouse/src/Interpreters/InterpreterCreateQuery.cpp
BlockIO InterpreterCreateQuery::createDatabase(ASTCreateQuery & create)
{
    ...
    // line 208, get the database object from the DatabaseFactory according to the ASTCreateQuery;
    // refer to the DatabasePtr DatabaseFactory::getImpl() function for details
    DatabasePtr database = DatabaseFactory::get(create, metadata_path, ...);
    ...
    // when the engine is MaterializeMySQL, this dispatches to
    // DatabaseMaterializeMySQL::loadStoredObjects()
    database->loadStoredObjects(context, ...);
    ...
}

At this point the task has effectively been handed off to DatabaseMaterializeMySQL; next, trace the loadStoredObjects function:

// ClickHouse/src/Databases/MySQL/DatabaseMaterializeMySQL.cpp
template <typename Base>
void DatabaseMaterializeMySQL<Base>::loadStoredObjects(Context & context, ...)
{
    Base::loadStoredObjects(context, has_force_restore_data_flag, force_attach);
    try
    {
        // line 87, the materialize synchronization thread is started here
        materialize_thread.startSynchronization();
        started_up = true;
    }
    catch (...)
    ...
}

Follow startSynchronization() to the execution function it binds:

// ClickHouse/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp
void MaterializeMySQLSyncThread::synchronization()
{
    ...
    if (std::optional<MaterializeMetadata> metadata = prepareSynchronized())
    {
        while (!isCancelled())
        {
            UInt64 max_flush_time = settings->max_flush_data_time;
            BinlogEventPtr binlog_event = client.readOneBinlogEvent(...);
            {
                // incremental synchronization: listen for binlog_event
                if (binlog_event)
                    onEvent(buffers, binlog_event, *metadata);
            }
        }
    }
    ...
}
  • Full synchronization

MaterializeMySQLSyncThread::prepareSynchronized() is responsible for DDL and full synchronization; the simplified main flow is as follows:

// ClickHouse/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp
std::optional<MaterializeMetadata> MaterializeMySQLSyncThread::prepareSynchronized()
{
    while (!isCancelled())
    {
        ...
        // the MaterializeMetadata constructor performs a series of preprocessing on the MySQL side
        MaterializeMetadata metadata(connection, ...);
        // clean up outdated tables, then dump the data and DDL of the tables to be synchronized
        metadata.transaction(position, [&]()
        {
            cleanOutdatedTables(database_name, global_context);
            dumpDataForTables(connection, metadata, global_context, ...);
        });
        return metadata;
        ...
    }
}

As a MySQL slave node, ClickHouse performs a series of preprocessing steps on the MySQL side in the MaterializeMetadata constructor:

1. Temporarily lock the MySQL tables with a read lock and open a consistent-snapshot transaction;
2. fetchTablesCreateQuery runs the SHOW CREATE TABLE statement to obtain the table-building statements from the MySQL server;
3. Release the table lock after the table-building statements are obtained (see the protocol sketch below).
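At the protocol level, this preprocessing corresponds roughly to MySQL statements like the following (a sketch; the exact statements and their order depend on the ClickHouse version):

FLUSH TABLES;
FLUSH TABLES WITH READ LOCK;     -- block writes so a consistent position can be captured
SHOW MASTER STATUS;              -- record the binlog position / executed GTID set
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */;
SHOW CREATE TABLE master_db.runoob_tbl;   -- repeated for each table to be synchronized
UNLOCK TABLES;                   -- release the locks once the schemas are captured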

Going further down, metadata.transaction() takes an anonymous function as an argument; following it until the anonymous function is executed leads to the cleanOutdatedTables and dumpDataForTables functions. Look primarily at dumpDataForTables:

// ClickHouse/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp
static inline void dumpDataForTables(...)
{
    ...
    // line 293, tryToExecuteQuery is executed here
    tryToExecuteQuery(..., query_context, database_name, comment);
}

Following tryToExecuteQuery leads back to the executeQueryImpl() function mentioned earlier, but this time the context information has changed, a different interpreter is generated, and the DDL conversion and table dump operations are performed:

// ClickHouse/src/Interpreters/executeQuery.cpp
static std::tuple<ASTPtr, BlockIO> executeQueryImpl(...)
{
    ...
    ast = parseQuery(...);
    ...
    auto interpreter = InterpreterFactory::get(ast, context, ...);
    ...
    // line 525, the interpreter returns the result
    res = interpreter->execute();
    ...
}

This time InterpreterFactory returns InterpreterExternalDDLQuery; step in to see what its execute function does:

// ClickHouse/src/Interpreters/InterpreterExternalDDLQuery.cpp
BlockIO InterpreterExternalDDLQuery::execute()
{
    ...
    if (external_ddl_query.from->name == "MySQL")
    {
#ifdef USE_MYSQL
        ...
        // line 61, the DDL is executed here during the full copy
        else if (external_ddl_query.external_ddl->as<MySQLParser::ASTCreateQuery>())
            return MySQLInterpreter::InterpreterMySQLCreateQuery(
                external_ddl_query.external_ddl, getIdentifierName(arguments[0]),
                getIdentifierName(arguments[1])).execute();
#endif
    }
    ...
    return BlockIO();
}

Follow along to see what InterpreterMySQLCreateQuery::execute() does:

// ClickHouse/src/Interpreters/MySQL/InterpretersMySQLDDLQuery.h
class InterpreterMySQLDDLQuery : public IInterpreter
{
    public:
    ...
    BlockIO execute() override
    {
        ...
        // line 68, convert the DDL statements obtained from MySQL
        ASTs rewritten_queries = InterpreterImpl::getRewrittenQueries(
                   query, context, mapped_to_database, mysql_database);
        
        // line 70, execute the converted DDL statements here
        for (const auto & rewritten_query : rewritten_queries)
            executeQuery(..., queryToString(rewritten_query), ...);

        return BlockIO{};
    }
    ...
}

Go further into InterpreterImpl::getRewrittenQueries to see how the DDL is converted:

// ClickHouse/src/Interpreters/MySQL/InterpretersMySQLDDLQuery.cpp
ASTs InterpreterCreateImpl::getRewrittenQueries(...)
{
    ...
    // tables without a primary key cannot be materialized
    if (primary_keys.empty())
        throw Exception("The " + ... + " cannot be materialized, no primary keys.", ...);
    ...
    // add the hidden _sign and _version columns
    auto sign_column_name = getUniqueColumnName(columns_name_and_type, "_sign");
    auto version_column_name = getUniqueColumnName(columns_name_and_type, "_version");
    ...
    // the storage engine is rewritten to ReplacingMergeTree
    storage->set(storage->engine, makeASTFunction("ReplacingMergeTree", ...));
    ...
    return ASTs{rewritten_query};
}

After the DDL conversion is complete, the new DDL statements are executed to create the tables, and control returns to dumpDataForTables:

// ClickHouse/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp
static inline void dumpDataForTables(...)
{
    ...
    // line 293, tryToExecuteQuery executes the converted DDL statements here
    tryToExecuteQuery(..., query_context, database_name, comment);
    ...
    // the full data is then read from MySQL through a MySQLBlockInputStream
    MySQLBlockInputStream input(connection, ...);
}
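Conceptually, the full dump streams the result of a plain SELECT over the MySQL connection and writes the rows into the ReplacingMergeTree table (a sketch of the idea, not the literal implementation):

-- roughly what is read over the MySQL connection for each table:
SELECT runoob_id, runoob_title, runoob_author, submission_date FROM master_db.runoob_tbl;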
  • Incremental synchronization

Remember the execution function bound by startSynchronization()? Full synchronization is handled in prepareSynchronized(), but what about incremental updates?

// ClickHouse/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp
void MaterializeMySQLSyncThread::synchronization()
{
    ...
    if (std::optional<MaterializeMetadata> metadata = prepareSynchronized())
    {
        while (!isCancelled())
        {
            UInt64 max_flush_time = settings->max_flush_data_time;
            BinlogEventPtr binlog_event = client.readOneBinlogEvent(...);
            {
                // incremental synchronization: listen for binlog_event
                if (binlog_event)
                    onEvent(buffers, binlog_event, *metadata);
            }
        }
    }
    ...
}

As you can see, the while statement listens for binlog_event, which monitors changes to the BinLog on the MySQL side. As soon as the MySQL side performs an operation, a new binlog_event arrives and onEvent is triggered, so incremental updates are mainly done here:

// ClickHouse/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp
void MaterializeMySQLSyncThread::onEvent(Buffers & buffers, const BinlogEventPtr & receive_event, MaterializeMetadata & metadata)
{
    // incremental synchronization is implemented by listening for binlog events; currently
    // four types of events are supported: MYSQL_WRITE_ROWS_EVENT, MYSQL_UPDATE_ROWS_EVENT,
    // MYSQL_DELETE_ROWS_EVENT, and MYSQL_QUERY_EVENT.
    // See the corresponding onHandle functions for the specific processing logic.
    if (receive_event->type() == MYSQL_WRITE_ROWS_EVENT)       { ... }
    else if (receive_event->type() == MYSQL_UPDATE_ROWS_EVENT) { ... }
    else if (receive_event->type() == MYSQL_DELETE_ROWS_EVENT) { ... }
    else if (receive_event->type() == MYSQL_QUERY_EVENT)       { ... }
    else { /* MYSQL_UNHANDLED_EVENT */ }
}

Summary

The MaterializeMySQL engine is an official ClickHouse feature released in 2020. Since this feature is a real need in production environments and is still young, the module is iterating rapidly, and a lot of functionality remains to be perfected, such as viewing the status of the replication process and verifying data consistency. For those interested, see GitHub's 2021-roadmap, which lists some of the community's recent plans. Please point out any misunderstandings in the above.

References

ClickHouse community source code