The logic behind MyCat

This article is an introduction to MyCat. But the focus is not so much on how to use Mycat and how to configure it. Mycat configuration related content I believe readers on the Internet and Mycat official group can find the corresponding answers. The focus of this article is on the implementation of the logic behind Mycat. This article is the author from the network to learn the summary of the results, I hope to have certain help to each reader. If there is inappropriate content, welcome to leave a message in the comments.

Mycat is a database middleware edited based on Java language. It complies with Mysql native protocol and is a universal middleware agent for cross-language, cross-platform and cross-database.
Mycat works by intercepting SQL statements sent by users and analyzing SQL statements: fragment analysis, route analysis, read/write separation analysis, cache analysis, etc. The SQL is then sent to the real database at the back end.
Three configuration files
1. Schema.xml defines logical libraries, tables, shard nodes, and so on
2. Rule-xml defines sharding rules
3. Server.xml defines variables related to users and system variables.
Data segmentation method:
1. Vertical sharding (database sharding) : Shards tables in different business scenarios to different databases based on business.
2. Horizontal shred: Split the data in the same table into different databases according to the logical relationship between the data in the table.
Database and table problems:
1. Introducing distributed transactions
2. Cross node join problem
3. Merge sort paging problems across nodes
4. Multi-data source problem
Mycat join problem
1. Table grouping: The records of the child Table and the associated parent Table are stored in the same data fragment, that is, the child Table depends on the parent Table. The Table Group ensures that data Join will not operate across libraries.
2. Mycat global table:
  - The insertion and update of the global table will be performed on all nodes in real time to keep the data consistency of each shard
  - Query operations for global tables are retrieved from only one node
  - A global table can JOIN any table
3. Share join
  1. Currently, joins of two tables are supported. The principle is to parse SQL statements, split into a single table SQL statement execution, and then the data of each node together.
Mycat query: The routing result of myCAT is determined by the sharding field and sharding method.
1. If a fragment field exists in the query criteria, routes are directly routed to a specific fragment
2. If there is no fragment field in the query condition, MyCat cannot calculate the route and sends the query to all nodes for execution, and then returns the result and aggregates it.
Mycat paging sort:
1. Pure LIMIT: MyCat distributes queries to individual DB nodes for execution. However, Mycat response results depend on which DB node returns results to Mycat first.
2. Limit and sort: MyCat distributes queries to individual DB nodes for execution. After receiving the results from each DB node, a minimum heap operation is performed on all the results. The result is then returned.
3. Limit with offset and sort:
  1. For SQL statements with limit m,n, Mycat will rewrite them to limit 0, m+n to ensure logical correctness of the query result.
  2. Then send the rewritten SQL to each DB node to execute.
  3. After receiving the results from all DB nodes, a minimum heap operation is performed on all the results. It is then returned to the front-end user
  4. Disadvantages: it consumes resources. For K DB nodes, the amount of data MyCat needs to process is (m+n) * T. Mycat is not suitable for paging sort operations.
Mycat transaction implementation.
1. When the application starts a transaction, MyCat identifies the connection as non-automatic
2. Mycat executes subsequent SQl over a connection that is not automatically committed.
3. If all nodes are successfully executed, MyCat marks the connection as Prepare Ready. If one node fails to be executed, the node is in the RollBack state.
4. MyCat waits to reference subsequent commit or rollBack commands. For the commit command: If the current connection is in the Prepare Ready state, the commit command is sent to each DB node.
5. If a DB fails during commit and the other DB nodes commit successfully, mycat waits for the failed DB node to return the result until TIMEOUT. The transaction consistency is broken. Therefore, Mycat transactions are weakly consistent.
Fragmentation rules
1. modulus
2. Fragment enumeration: You can configure fragments by configuring possible enumeration ids in the configuration file.
3. Sharding by scope
4. Sharding by date
MyCat global sequence.
1. MyCat configures the sequence into a file. For each insert, MyCat updates the current sequence value in the sequence_conf.properties file in classpath.
  - Advantages: Local load, fast read.
  - Disadvantages: Poor risk resistance, unable to read local files after MyCat active outage.
2. Database mode: Use a table of the database for accumulation. MyCat preloads a portion of the number segment into MyCat’s memory. The number segment is fetched from the sequence first, and then from the database if MyCat runs out of number segments in memory.
  1. If MyCat crashes, a number segment is wasted. Mycat restarts to get the next number segment.
3. Timestamp mode: UUID=42 bits ms +5 bits MACHINE ID+5 service code +12 repeated accumulation
  1. Advantages: Simple configuration
  2. Disadvantages: The 18-bit ID is too long
  3. Heavily dependent on the machine clock, clock callback will cause serialization duplication.
4. Autonomously produce global sequences.

Related Posts

Why can Apache Flink become a new generation of big data computing engine?

Should we refrain from pursuing new technologies?

The era of cloud native video has begun, and Huawei Cloud is ready