In actual Mycat

Let’s use a mini e-commerce system database table to practice. Our current mysql architecture is two servers. Server A is ready to carry three user information libraries, and server B is ready to carry two order information libraries. Meanwhile, master/slave replication is enabled on server B.

The original library table was described as follows:

Area table:

Zip code table:

User information table:

The order list:

Note: By observing e-commerce websites such as Ali, it can be found that these e-commerce websites will place products of only one merchant with an order number, so we also follow this design.

Order list:

Now with the development of business, user information and order information are very large, we need to divide the database and table, how to do?

We have determined that we are going to split the user information and the order information, so that a vertical split of the business repository.

After business dewarehousing, the user information and order information will be entered into the sub-database and sub-table. The most important step is the selection of sharding column, which will directly determine whether the whole sub-database and sub-table scheme is successful. The selection of sharding fields is strongly related to business. The method of selecting sharding fields mainly analyzes your API traffic, gives priority to the API with heavy traffic, extracts the SQL corresponding to the API with heavy traffic, and takes these SQL conditions as the sharding fields.

We investigated the fields in the business and user information table. User login is a high-frequency and necessary operation, and the fields involved in login include Nicky_name and phone_name. We consider these two fields as sharding fields, and the user ID is often used in our business, so it should also be used as sharding fields. Now that there are three fields that need to be partitioned, what do you do in this case? In fact, we can consider splitting the user information table. Nicky_name and phone_name are collectively referred to as the login field. After the user logs in, the user can find the user ID through the login field. So the user information table becomes:

Order business From the business perspective, there are three columns as three independent Sharding columns, namely: THE ID of order itself, buyer_user_id, merchant_user_id. Because the order business buyers and sellers of the query flow are relatively large. Select * from order_id; select * from order_id; In this case, order_id will not be processed because the method is the same as buyer_user_id and merchant_user_id.

There is another point that needs to be mentioned here, whether multiple sharding-column sub-database sub-tables are fully redundant or only redundant relational index tables.

The case of full redundancy is as follows — the data of the table corresponding to each Sharding column is full, which has the advantage of no second query and better performance, but has the disadvantage of wasting storage space. If we do this this time, the table will become:

Note: using redundant relation indexing table partition, table structure can be designed so that, the order table remains the same, partition fields for the id of the order itself, the newly built two tables, partition field respectively buyer_user_id, merchant_user_id, namely

Partition field merchant_user_id:

merchant_user_id order_id

Sharding buyer_user_id column to:

buyer_user_id order_id

The advantage of this method is to save space. The disadvantage is that except for the query of the first Sharding column, all other sharding column queries need to be queried twice:

The order list and order table are typical master and slave list. From the table type provided by Mycat, they belong to ER table type. In order to make the detail record and order record fall on the same node after sharding, so this table should also have two, but the table structure is the same.

After the structure of the table is determined, it is the choice of partitioning algorithm. For the user information table, because it is an ID fragment, we can directly take the module with THE ID, while the partition field of the login table is a string, so we choose to intercept part of the string to do hash modeling. For the order table, we choose the algorithm “model constraint”.

As for post, since it is rarely used, we only put it on one node, and the range table perfectly fits the definition of global table in Mycat, so the range table is defined as global table, so the configuration file in Mycat should look like this:

Mycat architecture analysis and implementation analysis

The main architecture 

Mycat consists of several modules: communication protocol, route resolution, result set processing, database connection, monitoring and so on.

1. Communication protocol module

The communication protocol module undertakes the processing of sending and receiving data and thread callback at the bottom level, and mainly adopts Reactor and Proactor modes to improve efficiency. At present, Mycat communication module adopts Reactor mode by default and MySQL protocol at the protocol layer.

2. Route parsing module

The route parsing module is responsible for parsing the grammar of the incoming SL statements, parsing the conditions, statement types and keywords of the SQL statements parsed from MySQL protocol and entered into the module, and optimizing the SQL statements that meet the requirements. Finally, route calculation is carried out according to these route computing units.

3. Result set processing module

The result set processing module is responsible for aggregating, sorting and intercepting the cross-fragment results. Since data is stored in different databases, cross-shard data needs to be aggregated.

4. Database connection module

The database connection module is responsible for creating, managing, and maintaining the back-end connection pool. To minimize the overhead of establishing a database connection each time, the database uses connection pooling to manage the connection life cycle

5. Monitor the management module

The monitoring and management module is responsible for monitoring and managing the connection, memory and other resources in Mycat. Monitoring is to display some monitoring data in real time through management commands, such as the number of connections and cache hits. Management detects and releases unused resources mainly through polling events.

6.SQL execution module

The SQL execution module is responsible for obtaining the corresponding target connection from the connection pool, synchronizing the information of the target connection, and then distributing the SQL statement to the corresponding node for execution according to the result of route parsing.

Overall Execution process

Main process: initiated by read/write event notification of communication protocol module. The read/write event informs the specific callback code to handle the read/write event. The execution process of the management module is initiated when the timer event checks and releases resources

The data sent from the client enters the execution component through protocol resolution and route resolution. The execution component sends the data to the communication protocol module, and finally the data is written to the target database.

Data is returned from the back-end database, parsed by the protocol and sent to the callback module. If the data involves multiple nodes, the execution process will first enter the result set aggregation, sorting and other modules, and then return the processed data to the client through the communication protocol module.

routing

In principle, you can think of myCat as an SQL forwarder. Mycat receives the SQL from the front end and forwards it to the mysql server in the background for execution. However, there are many mysql nodes (such as Dn1, dn2, dn3), which nodes should be forwarded to? That’s where route resolution comes in.

Routing ensures that the SQL is forwarded to the correct node. The range of forwarding is just right, not many hair. For selection, MyCat uses DruidParser.

Implementation of cross-library Join

Global table

Any system an enterprise will have a lot of basic information table, the data structure is relatively simple, similar to a dictionary table, is there a link between them and the business table, but the relationship will not be master-slave relationship, most is a genus of sexual relations and the equivalent of configuration information, such as provinces, city, region, etc.

When the amount of business data reaches a certain size, we need to fragment the business table, the relationship between the business table and the related basic information becomes very troublesome, and these basic information tables have the following characteristics

L Data changes infrequently

L The scale of data volume is also relatively small, with tens of thousands of records in general and rarely more than hundreds of thousands of records

Given these characteristics, Mycat defines a special kind of table called a global table. When inserting into a global table, each node simultaneously inserts and updates data. When reading data, any node can read data because the data of any node is the same. In this way, the read performance is improved and the efficiency of cross-node Join is solved.

The important features of global tables are as follows.

L The insertion, update and deletion of the global table will be performed synchronously in all nodes in real time to maintain the consistency of data in each fragment

L Queries for a global table can be performed from any node because all nodes have the same data. A global table can Join any table

The basic information table and dictionary table are defined as global table. From the business point of view, the Join operation between them will not be affected after the business table is sharded. This is an implementation to solve the problem of cross-library Join.

ER shard

ER relational model of database is often used in current application systems. Mycat refers to the design ideas of Foundation dB, a newcomer in the field of NewSQL. Foundation DB innovatively put forward the concept of Table Group. The storage location of sub-tables depends on the main Table and is physically close to each other, so it completely solves the problem According to this idea, a data sharding strategy based on E-R relation is proposed, in which the records of the child table and the associated parent table are stored in the same data sharding.

In the process of data insertion, the child table will be routed by Mycat to the node of its related parent table record, so that the Join query of the parent table and child table can be pushed down to each database node. This is the most efficient cross-node Join processing technology and is also the first one of Mycat, as shown in Figure 7-12.

Customer table adopts the strategy of sharding by scope. Customers in each scope are in one shard, and Customer and Customer100 are in host 1 and Host2. As you can see from Figure 7-12, the Orderl and Order2 orders for Customer1 are on the host shard, The Oder3 and Order100 orders of Customer100 are shard on Host2, so the records of Orders table depend on the parent table for shard. The association relation of the two tables is Orders Customer ID = Customer ID.

In accordance with such sharding rules, customer and Orders on Host1 sharding can perform local Join operations, as well as on Host2 sharding, that is, Join on the respective sharding first, and then merge the data of the two nodes inside Mycat, thus completing cross-sharding Join. Consider that there are 1 million orders tables on each shard, so there are 100 million in 10 shards. The ER mapping based data sharding pattern basically solves the problems faced by many applications.

However, there are many businesses with multi-layer ER relationship. In this case, there is a very simple method, that is, the concept of space for time, simply speaking, the multi-layer E-R relationship, such as the second and third layer E-R relationship, can be solved by adding redundant fields. For example, in the above example, there is also a sub-table Order Items of Orders, and we only need to add a new field customer ID to the table, that is, the ER relation of the second and third layer is transformed into the ER relation of the first layer for shard.

ShareJoin is an implementation of Catlet, which executes parsed SQL in batches, coexists and merges result sets.

Convergence and sorting

The parsed SQL is executed separately, and the result set is coexist, and the result set is merged, and the result set is heaped in mycat if there is a sort.