1. What is sub-database sub-table

Literally, the data stored in one database is partitioned into multiple databases, and the data stored in one table is partitioned into multiple tables.

Data segmentation can be divided into the following two segmentation modes according to the types of segmentation rules.

  • Vertical (vertical) shard: Split a single table into multiple tables and distribute them to different databases (hosts). For example, an order table contains user information, commodity information, receiving address information, and promotion information. As the table has too many fields, it looks bloated. Therefore, we separate them to form multiple tables to store data.

Advantages of this operation:

After the split, the services are clear and the rules are clear. Integration or extension between systems is easy. Put the table on different machines according to cost, application level, application type, etc., for easy management. Easy to realize static and static separation, cold and hot separation of the database table design mode. Data maintenance is simple.Copy the code

Disadvantages:

Service tables are diverse and SQL statements are complex.Copy the code
  • Horizontal (horizontal) sharding: According to the logical relationship of the data in the table, the data in the same table is split to multiple databases (hosts) according to certain conditions. In contrast to vertical shard, horizontal shard does not classify tables, but distributes them into multiple libraries according to certain rules of a certain field. Each table contains a part of the data, and all the tables add up to the full amount of data.

    Have a users table, for example, article 100 million, article of single table in the query, insert, update, will be very slow, so we can put these data distribution into the 100 tables, each table will be down the amount of data, leading to the capacity of the single table will not be too big, so as to ensure the single table query processing capacity, etc.

    The data allocation principle is usually called sharding rule. Common sharding rules include modulo the user ID.

Note: it is important to decide on sharding rules and try to select fields that do not change. If you select fields such as region, gender, age, etc., you will have to move the data to another database when the user changes the information.

The advantages of this level of operation are as follows:

Data in a single table is kept at a certain level to improve performance. The structure of the sharded table is the same, and the application layer is less modified. You only need to add routing rules. The stability and load capacity of the system are improved.Copy the code

Disadvantages are as follows:

After sharding, the data is scattered, so it is difficult to use the Join operation of database, and the performance of cross-library Join is poor. The split rules are difficult to abstract. The consistency of sharded transactions is difficult to resolve. Data expansion is difficult and requires a lot of maintenance.Copy the code

2. Why do we use separate database and separate table

Handling when database load increases: As the number of users of our application increases, so does the number of visits, and as they increase to a certain level, the application becomes slower and slower. Of course, we can increase the speed by increasing the front-end application load, but until one day we found that no matter how to increase the front-end application load can not improve the speed, we gradually found the reason, is the database problem. Because database performance bottlenecks exist, they are unavoidable.

3. Commonly used middleware in the market

MyCat: is a third-party middleware application that requires no code changes to use MyCat. When we use, if there are more than one library, we only need to write myCAT in the code of a logical library information, and the database level configuration, such as the total number of libraries, the table in each library, each table sharding rules, these are configured in Mycat, do not need to modify the code information. The specific logical diagram is as follows:

Sharding JDBC: is a JAR package that requires code changes to use Sharding – JDBC. When we use it, we need to introduce sharding JDBC JAR package, specify the total number of libraries in the configuration file, the table of each library, the sharding rule of each table and other information.

The specific logical diagram is as follows:

How to choose middleware?

Sharding-jdbc and Mycat use a different philosophy. Sharding-jdbc is currently based on JDBC drivers and does not require additional proxies, so there is no need to focus on the high availability of the proxy itself. Mycat is based on Proxy, which copies the MySQL protocol and disguises Mycat Server as a MySQL database, while Sharding-JDBC is based on JDBC interface extension and provides lightweight services in the form of JAR packages.

In fact, MyCAT is suitable for small and medium-sized enterprises. It is very easy to achieve the separation of reading and writing of the database and the separation of database and tables. On the contrary, for large enterprises, they will develop their own applications for the middle layer of the database. For example, Sharding JDBC was originally dangdang internal database middleware, and then came out of open source.

Follow me to get more learning materials.