Original article: mp.weixin.qq.com/s/8eej1CMmW…

In interviews, whenever high concurrency or big data comes up, the conversation often turns to splitting databases and tables (sharding). The reason is that a single database cannot withstand the concurrency, and as the number of users grows, the amount of data a single database can hold is also limited.

If a single table holds too much data, even slightly complex SQL queries become slow. At today's moderately large Internet companies, splitting databases and tables has become standard practice. If you interview right now and the interviewer asks you about database and table sharding and you say you have never done it, they will immediately assume you have no high-concurrency experience and have only worked on relatively simple business systems.

As a rule of thumb, a single MySQL table should stay within about 5 million (500W) rows, and more than 10 million (1000W) rows is not recommended. Beyond 10 million rows, you should consider splitting into separate databases and tables.

Common interview questions on database and table sharding include:

  • Why split databases and tables?

  • Which database and table sharding middleware have you used?

  • What are the advantages and disadvantages of the different sharding middleware?

  • How exactly do you split databases vertically or horizontally?

  • How did you migrate the system to sharded databases and tables without downtime?

  • How do you generate global IDs after splitting tables?

When a new business system is developed, the priority is to build a prototype quickly and get it online as soon as possible, so it almost always starts as a single-database system.

Figure 1 Single database

Maybe the business grows so fast that within a few months it reaches 10 million users! 1 million daily active users! 100,000 new rows per day in a single table! A peak of 1,000 requests per second (QPS)!

Now everyone starts to feel the pressure. Why? With 100,000 new rows every day, that is 3 million more rows a month. The single table already holds millions of rows and will soon pass 10 million.
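The growth arithmetic above can be checked with a small helper. The starting size of 3 million rows is a hypothetical figure; the 100,000 rows per day and the 10 million threshold come from the scenario, and `days_until` is an illustrative name.

```python
import math

def days_until(threshold_rows: int, current_rows: int, rows_per_day: int) -> int:
    """Days of steady growth before a table reaches threshold_rows."""
    if current_rows >= threshold_rows:
        return 0
    return math.ceil((threshold_rows - current_rows) / rows_per_day)

ROWS_PER_DAY = 100_000              # daily growth from the scenario above
monthly_growth = ROWS_PER_DAY * 30  # roughly 3 million rows per month

# Hypothetical starting point: the table already holds 3 million rows.
print(monthly_growth)                                   # 3000000
print(days_until(10_000_000, 3_000_000, ROWS_PER_DAY))  # 70
```

At this pace a table that starts at 3 million rows crosses the 10 million mark in about ten weeks.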

The number of users keeps growing, and so does the amount of data added each day. At this rate, the system may not hold up for long. The business system itself can easily be scaled out by adding machines.

Figure 2 Adding machines to the business system

However, most requests are concentrated in about 20% of the day, and for the other 80% the request volume is manageable. During that 20% window, concurrent requests per second and online users peak, and the load on the database is at its daily high. In this case, you can put an MQ between the business system and the MySQL database for peak shaving, for example Kafka, to absorb high-concurrency bursts. If the peak is 8,000 requests per second and the asynchronous write system consumes 2,000 requests per second, then with MQ peak shaving the database operations that have not yet run are buffered in the message queue, waiting for the asynchronous write system to consume them gradually.
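To see what peak shaving does to the queue, here is a minimal simulation, assuming a single fixed consumer rate and ignoring partitions, retries, and message size. The traffic pattern is hypothetical; the 8,000/s peak and 2,000/s consumption rate come from the example above.

```python
def simulate_peak_shaving(arrivals_per_sec, consume_rate):
    """Return the MQ backlog at the end of each second.

    arrivals_per_sec: requests the business system writes to the MQ in each second.
    consume_rate: maximum requests the async writer sends to MySQL per second.
    """
    backlog = 0
    history = []
    for arrived in arrivals_per_sec:
        backlog += arrived                      # producers enqueue everything immediately
        backlog -= min(backlog, consume_rate)   # MySQL never sees more than consume_rate
        history.append(backlog)
    return history

# Hypothetical traffic: 3 seconds at the 8,000/s peak, then 3 seconds at 1,000/s.
traffic = [8000] * 3 + [1000] * 3
print(simulate_peak_shaving(traffic, 2000))
# [6000, 12000, 18000, 17000, 16000, 15000]
```

The backlog grows by 6,000 per second during the peak and only starts draining once traffic falls below the consumer rate, which is why this works only when peaks are short relative to the quiet periods.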

Figure 3 MQ peak shaving

The business system can be scaled out by adding machines, no problem, and MQ peak shaving can absorb the bursts, but the bottleneck is MySQL. There are three main problems:

(1) A single MySQL server cannot support high concurrency;

(2) Disk space is running out;

(3) SQL runs slower and slower.

What if MySQL needs to handle even higher concurrency? Say the peak is now 8,000 requests/s, the asynchronous write system has been scaled out to multiple machines, and consumption from the MQ reaches 6,000 requests/s. A single database cannot take that, so you have to split it into multiple databases first.

Figure 4 Splitting the database

Suppose there are three databases, each with identical tables and table structures. The MQ is divided into three partitions, and each asynchronous write system instance consumes only one partition. Each instance routes every record to a database based on some ID in the record, such as userId, so all data for the same userId always lands in the same database.
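A minimal sketch of that routing step, assuming the write system takes userId modulo the number of databases (the text only says "a certain ID"; modulo is one common choice and comes up again below). The database names `db_0`, `db_1`, `db_2` and the sample records are hypothetical.

```python
NUM_DATABASES = 3  # three databases with identical table structures

def route_database(user_id: int, num_databases: int = NUM_DATABASES) -> int:
    """Pick a database index by taking userId modulo the number of databases."""
    return user_id % num_databases

# Hypothetical records pulled from one MQ partition.
records = [
    {"userId": 101, "amount": 30},
    {"userId": 102, "amount": 15},
    {"userId": 101, "amount": 42},
    {"userId": 105, "amount": 8},
]

buckets = {i: [] for i in range(NUM_DATABASES)}
for record in records:
    buckets[route_database(record["userId"])].append(record)

for db_index, rows in buckets.items():
    print(f"db_{db_index}:", [r["userId"] for r in rows])
```

Because the rule depends only on userId, both rows for user 101 end up in the same database, so a query for one user never has to touch more than one database.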

Before the split, the single database might gain 1 million (100W) rows every day; now each database gains a little over 300,000. The concurrency the databases can handle has tripled, disk usage on each database server has dropped sharply, and SQL execution performance has improved.

After splitting the database, each table still holds a lot of data and SQL execution is still not fast enough, so we also need to split tables, building a system with multiple databases and multiple tables.

Never let technical limitations hold back the growth of the company's business.

Splitting tables

For example, if a single table holds tens of millions of rows, are you sure it can cope? A large number of connections stuck waiting in MySQL will not only bring down your MySQL database but also set off a chain reaction that brings down your business. In general, once a single table reaches a few million rows, performance gets worse and worse, and it is time to split the table.

What does splitting a table mean?

It means spreading the data of one table across multiple tables, so that a query only needs to hit one of them. For example, if you split tables by user ID, all of one user's data goes into one table, and every operation for that user only touches that table. This keeps the amount of data in each table within a controllable range, for example below 5 million rows per table.
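A sketch of table-level routing, assuming user IDs are spread evenly across tables and physical tables are named `<logical>_<index>`. The `t_order` table name and the 30 million total rows are hypothetical; the 5 million per-table ceiling comes from the text.

```python
import math

MAX_ROWS_PER_TABLE = 5_000_000  # per-table ceiling mentioned in the text

def tables_needed(total_rows: int, max_rows: int = MAX_ROWS_PER_TABLE) -> int:
    """Smallest table count that keeps each table at or below max_rows, assuming an even spread."""
    return max(1, math.ceil(total_rows / max_rows))

def physical_table(logical: str, user_id: int, num_tables: int) -> str:
    """Map a user to one physical table, e.g. t_order_4, so all of that user's rows stay together."""
    return f"{logical}_{user_id % num_tables}"

n = tables_needed(30_000_000)              # hypothetical: 30 million rows in total
print(n)                                   # 6
print(physical_table("t_order", 1234, n))  # t_order_4
```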

Splitting databases

What does splitting a database mean? As a rule of thumb, a single database can sustain at most about 2,000 concurrent requests per second, so beyond that it must be scaled out; a healthy single database should stay below about 1,000 concurrent requests per second. You can then split the data of one database across multiple databases, so that each request only accesses one of them.
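The rule of thumb above translates directly into a capacity estimate. This sketch assumes load spreads evenly across databases, which real traffic rarely does exactly; the 8,000 requests/s peak is the figure used earlier in this article, and `databases_needed` is an illustrative name.

```python
import math

HEALTHY_QPS_PER_DB = 1000  # rule-of-thumb healthy load per database, from the text
MAX_QPS_PER_DB = 2000      # rule-of-thumb upper limit per database, from the text

def databases_needed(peak_qps: int, per_db: int = HEALTHY_QPS_PER_DB) -> int:
    """How many databases keep each one at or below per_db requests/s, assuming an even split."""
    return max(1, math.ceil(peak_qps / per_db))

print(databases_needed(8000))                  # 8 at the healthy level
print(databases_needed(8000, MAX_QPS_PER_DB))  # 4 at the upper limit
```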

Before and after sharding, compared:

  • Concurrency: a single MySQL server cannot handle high concurrency; after the split, the load is spread across several database servers, so the concurrency the system can handle grows with the number of databases.

  • Disk usage: the single server's disk is nearly full; after the split, disk usage on each database server drops significantly.

  • SQL performance: SQL runs slower and slower on a huge single table; after the split, each table holds less data, which improves SQL execution efficiency.

Commonly used sharding middleware and their advantages and disadvantages

  • ShardingSphere (Sharding-jdbc)

  • Mycat

ShardingSphere is an open-source distributed database middleware solution. Sharding-JDBC is a client-side solution: your business system only needs to reference its JAR package to use it. The ShardingSphere community is still actively developing and maintaining it, and personally I consider it a good option today.

Mycat is derived from Cobar and is a proxy-layer solution. Its feature set is very complete, it is currently a very popular database middleware with a very active community, and some companies have started using it. However, compared with Sharding-JDBC it is younger and less battle-tested.

Comparison:

The advantages of ShardingSphere's client-side approach are that there is nothing to deploy, operations cost is low, requests do not need a second hop through a proxy layer, and performance is high. However, any upgrade requires every business system to upgrade and release a new version, and every system is coupled to ShardingSphere.

The disadvantage of Mycat's proxy-layer approach is that it must be deployed separately and you have to operate and maintain a set of middleware yourself, so operations cost is high. The advantage is that it is transparent to every project: if an upgrade is needed, only Mycat itself has to be upgraded.

Generally speaking, either option works, but I recommend that small and medium-sized companies choose ShardingSphere. The client-side approach is lightweight and cheap to maintain, with no need to dedicate staff to it, and small and medium-sized companies usually have less complex systems and fewer projects.

Medium and large companies, however, are better off choosing a proxy-layer solution such as Mycat. Large companies may have many systems and projects, large teams, and enough staff, so it is best to assign people to study and maintain Mycat and let every project use it transparently.

How do you split databases vertically or horizontally?

Horizontal splitting means dividing the data of one table across multiple tables in multiple databases. Every table has the same structure, but each holds different data, and all the tables together make up the full data set. The point of horizontal splitting is to spread data evenly across more databases, so that more databases can support higher concurrency and their combined storage expands capacity.
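A horizontal split can be sketched as partitioning rows by a key. This assumes `order_id` modulo the shard count as the split rule and uses a hypothetical order table; the assertions check the two properties described above: every shard keeps the same columns, and the shards together contain every row exactly once.

```python
def horizontal_split(rows, num_shards, key):
    """Spread rows over num_shards tables with identical columns, using key % num_shards."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[row[key] % num_shards].append(row)
    return shards

# Hypothetical order table: every row has the same three columns.
orders = [{"order_id": i, "user_id": i * 7, "amount": i % 50} for i in range(1, 10_001)]
shards = horizontal_split(orders, 4, "order_id")

# Same structure in every shard...
assert all(set(row) == {"order_id", "user_id", "amount"} for shard in shards for row in shard)
# ...different data in each shard, and together they hold every row exactly once.
assert sorted(row["order_id"] for shard in shards for row in shard) == list(range(1, 10_001))
print([len(shard) for shard in shards])  # [2500, 2500, 2500, 2500]
```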

Figure 5 Horizontal split

Vertical splitting means splitting a table with many columns into multiple tables or databases. Each table has a different structure and contains only some of the columns.

In general, you put the most frequently accessed columns in one table and the rarely accessed columns in another. Because the database caches rows, the fewer columns a frequently accessed row has, the more rows fit in the cache, and the better the performance. Vertical splitting is usually done at the table level.
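A vertical split can be sketched as projecting one wide row onto a hot column group and a cold column group that share the primary key. The column names are hypothetical; in a real database each group would be its own table, and the smaller hot rows are what let more of them fit in the cache.

```python
# Hypothetical wide user table: a few hot columns and some large, rarely read ones.
HOT_COLUMNS = ("user_id", "name", "status")       # read on almost every request
COLD_COLUMNS = ("user_id", "bio", "register_ip")  # read rarely; user_id kept as the shared key

def vertical_split(row):
    """Split one wide row into a hot part and a cold part that share the primary key."""
    hot = {col: row[col] for col in HOT_COLUMNS}
    cold = {col: row[col] for col in COLD_COLUMNS}
    return hot, cold

def join_back(hot, cold):
    """Rebuild the full row when a query needs both parts."""
    if hot["user_id"] != cold["user_id"]:
        raise ValueError("hot and cold parts belong to different users")
    return {**hot, **cold}

wide = {"user_id": 1, "name": "alice", "status": 1, "bio": "x" * 500, "register_ip": "10.0.0.1"}
hot, cold = vertical_split(wide)
print(hot)                           # {'user_id': 1, 'name': 'alice', 'status': 1}
print(join_back(hot, cold) == wide)  # True
```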

Figure 6 Vertical split

Suppose there are 6 million (600W) rows and we need to split them across databases and tables. Putting it all together, the layout might look like this:

Figure 7 Schematic of database and table sharding

The common approach is to take some ID modulo the number of databases to locate the database first, and then locate the table within it. You can take userId or orderId modulo. You can also split databases and tables by data range, for example by the time the data was created.
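Both routing styles can be sketched in a few lines, assuming 3 databases with 2 tables each (hypothetical numbers) and month-based table names such as `t_order_202103` for range sharding. Real middleware expresses these rules in its own configuration; this is an illustration, not ShardingSphere's or Mycat's API.

```python
from datetime import date
from typing import Tuple

NUM_DBS = 3        # hypothetical: 3 databases
TABLES_PER_DB = 2  # hypothetical: 2 tables in each database

def route_by_modulo(shard_key: int) -> Tuple[int, int]:
    """Locate the database first, then the table inside it, from one sharding key (e.g. userId)."""
    db_index = shard_key % NUM_DBS
    # Divide out the database part first so the table choice is independent of it.
    table_index = (shard_key // NUM_DBS) % TABLES_PER_DB
    return db_index, table_index

def route_by_month(created: date) -> str:
    """Range sharding: one table per calendar month of the creation time."""
    return f"t_order_{created.year}{created.month:02d}"

print(route_by_modulo(10))                # (1, 1)
print(route_by_month(date(2021, 3, 15)))  # t_order_202103
```

Dividing by the database count before taking the table modulo keeps the table choice independent of the database choice; using `shard_key % TABLES_PER_DB` directly would leave some database/table pairs empty whenever the two counts share a common factor.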

Once sharding middleware is introduced, our system no longer has to work out which database and table each record is routed to. SQL can be sent straight to the sharding middleware, which routes it to the corresponding databases and tables according to its configuration. At this point the MQ and the asynchronous write system can also be removed, because after sharding, the load each database has to bear is greatly reduced.

Figure 8 Schematic of database and table sharding

Conclusion:

This article explained why databases and tables get split: as the business keeps growing, splitting databases and tables becomes an unavoidable step in evolving the architecture. It also discussed ShardingSphere and Mycat, the middleware commonly used in the industry, along with their advantages and disadvantages. Finally, it covered how to split databases vertically and horizontally, and the specific ways data is routed to databases and tables.
