1. Database bottlenecks

Either an IO bottleneck or a CPU bottleneck will eventually lead to an increase in the number of active connections to the database, approaching or even reaching the threshold for the number of active connections the database can handle.

In the case of a business Service, there are few or no database connections available. Then you can imagine (concurrency, throughput, crashes).

1. IO bottlenecks

The first type: disk read IO bottleneck, too much hotspot data, database caching, each query will produce a large number of I/O, reduce the query speed -> branch library and vertical branch table.

Second: network IO bottleneck, request too much data, network bandwidth is not enough -> branch library.

2. CPU bottlenecks

The first kind: SQL problem, such as SQL contains JOIN, group BY, order by, non-index field condition query, etc., increase THE OPERATION of CPU operation -> SQL optimization, establish appropriate indexes, and carry out business calculation in the business Service layer.

Second: single table data volume is too large, query scan too many rows, SQL efficiency is low, CPU bottleneck -> horizontal table.

Two, sub-database sub-table

1. Horizontal branch library

1. Concept: Split data in one library into multiple libraries based on fields and policies (hash, range, etc.).

2. The result:

  • Each library has the same structure;

  • Each library has different data and no intersection;

  • The union of all libraries is full data;

3. Scenario: With the absolute concurrency of the system coming up, it is difficult to fundamentally solve the problem of table splitting, and there is no obvious business ownership to divide the database vertically.

4. Analysis: with more libraries, the pressure of IO and CPU can be relieved exponentially.

2. Level table

1. Concept: Divide data in a table into multiple tables based on fields and policies (such as hash and range).

2. The result:

  • Each table has the same structure

  • The data in each table is different and there is no intersection;

  • The union of all tables is full data;

3. Scenario: The absolute concurrency of the system does not increase, but the amount of data in a single table is too large, which affects SQL efficiency and increases CPU burden, and becomes a bottleneck.

4. Analysis: the amount of data in the table is less, and the single SQL execution is efficient, which naturally reduces the burden on the CPU.

3. Vertical branch library

1. Concept: Split different tables into different libraries based on service ownership.

2. The result:

  • Each library has a different structure;

  • The data in each library is also different and there is no intersection;

  • The union of all libraries is full data;

3. Scenario: The absolute amount of concurrency in the system comes up, and individual business modules can be abstracted. 4. Analysis: At this point, it’s basically servitization.

For example, with the development of services, there are more and more common configuration tables and dictionary tables. At this time, these tables can be separated into a separate library, or even service-oriented. Furthermore, as the business develops and incubates a business model, relevant tables can be separated into a separate library or even servitized.

4, vertical table

1. Concept: Based on the field and according to the activity of the field, separate the fields in the table into different tables (main table and extended table).

2. The result:

  • Each table has a different structure;

  • The data of each table is also different. Generally speaking, each table has at least one intersection column of fields, which is usually the primary key used to associate data.

  • The union of all tables is full data;

3. Scenario: The absolute amount of concurrent data in the system is not increasing. There are not many records in the table but many fields. As a result, the number of data rows in the database cache is reduced. A large number of random read I/OS are generated when disk data is read during query, resulting in I/O bottlenecks.

4. Analysis: Use list pages and detail pages to help understand. The split principle of vertical split tables is to put hot data (which may be redundant and often queried together) together as the primary table, and non-hot data together as the extended table.

This allows more hotspot data to be cached, reducing random read IO. After the split, to get all the data needs to associate the two tables to fetch the data. Keep in mind, however, that you should never use joins, because joins not only increase the CPU burden but also couple the two tables together (on the same database instance). Associated data should be done in the business Service layer, fetching primary and extended table data separately and then associating all data with associated fields.

Three, database and table tools

  1. Sharding-sphere: JAR, formerly sharding-JDBC;

  2. TDDL: JAR, Taobao Distribute Data Layer;

  3. Mycat: middleware.

Note: the pros and cons of the tool, please do your own research, official website and community priority.

Four, the steps of database and table

Evaluate the number of branches or tables based on capacity (current capacity and growth) -> Select key (uniform) -> table rules (hash or range) -> execute (double write) -> Capacity expansion (minimize data movement).

Fifth, the problem of database and table

1, non-partition key query problem (horizontal partition database partition table, split strategy is commonly used hash method)

  1. Only one non-partition key can be used for conditional query
  • Mapping method

  • Genetic method

Note: When writing, the genetic method generates userID, as shown in figure. For xBIT gene, for example, there are 8 tables, 23=8, so x is 3, that is, 3bit gene. When querying according to userID, modules can be directly routed to the corresponding sub-database or sub-table. How does the database divide into tables, vertical? Level? You can read this one as well.

When querying according to username, username_code is generated through the userNamecode generating function and then modded to the corresponding branch library or branch table. Id Generates the common Snowflake algorithm.

  1. More than one non-partition key is used as a conditional query
  • Mapping method

  • Redundancy method

Note: Routing to dBOBuyer when querying orderID or buyerID and to dBO_seller when querying sellerID. Feels like putting the cart before the horse! Is there another good way? What about changing the technology stack?

  1. In addition to partition keys, the background can also query the combination of non-partition keys
  • No method

  • Redundancy method

2, non-partition key cross-database cross-table paging query problem (horizontal partition database and table, split strategy is commonly used hash method)

Note: use NoSQL method to solve (ES, etc.).

3. Capacity expansion (horizontal database and table, split strategy using common hash method)

1. Horizontally expanding the database (upgrading the secondary database)

Note: Expansion is multiplied.

2. Horizontal expansion table (double write migration method)

Step 1 :(synchronize dual-write) configure dual-write and deploy applications. Step 2 :(synchronous double write) copy the old data from the old library to the new library; Step 3 :(synchronous double-write) proofread the old data in the new database according to the old database; Step 4 :(synchronize dual-write) remove the dual-write application and deploy it.

Note: Double-write is common.

Vi. Summary of database and table

  1. Split database and table, first of all, we have to know where the bottleneck is, and then can reasonably split (split database or table? Horizontal or vertical? How many? . And can not be divided in order to separate table and split.

  2. Key selection is important to consider both split evenly and non-partition key query.

  3. As long as it meets the requirements, the split rules should be as simple as possible.