1. Introduction

I believe you are often read and write split, vertical split, horizontal split, split table these terms confused. I’m sometimes confused, so today I’m going to clear up some common database terms and write them down as well.

2. Read/write separation

This relatively easy to understand some, is the database is divided into Master and slave libraries, a Master library (Master) for writing data, multiple slave libraries (Slaver) for polling data reading process, between the Master and slave libraries through some communication mechanism for data synchronization, is a common database architecture. The following diagram shows the structure of “one master and two slaves” :

2.1 Why read/Write Separation

Most Internet data operations tend to be read more than write less, with the growth of data, database “read” will first become the bottleneck. If we want to linearly improve the read and write performance of a database, we need to keep the reads and writes as independent as possible. Before using read/write separation we should consider whether caching can solve the problem. Then consider grouping the database into “read” and “write” groups. Read/write separation means the fragmentation of an integrated structure, and the following issues need to be considered in high-volume, high-concurrency scenarios

  1. How to ensure Master high availability, failover, fusible current limiting, etc.
  2. Rules for distinguishing read and write operations, and how to handle read and write commands at the code level to avoid service intrusion.
  3. Tolerance for data consistency. Although it is data synchronization, it is still a problem that cannot be ignored because of the uncertainty of network.

3. The depots

Database vertical split, database horizontal split referred to as the branch library. According to specific conditions and dimensions, the data in the same database is split into multiple databases (hosts) to achieve the effect of distributing the load of a single database (host). In this way, we reduce the size of the data set in a way that improves performance in space for time.

3.1 Database vertical Split

Vertical database splitting refers to grouping tables in a database by business and putting the same group into a new database (logically, not an instance). Large businesses need to be divided into small businesses based on the actual business. For example, the user related tables, order related tables and logistics related tables in the whole business of the mall are separately classified to form user system database, order system database and logistics system database as shown below:

This brings some benefits: (a) clear business and single responsibility, (b) easy maintenance and expansion, and (c) data servitization. At the same time, it also has some negative effects :(a) it increases the complexity of the whole application and creates cross-library transactions; (b) it causes “barrel effect”, any weakness may affect the whole system; (c) some table relationships cannot join but can only be maintained through service calls. Even data inconsistency due to network problems.

Vertical splitting is usually preferred in cases where repository splitting is required.

3.2 Database horizontal splitting

After encountering performance bottlenecks in stand-alone databases after vertical database splitting, consider database splitting horizontally. The reason for vertical split is that data services are clear and simple after vertical split, which makes it easier to specify horizontal standards. For example, the horizontal split of user system after vertical split of mall business is better than the horizontal split of the whole mall business to find dimensions. We can split the data according to the interval of user registration time, user area or user ID range, hash and other conditions, and then associate the records of related tables. If you are placed on the whole mall business is based on users or orders are not good to consider.

We split the user system horizontally on a scale of 1 million as follows:

The benefits of this split are: (a) the capacity of a single library is controllable, (b) single records ensure data integrity, (c) data relationships can be maintained through joins, (d) cross-library transactions are avoided; Disadvantages also exist :(a) split rules have some influence on coding, and (b) partitioning interactions of different businesses need to be designed in an overall way

4. The table

Subtables are also divided into data table vertical split and data table horizontal split.

4.1 Vertical split of data tables

Table vertical splitting is the process of dividing a table’s columns vertically into multiple tables, changing the table from “wide” to “narrow.” Generally follow the following points to split:

  • Separate hot and cold, put the frequently used columns in one table, and the infrequently used columns in one table.
  • Large field columns are stored separately
  • The columns of an association relationship are placed closely together

We split the user table into two tables with frequently used and infrequently used and large fields:

4.2 Horizontal split of data tables

The horizontal splitting of a table feels the same as the horizontal splitting of a library in mind, but at a different granularity. The table structure remains the same. In other words, the union of the split dataset is equal to the pre-split dataset. There is nothing left to say about this after understanding section 3.2.

5. To summarize

Several database optimization concepts are briefly described here, which are often used in combination in practice. We need to make a good estimate of the amount of data before the actual operation, so that we can select the type according to the increment of the predicted future data. Business data growth is small and is often used for table splitting. If the growth is particularly large and reaches tens of thousands of levels, you can choose sub-databases, such as some capital score streams, historical records and so on. Sometimes it is not all right after splitting. For example, after splitting by region, the business in Region A grows rapidly and performs well, while that in region B suffers from weak promotion and fierce competition, resulting in sluggish performance, which leads to skewed data. It will also affect the expected effect of sub-database sub-table. This needs to establish a long-term monitoring and forecasting mechanism to deal with, and even timely adjust the strategy according to the actual situation. Data split also faces many problems of distributed, distributed transactions, high availability, data consistency, global uniqueness are all issues that should be considered. If you have any questions, please contact me through the public account Felordcn.

Follow our public id: Felordcn for more information

Personal blog: https://felord.cn