The background,

With the development of the time and business, the amount of data in the database growth is not controlled, libraries, and the data in the table will be more and more big, then there is a higher disk, IO, system overhead, even performance bottlenecks, and a service resource is limited after all, so you need to split the database and table, and thus provide better data services.

When the user expresses to ten million level, it will be very difficult to do a lot of operations, so when the data increases to more than 10 million, it needs to separate database and table to relieve the pressure of single database (table).

Two, what is the sub-database sub-table

In simple terms, it means that the data stored in the same database is dispersed to multiple databases (hosts) under certain specific conditions, so as to achieve the effect of dispersing the load of a single device.

Data Sharding can be divided into two Sharding modes according to the types of Sharding rules. One is to shard data to different databases (hosts) according to different tables (or schemas), which can be called vertical (vertical) data shard. The other is according to the logical relationship of the data in the table, the data in the same table will be split to multiple databases (host) in accordance with certain conditions, which is called the horizontal (horizontal) segmentation of data.

The biggest characteristic of vertical segmentation is simple rules and more convenient implementation. It is especially suitable for the system with very low coupling degree, little mutual influence and very clear business logic between different businesses. In such a system, it is easy to separate the tables used by different business modules into different databases. Split by different tables has less impact on the application and the split rules are simpler and clearer.

Horizontal sharding is a little more complicated than vertical sharding. Because different data from the same table is split into different databases, the splitting rules themselves are more complex for the application than splitting by table name, and later data maintenance is also more complex.

Three, vertical segmentation

Each database consists of many tables, and each table corresponds to different business. Vertical segmentation means that tables are classified according to business and distributed to different databases, so that data or pressure is shared to different libraries, as shown in the following figure:

The system is segmented into user, order, transaction and payment modules.

A good architecture design of the application system, its overall function must be composed of many functional modules, and the data required by each functional module corresponds to the database is one or more tables. In architectural design, the more unified the interaction points between each functional module are, the lower the coupling degree of the system is, and the better the maintenance and expansibility of each module of the system is. In such a system, it is easier to achieve vertical segmentation of data.

However, it is often difficult for some tables in the system to be completely independent, and there is the situation of expanding library join. For such tables, it is necessary to balance whether the database compromises the business and shares a data source, or divides it into multiple libraries and calls the business through interfaces. In the early stage of the system, when the amount of data is relatively small or the resources are limited, it will choose to share the data source. However, when the data develops to a certain scale and the load is heavy, it is necessary to do segmentation.

Generally speaking, it is difficult to segment businesses with complex join scenarios, and it is easy to segment businesses independently. How and to what extent to shard is a challenge to the technical architecture. The advantages and disadvantages of vertical segmentation are analyzed below:

Advantages:

After the split, the services are clear and the rules are clear.

Easy integration or extension between systems;

Data maintenance is simple.

Disadvantages:

Some service tables cannot be joined and can only be solved through interfaces, which improves the system complexity.

It is difficult to expand data and improve performance because of the single library performance bottleneck due to the different limitations of each service.

Transaction processing is complex.

Because vertical sharding is to disperse tables into different libraries according to the classification of services, some business tables are too large and there are bottlenecks in reading, writing and storing single libraries. Therefore, horizontal split is required to solve the problem.

Fourth, horizontal segmentation

In contrast to vertical splitting, horizontal splitting does not classify tables, but distributes them into multiple libraries according to certain rules of a certain field, with each table containing a portion of the data.

To put it simply, we can understand the horizontal partitioning of data as the partitioning by data rows, that is, some rows in the table are shelled into one database and some other rows are shelled into other databases, as shown in the figure

To split data, you need to define sharding rules. Relational databases are two-dimensional models of rows and columns. The first rule of resolution is to find the resolution dimensions.

For example, from the point of view of members, when the order transaction system of merchants queries an order of a member on a certain day or a certain month, it needs to be split according to the date of membership integration, and different data are grouped according to the member ID, so that all data query join will be solved in a single database. If from the perspective of merchants, to query all orders of a merchant on a certain day, we need to split according to the merchant ID; However, if the system wants to split by member and merchant data, it will have some difficulty.

How to find the appropriate sharding rules needs to be considered comprehensively.

Several typical sharding rules include:

According to the user ID module, the data is dispersed to different databases, the user data with the same data are dispersed to a library;

According to the date, the data of different months or even days are scattered into different libraries;

Search by a specific field, or split into different libraries according to a specific range.

As shown in the figure, the sharding principle is to find suitable sharding rules according to the business and disperse them to different libraries. The user ID is used as the model below

Since the data is split, there are advantages and disadvantages.

Advantages:

Split rule abstraction is good, join operation can be done by the database;

There is no single database big data, high concurrency performance bottleneck;

Less application end transformation;

Improve the system stability and load capacity.

Disadvantages:

Split rules are difficult to abstract;

The consistency of fragmented transactions is difficult to solve;

The difficulty of data expansion and maintenance is very large;

Cross-library join performance is poor

What is Mycat

It is an open source distributed database system, is a Server to implement the MySQL protocol, front-end users can regard it as a database agent, using MySQL client tools and command line access, and back-end can use MySQL Native protocol to communicate with multiple MySQL servers, You can also use JDBC protocol to communicate with most mainstream database servers. Its core function is to divide tables and libraries, that is, a large table is horizontally divided into N small tables and stored in the back-end MySQL server or other databases.

Common application scenarios:

Simple read/write separation: supports read/write separation and primary/secondary switchover.

Table and library, sharding more than 10 million tables, supporting a maximum of 100 billion single table sharding;

Multi-tenant applications, one library for each application, but the application only connects to Mycat, so as not to change the application itself, to achieve multi-tenant;

Report system, with the help of Mycat’s ability to divide tables, to deal with the statistics of large-scale reports;  Replace Hbase and analyze big data;

As a simple and effective solution for real-time query of massive data, for example, 10 billion frequently queried records need to be queried within 3 seconds. In addition to queries based on primary keys, there may also be scope queries or other attribute queries. In this case, Mycat may be the simplest and effective choice

SpringBoot+Mycat+MySQL implementation of sub-table sub-library case

Mycat has already implemented the routing function internally. We just need to configure the following sharding rules in Mycat. For developers, we can treat Mycat as a database, and then we can set up the environment:

Step one:

Mycat is a database middleware written in Java, so to run Mycat, you need to prepare a JDK environment, which is jdK1.7 or higher. Therefore, you need to configure the JAVA_HOME environment variable in the system.

Step 2:

Mycat, dl.mycat. IO / 1.6-release…

We build Mycat environment based on CentOS7, so download version:

Mycat – server – 1.6 – RELEASE – 20161028204710 – Linux. Tar. Gz

Step 3:

Upload the downloaded installation package to the server and decompress it. After decompression, the directory structure is as follows:

Step four:

Configure the sharding rule: Copy and paste the following configuration to overwrite the contents of mycat/conf/schema.xml.

<? The XML version = "1.0"? > <! DOCTYPE mycat:schema SYSTEM "schema.dtd"> <mycat:schema xmlns:mycat="http://io.mycat/"> <schema name="TESTDB" checkSQLschema="false" sqlMaxLimit="100"> <table name="user" primaryKey="id" dataNode="dn01,dn02" rule="rule1" /> </schema> <! DataHost --> <dataNode name="dn01" dataHost=" DH01 "database="db01" /> <dataNode name="dn02" dataHost="dh01" database="db02" /> <! -- mycat Logical host dataHost Physical host. DataHost name="dh01" maxCon="1000" minCon="10" balance="0" writeType="0" dbType="mysql" DbDriver ="native"> <heartbeat>select user()</heartbeat> <writeHost host="server1" URL ="127.0.0.1:3306" user="root" password="WolfCode_2017"/> </dataHost> </mycat:schema>Copy the code


: indicates the logical library configuration in mycat. The logical library name is TESTDB

: indicates the logical table configuration in Mycat. The logical table name is user, and it is mapped to two database nodes, dataNode. The partitioning rule is rule1(configured in rule-xml).


: indicates the database node. This node is not a single node but can be configured with read/write separation.


: address configuration of the real database


: detects the user heartbeat


: write library configuration

Copy and paste the following configuration to overwrite the contents of mycat/conf/ rule-xml.

<? The XML version = "1.0" encoding = "utf-8"? > <! DOCTYPE mycat:rule SYSTEM "rule.dtd"> <mycat:rule xmlns:mycat="http://io.mycat/"> <tableRule name="rule1"> <rule> <columns>id</columns> <algorithm>mod-long</algorithm> </rule> </tableRule> <function name="mod-long" class="io.mycat.route.function.PartitionByMod"> <! -- how many data nodes --> <property name="count">2</property> </function> </mycat:rule>Copy the code

2 The number of Datanodes must be the same as that set in

. Otherwise, an error occurs.

Step 5:

Create two databases db01 and DB02 in the database.

Execute the following construction statement in each library:

CREATE TABLE `user` (
 `id` bigint(20) NOT NULL,
 `name` varchar(255) DEFAULT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;Copy the code

Step 6:

To start mycat, run mycat/bin/startup_nowrap.sh

Step 7:

Project has been uploaded to the githubgithub.com/javalanxion… Set up the SpringBoot environment and execute the insert statement.

# configuration data source spring. The datasource. The druid. Driver - class - name = com. Mysql. JDBC. Driver # of this configuration is Mycat server is configured in the XML account password, not the database password. Spring. The datasource. The druid. Username = root spring. The datasource. The druid. Password = 123456 # mycat logic library Port is mycat Spring. The datasource. The druid. Url = JDBC: mysql: / / 192.168.142.129:8066 / TESTDBCopy the code

The usermapper. Java code looks like this:

@Mapper
public interface UserMapper {
    @Insert("insert into user(id,name) value (#{id},#{name})")
    int insert(User user);
    @Select("select * from user")
    List<User> selectAll();
}Copy the code

The userController.java code looks like this:

@RestController @RequestMapping("/user") public class UserController { @Autowired private UserMapper userMapper; @RequestMapping("/save") public String save(User user){ userMapper.insert(user); Return "Save successfully "; } @RequestMapping("/list") public List<User> list(){ return userMapper.selectAll(); }}Copy the code

Step 8:

Testing:

Type in the address bar: http://localhost:8080/user/save? id=1&name=tom http://localhost:8080/user/save? id=2&name=jack

Check database:

The data with ID 1 is inserted into table USER in database DB02.

The data with ID 2 is inserted into table USER in database DB01.

Type in the address bar: http://localhost:8080/user/list you can see just insert two records.

Ok, so at this point we’re done dividing tables and libraries.