Today we will introduce the Sharding-JDBC framework and walk through a quick database and table sharding example to set up the environment for the function points that follow.

I. Sharding-JDBC Introduction

Sharding-JDBC is a database and table sharding framework first used internally at Dangdang. It was open-sourced in 2017, and in recent years, with continuous iteration by a large number of community contributors, its functionality has gradually matured and it has been renamed ShardingSphere. It became an Apache Software Foundation top-level project on April 16, 2020.

The core functionality of ShardingSphere has diversified as versions have evolved. Sharding-JDBC 2.0 added support for database governance (registry, configuration center, etc.), Sharding-JDBC 3.0 added distributed transactions (Atomikos, Narayana, Bitronix, Seata), and it has now iterated to Sharding-JDBC 4.0.

Today, ShardingSphere is not just a framework but an ecosystem, composed of three open source distributed database middleware solutions: Sharding-JDBC, Sharding-Proxy and Sharding-Sidecar.

Sharding-JDBC is the predecessor of ShardingSphere, so it is the most classic and mature component in the whole ecosystem. We will start from the Sharding-JDBC framework to learn database and table sharding.

II. Core Concepts

Before we dive into sharding with Sharding-JDBC, it is important to understand some core concepts of database and table sharding.

Sharding

Generally, when we talk about splitting databases and tables, we mostly mean the horizontal partitioning mode (horizontal sharding of databases and tables). Data sharding splits an original table t_order with a large amount of data into several small tables t_order_0, t_order_1, ···, t_order_n with exactly the same table structure, each storing only part of the data of the original large table. When a SQL query is executed, the data is routed to different databases and tables through the database and table sharding strategies.

Data nodes

A data node is the smallest indivisible data unit (table) in database and table sharding. It consists of a data source name and a data table. For example, order_db_1.t_order_0 and order_db_2.t_order_1 in the figure above each represent a data node.

Logical table

A logical table refers to a group of tables with the same logic and data structure. For example, we split the order table t_order into 10 tables, t_order_0 ··· t_order_9. After the split, the table t_order no longer exists in the database; it is replaced by t_order_n, but we still write our SQL against t_order in the code. t_order is therefore the logical table of the split tables.

Real table

Real tables are the physical tables t_order_n mentioned above that actually exist in the database.

Shard key

The database field used for sharding. For example, order_id is the shard key of t_order: when we execute a SQL query, we determine which table in which database it should run against by taking the modulus of the order_id field.

In this way, related data of the same order is stored in the same database and table, which greatly improves data retrieval performance. Sharding-JDBC also supports sharding based on multiple fields as shard keys.

Sharding algorithm

We mentioned modulo sharding above, but that is only a relatively simple rule. In actual development we also want to use conditions such as >=, <=, >, <, BETWEEN and IN as sharding rules, or custom sharding logic; for that we need sharding strategies and sharding algorithms.

From the perspective of SQL execution, database and table sharding can be regarded as a routing mechanism that routes a SQL statement to the database or data table we expect and fetches the data; a sharding algorithm can be understood as a routing rule.

Let's look at the relationship between them: a sharding strategy is just an abstract concept, a combination of a sharding algorithm and a shard key, while the sharding algorithm does the concrete data sharding logic.

The sharding strategy configurations for databases and for tables are relatively independent and can each use different strategies and algorithms. Each strategy can combine multiple sharding algorithms, and each sharding algorithm can make logical judgments against multiple shard keys.

Note: Sharding-JDBC does not provide sharding algorithm implementations directly; developers must implement them according to the business.

Sharding-JDBC provides four sharding algorithms:

1. Precise sharding algorithm

The precise sharding algorithm (PreciseShardingAlgorithm) is used when a single field serves as the shard key, for SQL sharding conditions such as = and IN. It must be used under the standard sharding strategy (StandardShardingStrategy).

2. Range sharding algorithm

The range sharding algorithm (RangeShardingAlgorithm) is likewise used when a single field serves as the shard key, for SQL sharding conditions such as BETWEEN AND, >, <, >= and <=. It must also be used under the standard sharding strategy (StandardShardingStrategy).
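To make this concrete, the core of a precise modulo algorithm can be sketched in plain Java. This is illustration only: a real implementation would implement ShardingSphere's PreciseShardingAlgorithm interface, and the class and method below are made up for the sketch.

```java
import java.util.Collection;

// Hypothetical sketch (not the real ShardingSphere interface): pick the one
// target table whose numeric suffix equals shardingValue % tableCount.
class PreciseModuloSketch {

    static String doSharding(Collection<String> availableTargetNames, long shardingValue) {
        long suffix = shardingValue % availableTargetNames.size();
        for (String target : availableTargetNames) {
            // table names look like t_order_0, t_order_1, ...
            if (target.endsWith("_" + suffix)) {
                return target;
            }
        }
        throw new UnsupportedOperationException("no target table for value " + shardingValue);
    }
}
```

With tables t_order_0 and t_order_1, a shard value of 10086 routes to t_order_0 because 10086 % 2 = 0.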

3. Compound sharding algorithm

The compound sharding algorithm (ComplexKeysShardingAlgorithm) is used for sharding operations with multiple fields as shard keys. It receives the values of all the shard keys at once and shards based on multi-field business logic. It must be used under the compound sharding strategy.

4. Hint sharding algorithm

The hint sharding algorithm (HintShardingAlgorithm) is slightly different. In the algorithms above, we parse the SQL statement to extract the shard keys and shard according to the configured strategy. Sometimes, however, we use no shard key or sharding strategy at all but still want to route the SQL to the target database and table; in that case we manually specify the SQL's target database and table information. This is also called forced routing.

Sharding strategy

A sharding strategy is an abstract concept; the actual sharding operation is done by the sharding algorithm together with the shard key.

1. Standard sharding strategy

The standard sharding strategy applies to a single shard key and supports the PreciseShardingAlgorithm and RangeShardingAlgorithm.

PreciseShardingAlgorithm is required and handles sharding for the = and IN operators. RangeShardingAlgorithm is optional and handles conditional sharding for BETWEEN AND, >, <, >= and <=. If RangeShardingAlgorithm is not configured, range conditions in the SQL are processed as full-database routing.
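For reference, wiring the standard strategy to custom algorithm classes looks roughly like this in Spring Boot properties. The algorithm class names here are placeholders for your own implementations, and the exact property keys may differ slightly between 4.x releases:

```properties
# standard sharding strategy: one shard column plus algorithm classes
spring.shardingsphere.sharding.tables.t_order.database-strategy.standard.sharding-column=order_id
# required: handles = and IN (placeholder class name, implement it yourself)
spring.shardingsphere.sharding.tables.t_order.database-strategy.standard.precise-algorithm-class-name=com.example.MyPreciseShardingAlgorithm
# optional: handles BETWEEN AND, >, <, >=, <=; without it range conditions route to all databases
spring.shardingsphere.sharding.tables.t_order.database-strategy.standard.range-algorithm-class-name=com.example.MyRangeShardingAlgorithm
```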

2. Compound sharding strategy

The compound sharding strategy also supports sharding for =, >, <, >=, <=, IN and BETWEEN AND in SQL statements. The difference is that it supports multiple shard keys; exactly how the shards are allocated is completely up to the developer.

3. Inline expression sharding strategy

The inline expression sharding strategy supports sharding for = and IN in SQL statements, but only a single shard key. It is usually used for simple sharding: no custom sharding algorithm is needed, and the rules can be written directly in the configuration file.

t_order_$->{t_order_id % 4} means that t_order is split by taking the modulus of its field t_order_id into 4 tables, named t_order_0 to t_order_3.

4. Hint Sharding strategy

The hint sharding strategy, which corresponds to the hint sharding algorithm, shards by shard keys specified through the Hint API instead of extracting them from the SQL.

Distributed primary key

After data sharding, it becomes very difficult to generate globally unique primary keys across different data nodes. Auto-increment keys between different real tables (t_order_n) of the same logical table (t_order) are unaware of each other, so duplicate primary keys occur.

Although ID collisions can be avoided by setting auto-increment initial values and step sizes, this results in high maintenance costs, poor consistency and poor scalability. If the number of shard tables needs to be increased later, the step size of each shard table must be changed one by one, so the operation and maintenance cost is very high. This method is therefore not recommended.

There are many ways to implement a distributed primary key generator; plenty of approaches can be found online.

To make it easier to get started, Apache ShardingSphere has two built-in distributed primary key generators, UUID and SNOWFLAKE. By default the SNOWFLAKE algorithm is used, generating 64-bit long integers. It also abstracts a distributed primary key generator interface, making it convenient to implement a custom primary key generation algorithm.
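The SNOWFLAKE scheme packs a timestamp, a worker id and a per-millisecond sequence into one 64-bit long. Below is a heavily simplified sketch of that idea; real implementations, including ShardingSphere's, also handle clock rollback and worker-id assignment, and the epoch and bit layout here are illustrative assumptions.

```java
// Simplified snowflake-style generator: 41-bit timestamp | 10-bit worker id | 12-bit sequence.
class SnowflakeSketch {
    private static final long EPOCH = 1577836800000L; // 2020-01-01, an arbitrary custom epoch
    private final long workerId;                      // 0..1023
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastMillis) {
            sequence = (sequence + 1) & 0xFFF;  // 12-bit sequence within one millisecond
            if (sequence == 0) {                // sequence exhausted: spin to next millisecond
                while ((now = System.currentTimeMillis()) <= lastMillis) { }
            }
        } else {
            sequence = 0L;
        }
        lastMillis = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}
```

IDs from the same generator are unique and strictly increasing, which also keeps B+ tree index inserts mostly append-only.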

Broadcast table

A broadcast table exists in all shard data sources, and its structure and data are exactly the same in every database. Once a table is configured as a broadcast table, a modification to it in one database is synchronized to the broadcast table data in all data sources.

Binding table

Bound tables are primary and child tables with consistent sharding rules. For example, the t_order order table and the t_order_item order item table are both sharded by the order_id field, so the two tables are bound to each other.

So what’s the point of binding tables?

In our business, t_order and t_order_item are usually used together in multi-table join queries, but after sharding these tables are split into N sub-tables. If the binding relationship is not configured, a Cartesian product join appears, producing the following four SQL statements.

SELECT * FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id
SELECT * FROM t_order_0 o JOIN t_order_item_1 i ON o.order_id=i.order_id
SELECT * FROM t_order_1 o JOIN t_order_item_0 i ON o.order_id=i.order_id
SELECT * FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id

With the binding relationship configured, however, data generated under the corresponding sharding rules falls into the same database, so the join only needs to associate t_order_0 with t_order_item_0 and t_order_1 with t_order_item_1.

SELECT * FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id
SELECT * FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id

Note: in the join query, t_order serves as the main table of the whole query. All routing calculations use only the main table's strategy, and the shard calculation for t_order_item also uses t_order's conditions, so the shard keys of bound tables must be exactly the same.

III. Sharding-JDBC and JDBC

We know that JDBC is a Java specification for accessing relational databases. It was designed to provide a uniform standard across all kinds of databases; different vendors comply with this standard and provide their own implementations for programs to call.

In practice, developers only care about how to call the JDBC API to access the database. As long as the DataSource, Connection, Statement and ResultSet APIs are used correctly, the database can be operated on directly. So, to implement data sharding at the JDBC level, the existing API must be extended, and Sharding-JDBC is built on exactly this idea: it rewrites the JDBC specification while remaining fully compatible with it.

It extends the original DataSource and Connection interfaces into ShardingDataSource and ShardingConnection, and the exposed sharding operation interfaces are exactly the same as those provided in the JDBC specification. As long as you are familiar with JDBC, you can easily apply Sharding-JDBC to implement database and table sharding.

It is therefore suitable for any JDBC-based ORM framework, such as JPA, Hibernate, MyBatis, Spring JDBC Template, or direct JDBC use; it is perfectly compatible with any third-party database connection pool, such as DBCP, C3P0, BoneCP, Druid or HikariCP; and it supports almost all mainstream relational databases.

How does Sharding-JDBC extend these interfaces? To find out, let's start from the source code, using the JDBC API's DataSource as an example to see how it is rewritten.

The DataSource interface provides two methods for obtaining a database connection, and extends the CommonDataSource and Wrapper interfaces.

public interface DataSource  extends CommonDataSource, Wrapper {

  /**
   * <p>Attempts to establish a connection with the data source that
   * this {@code DataSource} object represents.
   * @return  a connection to the data source
   */
  Connection getConnection() throws SQLException;

  /**
   * <p>Attempts to establish a connection with the data source that
   * this {@code DataSource} object represents.
   * @param username the database user on whose behalf the connection is
   *  being made
   * @param password the user's password
   */
  Connection getConnection(String username, String password)
    throws SQLException;
}

The CommonDataSource is the root interface that defines the data source, which is easy to understand, while the Wrapper interface is the key to extending JDBC sharding.

Depending on the vendor, databases may provide extensions beyond the standard JDBC API, but those extensions cannot be used directly by code written against the JDBC standard. The Wrapper interface exists to wrap non-JDBC-standard interfaces provided by third-party vendors; this is the adapter pattern.

Since we have brought up the adapter pattern, a few extra words on it here will make the rest easier to follow.

The adapter pattern is a common design pattern that converts the interface of one class into another interface the client expects, enabling two classes to work together that otherwise could not because of mismatched (or incompatible) interfaces. For example, I use headphones to listen to music: my headphones have a round plug, but the phone jack is flat, so if I want to listen I need an adapter, and that adapter plays the adapting role. As a code example, suppose we have hello() and world() methods in a Target interface.

public interface Target {

    void hello();

    void world();
}

Now suppose a new Adaptee class appears whose greet() method is meant to replace hello(), because the hello() method of the iterated Target interface may be deprecated or no longer supported.

public class Adaptee {

    public void greet(){

    }
    public void world(){

    }
}

However, a lot of code in older versions still calls the old method. The best way to resolve this is to create an adapter class, Adapter, which adapts to the Target interface and smooths over the compatibility problems caused by the interface upgrade.

public class Adapter extends Adaptee implements Target {

    @Override
    public void world() {

    }

    @Override
    public void hello() {
        super.greet();
    }

    @Override
    public void greet() {

    }
}

Sharding-JDBC also provides non-JDBC-standard interfaces, so it adopts a similar scheme, using the Wrapper interface to adapt the data sharding functionality. Besides DataSource, core objects such as Connection, Statement and ResultSet also inherit this interface.

Let's take a quick look at the implementation process through the ShardingDataSource source code; the following is its inheritance diagram.

The ShardingDataSource class extends the original DataSource: during initialization it registers the shard SQL route decorators, the SQL rewrite context decorators and the result process engines, and it validates the data source types, since multiple different data sources are supported at the same time. So far nothing here resembles adaptation, so next let's look at AbstractDataSourceAdapter, the class ShardingDataSource inherits.

@Getter
public class ShardingDataSource extends AbstractDataSourceAdapter {

    private final ShardingRuntimeContext runtimeContext;

    /**
     * Register the SQL route decorators, SQL rewrite context decorators and result process engines.
     */
    static {
        NewInstanceServiceLoader.register(RouteDecorator.class);
        NewInstanceServiceLoader.register(SQLRewriteContextDecorator.class);
        NewInstanceServiceLoader.register(ResultProcessEngine.class);
    }

    /**
     * On initialization, verify the data source types and build a sharding runtime context
     * from the data source map, the sharding rule and the database type.
     */
    public ShardingDataSource(final Map<String, DataSource> dataSourceMap, final ShardingRule shardingRule, final Properties props) throws SQLException {
        super(dataSourceMap);
        checkDataSourceType(dataSourceMap);
        runtimeContext = new ShardingRuntimeContext(dataSourceMap, shardingRule, props, getDatabaseType());
    }

    private void checkDataSourceType(final Map<String, DataSource> dataSourceMap) {
        for (DataSource each : dataSourceMap.values()) {
            Preconditions.checkArgument(!(each instanceof MasterSlaveDataSource), "Initialized data sources can not be master-slave data sources.");
        }
    }

    @Override
    public final ShardingConnection getConnection() {
        return new ShardingConnection(getDataSourceMap(), runtimeContext, TransactionTypeHolder.get());
    }
}

AbstractDataSourceAdapter is an abstract class that mainly obtains the corresponding database connection object for different types of data sources. It implements the AutoCloseable interface so that resources can be closed automatically after use (by calling the close method). Next let's look at its parent class, AbstractUnsupportedOperationDataSource.

@Getter
public abstract class AbstractDataSourceAdapter extends AbstractUnsupportedOperationDataSource implements AutoCloseable {

    private final Map<String, DataSource> dataSourceMap;

    private final DatabaseType databaseType;

    public AbstractDataSourceAdapter(final Map<String, DataSource> dataSourceMap) throws SQLException {
        this.dataSourceMap = dataSourceMap;
        databaseType = createDatabaseType();
    }

    public AbstractDataSourceAdapter(final DataSource dataSource) throws SQLException {
        dataSourceMap = new HashMap<>(1, 1);
        dataSourceMap.put("unique", dataSource);
        databaseType = createDatabaseType();
    }

    private DatabaseType createDatabaseType() throws SQLException {
        DatabaseType result = null;
        for (DataSource each : dataSourceMap.values()) {
            DatabaseType databaseType = createDatabaseType(each);
            Preconditions.checkState(null == result || result == databaseType, String.format("Database type inconsistent with '%s' and '%s'", result, databaseType));
            result = databaseType;
        }
        return result;
    }

    /**
     * Obtain the database connection and resolve the database type for different DataSource types.
     */
    private DatabaseType createDatabaseType(final DataSource dataSource) throws SQLException {
        if (dataSource instanceof AbstractDataSourceAdapter) {
            return ((AbstractDataSourceAdapter) dataSource).databaseType;
        }
        try (Connection connection = dataSource.getConnection()) {
            return DatabaseTypes.getDatabaseTypeByURL(connection.getMetaData().getURL());
        }
    }

    @Override
    public final Connection getConnection(final String username, final String password) throws SQLException {
        return getConnection();
    }

    @Override
    public final void close() throws Exception {
        close(dataSourceMap.keySet());
    }
}

AbstractUnsupportedOperationDataSource implements the DataSource interface and inherits the WrapperAdapter class. There are no concrete methods inside it; it only plays a bridging role, but it already looks a bit like the adapter pattern example we walked through earlier.

public abstract class AbstractUnsupportedOperationDataSource extends WrapperAdapter implements DataSource {

    @Override
    public final int getLoginTimeout() throws SQLException {
        throw new SQLFeatureNotSupportedException("unsupported getLoginTimeout()");
    }

    @Override
    public final void setLoginTimeout(final int seconds) throws SQLException {
        throw new SQLFeatureNotSupportedException("unsupported setLoginTimeout(int seconds)");
    }
}

WrapperAdapter is a wrapper adapter class that implements the JDBC Wrapper interface. It has two core methods: recordMethodInvocation records the methods and parameters to be executed, and replayMethodsInvocation executes the recorded methods and parameters via reflection. Looking closely, both methods use the JdbcMethodInvocation class.

public abstract class WrapperAdapter implements Wrapper {

    private final Collection<JdbcMethodInvocation> jdbcMethodInvocations = new ArrayList<>();

    /**
     * Record a method invocation to be executed later.
     */
    @SneakyThrows
    public final void recordMethodInvocation(final Class<?> targetClass, final String methodName, final Class<?>[] argumentTypes, final Object[] arguments) {
        jdbcMethodInvocations.add(new JdbcMethodInvocation(targetClass.getMethod(methodName, argumentTypes), arguments));
    }

    /**
     * Replay the recorded method invocations against the target object.
     */
    public final void replayMethodsInvocation(final Object target) {
        for (JdbcMethodInvocation each : jdbcMethodInvocations) {
            each.invoke(target);
        }
    }
}

The JdbcMethodInvocation class uses reflection to invoke the recorded method with its arguments, so that non-JDBC methods can be called through the JDBC API.

@RequiredArgsConstructor
public class JdbcMethodInvocation {

    @Getter
    private final Method method;

    @Getter
    private final Object[] arguments;

    /**
     * Invoke the recorded JDBC method on the target object.
     *
     * @param target target object
     */
    @SneakyThrows
    public void invoke(final Object target) {
        method.invoke(target, arguments);
    }
}
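The record-and-replay idea can be demonstrated with a few lines of plain Java. This is a toy stand-in, not the actual ShardingSphere classes: calls made before the real target exists are recorded as Method plus arguments, then replayed by reflection onto each real target later.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Toy record/replay: store Method + arguments now, invoke them on real targets later.
class RecordReplayDemo {

    static class Settings {
        final StringBuilder log = new StringBuilder();
        public void setAutoCommit(boolean on) {
            log.append("autoCommit=").append(on).append(';');
        }
    }

    private final List<Method> methods = new ArrayList<>();
    private final List<Object[]> argsList = new ArrayList<>();

    void record(String methodName, Class<?>[] argumentTypes, Object[] arguments) {
        try {
            methods.add(Settings.class.getMethod(methodName, argumentTypes));
            argsList.add(arguments);
        } catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        }
    }

    void replay(Settings target) {
        try {
            for (int i = 0; i < methods.size(); i++) {
                methods.get(i).invoke(target, argsList.get(i));
            }
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

This mirrors why WrapperAdapter exists: a setting applied once to the sharding wrapper can be replayed onto every real connection it later opens.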

Sharding-JDBC has extended the JDBC API; so how is the new sharding functionality actually implemented?

After sharding, a table is split into several sub-tables scattered across different databases. Without modifying the original business SQL, Sharding-JDBC must transform the SQL for it to execute correctly.

The general execution flow has six steps: SQL parsing -> executor optimization -> SQL routing -> SQL rewriting -> SQL execution -> result merging. Let's look at what each step does.

SQL parsing

The SQL parsing process is divided into two steps: lexical parsing and syntactic parsing. For example, in the following SQL query, lexical parsing breaks the SQL into indivisible atomic units, which are categorized as keywords, expressions, literals or operators according to the dictionaries provided by the different database dialects.

SELECT order_no,price FROM t_order where user_id = 10086 and order_status > 0

Syntactic parsing then converts the split SQL into an abstract syntax tree and traverses it to extract the context needed for sharding. The context includes query field information (Field), table information (Table), query conditions (Condition), sorting information (Order By), grouping information (Group By) and paging information (Limit), and it marks the places in the SQL that may need rewriting.

Executor optimization

Executor optimization optimizes the SQL sharding conditions, handling performance-affecting smells such as the OR keyword.

SQL routing

SQL routing parses the sharding context, matches the sharding strategies configured by the user, and generates routing paths. Simply put, according to the configured sharding strategy we can calculate which database and table the SQL should be executed against. SQL routes are divided into sharding routes and broadcast routes depending on whether a shard key is present.

A route with a shard key is called a sharding route and comes in three types: direct route, standard route, and Cartesian product route.

Standard routing

Standard routing is the most recommended and most commonly used sharding method. It applies to SQL that contains no join queries, or only join queries between bound tables.

When the operator on the SQL shard key is =, the routing result falls into a single database (table). When the operator is BETWEEN or IN, the routing result may not fall into a unique database (table), so one logical SQL may be split into multiple real SQLs for execution.

SELECT * FROM t_order  where t_order_id in (1,2)

After SQL routing, this is processed into:

SELECT * FROM t_order_0  where t_order_id in (1,2)
SELECT * FROM t_order_1  where t_order_id in (1,2)
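The route calculation behind the example above can be sketched in a few lines of plain Java, assuming a simple modulo table rule. This is a hypothetical helper for illustration, not ShardingSphere's router:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch: an IN (...) condition routes to every real table its values map to.
class StandardRouteSketch {
    static Set<String> routeTables(String logicalTable, List<Long> inValues, int tableCount) {
        Set<String> targets = new LinkedHashSet<>();
        for (long id : inValues) {
            targets.add(logicalTable + "_" + (id % tableCount));
        }
        return targets;
    }
}
```

For IN (1, 2) with two tables, the values map to t_order_1 and t_order_0 respectively, which is why both real tables are queried above.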

Direct routing

Direct routing is a sharding method that uses the Hint API to route SQL directly to a specified database and table. It can be used in scenarios where the shard key is not in the SQL, and it can execute any SQL in complex situations, including subqueries and custom functions.

For example, if you want to query an order by the t_order_id field but use user_id as the sharding condition without modifying the SQL, direct routing does the job.

Cartesian product routing

Cartesian product routing is generated by join queries between unbound tables. Its query performance is low, so this routing mode should be avoided.


A route without a shard key is called a broadcast route, and it comes in five types: full database and table routing, full database routing, full instance routing, unicast routing, and blocked routing.

Full database and table routing

Full database and table routing targets database DQL, DML, DDL and similar operations. When we execute SQL against the logical table t_order, it is executed one by one against the corresponding real tables t_order_0 ··· t_order_n in all shard databases.

Full database routing

Full database routing handles database-level operations, such as database management commands of the SET type and transaction control statements (TCL).

Once autocommit is set on the logical database, the command is executed in all the corresponding real databases.

SET autocommit=0;

Full instance routing

Full instance routing targets DCL operations on database instances (setting or changing database user or role permissions). For example, a statement creating a user named order is executed in all real database instances to ensure that the order user can access every instance.

CREATE USER [email protected] IDENTIFIED BY 'programmer';

Unicast routing

Unicast routing is used to obtain real table information, such as table description:

DESCRIBE t_order; 

The real tables of t_order are t_order_0 ··· t_order_n, and their described structures are exactly the same, so we only need to execute the command once on any one real table.

Blocked routing

Blocked routing masks SQL operations on the database, for example:

USE order_db;

This command will not be executed in the real database because ShardingSphere uses logical Schema (database organization and structure), so there is no need to send the command to switch the database to the real database.

SQL rewriting

SQL rewriting rewrites SQL developed against logical tables into statements that can execute correctly in the real databases. For example, to query the t_order order table, the SQL we actually write is written against the logical table t_order.

SELECT * FROM t_order

After sharding, however, the t_order table does not exist in the real databases; it has been split into multiple sub-tables t_order_n scattered across different databases, so executing the original SQL as-is is obviously infeasible. At this point the logical table name must be rewritten to the real table names obtained from routing.

SELECT * FROM t_order_n
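As a naive illustration of the idea only: real rewriting works on the parsed syntax tree and also handles aliases, pagination offsets and generated keys, but plain string substitution conveys the gist. The helper below is hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy rewrite: substitute the logical table token with each routed real table name.
class RewriteSketch {
    static List<String> rewrite(String logicalSql, String logicalTable, List<String> realTables) {
        return realTables.stream()
                // \b word boundaries keep t_order from matching inside t_order_0
                .map(real -> logicalSql.replaceAll("\\b" + logicalTable + "\\b", real))
                .collect(Collectors.toList());
    }
}
```

Rewriting "SELECT * FROM t_order" for routes t_order_0 and t_order_1 yields one real SQL per routed table.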

SQL execution

The routed and rewritten real SQL is sent safely and efficiently to the underlying data sources for execution. Rather than simply sending the SQL directly to the data sources via JDBC, this process balances the cost of creating data source connections against memory usage, automatically trading off resource control and execution efficiency.

Result merging

Result merging is the process of correctly merging the multiple result sets obtained from the data nodes into one large result set and returning it to the requesting client. SQL features such as sorting, grouping, paging and aggregation all operate on the merged result set.
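For intuition, merging ordered per-shard results is essentially a k-way merge of already-sorted result sets. Below is a minimal sketch in plain Java, a toy stand-in rather than ShardingSphere's merge engine:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch: merge per-shard ORDER BY results by always taking the smallest head element.
class MergeSketch {
    static List<Integer> mergeSorted(List<List<Integer>> shardResults) {
        // heap entries: {value, shardIndex, offsetInShard}
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int s = 0; s < shardResults.size(); s++) {
            if (!shardResults.get(s).isEmpty()) {
                heap.add(new int[]{shardResults.get(s).get(0), s, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            merged.add(head[0]);
            List<Integer> shard = shardResults.get(head[1]);
            if (head[2] + 1 < shard.size()) {
                heap.add(new int[]{shard.get(head[2] + 1), head[1], head[2] + 1});
            }
        }
        return merged;
    }
}
```

Because each shard's result set is already sorted by the database, the merger only ever compares the current head of each shard, so it can stream rows without loading everything into memory.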

IV. Quick Practice

Below, we combine Spring Boot + MyBatis-Plus to quickly build a database and table sharding example.

1. Preparation

Create two databases, ds-0 and ds-1, and in each create the tables t_order_0, t_order_1, t_order_2, t_order_item_0, t_order_item_1, t_order_item_2 and t_config, so that we can later verify the broadcast table and binding table scenarios.

The table structure is as follows:

t_order_0 order table

CREATE TABLE `t_order_0` (
  `order_id` bigint(200) NOT NULL,
  `order_no` varchar(100) DEFAULT NULL,
  `create_name` varchar(50) DEFAULT NULL,
  `price` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

t_order_item_0, the order item table associated with t_order_0

CREATE TABLE `t_order_item_0` (
  `item_id` bigint(100) NOT NULL,
  `order_no` varchar(200) NOT NULL,
  `item_name` varchar(50) DEFAULT NULL,
  `price` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

Broadcast table t_config

CREATE TABLE `t_config` (
  `id` bigint(30) NOT NULL,
  `remark` varchar(50) CHARACTER SET utf8 DEFAULT NULL,
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `last_modify_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

ShardingSphere provides four sharding configuration modes:

  • Java Code Configuration

  • YAML and properties configuration

  • Spring namespace configuration

  • Spring Boot configuration

To keep the code concise and intuitive, we use the properties configuration form here, introducing ShardingSphere's sharding-jdbc-spring-boot-starter and sharding-core-common packages, both at version 4.0.0-RC1.

2. Sharding configuration

<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
    <version>4.0.0-RC1</version>
</dependency>
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-core-common</artifactId>
    <version>4.0.0-RC1</version>
</dependency>

With the preparation done (the MyBatis setup is not described here), let's interpret the sharding configuration items one by one.

We first define the two data sources, ds-0 and ds-1, and add the basic information for each.

# define the two global data sources
spring.shardingsphere.datasource.names=ds-0,ds-1

# configure data source ds-0
spring.shardingsphere.datasource.ds-0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds-0.driverClassName=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds-0.url=jdbc:mysql://127.0.0.1:3306/ds-0?useUnicode=true&characterEncoding=utf8&tinyInt1isBit=false&useSSL=false&serverTimezone=GMT
spring.shardingsphere.datasource.ds-0.username=root
spring.shardingsphere.datasource.ds-0.password=root

# configure data source ds-1
spring.shardingsphere.datasource.ds-1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds-1.driverClassName=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds-1.url=jdbc:mysql://127.0.0.1:3306/ds-1?useUnicode=true&characterEncoding=utf8&tinyInt1isBit=false&useSSL=false&serverTimezone=GMT
spring.shardingsphere.datasource.ds-1.username=root
spring.shardingsphere.datasource.ds-1.password=root

After configuring the data sources, add the database and table sharding strategies for the tables. Using Sharding-JDBC for sharding requires us to set sharding rules for each table independently.

# t_order sharding configuration: specify the real data nodes
spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds-$->{0..1}.t_order_$->{0..2}

The actual-data-nodes property specifies the real data nodes used for sharding. $->{...} is a placeholder expression, and {0..1} enumerates the numbers of the actual split databases or tables.

The expression ds-$->{0..1}.t_order_$->{0..2} is equivalent to the following six data nodes:

  • ds-0.t_order_0
  • ds-0.t_order_1
  • ds-0.t_order_2
  • ds-1.t_order_0
  • ds-1.t_order_1
  • ds-1.t_order_2
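The expansion performed by the inline expression can be sketched as follows; this is a hypothetical helper for illustration, not ShardingSphere's actual expression parser:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: expand ds-{0..dbCount-1}.t_order_{0..tableCount-1} into all data nodes.
class DataNodeExpansion {
    static List<String> expand(String dsPrefix, int dbCount, String tablePrefix, int tableCount) {
        List<String> nodes = new ArrayList<>();
        for (int d = 0; d < dbCount; d++) {
            for (int t = 0; t < tableCount; t++) {
                nodes.add(dsPrefix + d + "." + tablePrefix + t);
            }
        }
        return nodes;
    }
}
```

Calling expand("ds-", 2, "t_order_", 3) yields exactly the six nodes listed above.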
## database sharding strategy
# database shard key
spring.shardingsphere.sharding.tables.t_order.database-strategy.inline.sharding-column=order_id
# database sharding algorithm expression
spring.shardingsphere.sharding.tables.t_order.database-strategy.inline.algorithm-expression=ds-$->{order_id % 2}

Sharding-JDBC provides four sharding strategies. For a quick start we use the simplest one, the inline expression sharding strategy. The next chapter introduces the detailed usage and applicable scenarios of all four strategies.

In the property database-strategy.inline.sharding-column, database-strategy means the database sharding strategy, inline is the strategy type, and sharding-column specifies the sharding key.

database-strategy.inline.algorithm-expression is the concrete sharding algorithm under the current strategy. The expression ds-$->{order_id % 2} shards databases by taking order_id modulo 2, where 2 is the number of databases. Different strategies correspond to different algorithms, and a custom sharding algorithm class can also be used.

```properties
## Table sharding strategy
# Table sharding column
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=order_id
# Table sharding algorithm
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order_$->{order_id % 3}
# Auto-generated primary key column
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
# Primary key generation scheme
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
```

Table sharding is configured much like database sharding. The difference is that a sharded table can additionally configure key-generator.column and key-generator.type, the auto-generated primary key column and its generation scheme. A custom primary key generation algorithm class can also be supplied; this is explained in more detail later.
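To see how the two inline expressions combine, here is a plain-Java sketch of the routing arithmetic (an illustration only; Sharding-JDBC performs this routing internally). A given order_id lands in exactly one of the six data nodes:

```java
public class OrderRouter {

    // Mirrors the two inline expressions from the configuration:
    //   database: ds-$->{order_id % 2}
    //   table:    t_order_$->{order_id % 3}
    static String route(long orderId) {
        return "ds-" + (orderId % 2) + ".t_order_" + (orderId % 3);
    }

    public static void main(String[] args) {
        // ids 0..5 happen to cover all six nodes exactly once,
        // since the moduli 2 and 3 are coprime
        for (long id = 0; id < 6; id++) {
            System.out.println(id + " -> " + route(id));
        }
    }
}
```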

```properties
# Binding table relationship
spring.shardingsphere.sharding.binding-tables=t_order,t_order_item
```

Tables that are sharded by the same sharding key can be bound to one another; binding avoids Cartesian product queries when they are joined.
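A rough sketch of the arithmetic behind this (assuming 3 shards each for t_order and t_order_item per database, as configured above, and a join with no sharding-key filter): without binding, every t_order shard must be joined against every t_order_item shard; with binding, t_order_n joins only its sibling t_order_item_n.

```java
public class JoinFanout {

    // Real SQL statements generated for a join of two tables that each
    // have `shards` shards, when the tables are NOT bound:
    // the Cartesian product of the two shard sets.
    static int withoutBinding(int shards) {
        return shards * shards;
    }

    // With binding, t_order_n pairs only with t_order_item_n.
    static int withBinding(int shards) {
        return shards;
    }

    public static void main(String[] args) {
        System.out.println("unbound: " + withoutBinding(3) + " SQL statements"); // 9
        System.out.println("bound:   " + withBinding(3) + " SQL statements");    // 3
    }
}
```

These are the same 9-versus-3 counts observed in the verification step when the binding configuration is removed.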

```properties
# Broadcast table
spring.shardingsphere.sharding.broadcast-tables=t_config
```

Finally, enable the SQL parsing log so that the SQL sharding process can be observed clearly.

```properties
# Whether to print the parsed SQL
spring.shardingsphere.props.sql.show=true
```

3. Verify sharding

With the sharding configuration complete, we do not need to modify any business code; the ordinary create, read, update, and delete logic works as before. Now let's verify the sharding effect.

We insert several order records into t_order and t_order_item without specifying the primary keys order_id and item_id.

```java
public String insertOrder() {
    for (int i = 0; i < 4; i++) {
        TOrder order = new TOrder();
        order.setOrderNo("A000" + i);
        order.setCreateName("order " + i);
        order.setPrice(new BigDecimal("" + i));
        orderRepository.insert(order);

        TOrderItem orderItem = new TOrderItem();
        orderItem.setOrderId(order.getOrderId());
        orderItem.setOrderNo("A000" + i);
        orderItem.setItemName("service item " + i);
        orderItem.setPrice(new BigDecimal("" + i));
        orderItemRepository.insert(orderItem);
    }
    return "success";
}
```

We can see that the order records have been split across different databases and tables, and that the order_id primary key was generated automatically, so the basic sharding functionality works.

What happens if we insert a record into the broadcast table t_config?

```java
public String config() {
    TConfig tConfig = new TConfig();
    tConfig.setRemark("I am a broadcast table");
    tConfig.setCreateTime(new Date());
    tConfig.setLastModifyTime(new Date());
    configRepository.insert(tConfig);
    return "success";
}
```

The t_config table in every database executes this SQL. A broadcast table behaves like the broadcast-subscribe pattern in MQ: every subscribed client receives the same message.

Next, verify the binding-table effect with a join query on t_order and t_order_item, using order_id as the query condition.

The console log shows that after the logical SQL was parsed and routed, only t_order_0 and t_order_item_0 were joined, generating a single real SQL statement.

What happens if the tables are not bound to each other? Remove the spring.shardingsphere.sharding.binding-tables configuration and try again.

This time the console parses the query into 3 real SQL statements; after also removing order_id from the query condition and executing again, it parses into 9 SQL statements, a full Cartesian product. The advantage of binding tables is obvious.

V. Summary

The above has briefly walked through the basic concepts of the Sharding-JDBC middleware and quickly built a working sharding example. This is only the first step in practicing database and table sharding. The next chapter introduces the concrete usage and applicable scenarios of the four sharding strategies in detail (essential knowledge), followed by custom distributed primary keys, distributed database transactions, distributed service governance, data masking, and more.

Reprinted from the WeChat official account: programmer that thing