1. Background

The back-office management system stores each company's customer lead ("clue") data. It is provided to each company as a data back office: every company logs in with its own account and can view only its own company's data.

At present the single table has reached 8.77 million rows and is growing by 0.9 to 1.2 million rows per month, so within half a year it may exceed 14 million. Paged, filtered, and statistical queries in the back office against this table would become slow, so the table needs to be split. The table has many fields; to keep things brief and desensitized, only the main ones are listed:

```sql
CREATE TABLE `clue_notify` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'auto-increment primary key',
  `clue_id` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'clue ID',
  `visitor_id` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'company ID',
  `city_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'city ID',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'create time',
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
  `is_delete` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'logical delete flag',
  PRIMARY KEY (`id`),
  KEY `idx_clue_visitor_id` (`clue_id`,`visitor_id`) USING BTREE,
  KEY `idx_create_time` (`create_time`) USING BTREE,
  KEY `idx_update_time` (`update_time`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```

The existing data volume is 8.77 million+ rows

Growth: nearly 40,000 new rows per day

2. Splitting scheme

Horizontal splitting; the goal is to spread the data as evenly as possible across 10 tables. At roughly 10 million rows per table, that supports stable queries up to about 100 million rows of data. There are two main candidate schemes: split by company, i.e. shard on the visitor_id field, or split by clue ID, i.e. shard on the clue_id field. Both are analyzed below.

Scheme 1: Shard by visitor_id

Split the table by company, i.e. shard on the visitor_id field

Advantages

  1. All lead data for a given company lives in a single table, so the data is not scattered; queries for one company need no cross-table queries or result-set merging, which is a big performance advantage
  2. Later associated-data queries are convenient: knowing the visitor_id determines which sub-table to query, keeping read/write logic simple

Disadvantages

Customer traffic differs greatly between companies, so the data is unevenly distributed and grows at very different rates. Given the number of companies currently onboarded, the data cannot be spread across 10 tables, and the sharding effect would fall short of expectations

Scheme 2: Shard by clue_id

Shard on the clue_id field, ignoring the company attribute when splitting the table

Advantages

The data is evenly distributed, avoiding the drawback of Scheme 1: it is unaffected by any one company's business volume

Disadvantages

A company's data is scattered across the sub-tables, which can make business queries troublesome. Mature sharding components solve this: they merge results automatically, though the result merging costs some performance

Conclusion

Scheme 1 fits the business characteristics better: although the data cannot be distributed as evenly, it avoids multi-table queries and result merging. The uneven-distribution problem can be mitigated by combining manually specified mappings with the default sharding rule
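The combined rule can be sketched in a few lines (the class name and the pinned mapping below are illustrative): a manually specified mapping takes priority, and everything else falls back to modulo.

```java
import java.util.Map;

// Sketch of the combined routing rule, assuming 10 sub-tables:
// a manually pinned mapping wins; otherwise route by visitor_id % 10.
class ClueTableRouter {
    static final int TABLE_COUNT = 10;
    // illustrative pinned mapping: company 10 is routed to sub-table 8
    static final Map<Long, Long> PINNED = Map.of(10L, 8L);

    static String route(long visitorId) {
        long index = PINNED.getOrDefault(visitorId, visitorId % TABLE_COUNT);
        return "clue_notify_" + index;
    }
}
```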

3. Technical schemes for splitting the table

The mature open-source database middleware available domestically includes Sharding-JDBC and MyCat. Comparing the two, their similarities are:

  1. The same design concept; the main pipeline is SQL parsing -> SQL routing -> SQL rewriting -> SQL execution -> result merging
  2. The same query principle: a query on a non-shard field queries all shards and merges the results, while a query on the shard field routes directly to the matching shard
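The routing principle in point 2 can be illustrated with a toy sketch (not middleware code): a query carrying the shard key hits exactly one sub-table, while a query on any other field must fan out to every shard.

```java
import java.util.ArrayList;
import java.util.List;

// Toy shard routing: with a shard-key value the query targets one table;
// without it, every sub-table must be queried and the results merged.
class ShardTargets {
    static List<String> targets(String prefix, int tableCount, Long shardKey) {
        List<String> tables = new ArrayList<>();
        if (shardKey != null) {
            tables.add(prefix + "_" + (shardKey % tableCount)); // direct route
        } else {
            for (int i = 0; i < tableCount; i++) {
                tables.add(prefix + "_" + i); // fan out to all shards
            }
        }
        return tables;
    }
}
```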

Differences:

  1. MyCat is proxy-based, similar to Nginx: it implements the MySQL protocol so that MycatServer masquerades as a MySQL database. The advantages are database security and result merging that is fully decoupled from the application; the disadvantage is lower efficiency
  2. Sharding-JDBC is a JDBC-level extension that provides a lightweight service as a JAR package. The advantage is high efficiency; the disadvantages are that result merging is not decoupled and can affect business logic code, and that it can easily run out of memory, so paginate carefully

Besides the open-source middleware above, we can also implement the split ourselves. The main work is implementing the sharding rules and a distributed primary-key ID generation algorithm. To minimize changes to the original code, a MyBatis plugin can rewrite table names dynamically to complete the sharded query

Scheme 1. MyCat

MyCat is a third-party server-side database middleware. All JDBC requests from clients are submitted to MyCat, which forwards them to the actual servers. MyCat is an independent application that must be deployed separately. It hides the application-side complexity of sharding behind logical tables, and because it follows the native MySQL protocol, it is cross-language and cross-platform and applicable to more scenarios.

Advantages

Cross-language and more universally applicable

Disadvantages

Complex to operate: it requires deploying a separate cluster, and the official documentation is of little help

Scheme 2. Sharding-JDBC

Sharding-JDBC is an in-process database middleware framework shipped as a JAR: it wraps the native JDBC methods at the application layer to implement database sharding.

Advantages

Architecturally, Sharding-JDBC fits distributed design better: it connects directly to the database with no intermediate application, so its theoretical performance is the highest (actual performance depends on the concrete code; the theoretical figure is an upper bound that implementations approach through continued optimization). From a development standpoint there is a Spring Boot starter, and code changes are small

Disadvantages

As a component it must be integrated into the application code, which raises development cost. Needing in-process integration also means separate implementations would be required per language (Java, C, PHP, ...); in practice Sharding-JDBC only supports Java, so the maintenance cost of such components is high and its applicable scenarios are limited to Java applications.

Sharding-JDBC affects every SQL statement in the business project, because all database interaction ultimately goes through ShardingPreparedStatement. Some historical SQL, because of SQL functions or other unusual syntax, cannot be handled by ShardingPreparedStatement and throws exceptions.

Scheme 3. A custom sharding strategy, implemented via a MyBatis plugin

Compared with the out-of-the-box convenience of middleware, this approach takes more development work. Sharding-JDBC takes over the original data source entirely, which may bring uncontrollable side effects, whereas a custom sharding strategy is more controllable: it intercepts only where sharding is needed rather than taking over the data source. Advantages: precise and controllable. Disadvantages: a fair amount of code; the sharding strategy, the primary-key generation strategy, and a MyBatis plugin must all be implemented by hand

Conclusion

MyCat was ruled out as too heavyweight. Since our sharding scenario is very simple and some historical SQL already exists, we worried that introducing Sharding-JDBC would cause exceptions in other SQL that does not involve the split table, so we finally chose Scheme 3

4. Implementing the split

Implementation scheme 1. A custom sharding strategy with MyBatis

Sharding strategy

shardingMap holds the manually specified sub-table mappings, written in the configuration file as follows:

```yaml
sharding:
  notify:
    tableCount: 10
    shardingMap: {10: 8}  # pin visitor_id 10 to sub-table 8
```
```java
/**
 * @description Sharding table strategy
 * @date 2021/11/30 4:06 PM
 */
public interface IShardingTableStrategy {

    String UNDERLINE = "_";

    /**
     * Compute the target sub-table for a sharding key
     * @param tableNamePrefix table name prefix
     * @param shardingKey     sharding key
     * @return target table name
     */
    String getTargetTableName(String tableNamePrefix, Object shardingKey);
}
```

The default sharding strategy:

```java
/**
 * Default sharding strategy:
 * 10 sub-tables; keys without a pinned mapping are routed by mod 10
 * @date 2021/11/30 4:08 PM
 */
@Component
@Slf4j
public class DefaultShardingTableStrategy implements IShardingTableStrategy {

    /** table name prefix */
    public static final String TABLE_NAME_PREFIX = "clue_notify";

    private static final String UNDERLINE = "_";

    @Autowired
    NotifyShardingMapProperties properties;

    /**
     * Keys without a specified rule are routed by modulo
     * @param tableNamePrefix table name prefix
     * @param shardingKey     sharding key
     * @return target table name
     */
    @Override
    @Cacheable(value = {"getTargetTableName"},
               key = "#root.methodName + ':' + #tableNamePrefix + ':' + #shardingKey")
    public String getTargetTableName(String tableNamePrefix, Object shardingKey) {
        Map<Long, Long> shardingMap = properties.getShardingMap();
        Class<?> aClass = shardingKey.getClass();
        if (aClass != Long.class) {
            throw new RuntimeException("sharding key has the wrong type, please check the sharding key type");
        }
        long visitorId = (long) shardingKey;
        return TABLE_NAME_PREFIX + UNDERLINE
                + shardingMap.getOrDefault(visitorId, visitorId % properties.getTableCount());
    }
}
```

The sharding annotation

```java
/**
 * Marks where table names should be rewritten automatically.
 * On a Mapper interface class, every method is scanned;
 * on a Mapper method, only that method is affected
 * @date 2021/11/30 4:29 PM
 */
@Target({ElementType.TYPE, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
public @interface TableSharding {

    /** table name prefix */
    String tableNamePrefix();

    /** name of the input parameter holding the sharding key, e.g. visitor_id */
    String value() default "";

    /** whether the sharding key must be parsed from the request parameters */
    boolean fieldFlag() default false;

    /** the sharding strategy to apply */
    Class<? extends IShardingTableStrategy> shardingStrategy();
}
```
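The lookup order (a method-level annotation overrides the class-level one) can be demonstrated with a self-contained toy: a minimal stand-in annotation and mapper, with the resolution done by reflection the same way the interceptor does.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Minimal stand-in for the TableSharding annotation, to show the lookup order:
// method-level first, then the mapper interface.
@Target({ElementType.TYPE, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@interface Shard {
    String tableNamePrefix();
}

@Shard(tableNamePrefix = "clue_notify")
interface DemoMapper {
    void listByVisitor(long visitorId);       // inherits the interface-level annotation

    @Shard(tableNamePrefix = "clue_notify_special")
    void listSpecial(long visitorId);         // method-level annotation wins
}

class ShardLookup {
    static String prefixFor(String methodName) {
        Method target = null;
        for (Method m : DemoMapper.class.getMethods()) {
            if (m.getName().equals(methodName)) {
                target = m;
                break;
            }
        }
        Shard shard = target.getAnnotation(Shard.class);
        if (shard == null) {
            // fall back to the annotation on the mapper interface
            shard = DemoMapper.class.getAnnotation(Shard.class);
        }
        return shard.tableNamePrefix();
    }
}
```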

Mybatis interceptor

```java
/**
 * MyBatis sharding interceptor.
 * (Generating the distributed ID inside the interceptor was considered, but the
 * create time and the primary key would then come from different mechanisms and
 * concurrency could make them inconsistent, so it was left out.)
 */
@Intercepts({
    @Signature(type = StatementHandler.class,
               method = "prepare",
               args = {Connection.class, Integer.class})
})
@Component
@Slf4j
public class TableShardingInterceptor implements Interceptor {

    private static final ReflectorFactory DEFAULT_REFLECTOR_FACTORY = new DefaultReflectorFactory();

    @Override
    @SuppressWarnings(value = {"unchecked", "rawtypes"})
    public Object intercept(Invocation invocation) throws Throwable {
        // MetaObject is a reflection utility class provided by MyBatis
        MetaObject metaObject = getMetaObject(invocation);
        BoundSql boundSql = (BoundSql) metaObject.getValue("delegate.boundSql");
        MappedStatement mappedStatement = (MappedStatement) metaObject.getValue("delegate.mappedStatement");
        TableSharding tableShard = getTableShard(mappedStatement);
        if (tableShard == null) {
            // not a sharded statement, continue untouched
            return invocation.proceed();
        }
        // the name of the parameter carrying the sharding key
        String value = tableShard.value();
        // whether the key has to be parsed out of the request parameters
        boolean fieldFlag = tableShard.fieldFlag();
        Object valueObject = null;
        if (fieldFlag) {
            // fetch the request parameter object
            Object parameterObject = boundSql.getParameterObject();
            if (parameterObject instanceof MapperMethod.ParamMap) {
                MapperMethod.ParamMap parameterMap = (MapperMethod.ParamMap) parameterObject;
                valueObject = parameterMap.get(value);
                if (valueObject == null) {
                    throw new RuntimeException(String.format("input parameter field %s not found", value));
                }
            } else if (isBaseType(parameterObject)) {
                throw new RuntimeException("a single primitive parameter is invalid, please name it with the @Param annotation");
            } else if (parameterObject instanceof Map) {
                Map<String, Object> parameterMap = (Map<String, Object>) parameterObject;
                valueObject = parameterMap.get(value);
            } else {
                // a non-primitive parameter object: read the field via reflection
                Class<?> parameterObjectClass = parameterObject.getClass();
                Field declaredField = parameterObjectClass.getDeclaredField(value);
                declaredField.setAccessible(true);
                valueObject = declaredField.get(parameterObject);
            }
        }
        // rewrite the table name in the SQL
        replaceSql(tableShard, valueObject, metaObject, boundSql);
        // hand over to the next plugin
        return invocation.proceed();
    }

    @Override
    public Object plugin(Object target) {
        if (target instanceof StatementHandler) {
            return Plugin.wrap(target, this);
        }
        return target;
    }

    /**
     * Basic data type check: true if primitive or a wrapper/String
     */
    private boolean isBaseType(Object object) {
        return object.getClass().isPrimitive()
                || object instanceof String || object instanceof Integer
                || object instanceof Double || object instanceof Float
                || object instanceof Long || object instanceof Boolean
                || object instanceof Byte || object instanceof Short;
    }

    /**
     * Replace the logical table name in the SQL with the routed sub-table name
     * @param tableShard sharding annotation
     * @param value      sharding key value
     * @param metaObject MyBatis reflection object
     * @param boundSql   SQL information object
     */
    private void replaceSql(TableSharding tableShard, Object value, MetaObject metaObject, BoundSql boundSql) {
        String tableNamePrefix = tableShard.tableNamePrefix();
        // fetch the strategy bean from the Spring IoC container
        // (it could also be instantiated by reflection)
        Class<? extends IShardingTableStrategy> strategyClazz = tableShard.shardingStrategy();
        IShardingTableStrategy tableShardStrategy = SpringUtil.getBean(strategyClazz);
        // generate the target table name and rewrite the SQL
        String shardTableName = tableShardStrategy.getTargetTableName(tableNamePrefix, value);
        String sql = boundSql.getSql();
        metaObject.setValue("delegate.boundSql.sql", sql.replaceAll(tableNamePrefix, shardTableName));
    }

    /**
     * Wrap the StatementHandler in a MyBatis MetaObject
     */
    private MetaObject getMetaObject(Invocation invocation) {
        StatementHandler statementHandler = (StatementHandler) invocation.getTarget();
        return MetaObject.forObject(statementHandler,
                SystemMetaObject.DEFAULT_OBJECT_FACTORY,
                SystemMetaObject.DEFAULT_OBJECT_WRAPPER_FACTORY,
                DEFAULT_REFLECTOR_FACTORY);
    }

    /**
     * Resolve the TableSharding annotation for the executing statement:
     * the Mapper method first, then the Mapper interface as a fallback
     * @param mappedStatement mappedStatement
     * @return sharding annotation, or null if absent
     */
    private TableSharding getTableShard(MappedStatement mappedStatement) throws ClassNotFoundException {
        // the statement id is the fully qualified method name: packageName.className.methodName
        String id = mappedStatement.getId();
        String className = id.substring(0, id.lastIndexOf("."));
        String methodName = id.substring(id.lastIndexOf(".") + 1);
        Class<?> mapperClass = Class.forName(className);
        TableSharding tableShard = null;
        for (Method method : mapperClass.getMethods()) {
            if (method.getName().equals(methodName)) {
                // annotation on the executing Mapper method
                tableShard = method.getAnnotation(TableSharding.class);
                break;
            }
        }
        if (tableShard == null) {
            // the method is not annotated, fall back to the Mapper interface
            tableShard = mapperClass.getAnnotation(TableSharding.class);
        }
        return tableShard;
    }
}
```
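The rewrite at the heart of the interceptor is just a string replacement on the bound SQL. A standalone sketch of that step (note that `replaceAll` treats the prefix as a regular expression and replaces every occurrence, so the prefix must not collide with other identifiers in the statement):

```java
// Standalone sketch of the table-name rewrite the interceptor performs:
// swap the logical prefix for the routed physical table name.
class SqlRewrite {
    static String rewrite(String sql, String prefix, String shardTableName) {
        // replaceAll interprets the prefix as a regex and replaces all occurrences
        return sql.replaceAll(prefix, shardTableName);
    }
}
```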

Adding interceptors

```java
sqlSessionFactory.getConfiguration().addInterceptor(tableShardingInterceptor);
sqlSessionFactory.getConfiguration().addInterceptor(mobileFillInterceptor);
sqlSessionFactory.getConfiguration().addInterceptor(pageHelper);
```

Implementation scheme 2. Sharding-JDBC

Although this scheme was not adopted in the end, we also tried implementing the split with Sharding-JDBC. The code changes are indeed small and the configuration is very simple. Its advantages and disadvantages were covered above and are not repeated here

Introducing the dependencies

Add sharding-jdbc-spring-boot-starter and sharding-jdbc-spring-namespace

```xml
<!-- shardingsphere latest version -->
<shardingsphere.version>4.1.1</shardingsphere.version>

<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
    <version>${shardingsphere.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-jdbc-spring-namespace</artifactId>
    <version>${shardingsphere.version}</version>
</dependency>
```

Data Source Configuration Changes

Only the data-source configuration needs to change, mainly in two aspects:

  1. Use a ShardingDataSource in place of the HikariDataSource
  2. Configure the data sharding strategy
```java
@Configuration
public class DatabaseConfig {

    private static final String BRAND_DB_NAME = "db58_cloudstore_brand";
    private static final String GROUP_DB_NAME = "ydtyuanquan_db";

    /**
     * Build the sharding data source for ydtyuanquan_db
     */
    @Lazy
    @Bean(name = "dataSource")
    public DataSource getDataSource(WConfig wConfig) throws SQLException {
        HikariDataSource hikariDataSource = getHikariDataSourceByDBName(wConfig, GROUP_DB_NAME);
        Map<String, DataSource> dataSourceMap = new HashMap<>(2);
        dataSourceMap.put(GROUP_DB_NAME, hikariDataSource);
        // sharding rules
        ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();
        shardingRuleConfig.getTableRuleConfigs().add(getOrderTableRuleConfiguration());
        Properties properties = new Properties();
        properties.put("sql.show", "true");
        return ShardingDataSourceFactory.createDataSource(dataSourceMap, shardingRuleConfig, properties);
    }

    /**
     * Sharding rule: clue_notify_sharding is the logical table
     */
    private TableRuleConfiguration getOrderTableRuleConfiguration() {
        // declare the actual data nodes backing the logical table
        TableRuleConfiguration result = new TableRuleConfiguration(
                "clue_notify_sharding", GROUP_DB_NAME + ".clue_notify_$->{0..9}");
        // sharding strategy: the sharding key plus the sharding algorithm
        result.setTableShardingStrategyConfig(
                new InlineShardingStrategyConfiguration("clue_id", "clue_notify_$->{clue_id % 10}"));
        result.setKeyGeneratorConfig(getKeyGeneratorConfiguration());
        return result;
    }

    /**
     * Primary-key generation strategy: snowflake IDs for the id column
     */
    private static KeyGeneratorConfiguration getKeyGeneratorConfiguration() {
        return new KeyGeneratorConfiguration("SNOWFLAKE", "id");
    }
}
```

Effect of the configuration

In the inline expressions, `$->{...}` is a placeholder that gets expanded: `clue_notify_$->{0..9}` enumerates clue_notify_0 through clue_notify_9, and `clue_notify_$->{clue_id % 10}` picks the sub-table from the shard key
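For intuition, a toy expansion of that range syntax (handling only the simple `a..b` numeric form; ShardingSphere's real inline expressions are Groovy-based and far more general):

```java
import java.util.ArrayList;
import java.util.List;

// Toy expansion of an inline expression of the form `prefix$->{a..b}`,
// for illustration only.
class InlineExpr {
    static List<String> expand(String expr) {
        int start = expr.indexOf("$->{");
        String prefix = expr.substring(0, start);
        String range = expr.substring(start + 4, expr.indexOf('}'));
        String[] bounds = range.split("\\.\\.");
        List<String> tables = new ArrayList<>();
        for (int i = Integer.parseInt(bounds[0]); i <= Integer.parseInt(bounds[1]); i++) {
            tables.add(prefix + i);
        }
        return tables;
    }
}
```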

5. Problems encountered

1. In paged queries, the sorted rows are scattered across sub-tables

The original SQL:

```sql
select * from clue_notify_sharding where visitor_id = ? order by update_time desc limit 0, 2
```

Effect after splitting: Sharding-JDBC rewrites the paged query on the logical table

A real SQL query is executed once against every sub-table (the rewritten statements are elided here). The effect is that the paged query runs in parallel on each sub-table, and paging is performed again after the data is aggregated

Performance problem: Sharding-JDBC uses streaming processing plus merge sort to avoid excessive memory use, but a large paging offset still makes fetching the data from the database slow
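The cost of deep offsets can be seen in a toy merge (ascending sort for simplicity; the real middleware streams and merge-sorts): each shard must return its first `offset + size` rows, because any of them might land in the global page, and the merger then discards `offset` rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy cross-shard pagination: each (already sorted) shard contributes its first
// offset+size rows, then the merged, sorted stream is cut to the requested page.
class ShardPager {
    static List<Integer> page(List<List<Integer>> shards, int offset, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            // the shard cannot know which of its rows survive the global merge
            merged.addAll(shard.subList(0, Math.min(shard.size(), offset + size)));
        }
        Collections.sort(merged);
        int from = Math.min(offset, merged.size());
        int to = Math.min(offset + size, merged.size());
        return merged.subList(from, to);
    }
}
```

With offset 1,000,000 each shard would have to produce over a million rows just to serve one small page, which is why deep paging is slow.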

2. How to handle join statements

Join statements whose association is on the shard key can be reused directly

3. How to handle the primary key after splitting

The primary key can no longer use the original simple database auto-increment. Sharding-JDBC has two built-in solutions, UUID and snowflake. UUIDs are unordered, which makes them a poor primary key; ordered, increasing keys also help in scenarios where large-offset paged queries are slow. A fixed-step auto-increment can also be used to keep keys increasing, but it is not conducive to increasing the number of sub-tables later
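The fixed-step idea: give each generator a distinct starting offset and a shared step equal to the number of generators, so the sequences never collide. A sketch (illustrative; as noted, adding tables later is awkward because the step is baked in):

```java
// Fixed-step ID generation: generator i issues i, i+STEP, i+2*STEP, ...
// Distinct offsets with a shared step guarantee no collisions across generators.
class SteppedIdGenerator {
    private final int step;
    private long next;

    SteppedIdGenerator(int offset, int step) {
        this.next = offset;
        this.step = step;
    }

    long nextId() {
        long id = next;
        next += step;
        return id;
    }
}
```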

Snowflake

Snowflake is Twitter's open-source scheme. A previous note covered distributed ID solutions and snowflake-generated IDs in detail, so it is not repeated here

6. Rollout plan for replacing the original single table with the sub-tables

1. Flush data from the old table into the sub-tables

To preserve primary-key auto-increment ordering, single-threaded processing is recommended
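The ordering argument can be sketched: walking the old table in id order with a single thread means each sub-table receives its rows in ascending id order (the real batched JDBC reads and writes are elided; names are illustrative).

```java
import java.util.ArrayList;
import java.util.List;

// Toy backfill: route each shard key (read in ascending id order) to its
// target sub-table; single-threaded processing keeps every sub-table ascending.
class Backfill {
    static List<List<Long>> distribute(List<Long> keysInIdOrder, int tableCount) {
        List<List<Long>> tables = new ArrayList<>();
        for (int i = 0; i < tableCount; i++) {
            tables.add(new ArrayList<>());
        }
        for (long key : keysInIdOrder) {
            tables.get((int) (key % tableCount)).add(key);
        }
        return tables;
    }
}
```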

2. Double-write the data

Until the system is stable, double-writing is recommended: write both the old table and the sub-tables, with the old table written first. Once the system is stable, the original old table can be retired
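A minimal sketch of the double-write step, using in-memory maps as stand-ins for the old table and a routed sub-table (real code would write through the two mappers, ideally with a transaction or compensation around the pair):

```java
import java.util.HashMap;
import java.util.Map;

// Toy double-write: persist to the old single table first, then to the routed
// sub-table, so the old path stays authoritative until the cutover.
class DoubleWriter {
    final Map<Long, String> oldTable = new HashMap<>();
    final Map<Long, String> subTable = new HashMap<>();

    void save(long clueId, String payload) {
        oldTable.put(clueId, payload);  // authoritative until the system is stable
        subTable.put(clueId, payload);  // new sharded path, verified against the old one
    }
}
```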

References
【1】 blog.csdn.net/xiaojin21ce…
【2】 shardingsphere.apache.org/document/le…