This is the ninth day of my participation in the First Challenge 2022. For details: First Challenge 2022.

It has been a long time since I wrote a blog. I have been a little busy lately. Some time ago, I happened to share the usage of shardingJdbc in my company.

In my opinion, the concept of separate database and table should still belong to a transitional scheme in the process of technological development. If one day mysql supports high performance level expansion, or more mature database products appear, then this technology will not exist necessary.

In this lecture we are going to focus on some conceptual things.

A brief introduction to the database and table

1.1 Why to use sub-database sub-table

Prerequisite: If a large amount of data is stored in a single table, the SQL execution performance deteriorates and the SQL runs slowly. As a rule of thumb, performance degrades when the meter is in the millions

Split premise: For a single library, the maximum concurrency may be around 2000, but for a robust single library the best concurrency is around 1000. When a single database grows or grows concurrently, the data from one library can be split into multiple libraries

1.2 When should we divide the database and table

So says alibaba’s handbook.

Ii. Introduction to ShardingSphere

2.1 Function Description

Website address: shardingsphere.apache.org/index_zh.ht…

ShardingSphere became the Apache Software Foundation’s Top project on April 16, 2020.

Apache ShardingSphere – Version: 5.1.0

Features List:

Data fragmentation

  • Sub-library & sub-table

  • Reading and writing separation

  • Customize sharding policies

  • Decentralized distributed primary keys

Distributed transaction

  • Standardized transaction interface

  • XA strong consistent transactions

  • Flexible transaction

Database governance

  • Distributed governance

  • Elastic scaling

  • Visual link tracking

  • Data encryption

2.2 Horizontal and vertical split

Vertical split: Also known as vertical split, refers to the operation of the library/table, divided into different parts of the structure.

Horizontal splitting: Splitting the same library/table into multiple libraries/tables with the same structure

2.3 Core Concepts

  • Logical table: The collective name for the same logical and data structure table of a horizontally split data table. For example, if the order table is split into 10 tables based on the primary key, t_ORDER_0 to T_order_9, then their logical table name is T_ORDER

  • Real table: A physical table that actually exists in a sharded database. T_order_0 to T_order_9 in the previous example

  • Data node: The smallest unit of a data fragment. It consists of a data source name and a data table, for example, ds_0.t_ORDER_0

  • Binding table: the main table and word table with the same sharding rules. For example, if the t_ORDER and T_order_item tables are shard according to order_ID, the two tables are bound tables. Cartesian product association will not appear in multi-table associated query between bound tables, and the association efficiency will be greatly improved.

    SELECT i.* FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

  • Broadcast table: a table that exists in all shard data sources and has exactly the same structure and data in it in each database. This method is applicable to scenarios where the data volume is small and massive data tables need to be associated with query, for example, dictionary tables

2.4 Four sharding algorithms

  • PreciseShardingAlgorithm

Used for sharding = and in using a single key as the sharding key. Needs to be used in conjunction with the StandardShardingStrategy.

  • RangeShardingAlgorithm – RangeShardingAlgorithm

Used for sharding Between And with a single key as the sharding key. Need to work with StandardShardingStrategy

  • Composite subdivision algorithm – ComplexKeysShardingAlgorithm

It is used to handle the scenario where multiple sharding keys are used as sharding keys. The logic of multiple sharding keys is complex and requires application developers to handle the complexity by themselves. It needs to work with ComplexShardingStrategy

Note: In our business development, we often need to query a user’s record list according to the user ID and a user’s record according to the primary key of a business, which requires the use of compound sharding algorithm. For example, in an order table, we need to query both the order list data for a time period for a userId and the order data for an orderId. In this case, orderId and userId are compound shard keys.

  • HintSharding algorithm – HintShardingAlgorithm

Hint sharding refers to scenarios in which shard fields are determined by other external conditions rather than SQL. You can flexibly inject shard fields by using SQL Hint.

Hint Sharding is a strategy that bypassed SQL parsing, so it is possible to implement the algorithm to achieve syntax restrictions that sharding-JDBC does not support. Hint row sharding is used for scenarios where Hint row sharding is used together with HintShardingStrategy

2.5 Five Sharding Strategies

  • StandardShardingStrategy – StandardShardingStrategy

Supports sharding =, in, between and in SQL statements. StandardShardingStrategy only supports single shard keys and provides PreciseShardingAlgorithm and RangeShardingAlgorithm.

PreciseShardingAlgorithm is required to handle = and in fragments. PreciseShardingAlgorithm is optional for processing Between and sharding. If RangeShardingAlgorithm is not configured, Between and in Sql will be processed as full-library routing.

  • Compound sharding strategy – ComplexShardingStrategy

Support for sharding =, in, and between and in SQL statements. ComplexShardingStrategy supports multiple sharding keys. Because of the complex relationship between multiple sharding keys, it does not encapsulate too much. Instead, it directly combines sharding key values and sharding operators into the sharding algorithm, which is fully implemented by application developers to provide maximum flexibility.

  • Row expression sharding strategy – InlineShardingStrategy

Use groovy expressions to support single sharding of = and in in SQL statements. For example, t_user$->{u_id%8} indicates that the T_USER table is divided into eight tables according to u_id mode 8, named T_user_0 to t_user_7

  • HintSharding strategy – HintShardingStrategy

A strategy for sharding by Hint rather than SQL parsing

  • Non-sharding Strategy – NoneShardingStrategy

This policy is the non-fragmentation policy

We’ll stop there with some of the conceptual stuff, write an article and we’ll do it in real time