preface

With the increasing amount of system data, when talking about database architecture and database optimization, we often hear such terms as sub-database sub-table.

Of course, there are a lot of methods, such as vertical split, horizontal split; There are also many middleware products, such as MyCat and ShardingJDBC.

Choosing the right split method based on the business scenario, along with a familiar open source framework, can help us complete the data split work involved in the project.

Instead of going into depth on these methodologies and open source frameworks, I’d like to discuss another scenario:

If there are only one or a few tables in the system that need to be split, is it worth introducing some relatively complex middleware products? In particular, if we don’t know much about how they work, can we be confident in harnessing them?

Based on this, if you have a small number of tables that need to be split, and you don’t have resources dedicated to open source components, you can implement a simple library and table plug-in yourself. Of course, if you have a complex system with a high volume of business, it is better to use open source or team developed components to solve the problem.

A, principle

It’s easy to say, it’s complicated…

Simple because the core process is straightforward. SQL statements are parsed and then rewritten or routed to real database tables according to pre-configured rules.

The complexity is that SQL statements are complex and flexible, such as paging, de-duplication, sorting, grouping, aggregation, associated query operations, how to correctly parse them.

So even ShardingJDBC, in the official website also clearly supported and not supported items.

2. Annotated configuration

Instead of complex configuration files, we use a lightweight annotated configuration, which is defined as follows:

@Target({ElementType.TYPE}) @Retention(RetentionPolicy.RUNTIME) @Documented public @interface Sharding { String tableName(); // Logical table name String field(); // Shard key String mode(); // algorithm mode int length() default 0; // table count}Copy the code

So, where do you use it? For example, if our User table needs a separate table, we will annotate the User entity object.

@Data
@Sharding(tableName = "user",field = "id",mode = "hash",length = 16)
public class User {
    private Long id;
    private String name;
    private String address;
    private String tel;
    private String email;
}
Copy the code

So that tells me that I have 16 tables of users, and I’m using a Hash algorithm to compute its position based on the user ID.

Of course, we don’t just have Hash algorithms, we can also define them based on date ranges.

@Data
@Sharding(tableName = "car",field = "creatTime",mode = "range")
public class Car {
    private long id;
    private String number;
    private String brand;
    private String creatTime;
    private long userId;
}
Copy the code

Third, sharding algorithm

Here, the author has implemented two sharding methods, HashAlgorithm and RangeAlgorithm.

1. Scope sharding

If you use hot and cold data separation in your system, we can split the data for different months into different tables by date.

For example, the vehicle creation time is 2019-12-10 15:30:00, this data will be assigned to the car_201912 table.

We can do this by intercepting the month part of the time and then adding the logical table name.

public class RangeAlgorithm implements Algorithm {
    @Override
    public String doSharding(String tableName, Object value,int length) {
        if(value! =null){ try{ DateUtil.parseDateTime(value.toString()); String replace = value.toString().substring(0, 7).replace("-"."");
                String newName = tableName+"_"+replace;
                return newName;
            }catch (DateException ex){
                logger.error("Time format does not meet requirements! Pass argument :{}, correct format :{}",value.toString(),"yyyy-MM-dd HH:mm:ss");
                returntableName; }}returntableName; }}Copy the code

2. Hash sharding

In the Hash sharding algorithm, we can first determine whether the number of tables is a power of two. If not, get the index arithmetically, if so, get the index bit. Of course, this is learned from the HashMap source code.

public class HashAlgorithm implements Algorithm {
    @Override
    public String doSharding(String tableName, Object value,int length) {
        if (this.isEmpty(value)){
            return tableName;
        }else{
            int h;
            int hash = (h = value.hashCode()) ^ (h >>> 16);
            int index;
            if (is2Power(length)){
                index = (length - 1) & hash;
            }else {
                index = Math.floorMod(hash, length);
            }
            return tableName+"_"+index; }}}Copy the code

4. Interceptor

Now that we have the configuration and sharding algorithms, we’re going to get to the big picture. Here, we use the Mybatis interceptor to put them to use.

Those of us who have spent years in CRUD know that a business SQL cannot escape their scope. In business, our deletion function is usually logical deletion, so there is basically no DELETE operation.

In comparison, new and modified SQL is relatively simple and fixed format, query SQL is often flexible and complex. So, here I define two interceptors.

Before we get to interceptors, though, there’s a reason to know two more things: SQL syntax parsers and sharding algorithm processors.

1, JSqlParser

JSqlParser parses SQL statements and transforms them into a hierarchy of Java classes. And we can start by looking at a simple example.

public static void main(String[] args) throws JSQLParserException {

	String insertSql = "Insert into user (id,name,age) value(1001,' fanyi ',20)";
	Statement parse = CCJSqlParserUtil.parse(insertSql);
	Insert insert = (Insert) parse;

	String tableName = insert.getTable().getName();
	List<Column> columns = insert.getColumns();
	ItemsList itemsList = insert.getItemsList();
	System.out.println("The name of the table."+tableName+"The column name."+columns+"Properties."+itemsList); } Output: table name :user Column name :[id, name, age] Property :(1001,'such', 20)
Copy the code

As you can see, JSqlParser can parse SQL syntax information. Accordingly, we can change the object content to modify the SQL statement.

2. Algorithm processor

We have multiple sharding algorithms, which one should be called is determined at runtime. So, we use a Map to register the algorithm and then invoke it according to the sharding pattern. This is also the embodiment of the strategic pattern.

@Component
public class AlgorithmHandler {
    private Map<String, Algorithm> algorithm = new HashMap<>();
    @PostConstruct
    public void init(){
        algorithm.put("range",new RangeAlgorithm());
        algorithm.put("hash",new HashAlgorithm());
    }
    public String handler(String mode,String name,Object value,int length){
        returnalgorithm.get(mode).doSharding(name, value,length); }}Copy the code

3. Interceptors

As we know, MyBatis allows you to intercept calls at some point during the execution of mapped statements.

If you are not familiar with how it works, you can take a look at my article: How the Mybatis interceptor works.

Overall, its process is as follows:

  • throughMybatisIntercepting SQL to be executed;
  • throughJSqlParserParse SQL, obtain logical table names, etc.
  • Call the sharding algorithm to get the real table name;
  • Modify SQL, and modifyBoundSql;
  • MybatisExecute the modified SQL to achieve the purpose.

For example, the core code for insert and UPDATE statements is as follows:

Five, query and paging

In fact, the addition and modification are relatively simple, more complex is the query statement.

However, our plug-in is not designed to satisfy all queries, but can be extended and modified according to real business scenarios.

But the paging function is basically inescapable. Take PageHelper for example, which is also implemented through the Mybatis interceptor. If it is with our partitioned table plug-in, there may be conflicts.

So in the split-table plug-in, I also integrated paging, basically the same as PageHelper, but didn’t use it directly. In addition, for the query, whether there is a sharding key in the query condition is also a very critical place.

1, the query

In the scope algorithm, in business, we only need to query the data of a specific month or recent months; In the Hash algorithm, we want a primary key every time.

But the second condition is often not true, and the business side is not satisfied that it must have a primary key every time.

In this case, we can only traverse all the tables, query the data that meets the conditions, and then return the summary;

2, paging

As mentioned above, the plugin integrates paging and implements the same process as PageHelper, but it is not used directly because of conflicts.

Sixth, other

In fact, when I think of the title of this article, it is really quite distressed. Because sub – library sub – table in the industry is a word, but this plug-in does not involve sub – library, only sub – table operation, but the focus of this article is the idea, finally called sub – library sub – table, please forgive my friends, do not call me the title party ~

Due to the limitation of space, this paper is only a small amount of code, if interested friends can go to https://github.com/taoxun/sharding for a complete Demo.

The author’s code includes some test cases and SQL to build a table. After creating the table, you can run the project directly.