

1. Cross-Shard Joins in Mycat

1.1. Overview of Join operations

Join is one of the most commonly used features in relational databases, but cross-shard joins are the most complex and difficult problem to solve in a distributed environment. Let's take a look at the various Join operations.

Inner join

An inner join, also known as an equi-join, produces the set of records that match in both table A and table B. As shown in the figure:

Left join

A left join produces the complete set of records from table A (the left table), together with the matching records from table B (the right table). If there is no match, the right side contains null. It is equivalent to LEFT OUTER JOIN in MySQL. As shown in the figure:

Right join

A right join is the mirror image of a left join: it produces the complete set of records from table B (the right table), together with the matching records from table A (the left table). If there is no match, the left side contains null. It is equivalent to RIGHT OUTER JOIN in MySQL.

Cross join

A cross join produces the product of the two tables, the Cartesian product (also called the direct product). Assuming set A = {a, b} and set B = {0, 1, 2}, the Cartesian product of the two sets is {(a, 0), (a, 1), (a, 2), (b, 0), (b, 1), (b, 2)}. This can be extended to multiple sets. Similarly, if A represents the set of students of a school and B represents the set of all courses of the school, then the Cartesian product of A and B represents all possible course enrollments.

Full join

A full join produces the union of tables A and B: all records from both tables, matched where possible. If there is no match, the opposite side contains NULL.

Performance Suggestions

  • Avoid LEFT JOIN or RIGHT JOIN and use INNER JOIN instead
  • When LEFT JOIN or RIGHT JOIN is used, the ON clause is evaluated first and the WHERE condition last; so put filter conditions in the ON clause wherever possible, to reduce the rows the WHERE clause must process
  • Use fewer subqueries and more joins

The current version of Mycat supports cross-shard joins in four main ways: global tables, ER sharding, catlet (artificial intelligence), and ShareJoin. ShareJoin is supported in the development version; the first three have been supported since 1.3.0.1.

1.2. Global tables in Mycat

Real business systems usually contain a large number of dictionary-like tables. They may be related to the business tables, but the relationship is better understood as "labeling" rather than the usual "master-slave relationship". These tables rarely change and can be cached by primary key ID. The following picture illustrates a typical "label relationship" diagram:

In a sharded deployment, once the business table is sharded because of its size, the joins between the business table and these attached dictionary tables become a tricky problem. Consider that dictionary tables have the following characteristics:

  • Infrequent changes
  • The overall amount of data changes little
  • The data size is small, rarely exceeding hundreds of thousands of records

For this reason, MyCAT defines a special kind of table called a “global table” that has the following properties:

  • The insertion and update of the global table will be performed on all nodes in real time to keep the data consistency of each shard
  • Query operations for global tables are retrieved from only one node
  • A global table can JOIN any table

Defining dictionary tables, and other tables that fit the dictionary-table profile, as global tables solves the data JOIN problem from another angle. With global tables plus a sharding strategy based on E-R relationships, MyCAT can satisfy more than 80% of enterprise application development needs.

Configuration

Global table configuration is simple. You do not need to write rules. You can configure the global table as follows:

<table name="company" primaryKey="ID" type="global" dataNode="dn1,dn2,dn3" /> 

It is important to note that every shard node of the global table must run the DDL statement to create the table.

1.3. ER Join

MyCAT draws on the design ideas of FoundationDB, a newcomer in the NewSQL field. FoundationDB innovatively proposed the concept of a Table Group, in which the storage location of child tables depends on the main table and the two are physically close to each other, completely solving the efficiency and performance problems of JOIN. Following this idea, Mycat proposes a data sharding strategy based on E-R relations: the records of a child table are stored in the same data shard as the associated parent-table records.

Customer adopts the sharding-by-intfile sharding strategy and is sharded on dn1 and dn2; orders relies on its parent table for sharding, and the two tables are associated by orders.customer_id = customer.id. The schematic diagram of data sharding and storage is as follows:

In this way, a customer record on shard dn1 can JOIN locally with the orders on dn1, and the same is true for dn2; the results from the two nodes are then merged to complete the whole JOIN. Imagine that each shard holds 1 million orders rows: this data sharding mode based on E-R mapping basically solves more than 80% of the JOIN problems enterprise applications face.

Configuration

Using the above example, schema.xml defines the following sharding configuration:

<table name="customer" dataNode="dn1,dn2" rule="sharding-by-intfile"> 
    <childTable name="orders" joinKey="customer_id" parentKey="id"/> 
</table> 

1.4. ShareJoin

ShareJoin is a simple cross-shard Join based on HBT. Currently joins between two tables are supported. The principle is to parse the SQL statement, split it into single-table SQL statements for execution, and then collect the data from each node.

Configuration

Joins between any two tables A and B are supported, in any of the following configurations:

The datanodes of A and B are the same

<table name="A" dataNode="dn1,dn2,dn3" rule="auto-sharding-long" /> 
<table name="B" dataNode="dn1,dn2,dn3" rule="auto-sharding-long" />

The datanodes of A and B are different

<table name="A" dataNode="dn1,dn2 " rule="auto-sharding-long" /> 
<table name="B" dataNode="dn1,dn2,dn3" rule="auto-sharding-long" />

or

<table name="A" dataNode="dn1 " rule="auto-sharding-long" /> 
<table name="B" dataNode=" dn2,dn3" rule="auto-sharding-long" /> 

1.5. Catlet (Artificial Intelligence)

Solving cross-shard SQL JOINs is far more complex than imagined, and it usually cannot be done with high efficiency. In such cases Mycat relies on "artificial intelligence": a programmer codes the JOIN logic for the few specific SQL statements in the business system that must cross shards, using a dedicated API that MyCAT provides. This is MyCAT's innovative idea, which it calls artificial intelligence. Take a cross-node SQL statement as an example.

select a.id,a.name,b.title from a,b where a.id=b.id;

Here a is on shards 1, 2, and 3, and b is on shards 4, 5, and 6. All data needs to be pulled to the local MyCAT server to execute the JOIN logic. The specific process is as follows (only one possible execution flow):

EngineCtx ctx = new EngineCtx(); // mycat.sqlengine.EngineCtx
String sql = "select a.id, a.name from a";
// Execute the following local SQL sequentially on all shards of table a
ctx.executeNativeSQLSequnceJob(allAnodes, new DirectDBJoinHandler());

The DirectDBJoinHandler class is a callback that processes the data packets returned during SQL execution. Its main job is to use the id values returned from table a to query the matching records from table b and complete the join in real time:

public class DirectDBJoinHandler {
    // key = id, value = the raw bytes of the row, cached so the rows can be joined later
    private HashMap<byte[], byte[]> rows;

    public boolean onHeader(byte[] header) {
        // save the header information for later field parsing
    }

    public boolean onRowData(byte[] rowData) {
        String id = getColumnAsString("id");
        // cache the raw bytes of the row, keyed by id
        rows.put(getColumnRawBytes("id"), rowData);
        String sql = "select b.id, b.name from b where id in (......)";
        // execute this SQL concurrently on all nodes of table b; the returned
        // results are output directly to the client
        ctx.executeNativeSQLParallJob(allBNodes, sql, new MyRowOutPutDataHandler(rows));
    }

    public boolean onRowFinished() {
    }

    public void onJobFinished() {
        if (ctx.allJobFinished()) {
        }
    }
}

Finally, add a Job event listener; after all jobs have finished, the RowEnd packet is sent to the client to end the whole process.

    ctx.setJobEventListener(new JobEventHandler() {
        public void onJobFinished() {
            client.writeRowEndPackage();
        }
    });

The above provides an SQL execution framework that runs completely asynchronously. More high-quality APIs will be provided to simplify distributed data processing, such as memory- and file-backed data JOIN algorithms, grouping algorithms, sorting algorithms, and so on.

1.6. Spark/Storm extended join

What do Spark and Storm have to do with Join? Are Spark and Storm necessary here?

A future version of Mycat will introduce Spark and Storm to perform cross-shard joins. The general process is: Mycat calls the Spark/Storm API and sends the data to Spark or Storm; Spark/Storm performs the join and returns the result to Mycat; Mycat then returns it to the client.

2. Sharding Rules of Mycat

In data partitioning, especially horizontal partitioning, the two final tasks of the middleware are data sharding and data aggregation. Choosing the right sharding rule is crucial, because it determines how difficult subsequent data aggregation will be and can even avoid cross-database aggregation altogether.

Data sharding has several important principles, among which data redundancy and Table Groups are good ways to avoid cross-database joins in the business. However, not all business scenarios suit such rules, so this chapter describes how to choose an appropriate sharding rule.

2.1. Global tables

If your business has data similar to a data dictionary, such as configuration tables, common service configuration, or small tables that rarely change but are used by most business scenarios, such tables suit Mycat's global tables and need no data sharding. When a business table joins a global table, Mycat prefers the copy of the global table in the same shard, avoiding a cross-database Join. On inserts, Mycat distributes the data to every shard the global table is mapped to; on reads, a random node is chosen to read the data.

The configuration of the global table is as follows:

<table name="t_area" primaryKey="id" type="global" dataNode="dn1,dn2" />

2.2. ER sharded tables

Some businesses, such as order and order_detail, have detail tables that depend on the order table; that is, the tables have a master-detail relationship. For such businesses an appropriate sharding rule can be abstracted: for example, shard by user ID, with other related tables also depending on the user ID, or shard by order ID. Either way, part of the business can always be abstracted into parent-child table relationships. Such tables suit ER sharded tables: records of a child table are stored in the same data shard as the associated parent-table records, so joins never cross databases.

Taking order and order_detail as examples, schema.xml defines the sharding configuration below. order and order_detail are sharded by order_id, which guarantees that data with the same order_id lands on the same shard. When data is inserted, Mycat looks up the shard of the order row and inserts order_detail into that same shard.

<table name="order" dataNode="dn$1-32" rule="mod-long"> 
    <childTable name="order_detail" primaryKey="id" joinKey="order_id" parentKey="order_id" /> 
</table> 

2.3. Many-to-many relationships

There is a business scenario of "main table A + relation table + main table B", for example member + order + merchant. How should such a business be sharded?

From the member's perspective, if you need to query purchase orders, sharding by member works; but if you want to query the orders a merchant has sold, sharding by merchant is required. Sharding by member and by merchant at the same time is almost impossible to achieve, so choosing a sharding rule for this kind of business is very difficult. At present this pattern of three-way association is not well supported. The general rule now is to decide, from a business perspective, which main table the relation table belongs with more, i.e. whether the relationship is "A's" or "B's", and store the relation table accordingly. A future Mycat version will consider a bidirectionally replicated middle table, to support association queries from both the A side and the B side, as shown in the figure below:

Primary key sharding vs non-primary key sharding

Primary key sharding is the only option when no other field can serve as the sharding field. Its advantages: queries by primary key are fastest, and with auto-increment serial numbers as primary keys the data spreads evenly across nodes. If some business field is suitable as the sharding field, prefer it. The requirements for choosing a sharding field are:

  • Distribute data evenly to each node as much as possible;
  • This business field is the most frequent or important query condition.

Besides the primary key, other common sharding fields are "order creation time", "store category", or "province". Once you find a suitable business field to shard by, you need not worry about sacrificing primary-key query performance: in this case MyCAT provides an in-memory cache that maps primary keys to shards, so hot data queried by primary key loses no performance.

<table name="t_user" primaryKey="user_id" dataNode="dn$1-32" rule="mod-long"> 
    <childTable name="t_user_detail" primaryKey="id" joinKey="user_id" parentKey="user_id" /> 
</table> 

For tables not sharded by primary key, configure the primaryKey property. MyCAT will analyze the result of the first SQL statement that queries by primary key, determine which shard the row lives on, and cache the primaryKey-to-shard mapping. On the second and subsequent queries, Mycat first checks the cache for the id -> node mapping; if present, it queries that shard directly. This improves query performance for tables not sharded by primary key.
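To make the mechanism concrete, here is a minimal sketch of such an id-to-shard cache, assuming a surrounding router that broadcasts uncached queries; the class and method names are hypothetical, not Mycat internals:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Caches which dataNode holds each primary key, so repeat queries by primary
    // key can be routed to a single shard instead of being broadcast.
    class PrimaryKeyShardCache {
        private final Map<String, String> idToNode = new ConcurrentHashMap<>();

        String lookup(String primaryKey) {
            return idToNode.get(primaryKey); // null: not cached yet, broadcast the query
        }

        void remember(String primaryKey, String dataNode) {
            idToNode.put(primaryKey, dataNode); // learned from the first query's result
        }
    }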

2.4. Sharding rules commonly used by Mycat

2.4.1. Sharding by enumeration

This rule applies to specific scenarios, for example when data must be stored by province or district and the set of provinces or districts is fixed. For such cases the configuration is as follows:

<tableRule name="sharding-by-intfile"> 
<rule> 
<columns>user_id</columns> 
<algorithm>hash-int</algorithm> 
</rule> 
</tableRule> 

<function name="hash-int" class="io.mycat.route.function.PartitionByFileMap"> 
<property name="mapFile">partition-hash-int.txt</property> 
<property name="type">0</property> 
<property name="defaultNode">0</property> 
</function> 

partition-hash-int.txt configuration:

10000=0
10010=1 
DEFAULT_NODE=1 

Configuration instructions

Columns indicates the column to be sharded and algorithm the sharding function. In the function configuration, mapFile is the name of the configuration file. type defaults to 0; 0 means the values are Integers, non-zero means Strings. All node numbering starts at 0, and 0 represents node 1.

  • DefaultNode: a value less than 0 means no default node is configured; a value greater than or equal to 0 configures that node as the default
  • What the default node does: when an enumeration value is not recognized, the row is routed to the default node
  • If no default node is configured (defaultNode less than 0), an unrecognized enumeration value raises an error
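The lookup logic amounts to a map from enumeration value to node, with the default-node fallback described above. A minimal sketch (illustrative only, not the PartitionByFileMap source; in Mycat the map is loaded from mapFile):

    import java.util.HashMap;
    import java.util.Map;

    class EnumPartitionSketch {
        private final Map<Integer, Integer> valueToNode = new HashMap<>();
        private final int defaultNode; // < 0 means "no default node configured"

        EnumPartitionSketch(int defaultNode) {
            this.defaultNode = defaultNode;
            valueToNode.put(10000, 0); // entries as in partition-hash-int.txt
            valueToNode.put(10010, 1);
        }

        int calculate(int columnValue) {
            Integer node = valueToNode.get(columnValue);
            if (node != null) return node;
            if (defaultNode >= 0) return defaultNode; // unrecognized value -> default node
            throw new IllegalArgumentException("no datanode for value " + columnValue);
        }
    }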

2.4.2. Fixed Sharding Hash algorithm

This rule is similar to decimal modulo, except that it operates in binary: it takes the low 10 bits of the id's binary value, i.e. id & 1111111111 (binary). The advantage: with decimal modulo, inserting ids 1-10 consecutively would scatter them across shards 1-10, making transaction control of the batch insert harder; this binary algorithm tends to place consecutive ids in the same partition, reducing the difficulty of insert transaction control.
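To see the contiguity argument concretely, the sketch below compares decimal modulo against low-10-bits bucketing for ids 1-10, using the 256/256/512 partition layout from the example that follows (illustrative only):

    // Consecutive ids scatter under modulo but stay in one partition under the
    // fixed-shard hash, which keeps a batch insert inside fewer shards.
    public class ContiguityDemo {
        public static void main(String[] args) {
            for (long id = 1; id <= 10; id++) {
                long modNode = id % 3;   // decimal modulo over 3 nodes: 1,2,0,1,2,0,...
                long bucket = id & 1023; // low 10 bits of the id: here 1..10
                int partition = bucket < 256 ? 0 : (bucket < 512 ? 1 : 2);
                System.out.printf("id=%d  mod -> node %d, fixed hash -> partition %d%n",
                        id, modNode, partition);
            }
        }
    }

The rule configuration is as follows: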

<tableRule name="rule1"> <rule> <columns>user_id</columns> <algorithm>func1</algorithm> </rule> </tableRule> <function Name = "func1" class = "IO. Mycat. The route. The function. PartitionByLong" > < property name = "partitionCount" > 2, 1 < / property > < property Name = "partitionLength" > 256512 < / property > < / function >Copy the code

Configuration instructions

Columns indicates the column to be sharded and algorithm the sharding function. partitionCount is the list of partition counts and partitionLength the list of partition lengths. By default the bucket space is 2^10 = 1024, meaning at most 1024 partitions are supported.

Note: the count and length arrays must have the same length, and 1024 = sum(count[i] * length[i]) must hold; that is, the dot product of the vectors count and length must equal 1024.

The partitioning strategy of this example: split the data horizontally into three partitions, with the first two holding 25% each and the third 50% (hence a non-uniform partition).

// |<---------------------1024--------------------->|
// |<----256---->|<----256---->|<--------512------->|
// |  partition0 |  partition1 |     partition2     |
// count[0] = 2: two partitions of length 256; count[1] = 1: one partition of length 512
int[] count = new int[] { 2, 1 };
int[] length = new int[] { 256, 512 };
PartitionUtil pu = new PartitionUtil(count, length);

If a uniform distribution is required, set partitionCount * partitionLength = 1024:

<tableRule name="rule1"> 
<rule> 
<columns>user_id</columns> 
<algorithm>func1</algorithm> 
</rule> 
</tableRule> 

<function name="func1" class="io.mycat.route.function.PartitionByLong">
<property name="partitionCount">4</property> 
<property name="partitionLength">256</property> 
</function>  

2.4.3. Range convention

This sharding is used when the ranges of the shard field are planned in advance.


  • start <= range <= end
  • range: start-end
  • datanode: index

(K=1000, M=10000)
<tableRule name="auto-sharding-long"> 
<rule> 
<columns>user_id</columns> 
<algorithm>rang-long</algorithm> 
</rule> 
</tableRule> 

<function name="rang-long" class="io.mycat.route.function.AutoPartitionByLong"> 
<property name="mapFile">autopartition-long.txt</property> 
<property name="defaultNode">0</property> 
</function>  

Configuration instructions

Columns identifies the table field to be sharded and algorithm the sharding function. In the rang-long function, mapFile is the configuration file path and defaultNode the default node used when a value exceeds all configured ranges.

All node numbering starts at 0, and 0 represents node 1. The configuration is very simple: specify the possible ranges of ids per shard, e.g. 0-500M=0, 500M-1000M=1, 1000M-1500M=2, or 0-10000000=0, 10000001-20000000=1.
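A minimal sketch of this range lookup, using the K/M convention above (illustrative only, not the AutoPartitionByLong source):

    // Scan the pre-planned ranges from autopartition-long.txt and fall back to
    // the configured defaultNode when the value is outside every range.
    class RangePartitionSketch {
        static final long M = 10000; // per the convention above
        // {startInclusive, endInclusive, nodeIndex}
        static final long[][] RANGES = {
                { 0, 500 * M, 0 },
                { 500 * M + 1, 1000 * M, 1 },
                { 1000 * M + 1, 1500 * M, 2 },
        };
        static final int DEFAULT_NODE = 0;

        static int calculate(long id) {
            for (long[] r : RANGES) {
                if (id >= r[0] && id <= r[1]) return (int) r[2];
            }
            return DEFAULT_NODE;
        }
    }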

2.4.4. Modulo sharding

This rule applies a modulo operation to the shard field.

<tableRule name="mod-long"> <rule> <columns>user_id</columns> <algorithm>mod-long</algorithm> </rule> </tableRule> <function name="mod-long" class="io.mycat.route.function.PartitionByMod"> <! -- how many data nodes --> <property name="count">3</property> </function>Copy the code

Configuration instructions

The columns above identify the table field to be sharded and algorithm the sharding function. The configuration is clear: the node is computed as the id modulo count in decimal. Compared with the fixed-shard hash, batch inserts here may spread a single transaction's rows across multiple data shards, increasing the difficulty of guaranteeing transaction consistency.

2.4.5. Sharding by date (day)

This rule is sharding by day.

<tableRule name="sharding-by-date"> 
<rule> 
<columns>create_time</columns> 
<algorithm>sharding-by-date</algorithm> 
</rule> 
</tableRule> 

<function name="sharding-by-date" class="io.mycat.route.function.PartitionByDate"> 
<property name="dateFormat">yyyy-MM-dd</property> 
<property name="sBeginDate">2014-01-01</property> 
<property name="sEndDate">2014-01-02</property> 
<property name="sPartionDay">10</property> 
</function>  

Configuration instructions

  • Columns: identifies the table field to be sharded
  • Algorithm: the sharding function
  • DateFormat: the date format
  • SBeginDate: the start date
  • SEndDate: the end date
  • SPartionDay: the number of days per partition (by default, a new partition every 10 days from the start date). If sEndDate is configured, insertion cycles back to the first partition once data reaches the end date.
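For example, with sBeginDate = 2014-01-01 and sPartionDay = 10 as configured above, dates map to partition indexes as in these assertions: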
Assert.assertEquals(true, 0 == partition.calculate("2014-01-01")); 
Assert.assertEquals(true, 0 == partition.calculate("2014-01-10")); 
Assert.assertEquals(true, 1 == partition.calculate("2014-01-11")); 
Assert.assertEquals(true, 12 == partition.calculate("2014-05-01"));

2.4.6. Modulo range constraint

This rule combines the modulo operation with a range constraint, mainly to prepare for subsequent data migration: it lets you decide independently which nodes the data lands on after the modulo operation.

<tableRule name="sharding-by-pattern"> 
<rule> 
<columns>user_id</columns> 
<algorithm>sharding-by-pattern</algorithm> 
</rule> 
</tableRule> 

<function name="sharding-by-pattern" class="io.mycat.route.function.PartitionByPattern"
<property name="patternValue">256</property> 
<property name="defaultNode">2</property> 
<property name="mapFile">partition-pattern.txt</property> 
</function> 

partition-pattern.txt configuration:

# id partition range start-end, data node index 
###### first host configuration 
1-32=0 
33-64=1 
65-96=2 
97-128=3 
######## second host configuration 
129-160=4 
161-192=5 
193-224=6 
225-256=7 
0-0=7  

Configuration instructions

The columns above identify the table field to be sharded, algorithm the sharding function, and patternValue the modulo base. defaultNode is the default node: non-numeric ids skip the modulo calculation and are allocated to it. mapFile is the configuration file path. In the configuration file, 1-32 means that if id % 256 falls into the range 1-32, the row is placed in partition 1. In the assertions below, id "0" matches the 0-0=7 line and routes to partition 7, while the non-numeric "45a" goes to the default node 2.

String idVal = "0"; 
Assert.assertEquals(true, 7 == autoPartition.calculate(idVal)); 
idVal = "45a"; 
Assert.assertEquals(true, 2 == autoPartition.calculate(idVal)); 

2.4.7. Intercepting a prefix for modulo range constraint

This rule is similar to the modulo range constraint, except that it also supports values containing symbols and letters: the modulo is computed from the ASCII codes of a prefix of the value.

<tableRule name="sharding-by-prefixpattern"> 
<rule> 
<columns>user_id</columns> 
<algorithm>sharding-by-prefixpattern</algorithm> 
</rule> 
</tableRule> 

<function name="sharding-by-prefixpattern" class="io.mycat.route.function.PartitionByPrefixPattern"> 
<property name="patternValue">256</property> 
<property name="prefixLength">5</property> 
<property name="mapFile">partition-pattern.txt</property> 
</function>  
Copy the code

partition-pattern.txt configuration:

# range start-end, data node index 
# ASCII 
# 48-57 = 0-9 (Arabic numerals) 
# 64, 65-90 = @, A-Z 
# 97-122 = a-z 
###### first host configuration 
1-4=0 
5-8=1 
9-12=2 
13-16=3 
###### second host configuration 
17-20=4 
21-24=5 
25-28=6 
29-32=7 
0-0=7

Configuration instructions

Columns identifies the column to be sharded, algorithm the sharding function, patternValue the modulo base, and prefixLength the number of leading characters whose ASCII codes are used. mapFile is the configuration file path; in it, 1-32 means that a computed value in the range 1-32 is placed in partition 1. The method is the same as rule 2.4.6 except for the key calculation: take the first prefixLength characters of the column value, sum their ASCII codes, and compute sum % patternValue; the resulting value is mapped to a shard via the configured ranges.
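A minimal sketch of that prefix calculation (illustrative only, not the PartitionByPrefixPattern source):

    // Sum the ASCII codes of the first prefixLength characters, then take the
    // modulo; the result is looked up in the mapFile ranges like rule 2.4.6.
    class PrefixPatternSketch {
        static long prefixKey(String value, int prefixLength, int patternValue) {
            int end = Math.min(prefixLength, value.length());
            long sum = 0;
            for (int i = 0; i < end; i++) {
                sum += value.charAt(i); // ASCII code of each character
            }
            return sum % patternValue;
        }
    }

The assertions below show the resulting routing for some sample ids: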

String idVal="gf89f9a"; 
Assert.assertEquals(true, 0==autoPartition.calculate(idVal)); 
idVal="8df99a"; 
Assert.assertEquals(true, 4==autoPartition.calculate(idVal));
idVal="8dhdf99a"; 
Assert.assertEquals(true, 3==autoPartition.calculate(idVal)); 

2.4.8. Application-specified sharding

This rule allows the application to determine which shard to route to at run time.

<tableRule name="sharding-by-substring"> <rule> <columns>user_id</columns> <algorithm>sharding-by-substring</algorithm> </rule> </tableRule> <function name="sharding-by-substring" class="io.mycat.route.function.PartitionDirectBySubString"> <property name="startIndex">0</property><! -- zero-based --> <property name="size">2</property> <property name="partitionCount">8</property> <property name="defaultPartition">0</property> </function>Copy the code

Configuration instructions

The columns identify the column to be sharded; the algorithm derives the partition number directly from a substring of the column value (the substring must be numeric). For example, with this configuration and id=05-100000002, the partition is taken from size=2 characters starting at startIndex=0, i.e. 05, so the row is routed to partition 5. If no partition can be derived, the row goes to defaultPartition.
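A minimal sketch of this direct-substring routing (illustrative only):

    // Reads the partition index directly out of the key, e.g. "05-100000002" -> 5.
    class SubstringPartitionSketch {
        static int calculate(String id, int startIndex, int size, int defaultPartition) {
            try {
                return Integer.parseInt(id.substring(startIndex, startIndex + size));
            } catch (RuntimeException e) { // value too short or not numeric
                return defaultPartition;
            }
        }
    }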

2.4.9. Intercepting numbers for Hash parsing

This rule intercepts the int value of a substring and hashes it to a shard.

<tableRule name="sharding-by-stringhash"> <rule> <columns>user_id</columns> <algorithm>sharding-by-stringhash</algorithm> </rule> </tableRule> <function name="sharding-by-stringhash" class="io.mycat.route.function.PartitionByString"> <property name="partitionLength">512</property><! -- zero-based --> <property name="partitionCount">2</property> <property name="hashSlice">0:2</property> </function>Copy the code

Configuration instructions

The columns indicate the column to be sharded and algorithm the sharding function. partitionLength is the modulo base of the string hash, partitionCount the number of partitions, and hashSlice the slice of the string over which the hash is computed; that is, the hash is taken over the int value of the substring.

In a hashSlice expression, 0 stands for str.length() and -1 for str.length() - 1:

"2" - > "" - >" (0, 2) (1, 2) "1:" - > "1:" (1, 0) - > (1, 0) ": 1" - > (0, 1) ", "- > (0, 0)Copy the code

2.4.10. Consistent hash

Consistent hashing effectively solves the expansion problem of distributed data.
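To illustrate why the virtual nodes in the configuration below matter, here is a generic consistent-hash ring sketch; it is not Mycat's PartitionByMurmurHash implementation, and the hash function here is a stand-in for murmur hash:

    import java.util.SortedMap;
    import java.util.TreeMap;

    class ConsistentHashSketch {
        private final TreeMap<Integer, Integer> ring = new TreeMap<>(); // point -> physical node

        ConsistentHashSketch(int physicalNodes, int virtualBucketTimes) {
            for (int node = 0; node < physicalNodes; node++) {
                for (int v = 0; v < virtualBucketTimes; v++) {
                    ring.put(hash(node + "#" + v), node); // many virtual points per node
                }
            }
        }

        int route(String key) { // first ring point at or after the key's hash
            SortedMap<Integer, Integer> tail = ring.tailMap(hash(key));
            return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
        }

        private static int hash(String s) {
            return s.hashCode() & 0x7fffffff; // stand-in; Mycat uses murmur hash
        }
    }

With enough virtual points per physical node, adding a node moves only roughly 1/n of the keys, which is what makes expansion cheap. The rule is configured as follows: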

<tableRule name="sharding-by-murmur"> <rule> <columns>user_id</columns> <algorithm>murmur</algorithm> </rule> </tableRule> <function name="murmur" class="io.mycat.route.function.PartitionByMurmurHash"> <property name="seed">0</property><! <property name="count">2</property><! <property name="virtualBucketTimes">160</property><! An actual database node is mapped to this number of virtual nodes, which is 160 times the number of virtual nodes than the number of physical nodes by default. -- <property name="weightMapFile">weightMapFile</property> Specifies the weight of the node. Enter the value in the format of the properties file. The value is an integer from 0 to count-1, i.e. the node index is the key and the node weight is the value. The ownership weight must be a positive integer, otherwise 1 is substituted --> <! -- <property name="bucketMapPath"> /etc/mycat/bucketmapPath </property> used to observe the distribution of physical and virtual nodes during testing. </function> Outputs the hash value of the virtual node. There is no default value.Copy the code

2.4.11. Sharding by hour within a month

This rule splits data by hour within a single month; the minimum granularity is one hour. A day can be split into at most 24 shards and at least 1 shard. After one month, the cycle starts again from the beginning of the next month. Data must be cleared manually at the end of each month.

<tableRule name="sharding-by-hour"> 
<rule> 
<columns>create_time</columns> 
<algorithm>sharding-by-hour</algorithm> 
</rule> 
</tableRule> 

<function name="sharding-by-hour" class="io.mycat.route.function.LatestMonthPartion"> 
<property name="splitOneDay">24</property> 
</function>  
Copy the code

Configuration instructions

  • Columns: the split column, a string of the form yyyymmddHH
  • SplitOneDay: the number of shards a day is split into

2.4.12. Range-modulo sharding

This rule first determines the shard group by range, then applies modulo within the group. It has the advantages that expansion requires no data migration and, to some extent, it avoids the hot-spot problem of range sharding. It combines the strengths of range and modulo sharding: modulo within a group keeps data in the group uniform, while ranges between groups still support range queries.

You are advised to plan the number of shard groups in advance. When capacity is expanded by adding a shard group, the data in the existing groups needs no migration; and because data within a group is uniform, in-group hot spots are avoided. A sketch of the routing follows.
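A minimal sketch of the two-step routing (range picks the shard group, modulo picks the shard within it; illustrative only, not the PartitionByRangeMod source):

    // Groups are read from partition-range-mod.txt; shards are numbered
    // consecutively across groups, starting at 0.
    class RangeModSketch {
        static final long M = 10000; // same K/M convention as the range rule
        // {startInclusive, endInclusive, shardsInGroup}
        static final long[][] GROUPS = {
                { 0, 200 * M, 5 },
                { 200 * M + 1, 400 * M, 1 },
        };

        static int calculate(long id) {
            int base = 0; // index of the first shard of the current group
            for (long[] g : GROUPS) {
                if (id >= g[0] && id <= g[1]) {
                    return base + (int) (id % g[2]); // modulo inside the group
                }
                base += (int) g[2];
            }
            return -1; // outside every range; the real rule falls back to defaultNode
        }
    }

The rule is configured as follows: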

<tableRule name="auto-sharding-rang-mod"> 
<rule> 
<columns>id</columns> 
<algorithm>rang-mod</algorithm> 
</rule> 
</tableRule> 

<function name="rang-mod" class="io.mycat.route.function.PartitionByRangeMod"> 
<property name="mapFile">partition-range-mod.txt</property> 
<property name="defaultNode">21</property> 
</function> 

Configuration instructions

Columns indicates the column to be sharded and algorithm the sharding function (rang-mod). mapFile is the configuration file path, and defaultNode the default node index used when a value exceeds all ranges; node numbering starts at 0. In the configuration below, each range on the left represents a shard group, and the number after the = sign is the number of shards in that group.

partition-range-mod.txt configuration:

0-200M=5    // the first shard group contains 5 shards
200M1-400M=1
400M1-600M=4
600M1-800M=4
800M1-1000M=6

2.4.13. Date range hash sharding

The idea is the same as range-modulo sharding, but when modulo by date would cause data clustering, hashing is used instead: group by date first, then hash by time to spread data evenly within each short period. This avoids data migration during expansion and, to a degree, the hot spots of range sharding. The date format should be as precise as possible, otherwise local uniformity cannot be achieved.

<tableRule name="rangeDateHash"> 
<rule> 
<columns>col_date</columns> 
<algorithm>range-date-hash</algorithm> 
</rule> 
</tableRule> 

<function name="range-date-hash" class="io.mycat.route.function.PartitionByRangeDateHash"> 
<property name="sBeginDate">2014-01-01 00:00:00</property> <property name="sPartionDay">3</property> 
<property name="dateFormat">yyyy-MM-dd HH:mm:ss</property> 
<property name="groupPartionSize">6</property> 
</function> 

GroupPartionSize specifies the size of the shard group.

2.4.14. Hot and cold data sharding

Query log data is distributed by date into hot and cold data: transactions from the most recent n months are queried from the real-time transaction database, while transactions older than n months are sharded by m-day intervals.

<tableRule name="sharding-by-date"> 
<rule> 
<columns>create_time</columns> 
<algorithm>sharding-by-hotdate</algorithm> 
</rule> 
</tableRule> 

<function name="sharding-by-hotdate" class="io.mycat.route.function.PartitionByHotDate"> 
<property name="dateFormat">yyyy-MM-dd</property> 
<property name="sLastDay">10</property> 
<property name="sPartionDay">30</property> 
</function> 

2.4.15. Natural monthly sharding

Partition by a month column: one shard per natural month. The example below shows how the format and the between operation are parsed.

<tableRule name="sharding-by-month"> 
<rule> 
<columns>create_time</columns> 
<algorithm>sharding-by-month</algorithm> 
</rule> 
</tableRule> 

<function name="sharding-by-month" class="io.mycat.route.function.PartitionByMonth"> 
<property name="dateFormat">yyyy-MM-dd</property> 
<property name="sBeginDate">2014-01-01</property> 
</function>  

Configuration instructions

  • Columns: indicates a fragment field. It is a string
  • DateFormat: Indicates the date string format
  • SBeginDate: indicates the start date
PartitionByMonth partition = new PartitionByMonth(); 
partition.setDateFormat("yyyy-MM-dd"); 
partition.setsBeginDate("2014-01-01"); 
partition.init(); 
Assert.assertEquals(true, 0 == partition.calculate("2014-01-01")); 
Assert.assertEquals(true, 0 == partition.calculate("2014-01-10")); 
Assert.assertEquals(true, 0 == partition.calculate("2014-01-31")); 
Assert.assertEquals(true, 1 == partition.calculate("2014-02-01")); 
Assert.assertEquals(true, 1 == partition.calculate("2014-02-28")); 
Assert.assertEquals(true, 2 == partition.calculate("2014-03-1")); 
Assert.assertEquals(true, 11 == partition.calculate("2014-12-31")); 
Assert.assertEquals(true, 12 == partition.calculate("2015-01-31")); 
Assert.assertEquals(true, 23 == partition.calculate("2015-12-31")); 

2.4.16. Crc32slot Fragmentation algorithm

Unlike the previous sharding algorithms, this is a stateful sharding algorithm, designed for automatic data migration.

Crc32slot is one implementation of the stateful sharding algorithm; its automatic data migration scheme is designed around crc32(key) % 102400 = slot.

Slots are evenly distributed across the dataNodes by range, instantiated per table. A file records the mapping between slots and nodes; ZooKeeper coordinates the migration, and only the data of the affected slots needs to be moved. The maximum number of shards is 102,400, which should be sufficient for the foreseeable future; at 10 million rows per shard, a total of one trillion rows can be supported.
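A minimal sketch of the slot calculation (the CRC32 step uses the standard library; the even slot-range split across two dataNodes is an illustrative assumption):

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    class Crc32SlotSketch {
        static final int SLOT_COUNT = 102400;

        static int slotOf(String key) { // crc32(key) % 102400 = slot
            CRC32 crc = new CRC32();
            crc.update(key.getBytes(StandardCharsets.UTF_8));
            return (int) (crc.getValue() % SLOT_COUNT);
        }

        // e.g. with 2 dataNodes, dn1 owns slots 0..51199 and dn2 owns 51200..102399
        static int nodeOf(int slot, int nodeCount) {
            return slot / (SLOT_COUNT / nodeCount);
        }
    }

The table is configured as follows: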

<table name="travelrecord" dataNode="dn1,dn2" rule="crc32slot" /> 

2.5. Mycat multi-tenant support

With single tenancy, each tenant traditionally gets its own web + DB deployment. As the number of tenants grows, the machine and operations cost of the web tier becomes very high, so the model is changed to one where all tenants share one web tier (the DB tier is left unchanged for now).

Based on this demand, we made a simple transformation of the single-tenant program so that all tenants share the web tier. The specific changes are as follows:

  1. Web part modification:

    • When the user logs in, record the tenant's id in a thread variable (ThreadLocal).
    • Modify the JDBC implementation: when submitting SQL, fetch the tenant id from the ThreadLocal and add an SQL annotation that puts the tenant's schema into the comment, for example: /*!mycat : schema = test_01 */ sql; (a sketch follows the configuration below).
  2. Build a proxy layer in front of the DB to proxy all database requests from the web tier. The proxy layer is implemented with Mycat: when SQL submitted by the web tier arrives, the schema is specified in the annotation, and the proxy layer forwards the SQL request according to that schema.

  3. Mycat configuration:

<user name="mycat"> 
<property name="password">mycat</property> 
<property name="schemas">order</property> 
<property name="readOnly">true</property> 
</user> 
 
<user name="mycat2"> 
<property name="password">mycat</property> 
<property name="schemas">order</property> 
</user> 
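For the JDBC change in step 1, here is a minimal sketch; the TenantAwareSql class and its method names are hypothetical, not Mycat API, and only the /*!mycat:schema=...*/ annotation format comes from the text above:

    // Binds the tenant's schema at login time and prepends the Mycat schema
    // annotation to every SQL statement before it is submitted.
    public class TenantAwareSql {
        private static final ThreadLocal<String> TENANT_SCHEMA = new ThreadLocal<>();

        // called once the user logs in and the tenant is known
        public static void bindTenant(String schema) {
            TENANT_SCHEMA.set(schema);
        }

        // called in the modified JDBC layer just before the SQL is sent
        public static String annotate(String sql) {
            return "/*!mycat:schema=" + TENANT_SCHEMA.get() + "*/ " + sql;
        }
    }

For a tenant bound to test_01, annotate("select * from t_order") yields /*!mycat:schema=test_01*/ select * from t_order, which the Mycat proxy routes to that tenant's schema.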
