Abstract:

As MySQL becomes more and more popular, more and more data is stored in MySQL. In our daily work, we often come across a table that contains hundreds of millions or even billions of records. These tables hold a lot of historical records. Clearing the historical data is a big headache, because all the data is in a common table.

MySQL partition table overview

As MySQL becomes more and more popular, the data stored in MySQL becomes larger and larger. In our daily work, we often come across a table that contains hundreds of millions or even billions of records. These tables hold a lot of historical records.

Clearing the historical data is a big headache, because all the data is in a common table. Therefore, only one or more DELETE statements with a WHERE condition can be used to delete.

This puts a lot of strain on the database. Even if we remove these, the underlying data files don’t get smaller. The most effective way to deal with this problem is to use partitioned tables. The most common way to partition is by time.

One of the biggest advantages of partitioning is that it is very efficient to clean up historical data.

Partition type

MySQL currently supports RANGE, LIST, HASH, and KEY partitions. Let’s look at each partition one by one:

RANGE partitioning

Allocates multiple rows to a partition based on column values belonging to a given contiguous interval. The most common is based on the time field. Partition-based columns should preferably be of integer type, if date type can be converted to integer using a function. In this example, the to_days function is used

CREATE TABLE my_range_datetime(
    id INT,
    hiredate DATETIME
) 
PARTITION BY RANGE (TO_DAYS(hiredate) ) (
    PARTITION p1 VALUES LESS THAN ( TO_DAYS('20171202') ),
    PARTITION p2 VALUES LESS THAN ( TO_DAYS('20171203') ),
    PARTITION p3 VALUES LESS THAN ( TO_DAYS('20171204') ),
    PARTITION p4 VALUES LESS THAN ( TO_DAYS('20171205') ),
    PARTITION p5 VALUES LESS THAN ( TO_DAYS('20171206') ),
    PARTITION p6 VALUES LESS THAN ( TO_DAYS('20171207') ),
    PARTITION p7 VALUES LESS THAN ( TO_DAYS('20171208') ),
    PARTITION p8 VALUES LESS THAN ( TO_DAYS('20171209') ),
    PARTITION p9 VALUES LESS THAN ( TO_DAYS('20171210') ),
    PARTITION p10 VALUES LESS THAN ( TO_DAYS('20171211'), PARTITION p11 VALUES LESS THAN (MAXVALUE));Copy the code

P11 is the default partition where all records older than 20171211 will be stored. MAXVALUE is an infinite value. P11 is an optional partition. If we do not specify this partition in the definition table, we will receive an error when we insert data greater than 20171211.

We must carry partition fields when executing queries. This allows you to use the partition clipping feature

mysql> insert into my_range_datetime select * from test;                                                                    
Query OK, 1000000 rows affected (8.15 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> explain partitions select * from my_range_datetime where hiredate >= '20171207124503' and hiredate<='20171210111230'; +----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+--------- ----+ | id | select_type | table | partitions |type| possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+--------- ----+ | 1 | SIMPLE | my_range_datetime | p7,p8,p9,p10 | ALL | NULL | NULL | NULL | NULL | 400061 | Usingwhere| +----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+--------- ----+ 1 rowin set (0.03 sec)
Copy the code

Note that only p7, P8, p9, and P10 partitions are queried in the partitions execution plan, so using the to_days function can achieve partition trimming.

Datetime-based timestamp (timestamp);

In fact, MySQL provides a RANGE partition scheme based on the UNIX_TIMESTAMP function, and only the UNIX_TIMESTAMP function can be used. If other functions such as to_days are used, the following ERROR will be reported: “ERROR 1486 (HY000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed “.

“Any other expressions involving TIMESTAMP values are not permitted. (See Bug #42849.)”

Let’s test the RANGE partition scheme based on UNIX_TIMESTAMP function to see whether it can achieve partition clipping.

Partition scheme for TIMESTAMP

The original sentence is as follows:

CREATE TABLE my_range_timestamp (
    id INT,
    hiredate TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(hiredate) ) (
    PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-02 00:00:00') ),
    PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-03 00:00:00') ),
    PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-04 00:00:00') ),
    PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-05 00:00:00') ),
    PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-06 00:00:00') ),
    PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-07 00:00:00') ),
    PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-08 00:00:00') ),
    PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-09 00:00:00') ),
    PARTITION p9 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-10 00:00:00') ),
    PARTITION p10 VALUES LESS THAN (UNIX_TIMESTAMP('2017-12-11 00:00:00')));Copy the code

Insert the data and view the execution plan for the above query

mysql> insert into my_range_timestamp select * from test;
Query OK, 1000000 rows affected (13.25 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> explain partitions select * from my_range_timestamp where hiredate >= '20171207124503' and hiredate<='20171210111230'; +----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+--------- ----+ | id | select_type | table | partitions |type| possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+--------- ----+ | 1 | SIMPLE | my_range_timestamp | p7,p8,p9,p10 | ALL | NULL | NULL | NULL | NULL | 400448 | Usingwhere| +----+-------------+-------------------+--------------+------+---------------+------+---------+------+--------+--------- ----+ 1 rowin set (0.00 sec)
Copy the code

Partition clipping can also be achieved.

Prior to version 5.7, the YEAR() and TO_DAYS() functions could only be used for partition clipping for columns of type DATA and DATETIME. In version 5.7, the TO_SECONDS() function was added.

The LIST partition

The LIST partition

A LIST partition is similar to a RANGE partition, except that LIST is a collection of enumerated values and RANGE is a collection of contiguous interval values. They are very similar in grammar. It is also recommended that the LIST partition column be non-NULL, otherwise the insertion of null values will fail if there are no null values in the enumerated LIST. This is different from other partitions. The RANGE partition will store them as the minimum partition value, and the HASH\KEY partition will convert them to 0. A non-integer field needs to be converted to an integer by a function.

Create table t_list(a int(11), b int(11))(partition by list(b) partition P0 valuesin,3,5,7,9 (1), the partition p1 valuesin(2, 0));Copy the code

Hash partitioning

We often come across such tables as membership tables in our practical work. There are no characteristic fields that can be obviously partitioned. But the table data is very large. To partition this data, mysql provides hash partitions. Based on the given number of partitions, data is allocated to different partitions. The HASH partition can HASH only integers. Non-integer fields can be converted to integers only through expressions. An expression can be any valid function or expression in mysql. If data is inserted into a table using a non-integer HASH, one more expression operation is required. Therefore, you are not advised to use complex expressions because this may affect performance.

The basic statements for a Hash partition table are as follows:

CREATE TABLE my_member (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    created DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY HASH(id)
PARTITIONS 4;
Copy the code

Note:

  1. HASH PARTITIONS can be partialized without specifying PARTITIONS clause. For example, PARTITIONS 4 in the preceding paragraph, the default partition number is 1.
  2. It is not allowed to write PARTITIONS without specifying the number of PARTITIONS.
  3. As with RANGE and LIST partitions, expr in the PARTITION BY HASH (expr) clause must return an integer value.
  4. The underlying implementation of HASH partitioning is based on MOD functions. For example, for the following table

CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)

PARTITION BY HASH( YEAR(col3) )

PARTITIONS 4;

If you want to insert a record with col3 as “2017-09-15”, the partition selection is based on the following values:

The MOD (YEAR (‘ 2017-09-01 ‘), 4)

= the MOD (2017, 4)

= 1

LINEAR HASH partitioning

LINEAR HASH partitioning is a special type of HASH partitioning that is based on a different algorithm than the LINEAR HASH partition, which is based on MOD functions.

The format is as follows:

CREATE TABLE my_members (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY LINEAR HASH( id )
PARTITIONS 4;
Copy the code

Description:

Its advantage is that it is faster to add, delete, merge, and split partitions in large-volume scenarios, such as terabytes. Its disadvantage is that it is more likely to have uneven data distribution than HASH partitions.

The KEY partition

A KEY partition is similar to a HASH partition, with the following differences:

  1. The KEY partition allows multiple columns, while the HASH partition allows only one column.
  2. If there is a primary key or unique key, the partition column in the key can not be specified. By default, the primary key or unique key must be specified explicitly.
  3. The KEY partition object must be a column, not a column-based expression.
  4. The algorithm for KEY PARTITION and HASH PARTITION is different. For PARTITION BY HASH (expr), the object for MOD values is the value returned BY EXPr, whereas for PARTITION BY KEY (column_list), the object is based on the MD5 value of the column.

The format is as follows:

CREATE TABLE k1 (
    id INT NOT NULL PRIMARY KEY,    
    name VARCHAR(20)
)
PARTITION BY KEY()
PARTITIONS 2;
Copy the code

In the case of no primary key or unique key, the format is as follows:

CREATE TABLE tm1 ( mmysql
    s1 CHAR(32)
)
PARTITION BY KEY(s1)
PARTITIONS 10;
Copy the code

Conclusion:

  1. If a primary key or unique key exists in a MySQL partition, the partition column must be included in it.
  2. For native RANGE, LIST, and HASH partitions, partition objects can only return integer values.
  3. Partition fields cannot be NULL, otherwise how to determine the partition scope, so try NOT NULL
This article is the original content of the cloud habitat community, shall not be reproduced without permission, if need to be reproduced, please send email to [email protected]; If you find any content suspected of plagiarism in our community, you are welcome to send an email to [email protected] to report and provide relevant evidence. Once verified, our community will immediately delete the content suspected of infringement.

The original link