directory

1. Specification background and purpose

2. Design specifications

2.1 Database design 2.1.1 Database name 2.1.2 Table structure 2.1.3 Column data type optimization 2.1.4 Index design 2.1.5 Sub-database sub-table and sub-table 2.1.6 Character set 2.1.7 Program DAO Layer design Suggestions 2.1.8 A standard example of table construction 2.2 SQL 2.2.1 DML Statements 2.2.2 Join multiple tables 2.2.3 Transactions 2.2.4 Sorting and Grouping 2.2.5 Prohibited SQL statements online

1. Specification background and purpose

Compared with Oracle, SQL Server and other databases, MySQL database has its advantages and disadvantages in kernel. When we use MySQL database, we need to follow certain specifications, and make full use of the strengths and avoid the weaknesses. The purpose of this specification is to help or guide RD, QA, OP and other technical personnel to make database design suitable for online business. In the database change and processing process, database table design, SQL preparation and other aspects to standardize, so as to provide guarantee for the stable and healthy operation of the company’s business system.

2. Design specifications

2.1 Database Design

All the following specifications will be marked in accordance with the three levels of [high risk], [Mandatory], and [Recommended], and the compliance priority is from high to low.

For designs that do not meet the “high risk” and “mandatory” levels, the DBA will call back for changes.

2.1.1 the library

  1. [Mandatory] The library name must contain no more than 32 characters. Table names of related modules, such as user table and user_login table, must be in the same join relationship as possible.
  2. [Mandatory] Name format of the library: Business system name_subsystem name. Try to use the same prefix for table names used by the same module.
  3. For example, wendA_001 is named by time in the format of “library wildcard name_time”.
  4. [Mandatory] The character set must be utF8 or UTF8MB4 when creating a database. SQL example for creating a database: create database db1 default character set UTF8; .

2.1.2 table structure

  1. [Mandatory] The name of a table and column must contain no more than 32 characters. The table name can contain only lowercase letters, digits, and underscores (_).
  2. [Mandatory] The table name must be strongly related to the module name, for example, the teacher system uses “SZ” as the prefix, the channel system uses “QD” as the prefix, etc.
  3. [Mandatory] The character set must be utF8 or UTF8MB4 when creating a table.
  4. [Mandatory] The table storage engine type must be explicitly specified when creating a table. InnoDB is used if there is no special requirement. When storage engines other than InnoDB/MyISAM/Memory are required, they must be approved by the DBA before they can be used in a production environment. Because Innodb tables support transactions, row locks, downtime recovery, MVCC and other important features of relational databases, Innodb is the most used MySQL storage engine in the industry. This is something most other storage engines don’t have, so InnoDB is the first choice.
  5. [Mandatory] Create a table with a comment
  6. [Suggestion] Create table about primary key: Mysql > alter table auto_increment increment(int, bigINT, user_id, order_ID); mysql > alter table auto_increment(INT, bigINT); mysql > alter table auto_increment(INT, bigINT); And set up unique key index (refer to CDB. Teacher table design). If the primary key is set to random insert, innoDB’s internal page splitting and a large number of random I/ OS will result in performance degradation.
  7. [Suggestion] Core tables (such as user tables and money-related tables) must have create_time and update_time fields of row data to facilitate troubleshooting.
  8. [Suggestion] All fields in the table must be NOT NULL. Services can define DEFAULT values as required. Using NULL values can take up extra storage space for each row, data migration is error-prone, and aggregation function calculation results are biased.
  9. [Suggestion] It is recommended that large columns in the table, such as BLOb and text, be vertically split into other tables and select only when these objects need to be read.
  10. [Suggestion] Anti-paradigm design: Make a redundant copy of the fields that often need to be queried by JOIN in other tables. For example, the user_name attribute is redundant in user_Account and user_login_log tables, reducing join query.
  11. [Mandatory] The intermediate table is used to hold the intermediate result set. The name must start with tmp_. The backup table is used to back up or capture source table snapshots. The name of the backup table must start with bak_. The intermediate and backup tables are cleaned periodically.
  12. [Mandatory] Alter table for large tables with more than 100W rows must be reviewed by the DBA and executed during off-peak service periods. Because ALTER TABLE creates a table lock that blocks all writes to the table, this can have a significant impact on services.

2.1.3 Column data type optimization

  1. [Suggestion] BigINT is recommended for auto_increment columns in the table. If an unsigned int is stored between -2147483648 and 2147483647, an overflow will cause an error.
  2. [Suggestion] You are advised to use tinytint or Smallint for fields such as status and type to save storage space.
  3. [Suggestion] You are advised to use int instead of char(15) for the IP address field. Since ints take only 4 bytes, you can convert them to each other using the following function, while char(15) takes at least 15 bytes. Once the table has 100 million rows, add 1.1 GIGABytes of storage. SQL: select inet_aton (‘ 192.168.2.12 ‘); select inet_ntoa(3232236044); PHP: ip2long (‘ 192.168.2.12); long2ip(3530427185);
  4. Suggestion: Enum, set is not recommended. Because they waste space, and enumeration values are dead, changes are inconvenient. Tinyint or Smallint are recommended.
  5. [Suggestion] The blob and text types are not recommended. They’re a waste of hard disk and memory space. When loading table data, large fields will be read into memory, wasting memory space and affecting system performance. It is suggested to communicate with PM and RD whether such a large field is really needed. When a row in Innodb exceeds 8098 bytes, it takes the longest field in the record and puts 768 bytes in the original page. The rest of the field is stored in overflow-Page. Unfortunately, in compact row format, both the original page and overflow-Page load.
  6. [Suggestion] Use int, multiply by 100 and divide by 100. Since int takes up 4 bytes and double takes up 8 bytes, space is wasted.
  7. [Suggestion] Use vARCHAR to store text data. Since vARCHar is variable length storage, it saves more space than char. The MySQL Server layer provides a maximum of 65535 bytes for all text in a line, so a maximum of 21844 characters in the UTF8 character set are automatically converted to mediumText fields. Text can hold up to 21844 characters in the UTF8 character set, mediumtext can hold up to 2^24/3 characters, and longtext can hold up to 2^32 characters. It is recommended that the vARCHAR type be used and the number of characters should not exceed 2700.
  8. [Suggestion] Select TIMESTAMP as the time type as possible. Because datetime occupies 8 bytes, timestamp occupies only 4 bytes, but the range is 1970-01-01 00:00:01 to 2001-01 00:00:00. The higher-order method uses int to store the time and uses SQL functions unix_TIMESTAMP () and from_unixtime() to convert the time.

2.1.4 Index design

  1. [Mandatory] InnoDB tables must have id int/ BIGINT AUTO_INCREMENT as the primary key, and the primary key cannot be updated.
  2. [Suggestion] The name of the primary key starts with pk_, the unique key starts with UK_ or uq_, and the common index starts with IDx_. All indexes must be in lowercase format and suffixes with the table name or field name or abbreviation.
  3. InnoDB and MyISAM store engine table, index type must be BTREE; MEMORY tables can select HASH or BTREE indexes as needed.
  4. [mandatory] the length of each index record in a single index cannot exceed 64KB.
  5. [Suggestion] The number of indexes in a single table cannot exceed seven.
  6. [Suggestion] When creating an index, consider setting up a joint index and put the most distinguished field in the first place. For example, the distinctness of a column userID can be calculated by select count(distinct UserID).
  7. [Suggestion] In SQL with multi-table JOIN, ensure that the join columns of the driven table have indexes, so that the join execution efficiency is the highest.
  8. [Suggestion] When creating a table or adding indexes, ensure that there are no redundant indexes in the table. For MySQL, if key(a,b) already exists in the table, key(a) is a redundant index and needs to be deleted.

2.1.5 Sub-database sub-table and partition table

  1. [Mandatory] The partition field (partition key) of a partitioned table must have an index, or the first column of a combined index.
  2. [Mandatory] The number of partitions (including sub-partitions) in a partition table cannot exceed 1024.
  3. [Mandatory] The RD or DBA must specify the creation and clearing policies for partition tables.
  4. [Mandatory] SQL accessing a partitioned table must contain a partitioning key.
  5. [Suggestion] The size of a partition file does not exceed 2 GB and the total size does not exceed 50 GB. You are advised to create a maximum of 20 partitions.
  6. [Mandatory] The ALTER TABLE operation on a partitioned table must be performed during off-peak hours.
  7. [Mandatory] If the separate library policy is adopted, the number of libraries cannot exceed 1024
  8. [Mandatory] If the split table policy is adopted, the number of tables cannot exceed 4096
  9. [Suggestion] A single sub-table should not exceed 500W rows, and the IBD file size should not exceed 2G, so as to improve the performance of data distribution.
  10. [Suggestion] The level table should be modeled as far as possible. It is recommended to use date for log and report data.

2.1.6 character set

  1. [Mandatory] The character set of the database must be utF8 or UTF8MB4.
  2. [Mandatory] The character set of the front-end program or environment variable must be the same as the character set of the database or table. The character set must be UTF8.

2.1.7 Suggestions for program Layer DAO design

  1. [Suggestion] Do not use model for new code, recommend using manual spelling SQL+ binding variables to pass in parameters. Because although the Model can use the object-oriented way to operate DB, but its improper use is easy to cause the generated SQL is very complex, and the model layer itself to do the cast performance is poor, and ultimately lead to the database performance degradation.
  2. [Suggestion] If the front-end program connects to MySQL or Redis, it must have connection timeout and retry mechanism, and retry interval.
  3. [Suggestion] Try to prompt the original error message of MySQL or Redis in the error report of the front-end program, so as to facilitate error detection.
  4. [Suggestion] For front-end programs with connection pools, configure the initial, minimum, and maximum number of connections, timeout duration, and connection reclamation mechanism based on service requirements. Otherwise, database connection resources will be exhausted, resulting in online accidents.
  5. [Suggestion] Tables of the log or history type tend to grow larger over time. Therefore, the RD or DBA must establish a plan for clearing or archiving table data before logging in.
  6. [Suggestion] During application design, RD must consider and avoid the impact of master/slave delay in database on services. Minimize the impact on services caused by the short delay (less than 20 seconds) of the secondary database. You are advised to force consistent read start transactions to go through the primary database or read the secondary database after the update.
  7. [Suggestion] When multiple concurrent business logics access the same data block (InnoDB table), row locks or even table locks will occur on the database side, resulting in reduced concurrency. Therefore, it is recommended that update SQL classes should be updated based on primary keys as far as possible.
  8. [Suggestion] Keep the lock sequence of service logic the same. Otherwise, a deadlock may occur.
  9. [Suggestion] If the read/write ratio of a single table is greater than 10:1, you can store hotspot data (such as Mecache or Redis) in the cache to speed up access and reduce MySQL pressure.

2.1.8 A standard example of a construction sentence

A relatively standard construction sentence is:

CREATE TABLE user (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `user_id` bigint(11) NOT NULL COMMENTUsers'id'`username` varchar(45) NOT NULL COMMENT 'Real Name'.`email` varchar(30) NOT NULL COMMENT'User email',`nickname` varchar(45) NOT NULL COMMENT 'nickname'.`avatar` int(11) NOT NULL COMMENT 'avatar'.`birthday` date NOT NULL COMMENT 'birthday'.`sex` tinyint(4) DEFAULT '0' COMMENT 'gender'.`short_introduce` varchar(150) DEFAULT NULL COMMENT 'Introduce yourself in one sentence, up to 50 Chinese characters'.`user_resume` varchar(300) NOT NULL COMMENT 'Address of resume submitted by user'.`user_register_ip` int NOT NULL COMMENT'User's source IP at registration time',`create_time` timestamp NOT NULL COMMENT'User record creation time',`update_time` timestamp NOT NULL COMMENT'User information modification time',`user_review_status` tinyint NOT NULL COMMENT'User information review status,1To pass,2Is under review,3Is not passed,4Is not yet submitted for audit ', PRIMARYKEY (`id`),
  UNIQUE KEY `idx_user_id` (`user_id`),
  KEY `idx_username`(`username`),
  KEY `idx_create_time`(`create_time`.`user_review_status`))ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Basic Site User Information';
Copy the code

2.2 write SQL

2.2.1 DML statements

  1. [Mandatory] SELECT statement must specify the name of a field. Asterisks are not allowed. Because select * will read data from MySQL that should not be read, causing network adapter stress. The system will report an error if the table field is updated but the Model layer has not been updated.
  2. Insert into T1 values(…) For the same reason.
  3. 【 suggestion 】 Insert into… Values (XX), (XX), (XX)… . The value of XX should not exceed 5000. Too many values go online very quickly, but cause a master/slave synchronization delay.
  4. [Suggestion] Do not use UNION in SELECT statements. Instead, use UNION ALL and limit the number of UNION clauses to 5. Because union All does not require deduplication, it saves database resources and improves performance.
  5. [Suggestion] Limit the IN value list to 500. For example, the select… Where userid (in… 500 or less…) This is done to reduce low-level scans, reduce database stress and speed up queries.
  6. [Suggestion] It is necessary to control the amount of batch data update in the transaction, and perform necessary sleep to achieve a small number of times.
  7. [Mandatory] All tables involved in the transaction must be InnoDB tables. Otherwise, the rollback will not be complete, and the master and slave libraries may synchronize terminals.
  8. Writes and transactions are sent to the master, and read-only SQL is sent to the slave.
  9. [Mandatory] Except for static tables or small tables (less than 100 rows), DML statements must have where conditions and use index look-ups.
  10. [Mandatory] Hint, such as SQL_NO_cache, force index, ignore key, and straight Join, is prohibited in the production environment. Hint is used to force SQL to execute according to a certain execution plan, but with the amount of data changing, we can’t guarantee that our initial prediction is correct, so we have to trust the MySQL optimizer!
  11. [Mandatory] The type of the left and right columns in the where condition must be the same; otherwise, the index cannot be used.
  12. SELECT advice 】 【 | UPDATE | DELETE | REPLACE should have a WHERE clause, and the conditions of the WHERE clause must use an index lookup.
  13. [Mandatory] Full table scan is not recommended for large tables in production databases, but can be performed for static tables with less than 100 rows. The amount of query data should not exceed 25% of the table rows or the index will not be utilized.
  14. [mandatory] In the WHERE clause, it is forbidden to use only the fully fuzzy LIKE condition for the search.
  15. [Suggestion] Do not use functions or expressions in index columns. Otherwise, indexes cannot be used. For example where length(name)=’Admin’ or where user_id+2=10023.
  16. [Suggestion] Reduce the use of OR statements. You can optimize the OR statements to union, and then build indexes on each WHERE condition. Where a=1 or b=2 The union… Where b = 2, the key (a), the key (b).
  17. [Suggestion] Query on multiple pages. When the limit start point is high, you can filter by filter criteria first. Select a,b,c from T1 limit 1000020; Select a,b,c from T1 where id>10000 LIMIT 20; .

2.2.2 Join multiple tables

  1. [Mandatory] Disable cross-DB join statements. This reduces module coupling and lays a solid foundation for database splitting.
  2. [Mandatory] Forbid the use of JOIN in service update SQL statements, for example, update T1 join T2… .
  3. [Suggestion] It is not recommended to use subquery. It is recommended to separate the subquery SQL into multiple queries by combining programs, or use Join instead of subquery.
  4. [Suggestion] In an online environment, do not join more than three tables.
  5. [Suggestion] Alias is recommended for multiple join queries, and alias should be used to reference fields and databases in the SELECT list. Select a from db1.table1 where alias1… .
  6. [Suggestion] In multi-table join, try to select the table with smaller result set as the driver table to join other tables.

2.2.3 transaction

  1. Transaction advice 】 【 INSERT | UPDATE | DELETE | REPLACE operations within the lines of control IN 2000, and IN the WHERE clause IN the list of the number control within 500 refs.
  2. [Suggestion] When batch data is operated, it is necessary to control the transaction processing interval and perform necessary sleep. Generally, 5-10 seconds is recommended.
  3. [Suggestion] For insert operations in a table with an AUTO_INCREMENT field, ensure that the number of concurrent operations is less than 200.
  4. [Mandatory] The program design must consider the impact of the database transaction isolation level, including dirty reads, unrepeatable reads, and phantom reads. The recommended transaction isolation level is REPEATable -read.
  5. [Suggestion] The transaction contains no more than five SQL entries (except payment services). Because long transactions can cause data to be locked for long periods of time, MySQL internal caching, excessive connection consumption, etc.
  6. [Suggestion] Update statements in transactions based on primary keys or unique keys, such as update… where id=XX; Otherwise, a gap lock will occur, and the internal lock range will be expanded, resulting in system performance deterioration and deadlock.
  7. [Suggestion] Try to move some typical external calls out of the transaction, such as calling webService and accessing file storage, so as to avoid the transaction being too long.
  8. [Suggestion] For select statements that are strictly sensitive to the primary/secondary delay of MySQL, enable transaction force access to the primary library.

2.2.4 Sorting and Grouping

  1. [Suggestion] Reduce the use of order by, and communicate with business without sorting, or put sorting on the program side. The ORDER BY, Group BY, and DISTINCT statements consume CPU, and the CPU resources of the database are extremely valuable.
  2. [Suggestion] Try to use indexes to directly retrieve sorted data in ORDER BY, group BY, and DISTINCT SQL. Where a=1 order by key(a,b)
  3. [Suggestion] For statements that contain the order BY, Group BY, and DISTINCT queries, keep the result set filtered by the WHERE condition within 1000 lines; otherwise, the SQL will be slow.
  4. 2.2.5 online to ban the use of SQL statements “high-risk” disable the update | delete t1… where a=XX limit XX; The update statement with the limit. Because it will cause inconsistency between master and slave, resulting in data confusion. It is suggested to add order by PK.
  5. [High risk] Disable associated subqueries, such as update T1 set… Where name in(select name from user where…) ; It’s extremely inefficient.
  6. [Mandatory] Disable procedure, function, trigger, views, event, and foreign key constraints. Because they consume database resources and reduce database instance scalability. Recommendations are implemented on the application side.
  7. [Mandatory] Disable insert into… On the duplicate key update… In a high concurrency environment, primary/secondary inconsistency may occur.
  8. Update T1, T2 WHERE T1.id =t2.id… update T1, T2 where t1.id=t2.id… .