Code farmers youdao

Author: Listen to the wind


Database naming conventions

All database object names must be lowercase, with words separated by underscores.

Do not use MySQL reserved keywords for any database object name (if a name does contain a keyword, it must be enclosed in backquotes when referenced in queries).

Database object names should be self-explanatory and should not exceed 32 characters.

Temporary tables must be prefixed with tmp_ and suffixed with a date, and backup tables must be prefixed with bak_ and suffixed with a date (timestamp).
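A minimal sketch of the naming rules above (the table names and the source table order_info are hypothetical):

```sql
-- Temporary table: tmp_ prefix plus a date suffix
CREATE TABLE tmp_user_export_20240101 (
    id INT UNSIGNED NOT NULL
) ENGINE = InnoDB;

-- Backup table: bak_ prefix plus a date suffix,
-- copied from a hypothetical order_info table
CREATE TABLE bak_order_info_20240101 LIKE order_info;
INSERT INTO bak_order_info_20240101 SELECT * FROM order_info;
```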

Columns that store the same data must have the same name and type. These are usually the columns used in joins; if the types of joined columns differ, MySQL performs implicit type conversion, which invalidates the indexes on those columns and reduces query efficiency.

Basic database design specifications

1. All tables must use the InnoDB storage engine

Unless there is a special requirement (i.e. a feature InnoDB cannot provide, such as column storage), all tables must use InnoDB. Before MySQL 5.5 the default engine was MyISAM; from 5.6 onward it is InnoDB. InnoDB supports transactions and row-level locking, recovers better from crashes, and performs better under high concurrency.

2. Use UTF8 for database and table character sets

A unified character set gives better compatibility and avoids garbled text caused by character-set conversion; comparing columns of different character sets requires conversion first, which invalidates indexes. (Note that MySQL's utf8 is a 3-byte subset; utf8mb4 is needed to store full 4-byte Unicode such as emoji.)

3. All tables and fields must be commented

Add comments to tables and columns using COMMENT clauses, and maintain a data dictionary from the start.
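As a sketch of rules 1–3 above (the table and columns are hypothetical): InnoDB engine, a single character set, and COMMENT clauses on the table and on every column.

```sql
CREATE TABLE user_info (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary key',
    user_name  VARCHAR(32)  NOT NULL DEFAULT ''     COMMENT 'login name',
    created_at DATETIME     NOT NULL                COMMENT 'creation time',
    PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COMMENT = 'basic user information';
```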

4. Try to control the amount of data in a single table; keeping it within 5 million rows is recommended

5 million rows is not a hard limit of MySQL, but beyond that scale, modifying the table structure, backup, and recovery all become serious problems. The data size can be controlled by archiving historical data (for log-type data), sharding into multiple databases and tables (for business data), and similar means.

5. Use MySQL partitioned tables with caution

A partitioned table is a single table logically but consists of multiple files physically. Choose partition keys carefully: queries that cross partitions may be less efficient.

6. Try to separate hot and cold data and reduce table width

MySQL limits each table to at most 4096 columns, and each row to at most 65,535 bytes. Narrower tables reduce disk I/O and improve the buffer-pool hit ratio for hot data (the wider the table, the more memory is needed to load it into the buffer pool, and the more I/O it consumes). Keep columns that are frequently used together in the same table (to avoid extra joins), and avoid reading cold columns that a query does not need.

7. Do not create reserved fields in the table

A reserved field can rarely be given a meaningful name, and its eventual data type cannot be known in advance, so an appropriate type cannot be chosen. Worse, changing a reserved field's type later will lock the table.

8. Do not store large binary data such as pictures and files in the database

Large files make the data volume grow rapidly in a short time, and reading them from the database involves a large number of random I/O operations, which are very time-consuming for big values. Files are usually better kept on a file server, with only the file path stored in the database.

9. Do not do database stress tests online

Do not connect directly from development or test environments to the production database.

Database field design specification

1. Select the smallest data type that meets storage requirements

Why:

The larger a column's field is, the more space its index needs; fewer index nodes then fit on a single page, traversal requires more I/O, and index performance is worse.

Methods:

1. Convert a string to a numeric type for storage, for example, convert an IP address to an integer.

MySQL provides two methods to handle IP addresses

INET_ATON converts an IP address to an unsigned integer.

INET_NTOA converts an integer back to an IP address.

Use INET_ATON to convert the IP address to an integer before inserting, to save space; use INET_NTOA to convert the integer back to an address when displaying data.
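For example, using the two built-in functions (the value shown is the conversion of 192.168.1.1):

```sql
SELECT INET_ATON('192.168.1.1');  -- 3232235777, fits in a 4-byte INT UNSIGNED
SELECT INET_NTOA(3232235777);     -- '192.168.1.1', for display
```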

2. For non-negative data (such as auto-increment IDs and integer IP addresses), prefer unsigned integer types, because an unsigned type offers twice the positive range in the same storage space.

SIGNED INT: -2147483648 to 2147483647

UNSIGNED INT: 0 to 4294967295

The N in VARCHAR(N) represents the number of characters, not bytes. With UTF8, VARCHAR(255) can occupy up to 255 × 3 = 765 bytes. Declaring too large a length consumes more memory.

2. Avoid the TEXT and BLOB data types. The most common TEXT type can already store up to 64 KB of data

It is recommended to separate BLOB or TEXT columns into separate extended tables

MySQL's in-memory temporary tables do not support large data types such as TEXT and BLOB. If a query involves such columns and needs sorting, a disk-based temporary table must be used instead of an in-memory one.

Such data also forces MySQL to perform a second lookup, which degrades SQL performance; this does not mean these types must never be used.

If you must use them, split the BLOB or TEXT column into a separate extension table, never use SELECT * but extract only the necessary columns, and do not read the TEXT column when it is not required.

TEXT and BLOB types can only use prefix indexes

Because MySQL limits the length of index fields, a TEXT column can only use a prefix index, and a TEXT column cannot have a default value.
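A sketch of a prefix index on a TEXT column (the table, column, and prefix length 20 are illustrative choices):

```sql
CREATE TABLE article (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    content TEXT,
    PRIMARY KEY (id),
    -- only the first 20 characters of content are indexed
    KEY idx_content (content(20))
) ENGINE = InnoDB;
```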

3. Avoid the ENUM type

Changing an ENUM's value list requires an ALTER statement.

ORDER BY on an ENUM column is inefficient and requires extra operations.

Do not use numeric values as ENUM enumeration values.

4. Define all columns as NOT NULL if possible

The reason:

Indexing NULL columns requires extra space to store them, so they take up more space.

NULL values require special handling in comparisons and calculations.

5. Use TIMESTAMP (4 bytes) or DATETIME (8 bytes) to store the time

TIMESTAMP storage range: 1970-01-01 00:00:01 to 2038-01-19 03:14:07

TIMESTAMP occupies 4 bytes, the same as INT, but is more readable than an integer timestamp. For values beyond the TIMESTAMP range, use DATETIME.

It is common (and incorrect) to store date data as strings:

Disadvantage 1: You can’t use date functions for calculation and comparison.

Disadvantage 2: Storing dates as strings takes up more space.

6. Finance-related amount class data must be of type DECIMAL

Imprecise floating point: float, double

Precise floating point: decimal

The DECIMAL type is an exact numeric type that loses no precision in calculations. Its storage size is determined by its declared width: every 9 digits take 4 bytes, and the decimal point takes 1 byte. It can also store integers larger than BIGINT allows.
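A sketch of a DECIMAL amount column (the table name and widths are illustrative):

```sql
CREATE TABLE payment (
    id     INT UNSIGNED   NOT NULL AUTO_INCREMENT,
    -- exact storage: up to 16 integer digits and 2 decimal digits
    amount DECIMAL(18, 2) NOT NULL DEFAULT 0.00 COMMENT 'payment amount',
    PRIMARY KEY (id)
) ENGINE = InnoDB;
```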

Index design specification

1. Limit the number of indexes on a single table; no more than five is recommended

More indexes are not always better! Indexes improve query efficiency, but they reduce insert and update efficiency, and in some cases even query efficiency.

When the MySQL optimizer chooses how to execute a query, it evaluates every candidate index based on statistics in order to generate the best execution plan. If many indexes could be used for the same query, this increases the time the optimizer spends generating the plan, which also reduces query performance.

2. Do not create a separate index for each column in the table

Before version 5.6, a query could use only one index per table; since 5.6, even with the index merge optimization, separate single-column indexes are still not as good as one well-chosen composite index.

3. Every InnoDB table must have a primary key

InnoDB is an index-organized table: data is stored physically in index order. Each table can have several indexes, but the rows can be stored in only one order, and InnoDB orders them by the primary key index.

Do not use UUID, MD5, hash, or string columns as the primary key (their sequential growth cannot be guaranteed); an auto-increment ID is recommended instead.

4. Common index column recommendations

Columns that appear in the WHERE clause of SELECT, UPDATE, and DELETE statements.

Fields contained in ORDER BY, GROUP BY, DISTINCT.

It is not necessary to index every column matching points 1 and 2; a composite index over the columns from 1 and 2 is usually better.

The associated column of multiple table joins.

5. How to choose the order of index columns

The purpose of an index is to reduce random I/O and improve query performance. The fewer rows an index leaves after filtering, the less data must be read from disk.

Place the most selective column at the leftmost position of the composite index (selectivity = number of distinct values in the column / total number of rows).

Try to place columns with short field lengths toward the left of the composite index (a shorter field means more index entries fit on one page, so I/O performance is better).

Place the most frequently used columns toward the left of the composite index (so that fewer separate indexes are needed).

6. Avoid creating redundant and duplicate indexes

Redundant and duplicate indexes increase the time the query optimizer needs to generate an execution plan.

Examples of duplicate indexes: primary key(id), index(id), unique index(id).

Examples of redundant indexes: index(a,b,c), index(a,b), index(a).

7. Prefer covering indexes

For frequently executed queries, prefer using a covering index.

Covering index: an index that contains all the fields a query needs (WHERE, SELECT, ORDER BY, GROUP BY).

Benefits of covering indexes:

They avoid InnoDB's secondary lookup through the primary key.

InnoDB stores data in clustered-index order, and secondary indexes keep the row's primary-key value in their leaf nodes. When a query goes through a secondary index, after the key value is found, a second lookup via the primary key is needed to fetch the actual row. With a covering index, all required data comes from the secondary index itself, avoiding that primary-key lookup, reducing I/O, and improving query efficiency.

They can turn random I/O into sequential I/O, improving query efficiency.

Because a covering index is stored in key order, an I/O-intensive range scan over it reads far less than fetching each row randomly from disk; the covering index thus converts random disk reads into sequential index lookups.
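A sketch of a covering index (the order_info table and its columns are hypothetical): the index contains every column the query touches, so InnoDB never has to go back through the primary key.

```sql
CREATE INDEX idx_user_status ON order_info (user_id, status);

-- Both the selected columns and the WHERE column are in the index,
-- so this query is answered from the index alone.
SELECT user_id, status FROM order_info WHERE user_id = 100;
```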

Foreign key specification

Try to avoid using foreign key constraints.

Foreign key constraints are not recommended, but an index must still be built on the joining key between tables.

Foreign keys can be used to ensure referential integrity of data, but are recommended to be implemented on the business side.

Foreign keys can affect writes to parent and child tables and thus degrade performance.

Database SQL development specification

1. Use precompiled (prepared) statements for database operations

Prepared statements let MySQL reuse execution plans, reducing SQL parsing and compilation time; they also prevent the SQL injection problems of dynamically assembled SQL; and passing only parameters is more efficient than passing whole SQL statements.
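A sketch using MySQL's server-side prepared-statement syntax (the customer table is hypothetical); client libraries expose the same mechanism through parameterized queries:

```sql
-- Compiled once; the value is passed as a parameter,
-- never spliced into the SQL text
PREPARE get_user FROM 'SELECT name, phone FROM customer WHERE id = ?';
SET @uid = 111;
EXECUTE get_user USING @uid;
DEALLOCATE PREPARE get_user;
```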

2. Avoid implicit conversions of data types

Implicit conversions cause index invalidation. Such as:

SELECT name, phone FROM customer WHERE id = '111';

3. Make full use of indexes that already exist on the table

Avoid query conditions with a % wildcard on both sides.

For example, a LIKE '%123%' (if there is no leading % and only a trailing %, the index can still be used).

A single SQL statement can use a range condition on only one column of a composite index

For example, with a composite index on columns (a, b, c), if the query contains a range condition on column a, the index cannot be used for columns b and c. When defining a composite index, if column a needs a range lookup, place column a on the right side of the composite index.

Use LEFT JOIN or NOT EXISTS to optimize NOT IN operations

Because NOT IN also usually prevents the index from being used.
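A sketch of the NOT IN rewrite (the tables are hypothetical; the two forms are equivalent as long as t2.id is NOT NULL):

```sql
-- Instead of: SELECT id FROM t1 WHERE id NOT IN (SELECT id FROM t2);
SELECT t1.id
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
WHERE t2.id IS NULL;   -- keeps only t1 rows with no match in t2
```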

4. The database should be designed with future extensions in mind

5. Programs should connect to different databases with different accounts, and cross-database queries are prohibited

Leave room for database migration and table splitting

Reduce service coupling

Avoid security risks caused by excessive permissions

6. SELECT * is prohibited; queries must use SELECT < field list >

The reason:

It wastes more CPU, I/O, and network bandwidth.

It cannot use a covering index.

Listing fields explicitly reduces the impact of table structure changes.

7. Disallow INSERT statements without a list of fields

Such as:

insert into t values ('a', 'b', 'c');

You should use:

insert into t (c1, c2, c3) values ('a', 'b', 'c');

8. Avoid subqueries and optimize them to JOIN operations

In general, a subquery can be converted into a join for optimization only when it is an IN clause and the subquery is simple SQL (no UNION, GROUP BY, ORDER BY, or LIMIT clauses).

Reasons for poor sub-query performance:

A subquery's result set cannot use an index: it is usually materialized in a temporary table, and neither in-memory nor on-disk temporary tables have indexes, so query performance suffers.

The impact on query performance is especially large for subqueries that return big result sets.

Because subqueries generate many unindexed temporary tables, they consume excessive CPU and I/O resources and produce large numbers of slow queries.
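A sketch of the IN-subquery-to-join rewrite (the tables are hypothetical; the results match when vip_user.id is unique):

```sql
-- Instead of:
-- SELECT o.id FROM orders o WHERE o.user_id IN (SELECT id FROM vip_user);
SELECT o.id
FROM orders o
JOIN vip_user v ON o.user_id = v.id;
```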

9. Avoid joining too many tables

MySQL uses a join buffer for joins, and its size is set by the join_buffer_size parameter.

Each additional table joined in the same SQL statement gets an additional join buffer, so the more tables a statement joins, the more memory it occupies.

If many queries join many tables while join_buffer_size is set unreasonably, server memory can easily overflow, affecting the stability of the server and of database performance.

MySQL allows at most 61 tables in one join; joining no more than 5 tables is recommended.

10. Reduce the number of interactions with the database

Databases are better suited to batch operations; combining multiple identical operations into one improves processing efficiency.
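For example, one multi-row INSERT instead of three separate round trips (the table and values are illustrative):

```sql
INSERT INTO t (c1, c2) VALUES
    ('a', 1),
    ('b', 2),
    ('c', 3);
```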

11. Use IN instead of OR when comparing the same column against multiple values

Keep the number of values in an IN list under 500. IN can make effective use of the index, while OR in most cases cannot.

12. Disallow random sorting using order by rand()

ORDER BY RAND() loads all qualifying rows into memory, generates a random value for each row, and then sorts them all in memory by that value. If the qualifying data set is large, this consumes a great deal of CPU, I/O, and memory.

The recommended approach is to generate a random value in the program and then fetch the corresponding row from the database by that value.
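A sketch of that approach (the table and columns are hypothetical): the program picks a random value between MIN(id) and MAX(id), then fetches by index instead of sorting the whole table.

```sql
-- @rand_id is computed in the application
SET @rand_id = 12345;
SELECT id, name FROM t WHERE id >= @rand_id LIMIT 1;
```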

13. The WHERE clause disallows function conversions and calculations on columns

Function conversions or calculations on a column prevent its index from being used.

Not recommended:

WHERE date(create_time) = '20190101'

Recommended:

WHERE create_time >= '20190101' AND create_time < '20190102'

14. Use UNION ALL instead of UNION when it is clear there will be no duplicates

UNION puts all rows from both result sets into a temporary table and then deduplicates them.

UNION ALL does not deduplicate the result sets.

15. Split complex large SQL into multiple small SQL

Large SQL: logically complex SQL that needs a large amount of CPU for computation.

MySQL: a single SQL statement can use only one CPU core.

After splitting, the smaller SQL statements can be executed in parallel, improving processing efficiency.

Code of conduct for database operations

1. Batch write operations (UPDATE, DELETE, INSERT) on more than 1 million rows must be performed in multiple smaller batches

Large volume operations can cause significant master-slave delays

In a master/slave environment, large batch operations may cause serious replication lag: large write operations generally take a long time to execute, and only after they complete on the master do they start executing on the slaves, so the slaves fall far behind the master.

Binlog in ROW format generates a large volume of logs

Large batch writes produce large amounts of binlog, especially in ROW format, which records every changed row: the more data is modified, the more log is generated, and the longer transmission and replay take. This is another cause of master-slave delay.

Avoid large transaction operations

Performing a large number of data changes in one transaction locks a large number of rows in the table, causing massive blocking, which has a serious impact on MySQL performance.

In particular, long-lasting blocking can use up all of the database's available connections, so that other applications in the production environment cannot connect; large batch writes must therefore be split into batches.
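A sketch of batching a large delete (the table, cutoff, and batch size are illustrative); the statement is re-run until it affects 0 rows, keeping each transaction, its locks, and its binlog small:

```sql
DELETE FROM operation_log
WHERE create_time < '2019-01-01'
LIMIT 5000;   -- repeat until ROW_COUNT() = 0
```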

2. Run the pt-online-schema-change command to modify the table structure for large tables

Avoid master/slave delays caused by large table changes

Avoid locking tables while table fields are being modified

Changes to large table data structures must be made with caution. Serious table locking operations, especially in production environments, are not tolerated.

pt-online-schema-change first creates a new table with the same structure as the original, applies the structure change to the new table, copies the data from the original table into it, and uses triggers added to the original table to copy over rows that change during the copy. After all data has been copied, it renames the new table to the original table's name and drops the original table. This breaks one DDL operation into many small batches.

3. Do not grant super permission to the account used by the program

Even when the maximum number of connections is reached, MySQL still allows one connection from a user with SUPER privilege, so SUPER should be reserved for the account used to troubleshoot problems.

4. Follow the principle of minimum permissions for application database access accounts

A program may only use database accounts scoped to a single database; accounts shared across databases are not allowed for programs, and in principle program accounts must not have DROP privilege.