Database basics

Why use a database?

Data stored in memory

Advantages: fast access.

Disadvantages: data cannot be stored permanently.

Data stored in files

Advantages: data is saved permanently.

Disadvantages: 1) slower than memory operations, with frequent I/O; 2) querying data is inconvenient.

Data stored in a database

1) Data is stored permanently.

2) SQL statements make querying convenient and efficient.

3) Data management is convenient.

What is SQL?

Structured Query Language (SQL) is a database query language.

Purpose: to access, query, update, and manage data in relational database systems.

What is MySQL?

MySQL is a relational database management system originally developed by the Swedish company MySQL AB and now owned by Oracle. It is one of the most popular relational database management systems (RDBMS) and one of the best RDBMS choices for web applications. It is commonly used in Java enterprise development because MySQL is open source, free, and easy to scale.

What are the three paradigms of database

First normal form: No column can be split again.

Second normal form: On a first normal form basis, non-primary key columns are completely dependent on the primary key, not part of it.

Third normal form: On a second normal form basis, non-primary key columns depend only on the primary key and not on other non-primary keys.

When designing a database structure, try to follow the three normal forms; if you do not, there should be a good reason for it, such as performance. In fact, we often compromise on normalization for performance.

Where are MySQL's permission tables stored?

The MySQL server controls user access to the database through permission tables, which are stored in the mysql database and initialized by the mysql_install_db script. These permission tables are user, db, table_priv, columns_priv, and host. Their structure and contents are described below:

  • user: records the user accounts allowed to connect to the server; the permissions in this table are global.
  • db: records each account's operation permissions on each database.
  • table_priv: records table-level operation permissions.
  • columns_priv: records column-level operation permissions.
  • host: works together with the db table to control database-level operation permissions on a given host. This table is not affected by the GRANT and REVOKE statements.
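As an aside, the permission tables can be queried like ordinary tables; a minimal sketch (exact columns vary by MySQL version):

    -- list the accounts allowed to connect and where they may connect from
    SELECT user, host FROM mysql.user;
    -- per-database permissions for each account
    SELECT user, host, db, select_priv, insert_priv FROM mysql.db;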

How many binlog formats does MySQL have? What are the differences?

There are three formats: statement, row, and mixed.

  • In statement mode, every SQL statement that modifies data is recorded in the binlog. There is no need to record each row change, which reduces binlog volume, saves I/O, and improves performance. However, SQL execution is context-dependent, so the relevant context must be saved along with the statement, and some statements (for example, ones using certain functions) cannot be recorded and replicated correctly.
  • At the row level, the context of the SQL statement is not recorded; only the modified rows are saved, one entry per changed row. Essentially all changes can be recorded, but many operations cause large numbers of rows to change (ALTER TABLE, for instance), so this mode saves far more information and the log grows large.
  • Mixed is a compromise: statement records are used for ordinary operations, and row records are used where statement is not safe.

In addition, newer versions of MySQL optimize the row level: when the table structure changes, the statement is recorded instead of the row-by-row changes.
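To check or change the format, the standard variables can be used; a minimal sketch:

    -- show the current binlog format
    SHOW VARIABLES LIKE 'binlog_format';
    -- switch the current session to row-based logging (requires privileges)
    SET SESSION binlog_format = 'ROW';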

Data types

What are the data types in MySQL?

  • Integer types

TINYINT: small integer (8-bit); SMALLINT: small integer (16-bit); MEDIUMINT: medium integer (24-bit); INT (INTEGER): common integer (32-bit); BIGINT: big integer (64-bit).

  • The decimal type

DECIMAL(M,D): exact fixed-point numbers, stored compactly.

  • The date type

YEAR, format YYYY, range 1901 to 2155; TIME, format HH:MM:SS, range -838:59:59 to 838:59:59; DATE, format YYYY-MM-DD, range 1000-01-01 to 9999-12-31; DATETIME, format YYYY-MM-DD HH:MM:SS, range 1000-01-01 00:00:00 to 9999-12-31 23:59:59; TIMESTAMP, format YYYY-MM-DD HH:MM:SS, range 1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC.

  • Text, binary type

  • VARCHAR(M): variable-length string of at most M characters (M was limited to 255 before MySQL 5.0.3; up to 65535 bytes since)
  • TINYBLOB: 0 to 255 bytes
  • BLOB: 0 to 65535 bytes
  • MEDIUMBLOB: 0 to 16777215 bytes
  • TINYTEXT: 0 to 255 bytes
  • TEXT: 0 to 65535 bytes
  • MEDIUMTEXT: 0 to 16777215 bytes
  • LONGTEXT: 0 to 4294967295 bytes
  • VARBINARY(M): variable-length byte string of 0 to M bytes
  • BINARY(M): fixed-length byte string of M bytes

  • Integer types: TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT, which use 1, 2, 3, 4, and 8 bytes respectively. Any integer type can take the UNSIGNED attribute, indicating that the data is unsigned, i.e. a non-negative integer.

Length: a display width can be specified for an integer, e.g. INT(11). In most scenarios the width is meaningless: it does not limit the legal range of values, only affects the number of displayed characters, and is only meaningful together with the UNSIGNED ZEROFILL attribute. For example, with a column of type INT(5) UNSIGNED ZEROFILL, inserting 12 stores a value displayed as 00012.
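A minimal sketch of the display-width behavior (table and column names are illustrative; note that integer display width is deprecated in MySQL 8.0):

    CREATE TABLE t_width (n INT(5) UNSIGNED ZEROFILL);
    INSERT INTO t_width VALUES (12);
    SELECT n FROM t_width;  -- displays 00012; the stored value is still 12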

  • Real number types, including FLOAT, DOUBLE, DECIMAL.

DECIMAL can store integers larger than BIGINT and can store exact decimals. FLOAT and DOUBLE have limited ranges and support approximate values using standard floating point. Floating-point computation with FLOAT and DOUBLE is far more efficient than with DECIMAL, which you can think of as being handled like a string.

  • String types: VARCHAR, CHAR, TEXT, and BLOB

VARCHAR stores variable-length strings and is more space-efficient than fixed-length types. VARCHAR uses an extra 1 or 2 bytes to record the string length: 1 byte if the column's maximum length is at most 255 bytes, otherwise 2 bytes. If the content stored in a VARCHAR exceeds the set length, the content is truncated. CHAR is fixed-length: sufficient space is allocated according to the defined string length. CHAR values are padded with spaces as needed, which matters for comparisons. CHAR is good for storing very short strings, or values that are all close to the same length. CHAR also truncates content beyond the set length.

Usage strategy: for frequently changing data, CHAR is better than VARCHAR, because CHAR is less prone to fragmentation. For very short columns, CHAR is more storage-efficient than VARCHAR. Be careful to allocate only as much space as you actually need: sorting longer columns consumes more memory. Avoid the TEXT/BLOB types: queries on them use temporary tables, which incurs significant performance overhead.

  • Enumeration type (ENUM) stores non-repeating data as a predefined set.

Sometimes you can use ENUM instead of a common string type. ENUM storage is very compact, condensing the list of values to one or two bytes; internally, ENUM is stored as an integer. Avoid using numbers as ENUM constants, because they are confusing. Sorting follows the internally stored integers.
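A minimal sketch (table and values are illustrative):

    CREATE TABLE shirt (size ENUM('small', 'medium', 'large'));
    INSERT INTO shirt VALUES ('medium');
    -- sorts by the internal integer (small=1, medium=2, large=3), not alphabetically
    SELECT size FROM shirt ORDER BY size;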

  • Date and time types: prefer TIMESTAMP where possible, since its space efficiency is higher than DATETIME's.

Storing timestamps as plain integers is usually inconvenient. If you need to store microseconds, you can use a BIGINT.

Storage engines

How does the MySQL storage engine MyISAM differ from InnoDB?

Storage engine: how data, indexes, and other objects are stored in MySQL; each engine is an implementation layered on top of the file system.

Common storage engines are as follows:

  • InnoDB engine: provides ACID transaction support, along with row-level locking and foreign key constraints. It is designed to handle database systems with large data volumes.
  • MyISAM engine (originally MySQL's default engine): does not support transactions, row-level locks, or foreign keys.
  • MEMORY engine: all data is kept in memory, so data processing is fast, but data is not persisted (it is lost on restart).

MyISAM is different from InnoDB

  • Storage structure: in MyISAM, each table is stored in three files: .frm (table definition), .MYD (MYData, the data file), and .MYI (MYIndex, the index file). In InnoDB, all tables are stored in the same data file (possibly multiple files, or independent tablespace files); an InnoDB table's size is limited only by the operating system's file size.
  • Memory and storage: InnoDB requires more memory and storage; it creates its own buffer pool in main memory for caching data and indexes.
  • Portability, backup, and recovery: MyISAM data is stored as plain files, so cross-platform data transfer is very convenient, and a single table can be backed up and restored independently. For InnoDB, the free options are copying the data files, backing up the binlog, or using mysqldump, which is painful at tens of gigabytes.
  • File format: MyISAM stores data and indexes separately; InnoDB stores them together in the tablespace (.ibd) file.
  • Record storage order: MyISAM stores records in insertion order; InnoDB inserts them ordered by primary key.
  • Foreign keys: not supported by MyISAM; supported by InnoDB.
  • Transactions: not supported by MyISAM; supported by InnoDB.
  • Locks (locking is a mechanism to avoid resource contention): MyISAM uses table-level locks; InnoDB supports row-level locks for INSERT, UPDATE, and DELETE.
  • SELECT count(*): faster on MyISAM, because MyISAM maintains an internal row counter that can be read directly.
  • Index implementation: both use B+ tree indexes; MyISAM is a heap table, while InnoDB is an index-organized table.
  • Hash indexes: supported by neither (InnoDB has only an internal adaptive hash index).
  • Full-text indexes: supported by MyISAM; not supported by InnoDB (until MySQL 5.6).

What is the difference between MyISAM index and InnoDB index?

  • The InnoDB index is a clustered index; the MyISAM index is non-clustered.
  • InnoDB's primary key index is very efficient, because its leaf nodes store the row data itself.
  • The leaf nodes of a MyISAM index store the address of the row data, so an extra lookup is needed to fetch the data.
  • The leaf nodes of InnoDB's non-primary-key indexes store the primary key and the other indexed columns, so covering indexes can be very efficient for queries.

Four features of the InnoDB engine

  • Insert buffer
  • Double write
  • Adaptive hash index (AHI)
  • Read ahead

Storage Engine Selection

If there are no specific requirements, use the default Innodb.

MyISAM: read-heavy and insert-heavy applications, such as blog systems or news portals.

InnoDB: frequent updates and deletes, or when data integrity must be guaranteed; high concurrency; transactions and foreign keys required. For example, an OA office automation system.

Indexes

What is an index?

Indexes are special files (on InnoDB tables, the index is part of the tablespace) containing pointers to all the records in the table.

An index is a data structure. A database index is a sorted data structure in a database management system that helps query and update data in database tables quickly. Indexes are usually implemented using B-trees and their variant, the B+ tree.

More plainly, an index is a table of contents: to make it easier to find content in a book, a catalog is built from the content. An index is a file, and it occupies physical space.

What are the advantages and disadvantages of indexes?

Advantages of indexes

  • It can greatly speed up the retrieval of data, which is the main reason for creating indexes.
  • By using indexes, the query optimizer can be leveraged during query processing, improving system performance.

Disadvantages of indexes

  • Time: It takes time to create and maintain indexes. To be specific, indexes need to be maintained dynamically when data in a table is added, deleted, or modified, which reduces the efficiency of adding, changing, or deleting.
  • Spatial: Indexes need to occupy physical space.

Index Usage scenarios (emphasis)

where

Consider querying a record by id. Because the id field has only the primary key index, this SQL execution can only choose the primary key index. If more than one index were available, the optimizer would eventually select the better one as the basis for retrieval.

You can add a column without an index and inspect the execution plan:

    ALTER TABLE innodb1 ADD sex CHAR(1);
    EXPLAIN SELECT * FROM innodb1 WHERE sex = 'male';  -- possible_keys is NULL: a full table scan

An index can then be created for the field with: ALTER TABLE table_name ADD INDEX(column_name).

order by

When we use ORDER BY to sort the query results on some field, if the field is not indexed, the execution plan performs an external sort (filesort) over all the queried data. This hurts performance, because all the data involved must be read from disk into memory (overly large rows or too many rows lower efficiency further), and only then sorted in memory.

However, if we create an index on this field (ALTER TABLE table_name ADD INDEX(column)), then because the index itself is ordered, rows can be fetched one by one directly following the index order and its mapping to the rows. With pagination, only the index entries in the requested range, and the rows they map to, need to be read, rather than fetching all the rows, sorting them, and returning a slice as described above. (Fetching data from disk is the most performance-critical part.)
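A minimal sketch of the difference (table, column, and index names are illustrative; the optimizer's actual choice depends on cost):

    EXPLAIN SELECT * FROM employee ORDER BY salary;            -- Extra: Using filesort
    ALTER TABLE employee ADD INDEX idx_salary (salary);
    EXPLAIN SELECT * FROM employee ORDER BY salary LIMIT 10;   -- can walk idx_salary, no filesort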

join

It is efficient to index the fields used in a join's matching relation (the ON clause).

Covering indexes

If all the fields a query needs are contained in an index, the engine answers the query directly from the index table without touching the actual rows (whereas if even one field is not in the index, the query must go back to the table). This is called a covering index. Therefore, list only the necessary fields after SELECT, to increase the chance of index coverage.

It's worth noting that you should not index every field: an index is preferred precisely because it is small.

What are the types of indexes?

Primary key index: Data columns cannot duplicate or be NULL, and a table can have only one primary key.

Unique index: Data columns are not allowed to duplicate and NULL values are allowed. A table allows multiple columns to create unique indexes.

ALTER TABLE table_name ADD UNIQUE (column); Create a unique index

ALTER TABLE table_name ADD UNIQUE (column1,column2); Create a unique composite index

Plain index: A basic index type that has no restrictions on uniqueness and allows NULL values.

ALTER TABLE table_name ADD INDEX index_name (column); Creating a normal index

ALTER TABLE table_name ADD INDEX index_name(column1, column2, column3); Create composite indexes

Full-text indexing is a key technology used by search engines at present.

ALTER TABLE table_name ADD FULLTEXT (column); Creating a full-text index
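Unlike ordinary indexes, a full-text index is queried with MATCH ... AGAINST rather than LIKE; a minimal sketch:

    SELECT * FROM table_name WHERE MATCH(column) AGAINST('keyword');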

Index data structure (B-tree, hash)

The data structure of an index depends on the storage engine's implementation. Indexes used in MySQL include hash indexes, B+ tree indexes, and so on. The default index type of the commonly used InnoDB storage engine is the B+ tree index. For a hash index, the underlying data structure is a hash table; when the vast majority of queries are single-record lookups, a hash index gives the fastest queries. In most other scenarios, the BTree index is the recommended choice.

1) B-tree index

MySQL fetches data through its storage engines, and the large majority of deployments use InnoDB. In terms of implementation, InnoDB has only two index types: BTREE indexes and HASH indexes. The B-tree index is the most frequently used index type in MySQL, and almost all storage engines support BTree indexes.

Query mode:

Primary key index area (PI): holds the data associated with the key, so querying by primary key is fastest.

Secondary index area (SI): holds the associated primary key id, which then leads to the location above; a lookup through a secondary index therefore takes one extra step.

B + tree properties:

1.) A node with n subtrees contains n keys; the keys are used not to store data but as the index to it.

2.) All leaf nodes contain the full set of keys and pointers to the records holding those keys, and the leaf nodes themselves are linked in ascending key order.

3.) All non-terminal nodes can be regarded as the index part; each contains only the largest (or smallest) key of its subtree.

4.) In a B+ tree, data objects are inserted and deleted only at the leaf nodes.

5.) A B+ tree has two head pointers: one to the root of the tree, and one to the leaf node with the smallest key.

2) Hash index

Briefly, a hash index is implemented with a data structure similar to a hash table. When MySQL uses a hash index, it applies a hash algorithm (common ones are direct addressing, mid-square, folding, division-remainder, and random-number methods) to turn the field value into a fixed-length hash value, and stores the row pointer for that data in the hash table. If a hash collision occurs (two different keys get the same hash value), the entries are stored in a linked list under the corresponding hash slot. This is, of course, only a rough description.
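In MySQL, explicitly declared hash indexes are mainly available in the MEMORY engine; a minimal sketch (names are illustrative):

    CREATE TABLE lookup (
        id INT,
        name VARCHAR(20),
        KEY idx_id (id) USING HASH   -- hash index; fast equality lookups, no range scans
    ) ENGINE = MEMORY;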

The fundamentals of indexing

Indexes are used to quickly find records that have specific values. If there is no index, the query will generally traverse the entire table.

The principle of indexing is simple: it turns unordered data into an ordered structure that can be queried efficiently

  1. Sort the contents of the indexed columns
  2. Generate an inverted list from the sorted results
  3. Attach the data address chain to the inverted list entries
  4. When querying, first obtain the inverted list entry, then take out the data address chain to fetch the specific rows

What are the indexing algorithms?

Index algorithms include BTree algorithm and Hash algorithm

BTree algorithm

BTree is the most common index algorithm in MySQL and the default. It can be used not only with the =, >, >=, <, <= and BETWEEN comparison operators, but also with the LIKE operator, as long as the pattern is a constant that does not begin with a wildcard, for example:

    SELECT * FROM user WHERE name LIKE 'jack%';  -- can use the BTree index
    SELECT * FROM user WHERE name LIKE '%jack';  -- cannot use the index (leading wildcard)

Hash algorithm

A hash index can only be used for equality comparison, i.e. the = and <=> (null-safe equals) operators. Because it locates data in a single step, unlike a BTree index, which must go from the root node through branch nodes to the leaf pages with multiple I/O accesses, the retrieval efficiency of a hash index for such lookups is much higher than that of a BTree index.

Principles of index design?

  1. The columns that are suitable for indexing are those that appear in the WHERE clause or are specified in the join clause
  2. Columns with low cardinality index poorly; there is no need to index such columns
  3. Use short indexes. If you index long string columns, you should specify a prefix length to save a lot of index space
  4. Don’t over-index. Indexes require additional disk space and reduce write performance. When table contents are modified, the index is updated or even reconstructed, and the more index columns, the longer this takes. So keep only the indexes you need to help the query.

Principles for index creation (top priority)

Indexing is good, but it is not unlimited. It is best to comply with the following principles

1) The leftmost prefix matching principle. MySQL keeps matching columns to the right until it hits a range condition (>, <, BETWEEN, LIKE) and stops matching there. For a = 1 AND b = 2 AND c > 3 AND d = 4, an index on (a,b,c,d) cannot use d, while an index on (a,b,d,c) can use all four columns; the order of a, b, d in the query can be arbitrary.

2) create an index for a field that is frequently used as a query condition

3) Frequently updated fields are not suitable for creating indexes

4) If the column cannot effectively distinguish data, it is not suitable for index column (such as gender, male and female unknown, there are only three at most, the distinction is too low).

5) Extend existing indexes as much as possible rather than creating new ones. For example, if the table already has an index on a and you want an index on (a,b), you only need to modify the existing index.

6) foreign key columns must be indexed.

7) For those columns that are rarely involved in the query, do not create indexes for those columns with a high number of duplicate values.

8) Do not index columns of data types defined as text, image, and bit.

Three ways to create an index; dropping an index

The first method: CREATE an index when executing CREATE TABLE

CREATE TABLE user_index2 (
    id INT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(16),
    last_name VARCHAR(16),
    id_card VARCHAR(18),
    information TEXT,
    KEY name (first_name, last_name),
    FULLTEXT KEY (information),
    UNIQUE KEY (id_card)
);

Second: use the ALTER TABLE command to add an index

ALTER TABLE table_name ADD INDEX index_name (column_list); ALTER TABLE can create a normal index, a UNIQUE index, or a PRIMARY KEY index.

Table_name indicates the name of the table to which the index is to be added. Column_list indicates the column to which the index is to be added. If there are multiple columns, the columns are separated by commas.

The index name index_name is self-naming. By default, MySQL assigns a name based on the first index column. In addition, ALTER TABLE allows multiple tables to be changed in a single statement, so multiple indexes can be created at the same time.

Third method: Run the CREATE INDEX command to CREATE the INDEX

CREATE INDEX index_name ON table_name (column_list); CREATE INDEX can add a normal INDEX or a UNIQUE INDEX to a table (it cannot create a PRIMARY KEY index).

Remove the index

Drop a normal, unique, or full-text index by index name: ALTER TABLE table_name DROP KEY index_name

ALTER TABLE user_index DROP KEY name;
ALTER TABLE user_index DROP KEY id_card;
ALTER TABLE user_index DROP KEY information;

Drop a primary key: ALTER TABLE table_name DROP PRIMARY KEY; it is worth noting that this cannot be done directly if the primary key is auto-increment (auto_increment depends on the primary key index):

Need to cancel self-growth and then delete:

ALTER TABLE user_index MODIFY id INT, DROP PRIMARY KEY;

However, primary keys are usually not removed, because by design the primary key should be independent of business logic.

What should I pay attention to when creating an index?

  • Non-empty fields: declare columns NOT NULL unless you really want to store NULL. Columns with NULL values are hard for MySQL to optimize, because they complicate indexes, index statistics, and comparison operations. Replace NULL with 0, a special value, or an empty string.
  • Put the columns with the greatest value dispersion (the most distinct values) first in a composite index. You can check a field's dispersion with count(distinct column): the larger the result, the more unique values the field has and the higher its dispersion.
  • The smaller the indexed field, the better: database data is stored in pages, so more index entries fit on a page, more are obtained per I/O operation, and lookups are more efficient.

Does using an indexed query necessarily improve query performance? Why?

In general, querying data through an index is faster than a full table scan. But we must also be aware of the costs.

Indexes need storage space and regular maintenance: whenever a record is added to or removed from the table, or an indexed column is modified, the index itself must be updated, which can cost each INSERT, DELETE, and UPDATE 4 or 5 extra disk I/Os. Because indexes require extra storage and processing, unnecessary indexes slow down query response times. Using an index does not necessarily improve query performance. An INDEX RANGE SCAN is suitable for two kinds of queries: retrieval over a range, where the query generally returns a result set smaller than 30% of the records in the table; and retrieval on a non-unique index.

How do I delete data at the million-row level or above?

About indexes: an index has extra maintenance cost, and the index file is a separate file, so inserting, modifying, and deleting data all trigger extra operations on the index file. These operations consume extra I/O and lower the efficiency of insert/update/delete. The MySQL official manual indicates that the speed of deleting data is proportional to the number of indexes created.

  1. So when we want to delete millions of rows, we can drop the indexes first.
  2. Then delete the useless data (this process takes less than two minutes).
  3. Re-create the indexes after the deletion; with far less data present, this is also fast, about 10 minutes.
  4. Compared with deleting directly, this is much faster; moreover, if a direct delete were interrupted, the whole delete would be rolled back, which is even worse.

The prefix index

Syntax: index(field(10)) uses the first 10 characters of the field's values to build the index; by default, the field's entire content is used.

Prerequisite: the prefix must be highly selective. Passwords, for example, suit prefix indexing because they almost always differ.

Practical difficulty: choosing the prefix length.

You can check with select count(*)/count(distinct left(password, prefixLen)); by adjusting prefixLen (incrementing from 1), look for the smallest length whose average match is near 1 (meaning prefixLen characters of a password almost determine a single record).
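A minimal sketch of the check and the resulting index (table and column names are illustrative):

    SELECT COUNT(*) / COUNT(DISTINCT LEFT(password, 4)) AS avg4,
           COUNT(*) / COUNT(DISTINCT LEFT(password, 6)) AS avg6,
           COUNT(*) / COUNT(DISTINCT LEFT(password, 8)) AS avg8
    FROM user;
    -- if avg8 is already close to 1, an 8-character prefix suffices
    ALTER TABLE user ADD INDEX idx_password (password(8));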

What is the leftmost prefix principle? What is the leftmost matching principle

  • As the name implies, leftmost first. When creating a multi-column index, put the column most frequently used in WHERE clauses on the far left, according to business needs.
  • MySQL keeps matching to the right until it hits a range condition and stops. For a = 1 AND b = 2 AND c > 3 AND d = 4, an index on (a,b,c,d) cannot use d, but an index on (a,b,d,c) can use all four columns, and the order of a, b, d in the query can be adjusted arbitrarily.
  • = and IN can appear in any order: for a = 1 AND b = 2 AND c = 3, an index on (a,b,c) can be created in any order, because MySQL's query optimizer rewrites the query into a form the index recognizes.

B tree and B+ tree

  • In a B-tree, keys and values can be stored in both internal nodes and leaf nodes; in a B+ tree, internal nodes hold only keys, while leaf nodes hold both keys and values.
  • The leaves of a B+ tree are connected by a linked list, whereas the leaves of a B-tree are independent.

Benefits of using B-trees: because a B-tree can store keys and values in internal nodes, frequently accessed data can sit near the root node, which greatly improves the efficiency of hot-data queries. This makes B-trees more efficient in scenarios where particular records are queried repeatedly.

Advantages of using a B+ tree: because the internal nodes of a B+ tree store only keys, not values, a single read of a memory page obtains more keys, which helps narrow down the search faster. The leaf nodes of a B+ tree are chained together, so a full data traversal needs only O(log N) time to find the smallest node, followed by an O(N) sequential traversal along the chain. A B-tree, by contrast, must traverse every level of the tree, requiring more memory-page replacements and therefore more time.

What’s the difference between a Hash index and a B+ tree? First, understand the underlying implementation of Hash indexes and B+ tree indexes:

The underlying structure of a hash index is a hash table: a lookup calls the hash function once to find the entry for the key, then (if needed) goes back to the table to fetch the actual data. The underlying structure of a B+ tree index is a multi-way balanced search tree: each query walks from the root node to a leaf node to obtain the key, then decides, based on the query, whether it must go back to the table for the data.

So you can see that they have the following differences:

  • Hash indexes are faster for equality queries (in general), but cannot serve range queries.

After the hash function is applied, the index order no longer matches the original order, so range queries cannot be supported. All nodes of a B+ tree follow ordering rules (the left subtree is smaller than the parent key, the right subtree larger, and likewise for multi-way trees), which naturally supports ranges.

  • Hash indexes do not support sorting by indexes.
  • Hash indexes do not support fuzzy queries or the leftmost-prefix matching of multi-column indexes, also because hash values are unpredictable: the hashes of "AAAA" and "AAAAB" have no correlation.
  • A hash index can never avoid going back to the table for the data, whereas a B+ tree can answer the query from the index alone when certain conditions are met (clustered index, covering index, and so on).
  • Hash indexes, while fast for equality queries, are not stable; performance is unpredictable. When there are many duplicate key values, hash collisions occur and efficiency may be very poor. B+ tree query efficiency is relatively stable: every query goes from the root node to a leaf node, and the height of the tree is low.

Therefore, in most cases, choosing a B+ tree index directly achieves stable and good query speed, without needing to use a hash index.

Why does the database use B+ trees instead of B trees

  • B tree is only suitable for random retrieval, while B+ tree supports both random and sequential retrieval.
  • The B+ tree has higher space utilization, fewer I/Os, and lower disk read/write cost. Indexes are usually too large to hold entirely in memory, so they are stored on disk as index files, and index lookups then incur disk I/O. The internal nodes of a B+ tree hold no pointers to the record details for each key; they serve only as the index. Its internal nodes are therefore smaller than a B-tree's, a disk block can hold more keys, and one read into memory retrieves more keys; the number of I/O reads and writes is the biggest factor in index retrieval efficiency.
  • The query efficiency of B+ tree is more stable. B-tree search may end at non-leaf nodes, and the closer it is to the root node, the shorter the record search time is. As long as the key word is found, the existence of the record can be determined, and its performance is equivalent to a binary search in the whole set of keywords. However, in B+ tree, sequential retrieval is more obvious. In random retrieval, any keyword must be searched from the root node to the leaf node. All keyword search paths have the same length, resulting in the same query efficiency of each keyword.
  • B-tree improves disk IO performance but does not solve the problem of inefficiency of element traversal. The leaf nodes of a B+ tree are connected together sequentially using Pointers, and the entire tree can be traversed simply by traversing the leaf nodes. Moreover, range-based queries are very frequent in the database, and B trees do not support such operations.
  • Adding and deleting nodes is more efficient, because the leaf level of a B+ tree contains all the keys in an ordered linked list, which greatly improves the efficiency of insertions and deletions.

B+ trees do not need to query data back to the table when meeting the requirements of clustered index and overwritten index.

In the index of a B+ tree, a leaf node may store just the key value, or the key value together with the entire row of data: this is the difference between a non-clustered and a clustered index. In InnoDB, only the primary key index is a clustered index. If there is no primary key, a unique key is chosen to build the clustered index; if there is no unique key either, a hidden key is generated implicitly to build it.

When a query uses a clustered index, the entire row of data can be retrieved at the corresponding leaf node, so there is no need to run a query back to the table.

What is a clustered index? When to use clustered and non-clustered indexes

  • Clustered index: the data is stored together with the index; finding the index finds the data.
  • Non-clustered index: the data is stored apart from the index. MyISAM caches the index in memory through key_buffer; when accessing data through the index, it first searches the index in memory and then finds the corresponding row on disk via the address kept in the index. This is why index access is slow when the key buffer is missed.

To clarify a concept: in InnoDB, the indexes created on top of the clustered index are called secondary (auxiliary) indexes, and accessing data through a secondary index always requires a second lookup. Non-clustered indexes are all secondary indexes: composite indexes, prefix indexes, and unique indexes alike. A secondary index's leaf node no longer stores the row's physical location but the primary key value.


Must a non-clustered index be queried back into the table?

Not necessarily. It depends on whether all the fields required by the query are covered by the index. If they all are, there is no need to go back to the table.

For example, with an index on age, the query select age from employee where age < 20 can be answered entirely from the index leaf nodes (which already contain the age values), so no query back to the table is needed.

What is a federated index? Why do I care about the order in a federated index?

MySQL can create one index over multiple fields at once; this is called a composite (federated) index. For a query to hit a composite index, the columns must be matched one by one, in the order in which the index was created; otherwise the index cannot be used.

Specific reasons are as follows:

For example, suppose we create a composite index on (name, age, school).

When the query is executed, the index entries are strictly ordered by name first, so the name field must be used in an equality comparison first; among the rows that match, entries are strictly ordered by age, so age can then be used for index lookup, and so on. Therefore, when building a composite index, pay attention to the column order: in general, put the columns that are queried most often, or that are most selective, first. Adjust further case by case, according to the specific queries and table structure.
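A minimal sketch of the ordering effect (table, column, and index names are illustrative):

    ALTER TABLE student ADD INDEX idx_nas (name, age, school);
    -- can use the index: matches the leftmost prefix (name, then age)
    EXPLAIN SELECT * FROM student WHERE name = 'Tom' AND age = 18;
    -- cannot use the index: skips the leading column name
    EXPLAIN SELECT * FROM student WHERE age = 18 AND school = 'MIT';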

Transactions

What is a database transaction?

A transaction is an indivisible sequence of database operations and the basic unit of database concurrency control. Its execution must take the database from one consistent state to another. A transaction is a logical group of operations that either all execute or none execute.

The most classic and often cited example of a transaction is the transfer of money.

If Xiao Ming wants to transfer 1000 yuan to Xiao Hong, the transfer involves two key operations: reduce Xiao Ming's balance by 1000 yuan, and increase Xiao Hong's balance by 1000 yuan. If something goes wrong between these two operations, say the banking system crashes, so that Ming's balance goes down but Hong's balance does not go up, that is wrong. A transaction guarantees that these two critical operations either both succeed or both fail.
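A minimal sketch of this transfer as a single transaction (the account table and its columns are hypothetical):

    START TRANSACTION;
    UPDATE account SET balance = balance - 1000 WHERE name = 'ming';
    UPDATE account SET balance = balance + 1000 WHERE name = 'hong';
    COMMIT;  -- on any error, issue ROLLBACK instead, and neither balance changes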

What are the four ACID properties?

Relational databases must follow the ACID rules:

  1. Atomicity: a transaction is the smallest unit of execution and cannot be split. Atomicity ensures that the actions either complete entirely or have no effect at all.
  2. Consistency: data is consistent before and after a transaction executes; multiple transactions reading the same data see the same result.
  3. Isolation: under concurrent access, one user's transaction is not interfered with by other transactions; concurrent transactions are independent of one another.
  4. Durability: once a transaction is committed, its changes to the data in the database are permanent; they must not be lost even if the database fails.

What are dirty reads, phantom reads, and non-repeatable reads?

  • Dirty read: one transaction has updated a piece of data, and another transaction reads that same data. If the first transaction then rolls back for some reason, the data read by the second transaction is incorrect.
  • Non-repeatable read: the data returned by two queries within one transaction is inconsistent. This can happen because another transaction updated the original data, and committed, between the two queries.
  • Phantom read: the number of rows returned by two queries within one transaction is inconsistent. For example, one transaction queries some rows while another transaction inserts new rows; on a later query, the first transaction finds rows it did not see before.

What is the isolation level of a transaction? What is the default isolation level for MySQL?

To achieve the transactional properties, the database defines four transaction isolation levels: read uncommitted, read committed, repeatable read, and serializable. From lowest to highest, they solve dirty reads, non-repeatable reads, and phantom reads step by step.

The SQL standard defines four isolation levels:

  • READ-UNCOMMITTED: the lowest isolation level; allows reading uncommitted data changes, which may lead to dirty reads, phantom reads, or non-repeatable reads.
  • READ-COMMITTED: allows concurrent transactions to read data that has already been committed; prevents dirty reads, but phantom reads or non-repeatable reads can still occur.
  • REPEATABLE-READ: multiple reads of the same field give consistent results, unless the data is modified by the transaction itself; prevents dirty reads and non-repeatable reads, but phantom reads are still possible.
  • SERIALIZABLE: the highest isolation level, fully conforming to the ACID isolation requirement. All transactions execute one after another, so interference between transactions is completely impossible; this level prevents dirty reads, non-repeatable reads, and phantom reads.

MySQL uses the REPEATABLE-READ isolation level by default; Oracle uses the READ-COMMITTED isolation level by default.

The implementation of transaction isolation is based on locking mechanisms and concurrency scheduling. Concurrency scheduling uses MVCC (Multi-Version Concurrency Control), which supports consistent concurrent reads and rollback by keeping old versions of modified data.

Most database systems default to READ-COMMITTED: the lower the isolation level, the fewer locks transactions request. But note that the InnoDB storage engine uses REPEATABLE READ by default, without any performance penalty.

The InnoDB storage engine typically uses the SERIALIZABLE isolation level for distributed transactions.
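To inspect or change the level, a minimal sketch (the variable is transaction_isolation on MySQL 8.0, tx_isolation on older versions):

    SELECT @@transaction_isolation;
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;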

Locks

Do you know about MySQL locks

When a database has concurrent transactions, data inconsistencies may occur, and some mechanism is needed to ensure the order of access. The locking mechanism is such a mechanism.

Just like a hotel room: if people could come and go at will, many people would scramble for the same room, so a lock is installed on it. Only the person who obtains the key can enter and lock the room; others can use it only after the current occupant is done.

The relationship between isolation levels and locks

At the read uncommitted level, read operations do not take shared locks, so they do not conflict with exclusive locks on modified data.

At the read committed level, read operations take shared locks but release them as soon as the statement finishes.

At the repeatable read level, read operations take shared locks that are not released before the transaction commits: the shared lock is held until the transaction completes.

SERIALIZABLE is the most restrictive isolation level, because it locks entire key ranges and holds the locks until the transaction completes.

What kinds of database locks are there, by lock granularity? Locking mechanisms and InnoDB's locking algorithms

In relational databases, database locks can be divided into row-level locks (INNODB engine), table-level locks (MYISAM engine), and page-level locks (BDB engine) according to the granularity of locks.

MyISAM and InnoDB storage engines use locks:

  • MyISAM uses table-level locking.
  • InnoDB supports row-level locking and table-level locking. The default row-level locking is performed

Row-level locking, table-level locking and page-level locking comparison

Row-level lock: the finest-grained lock in MySQL; it locks only the current row. Row-level locking can greatly reduce conflicts in database operations. Its granularity is the smallest, but its locking overhead is the largest. Row-level locks are divided into shared locks and exclusive locks.

Features: high overhead, slow locking; deadlocks can occur; the locking granularity is the smallest, the probability of lock conflicts is the lowest, and concurrency is the highest.

Table-level lock: the coarsest-grained lock in MySQL; it locks the entire table being operated on. It is simple to implement, consumes few resources, and is supported by most MySQL engines; both the widely used MyISAM and InnoDB support table-level locking. Table-level locks are divided into shared table read locks and exclusive table write locks.

Features: low overhead, fast locking; no deadlocks; large locking granularity, the highest probability of lock conflicts, and the lowest concurrency.

Page-level lock: a lock whose granularity sits between row-level and table-level locks. Table-level locking is fast but conflict-prone, while row-level locking conflicts little but is slow, so page-level locking is the compromise: it locks an adjacent group of records at a time.

Features: overhead and locking time between those of table locks and row locks; deadlocks can occur; locking granularity between table locks and row locks; average concurrency.

What locks does MySQL have? Locking as indiscriminately as above is somewhat inefficient.

By type, locks divide into shared locks and exclusive locks.

Shared lock: also known as read lock. When the user wants to read the data, a shared lock is placed on the data. Multiple shared locks can be added simultaneously.

Exclusive lock: also known as write lock. An exclusive lock is placed on the data when the user writes to it. Only one exclusive lock can be added, and other exclusive locks and shared locks are mutually exclusive.

In the hotel example above, there are two kinds of user behavior. One is viewing the room: multiple users can view it together (a shared lock). The other is actually staying the night: during the stay, neither those who want to stay nor those who want to view may enter (an exclusive lock).

The granularity of locking depends on the specific storage engine. InnoDB implements row-level locking, page-level locking, and table-level locking.

Their locking overhead goes from high to low, and so does their concurrency.

MySQL InnoDB engine row lock how to implement?

A: InnoDB implements row locking based on indexes:

SELECT * FROM tab_with_index WHERE id = 1 FOR UPDATE;

FOR UPDATE takes row locks based on the condition only when id is an indexed column; if id is not indexed, InnoDB falls back to a table lock, and concurrency is lost.

There are three locking algorithms for InnoDB storage engine

  • Record Lock: A lock on a single row Record
  • Gap Lock: A Gap lock that locks a range, excluding the record itself
  • Next-key lock: Record +gap locks a range, including the record itself

Related knowledge:

  • InnoDB uses next-key locks for row queries
  • Next-key locking solves the phantom problem
  • When the queried index has a unique attribute, the next-key lock is demoted to a record lock
  • Gap locks exist to prevent multiple transactions from inserting records into the same range, which would cause phantom problems
  • There are two ways to explicitly disable gap locks (only record locks are then used, except for foreign key constraints and uniqueness checks): A. set the transaction isolation level to RC; B. set innodb_locks_unsafe_for_binlog to 1

What is a deadlock? How to solve it?

A deadlock occurs when two or more transactions each hold resources the other needs and request locks on each other's resources, creating a cycle in which none can proceed.

Common solutions to deadlocks

1. If different programs access multiple tables concurrently, try to agree to access the tables in the same order; this greatly reduces the chance of deadlock.

2. Within one transaction, try to lock all the resources needed at once, reducing the probability of deadlock.

3. For business operations that are prone to deadlocks, consider upgrading the locking granularity and using table-level locks to reduce the probability of deadlock.

4. If this cannot be handled well in the business logic, distributed transaction locks or optimistic locking can be used.

What are optimistic and pessimistic locks for databases? How do you do that?

The task of concurrency control in a database management system (DBMS) is to ensure that transaction isolation and consistency, and the consistency of the database, are not broken when multiple transactions access the same data at the same time. Optimistic concurrency control (optimistic locking) and pessimistic concurrency control (pessimistic locking) are the main techniques used for concurrency control.

Pessimistic locking: assume concurrency conflicts will happen, and block every operation that might violate data integrity. The data is locked when it is queried, and the lock is held until the transaction commits. Implementation: the locking mechanisms in the database.

Optimistic locking: assume no concurrency conflicts will occur, and check data integrity only at commit time; the data is versioned when it is modified. Implementation: optimistic locking is generally implemented with a version-number mechanism or a CAS algorithm.

Two types of lock usage scenarios

From the introduction of the two kinds of lock, we know that each has advantages and disadvantages; neither is simply better than the other. Optimistic locking suits scenarios with few writes (read-mostly workloads), where conflicts genuinely rarely occur; this saves locking overhead and increases the system's overall throughput.

In write-heavy scenarios, however, conflicts arise frequently, causing upper-layer applications to retry repeatedly, which actually lowers performance. Pessimistic locking is the better fit for write-heavy scenarios.
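A minimal sketch of version-number optimistic locking (the table and columns are hypothetical):

    -- read the row together with its current version
    SELECT balance, version FROM account WHERE id = 1;   -- say it returns version = 5
    -- apply the change only if nobody else changed the row in the meantime
    UPDATE account SET balance = balance - 100, version = version + 1
    WHERE id = 1 AND version = 5;
    -- if the UPDATE affects 0 rows, another transaction won; re-read and retry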

Views

Why use views? What is a view?

To improve the reusability of complex SQL statements and the security of table operations, the MySQL database management system provides the view feature. A view is essentially a virtual table that does not physically exist and contains a list of named columns and rows similar to a real table. However, views do not exist in the database as stored data values. The row and column data comes from the base table referenced by the query that defines the view and is generated dynamically when the view is specifically referenced.

Views improve the security of data in the database by allowing developers to focus only on specific data they are interested in and specific tasks they are responsible for, and only see the data defined in the view rather than the data in the tables referenced by the view.

What are the characteristics of views?

Views have the following characteristics:

  • The columns of a view can come from different tables, which are abstractions of tables and new relationships established in a logical sense.
  • A view is a table (virtual table) generated by a base table (real table).
  • View creation and deletion do not affect the base table.
  • Updates (additions, deletions, and modifications) to view content directly affect the base table.
  • When the view comes from more than one base table, data cannot be added and deleted.

Operations on a view include creating a view, viewing a view, deleting a view, and modifying a view.

What are the usage scenarios for views?

Basic purpose of views: simplify SQL queries and improve development efficiency. A secondary use is compatibility with old table structures.

The following are common usage scenarios for views:

  • Reuse SQL statements;
  • Simplify complex SQL operations. After you write a query, you can easily reuse it without knowing its basic query details;
  • Use parts of a table rather than the entire table;
  • Protect data. Users can be granted access to specific parts of a table rather than the entire table;
  • Change the data format and presentation. Views can return data that is different from the presentation and format of the underlying table.
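A minimal sketch of these scenarios (the table, columns, and view name are hypothetical):

    CREATE VIEW v_active_employee AS
        SELECT id, name, dept_id FROM employee WHERE status = 'active';
    -- callers query the view like a table, seeing only the exposed rows and columns
    SELECT * FROM v_active_employee WHERE dept_id = 3;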

Advantages of views

  1. Simplify queries. Views simplify user operations
  2. Data security. Views enable users to view the same data from multiple perspectives and secure confidential data
  3. Logical data independence. Views provide a degree of logical independence for refactoring the database

Disadvantages of Views

  • Performance. If a view is defined by a complex multi-table query, then even a simple query against the view takes the database time to turn it into that complex combined query.
  • Modification restrictions. When a user tries to modify rows of a view, the database must translate this into modifications of rows in the base tables; the same is true for inserts and deletes through a view. This is straightforward for simple views, but more complex views may not be updatable.

Non-updatable views include: 1. views with set operators such as UNION; 2. views with a GROUP BY clause; 3. views with aggregate functions such as AVG/SUM/MAX; 4. views using the DISTINCT keyword; 5. join-table views (with some exceptions).

What is a cursor?

A cursor is a data buffer, created by the system for the user, that stores the result set of a SQL statement. Each cursor area has a name. The user can fetch records one at a time through the cursor and assign them to host variables for further processing by the host language.

Stored procedures and functions

What is a stored procedure? What are the pros and cons?

A stored procedure is a precompiled set of SQL statements. Its advantage is that it allows modular design: it is created once and can be called any number of times later by the program. If an operation requires executing multiple SQL statements, using a stored procedure is faster than issuing the SQL statements one by one.

advantages

1) Stored procedures are precompiled and run efficiently.

2) Stored procedure code is stored directly in the database and invoked by the procedure's name, which reduces network communication.

3) High security. Users with certain permissions are required to execute stored procedures.

4) Stored procedures can be reused to reduce the workload of database developers.

disadvantages

1) Debugging is troublesome (though debugging with PL/SQL Developer is quite convenient, which makes up for this shortcoming).

2) Migration problems: database-side code is, of course, database-specific. For typical in-house engineering projects, however, there is essentially no migration problem.

3) Recompilation problems: because server-side code is compiled before it runs, if an object it references changes, the affected stored procedures and packages must be recompiled (although this can be set to happen automatically at run time).

4) If a system uses stored procedures heavily, then as user requirements grow after delivery, the data structures will change, and the related parts of the system must follow; in the end, maintaining such a system becomes very difficult, and the maintenance cost is unprecedented.
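A minimal sketch of creating and calling a procedure (the table and all names are hypothetical):

    DELIMITER //
    CREATE PROCEDURE get_employees_by_dept(IN p_dept_id INT)
    BEGIN
        -- the body is precompiled and runs inside the server
        SELECT id, name FROM employee WHERE dept_id = p_dept_id;
    END //
    DELIMITER ;

    CALL get_employees_by_dept(3);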

Triggers

What is a trigger? What are the usage scenarios for triggers?

A trigger is a special, event-driven stored procedure defined by the user on a relational table; it is a piece of code that executes automatically when its event fires.

Usage scenarios

  • Changes can be cascaded through related tables in the database.
  • Monitor changes to a field in a table in real time and need to be processed accordingly.
  • For example, some service numbers can be generated.
  • Do not abuse triggers; otherwise database and application maintenance becomes difficult.
  • Keep the basics above firmly in mind; in particular, understand the difference between the CHAR and VARCHAR data types, and between the InnoDB and MyISAM storage engines.

What triggers are available in MySQL?

There are six types of triggers in MySQL database:

  • Before Insert
  • After Insert
  • Before Update
  • After Update
  • Before Delete
  • After Delete
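A minimal sketch of an AFTER INSERT trigger (the audit table and all names are hypothetical):

    CREATE TABLE employee_audit (emp_id INT, action VARCHAR(10), changed_at DATETIME);

    DELIMITER //
    CREATE TRIGGER trg_employee_after_insert
    AFTER INSERT ON employee
    FOR EACH ROW
    BEGIN
        -- NEW refers to the row that was just inserted
        INSERT INTO employee_audit (emp_id, action, changed_at)
        VALUES (NEW.id, 'insert', NOW());
    END //
    DELIMITER ;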

Common SQL statements

What are the types of SQL statements?

Data Definition Language (DDL): CREATE, DROP, ALTER

Operations on logical structures, including table structures, views, and indexes.

Data Query Language (DQL): SELECT

Easy to understand: query operations, marked by the SELECT keyword. All simple queries and join queries are DQL.

Data Manipulation Language (DML): INSERT, UPDATE, and DELETE

DQL and DML together make up the insert, delete, update, and select operations most commonly used by junior programmers; select is the special one and is classified as DQL.

Data Control Language (DCL) GRANT, REVOKE, COMMIT, ROLLBACK

These cover database security, integrity, and related operations; DCL can be simply understood as permission control.

What are superkeys, candidate keys, primary keys, and foreign keys?

  • Superkey: a set of attributes that uniquely identifies a tuple in a relation is a superkey of the relational schema. A single attribute or a combination of attributes can be a superkey. Superkeys include candidate keys and primary keys.
  • Candidate key: a minimal superkey, i.e. a superkey with no redundant attributes.
  • Primary key: the column or combination of columns in a table that uniquely and completely identifies a stored row. A table can have only one primary key, and the primary key value cannot be missing, i.e. cannot be NULL.
  • Foreign key: a column in one table that holds the primary key of another table is called a foreign key of that table.

What kinds of SQL constraints are there?

  • NOT NULL: The contents of the control field must NOT be NULL.
  • UNIQUE: The content of the control field cannot be repeated. A table can have multiple UNIQUE constraints.
  • PRIMARY KEY: also ensures the column's contents are not repeated, but only one is allowed per table.
  • FOREIGN KEY: used to prevent breaking the connections between tables, and to prevent illegal data from being inserted into the foreign key column, since the value must be one of the values in the referenced table.
  • CHECK: Used to control the value range of the field.

Six associated queries

  • CROSS JOIN
  • INNER JOIN
  • LEFT JOIN/RIGHT JOIN
  • UNION query (UNION and UNION ALL)
  • FULL JOIN

CROSS JOIN

SELECT * FROM A CROSS JOIN B (CROSS JOIN C ...); with no filter condition, this returns the Cartesian product.

INNER JOIN

SELECT * FROM A, B WHERE A.id = B.id, or SELECT * FROM A INNER JOIN B ON A.id = B.id. INNER JOIN can be shortened to JOIN. There are three types of inner join:

  • Equi-join: ON A.id = B.id
  • Non-equi join: ON A.id > B.id
  • Self-join: SELECT * FROM A T1 INNER JOIN A T2 ON T1.id = T2.pid

LEFT JOIN/RIGHT JOIN

  • LEFT OUTER JOIN: the left table is the base; all rows from the left table are returned, matched where possible. LEFT OUTER JOIN can be shortened to LEFT JOIN: SELECT * FROM A LEFT JOIN B ON A.id = B.id
  • RIGHT OUTER JOIN: the right table is the base; all rows from the right table are returned: SELECT * FROM A RIGHT JOIN B ON A.id = B.id

UNION query (UNION and UNION ALL)

SELECT * FROM A UNION SELECT * FROM B UNION …

  • The result set before UNION is the benchmark; note that the queries being united must have equal column counts, and identical rows are merged
  • With UNION ALL, duplicate rows are not merged
  • UNION ALL is more efficient than UNION, since it skips deduplication

FULL JOIN

  • MySQL does not support FULL JOIN
  • You can emulate it by combining LEFT JOIN and RIGHT JOIN with UNION:

SELECT * FROM A LEFT JOIN B ON A.id=B.id UNION SELECT * FROM A RIGHT JOIN B ON A.id=B.id

What is a subquery

  1. As a condition: the query result of one SQL statement serves as the condition or result of another query statement.
  2. Nested: SQL statements are nested within each other; the inner query statement is called the subquery.

Three cases of subqueries

  • Single-row, single-column subquery: the result set is a single value, and the parent query uses the =, <, or > operators

-- Who is the highest-paid employee? select * from employee where salary=(select max(salary) from employee);

  • Multi-row, single-column subquery: the result set resembles an array, and the parent query uses the IN operator

-- Which departments have employees earning more than 5000? select name from dept where id in (select dept_id from employee where salary > 5000);

  • Multi-row, multi-column subquery: the result set resembles a virtual table; it cannot be used in a WHERE condition, but it can serve as a derived table in the FROM clause

-- Query all department information and match it against the virtual table to find each department's employees who joined after 2011-1-1. select * from dept d, (select * from employee where join_date > '2011-1-1') e where e.dept_id = d.id;

-- The same query written as a join: select d.*, e.* from dept d inner join employee e on d.id = e.dept_id where e.join_date > '2011-1-1';

How do IN and EXISTS differ in MySQL?

In MySQL, IN performs a hash join between the outer and inner tables, while EXISTS loops over the outer table and queries the inner table on each iteration. It has long been believed that EXISTS is more efficient than IN, but that is not accurate; it depends on the situation.

If the two tables are about the same size, there is little difference between IN and EXISTS. If one table is smaller and the other larger, use EXISTS when the subquery's table is the larger one, and IN when the subquery's table is the smaller one. As for NOT IN versus NOT EXISTS: NOT IN causes a full table scan on both the inner and outer tables and uses no index, while a NOT EXISTS subquery can still use indexes on the table. So regardless of table size, NOT EXISTS is faster than NOT IN.
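A sketch of the two forms, assuming hypothetical tables A (outer) and B (inner):

-- IN: the subquery result is hashed, then probed from the outer table
select * from A where A.id in (select id from B);
-- EXISTS: the outer table is scanned, and B is queried once per row of A
select * from A where exists (select 1 from B where B.id = A.id);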

Varchar differs from char

Characteristics of CHAR

  • CHAR is a fixed-length string type.
  • If the inserted data is shorter than the declared length, it is padded with spaces.
  • Because the length is fixed, access is much faster than VARCHAR, reportedly up to 50% faster; but the fixed length wastes space, trading space for time.
  • A CHAR column can store at most 255 characters, regardless of encoding.

Characteristics of VARCHAR

  • VARCHAR is a variable-length string type; its length varies.
  • It stores exactly as much data as is inserted.
  • VARCHAR is the opposite of CHAR: access is slower because the length is not fixed, but for the same reason it does not occupy extra space.
  • A VARCHAR column can store at most 65532 characters.

In short, weighing performance (CHAR is faster) against disk-space savings (VARCHAR is smaller) for the specific situation is a good way to design a database.

Meaning of 50 in varchar(50)

varchar(50) and varchar(200) take the same space to store "hello", but the latter consumes more memory when sorting, because ORDER BY col treats the column as fixed length (the MEMORY engine behaves the same way). In earlier versions of MySQL, 50 stood for the number of bytes; now it stands for the number of characters.

Meaning of 20 in int(20)

It is the display width: 20 means the maximum display width is 20 digits, but the column still occupies 4 bytes of storage and the storage range is unchanged.

It does not affect internal storage; it only affects how many leading zeros are shown for an INT column defined with ZEROFILL, which is convenient for report display.

Why is MySQL designed this way?

It makes no difference for most applications; it merely tells some tools how many digits to display. int(1) and int(20) are stored and computed identically.
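A small sketch of the display-width effect (the table name is an assumption; note that integer display width is deprecated as of MySQL 8.0.17):

CREATE TABLE t_width (a INT(4) ZEROFILL);
INSERT INTO t_width VALUES (1);
SELECT a FROM t_width;  -- displays 0001; the column still occupies 4 bytes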

int(10), char(10), and varchar(10)

  • The 10 in int(10) is the display width, not the size of the stored data; the 10 in char(10) and varchar(10) is the storage size, that is, how many characters are stored.

int(10): a display width of up to 10 digits (9999999999); the column still occupies 4 bytes (32 bits) of storage.

  • char(10): a fixed-length string of 10 characters; if fewer than 10 characters are inserted, the remainder is padded with spaces, which are placeholders and do not count as characters.
  • varchar(10): a variable-length string of up to 10 characters; it stores exactly as many characters as are inserted, and a space counts as one stored character, unlike the padding spaces of char(10).

What’s the difference between FLOAT and DOUBLE?

  • FLOAT can store up to 8 significant decimal digits and occupies 4 bytes of memory.
  • DOUBLE can store up to 18 significant decimal digits and occupies 8 bytes of memory.

Difference between DROP, DELETE, and TRUNCATE

All three delete, but there are differences: DELETE is DML; it removes rows one at a time, can take a WHERE clause, is written to the log, and can be rolled back. TRUNCATE and DROP are DDL; they commit implicitly and cannot be rolled back. TRUNCATE removes all rows but keeps the table structure, while DROP removes the table itself along with its structure, indexes, and privileges. In speed, generally: DROP > TRUNCATE > DELETE.

Therefore, when a table is no longer needed, use DROP; when you want to delete some rows, use DELETE; when you want to empty a table but keep it, use TRUNCATE.

The difference between a UNION and a UNION ALL?

  • UNION merges the result sets and removes duplicate rows; with UNION ALL, duplicate rows are not merged
  • UNION ALL is therefore more efficient than UNION

SQL optimization

How do you locate and optimize a slow SQL statement? How do you tell whether an index is used, or why a statement runs slowly?

The most important and effective way to locate low-performing SQL statements is the execution plan. MySQL provides the EXPLAIN command to view the execution plan of a statement. Whatever the database or engine, many optimizations take place while a SQL statement executes; for query statements, the most important optimization is the use of indexes. The execution plan shows how the database engine executes the SQL statement, including whether an index is used, which index is used, and information about that index.

id: the execution plan's id column is a set of numbers representing the execution order of the subqueries within the query.

  • Rows with the same id execute in order from top to bottom.
  • With different ids, a larger id has higher priority and executes earlier.
  • A NULL id marks a result set that needs no query of its own; it commonly appears in statements containing UNION.
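A minimal usage sketch (the tables are assumptions):

EXPLAIN
SELECT e.name
FROM employee e
INNER JOIN dept d ON e.dept_id = d.id
WHERE d.name = 'Sales';
-- read the id, type, key, rows, and Extra columns of the output to judge index usage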

select_type: the query type of each subquery; there are several common query types.

table: the table the data is read from. When data comes from a derived table, <derivedN> is shown, where N is the id of the corresponding execution plan. partitions: the matching partitions; a table can be partitioned by a specified column when it is created. Here's an example:

create table tmp (
    id int unsigned not null AUTO_INCREMENT,
    name varchar(255),
    PRIMARY KEY (id)
) engine = innodb
partition by key (id) partitions 5;

type (very important; it shows whether an index is used): the access type

  • ALL: scans all table data
  • index: traverses the index
  • range: scans a range of the index
  • index_subquery: the subquery uses a non-unique index (ref)
  • unique_subquery: the subquery uses a unique index (eq_ref)
  • ref_or_null: a ref lookup optimized to also match NULL index entries
  • fulltext: uses a full-text index
  • ref: uses a non-unique index to find rows
  • eq_ref: uses a PRIMARY KEY or UNIQUE NOT NULL index for the lookup in a join query

possible_keys: indexes that might be used. Note that a listed index may not actually be used. If a field involved in the query has an index, it is listed here. When this column is NULL, it is time to consider whether the SQL needs optimizing.

key: the index MySQL actually used in the query. If no index was used, the value is NULL.

TIPS: if the query uses a covering index, that index appears only in the key column

key_len: the length of the index used

ref: the join match criteria for the table, that is, which columns or constants are used to look up values on the indexed column

rows: the estimated number of rows to examine; it is not an exact value.

The Extra column is rich in information; common values are:

  1. Using index: a covering index is used
  2. Using where: the WHERE clause filters the result set
  3. Using filesort: an external (file) sort is used, typically when sorting on non-indexed columns

The goals of SQL optimization can be found in the Alibaba development manual:

SQL performance optimization goal: at least range level; ref level is required; consts is best. Notes: 1) consts: there is at most one matching row in a single table (primary key or unique index), and the data can be read during the optimization phase. 2) ref: a normal index is used. 3) range: the index is retrieved over a range. Counterexample: the EXPLAIN result shows type=index, a full scan of the index file; this is very slow, ranks even below range, and is little better than a full table scan.
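A sketch of how these levels show up, assuming a hypothetical table t with primary key id and a secondary index on num:

EXPLAIN SELECT * FROM t WHERE id = 1;                  -- const: primary key lookup
EXPLAIN SELECT * FROM t WHERE num = 10;                -- ref: non-unique index lookup
EXPLAIN SELECT * FROM t WHERE num BETWEEN 1 AND 10;    -- range: index range scan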

SQL life cycle?

  1. The application server establishes a connection with the database server
  2. The database process gets the requested SQL
  3. Parse and generate an execution plan, execute
  4. Read data into memory and process it logically
  5. Send the result to the client through the connection in step 1
  6. Close the connection and release resources

Large table data queries: how to optimize

  1. Optimize the schema, the SQL statements, and the indexes;
  2. Add a cache: memcached, redis;
  3. Master/slave replication and read/write separation;
  4. Vertical splitting: divide one big system into smaller systems according to how loosely your modules are coupled, i.e. move to a distributed system;
  5. Horizontal splitting: for tables with large data volumes, this step is the most troublesome and the best test of skill. Choose a reasonable sharding key; for good query efficiency the table structure may need changes and some redundancy, and the application must change too, querying by the sharding key rather than scanning all the shards;

How to handle large pagination?

Large paging is generally handled in two directions.

  • Optimize the SQL itself. SELECT * FROM table WHERE age > 20 LIMIT 1000000,10 has room for optimization: it loads 1,000,000 rows and then discards almost all of them, so fetching 10 rows is of course slow. SELECT * FROM table WHERE id IN (SELECT id FROM table WHERE age > 20 LIMIT 1000000,10) still walks a million entries, but it is fast thanks to index covering: every field the subquery needs is in the index. If the ids are continuous, the simplest form works: SELECT * FROM table WHERE id > 1000000 LIMIT 10.
  • Reduce the requirement itself. Mainly, do not offer this kind of feature (jumping straight to a page millions of rows in; allow only page-by-page viewing or navigation along a fixed path, which is predictable and cacheable), and prevent id leaks and continuous malicious scraping.

In fact, solving large pagination relies mainly on caching: fetch the predictable content ahead of time, cache it in a K-V store such as redis, and return it directly.

In the Alibaba Java Development Manual, the solution to large paging is similar to the first one mentioned above.

[Recommendation] Use deferred association or subquery to optimize super multi-page scenarios.

Note: MySQL does not skip the offset rows; it fetches offset+N rows, drops the first offset rows, and returns N. When offset is very large this is very inefficient, so either cap the total number of pages that can be returned, or rewrite the SQL once the page number exceeds a threshold.

Example: Quickly locate the ID segment to be obtained, and then associate it:

SELECT a.* FROM t1 a, (select id from t1 where condition LIMIT 100000,20) b where a.id=b.id

Mysql paging

The LIMIT clause forces a SELECT statement to return a specified number of rows. LIMIT accepts one or two numeric arguments, which must be integer constants. With two arguments, the first specifies the offset of the first row to return and the second the maximum number of rows to return. The offset of the initial row is 0 (not 1).

mysql> SELECT * FROM table LIMIT 5,10; -- returns rows 6 through 15

To retrieve all rows from an offset to the end of the result set, use some very large number as the second parameter; MySQL does not accept -1 here:

mysql> SELECT * FROM table LIMIT 95,18446744073709551615;

If only one argument is given, it is the maximum number of rows to return:

mysql> SELECT * FROM table LIMIT 5; In other words, LIMIT n is equivalent to LIMIT 0,n.

Slow query log

This log records SQL statements whose execution time exceeds a threshold; it is used to quickly locate slow queries as a reference for optimization.

Enabling the slow query log

Configuration item: slow_query_log

Use show variables like 'slow_query_log' to check whether the slow query log is enabled. If the status is OFF, run set global slow_query_log = on to enable it; a xxx-slow.log file is then generated under datadir.

Setting the threshold time

Configuration item: long_query_time

Check: show variables like 'long_query_time', in seconds

Set: set long_query_time=0.5

In practice, the threshold should be moved from a long time toward a short one, optimizing away the slowest SQL first

Then watch the log: whenever a statement exceeds the threshold we set, it is logged to xxx-slow.log
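The steps above as one sketch (note that set global long_query_time takes effect for new connections):

show variables like 'slow_query_log';
set global slow_query_log = on;
show variables like 'long_query_time';
set global long_query_time = 0.5;
-- statements slower than 0.5s are now written to the xxx-slow.log file under datadir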

Do you pay attention to SQL execution time in the business system? Do you collect statistics on slow queries? How do you optimize slow queries?

In the business system, except for queries that hit the primary key, I test the SQL's time on the test database. Statistics on slow queries are mainly handled by operations, who regularly feed the slow queries in the business back to us.

To optimize a slow query, first understand what causes the slowness: does the query condition miss the index? Does it load columns that are not needed? Or is the data volume simply too large?

Optimization therefore goes in those three directions:

  • First analyze the statement to see whether extra data is loaded, perhaps by querying extra rows and discarding them, or by loading many columns the result does not need; then analyze and rewrite the statement.
  • Analyze a statement’s execution plan to see how it uses the index, and then modify the statement or index so that the statement matches the index as closely as possible.
  • If statement optimization is no longer possible, consider whether the amount of data in the table is too large, and if so, split the table horizontally or vertically.

Why try to have a primary key?

Primary keys ensure the uniqueness of data rows in the whole table. Even if a table has no natural primary key, it is advisable to add an auto-increment ID column as one. With a primary key set, subsequent deletions and updates can be faster, and the range of data an operation touches is kept safe.

Does the primary key use an autoincrement ID or a UUID?

It is recommended to use the autoincrement ID instead of the UUID.

In the InnoDB storage engine the primary key index is the clustered index: the leaf nodes of the primary key's B+ tree store the primary key values together with all the row data, in primary key order. If the primary key is an auto-increment ID, new rows simply append in order at the end. If it is a UUID, the order in which ids arrive is unpredictable, so many inserts land in the middle of the tree, causing large amounts of data movement and memory fragmentation, which in turn degrades insert performance.

In general, in the case of large data volumes, the performance is better with auto-increment primary keys.

As for the primary key being the clustered index: if there is no primary key, InnoDB selects a unique key as the clustered index, and if there is no unique key either, it generates an implicit primary key.

Why is the field required to be not NULL?

NULL values take up more bytes and produce many unexpected results in program logic.

If you want to store user password hashes, what fields should be used for storage?

Fixed-length strings such as password hashes, salts, and user ID numbers should be stored in CHAR rather than VARCHAR, which saves space and improves retrieval efficiency.

Optimize data access during queries

  • Accessing too much data degrades query performance
      • Determine whether the application is retrieving more data than it needs, perhaps too many rows or columns
      • Determine whether the MySQL server is parsing a large number of rows it does not need
  • Avoid the following SQL mistakes
      • Querying data that is not needed. Solution: use LIMIT
      • Returning all columns from a multi-table join. Solution: specify the column names
      • Always returning all columns. Solution: avoid SELECT *
      • Querying the same data repeatedly. Solution: cache the data and read the cache next time
  • Check whether extra rows are being scanned. Use EXPLAIN to analyze; if a query scans a large amount of data but returns only a few rows, the following techniques can help:
      • Use a covering index scan: put all needed columns into the index, so the storage engine does not have to go back to the table to fetch rows
      • Change the database and table structure; adjust the table's normal form
      • Rewrite the SQL statement so the optimizer can execute the query in a better way

Optimize long and difficult query statements

  • One complex query or several simple queries?
      • MySQL can internally scan millions of rows per second in memory; returning data to the client is much slower
      • Using the smallest possible query is good, but sometimes it is necessary to break one big query into several smaller ones
  • Chopping up a query
      • Divide one large query into several identical smaller ones
      • Deleting 10 million rows in one statement costs the server more than deleting 10,000 at a time with a pause in between
  • Decomposing a join query brings:
      • More efficient caching
      • Less lock contention, since each single query touches one table
      • Easier database splitting, since the joining happens at the application layer
      • A large boost in query efficiency
      • Fewer queries for redundant records

Optimize specific types of query statements

  • COUNT(*) ignores column values and counts all rows; do not use COUNT(column_name) when you want a row count
  • In MyISAM, COUNT(*) without any WHERE condition is very fast
  • With WHERE conditions, MyISAM's count is not necessarily faster than other engines'
  • You can use EXPLAIN to get an approximate row count and use that in place of COUNT(*)
  • Add summary tables
  • Use the cache

Optimizing join queries

  • Make sure there are indexes on the columns in the ON or USING clauses.
  • Ensure that GROUP BY and ORDER BY refer to columns from only one table, so that MySQL can use an index.

Optimizing subqueries

  • Replace them with join queries where possible

Optimizing GROUP BY and DISTINCT

  • Both kinds of query can be optimized with indexes, which is the most effective method
  • In join queries, grouping by an identity column (such as the primary key) is more efficient
  • If no ordering is needed, add ORDER BY NULL to a GROUP BY so MySQL skips the filesort
  • WITH ROLLUP super-aggregation can be moved into the application

Optimize LIMIT paging

  • When the LIMIT offset is large, query efficiency is low
  • You can record the maximum ID of the previous query and start the next query from that ID (see the sketch below)
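A sketch of the max-ID technique, assuming an auto-increment primary key id:

-- slow when the offset is large:
SELECT * FROM t ORDER BY id LIMIT 1000000, 10;
-- fast: continue from the last id seen on the previous page
SELECT * FROM t WHERE id > 1000000 ORDER BY id LIMIT 10;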

Optimizing UNION queries

  • The efficiency of UNION ALL is higher than that of UNION

Optimize the WHERE clause

How to answer

For questions of this type, first explain how to locate inefficient SQL; then, starting with the index, work through the possible causes. If the index is not the problem, consider the aspects above: data-access problems, long and difficult queries, or one of the specific query types, and answer them one by one.

Some methods of SQL statement optimization?

1. To optimize a query, avoid full table scans as far as possible; first consider building indexes on the columns involved in WHERE and ORDER BY.

2. Avoid testing for NULL values in the WHERE clause, otherwise the engine abandons the index and performs a full table scan, as in: select id from t where num is null. Better: give num a default value of 0, make sure the column contains no NULLs, and query with: select id from t where num=0.

3. Avoid using != or <> in the WHERE clause, otherwise the engine abandons the index and performs a full table scan.

4. Avoid using OR to join conditions in the WHERE clause, otherwise the engine abandons the index and performs a full table scan. Instead of: select id from t where num=10 or num=20, use: select id from t where num=10 union all select id from t where num=20.

5. Use IN and NOT IN with caution, or a full table scan may result, as in: select id from t where num in(1,2,3). For continuous values, use BETWEEN instead of IN: select id from t where num between 1 and 3.

6. A leading wildcard in LIKE causes a full table scan: select id from t where name like '%li%'. To stay on an index, prefer a prefix match such as name like 'li%'. 7. Using parameters in the WHERE clause also causes a full table scan, because SQL resolves local variables only at run time, while the optimizer cannot defer the choice of access plan to run time; it must choose at compile time, when the variable's value is unknown and so cannot inform index selection. Instead of: select id from t where num=@num, you can force the query to use an index: select id from t with(index(index_name)) where num=@num.

8. Avoid expression operations on fields in the WHERE clause; they cause the engine to abandon the index and perform a full table scan. Instead of: select id from t where num/2=100, use: select id from t where num=100*2.

9. Avoid applying functions to fields in the WHERE clause; they cause the engine to abandon the index and perform a full table scan. Instead of: select id from t where substring(name,1,3)='abc', use: select id from t where name like 'abc%'.

10. Do not perform functions, arithmetic operations, or other expression operations to the left of the “=” in the WHERE clause, or the system may not use the index properly.

Database optimization

Why optimize

  • The throughput bottleneck of the system often appears in the database access speed
  • As the application runs, more and more data is stored in the database, and processing times are correspondingly slower
  • Data is stored on disk and read and write speeds are not comparable to memory

Optimization principle: Reduce system bottlenecks, reduce resource occupation, and increase system response speed.

Database structure optimization

A good database design often yields twice the result for half the effort where performance is concerned.

You need to consider data redundancy, speed of query and update, and whether the data type of the field is reasonable.

Split a table with many fields into multiple tables

For a table with many fields, if some fields are used infrequently, you can separate these fields to form a new table.

Because when a table has a large amount of data, it is slowed down by the presence of infrequently used fields.

Add intermediate tables

For tables that require frequent joint queries, you can create intermediate tables to improve query efficiency.

By creating an intermediate table, you insert the data that the join query needs into it, and then change the original join query into a query against the intermediate table.

Add redundant fields

Table design should follow normalization theory as far as possible and keep redundant fields to a minimum, making the design look refined and elegant. However, adding redundant fields judiciously can improve query speed.

The more normalized a table is, the more relationships there are between tables, the more queries need to be joined, and the worse the performance.

Note:

If the value of a redundant field is changed in one table, you have to find a way to update it in another table, otherwise you will have data inconsistency problems.

MySQL database CPU usage soars to 500%: how to handle it

When CPU usage rises to 500%, first run the operating system's top command to check whether mysqld is the consumer. If not, find the processes with high CPU usage and deal with them.

If mysqld is the cause, run show processlist to see which sessions are running, and find the SQL with high consumption. Check whether the execution plan is accurate, whether an index is missing, or whether there is simply too much data.

In general, kill those threads (and watch whether CPU usage falls), then re-run the SQL after making the appropriate adjustments (adding indexes, changing the SQL, changing memory parameters).

It is also possible that no single SQL statement consumes many resources, but a large number of sessions connect all at once and the CPU spikes. In that case, work with the application side to analyze why the connection count surged, and adjust accordingly, for example by limiting the number of connections.

How do you optimize a large table? A table has nearly ten million rows and CRUD is slow; how do you optimize it? How is splitting into multiple databases and tables (sharding) done? What problems does it bring? Is middleware useful? Do you know how they work?

When the number of MySQL single table records is too large, the CRUD performance of the database will be significantly reduced. Some common optimization measures are as follows:

  • Limit the scope of the data: prohibit queries that carry no condition limiting the data range. For example, when users query order history, restrict it to one month;
  • Read/write separation: the classical database split scheme, the master library is responsible for writing, the slave library is responsible for reading;
  • Caching: Use MySQL’s cache. For heavy, less-updated data, you can consider using application-level caching.

Tables can also be optimized by splitting them, mainly vertically and horizontally.

  1. Vertical partition:

Split according to the correlation of the tables in the database. For example, if the user table contains both the user login information and the user’s basic information, you can split the user table into two separate tables, or even put them into separate libraries.

To put it simply, vertical splitting is the splitting of data table columns. A table with many columns is split into multiple tables. This should make it a little bit easier to understand.

Advantages of vertical splitting: rows become smaller, fewer blocks are read during a query, and I/O decreases. In addition, vertical partitioning simplifies the table structure and makes it easier to maintain.

Disadvantages of vertical splitting: redundant primary keys and redundant columns must be managed, and joins may be needed, though these can be done at the application layer. It also makes transactions more complex;

Vertical table splitting puts the primary key and some of the columns in one table, and the primary key and the remaining columns in another.
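A sketch of such a split on a hypothetical wide user table (all names and columns are assumptions):

-- before: one wide table mixing login data with rarely used profile data
-- after: two tables sharing the same primary key
create table user_login (
    id int unsigned not null primary key,
    username varchar(50) not null,
    password_hash char(64) not null
);
create table user_profile (
    id int unsigned not null primary key,  -- same id as user_login
    nickname varchar(50),
    bio text
);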

Applicable scenarios: 1) some columns of the table are commonly used while others are not; 2) smaller rows let a data page hold more rows, reducing the number of I/O operations during queries.

Disadvantages: some splitting strategies rest on application-layer logic; once that logic changes, the whole splitting scheme changes, so extensibility is poor. For the application layer, the routing logic adds development cost, redundant columns must be managed, and querying all the data requires a join.

2. Horizontal partition:

Keep the data table structure unchanged and store the data shards with some policy. In this way, each piece of data is dispersed to different tables or libraries, achieving the purpose of distribution. Horizontal splitting can support very large amounts of data.

Horizontal splitting is the splitting of table rows. When a table exceeds roughly 2 million rows, it slows down, and the data of one table can then be split across several tables. For example, we can split the user information table into multiple user information tables, avoiding the performance impact of too much data in a single table.
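A sketch of modulo routing across shard tables (the shard count and names are assumptions; the routing normally lives in the application or middleware):

-- four shard tables with the same structure as the original user table
create table user_0 like user;
create table user_1 like user;
create table user_2 like user;
create table user_3 like user;
-- a row with id = 10005 is routed to shard 10005 % 4 = 1, i.e. user_1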

Horizontal splitting can support very large amounts of data. Note that splitting tables alone only solves the problem of a single table being too large; since the shards still sit on the same machine, it does little for concurrency, so horizontal splitting works best combined with splitting across databases.

Horizontal splitting supports very large data volumes with little change on the application side, but sharded transactions are hard to solve, joins across nodes perform poorly, and the logic becomes complicated.

The author of "The Way to Train Java Engineers" recommends avoiding data sharding as much as possible, because of the complexity it brings to logic, deployment, and operations. An ordinary table can support data volumes below 10 million rows with proper optimization. If sharding is unavoidable, choose a client-side sharding architecture, which saves a network I/O hop through middleware.

Horizontal table splitting: a large table can be split to reduce the number of data and index pages read during a query, reduce the number of index levels, and improve query performance

Applicable scenarios

1. The data in the table is naturally independent; for example, the table records data for separate regions or separate periods, especially when some of the data is commonly used and some is not. 2. The data needs to be stored on multiple media.

Disadvantages of horizontal splitting: 1) it adds complexity to the application; queries usually involve multiple table names, and querying all the data requires UNION operations; 2) in many database applications this complexity outweighs the advantages, and queries incur more disk reads for an extra index level.

Two common solutions for sharding a database:

Client proxy: the sharding logic lives on the application side, packaged in a jar, and is implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Ali's TDDL are two commonly used implementations. Middleware proxy: a proxy layer is added between the application and the data, and the sharding logic is maintained centrally in the middleware service. Mycat, 360's Atlas, NetEase's DDB and so on are implementations of this architecture.

Problems faced after sub-database sub-table

  • Transaction support. Once databases and tables are split, a transaction becomes a distributed transaction. Relying on the database's own distributed transaction management carries a high performance cost; having the application coordinate instead creates transactions in program logic and burdens the programmer.
  • Cross-database joins

As long as the data is segmented, cross-node joins are unavoidable, but good design and sharding can reduce them. A common way to solve the problem is in two queries: find the ids of the associated data in the result set of the first query, then issue a second request with those ids to fetch the associated data.

  • Cross-node COUNT, ORDER BY, GROUP BY, and aggregate functions. These form one class of problem, because they all must be computed over the entire data set, and most proxies do not merge the results automatically. Solution: as with cross-node joins, fetch the results on each node and merge them on the application side. Unlike a join, the per-node queries can run in parallel, so this is often much faster than one large table; but if the result set is large, application memory consumption becomes an issue.
  • Data migration, capacity planning, and scaling. The Taobao integrated business platform team uses the forward-compatible property of remainders modulo powers of 2 (for example, a number with remainder 1 modulo 4 also has remainder 1 modulo 2) to assign data, which avoids row-level data migration; table-level migration is still needed, however, and both the scale of expansion and the number of shard tables are limited. Generally speaking, none of these schemes is ideal; each has drawbacks of one kind or another, which reflects from one angle how difficult it is to scale out sharding.
  • ID problem
  • Once the database is sharded across multiple physical nodes, we can no longer rely on the database's own primary key generation mechanism. On the one hand, an ID generated by one shard is not guaranteed to be globally unique; on the other hand, the application needs the ID before inserting data, in order to route the SQL. Some common primary key generation strategies:

UUID: using a UUID as the primary key is the simplest solution, but the disadvantages are obvious. A UUID is very long; besides taking a lot of storage space, the main problem is the index: building an index and querying through it both perform poorly. Snowflake: distributed systems often need to generate a globally unique ID; the Snowflake algorithm meets this need and is very simple to implement. Apart from configuration information, the core is a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit per-millisecond sequence number.

  • Sort paging across shards

Generally speaking, paging requires sorting by a specified field. When the sort field is the shard field, the sharding rule easily locates the target shard; when it is not, things become more complicated. To guarantee a correct final result, the data must be sorted and returned within each shard node, then the result sets from the different shards must be merged and sorted again, and finally returned to the user.

MySQL replication principle and process

Master-slave replication: DDL and DML operations are transferred from the master database to the slave database via the binary log, and the logs are then re-executed (redone) on the slave. This keeps the slave's data consistent with the master's.

The role of master-slave replication

  • If the primary database has a problem, you can switch to the secondary database.
  • Read/write separation can be performed at the database level.
  • Daily backups can be made on a slave database.

Problems MySQL master-slave replication solves

  • Data distribution: Start or stop replication at will and distribute data backups across geographic locations
  • Load balancing: Reduce the stress on a single server
  • High availability and failover: Helps applications avoid single points of failure
  • Upgrade testing: you can use a higher version of MySQL as the slave library

How MySQL master-slave replication works

  • The master logs data changes to its binary log
  • The slave copies the master's binary log into its own relay log
  • The slave reads events from the relay log and replays them against its own data
  • The basic flow involves three threads and the links between them:

Master: the binlog thread records all statements that change data into the master's binlog;

Slave: the I/O thread, after START SLAVE is issued, pulls binlog content from the master and writes it into the slave's relay log;

Slave: the SQL thread executes the statements in the relay log;
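A few statements for inspecting this flow (output fields vary by version):

-- on the master: current binlog file and position
SHOW MASTER STATUS;
-- on the slave: start replication, then check the two threads
START SLAVE;
SHOW SLAVE STATUS\G
-- look for Slave_IO_Running: Yes and Slave_SQL_Running: Yes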

Replication process

Binary log: Binary log of the primary database

Relay log: the relay log of the slave server

Step 1: before each transaction that updates data completes, the master serially writes the operation records into its binlog file.

Step 2: the slave starts an I/O thread, which opens an ordinary connection to the master and requests the binlog content. If reading has caught up with the master, the thread sleeps and waits for the master to generate new events. The I/O thread's ultimate goal is to write these events into the relay log.

Step 3: the SQL thread reads the relay log and executes the SQL events in order, so as to stay consistent with the data in the master database.

What are the solutions for read/write separation?

Read/write separation depends on master-slave replication, and master-slave replication in turn serves read/write separation. Master-slave replication requires that the slave accept no writes, only reads (if a write is performed on the slave, show slave status will show Slave_SQL_Running=NO, and the slave must then be re-synchronized by hand as mentioned above).

Scheme 1

Use the mysql-proxy proxy

Advantages: implements read/write separation and load balancing directly, without code changes; the master and slave use the same account. This proxy is not recommended for actual production use

Disadvantages: Reduced performance, no transaction support

Scheme 2

Use AbstractRoutingDataSource + AOP + annotations at the DAO layer to decide the data source. If MyBatis is used, read/write separation can be placed in the ORM layer: for example, a MyBatis plugin can intercept SQL statements, sending all insert/update/delete to the master and all select to the slave. This is transparent to the DAO layer, and the plugin can choose the master or slave library via annotations or by analyzing whether the statement reads or writes. One problem remains: transactions are not supported, so DataSourceTransactionManager must be rewritten to route read-only transactions to the read library and transactions that both read and write to the write library.

Scheme 3

Use AbstractRoutingDataSource + AOP + annotations at the service layer to decide the data source; this can support transactions.

Disadvantage: AOP does not intercept a class's internal methods calling each other in the this.xx() form, which requires special handling.

Backup plan, and how mysqldump and XtraBackup work

(1) Backup plan

The plan depends on the size of the library. For libraries up to about 100GB, mysqldump can be used for a daily full backup during off-peak hours, because it is light and flexible (its backup files are comparatively small, and smaller still after compression).

For libraries over 100GB, XtraBackup backs up faster than mysqldump: perform a full backup once a week and an incremental backup every day in between, during off-peak service periods.

(2) Backup and restoration time

Physical backups restore quickly; logical backups restore slowly

This depends on the machine, especially disk speed; the following figures are for reference only:

  • 20 GB in 2 minutes (mysqldump)
  • 80 GB in 30 minutes (mysqldump)
  • 111 GB in 30 minutes (mysqldump)
  • 288 GB in 3 hours (XtraBackup)
  • 3 TB in 4 hours (XtraBackup)

Logical import usually takes more than five times as long as the backup

(3) How to handle the backup and restoration failure

First, make full preparations before restoring, to avoid errors during recovery: for example, check the backup's validity, permissions, and available space after backing up. If an error occurs, adjust according to the error message.

(4) How mysqldump and XtraBackup work

mysqldump

mysqldump is a logical backup. Adding the --single-transaction option gives a consistent backup: the background session first sets its transaction isolation level to REPEATABLE READ (SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ) and then explicitly starts a transaction (START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */), which guarantees that all data read within the transaction comes from the snapshot taken when the transaction began. The table data is then read out. If --master-data=1 is also given, mysqldump first takes a global read lock (FLUSH TABLES WITH READ LOCK), records the binlog position at that moment (SHOW MASTER STATUS) once the transaction is open, releases the lock immediately, and then reads the table data. When all the data has been exported, the transaction can be ended.

XtraBackup:

XtraBackup is a physical backup. It copies the tablespace files directly while continuously scanning and saving the redo logs being generated. When the InnoDB backup finishes, it performs a flush engine logs operation to make sure all redo logs have been flushed to disk (this involves the two-phase commit).

Because XtraBackup does not copy the binlog, it must ensure that all redo logs reach disk, otherwise the last group of committed transactions could be lost. That point in time is when InnoDB finishes its backup; although the data files are not consistent at that moment, the redo collected over the window can bring them to consistency (this is what recovery does). It then runs FLUSH TABLES WITH READ LOCK to back up the tables of MyISAM and other engines, and unlocks afterwards. This achieves a near-perfect hot backup.

What are the repair methods for data table corruption?

Use myisamchk to repair it.

1) Stop the MySQL service before the repair. 2) Open a command line and go to MySQL's bin directory. 3) Run myisamchk --recover against the .MYI files under the database's data directory. Alternatively, use the REPAIR TABLE or OPTIMIZE TABLE commands: REPAIR TABLE table_name repairs a damaged table; OPTIMIZE TABLE table_name reclaims unused space. When rows are deleted from a table, the disk space they occupied is not reclaimed immediately; the OPTIMIZE TABLE command reclaims that space and rearranges the rows on disk.

Source: author: ThinkWon thinkwon.blog.csdn.net/article/det…