One, SQL statement execution process

MySQL can be divided into Server layer and storage engine layer.

Server layer:

  • The connector: after the TCP handshake, the server authenticates the login user. Once user A has created a connection, an administrator's changes to user A's permissions do not affect the existing connection; user A must log in again for them to take effect.
  • The query cache: where query results are cached. It has been removed in MySQL 8.0, because the cache was invalidated far too often.
  • The analyzer: checks whether the SQL statement you entered conforms to MySQL syntax rules.
  • The optimizer: when multiple execution strategies can achieve the goal, the system automatically selects the best one to execute.
  • The executor: checks whether you have the needed permissions, then submits the final task to the storage engine.

Storage engine layer

Responsible for data storage and retrieval. Its architecture is plug-in based, supporting InnoDB, MyISAM, Memory and other storage engines. The most commonly used is InnoDB, which has been the default storage engine since MySQL 5.5.5.

SQL Execution sequence

Two, BinLog, RedoLog, UndoLog

BinLog

The BinLog is a binary log that records all changes to database table structure (such as CREATE and ALTER TABLE) and to table data (INSERT, UPDATE, DELETE). BinLog files are used for synchronization between the primary and secondary databases. The BinLog has three modes.

The STATEMENT model

Content: the binlog records only SQL statements that may cause data changes.

Advantage: because no actual data is recorded, log volume and I/O are low in this mode, and performance is best.

Disadvantages: some operations are nondeterministic. For example, the uuid() function randomly generates unique identifiers; when the binlog is replayed, the data this operation generates will differ from the original, which can have unexpected consequences.

The ROW pattern

Content: In this mode, binlog records the source data and modified target data for each operation. StreamSets require this mode.

Advantages: Accurate restoration ensures data security and reliability, and replication and data recovery can be performed concurrently

Disadvantages: the binlog can become very large, and for operations touching many rows or long fields the performance cost is serious; reading the log also requires special tooling.

MIXED mode

Content: A combination of STATEMENT and ROW modes.

Details: for most operations, STATEMENT is used to record the binlog, except for the following, which use ROW: the NDB storage engine, nondeterministic functions such as UUID(), INSERT DELAYED statements, and temporary tables.
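As a quick check, here is a minimal sketch of inspecting and switching the binlog mode (the variable and statements are standard MySQL; the binlog file name is a placeholder):

SHOW VARIABLES LIKE 'binlog_format';             -- STATEMENT / ROW / MIXED
SET GLOBAL binlog_format = 'ROW';                -- new sessions pick this up; ROW is the default since MySQL 5.7.7
SHOW BINARY LOGS;                                -- list binlog files
SHOW BINLOG EVENTS IN 'binlog.000001' LIMIT 10;  -- peek at the events in one file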

Master/slave synchronization process:

1. The master node must enable binary logging to record any event that modifies database data.

2. The slave node starts a thread (the I/O thread) that acts as a MySQL client and requests events from the master's binary log file via the MySQL protocol.

3. The master node starts a dump thread that checks the events in its binary log against the requested position. If no position parameter is given, the master sends events to the slave one by one, starting from the first event of the first log file.

4. After receiving data from the master, the slave places it in the relay log file and records the corresponding position within the master's specific binary log file (the master may have multiple binary files).

5. The slave starts another thread (the SQL thread) that reads events from the relay log and replays them locally.
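For reference, a minimal sketch of wiring this process up on the slave (host, account, and coordinates are placeholders; this is the classic pre-8.0 CHANGE MASTER TO syntax):

-- On the slave: tell the I/O thread where to fetch binlog events from
CHANGE MASTER TO
  MASTER_HOST = '10.0.0.1',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '******',
  MASTER_LOG_FILE = 'binlog.000003',
  MASTER_LOG_POS = 154;

START SLAVE;          -- starts both the I/O thread and the SQL thread
SHOW SLAVE STATUS\G   -- check Slave_IO_Running / Slave_SQL_Running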

MySQL's default replication mode is asynchronous (with parallel replication capability): the master ships logs to the slaves without waiting for them. The problem is that if the master crashes while a slave has not caught up, the missing logs are lost once that slave is promoted to master. Two concepts address this.

1. Full synchronous replication

The master writes the binlog and forces the log to be synchronized to the slaves, returning to the client only after all slave libraries have finished executing; performance obviously suffers this way.

2. Semi-synchronous replication

The logic of semi-synchronous replication is as follows: the slave database sends an ACK to the master database after writing to the log successfully. The master database considers the write operation complete when it receives at least one acknowledgement from the slave database.
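A sketch of turning this on with the semi-sync plugins that ship with MySQL (plugin and variable names are the stock ones; the .so paths assume Linux):

-- On the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- ms to wait for an ACK before degrading to async

-- On each slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;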

From here you can extend to master/slave lag caused by mismatched master/slave configurations, large transactions on the master, overloaded slaves, and network flapping. How do you avoid this problem? During a master/standby switchover, do you put reliability first or availability first? How do you decide that the primary has crashed? How do you avoid circular replication in a dual-master setup? How do you correctly restore a dropped database? (It goes deeper and deeper… it starts to feel like DBA territory.)

RedoLog

You can understand it through the following analogy:

A shopkeeper can record bills in a ledger or on a chalkboard. There are two ways to record a credit purchase:

1. Pull out the ledger and directly add to or subtract from that customer's credit.

2. First jot the entry on the chalkboard, then transfer it into the ledger after closing.

When business is busy, the shopkeeper chooses the latter, because the former is too much trouble: you must dig through a thicket of records to find the customer's credit total, take out the abacus, and write the result back into the ledger.

Similarly, in MySQL, if every update operation had to be written to disk, with the disk first locating the corresponding record and then updating it, the I/O and lookup cost of the whole process would be very high. MySQL uses Write-Ahead Logging, which is exactly the chalkboard-and-ledger process: write the log first, write the disk later. Here, ledger = BinLog, chalkboard = RedoLog.

1. The InnoDB engine first writes the record to the RedoLog and updates memory; later, when idle, it flushes the operation's changes to the data file on disk.

2. If there are more updates than the RedoLog can hold, InnoDB first writes some of the RedoLog's changes to disk and then erases those entries. The RedoLog works like a turntable, written in a circle.

The RedoLog has a write pos and a checkpoint:

Write pos: the current write position; it moves backward as writing proceeds, wrapping back to the start of file 0 after the end of file 3.

Checkpoint: the current erase position, which also moves backward cyclically; records are flushed to the data file before being erased.

Between write pos and checkpoint is the blank part of the chalkboard, available for recording new operations. If write pos catches up with checkpoint, the board is full and no new updates can proceed; MySQL must pause, flush some records, and push the checkpoint forward.

With the redo log, InnoDB guarantees that even if the database restarts unexpectedly, previously committed records are not lost. This capability is called crash-safe.
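In practice this guarantee also depends on the flush settings; a sketch of the usual "double 1" configuration (standard variable names):

SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- fsync the redo log at every commit
SET GLOBAL sync_binlog = 1;                     -- fsync the binlog at every commit
SHOW VARIABLES LIKE 'innodb_log_file%';         -- redo log sizing: the files write pos cycles through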

Redo log two-phase commit: ensures that the binlog and the redo log stay logically consistent. The commit process is:

1. redo log prepare phase; 2. write the binlog; 3. commit

  1. If a crash occurs before step 2, recovery on restart finds no commit record and rolls the transaction back; a backup restore has no binlog for it either. Consistent.
  2. If a crash occurs before step 3, the transaction has not yet committed, but because the prepare record and the binlog are both complete, recovery automatically commits it; a backup restore has the binlog. Consistent.

Binlog differs from redolog:

  1. The redo log is specific to InnoDB; the binlog is implemented in MySQL's Server layer and is available to all engines.
  2. The redo log is a physical log, recording what change was made on which data page; the binlog is a logical log, recording the original logic of the statement, such as "add 1 to the c field of the row with ID=2".
  3. The redo log is written cyclically within a fixed amount of space; the binlog is appended: when a binlog file reaches a certain size, writing switches to the next file without overwriting earlier logs.

UndoLog

UndoLog is generally a logical log, of two kinds:

  1. insert undo log

The undo log generated when a transaction inserts a new record. It is needed only if the transaction rolls back, and can be discarded as soon as the transaction commits.

  2. update undo log

The undo log generated when a transaction updates or deletes a record. It is needed not only for transaction rollback but also for snapshot reads; the purge thread removes it only once it is no longer involved in any snapshot read or transaction rollback.

Three, MySQL indexes

Common models for indexing are hash tables, ordered arrays, and search trees.

Hash table: a KV data storage structure; suitable only for equality queries, not range queries.

Ordered arrays: work only for static storage (no updates); inserts are a hassle. Compare ArrayList in Java.

Search tree: in data-structure terms this is a binary search tree, but here it is an N-ary tree (the B+ tree), widely used in the storage engine layer.

The advantages of B+ tree over B tree are as follows:

  1. Non-leaf nodes of a B+ tree store only indexes and can hold far more entries; B+ trees are shorter and wider than B trees and need fewer I/Os.
  2. B+ tree leaf nodes are linked to their neighbors, which makes range queries convenient; and since all results live in leaf nodes, query cost is stable.
  3. B+ trees are better suited to sequential data scans, avoiding the backtracking scans B trees need.

Advantages of indexes:

1. A unique index ensures the uniqueness of each row.

2. Indexes improve query speed.

3. Indexes speed up joins between tables.

4. Indexes significantly reduce grouping and sorting time in queries.

5. With indexes, the query optimizer can do its work during query processing and improve system performance.

Disadvantages of indexes:

1. Creation and maintenance are time-consuming

2. When creating an index, you need to lock the table. When locking the table, other data operations may be affected

3. Indexes take disk space to store, and that space can be consumed quickly.

4. When CRUD is performed on the data in the table, index maintenance will also be triggered, which takes time and may degrade data operation performance

The principles of index design should not:

1. More indexes are not always better: with too many indexes, maintaining them costs both time and space.

2. Frequently updated data should not be indexed.

3. There is no need to create indexes for small tables.

Should be:

1. Create indexes on columns with a low repetition rate: with fewer duplicates, index-tree queries are more efficient, and the higher the cardinality, the better.

2. For columns whose data is unique, create a unique index, ensuring data correctness at the database level.

3. Create indexes on columns that appear frequently in group by and order by: this greatly improves grouping and sorting efficiency.

4. Create indexes on fields commonly used in query conditions: querying by index is faster.

Index failure scenario

1. Fuzzy search: a leading or double wildcard invalidates the index, e.g. '%a' and '%a%'; a trailing wildcard can still use the index, e.g. 'a%'.

2. Implicit type conversion: e.g. select * from t where name = xxx, where name is a string column but xxx is an unquoted number; the resulting cast invalidates the index.

3. OR conditions: e.g. select * from t where name = 'sw' or age = 14; if any column in the OR is unindexed, the whole statement falls back to a full table scan.

4. Violating the leftmost prefix of a composite index: with a composite index on (A, B, C), a query whose conditions only touch (B, C) cannot use it. A sketch of all four scenarios follows.
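A sketch that makes the four scenarios concrete (hypothetical table; run each query through EXPLAIN and watch the key column):

CREATE TABLE t (
  id   INT PRIMARY KEY,
  name VARCHAR(32),
  age  INT,
  a INT, b INT, c INT,
  KEY idx_name (name),
  KEY idx_abc (a, b, c)
);

EXPLAIN SELECT * FROM t WHERE name LIKE '%sw';          -- left fuzzy: full scan
EXPLAIN SELECT * FROM t WHERE name LIKE 'sw%';          -- right fuzzy: uses idx_name
EXPLAIN SELECT * FROM t WHERE name = 123;               -- implicit cast on name: index lost
EXPLAIN SELECT * FROM t WHERE name = 'sw' OR age = 14;  -- age has no index: full scan
EXPLAIN SELECT * FROM t WHERE b = 1 AND c = 2;          -- skips leftmost column a: idx_abc unused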

Index information:

Primary key index: the leaf node of the primary key index stores the entire row of data. In InnoDB, the primary key index is also called the clustered index. Auto-increment primary keys are not guaranteed to be continuous: unique key conflicts, transaction rollbacks and so on can create gaps.

Unique index: An index generated from a unique column that does not allow duplicate values but does allow NULL values

InnoDB reads and writes data in units of pages; by default each page is 16KB.

Change buffer: a mechanism that speeds up updates on ordinary (non-unique) secondary indexes. If the target page is not in memory, the change to an ordinary index can simply be cached in the change buffer. A unique index cannot use it, because the relevant data must be read into memory anyway to check that uniqueness is not violated; so prefer ordinary indexes where you can.

Non-primary key index: The content of a leaf node that is not a primary key index is the primary key value. In InnoDB, non-primary key indexes are also called secondary indexes.

Back to table: a query first scans a secondary index tree to find the row's primary key ID, then retrieves the data the index does not provide from the primary key index using that ID. That is, a query on a non-primary-key index has to scan one more index tree.

Covering index: if an index contains (covers) the values of all the fields a query needs, it is called a covering index.
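A minimal sketch of a covering index saving the back-to-table step (hypothetical table; when the index covers the query, EXPLAIN's Extra column shows Using index):

CREATE TABLE t_user (
  id   INT PRIMARY KEY,
  name VARCHAR(32),
  age  INT,
  city VARCHAR(32),
  KEY idx_name_age (name, age)
);

-- Covered: name and age both live in idx_name_age, no back-to-table needed
EXPLAIN SELECT name, age FROM t_user WHERE name = 'sw';  -- Extra: Using index

-- Not covered: city must be fetched from the clustered index via the primary key
EXPLAIN SELECT city FROM t_user WHERE name = 'sw';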

A combined index is an index that combines multiple columns. A maximum of 16 columns can be combined at a time.

Leftmost-prefix rule: a composite index is created over multiple fields at once, and (A,B,C) and (A,C,B) are two different composite indexes. Creating the composite index (A, B, C) is equivalent to having indexes on A, (A,B), and (A,B,C). Note that a composite index is physically a single index; it does not really create several indexes, it only behaves as if it did.

Index condition pushdown: MySQL 5.6 introduced this optimization. During an index traversal, conditions on fields contained in the index are evaluated first, filtering out non-matching records and reducing the number of times MySQL goes back to the table.
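The usual illustration, reusing the hypothetical t_user table above: the prefix match locates entries by name, and pushdown filters age inside the index before going back to the table, so EXPLAIN shows Using index condition:

EXPLAIN SELECT * FROM t_user
WHERE name LIKE 'Zhang%' AND age = 10;  -- Extra: Using index condition (MySQL 5.6+)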

Index maintenance: B+ trees involve page splitting and page merging in order to maintain index order. Consider page space utilization when adding or deleting data.

Auto-increment primary key: systems generally use a business-neutral auto-increment primary key, which appends in order and does not trigger leaf node splits.

Deferred association: use a covering-index query to fetch only the needed primary keys, then join back to the original table on those keys to obtain the desired rows.
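A sketch of deferred association applied to deep pagination (hypothetical t_order table with a secondary index on create_time; the subquery pages through that covering index alone, avoiding a million back-to-table lookups):

-- Naive deep page: fetches and discards 1,000,000 full rows
SELECT * FROM t_order ORDER BY create_time LIMIT 1000000, 10;

-- Deferred association: page over primary keys via the covering index, then join back
SELECT o.*
FROM t_order o
INNER JOIN (SELECT id FROM t_order ORDER BY create_time LIMIT 1000000, 10) tmp
        ON o.id = tmp.id;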

InnoDB storage: the *.frm file is the definition file describing what the database table looks like; the *.ibd file stores all of the table's index trees and row records.

The MyISAM storage: *.frm file is a definition file that defines what a database table should be. The *.myd file is the MyISAM file that stores all rows of the engine table. *.myi file is MyISAM storage engine table index related data file. Under MyISAM, table data and table index data are stored separately.

MyISAM queries: under MyISAM, both primary and secondary key indexes are non-clustered. Whether a query goes through the primary key index or a non-primary-key index, the leaf node yields the address of the target data, and the target data is then looked up in the data file through that address.

PS: InnoDB supports clustered indexes, MyISAM does not

Four, SQL transaction isolation levels

Four properties of ACID

  1. Atomicity: multiple operations are put into a single transaction, which guarantees they either all succeed or all fail.
  2. Consistency: a series of operations on the data executes without ill effects on it, such as values appearing or vanishing out of thin air.
  3. Isolation (also called independence): multiple transactions do not interfere with each other; even when they run concurrently, they are simply two executions with no intersection that do not affect one another. In practice, isolation need not be this absolute, and some interference is sometimes allowed, which is why MySQL supports four transaction isolation levels.
  4. Durability: once an operation ends, it stays that way; its result is persisted in the logs.

PS: the difference between the C in ACID and the C in the CAP theorem:

The C in ACID means that within a single database transaction, the data remains complete and correct: data does not disappear or appear out of nowhere. The C in CAP refers to consistency among reads and writes across multiple replicas of the same data.

Data problems that may occur with transaction operations

1. Dirty read: transaction B has changed data but not committed, yet transaction A has already seen and used it. If transaction B rolls back, transaction A has acted on wrong data.

2. Non-repeatable read: the key point is modification. Under the same condition, you read the data, and reading it again yields a different value. Locking only the records that match the condition is enough to prevent this.

3. Phantom read: transaction A sets the status field of every record in a table to "processed" without committing; transaction B then inserts an unprocessed record and commits; when transaction A queries again, it finds one record still unprocessed, as if it had seen a phantom. Phantom reads refer only to newly inserted rows. They cause both semantic problems and data-consistency problems.

4. At the RR isolation level, ordinary queries are snapshot reads and do not see data inserted by other transactions, so phantoms appear only under current reads. Gap locks are used to solve this problem.

Before discussing isolation levels, the first thing to know is that the stricter the isolation, the lower the efficiency, so much of the time we must strike a balance between the two. The SQL-standard transaction isolation levels, from low to high, are as follows:

From the lowest level to the highest, parallel performance decreases while safety increases.

Read uncommitted: another transaction has not committed yet, but I can already read its changes in my transaction.

Read committed (Oracle's default): another transaction has committed, so I can read its changes in my transaction.

Repeatable read (MySQL's default): within my own transaction I always read the same data, guaranteeing repeatable-read consistency.

Serializable: no one else can change the data until my transaction commits.

MySQL InnoDB defaults to the RR level, yet phantom reads do not occur. When transaction A updates a field on all records of a table, it acquires a lock on that table. Because transaction A has not committed, the lock it holds is not released; when transaction B then tries to insert a new record into the table, it cannot obtain the lock and the insert blocks. Only after transaction A commits and releases the lock can transaction B proceed. So it can be said that MySQL's RR isolation level solves dirty reads, non-repeatable reads, and phantom reads.
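For experimenting with the levels, a small sketch (the variable is transaction_isolation in MySQL 8.0, tx_isolation on 5.x):

SELECT @@transaction_isolation;                         -- current level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT * FROM t WHERE id = 1;  -- under RC, re-reading here sees others' committed changes
COMMIT;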

Five, MySQL locks

Both concurrent programming in Java and concurrent database operations involve locks; researchers introduced the design ideas of pessimistic locking and optimistic locking.

Pessimistic locking:

Advantages: suitable for write-heavy, read-light concurrent environments. Although it cannot sustain very high performance, it guarantees data safety in situations where optimistic locking cannot deliver better performance.

Disadvantages: locking increases system overhead; it ensures data safety, but throughput drops, so it is not suited to read-heavy scenarios.

Optimistic locking:

Advantages: in read-heavy, write-light concurrent scenarios, it avoids the overhead of database locking and improves the response time of the DAO layer. In many cases ORM tools ship with optimistic-locking implementations, so you often need not implement it by hand.

Disadvantages: in write-heavy, read-light concurrent scenarios, i.e. under fierce write contention, CAS-style retries occur frequently; the conflict rate is high, and the cost can end up higher than pessimistic locking.

Implementation: database-level optimistic locking is similar in spirit to CAS; it can be implemented with a data version number, or likewise with a timestamp.
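A common sketch of version-number optimistic locking in plain SQL (table and column names are illustrative):

SELECT balance, version FROM account WHERE id = 1;  -- suppose this returns version = 7

UPDATE account
SET balance = balance - 100,
    version = version + 1
WHERE id = 1 AND version = 7;
-- 0 affected rows means a concurrent writer bumped the version first: re-read and retry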

There are three main database concurrency scenarios:

Read-read: There are no problems and no concurrency control is required

Read-write: has isolation problems; may run into dirty reads, phantom reads, and non-repeatable reads.

Write-write: may have lost-update problems, e.g. type-1 and type-2 lost updates.

There are two types of lost-update problem:

Type 1 lost update: transaction A's rollback overwrites transaction B's committed result.

Type 2 lost update: transaction A's commit overwrites transaction B's committed result.

To implement the idea of locking properly, MySQL introduces a whole variety of locks:

Classification of locks

MySQL supports three granularities of locking:

1. Table-level lock

The coarsest lock granularity in MySQL. The most commonly used engines, MyISAM and InnoDB, both support table-level locks.

2. Page-level locking

A lock whose granularity sits between row-level and table-level locks. Table-level locks are fast but conflict-prone, while row-level locks conflict rarely but are slower, so page-level locks take a compromise, locking an adjacent group of records at a time.

3. Row-level lock

The finest-grained lock in MySQL, locking only the current row. Row-level locking greatly reduces conflicts in database operations, but it is not necessarily better than table-level locking: the finer the granularity, the higher the cost. Whereas a table lock is taken directly at the head of the table, a row lock must scan to find the target row before locking it, which is actually more expensive. Table locks and row locks each have their own advantages.

Locks in MyISAM

  1. Although MySQL supports table-, page-, and row-level locking, the MyISAM storage engine supports only table locks. MyISAM's locking is therefore relatively cheap, but its concurrent data-operation performance is relatively low. However, if all writes are tail inserts, some degree of read-write concurrency is still possible.

  2. MyISAM supports concurrent reads, but not general read-write or write-write concurrency, so it suits read-mostly applications; it also sees little use in general engineering nowadays.

Locks in InnoDB

InnoDB supports a great many kinds of locks, as follows:

Shared and Exclusive Locks

Intention Locks

Record Locks

Gap Locks

Next-key Locks

Insert Intention Locks

Auto-inc Locks (for auto-increment columns)

Predicate Locks (for spatial indexes)

Lock in share mode (shared read lock):

Used to ensure that the data you read is the latest and is not being modified by other transactions: others cannot modify it while you hold the lock. You may not be able to modify it either, because other transactions may also hold a share-mode lock on the same data; and if transactions fail to commit or roll back promptly, many transactions can be left waiting.

For update (exclusive write lock):

Used to ensure that the data you query is the latest and that only you can modify it: add for update to the query, which takes the same lock as an UPDATE statement. In busy periods, if transactions are not committed or rolled back promptly, other transactions may wait a long time, hurting the database's concurrency.
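A two-session sketch of both flavors (session labels are comments; LOCK IN SHARE MODE is spelled FOR SHARE in MySQL 8.0):

-- Session A: shared read lock; others may also share-lock, but not modify
BEGIN;
SELECT * FROM t WHERE id = 5 LOCK IN SHARE MODE;

-- Session B: blocks until A commits or rolls back
UPDATE t SET age = 20 WHERE id = 5;

-- Exclusive variant: only the holder may modify; other locks must wait
BEGIN;
SELECT * FROM t WHERE id = 5 FOR UPDATE;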

Gap Lock:

1. Row locks can only lock rows; they cannot prevent inserts into the gaps between records, so MySQL introduced gap locks. A gap lock covers an interval that is open on both ends. Gap locks do not conflict with one another.

2. A gap lock plus a row lock together are called a next-key lock; each next-key lock is a front-open, back-closed interval.

Gap lock locking rules (easy to learn, easy to forget):

1. The basic unit of locking is the next-key lock, a front-open, back-closed interval.

2. Only objects accessed during the search are locked.

3. For an equality query on a unique index, the next-key lock degenerates into a row lock.

4. For an equality query on an index, when traversal to the right reaches a last value that does not satisfy the equality condition, the next-key lock degenerates into a gap lock.

5. For a range query on a unique index, access continues up to the first value that does not satisfy the condition.
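A sketch of a gap lock in action (assumes RR isolation and an index on age; suppose existing rows have age = 5 and age = 15, so (5, 15) is a gap):

-- Session A: a current read on a value inside the gap locks the gap (5, 15)
BEGIN;
SELECT * FROM t WHERE age = 10 FOR UPDATE;

-- Session B: this insert falls into the locked gap and blocks until A ends
INSERT INTO t (id, age) VALUES (100, 12);

-- Another transaction taking the same gap lock would NOT block:
-- gap locks conflict only with inserts into the gap, not with each other.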

Six, MVCC

MVCC:

1. MVCC (Multi-Version Concurrency Control) is a concurrency-control concept: maintain multiple versions of a piece of data so that read and write operations do not conflict.

2. In MySQL InnoDB, MVCC mainly serves to improve the database's concurrent performance: it handles read-write conflicts in a better way, achieving non-blocking concurrent reads without locking even when reads and writes conflict.

Current and snapshot reads under MySQL InnoDB

1. Current read

1. Operations such as select ... lock in share mode, select ... for update, update, insert, and delete are current reads: they read the latest version of the record, and they must also ensure that no other concurrent transaction can modify the record, so the record read is locked.

2. The current read can be seen as a concrete implementation of the pessimistic-locking idea.

2. Snapshot read

1. A plain, unlocked SELECT is a snapshot read: a non-blocking read that takes no locks. The precondition is that the isolation level is not serializable; at the serializable level, snapshot reads degrade to current reads. Snapshot reads exist to improve concurrency. They are implemented on top of multi-version concurrency control, MVCC; MVCC can be seen as a variant of row locking that, in many cases, avoids locking altogether and lowers overhead. Because it is multi-versioned, a snapshot read may return not the latest version of the data but an earlier historical version.

2. Snapshot read is how the MVCC idea is realized in MySQL's concrete non-blocking read feature; the purpose of MVCC is to resolve read-write conflicts without locking and to improve concurrent read-write performance.

3. Snapshot reads are one of the concrete non-blocking reads MySQL implements for us under the MVCC model.

MVCC was proposed because databases were not satisfied with the poor performance of using pessimistic locking to resolve read-write conflicts. This gives two combinations:

MVCC + Pessimistic lock: MVCC resolves read/write conflicts, while pessimistic lock resolves write conflicts

MVCC + Optimistic lock: MVCC resolves read/write conflicts, while optimistic lock resolves write conflicts

Realization principle of MVCC

The realization principle of MVCC is mainly based on the four implicit fields in the record, undo log and Consistent Read View.

Four implicit fields:

1. DB_TRX_ID:

6 bytes; the ID of the transaction that last modified (or inserted) the record, i.e. the transaction in which the record was created or most recently changed.

2.DB_ROLL_PTR

7byte, a rollback pointer to the previous version of this record (stored in the rollback segment)

3.DB_ROW_ID

6 bytes; a hidden, auto-increment row ID (a hidden primary key). If the table has no primary key, InnoDB automatically generates a clustered index using DB_ROW_ID.

4. FLAG

A hidden delete-flag field. A record that has been updated or deleted is not truly removed; only its delete flag changes.

Each transaction's modification of a record appends to that record's undo log, forming a linked list of record versions: the head of the chain is the newest old record, and the tail is the earliest.

Undo log: what actually serves MVCC is the update undo log; the undo log is, in effect, the chain of old record versions stored in rollback segments.

Consistent Read View: a Read View is the read view produced at the moment a transaction performs a snapshot read. At that instant, a snapshot of the database system is taken, recording and maintaining the IDs of the currently active transactions (each transaction is assigned an ID by InnoDB's transaction system when it starts, in strictly increasing order). Visibility is decided by comparing these IDs with the transaction ID stored in each record version; that is the general idea.

In the simplified model, MVCC keeps two hidden fields per row: the transaction ID that created the row version and the transaction ID that deleted it.

1. Insert:

InnoDB stores the current system version number as the row's version number for each newly inserted row.

2. Select:

1. InnoDB looks only for rows whose version is no later than the current transaction's version (i.e. the row's system version number <= the transaction's system version number). This ensures the transaction reads only rows that existed before it started, or that it inserted or modified itself.

2. The row's deletion version must be either undefined or greater than the current transaction's version number, ensuring that rows the transaction reads were not deleted before it began.

3. Only records satisfying both 1 and 2 are returned as query results.

3. Delete:

InnoDB stores the current system version number (transaction ID) as the deletion mark for each row that it deletes.

4. Update:

When InnoDB performs an update, it actually inserts a new row, saving its creation time as the current transaction ID, and it also writes the current transaction ID into the delete time of the old row being updated.

The above is only a surface-level account of how MVCC selects versions; at the source level, visibility is decided by comparing transaction IDs against the read view's low and high water marks. The specific implementation is easy to look up.

Key points:

1. The result of a snapshot read within a transaction depends heavily on where the transaction's first snapshot read occurs: that first snapshot read is critical, because it determines what all subsequent snapshot reads in the transaction can see.

2. At the RC isolation level, a fresh Read View is generated for every snapshot read. At the RR level, a Read View is created only at the first snapshot read in the transaction, and all subsequent snapshot reads use that same Read View.
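A two-session sketch of the difference (comments mark the interleaving; assumes a row (id=1, c=1)):

-- Session A, repeatable read:
BEGIN;
SELECT c FROM t WHERE id = 1;   -- returns 1; the read view is created here

-- Session B, autocommit, in between:
UPDATE t SET c = 2 WHERE id = 1;

-- Session A again:
SELECT c FROM t WHERE id = 1;   -- RR: still 1 (same read view)
COMMIT;                         -- under RC this read would return 2 (fresh read view)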

Seven, Buffer Pool

In order to speed up data access, the most frequently accessed data is stored in the cache to avoid accessing the database every time. The operating system will have a buffer pool mechanism to avoid accessing the disk every time, so as to speed up data access. As a storage system, MySQL also has a buffer pool mechanism to avoid disk I/O every time data is queried.

1. The reason it exists is to speed up queries.

2. Buffer pools are a common mechanism for reducing disk access.

3. The buffer pool usually caches data in units of pages (16KB).

4. The common buffer-pool management algorithm is LRU; memcache, operating systems, and InnoDB all use it.

5. InnoDB optimizes the plain LRU: the buffer pool is divided into an old generation and a new generation. Pages entering the buffer pool enter the old generation first and are promoted to the new generation only when accessed again, which solves the read-ahead failure problem. In addition, a page must stay in the old generation longer than a configured threshold before promotion, which solves the problem of batch scans evicting large amounts of hot data.

Read-ahead failure:

Because of read-ahead, a page is loaded into the buffer pool in advance, but MySQL never actually reads data from it; this is called read-ahead failure.

Buffer pool pollution:

When a SQL statement scans a large amount of data in one batch, it may evict every page in the buffer pool, replacing a large amount of hot data and making MySQL performance plummet. This is called buffer pool pollution. Solution: with the old-generation dwell-window policy, pages loaded within a short period are not immediately promoted to the head of the new generation; pages touched only once in a short window are evicted preferentially.
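This old/new split is controlled by real InnoDB variables; a sketch of inspecting them:

SHOW VARIABLES LIKE 'innodb_old_blocks_pct';   -- share of the pool kept for the old generation (default 37)
SHOW VARIABLES LIKE 'innodb_old_blocks_time';  -- ms a page must dwell in the old generation before promotion (default 1000)
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; -- overall pool size, the key InnoDB memory knob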

Eight, table slimming

Holes:

MySQL's delete command only marks the record's position or the data page as reusable; the disk file size does not shrink, so table space cannot be reclaimed with delete. The marked space can be reused, but while unused it looks like a hole. Inserts that cause page splits also create holes.

The idea behind rebuilding a table:

1. Create a table B with the same structure as table A.

2. Read the rows of table A out one by one in primary-key order and insert them into table B.

3. Swap table B in for table A to achieve the slimming effect.

Commands for rebuilding a table:

1. alter table A engine=InnoDB

2. Recommended: gh-ost on GitHub

Nine, SQL Joins, statistics, random queries

The seven types of Join are as follows:

Statistics:

1. In MyISAM, the total row count of a table is stored on disk and can be returned directly.

2. Because of MVCC, the InnoDB engine has to read the rows out and count them up.

3. Performance from worst to best: count(field) < count(primary key id) < count(1) ≈ count(*); just use count(*) wherever possible.

Random query:

mysql> select word from words order by rand() limit 3;

EXPLAIN on order by rand() shows it needs Using temporary and Using filesort, so avoid it in your designs.

mysql> select count(*) into @C from t;
set @Y1 = floor(@C * rand());
set @Y2 = floor(@C * rand());
set @Y3 = floor(@C * rand());
-- LIMIT cannot take a user variable directly, so build each statement and run it via PREPARE
set @sql = concat('select * from t limit ', @Y1, ',1');
prepare stmt from @sql; execute stmt; deallocate prepare stmt;
-- repeat likewise with @Y2 and @Y3

These statements each fetch one row at a random position, which is far cheaper than sorting the whole table with order by rand().

Comparison between exists and in:

1. With in, the subquery table is evaluated first, then the inner and outer tables form a Cartesian product that is filtered by the conditions.

2. With exists, the outer query runs first, and each row it returns is fed into the subquery to check for existence, filtering the overall result set.

3. When the two tables are about the same size, in and exists differ little. With a large inner table, exists is more efficient; with a small inner table, in is.

4. not in vs not exists: if a statement uses not in, both the inner and outer tables are fully scanned and no index is used; a not exists subquery, by contrast, can still use the indexes on its table. So not exists is faster than not in, whichever table is larger.
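A side-by-side sketch (tables A and B are hypothetical, with B.a_id indexed):

-- IN: evaluate the subquery over B first, then filter A against the result
SELECT * FROM A WHERE id IN (SELECT a_id FROM B);

-- EXISTS: drive from A, probing B's index once per outer row
SELECT * FROM A WHERE EXISTS (SELECT 1 FROM B WHERE B.a_id = A.id);

-- NOT EXISTS can still probe B's index; NOT IN generally triggers full scans
SELECT * FROM A WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.a_id = A.id);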

Ten, MySQL optimization

SQL optimization is mainly divided into four directions: SQL statement and index, table structure, system configuration, hardware.

The general optimization idea is to maximize the use of indexes, avoid full table scans as much as possible, and reduce invalid data queries:

1. Reduce data access: Set reasonable field types, enable compression, and reduce disk I/O through index access.

2. Return less data: return only the required fields and paginate results, reducing disk I/O and network I/O.

3. Reduce round trips: batch DML operations, stored procedures, and the like reduce the number of interactions with the database.

4. Reduce server CPU overhead: minimize database sorting and full-table queries to lower CPU and memory usage.

5. Table partitioning: using partitions can increase parallelism and make fuller use of CPU resources.

SQL statement optimization examples:

1. Build covering indexes sensibly: they effectively reduce back-to-table lookups.

2. union, or, and in can all hit indexes; in is recommended.

3. Negative conditions (!=, <>, not in, not exists, not like, etc.) cannot use indexes; prefer in.

4. Performing calculations on a column, or applying a function to it, invalidates the index and causes a full table scan.

5. Beware of implicit type conversion: comparing a string column with a bare integer triggers a CAST on the column and invalidates the index, whereas comparing an int column with a string literal still uses the index.

6. Fuzzy query with % prefix is not recommended.

7. In multi-table joins, put the small table first and the large table after. MySQL executes the table joins after FROM from left to right (Oracle does the opposite), and the first table is the one involved in a full scan.

MySQL parses WHERE clauses from left to right, top to bottom. Following this principle, put the most selective filter conditions first, shrinking the result set as fast as possible.

SQL tuning overview:

1. Use slow query logs to locate the SQL that needs to be optimized

2. Use the Explain execution plan to view index usage

3. Focus on the following (in general, index problems can be spotted from these four columns):

1. key (check whether an index was used)

2. key_len (check whether the index was used in full)

3. type (check the index access type)

4. Extra (check additional information: sorting, temporary tables, where conditions, etc.)

4. Optimize the SQL based on the index problems identified in the steps above
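A sketch of reading those four columns from a plan (the query reuses the hypothetical t_user table from earlier; the values shown are illustrative):

EXPLAIN SELECT name, age FROM t_user WHERE name = 'sw'\G
--          type: ref            <- index lookup; ALL would mean a full table scan
--           key: idx_name_age   <- the index actually chosen
--       key_len: 99             <- bytes of the index used: is the full index in play?
--         Extra: Using index    <- covering index, no back-to-table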

Table structure optimization:

1. Use TINYINT, SMALLINT, or MEDIUMINT as the integer type instead of INT where the range allows, and add UNSIGNED when values are non-negative.

2. The length of a VARCHAR allocates only the space it really needs.

3. Use TIMESTAMP instead of DATETIME.

4. Do not have too many fields in a single table.

5. Avoid NULLable fields: they are hard to optimize for and take extra index space. Give strings a default of ''.
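A sketch applying these rules in one DDL statement (names are illustrative):

CREATE TABLE t_account (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,          -- business-neutral auto-increment PK
  age        TINYINT UNSIGNED NOT NULL DEFAULT 0,           -- smallest integer type that fits
  name       VARCHAR(32) NOT NULL DEFAULT '',               -- NOT NULL, empty-string default
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- TIMESTAMP rather than DATETIME
  PRIMARY KEY (id)
) ENGINE = InnoDB;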

Read/write separation:

Write only on the master server, read only on the slave servers. The corresponding database cluster is generally one-master-one-slave or one-master-many-slaves. The business servers send all write operations to the master database and serve reads from the slaves, while the master synchronizes data to the slaves to keep them consistent. Read/write separation is generally implemented in one of two ways: code-level encapsulation or database middleware.

Database and table sharding: sharding comes in vertical and horizontal forms; generally vertical first, then horizontal.

1. Vertical database split: divide the application into modules, such as an order module, user module, commodity module, payment module and so on; this is the microservices idea.

2. Vertical table split: generally split out uncommonly used fields and fields holding large data.

3. Horizontal table split: choose a sharding field that fits the scenario. For example, with Taobao at ten million orders per day, userId is the sharding field; queries cover the latest six months of orders (older ones are archived), so the six-month volume is 1.8 billion rows. Split into 1024 tables, each storing about 2 million rows, and locate the right table with hash(userId) % 1024.

4. ID generator: distributed IDs must be globally unique and convenient for query and storage; retrieval must be able to rely on uniqueness and monotonic increase.

At present, the main popular sub-database sub-table tools are Mycat and Sharding-Sphere.

TiDB: an open-source distributed database combining the best features of traditional RDBMSs and NoSQL. TiDB is MySQL-compatible and supports unlimited horizontal scaling with strong consistency and high availability. Its goal is to serve as a one-stop solution for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) scenarios. TiDB has the following core features:

1. Support MySQL protocol (low cost of development and access).

2. 100% transaction support (data consistency implementation is simple and reliable).

3. Unlimited horizontal scaling (no need to plan database/table sharding) with non-stop service.

4. TiDB supports mutual backup with MySQL.

5. Follows JDBC conventions, with low learning cost; strongly relational and strongly consistent, with no master/slave configuration to worry about, no sharding to plan, and seamless dynamic scaling as well.

Suitable for:

1. When the original MySQL service hits a single-machine capacity or performance bottleneck, you can consider using TiDB to seamlessly replace MySQL.

2. MySQL complex query is slow under large data volume.

3. With a large data volume that grows rapidly toward the limit of single-machine processing, when you do not want a sharding scheme that is intrusive and constraining to the business, such as splitting databases and tables or adding database middleware.

4. With large data volumes, when high-concurrency real-time writes, real-time queries, and real-time statistical analysis are required.

5. When you need distributed transactions, 100% data consistency across multiple data centers, and highly available auto-failover.

Not suitable for:

1. TiDB is not used for scenarios that standalone MySQL can meet.

2. TiDB is usually not used in scenarios with fewer than 50 million rows; it is designed for large-scale data scenarios.

3. If your application has a small amount of data (all data below ten million rows) and does not have requirements such as high availability, consistency, or multi-datacenter replication, then TiDB is not suitable for you.

Written at the end

Welcome to follow my official account [calm as code]; plenty of Java-related articles and learning materials will be updated there, and the materials I collect will be placed in it.

If you think it's well written, give it a like and a follow! Follow me so you don't get lost; more updates are on the way!