3 Database

3.1 MySQL

3.1.1 Basics

1. Describe the internal process of executing a query in MySQL.

MySQL executes a query roughly as follows:
- The client connects to the MySQL server through the connector, which authenticates the connection and checks permissions.
- After the connector verifies permissions, the server checks the query cache. If the statement has been executed before and its result is cached, the cached result is returned directly; otherwise the statement goes to the analyzer.
- The analyzer performs lexical and syntax analysis to determine whether the SQL is valid. If the syntax is wrong, an error is returned to the client; if it is correct, the statement goes to the optimizer.
- The optimizer decides how to execute the statement, for example choosing which of several candidate indexes on a table offers the best performance.
- Finally the executor runs the statement against the storage engine, comparing rows until all rows that satisfy the condition have been found, and returns the result set.
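A minimal sketch for poking at two of these stages from a client session; the table name t_user is hypothetical. Note that the query cache was deprecated in MySQL 5.7 and removed in 8.0, so the first variable only exists on older versions.

```sql
-- Check whether the query cache layer is enabled (MySQL 5.7 and earlier)
SHOW VARIABLES LIKE 'query_cache_type';

-- Ask the optimizer which execution plan (e.g. which index) it chose for a statement
EXPLAIN SELECT * FROM t_user WHERE id = 1;
```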

2. What is the difference between InnoDB and MyISAM?

The biggest difference between InnoDB and MyISAM is that InnoDB supports transactions while MyISAM does not. Other major differences:
- InnoDB supports safe crash recovery; MyISAM does not.
- InnoDB supports row-level locking; MyISAM only supports table-level locking.
- InnoDB supports foreign keys; MyISAM does not.
- MyISAM is generally faster than InnoDB for read-heavy workloads.
- MyISAM supports FULLTEXT indexes; InnoDB did not support them until MySQL 5.6, and full-text search on InnoDB tables is often delegated to an external engine such as Sphinx, with better results.
- InnoDB primary key (clustered index) queries are faster than MyISAM's.
- For COUNT(*) without a WHERE clause, MyISAM keeps a stored row count and simply reads it, while InnoDB has to scan the table to count the rows.
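A small sketch of choosing the storage engine per table and checking which engine an existing table uses; the table names and columns are hypothetical.

```sql
-- Choose the engine explicitly when creating a table
CREATE TABLE t_order (
  id      BIGINT PRIMARY KEY,
  user_id BIGINT,
  amount  DECIMAL(10, 2)
) ENGINE = InnoDB;

CREATE TABLE t_log (
  id  BIGINT PRIMARY KEY,
  msg VARCHAR(255)
) ENGINE = MyISAM;

-- See which engine each table uses
SHOW TABLE STATUS WHERE Name IN ('t_order', 't_log');
```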

3. What are shared and independent tablespaces? What is the difference?

- Shared tablespace: the data and index files of all tables in a database are stored in one file. By default the shared tablespace file lives in the data directory.
- Independent tablespace (file-per-table): each table gets its own file for storage.
- Difference: with a shared tablespace, dropping a table does not shrink the tablespace file, so the file stays large even after the data is gone; with independent tablespaces, dropping a table deletes its file and the space is reclaimed.
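A sketch of checking and switching the file-per-table behavior; the setting only affects tables created afterwards, so treat it as illustrative.

```sql
-- ON means new InnoDB tables get their own .ibd file (independent tablespace)
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Enable it for tables created from now on
SET GLOBAL innodb_file_per_table = ON;
```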

4. How does MySQL handle deadlocks?

- Wait for a timeout: the transaction waits until the lock wait exceeds innodb_lock_wait_timeout and is then rolled back with an error.
- Active deadlock detection: when InnoDB detects a deadlock, it rolls back one of the transactions involved so that the other transactions can continue to execute.
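The two behaviors are controlled by the variables below; the deadlock-detection switch exists since MySQL 5.7.15.

```sql
-- How long a transaction waits for a row lock before giving up (seconds)
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

-- Whether active deadlock detection is enabled
SHOW VARIABLES LIKE 'innodb_deadlock_detect';

-- The LATEST DETECTED DEADLOCK section shows the most recent deadlock
SHOW ENGINE INNODB STATUS\G
```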

5. What is a global lock? What are its application scenarios? What problems does it cause?

- Concept: a global lock locks the entire database instance.
- Application scenario: taking a full logical backup, during which the whole instance is read-only.
- Problems: while the global lock is held, no update operation can run, so all write traffic is blocked and waiting; if the backup is taken on a slave instead, the slave cannot apply the binlog it receives, which causes significant master/slave replication delay.
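The classic way to take the global read lock is FLUSH TABLES WITH READ LOCK (FTWRL); for InnoDB-only instances, mysqldump's --single-transaction option is the usual alternative because it relies on a consistent snapshot instead of locking.

```sql
-- Put the whole instance into read-only mode for a logical backup
FLUSH TABLES WITH READ LOCK;

-- ... run the backup ...

-- Release the global lock
UNLOCK TABLES;
```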

6. What are MySQL's performance indicators? How do you obtain them?

- QPS (Queries Per Second): the number of queries the database handles per second.
- TPS (Transactions Per Second): the number of transactions the database handles per second.
These indicators can be estimated from the output of the show status command, which returns more than 300 status counters. The ones relevant to QPS and TPS are:
- Uptime: how long the server has been running, in seconds.
- Questions: the total number of statements sent to the server.
- Com_select: the number of SELECT statements actually executed.
- Com_insert: the number of INSERT statements.
- Com_delete: the number of DELETE statements.
- Com_update: the number of UPDATE statements.
- Com_commit: the number of commits.
- Com_rollback: the number of rollbacks.
A common estimate is QPS = Questions / Uptime and TPS = (Com_commit + Com_rollback) / Uptime.
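A sketch of pulling the raw counters; dividing the deltas between two samples by the elapsed seconds gives a more useful instantaneous QPS/TPS than dividing by the total Uptime.

```sql
-- Counters used to estimate QPS / TPS
SHOW GLOBAL STATUS LIKE 'Uptime';
SHOW GLOBAL STATUS LIKE 'Questions';
SHOW GLOBAL STATUS LIKE 'Com_commit';
SHOW GLOBAL STATUS LIKE 'Com_rollback';
```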

7. What are the important logs in MySQL?

① Error log: records errors that occur while the MySQL server is running, for example a failure to load a data file or incorrect permissions; in a replication environment, messages from the slave server's replication threads are also written here. The error log is enabled by default and cannot be disabled. By default it is stored in the data directory as hostname.err, where hostname is the server host name. Before MySQL 5.5.7 the database administrator could simply delete old error log files to save disk space; since 5.5.7 the server no longer supports this, and the usual procedure is to rename the current file and flush the logs manually to create a new one, e.g. mv hostname.err hostname.err.old followed by mysqladmin flush-logs.
② General (query) log: records every statement the server receives, whether or not the statement is correct. INSERT statements are logged even if they fail with a duplicate primary key or unique key error; UPDATE statements are logged because an update also has to query the rows it changes; DELETE statements are logged because only the rows matching the condition are removed. Under heavy concurrency this produces a huge amount of log data and high I/O, which hurts MySQL performance, so the general log is not recommended outside of a debugging environment. Enabling it can help analyze which statements are executed most intensively and whether the data behind frequently executed SELECTs could be cached, and it also helps with troubleshooting, so decide based on your actual situation. It is disabled by default and can be enabled with: set global general_log=1; set global log_output='table'; where general_log=1 enables the query log and 0 disables it. The command takes effect immediately; the MySQL server does not need to be restarted.
③ Slow query log: slow queries consume a lot of CPU, IOPS and memory, and when a database hits a performance bottleneck the cause is usually slow queries. The slow query log records statements whose execution time exceeds a configured threshold, so that operations staff can locate and analyze them to optimize database performance. It is disabled by default; slow statements are recorded only after the feature is enabled manually, with: set global slow_query_log='ON'; This only affects the running instance and is lost when the MySQL server restarts, so to make it permanent set slow_query_log=1 in my.cnf and restart the server (see the sketch at the end of this answer).
④ Redo log: to avoid the I/O bottleneck of writing every change straight to disk, MySQL uses a caching mechanism: data is first written in memory and then flushed back to disk in batches. To avoid losing data if the power fails or the system crashes before the flush, InnoDB uses the redo log.
⑤ Undo log: the undo log stores the value of a row before it was modified, so that if the modification needs to be undone, the row can be rolled back from the undo log. Unlike the redo log, the undo log is a logical log: for an INSERT it records a corresponding DELETE, for a DELETE it records an INSERT, and for an UPDATE it records the reverse UPDATE. During rollback, the logical records in the undo log are read and applied. The undo log is also used by snapshot reads (MVCC), not only by rollback, and an undo log record can only be purged once it is no longer needed by any snapshot read or by transaction rollback. By default undo logs are stored in the shared tablespace; since MySQL 5.6 their location can be customized with the innodb_undo_directory variable, whose default value "." means the datadir directory.
⑥ Binlog (binary log): the binlog records all statements that change data or table structure (DML and DDL), together with extra information such as when the statement occurred, how long it ran, and what data it operated on; it does not record statements that do not modify data, such as SELECT and SHOW. The binlog has three main uses:
- Recovery: binary logs are needed to recover data; for example, after restoring a full backup of a database, point-in-time recovery is done by replaying the binary logs.
- Replication: same principle as recovery; by shipping and replaying binary logs, a remote MySQL database (usually called the slave or standby) is kept in sync with another MySQL database (usually called the master or primary) in near real time.
- Audit: the binary logs can be inspected to determine whether the database has suffered injection attacks.
Besides the uses above, the binlog is also essential for crash recovery of transactional storage engines. To keep the binlog and the redo log consistent, MySQL uses a two-phase commit protocol for transactions. When MySQL crashes, a transaction in the storage engine may be in the prepared or committed state; for a prepared transaction, the binlog decides whether to commit or roll back: if the transaction exists in the binlog it is committed, otherwise it is rolled back, which keeps the master and slave consistent.
Binlog recording is enabled by configuring log-bin = [base-name] in the MySQL configuration file (my.cnf). If base-name is not specified, the binary log files are named after the host name, e.g. mysql-bin.000001, and are written to the datadir directory. You can check whether the binlog is enabled with: show variables like 'log_%';
The binlog has three formats, STATEMENT, ROW, and MIXED:
- STATEMENT: the binlog records the original SQL statements executed on the master. The advantage is that recording and replaying the statements is simple, so the slave stays in sync by re-executing the same SQL that ran on the master; the binary logs are also compact, so statement-based replication uses relatively little bandwidth, saves disk space, and is easy to read with the mysqlbinlog tool. The disadvantage is that the same SQL may behave slightly or very differently on the master and the slave, so besides the statement the binary log has to carry metadata such as the current timestamp. Even so, some SQL cannot be replicated correctly.
For example, a row inserted with INSERT INTO tb1 VALUES (CURRENT_DATE()) uses a non-deterministic function, so the value changes when the statement is replayed on the slave; stored procedures and triggers can also cause problems under statement-based replication. Another issue is that statement-based replication must be serialized, which requires extra locking such as InnoDB's next-key locks, and not all storage engines support statement-based replication.
- ROW: supported since MySQL 5.1, row-based replication replicates data changes row by row and records the actual row data in the binary log. Its biggest advantage is that every row change is replicated correctly and some statements replicate more efficiently; there are very few scenarios it cannot handle, and SQL constructs, triggers, stored procedures and so on all replay correctly. The downside is that the binary logs can be large and are not intuitive to read, so you cannot simply inspect them with mysqlbinlog to see which SQL statement produced a change. ROW is now the de facto standard, mainly because its advantages far outweigh its disadvantages, and because it records row data it is also the basis for DBA tools such as data recovery and data synchronization between different databases.
- MIXED: a hybrid mode that uses statement-based replication by default and switches to row-based replication when a statement cannot be replicated exactly, for example when non-deterministic functions such as UUID(), USER(), CURRENT_USER(), or ROW_COUNT() are used.
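Two small usage sketches for the logs above. The first turns on the slow query log at runtime (the 1-second threshold is just an example, and as noted the settings are lost on restart unless they are also written to my.cnf); the second checks the binlog configuration on a running server.

```sql
-- Slow query log: enable at runtime and set the threshold
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;                 -- log statements slower than 1 second
SHOW VARIABLES LIKE 'slow_query_log_file';      -- where the slow log is written

-- Binlog: is it enabled, which format, and which files exist?
SHOW VARIABLES LIKE 'log_bin%';
SHOW VARIABLES LIKE 'binlog_format';            -- STATEMENT, ROW, or MIXED
SHOW BINARY LOGS;
```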

3.1.2 Indexes

1. What is the difference between a clustered index and a non-clustered index?

The fundamental difference is whether the physical order of the rows in the table matches the order of the index: if it does, the index is clustered; if not, it is non-clustered.
1. Non-clustered index (MyISAM): the two B+ trees (primary key index and secondary index) look the same; the node structure is identical, only the contents differ. The table data is stored separately, and the leaf nodes of both B+ trees hold an address pointing to the real row data, so for reaching the data there is no difference between the primary key and a secondary key. Because each index tree is independent, a lookup through a secondary key does not need to touch the primary key's index tree. Suitable scenarios: a. columns with many distinct values; b. frequently updated columns.
2. Clustered index (InnoDB):
- Primary key index: the table is organized as a B+ tree on the primary key, and the row data is stored in the leaf nodes. For a condition such as where id = 1, the B+ tree search finds the corresponding leaf node and the row data is right there.
- Secondary index query: if the secondary index is on name, a lookup on the name column takes two steps: 1. search the secondary-index B+ tree for the name value; the leaf node yields the corresponding primary key; 2. use that primary key to search the primary-index B+ tree again and read the whole row from its leaf node.
- Note: if the query only needs columns contained in the secondary index, no second lookup (going back to the table) is required; if it needs other columns, the extra primary-key lookup is required.
- Note: why not use a long field as the primary key? Because every secondary index stores the primary key value as its reference to the row, an overly long primary key makes every secondary index too large.
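A sketch of the two access paths described above; the table and index names are hypothetical. The first query can be satisfied from the secondary index alone (Extra shows Using index), while the second has to go back to the clustered index for the age column.

```sql
CREATE TABLE t_person (
  id   BIGINT PRIMARY KEY,        -- clustered index
  name VARCHAR(50),
  age  INT,
  KEY idx_name (name)             -- secondary index
) ENGINE = InnoDB;

-- Covering index: only columns in idx_name (plus the primary key) are needed
EXPLAIN SELECT id, name FROM t_person WHERE name = 'Tom';

-- Needs age, so after the idx_name lookup it goes back to the clustered index
EXPLAIN SELECT age FROM t_person WHERE name = 'Tom';
```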

2. Why does InnoDB use B+ trees instead of B-trees, hash tables, red-black trees, or binary trees?

- B-tree: data is stored on both leaf and non-leaf nodes, so each non-leaf node can hold fewer pointers (a smaller fan-out). With a small fan-out, storing a large amount of data makes the tree taller, which means more disk I/O per lookup and lower query performance.
- Hash: fast for exact matches, but has no ordering (so no range queries) and high I/O cost for anything other than point lookups.
- Binary tree: the tree is not self-balancing, so its height depends on the data; search efficiency varies with the height of the tree and the I/O cost is high.
- Red-black tree: balanced, but the height still grows quickly as the data volume increases, so the I/O cost is high.
- B+ tree: the index structure is organized to minimize the number of disk I/O accesses during a search. 1. Low height (typically only three levels): each node of a B+ tree can hold a large number of keys, so the depth of the tree is small; every lookup walks one path from the root to a leaf, and all key lookups have the same path length, so query cost is predictable and the number of disk I/O accesses is small. 2. Disk read-ahead and locality: the node size is set equal to one page, so each node can be loaded in full with a single I/O.
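As a rough illustration of the height argument, under the assumption of InnoDB's default 16 KB page size, a BIGINT key of 8 bytes plus a 6-byte page pointer per entry in the internal nodes, and roughly 1 KB per row in the leaves: each internal node holds about 16 KB / 14 B ≈ 1170 entries and each leaf about 16 rows, so a three-level B+ tree can index roughly 1170 × 1170 × 16 ≈ 22 million rows while any point lookup reads at most three pages from disk (and the upper levels usually stay cached in the buffer pool).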

3. EXPLAIN analysis and optimization

Note: in join queries, drive the large table with the small table whenever possible, and keep temporary result sets as small as possible. The main EXPLAIN columns:
(1) select_type, the type of the query:
- SIMPLE: a simple SELECT with no UNION or subquery.
- PRIMARY: if the query has any complex subpart, the outermost SELECT is marked PRIMARY.
- UNION: the second or later SELECT statement in a UNION.
- DEPENDENT UNION: the second or later SELECT in a UNION that depends on the outer query.
- SUBQUERY / DEPENDENT SUBQUERY: a subquery, or a subquery that depends on the outer query.
- DERIVED: a SELECT in the FROM clause (a derived table).
- UNCACHEABLE SUBQUERY: a subquery whose result cannot be cached and must be re-evaluated for every row of the outer query.
(2) type, the access type:
- ALL: full table scan; MySQL walks through the whole table to find matching rows.
- index: full index scan; the whole index tree is traversed.
- range: only rows in a given range are retrieved, using an index to select the rows (the minimum level the Alibaba guideline accepts).
- ref: the join uses a non-unique index or a non-unique prefix; the index column is compared against columns of the previous table or constants (the level the Alibaba guideline recommends aiming for).
- eq_ref: the join uses a primary key or a unique index; at most one row is read from this table for each combination of rows from the previous tables.
- const, system: used when MySQL can optimize part of the query into a constant, for example when the primary key appears in the WHERE clause with a constant value; system is a special case of const where the table has only one row.
- NULL: MySQL resolves the statement during optimization without even accessing the table or index, for example picking the minimum value of an indexed column via a single index lookup.
(3) possible_keys: the indexes that could be used on this table; an index is listed if one exists on a relevant column, but it may not actually be used.
(4) key: the index actually used.
(5) key_len: the number of bytes used in the index, which can be used to work out how much of a composite index the query uses. key_len shows the maximum possible length of the index fields, not the actual length of the stored values, i.e. it is computed from the table definition, not retrieved from the table.
(6) ref: which columns or constants are compared to the index to select rows, i.e. the join matching condition for this table.
(7) rows: the number of rows MySQL estimates it must examine.
(8) Extra, additional information:
- Using where: the MySQL server filters the rows after the storage engine retrieves them.
- Using index: the query is satisfied using only the information in the index, without reading the actual row; this happens when all requested columns of the table are part of the same index (a covering index).
- Using temporary / Using filesort: a temporary table or an extra sort pass is needed; both are warning signs worth optimizing.
- Using join buffer: no index is used for the join condition and a join buffer is needed to hold intermediate results; if this appears, consider adding an index to improve performance, depending on the query.
- Impossible WHERE: the WHERE clause can never match any rows.
- Select tables optimized away: the optimizer can return a single row from an aggregate function result using the index alone, without touching the table during execution.
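A quick usage sketch against the hypothetical t_person table from the previous answer; the columns discussed above appear in the output.

```sql
EXPLAIN SELECT id, name FROM t_person WHERE name = 'Tom';
-- type = ref, key = idx_name, Extra = Using index (covering index, no table access)

EXPLAIN SELECT * FROM t_person WHERE age > 30;
-- type = ALL (full table scan), possible_keys = NULL: a candidate for a new index
```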

4. A composite index is created on columns (A, B, C)

1. Does a query like WHERE A = ? AND B > ? AND C = ? stop using the index at B? - Yes: the index is used for A and B only; because the condition on B is a range, the C part of the index is not used.
2. Does WHERE C = ? AND B = ? AND A = ? use the index? - Yes: MySQL's optimizer reorders the equality conditions, so the (A, B, C) index can still be used.
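A sketch that makes the two cases concrete; the table name and data are hypothetical, and key_len in the EXPLAIN output shows how much of the composite index is actually used.

```sql
CREATE TABLE t_abc (
  id BIGINT PRIMARY KEY,
  a INT, b INT, c INT,
  KEY idx_abc (a, b, c)
) ENGINE = InnoDB;

-- Range on b: the index is used for (a, b) only; key_len covers two columns
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b > 2 AND c = 3;

-- Conditions written in reverse order: the optimizer reorders them,
-- so the full (a, b, c) index is still used
EXPLAIN SELECT * FROM t_abc WHERE c = 3 AND b = 2 AND a = 1;
```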

3.1.3 Transactions and MVCC (multi-version concurrency control)

1. What is a transaction? What are its properties? Isolation level? How to set MySQL transaction isolation level?

1. A transaction is a series of database operations executed as a single unit; it is the basic unit of work in database applications. Of the common MySQL engines, only InnoDB supports transactions. A transaction has four properties:
- Atomicity: either all of the operations take effect or none of them do.
- Consistency: executing a transaction moves the database from one correct state to another correct state.
- Isolation: changes made by a transaction are not visible to other transactions until the transaction is correctly committed.
- Durability: once a transaction commits, its changes are permanent, even if the system crashes afterwards.
2. MySQL has four transaction isolation levels:
- Read Uncommitted: a transaction can read data that other transactions have not yet committed (dirty reads).
- Read Committed: only committed data can be read, but the same query may return different results within one transaction (non-repeatable reads).
- Repeatable Read: repeated reads within a transaction return the same results (the default level).
- Serializable: reads and writes lock the whole table; data operations never conflict, but concurrency is very poor, so it is rarely used.
The isolation level can be set permanently with transaction-isolation = REPEATABLE-READ in my.cnf, or at runtime with the SET TRANSACTION ISOLATION LEVEL statement.
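A small sketch of checking and changing the isolation level at runtime; note that the variable is named tx_isolation before MySQL 8.0 and transaction_isolation from 8.0 on.

```sql
-- Check the current isolation level (use @@tx_isolation before MySQL 8.0)
SELECT @@transaction_isolation;

-- Change it for the current session only
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or for all new connections
SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;
```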

2. MVCC multi-version concurrency control

Reference: https://blog.csdn.net/SnailMann/article/details/94724197
1. Purpose: MVCC in MySQL InnoDB exists mainly to improve the database's concurrency. It handles read-write conflicts in a better way: even when reads and writes conflict, reads can proceed without locking and without blocking.
2. What are current reads and snapshot reads in InnoDB?
- Current read (implemented with pessimistic locks): SELECT ... LOCK IN SHARE MODE, SELECT ... FOR UPDATE, and UPDATE, INSERT and DELETE (which take exclusive locks) are all current reads. They are called current reads because they read the latest version of the record and lock it so that other concurrent transactions cannot modify it while it is being read.
- Snapshot read (MVCC): a plain, non-locking SELECT is a snapshot read, provided the isolation level is not Serializable; at the Serializable level snapshot reads degrade into current reads. Snapshot reads exist to improve concurrency and are implemented with multi-version concurrency control (MVCC), which can be seen as a variant of row locking that avoids actual locking and reduces overhead in many cases. Because it is based on multiple versions, a snapshot read may return an earlier historical version of the data rather than the latest one.
3. Because of MVCC, two combinations are possible in a database: 1. MVCC + pessimistic locks (MVCC resolves read-write conflicts, pessimistic locks resolve write-write conflicts); 2. MVCC + optimistic locks (MVCC resolves read-write conflicts, optimistic locks resolve write-write conflicts).
4. How MVCC is implemented: through hidden fields on every row, the undo log, and Read Views.
Hidden fields:
- DB_TRX_ID: 6 bytes, the ID of the transaction that most recently created or modified this row.
- DB_ROLL_PTR: 7 bytes, a rollback pointer to the previous version of this row (stored in the rollback segment).
- DB_ROW_ID: an implicitly generated row ID; if the table has no primary key, InnoDB builds the clustered index on DB_ROW_ID.
- There is also a hidden deleted flag: when a record is updated or deleted, it is only marked as deleted, not physically removed immediately.
Undo log:
- Insert undo log: generated when a transaction inserts a new record.
- Update undo log: generated when a transaction updates or deletes a record; it is needed not only when the transaction rolls back but also by snapshot reads, so it cannot be removed immediately. The purge thread deletes an undo log entry only when it is no longer needed by any snapshot read or by transaction rollback.
Example (the person table):
(1) The person table has a record: name = Jerry, age = 24, implicit primary key = 1; assume its transaction ID and rollback pointer are NULL. Now transaction 1 changes the name of this record to Tom.
(2) When transaction 1 modifies the row:
- the database takes an exclusive lock on the row;
- the row is copied into the undo log as an old record, so the undo log now holds a copy of the current row;
- after the copy is made, the name of the row is changed to Tom, the hidden transaction ID field is set to the ID of the current transaction 1 (assume transaction IDs start at 1 and increase), and the rollback pointer is set to point at the copy in the undo log;
- after the transaction commits, the lock is released.
(3) Then transaction 2 comes along and changes the age of the same row to 30:
- the database takes an exclusive lock on the row;
- the current version of the row is copied into the undo log as an old record; since the row already has undo log entries, the newest old version is inserted at the head of the version chain, in front of the earlier copy;
- the age of the row is changed to 30, the hidden transaction ID field is set to the ID of the current transaction 2 (i.e. 2), and the rollback pointer is set to point at the copy just written to the undo log;
- when the transaction commits, the lock is released.
In this way the undo log entries for the same row form a version chain (a linked list): the head of the list is the newest old version and the tail is the earliest one.
Read View:
- A Read View is a snapshot the database takes at the moment a snapshot read is performed; it records the IDs of the transactions that are active in the system at that moment (each transaction is assigned an increasing ID when it starts, so newer transactions have larger IDs).
- The Read View is used with a visibility algorithm: the DB_TRX_ID of the newest version of the row being read is compared with the transaction IDs maintained in the Read View. If that version is not visible to the reading transaction, the DB_ROLL_PTR rollback pointer is followed into the undo log version chain and the comparison is repeated with each older version, until a visible DB_TRX_ID is found; the version carrying that DB_TRX_ID is the newest version the current transaction is allowed to see.
- The main attributes of a Read View are: trx_list, the list of transaction IDs that were active when the Read View was generated; up_limit_id, the lowest transaction ID in trx_list; and low_limit_id, the next transaction ID the system had not yet assigned when the Read View was generated, i.e. the largest transaction ID that has occurred so far + 1.
Repeatable Read vs Read Committed: under Repeatable Read (the default), InnoDB creates the Read View at the first snapshot read of the transaction and reuses it for the rest of the transaction, which is why repeated reads return the same result; under Read Committed a new Read View is created for every snapshot read, so each read sees the latest committed data.
Gap locks (next-key locks): suppose rows with values 1, 3, 5 and 7 exist. A locking query over the range 1-7 locks not only the existing rows 1, 3, 5 and 7 but also the gaps between them, so the non-existent values 2, 4 and 6 are covered as well: even though 2 does not exist, it is locked "innocently" and cannot be inserted while the lock is held, which is how InnoDB prevents phantom inserts within the range.
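A small sketch contrasting the two read modes under the default Repeatable Read level; the table and data are hypothetical (the t_person table sketched earlier).

```sql
-- Session A
START TRANSACTION;
SELECT age FROM t_person WHERE id = 1;              -- snapshot read: no locks, reads via the version chain

-- Session B meanwhile runs and commits:
--   UPDATE t_person SET age = 30 WHERE id = 1;

-- Session A again
SELECT age FROM t_person WHERE id = 1;              -- still sees the old age (same Read View)
SELECT age FROM t_person WHERE id = 1 FOR UPDATE;   -- current read: locks the row and sees age = 30
COMMIT;
```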

3.1.4 Read/write Separation

1. MySQL master/slave replication

(1) Semi-synchronous (semi-sync) replication, which addresses the risk of losing data on the master: after the master writes the binlog, it forces the data to be sent to the slave immediately; once the slave has written the events to its local relay log, it returns an ACK to the master, and only then does the master consider the commit complete.
(2) Parallel replication, which addresses the synchronization delay problem: the slave starts multiple threads that read the relay log events of different databases in parallel and replay them in parallel; this is database-level parallelism.
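A sketch of enabling semi-synchronous replication with the stock plugins; the plugin names and variables below are the ones shipped with MySQL 5.7, so treat the exact names as version-dependent.

```sql
-- On the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- On the slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;

-- Verify
SHOW VARIABLES LIKE 'rpl_semi_sync%';
```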

2. How to solve the production environment problems caused by master/slave delay

1. Concurrency optimization:
(1) Split the writes across multiple masters (sharding). If 2000 writes/s are spread over 4 masters, each master only handles about 500 writes/s (2000/4 = 500), and the master/slave delay becomes negligible.
(2) Enable MySQL parallel replication so that multiple databases are replicated in parallel. Note that this does not help if the write load is concentrated: by the 80/20 rule it is common for a few order tables to take 2000 writes/s while dozens of other tables take 10/s, and if a single database is being written at 2000/s, database-level parallel replication gains nothing.
2. Code-level optimization:
(1) Rewrite the code so that it does not read data back from the slave immediately after writing it (for example, do not insert a row and then immediately query it); the developers who wrote such code need to fix it, which in our case took only a short time.
(2) Optimize the SQL: avoid slow SQL and reduce large batch operations; for bulk changes, it is better to write a script that works in update-then-sleep batches.
3. Hardware and configuration: improve the specification of the slave machines to reduce the gap between the speed at which the master writes the binlog and the speed at which the slave reads and applies it.
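The delay itself is usually observed with the command below; Seconds_Behind_Master is an estimate, not an exact figure.

```sql
-- Run on the slave to see how far it lags behind the master
SHOW SLAVE STATUS\G
-- Look at the Seconds_Behind_Master field in the output
```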

3. Implementation: Canal

Canal's advantage is that it is non-intrusive to the business code, because it synchronizes data by listening to the binlog. It can achieve near-real-time synchronization, and it is a fairly common data synchronization solution in practice. In our project we used the MQ mode: together with RocketMQ or Kafka, Canal sends the change data to an MQ topic, and the consumers of the message queue then process it, for example synchronizing the data to Redis, HBase, or Elasticsearch.
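On the MySQL side, Canal connects like a replica, so the binlog must be enabled in ROW format and a replication account is needed. A sketch of the prerequisites (the option names below are standard MySQL settings, not Canal configuration; the user name and password are placeholders):

```sql
-- Canal reads the binlog like a slave, so the binlog must be on and row-based
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';   -- should be ROW
SHOW VARIABLES LIKE 'server_id';       -- must be non-zero and unique

-- A dedicated account with replication privileges for Canal to connect with
CREATE USER 'canal'@'%' IDENTIFIED BY 'canal_password';
GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'canal'@'%';
```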