1. Indexes
- What data structure is the index of Mysql?
A: The B+ tree, a multi-way balanced search tree. Databases use B trees (and B+ trees) as index structures to speed up queries. Here the key in the tree is the index key, and the data is the logical address on disk of the record that key corresponds to.
- What is the structure of a B+ tree?
A: A multi-way balanced search tree. (1) A B+ tree contains 2 kinds of nodes: internal nodes (also called index nodes) and leaf nodes. Internal nodes hold no data; they are used only for indexing, and all data (records) live in the leaf nodes. (2) In a B+ tree of order m, each internal node holds at most m-1 keys; each leaf node likewise stores at most m-1 records and at least ceil(m/2)-1 records. (3) For any key in an internal node, all keys in its left subtree are smaller than it and all keys in its right subtree are greater than or equal to it; records inside a leaf node are also ordered by key. (4) Each leaf node has pointers to its neighboring leaf nodes, so the leaves themselves form a linked list ordered by key.
- Why is B+ tree lookup more stable?
A: In a B+ tree, all keys and data addresses are stored in the leaf nodes, so every lookup is a walk from the root down to a leaf. The number of node accesses is the same for every lookup, which makes query time more stable than in a B tree.
- Why are B+ trees shorter and fatter?
A: The internal nodes of a B+ tree hold no data, so each disk page can fit more keys. With a larger fan-out the tree has fewer levels, i.e. it is "shorter and fatter".
- What’s the difference between B tree and B+ tree?
In a B tree every node stores both keys and data; in a B+ tree only the leaf nodes store keys and data, while internal nodes store only keys. The leaves of a B tree are independent of each other; the leaves of a B+ tree are chained to their neighbors. Searching a B tree amounts to a binary search within each node along the path, and the search may end before reaching a leaf; searching a B+ tree is very stable, since every lookup walks from the root to a leaf, and sequential scans over the leaves are straightforward.
Advantages of B+ trees:
1. Fewer levels: compared with a B tree, each non-leaf node stores more keys, so the tree is shallower and queries are faster.
2. More stable query time: all keys and data addresses are in the leaves, so every lookup takes the same number of node accesses.
3. Natural ordering: the leaf nodes form an ordered linked list, which makes range queries convenient; the data is packed more densely, so the cache hit ratio is higher than with a B tree.
4. Faster full traversal: scanning the whole tree only requires walking the leaf chain, rather than visiting every level as in a B tree, which suits full-table scans in a database.
The advantage of a B tree over a B+ tree is that when frequently accessed data sits near the root, its non-leaf nodes already carry the keys and data addresses, so retrieval can be faster than in a B+ tree.
- Why not use Hash indexes?
A: A Hash index is essentially a hash table. Its advantage is that the hash function can locate a record in very little time. Its biggest drawback is that it supports neither ordering nor range queries: if we need to sort rows or query a range of values, a Hash index cannot be used.
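A minimal sketch of that limitation (table and column names are invented for illustration). MySQL's MEMORY engine supports real HASH indexes; note that InnoDB silently ignores `USING HASH` and builds a B+ tree anyway.

```sql
-- Hypothetical lookup table with a true HASH index (MEMORY engine):
CREATE TABLE kv (
  id   INT NOT NULL,
  name VARCHAR(32),
  INDEX idx_id (id) USING HASH
) ENGINE = MEMORY;

-- Equality predicate: the hash function locates the bucket directly.
SELECT name FROM kv WHERE id = 42;

-- Range predicate: hash values preserve no order, so the index cannot
-- help here and the query degenerates to a scan.
SELECT name FROM kv WHERE id BETWEEN 10 AND 50;
```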
- What are the types of indexes?
1. Primary key index: the index built on the table's primary key column.
2. Secondary indexes:
 - Unique index: column values must not repeat, but NULL is allowed; a table may have several unique indexes.
 - Normal index: exists purely to speed up queries; duplicates and NULLs are allowed, and a table may have several.
 - Prefix index: indexes only a leading part of the value; suitable for string columns only.
 - Full-text index: mainly for keyword search in large text columns.
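The index types above can be sketched in a single table definition (all table and column names are hypothetical):

```sql
CREATE TABLE article (
  id     INT AUTO_INCREMENT,
  author VARCHAR(64),
  email  VARCHAR(128),
  url    VARCHAR(255),
  body   TEXT,
  PRIMARY KEY (id),              -- primary key index
  UNIQUE KEY uk_email (email),   -- unique index: no duplicates, NULL allowed
  KEY idx_author (author),       -- normal index: duplicates and NULLs allowed
  KEY idx_url (url(20)),         -- prefix index: only the first 20 characters
  FULLTEXT KEY ft_body (body)    -- full-text index for keyword search
) ENGINE = InnoDB;
```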
- The difference between a primary key index and a normal index:
- There is only one primary key index. The primary key cannot be null and cannot be repeated. The leaf node of the primary key index holds the entire row of records
- A normal index is a secondary index. A table may have several, and duplicates and NULLs are allowed. The leaf node of a normal index stores only the primary key value, so a query may need an extra lookup back to the table
- Clustered and non-clustered indexes
- A clustered index is an index where the index structure and data are stored together. Primary key indexes are clustered indexes.
- A non-clustered index is one in which the index structure and data are stored separately. Secondary indexes are non-clustered indexes. The biggest disadvantage is the possibility of a secondary query (back table)
- PS: a non-clustered index does not always require going back to the table. If an index already contains (covers) the values of all the columns a query needs, it is called a "covering index". For example, if a query selects only the user name and the name column is indexed, the index key itself is the name; the matching rows can be returned directly from the index, with no lookup back to the table.
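The covering-index effect can be checked with EXPLAIN; a sketch, assuming a `user` table with an index on `name`:

```sql
CREATE INDEX idx_name ON user (name);

-- A secondary index leaf stores (name, primary key), so selecting only
-- `name` is satisfied by the index alone; EXPLAIN reports "Using index"
-- in the Extra column, meaning no back-to-table lookup.
EXPLAIN SELECT name FROM user WHERE name = 'Joe';

-- Selecting a column outside the index forces a lookup back to the
-- clustered index (back to the table) for each matching row.
EXPLAIN SELECT name, age FROM user WHERE name = 'Joe';
```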
- Create index statement:
- Add a PRIMARY KEY
ALTER TABLE `table_name` ADD PRIMARY KEY ( `column` );
- Add a UNIQUE index
ALTER TABLE `table_name` ADD UNIQUE ( `column` );
- Add an INDEX (normal index)
ALTER TABLE `table_name` ADD INDEX index_name ( `column` );
- Add a FULLTEXT (full-text) index
ALTER TABLE `table_name` ADD FULLTEXT ( `column` );
- Add a multi-column index
ALTER TABLE `table_name` ADD INDEX index_name ( `column1`, `column2`, `column3` );
2. Transactions
- What is a transaction? Database transactions?
A transaction is a logical unit of operations that either all execute or none execute. A database transaction guarantees that several SQL statements form one logical whole: either all of them succeed, or none take effect. The classic transfer example:
-- 0. Start the transaction
START TRANSACTION;
-- 1. Deduct 500 from zhangsan's account
UPDATE account SET balance = balance - 500 WHERE NAME = 'zhangsan';
-- 2. Add 500 to lisi's account
UPDATE account SET balance = balance + 500 WHERE NAME = 'lisi';
-- 3. Commit the transaction
COMMIT;
-- If anything goes wrong, roll back instead, restoring the data to its
-- state before the transaction began:
ROLLBACK;
- Four characteristics of transactions:
1. Atomicity: a transaction is the smallest indivisible unit of work; its operations either all succeed or all fail.
2. Durability: once a transaction commits, its changes are persisted; a subsequent database failure must not affect them.
3. Isolation: concurrent transactions are independent of one another.
4. Consistency: the data remains consistent before and after the transaction; e.g. in a transfer, the total amount is unchanged.
- Transaction implementation principle:
Taking MySQL's InnoDB engine as an example: the undo log (rollback log) guarantees atomicity; the redo log guarantees durability; locks plus MVCC guarantee isolation (the default isolation level is REPEATABLE-READ). Once atomicity, durability, and isolation are guaranteed, consistency follows.
- What are the problems associated with concurrent transactions?
1. Dirty read: a transaction reads data that another transaction has not yet committed.
2. Non-repeatable read: within one transaction, reading the same row twice returns different data.
3. Phantom read: similar to a non-repeatable read. It occurs when one transaction (T1) reads a set of rows, then a concurrent transaction (T2) inserts new rows; on a later query T1 sees records that did not exist before, as if an illusion had occurred, hence "phantom read".
- Setting a different transaction isolation level solves these problems:
1. READ-UNCOMMITTED: the lowest level; allows reading uncommitted changes, so dirty reads, non-repeatable reads, and phantom reads can all occur.
2. READ-COMMITTED: transactions may only read committed data; prevents dirty reads, but non-repeatable reads and phantom reads can still occur.
3. REPEATABLE-READ: repeated reads of the same field within a transaction return the same result unless the transaction itself changed it; prevents dirty and non-repeatable reads, but phantom reads are still possible. (MySQL's default level.)
4. SERIALIZABLE: the highest level, fully ACID-compliant; transactions execute one after another, so they cannot interfere with each other at all. Prevents dirty reads, non-repeatable reads, and phantom reads.
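The level can be inspected and changed per session or globally; a sketch (MySQL 8.0 variable name; older versions use `tx_isolation`):

```sql
-- Show the current level:
SELECT @@transaction_isolation;

-- Change it for this session only:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or for all new connections:
SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;
```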
3. Storage engines
- List all storage engines MySQL provides: SHOW ENGINES;
- Difference between MyISAM and InnoDB
| Feature | MyISAM | InnoDB |
|---|---|---|
| Row-level locks | Not supported (table-level locks only) | Supported |
| Transactions | Not supported | Supported, with commit and rollback |
| Safe recovery after a crash | Not supported | Supported |
4. Execution flow of a statement
- The basic architecture of MySQL: the Server layer plus the storage engine layer
- Server layer:
Connector: handles connections and authentication/permissions (when logging in to MySQL).
Query cache: consulted first when a query runs (removed in MySQL 8.0 because the feature was rarely useful).
Analyzer: if the statement misses the cache, it is parsed: lexical analysis extracts the keywords and determines what the statement does; syntax analysis checks that the statement is valid SQL.
Optimizer: chooses the execution plan MySQL considers best.
Executor: checks the user's permissions first; without permission it returns an error, otherwise it executes the statement, fetching and returning data via the storage engine.
Log module (binlog): shared by all storage engines; the redo log exists only in InnoDB.
- Storage engine layer: responsible for storing and reading data; supports InnoDB, MyISAM, Memory, and other engines
- Query statement execution flow
select * from tb_student A where A.age='18' and A.name='Joe';
- The connector verifies the permission. If no permission is granted, an error message is returned
- The analyzer performs lexical analysis, syntax analysis
- The optimizer determines the execution plan
- Permission verification, no permission will return an error message
- The executor calls the database engine interface and returns the results of the engine’s execution.
- Update statement execution flow (add, update, Delete)
update tb_student A set A.age='19' where A.name='Joe';
- The connector verifies the permission. If no permission is granted, an error message is returned
- The analyzer performs lexical analysis, syntax analysis
- The optimizer determines the execution plan
- Permission verification, no permission will return an error message
- The executor first fetches the row, applies the change to the column being updated, and produces the new row of data
- InnoDB writes this entry to memory and logs the redo log. The redo log enters the prepare state and tells the executor that execution is complete and ready to commit
- The executor generates a binlog of this operation and writes the binlog to disk
- The executor calls the commit transaction interface of the engine, and the engine changes the redo log to the commit state.
5. Optimistic locking vs. pessimistic locking
- Pessimistic locking
- Always assumes the worst case, so it locks the data every time it is read; anyone else who wants the data blocks until the lock is released
- Usage scenario: write-heavy workloads
- Categories: Shared lock: multiple transactions can share a lock on the same data; they can all read it but none may modify it. Exclusive lock: if a transaction holds an exclusive lock on a row, no other transaction can acquire any lock on that row
- Advantages: It provides guarantee for the security of data processing
- Cons: extra overhead and a higher chance of deadlocks; it also reduces parallelism, since once one transaction locks a row, other transactions must wait for it to finish before touching that row.
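In InnoDB both lock flavors can be taken explicitly; a sketch reusing the `account` table from the transfer example:

```sql
START TRANSACTION;

-- Shared (S) lock: other transactions may still read the row, but not
-- modify it. (MySQL 8.0 also accepts the newer spelling FOR SHARE.)
SELECT balance FROM account WHERE NAME = 'zhangsan' LOCK IN SHARE MODE;

-- Exclusive (X) lock: other transactions can acquire no lock on the row.
SELECT balance FROM account WHERE NAME = 'zhangsan' FOR UPDATE;

UPDATE account SET balance = balance - 500 WHERE NAME = 'zhangsan';
COMMIT;  -- locks are released when the transaction ends
```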
- Optimistic locking
- Always assumes the best case: no lock is taken, and at update time it checks whether anyone else modified the data in the meantime. It can be implemented with a version-number mechanism or the CAS algorithm.
- Usage scenario: read-heavy workloads, where it improves throughput
- Implementations: ① Version-number mechanism: add a version column to the table recording how many times the row has been modified; every modification increments it by one. When thread A reads the data it also reads the version; when it submits the update, the update applies only if the version it read still matches the version in the database, otherwise the update is retried until it succeeds. ② CAS algorithm: a CAS operation takes three operands: the memory location (V), the expected old value (A), and the new value (B). If the value at V matches A, the processor atomically sets it to B; otherwise it does nothing.
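The version-number scheme maps directly onto SQL; a sketch (the `goods` table and its columns are hypothetical):

```sql
-- 1. Read the row together with its version:
SELECT stock, version FROM goods WHERE id = 1;
-- suppose this returned stock = 10, version = 5

-- 2. Update only if nobody has touched the row since the read:
UPDATE goods
SET    stock = stock - 1,
       version = version + 1
WHERE  id = 1 AND version = 5;

-- 3. If the statement reports 0 affected rows, a concurrent update won;
--    re-read the row and retry.
```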
- Drawbacks: ① The ABA problem: a variable was read with value A, and it is still A when the update is attempted; that does not prove no other thread modified it in between (it may have been changed to B and back to A). ② CAS guarantees atomicity only for a single shared variable.
6. MVCC
- MVCC (multi-version concurrency control) refers to keeping multiple versions of a piece of data so that read and write operations do not conflict. It is a general concept with many possible implementations
- Each transaction is assigned a monotonically increasing timestamp; every modification creates a new version tagged with the transaction's timestamp. A read operation only sees the snapshot of the database as of the start of the transaction.
7. Snapshot read vs. current read
- Current read: reads the latest version of a record, and locks the record being read to ensure no concurrent transaction can modify it in the meantime.
- The current read is implemented with a next-key lock (row record lock + gap lock)
- Snapshot read: an unlocked SELECT is a snapshot read, i.e. a non-blocking read without locks. The precondition is that the isolation level is not SERIALIZABLE; at the SERIALIZABLE level snapshot reads degrade to current reads. Snapshot reads exist to improve concurrency and are implemented with multi-version concurrency control (MVCC). MVCC can be seen as a variant of row locking that avoids locking in many cases and lowers overhead. Because it is version-based, a snapshot read may return a historical version rather than the latest data
- Ordinary (snapshot) reads are implemented via the undo log + MVCC
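Under REPEATABLE READ the distinction looks like this (a sketch; the `account` table and its `id` column are hypothetical):

```sql
SELECT * FROM account WHERE id = 1;                     -- snapshot read: MVCC, no locks

SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;  -- current read, S lock
SELECT * FROM account WHERE id = 1 FOR UPDATE;          -- current read, X lock

-- DML always operates on the latest version, so it is a current read too:
UPDATE account SET balance = 0 WHERE id = 1;
```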
8. Normal forms
- First normal form (1NF), the most basic: every column in a table holds atomic, indivisible values
- Second normal form (2NF) goes one step further: every non-key column must depend on the entire primary key, not just part of it
- Third normal form (3NF): every column must depend on the primary key directly, not transitively through another column
9. Miscellaneous
- How should time be stored in MySQL? DATETIME? TIMESTAMP? A numeric Unix timestamp?
- Don't store dates as strings! Strings take more space and compare inefficiently (character by character)
- Other types of comparison
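A sketch of the candidate column types (table and column names invented for illustration):

```sql
CREATE TABLE event (
  happened_dt   DATETIME,      -- 'YYYY-MM-DD hh:mm:ss', no time-zone conversion,
                               -- range 1000-01-01 .. 9999-12-31
  happened_ts   TIMESTAMP,     -- stored as UTC, converted to/from the session
                               -- time zone; range limited to 1970 .. 2038
  happened_unix INT UNSIGNED   -- raw Unix timestamp: compact, but unreadable
                               -- in queries without FROM_UNIXTIME()
);
```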
- Why is an auto-increment primary key recommended instead of a UUID?
Refer to the link: www.cnblogs.com/nullzx/p/87…