Wechat public account [Huanshao’s Growth Path]

Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

Hello, I’m Leo. Currently, I am engaged in Java backend development in Changzhou. This is the first in a series on MySQL learning to organize. This series will be output together with byte, netease, Ali, Tencent, Meituan and Kuaishou. Hope to help more friends as soon as possible into the factory!

Train of thought

The whole writing is probably such ideas. An article to familiarize you with MySQL! The length is very big, the suggestion pays attention to first, collect!

The whole point of the book has been reduced. Built for the interview series! If you want specific in-depth study, please pay attention to the public number to browse other articles!

1. How is SQL executed

1.1 the query

Usually we are concerned about how SQL is executed, but do you understand the structure of MySQL? Here we introduce

MySQL has two main layers.

  • Service
  • Storage engine layer

The two floors are mainly composed of four parts

  • The connector
  • analyzer
  • The optimizer
  • actuator

Connector: verifies user identity information, verifies SQL statement permissions of current users, and manages CHANNELS for SQL connections

Analyzer: lexical analysis, syntax analysis. It is used to process SQL statements on the client. After analysis and processing, it is written into the cache. If a match is made next time, it is returned directly to improve query efficiency.

Optimizer: generate execution plan, index selection (this perfectly explains the SQL execution problem I threw above)

Executor: call operation storage engine, retrieve data.

It introduces the role of each structure of MySQL. Here’s an extension of what everyone has been saying about long and short connection optimization!

** Long connection: ** After the connection is established, if the client has a request operation, it can use the same connection for interactive processing

Short connection: After the connection has been established and the client has executed its own requirements, the connection is closed.

Long and short connection summary: database connection this process is more complex, so establish as far as possible to reduce the use of short connection, that is, as far as possible to use long connection. However, long connections are relatively easy to expand memory. It can also be mistaken by the system for too much memory and forcibly killed.

Optimization solution: It can be optimized by periodically disconnecting long connections. You can also reinitialize the connection resource by executing mysQL_reset_connection. This process does not require reconnection and re-authentication.

1.2 to modify

I’ll start with two logs. Redo log and binlog.

Redolog serves as a notepad, a memo, in our daily lives.

The binlog acts as the death-book in the hades.

Same thing in the database. When modifying data, the data will not be modified to disks immediately. If the disk I/O is changed immediately, the disk I/O is greatly affected. Changes are written to the redo log and then updated to the binlog when the system is not busy.

The difference between

  1. Redolog is unique to the InnoDB engine layer, binlog is native to the Server layer.
  2. Redolog is a physical log, binlog is a logical log
  3. Redolog records what changes were made on a data page, and binlog records the original logic of the statement.
  4. Redolog loop write, binlog append write

Modify the process

Update VIP set name= ' 'where ID=2Copy the code
  1. Let’s look at the row where ID=2. There are two ways to find the table, one is directly from the table, and the other is directly from the cache if it exists in the cache
  2. After you get it, modify it directlyName = The road to growthWrite the new row
  3. The new row is updated into memory
  4. Write redolog and it is in the prepare phase
  5. Write binlog
  6. Commit Commit transaction

Here you can tell the interviewer about the two-stage submission. Why should step 4 and Step 5 be written separately? This is the essence of two-phase commit

The purpose is to keep the data consistent. If two-phase commit is not used, the state of the database may be different from the state of the library recovered from its logs.

  • Write redo log and then binlog. MySQL > redo log; MySQL > redo log; MySQL > redo log; MySQL > redo logHuan shao's way of growth. But because the binlog didn’t finish writing the crash, the binlog didn’t record the statement. Therefore, when the log is backed up later, the saved binlog does not contain this statement. Then you will notice that if you need to restore the temporary library with this binlog, the temporary library will be missing the update because the binlog of this statement is lost, and the restored line will be the value of nameLess personal business, different from the value of the original library.
  • Write binlog and redo log. If the transaction crashed after the binlog was done and the redo log was not written, the transaction is invalid after the crash recovery, so the value of name in this line isLess personal business. But the binlog already records the name fromLess personal businesstoHuan shao's way of growthThis log. So, there’s one more transaction that comes out when you use binlog to restore, and the value of the restored row is cHuan shao's way of growth, different from the value of the original library.

1.3 summarize

Above is a SQL how to execute the introduction, from the query and modify respectively introduced the process and the knowledge points involved. The most important thing is the two-phase commit. If the interviewer asks you, you’ll almost certainly get by with a proof by contradiction

2. Index structure

2.1 what is

Index structure is the lowest level of MySQL data processing structure. Mainly divided into five pieces

  • The hash
  • The list
  • Binary tree
  • B tree
  • B + tree

The hash

This is a hash algorithm that hashes each piece of data into a string of keys and stores them somewhere on the data page. If you have the same key, you have a hash collision. This will be covered in more detail in the algorithms section.

The MySQL solution here uses linked lists + hashes to store together in one slot.

The list

A linked list is a non-sequential storage structure on a physical storage unit. The logical order of data elements is realized by linking the order of Pointers in the linked list.

Binary tree

A binary tree is a tree structure with only two children per node. It is one of the simplest and most important trees. The recursion of binary tree is defined as: binary tree is an empty tree, or a non-empty tree consisting of a root node and two disjoint left and right subtrees called root respectively. Left and right subtrees are also binary trees

Red and black tree

A red-black tree is a specific type of binary tree that is used in computer science to organize blocks of data such as numbers. If a binary search tree is a red-black tree, any of its children must be red-black.

Red black tree is a variant of balanced binary search tree. The height difference between left and right subtrees may be greater than 1, so red black tree is not a balanced binary tree (AVL) in the strict sense, but the cost of balancing it is lower, and its average statistical performance is better than AVL.

Since every red-black tree is a binary sorting tree, the search algorithm applied to ordinary binary sorting tree can be used to search the red-black tree, and color information is not needed in the search process.

B + tree

B+ tree is an upgraded version of B tree. Compared with B tree, B+ tree makes more full use of node space and makes the query speed more stable. Its speed is completely close to the binary search

2.2 the advantages and disadvantages

The hash

  • Advantages: Hash indexes are very fast in solving single-value queries with O(1) time complexity.
  • Disadvantages: It does not support range queries, so current MySQL uses hash indexes, but not the default indexes.

The list

  • Advantages: As we saw above, it is discontinuous and non-sequential. Therefore, when modifying data, there is no need to move the node before and after the current node. After modifying the data directly, it is good to point to the latest node.
  • Disadvantages: If a linked list is too large, we need to traverse the data one by one when querying. Linked lists do not have the efficient queries of arrays, nor do they have the 50-percent queries of tree structures.

Binary tree

  • Advantages: When querying data, use half search. Query efficiency is very high. Solved the problem left by linked lists.
  • Disadvantages: When a number is continuously increasing, there will be a tilt state, such as 0, 1, 2, 3, 4, 5, 6. This is a slanted tree with query efficiency comparable to that of a linked list. Big data storage incompatible with MySQL

Red and black tree

  • Advantages: certain program to solve the problem of binary tree offset, but the problem is not fundamental enough to solve
  • Disadvantages: There are many levels of this problem. Having more levels affects query performance.

B + tree

  • Advantages: Optimized on the basis of B tree, is also an evolutionary version of red black tree. The main optimization point is the spin of the data node. During insertion, the tree spins automatically to become another tree when it exceeds a certain limit. And it has the sorting function. Nodes are joined to each other, which is very helpful for queries.

2.3 summarize

The second part introduces the five index structures of MySQL. From concept to pros and cons. Through the advantages and disadvantages of the entry point for the analysis of MySQL why B+ tree as the default index structure.

3. Internal indexes

3.1 Cluster index

concept

Cluster indexes are also primary key indexes. A table can have only one clustered index. When a table has a clustered index, rows are stored on the leaf page of the index.

advantages

Data access is faster and the data is kept in a single tree, avoiding the need to go back to the table to query other columns.

disadvantages

Clustered indexes greatly improve the performance of IO intensive applications, but if they are all in memory, the order of access is not necessary. Clustered indexes lose their advantage.

Updating clustered index columns is expensive because InnoDB is forced to move each updated row to a new location.

Tables based on clustered indexes can suffer from “page splitting” when new rows are inserted, or when the primary key is updated so that rows need to be moved. When the primary key of a row requires that the row must be inserted into a full page, the storage engine splits the page into two pages to accommodate the row, a split operation. Page splitting causes tables to take up more disk space.

Clustered indexes can cause slow full table scans, especially if rows are sparse or the data store is discontinuous due to page splitting

3.2 Non-clustered index

concept

A non-clustered index is also a secondary index.

Leaf nodes are not stored data, the address of the storage is a data line, that is to say, according to an index lookup to the row position to find the data to disk, this is similar to a book catalog, to find the third chapter first quarter, for example, it is now inside the directory lookup, find the corresponding page Numbers to the corresponding page again after see article.

The advantages and disadvantages

Non-clustered indexes often require a secondary query when doing a lookup. When the primary key value is found for the first time, the data Page corresponding to the data row is found by the primary key value, and the data row is found by the Page Directory in the data Page.

Multiple non-clustered indexes can exist in a table.

If the primary key is large, the non-clustered index will become larger, because the non-clustered index leaf node stores the primary key value. If the primary key value is too long, the non-clustered index will take up more physical space

3.3 Common Indexes

The most basic index, with no restrictions, is the one we use all the time. His task is to speed up access to data. Therefore, indexes should be created only for those data columns that appear most frequently in a query condition (WHEREcolumn=) or a sort condition (ORDERBYcolumn). Whenever possible, select a column with the most neat and compact data (such as an integer type) to create the index

3.4 Unique Index

Similar to normal indexes, except that the column value of a unique index must be unique, but is allowed to be null. A primary key index is a special unique index that does not allow empty values.

3.5 Joint Index

Several columns are retrieved as an index, using the leftmost matching principle. Take an example of a user login. You can set the login account and password as a joint index. This provides performance while saving on index maintenance costs.

3.6 Index push-down

Index push-down is an optimization introduced in MySQL5.6. During index traversal, you can judge the fields contained in the index first to filter out the records that do not meet the conditions and reduce the number of entries back to the table.

3.7 summarize

The third part introduces the concept of MySQL internal index, advantages and disadvantages, applications and so on. A common interview question is the difference between a clustered index and a non-clustered index. So this is also the most detailed place to write these two pieces.

4. To summarize

The articles are in the tens of thousands of words, and each one is about 3,000 words. There will be 3-4 more to come. If you think it’s ok, like it and follow it. Those who express their opinions will be given books, interview materials and other benefits