Preface

Welcome to star our GitHub repository: github.com/bin39232820… The best time to plant a tree was ten years ago; the second-best time is now. I know many people don't use QQ anymore, but for nostalgia's sake, you're welcome to join the Six Meridians Divine Sword Java beginners' learning group, QQ group number: 549684836. I encourage everyone to write blogs along their technical journey.

omg

Let's continue exploring MySQL. Previous article:

  • Schema data type optimization and indexing foundation

The execution process of a SELECT statement

Broadly speaking, MySQL can be divided into two parts: the Server layer and the storage engine layer.

The Server layer includes the connector, query cache, analyzer, optimizer, executor, and so on. It covers most of MySQL's core service functionality, as well as all built-in functions (such as date, time, math, and encryption functions). All cross-storage-engine features, such as stored procedures, triggers, and views, are implemented in this layer.

The storage engine layer is responsible for storing and retrieving data. Its architecture is plugin-based, supporting InnoDB, MyISAM, Memory, and other storage engines. The most commonly used engine is InnoDB, which has been the default storage engine since MySQL 5.5.5.

In other words, if you do not specify an engine type when executing CREATE TABLE, InnoDB is used by default. You can also choose a different engine by specifying it explicitly.
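For example, a quick sketch (the table name t_demo here is made up just for illustration):

create table t_demo (id int primary key, name varchar(20)) engine=MyISAM;
-- without the ENGINE clause, the table would default to InnoDB (the default since MySQL 5.5.5)
show table status like 't_demo';  -- the Engine column confirms which engine was actually used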

The connector

The connector is easy to understand: it is what you talk to when you enter a username and password to connect to MySQL. A Navicat client session or a JDBC connection pool, for example, both go through the connector.

Once the connection is established, if you take no further action, the connection stays idle, which you can see with the SHOW PROCESSLIST command. In its output, such a connection shows "Sleep" in the Command column, indicating an idle connection in the system.
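If you want to see this yourself, something like the following will do (28800 seconds is only the default value of wait_timeout; your server may be configured differently):

show processlist;                     -- idle connections show Command = Sleep
show variables like 'wait_timeout';   -- an idle connection is closed after this many seconds (default 28800, i.e. 8 hours)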

The query cache

After the connection is established, you can execute SELECT statements. Execution then moves to the second step: the query cache.

When MySQL receives a query, it first checks the query cache to see whether this exact statement has been executed before. The cache works a bit like Redis: the key is your SQL text and the value is the result that SQL returned.

But in most cases I would advise you not to use the query cache. Why? Because it often does more harm than good.

For example, suppose your SQL queries the current list of orders. Every time someone inserts a new order, the cached result for that table has to be invalidated, so each update adds an extra step of work for nothing. That is why MySQL disables the query cache by default, and MySQL 8.0 removes the query cache concept entirely.

You can set the query_cache_type parameter to DEMAND so that the query cache is not used for SQL statements by default; only statements that explicitly request caching will use it.
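A rough sketch of that setup for MySQL 5.7 and earlier (the table T and the query are just placeholders):

-- in the server configuration: query_cache_type = DEMAND
-- with DEMAND, only statements that explicitly ask for caching use the query cache:
select SQL_CACHE * from T where ID=10;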

The analyzer

If the query cache is not hit, it is time to actually execute the statement. First, MySQL needs to know what you want to do, so it must parse the SQL statement.

The analyzer first does lexical analysis. The SQL statement you type in consists of multiple strings and spaces, and MySQL needs to figure out what each string is and what it represents. It then does syntax analysis to check whether the statement conforms to MySQL grammar.
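As a small illustration, a misspelled keyword is caught right here in the analyzer (the exact error text can differ between versions):

elect * from t where ID=1;
-- ERROR 1064 (42000): You have an error in your SQL syntax; ...
-- the "near" part of the message points at the first token the parser could not make sense of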

The optimizer

After the analyzer, MySQL knows what you want to do. Before execution can begin, the statement must be processed by the optimizer.

The optimizer decides which index to use when a table has multiple indexes.
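A small sketch of what that looks like in practice (t_opt, idx_a and idx_b are made-up names):

create table t_opt (id int primary key, a int, b int, index idx_a(a), index idx_b(b));
explain select * from t_opt where a=1 and b=2;
-- the key column in the EXPLAIN output shows which of the two indexes the optimizer chose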

The executor

MySQL knows what you want to do from the analyzer and knows how to do it from the optimizer, so it enters the executor stage and starts executing the statement.

The logic is similar for tables with indexes: the executor's first call is the engine's "fetch the first row that meets the condition" interface, and it then loops over the "fetch the next row that meets the condition" interface. These interfaces are defined in the storage engine.
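To make this concrete, assume table T has no index on column c; the executor then drives a full scan roughly like this:

select * from T where c=10;
-- 1. ask the engine for the first row of T, check whether c=10, keep it or skip it
-- 2. keep calling the engine's "next row" interface in a loop until the last row
-- 3. return the accumulated result set to the client
-- the Rows_examined field in the slow query log counts these executor iterations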

A small problem

If table T has no column named k, then select * from T where k=1; reports the error "Unknown column 'k' in 'where clause'". At what stage do you think this error is reported?

It should be the analyzer. The optimizer's job is only to find the best execution plan for a statement; if the column does not even exist, there is nothing to optimize. So it is the analyzer that, while analyzing the SQL, discovers the missing column and throws the error.

How is an SQL UPDATE statement executed?

We have now walked through the execution process of a query statement and introduced the modules involved along the way. As you may recall, a query statement usually passes through the connector, analyzer, optimizer, executor, and other functional modules, and finally reaches the storage engine.

What about the execution flow of an update statement?

To be clear, an update statement goes through the same set of stages as a query statement.

Before you execute a statement, you need to connect to the database. This is the job of the connector.

When an update occurs on a table, the query cache associated with that table is invalidated, so this statement clears all cached results for table T. This is also why we generally don't recommend using the query cache.

Next, the analyzer determines through lexical and syntactic parsing that this is an update statement. The optimizer decides to use the index on ID. The executor is then responsible for the actual execution: finding the row and updating it.

Unlike the query process, the update process also involves two important log modules, which are the focus of this article: the redo log and the binlog.

Important log module: Redo log

I don't know whether you remember the story "Kong Yiji": the tavern keeper has a chalkboard that is used specifically to record customers' tabs. If only a few customers are buying on credit, he can write their names and amounts on the board. But if there are too many, the chalkboard will eventually run out of room, and the keeper needs a dedicated ledger to record the credit.

When someone wants to buy on credit or pay off a tab, the keeper has two options:

  • One is to get out the ledger right away and add to or subtract from that customer's account.
  • The other is to jot the transaction on the chalkboard first, and update the ledger after closing time.

When business is booming and the counter is busy, the keeper will certainly choose the latter, because the former is simply too troublesome. First he has to find the customer's existing balance; think about searching for one name across dozens of pages, perhaps putting on reading glasses and scanning slowly, and only then pulling out the abacus to do the arithmetic.

The same problem exists in MySQL. If every update had to be written straight to disk, the disk would first have to locate the corresponding record and then update it, and the IO cost and lookup cost of the whole process would be very high. To solve this, MySQL's designers used the same idea as the tavern keeper to improve update efficiency.

The technique is called WAL, which stands for Write-Ahead Logging. The key idea is to write the log first and the disk later: write the chalkboard first, and update the ledger when things are quiet.

Concretely, when a record needs to be updated, the InnoDB engine first writes the change to the redo log and updates the in-memory page; at that point the update is considered complete. InnoDB then flushes the change to the data file on disk at an appropriate time, usually when the system is relatively idle, just like the keeper settling the ledger after closing time.

Like the chalkboard, InnoDB's redo log has a fixed size. For example, it can be configured as a group of four 1GB files, so this "chalkboard" can record 4GB of operations. Writing starts at the beginning of the group and, on reaching the end, wraps back around to the beginning, forming a ring.

write pos is the current write position; it moves forward as records are written, wrapping back to the start of file 0 after reaching the end of file 3. checkpoint is the current erase position, which also moves forward around the ring; before a record is erased, its changes must be flushed to the data files.

The space between write pos and checkpoint is the empty part of the "chalkboard", available for recording new operations. If write pos catches up with checkpoint, the "chalkboard" is full and no new updates can be accepted; MySQL must pause, flush some records to disk, and push checkpoint forward.

With the redo log, InnoDB can guarantee that even if the database restarts unexpectedly, previously committed records will not be lost. This capability is called crash-safe.
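If you want to check or tune this on a real server, these are the relevant InnoDB parameters (the 4 x 1GB figures are just the example above, not a recommendation):

show variables like 'innodb_log_file%';
-- innodb_log_files_in_group = 4 with innodb_log_file_size = 1G gives the 4GB redo log described above
set global innodb_flush_log_at_trx_commit = 1;
-- 1 flushes the redo log to disk at every commit; the source course recommends this for crash safety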

Important log module: binlog

Looking at MySQL as a whole, there are really two parts: the Server layer, which handles MySQL's functional logic, and the engine layer, which takes care of the details of storage. The redo log discussed above is a log specific to the InnoDB engine, while the Server layer has its own log, called the binlog (binary log).

Why are there two logs?

Because MySQL did not have the InnoDB engine at first. MySQL's own engine was MyISAM, which has no crash-safe capability, and the binlog can only be used for archiving. Since the binlog alone cannot provide crash-safe guarantees, InnoDB uses the redo log to implement that capability.

Redo log and binlog

  • The redo log is specific to InnoDB; the binlog is implemented in MySQL's Server layer and is available to all engines.
  • The redo log is a physical log: it records what was changed on which data page. The binlog is a logical log: it records the original logic of the statement, such as "add 1 to the c field of the row with ID=2".
  • The redo log is written in a fixed-size circular fashion, so old content is eventually overwritten. The binlog is append-only: when a binlog file reaches a certain size, MySQL switches to the next file rather than overwriting earlier logs.
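A few related knobs, for reference (these are standard MySQL variables; whether the binlog is enabled depends on your server configuration):

show variables like 'log_bin';   -- ON means binary logging is enabled
show binary logs;                -- lists the binlog files; new files are added as old ones fill up, nothing is overwritten
set global sync_binlog = 1;      -- flush the binlog to disk at every commit, commonly recommended so the binlog survives a crash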

The actual execution process

update T set c=c+1 where ID=2;
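To make the walkthrough concrete, assume T is a minimal table like this (an assumed definition; only the ID primary key and the c column matter here):

create table T (ID int primary key, c int);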

  • The executor first asks the engine to fetch the row with ID=2. ID is the primary key, so the engine finds the row directly with a tree search. If the page containing this row is already in memory, the row is returned to the executor immediately; otherwise it must first be read into memory from disk.
  • The executor takes the row returned by the engine, adds 1 to the value of c (for example, N becomes N+1) to produce the new row, and then calls the engine interface to write it.
  • The engine updates the row in memory and records the update in the redo log; the redo log entry is now in the prepare state. The engine then tells the executor that execution is complete and the transaction can be committed at any time.
  • The executor generates a binlog of this operation and writes the binlog to disk.
  • The executor calls the engine's commit-transaction interface, and the engine changes the redo log entry just written to the commit state; the update is now complete.

As you can see, the last two steps split writing the redo log into prepare and commit phases; this is the two-phase commit. Without it, the state of the database could become inconsistent with the state of a database restored from its logs.

In short, both the redo log and the binlog can be used to determine a transaction's commit status, and two-phase commit keeps those two states logically consistent.

At the end

This article is based on Geek Time's "45 Lectures on MySQL" course, which is very good.

My daily ask for a like

OK everyone, that's all for this article. If you've read this far, you're a true fan.

Creating content isn't easy; your support and recognition are my biggest motivation. See you in the next article.

Six Meridians Divine Sword | original article. If there are any errors in this post, please point them out; I'd be much obliged!