After interviewing 100 people, the boss told me the inner workings of Innodb that I had to ask

preface

Many people wonder why they read source code? Why do you need to know the bottom line?

I’ve been thinking about this for years. I’ve been getting it lately.

In the work, the actual use of middleware, if you do not know the internal principle of middleware, for its external representation is difficult to understand. Can only do superficial memory, to achieve understanding, when you do not understand the underlying principle, is very difficult.

As businesses grow in complexity and volume, the requirements on middleware become more stringent, and there are often many difficult to understand phenomena. Behind these phenomena is unreasonable use. Explaining and solving strange problems requires internal implementation principles.

For Mysql, there is no problem with DAILY CRUD, but there will also be a sudden surge of CPU and memory, so this time surely can not wait for nothing, do I have to tell the boss, I only know CRUD?

The foundation of solving CPU and memory problems is to understand what’s going on inside the Mysql or InnoDB engine. Only when you understand the bottom, can you analyze the cause. Finally, performance tuning is done.

This article takes a look inside the InooDB engine.

Mysql

The history of Mysql

Mysql was founded in 1995 as a Swedish company.

Mysql is called My, not because of My SQL, but because the daughter of Michael Widenius, the father of Mysql, is called My. Mysql is like its own daughter. Hadoop got its name from a yellow toy elephant that belonged to the son of Hadoop founder Doug Cutting. There’s actually a lot of interesting things about the name of the project, but I’ll sort them out later, and I’ll talk about them in more detail. Here is a picture of the father of Msyql, big guy.

General overview

Take a look at the internal architecture of Mysql.

If you haven’t been exposed to Mysql internals before, take a look. Now let’s explain it in detail.

First, we can see that Mysql internally is split into two large chunks. These are the top service layer and the bottom Innodb engine layer. The service layer handles some general logic, while the engine layer handles the core storage logic.

So why is Mysql internally divided into two modules? Because different business scenarios will have different requirements on the database.

Some, for example, require good database performance and low latency, regardless of data persistence. Others require data persistence and transaction operations, but have lower performance requirements. If developers know enough about Msyql, Mysql can even allow developers to implement their own storage engine to meet the database-specific requirements of specific business scenarios. That may be part of the appeal of open source.

Mysql has several storage engines, including InnoDB, MyISAM, Memory, etc

Of course, after 25 years of development, the default InnoDB engine is mostly used internally and no other storage engine is used. It’s kind of like a rule. One or two of them win. (This rule has to be found and stated)

For example, the following construction sentence

create table user {
    id int(10),
    name varchar(10)
}
Copy the code

Create a table named user with two fields named name and age.

So what is his storage engine? One might say, well, I just said Innodb by default, so the engine for the user table is Innodb.

So that raises another question, right? Mysql also supports many storage engines. How do I switch to another storage engine?

This problem may not be known to many people, because Mysql development so far, the industry will basically only use Innodb to store data. It actually switches like this.

create table user{
    id int(10),
    name varchar(10)
} Engine = MyISAM
Copy the code

As you can see from this syntax, the storage engine is for tables. Each table must define its own storage engine, which is Innodb by default.

An overview of the components,

The Server layer consists of five modules: connector, parser, optimizer, executor, and cache.

Connector: It is actually a pool of connections internally and provides connections externally. Each client must first establish a connection with Mysql to work properly.
Parser: Mysql cannot understand SQL statements directly, it must go through a parser. The parser parses the SQL into a syntax tree. The syntax tree is eventually converted into function calls. Only function calls can be executed by Mysql.
Optimizer: Programmers write SQL syntax that is not necessarily optimal. Mysql will optimize it to be a relatively efficient syntax tree.
Executor: The executor executes on the optimized syntax tree.
Cache: Mysql will cache the results of SQL execution in memory, and then the same request can be read directly from the cache. However, due to the low cache hit ratio, older versions of Mysql have removed the cache.

Buffer Pool

A buffer pool is a large memory space used to store mysql data. In mysql, any query statement will be retrieved from the buffer pool. If the buffer pool cannot be retrieved, data will be loaded from disk to the buffer pool first. In other words, all data obtained from mysql cannot be read directly from disk, but must pass through buffer pool.

Consider the following query statement

select * from user where id = 100
Copy the code

The above query statement needs to find the record whose ID is 100 in the user table. We know that all queries are executed by the executor. The executor first queries the buffer pool, and if the buffer pool exists, it directly retrieves the data and returns it. As shown in the figure below.

If the buffer pool with id 10 already exists, the buffer pool with ID 10 will be returned directly.

After all, the buffer pool is memory, Mysql cannot load all data into memory, some data must still be on disk.

What if there is no record with ID 10 in the buffer pool? The buffer pool will first load the record with ID 10 from disk into the buffer pool and then return it again. As shown in the figure below.

At this point, the simple operation principle of buffer pool should be understood. Some people may wonder why there is a buffer pool. Can’t you read the disk directly?

This is because reading directly from disk is too slow, and Mysql would not be able to perform as well as it does if all requests had to be re-read from disk. The buffer pool ensures that most of the requests are in memory and only a small part of the requests are in disk, which greatly improves the performance

Redo Log

Redo logs are also called Redo logs. The redo log records each database operation. For example, update user where id=10 and name = zhangsan or delete user where ID =11.

When the Mysql fails unexpectedly, data in the buffer pool may not be flushed to the disk, resulting in data loss. Redo Redo Log to restore Mysql to its original state during Mysql restart.

Take this update statement for example.

update user SET name = zhangsan where id = 10
Copy the code

Each update is written to the Redo log Buffer after Mysql updates the Buffer Pool, as shown in the following figure.

Note that logs from the Redo log buffer are flushed to disk only after the transaction commits.

Undo Log

If Redo logs are called Redo logs, undo logs should be called undone logs. The raw data is recorded before the transaction is executed.

Let’s say it’s the same update statement.

update user SET name = zhangsan where id = 10
Copy the code

Before this update statement was executed, the record name field with ID 10 was LISi. The original field value of the record with ID 10 will be recorded in the Undo log before the update statement is executed. The content can be thought of as id=10,name=lisi.

In case the transaction needs to be rolled back, the undo log can be used to roll back the data to the state before the transaction was executed, since the original value of the data is recorded in the Undo log.

Bin Log

In addition to the Redo log that records every data change, the Bin log also records every database change. As shown in the figure below.

After writing the redo log, the executor writes a log to the bin log, flusher the bin log to disk when committing the transaction, and writes the bin log location and commit identifier to the redo log. Ensure the consistency between the redo log and bin log logs.

So what’s the difference between Redo logs and Bin logs? There are two points.

The form is different. The Redo Log is A physical Log, for example, data A in A data page is changed to data B. The Bin Log is A logical Log, for example, data A in A record is changed to data B.
The recorder is different. The Redo Log is logged by Innodb engine and the Bin Log is logged by Mysql Server. For example, if you change the storage engine, there will be no Redo Log, only Bin Log.

In the industry, Bin Log is often used for database synchronization, such as Mysql master-slave replication, Mysql data synchronization to other data sources.

The last

This article first establishes the general feeling of Innodb engine, knowing that there are several modules in it, and how the basic function of each module is. Then I will talk about the operation details of some important modules in detail, to have a comprehensive understanding and understanding of Innodb engine.

Give it a thumbs up when you see the last one. It’s not easy to write an article.

I’m shane. Written on the morning of September 14, 2020.