My girlfriend cried and asked me why the SQL she wrote ran so slowly.

The world fairy home originally from jusri, why to meet halfway. Jing Hong glanced at you dragon, diffuse angry King Chen nothing. Hi everyone, I’m Luo Shen, gender: male, a programmer from Happy Planet. You’re welcome to follow my public account [programmer Luo Shen], where I hand out goodies from time to time~

Preface

I have actually been planning this article for a long time. As my first technical article since coming back, I kept looking for the right place to start. After going back and forth, I realized it happens to be interview season, so, warm and thoughtful as ever, handsome me decided to write about MySQL optimization. The database is arguably one of the most important parts of any enterprise system today, and anyone who is interviewing has surely noticed how often database questions come up. I have folded in a lot of my own interview notes and reflections, hoping they will be of some help to you.

Background

On a sunny weekend afternoon, Luo Shuaishuai was sitting on the sofa, pensively planning how to slack off next week, the warm sunshine falling on his chiseled face. Intoxicating… Suddenly a cry came from his girlfriend’s room. Being a warm guy, he of course rushed over at once. The moment she saw him she threw her arms around him and sobbed: “Honey, why does the SQL I wrote run so slowly? Quick, help me figure out what’s wrong, boo-hoo~~”

Silly, there is a lot to know about databases; it is not just a matter of writing one SQL statement. A good database design can improve execution efficiency many times over. As the old saying goes, learn databases well and you can go anywhere! Come on, let Luo Shuaishuai walk you through it today.

Main text

As we all know, database tuning is a long and complex process. Database optimization accompanies a product throughout its development, and as the business grows, the direction of optimization has to keep being adjusted. Luo Shuaishuai will explain to his girlfriend how to optimize her SQL from the following four angles:

1. Database table structure optimization
2. SQL and index optimization
3. System and MySQL configuration optimization
4. Hardware optimization

Database table structure optimization

Let’s start with the three normal forms that every developer should follow when designing tables.

First Normal Form (1NF)

Each column of each table must maintain its atomicity, which means that each column stores only one item of information and cannot be subdivided.

For example, a table that stores everything about a person in a single “people” column violates first normal form, because “people” can be subdivided into three fields: name, gender, and age.
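
To make the rule concrete, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for MySQL. The table names (person_bad, person) and the sample row are made up for illustration and are not taken from any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Violates 1NF: one "people" column packs name, gender and age together.
conn.execute("CREATE TABLE person_bad (id INTEGER PRIMARY KEY, people TEXT)")
conn.execute("INSERT INTO person_bad VALUES (1, 'Zhang San,male,28')")

# Satisfies 1NF: every column holds exactly one atomic value.
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, gender TEXT, age INTEGER)"
)

# Migrate by splitting the packed value into its three parts.
for pid, packed in conn.execute("SELECT id, people FROM person_bad").fetchall():
    name, gender, age = packed.split(",")
    conn.execute("INSERT INTO person VALUES (?, ?, ?, ?)", (pid, name, gender, int(age)))

# Filtering by age is now a plain comparison instead of string parsing.
adults = conn.execute("SELECT name FROM person WHERE age >= 18").fetchall()
```

Once the columns are atomic, each one can also be indexed on its own, which is not possible while the values are packed into one string.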

Second Normal Form (2NF)

On top of satisfying first normal form, every non-key column must depend on the whole primary key, not just on part of it.

A design in which name and class together form the primary key violates second normal form, because some columns depend on only part of that key. The class data and the score data should be split into a separate class table and score table, so that every column in each table depends on that table’s whole primary key.
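
A rough sketch of such a split, again with sqlite3 standing in for MySQL. The tables score_bad, class_info and score, the class_teacher column, and the sample data are all hypothetical, invented only to show a partial dependency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical 2NF violation: the key is (student_name, class_name), but
# class_teacher depends on class_name alone (a partial dependency), so the
# teacher would be repeated on every score row of that class.
conn.execute("""
    CREATE TABLE score_bad (
        student_name  TEXT,
        class_name    TEXT,
        class_teacher TEXT,
        score         INTEGER,
        PRIMARY KEY (student_name, class_name)
    )
""")

# 2NF fix: facts about a class live in their own table ...
conn.execute("CREATE TABLE class_info (class_name TEXT PRIMARY KEY, class_teacher TEXT)")
# ... and the score table keeps only what depends on the whole key.
conn.execute("""
    CREATE TABLE score (
        student_name TEXT,
        class_name   TEXT REFERENCES class_info(class_name),
        score        INTEGER,
        PRIMARY KEY (student_name, class_name)
    )
""")

conn.execute("INSERT INTO class_info VALUES ('Class 1', 'Mr. Wang')")
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?)",
    [("Alice", "Class 1", 90), ("Bob", "Class 1", 85)],
)

# Changing the teacher now touches exactly one row.
changed = conn.execute(
    "UPDATE class_info SET class_teacher = 'Ms. Li' WHERE class_name = 'Class 1'"
).rowcount
```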

Third Normal Form (3NF)

Tables must not contain transitive dependencies: a non-key column may not depend on another non-key column. The goal is to eliminate redundancy, although this rule can be relaxed where appropriate. That brings us to denormalization (the “anti-normal form”): sometimes, to improve performance, we deliberately relax the three normal forms and store some business data redundantly. For example, if the order table is always joined with the logistics table just to read one field, we can copy that field into the order table, avoiding a join on every query and improving query efficiency.
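
The order and logistics example can be sketched as follows, with sqlite3 standing in for MySQL. The column name logistics_status and the helper update_logistics_status are assumptions made for this illustration. The point to notice is the cost of redundancy: every write has to keep both copies in sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logistics (id INTEGER PRIMARY KEY, status TEXT);
    -- logistics_status is a deliberate redundant copy of logistics.status
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        logistics_id INTEGER,
        logistics_status TEXT
    );
    INSERT INTO logistics VALUES (10, 'shipped');
    INSERT INTO orders VALUES (1, 10, 'shipped');
""")

def update_logistics_status(logistics_id, status):
    """Update the source row and its redundant copy in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE logistics SET status = ? WHERE id = ?", (status, logistics_id))
        conn.execute(
            "UPDATE orders SET logistics_status = ? WHERE logistics_id = ?",
            (status, logistics_id),
        )

update_logistics_status(10, "delivered")

# Normalized read: needs a join.
joined = conn.execute(
    "SELECT l.status FROM orders o JOIN logistics l ON l.id = o.logistics_id WHERE o.id = 1"
).fetchone()[0]
# Denormalized read: a single-table lookup.
direct = conn.execute("SELECT logistics_status FROM orders WHERE id = 1").fetchone()[0]
```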

When designing table structures, following the three normal forms is far from enough; they are only the foundation. You also need to spend real time designing the tables around actual business requirements, so that the structure does not have to change frequently during development.

With the table structure designed, the next thing to consider is the right storage engine. Luo Shuaishuai will teach you step by step.

MySQL supports a number of storage engines (SHOW ENGINES lists the ones available on your server), but don’t worry, you don’t need to remember all of them. The two you will meet most often are InnoDB and MyISAM.

A brief description of the differences between the two engines:

1. InnoDB supports transactions and crash-safe recovery; MyISAM supports neither. (It is because of this feature that MySQL’s default engine changed from MyISAM to InnoDB.)

2. InnoDB’s finest lock granularity is the row lock; MyISAM’s is the table lock. In other words, an UPDATE on a MyISAM table locks the entire table and blocks other queries and updates.

3. InnoDB supports foreign keys; MyISAM does not. (An InnoDB table with foreign keys cannot be converted to MyISAM.)

4. InnoDB stores table data in a clustered index, while MyISAM uses non-clustered indexes.

5. InnoDB manages its own data pages, 16KB by default. MyISAM relies on the file system to manage table data: for example, if the file system block size is 4KB, MyISAM’s block size is 4KB as well. MyISAM also has no crash recovery mechanism of its own and depends entirely on the file system.

What? Plain text not clear enough? As a warm guy, how could Luo Shuaishuai not have thought of you?

After reading the comparison of the two engines, you probably have a good idea of when to use which. That’s right: when a table serves business operations (that is, anything involving transactions), we definitely choose InnoDB. If some of the data is cold, or some tables are only ever queried, the MyISAM engine can be considered for storing them.

For example, in a blog system, data such as articles, authors and titles is queried many times but almost never modified. It can be moved out of the business tables into MyISAM tables to improve query efficiency.

For frequently updated data such as blog view counts and reply counts, the InnoDB engine can be used to improve update efficiency.

Next, let me tell you about split… “Wait!” His girlfriend mercilessly interrupted Luo Shuaishuai’s train of thought, which had been flowing as smoothly as quicksilver. “Honey, when you were comparing the two engines just now, what were those clustered and non-clustered indexes? I don’t remember ever using that kind of index~”

Luo Shuaishuai fondly patted her head: Silly, let me ask you this. Every business table you create can have a primary key index, right? That primary key index is the clustered index. (Note: primary key index ≠ clustered index in general. Because of how InnoDB works, every table gets a clustered index by default, usually the primary key. If the table has no primary key, the first unique index whose columns are all NOT NULL is used as the clustered index. If neither exists, InnoDB internally generates a hidden primary key to serve as the clustered index.) A table can therefore have only one clustered index (though it may span multiple columns), and its key order determines the physical storage order of the table’s rows. A clustered index makes primary key lookups very fast, but every secondary (non-primary-key) index has to include the primary key columns, so if the primary key is large, the other indexes will be large too.

A non-clustered index is an ordinary index, and a table can have more than one of them. It only builds an index over the chosen columns and, unlike a clustered index, does not affect the physical storage order of the table.
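
The difference between the two kinds of index can be pictured with a toy model in plain Python. This is only an illustration under heavy simplification: real InnoDB indexes are B+ trees, not dictionaries, and the names clustered, secondary_by_name, find_by_id and find_by_name are invented here.

```python
# Clustered index: primary key -> full row (the rows live inside the index).
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 35},
}

# Secondary index on name: indexed value -> primary key, not the row itself.
secondary_by_name = {row["name"]: pk for pk, row in clustered.items()}

def find_by_id(pk):
    """Primary-key lookup: one step, straight to the row."""
    return clustered.get(pk)

def find_by_name(name):
    """Secondary-index lookup: first find the primary key, then the row."""
    pk = secondary_by_name.get(name)
    return None if pk is None else clustered[pk]
```

The model also shows why a large primary key hurts: every entry in secondary_by_name carries a copy of it.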

“Oh, oh, oh! Honey, you’re so good! By the way, what were you about to tell me? Split? As in split up?? I knew it, handsome men are never any good, let alone one as handsome as you!”

Ahem, enough of that; any more praise and Luo Shuaishuai will start to blush. Back to the subject: next up is the heavy weapon of optimization, splitting databases and tables (sharding).

This is a well-worn topic. Splitting databases and tables is a blunt but very direct way to optimize a database. Still, everything has its pros and cons: splitting brings plenty of new problems, and they need to be taken into account when designing the scheme.

Girlfriend: “But I want to split databases and tables! Hurry up and teach me~~~” Sigh, what can I do with you? Who told Luo Shuaishuai to be so warm and lovable. Fine, I’ll teach you.

The first thing we need to know is: why split databases and tables at all?

1. Too many user requests: the ordinary master-slave setup can no longer carry the load.
2. Too much data in a single table: a commonly quoted guideline is to keep a single table under roughly 6 million (600W) rows, although the real threshold depends on row size and hardware. Too much data leads to index bloat, query timeouts, and performance bottlenecks.
3. A single database is too large: the number of concurrent requests one database can handle is limited, and once requests grow, a single database struggles to keep up.

OK, now we know why we need to split. So how do we do it? Don’t worry, let Luo Shuaishuai explain, one step at a time…

Vertical splitting

Vertical splitting comes in two forms: vertical database splitting and vertical table splitting.

Vertical table splitting moves some of the columns of one table out into another table. This is usually done when a single table has too many columns, for example moving large fields such as the shipping address and remarks out of the order table. To some extent this also avoids the “cross-page” problem (MySQL stores rows in “data pages” at the bottom layer, and a row that spans pages may incur extra performance overhead).
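
A minimal sketch of such a split, with sqlite3 standing in for MySQL. The extension table order_detail and the sample data are assumptions for illustration; the narrow orders table keeps the frequently read columns, and the wide columns move to a second table that shares the same key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hot, frequently read columns stay in the main table.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    -- Large, rarely read columns move to an extension table with the same key.
    CREATE TABLE order_detail (
        order_id INTEGER PRIMARY KEY REFERENCES orders(id),
        shipping_address TEXT,
        remarks TEXT
    );
    INSERT INTO orders VALUES (1, 42, 99.5);
    INSERT INTO order_detail VALUES (1, 'No. 1 Example Road', 'leave at the door');
""")

# List pages only touch the narrow table.
summary = conn.execute("SELECT id, amount FROM orders WHERE user_id = 42").fetchall()

# The detail page fetches the wide columns by primary key when needed.
address = conn.execute(
    "SELECT shipping_address FROM order_detail WHERE order_id = 1"
).fetchone()[0]
```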

Vertical database splitting divides a database along business lines. For example, an e-commerce database is split into an order database and a logistics database, and requests for each business go to its own database. This makes it easier to manage, maintain, monitor, and scale the data of each business type, and it is an important way to optimize database architecture in large distributed systems.

Disadvantage of vertical splitting: it does not remove the performance bottleneck of a single table holding too much data.

Horizontal splitting

Likewise, horizontal splitting comes in two forms: horizontal database splitting and horizontal table splitting.

Horizontal table splitting breaks a table with a large amount of data into multiple tables according to some rule (hash, order time, user ID, and so on), so that a query only needs to touch the specific small table its data belongs to.

Horizontal database splitting applies once the data reaches a volume that a single database cannot bear: the data is distributed across different databases by rule, to relieve the processing pressure on any single one.

PS: Data splitting can also be used to separate cold and hot data. Cold data is used for data analysis, while hot data serves business processing.

Disadvantage: horizontal splitting cannot resolve IO contention between tables.
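
As a rough sketch of horizontal table splitting by user ID, the snippet below routes each row to one of several physical tables, with sqlite3 standing in for MySQL. TABLE_COUNT, the orders_N naming scheme, and the helper functions are all assumptions made for this example.

```python
import sqlite3

TABLE_COUNT = 4  # assumed number of physical tables for this example

conn = sqlite3.connect(":memory:")
for i in range(TABLE_COUNT):
    conn.execute(
        f"CREATE TABLE orders_{i} (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
    )

def table_for(user_id):
    """Route a user to one physical table by user_id modulo the table count."""
    return f"orders_{user_id % TABLE_COUNT}"

def insert_order(order_id, user_id, amount):
    conn.execute(
        f"INSERT INTO {table_for(user_id)} VALUES (?, ?, ?)", (order_id, user_id, amount)
    )

def orders_of(user_id):
    """A query that carries the routing key only touches one small table."""
    return conn.execute(
        f"SELECT id, amount FROM {table_for(user_id)} WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()

insert_order(1, 10, 20.0)
insert_order(2, 11, 35.0)
insert_order(3, 10, 5.0)
```

Queries that do not carry the routing key (here user_id) would have to scan every table, which is why the choice of splitting rule matters so much.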

As the above shows, horizontal and vertical splitting each have their pros and cons. The common practice in companies today is therefore to combine the two: vertical splitting decouples the businesses, and horizontal splitting relieves the pressure on single tables.

“Wow, so splitting databases and tables is that powerful! I’m going to go and redesign our company’s database right now!”

Silly, not so fast. Everything has its pros and cons. Let Luo Shuaishuai tell you about the problems that come with splitting databases and tables:

Distributed transaction problem

Once you have split your databases, you inevitably run into the industry’s biggest headache: distributed transactions. When one API call writes to several databases, how do you keep the data consistent? This requires understanding concepts such as 2PC, 3PC, TCC, and XA. (There is a lot to cover here, so Luo Shuaishuai plans to write a separate article on it and will not go into detail now.) In practice, a distributed transaction framework such as TCC-transaction can be used.

Cross-database join problem

After the databases are split, tables in different databases can no longer be joined with JOIN, so a result that used to take one query may now take several. How to solve it? One option is to store the fields you would otherwise join on redundantly in each table, to minimize the need for joins. Another is to assemble the data in code: query each piece of business data separately and then combine them (which is more tedious).
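
Assembling the data in code might look roughly like this. Two separate in-memory sqlite3 connections stand in for two physical databases, and the function orders_with_status is a hypothetical helper written for this sketch, not part of any framework.

```python
import sqlite3

# Two separate connections stand in for two physical databases.
order_db = sqlite3.connect(":memory:")
logistics_db = sqlite3.connect(":memory:")

order_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, logistics_id INTEGER)")
order_db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 11)])

logistics_db.execute("CREATE TABLE logistics (id INTEGER PRIMARY KEY, status TEXT)")
logistics_db.executemany(
    "INSERT INTO logistics VALUES (?, ?)", [(10, "shipped"), (11, "delivered")]
)

def orders_with_status(order_ids):
    """Emulate a cross-database join with two queries and an in-memory merge."""
    if not order_ids:
        return []
    marks = ",".join("?" * len(order_ids))
    orders = order_db.execute(
        f"SELECT id, logistics_id FROM orders WHERE id IN ({marks}) ORDER BY id",
        order_ids,
    ).fetchall()
    if not orders:
        return []

    # One batched query to the second database instead of one per order.
    logistics_ids = [lid for _, lid in orders]
    marks = ",".join("?" * len(logistics_ids))
    status_by_id = dict(
        logistics_db.execute(
            f"SELECT id, status FROM logistics WHERE id IN ({marks})", logistics_ids
        ).fetchall()
    )
    return [(oid, status_by_id.get(lid)) for oid, lid in orders]
```

Batching the second lookup with IN keeps the number of round trips at two regardless of how many orders are requested.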

Horizontal scaling problem

Suppose we distribute data across split databases and tables by hash modulo: with 8 tables, rows are placed by hash%8. If the business later grows and we need to expand to 16 tables, old data may no longer be found with hash%16, because it was placed using 8. How to solve it? This problem usually stems from poor planning before the split. So before splitting databases and tables, do sensible capacity planning based on the current and expected data volume of the business, to avoid having to expand and migrate later.
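
A quick way to see the size of the problem is to count how many keys land on a different table after going from 8 to 16. This is a small sketch that assumes the key is already an integer ID; shard_of is a name made up here.

```python
def shard_of(key, shard_count):
    """Modulo routing as described above; key is assumed to be an integer ID."""
    return key % shard_count

keys = range(16_000)

# Keys whose table changes when the table count goes from 8 to 16.
moved = [k for k in keys if shard_of(k, 8) != shard_of(k, 16)]
moved_ratio = len(moved) / len(keys)
```

For evenly spread IDs, half of the rows have to be migrated. Doubling is actually the friendliest case, since each old table splits cleanly in two; going from 8 to, say, 10 tables moves a much larger share of the rows.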

Another option is to distribute data with consistent hashing, which places nodes and keys on a hash ring. After expansion, only the data adjacent to the new node on the ring is affected. If you’re interested, leave a comment below and I’ll write a dedicated article later.
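
For the curious, here is a minimal sketch of a hash ring. It is one simple way to implement the idea, not the scheme of any particular middleware: the HashRing class, the use of md5 for spreading keys, the replicas parameter (virtual nodes per physical node), and the db0..db8 node names are all assumptions of this example.

```python
import bisect
import hashlib

def _hash(value):
    """Map a string to a point on the ring (md5 is used only for spreading)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per physical node
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place the node's virtual nodes on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        """Walk clockwise from the key's position to the first virtual node."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing([f"db{i}" for i in range(8)])
before = {k: ring.node_for(k) for k in keys}

ring.add("db8")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in moved now belongs to the new node db8, and the keys that stayed put did not change owner at all, which is exactly the property that plain modulo routing lacks.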

Splitting databases and tables can be done with the help of middleware such as Sharding-JDBC or MyCat. Learning resources online are plentiful, so you can build a small project of your own and learn by practicing.

Conclusion:

1. The three normal forms should be followed, but in real business data design it is fine to relax third normal form where appropriate and keep some redundant fields. For example, when the order table references the logistics table, besides the logistics ID the order table can also store a redundant copy of the logistics status, so there is no need to join the whole table just to read one column.

2. For table engines, use InnoDB for data involved in frequent business operations; for cold data, MyISAM can be considered for fast queries.

3. A scheme for splitting databases and tables should be chosen and designed carefully according to actual business needs. Do not split for the sake of splitting, and avoid over-optimization and over-design.

I therefore recommend this order of database optimization: index optimization -> SQL optimization -> table structure optimization -> read/write separation -> splitting databases and tables

OK, with all that said, do you have a rough understanding of database structure design now? Girlfriend: “Uh-huh! It all makes sense now! Honey, you’re amazing! Let’s talk about the rest tomorrow. You’ve worked so hard, so go wash the dishes and then get some rest.”

Closing words

First of all, this is an introductory article and does not go very deep. As we all know, any one of the points above could fill an article of its own. I had wanted to cover database optimization in one go, but it turned out to be too much for a single article. So I split it up by module and will post the other three parts over the next few days (just to pad my post count).

I’m Luo, and I’ll see you next time.