Welcome to pay attention to github.com/hsfxuebao, I hope to help you, if you think it can trouble to click on the Star ha

1. Why database design

When we design the data table, we need to consider many issues. Such as:

  • What data do users need? What data do I need to save in the data table?
  • How to ensure the correctness of data in the data table, and what constraints should be checked when inserting, deleting, and updating?
  • How to reduce the data redundancy of the data table and ensure that the data table does not rapidly expand due to the growth of users?
  • How to make database maintenance personnel more convenient to use the database?
  • The use of database application scenarios are also different, can be said to be different for different situations, in addition to the calculated data table may vary.

In reality, the scenario is:

It was only after the database had been running for a while that we discovered a problem with the data table design. Readjusting the structure of the data table requires data migration, and may affect the business logic of the program, as well as the normal access to the website.

Poor database design can cause the following problems:

  • Data redundancy, information duplication, and storage space waste
  • Data update, insert, or delete exception
  • Information cannot be represented correctly
  • Loss of valid information
  • Poor program performance

Good database design has the following advantages:

  • Saves data storage space
  • Ensure data integrity
  • Facilitate the development of database application system

In short, when setting up the database, we need to pay attention to the design of the data table. In order to build a database with less redundancy and reasonable structure, certain rules must be followed when designing the database.

2. The paradigm

2.1 Paradigm Introduction

In relational databases, the basic principles and rules of data table design are called paradigms. It can be understood as a level of design criteria that the design structure of a data table needs to meet. To design a reasonably structured relational database, it is necessary to satisfy certain paradigms.

The English name of the Normal Form is Normal Form, NF for short. It was summarized by E.F.Codd, an Englishman, who put forward the relational database model in the 1970s. Paradigm is the basis of relational database theory, and it is also the rule and guiding method that we should follow in the process of designing database structure.

2.2 What are the paradigms

At present, there are six common paradigms in relational database, from lowest to highest in terms of paradigms: first paradigms (1NF), second paradigms (2NF), third paradigms (3NF), Bas-Coad paradigms (BCNF), fourth paradigms (4NF) and fifth paradigms (5NF, also known as perfect paradigms). The higher the order of database paradigm design, the lower the redundancy degree, and the higher order paradigm must meet the requirements of the lower order paradigm, the paradigm that meets the minimum requirements is the first paradigm (1NF). The second normal form (2NF) that meets more specification requirements on top of the first is called the second normal form (2NF), and so on. In general, in relational database design, the highest is also followed up to BCNF, generally 3NF. But not always, sometimes to improve the performance of certain queries, we also need to break the paradigm rules, that is, de-normalize.

There are currently six common paradigms for relational databases, ranked from lowest to highest:

  • First Normal Form (1NF)
  • Second normal Form (2NF)
  • Third normal Form (3NF)
  • Bush-codvan formula (BCNF)
  • Fourth normal Form (4NF)
  • The fifth normal Form (5NF, also known as perfect normal Form)

2.3 Concepts of keys and related attributes

The definition uses primary keys and candidate keys. A Key in a database consists of one or more attributes. Definition of several keys and attributes commonly used in data tables:

  • Super key: A set of attributes that uniquely identify a tuple is called a superkey.
  • Candidate key: If the superkey does not contain redundant attributes, the superkey is a candidate key.
  • A primary key: The user can select one of the candidates as the primary key.
  • A foreign key: If an attribute set in table R1 is not the primary key of R1, but the primary key of another table R2, then the attribute set is the tunnel of data table R1.
  • The main properties: An attribute that contains the optional key of any Tl is called a primary attribute.
  • The main properties: As opposed to a primary attribute, an attribute that is not included in the heke Y candidate key.

In general, we also refer to candidate keys as “codes” and primary keys as “primary codes”. Because keys can be composed of multiple attributes, we can also distinguish between primary and non-primary attributes for individual attributes.

Example: Here are two tables

Players table (player) : id number name of | | | | age team number table (team) : team number | | manager is located

  • Super keyFor a player table, a superkey is any combination of player numbers or ID numbers, such as (player number) (player number, name) (ID number, age), etc.
  • Candidate key: is the smallest superkey. For the player table, the candidate key is either (player number) or (ID number).
  • A primary key: We select ourselves, that is, we select one of the candidate keys, such as (player number).
  • A foreign key: Number of the team in the player table.
  • The main propertiesThe main properties: In the player table, the primary attribute is (player number) (ID number), the other attributes (name) (age) (team number) are non-primary attributes.

2.4 First Normal Form (1st NF)

The first normal form is to ensure that the value of each field in the data table must be atomic, that is, the value of each field in the data table is the smallest unit of data that cannot be split again.

When we design a field, for field X, we cannot split field X into field X-1 and field X-2. In fact, any DBMS will meet the requirements of the first paradigm and will not split the fields.

Example 1:

Suppose a company wants to store the names and contact information of its employees. It creates the following table:

The table does not comply with 1NF because the rule says “each attribute of the table must have an atomic (single) value” and the EMP_mobile value of the Lisi and Zhaoliu employees violates this rule. To make the table conform to 1NF, we should have the following table data:

Example 2:

The design of the User table does not conform to the first normal form

The user_info field is user information and can be further divided into smaller fields, which does not meet the requirements of the first normal form in database design. Split user_info as follows:

Example 3: The atomicity of a property is subjective.

For example, should employee names in the Employees relationship be represented with 1 (fullName), 2 (firstName and lastName), or 3 (firstName, middlename, and lastName) attributes? The answer depends on the application. If the application needs to process the employee’s name parts separately (for search purposes, for example), it is necessary to separate them. Otherwise, no need.

Table 1:

Table 2:

Addresses are subdivided in fine granularity

2.5 Second Normal Form (2nd NF)

The second normal form requires that, on the basis of satisfying the first normal form, each data record in the data table should be uniquely identified. And all non-primary key fields must depend entirely on the primary key, not just part of it. If you know the values of all the attributes of the primary key, you can retrieve any value of the field attributes of any tuple (row). (The primary key in the requirement can actually be extended to replace the candidate key)

Select * from table where columns do not exist except for additional dependencies example 1:

In the relation of grade table (Student number, course number, grade), (student number, course number) can determine the grade, but the student number cannot determine the grade, and the course number cannot determine the grade, so “(Student number, course number) → grade” is a completely dependent relationship.

Example 2:

Table player_game contains player id, name, age, match id, match time, match venue and other attributes.

(Player number, match number) → (Name, age, match time, venue, score)Copy the code

However, this data table does not meet the second normal form, because there is also the following correspondence between the fields in the data table:

(Player NUMBER) → (Name, age) → (Match number) → (Match time, venue)Copy the code

For non-primary attributes, there is no complete dependence on candidate keys. What kind of problems does that create?

  1. Data redundancy: If a player can play m games, then the player’s name and age are repeated M-1 times. A game can also be played by n players, and the time and place of the game are repeated n-1 times.
  2. Insert the abnormal: If we want to add a new match, but we don’t know who will be playing, then we can’t insert it.
  3. Remove abnormal: If I want to delete a player id, I will delete the match information at the same time if THE match table is not saved separately.
  4. Update anomalies: If we adjust the time of a match, then all the match times in the table need to be adjusted, otherwise a match time will be different.

To avoid the above situation, we can design the player match table as the following three tables.

In this way, each data table conforms to the second normal form, thus avoiding the occurrence of anomalies.

1NF tells us that field attributes need to be atomic, while 2NF tells us that a table is a separate object and that a table has only one meaning.

Example 3: Defines a relationship named Orders that represents information about the order and the order line:

It violates the second normal form because there are non-primary key attributes that depend on only a portion of the candidate key (or primary key). For example, orderDate, customerID, and CompanyName of an order can be found by orderID alone, without the need to use productid.

Modify: The Orders and OrderDetails tables are as follows, now conforming to the second normal form.

Summary: The second normal Form (2N) requires that the attributes of the entity depend entirely on the primary key. If incomplete dependencies exist, this attribute and this part of the primary key should be separated to form a new entity that has a one-to-many relationship with the meta-entity.

2.6 Third Normal Form (3rd NF)

The third normal form is based on the second normal form to ensure that every non-primary key field in the table is directly related to the primary key field, that is, all non-primary key fields in the table cannot depend on other non-primary key fields. (i.e., there can be no case where non-primary attribute A depends on non-primary attribute B, and non-primary attribute B depends on primary key C, i.e., there is A “A – →B→C” “decision.) Informally speaking, this rule means that all non-primary key attributes cannot have dependencies and must be independent of each other. The primary key here can be expanded to be a candidate key.

Example 1:

Department information table: Each department contains department id (DEPt_id), department name, and department introduction. Employee information sheet: Each employee has employee number, name and department number. After listing the department number, you cannot add information related to the department, such as the department name and department introduction, to the employee information sheet. If the department information table does not exist, it should also be built according to the Third Normal Form (3NF), otherwise there will be a lot of data redundancy.

Example 2:

Commodity category names depend on commodity category numbers and do not conform to the third normal form. Modified: Table 1: Design of commodity category tables conforming to the third normal Form

Table 2: Design of commodity tables conforming to the third normal form

Product table GOODS is related to product category table GOOds_category through the product category ID field (category_id).

Example 3: Player table: player id, name, team name, and team manager. Now, let’s draw the dependencies between the attributes, as shown below:

You can see that the player number determines the team name, while the team name determines the team manager. The manager of a non-primary team will pass depending on the player number, thus not meeting the requirements of 3NF.

If you want to meet the requirements of 3NF, you need to disassemble the data table as follows:

Example 4:

Modify Example 3 in the second normal form. The Orders relationship at this point contains the OrderID, OrderDate, CustomerID, and CompanyName attributes, with the primary key defined as OrderID.

Both CustomerID and CompanyName depend on the primary key — OrderID. For example, you need the orderid primary key to find the customerid representing the customer in the order. Similarly, you need the orderid primary key to find the companyname of the customer in the order. However, customerId and CompanyName also depend on each other. To satisfy the third normal form, it can be rewritten as follows:

In layman’s terms, 2NF and 3NF are usually summarized by the phrase “every non-key attribute depends on the key, depends on the whole key, and is nothing but the key”.

2.7 summary

There are three paradigms to follow when designing data tables.

  • First Normal Form (1NF) : Ensure that each column is maintainedatomic
    • Each column of the database is an indivisible atomic data item, the smallest data unit that cannot be divided again, and cannot be a collection, array, record and other non-atomic data item.
  • Second normal Form (2NF) : Make sure that each column has a primary keyTotally dependent on
    • Especially in the case of compound primary keys, non-primary key parts should not depend on partial primary keys.
  • Third normal Form (3F) : Ensure that each column matches the primary key columnIs directly related to theNot indirectly

Benefits of the paradigm: Standardization of data helps eliminate data redundancy in databases, and the Third paradigm (3NF) is generally considered to strike the best balance in terms of performance, extensibility, and data integrity.

Disadvantages of paradigms: The use of paradigms may reduce the efficiency of queries. Because the higher the level of the paradigm, the more and more refined the data tables designed, the lower the redundancy of the data, the data query may need to be associated with multiple tables, which is not only expensive, but also may make some indexing strategies invalid.

The paradigm simply suggests standards for design, and it is not necessary to actually conform to these standards when designing data tables. In the development, we will violate the principle of formalization for performance and read efficiency. By adding a small amount of redundant or repeated data, we can improve the read performance of the database, reduce the number of associated query and JOI table, and achieve the purpose of space for time. Therefore, in the actual design process to combine theory with practice, flexible use.

Paradigms themselves are not superior or inferior, only applicable scenarios. There is no perfect design, only the right design, we also need to use a mixture of normal and anti-normal patterns in the design of data tables according to the requirements.

3. Counterstereotype

According to the business, appropriate redundant storage fields, reduce the pressure of multi-table query

3.1 an overview of the

Sometimes it is not possible to simply design a data table as required by the specification, because some data appears redundant but is actually important to the business. At this point, we need to follow the business first principle, first meet the business needs, and then minimize redundancy.

If the amount of data in the database is large and the frequency of UV and PV access of the system is high, the data table is designed in full accordance with the three paradigms of MySQL. When data is read, a large number of associated queries will be generated, which will affect the read performance of the database to a certain extent. If we want to optimize query efficiency, anti-paradigm optimization is also an optimization idea. In this case, you can add redundant fields to the data table to improve the database read performance.

Normalization vs performance

  • To meet certain business goals, database performance is more important than canonical databases
  • At the same time of data normalization, the performance of database should be considered comprehensively
  • Significantly reduce the time needed to search for information from a given table by adding additional fields
  • By inserting computed columns into a given table to facilitate queries

3.2 Application Examples

Example 1:

Employee information is stored in the Employees table, and department information is stored in the DEPARTMENTS table. Establish an association with the DEPARTMENTS table through the Department_ID field in the Employees table. If you want to query the name of an employee’s department:

select employee_id,department_name from employees e join departments d
on e.department_id = d.department_id;
Copy the code

Join queries can waste a lot of time if you need to do this often. You can add a redundant field department_NAME to the Employees table so that you don’t have to join every time.

Example 2:

The anti-formalized goods information table is designed as follows:

Example 3:

There are two tables, namely the Commodity flow Table (TRANS) and commodity information table (GoodsINFO). There are 4 million flow records in the commodity flow table, and there are 2000 commodity records in the commodity information table.

Commodity flow chart:

Commodity Information Sheet:

The new flow chart is as follows:

Example 4:

The name and meaning of the corresponding field in the course comment table class_comment are as follows:

Student table student, the corresponding field name and meaning are as follows:

In practice, when we display course comments, we usually display the student’s nickname instead of the student ID. Therefore, when we want to query the first 1000 comments of a course, we need to associate the two tables class_comment and student for query.

Experimental data: Simulate two data tables of a million magnitude

To better perform SQL optimization experiments, we need to randomly simulate millions of data for the student table and the course comment table. We can implement simulated data through stored procedures.

Experimental comparison of anti-paradigm optimization

If we wanted to query the first 1000 comments for the course ID 10001, we would write as follows:

SELECT p.comment_text, p.comment_time, stu.stu_name FROM class_comment AS p LEFT JOIN student AS stu ON p.stu_id = stu.stu_id WHERE p.class_id = 10001 ORDER BY  p.comment_id DESC LIMIT 1000;Copy the code

Run result (1000 data rows) :

Running time is 0.395 seconds, which is very slow for the responsiveness of the site, and the user experience will be very poor.

If we wanted to make our queries more efficient, we could allow some data redundancy by adding user nickname fields to the product comment table and stu_name fields to the class_comment table, resulting in the class_COMMENT2 table.

This way, a single table query can get the result of the dataset:

SELECT comment_text, comment_time, stu_name
FROM class_comment2
WHERE class_id = 10001
ORDER BY class_id DESC LIMIT 1000;
Copy the code

Operating results (1000 pieces of data) :

After optimization, it only needs to scan the clustered index once, the running time is 0.039 seconds, and the query time is 1/10 of the previous one. As you can see, query efficiency is significantly improved with large volumes of data.

3.3 New problems of anti-paradigm

Anti-paradigm can improve query efficiency by changing space for time, but it also brings some new problems.

  • storageSpace greatens
  • If a field in one table is modified, redundant fields in the other table must be synchronized. Otherwise, the data in the other table must be synchronizedData inconsistency
  • If a stored procedure is used to support additional operations such as data update and deletion, if the update is frequent, it will be veryConsuming system resources
  • inSmall amount of dataIn this case, the anti-paradigm does not capture the performance advantage and may also make the database design morecomplex

3.4 Application scenarios of anti-paradigm

People improved on the basis of 3NF and put forward Bass Normal Form (BCNF), also known as Bass -Codd Normal Form (Boyce-Codd Normal Form). BCNF is considered to have no new design specifications, but only stronger requirements for design specifications in the third normal form, resulting in less database redundancy. Therefore, called modified third normal form, or expanded third normal form, BCNF is not called fourth normal form.

If a relationship reaches the third normal form and it has only one candidate key, or each of its candidate keys is single-attribute, then the relationship naturally reaches BC normal form.

In general, a database design that conforms to 3NF or BCNF will suffice. Antiparadigm optimization is used when redundant information is valuable or greatly improves query efficiency.

  1. Suggestions for adding redundant fields

Adding redundant fields must meet the following two conditions:

  • This redundant fieldYou don't need to change it very often
  • This redundant fieldInquiry is indispensable
  1. Historical snapshots and historical data are required

In real life, we often need some redundant information, such as the consignee information in the order, including name, phone number and address, etc. The receiving information of each order is a historical snapshot and needs to be saved. However, users can modify their own information at any time. In this case, it is necessary to save the redundant information. Anti-paradigm optimization is also commonly used in the design of data warehouse, because data warehouse usually stores historical data, which has weak real-time requirements for adding, deleting and modifying, and strong requirements for analyzing historical data. In this case, data redundancy is properly allowed to facilitate data analysis.

4. BCNF(Bass Paradigm)

4.1 case

In this table, a warehouse has only one administrator, and an administrator manages only one warehouse. Let’s first tease out the dependencies between these attributes.

The warehouse name determines the manager, the manager determines the warehouse name, and the set of attributes (warehouse name, item name) can determine the quantity attribute. In this way, we can find the candidate key for the table.

Candidate keys: are (admin, item name) and (warehouse, item name), and then we choose one of the candidate keys as the primary key, such as (Warehouse, item name). Primary properties: Contains properties in any of the candidate keys, that is, warehouse name, administrator, and item name. Non-primary property: Quantity is the property.

4.2 Whether it conforms to the three paradigms

How do you determine the form of a table? We need to judge from low to high according to the hierarchy of paradigms.

First, every attribute of the data table is atomic and meets the requirements of 1NF. Secondly, the non-primary attribute “quantity” of the data table is completely dependent on the candidate key, (warehouse name, item name) determines the quantity, (administrator, item name) determines the quantity. Therefore, the data table meets the requirements of 2NF; Finally, non-primary attributes in the data table do not pass dependent on candidate keys. Therefore, it meets the requirements of 3NF.

4.3 Existing problems

Since the data table has met the requirements of 3NF, is there no problem? Let’s look at the following:

  • Add a warehouse, but don’t store anything yet. According to the requirement of data table entity integrity, the primary key cannot have empty value, so there will be insert exception;
  • If the warehouse changes administrators, we may modify multiple records in the data table;
  • If all items in the warehouse are sold short, the warehouse name and the corresponding administrator name will be deleted accordingly.

As you can see, even if the table meets the requirements of 3NF, there may be exceptions to insert, update, and delete data.

4.4 Troubleshooting

First, we need to confirm the cause of the exception: the primary attribute repository name is partially dependent on the candidate key (administrator, item name), which could lead to the above exception. Therefore, BCNF is introduced, which removes the partial or transitive dependence of the primary attribute on the candidate key on the basis of 3NF.

  • If U is the primary key in relation R, and the A attribute is an attribute of the primary key, then the relation is not BCNF if A->Y is the primary attribute.

According to the requirements of BCNF, we need to split the warehouse_keeper table into the following: (Warehouse name, item name, quantity) So that there is no partial or pass-through dependence of the primary attribute on the candidate key, the above table is designed to comply with the BCNF.

For example:

There is a student advisor table that contains fields: Student ID, major, Advisor, major GPA, where student ID and major are joint primary keys.

This table is designed to meet the three paradigms, but there is another dependency, “major” depends on “mentor”, that is to say, each mentor is only a professional mentor, as long as we know which mentor, we will naturally know which major.

Some of the primary keys in this table depend on the non-primary key Advisor, so we can split the table into 2 tables by the following adjustments:

Student Tutor List:

Mentor table:

5. Fourth normal Form

The concept of multi-valued dependencies:

  • Many values depend onThat is, the one-to-many relationship between attributes, denoted as K→ one →A.
  • Functional dependenciesIn fact, they are single-valued dependencies, so one-to-many relationships between attribute values cannot be expressed.
  • Trivial many-valued dependencies: The complete set U=K+A, A K can correspond to multiple A, that is, K→ A. The entire table is now a set of one-to-many relationships.
  • Nontrivial multivalued dependencies: The complete set U=K+A+B, A K can correspond to multiple A, can also correspond to multiple B,A and B are mutually independent, namely K→→A,K→→B. The whole table has a one-to-many relationship of multiple groups, and there are: “one” part is the same attribute set, “multiple parts are mutually independent attribute set.

The fourth normal form is to eliminate non-trivial and non-functional multi-valued dependencies on the basis of satisfying the Bus-Coad normal Form (BCNF) (that is, delete the many-to-many relations in the same table).

Example 1: employee table (employee number, employee child name, employee elective courses).

In this table, the same employee may have multiple child names. Similarly, the same worker may also have more than one worker elective courses, that is, there are many value facts, do not conform to the fourth paradigm.

If you want to conform to the fourth normal form, you only need to divide the table above into two tables, so that they have only one multi-valued fact, such as: employee table 1 (employee number, employee child name), employee table 2 (employee number, employee elective course), both tables have only one multi-valued fact, so it conforms to the fourth normal form.

Example 2:

For example, we build models of courses, teachers and textbooks. We stipulate that each course has a corresponding set of teachers, and each course also has a corresponding set of textbooks. The textbooks used in a course have nothing to do with the teachers. The relationship table we established is as follows: course ID, teacher ID, textbook ID; These three columns act as the joint primary key.

For convenience, we use Name instead of ID to make it easier to read:

This table has no fields other than primary keys, so it must satisfy the BC paradigm, but there are exceptions due to multi-value dependencies.

If we want to use a new English advanced mathematics textbook next semester, but we haven’t decided which teacher will teach it, then we can’t maintain the relationship between Course advanced mathematics and English advanced mathematics textbook in this table.

The solution is that we split the multi-value dependent table into two tables and establish the relationship respectively. Here’s our split table:

As well as

6. The fifth paradigm, domain key paradigm

In addition to the fourth normal form, we also have more advanced fifth normal form (also known as perfect normal form) and domain bond normal form (DKNF).

On the basis of satisfying the fourth normal form (4NF), the join dependencies not implied by candidate keys are eliminated. A relational pattern R is said to conform to the fifth normal form if every join dependency in R is implied by a candidate key of R.

Function dependencies are a special case of multivalued dependencies, which are really a special case of join dependencies. However, unlike function dependencies and multivalued dependencies, join dependencies can be derived directly from semantics, but are reflected in relational join operations. Data redundancy and insert, modify, and delete exceptions may occur in connection dependent relational patterns.

The fifth paradigm, which deals with lossless connections, is largely moot because lossless connections are rare and difficult to detect.

The domain bond paradigm attempts to define an ultimate paradigm, which considers all types of dependencies and constraints, but is of minimal practical value and only exists in theoretical studies.

7. Real case study

Analysis of the purchase list in the supermarket purchase system:There are many fields in this table, and the amount of data in this table is amazing. A lot of repetition causes tables to become large and inefficient. How to transform?

In the actual work scenario, the phenomenon of data duplication caused by unreasonable data table structure design is not uncommon. Although the system can run, the carrying capacity is poor. If there is a little traffic, there will be insufficient memory and soaring CPU usage, and even the whole project will fail.

7.1 Iteration 1: Consider 1NF

The first normal form requires that all fields be basic data fields and cannot be further split. Make sure that each of the columns contains only one type of data.

In this table, we split the “property” field into “specification” and “unit” as follows:

7.2 Iterate twice: consider 2NF

The second normal form requires that, on the basis of satisfying the first normal form, each data record in the data table should be uniquely identified. And all fields must depend entirely on the primary key, not just part of it.

The first step is to determine the primary key of the table. The listnumber and barcode fields uniquely identify each record and serve as the primary key.

Step 2, once you have determined the primary key, determine which fields depend entirely on the primary key and which fields depend only on a part of the primary key. Split out the fields that depend on only part of the primary key to form a new table.

First of all, the “GoodsName” “Specification” “Unit” in the purchase order list are the attributes of the product, which only depends on the “barcode”. It’s not entirely dependent on the primary key, it can be split out. We split these three fields together with their dependent field barcode to form a new data table commodity information table.

In this way, the original table is split into two tables.

Commodity Information Sheet:

List of commodity details:

In addition, the fields “SupplierID”, “SupplierName”, and “stock” rely only on “listNumber”, not entirely on the primary key, so, We can separate the fields “SupplierId” supplierName “stock” and add the dependent word “listNumber” to form a new table purchase single header table. The remaining fields form a new table, which we call the “purchase Order List.” The original data table is split into three tables.

Purchase list:



List of purchase order:



Commodity Information Sheet:

The three data tables after splitting all meet the requirements of the second normal form.

7.3 Iterate 3 times: Consider 3NF

We have a single head of inventory and the possibility of data redundancy. Because “supplierName ‘” depends on” supplierId “. So, at this point, we can split according to the third normal form. We will further split the next purchase sheet, it is disassembled into supplier sheet and purchase sheet.

List of suppliers:



Purchase list:



Both tables satisfy the requirements of the third normal form

7.4 Anti-stereotype: The principle of service priority

In the purchase order list, quantity*importprice= importValue, “importprice'”, “quantity'” and “importValue ‘” can be calculated from any two of the three fields, thus there are redundant fields. If we strictly follow the requirements of the Third normal Form, we should now optimize step by step. The optimization is to delete one of the fields and keep the other two so that there is no redundant data.

But can it really be done? To answer this question, we need to understand the business first principle in practice.

The so-called business priority principle means that everything is based on business needs and technology serves the business. Completely according to the theory of design is not necessarily optimal, but also according to the actual situation to decide. Here’s a look at the pros and cons of different options.

For Quantity *importprice= importValue, it appears that “importvalue'” is a redundant field, but it does not cause data inconsistency. However, if we cancel this field, it will affect the business.

Because sometimes, suppliers often carry out some promotional activities, by the amount of promotion, that they bring the purchase list only amount, no price. On the contrary, the “importprice'” is calculated by “importValue” ÷ “quantity”, which will produce a large error after rounding. After a long period of time, the query results may differ greatly, affecting system reliability.

For example: The quantity is 34, then the purchase price is 25.5÷34=0.74 yuan, but if the calculated purchase price is used to calculate the purchase amount, then, The importvalue is equal to 0.74×34=25.16 yuan, and the difference is 25.5-25.16=0.34 yuan.

Therefore, in line with the principle of service priority, “importValue ‘”,” importprice” and “quantity” can be left without affecting system reliability.

Therefore, we can finally divide the purchase order form into the following four tables:

List of suppliers:



Purchase list:



List of purchase order:



Commodity Information Sheet:

8. ER model

Database design is related to the whole body. Is there any way to see the full picture of the database in advance? Such as which tables are needed, which fields should be in the tables, how the tables relate to each other, which fields are connected, and so on. In this way, we can carry out the overall combing and design.

In fact, the ER model is one such tool. ER model, also known as entity relation model, is a data model used to describe objectively existing things in real life, their attributes and the relationships between things. In the design stage of developing information system based on database, E model is usually used to describe information requirements and information characteristics, which helps us to clarify business logic, so as to design an excellent database.

8.1 What are the elements of ER model?

There are three elements in the ER model, namely entities, attributes and relationships.

  • entity, can be seen as data objects, often corresponding to real individuals in real life. In ER model, userectangularTo represent. There are two types of entities, strong and weak. Strong entities are entities that are not dependent on other entities; Weak entities are those that have a strong dependency on another entity.
  • attribute, refers to the characteristics of the entity. Such as the supermarket’s address, contact number, number of employees and so on. In ER modelThe ovalTo represent.
  • Relationship between, refers to the relationship between entities. For example, when supermarkets sell goods to customers, it is a kind of connection between supermarkets and customers. In ER modelThe diamondTo represent.

Note: Entities and attributes are not easily distinguished. Here provides a principle: we should look at it from the perspective of the system as a whole, what can exist independently are entities, and what cannot be separated are attributes. That is, attributes cannot contain other attributes.

8.2 Types of Relationships

Among the three elements of ER model, the relationship can be divided into three types, namely one-to-one, one-to-many and many-to-many.

One-to-one: refers to the one-to-one relationship between entities, for example, the relationship between individuals and id card information is one-to-one. A person can only have one ID card information, and one ID card information belongs to one person only.

One-to-many: An entity on one side can correspond to multiple entities on the other side. Conversely, entities on the other side can only correspond to entities on the one side through this correlation. For example, if we create a class table, and each class has multiple students, each student corresponds to a class, class to student is one-to-many relationship.

Many-to-many: The entities on both sides of a relationship can map to multiple entities of the other side. For example, in the purchase module, the relationship between suppliers and supermarkets is many-to-many. One supplier can supply goods to multiple supermarkets, and one supermarket can purchase goods from multiple suppliers. Another example is a list of courses, there are many subjects, each subject has many students, and each student can choose multiple subjects, this is a many-to-many relationship.

8.3 Modeling and Analysis

The ER model may seem cumbersome, but it is very important for us to control the overall project. If you are developing a small application, it may be enough to simply design a few tables, but once you are designing an application of a certain scale, it is critical to establish a complete ER model at the beginning of the project. The essence of developing application projects is modeling.

The case we design is e-commerce business. Since e-commerce business is too large and complex, we simplify the business. For example, in terms of the meaning of SKU (StockKeepingUnit) and SPU (Standard Product Unit), We use the SKU directly, without mentioning the concept of SPU. This e-commerce business design has a total of 8 entities, as shown below.

  • Address of the entity
  • The user entity
  • Shopping cart entity
  • Comment on the entity
  • Commodity entity
  • Commodity classification entity
  • Order entity
  • Order Details Entity

Among them, users and goods categories are strong entities because they do not need to depend on any other entities. The others are weak because they can all exist independently, but they all depend on the user as an entity and are therefore weak. Knowing these elements, we can create an ER model for the e-commerce business, as shown in the figure below:

In this diagram, the additive relationship between the address and the user is one-to-many, while the item and item details are one-to-one, and the item and order are many-to-many. The ER model includes eight relationships between eight entities.

  • Users can add multiple addresses on the e-commerce platform;
  • Users can have only one shopping cart;
  • Users can generate multiple orders;
  • Users can post multiple comments;
  • A product can have multiple reviews;
  • Each commodity category contains a variety of commodities;
  • An order can contain multiple items, and an item can be in multiple orders.
  • The order contains multiple order details, because an order may contain different kinds of goods

8.4 Refinement of ER model

With this ER model, we can understand the business of e-commerce as a whole. The ER model just presented shows the framework of e-commerce business, but it only includes eight entities such as order, address, user, shopping cart, comment, commodity, commodity classification and order details, as well as the relationship between them, but cannot correspond to specific tables or the association between tables. We need to add the properties to the ellipse, so that we have a more complete ER model.

Therefore, we need to further design each part of the ER model, that is, to refine the specific business process of e-commerce, and then integrate them together to form a complete ER model. This helps clarify the design of the database. Next, let’s examine what attributes each entity has, as shown below.

  • The address entity includes the user number, province, city, region, recipient, phone number, and default address.
  • User entities include user ID, user name, nickname, user password, mobile phone number, email, profile picture, and user level.
  • The shopping cart entity includes the shopping cart number, user number, item number, item quantity, and image file URL.
  • The order entity includes the order number, consignee, recipient phone number, total amount, user number, payment method, delivery address, and order time.
  • Order details entities include order details number, order number, commodity name, commodity number and commodity quantity.
  • Commodity entity includes commodity number, price, commodity name, classification number, whether to sell, specification and color.
  • Comment entities include comment ID, comment content, comment time, user number and product number
  • Commodity classification entities include category number, category name and parent category number

After such subdivision, we can redesign the e-commerce business. The ER model is shown as follows:

8.5 ER model diagrams are converted into data tables

By drawing the ER model, we have clarified the business logic. Now, we are about to take a very important step: converting the drawn ER model into a concrete data table. Here are the transformation principles:

  • An entity is usually converted to a table of numbers;
  • A many-to-many relationship is also usually converted to a number ju table:
  • A 1-to-1, or 1-to-many, relationship is often expressed by a table’s foreign key, rather than by designing a new table;
  • Property to the fields of the table.

The following combined with the previous ER model, specifically explain how to use the original side of these transformations, the ER model into a specific data table, so as to abstract the data model, to implement the specific database design.

8.5.1 Converting an entity to a data table

Let’s take a look at the strong entity to data table: the user entity to user table (user_info) code is shown below.

CREATE TABLE 'user_info'('id' bigint(2g) NOT NULL auto_. INCREMENT COMMENT' 'user_name' varchar(2BB)DEFAULT NULL COMMENT' iD ', 'nick_name 'varchar(2BB)DEFAULT NULL COMMENT' iD', 'passwd' varchar(2gO)DEFAULT NULL COMMENT' user password ', 'phone_num 'varchar(2gB)DEFAULT NULL COMMENT' user id', 'email 'varchar(2OB)DEFAULT NULL COMMENT' mailbox', 'head_img' varchar(2GG)DEFAULT NULL COMMENT' avatar ', 'user_level' varchar(208)DEFAULT NULL C0 ment 'user level', PRIMARY KEY (id'))ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=Utf8 COMMENT=' 表';Copy the code

The commodity classification entity is transformed into base_category. As commodity classification can be classified into first-level and second-level categories, such as home furnishment and mobile phone, etc., the second-level classification can be divided into mobile phone accessories and operators according to the first-level classification of mobile phones. Here, we plan the commodity classification entity into two tables. Level-1 classification table and level-2 classification table are respectively planned because both level-1 and level-2 classification tables are limited. The service structure is clearer when two tables are stored.

CREATE TABLE 'base_category1'('id 'BIGint (28) NOT NULL AUTO_INCREMENT COMMENT' id', 'id' varchar(10) NOT NULL COMMENT' iD ', PRIMARY KEY (id') USING BTREE)ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSE:=Utf8 ROW_FORMAT=DYNAMIC COMMENT=' 1 '; CREATE TABLE 'base_category2' 'id' BIGint (28) NOT NULL AUTOLINCREMENT COMMENT' id ', 'name' varchar(20) NOT NULL COMMENT' catid ', 'catid' bigint(20) DEFAULT NULL COMMENT' catid ', PRIMARY KEY ('id')USING BTREE ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=Utf8 ROW_FORMAT=DYNAMIC COMMENT=' t2 ';Copy the code

How about if it is planned as a table, the table structure is as follows.

CREATE TABLE 'base_category' id 'BIGint (2g)NOT NULL AUTO_INCREMENT COMMENT' id', 'name 'varchar(2gg) NOT NULL C0 MMENT' category_parent_id' BIGINT (20)DEFAULT NULL COMMENT' parent_id', PRIMARY KEY ('id') USING BTREE ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET= UTf8 ROW_FORMAT=DYNAMIC COMMENT=' table 'Copy the code

In this case, when querying level-1 classification, you need to determine whether the parent classification number is empty. However, if the number of level-2 classification is also empty, service data may be confused. In addition, the index cannot be used when the secondary classification IS NOT NULL condition. At the same time, such a design also does not conform to the second normal form (because the parent category number does not depend on the category number D, because the parent category number can have many data NULL), so it is necessary to split the table. Therefore, both business requirements and database table specifications should be split into two tables.

Now let’s convert the weak entities into tables:

Address entities are converted into address tables (user_Address) as shown below.

CREATE TABLE 'user_address' id 'bigint(20) NOT NULL AUTO_INCREMENT COMMENT' 增 长', 'type' varchar(500) DEFAULT NULL COMMENT, 'city 'varchar(5Ba)DEFAULT NULL COMMENT' city', 'user_address' varchar(50) DEFAULT NULL COMMENT' user_address', 'user_id' bigINT (20) DEFAULT NULL COMMENT' id', 'consignee' VARCHar (48) DEFAULT NULL COMMENT' recipient ', 'phone_num' varchar(48) DEFAULT NULL COMMENT' contact ', 'is_default 'varchar(1) DEFAULT NULL C0MMENT', PRIMARY KEY ('id ')ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=Utf8 COMMENT=' Utf8 ';Copy the code

The order entity is transformed into an order table (order_.info), as shown below, and the actual business will have a lot of information about the order.

CREATE TABLE 'order_info' 'id 'BIGint (28) NOT NULL AUTO_INCREMENT COMMENT 'id', 'Consignee' VARCHar (18) DEFAULT NULL COMMENT 'Consignee', 'ConsigNEE_tel' VARCHar (28) DEFAULT NULL COMMENT 'Receiver phone', 'total_amount' decimal(18,2)DEFAULT NULL C0MMENT 'user_id'bigint(20) DEFAULT NULL COMMENT' id', 'payment_way 'varchar(28) DEFAULT NULL COMMENT' iD ', 'delivery_address 'varchar(18)DEFAULT NULL C0 MMENT' address', 'create_time' datetime DEFAULT NULL COMMENT' order time', PRIMARY KEY ('id')USING BTREE ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET= UTf8 ROW_FORMAT=DYNAMIC COMMENT=' order entity table ';Copy the code

The order detail entity is transformed into an order detail table (order_detail), as shown below. (See next section for many-to-many relationships)

CREATE TABLE 'order_detail'('id' BIGint (28) NOT NULL AUTO_INCREMENT COMMENT' order_detail', 'order_id' bigint(28) DEFAULT NULL COMMENT' order_id', 'sku_id' bigint(20) DEFAULT NULL COMMENT sku_id', 'sku_name' varchar(28)DEFAULT NULL COMMENT'sku name', 'sku_num 'varchar(28)DEFAULT NULL COMMENT' 数 据', 'create_time 'datetime DEFAULT NULL COMMENT' 数 据', PRIMARY KEY ('id 'USING BTREE)ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=Utf8 ROW_FORMAT=DYNAMIC COMMENT=' SQL';Copy the code

The shopping cart entity is transformed into a shopping cart table (cart_info), as shown below.

CREATE TABLE 'cart_info'(' cart_ID 'BIGint (28) NOT NULL AUTO_INCREMENT COMMENT' ID',  'user_id' varchar(200) DEFAULT NULL COMMENT 'id', 'sku_id' bigint(20) DEFAULT NULL COMMENT 'skuid', 'sku_num' int(11) DEFAULT NULL COMMENT' 主 键 ', 'img_uri' varchar(58) DEFAULT NULL COMMENT' 主 键 ', PRIMARY KEY (id')USING BTREE)ENGINE:=InnoDB AUTO._INCREMENT=1 DEFAULT CHARSET=Utf8ROW_FORMAT=DYNAMIC COMMENT=' 1 ';Copy the code

Comment entities are transformed into comment tables (Members), as shown below.

CREATE TABLE 'sku_comments' 'comment_id' BIGint (28) NOT NULL AUTO_INCREMENT COMMENT' 'user_id' bigint(28) DEFAULT NULL COMMENT' select * from decimal ', 'sku_id' decimal(18) DEFAULT NULL C0 MMENT' 'varchar(288) DEFAULT NULL comment', 'create_time' datetime DEFAULT NULL comment', PRIMARY KEY ('id 'USING BTREE)ENGINE=InnoDB AUTO_INCREMENT=45 DEFAULT CHARSET=Utf8ROW_FORMAT=DYNAMIC COMMENT=' BTREE';Copy the code

Commodity entities are transformed into commodity tables (Members), as shown below.

CREATE TABLE 'sku_info' 'sku_id 'BIGint (20) NOT NULL AUTO_INCREMENT COMMENT' itemID ', 'price 'decimal(10,2) DEFAULT NULL COMMENT' price', 'sku_name' varchar(280) DEFAULT NULL COMMENT'sku name', 'sku_desc' Varchar (288) DEFAULT NULL COMMENT' catid ', 'category3_id 'BIGint (28) DEFAULT NULL COMMENT' catid', 'color 'varchar(28) DEFAULT NULL COMMENT' color', 'is_sa1le 'tinyint(3) N0T NULL DEFAULT'0' C0MMENT' Yes O: No)', PRIMARY KEY (id.')USING BTREE ENGINE=InnoDB AUTO_INCREMENT=45 DEFAULT CHARSET=utf8 ROW_FORMAT-DYNAMIC COMMENT='Copy the code

8.5.2 A many-to-many relationship is transformed into a data table

There is one many-to-many relationship in this ER model, that is, the relationship between goods and orders. Goods of the same category can appear in different orders, and different orders can also contain goods of the same type, so the relationship between them is many-to-many. In view of this situation, it is necessary to design an independent table to represent, which is generally called an intermediate table.

We can design a separate order detail table to represent the inclusion relationship between the item and the order. This table is associated with two entities, orders and goods. Therefore, the table must include the primary key of the table to which these two entities are converted. In addition, we also include the properties of the relationship: the quantity of goods, the price at which the goods are ordered, and the name of the goods.

CREATE TABLE 'order_detail'(' id 'bigint(2O)NOT NULL AUTO_INCREMENT COMMENT' order_detail', 'order_id 'bigint(2B)DEFAULT NULL COMMENT' order_id', 'sku_id' bigint(20)DEFAULT NULL COMMENT sku_i 'sku_name' varchar(2gB)DEFAULT NULL COMMENT'sku name', 'sku_num' varchar(2gB)DEFAULT NULL COMMENT' 数 据 ', 'create_time 'datetime DEFAULT NULL COMMENT' 数 据', PRIMARY KEY (ID) USING BTREE ENGINE:=InnoDB AUTO._INCREMENT:=1 DEFAULT CHARSET:=Utf8ROW_FORMAT=DYNAMIC COMMENT=' order list ';Copy the code

8.5.3 The One-to-many relationship is expressed through foreign Keys

In the design of the table above, we can use foreign keys to express 1-to-many relationships. For example, in the product comment table SKu_comments, we define user_id and sku_id as foreign keys, respectively, to set foreign keys with the following statement.

CONSTRAINT fk_comment_user FOREIGN KEY(user_id) REFERENCES user_info (id),
CONSTRAINT fk_comment_sku FOREIGN KEY(sku_id) REFERENCES sku_info (sku_id)
Copy the code

Foreign key constraints are mainly used to ensure data consistency at the database level, but because foreign keys need to be checked to insert and update data, theoretically performance can be reduced and negatively affected.

In practical projects, foreign keys are not recommended, on the one hand to reduce the complexity of development (with foreign keys, the operation of primary and secondary table classes must operate on the primary table first), and on the other hand, foreign keys can be very troublesome when processing data. On e-commerce platforms, foreign keys are not configured because concurrent services are heavy. Otherwise, database performance may be affected.

8.5.4 summary

In fact, any database-based application projects can be built through this first ER model, and then converted into data tables, complete the design of the database. Creating an ER model is not the goal, the goal is to sort out the business logic and design a good database. I suggest that instead of modeling for modeling’s sake, you use the process of creating an ER model to organize your thoughts so that creating an ER model makes sense.

9. Data table design principles

Based on the above, the general principles of data table design are summarized: three less and one more

  • The fewer tables you have, the better
    • The core of an RDBMS is the definition of entities and associations, also known as the E-R Diagram. The fewer the data tables, the more concise the design of entities and associations, which is easy to understand and operate.
  • The fewer fields in the data table, the better
    • The more fields there are, the greater the possibility of data redundancy. The minimum number of fields must be independent of each other, rather than the value of one field can be calculated by other fields. Of course, the number of fields is relatively small, we usually inData redundancyandRetrieval efficiencyTo balance.
  • The fewer fields in a table that have a federated primary key, the better
    • A primary key is used to determine uniqueness. When uniqueness cannot be determined for a field, a joint primary key is used (that is, multiple fields define a primary key). The more fields in the federated primary key,The larger the index space is, which not only makes it more difficult to understand, but also increases the running time and index space, so the fewer fields associated with the primary key the better.
  • The more primary and foreign keys you use, the better
    • The design of a database is essentially to define tables and relationships between fields. The more relationships there are, the less redundancy there is between these entities,The higher the utilization. The advantage of this is that it not only ensures that the data table is in betweenindependence, can also improve the usage of correlation between each other.

The core of the principle of “three less, one more” is simple reuse. Simplicity means fewer tables, fewer fields, and fewer federated primary key fields to complete the data table design. The reuse side enhances the reuse rate between data tables by using primary and foreign keys. Because a primary key can be understood as representing a table. The more keys you design, the higher the utilization between them.

Note: This principle is not absolute, sometimes we need to sacrifice data redundancy for efficiency.

10. Database object writing suggestions

10.1 about the library

  • [Mandatory] The library name must contain a maximum of 32 characters and can only contain letters, digits, and underscores (_). You are advised to start with a letter.
  • [Mandatory] All database names are lowercase in Both Chinese and English, and different words are separated by underscores. One must know by name.
  • [Mandatory] Library name format: business system name_subsystem name.
  • [Mandatory] Prohibit the use of keywords (such as type,order, etc.) in library names.
  • [Mandatory] The character set must be utF8 or UTF8MB4 when creating a database. SQL example for creating a DATABASE: CREATE DATABASE crm_fund DEFAULT CHARACTER SET ‘utF8’;
  • [Suggestion] For programs to connect to the database account, follow the principle of minimum permission. The database account can only be used in one DB and cannot be used across databases. Accounts used by programs are not allowed to have the DROP permission in principle.
  • [Suggestion] Temporary libraries are prefixed with TMP_ and suffixed with date. Backup libraries are prefixed with bak_ and suffixed with date.

10.2 About Tables and Columns

  • [Mandatory] The name of a table or column must contain a maximum of 32 characters. The table name can contain only letters, digits, and underscores (_). You are advised to start with a letter.
  • [Mandatory] All table names and column names are lowercase, and different words are separated by underscores. One must know by name.
  • [Mandatory] Table names must be strongly correlated with module names. Use the same prefix for table names of the same module. For example: crm_fund_item
  • [Mandatory] The character set must be utF8 or UTF8MB4 when creating a table.
  • [Mandatory] Do not use keywords (such as type or order) in the table name or column name.
  • [Mandatory] The type of the table storage engine must be explicitly specified when creating a table. If there are no special requirements, all InnoDB.
  • [Mandatory] Create a table with a comment.
  • [Mandatory] The field name should use English words or abbreviations that express the actual meaning as much as possible. For example, instead of using corporation_id, use corp_id.
  • [Mandatory] Fields of Boolean type are named IS_ description. For example, the name of the member field in the member table that indicates whether the member is enabled is is_enabled.
  • [Mandatory] Do not store large binary data, such as pictures and files, in the database. Usually, the file size is very large and the data volume increases rapidly in a short period of time. When the database reads data, a large number of random I/O operations are performed. Usually stored in the file server, the database only stores the file address information.
  • [Suggestion] About primary keys when creating a table: The table must have a primary key
    • Mandatory primary key id, type int or bigINT, and auto_increment Unsigned unsigned is recommended.
    • Select user_id, order_ID, user_id, order_ID, and create unique key index. Because if the primary key is set to random insert, it will cause internal page splitting and lots of random I/O, and performance degradation.
  • [Suggestion] The core table (such as the user table) must have the create_time and update_time fields of row data to facilitate fault detection.
  • [Suggestion] All fields in the table should be NOT NULL. Services can define DEFAULT values as required. Using NULL values can take up extra storage space for each row, data migration is error-prone, and aggregation function calculation results are biased.
  • [Suggestion] The names and types of all columns that store the same data must be the same. (Generally, associated columns are used. If the associated column types are different during query, implicit data type conversion is automatically performed, which invalidates indexes on the columns and reduces query efficiency.)
  • [Suggestion] The intermediate table (or temporary table) is used to reserve the intermediate result set. The name starts with TMP_. The backup table is used to back up or capture the snapshot of the source table. The name starts with bak_. The intermediate and backup tables are cleaned periodically.
  • 【 典 型 范 例 2 】 A more formal construction sentence:
CREATE TABLE user_info (' id 'int unsigned NOT NULL AUTO_INCREMENT COMMENT' increment ', 'user_id' bigint(11) NOT NULL COMMENT 'id',' username 'varchar(45) NOT NULL COMMENT' id', 'email' varchar(30) NOT NULL COMMENT 'nickname' varchar(45) NOT NULL COMMENT 'nickname' varchar(45) NOT NULL COMMENT 'nickname ', 'birthday' date NOT NULL COMMENT 'birthday ',' sex 'tinyint(4) DEFAULT '0' COMMENT' gender ', 'short_rule' varchar(150) DEFAULT NULL COMMENT ' ', 'user_resume' varchar(300) NOT NULL COMMENT 'user_resume ', 'user_register_IP' int NOT NULL COMMENT 'user_register_ip ', 'create_time' TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'create_time ', 'update_time' TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update_time ', 'user_review_status' tinyint NOT NULL COMMENT' User_review_status' PRIMARY KEY (' id '), UNIQUE KEY (' uniq_user_id '), KEY (' idx_username ') KEY 'idx_create_time_status' (' create_time', 'user_review_status') ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Copy the code

10.3 About Indexes

  • [Mandatory] InnoDB tables must have id int/ BIGINT AUTO_increment as the primary keyForbid to be updated
  • InnoDB and MyISAM store engine table, index type must be BTREE
  • [Suggestion] The name of the primary key starts with pk_, and the name of the unique key starts with pk_uni_uk_Common indexes start with IDX_. All indexes are in lowercase format and suffixed with the field name or abbreviation.
  • [Suggestion] It is a multi-word columnname. Take the first letters of the first few words and add the last word to form the column_name. For example, the index of sample table member_id is idx_SAMple_mid.
  • [Suggestion] Indicates the number of indexes in a single tableNo more than six
  • [Suggestion] When creating indexes, consider creating indexesJoint indexAnd put the most differentiated fields first.
  • [Suggestion] In SQL with multi-table JOIN, ensure that the JOIN columns of the driven table have indexes, so that the JOIN execution efficiency is the highest.
  • [Suggestion] When creating a table or adding an index, ensure that the table does not exist with each otherRedundant indexes. For example, if key(a,b) already exists in the table, key(a) is a redundant index and needs to be deleted.

10.4 write SQL

  • [Mandatory] The name of a field must be specified in the program SELECT statement. * is not allowed.
  • [Suggestion] Insert INTO T1 VALUES(…) .
  • [Suggestion] Except for static tables or small tables (less than 100 rows), DML statements must have a WHERE condition and use index lookup.
  • 【 suggestion 】INSERT INTO… VALUES (XX), (XX), (XX)… The value of XX should not exceed 5000. Too many values, although quickly online, will cause master/slave synchronization delay.
  • [Suggestion] Do not use UNION in SELECT statements. Instead, use UNION ALL and limit the number of UNION clauses to 5.
  • [Suggestion] In an online environment, do not JOIN more than five tables.
  • [Suggestion] Reduce the use of ORDER BY, and communicate with business without sorting, or put sorting on the program side. The ORDER BY, GROUP BY, and DISTINCT statements consume CPU, and the CPU resources of the database are extremely valuable.
  • [Suggestion] For statements that contain the ORDER BY, GROUP BY, and DISTINCT queries, keep the result set filtered BY the WHERE condition within 1000 lines; otherwise, the SQL will be slow.
  • [Suggestion] Multiple ALTER operations on a single table must be merged into one ALTER table for a large table with more than 100W rows. The alter table must be reviewed by the DBA and executed during peak business periods. Multiple ALTER operations must be integrated. A table lock is generated for ALTER TABLE, blocking all writes to the table during this period, which may have a significant impact on services.
  • [Suggestion] When you operate data in batches, control the transaction processing interval and perform necessary sleep.
  • [Suggestion] The transaction contains no more than five SQL statements. Because long transactions can lead to long lock data, MySQL internal cache, excessive connection consumption, etc.
  • [Suggestion] UPDATE statements in transactions based on primary keys or UNIQUE keys, such as UPDATE… WHERE id=XX; Otherwise, a gap lock will occur, and the internal lock range will be expanded, resulting in system performance deterioration and deadlock.

Refer to the article

MySQL Technical Insider: InnoDB Storage Engine (2nd Edition) database Index Design and Optimization