Database management is a specialized discipline: tuning and monitoring are usually handled by database engineers. But developers deal with the database all the time, and even simple CRUD work has plenty of pitfalls. Here are some database issues that are easy to overlook.

Keep field lengths small

Let’s start with the storage lengths of some common column types:

Column type     Storage length
tinyint         1 byte
smallint        2 bytes
int             4 bytes
bigint          8 bytes
float           4 bytes
decimal(m,d)    varies with precision (each group of 9 digits takes 4 bytes; leftover digits take 0 to 4 bytes)
datetime        8 bytes
timestamp       4 bytes
char(m)         m bytes
varchar(m)      variable length
text            variable length

Clearly, storage lengths differ a lot between types, which affects query efficiency, and field length has an even bigger impact on indexes.

  • Use char for strings whose length is fixed
  • Use varchar, with a sensible length limit, for strings whose length varies a lot, so space isn’t wasted
  • For integers, pick the smallest type that fits the range of values
  • For time values, timestamp is recommended
  • For decimals, use decimal rather than float; for prices, consider int or bigint in the smallest currency unit, e.g. $1 is stored as 100
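
As a rough illustration of these choices, here is a minimal table sketch (the table and column names are hypothetical):

-- Hypothetical order table illustrating the type choices above
CREATE TABLE orders (
    id          BIGINT UNSIGNED NOT NULL PRIMARY KEY,          -- 8 bytes, room for Snowflake-style ids
    status      TINYINT NOT NULL,                              -- small enumerated value, 1 byte
    price_cents BIGINT NOT NULL,                               -- price in cents, so $1 is stored as 100
    remark      VARCHAR(200) NULL,                             -- variable-length string with an explicit limit
    created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP   -- 4 bytes
);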

Abandon the use of UUID (GUID)

UUID and GUID (the same idea under two names) both exist to avoid generating duplicate ids concurrently, but alternatives are recommended for the following reasons:

  • UUIDs have no ordering
  • UUIDs are too long
  • The UUID generation rule is completely outside your control

The recommended alternative is bigint (preferred) or char, with ids generated in a manner similar to Twitter’s Snowflake algorithm: roughly sequential, fixed in length, shorter than a UUID, and in practice never repeated.
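
As a rough sketch of why such ids sort by creation time, here is how a Snowflake-style 64-bit id can be decomposed with plain bit arithmetic. This assumes the common layout of a 41-bit millisecond timestamp, 10-bit worker id and 12-bit sequence, plus Twitter’s original epoch, and uses the hypothetical orders table from above; adjust it to whatever generator you actually use:

-- Decompose a Snowflake-style 64-bit id (layout and epoch are assumptions)
SELECT id,
       (id >> 22) + 1288834974657 AS created_unix_millis,  -- 41-bit timestamp offset by the generator's epoch
       (id >> 12) & 1023          AS worker_id,            -- 10-bit machine/worker id
       id & 4095                  AS sequence_no           -- 12-bit per-millisecond sequence
FROM orders
LIMIT 10;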

For large tables, reduce joins; single-table queries are preferred

Single-table queries have many advantages: they are very efficient and make later sharding (splitting tables and databases) much easier. Many of us feel that a pure single-table design is unrealistic and throws away the point of a relational database, but the performance advantage of single-table access is too obvious to ignore, and there is always a way to make it work:

  • Redundant fields where it makes sense (see the sketch after this list)
  • Pair the database with an in-memory or document store (Redis / MongoDB)
  • Turn joins into multiple queries (explained below)
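
For example, a minimal sketch of the redundant-field idea, using the product/category tables that appear later in this article (the extra column and its length are assumptions): copy the category name onto each product row so the common listing query never needs a join, at the cost of updating the copies whenever a category is renamed.

-- Denormalize: keep a copy of the category name on each product row
ALTER TABLE product ADD COLUMN category_name VARCHAR(50) NOT NULL DEFAULT '';

-- Product listings no longer need to join category
SELECT id, name, category_name FROM product;

-- The price to pay: renaming a category means updating the copies too
UPDATE product SET category_name = 'new name' WHERE categoryid = ?;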

If you expect data volumes to grow to the point where sharding (splitting tables and databases) becomes necessary, you should move to single-table queries as early as possible. Current sharding middleware basically cannot support cross-table joins; even MyCat only supports joins of at most two tables, and at a significant performance cost.

Handle indexes correctly

The advantages of an index are not discussed here, but the misuse of an index can be counterproductive:

  • Tables with small amounts of data do not need indexes
  • Keep a table to roughly five indexes at most; indexes cannot cover every scenario, only most of them
  • MySQL and SQL Server indexes differ quite a lot, which needs attention. For example, in MySQL the order of columns in an index has a significant impact on performance, whereas SQL Server has optimized this and the impact is minimal
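
A minimal sketch of the MySQL column-order point (the table and column names are hypothetical): a composite index is used through its leftmost prefix, so the order of the columns decides which queries it can serve.

-- Composite index: in MySQL the leftmost-prefix rule makes column order matter
CREATE INDEX idx_product_category_status ON product (categoryid, status);

-- Can use the index (the leading column is filtered):
SELECT id, name FROM product WHERE categoryid = ? AND status = 1;
SELECT id, name FROM product WHERE categoryid = ?;

-- Cannot use this index (the leading column is skipped):
SELECT id, name FROM product WHERE status = 1;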

It’s often better to query a few times than to join tables

I expect this suggestion to draw a lot of opposition, but I believe it holds up very well for large-data scenarios. Querying the database more than once does have drawbacks:

  • More network overhead
  • More database connections used

In practice, both problems can largely be ignored today. The database and the application are usually on the same internal network, so connections are fast, and connection pooling is mature enough that as long as the number of connections stays reasonable the impact is minor. The advantages of querying several times, on the other hand, are many:

  • Single-table queries are more efficient
  • It makes later sharding (splitting tables and databases) easier
  • It makes better use of the database’s own result cache
  • It reduces locking: a join can lock multiple tables

Of course, “query a few times” has limits. Never query the database inside a loop, and keep the total number of queries down: one query becoming two is acceptable, but if it becomes ten, give up on the idea. For example, when querying products you need to display the category name from the category table:

select category.name, product.name from product inner join category on product.categoryid = category.id

The suggested approach:

select categoryid, name from product
select id, name from category where id in (...)   -- the ids are the categoryid values collected from the first query

You can optimize this further by sorting the products’ categoryid values before querying the category names, since we are already using Snowflake-generated sequential primary keys. As a bonus, IN is not as slow as you might think: keep it to about 100 values (many books say 1000; let’s be conservative) and performance stays high.

Keep database scripts as simple as possible

Many people who have used .NET Entity Framework say the framework is slow, but the slowness mainly comes from two places: incorrect use of lazy loading (foreign-key navigation) and the cost of compiling the generated SQL. Entity Framework tends to generate SQL that is verbose and expensive to compile. Try to keep database scripts simple and avoid overly long SQL statements; otherwise the first execution pays a heavy compilation cost.

Avoid aggregation operations as much as possible

Aggregation operations such as count and group by are big killers of database performance, because they often trigger large table scans. That is why many platforms hide counts: product searches don’t show real-time count results, and Taobao, for example, doesn’t show the total number of matches and only lets you browse the first 100 pages. The way to avoid aggregation is to accumulate the count into a field as data is written, instead of computing it at read time. You could also consider a real-time computing framework such as Spark, but that is beyond the scope of this article. (PS: mainly because I don’t understand it either.)
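
A minimal sketch of accumulating the count into a field, using the hypothetical category/product tables from earlier: maintain a counter when writing, so reads never need COUNT(*).

-- Maintain a counter at write time instead of aggregating at read time
ALTER TABLE category ADD COLUMN product_count INT NOT NULL DEFAULT 0;

-- When a product is added to a category:
UPDATE category SET product_count = product_count + 1 WHERE id = ?;

-- Reading the count is now a cheap single-row lookup instead of a COUNT(*):
SELECT product_count FROM category WHERE id = ?;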

Conclusion

Program optimization often comes down to details, so pay attention to what you accumulate in everyday work. Alibaba’s SQL specification has a lot worth learning from, and the above is also a summary of my own work. (End)


You are welcome to follow my WeChat public account to exchange ideas, learn, and get the latest articles first. WeChat ID: ITmifen