Today, database operations are becoming a performance bottleneck for the entire application, especially for Web applications. Database performance is not just something for DBAs to worry about, it’s something for us programmers to worry about. When we design the structure of the database table, we need to pay attention to the performance of the data operation when we operate the database (especially the SQL statement when looking up the table). Here, we won’t talk too much about SQL statement optimization, but just for MySQL, the most widely used database on the Web. Hopefully these optimization tips are useful to you.

1. Optimize your query for query caching

Most MySQL servers have query caching enabled. This is one of the most effective ways to improve performance and is handled by MySQL’s database engine. When many of the same queries are executed multiple times, the query results are placed in a cache so that subsequent queries can access the cache results without manipulating the table.

The main problem here is that it’s easy for programmers to overlook this. Because some of our queries will make MySQL not use caching. Take a look at the following example:

1
2
3
4
5
6
// Query cache is disabled
$r
= mysql_query(
"SELECT username FROM user WHERE signup_date >= CURDATE()"
);
// Enable query caching
$today
=
date
(
"Y-m-d"
);
$r
= mysql_query(
"SELECT username FROM user WHERE signup_date >= '$today'"
);

The difference between the above two SQL statements is the CURDATE() function. MySQL’s query cache does not apply to this function. Therefore, SQL functions such as NOW() and RAND() and others do not enable query caching because their returns are variable. So, all you need is a variable instead of a MySQL function to enable caching.

EXPLAIN your SELECT query

Use the EXPLAIN keyword to let you know how MySQL handles your SQL statements. This can help you analyze performance bottlenecks in your query or table structure.

The EXPLAIN query results will also tell you how your index primary keys are utilized, how your tables are searched and sorted… Wait, wait, wait.

Pick one of your SELECT statements (the most complex one with multiple table joins is recommended) and add the keyword EXPLAIN in front of it. You can do this using phpMyadmin. Then, you’ll see a table. In the following example, we forgot to add the group_id index and have a table join:

When we index the group_id field:

We can see that the first result shows 7883 rows searched, and the second one just searches rows 9 and 16 of the two tables. Looking at the Rows columns allows us to identify potential performance issues.

Call waiting welfare

1. Recently sorted out 20G resources, including product/operation/test/programmer/market, etc., and Internet practitioners [necessary skills for work, professional books on the industry, precious books on interview questions, etc.]. Access:

  • Scan the code of wechat to follow the public account “Atypical Internet”, forward the article to the moments of friends, and send the screenshots to the background of the public account to obtain dry goods resources links;

2. Internet Communication Group:

  • Pay attention to the public account “atypical Internet”, in the background of the public account reply “into the group”, network sharing, communication;

3. Use LIMIT 1 when only one row is required


Sometimes when you query a table, you already know that the result will only have one result, but because you might need to fetch the cursor, or you might check the number of records returned.

In this case, adding LIMIT 1 can increase performance. In this case, the MySQL database engine will stop searching after it finds a piece of data, rather than continuing to search for the next piece of data that matches the record.

The following example, just to find out if there are “Chinese” users, is obviously more efficient than the previous one. (Note that the first is Select * and the second is Select 1)

1
2
3
4
5
6
7
8
9
10
11
// Not efficient:
$r
= mysql_query(
"SELECT * FROM user WHERE country = 'China'"
);
if
(mysql_num_rows(
$r
) > 0) {

// ...
}
// Efficient:
$r
= mysql_query(
"SELECT 1 FROM user WHERE country = 'China' LIMIT 1"
);
if
(mysql_num_rows(
$r
) > 0) {

// ...
}

4. Index the search field

Indexes do not have to be assigned to primary keys or unique fields. If you have a field in your table that you want to search frequently, index it.

From the above you can see that the search string “last_name LIKE ‘a%'” is about 4 times worse than that of the search string (last_name LIKE’ a%’).

In addition, you should also know what searches cannot use the normal index. For example, when you need to search for a word in a large article, such as: “WHERE post_content LIKE ‘%apple%'”, the index may be meaningless. You may need to use the MySQL full-text index or create your own index (e.g., search keywords or tags).

5. Join tables with similar types of examples and index them

If your application has a lot of JOIN queries, you should make sure that the JOIN fields in both tables are indexed. In this way, MySQL internally starts the SQL statement that optimizes the Join for you.

Furthermore, the fields used to Join should be of the same type. For example, if you want to Join DECIMAL fields with an INT field, MySQL cannot use their indexes. For those strings, the same character set is required. (The character set of the two tables may be different)

1
2
3
4
5
6
// Find company in state
$r
= mysql_query("SELECT company_name FROM users

LEFT JOIN companies ON (users.state = companies.state)

WHERE users.id =
$user_id
");
// The two state fields should be indexed, of the same type, and of the same character set.

6. Don’t ORDER BY RAND()

Want to scramble the returned rows? Pick a random number, right? I don’t know who invented this, but many novices enjoy it. But what you don’t understand is how terrible the performance problem is.

If you really want to scramble the rows returned, there are N ways you can do this. Using it like this only degrades your database performance exponentially. The problem here is that MySQL would have to perform the RAND() function (which is CPU time consuming) to remember rows for each row and then sort them. Even if you use Limit 1, it doesn’t help.

The following example picks a record at random

1
2
3
4
5
6
7
8
9
// Do not do this:
$r
= mysql_query(
"SELECT username FROM user ORDER BY RAND() LIMIT 1"
);
// This is better:
$r
= mysql_query(
"SELECT count(*) FROM user"
);
$d
= mysql_fetch_row(
$r
);
$rand
= mt_rand(0,
$d
[0] - 1);
$r
= mysql_query(
"SELECT username FROM user LIMIT $rand, 1"
);

7. Avoid SELECT *

The more data is read from the database, the slower the query becomes. Also, if your database server and WEB server are two separate servers, this can increase the load of network traffic.

Therefore, you should form a good habit of taking what you need.

1
2
3
4
5
6
7
8
9
/ / do not recommend
$r
= mysql_query(
"SELECT * FROM user WHERE user_id = 1"
);
$d
= mysql_fetch_assoc(
$r
);
echo
"Welcome {$d['username']}"
;
/ / recommend
$r
= mysql_query(
"SELECT username FROM user WHERE user_id = 1"
);
$d
= mysql_fetch_assoc(
$r
);
echo
"Welcome {$d['username']}"
;

8. Always set an ID for each table

Each table in the database should have an ID as its primary key, preferably an INT (UNSIGNED is recommended) and an AUTO_INCREMENT flag on it.

Even if your Users table has a field with a primary key called “email”, do not make it a primary key. Using the VARCHAR type as a primary key degrades performance. In addition, in your program, you should use table ids to construct your data structures.

In addition, there are some operations that require primary keys under the MySQL data engine. In these cases, the performance and Settings of primary keys become very important, for example, clustering, partitioning…

There is only one exception here, and that is the “foreign key” of an “associated table”, that is, the primary key of this table is formed by the primary key of several individual tables. We call this a “foreign key.” For example, if there is a “student table” with a student ID and a “course table” with a course ID, then the “score table” is the “association table”, which is associated with the student table and the course table. In the score table, the student ID and the course ID are called “foreign keys”, which together form the primary key.

9. Use ENUM instead of VARCHAR

ENUM types are very fast and compact. In fact, it holds a TINYINT, but it looks like a string. This makes it perfect for making a list of options with this field.

If you have a field such as “gender”, “country”, “ethnicity”, “status” or “department” and you know that the values of these fields are finite and fixed, then you should use ENUM instead of VARCHAR.

MySQL also has a “suggestion” (see # 10) that tells you how to reorganize your table structure. When you have a VARCHAR field, this advice tells you to change it to an ENUM type. You can get advice using PROCEDURE ANALYSE().

Get advice from PROCEDURE ANALYSE()

PROCEDURE ANALYSE() will let MySQL analyze your fields and their actual data for you, and will give you useful advice. These suggestions will only be useful if there is actual data in the table, because you need data to make big decisions.

For example, if you create an INT field as your primary key and don’t have much data, PROCEDURE ANALYSE() suggests changing the field’s type to MEDIUMINT. Or if you use a VARCHAR field, you might get a suggestion to change it to ENUM because there isn’t much data. These suggestions may not be accurate because there is not enough data.

In PHPMyAdmin, you can view these suggestions by clicking “Choose Table Structure” while viewing the table

It is important to note that these are only suggestions and will only become accurate as you add more data to your table. Always remember that you are the one who makes the final decision.

11. Use NOT NULL whenever possible

Unless you have a very specific reason to use NULL, you should always keep your fields NOT NULL. This may seem controversial, but read on.

First, ask yourself how much difference is there between “Empty” and “NULL” (0 and NULL if it’s INT)? If you think there is no difference between them, then you should not use NULL. You know what? In Oracle, NULL is the same as Empty!

Don’t assume NULL doesn’t require space, it requires extra space, and your program will be more complex when you make comparisons. Of course, this is not to say that you can’t use NULL, the reality is complicated, and there will still be situations where you need to use NULL.

Here is an excerpt from MySQL’s own documentation:

NULL columns require additional space in the row to record whether their values are NULL. For MyISAM tables, Each NULL column takes one bit extra, rounded up to the nearest byte.”

12. Prepared Statements

Prepared Statements, much like stored procedures, are collections of SQL Statements that run in the background, and there are many benefits to be gained from using Prepared Statements, both in terms of performance and security issues.

Prepared Statements can check some of the variables you have bound to protect your program from “SQL injection” attacks. Of course, you can check your variables manually; however, manual checking is problematic and often forgotten by programmers. This problem is better when we use some framework or ORM.

In terms of performance, this gives you a significant performance advantage when the same query is used multiple times. You can define some parameters for these Prepared Statements and MySQL will only parse them once.

Although the latest version of MySQL uses binary schema in transmitting Prepared Statements, this makes network transmission very efficient.

Of course, there are situations where we need to avoid using Prepared Statements because they do not support query caching. But it is said to be supported after version 5.1.

To use Prepared statements in PHP, you can refer to their manual: the mysqli extension or use a database abstraction layer such as PDO.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// Create Prepared Statement
if
(
$stmt
=
$mysqli
->prepare(
"SELECT username FROM user WHERE state=?"
)) {

// Bind parameters

$stmt
->bind_param(
"s"
.
$state
);

/ / execution

$stmt
->execute();

// Bind the result

$stmt
->bind_result(
$username
);

// Move the cursor

$stmt
->fetch();

printf(
"%s is from %s\n"
.
$username
.
$state
);

$stmt
->close();
}

13. Unbuffered query

Normally, when you execute an SQL statement in your script, your program will stop there until no SQL statement is returned, and then your program will continue to execute. You can change this behavior with unbuffered queries.

The mysql_unbuffered_query() function has a nice explanation for this in the PHP documentation:

“Mysql_unbuffered_query () sends the SQL query query to MySQL without automatically fetching and buffering the result rows as mysql_query() does. This saves a considerable amount of memory with SQL queries that produce large result sets, And you can start working on the result set immediately after the first row has been retrieved as you don’t have to wait Until the complete SQL query has been performed.”

Mysql_unbuffered_query () sends a SQL statement to MySQL instead of automatically fethch and cache the results as mysql_query() does. This saves a lot of memory, especially for queries that produce a lot of results, and you don’t need to wait for all the results to return, just the first row of data, before you start working on the results.

However, there are some limitations. Either you read all the rows, or you call mysql_free_result() to clear the result before the next query. Also, mysql_num_rows() or mysql_data_seek() will not be available. So, you need to think carefully about whether to use unbuffered queries.

14. Save the IP address as an UNSIGNED INT

Many programmers create a VARCHAR(15) field to hold the IP as a string rather than an integer. If you store it with an integer, it only takes 4 bytes, and you can have fields of fixed length. Also, this gives you a query advantage, especially if you need to use WHERE conditions like IP between ip1 and ip2.

We must use UNSIGNED INT because IP addresses use the entire 32-bit UNSIGNED integer.

For your query, you can use INET_ATON() to turn a string IP into an integer, and INET_NTOA() to turn an integer into a string IP. In PHP, there are also functions like ip2long() and long2ip().

1
$r
=
"UPDATE users SET ip = INET_ATON('{$_SERVER['REMOTE_ADDR']}') WHERE user_id = $user_id"
;

Fixed length watches are faster

If all fields in a table are “fixed length”, the entire table is considered “static” or “fixed-length”. For example, the table does not have the following types of fields: VARCHAR, TEXT, BLOB. As long as you include one of these fields, the table is no longer a “fixed-length static table” and the MySQL engine handles it in a different way.

Fixed length tables improve performance because MySQL searches faster, and because these fixed lengths make it easier to calculate the offset of the next data, the read is naturally faster. If the field is not fixed length, the program needs to find the primary key each time it has to find the next one.

Also, fixed-length tables are easier to cache and rebuild. The only side effect, however, is that fixed-length fields waste some space, because they have to allocate that much space whether you use them or not.

Using the “vertical split” technique (see next article), you can split your table into two, one of fixed length and one of indefinite length.

16. Vertical segmentation

Vertical partitioning is a method of dividing a table in a database into several tables by column, which can reduce the complexity of the table and the number of fields for optimization purposes. (BEFORE, I worked on a project in a bank, and I saw a table with more than 100 fields, which was horrible)

Example 1: There is a field in the Users table called the home address. This field is optional, by contrast, and you do not need to read or overwrite this field very often except for personal information when operating in the database. So why not put him on another list? This is going to give you better performance for your table, and if you think about it, most of the time, for my user table, only user ID, user name, password, user role and so on are used frequently. Smaller tables will always have better performance.

Example 2: You have a “last_login” field that is updated every time the user logs in. However, each update causes the query cache for this table to be cleared. So, you can put this field in another table so that you don’t have to read the user ID, user name, and user role over and over again, because the query cache will add a lot of performance.

In addition, you need to note that you will not often Join the partitioned columns in the table, otherwise the performance will be worse than without partitioning, and there will be a drop in the number of poles.

17. Split large DELETE or INSERT statements

If you need to perform a large DELETE or INSERT query on an online site, you need to be very careful that your operation does not bring your entire site to a halt. Because these two operations will lock the table, the table is locked, other operations can not enter.

Apache has many child processes or threads. So, it works pretty efficiently, and our servers don’t want too many child processes, threads, and database links, which take up a lot of server resources, especially memory.

If you lock your table for a period of, say, 30 seconds, the amount of access processes/threads, database links, and open files accumulated in those 30 seconds for a highly trafficed site may not only cause you to Crash your WEB service, but also cause your entire server to Crash immediately.

So, if you have a big process and you’re sure you must split it up, using LIMIT conditions is a good way to do it. Here is an example:

1
2
3
4
5
6
7
8
9
10
while
(1) {

// Do 1000 pieces at a time

mysql_query(
"DELETE FROM logs WHERE log_date <= '2009-11-01' LIMIT 1000"
);

if
(mysql_affected_rows() == 0) {

// There is no more to delete, exit!

break
;

}

// Take a break every time

usleep(50000);
}

18. The smaller the column, the faster

Hard disk operations are probably the most significant bottleneck for most database engines. So, making your data compact can be very helpful in this case, because it reduces access to your hard drive.

Refer to the MySQL documentation Storage Requirements for all data types.

If a table has only a few columns (such as dictionary tables, configuration tables), there is no reason to use ints as primary keys. MEDIUMINT, SMALLINT, or smaller TinyInts are more economical. If you don’t need to keep track of the time, DATE is much better than DATETIME.

Of course, you also need to leave enough room for expansion, otherwise you will die a horrible death if you do this later. See Slashdot’s example (November 06, 2009), where a simple ALTER TABLE statement took more than 3 hours because there were 16 million entries in it.

19. Select a correct storage engine

There are two storage engines in MySQL, MyISAM and InnoDB, and each has its pros and cons. MySQL: InnoDB or MyISAM? Discuss and discuss the matter.

MyISAM is good for some applications that require a lot of queries, but it’s not good for a lot of writes. Even if you update a single field, the entire table is locked, and no other process, even the reader process, can operate until the read is complete. In addition, MyISAM is super fast for computations like SELECT COUNT(*).

InnoDB tends to be a very complex storage engine, slower than MyISAM for small applications. He argues that it supports “row locking”, so it is better when there are more writes. It also supports more advanced applications, such as transactions.

Here is the MySQL manual

  • Target = “_blank” MyISAM Storage Engine
  • InnoDB Storage Engine

20. Use an Object Relational Mapper

Using An Object Relational Mapper (ORM), you can expect reliable performance gains. Everything an ORM can do can also be written manually. However, this requires a high-level expert.

The most important aspect of ORM is “Lazy Loading”, that is, it only does the value when it needs to be done. However, you need to be careful of the side effects of this mechanism, as it is possible to reduce performance by creating many, many small queries.

ORM can also package your SQL statements into a transaction, which is much faster than executing them individually.

Currently, personal favorite of PHP’s ORM is: Doctrine.

21. Watch out for permalink

The purpose of permalink is to reduce the number of times MySQL links are recreated. When a link is created, it remains connected forever, even after the database operation has ended. Also, since our Apache started reusing its child process — that is, the next HTTP request will reuse the Apache child process and reuse the same MySQL link.

  • Mysql_pconnect ()

In theory, that sounds pretty good. But from personal (and most people’s) experience, this feature creates more headaches. You only have a limited number of links, memory problems, file handles, etc.

Also, Apache runs in an extremely parallel environment, creating many, many more processes. This is why the permalink mechanism doesn’t work well. Before you decide to use permalink, you need to think about the architecture of your entire system.

Article: Source

(Full text)

Author: increasingly, blog: https://coolshell.cn/articles/18190.html