As the Internet industry grows, the amount of data each company stores rises steadily, in some cases exponentially, and some companies move directly into big data. All of this puts pressure on the company's database. Faced with these problems, hard-working programmers have come up with all kinds of solutions to database bottlenecks, which gives us the database tuning schemes below. Let's take a look; perhaps they will be useful to you. Note: this article mainly focuses on database and table sharding, and also covers some other performance tuning solutions.

One, database bottlenecks

Either an IO bottleneck or a CPU bottleneck eventually leads to an increase in the number of active database connections, approaching or even reaching the maximum number of active connections the database can handle. From the point of view of the business service, few or no database connections are available. You can imagine what follows: concurrency and throughput drop, and the service crashes.

1. IO bottlenecks

First: disk read IO bottleneck. There is too much hot data for the database cache to hold, so each query generates a large amount of disk IO and slows down -> database sharding and vertical table splitting.

Second: network IO bottleneck. Too much data is requested and network bandwidth is insufficient -> database sharding.

2. CPU bottlenecks

First: SQL problems. For example, SQL that contains JOIN, GROUP BY, ORDER BY, or conditions on non-indexed fields increases CPU work -> optimize the SQL, create appropriate indexes, and perform business calculations in the business service layer.

Second: a single table holds too much data. Queries scan too many rows, SQL efficiency is low, and the CPU becomes the bottleneck -> horizontal table sharding.

Two, database and table sharding

1. Horizontal database sharding

Concept: Split the data in one database into multiple databases based on a field and a policy (hash, range, etc.).

Results:

  • Each database has the same structure;
  • Each database holds different data, with no intersection;
  • The union of all databases is the full data set;

Scenario: The absolute concurrency of the system has gone up, splitting tables does not fundamentally solve the problem, and there is no obvious business ownership by which to split the database vertically.

Analysis: With more databases, IO and CPU pressure are naturally relieved, roughly in proportion to the number of databases.
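As an illustration of the hash policy mentioned above, here is a minimal Python sketch of routing a row to one of several databases. The database count, the `db_N` naming, and the choice of CRC32 as the hash are assumptions made for this example, not something the tools listed later prescribe.

```python
import zlib

DB_COUNT = 4  # hypothetical number of databases after the split


def route_db(shard_key: str, db_count: int = DB_COUNT) -> str:
    """Return the name of the database that holds rows for shard_key."""
    # crc32 gives the same value in every process, unlike Python's
    # built-in hash(), which is randomized per process for strings.
    index = zlib.crc32(shard_key.encode("utf-8")) % db_count
    return f"db_{index}"
```

Every database has the same schema, and a given key always maps to exactly one database, which is what gives the "no intersection, union is the full data" property.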

2. Horizontal table sharding

Concept: Split data in one table into multiple tables based on fields and policies (hash, range, etc.).

Results:

  • Each table has the same structure;
  • The data in each table is different and there is no intersection;
  • The union of all tables is full data;

Scenario: The absolute concurrency of the system has not gone up, but a single table holds too much data, which hurts SQL efficiency, increases the CPU burden, and becomes a bottleneck.

Analysis: With less data per table, each SQL statement executes more efficiently, which naturally reduces the CPU burden.
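The range policy can be sketched in the same way. The following Python example assumes a numeric, increasing `order_id`, a hypothetical table prefix `t_order_`, and an arbitrary cap of five million rows per table.

```python
ROWS_PER_TABLE = 5_000_000  # hypothetical cap on rows per table


def route_table(order_id: int, rows_per_table: int = ROWS_PER_TABLE) -> str:
    """Return the table that holds order_id under a range policy."""
    if order_id < 1:
        raise ValueError("order_id must be a positive integer")
    # IDs 1..rows_per_table go to table 0, the next block to table 1, ...
    return f"t_order_{(order_id - 1) // rows_per_table}"
```

Range routing keeps neighbouring IDs in the same table, which makes adding a new table easy but can concentrate recent (hot) rows in the newest table; hash routing spreads them evenly instead.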

3. Vertical database sharding

Concept: Move different tables into different databases based on business ownership.

Results:

  • Each database has a different structure;
  • Each database holds different data, with no intersection;
  • The union of all databases is the full data set;

Scenario: The absolute concurrency of the system has gone up and individual business modules can be abstracted out.

Analysis: At this point you are essentially moving toward a service-oriented architecture. For example, as the business grows there are more and more shared configuration tables and dictionary tables. These tables can be moved into a separate database, or even exposed as a service. Likewise, once the business has grown and incubated a distinct business model, the related tables can be moved into a separate database or even turned into a service.

4. Vertical table splitting

Concept: Split the fields of one table into different tables (a primary table and an extended table) according to how actively the fields are used.

Results:

  • Each table has a different structure;
  • Each table holds different data. Generally, the tables share at least one column, usually the primary key, which is used to associate the data;
  • The union of all tables is the full data set;

Scenario: The absolute concurrency of the system has not gone up. The table does not have many records but has many fields, and hot and cold data are stored together, so a single row takes a lot of storage space. As a result, fewer rows fit in the database cache, and queries that have to read from disk generate a large amount of random read IO, which becomes an IO bottleneck.

Analysis: Think of list pages and detail pages. The principle of vertical table splitting is to put the hot fields (possibly redundant ones that are often queried together) into the primary table and the cold fields into the extended table. This allows more hot data to be cached and reduces random read IO. After the split, getting all of the data requires fetching from both tables and associating the results.

Keep in mind, however, that you should not use JOIN for this, because a JOIN not only increases the CPU burden but also couples the two tables together (they must live on the same database instance). Associate the data in the business service layer instead: fetch the primary and extended table data separately, then combine them using the associated field.
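Associating the two result sets in the service layer might look like the following Python sketch. The row shape (a list of dicts) and the `id` key are assumptions for illustration; in practice the rows would come from two separate queries, one per table.

```python
def merge_rows(primary_rows, extended_rows, key="id"):
    """Combine primary-table and extended-table rows that share `key`."""
    # Index the extended rows once so each lookup is O(1).
    extended_by_key = {row[key]: row for row in extended_rows}
    merged = []
    for row in primary_rows:
        combined = dict(row)  # copy so the input rows are not mutated
        combined.update(extended_by_key.get(row[key], {}))
        merged.append(combined)
    return merged
```

For a list page only the primary rows are needed, so the second query and the merge can be skipped entirely; they are only paid for on the detail page.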

Three, sharding tools

  • ShardingSphere: JAR, formerly Sharding-JDBC;
  • TDDL: JAR, Taobao Distributed Data Layer;
  • Mycat: middleware.

Note: Please research the pros and cons of each tool yourself, giving priority to the official websites and communities.

Four, the steps of sharding

Evaluate the number of databases or tables based on capacity (current capacity and growth) -> select the key (evenly distributed) -> choose the sharding rule (hash or range) -> execute (usually with double-write) -> expand capacity (minimizing data movement).
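The first step, sizing by capacity and growth, is simple arithmetic. The Python sketch below is one possible way to do it; the compound-growth model and the rounding up to a power of two (convenient when capacity is later expanded by doubling) are assumptions of this example, and all input figures are hypothetical.

```python
import math


def estimate_table_count(current_rows, yearly_growth, years, max_rows_per_table):
    """Estimate how many tables are needed after `years` of growth."""
    # Compound growth: rows * (1 + rate) ** years.
    projected_rows = current_rows * (1 + yearly_growth) ** years
    needed = max(1, math.ceil(projected_rows / max_rows_per_table))
    # Round up to the next power of two (assumption: eases later doubling).
    return 1 << (needed - 1).bit_length()
```

For example, 20 million rows growing 50% a year for three years gives 67.5 million rows; with a cap of 10 million rows per table that is 7 tables, rounded up to 8.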


Five, problems with sharding

1. Querying by non-partition keys

This assumes horizontal database and table sharding with the common hash method as the split strategy.

Only one non-partition key is used as a query condition

Mapping method
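A minimal Python sketch of the mapping method, assuming the partition key is `user_id` and the non-partition key is `user_name`. The in-memory dict stands in for a mapping table (or cache) that would be maintained on write; the table count and names are hypothetical.

```python
TABLE_COUNT = 8  # hypothetical number of user tables

# Stand-in for a mapping table: user_name -> user_id.
name_to_id = {}


def register_user(user_id: int, user_name: str) -> None:
    """Record the mapping when the user row is written."""
    name_to_id[user_name] = user_id


def table_for_user_id(user_id: int) -> str:
    return f"t_user_{user_id % TABLE_COUNT}"


def table_for_user_name(user_name: str):
    """Look up user_id first, then route exactly as a user_id query would."""
    user_id = name_to_id.get(user_name)
    return None if user_id is None else table_for_user_id(user_id)
```

The cost of this approach is one extra lookup per query by `user_name`, plus keeping the mapping consistent with the sharded data.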

Gene method

Note: When writing, the gene method generates user_id as shown in the figure. For an x-bit gene: if there are 8 tables, 2^3 = 8, so x is 3, that is, a 3-bit gene. When querying by user_id, taking user_id modulo the number of tables routes directly to the corresponding database or table shard.

When querying by user_name, first generate user_name_code with the user_name_code generation function, then take user_name_code modulo the number of tables to route to the corresponding database or table shard. IDs are commonly generated with the Snowflake algorithm.
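The gene idea can be sketched in a few lines of Python. Here CRC32 stands in for the user_name_code generation function and a plain integer `sequence` stands in for the Snowflake-style part of the ID; both are assumptions for illustration.

```python
import zlib

GENE_BITS = 3                 # 2 ** 3 = 8 tables
TABLE_COUNT = 1 << GENE_BITS


def user_name_code(user_name: str) -> int:
    """Stand-in for the user_name_code generation function."""
    return zlib.crc32(user_name.encode("utf-8"))


def make_user_id(sequence: int, user_name: str) -> int:
    """Build a user_id whose low GENE_BITS bits are the gene of user_name."""
    gene = user_name_code(user_name) % TABLE_COUNT
    return (sequence << GENE_BITS) | gene


def table_for_user_id(user_id: int) -> int:
    return user_id % TABLE_COUNT


def table_for_user_name(user_name: str) -> int:
    return user_name_code(user_name) % TABLE_COUNT
```

Because the low three bits of user_id equal the gene, `user_id % 8` and `user_name_code % 8` always agree, so a query by either field lands on the same table without a mapping lookup.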

More than one non-partition key is used as a query condition

Mapping method

Redundancy method

Note: Queries by order_id or buyer_id are routed to db_o_buyer, and queries by seller_id are routed to db_o_seller. This feels like putting the cart before the horse. Is there a better way? What about changing the technology stack?

In addition to the partition key, back-office queries may also use arbitrary combinations of non-partition keys

NoSQL method

Redundancy method

2. Cross-database, cross-table paging queries by non-partition keys

This assumes horizontal database and table sharding with the common hash method as the split strategy.

Note: Solve this with the NoSQL method (ES, etc.).
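To see why paging across shards is awkward enough to push people toward a search engine, consider what a naive merge has to do. The Python sketch below is only an illustration of that cost, not the approach the note recommends; shards are modelled as plain lists of sortable values.

```python
import heapq
from itertools import islice


def page_across_shards(shards, offset, limit):
    """Return rows [offset, offset + limit) of the globally sorted data."""
    # Any shard might hold all of the first offset + limit rows, so every
    # shard has to return that many rows, not just `limit`.
    per_shard = [sorted(rows)[: offset + limit] for rows in shards]
    merged = heapq.merge(*per_shard)
    return list(islice(merged, offset, offset + limit))
```

The amount of data pulled from each shard grows with the page number, so deep pages become progressively more expensive.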

3. Capacity expansion

This assumes horizontal database and table sharding with the common hash method as the split strategy.

Horizontal database expansion (promote-the-replica method)

Note: Capacity is expanded in multiples (for example, doubled).

Horizontal table expansion (double-write migration method)
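The reason doubling works well with modulo hashing can be checked with a short Python sketch. It assumes integer keys routed by `key % shard_count`; the function names are made up for this example.

```python
def shard_of(key: int, shard_count: int) -> int:
    return key % shard_count


def shards_before_and_after_doubling(key: int, old_count: int):
    """Return (old_shard, new_shard) when old_count shards become 2 * old_count."""
    return shard_of(key, old_count), shard_of(key, old_count * 2)
```

For any key, the new shard is either the old shard `i` or `i + old_count`. So if each old database already has a replica holding the same data, the replica can take over as shard `i + old_count` without copying anything; each side only needs to clean up the rows that no longer belong to it.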

  • Step 1: (synchronous double-write) modify the application configuration and code to add double-write, then deploy;
  • Step 2: (synchronous double-write) copy the old data from the old database to the new database;
  • Step 3: (synchronous double-write) verify the old data in the new database against the old database, treating the old database as the source of truth;
  • Step 4: (synchronous double-write) modify the application configuration and code to remove double-write, then deploy.

Note: Double-write is a general-purpose approach.
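The first three steps can be sketched as follows in Python. Plain dicts stand in for the old and new stores, and the class and function names are hypothetical; a real migration would also have to deal with failures between the two writes.

```python
class DualWriteRepository:
    """Writes to the old store and, while enabled, also to the new store."""

    def __init__(self, old_store, new_store, dual_write=True):
        self.old_store = old_store
        self.new_store = new_store
        self.dual_write = dual_write

    def save(self, key, row):
        self.old_store[key] = row
        if self.dual_write:           # step 1: double-write switched on
            self.new_store[key] = row


def backfill(old_store, new_store):
    """Step 2: copy rows that were written before double-write started."""
    for key, row in old_store.items():
        new_store.setdefault(key, row)  # keep rows already double-written


def find_mismatches(old_store, new_store):
    """Step 3: list keys whose new copy differs from the old one."""
    return [k for k, row in old_store.items() if new_store.get(k) != row]
```

Once `find_mismatches` returns an empty list, step 4 (removing double-write and switching over) can go ahead.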

Other performance optimization solutions


1. Always set an ID for each table

Each table should have an ID field as its primary key, of an INT type (preferably UNSIGNED), with the AUTO_INCREMENT flag set. Using a VARCHAR primary key degrades performance.

There is only one exception: the "association table", whose primary key is made up of the primary keys of several other tables, that is, of "foreign keys". For example, if there is a "student table" with a student ID and a "course table" with a course ID, then the "score table" is an association table that links the student table and the course table. In the score table, the student ID and the course ID are foreign keys, and together they form the primary key.

2. Index the search field

This simply means that if queries against a table always filter on a certain field, for example one that appears in the query conditions written in your code, you should consider creating an index on it when you create the table.

3. Use ENUM instead of VARCHAR

ENUM types are very fast and compact. Internally an ENUM is stored like a TINYINT, but it is displayed as a string. This makes it well suited to fields that hold one of a list of options.

If you have a field such as "country" and you know that its values are finite and fixed, then you should use ENUM instead of VARCHAR.

ENUM is a field type specific to MySQL, and using it makes it harder to migrate the data to another database later. So if you may change databases in the future, use it with care.

4. Use NOT NULL whenever possible

Keep your fields NOT NULL whenever you can; this saves space, because NULL also requires storage.

5. Save the IP address as an UNSIGNED INT

If you store an IP address as an integer instead of in a VARCHAR(15) field, you save a lot of space (you need a function to convert between the two forms).
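If the conversion is done in application code, it could look like this Python sketch, which uses the standard-library `ipaddress` module and handles IPv4 only. The function names are made up for this example.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 string to the integer stored in the column."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Convert the stored integer back to a dotted IPv4 string."""
    return str(ipaddress.IPv4Address(value))
```

For example, `ip_to_int("192.168.0.1")` gives 3232235521, and every IPv4 address fits in a 4-byte UNSIGNED INT because the largest value is 4294967295.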

6. Use the UTF8 character set

Unicode covers all characters, needs no transcoding, carries no risk of garbled text, and saves space.

7. EXPLAIN your SELECT queries

The EXPLAIN keyword shows you how MySQL handles your SQL statement. This can help you analyze performance bottlenecks in your query or table structure.

Looking at the rows column helps identify potential performance issues.

8. Get advice from PROCEDURE ANALYSE()

PROCEDURE ANALYSE() has MySQL analyze your fields and their actual data and gives you some useful advice (it is only advice). The suggestions are only useful when the table contains real data, because big decisions need to be based on data. Note that PROCEDURE ANALYSE() was removed in MySQL 8.0, so this applies only to earlier versions.

mysql> select * from mysql.user procedure analyse();

Optimizing SQL statements

1. Use the query cache

Check whether the cache is enabled:

mysql> select @@query_cache_type;

To enable it, set the following in the MySQL configuration file my.cnf:

query_cache_type = 1
query_cache_size = 600000

The MySQL query cache can greatly reduce CPU usage on the database server. In actual use, CPU usage was about 120% before the query cache was enabled and dropped to 10% afterwards. However, the query cache has many limitations. Consider enabling it when the workload is mainly reads and rarely updates. Note that the query cache was removed in MySQL 8.0, so this applies only to earlier versions.

2. Use LIMIT 1 when only one row of data is required

In this case, adding LIMIT 1 can improve performance: the MySQL engine stops searching as soon as it finds one matching row, rather than continuing to look for further matches.

3. Use the same column types for JOIN fields and index them

If there are many JOIN operations, the JOIN fields should be indexed and their types should be consistent.

4. Avoid SELECT *

The more data is read from the database, the slower the query becomes. We should therefore get into the habit of selecting only what we need.

5. Split large DELETE or INSERT statements

If you need to run a large DELETE or INSERT on a live site, be very careful that it does not bring the entire site to a halt. Both operations lock the table, and while the table is locked no other operation can access it.

Large DELETE and INSERT operations can be split into several batches, with a pause after each batch.

The basic approach to SQL tuning is: design the tables well at the start, tune the SQL along the way, and finally tune the tables themselves. I hope this document is helpful to everyone.
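A batched delete with pauses could be structured like the Python sketch below. To keep the example self-contained it uses the standard-library `sqlite3` module and a hypothetical `logs` table with `id` and `created_at` columns; against MySQL the same loop would typically issue `DELETE ... LIMIT n` through a MySQL driver instead.

```python
import sqlite3
import time


def delete_in_batches(conn, cutoff, batch_size=1000, pause_seconds=0.0):
    """Delete rows older than `cutoff` in small batches; return the total."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM logs WHERE id IN ("
            "SELECT id FROM logs WHERE created_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()              # release locks after every batch
        if cur.rowcount == 0:      # nothing left to delete
            break
        total += cur.rowcount
        time.sleep(pause_seconds)  # give other queries a chance to run
    return total
```

Each batch holds its locks only briefly, so other requests are delayed by at most one small batch rather than by the whole operation.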