Preface

In an earlier article, "What you may overlook when using a database", I covered details that are easy to miss in everyday database use, such as field length and table design. This article continues the same theme from other angles: aspects of database use that are easy to overlook and that can land you in trouble.

Reasonably estimate data volume

When designing database tables, you should estimate how much data will be generated, because this has a significant impact on the overall development and on the robustness of your code. An application handling tens of thousands, hundreds of thousands, millions, or more than ten million rows calls for very different development approaches, technology choices, and architecture. My rough principles are:

  • Tens of thousands of rows: you can be fairly relaxed, as long as you keep good SQL-writing habits;
  • Hundreds of thousands: pay attention to indexes and to the performance of multi-table joins;
  • Millions: minimize joins and avoid aggregate queries, such as counting the total number of rows, wherever possible;
  • Ten million and above: in addition to caching, split tables and split databases (sharding);
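A back-of-the-envelope projection is usually enough to make this estimate concrete: take the current row count, add the expected daily growth over the planning horizon, and compare the result with the tiers above. The sketch below is a hypothetical helper; the function names, the growth figures, and the exact cut-offs (taken loosely from the list) are assumptions for illustration, not a fixed rule.

```python
# Hypothetical helpers: project a table's row count and map it to the
# rough tiers from the list above. The cut-offs are illustrative only.


def projected_rows(current_rows: int, rows_per_day: int, days: int) -> int:
    """Rows expected after `days` days of steady (linear) growth."""
    return current_rows + rows_per_day * days


def volume_tier(rows: int) -> str:
    """Return the rough guidance for a given row count."""
    if rows >= 10_000_000:
        return "caching plus table/database sharding"
    if rows >= 1_000_000:
        return "minimize joins, avoid aggregate queries such as COUNT(*)"
    if rows >= 100_000:
        return "watch indexes and join performance"
    return "relaxed, but keep good SQL habits"


if __name__ == "__main__":
    # e.g. an order table with 20,000 rows, growing by 5,000 rows a day,
    # planned over three years
    rows = projected_rows(current_rows=20_000, rows_per_day=5_000, days=3 * 365)
    print(rows, "->", volume_tier(rows))
    # prints: 5495000 -> minimize joins, avoid aggregate queries such as COUNT(*)
```

Growth is often not linear, so treat the projection as a rough figure and revisit it as the business develops.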

Many systems do not anticipate how they will grow when their tables are designed, so soon after going live they can no longer cope: there is too much multi-table join code, basic SQL performance is ignored, the database quickly becomes the bottleneck, and the system has to be rebuilt. Database design must be driven by the business, with a reasonable estimate of how the business will grow and an eye on the long term.

Estimate the concurrent traffic

Databases have a natural bottleneck: concurrency. We generally use caching to reduce both the number of concurrent connections to the database and the number of operations against it. Database concurrency problems are not limited to large platforms; many small and medium-sized platforms run into them too. Two common examples:

Database operations inside loops

As I mentioned in the previous article, do not perform database operations inside a loop: it directly causes the number of database connections and requests to explode, which can be very serious. Although it is a fairly basic mistake, it happens surprisingly often, and I have seen many such cases. Programmers need to avoid it themselves; monitoring can also help catch it, but that is more troublesome.
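The difference between querying inside a loop and querying once for a whole batch is easy to see with a small experiment. The sketch below uses Python's built-in sqlite3 module with an in-memory table as a stand-in for MySQL; the users table and the helper names are hypothetical. It counts the statements each approach sends using sqlite3's set_trace_callback hook. Against a networked database server, each of those statements would also be a separate round trip.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a real users table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(i, f"user{i}") for i in range(1, 101)],
    )
    conn.commit()
    return conn


def names_in_loop(conn: sqlite3.Connection, ids: list[int]) -> dict[int, str]:
    """Anti-pattern: one statement (one round trip) per id."""
    result = {}
    for user_id in ids:
        row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        if row is not None:
            result[user_id] = row[0]
    return result


def names_batched(conn: sqlite3.Connection, ids: list[int]) -> dict[int, str]:
    """One statement for the whole batch, using an IN (...) placeholder list."""
    if not ids:
        return {}
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT id, name FROM users WHERE id IN ({placeholders})"
    return {user_id: name for user_id, name in conn.execute(sql, ids)}


def count_statements(conn: sqlite3.Connection, fn, ids: list[int]):
    """Run fn and count the SQL statements it executes via the trace hook."""
    executed: list[str] = []
    conn.set_trace_callback(executed.append)
    try:
        result = fn(conn, ids)
    finally:
        conn.set_trace_callback(None)
    return result, len(executed)


if __name__ == "__main__":
    conn = make_demo_db()
    ids = list(range(1, 11))
    looped, n_loop = count_statements(conn, names_in_loop, ids)
    batched, n_batch = count_statements(conn, names_batched, ids)
    assert looped == batched
    print(f"loop: {n_loop} statements, batched: {n_batch} statement")
```

Both versions return the same rows, but with 10 ids the loop sends 10 statements and the batched version sends 1. For very long id lists, split the IN list into chunks, since databases limit statement size and the number of placeholders.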

High frequency data requests from the business itself

In fact, some businesses, even on small and medium-sized platforms, send highly concurrent requests to the database. A common example is logging: say we need to capture every user's operation log, or the load time of every module, and store it persistently. If you record this data in MySQL, you can easily run into high-concurrency problems. This is a common kind of mistake.

Therefore, avoid highly concurrent database operations. Queries can be served from a cache, and inserts, updates, and deletes can be processed asynchronously through a message queue (MQ) or Kafka. If the data does not need a strict relational structure, you can store it in HBase or Hive instead.
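The asynchronous-write idea can be sketched in-process. Below, a queue.Queue and a background thread stand in for MQ or Kafka, and a SQLite file stands in for MySQL; the AsyncLogWriter class, the op_log table, and the batch settings are all hypothetical. Request-handling code only enqueues a log entry, and a single writer drains the queue and inserts rows in batches, so the database sees a few bulk writes instead of one write per user action.

```python
import os
import queue
import sqlite3
import tempfile
import threading


class AsyncLogWriter:
    """Hypothetical in-process stand-in for an MQ/Kafka log pipeline.

    Callers enqueue log entries; one background thread batches them into
    the database, so request handlers never wait on an INSERT.
    """

    _STOP = object()  # sentinel telling the writer thread to flush and exit

    def __init__(self, db_path: str, batch_size: int = 100, flush_interval: float = 0.5):
        self._queue: queue.Queue = queue.Queue()
        self._db_path = db_path
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self.batches_written = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, user_id: int, action: str) -> None:
        """Hot path: only enqueues, never touches the database."""
        self._queue.put((user_id, action))

    def close(self) -> None:
        """Flush whatever is still queued and stop the writer thread."""
        self._queue.put(self._STOP)
        self._thread.join()

    def _run(self) -> None:
        # The connection is created and used only inside the writer thread.
        conn = sqlite3.connect(self._db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS op_log (user_id INTEGER, action TEXT)")
        batch: list[tuple[int, str]] = []
        stopping = False
        while not stopping:
            try:
                item = self._queue.get(timeout=self._flush_interval)
            except queue.Empty:
                item = None  # idle for flush_interval: flush a partial batch
            if item is self._STOP:
                stopping = True
            elif item is not None:
                batch.append(item)
            if batch and (stopping or item is None or len(batch) >= self._batch_size):
                with conn:  # one transaction per batch
                    conn.executemany(
                        "INSERT INTO op_log (user_id, action) VALUES (?, ?)", batch
                    )
                self.batches_written += 1
                batch.clear()
        conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "logs.db")
    writer = AsyncLogWriter(path, batch_size=100)
    for i in range(250):
        writer.log(user_id=i, action="open_page")
    writer.close()
    check = sqlite3.connect(path)
    total = check.execute("SELECT COUNT(*) FROM op_log").fetchone()[0]
    check.close()
    print(total, "rows written in", writer.batches_written, "batches")
```

Unlike a real broker, this in-memory queue is lost if the process crashes, which is one reason production setups put a durable queue such as Kafka between the application and the store.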

Rational use of database connection pools

Database access today almost always goes through a connection pool, which is mainly used to control the number of database connections. Strictly speaking, a connection pool is not part of the database, but the two are used so closely together that I cover it here. A typical connection pool has several parameters:

  • Minimum number of connections: these connections always stay open, whether or not there are database operations;
  • Maximum number of connections: the most connections the pool allows; once it is reached, no further connection can be obtained;
  • Recovery time: how often all connections are closed and re-established;
  • Release time: how long a connection may sit unused before it is released;
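To show how minimum connections, maximum connections, and release time interact, here is a deliberately small pool sketch. The TinyPool class and its parameter names are hypothetical and for illustration only; sqlite3 connections stand in for MySQL connections, and the periodic reconnect ("recovery time") is not implemented. In a real project the pool that ships with your driver or framework is the better choice.

```python
import queue
import sqlite3
import threading
import time
from typing import Callable


class TinyPool:
    """Hypothetical minimal pool: min/max connections plus idle release.

    sqlite3 stands in for a MySQL driver. Not production code: there is no
    health check and no periodic reconnect ("recovery time").
    """

    def __init__(self, factory: Callable[[], sqlite3.Connection],
                 min_size: int = 2, max_size: int = 5, idle_timeout: float = 60.0):
        self._factory = factory
        self._min_size = min_size
        self._max_size = max_size
        self._idle_timeout = idle_timeout
        self._idle: queue.LifoQueue = queue.LifoQueue()  # (connection, last_used)
        self._lock = threading.Lock()
        self.total = 0  # open connections, idle or in use
        # "Minimum connections": opened up front and kept even when unused.
        for _ in range(min_size):
            self._idle.put((factory(), time.monotonic()))
            self.total += 1

    def acquire(self, timeout: float = 5.0) -> sqlite3.Connection:
        """Reuse an idle connection, open a new one below max_size, else wait."""
        with self._lock:
            create = self._idle.empty() and self.total < self._max_size
            if create:
                self.total += 1
        if create:
            try:
                return self._factory()
            except Exception:
                with self._lock:
                    self.total -= 1
                raise
        try:
            conn, _ = self._idle.get(timeout=timeout)
        except queue.Empty:
            # "Maximum connections": refuse rather than open more connections.
            raise TimeoutError("pool exhausted: all connections are in use") from None
        return conn

    def release(self, conn: sqlite3.Connection) -> None:
        """Return a connection to the pool and record when it was last used."""
        self._idle.put((conn, time.monotonic()))

    def release_idle(self) -> int:
        """'Release time': close connections idle too long, keeping min_size open."""
        now = time.monotonic()
        kept, closed = [], 0
        while True:
            try:
                conn, last_used = self._idle.get_nowait()
            except queue.Empty:
                break
            with self._lock:
                expired = (now - last_used > self._idle_timeout
                           and self.total > self._min_size)
                if expired:
                    self.total -= 1
            if expired:
                conn.close()
                closed += 1
            else:
                kept.append((conn, last_used))
        for item in reversed(kept):  # put the most recently used back on top
            self._idle.put(item)
        return closed
```

The key behavior is in acquire: once max_size connections are open, callers wait and then fail instead of opening more, which is what keeps a pool from overloading the database.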

All connection pools have these parameters; the names differ between pool implementations, but their functions are the same. Here we focus on the maximum number of connections, which is easy to get wrong. Many people set it to a large number, such as 5000. However, a MySQL instance's max_connections setting defaults to only 151, and even when it is raised (to 1000, for example), the database rejects new connections once that limit is reached. The pools across all servers must therefore fit within the database's limit:

Number of servers × maximum pool connections per server < maximum connections of the database instance

That covers one instance serving one database. When several databases share the same instance, I suggest:

Number of servers × maximum pool connections per server × number of databases < maximum connections of the database instance

If one database consumes too many connections, the other databases on the instance may become unavailable. Of course, the split can be estimated sensibly from the business: give high-frequency businesses a larger share and low-frequency businesses a smaller one. Do not set the maximum pool size blindly.
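The inequalities above are easy to check mechanically before deployment. The sketch below adds up the worst case, where every pool on every server is full at the same time, and compares it with the instance's limit, giving each business its own pool size as suggested above. The Service class, the service names, and the numbers are hypothetical; 151 is MySQL's default max_connections, and 1000 is just an example of a raised limit.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """A hypothetical application sharing one MySQL instance."""
    name: str
    servers: int        # number of application servers running this service
    pool_max: int       # maximum pool size configured on each server
    databases: int = 1  # pools per server, one per database on the instance


def connection_demand(services: list[Service]) -> int:
    """Worst case: every pool on every server is full at once."""
    return sum(s.servers * s.pool_max * s.databases for s in services)


def check_budget(services: list[Service], max_connections: int) -> bool:
    """True if worst-case demand stays strictly below the instance limit."""
    return connection_demand(services) < max_connections


if __name__ == "__main__":
    services = [
        Service("orders", servers=4, pool_max=50),                # high frequency
        Service("reports", servers=2, pool_max=10, databases=2),  # low frequency
    ]
    demand = connection_demand(services)
    print(demand, check_budget(services, 151), check_budget(services, 1000))
    # prints: 240 False True
```

Here 4 × 50 + 2 × 10 × 2 = 240 connections would already exceed MySQL's default limit, even though no single pool looks large.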

Conclusion

Although many kinds of storage are available today, relational databases remain one of the most important parts of our systems. Try not to expose the database's weak point, concurrency, prematurely. Designing the database and writing database operations deserve great care during development; treating every database operation seriously is the wise approach.



Feel free to follow my WeChat official account to discuss and learn together and to get my latest articles first. WeChat ID: ITmifen