Preface

Docker has been very popular over the past two years. Many developers want to deploy every application and piece of software in a Docker container, but are you sure you want to deploy your database in a container too?
This is a real question, because you can find all kinds of manuals and video tutorials on the Internet that do exactly that. We have collected some reasons why databases are a poor fit for containers, for your reference, and we hope you will be careful when using them.
So far, containerizing databases remains hard to justify, even though every developer appreciates the advantages of containerization. Hopefully, better solutions will emerge as the technology develops.

7 reasons Docker is not suitable for database deployment

1. Data security issues

Don’t store data in containers: this is one of Docker’s official container tips. Containers can be stopped or deleted at any time, and when a container is removed with rm, the data inside it is lost. To avoid data loss, users can mount data volumes to store the data. However, a volume only keeps the files outside the container’s Union FS image layers; it does not by itself guarantee data safety. If the container suddenly crashes and the database is not shut down properly, data can be corrupted. In addition, containers share the data volume group, which also puts considerable wear on the physical machine’s hardware.
Even if you store Docker data on the host, there is still no guarantee against data loss: volumes provide persistence, but they do not provide safety guarantees.
With its current storage drivers, Docker still carries a risk of being unreliable.
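
To make the volume advice above concrete, here is a minimal sketch that builds a docker run command with MySQL's data directory on a named volume. The container name, volume name, password and 60-second stop timeout are hypothetical values chosen for illustration; the sketch only prints the command and does not run Docker.

```python
import shlex
from typing import List


def mysql_run_command(container: str, volume: str, stop_timeout_s: int = 60) -> List[str]:
    """Build a `docker run` argument list for MySQL with its data on a named volume."""
    return [
        "docker", "run", "-d",
        "--name", container,
        # Named volume mounted on the official image's data directory,
        # so the files are not removed together with the container.
        "-v", f"{volume}:/var/lib/mysql",
        # Give mysqld more time to shut down cleanly on `docker stop`
        # (60 seconds is an assumed value, not a recommendation).
        "--stop-timeout", str(stop_timeout_s),
        # Placeholder password for illustration only.
        "-e", "MYSQL_ROOT_PASSWORD=change-me",
        "mysql:8.0",
    ]


cmd = mysql_run_command("db1", "db1-data")
print(shlex.join(cmd))
```

A named volume addresses the "deleted container" case only. It does nothing for the crash case described above, where the database is killed before it can flush its data.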

2. Performance issues

As we all know, MySQL is a relational database with high I/O requirements. When one physical machine runs several such workloads, their I/O adds up, creating an I/O bottleneck and greatly reducing MySQL’s read and write performance.
In a special session on the ten difficulties of applying Docker, an architect at a state-owned bank made the same point: “The performance bottleneck of a database generally occurs in I/O. If we follow Docker’s way of thinking, the I/O requests of multiple containers all end up on the same storage. Today’s Internet databases are mostly based on a shared-nothing architecture, which may be another reason migration to Docker is not being considered.”
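
The accumulation effect can be illustrated with a toy model. The sketch below assumes, purely for illustration, that an oversubscribed storage device divides its IOPS budget among containers in proportion to their demand; real schedulers behave differently, and the numbers are hypothetical.

```python
from typing import List


def share_iops(device_iops: float, demands: List[float]) -> List[float]:
    """Return the IOPS each container gets from one shared device (toy model)."""
    total = sum(demands)
    if total <= device_iops:
        # The device can serve everyone in full.
        return list(demands)
    # Oversubscribed: scale every demand down by the same factor.
    scale = device_iops / total
    return [d * scale for d in demands]


# Three database containers on one device rated at 10,000 IOPS (assumed figures).
granted = share_iops(10_000, [6_000, 6_000, 8_000])
print(granted)
```

With these figures the combined demand is twice what the device can deliver, so in this model every container gets only half of the I/O it asked for.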
Some readers may already have solutions in mind for the performance problem:
(1) Separate the database program from its data
If you run MySQL in Docker, separate the database program from its data: keep the data on shared storage and the program in the container. If the container or the MySQL service fails, a brand new container is started automatically. In addition, it is recommended not to store the data on the host machine: the host and the container share the volume group, so damage to the host has a large impact.
(2) Run lightweight or distributed databases
Deploy lightweight or distributed databases in Docker. Docker’s own recommendation is that when a service fails, a new container is started automatically, rather than restarting the service inside the existing container.
(3) Lay out applications sensibly
For applications or services with high I/O requirements, deploy the database on a physical machine or on KVM. Currently, Tencent Cloud’s TDSQL and Alibaba’s OceanBase are deployed directly on physical machines rather than in Docker.

3. Network problems

To understand Docker networking, you need a deep understanding of network virtualization. You must also be prepared for the unexpected: you may have to fix bugs without support or additional tools.
We know that databases need dedicated, sustained throughput to handle higher loads. We also know that a container adds another isolation layer on top of the hypervisor and the host virtual machine. However, networking is critical for database replication, which requires a stable 24/7 connection between the master and slave databases. As of version 1.9, Docker’s networking issues remained unresolved.
Taken together, these issues make containerized databases difficult to manage. I know you are a top engineer and can solve any problem. But how much time will you need to fix the Docker network? Wouldn’t it be better to put the database in a dedicated environment and save that time for the business goals that really matter?

4. State

It’s cool to package stateless services in Docker, orchestrate the containers, and eliminate single points of failure. But what about databases? If you put the database in the same environment, that environment becomes stateful, and the scope of a system failure grows. The next time an application or one of its instances crashes, the database may be affected.
Horizontal scaling in Docker can only be used for stateless computing services, not databases.
An important property behind Docker’s rapid scaling is statelessness. Services that hold data state are not suitable for running directly in Docker. If a database is installed in Docker, storage has to be provided as a separate service.
Currently, TDSQL (a financial distributed database) from Tencent Cloud and OceanBase (a distributed database system) from Alibaba Cloud run directly on physical machines rather than in Docker, which keeps them easy to manage.

5. Resource isolation

In terms of resource isolation, Docker is indeed inferior to KVM virtual machines. Docker uses cgroups to restrict resources, which can only cap a container’s maximum consumption; it cannot stop other programs from taking the resources that container needs. If other applications occupy the physical machine’s resources, the read and write efficiency of MySQL in the container suffers.
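
The difference between a cap and a reservation can be shown with a deliberately simplified model. The function and the numbers below are hypothetical and ignore how the kernel actually schedules resources; they only illustrate that an upper limit guarantees nothing when neighbours are busy.

```python
def usable_by_db(host_capacity: float, db_cap: float, neighbour_usage: float) -> float:
    """Resource the database container can actually use under a cap-only policy (toy model)."""
    # Whatever the neighbours have not taken is all that is left on the host.
    left_over = max(0.0, host_capacity - neighbour_usage)
    # The cap is a ceiling: the container never gets more than db_cap,
    # but nothing guarantees that it gets that much.
    return min(db_cap, left_over)


# Host with 1000 units of I/O bandwidth, database capped at 400 (assumed figures).
print(usable_by_db(1000, 400, 300))  # quiet neighbours
print(usable_by_db(1000, 400, 800))  # busy neighbours
```

With quiet neighbours the database reaches its full cap of 400 units. With busy neighbours it is left with 200, even though its own limit has not changed.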
The more levels of isolation you need, the more resource overhead you incur. Ease of horizontal scaling is a big advantage of Docker over dedicated environments. However, horizontal scaling in Docker can only be used for stateless computing services, not databases.
We do not see any isolation benefit for the database, so why should we put it in a container?

6. Inapplicability of cloud platform

Most people start their projects on a shared cloud. The cloud simplifies operating and replacing virtual machines, so there is no need to test new hardware environments at night or on weekends when no one is working. When we can start an instance quickly, why should we worry about the environment it runs in?
That is why we pay cloud providers so much. Once we put a database container on an instance, all of this convenience goes away: because the data does not match, new instances will not be compatible with existing ones. If we are going to limit an instance to a stand-alone service anyway, we should run the DB in a non-containerized environment and reserve elastic scaling for the computing services layer.

7. Environmental requirements for running the database

It is common to see DBMS containers running on the same host as other services. However, these services have very different hardware requirements.
Databases (especially relational databases) have high I/O requirements. Database engines commonly run in dedicated environments to avoid contention for resources. Putting your database in a container wastes your project’s resources, because you need to configure a lot of additional resources for that instance. In the public cloud, when you need 34 GB of memory, the instance you start has to come with 64 GB. In practice, these resources are not fully utilized.
How can this be solved? You can design in tiers and start multiple instances at different tiers with fixed resources. Horizontal scaling is always better than vertical scaling.
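
The 34 GB versus 64 GB example can be worked through in a few lines. The list of instance sizes below is an assumption (a typical doubling ladder), not any particular provider's catalogue.

```python
from bisect import bisect_left
from typing import List, Tuple


def smallest_fitting_size(needed_gb: float, sizes_gb: List[int]) -> Tuple[int, float]:
    """Return the smallest instance size that fits and the fraction of it actually used."""
    ordered = sorted(sizes_gb)
    index = bisect_left(ordered, needed_gb)
    if index == len(ordered):
        raise ValueError("no instance size is large enough")
    size = ordered[index]
    return size, needed_gb / size


# Hypothetical memory sizes on offer, in GB.
size, utilisation = smallest_fitting_size(34, [8, 16, 32, 64, 128])
print(size, f"{utilisation:.0%}")
```

Because 34 GB does not fit into the 32 GB size, the next step up is 64 GB, of which only about 53% is used. The rest is paid for but idle.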

Conclusion

Does this mean a database must never be deployed in a container?
The answer is: no.
Workloads that are not sensitive to data loss (search, event tracking) can be containerized, using database sharding to increase the number of instances and thereby increase throughput.
Docker is suitable for running lightweight or distributed databases. When a Docker service fails, a new container is started automatically instead of restarting the service in the old container.
Databases that sit behind middleware and a containerized system providing automatic scaling, disaster recovery, failover, and multiple nodes can also be containerized.
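
As an illustration of the sharding point above, here is a minimal sketch of hash-based shard routing. The key format and the shard count are hypothetical, and a production setup would also have to handle resharding, which this sketch does not.

```python
import hashlib
from collections import Counter


def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to a shard index with a stable hash."""
    # hashlib gives the same result in every process, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Route 1000 hypothetical event keys to 4 database instances.
counts = Counter(shard_for(f"event-{i}", 4) for i in range(1000))
print(sorted(counts.items()))
```

Each instance then handles only a fraction of the writes, which is how adding instances raises total throughput for workloads that can tolerate this kind of partitioning.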



Blog address: www.liangsonghua.com
