The container concept (and Docker in particular) is very popular right now. However, there are a few things to keep in mind before wrapping your database in a brand new container.

This article evaluates the feasibility of Docker and other container solutions in a database environment.

A few weeks ago, I wrote a fairly general article about containers. It explains when you should consider using container technologies like Docker, rkt, and LXC. If you haven't read it yet, it's worth a quick look first: it walks through some of the considerations that apply before moving to any new technology architecture. That article sparked an internal discussion among our solution engineering team, and your team has probably wrestled with the same puzzle: should customers run databases in containers?

Before we begin, we should acknowledge that Percona itself uses containers. All the elegant charts and query analysis provided by Percona Monitoring and Management (PMM) are delivered by a running Docker container. We made this choice because integration between components is where we can provide the most value to our users, and Docker makes it easy to distribute a pre-built unit. In short, containers have great potential on the application side of an enterprise environment.
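To make the "pre-built unit" point concrete, here is a minimal sketch of pulling and starting the PMM server as a single container. The image name and published port follow the PMM documentation, but treat the tag and options as illustrative; check the current docs for your version.

```shell
# Illustrative only -- verify the image tag and ports against the
# current PMM documentation before using this in anger.
docker pull percona/pmm-server:2

# Run the server detached, restart it automatically, and publish
# the web UI on port 443 of the host.
docker run -d \
  --name pmm-server \
  --restart always \
  -p 443:443 \
  percona/pmm-server:2
```

One `docker pull` and one `docker run` replace what would otherwise be a multi-component install (web server, metrics store, query analytics), which is exactly the distribution advantage described above.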

However, for databases the story is different. Here are a few suggestions.

The verdict (for now)

Decision: do not containerize the database (keep the status quo)

That's not to say this holds in every environment; it's simply what we recommend by default for the vast majority of our customers. And remember, I'm only suggesting this for your database. If your application is already built on microservices, containerizing the database may make more sense, depending on the database's load characteristics, its scaling requirements, and your engineers' existing skill set.

Why is that?

They're a poor fit for state

Before you get mad, let's take a moment to remember where containers came from. Container solutions were designed for stateless applications with ephemeral data: a container quickly spins up a microservice and is then destroyed, taking everything inside it (including its cache and data) with it. The transient nature of a container means all of its components and services are treated as part of the container (basically, all or nothing). Punching a hole through to the underlying operating system to give the container a persistent data volume is challenging in itself, and the existing solutions are too unreliable for most database systems.
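The "hole punching" mentioned above usually means bind-mounting a host directory into the container so the data outlives the container itself. A minimal sketch with the official `mysql` image (the host path `/srv/mysql-data` is a hypothetical example; `MYSQL_ROOT_PASSWORD` and the `/var/lib/mysql` data directory come from that image's documentation):

```shell
# Bind-mount a host directory over the container's data directory so
# the database files survive when the container is destroyed.
# /srv/mysql-data is a hypothetical host path; choose your own.
docker run -d \
  --name db \
  -e MYSQL_ROOT_PASSWORD=secret \
  -v /srv/mysql-data:/var/lib/mysql \
  mysql:8.0
```

This keeps the data files alive across `docker rm db && docker run ...`, but it also ties the container to one host's filesystem, which is exactly where the reliability and portability problems described above begin.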

The vast majority of the development effort behind these solutions has one goal in mind: statelessness. There are many solutions to help you persist your data, but they are all still iterating rapidly. Arguably, using them introduces a high level of complexity, and the added operational complexity (and risk) negates the efficiency gains of containerization. This is certainly borne out by our earlier review of some "real world" feedback on the use of containers (especially Docker).

They’re not stable enough

The intent of these container solutions is to rapidly develop and deploy applications broken down into many tiny components: microservices. Such applications often grow very quickly in software- and engineering-driven organizations, which seems to be why these container solutions (again, especially Docker) evolved the way they did. New features are pushed out after minimal testing and design; the main focus seems to be on shipping the latest feature set and getting to market first. Rather than asking permission, they ask forgiveness after the fact. On top of that, backward compatibility (as you might guess from the above) is a very low priority. This means you will have to plan to build a mature environment for continuous delivery and testing, as well as a well-known, well-tested mirror repository for your container images.

There are some cool tools out there for the right use cases, but they demand time, money, resources, and experience. For most of our customers, that's not where their business should be focused. Their business isn't built around software development, and they don't have the cash to fund the resources needed to keep all those machines running. Instead, they want to deliver a stable, high-performance service that their users can happily rely on 24/7.

We know that by keeping the database out of a container, we can give them a high-performance, highly available environment without too much worry.

Is there any hope?

Of course there is. In fact, there is more than hope: many organizations today are already running containers (databases included) at scale! These companies typically have very mature processes; software development is a core part of their business planning and drives their value proposition. You probably know who I'm talking about: Uber, Google, Facebook (the list goes on, but these are just a few). Another interesting option is using Joyent to get persistence for container data. But as I said earlier, the complexity of ensuring the data retention and availability that a database fundamentally requires is simply too high. My personal opinion is that containers will be one step closer to success once they have a better, more stable solution for persistent storage volumes. Even then, containerized databases probably aren't necessary for most organizations unless they run large deployments (more than 50 nodes) with widely varying workloads.

This article is a translated version of the original article.