This article is based on the talk I gave online in [DeePlus Live Issue 247].

Today's talk covers the following four topics:

  • I. The Origin
  • II. An Overview of Containerization Technologies
  • III. Introduction to Redis
  • IV. Comparison of Redis Containerization Approaches

I. The Origin

First let’s talk a little bit about why I’m sharing this topic today. My friends and I have organized a Redis technology exchange group, which has been running for about 6 years. One day, one of my friends in the group asked me a question:

He asked whether Redis in production is typically run with Docker, using host network mode for Docker and local mounts for the disk. We won't evaluate this setup for now; after today's talk, I believe everyone will have a clearer understanding of it and be able to judge it for themselves.

II. An Overview of Containerization Technologies

1. chroot and jails

Containerization actually goes back a long way. Although the container technology we use today, together with K8S and the whole cloud-native concept, has only been popular for a few years, the technology itself is quite old. Take chroot, which many of you may have used or are familiar with: it came from UNIX back in 1979, and its main function is to change the root directory of a process and its children.

What does chroot achieve? If you chroot to a directory and start a process there, the / that the process sees, its root directory, is actually the folder or path we just specified. In this way we can effectively protect other files on the operating system, along with related permissions and security concerns.
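As a minimal illustration (the jail directory and its contents are hypothetical), using chroot from the shell looks like this:

```bash
# Prepare a minimal root filesystem (copy in a shell and the
# shared libraries it needs; check them with `ldd /bin/sh`)
mkdir -p /srv/jail/bin
cp /bin/sh /srv/jail/bin/

# Start a shell whose / is /srv/jail; it cannot see the host's real /
sudo chroot /srv/jail /bin/sh
```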

In 2000, a new technology called jails appeared on FreeBSD, already an early prototype of the sandbox environment. With jails, a process or environment you create can have its own network interface and IP address. One consequence worth mentioning: inside a jail you cannot, by default, send raw sockets. The tool most of us encounter that uses raw sockets is the ping command, and jails disallow it by default. Instead, a jail has its own network interface and IP address, which makes it resemble the virtual machines we use.

2. Linux VServer and OpenVZ

Then, in 2001, a new technology called Linux VServer appeared in the Linux community. Linux VServer is sometimes abbreviated LVS, but it is not the layer-4 load balancer LVS that we usually mean. It is actually a patch to the Linux kernel: after modifying the kernel, it can support system-level virtualization. At the same time, with Linux VServer, system calls are shared with the host, so there is no emulation overhead; that is, the common system call interface is shared directly.

In 2005, another new technology emerged: OpenVZ. OpenVZ is very similar to Linux VServer in that it is also a kernel patch. The defining trait of these two technologies is that they patch Linux heavily and add many new features, but as of 2005 none of this had been merged into the mainline kernel. When using OpenVZ, each container can have its own /proc and /sys.

We all know that in Linux, if you start a process, you can look under /proc/self and see information about that process. If a container has its own /proc, its processes are effectively isolated from everyone else's.

The next notable feature is independent users and groups: a container can have its own users and groups, separate from the other users and groups on the system.

OpenVZ also saw real commercial use: many hosting providers abroad built their various VPS offerings on OpenVZ.

3. Namespaces and cgroups

In 2002 came namespaces. In Linux, a namespace isolates a specific kernel resource within a group of processes. There are many namespace types in daily use, such as PID and NET, and processes that are not in the same namespace cannot see each other's resources of that type.

In 2013, a new namespace type was added: the user namespace. With user namespaces, we get much the same independent users and groups that OpenVZ implemented, as mentioned above.

There are three main operations for working with namespaces (see the shell sketch after this list).

1) clone

Creates a child process and lets you specify which namespaces the child should be placed in.

2) unshare

Stops sharing namespaces with other processes. For example, `unshare --net` gives the process an independent network namespace of its own, no longer shared with anyone else.

3) setns

Joins the process to an existing namespace.
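A quick sketch of the unshare case (requires root; `unshare` here is the util-linux command wrapping the system call):

```bash
# The host sees its normal network interfaces
ip link

# A shell in a fresh network namespace sees only an isolated
# loopback device, which starts out down
sudo unshare --net /bin/sh -c 'ip link'
```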

In 2008, cgroups were introduced into the Linux kernel. They can limit and isolate the resource usage of a process group: CPU, memory, disk, network, and so on. After they were combined with user namespaces and redesigned in 2013, cgroups became much more modern; the Docker features we use so heavily today actually date from this period. So cgroups and namespaces together form the foundation of modern container technology.
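As a minimal sketch of what cgroups do (assuming a cgroups v2 host; the group name is illustrative):

```bash
# Create a cgroup and cap its memory at 256 MiB (cgroups v2 layout)
sudo mkdir /sys/fs/cgroup/demo
echo $((256 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/demo/memory.max

# Move the current shell into the group; every process it starts
# from now on is subject to the limit
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
```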

4. LXC and Cloud Foundry

In 2008, a new technology called LXC, also known as Linux Containers, appeared. We have mentioned several containerization technologies above, such as Linux VServer and OpenVZ, but those are implemented by patching the kernel, while LXC was the first that could work directly with the upstream Linux kernel.

LXC supports unprivileged containers: through uid maps and gid maps, you can run containers on a machine without booting them all as the root user, which is a great convenience and greatly reduces the attack surface. Common LXC operations include lxc-start, which launches a container, and lxc-attach, which enters one.
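A minimal LXC session might look like this (the container name, distribution, and release are illustrative):

```bash
# Create a container from the generic download template
lxc-create -n demo -t download -- -d ubuntu -r jammy -a amd64

# Launch it, then attach a shell inside it
lxc-start -n demo
lxc-attach -n demo
```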

In 2011, Cloud Foundry began to appear, combining LXC with its own component, Warden. Worth mentioning here: Warden uses a CS (client-server) architecture, and a Warden container usually has two layers: a read-only layer holding the operating system's file system, and a non-persistent read-write layer holding the application and its dependencies. The container is the combination of the two.

Most of the techniques discussed so far target a single machine. Cloud Foundry's biggest difference is that it can manage clusters of containers across machines, which is already a feature of modern container platforms.

5. lmctfy and systemd-nspawn

In 2013, Google open-sourced its own containerization solution, called lmctfy. It supports isolation of CPU, memory, and devices, and it also supports sub-containers: an application can be aware that it is currently running inside a container, and can even create a sub-container of its own. But as development continued after 2013, Google gradually realized that relying only on itself for all of this amounted to fighting alone and would always limit progress, so step by step it shifted its focus to abstraction and porting, and lmctfy's core features were ported into libcontainer. libcontainer then became a core part of Docker's runtime, was later donated by Docker to the OCI, and evolved into runc. We will talk more about this later.

Everyone knows that a server must have a process with PID 1, its init process or daemon, and on most modern operating systems that is systemd. systemd, too, provides a containerization solution, called systemd-nspawn, and this technique can be combined with the whole systemd toolchain.

Besides the systemctl we use in our daily routine, there is also machinectl, which manages machines. It supports two main kinds of interface: one for managing containers and one for managing virtual machines.

For example, you can use `machinectl start` to launch a container backed by systemd. This technology supports isolation of resources and networks. At its core, systemd-nspawn uses namespaces for the isolation, while the resource side comes from systemd itself, which uses cgroups for resource control. So it is, again, a combination of these two underlying technologies.
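A small sketch (the machine name and root-filesystem path are illustrative; `machinectl start` expects the image under /var/lib/machines):

```bash
# Boot a container directly from a prepared root filesystem
sudo systemd-nspawn --boot --machine=demo --directory=/var/lib/machines/demo

# Or drive it through machinectl and the systemd toolchain
sudo machinectl start demo
sudo machinectl list         # show running machines
sudo machinectl shell demo   # open a shell inside the container
```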

6. Docker

In 2013, Docker appeared. Generally speaking, Docker is the pioneer of the container era. Why? Because when Docker appeared in 2013, it was the first to propose a standardized deployment unit, the Docker image. It also launched Docker Hub, a central image registry, letting everyone download pre-built images and launch a container with a single `docker run`.

Against a backdrop of many cumbersome and complex technologies, Docker proposed that one line of `docker run` should be enough to start a container, which greatly simplified starting containers and improved convenience.
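For example (the image and tag are just an illustration):

```bash
# Pull a pre-built image from Docker Hub and start a container from it
docker run -d --name demo-redis -p 6379:6379 redis:6.0
```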

So Docker became popular all over the world. What are the main functions Docker provides? Resource isolation and management, among others. Before 0.9, Docker used LXC as its container runtime; from 0.9 on, Docker replaced LXC with libcontainer, which, as mentioned above, absorbed the core of Google's lmctfy. Docker later donated libcontainer to the OCI. And what is Docker's container runtime now? containerd. Beneath containerd sits runc, whose core is libcontainer.

But in 2014, Google observed that most containerization solutions only addressed a single machine. Meanwhile, since Docker is built on a CS architecture, it requires a Docker daemon, and that daemon must be started by the root user. A daemon running as root enlarges the attack surface, so Docker's security has been criticized by many.

Seeing this, Google drew on its internal Borg system and open-sourced Kubernetes. Google also teamed up with a group of companies to form the Cloud Native Computing Foundation (CNCF).

7. Kubernetes

In general, Kubernetes is the cornerstone of cloud-native applications. After Kubernetes emerged, cloud-native technology began to evolve rapidly and lead the trend, and Kubernetes provides some key features.

It supports more flexible scheduling, control, and management. Besides the default scheduler, it is easy to extend: for example, we can write our own scheduler, or use affinity and anti-affinity, which are among the features we commonly rely on.

It also offers built-in DNS, kube-dns or nowadays CoreDNS, for service discovery via domain names, and Kubernetes has many controllers. They reconcile the cluster toward the desired state; for example, if a Pod fails, the controller automatically restores it to the desired state.

It also supports a rich set of resources, arranged in several tiers: the smallest is the Pod, and above it are Deployments, StatefulSets, and the like.

The last factor that makes us like it even more is its rich CRD (Custom Resource Definition) extension mechanism: we can write custom resources ourselves and extend Kubernetes with them.
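As a tiny sketch of declaring one of these resources (all names are illustrative), a Deployment can be applied like this, and its controller then keeps the desired number of Pods running:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 2
  selector:
    matchLabels: {app: demo}
  template:
    metadata:
      labels: {app: demo}
    spec:
      containers:
      - name: redis
        image: redis:6.0
EOF
```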

8. More containerization technologies

Besides the major ones, there are containerization technologies we haven't covered, such as runc, mentioned only briefly above, and containerd. containerd was spun out of Docker as open source, with the goal of becoming a standardized, industry-wide container runtime. There is also CoreOS's open-source solution, rkt, which targeted the Docker security issues mentioned above; however, the rkt project has since been terminated.

Then there is Red Hat's Podman, a tool for starting and managing containers that has no daemon. In security terms, then, Podman is intuitively better than Docker, but in convenience it takes a real hit: things like restarting containers on failure or starting them at boot need to be solved in other ways.

In 2017 came Kata Containers, which went through its own evolution. At the start, Intel was building its own container runtime, and a startup called Hyper.sh was building one too. Both companies aimed to make containers more secure, and the underlying technology both relied on was virtualization. The two efforts then merged: Hyper.sh's open-source runtime runV joined with Intel's work, and Kata Containers was born. In 2018, AWS open-sourced its Firecracker.

These two technologies are quite different from the machine-level containerization discussed above, because their underlying technology is the virtual machine; we generally regard them as lightweight-VM containerization. So much for the survey of containerization technologies.

III. Introduction to Redis

Next, an introduction to Redis. The following is excerpted from the Redis official website.

1. Main Redis usage scenarios

Redis is now the most widely used key-value database. Our main usage scenarios are roughly the following:

  • As a cache, placed in front of the database;
  • As a DB, actually storing data in it and persisting it;
  • As a message queue; it supports many data types, which I won't go into here.

2. Features of Redis

  • It is a single-threaded model: the process can have multiple threads, but only one worker thread. Redis 6.0 added I/O multi-threading, but the extra threads only handle the network-related parts; the actual data processing is still single-threaded, so overall we still call it a single-threaded model.
  • The data in Redis lives in memory; it is an in-memory database.
  • Regarding HA: we used to build Redis HA mainly with Redis Sentinel, but since Redis Cluster appeared, we mainly rely on Redis Cluster. These are the two major HA solutions.

IV. Comparison of Redis Containerization Approaches

When it comes to operating Redis, what do we need to consider?

  • Deployment: how to deploy quickly, how to manage the ports instances listen on so they don't conflict, plus logs, persistence files, and so on;
  • Scaling up and down, a problem we also run into often;
  • Monitoring and alerting;
  • Failure and recovery.

These are the areas we care about most. I'll now go through each of them.

1. Deployment

When we deploy Redis as a standalone instance, the first thing we want is process-level resource isolation: every Redis instance deployed on a given node should have its own resources and be unaffected by the other instances on that node.

Process-level resource isolation mainly covers two aspects: CPU on one hand, memory on the other. Beyond that, on a single machine we also want per-instance port management, or some independent network-resource isolation.

For process-level resource isolation, having introduced so many containerization techniques, we already know the easiest way to get it is cgroups. In other words, any solution built on cgroups and namespaces can meet this need.

The other option is a virtualization solution such as Kata Containers, which, as mentioned above, is based on virtual machines. Everyone who has touched virtualization knows that by its nature everything is isolated from the start.

So for deployment: whether you use Docker, systemd-nspawn, or something similar, you can use cgroups, namespaces, or both, but you have to weigh convenience. With Docker, for instance, a `docker run` per instance, each mapped to a different port, is all it takes (a sketch follows below).

With systemd-nspawn, by contrast, you need to write some configuration files, and with a virtualization solution you also need to prepare some images.
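For instance, two isolated instances on one host might look like this (names, ports, limits, and paths are illustrative):

```bash
# Each instance gets its own port, resource caps, and data directory
docker run -d --name redis-a -p 6379:6379 --memory 2g --cpus 1 \
  -v /data/redis-a:/data redis:6.0 redis-server --appendonly yes

docker run -d --name redis-b -p 6380:6379 --memory 2g --cpus 1 \
  -v /data/redis-b:/data redis:6.0 redis-server --appendonly yes
```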

2. Scaling up and down

There are two main scaling scenarios. One is adjusting a single instance's maxmemory, the maximum-memory setting. The other concerns our clustering solution, Redis Cluster: when we scale a cluster up or down, two kinds of changes are involved.

One is node changes: we may add nodes, and we may also remove them.

The other is slot changes, that is, migrating slots. These are of course tied to node changes: when we scale up by adding nodes to the Redis Cluster, we then need to allocate slots to them, or we may want certain slots to live on certain nodes. Either way, the need exists.

So, to adjust maxmemory while restricting memory through cgroups, you need a solution that lets you adjust the cgroups quota dynamically.

For example, we can use `docker update`, which can directly modify the cgroups resources of a running instance, i.e. one of its containers: we can give it a new memory limit, raising its maximum available memory. Once its available memory has been increased, we can then raise the instance's maxmemory. So for single-instance maxmemory changes, the key is having cgroups and some support for adjusting them.
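A sketch of the two-step adjustment (the container name and sizes are illustrative):

```bash
# Step 1: raise the container's cgroups memory limit
docker update --memory 4g --memory-swap 4g redis-a

# Step 2: raise Redis's own maxmemory to match, leaving some headroom
redis-cli -p 6379 config set maxmemory 3gb
```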

Cluster node changes are covered in more detail later.

3. Monitoring and alerting

The third point is monitoring and alerting. Whether on physical machines, in a cloud environment, or on any other platform, what we most want from monitoring and alerting is automatic discovery.

We hope that when an instance starts, we immediately know it is up and what its status is, meaning it is automatically registered in our monitoring system. This part does not depend on any specific container technology: even with pure physical-machine deployment, some solution can discover instances automatically. The only container-specific note is that, with containers, we can have the usual Redis Exporter work with Prometheus for monitoring, running Redis Exporter and Redis Server as two processes in the same network namespace.

If the two share a network namespace, the exporter can reach Redis directly via 127.0.0.1:6379, and if we are on K8S we can simply put both into one Pod. Either way it doesn't matter, because monitoring and alerting don't depend on the specific containerization technology; whatever you use, you can monitor and alert.
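A minimal sidecar sketch (names, image tags, and the scrape port are illustrative; oliver006/redis_exporter is a commonly used community exporter):

```bash
# One Pod, two containers in a shared network namespace: the exporter
# scrapes Redis over 127.0.0.1 and exposes metrics for Prometheus
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis-demo
  labels: {app: redis-demo}
spec:
  containers:
  - name: redis
    image: redis:6.0
  - name: exporter
    image: oliver006/redis_exporter
    args: ["--redis.addr=redis://127.0.0.1:6379"]
    ports:
    - containerPort: 9121   # Prometheus scrape target
EOF
```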

4. Failure and recovery

The last part is failure and recovery, which I think has three main aspects:

  • The first is instance restarts.

In certain scenarios, your instance may die while running, and if you want it restarted automatically you need a process-management tool. For us, as mentioned above with systemd, systemd can automatically restart a process after it dies. Or, if you use Docker, it has a restart policy: set it to always or on-failure, and the container is pulled back up when it fails.
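Two sketches of the same idea (the unit and container names are illustrative):

```bash
# Docker: restart the container automatically when it exits abnormally
docker run -d --name redis-a --restart on-failure redis:6.0

# systemd: the equivalent is Restart= in the service unit, e.g.
#   [Service]
#   ExecStart=/usr/bin/redis-server /etc/redis/redis.conf
#   Restart=on-failure
```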

If you're on K8S, it's even simpler: failed containers are pulled back up automatically.

  • The second is the master-slave switch.

Master-slave switching is relatively routine, but I've put it under failure and recovery because during a switch your cluster may well be in an unhealthy state. So what do we need for a master-slave switch? On one hand, the instance needs data persistence. On the other, a switch can fail because of insufficient resources; this ties back to the scaling discussion above, so in that case you must be able to give the instance more resources, and the best way is to add them automatically.

In K8S, if we want resources added automatically, we normally set the Pod's request and limit, its resource quota, and let them be raised automatically; this is what we usually call VPA (Vertical Pod Autoscaler).
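A hedged sketch of a VPA object (this assumes the VPA addon is installed in the cluster; the target name is illustrative):

```bash
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: redis-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: redis-demo
  updatePolicy:
    updateMode: "Auto"   # let VPA apply new resource requests itself
EOF
```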

  • The third is data recovery.

Typically, say we have RDB and AOF enabled: we want that data saved, so that when the instance recovers it can start from it. So data persistence is one requirement.

When we containerize, what do we have to think about? With Docker, you need to mount a volume so that data can be persisted. Likewise, with systemd-nspawn, you need a file directory to persist the data into.

On K8S, you have a wide range of volume options: you can mount Ceph RBD, S3, or a local file directory. For greater reliability, though, multi-replica distributed storage is usually used.

5. Node changes

Next, the node changes mentioned above under scaling. Say I want to add some nodes to one of my Redis Clusters: the new node must be able to join the cluster and communicate with it; only then has it truly joined the cluster.

We also know that building a Redis Cluster runs into one big problem here: services in K8S are mostly discovered through domain names, but the NodeIP in a Redis Cluster's configuration supports only IP addresses, not domain names.

So when nodes change, what we need is the ability to write the NodeIP into our cluster configuration dynamically.
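Once you have the IPs, joining and resharding can be driven from redis-cli (the addresses are illustrative):

```bash
# Introduce the new node to an existing member, by IP and port
redis-cli -h 10.0.0.11 -p 6379 cluster meet 10.0.0.14 6379

# Then move slots onto it with the official reshard helper
redis-cli --cluster reshard 10.0.0.11:6379
```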

If you want the full life cycle handled for you, the following screenshot is from an operator called KubeDB. As you can see in the image, for Redis it provides three main parts:

  • PVCs, which handle data persistence;
  • Services, which handle service discovery;
  • StatefulSets, the K8S resource that is a little friendlier to stateful applications.

One thing the content so far hasn't introduced: the company behind Redis, Redis Labs, provides a commercial solution called Redis Enterprise. It, too, sits on top of K8S and also uses a Redis Operator. Its scheme is similar to KubeDB's, because the core is still StatefulSets plus its own controller to complete the picture.

V. Summary

Let's look at today's summary. For standalone use, you can hand Redis over to Docker or any other tool that supports cgroups- or namespace-based resource management, because with cgroups and namespaces you can isolate resources, avoid conflicts between network ports, and so on.

If your situation is like the one my friend raised at the beginning: he wants the host's network and only wants Docker for process management, without introducing anything new. Then systemd or systemd-nspawn is a viable option, as it is also a containerization solution.

For scheduling and management in complex scenarios, i.e. large scale and more flexible scheduling and management, I recommend using a Kubernetes operator such as KubeDB, which is one such solution. If your scenario is not that complicated and relatively simple, plain Kubernetes StatefulSets with a few tweaks can also meet your needs.

That’s all I have to share today. Thank you for your participation.


Q&A

Q1: If a Redis cluster is built from three physical machines, each running two instances, how do you ensure that the two instances on the same physical machine are not master and slave of each other?

A1: This is a problem we all run into. First, with physical machines, three machines each running two instances gives you six instances. Can you guarantee these six instances are not each other's master and slave? At setup time, yes, you can arrange that directly.

The only problem is that if the cluster undergoes a failover, an active switch of nodes, your master-slave layout may very well have changed. My suggestion is that when you find this kind of situation, you do the switch back manually, because for physical-machine environments I haven't yet seen a good automated solution for this.

Q2: How do you add new nodes when scaling out under K8S? How are the scale-out and slot-allocation steps automated?

A2: Let's split this into two steps. The first is adding a new node, which I touched on just now: after adding it, you must first make it communicate with the cluster. The catch is that you need to modify the cluster's configuration file to include the new node's NodeIP, because nodes communicate via IP; only then can it talk to the other nodes in the cluster. For this part, however, I prefer to use an operator.

Q3: Without the Redis Operator and without distributed storage, how can a cluster be deployed on K8S?

A3: Not using the Redis Operator is possible. As I mentioned earlier, there are two modes, of which StatefulSets is the safer one. The main work is still changing the configuration: you add an init container alongside your Redis container, let it rewrite the configuration first, and once the configuration is modified, the main container starts and can join the cluster.
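A hedged sketch of the init-container idea (everything here is illustrative: it assumes a ConfigMap named redis-conf holding a config template with a POD_IP placeholder):

```bash
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 3
  selector:
    matchLabels: {app: redis-cluster}
  template:
    metadata:
      labels: {app: redis-cluster}
    spec:
      initContainers:
      - name: config
        image: busybox
        # Rewrite the node's own IP into the config before Redis starts
        command:
        - sh
        - -c
        - sed "s/POD_IP/$(hostname -i)/" /tmpl/redis.conf > /conf/redis.conf
        volumeMounts:
        - {name: conf, mountPath: /conf}
        - {name: tmpl, mountPath: /tmpl}
      containers:
      - name: redis
        image: redis:6.0
        command: ["redis-server", "/conf/redis.conf"]
        volumeMounts:
        - {name: conf, mountPath: /conf}
      volumes:
      - {name: conf, emptyDir: {}}
      - {name: tmpl, configMap: {name: redis-conf}}
EOF
```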

Q4: What is the difference in network latency between different network models?

A4: If we use the physical network directly, the host network, we normally treat its latency as negligible and usually ignore it. But if you use an overlay network model, packets have to be encapsulated and decapsulated, so of course there is extra resource cost and some performance loss.

Q5: Is it generally recommended that the whole company share one Redis cluster, or that each system has its own cluster?

A5: For example, if you use lists: we all know Redis has ziplist-related configuration, which affects both your storage and your performance. If the whole company shares one cluster for everything, changing that cluster's configuration will probably affect all services. But with one Redis cluster per system, they don't affect each other, and there is no chain reaction in which one application accidentally brings the shared cluster down.

Q6: How does Redis persistence work in production?

A6: Here is my take. If you really need persistence: Redis provides two core mechanisms, RDB and AOF, and if you have a lot of data, your RDB files can get quite large, so when doing persistence I usually recommend a trade-off between the two. Usually, even in a physical environment, I'd suggest putting the storage directory on a separate disk, or mounting distributed storage, but the premise is that you can guarantee its performance, because you can't let write performance drag your cluster down. So my main recommendation is to enable both; but if your data really isn't that important, you can enable only AOF.

Q7: Is production-grade Ceph reliable?

A7: Ceph's reliability has been discussed by many people. Personally, I think Ceph's reliability is assured. I use Ceph heavily and have stored a lot of core data in it, and its reliability has been fine.

The key is whether you can operate it well. There is a company you may know, SUSE, the Linux distributor, that actually offers an enterprise storage solution, and underneath it is still Ceph. This is normal: as long as someone owns the thing and runs it properly, I think Ceph is stable enough.

By the way, if you're using K8S, there's a project called Rook that provides a Ceph operator. It's relatively stable now, so I recommend you try it out.

Q8: How should the requested memory, the memory limit, and Redis's own memory be configured?

A8: There are several issues to consider here. First, Redis's own memory: that really depends on your actual business scenario, the actual demand of your business.

If it fills up, you need to turn on an LRU policy to do eviction and the like; that's one point. As I understand it, the question here is really about the K8S environment: there is the request and the limit, and the limit must be the ceiling of available memory. You must take into account that Redis itself uses some memory beyond its data, so leave headroom between maxmemory and the limit.


Welcome to follow my WeChat official account [Moelove].