In the age of microservices, cloud computing, and serverless architectures, it is useful to understand Kubernetes and know how to use it. However, the official Kubernetes documentation can be difficult to digest for users just getting started with cloud computing. In this article, we’ll look at the important concepts in Kubernetes. In future articles in the series, we will also learn how to write configuration files, use Helm as a package manager, create cloud infrastructure, orchestrate services easily with Kubernetes, and build a CI/CD pipeline to automate the entire workflow. With this knowledge, you can start any kind of project and create a solid infrastructure.

First, we know that there are multiple benefits to using containers, from faster deployments to consistent delivery at scale. Even so, containers are not the answer to every problem, because they come with overhead of their own, such as maintaining a container orchestration layer. So, you should analyze the cost/benefit trade-off at the beginning of a project.

Now, let’s start our tour of the Kubernetes world!

Kubernetes hardware architecture

The node

A node is a worker machine in Kubernetes, which can be any device with a CPU and RAM. A smartwatch, a smartphone, a laptop, or even a Raspberry Pi could be a node. When we use the cloud, a node is a virtual machine (VM). So, in a nutshell, a node is an abstraction over a single device. The advantage of this abstraction is that we don’t need to know the underlying hardware structure. We only use nodes, so our infrastructure is platform-independent.

The cluster

A cluster is a group of nodes. When you deploy an application to a cluster, it automatically distributes work to the nodes. If more resources are needed (in short, we need more money), new nodes will be added to the cluster and work will be automatically reassigned.

We run our code on the cluster, but we don’t need to care which part of the code is running on which node. The assignment of work is automatic.

Persistent volumes

Because our code can be moved from one node to another (for example, if a node does not have enough memory, the work will be rescheduled onto a different node that does), data saved on a node is easily lost. If we want to keep our data permanently, we should use persistent volumes. A persistent volume is a bit like an external hard drive that you can plug in and save your data on.
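As an illustration, here is a minimal sketch of a PersistentVolumeClaim, the object you use to request such a volume; the name my-data and the 1Gi size are hypothetical values for this example:

```yaml
# A PersistentVolumeClaim asks the cluster for storage that outlives any single Pod.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data            # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce        # mountable read-write by a single node at a time
  resources:
    requests:
      storage: 1Gi         # request 1 GiB of persistent storage
```

A Pod can then mount this claim as a volume, and the data survives even if the Pod is rescheduled onto another node.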

Kubernetes was developed by Google as a platform for stateless applications whose persistent data is stored elsewhere. As the project matured, many enterprises wanted to use it for stateful applications too, so the developers added persistent volume management. As with earlier virtualization technologies, database servers are generally not the first servers to migrate to a new architecture, because databases are at the heart of many applications and may contain a lot of important information. So local database systems typically remain on large virtual or physical machines.

So, the question is, when should we start using persistent volumes? To answer this question, we should first understand the different types of database applications.

We classify data management solutions into the following two categories:

  1. Vertical scaling – This includes traditional RDBMS solutions such as MySQL, PostgreSQL, and SQL Server
  2. Horizontal scaling – This includes “NoSQL” solutions such as Elasticsearch or Hadoop-based solutions

Vertical scaling solutions (such as MySQL, PostgreSQL, and Microsoft SQL Server) should not run in containers. These database platforms require high I/O, shared disks, block storage, and so on, and cannot gracefully handle the loss of a node in a cluster, which happens often in a container-based ecosystem.

Containers can be used for horizontally scaled applications such as Elasticsearch, Cassandra, Kafka, and so on. These can withstand the loss of a node in the database cluster, and the database application can rebalance itself.

In general, you should only containerize distributed databases that take advantage of redundant storage technologies and can handle the loss of a node within the database cluster (Elasticsearch is a good example).

Kubernetes software components

The container

One of the goals of modern software development is to ensure that various applications can be isolated from each other on the same host or cluster. Virtual machines are one solution to this problem. But virtual machines need their own operating system, so they are usually gigabytes in size.

Containers, on the other hand, isolate an application’s execution environment while sharing the underlying operating system. So, a container is like a box in which you can store everything you need to run your application: code, runtime, system tools, system libraries, settings, and so on. Containers typically require only a few megabytes, far less than a virtual machine needs, and can start almost instantly.

Pods

A Pod is a set of containers, and it is the smallest unit in Kubernetes. A Pod can contain more than one container, but in general we use only one container per Pod, because the smallest unit of replication in Kubernetes is the Pod, not the container. If we want to scale containers individually, we simply put each container in its own Pod.
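To make this concrete, here is a minimal sketch of a Pod manifest with a single container; the nginx image is just a stand-in for your own application:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: app
      image: nginx:1.25       # stand-in image; replace with your application
      ports:
        - containerPort: 80   # the port the container listens on
```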

Deployments

The initial purpose of a Deployment is to provide declarative updates for Pods and ReplicaSets, in which the same Pod is replicated many times. Using a Deployment, we can specify how many replicas of the same Pod should be running at any time. A Deployment is similar to a Pod manager: it automatically starts the required number of Pods, monitors them, and recreates them in the event of a failure. Deployments are extremely useful because you do not need to create and manage each Pod separately.
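As a sketch, a Deployment wraps the Pod definition above in a template and adds a replica count; all names and the image are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  replicas: 3              # keep three copies of the Pod running at all times
  selector:
    matchLabels:
      app: example         # manage the Pods that carry this label
  template:                # the Pod template that gets replicated
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: app
          image: nginx:1.25   # stand-in image
```

If a Pod crashes or a node disappears, the Deployment notices that fewer than three replicas are running and starts a replacement.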

We typically use Deployments for stateless applications. However, you can persist a Deployment’s state by attaching a persistent volume to it, making it stateful.

StatefulSets

The StatefulSet is a newer concept in Kubernetes: a resource for managing stateful applications. It manages the deployment and scaling of a set of Pods and guarantees the ordering and uniqueness of those Pods. It is similar to a Deployment, except that a Deployment creates Pods with arbitrary names, and their order is not important to it, whereas a StatefulSet creates Pods with unique, ordered names. So if you want three replicas of a Pod called example, the StatefulSet will create: example-0, example-1, example-2. The most important benefit of this naming scheme is that you can get a general idea of what is going on just from a Pod’s name.
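A minimal StatefulSet sketch looks very much like a Deployment, with the addition of a serviceName that gives each Pod a stable network identity; again, all names are illustrative:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  serviceName: example     # headless Service that provides stable per-Pod DNS names
  replicas: 3              # creates example-0, example-1, example-2 in order
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: app
          image: nginx:1.25   # stand-in image
```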

DaemonSets

A DaemonSet ensures that a Pod runs on every node of the cluster. If a node is added to or removed from the cluster, the DaemonSet automatically adds or removes the Pod. This is important for monitoring and logging, because it lets you cover every node without managing monitoring Pods by hand.
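For example, a log collector is a typical DaemonSet workload. This sketch uses a hypothetical fluentd-based agent, and has no replicas field because the cluster size determines the Pod count:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent          # hypothetical name for a per-node logging agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
        - name: agent
          image: fluentd:v1.16   # stand-in log-collector image
```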

Services

While a Deployment is responsible for keeping a set of Pods running, a Service is responsible for enabling network access to a set of Pods. Services provide standardized features across the cluster: load balancing, service discovery between applications, and zero-downtime application deployments. Each Service has a unique IP address and a DNS host name. Applications that need to use a Service can be configured manually with the appropriate IP address or host name, and the traffic will be load-balanced to the correct Pods. In the section on external traffic, we’ll learn more about the Service types and how internal services communicate with the outside world.
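As a sketch, the following Service selects the Pods labeled app: example from the Deployment above and load-balances traffic to them on port 80; the names are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  selector:
    app: example           # send traffic to Pods carrying this label
  ports:
    - port: 80             # the port the Service exposes inside the cluster
      targetPort: 80       # the container port the traffic is forwarded to
```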

ConfigMaps

If you want to deploy to multiple environments, such as staging, development, and production, it’s a bad idea to bake the configuration into the application, because of the differences between those environments. Ideally, you want a different configuration for each deployment environment. Thus, ConfigMaps were born. ConfigMaps let you decouple configuration artifacts from images to keep containerized applications portable.
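As a sketch, a ConfigMap is just a named set of key/value pairs; the keys and values below are hypothetical per-environment settings:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config                  # hypothetical name
data:
  DATABASE_HOST: db.staging.local   # hypothetical staging-only value
  LOG_LEVEL: debug
```

A container can then consume these values, for example as environment variables via envFrom with a configMapRef, so the same image runs unchanged in every environment.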

External traffic

Now that you know about the services running in the cluster, how do you get external traffic into it? There are three Service types that can handle external traffic: ClusterIP, NodePort, and LoadBalancer. There is also a fourth solution: adding another abstraction layer called an Ingress Controller.

ClusterIP

ClusterIP is the default Service type in Kubernetes and lets you communicate with other services inside the cluster. Although ClusterIP is not designed for external access, external traffic can reach the service through a proxy with a few changes. Do not use this solution in production, only for debugging. Services declared as ClusterIP should not be directly visible from the outside.
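A ClusterIP Service looks like the Service sketch above, since ClusterIP is the default type; for debugging, you can tunnel to it with kubectl port-forwarding, as in this hypothetical example:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: internal-api       # hypothetical internal service
spec:
  type: ClusterIP          # the default type, stated explicitly for clarity
  selector:
    app: internal-api
  ports:
    - port: 8080
      targetPort: 8080
# For debugging only, forward a local port to the Service:
#   kubectl port-forward service/internal-api 8080:8080
```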

NodePort

As we saw in the first part of this article, Pods run on nodes. Nodes can be a variety of devices, such as laptops or virtual machines (when running in the cloud). Each node has a fixed IP address. By declaring a Service of type NodePort, the Service opens a static port on each node’s IP address, so that you can access it from outside the cluster. You can use NodePort in a production environment, but for large applications with many services, manually managing all the different IP addresses and ports can be cumbersome.
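Declaring a NodePort looks like this sketch; the nodePort value must fall within the cluster’s configured range (30000–32767 by default), and the names are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  type: NodePort
  selector:
    app: example
  ports:
    - port: 80             # port inside the cluster
      targetPort: 80       # container port
      nodePort: 30080      # static port opened on every node's IP address
```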

LoadBalancer

By declaring a Service of type LoadBalancer, you can use a cloud provider’s load balancer to expose it to the outside world. How the external load balancer routes traffic to the Service’s Pods is up to the cluster provider. With this solution, you don’t have to manage the IP addresses of every node in the cluster, but you will get a separate load balancer for each Service, and you pay per load balancer instance.
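The manifest is nearly identical to the NodePort sketch; only the type changes, and the cloud provider does the rest:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  type: LoadBalancer       # asks the cloud provider to provision an external load balancer
  selector:
    app: example
  ports:
    - port: 80
      targetPort: 80
```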

This solution is suitable for production environments, but it is somewhat expensive. Next, let’s look at a slightly cheaper solution.

Ingress

Ingress is not a Service but an API object that manages external access to the Services in a cluster. It acts as a reverse proxy and single entry point into your cluster, routing requests to the different Services. I usually use the NGINX Ingress Controller, which works as a reverse proxy and also handles SSL termination. The best way to expose an Ingress in production is through a load balancer.

With this solution, you can expose any number of services using a single load balancer, so you can keep fees to a minimum.
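As a sketch, this Ingress routes all requests for a hypothetical host name to the Service from earlier; it assumes an NGINX Ingress Controller is installed in the cluster:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
spec:
  ingressClassName: nginx        # assumes the NGINX Ingress Controller is installed
  rules:
    - host: app.example.com      # hypothetical host name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example    # the internal Service the request is routed to
                port:
                  number: 80
```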

Conclusion

In this article, we learned about the basic concepts of Kubernetes and its hardware architecture. We also looked at the different software components, such as Pods, Deployments, StatefulSets, and Services, and saw how services communicate with the outside world. Hopefully this helps you untangle the intricate architecture of Kubernetes components.