As we all know, microservices are closely tied to containers. With the rise of cloud services, they have gradually extended from servers inside the enterprise data center to a variety of cloud scenarios.

So how do we manage common hybrid cloud services with container-based models?

On May 18-19, 2018, the Global Software and Operations Technology Summit hosted by 51CTO was held in Beijing.

In the “Microservice Architecture Design” sub-forum, Li Jian, a senior engineer from Ele.me’s Computing Power Delivery Department, gave a talk titled “Ele.me Container-based Hybrid Cloud Practice”.

This article will be shared in the following four parts:

  • Computational power delivery

  • Technology selection

  • “Computing Power Takeaway” Based on Kubernetes

  • Kubernetes extension solution

With the rapid growth of the business, the corresponding resource scale is also increasing rapidly. This growth has led directly to more server types, more administrative tasks, and more diverse delivery requirements.

For example, in some scenarios we need to pre-install an application on the delivered server; sometimes we need to deliver a “dependent” service, or a collection of services.

Computational power delivery

Although the number of physical and virtual machine resources we face is huge, our operations staff is limited and cannot grow without bound. Therefore, we need to abstract physical resources in a uniform way and expose them to developers.

This standardized abstraction brings two major benefits to the enterprise: it greatly reduces costs and increases our server management capacity. At the same time, this unified abstraction is what gave rise to our computing power delivery team.

Specifically, delivering servers (IaaS, in cloud terms) is essentially the same as delivering applications (SaaS).

In today’s era of cloud virtualization, we can prepare a CentOS or Ubuntu file system with a single command. As a result, the server itself becomes a software service, or an app.

By standardizing servers and applications under one abstraction, we can abstract away all physical resources and turn them into a pool of computing power that the system can schedule. Everything delivered belongs to an application, so to speak.

So the key to our computing power delivery is managing applications, and that is a shift in focus.

Prototypes of container technology date back to the late 1970s, but it was not until Docker appeared in 2013 that containers became a mainstream technology.

Docker’s biggest contribution to container technology is that it is truly application-oriented: it provides cross-platform portability and delivers every service in a uniform way by packaging applications as images.

In addition, it established a packaging standard for applications; based on this standard, applications can run on any platform.

At the same time, this further promotes the development of automated operations, AIOps, and big data applications. It not only reduces labor costs but also improves resource utilization.

We classify computational power delivery into three broad categories:

  • Deployment of customer applications. Upon receiving a service request, we will deploy and get the service up and running.

  • One-click delivery of standard services. For example, the big data service of a department requires an environment in which many services are isolated from each other by default, but some services can be connected with each other.

    Then we need to have some reproducible standardized service templates in place to ensure smooth discovery of application services even in complex SOA architectures.

  • Server delivery. As mentioned earlier, the standardization of server delivery is a manifestation of our capabilities in computing power delivery.

Technology selection

Today, there are many container technologies to choose from, including Kubernetes, Swarm, and AWS ECS, among which Kubernetes is the most popular.

Therefore, we need to consider the following factors in selection:

  • The Kubernetes project has become the de facto standard for container orchestration. Because almost everyone uses it, when you run into a problem you can usually find the answer in the community, which brings an invisible cost advantage.

  • Alignment between actual requirements and technology.

  • Extensibility and ecosystem. Technologies endorsed and adopted by large companies generally have strong backing and a degree of forward-looking design, and they also make it easier to build an ecosystem around them.

“Computing Power Takeaway” Based on Kubernetes

The kind of service our developers typically need for an application is very similar to the food-ordering service we run. The green Box in the figure above is, in effect, an abstraction like a takeout box.

Each service inside is invoked by domain name, and the boxes are replicable, so we can create multiple boxes from the same template.

When each Box calls different services, its domain name is unique in the system. As a result, they can be invoked across different environments, reducing developer workload.

In the past, when a service started, the system automatically generated a network identity for it, and whenever IP addresses or domain names changed, the configuration had to be changed accordingly.

Now, once the container environment and the corresponding application are up and running, different services can call each other based on a unique network identity.

In the concrete implementation, we use Kubernetes as the underlying container engine. Each Unit contains its own domain and Pods, and each Unit has replicas for load balancing.

We use systemd as the startup mechanism to express the interdependencies between services.

Inside a Box, the same functionality is implemented through a startup tree, which ensures that when the Box starts, applications come up in the order defined by our dependency descriptions.
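To make the idea concrete, here is a minimal sketch (not our actual implementation) of starting applications inside a Box in dependency order; the service names and the dependency map are hypothetical.

```go
package main

import "fmt"

// deps describes a hypothetical startup tree inside one Box:
// each service lists the services it must wait for.
var deps = map[string][]string{
	"app":   {"cache", "db"},
	"cache": {},
	"db":    {},
	"nginx": {"app"},
}

// startInOrder walks the dependency tree depth-first so that every
// service is "started" only after everything it depends on.
func startInOrder(name string, started map[string]bool) {
	if started[name] {
		return
	}
	for _, d := range deps[name] {
		startInOrder(d, started)
	}
	started[name] = true
	fmt.Println("starting", name) // stand-in for launching the real process
}

func main() {
	started := map[string]bool{}
	for name := range deps {
		startInOrder(name, started)
	}
}
```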

Although, from an engineering point of view, there should not be too many dependencies between services, our department serves the business departments, so what we need to do is promote standardization while accommodating the development habits of their existing projects.

As shown in the figure above, each Unit also has hooks that assist with starting, stopping, and initializing the service. For example, after one service finishes, a hook can call another Pod to perform initialization.

Of course, there are also shared public services. For example, Box1 and Box2 may both exchange data through a public service, and the unique network identity of that public service may change due to external factors such as a restart.

So, for consistency, we map the internal identity mentioned above to an external one. The internal identity never changes, while external endpoints are dynamically associated with internal services through service discovery.

In this way, internal services do not have to worry about configuration changes and service discovery. The picture above is our simplest abstraction of the delivery service.
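As a rough sketch of this internal/external split, the following hypothetical registry keeps internal identities fixed while external endpoints are re-registered whenever they change; callers always resolve through the registry rather than hard-coding addresses.

```go
package main

import (
	"fmt"
	"sync"
)

// registry maps a stable internal identity (e.g. "shared-mq.public") to the
// current external endpoint, which may change after a restart.
type registry struct {
	mu        sync.RWMutex
	endpoints map[string]string
}

func (r *registry) register(internal, external string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.endpoints[internal] = external
}

func (r *registry) resolve(internal string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ep, ok := r.endpoints[internal]
	return ep, ok
}

func main() {
	r := &registry{endpoints: map[string]string{}}

	// The public service registers itself; callers never hard-code its address.
	r.register("shared-mq.public", "10.0.3.17:5672")
	if ep, ok := r.resolve("shared-mq.public"); ok {
		fmt.Println("dialing", ep)
	}

	// After a restart the external address changes, but the internal name does not.
	r.register("shared-mq.public", "10.0.8.42:5672")
	if ep, ok := r.resolve("shared-mq.public"); ok {
		fmt.Println("dialing", ep)
	}
}
```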

As we all know, Kubernetes depends heavily on etcd, but in Kubernetes scenarios a single etcd cannot sustain a very large volume of business.

Therefore, given the growing volume of services and the need for stability, we had to split Kubernetes as far as possible.

During the split, we divided the resource pool of the original single data-center network into three or four Kubernetes clusters.

After the split was complete, we ran into a new problem: cluster resources were unevenly utilized, with some clusters underloaded and others overloaded.

Therefore, we give each Kubernetes cluster its own etcd and, through scheduling, allow services to drift between clusters, which addresses both resource efficiency and reliability.

In the simple architecture above, all components except the yellow parts are native to Kubernetes, including the Scheduler that places Pods, shown in blue.

We use the attributes above, together with service information from the different node layers, to decide which cluster to dispatch to.

Since we use a two-tier scheduling approach, tier 1 does not have real-time information about each cluster, nor does it know where a cluster has placed its services.

To do this, we developed a Scheduler ourselves (shown in yellow). In addition, we developed an APIServer service similar to the Kubernetes one.

As you can see, our goal is to extend the Kubernetes cluster in a peripheral way, not to change Kubernetes itself.

A layer of APIServer and Scheduler is added around Kubernetes. The immediate benefit is that we save on development and maintenance costs on the framework.
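The sketch below illustrates the tier-1 idea under simplifying assumptions: per-cluster capacity is reported periodically (not in real time), and the rule is simply “the cluster with the most free CPU that still fits”. The cluster names and numbers are made up; the real scheduler also takes service information into account.

```go
package main

import "fmt"

// cluster summarizes what the tier-1 scheduler knows about each
// Kubernetes cluster (reported periodically, so not real-time).
type cluster struct {
	name       string
	apiServer  string
	cpuFree    int // free millicores, as last reported
	memFreeMiB int
}

// pickCluster chooses the cluster with the most free CPU that can still fit
// the request; tier 2 (the native kube-scheduler) then places the Pod on a
// concrete node inside that cluster.
func pickCluster(clusters []cluster, cpuMilli, memMiB int) (cluster, bool) {
	var best cluster
	found := false
	for _, c := range clusters {
		if c.cpuFree < cpuMilli || c.memFreeMiB < memMiB {
			continue
		}
		if !found || c.cpuFree > best.cpuFree {
			best, found = c, true
		}
	}
	return best, found
}

func main() {
	clusters := []cluster{
		{"k8s-a", "https://k8s-a.internal:6443", 12000, 32768},
		{"k8s-b", "https://k8s-b.internal:6443", 48000, 131072},
	}
	if c, ok := pickCluster(clusters, 2000, 4096); ok {
		fmt.Println("dispatching workload to", c.name, "via", c.apiServer)
	}
}
```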

Let’s take a look at the actual container environment:

  • As mentioned earlier, we are based on Kubernetes.

  • On the network side, we use Aliyun virtual machines. At small scale we use Vroute; when we build our own data centers we use Linux Bridge. Meanwhile, our storage driver is Overlay2.

  • For the operating system, we use CentOS 7 with kernel 3.10.0-693.

  • For the Docker version, we use 17.06. It’s worth adding that there are rumors in the community that this version will be discontinued, so we’ll be working on an upgrade soon.

Registry

Speaking of the Registry, there was no problem when we were small, but now we are large enough to span several data centers.

Therefore, we need to synchronize the data pushed to the Registry and “double write” it to OSS (Object Storage Service) and Ceph (an open-source distributed storage system that provides object, block, and file system storage).

Since every physical data center has its own Ceph, we can follow the principle of pulling from the nearest Ceph instead of going outside the data center.

The advantage of this approach is that it reduces the bandwidth bottleneck caused by transferring data between data centers.

So why do we synchronize? As soon as our services are published, they are automatically deployed asynchronously.

However, some machines may not have the required image at all, and they will report an error when performing a “pull” operation.

Therefore, based on this consideration, we adopted the “synchronous double write” mode and added the authentication step.
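A minimal sketch of the “synchronous double write” idea is shown below: a blob is written to both the nearby Ceph and OSS before the push is acknowledged, and pulls are served from the nearby Ceph. The in-memory stores here stand in for real OSS/Ceph backends.

```go
package main

import (
	"errors"
	"fmt"
)

// blobStore is the minimal interface a registry storage backend needs here.
type blobStore interface {
	Put(digest string, data []byte) error
	Get(digest string) ([]byte, error)
}

// memStore is a stand-in for OSS or Ceph; a real backend would call its SDK.
type memStore struct {
	name  string
	blobs map[string][]byte
}

func (s *memStore) Put(digest string, data []byte) error {
	s.blobs[digest] = data
	fmt.Println("wrote", digest, "to", s.name)
	return nil
}

func (s *memStore) Get(digest string) ([]byte, error) {
	b, ok := s.blobs[digest]
	if !ok {
		return nil, errors.New("blob not found in " + s.name)
	}
	return b, nil
}

// doubleWriteStore writes every blob to both backends synchronously and
// serves reads from the nearby one (the Ceph in the same data center).
type doubleWriteStore struct {
	nearby, remote blobStore
}

func (d *doubleWriteStore) Put(digest string, data []byte) error {
	if err := d.nearby.Put(digest, data); err != nil {
		return err
	}
	return d.remote.Put(digest, data) // only acknowledge after both succeed
}

func (d *doubleWriteStore) Get(digest string) ([]byte, error) {
	return d.nearby.Get(digest)
}

func main() {
	store := &doubleWriteStore{
		nearby: &memStore{name: "ceph-local", blobs: map[string][]byte{}},
		remote: &memStore{name: "oss", blobs: map[string][]byte{}},
	}
	_ = store.Put("sha256:abc123", []byte("layer data"))
	if b, err := store.Get("sha256:abc123"); err == nil {
		fmt.Println("pulled", len(b), "bytes from the nearby Ceph")
	}
}
```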

As you probably know, as the Registry runs over time, more and more blobs accumulate in the image store.

We can clean up by ourselves or according to the official method, but the cleaning time is usually very long.

Therefore, a cleanup would result in an operational disruption of services that would be unacceptable to the average company.

We came up with an approach called bucket rotation. Since we generate a large number of images during CI (continuous integration), we divide the images into two quarterly retention cycles, i.e.:

  • In the first quarter, create a Bucket to store images.

  • In the second quarter, new images go into a second Bucket.

  • In the third quarter, clean out the first Bucket and use it to store new images.

And so on; in this way, we keep the images from growing without bound.
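The rotation itself reduces to picking a bucket by quarter. A minimal sketch, with hypothetical bucket names:

```go
package main

import (
	"fmt"
	"time"
)

// bucketFor alternates between two buckets by quarter: new images always go
// to the "current" bucket, and the other bucket can be wiped wholesale at
// the start of the next cycle instead of running a slow blob GC.
func bucketFor(t time.Time) string {
	quarter := (int(t.Month())-1)/3 + 1 // 1..4
	if quarter%2 == 1 {
		return "registry-bucket-a"
	}
	return "registry-bucket-b"
}

func main() {
	now := time.Now()
	fmt.Println("new CI images are pushed to", bucketFor(now))
}
```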

Our current eight Registries serve Docker objects numbering in the tens of thousands.

If there is a problem with Registry, we need to be able to locate the problem quickly, so monitoring Registry is essential.

At the same time, we also need to monitor the use of the system in real time so that we can make the necessary expansion.

In terms of implementation, we only added a small amount of our own code, keeping it decoupled from the other components.

As shown in the figure, we can track the Registry’s running state in real time by monitoring metrics such as upload and download speed, as well as the number of blobs.
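As an illustration of the kind of monitoring hook involved, here is a small sketch using the Prometheus Go client to expose a couple of registry metrics; the metric names are invented for the example and are not those of our real system.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical metrics for a registry sidecar: blob count and bytes uploaded.
var (
	blobCount = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "registry_blob_count",
		Help: "Number of blobs currently stored in the registry.",
	})
	uploadBytes = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "registry_upload_bytes_total",
		Help: "Total bytes uploaded to the registry.",
	})
)

func main() {
	prometheus.MustRegister(blobCount, uploadBytes)

	// Whatever collects the real numbers would update these values;
	// here we just set something so the endpoint has data.
	blobCount.Set(12345)
	uploadBytes.Add(1 << 20)

	// Expose /metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	_ = http.ListenAndServe(":9100", nil)
}
```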

Docker

We use Docker’s syslog driver to ship logs from user processes to ELK for display. However, when the log volume is high, entries in ELK can arrive out of order at the millisecond level.

This disorder makes it difficult for business departments to troubleshoot, so we modified the Docker code to give each log line a monotonically increasing sequence number.
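The sequence-number idea itself is simple; the sketch below shows it outside of Docker (our change was a patch to Docker’s log path), using an atomic counter so that lines sharing the same millisecond timestamp can still be re-ordered downstream.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

var logSeq uint64 // per-container monotonically increasing sequence number

// emit prepends a sequence number to every line, so even lines that share
// the same millisecond timestamp can be re-ordered correctly downstream.
func emit(line string) string {
	seq := atomic.AddUint64(&logSeq, 1)
	return fmt.Sprintf("%d %s %s", seq, time.Now().Format(time.RFC3339Nano), line)
}

func main() {
	fmt.Println(emit("request handled in 3ms"))
	fmt.Println(emit("request handled in 2ms"))
}
```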

For logs with high reliability requirements, we choose TCP mode. However, in TCP mode, if the socket of the service receiving the logs gets “jammed” because it is fully loaded, the container cannot continue, and running docker ps against it will simply hang. In addition, enabling TCP mode is of no help if the TCP server that receives the logs has not been started.

Thus, when we face real problems, there is still a gap between our requirements and what an open source project provides out of the box. No open source project is perfect when it lands in the enterprise.

Open source projects tend to provide standardized services, while our requirements are often customized to our specific scenarios.

Our enterprise software environment has grown bit by bit in its own context, so such differences are inevitable.

We can’t make open source software change for us; we can only modify our own programs to adapt the open source software to our current software state and environment, and then solve the specific problems one by one.

Here is some of our Docker monitoring, which helps us find problems, bugs, and pitfalls.

The Init process

In the process of containerizing the traditional business, we customized and improved the Init process for the container.

Through macro substitution, the Init process can adjust directories and environment variables managed by the image base layer, and it can also rewrite the base configuration files.

Migrated services often come with their own configuration items; converting them to read configuration from environment variables adds to the cost of containerization.

So the solution we provide is this: values can remain written into the configuration files, and we automatically replace the hard-coded configuration according to the container’s environment variable settings.

This Init-process approach is especially useful when the container has not yet started and the service’s IP address is not known in advance.
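A minimal sketch of this macro substitution, assuming ${VAR}-style placeholders in the baked-in configuration (the placeholder syntax and variable names here are illustrative, not our exact format):

```go
package main

import (
	"fmt"
	"os"
)

// renderConfig expands ${VAR}-style placeholders in a config template using
// the container's environment, so the image can ship one template and the
// Init process fills in the real values (IP, ports, paths) at start time.
func renderConfig(template string) string {
	return os.Expand(template, os.Getenv)
}

func main() {
	// In the real flow these would come from the Pod spec's env section.
	os.Setenv("DB_HOST", "10.0.5.21")
	os.Setenv("DB_PORT", "3306")

	template := "db.url=jdbc:mysql://${DB_HOST}:${DB_PORT}/orders\n"
	fmt.Print(renderConfig(template))
}
```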

For container management, we also have the container’s command continuously monitor process status to avoid zombie processes.

In addition, because of our company’s internal SOA system, when Docker issues the stop signal, the instance’s traffic needs to be removed across the company’s whole cluster.

Therefore, by intercepting that signal, we remove the service from the relevant traffic entry points.

At the same time, we pass the stop signal on to the application process so it can perform its various cleanup operations, which makes the transition of traditional businesses into containers smooth.
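The overall shape of such an Init process is sketched below: on SIGTERM it first pulls the instance out of traffic, then forwards the signal to the application and waits for it, which also prevents zombie processes. The traffic-removal call and the child command are placeholders, not our real interfaces.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// removeFromTraffic is a placeholder for telling the SOA gateway /
// service registry to stop sending requests to this instance.
func removeFromTraffic() {
	fmt.Println("removing instance from traffic entry points")
}

func main() {
	// Start the real application as a child of this init process
	// (sleep is just a stand-in for the business process).
	app := exec.Command("/bin/sleep", "3600")
	app.Stdout, app.Stderr = os.Stdout, os.Stderr
	if err := app.Start(); err != nil {
		fmt.Fprintln(os.Stderr, "failed to start app:", err)
		os.Exit(1)
	}

	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)

	go func() {
		<-stop
		// 1. Take the instance out of traffic first ...
		removeFromTraffic()
		// 2. ... then let the application do its own cleanup.
		_ = app.Process.Signal(syscall.SIGTERM)
	}()

	// Waiting on the child also prevents it from becoming a zombie.
	_ = app.Wait()
}
```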

Kubernetes resource management extension

Here is a case study of how we extend Kubernetes to meet business requirements.

As we all know, Kubernetes’ APIServer is a core component that allows all resources to be described and configured. This is similar to Linux’s “everything is a file” philosophy.

In a Kubernetes cluster, all operations, including monitoring, boil down to reading these resources to learn the state of a service and writing them to change that state.

Following this idea, we developed various APIServer components and interfaces one after another, turning services into microservices exposed through RESTful interfaces.

In the actual deployment, we did not use the Kubernetes proxy, which reduced the load on the APIServer.

Of course, if necessary, we can load and deploy on demand to ensure and enhance the overall scalability of the system.

Kubernetes APIServer

The diagram above illustrates the simple internals of the APIServer. All resources ultimately go into etcd.

At the same time, each resource has an InternalObject, which corresponds to a key/value entry in etcd’s storage, thereby ensuring the uniqueness of each object.

When an external request accesses the service, a conversion between the ExternalObject and the InternalObject is required.

For example, we might be developing a program using a different version of the API, and a requirement change occurs that requires us to add a field.

In fact, the field may already exist in the InternalObject, so it can be generated directly with some processing.

So what we really need to implement is the ExternalObject; we do not have to change the InternalObject.

In this way, we try to decouple the service and storage without modifying the storage, so as to better cope with the changes of the outside world.

Of course, this is also how objects are served inside the Kubernetes APIServer.

At the same time, we also grouped part of the API into an APIGroup. In the same APIGroup, different resources are decoupled and parallel to each other, and we can extend resources by adding interfaces.

The conversion between the two objects mentioned above, as well as the type registration, is done in Scheme in the figure.
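A stripped-down sketch of the internal/external split (with made-up types, not the real Kubernetes machinery): the internal object stored in etcd stays fixed, while each external API version gets its own shape and its own conversion function.

```go
package main

import "fmt"

// internalService is the single canonical form that is stored in etcd.
type internalService struct {
	Name     string
	Replicas int
	Owner    string // already present internally, even if older APIs never exposed it
}

// externalServiceV1 and externalServiceV2 are what two API versions return;
// v2 adds the Owner field without touching the internal object at all.
type externalServiceV1 struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas"`
}

type externalServiceV2 struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas"`
	Owner    string `json:"owner"`
}

func toV1(in internalService) externalServiceV1 {
	return externalServiceV1{Name: in.Name, Replicas: in.Replicas}
}

func toV2(in internalService) externalServiceV2 {
	return externalServiceV2{Name: in.Name, Replicas: in.Replicas, Owner: in.Owner}
}

func main() {
	in := internalService{Name: "order-api", Replicas: 3, Owner: "trade-team"}
	fmt.Printf("v1 view: %+v\n", toV1(in))
	fmt.Printf("v2 view: %+v\n", toV2(in))
}
```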

If you want to build your own APIServer, Google has an official solution for this and provides technical support. If you are interested, you can refer to the corresponding GitHub project URL at the bottom of the image.

In addition, Google provides tools to automatically generate code. Accordingly, during development we also built a small tool that generates all database-related operations directly from the object definition, which saved us a lot of time.

Kubernetes in production

Finally, I would like to share with you some of the problems we encountered while putting Kubernetes into production:

  • Pod restart behavior. A so-called Pod “restart” actually creates a new Pod and discards the old one, which is unacceptable in our enterprise environment.

  • Kubernetes cannot limit the size of a container’s file system. Sometimes business code writes log files that swell by more than 100 GB in a day, filling up the disk and triggering alarms.

    However, we found that there didn’t seem to be any properties in Kubernetes to limit the size of the file system, so we had to change it slightly.

  • DNS modification. You can use your own DNS or Kubernetes’ DNS. However, we took the time to customize and modify Kubernetes DNS before using it.

  • memory.kmem.slabinfo. At a certain point, we found that we could no longer create containers. We tried to free containers, but we were still blocked by memory exhaustion.

    After analysis, we found the cause: our kernel is 3.10, and kubelet had enabled the experimental cgroup kernel-memory accounting feature, which the 3.10.0-693 kernel of CentOS 7 does not support well, so kernel memory was exhausted.

Author: Li Jian

Editors: Chen Jun, Tao Jialong, Sun Shujuan


Jian Li, head of Ele.me’s Computing Power Delivery Department, has many years of experience in container system construction and drove the containerization of the Ele.me platform. She specializes in implementing container agility and standardization at the enterprise level, and is the development lead of several container-based cloud computing projects within the company. She loves open source and is keen on solving enterprise problems with open source projects.
