This article is based on the DBAplus community issue 122

Whether in the community or when talking with customers, I am always asked: when exactly should I use Docker? When should I use virtual machines? And if containers, which container platform? Obviously I will not give a one-size-fits-all answer. Instead, I want to analyze specific scenarios from a technical perspective: is the customer a large company or a small one, will it deploy a small cluster or a large one, does it prefer private cloud or public cloud, has it already purchased IaaS, is its IT operations capability strong or weak, does it need to mix physical machines, virtual machines, and containers, and is the system an ordinary one or a high-concurrency one? The technology selection should differ accordingly.

 

For example, if you are a small start-up company whose main business is not IT, you shouldn’t be putting a lot of effort into building your own large-scale, high-concurrency, high-performance container platform in your data center.

 

First of all, when should Docker be used?

 

 

On the left of the figure are the oft-cited advantages of containers, but the virtual machine camp can counter each of them one by one.

 

If you deploy a traditional application that starts slowly, has few processes, and is rarely updated, a virtual machine meets the requirements just fine.

 

  • Slow application startup: if the application itself takes 15 minutes to start, the fact that a container starts in seconds while a virtual machine can be optimized to start in ten-odd seconds on many platforms makes almost no difference

  • Large memory footprint: if the application routinely needs 32 GB or 64 GB of memory, a single machine cannot run many instances anyway

  • Rarely updated: if updates happen only every six months, VM images are perfectly capable of handling upgrade and rollback

  • Stateful: the application loses data when it stops; if you do not know what was lost, even second-level restart is useless, because restarting blindly without recovering the data does not fix anything

  • Few processes: with only two or three processes, they can simply be configured to point at each other; no service discovery is needed and configuration is easy

 

For such traditional applications, there is no need to spend effort on containerization: you pay the cost without enjoying the benefits.

 

 

At what point should you consider making a change?

 

Then the traditional business is suddenly hit by Internet-style demands: the application changes constantly, with an update every two or three days, and traffic surges. A payment system that used to handle in-person card swipes now has to handle online payments, and traffic grows N-fold.

 

There is no way around it. One word: split!

 

Once split, each submodule changes independently, with less mutual interference.

Once split, the traffic that used to be carried by one process is now carried by many processes together.

 

That’s why it’s called microservices.

 

 

In the microservice scenario there are many processes and updates are frequent, so you can easily end up with 100 processes and a new image every day.

 

At this point the virtual machine cries, because each VM image is simply too big.

 

So in microservice scenarios, you can start thinking about containers.

 

 

The virtual machine camp gets angry: "I don't need containers; split into microservices and deploy automatically with Ansible, and the result is the same."

 

There is no problem from a technical point of view. The problem arises from an organizational perspective.

 

In most companies there are far more developers than operations staff. Developers write code and do not concern themselves with the deployment environment, which is entirely the job of operations; and operations, for the sake of automation, write Ansible scripts to handle it.

 

However, with so many processes being split and merged, with updates so frequent and configurations changing, the Ansible scripts have to be modified constantly, day in and day out, which exhausts operations.

 

With such a heavy workload, operations is prone to errors even with automated scripts. This is where containers become a great tool.

 

From the technical point of view, containers let most of the internal configuration be baked into the image. More importantly, from the process point of view, the environment configuration is pushed forward, onto development: after finishing the code, developers have to think about how the environment is deployed, rather than simply throwing the result over the wall to operations.

 

The advantage is that although overall there are many processes, many configuration changes, and frequent updates, the amount falling on the development team of any single module is small: the 5-10 people dedicated to that module maintain its configuration and updates, which is much less error-prone.

 

If all of this work is handed to a small number of operations staff, the information handoff will lead to inconsistent environment configuration, and the deployment workload per person becomes much larger.

 

Containers are a great tool here: roughly 5% more work on each developer's side saves operations 200% of their workload, with fewer errors.

 

However, development is now doing what used to be operations' job. Is the development manager willing? Will developers complain to the operations manager?

 

This is no longer a technical problem but an organizational one. It is, in fact, DevOps, which no longer draws a hard line between development and operations.

 

So microservices, DevOps, and containers reinforce one another and are inseparable.

 

Without microservices there is no need for containers at all; virtual machines are enough. Without DevOps, with one deployment a year, slow communication between development and operations hardly matters.

 

So the essence of containers is image-based migration across environments.

 

The image is the fundamental invention of containers, a standard for packaging and running. The rest, namespaces and cgroups, have existed for a long time. That is the technical side.

 

On the process side, the image is a great tool for DevOps.

 

Containers exist for migration across environments, and the first migration scenario is between development, test, and production environments. If you do not need to migrate, or migrate only rarely, a virtual machine image is fine; but if you migrate constantly, a virtual machine image of several hundred gigabytes is far too large.

 

The second migration scenario is cross-cloud migration. Migrating VMs across public clouds, across regions, or between two OpenStack deployments is very troublesome or even impossible, because public clouds generally do not offer VM image download and upload, and even when they do, the images are so large that transferring them takes forever.

 

Therefore cross-cloud and hybrid-cloud scenarios are also a good fit for containers, which also solves the problem of a private cloud alone not having enough resources to carry the traffic.

 

That, I believe, is the essence of containers: understand it and you know the right way to use them, even if nobody follows it exactly from the very beginning.

 

Mode 1: Public cloud virtual machines

 

Suitable scenario: a start-up company with no information security concerns

 

If you are a startup with few staff, limited IT operations capability, few systems to deploy, and a limited budget for IT, you should deploy on virtual machines in a public cloud, which solves the following problems for you:

 

  • Management of the underlying IT resources is entrusted to the public cloud platform, so the company's own operations staff only need basic Linux skills

  • With a small number of systems, say fewer than 10 VMs, a release is just replacing a WAR file and restarting Tomcat; with 10 to 20 VMs, Ansible scripts handle it well

  • Public cloud charges by the hour, so you can start cheaply and quickly request a large number of virtual machines when the business expands rapidly

 

The information security concern mentioned here is really just a psychological one. Public clouds have a large number of security mechanisms to ensure isolation between tenants; used properly, the security of a public cloud is generally better than that of the average self-built data center.

 

You can read "What Are Customers Thinking When They Say They Want Security?" for a fairly complete, end-to-end treatment of this topic.

 

Here is a picture to illustrate the security of the public cloud.

 

 

To support its own high-concurrency services, a public cloud has accumulated stronger security protection capabilities and more security experience:

 

  • Multi-line BGP, redundant extranet lines

  • High-throughput DDoS extranet defense

  • Better firewalls, intrusion detection, WAF

  • More refined traffic-scrubbing rules

 

Public cloud has launched more secure, more reliable and more available PaaS services to support its high-concurrency services:

 

Database:

  • High availability: No data is lost during the active/standby switchover

  • High reliability: active-active within the same city, plus remote backups

  • Security: access control, IP whitelist

 

Object storage:

  • High reliability: large capacity, three backups, remote synchronization

  • Security: access control, anti-hotlinking

 

To support its high-concurrency business, a public cloud has also built a more complete monitoring and operations system, with mature processes and experience:

 

  • A complete monitoring system ensures that faults are quickly located and resolved during big promotion events

  • Safeguarding big promotions trains an experienced operations team

  • Business-level data is kept confidential even from operations staff, guaranteed by process

 

In order to ensure the security of its own business, the public cloud continues to upgrade the cloud platform:

 

  • Stronger DDoS protection

  • Better and better firewall rules

  • Latest cloud platform security features and mechanisms

  • Constantly updated virtual machine and container base images with vulnerability fixes

  • A constantly updated virus library

 

This is not the focus of today's talk, so I will just leave a few diagrams here for your reference.

 

Mode 2: No IaaS, bare containers

 

Suitable scenario: a startup company without IaaS but with information security concerns

 

Even so, there are startups that, whether out of psychological comfort or out of compliance requirements, worry a great deal about information security and want to deploy in their own server room.

 

However, a startup generally cannot run IaaS in its own server room: an IaaS platform is hard to operate and even harder to tune, and a team of fewer than 50 people cannot realistically handle it. So before containers come in, applications are usually deployed straight onto physical machines. With very few machines and, say, 5 to 10 applications, manual deployment or simple scripts are fine; but once you reach about 20 applications, manual deployment and simple scripts become very painful:

 

  • Operations staff are few relative to the number of applications

  • Many applications are deployed on the same physical machine, so configuration conflicts, port conflicts, and mutual wiring have to be tracked by operations in an Excel sheet, which is error-prone

  • The environment on the physical machines gets messed up by scripts and Ansible, making consistency hard to guarantee, and reinstalling a physical machine is even more trouble

  • Different applications depend on different operating systems and underlying packages, which can vary greatly

 

In this situation containers can help. Even if all you do is write a script that starts each process with docker run, you already gain the following (a sketch follows the list):

 

  • Isolation of configuration and ports, reducing conflicts

  • Container-based deployment enables environment consistency, clean installation and removal

  • Different operating systems and underlying packages can be handled with container images
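
Below is a minimal sketch of the "docker run as a deployment script" idea, using the Docker SDK for Python. The image names, ports, and memory limits are illustrative assumptions, not taken from the article; a local Docker daemon and the docker Python package are assumed.

```python
# A sketch of starting each application as its own container, each with its own
# image, its own in-container port, and a distinct host port.
import docker

client = docker.from_env()  # talk to the local Docker daemon

apps = {
    "order-service":   {"image": "registry.example.com/order:1.0",   "host_port": 18080},
    "payment-service": {"image": "registry.example.com/payment:1.0", "host_port": 18081},
}

for name, spec in apps.items():
    client.containers.run(
        spec["image"],
        name=name,
        detach=True,
        ports={"8080/tcp": spec["host_port"]},   # port isolation per application
        mem_limit="1g",                          # keep one app from starving the host
        restart_policy={"Name": "always"},       # come back after a host reboot
    )
```

Removing an application is then just removing its container, which keeps the physical machine clean.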

 

At this stage the simplest approach is to treat the container as a virtual machine: start the container first, then download the WAR file and so on inside it. Of course you can go a step further and bake the WAR file and its configuration directly into the image, but that requires a continuous integration process, so it is no longer purely an operations matter; development has to be involved as well. A build sketch follows.
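
Here is a hedged sketch of that "go a step further" approach: baking the WAR file and its configuration into the image during CI, again with the Docker SDK for Python. The paths, Dockerfile content, image tag, and registry are hypothetical.

```python
# Build an image that already contains the WAR and its config, then push it.
import os
import docker

BUILD_DIR = "./build"          # assumed to contain order.war and server.xml from the CI job
DOCKERFILE = """\
FROM tomcat:8.5-jre8
COPY server.xml /usr/local/tomcat/conf/server.xml
COPY order.war  /usr/local/tomcat/webapps/ROOT.war
"""

with open(os.path.join(BUILD_DIR, "Dockerfile"), "w") as f:
    f.write(DOCKERFILE)

client = docker.from_env()
image, _logs = client.images.build(path=BUILD_DIR, tag="registry.example.com/order:1.1")
client.images.push("registry.example.com/order", tag="1.1")
```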

 

At this stage, the network can use bridging to flatten the container network with the physical network.

 

 

The advantage of this approach is that accessing a container is the same as accessing a physical machine, containers and physical machines can reach each other easily, and it remains compatible with the applications originally deployed on physical machines.

 

Of course, bridge performance is mediocre; if performance requirements are high, an SR-IOV virtual NIC can be passed into the container.

 

 

Mode 3: With IaaS, bare containers

 

Suitable scenario: innovation projects, introducing a DevOps process

 

Some larger companies have already purchased IaaS and now have innovation projects to deploy. In this situation, plain virtual machines can already meet the requirements, and since the company is able to operate IaaS, its IT capability is strong and it usually already uses deployment tools such as Ansible.

 

In this case there is less impetus to use containers, but containers still bring one definite benefit: DevOps.

 

Innovation projects iterate quickly, and if there are many of them the pressure on operations is high. The difference between the bare containers here and the bare containers of Mode 2 is that the container is no longer used as a virtual machine but as a deliverable. Although containerization improves the overall operations process only to a limited extent, the key point is that development writes the Dockerfile. This matters because it means the runtime environment configuration is pushed forward to development rather than handed straight to operations; as mentioned above, roughly 5% more development work removes a great deal of operations work. Atomic rollback of the container environment also reduces downtime and keeps development, test, and production environments consistent.

 

 

Mode 4: Use Docker Swarm Mode

 

Suitable scenario: growing companies, medium-sized clusters

 

Once the cluster grows beyond about 50 nodes, bare containers become very uncomfortable, because networking, storage, orchestration, service discovery, and so on all rely on your own scripts or Ansible. It is time to introduce a container platform.

 

For a cluster of this size, Docker Swarm Mode is worth considering, because it is simple (see the sketch after this list):

 

  • No ZooKeeper or etcd cluster needs to be maintained

  • The command line is the same as Docker, so it’s easy to use

  • Service discovery and DNS are built in

  • Docker Overlay Network is built in
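
The sketch below shows how small that setup surface is, again via the Docker SDK for Python. The advertise address, network name, image, and replica count are illustrative assumptions.

```python
# Turn a node into a Swarm manager and run a replicated service on an overlay network.
import docker

client = docker.from_env()

# One call makes this node a Swarm manager; no external ZooKeeper/etcd to run.
client.swarm.init(advertise_addr="192.168.1.10")

# A user-defined overlay network gives the built-in DNS something to work on.
client.networks.create("app-net", driver="overlay")

# A "service" schedules replicas across the cluster; other services on app-net
# can reach it simply by its name, "web".
client.services.create(
    "nginx:alpine",
    name="web",
    networks=["app-net"],
    mode=docker.types.ServiceMode("replicated", replicas=3),
)
```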

 

Anyway, Docker takes care of everything for you, you don’t have to worry too much about the details, and it’s easy to get the cluster up and running.

 

In addition, with the familiar docker command line you can use containers on the cluster just as you would on a single machine, and a container can still be treated as a virtual machine when needed, which is quite friendly to medium-sized clusters and to operations staff.

 

Of course, having so much built in has its drawbacks: it is hard to customize, hard to debug, and hard to intervene in. If you find that one part does not perform well, you have to change and recompile the whole thing, and merging your branch whenever the community updates is a headache. When something goes wrong, the manager is doing many things internally, so you cannot tell which step failed; it simply hangs without returning, and restarting the whole manager has a large impact.

 

 

Mode 5: Use Marathon and Mesos

 

Suitable scenario: clusters with hundreds of nodes

 

When the cluster grows larger, to hundreds of nodes, many people are no longer willing to use Docker Swarm Mode. Many choose neither the full DC/OS nor Kubernetes, but simply Marathon plus Mesos.

 

Because Mesos is an excellent scheduler, its two-tier scheduling mechanism can make clusters much larger.

 

The Mesos scheduling process is shown below:

 

 

Mesos consists of Frameworks, the Master, Agents, Executors, and Tasks. There are two levels of scheduling. The first level is in the Master, where the Allocator distributes resources fairly among the Frameworks. The second level is in each Framework, where the Framework's own Scheduler assigns those resources to tasks according to its own rules.

 

The advantage of Mesos is that the first-level scheduler hands whole nodes over to a Framework, so when the Framework's own scheduler does its second-level scheduling it faces a much smaller cluster. If there are multiple Frameworks, for example multiple Marathons, they can schedule in parallel without conflicts. A conceptual sketch follows.
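
This is a toy, pure-Python illustration of that two-level idea, not the Mesos API: the allocator hands whole nodes to each framework, so each framework's own scheduler only works on a small, private slice of the cluster.

```python
# Conceptual sketch of Mesos-style two-level scheduling.
from typing import Dict, List

class Allocator:                     # level 1: lives in the Mesos master
    def __init__(self, nodes: List[str]):
        self.free_nodes = list(nodes)

    def offer(self, count: int) -> List[str]:
        """Hand `count` whole nodes to a framework; they are no longer offered to others."""
        offered, self.free_nodes = self.free_nodes[:count], self.free_nodes[count:]
        return offered

class FrameworkScheduler:            # level 2: lives in each framework (e.g. a Marathon)
    def __init__(self, name: str, offered_nodes: List[str]):
        self.name = name
        self.nodes = offered_nodes

    def place(self, tasks: List[str]) -> Dict[str, str]:
        # Only looks at its own offered nodes, so it never conflicts with another framework.
        return {task: self.nodes[i % len(self.nodes)] for i, task in enumerate(tasks)}

allocator = Allocator([f"node-{i}" for i in range(10)])
marathon_a = FrameworkScheduler("marathon-a", allocator.offer(4))
marathon_b = FrameworkScheduler("marathon-b", allocator.offer(4))

# The two "Marathons" can now schedule in parallel without stepping on each other.
print(marathon_a.place(["web-1", "web-2", "web-3"]))
print(marathon_b.place(["job-1", "job-2"]))
```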

 

The detailed scheduling mechanism is quite complex; see the article "Mesos Two-Level Scheduling: Answer These Five Questions!".

 

Mesos' architecture is also relatively loosely coupled, which allows a lot of customization, so operations teams can develop their own modules to suit their needs. For details on how to customize Mesos tasks, see the article "Several Ways to Customize Mesos Tasks".

 

This is why many good companies use Marathon and Mesos.

 

For example, iQiyi, Qunar, Ctrip, and Dangdang have all chosen Mesos. To be precise, if you take part in their community activities you will find that in many cases what they use is Marathon and Mesos, not the whole DC/OS. And since Marathon plus Mesos alone often cannot solve every problem, these IT-savvy Internet companies have done a great deal of their own customization, adding peripheral modules around Marathon and Mesos.

 

Mode 6: Use open-source Kubernetes

 

Suitable scenario: clusters of around 1,000 nodes, little customization

 

Kubernetes is divided into finer-grained modules, is more modular and richer in functionality than bare Marathon and Mesos, and the loose coupling between modules makes it easy to customize.

 

 

Kubernetes' data structures are also designed at a fine granularity, very much in line with the microservice way of thinking. For example, going from Container to Pod to Deployment to Service, the originally simple act of running a container is wrapped in many layers, each with its own role, and each layer can be split and recombined. This brings one big disadvantage: a high learning curve. Just to run a container there are many concepts and orchestration rules to learn. A minimal example of these layers follows.
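
The sketch below shows the Container -> Pod -> Deployment layering using the official Kubernetes Python client with a plain dict manifest (the Service layer appears in the next sketch). The image, labels, and namespace are illustrative assumptions; a reachable cluster and a working kubeconfig are assumed.

```python
# One container, wrapped in a Pod template, wrapped in a Deployment.
from kubernetes import client, config

config.load_kube_config()

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "order"},
    "spec": {
        "replicas": 2,                                  # Deployment layer: how many Pods
        "selector": {"matchLabels": {"app": "order"}},
        "template": {                                   # Pod layer
            "metadata": {"labels": {"app": "order"}},
            "spec": {
                "containers": [{                        # Container layer
                    "name": "order",
                    "image": "registry.example.com/order:1.1",
                    "ports": [{"containerPort": 8080}],
                }],
            },
        },
    },
}

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```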

 

But as the business and the scenarios grow more complex, you come to appreciate the elegance of Kubernetes' fine-grained design: it lets you combine the pieces flexibly according to your needs, instead of being stuck because a component is packaged as a black box. Take Service as an example. Besides providing discovery and mutual access between internal services, the headless Service is a flexible design that gives many applications, such as games, a good way to keep static long-lived connections. It also helps when accessing external services such as databases and caches: a headless Service acts as a DNS name for them, which makes configuring external dependencies much easier. For many large applications with complex configuration, the hard part is not the wiring between internal services, which Spring Cloud or Dubbo can solve, but the configuration of external services, which differ from one environment to another. A sketch of such a headless Service follows.
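
Here is a hedged sketch of that pattern: a selector-less headless Service plus a manually maintained Endpoints object gives an external MySQL a stable in-cluster DNS name. The IP address and port are illustrative assumptions.

```python
# Applications inside the cluster can now connect to host "mysql" in every environment;
# only the Endpoints object differs between environments.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "mysql"},
    "spec": {
        "clusterIP": "None",              # headless: DNS resolves straight to the endpoints
        "ports": [{"port": 3306}],
        # no selector: endpoints are supplied by hand below
    },
}

endpoints = {
    "apiVersion": "v1",
    "kind": "Endpoints",
    "metadata": {"name": "mysql"},        # must match the Service name
    "subsets": [{
        "addresses": [{"ip": "10.10.0.5"}],   # the external database, outside the cluster
        "ports": [{"port": 3306}],
    }],
}

core.create_namespaced_service(namespace="default", body=service)
core.create_namespaced_endpoints(namespace="default", body=endpoints)
```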

 

Likewise, unified monitoring with cAdvisor and unified configuration with ConfigMap are necessary parts of building microservices.

 

However, Kubernetes currently also has a bottleneck: the supported cluster size is not huge, officially several thousand nodes, so very large clusters still require strong IT capability and customization. Mode 7 describes some of what we have done at NetEase Cloud in this area. For a medium-sized cluster, though, open-source Kubernetes is quite sufficient.

 

The Kubernetes community is also very active, so a company using open-source Kubernetes can quickly find help, and new features and bug fixes keep arriving.

 

Mode 7: In-depth mastery of Kubernetes

 

Suitable scenario: clusters of ten thousand nodes, strong IT capability

 

As Kubernetes is used more and more, large companies can customize it to some extent to support clusters of ten thousand nodes or even more. Of course this requires strong IT capability; NetEase has quite a lot of practice here.

 

Looking at cluster scale from the APIServer

 

As the cluster size increases, the pressure on Apiserver increases.

 

 

This is because all the other components, such as the Controller, the Scheduler, clients, and the kubelet, watch the apiserver to see changes in etcd and act on them. A minimal watch example follows.
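
Here is a minimal illustration of that list-watch pattern, using the official Kubernetes Python client; a working kubeconfig is assumed.

```python
# List current Pods, then watch for incremental changes, the way every
# controller, scheduler, and kubelet does against the apiserver.
from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

w = watch.Watch()
# Each such long-lived watch is one more stream the apiserver has to fan out,
# which is why apiserver (and ultimately etcd) pressure grows with cluster size.
for event in w.stream(v1.list_pod_for_all_namespaces, timeout_seconds=60):
    pod = event["object"]
    print(event["type"], pod.metadata.namespace, pod.metadata.name)
```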

 

Many people link containers and microservices together, and Kubernetes' own module design is indeed very microservice-like: each process does only its own job, and the processes are loosely coupled through the apiserver.

 

Apiserver, much like the API gateway in microservices, is a stateless service that scales well.

 

For list-watch, the apiserver uses a watch cache to relieve the pressure, but the ultimate bottleneck is etcd.

 

Initially etcd2 was used, and its watch could only deliver one event at a time, which created a lot of pressure. To keep using etcd2, multiple etcd2 clusters are needed, with different tenants assigned to different clusters to share the load.

 

Migrating to etcd3, which can push events in batches, helps, but moving from etcd2 to etcd3 takes some migration effort.

 

Optimizing the Scheduler to solve the parallel scheduling problem

 

Scheduling over a large resource pool is also a big problem, because a given resource can only be used by one task. If schedulers run in parallel, two of them may both see a resource as free and schedule two tasks onto the same machine at the same time, causing a conflict.

 

Our solution borrows from the Mesos mechanism: since tenants are isolated and different tenants do not share virtual machines, scheduling can be parallelized by tenant. Even when different tenants schedule in parallel there is no conflict, and each tenant schedules only over its own limited set of nodes rather than tens of thousands of nodes, which speeds up scheduling enormously.

 

 

In addition, the predicate step is adjusted to pre-filter nodes with no free resources, further reducing the scheduling scope.

 

Optimizing the Controller to speed up the handling of new tasks

 

 

Kubernetes uses the event-based programming model commonly used for microservices.

 

When an incremental event arrives, the controller performs the corresponding add, delete, or update processing.

 

One disadvantage of the event-based model, however, is that everything is driven by deltas: after a while you can no longer be sure you are fully in sync, so a periodic Resync is needed to restore a full baseline before incremental event processing continues.

 

The problem is that when a Resync happens just as a new container is being created, all the events sit in the same queue, which slows down the creation of the new container.

 

By keeping multiple queues and prioritizing Add over Update over Delete over Sync, responsiveness for new tasks is preserved. A conceptual sketch follows.
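
The following is a pure-Python conceptual sketch of that queue separation, not the actual controller code: Add events are drained before Update, Update before Delete, and the periodic Resync flood is handled last, so a Resync no longer delays a brand-new container.

```python
# Separate queues per event kind, served in strict priority order.
from collections import deque

PRIORITY = ["ADD", "UPDATE", "DELETE", "SYNC"]
queues = {kind: deque() for kind in PRIORITY}

def enqueue(kind: str, obj: str) -> None:
    queues[kind].append(obj)

def next_event():
    for kind in PRIORITY:           # always serve higher-priority queues first
        if queues[kind]:
            return kind, queues[kind].popleft()
    return None

# A Resync dumps many SYNC events, then a new Pod arrives as an ADD:
for i in range(1000):
    enqueue("SYNC", f"existing-pod-{i}")
enqueue("ADD", "new-pod")

print(next_event())   # ('ADD', 'new-pod') -- handled before the Resync backlog
```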

 

Mode 8: In-depth mastery of DC/OS

 

Suitable scenario: clusters of ten thousand nodes, strong IT capability

 

As mentioned earlier, Mesos supports relatively large clusters thanks to its unique scheduling mechanism, but most companies that use Mesos do not use DC/OS. Instead, they run Marathon and Mesos bare, plus some components they develop themselves.

 

When the cluster is so large that a single Marathon cannot keep up, Mesos' Framework mechanism can be used to give different tenants their own separate Marathon instances.

 

 

 

Later, DC/OS added many components on top of basic Marathon and Mesos, as shown in the picture, and the set is now quite rich: the DC/OS CLI (comparable to kubectl), the Admin Router API gateway (comparable to the apiserver), Minuteman for service discovery (comparable to kube-proxy), Pod support, CNI plugin support, storage plugin support, and so on. It has become very similar to Kubernetes.

 

Many companies use Marathon and Mesos but have not gone further into DC/OS, probably because, unlike the core component Mesos, which has been proven at large scale, these peripheral components are newer and their stability is a concern. It takes a fairly long learning curve to understand and control these new components well enough to dare to put them into production.

 

So from this point of view, although Mesos itself is unquestionably stable at large scale, DC/OS as a whole is roughly on par with Kubernetes in functionality and stability: both require users with strong IT capability who are familiar with the various open-source modules and can even make code changes and bug fixes before running them in large clusters.

 

Mode 9: Deploying big data, Kubernetes vs. Mesos

 

Another advantage of Mesos is that you can build a big data platform on it by developing a Framework; Spark, for example, has a Mesos-based deployment mode.

 

 

Mesos-based Spark comes in two forms, coarse-grained and fine-grained.

 

  • Coarse-grained mode: all the resources the application needs are requested up front, before it runs, and are held for the entire run; they are reclaimed only when the application finishes. Coarse granularity wastes resources.

 

  • Fine-grained mode: when the application starts, the executors are started first, but each executor takes only the resources it needs for itself, without reserving anything for the tasks it will run later. Mesos then dynamically allocates resources to the executors; each allocation lets one more task run, and the resources are released as soon as that task finishes. The downside of fine granularity is performance.

 

The fine-grained mode is the one that really exercises Mesos' dynamic resource scheduling, but it has significant performance problems. Because of https://issues.apache.org/jira/browse/SPARK-11857, this mode was unfortunately deprecated in Spark 2.0.0. A small configuration sketch follows.
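
Below is a sketch of where the coarse/fine-grained switch lived in Spark-on-Mesos configuration, written with PySpark. The Mesos master URL and resource numbers are illustrative assumptions; since fine-grained mode (spark.mesos.coarse = false) was deprecated in Spark 2.0.0, recent versions effectively only offer the coarse-grained behaviour described above.

```python
# Submit a job to a Mesos-managed cluster in coarse-grained mode.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("mesos-demo")
    .master("mesos://zk://zk1:2181,zk2:2181/mesos")   # Mesos master via ZooKeeper (assumed address)
    .config("spark.mesos.coarse", "true")             # hold executors and their resources for the whole job
    .config("spark.cores.max", "8")                   # cap what the job reserves from the cluster
    .config("spark.executor.memory", "2g")
    .getOrCreate()
)

print(spark.range(1000).count())
spark.stop()
```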

 

Kubernetes, unlike Mesos, does not get involved in how the big data job runs: the container Kubernetes starts exists only as a resource reservation, and allocating work within that reservation is left to the big data platform itself. In terms of utilization this is equivalent to the coarse-grained mode and is correspondingly less efficient.

 

If you deploy a big data platform on containers, it is advisable to containerize only the compute part, such as MapReduce or Spark, and to deploy HDFS separately.

 

Mode 10: Mixed deployment of containers and virtualization

 

Suitable scenario: large companies, gradual containerization

 

Many large but non-Internet companies have to treat containers with caution and therefore containerize gradually, so they end up in a mixed state of an IaaS platform, virtual machines, and containers, and this state can last quite a long time.

 

In this case, you are advised to use containers inside virtual machines.

 

Flannel and Calico are designed for containers on bare metal, and they handle only container-to-container communication.

 

 

Once there is an IaaS layer, there is the problem of secondary virtualization of the network.

 

The virtual machines are already connected by a virtual network, for example VXLAN; running Flannel or Calico on top of that is virtualization on top of virtualization, which greatly reduces network performance.

 

Moreover, with Flannel or Calico, when an application in a container and an application on a VM need to talk to each other, traffic has to leave the container platform through a node port and go through NAT or an external load balancer. In practice it is impossible to containerize everything at once; some applications in containers and some still on virtual machines is the common situation. Going through NAT or an external load balancer is intrusive to the applications, because they can no longer call each other in a uniform way, especially when they rely on service discovery mechanisms such as Dubbo or Spring Cloud.

 

 

NetEase Cloud therefore developed its own NetEaseController. When it observes a Pod being created, it calls the IaaS API to create a virtual NIC at the IaaS layer, and the NetEase CNI plugin then adds this virtual NIC to the container running inside the VM. The technique for adding it is the setns mechanism mentioned earlier. A conceptual sketch follows.
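
This is a conceptual sketch (not NetEase's actual CNI plugin) of that final step: moving an IaaS-created virtual NIC into a container's network namespace with setns-style tooling. The container name, device name, and IP address are illustrative assumptions; it must run as root on the VM hosting the container.

```python
# Move a host-visible virtual NIC into a running container's network namespace.
import subprocess

def sh(cmd: str) -> str:
    return subprocess.check_output(cmd, shell=True, text=True).strip()

container = "order-service"
device = "eth1"                      # the virtual NIC the IaaS layer attached to this VM
address = "10.173.32.5/24"           # an address on the same OVS subnet as the VMs

# 1. Find the container's init process; its network namespace is /proc/<pid>/ns/net.
pid = sh(f"docker inspect -f '{{{{.State.Pid}}}}' {container}")

# 2. Move the NIC into that namespace, then configure it from inside.
sh(f"ip link set {device} netns {pid}")
sh(f"nsenter -t {pid} -n ip addr add {address} dev {device}")
sh(f"nsenter -t {pid} -n ip link set {device} up")
```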

 

 

As the figure shows, the container's NIC is connected directly to the OVS of the virtual private network, so it sits on a flat layer-2 network with the virtual machines. From the perspective of OVS, containers and virtual machines are on the same network.

 

This way, on the one hand there is no second layer of network virtualization, only the OVS layer; on the other hand, flattening the network between containers and VMs means some applications can run in containers and some in VMs without any intrusion into the applications, so containerization can proceed gradually.

 

 

There is an OpenStack project called Kuryr that does this quite well; if you run pure open-source OpenStack together with Kubernetes, you can try integrating it.

 

M.qlchat.com/topic/detai…

Password: 927