Author | Zhu Qi

Original: https://mp.weixin.qq.com/s/DV0sWnluKe5qCYhvZ8m38Q



1 Overview

1.1 Characteristics of the Shenlong architecture

The official Alibaba Cloud documentation describes the Shenlong architecture as follows:

It retains the resource elasticity of ordinary cloud servers and, by means of nested virtualization technology, also retains the experience of a physical machine.


1.2 Difficulties in understanding

Having the resource elasticity of a cloud server while retaining the experience of a physical machine naturally raises a question for anyone who wants to understand the Shenlong architecture in depth: is it virtual or real? If it blends the two, what does that blending mean, and by what means is it achieved?


1.3 Issues addressed in this article

Given the characteristics above and the difficulty of understanding them, this article analyzes the Shenlong architecture in detail and explains how it achieves the resource elasticity of a cloud server while retaining the experience of a physical machine.


2 Why the Shenlong architecture was invented

2.1 Using brick moving to illustrate the characteristics of virtualization

The technology of turning a physical machine into virtual machines is called virtualization. Suppose I am renovating my home and have 1000 bricks to move, and my neighbor, who is also renovating, has 1000 bricks to move as well. Each of us has hired 50 porters. When the workers arrive, they find that the neighbor is asleep, so his 50 workers wait for him to wake up before moving any bricks, while my 50 workers start moving bricks at my house right away, as shown in the figure below:

It happened that both groups of workers came from the same company. The contractor came to have a look, saw the neighbor's workers standing idle, and found this very inefficient. He decided that since the neighbor's workers were free for the moment, they should come and help me move bricks. He checked with me first: my cost would not increase, but I would get 50 more workers. Naturally I was very happy to have 50 extra workers. So the neighbor's workers came over and helped me move bricks, as shown in the picture below. We call these 100 workers a compute node:

The contractor had one concern: he had to leave for another construction site immediately. With 100 workers now moving bricks for my house, progress was very fast. But what if my neighbor woke up and wanted to start moving bricks? The workers temporarily moving bricks for my house would have to be returned to him, at least 50 of them.

So worker A left the brick moving to watch the neighbor in case he suddenly woke up, which left 99 workers moving bricks for my house. This person is responsible for keeping an eye on the sleeping neighbor and returning the workers to him later; we call him the management node.

When the neighbor woke up, worker A immediately sent 50 workers from my house to the neighbor's to start moving bricks. At the same time he talked it over with me: because my house had enjoyed twice the labor, only 50 of my 1000 bricks were left, while the neighbor still had all 1000 of his. So, in addition to the 50 workers the neighbor had hired, 45 of my own workers would go to carry bricks for the neighbor, leaving only 5 workers at my house. I readily agreed, so the allocation of brick movers between the two houses changed again, as shown in the figure:

The essence of the whole process is elastic computing, and its prerequisite is virtualization. Without virtualization, it is as if my neighbor and I had hired workers from two different companies: nobody would plan the two teams as a whole and decide how many workers move bricks at each house, so even while the neighbor slept, his idle workers could not come and help me. Flexible allocation of brick movers requires treating the workers hired by both households as a single pool. The benefit to users is that, although my neighbor and I each have 1000 bricks to move, we move them at different times: while I am moving bricks he is asleep, and by the time he wakes up and needs to move bricks, mine are almost finished. The labor of all 100 workers is used to the fullest across different periods.
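
The effect of pooling can be sketched with a toy simulation. This is only an illustration under stated assumptions, not a model of any real scheduler: every worker moves one brick per time step, the neighbor wakes up at a chosen step, the pool is split 50/50 whenever both households have work (the story uses a 5/94 split instead), and the worker lost to management is ignored. The function name and parameters are hypothetical.

```python
def finish_times(bricks_mine, bricks_neighbor, neighbor_wakes_at,
                 pooled, workers_each=50):
    """Return (my_finish, neighbor_finish) in time steps.

    Hypothetical model: every worker moves one brick per time step,
    and the neighbor has no work until he wakes up.
    """
    total_workers = 2 * workers_each
    mine, theirs = bricks_mine, bricks_neighbor
    my_finish = neighbor_finish = None
    t = 0
    while my_finish is None or neighbor_finish is None:
        i_need = mine > 0
        he_needs = t >= neighbor_wakes_at and theirs > 0
        if pooled and i_need != he_needs:
            # Exactly one household has work: it gets the whole pool.
            to_me = total_workers if i_need else 0
            to_him = total_workers if he_needs else 0
        else:
            # Isolated teams, or both households busy: each uses its own 50.
            to_me = workers_each if i_need else 0
            to_him = workers_each if he_needs else 0
        mine = max(0, mine - to_me)
        theirs = max(0, theirs - to_him)
        t += 1
        if my_finish is None and mine == 0:
            my_finish = t
        if neighbor_finish is None and theirs == 0:
            neighbor_finish = t
    return my_finish, neighbor_finish


# Neighbor sleeps for the first 10 time steps.
print(finish_times(1000, 1000, 10, pooled=False))  # two separate companies
print(finish_times(1000, 1000, 10, pooled=True))   # one shared pool
```

With these assumed numbers, both households finish earlier in the pooled case even though the total number of workers is the same.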


2.2 Bottlenecks of virtualization technology

As the brick-moving example shows, worker A no longer moves bricks because he coordinates the allocation of workers between my house and my neighbor's. In other words, one of the 100 workers has been reassigned to management, and 99 actually move bricks. Under the original arrangement, my house hired 50 workers and the neighbor hired 50, 100 in total, so one fewer worker is now moving bricks. But because my neighbor and I moved bricks at different times, each of us enjoyed the labor of more than 50 workers (in fact my house had 99, and the neighbor had 94 after he woke up). Our needs were met, so we did not much care that 1 of the 100 workers had become the manager of our two teams.

The hidden danger is this: once my bricks are all moved, the neighbor's brick movers rise to 99. If he then finds he needs to hurry and asks for all 100 workers to move bricks, my neighbor and I will both discover that the workforce is one short because someone is doing management work. Between us we paid for 100 workers, but we can only ever use 99.

This one manager is in fact an inherent bottleneck of the whole system: as long as virtualization and elastic computing are adopted, one of every 100 workers must serve as the manager, and only 99 can move bricks.

Mapped onto cloud computing: once a physical server is virtualized, a management node must be configured, so the computing power the server can provide is discounted from its original level. As a result, a cloud server produced by virtualizing a physical server inevitably has lower computing performance than the physical server itself, although users may not notice because of the elastic computing capability of cloud server clusters.

This bottleneck has long existed among cloud service providers and seemed inevitable: no one saw a way to eliminate the loss of total computing power caused by management nodes, so no provider discussed the problem further. Alibaba Cloud's Shenlong architecture was the first to tackle this bottleneck. Its goal: if 100 workers are hired to move bricks, all 100 should move bricks, while there is still a means to manage and control how many workers move bricks at my house and at my neighbor's at different times.


3 The birth of Shenlong

3.1 Back to our brick-moving problem

Chapter 2 pointed out that as long as virtualization and elastic computing are adopted, one of every 100 workers must serve as the manager, and only 99 can actually move bricks. The goal of Shenlong is that if 100 workers are hired to move bricks, all 100 should move bricks, while there is still a means to manage and control how many workers move bricks at my house and at my neighbor's at different times. The yellow worker in the picture above is the one who has been pulled out to do management work rather than moving bricks.

The contractor looked at the situation and thought: if keeping the two teams flexible costs 1 worker in every 100 to management, then 1,000 workers lose 10 and 10,000 workers lose 100. The larger the project, the more labor is lost. Once the business grows large, solving this loss would greatly improve brick-moving efficiency. Before the Shenlong architecture, Alibaba Cloud's virtualization loss was actually greater than in the brick-moving example: it averaged about 10%, which corresponds to only 90 of every 100 workers moving bricks while the other 10 do management work.
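
The arithmetic in this paragraph can be written out as a small sketch. The function name and the whole-number percentage parameter are illustrative assumptions; the 1% and 10% figures are the ones given in the text.

```python
def split_workforce(total_workers, management_percent):
    """Split a workforce into (brick_movers, managers).

    management_percent is the share of workers pulled into management,
    given as a whole-number percentage to keep the arithmetic exact.
    """
    managers = total_workers * management_percent // 100
    return total_workers - managers, managers


# The brick-moving example: 1 manager per 100 workers.
for total in (100, 1_000, 10_000):
    movers, managers = split_workforce(total, 1)
    print(f"{total} workers at 1% loss: {movers} move bricks, {managers} manage")

# The roughly 10% average loss cited for the period before Shenlong.
movers, managers = split_workforce(100, 10)
print(f"100 workers at 10% loss: {movers} move bricks, {managers} manage")
```

The loss grows linearly with the size of the workforce, which is why it matters more as the business scales.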


3.2 Core concepts of Shenlong 1.0

Considering the actual situation, the contractor decided to send worker A, who had been pulled out to do management work, back to moving bricks: he is strong, which makes him well suited to moving bricks and poorly suited to management. Management of the workers switches to a project manager system, that is, professional managers are brought in to manage the workers so that the workers only move bricks. Bringing in professional managers certainly raises costs, but no brick-moving labor is lost. The situation after adopting the project manager system is shown in the figure below:

Note that the smallest unit of elastic expansion is one whole brick-moving team. If team 1 is busy, it can only ask the whole of team 2 or the whole of team 3 for help; it cannot borrow just a few workers from team 2. This structure ensures that each brick-moving team is managed by a dedicated project manager without any loss of labor. I will not map this onto the Shenlong architecture just yet, because one important issue remains unaddressed.
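
The whole-team granularity described above amounts to rounding every request up to a multiple of the team size. A minimal sketch, assuming a team of 100 workers; the function names are hypothetical:

```python
def teams_to_borrow(extra_workers_needed, team_size=100):
    """Number of whole teams to request; partial teams are not allowed."""
    if extra_workers_needed <= 0:
        return 0
    # Ceiling division: any remainder still costs one more whole team.
    return -(-extra_workers_needed // team_size)


def workers_received(extra_workers_needed, team_size=100):
    """Workers that actually arrive once the request is rounded up."""
    return teams_to_borrow(extra_workers_needed, team_size) * team_size


print(teams_to_borrow(30))    # needs 30 more workers, must take a whole team
print(workers_received(30))   # so a full team's worth of workers arrives
print(teams_to_borrow(101))   # one worker over a team means two teams
```

The trade-off is that a small request still occupies a full team, in exchange for every worker in that team doing productive work.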


3.3 The essence of heterogeneous computing is the combination of brick moving and wall building

Analyzing how his business was developing, the contractor found that besides moving bricks, my neighbor and I also needed walls built. The original workers were all good at moving bricks but not at building walls. Having brick movers build walls is possible, but their speed and quality fall short of specialized masons. Therefore, the contractor added masons to the original team, so that one team could both move bricks and build walls, as shown in the picture below:


The way brick movers and masons work together is called heterogeneous computing: the brick movers move bricks while the masons build walls.
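
In code, the idea of sending each kind of work to the worker best suited to it might look like the sketch below. The timing table is invented purely for illustration, and the names are hypothetical; it only mirrors the analogy and says nothing about how real heterogeneous hardware is scheduled.

```python
# Hypothetical minutes each kind of worker needs per unit of work.
MINUTES_PER_UNIT = {
    ("brick_mover", "move_bricks"): 1,
    ("brick_mover", "build_wall"): 4,   # possible, but slow
    ("mason", "move_bricks"): 3,
    ("mason", "build_wall"): 1,
}


def best_worker(task):
    """Pick the worker type that finishes the given task fastest."""
    candidates = {
        worker: minutes
        for (worker, kind), minutes in MINUTES_PER_UNIT.items()
        if kind == task
    }
    if not candidates:
        raise ValueError(f"no worker can do {task!r}")
    return min(candidates, key=candidates.get)


def dispatch(tasks):
    """Pair every task with its best-suited worker type."""
    return [(task, best_worker(task)) for task in tasks]


print(dispatch(["move_bricks", "build_wall", "move_bricks"]))
```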


3.4 Summary of features of Shenlong 1.0

Although Shenlong has hardly been mentioned so far, the story has in fact covered all the characteristics of Shenlong 1.0. Here the brick-moving team is mapped onto Shenlong 1.0 to summarize those characteristics.

To solve the problem of labor loss, every worker on the team moves bricks or builds walls, and management is handled by a dedicated project manager. In Shenlong 1.0, Alibaba Cloud solved the virtualization loss by creating a dedicated board with an intelligent chip that is responsible for virtualization scheduling. This board is called the MOC card, and it looks like this:

To handle both brick moving and wall building, there is the brick-moving and wall-building team with its project manager. This team is Alibaba Cloud's Shenlong server, as shown in the picture below:

In the industry it is commonly known as an elastic bare-metal server. According to the official Alibaba Cloud documentation: ECS Bare Metal Instance is an elastic high-performance computing service that provides the same computing performance as traditional physical machines. It is secure and physically isolated, and offers minute-level delivery cycles and real-time service response, helping your core services grow rapidly. Now you can see why its computing performance is no different from that of a traditional physical machine: a Shenlong server is a physical machine. In addition, it scales elastically like a cloud server and is delivered within minutes.

In short, a Shenlong 1.0 cloud server combines the advantages of a physical server and a cloud server. In essence, it is an elastically scalable physical machine designed specifically to provide cloud services.


3.5 Bottlenecks in Shenlong 1.0

Back to the brick-moving example: the contractor ran into a new problem. The neighbor was himself a project manager and had special requirements for moving bricks and building walls. He wanted the 100 workers of one team to move the bricks on the left and build the wall on the right in the morning, and to move the bricks on the right and build the wall on the left in the afternoon. The team's project manager had no experience with this and did not know how to deploy the workers within the team.

This is the bottleneck of Shenlong 1.0. Virtualization actually goes in two directions:

  • Virtualization aggregation: combining a pile of physical machines into one large virtual machine;

  • Virtualization partitioning: splitting one physical machine into a bunch of small virtual machines.
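
The two directions can be contrasted with a toy sketch that looks only at CPU core counts. The core counts and function names are assumptions made for illustration; real aggregation and partitioning involve far more than adding or dividing cores.

```python
def aggregate(machine_cores):
    """Aggregation: present several physical machines as one large pool."""
    return sum(machine_cores)


def partition(machine_cores, vm_cores):
    """Partitioning: carve one physical machine into equal-sized small VMs.

    Returns (number_of_vms, leftover_cores).
    """
    if vm_cores <= 0:
        raise ValueError("vm_cores must be positive")
    return divmod(machine_cores, vm_cores)


# Three hypothetical 96-core machines seen as one large resource pool.
print(aggregate([96, 96, 96]))

# One hypothetical 96-core machine split into 8-core virtual machines.
print(partition(96, 8))
```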

Shenlong 1.0 does virtualization aggregation but not virtualization partitioning. In the brick-moving example, the project manager only knows how to lend his whole team to other teams when it is idle; within his own team he has no way to meet my neighbor's request by flexibly reallocating workers between the morning and afternoon tasks.

This problem was solved in Shenlong 2.0.
