Abstract:

1. Introduction

Elastic Compute Service (ECS) is the mainstay of Alibaba Cloud’s revenue and traffic. It is an engine and enabler of technological innovation for new businesses and industry customers. ECS can supply all the computing power a medium-sized Internet company needs within 10 minutes. It can also carry the enormous elastic peak demand of Alibaba Group’s Double 11 shopping festival and the business peaks of Internet-giant customers, helping every user push past the limits of computing power. Working silently behind the scenes is the ECS scheduling system, the legendary “Houyi”. As the resource scheduling system that has created and delivered countless virtual machines, Houyi has come to seem as mysterious and intriguing as the Mona Lisa’s smile. Today, let us dig into Houyi’s past and present.

2. Ancient origins: The birth of Houyi

The time machine goes back to 2009, when technology companies around the world were rushing into cloud computing, much as they rush into blockchain, artificial intelligence and new retail today. That same year, Aliyun (Alibaba Cloud) made its first bold attempt and broke ground in cloud computing, independently developing Pangu, Nuwa, Shennong, Kuafu and Fuxi. Together, these ancient gods make up Alibaba Cloud’s base cloud computing platform, the Apsara system, and Houyi was born naturally from it. Built on Apsara, Houyi unified computing (hosts), network (IPs) and storage (Pangu block storage) resources and used virtualization technology to produce virtual servers that could be delivered to users. In May 2010, the first ECS cluster went live in Beijing. At that time Houyi was still a simple child: his job was to control the flow of the virtual machine production line, and his brain (the scheduling strategy) was relatively simple.

3. The Industrial Revolution: Wild growth

Nothing beats speed! Yes, that is how fast the “industrial revolution” arrived! Houyi barely had time to find his feet: from the moment ECS launched, its business volume grew extremely quickly, and it became, without question, the fastest-growing business at Alibaba Cloud. ECS quickly proved that selling computing power as virtual machines was the most suitable way to commercialize cloud computing. Like the steam engine of the industrial age, it propelled Alibaba Cloud into an era of wild growth.

ECS soon grew to nearly a hundred Apsara (Feitian) clusters, with new clusters deployed every week and at least one new version released every week. This pace overwhelmed the engineers who maintained and upgraded Houyi. Those of you who fought alongside us through those prehistoric years, do you still remember how hard we struggled? As the scale grew, the simple world Houyi faced suddenly became a sea of stars, and scheduling complexity increased day by day. Scheduling had to break through the boundaries of network Pods and data centers. Growing pains were inevitable. The new system architecture had new design goals: resource management and scheduling at the scale of a large Region, rapid iterative development…

Fortunately, with technical support from the group, Houyi adopted distributed service technology and was gradually rebuilt and evolved. The biggest challenge was that the business kept growing; it was impossible to stop it and give us a few months to do the work. We built the framework first and then migrated in small steps. Without affecting business growth, we moved one functional module at a time, first its functions and then its data, until the whole system had been upgraded and switched over. In today’s fashionable phrase, we changed the engine of an aircraft in flight. With its new engine, Houyi evolved from a control system that managed a single cluster into one that manages a large Region spanning multiple data centers, built on a distributed service architecture. The scheduling strategy evolved from the original simple resource allocation model into today’s classic filter + weight-factor scoring model. The management scale grew from clusters of at most a few hundred physical machines to clusters of up to tens of thousands. Tempered by the industrial revolution, our boy Houyi had grown up.
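The filter + weight-factor scoring model can be illustrated with a minimal Python sketch. Filters are hard constraints that remove unsuitable hosts; weighers score the survivors, and the host with the highest weighted sum wins. The `Host` and `VmRequest` types, the two weighers (`packing_score`, `balance_score`) and their weights are hypothetical stand-ins chosen for illustration, not Houyi’s actual factors.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Host:
    name: str
    free_cpu: int      # free vCPUs on the physical machine
    free_mem_gb: int   # free memory in GiB
    total_cpu: int
    total_mem_gb: int

@dataclass
class VmRequest:
    cpu: int
    mem_gb: int

Filter = Callable[[Host, VmRequest], bool]
Weigher = Callable[[Host, VmRequest], float]

def fits(host: Host, req: VmRequest) -> bool:
    """Hard filter: the host must have enough free CPU and memory."""
    return host.free_cpu >= req.cpu and host.free_mem_gb >= req.mem_gb

def _left_after(host: Host, req: VmRequest) -> Tuple[float, float]:
    """Fraction of CPU and memory that would remain free after placing the VM."""
    return ((host.free_cpu - req.cpu) / host.total_cpu,
            (host.free_mem_gb - req.mem_gb) / host.total_mem_gb)

def packing_score(host: Host, req: VmRequest) -> float:
    """Hypothetical factor: higher when less capacity is left over (tighter packing)."""
    cpu_left, mem_left = _left_after(host, req)
    return 1.0 - (cpu_left + mem_left) / 2

def balance_score(host: Host, req: VmRequest) -> float:
    """Hypothetical factor: higher when leftover CPU and memory stay in proportion."""
    cpu_left, mem_left = _left_after(host, req)
    return 1.0 - abs(cpu_left - mem_left)

def schedule(hosts: List[Host], req: VmRequest, filters: List[Filter],
             weighers: List[Tuple[Weigher, float]]) -> Optional[Host]:
    # Step 1: filtering removes every host that violates a hard constraint.
    candidates = [h for h in hosts if all(f(h, req) for f in filters)]
    if not candidates:
        return None
    # Step 2: scoring ranks the survivors by a weighted sum of factors.
    def total(h: Host) -> float:
        return sum(weight * weigher(h, req) for weigher, weight in weighers)
    return max(candidates, key=total)

hosts = [Host("h1", 8, 32, 64, 256), Host("h2", 40, 160, 64, 256), Host("h3", 2, 8, 64, 256)]
weighers = [(packing_score, 1.0), (balance_score, 0.5)]
best = schedule(hosts, VmRequest(cpu=4, mem_gb=16), [fits], weighers)
print(best.name)  # h1
```

Keeping hard constraints and soft preferences separate is what makes such a model easy to extend: a new business rule becomes one more filter or one more weighted factor, without touching the rest of the pipeline.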

During this period, ECS gradually built out its upper-layer business systems, which mainly provided the following capabilities: a standardized ECS Open API, connecting official-website sales and API users; sales constraints and integration with the Alibaba Cloud billing system, enabling ECS’s distinctive billing modes (monthly and annual subscription, pay-as-you-go, and so on); configuration of business attributes and sales specifications at cluster granularity, with clusters selected according to user needs; and basic inventory services and water-level control.

4. The Information age: Coming into its own

If the industrial revolution was about rapid growth in scale along a single dimension, the information age is about individuality and diversity. On the one hand, ECS attracts ever richer customer groups with increasingly diverse demands, including different price, stability and regional requirements, which requires ECS to package products with different characteristics for different customer scenarios. On the other hand, an explosion of ECS-related technologies has brought about a “contention of ideas”. The virtual network team focused on launching customizable VPC networks. The block storage team launched high-performance SSD cloud disks and cost-effective hybrid SSD cloud disks, while ESSD cloud disks combine cost-effectiveness with high performance. The virtualization team moved from Xen to KVM, launched GPU and FPGA virtualization for heterogeneous computing, and began developing a new generation of elastic bare-metal servers (Shenlong). All these new products and features intertwine with dozens of underlying physical server models, multiple network adapters and network architectures, multiple virtualization and virtual storage schemes, and multiple versions of virtual networks. Scheduling each product accurately onto suitable resources while maximizing resource utilization became the basic capability Houyi had to provide in the information age.

During the wild-growth period, the upper business layer handled cluster-level scheduling, while Houyi handled scheduling within each cluster. At that time each cluster deployed only one kind of service, and the teams were divided into the same two layers. Now, however, to support rich product forms and improve inventory efficiency, physical machines in the same cluster may sell different product specifications, so the original hierarchical scheduling was clearly outdated. We reorganized to redefine the division of responsibilities between the upper and lower systems: the upper business system is responsible for business functions, and the lower Houyi system is responsible for all scheduling logic. Unifying the scheduling logic not only greatly improved Houyi’s scheduling capability but also allowed scheduling technology to drive richer product forms. The system functions and products that scheduling technology supported during this period include the following:

Never missing a shot: precise scheduling

Houyi groups and filters resources by tags, enabling precise scheduling of product specifications within a large AZ. A rich set of weighting factors lets Houyi weigh multiple objectives to reach the optimal scheduling decision. Optimal-ratio packing (minimum fragmentation) is the baseline requirement. A few other examples: deployment sets, which meet user-defined placement requirements such as spreading VMs across physical machines; spreading by resource consumption, which ensures performance SLAs and improves user experience; spreading a customer’s VMs across racks; power balancing……
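Tag-based grouping and deployment sets can both be expressed as filters. Below is a minimal Python sketch, assuming hosts carry a set of string tags and that a deployment set means “at most one VM of the set per physical machine”. The tag names, instance IDs and helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Host:
    name: str
    tags: Set[str]                                # hypothetical tags, e.g. CPU generation
    vms: List[str] = field(default_factory=list)  # instance IDs already placed here

def tag_filter(hosts: List[Host], required: Set[str]) -> List[Host]:
    """Keep only hosts carrying every tag the product specification requires."""
    return [h for h in hosts if required <= h.tags]

def deployment_set_filter(hosts: List[Host], deployment_set: str,
                          vm_to_set: Dict[str, str]) -> List[Host]:
    """Spread at physical-machine granularity: skip hosts already running a VM of the same set."""
    return [h for h in hosts
            if all(vm_to_set.get(vm) != deployment_set for vm in h.vms)]

hosts = [
    Host("h1", {"x86", "gen6"}, ["i-1"]),
    Host("h2", {"x86", "gen6"}),
    Host("h3", {"arm"}),
]
vm_to_set = {"i-1": "ds-a"}
eligible = deployment_set_filter(tag_filter(hosts, {"gen6"}), "ds-a", vm_to_set)
print([h.name for h in eligible])  # ['h2']
```

Rack-level spreading would follow the same pattern with a rack identifier in place of the host.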

Fine inventory management

In fact, VM scheduling is Houyi’s pastime; inventory management is his real job. Because product forms are so diverse, inventory figures for the same product specification in the same region differ depending on the payment type and payment duration! Many people know that one of the key skills of e-commerce is inventory and supply chain management. Houyi is really a shopkeeper selling VMs online, and he plays plenty of clever tricks behind the scenes. Inventory water-level control ensures that every product in every availability zone can still meet elastic scale-out and upgrade demand when supply is close to running out. Inventory sharing meets the urgent needs of multiple products. Inventory forecasting predicts sales and adjusts inventory accordingly.
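Water-level control can be sketched as a simple admission check. This minimal Python example assumes each product specification in a zone has a sellable quantity and a reserved “water level” held back for existing customers’ scale-out and upgrades. The `ProductStock` type, the numbers and the rule that only scale-out may dip into the reserve are illustrative assumptions, not Houyi’s actual policy.

```python
from dataclasses import dataclass

@dataclass
class ProductStock:
    available: int  # sellable instances of one spec in one zone (hypothetical unit)
    reserve: int    # water level held back for existing users' scale-out/upgrades

    def can_sell(self, count: int, is_scale_out: bool) -> bool:
        """New purchases must stay at or above the water level; scale-out may use the reserve."""
        floor = 0 if is_scale_out else self.reserve
        return self.available - count >= floor

    def sell(self, count: int, is_scale_out: bool = False) -> bool:
        """Deduct stock if the sale is admitted; return whether it was."""
        if not self.can_sell(count, is_scale_out):
            return False
        self.available -= count
        return True

stock = ProductStock(available=100, reserve=20)
print(stock.sell(80))                    # True: 20 left, exactly at the water level
print(stock.sell(1))                     # False: a new purchase would dip below it
print(stock.sell(5, is_scale_out=True))  # True: existing users can still scale out
```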

Fire-sale prices: bidding instances

Would shopkeeper Houyi really dare to sell at 10% of the list price? Yes, and he has been doing it for a long time. The game is called bidding instances, and Houyi is the one running it. Put bluntly, he takes resources that are temporarily idle and sells them temporarily. Why “temporarily”? Because Houyi sells bidding instances only when inventory is abundant and reclaims them when inventory is tight. But reclamation is done carefully, not at random: for example, an instance is guaranteed to run for at least one hour, and notice is given five minutes in advance. So why bidding? To sell more in resource-constrained areas? More important is the linkage with inventory: price acts as a lever that leads customers to choose, of their own accord, the regions and specifications with low prices and ample inventory. This is in fact scheduling the customers themselves, and there is a lot of craft behind it.
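The reclamation rules can be sketched in Python. The one-hour guaranteed runtime and the five-minute notice come from the text. Everything else, including the `BidInstance` type, the `pick_for_reclaim` helper and the choice to reclaim the instances that can be released soonest, is an illustrative assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

GUARANTEED_RUNTIME = timedelta(hours=1)  # from the text: at least 1 hour of running
NOTICE_PERIOD = timedelta(minutes=5)     # from the text: 5 minutes' advance notice

@dataclass
class BidInstance:
    instance_id: str
    started_at: datetime
    release_at: Optional[datetime] = None  # set once a reclaim notice has been sent

def earliest_release(inst: BidInstance, now: datetime) -> datetime:
    """Release time that honors both the guaranteed runtime and the notice period."""
    return max(inst.started_at + GUARANTEED_RUNTIME, now + NOTICE_PERIOD)

def pick_for_reclaim(instances: List[BidInstance], needed: int,
                     now: datetime) -> List[BidInstance]:
    """Send notices to up to `needed` instances, preferring those releasable soonest."""
    pending = [i for i in instances if i.release_at is None]
    pending.sort(key=lambda i: earliest_release(i, now))
    chosen = pending[:needed]
    for inst in chosen:
        inst.release_at = earliest_release(inst, now)  # notice goes out now
    return chosen

now = datetime(2020, 1, 1, 12, 0)
fleet = [
    BidInstance("i-a", datetime(2020, 1, 1, 10, 0)),   # ran 2 h: released at 12:05
    BidInstance("i-b", datetime(2020, 1, 1, 11, 30)),  # must run until 12:30
    BidInstance("i-c", datetime(2020, 1, 1, 11, 58)),  # must run until 12:58
]
for inst in pick_for_reclaim(fleet, needed=2, now=now):
    print(inst.instance_id, inst.release_at.time())
```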

Snatching the chestnuts out of the fire

Bidding instances are cheap and plentiful, but because they can be interrupted and released at any time, they suit only a limited set of business scenarios, or require the application architecture to be adapted before customers are happy to use them. Is there a cheap, cost-effective instance type you don’t have to worry about being released? Take a look at burstable instances, which are cheap yet guarantee performance. They are a high-level move by Houyi and ECS’s powerful virtualization team, like pulling chestnuts out of the fire. With strong technical support, Houyi split a physical thread into small slices to sell while still guaranteeing the SLA. Computing power accumulates Credit during idle periods and consumes Credit when it needs to burst; in the future, Credit can also be bought with money.
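The Credit mechanism behaves like a token bucket. Below is a minimal Python sketch, assuming credits are measured in vCPU-minutes, a hypothetical baseline of 20% of one vCPU and a hypothetical cap on banked credits. Buying credits with money is not modeled.

```python
from dataclasses import dataclass

@dataclass
class CreditAccount:
    baseline: float             # guaranteed share of one vCPU, e.g. 0.2 = 20% (hypothetical)
    credits: float = 0.0        # banked CPU credits, in vCPU-minutes
    max_credits: float = 144.0  # hypothetical cap on banked credits

    def tick(self, demand: float, minutes: float = 1.0) -> float:
        """Run one interval at the requested CPU share (0..1); return the share actually granted."""
        if demand <= self.baseline:
            # Below baseline: the unused share is banked as credits, up to the cap.
            earned = (self.baseline - demand) * minutes
            self.credits = min(self.max_credits, self.credits + earned)
            return demand
        # Above baseline: the extra is paid for with credits, as far as they last.
        extra = (demand - self.baseline) * minutes
        spent = min(extra, self.credits)
        self.credits -= spent
        return self.baseline + spent / minutes

acct = CreditAccount(baseline=0.2)
acct.tick(0.0, minutes=10)        # idle for 10 minutes: banks about 2.0 vCPU-minutes
print(acct.tick(1.0))             # bursts to a full vCPU for 1 minute, spending about 0.8
print(acct.tick(1.0, minutes=2))  # credits run out: granted share averages about 0.8
```

Without credits, the instance falls back to its baseline share, which is how a fraction of a physical thread can be sold while still honoring an SLA on that baseline.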

5. The Intelligent future

The door to the mythical future has opened, and the age of intelligence has arrived. After a period of wild growth and functional enrichment, Houyi has moved into deep waters that call for more wisdom and a broader vision, the era of intensive cultivation. Cost-effectiveness is one of the core competitive strengths of cloud computing. Houyi’s task is to let data drive every link, integrating inventory allocation and scheduling into one large closed loop, and to reap the dividends of more intelligent methods at each link: from data visualization, to automated operations, and finally to comprehensive intelligence.

Houyi’s story continues. As the ECS business develops, there are ups and downs along the road and strange peaks and unusual scenery all the way, but we keep pressing forward! Block storage, virtualization, virtual networking and ECS management and control together form one cohesive team. Of course, Houyi stands on the shoulders of giants: his growth is inseparable from the powerful core technology of the Apsara (Feitian) cloud computing platform and the all-round support of Alibaba Cloud products! I am very glad for the selfless cooperation and help of our colleagues across Alibaba Group, which can make Houyi truly become a legend in the industry.
