USENIX ATC (the USENIX Annual Technical Conference) was held online recently. Founded in 1992, ATC is a top conference in the field of computer systems, organized by USENIX, and has now been held 31 times. A number of influential results in computer systems, such as the Oak language (the predecessor of Java), QEMU, and ZooKeeper, were published at USENIX ATC. The bar for papers is very high: they must make a fundamental contribution, have forward-looking impact, and include a solid system implementation. This year's acceptance rate was only 18%, and only three best papers were selected worldwide.

The ATC 2021 results have been released, with the acceptance rate hitting a new low of 18%. Three best papers were also announced, including one on the Feitian operating system submitted by Alibaba Cloud (Aliyun), a first for a Chinese company.

Alibaba Cloud submitted a paper titled "Scaling Large Production Clusters with Partitioned Synchronization", which discusses how Feitian solves the problem of scheduling computing resources at large scale. The paper was accepted and won a best paper award. This is the first time a Chinese company has won a best paper award at ATC.

Feitian is a hyperscale cloud computing operating system developed by Alibaba Cloud. It can connect millions of servers around the world into a single supercomputer and provide computing power to society as an online public service. Feitian's core services include distributed computing, storage, databases, and networking. The award-winning paper concerns one of them, the resource scheduling service.

The Feitian distributed scheduling system described in the paper, Fuxi 2.0, is reported to be the result of a joint project between the Alibaba Innovative Research (AIR) program, Alibaba's academic cooperation program, and James Cheng of the Chinese University of Hong Kong. The paper examines two problems common to distributed scheduling architectures in industry, severe resource conflicts and degraded scheduling quality, and proposes a new resource conflict resolution mechanism. This lets the scheduler scale with cluster size while still making good scheduling decisions. In production, the results support single clusters of 100,000 nodes on Feitian's big data platform, MaxCompute, with a concurrent capacity of 40,000 jobs per second.
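To make the idea of a resource conflict concrete, here is a minimal sketch. The article does not describe the paper's mechanism in detail, so this is not the Fuxi 2.0 algorithm; the Cluster and Scheduler classes, the machine names, and the demo function are all hypothetical. The sketch only shows why several schedulers working from the same stale copy of the cluster state can collide, and why giving each one a fresh view of a different part of the cluster can avoid the collision.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cluster:
    """Authoritative record of free slots per machine (hypothetical model)."""
    free_slots: Dict[str, int]

    def try_commit(self, machine: str) -> bool:
        """Claim one slot; return False if none is left (a scheduling conflict)."""
        if self.free_slots.get(machine, 0) > 0:
            self.free_slots[machine] -= 1
            return True
        return False


@dataclass
class Scheduler:
    name: str
    view: Dict[str, int] = field(default_factory=dict)  # local, possibly stale copy

    def sync(self, cluster: Cluster, machines: List[str]) -> None:
        """Refresh the local view, but only for the given machines."""
        for m in machines:
            self.view[m] = cluster.free_slots[m]

    def pick(self) -> Optional[str]:
        """Pick the machine that looks emptiest in the local view."""
        candidates = [m for m, free in self.view.items() if free > 0]
        return max(candidates, key=lambda m: (self.view[m], m), default=None)


def place_one_task_each(cluster: Cluster, schedulers: List[Scheduler]) -> int:
    """Each scheduler places one task; return the number of rejected commits."""
    conflicts = 0
    for s in schedulers:
        choice = s.pick()
        if choice is None:
            continue
        if cluster.try_commit(choice):
            s.view[choice] -= 1
        else:
            conflicts += 1
            s.view[choice] = 0  # the rejection reveals that the machine is full
    return conflicts


def demo():
    # Case 1: both schedulers sync the whole cluster, then decide independently.
    shared = Cluster({"m1": 1, "m2": 1})
    a, b = Scheduler("A"), Scheduler("B")
    for s in (a, b):
        s.sync(shared, ["m1", "m2"])
    shared_conflicts = place_one_task_each(shared, [a, b])

    # Case 2: each scheduler syncs a different partition of the cluster.
    split = Cluster({"m1": 1, "m2": 1})
    a, b = Scheduler("A"), Scheduler("B")
    a.sync(split, ["m1"])
    b.sync(split, ["m2"])
    split_conflicts = place_one_task_each(split, [a, b])

    return shared_conflicts, split_conflicts


if __name__ == "__main__":
    print(demo())
```

In the first case both schedulers see the same two free machines and choose the same one, so the second commit is rejected. In the second case their views do not overlap, so both commits succeed.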

The core problem of cloud computing is how to organize thousands of machines or more efficiently, and how to schedule and manage tasks flexibly, so that users can use the cloud as if it were a single machine. As data volumes and computation grow, cloud computing deployments have also become very large. A traditional scheduler built on a centralized architecture is limited by the processing capacity of a single node and cannot scale.

“There is a saying in the field of distributed systems that every time the scale increases by an order of magnitude, it becomes a whole new problem,” said Guan Tao, a researcher in Alibaba Cloud's Computing Platform Business Unit. Scale, utilization, and fairness are the three core concerns of a scheduling system. This paper, based on part of the work on Alibaba Cloud's Feitian system, explores how to make the scheduling system scale to very large clusters without sacrificing utilization or fairness.

In recent years, a number of research results from the Feitian operating system have been accepted at top international conferences. In 2019, the data scheduling paper Yugong was accepted at VLDB, a top database conference. In 2020, the machine learning and single-machine scheduling paper AntMan was accepted at OSDI, a top operating systems conference. In 2021, the computational scheduling paper Fangorn was accepted at VLDB.

This article is the original content of Aliyun and shall not be reproduced without permission.