Introduction: In the north slope model, cloud vendors provide HPC services by building on breakthroughs in large-scale cloud computing performance, with the focus placed more on cloud service.

As digital transformation deepens, industry applications demand ever more computing power. To meet the flexible business forms and computing requirements of different industries, the HPC Cloud, which uses cloud computing technology as the means of service-model innovation and takes high-performance computing services as its core, has attracted wide attention across the industry.

To further build industry consensus and promote the development of the HPC Cloud industry, the Computing Network Convergence Industry and Standards Promotion Committee and the Supercomputing Innovation Alliance jointly held the first HPC Cloud Industry Development Forum on December 21, 2021. Experts from academia and industry were invited to speak on the theme “Cloud Supercomputing, Wise Future”, covering technology research, application deployment and practical development.

At this forum, He Wanqing, head of high-performance computing at Ali Cloud, delivered a keynote speech titled “South Slope vs. North Slope: Ali Cloud HPC-as-a-Service Industry Practice”.

01 Aliyun High-Performance Computing Development

Deploying and delivering supercomputing over the Internet has become a trend. He Wanqing said that Ali Cloud’s HPC has been in development for four to five years and is now deployed in many industrial and industry-related vertical businesses, such as automotive simulation, film and television post-production rendering, AI-driven biopharmaceuticals and weather services.

Drawing on ten years of technical observation of offline HPC, He Wanqing discussed the trend of transforming traditional supercomputing systems into high-performance computing clouds. He likened the two approaches, moving offline supercomputing toward cloud services and having cloud services deliver high-performance computing products, to climbing the peak of HPC-as-a-Service by two different routes: the south slope and the north slope. On the north slope, cloud vendors provide HPC services by building on breakthroughs in large-scale cloud computing performance, and their focus is more on cloud service: single-instance SLAs and stability at scale, fast and elastic delivery, and the rapid rollout of a variety of cloud products and SaaS services. This goes beyond a model centered on supplying “cores” and “compute power” and emphasizes the cloud business as a whole.

According to He, 2020 was the year in which high-performance computing advanced fastest in stability, elasticity and billing sensitivity, and many important workloads can already be deployed on Ali Cloud at massive scale. In the course of moving to the cloud and away from IOE, Ali Cloud is able to eliminate more than 70% of hardware faults without users noticing. On the public cloud computing product side, the featured offering is the new seventh-generation SCC instance specification.

On the DpCA computing platform, the advantages include efficient offloading of virtual networks, complete decoupling of the physical network, storage and compute, and avoidance of resource contention. The DpCA eRDMA feature is also being officially launched in 2021. It enables CPU and GPU instances to be pooled and deployed together and greatly expands the range of a CPU and GPU cluster, from a single POD to availability zones and data centers. The platform supports large-scale elastic scaling and integrates the VPC, eRDMA and storage networks. In numerical weather forecasting, eRDMA is expected to greatly increase application scale and parallel efficiency.

02 Aliyun High-Performance Computing Cloud Stack

He Wanqing said that, on top of the underlying architecture described above, Ali Cloud’s high-performance computing offering is collectively called the “DpCA Supercomputer”. The E-HPC cloud software stack for public and hybrid cloud is deployed on it, providing PaaS-layer services built on DpCA servers, an RDMA network and a parallel file system. Functions such as the scheduler, elastic scaling and live migration are handled by the lower layers and are transparent to the customer. At the ISV layer, services are provided in the form of workflows: data is moved not by physical transfer alone but over high-speed networks, with a one-time upload completing task delivery. In computing clusters that span data centers, the three networks are integrated. The existing scheduler fully supports scheduling compute nodes across availability zones and assigns different tasks to different instances. As for scheduling across different queues, He said that Ali Cloud is the only provider in the world that can bind queues to different instances at runtime.
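The idea of binding queues to instances and placing tasks on nodes across availability zones can be illustrated with a minimal sketch. This is not the E-HPC scheduler or its API; the `Node` class, the `schedule` function, the queue names and the first-fit policy are all hypothetical and chosen only to make the concept concrete.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical compute node; all fields are illustrative."""
    name: str
    zone: str            # availability zone the node lives in
    instance_type: str   # e.g. "cpu" or "gpu"
    free_cores: int


def schedule(jobs, queue_bindings, nodes):
    """Place each (job_name, queue, cores) on the first node that fits.

    queue_bindings maps a queue name to the instance type it is bound to.
    Nodes from any zone are eligible, so a queue can spill across zones.
    Returns {job_name: node_name, or None if nothing fits}.
    """
    placement = {}
    for job_name, queue, cores in jobs:
        wanted_type = queue_bindings[queue]
        placement[job_name] = None
        for node in nodes:
            if node.instance_type == wanted_type and node.free_cores >= cores:
                node.free_cores -= cores  # reserve the cores on that node
                placement[job_name] = node.name
                break
    return placement


if __name__ == "__main__":
    nodes = [
        Node("cpu-a1", "zone-a", "cpu", 8),
        Node("gpu-b1", "zone-b", "gpu", 4),
        Node("cpu-b1", "zone-b", "cpu", 8),
    ]
    bindings = {"cfd": "cpu", "train": "gpu"}
    jobs = [("mesh", "cfd", 8), ("solve", "cfd", 6),
            ("fit", "train", 4), ("big", "cfd", 16)]
    print(schedule(jobs, bindings, nodes))
```

In this sketch the second job in the `cfd` queue lands in a different zone once the first node is full, and changing an entry in `bindings` between calls stands in for rebinding a queue while the cluster is running.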

03 Ali Cloud High-Performance Computing Application Solutions

He Wanqing introduced the combination of preemptible instances and checkpoint-based resumption (breakpoint continuation), which lets users obtain the resources they need at a lower cost than with the traditional approach. For hybrid cloud, asynchronous hybrid-cloud file storage allows data to be pulled and computation to run on-cloud and off-cloud at the same time, an approach already widely used in film and television rendering.
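Why resuming from a breakpoint makes low-cost preemptible instances usable can be shown with a small sketch: the job saves its progress after every step, so work done before the instance is reclaimed is not lost. Everything here is an assumption for illustration (the function name, the JSON checkpoint file and the `preempt_at` parameter that simulates reclamation); it does not reflect how E-HPC implements the feature.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, checkpoint_path, preempt_at=None):
    """Sum the integers 0..total_steps-1, saving progress after every step.

    preempt_at simulates the instance being reclaimed just before that step.
    Returns the final sum, or None if the run was interrupted.
    """
    state = {"next_step": 0, "partial_sum": 0}
    # Resume from the last checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    for step in range(state["next_step"], total_steps):
        if preempt_at is not None and step == preempt_at:
            return None  # simulated preemption: the instance is reclaimed
        state["partial_sum"] += step
        state["next_step"] = step + 1
        # Write to a temporary file and rename it, so an interruption
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["partial_sum"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    print(run_with_checkpoints(10, path, preempt_at=6))  # interrupted: None
    print(run_with_checkpoints(10, path))                # resumes at step 6: 45
```

The second call does only the remaining four steps. In a real workload the checkpoint would hold solver or rendering state and would normally be written to shared storage that outlives the instance.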

In the E-HPC commercial license solution, the on-cloud and off-cloud networks can be connected through Ali Cloud’s high-speed channel. E-HPC automatically deploys cloud computing resources and configures the license service or a license proxy node. The license server in the cloud connects to the license proxy node through a VPN. The E-HPC service is responsible for License Manager deployment, license provisioning and usage monitoring.

This article is the original content of Aliyun and shall not be reproduced without permission.