Alibaba Cloud has released the VGN5i instance, the first lightweight GPU heterogeneous computing product on a public cloud in China. This instance family breaks the limitation of the traditional pass-through mode and provides services at a finer granularity than a single physical GPU, enabling customers to run their business at lower cost and with greater flexibility. It is suitable for cloud gaming, VR/AR, AI inference, deep learning teaching, and other lightweight GPU computing scenarios that call for fine-grained GPU computing services.

What is a lightweight GPU cloud server?

A lightweight GPU cloud server is a new GPU cloud server specification family. It uses GPU virtualization technology on the public cloud to attach virtualized GPU resources to cloud server instances as virtual GPUs (vGPUs). The difference between a lightweight GPU cloud server and a conventional one is that the lightweight server provides finer-grained GPU computing resources, such as fewer CUDA cores and less video memory, so that services can flexibly configure GPU computing resources according to their actual requirements.
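To make the granularity concrete, here is a minimal sketch of how a 1/N vGPU slice divides a physical card. The Tesla P4 figures (2560 CUDA cores, 8 GiB of video memory) are published specifications, but the actual resources a slice receives are determined by the vendor's vGPU profile, not by this simple arithmetic:

```python
# Illustrative only: nominal share of a physical GPU for a 1/N vGPU slice.
# The real per-slice allocation is set by the vGPU profile in use.

def vgpu_slice(total_cores: int, total_mem_gib: int, fraction: int):
    """Return the nominal (CUDA cores, memory in GiB) of a 1/`fraction` vGPU."""
    return total_cores // fraction, total_mem_gib / fraction

P4_CORES, P4_MEM_GIB = 2560, 8  # published Tesla P4 specs

for frac in (8, 4, 2, 1):
    cores, mem = vgpu_slice(P4_CORES, P4_MEM_GIB, frac)
    print(f"1/{frac} P4 -> ~{cores} CUDA cores, {mem:g} GiB memory")
```

For example, a 1/8 slice corresponds to roughly 320 CUDA cores and 1 GiB of video memory, which is the kind of fine-grained unit the instance family exposes.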


What pain points do users encounter when using conventional GPU cloud servers?

- The granularity of a GPU is too coarse: a single physical GPU grows ever more powerful, yet many applications need only a small share of its computing resources.
- Full-GPU resources hinder automatic scaling: an instance holding a whole physical GPU becomes a "fat node" in service deployment, because the service must fully utilize the GPU. This works against elastic, scalable architecture design and cannot keep up with rapid changes in business load.
- Common GPU instances cannot be migrated online: due to its architecture, a pass-through virtualized GPU instance does not support online migration.

How are lightweight GPU cloud servers different from regular GPU cloud servers?

Lightweight and conventional GPU cloud servers differ in four respects: how the GPU accelerator is presented, service continuity, computing scenarios, and usage and management.

- GPU accelerator presentation: a conventional GPU cloud server instance provides a physical GPU accelerator through device pass-through; a lightweight GPU cloud server instance provides a virtual GPU accelerator through GPU virtualization.
- Service continuity: a conventional GPU cloud server supports only offline migration of jobs; a lightweight GPU cloud server supports online migration of jobs.
- Computing scenarios: a conventional GPU cloud server suits heavy-load GPU-accelerated computing, such as deep learning training and inference, HPC, and heavy graphics workloads; a lightweight GPU cloud server suits light-load GPU-accelerated computing, such as lightweight deep learning inference, deep learning teaching, cloud gaming, and VR/AR.
- Usage and management: both provide elastic compute instances, but a conventional GPU cloud server scales in units of a full physical GPU, while a lightweight GPU cloud server scales in smaller units of GPU resources (for example, 1/8 or 1/4 of a Tesla P4).

What are the technical highlights and technical leadership of VGN5i, and what problems are solved?

Technical highlight: VGN5i lets users create cloud server instances with smaller virtual GPUs on the public cloud. Bringing any leading computing technology to the public cloud means meeting three requirements, and the technical leadership of VGN5i lies in all three: reliability, economy, and ease of use.

- Reliability. A public cloud server is a public service and must provide simple, reliable basic service to all users. Although GPU virtualization is mature in private deployments, it faces two reliability challenges on the public cloud: data security and resource isolation. Neither is usually an issue in a private deployment, where all workloads belong to the same user and security and resource contention are easier to resolve; on the public cloud, both must be solved.
- Economy. Users choose lightweight GPU cloud servers because they want to consume GPU resources at a finer granularity; in essence, they pursue economy. In a private deployment the requirements are known in advance, so the virtualization ratio can be configured for the expected scenarios. On the public cloud, the service must satisfy the scenarios of all users while keeping the scheduling system efficient and continually driving down cost.
- Ease of use. This shows in two aspects: the management interface and usage habits are consistent with other ECS instances, and applications run inside GPU instances the same way as in conventional GPU instances, so there is no learning cost for the user.



How to use a lightweight GPU cloud server?

Using a lightweight GPU instance is as convenient as using an ordinary elastic compute instance. Users can configure and purchase the service through the web console or the OpenAPI, and retain full control of the instance while using it. The instance runs in the Alibaba Cloud computing environment and can be used together with other cloud services. When a business peak arrives, new instances can be created within minutes to accommodate growth. Throughout the use of the virtualized GPU service, users can rely on online consultation and fast fault handling.
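As a sketch of what an OpenAPI purchase might look like, the snippet below assembles the parameter set for an ECS `RunInstances`-style call. The parameter names and the instance type `ecs.vgn5i-m1.large` follow public ECS API conventions, but treat them as illustrative assumptions and check the current API reference before use:

```python
# Illustrative sketch: parameters for creating lightweight GPU instances
# through the ECS OpenAPI. Parameter and instance-type names are assumptions
# based on public ECS API conventions; verify against the current API docs.

def build_run_instances_params(region: str, image_id: str,
                               instance_type: str, amount: int) -> dict:
    """Build the parameter set for an ECS RunInstances-style API call."""
    return {
        "Action": "RunInstances",       # ECS API action that creates instances
        "RegionId": region,
        "ImageId": image_id,
        "InstanceType": instance_type,  # e.g. a 1/8-P4 lightweight GPU type
        "Amount": amount,               # scale out by requesting more instances
    }

params = build_run_instances_params(
    region="cn-hangzhou",
    image_id="your-gpu-image-id",       # placeholder image ID
    instance_type="ecs.vgn5i-m1.large", # assumed lightweight GPU type name
    amount=2,
)
print(params)
```

Because the fine-grained instance type is just another value of `InstanceType`, scaling a lightweight GPU fleet through the API is no different from scaling ordinary ECS instances.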

What are the instances of lightweight GPU cloud servers?

The VGN5i instance, based on the NVIDIA Tesla P4, is currently on sale and provides 1/8 to 1/1 of a virtual GPU accelerator. A VGN6i instance based on the NVIDIA Tesla T4 will be available later, providing 1/16 to 1/1 of a virtual GPU accelerator.

What are the application scenarios of a lightweight GPU cloud server?

With a lightweight GPU cloud server, users can create instances whose computing resources match what each service actually needs. Each lightweight GPU cloud server instance therefore runs a single computing workload, and when a traffic peak arrives that workload can be scaled out horizontally. This makes lightweight GPU instances well suited to batch deployment of AI computing in Internet services, as well as cloud gaming, cloud-based AR/VR applications, and deep learning teaching scenarios.

What is the user value of VGN5i?

The user value of VGN5i includes lower cost for batch deployment of GPU instances, rapid elastic scaling, and higher O&M efficiency.

- Lower batch-deployment cost. In many graphics computing and AI inference scenarios, users do not need strong single-instance GPU performance; they care more about the cost of deploying at scale. Fine-grained virtual GPU instances fit these scenarios and balance users' cost requirements in batch deployment.
- Rapid elastic scaling. With fine-grained virtual GPU instances, users no longer need to deploy services as complex "fat" nodes just to fully utilize a physical GPU. Instead, GPU-dependent services can be decoupled onto separate virtual GPU instance nodes, for example in containers, so "thin" service nodes can be scaled rapidly and flexibly at any time.
- Higher O&M efficiency. Deploying thin service nodes on fine-grained virtual GPU instances simplifies environment configuration and service interfaces. Large-scale AI applications can be deployed with different images and without complex fat nodes, improving O&M efficiency and reducing schedule risk and cost.



This article is original content of the Yunqi Community and may not be reproduced without permission.