Abstract: For consumers, computing power virtualization can effectively reduce the cost of computing power; for equipment vendors and operators, it can greatly improve the utilization of computing resources and reduce equipment operating costs.

Why virtualize computing power?

In recent years, artificial intelligence has developed rapidly, and computing power has become a form of productivity. AI computing centers built on AI clusters have sprung up in many places in China, providing a new urban public resource for governments, enterprises and individuals and becoming the “fertile black soil” of the intelligent world.

Sharing computing resources inevitably raises resource allocation problems. Both large AI infrastructure, such as an AI computing center, and small AI computing resources, such as a single AI computing card, must serve users' diverse demands for AI computing power.

For example, in scenarios with low computing power requirements, an AI model does not need to occupy a whole AI computing card during training or inference; a smaller allocation, such as a 1/4 slice of the card, is enough. Or, in a teaching scenario, one AI server (which may contain one or more cards) needs to be shared by an entire class.

Virtualization technology can easily cope with these problems!

We virtualize the computing resources of an entire card or system into multiple virtual computing devices, and deploy separate virtual machines or containers on them to run AI training or inference services.

For consumers, it can effectively reduce the cost of computing power, while for equipment suppliers or operators, it can greatly improve the utilization of computing power resources and reduce equipment operating costs.

Common virtualization technologies in the industry

Common virtualization technologies in the industry include time slicing and instance-based isolation.

For example, with the introduction of the three-child policy, mothers in the next few years may have to deal with three little monsters at the same time: the eldest needs help with homework, the second wants a picture book read aloud, and the youngest needs milk... Whichever one she attends to first, the other two are left waiting. What to do?

Don’t panic; the sky is not going to fall. Time-slicing virtualization offers a friendly solution for mothers of three: each child gets attention in turn, so no child waits too long and all are treated evenly and impartially:

In essence, the computing resources are divided by time. As in CPU process scheduling, the tasks take turns, and each one occupies all of the physical device's resources for the duration of its time slice.
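The time-slicing idea can be illustrated with a small round-robin simulation. This is only a sketch under stated assumptions: the function name `round_robin`, the task names, and the work units are hypothetical and are not taken from any Huawei or CANN interface.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Simulate time-sliced sharing of one physical device.

    tasks: dict mapping a task name to its remaining work units
           (hypothetical values, for illustration only).
    Returns a list of (task, units_run) pairs in execution order.
    """
    queue = deque(tasks.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        # The running task owns the whole device for at most one time slice.
        run = min(time_slice, remaining)
        schedule.append((name, run))
        if remaining > run:
            # Unfinished work goes to the back of the queue.
            queue.append((name, remaining - run))
    return schedule


schedule = round_robin({"homework": 3, "picture_book": 2, "milk": 1}, time_slice=1)
for name, units in schedule:
    print(name, units)
```

Every task makes progress within one pass over the queue, but all tasks run on the same device in turn, so nothing in this scheme isolates one task's data from another's.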

Virtualization based on compute-instance isolation, by contrast, directly partitions the computing resources themselves. In the teaching example at the beginning of the article, the AI server is divided into N virtual groups that are securely isolated from one another and then shared with the whole class, so that every student can work independently and without interference while multiple virtual machine instances share the same hardware.

Both virtualization technologies can effectively improve the utilization of computing resources, but neither is flawless.

Time slicing cannot provide security isolation between AI tasks. Instance-based partitioning can, but the partition granularity supported by products currently on the market is not fine enough to meet users' demands for finer-grained slices.

The AI computing virtualization technology provided by Huawei offers fine partition granularity together with secure isolation between virtual devices, giving it an advantage in both granularity and security.

Here’s how Huawei does it.

Huawei's AI computing virtualization technology explained

The Ascend 910 AI processor integrates 32 Da Vinci architecture AI Core computing engines, which efficiently execute compute-intensive matrix and vector operators. The Ascend 910 delivers up to 640 TOPS at 8-bit integer precision (INT8) and up to 320 TFLOPS at 16-bit floating-point precision (FP16).
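To get a feel for what a fraction of the processor is worth, the headline figures above can be divided across the 32 AI Cores. The sketch below assumes that peak throughput scales linearly with the number of AI Cores, which the article does not state; the function name `slice_throughput` is hypothetical.

```python
TOTAL_AI_CORES = 32   # from the article
INT8_TOPS = 640       # peak INT8 throughput of the whole processor
FP16_TFLOPS = 320     # peak FP16 throughput of the whole processor


def slice_throughput(cores):
    """Estimate the peak throughput of a slice with the given number of AI Cores.

    Assumption: throughput is proportional to the AI Core count.
    Returns (INT8 TOPS, FP16 TFLOPS).
    """
    if not 1 <= cores <= TOTAL_AI_CORES:
        raise ValueError("cores must be between 1 and 32")
    share = cores / TOTAL_AI_CORES
    return INT8_TOPS * share, FP16_TFLOPS * share


quarter = slice_throughput(8)    # a 1/4 slice
smallest = slice_throughput(1)   # a single AI Core
print(quarter, smallest)
```

Under this linear assumption, a 1/4 slice corresponds to 8 AI Cores and a single AI Core to 1/32 of the peak figures; measured performance on real workloads may differ.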

A single Ascend 910 AI processor provides so much computing power that virtualization is needed to make full use of the hardware's computing resources.

As a heterogeneous computing architecture for AI, CANN does its core job well: it releases hardware performance to a great extent and provides powerful computing support for AI applications. In the latest version, 5.0, it adds “computing power virtualization based on AI Core partitioning”, with which a single processor can be divided into up to 32 slices. This improves hardware resource utilization.

The following figure shows the computing power virtualization framework of CANN 5.0:

Based on this framework, CANN supports standalone or mixed deployment of virtual machines and containers, and partitioning into computing power units of different sizes, to achieve flexible partitioning and isolation of computing power, memory, and bandwidth.

The following table shows the typical configurations of computing power virtualization based on AI Core partitioning supported by the Ascend 910:

For example, in the teaching scenario, some students need little computing power while others need a lot. Computing power virtualization based on AI Core partitioning can flexibly slice an Ascend 910 AI processor to match developers' diverse computing power demands:

Computing power is the grain of the intelligent age, and every bit of it is precious. Thanks to CANN’s ultra-fine-grained partitioning mechanism, computing power can be allocated more sensibly in scenarios with small computing demands, so that precious computing resources are more fully utilized.
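The flexible slicing described above can be pictured as handing out disjoint sets of AI Cores to users with different needs. The following is a minimal sketch, not CANN's actual mechanism: the class `AICorePartitioner`, its methods, and the student names are all hypothetical, and the only figure taken from the article is the total of 32 AI Cores.

```python
class AICorePartitioner:
    """Hypothetical allocator that hands out disjoint sets of AI Core ids."""

    def __init__(self, total_cores=32):
        self.free = list(range(total_cores))  # core ids not yet assigned
        self.owners = {}                      # virtual device name -> core ids

    def allocate(self, name, cores):
        """Grant `cores` AI Cores to the virtual device `name`."""
        if name in self.owners:
            raise ValueError(f"{name} already has a slice")
        if not 1 <= cores <= len(self.free):
            raise ValueError(f"cannot grant {cores} cores, {len(self.free)} free")
        granted, self.free = self.free[:cores], self.free[cores:]
        self.owners[name] = granted
        return granted

    def release(self, name):
        """Return the cores of `name` to the free pool."""
        self.free = sorted(self.free + self.owners.pop(name))


partitioner = AICorePartitioner()
small = partitioner.allocate("student_a", 2)    # light workload
medium = partitioner.allocate("student_b", 8)   # a 1/4 slice
large = partitioner.allocate("student_c", 16)   # a 1/2 slice
print(len(partitioner.free), "cores still free")
```

Because each core id belongs to at most one virtual device at a time, slices of different sizes can coexist on one processor without overlapping.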

CANN not only partitions computing power, but also provides security isolation among virtual devices, including:

  • Supports memory isolation between virtual devices

HBM and DDR memory are automatically partitioned and isolated according to the computing power configuration.

  • Supports data isolation between virtual devices

User data is automatically processed and isolated per virtual device, based on virtual device identifiers.

  • Supports service fault management and isolation between virtual devices

The fault information of each virtual device is reported to the corresponding VM or container.
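Two of the isolation properties listed above can be sketched in a few lines. The example assumes that memory is divided in proportion to each virtual device's share of AI Cores, which is one plausible reading of "according to the computing power configuration" and is not confirmed by the article. The function names, device names, the 32 GB figure, and the fault messages are all hypothetical.

```python
def split_memory(total_mem_gb, core_shares, total_cores=32):
    """Divide device memory among virtual devices.

    Assumption: each virtual device receives memory in proportion
    to the number of AI Cores it was configured with.
    """
    return {
        vdev: total_mem_gb * cores / total_cores
        for vdev, cores in core_shares.items()
    }


def route_faults(faults, device_owner):
    """Deliver each fault only to the VM or container that owns the device.

    faults: list of (virtual_device, message) pairs.
    device_owner: dict mapping a virtual device to its VM or container.
    """
    inbox = {owner: [] for owner in device_owner.values()}
    for vdev, message in faults:
        inbox[device_owner[vdev]].append(message)
    return inbox


memory = split_memory(32, {"vdev0": 8, "vdev1": 16})
inbox = route_faults(
    [("vdev0", "operator timeout"), ("vdev1", "memory error")],
    {"vdev0": "container_a", "vdev1": "vm_b"},
)
print(memory)
print(inbox)
```

Each owner sees only the faults raised by its own virtual device, which mirrors the per-device fault reporting described in the list.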

In addition to partitioning a single Ascend 910 AI processor at the AI Core level, CANN also supports allocating computing power at whole-processor granularity in an AI server or cluster equipped with multiple Ascend 910 AI processors.

From a technical point of view, CANN 5.0’s computing power virtualization offers finer-grained partitioning with secure isolation, giving industry developers more choices.

From a performance point of view, performance in virtualized and non-virtualized scenarios is the same, so the added flexibility does not come at the expense of user experience.

Final words

CANN 5.0’s work on computing power virtualization can effectively reduce management costs and improve system utilization and security.

From now on, individual cloud users and small business customers can deploy AI applications at the lowest cost by purchasing only the resources and computing power they need.

In the intelligent world of the future, AI must be an inclusive technology available to everyone, and computing power must be an everyday resource within everyone's reach.

Thanks to CANN 5.0, AI is becoming “affordable”, inclusive AI.

The future is not far off; it is already on its way. Are you ready?

Visit the Ascend community website for more information.
