Background

qGPU is a GPU sharing technology launched by Tencent Cloud. It allows multiple containers to share a single GPU card while providing strong isolation of GPU memory and computing power between containers. This lets GPU cards be consumed at a finer granularity while still ensuring service security, increasing GPU utilization and reducing customer costs.

qGPU on TKE relies on the Nano GPU scheduling framework provided by Tencent Cloud TKE to implement fine-grained scheduling of GPU computing power and GPU memory, supporting both multi-container GPU sharing and cross-GPU resource allocation. At the same time, it relies on the underlying qGPU isolation technology to strongly isolate GPU memory and computing power, so that services sharing a GPU interfere with each other as little as possible in performance and resource usage.

Functional advantages

The qGPU solution allows a single NVIDIA GPU card to be shared by multiple containers while scheduling tasks on it more efficiently. It provides the following capabilities:

Flexibility: users can freely configure the GPU memory size and computing power ratio allocated to each container

Cloud native: supports standard Kubernetes and is compatible with the NVIDIA Docker solution

Compatibility: no image modification, no CUDA library replacement, and no service re-compilation; deployment is easy and fully transparent to services

High performance: operates on GPU devices at the lowest layer, with highly efficient scheduling and near-zero throughput loss

Strong isolation: supports strict isolation of GPU memory and computing power, so services sharing a GPU do not affect each other
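To illustrate the flexibility point above, a Pod that requests a slice of a shared GPU might look like the sketch below. The extended resource names `tke.cloud.tencent.com/qgpu-memory` (GiB of GPU memory) and `tke.cloud.tencent.com/qgpu-core` (percentage of one card's computing power) are assumptions based on TKE's qGPU conventions, not confirmed by this article; check the extended resources actually advertised by your cluster's nodes before relying on them.

```yaml
# Hypothetical Pod spec: request 5 GiB of GPU memory and 30% of one
# card's computing power via qGPU extended resources (names assumed).
apiVersion: v1
kind: Pod
metadata:
  name: qgpu-demo
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:11.4.3-base-ubuntu20.04  # unmodified image, per the compatibility point
    resources:
      requests:
        tke.cloud.tencent.com/qgpu-memory: "5"
        tke.cloud.tencent.com/qgpu-core: "30"
      limits:
        tke.cloud.tencent.com/qgpu-memory: "5"
        tke.cloud.tencent.com/qgpu-core: "30"
```

Because these are standard Kubernetes extended resources, requests and limits must be equal, and the scheduler simply subtracts them from the node's advertised capacity.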

Technical architecture

qGPU on TKE uses the Nano GPU scheduling framework, which extends the Kubernetes scheduling mechanism to schedule both GPU computing power and GPU memory resources. It also relies on the Nano GPU container allocation mechanism to support fine-grained GPU card scheduling, sharing of one GPU card by multiple containers, and cross-card GPU allocation for a single container.
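The multi-container sharing described above can be sketched as a single Pod whose containers each claim a slice of GPU memory and compute. This is an illustrative sketch only: the resource names are the same assumed ones as elsewhere in this article, and whether both containers land on the same physical card is decided by the Nano GPU allocation mechanism, not by anything in this spec.

```yaml
# Hypothetical sketch: two containers, each requesting a GPU slice.
# The Nano GPU allocator decides card placement; resource names assumed.
apiVersion: v1
kind: Pod
metadata:
  name: qgpu-sharing-demo
spec:
  containers:
  - name: worker-a
    image: nvidia/cuda:11.4.3-base-ubuntu20.04
    resources:
      limits:
        tke.cloud.tencent.com/qgpu-memory: "4"   # 4 GiB of GPU memory
        tke.cloud.tencent.com/qgpu-core: "50"    # 50% of a card's compute
  - name: worker-b
    image: nvidia/cuda:11.4.3-base-ubuntu20.04
    resources:
      limits:
        tke.cloud.tencent.com/qgpu-memory: "4"
        tke.cloud.tencent.com/qgpu-core: "50"
```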

qGPU isolation directly uses the underlying hardware features of NVIDIA GPUs to achieve fine-grained computing power isolation. This breaks through the limitation of traditional CUDA API hijacking schemes, which can only isolate computing power at CUDA kernel granularity, and provides a better QoS guarantee.

Customer benefits

  1. Multiple tasks flexibly share a GPU, improving GPU utilization
  2. GPU resources are strongly isolated, so services sharing a GPU do not affect each other
  3. Fully Kubernetes-native, with zero adaptation cost for services

Future plans

● Fine-grained resource monitoring: qGPU on TKE will support Pod- and container-level GPU usage collection, enabling fine-grained resource monitoring and integration with GPU elasticity capabilities

● Online-offline co-location: qGPU on TKE will support co-locating high-priority online services with low-priority offline services to maximize GPU utilization

● qGPU computing power pooling: pooling GPU computing power on the basis of qGPU, decoupling CPU and memory resources from heterogeneous computing resources

Apply for the beta

qGPU is now open for a free beta. To apply, add the Tencent Cloud Native assistant (ID: TKEplatform) with the note "qGPU beta application"!

About us

For more cloud native cases and knowledge, follow the official account of the same name, [Tencent Cloud Native]~

Bonus: reply "Manual" to the official account to get the "Tencent Cloud Native Roadmap Manual" & "Best Practices of Tencent Cloud Native"~