First, let's compare the different types of AI servers. The two-dimensional diagram above gives a simple comparison of servers with different architectures: from top left to bottom right are the CPU, GPU, FPGA, TPU, and ASIC. The horizontal axis is performance, which improves toward the right; the vertical axis is programmability/flexibility. The ASIC has the best performance because the algorithm is hardened onto the chip; with the algorithm essentially fixed, performance is maximized, but programmability and flexibility are weakest. The CPU is the most flexible and programmable but has the lowest performance; the GPU is generally less flexible than the CPU but performs better, followed in order by the FPGA, TPU, and ASIC. In an actual selection, power consumption, cost, performance, real-time requirements, and other factors all need to be weighed, especially for special-purpose processors. If the algorithm is already fixed and very simple, an ASIC is worth considering for its good performance and low power consumption; for training or general-purpose scenarios, the GPU is the better choice.

Basic principles for choosing a GPU server

Before introducing the basic principles for selecting a GPU server, this section describes the common GPUs and GPU server types.

By bus interface type, common GPUs can be divided into two groups: NV-Link interface GPUs and traditional PCI-E bus GPUs.

The typical NV-Link GPU is the NVIDIA V100, which uses the SXM2 interface; the DGX-2 provides SXM3 sockets. NV-Link GPU servers fall into two types: the DGX supercomputers designed by NVIDIA itself, and NV-Link servers designed by partners. A DGX supercomputer provides not only the hardware but also the associated software and services.
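
As a quick sanity check of whether a server's GPUs are actually wired over NV-Link, here is a minimal sketch using the pynvml bindings (an assumption: the nvidia-ml-py package is installed; lanes the part does not have simply raise an error):

```python
# Sketch: probe the NV-Link lane states of GPU 0 with pynvml.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
    try:
        state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
        print(f"link {link}: {'active' if state else 'inactive'}")
    except pynvml.NVMLError:
        break  # no further NV-Link lanes on this device (e.g. PCI-E parts)
pynvml.nvmlShutdown()
```

On a PCI-E-only card the very first query fails, which is itself a usable signal.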

For traditional PCI-E bus GPUs, there are currently several mainstream products: the PCI-E versions of the V100, the P40 (P for the previous-generation Pascal architecture) and the P4, as well as the latest Turing-architecture T4. Among these, the P4 and T4 are relatively thin, occupying only one slot, and are usually used for inference; mature inference and recognition deployments already exist for them.

GPU servers on the traditional PCI-E bus are also divided into two categories. One is OEM servers from brands such as Dawning (Sugon), Inspur, and Huawei; the other is non-OEM servers, which themselves span many categories. Beyond this classification, server selection must also consider performance indicators such as precision, video memory type, video memory capacity, and power consumption. Some deployments additionally need water cooling or noise reduction, or have special requirements on temperature or mobility, which call for specialized servers.
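
As a concrete way to check several of these indicators (model, memory capacity, power limit) on a machine in hand, here is a minimal inventory sketch, again assuming the nvidia-ml-py package:

```python
# Sketch: inventory the GPUs in a server with pynvml
# (reports model name, total memory, and enforced power limit).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):          # older pynvml returns bytes
            name = name.decode()
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)            # bytes
        power = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle)  # milliwatts
        print(f"GPU {i}: {name}, {mem.total / 1024**3:.1f} GiB, "
              f"{power / 1000:.0f} W limit")
finally:
    pynvml.nvmlShutdown()
```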

When selecting a GPU server, first consider the business requirements and choose an appropriate GPU model. In HPC (high-performance computing), precision also drives the choice: some HPC workloads require double precision, in which case the P40 and P4 are unsuitable and only the V100 or P100 can be used. There are also requirements on memory capacity; for example, oil and petrochemical exploration computations demand relatively large GPU memory. Some workloads have bus-standard requirements as well, so the GPU model should always be chosen according to the business needs.
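
To make the precision point concrete, the sketch below (PyTorch assumed; the `needs_fp64` flag and the whitelist are illustrative, not an official capability list) refuses to place a double-precision workload on anything other than the FP64-capable parts named above:

```python
# Sketch: gate an FP64 workload on the detected GPU model.
import torch

FP64_CAPABLE = ("V100", "P100")  # strong-FP64 parts named in this article

def pick_device(needs_fp64: bool) -> torch.device:
    if not torch.cuda.is_available():
        return torch.device("cpu")
    name = torch.cuda.get_device_name(0)
    if needs_fp64 and not any(part in name for part in FP64_CAPABLE):
        raise RuntimeError(f"{name}: use a V100/P100-class GPU for FP64 work")
    return torch.device("cuda:0")

device = pick_device(needs_fp64=True)
x = torch.randn(512, 512, dtype=torch.float64, device=device)
print((x @ x).trace())
```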

GPU servers are also widely applied in the artificial intelligence field. Teaching scenarios place high demands on GPU virtualization: depending on class size, a teacher may need to virtualize a GPU server into 30 or even 60 virtual GPUs. Batch training, by contrast, places high demands on raw GPU power, and the V100 is usually used for training. After a model is trained, inference is needed, for which the P4 or T4 is generally used, with the V100 in a few cases.
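
Inference parts such as the P4 and T4 are usually run at reduced precision; the following sketch (PyTorch assumed; the model and input are placeholders) shows the FP16 autocast pattern that such cards target:

```python
# Sketch: FP16 inference with autocast, the typical workload for
# P4/T4-class inference GPUs (placeholder model and input).
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model = model.to(device).eval()
x = torch.randn(32, 512, device=device)

with torch.inference_mode():
    if device == "cuda":
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            logits = model(x)
    else:
        logits = model(x)  # plain FP32 fallback on CPU
print(logits.shape)
```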

After the GPU model is selected, consider which GPU server to use. At this point, the following factors need to be weighed:

First, for edge servers, choose the corresponding T4 or P4 servers according to the required quantity, and also consider the usage scenario, such as checkpoint cameras at railway stations, airports, or public-security sites. For inference at the central end, V100 servers may be needed, and throughput, usage scenario, and quantity all have to be considered.
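
Since throughput is the deciding metric for central-end inference, here is a rough benchmarking sketch (model, batch size, and iteration counts are placeholders; a real evaluation should use the production model and input pipeline):

```python
# Sketch: rough throughput estimate (samples/second) on a candidate GPU.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
model = model.to(device).eval()
batch = torch.randn(64, 1024, device=device)

with torch.inference_mode():
    for _ in range(10):              # warm-up
        model(batch)
    if device == "cuda":
        torch.cuda.synchronize()     # drain queued kernels before timing
    start = time.perf_counter()
    iters = 100
    for _ in range(iters):
        model(batch)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"~{iters * batch.shape[0] / elapsed:.0f} samples/s on {device}")
```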


Second, the customer's user group and IT operations and maintenance capability need to be considered. Large companies like BAT have strong operations capability and will choose general-purpose PCI-E servers. For customers with weaker IT operations capability, who pay more attention to the data and to data annotation (what we might call data scientists), the criteria for selecting a GPU server will be different.


Third, you need to consider the value of the accompanying software and services.


Fourth, the maturity and engineering efficiency of the overall GPU cluster system should be considered. For example, the DGX integrated GPU supercomputer ships a very mature stack, from the operating system and drivers at the bottom up through Docker and the layers above, all fixed and optimized, so its efficiency is relatively high.