This article is a minimal step-by-step guide to installing CUDA, Python, and PyTorch on Ubuntu 18.04, Ubuntu 16.04, and Windows 10.

Preparation

Example environment:

Ubuntu18.04

The following tools need to be installed:

  1. NVIDIA driver (lets the host communicate with the GPU)
  2. CUDA Toolkit (the dependency for GPU acceleration)
  3. Miniconda (installs Python and manages Python environments)
  4. GPU versions of PyTorch, TensorFlow, and MXNet

Procedure

Install the NVIDIA driver

Ubuntu 18.04, run in Bash:

sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get install --no-install-recommends nvidia-driver-450

Ubuntu 16.04, run in Bash:

sudo apt-get install gnupg-curl
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt-get install --no-install-recommends nvidia-418

Then reboot the system and run nvidia-smi to check that the driver status table is displayed correctly. The driver version reported by nvidia-smi should match the one installed above (for example, 450 for nvidia-driver-450).
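
The same check can be scripted. The following is only a sketch: it assumes nvidia-smi is on the PATH and uses its --query-gpu option, and the helper names parse_gpu_rows and query_driver are made up for this example.

```python
import shutil
import subprocess


def parse_gpu_rows(csv_text):
    """Parse 'driver_version, name' CSV rows into (driver, name) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        driver, name = [field.strip() for field in line.split(",", 1)]
        rows.append((driver, name))
    return rows


def query_driver():
    """Return [(driver_version, gpu_name), ...], or None if nvidia-smi is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # driver not installed, or the system has not been rebooted yet
    result = subprocess.run(
        [exe, "--query-gpu=driver_version,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    print(query_driver())
```

If the first field of each row starts with 450, the driver installed above is the one in use.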

Install the CUDA Toolkit

This section uses CUDA 11.0 and Ubuntu 18.04 as the example.

  1. Search Baidu for: CUDA 11.0

Open the first result, "CUDA Toolkit 11.0 Download | NVIDIA Developer". The site is hosted outside China, so if it does not load you may need a proxy.

Ubuntu 16.04: choose Ubuntu -> x86_64 -> deb (local) on the download page:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda-repo-ubuntu1604-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1604-11-0-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

Ubuntu 18.04: choose deb (local), then copy the installation instructions from the page and run them in Bash:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-0-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

Song's note: the main difficulty here is step 3 (the third command), which downloads the complete installation package, about 2 GB. Readers familiar with Bash will recognize that this command simply downloads the file at the link developer.download.nvidia.com/compute/cud… You can instead paste the link into a download manager such as Xunlei (Thunder) to speed up the download, copy the file into the directory where Bash is running (step 3 then no longer needs to be run in Bash), and continue with step 4 to install.
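
If the package was downloaded with an external tool, it is worth confirming that the file really is in the current directory before running step 4. A rough sketch follows; the helper name installer_ready and the 1 GB lower bound are assumptions chosen for illustration (the text above only says the package is about 2 GB).

```python
import os

# File name taken from the Ubuntu 18.04 wget command above; adjust for your release.
INSTALLER = "cuda-repo-ubuntu1804-11-0-local_11.0.2-450.51.05-1_amd64.deb"


def installer_ready(path=INSTALLER, min_bytes=1_000_000_000):
    """Return True if the file exists and is at least min_bytes in size.

    min_bytes is an assumed lower bound, meant only to catch a missing
    or obviously truncated download.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


if __name__ == "__main__":
    if installer_ready():
        print("Installer found; continue with step 4 (sudo dpkg -i ...).")
    else:
        print("Installer missing or incomplete in", os.getcwd())
```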

Miniconda installation

The Tsinghua mirror is recommended: mirrors.tuna.tsinghua.edu.cn/anaconda/mi…

At the bottom of the page, download the latest version for your system (be sure to pick the x86_64 build), for example Miniconda3-py38_4.9.2-Linux-x86_64.sh

Run the following command under bash:

bash Miniconda3-py38_4.9.2-Linux-x86_64.sh

Answer yes when prompted and accept the defaults for everything else. After the installation is complete, open a new terminal for conda to take effect.
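
A quick way to confirm that the new terminal can see conda is sketched below. It assumes nothing beyond conda being on the PATH; the function name conda_version is made up for this example.

```python
import shutil
import subprocess


def conda_version():
    """Return the output of `conda --version`, or None if conda is not on PATH."""
    exe = shutil.which("conda")
    if exe is None:
        return None  # open a new terminal or re-check the installer prompts
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Fall back to stderr in case the version string is printed there.
    return (result.stdout or result.stderr).strip() or None


if __name__ == "__main__":
    print(conda_version())
```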

Install the GPU versions of PyTorch and TensorFlow

See: Switching pip and conda sources on Windows and Ubuntu

  1. First, switch conda and pip to mirrors inside China to speed up downloads. Run in Bash:
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --append channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/fastai/
conda config --append channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --append channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda/
conda config --set show_channel_urls yes
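
The order of these commands matters, because --add puts a channel at the top of the list (highest priority) while --append puts it at the bottom. The sketch below only simulates that ordering in plain Python; it does not call conda, the function name apply_channel_ops is made up, and it assumes a fresh configuration whose channel list contains only defaults.

```python
def apply_channel_ops(ops, initial=("defaults",)):
    """Simulate how conda orders channels for a sequence of config commands.

    ops is a list of (flag, url) pairs, where flag is "--add" or "--append".
    """
    channels = list(initial)
    for flag, url in ops:
        if url in channels:
            channels.remove(url)  # an existing entry is moved, not duplicated
        if flag == "--add":
            channels.insert(0, url)  # top of the list, highest priority
        elif flag == "--append":
            channels.append(url)  # bottom of the list, lowest priority
        else:
            raise ValueError(f"unsupported flag: {flag}")
    return channels


MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/anaconda"
ops = [
    ("--add", f"{MIRROR}/cloud/conda-forge/"),
    ("--add", f"{MIRROR}/pkgs/free/"),
    ("--add", f"{MIRROR}/pkgs/main/"),
    ("--append", f"{MIRROR}/cloud/fastai/"),
    ("--append", f"{MIRROR}/cloud/pytorch/"),
    ("--append", f"{MIRROR}/cloud/bioconda/"),
]

for rank, url in enumerate(apply_channel_ops(ops), start=1):
    print(rank, url)
```

Under these assumptions pkgs/main ends up with the highest priority and bioconda with the lowest, with defaults sitting between the two groups.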

If you run into CondaHTTPError: HTTP 000 CONNECTION FAILED, see: Solving CondaHTTPError: HTTP 000 CONNECTION FAILED on Windows and Ubuntu

  2. Create a Python environment for deep learning with conda. Run in Bash:
conda create -n dl_py37 python=3.7

Activate the environment. The prompt changes from (base) root@b9fc5be9c7f1:# to (dl_py37) root@b9fc5be9c7f1:#

conda activate dl_py37
  3. Install PyTorch 1.7. Run in Bash (cudatoolkit 10.1 is recommended so that TensorFlow 2.3 is also supported):
conda install pytorch torchvision torchaudio cudatoolkit=10.1
  4. Install TensorFlow 2.3. TensorFlow 2.3 supports the GPU by default, so there is no need to specify a GPU package:
pip install tensorflow==2.3
  5. Install MXNet. cudatoolkit=10.1 corresponds to the cu101 package:
pip install mxnet-cu101==1.7
  6. Test PyTorch, TensorFlow, and MXNet

See: "AI Practices": How to test whether the GPU version of a deep learning framework is installed correctly (TensorFlow, PyTorch, MXNet, PaddlePaddle)

1) TensorFlow

The test method is the same for TensorFlow 1.x and TensorFlow 2.x. The code is as follows:

import tensorflow as tf
print(tf.test.is_gpu_available())

Save the code above as a .py file and run it in the environment under test. Most of the output is log information; what matters is the final True, which indicates that the test succeeded.

2020-09-28 15:43:03.197710: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-09-28 15:43:03.204525: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2070 with Max-Q Design major: 7 minor: 5 memoryClockRate(GHz): 1.125
2020-09-28 15:43:03.235352: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-09-28 15:43:03.242823: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-09-28 15:43:03.261932: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-09-28 15:43:03.268757: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-09-28 15:43:03.297478: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-09-28 15:43:03.315410: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-09-28 15:43:03.330562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-09-28 15:43:03.332846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-28 15:43:05.198465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-28 15:43:05.200423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:…
2020-09-28 15:43:05.201540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-09-28 15:43:05.203863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 6306 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 7.5)
True

The final True indicates that the test succeeded. The log also reveals a good deal of information about the GPU:

GPU model: name: GeForce RTX 2070 with Max-Q Design

CUDA version: Successfully opened dynamic library cudart64_100.dll (10.0)

cuDNN version: Successfully opened dynamic library cudnn64_7.dll (7.x)

Number of GPUs: Adding visible gpu devices: 0 (one GPU, with index 0)

GPU memory: /device:GPU:0 with 6306 MB memory (on an 8 GB card)
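
The same information can be pulled out of a saved log automatically. This is a sketch only: the regular expressions assume the Windows-style log lines shown above (DLL names such as cudart64_100.dll), and the function name summarize_tf_log is made up for this example.

```python
import re

# Patterns for the Windows-style log lines shown above (assumed format).
DLL = re.compile(r"Successfully opened dynamic library (\w+?)64_(\d+)\.dll", re.I)
NAME = re.compile(r"name: (.+?) major: (\d+) minor: (\d+)", re.I)
MEMORY = re.compile(r"/device:GPU:(\d+) with (\d+) MB memory", re.I)


def summarize_tf_log(log_text):
    """Extract GPU name, CUDA/cuDNN versions and usable memory from a TF log."""
    # Map library name to version digits, e.g. {"cudart": "100", "cudnn": "7"}.
    libs = {name.lower(): digits for name, digits in DLL.findall(log_text)}
    summary = {}
    name = NAME.search(log_text)
    if name:
        summary["gpu_name"] = name.group(1)
        summary["compute_capability"] = f"{name.group(2)}.{name.group(3)}"
    if "cudart" in libs:
        digits = libs["cudart"]  # "100" stands for CUDA 10.0
        summary["cuda_version"] = f"{digits[:-1]}.{digits[-1]}"
    if "cudnn" in libs:
        summary["cudnn_major"] = libs["cudnn"]
    memory = MEMORY.search(log_text)
    if memory:
        summary["memory_mb"] = int(memory.group(2))
    return summary
```

On Linux the libraries are named differently (for example libcudart.so), so the DLL pattern would need to be adapted there.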

2) PyTorch

Like TensorFlow, PyTorch provides a GPU test interface. PyTorch's GPU test code is as follows:

import torch
print(torch.cuda.is_available())

Save the code above as a .py file and run it in the environment under test. An output of True indicates that the test succeeded:

True
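
For a little more detail than a single True, PyTorch can also report which CUDA version it was built for and which device it sees. The following is a sketch, not part of the original test; the function name torch_gpu_report is made up, and the function returns None when torch is not installed.

```python
import importlib.util


def torch_gpu_report():
    """Return basic CUDA details from PyTorch, or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    report = {
        "cuda_available": torch.cuda.is_available(),
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        report["device_0_name"] = torch.cuda.get_device_name(0)
    return report


print(torch_gpu_report())
```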

As you can see, PyTorch's output is much cleaner. TensorFlow's log output can also be controlled.
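
One common way to quiet the TensorFlow log is the TF_CPP_MIN_LOG_LEVEL environment variable, which has to be set before TensorFlow is imported. The sketch below assumes TensorFlow 2.1 or later (for tf.config.list_physical_devices); the function name tensorflow_gpus is made up, and the function returns None when TensorFlow is not installed.

```python
import importlib.util
import os

# Must be set before TensorFlow is imported: 0 = all messages, 1 = hide INFO,
# 2 = hide INFO and WARNING, 3 = hide INFO, WARNING and ERROR.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"


def tensorflow_gpus():
    """Return the names of GPUs visible to TensorFlow, or None if it is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # An empty list means TensorFlow found no usable GPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(tensorflow_gpus())
```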

3) MXNet

MXNet is tested differently from PyTorch and TensorFlow, because MXNet has no GPU test interface (or I could not find one). The MXNet GPU test therefore uses a try/except approach: it tries to create an array on the GPU and treats an exception as failure. The key call is as follows:

import mxnet as mx
_ = mx.nd.array(1, ctx=mx.gpu(0))

Save the test as a .py file and run it in the environment under test. If the call completes without raising an exception (reported as True), the test succeeded.
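
The try/except approach described above can be written out in full roughly as follows. This is a reconstruction under assumptions rather than the original script: the function name mxnet_gpu_ok is made up, a list is passed to mx.nd.array instead of a bare scalar, and asnumpy() is added to force the GPU work to finish so that any error surfaces inside the try block. The function returns None when mxnet is not installed.

```python
import importlib.util


def mxnet_gpu_ok(device_id=0):
    """Return True if MXNet can create an array on the given GPU, False if it
    raises an MXNet error, or None if mxnet is not installed."""
    if importlib.util.find_spec("mxnet") is None:
        return None
    import mxnet as mx

    try:
        # Create a small array on the GPU and copy it back to the host.
        _ = mx.nd.array([1], ctx=mx.gpu(device_id)).asnumpy()
        return True
    except mx.base.MXNetError:
        return False


print(mxnet_gpu_ok())
```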