preface

Sometimes the production environment cannot connect to the Internet. In order to make the AI application run on the GPU machine as a container, there are a lot of software to install, and the software dependency is troublesome. Here a feasible installation process is recorded.

Some instructions

  • The physical machine only needs to have nVIDIA drivers installed, not CUDA, in the image.
  • In order for NVIDIA to penetrate the Docker container to the host, the following four plug-ins need to be installed. After installing these plug-ins, run systemctl restart docker to restart the docker service.
Yum install nvidia - container - toolkit - 1.7.0-1. X86_64. RPM nvidia - container - the runtime - 3.7.0-1. Noarch. RPM Libnvidia - container - tools - 1.7.0-1. X86_64. RPM libnvidia - container1-1.7.0-1. X86_64. RPMCopy the code

Install the graphics card driver

Driver dependent package

  • DKMS, select the appropriate version, such as DKMS – 2.2.0.3-30. Git. 7 c3e7c5. El7. Noarch. RPM.

Driver package

Go to the official website to select the corresponding graphics card and operating system version, and click Download. www.nvidia.cn/Download/in…

For example, the downloaded package is nvidia-driver-local-repo-rhel7-450.172.01-1.0-1.x86_64. RPM

Then execute the following commands to install.

RPM -ivh nvidia-driver-local-repo-rhel7-450.172.01-1.0-1.x86_64. RPM yum clean all yum install cuda-driversCopy the code

Finally, run the nvidia-smi command to see if it is successful. Success should look like the following figure. (Note: If the installation is correct but the command is not executed successfully, restart the server.)

Install the docker

Download the operating system version of the production environment from the local docker, for example, if the production environment is centos7.3, the next centos7.3 image. The general idea is to download the dependent packages of Docker installation from the local Centos7.3 container, and then upload these offline packages to the production system for offline installation.

Build the installation software in the local environment

  1. Run the following command to create a directory to store the installation package.
mkdir -p /home/package/docker
mkdir -p /home/package/docker/Docker_install_RPM
mkdir -p /home/package/docker/yum-utils
mkdir -p /home/package/docker/nvidia-docker-plugin
Copy the code
  1. Download the yum-utils package to the yum-utils directory.
cd /home/package/docker/yum-utils
yum install -y yum-utils --downloadonly --downloaddir ./
Copy the code
  1. EPLE source = EPLE source = EPLE source = EPLE source = EPLE source = EPLE source = EPLE source = EPLE Download the EPLE into the docker directory and install it.
cd /home/package/docker/
yum install -y epel-release --downloadonly --downloaddir ./
rpm -ivh epel-release-7-11.noarch.rpm
Copy the code
  1. Download the docker package to the Docker_install_RPM directory
cd /home/package/docker/Docker_install_RPM
yum -y install docker-ce --downloadonly --downloaddir ./

Copy the code
  1. Download nvidia-Docker and run the plugin in the nvidia-Docker-plugin directory.
cd /home/package/docker/nvidia-docker-plugin distribution=$(. /etc/os-release; echo $ID$VERSION_ID) curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo yum -y install nvidia-container-toolkit --downloadonly --downloaddir ./ yum -y install nvidia-container-runtime --downloadonly --downloaddir ./ yum -y install libnvidia-container-tools --downloadonly  --downloaddir ./ yum -y install libnvidia-container1 --downloadonly --downloaddir ./Copy the code
  1. Write the offline installation script docker_offline_install.sh in the docker directory.
vi docker_offline_install.sh
Copy the code
#! / bin/bash echo "# # # # # # # # # # # # # # # # # # # # # 1 start install yum - utils# # # # # # # # # # # # # # # # # # # # # # # # # # # # #" RPM - the ivh yum - utils / * -- nodeps -- force Echo "# # # # # # # # # # # # # # # # # # # # # 2 began to install the epel repository file # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #" RPM - the ivh epel - release - 7-11. Noarch. RPM echo "# # # # # # # # # # # # # # # # # # # # # 3 began to install Docker # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #" RPM - the ivh Docker_install_RPM / * -- nodeps -- force Echo "# # # # # # # # # # # # # # # # # # # # # 4 to install Nvidia - Docker plug-in # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #" RPM - the ivh Nvidia - Docker - the plugin / * -- nodeps -- force echo "# # # # # # # # # # # # # # # # # # # # # 5 start Docker service # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #" systemctl start Docker Echo "# # # # # # # # # # # # # # # # # # # # # 6 Docker version to see # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #" Docker versionCopy the code
  1. The final directory structure is as follows.

  1. Package all files.
tar czvf docker.tar.gz /home/package/docker/
Copy the code
  1. Copy the compressed package to the physical machine.
docker cp magical_wescoff:/home/package/docker.tar.gz d:/
Copy the code

The production environment is installed offline

  1. Copy the compressed package to the production environment, decompress it, and run the docker_offline_install.sh script to install it offline.
tar -xzvf docker.tar.gz
cd home/package/docker
chmod +x docker_offline_install.sh
./docker_offline_install.sh
Copy the code
  1. If the following error is reported, it is likely that the docker container nVIDIA penetration into the host four plug-ins are not installed properly.
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
Copy the code

You can reinstall it with the following command.

Yum install nvidia - container - toolkit - 1.7.0-1. X86_64. RPM nvidia - container - the runtime - 3.7.0-1. Noarch. RPM Libnvidia - container - tools - 1.7.0-1. X86_64. RPM libnvidia - container1-1.7.0-1. X86_64. RPMCopy the code

Remember to restart the Docker service after installing it.

systemctl restart docker
Copy the code

This public account focuses on artificial intelligence, reading and feelings, talk about mathematics, computer science, distributed, machine learning, deep learning, natural language processing, algorithms and data structures, Java depth, Tomcat kernel, etc.

Author profile: Seaboat, good at artificial intelligence, computer science, mathematical principles, basic algorithms. Published books: Anatomy of Tomcat Kernel Design, Illustrated Data Structure and Algorithm, Popular Science of Artificial Intelligence Principles, Illustrated Java Concurrency Principles.