MindX DL (Ascend Deep Learning Component) is a reference design for a deep learning component that supports the Atlas 800 training server and the Atlas 800 inference server. It provides basic functions such as Ascend AI processor resource management and monitoring, optimized scheduling of Ascend AI processors, and generation of the collective communication configuration for distributed training, so that partners can quickly build a deep learning platform.

The operating system is Ubuntu 18.04, and the CPU is an ARM-architecture processor developed by Huawei.

One, preparation before installation

1. Configure the apt network sources

hello@ubuntu:/etc/apt$ sudo cp sources.list~ sources.list
hello@ubuntu:/etc/apt$ cat sources.list
#
# deb cdrom:[Ubuntu-Server 18.04.5 LTS _Bionic Beaver_ - Release arm64 (20200810)]/ bionic main restricted
#deb cdrom:[Ubuntu-Server 18.04.5 LTS _Bionic Beaver_ - Release arm64 (20200810)]/ bionic main restricted
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
deb http://cn.ports.ubuntu.com/ubuntu-ports/ bionic main restricted
# deb-src http://cn.ports.ubuntu.com/ubuntu-ports/ bionic main restricted
## Major bug fix updates produced after the final release of the
## distribution.
deb http://cn.ports.ubuntu.com/ubuntu-ports/ bionic-updates main restricted
# deb-src http://cn.ports.ubuntu.com/ubuntu-ports/ bionic-updates main restricted
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
deb http://cn.ports.ubuntu.com/ubuntu-ports/ bionic universe
# deb-src http://cn.ports.ubuntu.com/ubuntu-ports/ bionic universe
deb http://cn.ports.ubuntu.com/ubuntu-ports/ bionic-updates universe
# deb-src http://cn.ports.ubuntu.com/ubuntu-ports/ bionic-updates universe
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
deb http://cn.ports.ubuntu.com/ubuntu-ports/ bionic multiverse
# deb-src http://cn.ports.ubuntu.com/ubuntu-ports/ bionic multiverse
deb http://cn.ports.ubuntu.com/ubuntu-ports/ bionic-updates multiverse
# deb-src http://cn.ports.ubuntu.com/ubuntu-ports/ bionic-updates multiverse
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
deb http://cn.ports.ubuntu.com/ubuntu-ports/ bionic-backports main restricted universe multiverse
# deb-src http://cn.ports.ubuntu.com/ubuntu-ports/ bionic-backports main restricted universe multiverse
## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu bionic partner
# deb-src http://archive.canonical.com/ubuntu bionic partner
deb http://ports.ubuntu.com/ubuntu-ports bionic-security main restricted
# deb-src http://ports.ubuntu.com/ubuntu-ports bionic-security main restricted
deb http://ports.ubuntu.com/ubuntu-ports bionic-security universe
# deb-src http://ports.ubuntu.com/ubuntu-ports bionic-security universe
deb http://ports.ubuntu.com/ubuntu-ports bionic-security multiverse
# deb-src http://ports.ubuntu.com/ubuntu-ports bionic-security multiverse
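Because the server is arm64, every active deb entry should point at an ubuntu-ports mirror rather than the x86 archive. The following is a minimal sketch of such a check; the function name non_ports_sources is a hypothetical helper, not part of apt, and it only reads the file.

```shell
# List active "deb" entries that do not point at an ubuntu-ports mirror.
# non_ports_sources is a hypothetical helper name; it never modifies the file.
non_ports_sources() {
    # $1: path to a sources.list file
    # Keep only uncommented "deb" lines, then drop those that use ubuntu-ports.
    grep -E '^[[:space:]]*deb[[:space:]]' "$1" | grep -v 'ubuntu-ports' || true
}

# Usage: prints nothing when every active entry uses a ports mirror.
# non_ports_sources /etc/apt/sources.list
```

Any line it prints is a candidate for correction before running apt update.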

2. Configure the Kubernetes network source

root@ubuntu:~/123/offline-pkg-arm64# cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
> deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
> EOF
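The heredoc above overwrites kubernetes.list on every run. If the file may already hold other entries, an idempotent append could be used instead. This is only a sketch; add_apt_source is a hypothetical helper name.

```shell
# Append an apt source line to a list file only if it is not already present.
# add_apt_source is a hypothetical helper name.
add_apt_source() {
    # $1: list file, $2: full "deb ..." line
    touch "$1"
    # -x matches the whole line, -F treats the line as a fixed string.
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage (as root):
# add_apt_source /etc/apt/sources.list.d/kubernetes.list \
#     'deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main'
```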

3. Create a directory and download the basic packages

root@ubuntu:~/123# mkdir offline-pkg-arm64
root@ubuntu:~/123# cd offline-pkg-arm64/
root@ubuntu:~/123/offline-pkg-arm64# sudo apt update
root@ubuntu:~/123/offline-pkg-arm64# apt-get download conntrack cri-tools haveged keyutils libhavege1 libltdl7 libnfsidmap2 libtirpc-dev libtirpc1 nfs-common nfs-kernel-server rpcbind socat sshpass
root@ubuntu:~/123/offline-pkg-arm64# wget --no-check-certificate https://download.docker.com/linux/ubuntu/dists/bionic/pool/stable/arm64/docker-ce_18.06.3~ce~3-0~ubuntu_arm64.deb
root@ubuntu:~/123/offline-pkg-arm64# apt-get download kubelet=1.17.3-00 kubeadm=1.17.3-00 kubectl=1.17.3-00 kubernetes-cni=0.8.6-00
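After the downloads finish, it may be worth confirming that a .deb file exists for every requested package. The sketch below assumes only the file naming used by apt-get download (package_version_arch.deb); missing_debs is a hypothetical helper name.

```shell
# Report which of the given packages have no matching .deb file in a directory.
# missing_debs is a hypothetical helper name; it relies only on the
# "<package>_<version>_<arch>.deb" naming used by apt-get download.
missing_debs() (
    dir=$1
    shift
    for pkg in "$@"; do
        # ls fails when the glob matches nothing, so the package is reported.
        ls "$dir/${pkg}"_*.deb >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
)

# Usage:
# missing_debs ~/123/offline-pkg-arm64 conntrack cri-tools socat sshpass kubelet kubeadm kubectl
```

An empty result means every listed package has at least one matching file.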

4. Download and save the Docker images

root@ubuntu:~/123# mkdir docker_images
root@ubuntu:~/123# cd docker_images/
root@ubuntu:~/123/docker_images# docker pull calico/node:v3.11.3
root@ubuntu:~/123/docker_images# docker save -o calico-node_arm64.tar.gz calico/node:v3.11.3
root@ubuntu:~/123/docker_images# docker pull calico/pod2daemon-flexvol:v3.11.3
root@ubuntu:~/123/docker_images# docker save -o calico-pod2daemon-flexvol_arm64.tar.gz calico/pod2daemon-flexvol:v3.11.3
root@ubuntu:~/123/docker_images# docker pull calico/cni:v3.11.3
root@ubuntu:~/123/docker_images# docker save -o calico-cni_arm64.tar.gz calico/cni:v3.11.3
root@ubuntu:~/123/docker_images# docker pull calico/kube-controllers:v3.11.3
root@ubuntu:~/123/docker_images# docker save -o calico-kube-controllers_arm64.tar.gz calico/kube-controllers:v3.11.3
root@ubuntu:~/123/docker_images# docker pull coredns/coredns:1.6.5
root@ubuntu:~/123/docker_images# docker save -o coredns_arm64.tar.gz coredns/coredns:1.6.5
root@ubuntu:~/123/docker_images# docker pull cruse/etcd-arm64:3.4.3-0
root@ubuntu:~/123/docker_images# docker save -o etcd_arm64.tar.gz cruse/etcd-arm64:3.4.3-0
root@ubuntu:~/123/docker_images# docker pull cruse/kube-apiserver-arm64:v1.17.3
root@ubuntu:~/123/docker_images# docker save -o kube-apiserver_arm64.tar.gz cruse/kube-apiserver-arm64:v1.17.3
root@ubuntu:~/123/docker_images# docker pull cruse/kube-controller-manager-arm64:v1.17.3
root@ubuntu:~/123/docker_images# docker save -o kube-controller-manager_arm64.tar.gz  cruse/kube-controller-manager-arm64:v1.17.3
root@ubuntu:~/123/docker_images# docker pull cruse/kube-proxy-arm64:v1.17.3-beta.0
root@ubuntu:~/123/docker_images# docker save -o kube-proxy_arm64.tar.gz cruse/kube-proxy-arm64:v1.17.3-beta.0
root@ubuntu:~/123/docker_images# docker pull cruse/kube-scheduler-arm64:v1.17.3-beta.0
root@ubuntu:~/123/docker_images# docker save -o kube-scheduler_arm64.tar.gz cruse/kube-scheduler-arm64:v1.17.3-beta.0
root@ubuntu:~/123/docker_images# docker pull cruse/pause-arm64:3.1
root@ubuntu:~/123/docker_images# docker save -o pause_arm64.tar.gz cruse/pause-arm64:3.1
root@ubuntu:~/123/docker_images# docker login -u xxxxxx@xxxxxx -p XXXXXXXX ascendhub.huawei.com
root@ubuntu:~/123/docker_images# docker pull ascendhub.huawei.com/public-ascendhub/vc-controller-manager_arm64:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker pull ascendhub.huawei.com/public-ascendhub/vc-scheduler_arm64:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker pull ascendhub.huawei.com/public-ascendhub/vc-webhook-manager_arm64:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker pull ascendhub.huawei.com/public-ascendhub/vc-webhook-manager-base_arm64:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker pull ascendhub.huawei.com/public-ascendhub/hccl-controller_arm64:v20.2.0
root@ubuntu:~/123/docker_images# docker pull ascendhub.huawei.com/public-ascendhub/ascend-k8sdeviceplugin_arm64:v20.2.0
root@ubuntu:~/123/docker_images# docker pull ascendhub.huawei.com/public-ascendhub/cadvisor_arm64:v0.34.0-r40
root@ubuntu:~/123/docker_images# docker tag ascendhub.huawei.com/public-ascendhub/vc-controller-manager_arm64:v1.0.1-r40 volcanosh/vc-controller-manager:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker tag ascendhub.huawei.com/public-ascendhub/vc-scheduler_arm64:v1.0.1-r40 volcanosh/vc-scheduler:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker tag ascendhub.huawei.com/public-ascendhub/vc-webhook-manager_arm64:v1.0.1-r40 volcanosh/vc-webhook-manager:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker tag ascendhub.huawei.com/public-ascendhub/vc-webhook-manager-base_arm64:v1.0.1-r40 volcanosh/vc-webhook-manager-base:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker tag ascendhub.huawei.com/public-ascendhub/hccl-controller_arm64:v20.2.0 hccl-controller:v20.2.0
root@ubuntu:~/123/docker_images# docker tag ascendhub.huawei.com/public-ascendhub/ascend-k8sdeviceplugin_arm64:v20.2.0 ascend-k8sdeviceplugin:v20.2.0
root@ubuntu:~/123/docker_images# docker tag ascendhub.huawei.com/public-ascendhub/cadvisor_arm64:v0.34.0-r40 google/cadvisor:v0.34.0-r40
root@ubuntu:~/123/docker_images# docker rmi ascendhub.huawei.com/public-ascendhub/vc-controller-manager_arm64:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker rmi ascendhub.huawei.com/public-ascendhub/vc-scheduler_arm64:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker rmi ascendhub.huawei.com/public-ascendhub/vc-webhook-manager_arm64:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker rmi ascendhub.huawei.com/public-ascendhub/vc-webhook-manager-base_arm64:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker rmi ascendhub.huawei.com/public-ascendhub/hccl-controller_arm64:v20.2.0
root@ubuntu:~/123/docker_images# docker rmi ascendhub.huawei.com/public-ascendhub/ascend-k8sdeviceplugin_arm64:v20.2.0
root@ubuntu:~/123/docker_images# docker rmi ascendhub.huawei.com/public-ascendhub/cadvisor_arm64:v0.34.0-r40
root@ubuntu:~/123/docker_images# docker save -o Ascend-K8sDevicePlugin-v20.2.0-arm64-Docker.tar.gz ascend-k8sdeviceplugin:v20.2.0
root@ubuntu:~/123/docker_images# docker save -o hccl-controller-v20.2.0-arm64.tar.gz hccl-controller:v20.2.0
root@ubuntu:~/123/docker_images# docker save -o huawei-cadvisor-v0.34.0-r40-arm64.tar.gz google/cadvisor:v0.34.0-r40
root@ubuntu:~/123/docker_images# docker save -o vc-controller-manager-v1.0.1-r40-arm64.tar.gz volcanosh/vc-controller-manager:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker save -o vc-scheduler-v1.0.1-r40-arm64.tar.gz volcanosh/vc-scheduler:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker save -o vc-webhook-manager-base-v1.0.1-r40-arm64.tar.gz volcanosh/vc-webhook-manager-base:v1.0.1-r40
root@ubuntu:~/123/docker_images# docker save -o vc-webhook-manager-v1.0.1-r40-arm64.tar.gz volcanosh/vc-webhook-manager:v1.0.1-r40

Note: Some images can be downloaded only after you obtain permission on the Huawei Ascend Hub (ascendhub.huawei.com).

Support.huaweicloud.com/usermanual-…
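The pull and save pairs in step 4 follow one pattern, so they can be driven from a list. The following is a sketch only: pull_and_save and the DOCKER variable are hypothetical names introduced here, and setting DOCKER to "echo docker" gives a dry run that just prints the commands.

```shell
# Pull each image and save it under the given archive name.
# pull_and_save and the DOCKER variable are hypothetical; set
# DOCKER="echo docker" for a dry run that only prints the commands.
pull_and_save() {
    # stdin: lines of "<image> <archive>"
    while read -r image archive; do
        [ -n "$image" ] || continue
        # </dev/null keeps the commands from consuming the list on stdin.
        ${DOCKER:-docker} pull "$image" </dev/null &&
            ${DOCKER:-docker} save -o "$archive" "$image" </dev/null || return 1
    done
}

# Usage: run inside docker_images/ with pairs taken from the commands above.
# pull_and_save <<'EOF'
# calico/node:v3.11.3 calico-node_arm64.tar.gz
# calico/cni:v3.11.3 calico-cni_arm64.tar.gz
# coredns/coredns:1.6.5 coredns_arm64.tar.gz
# EOF
```

The loop stops at the first failure, so a missing permission on the hub shows up immediately instead of leaving a partial set of archives unnoticed.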

5. The finished directory structure

root@ubuntu:~/123# tree
.
├── docker_images
│   ├── Ascend-K8sDevicePlugin-v20.2.0-arm64-Docker.tar.gz
│   ├── calico-cni_arm64.tar.gz
│   ├── calico-kube-controllers_arm64.tar.gz
│   ├── calico-node_arm64.tar.gz
│   ├── calico-pod2daemon-flexvol_arm64.tar.gz
│   ├── coredns_arm64.tar.gz
│   ├── etcd_arm64.tar.gz
│   ├── hccl-controller-v20.2.0-arm64.tar.gz
│   ├── huawei-cadvisor-v0.34.0-r40-arm64.tar.gz
│   ├── kube-apiserver_arm64.tar.gz
│   ├── kube-controller-manager_arm64.tar.gz
│   ├── kube-proxy_arm64.tar.gz
│   ├── kube-scheduler_arm64.tar.gz
│   ├── pause_arm64.tar.gz
│   ├── vc-controller-manager-v1.0.1-r40-arm64.tar.gz
│   ├── vc-scheduler-v1.0.1-r40-arm64.tar.gz
│   ├── vc-webhook-manager-base-v1.0.1-r40-arm64.tar.gz
│   └── vc-webhook-manager-v1.0.1-r40-arm64.tar.gz
├── offline-pkg-arm64
│   ├── conntrack_1%3a1.4.4+snapshot20161117-6ubuntu2_arm64.deb
│   ├── cri-tools_1.13.0-01_arm64.deb
│   ├── docker-ce_18.06.3~ce~3-0~ubuntu_arm64.deb
│   ├── haveged_…_arm64.deb
│   ├── keyutils_…_arm64.deb
│   ├── kubeadm_1.17.3-00_arm64.deb
│   ├── kubectl_1.17.3-00_arm64.deb
│   ├── kubelet_1.17.3-00_arm64.deb
│   ├── kubernetes-cni_0.8.6-00_arm64.deb
│   ├── libhavege1_…_arm64.deb
│   ├── libltdl7_…_arm64.deb
│   ├── libnfsidmap2_…_arm64.deb
│   ├── libtirpc1_0.2.5-1.2ubuntu0.1_arm64.deb
│   ├── libtirpc-dev_0.2.5-1.2ubuntu0.1_arm64.deb
│   ├── nfs-common_…_arm64.deb
│   ├── nfs-kernel-server_…_arm64.deb
│   ├── rpcbind_…_arm64.deb
│   ├── socat_1.7.3.2-2ubuntu2_arm64.deb
│   └── sshpass_1.06-1_arm64.deb
├── offline-pkg-arm64.zip
└── yamls
    ├── ascendplugin-v20.2.0.yaml
    ├── cadvisor-v0.34.0-r40.yaml
    ├── hccl-controller-v20.2.0.yaml
    ├── npu-exporter-v20.2.0.yaml
    └── volcano-v1.0.1-r40.yaml

3 directories, 46 files
root@ubuntu:~/123#
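Before moving on, the layout can be checked without reading the whole listing. The sketch below assumes only the three directory names shown above; check_layout is a hypothetical helper name.

```shell
# Check that the three expected sub-directories exist and print file counts.
# check_layout is a hypothetical helper; the directory names are those shown above.
check_layout() (
    root=$1
    status=0
    for d in docker_images offline-pkg-arm64 yamls; do
        if [ -d "$root/$d" ]; then
            count=$(find "$root/$d" -type f | wc -l | tr -d ' ')
            printf '%s: %s files\n' "$d" "$count"
        else
            printf '%s: missing\n' "$d"
            status=1
        fi
    done
    # The function body is a subshell, so exit only sets the return status.
    exit "$status"
)

# Usage:
# check_layout /root/123
```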

Note: The files in the yamls directory can be downloaded from the link below.

Gitee.com/ascend/mind…

6. Configure passwordless SSH login

root@ubuntu:~# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:07dTbsAycQqT2w7HdCwjIyJig5T20FQ/eHZGxWg7pbY root@ubuntu
The key's randomart image is:
+---[RSA 2048]----+
| .+... .+.       |
|o+ . o .+ +      |
|+o+ ... =BoO +   |
|... o .o.+/ O    |
| S @ + .         |
| E + =           |
| . o o           |
| o               |
|                 |
+----[SHA256]-----+
root@ubuntu:~#
root@ubuntu:~# ssh-copy-id -i 127.0.0.1
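For scripted setups, the interactive prompts can be avoided. The following sketch creates the key only when none exists; ensure_ssh_key is a hypothetical helper name, and the empty passphrase mirrors the interactive session above.

```shell
# Create an RSA key pair without prompts, but only if none exists yet.
# ensure_ssh_key is a hypothetical helper; -N "" sets an empty passphrase.
ensure_ssh_key() {
    # $1: private key path (default: ~/.ssh/id_rsa)
    key=${1:-$HOME/.ssh/id_rsa}
    # An existing key is left untouched.
    [ -f "$key" ] && return 0
    ssh-keygen -q -t rsa -b 2048 -N "" -f "$key"
}

# Usage:
# ensure_ssh_key && ssh-copy-id -i 127.0.0.1
```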

7. Install and configure Ansible

root@ubuntu:~# apt install ansible
root@ubuntu:~# vim /etc/ansible/hosts
# The configuration is as follows
[all:vars]
# default shared directory, you can change it as yours
nfs_shared_dir=/data/atlas_dls
# NFS service IP
nfs_service_ip=192.168.1.110
# Master IP
master_ip=192.168.1.110
# DLS install package dir
dls_root_dir=/root/123
# set proxy
proxy=""
# Command for logging in to the Ascend hub
ascendhub_login_command="login_command"
# Generally, you do not need to change the value or delete it.
ascendhub_prefix="ascendhub.huawei.com/public-ascendhub"
# versions
deviceplugin_version="v20.2.0"
cadvisor_version="v0.34.0-r40"
volcano_version="v1.0.1-r40"
hccl_version="v20.2.0"
[nfs_server]
ubuntu ansible_host=192.168.1.110 ansible_ssh_user="root" ansible_ssh_pass="123123"
[localnode]
ubuntu ansible_host=192.168.1.110 ansible_ssh_user="root" ansible_ssh_pass="123123"
[training_node]
ubuntu ansible_host=192.168.1.110 ansible_ssh_user="root" ansible_ssh_pass="123123"
[inference_node]
[A300T_node]
[arm]
ubuntu ansible_host=192.168.1.110 ansible_ssh_user="root" ansible_ssh_pass="123123"
[x86]
[workers:children]
training_node
inference_node
A300T_node
root@ubuntu:~/mindxdl/deploy/offline/steps# vim /etc/ansible/ansible.cfg
log_path = /var/log/ansible.log
host_key_checking = False
deprecation_warnings = False

Note: Parameter descriptions. Set each value according to your actual environment:

nfs-host-ip: the IP address of the NFS server. If NFS is not installed, set this parameter to an empty string, for example, "".
master-host-ip: the IP address of the management node server.
install_dir: the upload directory for the basic software packages, image packages, and the yamls folder.
proxy_address: the proxy address. Set it as required. If no proxy is required, set it to an empty string, for example, "".
login_command: the login command used to obtain images from the Ascend Hub. This command is required only for online installation, for example, "docker login -u xxxxxx@xxxxxx -p XXXXXXXX ascendhub.huawei.com". Do not omit the quotation marks before and after the command. For details about how to obtain the command, see steps 1 to 2 in Obtaining a MindX DL Image. For offline installation, the value can be an empty string, for example, "".
single-node-host-name: the hostname of the single node. You can run the hostname command to view it.
IP: the IP address of the server.
username: the username for logging in to the server. You are advised to use the root user to avoid insufficient permissions.
passwd: the password for logging in to the server.
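A missing variable in the inventory tends to surface only partway through the installation, so a quick pre-check may help. This sketch looks for the lower-case variable names of the [all:vars] section shown in the hosts file; missing_inventory_vars is a hypothetical helper name and it does not validate the values.

```shell
# Report [all:vars] variables that are not assigned anywhere in an inventory file.
# missing_inventory_vars is a hypothetical helper; the variable names are the
# ones used in the hosts file above.
missing_inventory_vars() (
    file=$1
    for var in nfs_shared_dir nfs_service_ip master_ip dls_root_dir proxy \
               ascendhub_login_command ascendhub_prefix; do
        # A variable counts as present when a line starts with "name=".
        grep -Eq "^${var}[[:space:]]*=" "$file" || printf '%s\n' "$var"
    done
)

# Usage:
# missing_inventory_vars /etc/ansible/hosts
```

Once the file is complete, running `ansible all -m ping` should confirm that every listed host is reachable with the configured credentials.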

Two, one-click installation

root@ubuntu:~/sshpass# apt install sshpass
root@ubuntu:~/mindxdl/deploy/offline/steps# dos2unix *
root@ubuntu:~/mindxdl/deploy/offline/steps# chmod 500 entry.sh
root@ubuntu:~/mindxdl/deploy/offline/steps# bash -x entry.sh
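The `bash -x` trace is long, so it can be convenient to capture it in a file and keep the exit status of the installer. The following is a sketch; run_logged is a hypothetical helper and is not part of the MindX DL scripts.

```shell
# Run a command, capture stdout and stderr in a log file, and keep its exit status.
# run_logged is a hypothetical helper, not part of the MindX DL scripts.
run_logged() {
    # $1: log file; remaining arguments: the command to run
    log=$1
    shift
    if "$@" >"$log" 2>&1; then
        echo "OK: see $log"
    else
        rc=$?
        echo "FAILED with exit status $rc, last lines of $log:"
        tail -n 20 "$log"
        return "$rc"
    fi
}

# Usage (from the steps directory; the log path is only an example):
# run_logged /tmp/mindxdl-install.log bash -x entry.sh
```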

Three, verification after installation

1. View Docker information

root@ubuntu:~# docker info
Containers: 35
 Running: 30
 Paused: 0
 Stopped: 5
Images: 18
Server Version: 18.06.3-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: ascend runc
Default Runtime: ascend
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: a592beb5bc4c4092b1b1bac971afed27687340c5
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-112-generic
Operating System: Ubuntu 18.04.5 LTS
OSType: linux
Architecture: aarch64
CPUs: 192
Total Memory: 503.6GiB
Name: ubuntu
ID: MUTU:QOYU:2P6F:P2QB:4JKZ:QNKE:PPMQ:PQLL:3PDG:QEYU:LMDK:KNMF
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 docker.mirrors.ustc.edu.cn
 127.0.0.0/8
Registry Mirrors:
 https://dockerhub.azk8s.cn/
 https://docker.mirrors.ustc.edu.cn/
 http://hub-mirror.c.163.com/
Live Restore Enabled: false

WARNING: No swap limit support
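The line that matters most in this output is "Default Runtime: ascend". A small sketch for extracting it from the text output follows; default_runtime is a hypothetical helper that reads the `docker info` output on stdin.

```shell
# Extract the "Default Runtime" value from `docker info` text output.
# default_runtime is a hypothetical helper that reads the output on stdin.
default_runtime() {
    awk -F': ' '$1 ~ /^[[:space:]]*Default Runtime$/ { print $2 }'
}

# Usage:
# [ "$(docker info 2>/dev/null | default_runtime)" = ascend ] || echo "ascend is not the default runtime"
```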

2. Check pod and node information with kubectl

root@ubuntu:~# kubectl get pod --all-namespaces
NAMESPACE        NAME                                       READY   STATUS      RESTARTS   AGE
cadvisor         cadvisor-nsn4r                             1/1     Running     0          5m23s
default          hccl-controller-645bb466f-5fqq6            1/1     Running     0          5m34s
kube-system      ascend-device-plugin-daemonset-vxj8s       1/1     Running     0          5m23s
kube-system      calico-kube-controllers-8464785d6b-bnjdn   1/1     Running     0          5m50s
kube-system      calico-node-blshl                          1/1     Running     0          5m51s
kube-system      coredns-6955765f44-5jr59                   1/1     Running     0          5m50s
kube-system      coredns-6955765f44-wbzvz                   1/1     Running     0          5m50s
kube-system      etcd-ubuntu                                1/1     Running     0          5m43s
kube-system      kube-apiserver-ubuntu                      1/1     Running     0          5m43s
kube-system      kube-controller-manager-ubuntu             1/1     Running     0          5m43s
kube-system      kube-proxy-b78fm                           1/1     Running     0          5m51s
kube-system      kube-scheduler-ubuntu                      1/1     Running     0          5m43s
volcano-system   volcano-admission-74776688c8-g9p9q         1/1     Running     0          5m31s
volcano-system   volcano-admission-init-sbktn               0/1     Completed   0          5m31s
volcano-system   volcano-controllers-6786db54f-vn797        1/1     Running     0          5m31s
volcano-system   volcano-scheduler-844f9b547b-xxjm7         1/1     Running     0          5m31s
root@ubuntu:~# 
root@ubuntu:~# kubectl describe node ubuntu
Name:               ubuntu
Roles:              master,worker
Labels:             accelerator=huawei-Ascend910
                    beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/os=linux
                    host-arch=huawei-arm
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=ubuntu
                    kubernetes.io/os=linux
                    masterselector=dls-master-node
                    node-role.kubernetes.io/master=
                    node-role.kubernetes.io/worker=worker
                    workerselector=dls-worker-node
Annotations:        huawei.com/Ascend910: Ascend910-1,Ascend910-2,Ascend910-3,Ascend910-0
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.1.110/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.30.243.192
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 05 Aug 2021 16:34:33 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ubuntu
  AcquireTime:     <unset>
  RenewTime:       Thu, 05 Aug 2021 16:41:29 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 05 Aug 2021 16:35:06 +0800   Thu, 05 Aug 2021 16:35:06 +0800   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Thu, 05 Aug 2021 16:40:30 +0800   Thu, 05 Aug 2021 16:34:27 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Thu, 05 Aug 2021 16:40:30 +0800   Thu, 05 Aug 2021 16:34:27 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Thu, 05 Aug 2021 16:40:30 +0800   Thu, 05 Aug 2021 16:34:27 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 05 Aug 2021 16:40:30 +0800   Thu, 05 Aug 2021 16:35:19 +0800   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.1.110
  Hostname:    ubuntu
Capacity:
  cpu:                   192
  ephemeral-storage:     920422204Ki
  huawei.com/Ascend910:  4
  hugepages-2Mi:         0
  memory:                528101392Ki
  pods:                  110
Allocatable:
  cpu:                   192
  ephemeral-storage:     848261101802
  huawei.com/Ascend910:  4
  hugepages-2Mi:         0
  memory:                527998992Ki
  pods:                  110
System Info:
  Machine ID:                 3996e745414f461b9e0e990f6d0b597e
  System UUID:                CD56756C-607E-BD02-EB11-5292EAFB068C
  Boot ID:                    adb96127-7fdc-4d84-8867-a13005f9b535
  Kernel Version:             4.15.0-112-generic
  OS Image:                   Ubuntu 18.04.5 LTS
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  docker://18.6.3
  Kubelet Version:            v1.17.3
  Kube-Proxy Version:         v1.17.3
PodCIDR:                      10.30.0.0/24
PodCIDRs:                     10.30.0.0/24
Non-terminated Pods:          (15 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  cadvisor                    cadvisor-nsn4r                              500m (0%)     1 (0%)      300Mi (0%)       2000Mi (0%)    6m17s
  default                     hccl-controller-645bb466f-5fqq6             500m (0%)     500m (0%)   300Mi (0%)       300Mi (0%)     6m28s
  kube-system                 ascend-device-plugin-daemonset-vxj8s        500m (0%)     500m (0%)   500Mi (0%)       500Mi (0%)     6m17s
  kube-system                 calico-kube-controllers-8464785d6b-bnjdn    0 (0%)        0 (0%)      0 (0%)           0 (0%)         6m44s
  kube-system                 calico-node-blshl                           250m (0%)     0 (0%)      0 (0%)           0 (0%)         6m45s
  kube-system                 coredns-6955765f44-5jr59                    100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     6m44s
  kube-system                 coredns-6955765f44-wbzvz                    100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     6m44s
  kube-system                 etcd-ubuntu                                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         6m37s
  kube-system                 kube-apiserver-ubuntu                       250m (0%)     0 (0%)      0 (0%)           0 (0%)         6m37s
  kube-system                 kube-controller-manager-ubuntu              200m (0%)     0 (0%)      0 (0%)           0 (0%)         6m37s
  kube-system                 kube-proxy-b78fm                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         6m45s
  kube-system                 kube-scheduler-ubuntu                       100m (0%)     0 (0%)      0 (0%)           0 (0%)         6m37s
  volcano-system              volcano-admission-74776688c8-g9p9q          500m (0%)     500m (0%)   300Mi (0%)       300Mi (0%)     6m25s
  volcano-system              volcano-controllers-6786db54f-vn797         500m (0%)     500m (0%)   300Mi (0%)       300Mi (0%)     6m25s
  volcano-system              volcano-scheduler-844f9b547b-xxjm7          500m (0%)     500m (0%)   300Mi (0%)       300Mi (0%)     6m25s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource              Requests     Limits
  --------              --------     ------
  cpu                   4 (2%)       3500m (1%)
  memory                2140Mi (0%)  4040Mi (0%)
  ephemeral-storage     0 (0%)       0 (0%)
  huawei.com/Ascend910  0            0
Events:
  Type    Reason                   Age                    From                Message
  ----    ------                   ----                   ----                -------
  Normal  NodeHasSufficientMemory  7m10s (x8 over 7m11s)  kubelet, ubuntu     Node ubuntu status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    7m10s (x7 over 7m11s)  kubelet, ubuntu     Node ubuntu status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     7m10s (x6 over 7m11s)  kubelet, ubuntu     Node ubuntu status is now: NodeHasSufficientPID
  Normal  Starting                 6m37s                  kubelet, ubuntu     Starting kubelet.
  Normal  NodeHasSufficientMemory  6m37s                  kubelet, ubuntu     Node ubuntu status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    6m37s                  kubelet, ubuntu     Node ubuntu status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     6m37s                  kubelet, ubuntu     Node ubuntu status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  6m37s                  kubelet, ubuntu     Updated Node Allocatable limit across pods
  Normal  Starting                 6m33s                  kube-proxy, ubuntu  Starting kube-proxy.
  Normal  NodeReady                6m17s                  kubelet, ubuntu     Node ubuntu status is now: NodeReady
root@ubuntu:~#
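In the pod list, every pod should be Running, apart from the one-shot volcano-admission-init job, which ends as Completed. The sketch below filters for anything else; unhealthy_pods is a hypothetical helper that reads `kubectl get pod --all-namespaces --no-headers` output on stdin.

```shell
# Print pods whose STATUS is neither Running nor Completed.
# Reads `kubectl get pod --all-namespaces --no-headers` output on stdin;
# unhealthy_pods is a hypothetical helper name.
unhealthy_pods() {
    # Columns: NAMESPACE NAME READY STATUS RESTARTS AGE
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}

# Usage:
# kubectl get pod --all-namespaces --no-headers | unhealthy_pods
```

An empty result corresponds to the healthy state shown in the listing.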

Note: The node description shows the CPU and accelerator card (huawei.com/Ascend910) resources:

Capacity:
  cpu:                   192
  ephemeral-storage:     920422204Ki
  huawei.com/Ascend910:  4
  hugepages-2Mi:         0
  memory:                528101392Ki
  pods:                  110
Allocatable:
  cpu:                   192
  ephemeral-storage:     848261101802
  huawei.com/Ascend910:  4
  hugepages-2Mi:         0
  memory:                527998992Ki
  pods:                  110
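To read the accelerator count from a script, the Allocatable section can be parsed directly. This is a sketch; allocatable_npus is a hypothetical helper that reads `kubectl describe node` output on stdin.

```shell
# Print the huawei.com/Ascend910 count from the Allocatable section of
# `kubectl describe node` output read on stdin.
# allocatable_npus is a hypothetical helper name.
allocatable_npus() {
    awk '
        /^Allocatable:/ { in_section = 1; next }
        /^[^[:space:]]/ { in_section = 0 }   # any unindented line ends the section
        in_section && $1 == "huawei.com/Ascend910:" { print $2 }
    '
}

# Usage:
# kubectl describe node ubuntu | allocatable_npus
```

For the node above, the value printed should match the 4 shown under Allocatable.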

For details, see the official Huawei documentation:

support.huaweicloud.com/mindxdl201/
