Download MMDetection

MMDetection is a toolbox, not an installable program. There is no installer: just clone it from GitHub and run it on the server.

Link: https://github.com/open-mmlab/mmdetection

Virtual environment

To train on a data set with MMDetection, first create a virtual environment on the server:

pyenv virtualenv <python version> <environment name>   # create a virtual environment
pyenv deactivate                                       # exit the virtual environment
pyenv uninstall <environment name>                     # delete the virtual environment
pyenv activate <environment name>                      # enter the virtual environment

Package configuration

Once inside the virtual environment, install the required packages:

pip install torch==1.6.0
pip install torchvision==0.7.0
pip install -r requirements/build.txt
pip install -v -e .
pip install ninja
# mmcv-full must be downloaded for the matching build; here cu101/torch1.6.0
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html

The versions of the local CUDA, torch, torchvision and mmcv-full must correspond to each other.

Corresponding table link: https://blog.csdn.net/zhaosuyuan/article/details/115942293

Corresponding table link: https://github.com/open-mmlab/mmcv#install-with-pip
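The install command above encodes the CUDA and torch versions in the mmcv-full download URL. As a minimal sketch, the helper below (a hypothetical name, not part of MMCV) rebuilds that URL from the two version strings, following the pattern of the URL used above. It does not check that a build for the given pair actually exists on the server.

```python
def mmcv_find_links_url(cuda_version: str, torch_version: str) -> str:
    """Build the mmcv-full find-links URL for a CUDA/torch pair.

    Follows the pattern of the URL in the install command:
    .../mmcv/dist/cu101/torch1.6.0/index.html
    """
    # "10.1" -> "cu101": join major and minor without the dot
    major, minor = cuda_version.split(".")[:2]
    cuda_tag = f"cu{major}{minor}"
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cuda_tag}/torch{torch_version}/index.html"
    )


if __name__ == "__main__":
    # Pass the printed URL to: pip install mmcv-full -f <url>
    print(mmcv_find_links_url("10.1", "1.6.0"))
```

If pip finds no matching wheel at the resulting URL, the version pair is probably not one of the supported combinations in the tables linked above.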

Preparing the data set

  1. Create a data folder under the mmdetection folder
data
|---coco
    |---annotations
        |---instances_train2017.json
        |---instances_val2017.json
    |---val2017
    |---train2017



Here val2017 and train2017 are the image folders of the validation set and the training set, and instances_train2017.json and instances_val2017.json are the annotation files of the training set and the validation set. (You can of course use your own names and change them in the code, but then there is more to change.)
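Before training, it can save time to confirm that the folder really matches the layout above. The following is a small sketch, assuming it is run from the mmdetection folder; the function name is made up for this example.

```python
from pathlib import Path

# Relative paths expected under data/coco, taken from the tree above
EXPECTED = (
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
)


def missing_coco_paths(coco_root):
    """Return the expected entries that do not exist under coco_root."""
    root = Path(coco_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_paths("data/coco")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("data/coco layout looks complete")
```

It only checks that the entries exist, not that the annotation files are valid COCO JSON.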

Model configuration

Next, let’s complete the model configuration file.

First look at the configs/_base_/models/ directory, which contains the parameters for each model. Let’s try Faster R-CNN first and open faster_rcnn_r50_fpn.py.

Search for num_classes and change it to the number of classes in your own data set. You don’t need to add 1 for the background, because MMDetection handles that automatically.


# num_classes=80,
num_classes=1,

The corresponding change is also made in mmdet/core/evaluation/class_names.py. Find coco_classes(), comment out the original names and return your own:

def coco_classes():
    return [
        # 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush',
        'tree',
    ]

Then make the same change in mmdet/datasets/coco.py. Find the CLASSES tuple of the COCO dataset class and modify it. Keep the trailing comma: without it, a single name is a plain string rather than a tuple.

# CLASSES = ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush')
CLASSES = ('tree',)
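Typing class names by hand is easy to get wrong, so it may help to read them from the annotation file itself. The sketch below relies only on the "categories" list that COCO-format annotation files contain. The function name and the choice to order names by category id are assumptions of this example, not something MMDetection requires.

```python
import json
import os


def class_names_from_annotations(annotation_path):
    """Return the category names of a COCO-format file, ordered by id."""
    with open(annotation_path, encoding="utf-8") as f:
        categories = json.load(f)["categories"]
    ordered = sorted(categories, key=lambda c: c["id"])
    return tuple(c["name"] for c in ordered)


if __name__ == "__main__":
    path = "data/coco/annotations/instances_train2017.json"
    if os.path.exists(path):
        names = class_names_from_annotations(path)
        # A one-class result prints with its trailing comma, e.g. ('tree',)
        print("CLASSES =", names)
        print("num_classes =", len(names))
    else:
        print("annotation file not found:", path)
```

The printed tuple can be pasted into the class list edits, and its length is the value to use for num_classes.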

We then use tools/train.py to generate the corresponding configuration file:

python tools/train.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py --work-dir record



--work-dir specifies the working directory. Shortly after the run starts, a complete configuration file is generated there. Once it exists we can interrupt the run, find the configuration file in the working directory, modify it (all kinds of fine-tuning, such as hyperparameters, the number of training epochs, or resuming from the last checkpoint), and then run the modified configuration file:

python tools/train.py record/faster_rcnn_r50_fpn_1x_coco.py

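For orientation, the fine-tuning mentioned above usually comes down to a few top-level entries in the generated file. The fragment below is only a sketch: the field names are assumed from MMDetection 2.x configs and the values are illustrative, so check them against the file your own version generated.

```python
# Fragment of a generated config such as record/faster_rcnn_r50_fpn_1x_coco.py.
# Field names are assumed from MMDetection 2.x; verify them in your own file.

# Learning rate: the stock value 0.02 assumes 8 GPUs with 2 images each,
# so it is commonly scaled down for a single GPU (0.0025 is illustrative).
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)

# Number of training epochs (some older versions use total_epochs instead).
runner = dict(type='EpochBasedRunner', max_epochs=24)

# Continue from the checkpoint of the previous run instead of starting over.
# The path is an assumption: a latest.pth inside the work dir used above.
resume_from = 'record/latest.pth'
```

Only the entries you want to change need editing; the rest of the generated file stays as it is.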

Debug

Here are some of the configuration problems I encountered

Version problems

The torch, CUDA and mmcv-full versions must correspond to each other, and the CUDA version must not be too old.

Error file not found

A package may have been missed during the pip installation, or a mistake may have crept in while editing a file. You can run the pip commands again and check which requirements are already satisfied.

ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory

This is one of the most difficult issues I have encountered and it cost me a lot of time. The main reason for this error is that the current CUDA release is not 10.1, so the necessary files are missing.
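A quick way to see whether the dynamic loader can find a given CUDA runtime, without importing mmcv at all, is to try loading it with ctypes. This is a minimal sketch; the helper name is made up, and the library names in the demo are just the ones that appear in this post.

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open the named library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the file is missing or cannot be loaded
        return False
    return True


if __name__ == "__main__":
    for lib in ("libcudart.so.10.1", "libcudart.so.10.0"):
        status = "found" if can_load_shared_library(lib) else "NOT found"
        print(f"{lib}: {status}")
```

If 10.0 is found but 10.1 is not, the installed packages were built for a CUDA version other than the one the loader can currently see.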

I tried a number of things:

  1. The error above was reported because the file libcudart.so.10.1 could not be found. I ran sudo ln -s /usr/local/cuda-10.0/lib64/libcudart.so.10.0 /usr/lib/libcudart.so.10.1 to create a link with that name under /usr/lib. One article says the problem can be solved by copying the two files libcudart.so.10.1 and libcudart.so.10.1.243 from another machine, but the author of that article never sent me these two files.

  2. I suspected that the CUDA Toolkit download was incomplete, but the following steps showed that my CUDA Toolkit was available and complete.

The default installation directory is /usr/local. Enter it with cd /usr/local and run ls to view the files in the directory. Enter the installed CUDA directory with cd cuda-7.5 (mine was 7.5), then enter its bin directory with cd bin. Run ls to view the tools provided with the CUDA Toolkit.

  3. Finally my suspicion landed in the right place: there was a problem with the soft link between the system and CUDA

Even so, it took me a few hours to sort out, because there are so many different ways online to switch CUDA versions and soft links, and I tried each of them on the 1080, 2080 and 3090 servers.

The subsections below record the various commands I used.

3.1 First, the commands that finally worked

Enter the command:

export PATH=/usr/local/cuda-7.5/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH

(I’m using CUDA 7.5 here; substitute your own version.)

After adding the paths, run nvcc -V to check whether they took effect. If nvcc is found, the paths were added successfully, and the toolkit tools can be used directly in the terminal.
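The same check can be scripted. The sketch below looks for nvcc on PATH and pulls the release number out of the text that nvcc -V prints, relying on the "release X.Y" wording of that output. Both function names are made up for this example.

```python
import re
import shutil
import subprocess


def parse_release(nvcc_output):
    """Extract the 'release X.Y' version from nvcc -V output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def nvcc_release():
    """Return the CUDA release reported by the nvcc on PATH, or None."""
    if shutil.which("nvcc") is None:
        return None  # nvcc is not on PATH
    result = subprocess.run(
        ["nvcc", "-V"], capture_output=True, text=True, check=False
    )
    return parse_release(result.stdout)


if __name__ == "__main__":
    print("nvcc reports CUDA release:", nvcc_release())
```

A result of None means nvcc was not found on PATH or its output could not be parsed, so the export lines have not taken effect in the current shell.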

3.2

Check which CUDA version the current CUDA soft link points to

cd /usr/local
ls
stat cuda
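The target of the soft link can also be read from Python, which is handy inside a setup script. A minimal sketch with a hypothetical helper name:

```python
import os


def resolve_link(path):
    """Return the real path a symlink resolves to, or None if not a link."""
    if not os.path.islink(path):
        return None
    return os.path.realpath(path)


if __name__ == "__main__":
    print("/usr/local/cuda ->", resolve_link("/usr/local/cuda"))
```

A result of None means /usr/local/cuda is either missing or a real directory rather than a soft link.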
  1. Delete the old soft link and create a new one
cd /usr/local/
sudo rm -rf cuda
sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda
  2. In your home directory, edit the environment variables in .bashrc

Command to open the .bashrc file: gedit ~/.bashrc

# Comment out the 8.0 lines and remove the # in front of the 10.0 export lines
export PATH="/usr/local/cuda-10.0/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH"
export CUDA_HOME=/usr/local/cuda
  3. View the CUDA and cuDNN versions

cat /usr/local/cuda/version.txt

4. However, nvcc -V still reported CUDA 10.0, so let’s look at the PATH environment variable by running echo $PATH.

5. The cause: in PATH, the CUDA 10.0 directory (1) comes before /usr/local/cuda-10.1 (2). The nvcc -V command is resolved in PATH order, so the shell looks in (1) first, finds the CUDA 10.0 nvcc there and does not look anywhere else. That is why the CUDA version we wanted was never found.

The fix is to put the soft-linked directory at the front of PATH: export PATH=/usr/local/cuda/bin:$PATH

6. Run nvcc -V again.
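Since the first match on PATH wins, it helps to see every nvcc that PATH can reach and in which order. The following sketch lists them all; the function name is made up, and empty PATH entries are simply skipped.

```python
import os


def find_all_on_path(program, path=None):
    """List every executable called `program` on PATH, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    hits = find_all_on_path("nvcc")
    for rank, hit in enumerate(hits, start=1):
        note = "  <- this one runs" if rank == 1 else ""
        print(f"{rank}. {hit}{note}")
```

If the first entry is not the CUDA version you expect, the export line that prepends the wanted directory has not been applied in this shell.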

3.3 Switch between CUDA versions using update-alternatives

sudo update-alternatives --install /usr/local/cuda cuda /usr/local/cuda-10.0/ 10
sudo update-alternatives --install /usr/local/cuda cuda /usr/local/cuda-10.2/ 20


Run sudo update-alternatives --config cuda to select a CUDA version.

Run nvcc -V to check whether the switch succeeded.

Error invalid syntax

This is a syntax error. Go to the reported line in the corresponding file and check around it for missing punctuation.

More likely, though, you are not in your virtual environment at all. You are still in base, and the Python 2 in base is too old to run Python 3 code, so you get incompatibilities.

Virtual environment

Make sure that every time you run train.py you have entered your own environment, not the base environment. The base environment has an older Python version, which produces an error saying the Python version is too low and gives the illusion that you need to install a new Python.

You can list the available virtual environments with the command pyenv versions.
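To catch the wrong-environment case early, a script can report which interpreter is running it before any training code is imported. This is a small sketch; the minimum version (3, 6) is an assumption for illustration, not a requirement taken from MMDetection.

```python
import sys


def python_is_new_enough(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    # sys.executable shows whether this is the virtual environment's Python
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    if not python_is_new_enough():
        sys.exit("Python is too old; activate your virtual environment first")
```

If the printed interpreter path does not point into your pyenv environment, activate the environment before running train.py.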