Last time, we trained a VGG-16 network with Faster R-CNN. To improve recognition accuracy, this time we train a ResNet network on the same data.


The basic process is similar to training the VGG-16 network; for details, refer to the earlier post on training the Faster R-CNN VGG-16 model with your own data.

I. Training the network

(1) Download the ResNet-50 prototxt files

The files I used can be downloaded from my GitHub; you can of course use a different ResNet architecture.

(2) Modify the relevant files

1. Edit $FRCN_ROOT/lib/rpn/generate_anchors.py

# on line 37:
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
# change to:
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(1, 6)):
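
To see what this change does to the anchor set, here is a minimal pure-Python sketch. It is not the code from generate_anchors.py: the helper name anchor_sizes is hypothetical, and it assumes each anchor keeps roughly the area (base_size * scale)^2 with ratio = height / width, without reproducing the library's exact rounding.

```python
def anchor_sizes(base_size=16, ratios=(0.5, 1, 2), scales=(2, 4, 8, 16, 32)):
    """Return approximate (width, height) pairs, one per ratio/scale pair.

    Simplified illustration only: the rounding used by the real
    generate_anchors.py is not reproduced here.
    """
    sizes = []
    for ratio in ratios:
        for scale in scales:
            side = base_size * scale          # side of the square anchor
            width = side / ratio ** 0.5       # ratio is height / width
            height = side * ratio ** 0.5
            sizes.append((round(width), round(height)))
    return sizes


old = anchor_sizes(scales=[2 ** i for i in range(3, 6)])  # scales 8, 16, 32
new = anchor_sizes(scales=[2 ** i for i in range(1, 6)])  # scales 2, 4, 8, 16, 32
print(len(old), len(new))  # 9 15
```

The two extra scales (2 and 4) add anchors as small as about 32 pixels on a side, which is what lets the RPN propose small objects.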

2. Edit $FRCN_ROOT/lib/rpn/anchor_target_layer.py

# on line 28:
        anchor_scales = layer_params.get('scales', (8, 16, 32))
# change to:
        anchor_scales = layer_params.get('scales', (2, 4, 8, 16, 32))

3. Edit $FRCN_ROOT/lib/rpn/proposal_layer.py

# on line 29:
        anchor_scales = layer_params.get('scales', (8, 16, 32))
# change to:
        anchor_scales = layer_params.get('scales', (2, 4, 8, 16, 32))
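
Note that the tuple edited in both layers is only a fallback value. The sketch below assumes layer_params is a plain dict parsed from the layer's parameter string; resolve_scales and DEFAULT_SCALES are hypothetical names used for illustration. If a 'scales' entry is present, it overrides the edited default, so it is worth checking that your prototxt does not set one.

```python
DEFAULT_SCALES = (2, 4, 8, 16, 32)


def resolve_scales(layer_params):
    # Same lookup pattern as the edited lines: an explicit 'scales'
    # entry wins, otherwise the default tuple is used.
    return tuple(layer_params.get('scales', DEFAULT_SCALES))


print(resolve_scales({'feat_stride': 16}))                       # (2, 4, 8, 16, 32)
print(resolve_scales({'feat_stride': 16, 'scales': [8, 16, 32]}))  # (8, 16, 32)
```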

4. For the changes to pascal_voc.py, imdb.py, train.prototxt, test.prototxt, and the .pt files, refer to the earlier post on training the Faster R-CNN VGG-16 model with your own data.

5. Because we now use 5 anchor scales, the previous 9 anchors become 3*5=15, so change 18 to 30 in the prototxt and .pt files.

layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 30   # 2 (bg/fg) * 15 (anchors); was 18
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
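
The arithmetic behind the new value can be checked with a few lines of Python. The 4-values-per-anchor figure for the box-regression output is an assumption based on the usual RPN layout (the text above only mentions changing 18 to 30); if your prototxt has an rpn_bbox_pred layer with num_output: 36, it would presumably need to become 60 by the same reasoning.

```python
ratios = (0.5, 1, 2)
scales = (2, 4, 8, 16, 32)

num_anchors = len(ratios) * len(scales)  # 3 * 5 anchors per position
cls_outputs = 2 * num_anchors            # bg/fg score per anchor
bbox_outputs = 4 * num_anchors           # assumed: 4 box offsets per anchor

print(num_anchors, cls_outputs, bbox_outputs)  # 15 30 60
```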

(3) Download ImageNet model

Download the ImageNet pre-trained model file: ResNet-50.v2.caffemodel

(4) Clear the cache

Delete the cache files: $FRCN_ROOT/data/VOCdevkit2007/annotations_cache/annots.pkl and the .pkl files under $FRCN_ROOT/data/cache. If you do not clear the cache, training may fail with an error.
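
If you retrain often, the deletion can be scripted. This is a minimal sketch using only the standard library; the function name clear_caches is hypothetical, and the paths are the ones named above.

```python
import glob
import os


def clear_caches(frcn_root):
    """Delete the annotation cache and the .pkl files in data/cache.

    Returns the list of paths that were actually removed.
    """
    targets = [os.path.join(frcn_root, 'data', 'VOCdevkit2007',
                            'annotations_cache', 'annots.pkl')]
    targets += glob.glob(os.path.join(frcn_root, 'data', 'cache', '*.pkl'))
    removed = []
    for path in targets:
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed


if __name__ == '__main__':
    root = os.environ.get('FRCN_ROOT')  # only runs when FRCN_ROOT is set
    if root:
        for path in clear_caches(root):
            print('removed', path)
```

Only files ending in .pkl are touched, so anything else kept in data/cache is left alone.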

(5) Start training

As with the VGG16 training command, run the script from $FRCN_ROOT:

cd $FRCN_ROOT
./experiments/scripts/faster_rcnn_end2end.sh 0 ResNet-50 pascal_voc

Note: the network name passed to the script (ResNet-50) must match the name of the model directory, i.e. $FRCN_ROOT/models/pascal_voc/ResNet-50.
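
Before launching a long run, it can help to check that the files derived from the network name exist. The sketch below is an assumption about how the stock training script builds its paths from the NET argument (a solver under models/<dataset>/<NET>/faster_rcnn_end2end/ and weights under data/imagenet_models/<NET>.v2.caffemodel); verify these against your own copy of the script. The function names are hypothetical.

```python
import os


def expected_paths(frcn_root, net='ResNet-50', pt_dir='pascal_voc'):
    """Paths the training script is assumed to derive from the NET argument."""
    return {
        'solver': os.path.join(frcn_root, 'models', pt_dir, net,
                               'faster_rcnn_end2end', 'solver.prototxt'),
        'weights': os.path.join(frcn_root, 'data', 'imagenet_models',
                                net + '.v2.caffemodel'),
    }


def missing_paths(frcn_root, net='ResNet-50', pt_dir='pascal_voc'):
    """Return the names of expected files that do not exist yet."""
    paths = expected_paths(frcn_root, net, pt_dir)
    return sorted(name for name, path in paths.items()
                  if not os.path.isfile(path))
```

For example, missing_paths(os.environ['FRCN_ROOT']) returning an empty list would mean both files were found under the assumed layout.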


Because ResNet-50 is a deeper network, training takes longer: each iteration takes about 0.5 s, and training this network took me about 10 hours. The results are better than with VGG16, mainly because small-scale objects are detected more accurately.
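
As a rough consistency check on these numbers, the total time is just iterations times seconds per iteration. The iteration count of 70000 below is an assumption (it is not stated above); substitute the value your training script actually uses.

```python
def training_hours(iterations, seconds_per_iteration):
    """Estimated wall-clock training time in hours."""
    return iterations * seconds_per_iteration / 3600.0


# 70000 iterations is an assumed value, 0.5 s/iteration is from the text.
print(round(training_hours(70000, 0.5), 1))  # 9.7
```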

Here are the APs from my training run: