Heart of the Machine column

Author: Chen Huichan

From RCNN to Faster RCNN, and more recently FPN and Mask RCNN (which won the ICCV Best Paper award), deep learning has pulled far ahead of other machine learning approaches to object detection. We had long hoped that Facebook would open-source its computer vision work, and after more than a year of waiting it has today released Detectron. Detectron is built on Caffe2 with a Python interface and implements more than 10 recent computer vision results. Below we take a brief look at the papers Detectron implements and report our initial tests. In a follow-up post we will publish the models we trained with Detectron and the speed benchmarks we measured ourselves.

Fast RCNN, Faster RCNN, RFCN, FPN, RetinaNet

Detectron implements the standard object detection models and adds two state-of-the-art detectors: Feature Pyramid Network (FPN) and RetinaNet. FPN is the state of the art among two-stage detectors; RetinaNet is the best-performing one-stage model and won the ICCV Best Student Paper award.

ResNet, ResNeXt

Detectron implements basic network backbones such as Residual Network (ResNet) and ResNeXt. ResNeXt uses grouped convolution to cut the number of parameters substantially while maintaining classification accuracy.
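The parameter saving comes from splitting the channels of a convolution into groups (depthwise convolution is the extreme case of one channel per group). The sketch below simply counts the weights of a 3x3 convolution with and without groups; the 128 channels and 32 groups are illustrative choices, not values taken from Detectron.

```python
def conv_params(c_in, c_out, kernel=3, groups=1):
    """Number of weights in a 2-D convolution (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return kernel * kernel * (c_in // groups) * c_out


dense = conv_params(128, 128)               # ordinary convolution
grouped = conv_params(128, 128, groups=32)  # 32 groups of 4 channels each
print(dense, grouped, dense // grouped)
```

With the same channel count the grouped layer needs 32 times fewer weights, which is the budget ResNeXt can spend on wider layers.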

 

Human-object Interaction Detection

Object detection produces bounding boxes, as shown in Figure (a). Human-object interaction detection then learns how the boxes relate to each other by predicting a probability density over where the interacting object's box lies relative to a person's box. In Figure (c), for example, the relationship between the person and the knife is "cut".
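One way to picture the density idea: the person branch predicts where the target of an action should be, and every detected object is scored by how close it lies to that prediction. The sketch below assumes a Gaussian density over box offsets encoded relative to the person box. The encoding, the sigma value, the function names and all box coordinates are illustrative assumptions, not Detectron code.

```python
import math


def encode(person, obj):
    """Offset of obj relative to person; boxes are (cx, cy, w, h)."""
    px, py, pw, ph = person
    ox, oy, ow, oh = obj
    return ((ox - px) / pw, (oy - py) / ph,
            math.log(ow / pw), math.log(oh / ph))


def target_score(person, obj, predicted_offset, sigma=0.3):
    """Gaussian compatibility between an object and the predicted target."""
    delta = encode(person, obj)
    sq_dist = sum((d - m) ** 2 for d, m in zip(delta, predicted_offset))
    return math.exp(-sq_dist / (2 * sigma ** 2))


person = (100.0, 100.0, 50.0, 120.0)
knife = (130.0, 110.0, 20.0, 10.0)   # close to the person's hand
tv = (300.0, 60.0, 80.0, 60.0)       # far away from the person

# Hypothetical network output: predicted location of the "cut" target.
predicted = encode(person, (128.0, 112.0, 22.0, 10.0))

scores = {"knife": target_score(person, knife, predicted),
          "tv": target_score(person, tv, predicted)}
best = max(scores, key=scores.get)
print(best, scores)
```

The object whose box best matches the predicted location gets the highest score, so the knife rather than the distant TV is paired with the person.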

Mask RCNN

By extending Faster RCNN, Mask RCNN performs instance segmentation and keypoint detection at about 5 FPS (the speed reported in the paper) and outperformed all previous methods at the time of publication. It achieves good results on the COCO and Cityscapes datasets. The schematic diagram of Mask RCNN is shown below.

 

Training ImageNet in one hour

This paper finds that large minibatches, spread over many GPUs, greatly shorten the wall-clock time needed to train a classification network. By increasing the batch size from 256 to 8192, training time is reduced from days or weeks to about one hour, which greatly speeds up neural network training.
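The paper pairs the larger batch with a proportionally larger learning rate (the linear scaling rule) plus a short warm-up. A minimal sketch of the scaling step follows, assuming the reference setting of learning rate 0.1 at batch size 256 that the paper uses for ResNet-50; the function name is ours.

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch=256):
    """Linear scaling rule: the learning rate grows in proportion to the batch."""
    return base_lr * batch_size / base_batch


for batch in (256, 1024, 8192):
    print(batch, scaled_lr(batch))
```

Going from 256 to 8192 images per batch multiplies the learning rate by 32, so each of the far fewer update steps covers proportionally more ground.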

Learning to Segment Every Thing

Collecting mask annotations for Mask RCNN is very expensive: on Cityscapes, annotating a single image takes about an hour. This paper proposes a weight transfer method that makes it possible to segment all object categories while avoiding the huge time and money cost of collecting segmentation data. It does so by using the weights of the bounding box detection branch to predict the weights of the mask branch.
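The weight transfer idea can be sketched as a small function that maps each class's box-branch weights to mask-branch weights. Everything concrete below (the layer sizes, the two-layer form of the transfer function, the leaky ReLU and the random initial values) is a hypothetical stand-in chosen for illustration.

```python
import random

random.seed(0)


def rand_matrix(rows, cols, scale=1.0):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def leaky_relu(m, slope=0.1):
    return [[v if v > 0 else slope * v for v in row] for row in m]


NUM_CLASSES, BOX_DIM, HIDDEN, MASK_DIM = 5, 16, 32, 8   # hypothetical sizes

w_box = rand_matrix(NUM_CLASSES, BOX_DIM)     # per-class box-branch weights
theta1 = rand_matrix(BOX_DIM, HIDDEN, 0.1)    # parameters of the transfer
theta2 = rand_matrix(HIDDEN, MASK_DIM, 0.1)   # function, shared by all classes


def transfer(w_det):
    """Predict per-class mask weights from per-class box weights."""
    return matmul(leaky_relu(matmul(w_det, theta1)), theta2)


w_mask = transfer(w_box)
print(len(w_mask), len(w_mask[0]))   # one MASK_DIM-vector per class
```

Because the transfer function is shared across classes, a class that only has box annotations still receives mask weights.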

Non-local Neural Networks

A convolutional neural network can only propagate information within a local neighborhood. Drawing on non-local means and self-attention, this paper designs a non-local operation that captures information from outside that neighborhood. In the figure below, a central point gathers important information from distant, non-neighboring positions.
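A stripped-down version of the non-local operation can be written in a few lines. This sketch uses a softmax over dot-product similarities and a residual connection, and deliberately leaves out the learned embeddings and output projection of the full block, so treat it as an illustration of the idea rather than the paper's layer.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def non_local(features):
    """z_i = x_i + sum_j softmax_j(x_i . x_j) * x_j over ALL positions j."""
    output = []
    for x_i in features:
        # Similarity of position i to every position, near or far.
        weights = softmax([dot(x_i, x_j) for x_j in features])
        y_i = [sum(w * x_j[c] for w, x_j in zip(weights, features))
               for c in range(len(x_i))]
        output.append([a + b for a, b in zip(x_i, y_i)])  # residual connection
    return output


# Four positions on a line; positions 0 and 3 look alike but are not neighbours.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
attention_row0 = softmax([dot(features[0], f) for f in features])
print(attention_row0)
print(non_local(features)[0])
```

Position 0 puts more weight on the distant but similar position 3 than on its direct neighbour, which a small convolution kernel centred on position 0 could not even see.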

 

A first look at the Detectron framework

To use the Detectron framework, first install Caffe2 by following the instructions on the Caffe2 website. Then follow INSTALL.md to install Detectron. Detectron makes it easy to run tests and to add custom operators; for an example of adding an op, see test_zero_even_op.py.

The Detectron repository contains the folders configs, demo, lib, tests and tools. configs holds the training and testing parameters of each model. lib is the core of Detectron and contains the data loader, model builder, operator definitions and utils (supporting functions such as the learning rate schedule).

Detectron installation

To install Caffe2, refer to https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile

Core commands:


     
    git clone --recursive https://github.com/caffe2/caffe2.git && cd caffe2
    make && cd build && sudo make install
    python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
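The one-line check above discards the error message, which makes a failure hard to diagnose. A small Python equivalent that keeps the reason is sketched below; the function name caffe2_status is ours, and the only Caffe2 name it relies on is the caffe2.python.core module used in the command above.

```python
import importlib


def caffe2_status():
    """Return 'Success' if caffe2.python.core imports, else the failure reason."""
    try:
        importlib.import_module("caffe2.python.core")
    except ImportError as exc:
        return "Failure: {}".format(exc)
    return "Success"


print(caffe2_status())
```

On a machine where a dependency is missing, the printed message names the module that could not be imported.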

To install Detectron, refer to https://github.com/facebookresearch/Detectron/blob/master/INSTALL.md

Detectron test

Run the test using Mask RCNN FPN ResNet 50 as follows:


     
    CUDA_VISIBLE_DEVICES=3 python tools/train_net.py \
        --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
        OUTPUT_DIR /tmp/detectron-output

Inference speed on a Titan X, as logged by infer_simple.py:


     
    INFO infer_simple.py: 111: Processing demo/16004479832_a748d55f21_k.jpg -> /tmp/detectron-visualizations/16004479832_a748d55f21_k.jpg.pdf
    INFO infer_simple.py: 119: Inference time: 1.402s
    INFO infer_simple.py: 121:  | im_detect_bbox: 1.329s
    INFO infer_simple.py: 121:  | misc_mask: 0.034s
    INFO infer_simple.py: 121:  | im_detect_mask: 0.034s
    INFO infer_simple.py: 121:  | misc_bbox: 0.005s
    INFO infer_simple.py: 124:  \ Note: inference on the first image will be slower than the rest (caches and auto-tuning need to warm up)
    INFO infer_simple.py: 111: Processing demo/18124840932_e42b3e377c_k.jpg -> /tmp/detectron-visualizations/18124840932_e42b3e377c_k.jpg.pdf
    INFO infer_simple.py: 119: Inference time: 0.411s
    INFO infer_simple.py: 121:  | im_detect_bbox: 0.305s
    INFO infer_simple.py: 121:  | misc_mask: 0.058s
    INFO infer_simple.py: 121:  | im_detect_mask: 0.044s
    INFO infer_simple.py: 121:  | misc_bbox: 0.004s
    INFO infer_simple.py: 111: Processing demo/24274813513_0cfd2ce6d0_k.jpg -> /tmp/detectron-visualizations/24274813513_0cfd2ce6d0_k.jpg.pdf
    INFO infer_simple.py: 119: Inference time: 0.321s
    INFO infer_simple.py: 121:  | im_detect_bbox: 0.264s
    INFO infer_simple.py: 121:  | misc_mask: 0.034s
    INFO infer_simple.py: 121:  | im_detect_mask: 0.018s
    INFO infer_simple.py: 121:  | misc_bbox: 0.005s
    INFO infer_simple.py: 111: Processing demo/33823288584_1d21cf0a26_k.jpg -> /tmp/detectron-visualizations/33823288584_1d21cf0a26_k.jpg.pdf
    INFO infer_simple.py: 119: Inference time: 0.722s
    INFO infer_simple.py: 121:  | im_detect_bbox: 0.515s
    INFO infer_simple.py: 121:  | misc_mask: 0.127s
    INFO infer_simple.py: 121:  | im_detect_mask: 0.072s
    INFO infer_simple.py: 121:  | misc_bbox: 0.007s
    INFO infer_simple.py: 111: Processing demo/17790319373_bd19b24cfc_k.jpg -> /tmp/detectron-visualizations/17790319373_bd19b24cfc_k.jpg.pdf
    INFO infer_simple.py: 119: Inference time: 0.403s
    INFO infer_simple.py: 121:  | im_detect_bbox: 0.292s
    INFO infer_simple.py: 121:  | misc_mask: 0.067s
    INFO infer_simple.py: 121:  | im_detect_mask: 0.038s
    INFO infer_simple.py: 121:  | misc_bbox: 0.006s
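Since the first image includes warm-up cost, a fairer speed estimate averages only the later images. The short script below does that for the five total inference times in the log above; the regular expression is written for that log format and is an assumption about how other runs are formatted.

```python
import re

LOG = """\
INFO infer_simple.py: 119: Inference time: 1.402s
INFO infer_simple.py: 119: Inference time: 0.411s
INFO infer_simple.py: 119: Inference time: 0.321s
INFO infer_simple.py: 119: Inference time: 0.722s
INFO infer_simple.py: 119: Inference time: 0.403s
"""

times = [float(t) for t in re.findall(r"Inference time: ([\d.]+)s", LOG)]
steady = times[1:]                       # drop the warm-up image
mean_time = sum(steady) / len(steady)
print("mean {:.3f}s per image, {:.2f} FPS".format(mean_time, 1.0 / mean_time))
```

For this run the steady-state average is roughly 0.46 seconds per image, a little over 2 FPS.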

Detectron framework training

Faster RCNN with an FPN ResNet-50 backbone was trained on the COCO dataset.

Use the following command:


     
    CUDA_VISIBLE_DEVICES=3 python tools/train_net.py \
        --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
        OUTPUT_DIR /tmp/detectron-output

The output is as follows:


     
    Namespace(cfg_file='configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml', multi_gpu_testing=False, opts=['OUTPUT_DIR', '/tmp/detectron-output'], skip_test=False)
    INFO train_net.py: 188: Training with config:
    INFO train_net.py: 189: {'BBOX_XFORM_CLIP': 4.1351665567423561,
     'CLUSTER': {'ON_CLUSTER': False},
     'DATA_LOADER': {'NUM_THREADS': 4},
     'DEDUP_BOXES': 0.0625,
     'DOWNLOAD_CACHE': '/tmp/detectron-download-cache',
     'EPS': 1e-14,
     'EXPECTED_RESULTS': [],
     'EXPECTED_RESULTS_ATOL': 0.005,
     'EXPECTED_RESULTS_EMAIL': '',
     'EXPECTED_RESULTS_RTOL': 0.1,
     'FAST_RCNN': {'MLP_HEAD_DIM': 1024,
                   'ROI_BOX_HEAD': 'fast_rcnn_heads.add_roi_2mlp_head',
                   'ROI_XFORM_METHOD': 'RoIAlign',
                   'ROI_XFORM_RESOLUTION': 7,
                   'ROI_XFORM_SAMPLING_RATIO': 2},
     'FPN': {'COARSEST_STRIDE': 32,
             'DIM': 256,
             'EXTRA_CONV_LEVELS': False,
             'FPN_ON': True,
             'MULTILEVEL_ROIS': True,
             'MULTILEVEL_RPN': True,
             'ROI_CANONICAL_LEVEL': 4,
             'ROI_CANONICAL_SCALE': 224,
             'ROI_MAX_LEVEL': 5,
             'ROI_MIN_LEVEL': 2,
             'RPN_ANCHOR_START_SIZE': 32,
             'RPN_ASPECT_RATIOS': (0.5, 1, 2),
             'RPN_MAX_LEVEL': 6,
             'RPN_MIN_LEVEL': 2,
             'ZERO_INIT_LATERAL': False},
     'MATLAB': 'matlab',
     'MEMONGER': True,
     'MEMONGER_SHARE_ACTIVATIONS': False,
     'MODEL': {'BBOX_REG_WEIGHTS': (10.0, 10.0, 5.0, 5.0),
               'CLS_AGNOSTIC_BBOX_REG': False,
               'CONV_BODY': 'FPN.add_fpn_ResNet50_conv5_body',
               'EXECUTION_TYPE': 'dag',
               'FASTER_RCNN': True,
               'KEYPOINTS_ON': False,
               'MASK_ON': False,
               'NUM_CLASSES': 81,
               'RPN_ONLY': False,
               'TYPE': 'generalized_rcnn'},
     'MRCNN': {'CLS_SPECIFIC_MASK': True,
               'CONV_INIT': 'GaussianFill',
               'DILATION': 2,
               'DIM_REDUCED': 256,
               'RESOLUTION': 14,
               'ROI_MASK_HEAD': '',
               'ROI_XFORM_METHOD': 'RoIAlign',
               'ROI_XFORM_RESOLUTION': 7,
               'ROI_XFORM_SAMPLING_RATIO': 0,
               'THRESH_BINARIZE': 0.5,
               'UPSAMPLE_RATIO': 1,
               'USE_FC_OUTPUT': False,
               'WEIGHT_LOSS_MASK': 1.0},
     'NUM_GPUS': 1,
     'OUTPUT_DIR': '/tmp/detectron-output',
     'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
     'RESNETS': {'NUM_GROUPS': 1,
                 'RES5_DILATION': 1,
                 'STRIDE_1X1': True,
                 'TRANS_FUNC': 'bottleneck_transformation',
                 'WIDTH_PER_GROUP': 64},
     'RETINANET': {'ANCHOR_SCALE': 4,
                   'ASPECT_RATIOS': (0.5, 1.0, 2.0),
                   'BBOX_REG_BETA': 0.11,
                   'BBOX_REG_WEIGHT': 1.0,
                   'CLASS_SPECIFIC_BBOX': False,
                   'INFERENCE_TH': 0.05,
                   'LOSS_ALPHA': 0.25,
                   'LOSS_GAMMA': 2.0,
                   'NEGATIVE_OVERLAP': 0.4,
                   'NUM_CONVS': 4,
                   'POSITIVE_OVERLAP': 0.5,
                   'PRE_NMS_TOP_N': 1000,
                   'PRIOR_PROB': 0.01,
                   'RETINANET_ON': False,
                   'SCALES_PER_OCTAVE': 3,
                   'SHARE_CLS_BBOX_TOWER': False,
                   'SOFTMAX': False},
     'RFCN': {'PS_GRID_SIZE': 3},
     'RNG_SEED': 3,
     'ROOT_DIR': '/home/huichan/caffe2/detectron',
     'RPN': {'ASPECT_RATIOS': (0.5, 1, 2),
             'RPN_ON': True,
             'SIZES': (64, 128, 256, 512),
             'STRIDE': 16},
     'SOLVER': {'BASE_LR': 0.0025,
                'GAMMA': 0.1,
                'LOG_LR_CHANGE_THRESHOLD': 1.1,
                'LRS': [],
                'LR_POLICY': 'steps_with_decay',
                'MAX_ITER': 60000,
                'MOMENTUM': 0.9,
                'SCALE_MOMENTUM': True,
                'SCALE_MOMENTUM_THRESHOLD': 1.1,
                'STEPS': [0, 30000, 40000],
                'STEP_SIZE': 30000,
                'WARM_UP_FACTOR': 0.3333333333333333,
                'WARM_UP_ITERS': 500,
                'WARM_UP_METHOD': u'linear',
                'WEIGHT_DECAY': 0.0001},
     'TRAIN': {'ASPECT_GROUPING': True,
               'AUTO_RESUME': True,
               'BATCH_SIZE_PER_IM': 256,
               'BBOX_THRESH': 0.5,
               'BG_THRESH_HI': 0.5,
               'BG_THRESH_LO': 0.0,
               'CROWD_FILTER_THRESH': 0.7,
               'DATASETS': ('coco_2014_train',),
               'FG_FRACTION': 0.25,
               'FG_THRESH': 0.5,
               'FREEZE_CONV_BODY': False,
               'GT_MIN_AREA': -1,
               'IMS_PER_BATCH': 2,
               'MAX_SIZE': 833,
               'PROPOSAL_FILES': (),
               'RPN_BATCH_SIZE_PER_IM': 256,
               'RPN_FG_FRACTION': 0.5,
               'RPN_MIN_SIZE': 0,
               'RPN_NEGATIVE_OVERLAP': 0.3,
               'RPN_NMS_THRESH': 0.7,
               'RPN_POSITIVE_OVERLAP': 0.7,
               'RPN_POST_NMS_TOP_N': 2000,
               'RPN_PRE_NMS_TOP_N': 2000,
               'RPN_STRADDLE_THRESH': 0,
               'SCALES': (500,),
               'SNAPSHOT_ITERS': 20000,
               'USE_FLIPPED': True,
               'WEIGHTS': u'/tmp/detectron-download-cache/ImageNetPretrained/MSRA/R-50.pkl'},
     'USE_NCCL': False,
     'VIS': False,
     'VIS_TH': 0.9}
    I0123 13:14:38.367794 36482 context_gpu.cu:325] Total: 311 MB
    INFO train_net.py: 330: Loading dataset: ('coco_2014_train',)
    loading annotations into memory...
    Done (t=15.17s)
    creating index...
    index created!
    INFO roidb.py:  49: Appending horizontally-flipped training examples...
    INFO roidb.py:  51: Loaded dataset: coco_2014_train
    INFO roidb.py: 135: Filtered 1404 roidb entries: 165566 -> 164162
    INFO roidb.py:  67: Computing bounding-box regression targets...
    INFO roidb.py:  69: done
    INFO roidb.py: 191: Ground-truth class histogram:
    INFO roidb.py: 195: 0__background__: 0
    INFO roidb.py: 195: 1        person: 363358
    INFO roidb.py: 195: 2       bicycle: 9824
    INFO roidb.py: 195: 3           car: 61106
    INFO roidb.py: 195: 4    motorcycle: 11944
    INFO roidb.py: 195: 5      airplane: 7656
    INFO roidb.py: 195: 6           bus: 8642
    INFO roidb.py: 195: 7         train: 6316
    INFO roidb.py: 195: 8         truck: 14094
    INFO roidb.py: 195: 9          boat: 14912
    INFO roidb.py: 195: 10 traffic light: 18248
    INFO roidb.py: 195: 11  fire hydrant: 2632
    INFO roidb.py: 195: 12     stop sign: 2744
    INFO roidb.py: 195: 13 parking meter: 1666
    INFO roidb.py: 195: 14         bench: 13482
    INFO roidb.py: 195: 15          bird: 14226
    INFO roidb.py: 195: 16           cat: 6598
    INFO roidb.py: 195: 17           dog: 7534
    INFO roidb.py: 195: 18         horse: 9304
    INFO roidb.py: 195: 19         sheep: 12916
    INFO roidb.py: 195: 20           cow: 11196
    INFO roidb.py: 195: 21      elephant: 7760
    INFO roidb.py: 195: 22          bear: 1806
    INFO roidb.py: 195: 23         zebra: 7316
    INFO roidb.py: 195: 24       giraffe: 7186
    INFO roidb.py: 199:          total: 1195680
    INFO train_net.py: 334: 164162 roidb entries
    INFO net.py:  54: Loading from: /tmp/detectron-download-cache/ImageNetPretrained/MSRA/R-50.pkl
    I0123 13:16:41.699045 36482 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 0.000500389 secs
    I0123 13:16:41.699774 36482 net_dag.cc:61] Number of parallel execution chains 340 Number of operators = 632
    INFO loader.py: 232:   [62/64]
    INFO detector.py: 434: Changing learning rate 0.000000 -> 0.0033 at iter 0
    json_stats: {"accuracy_cls": 0.000000, "eta": "2 days, 6:05:39", "iter": 0, "loss": 5.814330, "loss_bbox": 0.008809, "loss_cls": 4.863443, "loss_rpn_bbox_fpn2": 0.000000, "loss_rpn_bbox_fpn3": 0.000000, "loss_rpn_bbox_fpn4": 0.002576, "loss_rpn_bbox_fpn5": 0.264878, "loss_rpn_bbox_fpn6": 0.000000, "loss_rpn_cls_fpn2": 0.455301, "loss_rpn_cls_fpn3": 0.091068, "loss_rpn_cls_fpn4": 0.022299, "loss_rpn_cls_fpn5": 0.105955, "loss_rpn_cls_fpn6": 0.000000, "lr": 0.0033, "mb_qsize": 64, "mem": 3253, "time": 3.245656}
    json_stats: {"accuracy_cls": 0.940430, "eta": "8:29:26", "iter": 20, "loss": 1.839182, "loss_bbox": 0.071032, "loss_cls": 0.897934, "loss_rpn_bbox_fpn2": 0.077837, "loss_rpn_bbox_fpn3": 0.005068, "loss_rpn_bbox_fpn4": 0.014110, "loss_rpn_bbox_fpn5": 0.013995, "loss_rpn_bbox_fpn6": 0.000000, "loss_rpn_cls_fpn2": 0.425642, "loss_rpn_cls_fpn3": 0.099356, "loss_rpn_cls_fpn4": 0.034078, "loss_rpn_cls_fpn5": 0.019162, "loss_rpn_cls_fpn6": 0.000000, "lr": 0.000900, "mb_qsize": 64, "mem": 3267, "time": 0.509616}
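Two of the logged quantities can be recomputed from the SOLVER settings printed in the config: the warmed-up learning rate at iteration 20 and the estimated time remaining. The sketch below is our reading of what the keys mean (step decay at the listed STEPS, linear warm-up over WARM_UP_ITERS, and remaining time as seconds per iteration times iterations left); it is not Detectron's own code, and the function names are ours.

```python
import bisect
import datetime

# Values copied from the SOLVER section of the config dump.
BASE_LR, GAMMA = 0.0025, 0.1
STEPS = [0, 30000, 40000]
MAX_ITER = 60000
WARM_UP_ITERS, WARM_UP_FACTOR = 500, 1.0 / 3.0


def lr_at_iter(it):
    """Step decay with linear warm-up (assumed meaning of 'steps_with_decay')."""
    step_index = bisect.bisect_right(STEPS, it) - 1
    lr = BASE_LR * GAMMA ** step_index
    if it < WARM_UP_ITERS:
        alpha = it / float(WARM_UP_ITERS)
        # Ramp the multiplier linearly from WARM_UP_FACTOR up to 1.
        lr *= WARM_UP_FACTOR * (1.0 - alpha) + alpha
    return lr


def eta(it, seconds_per_iter):
    """Remaining time if every remaining iteration takes seconds_per_iter."""
    remaining = int(seconds_per_iter * (MAX_ITER - it))
    return datetime.timedelta(seconds=remaining)


print("{:.6f}".format(lr_at_iter(20)))   # compare with the lr logged at iter 20
print(eta(0, 3.245656))                  # compare with the eta logged at iter 0
print(eta(20, 0.509616))                 # compare with the eta logged at iter 20
```

The large drop in estimated time between iteration 0 and iteration 20 simply reflects the slow first iteration (about 3.2 seconds) against the steady-state 0.5 seconds per iteration.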

Memory usage is as follows:

 

Conclusion

Detectron provides Caffe2 and Python interfaces. Caffe2 offers good support for multi-GPU and distributed training, which greatly improves GPU utilization, and Detectron provides solid baseline implementations of many state-of-the-art methods. We believe the Detectron framework will shine in the computer vision field.

Installation Tip 1:


     
    >>> import caffe2
    >>> from caffe2.python import core
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "caffe2/python/core.py", line 24, in <module>
        from past.builtins import basestring
    ImportError: No module named past.builtins
    >>> quit()

The past.builtins module is provided by the future package, so install it:

    sudo pip install future

Installation Tip 2:

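Once Caffe2 is installed, its Python build directory must be added to PYTHONPATH and its library directory to LD_LIBRARY_PATH. Open the shell configuration file:

    nano ~/.bashrc

Add the following lines (adjust the build path to your own checkout):

    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    export PYTHONPATH=$PYTHONPATH:/home/huichan/caffe2/caffe2/build

Then reload the configuration:

    source ~/.bashrc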
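To confirm that both variables took effect in a new shell, a short check such as the following can help. The helper name path_in_env is ours, and the two directories are the ones from this post, so substitute your own paths.

```python
import os


def path_in_env(var, directory, environ=None):
    """True if directory is one of the entries of the path-list variable var."""
    environ = os.environ if environ is None else environ
    entries = [e for e in environ.get(var, "").split(os.pathsep) if e]
    target = os.path.normpath(directory)
    return any(os.path.normpath(e) == target for e in entries)


for var, directory in [("LD_LIBRARY_PATH", "/usr/local/lib"),
                       ("PYTHONPATH", "/home/huichan/caffe2/caffe2/build")]:
    print(var, "ok" if path_in_env(var, directory) else "missing " + directory)
```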

This article is from the Heart of the Machine column. To reprint it, please contact the original author for authorization.
