From paper to test: A primer on Facebook's Detectron Project

Heart of the Machine column

Author: Chen Huichan

From RCNN to Faster RCNN, and more recently FPN and Mask RCNN (which won the ICCV Best Paper award), deep learning has come to dominate object detection, clearly outperforming other machine learning approaches. We have long been waiting for Facebook to open-source its computer vision project. After more than a year of waiting, Facebook has finally released it today as Detectron. Detectron is built on Caffe2 with a Python interface and implements more than 10 recent computer vision results. Below we take a brief look at the papers that Detectron implements. We also ran our own initial tests of Detectron, and we will publish the models we trained and the speed benchmarks we measured in a subsequent blog post.

Fast RCNN, Faster RCNN, RFCN, FPN, RetinaNet

Detectron implements the standard object detection models and adds two state-of-the-art detectors: Feature Pyramid Network (FPN) and RetinaNet. FPN is the state of the art among two-stage detectors, while RetinaNet is the best-performing one-stage detector and won the ICCV Best Student Paper award.

ResNet, ResNeXt

Detectron implements basic neural network backbones such as Residual Network (ResNet) and ResNeXt. ResNeXt uses grouped convolutions to greatly reduce the number of parameters while preserving classification accuracy.
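To make the parameter saving concrete, here is a minimal sketch that counts the weights of a standard convolution and of a grouped convolution. The layer sizes (128 channels, 3x3 kernel, 32 groups) are illustrative assumptions and are not taken from Detectron or from a specific ResNeXt configuration.

```python
def conv_params(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a 2D convolution (bias ignored)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees in_channels / groups input channels.
    return (in_channels // groups) * out_channels * kernel_size * kernel_size


# Illustrative sizes only (assumptions, not values from the article).
dense = conv_params(128, 128, 3)                 # standard convolution
grouped = conv_params(128, 128, 3, groups=32)    # grouped convolution

print("dense:", dense, "grouped:", grouped, "ratio:", dense // grouped)
```

With the same input and output width, the weight count shrinks by a factor equal to the number of groups, which is why a grouped block can be made wider at a similar cost.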

Human-object Interaction Detection

Object detection yields bounding boxes, as shown in Figure (a). Human-object interaction detection learns the relationship between bounding boxes by predicting a probability density over the location of the target object relative to a detected person. As shown in Figure (c), the relationship between the person and the knife is "cut".

Mask RCNN

By extending Faster RCNN, Mask RCNN performs instance segmentation and keypoint detection at about 5 FPS, outperforming all methods available at the time. Mask RCNN achieves strong results on the COCO and Cityscapes datasets. The schematic diagram of Mask RCNN is shown below.

Training ImageNet in one hour

This paper finds that large minibatches can greatly speed up the training of classification networks. By increasing the batch size from 256 to 8192, the training time is reduced from several weeks to one hour, which greatly improves the training speed of neural networks.
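Large-batch training of this kind is usually paired with a learning rate that grows with the batch size and a gradual warm-up at the start of training. The sketch below illustrates that idea only; the reference learning rate (0.1 at batch size 256), the warm-up length and the function names are assumptions made for illustration, not values quoted in this article.

```python
def scaled_lr(batch_size, ref_lr=0.1, ref_batch=256):
    """Scale the learning rate linearly with the batch size.

    ref_lr and ref_batch are assumed reference values.
    """
    return ref_lr * batch_size / float(ref_batch)


def warmup_lr(step, warmup_steps, start_lr, target_lr):
    """Ramp linearly from start_lr to target_lr over warmup_steps."""
    if step >= warmup_steps:
        return target_lr
    alpha = step / float(warmup_steps)
    return start_lr + alpha * (target_lr - start_lr)


target = scaled_lr(8192)  # 32x the reference batch gives 32x the rate
for step in (0, 50, 100):
    print(step, round(warmup_lr(step, 100, 0.1, target), 4))
```

The warm-up avoids applying the full, very large learning rate to randomly initialised weights during the first iterations.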

Learning to Segment Everything

Collecting mask annotations for Mask RCNN is very expensive: annotating a single image in Cityscapes takes about one hour. This paper proposes a weight transfer method for segmenting all object categories, which avoids the large time and money cost of collecting segmentation data. To achieve this, it uses the weights of the bounding box detection branch to predict the weights of the mask branch.
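The idea of predicting mask weights from detection weights can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions: the transfer function here is a single shared linear map, all tensors are random stand-ins, and the dimensions and names (w_det, transfer, predict_mask_weights) are hypothetical rather than taken from the paper or from Detectron.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, det_dim, mask_dim = 5, 16, 8  # toy sizes (assumptions)

# Per-class weights of the bounding box branch (random stand-ins).
w_det = rng.normal(size=(num_classes, det_dim))

# Hypothetical class-agnostic transfer function: one shared linear map.
transfer = rng.normal(size=(det_dim, mask_dim)) * 0.1


def predict_mask_weights(w_det, transfer):
    """Map each class's detection weights to mask-branch weights."""
    return w_det @ transfer


w_mask = predict_mask_weights(w_det, transfer)  # (num_classes, mask_dim)

# Apply the predicted per-class weights to RoI features (mask_dim, H, W).
roi_features = rng.normal(size=(mask_dim, 14, 14))
mask_logits = np.einsum("cd,dhw->chw", w_mask, roi_features)
mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))

print(w_mask.shape, mask_probs.shape)
```

Because the transfer function is shared across classes, it can be learned on classes that have mask annotations and then applied to classes that only have box annotations.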

Non-local Neural Networks

A convolutional neural network can only propagate information within a local neighborhood. Drawing on non-local means and self-attention, this paper designs a non-local operation that captures information from outside the neighborhood. In the figure below, a central point captures important information from non-neighboring positions.
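A non-local operation of this kind can be sketched as a softmax-weighted sum over all positions of a feature map. The version below is a simplified NumPy illustration, not Detectron's implementation: the projection matrices are random, the sizes are toy values, and the residual connection and output projection used in practice are left out.

```python
import numpy as np


def non_local_op(x, w_theta, w_phi, w_g):
    """x has shape (N, C): one feature vector per spatial position."""
    theta = x @ w_theta          # queries, shape (N, D)
    phi = x @ w_phi              # keys, shape (N, D)
    g = x @ w_g                  # values, shape (N, D)
    scores = theta @ phi.T       # similarity of every pair of positions
    scores = scores - scores.max(axis=1, keepdims=True)  # stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    # Each output position is a weighted sum over ALL positions.
    return attn @ g, attn


rng = np.random.default_rng(0)
h, w, c, d = 7, 7, 8, 4                      # toy sizes (assumptions)
feature_map = rng.normal(size=(h, w, c))
x = feature_map.reshape(h * w, c)            # flatten the spatial grid
w_theta, w_phi, w_g = (rng.normal(size=(c, d)) * 0.1 for _ in range(3))

y, attn = non_local_op(x, w_theta, w_phi, w_g)
print(y.shape, attn.shape)
```

Unlike a 3x3 convolution, where each output depends on 9 neighbours, every row of the attention matrix spans all 49 positions of the map.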

A preliminary look at the Detectron framework

To use the Detectron framework, Caffe2 must be installed first; see the Caffe2 website for installation instructions. Then follow INSTALL.md to install Detectron. Detectron makes it easy to run tests and to add new ops. For an example of adding an op, see test_zero_even_op.py.

The Detectron repository contains folders such as configs, demo, lib, tests, and tools. configs contains the training and testing parameters of each model. lib is the core folder of Detectron and contains the data loader, model builder, operator definitions, and utils (non-core functions such as learning rate scheduling).

Detectron installation

To install Caffe2, refer to https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile

Core commands:


     
      git clone --recursive https://github.com/caffe2/caffe2.git && cd caffe2
      make && cd build && sudo make install
      python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
      
     

To install Detectron, refer to https://github.com/facebookresearch/Detectron/blob/master/INSTALL.md

Detectron test

Run the test using Mask RCNN FPN ResNet 50 as follows:


     
      CUDA_VISIBLE_DEVICES=3 python tools/train_net.py --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml OUTPUT_DIR /tmp/detectron-output
      
     

Test speed on Titan X:


     
      INFO infer_simple.py: 111: Processing demo/16004479832_a748d55f21_k.jpg -> /tmp/detectron-visualizations/16004479832_a748d55f21_k.jpg.pdf
      INFO infer_simple.py: 119: Inference time: 1.402s
      INFO infer_simple.py: 121:  | im_detect_bbox: 1.329s
      INFO infer_simple.py: 121:  | misc_mask: 0.034s
      INFO infer_simple.py: 121:  | im_detect_mask: 0.034s
      INFO infer_simple.py: 121:  | misc_bbox: 0.005s
      INFO infer_simple.py: 124:  \ Note: inference on the first image will be slower than the rest (caches and auto-tuning need to warm up)
      INFO infer_simple.py: 111: Processing demo/18124840932_e42b3e377c_k.jpg -> /tmp/detectron-visualizations/18124840932_e42b3e377c_k.jpg.pdf
      INFO infer_simple.py: 119: Inference time: 0.411s
      INFO infer_simple.py: 121:  | im_detect_bbox: 0.305s
      INFO infer_simple.py: 121:  | misc_mask: 0.058s
      INFO infer_simple.py: 121:  | im_detect_mask: 0.044s
      INFO infer_simple.py: 121:  | misc_bbox: 0.004s
      INFO infer_simple.py: 111: Processing demo/24274813513_0cfd2ce6d0_k.jpg -> /tmp/detectron-visualizations/24274813513_0cfd2ce6d0_k.jpg.pdf
      INFO infer_simple.py: 119: Inference time: 0.321s
      INFO infer_simple.py: 121:  | im_detect_bbox: 0.264s
      INFO infer_simple.py: 121:  | misc_mask: 0.034s
      INFO infer_simple.py: 121:  | im_detect_mask: 0.018s
      INFO infer_simple.py: 121:  | misc_bbox: 0.005s
      INFO infer_simple.py: 111: Processing demo/33823288584_1d21cf0a26_k.jpg -> /tmp/detectron-visualizations/33823288584_1d21cf0a26_k.jpg.pdf
      INFO infer_simple.py: 119: Inference time: 0.722s
      INFO infer_simple.py: 121:  | im_detect_bbox: 0.515s
      INFO infer_simple.py: 121:  | misc_mask: 0.127s
      INFO infer_simple.py: 121:  | im_detect_mask: 0.072s
      INFO infer_simple.py: 121:  | misc_bbox: 0.007s
      INFO infer_simple.py: 111: Processing demo/17790319373_bd19b24cfc_k.jpg -> /tmp/detectron-visualizations/17790319373_bd19b24cfc_k.jpg.pdf
      INFO infer_simple.py: 119: Inference time: 0.403s
      INFO infer_simple.py: 121:  | im_detect_bbox: 0.292s
      INFO infer_simple.py: 121:  | misc_mask: 0.067s
      INFO infer_simple.py: 121:  | im_detect_mask: 0.038s
      INFO infer_simple.py: 121:  | misc_bbox: 0.006s
      
     
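Since the log itself notes that the first image is slower because of warm-up, a fairer speed estimate leaves it out. The short script below is a sketch of how the per-image totals from the log above could be summarised; the log excerpt is pasted in as a string, and the variable names are our own.

```python
import re

# Per-image totals copied from the log above.
log = """\
INFO infer_simple.py: 119: Inference time: 1.402s
INFO infer_simple.py: 119: Inference time: 0.411s
INFO infer_simple.py: 119: Inference time: 0.321s
INFO infer_simple.py: 119: Inference time: 0.722s
INFO infer_simple.py: 119: Inference time: 0.403s
"""

times = [float(t) for t in re.findall(r"Inference time: ([0-9.]+)s", log)]

# Drop the first image: caches and auto-tuning are still warming up.
warm = times[1:]
mean_warm = sum(warm) / len(warm)

print("images:", len(times))
print("mean time after warm-up: %.3fs" % mean_warm)
print("approx. images per second: %.2f" % (1.0 / mean_warm))
```

Five images are far too few for a benchmark, so this should be read as a rough indication only.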

Detectron framework training

We trained Faster RCNN with an FPN ResNet-50 backbone on the COCO dataset.

Use the following command:


     
      CUDA_VISIBLE_DEVICES=3 python tools/train_net.py \
          --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
          OUTPUT_DIR /tmp/detectron-output
      
     

The output is as follows:


     
      Namespace(cfg_file='configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml', multi_gpu_testing=False, opts=['OUTPUT_DIR', '/tmp/detectron-output'], skip_test=False)
      INFO train_net.py: 188: Training with config:
      INFO train_net.py: 189: {'BBOX_XFORM_CLIP': 4.1351665567423561,
       'CLUSTER': {'ON_CLUSTER': False},
       'DATA_LOADER': {'NUM_THREADS': 4},
       'DEDUP_BOXES': 0.0625,
       'DOWNLOAD_CACHE': '/tmp/detectron-download-cache',
       'EPS': 1e-14,
       'EXPECTED_RESULTS': [],
       'EXPECTED_RESULTS_ATOL': 0.005,
       'EXPECTED_RESULTS_EMAIL': '',
       'EXPECTED_RESULTS_RTOL': 0.1,
       'FAST_RCNN': {'MLP_HEAD_DIM': 1024,
                     'ROI_BOX_HEAD': 'fast_rcnn_heads.add_roi_2mlp_head',
                     'ROI_XFORM_METHOD': 'RoIAlign',
                     'ROI_XFORM_RESOLUTION': 7,
                     'ROI_XFORM_SAMPLING_RATIO': 2},
       'FPN': {'COARSEST_STRIDE': 32,
               'DIM': 256,
               'EXTRA_CONV_LEVELS': False,
               'FPN_ON': True,
               'MULTILEVEL_ROIS': True,
               'MULTILEVEL_RPN': True,
               'ROI_CANONICAL_LEVEL': 4,
               'ROI_CANONICAL_SCALE': 224,
               'ROI_MAX_LEVEL': 5,
               'ROI_MIN_LEVEL': 2,
               'RPN_ANCHOR_START_SIZE': 32,
               'RPN_ASPECT_RATIOS': (0.5, 1, 2),
               'RPN_MAX_LEVEL': 6,
               'RPN_MIN_LEVEL': 2,
               'ZERO_INIT_LATERAL': False},
       'MATLAB': 'matlab',
       'MEMONGER': True,
       'MEMONGER_SHARE_ACTIVATIONS': False,
       'MODEL': {'BBOX_REG_WEIGHTS': (10.0, 10.0, 5.0, 5.0),
                 'CLS_AGNOSTIC_BBOX_REG': False,
                 'CONV_BODY': 'FPN.add_fpn_ResNet50_conv5_body',
                 'EXECUTION_TYPE': 'dag',
                 'FASTER_RCNN': True,
                 'KEYPOINTS_ON': False,
                 'MASK_ON': False,
                 'NUM_CLASSES': 81,
                 'RPN_ONLY': False,
                 'TYPE': 'generalized_rcnn'},
       'MRCNN': {'CLS_SPECIFIC_MASK': True,
                 'CONV_INIT': 'GaussianFill',
                 'DILATION': 2,
                 'DIM_REDUCED': 256,
                 'RESOLUTION': 14,
                 'ROI_MASK_HEAD': '',
                 'ROI_XFORM_METHOD': 'RoIAlign',
                 'ROI_XFORM_RESOLUTION': 7,
                 'ROI_XFORM_SAMPLING_RATIO': 0,
                 'THRESH_BINARIZE': 0.5,
                 'UPSAMPLE_RATIO': 1,
                 'USE_FC_OUTPUT': False,
                 'WEIGHT_LOSS_MASK': 1.0},
       'NUM_GPUS': 1,
       'OUTPUT_DIR': '/tmp/detectron-output',
       'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
       'RESNETS': {'NUM_GROUPS': 1,
                   'RES5_DILATION': 1,
                   'STRIDE_1X1': True,
                   'TRANS_FUNC': 'bottleneck_transformation',
                   'WIDTH_PER_GROUP': 64},
       'RETINANET': {'ANCHOR_SCALE': 4,
                     'ASPECT_RATIOS': (0.5, 1.0, 2.0),
                     'BBOX_REG_BETA': 0.11,
                     'BBOX_REG_WEIGHT': 1.0,
                     'CLASS_SPECIFIC_BBOX': False,
                     'INFERENCE_TH': 0.05,
                     'LOSS_ALPHA': 0.25,
                     'LOSS_GAMMA': 2.0,
                     'NEGATIVE_OVERLAP': 0.4,
                     'NUM_CONVS': 4,
                     'POSITIVE_OVERLAP': 0.5,
                     'PRE_NMS_TOP_N': 1000,
                     'PRIOR_PROB': 0.01,
                     'RETINANET_ON': False,
                     'SCALES_PER_OCTAVE': 3,
                     'SHARE_CLS_BBOX_TOWER': False,
                     'SOFTMAX': False},
       'RFCN': {'PS_GRID_SIZE': 3},
       'RNG_SEED': 3,
       'ROOT_DIR': '/home/huichan/caffe2/detectron',
       'RPN': {'ASPECT_RATIOS': (0.5, 1, 2),
               'RPN_ON': True,
               'SIZES': (64, 128, 256, 512),
               'STRIDE': 16},
       'SOLVER': {'BASE_LR': 0.0025,
                  'GAMMA': 0.1,
                  'LOG_LR_CHANGE_THRESHOLD': 1.1,
                  'LRS': [],
                  'LR_POLICY': 'steps_with_decay',
                  'MAX_ITER': 60000,
                  'MOMENTUM': 0.9,
                  'SCALE_MOMENTUM': True,
                  'SCALE_MOMENTUM_THRESHOLD': 1.1,
                  'STEPS': [0, 30000, 40000],
                  'STEP_SIZE': 30000,
                  'WARM_UP_FACTOR': 0.3333333333333333,
                  'WARM_UP_ITERS': 500,
                  'WARM_UP_METHOD': u'linear',
                  'WEIGHT_DECAY': 0.0001},
       'TRAIN': {'ASPECT_GROUPING': True,
                 'AUTO_RESUME': True,
                 'BATCH_SIZE_PER_IM': 256,
                 'BBOX_THRESH': 0.5,
                 'BG_THRESH_HI': 0.5,
                 'BG_THRESH_LO': 0.0,
                 'CROWD_FILTER_THRESH': 0.7,
                 'DATASETS': ('coco_2014_train',),
                 'FG_FRACTION': 0.25,
                 'FG_THRESH': 0.5,
                 'FREEZE_CONV_BODY': False,
                 'GT_MIN_AREA': -1,
                 'IMS_PER_BATCH': 2,
                 'MAX_SIZE': 833,
                 'PROPOSAL_FILES': (),
                 'RPN_BATCH_SIZE_PER_IM': 256,
                 'RPN_FG_FRACTION': 0.5,
                 'RPN_MIN_SIZE': 0,
                 'RPN_NEGATIVE_OVERLAP': 0.3,
                 'RPN_NMS_THRESH': 0.7,
                 'RPN_POSITIVE_OVERLAP': 0.7,
                 'RPN_POST_NMS_TOP_N': 2000,
                 'RPN_PRE_NMS_TOP_N': 2000,
                 'RPN_STRADDLE_THRESH': 0,
                 'SCALES': (500,),
                 'SNAPSHOT_ITERS': 20000,
                 'USE_FLIPPED': True,
                 'WEIGHTS': u'/tmp/detectron-download-cache/ImageNetPretrained/MSRA/R-50.pkl'},
       'USE_NCCL': False,
       'VIS': False,
       'VIS_TH': 0.9}
      I0123 13:14:38.367794 36482 context_gpu.cu:325] Total: 311 MB
      INFO train_net.py: 330: Loading dataset: ('coco_2014_train',)
      loading annotations into memory...
      Done (t=15.17s)
      creating index...
      index created!
      INFO roidb.py:  49: Appending horizontally-flipped training examples...
      INFO roidb.py:  51: Loaded dataset: coco_2014_train
      INFO roidb.py: 135: Filtered 1404 roidb entries: 165566 -> 164162
      INFO roidb.py:  67: Computing bounding-box regression targets...
      INFO roidb.py:  69: done
      INFO roidb.py: 191: Ground-truth class histogram:
      INFO roidb.py: 195: 0 __background__: 0
      INFO roidb.py: 195: 1        person: 363358
      INFO roidb.py: 195: 2       bicycle: 9824
      INFO roidb.py: 195: 3           car: 61106
      INFO roidb.py: 195: 4    motorcycle: 11944
      INFO roidb.py: 195: 5      airplane: 7656
      INFO roidb.py: 195: 6           bus: 8642
      INFO roidb.py: 195: 7         train: 6316
      INFO roidb.py: 195: 8         truck: 14094
      INFO roidb.py: 195: 9          boat: 14912
      INFO roidb.py: 195: 10 traffic light: 18248
      INFO roidb.py: 195: 11  fire hydrant: 2632
      INFO roidb.py: 195: 12     stop sign: 2744
      INFO roidb.py: 195: 13 parking meter: 1666
      INFO roidb.py: 195: 14         bench: 13482
      INFO roidb.py: 195: 15          bird: 14226
      INFO roidb.py: 195: 16           cat: 6598
      INFO roidb.py: 195: 17           dog: 7534
      INFO roidb.py: 195: 18         horse: 9304
      INFO roidb.py: 195: 19         sheep: 12916
      INFO roidb.py: 195: 20           cow: 11196
      INFO roidb.py: 195: 21      elephant: 7760
      INFO roidb.py: 195: 22          bear: 1806
      INFO roidb.py: 195: 23         zebra: 7316
      INFO roidb.py: 195: 24       giraffe: 7186
      INFO roidb.py: 199:          total: 1195680
      INFO train_net.py: 334: 164162 roidb entries
      INFO net.py:  54: Loading from: /tmp/detectron-download-cache/ImageNetPretrained/MSRA/R-50.pkl
      I0123 13:16:41.699045 36482 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 0.000500389 secs
      I0123 13:16:41.699774 36482 net_dag.cc:61] Number of parallel execution chains 340 Number of operators = 632
      INFO loader.py: 232:   [62/64]
      INFO detector.py: 434: Changing learning rate 0.000000 -> 0.0033 at iter 0
      json_stats: {"accuracy_cls": 0.000000, "eta": "2 days, 6:05:39", "iter": 0, "loss": 5.814330, "loss_bbox": 0.008809, "loss_cls": 4.863443, "loss_rpn_bbox_fpn2": 0.000000, "loss_rpn_bbox_fpn3": 0.000000, "loss_rpn_bbox_fpn4": 0.002576, "loss_rpn_bbox_fpn5": 0.264878, "loss_rpn_bbox_fpn6": 0.000000, "loss_rpn_cls_fpn2": 0.455301, "loss_rpn_cls_fpn3": 0.091068, "loss_rpn_cls_fpn4": 0.022299, "loss_rpn_cls_fpn5": 0.105955, "loss_rpn_cls_fpn6": 0.000000, "lr": 0.0033, "mb_qsize": 64, "mem": 3253, "time": 3.245656}
      json_stats: {"accuracy_cls": 0.940430, "eta": "8:29:26", "iter": 20, "loss": 1.839182, "loss_bbox": 0.071032, "loss_cls": 0.897934, "loss_rpn_bbox_fpn2": 0.077837, "loss_rpn_bbox_fpn3": 0.005068, "loss_rpn_bbox_fpn4": 0.014110, "loss_rpn_bbox_fpn5": 0.013995, "loss_rpn_bbox_fpn6": 0.000000, "loss_rpn_cls_fpn2": 0.425642, "loss_rpn_cls_fpn3": 0.099356, "loss_rpn_cls_fpn4": 0.034078, "loss_rpn_cls_fpn5": 0.019162, "loss_rpn_cls_fpn6": 0.000000, "lr": 0.000900, "mb_qsize": 64, "mem": 3267, "time": 0.509616}
      
     
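The SOLVER section of the config above describes a stepped learning rate with a linear warm-up. The sketch below shows one plausible reading of those settings (BASE_LR, GAMMA, STEPS, WARM_UP_ITERS, WARM_UP_FACTOR); it is our own illustration of the schedule, not Detectron's code, and the function name schedule_lr is hypothetical.

```python
def schedule_lr(it, base_lr=0.0025, gamma=0.1, steps=(0, 30000, 40000),
                warm_up_iters=500, warm_up_factor=1.0 / 3.0):
    """Learning rate at iteration `it` (assumed interpretation)."""
    # Step decay: multiply by gamma once for each boundary after the first.
    stage = sum(1 for s in steps if it >= s) - 1
    lr = base_lr * gamma ** stage
    # Linear warm-up: blend from warm_up_factor up to 1.0.
    if it < warm_up_iters:
        alpha = it / float(warm_up_iters)
        lr *= warm_up_factor * (1.0 - alpha) + alpha
    return lr


for it in (20, 500, 30000, 40000):
    print(it, "%.6f" % schedule_lr(it))
```

Under this reading the value at iteration 20 is 0.0009, which agrees with the "lr" field logged at iter 20 in the output above.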

Memory usage is as follows:

Conclusion

The Detectron framework provides Caffe2 and Python interfaces. Caffe2 offers good support for multi-GPU and distributed training, which greatly improves GPU utilization. Detectron also provides solid baseline implementations of many state-of-the-art methods. We believe the Detectron framework will shine in the computer vision field in the future.

Installation Tip 1:


     
      >>> import caffe2
      >>> from caffe2.python import core
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "caffe2/python/core.py", line 24, in <module>
          from past.builtins import basestring
      ImportError: No module named past.builtins
      >>> quit()
      
     
This error is fixed by installing the future package:

      sudo pip install future

Installation Tip 2:

Once Caffe2 is installed, it needs to be added to PYTHONPATH and LD_LIBRARY_PATH.

Open ~/.bashrc in an editor:

      nano ~/.bashrc

Add the following lines:

      export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
      export PYTHONPATH=$PYTHONPATH:/home/huichan/caffe2/caffe2/build

Then save the file and reload it in the current shell:

      source ~/.bashrc
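Both tips can be checked quickly from Python. The helper below is a small sketch of our own (module_available is a hypothetical name, not part of Caffe2 or Detectron) that reports whether the modules from the two tips can be imported in the current environment.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported in this environment."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


# past.builtins comes from the `future` package (Tip 1);
# caffe2.python.core needs PYTHONPATH and LD_LIBRARY_PATH set (Tip 2).
for name in ("past.builtins", "caffe2.python.core"):
    status = "ok" if module_available(name) else "missing"
    print("{:<20} {}".format(name, status))
```

If caffe2.python.core is reported as missing after installation, the PYTHONPATH entry is the first thing to check.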

This article is from the Heart of the Machine column. To reprint it, please contact the original author for authorization.
