On October 22, 2017, ICCV 2017, the top international conference on computer vision, announced its award-winning papers. Kaiming He, an AI researcher at Facebook, won the best paper award and was also a co-author of the best student paper. This article covers a complete reproduction of the best paper, “Mask R-CNN”, by TuSimple, a Chinese autonomous driving startup. The code has been open-sourced on GitHub.


For those in the computer vision community, the biggest news of the past couple of days has to be that Kaiming He took home two best paper awards at ICCV! Across social platforms, people are saying things like “A god is a god, and no mortal can compete with him”, and some joke that “Others collect best paper awards faster than I publish papers”…


Of course, all this self-deprecation is really a way of expressing admiration for Kaiming He and respect for top academic researchers. At the same time, many practitioners care more about a practical question: when will the code be open-sourced?





TuSimple would like to congratulate Kaiming He on his achievement. And this time, beyond cheering for him on social media, TuSimple decided to do something practical: reproduce the results of his papers (Mask R-CNN and Feature Pyramid Network) in full and open-source the corresponding code for everyone! This is also the first open-source code to reproduce the results of Kaiming He’s paper.




The Mask R-CNN framework for instance segmentation, as presented in the paper


Project address: https://github.com/TuSimple/mx-maskrcnn


MX Mask R-CNN


This is an MXNet implementation of Mask R-CNN. The repository is based largely on mx-rcnn, an MXNet implementation of Faster R-CNN.





Main results


Cityscapes




  • Backbone: ResNet-50-FPN

COCO

COCO results will be released soon. Please stay tuned.


System requirements


We tested the code on the following configuration:

  • Ubuntu 16.04, Python 2.7
  • Numpy (1.12.1), cv2 (2.4.9), PIL (4.3), matplotlib (2.1.0), cython (0.26.1), easydict
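To compare a local environment against the tested configuration, a small check along the following lines may help. This is only a sketch: the `TESTED` table and both function names are illustrative and not part of the repository, the import names (`cv2`, `PIL`, `Cython`) are assumed to correspond to the packages listed above, and older releases of some packages may not expose `__version__` at all.

```python
import importlib

# Versions copied from the tested configuration above.
# None means the list gives no version for that package.
TESTED = {
    "numpy": "1.12.1",
    "cv2": "2.4.9",
    "PIL": "4.3",
    "matplotlib": "2.1.0",
    "Cython": "0.26.1",
    "easydict": None,
}


def installed_version(module_name):
    """Return the module's version string, "unknown" if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # treat any import failure as "not importable"
        return None
    return str(getattr(module, "__version__", "unknown"))


def compare_with_tested(tested=None):
    """Map each package name to a short status string."""
    tested = TESTED if tested is None else tested
    report = {}
    for name, wanted in tested.items():
        found = installed_version(name)
        if found is None:
            report[name] = "not importable"
        elif wanted is None or found == wanted or found.startswith(wanted + "."):
            report[name] = "ok"
        else:
            report[name] = "differs: found %s, tested with %s" % (found, wanted)
    return report
```

Calling `compare_with_tested()` and printing the returned dictionary gives one status per package. A "differs" entry does not necessarily mean the code will fail, only that the combination was not the one tested.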


Preparation for training


1. Download the Cityscapes data (gtFine_trainvaltest.zip, leftImg8bit_trainvaltest.zip) and extract it to ‘data/cityscape/’. The folders are organized as follows:
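One way to confirm the extracted data is in place is a small check like the one below. It is only a sketch: it assumes the two standard Cityscapes archives unpack into `leftImg8bit/` and `gtFine/`, each split into `train/`, `val/`, and `test/`, and it does not cover any additional files the repository may expect. The function name is illustrative.

```python
import os

# Assumption: standard Cityscapes archive layout, nothing repository-specific.
EXPECTED_SUBDIRS = [
    os.path.join(top, split)
    for top in ("leftImg8bit", "gtFine")
    for split in ("train", "val", "test")
]


def missing_cityscapes_dirs(root="data/cityscape"):
    """Return the expected sub-directories that are absent under root."""
    return [
        sub for sub in EXPECTED_SUBDIRS
        if not os.path.isdir(os.path.join(root, sub))
    ]
```

An empty list from `missing_cityscapes_dirs()` means all six folders were found under the data root.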



2. Download the ResNet-50 pretrained model


bash scripts/download_res50.sh


3. Build MXNet with the ROIAlign operator


cp rcnn/CXX_OP/* incubator-mxnet/src/operator/


To build MXNet from source, refer to the tutorial:


https://mxnet.incubator.apache.org/get_started/build_from_source.html
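For readers unfamiliar with ROIAlign: it is the operator Mask R-CNN uses in place of RoI pooling, sampling the feature map with bilinear interpolation rather than rounding region coordinates to integers. The pure-Python sketch below illustrates the idea on a single-channel feature map. It is not the repository's C++ operator; the function names, the 2×2 samples per bin, the average pooling, and the convention that integer coordinates fall on pixel centres are all assumptions made for illustration.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at real (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the sample point to the valid coordinate range.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool a region (x1, y1, x2, y2) into an out_size x out_size grid.

    Each output bin averages samples x samples bilinear samples taken at
    evenly spaced, non-rounded positions inside the bin.
    """
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / float(out_size)
    bin_w = (x2 - x1) / float(out_size)
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))
        output.append(row)
    return output
```

As a quick check, on a 4×4 map whose value equals its column index, `roi_align(feature, (0.5, 0.5, 2.5, 2.5))` returns `[[1.0, 2.0], [1.0, 2.0]]`: each bin recovers the column coordinate of its centre even though the region boundaries are not integers.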


4. Build the related Cython code


make


5. Start training


bash scripts/train_alternate.sh


Evaluation


1. Prepare Cityscapes evaluation scripts


bash scripts/download_cityscapescripts.sh


2. Run the evaluation


bash scripts/eval.sh
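The Cityscapes instance-level benchmark scores predictions by how well predicted and ground-truth masks overlap. As an illustration of that overlap measure only, and not of the Cityscapes scripts themselves, a mask intersection-over-union can be sketched as follows. The function name and the nested-list mask representation are assumptions chosen to keep the example dependency-free.

```python
def mask_iou(pred, gt):
    """Intersection over union of two binary masks.

    Both masks are equally sized nested lists of 0/1 values.
    Returns 0.0 when both masks are empty.
    """
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    return intersection / float(union) if union else 0.0
```

For example, a prediction covering the top row of a 2×2 grid and a ground truth covering the left column share one pixel out of three covered in total, so their IoU is 1/3.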


Demo


1. Download the model from one of the following links and place it in the ‘model’ folder.

  • Dropbox link: https://www.dropbox.com/s/zidcbbt7apwg3z6/final-0000.params?dl=0
  • Baidu cloud link: https://pan.baidu.com/s/1o8n4VMU

2. Make sure the Cityscapes data is in the same data folder used for training (‘data/cityscape/’), then run the demo:


bash scripts/demo.sh


References


Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. “MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems.” In Neural Information Processing Systems, Workshop on Machine Learning Systems, 2015.

Ross Girshick. “Fast R-CNN.” In Proceedings of the IEEE International Conference on Computer Vision, 2015.

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.

Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie. “Feature Pyramid Networks for Object Detection.” In Computer Vision and Pattern Recognition, IEEE Conference on, 2017.

Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick. “Mask R-CNN.” In Proceedings of the IEEE International Conference on Computer Vision, 2017.

Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. “Caffe: Convolutional architecture for fast feature embedding.” In Proceedings of the ACM International Conference on Multimedia, 2014.

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. “ImageNet: A large-scale hierarchical image database.” In Computer Vision and Pattern Recognition, IEEE Conference on, 2009.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. “Deep Residual Learning for Image Recognition.” In Computer Vision and Pattern Recognition, IEEE Conference on, 2016.

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele. “The Cityscapes Dataset for Semantic Urban Scene Understanding.” In Computer Vision and Pattern Recognition, IEEE Conference on, 2016.