A review of semantic segmentation

Preface

This article gives a brief overview of important papers on semantic segmentation, summarizes the main improvements and results of each, and explains how to download the papers.

This article is part of the technical summary series of the CV technical guide public account.

CV technical guide focuses on computer vision technology summaries, tracking of the latest techniques, and interpretation of classic papers.

Semantic segmentation refers to the process of linking each pixel in an image to a class label. These labels may include people, cars, flowers, furniture, etc.

We can think of semantic segmentation as image classification at the pixel level. For example, in an image that contains many cars, semantic segmentation labels all of them with the same class, car, without telling them apart. A separate category of models, called instance segmentation, can label the individual instances of an object that appear in an image. Instance segmentation is useful in applications that need to count objects, such as measuring the flow of people in a shopping mall.
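
As a minimal illustration of pixel-level classification, the sketch below turns an array of per-pixel class scores into a label map by taking the arg max over the class axis. The class list, the array shape, and the hand-written scores are made up for illustration; in practice the scores would come from a trained network.

```python
import numpy as np

# Hypothetical class list; index 0 is the background class.
CLASSES = ["background", "person", "car"]

def scores_to_label_map(scores):
    """Convert per-pixel class scores of shape (H, W, C) into an (H, W) label map."""
    return np.argmax(scores, axis=-1)

# Toy 2x3 "image": hand-written scores stand in for a network's output.
scores = np.array([
    [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]],
    [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]],
])
label_map = scores_to_label_map(scores)

# Every "car" pixel receives the same label, regardless of which car it belongs to.
car_pixels = int((label_map == CLASSES.index("car")).sum())
```

Because the label map only records a class per pixel, it cannot say how many cars are present; that is the gap instance segmentation fills.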

Some of its major applications are self-driving cars, human-computer interaction, robotics, and photo editing and creative tools. Semantic segmentation matters for self-driving cars and robotics, for example, because these systems need to understand the context in which they operate.

“Two men riding on a bike in front of a building on the road. And there is a car.”

This article introduces research papers on recent methods for building semantic segmentation models, namely:

Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation
Fully Convolutional Networks for Semantic Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation
The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
Multi-Scale Context Aggregation by Dilated Convolutions
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Rethinking Atrous Convolution for Semantic Image Segmentation
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
Improving Semantic Segmentation via Video Propagation and Label Relaxation
Gated-SCNN: Gated Shape CNNs for Semantic Segmentation

Instructions for downloading the papers above are given at the end of the article.

Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation

Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation (ICCV, 2015)

Code: https://bitbucket.org/deeplab/deeplab-public

This paper proposes a solution to the challenge of training deep convolutional neural networks (CNNs) with weakly labeled data, and with a combination of well-labeled and weakly labeled data.

The paper applies a combination of a deep CNN and a fully connected conditional random field.

On the PASCAL VOC segmentation benchmark, the model achieves a mean intersection-over-union (IOU) score above 70%. One of the main challenges with this kind of model is that it requires images annotated at the pixel level during training.
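
Since mean IOU is the metric quoted throughout this article, a small sketch of how it can be computed for a single pair of label maps may help. This is an illustrative version only: the function name and toy arrays are made up, and skipping classes that are absent from both maps is an assumption of this sketch (benchmarks such as PASCAL VOC define their own accumulation rules over the whole dataset).

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two integer label maps.

    Classes absent from both prediction and ground truth are skipped
    (an assumption of this sketch, not a benchmark rule).
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy ground truth and prediction with two classes.
target = np.array([[0, 0, 1], [0, 1, 1]])
pred = np.array([[0, 1, 1], [0, 1, 1]])
score = mean_iou(pred, target, num_classes=2)
```

In this toy case, class 0 has IOU 2/3 and class 1 has IOU 3/4, so the mean is about 0.708.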

The main contributions of this paper are:

Expectation-maximization algorithms are introduced for training from bounding-box or image-level annotations, in both weakly supervised and semi-supervised settings.
It is shown that combining weak and strong annotations improves performance. After merging annotations from the MS-COCO and PASCAL datasets, the authors reach 73.9% IOU on PASCAL VOC 2012.
It is shown that the method achieves higher performance by combining a small number of pixel-level annotated images with a large number of bounding-box or image-level annotated images.

Fully convolutional networks for semantic segmentation

Fully Convolutional Networks for Semantic Segmentation (PAMI, 2016)

Code: fcn.berkeleyvision.org

The proposed model achieves 67.2% mean IU on PASCAL VOC 2012.

A fully convolutional network takes an image of any size and produces an output of corresponding spatial dimensions. In this model, ILSVRC classifiers are cast into fully convolutional networks and augmented for dense prediction with a pixel-wise loss and in-network upsampling. Training for segmentation is then done by fine-tuning, with back propagation through the whole network.
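
One way to see why a classifier can be turned into a network that accepts any input size: a fully connected layer applied to a feature vector performs the same computation as a 1×1 convolution applied at every spatial location. The sketch below checks this with NumPy. The weights are random and the shapes are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

num_features, num_classes = 8, 3
W = rng.standard_normal((num_classes, num_features))  # fully connected weights
b = rng.standard_normal(num_classes)

def fully_connected(x):
    """Classify one feature vector of shape (num_features,)."""
    return W @ x + b

def conv1x1(feature_map):
    """Apply the same weights as a 1x1 convolution to an (H, W, num_features) map."""
    return feature_map @ W.T + b  # result has shape (H, W, num_classes)

# The convolutional form accepts any spatial size and yields one score vector per location.
small = rng.standard_normal((4, 5, num_features))
large = rng.standard_normal((7, 9, num_features))
dense_small = conv1x1(small)
dense_large = conv1x1(large)
```

The output grid has the same height and width as the input feature map, which is what makes dense, per-location prediction possible.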

U-Net: Convolutional networks for biomedical image segmentation

U-Net: Convolutional Networks for Biomedical Image Segmentation (MICCAI, 2015)

Code: lmb.informatik.uni-freiburg.de/people/ronn…

In biomedical image processing, it is important to obtain a class label for every cell in an image. The biggest challenge in biomedical tasks is that thousands of training images are hard to obtain.

This paper builds on the fully convolutional network and modifies it so that it works with few training images and produces more precise segmentations.

Since very little training data is available, the model uses data augmentation, applying elastic deformations to the available data. As shown in Figure 1, the network architecture consists of a contracting path on the left and an expansive path on the right.

The contracting path consists of repeated pairs of 3×3 convolutions. Each convolution is followed by a rectified linear unit, and each pair by a 2×2 max pooling operation for downsampling. Each downsampling step doubles the number of feature channels. Each step in the expansive path upsamples the feature map and then applies a 2×2 convolution (an “up-convolution”) that halves the number of feature channels. The final layer is a 1×1 convolution that maps each component feature vector to the desired number of classes.
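
The arithmetic of the two paths can be traced with a few lines of plain Python. This is a sketch under stated assumptions: the 572-pixel input tile, the 64 base channels, and the four downsampling steps are taken from the U-Net paper's architecture figure rather than from this article, the 3×3 convolutions are assumed to be unpadded (each trims 2 pixels), and each expansive step is assumed to end with two 3×3 convolutions as in the paper. The function name is made up for illustration.

```python
def unet_shape_trace(input_size=572, base_channels=64, depth=4):
    """Trace (stage, spatial size, channels) through a U-Net with unpadded 3x3 convolutions."""
    size, channels = input_size, base_channels
    trace = []
    # Contracting path: two 3x3 convs (each trims 2 pixels), then 2x2 max pooling.
    for _ in range(depth):
        size -= 4
        trace.append(("down", size, channels))
        size //= 2      # 2x2 max pooling halves the spatial size
        channels *= 2   # the next stage doubles the feature channels
    # Lowest resolution: two more 3x3 convs.
    size -= 4
    trace.append(("bottom", size, channels))
    # Expansive path: the up-convolution doubles the size and halves the channels,
    # then two 3x3 convs follow.
    for _ in range(depth):
        size *= 2
        channels //= 2
        size -= 4
        trace.append(("up", size, channels))
    return trace

trace = unet_shape_trace()
for stage, size, channels in trace:
    print(f"{stage:>6}: {size}x{size}, {channels} channels")
```

Under these assumptions the output map is smaller than the input tile (388 versus 572 pixels per side), because every unpadded convolution trims the border.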

The model is trained with the input images, their segmentation maps, and Caffe’s stochastic gradient descent implementation. Data augmentation teaches the network the required robustness and invariance when very little training data is available. The model achieved an average IOU score of 92% in one of the experiments.

The One Hundred Layers Tiramisu: Fully convolutional DenseNets for semantic segmentation

Paper: The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (2017)

Code: github.com/SimJeg/FC-D…

The idea behind DenseNets is that connecting every layer to every other layer in a feed-forward fashion makes the network easier to train and more accurate.
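
The connectivity pattern can be sketched in a few lines of NumPy. This is a toy version, not the paper's layer: each “layer” is a random 1×1 projection followed by ReLU, and the names `dense_block` and `growth_rate` as well as the sizes are illustrative assumptions. What the sketch does show is that every layer receives the concatenation of all earlier feature maps, so the channel count grows by a fixed amount per layer.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Toy dense block on an (H, W, C) feature map.

    Each "layer" is a random 1x1 projection followed by ReLU, standing in
    for a real convolutional layer; only the connectivity pattern matters here.
    """
    features = [x]
    for _ in range(num_layers):
        # Every layer sees the concatenation of all earlier feature maps.
        stacked = np.concatenate(features, axis=-1)
        weights = rng.standard_normal((stacked.shape[-1], growth_rate))
        new_features = np.maximum(stacked @ weights, 0.0)
        features.append(new_features)
    return np.concatenate(features, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 16))
out = dense_block(x, num_layers=3, growth_rate=12, rng=rng)
```

With 16 input channels, three layers, and a growth rate of 12, the block outputs 16 + 3 × 12 = 52 channels, and the input is passed through unchanged as the first 16.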

The model’s architecture is built from dense blocks arranged in a downsampling path and an upsampling path. The downsampling path has two Transitions Down (TD) and the upsampling path has two Transitions Up (TU). In the architecture diagram, the circles and arrows represent connection patterns within the network.

The main contributions of this paper are:

The DenseNet architecture is extended to fully convolutional networks for semantic segmentation.
An upsampling path for dense networks is proposed that performs better than other upsampling paths.
It is shown that the network can produce state-of-the-art results on standard benchmarks.

The model achieved 88% global accuracy on the CamVid dataset.

Multi-scale context aggregation by dilated convolutions

Paper: Multi-Scale Context Aggregation by Dilated Convolutions (ICLR, 2016)

Code: github.com/fyu/dilatio…

This paper develops a convolutional network module that aggregates multi-scale contextual information without losing resolution. The module can be plugged into existing architectures at any resolution. It is based on dilated convolutions.
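
The claim that context grows without loss of resolution can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions, for which the receptive field of a stack is 1 plus the sum of (kernel size − 1) × dilation over the layers. The dilation schedule 1, 2, 4, 8 and the function names are illustrative choices, not a description of the exact module in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in pixels, along one axis) of stacked stride-1 convolutions."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field

def output_size(input_size, kernel_size, dilation):
    """Output length of a stride-1 convolution padded by dilation * (kernel_size - 1) // 2."""
    padding = dilation * (kernel_size - 1) // 2
    return input_size + 2 * padding - dilation * (kernel_size - 1)

# Four 3x3 layers with doubling dilation versus four ordinary 3x3 layers.
dilated = receptive_field(3, [1, 2, 4, 8])
plain = receptive_field(3, [1, 1, 1, 1])

# With matching padding, a dilated layer keeps the feature map size unchanged.
same_size = output_size(64, 3, 4)
```

Four dilated layers cover a 31-pixel window where four ordinary layers cover 9, while the feature map keeps its 64-pixel size.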

The module was tested on the Pascal VOC 2012 dataset. The results show that adding a context module to existing semantic segmentation architectures improves their accuracy.

The front-end module trained in the experiments achieved 69.8% mean IoU on the VOC-2012 validation set and 71.3% mean IoU on the test set. The paper also reports the model’s prediction accuracy for individual object classes.

DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs

Paper: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (TPAMI, 2017)

Code: github.com/tensorflow/… (Unofficial)

In this paper, the authors make the following contributions to semantic segmentation with deep learning:

Convolution with upsampled filters (atrous convolution) for dense prediction tasks.
Atrous spatial pyramid pooling (ASPP) for segmenting objects at multiple scales.
Improved localization of object boundaries by combining DCNNs with fully connected CRFs.

The proposed DeepLab system achieves 79.7% mIOU on the PASCAL VOC-2012 semantic image segmentation task.

This paper addresses the main challenges of using deep CNN in semantic segmentation, including:

Reduced feature resolution caused by the repeated combination of max pooling and downsampling.
The existence of objects at multiple scales.
Reduced localization accuracy caused by DCNN invariance, since an object-centric classifier requires invariance to spatial transformations.

Atrous convolution is applied either by upsampling the filters through the insertion of zeros, or by sparsely sampling the input feature maps. The second method subsamples the input feature map by a factor equal to the atrous convolution rate r and deinterlaces it to produce r² reduced-resolution maps, one for each of the r×r possible shifts. Standard convolution is then applied to these intermediate feature maps, and the results are reinterlaced to the original image resolution.
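
The equivalence of these formulations can be checked on a one-dimensional signal. The sketch below is a simplified 1-D analogue (so there are r shifts instead of r×r), uses “valid” borders, and computes a correlation, meaning the filter is not flipped. The function names and the toy signal are illustrative assumptions.

```python
import numpy as np

def atrous_by_sparse_sampling(signal, kernel, rate):
    """1-D atrous correlation: sample the input every `rate` positions ("valid" mode)."""
    span = (len(kernel) - 1) * rate
    out = np.zeros(len(signal) - span)
    for i in range(len(out)):
        for k, w in enumerate(kernel):
            out[i] += w * signal[i + k * rate]
    return out

def atrous_by_upsampled_filter(signal, kernel, rate):
    """Same result via a filter with rate - 1 zeros inserted between its taps."""
    upsampled = np.zeros((len(kernel) - 1) * rate + 1)
    upsampled[::rate] = kernel
    # np.correlate slides the filter without flipping it, matching the loop above.
    return np.correlate(signal, upsampled, mode="valid")

def atrous_by_deinterlacing(signal, kernel, rate):
    """Same result by splitting the input into `rate` subsampled copies (one per shift),
    running an ordinary correlation on each, and reinterlacing the outputs."""
    out = np.zeros(len(signal) - (len(kernel) - 1) * rate)
    for shift in range(rate):
        out[shift::rate] = np.correlate(signal[shift::rate], kernel, mode="valid")
    return out

signal = np.arange(10, dtype=float) ** 2
kernel = np.array([1.0, -2.0, 1.0])
a = atrous_by_sparse_sampling(signal, kernel, rate=2)
b = atrous_by_upsampled_filter(signal, kernel, rate=2)
c = atrous_by_deinterlacing(signal, kernel, rate=2)
```

All three functions return the same six values for this input, which is why an implementation is free to pick whichever form is cheapest.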

Rethinking atrous convolution for semantic image segmentation

Rethinking Atrous Convolution for Semantic Image Segmentation (2017)

Code: github.com/pytorch/vis… (Unofficial)

This paper addresses two challenges of semantic segmentation using DCNNs (mentioned earlier): the reduced feature resolution that results from consecutive pooling operations, and the existence of objects at multiple scales.

To address the first problem, the paper suggests using atrous convolution, also known as dilated convolution. It addresses the second problem by using atrous convolution to enlarge the field of view and thus include multi-scale context.

The paper’s “DeepLabv3” achieved 85.7% performance on the PASCAL VOC 2012 test set without DenseCRF post-processing.

Encoder-decoder with atrous separable convolution for semantic image segmentation

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (ECCV, 2018)

Code: github.com/tensorflow/…

The proposed method, “DeepLabv3+”, achieves test set performance of 89.0% and 82.1% on the PASCAL VOC 2012 and Cityscapes datasets, respectively, without any post-processing. The model is an extension of DeepLabv3 that refines the segmentation results by adding a simple decoder module.

The paper considers two types of neural networks for semantic segmentation: networks that use a spatial pyramid pooling module, which captures contextual information by pooling features at different resolutions, and networks that use an encoder-decoder structure, which recovers sharp object boundaries.

FastFCN: Rethinking dilated convolution in the backbone for semantic segmentation

FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation (2019)

Code: github.com/wuhuikai/Fa…

This paper proposes a Joint Pyramid Upsampling (JPU) module to replace dilated convolutions, which consume a lot of time and memory. It works by formulating the extraction of high-resolution feature maps as a joint upsampling problem.

The method achieves 53.13% mIoU on the Pascal Context dataset while running three times faster.

In this method, a fully convolutional network (FCN) serves as the backbone, and JPU upsamples the final low-resolution feature maps to produce high-resolution feature maps. Replacing dilated convolutions with JPU does not result in any loss of performance.

Joint upsampling takes a low-resolution target image and a high-resolution guidance image. It then generates a high-resolution target image by transferring the structure and details of the guidance image.

Improving semantic segmentation via video propagation and label relaxation

Improving Semantic Segmentation via Video Propagation and Label Relaxation (CVPR, 2019)

Code: github.com/NVIDIA/sema…

This paper proposes a video-based method for scaling up the training set by synthesizing new training samples, with the aim of improving the accuracy of semantic segmentation networks. It exploits the ability of video prediction models to predict future frames in order to also predict future labels.

The paper shows that training segmentation networks on datasets augmented with the synthesized data improves prediction accuracy. The proposed method achieves 83.5% mIoU on Cityscapes and 82.9% on CamVid.

The paper proposes two methods for predicting future labels:

Label Propagation (LP) creates new training samples by pairing propagated labels with the original future frames.
Joint Image-Label Propagation (JP) creates new training samples by pairing propagated labels with the corresponding propagated images.

The paper makes three main proposals: using a video prediction model to propagate labels to immediately adjacent frames, introducing joint image-label propagation to deal with misalignment, and relaxing one-hot label training by maximizing the likelihood of the union of class probabilities along object boundaries.
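
The third proposal can be illustrated for a single boundary pixel. This is a minimal sketch of the idea as described above, assuming the classes are mutually exclusive so that the probability of the union is the sum of the class probabilities. The function name, the softmax values, and the way neighboring classes are passed in are all hypothetical.

```python
import numpy as np

def relaxed_boundary_loss(probs, neighbor_classes):
    """Negative log of the summed probability of all classes near a boundary pixel.

    probs: softmax output for one pixel, shape (num_classes,).
    neighbor_classes: classes present in the pixel's neighborhood (hypothetical input;
    in practice it would be read from the label map around the pixel).
    """
    union_prob = probs[list(neighbor_classes)].sum()
    return float(-np.log(union_prob))

probs = np.array([0.5, 0.4, 0.1])  # made-up softmax output for one pixel

# With a single allowed class this reduces to ordinary one-hot cross-entropy.
one_hot_loss = relaxed_boundary_loss(probs, [0])

# On a boundary between classes 0 and 1, either class is accepted.
relaxed_loss = relaxed_boundary_loss(probs, [0, 1])
```

The relaxed loss is lower than the one-hot loss here because the model is no longer penalized for splitting its confidence between the two classes that actually meet at the boundary.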

Gated-SCNN: Gated shape CNNs for semantic segmentation

Gated-SCNN: Gated Shape CNNs for Semantic Segmentation (2019)

Code: nv-tlabs.github.io/GSCNN/

This paper is the most recent of the works covered here. The authors propose a two-stream CNN architecture in which shape information is processed as a separate branch. This shape stream processes only boundary-related information, which is enforced by the model’s gated convolution layer (GCL) and local supervision.
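
A rough sketch of the gating idea follows. It assumes that the gate is a sigmoid over a 1×1 convolution of the concatenated shape-stream and regular-stream features, and that the gated features are added back to the shape features before a final 1×1 convolution. These details, the function name, and all shapes and random weights are assumptions made for illustration, not a reproduction of the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv_layer(shape_feat, regular_feat, gate_weights, out_weights):
    """Toy gated convolution layer on (H, W, C) feature maps.

    gate_weights: (C_shape + C_regular, 1), a 1x1 convolution producing one gate per pixel.
    out_weights:  (C_shape, C_out), a 1x1 convolution applied to the gated features.
    """
    stacked = np.concatenate([shape_feat, regular_feat], axis=-1)
    alpha = sigmoid(stacked @ gate_weights)      # (H, W, 1), one gate value per pixel
    gated = shape_feat * alpha + shape_feat      # residual connection around the gate
    return gated @ out_weights, alpha

rng = np.random.default_rng(0)
shape_feat = rng.standard_normal((6, 6, 4))
regular_feat = rng.standard_normal((6, 6, 8))
out, alpha = gated_conv_layer(
    shape_feat,
    regular_feat,
    gate_weights=rng.standard_normal((12, 1)),
    out_weights=rng.standard_normal((4, 4)),
)
```

The gate `alpha` acts as a per-pixel weight: where it is close to zero the shape features pass through almost unchanged, and where it is close to one they are amplified, which is how the regular stream can steer the shape stream toward boundary regions.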

The model outperforms DeepLabv3+ by 1.5% on mIoU and by 4% on boundary F-score. It was evaluated on the Cityscapes benchmark. On smaller and thinner objects, the model achieves an improvement of 7% on IoU.

The paper includes a table comparing the performance of Gated-SCNN with other models.

Conclusion

You should now be familiar with some of the most common, and some of the more recent, techniques for performing semantic segmentation in a variety of contexts.

To obtain all of the papers above, reply with the keyword “0009” in the CV technical guide public account.

By Derrick Mwiti

Compiled by: CV technical guide

Original article: heartbeat.comet.ml/a-2019-guid…
