Posted by Jakub Czakon

URL: https://towardsdatascience.co…

Source: neptune.ai

The topics we will discuss in this article are:

  • What is image segmentation
  • Image segmentation architectures
  • Loss functions used in image segmentation
  • Frameworks you can use for your image segmentation projects

Let’s find out.

What is image segmentation

As the name implies, image segmentation is the process of dividing an image into multiple segments. In this process, every pixel in the image is associated with a specific object. There are two main types of image segmentation: semantic segmentation and instance segmentation.

In semantic segmentation, all objects of the same type are marked with one class label, while in instance segmentation similar objects each get their own separate label.

From Anurag Arnab, Shuai Zheng et al.'s 2018 paper "Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation" http://www.robots.ox.ac.uk/~t…

Image segmentation architectures

The basic architecture of image segmentation consists of an encoder and a decoder.

From Vijay Badrinarayanan et al.'s 2017 paper "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation" https://arxiv.org/abs/1511.00561

The encoder extracts features from the image through filters. The decoder is responsible for generating the final output, which is usually a segmentation mask containing the outline of the object. Most architectures have this structure or a variant of it.
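To make this encoder-decoder pattern concrete, here is a minimal PyTorch sketch of the general structure, not any particular published architecture; all names and sizes are made up:

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Minimal encoder-decoder for segmentation (illustrative only)."""
    def __init__(self, num_classes: int = 21):
        super().__init__()
        # Encoder: extract features while shrinking spatial resolution.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: upsample features back to per-pixel class scores.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyEncoderDecoder()
logits = model(torch.randn(1, 3, 128, 128))  # -> (1, 21, 128, 128)
```

Taking the argmax over the class dimension of the output yields the segmentation mask.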

Let’s take a look at some examples.

U-Net

U-Net is a convolutional neural network originally developed for biomedical image segmentation. Visually, its architecture looks like the letter U, hence the name U-Net. The architecture consists of two parts: the contracting path on the left and the expansive path on the right. The purpose of the contracting path is to capture context, while the role of the expansive path is to enable precise localization.

From Olaf Ronneberger et al.'s 2015 paper "U-Net: Convolutional Networks for Biomedical Image Segmentation" https://arxiv.org/abs/1505.04597

U-Net consists of an expansive path on the right and a contracting path on the left. The contracting path is made up of repeated blocks of two 3×3 convolutions, each followed by a rectified linear unit (ReLU), with a 2×2 max-pooling operation applied for downsampling.

The complete implementation of U-Net can be found here: https://lmb.informatik.uni-fr…
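As a rough sketch of one step of the contracting path just described (two 3×3 convolutions, each followed by a ReLU, then a 2×2 max pooling for downsampling), assuming PyTorch. Note that the original paper uses unpadded convolutions; this sketch pads to keep the sizes simple:

```python
import torch
import torch.nn as nn

def contracting_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """One U-Net encoder step: two 3x3 convs with ReLU, then 2x2 max pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # paper: padding=0
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2),  # downsampling: halves height and width
    )

block = contracting_block(64, 128)
out = block(torch.randn(1, 64, 256, 256))  # -> (1, 128, 128, 128)
```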

FastFCN – Fast Fully-Convolutional Network

In this architecture, a Joint Pyramid Upsampling (JPU) module is used as a replacement for dilated convolutions, since they consume a lot of memory and computation time. It uses a fully convolutional network as the backbone while applying JPU for upsampling. JPU upsamples the low-resolution feature maps into high-resolution feature maps.

From Huikai Wu et al.'s 2019 paper "FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation" https://arxiv.org/abs/1903.11816

If you want to check out a code implementation, look here: https://github.com/wuhuikai/F…
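To give a feel for the joint upsampling idea, here is a heavily simplified PyTorch sketch (our own simplification, not the authors' implementation): feature maps from several backbone stages are upsampled to a common resolution, concatenated, and passed through parallel dilated convolutions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleJPU(nn.Module):
    """Simplified Joint Pyramid Upsampling: fuse multi-scale backbone features."""
    def __init__(self, in_ch: int = 256, width: int = 128):
        super().__init__()
        # Parallel dilated convolutions over the fused feature map.
        self.dilated = nn.ModuleList(
            nn.Conv2d(3 * in_ch, width, 3, padding=d, dilation=d)
            for d in (1, 2, 4)
        )

    def forward(self, feats):
        # Upsample every stage to the resolution of the largest feature map.
        size = feats[0].shape[-2:]
        ups = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
               for f in feats]
        fused = torch.cat(ups, dim=1)
        return torch.cat([conv(fused) for conv in self.dilated], dim=1)

jpu = SimpleJPU()
feats = [torch.randn(1, 256, 64, 64),   # stride-8 features
         torch.randn(1, 256, 32, 32),   # stride-16 features
         torch.randn(1, 256, 16, 16)]   # stride-32 features
high_res = jpu(feats)                   # -> (1, 384, 64, 64)
```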

Gated-SCNN

This architecture is a two-stream CNN. In this model, a separate branch, the shape stream, is used to process the shape information of the image; it handles boundary information in parallel with the regular stream.

From Towaki Takikawa et al.'s 2019 paper "Gated-SCNN: Gated Shape CNNs for Semantic Segmentation" https://arxiv.org/abs/1907.05740

Code implementation: https://github.com/nv-tlabs/g…

DeepLab

In this architecture, convolution with upsampled filters, known as atrous convolution, is used for tasks involving dense prediction. Segmentation of objects at multiple scales is handled by atrous spatial pyramid pooling (ASPP). Finally, object boundary localization is improved by combining DCNNs with fully connected conditional random fields (CRFs). Atrous convolution is achieved by upsampling the filters through the insertion of zeros, or by sparse sampling of the input feature maps.

From Liang-Chieh Chen et al.'s 2016 paper "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs" https://arxiv.org/abs/1606.00915

You can try its implementation in PyTorch (https://github.com/fregu856/d…) or TensorFlow (https://github.com/sthalles/d…).
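As a small illustration of the atrous convolution mentioned above: in PyTorch, it is available directly through the dilation argument of nn.Conv2d.

```python
import torch
import torch.nn as nn

# A 3x3 convolution with dilation rate 2: the kernel is applied with gaps,
# covering a 5x5 area and enlarging the receptive field with no extra weights.
atrous = nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 256, 32, 32)
y = atrous(x)  # spatial size preserved: (1, 256, 32, 32)
```

ASPP applies several such convolutions with different dilation rates in parallel and combines the results.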

Mask R-CNN

In this architecture, objects are classified and localized using bounding boxes, together with semantic segmentation that classifies each pixel into a set of categories. Every region of interest gets a segmentation mask, and a class label plus a bounding box are produced as the final output. The architecture is an extension of Faster R-CNN, which consists of a deep convolutional network that proposes regions and a detector that makes use of those regions.

From Kaiming He et al.'s 2017 paper "Mask R-CNN" https://arxiv.org/abs/1703.06870

Here is an image of the results obtained on the COCO test set.

From Kaiming He et al.'s 2017 paper "Mask R-CNN" https://arxiv.org/abs/1703.06870
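If you just want to try Mask R-CNN, torchvision ships a pretrained implementation; a minimal sketch (on newer torchvision versions you may need the weights= argument instead of pretrained=):

```python
import torch
import torchvision

# Load a Mask R-CNN model pretrained on COCO.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 480, 640)  # a dummy RGB image with values in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]

# Each detection comes with a bounding box, class label, score, and mask.
print(prediction["boxes"].shape, prediction["labels"].shape,
      prediction["masks"].shape)
```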

Image segmentation loss functions

Semantic segmentation models usually use a simple cross-entropy loss during training. However, if you are interested in getting the fine details of an image right, you have to resort to slightly more advanced loss functions.
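For reference, the plain cross-entropy baseline in PyTorch treats segmentation as per-pixel classification (the shapes below are arbitrary):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(4, 21, 128, 128)         # (batch, classes, H, W) scores
target = torch.randint(0, 21, (4, 128, 128))  # (batch, H, W) class indices
loss = criterion(logits, target)              # averaged over all pixels
```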

Let’s take a look at a few of them.

Focal loss

This loss is an improvement on the standard cross-entropy loss. It is implemented by reshaping cross entropy so that the loss assigned to well-classified examples is down-weighted, which keeps class imbalance from dominating training. In this loss function, the cross-entropy loss is scaled by a factor that decays to zero as confidence in the correct class increases. The scaling factor automatically down-weights the contribution of easy examples during training and focuses the model on hard examples.

Source: neptune.ai
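A minimal sketch of focal loss for per-pixel classification, assuming PyTorch and the standard formulation FL(p_t) = -(1 - p_t)^γ log(p_t); the function name and default gamma are ours:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma: float = 2.0):
    """Down-weight well-classified pixels by the factor (1 - p_t) ** gamma."""
    ce = F.cross_entropy(logits, target, reduction="none")  # per-pixel CE
    p_t = torch.exp(-ce)                  # probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

logits = torch.randn(4, 21, 128, 128)
target = torch.randint(0, 21, (4, 128, 128))
loss = focal_loss(logits, target)
```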

Dice loss

This loss is obtained by computing the smooth Dice coefficient. It is one of the most common loss functions used in segmentation problems.

Source: neptune.ai
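A minimal sketch of a soft Dice loss for binary segmentation, assuming PyTorch; the smoothing constant (which avoids division by zero) and all names are our choices:

```python
import torch

def dice_loss(probs, target, smooth: float = 1.0):
    """Soft Dice loss: 1 - 2|X ∩ Y| / (|X| + |Y|) on predicted probabilities."""
    probs = probs.reshape(probs.size(0), -1)
    target = target.reshape(target.size(0), -1)
    intersection = (probs * target).sum(dim=1)
    dice = (2.0 * intersection + smooth) / (
        probs.sum(dim=1) + target.sum(dim=1) + smooth)
    return 1.0 - dice.mean()

probs = torch.sigmoid(torch.randn(4, 1, 128, 128))    # foreground probabilities
target = torch.randint(0, 2, (4, 1, 128, 128)).float()
loss = dice_loss(probs, target)
```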

IoU-balanced loss

The purpose of the IoU-balanced classification loss is to increase the gradient of samples with high IoU and decrease the gradient of samples with low IoU. In this way, the localization accuracy of the model is improved.

Source: neptune.ai
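One common formulation scales each detection's classification loss by its box IoU raised to a power η, then renormalizes so the overall loss magnitude stays comparable. The sketch below is our own illustration of that idea, not the paper's code:

```python
import torch
import torch.nn.functional as F

def iou_balanced_ce(logits, target, ious, eta: float = 1.5):
    """Weight per-detection cross entropy by IoU**eta (illustrative sketch)."""
    ce = F.cross_entropy(logits, target, reduction="none")  # per-detection CE
    weights = ious.clamp(min=1e-6) ** eta
    weights = weights * (len(weights) / weights.sum())  # keep mean weight at 1
    return (weights * ce).mean()

logits = torch.randn(8, 21)          # class scores for 8 detections
target = torch.randint(0, 21, (8,))  # ground-truth classes
ious = torch.rand(8)                 # IoU of each predicted box with its target
loss = iou_balanced_ce(logits, target, ious)
```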

Boundary loss

A variant of boundary loss is designed for tasks with highly imbalanced segmentations. This loss takes the form of a distance metric on the space of contours rather than regions. In this way, it tackles the problems that region-based losses run into in highly imbalanced segmentation tasks.

Source: neptune.ai

Weighted cross entropy

In this variant of cross entropy, all positive examples are weighted by a certain coefficient. It is used in scenarios that involve class imbalance.

Source: neptune.ai
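In PyTorch this is simply the weight argument of nn.CrossEntropyLoss; the class weights below are made-up values for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical weights: down-weight background, up-weight the rare class.
class_weights = torch.tensor([0.2, 1.0, 5.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 3, 64, 64)         # 3 classes
target = torch.randint(0, 3, (4, 64, 64))
loss = criterion(logits, target)
```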

Lovász-Softmax loss

This loss performs a direct optimization of the mean intersection-over-union (IoU) in neural networks, based on the convex Lovász extension of submodular losses.

Source: neptune.ai

Other noteworthy losses are:

  • The TopK loss, which aims to ensure that the network focuses on hard samples during training.
  • The distance-penalized CE loss, which directs the network toward boundary regions that are hard to segment.
  • The Sensitivity-Specificity (SS) loss, which computes the weighted sum of the mean squared differences of sensitivity and specificity.
  • The Hausdorff distance (HD) loss, which estimates the Hausdorff distance from the convolutional neural network's output.

These are just a few of the loss functions used in image segmentation. To learn more, check out this link: https://github.com/JunMa11/Se…

Data sets for image segmentation

If you have made it this far, you are probably wondering where you can get datasets to practice image segmentation on.

Now let’s look at some of the data sets that we can use.

Common Objects in Context — COCO dataset

COCO is a large-scale dataset for object detection, image segmentation, and captioning. The dataset has 91 stuff categories and 80 object classes, and contains 250,000 people labeled with keypoints. Its download size is 37.57 GiB. It is available under the Apache 2.0 license and can be downloaded here (https://cocodataset.org/#down…).
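Once downloaded, the dataset can be loaded with torchvision's CocoDetection wrapper (assuming pycocotools is installed; the paths below are placeholders for your local copy):

```python
from torchvision.datasets import CocoDetection

# Placeholder paths; point these at your unpacked images and annotations.
dataset = CocoDetection(
    root="coco/val2017",
    annFile="coco/annotations/instances_val2017.json",
)

image, annotations = dataset[0]
# Each annotation dict carries a 'segmentation' polygon and a 'category_id'.
print(len(annotations))
```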

PASCAL Visual Object Classes (PASCAL VOC)

PASCAL VOC has 9,963 images across 20 different classes. The training/validation set is a 2GB tar file. The dataset can be downloaded from the official website: http://host.robots.ox.ac.uk/p…

Cityscapes dataset

This dataset contains images of urban scenes. It can be used to assess the performance of vision algorithms in urban scenarios. The dataset can be downloaded here: https://www.cityscapes-datase…

Cambridge-driving Labeled Video Database – CamVid

This is a motion-based segmentation and recognition dataset. It contains 32 semantic classes. This link provides further explanation of the dataset and a pointer to its download: http://mi.eng.cam.ac.uk/resea…

Image segmentation frameworks

Now that you have your data set ready to work with, let me introduce some tools/frameworks that you can use to get started.

  • FastAI library – given an image, this library creates a mask of the objects in the image.
  • Sefexa Image Segmentation Tool — Sefexa is a free tool for semi-automatic image segmentation, analysis of images, and creation of ground truth.
  • DeepMask — DeepMask by Facebook Research is a Torch implementation of DeepMask and SharpMask.
  • MultiPath — this is a Torch implementation of the object detection network from "A MultiPath Network for Object Detection".
  • OpenCV – an open-source computer vision library with more than 2,500 optimized algorithms.
  • MIScnn – an open-source library for medical image segmentation. It allows you to set up pipelines with state-of-the-art convolutional neural networks and deep learning models in a few lines of code.
  • Fritz – Fritz offers several computer vision tools, including image segmentation tools for mobile devices.

Conclusion

Hopefully, this article has provided you with some background on image segmentation and some tools and frameworks that you can use in your work.

For more information, see the links attached to each architecture and framework.