Preface:

In deep learning and computer vision, models are trained to extract features and output meaningful representations for various visual tasks. Some tasks concern only the geometry of an object, regardless of its color, texture, lighting, and so on. This is where boundary detection comes in.


Problem definition

Figure 1 Boundary detection

Figure 1 shows an example of boundary detection, which, as the name suggests, is the task of detecting the boundaries of objects in an image. This is an ill-posed problem because the problem setting itself is ambiguous. In the figure, for an image of an indoor room (left), the ground truth (middle) defines the true object boundaries in the room, and the prediction (right) shows the estimated object boundaries. The estimated boundaries are far more numerous than the ground truth, including unnecessary boundaries from the room layout, the curtains, and even the sofa texture. Extracting clean and meaningful object boundaries is not easy.

The naive method

A straightforward solution to boundary detection is to treat it as a semantic segmentation problem. By simply marking boundary pixels as 1 and all other regions as 0 in the annotation, we can express it as a binary semantic segmentation problem and use binary cross-entropy as the loss function. However, this approach has two problems: the highly unbalanced label distribution, and the inherently per-pixel nature of the cross-entropy loss.
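As a sketch, this binary-segmentation formulation might look like the following in PyTorch. The tensor shapes and the sparse random labels are placeholders for illustration, not taken from the article:

```python
import torch
import torch.nn.functional as F

# Placeholder per-pixel logits from a segmentation network: (batch, 1, H, W).
logits = torch.randn(2, 1, 64, 64)

# Binary annotation: boundary pixels marked 1, everything else 0.
# Boundaries are thin, so positives are rare by construction here (~3%).
target = (torch.rand(2, 1, 64, 64) > 0.97).float()

# Plain binary cross-entropy, averaged over every pixel independently.
loss = F.binary_cross_entropy_with_logits(logits, target)
```

Note that the average is dominated by the abundant background pixels, which is exactly the imbalance problem described next.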

Limitations of Cross Entropy Loss

When cross-entropy loss is used, the statistical distribution of the labels plays an important role in training accuracy. The more unevenly the labels are distributed, the harder training becomes. A weighted cross-entropy loss can reduce this difficulty, but the improvement is not significant, and the inherent problem of cross-entropy loss remains unsolved: the loss is computed as the average of per-pixel losses, and each pixel's loss is computed independently, without knowing whether its neighboring pixels are boundaries. The cross-entropy loss therefore operates only at the microscopic level, not the global level, which is not enough for image-level prediction.
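One common way to realize the weighting mentioned above is the `pos_weight` argument of PyTorch's BCE loss, scaling the rare boundary class by the background-to-boundary pixel ratio. This is a sketch of the general idea, not necessarily the exact weighting scheme used in the paper:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.97).float()

# Weight positives by the negative/positive pixel ratio, so the rare
# boundary class contributes comparably to the abundant background.
n_pos = target.sum().clamp(min=1.0)
n_neg = target.numel() - target.sum()
loss = F.binary_cross_entropy_with_logits(
    logits, target, pos_weight=n_neg / n_pos
)
```

The loss is still an average of independent per-pixel terms, so the microscopic limitation described above remains.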

Figure 2 Boundary prediction with cross entropy loss

See Figure 2. For the input image (left), the predictions with cross-entropy loss (middle) and weighted cross-entropy loss (right) are compared. The boundary on the right is much better than the one in the middle, but the predicted boundary is still not clean, and noisy boundaries from the grass texture remain.

Dice Loss

Dice Loss originates from the Sørensen–Dice coefficient, a statistic developed in the 1940s to measure the similarity between two samples. It was brought to computer vision by Milletari et al. in 2016 for 3D medical image segmentation.

Figure 3 Dice coefficient
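Figure 3 shows the coefficient as an image; written out in standard notation (my reconstruction, consistent with the description below), it is:

```latex
\mathrm{DSC} = \frac{2\sum_{i} p_i\, g_i}{\sum_{i} p_i + \sum_{i} g_i}
```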

The equation above is the Dice coefficient, where p_i and g_i denote the predicted value and the ground truth of the corresponding pixel, respectively. In the boundary detection setting, p_i and g_i take the value 0 or 1, indicating whether the pixel is a boundary: 1 if yes, 0 otherwise. The denominator is therefore the total number of boundary pixels in the prediction and the ground truth combined, and the numerator is twice the number of correctly predicted boundary pixels, because the product p_i g_i contributes only when the two values match (both are 1).

Figure 4 Dice coefficient (set-theoretic view)

Figure 4 is another view of Figure 3. From the perspective of set theory, the Dice coefficient (DSC) measures the overlap between two sets. If two sets A and B overlap completely, the DSC takes its maximum value of 1; as the overlap shrinks, the DSC decreases, and if the two sets do not overlap at all, it takes its minimum value of 0. The DSC therefore ranges between 0 and 1, and larger is better. Hence we can use 1 − DSC as the Dice Loss and minimize it to maximize the overlap between the two sets.

In the boundary detection task, the ground-truth boundary pixels and the predicted boundary pixels can be regarded as two sets. By using Dice Loss, the two sets are trained to overlap as much as possible. Looking again at the Dice coefficient: the denominator considers the total number of boundary pixels at the global scale, while the numerator considers the overlap between the two sets at the local scale. Dice Loss therefore takes loss information into account both locally and globally, which is crucial for high accuracy.
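A minimal Dice Loss sketch in PyTorch, following the 1 − DSC definition above. The smoothing term `eps` is a common stabilizer for empty masks; it is my assumption for this sketch rather than a detail from the article:

```python
import torch

def dice_loss(probs: torch.Tensor, target: torch.Tensor,
              eps: float = 1.0) -> torch.Tensor:
    """1 - DSC, averaged over the batch.

    probs:  sigmoid outputs in [0, 1], shape (batch, 1, H, W)
    target: binary boundary map, same shape
    """
    probs = probs.flatten(1)      # one row of pixels per sample
    target = target.flatten(1)
    intersection = (probs * target).sum(dim=1)
    dsc = (2 * intersection + eps) / (
        probs.sum(dim=1) + target.sum(dim=1) + eps
    )
    return (1 - dsc).mean()

# Perfect overlap drives the loss toward 0; no overlap drives it toward 1.
mask = (torch.rand(2, 1, 16, 16) > 0.9).float()
perfect = dice_loss(mask, mask)
disjoint = dice_loss(mask, 1 - mask)
```

Because the sums in the numerator and denominator run over whole boundary maps, a single gradient step "sees" the entire prediction, unlike the per-pixel cross-entropy terms.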

Results

Figure 5 Results of boundary prediction

See Figure 5. The predictions using Dice Loss (column c) are more accurate than those of the other methods (columns d and e). This holds especially for thin boundaries: the Dice Loss decreases only when the predicted boundary pixels overlap the thin ground-truth boundary and no predicted boundary pixels appear in other regions.

References

V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation, Milletari et al., 3DV 2016

Learning to Predict Crisp Boundaries, Deng et al., ECCV 2018

Link to original article:

Medium.com/ai-salon/un…

This article comes from the public account CV technical guide of the paper sharing series.

