This is the 19th day of my participation in the First Challenge 2022

My recent work has mainly been on semantic segmentation. It is a big topic, and one or two posts will not be enough to explain it clearly, especially the "semantic" part, so I plan to cover semantic segmentation over a series of posts.

What will this series cover?

  • What semantic segmentation is, and what motivates us to learn it
  • What the applications of semantic segmentation are
  • The difficulties of the semantic segmentation task
  • The models we will talk about: FCN, UNet, DeepLab and Swin Transformer
  • Future directions for semantic segmentation, that is, some of my own ideas

What is semantic segmentation

Semantic segmentation is a relatively basic computer vision task. The name is easily misleading, because "semantic" suggests a connection to natural language, but the task is about images. In fact, it attracted attention long before deep learning became popular. At that time, clustering was still used to do segmentation.

What are the applications of semantic segmentation

It is now commonly used in settings such as autonomous driving. Compared with computer vision tasks such as image classification and object detection, semantic segmentation is harder, because it has to recover the contour of each target and therefore describes more precisely the space an object occupies in the image. Although semantic segmentation looks elegant, it is essentially a classification problem at the pixel level: what we need to do is assign each pixel to the category it belongs to. Other applications include:

  • Medical imaging applications
  • Image processing, such as blurring the image background
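
The "pixel classification" view described above can be sketched in a few lines. This is only an illustration: the `scores` array is made up, standing in for whatever per-pixel class scores a real model would output, and the (height, width, num_classes) layout is an assumption.

```python
import numpy as np

# Hypothetical per-pixel class scores for a 2x3 image and 3 classes,
# shaped (height, width, num_classes). A real model would produce these.
scores = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],
    [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]],
])

# Segmentation as classification: each pixel gets the class
# with the highest score along the last axis.
label_map = np.argmax(scores, axis=-1)

print(label_map.shape)  # (2, 3)
print(label_map)
```

The result has one class index per pixel, which is exactly the kind of label map that the metrics below compare against the ground truth.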

Semantic segmentation metrics

Before introducing the models, let's briefly introduce two metrics for semantic segmentation: accuracy (Acc) and IoU. When doing AI projects, we often start by selecting a model. In fact, we should first understand how results will be measured, so that we have a clear target.

Confusion matrix for distinguishing background from foreground:

                     Predicted background   Predicted foreground
Actual background    TN                     FP
Actual foreground    FN                     TP

Here T stands for True and F for False, while P and N stand for positive and negative. P and N describe the predicted result, and T and F describe whether that prediction is correct.

  • TP: the prediction is correct, and the sample is actually positive
  • TN: the prediction is correct, and the sample is actually negative
  • FP: the prediction is wrong; the sample was predicted positive but is actually negative
  • FN: the prediction is wrong; the sample was predicted negative but is actually positive
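
The four counts can be computed directly from two boolean masks. The following is a minimal sketch; the helper name `confusion_counts` and the tiny example masks are made up for illustration, with True meaning foreground (positive).

```python
import numpy as np

def confusion_counts(target, prediction):
    """Count TP, TN, FP, FN for two boolean masks (True = foreground)."""
    target = np.asarray(target, dtype=bool)
    prediction = np.asarray(prediction, dtype=bool)
    tp = int(np.sum(prediction & target))    # predicted positive, actually positive
    tn = int(np.sum(~prediction & ~target))  # predicted negative, actually negative
    fp = int(np.sum(prediction & ~target))   # predicted positive, actually negative
    fn = int(np.sum(~prediction & target))   # predicted negative, actually positive
    return tp, tn, fp, fn

# Made-up 2x3 ground truth and prediction masks.
target = np.array([[0, 1, 1], [0, 1, 0]], dtype=bool)
prediction = np.array([[0, 1, 0], [1, 1, 0]], dtype=bool)

tp, tn, fp, fn = confusion_counts(target, prediction)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn)  # 2 2 1 1
```

Pixel accuracy (Acc) is then the share of correctly predicted pixels, here 4 out of 6.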

In practice, these problems usually suffer from class imbalance: far more pixels belong to the background than to the target (foreground). Typically something like 90% of the pixels are background, that is, negative samples.

When predicting 10 different categories, for example, only around 10% of the pixels may belong to any one category, and the rest can be treated as background for that category.
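
To see why this imbalance matters for the choice of metric, here is a small sketch with made-up data: a 10x10 image with 10% foreground, and a "model" that simply predicts background everywhere.

```python
import numpy as np

# Toy 10x10 ground truth: 10 foreground pixels, 90 background pixels.
target = np.zeros((10, 10), dtype=bool)
target[0, :] = True

# A useless prediction: background everywhere.
prediction = np.zeros_like(target)

# Pixel accuracy counts the background pixels it got right.
accuracy = np.mean(prediction == target)

# IoU only looks at the foreground overlap.
intersection = np.logical_and(target, prediction).sum()
union = np.logical_or(target, prediction).sum()
iou = intersection / union

print(accuracy)  # 0.9
print(iou)       # 0.0
```

Accuracy looks high even though no foreground pixel was found, while IoU exposes the failure.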

Consider a single class, such as the classification problem in the figure. In the ground truth on the left, the light yellow area represents the person and the light purple area represents the background. On the right is the prediction.

We want the intersection to be as large as possible. In the second figure, the intersection is the overlap between the predicted light yellow area and the target. This is the area we want to enlarge.

In the figure below, the light yellow on the left represents the intersection $A \cap B$ (TP), and the yellow on the right represents the union ($A \cup B$).


$$\mathrm{IoU} = \frac{\text{intersection}}{\text{union}}$$

We can map this onto the confusion matrix. TP is a pixel predicted as foreground that really is foreground (the person). FP is a pixel predicted as foreground that is actually background, while FN is a pixel predicted as background that is actually foreground.


$$\mathrm{IoU} = \frac{TP}{FP + TP + FN}$$
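
As a worked example with made-up counts (assumed purely for illustration, not read off the figures), suppose a 6-pixel image gives TP = 2, FP = 1, FN = 1 and TN = 2:

```latex
\mathrm{IoU} = \frac{TP}{FP + TP + FN} = \frac{2}{1 + 2 + 1} = 0.5
```

TN does not appear in the formula, so adding more correctly predicted background pixels leaves IoU unchanged.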

Simple code implementation

import numpy as np

# target and prediction are boolean masks of the same shape (True = foreground)
intersection = np.logical_and(target, prediction)
union = np.logical_or(target, prediction)
iou_score = np.sum(intersection) / np.sum(union)
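
With several categories, one possible extension is to compute IoU per class, treating every other class as background, and then average. The sketch below is an assumption about how that could be done, not a standard API: the function name `mean_iou` is made up, and skipping classes that appear in neither map is a design choice to avoid dividing by zero.

```python
import numpy as np

def mean_iou(target, prediction, num_classes):
    """Average per-class IoU over classes present in target or prediction."""
    ious = []
    for c in range(num_classes):
        t = target == c       # ground-truth mask for class c
        p = prediction == c   # predicted mask for class c
        union = np.logical_or(t, p).sum()
        if union == 0:
            continue  # class absent from both maps; skip to avoid 0/0
        ious.append(np.logical_and(t, p).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

# Made-up 2x3 label maps with 3 classes.
target = np.array([[0, 0, 1], [2, 2, 1]])
prediction = np.array([[0, 1, 1], [2, 2, 2]])

score = mean_iou(target, prediction, num_classes=3)
```

In this toy example the per-class IoUs are 1/2, 1/3 and 2/3, so the mean is 0.5.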