Author: You Luying, Fuzhou University, Datawhale member

Image segmentation is another fundamental task in computer vision, alongside classification and detection. It divides an image into regions according to their content. Compared with image classification and detection, segmentation is a more fine-grained task, because every pixel must be classified.

In the street-view segmentation shown in the figure below, every pixel is classified, so the contour of each object is outlined precisely, rather than enclosed in a bounding box as in detection.

Image segmentation can be divided into three sub-areas: semantic segmentation, instance segmentation, and panoptic segmentation.

As the comparison figure shows, semantic segmentation labels the image at the pixel level, assigning a category to every pixel; it is currently widely used in medical imaging and autonomous driving. Instance segmentation is more challenging: it must not only detect the targets in the image correctly, but also segment each instance precisely. Panoptic segmentation combines the two tasks, requiring every pixel in the image to be assigned both a semantic label and an instance ID.

01 Key steps in semantic segmentation

During network training, the semantic label map or instance segmentation map usually needs to be preprocessed. For a color label map, for example, the category represented by each color is looked up in a color mapping table, and the map is then converted into the corresponding mask or one-hot encoding for training. The key steps are explained below.

First, taking the semantic segmentation task as an example, the different representations of labels are introduced.

1.1 Semantic label map

A semantic segmentation dataset contains the original images and the semantic label maps; the two have the same size, and both are RGB images.

In the label map, white and black represent object borders and the background respectively, while the other colors represent different categories:

1.2 Single-channel mask

The RGB value of each label color corresponds to its annotated category, so it is easy to look up the category index of every pixel in the label map and generate a single-channel mask.

In the figure below, the labeled categories include Person, Purse, Plants, Sidewalk, and Building. After the semantic label map is converted into a single-channel mask, the result is shown in the figure on the right: the size is unchanged, but the number of channels drops from 3 to 1.

Each pixel position corresponds one-to-one between the two.
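As a minimal sketch of this lookup (using a toy three-color table, not the real VOC palette), each pixel of the RGB label map can be matched against the color table to produce the single-channel mask:

```python
import numpy as np

# Hypothetical color table: index 0 = background (black),
# 1 = person (red), 2 = plant (green)
colormap = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0]])

# A tiny 2x2 RGB label image, shape (H, W, 3)
label = np.array([[[0, 0, 0], [255, 0, 0]],
                  [[0, 255, 0], [255, 0, 0]]])

# For each pixel, find the color-table row it matches -> (H, W) mask
matches = (label[:, :, None, :] == colormap[None, None, :, :]).all(axis=-1)
mask = np.argmax(matches, axis=-1).astype(np.uint8)

print(mask)
# [[0 1]
#  [2 1]]
```

The mask keeps the spatial size (H, W) but stores a category index instead of three color channels at each position.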

1.3 One-hot encoding

One-hot encoding can be applied to each single-channel mask.

For example, take the mask above, of size H × W, with five label categories. We need to turn this mask into a one-hot output with five channels, of size H × W × 5. That is, all pixels whose value in the mask is 1 are extracted into one map, with the corresponding positions set to 1 and the rest to 0; then all pixels with value 2 are extracted to generate another map, again with the corresponding positions set to 1 and the rest to 0; and so on.
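The procedure above can be sketched with NumPy on a toy mask (three classes instead of five, for brevity):

```python
import numpy as np

num_classes = 3
mask = np.array([[0, 1],
                 [2, 1]])  # single-channel mask, shape (H, W)

# One channel per class: channel k is 1 where mask == k, else 0
onehot = np.stack([(mask == k).astype(np.uint8) for k in range(num_classes)],
                  axis=-1)

print(onehot.shape)  # (2, 2, 3) -> (H, W, K)
print(onehot[:, :, 1])
# [[0 1]
#  [0 1]]
```

Note that each pixel has exactly one channel set to 1 across the K channels.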

02 Semantic segmentation practice

Next, the Pascal VOC 2012 semantic segmentation dataset is used as an example to show how the different representations are converted into each other.

Pascal VOC 2012 is a very important dataset for semantic segmentation tasks. It covers 20 object categories, including people, vehicles, and other objects, and can be used to segment object categories from the background.

Data set open source address:

gas.graviti.cn/dataset/ylu…

2.1 Data set reading

This tutorial reads the dataset online through the Graviti data platform service. The platform supports many dataset types and provides numerous public datasets for easy use. A few preparations are needed before use:

  • Fork the dataset: to use a public dataset, first fork it into your own account.
  • Obtain an AccessKey: an AccessKey is required to use the TensorBay SDK with the Graviti data platform; it can be created at gas.graviti.cn/tensorbay/d…
  • The VOC dataset is divided into “train” and “test” segments.
```python
import os
from tensorbay import GAS
from tensorbay.dataset import Data, Dataset
from tensorbay.label import InstanceMask, SemanticMask
from PIL import Image
import numpy as np
import torchvision
import matplotlib.pyplot as plt

ACCESS_KEY = "<YOUR_ACCESSKEY>"
gas = GAS(ACCESS_KEY)


def read_voc_images(is_train=True, index=0):
    """
    read voc image using tensorbay
    """
    dataset = Dataset("VOC2012Segmentation", gas)
    if is_train:
        segment = dataset["train"]
    else:
        segment = dataset["test"]

    data = segment[index]
    feature = Image.open(data.open()).convert("RGB")
    label = Image.open(data.label.semantic_mask.open()).convert("RGB")
    visualize(feature, label)

    return feature, label  # PIL Image


def visualize(feature, label):
    fig = plt.figure()
    ax = fig.add_subplot(121)   # left: original image
    ax.imshow(feature)
    ax2 = fig.add_subplot(122)  # right: label map
    ax2.imshow(label)
    plt.show()


train_feature, train_label = read_voc_images(is_train=True, index=10)
train_label = np.array(train_label)  # (375, 500, 3)
```

2.2 Color mapping table

Once you have a color semantic label map, you can build a color mapping table that lists every RGB value appearing in the label and the category it represents.

```python
def colormap_voc():
    """
    create a colormap
    """
    colormap = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
                [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
                [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
                [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
                [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
                [0, 64, 128]]

    classes = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
               'bottle', 'bus', 'car', 'cat', 'chair', 'cow',
               'diningtable', 'dog', 'horse', 'motorbike', 'person',
               'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor']

    return colormap, classes
```

2.3 Conversion between label and one-hot encoding

Using the mapping table, the conversion between the semantic label map and the one-hot encoding is implemented as follows:

```python
def label_to_onehot(label, colormap):
    """
    Converts a segmentation label (H, W, C) to (H, W, K) where the last dim
    is a one-hot encoding vector, C is usually 1 or 3, and K is the number
    of classes.
    """
    semantic_map = []
    for colour in colormap:
        equality = np.equal(label, colour)
        class_map = np.all(equality, axis=-1)
        semantic_map.append(class_map)
    semantic_map = np.stack(semantic_map, axis=-1).astype(np.float32)
    return semantic_map


def onehot_to_label(semantic_map, colormap):
    """
    Converts a one-hot map (H, W, K) to a label (H, W, C)
    """
    x = np.argmax(semantic_map, axis=-1)
    colour_codes = np.array(colormap)
    label = np.uint8(colour_codes[x.astype(np.uint8)])
    return label


colormap, classes = colormap_voc()
semantic_map = label_to_onehot(train_label, colormap)
print(semantic_map.shape)  # [H, W, K] = [375, 500, 21]

label = onehot_to_label(semantic_map, colormap)
print(label.shape)  # [H, W, C] = [375, 500, 3]
```
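A quick self-contained sanity check (compact versions of the two functions, run on a hypothetical two-color palette rather than the full VOC colormap) confirms that the two conversions are inverses whenever every pixel color appears in the table:

```python
import numpy as np

colormap = [[0, 0, 0], [128, 0, 0]]  # toy palette: background, one class

def label_to_onehot(label, colormap):
    # (H, W, 3) label -> (H, W, K) one-hot, one channel per palette color
    return np.stack([np.all(label == c, axis=-1) for c in colormap],
                    axis=-1).astype(np.float32)

def onehot_to_label(semantic_map, colormap):
    # (H, W, K) one-hot -> (H, W, 3) label via argmax over channels
    return np.uint8(np.array(colormap)[np.argmax(semantic_map, axis=-1)])

label = np.array([[[0, 0, 0], [128, 0, 0]]], dtype=np.uint8)  # (1, 2, 3)
restored = onehot_to_label(label_to_onehot(label, colormap), colormap)
assert np.array_equal(label, restored)  # the round trip is lossless
```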

2.4 Conversion between one-hot encoding and mask

Similarly, with the help of the mapping table, the conversion between the single-channel mask and the one-hot encoding is implemented:

```python
def onehot2mask(semantic_map):
    """
    Converts a one-hot map (K, H, W) to a mask (H, W)
    """
    _mask = np.argmax(semantic_map, axis=0).astype(np.uint8)
    return _mask


def mask2onehot(mask, num_classes):
    """
    Converts a segmentation mask (H, W) to (K, H, W) where the first dim
    is a one-hot encoding vector
    """
    semantic_map = [mask == i for i in range(num_classes)]
    return np.array(semantic_map).astype(np.uint8)


mask = onehot2mask(semantic_map.transpose(2, 0, 1))
print(np.unique(mask))  # [0 1 15]
print(mask.shape)       # (375, 500)

semantic_map = mask2onehot(mask, len(colormap))
print(semantic_map.shape)  # (21, 375, 500)
```
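Note the axis convention: section 2.3 produces channel-last (H, W, K) one-hot maps, while `onehot2mask` expects channel-first (K, H, W), hence the `transpose(2, 0, 1)`. A minimal self-contained check of the two layouts:

```python
import numpy as np

# Toy one-hot map with 3 classes where every pixel belongs to class 2
onehot_hwk = np.zeros((4, 5, 3), dtype=np.uint8)  # channel-last (H, W, K)
onehot_hwk[:, :, 2] = 1

onehot_khw = onehot_hwk.transpose(2, 0, 1)        # channel-first (K, H, W)
mask = np.argmax(onehot_khw, axis=0).astype(np.uint8)

print(mask.shape)       # (4, 5)
print(np.unique(mask))  # [2]
```

Forgetting the transpose would make `np.argmax` run over the width axis instead of the class axis, producing a wrongly shaped and meaningless mask.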

For more information, please visit the Graviti official website.