What is interactive foreground extraction?

Classic foreground extraction techniques rely primarily on texture (color) information, such as the magic wand tool, or on edge (contrast) information, such as intelligent scissors. In 2004, Rother et al. from Microsoft Research proposed an interactive foreground extraction technique in their paper "GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts". The proposed algorithm can accurately extract the foreground with very little user interaction.

To start extracting the foreground, a rectangular box is drawn to specify the approximate extent of the foreground area, and segmentation is then iterated until the best result is achieved. Even after this processing, the result may not be ideal: part of the foreground may be missing, or part of the background may be extracted as foreground. In that case, the user needs to intervene. In a copy of the original image (or any image of the same size as the original), the user marks the areas to keep as foreground in white and the areas to discard as background in black. The annotated image is then used as a mask, and the algorithm continues to iterate to obtain the final result.

For example, with the figure image above, first use a rectangular box to frame the foreground, then mark the foreground and background with white and black respectively. Once the annotation is complete, the interactive foreground extraction algorithm can extract the complete foreground of the character.

Below, the blogger introduces the principle of GrabCut, an interactive foreground extraction algorithm:

  1. Use a rectangle to mark the general area where the foreground is located. Note that the rectangle only gives the approximate position of the foreground and contains both foreground and background, so the region inside it is actually undetermined. The area outside the rectangle, however, is treated as “determined background.”
  2. Distinguish foreground from background within the rectangle based on the “determined background” data outside the rectangle.
  3. Model the foreground and background with Gaussian mixture models (GMMs). The GMMs learn from the user’s input and create a new pixel distribution. Unclassified pixels (which may be background or foreground) are classified according to their relationship to the pixels already classified as foreground or background.
  4. Build a graph from the pixel distribution, with the pixels as nodes. Besides the pixel nodes there are two extra nodes: a foreground node and a background node. Foreground pixels are connected to the foreground node and background pixels to the background node, and the weight of the edge connecting a pixel to the foreground or background node is determined by the probability that the pixel is foreground or background.
  5. Besides being connected to the foreground or background node, each pixel is also connected to its neighbors. The weight of the edge between two pixels is determined by their similarity: the closer the colors of the two pixels, the larger the edge weight (see the sketch after this list).
  6. Once the nodes are connected, the problem becomes a cut on a connected graph: a min-cut splits the nodes into foreground and background according to the weights of the edges.
  7. This process is repeated until the classification converges.
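As a rough illustration of steps 4 and 5, the pixel-to-pixel edge weight can be sketched as a function that grows as two colors get closer. This is a minimal sketch of the idea, not OpenCV's internal code; the constants beta and gamma are hypothetical (in the GrabCut paper, beta is actually derived from image statistics):

import numpy as np

def edge_weight(z_m, z_n, beta=0.01, gamma=50.0):
    # Neighboring pixels with similar colors get a large weight, so the
    # min-cut tends not to separate them. beta and gamma are hypothetical
    # smoothing constants chosen for this sketch only.
    diff = np.asarray(z_m, dtype=np.float64) - np.asarray(z_n, dtype=np.float64)
    return gamma * np.exp(-beta * float(np.dot(diff, diff)))

print(edge_weight([200, 30, 30], [198, 32, 29]))  # similar colors -> large weight
print(edge_weight([200, 30, 30], [10, 180, 90]))  # dissimilar colors -> weight near 0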

GrabCut function

OpenCV provides the cv2.grabCut() function to perform interactive foreground extraction. Its full definition is as follows:

def grabCut(img, mask, rect, bgdModel, fgdModel, iterCount, mode=None): ...

The function returns the updated mask, bgdModel and fgdModel; the mask argument is also modified in place.

img: the input image; 8-bit, 3-channel.

mask: the mask image; 8-bit, single-channel. This parameter determines the foreground region, the background region and the undetermined region, and can take four values:

| Value | Meaning |
| --- | --- |
| cv2.GC_BGD | Determined background; can also be represented by the number 0 |
| cv2.GC_FGD | Determined foreground; can also be represented by the number 1 |
| cv2.GC_PR_BGD | Probable background; can also be represented by the number 2 |
| cv2.GC_PR_FGD | Probable foreground; can also be represented by the number 3 |

When a template (mask) is used to extract the foreground, the values 0 and 2 are merged into the background (both treated as 0), and the values 1 and 3 are merged into the foreground (both treated as 1). In general, we can mark the mask image with a white brush and a black brush, then convert the white pixels to 1 (foreground) and the black pixels to 0 (background).
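After cv2.grabCut() runs, the four states can be collapsed into a binary mask with this same merge rule; the one-liner below is the pattern used in the examples later in this post, shown here on a toy array:

import cv2
import numpy as np

mask = np.array([[0, 1], [2, 3]], dtype=np.uint8)  # toy grabCut output
# 0 (cv2.GC_BGD) and 2 (cv2.GC_PR_BGD) collapse to background,
# 1 (cv2.GC_FGD) and 3 (cv2.GC_PR_FGD) collapse to foreground.
binary = np.where((mask == cv2.GC_BGD) | (mask == cv2.GC_PR_BGD), 0, 1).astype(np.uint8)
print(binary)  # [[0 1]
               #  [0 1]]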

rect: the region containing the foreground object; everything outside this region is treated as the “determined background.” Therefore, make sure the foreground is contained within the rectangle when selecting it; otherwise the part of the foreground outside rect will not be extracted. The rect parameter is only meaningful when the mode parameter is set to the rectangular mode cv2.GC_INIT_WITH_RECT. Its format is (x, y, w, h), which represent the x and y coordinates of the region's upper-left pixel and the region's width and height, respectively. If the foreground extends toward the bottom right and you don't want to work out the exact size of the original image, you can simply use very large values for w and h. In mask mode, set this parameter to None.
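For example, a rectangle matching the slice img[10:200, 95:220] used later in this post would be written as follows:

# (x, y, w, h): upper-left corner at (95, 10), width 125, height 190.
# This is the rect equivalent of the slice img[10:200, 95:220].
rect = (95, 10, 125, 190)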

bgdModel: an array used internally by the algorithm. It must be a numpy array of type np.float64 with size (1, 65) (65 = 5 GMM components × 13 parameters per component).

fgdModel: the same as bgdModel: an internal np.float64 array of size (1, 65).

iterCount: the number of iterations to run.

mode: the operation mode. Its values are as follows:

| Value | Meaning |
| --- | --- |
| cv2.GC_INIT_WITH_RECT | Initialize using the rectangle rect |
| cv2.GC_INIT_WITH_MASK | Initialize using a custom template (mask). Note that cv2.GC_INIT_WITH_RECT and cv2.GC_INIT_WITH_MASK can be combined; all pixels outside the ROI (i.e., not covered by the template or the rectangle) are automatically treated as background |
| cv2.GC_EVAL | Resume mode: continue iterating with the existing models without re-initializing |
| cv2.GC_EVAL_FREEZE_MODEL | Run with the GMM models frozen (fixed), without updating them |
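Per the note in the table, the two initialization flags can be combined with a bitwise OR. A minimal sketch, reusing the image from the examples below; the rectangle values are hypothetical, and in practice user strokes would be painted into the mask before the call:

import cv2
import numpy as np

img = cv2.imread("4.jpg")
mask = np.zeros(img.shape[:2], dtype=np.uint8)  # user strokes (1/3 for foreground) would go here
bgdModel = np.zeros((1, 65), dtype=np.float64)
fgdModel = np.zeros((1, 65), dtype=np.float64)
rect = (95, 10, 125, 190)  # hypothetical rectangle; pixels outside it start as background
cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5,
            cv2.GC_INIT_WITH_RECT | cv2.GC_INIT_WITH_MASK)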

Extract image foreground

Now that we know how the algorithm works and what OpenCV provides, let's try to extract the foreground of the image above with the following code:

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread("4.jpg")

# Show the original (convert BGR to RGB so matplotlib displays the colors correctly)
plt.subplot(121)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.axis('off')

# Everything starts as determined background (0); the region that may contain
# the foreground is marked as probable foreground (3)
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask[10:200, 95:220] = 3

# Internal arrays required by grabCut: np.float64, size (1, 65)
bgdModel = np.zeros((1, 65), dtype=np.float64)
fgdModel = np.zeros((1, 65), dtype=np.float64)

cv2.grabCut(img, mask, None, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_MASK)

# Merge 0/2 into background and 1/3 into foreground, then apply the mask
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
ogc = img * mask2[:, :, np.newaxis]

plt.subplot(122)
plt.imshow(cv2.cvtColor(ogc, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()

After running it, the face is completely separated from the background; the effect is as follows:
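If the result still needs refinement, the call can be resumed without re-initializing. A minimal sketch continuing from the variables in the block above, using the cv2.GC_EVAL mode from the table:

# Continues from the previous block: img, mask, bgdModel and fgdModel
# already hold the state from the first cv2.grabCut() call.
cv2.grabCut(img, mask, None, bgdModel, fgdModel, 5, cv2.GC_EVAL)
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
ogc = img * mask2[:, :, np.newaxis]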

As for why the mask region is set this way, let's first look at the following code:

import cv2

img = cv2.imread("4.jpg")
# Slice out rows 10-200 and columns 95-220 to visualize the marked region
rect = img[10:200, 95:220]
cv2.imshow("4", rect)
cv2.waitKey()
cv2.destroyAllWindows()

After running, the effect is as follows:

As you can see, 10:200, 95:220 selects rows 10 to 200 and columns 95 to 220, i.e., a rectangle with its upper-left corner at (x=95, y=10), a width of 125 and a height of 190, which covers the possible foreground. Feeding this possible foreground to the GrabCut algorithm gives us the determined foreground (in this case, the character's head).

Use a template to extract the image foreground

Next, we extract the foreground directly using cv2.GC_INIT_WITH_MASK. The code is as follows:

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread("4.jpg")
mask = np.zeros(img.shape[:2], dtype=np.uint8)
bgdModel = np.zeros((1, 65), dtype=np.float64)
fgdModel = np.zeros((1, 65), dtype=np.float64)

# First pass: rough segmentation from a rectangle
rect = (60, 10, 400, 500)
cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)

# Load the hand-annotated template (grayscale): black = determined background,
# white = determined foreground
mask2 = cv2.imread("37.jpg", 0)
plt.subplot(121)
plt.imshow(mask2, cmap="gray")
plt.axis('off')

# Transfer the annotations into the grabCut mask, then refine in mask mode
mask[mask2 == 0] = 0
mask[mask2 == 255] = 1
mask, bgd, fgd = cv2.grabCut(img, mask, None, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_MASK)

# Merge 0/2 into background and 1/3 into foreground, then apply the mask
mask = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
ogc = img * mask[:, :, np.newaxis]

plt.subplot(122)
plt.imshow(cv2.cvtColor(ogc, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()

After running, the effect is as follows:

Here, the template is not derived from a rectangle: after the initial rectangle-based pass, we refine the result with a rough template drawn in another image. Pixels with value 0 (black) mark the determined background and pixels with value 255 (white) mark the determined foreground, which are then mapped to mask values 0 and 1 respectively.
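If you don't have a hand-painted template at hand, a stand-in for "37.jpg" can be generated programmatically. The stroke positions below are hypothetical and would need to match your own image:

import cv2
import numpy as np

img = cv2.imread("4.jpg")
# Gray (128) means "leave the grabCut state unchanged": only pure black (0)
# and pure white (255) pixels are transferred into the mask by the code above.
template = np.full(img.shape[:2], 128, dtype=np.uint8)
cv2.rectangle(template, (120, 40), (180, 120), 255, -1)  # hypothetical foreground strokes
cv2.rectangle(template, (0, 0), (40, 300), 0, -1)        # hypothetical background strokes
cv2.imwrite("37.jpg", template)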