
What is Selective Search

Selective Search is a region-proposal method used in object detection. Its job is to locate where objects might be: it divides the original image into many sub-regions, which are then fed to an object-recognition model. In short, its input is an image and its output is a set of candidate object locations in that image. The algorithm was later used in R-CNN, SPP-Net, Fast R-CNN and other detectors. I studied the algorithm's code and annotated it heavily; I hope it helps you understand the source code.
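To make this input and output concrete, here is a minimal usage sketch. It assumes the selective_search function that is walked through later in this article, so it only runs after the full code below has been defined:

import skimage.data

# minimal usage sketch; selective_search is the function defined later in this article
img = skimage.data.astronaut()  # input: an RGB image
img_lbl, regions = selective_search(img, scale=500, sigma=0.9, min_size=10)

# output: candidate object locations as (left, top, width, height) rectangles
for r in regions[:5]:
    print(r['rect'], r['size'])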

Selective Search overview

The main idea of the algorithm: given an input image, first obtain many small regions through image segmentation (the code uses the felzenszwalb algorithm), then keep merging these small regions until no further merging is possible. The paper describes this with pseudo-code; the steps below summarize it, and a condensed code sketch follows the list.

The algorithm is divided into the following steps:

  1. Generate the original region set R (using felzenszwalb algorithm)

  2. Calculate the similarity of every pair of adjacent regions in R, giving the similarity set S = {s1, s2, ...}

  3. Find the two regions with the highest similarity, merge them into a new region, and add it to R

  4. Remove from S all similarity entries involving the regions merged in step 3

  5. Compute the similarity between the new region and each of its neighbours and add these to S

  6. Go back to step 3 and keep looping and merging until S is empty (no more regions can be merged)
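To connect these steps with the code that follows, here is a condensed sketch of the main loop. It is not the paper's exact pseudo-code; it uses the helper functions (_generate_segments, _extract_regions, _extract_neighbours, _calc_sim, _merge_regions) that are defined and commented in the full code below, and imsize is the number of pixels of the image:

# condensed sketch of the hierarchical grouping loop; the helpers are defined in the full code below
img = _generate_segments(im_orig, scale, sigma, min_size)   # step 1: initial regions stored as a 4th channel
R = _extract_regions(img)
S = {(ai, bi): _calc_sim(ar, br, imsize)
     for (ai, ar), (bi, br) in _extract_neighbours(R)}      # step 2: similarities of neighbouring pairs

while S != {}:
    i, j = sorted(S.items(), key=lambda x: x[1])[-1][0]     # step 3: pick the most similar pair of labels
    t = max(R.keys()) + 1.0                                 # new label for the merged region
    R[t] = _merge_regions(R[i], R[j])
    stale = [k for k in S if i in k or j in k]              # step 4: entries involving the merged regions
    for k in stale:
        del S[k]
    for k in [a for a in stale if a != (i, j)]:             # step 5: similarities of the new region
        n = k[1] if k[0] in (i, j) else k[0]
        S[(t, n)] = _calc_sim(R[t], R[n], imsize)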

Code and comments

It is a little long, but hopefully you will read through it; it will really give you a better sense of how this algorithm works.

import skimage.data
import skimage.io
import skimage.feature
import skimage.color
import skimage.transform
import skimage.util
import skimage.segmentation
import numpy
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# 1. Generate the original region set, using the felzenszwalb image segmentation algorithm.
# Each region gets a label, which is merged into the image as a 4th channel for later operations
def _generate_segments(im_orig, scale, sigma, min_size):
    """ segment smallest regions by the algorithm of Felzenszwalb and Huttenlocher """

    # open the Image
    # Calculate Felsenszwalb's efficient graph based image segmentation.
    im_mask = skimage.segmentation.felzenszwalb(
        skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma, min_size=min_size)
    # im_mask assigns a region label to each pixel

    # merge mask channel to the image as a 4th channel
    im_orig = numpy.append(im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
    im_orig[:, :, 3] = im_mask

    return im_orig

# 2. Calculate the similarity between two regions.
# Four similarity measures are considered in the paper: colour, texture, size, and fill (overlap).
#
# Colour and texture similarity are judged by the intersection of the histograms of the two regions.
#
# The final similarity is the sum of the four measures.
def _sim_colour(r1, r2):
    """ calculate the sum of histogram intersection of colour """
    # zip() pairs up elements from the two histograms:
    # a = [1, 2, 3]; b = [4, 5, 6]; zip(a, b) -> [(1, 4), (2, 5), (3, 6)]
    # (zip stops at the end of the shortest input list)
    return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])


def _sim_texture(r1, r2):
    """ calculate the sum of histogram intersection of texture """
    return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])


def _sim_size(r1, r2, imsize):
    """ calculate the size similarity over the image """
    return 1.0 - (r1["size"] + r2["size"]) / imsize


def _sim_fill(r1, r2, imsize):
    """ calculate the fill similarity over the image """
    # area of the bounding box that tightly encloses both regions
    bbsize = (
        (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
        * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
    )
    return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize


def _calc_sim(r1, r2, imsize):
    # the total similarity between two regions is the sum of the four measures
    return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
            + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))


# 3. Functions for calculating colour and texture histograms
#
# Colour histogram: convert the colour space to HSV and compute a histogram with bins=25 for each
# channel, so the colour histogram of each region has 25*3=75 bins. After normalizing the histogram
# by the region size, the similarity between two regions is the histogram intersection:
#
#     s_colour(r_i, r_j) = sum_{k=1..n} min(c_i^k, c_j^k)
#
# where c_i^k is the value of the k-th bin of the colour histogram of region r_i.
#
# Texture similarity: the paper uses Gaussian derivatives with variance 1 in 8 orientations and computes
# a histogram with bins=10 over them, giving 8*3*10=240 bins (in RGB colour space). This implementation
# uses LBP (Local Binary Pattern) features to build the texture histogram instead; the rest is the same.

def _calc_colour_hist(img):
    """ calculate colour histogram for each region

    The input img is the HSV values of every pixel belonging to one region.
    The size of the output histogram will be BINS * COLOUR_CHANNELS(3);
    the number of bins is 25, as in [uijlings_ijcv2013_draft.pdf].
    """

    BINS = 25
    hist = numpy.array([])
    for colour_channel in (0, 1, 2):
        # extract one colour channel (H, S or V) of every pixel in the region;
        # c is one-dimensional, with the same length as img
        c = img[:, colour_channel]
        # calculate histogram for each colour channel and join to the result
        # numpy.concatenate joins the per-channel histograms together
        # numpy.histogram counts how many values fall in each bin: the first argument is the data,
        # the second is the number of bins (25), the third is the value range of the statistics
        hist = numpy.concatenate([hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])

    # L1 normalize by the number of pixels in the region
    hist = hist / len(img)
    return hist


def _calc_texture_gradient(img):
    """ calculate texture gradient for entire image

    The original Selective Search algorithm proposed Gaussian derivatives in 8 orientations,
    but we use LBP instead. Output will be [height][width][channels].
    """

    ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))

    for colour_channel in (0, 1, 2):
        ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
            img[:, :, colour_channel], 8, 1.0)

    return ret


def _calc_texture_hist(img):
    """ calculate texture histogram for each region

    Calculate the histogram of gradients for each colour channel.
    The size of the output histogram will be BINS * ORIENTATIONS * COLOUR_CHANNELS(3).
    """
    BINS = 10

    hist = numpy.array([])

    for colour_channel in (0, 1, 2):
        # mask by the colour channel
        fd = img[:, colour_channel]

        # calculate histogram for each orientation and concatenate them all
        # and join to the result
        hist = numpy.concatenate([hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])

    # L1 normalize by the number of pixels in the region
    hist = hist / len(img)

    return hist


# 4. Extract the size, colour and texture features of each region
def _extract_regions(img):
    R = {}

    # get hsv image
    # each pixel becomes three HSV values in [0, 1]
    hsv = skimage.color.rgb2hsv(img[:, :, :3])

    # pass 1: count pixel positions
    # the minimum and maximum x and y coordinates of each region are recorded in the dictionary R
    for y, i in enumerate(img):

        for x, (r, g, b, l) in enumerate(i):

            # initialize a new region
            if l not in R:
                R[l] = {"min_x": 0xffff."min_y": 0xffff."max_x": 0."max_y": 0."labels": [l]}

            # bounding box
            if R[l]["min_x"] > x:
                R[l]["min_x"] = x
            if R[l]["min_y"] > y:
                R[l]["min_y"] = y
            if R[l]["max_x"] < x:
                R[l]["max_x"] = x
            if R[l]["max_y"] < y:
                R[l]["max_y"] = y

    # pass 2: calculate texture gradient
    tex_grad = _calc_texture_gradient(img)



    # pass 3: calculate colour histogram of each region
    # k is the region label, v is that region's dict (min_x, max_x, min_y, max_y, ...)
    for k, v in list(R.items()):
        # colour histogram
        # extract the HSV values of the pixels belonging to region k
        masked_pixels = hsv[:, :, :][img[:, :, 3] == k]

        R[k]["size"] = len(masked_pixels / 4)  # count the number of pixels in this region
        # record the histogram of the HSV three-channel distribution of this region
        R[k]["hist_c"] = _calc_colour_hist(masked_pixels)

        # texture histogram (same idea as the previous step, using the LBP texture image)
        R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])

    # the R returned here records, for each region of the image:
    # min_x, min_y, max_x, max_y, size, hist_c, hist_t
    return R


# 5. Find neighbours - determine whether two regions are neighbours by checking whether their bounding boxes intersect
# regions: R records, for each region of the image: min_x, min_y, max_x, max_y, size, hist_c, hist_t
def _extract_neighbours(regions):
    def intersect(a, b):
        # a and b are two regions; they intersect if a corner of b's bounding box falls inside a's bounding box:
        # b's min x lies between a's min and max x and b's min y lies between a's min and max y, or
        # b's max x lies between a's min and max x and b's max y lies between a's min and max y, or
        # b's min x lies between a's min and max x and b's max y lies between a's min and max y, or
        # b's max x lies between a's min and max x and b's min y lies between a's min and max y
        # (i.e. part of b is inside a)
        if (a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]):
            return True
        return False

    # items() returns every (label, region-info) pair of regions
    R = list(regions.items())
    neighbours = []
    # enumerate() yields the index cur (starting at 0) and the value a; R[:-1] skips the last element
    for cur, a in enumerate(R[:-1]):
        # compare the current region a with every region after it
        for b in R[cur + 1:]:
            if intersect(a[1], b[1]):
                # record the pair of regions that are neighbours
                neighbours.append((a, b))

    return neighbours

# 6. Merge two regions
# The parameters are the information dicts of the two regions; they are merged into one region
def _merge_regions(r1, r2):
    new_size = r1["size"] + r2["size"]
    rt = {
        "min_x": min(r1["min_x"], r2["min_x"]),
        "min_y": min(r1["min_y"], r2["min_y"]),
        "max_x": max(r1["max_x"], r2["max_x"]),
        "max_y": max(r1["max_y"], r2["max_y"]),
        "size": new_size,
        "hist_c": (
            r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
        "hist_t": (
            r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
        "labels": r1["labels"] + r2["labels"]}return rt

# 7. selective_search is the main function
#
# scale: controls the cluster size of the felzenszwalb segmentation. The larger the value, the coarser the
# segmentation, i.e. fewer and larger sub-regions. The default is 1.
#
# sigma: before segmentation, the original image is denoised with a Gaussian filter; sigma is the width of
# the Gaussian kernel. The default is 0.8.
#
# min_size: the minimum number of pixels in a region; the felzenszwalb segmentation does not produce regions
# smaller than this. The default in this code is 50.
#
# Each iteration selects the pair of regions with the highest similarity (e.g. regions 100 and 120) and merges
# them into a new region (e.g. region 300). Then the similarity of the new region 300 with all neighbours of
# region 100 and all neighbours of region 120 is calculated and added to the similarity set S. This repeats
# until S is empty. At that point only one region is left, whose pixel count is close to that of the original
# image, so no further merging is possible and the loop ends.


def selective_search(im_orig, scale=1.0, sigma=0.8, min_size=50):
    '''Selective Search
    Parameters: im_orig (ndarray) input image; scale (int) free parameter, higher means larger clusters in
    felzenszwalb segmentation; sigma (float) width of Gaussian kernel for felzenszwalb segmentation;
    min_size (int) minimum component size for felzenszwalb segmentation.
    Returns: img (ndarray) image with region labels, where the label is stored in the 4th value of each
    pixel [r, g, b, (region)]; regions (array of dict)
    [{'rect': (left, top, width, height), 'labels': [...], 'size': component_size}, ...]
    '''
    # raise an exception when the image does not have three channels
    assert im_orig.shape[2] == 3, "3ch image is expected"

    # load image and get smallest regions
    # region label is stored in the 4th value of each pixel [r,g,b,(region)]
    img = _generate_segments(im_orig, scale, sigma, min_size)


    if img is None:
        return None, {}

    imsize = img.shape[0] * img.shape[1]
    R = _extract_regions(img)   # R records, for each region of the image: min_x, min_y, max_x, max_y, size, hist_c, hist_t

    # extract neighbouring information
    neighbours = _extract_neighbours(R)

    # calculate initial similarities
    S = {}
    for (ai, ar), (bi, br) in neighbours:
        # ai, bi are the labels of the two regions; ar, br are the info dicts of each region
        # S is a dictionary: the key is the pair of region labels, the value is the similarity of the two regions
        S[(ai, bi)] = _calc_sim(ar, br, imsize)

    # hierarchical search
    while S != {}:
        # get highest similarity
        # find the pair of region labels i and j with the highest similarity in S
        # sorted() orders S by similarity value; [-1][0] takes the key (the label pair) of the last, most similar, item
        i, j = sorted(S.items(), key=lambda i: i[1])[-1][0]

        # merge corresponding regions
        # keys() returns all keys of the dictionary; use the largest existing label plus 1 as the new region label
        t = max(R.keys()) + 1.0
        R[t] = _merge_regions(R[i], R[j])

        # mark similarities for regions to be removed
        key_to_delete = []
        for k, v in list(S.items()):
            if (i in k) or (j in k):
                # collect the entries of S that involve region i or region j into key_to_delete
                key_to_delete.append(k)

        # remove old similarities of related regions
        for k in key_to_delete:
            del S[k]

        # calculate similarity set with the new region
        # compute the similarity between the new region t and each neighbour of the removed regions
        for k in [a for a in key_to_delete if a != (i, j)]:
            n = k[1] if k[0] in (i, j) else k[0]
            S[(t, n)] = _calc_sim(R[t], R[n], imsize)

    regions = []
    for k, r in list(R.items()):
        # k is the region label; r stores the information about that region
        regions.append({
            'rect': (
                r['min_x'], r['min_y'],
                r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
            'size': r['size'],
            'labels': r['labels']})

    return img, regions




# read an image from skimage; the size of the image is 512*512*3
# img = skimage.data.astronaut()
# img_lbl, regions = selective_search(img, scale=500, sigma=0.9, min_size=10)
# print(regions)


def main():
    # load image data
    img = skimage.data.astronaut()

    # perform selective search; the regions have the format
    # [{'rect': (left, top, width, height), 'labels': [...], 'size': component_size}, ...]
    img_lbl, regions = selective_search(img, scale=500, sigma=0.9, min_size=10)
    print(img_lbl.shape)

    # count how many original candidate regions the segmentation produced
    temp = set()  # set() creates an unordered collection of unique elements
    for i in range(img_lbl.shape[0]):
        for j in range(img_lbl.shape[1]):
            # temp stores all the region labels
            temp.add(img_lbl[i, j, 3])

    print(len(temp))  # 286

    # Calculate how many candidate regions are obtained by Selective Search algorithm
    print(len(regions))  # 570
    # each element's 'rect' is a tuple (top-left x, top-left y, width, height) describing the border of a candidate region
    candidates = set()
    for r in regions:
        # Eliminate duplicate candidate fields
        if r['rect'] in candidates:
            continue
        # exclude candidate regions smaller than 2000 pixels (the region's own size, not its bounding-box size)
        if r['size'] < 2000:
            continue
        # exclude distorted (elongated) candidate boxes, keeping only roughly square ones
        x, y, w, h = r['rect']
        if w / h > 1.2 or h / w > 1.2:
            continue
        candidates.add(r['rect'])

    # draw the candidate region borders on the original image
    fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
    ax.imshow(img)
    for x, y, w, h in candidates:
        print(x, y, w, h)
        rect = mpatches.Rectangle(
            (x, y), w, h, fill=False, edgecolor='red', linewidth=1)
        ax.add_patch(rect)

    plt.show()

main()

Results

The script prints the region counts and then displays the astronaut test image with the filtered candidate boxes drawn in red.

If you found this article useful, please give it a thumbs up, and feel free to discuss in the comments section.