0 Overview

  • Richer Convolutional Features for Edge Detection
  • Paper link: openaccess.thecvf.com/content_cvp…
  • Abbreviation: RCF

This paper, in my opinion, is an improvement on the HED network (Holistically-Nested Edge Detection) from ICCV 2015, and the RCF paper is almost always contrasted with HED.

From the previous article, you may vaguely remember this diagram of the HED model:

HED has five side-output feature maps. The figure below is from the RCF paper:

Compare these two figures to see what RCF improves over HED; take a moment to spot the differences yourself.

Reveal the answer:

  • HED uses a leopard picture while RCF uses a bird picture (just kidding)
  • HED shows the feature maps of its side outputs, while the RCF figure shows conv3_1 and conv3_2 separately, which suggests that RCF treats the feature map output by every convolution layer as a side output.

Exactly: HED selects 5 side outputs, each of which is the feature map output by the convolution layer right before a pooling layer. RCF instead treats the feature map output by every convolution layer as a side-output source. In other words, several feature maps of the same size may contribute to the final side output of a stage.

If this is still not clear, see the next section on the model structure.

1 Model structure

The backbone of RCF is a VGG16 network:

As can be seen from the figure:

  • The backbone is divided into Stage 1 to Stage 5. Stage 1 and Stage 2 have two convolution layers each, and Stage 3 to Stage 5 have three each, for a total of 13 convolution layers. The feature map output by every convolution layer gets an additional 1×1 convolution that reduces its number of channels to 21.
  • The two or three 21-channel feature maps within the same stage (42 or 63 channels if you stacked them) are accumulated element-wise into a single 21-channel map, which then passes through a 1×1 convolution layer that reduces the channel number to 1. After bilinear upsampling back to the input size and a sigmoid layer, the result is one side output of the RCF model; see the sketch after this list.
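
To make the second bullet concrete, here is a minimal sketch of the Stage-3 side-output path. The names down3_1, score3 and the use of F.interpolate are my own illustrative choices; the full model in Section 3.1 uses a fixed bilinear transposed convolution instead.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch (not the full model) of the Stage-3 side-output path, assuming
# the 256-channel maps conv3_1, conv3_2, conv3_3 are already computed at 1/4 resolution.
down3_1 = nn.Conv2d(256, 21, 1)   # 1x1 conv: 256 -> 21 channels
down3_2 = nn.Conv2d(256, 21, 1)
down3_3 = nn.Conv2d(256, 21, 1)
score3 = nn.Conv2d(21, 1, 1)      # 1x1 conv: 21 -> 1 channel

def stage3_side_output(conv3_1, conv3_2, conv3_3):
    fused = down3_1(conv3_1) + down3_2(conv3_2) + down3_3(conv3_3)  # element-wise sum
    score = score3(fused)                                           # single-channel score map
    # upsample back to the input resolution (the code in Section 3.1 uses a
    # fixed bilinear transposed convolution instead of interpolate)
    up = F.interpolate(score, scale_factor=4, mode='bilinear', align_corners=False)
    return torch.sigmoid(up)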

2 Loss function

The loss function here is similar to the one in HED:

First, on the whole, the loss function is still a weighted binary cross entropy.
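
For reference, here is my reconstruction of the per-pixel loss from the paper, with the minus signs of the cross entropy written out explicitly; α and β are the class-balancing weights discussed below:

$$
\alpha = \lambda \cdot \frac{|Y^{+}|}{|Y^{+}| + |Y^{-}|}, \qquad
\beta = \frac{|Y^{-}|}{|Y^{+}| + |Y^{-}|}
$$

$$
l(X_i; W) =
\begin{cases}
-\,\alpha \cdot \log\bigl(1 - P(X_i; W)\bigr) & \text{if } y_i = 0, \\
0 & \text{if } 0 < y_i \le \eta, \\
-\,\beta \cdot \log P(X_i; W) & \text{otherwise,}
\end{cases}
$$

where P(X_i; W) is the predicted edge probability at pixel i and y_i is the ground-truth edge probability. The total loss sums this over all pixels of every side output plus the fused output.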

Here |Y⁻| denotes the number of negative (non-edge) pixels and |Y⁺| the number of positive (edge) pixels. Generally speaking, in contour detection tasks the positive samples are few, so α is small. Therefore, in the first branch of the loss function, where y = 0, that is, when computing the loss of the non-contour pixels, a small weight is applied to mitigate the class-imbalance problem.

There are two constants in the loss function. One is λ, the weighting constant, which defaults to 1.1. The other is η; the paper describes it as follows:

Edge datasets in this community are usually labeled by several annotators using their knowledge about the presences of objects and object parts. Though humans vary in cognition, these human-labeled edges for the same image share high consistency. For each image, we average all the ground truth to generate an edge probability map, which ranges from 0 to 1. Here, 0 means no annotator labeled at this pixel, and 1 means all annotators have labeled at this pixel. We consider the pixels with edge probability higher than η as positive samples and the pixels with edge probability equal to 0 as negative samples. Otherwise, if a pixel is marked by fewer than η of the annotators, this pixel may be semantically controversial to be an edge point. Thus, whether regarding it as positive or negative samples may confuse networks. So we ignore pixels in this category.

A dataset is usually annotated by more than one person. Although different annotators perceive things differently, their contour annotations of the same image tend to be highly consistent. For each image the annotations are averaged into an edge probability map between 0 and 1: a pixel is treated as a positive sample if its probability is greater than η, as a negative sample if it is exactly 0, and is ignored otherwise, because such pixels are semantically controversial. To be honest, I did not take this into account when I reproduced the model, and I did not notice it when I looked through the code on GitHub and the official repository either, so let me spell out here what η means in this paper.
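
To make this concrete, here is a minimal sketch of how such a ground-truth probability map could be turned into training labels. The function name make_rcf_labels and the value η = 0.5 are my own illustrative choices, not taken from the paper.

import torch

def make_rcf_labels(gt_prob, eta=0.5):
    # gt_prob: averaged multi-annotator edge map, values in [0, 1]
    # returns labels: 0 = negative, 1 = positive, 2 = ignored (controversial)
    label = torch.zeros_like(gt_prob)
    label[gt_prob > eta] = 1                      # confident edge pixels -> positive
    label[(gt_prob > 0) & (gt_prob <= eta)] = 2   # ambiguous pixels -> ignored in the loss
    return label                                  # pixels with gt_prob == 0 stay negative

Pixels labeled 2 are exactly the ones that the loss function in Section 3.2 masks out with a zero weight.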

3 PyTorch part of the code

For this RCF paper, the two key pieces are the construction of the model and the construction of the loss function. Both parts of the code are given here to help you better understand the content above.

3.1 Model part

The upsampling part of the following code is written in a somewhat old style: it uses torch.nn.functional.conv_transpose2d with fixed bilinear weights rather than a dedicated transposed-convolution layer, probably because the PyTorch version it targets is fairly old. That does not stop you from learning how RCF works from the code.

import torch
import torch.nn as nn


class RCF(nn.Module):
    def __init__(self):
        super(RCF, self).__init__()
        # The "lr ... decay ..." comments give the per-layer learning-rate /
        # weight-decay multipliers used when building the optimizer parameter groups.
        # lr 1 2 decay 1 0
        self.conv1_1 = nn.Conv2d(3, 64, 3, padding=1)
        self.conv1_2 = nn.Conv2d(64, 64, 3, padding=1)

        self.conv2_1 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv2_2 = nn.Conv2d(128, 128, 3, padding=1)

        self.conv3_1 = nn.Conv2d(128, 256, 3, padding=1)
        self.conv3_2 = nn.Conv2d(256, 256, 3, padding=1)
        self.conv3_3 = nn.Conv2d(256, 256, 3, padding=1)

        self.conv4_1 = nn.Conv2d(256, 512, 3, padding=1)
        self.conv4_2 = nn.Conv2d(512, 512, 3, padding=1)
        self.conv4_3 = nn.Conv2d(512, 512, 3, padding=1)

        # Stage 5 uses dilated convolutions (pool4 below has stride 1)
        self.conv5_1 = nn.Conv2d(512, 512, kernel_size=3,
                                 stride=1, padding=2, dilation=2)
        self.conv5_2 = nn.Conv2d(512, 512, kernel_size=3,
                                 stride=1, padding=2, dilation=2)
        self.conv5_3 = nn.Conv2d(512, 512, kernel_size=3,
                                 stride=1, padding=2, dilation=2)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(2, stride=2, ceil_mode=True)
        self.maxpool4 = nn.MaxPool2d(2, stride=1, ceil_mode=True)

        # 1x1 convolutions reducing every conv feature map to 21 channels
        # lr 0.1 0.2 decay 1 0
        self.conv1_1_down = nn.Conv2d(64, 21, 1, padding=0)
        self.conv1_2_down = nn.Conv2d(64, 21, 1, padding=0)

        self.conv2_1_down = nn.Conv2d(128, 21, 1, padding=0)
        self.conv2_2_down = nn.Conv2d(128, 21, 1, padding=0)

        self.conv3_1_down = nn.Conv2d(256, 21, 1, padding=0)
        self.conv3_2_down = nn.Conv2d(256, 21, 1, padding=0)
        self.conv3_3_down = nn.Conv2d(256, 21, 1, padding=0)

        self.conv4_1_down = nn.Conv2d(512, 21, 1, padding=0)
        self.conv4_2_down = nn.Conv2d(512, 21, 1, padding=0)
        self.conv4_3_down = nn.Conv2d(512, 21, 1, padding=0)

        self.conv5_1_down = nn.Conv2d(512, 21, 1, padding=0)
        self.conv5_2_down = nn.Conv2d(512, 21, 1, padding=0)
        self.conv5_3_down = nn.Conv2d(512, 21, 1, padding=0)

        # 1x1 convolutions reducing each stage's fused 21-channel map to 1 channel
        # lr 0.01 0.02 decay 1 0
        self.score_dsn1 = nn.Conv2d(21, 1, 1)
        self.score_dsn2 = nn.Conv2d(21, 1, 1)
        self.score_dsn3 = nn.Conv2d(21, 1, 1)
        self.score_dsn4 = nn.Conv2d(21, 1, 1)
        self.score_dsn5 = nn.Conv2d(21, 1, 1)
        # 1x1 convolution fusing the 5 side outputs into the final output
        # lr 0.001 0.002 decay 1 0
        self.score_final = nn.Conv2d(5, 1, 1)

    def forward(self, x):
        # VGG backbone
        img_H, img_W = x.shape[2], x.shape[3]

        conv1_1 = self.relu(self.conv1_1(x))
        conv1_2 = self.relu(self.conv1_2(conv1_1))
        pool1 = self.maxpool(conv1_2)

        conv2_1 = self.relu(self.conv2_1(pool1))
        conv2_2 = self.relu(self.conv2_2(conv2_1))
        pool2 = self.maxpool(conv2_2)

        conv3_1 = self.relu(self.conv3_1(pool2))
        conv3_2 = self.relu(self.conv3_2(conv3_1))
        conv3_3 = self.relu(self.conv3_3(conv3_2))
        pool3 = self.maxpool(conv3_3)

        conv4_1 = self.relu(self.conv4_1(pool3))
        conv4_2 = self.relu(self.conv4_2(conv4_1))
        conv4_3 = self.relu(self.conv4_3(conv4_2))
        pool4 = self.maxpool4(conv4_3)

        conv5_1 = self.relu(self.conv5_1(pool4))
        conv5_2 = self.relu(self.conv5_2(conv5_1))
        conv5_3 = self.relu(self.conv5_3(conv5_2))

        # 1x1 "down" convolutions: every conv feature map becomes 21 channels
        conv1_1_down = self.conv1_1_down(conv1_1)
        conv1_2_down = self.conv1_2_down(conv1_2)
        conv2_1_down = self.conv2_1_down(conv2_1)
        conv2_2_down = self.conv2_2_down(conv2_2)
        conv3_1_down = self.conv3_1_down(conv3_1)
        conv3_2_down = self.conv3_2_down(conv3_2)
        conv3_3_down = self.conv3_3_down(conv3_3)
        conv4_1_down = self.conv4_1_down(conv4_1)
        conv4_2_down = self.conv4_2_down(conv4_2)
        conv4_3_down = self.conv4_3_down(conv4_3)
        conv5_1_down = self.conv5_1_down(conv5_1)
        conv5_2_down = self.conv5_2_down(conv5_2)
        conv5_3_down = self.conv5_3_down(conv5_3)

        # per-stage fusion: element-wise sum, then 1x1 conv to a single channel
        so1_out = self.score_dsn1(conv1_1_down + conv1_2_down)
        so2_out = self.score_dsn2(conv2_1_down + conv2_2_down)
        so3_out = self.score_dsn3(conv3_1_down + conv3_2_down + conv3_3_down)
        so4_out = self.score_dsn4(conv4_1_down + conv4_2_down + conv4_3_down)
        so5_out = self.score_dsn5(conv5_1_down + conv5_2_down + conv5_3_down)

        ## transpose and crop way
        # make_bilinear_weights builds a fixed (non-learned) bilinear upsampling kernel
        # for conv_transpose2d; a sketch of this helper is given after the code block.
        # Stage 5 is only at 1/8 resolution because pool4 uses stride 1, hence stride=8 below.
        weight_deconv2 = make_bilinear_weights(4, 1).cuda()
        weight_deconv3 = make_bilinear_weights(8, 1).cuda()
        weight_deconv4 = make_bilinear_weights(16, 1).cuda()
        weight_deconv5 = make_bilinear_weights(32, 1).cuda()

        upsample2 = torch.nn.functional.conv_transpose2d(so2_out, weight_deconv2, stride=2)
        upsample3 = torch.nn.functional.conv_transpose2d(so3_out, weight_deconv3, stride=4)
        upsample4 = torch.nn.functional.conv_transpose2d(so4_out, weight_deconv4, stride=8)
        upsample5 = torch.nn.functional.conv_transpose2d(so5_out, weight_deconv5, stride=8)
        ### center crop
        so1 = crop(so1_out, img_H, img_W)
        so2 = crop(upsample2, img_H, img_W)
        so3 = crop(upsample3, img_H, img_W)
        so4 = crop(upsample4, img_H, img_W)
        so5 = crop(upsample5, img_H, img_W)

        fusecat = torch.cat((so1, so2, so3, so4, so5), dim=1)
        fuse = self.score_final(fusecat)
        results = [so1, so2, so3, so4, so5, fuse]
        results = [torch.sigmoid(r) for r in results]
        return results
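
The forward pass above calls two helpers that are not shown, make_bilinear_weights and crop. Below is a minimal sketch of what they need to do, written from the behavior the forward pass expects (a fixed bilinear kernel for conv_transpose2d and a center crop back to the input size) rather than copied from any particular repository.

import numpy as np
import torch

def make_bilinear_weights(size, num_channels):
    # Fixed (non-learned) bilinear upsampling kernel of shape
    # (num_channels, num_channels, size, size), diagonal across channels.
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    filt = (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)
    w = torch.zeros(num_channels, num_channels, size, size)
    for i in range(num_channels):
        w[i, i] = torch.from_numpy(filt).float()
    return w

def crop(data, img_H, img_W):
    # Center-crop a (N, C, H, W) tensor back to the original image size.
    _, _, h, w = data.shape
    y1 = int(round((h - img_H) / 2.0))
    x1 = int(round((w - img_W) / 2.0))
    return data[:, :, y1:y1 + img_H, x1:x1 + img_W]

A quick smoke test of the output shapes (a GPU is assumed, since the deconvolution weights are moved to CUDA inside forward):

model = RCF().cuda()
outputs = model(torch.randn(1, 3, 320, 320).cuda())
print([o.shape for o in outputs])   # six (1, 1, 320, 320) maps: 5 side outputs + fuse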

3.2 Loss function

import torch

def cross_entropy_loss_RCF(prediction, label):
    # label: 0 = negative, 1 = positive, 2 = ignored (controversial pixels, see Section 2)
    label = label.long()
    mask = label.float()
    num_positive = torch.sum((mask == 1).float()).float()
    num_negative = torch.sum((mask == 0).float()).float()

    # class-balancing weights: positive pixels get beta = |Y-| / (|Y+| + |Y-|),
    # negative pixels get alpha = lambda * |Y+| / (|Y+| + |Y-|) with lambda = 1.1
    mask[mask == 1] = 1.0 * num_negative / (num_positive + num_negative)
    mask[mask == 0] = 1.1 * num_positive / (num_positive + num_negative)
    mask[mask == 2] = 0  # ignored pixels contribute nothing to the loss

    # some PyTorch versions expect BCE targets in [0, 1], so map the ignored
    # label value 2 to 0 here; its weight is already 0, so the loss is unchanged
    target = label.float()
    target[target == 2] = 0
    cost = torch.nn.functional.binary_cross_entropy(
        prediction.float(), target, weight=mask, reduction='none')
    return torch.sum(cost)
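
Finally, a hedged sketch of how this loss is usually combined with the six outputs of the model during training (the variable names are mine; the paper sums the loss over every side output plus the fused output):

# one training step; `model`, `optimizer`, `image` and `label` are assumed to exist,
# with `label` following the 0 / 1 / 2 convention from Section 2
optimizer.zero_grad()
outputs = model(image)                                        # 5 side outputs + fused output
loss = sum(cross_entropy_loss_RCF(o, label) for o in outputs)
loss.backward()
optimizer.step()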

Reference articles:

  1. blog.csdn.net/a8039974/ar…
  2. gitee.com/HEART1/RCF-…
  3. openaccess.thecvf.com/content_cvp…