In this article, we will walk through the code logic and visualization of deformable convolution, implemented entirely in Python with no C++. Most of the code comes from github.com/oeway/pytor… but after working with it for a while, I found a bug in that code that prevents the deformable convolution from actually being applied. I noticed the problem when I visualized the sampling points of the deformable convolution. After the fix, the visualization looks correct and the accuracy improved.

1 Code Logic

# for visualization
class ConvOffset2D(nn.Conv2d):
    """ConvOffset2D

    Convolutional layer responsible for learning the 2D offsets and outputting
    the deformed feature map using bilinear interpolation.

    Note that this layer does not perform convolution on the deformed feature
    map. See get_deform_cnn in cnn.py for usage.
    """
    def __init__(self, filters, init_normal_stddev=0.01, **kwargs):
        """Init

        Parameters
        ----------
        filters : int
            Number of channels of the input feature map
        init_normal_stddev : float
            Stddev of the normal kernel initialization
        **kwargs:
            Passed to the superclass. See the Conv2d layer in pytorch
        """
        self.filters = filters
        self._grid_param = None
        # out_channels = 2 offset channels (x and y) per input channel, kernel size 3
        super(ConvOffset2D, self).__init__(self.filters, self.filters*2, 3, padding=1, bias=False, **kwargs)
        self.weight.data.copy_(self._init_weights(self.weight, init_normal_stddev))

    def forward(self, x):
        """Return the deformed feature map"""
        x_shape = x.size()
        offsets_ = super(ConvOffset2D, self).forward(x)

        # offsets: (b*c, h, w, 2)
        # self._to_bc_h_w_2 is the code I changed
        offsets = self._to_bc_h_w_2(offsets_, x_shape)

        # x: (b*c, h, w)
        x = self._to_bc_h_w(x, x_shape)

        # x_offset: (b*c, h, w)
        x_offset = th_batch_map_offsets(x, offsets, grid=self._get_grid(self, x))

        # x_offset: (b, c, h, w)
        x_offset = self._to_b_c_h_w(x_offset, x_shape)

        # also return the raw offsets for visualization
        return x_offset, offsets_

Suppose we now want to compute the offsets of a deformable convolution on a 5-channel 28×28 feature map.
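Before stepping through the forward pass line by line, here is a minimal shape-trace sketch of that example (a hedged illustration assuming the visualization variant of ConvOffset2D above, which returns both the deformed map and the raw offsets):

import torch

# hypothetical walkthrough: batch b=1, c=5 channels, 28x28 spatial size
b, c, h, w = 1, 5, 28, 28
x = torch.randn(b, c, h, w)

offset_layer = ConvOffset2D(c)        # learns 2*c = 10 offset channels
x_offset, offsets_ = offset_layer(x)  # visualization variant returns both

print(offsets_.shape)  # (1, 10, 28, 28): an x and a y offset per channel
print(x_offset.shape)  # (1, 5, 28, 28): the deformed feature map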

  1. offsets_ = super(ConvOffset2D, self).forward(x)

offsets_ is now a 10-channel 28×28 feature map.

  2. offsets = self._to_bc_h_w_2(offsets_, x_shape)

This call reshapes the offset feature map from (b, 2c, h, w) to (b×c, h, w, 2).

  3. x = self._to_bc_h_w(x, x_shape)

This reshapes the original feature map to (b×c, h, w).

  4. x_offset = th_batch_map_offsets(x, offsets, grid=self._get_grid(self, x))

This applies the offsets computed above to the feature map x.

  5. x_offset = self._to_b_c_h_w(x_offset, x_shape)

This restores the shifted feature map to the (b, c, h, w) layout.

As you can see, the key step is applying the offsets to x.

def th_batch_map_offsets(input, offsets, grid=None, order=1):
    """Batch map offsets into input

    Parameters
    ----------
    input : torch.Tensor. shape = (b, s, s)
    offsets : torch.Tensor. shape = (b, s, s, 2)

    Returns
    -------
    torch.Tensor. shape = (b, s*s)
    """
    batch_size = input.size(0)
    input_height = input.size(1)
    input_width = input.size(2)

    # (b, s, s, 2) -> (b, s*s, 2)
    offsets = offsets.view(batch_size, -1, 2)
    if grid is None:
        grid = th_generate_grid(batch_size, input_height, input_width, offsets.data.type(), offsets.data.is_cuda)

    # relative offsets + integer pixel grid = absolute sampling coordinates
    coords = offsets + grid

    mapped_vals = th_batch_map_coordinates(input, coords)
    return mapped_vals
  1. offsets = offsets.view(batch_size, -1, 2)

This reshapes the offsets from (b×c, h, w, 2) to (b×c, h×w, 2); note that inside this function the "batch" dimension is really b×c.

  2. coords = offsets + grid

The grid is essentially the integer xy coordinates of each pixel. The offsets are relative displacements, so offsets + grid gives the absolute coordinates after the shift, from which the corresponding elements can be looked up directly in the feature map. Pixel coordinates must be integers, but the offsets are fractional, so the value at a fractional coordinate is obtained by bilinear interpolation from its neighboring pixels.

  3. mapped_vals = th_batch_map_coordinates(input, coords)

This function does the actual work of applying the offsets to the original feature map; its full implementation appears in the complete code below.
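For intuition about what th_batch_map_coordinates does at a single fractional coordinate, here is a small self-contained sketch (plain PyTorch, not the article's exact helpers):

import torch

# a 3x3 single-channel feature map
fmap = torch.tensor([[1., 2., 3.],
                     [4., 5., 6.],
                     [7., 8., 9.]])

# fractional sampling coordinate (row=0.5, col=1.25), e.g. grid + offset
r, c = 0.5, 1.25
r0, c0 = int(r), int(c)   # top-left neighbor (floor)
r1, c1 = r0 + 1, c0 + 1   # bottom-right neighbor (ceil)
dr, dc = r - r0, c - c0   # fractional parts

# interpolate along columns, then along rows
top    = (1 - dc) * fmap[r0, c0] + dc * fmap[r0, c1]
bottom = (1 - dc) * fmap[r1, c0] + dc * fmap[r1, c1]
val    = (1 - dr) * top + dr * bottom
print(val)  # tensor(3.7500)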

That is essentially the whole logic.

2 Result Display

Let's compare the results with and without deformable convolution. First, without it:

MNIST digit recognition is a kindergarten-level task by now, so the accuracy is basically very high:

Now the results with deformable convolution:

The final loss is no lower than without deformable convolution. I am not sure why; perhaps the task is simply too easy. It occurred to me that deformable convolution is perhaps meant to make the network more sensitive to the target's texture and shape, which does not help much for MNIST classification.

Finally, here is the visualization of the deformable convolution that I managed to produce after much effort:

It can be seen that the deformable convolution responds more strongly to the digit: the sampling points shift more over the digit region. However, in my tests the magnitude of this offset is unstable; one training run may produce large offsets and the next small ones, which seems to make the network harder to train. That's about it. (I'm not sure whether this is an issue with my code.)

3 Complete Code

import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable


class ConvOffset2D(nn.Conv2d):
    """ConvOffset2D

    Convolutional layer responsible for learning the 2D offsets and outputting
    the deformed feature map using bilinear interpolation.

    Note that this layer does not perform convolution on the deformed feature
    map. See get_deform_cnn in cnn.py for usage.
    """
    def __init__(self, filters, init_normal_stddev=0.01, **kwargs):
        """Init

        Parameters
        ----------
        filters : int
            Number of channels of the input feature map
        init_normal_stddev : float
            Stddev of the normal kernel initialization
        **kwargs:
            Passed to the superclass. See the Conv2d layer in pytorch
        """
        self.filters = filters
        self._grid_param = None
        # out_channels = 2 offset channels (x and y) per input channel, kernel size 3
        super(ConvOffset2D, self).__init__(self.filters, self.filters*2, 3, padding=1, bias=False, **kwargs)
        self.weight.data.copy_(self._init_weights(self.weight, init_normal_stddev))

    def forward(self, x):
        """Return the deformed feature map"""
        x_shape = x.size()
        offsets = super(ConvOffset2D, self).forward(x)

        # offsets: (b*c, h, w, 2)
        offsets = self._to_bc_h_w_2(offsets, x_shape)

        # x: (b*c, h, w)
        x = self._to_bc_h_w(x, x_shape)

        # x_offset: (b*c, h, w)
        x_offset = th_batch_map_offsets(x, offsets, grid=self._get_grid(self, x))

        # x_offset: (b, c, h, w)
        x_offset = self._to_b_c_h_w(x_offset, x_shape)

        return x_offset

    @staticmethod
    def _get_grid(self, x):
        # note: declared static but called as self._get_grid(self, x), so the
        # instance is passed explicitly; x here has shape (b*c, h, w)
        batch_size, input_height, input_width = x.size(0), x.size(1), x.size(2)
        dtype, cuda = x.data.type(), x.data.is_cuda
        # cache the grid and rebuild only when the input signature changes
        if self._grid_param == (batch_size, input_height, input_width, dtype, cuda):
            return self._grid
        self._grid_param = (batch_size, input_height, input_width, dtype, cuda)
        self._grid = th_generate_grid(batch_size, input_height, input_width, dtype, cuda)
        return self._grid

    @staticmethod
    def _init_weights(weights, std):
        fan_out = weights.size(0)
        fan_in = weights.size(1) * weights.size(2) * weights.size(3)
        w = np.random.normal(0.0, std, (fan_out, fan_in))
        return torch.from_numpy(w.reshape(weights.size()))

    @staticmethod
    def _to_bc_h_w_2(x, x_shape):
        """(b, 2c, h, w) -> (b*c, h, w, 2)"""
        x = x.contiguous().view(-1, int(x_shape[2]), int(x_shape[3]), 2)
        return x

    @staticmethod
    def _to_bc_h_w(x, x_shape):
        """(b, c, h, w) -> (b*c, h, w)"""
        x = x.contiguous().view(-1, int(x_shape[2]), int(x_shape[3]))
        return x

    @staticmethod
    def _to_b_c_h_w(x, x_shape):
        """(b*c, h, w) -> (b, c, h, w)"""
        x = x.contiguous().view(-1, int(x_shape[1]), int(x_shape[2]), int(x_shape[3]))
        return x
    
def th_generate_grid(batch_size, input_height, input_width, dtype, cuda):
    # integer (row, col) coordinates for every pixel, flattened to (h*w, 2)
    grid = np.meshgrid(
        range(input_height), range(input_width), indexing='ij'
    )
    grid = np.stack(grid, axis=-1)
    grid = grid.reshape(-1, 2)

    # repeat the same grid for every item in the batch
    grid = np_repeat_2d(grid, batch_size)
    grid = torch.from_numpy(grid).type(dtype)
    if cuda:
        grid = grid.cuda()
    return Variable(grid, requires_grad=False)


def th_batch_map_offsets(input, offsets, grid=None, order=1):
    """Batch map offsets into input

    Parameters
    ----------
    input : torch.Tensor. shape = (b, s, s)
    offsets : torch.Tensor. shape = (b, s, s, 2)

    Returns
    -------
    torch.Tensor. shape = (b, s*s)
    """
    batch_size = input.size(0)
    input_height = input.size(1)
    input_width = input.size(2)

    # (b, s, s, 2) -> (b, s*s, 2)
    offsets = offsets.view(batch_size, -1, 2)
    if grid is None:
        grid = th_generate_grid(batch_size, input_height, input_width, offsets.data.type(), offsets.data.is_cuda)

    # relative offsets + integer pixel grid = absolute sampling coordinates
    coords = offsets + grid

    mapped_vals = th_batch_map_coordinates(input, coords)
    return mapped_vals

def np_repeat_2d(a, repeats):
    """Tensorflow version of np.repeat for 2D"""
    assert len(a.shape) == 2
    a = np.expand_dims(a, 0)
    a = np.tile(a, [repeats, 1, 1])
    return a

def th_batch_map_coordinates(input, coords, order=1):
    """Batch version of th_map_coordinates

    Only supports 2D feature maps

    Parameters
    ----------
    input : torch.Tensor. shape = (b, s, s)
    coords : torch.Tensor. shape = (b, n_points, 2)

    Returns
    -------
    torch.Tensor. shape = (b, n_points)
    """
    batch_size = input.size(0)
    input_height = input.size(1)
    input_width = input.size(2)

    n_coords = coords.size(1)

    # clamp the sampling coordinates to the valid image range
    coords = torch.cat((torch.clamp(coords.narrow(2, 0, 1), 0, input_height - 1),
                        torch.clamp(coords.narrow(2, 1, 1), 0, input_width - 1)), 2)

    assert (coords.size(1) == n_coords)

    # the four integer neighbors of each fractional coordinate
    coords_lt = coords.floor().long()
    coords_rb = coords.ceil().long()
    coords_lb = torch.stack([coords_lt[..., 0], coords_rb[..., 1]], 2)
    coords_rt = torch.stack([coords_rb[..., 0], coords_lt[..., 1]], 2)
    idx = th_repeat(torch.arange(0, batch_size), n_coords).long()
    idx = Variable(idx, requires_grad=False)
    if input.is_cuda:
        idx = idx.cuda()

    def _get_vals_by_coords(input, coords):
        # gather the pixel value at each (batch, row, col) triple
        indices = torch.stack([
            idx, th_flatten(coords[..., 0]), th_flatten(coords[..., 1])
        ], 1)
        inds = indices[:, 0] * input.size(1) * input.size(2) + indices[:, 1] * input.size(2) + indices[:, 2]
        vals = th_flatten(input).index_select(0, inds)
        vals = vals.view(batch_size, n_coords)
        return vals

    vals_lt = _get_vals_by_coords(input, coords_lt.detach())
    vals_rb = _get_vals_by_coords(input, coords_rb.detach())
    vals_lb = _get_vals_by_coords(input, coords_lb.detach())
    vals_rt = _get_vals_by_coords(input, coords_rt.detach())

    # bilinear interpolation from the four neighbors
    coords_offset_lt = coords - coords_lt.type(coords.data.type())
    vals_t = coords_offset_lt[..., 0] * (vals_rt - vals_lt) + vals_lt
    vals_b = coords_offset_lt[..., 0] * (vals_rb - vals_lb) + vals_lb
    mapped_vals = coords_offset_lt[..., 1] * (vals_b - vals_t) + vals_t
    return mapped_vals

def th_repeat(a, repeats, axis=0):
    """Torch version of np.repeat for 1D"""
    assert len(a.size()) == 1
    return th_flatten(torch.transpose(a.repeat(repeats, 1), 0, 1))


def th_flatten(a):
    """Flatten tensor"""
    return a.contiguous().view(a.nelement())
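To close, a hedged smoke-test sketch for the complete code above (the shapes follow the 5-channel 28×28 example from earlier; this variant of ConvOffset2D returns only the deformed map):

x = torch.randn(2, 5, 28, 28)    # (b, c, h, w)
offset_layer = ConvOffset2D(5)

x_deformed = offset_layer(x)
print(x_deformed.shape)          # torch.Size([2, 5, 28, 28])

# a regular convolution can then run on the deformed feature map
conv = nn.Conv2d(5, 16, 3, padding=1)
out = conv(x_deformed)
print(out.shape)                 # torch.Size([2, 16, 28, 28])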