ResNet's default input is a 3-channel image, and the pretrained models are 3-channel as well. How do you modify it to take a single-channel grayscale image as input? It is straightforward: change the input channel count of the first convolution. Looking at ResNet's initialization, the first convolution takes 3 channels; change it to a single channel.

[1] Modify the ResNet definition

# self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)  # original: 3 input channels
self.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)  # grayscale: 1 input channel
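If you are patching a stock torchvision model rather than editing its source file, the same change can be applied from outside the class. A minimal sketch (assumes torchvision's resnet18; the attribute name conv1 matches torchvision's ResNet):

import torch.nn as nn
from torchvision.models import resnet18

model = resnet18()  # randomly initialized; see step [2] for pretrained weights
# Swap the 3-channel stem for a single-channel one; every other layer is unchanged.
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)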

[2] Modify the loading of the pretrained model so that the first convolution layer's weights are not imported. Reference: github.com/ShuaiLYU/re…

if pretrained:
    state_dict = model_zoo.load_url(model_urls['resnet18'])
    # Drop the first conv's weights: their shape (64, 3, 7, 7) no longer
    # matches the single-channel conv1 defined above.
    for key in list(state_dict.keys()):
        if "conv1" in key:
            del state_dict[key]
    # strict=False lets load_state_dict skip the now-missing conv1 entry.
    model.load_state_dict(state_dict, strict=False)
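A variant worth noting (my addition, not from the referenced repo): instead of discarding the pretrained first-layer weights, you can collapse the RGB filters into one channel by averaging them, which keeps some of the pretrained signal:

import torch

def adapt_conv1_to_grayscale(state_dict):
    # conv1.weight is (64, 3, 7, 7) in the pretrained checkpoint; averaging
    # over the channel dimension gives (64, 1, 7, 7), matching the new conv1.
    w = state_dict["conv1.weight"]
    state_dict["conv1.weight"] = w.mean(dim=1, keepdim=True)
    return state_dict

# model.load_state_dict(adapt_conv1_to_grayscale(state_dict))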

[3] The preprocessing and normalization code was written for color images: it reads HWC data with 3 channels, normalizes each channel with its own mean, and feeds a BCHW tensor into the model. A grayscale image has only HW dimensions, so the channel dimension has to be added explicitly and the normalization constants changed to a single mean.

def load_image(self, image_path):
    # self.RGB_MEAN = np.array([122.67891434, 116.66876762, 104.00698793])
    self.RGB_MEAN = 122.67891434  # single scalar mean instead of per-channel means
    # img = cv2.imread(image_path, cv2.IMREAD_COLOR).astype('float32')
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE).astype('float32')  # HW, no channel dim
    height, width = img.shape[:2]
    # Pad the shorter side with white so the image becomes square.
    if height > width:
        img = cv2.copyMakeBorder(img, 0, 0, 0, height - width, cv2.BORDER_CONSTANT, value=255)
    else:
        img = cv2.copyMakeBorder(img, 0, width - height, 0, 0, cv2.BORDER_CONSTANT, value=255)
    original_shape = img.shape[:2]
    img = self.resize_image_cn(img)
    img -= self.RGB_MEAN
    img /= 255.
    img = np.expand_dims(img, axis=-1)  # HW -> HWC with C=1
    img = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0)  # HWC -> BCHW
    return img, original_shape


def resize_image_cn(self, img, img_side=1280):
    # Grayscale input has only two shape values, so avoid the
    # 3-value unpacking used for HWC color images.
    height, width = img.shape[:2]
    if height > img_side:
        new_height = img_side
        new_width = img_side
    else:
        # Round both sides up to multiples of 32, preserving aspect ratio.
        new_height = int(math.ceil(height / 32) * 32)
        new_width = int(math.ceil(new_height / height * width / 32) * 32)
    resized_img = cv2.resize(img, (new_width, new_height))
    return resized_img
    
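A quick sanity check on the shape handling (a standalone sketch with a dummy array, not the author's code):

import numpy as np
import torch

img = np.zeros((640, 480), dtype='float32')  # HW grayscale, as cv2.IMREAD_GRAYSCALE returns
img = np.expand_dims(img, axis=-1)           # HW -> HWC with C=1
t = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0)  # HWC -> BCHW
print(t.shape)  # torch.Size([1, 1, 640, 480]) -- ready for the 1-channel conv1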

For comparison, the original 3-channel version:

def load_image(self, image_path):
    img = cv2.imread(image_path, cv2.IMREAD_COLOR).astype('float32')
    height, width, _ = img.shape
    if height > width:
        img = cv2.copyMakeBorder(img, 0, 0, 0, height - width, cv2.BORDER_CONSTANT, value=(255, 255, 255))
    else:
        img = cv2.copyMakeBorder(img, 0, width - height, 0, 0, cv2.BORDER_CONSTANT, value=(255, 255, 255))
    original_shape = img.shape[:2]
    img = self.resize_image_cn(img)
    img -= self.RGB_MEAN
    img /= 255.
    img = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0)
    return img, original_shape

Thinking: The motivation for switching to grayscale input was to shrink the model, but since only the first convolution changes, the model file size barely decreases. Meanwhile, the change forces modifications to several functions (data loading, image preprocessing, pretrained-model loading, and the model definition), and the result no longer fits color scenarios if they come up again, which is inconvenient and not very practical. So I do not recommend this approach; keep the default 3-channel image input. To actually shrink the model and accelerate inference, use mature techniques such as quantization and pruning instead.
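As an illustration of the latter, here is a minimal pruning sketch (my addition, assuming torch.nn.utils.prune, available in PyTorch 1.4+); quantization follows a similar pattern via torch.quantization:

import torch.nn as nn
import torch.nn.utils.prune as prune

# Zero out the 30% smallest-magnitude weights in every conv layer.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weight tensor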