Feature map visualization and heatmap visualization are two visualization methods commonly seen in papers. The previous article, "Code for Visualizing Feature Maps", covered feature map visualization; this article explains how to visualize heatmaps.

This article introduces the principles and shortcomings of CAM and GradCAM, shows how to use the GradCAM algorithm to produce heatmap visualizations, and points to tutorials for other types of tasks such as object detection, semantic segmentation, and Transformer models.

This article is from the technical summary series of the public account CV Technical Guide.

Welcome to follow the public account CV Technical Guide, which focuses on summaries of computer vision techniques, tracking of the latest technologies, interpretation of classic papers, and CV job information.

The principle of heatmap visualization

In a neural network model, an image passes through the network to produce a classification output, but we do not know what the model bases its prediction on. In other words, we want to know how much each region of the image affects the model's prediction. That is what a heatmap does: it produces an isotherm-like picture showing how important different areas of the image are to the model.

Heatmap visualization methods have evolved from CAM to GradCAM and then GradCAM++; the most commonly used is the GradCAM algorithm.

CAM

Learning Deep Features for Discriminative Localization

The principle of CAM is to take the weights in the fully connected layer that produce the probability of class C, denoted W. These weights are then used to compute a weighted sum of the feature maps before the GAP layer. Since the feature maps are smaller than the original image at this point, the weighted sum is also upsampled to obtain the Class Activation Map.
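To make the idea concrete, here is a minimal sketch of the CAM computation, assuming a classifier whose head is GAP + FC. The function and variable names are illustrative only, not part of the original code:

import torch.nn.functional as F

def compute_cam(feature_maps, fc_weights, class_idx, output_size):
    # feature_maps: (C, h, w) activations before the GAP layer
    # fc_weights:   (num_classes, C) weight matrix of the final FC layer
    w = fc_weights[class_idx]                                 # (C,)
    cam = (w[:, None, None] * feature_maps).sum(dim=0)        # (h, w)
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-7)                            # scale to [0, 1]
    # the feature maps are smaller than the input image, so upsample
    cam = F.interpolate(cam[None, None], size=output_size,
                        mode='bilinear', align_corners=False)[0, 0]
    return cam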

CAM has one fatal defect: it requires the structure CNN + GAP + FC + Softmax. That is, if you want to visualize an existing model that has no GAP layer, you must modify the original model structure and retrain it, which is quite troublesome. Moreover, if the model is large, retraining after the modification may not reach the original accuracy, and the visualization becomes meaningless.

Therefore, Grad-CAM was developed as an improvement to address this defect.

GradCAM

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

The biggest advantage of Grad-CAM is that it needs neither a modification of the existing model structure nor retraining; it can be applied directly to the original model.

Principle: Grad-CAM also works on the feature maps of the last layer of the CNN feature extractor. For the category C to be visualized, Grad-CAM back-propagates the output probability of category C to the feature maps of the last layer, obtaining the gradient of category C with respect to each pixel of those feature maps. The gradients of each channel are then globally average pooled, which yields the weighting coefficients alpha of the feature maps. The paper notes that the coefficients obtained this way are almost equivalent to those computed in CAM. The feature maps are then summed with these weights, rectified with ReLU, and finally upsampled.

The reason for using ReLU is that negative values can be considered irrelevant to the recognition of category C; they may instead relate to other categories, while positive values have a positive influence on recognizing C.

The specific formulas are as follows. For class c, let A^k be the k-th feature map and Z the number of pixels in a feature map. The channel weights are

\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}

and the class activation map is

L^c_{Grad-CAM} = \mathrm{ReLU}\left( \sum_k \alpha_k^c A^k \right)
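Mapped to code, these two formulas are only a couple of lines. The arrays grads and activations below are hypothetical stand-ins for the gradients and feature maps of one image; the real versions are captured by the tutorial code later in this article:

import numpy as np

# hypothetical gradients and feature maps for one image, shape (C, h, w)
grads = np.random.randn(512, 12, 12).astype(np.float32)
activations = np.random.randn(512, 12, 12).astype(np.float32)

# alpha: globally average-pool the gradients over the spatial dimensions
alpha = grads.mean(axis=(1, 2))                                          # (C,)

# weighted sum of the feature maps over channels, then ReLU
cam = np.maximum((alpha[:, None, None] * activations).sum(axis=0), 0)    # (h, w)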

Grad-CAM was later followed by an improved version, Grad-CAM++, whose main improvement is more accurate localization and better handling of the homogeneous multi-object case, i.e. multiple targets of the same category appearing in one image, such as seven or eight people. The improvement is a new way of computing the weighting coefficients; it is rather complicated and is not introduced here.

GradCAM tutorial

This code comes from the pytorch-grad-cam repository. The original repository contains many other CAM variants; here we take GradCAM as the example.

Source link: github.com/jacobgil/py…

Link to this tutorial code: github.com/CV-Tech-Gui…

Usage

It is easy to use; you only need to understand the main function.

if __name__ == "__main__":
    imgs_path = "path/to/image.png"
    model = models.mobilenet_v3_large(pretrained=True)
    model.load_state_dict(torch.load('model.pth'))
    model = model.cuda().eval()

    # target_layers: the layer(s) of the model to visualize
    target_layers = [model.features[-1]]

    img, data = image_proprecess(imgs_path)
    data = data.cuda()

    cam = GradCAM(model=model, target_layers=target_layers)
    # If target_category is None, the class with the highest predicted
    # probability is used as the visualization class.
    target_category = None

    grayscale_cam = cam(input_tensor=data, target_category=target_category)
    grayscale_cam = grayscale_cam[0, :]
    visualization = show_cam_on_image(np.array(img) / 255., grayscale_cam)

    plt.imshow(visualization)
    plt.xticks()
    plt.yticks()
    plt.axis('off')
    plt.savefig("path/to/gradcam_image.jpg")

As the code above shows, you only need to set the input image, the model, the layer to visualize, and the class to visualize; the rest can be reused as-is. A few examples of picking the visualization layer are sketched below.
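For reference, here is a sketch of how target_layers is typically chosen for a few common torchvision backbones. The attribute names follow torchvision's model definitions; print the model to confirm the layer names before relying on them:

from torchvision import models

resnet = models.resnet50(pretrained=True)
target_layers = [resnet.layer4[-1]]        # last bottleneck block

vgg = models.vgg16(pretrained=True)
target_layers = [vgg.features[-1]]         # last layer of the conv features

mobilenet = models.mobilenet_v3_large(pretrained=True)
target_layers = [mobilenet.features[-1]]   # last inverted-residual block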

Here are the details.

Data preprocessing

As with feature map visualization, the image is read, resized, converted to a Tensor, and normalized; since there is only one image, it is also expanded to four dimensions.

def image_proprecess(img_path):
    img = Image.open(img_path)
    data_transforms = transforms.Compose([
        transforms.Resize((384, 384), interpolation=3),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    data = data_transforms(img)
    data = torch.unsqueeze(data, 0)
    # resize the original image to the network input size so that
    # it matches the CAM mask when the two are overlaid later
    img_resize = img.resize((384, 384))
    return img_resize, data

GradCAM

The GradCAM class is packaged according to the principle described in the first section, so once you understand the principle, the code of this class is easy to follow.

class GradCAM:
    def __init__(self, model, target_layers, use_cuda=False,
                 reshape_transform=None):
        self.model = model.eval()
        self.target_layers = target_layers
        self.reshape_transform = reshape_transform
        self.cuda = use_cuda
        self.activations_and_grads = ActivationsAndGradients(
            self.model, target_layers, reshape_transform)

    """ Get a vector of weights for every channel in the target layer.
        Methods that return weights channels,
        will typically need to only implement this function. """
    @staticmethod
    def get_cam_weights(grads):
        return np.mean(grads, axis=(2, 3), keepdims=True)

    @staticmethod
    def get_loss(output, target_category):
        loss = 0
        for i in range(len(target_category)):
            loss = loss + output[i, target_category[i]]
        return loss

    def get_cam_image(self, activations, grads):
        weights = self.get_cam_weights(grads)
        weighted_activations = weights * activations
        cam = weighted_activations.sum(axis=1)
        return cam

    @staticmethod
    def get_target_width_height(input_tensor):
        width, height = input_tensor.size(-1), input_tensor.size(-2)
        return width, height

    def compute_cam_per_layer(self, input_tensor):
        activations_list = [a.cpu().data.numpy()
                            for a in self.activations_and_grads.activations]
        grads_list = [g.cpu().data.numpy()
                      for g in self.activations_and_grads.gradients]
        target_size = self.get_target_width_height(input_tensor)

        cam_per_target_layer = []
        # Loop over the saliency image from every layer
        for layer_activations, layer_grads in zip(activations_list, grads_list):
            cam = self.get_cam_image(layer_activations, layer_grads)
            # works like mute the min-max scale in the function of scale_cam_image
            cam[cam < 0] = 0
            scaled = self.scale_cam_image(cam, target_size)
            cam_per_target_layer.append(scaled[:, None, :])
        return cam_per_target_layer

    def aggregate_multi_layers(self, cam_per_target_layer):
        cam_per_target_layer = np.concatenate(cam_per_target_layer, axis=1)
        cam_per_target_layer = np.maximum(cam_per_target_layer, 0)
        result = np.mean(cam_per_target_layer, axis=1)
        return self.scale_cam_image(result)

    @staticmethod
    def scale_cam_image(cam, target_size=None):
        result = []
        for img in cam:
            img = img - np.min(img)
            img = img / (1e-7 + np.max(img))
            if target_size is not None:
                img = cv2.resize(img, target_size)
            result.append(img)
        result = np.float32(result)
        return result

    def __call__(self, input_tensor, target_category=None):
        output = self.activations_and_grads(input_tensor)
        if isinstance(target_category, int):
            target_category = [target_category] * input_tensor.size(0)
        if target_category is None:
            target_category = np.argmax(output.cpu().data.numpy(), axis=-1)
            print(f"category id: {target_category}")
        else:
            assert (len(target_category) == input_tensor.size(0))

        self.model.zero_grad()
        loss = self.get_loss(output, target_category)
        loss.backward(retain_graph=True)

        # In most of the saliency attribution papers, the saliency is
        # computed with a single target layer.
        # Commonly it is the last convolutional layer.
        # Here we support passing a list with multiple target layers.
        # It will compute the saliency image for every image,
        # and then aggregate them (with a default mean aggregation).
        # This gives you more flexibility in case you just want to
        # use all conv layers for example, all Batchnorm layers,
        # or something else.
        cam_per_layer = self.compute_cam_per_layer(input_tensor)
        return self.aggregate_multi_layers(cam_per_layer)

    def __del__(self):
        self.activations_and_grads.release()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        self.activations_and_grads.release()
        if isinstance(exc_value, IndexError):
            # Handle IndexError here...
            print(f"An exception occurred in CAM with block: "
                  f"{exc_type}. Message: {exc_value}")
            return True

To briefly explain what the whole class does: first, the activations and gradients during the model's forward and backward passes are captured through ActivationsAndGradients (shown below). The loss of the class to be visualized is computed (all other classes are ignored), and back-propagating this loss yields the gradient map of that class. Global average pooling of the gradients gives the weighting coefficient of each feature map channel; the feature maps are multiplied by their channel weights and summed over channels to obtain a single-channel map, and negative values are zeroed out (the ReLU step). Note: this map is not yet the heatmap; it must be overlaid on the original image to obtain the final heatmap.

GradCAM is a class that is first constructed and then called. Construction defines the network and the layers to visualize; the call requires the input image and the class to visualize.

The call returns a regional importance map: a grayscale map in [0, 1] with the same spatial size as the input.

cam = GradCAM(model=model, target_layers=target_layers)
target_category = None
grayscale_cam = cam(input_tensor=data, target_category=target_category)

Capturing the gradients during inference is done mainly through the following class, which registers forward hooks to save the activations and backward hooks to save the gradients of the target layers. It is not described in detail here.

class ActivationsAndGradients:
  """ Class for extracting activations and
  registering gradients from targeted intermediate layers """

  def __init__(self, model, target_layers, reshape_transform):
      self.model = model
      self.gradients = []
      self.activations = []
      self.reshape_transform = reshape_transform
      self.handles = []
      for target_layer in target_layers:
          self.handles.append(
              target_layer.register_forward_hook(
                  self.save_activation))
          # Backward compatibility with older pytorch versions:
          if hasattr(target_layer, 'register_full_backward_hook'):
              self.handles.append(
                  target_layer.register_full_backward_hook(
                      self.save_gradient))
          else:
              self.handles.append(
                  target_layer.register_backward_hook(
                      self.save_gradient))

  def save_activation(self, module, input, output):
      activation = output
      if self.reshape_transform is not None:
          activation = self.reshape_transform(activation)
      self.activations.append(activation.cpu().detach())

  def save_gradient(self, module, grad_input, grad_output):
      # Gradients are computed in reverse order
      grad = grad_output[0]
      if self.reshape_transform is not None:
          grad = self.reshape_transform(grad)
      self.gradients = [grad.cpu().detach()] + self.gradients

  def __call__(self, x):
      self.gradients = []
      self.activations = []
      return self.model(x)

  def release(self):
      for handle in self.handles:
          handle.remove()

The next step is to display the importance map output by GradCAM on top of the original image, using the following function.

def show_cam_on_image(img: np.ndarray,
                      mask: np.ndarray,
                      use_rgb: bool = False,
                      colormap: int = cv2.COLORMAP_JET) -> np.ndarray:
    """ This function overlays the cam mask on the image as a heatmap.
    By default the heatmap is in BGR format.

    :param img: The base image in RGB or BGR format.
    :param mask: The cam mask.
    :param use_rgb: Whether to use an RGB or BGR heatmap,
        this should be set to True if 'img' is in RGB format.
    :param colormap: The OpenCV colormap to be used.
    :returns: The default image with the cam overlay.
    """
    heatmap = cv2.applyColorMap(np.uint8(255 * mask), colormap)
    if use_rgb:
        heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)
    heatmap = np.float32(heatmap) / 255

    if np.max(img) > 1:
        raise Exception(
            "The input image should be np.float32 in the range [0, 1]")

    cam = heatmap + img
    cam = cam / np.max(cam)
    return np.uint8(255 * cam)
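For example, here is a hedged usage sketch that combines the pieces above; the paths are placeholders, img_resize is the PIL image returned by image_proprecess, and cam is the GradCAM object constructed earlier:

img_resize, data = image_proprecess("path/to/image.png")
grayscale_cam = cam(input_tensor=data.cuda(), target_category=None)[0, :]

# img_resize is an RGB PIL image, so request an RGB heatmap
overlay = show_cam_on_image(np.float32(img_resize) / 255., grayscale_cam, use_rgb=True)

# OpenCV writes BGR, so convert before saving
cv2.imwrite("gradcam_overlay.jpg", cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR))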

The previous sections covered heatmap visualization only for classification. What about other tasks such as object detection and semantic segmentation?

Heatmap visualization for other types of tasks

The author of the pytorch-grad-cam code also shows how to visualize object detection, semantic segmentation, and Transformer models. Since the author provides usage tutorials, they are not repeated here; the tutorials are listed below, followed by a short sketch of the reshape_transform argument used for Transformer models.

  • Notebook tutorial: Class Activation Maps for Object Detection with Faster-RCNN

  • Notebook tutorial: Class Activation Maps for Semantic Segmentation

  • How it works with Vision/SwinT transformers
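As a hint of how the Transformer case works: the reshape_transform argument seen in the GradCAM and ActivationsAndGradients classes above exists because a Transformer block outputs a sequence of tokens rather than a 2D feature map, so the tokens must be reshaped into a spatial grid before the CAM computation. Below is a minimal sketch for a ViT-style model; the timm model, the 14x14 token grid, the leading class token, and the choice of blocks[-1].norm1 as target layer are all assumptions that depend on the specific model:

import timm  # assumed here only to obtain a ViT; any ViT-style model works

vit_model = timm.create_model('vit_base_patch16_224', pretrained=True).eval()

def vit_reshape_transform(tensor, height=14, width=14):
    # tensor: (batch, num_tokens, channels); drop the leading class token
    result = tensor[:, 1:, :].reshape(tensor.size(0), height, width, tensor.size(2))
    # move channels to the second dimension, like a CNN feature map (B, C, H, W)
    return result.permute(0, 3, 1, 2)

cam = GradCAM(model=vit_model,
              target_layers=[vit_model.blocks[-1].norm1],
              reshape_transform=vit_reshape_transform)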

Address: github.com/jacobgil/py…


CV Technical Guide maintains a good exchange group where almost every question, apart from the most obscure ones, gets answered. Follow the public account and add the editor's WeChat to be invited into the group.

Other articles

Code for Visualizing Feature Maps

Building a Pytorch Model from Scratch

A Summary of Research on Industrial Image Anomaly Detection (2019-2020)

Which Tricks Make Models Train Faster? A Summary of Possible Causes of Slow Model Training

A Review of Few-Shot Learning (Institute of Computing Technology, Chinese Academy of Sciences)

A Summary of Positive/Negative Sample Assignment and Balancing Strategies in Object Detection

Model Quantization Techniques and Application Practice on Low-Power IoT Devices

NeurIPS 2021 | Explaining the Transformer Structure from the Perspective of Evolutionary Algorithms, and Proposing a Unified Sequence Model Paradigm for Multimodal Tasks

CVPR 2022 | 76 FPS on a Single GPU: a Multimodal Transformer for Video Segmentation That Cleanly Segments Even Overlapping Objects

A Summary of Bounding-Box Optimization in Object Detection

A Summary of Anchor-Free Methods in Object Detection, Instance Segmentation, and Multi-Object Tracking

Soft Sampling: Exploring More Effective Sampling Strategies

How to Solve the Small-Sample Problem in Industrial Defect Detection

A Summary of Machine Learning and Deep Learning Interview Topics

The Future of Deep Learning Image Recognition: Opportunities and Challenges Coexist

Personal Habits and Thoughts on Quickly Learning a New Technology or Field