This series of articles takes apart the Faster R-CNN network structure by following little M on a wonderful journey through the Faster R-CNN maze.

The code for this first article is in Pytorch-tutorial/framework.py.

There is a primitive tribe called the Land of Pictures, home to all kinds of pictures. Each picture is born with different pockets holding different items, and every picture's destiny is to figure out which pockets hold items and what those items are.

Many clever architects lived in this tribe and built maze after maze. The mazes ranged from simple to intricate, and getting through one could take anywhere from a few days to a lifetime. Any picture that made it through a maze fulfilled its life's mission.

One architect, RHRS, spent several years creating a maze called Faster R-CNN, which became one of the most famous mazes of its time.

One day, a picture named little M came to the maze, knocked on the door, and saw these words carved on the wall at the entrance:

To get through this maze, you must pass five levels, some hard and some easy. Keep thinking, and be patient.

Level 1: Transform. Preprocesses the image so that it meets the basic input requirements of Faster R-CNN.

Level 2: Backbone. Extracts the feature map of the image.

Level 3: RPN. Proposes the regions of the image that may contain a detection target.

Level 4: ROI Pooling & Flatten. Turns every proposed region into a fixed-size feature that the following layers can process.

Level 5: Classification and regression. Classifies each proposed region to find out what it is, and regresses its box to find out exactly where it is.

Finally, after post-processing, the predicted categories and positions can be drawn on the original image.
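The inscription only names the levels, and Transform itself is the subject of the next article. To show why a post-processing step is needed at all, the sketch below assumes the behaviour of torchvision's GeneralizedRCNNTransform, which may differ from the author's Transform: each image is resized so its shorter side becomes min_size=800 unless the longer side would then exceed max_size=1333, and the batch is padded so both sides are multiples of 32. The predicted boxes therefore live in resized-image coordinates, and post-processing multiplies x by original width / resized width and y by original height / resized height. All function names here are hypothetical, and pixel normalization is left out.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def resize_scale(height: int, width: int,
                 min_size: int = 800, max_size: int = 1333) -> float:
    """Level 1 (assumed): scale so the shorter side becomes min_size,
    unless that would push the longer side past max_size."""
    return min(min_size / min(height, width), max_size / max(height, width))


def resized_shape(height: int, width: int,
                  min_size: int = 800, max_size: int = 1333) -> Tuple[int, int]:
    """Level 1 (assumed): (H, W) after resizing, rounded down to whole pixels."""
    scale = resize_scale(height, width, min_size, max_size)
    return int(height * scale), int(width * scale)


def padded_shape(height: int, width: int, size_divisible: int = 32) -> Tuple[int, int]:
    """Level 1 (assumed): round each side up to a multiple of size_divisible for batching."""
    return (math.ceil(height / size_divisible) * size_divisible,
            math.ceil(width / size_divisible) * size_divisible)


def rescale_boxes(boxes: List[Box], resized_hw: Tuple[int, int],
                  original_hw: Tuple[int, int]) -> List[Box]:
    """Post-process: map boxes from the resized image back to the original image."""
    ratio_h = original_hw[0] / resized_hw[0]
    ratio_w = original_hw[1] / resized_hw[1]
    return [(x1 * ratio_w, y1 * ratio_h, x2 * ratio_w, y2 * ratio_h)
            for x1, y1, x2, y2 in boxes]


# A 400 x 500 image is scaled by 2.0 to 800 x 1000, then padded to 800 x 1024.
resized = resized_shape(400, 500)
print(resized, padded_shape(*resized))
# A box predicted on the resized image, mapped back to the original image.
print(rescale_boxes([(100.0, 200.0, 300.0, 400.0)], resized, (400, 500)))
```

The boxes are rescaled against the resized size rather than the padded one; this works under the assumption (true for torchvision) that padding is added only on the bottom and right edges, so it never shifts the image content.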

Below the text was a drawing of the whole maze's design, and a caption beneath the drawing said that the detailed design of each level would be provided along the way.

While little M was pondering this, a scroll fell down. When he opened it, he found the basic structure of the entire maze written on it:

```python
from typing import List, Tuple

import torch.nn as nn
from torch import Tensor


class FasterRCNNBase(nn.Module):
    """
    transform: level 1, preprocesses the input images
    backbone:  level 2, extracts the feature maps
    rpn:       level 3, proposes regions that may contain targets
    roi_heads: levels 4 and 5 combined, turns the level-3 proposals
               into the final detections (classes and boxes)
    """

    def __init__(self, transform, backbone, rpn, roi_heads):
        super(FasterRCNNBase, self).__init__()
        self.transform = transform
        self.backbone = backbone
        self.rpn = rpn
        self.roi_heads = roi_heads

    def forward(self, images: List[Tensor], targets=None):
        # Record each image's original (H, W); images are C x H x W tensors
        raw_image_shape: List[Tuple[int, int]] = []
        for image in images:
            raw_image_shape.append((image.shape[1], image.shape[2]))

        # Level 1: preprocess the images (and their targets)
        images, targets = self.transform(images, targets)

        # Level 2: feed the preprocessed batch to the backbone to get the feature maps
        features = self.backbone(images.tensors)

        # Level 3: feed the feature maps to the RPN to get region proposals
        proposals, proposal_losses = self.rpn(images, features, targets)

        # Levels 4 and 5: pass the feature maps, the proposals, the preprocessed
        # images and the targets to roi_heads to obtain the detected targets
        detections, detector_losses = self.roi_heads(
            features, proposals, images, targets
        )

        # Finally, map the predicted boxes back to the original image scale
        detections = self.transform.post_process(
            detections, images, raw_image_shape
        )

        # Merge the losses of levels 4/5 and level 3 into one dict
        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)

        return losses, detections
```
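forward returns the losses of level 3 and of levels 4 and 5 merged into a single dict. During training these are usually summed into one scalar before calling backward(). The sketch below mirrors that merge in plain Python; the key names are the ones torchvision's Faster R-CNN uses and are an assumption about the author's framework, and the floats are made-up stand-ins for loss tensors.

```python
from typing import Dict


def total_loss(proposal_losses: Dict[str, float],
               detector_losses: Dict[str, float]) -> float:
    """Merge the level-3 and level-4/5 loss dicts as forward() does, then sum them."""
    losses: Dict[str, float] = {}
    losses.update(detector_losses)
    losses.update(proposal_losses)
    return sum(losses.values())


# Made-up loss values under torchvision's key names (an assumption).
rpn_losses = {"loss_objectness": 0.5, "loss_rpn_box_reg": 0.25}
head_losses = {"loss_classifier": 1.0, "loss_box_reg": 0.25}
print(total_loss(rpn_losses, head_losses))  # 2.0
```

Because the two dicts are merged with update, the RPN and the ROI heads must use distinct key names; a shared key would make the RPN's value silently overwrite the ROI heads' value.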

Having read the inscription, studied the blueprint, and seen the framework, little M felt confident that he could get through the maze.

Scroll in hand, little M opened the door to the first level: Transform. So what awaits inside Transform? Find out next time…