Author | Aakarsh Yelisetty Translation | Flin Source | towardsdatascience

Let’s take a look at how to use FAIR’s (Facebook AI Research) Detectron 2 for instance detection on custom datasets involving text recognition.

Have you ever tried to train an object detection model from scratch using a custom dataset of your own choosing?

If so, you’ll know how tedious the process can be. Whether we choose a region proposal-based approach such as Faster R-CNN or a one-stage detector such as SSD or YOLO, we need to build the model out of components like feature pyramid networks and region proposal networks.

Any of them would be a bit complicated to implement from scratch. We need a framework that gives us state-of-the-art models such as Fast, Faster, and Mask R-CNN out of the box. That said, building a model from scratch is still valuable for understanding the mathematics behind it.

If we want to quickly train an object detection model on a custom dataset, Detectron 2 can help. All the models in Detectron 2’s model zoo have been pre-trained on the COCO dataset. We just need to fine-tune a pre-trained model on our custom dataset.

Detectron 2 is a complete rewrite of the first Detectron, released in 2018. Its predecessor was written in Caffe2, a deep learning framework also backed by Facebook. Caffe2 and the original Detectron are now deprecated: Caffe2 has been folded into PyTorch, and its successor, Detectron 2, is written entirely in PyTorch.

Detectron2 is designed to advance machine learning by providing rapid training and by solving the problems companies face on the way from research to production.

Here are the various types of object detection models that Detectron 2 provides.

Let’s go straight to instance detection.

Instance detection refers to classifying and localizing objects with bounding boxes. In this article, we will use the Faster R-CNN model from Detectron 2’s model zoo to recognize the language of text in images.

Note that we have limited the languages to two.

We recognize Hindi and English text and provide a class called “Others” for other languages.

We will implement a model that produces output like this.

Let’s get started!

With Detectron 2, you can perform object detection on any custom dataset in seven steps. All of these steps are easy to find in this Google Colab Notebook, and you can run them right away!

Using Google Colab for this is easy, as we can use a GPU to train faster.

Step 1: Install Detectron 2

First, install some dependencies, such as torchvision and the COCO API, and then check whether CUDA is available. CUDA keeps track of the currently selected GPU. Then install Detectron2.

# install dependencies:
!pip install -U torch==1.5 torchvision==0.6 -f https://download.pytorch.org/whl/cu101/torch_stable.html
!pip install cython pyyaml==5.1
!pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
!gcc --version
# install detectron2:
!pip install detectron2==0.1.3 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html

Step 2: Prepare and register the dataset

Import the necessary packages.

# You may need to restart your runtime prior to this, to let your installation take effect
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import cv2
import random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog

The built-in datasets page lists the datasets that Detectron2 supports out of the box. If you want to use a custom dataset while reusing Detectron2’s data loaders, you need to register the dataset (that is, tell Detectron2 how to obtain it).

  • Built-in datasets: detectron2.readthedocs.io/tutorials/b…

We use a text detection dataset with three categories:

  1. English

  2. Hindi

  3. Other

We will train the text detection model from an existing model pre-trained on the COCO dataset, which is available in detectron2’s model library.

If you are interested in learning about the conversion from the raw dataset format to the format accepted by Detectron 2, check out:

  • colab.research.google.com/drive/1q-gw…

How does the data get into the model? The input data must be in a standard format such as YOLO, PASCAL VOC, or COCO. Detectron2 accepts the COCO format, which consists of a JSON file containing all the details of each image, such as its size, its annotations (that is, bounding box coordinates), the labels corresponding to its bounding boxes, and so on. For example,
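a single record in such a JSON file, once loaded into Python, might look roughly like the following sketch; the file name and all numbers are made up for illustration:

{
    "file_name": "img_0001.jpg",
    "height": 720,
    "width": 1280,
    "image_id": 1,
    "annotations": [
        {
            "bbox": [260.0, 180.0, 310.0, 120.0],  # [x, y, width, height]
            "bbox_mode": 1,                        # BoxMode.XYWH_ABS (set in the loader below)
            "category_id": 0                       # e.g. 0 = HINDI
        }
    ]
}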

Bounding boxes can be represented in several different formats; the format must be a member of detectron2’s structures.BoxMode enum. There are five such formats, but currently Detectron2 supports BoxMode.XYXY_ABS and BoxMode.XYWH_ABS.

We use the second format. (X, Y) is the top-left coordinate of the bounding box, and W and H are its width and height. ‘Category_id’ refers to the category to which the bounding box belongs.
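If your annotations instead come as absolute (x0, y0, x1, y1) corners, detectron2 ships a converter between box modes; a small sketch with made-up coordinates:

from detectron2.structures import BoxMode

# Convert an absolute (x0, y0, x1, y1) box into (x, y, width, height)
box_xyxy = [100.0, 150.0, 400.0, 300.0]
box_xywh = BoxMode.convert(box_xyxy, BoxMode.XYXY_ABS, BoxMode.XYWH_ABS)
print(box_xywh)  # [100.0, 150.0, 300.0, 150.0]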

Then, we need to register our data set.

import json
from detectron2.structures import BoxMode
def get_board_dicts(imgdir):
    json_file = imgdir+"/dataset.json" #Fetch the json file
    with open(json_file) as f:
        dataset_dicts = json.load(f)
    for i in dataset_dicts:
        filename = i["file_name"] 
        i["file_name"] = imgdir+"/"+filename 
        for j in i["annotations"]:
            j["bbox_mode"] = BoxMode.XYWH_ABS #Setting the required Box Mode
            j["category_id"] = int(j["category_id"])
    return dataset_dicts
from detectron2.data import DatasetCatalog, MetadataCatalog
#Registering the Dataset
for d in ["train"."val"]:
    DatasetCatalog.register("boardetect_" + d, lambda d=d: get_board_dicts("Text_Detection_Dataset_COCO_Format/" + d))
    MetadataCatalog.get("boardetect_" + d).set(thing_classes=["HINDI", "ENGLISH", "OTHER"])
board_metadata = MetadataCatalog.get("boardetect_train")

To verify that the data is loaded correctly, let’s visualize the annotations of randomly selected samples from the training set.

Step 3: Visualize the training set

We will randomly select three images from the train folder of the dataset and see what the bounding boxes look like.

#Visualizing the Train Dataset
dataset_dicts = get_board_dicts("Text_Detection_Dataset_COCO_Format/train")
#Randomly choosing 3 images from the Set
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=board_metadata)
    vis = visualizer.draw_dataset_dict(d)
    cv2_imshow(vis.get_image()[:, :, ::-1])

The output looks something like this:

Step 4: Train the model

Here comes the big step: configuring and setting up the model for training. Technically, we are just fine-tuning the model on our dataset, since it has already been pre-trained on the COCO dataset.

Detectron2’s model zoo offers a large number of models for object detection. Here we use faster_rcnn_R_50_FPN_3x.

It has a backbone (ResNet in this case) for extracting features from the image, followed by a region proposal network (RPN) for proposing candidate regions, and a box head for refining the bounding boxes.

You can read more about how Faster R-CNN works in my previous article.

  • towardsdatascience.com/understandi…
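If you want to see these three components for yourself, you can build the model from its configuration and print the parts; a minimal sketch, assuming the cfg constructed in the training snippet below:

from detectron2.modeling import build_model

model = build_model(cfg)            # assemble the architecture from the config
print(model.backbone)               # ResNet-50 + FPN feature extractor
print(model.proposal_generator)     # region proposal network (RPN)
print(model.roi_heads)              # box head that classifies and refines boxes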

Let’s set up the configuration for training.

from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
import os
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")) #Get the basic model configuration from the model zoo 
#Passing the Train and Validation sets
cfg.DATASETS.TRAIN = ("boardetect_train",)
cfg.DATASETS.TEST = ("boardetect_val",)
# Number of data loading threads
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
# Number of images per batch across all machines.
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.0125  # pick a good learning rate
cfg.SOLVER.MAX_ITER = 1500  #No. of iterations   
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256  
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3 # No. of classes = [HINDI, ENGLISH, OTHER]
cfg.TEST.EVAL_PERIOD = 500 # No. of iterations after which the Validation Set is evaluated. 
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = CocoTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()
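Note that the snippet above uses CocoTrainer, which is not defined in this excerpt. A minimal sketch of such a trainer, subclassing DefaultTrainer so that the validation set is actually evaluated with COCO metrics every EVAL_PERIOD iterations (the coco_eval folder name is an arbitrary choice):

import os
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

class CocoTrainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        # Write COCO-style evaluation results to a folder of our choosing
        if output_folder is None:
            os.makedirs("coco_eval", exist_ok=True)
            output_folder = "coco_eval"
        return COCOEvaluator(dataset_name, cfg, False, output_folder)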

This is by no means the best configuration; other configurations may well yield better accuracy. After all, it comes down to choosing the right hyperparameters.
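For instance, a smaller base learning rate combined with warmup and step decay is a common variation worth trying; the values below are illustrative, not tuned:

cfg.SOLVER.BASE_LR = 0.001        # smaller learning rate, often more stable
cfg.SOLVER.WARMUP_ITERS = 200     # linear warmup before the base LR takes effect
cfg.SOLVER.STEPS = (1000, 1300)   # iterations at which the LR is multiplied by GAMMA
cfg.SOLVER.GAMMA = 0.1            # LR decay factor applied at each step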

Note that here we also evaluate the model on the validation set every 500 iterations.

Step 5: Use the trained model for inference

Now it’s time to run inference with the trained model on the validation set.

After training completes successfully, an output folder containing the final weights is saved in local storage. You can keep this folder for future inference with this model.

from detectron2.utils.visualizer import ColorMode

#Use the final weights generated after successful training for inference  
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.8  # set the testing threshold for this model
#Pass the validation dataset
cfg.DATASETS.TEST = ("boardetect_val", )

predictor = DefaultPredictor(cfg)

dataset_dicts = get_board_dicts("Text_Detection_Dataset_COCO_Format/val")
for d in random.sample(dataset_dicts, 3):    
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=board_metadata, 
                   scale=0.8,
                   instance_mode=ColorMode.IMAGE   
    )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu")) #Passing the predictions to CPU from the GPU
    cv2_imshow(v.get_image()[:, :, ::-1])

Results:

Step 6: Evaluate the training model

In general, model evaluation follows the COCO evaluation criteria: mean average precision (mAP) is used to measure the model’s performance.

Here is an article about mAP: tarangshah.com/blog/2018-0…
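In brief, for a single class, average precision (AP) is the area under the precision-recall curve, and mAP averages AP over all N classes:

AP = ∫₀¹ p(r) dr        (area under the precision-recall curve)
mAP = (1/N) · Σᵢ APᵢ    (mean of the per-class APs)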

#import the COCO Evaluator to use the COCO Metrics
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

#Call the COCO Evaluator function and pass the Validation Dataset
evaluator = COCOEvaluator("boardetect_val", cfg, False, output_dir="/output/")
val_loader = build_detection_test_loader(cfg, "boardetect_val")

#Use the created predicted model in the previous step
inference_on_dataset(predictor.model, val_loader, evaluator)
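inference_on_dataset also returns the metrics as a nested dictionary, so you can capture them programmatically; a small sketch (the key names follow the COCO conventions):

results = inference_on_dataset(predictor.model, val_loader, evaluator)
print(results["bbox"]["AP50"])  # average precision at IoU = 0.5
print(results["bbox"]["AP"])    # AP averaged over IoU thresholds 0.5:0.95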

For an IoU of 0.5, we achieve an average precision of about 79.4%, which is not bad. This can be increased by tweaking the parameters and increasing the number of iterations, but keep a close eye on the training process, as the model may overfit.

If you need to run inference from the saved model, please visit: colab.research.google.com/drive/1d0kX…
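For reference, a minimal sketch of what inference from the saved weights might look like in a fresh session (this assumes the default OUTPUT_DIR of "./output" was kept; the test image path is hypothetical):

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3            # must match the classes used in training
cfg.MODEL.WEIGHTS = "output/model_final.pth"   # the saved final weights
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.8    # confidence threshold for predictions
predictor = DefaultPredictor(cfg)

outputs = predictor(cv2.imread("some_test_image.jpg"))
print(outputs["instances"].pred_classes)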

Conclusion

In this article, I focused on the process of performing object detection on a custom dataset with Detectron 2, rather than on achieving the highest accuracy.

Although this seems like a fairly simple procedure, there is still a lot to explore in the Detectron 2 library. We have a large number of parameters that can be further tuned for greater accuracy, and it all depends on one’s custom dataset.

You can download the notebook from my GitHub repository and try running it on Google Colab or in Jupyter Notebook.

  • github.com/aakarsh7599…

I hope you learned something new today.

Original link: towardsdatascience.com/object-dete…
