torchvision.models contains models for a variety of visual tasks: image classification, semantic segmentation, object detection, instance segmentation, human keypoint detection, and video classification.
This article gives a quick tour of the TorchVision models, then uses the pre-trained Faster R-CNN model to predict which objects appear in an image.
import torch
import torchvision
from PIL import Image
Create a pre-trained model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
print(model) shows its structure:
FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    ...
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'], output_size=(7, 7), sampling_ratio=2)
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=12544, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=91, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=364, bias=True)
    )
  )
)
This pre-trained model was trained on COCO train2017; the classes it can predict are as follows:
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
    'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
    'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
    'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
    'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator',
    'N/A', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
    'toothbrush'
]
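The label ids the model outputs index straight into this list. A tiny illustrative sketch (the list is truncated here for brevity; use the full COCO_INSTANCE_CATEGORY_NAMES above):

```python
# Map predicted label ids back to readable class names.
# Truncated list for illustration only.
category_names = ['__background__', 'person', 'bicycle', 'car', 'motorcycle']
label_ids = [1, 2]  # e.g. values taken from pred['labels']
names = [category_names[i] for i in label_ids]
print(names)  # ['person', 'bicycle']
```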
Specify CPU or GPU
Get the available device:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
Move the model to the device:
model.to(device)
Read the input image
img = Image.open('data/bicycle.jpg').convert("RGB")
img = torchvision.transforms.ToTensor()(img)
Prepare the image list argument for the model:
images = [img.to(device)]
Sample image data/bicycle.jpg:
Model inference
Switch the model to eval mode:
# For inference
model.eval()
At inference time the model only needs the image data, no annotations. It returns a List[Dict[Tensor]] with one Dict of predictions per input image. Each Dict contains fields such as:
- boxes (`FloatTensor[N, 4]`): the predicted boxes in `[x1, y1, x2, y2]` format, with `x` in `[0, W]` and `y` in `[0, H]`
- labels (`Int64Tensor[N]`): the predicted labels
- scores (`Tensor[N]`): the predicted scores
predictions = model(images)
pred = predictions[0]
print(pred)
The prediction results look like this:
{'boxes': tensor([[750.7896,  56.2632, 948.7942, 473.7791],
        [ 82.7364, 178.6174, 204.1523, 491.9059],
        ...
        [174.9881, 235.7873, 351.1031, 417.4089],
        [631.6036, 278.6971, 664.1542, 353.2548]], device='cuda:0',
       grad_fn=<StackBackward>),
 'labels': tensor([ 1,  1,  2,  1,  1,  1,  2,  2,  1, 77,  1,  1,  1,  2,  1,  1,  1,  1,
         1,  1, 27,  1,  1, 44,  1,  1,  1,  1, 27,  1,  1, 32,  1, 44,  1,  1,
        31,  2, 38,  2,  2,  1,  1, 31,  1,  1,  1,  1,  2,  1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  2,  2,  1,  1,  1,  2,  1,  1,  1,  1,  2,  1,  2,
         1,  1,  1,  1,  1,  1, 31,  2, 27,  1,  2,  1,  1, 31,  2, 77,  2,  1,
         2,  2,  2, 44,  2, 31,  1,  1,  1,  1], device='cuda:0'),
 'scores': tensor([0.9990, 0.9976, 0.9962, 0.9958, 0.9952, 0.9936, 0.9865, 0.9746, 0.9694,
        0.9679, 0.9620, 0.9395, 0.8984, 0.8979, 0.8847, 0.8537, 0.8475, 0.7865,
        0.7822, 0.6896, 0.6633, 0.6629, 0.6222, 0.6132, 0.6073, 0.5383, 0.5248,
        0.4891, 0.4881, 0.4595, 0.4335, 0.4273, 0.4089, 0.4074, 0.3679, 0.3357,
        0.3192, 0.3102, 0.2797, 0.2655, 0.2640, 0.2626, 0.2615, 0.2375, 0.2306,
        0.2174, 0.2129, 0.1967, 0.1912, 0.1907, 0.1739, 0.1722, 0.1669, 0.1666,
        0.1596, 0.1586, 0.1473, 0.1456, 0.1408, 0.1374, 0.1373, 0.1329, 0.1291,
        0.1290, 0.1289, 0.1278, 0.1205, 0.1182, 0.1182, 0.1103, 0.1060, 0.1025,
        0.1010, 0.0985, 0.0959, 0.0919, 0.0887, 0.0886, 0.0873, 0.0832, 0.0792,
        0.0778, 0.0764, 0.0693, 0.0686, 0.0679, 0.0671, 0.0668, 0.0636, 0.0635,
        0.0607, 0.0605, 0.0581, 0.0578, 0.0572, 0.0568, 0.0557, 0.0556, 0.0555,
        0.0533], device='cuda:0', grad_fn=<IndexBackward>)}
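To make the raw output readable, the labels and scores can be zipped together with the class names. A standalone sketch, with plain lists standing in for the tensors above (convert real tensors with `.tolist()` first):

```python
# Toy stand-ins for pred['labels'] / pred['scores'].
category_names = ['__background__', 'person', 'bicycle', 'car']  # truncated
labels = [1, 2]
scores = [0.9990, 0.9976]
# Pair each detection's class name with its confidence for display.
detections = [f'{category_names[l]}: {s:.4f}' for l, s in zip(labels, scores)]
print(detections)  # ['person: 0.9990', 'bicycle: 0.9976']
```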
Plot the prediction results
Obtain prediction results with score >= 0.9:
scores = pred['scores']
mask = scores >= 0.9
boxes = pred['boxes'][mask]
labels = pred['labels'][mask]
scores = scores[mask]
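Besides the score threshold, overlapping boxes can be filtered with non-maximum suppression. Faster R-CNN already applies NMS internally, and torchvision provides torchvision.ops.nms; the pure-Python sketch below mirrors that greedy algorithm on toy data, just to show the idea:

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy NMS, same idea as torchvision.ops.nms: keep the highest-scoring
    # box, drop boxes that overlap it too much, repeat.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_threshold]
    return keep

boxes = [[0, 0, 100, 100], [5, 5, 105, 105], [200, 200, 300, 300]]
scores = [0.95, 0.90, 0.80]
print(nms(boxes, scores))  # [0, 2] -- the second box overlaps the first too much
```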
Draw the results with utils.plots.plot_image:
from utils.colors import golden
from utils.plots import plot_image
lb_names = COCO_INSTANCE_CATEGORY_NAMES
lb_colors = golden(len(lb_names), fn=int, scale=0xff, shuffle=True)
lb_infos = [f'{s:.2f}' for s in scores]
plot_image(img, boxes, labels, lb_names, lb_colors, lb_infos,
save_name='result.png')
Note: utils.plots.plot_image requires torchvision >= 0.9.0 (or a nightly build).
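To fail early on an old installation, a small helper (hypothetical, not part of the article's utils) can compare the major/minor part of the version string before calling draw_bounding_boxes:

```python
def torchvision_at_least(version_str, required=(0, 9)):
    # Compare the major/minor part of a version string like '0.9.0+cu111';
    # in practice pass torchvision.__version__.
    parts = version_str.split('+')[0].split('.')
    return tuple(int(p) for p in parts[:2]) >= required

print(torchvision_at_least('0.9.0+cu111'))  # True
print(torchvision_at_least('0.8.2'))        # False
```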
The source code
- test_pretrained_models.py
utils.colors.golden:
import colorsys
import random

def golden(n, h=random.random(), s=0.5, v=0.95,
           fn=None, scale=None, shuffle=False):
    if n <= 0:
        return []
    coef = (1 + 5 ** 0.5) / 2  # golden ratio
    colors = []
    for _ in range(n):
        h += coef
        h = h - int(h)
        color = colorsys.hsv_to_rgb(h, s, v)
        if scale is not None:
            color = tuple(scale * v for v in color)
        if fn is not None:
            color = tuple(fn(v) for v in color)
        colors.append(color)
    if shuffle:
        random.shuffle(colors)
    return colors
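The core idea behind golden() is stepping the hue by the golden ratio, so consecutive colors land far apart on the color wheel. A compact standalone sketch of that core (simplified; the article's golden() adds fn/scale/shuffle options):

```python
import colorsys

GOLDEN = (1 + 5 ** 0.5) / 2  # golden ratio

def golden_colors(n, h=0.0, s=0.5, v=0.95):
    # Step the hue by the golden ratio (mod 1) and convert HSV -> RGB,
    # scaling each channel to an int in 0..255.
    colors = []
    for _ in range(n):
        h = (h + GOLDEN) % 1.0
        colors.append(tuple(int(0xff * c) for c in colorsys.hsv_to_rgb(h, s, v)))
    return colors

palette = golden_colors(3)
print(palette)  # three well-separated RGB tuples, channels in 0..255
```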
utils.plots.plot_image:
from typing import Union, Optional, List, Tuple

import matplotlib.pyplot as plt
import numpy as np
import torch
import torchvision
from PIL import Image


def plot_image(
    image: Union[torch.Tensor, Image.Image, np.ndarray],
    boxes: Optional[torch.Tensor] = None,
    labels: Optional[torch.Tensor] = None,
    lb_names: Optional[List[str]] = None,
    lb_colors: Optional[List[Union[str, Tuple[int, int, int]]]] = None,
    lb_infos: Optional[List[str]] = None,
    save_name: Optional[str] = None,
    show_name: Optional[str] = 'result',
) -> torch.Tensor:
    """Draws bounding boxes on the given image.

    Args:
        image (Image): `Tensor`, `PIL Image` or `numpy.ndarray`.
        boxes (Optional[Tensor]): `FloatTensor[N, 4]`, the boxes in `[x1, y1, x2, y2]` format.
        labels (Optional[Tensor]): `Int64Tensor[N]`, the class label index for each box.
        lb_names (Optional[List[str]]): All class label names.
        lb_colors (Optional[List[Union[str, Tuple[int, int, int]]]]): Colors of all class label names.
        lb_infos (Optional[List[str]]): Infos for the given labels.
        save_name (Optional[str]): Save image name.
        show_name (Optional[str]): Show window name.
    """
    if not isinstance(image, torch.Tensor):
        image = torchvision.transforms.ToTensor()(image)
    if boxes is not None:
        if image.dtype != torch.uint8:
            image = torchvision.transforms.ConvertImageDtype(torch.uint8)(image)
        draw_labels = None
        draw_colors = None
        if labels is not None:
            draw_labels = [lb_names[i] for i in labels] if lb_names is not None else None
            draw_colors = [lb_colors[i] for i in labels] if lb_colors is not None else None
        if draw_labels and lb_infos:
            draw_labels = [f'{l} {i}' for l, i in zip(draw_labels, lb_infos)]
        # torchvision >= 0.9.0 / nightly
        # https://github.com/pytorch/vision/blob/master/torchvision/utils.py
        res = torchvision.utils.draw_bounding_boxes(image, boxes,
                                                    labels=draw_labels, colors=draw_colors)
    else:
        res = image
    if save_name or show_name:
        res = res.permute(1, 2, 0).contiguous().numpy()
    if save_name:
        Image.fromarray(res).save(save_name)
    if show_name:
        plt.gcf().canvas.set_window_title(show_name)
        plt.imshow(res)
        plt.show()
    return res
References
- torch.hub
- torchvision.models
GoCoding shares hands-on practice notes; follow the official account!