• How to Perform Object Detection With YOLOv3 in Keras
  • Originally by Jason Brownlee
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: Daltan
  • Proofread by: LSvih, zhmhhu

How to use YOLOv3 for object detection in Keras

Object detection is a computer vision task that involves recognizing the presence, location, and type of one or more objects in a given image.

However, it is a challenging problem, as it builds on methods for object recognition (are objects present?), object localization (where are they and what is their extent?), and object classification (what are they?).

Over the years, deep learning methods have achieved state-of-the-art results for object recognition on standard benchmark datasets and in computer vision competitions. Notable among them is YOLO (You Only Look Once), a family of convolutional neural network models that achieve near state-of-the-art results for real-time object detection with a single end-to-end model.

This tutorial teaches you how to build a YOLOv3 model and perform object detection on a new image.

By the end of this tutorial, you will know:

  • The YOLO family of convolutional neural network models for object detection, and its latest variant, YOLOv3.
  • The best open source library implementation of YOLOv3 for the Keras deep learning library.
  • How to use a pre-trained YOLOv3 model to localize and detect objects in new images.

Let’s get started.

How to Perform Object Detection With YOLOv3 in Keras. Photo by David Berkowitz, some rights reserved.

An overview of the tutorial

This tutorial is divided into three parts:

  1. YOLO for object detection
  2. Experiencor’s YOLO3 project
  3. Object detection with YOLOv3

YOLO for object detection

Object detection is a computer vision task that involves both locating one or more objects within an image and classifying each located object.

It is a challenging task that requires both successful object localization, to find and draw a bounding box around each object in the image, and object classification, to correctly label the object that was localized.

YOLO (You Only Look Once) is a series of end-to-end deep learning models for fast object detection, first described by Joseph Redmon et al. in their 2015 paper "You Only Look Once: Unified, Real-Time Object Detection."

The approach involves a single deep convolutional neural network (originally a version of GoogLeNet, later updated and called DarkNet, based on VGG) that splits the input into a grid of cells, where each cell directly predicts bounding boxes and class probabilities. The result is a large number of candidate bounding boxes that are consolidated into a final prediction by a post-processing step.
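To make the grid idea concrete, here is a minimal, purely illustrative sketch (not the real network); the array sizes are assumptions chosen to mirror a 13×13 YOLOv3 output grid with 3 boxes per cell and 80 classes:

# hypothetical raw output for a 13x13 grid: 3 boxes per cell,
# each box = 4 coordinates + 1 objectness score + 80 class probabilities
import numpy as np
raw = np.random.rand(13, 13, 3, 85)
candidates = raw.reshape(-1, 85)  # 507 candidate boxes before post-processing
print(candidates.shape)           # (507, 85)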

At the time of writing, there are three main variants: YOLOv1, YOLOv2, and YOLOv3. The first version proposed the general architecture; the second refined the design and made use of predefined anchor boxes to improve the bounding box proposals; the third further refined the model architecture and training process.

Although the accuracy of the models is slightly lower than that of region-based convolutional neural networks (R-CNNs), YOLO models are very popular for object detection because of their fast detection speed, often demonstrated in real time on video or camera input.

A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

  • — You Only Look Once: Unified, Real-Time Object Detection, 2015.

This tutorial focuses on using YOLOv3.

Experiencor's YOLO3 project

The source code for each version of YOLO, as well as pre-trained models, is available for download.

The official repository DarkNet GitHub, which contains the source code for the YOLO version mentioned in the paper, is written in C. The repository also provides a step-by-step tutorial that teaches you how to use code for object detection.

Implementing the model from scratch can be challenging, especially for novices, because many custom model elements need to be developed for training and prediction. For example, even using a pre-trained model directly requires complex code to extract and interpret the predicted bounding boxes output by the model.

Instead of writing code from scratch, we can use a third-party implementation. There are many third-party implementations of YOLO for Keras, but none appear to be standardized or designed to be used as a library.

The YAD2K project was a de facto standard for YOLOv2: it provides scripts to convert the pre-trained weights into Keras format, make predictions with the pre-trained model, and extract and interpret the predicted bounding boxes. Many other third-party developers have used this code as a starting point and updated it to support YOLOv3.

Probably the most widely used project built around the pre-trained YOLO model is "Keras-Yolo3: Training and Detecting Objects with Yolo3", developed by Huynh Ngoc Anh, also known as Experiencor. The code in this project is available under an MIT open source license. Like YAD2K, the project provides scripts to load and use pre-trained YOLO models, as well as to develop YOLOv3-based transfer learning models on new datasets.

Experiencor also has a Keras-Yolo2 project, which provides similar code for YOLOv2, along with detailed tutorials on how to use the repository code. Keras-Yolo3 appears to be an updated version of that project.

Interestingly, Experiencor has run experiments based on this model, training versions of YOLOv3 on standard object detection problems such as a kangaroo dataset, a raccoon dataset, red blood cell detection, and so on. He lists the results of the models, provides the weights for download, and has even published YouTube videos showing the results. For example:

  • Raccoon Detection using YOLO 3

This tutorial is based on Experiencor’s Keras-Yolo3 project, using YOLOv3 for object detection.

A fork of the code at the time of writing is provided here, in case the repository changes or is removed (which can happen with third-party open source projects).

Object detection with YOLOv3

The Keras-Yolo3 project provides a lot of capability for working with YOLOv3 models, including object detection, transfer learning, and training new models from scratch.

In this section, the pre-trained model is used for object detection on unseen images. This capability is provided in a single Python file in the repository called yolo3_one_file_to_detect_them_all.py, which is about 435 lines long. The script is, in effect, a program that prepares a model from the pre-trained weights, uses the model to perform object detection, and outputs the result. It also depends on OpenCV.

Instead of using this program directly, we will build our own scripts from elements of it, first preparing and saving a Keras YOLOv3 model, then loading it to make a prediction for a new image.

Create and save the model

The first step is to download the pre-trained model weights.

We will use model weights trained using the DarkNet code on the MSCOCO dataset. Download the model weights, place the file in your current working directory, and rename it yolov3.weights. The file is large and may take a while to download depending on your network; alternatively, fetch it with the short script after the link below.

  • YOLOv3 Pre-trained Model Weights (yolov3.weights) (237 MB)
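If you prefer to fetch the file from Python, a small sketch like the following should work; it assumes the commonly used hosting location at pjreddie.com is still live, so verify the URL before relying on it.

# download the pre-trained weights (assumes the pjreddie.com mirror is still available)
from urllib.request import urlretrieve
urlretrieve('https://pjreddie.com/media/files/yolov3.weights', 'yolov3.weights')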

The next step is to define a Keras model, ensuring that the number and types of model layers match the downloaded model weights. The model architecture is called DarkNet and was initially based largely on the VGG-16 model.

The script file yolo3_one_file_to_detect_them_all.py provides the make_yolov3_model() function to create the model for us, and the helper function _conv_block() used to create blocks of layers. Both functions can be copied directly from the script.

Now define the Keras model for YOLOv3.

# define the model
model  =  make_yolov3_model()

Next, load the model weights. The weights are stored in the binary format used by DarkNet; rather than decoding the file manually, we can use the WeightReader class provided in the script.

To use WeightReader, instantiate it with the path to the weights file (e.g. yolov3.weights). This parses the file and loads the model weights into memory, in a format that can be set into our Keras model.

# load the model weights
weight_reader  =  WeightReader('yolov3.weights')

Then call the load_weights() function of the WeightReader instance, passing in the defined Keras model, to set the weights into its layers.

# set the model weights into the model
weight_reader.load_weights(model)

That's it; we now have a YOLOv3 model to work with.

Save this model as a Keras-compatible .h5 model file for later use.

# save the model to file
model.save('model.h5')

Putting all of this together, the complete code is listed below, copied from yolo3_one_file_to_detect_them_all.py and including the functions in full.

# create a YOLOv3 Keras model and save it to file
# based on https://github.com/experiencor/keras-yolo3
import struct
import numpy as np
from keras.layers import Conv2D
from keras.layers import Input
from keras.layers import BatchNormalization
from keras.layers import LeakyReLU
from keras.layers import ZeroPadding2D
from keras.layers import UpSampling2D
from keras.layers.merge import add, concatenate
from keras.models import Model

def _conv_block(inp, convs, skip=True):
	x = inp
	count = 0
	for conv in convs:
		if count == (len(convs) - 2) and skip:
			skip_connection = x
		count += 1
		if conv['stride'] > 1: x = ZeroPadding2D(((1,0),(1,0)))(x) # peculiar padding as darknet prefer left and top
		x = Conv2D(conv['filter'],
				   conv['kernel'],
				   strides=conv['stride'],
				   padding='valid' if conv['stride'] > 1 else 'same', # peculiar padding as darknet prefer left and top
				   name='conv_' + str(conv['layer_idx']),
				   use_bias=False if conv['bnorm'] else True)(x)
		if conv['bnorm']: x = BatchNormalization(epsilon=0.001, name='bnorm_' + str(conv['layer_idx']))(x)
		if conv['leaky']: x = LeakyReLU(alpha=0.1, name='leaky_' + str(conv['layer_idx']))(x)
	return add([skip_connection, x]) if skip else x

def make_yolov3_model():
	input_image = Input(shape=(None, None, 3))
	# Layer 0 => 4
	x = _conv_block(input_image, [{'filter': 32, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 0},
								  {'filter': 64, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 1},
								  {'filter': 32, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 2},
								  {'filter': 64, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 3}])
	# Layer 5 => 8
	x = _conv_block(x, [{'filter': 128, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 5},
						{'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 6},
						{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 7}])
	# Layer 9 => 11
	x = _conv_block(x, [{'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 9},
						{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 10}])
	# Layer 12 => 15
	x = _conv_block(x, [{'filter': 256, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 12},
						{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 13},
						{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 14}])
	# Layer 16 => 36
	for i in range(7):
		x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 16+i*3},
							{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 17+i*3}])
	skip_36 = x
	# Layer 37 => 40
	x = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 37},
						{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 38},
						{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 39}])
	# Layer 41 => 61
	for i in range(7):
		x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 41+i*3},
							{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 42+i*3}])
	skip_61 = x
	# Layer 62 => 65
	x = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 62},
						{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 63},
						{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 64}])
	# Layer 66 => 74
	for i in range(3):
		x = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 66+i*3},
							{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 67+i*3}])
	# Layer 75 => 79
	x = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 75},
						{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 76},
						{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 77},
						{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 78},
						{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 79}], skip=False)
	# Layer 80 => 82
	yolo_82 = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 80},
							  {'filter':  255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 81}], skip=False)
	# Layer 83 => 86
	x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 84}], skip=False)
	x = UpSampling2D(2)(x)
	x = concatenate([x, skip_61])
	# Layer 87 => 91
	x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 87},
						{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 88},
						{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 89},
						{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 90},
						{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 91}], skip=False)
	# Layer 92 => 94
	yolo_94 = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 92},
							  {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 93}], skip=False)
	# Layer 95 => 98
	x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True,   'layer_idx': 96}], skip=False)
	x = UpSampling2D(2)(x)
	x = concatenate([x, skip_36])
	# Layer 99 => 106
	yolo_106 = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 99},
							   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 100},
							   {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 101},
							   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 102},
							   {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 103},
							   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 104},
							   {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 105}], skip=False)
	model = Model(input_image, [yolo_82, yolo_94, yolo_106])
	return model

class WeightReader:
	def __init__(self, weight_file):
		with open(weight_file, 'rb') as w_f:
			major,	= struct.unpack('i', w_f.read(4))
			minor,	= struct.unpack('i', w_f.read(4))
			revision, = struct.unpack('i', w_f.read(4))
			if (major*10 + minor) >= 2 and major < 1000 and minor < 1000:
				w_f.read(8)
			else:
				w_f.read(4)
			transpose = (major > 1000) or (minor > 1000)
			binary = w_f.read()
		self.offset = 0
		self.all_weights = np.frombuffer(binary, dtype='float32')

	def read_bytes(self, size):
		self.offset = self.offset + size
		return self.all_weights[self.offset-size:self.offset]

	def load_weights(self, model):
		for i in range(106):
			try:
				conv_layer = model.get_layer('conv_' + str(i))
				print("loading weights of convolution #" + str(i))
				if i not in [81, 93, 105]:
					norm_layer = model.get_layer('bnorm_' + str(i))
					size = np.prod(norm_layer.get_weights()[0].shape)
					beta  = self.read_bytes(size) # bias
					gamma = self.read_bytes(size) # scale
					mean  = self.read_bytes(size) # mean
					var   = self.read_bytes(size) # variance
					weights = norm_layer.set_weights([gamma, beta, mean, var])
				if len(conv_layer.get_weights()) > 1:
					bias   = self.read_bytes(np.prod(conv_layer.get_weights()[1].shape))
					kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
					kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
					kernel = kernel.transpose([2,3,1,0])
					conv_layer.set_weights([kernel, bias])
				else:
					kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
					kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
					kernel = kernel.transpose([2,3,1,0])
					conv_layer.set_weights([kernel])
			except ValueError:
				print("no convolution #" + str(i))

	def reset(self):
		self.offset = 0

# define the model
model = make_yolov3_model()
# load the model weights
weight_reader = WeightReader('yolov3.weights')
# set the model weights into the model
weight_reader.load_weights(model)
# save the model to file
model.save('model.h5')

Running this sample code on modern hardware would probably take less than a minute.

As the weights file loads, you will see debug information reported by the WeightReader class.

...
loading weights of convolution #99
loading weights of convolution #100
loading weights of convolution #101
loading weights of convolution #102
loading weights of convolution #103
loading weights of convolution #104
loading weights of convolution #105

At the end of the run, the model.h5 file is saved in your current working directory. It is close in size to the original weight file (237MB), but can be loaded and used directly like any Keras model.
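As an optional sanity check (an addition of mine, not part of the original script), you can reload the saved file to confirm it round-trips as a regular Keras model:

# optional: reload the saved model and inspect its structure
from keras.models import load_model
model = load_model('model.h5')
model.summary()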

Make a prediction

We need a new image for object detection, ideally one containing objects that we know the model can recognize from the MSCOCO dataset.

Here is a photograph of three zebras, taken by Boegh while traveling.

Photo by Boegh. Some rights reserved.

  • Photograph of three zebras (zebra.jpg)

Download this image, place it in your current working directory, and name it zebra.jpg.

While interpreting predictions takes some work, making them is straightforward.

The first step is to load the Keras model, which is probably the slowest step in the prediction process.

# load yolov3 model
model  =  load_model('model.h5')

The next step is to load the new image and prepare it as input to the model. The model expects a color image with the square shape of 416×416 pixels.

The Keras load_img() function is used to load the image, with the target_size argument resizing it on load. The img_to_array() function then converts the loaded PIL image object into a NumPy array, after which the pixel values are rescaled from 0-255 to 32-bit floating-point values in the range 0-1.

# load the image with the required size
image = load_img('zebra.jpg', target_size=(416, 416))
# convert to numpy array
image = img_to_array(image)
# scale pixel values to [0, 1]
image = image.astype('float32')
image /= 255.0

We will want to show the original photo again later, which means we will need to scale the bounding boxes of all detected objects from the square shape back to the original shape. As such, we can load the image and retrieve its original shape.

# load the image to get its shape
image  =  load_img('zebra.jpg')
width,  height  =  image.size

These steps can be tied together into a convenience function, load_image_pixels(), which takes the filename and target size as input and returns the scaled pixel data ready for input to the Keras model, along with the original width and height of the image.

# load and prepare an image
def load_image_pixels(filename, shape):
    # load the image to get its shape
    image = load_img(filename)
    width, height = image.size
    # load the image with the required size
    image = load_img(filename, target_size=shape)
    # convert to numpy array
    image = img_to_array(image)
    # scale pixel values to [0, 1]
    image = image.astype('float32')
    image /= 255.0
    # add a dimension so that we have one sample
    image = expand_dims(image, 0)
    return image, width, height

This function is then called to load our photo of zebras.

# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))

This image can then be provided as input to the Keras model to make a prediction.

# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
print([a.shape for a in yhat])

So that’s the prediction itself. A complete example is shown below.

# load yolov3 model and perform object detection
# based on https://github.com/experiencor/keras-yolo3
from numpy import expand_dims
from keras.models import load_model
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array

# load and prepare an image
def load_image_pixels(filename, shape):
    # load the image to get its shape
    image = load_img(filename)
    width, height = image.size
    # load the image with the required size
    image = load_img(filename, target_size=shape)
    # convert to numpy array
    image = img_to_array(image)
    # scale pixel values to [0, 1]
    image = image.astype('float32')
    image /= 255.0
    # add a dimension so that we have one sample
    image = expand_dims(image, 0)
    return image, width, height

# load yolov3 model
model = load_model('model.h5')
# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))
# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
print([a.shape for a in yhat])

Running the example returns a list of three NumPy arrays, the shapes of which are displayed as output.

These arrays encode both the bounding boxes and the class labels, but the predictions must be interpreted. Each output grid cell predicts 3 candidate boxes, and each box comprises 4 coordinates, 1 objectness score, and 80 class probabilities, which is where the 255 channels (3 × 85) come from.

[(1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)]

Make predictions and interpret results

In fact, the output of the model encodes candidate bounding boxes from three different grid sizes. The boxes are defined in the context of anchor boxes, carefully chosen based on an analysis of the sizes of objects in the MSCOCO dataset.

The decode_netout() function provided in the script by Experiencor takes each of the NumPy arrays, one at a time, and decodes the candidate bounding boxes and class predictions. Furthermore, any bounding boxes that do not confidently describe an object (e.g. all class probabilities below a threshold) are ignored; a probability threshold of 60% (0.6) is used here. The function returns a list of BoundBox instances that define the corners of each bounding box in the context of the input image shape, along with the class probabilities.

# define the anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
	# decode the output of the network
	boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)

The next step is to stretch the bounding boxes back to the shape of the original image. This is helpful because it means we can later plot the original image and draw the bounding boxes, hopefully detecting real objects.

The script provided by Experiencor has a correct_yolo_boxes() function that performs this translation of bounding box coordinates, taking as arguments the list of bounding boxes, the original shape of the loaded image, and the shape of the input to the network. The box coordinates are updated in place:

# correct the sizes of the bounding boxes for the shape of the image
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)

The model predicts a lot of candidate bounding boxes, and most of them refer to the same objects. The list of boxes can be filtered so that overlapping boxes that refer to the same object are merged. The amount of overlap is a configuration parameter, in this case 50% (0.5). This filtering of box regions is generally referred to as non-maximal suppression and is a required post-processing step.

The script does this via the do_nms() function, which takes the list of bounding boxes and a threshold. Rather than purging the overlapping boxes, it clears their predicted probability for the overlapping class. That way, the boxes remain usable if they also detect another object type.

# suppress non-maximal boxes
do_nms(boxes, 0.5)
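For intuition about the 0.5 threshold, here is a small illustrative check using the BoundBox class and bbox_iou() function from the script; the coordinates are made up for the example.

# two hypothetical, partially overlapping boxes
box_a = BoundBox(10, 10, 50, 50)
box_b = BoundBox(30, 30, 70, 70)
# intersection is 20x20=400, union is 1600+1600-400=2800, so IoU is ~0.14
print(bbox_iou(box_a, box_b))  # below 0.5, so neither box would suppress the other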

This leaves the same number of boxes, but only a few of them are of interest. We can retrieve just those boxes that strongly predict the presence of an object, i.e. with more than 60% confidence. This can be done by enumerating all boxes and checking the class prediction values. We must check each box against every class label, in case the same box strongly predicts more than one object.

A get_boxes() function can do this, taking the list of boxes, the known labels, and the classification threshold as arguments, and returning the corresponding lists of boxes, labels, and scores.

# get all of the results above a threshold
def get_boxes(boxes, labels, thresh):
	v_boxes, v_labels, v_scores = list(), list(), list()
	# enumerate all boxes
	for box in boxes:
		# enumerate all possible labels
		for i in range(len(labels)):
			# check if the threshold for this label is high enough
			if box.classes[i] > thresh:
				v_boxes.append(box)
				v_labels.append(labels[i])
				v_scores.append(box.classes[i]*100)
				# don't break, many labels may trigger for one box
	return v_boxes, v_labels, v_scores

Call this function with the border list as an argument.

We also need a list of strings containing the known class labels of the model, in the same order used when the model was trained; in this case, the class labels from the MSCOCO dataset. Thankfully, these are also provided in the Experiencor script.

# define the labels
labels = ["person"."bicycle"."car"."motorbike"."aeroplane"."bus"."train"."truck"."boat"."traffic light"."fire hydrant"."stop sign"."parking meter"."bench"."bird"."cat"."dog"."horse"."sheep"."cow"."elephant"."bear"."zebra"."giraffe"."backpack"."umbrella"."handbag"."tie"."suitcase"."frisbee"."skis"."snowboard"."sports ball"."kite"."baseball bat"."baseball glove"."skateboard"."surfboard"."tennis racket"."bottle"."wine glass"."cup"."fork"."knife"."spoon"."bowl"."banana"."apple"."sandwich"."orange"."broccoli"."carrot"."hot dog"."pizza"."donut"."cake"."chair"."sofa"."pottedplant"."bed"."diningtable"."toilet"."tvmonitor"."laptop"."mouse"."remote"."keyboard"."cell phone"."microwave"."oven"."toaster"."sink"."refrigerator"."book"."clock"."vase"."scissors"."teddy bear"."hair drier"."toothbrush"]
# get the details of the detected objects
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)

Now that we have the few boxes that strongly predict the presence of objects, we can summarize them.

# summarize what we found
for i in range(len(v_boxes)):
    print(v_labels[i], v_scores[i])

We can also plot the original photograph and draw a bounding box around each detected object. This is done by retrieving the coordinates from each bounding box and creating a Rectangle object.

box = v_boxes[i]
# get coordinates
y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
# calculate width and height of the box
width, height = x2 - x1, y2 - y1
# create the shape
rect = Rectangle((x1, y1), width, height, fill=False, color='white')
# draw the box
ax.add_patch(rect)

We can also draw a string with the class label and confidence score.

# draw text and score in top left corner
label = "%s (%.3f)" % (v_labels[i], v_scores[i])
pyplot.text(x1, y1, label, color='white')

The draw_boxes() function below does all of this, taking the filename of the original photograph and the parallel lists of bounding boxes, labels, and scores, and drawing all detected objects.

# draw all results
def draw_boxes(filename, v_boxes, v_labels, v_scores):
	# load the image
	data = pyplot.imread(filename)
	# plot the image
	pyplot.imshow(data)
	# get the context for drawing boxes
	ax = pyplot.gca()
	# plot each box
	for i in range(len(v_boxes)):
		box = v_boxes[i]
		# get coordinates
		y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
		# calculate width and height of the box
		width, height = x2 - x1, y2 - y1
		# create the shape
		rect = Rectangle((x1, y1), width, height, fill=False, color='white')
		# draw the box
		ax.add_patch(rect)
		# draw text and score in top left corner
		label = "%s (%.3f)" % (v_labels[i], v_scores[i])
		pyplot.text(x1, y1, label, color='white')
	# show the plot
	pyplot.show()

This function is then called to draw the final result.

# draw what we found
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)

We now have all the elements needed to make a prediction with the YOLOv3 model, interpret the results, and plot them for review.

The complete code listing, including the original and modified Experiencor script functions, is provided below.

# load yolov3 model and perform object detection
# based on https://github.com/experiencor/keras-yolo3
import numpy as np
from numpy import expand_dims
from keras.models import load_model
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from matplotlib import pyplot
from matplotlib.patches import Rectangle

class BoundBox:
	def __init__(self, xmin, ymin, xmax, ymax, objness = None, classes = None):
		self.xmin = xmin
		self.ymin = ymin
		self.xmax = xmax
		self.ymax = ymax
		self.objness = objness
		self.classes = classes
		self.label = -1
		self.score = -1

	def get_label(self):
		if self.label == -1:
			self.label = np.argmax(self.classes)

		return self.label

	def get_score(self):
		if self.score == -1:
			self.score = self.classes[self.get_label()]

		return self.score

def _sigmoid(x):
	return 1. / (1. + np.exp(-x))

def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
	grid_h, grid_w = netout.shape[:2]
	nb_box = 3
	netout = netout.reshape((grid_h, grid_w, nb_box, -1))
	nb_class = netout.shape[-1] - 5
	boxes = []
	netout[..., :2]  = _sigmoid(netout[..., :2])
	netout[..., 4:]  = _sigmoid(netout[..., 4:])
	netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
	netout[..., 5:] *= netout[..., 5:] > obj_thresh

	for i in range(grid_h*grid_w):
		row = i / grid_w
		col = i % grid_w
		for b in range(nb_box):
			# 4th element is objectness score
			objectness = netout[int(row)][int(col)][b][4]
			if(objectness.all() <= obj_thresh): continue
			# first 4 elements are x, y, w, and h
			x, y, w, h = netout[int(row)][int(col)][b][:4]
			x = (col + x) / grid_w # center position, unit: image width
			y = (row + y) / grid_h # center position, unit: image height
			w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
			h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
			# last elements are class probabilities
			classes = netout[int(row)][col][b][5:]
			box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
			boxes.append(box)
	return boxes

def correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):
	new_w, new_h = net_w, net_h
	for i in range(len(boxes)):
		x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
		y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
		boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
		boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
		boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
		boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)

def _interval_overlap(interval_a, interval_b):
	x1, x2 = interval_a
	x3, x4 = interval_b
	if x3 < x1:
		if x4 < x1:
			return 0
		else:
			return min(x2,x4) - x1
	else:
		if x2 < x3:
			 return 0
		else:
			return min(x2,x4) - x3

def bbox_iou(box1, box2):
	intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
	intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
	intersect = intersect_w * intersect_h
	w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
	w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin
	union = w1*h1 + w2*h2 - intersect
	return float(intersect) / union

def do_nms(boxes, nms_thresh):
	if len(boxes) > 0:
		nb_class = len(boxes[0].classes)
	else:
		return
	for c in range(nb_class):
		sorted_indices = np.argsort([-box.classes[c] for box in boxes])
		for i in range(len(sorted_indices)):
			index_i = sorted_indices[i]
			if boxes[index_i].classes[c] == 0: continue
			for j in range(i+1, len(sorted_indices)):
				index_j = sorted_indices[j]
				if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
					boxes[index_j].classes[c] = 0

# load and prepare an image
def load_image_pixels(filename, shape):
	# load the image to get its shape
	image = load_img(filename)
	width, height = image.size
	# load the image with the required size
	image = load_img(filename, target_size=shape)
	# convert to numpy array
	image = img_to_array(image)
	# scale pixel values to [0, 1]
	image = image.astype('float32')
	image /= 255.0
	# add a dimension so that we have one sample
	image = expand_dims(image, 0)
	return image, width, height

# get all of the results above a threshold
def get_boxes(boxes, labels, thresh):
	v_boxes, v_labels, v_scores = list(), list(), list()
	# enumerate all boxes
	for box in boxes:
		# enumerate all possible labels
		for i in range(len(labels)):
			# check if the threshold for this label is high enough
			if box.classes[i] > thresh:
				v_boxes.append(box)
				v_labels.append(labels[i])
				v_scores.append(box.classes[i]*100)
				# don't break, many labels may trigger for one box
	return v_boxes, v_labels, v_scores

# draw all results
def draw_boxes(filename, v_boxes, v_labels, v_scores):
	# load the image
	data = pyplot.imread(filename)
	# plot the image
	pyplot.imshow(data)
	# get the context for drawing boxes
	ax = pyplot.gca()
	# plot each box
	for i in range(len(v_boxes)):
		box = v_boxes[i]
		# get coordinates
		y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
		# calculate width and height of the box
		width, height = x2 - x1, y2 - y1
		# create the shape
		rect = Rectangle((x1, y1), width, height, fill=False, color='white')
		# draw the box
		ax.add_patch(rect)
		# draw text and score in top left corner
		label = "%s (%.3f)" % (v_labels[i], v_scores[i])
		pyplot.text(x1, y1, label, color='white')
	# show the plot
	pyplot.show()

# load yolov3 model
model = load_model('model.h5')
# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))
# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
print([a.shape for a in yhat])
# define the anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
	# decode the output of the network
	boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)
# correct the sizes of the bounding boxes for the shape of the image
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
# suppress non-maximal boxes
do_nms(boxes, 0.5)
# define the labels
labels = ["person"."bicycle"."car"."motorbike"."aeroplane"."bus"."train"."truck"."boat"."traffic light"."fire hydrant"."stop sign"."parking meter"."bench"."bird"."cat"."dog"."horse"."sheep"."cow"."elephant"."bear"."zebra"."giraffe"."backpack"."umbrella"."handbag"."tie"."suitcase"."frisbee"."skis"."snowboard"."sports ball"."kite"."baseball bat"."baseball glove"."skateboard"."surfboard"."tennis racket"."bottle"."wine glass"."cup"."fork"."knife"."spoon"."bowl"."banana"."apple"."sandwich"."orange"."broccoli"."carrot"."hot dog"."pizza"."donut"."cake"."chair"."sofa"."pottedplant"."bed"."diningtable"."toilet"."tvmonitor"."laptop"."mouse"."remote"."keyboard"."cell phone"."microwave"."oven"."toaster"."sink"."refrigerator"."book"."clock"."vase"."scissors"."teddy bear"."hair drier"."toothbrush"]
# get the details of the detected objects
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
# summarize what we found
for i in range(len(v_boxes)):
	print(v_labels[i], v_scores[i])
# draw what we found
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)

Running the example again first prints the raw output of the model, as before.

This is followed by a summary of the objects detected by the model and their confidence. We can see that the model detected three zebras, each with a confidence above 90%.

[(1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)]
zebra 94.91060376167297
zebra 99.86329674720764
zebra 96.8708872795105

A plot of the photograph is also created with three bounding boxes, showing that the model did indeed successfully detect the three zebras in the image.

Photograph of three zebras, each detected with the YOLOv3 model and localized with bounding boxes.

Further reading

This section provides more resources on this topic if you want to learn more about it.

Papers

  • You Only Look Once: Unified, Real-Time Object Detection, 2015.
  • YOLO9000: Better, Faster, Stronger, 2016.
  • YOLOv3: An Incremental Improvement, 2018.

API

  • matplotlib.patches.Rectangle API

Resources

  • YOLO: Real-Time Object Detection, Homepage.
  • Official DarkNet and YOLO Source Code, GitHub.
  • Official YOLO: Real Time Object Detection.
  • Huynh Ngoc Anh, Experiencor, Home Page.
  • experiencor/keras-yolo3, GitHub.

Keras project for other YOLO implementations

  • allanzelener/YAD2K, GitHub.
  • qqwweee/keras-yolo3, GitHub.
  • xiaochus/YOLOv3 GitHub.

Conclusion

In this tutorial, you discovered how to develop a YOLOv3 model for object detection on new images.

Specifically, you learned:

  • YOLO-based convolutional neural network models for object detection, and the latest variant, YOLOv3.
  • The best open source library implementation of YOLOv3 for the Keras deep learning library.
  • How to use a pre-trained YOLOv3 model to localize and detect objects in new photographs.

Any questions? Ask questions in the comments section and I’ll do my best to answer them.

If you find any mistakes in this translation or other areas that could be improved, you are welcome to revise the translation and open a PR in the Nuggets Translation Project, for which you can earn reward points. The permanent link at the beginning of this article is the MarkDown link to this article on GitHub.


The Nuggets Translation Project is a community that translates high-quality technical articles from the Internet, covering Android, iOS, front-end, back-end, blockchain, product, design, artificial intelligence, and other fields. For more high-quality translations, please follow the Nuggets Translation Project, its official Weibo, and its Zhihu column.