• How to Train an Object Detection Model with Keras
  • Originally by Jason Brownlee
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: EmilyQiRabbit
  • Proofread by: Ultrasteve, Zhmhhu

How to Train an Object Detection Model with Keras

Object detection is a challenging computer vision task that involves both predicting where objects are in an image and classifying what type of object was detected.

The Mask Region-based Convolutional Neural Network, or Mask R-CNN for short, is one of the state-of-the-art approaches to object detection. The Matterport Mask R-CNN project provides a library that we can use to develop and test Mask R-CNN Keras models for our own object detection tasks. Although it uses transfer learning from top models trained on very challenging object detection tasks, such as MS COCO, the library can be difficult for beginners to use and requires careful preparation of the dataset.

In this tutorial, you will learn how to train a Mask R-CNN model that can recognize kangaroos in photos.

By the end of this tutorial, you will know:

  • How to prepare an object detection dataset for training an R-CNN model.
  • How to use transfer learning to train an object detection model on a new dataset.
  • How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.

If you also want to learn how to build models for image classification, object detection, face recognition, and more, check out my new book on computer vision, which includes 30 detailed tutorials and full source code.

Now let’s get started.

How to train an object detection model with Keras and Mask R-CNN to identify kangaroos in photographs. Photo by Ronnie Robertson, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. How to Install Mask R-CNN for Keras
  2. How to Prepare a Dataset for Object Detection
  3. How to Train a Mask R-CNN Model to Detect Kangaroos
  4. How to Evaluate a Mask R-CNN Model
  5. How to Detect Kangaroos in New Photos

How to Install Mask R-CNN for Keras

Object detection is a computer vision task that involves recognizing the presence, location, and class of one or more objects in a given image.

It is a challenging problem that draws on model-building methods for object recognition (e.g. where is an object), object localization (e.g. what is the object’s extent), and object classification (e.g. what kind of object it is).

The Region-based Convolutional Neural Network, or R-CNN, developed by Ross Girshick et al., is a family of convolutional neural network models designed for object detection. The approach has gone through roughly four major iterations, resulting in the current state of the art, Mask R-CNN. Introduced in the 2018 paper titled “Mask R-CNN,” it is the most recent member of the region-based CNN family and supports both object detection and object segmentation. Object segmentation not only involves localizing objects in the image, but also specifies a mask for the image, indicating exactly which pixels in the image belong to the object.

Mask R-CNN is a sophisticated model to apply, compared with a simple model or even an ordinary deep convolutional neural network. Rather than developing an R-CNN or Mask R-CNN application from scratch, we will use a reliable third-party implementation built on the Keras deep learning framework.

At present, the best third-party implementation of Mask R-CNN is the Mask R-CNN Project developed by Matterport. The project is open source under a permissive license (the MIT license), and its code has been used in a wide variety of projects and Kaggle competitions.

The first step is to install the library.

As of this writing, the library has no distributed release, so you will need to install it manually. The good news is that installation is very simple.

Installation involves cloning the GitHub repository and running the installation script in your workspace. If you have trouble with this process, refer to the installation instructions in the repository’s README file.

Step 1: Clone the Mask R-CNN repository on GitHub

This step is as simple as running the following command from the command line:

git clone https://github.com/matterport/Mask_RCNN.git

This code will create a new directory locally named Mask_RCNN with the following structure:

Mask_RCNN
├── assets
├── build
│   ├── bdist.macosx-10.13-x86_64
│   └── lib
│       └── mrcnn
├── dist
├── images
├── mask_rcnn.egg-info
├── mrcnn
└── samples
    ├── balloon
    ├── coco
    ...

Step 2: Install the Mask R-CNN Library

The library can be installed directly from the cloned repository.

Change directory into Mask_RCNN and run the installation script.

On the command line, enter:

cd Mask_RCNN
python setup.py install

On Linux or macOS you may need to use sudo to permit the installation; you might see an error such as:

error: can't create or remove files in install directory

In this case, install the library with sudo:

sudo python setup.py install

If you are using a Python virtual environment, such as an EC2 Deep Learning AMI instance (recommended for this tutorial), you can install Mask_RCNN into your environment as follows:

sudo ~/anaconda3/envs/tensorflow_p36/bin/python setup.py install

The script will then install the library directly and report a success message, ending with the following:

...
Finished processing dependencies for mask-rcnn==2.1

This message indicates that you have successfully installed the library: version 2.1 at the time of writing.

Step 3: Confirm the Library Was Installed

It is always a good practice to verify that libraries are installed correctly.

You can query the library via the pip command to confirm it was installed correctly; for example:

pip show mask-rcnn

You should see output reporting the version number and install location; for example:

Name: mask-rcnn
Version: 2.1
Summary: Mask R-CNN for object detection and instance segmentation
Home-page: https://github.com/matterport/Mask_RCNN
Author: Matterport
Author-email: [email protected]
License: MIT
Location: ...
Requires:
Required-by:

We are now ready to start using the library.

How to Prepare a Dataset for Object Detection

Next, we need to prepare the data set for the model.

In this tutorial we will use the kangaroo dataset created by Experiencor (Huynh Ngoc Anh). The dataset consists of 183 photographs containing kangaroos, along with XML annotation files that provide bounding boxes for every kangaroo in each photograph.

Mask R-CNN is designed to learn to predict both bounding boxes and detection masks for objects, but the kangaroo dataset does not provide mask information. We will therefore use the dataset to learn kangaroo object detection and ignore the masks; we are not concerned with the model’s image segmentation capability.

A few steps are required to prepare the dataset for modeling, and we will work through each in turn in this section: downloading the dataset, parsing the annotation files, developing a kangaroo Dataset object that the Mask_RCNN library can use, and finally testing the dataset object to confirm that images and annotations load correctly.

Install the dataset

The first step is to download the data set to the current working directory.

To do so, clone the GitHub repository directly with the following command:

git clone https://github.com/experiencor/kangaroo.git

This creates a new directory named “kangaroo” with a subdirectory named ‘images/’ that contains all of the kangaroo JPEG photos, and a subdirectory named ‘annots/’ that contains XML files describing the location of the kangaroos in each photo.

kangaroo
├── images
└── annots

Looking in each subdirectory, we can see that the image and annotation files use a consistent naming convention: a 5-digit, zero-padded numbering system; for example:

images/00001.jpg
images/00002.jpg
images/00003.jpg
...
annots/00001.xml
annots/00002.xml
annots/00003.xml
...

This naming method makes it easy to match images with their annotation files.

We can also see that the numbering is not contiguous; some photos are missing, e.g. there is no JPG or XML file numbered ‘00007’.

This means that we should load the list of files actually present in the directory rather than generating filenames from the numbering system.
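As a standalone sketch (using a throwaway temporary directory in place of the real kangaroo/ dataset), pairing each image with its annotation by filename stem might look like this:

```python
import os
import tempfile

# Build a throwaway stand-in for the kangaroo/ dataset with a gap in the
# numbering, then pair each image with its annotation by filename stem.
root = tempfile.mkdtemp()
images_dir = os.path.join(root, 'images')
annots_dir = os.path.join(root, 'annots')
os.makedirs(images_dir)
os.makedirs(annots_dir)
for stem in ('00001', '00002', '00005'):  # note the gap: 00003, 00004 missing
	open(os.path.join(images_dir, stem + '.jpg'), 'w').close()
	open(os.path.join(annots_dir, stem + '.xml'), 'w').close()

pairs = []
for filename in sorted(os.listdir(images_dir)):
	stem = filename[:-4]  # strip the '.jpg' extension
	ann = os.path.join(annots_dir, stem + '.xml')
	if os.path.exists(ann):
		pairs.append((filename, stem + '.xml'))
print(pairs)
```

Because pairing is driven by the files that actually exist, the missing numbers cause no trouble.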

Parsing annotation files

The next step is to figure out how to load the annotation file.

First, open and inspect the first annotation file (annots/00001.xml); you will see:

<annotation>
	<folder>Kangaroo</folder>
	<filename>00001.jpg</filename>
	<path>.</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>450</width>
		<height>319</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>kangaroo</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>233</xmin>
			<ymin>89</ymin>
			<xmax>386</xmax>
			<ymax>262</ymax>
		</bndbox>
	</object>
	<object>
		<name>kangaroo</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>134</xmin>
			<ymin>105</ymin>
			<xmax>341</xmax>
			<ymax>253</ymax>
		</bndbox>
	</object>
</annotation>

We can see that the annotation file contains a “size” element describing the dimensions of the image, and one or more “object” elements describing the bounding boxes of the kangaroo objects in the photo.

The sizes and bounding boxes are the minimal information we need from each annotation file. We could write careful XML parsing code to process the annotation files, which would be helpful in a production system. For development, we will take a shortcut and use XPath queries to extract the data directly from each file, e.g. a //size query extracts the size element, and //object or //bndbox queries extract the bounding box elements.

Python provides the ElementTree API, which can be used to load and parse XML files, and the find() and findall() functions can then run XPath queries against a loaded document.
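As a quick standalone illustration, the same queries can be exercised against an inline annotation string (a sketch using ElementTree.fromstring rather than a file on disk):

```python
from xml.etree import ElementTree

# Parse a trimmed-down annotation from a string and run the same
# XPath queries we will use against the real annotation files.
xml = """<annotation>
	<size><width>450</width><height>319</height><depth>3</depth></size>
	<object><bndbox><xmin>233</xmin><ymin>89</ymin><xmax>386</xmax><ymax>262</ymax></bndbox></object>
</annotation>"""
root = ElementTree.fromstring(xml)
width = int(root.find('.//size/width').text)
boxes = root.findall('.//bndbox')
print(width)       # 450
print(len(boxes))  # 1
```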

First, the annotation file must be loaded and parsed into an ElementTree object.

# load and parse the file
tree  =  ElementTree.parse(filename)

Once the load is successful, we can take the root element of the document and make an XPath request to the root element.

# get the root of the document
root  =  tree.getroot()

We can use the findall() function with the ‘.//bndbox’ argument to get all of the ‘bndbox’ elements, then iterate over each one to extract the xmin, ymin, xmax, and ymax values that define each bounding box.

Text inside an element can also be parsed to integer values.

# extract each bounding box
for  box in  root.findall('.//bndbox'):
	xmin  =  int(box.find('xmin').text)
	ymin  =  int(box.find('ymin').text)
	xmax  =  int(box.find('xmax').text)
	ymax  =  int(box.find('ymax').text)
	coors  =  [xmin,  ymin,  xmax,  ymax]

We can then organize all the border definition values into a list.

The size of the image is also useful and can be obtained by direct request.

# extract image dimensions
width  =  int(root.find('.//size/width').text)
height  =  int(root.find('.//size/height').text)

We can tie all of this together into a function that takes an annotation filename as input, extracts the details (bounding boxes and image dimensions), and returns them for use.

The following extract_boxes() function is an implementation of the above.

# function to extract bounding box values from an annotation file
def extract_boxes(filename):
	# load and parse the file
	tree = ElementTree.parse(filename)
	# get the root of the document
	root = tree.getroot()
	# extract each bounding box
	boxes = list()
	for box in root.findall('.//bndbox'):
		xmin = int(box.find('xmin').text)
		ymin = int(box.find('ymin').text)
		xmax = int(box.find('xmax').text)
		ymax = int(box.find('ymax').text)
		coors = [xmin, ymin, xmax, ymax]
		boxes.append(coors)
	# extract image dimensions
	width = int(root.find('.//size/width').text)
	height = int(root.find('.//size/height').text)
	return boxes, width, height

We can now test the function by calling it with the first annotation file in the directory as its argument.

The complete example is shown below.

# example of extracting bounding boxes from an annotation file
from xml.etree import ElementTree

# function to extract bounding box values from an annotation file
def extract_boxes(filename):
	# load and parse the file
	tree = ElementTree.parse(filename)
	# get the root of the document
	root = tree.getroot()
	# extract each bounding box
	boxes = list()
	for box in root.findall('.//bndbox'):
		xmin = int(box.find('xmin').text)
		ymin = int(box.find('ymin').text)
		xmax = int(box.find('xmax').text)
		ymax = int(box.find('ymax').text)
		coors = [xmin, ymin, xmax, ymax]
		boxes.append(coors)
	# extract image dimensions
	width = int(root.find('.//size/width').text)
	height = int(root.find('.//size/height').text)
	return boxes, width, height

# extract details from the first annotation file
boxes, w, h = extract_boxes('kangaroo/annots/00001.xml')
# summarize the extracted details
print(boxes, w, h)

Running the example returns a list containing the details of each bounding box in the annotation file, as well as the width and height of the image.

[[233, 89, 386, 262], [134, 105, 341, 253]] 450 319

Now that we have learned how to load the annotation file, we will learn how to use this feature to create a dataset object.

Create a Kangaroo dataset object

Mask R-CNN requires that training, validation, and test datasets be managed by an mrcnn.utils.Dataset object.

This means that a new class must extend the mrcnn.utils.Dataset class and define a function to load the dataset, which can have any name we like, such as load_dataset(). The class must also override the load_mask() function for loading masks and the image_reference() function for loading an image reference (a path or URL).

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# ...

	# load the masks for an image
	def load_mask(self, image_id):
		# ...

	# load an image reference
	def image_reference(self, image_id):
		# ...

To use a Dataset object, it is first instantiated, then your custom load function is called, and finally the built-in prepare() function is called.

For example, we are going to create a class called KangarooDataset that will be used in the following way:

# Prepare the data set
train_set  =  KangarooDataset()
train_set.load_dataset(...)
train_set.prepare()

The custom load function, load_dataset(), is responsible for both defining the classes and defining the images in the dataset.

Classes are defined by calling the built-in add_class() function and specifying the ‘source’ (the name of the dataset), the integer ‘class_id’ for the class (e.g. 1 for the first class; don’t use 0, as it is reserved for the background class), and the ‘class_name’ (e.g. ‘kangaroo’).

# define one class
self.add_class("dataset", 1, "kangaroo")

Images are defined by calling the built-in add_image() function and specifying the ‘source’ (the dataset name), a unique ‘image_id’ (e.g. the filename without its extension, such as ‘00001’), and the path to the image to load (e.g. ‘kangaroo/images/00001.jpg’).

This defines an “image info” dictionary for the image, so that the image can later be retrieved by its index or ordinal in the dataset. You can also specify other arguments, which will be added to the dictionary, such as an ‘annotation’ argument pointing to the annotation file.

# add to dataset
self.add_image('dataset',  image_id='00001',  path='kangaroo/images/00001.jpg',  annotation='kangaroo/annots/00001.xml')

For example, we can run the load_dataset() function with the path to the dataset directory as an argument, and it will load all of the images in the dataset.

Note that testing showed there is a problem with the image numbered ‘00090’, so we exclude it from the dataset.

# load the dataset definitions
def load_dataset(self, dataset_dir):
	# define one class
	self.add_class("dataset", 1, "kangaroo")
	# define data locations
	images_dir = dataset_dir + '/images/'
	annotations_dir = dataset_dir + '/annots/'
	# find all images
	for filename in listdir(images_dir):
		# extract image id
		image_id = filename[:-4]
		# skip bad images
		if image_id in ['00090']:
			continue
		img_path = images_dir + filename
		ann_path = annotations_dir + image_id + '.xml'
		# add to dataset
		self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

We can go one step further and add an argument to the function that defines whether the Dataset instance is for training or for test/validation. We have about 160 photos, so we can use about 20% of them, the last 32 photos, as a test or validation set, and the first 131, about 80%, as the training set.

This split can be made using the number in each filename: photos numbered below 150 are used for training, and those numbered 150 or above are used for testing. The updated load_dataset() function, supporting both train and test datasets, is listed below.

# load the dataset definitions
def load_dataset(self, dataset_dir, is_train=True):
	# define one class
	self.add_class("dataset", 1, "kangaroo")
	# define data locations
	images_dir = dataset_dir + '/images/'
	annotations_dir = dataset_dir + '/annots/'
	# find all images
	for filename in listdir(images_dir):
		# extract image id
		image_id = filename[:-4]
		# skip bad images
		if image_id in ['00090']:
			continue
		# skip all images after 150 if we are building the train set
		if is_train and int(image_id) >= 150:
			continue
		# skip all images before 150 if we are building the test/val set
		if not is_train and int(image_id) < 150:
			continue
		img_path = images_dir + filename
		ann_path = annotations_dir + image_id + '.xml'
		# add to dataset
		self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

Next, we need to define the function load_mask() to load the mask for the given ‘image_id’.

Here, an ‘image_id’ is the integer index of an image in the dataset, assigned according to the order in which images were added via add_image() when the dataset was loaded. The function must return an array of one or more masks associated with the image_id, along with a class for each mask.

We don’t have masks, but we do have bounding boxes: we can load the boxes for a given image and return them as masks. The library will then infer the bounding boxes from our ‘masks’, which will have the same size.
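To see why this works, here is a plain-NumPy sketch (not the library’s internal code) recovering a bounding box from a binary mask:

```python
import numpy as np

# Build a mask for the first sample box (xmin=233, ymin=89, xmax=386, ymax=262)
# on a 319x450 image, then recover the box extent from the mask alone.
mask = np.zeros((319, 450), dtype='uint8')
mask[89:262, 233:386] = 1
rows = np.any(mask, axis=1)
cols = np.any(mask, axis=0)
ymin, ymax = np.where(rows)[0][[0, -1]]
xmin, xmax = np.where(cols)[0][[0, -1]]
print(xmin, ymin, xmax, ymax)  # inclusive pixel indices: 233 89 385 261
```

Note that the recovered max values are inclusive pixel indices, one less than the exclusive slice bounds used to fill the mask.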

First, we must load the annotation file for the image_id. This involves retrieving the ‘image info’ dictionary for the image_id, then retrieving the annotation path that we stored in the earlier call to add_image(). We can then use the path in a call to extract_boxes(), defined in the previous section, to get the list of bounding boxes and the image dimensions.

# get details of the image
info = self.image_info[image_id]
# define the box file location
path = info['annotation']
# load the XML
boxes, w, h = self.extract_boxes(path)

Now we can define a mask for each border, along with an associated class.

A mask is a two-dimensional array with the same dimensions as the image, in which positions that do not belong to the object have the value 0 and positions that do belong to the object have the value 1.

We can achieve this by creating a NumPy array of all zeros with the known size of the image, and one channel per bounding box:

# create one array for all masks, each on a different channel
masks  =  zeros([h,  w,  len(boxes)],  dtype='uint8')

Each bounding box is defined by the min and max x and y coordinates of the box.

These values can be used directly to define the range of rows and columns in an array with a value of 1.

# create mask
for i in range(len(boxes)):
	box = boxes[i]
	row_s, row_e = box[1], box[3]
	col_s, col_e = box[0], box[2]
	masks[row_s:row_e, col_s:col_e, i] = 1
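Using the two boxes from the sample annotation, this standalone sketch (plain NumPy, outside the Dataset class) builds the mask array and checks the filled area of the first channel:

```python
import numpy as np

# Two boxes from the sample annotation file, as [xmin, ymin, xmax, ymax],
# on a 450x319 image.
boxes = [[233, 89, 386, 262], [134, 105, 341, 253]]
w, h = 450, 319
masks = np.zeros([h, w, len(boxes)], dtype='uint8')
for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
	# rows span ymin:ymax, columns span xmin:xmax
	masks[ymin:ymax, xmin:xmax, i] = 1
print(masks.shape)                # (319, 450, 2)
print(int(masks[:, :, 0].sum()))  # (386-233) * (262-89) = 26469
```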

In this dataset all objects have the same class; we can retrieve the class index via the ‘class_names’ dictionary and then add the index, along with the mask, to the lists to be returned.

self.class_names.index('kangaroo')

Tying all of this together, the completed load_mask() function is listed below.

# load the masks for an image
def load_mask(self, image_id):
	# get details of the image
	info = self.image_info[image_id]
	# define the box file location
	path = info['annotation']
	# load the XML
	boxes, w, h = self.extract_boxes(path)
	# create one array for all masks, each on a different channel
	masks = zeros([h, w, len(boxes)], dtype='uint8')
	# create masks
	class_ids = list()
	for i in range(len(boxes)):
		box = boxes[i]
		row_s, row_e = box[1], box[3]
		col_s, col_e = box[0], box[2]
		masks[row_s:row_e, col_s:col_e, i] = 1
		class_ids.append(self.class_names.index('kangaroo'))
	return masks, asarray(class_ids, dtype='int32')

Finally, we must implement the image_reference() function. It simply returns the path or URL for a given ‘image_id’: here, the ‘path’ property of the ‘image info’ dictionary.

# load image references
def image_reference(self, image_id):
	info = self.image_info[image_id]
	return info['path']

And that’s it. We have successfully defined a Dataset object for the mask-rcnn library for our kangaroo dataset.

The complete listing, including the class and functions that create the train and test datasets, is provided below.

# split the data into train and test sets
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of the image
		info = self.image_info[image_id]
		# define the box file location
		path = info['annotation']
		# load the XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))

# test/val set
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))

Running the example successfully loads and prepares the train and test sets and prints the number of images in each.

Train: 131
Test: 32

Now that we have the data set defined, we need to make sure that the images, masks, and borders are handled correctly.

Test kangaroo dataset object

The first useful test is to verify that the image and mask load correctly.

We can do this by creating a dataset and calling load_image() with image_id to load the image, and then load_mask() with the same image_id to load the mask.

# load image
image_id = 0
image = train_set.load_image(image_id)
print(image.shape)
# Load image mask
mask, class_ids = train_set.load_mask(image_id)
print(mask.shape)

Next, we can plot the photograph using the Matplotlib API, then plot the first mask over the top with an alpha value so that the photograph beneath remains visible.

# draw an image
pyplot.imshow(image)
# Draw mask
pyplot.imshow(mask[:, :, 0], cmap='gray', alpha=0.5)
pyplot.show()

A complete code example is shown below.

# plot one photograph and its masks
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from matplotlib import pyplot

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of the image
		info = self.image_info[image_id]
		# define the box file location
		path = info['annotation']
		# load the XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
# load an image
image_id = 0
image = train_set.load_image(image_id)
print(image.shape)
# load the masks for the image
mask, class_ids = train_set.load_mask(image_id)
print(mask.shape)
# plot the image
pyplot.imshow(image)
# plot the mask
pyplot.imshow(mask[:, :, 0], cmap='gray', alpha=0.5)
pyplot.show()

Running the example first prints the shapes of the photograph and mask NumPy arrays.

We can confirm that the two arrays have the same width and height and differ only in the number of channels. We can also see that the first photograph (image_id = 0) has only one mask in this case.

(626, 899, 3)
(626, 899, 1)

A plot of the photograph is also created, with the first mask overlaid.

In this case, we can see a kangaroo in the photo with a mask overlaid that covers its extent.

Photograph of a kangaroo with an object detection mask overlaid

We can repeat this for the first nine photos in the dataset, plotting each photograph as a subplot and plotting all of the masks for each photo.

# plot the first few images
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# plot raw pixel data
	image = train_set.load_image(i)
	pyplot.imshow(image)
	# plot all masks
	mask, _ = train_set.load_mask(i)
	for j in range(mask.shape[2]):
		pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
# show the figure
pyplot.show()

Running the sample code, we can see that the images are loaded correctly, and the images containing multiple objects have their masks defined correctly.

Plot of the first nine kangaroo photographs in the training set with object detection masks

Another useful debugging step is to load all the ‘image info’ objects in the dataset and print them on the console.

This helps confirm that all calls to the add_image() function in the load_dataset() function work as expected.

# enumerate all images in the dataset
for image_id in train_set.image_ids:
	# load image info
	info = train_set.image_info[image_id]
	# display on the console
	print(info)

Running this code with the loaded training dataset will show all of the ‘image info’ dictionaries, containing the path and id of each image in the dataset.

{'id': '00132', 'source': 'dataset', 'path': 'kangaroo/images/00132.jpg', 'annotation': 'kangaroo/annots/00132.xml'}
{'id': '00046', 'source': 'dataset', 'path': 'kangaroo/images/00046.jpg', 'annotation': 'kangaroo/annots/00046.xml'}
{'id': '00052', 'source': 'dataset', 'path': 'kangaroo/images/00052.jpg', 'annotation': 'kangaroo/annots/00052.xml'}
...

Finally, the mask-rcnn library provides utilities for displaying images and masks. We can use some of these built-in methods to confirm that the dataset is behaving correctly.

For example, mask-rcnn provides the mrcnn.visualize.display_instances() function, which displays a photograph with bounding boxes, masks, and class labels. The bounding boxes are first extracted from the masks via the extract_bboxes() function.

# define image id
image_id = 1
# load the image
image = train_set.load_image(image_id)
# load the masks and the class ids
mask, class_ids = train_set.load_mask(image_id)
# extract bounding boxes from the masks
bbox = extract_bboxes(mask)
# display image with masks and bounding boxes
display_instances(image, bbox, mask, class_ids, train_set.class_names)

For completeness, the full code listing for this step is provided below.

# plot one image with masks and bounding boxes
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from mrcnn.visualize import display_instances
from mrcnn.utils import extract_bboxes

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
# define image id
image_id = 1
# load the image
image = train_set.load_image(image_id)
# load the masks and the class ids
mask, class_ids = train_set.load_mask(image_id)
# extract bounding boxes from the masks
bbox = extract_bboxes(mask)
# display image with masks and bounding boxes
display_instances(image, bbox, mask, class_ids, train_set.class_names)

Running this sample code creates an image with the mask for each object shown in a different color.

By design, the bounding boxes and masks match each other closely, and the boxes are marked on the image with a dotted outline. Each object is also marked with a class label, in this case the ‘kangaroo’ class.

Photograph of a kangaroo from the training set, shown with its object detection mask, bounding box, and class label

Now that we are sure that the data set will be loaded correctly, we can use it to fit the Mask R-CNN model.

How to train a Mask R-CNN model to detect kangaroos

The Mask R-CNN model can be fitted from scratch, but as with other computer vision applications, it can save time and improve performance by using a transfer learning approach.

A Mask R-CNN model pre-fit on the MS COCO object detection task can be used as a starting point and then adapted to the specific dataset, in this case the kangaroo dataset.

The first step is to download the model file (architecture and weights) for the pre-fit Mask R-CNN model. The weights are available from the GitHub project as a file of about 250 MB.

Download the model weights to the file ‘mask_rcnn_coco.h5’ in your working directory.

  • Download the weights file (mask_rcnn_coco.h5), about 246 MB

Next, you must define a configuration object for the model.

This new class extends the mrcnn.config.Config class and defines properties of both the prediction problem (such as the name and number of classes) and the algorithm for training the model (such as the learning rate).

The configuration must have a name defined by the ‘NAME’ attribute, e.g. ‘kangaroo_cfg’, which is used when saving model details and checkpoints to file during a run. It must also define the number of classes in the prediction problem via the ‘NUM_CLASSES’ attribute. In this case, we have only one object class to identify (kangaroo), plus one class for the background.

Finally, we must define the number of samples (images) used in each training epoch. This is the number of images in the training set, which is 131.

Putting all this together, our custom KangarooConfig class is defined as follows.

# define a configuration for the model
class KangarooConfig(Config):
	# give the configuration a recognizable name
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# number of training steps per epoch
	STEPS_PER_EPOCH = 131

# prepare config
config = KangarooConfig()

Now we can define the model.

We can create the model via an instance of the mrcnn.model.MaskRCNN class; setting the ‘mode’ attribute to ‘training’ configures the model for training.

The ‘config’ argument must be assigned an instance of our KangarooConfig class.

Finally, a directory is needed where configuration files and model checkpoints can be saved at the end of each epoch. We will simply use the current working directory.

# define the model
model = MaskRCNN(mode='training', model_dir='./', config=config)

Next, the pre-defined model architecture and weights can be loaded. This is done by calling the load_weights() function on the model, specifying the path to the downloaded ‘mask_rcnn_coco.h5’ file.

The model will be used as-is, except that the class-specific output layers are removed so that new output layers can be defined and trained. This is done by specifying the ‘exclude’ argument, listing all the output layers to exclude (not load) once the model is loaded. This includes the output layers for the classification labels, bounding boxes, and masks.

# load weights (mscoco)
model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])

Next, the model can be fit on the training set by calling the train() function and passing in both the training and validation datasets. We can also specify the learning rate; here we use the default of 0.001.

We can also specify which layers to train. In this case, we train only the ‘heads’, that is, the output layers of the model.

# train weights (output layers or 'heads')
model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

We could follow this training step with a further round of fine-tuning of all weights in the model, using a smaller learning rate and changing the ‘layers’ argument from ‘heads’ to ‘all’.
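As a sketch, that second fine-tuning pass might look like the following, assuming the model, train_set, and test_set from above are still in scope (the tenfold reduction of the learning rate and the epoch count are illustrative choices, not values from this tutorial):

```python
# fine-tune all layers with a reduced learning rate (illustrative values)
model.train(train_set, test_set,
            learning_rate=config.LEARNING_RATE / 10,  # e.g. 0.0001 instead of 0.001
            epochs=10,   # continue training for a few more epochs
            layers='all')  # update every layer, not just the output heads
```

The lower learning rate keeps the pre-trained backbone weights from being disturbed too aggressively while they adapt to the new dataset.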

The complete code for training a Mask R-CNN model on the kangaroo dataset is listed below.

This may take some time to run, even on decent hardware, so I recommend running it on a GPU, for example on Amazon EC2; on P3-type hardware it finishes in about five minutes.

# fit a mask rcnn model on the kangaroo dataset
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from mrcnn.config import Config
from mrcnn.model import MaskRCNN

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# define a configuration for the model
class KangarooConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# number of training steps per epoch
	STEPS_PER_EPOCH = 131

# prepare train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# prepare test/val set
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))
# prepare config
config = KangarooConfig()
config.display()
# define the model
model = MaskRCNN(mode='training', model_dir='./', config=config)
# load weights (mscoco) and exclude the output layers
model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
# train weights (output layers or 'heads')
model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

Running the sample code reports progress using the standard Keras progress bar.

We can see that a separate train and test loss score is reported for each output head of the network, which can make the loss values confusing to follow.

In this example we are interested in object detection rather than object segmentation, so I recommend paying attention to the classification losses on the train and validation sets (e.g. mrcnn_class_loss and val_mrcnn_class_loss), as well as the bounding box losses (mrcnn_bbox_loss and val_mrcnn_bbox_loss).

Epoch 1/5
131/131 [==============================] - 106s 811ms/step - loss: 0.8491 - rpn_class_loss: 0.0044 - rpn_bbox_loss: 0.1452 - mrcnn_class_loss: 0.0420 - mrcnn_bbox_loss: 0.2874 - mrcnn_mask_loss: 0.3701 - val_loss: 1.3402 - val_rpn_class_loss: 0.0160 - val_rpn_bbox_loss: 0.7913 - val_mrcnn_class_loss: 0.0092 - val_mrcnn_bbox_loss: 0.2263 - val_mrcnn_mask_loss: 0.2975
Epoch 2/5
131/131 [==============================] - 69s 526ms/step - loss: 0.4774 - rpn_class_loss: 0.0025 - rpn_bbox_loss: 0.1159 - mrcnn_class_loss: 0.0170 - mrcnn_bbox_loss: 0.1134 - mrcnn_mask_loss: 0.2285 - val_loss: 0.6261 - val_rpn_class_loss: 8.9502e-04 - val_rpn_bbox_loss: 0.1624 - val_mrcnn_class_loss: 0.0197 - val_mrcnn_bbox_loss: 0.2148 - val_mrcnn_mask_loss: 0.2282
Epoch 3/5
131/131 [==============================] - 67s 515ms/step - loss: 0.4471 - rpn_class_loss: 0.0029 - rpn_bbox_loss: 0.1153 - mrcnn_class_loss: 0.0234 - mrcnn_bbox_loss: 0.0958 - mrcnn_mask_loss: 0.2097 - val_loss: 1.2998 - val_rpn_class_loss: 0.0144 - val_rpn_bbox_loss: 0.6712 - val_mrcnn_class_loss: 0.0372 - val_mrcnn_bbox_loss: 0.2645 - val_mrcnn_mask_loss: 0.3125
Epoch 4/5
131/131 [==============================] - 66s 502ms/step - loss: 0.3934 - rpn_class_loss: 0.0026 - rpn_bbox_loss: 0.1003 - mrcnn_class_loss: 0.0171 - mrcnn_bbox_loss: 0.0806 - mrcnn_mask_loss: 0.1928 - val_loss: 0.6709 - val_rpn_class_loss: 0.0016 - val_rpn_bbox_loss: 0.2012 - val_mrcnn_class_loss: 0.0244 - val_mrcnn_bbox_loss: 0.1942 - val_mrcnn_mask_loss: 0.2495
Epoch 5/5
131/131 [==============================] - 65s 493ms/step - loss: 0.3357 - rpn_class_loss: 0.0024 - rpn_bbox_loss: 0.0804 - mrcnn_class_loss: 0.0193 - mrcnn_bbox_loss: 0.0616 - mrcnn_mask_loss: 0.1721 - val_loss: 0.8878 - val_rpn_class_loss: 0.0030 - val_rpn_bbox_loss: 0.4409 - val_mrcnn_class_loss: 0.0174 - val_mrcnn_bbox_loss: 0.1752 - val_mrcnn_mask_loss: 0.2513

After each epoch, a model file is created and saved in a subdirectory whose name starts with ‘kangaroo_cfg’ followed by random characters.

To use the model, we must choose one of the saved files; in this case, the bounding box loss continued to decrease with each epoch, so we will use the final model produced by the run, ‘mask_rcnn_kangaroo_cfg_0005.h5’.

Copy the model file from the configuration directory into your current working directory. We will use it in the following sections to evaluate the model and make predictions on new images.
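If several checkpoints have accumulated, a small helper can pick the newest one by its epoch suffix. This is only a sketch, not part of the Mask R-CNN library; the latest_checkpoint() helper and the example file names are illustrative:

```python
from pathlib import Path

def latest_checkpoint(paths):
    """Return the checkpoint path with the highest epoch suffix, e.g. _0005."""
    # sort by the numeric suffix before '.h5' (mask_rcnn_kangaroo_cfg_0005.h5 -> 5)
    return max(paths, key=lambda p: int(Path(p).stem.rsplit('_', 1)[-1]))

# illustrative checkpoint names, as written by the library during training
ckpts = ['mask_rcnn_kangaroo_cfg_0001.h5',
         'mask_rcnn_kangaroo_cfg_0005.h5',
         'mask_rcnn_kangaroo_cfg_0003.h5']
print(latest_checkpoint(ckpts))  # -> mask_rcnn_kangaroo_cfg_0005.h5
# on a real run directory, the list could come from e.g.:
# ckpts = list(Path('.').glob('kangaroo_cfg*/mask_rcnn_kangaroo_cfg_*.h5'))
```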

These results suggest that more training epochs might improve the model further, as might fine-tuning the weights of all layers in the model; both would make interesting extensions to this article.

Let’s take a look at the performance evaluation of this model.

How to evaluate the Mask R-CNN model

The performance of an object detection model is conventionally evaluated using the mean average precision, or mAP.

We are predicting bounding box positions, so we can determine how good a prediction is by how well the predicted box overlaps with the actual box. This is computed by dividing the area of overlap by the total combined area of the two boxes, a measure called intersection over union, or IoU. A perfect bounding box prediction has an IoU of 1.

By convention, a predicted box is considered good if its IoU is greater than 0.5, i.e. the boxes overlap by 50% or more of their combined area.
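To make the definition concrete, here is a minimal, self-contained sketch of the IoU calculation for two boxes in [xmin, ymin, xmax, ymax] form (this is not the implementation used inside compute_ap(), only an illustration):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [xmin, ymin, xmax, ymax]."""
    # coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # intersection area (zero if the boxes do not overlap)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # union = sum of both areas minus the shared area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

print(iou([0, 0, 10, 10], [0, 0, 10, 10]))  # identical boxes -> 1.0
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))  # half-shifted boxes -> 50/150 = 0.333...
```

The second example overlaps by half a box yet scores only about 0.33, which is why a threshold of 0.5 is already a reasonably strict match.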

Precision is the percentage of predicted bounding boxes that are correct (i.e. have IoU > 0.5 with a real object) out of all predicted boxes. Recall is the percentage of real objects in the images that are covered by a correctly predicted box (IoU > 0.5).

Recall increases as we make more predictions, but precision may drop or fluctuate as the model begins to produce false positives. Precision (y-axis) can be plotted against recall (x-axis) for each confidence threshold to give a curve. Taking the maximum precision achieved at each recall level and averaging these values gives the average precision, or AP.
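This precision-envelope averaging can be sketched as follows, assuming predictions are already sorted by descending confidence and marked as correct or not (real implementations, such as the PASCAL VOC protocol or compute_ap(), differ in detail):

```python
def average_precision(hits, n_objects):
    """AP from a confidence-sorted list of predictions.

    hits: list of booleans, True where the prediction matched a real object
    n_objects: total number of real objects in the images
    """
    # precision/recall after each successive prediction
    precisions, recalls = [], []
    tp = 0
    for i, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / i)
        recalls.append(tp / n_objects)
    # at each recall level, keep the best precision achieved at or after it
    ap = 0.0
    prev_recall = 0.0
    for i, r in enumerate(recalls):
        best = max(precisions[i:])  # the precision envelope
        ap += (r - prev_recall) * best
        prev_recall = r
    return ap

print(average_precision([True, True, False, True], n_objects=4))  # -> 0.6875
```

Averaging this AP value over every image in a dataset gives the mAP described next.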

Note: there are many variations on how AP is calculated. For example, the widely used PASCAL VOC and MS COCO datasets compute it differently.

The mean of the average precision (AP) over all images in a dataset is called the mean average precision, or mAP.

The Mask R-CNN library provides the mrcnn.utils.compute_ap function to calculate the AP and other metrics for a given image. The AP values across the whole dataset can be collected and the mean calculated, giving an idea of how well the model detects objects in that dataset.

First, we must define a new Config object to be used for making predictions rather than for training. We could extend the previously defined KangarooConfig to reuse its parameters, but to keep the example simple we will define a new object with the same property values. The configuration must change some of the defaults around using the GPU for inference, which differ from those used for training the model (regardless of whether you run the code on a GPU or CPU).

# define the prediction configuration
class PredictionConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# simplify GPU config
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1

Next, we can use the configuration to define the model, this time setting the ‘mode’ argument to ‘inference’ instead of ‘training’.

# create config
cfg = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)

Next, we can load weights from the saved model.

This is done by specifying the path to the model file; in this example, that is the file ‘mask_rcnn_kangaroo_cfg_0005.h5’ in the current working directory.

# load model weights
model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)

Next, we can evaluate the model. This involves enumerating the images in a dataset, making a prediction for each, calculating the AP for the prediction, and then computing the mean AP across all images.

The first step is to load the image and the real mask from the dataset with the specified image_id. This is done by using the load_image_gt() handy function.

# load image, bounding boxes and masks for the image id
image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)

Next, the pixel values of the loaded image must be scaled in the same way as the training data, e.g. centered. This can be done using the mold_image() convenience function.

# convert pixel values (e.g. center)
scaled_image = mold_image(image, cfg)

The dimensions of the image then need to be expanded into a one-sample batch, which serves as the input for a model prediction.

# convert image into one sample
sample = expand_dims(scaled_image, 0)
# make prediction
yhat = model.detect(sample, verbose=0)
# extract results for first sample
r = yhat[0]

The predicted value can then be compared to the real value and the metrics are calculated using the compute_ap() function.

# calculate statistics, including AP
AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])

AP values are added to a list and then averaged.

Putting the above together, the evaluate_model() function below implements this process, computing the mAP given a dataset, model, and configuration.

# calculate the mAP for a model on a given dataset
def evaluate_model(dataset, model, cfg):
	APs = list()
	for image_id in dataset.image_ids:
		# load image, bounding boxes and masks for the image id
		image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
		# convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# convert image into one sample
		sample = expand_dims(scaled_image, 0)
		# make prediction
		yhat = model.detect(sample, verbose=0)
		# extract results for first sample
		r = yhat[0]
		# calculate statistics, including AP
		AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
		# store
		APs.append(AP)
	# calculate the mean AP across all images
	mAP = mean(APs)
	return mAP

Now we can calculate the mAP of the model on the training set and the test set.

# evaluate model on training dataset
train_mAP = evaluate_model(train_set, model, cfg)
print("Train mAP: %.3f" % train_mAP)
# evaluate model on test dataset
test_mAP = evaluate_model(test_set, model, cfg)
print("Test mAP: %.3f" % test_mAP)

The complete code is shown below.

# evaluate the mask rcnn model on the kangaroo dataset
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from numpy import expand_dims
from numpy import mean
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.utils import Dataset
from mrcnn.utils import compute_ap
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# define the prediction configuration
class PredictionConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# simplify GPU config
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1

# calculate the mAP for a model on a given dataset
def evaluate_model(dataset, model, cfg):
	APs = list()
	for image_id in dataset.image_ids:
		# load image, bounding boxes and masks for the image id
		image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
		# convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# convert image into one sample
		sample = expand_dims(scaled_image, 0)
		# make prediction
		yhat = model.detect(sample, verbose=0)
		# extract results for first sample
		r = yhat[0]
		# calculate statistics, including AP
		AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
		# store
		APs.append(AP)
	# calculate the mean AP across all images
	mAP = mean(APs)
	return mAP

# load the train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# load the test set
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))
# create config
cfg = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)
# load model weights
model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)
# evaluate model on training dataset
train_mAP = evaluate_model(train_set, model, cfg)
print("Train mAP: %.3f" % train_mAP)
# evaluate model on test dataset
test_mAP = evaluate_model(test_set, model, cfg)
print("Test mAP: %.3f" % test_mAP)

Running the sample code will make a prediction for each image in the training set and test set and calculate a mAP for each prediction.

A mAP at or above 90% is a good score. We can see that the mAP score is good on both datasets, and perhaps even better on the test set than on the training set.

This may be because the test set is smaller, or because the model has become more accurate with further training.

Train mAP: 0.929
Test mAP: 0.958

Now that we are convinced that the model is sound, we can use it to make predictions.

How to detect kangaroos in new photos

We can use the trained model to detect kangaroos in new images, specifically, in photos that we expect to contain kangaroos.

First, we need a new photo of a kangaroo.

We can go to Flickr and randomly pick an image of a kangaroo. Alternatively, you can use images that are not used to train the model in the test set.

In the previous sections, we saw how to make a prediction on an image. Specifically, the pixel values of the image are scaled, and the model.detect() function is called. For example:

# example of making a prediction
...
# load image
image = ...
# convert pixel values (e.g. center)
scaled_image = mold_image(image, cfg)
# convert image into one sample
sample = expand_dims(scaled_image, 0)
# make prediction
yhat = model.detect(sample, verbose=0)
...

Let’s go a step further and make predictions for several images in the dataset, then plot each image with the actual bounding boxes alongside the predicted ones, so that we can see directly how accurate the model is.

The first step is to load the image and mask from the dataset.

# load the image and mask
image = dataset.load_image(image_id)
mask, _ = dataset.load_mask(image_id)

Next, we can make predictions about the image.

# convert pixel values (e.g. center)
scaled_image = mold_image(image, cfg)
# convert image into one sample
sample = expand_dims(scaled_image, 0)
# make prediction
yhat = model.detect(sample, verbose=0)[0]

Next, we can create a subplot for the image and plot it with the ground-truth masks.

# define subplot
pyplot.subplot(n_images, 2, i*2+1)
# plot raw pixel data
pyplot.imshow(image)
pyplot.title('Actual')
# plot masks
for j in range(mask.shape[2]):
	pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)

We can then create a second subplot beside the first and plot the image again, this time with the predicted bounding boxes drawn on top.

# get the context for drawing boxes
pyplot.subplot(n_images, 2, i*2+2)
# plot raw pixel data
pyplot.imshow(image)
pyplot.title('Predicted')
ax = pyplot.gca()
# plot each box
for box in yhat['rois']:
	# get coordinates
	y1, x1, y2, x2 = box
	# calculate width and height of the box
	width, height = x2 - x1, y2 - y1
	# create the shape
	rect = Rectangle((x1, y1), width, height, fill=False, color='red')
	# draw the box
	ax.add_patch(rect)

We can tie all of this together into a function that takes a dataset, model, and configuration, and plots the first five images of the dataset with both ground-truth and predicted bounding boxes.

# plot a number of photos with ground truth and predictions
def plot_actual_vs_predicted(dataset, model, cfg, n_images=5):
	# plot first few images
	for i in range(n_images):
		# load the image and mask
		image = dataset.load_image(i)
		mask, _ = dataset.load_mask(i)
		# convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# convert image into one sample
		sample = expand_dims(scaled_image, 0)
		# make prediction
		yhat = model.detect(sample, verbose=0)[0]
		# define subplot
		pyplot.subplot(n_images, 2, i*2+1)
		# plot raw pixel data
		pyplot.imshow(image)
		pyplot.title('Actual')
		# plot masks
		for j in range(mask.shape[2]):
			pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
		# get the context for drawing boxes
		pyplot.subplot(n_images, 2, i*2+2)
		# plot raw pixel data
		pyplot.imshow(image)
		pyplot.title('Predicted')
		ax = pyplot.gca()
		# plot each box
		for box in yhat['rois']:
			# get coordinates
			y1, x1, y2, x2 = box
			# calculate width and height of the box
			width, height = x2 - x1, y2 - y1
			# create the shape
			rect = Rectangle((x1, y1), width, height, fill=False, color='red')
			# draw the box
			ax.add_patch(rect)
	# show the figure
	pyplot.show()

The complete code for loading the trained model and predicting the first few images in the training set and test set is shown below.

# Use mask RCNN model to detect kangaroos in images
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from numpy import expand_dims
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.model import mold_image
from mrcnn.utils import Dataset

Define and load the class of the kangaroo dataset
class KangarooDataset(Dataset):
	Load the dataset definition
	def load_dataset(self, dataset_dir, is_train=True):
		Define a class
		self.add_class("dataset".1."kangaroo")
		Define the location of the data
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# Locate all images
		for filename in listdir(images_dir):
			Extract image ID
			image_id = filename[:4 -]
			# Skip unqualified images
			if image_id in ['00090'] :continue
			# If we are building a training set, skip all images after number 150
			if is_train and int(image_id) >= 150:
				continue
			# If we are building a test/validation set, skip all images up to number 150
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# Extract all bounding box information from an annotation file
	def extract_boxes(self, filename):
		# Load and parse the file
		root = ElementTree.parse(filename)
		boxes = list()
		# Extract each bounding box
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# Extract the image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# Load image mask
	def load_mask(self, image_id):
		# Get the image details
		info = self.image_info[image_id]
		# Define the annotation file location
		path = info['annotation']
		# Load the XML
		boxes, w, h = self.extract_boxes(path)
		# Create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create mask
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load image references
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# define the prediction configuration
class PredictionConfig(Config):
	# Define the configuration name
	NAME = "kangaroo_cfg"
	# Number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# Simplify GPU configuration
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1

# Plot a number of images with actual and predicted bounding boxes
def plot_actual_vs_predicted(dataset, model, cfg, n_images=5):
	# Load images and masks
	for i in range(n_images):
		# Load images and masks
		image = dataset.load_image(i)
		mask, _ = dataset.load_mask(i)
		# Convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# Convert images to samples
		sample = expand_dims(scaled_image, 0)
		# Make a prediction
		yhat = model.detect(sample, verbose=0)[0]
		# Define subgraph
		pyplot.subplot(n_images, 2, i*2+1)
		# Draw raw pixel data
		pyplot.imshow(image)
		pyplot.title('Actual')
		# Draw mask
		for j in range(mask.shape[2]):
			pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
		# Get the context of the drawing box
		pyplot.subplot(n_images, 2, i*2+2)
		# Draw raw pixel data
		pyplot.imshow(image)
		pyplot.title('Predicted')
		ax = pyplot.gca()
		# Draw each drawing box
		for box in yhat['rois']:
			# Get coordinates
			y1, x1, y2, x2 = box
			# Calculate the width and height of the drawing box
			width, height = x2 - x1, y2 - y1
			# Create a shape object
			rect = Rectangle((x1, y1), width, height, fill=False, color='red')
			# Draw a drawing box
			ax.add_patch(rect)
	# display the result of drawing
	pyplot.show()

# Load the training set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# Load the test set
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))
# create configuration
cfg = PredictionConfig()
# Define model
model = MaskRCNN(mode='inference', model_dir='/', config=cfg)
# Load model weights
model_path = 'mask_rcnn_kangaroo_cfg_0005.h5'
model.load_weights(model_path, by_name=True)
# Draw the predicted result of training set
plot_actual_vs_predicted(train_set, model, cfg)
# Plot the predicted results for the test set
plot_actual_vs_predicted(test_set, model, cfg)
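The `extract_boxes()` method above relies on the PASCAL VOC annotation layout used by the kangaroo dataset. The same parsing logic can be tried in isolation without the dataset files; the sketch below uses a made-up inline annotation string (not a real file from the dataset) and parses it with `ElementTree.fromstring` instead of `ElementTree.parse`.

```python
# Standalone sketch of the PASCAL VOC parsing performed by extract_boxes().
# The annotation below is a hypothetical example, not from the kangaroo dataset.
from xml.etree import ElementTree

ANNOTATION = """
<annotation>
	<size><width>640</width><height>480</height></size>
	<object>
		<name>kangaroo</name>
		<bndbox><xmin>120</xmin><ymin>85</ymin><xmax>310</xmax><ymax>400</ymax></bndbox>
	</object>
</annotation>
"""

def extract_boxes_from_string(xml_text):
	# Parse the XML document from a string rather than a file path
	root = ElementTree.fromstring(xml_text)
	boxes = list()
	# Each <bndbox> element holds one [xmin, ymin, xmax, ymax] box
	for box in root.findall('.//bndbox'):
		xmin = int(box.find('xmin').text)
		ymin = int(box.find('ymin').text)
		xmax = int(box.find('xmax').text)
		ymax = int(box.find('ymax').text)
		boxes.append([xmin, ymin, xmax, ymax])
	# The image dimensions live under the <size> element
	width = int(root.find('.//size/width').text)
	height = int(root.find('.//size/height').text)
	return boxes, width, height

boxes, width, height = extract_boxes_from_string(ANNOTATION)
print(boxes, width, height)  # [[120, 85, 310, 400]] 640 480
```

Swapping `fromstring` for `parse` is the only difference from the method in the listing, so the same loop works on the real `.xml` files in the dataset's `annots/` directory.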

Running the example creates a plot of the first five images in the training set, with the actual bounding boxes in the left column and the predicted bounding boxes in the right column.

We can see that the model performs well on these examples, finding all of the kangaroos, even in images that contain two or three of them. There is one small error in the second image in the right column, where the model predicted two bounding boxes for the same kangaroo.

Plot of kangaroo images from the training set with actual and predicted bounding boxes

The second plot shows five images from the test set with actual and predicted bounding boxes.

These images were not seen during training, and again the model detects the kangaroo in each one. We can see two small mistakes in the last two photos: in both, the same kangaroo was detected twice.

These errors could likely be reduced with further training, and perhaps with a larger dataset and data augmentation, encouraging the model to treat the detected person as background and to detect each kangaroo only once.
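A pragmatic post-processing alternative to retraining is to suppress duplicate detections directly: when two predicted boxes overlap heavily, keep only the higher-scoring one. The sketch below is a minimal intersection-over-union (IoU) filter under that assumption; the boxes and scores are made-up values, not real model output, but they follow the (y1, x1, y2, x2) ordering that Mask R-CNN uses for `rois`.

```python
# A simple IoU-based duplicate filter: a possible post-processing step for
# suppressing repeated detections of the same kangaroo. Boxes use the
# (y1, x1, y2, x2) ordering of the model's 'rois' output.

def iou(a, b):
	# Intersection-over-union of two (y1, x1, y2, x2) boxes
	y1, x1 = max(a[0], b[0]), max(a[1], b[1])
	y2, x2 = min(a[2], b[2]), min(a[3], b[3])
	inter = max(0, y2 - y1) * max(0, x2 - x1)
	area_a = (a[2] - a[0]) * (a[3] - a[1])
	area_b = (b[2] - b[0]) * (b[3] - b[1])
	return inter / float(area_a + area_b - inter)

def suppress_duplicates(boxes, scores, threshold=0.5):
	# Visit boxes from highest to lowest score, keeping a box only if it
	# does not overlap a kept box by more than the threshold
	order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
	kept = list()
	for i in order:
		if all(iou(boxes[i], boxes[j]) < threshold for j in kept):
			kept.append(i)
	return [boxes[i] for i in kept]

# Hypothetical output: two overlapping boxes on one kangaroo, one distinct box
boxes = [(50, 60, 200, 180), (55, 70, 210, 190), (300, 320, 420, 460)]
scores = [0.98, 0.75, 0.95]
print(suppress_duplicates(boxes, scores))
# [(50, 60, 200, 180), (300, 320, 420, 460)]
```

Mask R-CNN already applies non-maximum suppression internally, so a filter like this only matters for the occasional duplicates that survive it; raising the model's detection confidence threshold is another option.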

Plot of kangaroo images from the test set with actual and predicted bounding boxes

Further reading

This section provides additional resources on object detection if you are looking to go deeper.

Papers

  • Mask R-CNN, 2017.

Projects

  • Kangaroo Dataset, GitHub.
  • Mask RCNN Project, GitHub.

APIs

  • xml.etree.ElementTree API
  • matplotlib.patches.Rectangle API
  • matplotlib.pyplot.subplot API
  • matplotlib.pyplot.imshow API

Articles

  • Splash of Color: Instance Segmentation with Mask R-CNN and TensorFlow, 2018.
  • Mask R-CNN — Inspect Ballon Trained Model, Notebook
  • Mask R-CNN — Train on Shapes Dataset, Notebook.
  • mAP (mean Average Precision) for Object Detection, 2018.

Conclusion

In this tutorial, you learned how to develop a Mask R-CNN model to detect kangaroos in images.

Specifically, you learned:

  • How to prepare an object detection dataset for training an R-CNN model.
  • How to use transfer learning to train an object detection model on a new dataset.
  • How to evaluate Mask R-CNN on a test dataset and make predictions on new photos.

Do you have any other questions? Post them in the comments section below and I will do my best to answer.

If you find any mistakes in this translation or other areas that need improvement, you are welcome to revise it and submit a PR through the Nuggets Translation Project, for which you can earn reward points. The permanent link at the beginning of this article is the MarkDown link to this article on GitHub.


The Nuggets Translation Project is a community that translates quality Internet technical articles from English and shares them on Nuggets. Its content covers Android, iOS, front end, back end, blockchain, product, design, artificial intelligence, and other fields. For more high-quality translations, please follow the Nuggets Translation Project and its official Weibo and Zhihu column.