Introduction

Inception-V3 is a neural network proposed by Google for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

Inception-V3 applies the Inception block repeatedly and involves extensive convolution and pooling, while ImageNet includes over 14 million images across more than 1,000 categories

Therefore, training Inception-V3 on ImageNet yourself requires considerable resources and time

Here, we instead load the pre-trained Inception-V3 model to carry out some image classification tasks

Preparation

The pre-trained model consists of three files

  • classify_image_graph_def.pb: the Inception-V3 model structure and parameters
  • imagenet_2012_challenge_label_map_proto.pbtxt: the mapping from class number (node ID) to class string (UID)
  • imagenet_synset_to_human_label_map.txt: the mapping from class string (UID) to human-readable class name

For example, node ID 169 corresponds to UID n02510455, which maps to "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca"

Image classification

Load the libraries

# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np

Parse the two mapping files to obtain the mapping from class number to class name

uid_to_human = {}
for line in tf.gfile.GFile('imagenet_synset_to_human_label_map.txt').readlines():
	items = line.strip().split('\t')
	uid_to_human[items[0]] = items[1]

node_id_to_uid = {}
for line in tf.gfile.GFile('imagenet_2012_challenge_label_map_proto.pbtxt').readlines():
	if line.startswith('  target_class:'):
		target_class = int(line.split(': ')[1])
	if line.startswith('  target_class_string:'):
		target_class_string = line.split(': ')[1].strip('\n').strip('"')
		node_id_to_uid[target_class] = target_class_string

node_id_to_name = {}
for key, value in node_id_to_uid.items():
	node_id_to_name[key] = uid_to_human[value]
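
As a quick sanity check of the merged mapping (a sketch assuming the code above has run), the giant panda example from the preparation section can be looked up directly:

print(node_id_to_uid[169])   # expected: n02510455
print(node_id_to_name[169])  # expected: giant panda, panda, ... , Ailuropoda melanoleuca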

Load the model

def create_graph():
	with tf.gfile.FastGFile('classify_image_graph_def.pb', 'rb') as f:
		graph_def = tf.GraphDef()
		graph_def.ParseFromString(f.read())
		_ = tf.import_graph_def(graph_def, name='')
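
To verify the import, a small sketch that looks up by name the two tensors used below (it assumes create_graph() succeeds):

create_graph()
with tf.Session() as sess:
	# both names exist in the pre-trained graph when it is imported with name=''
	print(sess.graph.get_tensor_by_name('softmax:0'))
	print(sess.graph.get_tensor_by_name('DecodeJpeg/contents:0'))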

Define a function that classifies images

def classify_image(image, top_k=1):
	image_data = tf.gfile.FastGFile(image, 'rb').read()

	create_graph()

	with tf.Session() as sess:
		# 'softmax:0': A tensor containing the normalized prediction across 1000 labels
		# 'pool_3:0': A tensor containing the next-to-last layer: a 2048-float description of the image
		# 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG encoding of the image
		softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
		predictions = sess.run(softmax_tensor, feed_dict={'DecodeJpeg/contents:0': image_data})
		predictions = np.squeeze(predictions)

		top_k = predictions.argsort()[-top_k:][::-1]  # highest score first
		for node_id in top_k:
			human_string = node_id_to_name[node_id]
			score = predictions[node_id]
			print('%s (score = %.5f)' % (human_string, score))

Call classify_image() to classify an image; the top_k parameter specifies how many of the most likely classes to return

classify_image('test1.png')

The classification results are as follows

  • test1: giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
  • test2: Pekinese, Pekingese, Peke (score = 0.90348)
  • test3: Samoyed, Samoyede (score = 0.92054)

Customizing classification tasks

Inception-V3 is designed for the ImageNet classification task, so the number of neurons in its last fully connected layer equals the number of class labels

If you need a customized classification task, you can simply use your own labeled data and replace the last fully connected layer

The number of neurons in the new fully connected layer equals the number of labels in the customized task. Only the parameters of this last layer are trained; all other parameters remain unchanged

As a typical application of transfer learning, this lets Inception-V3 retain its ability to understand and abstract images while meeting the requirements of the customized classification task
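
To make the idea concrete, here is a minimal sketch of the replacement layer (not the official retrain.py implementation): a new fully connected layer sits on top of the 2048-dimensional bottleneck, and only its weights and biases are trained; num_classes is a hypothetical label count.

# A minimal sketch, assuming the 2048-d bottleneck values are fed in directly
num_classes = 5  # hypothetical: number of labels in the customized task

bottleneck_input = tf.placeholder(tf.float32, [None, 2048], name='bottleneck_input')
ground_truth = tf.placeholder(tf.float32, [None, num_classes], name='ground_truth')

# the new fully connected layer: the only trainable parameters
weights = tf.Variable(tf.truncated_normal([2048, num_classes], stddev=0.001))
biases = tf.Variable(tf.zeros([num_classes]))
logits = tf.matmul(bottleneck_input, weights) + biases
final_result = tf.nn.softmax(logits, name='final_result')

# standard cross-entropy training on the new layer only
cross_entropy = tf.reduce_mean(
	tf.nn.softmax_cross_entropy_with_logits(labels=ground_truth, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)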

TensorFlow provides a tutorial on how to do transfer learning with Inception-V3

www.tensorflow.org/tutorials/i…

The data used consists of photos of five kinds of flowers

  • daisy
  • dandelion
  • roses
  • sunflowers
  • tulips

With the last fully connected layer removed, the representation the model outputs for an input image is called the bottleneck

Computing and caching the bottleneck for every image beforehand saves a lot of training time, since only the layer between the bottleneck and the output labels then needs to be computed and trained
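
A hedged sketch of the caching step, assuming the pre-trained graph from create_graph() is loaded in the current session; each image's 2048-float 'pool_3:0' bottleneck is saved once as a .npy file in a hypothetical bottleneck_dir:

import os

def get_cached_bottleneck(sess, image_path, cache_dir='bottleneck_dir'):
	# reuse the cached bottleneck if it exists, e.g. bottleneck_dir/rose1.jpg.npy
	cache_path = os.path.join(cache_dir, os.path.basename(image_path) + '.npy')
	if os.path.exists(cache_path):
		return np.load(cache_path)
	if not os.path.exists(cache_dir):
		os.makedirs(cache_dir)
	image_data = tf.gfile.FastGFile(image_path, 'rb').read()
	# 'pool_3:0' is the next-to-last layer of the pre-trained graph
	bottleneck_tensor = sess.graph.get_tensor_by_name('pool_3:0')
	bottleneck = np.squeeze(sess.run(bottleneck_tensor, feed_dict={'DecodeJpeg/contents:0': image_data}))
	np.save(cache_path, bottleneck)
	return bottleneck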

TensorFlow officially provides the code for retraining

github.com/tensorflow/…

It is run from the command line; some of the optional command-line arguments are

  • --image_dir: Directory of training images
  • --output_graph: Path to save the trained model
  • --output_labels: Path to save the label file
  • --summaries_dir: Directory for saving training logs (summaries)
  • --how_many_training_steps: Number of training iterations. Default: 4000
  • --learning_rate: Learning rate. Default: 0.01
  • --testing_percentage: Percentage of images used as the test set. Default: 10
  • --validation_percentage: Percentage of images used as the validation set. Default: 10
  • --eval_step_interval: How often to evaluate the model. Default: every 10 iterations
  • --train_batch_size: Training batch size. Default: 100
  • --print_misclassified_test_images: Whether to print all misclassified test images. Default: False
  • --model_dir: Path of the Inception-V3 model
  • --bottleneck_dir: Directory for caching bottleneck values
  • --final_tensor_name: Name of the new final layer's output tensor. Default: final_result
  • --flip_left_right: Whether to randomly flip half of the training images horizontally. Default: False
  • --random_crop: Percentage of the image to randomly crop. Default: 0 (no cropping)
  • --random_scale: Percentage to randomly scale the image. Default: 0 (no scaling)
  • --random_brightness: Percentage to randomly adjust brightness. Default: 0 (no adjustment)
  • --architecture: Model to transfer from. Default: inception_v3, which has the highest accuracy but the longest training time; you can also choose 'mobilenet_<parameter size>_<input size>[_quantized]', e.g. mobilenet_1.0_224 and mobilenet_0.25_128_quantized

Run the code

python retrain.py --image_dir flower_photos --output_graph output_graph.pb --output_labels output_labels.txt --summaries_dir summaries_dir --model_dir .. --bottleneck_dir bottleneck_dir

This is an erratum to the content of the video

  • change the output_graph after --output_graph to output_graph.pb
  • change the output_labels after --output_labels to output_labels.txt

The classification accuracy on the validation set and test set is 91% and 91.2% respectively

It took 55 minutes on my laptop, 44 of which were spent building the bottleneck cache; without caching, the bottlenecks would have to be recomputed in every training iteration

The training logs in the summaries_dir directory can be visualized with TensorBoard

tensorboard --logdir summaries_dir

Then open http://localhost:6006 in your browser to see the visualizations, including the SCALARS, GRAPHS, DISTRIBUTIONS and HISTOGRAMS pages

To carry out other image classification tasks, organize the corresponding labeled images into a directory, using the label names as subfolder names
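
For the flower data, for example, the directory looks like this (the subfolder names become the labels):

flower_photos/
	daisy/
	dandelion/
	roses/
	sunflowers/
	tulips/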

To use the trained model, refer to the code below, where

  • output_labels.txt: path of the label file
  • output_graph.pb: path of the trained model
  • read_image(): function that reads and preprocesses an image
  • input_operation: the operation corresponding to the image input
  • output_operation: the operation corresponding to the classification output
  • test.jpg: path of the image to be classified

# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np

labels = []
for line in tf.gfile.GFile('output_labels.txt').readlines():
	labels.append(line.strip())

def create_graph():
	graph = tf.Graph()
	graph_def = tf.GraphDef()
	with open('output_graph.pb', 'rb') as f:
		graph_def.ParseFromString(f.read())
	with graph.as_default():
		tf.import_graph_def(graph_def)
	return graph

def read_image(path, height=299, width=299, mean=128, std=128):
	file_reader = tf.read_file(path, 'file_reader')
	if path.endswith('.png'):
		image_reader = tf.image.decode_png(file_reader, channels=3, name='png_reader')
	elif path.endswith('.gif'):
		image_reader = tf.squeeze(tf.image.decode_gif(file_reader, name='gif_reader'))
	elif path.endswith('.bmp'):
		image_reader = tf.image.decode_bmp(file_reader, name='bmp_reader')
	else:
		image_reader = tf.image.decode_jpeg(file_reader, channels=3, name='jpeg_reader')
	image_np = tf.cast(image_reader, tf.float32)
	image_np = tf.expand_dims(image_np, 0)
	image_np = tf.image.resize_bilinear(image_np, [height, width])
	image_np = tf.divide(tf.subtract(image_np, [mean]), [std])
	sess = tf.Session()
	image_data = sess.run(image_np)
	return image_data

def classify_image(image, top_k=1):
	image_data = read_image(image)

	graph = create_graph()

	with tf.Session(graph=graph) as sess:
		input_operation = sess.graph.get_operation_by_name('import/Mul')
		output_operation = sess.graph.get_operation_by_name('import/final_result')
		predictions = sess.run(output_operation.outputs[0], feed_dict={input_operation.outputs[0]: image_data})
		predictions = np.squeeze(predictions)

		top_k = predictions.argsort()[-top_k:][::-1]  # highest score first
		for i in top_k:
			print('%s (score = %.5f)' % (labels[i], predictions[i]))

classify_image('test.jpg')

References

  • Deep Learning with TensorFlow Part 2 - Image Classification: towardsdatascience.com/deep-learni…
  • Going Deeper with Convolutions: arxiv.org/pdf/1409.48…
  • ImageNet: image-net.org/
  • Image Recognition: www.tensorflow.org/tutorials/i…
  • How to Retrain Inception's Final Layer for New Categories: www.tensorflow.org/tutorials/i…
  • Convolutional Neural Networks - Andrew Ng's Deep Learning Specialization - NetEase Cloud Classroom: mooc.study.163.com/learn/20012…

Video lecture course

Deep and interesting (1)