Introduction
Inception-V3 is a neural network proposed by Google for the ImageNet Large Scale Visual Recognition Challenge.
Inception-V3 uses the Inception block repeatedly and involves extensive convolution and pooling, while ImageNet includes over 14 million images in more than 1,000 categories.
Therefore, training Inception-V3 on ImageNet from scratch requires considerable resources and time.
Here, we load a pre-trained Inception-V3 model to perform some image classification tasks.
Preparation
The pre-trained model consists of three parts:

- `classify_image_graph_def.pb`: the Inception-V3 model structure and parameters
- `imagenet_2012_challenge_label_map_proto.pbtxt`: the mapping from category number to category string
- `imagenet_synset_to_human_label_map.txt`: the mapping from category string to category name

For example, category number 169 maps to the string n02510455, which in turn maps to the name "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca".
Image classification
Load the libraries
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
```
Parse the two mapping files to obtain the mapping from category number to category name:
```python
# Map category string (WordNet uid) to human-readable category name
uid_to_human = {}
for line in tf.gfile.GFile('imagenet_synset_to_human_label_map.txt').readlines():
    items = line.strip().split('\t')
    uid_to_human[items[0]] = items[1]

# Map category number to category string
node_id_to_uid = {}
for line in tf.gfile.GFile('imagenet_2012_challenge_label_map_proto.pbtxt').readlines():
    if line.startswith('  target_class:'):
        target_class = int(line.split(': ')[1])
    if line.startswith('  target_class_string:'):
        target_class_string = line.split(': ')[1].strip('\n').strip('"')
        node_id_to_uid[target_class] = target_class_string

# Combine the two mappings: category number -> category name
node_id_to_name = {}
for key, value in node_id_to_uid.items():
    node_id_to_name[key] = uid_to_human[value]
```
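As a quick sanity check (a hypothetical snippet, assuming the two mapping files are in the working directory), you can look up the example entry mentioned above:

```python
# Look up a known node id; 169 should map to the giant panda entry
print(node_id_to_name[169])
# giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca
```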
Load the model
```python
def create_graph():
    # Load the serialized GraphDef and import it into the default graph
    with tf.gfile.FastGFile('classify_image_graph_def.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')
```
Define a function that classifies images:
```python
def classify_image(image, top_k=1):
    image_data = tf.gfile.FastGFile(image, 'rb').read()
    create_graph()
    with tf.Session() as sess:
        # 'softmax:0': a tensor containing the normalized predictions across 1000 labels
        # 'pool_3:0': a tensor containing the next-to-last layer, a 2048-float description of the image
        # 'DecodeJpeg/contents:0': a tensor containing a string providing the JPEG encoding of the image
        softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
        predictions = sess.run(softmax_tensor, feed_dict={'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)
        # argsort is ascending, so the last top_k indices are the most likely classes
        top_k = predictions.argsort()[-top_k:]
        for node_id in top_k:
            human_string = node_id_to_name[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
```
Call the function to classify an image. The parameter `top_k` controls how many of the most likely classification results are returned:
```python
classify_image('test1.png')
```
The classification results are as follows:

- test1: giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
- test2: Pekinese, Pekingese, Peke (score = 0.90348)
- test3: Samoyed, Samoyede (score = 0.92054)
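If you want several candidate labels instead of just the best one, pass a larger `top_k` (a usage sketch; the exact output depends on the image):

```python
# Print the three most likely classes for one image
classify_image('test1.png', top_k=3)
```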
Customizing the classification task
Inception-V3 is designed for the ImageNet classification task, so the number of neurons in its last fully connected layer equals the number of class labels.
If you need a custom classification task, you can simply use your own labeled data and replace the last fully connected layer.
The number of neurons in the new last layer equals the number of labels in the custom task. Only the parameters of this last layer are trained; all other parameters stay fixed.
This is a typical application of transfer learning: Inception-V3 retains its ability to understand and abstract images while meeting the requirements of the custom classification task.
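A minimal sketch of what this retraining amounts to (illustrative names, not the actual retraining script's variables): a single new softmax layer trained on the 2048-dimensional bottleneck vectors, with everything below it frozen.

```python
import tensorflow as tf

num_classes = 5  # e.g. the five flower labels used below

# Placeholders for cached bottleneck vectors and their integer labels
bottleneck_input = tf.placeholder(tf.float32, [None, 2048])
labels_input = tf.placeholder(tf.int64, [None])

# The new "last fully connected layer": its weights and biases are the
# only parameters that get trained; the pre-trained network is untouched
weights = tf.Variable(tf.truncated_normal([2048, num_classes], stddev=0.001))
biases = tf.Variable(tf.zeros([num_classes]))
logits = tf.matmul(bottleneck_input, weights) + biases

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels_input, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```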
TensorFlow provides a tutorial on how to do transfer learning with Inception-V3:
www.tensorflow.org/tutorials/i…
The data used consists of photographs of five kinds of flowers:

- daisy
- dandelion
- roses
- sunflowers
- tulips
With the last fully connected layer removed, the representation the model outputs for an input image is called the bottleneck.
Computing and caching the bottleneck of every image beforehand saves a lot of training time, since training then only needs to compute and learn the layer between the bottleneck and the output labels.
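Since the code comments earlier note that `pool_3:0` holds the 2048-float description of the image, a hedged sketch of computing one bottleneck with the pre-trained graph could look like this (reusing the `create_graph()` defined above):

```python
import numpy as np
import tensorflow as tf

create_graph()  # loads classify_image_graph_def.pb, as defined above
with tf.Session() as sess:
    image_data = tf.gfile.FastGFile('test1.png', 'rb').read()
    # 'pool_3:0' is the next-to-last layer: the 2048-float image description
    pool_3 = sess.graph.get_tensor_by_name('pool_3:0')
    bottleneck = sess.run(pool_3, feed_dict={'DecodeJpeg/contents:0': image_data})
    bottleneck = np.squeeze(bottleneck)  # shape (2048,)
```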
TensorFlow officially provides the retraining code:
github.com/tensorflow/…
It is used from the command line; some of the optional arguments include:
- `--image_dir`: directory of training images
- `--output_graph`: where to save the trained model
- `--output_labels`: where to save the model's labels
- `--summaries_dir`: where to save training logs
- `--how_many_training_steps`: number of training iterations, default 4000
- `--learning_rate`: learning rate, default 0.01
- `--testing_percentage`: proportion of the test set, default 10%
- `--validation_percentage`: proportion of the validation set, default 10%
- `--eval_step_interval`: how often the model is evaluated, default every 10 iterations
- `--train_batch_size`: training batch size, default 100
- `--print_misclassified_test_images`: whether to print all misclassified test images, default False
- `--model_dir`: path to the Inception-V3 model
- `--bottleneck_dir`: directory for caching bottlenecks
- `--final_tensor_name`: name of the new fully connected layer, default `final_result`
- `--flip_left_right`: whether to randomly flip half of the images horizontally, default False
- `--random_crop`: proportion of random cropping, default 0 (no cropping)
- `--random_scale`: proportion of random scaling, default 0 (no scaling)
- `--random_brightness`: proportion of random brightening, default 0 (no brightening)
- `--architecture`: model to transfer from, default `inception_v3`, which has the highest accuracy but takes longer to train; you can also choose `mobilenet_<parameter size>_<input size>[_quantized]`, e.g. `mobilenet_1.0_224` or `mobilenet_0.25_128_quantized`
Run the code
```
python retrain.py --image_dir flower_photos --output_graph output_graph.pb --output_labels output_labels.txt --summaries_dir summaries_dir --model_dir .. --bottleneck_dir bottleneck_dir
```
Here is an erratum to the content in the video:

- change the value after `--output_graph` from `output_graph` to `output_graph.pb`
- change the value after `--output_labels` from `output_labels` to `output_labels.txt`
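If training speed matters more than accuracy, the `--architecture` flag listed above can swap in a MobileNet; an illustrative invocation (same script and directories as the command above):

```
python retrain.py --image_dir flower_photos --architecture mobilenet_0.25_128 --output_graph output_graph.pb --output_labels output_labels.txt --bottleneck_dir bottleneck_dir
```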
The classification accuracy on the validation set and the test set is 91% and 91.2% respectively.
Training took 55 minutes on my laptop, of which 44 minutes were spent computing the bottleneck cache; without the cache, the bottlenecks would have to be recomputed in every training iteration.
The training logs in the `summaries_dir` directory can be visualized with TensorBoard:
```
tensorboard --logdir summaries_dir
```
Then open http://localhost:6006 in your browser to see the visualizations, including the SCALARS, GRAPHS, DISTRIBUTIONS, and HISTOGRAMS pages.
To tackle other image classification tasks, organize the corresponding labeled images in the same way, using the label names as subfolder names.
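For the flower data above, for example, the layout has one subfolder per label:

```
flower_photos/
├── daisy/
├── dandelion/
├── roses/
├── sunflowers/
└── tulips/
```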
To use the trained model, refer to the code below:
- `output_labels.txt`: path to the category file
- `output_graph.pb`: path to the trained model
- `read_image()`: function that reads an image
- `input_operation`: the operation corresponding to the image input
- `output_operation`: the operation corresponding to the classification output
- `test.jpg`: path of the image to classify
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np

# Load the label names, one per line
labels = []
for line in tf.gfile.GFile('output_labels.txt').readlines():
    labels.append(line.strip())

def create_graph():
    # Load the retrained model into a new graph
    graph = tf.Graph()
    graph_def = tf.GraphDef()
    with open('output_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    with graph.as_default():
        tf.import_graph_def(graph_def)
    return graph

def read_image(path, height=299, width=299, mean=128, std=128):
    file_reader = tf.read_file(path, 'file_reader')
    if path.endswith('.png'):
        image_reader = tf.image.decode_png(file_reader, channels=3, name='png_reader')
    elif path.endswith('.gif'):
        image_reader = tf.squeeze(tf.image.decode_gif(file_reader, name='gif_reader'))
    elif path.endswith('.bmp'):
        image_reader = tf.image.decode_bmp(file_reader, name='bmp_reader')
    else:
        image_reader = tf.image.decode_jpeg(file_reader, channels=3, name='jpeg_reader')
    # Resize to 299x299 and normalize, as Inception-V3 expects
    image_np = tf.cast(image_reader, tf.float32)
    image_np = tf.expand_dims(image_np, 0)
    image_np = tf.image.resize_bilinear(image_np, [height, width])
    image_np = tf.divide(tf.subtract(image_np, [mean]), [std])
    sess = tf.Session()
    image_data = sess.run(image_np)
    return image_data

def classify_image(image, top_k=1):
    image_data = read_image(image)
    graph = create_graph()
    with tf.Session(graph=graph) as sess:
        input_operation = sess.graph.get_operation_by_name('import/Mul')
        output_operation = sess.graph.get_operation_by_name('import/final_result')
        predictions = sess.run(output_operation.outputs[0],
                               feed_dict={input_operation.outputs[0]: image_data})
        predictions = np.squeeze(predictions)
        top_k = predictions.argsort()[-top_k:]
        for i in top_k:
            print('%s (score = %.5f)' % (labels[i], predictions[i]))

classify_image('test.jpg')
```
References

- Deep Learning with TensorFlow Part 2 – Image Classification: towardsdatascience.com/deep-learni…
- Going Deeper with Convolutions: arxiv.org/pdf/1409.48…
- ImageNet: image-net.org/
- Image Recognition: www.tensorflow.org/tutorials/i…
- How to Retrain Inception's Final Layer for New Categories: www.tensorflow.org/tutorials/i…
- Convolutional Neural Networks – Andrew Ng's Deep Learning Specialization – NetEase Cloud Classroom: mooc.study.163.com/learn/20012…
Video lecture course: Deep and Interesting (1)