This article explains how to train the DeepLabv3+ segmentation model on your own dataset, using the official TensorFlow source code.

1. Introduction to code

The official TensorFlow implementation is used here. It was chosen because the code is comprehensive: in addition to the model implementation, it comes with a lot of documentation to help understand and use it, as well as code for model export and conversion.

models/research/deeplab at master · tensorflow/models

First, a brief introduction to the repository. When I started, I only cared about the training code and ignored the rest of the repository, which cost me a lot of detours; only later did I discover that the content I needed had been in the repository all along.

In the current implementation, the following network backbones are supported:

  • MobileNet-v2 and MobileNet-v3: fast network architectures intended for mobile devices.
  • Xception: a powerful network structure intended for server-side deployment.
  • ResNet-v1-{50, 101}: both the original ResNet-v1 and its "beta" variant, whose "stem" is modified for semantic segmentation.
  • PNASNet: a powerful network structure discovered through neural architecture search.
  • Auto-DeepLab (called HNASNet in the code): a segmentation-specific backbone found through neural architecture search.

This directory contains the TensorFlow implementation. We provide code that allows users to train models, evaluate results in terms of mIOU (mean intersection-over-union), and visualize segmentation results. The PASCAL VOC 2012 and Cityscapes semantic segmentation benchmarks are used as examples.

Several important files in the code:

  • datasets/: processing code for the training datasets, mainly PASCAL VOC 2012 and Cityscapes.
  • g3doc/: several very useful Markdown files covering installation, FAQs, and so on.
  • deeplab_demo.ipynb: a demo showing how to run semantic segmentation on a single image and display the results.
  • export_model.py: code for exporting a trained checkpoint to a .pb file.
  • train.py: the training script; the training parameters are specified on the command line.
  • eval.py: the evaluation script; outputs mIOU to assess model quality.
  • vis.py: visualization of segmentation results.

2. Installation

DeepLab depends on the following libraries:

  • Numpy
  • Pillow 1.0
  • tf Slim (which is included in the “tensorflow/models/research/” checkout)
  • Jupyter notebook
  • Matplotlib
  • Tensorflow

2.1 Adding the libraries to PYTHONPATH

When running locally, the tensorflow/models/research/ directory should be appended to PYTHONPATH, as follows:

# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

# [Optional] for panoptic evaluation, you might need panopticapi:
# https://github.com/cocodataset/panopticapi
# Please clone it to a local directory ${PANOPTICAPI_DIR}
touch ${PANOPTICAPI_DIR}/panopticapi/__init__.py
export PYTHONPATH=$PYTHONPATH:${PANOPTICAPI_DIR}/panopticapi

Note: this command needs to be run in every new terminal you start. If you want to avoid running it manually, you can add it as a new line at the end of the ~/.bashrc file.

2.2 Checking whether the installation is successful

Quick test by running model_test.py:

# From tensorflow/models/research/
python deeplab/model_test.py

Quickly run all code on PASCAL VOC 2012 dataset:

# From tensorflow/models/research/deeplab
sh local_test.sh

3. Data set preparation

Final goal: generate data in TFRecord format

The data set directory structure is as follows:

+ dataset            # dataset name
    + image
    + mask
    + index
        - train.txt
        - trainval.txt
        - val.txt
    + tfrecord
  • image: the original images, RGB color.
  • mask: mask images whose pixel values are the category labels; single channel, with the same file names as the original images. The suffix may be .jpg or .png, either is fine as long as it is read consistently in the code. The VOC dataset uses .jpg for the original images and .png for the masks by default.
  • index: txt files that store the image file names (without suffix).
  • tfrecord: stores the image data converted to TFRecord format.

Dataset preparation steps:

  1. Annotate the data and produce the required mask images.
  2. Split the dataset into training, validation, and test sets.
  3. Generate the dataset in TFRecord format.

3.1 Annotating the Data

The training data consists of two parts: the original images and the corresponding per-pixel class annotations (called mask images in this article).

How are the mask values set? The mask image for each original image is created according to the number of segmentation categories. If there are N categories in total (counting the background as one), the mask values range over [0, N-1]: 0 is the background, and the other categories are assigned the values 1, 2, ..., N-1 in turn.

Note:

  • ignore_label: literally an ignored label. ignore_label marks unlabeled pixels, i.e. pixels whose values do not need to be predicted; they do not take part in the loss computation and are denoted by the value 255 in the mask image.
  • The mask image is a single-channel grayscale image.
  • There is no restriction on the mask image format, but all mask images should use the same format so that data loading stays simple.

The values of mask images are divided into three categories:

  1. Background: 0
  2. Object categories: 1, 2, ..., N-1
  3. ignore_label: 255

If there are only a few categories, the generated mask image will look almost black, because the category values are small and hard to distinguish within the 0~255 display range.
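As a quick check, the following minimal sketch loads a mask, prints its label values, and rescales it so it is visible when opened as an ordinary image (the file names and the two-class assumption are illustrative only):

import numpy as np
from PIL import Image

# Load a single-channel mask whose pixel values are class labels (0, 1, ..., N-1, 255).
mask = np.array(Image.open('./dataset/mask/example.png'))   # hypothetical file name

print('unique values:', np.unique(mask))   # e.g. [0 1 255] for a two-class task

# Rescale the labels so the mask is easy to inspect by eye:
# background stays 0, class 1 becomes mid-gray, ignore_label stays 255.
vis = mask.copy()
vis[mask == 1] = 128
Image.fromarray(vis.astype(np.uint8)).save('mask_vis.png')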

3.2 Splitting the Dataset

This step divides the prepared dataset into training, validation, and test sets. There is no need to physically move the image files into three folders; it is enough to build index files of the image names, since a specific image can then be located by joining the corresponding path and file name.

Assume that the storage path of the original image and mask image is as follows:

  • Original images: ./dataset/image
  • Mask images: ./dataset/mask, stored in the format described in Section 3.1.

The original images and the mask images correspond one to one, in both image size and file name (the suffix may differ).

The index files are stored in the ./dataset/index directory:

  • train.txt
  • trainval.txt
  • val.txt

Each index file records only the file names (without suffix); this matches the way the dataset is loaded in the code.
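A minimal sketch of how these index files could be generated (the 80/20 split ratio and the directory names are assumptions; adjust them to your own dataset):

import os
import random

image_dir = './dataset/image'          # assumed image directory
index_dir = './dataset/index'
os.makedirs(index_dir, exist_ok=True)

names = sorted(os.path.splitext(f)[0] for f in os.listdir(image_dir))
random.shuffle(names)

split = int(0.8 * len(names))          # assumed 80/20 train/val split
subsets = {'train': names[:split], 'val': names[split:], 'trainval': names}

for subset, subset_names in subsets.items():
    with open(os.path.join(index_dir, subset + '.txt'), 'w') as f:
        f.write('\n'.join(subset_names))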

3.3 Packaging the Data in TFRecord Format

TFRecord is a binary file format recommended by Google that can, in principle, store information of any kind. Internally, TFRecord uses the Protocol Buffer binary encoding scheme; it occupies a single block and only one binary file needs to be loaded at a time, which makes it simple, fast, and especially friendly to large training sets. When the amount of training data is large, the data can also be split across multiple TFRecord files (shards) to improve processing efficiency.

So, how do we convert the data into TFRecord format?

The project provides ./datasets/build_voc2012_data.py, the processing script for the VOC 2012 dataset; we only need to adjust its input parameters.

Parameters:

  • image_folder: folder containing the original images, ./dataset/image
  • semantic_segmentation_folder: folder containing the mask images, ./dataset/mask
  • list_folder: folder containing the index files, ./dataset/index
  • output_dir: output path where the generated TFRecord files are written, ./dataset/tfrecord

Run the command:

python ./datasets/build_voc2012_data.py --image_folder=./dataset/image \
    --semantic_segmentation_folder=./dataset/mask \
    --list_folder=./dataset/index \
    --output_dir=./dataset/tfrecord

The generated files are split into _NUM_SHARDS shards per subset (the script defaults to _NUM_SHARDS = 4), with names such as train-00000-of-00004.tfrecord.

The core code of this file is as follows:

# dataset_split is the path to train.txt, val.txt, etc.
dataset = os.path.basename(dataset_split)[:-4]
filenames = [x.strip('\n') for x in open(dataset_split, 'r')]  # list of file names

# Name of the output TFRecord file
output_filename = os.path.join(
    FLAGS.output_dir,
    '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))
with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
    for i in range(start_idx, end_idx):
        # Path of the original image
        image_filename = os.path.join(
            image_folder, filenames[i] + '.' + image_format)
        image_data = tf.gfile.GFile(image_filename, 'rb').read()  # read the original image
        height, width = image_reader.read_image_dims(image_data)

        # Path of the mask image
        seg_filename = os.path.join(
            semantic_segmentation_folder, filenames[i] + '.' + label_format)
        seg_data = tf.gfile.GFile(seg_filename, 'rb').read()  # read the mask image
        seg_height, seg_width = label_reader.read_image_dims(seg_data)

        # Check that the original image and the mask image have the same size
        if height != seg_height or width != seg_width:
            raise RuntimeError('Shape mismatched between image and label.')

        # Convert to tf example.
        example = build_data.image_seg_to_tfexample(
            image_data, filenames[i], height, width, seg_data)
        tfrecord_writer.write(example.SerializeToString())
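To sanity-check the generated data, a record can be read back as in the following sketch (the shard file name follows the pattern above, and the feature keys are assumed to match those written by build_data.image_seg_to_tfexample):

import tensorflow as tf

record = next(tf.python_io.tf_record_iterator(
    './dataset/tfrecord/train-00000-of-00004.tfrecord'))
example = tf.train.Example.FromString(record)
features = example.features.feature
print('filename:', features['image/filename'].bytes_list.value[0])
print('height:  ', features['image/height'].int64_list.value[0])
print('width:   ', features['image/width'].int64_list.value[0])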

At this point, the dataset preparation is complete!

4. Training

4.1 Code Modification

To train your own data set, you need to modify the following files:

1 datasets/data_generator.py: add the dataset registration

This file provides wrappers for the semantic segmentation datasets.

In this file you can see the dataset descriptors for the PASCAL VOC, Cityscapes, and ADE20K datasets, for example:

_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1464,
        'train_aug': 10582,
        'trainval': 2913,
        'val': 1449,
    },
    num_classes=21,
    ignore_label=255,
)

Following this pattern, add a descriptor for our own dataset, for example:

_PORTRAIT_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 17116,
        'trainval': 21395,
        'val': 4279,
    },
    num_classes=2,       # number of categories, including the background
    ignore_label=255,    # ignored pixel value
)

Take the portrait segmentation task as an example: there are only two categories, foreground (person) and background (everything else).

After adding the description, you need to register the data set as follows:

_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'portrait_seg': _PORTRAIT_INFORMATION,  # add this line
}

Note: the dataset name registered here must match the name passed to the --dataset flag during training!

2 ./utils/train_utils.py: modify get_model_init_fn

In get_model_init_fn, change the code as follows so that the logits layer does not load the pre-trained weights:

    # Variables that will not be restored.
    exclude_list = ['global_step', 'logits']
    if not initialize_last_layer:
        exclude_list.extend(last_layers)
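For context, the sketch below shows roughly where this exclusion list is used inside get_model_init_fn. It is a simplified reconstruction based on the slim API, not the repository's exact code, which differs between versions:

# Simplified sketch of get_model_init_fn (utils/train_utils.py); structure assumed.
import tensorflow as tf
slim = tf.contrib.slim

def get_model_init_fn(train_logdir, tf_initial_checkpoint,
                      initialize_last_layer, last_layers,
                      ignore_missing_vars=False):
    # Variables that will not be restored from the pre-trained checkpoint.
    exclude_list = ['global_step', 'logits']
    if not initialize_last_layer:
        exclude_list.extend(last_layers)

    variables_to_restore = slim.get_variables_to_restore(exclude=exclude_list)
    # Restore everything else from the initial checkpoint.
    return slim.assign_from_checkpoint_fn(
        tf_initial_checkpoint, variables_to_restore,
        ignore_missing_vars=ignore_missing_vars)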

4.2 Main training parameters

The files train.py and common.py contain all the parameters needed to train the segmentation network.

  • model_variant: the DeepLab model variant; the available values are listed in core/feature_extractor.py.
    • When using mobilenet_v2, set atrous_rates=decoder_output_stride=None;
    • When using xception_65 or resnet_v1, set atrous_rates=[6,12,18] (output_stride=16) and decoder_output_stride=4.
  • label_weights: weights assigned to the labels. When the dataset is class-imbalanced, this variable can specify a weight per label; for example, label_weights=[0.1, 0.5] means label 0 has weight 0.1 and label 1 has weight 0.5. If set to None, all labels have the same weight 1.0.
  • train_logdir: path where checkpoints and logs are saved.
  • log_steps: interval (in steps) at which log information is printed.
  • save_interval_secs: how often, in seconds, the model is saved to disk.
  • optimizer: the optimizer; one of ['momentum', 'adam'].
  • learning_policy: the learning-rate policy; one of ['poly', 'step'].
  • base_learning_rate: base learning rate, default 0.0001.
  • training_number_of_steps: number of training iterations.
  • train_batch_size: batch size used for training.
  • train_crop_size: image crop size used for training, default '513,513'.
  • tf_initial_checkpoint: the pre-trained checkpoint.
  • initialize_last_layer: whether to initialize the last layer.
  • last_layers_contain_logits_only: whether only the logits layer is considered the last layer.
  • fine_tune_batch_norm: whether to fine-tune the batch norm parameters.
  • atrous_rates: default [6, 12, 18].
  • output_stride: default 16; the ratio of input to output spatial resolution.
    • For xception_65, if output_stride=8, use atrous_rates=[12, 24, 36];
    • if output_stride=16, use atrous_rates=[6, 12, 18].
    • For mobilenet_v2, use None.
    • Note: different atrous_rates and output_stride values can be used during training and evaluation.
  • dataset: the segmentation dataset to use; the same name used when the dataset was registered.
  • train_split: which split to use for training; one of the split names registered for the dataset, such as train or trainval.
  • dataset_dir: path where the dataset (TFRecord files) is stored.

For the training parameters, note the following points:

  1. Regarding whether to load the pre-trained network weights, the following parameters matter when fine-tuning the network on other datasets:

    • To use all the pre-trained weights, set initialize_last_layer=True.
    • To use only the network backbone, set initialize_last_layer=False and last_layers_contain_logits_only=False.
    • To use all pre-trained weights except the logits, set initialize_last_layer=False and last_layers_contain_logits_only=True.

    Since my dataset has a different number of categories from the default, the parameter values used are:

    --initialize_last_layer=false
    --last_layers_contain_logits_only=true
  2. A few tips for training your own data set if your resources are limited:

    • Set output_stride=16 or even 32 (the atrous_rates variable also needs to change accordingly, e.g. for output_stride=32 use atrous_rates=[3, 6, 9]).
    • Use as many GPUs as possible by changing the num_clones flag, and set train_batch_size as large as possible.
    • Reduce train_crop_size, for example to 513x513 (or even 321x321), so a larger batch_size can be used.
    • Use a smaller network backbone, such as mobilenet_v2.
  3. About fine-tuning batch norm: set fine_tune_batch_norm=True when the batch size train_batch_size is greater than 12 (preferably greater than 16); otherwise, set fine_tune_batch_norm=False.

4.3 Pre-trained Models

models/model_zoo.md at master · tensorflow/models

Pre-trained models are provided for several datasets, including (1) PASCAL VOC 2012, (2) Cityscapes, and (3) ADE20K.

Each un-tarred archive contains:

  • a frozen inference graph (frozen_inference_graph.pb). By default, unless otherwise specified, all frozen inference graphs use output_stride=8, a single eval scale of 1.0, and no left/right flips. Models based on MobileNet-v2 do not include the decoder module.
  • a checkpoint (model.ckpt.data-00000-of-00001, model.ckpt.index)

Checkpoints pre-trained on ImageNet are also provided.

For these, the un-tarred archive contains only a model checkpoint (model.ckpt.data-00000-of-00001, model.ckpt.index).

Download whichever suits your needs!
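After downloading, the archive can be unpacked with a few lines like the following (the archive name is a placeholder; use whichever model you downloaded from the model zoo):

import tarfile

# Placeholder archive name; replace with the file you actually downloaded.
with tarfile.open('deeplabv3_pascal_trainval.tar.gz') as tar:
    tar.extractall('./checkpoint')
# The extracted directory contains model.ckpt.* files; the checkpoint prefix,
# e.g. "./checkpoint/deeplabv3_pascal_trainval/model.ckpt", is what gets passed
# to --tf_initial_checkpoint in the training command below.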

4.4 Training model

python train.py \
    --logtostderr \
    --training_number_of_steps=20000 \
    --train_split="train" \
    --model_variant="xception_65" \
    --train_crop_size="513513" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_batch_size=2 \
    --save_interval_secs=240 \
    --optimizer="momentum" \
    --learning_policy="poly" \
    --fine_tune_batch_norm=false \
    --initialize_last_layer=false \
    --last_layers_contain_logits_only=true \
    --dataset="portrait_seg" \
    --tf_initial_checkpoint="./checkpoint/deeplabv3_pascal_trainval/model.ckpt" \
    --train_logdir="./train_logs" \
    --dataset_dir="./dataset/tfrecord"

4.5 Verifying the Model

The evaluation code is ./eval.py:

# From tensorflow/models/research/
python deeplab/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --eval_crop_size="513,513" \
    --dataset="portrait_seg" \
    --checkpoint_dir=${PATH_TO_CHECKPOINT} \
    --eval_logdir=${PATH_TO_EVAL_DIR} \
    --dataset_dir="./dataset/tfrecord"

Here --dataset is the registered dataset name, --checkpoint_dir is the directory containing the trained checkpoints (the train_logdir used during training), --eval_logdir is where the evaluation logs are written, and --dataset_dir is the dataset path.

When evaluation finishes, the mIOU on the validation split is printed to the log.

4.6 Visualization of training process

You can use TensorBoard to monitor the progress of training and evaluation. With the recommended directory structure, TensorBoard can be started with the following command:

tensorboard --logdir=${PATH_TO_LOG_DIRECTORY}
# For example, with the training log directory used in this article:
tensorboard --logdir="./train_logs"

5. Inference

5.1 Model Export

During training, TensorFlow periodically saves the model to disk as checkpoint files. export_model.py converts a trained checkpoint into a frozen .pb graph that can be used for inference.

Main parameters of export_model.py:

  • checkpoint_path: the checkpoint file saved during training
  • export_path: path of the exported model
  • num_classes: number of classes
  • crop_size: image size, [513, 513]
  • atrous_rates: 12, 24, 36
  • output_stride: 8
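An example invocation might look like the following; the paths and the number of classes follow the portrait task above and are only an illustration (keep atrous_rates consistent with the chosen output_stride):

python export_model.py \
    --checkpoint_path="./train_logs/model.ckpt-20000" \
    --export_path="./train_logs/frozen_inference_graph_20000.pb" \
    --model_variant="xception_65" \
    --num_classes=2 \
    --crop_size=513 \
    --crop_size=513 \
    --atrous_rates=12 \
    --atrous_rates=24 \
    --atrous_rates=36 \
    --output_stride=8 \
    --decoder_output_stride=4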

The exported .pb file is written to the path given by export_path.

5.2 Inference on a single image

import os
import tarfile

import numpy as np
import tensorflow as tf
from PIL import Image


class DeepLabModel(object):
    """Class to load a DeepLab model and run inference."""

    INPUT_TENSOR_NAME = 'ImageTensor:0'
    OUTPUT_TENSOR_NAME = 'SemanticPredictions:0'
    INPUT_SIZE = 513
    FROZEN_GRAPH_NAME = 'frozen_inference_graph'

    def __init__(self, pretrained_weights):
        """Creates and loads the pretrained DeepLab model."""
        self.graph = tf.Graph()
        graph_def = None
        # Extract the frozen graph from a tar archive, or read a .pb file directly.
        if pretrained_weights.endswith('.tar.gz'):
            tar_file = tarfile.open(pretrained_weights)
            for tar_info in tar_file.getmembers():
                if self.FROZEN_GRAPH_NAME in os.path.basename(tar_info.name):
                    file_handle = tar_file.extractfile(tar_info)
                    graph_def = tf.GraphDef.FromString(file_handle.read())
                    break
            tar_file.close()
        else:
            with open(pretrained_weights, 'rb') as fd:
                graph_def = tf.GraphDef.FromString(fd.read())

        if graph_def is None:
            raise RuntimeError('Cannot find inference graph.')

        with self.graph.as_default():
            tf.import_graph_def(graph_def, name='')

        gpu_options = tf.GPUOptions(allow_growth=True)
        config = tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False)
        self.sess = tf.Session(graph=self.graph, config=config)

    def run(self, image):
        """Runs inference on a single image.

        Args:
            image: A PIL.Image object, raw input image.

        Returns:
            resized_image: RGB image resized from the original input image.
            seg_map: Segmentation map of `resized_image`.
        """
        width, height = image.size
        resize_ratio = 1.0 * self.INPUT_SIZE / max(width, height)
        target_size = (int(resize_ratio * width), int(resize_ratio * height))
        resized_image = image.convert('RGB').resize(target_size, Image.ANTIALIAS)
        batch_seg_map = self.sess.run(
            self.OUTPUT_TENSOR_NAME,
            feed_dict={self.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})
        seg_map = batch_seg_map[0]

        return resized_image, seg_map


if __name__ == '__main__':
    pretrained_weights = './train_logs/frozen_inference_graph_20000.pb'
    MODEL = DeepLabModel(pretrained_weights)  # load the model

    img_name = 'test.jpg'
    img = Image.open(img_name)
    resized_im, seg_map = MODEL.run(img)  # run inference
    seg_map[seg_map == 1] = 255  # set portrait pixels to 255
    Image.fromarray(seg_map.astype(np.uint8)).save('output.jpg')  # save the mask image
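Continuing from the snippet above, the mask can also be overlaid on the resized input for a quick visual check (the darkening factor is an arbitrary choice):

# Overlay the mask on the resized image: darken the background, keep the person.
overlay = np.array(resized_im).copy()
overlay[seg_map == 0] = overlay[seg_map == 0] // 2
Image.fromarray(overlay).save('overlay.jpg')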

At this point, the whole training and inference pipeline is complete!