This is day five of my August Challenge.

The environment

  • Windows 10, 64-bit
  • Python 3.7
  • labelImg

Image annotation

Here, the open source tool labelImg is used to annotate the images, with the dataset exported in PASCAL VOC format. After annotation is complete, the folder looks like this: the exported XML annotation files and the image files are mixed together.

Homemade VOC data set

First, create the folders VOCdevkit, VOC2007, Annotations, ImageSets, Main, and JPEGImages according to the VOC2007 dataset layout, as shown below

```
├─VOCdevkit
│  └─VOC2007
│      ├─Annotations
│      ├─ImageSets
│      │  └─Main
│      └─JPEGImages
```

Annotations stores the XML annotation files, JPEGImages stores the image files, and ImageSets/Main stores several TXT text files whose contents are the names of the images in the training, validation, and test sets (with the extensions removed). These are the text files we need to generate ourselves, which we'll cover below.

Next, copy the image files from the images folder to the JPEGImages folder, and the XML annotation files from the images folder to the Annotations folder
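If there are many files, this copy step can itself be scripted. A minimal sketch (the function name `sort_into_voc` and the assumption that the mixed labelImg output sits in an `images` folder next to `VOCdevkit` are mine):

```python
import os
import shutil

def sort_into_voc(src, dst_img, dst_xml):
    """Copy images and XML annotations from one mixed folder into the VOC layout."""
    os.makedirs(dst_img, exist_ok=True)
    os.makedirs(dst_xml, exist_ok=True)
    for name in os.listdir(src):
        if name.lower().endswith(('.jpg', '.jpeg', '.png')):
            shutil.copy(os.path.join(src, name), dst_img)
        elif name.lower().endswith('.xml'):
            shutil.copy(os.path.join(src, name), dst_xml)

# hypothetical paths; adjust to wherever labelImg saved your output
if os.path.isdir('images'):
    sort_into_voc('images',
                  'VOCdevkit/VOC2007/JPEGImages',
                  'VOCdevkit/VOC2007/Annotations')
```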

Next, create a new script named test.py and put it in the VOCdevkit/VOC2007 folder

```
├─VOCdevkit
│  └─VOC2007
│      │  test.py
│      ├─Annotations
│      ├─ImageSets
│      │  └─Main
│      └─JPEGImages
```

The script is as follows

```python
import os
import random

# trainval_percent: fraction of the data held out as trainval;
# train_percent: of that held-out part, the fraction written to test.txt
trainval_percent = 0.1
train_percent = 0.9

xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets\Main'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open('ImageSets/Main/trainval.txt', 'w')
ftest = open('ImageSets/Main/test.txt', 'w')
ftrain = open('ImageSets/Main/train.txt', 'w')
fval = open('ImageSets/Main/val.txt', 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'  # drop the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftest.write(name)
        else:
            fval.write(name)
    else:
        ftrain.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
```

Then go to the VOCdevkit/VOC2007 directory and run the script. When it finishes, four TXT files are generated in ImageSets/Main

```
├─Annotations
├─ImageSets
│  └─Main
│      test.txt
│      train.txt
│      trainval.txt
│      val.txt
└─JPEGImages
```

The format of these four files is the same: each line is an image name with the extension removed (equivalently, an XML annotation file name with the .xml removed)
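For example, using image names borrowed from the folder listing later in this article, ImageSets/Main/train.txt might read:

```
1234565343231
1559035146628
2019032210151
```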

OK, with that in place, let's finally look at the V3/V4 versions of YOLO and see how the dataset and the training configuration file fit together.

Here, we download a script file from the official YOLO site pjreddie.com/media/files… (paste the URL into a browser to download it)

The code is fairly simple: it writes the absolute paths of the training, validation, and test images into the corresponding TXT files

```python
import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join

# the original script also handled VOC2012; we only need 2007 here
# sets = [('2012', 'train'), ('2012', 'val'), ('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
sets = [('2007', 'train'), ('2007', 'val'), ('2007', 'test')]

# the original 20 VOC classes, replaced with our single class
# classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
#            "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
#            "train", "tvmonitor"]
classes = ["hat"]

def convert(size, box):
    dw = 1.0 / size[0]
    dh = 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)

def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id))
    out_file = open('VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()

for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/' % (year)):
        os.makedirs('VOCdevkit/VOC%s/labels/' % (year))
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt' % (year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n' % (wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()
```

After executing the above script, 2007_train.txt, 2007_val.txt, and 2007_test.txt will be generated next to the VOCdevkit directory (i.e., in the directory the script was run from).
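The heart of the script is the convert function, which maps a VOC pixel box (xmin, xmax, ymin, ymax) to YOLO's normalized (x_center, y_center, width, height). A quick sanity check with made-up numbers:

```python
def convert(size, box):
    """size = (image_w, image_h); box = (xmin, xmax, ymin, ymax) in pixels.
    Returns (x_center, y_center, width, height) normalized to [0, 1]."""
    dw = 1.0 / size[0]
    dh = 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0  # box center, x
    y = (box[2] + box[3]) / 2.0  # box center, y
    w = box[1] - box[0]          # box width
    h = box[3] - box[2]          # box height
    return (x * dw, y * dh, w * dw, h * dh)

# a 100x200 pixel box centered at (300, 200) in a 600x400 image
print(convert((600, 400), (250, 350, 100, 300)))
```

The center lands at (0.5, 0.5), with width 100/600 and height 200/400 = 0.5, which is exactly the per-line format written into the labels files.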

At this point, the homemade VOC2007 dataset is ready. The configuration file cfg/voc.data in darknet can then be written this way

```
classes = 1
train   = 2007_train.txt
valid   = 2007_val.txt
names   = data/voc.names
backup  = backup/
```
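The names file referenced here, data/voc.names, simply lists one class name per line; for the single hat class used in this article it contains just:

```
hat
```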

Convert to YOLO data format

First, the aforementioned annotation tool labelImg can export the YOLO data format directly. But if you have a dataset that is already labeled as XML, you need to convert it. Take the data we labeled above as an example

Place all the images in the images folder, put the XML annotation files in the Annotations folder, and create an empty labels folder

```
├─Annotations
├─images
└─labels
```

Next, prepare the conversion script voc2yolo.py; the tricky parts are commented in the code

```python
import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join

classes = ["hat"]

def convert(size, box):
    # VOC (xmin, xmax, ymin, ymax) in pixels -> YOLO normalized (x, y, w, h)
    dw = 1.0 / size[0]
    dh = 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)

def convert_annotation(image_id):
    if not os.path.exists('Annotations/%s.xml' % (image_id)):
        return
    in_file = open('Annotations/%s.xml' % (image_id))
    out_file = open('labels/%s.txt' % (image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        cls = obj.find('name').text
        if cls not in classes:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

for image in os.listdir('images'):
    # Getting the image id from the file name depends on how your images are named.
    # image.split('.')[0] breaks on a name like 123.456.jpg, and image[:-4] assumes
    # a three-letter extension. There are many cases -- check against your own data!
    image_id = image.split('.')[0]
    convert_annotation(image_id)
```

After executing the script above, TXT annotation files are generated in the labels folder
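A quick way to sanity-check the conversion is to confirm that every image got a label file. A small sketch (the helper name `missing_labels` is mine; it assumes the images/labels layout above):

```python
import os

def missing_labels(image_dir='images', label_dir='labels'):
    """Return image file names that have no corresponding .txt label file."""
    labels = {os.path.splitext(f)[0] for f in os.listdir(label_dir)}
    return [f for f in os.listdir(image_dir)
            if os.path.splitext(f)[0] not in labels]

if os.path.isdir('images') and os.path.isdir('labels'):
    print('images without labels:', missing_labels())
```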

As you may know, the dataset structure used for YOLOv5 training looks like this

```
├─test
│  ├─images
│  └─labels
├─train
│  ├─images
│  └─labels
└─valid
   ├─images
   └─labels
```

Therefore, we need to split the image files and the corresponding TXT label files once more: first create the outer train, valid, and test folders, then create images and labels folders under each of them
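Creating the nested folders by hand is error-prone; a small sketch that builds the whole tree at once (the helper name `make_split_dirs` is mine):

```python
import os

def make_split_dirs(root='.'):
    """Create the train/valid/test x images/labels folder tree under root."""
    for split in ('train', 'valid', 'test'):
        for sub in ('images', 'labels'):
            os.makedirs(os.path.join(root, split, sub), exist_ok=True)

make_split_dirs()
```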

Next, you can use the following script to split the image and label files

```python
import os
import shutil
import random

# ratios for the test, validation, and training sets
test_percent = 0.1
valid_percent = 0.2
train_percent = 0.7

image_path = 'images'
label_path = 'labels'
images_files_list = os.listdir(image_path)
labels_files_list = os.listdir(label_path)
print('images files: {}'.format(images_files_list))
print('labels files: {}'.format(labels_files_list))

total_num = len(images_files_list)
print('total_num: {}'.format(total_num))
test_num = int(total_num * test_percent)
valid_num = int(total_num * valid_percent)
train_num = int(total_num * train_percent)

# randomly pick the indices for each subset
test_image_index = random.sample(range(total_num), test_num)
valid_image_index = random.sample(range(total_num), valid_num)
train_image_index = random.sample(range(total_num), train_num)

for i in range(total_num):
    print('src image: {}, i={}'.format(images_files_list[i], i))
    if i in test_image_index:
        shutil.copyfile('images/{}'.format(images_files_list[i]),
                        'test/images/{}'.format(images_files_list[i]))
        shutil.copyfile('labels/{}'.format(labels_files_list[i]),
                        'test/labels/{}'.format(labels_files_list[i]))
    elif i in valid_image_index:
        shutil.copyfile('images/{}'.format(images_files_list[i]),
                        'valid/images/{}'.format(images_files_list[i]))
        shutil.copyfile('labels/{}'.format(labels_files_list[i]),
                        'valid/labels/{}'.format(labels_files_list[i]))
    else:
        shutil.copyfile('images/{}'.format(images_files_list[i]),
                        'train/images/{}'.format(images_files_list[i]))
        shutil.copyfile('labels/{}'.format(labels_files_list[i]),
                        'train/labels/{}'.format(labels_files_list[i]))
```

After executing the code, you will see a file hierarchy like this

```
├─test
│  ├─images
│  │      1234565343231.jpg
│  │      1559035146628.jpg
│  │      2019032210151.jpg
│  └─labels
│         1234565343231.txt
│         1559035146628.txt
│         2019032210151.txt
├─train
│  ├─images
│  │      1213211.jpg
│  │      12i4u33112.jpg
│  │      1559092537114.jpg
│  └─labels
│         ...
└─valid
   ├─images
   │      ...
   └─labels
          ...
```

At this point, the data set is truly ready.
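To point YOLOv5 at this layout, the dataset is usually described in a small YAML file passed to train.py via --data. A sketch (the file name data.yaml and the relative paths are assumptions; adapt them to where your folders actually live):

```yaml
# data.yaml (hypothetical name and paths)
train: ./train/images
val: ./valid/images
test: ./test/images  # optional

nc: 1            # number of classes
names: ['hat']
```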

YOLO to VOC

If you get TXT labels but need to use the VOC format, a conversion is also needed. Look at the following script; the comments are written in the code

```python
import os
import xml.etree.ElementTree as ET
from PIL import Image
import numpy as np

# folder of images, folder of TXT labels, and output folder for the XML files
img_path = 'images/'
labels_path = 'labels/'
annotations_path = 'Annotations/'

labels = os.listdir(labels_path)

# class names
classes = ["hat"]

# image width / height / depth, filled in per image below
sw = sh = sd = 0

def write_xml(imgname, sw, sh, sd, filepath, labeldicts):
    '''
    imgname: image name without the extension
    '''
    # create the Annotation root node
    root = ET.Element('Annotation')
    ET.SubElement(root, 'filename').text = str(imgname)
    sizes = ET.SubElement(root, 'size')
    ET.SubElement(sizes, 'width').text = str(sw)
    ET.SubElement(sizes, 'height').text = str(sh)
    ET.SubElement(sizes, 'depth').text = str(sd)
    for labeldict in labeldicts:
        objects = ET.SubElement(root, 'object')
        ET.SubElement(objects, 'name').text = labeldict['name']
        ET.SubElement(objects, 'pose').text = 'Unspecified'
        ET.SubElement(objects, 'truncated').text = '0'
        ET.SubElement(objects, 'difficult').text = '0'
        bndbox = ET.SubElement(objects, 'bndbox')
        ET.SubElement(bndbox, 'xmin').text = str(int(labeldict['xmin']))
        ET.SubElement(bndbox, 'ymin').text = str(int(labeldict['ymin']))
        ET.SubElement(bndbox, 'xmax').text = str(int(labeldict['xmax']))
        ET.SubElement(bndbox, 'ymax').text = str(int(labeldict['ymax']))
    tree = ET.ElementTree(root)
    tree.write(filepath, encoding='utf-8')

for label in labels:
    with open(labels_path + label, 'r') as f:
        img_id = os.path.splitext(label)[0]
        contents = f.readlines()
        labeldicts = []
        for content in contents:
            # the extension here depends on your images; mine are .jpg
            img = np.array(Image.open(img_path + img_id + '.jpg'))
            sh, sw, sd = img.shape[0], img.shape[1], img.shape[2]
            content = content.strip('\n').split()
            x = float(content[1]) * sw
            y = float(content[2]) * sh
            w = float(content[3]) * sw
            h = float(content[4]) * sh
            # x_center y_center width height -> xmin ymin xmax ymax
            new_dict = {'name': classes[int(content[0])],
                        'difficult': '0',
                        'xmin': x + 1 - w / 2,
                        'ymin': y + 1 - h / 2,
                        'xmax': x + 1 + w / 2,
                        'ymax': y + 1 + h / 2}
            labeldicts.append(new_dict)
        write_xml(img_id, sw, sh, sd, annotations_path + img_id + '.xml', labeldicts)
```

After executing the script above, you can see the converted XML files in Annotations. For the VOC dataset operations that follow, refer to the second part of this article.
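As a quick check on the conversion arithmetic, the mapping used above and its inverse can be exercised on a made-up box. A sketch (`yolo_to_voc` and `voc_to_yolo` are hypothetical helper names mirroring the script's math, including its +1 pixel offset):

```python
def yolo_to_voc(sw, sh, x, y, w, h):
    """Normalized YOLO (x, y, w, h) -> pixel VOC (xmin, ymin, xmax, ymax),
    using the same +1 offset as the script above."""
    return (sw * x + 1 - sw * w / 2, sh * y + 1 - sh * h / 2,
            sw * x + 1 + sw * w / 2, sh * y + 1 + sh * h / 2)

def voc_to_yolo(sw, sh, xmin, ymin, xmax, ymax):
    """Exact inverse of yolo_to_voc (note the matching -1)."""
    return (((xmin + xmax) / 2 - 1) / sw, ((ymin + ymax) / 2 - 1) / sh,
            (xmax - xmin) / sw, (ymax - ymin) / sh)

box = yolo_to_voc(600, 400, 0.5, 0.5, 0.25, 0.5)
print(box)                          # (226.0, 101.0, 376.0, 301.0)
print(voc_to_yolo(600, 400, *box))  # (0.5, 0.5, 0.25, 0.5)
```

Round-tripping a label through both directions and getting the original numbers back is a cheap way to catch axis mix-ups before training.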