MIT License: github.com/albumentati…

Why do we need image augmentation

Deep neural networks need large amounts of training data to achieve good results and avoid overfitting. However, it is often difficult to obtain enough training samples, mainly because collecting and annotating datasets is expensive. Image augmentation addresses this by synthesizing new samples from existing ones.

The effect of image augmentation

Augmentation helps the model resist overfitting and improves prediction accuracy.

In 2018, Google published a paper on AutoAugment, an algorithm that automatically searches for the best set of augmentations for a given dataset. The authors showed that a custom set of augmentations can improve model performance.

Using Albumentations for image augmentation

Albumentations is a high-level package for image augmentation.

Compare with using Pillow directly for simple image augmentation:

```python
from PIL import Image, ImageEnhance

image = Image.open("parrot.jpg")
mirrored_image = image.transpose(Image.FLIP_LEFT_RIGHT)
rotated_image = image.rotate(45)
brightness_enhancer = ImageEnhance.Brightness(image)
brighter_image = brightness_enhancer.enhance(factor=1.5)
```

This approach has limitations and is hard to reuse. The limitations show up in the two different kinds of augmentation:

Pixel-level transformations change the pixel values of the original image but leave the output mask untouched, so only the image needs to be transformed.

Spatial transformations, by contrast, change both the image and the mask, so the image and the annotation data must be processed together.

The same is true for object detection tasks. For pixel-level augmentations, only the input image changes. With spatial augmentations, the same transformation must be applied not just to the image but also to the bounding box coordinates: after the augmentation, the coordinates must be updated to point at the object's correct position in the augmented image.
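To make this concrete, here is a minimal sketch in plain Python (not the Albumentations API) showing how a horizontal flip moves a pascal_voc-style box [x_min, y_min, x_max, y_max]; the image width and box values are illustrative:

```python
def flip_bbox_horizontally(bbox, image_width):
    # A horizontal flip maps x to image_width - x, so the left and
    # right edges of the box swap roles.
    x_min, y_min, x_max, y_max = bbox
    return [image_width - x_max, y_min, image_width - x_min, y_max]

print(flip_bbox_horizontally([98, 345, 420, 462], image_width=640))
# -> [220, 345, 542, 462]
```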


Albumentations provides:

  • Correct application of transformations to both the input data and the output labels.

  • Control over the probability and magnitude of each transformation.

  • Pipelines defined through a uniform interface.

    For example, this pipeline first crops a random 512 x 512 px region from the input image, then randomly changes its brightness and contrast with 30% probability, and finally flips the result horizontally with 50% probability:

    ```python
    import albumentations as A

    transform = A.Compose([
        A.RandomCrop(512, 512),
        A.RandomBrightnessContrast(p=0.3),
        A.HorizontalFlip(p=0.5),
    ])
    ```
  • Reliability: the library is backed by a comprehensive test suite.

  • Strong extensibility (see the sketch below).
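As a quick illustration of that extensibility, the sketch below wraps a user-defined function as a transform with A.Lambda; the `invert` callback is a hypothetical pixel-level operation written for this example:

```python
import albumentations as A
import numpy as np

def invert(image, **kwargs):
    # Hypothetical pixel-level operation: invert an 8-bit image.
    return 255 - image

# A.Lambda turns user callbacks into a regular pipeline step.
transform = A.Compose([
    A.Lambda(image=invert, p=0.5),
])

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
result = transform(image=image)["image"]  # inverted with 50% probability
```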

Image augmentation example


Online demo: albumentations-demo.herokuapp.com/

```python
import albumentations as A  # the augmentation library
import cv2  # a library to read images from disk (e.g., OpenCV)

# Define an augmentation pipeline.
transform = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

# Read an image from disk.
image = cv2.imread("/path/to/image.jpg")
# For historical reasons (https://learnopencv.com/why-does-opencv-use-bgr-color-format/),
# OpenCV reads images in BGR order, so convert to RGB.
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Apply the pipeline.
transformed = transform(image=image)
transformed_image = transformed["image"]
```

Of course, you can also use other libraries to read images, such as Pillow:

```python
# pip install pillow
from PIL import Image
import numpy as np

pillow_image = Image.open("image.jpg")
image = np.array(pillow_image)
```
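Either way, the image ends up as a NumPy array and can be fed to the same pipeline as in the OpenCV example above:

```python
# `transform` is the A.Compose pipeline defined in the previous example.
transformed = transform(image=image)
transformed_image = transformed["image"]
```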

Example of transforming the image and annotations together

For semantic segmentation tasks, for example, you need to augment both the input image and one or more output masks.

```python
import albumentations as A
import cv2

transform = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

image = cv2.imread("/path/to/image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# The mask is read the same way; it should be a NumPy array
# compatible with the image (same height and width).
mask = cv2.imread("/path/to/mask.png")

transformed = transform(image=image, mask=mask)
transformed_image = transformed["image"]
transformed_mask = transformed["mask"]

# Several masks can be passed at once via the `masks` argument.
mask_1 = cv2.imread("/path/to/mask_1.png")
mask_2 = cv2.imread("/path/to/mask_2.png")
mask_3 = cv2.imread("/path/to/mask_3.png")
masks = [mask_1, mask_2, mask_3]

transformed = transform(image=image, masks=masks)
transformed_image = transformed["image"]
transformed_masks = transformed["masks"]
```

Bounding box example for object detection

Four bounding box annotation formats are supported: pascal_voc, albumentations, coco, and yolo.

A rectangular bounding box can be described by four values. As a running example, consider a box at [98, 345, 420, 462] in a 640 x 480 image (the size implied by the normalized values below).

We can therefore summarize the annotation formats as follows (a conversion sketch follows the list):

  1. pascal_voc: [x_min, y_min, x_max, y_max]; for the example box, [98, 345, 420, 462]
  2. albumentations: pascal_voc normalized, with x divided by the image width and y by the image height: [x_min, y_min, x_max, y_max]; for the example box, [0.153125, 0.71875, 0.65625, 0.9625]
  3. coco: [x_min, y_min, width, height]; for the example box, [98, 345, 322, 117]
  4. yolo: [x_center, y_center, width, height], normalized; for the example box, [0.4046875, 0.840625, 0.503125, 0.24375]
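Since all four formats describe the same rectangle, they are mutually convertible. Here is a small plain-Python sketch (for illustration, not part of the library) deriving the other three formats from the pascal_voc example box, using the 640 x 480 image size implied by the normalized values above:

```python
def pascal_voc_to_others(bbox, img_w, img_h):
    x_min, y_min, x_max, y_max = bbox
    albumentations = [x_min / img_w, y_min / img_h, x_max / img_w, y_max / img_h]
    coco = [x_min, y_min, x_max - x_min, y_max - y_min]
    yolo = [
        (x_min + x_max) / 2 / img_w,  # normalized x_center
        (y_min + y_max) / 2 / img_h,  # normalized y_center
        (x_max - x_min) / img_w,      # normalized width
        (y_max - y_min) / img_h,      # normalized height
    ]
    return albumentations, coco, yolo

print(pascal_voc_to_others([98, 345, 420, 462], img_w=640, img_h=480))
# ([0.153125, 0.71875, 0.65625, 0.9625], [98, 345, 322, 117],
#  [0.4046875, 0.840625, 0.503125, 0.24375])
```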

Albumentations supports transformation of annotations and images simultaneously:

```python
import albumentations as A
import cv2

transform = A.Compose(
    [
        A.RandomCrop(width=450, height=450),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.2),
    ],
    bbox_params=A.BboxParams(
        format="coco",
        min_area=1024,       # drop boxes whose area falls below this after the transform
        min_visibility=0.1,  # drop boxes that keep less than this fraction of their area
        label_fields=["class_labels"],
    ),
)

image = cv2.imread("/path/to/image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Annotations in coco format: [x_min, y_min, width, height].
bboxes = [[23, 74, 295, 388], [333, 421, 49, 49]]
class_labels = ["dog", "sports ball"]

# Labels can instead be embedded in the coordinate arrays themselves, e.g.
# [23, 74, 295, 388, 'dog'] or even [333, 421, 49, 49, 'sports ball', 'item'];
# in that case, omit label_fields and call transform(image=image, bboxes=bboxes).

# Here the labels are passed separately: 'class_labels' is the name declared
# in label_fields, and multiple label fields are supported.
transformed = transform(image=image, bboxes=bboxes, class_labels=class_labels)
transformed_image = transformed["image"]
transformed_bboxes = transformed["bboxes"]
transformed_class_labels = transformed["class_labels"]
```