As usual, we start with the official introduction to torchvision.transforms:

Transforms are common image transformations (including image augmentation), and several transforms can be chained together with Compose, much like nn.Sequential chains network layers. The rest of the official introduction deals with image segmentation tasks; since getting started with PyTorch is mostly about image classification, that part is not covered here.

1 Basic Functions

1.1 Compose

[code]

torchvision.transforms.Compose(transforms)

【 introduction 】

Chains several transforms together so that they are applied one after another. This is a very important function.

【 Code Examples 】

transforms.Compose([
     transforms.CenterCrop(10),
     transforms.ToTensor(),
 ])

1.2 RandomChoice

[code]

torchvision.transforms.RandomChoice(transforms)

【 introduction 】

Like Compose, it takes a list of transforms, but it applies only one of them, chosen at random.

1.3 RandomOrder

[code]

torchvision.transforms.RandomOrder(transforms)

【 introduction 】

Like Compose, it takes a list of transforms and applies all of them, but in a random order.


As mentioned in the previous lesson, the official TorchVision datasets provide their images in PIL format, and we then need to convert them to FloatTensor. The image augmentation operations are therefore split into two groups: operations on PIL images and operations on FloatTensor tensors.


2 Operations on PIL Images

2.1 Center crop: CenterCrop

[code]

torchvision.transforms.CenterCrop(size)

【 introduction 】

Crops the PIL image around its center. Commonly used.

size (sequence or int): the size of the crop. If size is an int, a square is cropped; if it is a (height, width) tuple, a rectangle is cropped.

【 Code Examples 】

transforms.Compose([
     transforms.CenterCrop(10),
     transforms.ToTensor(),
 ])

2.2 RandomCrop

[code]

torchvision.transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')

【 introduction 】

Similar to CenterCrop, but the crop is taken at a randomly chosen position.

“Parameters”

  • size: either an int or a (height, width) tuple.
  • padding: optional padding applied to the image before cropping. A single int pads all four sides; a 2-tuple gives the padding for left/right and top/bottom; a 4-tuple gives the padding for left, top, right and bottom.
  • pad_if_needed: bool, False by default but often set to True. If the image is smaller than the requested crop, it is padded so that the crop does not run past the border.
  • fill (int): the value used for constant padding, for example 0 (black) or 255 (white). It only takes effect when padding_mode='constant'.
  • padding_mode: the padding method, one of 'constant', 'edge', 'reflect' or 'symmetric'. The default is 'constant'. 'edge' repeats the pixel values at the border, which usually works better than 'constant'. 'reflect' and 'symmetric' both mirror the image about its border; they differ as follows:
    • reflect: [1, 2, 3, 4, 5] with padding = 2 becomes [3, 2, 1, 2, 3, 4, 5, 4, 3]
    • symmetric: [1, 2, 3, 4, 5] with padding = 2 becomes [2, 1, 1, 2, 3, 4, 5, 5, 4]
    • The only difference is whether the border element itself is repeated. In practice the two methods behave much the same.
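The four padding modes can be checked on a single row of values. This sketch uses NumPy's np.pad as a stand-in, on the assumption that its modes of the same name behave like the padding_mode values described above; it does not call torchvision itself.

```python
import numpy as np

# One row of "pixels", the same example as in the parameter list.
row = np.array([1, 2, 3, 4, 5])

constant = np.pad(row, 2, mode="constant", constant_values=0)
edge = np.pad(row, 2, mode="edge")
reflect = np.pad(row, 2, mode="reflect")
symmetric = np.pad(row, 2, mode="symmetric")

print(constant.tolist())   # [0, 0, 1, 2, 3, 4, 5, 0, 0]
print(edge.tolist())       # [1, 1, 1, 2, 3, 4, 5, 5, 5]
print(reflect.tolist())    # [3, 2, 1, 2, 3, 4, 5, 4, 3]
print(symmetric.tolist())  # [2, 1, 1, 2, 3, 4, 5, 5, 4]
```

The reflect result does not repeat the border elements 1 and 5, while the symmetric result does.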

2.3 Random resized crop: RandomResizedCrop

[code]

torchvision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=2)

【 introduction 】

Crops a region of random size and aspect ratio, then resizes it to the given size.

scale controls the area of the crop as a fraction of the original image, and ratio controls the aspect ratio of the crop, which ranges from 3/4 to 4/3 by default. Once the crop is taken, it is resized to size. This transform is commonly used to train Inception networks.

2.4 Randomly change brightness, contrast and saturation: ColorJitter

[code]

torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)

【 introduction 】

Randomly changes the brightness, contrast, saturation and hue of the image.

“Parameters”

  • brightness (float or tuple (min, max)): if a float, a value below 1 is recommended. The brightness factor is drawn uniformly from [max(0, 1 - brightness), 1 + brightness]; for example, brightness=0.1 draws the factor from [0.9, 1.1]. If a tuple, the factor is drawn from [min, max].

  • contrast (float or tuple (min, max)): a factor chosen in the same way as for brightness.

  • saturation (float or tuple (min, max)): as above, a factor chosen in the same way.

  • hue (float or tuple (min, max)): the hue shift. If a float, it must satisfy 0 <= hue <= 0.5, and the factor is drawn from [-hue, hue]. If a tuple, the factor is drawn from [min, max].

2.5 Random rotation: RandomRotation

[code]

torchvision.transforms.RandomRotation(degrees, resample=False, expand=False, center=None, fill=None)

【 introduction 】

Rotates the image by a random angle.

“Parameters”

  • degrees (int or tuple (min, max)): the range of rotation angles. An int gives the range [-degrees, degrees]; a tuple gives [min, max].

  • expand (bool, optional): if True, the output is enlarged so that it holds the whole rotated image (so it is usually followed by a Resize transform). The default is False, which keeps the output the same size as the input.

  • center (2-tuple, optional): the center of rotation; set it to rotate about a point other than the image center.

  • fill (n-tuple or int or float): the value of the pixels filled in outside the rotated image. It defaults to 0, which is also what is normally used.

2.6 Grayscale

[code]

torchvision.transforms.Grayscale(num_output_channels=1)

【 introduction 】

This function is not very important, but it can be used to speed things up, ha ha. It converts the image to grayscale.

“Parameters”

  • num_output_channels (int): a grayscale image normally has a single channel, but setting this to 3 outputs a three-channel grayscale image (all three channels hold the same values). That way you do not have to modify the input layer of torchvision's pretrained models. (As mentioned earlier, the pretrained models are trained on ImageNet, whose inputs are all three-channel color images.)

2.7 Resize

[code]

torchvision.transforms.Resize(size, interpolation=2)

【 introduction 】

Resize PIL image to the specified size

“Parameters”

  • size (tuple (height, width) or int): a tuple resizes the image to exactly that size; an int scales the shorter edge of the image to that length, keeping the aspect ratio.
  • interpolation (int, optional): the interpolation method. The default, PIL.Image.BILINEAR (bilinear interpolation), is normally used.

2.8 Probabilistic transforms (common)

Image augmentation: grayscale conversion, mirroring, flipping, translation, rotation and so on.

[code]

# Randomly convert to grayscale; the output keeps the same number of channels as the input
torchvision.transforms.RandomGrayscale(p=0.1)
# Random horizontal flip
torchvision.transforms.RandomHorizontalFlip(p=0.5)
# Random vertical flip
torchvision.transforms.RandomVerticalFlip(p=0.5)

“Parameters”

  • p: the probability that the transform is applied.

3 Operations on Tensor

3.1 Normalize

[code]

torchvision.transforms.Normalize(mean, std, inplace=False)

“Parameters”

  • mean and std are lists, [mean_1, …, mean_n] and [std_1, …, std_n], where n is the number of channels. Each channel has its own mean and std. The calculation is the usual one:


$$\text{output}[\text{channel}] = \frac{\text{input}[\text{channel}] - \text{mean}[\text{channel}]}{\text{std}[\text{channel}]}$$

4 Converting between PIL and Tensor

4.1 ToPILImage

torchvision.transforms.ToPILImage(mode=None)

【 introduction 】

Converts a tensor or NumPy array to a PIL image. Note that a Tensor input must have shape C x H x W, while a NumPy array must have shape H x W x C. (This rarely causes trouble, but when it does it is hard to track down.)

4.2 ToTensor

torchvision.transforms.ToTensor()

【 introduction 】

Converts a PIL image or NumPy array to a Tensor. The PIL image or NumPy array (shape H x W x C, values in [0, 255]) becomes a Tensor (shape C x H x W, values in [0, 1]).

5 Worked example

from PIL import Image
from torchvision import transforms

def loadImage():
    # Open the image file and make sure it is in RGB mode
    im = Image.open("brunch.jpg")
    im = im.convert("RGB")
    im.show()
    return im
im = loadImage()

The picture is from when I was studying in the UK. It shows a dish called fig toast; it did not taste great, but it looked beautiful.

Crop a 600*600 image from the center
output = transforms.CenterCrop(600)(im)
output.show()

Crop an image with a height of 600 and a width of 800 from the center
output = transforms.CenterCrop((600, 800))(im)
output.show()

Crop a 600*600 image randomly
output = transforms.RandomCrop(600)(im)
output.show()

Crop a 600*800 image randomly
output = transforms.RandomCrop((600, 800))(im)
output.show()

Crop five 300*300 images: the four corners and the center
outputs = transforms.FiveCrop(300)(im)
outputs[4].show()  # index 4 is the center crop

The other crops look similar, so I won't post them.

# p defaults to 0.5; setting it to 1 guarantees a horizontal flip
output = transforms.RandomHorizontalFlip(p=1.0)(im)
output.show()

output = transforms.RandomVerticalFlip(p=1)(im)
output.show()

Rotate by a random angle between -30 and 30 degrees
output = transforms.RandomRotation(30)(im)
output.show()

Rotate by a random angle between 60 and 90 degrees
output = transforms.RandomRotation((60, 90))(im)
output.show()

output = transforms.Resize((400, 500))(im)
output.show()

This only makes the image smaller, so I won't show the result.

trans = transforms.Compose([transforms.CenterCrop(300),
                            transforms.RandomRotation(30),
                            ])
output = trans(im)
output.show()