
The basics of graphics

Author: ZackSock

1. Images in computers

In a computer, images are stored in binary form. We rarely manipulate that binary data directly, though. Instead, we treat an image as a set of points laid out in two dimensions, where each point has its own color and cannot be divided further. Such points are called pixels. Like this one:

We can think of it as a 5 × 5 grid in which every point is black.

Early computers could only display simple images, such as binary images. In a binary image each pixel is either black or white, with no room for a third color. For example:

For such images, one bit (0 or 1) per pixel is enough. But this kind of image cannot meet most needs, so we have a more detailed type that still has no color: the grayscale image. For example:

The image above retains most of the details of the real scene. One bit is not enough to represent a pixel of a grayscale image, so each pixel is stored as an 8-bit value, that is, an integer from 0 to 255. Since hard disk space is no longer a scarce resource, binary images are often stored with 8 bits per pixel as well, using 0 for 0 (black) and 255 for 1 (white).
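The 0/255 convention for binary images can be sketched with NumPy. This is a minimal illustration, not code from the original article; the 3 × 3 pattern and the names `bits` and `binary_8bit` are made up for the example.

```python
import numpy as np

# Hypothetical 3x3 binary image, one logical bit per pixel (0 or 1).
bits = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]], dtype=np.uint8)

# Stretch to the 8-bit convention: 0 stays 0 (black), 1 becomes 255 (white).
binary_8bit = bits * 255

print(binary_8bit)
print(binary_8bit.dtype)
```

Stored this way, a binary image can be handled by the same code as a grayscale image.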

Representing a color image is more complicated. Here is an RGB image:

An RGB image is a picture in which each pixel is represented by three values. The three values indicate the intensity of red, green, and blue. For example, if a pixel has a red value of 255 and the other two values are 0, that pixel looks red to us. With 8 bits per value, combining the three gives 256 × 256 × 256 = 16,777,216 different colors.

Of course, the images we meet in everyday life are much richer, such as transparent images, animated images and so on. I won’t go into details here.
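The number of representable colors can be checked with a short calculation. The sketch below assumes 8 bits per channel, as in the grayscale case; the variable names are illustrative only.

```python
import numpy as np

# Each of the three values is 8 bits wide, so it has 2**8 = 256 levels.
levels = 2 ** 8
total_colors = levels ** 3
print(total_colors)  # 16777216

# A pure red pixel written in R, G, B order: full red, no green, no blue.
red_pixel = np.array([255, 0, 0], dtype=np.uint8)
print(red_pixel)
```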

2. Images in OpenCV

The image types described above carry over directly to OpenCV.

In OpenCV, images are stored as the ndarray type. An ndarray is a NumPy array that can have multiple dimensions and can hold the complete information of an image, including its width, height, and pixel values. We can try looking at the ndarray of the following image:

Since the full output is quite long, here is only part of it:

[[[245 225 190]
  [245 225 190]
  [245 225 190]
  ...
  [214 195 184]
  [214 195 184]
  [214 195 184]]

 [[245 225 190]
  [245 225 190]
  [245 225 190]
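The nesting in that output follows from the array having three dimensions: rows, then pixels within a row, then values within a pixel. The sketch below builds a small stand-in array with NumPy instead of reading a file, so the 4 × 6 size is an assumption made purely for illustration.

```python
import numpy as np

# Stand-in for a loaded image: 4 rows, 6 columns, 3 values per pixel.
# The fill values are copied from the first pixel in the output above.
im = np.full((4, 6, 3), (245, 225, 190), dtype=np.uint8)

print(type(im))  # the container type, numpy.ndarray
print(im.ndim)   # number of dimensions
print(im[0])     # the first row of pixels
```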

We don’t need to worry too much about the contents of the array at this stage. OpenCV and NumPy give us a simple way to retrieve the key information about an image, so let’s take a quick look at it.

3. Obtain image information

Before getting the image information, we need to read the image using the imread function. Let’s look at the following code:

import cv2
im = cv2.imread('xscn.jpg')
print("Shape of image:", im.shape)
print("Image size:", im.size)
print("Image data type:", im.dtype)

Let’s look at the output:

Shape of image: (1080, 1920, 3)
Image size: 6220800
Image data type: uint8

Here is what each attribute means:

  • shape: the shape of the image, given as height, width, and number of channels
  • size: the total number of values, that is, height * width * number of channels
  • dtype: the type of each value. For grayscale images, one value is one pixel. For RGB images, one value is one color component of a pixel. uint8 is an unsigned 8-bit integer (0-255)

The number of channels is how many values make up one pixel. For example, a grayscale image has 1 channel and an RGB image has 3 channels.
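The relationship between these three attributes can be checked on synthetic arrays. This is a sketch under a few assumptions: the arrays are all-zero stand-ins rather than real images, `describe` is a hypothetical helper, and the grayscale array is stored as a 2-D array without an explicit channel axis, which is a common layout in NumPy.

```python
import numpy as np

# Synthetic stand-ins for loaded images; the sizes match the output above.
gray = np.zeros((1080, 1920), dtype=np.uint8)      # no channel axis
color = np.zeros((1080, 1920, 3), dtype=np.uint8)  # 3 values per pixel


def describe(img):
    """Return (height, width, channels) for a 2-D or 3-D image array."""
    height, width = img.shape[:2]
    channels = img.shape[2] if img.ndim == 3 else 1
    return height, width, channels


for name, img in (("gray", gray), ("color", color)):
    h, w, c = describe(img)
    # size counts every stored value, so it equals height * width * channels
    print(name, img.shape, img.size, h * w * c, img.dtype)
```

Checking `ndim` before reading `shape[2]` avoids an error on arrays that have no third axis.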

4. Image coordinates

For convenience, we can imagine a coordinate system laid over the image. Nothing actually has to be constructed; it is just a way of thinking about pixel positions.

Take a look at the picture below:

We put the origin of the coordinate system at the top left corner of the picture, with x increasing to the right and y increasing downward. In this way, we can locate a pixel in (x, y) form, such as point A in the figure, whose coordinate is (500, 300).
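Because the array is stored row by row, the y coordinate selects the row and the x coordinate selects the column. The sketch below shows that mapping on a synthetic 800 × 600 array; `pixel_at` is a hypothetical helper written for this example, not an OpenCV function.

```python
import numpy as np


def pixel_at(im, x, y):
    """Return the pixel at image coordinate (x, y); hypothetical helper."""
    height, width = im.shape[:2]
    if not (0 <= x < width and 0 <= y < height):
        raise IndexError(f"({x}, {y}) is outside a {width}x{height} image")
    # Row index comes first, so y goes before x.
    return im[y, x]


# Synthetic image: 600 rows (height) by 800 columns (width), 3 values per pixel.
im = np.zeros((600, 800, 3), dtype=np.uint8)
im[300, 500] = (10, 20, 30)  # mark point A at x=500, y=300

print(pixel_at(im, 500, 300))
```

The bounds check matters because negative indices are legal in NumPy and would silently wrap around to the other side of the image.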

5. Get the pixel value

To obtain the pixel value, we also need to read the image first, and then we can access the pixel at the specified position in the following way:

im[y][x]

Here im is our image object, and x and y are the coordinates of the pixel. Note that y (the row) comes first. Take this code for example:

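import cv2

# Read the image from disk; OpenCV returns it as a NumPy array
im = cv2.imread('xscn.jpg')
# Row 0, column 0: the pixel in the top left corner
pixel = im[0][0]
print(pixel)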

The following output is displayed:

[245 225 190]

Because a color image was read, a single pixel consists of three values. We can go one step further and read a single color component. For example, to get the red value of the pixel at coordinate (100, 100), we might try this:

red = im[100][100][0]

But this is wrong. By default, OpenCV stores color images in BGR order, so index 0 is the blue value, not the red one. The correct way to get red looks like this:

red = im[100][100][2]
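The channel order can be made explicit by unpacking the pixel into named variables. The sketch below uses a synthetic array laid out in B, G, R order, as OpenCV does by default; the pixel values are invented for the example.

```python
import numpy as np

# Synthetic 200x200 image laid out in B, G, R order, as OpenCV does by default.
im = np.zeros((200, 200, 3), dtype=np.uint8)
im[100, 100] = (190, 225, 245)  # blue=190, green=225, red=245 (made-up values)

# Unpacking in B, G, R order makes the meaning of each index explicit.
blue, green, red = im[100, 100]
print(int(blue), int(green), int(red))

# Reversing the last axis gives a view in R, G, B order.
rgb = im[:, :, ::-1]
print(rgb[100, 100])
```

Naming the components once, rather than remembering index 2 everywhere, makes this kind of mix-up less likely.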

6. Modify pixel values

Changing a pixel value is very simple. We just need to find the pixel and assign a new value to it. For example, let’s work with the following image:

For easier viewing, the 3 × 3 image is shown enlarged. We modify its pixels with code:

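import numpy as np

# Create a 3 × 3 single-channel image filled with 0 (black)
im = np.zeros((3, 3, 1), dtype=np.uint8)
# Row 0, column 2: set the top right pixel to 255 (white)
im[0][2] = 255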

Here np.zeros creates a multidimensional array filled with zeros. For now, we will simply read this call as creating the image above; we will explain it in more detail later.

After creating the image, we changed the pixel at (2,0) to 255. Here is the modified image:

You can see that the specified pixel has been modified.
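The same assignment can be checked numerically, and it extends to whole rows through slicing. This is a minimal sketch; the value 128 and the choice of the bottom row are arbitrary, picked only to show the idea.

```python
import numpy as np

# Recreate the 3x3 single-channel black image.
im = np.zeros((3, 3, 1), dtype=np.uint8)

# Single pixel at x=2, y=0 (row 0, column 2).
im[0, 2] = 255

# A slice assigns to many pixels at once: here the whole bottom row.
im[2, :] = 128

# Drop the channel axis to print the image as a plain 3x3 grid.
print(im[:, :, 0])
```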