preface

A while ago I got interested in image processing, so I started learning it, only to find it was a deep pit:

One look at those books and the smile fades. My textbook used MATLAB, which I haven't touched since I finished university. If you want to do it on iOS, you need to learn OpenCV, and OpenCV needs… forget it, I don't want to.

I'll be old by the time that pit is filled in :)

For now, let me summarize some basics of image processing, record them, and share them with friends… The experts, of course, can skip this.

Image Basics

Image format

The file extension we see every day indicates the picture's format, that is, the format in which the computer stores the image. Common storage formats include JPEG, BMP, PNG, and GIF.

Image classification

Images can be divided into bitmaps and vector graphics.

  • Bitmap: also known as a pixel map, an image pieced together from an array of pixels. Captured images such as photos and screenshots are bitmaps. A bitmap becomes blurry when magnified beyond a certain point. Common bitmap formats include JPEG, PSD, PNG, and TIFF.

  • Vector graphic: an image made up of points, lines, and planes, usually drawn with vector software. Because points, lines, and planes are described mathematically, magnification does not blur them. Common vector formats include AI, EPS, and SVG.

Several common bitmap formats

You can refer to Wikipedia for more information about the various formats. For iOS (or other client) development, the bitmap formats used are generally .jpg / .png / .gif / .webp. Here's an excerpt from the Mobile Image Format Survey (highly recommended) summarizing their pros and cons:

| Format | History | Advantages | Disadvantages |
| --- | --- | --- | --- |
| GIF | Born in 1987 and popular with the first generation of the Internet | Its one real advantage is support for multi-frame animation, which has kept it popular since Windows 1.0 | Many drawbacks: generally only 256 colors, a 1-bit transparency channel, and a low compression ratio |
| JPEG | Created in 1992, a very old format | Its compression algorithm can precisely control the compression ratio, trading image quality for storage space. It is so common that the CPUs of many mobile devices support hardware encoding and decoding for it | Supports only lossy compression |
| PNG | Born in 1995, a few years after JPEG. Designed to replace GIF, so it has much in common with GIF | Its biggest advantage over JPEG and GIF is support for a complete transparency channel | Supports only lossless compression, so its compression ratio has an upper limit |
| WebP | An image format Google released in 2010, hoping to replace JPEG with a higher compression ratio | Uses VP8 video coding as the basis of its algorithm and compresses well. It supports both lossy and lossless compression, a complete transparency channel, and multi-frame animation, with no copyright issues: a near-ideal image format | Its lossless coding does not perform well on complex images such as photos |

JPEG and PNG comparison

As the table shows, JPEG was born earlier than PNG and is more widely used, but JPEG does not support an alpha channel.

Why does that matter? For example, when you compress an image on iOS, if the image has transparent regions, saving it as JPEG will lose the transparent parts, while PNG preserves them fully.

I wrote a demo to verify this; the specific code and results can be viewed in the demo project.

First, I used an original image whose top half is white and which fades to fully transparent toward the bottom:

Then I took a screenshot of the imageView and compressed it into two new images, one via PNG and one via JPEG.

Compressed image in PNG format:

Compressed image in JPEG format:

By comparison, it can be found that:

  • After JPEG processing (with the compression quality set to 0.9), the transparent part of the original image is lost and becomes white. PNG preserves the transparency channel well.

  • The JPEG image is 81 KB, while the PNG image is 487 KB: a difference of about six times for the same image.

So JPEG and PNG each have their strengths. JPEG has a huge advantage in file size, while PNG is undoubtedly better at faithfully presenting the image. Adjust according to the scenario.
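
Since JPEG has no alpha channel, encoders must flatten transparency onto an opaque background (typically white) before encoding, which is exactly why the transparent region above came out white. Here is a minimal pure-Python sketch of that compositing step (just the alpha-blend formula, not any iOS API):

```python
def flatten_alpha(pixel, background=(255, 255, 255)):
    """Composite an RGBA pixel over an opaque background.

    This mimics what happens when an image with transparency is
    saved to a format (like JPEG) that cannot store alpha.
    """
    r, g, b, a = pixel
    alpha = a / 255.0
    return tuple(round(alpha * c + (1 - alpha) * bc)
                 for c, bc in zip((r, g, b), background))

# A fully transparent pixel becomes the background color (white):
print(flatten_alpha((0, 0, 0, 0)))      # (255, 255, 255)
# A half-transparent red pixel is blended toward white:
print(flatten_alpha((255, 0, 0, 128)))
```

A fully opaque pixel passes through unchanged; the lower the alpha, the more of the background shows through.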

Compression mode – lossy compression/lossless compression

That covers the difference between JPEG and PNG. Let's dig further: what makes JPEG and PNG differ in rendering and size?

This is where compression comes in.

GIF and PNG use lossless compression, JPEG uses lossy compression, and WebP supports both (WebP flashes a triumphant smile ✌️).

Here are two types of compression:

  • Lossless compression means that after data compression, no information is lost and the data can be completely restored to its pre-compression state. It is used in scenarios where the decompressed data must be identical to the original.

    As an example of lossless compression, take the sentence:

    TimWang says TimZhang is TimWang's brother. TimZhang says TimZhang is not TimWang's brother.

    Now define a substitution table:

    Tim = a, Zhang = 1, Wang = 2, brother = b

    The sentence becomes:

    a2 says a1 is a2's b. a1 says a1 is not a2's b.

    It is easy to see that the data is significantly shorter (compression), and, using the table, the information can be completely restored (lossless): hence lossless compression.

  • Lossy compression: the decompressed data differs from, but closely resembles, the original. It reduces data volume and raises the compression ratio by discarding secondary information and sacrificing some quality.

    Applying lossy compression to the same sentence:

    TimWang says TimZhang is TimWang's brother. TimZhang says TimZhang is not TimWang's brother.

    Lossy compression shortens the data while keeping its overall meaning:

    TimWang says TimZhang is his brother. TimZhang says he isn't.

    As you can see, the data is again shorter (compression), achieved by dropping the "unimportant" information (lossy) and keeping the "critical" information. But the original cannot be restored: hence lossy compression.
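
The substitution table in the lossless example above is essentially a small dictionary coder, which is also how real lossless compressors such as DEFLATE work. A quick check with Python's built-in zlib shows the data shrinking yet decompressing back byte-for-byte:

```python
import zlib

text = (b"TimWang says TimZhang is TimWang's brother. "
        b"TimZhang says TimZhang is not TimWang's brother.")

compressed = zlib.compress(text)
restored = zlib.decompress(compressed)

# The repeated "TimWang"/"TimZhang" substrings compress well:
print(len(text), len(compressed))
# Lossless: decompression restores the input exactly.
assert restored == text
```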

From the above, you should have a general picture of lossless and lossy compression. I'll stop here rather than belabor it; readers who want to know more about the two methods are encouraged to dig deeper on their own.
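
Lossy compression, by contrast, can be sketched as quantization: deliberately dropping the low-order bits of each sample. This is only a toy illustration (real JPEG uses DCT-based quantization), but it shows the key property: nearby values collapse together and the original cannot be recovered.

```python
def quantize(samples, keep_bits=4):
    """Keep only the top `keep_bits` of each 8-bit sample.

    A toy stand-in for lossy compression: nearby values collapse
    to the same bucket, so the exact original is unrecoverable.
    """
    shift = 8 - keep_bits
    return [(s >> shift) << shift for s in samples]

original = [200, 201, 198, 55, 52, 50]
lossy = quantize(original)
print(lossy)              # [192, 192, 192, 48, 48, 48]
assert lossy != original  # information was discarded
```

Fewer distinct values means the quantized data encodes into fewer bits, which is where the size saving comes from.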

Color model of bitmap

As explained above, a Bitmap is an image represented by an array of pixels.

Bitmap pixels are assigned specific positions and color values, and the color information of each pixel is represented by RGB combinations or grayscale values.

Some formats select a number of the most representative colors from the bitmap (usually no more than 256) into a color table, and then represent each color in the picture by its index into that table.

This reference to RGB combinations and color tables requires an understanding of color models.

What is a color model

A color model is a mathematical model the computer uses to represent colors: a way of describing color.

A color is usually identified by several variables, like coordinates in a space.

A color itself is objective and fixed, but different color models describe it from different angles, much as 11 = 10 + 1 and 11 = 2 + 9 describe the same number.

The color model determines the "rules" by which a color is expressed.

There are many color models, the following 6 models are commonly used:

  • RGB
  • YUV
  • HSV
  • HSL
  • CMY/CMYK
  • LAB

They fall into two categories:

  • Device-oriented models, for hardware such as color displays and printers: RGB (color displays/cameras), CMY/CMYK (color printers), YUV (TV systems)
  • Color-processing-oriented models, e.g. for producing color graphics for animation: HSL/HSV/LAB

Space is limited, so the following gives only a brief description of the RGB and HSV color models.

RGB

RGB, also known as the RGB color model, is an additive color model:

The three primary colors Red/Green/Blue are added together in different proportions to produce various colors.

As for why the primary colors are red/green/blue rather than other components, Wikipedia explains the physiological reason for choosing red, green, and blue:

The principle of the three primary colors is due not to physics but to physiology.

There are several types of color-sensitive cone cells in the eye, most sensitive to yellow-green, green, and blue-violet light respectively. If the yellow-green cones are stimulated slightly more than the green cones, we perceive yellow; if they are stimulated significantly more, we perceive red.

Although none of the three types of cone is individually most sensitive to red, green, or blue, light of those three colors can stimulate all three types of cone.

Therefore, we can consider:

Red light plus green light produces a visual effect equal to that of yellow light, but it is not yellow light itself.

It's like salt water and pure water: both are transparent liquids that look the same but are not actually the same thing.

RGB is a color space with poor perceptual uniformity. It is well suited to display systems, but not to image processing.

On iOS, you can directly use the RGBA model with the added Alpha channel to express colors:

UIColor *color = [UIColor colorWithRed:39/255.0 green:45/255.0 blue:51/255.0 alpha:0];
HSV

The HSV color model is widely used in image processing. It consists of three components:

  • H: Hue
  • S: Saturation
  • V: Value (brightness)

Hue carries the color information; its value ranges from 0 to 360 degrees.

Saturation is expressed as a percentage, from 0 to 1. Desaturating means adding white to the spectral color, so the proportion of the spectral color decreases; a saturation of 0 means that proportion is 0, and the color appears white.

Value is also expressed as a percentage, from 0 to 1. Lowering the value means adding black to the spectral color, so its proportion likewise decreases; a value of 0 means that proportion is 0, and the whole color appears black.

HSV is close to human visual perception and is widely used in computer graphics, scientific visualization, and other fields. Unlike RGB, it is a relatively uniform color space. Because HSV is closer to people's subjective perception of color, it is very suitable for color-based image similarity comparison.

It is also possible to use HSV directly for colors in iOS:

 UIColor *hsvColor = [UIColor colorWithHue:0.10 saturation:0.79 brightness:0.53 alpha:1.00];

note:

The various color models are different expressions of the same physical quantity, so they can be converted into one another.

For example, RGB and HSV coordinates can be converted via formulas; OpenCV also provides wrapper functions for the conversion.
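
For instance, Python's standard-library colorsys module implements these RGB↔HSV conversion formulas (all components normalized to 0–1, so hue is a fraction of a full 360° turn rather than degrees):

```python
import colorsys

# Pure red in RGB (components normalized to 0-1):
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(h * 360, s, v)        # hue 0 degrees, full saturation and value

# Converting back recovers the original RGB triple:
r, g, b = colorsys.hsv_to_rgb(h, s, v)
assert (r, g, b) == (1.0, 0.0, 0.0)
```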

The relationship between color model and color space

As we saw above, color models are built in three-dimensional space, so they are inseparable from color spaces.

So what is the relationship between color model and color space?

If you have some development experience, you have surely run into this in daily work:

Using the same color model, the same value used to represent a color may look different on different devices (such as the simulator versus a real device).

The difference lies in the color space being used: the same color renders differently in different color spaces.

One color model can map to different color spaces which, depending on how they are defined, have different gamuts (the range of colors they can represent) and meanings.

A color model is only meaningful within a specific color space. Taking the RGB model above as an example, there are sRGB, Adobe RGB, Apple RGB, and other color spaces, mostly tied to imaging peripherals.

Adobe RGB and sRGB are the most common. Here are the two color spaces (demo images from the Internet):

⬆️ sRGB

⬆️ Adobe RGB

In terms of area, Adobe RGB is larger than sRGB; that is, it can represent a wider range of colors.

The RGB color space of a typical design mockup is usually sRGB, which sometimes inevitably causes small color differences.

The next time you run into inconsistent colors on a device, you can align the color space with your designer.

Bitmap storage

With that in mind, let’s look at bitmap storage.

If a picture uses the RGB color model, every pixel is expressed in RGB, forming a pixel matrix:

In practical development, RGBA storage is also common, adding an Alpha channel, as in PNG images.

In general, if each channel takes 8 bits to store, a pixel occupies: 3×8 = 24 bits for RGB, or 4×8 = 32 bits for RGBA.

Although we drew a matrix above, in memory the storage is actually one-dimensional:

Pixels are laid out row by row, from top to bottom or from bottom to top.

Note also that the RGB or RGBA components may be stored in different orders, as shown below:

Common permutations are RGB, BGR, RGBA, and BGRA; most are BGR/BGRA.
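
Because the same bytes mean different colors under different orderings, code reading a raw pixel buffer must know its layout in advance. A small sketch (the read_pixel helper is hypothetical, not a platform API):

```python
def read_pixel(buf, index, layout="BGRA"):
    """Decode one 32-bit pixel from a flat byte buffer.

    `layout` names the channel order the buffer was written in;
    misreading BGRA bytes as RGBA swaps red and blue.
    """
    chunk = buf[index * 4:index * 4 + 4]
    return {ch: chunk[i] for i, ch in enumerate(layout)}

# One opaque red pixel stored in BGRA order: B=0, G=0, R=255, A=255
buf = bytes([0, 0, 255, 255])
print(read_pixel(buf, 0, "BGRA"))  # {'B': 0, 'G': 0, 'R': 255, 'A': 255}
print(read_pixel(buf, 0, "RGBA"))  # wrong layout: red and blue swapped
```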

Bitmap depth

The bitmap storage above mentioned the number of bits occupied by one pixel: that is the bitmap's depth, also called color depth or bit depth.

Color depth is the number of bits used to store each pixel of a bitmap, usually measured in bits per pixel (BPP).

The number of bits used to represent each point's color is an important indicator of fidelity:

The higher the color depth, the more colors are available: with a color depth of n bits, there are 2^n colors to choose from.

A bitmap with a depth of 1 bit has only 2 (2^1) possible values (black and white), so it is also called a binary bitmap.

An image with a depth of 8 bits has 256 (2^8) possible values (or 256 grayscale levels).

When the color depth reaches 24 bits, it is called true color: 2^24 = 16,777,216 colors (millions of colors), more than the human eye can distinguish.

Commonly used color depths are 1 bit (monochrome), 2 bits (4 colors, CGA), 4 bits (16 colors, VGA), 8 bits (256 colors), 16 bits (high color), 24 bits, and 32 bits.

Bitmap size calculation

The more colors, the more realistic the rendering, but each pixel takes up more bits and the data volume grows correspondingly.

So color depth also determines picture size. The formula is: picture size (in bits) = width × height × color depth

For example, a 100×100 image with a 32-bit color depth: 100 × 100 × 32 = 320,000 bits = 40,000 bytes.
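
The formula translates directly into code; a minimal sketch:

```python
def bitmap_size_bytes(width, height, bits_per_pixel):
    """Uncompressed bitmap size: width x height x color depth,
    converted from bits to bytes (8 bits per byte)."""
    return width * height * bits_per_pixel // 8

# The example from the text: a 100x100 image at 32 bits per pixel.
print(bitmap_size_bytes(100, 100, 32))   # 40000 bytes
# The same image in 24-bit RGB (no alpha) is a quarter smaller:
print(bitmap_size_bytes(100, 100, 24))   # 30000 bytes
```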

Then there is the High Dynamic Range (HDR) image, familiar as the HDR mode on the iPhone.

It takes three exposures (underexposed, normal, overexposed) and synthesizes them into one picture, so detail in both bright and dark areas is preserved.

HDR stores images with more than the usual 32 bits per pixel. Typically each pixel is allocated 32 + 32 + 32 bits of color information: each primary color uses a 32-bit value and can represent up to 2^32 levels. That color richness is a key to its rendering of detail.

Image grayscale

Recall from the color-model section:

Bitmap pixels are assigned specific positions and color values, and the color information of each pixel is represented by an RGB combination or a grayscale value.

What is grayscale

Grayscale can be thought of as brightness: put simply, the depth of a color's shade.

In terms of brightness, the brightest is white and the darkest is black, with a gray zone in between: hence the name grayscale.

The grayscale of RGB bitmap

A single-channel image can only display a grayscale image; an RGB three-channel image can display one as well.

However, each pixel of a single-channel grayscale image takes 8 bits, while each three-channel pixel takes 24 bits. As mentioned above, storage size is determined by the bits per pixel and the number of pixels, so a single-channel image takes less storage than a three-channel one.

The following diagram illustrates RGB gray levels. A coordinate system is built with R/G/B as the axes and black (0,0,0) as the origin; different coordinates represent different colors.

The colors on the diagonal from (0,0,0) to (1,1,1) are grays of different levels.

In three-dimensional coordinates, the grays along that diagonal satisfy R = G = B.

Gray level

The gray level of a picture is the maximum number of distinct gray values in the image.

The larger the gray level, the larger the image's brightness range.

The more gray levels a picture can display, the stronger its tonal gradation and the richer its color. For example, an RGB bitmap with 16 gray levels per channel can display 16×16×16 = 4096 colors.

An image's gray level is related to its bitmap depth:

If the bitmap depth is n bits, the gray level is 2^n.

Currently, most bitmaps have 256 gray levels (each RGB channel stored in 8 bits: 2^8).

Grayscale images

Each pixel of a grayscale image has only one sample value, representing a gray between black and white. For example, if each pixel of a grayscale image uses 8 bits, its gray value is one of 256 gray levels (0–255) between black and white.

Converting an image to grayscale means making every pixel in the pixel matrix satisfy the relationship:

R = G = B; this common value is called the gray value.

For example, RGB(100,100,100) represents a gray value of 100, and RGB(50,50,50) a gray value of 50.

The larger the gray value, the brighter the pixel: at the maximum gray value it shows white. Conversely, the lower the gray value, the darker the pixel, showing black at a gray value of 0.

Differences in gray value give pixels different shades of gray; together they display the grayscale image.

Image grayscale processing

In many cases, a grayscale image is a simplified representation of a color image, reflecting the color image's brightness.

Image grayscale conversion can serve as a pre-processing step for higher-level operations such as image segmentation, image recognition, and image analysis.

There are four common grayscale methods: the component method, the maximum value method, the average method, and the weighted average method.

Here is a picture of Yang Chaoyue:

Processing it with the four methods above gives different grayscale images; take a look at the effects.

Weighted average method

The three components are averaged with different weights.

The human eye is most sensitive to green and least sensitive to blue, so a more natural grayscale image is obtained by taking a weighted average of the RGB components with the following formula:

Gray = R×0.3 + G×0.59 + B×0.11

The average method

The gray value is obtained by averaging the brightness of the three components of the color image.

Gray = (R + G + B) / 3

The maximum value method

The maximum brightness of the three components of the color image is taken as the gray value of the grayscale image.

Gray = Max(Max(R,G),B)

Component method

The brightness of each of the three components of the color image is used as the gray value of a grayscale image, forming three new images; then select the one you need.

Gray = R

Gray = G

Gray = B
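
All four methods map one (R, G, B) pixel to a single gray value. Here is a compact Python sketch of the four formulas (an illustration only, not the article's Objective-C [UIImage grayForImage:forType:] implementation):

```python
def weighted_average(r, g, b):
    # Weights reflect the eye's sensitivity: green most, blue least.
    return round(0.3 * r + 0.59 * g + 0.11 * b)

def average(r, g, b):
    return round((r + g + b) / 3)

def maximum(r, g, b):
    return max(r, g, b)

def component(r, g, b, channel="R"):
    # Pick one channel's brightness as the gray value.
    return {"R": r, "G": g, "B": b}[channel]

pixel = (50, 100, 200)
print(weighted_average(*pixel))        # 96
print(average(*pixel))                 # 117
print(maximum(*pixel))                 # 200
print(component(*pixel, channel="G"))  # 100
```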

The code that generates the images above has been uploaded to GitHub; see the [UIImage grayForImage:forType:] method.

Image binarization

As mentioned, image grayscaling is a pre-operation for image segmentation. After grayscaling, what often follows is to binarize the picture further.

What is image binarization

Binarization turns the gray value of each pixel in the image's pixel matrix into either black or white, so the whole image presents only a black-and-white effect.

For example, with 256 gray levels, a binarized image contains only the gray values 0 and 255.

The role of image binarization

Binary images play a very important role in digital image processing. Compared with a color image, a binary image is much smaller, greatly reducing the amount of image data.

More to the point, image binarization is generally used to extract a target object from the image.

How to binarize a grayscale image

Generally speaking, binarization first converts the image to grayscale, then turns each pixel black or white based on its gray value.

So once a pixel has been grayed, how is its gray value transformed into black or white?

For example, if the gray value is 188, is it black or white after binarization? This is where the threshold comes in.

There are three basic approaches:

1. Median method

Take 127 (the median of 0 to 255) as the threshold: gray values less than or equal to 127 become black, and values greater than 127 become white. The advantage is the small amount of computation and high speed; the disadvantage is that the threshold is fixed at 127 regardless of the picture. Different pictures have very different color distributions, so using 127 as a universal threshold rarely works well.

2. Average gray value of all pixels

First, compute the average gray value of all the pixels:

pixel mean = (gray value of pixel 1 + … + gray value of pixel N) / N

The pixel mean is then used as the threshold. This works better than the median method.

3. Histogram method (also known as the bimodal method)

Histograms are an important feature of images. The histogram method assumes that an image consists of foreground and background, each forming a peak in the gray histogram, and places the threshold at the lowest point between the two peaks.
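
The mean-value approach (method 2 above) can be sketched in a few lines: average the gray values, then map each pixel to black (0) or white (255):

```python
def binarize_mean(gray_pixels):
    """Binarize using the mean gray value as the threshold:
    pixels above the mean become white (255), the rest black (0)."""
    threshold = sum(gray_pixels) / len(gray_pixels)
    return [255 if p > threshold else 0 for p in gray_pixels]

grays = [30, 60, 90, 180, 210, 240]   # mean = 135
print(binarize_mean(grays))           # [0, 0, 0, 255, 255, 255]
```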

Note that these three approaches are not universally applicable; they are relatively rough methods.

Binarization has many algorithms for different situations; interested readers are encouraged to study them in depth.

Gray histogram

A word on the gray histogram: the horizontal axis is the gray level, and the vertical axis is the number of pixels at each gray level.

From the gray histogram, you can see at a glance how the pixels at each gray level compare; the gray distribution is an important feature of an image.
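
A gray histogram is just a count of pixels per gray level, so Python's collections.Counter builds one directly:

```python
from collections import Counter

gray_pixels = [0, 0, 128, 128, 128, 255]
histogram = Counter(gray_pixels)

print(histogram[128])   # 3 pixels at gray level 128
print(histogram[0])     # 2 pixels at level 0 (black)
```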

Binarization processing demonstration

Again, take a picture of Yang Chaoyue for demonstration:

For ease of calculation, the binarization threshold is taken directly as a proportion between 0 and 1.

First, the color pixels are grayed (weighted average); then values greater than the threshold become white and values less than it become black.

Threshold = 0.4

Threshold = 0.5

Threshold = 0.6

The relevant code has also been uploaded to GitHub, in the [UIImage covertToBinaryzation:] method.

The most critical part of binarization is the threshold: which value you use, and over what range, determines the binarization result.

This is only a very simple demonstration.

Image binarization has many concrete applications. On the client side alone, I can think of QR-code scanning, extracting signatures or seals from images, recoloring merchant icons, and so on.

If I've gotten anything wrong, corrections are welcome.

reference

Mobile Image Format Survey

RGB, HSV, and HSL color Spaces

Color model and color space

Some of the basics of image processing

iOS Storyboard and UIColor: creating different colors

JPG PNG TGA BMP storage format

iOS image grayscale processing: three methods

Image processing: image grayscale

OpenCV– Binary Image