1. Characteristics of the human eye
  2. Frame and field
  3. BLC (Black Level Correction)
  4. Binning
  5. Video size
  6. Red-eye prevention
  7. Tone mapping
  8. Names of some light sources
  9. What is 3D Sensing?
  10. Slow motion video
  11. Super slow motion (ultra-high-speed capture)
  12. Image segmentation
  13. Flat areas
  14. Texture areas
  15. Clear edges
  16. Bayer CFA
  17. Distortion
  18. Artifacts
  19. CIE1931
  20. Color deviation
  21. ISP

1. Characteristics of the human eye


There are about 125 million rods and cones in the retina, which act as photoreceptors.

There are 18 times more rods than cones.

Rods are sensitive to low light (they can respond to a single photon) and produce dark (scotopic) vision in dim environments, but they can only distinguish light from dark, not detail or color. There are about 97 million rods.

Cones sense bright light and color and produce bright (photopic) vision, which helps us read books and newspapers.

The information received by the rods and cones is then transmitted to the nearly one million ganglion cells in the retina.

These ganglion cells send information from the rods and cones to the brain via the optic nerve.

There are three types of cones, each containing a pigment sensitive to red, green, or blue light.

Color light causes color vision, which is a complex physical and psychological phenomenon.

According to the trichromatic (three-primary-color) theory, the mechanism of color perception is as follows: when light of different colors strikes the retina, the three kinds of cones are excited to different degrees, and this excitation is encoded as different combinations of optic nerve impulses.

When these impulses reach the cerebral cortex, they produce different color perceptions.

For example, a red:green:blue cone excitation ratio of 4:1:0 produces the perception of red, while a ratio of 2:8:1 produces green.

The human eye can distinguish about 150 colors at wavelengths between 380 and 760 nm.

Some people, however, lack the corresponding cones due to genetic factors and cannot distinguish certain colors; this is known as color blindness.

Those who lack red- or green-sensitive cones cannot tell red from green; this is called red-green color blindness.

Others have a reduced ability to distinguish colors, often because of poor health or nutrition; this is called color weakness.

2. Frame and field

Television images are composed of a continuous series of still images whose content changes only slightly from one to the next.

One of these still images is called a “frame” in television technology.

To ensure flicker-free viewing, and given the persistence of vision of the human eye (an image lingers in the eye for a short time after it disappears), continuous images must be transmitted at a rate of at least 25 frames per second.

In television transmission, to further improve quality, each frame is split into two halves for transmission; each half is called a "field".
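As a rough sketch, the relationship between a frame and its two fields can be shown in code; `split_fields` and `weave_fields` are made-up helper names, and NumPy is used only for the array handling:

```python
import numpy as np

def split_fields(frame):
    """Split a progressive frame into two fields: odd scan lines
    (rows 0, 2, 4, ...) and even scan lines (rows 1, 3, 5, ...)."""
    return frame[0::2], frame[1::2]

def weave_fields(top, bottom):
    """Reassemble the full frame by interleaving the two fields."""
    frame = np.empty((top.shape[0] + bottom.shape[0],) + top.shape[1:],
                     dtype=top.dtype)
    frame[0::2] = top
    frame[1::2] = bottom
    return frame

frame = np.arange(6 * 4).reshape(6, 4)  # toy 6-line "frame"
top, bottom = split_fields(frame)       # two 3-line fields
```

Transmitting the two fields alternately doubles the apparent refresh rate without doubling the bandwidth of full frames.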

3. BLC (Black Level Correction)

When the analog signal is very weak, it may not be resolved by the A/D converter, so image details are lost in very dark scenes.

Therefore, the sensor adds a fixed offset to the analog signal before A/D conversion, ensuring that the output digital signal retains more image detail.

The black level correction module determines the specific value of the offset through calibration. The subsequent ISP processing modules need to subtract the offset value first to ensure the linear consistency of data.
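A minimal sketch of the subtraction step, assuming a 10-bit sensor with an illustrative offset of 64 (the real value comes from the calibration described above):

```python
import numpy as np

def black_level_correct(raw, offset=64, white_level=1023):
    """Subtract the calibrated black-level offset from raw data,
    clamping so no negative codes survive."""
    corrected = raw.astype(np.int32) - offset
    return np.clip(corrected, 0, white_level - offset)

raw = np.array([60, 64, 100, 1023])
print(black_level_correct(raw))  # codes at or below the offset become 0
```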

4. Binning

Camera binning mode (pixel merging mode): the charges of adjacent pixel units are physically added together and output as a single pixel signal.

Horizontal binning: charges of adjacent pixels in the same row (adjacent columns) are combined;

Vertical binning: charges of adjacent pixels in the same column (adjacent rows) are combined;

Advantages of Binning Mode:

Increases the effective photosensitive area per output pixel, improving sensitivity in dark scenes.

Disadvantages of Binning Mode:

Reduced output resolution.
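Real binning adds charge in the analog domain on the sensor, but its effect can be sketched in software as a 2×2 sum, trading resolution for signal:

```python
import numpy as np

def bin2x2(img):
    """Sum each 2x2 block of pixels: resolution drops by half in each
    direction while the combined signal rises roughly fourfold."""
    h, w = img.shape
    return (img[0:h:2, 0:w:2] + img[1:h:2, 0:w:2] +
            img[0:h:2, 1:w:2] + img[1:h:2, 1:w:2])

img = np.full((4, 4), 10, dtype=np.uint16)  # uniform toy exposure
binned = bin2x2(img)                        # 2x2 output, each pixel 40
```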

5. Video size

 FHD Video: 1920 × 1080, 16:9, 30 FPS

 HD Video: 1280 × 720, 16:9, 60 FPS

 VGA Video: 640 × 480, 4:3, 120 FPS

4K video shooting: 3840×2160, 30 fps

1080p video shooting: 1920×1080, 30 fps

720p video shooting: 1280×720, 30 fps

480p video shooting: 720×480, 30 fps

Slow motion video: 720p, 120 fps

For example, with a horizontal field of view of 120 degrees, 720P can capture a readable license plate at up to 5.5 meters, about one car length.

1080P can capture a license plate at 8.3 meters, about a car and a half away, and 1296P at 10 meters, about two car lengths away.

1. The difference between 720P and 1080P is only the resolution.

2. 1080P picture resolution: 1920×1080

3. 720P picture resolution: 1280×720

4. The "720" in 720P refers to the number of lines of vertical resolution, and the "P" refers to progressive scan. Common frame rates are 60, 30, 25, and 24.

Here are some common scan formats:

D1 is the 480i format, with the same resolution as NTSC analog television: 525 scan lines (480 visible), 4:3 or 16:9, interlaced/60 Hz, line frequency 15.25 kHz.

D2 is the 480p format, with the same specifications as progressive-scan DVD: 525 scan lines (480 visible), 4:3 or 16:9, resolution 640×480, progressive/60 Hz, line frequency 31.5 kHz.

D3 is the 1080i format, a standard digital TV display mode: 1125 scan lines (1080 visible), 16:9, resolution 1920×1080, interlaced/60 Hz, line frequency 33.75 kHz.

D4 is the 720p format, a standard digital TV display mode: 750 scan lines (720 visible), 16:9, resolution 1280×720, progressive/60 Hz, line frequency 45 kHz.

D5 is the 1080p format, a standard digital TV display mode: 1125 scan lines (1080 visible), 16:9, resolution 1920×1080, progressive scan; a professional format.

In HD network cameras, resolutions are usually described as megapixel counts:

1080P actually means 1920 × 1080 pixels — 1920 × 1080 = 2,073,600 pixels if you do the math — and people usually call 1080P or 1080i a 2-megapixel camera.

960P actually means 1280 × 960 pixels — that's 1,228,800 pixels if you do the math — and people call 960P or 960i 1.3 megapixels.

720P actually means 1280 × 720 pixels — 921,600 pixels if you do the math — and people call 720P or 720i megapixel resolution.
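The pixel arithmetic above is easy to verify:

```python
def pixel_count(width, height):
    """Total pixel count for a resolution."""
    return width * height

for name, (w, h) in {"1080P": (1920, 1080),
                     "960P": (1280, 960),
                     "720P": (1280, 720)}.items():
    print(f"{name}: {pixel_count(w, h):,} pixels "
          f"~ {pixel_count(w, h) / 1e6:.1f} MP")
```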

1. XGA resolution: 1024×768

2. VGA resolution: 640×480

3. UXGA resolution: 1600×1200

6. Red-eye prevention

The retina is rich in blood vessels. When a flash photo is taken at night, the pupil, which has dilated to let in more light, has no time to contract in response to the instantaneous flash, so the light is projected straight onto the retina.

The blood vessels in the retina reflect red light back into the image, making the eyes appear red: the "red-eye" effect.

7. Tone mapping

Tone mapping refers to algorithms that remap the tonal values of an image.

The purpose is to adjust the gray levels of the image so that the result looks more natural to the human eye and better conveys the information and features of the original scene.
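One minimal example of such a mapping (many others are used in practice) is the global Reinhard curve L/(1+L), which compresses bright values into [0, 1) while leaving shadows nearly linear:

```python
import numpy as np

def reinhard_tonemap(luminance):
    """Global Reinhard operator: L / (1 + L)."""
    return luminance / (1.0 + luminance)

hdr = np.array([0.01, 0.1, 1.0, 10.0, 100.0])  # wide-range luminances
ldr = reinhard_tonemap(hdr)                    # all values now in [0, 1)
```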

8. Names of some light sources

Tungsten: tungsten light; Horizon

Warm fluorescent

Shade: D75

Flash: strobe

Fluorescent: CWF, TL84

Daylight fluorescent

Incandescent: A, H

Neutral density filter:

A neutral density filter, also called a medium-gray density lens or ND filter, is used to attenuate light.

Its effect is non-selective: an ND filter reduces light of all wavelengths equally and uniformly, so it only weakens the light and has no effect on the color of the original subject, faithfully reproducing the scene's contrast.

9. What is 3D Sensing?

3D Sensing, literally, turns what a 2D camera captures into 3D data: besides X- and Y-axis data, every pixel also carries Z-axis (depth/distance) data, so the image is no longer just a flat projection.

So 3D Sensing simply means adding a module for measuring depth data to the existing camera.

At present, a 3D Sensing system consists mainly of a transmitter and a receiver. Different approaches use different device structures, generally including a laser emitter (VCSEL), wafer-level optics (WLO), and a sensor.

There are three solutions on the 3D Sensing market, in descending order of maturity: Structured Light, TOF, and Binocular.

Among them, structured light, the most mature, has been widely applied in industrial 3D vision, while the TOF approach appeared in Google's Project Tango. Binocular (stereo) algorithms are difficult to develop, so they are mainly used in emerging fields such as robotics and autonomous driving, where power consumption is less of a concern.
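For the binocular route specifically, depth follows from the disparity between the two views via the pinhole stereo relation Z = f·B/d; the numbers below are purely illustrative:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Z = f * B / d: focal length in pixels, camera baseline in
    metres, and the horizontal pixel shift of the same point
    between the left and right images."""
    return focal_px * baseline_m / disparity_px

# a point shifted 50 px between views, f = 1000 px, baseline 10 cm
z = depth_from_disparity(1000, 0.1, 50)  # 2.0 metres away
```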

10. Slow motion video

A technique from cinematography; as the saying goes, "Cinema is truth at 24 frames per second."

To achieve this simple trick, raise the shooting speed above the normal 24 frames per second, to 50, 100, or even more frames per second.

Playback remains at 24 frames per second, which is equivalent to stretching one second of captured action over two to four seconds on screen. This is what we call slow motion.

Conversely, if you shoot at a lower speed (less than 24 frames per second), the projected motion looks fast.

Record slow-motion footage at 960 fps and output the movie at 30 fps.

• 960 FPS / 30 FPS = 32 x

That means even a phone can record 32x slow motion, four times slower than the Google Pixel's 240 fps mode; this is the ultra-slow motion of the Sony Xperia XZs and XZ Premium. The sensor also records an image five times faster, in 8 ms instead of 40 ms, which reduces the rolling shutter effect.
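The frame-rate ratios quoted here can be checked directly:

```python
def slowdown(capture_fps, playback_fps=30):
    """How many times slower the footage plays back than real time."""
    return capture_fps / playback_fps

print(slowdown(960))                  # 32x slow motion
print(slowdown(960) / slowdown(240))  # 4x slower than a 240 fps mode
```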

Many newer phones, such as the iPhone X, already support 1080p/240fps slow-motion video. Even so, slow motion is not a commonly used feature, because it reduces image quality and consumes a lot of storage space.

1. 120 FPS means 120 frames, i.e. 120 pictures, are played per second; 240 FPS means 240 frames, or 240 pictures, per second.

Played back as slow motion, the latter can therefore be slowed down further.

2. The iPhone 6 can reach 240 fps in slow motion. Generally speaking, the higher the frame rate, the blurrier each frame may look compared with 60 fps footage.

The reason is that with the same camera and image-processor performance, a higher frame rate leaves less exposure and processing per frame, so the individual images are relatively more blurred.

The often-mentioned "jello" effect, and the bright band recorded when a flash fires in the scene, are both rooted in the rolling shutter of CMOS sensors.

1. Electronic rolling shutter

Most CMOS sensors currently use this shutter. Each pixel is cleared at the start of its exposure, and its signal is read out after the exposure time. Because readout is serial, clearing/exposure/readout can only proceed line by line, usually from top to bottom, much like a mechanical focal-plane shutter, and like one it visibly deforms objects moving at high speed. Because its scan is slower than that of a mechanical focal-plane shutter, the distortion is more pronounced: if data is read at 20 frames per second, for example, the exposure times of the top and bottom of the image differ by as much as 50 milliseconds.

To compensate, digital cameras usually pair the sensor with a mechanical shutter: the whole sensor is reset at the start of the exposure (most sensors have a fast reset that completes in a few clock cycles), the mechanical shutter opens, and after it closes the data is read out in order.
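A toy simulation illustrates the skew: rows are read out one after another, so a vertical bar moving sideways is recorded as a diagonal line. Every parameter here is invented for illustration:

```python
import numpy as np

def rolling_shutter_capture(height=8, width=16, line_time=1, speed=1):
    """Record a vertical bar that moves `speed` pixels per time unit
    while rows are read out `line_time` apart: each row sees the bar
    at a later position, producing the familiar slant."""
    img = np.zeros((height, width), dtype=np.uint8)
    for row in range(height):
        t = row * line_time        # this row is exposed/read at time t
        x = (speed * t) % width    # where the bar is at that moment
        img[row, x] = 1
    return img

img = rolling_shutter_capture()  # bar position shifts by one per row
```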

2. Global shutter / snapshot shutter

The main difference is that a sample-and-hold unit is added to each pixel: at the specified moment all pixels sample their data simultaneously, and the values are then read out in sequence. Although readout still happens line by line after the exposure, the data stored in the sample-and-hold units no longer changes.

11. Super slow motion (ultra-high-speed capture)

Super slow motion (960 fps) is a feature that lets users record video at 960 fps with an ultra-high-speed camera, capturing moments normally invisible to the human eye by playing them back 32 times slower than normal video (30 fps) and 4 times slower than conventional slow motion (240 fps).

Normal video (30 fps):

The speed of the video matches the actual motion.

Slow-motion video (240 fps):

Playback is 8 times slower than normal, similar to typical slow-motion replays in movies, commercials, or sports.

Super slow motion (960 fps):

Footage plays 32 times slower than normal, capturing small changes in facial expression and subtle movements of fast-moving objects.

In slow motion (240fps), all video clips are saved at 240fps.

With super slow motion (960 fps), only the 0.2-second burst captured at ultra-high speed is saved at 960 fps; the rest is normal video (30 fps). The stored super-slow-motion video is therefore relatively small.
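The storage saving follows from the arithmetic: a 0.2-second burst at 960 fps is only 192 frames, which stretch to about 6.4 seconds on screen at 30 fps:

```python
def playback_seconds(capture_seconds, capture_fps, playback_fps=30):
    """On-screen duration of a high-speed burst at normal playback."""
    return capture_seconds * capture_fps / playback_fps

frames = 0.2 * 960                  # 192 frames in the burst
shown = playback_seconds(0.2, 960)  # about 6.4 s of slow motion
```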

What is 960fps slow motion photography?

First, the definition of FPS: it is the number of frames transmitted per second by an image or video, commonly known as the frame rate. The higher the number, the smoother the motion looks.

When a video reaches 960fps, it is beyond the realm of “smooth” and can only be fully demonstrated with slow motion video.

This technique is often used in professional film and television. For example, when we watch motor racing, a car at full speed whizzes past too fast even to make out the model; with high-frame-rate slow motion, you can see the car clearly as it passes the grandstand instead of just a blur.

12. Image segmentation

It is the technical process of dividing an image into several regions or subsets according to features such as gray level, color, spatial texture, and geometric shape.

The quality of segmentation directly affects the results of image analysis and processing.

13. Flat areas

Refers to the region that does not contain obvious edges and has relatively flat pixel variation.

14. Texture areas

Regions where edges are relatively dense and some false edges appear. The edge distribution in a texture region exhibits a certain periodicity, giving rise to texture features.

15. Clear edges

It refers to the clear edge of the image with obvious edge features that can be detected by the traditional edge detection operator.
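As one example of a traditional edge-detection operator, a naive 3×3 Sobel gradient magnitude responds strongly exactly at such clear edges (written without convolution libraries for clarity):

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude with the classic 3x3 Sobel kernels;
    borders are left at zero for simplicity."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    out = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = gray[y - 1:y + 2, x - 1:x + 2]
            out[y, x] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

step = np.zeros((5, 6))
step[:, 3:] = 1.0                 # a clean vertical step edge
edges = sobel_magnitude(step)     # strong response along the step
```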

Common illustrations:

16. Bayer CFA

17. Distortion

18. Artifacts

19. CIE1931

20. Color deviation

Dead leaves chart

The colored dead leaves chart is an upgrade of the original dead leaves chart, designed to measure texture sharpness.

It has several advantages over the older chart, including the gray patches on the left and right, which help remove noise: their spectral power density is subtracted from the PSD of the signal-plus-noise measured in the central dead-leaves area, improving measurement accuracy and robustness.


* I have been engaged in Android camera-related development for 5 years. * I now work in Shenzhen. * Welcome to follow my WeChat official account: Xiaochi Notes