Alitao – Poirot

What is Computer Vision? What is the difference between Computer Graphics and Image processing?

In short, computer vision (CV) is the science of enabling computers to extract useful information from images and analyze it. That information can then support decision-making in tasks such as medical image analysis, engineering measurement, autonomous driving, robotics, and so on.

Two concepts that are often discussed alongside CV, and easily confused with it, are computer graphics (CG) and image processing. The distinction is simple: CV and CG run in opposite directions. CV has a computer extract useful information from an image, while CG uses mathematical models and computer algorithms to generate an image. In image processing, both the input and the output are images, but the output image has been enhanced or has had certain features extracted. (The flow chart of the three is as follows.)


As the definitions above suggest, CV is not the core technology behind Google Street View, Google Glass, Pokémon Go (AR), deepfakes, and many other applications that merely sound related to it (CV may play a part somewhere in their pipelines, but it is not the core technology). The applications that genuinely depend on CV are autonomous driving, OCR, automated screening of packages at security checks, medical image analysis, robot vision, image-based military target detection and localization, and so on, all of which require a computer to analyze images and extract information from them.

Computer vision versus the human eye

Human vision is, in essence, a two-dimensional projection of the three-dimensional world (subject to changes of perspective). The visible wavelength range is roughly 400–700 nanometers, and the perceivable gamut can be described by the RGB system. Roughly a third of the human brain is directly or indirectly connected to the retina and the optic nerve.



The imaging instruments of computer vision are far more varied than the human eye. Compared with the visible band the eye perceives, computer vision is not even limited to electromagnetic waves (B-mode ultrasound imaging, for example). As for imaging dimensionality, a conventional camera produces two-dimensional images, radar produces 2.5D images (the extra half dimension being distance), and MRI produces 3D images (the third dimension formed by stacking many two-dimensional slices along the Z-axis).




Overview of the computer vision pipeline

If we abstract the CV workflow step by step, two very classical stages emerge. With traditional CV methods, we rarely flatten a two-dimensional (or higher-dimensional) image directly into one dimension and feed it to a classical machine learning model (the decision tree family, SVM, KNN, MLP, and so on). Instead, we first perform feature extraction on the raw image and use a handful of more informative numerical features as the model's input, making the model run more accurately and efficiently. In recent years, the rise of deep learning has largely overturned this received wisdom: CNNs and their derivatives (ResNet and others), GANs, and self-supervised approaches such as contrastive learning, popular over the last couple of years, can replace manual feature extraction in many situations. A classic example is game AI. In the past, engineers would extract information such as the liberties of each stone and the relative territory of the two sides as model inputs, but with the arrival of the AlphaGo series, manual feature extraction proved far less accurate and efficient in this setting than letting the convolutional and pooling layers of a deep model extract features themselves.
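As a toy illustration of the manual feature extraction step described above (the specific feature here, a coarse brightness histogram, is purely an illustrative assumption, not something from the original text):

```python
def brightness_histogram(img, bins=4):
    """Reduce a 2D grayscale image (pixel values 0..255) to a small
    normalized histogram, a classic hand-crafted feature vector that
    could be fed to an SVM, KNN, or MLP instead of raw pixels."""
    counts = [0] * bins
    total = 0
    for row in img:
        for v in row:
            # Map the 0..255 brightness into one of `bins` buckets.
            counts[min(v * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]

# One pixel in each quarter of the brightness range:
features = brightness_histogram([[0, 255], [128, 64]])  # → [0.25, 0.25, 0.25, 0.25]
```

A deep model would instead learn its own features from the raw pixel grid, which is exactly the shift the AlphaGo example illustrates.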


Is deep learning just a bubble inflated by hype?

Over the past few years, many engineers have jumped into the field with grand visions of deep learning, only to find that it is not as brilliant as expected. Cooler heads have started to wonder whether deep learning has been over-hyped. To frame the question, start with the Gartner Hype Cycle, which divides the life cycle of a technology into five stages: 1. the innovation trigger, 2. the peak of inflated expectations, 3. the trough of disillusionment, 4. the slope of enlightenment, and 5. the plateau of productivity.


The charts below show Gartner's official placement of some of the hottest emerging technologies of recent years. As the chart shows, deep learning (arrow) has moved from the peak of inflated expectations in 2016 into the trough of disillusionment over the past five years. We can expect deep learning to go through a long hibernation, accompanied by doubt and pessimistic expectations, before it reaches maturity.






Image digitization

Spatial sampling, quantization, and connectivity strategies are the key concepts involved in representing an image digitally.

Spatial sampling: the number of samples taken per unit of space.


Quantization: the process of converting the brightness of each pixel in a color channel from a continuous distribution to a discrete one, divided into intervals.
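Both ideas can be sketched in a few lines of Python (the image size, sampling stride, and level count below are illustrative assumptions):

```python
import random

# A toy "continuous" brightness image with values in [0, 1).
random.seed(0)
image = [[random.random() for _ in range(6)] for _ in range(6)]

# Spatial sampling: keep every 2nd sample along each axis,
# halving the spatial resolution.
sampled = [row[::2] for row in image[::2]]

def quantize(img, levels):
    """Map continuous brightness in [0, 1) onto `levels` discrete integer levels."""
    return [[int(v * levels) for v in row] for row in img]

q = quantize(image, 256)  # 8-bit quantization: integer values 0..255
```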


Four-connectivity and eight-connectivity: criteria for defining a pixel's neighborhood. Four-connectivity treats the four edge-sharing pixels as neighbors; eight-connectivity adds the four corner (diagonal) pixels.
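The two neighborhood definitions reduce to two sets of coordinate offsets (a minimal sketch; the helper name is an illustrative choice):

```python
# Neighbor offsets for the two standard connectivity definitions.
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # the four edge-sharing neighbors
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # plus the four corners

def neighbors(r, c, offsets, height, width):
    """Yield the in-bounds neighbor coordinates of pixel (r, c)."""
    for dr, dc in offsets:
        nr, nc = r + dr, c + dc
        if 0 <= nr < height and 0 <= nc < width:
            yield nr, nc
```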

Distance:
  • Euclidean distance: the straight-line distance between two points.
  • Manhattan distance (City Block): the distance between two points when moving only along grid lines (i.e. only horizontally and vertically).
  • Chebyshev distance: the distance between two points when horizontal, vertical, and diagonal moves each count as one step.
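The three metrics can be written out directly for 2D pixel coordinates:

```python
def euclidean(p, q):
    """Straight-line distance between points p and q."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def manhattan(p, q):
    """City-block distance: horizontal and vertical moves only."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chebyshev(p, q):
    """Chessboard distance: diagonal moves also count as one step."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

a, b = (0, 0), (3, 4)
# euclidean(a, b) → 5.0, manhattan(a, b) → 7, chebyshev(a, b) → 4
```

Note how the three metrics diverge on the same pair of points: Manhattan is always the largest and Chebyshev the smallest.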




Region connectivity algorithm

A picture usually has a foreground and a background. Before we can decide which pixels in the picture connect into a region, we must fix a connectivity strategy for foreground and background: either four-connected background with eight-connected foreground, or eight-connected background with four-connected foreground. Why not use the same connectivity for both? As shown below:


Using the same connectivity strategy for background and foreground can produce the degenerate situations shown above, in which foreground and background end up either entirely connected or entirely disconnected.

Recursive Algorithm

As the basic approach to computing connected pixel regions, the recursive strategy is very simple: traverse every pixel in the image, and whenever a point without a region number is encountered, recursively mark all pixels connected to it with the current region number.
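A minimal sketch of this recursive flood fill for a four-connected foreground (the function names and the 0/1 image encoding are illustrative assumptions):

```python
def label_recursive(img, foreground=1):
    """Label 4-connected foreground regions by recursive flood fill.
    `img` is a list of lists of 0/1; returns a same-shaped label map
    where 0 means background and 1, 2, ... are region numbers."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]

    def fill(r, c, lab):
        if not (0 <= r < h and 0 <= c < w):
            return
        if img[r][c] != foreground or labels[r][c] != 0:
            return
        labels[r][c] = lab
        # Recurse into the four edge-sharing neighbors.
        fill(r - 1, c, lab); fill(r + 1, c, lab)
        fill(r, c - 1, lab); fill(r, c + 1, lab)

    next_label = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] == foreground and labels[r][c] == 0:
                next_label += 1          # new region found
                fill(r, c, next_label)
    return labels
```

Each pixel of a region adds a stack frame, which is exactly the resource cost the next section addresses.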


Sequential Method

Although the recursive strategy solves the problem, it consumes too many resources (each pixel of a region adds a stack frame, so a large region can overflow the call stack). Here is a more advanced method for the pixel-region connectivity problem:

  • Iterate over all points in the image. When a foreground pixel's already-visited neighbors carry a label, assign the same label to the current pixel. (A four-connected foreground has two already-visited neighbors, the one above and the one to the left; an eight-connected foreground has four.) If no neighbor is labeled, increment the previous label and assign the new one. There is one special case: the pixel indicated by the red arrow in the figure has several neighbors carrying different labels at once. In that case, we record labels 2 and 3 as equivalent.
  • In a second pass over all points in the image, merge all labels previously recorded as equivalent.
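The two passes above can be sketched as follows for a four-connected foreground; a small union-find table stores the equivalences (the function names and the 0/1 image encoding are illustrative assumptions):

```python
def label_two_pass(img, foreground=1):
    """Two-pass 4-connected labeling with an equivalence (union-find) table.
    `img` is a list of lists of 0/1; returns a same-shaped label map."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # parent[i] = representative of label i; index 0 unused

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    next_label = 0
    # Pass 1: assign provisional labels, recording equivalences.
    for r in range(h):
        for c in range(w):
            if img[r][c] != foreground:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                next_label += 1
                parent.append(next_label)
                labels[r][c] = next_label
            elif up and left:
                ru, rl = find(up), find(left)
                labels[r][c] = min(ru, rl)
                parent[max(ru, rl)] = min(ru, rl)  # mark labels equivalent
            else:
                labels[r][c] = up or left
    # Pass 2: replace each provisional label with its representative.
    for r in range(h):
        for c in range(w):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels
```

A U-shaped region is the classic case where the two arms first receive different provisional labels and are only merged when the bottom row joins them.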



