Theory of wisdom

Compile | Bot

Source | Seattle Data Guy

Editor’s note: As an important branch of artificial intelligence, computer vision has been through a stormy 50 years. Over the past half century it has borrowed from neuroscience, using cameras to simulate human eyes, computers to simulate the brain, and algorithms and programs to simulate thinking, making it possible for machines to recognize images and describe scenes. So how did it develop?

Vision has always been a complex task for the human brain (and now for the “computer brain”): whenever our eyes are open, it continuously scans our surroundings and locates us within them. Processes such as depth perception, object tracking, adjusting to differences in light, and edge detection all shape how vision forms, yet we are so used to them that we barely notice the subtle work happening behind the scenes. Building a system that functions like the human brain may never have occurred to earlier researchers, but over the past 50 years, humans have made the leap from purely neuroscientific research to using computers to recognize and describe images.

From neuroscience to computer vision

Phineas Gage’s injury. Source: ScienceDaily

Neuroscience provides a wealth of medical cases for understanding brain function, such as the famous case of Phineas Gage, a railway construction foreman who recovered from an injury in which an iron rod was driven through his left frontal lobe. His basic functions, including movement, speech, and intelligence, remained normal, but his temperament changed. This extremely rare case led researchers to associate the frontal lobe with higher mental activity. In 1992, Kenneth H. Britten, in A Comparison of Neuronal and Psychophysical Performance, described changes in brain signals when looking for a target among a group of chaotically moving dots. These studies are piecemeal, but they go a long way toward explaining how the brain works.

Similarly, computer vision drew its inspiration from neuroscience, and the field’s founding work belongs to Hubel and Wiesel.

In 1981, Hubel and Wiesel won the Nobel Prize in Physiology or Medicine for their “discoveries concerning information processing in the visual system,” based on experiments on cells in the visual cortex of cats begun in the late 1950s. In those experiments, they inserted microelectrodes into the cat’s visual cortex, then flashed lights and patterns on a screen. By fixing the cat’s head, they could precisely control where images fell on the retina and test how cells responded to shapes such as lines, right angles, and edges. Through amplifiers and speakers, they could even hear the cells firing.

This new discovery about the primary visual cortex (V1) caused a stir at the time and laid the foundation for later research on mapping neurons.

Screenshot from the experimental video: as the bar of light moves, the cell’s firing is heard as bursts of noise

In these experiments, Hubel and Wiesel found that cells in the visual cortex responded only to specific details of the image on the retina. Another fascinating finding was that the cells seemed to map naturally to different angles. As shown below, each region of the V1 cortex contains specific neurons that respond differently to light at a particular angle:

When these cells respond, they in effect build up a point-by-point projection of the real world: as light-sensing neurons respond simultaneously to light at different angles, they are actually assembling an image of the real world inside the brain.
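Orientation tuning of this kind is often modeled with a Gabor filter: an oriented sinusoid under a Gaussian envelope. Here is a minimal numpy sketch; the filter parameters, the grating stimulus, and the “response as a dot product” simplification are illustrative assumptions, not details from the source:

```python
import numpy as np

def gabor(size, theta, wavelength=4.0, sigma=2.0):
    """A simple Gabor filter: a sinusoid at orientation theta under a Gaussian
    envelope -- a common model of a V1 simple cell's receptive field."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the sinusoid runs along orientation theta.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * x_t / wavelength)

# A vertical grating stimulus: intensity varies across columns only.
stimulus = np.tile(np.cos(2 * np.pi * np.arange(9) / 4.0), (9, 1))

# "Cell" response modeled as the dot product of receptive field and stimulus.
vertical_cell = gabor(9, theta=0.0)          # tuned to vertical gratings
horizontal_cell = gabor(9, theta=np.pi / 2)  # tuned to horizontal gratings

resp_v = float(np.sum(vertical_cell * stimulus))
resp_h = float(np.sum(horizontal_cell * stimulus))
print(resp_v, resp_h)  # the matched orientation responds far more strongly
```

The filter whose orientation matches the stimulus responds strongly while the mismatched one barely responds, mirroring how Hubel and Wiesel’s cells fired only for bars at their preferred angle.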

How to encode and decode

Let’s fast forward to Olshausen and D J Field.

Nearly 30 years after Hubel and Wiesel’s work, Olshausen and D J Field, two researchers in computational neuroscience, made a major breakthrough in understanding how the brain encodes and decodes images, pushing computer vision further along. Indeed, their paper cites the cat experiments from 30 years earlier.

Unlike their predecessors, the two researchers, then at Cornell University, were less concerned with the cells themselves than with how an algorithm could recognize and encode the features in an image. In 1996, their paper Natural Image Statistics and Efficient Coding (http://pdfs.semanticscholar.org/e309/e441a38ccee6456bd02e0f1e894e44180d53.pdf) was published.

This is a classic paper. In it, the authors point out the limitations of Hebbian learning algorithms based on principal component analysis for image recognition: such models cannot learn the localized, oriented, bandpass structure that natural images are built from. One of the core ideas of the Hebbian model is that the more often a feature is trained, the more easily it is detected during later recognition. Hubel and Wiesel’s experiments, however, showed that visual cortex neurons respond only to certain features.
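The Hebbian idea described above, that connections strengthen whenever input and output are active together, can be sketched with a toy single-neuron rule; the eight-pixel patterns, learning rate, and normalization here are illustrative assumptions, not the models analyzed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "retina": 8-pixel inputs feeding one output neuron with small weights.
weights = rng.random(8) * 0.01
frequent = np.array([1., 1., 0., 0., 0., 0., 1., 1.])  # a feature seen often
rare = np.array([0., 0., 1., 1., 1., 1., 0., 0.])      # a feature never trained

eta = 0.1  # learning rate
for _ in range(50):                     # the frequent pattern is trained often
    y = weights @ frequent              # neuron's response to the input
    weights += eta * y * frequent       # Hebb: strengthen co-active connections
    weights /= np.linalg.norm(weights)  # normalize to keep weights bounded

# After training, the frequently seen feature is detected far more strongly.
print(weights @ frequent, weights @ rare)
```

The neuron’s weights converge onto the pattern it saw most often, which is exactly the limitation the paper criticizes: frequency of exposure, not the structure of natural images, determines what gets learned.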

Olshausen and D J Field brought these viewpoints together and argued that, since coded natural images contain strong regularities, training should not keep reinforcing the same repeatedly excited features. Instead, the model should emphasize sparsity, making the network attend to distinct features and thereby improving its discriminative ability.

They set up basis functions to stand in for the various features in an image, which is mainly reflected in the following formulas:

Σ_{x,y} [ I(x, y) − Σ_i a_i φ_i(x, y) ]²

Source: Natural Image Statistics and Efficient Coding

This formula measures the mean squared error between the actual image I(x, y) and its reconstruction from the basis functions φ_i weighted by coefficients a_i.

E = Σ_{x,y} [ I(x, y) − Σ_i a_i φ_i(x, y) ]² + λ Σ_i S(a_i / σ)

Source: Natural Image Statistics and Efficient Coding

This part combines the reconstruction error with a cost function: the sparseness penalty S, weighted by λ, forces the algorithm to keep most of the coefficients a_i near zero.

ȧ_i ∝ −∂E/∂a_i

Source: Natural Image Statistics and Efficient Coding

This part minimizes the total cost E with respect to the coefficients a_i by gradient descent.
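A toy version of this coefficient search can be written as plain gradient descent on E, using the smooth sparseness cost S(x) = log(1 + x²), one of the penalties the authors consider. The random basis functions, dimensions, step size, and weights below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: a 16-pixel image patch and 8 fixed, random basis functions phi_i.
n_pixels, n_basis = 16, 8
phi = rng.normal(size=(n_basis, n_pixels))
image = phi[2] + 0.5 * phi[5]           # patch truly built from two features

lam, sigma = 0.1, 1.0                   # sparseness weight and scale

def cost_grad(a):
    """Gradient of E = ||I - sum_i a_i phi_i||^2 + lam * sum_i S(a_i / sigma),
    with the smooth sparseness penalty S(x) = log(1 + x^2)."""
    residual = image - a @ phi
    grad_recon = -2 * residual @ phi.T
    grad_sparse = lam * (2 * a / sigma**2) / (1 + (a / sigma) ** 2)
    return grad_recon + grad_sparse

a = np.zeros(n_basis)
for _ in range(1000):                   # gradient descent on the coefficients
    a -= 0.005 * cost_grad(a)

error = np.sum((image - a @ phi) ** 2)  # reconstruction error is now small
print(error, np.round(a, 2))
```

In the full model the basis functions φ_i are themselves learned; here they are held fixed so that only the sparse coefficients are optimized, which is the step the gradient-descent formula above describes.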

While the paper doesn’t give a specific neural network model for recognizing images, the timing of the idea is remarkable: the World Wide Web had only opened to the public in 1991, and their paper appeared in 1996.

From kitten cells to mathematical models

Now, computer vision research has shifted from visual cortex cells to mathematical models.

Olshausen and D J Field conclude by saying: “An important and exciting future challenge will be to extrapolate these principles to higher cortical visual areas to provide predictions.” This is no small challenge: it means researchers need to build neural networks on top of these low-level models and make image prediction a reality.

Source: Natural Image Statistics and Efficient Coding

They even included a picture like this as an example in their paper. Does it look familiar? If you are a deep learning enthusiast, you will find similar matrices in many papers from the past few years. Matrices like these are commonly used as convolution kernels in convolutional neural networks (CNNs), which are thought to mimic the way individual neurons respond to visual stimuli.
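To see what such a kernel matrix does, here is a minimal 2-D convolution written out by hand. The Sobel-style vertical-edge kernel and the toy image are illustrative assumptions; real CNNs learn their kernels from data:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most CNNs)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel, like the oriented receptive fields Hubel and
# Wiesel recorded: it responds where intensity changes from left to right.
sobel_x = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])

# An image that is dark on the left half and bright on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

response = conv2d(image, sobel_x)
print(response)  # strong values where the window straddles the edge, 0 elsewhere
```

Sliding one small filter over every position of the image is exactly how a CNN’s convolutional layer applies matrices like the ones shown above, with many filters learned in parallel.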

Source: Andrej Karpathy, Deep Visual-Semantic Alignments for Generating Image Descriptions

Now, that 1996 challenge has been met: using low-level features to predict images has become a reality.

In 2015, Andrej Karpathy and Fei-Fei Li of Stanford University published a paper called Deep Visual-Semantic Alignments for Generating Image Descriptions. In it, they demonstrated a recurrent neural network (RNN) that produces detailed descriptions of images. It can do more than point at a cat or pick out a dog in a photo: it can describe specific scenes, such as “a boy on a skateboard doing a backflip.”

Source: Andrej Karpathy, Deep Visual-Semantic Alignments for Generating Image Descriptions

The model is not perfect, but compared to what was done in 1968, the progress is impressive.

From the late 1950s to 2015, computer vision has passed through half a century, which may not seem like much compared with the long road ahead. But the era of artificial intelligence has arrived, and the development of computer vision will only get faster. It is no longer just about image recognition in academia; it is about demonstrating the success of advanced technology in medical imaging, autonomous driving, emotion prediction, and other drivers of social progress.

What will computer vision bring us in the next 50 years?

Original article: https://towardsdatascience.com/from-neuroscience-to-computer-vision-e86a4dea3574

This article was compiled by Theory of Wisdom. For reprints, please contact this official account for authorization.