Posted by Ben Dickson

url : https://bdtechtalks.com/2020/…

Human-level performance. Human-level accuracy. Whether it's facial recognition, object detection, or question answering, these are terms you'll hear a lot from companies developing artificial intelligence systems. And to be fair, many great products have been powered by AI algorithms in recent years, thanks largely to advances in machine learning and deep learning.

But many of these comparisons only consider the end result of testing deep learning algorithms on a limited set of data. This approach creates false expectations for AI systems and can produce dangerous results when they are given mission-critical tasks.

In a recent study, a group of researchers from various German organizations and universities highlighted the challenges of evaluating the performance of deep learning on visual data. In their paper, titled "The Notorious Difficulties of Comparing Human and Machine Perception," the researchers lay out the problems with current approaches to comparing deep neural networks with the human visual system.

In their research, the scientists conducted a series of experiments that dig beneath the surface of deep learning results and compare them to the workings of the human visual system. Their findings are a reminder that we must be careful when comparing AI to humans, even when it shows equal or better performance on the same tasks.

The complexity of human and computer vision

In the seemingly endless quest to reconstruct human perception, computer vision is the field where deep learning has, by far, yielded the most favorable results. The convolutional neural network (CNN), an architecture often used in deep learning for computer vision, can accomplish tasks that are difficult for traditional software.

However, comparing neural networks to human perception remains a challenge. That's partly because we still have a lot to learn about the human visual system and the brain. Further complicating matters, deep neural networks work in very complex ways that often confound their own creators.

In recent years, a growing body of research has attempted to evaluate the inner workings of neural networks and their robustness in handling real-world situations. "Comparing human and machine perception is not easy, despite the amount of research that has been done," the German researchers write in their paper.

In their research, the scientists focused on three areas to measure how humans and deep neural networks process visual data.

How does a neural network perceive contours?

The first test involves contour detection. In this experiment, both human and AI participants had to indicate whether an image contained a closed contour. The goal here is to understand whether deep learning algorithms can learn the concepts of closed and open shapes and detect them under a variety of conditions.

Can you tell which of the above images contains a closed shape?

"For humans, a closed contour surrounded by many open contours is visually striking. In contrast, detecting closed contours may be difficult for DNNs, as it may require long-range contour integration," the researchers write.

In the experiment, the scientists used ResNet-50, a popular convolutional neural network developed by artificial intelligence researchers at Microsoft. They used transfer learning to fine-tune the model on 14,000 images of closed and open contours.
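This setup, an ImageNet-pretrained network with its classification head replaced and fine-tuned on a new binary task, is a standard transfer-learning recipe. Below is a minimal PyTorch sketch of that pattern; it is not the authors' code, and the dataset path, epoch count, and learning rate are illustrative assumptions.

```python
# Minimal transfer-learning sketch (not the authors' code): fine-tune an
# ImageNet-pretrained ResNet-50 as a binary open/closed contour classifier.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing for ResNet-50 inputs.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder layout: contours/train/{open,closed}/*.png
train_set = datasets.ImageFolder("contours/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

# Load pretrained weights and replace the head with a two-class output layer.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative hyperparameters
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):  # illustrative number of epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

In practice, one might also freeze part of the backbone and train only the later layers, depending on how close the contour images are to natural photographs.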

They then tested the AI on various sets of examples that resembled the training data and gradually diverged from it. The initial findings suggested that a well-trained neural network does seem to grasp the concept of a closed contour. Although the network was trained on a dataset that contained only straight-line shapes, it also performed well on curved lines.

“These results suggest that our model does indeed learn the concepts of open and closed contours and performs contour integration processes similar to those used in humans,” the scientists wrote.

The ResNet-50 neural network was able to detect a variety of open and closed contours, despite being trained only on straight-line samples.

However, further tests showed that other changes that do not affect human performance degraded the accuracy of the AI model's results. For example, changing the color and width of the lines caused a sudden drop in the deep learning model's accuracy. The model also seemed to have difficulty detecting shapes once they grew beyond a certain size.

The ResNet-50 neural network struggled when presented with images that contained lines of different colors and thicknesses, and images whose shapes were larger than those in the training set.

Neural networks are also very sensitive to adversarial perturbations: carefully crafted changes that are invisible to the naked eye but can disrupt the behavior of machine learning systems.

The image on the right has been modified with adversarial perturbations, noise that humans can't detect. To the human eye, the two images are identical. But to the neural network, they are different images.
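The fast gradient sign method (FGSM) is one of the simplest ways to generate such perturbations; the sketch below illustrates the general idea and is not necessarily the attack used in the study. The epsilon value is an arbitrary assumption.

```python
# A minimal FGSM sketch (a generic adversarial attack, not necessarily the one
# used in the study): nudge each pixel in the direction that increases the loss.
import torch
import torch.nn as nn

def fgsm_attack(model, image, label, epsilon=0.01):
    """Return an adversarially perturbed copy of `image` (shape [N, C, H, W])."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(image), label)
    loss.backward()
    # Step each pixel by +/- epsilon along the sign of its gradient.
    perturbed = image + epsilon * image.grad.sign()
    # Keep pixel values in a valid range (assuming unnormalized [0, 1] inputs).
    return perturbed.clamp(0, 1).detach()
```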

To dig deeper into the AI's decision-making process, the scientists used bag-of-feature networks, a technique that tries to localize the bits of data that contribute to a deep learning model's decision. The analysis showed that "there are some local features, such as endpoints combined with short edges, that usually give the correct class label," the researchers found.
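The idea behind bag-of-feature models (such as BagNets) is to score small local patches independently and average their class evidence, so the final decision can only rest on local features. The following is a conceptual PyTorch sketch of that idea, not the exact analysis pipeline used in the paper; the patch size, stride, and network width are assumptions.

```python
# Conceptual bag-of-features sketch (not the paper's exact analysis): classify
# small local patches independently, then average their class evidence, so the
# image-level decision can only be driven by local features.
import torch
import torch.nn as nn

class TinyBagOfFeatures(nn.Module):
    def __init__(self, num_classes=2, patch=17, stride=8):
        super().__init__()
        self.patch, self.stride = patch, stride
        # A small CNN that maps a single patch to per-class evidence (logits).
        self.patch_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        # Extract overlapping patches: [N, C, nH, nW, patch, patch].
        patches = x.unfold(2, self.patch, self.stride).unfold(3, self.patch, self.stride)
        n, c, nh, nw, ph, pw = patches.shape
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, c, ph, pw)
        # Score every patch independently, then average the local evidence.
        logits = self.patch_net(patches).reshape(n, nh * nw, -1)
        return logits.mean(dim=1)
```

Because the image-level prediction is just the mean of per-patch logits, one can inspect which patches contribute most to a class, which is what makes this family of models useful for checking whether a network relies on local shortcuts.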

Can machine learning reason about images?

The second experiment tested the ability of deep learning algorithms at abstract visual reasoning. The data used in this experiment is based on the Synthetic Visual Reasoning Test (SVRT), in which the AI must answer questions that require understanding the relations between different shapes in a picture. The tests include same-different tasks (e.g., are the two shapes in the picture the same?) and spatial tasks (e.g., is the smaller shape located in the center of the larger shape?). Human observers solve these problems easily.

The SVRT challenge requires participating AIs to solve same-different and spatial tasks.

In their experiments, the researchers used ResNet-50 and tested how it performed with training datasets of different sizes. The results showed that a pre-trained model fine-tuned on 28,000 samples performed well on both the same-different and spatial tasks. (Previous experiments had trained a much smaller neural network on a million images.) As the researchers reduced the number of training examples, the AI's performance declined, but the decline was faster on the same-different tasks.
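A common way to probe sample efficiency is to fine-tune identical copies of the same model on nested subsets of the training data and track test accuracy as the subset grows. The sketch below only illustrates that experimental pattern; the subset sizes, dataset variables, and helper functions are hypothetical, not the authors' protocol.

```python
# Hedged sketch of a sample-efficiency sweep. `build_model`, `fine_tune`, and
# `accuracy` are hypothetical helpers (a pretrained ResNet-50 factory, a training
# loop like the one sketched earlier, and a standard evaluation loop); `train_set`
# and `test_set` stand for hypothetical SVRT-style datasets.
import torch
from torch.utils.data import Subset

results = {}
for size in (1_000, 5_000, 10_000, 28_000):          # illustrative subset sizes
    indices = torch.randperm(len(train_set))[:size].tolist()
    model = build_model(num_classes=2)                # hypothetical factory function
    fine_tune(model, Subset(train_set, indices))      # hypothetical training helper
    results[size] = accuracy(model, test_set)         # hypothetical evaluation helper

print(results)  # test accuracy as a function of training-set size
```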

"Same-different tasks require more training samples than spatial reasoning tasks," the researchers write, adding that "this cannot be used as evidence of systematic differences between feedforward neural networks and the human visual system."

The researchers point out that the human visual system comes naturally pre-trained on vast amounts of abstract visual reasoning tasks. This makes it unfair to test deep learning models in a low-data regime, and it becomes nearly impossible to draw reliable conclusions about differences in the internal information processing of humans and AI.

"It is likely that human visual systems trained from scratch on these two tasks would show a similar difference in sample efficiency to that of ResNet-50," the researchers write.

Measuring cognitive gaps in deep learning

The recognition gap is one of the most interesting tests of the visual system. Consider the following image: can you tell what it is without scrolling down?

Below is a zoomed-out view of the same image. There is no doubt that it is a cat. If I had shown you a close-up of another part of the image (perhaps the ear), you might have had a better chance of predicting what was in it. We humans need to see a certain amount of overall shape and pattern to recognize an object in an image. The more you zoom in, the more features you remove, and the harder it becomes to tell what the image shows.

Depending on the features they contain, close-up shots of different parts of the cat image have different effects on our perception.

Deep learning systems also rely on features, but they work in subtler ways. Neural networks sometimes pick up on tiny features that are imperceptible to the human eye and remain detectable even at very high magnification.

In the final experiment, the researchers tried to measure the recognition gap in the deep neural network by gradually zooming in on the image until the accuracy of the AI model began to drop dramatically.
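One way to operationalize such a measurement is to crop progressively tighter patches around a point of interest, resize each crop back to the network's input resolution, and record how the probability assigned to the correct class changes. The sketch below is a schematic illustration under those assumptions, not the paper's exact procedure; the crop sizes are arbitrary and preprocessing/normalization is omitted for brevity.

```python
# Schematic recognition-gap sketch (not the paper's exact procedure): crop ever
# smaller patches around a point, upscale them to the model's input size, and
# watch when the probability assigned to the correct class collapses.
import torch
import torch.nn.functional as F

def recognition_gap(model, image, label, center, sizes=(224, 160, 112, 80, 56, 40, 28)):
    """image: tensor [C, H, W]; center: (row, col); returns confidence per crop size."""
    cy, cx = center
    confidences = {}
    model.eval()
    with torch.no_grad():
        for s in sizes:
            half = s // 2
            crop = image[:, max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
            # Resize the crop back to the network's expected input resolution.
            crop = F.interpolate(crop.unsqueeze(0), size=(224, 224),
                                 mode="bilinear", align_corners=False)
            probs = model(crop).softmax(dim=1)
            confidences[s] = probs[0, label].item()
    # The "gap" can be read off as the crop size where confidence first drops sharply.
    return confidences
```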

Previous experiments had suggested that the recognition gap is very different between humans and deep neural networks. But in their paper, the researchers point out that most earlier tests of neural networks' recognition gaps were based on human-selected image patches. These patches favor the human visual system.

When they tested their deep learning models on "machine-selected" patches, the researchers obtained results that showed a similar gap between humans and AI.

The recognition gap test evaluates how zooming in on an image affects the accuracy of the AI model.

"These results highlight the importance of testing humans and machines on exactly the same footing and avoiding human bias in the experimental design," the researchers write. "All conditions, instructions, and procedures should be as close as possible between humans and machines to ensure that any observed differences are due to inherently different decision strategies, not to differences in the testing procedure."

Closing the gap between artificial intelligence and human intelligence

As our AI systems become more complex, we will have to develop more sophisticated methods of testing them. Previous work in this area has shown that many of the popular benchmarks used to measure the accuracy of computer vision systems are misleading. The work by the German researchers is one of many efforts to gauge artificial intelligence and better quantify the differences between AI and human intelligence. Their conclusions can provide direction for future AI research.

"The primary challenge in comparing humans and machines seems to be the strong internal interpretation bias in humans," the researchers write. "Appropriate analytical tools and extensive cross-checks (such as changes in network architecture, alignment of experimental procedures, generalization tests, adversarial examples, and tests with constrained networks) help rationalize the interpretation of the findings and put this internal bias into perspective. In summary, care must be taken not to impose our human systematic bias when comparing human and machine perception."