Posted by Ben Dickson

Url: bdtechtalks.com/2020/08/10/…

![](https://pic2.zhimg.com/80/v2-27010576070b50967b873bfb82e0fdb7_720w.jpg)

Human-level performance. Human-level accuracy. Whether it’s facial recognition, object detection, or question answering, these are terms you’ll hear a lot from companies developing artificial intelligence systems. To their credit, recent years have seen many great products driven by AI algorithms, largely thanks to advances in machine learning and deep learning.

But many of these comparisons consider only the end result of testing deep learning algorithms on a limited set of data. This approach creates false expectations about AI systems and can produce dangerous results when they are entrusted with mission-critical tasks.

In a recent study, a group of researchers from different organizations and universities in Germany examined the challenges of evaluating deep learning models on visual data. In their paper, “The Notorious Difficulty of Comparing Human and Machine Perception,” the researchers highlight the problems with current approaches to comparing deep neural networks and the human visual system.

For their research, the scientists conducted a series of experiments that dig beneath the surface of deep learning results and compare them to the workings of the human visual system. Their findings are a reminder that we must be cautious when comparing AI to humans, even when it shows equal or better performance on the same task.

Complexity of human and computer vision

In the seemingly endless quest to reconstruct human perception, computer vision is the field where deep learning has yielded the most favorable results so far. The convolutional neural network (CNN) is an architecture commonly used in deep learning algorithms for computer vision; it can accomplish tasks that are difficult for traditional software.

However, comparing neural networks to human perception remains a challenge. That’s partly because we still have a lot to learn about the human visual system and the brain. Complicating matters further, deep neural networks operate in very complex ways that often confound their own creators.

In recent years, a growing body of research has attempted to evaluate the inner workings of neural networks and their robustness in real-world situations. “Despite numerous studies, comparing human and machine perception is not straightforward,” the German researchers write in their paper.

In their research, the scientists focused on three areas to measure how humans and deep neural networks process visual data.

How does a neural network perceive contours?

The first test involves contour detection. In this experiment, both human and AI participants had to say whether an image contained a closed contour. The goal here is to understand whether deep learning algorithms can learn the concepts of closed and open shapes, and whether they can detect them under a variety of conditions.

![](https://pic2.zhimg.com/80/v2-db2943172a96215981083fa215704e76_720w.jpg)

Can you tell which of the above images contains a closed shape?

“To humans, a closed contour flanked by many open contours is visually very salient. In contrast, detecting closed contours may be difficult for DNNs, as they may require contour integration over long distances,” the researchers wrote.

For the experiment, the scientists used ResNet-50, a popular convolutional neural network developed by artificial intelligence researchers at Microsoft. They used transfer learning to fine-tune the model on 14,000 images of closed and open contours.
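To make this setup concrete, here is a minimal sketch of what such a transfer-learning pipeline could look like in PyTorch. The folder layout, hyperparameters, and training loop are illustrative assumptions, not the authors’ actual code:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing, since ResNet-50 is pretrained on ImageNet.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder layout: contours/train/{closed,open}/*.png
train_set = datasets.ImageFolder("contours/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Start from ImageNet weights and swap the classifier head for a
# two-way output: closed contour vs. open contour.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(5):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```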

They then tested the AI on various examples that resembled the training data, and gradually deviated from it in different directions. The initial findings suggested that the trained neural network had grasped the concept of closed contours: although the network was trained on a dataset containing only straight-line shapes, it also performed well on curves.

“These results show that our model does learn the concept of open and closed contours and performs contour integration processes similar to those seen in humans,” the scientists wrote.

![](https://picb.zhimg.com/80/v2-a89eab2db5c2ecea6c39176a9bb51576_720w.jpg)

The ResNet neural network can detect a variety of open and closed contour images, even though it was trained only on straight-line samples.

However, further research showed that other changes that don’t affect human performance degraded the accuracy of the AI model. For example, changing the color and width of the lines caused the accuracy of the deep learning model to drop suddenly. The model also seemed to struggle to detect shapes when they grew beyond a certain size.

![](https://picb.zhimg.com/80/v2-cb7a4dc960d331a16deec3a89c81b0cf_720w.jpg)

The ResNet-50 neural network struggled when presented with images containing lines of different colors and thicknesses, as well as images whose shapes were larger than those in the training set.

Neural networks are also highly sensitive to adversarial perturbations: carefully crafted changes, invisible to the naked eye, that disrupt the behavior of a machine learning system.

![](https://picb.zhimg.com/80/v2-115ea70e7f73e7a3390ff0ff67f72113_720w.jpg)

The image on the right has undergone adversarial perturbation, noise that humans can’t detect. To the human eye, the two images are identical. But to a neural network, they are different images.
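The classic recipe for crafting such a perturbation is the fast gradient sign method (FGSM). The paper doesn’t prescribe this particular attack, but a minimal sketch of it in PyTorch looks like this (`model`, `image`, and `label` are assumed to be a trained classifier, a batched image tensor, and its label tensor):

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Nudge every pixel a tiny step in the direction that
    increases the classification loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # At small epsilon the step is invisible to humans, but it can
    # be enough to flip the model's prediction.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```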

To further study the AI’s decision-making process, the scientists used bag-of-features networks (BagNets), a technique that tries to locate the bits of data a deep learning model relies on when making a decision. The researchers found that the analysis proved “there are indeed some local features, such as the combination of endpoints with short edges, that generally give the correct class label.”
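BagNets work by restricting the network’s receptive field so that the final decision is an average of local, patch-level class evidence, which can then be traced back to the patches that drove it. A toy version of the idea might look like this (a conceptual sketch, much simpler than the architecture used in the study):

```python
import torch.nn as nn

class TinyBagNet(nn.Module):
    """Toy bag-of-features classifier: every small image patch votes
    for a class, and the votes are averaged, so each decision can be
    traced back to the local patches that drove it."""
    def __init__(self, num_classes=2, patch=9):
        super().__init__()
        self.features = nn.Sequential(
            # Receptive field limited to one patch x patch region.
            nn.Conv2d(3, 32, kernel_size=patch, stride=patch),
            nn.ReLU(),
            nn.Conv2d(32, num_classes, kernel_size=1),
        )

    def forward(self, x):
        local_logits = self.features(x)       # per-patch class evidence
        return local_logits.mean(dim=(2, 3))  # average the local votes
```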

Can machine learning reason about images?

The second experiment tested the abilities of deep learning algorithms at abstract visual reasoning. The data used in the experiment was based on the Synthetic Visual Reasoning Test (SVRT), in which the AI had to answer questions that required understanding the relationships between different shapes in a picture. The tests include same-different tasks (for example, are the two shapes in the picture identical?) and spatial tasks (for example, is the smaller shape at the center of the larger shape?). Human observers solve these problems with ease.

![](https://picb.zhimg.com/80/v2-7c5abb212a13e3e5bf2466df7c912097_720w.jpg)

The SVRT challenges require the participating AI to solve same-different and spatial tasks.

In their experiments, the researchers used ResNet-50 and tested its performance on training datasets of different sizes. The results showed that a pretrained model fine-tuned on 28,000 samples performed well on both same-different and spatial tasks. (Previous experiments had trained a much smaller neural network on a million images.) As the researchers reduced the number of training samples, the AI’s performance declined, but it declined faster on the same-different tasks.
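A sample-efficiency experiment of this kind boils down to retraining the same model on nested subsets of the data and recording accuracy at each training budget. A rough sketch, where `train_set`, `test_set`, and the two helper functions are hypothetical stand-ins:

```python
import random
from torch.utils.data import Subset

budgets = [28_000, 10_000, 1_000, 100]
indices = list(range(len(train_set)))
random.shuffle(indices)

for n in budgets:
    subset = Subset(train_set, indices[:n])   # nested training subset
    model = finetune_resnet50(subset)         # assumed helper (as above)
    accuracy = evaluate(model, test_set)      # assumed helper
    print(f"{n:>6} samples -> accuracy {accuracy:.3f}")
```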

“Same-different tasks require more training samples than spatial reasoning tasks,” the researchers wrote, adding, “this cannot be used as evidence of systematic differences between feedforward neural networks and the human visual system.”

The researchers point out that the human visual system comes inherently pretrained on vast amounts of abstract visual reasoning. This makes it unfair to test deep learning models in a low-data regime, and almost impossible to draw reliable conclusions about differences in the internal information processing of humans and AI.

“It is likely that the human visual system, trained from scratch on these two tasks, would show a similar difference in sample efficiency as ResNet-50,” the researchers wrote.

Measuring cognitive gaps in deep learning

The recognition gap is one of the most interesting tests of the visual system. Consider the following image. Can you tell what it is without scrolling down?

![](https://pic3.zhimg.com/80/v2-cdec6c611a21a6933c8f1e3bdc3fa390_720w.jpg)

Below is a zoomed-out view of the same image. There is no doubt that it is a cat. If I showed you a close-up of another part of the image, perhaps the ear, you might have had a better chance of predicting what was in it. We humans need to see a certain amount of overall shape and pattern to recognize an object in an image. The more you zoom in, the more features you remove, and the harder it becomes to tell what the image contains.

![](https://picb.zhimg.com/80/v2-9741e8ed7be6553d274a173ac7fa9d84_720w.jpg)

Close-ups of different parts of a cat image can have different effects on our perception, depending on the features they contain.

Deep learning systems are also feature-based, but they work in subtler ways. Neural networks sometimes pick up on tiny features that are invisible to the human eye and remain detectable even at very close magnification.

In the final experiment, the researchers tried to measure the recognition gap of deep neural networks by gradually zooming in on images until the AI model’s accuracy began to plummet.

Previous experiments had shown that the recognition gap in humans is quite different from that of deep neural networks. But in their paper, the researchers point out that most previous recognition-gap tests on neural networks were based on image patches selected by humans. These patches favor the human visual system.

When they tested their deep learning models on “machine-selected” patches, the researchers obtained results showing that humans and AI have similar recognition gaps.
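One way to approximate this measurement is to shrink the visible crop step by step and record where the model’s confidence in the correct class collapses. The sketch below uses center crops for simplicity, whereas the study’s machine-selected patches instead follow the most informative region at each step:

```python
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def recognition_gap(model, image, label, steps=20):
    """Zoom in on a (3, H, W) image tensor until the model's
    confidence in the true class drops below 50%, and return the
    crop size at that point as a rough proxy for the gap."""
    side = min(image.shape[1], image.shape[2])
    for k in range(steps):
        crop_side = max(8, int(side * (1 - k / steps)))
        crop = TF.center_crop(image, [crop_side, crop_side])
        crop = TF.resize(crop, [224, 224]).unsqueeze(0)
        probs = torch.softmax(model(crop), dim=1)
        if probs[0, label].item() < 0.5:
            return crop_side  # recognition collapsed at this scale
    return crop_side
```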

![](https://pic1.zhimg.com/80/v2-7aca094b76a1cd34bacd49c7a305d35e_720w.jpg)

The recognition-gap test evaluates how magnifying an image affects the accuracy of an AI model.

“These results highlight the importance of testing humans and machines on exactly the same footing and avoiding human bias in experimental design,” the researchers wrote. “All conditions, instructions, and procedures between humans and machines should be as close as possible to ensure that all observed differences are due to inherently different decision strategies, and not to differences in the test procedure.”

Closing the gap between artificial intelligence and human intelligence

As our AI systems grow more complex, we will have to develop more sophisticated methods to test them. Previous research in the field has shown that many of the popular benchmarks used to measure the accuracy of computer vision systems are misleading. The work by the German researchers is one of many attempts to measure artificial intelligence and better quantify the differences between AI and human intelligence. Their conclusions can provide directions for future AI research.

“The primary challenge in comparative studies of humans and machines appears to be the strong explanatory bias within humans,” the researchers wrote. “Appropriate analytical tools and extensive cross-checking (such as changes in network architecture, calibration of experimental procedures, generalization testing, adversarial examples, and testing of constrained networks) help rationalize the interpretation of findings and put this internal bias into perspective. In summary, care must be taken not to impose our human systematic biases when comparing human and machine perception.”