
Recently, Tesla removed the forward-facing radar from its vehicles, which means the data feeding its autonomous-driving decisions now depends entirely on visual perception. The move drew widespread skepticism both inside and outside the industry: the accuracy requirements for self-driving measurements are strict, down to the centimeter level, and cannot be taken lightly. Tesla, however, believes that vision can do better, and that focusing on vision alone can solve the problem.

In Tesla's view, the vision sensors are far more capable than radar; data from the other sensors does not improve the accuracy of the results and instead introduces noise into perception, dragging accuracy down. The neural network's input consists of images from eight cameras placed around the car.

The eight cameras capture high-definition video at 36 frames per second, enough to gather all the information needed about the surroundings.
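To make the multi-camera input concrete, here is a minimal sketch (Python/NumPy, not Tesla's actual code) of how one synchronized set of frames could be stacked into a single tensor for a multi-camera network. The camera names and the 1280×960 resolution are assumptions for illustration; the article only specifies eight cameras at 36 fps.

```python
import numpy as np

# Hypothetical camera layout for illustration only; the article says there
# are eight cameras but does not name all of them.
CAMERA_NAMES = [
    "main_forward", "narrow_forward", "wide_forward",   # windshield trio (assumed)
    "left_pillar", "right_pillar",                      # B-pillar, side-forward
    "left_repeater", "right_repeater",                  # fender-mounted, side-rearward
    "rear",
]

def assemble_frame(images: dict) -> np.ndarray:
    """Stack one synchronized image per camera into shape (8, H, W, 3)."""
    return np.stack([images[name] for name in CAMERA_NAMES], axis=0)

# One time step of dummy data at the assumed 1280x960 resolution.
frame = assemble_frame({n: np.zeros((960, 1280, 3), np.uint8) for n in CAMERA_NAMES})
print(frame.shape)  # (8, 960, 1280, 3)
```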

The cameras and radar are laid out as follows:

- Two side rear-view cameras: mounted on the front fender fins, toward the front of the car.
- Two side forward-view cameras: mounted on the B-pillars, about 1 m behind the side rear-view cameras.
- One rear-view camera: mounted above the license plate frame on the trunk.
- One millimeter-wave radar: located below the front bumper (this is the unit Tesla has now removed).

As an aside, Tesla's business model is sustainable: the “in-house software and chips + vehicle manufacturing” approach not only reduces long-term costs but also generates revenue from vehicle sales.

This is a large amount of information, roughly 8 MB of data per second, and the key challenge facing the Tesla development team is how to exploit such rich data to extract valuable information. Their focus is therefore on data analysis and network architecture design, rather than on the radar stack and radar-vision fusion.

They dared to bet on vision because humans rely on visual perception to drive, although it was not obvious that neural networks could interpret video correctly, for example, finding the various objects in video data and computing the depth and speed of moving objects. But research by Andrej Karpathy and his team convinced them that neural networks are the best solution.

Of course, radar can give very accurate depth and speed measurements. But in practice, the depth, velocity, and acceleration that the radar stack reports for the car ahead (the one shown in the image on the left) are a little wobbly. Radar can also be used for ranging to track the vehicle in front, but when that vehicle is occluded, for example by an overpass or by another vehicle crossing in front, the measurements can go wrong.
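To illustrate why those occlusion-induced jumps are a problem, here is a toy sketch (my own illustration, not part of any actual radar stack) of a lead-vehicle range track that briefly snaps to a stationary overpass return, and a simple plausibility gate that rejects physically impossible jumps. The update rate and thresholds are assumptions.

```python
import numpy as np

dt = 0.05                                          # assumed 20 Hz radar updates
true_range = 40.0 - 2.0 * dt * np.arange(100)      # lead car closing at 2 m/s
readings = true_range + np.random.normal(0.0, 0.3, 100)
readings[60:66] = 12.0                             # overpass return briefly mis-associated

def plausible(prev_r: float, new_r: float, max_speed: float = 60.0) -> bool:
    """Reject range updates that imply a physically impossible relative speed."""
    return abs(new_r - prev_r) / dt <= max_speed

filtered = [readings[0]]
for r in readings[1:]:
    filtered.append(r if plausible(filtered[-1], r) else filtered[-1])
```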

What kind of data set is needed to train a sufficiently large neural network? First of all, the data set has to be large; the data has to be clean, without pollution; it has to be labeled with the depth and speed of the vehicles; it has to be diverse; and it takes the strength of the whole fleet to collect the edge cases, so that the neural network becomes knowledgeable.
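As a rough sketch of what those requirements might translate into, here is one possible shape for a labeled training sample: a multi-camera clip plus per-vehicle depth and velocity labels, and a scenario tag for tracking diversity and edge cases. The field names are illustrative assumptions, not Tesla's actual schema.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class ObjectLabel:
    depth_m: float        # distance to the labeled vehicle, in meters
    velocity_mps: float   # relative speed of the labeled vehicle, in m/s

@dataclass
class TrainingSample:
    clip: np.ndarray            # (T, 8, H, W, 3): T synchronized 8-camera frames
    labels: List[ObjectLabel]   # one entry per annotated vehicle
    scenario_tag: str           # e.g. "overpass", "heavy_rain", "cut_in" (edge cases)
```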

Tesla needs vision to measure depth and speed well enough to replace the radar's predictions. To do that, it needs a large amount of data labeled with vehicle depth and speed, because neural networks are voracious: training a large enough network takes a lot of data, so collecting it, and ensuring both its quality and quantity, is the key.
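To show the kind of supervision this implies, here is a minimal PyTorch sketch of a vision model regressing depth and velocity from a camera frame and being trained against labeled values. The architecture is a deliberately tiny placeholder, not Tesla's network.

```python
import torch
import torch.nn as nn

class DepthVelocityNet(nn.Module):
    """Toy model: conv features -> global pool -> (depth, velocity) regression."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # outputs: [depth_m, velocity_mps]

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(frames).flatten(1))

model = DepthVelocityNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

def train_step(frames: torch.Tensor, targets: torch.Tensor) -> float:
    """One gradient step: predict (depth, velocity), compare against labels."""
    optimizer.zero_grad()
    loss = loss_fn(model(frames), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call with dummy data: a batch of 4 frames and their labels.
loss = train_step(torch.randn(4, 3, 96, 128), torch.randn(4, 2))
```

The point of the sketch is only that the labels, not the model, are the bottleneck: whatever the architecture, it is the quality and quantity of the depth and velocity labels that decide how well vision can stand in for radar.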

Next time I’ll share how Tesla gathered this data.