With the rapid development of autonomous driving technology, multi-sensor fusion schemes are becoming increasingly mature. This article briefly introduces VINS-Mono, which has been popular in recent years. Traditional monocular visual SLAM suffers from scale ambiguity; by fusing an IMU, VINS-Mono effectively resolves this ambiguity while also suppressing the integration drift of the IMU, which greatly improves the robustness of the system. In addition, although the fusion of IMU and vision during back-end initialization is only loosely coupled, the positioning accuracy is comparable to OKVIS, and VINS-Mono has a more complete and robust initialization and loop-closure detection pipeline than OKVIS. The following is a brief introduction to how VINS works.

VINS-Mono is a robust, general-purpose monocular visual-inertial system consisting of a monocular camera and a low-cost IMU. A high-precision visual-inertial odometer is obtained by fusing pre-integrated IMU measurements with feature observations; combined with loop-closure detection and graph optimization, this forms a complete monocular VIO-SLAM system.

1.1 Algorithm Framework

The general process is as follows:

  1. Visual images are obtained from the monocular camera, and adaptive histogram equalization is applied to the captured images. Corner features are then extracted by directly calling OpenCV's cv::goodFeaturesToTrack, and the KLT pyramidal optical flow algorithm is used to track the feature points. The tracked feature points are put into a queue, and outliers are rejected by RANSAC after the essential matrix is computed with the five-point method. Subsequently, IMU pre-integration is carried out, mainly to avoid re-propagating the IMU measurements at every optimization step, which would add a large computational burden. The IMU pre-integration model transforms the states from the world coordinate system into the body coordinate system, and midpoint integration is used for the discrete-time pre-integration. (A sketch of this front-end tracking step is given after this list.)
  2. After initialization, a sliding-window-based nonlinear optimization is used to estimate the body state. The body state includes the IMU states (position, velocity, rotation, accelerometer bias and gyroscope bias) of the keyframes in the sliding window.
  3. When the system detects a loop closure, the visual measurement residuals obtained from loop-closure detection are added to the cost function of the sliding-window optimization above to relocalize the state. Loop-closure detection uses the DBoW2 bag-of-words method.
  4. After loop closure, pose-graph optimization is carried out over the four degrees of freedom in which the pose can drift, namely x, y, z and yaw.
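
As referenced in step 1, here is a minimal C++ sketch of the front-end tracking step, using OpenCV's adaptive histogram equalization, cv::goodFeaturesToTrack and pyramidal KLT optical flow. The function name, parameters and thresholds are illustrative assumptions, not the actual VINS-Mono code.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Front-end sketch: equalize, detect corners, track with pyramidal KLT,
// then reject outliers with RANSAC. Input images are assumed to be 8-bit
// grayscale; all thresholds below are illustrative.
void trackFrontEnd(const cv::Mat& prev_img, const cv::Mat& cur_img,
                   std::vector<cv::Point2f>& prev_pts,
                   std::vector<cv::Point2f>& cur_pts) {
    // Adaptive histogram equalization (CLAHE) to cope with poor lighting.
    cv::Mat prev_eq, cur_eq;
    cv::Ptr<cv::CLAHE> clahe = cv::createCLAHE(3.0, cv::Size(8, 8));
    clahe->apply(prev_img, prev_eq);
    clahe->apply(cur_img, cur_eq);

    // Detect new corners when the number of tracked points drops too low.
    const int max_corners = 150;
    if (static_cast<int>(prev_pts.size()) < max_corners) {
        std::vector<cv::Point2f> new_pts;
        cv::goodFeaturesToTrack(prev_eq, new_pts,
                                max_corners - static_cast<int>(prev_pts.size()),
                                0.01 /*quality level*/, 30.0 /*min distance*/);
        prev_pts.insert(prev_pts.end(), new_pts.begin(), new_pts.end());
    }
    if (prev_pts.empty()) return;

    // Track the corners into the current frame with pyramidal KLT flow.
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prev_eq, cur_eq, prev_pts, cur_pts, status, err,
                             cv::Size(21, 21), 3 /*pyramid levels*/);

    // Reject outliers with RANSAC on the epipolar geometry (a fundamental-
    // matrix check is used here for simplicity).
    if (cur_pts.size() >= 8) {
        std::vector<uchar> inlier_mask;
        cv::findFundamentalMat(prev_pts, cur_pts, cv::FM_RANSAC, 1.0, 0.99,
                               inlier_mask);
        // Points with status[i] == 0 or inlier_mask[i] == 0 would be dropped here.
    }
}
```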

Due to space limitations, only the marginalization method and IMU pre-integration will be introduced below.

1.2 IMU Pre-integration

The IMU is an inertial measurement unit. In VINS, the IMU comprises two devices, a gyroscope and an accelerometer. The gyroscope measures the angular velocity in the IMU coordinate system, and the accelerometer measures the linear acceleration in the IMU coordinate system. A reference link is attached: Zhuanlan.zhihu.com/p/133666509…

Why do we need pre-integration? For a rigid body translating in a given inertial frame, the first and second time derivatives of its translation are its velocity and acceleration. With Euler integration, the next state can be solved iteratively from the current state. For the pre-integration terms, since the data collected by the IMU are discrete, midpoint integration is adopted. Because there are too many formulas, they are not derived here; please consult the references as needed. What matters is the purpose of pre-integration: during back-end nonlinear optimization, the v and R of frame k are updated at every iteration, which would otherwise require re-integrating the IMU measurements after each iteration at a large computational cost. Therefore, the optimization variables are separated from the IMU pre-integration terms.
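
To make the midpoint integration above concrete, here is a minimal Eigen sketch of one propagation step of the pre-integrated position, velocity and rotation terms between two IMU samples. The function and variable names are illustrative, and the bias Jacobians and covariance propagation that VINS also maintains are omitted.

```cpp
#include <Eigen/Dense>

// One midpoint-integration step of the IMU pre-integration terms, expressed
// in the body frame of the first keyframe.
// alpha: pre-integrated position, beta: pre-integrated velocity, q: rotation.
void midpointPropagate(double dt,
                       const Eigen::Vector3d& acc0, const Eigen::Vector3d& gyr0,
                       const Eigen::Vector3d& acc1, const Eigen::Vector3d& gyr1,
                       const Eigen::Vector3d& ba,   const Eigen::Vector3d& bg,
                       Eigen::Vector3d& alpha, Eigen::Vector3d& beta,
                       Eigen::Quaterniond& q) {
    // Midpoint of the bias-corrected angular velocity over the interval.
    Eigen::Vector3d mid_gyr = 0.5 * (gyr0 + gyr1) - bg;
    Eigen::Quaterniond q_next =
        q * Eigen::Quaterniond(1.0, 0.5 * mid_gyr.x() * dt,
                                    0.5 * mid_gyr.y() * dt,
                                    0.5 * mid_gyr.z() * dt);
    q_next.normalize();

    // Midpoint of the bias-corrected acceleration, rotated into the frame of
    // the first keyframe using the rotations at both ends of the interval.
    Eigen::Vector3d mid_acc = 0.5 * (q * (acc0 - ba) + q_next * (acc1 - ba));

    // Standard kinematic update of the pre-integrated terms.
    alpha += beta * dt + 0.5 * mid_acc * dt * dt;
    beta  += mid_acc * dt;
    q = q_next;
}
```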

1.3 Marginalization

According to the Gauss-Newton-based nonlinear optimization theory discussed above, it should be noted that the state variables in the formula are not necessarily split into a camera-pose part and a landmark part, but rather into the part to be marginalized and the part to be retained. In VINS, what actually gets marginalized is the oldest frame or the second-newest frame in the sliding window. The purpose is to stop estimating the pose of this frame and its associated landmarks, while retaining the constraints this frame imposes on the other frames in the window. We cannot simply discard a variable, because it carries constraints on other variables; discarding it directly would lose those constraints and therefore lose information. Instead, the Schur complement is used for elimination, so the remaining variables can still be solved with no constraints dropped and hence no information lost. Depending on whether the newest frame is a keyframe, VINS chooses between two marginalization strategies: by comparing the parallax between the newest frame and the second-newest frame, it decides whether to marginalize the second-newest frame or the oldest frame. The residual terms in the previous prior are passed into the current prior, and the state quantities to be discarded are removed from it. A sketch of the Schur-complement step is given below.
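
As a rough illustration of the Schur-complement elimination described above, the sketch below marginalizes the block of the normal equations H·dx = b corresponding to the states to be dropped and folds their constraints into a prior on the retained states. The partitioning, names and the pseudo-inverse solve are simplifications for illustration, not the actual VINS implementation.

```cpp
#include <Eigen/Dense>

// Schur-complement marginalization sketch for the normal equations H * dx = b.
// The state is partitioned into x_m (to be marginalized, first m entries)
// and x_r (retained):
//   [ H_mm  H_mr ] [dx_m]   [b_m]
//   [ H_rm  H_rr ] [dx_r] = [b_r]
// Eliminating dx_m yields a prior on dx_r that preserves its constraints.
void marginalize(const Eigen::MatrixXd& H, const Eigen::VectorXd& b, int m,
                 Eigen::MatrixXd& H_prior, Eigen::VectorXd& b_prior) {
    const int r = static_cast<int>(H.rows()) - m;
    Eigen::MatrixXd H_mm = H.topLeftCorner(m, m);
    Eigen::MatrixXd H_mr = H.topRightCorner(m, r);
    Eigen::MatrixXd H_rm = H.bottomLeftCorner(r, m);
    Eigen::MatrixXd H_rr = H.bottomRightCorner(r, r);

    // A pseudo-inverse solve is used here for simplicity; the real code is
    // more careful about near-singular H_mm blocks.
    Eigen::MatrixXd H_mm_inv =
        H_mm.completeOrthogonalDecomposition().pseudoInverse();

    // Schur complement: the constraints of x_m are folded into the prior on x_r.
    H_prior = H_rr - H_rm * H_mm_inv * H_mr;
    b_prior = b.tail(r) - H_rm * H_mm_inv * b.head(m);
}
```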

If the second-newest frame is not a keyframe (MARGIN_SECOND_NEW), we simply discard the second-newest frame and its visual observations rather than marginalizing it, because the current frame and the second-newest frame are considered similar; that is, the constraints between the current frame and the landmarks are very close to the constraints between the second-newest frame and the landmarks. Discarding it therefore does not lose much information from the overall set of constraints. It is worth noting, however, that we need to keep the IMU data of the second-newest frame to ensure the continuity of the IMU pre-integration. The prior term is constructed through the above process. When the state quantities in the sliding window are optimized, this prior is put together with the IMU residual terms and the visual residual terms, so that the latest state estimate is obtained without losing historical information.
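
Putting the pieces together, the following sketch shows how the marginalization prior, the IMU pre-integration residuals and the visual reprojection residuals might be assembled into one Ceres problem for the sliding window. The factor types and parameter layout are assumptions for illustration, not the actual VINS classes.

```cpp
#include <ceres/ceres.h>
#include <vector>

// One visual observation: a reprojection factor linking the two keyframes
// that observe a landmark and the landmark's inverse depth (illustrative).
struct VisualObservation {
    ceres::CostFunction* factor;
    double* pose_i;
    double* pose_j;
    double* inverse_depth;
};

// Assemble the sliding-window problem: prior + IMU residuals + visual residuals.
void buildSlidingWindowProblem(
    ceres::Problem& problem, int window_size,
    double* poses[], double* speed_biases[],
    ceres::CostFunction* prior_factor,
    const std::vector<ceres::CostFunction*>& imu_factors,
    const std::vector<VisualObservation>& visual_obs) {
    // 1. Prior term obtained from marginalization, so that historical
    //    information is not lost. (In VINS the prior connects all retained
    //    blocks; only the oldest state is shown here for brevity.)
    problem.AddResidualBlock(prior_factor, nullptr, poses[0], speed_biases[0]);

    // 2. IMU pre-integration residual between each pair of consecutive frames.
    for (int i = 0; i + 1 < window_size; ++i)
        problem.AddResidualBlock(imu_factors[i], nullptr,
                                 poses[i], speed_biases[i],
                                 poses[i + 1], speed_biases[i + 1]);

    // 3. Visual reprojection residuals with a robust loss against mismatches.
    for (const auto& obs : visual_obs)
        problem.AddResidualBlock(obs.factor, new ceres::HuberLoss(1.0),
                                 obs.pose_i, obs.pose_j, obs.inverse_depth);
}
```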

1.4 Comparison on Datasets

The results of three methods, OKVIS, VINS without loop closure, and VINS with loop closure, are compared on 15 sequences of the EuRoC dataset.

After VINS-Mono and VINS-Mobile, VINS-Fusion is an open-source stereo visual-inertial SLAM scheme from the Hong Kong University of Science and Technology. VINS-Fusion is an optimization-based multi-sensor state estimator that achieves accurate localization for autonomous applications. It is an extension of VINS-Mono that supports multiple visual-inertial sensor types. The open-source project also provides an example module that fuses VINS with GPS.

Features are as follows:

  1. Multi-sensor support (stereo cameras / mono camera + IMU / stereo cameras + IMU)
  2. Online spatial calibration (transformation between camera and IMU)
  3. Online temporal calibration (time offset between camera and IMU)
  4. Visual loop closure

The basic principle is the same as that of VINS-Mono, so it is not repeated here.

2.2 Comparison between Mono and Fusion

Compared with monocular SLAM, the main advantage of stereo SLAM is that initialization can be carried out while the sensor is stationary; in addition, because the scale information does not depend entirely on the IMU, the scale is directly observable. However, due to factors such as stereo matching error and mismatches, the accuracy of the stereo setup is in practice somewhat worse than that of the monocular one, although the stereo setup is significantly more robust. The sensor used in this experiment is an Intel RealSense D435i, and the experimental results are as follows:

By studying the principles of VINS, we can understand its most innovative part, namely the pre-integration process. This provides a good starting point for improving our own SLAM scheme in the future.

– End –

With the rapid development of technology, AMu Lab will keep pace and continue to recommend the latest technology and hardware in the robotics industry to you. The greatest value of our training is seeing our trainees improve by leaps and bounds. If you work in the robotics industry, please follow our official account; we will continue to publish the most valuable information and technology in the field. AMu Lab is committed to education in cutting-edge IT technology and intelligent equipment, making robot research and development more efficient!