By Tom Hardy

Date: 2020-02-18

Source: Summary | Deep learning methods based on 3D point clouds

Preface

Three-dimensional data can be represented in several formats, including depth images, point clouds, meshes, and volumetric grids. As a commonly used format, the point cloud representation preserves the original geometric information in 3D space without any discretization, which makes it the preferred representation for many scene-understanding applications such as autonomous driving and robotics. In recent years, deep learning has become a research hotspot in computer vision, speech recognition, natural language processing, bioinformatics, and other fields, but deep learning on 3D point clouds still faces challenges such as small dataset sizes, high dimensionality, and the unstructured nature of the data. Against this background, this article reviews recent progress in deep learning methods for point cloud data across three tasks: 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation.
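
Since everything that follows hinges on the point cloud being an unordered set, here is a minimal NumPy illustration (for intuition only) of why standard grid-based operators do not apply directly:

```python
import numpy as np

# A point cloud is just an unordered set of 3D coordinates: an (N, 3) array
# where shuffling the rows describes exactly the same shape. This lack of
# structure is why grid convolutions cannot be applied to it directly.
cloud = np.random.rand(1024, 3)
shuffled = cloud[np.random.permutation(len(cloud))]

# Same set of points, different row order: any network consuming raw point
# clouds therefore has to be permutation-invariant.
assert set(map(tuple, cloud)) == set(map(tuple, shuffled))
```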

3D point cloud shape classification

These methods typically learn an embedding for each point, use an aggregation method to extract a global shape embedding from the entire point cloud, and finally perform classification through several fully connected layers. Based on how per-point features are learned, existing 3D shape classification methods can be divided into projection-based networks and point-based networks. Projection-based approaches first project the unstructured point cloud onto an intermediate regular representation and then apply well-established 2D or 3D convolutions for shape classification. In contrast, point-based approaches act directly on the raw point cloud without any voxelization or projection; they introduce no explicit information loss and are becoming increasingly popular.
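
A minimal PyTorch sketch of this generic pipeline (shared per-point MLP, symmetric max pooling, fully connected classifier); the layer widths are illustrative, not taken from any particular paper:

```python
import torch
import torch.nn as nn

class PointCloudClassifier(nn.Module):
    """Per-point embedding -> symmetric aggregation -> FC classification."""

    def __init__(self, num_classes=40):
        super().__init__()
        # Shared MLP applied to every point independently: (B, 3, N) -> (B, 1024, N)
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, points):                      # points: (B, N, 3)
        x = self.point_mlp(points.transpose(1, 2))  # (B, 1024, N)
        x = torch.max(x, dim=2).values              # order-invariant pooling -> (B, 1024)
        return self.classifier(x)                   # class logits (B, num_classes)
```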

Projection-based methods

These methods first project the 3D object into multiple views, extract the corresponding view-wise features, and then fuse these features for accurate object recognition. How to aggregate the multiple view features into a discriminative global representation is the key challenge. This class of methods mainly includes the following (a view-pooling sketch follows the list):

  1. MVCNN: Multi-view convolutional neural networks for 3D shape recognition
  2. MHBN: Multi-view harmonized bilinear network for 3D object recognition
  3. Learning relationships for multi-view 3D object recognition
  4. Volumetric and multi-view CNNs for object classification on 3D data
  5. GVCNN: Group-view convolutional neural networks for 3D shape recognition
  6. Dominant set clustering and pooling for multi-view 3D object recognition
  7. Learning multi-view representation with LSTM for 3D shape recognition and retrieval
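
The core of the MVCNN-style pipeline is view pooling: run the same 2D backbone over every rendered view and fuse the features with an element-wise max. A hedged sketch, with ResNet-18 as an illustrative (not prescribed) backbone:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MultiViewPooling(nn.Module):
    """Per-view CNN features fused across views by element-wise max pooling."""

    def __init__(self, num_classes=40):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop final FC
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, views):                    # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        x = self.features(views.flatten(0, 1))   # (B*V, 512, 1, 1)
        x = x.view(b, v, -1)                     # (B, V, 512)
        x = x.max(dim=1).values                  # view pooling -> (B, 512)
        return self.classifier(x)
```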

In addition, some methods work on volumetric representations of 3D point clouds, mainly including the following (a voxelization sketch follows the list):

  1. VoxNet
  2. 3D ShapeNets: A deep representation for volumetric shapes
  3. OctNet: Learning deep 3D representations at high resolutions
  4. O-CNN: Octree-based convolutional neural networks for 3D shape analysis
  5. PointGrid: A deep network for 3D shape understanding
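
Volumetric methods first rasterize the cloud into an occupancy grid that 3D convolutions can consume. A minimal sketch assuming a dense grid (octree-based methods such as OctNet and O-CNN exist precisely to avoid this cubic memory cost):

```python
import numpy as np

def voxelize(points, resolution=32):
    """Convert an (N, 3) point cloud into a binary occupancy grid.
    Real pipelines add normalization, padding, and often density counts."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    scale = (resolution - 1) / (maxs - mins).max()  # uniform scale keeps aspect ratio
    idx = ((points - mins) * scale).astype(int)
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0     # mark occupied voxels
    return grid
```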

Point-based networks

According to the network architecture used for per-point feature learning, these methods can be divided into point-wise MLP networks, convolution-based networks, graph-based networks, hierarchical data-structure-based networks, and other typical networks. As a concrete example of the graph-based family, the neighbor lookup such layers start from is sketched below.
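
A brute-force k-nearest-neighbor lookup over the raw points, the starting point of graph-based layers (e.g., DGCNN-style edge convolutions); fine for small clouds and illustrative only:

```python
import torch

def knn_graph(points, k=16):
    """Indices of the k nearest neighbors for every point.
    points: (B, N, 3) -> returns (B, N, k) neighbor indices."""
    dist = torch.cdist(points, points)  # pairwise distances (B, N, N)
    # k+1 smallest distances per point; slot 0 is the point itself, so skip it
    return dist.topk(k + 1, dim=-1, largest=False).indices[..., 1:]
```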

3D point cloud object detection and tracking

3D object detection

The task of 3D object detection is to precisely locate all objects of interest in a given scene. As with object detection in images, 3D object detection methods can be divided into two categories: region proposal-based methods and single-shot methods.

Region proposal-based methods: these first propose several regions that may contain objects (called proposals) and then extract region-wise features to determine the category label of each proposal. Based on how they generate proposals, they can be further divided into three categories: multi-view-based, segmentation-based, and frustum-based methods. A sketch of the frustum selection step follows.
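
To make the frustum idea concrete: given a 2D detection box, only the lidar points whose camera projection falls inside it are kept as the proposal. A hedged sketch in the spirit of frustum-based methods, where the 3x4 projection matrix `K` and the `(x1, y1, x2, y2)` box are assumed inputs:

```python
import numpy as np

def frustum_points(points, box2d, K):
    """Select the (N, 3) lidar points projecting into a 2D image box."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    uvw = pts_h @ K.T                                       # project to image plane
    uv = uvw[:, :2] / uvw[:, 2:3]
    x1, y1, x2, y2 = box2d
    in_box = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
              (uv[:, 1] >= y1) & (uv[:, 1] <= y2) &
              (uvw[:, 2] > 0))                              # in front of the camera
    return points[in_box]
```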

Single-shot methods: these directly predict class probabilities and regress the 3D bounding boxes of objects with a single-stage network. They require neither region proposals nor post-processing, so they run at high speed and suit real-time applications. According to the type of input data, they fall into two categories: BEV (bird's-eye-view projection)-based methods and point-cloud-based methods. A BEV discretization sketch follows.
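
A minimal sketch of how a BEV input is built: discretize the ground plane into cells and keep, for instance, the maximum height per cell. The ranges and cell size below are illustrative KITTI-like values, not taken from any specific method:

```python
import numpy as np

def bev_height_map(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), cell=0.1):
    """Rasterize an (N, 3) cloud into a max-height BEV grid.
    Empty cells simply stay at 0 in this simplified version."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((nx, ny), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    # np.maximum.at handles repeated cell indices correctly
    np.maximum.at(bev, (ix[valid], iy[valid]), points[valid, 2])
    return bev
```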

3D object tracking

Given an object's position in the first frame, the task of object tracking is to estimate its state in subsequent frames. Because 3D object tracking can exploit the rich geometric information in point clouds, it is expected to overcome drawbacks of 2D image-based tracking such as occlusion, illumination changes, and scale variation. The main methods include the following (a Siamese matching sketch follows the list):

  1. Leveraging shape completion for 3D Siamese tracking
  2. Context-aware correlation filter tracking
  3. Efficient tracking proposals using 2D-3D Siamese networks on lidar
  4. Complexer-YOLO: Real-time 3D object detection and tracking on semantic point clouds
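
The Siamese trackers in this list share one matching step: embed the template and every candidate region with the same network, then pick the most similar candidate. A hedged sketch, where `embed` stands for any (N, 3) -> (D,) encoder (a hypothetical placeholder, not a specific model):

```python
import torch
import torch.nn.functional as F

def siamese_track(embed, template_pts, candidate_clouds):
    """Return the index of the candidate cloud best matching the template."""
    t = embed(template_pts)                                  # template feature (D,)
    scores = torch.stack([F.cosine_similarity(t, embed(c), dim=0)
                          for c in candidate_clouds])        # one score per candidate
    return scores.argmax().item()
```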

In addition to the above methods, some tracking algorithms build on the idea of optical flow. Analogous to optical flow estimation in 2D vision, many methods learn useful information (such as 3D scene flow and spatio-temporal cues) from point cloud sequences, mainly including the following (an end-point-error sketch follows the list):

  1. FlowNet3D: Learning scene flow in 3D point clouds
  2. FlowNet3D++: Geometric losses for deep scene flow estimation
  3. HPLFlowNet: Hierarchical permutohedral lattice flownet for scene flow estimation on large-scale point clouds
  4. PointRNN: Point recurrent neural network for moving point cloud processing
  5. MeteorNet: Deep learning on dynamic 3D point cloud sequences
  6. Just go with the flow: Self-supervised scene flow estimation
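
These scene-flow networks predict a per-point 3D displacement and are commonly evaluated with the end-point error (EPE). A minimal sketch of that metric:

```python
import torch

def epe3d(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth
    per-point flow vectors, both of shape (N, 3)."""
    return (pred_flow - gt_flow).norm(dim=-1).mean()
```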

3D point cloud segmentation

3D point cloud segmentation requires understanding both the global geometric structure and the fine-grained details of each point. According to segmentation granularity, 3D point cloud segmentation methods can be divided into three categories: semantic segmentation (scene level), instance segmentation (object level), and part segmentation (part level).

Semantic segmentation

Semantic segmentation operates at the scene level and mainly includes projection-based and point-based methods.

Projection-based segmentation algorithms: these mainly use five kinds of intermediate representation: multi-view representation, spherical representation, volumetric representation, permutohedral lattice representation, and hybrid representation.

Point-based segmentation algorithms: point-based networks act directly on the irregular point cloud. However, point clouds are unordered and unstructured, so standard CNNs cannot be applied directly. To address this, the pioneering PointNet learns per-point features with shared MLPs and global features with a symmetric pooling function. Building on this idea, later methods can be roughly divided into point-wise MLP methods, point convolution methods, RNN-based methods, and graph-based methods. A sketch of the PointNet-style segmentation head follows.
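
A minimal PointNet-style segmentation head, sketching how the max-pooled global feature is broadcast back and concatenated with per-point features before point-wise classification; the widths are illustrative:

```python
import torch
import torch.nn as nn

class PointNetSegHead(nn.Module):
    """Per-point features + broadcast global feature -> per-point logits."""

    def __init__(self, num_classes=13):
        super().__init__()
        self.local = nn.Sequential(nn.Conv1d(3, 64, 1), nn.ReLU(),
                                   nn.Conv1d(64, 1024, 1), nn.ReLU())
        self.head = nn.Sequential(nn.Conv1d(2048, 256, 1), nn.ReLU(),
                                  nn.Conv1d(256, num_classes, 1))

    def forward(self, points):                      # points: (B, N, 3)
        f = self.local(points.transpose(1, 2))      # per-point features (B, 1024, N)
        g = f.max(dim=2, keepdim=True).values       # global feature (B, 1024, 1)
        g = g.expand(-1, -1, f.shape[2])            # copy it to every point
        return self.head(torch.cat([f, g], dim=1))  # per-point logits (B, C, N)
```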

Instance segmentation

Compared with semantic segmentation, instance segmentation is more challenging because it requires more precise and fine-grained reasoning about points: it must distinguish not only points with different semantics but also separate instances that share the same semantics. In general, existing methods can be divided into two categories: proposal-based methods and proposal-free methods.

Proposal-based methods transform the instance segmentation problem into two sub-tasks: 3D object detection and instance mask prediction. Proposal-free methods, by contrast, have no object detection module; they usually treat instance segmentation as a clustering step that follows semantic segmentation. In particular, most existing methods assume that points belonging to the same instance should have very similar features, so they focus on discriminative feature learning and point grouping, as in the sketch below.
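
To make the grouping step concrete, here is a deliberately naive O(N^2) sketch: points whose learned embeddings lie within a distance threshold are merged into one instance. Real methods use mean-shift or learned clustering; the threshold here is an assumed hyperparameter:

```python
import numpy as np

def group_instances(embeddings, threshold=0.5):
    """Greedily assign an instance label to each of the (N, D) embeddings."""
    labels = -np.ones(len(embeddings), dtype=int)
    next_id = 0
    for i in range(len(embeddings)):
        if labels[i] != -1:  # already claimed by an earlier instance
            continue
        # all unassigned points near point i form one instance
        dist = np.linalg.norm(embeddings - embeddings[i], axis=1)
        labels[(dist < threshold) & (labels == -1)] = next_id
        next_id += 1
    return labels
```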

Part segmentation

Part segmentation of 3D shapes faces two difficulties. First, shape parts that share the same semantic label can exhibit large geometric variation and ambiguity. Second, methods must be robust to noise and to sampling density. Existing algorithms mainly include:

  1. VoxSegNet: Volumetric CNNs for semantic part segmentation of 3D shapes
  2. 3D shape segmentation with projective convolutional networks
  3. SyncSpecCNN: Synchronized spectral CNN for 3D shape segmentation
  4. 3D shape segmentation via shape fully convolutional networks
  5. CoSegNet: Deep co-segmentation of 3D shapes with group consistency loss