Apple’s new self-driving car tech: VoxelNet is quite awesome

Point Cloud-Based 3D Object Detection.

Self driving car tech

Apple is known for pushing technological boundaries. It’s also known for keeping its future projects secret. Now, the company has revealed a glimpse of its mysterious self-driving car program.

A pair of Apple researchers published a paper proposing new software called “VoxelNet”. VoxelNet is software that helps computers detect three-dimensional objects.

3D object detection is an important component of a variety of real-world applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality.

LiDAR-based 3D object recognition
Qualitative results. For better visualization 3D boxes detected using LiDAR are projected on to the RGB images.

The software relies on depth-sensing LiDAR to recognize nearby objects; ordinary two-dimensional camera images are used to visualize the detections. LiDAR provides reliable depth information that can be used to accurately localize objects and characterize their shapes. However, LiDAR point clouds are sparse and have highly variable point density, due to factors such as non-uniform sampling of the 3D space, the effective range of the sensors, occlusion, and the relative pose.

According to reports, Apple will soon begin applying this new software to its own self-driving cars. This revelation about Apple’s self-driving progress, combined with its new testing permit, suggests that the company wants to develop and eventually sell its know-how.

Lead author Yin Zhou described VoxelNet as “a generic 3D detection framework that simultaneously learns a discriminative feature representation from point clouds and predicts accurate 3D bounding boxes, in an end-to-end fashion.”

“We design a novel voxel feature encoding (VFE) layer, which enables inter-point interaction within a voxel, by combining point-wise features with a locally aggregated feature.”
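
To make the quoted idea concrete, here is a minimal sketch of a VFE-style layer in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the name `vfe_layer`, the toy weights, and the use of an element-wise maximum as the aggregation step are choices made for this example, and any normalization a real network would apply is left out.

```python
def vfe_layer(points, weights, bias):
    """Sketch of a voxel feature encoding step for the points of ONE voxel.

    points  : list of input feature vectors, one per point in the voxel
    weights : out_dim x in_dim matrix (hypothetical learned parameters)
    bias    : list of out_dim offsets (hypothetical learned parameters)
    """
    # 1. Point-wise features: the same linear map + ReLU applied to each point.
    pointwise = [
        [max(0.0, sum(w * x for w, x in zip(row, p)) + b)
         for row, b in zip(weights, bias)]
        for p in points
    ]
    # 2. Locally aggregated feature: element-wise max over all points in the voxel
    #    (assumed aggregation for this sketch).
    aggregated = [max(column) for column in zip(*pointwise)]
    # 3. Combine: every point keeps its own feature and also receives the
    #    voxel-level summary, which is how points "interact" inside a voxel.
    return [feature + aggregated for feature in pointwise]


# Toy voxel with two points and two input features per point.
toy_points = [[1.0, 0.0], [0.0, 2.0]]
toy_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
toy_bias = [0.0, 0.0, 0.0]
encoded = vfe_layer(toy_points, toy_weights, toy_bias)
for row in encoded:
    print(row)
```

Each output row is twice as long as the point-wise feature: the first half is specific to the point, the second half is shared by every point in the voxel. Stacking several such layers lets the shared half carry progressively richer shape information.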

VoxelNet divides the point cloud into equally spaced 3D voxels and encodes each voxel through stacked VFE layers. 3D convolution then aggregates the features of neighboring voxels, transforming the point cloud into a high-dimensional volumetric representation. Finally, a region proposal network (RPN) takes the volumetric representation and produces the detection results.
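
The first step of that pipeline, grouping points into equally spaced voxels, can be sketched in a few lines of plain Python. This is a simplified illustration rather than the paper's code: the function name `voxelize`, the sample points, and the voxel size are all hypothetical, and a real system would also crop the cloud to a fixed range and cap the number of points kept per voxel.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group (x, y, z) points by the equally spaced voxel they fall into.

    Returns a dict mapping an integer voxel index (i, j, k) to the list of
    points inside that voxel. Empty voxels are simply absent from the dict.
    """
    size_x, size_y, size_z = voxel_size
    voxels = defaultdict(list)
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct cell.
        index = (math.floor(x / size_x),
                 math.floor(y / size_y),
                 math.floor(z / size_z))
        voxels[index].append((x, y, z))
    return dict(voxels)


# Illustrative points (in metres) and voxel size; not values from the paper.
cloud = [(0.1, 0.1, 0.1), (0.15, 0.05, 0.3), (0.9, 0.1, 0.1), (-0.1, 0.1, 0.1)]
grid = voxelize(cloud, voxel_size=(0.2, 0.2, 0.4))
for index, members in sorted(grid.items()):
    print(index, len(members))
```

Even in this tiny example the voxels hold different numbers of points, which mirrors the variable point density of LiDAR data that the encoding layers have to cope with.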

The researchers tested their software on two tasks: bird’s eye view detection and full 3D detection. Experimental results show that VoxelNet outperforms state-of-the-art LiDAR-based 3D detection methods.

The researchers noted, “There are also several multi-modal fusion methods that combine images and LiDAR to improve detection accuracy. These methods provide improved performance compared to LiDAR-only 3D detection, particularly for small objects (pedestrians, cyclists) or when the objects are far, since cameras provide an order of magnitude more measurements than LiDAR.”