MLF3D: Multi-Level Fusion for Multi-Modal 3D Object Detection
H Jiang, J Wang, J Xiao, Y Zhao… - 2024 IEEE Intelligent …, 2024 - ieeexplore.ieee.org
2024 IEEE Intelligent Vehicles Symposium (IV), 2024•ieeexplore.ieee.org
Recently, 3D object detection techniques that fuse camera and LiDAR sensor modalities have received much attention because of their complementary capabilities. However, prevalent multi-modal models rely on relatively homogeneous feature fusion strategies, so their performance is strictly limited by the detection results of one of the modalities. Meanwhile, the latest data-level fusion models based on virtual point clouds make no further use of image features, which leaves a large amount of noise from depth estimation. To address these issues, this paper integrates the advantages of data-level and feature-level sensor fusion and proposes MLF3D, a 3D object detection method based on multi-level fusion. MLF3D generates virtual point clouds to realize data-level fusion, and implements feature-level fusion through two key designs: VIConv3D and ASFA. VIConv3D reduces noise and realizes deep interactive enhancement of features through cross-modal fusion, noise sensing, and cross-space fusion. ASFA refines the bounding box by adaptively fusing cross-layer spatial semantic information. MLF3D achieves 92.91%, 87.71%, and 85.25% AP in the easy, medium, and hard scenarios on the KITTI 3D Car Detection Leaderboard.
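The abstract does not say how MLF3D builds its virtual point clouds. As a rough illustration of the general idea only, the sketch below back-projects a dense depth map into camera-frame 3D points through a pinhole camera model, which is one common way to obtain virtual points from an image. The function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the convention that a depth of 0 marks a missing pixel are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Turn a dense depth map into camera-frame 3D points (hypothetical helper).

    depth[v][u] is the estimated depth at pixel column u, row v.
    Pixels with depth <= 0 are treated as missing and skipped (an assumption).
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # no depth estimate for this pixel
            # Standard pinhole back-projection: invert u = fx * x / z + cx.
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


if __name__ == "__main__":
    # Toy 2x2 depth map with two valid pixels.
    toy_depth = [[0.0, 2.0], [4.0, 0.0]]
    for p in backproject_depth(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(p)
```

Every depth error becomes a misplaced 3D point under this kind of projection, which is the kind of depth-estimation noise the abstract says feature-level fusion is meant to reduce.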