PAFNet: Pillar Attention Fusion Network for Vehicle–Infrastructure Cooperative Target Detection Using LiDAR
Figure 1. Representative samples of the DAIR-V2X-C dataset. (A) Visualization of scenario 1 at an intersection. (B) Visualization of scenario 2 at an intersection. (a) Image collected by the vehicle-side camera. (b) Point cloud collected by the vehicle-side LiDAR. (c) Image collected by the roadside camera. (d) Point cloud collected by the roadside LiDAR.
Figure 2. PAFNet structure.
Figure 3. Spatial and temporal cooperative fusion preprocessing.
Figure 4. Pillar Voxel Feature Encoding.
Figure 5. 2D feature extraction module.
Figure 6. Grid attention feature fusion module.
Figure 7. Visualization of center point heatmaps. (A) Vehicle-side center point heatmap. (B) Roadside center point heatmap. (C) VIC center point heatmap. Red, green, and blue points indicate the vehicle, pedestrian, and cyclist classes, respectively.
Figure 8. 2D feature extraction module.
Figure 9. Visualization of target detection results. (A) Image collected by the vehicle-side camera. (B) Point cloud collected by the vehicle-side LiDAR. (C) Image collected by the roadside camera. (D) Point cloud collected by the roadside LiDAR. (E) Vehicle-side point cloud detection result. (F) Roadside point cloud detection result. (G) Feature fusion detection result. Red and orange boxes indicate vehicles.
Figure 10. 3D detection results by different methods on DAIR-V2X-C.
Abstract
1. Introduction
- A novel anchor-free VIC feature fusion target detection network is proposed in this paper. The network retains the advantages of a center point detection scheme, improves target detection accuracy through feature fusion, and offers a new solution for target detection in VIC autonomous driving (a toy center-heatmap sketch is given after this list).
- A spatial–temporal cooperative preprocessing method for point cloud features is proposed. Feature fusion accuracy is improved by optimizing the temporal matching between roadside and vehicle-side point cloud frames and by unifying the vehicle-side and roadside point cloud coordinate systems into the world coordinate system (see the coordinate-unification sketch after this list).
- A spatial-attention-based VIC feature fusion method, called Grid Attention Feature Fusion (GAFF), is proposed. Feature extraction followed by spatial-attention-guided fusion preserves as much of the information in the roadside and vehicle-side feature maps as possible (a toy fusion sketch also follows this list).
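To make the anchor-free, center-point formulation behind the first contribution concrete, the following minimal NumPy sketch draws the kind of Gaussian center heatmap target visualized in Figure 7 for a single object. The function name, the radius heuristic, and the 496 × 432 BEV grid size are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def draw_center_heatmap(heatmap, cx, cy, radius):
    """Splat a 2D Gaussian around one object's BEV center cell (cx, cy),
    in the spirit of center-point (anchor-free) detection heads [18]."""
    sigma = max(radius / 3.0, 1e-3)
    h, w = heatmap.shape
    x0, x1 = max(0, cx - radius), min(w, cx + radius + 1)
    y0, y1 = max(0, cy - radius), min(h, cy + radius + 1)
    ys, xs = np.ogrid[y0:y1, x0:x1]
    gauss = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    # Keep the maximum response where several objects overlap.
    heatmap[y0:y1, x0:x1] = np.maximum(heatmap[y0:y1, x0:x1], gauss)
    return heatmap

# One heatmap per class (vehicle / pedestrian / cyclist) on the BEV grid.
vehicle_heatmap = np.zeros((496, 432), dtype=np.float32)
draw_center_heatmap(vehicle_heatmap, cx=210, cy=140, radius=4)
```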
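For the spatial–temporal cooperative preprocessing of the second contribution, a minimal sketch of its two steps, nearest-timestamp frame matching and unification into the world coordinate system, is shown below. The 4 × 4 homogeneous extrinsics correspond to the calibration data provided with DAIR-V2X-C; the function names and the simple nearest-neighbour matching rule are assumptions made for illustration.

```python
import numpy as np

def lidar_to_world(points_xyz, extrinsic):
    """Map an (N, 3) LiDAR point cloud into the shared (virtual) world frame
    using a 4 x 4 homogeneous extrinsic matrix."""
    homo = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    return (homo @ extrinsic.T)[:, :3]

def match_nearest_frames(vehicle_stamps, roadside_stamps):
    """For every vehicle-side frame, pick the roadside frame whose timestamp
    is closest; both LiDAR streams are captured at 10 Hz."""
    roadside_stamps = np.asarray(roadside_stamps, dtype=np.float64)
    return [int(np.argmin(np.abs(roadside_stamps - float(t)))) for t in vehicle_stamps]

# Usage sketch: unify both clouds before pillar encoding and BEV projection.
# veh_world = lidar_to_world(veh_points, T_veh_lidar_to_world)
# inf_world = lidar_to_world(inf_points, T_inf_lidar_to_world)
```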
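The Grid Attention Feature Fusion idea of the third contribution can be pictured as a spatial attention map that weights, per BEV grid cell, the vehicle-side and roadside feature maps before merging them. The PyTorch module below is a toy version under that reading; the pooled channel statistics, the 7 × 7 convolution, and the weighted-sum rule are illustrative choices, not the exact GAFF design.

```python
import torch
import torch.nn as nn

class SpatialAttentionFusion(nn.Module):
    """Toy GAFF-style fusion: a per-cell attention map decides how much to
    trust the vehicle-side versus the roadside BEV features."""

    def __init__(self):
        super().__init__()
        # Mean- and max-pooled channel statistics of both maps -> one weight map.
        self.att = nn.Sequential(
            nn.Conv2d(4, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, feat_vehicle, feat_infra):
        # Both inputs: (B, C, H, W) BEV maps already aligned in the world grid.
        stats = torch.cat(
            [
                feat_vehicle.mean(dim=1, keepdim=True),
                feat_vehicle.amax(dim=1, keepdim=True),
                feat_infra.mean(dim=1, keepdim=True),
                feat_infra.amax(dim=1, keepdim=True),
            ],
            dim=1,
        )                    # (B, 4, H, W)
        w = self.att(stats)  # (B, 1, H, W), values in [0, 1]
        return w * feat_vehicle + (1.0 - w) * feat_infra
```

A call such as `fused = SpatialAttentionFusion()(veh_bev, inf_bev)` would then feed the fused map to the detection head.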
2. Materials and Methods
2.1. Dataset
2.2. Method
2.2.1. Spatial and Temporal Cooperative Fusion Preprocessing
2.2.2. Feature Extraction Network
2.2.3. Grid Attention Feature Fusion
2.2.4. Detection Head
3. Results
3.1. Visualization Results and Analysis
3.2. Detection Experiment
3.3. Ablation Experiment
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Royo, S.; Ballesta-Garcia, M. An Overview of Lidar Imaging Systems for Autonomous Vehicles. Appl. Sci. 2019, 9, 4093. [Google Scholar] [CrossRef]
- Li, Y.; Ibanez-Guzman, J. Lidar for Autonomous Driving: The Principles, Challenges, and Trends for Automotive Lidar and Perception Systems. IEEE Signal Process. Mag. 2020, 37, 50–61. [Google Scholar] [CrossRef]
- Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep Learning for 3d Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4338–4364. [Google Scholar] [CrossRef]
- Fernandes, D.; Silva, A.; Névoa, R.; Simões, C.; Gonzalez, D.; Guevara, M.; Novais, P.; Monteiro, J.; Melo-Pinto, P. Point-Cloud Based 3d Object Detection and Classification Methods for Self-Driving Applications: A Survey and Taxonomy. Inf. Fusion 2021, 68, 161–191. [Google Scholar] [CrossRef]
- Zhikun, W.; Jincheng, Y.; Ling, Y.; Sumin, Z.; Yehao, C.; Caixing, L.; Xuhong, T. Improved Hole Repairing Algorithm for Livestock Point Clouds Based on Cubic B-Spline for Region Defining. Measurement 2022, 190, 110668. [Google Scholar] [CrossRef]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Adv. Neural Inf. Process. Syst. 2017, 30, 1–10. [Google Scholar]
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 770–779. [Google Scholar]
- Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.; Elhoseiny, M.; Ghanem, B. Pointnext: Revisiting Pointnet++ with Improved Training and Scaling Strategies. Adv. Neural Inf. Process. Syst. 2022, 35, 23192–23204. [Google Scholar]
- Ma, X.; Qin, C.; You, H.; Ran, H.; Fu, Y. Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual Mlp Framework. arXiv 2022, arXiv:2202.07123. [Google Scholar]
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3dssd: Point-Based 3d Single Stage Object Detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11040–11048. [Google Scholar]
- Zhou, Y.; Tuzel, O. Voxelnet: End-to-end Learning for Point Cloud based 3D Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499. [Google Scholar]
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely Embedded Convolutional Detection. Sensors 2018, 18, 3337. [Google Scholar] [CrossRef]
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. Pointpillars: Fast Encoders for Object Detection from Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705. [Google Scholar]
- Kuang, H.; Wang, B.; An, J.; Zhang, M.; Zhang, Z. Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3d Object Detection from Lidar Point Clouds. Sensors 2020, 20, 704. [Google Scholar] [CrossRef]
- Shi, S.; Wang, Z.; Shi, J.; Wang, X.; Li, H. From Points to Parts: 3d Object Detection from Point Cloud with Part-Aware and Part-Aggregation Network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2647–2664. [Google Scholar] [CrossRef]
- Shi, G.; Li, R.; Ma, C. Pillarnet: Real-Time and High-Performance Pillar-Based 3d Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 35–52. [Google Scholar]
- Yin, T.; Zhou, X.; Krahenbuhl, P. Center-based 3D Object Detection and Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 11784–11793. [Google Scholar]
- Wang, G.; Wu, J.; Tian, B.; Teng, S.; Chen, L.; Cao, D. Centernet3D: An Anchor Free Object Detector for Point Cloud. IEEE Trans. Intell. Transp. Syst. 2021, 23, 12953–12965. [Google Scholar] [CrossRef]
- Chen, Y.; Liu, J.; Zhang, X.; Qi, X.; Jia, J. VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 21674–21683. [Google Scholar]
- Li, J.; Luo, C.; Yang, X. PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 17567–17576. [Google Scholar]
- Wang, G.; Wu, J.; Xu, T.; Tian, B. 3D Vehicle Detection with RSU Lidar for Autonomous Mine. IEEE Trans. Veh. Technol. 2021, 70, 344–355. [Google Scholar] [CrossRef]
- Schinagl, D.; Krispel, G.; Possegger, H.; Roth, P.M.; Bischof, H. Occam’s Laser: Occlusion-Based Attribution Maps for 3d Object Detectors on Lidar Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1141–1150. [Google Scholar]
- Wu, J.; Xu, H.; Tian, Y.; Pi, R.; Yue, R. Vehicle Detection under Adverse Weather from Roadside LiDAR Data. Sensors 2020, 20, 3433. [Google Scholar] [CrossRef]
- Wang, J.; Wu, Z.; Liang, Y.; Tang, J.; Chen, H. Perception Methods for Adverse Weather Based on Vehicle Infrastructure Cooperation System: A Review. Sensors 2024, 24, 374. [Google Scholar] [CrossRef]
- Yu, H.; Luo, Y.; Shu, M.; Huo, Y.; Yang, Z.; Shi, Y.; Guo, Z.; Li, H.; Hu, X.; Yuan, J. DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3d Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 21361–21370. [Google Scholar]
- Abdelazeem, M.; Elamin, A.; Afifi, A.; El-Rabbany, A. Multi-Sensor Point Cloud Data Fusion for Precise 3D Mapping. Egypt. J. Remote Sens. Space Sci. 2021, 24, 835–844. [Google Scholar] [CrossRef]
- Zhou, Y.; Sun, P.; Zhang, Y.; Anguelov, D.; Gao, J.; Ouyang, T.; Guo, J.; Ngiam, J.; Vasudevan, V. End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds. In Proceedings of the Conference on Robot Learning (CoRL), Osaka, Japan, 30 October–1 November 2019; pp. 923–932. [Google Scholar]
- Yu, H.; Yang, W.; Ruan, H.; Yang, Z.; Tang, Y.; Gao, X.; Hao, X.; Shi, Y.; Pan, Y.; Sun, N. V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 5486–5495. [Google Scholar]
- Sun, P.; Sun, C.; Wang, R.; Zhao, X. Object Detection Based on Roadside LiDAR for Cooperative Driving Automation: A Review. Sensors 2022, 22, 9316. [Google Scholar] [CrossRef]
- Chen, Q.; Tang, S.; Yang, Q.; Fu, S. Cooper: Cooperative Perception for Connected Autonomous Vehicles based on 3d Point Clouds. In Proceedings of the 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Dallas, TX, USA, 7–10 July 2019; pp. 514–524. [Google Scholar]
- Tang, Z.; Hu, R.; Chen, Y.; Sun, Z.; Li, M. Multi-Expert Learning for Fusion of Pedestrian Detection Bounding Box. Knowl.-Based Syst. 2022, 241, 108254. [Google Scholar] [CrossRef]
- Hurl, B.; Cohen, R.; Czarnecki, K.; Waslander, S. Trupercept: Trust Modelling for Autonomous Vehicle Cooperative Perception from Synthetic Data. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020; pp. 341–347. [Google Scholar]
- Bai, Z.; Wu, G.; Barth, M.J.; Liu, Y.; Sisbot, E.A.; Oguchi, K. Pillargrid: Deep Learning-Based Cooperative Perception for 3d Object Detection from Onboard-Roadside Lidar. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 1743–1749. [Google Scholar]
- Yu, H.; Tang, Y.; Xie, E.; Mao, J.; Luo, P.; Nie, Z. Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3d Object Detection. Adv. Neural Inf. Process. Syst. 2024, 36, 1–9. [Google Scholar]
- Raj, T.; Hanim Hashim, F.; Baseri Huddin, A.; Ibrahim, M.F.; Hussain, A. A Survey on Lidar Scanning Mechanisms. Electronics 2020, 9, 741. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. In Proceedings of the 1st Annual Conference on Robot Learning (CoRL), PMLR, Mountain View, CA, USA, 13–15 November 2017; pp. 1–16. [Google Scholar]
- Beck, J.; Arvin, R.; Lee, S.; Khattak, A.; Chakraborty, S. Automated Vehicle Data Pipeline for Accident Reconstruction: New Insights from Lidar, Camera, and Radar Data. Accid. Anal. Prev. 2023, 180, 106923. [Google Scholar] [CrossRef]
- Zhou, S.; Xu, H.; Zhang, G.; Ma, T.; Yang, Y. Leveraging Deep Convolutional Neural Networks Pre-Trained on Autonomous Driving Data for Vehicle Detection from Roadside Lidar Data. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22367–22377. [Google Scholar] [CrossRef]
- Xie, S.; Gu, J.; Guo, D.; Qi, C.R.; Guibas, L.; Litany, O. Pointcontrast: Unsupervised Pre-Training for 3d Point Cloud Understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 574–591. [Google Scholar]
- Fei, J.; Peng, K.; Heidenreich, P.; Bieder, F.; Stiller, C. Pillarsegnet: Pillar-Based Semantic Grid Map Estimation Using Sparse Lidar Data. In Proceedings of the 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 11–17 July 2021; pp. 838–844. [Google Scholar]
- Yuan, Z.; Song, X.; Bai, L.; Wang, Z.; Ouyang, W. Temporal-Channel Transformer for 3d Lidar-Based Video Object Detection for Autonomous Driving. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2068–2078. [Google Scholar] [CrossRef]
- Lin, T.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Deng, C.; Wang, M.; Liu, L.; Liu, Y.; Jiang, Y. Extended Feature Pyramid Network for Small Object Detection. IEEE Trans. Multimed. 2021, 24, 1968–1979. [Google Scholar] [CrossRef]
- Zhu, L.; Deng, Z.; Hu, X.; Fu, C.; Xu, X.; Qin, J.; Heng, P. Bidirectional Feature Pyramid Network with Recurrent Attention Residual Modules for Shadow Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 121–136. [Google Scholar]
- Gao, J.; Geng, X.; Zhang, Y.; Wang, R.; Shao, K. Augmented Weighted Bidirectional Feature Pyramid Network for Marine Object Detection. Expert Syst. Appl. 2024, 237, 121688. [Google Scholar] [CrossRef]
- Lian, X.; Pang, Y.; Han, J.; Pan, J. Cascaded Hierarchical Atrous Spatial Pyramid Pooling Module for Semantic Segmentation. Pattern Recognit. 2021, 110, 107622. [Google Scholar] [CrossRef]
- Qiu, Y.; Liu, Y.; Chen, Y.; Zhang, J.; Zhu, J.; Xu, J. A2sppnet: Attentive Atrous Spatial Pyramid Pooling Network for Salient Object Detection. IEEE Trans. Multimed. 2023, 25, 1991–2006. [Google Scholar] [CrossRef]
- He, H.; Yang, D.; Wang, S.; Wang, S.; Li, Y. Road Extraction by Using Atrous Spatial Pyramid Pooling Integrated Encoder-Decoder Network and Structural Similarity Loss. Remote Sens. 2019, 11, 1015. [Google Scholar] [CrossRef]
- Guo, M.; Xu, T.; Liu, J.; Liu, Z.; Jiang, P.; Mu, T.; Zhang, S.; Martin, R.; Cheng, M.; Hu, S. Attention Mechanisms in Computer Vision: A Survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
- Yan, J.; Peng, Z.; Yin, H.; Wang, J.; Wang, X.; Shen, Y.; Stechele, W.; Cremers, D. Trajectory Prediction for Intelligent Vehicles Using Spatial-Attention Mechanism. IET Intell. Transp. Syst. 2020, 14, 1855–1863. [Google Scholar] [CrossRef]
- Chen, J.; Chen, Y.; Li, W.; Ning, G.; Tong, M.; Hilton, A. Channel and Spatial Attention Based Deep Object Co-Segmentation. Knowl.-Based Syst. 2021, 211, 106550. [Google Scholar] [CrossRef]
- Xue, Y.; Mao, J.; Niu, M.; Xu, H.; Mi, M.B.; Zhang, W.; Wang, X.; Wang, X. Point2seq: Detecting 3d Objects as Sequences. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 8521–8530. [Google Scholar]
| Category | Method | Anchor | Dataset | LiDAR Type |
|---|---|---|---|---|
| Point-based | PointNet [6]/PointNet++ [7] | √ | Vehicle-side Dataset | Mechanical Scanning [36] |
| | PointRCNN [8] | √ | | |
| | PointNeXt [9] | Anchor-free | | |
| Voxel-based | VoxelNet [12]/VoxelNeXt [20] | √ | | |
| | Voxel-FPN [15] | √ | | |
| | PartA2 [16] | √ | | |
| | SECOND [13] | √ | | |
| | CenterPoint [18,19] | Anchor-free | | |
| Pillar-based | PointPillars [14]/PillarNet [17] | √ | | |
| | PillarNeXt [21] | Anchor-free | | |
| | PillarGrid [34] | √ | VIC, CARLA Simulation [37] | Simulated mechanical scanning [38] |
| | FFNet [35] | √ | VIC | V: Mechanical Scanning, I: Solid-State [36] |
| Information of DAIR-V2X-C [26] | Sensor | Parameter | Value |
|---|---|---|---|
| Roadside (RS) equipment | LiDAR | Laser lines | 300 lines |
| | | Capture frequency | 10 Hz |
| | | Horizontal/vertical FOV | 100°/40° |
| | | Detection distance accuracy [36] | ≤3 cm |
| | Camera | Resolution | 1920 × 1080 |
| | | Capture frequency | 25 Hz |
| Vehicle-side (VS) equipment | LiDAR | Laser lines | 40 lines |
| | | Capture frequency | 10 Hz |
| | | Horizontal/vertical FOV | 360°/40° |
| | | Detection distance accuracy [36] | ≤3 cm |
| | Camera | Resolution | 1920 × 1080 |
| | | Capture frequency | 25 Hz |
| Dataset annotation [26] | | Frames | 38,845 |
| | | Object types | 10 types (car, truck, van, bus, pedestrian, cyclist, tricyclist, motorcyclist, cart, traffic cone) |
| | | 2D box in image | Height, width |
| | | 3D box in point cloud | Height, width, length, location, rotation |
| | | Calibration data | Extrinsic parameter matrices of RS and VS |
| | | Coordinate conversion | LiDAR and camera coordinates of RS and VS, virtual world coordinate |
| | | Other information | Timestamp, occluded state, truncated state |
| Method | Anchor | Detection Modality | Model Size | Inference Time |
|---|---|---|---|---|
| PointPillars | Based | Early fusion | 58.1 MB | 69.85 ms |
| VoxelNeXt | Free | Early fusion | 228.9 MB | 103.21 ms |
| PillarGrid | Based | Feature fusion | 114.8 MB | 126.51 ms |
| FFNet | Based | Feature fusion | 128 MB | 131.86 ms |
| PAFNet | Free | Feature fusion | 95.85 MB | 133.52 ms |
| STCFP Module | ASPP Module | GAFF Module | Car 3D AP (%) (IoU = 0.7) | Pedestrian 3D AP (%) (IoU = 0.5) | Cyclist 3D AP (%) (IoU = 0.5) | mAP (%) |
|---|---|---|---|---|---|---|
| × | × | × | 51.56 | 30.37 | 31.70 | 37.88 |
| √ | × | × | 52.62 | 30.54 | 32.03 | 38.40 |
| √ | × | √ | 56.08 | 36.42 | 38.37 | 43.62 |
| √ | √ | × | 53.80 | 32.73 | 34.29 | 40.27 |
| √ | √ | √ | 57.27 | 39.52 | 41.95 | 46.25 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, L.; Lan, J.; Li, M. PAFNet: Pillar Attention Fusion Network for Vehicle–Infrastructure Cooperative Target Detection Using LiDAR. Symmetry 2024, 16, 401. https://doi.org/10.3390/sym16040401