Abstract
We introduce a multimodal feature fusion network (MFFNet) for 3D point cloud semantic segmentation. Unlike previous methods that learn directly from colored point clouds (XYZRGB), MFFNet transforms point clouds into 2D RGB-image and frequency-image representations for efficient multimodal feature fusion. For each point, MFFNet learns a weighted orthogonal projection that softly projects the surrounding points onto a 2D image grid, so that regular 2D convolutions can be applied for efficient semantic feature learning. The resulting 2D semantic features are then fused into the 3D point cloud features by a multimodal feature fusion (MFF) module, which exploits high-level features from the RGB and frequency images to strengthen the intrinsic correlation and discriminability of structural features in the point cloud. In particular, the discriminative descriptions are quantified and used as a local soft attention mask to further reinforce the structural features of each semantic category. We evaluate the proposed method on the S3DIS and ScanNet datasets; experimental results and comparisons with four backbone methods demonstrate that our framework performs better.
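To make the fusion step concrete, the sketch below illustrates one plausible form of such a module: per-point features gathered from the two 2D modalities are concatenated with the 3D point features, and a learned soft attention mask gates the fused result. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name MFFSketch, all channel widths, and the premise that the 2D features have already been back-projected onto the points are hypothetical.

```python
import torch
import torch.nn as nn


class MFFSketch(nn.Module):
    """Hypothetical sketch of a multimodal feature fusion (MFF) step.

    Assumes the 2D semantic features from the RGB and frequency images
    have already been back-projected onto the N points. Channel sizes
    are illustrative, not taken from the paper.
    """

    def __init__(self, c_point=64, c_rgb=64, c_freq=64, c_out=128):
        super().__init__()
        c_in = c_point + c_rgb + c_freq
        # Fused multimodal feature.
        self.fuse = nn.Sequential(nn.Linear(c_in, c_out), nn.ReLU(inplace=True))
        # Per-point soft attention mask in [0, 1], one weight per channel.
        self.mask = nn.Sequential(nn.Linear(c_in, c_out), nn.Sigmoid())

    def forward(self, f_point, f_rgb, f_freq):
        # Each input is an (N, C) tensor of per-point features.
        x = torch.cat([f_point, f_rgb, f_freq], dim=-1)
        # The mask softly re-weights the fused features per point.
        return self.mask(x) * self.fuse(x)


if __name__ == "__main__":
    n = 1024
    mff = MFFSketch()
    out = mff(torch.randn(n, 64), torch.randn(n, 64), torch.randn(n, 64))
    print(out.shape)  # torch.Size([1024, 128])
```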
Data availability
The data used to support the findings of this study are available from the corresponding author upon request.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ren, D., Li, J., Wu, Z. et al. MFFNet: multimodal feature fusion network for point cloud semantic segmentation. Vis Comput 40, 5155–5167 (2024). https://doi.org/10.1007/s00371-023-02907-w