ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation

  • Conference paper
  • Published in: Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15059)

Abstract

The costly and time-consuming annotation process needed to produce large training sets for LiDAR semantic segmentation has motivated the development of semi-supervised learning (SSL) methods. However, such SSL approaches often concentrate on applying consistency learning to individual LiDAR representations. This narrow focus results in limited perturbations that generally fail to enable effective consistency learning. Additionally, these SSL approaches employ contrastive learning based on samples drawn from a limited set of positive and negative embeddings. This paper introduces a novel semi-supervised LiDAR semantic segmentation framework called ItTakesTwo (IT2). IT2 is designed to ensure consistent predictions from peer LiDAR representations, thereby improving the perturbation effectiveness in consistency learning. Furthermore, our contrastive learning employs informative samples drawn from a distribution of positive and negative embeddings learned from the entire training set. Results on public benchmarks show that our approach achieves remarkable improvements over the previous state-of-the-art (SOTA) methods in the field. Code is available at https://github.com/yyliu01/IT2.
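
To make the peer-representation consistency idea concrete, the following is a minimal PyTorch sketch, assuming two hypothetical networks (voxel_net, range_net) that map the same scan to per-point logits. It follows the general cross pseudo-supervision pattern [6] rather than IT2's exact losses, and it omits the contrastive component:

    # Minimal sketch (not IT2's exact recipe): two peer networks segment the same
    # unlabeled scan from different LiDAR representations, and each branch is
    # supervised by the other's hard pseudo-labels.
    import torch
    import torch.nn.functional as F

    def peer_consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
        """logits_a, logits_b: per-point class logits of shape [N, C]."""
        pseudo_a = logits_a.argmax(dim=1).detach()    # branch A's pseudo-labels
        pseudo_b = logits_b.argmax(dim=1).detach()    # branch B's pseudo-labels
        loss_a = F.cross_entropy(logits_a, pseudo_b)  # A learns from B's labels
        loss_b = F.cross_entropy(logits_b, pseudo_a)  # B learns from A's labels
        return loss_a + loss_b

    # Unlabeled step, e.g.: loss_u = peer_consistency_loss(voxel_net(x), range_net(x))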

Y. Liu and Y. Chen contributed equally.

Notes

  1. Cluster hypothesis (or assumption) [42]: the data of various forms that behave similarly with respect to information relevance should be clustered together.

  2. Range-to-point projection can lead to information loss [45], which is typically mitigated by post-processing with K-Nearest Neighbors (KNN); a sketch of this refinement appears after this list. In pursuit of efficiency, our IT2 does not use any post-processing during training.

  3. The original GPC paper [23] calculates the mIoU differently from common practice [27, 31, 66] (see the mIoU sketch after this list). The reported performance is based on our re-evaluation of the officially released checkpoints from their GitHub repository; more details are in Supplementary Sec. 1.

References

  1. Arazo, E., Ortego, D., Albert, P., O’Connor, N.E., McGuinness, K.: Pseudo-labeling and confirmation bias in deep semi-supervised learning. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020)

  2. Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke, S., Stachniss, C., Gall, J.: Semantickitti: a dataset for semantic scene understanding of lidar sequences. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9297–9307 (2019)

  3. Berman, M., Triki, A.R., Blaschko, M.B.: The lovász-softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4413–4421 (2018)

  4. Caesar, H., et al.: nuscenes: a multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621–11631 (2020)

  5. Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised learning of visual features by contrasting cluster assignments. Adv. Neural Inf. Process. Syst. 33, 9912–9924 (2020)

  6. Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2613–2622 (2021)

  7. Cheng, M., Hui, L., Xie, J., Yang, J.: Sspc-net: semi-supervised semantic 3d point cloud segmentation network. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 1140–1147 (2021)

  8. Cheung, Y.M.: A rival penalized EM algorithm towards maximizing weighted likelihood for density mixture clustering with automatic model selection. In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol. 4, pp. 633–636. IEEE (2004)

  9. Cheung, Y.M.: Maximum weighted likelihood via rival penalized EM for density mixture clustering with automatic model selection. IEEE Trans. Knowl. Data Eng. 17(6), 750–761 (2005)

  10. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 424–432. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46723-8_49

  11. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc.: Ser. B (Methodol.) 39(1), 1–22 (1977)

  12. Deng, S., Dong, Q., Liu, B., Hu, Z.: Superpoint-guided semi-supervised semantic segmentation of 3d point clouds. In: 2022 International Conference on Robotics and Automation (ICRA), pp. 9214–9220. IEEE (2022)

  13. Fan, L., Xiong, X., Wang, F., Wang, N., Zhang, Z.: Rangedet: in defense of range view for lidar-based 3d object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2918–2927 (2021)

  14. French, G., Aila, T., Laine, S., Mackiewicz, M., Finlayson, G.: Semi-supervised semantic segmentation needs strong, high-dimensional perturbations (2019)

  15. Graham, B., Engelcke, M., Van Der Maaten, L.: 3d semantic segmentation with submanifold sparse convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9224–9232 (2018)

  16. Grigorescu, S., Trasnea, B., Cocias, T., Macesanu, G.: A survey of deep learning techniques for autonomous driving. J. Field Rob. 37(3), 362–386 (2020)

  17. Guo, Y., Wang, H., Hu, Q., Liu, H., Liu, L., Bennamoun, M.: Deep learning for 3d point clouds: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 43(12), 4338–4364 (2020)

  18. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)

  19. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  20. Hou, J., Graham, B., Nießner, M., Xie, S.: Exploring data-efficient 3D scene understanding with contrastive scene contexts. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15587–15597 (2021)

  21. Hou, Y., Zhu, X., Ma, Y., Loy, C.C., Li, Y.: Point-to-voxel knowledge distillation for lidar semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8479–8488 (2022)

  22. Hung, W.C., Tsai, Y.H., Liou, Y.T., Lin, Y.Y., Yang, M.H.: Adversarial learning for semi-supervised semantic segmentation. arXiv preprint arXiv:1802.07934 (2018)

  23. Jiang, L., et al.: Guided point contrastive learning for semi-supervised point cloud semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6423–6432 (2021)

  24. Khosla, P., et al.: Supervised contrastive learning. Adv. Neural Inf. Process. Syst. 33, 18661–18673 (2020)

  25. Kohli, A.P.S., Sitzmann, V., Wetzstein, G.: Semantic implicit neural scene representations with semi-supervised training. In: 2020 International Conference on 3D Vision (3DV), pp. 423–433. IEEE (2020)

  26. Kong, L., et al.: Rethinking range view representation for lidar segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 228–240 (2023)

  27. Kong, L., Ren, J., Pan, L., Liu, Z.: Lasermix for semi-supervised lidar semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21705–21715 (2023)

  28. Lai, X., Chen, Y., Lu, F., Liu, J., Jia, J.: Spherical transformer for lidar-based 3D recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17545–17555 (2023)

  29. Le-Khac, P.H., Healy, G., Smeaton, A.F.: Contrastive representation learning: a framework and review. IEEE Access 8, 193907–193934 (2020)

  30. Li, J., Zhou, P., Xiong, C., Hoi, S.C.: Prototypical contrastive learning of unsupervised representations. arXiv preprint arXiv:2005.04966 (2020)

  31. Li, L., Shum, H.P., Breckon, T.P.: Less is more: reducing task and model complexity for 3d point cloud semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9361–9371 (2023)

  32. Li, M., et al.: Hybridcr: weakly-supervised 3d point cloud semantic segmentation via hybrid contrastive regularization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14930–14939 (2022)

  33. Liang, C., Wang, W., Miao, J., Yang, Y.: Gmmseg: Gaussian mixture based generative semantic segmentation models. Adv. Neural Inf. Process. Syst. 35, 31360–31375 (2022)

  34. Liang, T., et al.: Bevfusion: a simple and robust lidar-camera fusion framework. Adv. Neural Inf. Process. Syst. 35, 10421–10434 (2022)

  35. Liu, M., Zhou, Y., Qi, C.R., Gong, B., Su, H., Anguelov, D.: Less: label-efficient semantic segmentation for lidar point clouds. In: Avidan, S., Brostow, G., Cisse, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13699, pp. 70–89. Springer, Heidelberg (2022). https://doi.org/10.1007/978-3-031-19842-7_5

  36. Liu, W., Yue, X., Chen, Y., Denoeux, T.: Trusted multi-view deep learning with opinion aggregation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 7585–7593 (2022)

  37. Liu, Y., Hu, Q., Lei, Y., Xu, K., Li, J., Guo, Y.: Box2seg: learning semantics of 3d point clouds with box-level supervision. arXiv preprint arXiv:2201.02963 (2022)

  38. Liu, Y., et al.: Segment any point cloud sequences by distilling vision foundation models. arXiv preprint arXiv:2306.09347 (2023)

  39. Liu, Y., Tian, Y., Chen, Y., Liu, F., Belagiannis, V., Carneiro, G.: Perturbed and strict mean teachers for semi-supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4258–4267 (2022)

  40. Liu, Z., et al.: Bevfusion: multi-task multi-sensor fusion with unified bird’s-eye view representation. In: IEEE International Conference on Robotics and Automation (ICRA) (2023)

  41. Liu, Z., et al.: Bevfusion: multi-task multi-sensor fusion with unified bird’s-eye view representation. In: 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 2774–2781. IEEE (2023)

  42. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)

  43. Mena, G., Nejatbakhsh, A., Varol, E., Niles-Weed, J.: Sinkhorn EM: an expectation-maximization algorithm based on entropic optimal transport. arXiv preprint arXiv:2006.16548 (2020)

  44. Miech, A., Alayrac, J.B., Smaira, L., Laptev, I., Sivic, J., Zisserman, A.: End-to-end learning of visual representations from uncurated instructional videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9879–9889 (2020)

  45. Milioto, A., Vizzo, I., Behley, J., Stachniss, C.: Rangenet++: fast and accurate lidar semantic segmentation. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4213–4220. IEEE (2019)

  46. Nunes, L., Marcuzzi, R., Chen, X., Behley, J., Stachniss, C.: Segcontrast: 3d point cloud feature representation learning through self-supervised segment discrimination. IEEE Rob. Autom. Lett. 7(2), 2116–2123 (2022)

  47. Ouali, Y., Hudelot, C., Tami, M.: Semi-supervised semantic segmentation with cross-consistency training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12674–12684 (2020)

  48. Reichardt, L., Ebert, N., Wasenmüller, O.: 360° from a single camera: a few-shot approach for lidar segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1075–1083 (2023)

  49. Sun, C.Y., et al.: Semi-supervised 3d shape segmentation with multilevel consistency and part substitution. Comput. Visual Media 9(2), 229–247 (2023)

  50. Tarvainen, A., Valpola, H.: Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. arXiv preprint arXiv:1703.01780 (2017)

  51. Tian, Y., Krishnan, D., Isola, P.: Contrastive multiview coding. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 776–794. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_45

  52. Unal, O., Dai, D., Van Gool, L.: Scribble-supervised lidar semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2697–2707 (2022)

  53. Vora, S., et al.: Nesf: neural semantic fields for generalizable semantic segmentation of 3d scenes. arXiv preprint arXiv:2111.13260 (2021)

  54. Wang, W., Zhou, T., Yu, F., Dai, J., Konukoglu, E., Van Gool, L.: Exploring cross-image pixel contrast for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7303–7313 (2021)

  55. Wu, B., Wan, A., Yue, X., Keutzer, K.: Squeezeseg: convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3d lidar point cloud. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1887–1893. IEEE (2018)

  56. Xie, S., Gu, J., Guo, D., Qi, C.R., Guibas, L., Litany, O.: PointContrast: unsupervised pre-training for 3D point cloud understanding. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 574–591. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_34

  57. Xu, J., Hsu, D.J., Maleki, A.: Benefits of over-parameterization with EM. Adv. Neural Inf. Process. Syst. 31 (2018)

  58. Xu, J., Zhang, R., Dou, J., Zhu, Y., Sun, J., Pu, S.: Rpvnet: a deep and efficient range-point-voxel fusion network for lidar point cloud segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 16024–16033 (2021)

  59. Xu, J., Tang, H., Ren, Y., Peng, L., Zhu, X., He, L.: Multi-level feature learning for contrastive multi-view clustering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16051–16060 (2022)

  60. Xu, Z., Yuan, B., Zhao, S., Zhang, Q., Gao, X.: Hierarchical point-based active learning for semi-supervised point cloud semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 18098–18108 (2023)

  61. Yang, L., Qi, L., Feng, L., Zhang, W., Shi, Y.: Revisiting weak-to-strong consistency in semi-supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7236–7246 (2023)

  62. Yang, L., Zhuo, W., Qi, L., Shi, Y., Gao, Y.: St++: make self-training work better for semi-supervised semantic segmentation. arXiv preprint arXiv:2106.05095 (2021)

  63. Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6023–6032 (2019)

  64. Zhang, Y., et al.: Polarnet: an improved grid representation for online lidar point clouds semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9601–9610 (2020)

  65. Zhao, Y., Bai, L., Huang, X.: Fidnet: lidar point cloud semantic segmentation with fully interpolation decoding. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4453–4458. IEEE (2021)

  66. Zhu, X., et al.: Cylindrical and asymmetrical 3d convolution networks for lidar segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9939–9948 (2021)

  67. Zhuang, Z., Li, R., Jia, K., Wang, Q., Li, Y., Tan, M.: Perception-aware multi-sensor fusion for 3d lidar semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 16280–16290 (2021)

  68. Zou, Y., Yu, Z., Kumar, B., Wang, J.: Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 289–305 (2018)

  69. Zou, Y., et al.: Pseudoseg: designing pseudo labels for semantic segmentation. arXiv preprint arXiv:2010.09713 (2020)

Acknowledgements

The project is supported by the Australian Research Council (ARC) through grant FT190100525.

Author information

Corresponding author: Yuyuan Liu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 8589 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Y., Chen, Y., Wang, H., Belagiannis, V., Reid, I., Carneiro, G. (2025). ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15059. Springer, Cham. https://doi.org/10.1007/978-3-031-73232-4_5

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-73232-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73231-7

  • Online ISBN: 978-3-031-73232-4

  • eBook Packages: Computer Science, Computer Science (R0)
