4D-StOP: Panoptic Segmentation of 4D LiDAR Using Spatio-Temporal Object Proposal Generation and Aggregation

  • Conference paper

Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13801)

Abstract

In this work, we present a new paradigm, called 4D-StOP, to tackle the task of 4D Panoptic LiDAR Segmentation. 4D-StOP first generates spatio-temporal proposals using voting-based center predictions, where each point in the 4D volume votes for a corresponding center. These tracklet proposals are further aggregated using learned geometric features. The tracklet aggregation method effectively generates a video-level 4D scene representation over the entire space-time volume. This is in contrast to existing end-to-end trainable state-of-the-art approaches, which use spatio-temporal embeddings represented by Gaussian probability distributions. Our voting-based tracklet generation followed by geometric feature-based aggregation yields significantly better panoptic LiDAR segmentation quality than modeling the entire 4D volume with Gaussian probability distributions. 4D-StOP achieves a new state of the art on the SemanticKITTI test dataset with a score of 63.9 LSTQ, a large (+7%) improvement over the current best-performing end-to-end trainable methods. The code and pre-trained models are available at: https://github.com/LarsKreuzberg/4D-StOP.
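To make the voting-and-aggregation idea from the abstract concrete, the following is a minimal Python (PyTorch) sketch: a small MLP head predicts a per-point offset vote toward its tracklet center, votes are greedily grouped by radius into proposals, and proposals with cosine-similar mean geometric features are merged. All names, shapes, and thresholds here (VoteHead, cluster_votes, aggregate_proposals, radius=0.6, sim_thresh=0.9) are illustrative assumptions, not the authors' actual 4D-StOP implementation; see the linked repository for that.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VoteHead(nn.Module):
    """Predicts, for every point in the 4D volume, an offset vote
    toward the center of the tracklet the point belongs to."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 3),  # (dx, dy, dz) offset to the center
        )

    def forward(self, xyz, feats):
        # xyz: (N, 3) point coordinates, feats: (N, C) backbone features
        return xyz + self.mlp(feats)  # (N, 3) predicted center votes

def cluster_votes(votes, radius=0.6):
    """Greedy radius-based grouping of center votes into tracklet
    proposals; a simple stand-in for the proposal-generation step."""
    ids = torch.full((votes.shape[0],), -1, dtype=torch.long)
    next_id = 0
    for i in range(votes.shape[0]):
        if ids[i] >= 0:
            continue
        near = (votes - votes[i]).norm(dim=1) < radius
        ids[near & (ids < 0)] = next_id
        next_id += 1
    return ids

def aggregate_proposals(ids, geo_feats, sim_thresh=0.9):
    """Merges proposals whose mean geometric features are cosine-similar,
    approximating the learned feature-based aggregation described above."""
    uniq = ids.unique()
    means = F.normalize(
        torch.stack([geo_feats[ids == u].mean(0) for u in uniq]), dim=1)
    sim = means @ means.t()  # pairwise cosine similarity between proposals
    remap = {int(u): int(u) for u in uniq}
    for a in range(len(uniq)):
        for b in range(a + 1, len(uniq)):
            if sim[a, b] > sim_thresh:
                remap[int(uniq[b])] = remap[int(uniq[a])]
    return torch.tensor([remap[int(i)] for i in ids])

# Toy usage: 200 points (e.g., from two stacked frames) with 32-dim features.
xyz = torch.randn(200, 3)
feats = torch.randn(200, 32)
votes = VoteHead(32)(xyz, feats)
tracklet_ids = aggregate_proposals(cluster_votes(votes), feats)

The actual 4D-StOP pipeline differs in its backbone, grouping procedure, and learned aggregation; this sketch only mirrors the data flow the abstract describes.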


Acknowledgments

We thank Sima Yagmur Zulfikar for her help and feedback on the figures, and István Sárándi for his helpful comments on our manuscript. This project was funded by ERC Consolidator Grant DeeVise (ERC-2017-COG-773161). The computing resources for several experiments were granted by RWTH Aachen University under project ’supp0003’. Francis Engelmann is a post-doctoral research fellow at the ETH AI Center. This work is part of the first author’s master thesis.

Author information

Corresponding author

Correspondence to Idil Esen Zulfikar.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 791 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kreuzberg, L., Zulfikar, I.E., Mahadevan, S., Engelmann, F., Leibe, B. (2023). 4D-StOP: Panoptic Segmentation of 4D LiDAR Using Spatio-Temporal Object Proposal Generation and Aggregation. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13801. Springer, Cham. https://doi.org/10.1007/978-3-031-25056-9_34

  • DOI: https://doi.org/10.1007/978-3-031-25056-9_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25055-2

  • Online ISBN: 978-3-031-25056-9

  • eBook Packages: Computer Science, Computer Science (R0)
