Abstract
In this work, we present a new paradigm, called 4D-StOP, to tackle the task of 4D Panoptic LiDAR Segmentation. 4D-StOP first generates spatio-temporal proposals using voting-based center predictions, where each point in the 4D volume votes for a corresponding center. These tracklet proposals are then aggregated using learned geometric features. The tracklet aggregation method effectively generates a video-level 4D scene representation over the entire space-time volume. This is in contrast to existing end-to-end trainable state-of-the-art approaches, which model spatio-temporal embeddings with Gaussian probability distributions. Our voting-based tracklet generation followed by geometric-feature-based aggregation yields significantly better panoptic LiDAR segmentation quality than modeling the entire 4D volume with Gaussian probability distributions. 4D-StOP sets a new state of the art on the SemanticKITTI test set with an LSTQ score of 63.9, a large (+7%) improvement over the current best-performing end-to-end trainable methods. The code and pre-trained models are available at: https://github.com/LarsKreuzberg/4D-StOP.
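As a rough illustration of the voting step described in the abstract, here is a minimal sketch, not the authors' implementation: each 4D point is shifted by a network-predicted offset toward its instance center, and nearby votes are greedily grouped into tracklet proposals. The function name, the `radius` parameter, and the simple radius-based grouping are all illustrative assumptions; in the paper, the resulting proposals are additionally aggregated using learned geometric features, which this sketch omits.

```python
import numpy as np

def generate_tracklet_proposals(points, predicted_offsets, radius=0.6):
    """Sketch: group per-point center votes into tracklet proposals.

    points:            (N, 4) array of (x, y, z, t) coordinates in the 4D volume.
    predicted_offsets: (N, 4) network-predicted offsets from each point to its
                       spatio-temporal instance center.
    radius:            grouping radius for the votes (hypothetical value).
    """
    votes = points + predicted_offsets  # each point votes for a center

    unassigned = np.ones(len(votes), dtype=bool)
    proposals = []
    while unassigned.any():
        # Take the first still-unassigned vote as a seed ...
        seed = votes[np.flatnonzero(unassigned)[0]]
        # ... and group all votes within `radius` of it into one proposal.
        members = unassigned & (np.linalg.norm(votes - seed, axis=1) < radius)
        proposals.append(np.flatnonzero(members))
        unassigned &= ~members
    return proposals  # list of point-index arrays, one per tracklet proposal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(100, 4))
    offsets = -pts  # toy offsets: every point votes for the origin
    print(len(generate_tracklet_proposals(pts, offsets)))  # -> 1 proposal
```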
Acknowledgments
We thank Sima Yagmur Zulfikar for her help and feedback on the figures, and István Sárándi for his helpful comments on our manuscript. This project was funded by ERC Consolidator Grant DeeVise (ERC-2017-COG-773161). The computing resources for several experiments were granted by RWTH Aachen University under project 'supp0003'. Francis Engelmann is a post-doctoral research fellow at the ETH AI Center. This work is part of the first author's master thesis.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kreuzberg, L., Zulfikar, I.E., Mahadevan, S., Engelmann, F., Leibe, B. (2023). 4D-StOP: Panoptic Segmentation of 4D LiDAR Using Spatio-Temporal Object Proposal Generation and Aggregation. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13801. Springer, Cham. https://doi.org/10.1007/978-3-031-25056-9_34
DOI: https://doi.org/10.1007/978-3-031-25056-9_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25055-2
Online ISBN: 978-3-031-25056-9