Article
DOI: 10.1007/978-3-031-73347-5_27

Equivariant Spatio-temporal Self-supervision for LiDAR Object Detection

Published: 29 October 2024

Abstract

Popular representation learning methods encourage feature invariance under transformations applied at the input. However, in 3D perception tasks such as object localization and segmentation, outputs are naturally equivariant to some transformations, such as rotation. Pre-training loss functions that encourage feature equivariance under certain transformations provide a strong self-supervision signal while also retaining information about the geometric relationships between transformed feature representations. This can improve performance in downstream tasks that are equivariant to such transformations. In this paper, we propose a spatio-temporal equivariant learning framework that considers spatial and temporal augmentations jointly. Our experiments show that the best performance arises with a pre-training approach that encourages equivariance to translation, scaling, flip, rotation, and scene flow. For spatial augmentations, we find that, depending on the transformation, either a contrastive objective or an equivariance-by-classification objective yields the best results. To leverage real-world object deformations and motion, we consider sequential LiDAR scene pairs and develop a novel 3D scene flow-based equivariance objective that leads to improved performance overall. We show that our pre-training method for 3D object detection outperforms existing equivariant and invariant approaches in many settings.



Published In

Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XXVI
September 2024, 578 pages
ISBN: 978-3-031-73346-8
DOI: 10.1007/978-3-031-73347-5
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

1. LiDAR
2. 3D object detection
3. Self-supervised learning

