Occlusion-Aware Seamless Segmentation

  • Conference paper
  • Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Panoramic images broaden the Field of View (FoV), occlusion-aware prediction deepens the understanding of the scene, and domain adaptation enables transfer across viewing domains. In this work, we introduce a novel task, Occlusion-Aware Seamless Segmentation (OASS), which tackles all three challenges simultaneously. For benchmarking OASS, we establish a new human-annotated dataset for Blending Panoramic Amodal Seamless Segmentation, i.e., BlendPASS. Furthermore, we propose UnmaskFormer, the first solution, which aims to unmask the narrow FoV, occlusions, and domain gaps all at once. Specifically, UnmaskFormer incorporates two crucial designs: Unmasking Attention (UA) and Amodal-oriented Mix (AoMix). Our method achieves state-of-the-art performance on the BlendPASS dataset, reaching a remarkable mAPQ of \(26.58\%\) and mIoU of \(43.66\%\). On the public panoramic semantic segmentation datasets SynPASS and DensePASS, our method outperforms previous methods, obtaining \(45.34\%\) and \(48.08\%\) in mIoU, respectively. The BlendPASS dataset and our source code are available at https://github.com/yihong-97/OASS.
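As a point of reference for the reported numbers, mIoU averages the per-class Intersection-over-Union between the predicted and ground-truth label maps. Below is a minimal NumPy sketch of this standard metric, independent of the paper's evaluation code (the official protocol, including the amodal mAPQ metric, is defined in the linked OASS repository); the function and variable names are illustrative.

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean Intersection-over-Union, averaged over classes that occur
    in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:  # class absent from both maps: skip it
            continue
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy usage on two 2x2 label maps with 3 classes.
pred = np.array([[0, 1], [2, 2]])
gt = np.array([[0, 1], [1, 2]])
print(f"mIoU: {mean_iou(pred, gt, num_classes=3):.4f}")  # 0.6667
```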

Y. Cao and J. Zhang contributed equally.



Acknowledgements

This work was supported in part by the Major Research Plan of the National Natural Science Foundation of China under Grant 92148204, the National Key R&D Program under Grant 2022YFB4701400, the Hunan Leading Talent of Technological Innovation under Grant 2022RC3063, the Top Ten Technical Research Projects of Hunan Province under Grant 2024GK1010, the Key Research and Development Program of Hunan Province under Grant 2022GK2011, and in part by Hangzhou SurImage Technology Company Ltd.

Author information

Corresponding authors

Correspondence to Hui Zhang or Kailun Yang.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 3199 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cao, Y. et al. (2025). Occlusion-Aware Seamless Segmentation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15077. Springer, Cham. https://doi.org/10.1007/978-3-031-72655-2_8

  • DOI: https://doi.org/10.1007/978-3-031-72655-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72654-5

  • Online ISBN: 978-3-031-72655-2

  • eBook Packages: Computer Science, Computer Science (R0)
