Abstract
Panoramic images can broaden the Field of View (FoV), occlusion-aware prediction can deepen the understanding of a scene, and domain adaptation can transfer knowledge across viewing domains. In this work, we introduce a novel task, Occlusion-Aware Seamless Segmentation (OASS), which tackles all three of these challenges simultaneously. To benchmark OASS, we establish a new human-annotated dataset for Blending Panoramic Amodal Seamless Segmentation, i.e., BlendPASS. In addition, we propose UnmaskFormer, the first solution for OASS, which aims to unmask the narrow FoV, occlusions, and domain gaps all at once. Specifically, UnmaskFormer includes two crucial designs: Unmasking Attention (UA) and Amodal-oriented Mix (AoMix). Our method achieves state-of-the-art performance on the BlendPASS dataset, reaching a remarkable mAPQ of \(26.58\%\) and mIoU of \(43.66\%\). On the public panoramic semantic segmentation datasets SynPASS and DensePASS, our method outperforms previous approaches, obtaining \(45.34\%\) and \(48.08\%\) in mIoU, respectively. The BlendPASS dataset and our source code are available at https://github.com/yihong-97/OASS.
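The mIoU numbers reported above follow the standard definition: per-class intersection over union, averaged over classes. For reference, here is a minimal, self-contained sketch of that computation; the 19-class setup, the label-map shapes, and the `miou` helper name are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int, ignore_index: int = 255) -> float:
    """Mean Intersection-over-Union of integer label maps of identical shape.

    Pixels labeled `ignore_index` in `gt` are excluded from the statistics.
    """
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]
    # Confusion matrix: rows = ground truth, columns = prediction.
    conf = np.bincount(
        gt.astype(np.int64) * num_classes + pred.astype(np.int64),
        minlength=num_classes * num_classes,
    ).reshape(num_classes, num_classes)
    inter = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    ious = inter[union > 0] / union[union > 0]  # skip classes absent from both maps
    return float(ious.mean())

# Illustrative usage on random label maps (resolution and class count are placeholders).
rng = np.random.default_rng(0)
pred = rng.integers(0, 19, size=(400, 2048))
gt = rng.integers(0, 19, size=(400, 2048))
print(f"mIoU: {miou(pred, gt, num_classes=19):.2%}")
```

Excluding an `ignore_index` is the usual convention for unlabeled pixels in urban-scene benchmarks; mAPQ additionally evaluates instance-level amodal masks and is defined in the paper.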
Y. Cao and J. Zhang—Equal contribution.
Acknowledgements
This work was supported in part by the Major Research Plan of the National Natural Science Foundation of China under Grant 92148204, the National Key R&D Program under Grant 2022YFB4701400, the Hunan Leading Talent of Technological Innovation under Grant 2022RC3063, the Top Ten Technical Research Projects of Hunan Province under Grant 2024GK1010, the Key Research and Development Program of Hunan Province under Grant 2022GK2011, and in part by Hangzhou SurImage Technology Company Ltd.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cao, Y. et al. (2025). Occlusion-Aware Seamless Segmentation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15077. Springer, Cham. https://doi.org/10.1007/978-3-031-72655-2_8
DOI: https://doi.org/10.1007/978-3-031-72655-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72654-5
Online ISBN: 978-3-031-72655-2