Abstract
Incremental learning is a crucial task in aerial image processing, especially given the limited availability of large-scale annotated datasets. A major issue with current deep neural architectures is catastrophic forgetting, namely the inability to faithfully retain past knowledge once a new set of data is provided for retraining. Over the years, several techniques have been proposed to mitigate this problem for image classification and object detection. However, only recently has the focus shifted towards more complex downstream tasks such as instance or semantic segmentation. Starting from class-incremental learning for semantic segmentation, our goal is to adapt this strategy to the aerial domain by exploiting a distinctive property that differentiates it from natural images, namely orientation. In addition to the standard knowledge distillation approach, we propose a contrastive regularization in which any given input is compared with its augmented version (i.e., flipped and rotated) in order to minimize the difference between the segmentation features produced by the two inputs. We show the effectiveness of our solution on the Potsdam dataset, outperforming the incremental baseline in every test (code available at: https://github.com/edornd/contrastive-distillation).
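The regularization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`augment`, `consistency_loss`), the mean squared error as the measure of difference, and the choice to apply the same geometric transform to the features so that the two maps are spatially aligned are all assumptions made for illustration. The two toy feature extractors are hypothetical stand-ins for a segmentation network.

```python
import numpy as np


def augment(x, k, flip):
    """Rotate an (H, W, C) array by k * 90 degrees, then optionally mirror it."""
    out = np.rot90(x, k=k, axes=(0, 1))
    return out[:, ::-1] if flip else out


def consistency_loss(features_fn, image, k, flip):
    """Mean squared gap between augmented features and features of the augmented input."""
    # Path 1: extract features first, then apply the same geometric transform
    # so that both feature maps are spatially aligned (an assumption of this sketch).
    features_then_aug = augment(features_fn(image), k, flip)
    # Path 2: transform the input first, then extract features.
    aug_then_features = features_fn(augment(image, k, flip))
    return float(np.mean((features_then_aug - aug_then_features) ** 2))


def pointwise_features(x):
    # Toy per-pixel extractor: it has no notion of direction.
    return x ** 2


def directional_features(x):
    # Toy extractor that subtracts the left neighbour (with wrap-around),
    # so its output depends on the orientation of the input.
    return x - np.roll(x, 1, axis=1)


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 3))  # stand-in for a small aerial tile

loss_pointwise = consistency_loss(pointwise_features, image, k=1, flip=True)
loss_directional = consistency_loss(directional_features, image, k=1, flip=True)
print(loss_pointwise, loss_directional)
```

In this toy setting the per-pixel extractor incurs no penalty, while the orientation-dependent extractor does. A term of this kind, added to the training objective, would push a model towards features that are consistent under flips and rotations.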
Acknowledgements
This work was developed in the context of the Horizon 2020 projects SHELTER (grant agreement n.821282) and SAFERS (grant agreement n.869353).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Arnaudo, E., Cermelli, F., Tavera, A., Rossi, C., Caputo, B. (2022). A Contrastive Distillation Approach for Incremental Semantic Segmentation in Aerial Images. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13232. Springer, Cham. https://doi.org/10.1007/978-3-031-06430-2_62
DOI: https://doi.org/10.1007/978-3-031-06430-2_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06429-6
Online ISBN: 978-3-031-06430-2
eBook Packages: Computer Science, Computer Science (R0)