Abstract
With the proliferation of remote sensing images, segmenting buildings accurately in them has become a critical challenge. Two problems stand out. First, most networks have poor recognition ability on high-resolution images, which results in blurred boundaries in the segmented building maps. Second, the similarity between buildings and background results in intra-class inconsistency. To address these two problems, we propose a UNet-based network named Context-Transfer-UNet (CT-UNet). Specifically, we design a Dense Boundary Block: the Dense Block uses a feature-reuse mechanism to refine features and improve recognition capability, and the Boundary Block introduces low-level spatial information to resolve the blurred-boundary problem. Then, to handle intra-class inconsistency, we construct a Spatial Channel Attention Block, which combines spatial context information and selects more distinguishable features along the spatial and channel dimensions. Finally, we propose an improved loss function that adds an evaluation indicator so that the loss targets the evaluation objective more directly. With the proposed CT-UNet, we achieve 85.33% mean IoU on the Inria dataset, 91.00% mean IoU on the WHU dataset and 83.92% F1-score on the Massachusetts dataset. The results outperform our baseline (U-Net ResNet-34) by 3.76%, exceed Web-Net by 2.24% and surpass HFSA-Unet by 2.17%.
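The abstract does not give the form of the improved loss or of the evaluation indicator it adds. As an illustration only, the sketch below assumes the indicator is IoU and combines binary cross-entropy with a soft IoU term; the function names and the `weight` parameter are hypothetical and are not taken from the paper. It also shows how IoU and F1-score for the building class can be computed from binary masks.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean BCE over flattened pixel probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def soft_iou(probs, labels, eps=1e-7):
    """Soft IoU: products and sums stand in for set intersection and union."""
    inter = sum(p * y for p, y in zip(probs, labels))
    union = sum(probs) + sum(labels) - inter
    return (inter + eps) / (union + eps)

def combined_loss(probs, labels, weight=1.0):
    """Assumed form: BCE plus weight * (1 - soft IoU). `weight` is hypothetical."""
    return binary_cross_entropy(probs, labels) + weight * (1.0 - soft_iou(probs, labels))

def iou_and_f1(pred_mask, true_mask):
    """Hard IoU and F1-score for the building class from flattened 0/1 masks."""
    pairs = list(zip(pred_mask, true_mask))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    if tp + fp + fn == 0:  # no building pixels predicted or present
        return 1.0, 1.0
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)
```

For example, `iou_and_f1([1, 1, 0, 0], [1, 0, 1, 0])` counts one true positive, one false positive and one false negative, giving an IoU of 1/3 and an F1-score of 0.5. A mean IoU as reported in the abstract would average such per-class values over the building and background classes.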
References
Adiba A, Hajji H, Maatouk M (2019) Transfer learning and u-net for buildings segmentation. In: Proceedings of the new challenges in data sciences: acts of the second conference of the Moroccan classification society, ACM, p 14
Aptoula E (2013) Remote sensing image retrieval with global morphological texture descriptors. IEEE Trans Geosci Remote Sens 52(5):3023–3034
Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: A deep convolutional encoder–decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. https://arxiv.org/abs/1409.0473
Bischke B, Helber P, Folz J, Borth D, Dengel A (2019) Multi-task learning for segmentation of building footprints with deep neural networks. In: 2019 IEEE international conference on image processing (ICIP), IEEE, pp 1480–1484
Fourure D, Emonet R, Fromont E, Muselet D, Tremeau A, Wolf C (2017) Residual conv-deconv grid network for semantic segmentation. https://arxiv.org/abs/1707.07958
Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 315–323
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He N, Fang L, Plaza A (2020) Hybrid first and second order attention unet for building segmentation in remote sensing images. Inf Sci 63(140305):1–140305
Hu S, Ning Q, Chen B, Lei Y, Zhou X, Yan H, Zhao C, Tang T, Hu R (2020) Segmentation of aerial image with multi-scale feature and attention model. In: Artificial Intelligence in China, Springer, pp 58–66
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
Ji S, Wei S, Lu M (2018) Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans Geosci Remote Sens 57(1):574–586
Khalel A, El-Saban M (2018) Automatic pixelwise object labeling for aerial imagery using stacked u-nets. https://arxiv.org/abs/1803.04953
Kim JH, Lee H, Hong SJ, Kim S, Park J, Hwang JY, Choi JP (2018) Objects segmentation from high-resolution aerial images using u-net with pyramid pooling layers. IEEE Geosci Remote Sens Lett 16(1):115–119
Liu Y, Gross L, Li Z, Li X, Fan X, Qi W (2019) Automatic building extraction on high-resolution remote sensing imagery using deep convolutional encoder-decoder with spatial pyramid pooling. IEEE Access 7:128774–128786
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Luong MT, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. https://arxiv.org/abs/1508.04025
Maggiori E, Tarabalka Y, Charpiat G, Alliez P (2017) Can semantic labeling methods generalize to any city? the inria aerial image labeling benchmark. In: 2017 IEEE international geoscience and remote sensing symposium (IGARSS), IEEE, pp 3226–3229
Mitra P, Shankar BU, Pal SK (2004) Segmentation of multispectral remote sensing images using active support vector machines. Pattern Recogn Lett 25(9):1067–1074
Mnih V (2013) Machine learning for aerial image labeling. Citeseer
Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B, et al. (2018) Attention u-net: Learning where to look for the pancreas. https://arxiv.org/abs/1804.03999
Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens 26(1):217–222
Pan X, Yang F, Gao L, Chen Z, Zhang B, Fan H, Ren J (2019) Building extraction from high-resolution aerial imagery using a generative adversarial network with spatial and channel attention mechanisms. Remote Sens 11(8):917
Qi HN, Yang JG, Zhong YW, Deng C (2004) Multi-class svm based remote sensing image classification and its semi-supervised improvement scheme. In: Proceedings of 2004 international conference on machine learning and cybernetics (IEEE Cat. No. 04EX826), IEEE, vol 5, pp 3146–3151
Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, Springer, pp 234–241
Sebastian C, Imbriaco R, Bondarev E, de With PH (2020) Adversarial loss for semantic segmentation of aerial imagery. https://arxiv.org/abs/2001.04269
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. https://arxiv.org/abs/1409.1556
Singh PP, Garg R (2013) Automatic road extraction from high resolution satellite image using adaptive global thresholding and morphological operations. J Ind Soc Remote Sens 41(3):631–640
Song L, Xu Y, Zhang L, Du B, Zhang Q, Wang X (2020) Learning from synthetic images via active pseudo-labeling. IEEE Transactions on Image Processing
Tuermer S, Kurz F, Reinartz P, Stilla U (2013) Airborne vehicle detection in dense urban areas using hog features and disparity maps. IEEE J Select Top Appl Earth Observ Remote Sens 6(6):2327–2337
Xia J, Du P, He X, Chanussot J (2013) Hyperspectral remote sensing image classification based on rotation forest. IEEE Geosci Remote Sens Lett 11(1):239–243
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: International conference on machine learning, pp 2048–2057
Yi Y, Zhang Z, Zhang W, Zhang C, Li W, Zhao T (2019) Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network. Remote Sens 11(15):1774
Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Learning a discriminative feature network for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1857–1866
Zhang L, Zhang L, Tao D, Huang X (2011) On combining multiple features for hyperspectral remote sensing image classification. IEEE Trans Geosci Remote Sens 50(3):879–893
Zhang Y, Gong W, Sun J, Li W (2019) Web-net: A novel nest networks with ultra-hierarchical sampling for building extraction from aerial imageries. Remote Sens 11(16):1897
Zhang Z, Liu Q, Wang Y (2018) Road extraction by deep residual u-net. IEEE Geosci Remote Sens Lett 15(5):749–753
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881–2890
Zhao H, Zhang Y, Liu S, Shi J, Change Loy C, Lin D, Jia J (2018) Psanet: Point-wise spatial attention network for scene parsing. In: Proceedings of the European conference on computer vision (ECCV), pp 267–283
Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J (2018) Unet++: A nested u-net architecture for medical image segmentation. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, Springer, pp 3–11
Acknowledgements
This work was supported by the National Key R&D Program of China (No. 2018YFB1305200) and the Science Technology Department of Zhejiang Province (No. LGG19F020010). An earlier version of this paper was accepted at the International Conference on Pattern Recognition.
Cite this article
Liu, S., Ye, H., Jin, K. et al. CT-UNet: Context-Transfer-UNet for Building Segmentation in Remote Sensing Images. Neural Process Lett 53, 4257–4277 (2021). https://doi.org/10.1007/s11063-021-10592-w
DOI: https://doi.org/10.1007/s11063-021-10592-w