A Postprocessing Method Based on Regions and Boundaries Using Convolutional Neural Networks and a New Dataset for Building Extraction
Figure 1. Illustration of satellite images and their ground truth. (a) Images from Satellite dataset II of the WHU Building Dataset [28]; (b) Ground truth boundaries; (c) Ground truth regions.
Figure 2. Geospatial distribution of the XIHU building dataset.
Figure 3. Examples of annotated images from the training set of the XIHU building dataset. (a) From top to bottom, images show low-rise residential buildings, rural residential buildings, high-rise residential buildings, and industrial buildings, respectively; (b) Ground truth boundaries; (c) Ground truth regions.
Figure 4. Examples of annotated images from the training set of the WHU building dataset (East Asia). (a) Original images; (b) Ground truth regions.
Figure 5. Overview of the proposed building detection framework.
Figure 6. Examples of the building regions and boundaries predicted by the trained models in the network training stage. (a) An example test satellite image; (b) Building regions predicted by U-Net; (c) Building boundaries predicted by BDCN.
Figure 7. Workflow of the building extraction stage. The postprocessing method is shown in the dashed box.
Figure 8. Examples of the outputs of deep edge-detection methods. (a) An example satellite image; (b) Results of HED; (c) Results of BDCN; (d) Results of DexiNed.
Figure 9. The kernel used for the morphological operation.
Figure 10. F1 versus threshold α on the XIHU training set. (a) Combinations of building regions from U-Net with building boundaries from HED, BDCN and DexiNed, respectively; (b) Combinations of building regions from DeepLab with building boundaries from HED, BDCN and DexiNed, respectively; (c) Combinations of building regions from Mask R-CNN with building boundaries from HED, BDCN and DexiNed, respectively.
Figure 11. F1 versus threshold α on the WHUEA training set. (a) Combinations of building regions from U-Net with building boundaries from HED, BDCN and DexiNed, respectively; (b) Combinations of building regions from DeepLab with building boundaries from HED, BDCN and DexiNed, respectively; (c) Combinations of building regions from Mask R-CNN with building boundaries from HED, BDCN and DexiNed, respectively.
Figure 12. Examples of building extraction results on the XIHU building dataset. (a) Original satellite images; (b) Ground truth; (c) Results of U-Net; (d) Results of DeepLab; (e) Results of Mask R-CNN; (f) Our results using the combination of DeepLab and BDCN (α = 110).
Figure 13. Examples of one-pixel-wide building boundaries by BDCN (α = 110). Results (a–d) are computed from the images in Columns 1–4 of Figure 12a.
Figure 14. Examples of building extraction results on the WHUEA building dataset. (a) Original satellite images; (b) Ground truth; (c) Results of U-Net; (d) Results of DeepLab; (e) Results of Mask R-CNN; (f) Our results using the combination of Mask R-CNN and DexiNed (α = 200).
Figure 15. Examples of one-pixel-wide building boundaries by DexiNed (α = 200). Results (a–d) are computed from the images in Columns 1–4 of Figure 14a.
Figure 16. Confusion matrices for the XIHU dataset. (a) Confusion matrix for DeepLab; (b) Confusion matrix for our method combining the results of DeepLab and BDCN (α = 110).
Figure 17. Confusion matrices for the WHUEA dataset. (a) Confusion matrix for Mask R-CNN; (b) Confusion matrix for our method combining the results of Mask R-CNN and DexiNed (α = 200).
Figure 18. Examples of results on the XIHU building dataset from edge-detection models. (a) Original satellite images; (b) HED (α = 250); (c) HED (α = 160); (d) HED (α = 120); (e) BDCN (α = 250); (f) BDCN (α = 110); (g) BDCN (α = 20); (h) DexiNed (α = 250); (i) DexiNed (α = 180); (j) DexiNed (α = 10).
Figure 19. Examples of results on the WHUEA building dataset from edge-detection models. (a) Original satellite images; (b) HED (α = 250); (c) HED (α = 180); (d) HED (α = 160); (e) BDCN (α = 250); (f) BDCN (α = 130); (g) BDCN (α = 120); (h) DexiNed (α = 240); (i) DexiNed (α = 220); (j) DexiNed (α = 200).
Figure 20. F1 versus threshold α on the XIHU training set for the combinations of building regions from PMNet mask with building boundaries from HED, BDCN, DexiNed and PMNet contour, respectively.
Figure 21. F1 versus threshold α on the WHUEA training set for the combinations of building regions from PMNet mask with building boundaries from HED, BDCN, DexiNed and PMNet contour, respectively.
Abstract
1. Introduction
- We develop and release a building dataset that contains 742 training image tiles and 197 testing image tiles, each of 512 × 512 pixels, cropped from 0.5 m satellite images covering several sites in Hangzhou, Zhejiang Province, China.
- We propose a new postprocessing method for building extraction based on DCNNs. The method combines the predicted building regions and boundaries and improves the extraction of complete rooftops.
2. Materials and Methods
2.1. Datasets
2.2. Methods
2.2.1. Network Training
- Training with region labels
- Training with building boundaries
- Training with building regions and boundaries
2.2.2. Building Extraction
- Obtain the candidate building regions and boundaries of the test images as the input;
- Reduce the candidate building boundaries to one-pixel-wide edge maps;
- Combine the building regions with the one-pixel-wide edge maps;
- Fill holes in the combination map of the building regions and boundaries;
- Apply a morphological operation to the hole-filling results;
- Remove building regions whose areas are smaller than the area threshold;
- Generate the final building regions as the output (a minimal sketch of these steps is given below).
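The steps above can be prototyped with standard image-processing primitives. The following is only a minimal sketch, not the authors' released implementation (see Appendix A): the function name postprocess_buildings, the 3 × 3 closing kernel, and the area threshold min_area are illustrative assumptions, and the boundary maps are assumed to be 8-bit edge probability maps thresholded at α.

```python
# Minimal sketch of the postprocessing pipeline described above.
# Assumptions: `region_mask` is a binary building-region prediction (H x W, {0, 1})
# and `edge_prob` is an 8-bit edge map (H x W, 0-255) from HED/BDCN/DexiNed.
# The names, kernel and area threshold are illustrative, not the authors' values.
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize, binary_closing, remove_small_objects

def postprocess_buildings(region_mask, edge_prob, alpha=110, min_area=100):
    # 1. Threshold the candidate boundaries at alpha and thin them to one-pixel-wide edges.
    edges_1px = skeletonize(edge_prob >= alpha)

    # 2. Combine the predicted building regions with the one-pixel-wide edge map.
    combined = np.logical_or(region_mask.astype(bool), edges_1px)

    # 3. Fill holes enclosed by the combined regions and boundaries.
    filled = ndimage.binary_fill_holes(combined)

    # 4. Morphological operation (here an illustrative 3x3 closing; the paper's
    #    kernel is shown in Figure 9) to smooth the hole-filling results.
    smoothed = binary_closing(filled, np.ones((3, 3), dtype=bool))

    # 5. Remove building regions smaller than the area threshold.
    cleaned = remove_small_objects(smoothed, min_size=min_area)

    # 6. Output the final building regions.
    return cleaned.astype(np.uint8)
```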
2.3. Threshold Optimization
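The threshold α is chosen by sweeping the 8-bit range and keeping the value that maximizes F1 on the training set (cf. Figures 10, 11, 20 and 21). Below is a minimal sketch of such a grid search; it reuses the illustrative postprocess_buildings helper from the previous sketch, and the pixel-wise F1 and the step size of 10 are likewise assumptions rather than the authors' settings.

```python
# Minimal sketch of the threshold search over alpha on the training set.
import numpy as np

def f1_score_pixelwise(pred, gt):
    # Pixel-wise F1 from true positives, false positives and false negatives.
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def optimize_alpha(region_masks, edge_maps, ground_truths, alphas=range(0, 256, 10)):
    # Sweep alpha and keep the value with the best mean F1 over the training tiles.
    best_alpha, best_f1 = None, -1.0
    for alpha in alphas:
        scores = [
            f1_score_pixelwise(postprocess_buildings(region, edges, alpha=alpha), gt)
            for region, edges, gt in zip(region_masks, edge_maps, ground_truths)
        ]
        mean_f1 = float(np.mean(scores))
        if mean_f1 > best_f1:
            best_alpha, best_f1 = alpha, mean_f1
    return best_alpha, best_f1
```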
2.4. Implementation Details
2.5. Accuracy Assessment
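The results tables report precision (P), recall (R), F1 and intersection over union (IoU). Assuming the standard pixel-wise definitions in terms of true positives (TP), false positives (FP) and false negatives (FN), these are:

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F1 = \frac{2PR}{P + R}, \qquad
IoU = \frac{TP}{TP + FP + FN}
\]
```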
3. Results
3.1. XIHU Building Dataset
3.2. WHUEA Building Dataset
4. Discussion
4.1. Comparison with the Networks That Predict Building Regions
4.1.1. Qualitative Analysis
4.1.2. Quantitative Analysis
4.2. Comparison with Deep Edge-Detection Networks
4.3. Comparison with PMNet
4.4. Perspectives
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
- Our postprocessing method: https://github.com/yanghplab/building-dataset (accessed on 31 December 2021)
- U-Net: https://github.com/JavisPeng/u_net_liver (accessed on 31 December 2021)
- Mask R-CNN: https://github.com/jinfagang/FruitsNutsSeg (accessed on 31 December 2021)
- PMNet: https://github.com/tiruss/PMNet (accessed on 31 December 2021)
- DeepLabv3+: https://github.com/VainF/DeepLabV3Plus-Pytorch (accessed on 31 December 2021)
- HED: https://github.com/Duchen521/HED-document-detection (accessed on 31 December 2021)
- BDCN: https://github.com/pkuCactus/BDCN (accessed on 31 December 2021)
- DexiNed: https://github.com/xavysp/DexiNed (accessed on 31 December 2021)
References
- Park, Y.; Guldmann, J.-M.; Liu, D. Impacts of tree and building shades on the urban heat island: Combining remote sensing, 3D digital city and spatial regression approaches. Comput. Environ. Urban Syst. 2021, 88, 101655.
- Adriano, B.; Yokoya, N.; Xia, J.; Miura, H.; Liu, W.; Matsuoka, M.; Koshimura, S. Learning from multimodal and multitemporal earth observation data for building damage mapping. ISPRS J. Photogramm. Remote Sens. 2021, 175, 132–143.
- Zhang, X.; Du, S.; Du, S.; Liu, B. How do land-use patterns influence residential environment quality? A multiscale geographic survey in Beijing. Remote Sens. Environ. 2020, 249, 112014.
- Guo, M.; Liu, H.; Xu, Y.; Huang, Y. Building Extraction Based on U-Net with an Attention Block and Multiple Losses. Remote Sens. 2020, 12, 1400.
- Liu, Y.; Chen, D.; Ma, A.; Zhong, Y.; Fang, F.; Xu, K. Multiscale U-Shaped CNN Building Instance Extraction Framework With Edge Constraint for High-Spatial-Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6106–6120.
- Xia, L.; Zhang, X.; Zhang, J.; Yang, H.; Chen, T. Building Extraction from Very-High-Resolution Remote Sensing Images Using Semi-Supervised Semantic Edge Detection. Remote Sens. 2021, 13, 2187.
- Liow, Y.-T.; Pavlidis, T. Use of shadows for extracting buildings in aerial images. Comput. Vis. Graph. Image Process. 1990, 49, 242–277.
- Liasis, G.; Stavrou, S. Building extraction in satellite images using active contours and colour features. Int. J. Remote Sens. 2016, 37, 1127–1153.
- Zhang, Q.; Huang, X.; Zhang, G. A Morphological Building Detection Framework for High-Resolution Optical Imagery over Urban Areas. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1388–1392.
- Ok, A.O. Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts. ISPRS J. Photogramm. Remote Sens. 2013, 86, 21–40.
- Turker, M.; Koc-San, D. Building extraction from high-resolution optical spaceborne images using the integration of support vector machine (SVM) classification, Hough transformation and perceptual grouping. Int. J. Appl. Earth Obs. Geoinf. 2015, 34, 58–69.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Lv, X.; Ming, D.; Chen, Y.; Wang, M. Very high resolution remote sensing image classification with SEEDS-CNN and scale effect analysis for superpixel CNN classification. Int. J. Remote Sens. 2019, 40, 506–531.
- Vakalopoulou, M.; Karantzalos, K.; Komodakis, N.; Paragios, N. Building detection in very high resolution multispectral data with deep learning features. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 1873–1876.
- Volpi, M.; Tuia, D. Dense Semantic Labeling of Subdecimeter Resolution Images with Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 881–893.
- Yang, H.; Yu, B.; Luo, J.; Chen, F. Semantic segmentation of high spatial resolution images with deep neural networks. GIScience Remote Sens. 2019, 56, 749–768.
- Zhao, K.; Kang, J.; Jung, J.; Sohn, G. Building Extraction from Satellite Images Using Mask R-CNN with Building Boundary Regularization. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 242–2424.
- Alshehhi, R.; Marpu, P.R.; Woon, W.L.; Mura, M.D. Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2017, 130, 139–149.
- Waldner, F.; Diakogiannis, F.I. Deep learning on edge: Extracting field boundaries from satellite images with a convolutional neural network. Remote Sens. Environ. 2020, 245, 111741.
- Mnih, V. Machine Learning for Aerial Image Labeling. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2013.
- Zhu, Y.; Liang, Z.; Yan, J.; Chen, G.; Wang, X. E-D-Net: Automatic Building Extraction From High-Resolution Aerial Images With Boundary Information. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4595–4606.
- Li, Z.; Zhang, X.; Xiao, P.; Zheng, Z. On the Effectiveness of Weakly Supervised Semantic Segmentation for Building Extraction From High-Resolution Remote Sensing Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3266–3281.
- Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Can semantic labeling methods generalize to any city? The Inria aerial image labeling benchmark. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Honolulu, HI, USA, 23–28 July 2017; pp. 3226–3229.
- Ji, S.; Wei, S.; Lu, M. Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set. IEEE Trans. Geosci. Remote Sens. 2019, 57, 574–586.
- Chen, Z.; Li, D.; Fan, W.; Guan, H.; Wang, C.; Li, J. Self-Attention in Reconstruction Bias U-Net for Semantic Segmentation of Building Rooftops in Optical Remote Sensing Images. Remote Sens. 2021, 13, 2524.
- Cai, J.; Chen, Y. MHA-Net: Multipath Hybrid Attention Network for Building Footprint Extraction From High-Resolution Remote Sensing Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5807–5817.
- Zhu, Q.; Liao, C.; Hu, H.; Mei, X.; Li, H. MAP-Net: Multiple Attending Path Neural Network for Building Footprint Extraction From Remote Sensed Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6169–6181.
- He, N.; Fang, L.; Plaza, A. Hybrid first and second order attention Unet for building segmentation in remote sensing images. Sci. China Inf. Sci. 2020, 63, 140305.
- Mnih, V.; Hinton, G.E. Learning to Detect Roads in High-Resolution Aerial Images. In Proceedings of the Computer Vision—ECCV 2010, Heraklion, Greece, 5–11 September 2010; pp. 210–223.
- Saito, S.; Yamashita, T.; Aoki, Y. Multiple Object Extraction from Aerial Imagery with Convolutional Neural Networks. J. Imaging Sci. Technol. 2016, 60, 0104021–0104029.
- Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 645–657.
- Yuan, J. Learning Building Extraction in Aerial Scenes with Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2793–2798.
- Yang, H.L.; Yuan, J.; Lunga, D.; Laverdiere, M.; Rose, A.; Bhaduri, B. Building Extraction at Scale Using Convolutional Neural Network: Mapping of the United States. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2600–2614.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Yang, G.; Zhang, Q.; Zhang, G. EANet: Edge-Aware Network for the Extraction of Buildings from Aerial Images. Remote Sens. 2020, 12, 2161.
- Cheng, D.; Liao, R.; Fidler, S.; Urtasun, R. DARNet: Deep Active Ray Network for Building Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7423–7431.
- Li, Z.; Wegner, J.D.; Lucchi, A. Topological Map Extraction from Overhead Images. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 1715–1724.
- Lu, T.; Ming, D.; Lin, X.; Hong, Z.; Bai, X.; Fang, J. Detecting Building Edges from High Spatial Resolution Remote Sensing Imagery Using Richer Convolution Features Network. Remote Sens. 2018, 10, 1496.
- Kass, M.; Witkin, A.; Terzopoulos, D. Snakes: Active contour models. Int. J. Comput. Vis. 1988, 1, 321–331.
- Zhang, L.; Bai, M.; Liao, R.; Urtasun, R.; Marcos, D.; Tuia, D.; Kellenberger, B. Learning Deep Structured Active Contours End-to-End. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8877–8885.
- Zhao, W.; Persello, C.; Stein, A. Building outline delineation: From aerial images to polygons with an improved end-to-end learning framework. ISPRS J. Photogramm. Remote Sens. 2021, 175, 119–131.
- Xie, S.; Tu, Z. Holistically-Nested Edge Detection. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1395–1403.
- Liu, Y.; Cheng, M.; Hu, X.; Bian, J.; Zhang, L.; Bai, X.; Tang, J. Richer Convolutional Features for Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 1939–1946.
- Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 192–1924.
- Soria, X.; Riba, E.; Sappa, A. Dense Extreme Inception Network: Towards a Robust CNN Model for Edge Detection. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 1912–1921.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. In Proceedings of the ICLR 2015, San Diego, CA, USA, 7–9 May 2015.
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 833–851.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems—Volume 1, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
- He, J.; Zhang, S.; Yang, M.; Shan, Y.; Huang, T. Bi-Directional Cascade Network for Perceptual Edge Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3823–3832.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Lee, C.-Y.; Xie, S.; Gallagher, P.; Zhang, Z.; Tu, Z. Deeply-Supervised Nets. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, San Diego, CA, USA, 9–12 May 2015; pp. 562–570.
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
- Kang, D.; Park, S.; Paik, J. Coarse to Fine: Progressive and Multi-Task Learning for Salient Object Detection. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 1491–1498.
- Zhang, T.Y.; Suen, C.Y. A fast parallel algorithm for thinning digital patterns. Commun. ACM 1984, 27, 236–239.
Dataset | Training Tiles | Testing Tiles |
---|---|---|
XIHU Building | 742 | 197 |
WHU Building (East Asia) | 3135 | 903 |
Methods | P (%) | R (%) | F1 (%) | IoU (%) |
---|---|---|---|---|
U-Net [39] | 89.46 | 73.55 | 80.73 | 67.68 |
U-Net [39] + HED [47] (α = 250) + our method | 89.36 | 73.69 | 80.77 | 67.74 |
HED [47] (α = 250) | 58.71 | 1.45 | 2.83 | 1.43 |
U-Net [39] + BDCN [55] (α = 250) + our method | 89.39 | 73.63 | 80.75 | 67.71 |
BDCN [55] (α = 250) | 64.96 | 0.44 | 0.88 | 0.44 |
U-Net [39] + DexiNed [50] (α = 250) + our method | 89.34 | 73.77 | 80.81 | 67.80 |
DexiNed [50] (α = 250) | 57.85 | 2.44 | 4.69 | 2.40 |
Methods | P (%) | R (%) | F1 (%) | IoU (%) |
---|---|---|---|---|
DeepLab [51] | 86.76 | 78.39 | 82.36 | 70.01 |
DeepLab [51] + HED [47] (α = 160) + our method | 84.38 | 80.88 | 82.59 | 70.35 |
HED [47] (α = 160) | 80.52 | 42.47 | 55.61 | 38.51 |
DeepLab [51] + BDCN [55] (α = 110) + our method | 85.61 | 80.81 | 83.14 | 71.14 |
BDCN [55] (α = 110) | 84.80 | 46.23 | 59.84 | 42.69 |
DeepLab [51] + DexiNed [50] (α = 180) + our method | 86.14 | 79.95 | 82.93 | 70.84 |
DexiNed [50] (α = 180) | 85.47 | 27.51 | 41.62 | 26.28 |
Methods | P (%) | R (%) | F1 (%) | IoU (%) |
---|---|---|---|---|
Mask R-CNN [15] | 89.82 | 70.38 | 78.92 | 65.18 |
Mask R-CNN [15] + HED [47] (α = 120) + our method | 85.00 | 76.20 | 80.36 | 67.17 |
HED [47] (α = 120) | 78.87 | 54.11 | 64.19 | 47.26 |
Mask R-CNN [15] + BDCN [55] (α = 20) + our method | 85.01 | 77.66 | 81.17 | 68.31 |
BDCN [55] (α = 20) | 79.47 | 68.74 | 73.72 | 58.38 |
Mask R-CNN [15] + DexiNed [50] (α = 10) + our method | 87.76 | 75.46 | 81.15 | 68.27 |
DexiNed [50] (α = 10) | 85.06 | 52.00 | 64.55 | 47.65 |
Methods | P (%) | R (%) | F1 (%) | IoU (%) |
---|---|---|---|---|
U-Net [39] | 85.87 | 73.95 | 79.47 | 65.93 |
U-Net [39] + HED [47] (α = 250) + our method | 85.74 | 74.22 | 79.56 | 66.06 |
HED [47] (α = 250) | 59.48 | 2.95 | 5.62 | 2.89 |
U-Net [39] + BDCN [55] (α = 250) + our method | 85.89 | 73.97 | 79.48 | 65.95 |
BDCN [55] (α = 250) | 60.56 | 0.12 | 0.24 | 0.12 |
U-Net [39] + DexiNed [50] (α = 240) + our method | 85.36 | 76.51 | 80.69 | 67.64 |
DexiNed [50] (α = 240) | 86.38 | 38.52 | 53.28 | 36.32 |
Methods | P (%) | R (%) | F1 (%) | IoU (%) |
---|---|---|---|---|
DeepLab [51] | 81.57 | 78.54 | 80.03 | 66.70 |
DeepLab [51] + HED [47] (α = 180) + our method | 81.02 | 79.64 | 80.33 | 67.12 |
HED [47] (α = 180) | 78.65 | 19.51 | 31.26 | 18.53 |
DeepLab [51] + BDCN [55] (α = 130) + our method | 79.81 | 82.43 | 81.10 | 68.21 |
BDCN [55] (α = 130) | 82.81 | 66.03 | 73.48 | 58.07 |
DeepLab [51] + DexiNed [50] (α = 220) + our method | 80.86 | 81.53 | 81.19 | 68.34 |
DexiNed [50] (α = 220) | 86.75 | 52.97 | 65.78 | 49.01 |
Methods | P (%) | R (%) | F1 (%) | IoU (%) |
---|---|---|---|---|
Mask R-CNN [15] | 84.43 | 77.48 | 80.81 | 67.79 |
Mask R-CNN [15] + HED [47] (α = 160) + our method | 83.55 | 78.62 | 81.01 | 68.08 |
HED [47] (α = 160) | 79.93 | 23.35 | 36.14 | 22.05 |
Mask R-CNN [15] + BDCN [55] (α = 120) + our method | 82.12 | 80.87 | 81.49 | 68.76 |
BDCN [55] (α = 120) | 82.09 | 69.13 | 75.06 | 60.07 |
Mask R-CNN [15] + DexiNed [50] (α = 200) + our method | 83.39 | 80.46 | 81.90 | 69.34 |
DexiNed [50] (α = 200) | 86.70 | 57.83 | 69.38 | 53.12 |
Methods | P (%) | R (%) | F1 (%) | IoU (%) |
---|---|---|---|---|
PMNet mask [59] | 88.85 | 62.71 | 73.53 | 58.13 |
PMNet mask [59] + HED [47] (α = 180) + our method | 86.86 | 68.63 | 76.68 | 62.18 |
HED [47] (α = 180) | 80.24 | 33.96 | 47.73 | 31.34 |
PMNet mask [59] + BDCN [55] (α = 140) + our method | 87.97 | 68.78 | 77.20 | 62.87 |
BDCN [55] (α = 140) | 85.25 | 37.41 | 52.00 | 35.14 |
PMNet mask [59] + DexiNed [50] (α = 200) + our method | 88.42 | 66.39 | 75.84 | 61.08 |
DexiNed [50] (α = 200) | 85.01 | 24.61 | 38.17 | 23.59 |
PMNet mask [59] + PMNet contour [59] (α = 100) + our method | 88.40 | 63.87 | 74.16 | 58.94 |
PMNet contour [59] (α = 100) | 67.58 | 6.93 | 12.57 | 6.70 |
Methods | P (%) | R (%) | F1 (%) | IoU (%) |
---|---|---|---|---|
PMNet mask [59] | 89.09 | 66.37 | 76.07 | 61.38 |
PMNet mask [59] + HED [47] (α = 190) + our method | 88.21 | 68.48 | 77.10 | 62.74 |
HED [47] (α = 190) | 77.88 | 17.55 | 28.64 | 16.72 |
PMNet mask [59] + BDCN [55] (α = 210) + our method | 88.43 | 68.82 | 77.40 | 63.13 |
BDCN [55] (α = 210) | 83.00 | 23.55 | 36.69 | 22.46 |
PMNet mask [59] + DexiNed [50] (α = 230) + our method | 87.68 | 72.06 | 79.11 | 65.44 |
DexiNed [50] (α = 230) | 86.74 | 48.52 | 62.23 | 45.17 |
PMNet mask [59] + PMNet contour [59] (α = 120) + our method | 88.88 | 67.06 | 76.45 | 61.87 |
PMNet contour [59] (α = 120) | 70.88 | 5.85 | 10.81 | 5.71 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).