research-article

Multi-Stage Fusion and Multi-Source Attention Network for Multi-Modal Remote Sensing Image Segmentation

Published: 20 December 2021

Abstract

With the rapid development of sensor technology, large volumes of remote sensing data have been collected. Extracting feature maps from multi-modal remote sensing images can improve semantic segmentation performance, since the extra modalities provide additional information. However, making full use of multi-modal remote sensing data for semantic segmentation remains challenging. To this end, we propose a new network, the Multi-Stage Fusion and Multi-Source Attention Network ((MS)²-Net), for multi-modal remote sensing data segmentation. The multi-stage fusion module filters noise from the multi-modal data to calibrate deviation information and then fuses the complementary information. In addition, the proposed multi-source attention aggregates similar feature points to enhance the discriminability of features from different modalities. The proposed model is evaluated on publicly available multi-modal remote sensing datasets, and the results demonstrate the effectiveness of the proposed method.
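
The abstract does not spell out how either module is computed, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes scaled dot-product similarity for the attention step and a parameter-free sigmoid gate for the fusion step; the function names, the gate, and the toy feature vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_source_attention(queries, sources):
    """Aggregate source feature points for each query point, weighted by similarity.

    queries, sources: lists of equal-length feature vectors, one per spatial
    position. Assumption: similarity is a scaled dot product.
    """
    scale = math.sqrt(len(queries[0]))
    aggregated = []
    for q in queries:
        weights = softmax([dot(q, s) / scale for s in sources])
        aggregated.append(
            [sum(w * s[c] for w, s in zip(weights, sources)) for c in range(len(q))]
        )
    return aggregated


def gated_fusion(primary, auxiliary):
    """Add auxiliary features to primary ones after scaling them by a gate.

    Assumption: the gate is sigmoid(primary * auxiliary) per channel, so
    channels where the two modalities disagree in sign are damped. A real
    network would learn this gate instead.
    """
    fused = []
    for p, a in zip(primary, auxiliary):
        gate = [1.0 / (1.0 + math.exp(-(pc * ac))) for pc, ac in zip(p, a)]
        fused.append([pc + g * ac for pc, g, ac in zip(p, gate, a)])
    return fused


# Toy example: two spatial positions, two channels per modality (made-up values).
optical = [[1.0, 0.0], [0.0, 1.0]]
auxiliary = [[2.0, 0.0], [0.0, 2.0]]
attended = cross_source_attention(optical, auxiliary)
fused = gated_fusion(optical, attended)
```

Because the attention weights sum to one, each attended vector is a convex combination of the auxiliary feature points, with the most similar points contributing most.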


      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 12, Issue 6
      December 2021
      356 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/3501281
      Editor: Huan Liu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 20 December 2021
      Accepted: 01 August 2021
      Revised: 01 July 2021
      Received: 01 December 2020
      Published in TIST Volume 12, Issue 6

      Author Tags

      1. Semantic segmentation
      2. multi-modal remote sensing images
      3. attention
      4. feature fusion

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • Fundamental Research Funds for the Central Universities

      Article Metrics

      • Downloads (Last 12 months): 246
      • Downloads (Last 6 weeks): 16
      Reflects downloads up to 19 Nov 2024

      Cited By

      • (2024) Unsupervised Domain Adaptation with Contrastive Learning-Based Discriminative Feature Augmentation for RS Image Classification. Remote Sensing 16:11 (1974). DOI: 10.3390/rs16111974. Online publication date: 30-May-2024
      • (2024) Cross-Modal Segmentation Network for Winter Wheat Mapping in Complex Terrain Using Remote-Sensing Multi-Temporal Images and DEM Data. Remote Sensing 16:10 (1775). DOI: 10.3390/rs16101775. Online publication date: 16-May-2024
      • (2024) Forest Fire Smoke Detection Based on Multiple Color Spaces Deep Feature Fusion. Forests 15:4 (689). DOI: 10.3390/f15040689. Online publication date: 11-Apr-2024
      • (2024) Detail-Optimized Super-Resolution Reconstruction-Based Multistage Training Strategy for Remote Sensing Semantic Segmentation. IEEE Transactions on Geoscience and Remote Sensing 62 (1-16). DOI: 10.1109/TGRS.2023.3339842. Online publication date: 2024
      • (2024) HR and LiDAR Data Collaborative Semantic Segmentation Based on Adaptive Cross-Modal Fusion Network. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 17 (12153-12168). DOI: 10.1109/JSTARS.2024.3418387. Online publication date: 2024
      • (2024) Efficient multi-scale network for semantic segmentation of fine-resolution remotely sensed images. Measurement Science and Technology 35:9 (096005). DOI: 10.1088/1361-6501/ad50fa. Online publication date: 5-Jun-2024
      • (2024) Dual-weight attention-based multi-source multi-stage alignment domain adaptation for industrial fault diagnosis. Measurement Science and Technology 35:9 (096105). DOI: 10.1088/1361-6501/ad5038. Online publication date: 5-Jun-2024
      • (2024) Design of network security monitoring system based on CNN and exponential weighted D-S evidence theory. Journal of Cyber Security Technology (1-20). DOI: 10.1080/23742917.2024.2367795. Online publication date: 17-Jun-2024
      • (2024) Optimizing Hyperspectral Image Classification Through Swin Transformer Integration and CNN Feature Extraction. Computational Intelligence in Data Science (374-386). DOI: 10.1007/978-3-031-69986-3_29. Online publication date: 30-Aug-2024
      • (2023) Daytime Sea Fog Identification Based on Multi-Satellite Information and the ECA-TransUnet Model. Remote Sensing 15:16 (3949). DOI: 10.3390/rs15163949. Online publication date: 9-Aug-2023
