Abstract
This paper presents a novel and efficient multitask learning framework for image translation and salient object detection from remote sensing images. The framework mainly comprises an image translation network, the weight-sharing attention GAN (WSA-GAN), and a salient object detection network, the boundary guidance network (BGNet). WSA-GAN generates a large number of synthetic infrared remote sensing images (IRIs) or optical remote sensing images (ORIs) from the corresponding complementary-modality images. A new multimodal context-aware learning strategy is then proposed for feature extraction and for coordinating the entanglement of latent features in the multimodal context of ORIs and IRIs. Since convolutional neural networks do not perform well when objects exhibit directional variance, the framework introduces an attention-aware CapsNet (AACNet) to alleviate this problem and enhance feature expressiveness. In addition, a knowledge distillation strategy is incorporated into AACNet to reduce model complexity. Finally, a multiscale feature learning network and a boundary-aware block are designed to produce more accurate saliency maps with clear boundaries. Experimental results demonstrate that the proposed image translation and salient object detection networks outperform competing approaches.
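To make the multitask pipeline more concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: a hypothetical generator translates an optical image into a synthetic infrared image, the two modalities are fused and passed to a saliency detector, and a standard Hinton-style knowledge-distillation term (soft-label KL divergence blended with the task loss) regularizes a compact student network, in the spirit of the distillation used in AACNet. All module and parameter names (`generator`, `saliency_net`, `T`, `alpha`) are illustrative assumptions.

```python
# Illustrative sketch only; the module interfaces are placeholders,
# not the paper's released code.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Hinton-style knowledge distillation: blend soft-label KL with the task loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitude is comparable to the hard loss
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard


def multitask_forward(ori, generator, saliency_net):
    """Translate an optical image (ORI) to a synthetic infrared image (IRI),
    then predict saliency from the channel-wise fused multimodal input."""
    with torch.no_grad():
        iri_fake = generator(ori)          # ORI -> synthetic IRI
    x = torch.cat([ori, iri_fake], dim=1)  # fuse the two modalities
    return saliency_net(x)                 # predicted saliency map
```

In such a setup the translation network can be trained first and then frozen while the saliency branch and the distilled student are optimized, which is one common way to stage a multitask pipeline of this kind.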
Data availability
The raw/processed data required to reproduce these findings cannot be shared at this time, as the data also form part of an ongoing study. Any questions should be directed to the corresponding author.
Funding
This research was funded by NSFC grant 61972353, NSF grants IIS-1816511 and OAC-1910469, and the Strategic Cooperation Technology Projects of CNPC and CUPB (ZLZX2020-05).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lian, Y., Shi, X., Shen, S. et al. Multitask learning for image translation and salient object detection from multimodal remote sensing images. Vis Comput 40, 1395–1414 (2024). https://doi.org/10.1007/s00371-023-02857-3