
RSTFusion: an end-to-end fusion network for infrared and visible images based on residual swin transformer

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Infrared and visible image fusion techniques have emerged as powerful methods for harnessing the unique advantages of diverse sensors, improving image quality by preserving the complementary and redundant information of the original images. While deep learning-based methods have been widely adopted in this domain, their predominant reliance on convolutional neural networks and limited use of transformers pose certain limitations. In particular, convolutional operations fail to effectively capture long-range dependencies between images, which hampers the generation of fused images with optimal complementarity. To address this, we propose an original end-to-end fusion model built on the Swin Transformer. By modeling long-range dependencies with a full-attention feature-encoding backbone, a pure transformer network with stronger representational capability than convolutional neural networks, the model avoids the shortcomings of manual design as well as complex activity-level measurement and fusion-rule design. In addition, we present three loss-function strategies designed specifically to strengthen similarity constraints and the training of network parameters, improving the quality of the detailed information in the fused images. Finally, experimental results on four datasets show that, compared with eleven state-of-the-art methods, our approach achieves state-of-the-art performance in both subjective and objective evaluation.
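The abstract does not specify the three loss-function strategies, so the sketch below is only a hypothetical illustration of the kind of composite objective commonly used to train infrared and visible image fusion networks: an intensity term that keeps the brighter (salient) pixels of the two sources, a gradient term that keeps the sharper edges, and a pixel-level similarity term. The use of PyTorch, the sobel_gradient and fusion_loss names, and the weights are assumptions made for illustration, not the authors' formulation.

    # Hypothetical sketch (not the paper's exact losses): a composite objective
    # with intensity, gradient, and pixel-level similarity terms, as commonly
    # used for infrared-visible fusion. Assumes PyTorch and 1-channel inputs.
    import torch
    import torch.nn.functional as F

    def sobel_gradient(img: torch.Tensor) -> torch.Tensor:
        """Approximate gradient magnitude with fixed Sobel kernels."""
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                          device=img.device).view(1, 1, 3, 3)
        ky = kx.transpose(2, 3)
        gx = F.conv2d(img, kx, padding=1)
        gy = F.conv2d(img, ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

    def fusion_loss(fused, ir, vis, w_int=1.0, w_grad=10.0, w_pix=1.0):
        """Keep the brighter source intensities, the sharper source gradients,
        and stay close to both inputs at the pixel level (illustrative weights)."""
        loss_int = F.l1_loss(fused, torch.max(ir, vis))
        loss_grad = F.l1_loss(sobel_gradient(fused),
                              torch.max(sobel_gradient(ir), sobel_gradient(vis)))
        loss_pix = 0.5 * (F.l1_loss(fused, ir) + F.l1_loss(fused, vis))
        return w_int * loss_int + w_grad * loss_grad + w_pix * loss_pix

    if __name__ == "__main__":
        ir = torch.rand(1, 1, 128, 128)   # dummy infrared patch
        vis = torch.rand(1, 1, 128, 128)  # dummy visible patch
        fused = (ir + vis) / 2            # stand-in for the network output
        print(fusion_loss(fused, ir, vis).item())

The element-wise maximum in the intensity and gradient terms is a common heuristic that lets infrared thermal targets and visible texture each dominate the fused result where they are most informative.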




Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

The authors gratefully acknowledge the financial support of the National Natural Science Foundation of China (62203224) and the Shanghai Special Plan for Local Colleges and Universities for Capacity Building (22010501300).

Author information


Corresponding author

Correspondence to Gang Liu.

Ethics declarations

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that we have no professional or other personal interest of any nature in any product, service or company that could be construed as influencing the position presented in the manuscript entitled "RSTFusion: An end-to-end fusion network for infrared and visible images based on residual swin transformer".

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, K., Tang, H., Liu, G. et al. RSTFusion: an end-to-end fusion network for infrared and visible images based on residual swin transformer. Neural Comput & Applic 36, 13467–13489 (2024). https://doi.org/10.1007/s00521-024-09716-9

