Abstract
Infrared and visible image fusion techniques harness the complementary advantages of diverse sensors, improving image quality by preserving both complementary and redundant information from the source images. While deep learning-based methods have been widely adopted in this domain, their predominant reliance on convolutional neural networks and limited use of Transformers impose certain limitations. Notably, convolutional operations cannot effectively capture long-range dependencies between images, which hampers the generation of fused images with optimal complementarity. Motivated by this, we propose an original end-to-end fusion model built on the Swin Transformer. By modeling long-range dependencies with a full-attention feature-encoding backbone, a pure Transformer network with stronger representational capability than convolutional neural networks, the model avoids the shortcomings of manual design as well as complex activity-level measurement and fusion-rule design. In addition, we present three loss function strategies designed expressly to strengthen similarity constraints and guide network parameter training, improving the quality of detailed information in the fused images. Finally, experiments on four datasets show that our method achieves state-of-the-art performance in both subjective and objective evaluation against eleven competing methods.
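To make the loss design concrete, below is a minimal sketch in PyTorch of a three-term fusion loss of the kind the abstract alludes to. The function names, weights, and the specific terms (intensity, gradient, and a simple structural surrogate) are our own illustrative assumptions, not the paper's actual formulation, which may differ in both the choice of terms and their weighting.

```python
# Hypothetical sketch of a three-term infrared/visible fusion loss.
# The paper's actual loss strategies are not reproduced here; this only
# illustrates the common pattern of combining intensity, gradient (detail),
# and structural constraints against both source images.
import torch
import torch.nn.functional as F

def sobel_gradient(x):
    """Per-channel Sobel gradient magnitude, approximated as |gx| + |gy|."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    c = x.shape[1]
    gx = F.conv2d(x, kx.expand(c, 1, 3, 3), padding=1, groups=c)
    gy = F.conv2d(x, ky.expand(c, 1, 3, 3), padding=1, groups=c)
    return gx.abs() + gy.abs()

def fusion_loss(fused, ir, vis, w_int=1.0, w_grad=10.0, w_struct=1.0):
    # Intensity term: pull the fused image toward the brighter source pixel,
    # so salient infrared targets are not washed out.
    loss_int = F.l1_loss(fused, torch.maximum(ir, vis))
    # Gradient term: preserve the stronger edge from either source image.
    loss_grad = F.l1_loss(sobel_gradient(fused),
                          torch.maximum(sobel_gradient(ir),
                                        sobel_gradient(vis)))
    # Structural term: a simple MSE-based surrogate for an SSIM constraint
    # against both sources (real systems typically use SSIM here).
    loss_struct = 0.5 * (F.mse_loss(fused, ir) + F.mse_loss(fused, vis))
    return w_int * loss_int + w_grad * loss_grad + w_struct * loss_struct
```

Under this pattern, the relative weights control the trade-off between thermal-target salience (intensity), texture detail (gradient), and overall fidelity to both sources (structure).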
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors gratefully acknowledge the financial support provided by the National Science Foundation of China (62203224) and the Shanghai Special Plan for Local Colleges and Universities for Capacity Building (22010501300).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, or company that could be construed as influencing the position presented in the manuscript entitled "RSTFusion: An end-to-end fusion network for infrared and visible images based on residual swin transformer".
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, K., Tang, H., Liu, G. et al. RSTFusion: an end-to-end fusion network for infrared and visible images based on residual swin transformer. Neural Comput & Applic 36, 13467–13489 (2024). https://doi.org/10.1007/s00521-024-09716-9