Abstract
Infrared and visible image fusion techniques harness the complementary advantages of diverse sensors, improving image quality by preserving both complementary and redundant information from the source images. While deep learning-based methods have been widely adopted in this domain, their predominant reliance on convolutional neural networks and limited use of Transformers impose certain limitations. Notably, convolutional operations cannot effectively capture long-range dependencies between images, which hampers the generation of fused images with optimal complementarity. Motivated by this, we propose an original end-to-end fusion model built on the Swin Transformer. By modeling long-range dependencies with a full-attention feature-encoding backbone, a pure Transformer network with stronger representational capability than convolutional neural networks, the model avoids the shortcomings of manual design as well as complex activity-level measurement and fusion-rule design. In addition, we present three loss function strategies designed expressly to strengthen similarity constraints and guide network parameter training, improving the quality of detailed information in the fused images. Finally, experiments on four datasets show that our method achieves state-of-the-art performance in both subjective and objective evaluation against eleven competing methods.
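To make the loss design concrete, below is a minimal sketch in PyTorch of a three-term fusion loss of the kind the abstract alludes to. The function names, weights, and the specific terms (intensity, gradient, and a simple structural surrogate) are our own illustrative assumptions, not the paper's actual formulation, which may differ in both the choice of terms and their weighting.

```python
# Hypothetical sketch of a three-term infrared/visible fusion loss.
# The paper's actual loss strategies are not reproduced here; this only
# illustrates the common pattern of combining intensity, gradient (detail),
# and structural constraints against both source images.
import torch
import torch.nn.functional as F

def sobel_gradient(x):
    """Per-channel Sobel gradient magnitude, approximated as |gx| + |gy|."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    c = x.shape[1]
    gx = F.conv2d(x, kx.expand(c, 1, 3, 3), padding=1, groups=c)
    gy = F.conv2d(x, ky.expand(c, 1, 3, 3), padding=1, groups=c)
    return gx.abs() + gy.abs()

def fusion_loss(fused, ir, vis, w_int=1.0, w_grad=10.0, w_struct=1.0):
    # Intensity term: pull the fused image toward the brighter source pixel,
    # so salient infrared targets are not washed out.
    loss_int = F.l1_loss(fused, torch.maximum(ir, vis))
    # Gradient term: preserve the stronger edge from either source image.
    loss_grad = F.l1_loss(sobel_gradient(fused),
                          torch.maximum(sobel_gradient(ir),
                                        sobel_gradient(vis)))
    # Structural term: a simple MSE-based surrogate for an SSIM constraint
    # against both sources (real systems typically use SSIM here).
    loss_struct = 0.5 * (F.mse_loss(fused, ir) + F.mse_loss(fused, vis))
    return w_int * loss_int + w_grad * loss_grad + w_struct * loss_struct
```

Under this pattern, the relative weights control the trade-off between thermal-target salience (intensity), texture detail (gradient), and overall fidelity to both sources (structure).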
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors gratefully acknowledge the financial support provided by the National Science Foundation of China (62203224) and the Shanghai Special Plan for Local Colleges and Universities for Capacity Building (22010501300).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, or company that could be construed as influencing the position presented in the manuscript entitled "RSTFusion: An end-to-end fusion network for infrared and visible images based on residual swin transformer".
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, K., Tang, H., Liu, G. et al. RSTFusion: an end-to-end fusion network for infrared and visible images based on residual swin transformer. Neural Comput & Applic 36, 13467–13489 (2024). https://doi.org/10.1007/s00521-024-09716-9