Abstract
Existing deep learning techniques for image fusion either learn image mapping (LIM) directly, which renders them ineffective at preserving details because every pixel receives equal consideration, or learn detail mapping (LDM), which attains only limited performance because details alone are used for reasoning. Recently, the lossless invertible neural network (INN) has demonstrated its detail-preserving ability. However, the direct applicability of INN to the image fusion task is limited by the volume-preserving constraint. Additionally, a consistent detail-preserving image fusion framework that produces satisfactory outcomes is still lacking. To this end, we propose a general paradigm for image fusion based on a novel conditional INN (named DCINN). The DCINN paradigm has three core components: a decomposing module that converts image mapping to detail mapping; an auxiliary network (ANet) that extracts auxiliary features directly from source images; and a conditional INN (CINN) that learns the detail mapping based on auxiliary features. This design inherits the advantages of INN, LIM, and LDM approaches while avoiding their disadvantages. In particular, applying INN to LDM makes it easy to satisfy the volume-preserving constraint while still preserving details. Moreover, since the auxiliary features serve only as conditional features, the ANet allows more than just details to be used for reasoning without compromising the detail mapping. Extensive experiments on three benchmark fusion problems, i.e., pansharpening, hyperspectral and multispectral image fusion, and infrared and visible image fusion, demonstrate the superiority of our approach over recent state-of-the-art methods. The code is available at https://github.com/wwhappylife/DCINN
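To make the paradigm concrete, the following is a minimal sketch of the three-component forward pass in PyTorch-style code. The module names (decompose, anet, cinn), their interfaces, and the final recomposition step are illustrative assumptions, not the exact implementation:

    import torch.nn as nn

    class DCINNSketch(nn.Module):
        """Illustrative pipeline only: decompose -> ANet -> conditional INN."""
        def __init__(self, decompose, anet, cinn):
            super().__init__()
            self.decompose = decompose  # converts image mapping to detail mapping
            self.anet = anet            # extracts auxiliary features from the sources
            self.cinn = cinn            # invertible network conditioned on F_a

        def forward(self, src1, src2):
            base, d1, d2 = self.decompose(src1, src2)  # base part and source details
            f_a = self.anet(src1, src2)                # auxiliary (conditional) features
            d_fused = self.cinn(d1, d2, cond=f_a)      # detail mapping under condition F_a
            return base + d_fused                      # assumed recomposition: base + fused details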
Notes
We use this toy example only to illustrate the challenges of applying INN to image fusion. The invertible transformation involved in our DCINN paradigm is much more complex and powerful.
The notation \(\textsf {size}({\textbf {x}})\) denotes the total size (or the so-called volume) of \({\textbf {x}}\).
The two source images differ across tasks. For example, the two source images are the PAN image and the LRMS image for pansharpening. More details about the other applications can be found in the bottom part of Fig. 3.
For convenience, we unfold the two-dimensional images into one-dimensional vectors. The same applies to the subsequent notations.
\({\textbf {R}}^+\) can be easily computed by using the Matlab function “pinv(R)”.
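For readers working in Python, a NumPy equivalent of the Matlab call is np.linalg.pinv; the shape of R below is illustrative only:

    import numpy as np

    R = np.random.rand(8, 31)        # illustrative shape for a spectral-response-like matrix
    R_plus = np.linalg.pinv(R)       # Moore-Penrose pseudoinverse, i.e., Matlab's pinv(R)
    assert np.allclose(R @ R_plus @ R, R)  # defining property of the pseudoinverse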
In the experiments, the low-pass filter is a zero-mean Gaussian filter with a size of \(11\times 11\) and a standard deviation of 1.
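In SciPy, such a filter can be reproduced as follows; this is a sketch of the setting, not the exact implementation:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def low_pass(img: np.ndarray) -> np.ndarray:
        # sigma = 1 with truncate = 5.0 yields a kernel radius of 5 pixels,
        # i.e., an 11 x 11 Gaussian support.
        return gaussian_filter(img, sigma=1.0, truncate=5.0)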
The Haar transform is an invertible transform that satisfies the volume-preserving constraint by increasing the number of channels when downsampling the image.
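A single-level 2D Haar transform can be written in a few lines. The generic sketch below shows how the spatial size halves while the channel count quadruples, so the total number of elements is preserved and the mapping is exactly invertible (the orthonormal scaling also keeps the Jacobian determinant of magnitude one):

    import torch

    def haar_forward(x: torch.Tensor) -> torch.Tensor:
        """Single-level 2D Haar transform: (B, C, H, W) -> (B, 4C, H/2, W/2)."""
        a = x[:, :, 0::2, 0::2]   # top-left samples of each 2x2 block
        b = x[:, :, 0::2, 1::2]   # top-right
        c = x[:, :, 1::2, 0::2]   # bottom-left
        d = x[:, :, 1::2, 1::2]   # bottom-right
        ll = (a + b + c + d) / 2  # low-pass (approximation)
        lh = (a + b - c - d) / 2  # horizontal details
        hl = (a - b + c - d) / 2  # vertical details
        hh = (a - b - c + d) / 2  # diagonal details
        return torch.cat([ll, lh, hl, hh], dim=1)

    def haar_inverse(y: torch.Tensor) -> torch.Tensor:
        """Exact inverse of haar_forward."""
        ll, lh, hl, hh = torch.chunk(y, 4, dim=1)
        a = (ll + lh + hl + hh) / 2
        b = (ll + lh - hl - hh) / 2
        c = (ll - lh + hl - hh) / 2
        d = (ll - lh - hl + hh) / 2
        B, C, H, W = a.shape
        x = torch.zeros(B, C, 2 * H, 2 * W, dtype=y.dtype, device=y.device)
        x[:, :, 0::2, 0::2] = a
        x[:, :, 0::2, 1::2] = b
        x[:, :, 1::2, 0::2] = c
        x[:, :, 1::2, 1::2] = d
        return x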
As mentioned, INN has a small capacity due to feature splitting; thus, enhancing channel interaction is important for increasing its capacity.
The symbol “\(\vert \)” indicates that CINN takes \({\textbf {F}}_\textrm{a}\) as conditional features.
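A condition typically enters an invertible block through the coupling subnetwork, which may depend on \({\textbf {F}}_\textrm{a}\) without affecting invertibility (cf. Ardizzone et al., 2019). The following is a minimal sketch, assuming an even channel count; the actual CINN blocks are more elaborate:

    import torch
    import torch.nn as nn

    class ConditionalCoupling(nn.Module):
        """Affine coupling layer that takes auxiliary features F_a as a condition."""
        def __init__(self, channels: int, cond_channels: int):
            super().__init__()
            half = channels // 2
            # Predict a per-pixel scale and shift for x2 from [x1, F_a].
            self.net = nn.Sequential(
                nn.Conv2d(half + cond_channels, 64, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 2 * half, 3, padding=1),
            )

        def forward(self, x: torch.Tensor, f_a: torch.Tensor) -> torch.Tensor:
            x1, x2 = torch.chunk(x, 2, dim=1)
            s, t = torch.chunk(self.net(torch.cat([x1, f_a], dim=1)), 2, dim=1)
            y2 = x2 * torch.exp(torch.tanh(s)) + t   # affine transform of x2
            return torch.cat([x1, y2], dim=1)

        def inverse(self, y: torch.Tensor, f_a: torch.Tensor) -> torch.Tensor:
            y1, y2 = torch.chunk(y, 2, dim=1)
            s, t = torch.chunk(self.net(torch.cat([y1, f_a], dim=1)), 2, dim=1)
            x2 = (y2 - t) * torch.exp(-torch.tanh(s))  # exact inverse; F_a is unchanged
            return torch.cat([y1, x2], dim=1)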
The reason for using different learning strategies is given in Sect. 3.1. Although the learning strategies differ, their incorporation into a unified framework is not affected.
IVF is considered a typical multi-modal image fusion task addressed in an unsupervised way.
Both MS-SSIM and MI can be used as reference-based metrics, but for the IVF task, no references are available.
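When no reference exists, the MI metric for fusion is commonly computed against the two source images (Qu et al., 2002). A minimal histogram-based sketch:

    import numpy as np

    def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 256) -> float:
        # Joint histogram of two equally sized grayscale images.
        joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)   # marginal of a
        py = pxy.sum(axis=0, keepdims=True)   # marginal of b
        nz = pxy > 0                          # avoid log(0)
        return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

    # For IVF: MI(fused, infrared) + MI(fused, visible)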
DCINN can thus only be seen as an upper bound.
We chose IFCNN rather than U2Fusion or YDTR because IFCNN yields better visual quality than U2Fusion, and YDTR tends to generate artifacts.
We simply adopt the mean rule as the detail fusion rule.
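Concretely, the mean rule averages the detail components of the two sources:

    def mean_fusion_rule(d1, d2):
        # Mean rule: element-wise average of the two detail components.
        return (d1 + d2) / 2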
References
Adu, J. J., Gan, J. H., Wang, Y., & Huang, J. (2013). Image fusion based on non-subsampled contourlet transform for infrared and visible light image. Infrared Physics and Technology, 61, 94–100.
Aiazzi, B., Alparone, L., Baronti, S., & Garzelli, A. (2002). Context-driven fusion of high spatial and spectral resolution images based on oversampled multiresolution analysis. IEEE Transactions on Geoscience and Remote Sensing, 40, 2300–2312.
Toet, A. (2017). The TNO multiband image data collection. Data in Brief, 15, 249–251.
Alparone, L., Aiazzi, B., Baronti, S., Garzelli, A., Nencini, F., & Selva, M. (2008). Multispectral and panchromatic data fusion assessment without reference. Photogrammetric Engineering and Remote Sensing, 74(2), 193–200.
Garzelli, A., Nencini, F., & Capobianco, L. (2007). Optimal MMSE pan sharpening of very high resolution multispectral images. IEEE Transactions on Geoscience and Remote Sensing, 46(1), 228–236.
Ardizzone, L., Lüth, C., Kruse, J., Rother, C., & Köthe, U. (2019). Guided image generation with conditional invertible neural networks. CoRR.
Barata, J., & Hussein, M. (2012). The Moore–Penrose pseudoinverse: A tutorial review of the theory. Brazilian Journal of Physics, 42, 146–165.
Behrmann, J., Grathwohl, W., Chen, R. T., Duvenaud, D., & Jacobsen, J. H. (2019). Invertible residual networks. In International Conference on Machine Learning (ICML) (pp. 573–582).
Chakrabarti, A., & Zickler, T. (2011). Statistics of real-world hyperspectral images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 193–200).
Choi, J., Yu, K., & Kim, Y. (2010). A new adaptive component substitution based satellite image fusion by using partial replacement. IEEE Transactions on Geoscience and Remote Sensing, 49(1), 295–309.
Laben, C., & Brower, B. (2000). Process for enhancing the spatial resolution of multispectral imagery using pan-sharpening. US Patent 6,011,875.
Cui, J., Zhou, L., Li, F., & Zha, Y. (2022). Visible and infrared image fusion by invertible neural network. In China Conference on Command and Control (CICC) (pp. 133–145).
Deng, L. J., Vivone, G., Jin, C., & Chanussot, J. (2021). Detail injection-based deep convolutional neural networks for pansharpening. IEEE Transactions on Geoscience and Remote Sensing, 59(8), 6995–7010.
Deng, L. J., Vivone, G., Paoletti, M. E., Scarpa, G., He, J., Zhang, Y., Chanussot, J., & Plaza, A. (2022). Machine learning in pansharpening: A benchmark, from shallow to deep networks. IEEE Geoscience and Remote Sensing Magazine, 10(3), 279–315. https://doi.org/10.1109/MGRS.2022.3187652
Deng, S. Q., Deng, L. J., Wu, X., Ran, R., Hong, D., & Vivone, G. (2023). PSRT: Pyramid shuffle-and-reshuffle transformer for multispectral and hyperspectral image fusion. IEEE Transactions on Geoscience and Remote Sensing, 61, 1–15. https://doi.org/10.1109/TGRS.2023.3244750
Dian, R. W., Li, S. T., Guo, A. J., & Fang, L. (2018). Deep hyperspectral image sharpening. IEEE Transactions on Neural Networks and Learning Systems, 29(11), 5345–5355.
Dinh, L., Krueger, D., & Bengio, Y. (2015). NICE: Non-linear independent components estimation. In International Conference on Learning Representations (ICLR) Workshop Track.
Dong, W. S., Zhou, C., Wu, F. F., Wu, J., Shi, G., & Li, X. (2021). Model-guided deep hyperspectral image super-resolution. IEEE Transactions on Image Processing, 30, 5754–5768.
Hoogeboom, E., Garcia Satorras, V., Tomczak, J., & Welling, M. (2020). The convolution exponential and generalized Sylvester flows. In Conference on Neural Information Processing Systems (NeurIPS) (pp. 18249–18260).
Eskicioglu, A., & Fisher, P. (1995). Image quality measures and their performance. IEEE Transactions on Communications, 43(12), 2959–2965.
Fu, X. Y., Wang, W., Huang, Y., Ding, X., & Paisley, J. (2020). Deep multiscale detail networks for multiband spectral image sharpening. IEEE Transactions on Neural Networks and Learning Systems, 32(5), 2090–2104.
Garzelli, A., & Nencini, F. (2009). Hypercomplex quality assessment of multi/hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 6(4), 662–665.
Masi, G., Cozzolino, D., Verdoliva, L., & Scarpa, G. (2016). Pansharpening by convolutional neural networks. Remote Sensing, 8(7), 594.
Gomez, A. N., Ren, M., Urtasun, R., & Grosse, R. B. (2017). The reversible residual network: Backpropagation without storing activations. In Conference on Neural Information Processing Systems (NeurIPS).
Guan, P. Y., & Lam, E. Y. (2021). Multistage dual-attention guided fusion network for hyperspectral pansharpening. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–14.
Guo, A. J., Dian, R. W., & Li, S. T. (2023). A deep framework for hyperspectral image fusion between different satellites. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), 7939–7954.
Guo, P. H., Zhuang, P. X., & Guo, Y. C. (2020). Bayesian pan-sharpening with multiorder gradient-based deep network constraints. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 950–962.
He, K., Zhang, X., Ren, S., & Sun, J. (2016) Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 770–778).
He, L., Rao, Y. Z., Li, J., Chanussot, J., Plaza, A., Zhu, J., & Li, B. (2019). Pansharpening via detail injection based convolutional neural networks. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(4), 1188–1204.
Hou, R. C., Zhou, D. M., Nie, R. C., Liu, D., Xiong, L., Guo, Y., & Yu, C. (2020). VIF-Net: An unsupervised framework for infrared and visible image fusion. IEEE Transactions on Computational Imaging, 6, 640–651.
Hu, J. F., Huang, T. Z., Deng, L. J., Dou, H. X., Hong, D., & Vivone, G. (2022). Fusformer: A transformer-based fusion network for hyperspectral image super-resolution. IEEE Geoscience and Remote Sensing Letters, 19, 1–5.
Hu, J. F., Huang, T. Z., Deng, L. J., Jiang, T. X., Vivone, G., & Chanussot, J. (2022). Hyperspectral image super-resolution via deep spatiospectral attention convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems, 33(12), 7251–7265.
Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 4700–4708).
Huang, J. J., & Dragotti, P. L. (2022). WINNet: Wavelet-inspired invertible network for image denoising. IEEE Transactions on Image Processing, 31, 4377–4392.
Huang, T., Dong, W. S., Wu, J. J., Li, L., Li, X., & Shi, G. (2022). Deep hyperspectral image fusion network with iterative spatio-spectral regularization. IEEE Transactions on Computational Imaging, 8, 201–214.
Jin, C., Deng, L. J., Huang, T. Z., & Vivone, G. (2022). Laplacian pyramid networks: A new approach for multispectral pansharpening. Information Fusion, 78, 158–170.
Jin, Z. R., Zhang, T. J., Jiang, T. X., Vivone, G., & Deng, L. J. (2022b). LAGConv: Local-context adaptive convolution kernels with global harmonic bias for pansharpening. In AAAI Conference on Artificial Intelligence (AAAI) (pp. 1113–1121).
Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR).
Kingma, D. P., & Dhariwal, P. (2018). Glow: Generative flow with invertible 1×1 convolutions. In Conference on Neural Information Processing Systems (NeurIPS).
Lanaras, C., Baltsavias, E., & Schindler, K. (2015). Hyperspectral super-resolution by coupled spectral unmixing. In International Conference on Computer Vision (ICCV) (pp. 3586–3594).
Li, H., & Wu, X. J. (2019). DenseFuse: A fusion approach to infrared and visible images. IEEE Transactions on Image Processing, 28(5), 2614–2623.
Li, H., Wu, X. J., & Kittler, J. (2021). RFN-Nest: An end-to-end residual fusion network for infrared and visible images. Information Fusion, 73, 72–86.
Li, H., Xu, T., Wu, X. J., Lu, J., & Kittler, J. (2023). LRRNet: A novel representation learning guided fusion network for infrared and visible images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9), 11040–11052. https://doi.org/10.1109/TPAMI.2023.3268209
Liu, J. G. (2002). Smoothing filter-based intensity modulation: A spectral preserve image fusion technique for improving spatial details. International Journal of Remote Sensing, 23(3), 593–597.
Liu, J. Y., Dian, R. W., Li, S. T., & Liu, H. (2023). SGFusion: A saliency guided deep-learning framework for pixel-level image fusion. Information Fusion, 91, 205–214.
Liu, R. S., Liu, J. Y., Jiang, Z. Y., Fan, X., & Luo, Z. (2020). A bilevel integrated model with data-driven layer ensemble for multi-modality image fusion. IEEE Transactions on Image Processing, 30, 1261–1274.
Liu, X. Y., Liu, Q. J., & Wang, Y. H. (2020). Remote sensing image fusion based on two-stream fusion network. Information Fusion, 55, 1–15.
Lu, S. P., Wang, R., Zhong, T., & Rosin, P. L. (2021). Large-capacity image steganography based on invertible neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 10816–10825).
Ma, J. Y., Chen, C., Li, C., & Huang, J. (2016). Infrared and visible image fusion via gradient transfer and total variation minimization. Information Fusion, 31, 100–109.
Ma, J. Y., Yu, W., Liang, P. W., Li, C., & Jiang, J. (2019). FusionGAN: A generative adversarial network for infrared and visible image fusion. Information Fusion, 48, 11–26.
Ma, J. Y., Liang, P. W., Yu, W., Chen, C., Guo, X., Wu, J., & Jiang, J. (2020). Infrared and visible image fusion via detail preserving adversarial learning. Information Fusion, 54, 85–98.
Ma, J. Y., Yu, W., Chen, C., Liang, P., Guo, X., & Jiang, J. (2020). Pan-GAN: An unsupervised pan-sharpening method for remote sensing image fusion. Information Fusion, 62, 110–120.
Ma, J. Y., Tang, L., Fan, F., Huang, J., Mei, X., & Ma, Y. (2022). SwinFusion: Cross-domain long-range learning for general image fusion via swin transformer. IEEE/CAA Journal of Automatica Sinica, 9(7), 1200–1217.
Ma, Q., Jiang, J. J., Liu, X. M., & Ma, J. (2023). Learning a 3D-CNN and transformer prior for hyperspectral image super-resolution. Information Fusion. https://doi.org/10.1016/j.inffus.2023.101907
Simões, M., Bioucas-Dias, J., Almeida, L. B., & Chanussot, J. (2015). A convex formulation for hyperspectral image superresolution via subspace-based regularization. IEEE Transactions on Geoscience and Remote Sensing, 53(6), 3373–3388.
Yokoya, N., Yairi, T., & Iwasaki, A. (2012). Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion. IEEE Transactions on Geoscience and Remote Sensing, 50(2), 528–537.
Wei, Q., Dobigeon, N., & Tourneret, J.-Y. (2015). Fast fusion of multi-band images based on solving a Sylvester equation. IEEE Transactions on Image Processing, 24(11), 4109–4121.
Qu, G. H., Zhang, D. L., & Yan, P. F. (2002). Information measure for performance of image fusion. Electronics Letters, 38, 1–7. https://doi.org/10.1049/el:20020212
Ran, R., Deng, L. J., Jiang, T. X., Hu, J. F., Chanussot, J., & Vivone, G. (2023). GuidedNet: A general CNN fusion framework via high-resolution guidance for hyperspectral image super-resolution. IEEE Transactions on Cybernetics. https://doi.org/10.1109/TCYB.2023.3238200
Rao, Y. J. (1997). In-fibre Bragg grating sensors. Measurement Science and Technology, 8, 355–358.
Tang, W., He, F. Z., & Liu, Y. (2022). YDTR: Infrared and visible image fusion via Y-shape dynamic transformer. IEEE Transactions on Multimedia. https://doi.org/10.1109/TMM.2022.3192661
Vivone, G., Restaino, R., & Chanussot, J. (2018). Full scale regression-based injection coefficients for panchromatic sharpening. IEEE Transactions on Image Processing, 27(7), 3418–3431.
Wald, L. (2002). Data Fusion. Definitions and Architectures—Fusion of Images of Different Spatial Resolutions. Presses des MINES.
Wald, L., Ranchin, T., & Mangolini, M. (1997). Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images. Photogrammetric Engineering and Remote Sensing, 63(6), 691–699.
Wang, L. G., Guo, Y. L., Dong, X. Y., Wang, Y., Ying, X., Lin, Z., & An, W. (2022). Exploring fine-grained sparsity in convolutional neural networks for efficient inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4), 4474–4493.
Wang, W., Zeng, W. H., Huang, Y., Ding, X., & Paisley, J. (2019). Deep blind hyperspectral image fusion. In International Conference on Computer Vision (ICCV) (pp. 4150–4159).
Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003). Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems and Computers (ACSSC) (pp. 1398–1402).
Roberts, W., van Aardt, J., & Ahmed, F. (2008). Assessment of image fusion procedures using entropy, image quality, and multispectral classification. Journal of Applied Remote Sensing, 2, 1–28.
Wu, Z. C., Huang, T. Z., Deng, L. J., Huang, J., Chanussot, J., & Vivone, G. (2023). LRTCFPan: Low-rank tensor completion based framework for pansharpening. IEEE Transactions on Image Processing, 32, 1640–1655.
Xiao, J. J., Li, J., Yuan, Q. Q., & Zhang, L. (2022). A dual-UNet with multistage details injection for hyperspectral image fusion. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–13.
Xiao, M., Zheng, S., Liu, C., Wang, Y., He, D., Ke, G., Bian, J., Lin, Z., & Liu, T. Y. (2020). Invertible image rescaling. In European Conference on Computer Vision (ECCV) (pp. 126–144).
Xu, H., Ma, J., Le, Z., Jiang, J., & Guo, X. (2020). FusionDN: A unified densely connected network for image fusion. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) (pp. 12484–12491).
Xu, H., Ma, J. Y., Jiang, J. J., Guo, X., & Ling, H. (2022). U2Fusion: A unified unsupervised image fusion network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1), 502–518.
Xu, Q. Z., Zhang, Y., Li, B., & Ding, L. (2014). Pansharpening using regression of classified MS and pan images to reduce color distortion. IEEE Geoscience and Remote Sensing Letters, 12(1), 28–32.
Xu, S., Zhang, J., Zhao, Z., Sun, K., Liu, J., & Zhang, C. (2021). Deep gradient projection networks for pan-sharpening. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1366–1375).
Xu, Y., & Zhang, J. (2021). Invertible resampling-based layered image compression. In 2021 Data Compression Conference (DCC) (p. 380).
Yan, Y. S., Liu, J. M., Xu, S., Wang, Y., & Cao, X. (2022). MD\(^3\)Net: Integrating model-driven and data-driven approaches for pansharpening. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–16.
Yang, J., Fu, X., Hu, Y., Huang, Y., Ding, X., & Paisley, J. (2017). PanNet: A deep network architecture for pan-sharpening. In International Conference on Computer Vision (ICCV) (pp. 5449–5457).
Yang, Y., Lu, H. Y., Huang, S. Y., & Tu, W. (2020). Pansharpening based on joint-guided detail extraction. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14, 389–401.
Yang, Y., Wu, L., Huang, S. Y., Wan, W., Tu, W., & Lu, H. (2020). Multiband remote sensing image pansharpening based on dual-injection model. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 1888–1904.
Yuhas, R. H., Goetz, A. F., & Boardman, J. W. (1992). Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In Annual JPL Airborne Geoscience Workshop.
Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., & Yang, M. H. (2022). Restormer: Efficient transformer for high-resolution image restoration. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 5728–5739).
Zhang, T. J., Deng, L. J., Huang, T. Z., Chanussot, J., & Vivone, G. (2022). A triple-double convolutional neural network for panchromatic sharpening. IEEE Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/TNNLS.2022.3155655
Zhang, X. T., Huang, W., Wang, Q., & Li, X. (2021). SSR-NET: Spatial-spectral reconstruction network for hyperspectral and multispectral image fusion. IEEE Transactions on Geoscience and Remote Sensing, 59(7), 5953–5965.
Zhang, Y., Liu, Y., Sun, P., Yan, H., Zhao, X., & Zhang, L. (2020). IFCNN: A general image fusion framework based on convolutional neural network. Information Fusion, 54, 99–118.
Zhao, R., Liu, T. S., Xiao, J., Lun, D. P., & Lam, K. M. (2021). Invertible image decolorization. IEEE Transactions on Image Processing, 30, 6081–6095.
Zhao, Z., Xu, S., Zhang, C., Liu, J., Li, P., & Zhang, J. (2020). DIDFuse: Deep image decomposition for infrared and visible image fusion. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI) (pp. 970–976).
Zhao, Z., Bai, H., Zhang, J., Zhang, Y., Xu, S., Lin, Z., Timofte, R., & Van Gool, L. (2023). CDDFuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 5906–5916).
Zhou, M., Fu, X. Y., Huang, J., Zhao, F., & Hong, D. (2022). Effective pan-sharpening by multiscale invertible neural network and heterogeneous task distilling. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–14. https://doi.org/10.1109/TGRS.2022.3199210
Zhou, M., Yan, K. Y., Pan, J. S., Ren, W., Xie, Q., & Cao, X. (2023). Memory-augmented deep unfolding network for guided image super-resolution. International Journal of Computer Vision, 131(1), 215–242.
Zhou, Z. Q., Wang, B., Li, S., & Dong, M. (2016). Perceptual fusion of infrared and visible images through a hybrid multi-scale decomposition with Gaussian and bilateral filters. Information Fusion, 30, 15–26.
Zhuang, P., Liu, Q., & Ding, X. (2019). Pan-GGF: A probabilistic method for pan-sharpening with gradient domain guided image filtering. Signal Processing, 156, 177–190.
Acknowledgements
This research was supported by the National Natural Science Foundation of China (12271083, 12171072), the Natural Science Foundation of Sichuan Province (2022NSFSC0501), and the National Key Research and Development Program of China (Grant No. 2020YFA0714001).
Additional information
Communicated by Massimiliano Mancini.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, W., Deng, L. J., Ran, R., et al. A General Paradigm with Detail-Preserving Conditional Invertible Network for Image Fusion. Int J Comput Vis 132, 1029–1054 (2024). https://doi.org/10.1007/s11263-023-01924-5