
Asymmetric Dual-Decoder U-Net for Joint Rain and Haze Removal

Published: 09 December 2023

Abstract

This work studies the multi-weather restoration problem. In real-life scenarios, rain and haze, two often co-occurring weather phenomena, can greatly degrade the clarity and quality of scene images, leading to a performance drop in visual applications such as autonomous driving. However, jointly removing rain and haze from scene images is ill-posed and challenging, since both the contamination itself and the change of atmospheric light degrade the scene information. Current methods focus on the contamination removal part and thus ignore the restoration of the scene information affected by the change of atmospheric light. We propose a novel deep neural network, named Asymmetric Dual-decoder U-Net (ADU-Net), to address this challenge. The ADU-Net produces both a contamination residual and a scene residual to efficiently remove the contamination while preserving the fidelity of the scene information. Extensive experiments show that our work outperforms the existing state-of-the-art methods by a considerable margin on both synthetic and real-world benchmarks, including RainCityscapes, BID Rain, and SPA-Data. For instance, we improve the state-of-the-art PSNR value by 2.26/4.57 on RainCityscapes/SPA-Data, respectively. Codes will be made available freely to the research community.

1 Introduction

When photographing in bad weather, the quality of outdoor scene images can be greatly degraded by contamination, e.g., rain, haze, and snow, distributed in the air. Such contamination absorbs or disperses the scene light, thereby reducing the contrast and color fidelity of the scene image. Hence, the existence of contamination significantly affects many real-world vision systems, such as scene recognition, object tracking, and semantic segmentation, all of which are essential for autonomous driving [7, 13, 60]. In other words, outdoor vision systems that work efficiently in ideal weather conditions suffer a sharp performance drop under complex real-world weather conditions. Therefore, it is essential to develop algorithms that restore images corrupted by different contaminants as a pre-processor for such outdoor vision systems.
In this work, we focus on a real yet less-investigated scenario: the co-occurrence of rain and haze in a scene. Both image rain removal and haze removal are challenging low-level computer vision tasks. Many efforts have been made to solve the individual rain removal and haze removal tasks [48, 52, 56]. However, only a few works consider removing rain and haze jointly in scene images [18, 21, 47]. In real-world scenarios, rain and haze very commonly co-occur in a rainfall environment (see Figure 1(a)) [17]. Along with rain streaks and raindrops, the uneven haze will also obscure the image, interfering with the perception of the environment. Such a scenario challenges outdoor vision systems and calls for methods that jointly remove rain and haze from images.
Fig. 1. Example of a scene image and its residual maps. (a) is the input image and (b) is the ground truth from the RainCityscapes dataset. (c) is the difference between (a) and (b). (d) and (e) are the contamination residual and the scene residual, respectively. (f) is the restored result obtained by subtracting (d) and (e) from (a). “Res” indicates “Residual.” The contamination and scene details are highlighted in the red and yellow boxes, respectively (zoom in to see the details).
The existing methods for single-image rain and haze removal can be roughly divided into two categories: prior knowledge-oriented approaches and data-driven approaches. Prior knowledge-based image rain removal [24, 31, 36] and haze removal methods [15, 19, 63] are mostly built on physical imaging models. However, such solutions suffer from robustness issues when deployed in real-world scenarios [32, 62]. Recent advances in deep learning demonstrate dramatic success in haze removal [9, 27, 43] and rain removal [40, 49, 59]. Learning-based methods in both fields have achieved cutting-edge performance on synthetic datasets. However, methods designed for a specific contamination cannot handle the complex real-world scenario where rain and haze co-occur in natural scenes. Recent studies have also pointed out the necessity of joint removal: Han et al. [18] decompose rain and haze with a Blind Image Decomposition Network, and Kim et al. [25] remove rain and haze with a frequency-based model. A new dataset for benchmarking joint rain and haze removal, named RainCityscapes, has also been proposed to facilitate research on this important task [21]. Such a joint-removal task thus remains an open problem in the community and calls for further study.
Recent advances in low-level computer vision have made remarkable progress, where a well-trained deep neural network can almost perfectly remove the contamination in outdoor scene images. However, no existing work pays attention to the scene difference in the restoration process. We observe that the true residual, obtained by \((\mathrm{Input} - \mathrm{Ground}~\mathrm{Truth})\) (see Figure 1(c)), contains scene information in addition to the contamination. That is, a neural network designed to focus on contamination may suffer from a gap in recovering the scenes. Such a gap motivates us to develop a unified method to remove the contamination and compensate for the scene information in one go.
In real-world scenarios, the weather condition is complex; that is, different components, such as rain streaks and haze, may co-occur in the scenes. The occurrence of some components, e.g., heavy haze, impacts the atmospheric light. As a consequence, the scene information at the photometric level can be degraded. Physically speaking, along with removing contamination in the image, it is also necessary to restore the scene information affected by the change of atmospheric light. To address this issue, we propose a novel dual-branch architecture, called Asymmetric Dual-decoder U-Net (ADU-Net). The ADU-Net consists of a single-branch encoder and an asymmetric dual-branch decoder. In the asymmetric dual-branch architecture, one branch, the contamination residual branch, is designed to remove the contamination (see Figure 1(d)). The other branch, the scene residual branch, performs the recovery of scene information (see Figure 1(e)). The contamination residual branch, equipped with a novel channel feature fusion (CFF) module and window multi-head self-attention (W-MSA), produces the contamination residual. This design allows the branch to focus more on the local foreground information in the image, thus extracting the contamination residual. The scene residual branch, powered by a novel global channel feature fusion (GCFF) module and the shift-window multi-head self-attention (SW-MSA) mechanism, aims to compensate for the scene information. Unlike the contamination residual branch, the scene residual branch is designed to focus more on the global contextual information in the image, thus extracting the scene residual. The joint efforts of the contamination residual and the scene residual separate the rain and haze from the input scene image while preserving its scene content (see Figure 1(f)). The proposed ADU-Net effectively removes different contamination in the images and compensates for the scene information on multiple benchmark datasets, including RainCityscapes [21], BID Rain [18], and SPA-Data [49].
Our contribution can be summarized as follows:
We propose a novel yet efficient neural architecture, ADU-Net, to jointly remove rain and haze in scene images.
We present an asymmetric dual-decoder, which removes the contamination while compensating for the scene information of the image. To the best of our knowledge, this is the first work to consider the recovery of scene information in deraining and dehazing tasks.
Extensive experiments, including quantitative studies and qualitative studies, are conducted to evaluate the effectiveness of the ADU-Net. Empirical evaluation shows our method outperforms the current state-of-the-art methods by a considerable margin.

2 Related Work

2.1 Single-image Rain Removal

The very first single-image rain removal methods were based on prior knowledge. Morphological component analysis (MCA) [24] employs bilateral filters to extract high-frequency components from rain images, and the high-frequency components are further decomposed into “rain components” and “non-rain components” through dictionary learning and sparse coding. Luo et al. [36] proposed a single-image rain removal algorithm based on mutual exclusion dictionary learning. Gaussian mixture model priors [31] were utilized to accommodate multiple orientations and scales of rain streaks. In [62], Zhu et al. detected the approximate regions where the rain streaks were located to guide the separation of the rain layer from the background layer. However, early models based on prior knowledge often suffer from a lack of stability in real scenarios [24, 31, 36]. Since 2017, deep learning approaches have been developed for rain removal tasks. Deep detail networks [16] narrowed the mapping from input to output and combined prior knowledge to capture high-frequency details, keeping the model focused on rain-streak information. By adding an iterative information feedback network, JORDER [53] used a binary map to locate rain streaks. A non-locally enhanced encoder-decoder structure [28] was proposed to capture long-range dependencies and leverage the hierarchical features of the convolutional layers. In [30], Li et al. proposed a deep recurrent convolutional neural network to progressively remove rain marks located at different depths. A density-aware multi-stream connectivity network was introduced for rain removal in [58]. By adding constraints to the cGAN [23], Zhang et al. [59] generated more photo-realistic results. A progressive contextual aggregation network [40] was proposed as a baseline for rain removal. A real-world rain dataset was constructed by Wang et al. [49], who also incorporated spatial perception mechanisms into deraining networks. Recently, Zhu et al. [61] proposed a gated non-local depth residual network for image rain removal. Yu et al. [55] conducted a comprehensive analysis of various aspects of existing rain removal models and their robustness against adversarial attacks and, based on these analyses, proposed a more robust approach to address this issue.
While significant progress has been made in the research on image rain removal, the existing studies lack consideration for real-world rainy scenarios, limiting their effectiveness in practical applications. In contrast, our methods take a more realistic approach by not only addressing rain streak occlusions commonly encountered in rainy weather but also considering the impact of haze, which is prevalent in the atmosphere, on atmospheric light. By incorporating these factors, our methods offer a more comprehensive and practical solution that better aligns with real-world conditions.

2.2 Single-image Haze Removal

Similar to image rain removal methods, early work on image dehazing tended to employ statistical methods to acquire prior information by capturing patterns in haze-free images. Representative methods include the dark channel prior [19], the color-line prior [15], and the color attenuation prior [63]. However, prior-based methods tend to distort colors and thus produce undesirable artifacts [15, 19, 63]. In the deep learning era, methods no longer rely on handcrafted priors but instead estimate the atmospheric light and the transmission map directly. For example, Cai et al. [5] proposed an end-to-end dehazing model named DehazeNet, where haze-free images are produced by learning the medium transmission. Similarly, Ren et al. [41] employed multi-scale deep neural networks to learn the mapping between hazy images and their corresponding transmission maps, aiming to reduce the error in estimating the transmission maps. AODNet [27] reformulated the atmospheric scattering model and leveraged an improved convolutional neural network to learn the mapping between hazy and clean pairs. In [57], a single network was proposed to simultaneously learn the intrinsic relationship between transmission maps, atmospheric light, and clean images. Ren et al. [42] built an encoder-decoder neural network to enhance the dehazing process. A network with an enhancer and two generators was proposed by Qu et al. [39]. Chen et al. [9] proposed a patch map-based PMS-Net to effectively suppress the color distortion issue. Dong et al. [12] proposed MSBDN (Multi-Scale Boosted Dehazing Network) based on the U-Net architecture, incorporating boosting and error feedback as guiding principles; although the method achieves good results, it suffers from a large number of parameters. Yeh et al. [54] decomposed hazy images into base components and detail components and proposed MSRL-DehazeNet, which is based on residual learning and the U-Net architecture. Sun et al. [46] proposed SADNet, an attention-based, semi-supervised approach for solving practical problems. Song et al. [45] introduced the Swin Transformer into image haze removal and proposed DehazeFormer, which achieved significant improvements on multiple datasets. Unlike image rain removal, image dehazing often considers the impact of haze on the atmospheric light intensity, which can compensate for the limitations of rain removal methods. Our methods combine these approaches with the research on rain removal, resulting in a more realistic solution that better aligns with real-world scenarios.

2.3 Other Related Works

Unlike previous single-task models, some researchers have also explored removing rain and haze from images simultaneously. Hu et al. [21] built an imaging model for rain streaks and haze based on the visual effect of rain and the scene depth map to synthesize a realistic dataset named RainCityscapes. Han et al. [18] constructed a superimposed image dataset and proposed a simple yet general Blind Image Decomposition Network to decompose rain streaks, raindrops, and haze in a blind image decomposition setting. Kim et al. [25] proposed a frequency-based model for removing rain and haze, which divides the input image into high-frequency and low-frequency parts with a guided filter and then employs a symmetric encoder-decoder network to remove rain and haze separately. Kulkarni and Murala [26] proposed a lightweight network that combines convolutions at different scales with spatial and channel attention mechanisms, employing a dual restoration mechanism to handle images affected by various weather conditions. Recently, Li et al. [29] used a neural architecture search-based approach to handle multiple weather situations; however, it has a large number of parameters as it uses a separate encoder for each weather removal task. Chen et al. [10] proposed a training approach based on knowledge distillation, considering the perspective of training strategies. They introduced a multi-teacher model and a single-student model, enabling a single model to handle various weather conditions without increasing the parameter size. Valanarasu et al. [47] proposed a single transformer-based encoder-decoder network that restores images with learnable weather-type queries in the decoder to infer the type of weather degradation. Wang et al. [50] enhanced the U-Net architecture by adding a small decoder and a dilated convolution attention module, enabling the network to capture both global information and finer details in high-resolution remote sensing images. After an in-depth study of related works, we identify two primary factors that contribute to image quality degradation: the contamination itself and the scene information affected by atmospheric light. To effectively address these factors, we introduce an asymmetric dual-branch structure that processes each factor independently. By separately optimizing contamination removal and scene information recovery, our method achieves enhanced overall performance and improved image quality.

3 Method

This section details the proposed method in a top-down fashion: starting from the problem formulation of our application, followed by the architecture of the proposed ADU-Net and its building block, namely, asymmetric dual-decoder block (ADB).
Notations. Throughout the article, we use bold capital letters to denote matrices or tensors (e.g., \({\boldsymbol {X}}\)), and bold lowercase letters to denote vectors (e.g., \({\boldsymbol {x}}\)).

3.1 Problem Formulation

Let a third-order tensor, \({{\boldsymbol {I}}} \in \mathbb {R}^{C \times H \times W}\), denote an input image, where C, H, and W denote the channel, height, and width of the image, respectively. In our application, both rain and haze are synthesized into the original scene images to form the input images. Each input image \({\boldsymbol {I}}\) is paired with its ground truth image \({\boldsymbol {I}}^{\mathrm{gt}}\), which is free of rain and haze. Our ADU-Net \(f_{\theta }\), consisting of a single-branch encoder \(f_{\mathrm{E}}\) and an asymmetric dual-decoder \(f_{\mathrm{AD}}\), removes the rain and haze in the input image, such that the output of the ADU-Net, \({\boldsymbol {Y}} = f_{\theta }({\boldsymbol {I}})\), restores its ground truth scene \({\boldsymbol {I}}^{\mathrm{gt}}\). The ADU-Net is trained to learn a set of parameters, \(\theta ^*\), that minimizes the empirical objective \(\mathcal {L}({\boldsymbol {I}}^{\mathrm{gt}}, {\boldsymbol {Y}})\).

3.2 Network Overview

We first give a sketch of the proposed ADU-Net. In rain and haze removal applications, one ideal option is to employ a deep neural network to understand the scene of the input image and separate the rain and haze from it. In our work, we develop the ADU-Net to remove the rain and haze jointly. As shown in Figure 2, the ADU-Net is composed of a single-branch encoder and an asymmetric dual-decoder. The encoder \(f_{\mathrm{E}}\) has five convolutional blocks, each denoted by \(\mathrm{Conv}_{i},~0 \le i \le 4\). The output of each convolutional block is denoted by \({\boldsymbol {F}}_i = \mathrm{Conv}_i({\boldsymbol {F}}_{i-1})\), with \({\boldsymbol {F}}_{-1} = {\boldsymbol {I}}\).
Fig. 2. The network architecture of the proposed ADU-Net, which consists of an encoder \(f_{\mathrm{E}}\) and an asymmetric dual-decoder \(f_{\mathrm{AD}}\). \(f_{\mathrm{E}}\) has five \(\mathrm{Conv}_{i}\) blocks and \(f_\mathrm{AD}\) has four \(\mathrm{ADB}_j\) blocks and a Conv block. The network is optimized by the SSIM loss function.
The following asymmetric dual-decoder \(f_{\mathrm{AD}}\) aims to recover the scene image without rain and haze (see Figure 2). The proposed asymmetric dual-decoder is a stack of ADBs, which produce two streams of latent features, denoted by \({\boldsymbol {Z}}^{\mathrm{c}}_j\) and \({\boldsymbol {Z}}^{\mathrm{s}}_j\) in the j-th ADB. Specifically, the processing can be formulated as
\begin{equation} {\boldsymbol {Z}}^{\mathrm{c}}_0, {\boldsymbol {Z}}^{\mathrm{s}}_0 = \mathrm{ADB}_0({\boldsymbol {F}}_3, {\boldsymbol {F}}_4), \end{equation}
(1)
or
\begin{equation} {\boldsymbol {Z}}^{\mathrm{c}}_j, {\boldsymbol {Z}}^{\mathrm{s}}_j = \mathrm{ADB}_j({\boldsymbol {Z}}^{\mathrm{c}}_{j-1}, {\boldsymbol {Z}}^{\mathrm{s}}_{j-1}, {\boldsymbol {F}}_{3-j}),~j\gt 0. \end{equation}
(2)
After the last ADB, each stream of latent features \({\boldsymbol {Z}}^{\mathrm{c}}_3\) or \({\boldsymbol {Z}}^{\mathrm{s}}_3\) is encoded by a convolutional block to recover the channel dimensions into the image space (e.g., \(C=3\)), as \({\boldsymbol {Y}}^{\mathrm{c}} = \mathrm{Conv}_5({\boldsymbol {Z}}^{\mathrm{c}}_3)\) and \({\boldsymbol {Y}}^{\mathrm{s}} = \mathrm{Conv}_5({\boldsymbol {Z}}^{\mathrm{s}}_3)\). We denote the \({\boldsymbol {Y}}^{\mathrm{c}}\) as the contamination residual, and \({\boldsymbol {Y}}^{\mathrm{s}}\) as the scene residual. Having the \({\boldsymbol {Y}}^{\mathrm{c}}\) and \({\boldsymbol {Y}}^{\mathrm{s}}\) at hand, one can obtain the restored scene image \({\boldsymbol {Y}}\) as
\begin{equation} {\boldsymbol {Y}} = {\boldsymbol {I}} - {\boldsymbol {Y}}^{\mathrm{c}} - {\boldsymbol {Y}}^{\mathrm{s}}. \end{equation}
(3)
The network is optimized by the negative SSIM loss [51], i.e., \(\mathcal {L}_{\mathrm{SSIM}} = -\mathrm{SSIM} ({\boldsymbol {I}}^{\mathrm{gt}}, {\boldsymbol {Y}})\). Note that a common practice is to use both the negative SSIM loss and the MSE loss as the objective. Empirically, we observe that the negative SSIM loss alone works better in the proposed ADU-Net, which is justified in Section 4.4.
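To make the pipeline above concrete, the following is a minimal PyTorch sketch of the restoration in Equation (3) together with the negative SSIM objective. The encoder and dual-decoder internals are abstracted as injected modules, and the `ssim` function is assumed to come from the third-party `pytorch_msssim` package; none of the names below are the authors' actual implementation.

```python
# Minimal sketch of the ADU-Net restoration pipeline and the negative SSIM objective.
# `encoder` and `dual_decoder` are placeholders for the modules described in
# Sections 3.2-3.3; `ssim` is assumed to come from the pytorch_msssim package.
import torch
import torch.nn as nn
from pytorch_msssim import ssim  # third-party SSIM implementation (assumption)


class ADUNetSketch(nn.Module):
    def __init__(self, encoder: nn.Module, dual_decoder: nn.Module):
        super().__init__()
        self.encoder = encoder            # f_E: produces multi-scale features F_0..F_4
        self.dual_decoder = dual_decoder  # f_AD: produces Y^c and Y^s (3-channel residuals)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(image)                 # list [F_0, ..., F_4]
        y_cont, y_scene = self.dual_decoder(feats)  # contamination / scene residuals
        return image - y_cont - y_scene             # Eq. (3): Y = I - Y^c - Y^s


def negative_ssim_loss(restored: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    # L_SSIM = -SSIM(I^gt, Y); images are assumed to be scaled to [0, 1]
    return -ssim(restored, gt, data_range=1.0, size_average=True)
```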

3.3 Asymmetric Dual-decoder Block

In this part, we describe the asymmetric dual-decoder \(f_{\mathrm{AD}}\) in ADU-Net. As shown in Figure 2, \(f_{\mathrm{AD}}\) consists of four ADBs and a convolutional block, where the ADBs are realized by two different instantiations (i.e., \(\mathrm{ADB}_0\) vs. \(\mathrm{ADB}_j,~j = 1,2,3\)). In the following, we first describe \(\mathrm{ADB}_0\), the simple form of the block. Then, with minor modifications, we realize \(\mathrm{ADB}_j, \ j = 1,2,3\) on top of \(\mathrm{ADB}_0\).
The \(\mathrm{ADB}_0\) is a two-branch architecture (see Figure 2), which receives the \({\boldsymbol {F}}_3\) and \({\boldsymbol {F}}_4\) as input, and produces two latent features \({\boldsymbol {Z}}^{\mathrm{c}}_0\) and \({\boldsymbol {Z}}^{\mathrm{s}}_0\). In \(\mathrm{ADB}_0\), the two latent features are respectively encoded by two branches of network, namely, contamination residual net (denoted by \(g^{\mathrm{c}}\)), and scene residual net (denoted by \(g^{\mathrm{s}}\)), given by
\begin{equation} {\boldsymbol {Z}}^{\mathrm{c}}_0 = g^{\mathrm{c}}({\boldsymbol {F}}_3, {\boldsymbol {F}}_4) \end{equation}
(4)
and
\begin{equation} {\boldsymbol {Z}}^{\mathrm{s}}_0 = g^{\mathrm{s}}({\boldsymbol {F}}_3, {\boldsymbol {F}}_4). \end{equation}
(5)
Contamination Residual Net. In the contamination residual net (\(g^{\mathrm{c}}\)), \({\boldsymbol {F}}_3\) and \({\boldsymbol {F}}_4\) are fed to the CFF module to localize the rain and haze areas in the scene image, as
\begin{equation} {\boldsymbol {G}}^{\mathrm{c}}_0 = \mathrm{CFF}({\boldsymbol {F}}_3, {\boldsymbol {F}}_4). \end{equation}
(6)
The details of CFF are illustrated in Figure 3(a). Given two feature maps \({\boldsymbol {F}}_3\) and \({\boldsymbol {F}}_4\) as input, it first fuses the two inputs by using element-wise addition and then feeds the fused feature maps to two-layer convolutional blocks to obtain the attention weights, formulated by
\begin{equation} {\boldsymbol {W}}_0^{\mathrm{c}} = \sigma (\mathrm{BN}(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}({\boldsymbol {F}}_3 \oplus {\boldsymbol {F}}_4)))))), \end{equation}
(7)
where \(\sigma\), \(\mathrm{BN}\), and \(\mathrm{ReLU}\) are sigmoid function, batch normalization, and rectified linear unit activation, respectively. Here, the kernel size of \(\mathrm{Conv}\) is \(1\times 1\), which can be understood as applying a fully connected layer to the channel features.
Fig. 3. Architecture of the global channel feature fusion module and channel feature fusion module.
Then, we can apply the attention weights to the input feature maps and obtain the fused output, as
\begin{equation} {\boldsymbol {G}}^{\mathrm{c}}_0 = \left({\boldsymbol {W}}_0^{\mathrm{c}} \otimes {\boldsymbol {F}}_3 \right) \oplus \left(\left({\boldsymbol {I}} - {\boldsymbol {W}}_0^{\mathrm{c}}\right) \otimes {\boldsymbol {F}}_4 \right)\!. \end{equation}
(8)
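The following is a minimal PyTorch sketch of the CFF module described by Equations (7) and (8). Note that \({\boldsymbol {I}}\) in Equation (8) denotes an all-ones tensor of the same size as \({\boldsymbol {W}}_0^{\mathrm{c}}\) (analogous to the unit vector \({\boldsymbol {i}}\) in Equation (14)), so the two fusion weights sum to one. The hidden channel width of the two-layer block is an assumption, as the text does not specify a reduction ratio.

```python
# Sketch of the channel feature fusion (CFF) module, following Eqs. (7)-(8).
# Both inputs are assumed to share the same shape (B, C, H, W).
import torch
import torch.nn as nn


class CFF(nn.Module):
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        hidden = channels // reduction  # hidden width is an assumption
        # two 1x1 conv layers acting as a fully connected layer on the channel features
        self.weight_net = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )

    def forward(self, f_a: torch.Tensor, f_b: torch.Tensor) -> torch.Tensor:
        w = self.weight_net(f_a + f_b)    # Eq. (7): per-pixel attention weights in (0, 1)
        return w * f_a + (1.0 - w) * f_b  # Eq. (8): complementary weighted fusion
```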
The CFF module fuses the input feature maps, and the fusion weights are produced from the channel patterns. We further employ a self-attention mechanism to build spatially long-range dependencies of the fused feature maps \({\boldsymbol {G}}^{\mathrm{c}}_0\), given by
\begin{equation} {\boldsymbol {H}}^{\mathrm{c}}_0 = \mathrm{W\text{-}MSA}({\boldsymbol {G}}^{\mathrm{c}}_0), \end{equation}
(9)
where W-MSA is the window multi-head self-attention from the Swin Transformer [35].
By fusing the input feature maps and processing them by the attention mechanism, we can obtain the contamination residual feature maps as
\begin{equation} {\boldsymbol {Z}}^{\mathrm{c}}_0 = \mathrm{Conv}({\boldsymbol {H}}^{\mathrm{c}}_0). \end{equation}
(10)
The contamination residual net (\(g^{\mathrm{c}}\)) aims to attend to the rainy and hazy regions, thereby highlighting the rain and haze components in the contamination residual feature maps.
Scene Residual Net. Since we can observe from the contamination residual (\({\boldsymbol {Y}}^{\mathrm{c}}\)) that it contains the scene information along with the rain and haze, we develop a scene residual net (\(g^{\mathrm{s}}\)), that can compensate for the removed scene information in the image. In doing so, the GCFF module is proposed to capture valuable global scene information of the image, and fuse features, as
\begin{equation} {\boldsymbol {G}}^{\mathrm{s}}_0 = \mathrm{GCFF}({\boldsymbol {F}}_3, {\boldsymbol {F}}_4). \end{equation}
(11)
As shown in Figure 3(b), \({\boldsymbol {F}}_3\) and \({\boldsymbol {F}}_4\) are first fused and then summarized into a global feature, as
\begin{equation} {\boldsymbol {m}}^{\mathrm{s}}_0 = \mathrm{GAP}({\boldsymbol {F}}_3 \oplus {\boldsymbol {F}}_4), \end{equation}
(12)
where \(\mathrm{GAP}\) indicates global average pooling and \({\boldsymbol {m}}^{\mathrm{s}}_0\) indicates the resultant vector. Then, a two-layer convolutional block is used to modulate each element of the global feature \({\boldsymbol {m}}^{\mathrm{s}}_0\), written as
\begin{equation} {\boldsymbol {w}}^{\mathrm{s}}_0 = \sigma \left(\mathrm{BN}\left(\mathrm{Conv}\left(\mathrm{ReLU}\left(\mathrm{BN}\left(\mathrm{Conv}\left({\boldsymbol {m}}^{\mathrm{s}}_0\right)\right)\right)\right)\right)\right)\!. \end{equation}
(13)
We can thereby fuse the input feature maps as
\begin{equation} {\boldsymbol {G}}^{\mathrm{s}}_0 = \left({\boldsymbol {w}}^{\mathrm{s}}_0 \otimes {\boldsymbol {F}}_3 \right) \oplus \left(\left({\boldsymbol {i}} - {\boldsymbol {w}}_0^{\mathrm{s}}\right) \otimes {\boldsymbol {F}}_4 \right)\!, \end{equation}
(14)
where \({\boldsymbol {i}}\) indicates a unit vector with the same size as \({\boldsymbol {w}}^{\mathrm{s}}_0\).
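A minimal PyTorch sketch of the GCFF module, following Equations (12)-(14), is given below. As with the CFF sketch, the hidden width of the two-layer \(1\times 1\) convolutional block is an assumption.

```python
# Sketch of the global channel feature fusion (GCFF) module, following Eqs. (12)-(14).
# Global average pooling summarizes the fused maps into a per-channel descriptor
# before the two 1x1 conv layers; the hidden reduction ratio is an assumption.
import torch
import torch.nn as nn


class GCFF(nn.Module):
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        hidden = channels // reduction
        self.gap = nn.AdaptiveAvgPool2d(1)  # Eq. (12): global average pooling
        self.weight_net = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),  # BN over the (B, C, 1, 1) descriptor; assumes batch size > 1
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )

    def forward(self, f_a: torch.Tensor, f_b: torch.Tensor) -> torch.Tensor:
        m = self.gap(f_a + f_b)           # (B, C, 1, 1) global descriptor
        w = self.weight_net(m)            # Eq. (13): per-channel weights in (0, 1)
        return w * f_a + (1.0 - w) * f_b  # Eq. (14): weights broadcast over spatial dims
```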
After GCFF, we employ the SW-MSA to enhance the spatial interaction of the feature maps and obtain the scene residual features, described by
\begin{equation} {\boldsymbol {H}}^{\mathrm{s}}_0 = \mathrm{SW\text{-}MSA}({\boldsymbol {G}}^{\mathrm{s}}_0) \end{equation}
(15)
and
\begin{equation} {\boldsymbol {Z}}^{\mathrm{s}}_0 = \mathrm{Conv}({\boldsymbol {H}}^{\mathrm{s}}_0). \end{equation}
(16)
Instantiation of \({\boldsymbol {ADB}}_j\). The difference between \(\mathrm{ADB}_{j}, \ j \ne 0\) and \(\mathrm{ADB}_0\) is that \(\mathrm{ADB}_0\) receives two feature maps as input, while \(\mathrm{ADB}_j, \ j\ne 0\) takes three feature maps as input. To adapt the architecture of \(\mathrm{ADB}_0\) to \(\mathrm{ADB}_j, \ j \ne 0\), we make minor modifications (see Figure 2). Specifically, for any block \(\mathrm{ADB}_j\), its input includes the output of the \(\left(j-1\right)\)-th ADB block, i.e., \({\boldsymbol {Z}}^{\mathrm{c}}_{j-1}, {\boldsymbol {Z}}^{\mathrm{s}}_{j-1} \in \mathbb {R}^{d \times h\times w}\), and the output of the \(\left(3-j\right)\)-th convolutional encoder block, i.e., \({\boldsymbol {F}}_{3-j}\). We first concatenate \({\boldsymbol {Z}}^{\mathrm{c}}_{j-1}\) and \({\boldsymbol {Z}}^{\mathrm{s}}_{j-1}\), and reduce the dimension from \(2d \times h \times w\) to \(d \times h \times w\), as
\begin{equation} \bar{{\boldsymbol {Z}}}_{j-1} = \mathrm{Concat}({\boldsymbol {Z}}^{\mathrm{c}}_{j-1}, {\boldsymbol {Z}}^{\mathrm{s}}_{j-1}) \end{equation}
(17)
and
\begin{equation} \tilde{{\boldsymbol {Z}}}^{\mathrm{c}}_{j-1} = \mathrm{Conv_{in}}(\bar{{\boldsymbol {Z}}}_{j-1}),~\tilde{{\boldsymbol {Z}}}^{\mathrm{s}}_{j-1} = \mathrm{Conv_{in}}(\bar{{\boldsymbol {Z}}}_{j-1}), \end{equation}
(18)
where \(\mathrm{Conv_{in}}\) indicates a two-layer convolution block with batch normalization and rectified linear unit activation. Here, the kernel size of convolution layers is \(3 \times 3\).
With \({\boldsymbol {F}}_{3-j}\), the output of \(\mathrm{ADB}_j\) can be obtained as
\begin{equation} \begin{split}{\boldsymbol {Z}}^{\mathrm{c}}_j &= g^{\mathrm{c}}\left(\tilde{{\boldsymbol {Z}}}^{\mathrm{c}}_{j-1} ,{\boldsymbol {F}}_{3-j}\right) \\ &= \mathrm{Conv_{out}}\left(\mathrm{W\text{-}MSA}\left(\mathrm{CFF}\left(\tilde{{\boldsymbol {Z}}}^{\mathrm{c}}_{j-1} ,{\boldsymbol {F}}_{3-j}\right)\right)\right) \end{split} \end{equation}
(19)
and
\begin{equation} \begin{split}{\boldsymbol {Z}}^{\mathrm{s}}_j &= g^{\mathrm{s}}(\tilde{{\boldsymbol {Z}}}^{\mathrm{s}}_{j-1} ,{\boldsymbol {F}}_{3-j}) \\ &= \mathrm{Conv_{out}}\left(\mathrm{SW\text{-}MSA}\left(\mathrm{GCFF}\left(\tilde{{\boldsymbol {Z}}}^{\mathrm{s}}_{j-1} ,{\boldsymbol {F}}_{3-j}\right)\right)\right)\!. \end{split} \end{equation}
(20)
Here, \(\mathrm{Conv_{out}}\) indicates a convolution layer with the kernel size of \(3 \times 3\) followed by a leaky rectified linear unit and another convolution layer with the kernel size of \(1 \times 1\) also followed by a leaky rectified linear unit.
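The following PyTorch sketch illustrates the wiring of \(\mathrm{ADB}_j\) (\(j>0\)) described by Equations (17)-(20). The W-MSA and SW-MSA modules are abstracted as injected attention modules (e.g., taken from a Swin Transformer implementation), the bilinear upsampling of the incoming streams to the skip-connection resolution is an assumption made here for shape compatibility, and the channel widths only loosely follow Table 1; it is a sketch, not the authors' implementation.

```python
# Sketch of ADB_j (j > 0): concatenate the two incoming streams, project them back to
# d channels with two separate Conv_in blocks, then run the two asymmetric branches
# (Eqs. (17)-(20)). `window_attn` / `shifted_window_attn` stand in for W-MSA / SW-MSA.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_in(c_in: int, c_out: int) -> nn.Sequential:
    # two 3x3 conv layers, each followed by batch normalization and ReLU
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )


def conv_out(c_in: int, c_out: int) -> nn.Sequential:
    # a 3x3 conv + LeakyReLU followed by a 1x1 conv + LeakyReLU
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.LeakyReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 1), nn.LeakyReLU(inplace=True),
    )


class ADBj(nn.Module):
    def __init__(self, d: int, cff: nn.Module, gcff: nn.Module,
                 window_attn: nn.Module, shifted_window_attn: nn.Module):
        super().__init__()
        self.conv_in_c = conv_in(2 * d, d)  # Eq. (18): reduce concat(Z^c, Z^s) to d channels
        self.conv_in_s = conv_in(2 * d, d)
        self.cff, self.gcff = cff, gcff
        self.w_msa, self.sw_msa = window_attn, shifted_window_attn
        self.conv_out_c = conv_out(d, d)
        self.conv_out_s = conv_out(d, d)

    def forward(self, z_c, z_s, f_skip):
        # Upsample the incoming streams to the skip-connection resolution
        # (the choice of bilinear interpolation is an assumption).
        size = f_skip.shape[-2:]
        z_c = F.interpolate(z_c, size=size, mode="bilinear", align_corners=False)
        z_s = F.interpolate(z_s, size=size, mode="bilinear", align_corners=False)
        z = torch.cat([z_c, z_s], dim=1)                     # Eq. (17)
        z_c_t, z_s_t = self.conv_in_c(z), self.conv_in_s(z)  # Eq. (18)
        z_c_out = self.conv_out_c(self.w_msa(self.cff(z_c_t, f_skip)))    # Eq. (19)
        z_s_out = self.conv_out_s(self.sw_msa(self.gcff(z_s_t, f_skip)))  # Eq. (20)
        return z_c_out, z_s_out
```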
In this work, we propose a novel architecture for the rain and haze removal task. Considering the network capacity and hardware overhead, we provide two sizes of the network: a light version, called ADU-Net, and a large version, called ADU-Net-plus. The details and performance of the two architectures are presented in Section 4.
Remark 1.
The residual U-Net architecture has been used extensively for rain or haze removal tasks [6], as shown in Figure 4(a). Observing that the contamination residual produced by the decoder contains scene information, we aim to develop a dual-decoder U-Net, with one decoder producing the contamination residual and the other producing the scene residual as a scene compensator. Its initial design is shown in Figure 4(b). Considering the physical properties of the contamination and the scene information in the input image, we propose a novel network architecture, ADU-Net, where we integrate two decoders with non-identical architectures (see Figure 4(c)). We justify our design in Section 4.4.
Fig. 4. Schematic comparison of the ADU-Net architecture and U-Net-based architectures. (a) is a vanilla architecture of the residual U-Net. (b) is a simple form of the residual U-Net with dual decoders. (c) is the diagram of our method.

4 Experiments

In this section, we first give the implementation details of the proposed ADU-Net and ADU-Net-plus. Then, the benchmark datasets and evaluation protocol are also introduced. We further compare our network to the state-of-the-art methods and conduct ablation studies to evaluate the superiority of the proposed network and each component. In the final part, we demonstrate substantial qualitative results to analyze the superior performance of our network.

4.1 Implementation Details

Network Architecture. The overall neural architecture of the proposed network is shown in Figure 2. Table 1 lists the kernel sizes of the convolutional layers. In the encoder, the feature maps are processed by Batch Normalization [22] and ReLU [1] after each convolutional layer, i.e., \(\mathrm{Conv_0}\), \(\mathrm{Conv_1}\), \(\mathrm{Conv_2}\), \(\mathrm{Conv_3}\), and \(\mathrm{Conv_4}\), and a max-pooling layer is employed to down-sample the feature maps between stages. In the decoder, we also list the kernel sizes of the convolutional layers (see Table 1) and employ the Leaky ReLU as the activation function. Having computational efficiency in mind, we develop two neural networks of different scales: the light one is denoted as ADU-Net, while the large one is denoted as ADU-Net-plus. As shown in Table 1, the two networks differ only in their channel dimensions. The superiority of our network will be evaluated in Section 4.3.
Table 1.
Layer name | Output size | ADU-Net | ADU-Net-plus
\(\mathrm{Conv_0}\) | \(H \times W\) | \(\left[\begin{array}{l}3 \times 3,32 \\ 3 \times 3,32\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,64 \\ 3 \times 3,64\end{array}\right]\)
\(\mathrm{Conv_1}\) | \(\frac{H}{2} \times \frac{W}{2}\) | \(\left[\begin{array}{l}3 \times 3,64 \\ 3 \times 3,64\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,128 \\ 3 \times 3,128\end{array}\right]\)
\(\mathrm{Conv_2}\) | \(\frac{H}{4} \times \frac{W}{4}\) | \(\left[\begin{array}{l}3 \times 3,128 \\ 3 \times 3,128\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,256 \\ 3 \times 3,256\end{array}\right]\)
\(\mathrm{Conv_3}\) | \(\frac{H}{8} \times \frac{W}{8}\) | \(\left[\begin{array}{l}3 \times 3,256 \\ 3 \times 3,256\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,512 \\ 3 \times 3,512\end{array}\right]\)
\(\mathrm{Conv_4}\) | \(\frac{H}{16} \times \frac{W}{16}\) | \(\left[\begin{array}{l}3 \times 3,256 \\ 3 \times 3,256\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,512 \\ 3 \times 3,512\end{array}\right]\)
\(\mathrm{ADB_0}\) | \(\frac{H}{8} \times \frac{W}{8}\) | \(\left[\begin{array}{l}3 \times 3,128 \\ 3 \times 3,128\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,256 \\ 3 \times 3,256\end{array}\right]\)
\(\mathrm{ADB_1}\) \(\mathrm{Conv_{in}}\) | \(\frac{H}{4} \times \frac{W}{4}\) | \(\left[\begin{array}{l}3 \times 3,128 \\ 3 \times 3,128\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,256 \\ 3 \times 3,256\end{array}\right]\)
\(\mathrm{ADB_1}\) \(\mathrm{Conv_{out}}\) | \(\frac{H}{4} \times \frac{W}{4}\) | \(\left[\begin{array}{l}3 \times 3,64 \\ 3 \times 3,64\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,128 \\ 3 \times 3,128\end{array}\right]\)
\(\mathrm{ADB_2}\) \(\mathrm{Conv_{in}}\) | \(\frac{H}{2} \times \frac{W}{2}\) | \(\left[\begin{array}{l}3 \times 3,64 \\ 3 \times 3,64\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,128 \\ 3 \times 3,128\end{array}\right]\)
\(\mathrm{ADB_2}\) \(\mathrm{Conv_{out}}\) | \(\frac{H}{2} \times \frac{W}{2}\) | \(\left[\begin{array}{l}3 \times 3,32 \\ 3 \times 3,32\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,64 \\ 3 \times 3,64\end{array}\right]\)
\(\mathrm{ADB_3}\) \(\mathrm{Conv_{in}}\) | \(H \times W\) | \(\left[\begin{array}{l}3 \times 3,32 \\ 3 \times 3,32\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,64 \\ 3 \times 3,64\end{array}\right]\)
\(\mathrm{ADB_3}\) \(\mathrm{Conv_{out}}\) | \(H \times W\) | \(\left[\begin{array}{l}3 \times 3,16 \\ 3 \times 3,16\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,32 \\ 3 \times 3,32\end{array}\right]\)
\(\mathrm{Conv_5}\) | \(H \times W\) | \(\left[\begin{array}{l}3 \times 3,3 \\ 3 \times 3,3\end{array}\right]\) | \(\left[\begin{array}{l}3 \times 3,3 \\ 3 \times 3,3\end{array}\right]\)
Parameter size | | \(6.63\times 10^6\) | \(26.45 \times 10^6\)
Table 1. Details of the Kernel Size in Convolution Layers
\(H\) and \(W\) denote the height and width of the input image, respectively.
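As an illustration, the sketch below builds the encoder \(f_{\mathrm{E}}\) with the ADU-Net channel widths from Table 1 (32, 64, 128, 256, 256); ADU-Net-plus doubles every encoder width. The placement of the max-pooling operations is inferred from the output sizes in Table 1, and the remaining details are assumptions rather than the authors' exact implementation.

```python
# Sketch of the convolutional encoder f_E with the ADU-Net widths from Table 1.
import torch
import torch.nn as nn


def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    # two 3x3 convs, each followed by BatchNorm and ReLU (encoder convention)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )


class Encoder(nn.Module):
    def __init__(self, widths=(32, 64, 128, 256, 256), in_channels: int = 3):
        super().__init__()
        chans = [in_channels] + list(widths)
        self.blocks = nn.ModuleList(
            [conv_block(chans[i], chans[i + 1]) for i in range(len(widths))]
        )
        self.pool = nn.MaxPool2d(2)  # halves the resolution between successive blocks

    def forward(self, x: torch.Tensor):
        feats = []
        for i, block in enumerate(self.blocks):
            if i > 0:
                x = self.pool(x)  # Conv_1..Conv_4 operate at H/2, H/4, H/8, H/16
            x = block(x)
            feats.append(x)       # F_0 ... F_4
        return feats
```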
Network Training. We implement our method using the PyTorch deep learning package [37]. All experiments are run on NVIDIA RTX 2080 Ti GPUs. In the experiments on the RainCityscapes [21], BID Rain [18], and NH-HAZE [2] datasets, the input images are resized to \(512 \times 256\). For SPA-Data, we follow the practice in [49], which uses original images of size \(256 \times 256\). The Adam optimizer with an initial learning rate of 0.001 is used to optimize the network. We train the network for 100 epochs on the RainCityscapes and BID Rain datasets and 20 epochs on SPA-Data. A learning-rate decay strategy is employed, where the learning rate is decayed by a factor of 0.1 when the accuracy of the network does not improve for five epochs.
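A hedged sketch of this training setup is shown below. The model, data loader, and validation routine are placeholders, and monitoring the validation PSNR for the plateau-based decay is an assumption (the text only says "accuracy").

```python
# Sketch of the optimization setup: Adam (lr = 0.001) with learning-rate decay by a
# factor of 0.1 when the monitored metric stops improving for five epochs.
# `model`, `train_loader`, `evaluate_psnr`, and `negative_ssim_loss` are placeholders.
import torch


def train(model, train_loader, evaluate_psnr, negative_ssim_loss,
          epochs: int = 100, device: str = "cuda"):
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.1, patience=5)  # decay on plateaued validation PSNR
    for epoch in range(epochs):
        model.train()
        for rainy, gt in train_loader:
            rainy, gt = rainy.to(device), gt.to(device)
            restored = model(rainy)
            loss = negative_ssim_loss(restored, gt)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        psnr = evaluate_psnr(model)  # validation metric monitored for the decay
        scheduler.step(psnr)
```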

4.2 Datasets and Evaluation Protocol

We evaluate the proposed methods on two synthetic datasets, i.e., RainCityscapes [21] and BID Rain [18], and two real-world datasets, i.e., SPA-Data [49] and NH-HAZE [2]. In the following, we introduce these datasets; the statistics of each dataset are summarized in Table 2.
Table 2.
Dataset | Train set | Test set | Synthetic | Real world | Rain streaks | Haze | Snow | Raindrops
RainCityscapes | 9,432 | 1,188 | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | |
BID Rain | 2,975 | 500*6 | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\)
SPA-Data | 638,492 | 1,000 | | \(\checkmark\) | \(\checkmark\) | | |
NH-HAZE | 40 | 15 | | \(\checkmark\) | | \(\checkmark\) | |
Table 2. The Statistics of Datasets
RainCityscapes. The RainCityscapes dataset is synthesized from the Cityscapes dataset [11]. It takes 9,432 images synthesized from 262 Cityscapes images as the training set and 1,188 images synthesized from 33 Cityscapes images as the test set. All the selected images of Cityscapes are overcast, without obvious shadow. Rain streaks and haze are synthesized by different intensity maps. By adjusting the intensity of the rain streaks and haze, each original image can produce 36 different synthesized images. The results of different methods are reported in Table 3.
Table 3.
Category | Method | PSNR | SSIM
 | Input | 15.55 | 0.7722
Haze removal | EPDN‡ [39] | 26.08 | 0.9306
 | DCPDN‡ [57] | 28.52 | 0.9277
 | AECRNet† [52] | 28.77 | 0.9350
Rain removal | RESCAN‡ [30] | 24.49 | 0.8852
 | PReNet† [40] | 27.34 | 0.9497
 | DuRN‡ [33] | 29.43 | 0.9487
 | RCDNet† [48] | 30.56 | 0.8873
 | SPANet‡ [49] | 31.48 | 0.9656
 | MPRNet† [56] | 32.33 | 0.9767
Rain and haze removal | DAF-Net† [20] | 30.16 | 0.9531
 | DGNL-Net† [21] | 32.38 | 0.9743
 | TransWeather† [47] | 29.28 | 0.9216
 | GTRain† [3] | 30.19 | 0.9597
 | WiperNet† [26] | 30.21 | 0.9584
 | ADU-Net | 33.83 | 0.9784
 | ADU-Net-plus | 34.64 | 0.9805
Table 3. Comparison with the State-of-the-Arts Methods of Rain Removal and Haze Removal on RainCityscapes Dataset
†indicates the network was trained on the RainCityscapes dataset. ‡ indicates the results of the algorithms as reported in [21]. \(1^{\mathrm{st}}/2^{\mathrm{nd}}\) best in red/blue.
BID Rain. The BID Rain dataset is also synthesized from the Cityscapes dataset. It takes the 2,975 images of the Cityscapes training set as the training set and the 500 images of the Cityscapes validation set as its test set. This is a complicated dataset, as the images contain rain streaks, haze, snow, and raindrops. The rain streak masks are sampled from Rain100L and Rain100H [53], and the snow masks are sampled from Snow100K [34]. The haze masks include three different intensities originating from Foggy Cityscapes [44]. The raindrops are produced with the metaball model [4]. These weather components are mixed with the images in the Cityscapes dataset using the physical imaging models [4, 19, 34, 44, 53]. In the training set, every image can be mixed with each weather component with random probability, and we evaluate our model in six different cases; the combinations of the weather components in each case are as follows: (1) rain streaks; (2) rain streaks and snow; (3) rain streaks and light haze; (4) rain streaks and heavy haze; (5) rain streaks, moderate haze, and raindrops; and (6) rain streaks, snow, moderate haze, and raindrops. Refer to [18] for more details of the six settings. The results of the different cases are shown in Table 4.
Table 4.
Case | Metric | Input | PReNet | RCDNet | BIDNet | TransWeather | GTrain | ADU-Net | ADU-Net-plus
(1) | PSNR | 25.51 | 32.69 | 28.05 | 31.17 | 31.88 | 31.91 | 34.62 | 39.05
(1) | SSIM | 0.8144 | 0.9803 | 0.9527 | 0.9438 | 0.9307 | 0.9596 | 0.9827 | 0.9877
(2) | PSNR | 18.69 | 30.52 | 29.84 | 29.47 | 29.37 | 30.03 | 32.47 | 36.48
(2) | SSIM | 0.5979 | 0.9504 | 0.9351 | 0.9089 | 0.8844 | 0.9178 | 0.9560 | 0.9742
(3) | PSNR | 17.48 | 29.65 | 30.17 | 28.90 | 29.46 | 30.14 | 31.48 | 33.75
(3) | SSIM | 0.7427 | 0.9568 | 0.9536 | 0.9325 | 0.9176 | 0.9470 | 0.9669 | 0.9777
(4) | PSNR | 11.55 | 25.80 | 26.74 | 26.82 | 27.51 | 27.33 | 26.52 | 29.30
(4) | SSIM | 0.6017 | 0.9233 | 0.9210 | 0.9125 | 0.8949 | 0.9222 | 0.9360 | 0.9565
(5) | PSNR | 14.02 | 27.36 | 28.30 | 27.31 | 26.94 | 27.74 | 28.54 | 30.32
(5) | SSIM | 0.6455 | 0.9302 | 0.9285 | 0.9116 | 0.8833 | 0.9191 | 0.9443 | 0.9594
(6) | PSNR | 12.38 | 26.56 | 27.26 | 26.54 | 26.22 | 26.85 | 27.63 | 29.66
(6) | SSIM | 0.4916 | 0.9046 | 0.9005 | 0.8675 | 0.8504 | 0.8857 | 0.9222 | 0.9418
Table 4. Comparison with the State-of-the-Arts Methods on BID Rain Dataset
†indicates the network was trained on the BID Rain dataset. \(1^{\mathrm{st}}/2^{\mathrm{nd}}\) best in red/blue.
SPA-Data. The SPA-Data is a real-world dataset, which is cropped from 170 real rain videos, of which 86 videos are collected from StoryBlocks or YouTube, and 84 videos are captured by iPhone X or iPhone 6SP. Those videos cover outdoor fields, suburb scenes, and common urban scenes. This dataset contains 638,492 image pairs for training and 1,000 for testing. The results of SPA-Data are shown in Table 5.
Table 5.
Method | PSNR | SSIM
Input | 34.15 | 0.9269
RESCAN‡ [30] | 38.19 | 0.9707
PReNet‡ [40] | 40.16 | 0.9816
SPANet‡ [49] | 40.24 | 0.9811
RCDNet‡ [48] | 41.47 | 0.9834
TransWeather† [47] | 38.31 | 0.9757
WiperNet† [26] | 41.73 | 0.9905
ADU-Net | 44.19 | 0.9885
ADU-Net-plus | 46.04 | 0.9924
Table 5. Comparison with the State-of-the-Arts Methods on SPA-Data Dataset
‡indicates the results of the algorithms as reported in [48]. †indicates the network was trained on the SPA-data. \(1^{\mathrm{st}}/2^{\mathrm{nd}}\) best in red/blue.
NH-HAZE. The NH-HAZE [2] is a valuable dataset for non-homogeneous haze research, as it offers ground truth images for evaluation. The dataset comprises 55 pairs of real-world outdoor scenes, where each pair consists of a hazy image and its corresponding haze-free counterpart. The non-homogeneous haze in these images was generated using a professional haze generator, ensuring an accurate representation of real-life haze conditions. The results on the NH-HAZE dataset are presented in Table 6.
Table 6.
Method | PSNR | SSIM
Input | 11.48 | 0.4023
DehazeNet‡ [5] | 16.62 | 0.524
FFA-Net‡ [38] | 19.87 | 0.692
MSBDN‡ [12] | 19.23 | 0.706
AECRNet‡ [52] | 19.88 | 0.717
DehazeFormer-S‡ [45] | 20.47 | 0.731
ADU-Net | 28.37 | 0.887
ADU-Net-plus | 29.46 | 0.890
Table 6. Comparison with the State-of-the-Arts Methods on NH-HAZE Dataset
‡indicates the results of the algorithms as reported in [45]. †indicates the network was trained on the NH-HAZE dataset. \(1^{\mathrm{st}}/2^{\mathrm{nd}}\) best in red/blue.
Evaluation Protocol. In our experiments, the network performance is quantitatively evaluated by the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics. A higher value of PSNR and SSIM indicates a better image recovery performance of the network.
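The sketch below shows how the two metrics can be computed per image pair with scikit-image (assuming a recent version that exposes `peak_signal_noise_ratio`, `structural_similarity`, and the `channel_axis` argument); it is illustrative and not necessarily the exact evaluation code used for the reported numbers.

```python
# Sketch of the evaluation protocol: per-pair PSNR and SSIM with scikit-image.
# Images are assumed to be HxWx3 float arrays scaled to [0, 1].
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def evaluate_pair(restored: np.ndarray, gt: np.ndarray) -> tuple:
    psnr = peak_signal_noise_ratio(gt, restored, data_range=1.0)
    ssim = structural_similarity(gt, restored, data_range=1.0, channel_axis=-1)
    return psnr, ssim  # higher values indicate better restoration
```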

4.3 Comparison to the State-of-the-Arts

To verify the advantages of our method, we compare its performance with current state-of-the-art methods across four benchmark datasets.
RainCityscapes. On the RainCityscapes dataset, we compare our methods to the state-of-the-art rain removal methods, including RESCAN [30], PReNet [40], DuRN [33], RCDNet [48], SPANet [49], and MPRNet [56]. We also compare our methods with approaches that jointly remove rain and haze, i.e., DAF-Net [20], DGNL-Net [21], WiperNet [26], TransWeather [47], and GTRain [3]. A comparison with haze removal methods, namely EPDN [39], DCPDN [57], and AECR-Net [52], is also conducted. The results are reported in Table 3. We can find that our vanilla solution, i.e., ADU-Net, outperforms the existing state-of-the-art methods. In particular, it improves the PSNR/SSIM values of DGNL-Net by 1.45/0.0041, indicating the superior design of our method. The plus version of our method, i.e., ADU-Net-plus, brings a further performance gain over ADU-Net, improving the PSNR/SSIM values by another 0.81/0.0021.
BID Rain. Since the scenes in the RainCityscapes dataset only contain rain and haze, we further evaluate our methods on the more challenging BID Rain dataset to verify their generalization to complicated weather conditions. Table 4 compares the model performance in each weather condition. We can observe that the proposed ADU-Net outperforms BIDeN [18] in each of the cases. Especially in cases (2) and (3), the ADU-Net brings a large performance gain. One possible explanation is that the proposed ADU-Net is designed with a dual-branch decoder, which is well suited to the images in case (2), including rain streaks and snow, and those in case (3), including rain streaks and light haze. The improvements in the other cases reveal the generalization of our proposal. Along with the ADU-Net, its plus version further improves both PSNR and SSIM values significantly, showing the superiority of our network architecture. In case (4), the performance of ADU-Net is lower than that of RCDNet [48], BIDeN [18], TransWeather [47], and GTrain [3]. One possible explanation is that the “heavy haze” covers the scenes, which makes it difficult for our network to produce the scene residual. Nevertheless, this issue is addressed by increasing the parameter size, as supported by the performance of ADU-Net-plus.
SPA-Data. We also evaluate our methods on the large-scale SPA-Data dataset. We compare our methods to the existing state-of-the-art methods in Table 5, including RESCAN [30], PReNet [40], SPANet [49], RCDNet [48], and WiperNet [26]. As shown in Table 5, the proposed methods outperform the existing methods by a large margin. For example, the improvements read 2.72/0.0051 (PSNR/SSIM) for ADU-Net and 4.57/0.0090 for ADU-Net-plus, as compared to RCDNet, showing the strong performance of our network architecture. Although ADU-Net exhibits a slightly lower SSIM than WiperNet, by 0.0020, its advantage in PSNR, by 2.46, highlights its excellent performance. Furthermore, ADU-Net-plus outperforms WiperNet in both PSNR and SSIM, by 4.31 and 0.0019, respectively. These findings affirm the robustness and efficacy of our proposed methods for image rain removal in real-world scenarios.
NH-HAZE. To showcase the effectiveness of our approach, we conducted experiments using the real-world NH-HAZE dataset [2]. In Table 6, we compare our methods with state-of-the-art techniques, including MSBDN [12], FFA-Net [38], AECRNet [52], and DehazeFormer [45]. The results, as presented in Table 6, demonstrate significant performance improvements with our proposed methods surpassing existing approaches by a wide margin. For instance, when compared to DehazeFormer-S, ADU-Net and ADU-Net-plus exhibit remarkable improvements of 7.9/0.156 (PSNR/SSIM) and 8.99/0.159, respectively. These outcomes highlight the strong performance of our network architecture.
Comparison of Model Complexity and Time Cost. In addition to analyzing PSNR and SSIM, we compare the complexity of our methods with the existing state-of-the-art methods in Table 7, including PReNet [40], RCDNet [48], AECRNet [52], and DGNL-Net [21]. We employ GFLOPs (Giga Floating Point Operations) to quantify the complexity of the model and assess the runtime cost by the average training time per epoch (s/epoch). It is evident from the table that ADU-Net stands out with the smallest GFLOPs and the second shortest s/epoch. Although our model's runtime is slightly higher than that of DGNL-Net [21], it delivers superior performance on both PSNR and SSIM. In our proposed methods, the convolutional layers account for most of the overall computational load. ADU-Net halves the convolutional channel widths of ADU-Net-plus while retaining about 95% of its performance. This significant reduction in complexity makes ADU-Net a more efficient and practical choice for various applications. However, in cases where computational resources are abundant and time is not a constraint, choosing ADU-Net-plus to achieve better performance is also a viable option.
Table 7.
Model | GFLOPs | s/epoch | PSNR | SSIM
PReNet [40] | 132.88 | 2145.45 | 27.34 | 0.9497
RCDNet [48] | 48.46 | 3509.27 | 30.56 | 0.8873
AECRNet [52] | 86.09 | 1113.68 | 28.77 | 0.9350
DGNL-Net [21] | 39.53 | 837.28 | 32.38 | 0.9743
ADU-Net | 31.65 | 1017.21 | 33.83 | 0.9784
ADU-Net-plus | 125.71 | 1126.49 | 34.64 | 0.9805
Table 7. Model Complexity and Runtime Cost
The PSNR/SSIM are the results of the RainCityscapes dataset. \(1^{\mathrm{st}}/2^{\mathrm{nd}}\) best in red/blue.

4.4 Ablation Study

In this section, we conduct thorough ablation studies to verify the effectiveness per component in the proposed network. All studies in this section are conducted using ADU-Net on the RainCityscapes dataset.
Loss Function. In our implementation, the network is optimized by the negative SSIM loss, i.e., \(\mathcal {L}_{\mathrm{SSIM}}\), while many low-level computer vision works employ the MSE loss, i.e., \(\mathcal {L}_{\mathrm{MSE}}\) [14]. In this study, we evaluate the effectiveness of each loss function. As shown in Table 8, the network trained with either loss alone performs well on our rain and haze removal task, with the negative SSIM loss giving slightly better results. However, the multi-task training that optimizes the two loss functions jointly does not bring further gains and slightly degrades the performance compared with the SSIM loss alone, indicating that the network may already be saturated with a single loss function.
Table 8.
Loss function | \(\mathcal {L}_{\mathrm{MSE}}\) | \(\mathcal {L}_{\mathrm{SSIM}}\) | \(\mathcal {L}_{\mathrm{MSE}} + \mathcal {L}_{\mathrm{SSIM}}\)
PSNR | 33.17 | 33.83 | 33.74
SSIM | 0.9720 | 0.9784 | 0.9774
Table 8. Comparison of the Effectiveness of Loss Functions
We use bold to indicate the best result.
Effect of Dual-branch Architecture. Our work proposes a dual-branch architecture, i.e., the asymmetric dual-decoder U-Net, for the rain and haze removal task. In this study, we justify the effectiveness of the dual-branch design (see Figure 4). Table 9 shows the empirical comparison of three architectures, i.e., the Residual U-Net, the Dual-decoder U-Net, and the proposed ADU-Net. The “Residual U-Net” represents the structure shown in Figure 4(a), while the “Dual-decoder U-Net” represents the structure depicted in Figure 4(b). Table 9 verifies that our design is reasonable: the Dual-decoder U-Net outperforms the vanilla Residual U-Net, and our ADU-Net brings a further performance gain over the Dual-decoder U-Net.
Table 9.
Model | PSNR | SSIM
Residual U-Net | 31.64 | 0.9712
Dual-decoder U-Net | 32.26 | 0.9724
ADU-Net | 33.83 | 0.9784
Table 9. Effect of Dual-branch Architecture in Rain and Haze Removal
We use bold to indicate the best result.
The above study shows our design flow is reasonable. We further evaluate the effectiveness of the contamination residual branch and scene residual branch in ADU-Net (see the results in Table 10). As compared to the Residual U-Net, each branch can improve its performance, showing the effectiveness of the proposed residual branch. Also, we can observe that the combination of the proposed residual branches can achieve further improvement, indicating that those two decoders learn complementary features of the image. In Table 10, the first row, “Residual U-Net,” is the same as in Table 9. The second row, “+Contamination residual branch,” represents the model where the decoder of Residual U-Net is replaced with the Contamination residual branch, and similarly, the third row, “+Scene residual branch,” represents the model where the decoder of Residual U-Net is replaced with the Scene residual branch. From the experimental results, it can be observed that each branch contributes to the improvement of PSNR and SSIM. However, combining both branches in the model leads to greater improvement.
Table 10.
Model | PSNR | SSIM
Residual U-Net | 31.64 | 0.9712
+Contamination residual branch | 32.30 | 0.9725
+Scene residual branch | 32.94 | 0.9744
ADU-Net | 33.83 | 0.9784
Table 10. Effect of the Dual-branch Decoder in ADU-Net
We use bold to indicate the best result.
Effect of Self-attention Module. In Table 11, we demonstrate that using W-MSA and SW-MSA in the two decoders is superior to using either one alone. The first row (Dual-decoder U-Net) represents the architecture shown in Figure 4(b). The second row (+W-MSA) indicates that W-MSA is used in both decoders; compared to the first row, there is an improvement of 0.44/0.0037 in PSNR/SSIM, indicating that using W-MSA alone brings only a limited improvement. Similarly, the third row (+SW-MSA) indicates that SW-MSA is used in both decoders; compared to the first row, there is an improvement of 0.51/0.0036 in PSNR/SSIM, showing that using SW-MSA alone also brings a modest improvement. The fourth row (+W-MSA&SW-MSA) represents the utilization of W-MSA and SW-MSA in the two branches, respectively, which improves PSNR/SSIM by 0.74/0.0040 over the first row. It should be noted that in our experiments we saved the model with the best PSNR during training and did not specifically optimize for SSIM; therefore, we believe that a marginal decrease in SSIM does not necessarily indicate a decline in model performance. We acknowledge that the benefits brought by simultaneously using W-MSA and SW-MSA may not be large.
Table 11.
Model | PSNR | SSIM
Dual-decoder U-Net | 32.26 | 0.9724
+W-MSA | 32.70 | 0.9761
+SW-MSA | 32.77 | 0.9760
+W-MSA & SW-MSA | 33.00 | 0.9764
Table 11. Effect of Self-attention Module
We use bold to indicate the best result.
Effect of Feature Fusion Module. In the proposed ADU-Net architecture, each decoder block has two information flows, respectively encoding the contamination residual and the scene residual (see Figure 2). Each information flow performs feature fusion tailored to the physical properties it targets. In this study, we evaluate this design. Table 12 shows ablations on the effectiveness of the feature fusion blocks. Each of CFF and GCFF alone improves the accuracy by about 0.2 in PSNR. However, combining the two blocks brings a further gain of around 0.6 in PSNR on top of the individual one. This verifies the effectiveness of the feature fusion blocks in our design.
Table 12.
Model | PSNR | SSIM
Dual-decoder U-Net | 32.26 | 0.9724
w/o GCFF & CFF | 33.00 | 0.9764
+CFF | 33.25 | 0.9770
+GCFF | 33.21 | 0.9773
ADU-Net | 33.83 | 0.9784
Table 12. Effect of Feature Fusion Module
We use bold to indicate the best result.

4.5 Application: Semantic Segmentation

To demonstrate the effectiveness of our approach for application, we conducted an evaluation of our approach using the RainCityscapes dataset, chosen for its comprehensive assessment of overall image restoration. The experimental results are presented in Table 13. As a baseline, we used DeepLabV3 [8] and performed semantic segmentation on the RainCityscapes dataset. The evaluation metrics included \(IoU_{{class}}\) (Intersection-over-Union for classes), \(iIoU_{class}\) (instance-level Intersection-over-Union for classes), \(IoU_{{category}}\) (Intersection-over-Union for categories), \(iIoU_{category}\) (instance-level Intersection-over-Union for categories), and accuracy.
Table 13.
Model | \(IoU_{\text{class}}\) | \(iIoU_{\text{class}}\) | \(IoU_{\text{category}}\) | \(iIoU_{\text{category}}\) | Accuracy
Rainy Images | 0.3649 | 0.1116 | 0.6293 | 0.3045 | 0.7823
Rain-free Images (ADU-Net) | 0.4696 | 0.1958 | 0.7534 | 0.4941 | 0.8577
Rain-free Images (ADU-Net-plus) | 0.4724 | 0.1951 | 0.7561 | 0.5005 | 0.8589
Rain-free Images (ground truth) | 0.4841 | 0.2039 | 0.7644 | 0.5265 | 0.8625
Table 13. Effect of ADU-Net on Semantic Segmentation
\(1^{\mathrm{st}}/2^{\mathrm{nd}}\) best in red/blue.
We conducted four types of experiments: original rainy images from the RainCityscapes dataset (Rainy Images), rainy images derained by ADU-Net (Rain-free Images (ADU-Net)), rainy images derained by ADU-Net-plus (Rain-free Images (ADU-Net-plus)), and ground truth images from the RainCityscapes dataset (Rain-free Images (ground truth)). As shown in Table 13, the utilization of ADU-Net for removing rain and haze from the images resulted in improvements across various evaluation metrics. The \(IoU_{{class}}\) metric showed a notable improvement of 0.1047, while the \(iIoU_{{class}}\) increased by 0.0842. Furthermore, the \(IoU_{{category}}\) experienced a significant boost of 0.1241, and the \(iIoU_{{category}}\) demonstrated an even more substantial enhancement of 0.1896. Additionally, the accuracy metric showed a notable increase of 0.0754. Similar positive advancements were observed for ADU-Net-plus across all metrics.
It is noteworthy that the results of the images derained by ADU-Net (Rain-free Images (ADU-Net)) and by ADU-Net-plus (Rain-free Images (ADU-Net-plus)) are comparable. Considering that the two restorers differ by only 0.81/0.0021 in PSNR/SSIM, this indicates that once image restoration reaches a certain level, further restoration gains bring limited benefit to downstream semantic segmentation models, and further improvements are necessary to achieve better results.
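For illustration, a hedged sketch of this evaluation step is given below: it runs a DeepLabV3 model from torchvision on a restored image and produces the per-pixel class map from which the IoU metrics are computed. The weights, the 19-class Cityscapes label configuration, and the preprocessing are assumptions; this is not the exact pipeline used for Table 13.

```python
# Illustrative sketch: segment a restored (derained) image with a DeepLabV3 model
# and obtain the per-pixel class predictions used for IoU-style evaluation.
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights=None, num_classes=19)  # 19 Cityscapes classes (assumption)
model.eval()

restored = torch.rand(1, 3, 256, 512)       # a restored image tensor (placeholder input)
with torch.no_grad():
    logits = model(restored)["out"]          # (1, 19, H, W) class logits
pred = logits.argmax(dim=1)                  # per-pixel class map compared against labels
```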

4.6 Visualization

Along with the quantitative analysis above, we further conduct a qualitative analysis to verify the superiority of our work. We first compare the rain and haze removal performance of our method against existing SOTA methods on synthetic data (see Figure 5), and then evaluate various real-world outdoor scenes (see Figure 6). The generalization of the proposed ADU-Net is further examined on other contamination types, e.g., mixed rain, snow, and haze in Figure 7 and real-world rain in Figure 8.
Fig. 5. Visualization of contamination removal performance on the RainCityscapes. The first column (a) is the input image. We compare our method with state-of-the-art algorithms, including PReNet [40], AECR-Net [52], and DGNL-Net [21]. (f) is the ground truth.
Fig. 6. Visualization of contamination removal performance on real-world images with rain and haze. The first column (a) is the input image. We compare our method with state-of-the-art algorithms, including PReNet [40], AECR-Net [52], and DGNL-Net [21].
Fig. 7. Visualization of the contamination removal on the BID Rain dataset. The images in BID Rain are synthesized with rain streaks, raindrops, snow, and haze. The first row is the input image. The second row and third row are contamination residual and scene residual, respectively. The fourth row and fifth row are the clean image and ground truth, respectively.
Fig. 8. Visualization of the contamination removal on real-world rain images. The first row is the input image. The second row and third row are contamination residual and scene residual, respectively. The fourth row is the clean image.
The first study is conducted on the RainCityscapes dataset. We compare our method with state-of-the-art methods, including PReNet [40], AECR-Net [52], and DGNL-Net [21]. As shown in Figure 5, our method produces much clearer scene images (see the red box for details). For example, in the fourth row of Figure 5, our method removes most of the haze and recovers the clear shape of the tree branches, whereas the other methods fail to do so. This clearly shows the superiority of our method.
In the second study, we analyze real-world images1 used in [49] to assess the potential of our method in real scenarios. We again compare our method with PReNet, AECR-Net, and DGNL-Net. For a fair comparison, each method uses its publicly available weights trained on its own dataset. As observed in Figure 6, the scene images generated by our method are clearer and more realistic than those from the other methods. For example, compared with the rain-removal network PReNet, our method also removes the haze in real-world scenes. The hues recovered by our method are more realistic than those from the dehazing network AECR-Net, and our method preserves the reflective details of the scenes. Compared with DGNL-Net, the closest work to ours, our ADU-Net removes more rain streaks (second row) and haze (third row) while retaining more scene details (first row). This study vividly demonstrates the effectiveness of our method in real scenarios.
To demonstrate the generalization of our dual-decoder architecture in separating different contamination, we visualize the residuals produced by the two branches. Figure 7 shows the results of our method on the BID Rain dataset: the first row is the input image, the second and third rows present the contamination residual and scene residual, and the fourth and fifth rows show the generated image and the ground truth. Our method separates the contamination (e.g., snow or haze) from the scene clearly and produces high-quality scene images. A similar observation holds for the real-world images from Internet-Data in Figure 8. This verifies our motivation: most of the contamination components in the image are captured by the contamination residual, while the scene residual carries scene details such as building structures and driveway lines. The analysis again illustrates the strong generalization of the proposed method. A simplified sketch of how the two residuals can be composed into the restored image is given below.
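To make the residual visualization concrete, the following is a minimal sketch of how the two predicted residual maps might be composed into a restored image and normalized for display. It assumes the network returns a contamination residual and a scene residual that are simply added back to the input; the exact composition used by ADU-Net follows the formulation given earlier in the article, so this is an illustration only, with model as a placeholder.

import torch

@torch.no_grad()
def restore_and_visualize(model, rainy):
    """rainy: (1, 3, H, W) tensor in [0, 1]; `model` is assumed to return the
    contamination residual and the scene residual as two tensors of the same shape."""
    contamination_res, scene_res = model(rainy)
    # Assumed composition: add both residuals back to the input to obtain the clean image.
    clean = (rainy + contamination_res + scene_res).clamp(0.0, 1.0)

    def to_display(residual):
        # Shift and scale a signed residual map into [0, 1] so it can be saved as an image.
        shifted = residual - residual.amin()
        return shifted / shifted.amax().clamp(min=1e-8)

    return clean, to_display(contamination_res), to_display(scene_res)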

5 Conclusion

In this article, we propose ADU-Net, the first model with two residual branches for the joint rain and haze removal task. Unlike previous work that focuses on contamination removal only, ADU-Net also restores the scene information affected by the change of atmospheric light. By leveraging the proposed scene residual and contamination residual, ADU-Net produces clear scene images. Extensive experiments show that ADU-Net significantly outperforms current state-of-the-art approaches across three benchmark datasets and tasks. We believe our study will serve as a strong baseline and inspire further research on joint rain and haze removal.

Footnote

1
147 real rain images collected from the Internet.

References

[1]
Abien Fred Agarap. 2018. Deep learning using rectified linear units (relu). arXiv:1803.08375. https://arxiv.org/pdf/1803.08375
[2]
Codruta O. Ancuti, Cosmin Ancuti, and Radu Timofte. 2020. NH-HAZE: An image dehazing benchmark with non-homogeneous hazy and haze-free images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 444–445.
[3]
Yunhao Ba, Howard Zhang, Ethan Yang, Akira Suzuki, Arnold Pfahnl, Chethan Chinder Chandrappa, Celso M. de Melo, Suya You, Stefano Soatto, Alex Wong, and Achuta Kadambi. 2022. Not just streaks: Towards ground truth for single image deraining. In Proceedings of the 17th European Conference on Computer Vision (ECCV ’22), Tel Aviv, Israel, Part VII. Springer-Verlag, Berlin, 723–740.
[4]
James F. Blinn. 1982. A generalization of algebraic surface drawing. ACM Transactions on Graphics 1, 3 (July 1982), 235–256.
[5]
Bolun Cai, Xiangmin Xu, Kui Jia, Chunmei Qing, and Dacheng Tao. 2016. DehazeNet: An end-to-end system for single image haze removal. IEEE Transactions on Image Processing 25, 11 (November 2016), 5187–5198.
[6]
Chenghao Chen and Hao Li. 2021. Robust representation learning with feedback for single image deraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7738–7747.
[7]
Long Chen, Wujing Zhan, Wei Tian, Yuhang He, and Qin Zou. 2019. Deep integration: A multi-label architecture for road scene recognition. IEEE Transactions on Image Processing 28, 10 (October 2019), 4883–4898.
[8]
Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. 2017. Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587. https://arxiv.org/pdf/1706.05587
[9]
Wei-Ting Chen, Jian-Jiun Ding, and Sy-Yen Kuo. 2019. PMS-net: Robust haze removal based on patch map for single images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11673–11681.
[10]
Wei-Ting Chen, Zhi-Kai Huang, Cheng-Che Tsai, Hao-Hsiang Yang, Jian-Jiun Ding, and Sy-Yen Kuo. 2022. Learning multiple adverse weather removal via two-stage knowledge learning and multi-contrastive regularization: Toward a unified model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 17632–17641.
[11]
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3213–3223.
[12]
Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, and Ming-Hsuan Yang. 2020. Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2154–2164.
[13]
Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, and Haibin Ling. 2019. LaSOT: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5369–5378.
[14]
Zhiwen Fan, Huafeng Wu, Xueyang Fu, Yue Huang, and Xinghao Ding. 2018. Residual-guide network for single image deraining. In Proceedings of the 26th ACM International Conference on Multimedia. Association for Computing Machinery, 1751–1759.
[15]
Raanan Fattal. 2014. Dehazing using color-lines. ACM Transactions on Graphics 34, 1 (November 2014), 1–14.
[16]
Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao Ding, and John Paisley. 2017. Removing rain from single images via a deep detail network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1715–1723.
[17]
Kshitiz Garg and Shree K. Nayar. 2007. Vision and rain. International Journal of Computer Vision 75, 1 (October 2007), 3–27.
[18]
Junlin Han, Weihao Li, Pengfei Fang, Chunyi Sun, Jie Hong, Mohammad Ali Armin, Lars Petersson, and Hongdong Li. 2022. Blind image decomposition. In Proceedings of the European Conference on Computer Vision.
[19]
Kaiming He, Jian Sun, and Xiaoou Tang. 2011. Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 12 (December 2011), 2341–2353.
[20]
Xiaowei Hu, Chi-Wing Fu, Lei Zhu, and Pheng-Ann Heng. 2019. Depth-attentional features for single-image rain removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8014–8023.
[21]
Xiaowei Hu, Lei Zhu, Tianyu Wang, Chi-Wing Fu, and Pheng-Ann Heng. 2021. Single-image real-time rain removal based on depth-guided non-local features. IEEE Transactions on Image Processing 30 (January 2021), 1759–1770.
[22]
Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning(ICML’15, Vol. 37). JMLR.org, 448–456.
[23]
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5967–5976.
[24]
Li-Wei Kang, Chia-Wen Lin, and Yu-Hsiang Fu. 2012. Automatic single-image-based rain streaks removal via image decomposition. IEEE Transactions on Image Processing 21, 4 (December 2012), 1742–1755.
[25]
Dong Hwan Kim, Woo Jin Ahn, Myo Taeg Lim, Tae Koo Kang, and Dong Won Kim. 2021. Frequency-based haze and rain removal network (FHRR-Net) with deep convolutional encoder-decoder. Applied Sciences 11, 6 (March 2021).
[26]
Ashutosh Kulkarni and Subrahmanyam Murala. 2022. Wipernet: A lightweight multi-weather restoration network for enhanced surveillance. IEEE Transactions on Intelligent Transportation Systems 23, 12 (2022), 24488–24498.
[27]
Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and Dan Feng. 2017. AOD-Net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision. 4780–4788.
[28]
Guanbin Li, Xiang He, Wei Zhang, Huiyou Chang, Le Dong, and Liang Lin. 2018. Non-locally enhanced encoder-decoder network for single image de-raining. In Proceedings of the 26th ACM International Conference on Multimedia. 1056–1064.
[29]
Ruoteng Li, Robby T. Tan, and Loong-Fah Cheong. 2020. All in one bad weather removal using architectural search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3172–3182.
[30]
Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin Zha. 2018. Recurrent squeeze-and-excitation context aggregation net for single image deraining. In Proceedings of the European Conference on Computer Vision. 262–277.
[31]
Yu Li, Robby T. Tan, Xiaojie Guo, Jiangbo Lu, and Michael S. Brown. 2016. Rain streak removal using layer priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2736–2744.
[32]
Chuping Liang, Yidan Feng, Haoran Xie, Mingqiang Wei, and Xuefeng Yan. 2021. Prior-based single image rain and haze removal. Journal of ZheJiang University (Science Edition) 48, 3 (May 2021), 270–281.
[33]
Xing Liu, Masanori Suganuma, Zhun Sun, and Takayuki Okatani. 2019. Dual residual networks leveraging the potential of paired operations for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7000–7009.
[34]
Yun-Fu Liu, Da-Wei Jaw, Shih-Chia Huang, and Jenq-Neng Hwang. 2018. DesnowNet: Context-aware deep network for snow removal. IEEE Transactions on Image Processing 27, 6 (June 2018), 3064–3073.
[35]
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In IEEE/CVF International Conference on Computer Vision. 9992–10002.
[36]
Yu Luo, Yong Xu, and Hui Ji. 2015. Removing rain from a single image via discriminative sparse coding. In IEEE International Conference on Computer Vision. 3397–3405.
[37]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, Article 721, 8026–8037.
[38]
Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, and Huizhu Jia. 2020. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 11908–11915.
[39]
Yanyun Qu, Yizi Chen, Jingying Huang, and Yuan Xie. 2019. Enhanced Pix2pix dehazing network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8152–8160.
[40]
Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, and Deyu Meng. 2019. Progressive image deraining networks: A better and simpler baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3932–3941.
[41]
Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao, and Ming-Hsuan Yang. 2016. Single image dehazing via multi-scale convolutional neural networks. In Proceedings of the European Conference on Computer Vision. Springer, 154–169.
[42]
Wenqi Ren, Lin Ma, Jiawei Zhang, Jinshan Pan, Xiaochun Cao, Wei Liu, and Ming-Hsuan Yang. 2018. Gated fusion network for single image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3253–3261.
[43]
Wenqi Ren, Jinshan Pan, Hua Zhang, Xiaochun Cao, and Ming-Hsuan Yang. 2020. Single image dehazing via multi-scale convolutional neural networks with holistic edges. International Journal of Computer Vision 128, 1 (January 2020), 240–259.
[44]
Christos Sakaridis, Dengxin Dai, and Luc Van Gool. 2018. Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision 126 (September 2018), 973–992.
[45]
Yuda Song, Zhuqing He, Hui Qian, and Xin Du. 2023. Vision transformers for single image dehazing. IEEE Transactions on Image Processing 32 (2023), 1927–1941.
[46]
Ziyi Sun, Yunfeng Zhang, Fangxun Bao, Ping Wang, Xunxiang Yao, and Caiming Zhang. 2022. Sadnet: Semi-supervised single image dehazing method based on an attention mechanism. ACM Transactions on Multimedia Computing, Communications, and Applications 18, 2 (2022), 1–23.
[47]
Jeya Maria Jose Valanarasu, Rajeev Yasarla, and Vishal M. Patel. 2022. TransWeather: Transformer-based restoration of images degraded by adverse weather conditions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2353–2363.
[48]
Hong Wang, Qi Xie, Qian Zhao, and Deyu Meng. 2020. A model-driven deep neural network for single image rain removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3103–3112.
[49]
Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, and Rynson W. H. Lau. 2019. Spatial attentive single-image deraining with a high quality real rain dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12262–12271.
[50]
Ying Wang, Yuexing Peng, Wei Li, George C. Alexandropoulos, Junchuan Yu, Daqing Ge, and Weixu Xiang. 2022. DDU-Net: Dual-decoder-U-Net for road extraction using high-resolution remote sensing images. IEEE Transactions on Geoscience and Remote Sensing 60 (2022), 1–12.
[51]
Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. 2004. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (April 2004), 600–612.
[52]
Haiyan Wu, Yanyun Qu, Shaohui Lin, Jian Zhou, Ruizhi Qiao, Zhizhong Zhang, Yuan Xie, and Lizhuang Ma. 2021. Contrastive learning for compact single image dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10546–10555.
[53]
Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan. 2017. Deep joint rain detection and removal from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1685–1694.
[54]
Chia-Hung Yeh, Chih-Hsiang Huang, and Li-Wei Kang. 2020. Multi-scale deep residual learning-based single image haze removal via image decomposition. IEEE Transactions on Image Processing 29 (2020), 3153–3167.
[55]
Yi Yu, Wenhan Yang, Yap-Peng Tan, and Alex C. Kot. 2022. Towards robust rain removal against adversarial attacks: A comprehensive benchmark analysis and beyond. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6013–6022.
[56]
Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. 2021. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14816–14826.
[57]
He Zhang and Vishal M. Patel. 2018. Densely connected pyramid dehazing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3194–3203.
[58]
He Zhang and Vishal M. Patel. 2018. Density-aware single image de-raining using a multi-stream dense network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 695–704.
[59]
He Zhang, Vishwanath Sindagi, and Vishal M. Patel. 2020. Image de-raining using a conditional generative adversarial network. IEEE Transactions on Circuits and Systems for Video Technology 30, 11 (November 2020), 3943–3956.
[60]
Hang Zhang, Han Zhang, Chenguang Wang, and Junyuan Xie. 2019. Co-occurrent features in semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 548–557.
[61]
Lei Zhu, Zijun Deng, Xiaowei Hu, Haoran Xie, Xuemiao Xu, Jing Qin, and Pheng-Ann Heng. 2021. Learning gated non-local residual for single-image rain streak removal. IEEE Transactions on Circuits and Systems for Video Technology 31, 6 (June 2021), 2147–2159.
[62]
Lei Zhu, Chi-Wing Fu, Dani Lischinski, and Pheng-Ann Heng. 2017. Joint bi-layer optimization for single-image rain streak removal. In Proceedings of the IEEE International Conference on Computer Vision. 2545–2553.
[63]
Qingsong Zhu, Jiaming Mai, and Ling Shao. 2015. A fast single image haze removal algorithm using color attenuation prior. IEEE Transactions on Image Processing 24, 11 (November 2015), 3522–3533.
