Abstract
Current unpaired image-to-image translation models handle unpaired domains effectively, but they struggle to map images from information-scarce domains to information-rich domains because of degraded visibility and the loss of semantic information. To further improve the quality of night-to-day translation, we propose a novel image translation method named “RefN2D-Guide GAN” that utilizes reference images to improve the adaptability of the encoder within the generator through a feature matching loss. Moreover, we introduce a segmentation module that helps preserve the semantic details of the generated images without requiring ground-truth annotations. The incorporation of an embedding consistency loss differentiates the roles of the encoder and decoder and facilitates the transfer of learned representations to both translation directions. Our experimental results show that the proposed method effectively enhances the quality of night-to-day image translation on the night training set of the ACDC dataset and achieves higher mIoU on the translated images.
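To make the role of the reference images more concrete, here is a minimal sketch of one plausible form of such a feature matching term: encoder features of the night input are pulled toward encoder features of a daytime reference image, encouraging the encoder to produce domain-adaptive representations. The encoder architecture, layer selection, and L1 distance are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for the generator's encoder; returns intermediate features."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        return [f1, f2]

def feature_matching_loss(encoder, night, ref_day):
    # Pull encoder features of the night input toward those of a daytime
    # reference image; the reference acts as a fixed target.
    feats_night = encoder(night)
    with torch.no_grad():
        feats_ref = encoder(ref_day)
    return sum(F.l1_loss(a, b) for a, b in zip(feats_night, feats_ref)) / len(feats_night)

# Usage sketch:
# enc = TinyEncoder()
# loss = feature_matching_loss(enc, torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
```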
Notes
- 1.
For the comparison in our experiments, the N2D-GAN variant employed has its segmentation module, which relies on guidance from ground-truth semantic labels, removed.
References
Anoosheh, A., Sattler, T., Timofte, R., Pollefeys, M., Van Gool, L.: Night-to-day image translation for retrieval-based localization. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 5958–5964. IEEE (2019)
Bińkowski, M., Sutherland, D.J., Arbel, M., Gretton, A.: Demystifying MMD GANs. arXiv preprint arXiv:1801.01401 (2018)
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Dar, H., Butterworth, N., Willet, C.E.: Sydney informatics hub: artemis training, October 2018. https://sydney-informatics-hub.github.io/training.artemis/, version 2018.10
Goodfellow, I., et al.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)
Guo, J., Li, J., Fu, H., Gong, M., Zhang, K., Tao, D.: Alleviating semantics distortion in unsupervised low-level image-to-image translation via structure consistency constraint. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18249–18259 (2022)
Han, J., Shoeiby, M., Petersson, L., Armin, M.A.: Dual contrastive learning for unsupervised image-to-image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 746–755 (2021)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 30 (2017)
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)
Jiang, Y., et al.: EnlightenGAN: deep light enhancement without paired supervision. IEEE Trans. Image Process. 30, 2340–2349 (2021)
Kim, J., Kim, M., Kang, H., Lee, K.: U-GAT-IT: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. arXiv preprint arXiv:1907.10830 (2019)
Li, M., Huang, H., Ma, L., Liu, W., Zhang, T., Jiang, Y.: Unsupervised image-to-image translation with stacked cycle-consistent adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 184–199 (2018)
Li, X., Guo, X.: SPN2D-GAN: semantic prior based night-to-day image-to-image translation. IEEE Trans. Multimed. (2022)
Li, X., Guo, X., Zhang, J.: N2D-GAN: a night-to-day image-to-image translator. In: 2022 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2022)
Lin, G., Milan, A., Shen, C., Reid, I.: RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1925–1934 (2017)
Park, T., Efros, A.A., Zhang, R., Zhu, J.-Y.: Contrastive learning for unpaired image-to-image translation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 319–345. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_19
Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2337–2346 (2019)
Sakaridis, C., Dai, D., Van Gool, L.: ACDC: the adverse conditions dataset with correspondences for semantic driving scene understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10765–10775 (2021)
Tang, H., Liu, H., Xu, D., Torr, P.H., Sebe, N.: AttentionGAN: unpaired image-to-image translation using attention-guided generative adversarial networks. IEEE Trans. Neural Netw. Learn. Syst. (2021)
Tsai, Y.H., Hung, W.C., Schulter, S., Sohn, K., Yang, M.H., Chandraker, M.: Learning to adapt structured output space for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7472–7481 (2018)
Wang, X., et al.: ESRGAN: enhanced super-resolution generative adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018)
Wu, X., Wu, Z., Guo, H., Ju, L., Wang, S.: DANNet: a one-stage domain adaptation network for unsupervised nighttime semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15769–15778 (2021)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)
Zheng, Z., Yang, Y.: Unsupervised scene adaptation with memory regularization in vivo. arXiv preprint arXiv:1912.11164 (2019)
Zheng, Z., Wu, Y., Han, X., Shi, J.: ForkGAN: seeing into the rainy night. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 155–170. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_10
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)
Acknowledgements
This project was carried out utilizing the high-performance computing resources provided by Artemis HPC [4] at the University of Sydney.
A Appendix
A.1 Opposite Direction of Image Translation
See Fig. 6.
A.2 Details on the Construction of the Training and Evaluation Data
For the evaluation stage of the image translation experiments, we measure performance on the translation task from the night condition of the ACDC dataset to the reference domains of the three adverse conditions. We choose this dataset as our primary evaluation benchmark for the following reasons:
- One of the key contributions of the proposed model is to leverage reference images from the dataset. The ACDC dataset provides a set of reference images for each adverse condition, making the evaluation of our experimental results possible.
- ACDC contains a specific adverse domain dedicated to nighttime street views, with fine-grained pixel-level annotations for up to 400 images. This aligns with the proposed model's goal of translating from nighttime to daytime.
- The reference images of the three adverse conditions in ACDC (rain, snow, and fog) can act as the daytime domain, since the reference image for each adverse condition is taken during the daytime. Conversely, the original images of these three conditions can serve as reference images in the night-to-day translation setting, providing an abundant source of data for training a sufficiently accurate translation model for evaluation purposes.
Table 4 presents the details of the data used for training and validation. For the daytime domain, we use the reference images of the three adverse conditions (i.e., Rain, Snow, and Fog) in ACDC as training images; this is possible because the reference images for these conditions are all captured under normal weather conditions during the daytime. In other words, we swap the Rain, Snow, and Fog image sets with their corresponding ACDC reference sets, using them as the training set and reference set, respectively. A sketch of this construction is given below.
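The following snippet sketches how such a split might be assembled from the ACDC image files. The directory layout and the `train_ref` naming convention are assumptions made for illustration; adapt the paths to the actual release of the dataset.

```python
from pathlib import Path

# Assumed ACDC-style layout (illustrative only):
#   acdc/rgb_anon/<condition>/train/      original adverse-condition frames
#   acdc/rgb_anon/<condition>/train_ref/  daytime reference frames
ACDC_ROOT = Path("acdc/rgb_anon")
CONDITIONS = ["rain", "snow", "fog"]

def build_night2day_split(root: Path = ACDC_ROOT):
    """Swap adverse-condition images with their daytime references:
    references become the daytime training set, originals become the
    reference set for the night-to-day setting."""
    daytime_train, reference_set = [], []
    for cond in CONDITIONS:
        daytime_train += sorted((root / cond / "train_ref").rglob("*.png"))
        reference_set += sorted((root / cond / "train").rglob("*.png"))
    night_train = sorted((root / "night" / "train").rglob("*.png"))
    return night_train, daytime_train, reference_set
```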
A.3 Hyperparameter Learning of \(\lambda _{Seg}\)
The coefficient in the final objective that controls the importance of the segmentation module determines how much of the guidance from this module is accepted. We explore choices of \(\lambda _{Seg}\) over the range 0.3 to 1. On the ACDC nighttime domain, the proposed method performs best on the interval between 0.3 and 0.7, with an average of 26.6816 mIoU over this interval. In general, performance is not significantly affected by the choice of \(\lambda _{Seg}\); nonetheless, we recommend tuning this hyperparameter when applying the method to different datasets (Table 5). The sketch below illustrates how \(\lambda _{Seg}\) enters the objective.
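The following is a minimal sketch of how \(\lambda _{Seg}\) weights the segmentation term against the remaining generator losses. The loss names and their composition are assumptions for illustration rather than the paper's full objective.

```python
def generator_objective(loss_adv, loss_feat, loss_embed, loss_seg, lambda_seg=0.5):
    """Illustrative composition of the generator objective: adversarial,
    feature matching, embedding consistency, and segmentation guidance.
    Only lambda_seg is the hyperparameter studied in A.3; the other terms
    and their (unit) weights are placeholders."""
    return loss_adv + loss_feat + loss_embed + lambda_seg * loss_seg
```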
A.4 Grouping the Classes for Better Visualization in Assessing the Segmentation Performance
We conduct semantic segmentation evaluation (mIoU) on the translated images rather than on the original nighttime images. This evaluates the model's ability to preserve semantic details and to in-paint invisible and under-exposed regions of nighttime images during translation from the nighttime domain to the target domain. Here, we group the class labels into four categories to better visualize the experimental results in Table 6; a sketch of this grouped evaluation is given below.
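The snippet below sketches one way to compute mIoU after remapping fine-grained labels into coarser groups. The group assignments shown are placeholders; the actual four categories are those defined in Table 6 of the paper.

```python
import numpy as np

# Placeholder remapping from fine-grained class ids to 4 coarse groups;
# the real grouping follows Table 6 of the paper.
GROUP_OF_CLASS = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
NUM_GROUPS = 4

def grouped_miou(pred, gt, ignore_label=255):
    """mIoU over coarse groups for integer label maps pred/gt (np.ndarray)."""
    remap = np.full(256, ignore_label, dtype=np.int32)
    for cls, grp in GROUP_OF_CLASS.items():
        remap[cls] = grp
    pred_g, gt_g = remap[pred], remap[gt]
    valid = gt_g != ignore_label
    ious = []
    for g in range(NUM_GROUPS):
        inter = np.logical_and(pred_g == g, gt_g == g)[valid].sum()
        union = np.logical_or(pred_g == g, gt_g == g)[valid].sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```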
A.5 Table of the Notations and Symbols in the Proposed Method
A.6 The Code Skeleton of the Proposed Method
To illustrate the training procedure of our proposed method, we present detailed pseudocode in Algorithm 1 for further inspection.
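Since Algorithm 1 is not reproduced here, the following is a condensed sketch of what one such training step could look like, combining the adversarial, feature matching, segmentation guidance, and embedding consistency terms described in the abstract. All module interfaces, optimizers, and weights are illustrative assumptions rather than the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def train_step(enc, dec, disc, seg, batch, opt_g, opt_d, lambda_seg=0.5):
    """One illustrative RefN2D-Guide-style training step.

    enc/dec  - encoder and decoder of the night-to-day generator
               (enc is assumed to return a single embedding tensor)
    disc     - daytime discriminator returning logits
    seg      - frozen segmentation module providing pseudo-labels
    batch    - dict with 'night' and 'ref_day' image tensors
    """
    night, ref_day = batch["night"], batch["ref_day"]

    # --- Generator update --------------------------------------------------
    z_night = enc(night)
    fake_day = dec(z_night)

    pred_fake = disc(fake_day)
    loss_adv = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))

    # Feature matching: pull the night embedding toward the daytime
    # reference embedding (one plausible reading of the feature matching loss).
    with torch.no_grad():
        z_ref = enc(ref_day)
    loss_feat = F.l1_loss(z_night, z_ref)

    # Segmentation guidance: the translated image should keep the semantics
    # predicted on the night input (no ground-truth labels required).
    with torch.no_grad():
        pseudo = seg(night).argmax(dim=1)
    loss_seg = F.cross_entropy(seg(fake_day), pseudo)

    # Embedding consistency: re-encoding the translation should recover
    # the original embedding.
    loss_embed = F.l1_loss(enc(fake_day), z_night.detach())

    loss_g = loss_adv + loss_feat + loss_embed + lambda_seg * loss_seg
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # --- Discriminator update ----------------------------------------------
    pred_real = disc(ref_day)
    pred_fake_d = disc(fake_day.detach())
    loss_d = (F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
              + F.binary_cross_entropy_with_logits(pred_fake_d, torch.zeros_like(pred_fake_d)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return {"g": loss_g.item(), "d": loss_d.item()}
```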
A.7 Limitations
Despite the improvements in our model, we identify certain limitations that warrant attention in future research:
- Mitigating Unwanted Semantic In-Painting
We need to tackle the issue of unnecessary semantic in-painting of specific classes in the target domain, especially vegetation. When classes are imbalanced, the generator may misinterpret regions with low light, leading to in-painting inaccuracies in the translations. Although this is partially mitigated by the semantic prediction, it still occurs in the proposed method. Specifically, in our case the vegetation class is misinterpreted in translated images in regions where the original image provides no visual cues.
- Addressing Motion Blur and Glare
Challenges related to motion blur and glare removal in the night-to-day translation direction remain to be addressed. Motion blur depends on the choice of dataset and is common in datasets captured from moving platforms. In our experiments, the ACDC images, gathered from a car-mounted camera capturing street views, suffer from this problem, which penalizes the quality of the translation. In addition to motion blur, glare from light sources is a significant hurdle to improving translation from the nighttime to the daytime domain. Since a primary research direction involves improving the translation method's ability to capture geometric and textural features, these issues can lead to inaccurate renderings of semantic details that do not actually exist in the daytime domain.
- Semantic Detail Preservation Challenges
It is essential to improve the preservation of semantic details for smaller and dynamic objects. Without ground-truth labels in our method, the mismatch between the source and target domains may cause the semantic predictions to overlook these objects. For instance, a pre-trained semantic segmentation model, trained on a dataset similar to the source domain (the daytime domain in our setting), may fail to predict accurate semantic maps for distant objects or those occupying a small proportion of the image. This limitation further restricts the semantic guidance that could otherwise improve the quality of image translation in our method.
- Empirical Evaluation Challenges
In terms of empirical investigation, the evaluation of our method is limited by the scarcity of datasets that provide reference images. For a more comprehensive performance assessment, it is important to conduct empirical statistical analyses across a diverse range of night-to-day datasets.
A.8 Future Directions
- Improved Control of Semantic Prediction
Because the semantic predictions produced by the segmentation module are uncertain, a mechanism that quantifies the confidence of the semantic prediction should be designed to dynamically adjust the weight of the semantic loss in the final objective. Specifically, summarizing the entropy of the semantic prediction into a concise quantity is one possible approach for this purpose; a minimal sketch of this idea is given after this list. The quantified confidence term could then be used as an auxiliary instrument, bolstering the model's adaptability to varying semantic prediction scenarios.
- Improving Reference Image Utilization
The way reference images are incorporated to extract useful information could be further improved. Instead of conditioning the outputs of the intermediate layers, a geometric matching approach could make more effective use of the visual features present in the reference image. In particular, existing methods for spatially aligning two images can selectively suppress mismatched regions, allowing the model to pinpoint more precisely the areas that need attention during translation.
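As a minimal sketch of the entropy-based confidence idea mentioned in the first item above: the softmax entropy of the segmentation prediction is mapped to a scalar confidence that scales the semantic loss weight. All names and the exact mapping from entropy to weight are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def entropy_confidence(seg_logits, eps=1e-8):
    """Map per-pixel softmax entropy of a segmentation prediction of shape
    (B, C, H, W) to a scalar confidence in [0, 1]; high entropy means low
    confidence. Normalizing by log(C) is an illustrative choice."""
    probs = F.softmax(seg_logits, dim=1)
    entropy = -(probs * torch.log(probs + eps)).sum(dim=1)      # (B, H, W)
    max_entropy = torch.log(torch.tensor(float(seg_logits.shape[1])))
    return 1.0 - (entropy / max_entropy).mean()

# Usage sketch: scale the (assumed) base weight lambda_seg by the confidence.
# lambda_eff = lambda_seg * entropy_confidence(seg(fake_day))
```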
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ning, J., Gong, M. (2024). Enhancing Night-to-Day Image Translation with Semantic Prior and Reference Image Guidance. In: Bao, Z., Borovica-Gajic, R., Qiu, R., Choudhury, F., Yang, Z. (eds) Databases Theory and Applications. ADC 2023. Lecture Notes in Computer Science, vol 14386. Springer, Cham. https://doi.org/10.1007/978-3-031-47843-7_12
DOI: https://doi.org/10.1007/978-3-031-47843-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47842-0
Online ISBN: 978-3-031-47843-7