Abstract
Current unpaired image-to-image translation models handle unpaired domains effectively, but they struggle to map images from information-scarce domains to information-rich domains because of degraded visibility and the loss of semantic information. To further improve the quality of night-to-day translation, we propose a novel image translation method named “RefN2D-Guide GAN” that utilizes reference images to improve the adaptability of the encoder within the generator through a feature matching loss. Moreover, we introduce a segmentation module that helps preserve the semantic details of the generated images without requiring ground-truth annotations. The incorporation of an embedding consistency loss differentiates the roles of the encoder and decoder and facilitates the transfer of learned representations to both translation directions. Our experimental results show that the proposed method effectively enhances the quality of night-to-day image translation on the night training set of the ACDC dataset and achieves higher mIoU on the translated images.
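To make the role of the reference images more concrete, here is a minimal sketch of one plausible form of such a feature matching term: encoder features of the night input are pulled toward encoder features of a daytime reference image, encouraging the encoder to produce domain-adaptive representations. The encoder architecture, layer selection, and L1 distance are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for the generator's encoder; returns intermediate features."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        return [f1, f2]

def feature_matching_loss(encoder, night, ref_day):
    # Pull encoder features of the night input toward those of a daytime
    # reference image; the reference acts as a fixed target.
    feats_night = encoder(night)
    with torch.no_grad():
        feats_ref = encoder(ref_day)
    return sum(F.l1_loss(a, b) for a, b in zip(feats_night, feats_ref)) / len(feats_night)

# Usage sketch:
# enc = TinyEncoder()
# loss = feature_matching_loss(enc, torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
```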
Notes
- 1.
For the comparison in our experiments, the N2D-GAN variant employed has its segmentation module, which relies on guidance from ground-truth semantic labels, removed.
References
Anoosheh, A., Sattler, T., Timofte, R., Pollefeys, M., Van Gool, L.: Night-to-day image translation for retrieval-based localization. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 5958–5964. IEEE (2019)
Bińkowski, M., Sutherland, D.J., Arbel, M., Gretton, A.: Demystifying MMD GANs. arXiv preprint arXiv:1801.01401 (2018)
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Dar, H., Butterworth, N., Willet, C.E.: Sydney informatics hub: artemis training, October 2018. https://sydney-informatics-hub.github.io/training.artemis/, version 2018.10
Goodfellow, I., et al.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)
Guo, J., Li, J., Fu, H., Gong, M., Zhang, K., Tao, D.: Alleviating semantics distortion in unsupervised low-level image-to-image translation via structure consistency constraint. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18249–18259 (2022)
Han, J., Shoeiby, M., Petersson, L., Armin, M.A.: Dual contrastive learning for unsupervised image-to-image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 746–755 (2021)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 30 (2017)
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)
Jiang, Y., et al.: EnlightenGAN: deep light enhancement without paired supervision. IEEE Trans. Image Process. 30, 2340–2349 (2021)
Kim, J., Kim, M., Kang, H., Lee, K.: U-GAT-IT: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. arXiv preprint arXiv:1907.10830 (2019)
Li, M., Huang, H., Ma, L., Liu, W., Zhang, T., Jiang, Y.: Unsupervised image-to-image translation with stacked cycle-consistent adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 184–199 (2018)
Li, X., Guo, X.: SPN2D-GAN: semantic prior based night-to-day image-to-image translation. IEEE Trans. Multimed. (2022)
Li, X., Guo, X., Zhang, J.: N2D-GAN: a night-to-day image-to-image translator. In: 2022 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2022)
Lin, G., Milan, A., Shen, C., Reid, I.: RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1925–1934 (2017)
Park, T., Efros, A.A., Zhang, R., Zhu, J.-Y.: Contrastive learning for unpaired image-to-image translation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 319–345. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_19
Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2337–2346 (2019)
Sakaridis, C., Dai, D., Van Gool, L.: ACDC: the adverse conditions dataset with correspondences for semantic driving scene understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10765–10775 (2021)
Tang, H., Liu, H., Xu, D., Torr, P.H., Sebe, N.: AttentionGAN: unpaired image-to-image translation using attention-guided generative adversarial networks. IEEE Trans. Neural Netw. Learn. Syst. (2021)
Tsai, Y.H., Hung, W.C., Schulter, S., Sohn, K., Yang, M.H., Chandraker, M.: Learning to adapt structured output space for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7472–7481 (2018)
Wang, X., et al.: ESRGAN: enhanced super-resolution generative adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018)
Wu, X., Wu, Z., Guo, H., Ju, L., Wang, S.: DANNet: a one-stage domain adaptation network for unsupervised nighttime semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15769–15778 (2021)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)
Zheng, Z., Yang, Y.: Unsupervised scene adaptation with memory regularization in vivo. arXiv preprint arXiv:1912.11164 (2019)
Zheng, Z., Wu, Y., Han, X., Shi, J.: ForkGAN: seeing into the rainy night. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 155–170. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_10
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)
Acknowledgements
This project was carried out utilizing the high-performance computing resources provided by Artemis HPC [4] at the University of Sydney.
A Appendix
A.1 Opposite Direction of Image Translation
See Fig. 6.
A.2 Details on the Construction of the Training and Evaluation Data
For the evaluation stage of the image translation experiments, we measure performance on the translation task from the night condition of the ACDC dataset to the reference domains of the three adverse conditions. We choose this dataset as our primary evaluation benchmark for the following reasons:
- One of the key contributions of the proposed model is to leverage reference images from the dataset. The ACDC dataset provides a set of reference images for each adverse condition, making the evaluation of our experimental results possible.
- ACDC contains a specific adverse domain dedicated to nighttime street views, with fine-grained pixel-level annotations for up to 400 images. This aligns with the proposed model's goal of translating from nighttime to daytime.
- The reference images of the three adverse conditions in ACDC (rain, snow, and fog) can act as the daytime domain, since the reference image for each adverse condition is taken during the daytime. Conversely, the original images of these three conditions can serve as reference images in the night-to-day translation setting, providing an abundant source of data for training a sufficiently accurate translation model for evaluation purposes.
Table 4 presents the details of the data used for training and validation. For the daytime domain, we use the reference images of the three adverse conditions (i.e., Rain, Snow, and Fog) in ACDC as training images; this is possible because the reference images for these conditions are all captured under normal weather conditions during the daytime. In other words, we swap the Rain, Snow, and Fog image sets with their corresponding ACDC reference sets, using them as the training set and reference set, respectively. A sketch of this construction is given below.
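The following snippet sketches how such a split might be assembled from the ACDC image files. The directory layout and the `train_ref` naming convention are assumptions made for illustration; adapt the paths to the actual release of the dataset.

```python
from pathlib import Path

# Assumed ACDC-style layout (illustrative only):
#   acdc/rgb_anon/<condition>/train/      original adverse-condition frames
#   acdc/rgb_anon/<condition>/train_ref/  daytime reference frames
ACDC_ROOT = Path("acdc/rgb_anon")
CONDITIONS = ["rain", "snow", "fog"]

def build_night2day_split(root: Path = ACDC_ROOT):
    """Swap adverse-condition images with their daytime references:
    references become the daytime training set, originals become the
    reference set for the night-to-day setting."""
    daytime_train, reference_set = [], []
    for cond in CONDITIONS:
        daytime_train += sorted((root / cond / "train_ref").rglob("*.png"))
        reference_set += sorted((root / cond / "train").rglob("*.png"))
    night_train = sorted((root / "night" / "train").rglob("*.png"))
    return night_train, daytime_train, reference_set
```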
A.3 Hyperparameter Learning of \(\lambda _{Seg}\)
The coefficient in the final objective that controls the importance of the segmentation module determines how much of the guidance from this module is accepted. We explore choices of \(\lambda _{Seg}\) over the range 0.3 to 1. On the ACDC nighttime domain, the proposed method performs best on the interval between 0.3 and 0.7, with an average of 26.6816 mIoU over this interval. In general, performance is not significantly affected by the choice of \(\lambda _{Seg}\); nonetheless, we recommend tuning this hyperparameter when applying the method to different datasets (Table 5). The sketch below illustrates how \(\lambda _{Seg}\) enters the objective.
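The following is a minimal sketch of how \(\lambda _{Seg}\) weights the segmentation term against the remaining generator losses. The loss names and their composition are assumptions for illustration rather than the paper's full objective.

```python
def generator_objective(loss_adv, loss_feat, loss_embed, loss_seg, lambda_seg=0.5):
    """Illustrative composition of the generator objective: adversarial,
    feature matching, embedding consistency, and segmentation guidance.
    Only lambda_seg is the hyperparameter studied in A.3; the other terms
    and their (unit) weights are placeholders."""
    return loss_adv + loss_feat + loss_embed + lambda_seg * loss_seg
```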
A.4 Grouping the Classes for Better Visualization in Assessing the Segmentation Performance
We conduct semantic segmentation evaluation (mIoU) on the translated images rather than on the original nighttime images. This evaluates the model's ability to preserve semantic details and to in-paint invisible and under-exposed regions of nighttime images during translation from the nighttime domain to the target domain. Here, we group the class labels into four categories to better visualize the experimental results in Table 6; a sketch of this grouped evaluation is given below.
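The snippet below sketches one way to compute mIoU after remapping fine-grained labels into coarser groups. The group assignments shown are placeholders; the actual four categories are those defined in Table 6 of the paper.

```python
import numpy as np

# Placeholder remapping from fine-grained class ids to 4 coarse groups;
# the real grouping follows Table 6 of the paper.
GROUP_OF_CLASS = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
NUM_GROUPS = 4

def grouped_miou(pred, gt, ignore_label=255):
    """mIoU over coarse groups for integer label maps pred/gt (np.ndarray)."""
    remap = np.full(256, ignore_label, dtype=np.int32)
    for cls, grp in GROUP_OF_CLASS.items():
        remap[cls] = grp
    pred_g, gt_g = remap[pred], remap[gt]
    valid = gt_g != ignore_label
    ious = []
    for g in range(NUM_GROUPS):
        inter = np.logical_and(pred_g == g, gt_g == g)[valid].sum()
        union = np.logical_or(pred_g == g, gt_g == g)[valid].sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```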
A.5 Table of the Notations and Symbols in the Proposed Method
A.6 The Code Skeleton of the Proposed Method
To illustrate the training procedure of our proposed method, we present detailed pseudocode in Algorithm 1 for further inspection.
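Since Algorithm 1 is not reproduced here, the following is a condensed sketch of what one such training step could look like, combining the adversarial, feature matching, segmentation guidance, and embedding consistency terms described in the abstract. All module interfaces, optimizers, and weights are illustrative assumptions rather than the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def train_step(enc, dec, disc, seg, batch, opt_g, opt_d, lambda_seg=0.5):
    """One illustrative RefN2D-Guide-style training step.

    enc/dec  - encoder and decoder of the night-to-day generator
               (enc is assumed to return a single embedding tensor)
    disc     - daytime discriminator returning logits
    seg      - frozen segmentation module providing pseudo-labels
    batch    - dict with 'night' and 'ref_day' image tensors
    """
    night, ref_day = batch["night"], batch["ref_day"]

    # --- Generator update --------------------------------------------------
    z_night = enc(night)
    fake_day = dec(z_night)

    pred_fake = disc(fake_day)
    loss_adv = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))

    # Feature matching: pull the night embedding toward the daytime
    # reference embedding (one plausible reading of the feature matching loss).
    with torch.no_grad():
        z_ref = enc(ref_day)
    loss_feat = F.l1_loss(z_night, z_ref)

    # Segmentation guidance: the translated image should keep the semantics
    # predicted on the night input (no ground-truth labels required).
    with torch.no_grad():
        pseudo = seg(night).argmax(dim=1)
    loss_seg = F.cross_entropy(seg(fake_day), pseudo)

    # Embedding consistency: re-encoding the translation should recover
    # the original embedding.
    loss_embed = F.l1_loss(enc(fake_day), z_night.detach())

    loss_g = loss_adv + loss_feat + loss_embed + lambda_seg * loss_seg
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # --- Discriminator update ----------------------------------------------
    pred_real = disc(ref_day)
    pred_fake_d = disc(fake_day.detach())
    loss_d = (F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
              + F.binary_cross_entropy_with_logits(pred_fake_d, torch.zeros_like(pred_fake_d)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return {"g": loss_g.item(), "d": loss_d.item()}
```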
A.7 Limitations
Despite the improvements in our model, we identify certain limitations that warrant attention in future research:
- Mitigating Unwanted Semantic In-Painting
We need to tackle the issue of unnecessary semantic in-painting of specific classes in the target domain, especially vegetation. When classes are imbalanced, the generator may misinterpret regions with low light, leading to in-painting inaccuracies in the translations. Although this is partially mitigated by the semantic prediction, it still occurs in the proposed method. Specifically, in our case the vegetation class is misinterpreted in translated images in regions where the original image provides no visual cues.
- Addressing Motion Blur and Glare
Challenges related to motion blur and glare removal in the night-to-day translation direction remain to be addressed. Motion blur depends on the choice of dataset and is common in datasets captured from moving platforms. In our experiments, the ACDC images, gathered from a car-mounted camera capturing street views, suffer from this problem, which penalizes the quality of the translation. In addition to motion blur, glare from light sources is a significant hurdle to improving translation from the nighttime to the daytime domain. Since a primary research direction involves improving the translation method's ability to capture geometric and textural features, these issues can lead to inaccurate renderings of semantic details that do not actually exist in the daytime domain.
- Semantic Detail Preservation Challenges
It is essential to improve the preservation of semantic details for smaller and dynamic objects. Without ground-truth labels in our method, the mismatch between the source and target domains may cause the semantic predictions to overlook these objects. For instance, a pre-trained semantic segmentation model, trained on a dataset similar to the source domain (the daytime domain in our setting), may fail to predict accurate semantic maps for distant objects or those occupying a small proportion of the image. This limitation further restricts the semantic guidance that could otherwise improve the quality of image translation in our method.
- Empirical Evaluation Challenges
In terms of empirical investigation, the evaluation of our method is limited by the scarcity of datasets that provide reference images. For a more comprehensive performance assessment, it is important to conduct empirical statistical analyses across a diverse range of night-to-day datasets.
A.8 Future Directions
- Improved Control of Semantic Prediction
Because the semantic predictions produced by the segmentation module are uncertain, a mechanism that quantifies the confidence of the semantic prediction should be designed to dynamically adjust the weight of the semantic loss in the final objective. Specifically, summarizing the entropy of the semantic prediction into a concise quantity is one possible approach for this purpose; a minimal sketch of this idea is given after this list. The quantified confidence term could then be used as an auxiliary instrument, bolstering the model's adaptability to varying semantic prediction scenarios.
- Improving Reference Image Utilization
The way reference images are incorporated to extract useful information could be further improved. Instead of conditioning the outputs of the intermediate layers, a geometric matching approach could make more effective use of the visual features present in the reference image. In particular, existing methods for spatially aligning two images can selectively suppress mismatched regions, allowing the model to pinpoint more precisely the areas that need attention during translation.
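As a minimal sketch of the entropy-based confidence idea mentioned in the first item above: the softmax entropy of the segmentation prediction is mapped to a scalar confidence that scales the semantic loss weight. All names and the exact mapping from entropy to weight are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def entropy_confidence(seg_logits, eps=1e-8):
    """Map per-pixel softmax entropy of a segmentation prediction of shape
    (B, C, H, W) to a scalar confidence in [0, 1]; high entropy means low
    confidence. Normalizing by log(C) is an illustrative choice."""
    probs = F.softmax(seg_logits, dim=1)
    entropy = -(probs * torch.log(probs + eps)).sum(dim=1)      # (B, H, W)
    max_entropy = torch.log(torch.tensor(float(seg_logits.shape[1])))
    return 1.0 - (entropy / max_entropy).mean()

# Usage sketch: scale the (assumed) base weight lambda_seg by the confidence.
# lambda_eff = lambda_seg * entropy_confidence(seg(fake_day))
```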
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ning, J., Gong, M. (2024). Enhancing Night-to-Day Image Translation with Semantic Prior and Reference Image Guidance. In: Bao, Z., Borovica-Gajic, R., Qiu, R., Choudhury, F., Yang, Z. (eds) Databases Theory and Applications. ADC 2023. Lecture Notes in Computer Science, vol 14386. Springer, Cham. https://doi.org/10.1007/978-3-031-47843-7_12
DOI: https://doi.org/10.1007/978-3-031-47843-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47842-0
Online ISBN: 978-3-031-47843-7