
Enhancing Night-to-Day Image Translation with Semantic Prior and Reference Image Guidance

  • Conference paper

Databases Theory and Applications (ADC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14386)

Abstract

Current unpaired image-to-image translation models handle datasets from unpaired domains effectively, but they face the challenge of mapping images from domains with scarce information to domains with abundant information due to degraded visibility and the loss of semantic information. To further improve the quality of night-to-day translation, we propose a novel image translation method named “RefN2D-Guide GAN” that utilizes reference images to improve the adaptability of the encoder within the generator through a feature matching loss. Moreover, we introduce a segmentation module to help preserve the semantic details of the generated images without the need for ground-truth annotations. The incorporation of an embedding consistency loss differentiates the roles of the encoder and decoder and facilitates the transfer of the learned representation to both translation directions. Our experimental results show that the proposed method effectively enhances the quality of night-to-day image translation on the night training set of the ACDC dataset and achieves higher mIoU on the translated images.
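As a rough illustration of the feature matching idea described above, the following is a minimal sketch, assuming a PyTorch-style encoder that returns a list of intermediate feature maps; the layer indices and the L1 distance are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(encoder, night_img, day_reference, layers=(1, 3)):
    """Hedged sketch: pull the encoder features of a night image towards the
    features of a daytime reference image at selected layers.
    `encoder` is assumed to return a list of intermediate feature maps."""
    feats_night = encoder(night_img)          # features of the night input
    with torch.no_grad():                     # reference features act as a fixed target
        feats_ref = encoder(day_reference)
    loss = 0.0
    for i in layers:                          # which layers to match is an assumption
        loss = loss + F.l1_loss(feats_night[i], feats_ref[i])
    return loss
```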


Notes

  1. For the comparison in the experiments, the N2D-GAN employed has its segmentation module, which relies on the guidance of ground-truth semantic labels, removed.

References

  1. Anoosheh, A., Sattler, T., Timofte, R., Pollefeys, M., Van Gool, L.: Night-to-day image translation for retrieval-based localization. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 5958–5964. IEEE (2019)

  2. Bińkowski, M., Sutherland, D.J., Arbel, M., Gretton, A.: Demystifying MMD GANs. arXiv preprint arXiv:1801.01401 (2018)

  3. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)

  4. Dar, H., Butterworth, N., Willet, C.E.: Sydney Informatics Hub: Artemis training, October 2018. https://sydney-informatics-hub.github.io/training.artemis/, version 2018.10

  5. Goodfellow, I., et al.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)

  6. Guo, J., Li, J., Fu, H., Gong, M., Zhang, K., Tao, D.: Alleviating semantics distortion in unsupervised low-level image-to-image translation via structure consistency constraint. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18249–18259 (2022)

  7. Han, J., Shoeiby, M., Petersson, L., Armin, M.A.: Dual contrastive learning for unsupervised image-to-image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 746–755 (2021)

  8. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 30 (2017)

  9. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)

  10. Jiang, Y., et al.: EnlightenGAN: deep light enhancement without paired supervision. IEEE Trans. Image Process. 30, 2340–2349 (2021)

  11. Kim, J., Kim, M., Kang, H., Lee, K.: U-GAT-IT: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. arXiv preprint arXiv:1907.10830 (2019)

  12. Li, M., Huang, H., Ma, L., Liu, W., Zhang, T., Jiang, Y.: Unsupervised image-to-image translation with stacked cycle-consistent adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 184–199 (2018)

  13. Li, X., Guo, X.: SPN2D-GAN: semantic prior based night-to-day image-to-image translation. IEEE Trans. Multimed. (2022)

  14. Li, X., Guo, X., Zhang, J.: N2D-GAN: a night-to-day image-to-image translator. In: 2022 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2022)

  15. Lin, G., Milan, A., Shen, C., Reid, I.: RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1925–1934 (2017)

  16. Park, T., Efros, A.A., Zhang, R., Zhu, J.-Y.: Contrastive learning for unpaired image-to-image translation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 319–345. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_19

  17. Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2337–2346 (2019)

  18. Sakaridis, C., Dai, D., Van Gool, L.: ACDC: the adverse conditions dataset with correspondences for semantic driving scene understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10765–10775 (2021)

  19. Tang, H., Liu, H., Xu, D., Torr, P.H., Sebe, N.: AttentionGAN: unpaired image-to-image translation using attention-guided generative adversarial networks. IEEE Trans. Neural Netw. Learn. Syst. (2021)

  20. Tsai, Y.H., Hung, W.C., Schulter, S., Sohn, K., Yang, M.H., Chandraker, M.: Learning to adapt structured output space for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7472–7481 (2018)

  21. Wang, X., et al.: ESRGAN: enhanced super-resolution generative adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018)

  22. Wu, X., Wu, Z., Guo, H., Ju, L., Wang, S.: DANNet: a one-stage domain adaptation network for unsupervised nighttime semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15769–15778 (2021)

  23. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)

  24. Zheng, Z., Yang, Y.: Unsupervised scene adaptation with memory regularization in vivo. arXiv preprint arXiv:1912.11164 (2019)

  25. Zheng, Z., Wu, Y., Han, X., Shi, J.: ForkGAN: seeing into the rainy night. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 155–170. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_10

  26. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)


Acknowledgements

This project was carried out utilizing the high-performance computing resources provided by Artemis HPC [4] at the University of Sydney.

Author information

Corresponding author

Correspondence to Mingming Gong.

A Appendix

1.1 A.1 Opposite Direction of Image Translation

See Fig. 6.

Fig. 6. Architecture of the proposed method: direction from Night to Day domains.

1.2 A.2 Details on the Construction of the Training and Evaluation Data

For the evaluation stage of the image translation experiments, we evaluate performance on the translation task from the night condition of the ACDC dataset to the reference domains of the three adverse conditions. We choose this image dataset as our primary evaluation benchmark for the following reasons:

  • One of the key contributions of the proposed model is to leverage the reference images from the dataset. The ACDC dataset provides a set of reference images for each adverse condition, making the evaluation of our experimental results possible.

  • ACDC contains a specific adverse domain dedicated to nighttime street views, with fine-grained pixel-level annotations for up to 400 images. This aligns with the proposed model’s goal of translating from nighttime to daytime.

  • The reference images of the three adverse conditions in the ACDC dataset (rain, snow, and fog) can act as daytime domains, as the reference images for each adverse condition are taken during the daytime. The original images of these three adverse conditions can in turn serve as reference images in the setting of image translation from nighttime to daytime, providing an abundant source of data for training a sufficiently accurate translation model for evaluation purposes.

Table 4 presents the details of the data used in the experiments for training and validation. For the Daytime domain, we leverage the reference images of the three adverse conditions (i.e., Rain, Snow, and Fog) in ACDC as the training images. This is possible because the reference images for these three conditions are all taken under normal weather conditions during the daytime. We swap the sets of Rain, Snow, and Fog images with their corresponding reference sets in ACDC, using them as the training set and reference set, respectively.
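The following is a minimal sketch of how such a swap could be assembled in code; the directory names and dataset root are assumptions about the ACDC layout, not a verified file structure.

```python
from pathlib import Path

ACDC_ROOT = Path("/data/ACDC")                 # hypothetical dataset root
ADVERSE_CONDITIONS = ["rain", "snow", "fog"]   # conditions whose references are daytime

def build_day_training_set(root: Path = ACDC_ROOT):
    """Hedged sketch: collect the daytime reference images of rain, snow and fog
    as the 'day' training domain, while the original adverse images act as
    reference images (the swap described above)."""
    day_images, reference_images = [], []
    for cond in ADVERSE_CONDITIONS:
        # Folder names below are assumptions for illustration only.
        day_images += sorted((root / "rgb_anon" / cond / "train_ref").rglob("*.png"))
        reference_images += sorted((root / "rgb_anon" / cond / "train").rglob("*.png"))
    return day_images, reference_images
```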

Table 4. Subsets of ACDC dataset used for the experiments and evaluation
Table 5. Hyperparameter study of different choices of \(\lambda_{Seg}\) for the proposed method, evaluated with three pre-trained Cityscapes segmentation models on the ACDC train set.

1.3 A.3 Hyperparameter Learning of \(\lambda_{Seg}\)

The coefficient in the final objective that controls the importance of the segmentation module determines how much weight is given to the guidance from this module. We explore choices of \(\lambda_{Seg}\) over the range 0.3 to 1. On the ACDC nighttime domain, the proposed method performs best on the interval between 0.3 and 0.7, with an average of 26.6816 mIoU over this interval. In general, performance is not significantly affected by the choice of \(\lambda_{Seg}\). Nonetheless, we recommend adjusting this hyperparameter when applying the method to different datasets (Table 5).
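As a rough illustration of how \(\lambda_{Seg}\) could enter the overall objective, consider the minimal sketch below; the other loss terms are illustrative placeholders, not the exact terms of the proposed method.

```python
def generator_objective(adv_loss, seg_loss, other_losses, lambda_seg=0.5):
    """Hedged sketch: lambda_seg weights the segmentation guidance term in the
    total generator objective; adv_loss and other_losses are placeholders."""
    return adv_loss + lambda_seg * seg_loss + sum(other_losses)

# Hypothetical usage: values of lambda_seg between 0.3 and 0.7 worked well on
# the ACDC nighttime set according to the hyperparameter study above.
```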

1.4 A.4 Grouping the Classes for Better Visualization in Assessing the Segmentation Performance

We conduct a semantic segmentation evaluation (mIoU) on the translated images rather than the original nighttime images. This evaluates the model’s ability to preserve semantic details and to in-paint invisible and under-exposed regions of nighttime images during translation from the nighttime domain to the target domain. Here, we group the class labels into four categories to better visualize the experimental results in Table 6.

Table 6. For a clearer visualization of the semantic results, we group the 19 semantic classes in ACDC into four broad categories.
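A minimal sketch of such a grouping step is shown below. The four category names and the class-to-group mapping here are illustrative assumptions (Table 6 defines the actual grouping); per-class IoUs are simply averaged within each group.

```python
import numpy as np

# Hypothetical grouping of the 19 Cityscapes-style classes used by ACDC into
# four broad categories; the authoritative mapping is given in Table 6.
GROUPS = {
    "background": ["road", "sidewalk", "building", "wall", "fence", "terrain", "sky"],
    "objects":    ["pole", "traffic light", "traffic sign"],
    "nature":     ["vegetation"],
    "dynamic":    ["person", "rider", "car", "truck", "bus", "train",
                   "motorcycle", "bicycle"],
}

def grouped_miou(per_class_iou):
    """Average per-class IoU (dict: class name -> IoU) within each group."""
    return {
        group: float(np.mean([per_class_iou[c] for c in classes if c in per_class_iou]))
        for group, classes in GROUPS.items()
    }
```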

1.5 A.5 Table of the Notations and Symbols in the Proposed Method


1.6 A.6 The Code Skeleton of the Proposed Method

To illustrate the training procedure of our proposed method, we present detailed pseudo code in Algorithm 1 for further inspection.

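Since Algorithm 1 itself appears only as a figure, the following is a minimal sketch of what one training iteration might look like; the module interfaces, loss terms, and update order are assumptions for illustration, not the authoritative procedure.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, S, night_batch, day_batch, day_reference,
               opt_G, opt_D, lambda_seg=0.5):
    """Hedged sketch of one adversarial training iteration.
    G: generator, D: discriminator, S: frozen pre-trained segmentation module."""
    # 1) Translate night images to the day domain, guided by a day reference image.
    fake_day = G(night_batch, day_reference)

    # 2) Discriminator update: real day images vs. translated images.
    opt_D.zero_grad()
    real_logits, fake_logits = D(day_batch), D(fake_day.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    d_loss.backward()
    opt_D.step()

    # 3) Generator update: fool D and keep the semantics of the night input,
    #    using pseudo labels from S in place of ground-truth annotations.
    opt_G.zero_grad()
    fake_logits = D(fake_day)
    g_adv = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    with torch.no_grad():
        pseudo_labels = S(night_batch).argmax(dim=1)   # (B, H, W) pseudo labels
    g_seg = F.cross_entropy(S(fake_day), pseudo_labels)
    g_loss = g_adv + lambda_seg * g_seg
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```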

1.7 A.7 Limitations

Despite the improvements brought by our model, we identify certain limitations that deserve attention in future research:

  • Mitigating Unwanted Semantic In-Painting

    We need to tackle the issue of unnecessary semantic in-painting for specific classes in the target domain, especially vegetation. When imbalanced classes are present, we have observed that the generator may misinterpret regions with low-light conditions, which leads to in-painting inaccuracies in the translations. Although this is partially mitigated by the semantic prediction, it still occurs in the proposed method. Specifically, in our case the vegetation class is misinterpreted in translated images in regions where the original image provides no visual clues.

  • Addressing Motion Blur and Glare

    Challenges related to motion blur and glare elimination in the night-to-day translation direction also require attention. Motion blur is influenced by dataset choice and is common in datasets captured while moving. In our experiments, the ACDC dataset, gathered from a car-mounted camera capturing street views, suffers from this problem, which penalizes the quality of the image translation. In addition to motion blur, glare from light sources is a significant hurdle to improving image translation from the nighttime domain to the daytime domain. Since a primary research direction involves improving the translation method’s ability to capture geometric and textural features, these issues can lead to inaccurate conversions of semantic details that do not actually exist in the daytime domain.

  • Semantic Detail Preservation Challenges

    It is essential to enhance the preservation of semantic details for small and dynamic objects. Without ground-truth labels in our method, the mismatch between the source and target domains may cause the semantic predictions to overlook these objects. For instance, a pre-trained semantic segmentation model, trained on a dataset similar to the source domain (the daytime domain in our problem setting), may fail to predict accurate semantic maps for distant objects or objects that occupy a small proportion of the image. This limitation further restricts the guidance of semantic knowledge, which could otherwise improve the quality of image translation in our method.

  • Empirical Evaluation Challenges

    In the context of empirical investigation, the evaluation of our method’s performance is limited by the scarcity of datasets that provide reference images. For a more comprehensive assessment, it is critical to conduct empirical statistical analyses across a diverse range of datasets in night-to-day settings.

1.8 A.8 Future Directions

  • Improved Control of Semantic Prediction

    Due to the uncertainty of the semantic predictions produced by the segmentation module, a mechanism to quantify the confidence of the semantic prediction could be designed to dynamically adjust the weight of the semantic loss in the final objective. Specifically, summarizing the entropy of the semantic prediction into a concise quantity is one possible approach for this purpose; a minimal sketch of this idea is given after this list. The quantified confidence term could thereby be used as an auxiliary instrument, bolstering the model’s adaptability to varying semantic prediction scenarios.


  • Improving Reference Image Utilization

    The way reference images are incorporated to extract useful information could be further improved. Instead of conditioning on the outputs of the intermediate layers, using geometric matching could be more efficient for exploiting the visual features present in the reference image. In particular, existing methods for the spatial alignment of two images can selectively suppress mismatched regions, pinpointing more precisely the areas that require attention during image translation.
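As a rough illustration of the entropy-based confidence idea from the first direction above, the sketch below maps the entropy of a softmax segmentation prediction to a scalar confidence; how that confidence rescales the semantic loss weight is an assumption for illustration.

```python
import torch

def entropy_confidence(seg_logits, eps=1e-8):
    """Hedged sketch: map per-pixel prediction entropy to a confidence in [0, 1].
    seg_logits: (B, C, H, W) raw outputs of the segmentation module."""
    probs = torch.softmax(seg_logits, dim=1)
    entropy = -(probs * torch.log(probs + eps)).sum(dim=1)        # (B, H, W)
    max_entropy = torch.log(torch.tensor(float(seg_logits.shape[1])))
    confidence = 1.0 - entropy.mean() / max_entropy               # high entropy -> low confidence
    return confidence.clamp(0.0, 1.0)

# Hypothetical use: scale the semantic loss weight by the confidence, e.g.
# lambda_eff = lambda_seg * entropy_confidence(seg_logits)
```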


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ning, J., Gong, M. (2024). Enhancing Night-to-Day Image Translation with Semantic Prior and Reference Image Guidance. In: Bao, Z., Borovica-Gajic, R., Qiu, R., Choudhury, F., Yang, Z. (eds) Databases Theory and Applications. ADC 2023. Lecture Notes in Computer Science, vol 14386. Springer, Cham. https://doi.org/10.1007/978-3-031-47843-7_12

  • DOI: https://doi.org/10.1007/978-3-031-47843-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47842-0

  • Online ISBN: 978-3-031-47843-7