A Domain Gap Aware Generative Adversarial Network for Multi-Domain Image Translation

Published: 01 January 2022

Abstract

Recent image-to-image translation models have shown great success in mapping local textures between two domains. Existing approaches rely on a cycle-consistency constraint that supervises the generators to learn an inverse mapping. However, learning the inverse mapping introduces extra trainable parameters, and for some domains the inverse mapping cannot be learned at all. As a result, these approaches are ineffective in scenarios where (i) multiple visual domains are involved, (ii) both structure and texture transformations are required, and (iii) semantic consistency must be preserved. To address these challenges, this paper proposes a unified model that translates images across multiple domains with significant domain gaps. Unlike previous models, which constrain the generators with the ubiquitous cycle-consistency constraint to achieve content similarity, the proposed model employs a perceptual self-regularization constraint. With a single unified generator, the model maintains consistency over global shapes as well as local texture information across multiple domains. Extensive qualitative and quantitative evaluations demonstrate its effectiveness and superior performance over state-of-the-art models, particularly in representing shape deformation in challenging mappings with significant dataset variation across multiple domains.
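To make the contrast between the two constraints concrete, the PyTorch sketch below compares a cycle-consistency loss, which requires a second (inverse) generator, with a perceptual self-regularization loss computed on frozen VGG-16 features. This is a minimal illustration of the idea, not the paper's implementation: the generator signatures, the choice of VGG layers, and the use of L1 distances are assumptions made for the example.

import torch.nn.functional as nnf
import torchvision.models as models

# Frozen VGG-16 feature extractor used as a fixed perceptual metric
# (the layer cut-off here is an arbitrary illustrative choice).
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def cycle_consistency_loss(g_xy, g_yx, x):
    # CycleGAN-style constraint: a second generator g_yx must map the
    # translation back to the source, adding extra trainable parameters.
    return nnf.l1_loss(g_yx(g_xy(x)), x)

def perceptual_self_regularization(g, x, domain_code):
    # Single-generator constraint: the translation is pushed to stay close
    # to the input in a pretrained feature space, preserving global shape
    # and semantics without learning any inverse mapping.
    y = g(x, domain_code)
    return nnf.l1_loss(vgg(y), vgg(x))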

Cited By

  • MATR: Multimodal Medical Image Fusion via Multiscale Adaptive Transformer, IEEE Transactions on Image Processing, vol. 31, pp. 5134–5149, 2022. DOI: 10.1109/TIP.2022.3193288
  • Development of secrete images in image transferring system, Multimedia Tools and Applications, vol. 82, no. 5, pp. 7529–7552, 2022. DOI: 10.1007/s11042-022-13677-3

Published In

IEEE Transactions on Image Processing, Volume 31, 2022, 3518 pages

Publisher

IEEE Press

Qualifiers

  • Research-article
