A Domain Gap Aware Generative Adversarial Network for Multi-Domain Image Translation

Published: 01 January 2022

Abstract

Recent image-to-image translation models have shown great success in mapping local textures between two domains. Existing approaches rely on a cycle-consistency constraint that supervises the generators to learn an inverse mapping. However, learning the inverse mapping introduces extra trainable parameters, and for some domains the inverse mapping cannot be learned at all. As a result, these approaches are ineffective in scenarios where (i) multiple visual domains are involved, (ii) both structure and texture transformations are required, and (iii) semantic consistency must be preserved. To address these challenges, this paper proposes a unified model that translates images across multiple domains with significant domain gaps. Unlike previous models, which constrain the generators with the ubiquitous cycle-consistency constraint to achieve content similarity, the proposed model employs a perceptual self-regularization constraint. With a single unified generator, the model maintains consistency over global shapes as well as local texture information across multiple domains. Extensive qualitative and quantitative evaluations demonstrate its effectiveness and superior performance over state-of-the-art models, particularly in representing shape deformation in challenging mappings with significant dataset variation across multiple domains.
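To make the contrast between the two constraints concrete, the PyTorch sketch below compares a cycle-consistency loss, which requires a second (inverse) generator, with a perceptual self-regularization loss computed on frozen VGG-16 features. This is a minimal illustration of the idea, not the paper's implementation: the generator signatures, the choice of VGG layers, and the use of L1 distances are assumptions made for the example.

import torch.nn.functional as nnf
import torchvision.models as models

# Frozen VGG-16 feature extractor used as a fixed perceptual metric
# (the layer cut-off here is an arbitrary illustrative choice).
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def cycle_consistency_loss(g_xy, g_yx, x):
    # CycleGAN-style constraint: a second generator g_yx must map the
    # translation back to the source, adding extra trainable parameters.
    return nnf.l1_loss(g_yx(g_xy(x)), x)

def perceptual_self_regularization(g, x, domain_code):
    # Single-generator constraint: the translation is pushed to stay close
    # to the input in a pretrained feature space, preserving global shape
    # and semantics without learning any inverse mapping.
    y = g(x, domain_code)
    return nnf.l1_loss(vgg(y), vgg(x))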

Cited By

  • MATR: Multimodal Medical Image Fusion via Multiscale Adaptive Transformer, IEEE Transactions on Image Processing, vol. 31, pp. 5134–5149, 2022. DOI: 10.1109/TIP.2022.3193288
  • Development of secrete images in image transferring system, Multimedia Tools and Applications, vol. 82, no. 5, pp. 7529–7552, 2022. DOI: 10.1007/s11042-022-13677-3

Published In

IEEE Transactions on Image Processing, Volume 31, 2022, 3518 pages

Publisher

IEEE Press

Qualifiers

  • Research-article
