Abstract
Recent advances in image synthesis make it possible to translate images by learning the mapping between a source domain and a target domain. Existing methods tend to learn the distributions by training a model on a variety of datasets, with results evaluated largely in a subjective manner. Relatively few works in this area, however, study the potential use of image synthesis methods for recognition tasks. In this paper, we propose and explore the problem of image translation for data augmentation. We first propose a lightweight yet efficient model for translating texture to augment images based on a single input of source texture, allowing for fast training and testing; we refer to it as Single Image Texture Translation for data Augmentation (SITTA). We then explore the use of the augmented data in long-tailed and few-shot image classification tasks. We find that the proposed augmentation method and workflow are capable of translating the texture of input data into a target domain, leading to consistently improved image recognition performance. Finally, we examine how SITTA and related image translation methods can provide a basis for a data-efficient, “augmentation engineering” approach to model training.
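To make the described workflow concrete, the following is a minimal sketch of how texture-translated images might be folded into a classifier's training set for a rare class. Here `texture_translate` is a hypothetical stand-in for the translation model, not the authors' released API; a real model would map each image's texture into the target domain while preserving its content.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def texture_translate(images: torch.Tensor) -> torch.Tensor:
    """Stand-in for a single-image texture translation model such as SITTA.
    An identity stub here; a real model would translate each image's texture
    into the target domain while preserving content."""
    return images.clone()

# Toy long-tailed scenario: a rare class with only a handful of images.
rare_images = torch.rand(8, 3, 64, 64)           # eight real images
rare_labels = torch.zeros(8, dtype=torch.long)   # all from class 0

# Synthesize extra examples via texture translation, then merge them
# with the real data before training a classifier.
translated = texture_translate(rare_images)
train_set = ConcatDataset([
    TensorDataset(rare_images, rare_labels),
    TensorDataset(translated, rare_labels),
])
loader = DataLoader(train_set, batch_size=4, shuffle=True)
# `loader` now feeds both real and texture-translated images to the classifier.
```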
Notes
- 1.
The \(\epsilon \) is a small stability constant (e.g., \(\epsilon =10^{-5}\)) that avoids division by zero and imaginary values arising from numerical inaccuracies; see the sketch following these notes.
- 2.
We first train a SinGAN model and then apply it to the Paint-to-Image task.
- 3.
Note: We utilize the official code or widely used GitHub implementations. For CycleGAN, we train for the same number of epochs as SITTA to make a fair comparison. For FUNIT, we test one-shot translation using the official pre-trained model, which was trained for 100,000 iterations, so its ‘training time’ is ‘-’ and its ‘training iters’ is ‘pre-train’. We report the training time of SinGAN and TuiGAN at Scale = 0, which costs less time than the other scales.
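Footnote 1 mentions a stability constant; below is a minimal sketch of the typical use of such an \(\epsilon \) in a normalization statistic. The placement inside the square root is our assumption, since the paper's exact formula does not appear on this page.

```python
import torch

def stabilized_std(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # eps keeps the quantity under the square root strictly positive,
    # preventing division by zero and NaNs ("imaginary values") caused by
    # tiny negative variances from floating-point round-off.
    return torch.sqrt(x.var(dim=1, keepdim=True) + eps)

x = torch.rand(4, 16)
mu = x.mean(dim=1, keepdim=True)
normalized = (x - mu) / stabilized_std(x)  # safely standardized features
```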
References
Antoniou, A., Storkey, A., Edwards, H.: Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340 (2017)
Bolukbasi, T., Chang, K.-W., Zou, J.Y., Saligrama, V., Kalai, A.T.: Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv. Neural Inf. Process. Syst. 29, 4349–4357 (2016)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709 (2020)
Chen, X., Fan, H., Girshick, R., He, K.: Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297 (2020)
Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: AutoAugment: learning augmentation strategies from data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 113–123 (2019)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Efros, A.A., Freeman, W.T.: Image quilting for texture synthesis and transfer. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pp. 341–346 (2001)
Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576 (2015)
Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414–2423 (2016)
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: Imagenet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231 (2018)
Goodfellow, I., et al.: Generative adversarial nets. Adv. Neural. Inf. Process. Syst. 27, 2672–2680 (2014)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)
He, K., Sun, J., Tang, X.: Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 35(6), 1397–1409 (2012)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. arXiv preprint arXiv:1706.08500 (2017)
Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 179–196. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_11
Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013)
Lee, H.-Y., Tseng, H.-Y., Huang, J.-B., Singh, M., Yang, M.-H.: Diverse image-to-image translation via disentangled representations. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 36–52. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_3
LeGendre, C., et al.: DeepLight: learning illumination for unconstrained mobile mixed reality. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5918–5928 (2019)
Li, B., Wu, F., Lim, S.-N., Belongie, S., Weinberger, K.Q.: On feature normalization and data augmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12383–12392 (2021)
Li, B., Wu, F., Weinberger, K.Q., Belongie, S.: Positional normalization. In: Advances in Neural Information Processing Systems, pp. 1622–1634 (2019)
Lin, J., Pang, Y., Xia, Y., Chen, Z., Luo, J.: TuiGAN: learning versatile image-to-image translation with two unpaired images. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 18–35. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_2
Liu, M.-Y., et al.: Few-shot unsupervised image-to-image translation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. arXiv preprint arXiv:2003.08934 (2020)
Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pp. 722–729. IEEE (2008)
Park, T., Efros, A.A., Zhang, R., Zhu, J.-Y.: Contrastive learning for unpaired image-to-image translation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 319–345. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_19
Park, T., Zhu, J.-Y., Wang, O., Lu, J., Shechtman, E., Efros, A.A., Zhang, R.: Swapping autoencoder for deep image manipulation. In: Advances in Neural Information Processing Systems 33 (2020)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703 (2019)
Qin, Z., Liu, Z., Zhu, P., Xue, Y.: A GAN-based image synthesis method for skin lesion classification. Comput. Methods Programs Biomed. 195, 105568 (2020)
Rao, K., Harris, C., Irpan, A., Levine, S., Ibarz, J., Khansari, M.: RL-CycleGAN: reinforcement learning aware simulation-to-real. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11157–11166 (2020)
Shaham, T.R., Dekel, T., Michaeli, T.: SinGAN: learning a generative model from a single natural image. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4570–4580 (2019)
Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 1–48 (2019)
Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., Webb, R.: Learning from simulated and unsupervised images through adversarial training. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2107–2116 (2017)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Somavarapu, N., Ma, C.-Y., Kira, Z.: Frustratingly simple domain generalization via image stylization. arXiv preprint arXiv:2006.11207 (2020)
Sun, Q., Liu, Y., Chua, T.-S., Schiele, B.: Meta-transfer learning for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Thapa, R., Snavely, N., Belongie, S., Khan, A.: The plant pathology 2020 challenge dataset to classify foliar disease of apples. arXiv preprint arXiv:2004.11958 (2020)
Théry, M., Casas, J.: Predator and prey views of spider camouflage. Nature 415(6868), 133 (2002)
Touvron, H., Vedaldi, A., Douze, M., Jégou, H.: Fixing the train-test resolution discrepancy. In: Advances in Neural Information Processing Systems, pp. 8250–8260 (2019)
Van Horn, G., et al.: The iNaturalist species classification and detection dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8769–8778 (2018)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset (2011)
Wang, Y., Pan, X., Song, S., Zhang, H., Huang, G., Wu, C.: Implicit semantic data augmentation for deep networks. In: Advances in Neural Information Processing Systems, pp. 12635–12644 (2019)
Wang, Y., Yao, Q., Kwok, J.T., Ni, L.M.: Generalizing from a few examples: a survey on few-shot learning. ACM Comput. Surv. (CSUR), 53(3), 1–34 (2020)
Wang, Y.-X., Girshick, R., Hebert, M., Hariharan, B.: Low-shot learning from imaginary data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7278–7286 (2018)
Xiao, T., Wang, X., Efros, A.A., Darrell, T.: What should not be contrastive in contrastive learning. In: International Conference on Learning Representations (2021)
Yoo, J., Uh, Y., Chun, S., Kang, B., Ha, J.-W.: Photorealistic style transfer via wavelet transforms. In: International Conference on Computer Vision (ICCV) (2019)
Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: CutMix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6023–6032 (2019)
Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: International Conference on Learning Representations (2018)
Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 649–666. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_40
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)
Zhang, Y., Zhang, Y., Xu, Q., Zhang, R.: Learning robust shape-based features for domain generalization. IEEE Access 8, 63748–63756 (2020)
Zheng, X., Chalasani, T., Ghosal, K., Lutz, S., Smolic, A.: STaDa: style transfer as data augmentation. arXiv preprint arXiv:1909.01056 (2019)
Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE International Conference on Computer Vision (ICCV) (2017)
Acknowledgement
This work was supported in part by the Pioneer Centre for AI, DNRF grant number P1.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, B., Cui, Y., Lin, T.-Y., Belongie, S. (2023). SITTA: Single Image Texture Translation for Data Augmentation. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13802. Springer, Cham. https://doi.org/10.1007/978-3-031-25063-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25062-0
Online ISBN: 978-3-031-25063-7
eBook Packages: Computer Science, Computer Science (R0)