Material Translation Based on Neural Style Transfer with Ideal Style Image Retrieval
Figure 1. Examples of material translation results. Metal objects translated from different original materials: wood, fabric, glass, plastic, fabric, and stone (content image in red). From left to right and top to bottom, respectively.
Figure 2. Examples of material translation (plastic → paper) using different style images (right corner of each generated result). The first picture shows the content image (blue), and the last is the generated image using our proposed framework (red).
Figure 3. Example images from the ten material classes used in this paper. Each row depicts images from the same class, from top to bottom: fabric, foliage, glass, leather, metal, paper, plastic, stone, water, and wood.
Figure 4. General overview of our proposal for material translation using style image retrieval.
Figure 5. Fixed style images per material, selected from the best-scored samples with the widest material regions. From left to right and top to bottom: fabric, foliage, glass, leather, metal, paper, plastic, stone, water, and wood.
Figure 6. Retrieved results from our proposal using IN (top) and BN (bottom). From left to right: content image (stone); results of fabric, foliage, and wood materials.
Figure 7. Classification accuracy per material class using our proposed VGG19-IN.
Figure 8. Classification accuracy (%) of translations from material A (rows) to material B (columns).
Figure 9. Translated results using our VGG19-IN proposal: (A) from wood and (B) from foliage material (content image in red). From left to right and top to bottom: content image; results of fabric, foliage, glass, leather, metal, stone, and water.
Figure 10. Qualitative results from all evaluated NST methods, translating from wood to stone. From left to right and top to bottom: content image (red) and style image (blue); results from Gatys, STROTSS, Johnson, MetaStyle, WCT, LST, and AdaIN.
Figure 11. Human perceptual study interface: 70% of the participants chose (A) and 18% chose (C), while the image generated using our approach (B) received only 12% of the votes.
Figure 12. Realism results from the human perceptual study. The Y-axis shows the average rate at which participants did not select the translated image as the outlier. Higher values mean that more people were fooled by the synthesized images.
Figure 13. Examples of synthesized images with fewer votes (i.e., perceived as real). Each row shows the image triplet presented in one question (1st row: metal; 2nd row: leather). The most-voted pictures are shown from left to right. The synthesized results of metal and leather received 4% and 14% of the votes, respectively (content image in red).
Figure 14. Examples of synthesized images with more votes (i.e., perceived as fake). Each row shows the image triplet presented in one question (1st row: foliage; 2nd row: water). The most-voted pictures are shown from left to right. The synthesized results of foliage and water received 88% and 85% of the votes, respectively (content image in red).
Abstract
1. Introduction
- We propose a single-material translation framework based on real-time material segmentation and neural style transfer with automatic style image retrieval (a minimal sketch of the overall flow follows this list).
- We present a human perceptual study with 100 participants that evaluates how well our generated results can fool human perception of objects with translated materials.
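The framework in the first contribution boils down to three steps: segment the material region of the content image, retrieve the best style image for the target material, and blend the NST output back through the segmentation mask. The sketch below only illustrates that composition; `segment_material`, `retrieve_style`, and `style_transfer` are hypothetical placeholders standing in for the paper's HarDNet-based segmentation, VGG19-IN retrieval, and NST components, not functions from a released codebase.

```python
import numpy as np

def translate_material(content_img, target_material, style_db,
                       segment_material, retrieve_style, style_transfer):
    """Sketch of the material-translation flow with hypothetical helpers.

    content_img: H x W x 3 float array in [0, 1]
    target_material: e.g., "stone"
    style_db: candidate style images grouped by material class
    """
    # 1) Real-time material segmentation: binary mask of the object
    #    whose material will be replaced.
    mask = segment_material(content_img)                # H x W, values in {0, 1}

    # 2) Style image retrieval: pick the candidate of the target material
    #    with the best classifier score and the widest material region.
    style_img = retrieve_style(style_db, target_material)

    # 3) Neural style transfer from the retrieved style image.
    stylized = style_transfer(content_img, style_img)   # H x W x 3

    # 4) Composite: stylized pixels inside the mask, original pixels outside.
    mask3 = mask[..., None].astype(np.float32)
    return mask3 * stylized + (1.0 - mask3) * content_img
```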
2. Related Work
3. Proposed Method
3.1. Style Image Retrieval
3.2. Material Translation with NST
3.2.1. Real-Time Material Segmentation
3.2.2. Material Translation
4. Experimental Results
4.1. Implementation Details
4.2. Datasets
4.3. Ablation Study
4.4. Comparison among SOTA NST Methods
4.5. Human Perceptual Study
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 11976–11986. [Google Scholar]
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423. [Google Scholar]
- Jing, Y.; Yang, Y.; Feng, Z.; Ye, J.; Yu, Y.; Song, M. Neural style transfer: A review. IEEE Trans. Vis. Comput. Graph. 2019, 26, 3365–3385. [Google Scholar] [CrossRef] [PubMed]
- Siarohin, A.; Zen, G.; Majtanovic, C.; Alameda-Pineda, X.; Ricci, E.; Sebe, N. How to make an image more memorable? A deep style transfer approach. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, Mountain View, CA, USA, 6–9 June 2017; pp. 322–329. [Google Scholar]
- Yanai, K.; Tanno, R. Conditional fast style transfer network. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, Mountain View, CA, USA, 6–9 June 2017; pp. 434–437. [Google Scholar]
- Li, T.; Qian, R.; Dong, C.; Liu, S.; Yan, Q.; Zhu, W.; Lin, L. BeautyGAN: Instance-level facial makeup transfer with deep generative adversarial network. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Korea, 22–26 October 2018; pp. 645–653. [Google Scholar]
- Matsuo, S.; Shimoda, W.; Yanai, K. Partial style transfer using weakly supervised semantic segmentation. In Proceedings of the IEEE International Conference on Multimedia & Expo Workshops, Hong Kong, China, 10–14 July 2017; pp. 267–272. [Google Scholar]
- Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Instance normalization: The missing ingredient for fast stylization. arXiv 2016, arXiv:1607.08022. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Ahn, J.; Kwak, S. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4981–4990. [Google Scholar]
- Chao, P.; Kao, C.Y.; Ruan, Y.S.; Huang, C.H.; Lin, Y.L. HarDNet: A low memory traffic network. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3552–3561. [Google Scholar]
- Sharan, L.; Rosenholtz, R.; Adelson, E. Material perception: What can you see in a brief glance? J. Vis. 2009, 9, 784. [Google Scholar] [CrossRef]
- Zhang, Y.; Ozay, M.; Liu, X.; Okatani, T. Integrating deep features for material recognition. In Proceedings of the 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 3697–3702. [Google Scholar]
- Benitez-Garcia, G.; Shimoda, W.; Yanai, K. Style Image Retrieval for Improving Material Translation Using Neural Style Transfer. In Proceedings of the 2020 Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (MMArt-ACM ’20), Dublin, Ireland, 26–29 October 2020; pp. 8–13. [Google Scholar]
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 694–711. [Google Scholar]
- Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510. [Google Scholar]
- Li, Y.; Fang, C.; Yang, J.; Wang, Z.; Lu, X.; Yang, M.H. Universal style transfer via feature transforms. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 386–396. [Google Scholar]
- Li, X.; Liu, S.; Kautz, J.; Yang, M.H. Learning linear transformations for fast arbitrary style transfer. arXiv 2018, arXiv:1808.04537. [Google Scholar]
- Zhang, C.; Zhu, Y.; Zhu, S.C. Metastyle: Three-way trade-off among speed, flexibility, and quality in neural style transfer. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1254–1261. [Google Scholar]
- Kolkin, N.; Salavon, J.; Shakhnarovich, G. Style Transfer by Relaxed Optimal Transport and Self-Similarity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10051–10060. [Google Scholar]
- Xu, Z.; Hou, L.; Zhang, J. IFFMStyle: High-Quality Image Style Transfer Using Invalid Feature Filter Modules. Sensors 2022, 22, 6134. [Google Scholar] [CrossRef] [PubMed]
- Kim, M.; Choi, H.C. Total Style Transfer with a Single Feed-Forward Network. Sensors 2022, 22, 4612. [Google Scholar] [CrossRef] [PubMed]
- Shimoda, W.; Yanai, K. Distinct class-specific saliency maps for weakly supervised semantic segmentation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 218–234. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Huang, X.; Liu, M.Y.; Belongie, S.; Kautz, J. Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 172–189. [Google Scholar]
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929. [Google Scholar]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
- Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8798–8807. [Google Scholar]
- Chen, Q.; Koltun, V. Photographic image synthesis with cascaded refinement networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1511–1520. [Google Scholar]
| Method | Training Set | Test Set |
|---|---|---|
| PSA | 10,000 (EFMD) | 1000 (FMD) |
| HarDNet-base | 10,000 (EFMD) | 1000 (FMD) |
| InceptionV3 | 10,000 (EFMD) | 1000 (FMD) |
| HarDNet | 900 (FMD) | 100 (FMD) |
| NST-based | - | 100 (FMD) |
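Since FMD provides 100 images for each of the ten material classes, the 900/100 HarDNet row corresponds to a per-class 90/10 partition. The snippet below is a minimal sketch of how such a deterministic split could be produced; the `FMD/<class>/*.jpg` directory layout and the `split_fmd` helper are our own assumptions, not official dataset tooling.

```python
import random
from pathlib import Path

def split_fmd(root="FMD", train_per_class=90, seed=0):
    """Deterministic 90/10 per-class split of the ten FMD material classes."""
    rng = random.Random(seed)
    train, test = [], []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        images = sorted(class_dir.glob("*.jpg"))
        rng.shuffle(images)
        train += images[:train_per_class]   # 90 images per class -> 900 total
        test += images[train_per_class:]    # remaining 10 per class -> 100 total
    return train, test
```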
| Method | acc (w/o Refine) | mIoU (w/o Refine) | acc (w/ Refine) | mIoU (w/ Refine) |
|---|---|---|---|---|
| Baseline | - | - | 0.556 | 0.4860 |
| VGG19-IN | 0.409 | 0.3967 | 0.572 | 0.5062 |
| VGG19-BN | 0.291 | 0.3612 | 0.543 | 0.4887 |
| VGG19 | 0.270 | 0.3520 | 0.506 | 0.4845 |
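The acc and mIoU columns are the usual pixel accuracy and mean intersection-over-union computed between predicted and ground-truth material maps. The following is a generic sketch of those two metrics from integer label maps, not the authors' evaluation code.

```python
import numpy as np

def accuracy_and_miou(pred, target, num_classes):
    """Pixel accuracy and mean IoU from integer label maps of equal shape."""
    pred = pred.reshape(-1)
    target = target.reshape(-1)

    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.bincount(num_classes * target + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)

    acc = np.diag(cm).sum() / cm.sum()

    # Per-class IoU = TP / (TP + FP + FN); classes absent from both maps are ignored.
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = np.divide(tp, union, out=np.full(num_classes, np.nan), where=union > 0)
    return acc, np.nanmean(iou)
```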
| Method | acc ↑ | mIoU ↑ | IS ↑ | FID ↓ | Inference Time ↓ |
|---|---|---|---|---|---|
| Gatys [4] | 0.572 | 0.5062 | 4.181 | 61.30 | 45.6545 s |
| STROTSS [22] | 0.515 | 0.4887 | 4.046 | 60.29 | 89.1562 s |
| Johnson’s [17] | 0.506 | 0.4464 | 3.887 | 68.44 | 0.0881 s |
| MetaStyle [21] | 0.442 | 0.4674 | 3.635 | 61.93 | 0.1868 s |
| WCT [19] | 0.353 | 0.4079 | 3.604 | 64.53 | 1.0151 s |
| LST [20] | 0.343 | 0.3606 | 3.569 | 62.95 | 0.4816 s |
| AdaIN [18] | 0.304 | 0.2780 | 3.129 | 74.52 | 0.1083 s |
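The inference times above span four orders of magnitude because Gatys and STROTSS optimize each output image, while the feed-forward methods need a single pass. When reproducing such timings on a GPU, asynchronous kernel launches must be synchronized before reading the clock; the snippet below is a generic PyTorch timing sketch under our own assumptions (warm-up runs, per-image averaging), not the authors' measurement script.

```python
import time
import torch

@torch.no_grad()
def mean_inference_time(model, inputs, warmup=5, device="cuda"):
    """Average seconds per forward pass; synchronizes around the timed loop."""
    model = model.to(device).eval()
    inputs = [x.to(device) for x in inputs]

    for x in inputs[:warmup]:            # warm-up passes are not timed
        model(x)

    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for x in inputs:
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()         # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / len(inputs)
```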
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
MDPI and ACS Style: Benitez-Garcia, G.; Takahashi, H.; Yanai, K. Material Translation Based on Neural Style Transfer with Ideal Style Image Retrieval. Sensors 2022, 22, 7317. https://doi.org/10.3390/s22197317
AMA Style: Benitez-Garcia G, Takahashi H, Yanai K. Material Translation Based on Neural Style Transfer with Ideal Style Image Retrieval. Sensors. 2022; 22(19):7317. https://doi.org/10.3390/s22197317
Chicago/Turabian Style: Benitez-Garcia, Gibran, Hiroki Takahashi, and Keiji Yanai. 2022. "Material Translation Based on Neural Style Transfer with Ideal Style Image Retrieval" Sensors 22, no. 19: 7317. https://doi.org/10.3390/s22197317
APA Style: Benitez-Garcia, G., Takahashi, H., & Yanai, K. (2022). Material Translation Based on Neural Style Transfer with Ideal Style Image Retrieval. Sensors, 22(19), 7317. https://doi.org/10.3390/s22197317