CTNet: hybrid architecture based on CNN and transformer for image inpainting detection

Authors:

Ye Yao

Multimedia Systems, Volume 29, Issue 6

Pages 3819 - 3832

https://doi.org/10.1007/s00530-023-01184-w

Published: 19 September 2023

Abstract

Digital image inpainting has gained popularity with advances in image processing and machine vision. However, inpainting can be used not only to repair damaged photographs but also to remove specific people from images or distort their semantic content. To address inpainting forgeries, we propose a hybrid CNN-Transformer Network (CTNet) for image inpainting detection and localization. CTNet consists of a hybrid CNN-Transformer encoder, a feature enhancement module, and a decoder module. Unlike existing inpainting detection methods that rely on hand-crafted attention mechanisms, the hybrid encoder uses a CNN as a feature extractor to build feature maps, which are tokenized as the input patches of the Transformer encoder. This hybrid structure exploits the Transformer's innate global self-attention and can effectively capture long-range dependencies in the image. Because inpainting traces exist mainly in the high-frequency components of digital images, the feature enhancement module performs feature extraction in the frequency domain. The decoder uses high-frequency features to regularize the upsampling of the predicted masks. We investigate the generalization capacity of CTNet on datasets generated by ten commonly used inpainting methods. The experimental results show that, after being trained on datasets generated by a single inpainting method, the proposed model can detect a variety of unknown inpainting operations.
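The abstract does not give implementation details, so the following is only a minimal NumPy sketch of two ideas it mentions: turning a CNN feature map into a token sequence for self-attention, and isolating high-frequency content in the frequency domain. All function names, the single attention head, the random projection weights, and the hard radial cutoff are assumptions made for illustration; they are not taken from CTNet itself.

```python
import numpy as np


def tokenize_feature_map(fmap):
    """Flatten a (C, H, W) feature map into (H*W, C) tokens.

    Each spatial location becomes one token of dimension C
    (hypothetical helper, not the paper's code).
    """
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Subtract the row maximum for a numerically stable softmax.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Every output token is a weighted mix of ALL value vectors,
    # which is what gives the encoder a global receptive field.
    return weights @ v, weights


def high_pass_fft(image, cutoff):
    """Remove frequencies within `cutoff` bins of DC from a 2-D array.

    Assumption: a hard radial mask; a real module could learn its filters.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    spectrum[dist <= cutoff] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


# Toy usage with random data standing in for CNN features.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))           # C=8 channels on a 4x4 grid
tokens = tokenize_feature_map(fmap)             # 16 tokens of dimension 8
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
attended, attn = self_attention(tokens, w_q, w_k, w_v)
flat_residual = high_pass_fft(np.ones((8, 8)), cutoff=1)
```

In this sketch a perfectly flat image has no high-frequency content, so `flat_residual` is numerically zero, while each row of `attn` is a probability distribution over every spatial location of the feature map.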

Published In

Multimedia Systems, Volume 29, Issue 6

Dec 2023

800 pages

ISSN: 0942-4962

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 19 September 2023

Accepted: 04 September 2023

Received: 31 March 2023

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
