
Learning Fine-Grained and Semantically Aware Mamba Representations for Tampered Text Detection in Images

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15037)


Abstract

Tampered text detection in images, the task of detecting manipulated or forged text within image documents or signage, has attracted increasing attention due to the widespread use of image editing software and CNN-based synthesis techniques. The central difficulty in perceiving the subtle differences of tampered text images lies in the gap between a model's ability to capture global fine-grained information and what the task realistically demands. In this work, we propose a robust detection method, Tampered Text Detection with Mamba (TTDMamba). It achieves linear complexity without sacrificing global spatial context, offering significant advantages over the Transformer architecture. In particular, we adopt the advanced VMamba architecture as the encoder and introduce a High-frequency Feature Aggregation module that enriches the visual features with additional signals; this aggregation steers Mamba's attention toward fine-grained forgery cues. Additionally, we integrate Disentangled Semantic Axial Attention into the stacked Visual State Space blocks of the VMamba architecture, which injects the inherent high-level semantic attributes of the tampered image into the pretrained hierarchical encoder. As a result, we obtain a more reliable and accurate tamper map. Extensive experiments on the T-SROIE, T-IC13, and DocTamper datasets demonstrate that TTDMamba not only surpasses existing state-of-the-art methods in detection accuracy but also shows superior robustness in pixel-level forgery localization, marking a significant contribution to text tampering detection.
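The intuition behind high-frequency feature aggregation can be illustrated outside the paper's setting: tampering artifacts (splicing seams, resampling traces) tend to concentrate in high-frequency residuals, which even a simple high-pass filter exposes. The sketch below is a minimal pure-Python illustration, not the paper's module; the 3x3 Laplacian kernel, the additive fusion, and the names `laplacian_highpass` and `aggregate` are all hypothetical stand-ins for the learned aggregation described in the abstract.

```python
# Illustrative sketch only: extract a high-frequency residual with a
# Laplacian high-pass filter and fuse it with a visual feature map.
# The paper's actual aggregation module is learned and not shown here.

def laplacian_highpass(image):
    """High-frequency residual of a 2D grayscale image (list of rows),
    computed with a 3x3 Laplacian kernel and zero padding at the borders."""
    h, w = len(image), len(image[0])
    kernel = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc
    return out

def aggregate(visual_feat, image, alpha=0.5):
    """Fuse a visual feature map with the image's high-frequency residual.
    A fixed weighted sum here; in practice the fusion would be learned."""
    hf = laplacian_highpass(image)
    return [[v + alpha * r for v, r in zip(vrow, rrow)]
            for vrow, rrow in zip(visual_feat, hf)]

# A constant region yields a zero residual; edges respond strongly.
flat = [[1.0] * 4 for _ in range(4)]
print(laplacian_highpass(flat)[1][1])  # interior of a constant image -> 0.0
```

On uniform regions the residual vanishes, while pixels at text edges, where manipulation seams typically sit, produce strong responses; this residual is the kind of auxiliary signal the aggregation would feed alongside the visual features into the Mamba encoder.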



Acknowledgement

This work is funded by the Beijing Municipal Science and Technology Project (No. Z231100010323005), the Beijing Nova Program (No. 20230484276), the Youth Innovation Promotion Association CAS (Grant No. 2022132), and the National Natural Science Foundation of China (Grant No. 62206277).

Author information


Corresponding author

Correspondence to Huaibo Huang.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Sun, H., Cao, J., Zhang, Z., Wu, T., Zhou, K., Huang, H. (2025). Learning Fine-Grained and Semantically Aware Mamba Representations for Tampered Text Detection in Images. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15037. Springer, Singapore. https://doi.org/10.1007/978-981-97-8511-7_4


  • DOI: https://doi.org/10.1007/978-981-97-8511-7_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8510-0

  • Online ISBN: 978-981-97-8511-7

  • eBook Packages: Computer Science; Computer Science (R0)
