
Learning Fine-Grained and Semantically Aware Mamba Representations for Tampered Text Detection in Images

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15037)


Abstract

Tampered text detection in images, the task of detecting manipulated or forged text within image documents or signage, has attracted increasing attention due to the widespread use of image editing software and CNN-based synthesis techniques. The central difficulty in perceiving the subtle differences of tampered text images lies in the gap between a model's ability to capture global fine-grained information and what the task realistically demands. In this work, we propose a robust detection method, Tampered Text Detection with Mamba (TTDMamba). It achieves linear complexity without sacrificing global spatial context, offering significant advantages over the Transformer architecture. In particular, we adopt the advanced VMamba architecture as the encoder and introduce a High-frequency Feature Aggregation module that enriches the visual features with additional signals; this aggregation steers Mamba's attention toward fine-grained forgery cues. Additionally, we integrate Disentangled Semantic Axial Attention into the stacked Visual State Space blocks of the VMamba architecture, which injects the inherent high-level semantic attributes of the tampered image into the pretrained hierarchical encoder. As a result, we obtain a more reliable and accurate tamper map. Extensive experiments on the T-SROIE, T-IC13, and DocTamper datasets demonstrate that TTDMamba not only surpasses existing state-of-the-art methods in detection accuracy but also shows superior robustness in pixel-level forgery localization, marking a significant contribution to text tampering detection.
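The intuition behind high-frequency feature aggregation can be illustrated outside the paper's setting: tampering artifacts (splicing seams, resampling traces) tend to concentrate in high-frequency residuals, which even a simple high-pass filter exposes. The sketch below is a minimal pure-Python illustration, not the paper's module; the 3x3 Laplacian kernel, the additive fusion, and the names `laplacian_highpass` and `aggregate` are all hypothetical stand-ins for the learned aggregation described in the abstract.

```python
# Illustrative sketch only: extract a high-frequency residual with a
# Laplacian high-pass filter and fuse it with a visual feature map.
# The paper's actual aggregation module is learned and not shown here.

def laplacian_highpass(image):
    """High-frequency residual of a 2D grayscale image (list of rows),
    computed with a 3x3 Laplacian kernel and zero padding at the borders."""
    h, w = len(image), len(image[0])
    kernel = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc
    return out

def aggregate(visual_feat, image, alpha=0.5):
    """Fuse a visual feature map with the image's high-frequency residual.
    A fixed weighted sum here; in practice the fusion would be learned."""
    hf = laplacian_highpass(image)
    return [[v + alpha * r for v, r in zip(vrow, rrow)]
            for vrow, rrow in zip(visual_feat, hf)]

# A constant region yields a zero residual; edges respond strongly.
flat = [[1.0] * 4 for _ in range(4)]
print(laplacian_highpass(flat)[1][1])  # interior of a constant image -> 0.0
```

On uniform regions the residual vanishes, while pixels at text edges, where manipulation seams typically sit, produce strong responses; this residual is the kind of auxiliary signal the aggregation would feed alongside the visual features into the Mamba encoder.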



Acknowledgement

This work is funded by the Beijing Municipal Science and Technology Project (No. Z231100010323005), the Beijing Nova Program (No. 20230484276), the Youth Innovation Promotion Association CAS (Grant No. 2022132), and the National Natural Science Foundation of China (Grant No. 62206277).

Author information


Corresponding author

Correspondence to Huaibo Huang.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Sun, H., Cao, J., Zhang, Z., Wu, T., Zhou, K., Huang, H. (2025). Learning Fine-Grained and Semantically Aware Mamba Representations for Tampered Text Detection in Images. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15037. Springer, Singapore. https://doi.org/10.1007/978-981-97-8511-7_4


  • DOI: https://doi.org/10.1007/978-981-97-8511-7_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8510-0

  • Online ISBN: 978-981-97-8511-7

  • eBook Packages: Computer Science; Computer Science (R0)
