TTS: Hilbert Transform-Based Generative Adversarial Network for Tattoo and Scene Text Spotting

Published: 19 March 2024

Abstract

Text spotting in natural scenes is of increasing interest and significance due to its critical role in several applications, such as visual question answering, named entity recognition and event rumor detection on social media. One of the newly emerging challenging problems is Tattoo Text Spotting (TTS) in images for assisting forensic teams and for person identification. Unlike the generally simpler scene text addressed by current state-of-the-art methods, tattoo text is typically characterized by the presence of decorative backgrounds, calligraphic handwriting and several distortions due to the deformable nature of the skin. This paper describes the first approach to address TTS in a real-world application context by designing an end-to-end text spotting method employing a Hilbert transform-based Generative Adversarial Network (GAN). To reduce the complexity of the TTS task, the proposed approach first detects fine details in the image using the Hilbert transform and the Optimum Phase Congruency (OPC). To overcome the challenges of only having a relatively small number of training samples, a GAN is then used for generating suitable text samples and descriptors for text spotting (i.e., both detection and recognition). The superior performance of the proposed TTS approach, for both tattoo and general scene text, over the state-of-the-art methods is demonstrated on a new TTS-specific dataset (publicly available) as well as on the existing benchmark natural scene text datasets: Total-Text, CTW1500 and ICDAR 2015.
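The fine-detail detection step named in the abstract can be made concrete with a small sketch. The paper's Optimum Phase Congruency (OPC) formulation is not reproduced here; the following is only a minimal, single-orientation phase-congruency measure built from the Hilbert transform in Python, where the difference-of-Gaussians band-pass and the `sigmas` scale set are illustrative assumptions rather than the authors' design.

```python
import numpy as np
from scipy.signal import hilbert
from scipy.ndimage import gaussian_filter1d

def phase_congruency_1d(signal, sigmas=(1, 2, 4), eps=1e-8):
    """Simplified single-orientation phase congruency via the Hilbert transform.

    For each scale, the band-pass response of the signal (difference of
    Gaussians) is paired with its Hilbert transform as the quadrature
    response. Phase congruency is the ratio of summed local energy to
    summed response amplitude; values near 1 mark step/line features.
    """
    even_sum = np.zeros_like(signal, dtype=float)
    odd_sum = np.zeros_like(signal, dtype=float)
    amp_sum = np.zeros_like(signal, dtype=float)
    for s in sigmas:
        # Band-pass filter: difference of two Gaussian smoothings.
        even = gaussian_filter1d(signal, s) - gaussian_filter1d(signal, 2 * s)
        odd = np.imag(hilbert(even))          # quadrature (Hilbert) response
        even_sum += even
        odd_sum += odd
        amp_sum += np.hypot(even, odd)        # per-scale local amplitude
    energy = np.hypot(even_sum, odd_sum)      # local energy across scales
    return energy / (amp_sum + eps)

# A 1D intensity profile with a single step edge at index 64:
row = np.r_[np.zeros(64), np.ones(64)]
pc = phase_congruency_1d(row)
print("phase congruency peaks near index", int(np.argmax(pc)))  # ~64
```

Because the measure normalizes local energy by summed amplitude, it responds to phase alignment rather than contrast, which is the property that lets phase-congruency features survive the low-contrast, decorated backgrounds typical of tattoo text.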
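The abstract's second component, a GAN that synthesizes additional training samples to offset the small TTS training set, is likewise only sketched generically below as a minimal adversarial training step in PyTorch. The network shapes, the latent dimension `Z_DIM`, the patch size `IMG`, and the flat fully-connected layers are hypothetical placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 64x64 grayscale text patches, 100-d latent code.
Z_DIM, IMG = 100, 64

generator = nn.Sequential(
    nn.Linear(Z_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG * IMG), nn.Tanh(),   # pixel values in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(IMG * IMG, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),                      # real/fake logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    """One adversarial update: discriminator on real vs. fake, then generator."""
    n = real_batch.size(0)
    fake = generator(torch.randn(n, Z_DIM))

    # Discriminator: label real patches 1, generated patches 0.
    d_loss = bce(discriminator(real_batch), torch.ones(n, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the updated discriminator into labelling fakes as real.
    g_loss = bce(discriminator(fake), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Stand-in "real" patches (random here purely for illustration):
print(train_step(torch.rand(8, IMG * IMG) * 2 - 1))
```

In practice the real batch would be cropped tattoo-text patches from the TTS dataset; the alternating discriminator/generator updates shown here are the standard minimax recipe that any GAN-based sample generator, including the one described in the abstract, builds on.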



Published In

IEEE Transactions on Multimedia, Volume 26, 2024 (10405 pages)

Publisher

IEEE Press

Qualifiers

• Research-article