DOI: 10.1145/3647649.3647697
research-article

Orthogonal Feature Alignment Network for Cross-Domain Text Detection

Published: 03 May 2024

Abstract

Deep-learning-based scene text detection methods have achieved remarkable success. Because manually annotating datasets is laborious and time-consuming, large amounts of synthetic data have been created and used for training. However, owing to the domain discrepancy between synthetic and real scene data, models trained on synthetic data may suffer performance degradation when applied to real scenes. To tackle this domain shift, we propose the Orthogonal Feature Alignment Network (OFAN), designed specifically for text objects. OFAN incorporates an orthogonal feature enhancement module that strengthens the edge features of text instances and thereby emphasizes the text objects, and it employs adversarial training to align text instances across domains. In addition, a multi-transform self-training mixture technique further improves detection performance in the target domain by mitigating the adverse effects of false positives and false negatives. We extensively evaluate OFAN on four benchmark datasets, and the experimental results demonstrate the effectiveness of our approach.
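The abstract does not spell out how the multi-transform self-training step works, so the following is only a minimal sketch of one plausible reading, not the authors' method: run a detector on several transformed views of an unlabeled target-domain image, map the boxes back to the original coordinates, and keep as pseudo-labels only the confident boxes that more than one view agrees on. The function names (`iou`, `hflip_box`, `pseudo_labels`), the thresholds, the axis-aligned box format, and the hard-coded detections are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2, score), axis-aligned.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hflip_box(box: Box, width: float) -> Box:
    """Map a box detected on a horizontally flipped image back to the original."""
    x1, y1, x2, y2, score = box
    return (width - x2, y1, width - x1, y2, score)


def pseudo_labels(views: List[List[Box]], score_thr: float = 0.5,
                  iou_thr: float = 0.5, min_votes: int = 2) -> List[Box]:
    """Keep confident boxes that overlap a detection in at least `min_votes` views.

    `views` holds one detection list per transform, already mapped back
    to the coordinates of the original image.
    """
    kept: List[Box] = []
    for i, view in enumerate(views):
        for box in view:
            if box[4] < score_thr:
                continue  # low-confidence boxes are never used as labels
            # One vote from the box's own view, plus one per other view
            # that contains an overlapping detection.
            votes = 1 + sum(
                any(iou(box, other) >= iou_thr for other in views[j])
                for j in range(len(views)) if j != i
            )
            is_duplicate = any(iou(box, k) >= iou_thr for k in kept)
            if votes >= min_votes and not is_duplicate:
                kept.append(box)
    return kept


# Made-up detections on a 100-pixel-wide image, for illustration only.
width = 100.0
identity_view = [(10, 10, 40, 20, 0.9), (60, 60, 80, 70, 0.8)]
flipped_raw = [(60, 10, 90, 20, 0.85)]  # detected on the flipped image
flipped_view = [hflip_box(b, width) for b in flipped_raw]
labels = pseudo_labels([identity_view, flipped_view])
```

In this toy run the first box is confirmed by the flipped view and kept, while the second box appears in only one view and is discarded. Requiring agreement across views is one way to suppress false positives; lowering `min_votes` or `score_thr` trades that off against missed text (false negatives).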

    Information

    Published In

    ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing
    January 2024
    480 pages
    ISBN:9798400716720
    DOI:10.1145/3647649

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Domain adaption
    2. Scene text detection

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICIGP 2024
