Scene text recognition via dual character counting-aware visual and semantic modeling network

Ke Xiao¹,
Anna Zhu¹,
Brian Kenji Iwana² &
…
Cheng-Lin Liu³,⁴

Conclusion

In this work, we study character counting in scene text recognition (STR) from a new viewpoint and present a principled framework showing that counting information is involved in both visual decoding and semantic decoding. Building on this framework, we propose a novel scene text recognizer with a dual character counting-aware visual and semantic modeling network, in which counting information is fused into both the vision and language branches. Experimental results demonstrate the effectiveness of our model.

Acknowledgements

This work was supported by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (Grant No. 202200049).

Author information

Authors and Affiliations

School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
Ke Xiao & Anna Zhu
Human Interface Laboratory, Kyushu University, Fukuoka, 819-0395, Japan
Brian Kenji Iwana
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Cheng-Lin Liu
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
Cheng-Lin Liu

Corresponding author

Correspondence to Anna Zhu.

Additional information

Supporting information: Appendixes A–C. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
