Conclusion
In this work, we study character counting in STR from a new viewpoint and present a principled framework showing that counting information is involved in both visual decoding and semantic decoding. Building on this framework, we propose a novel scene text recognizer with a dual character counting-aware visual and semantic modeling network, in which counting information is fused into both the vision and language branches. Experimental results demonstrate the effectiveness of our model.
Acknowledgements
This work was supported by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (Grant No. 202200049).
Additional information
Supporting information: Appendixes A–C. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Xiao, K., Zhu, A., Iwana, B.K. et al. Scene text recognition via dual character counting-aware visual and semantic modeling network. Sci. China Inf. Sci. 67, 139101 (2024). https://doi.org/10.1007/s11432-023-3935-8