Computer Science > Computer Vision and Pattern Recognition

arXiv:2403.07518 (cs)

[Submitted on 12 Mar 2024]

Title: Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss

Abstract: Scene text recognition is an important and challenging task in computer vision. However, most prior work focuses on recognizing words from a pre-defined vocabulary, whereas real-world applications contain many out-of-vocabulary (OOV) words.
In this paper, we propose a novel open-vocabulary text recognition framework, Pseudo-OCR, to recognize OOV words. The key challenge in this task is the lack of OOV training data. To address it, we first propose a pseudo label generation module that leverages character detection and image inpainting to produce a substantial amount of pseudo OOV training data from real-world images. Unlike previous synthetic data, our pseudo OOV data contains real characters and backgrounds, so it better simulates real-world applications. Second, to reduce noise in the pseudo data, we present a semantic checking mechanism that filters the pseudo data for semantically meaningful samples. Third, we introduce a quality-aware margin loss to boost training with pseudo data. The loss includes a margin-based part that enhances classification ability and a quality-aware part that penalizes low-quality samples in both real and pseudo data.
Extensive experiments demonstrate that our approach outperforms the state of the art on eight datasets and achieves first rank in the ICDAR2022 challenge.
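
The abstract does not give the formula for the quality-aware margin loss, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes an additive margin subtracted from the target-class logit (in the style of common margin softmax losses) and a per-sample quality score in [0, 1] that down-weights low-quality samples. The function names, the default margin of 0.35, and the weighting scheme are all hypothetical.

```python
import math
from typing import Sequence


def margin_cross_entropy(logits: Sequence[float], target: int, margin: float) -> float:
    """Cross-entropy computed after subtracting `margin` from the target logit.

    Assumption: an additive margin on the target class. The paper's actual
    margin term may differ.
    """
    adjusted = [z - margin if i == target else z for i, z in enumerate(logits)]
    peak = max(adjusted)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in adjusted))
    return log_norm - adjusted[target]


def quality_weighted_margin_loss(
    batch_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    qualities: Sequence[float],
    margin: float = 0.35,  # hypothetical default, not taken from the paper
) -> float:
    """Quality-weighted mean of the per-sample margin losses.

    Assumption: each sample carries a quality score in [0, 1], and samples
    with lower scores contribute less to the batch loss.
    """
    if not (len(batch_logits) == len(targets) == len(qualities)):
        raise ValueError("batch_logits, targets and qualities must have equal length")
    total_weight = sum(qualities)
    if total_weight <= 0:
        raise ValueError("at least one sample needs a positive quality score")
    weighted = sum(
        q * margin_cross_entropy(logits, t, margin)
        for logits, t, q in zip(batch_logits, targets, qualities)
    )
    return weighted / total_weight


if __name__ == "__main__":
    # Two samples over three character classes; the second is low quality.
    loss = quality_weighted_margin_loss(
        batch_logits=[[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]],
        targets=[0, 2],
        qualities=[1.0, 0.2],
    )
    print(f"batch loss: {loss:.4f}")
```

With `margin=0.0` and all quality scores equal, this reduces to ordinary mean cross-entropy, which makes the two added ingredients easy to ablate separately.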

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.07518 [cs.CV]
	(or arXiv:2403.07518v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.07518

Submission history

From: Xuhua Ren
[v1] Tue, 12 Mar 2024 10:54:38 UTC (850 KB)
