Computer Science > Computer Vision and Pattern Recognition

arXiv:2311.17955 (cs)

[Submitted on 29 Nov 2023 (v1), last revised 23 Jul 2024 (this version, v3)]

Title: PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution

Authors: Zuoyan Zhao, Hui Xue, Pengfei Fang, Shipeng Zhu

Abstract: Scene text image super-resolution (STISR) aims to simultaneously increase the resolution and readability of low-resolution scene text images, thus boosting the performance of the downstream recognition task. Two factors in scene text images, visual structure and semantic information, affect the recognition performance significantly. To mitigate the effects of these factors, this paper proposes a Prior-Enhanced Attention Network (PEAN). Specifically, an attention-based modulation module is leveraged to understand scene text images by perceiving the local and global dependence of images, regardless of the shape of the text. Meanwhile, a diffusion-based module is developed to enhance the text prior, hence offering better guidance for the SR network to generate SR images with higher semantic accuracy. Additionally, a multi-task learning paradigm is employed to optimize the network, enabling the model to generate legible SR images. As a result, PEAN establishes new SOTA results on the TextZoom benchmark. Experiments are also conducted to analyze the importance of the enhanced text prior as a means of improving the performance of the SR network. Code is available at this https URL.

Comments:	Accepted by ACMMM 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2311.17955 [cs.CV]
	(or arXiv:2311.17955v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.17955

Submission history

From: Zuoyan Zhao
[v1] Wed, 29 Nov 2023 08:11:20 UTC (771 KB)
[v2] Mon, 15 Apr 2024 08:43:58 UTC (902 KB)
[v3] Tue, 23 Jul 2024 09:09:33 UTC (915 KB)
