Computer Science > Computer Vision and Pattern Recognition

arXiv:2409.03516 (cs)

[Submitted on 5 Sep 2024]

Title: LMLT: Low-to-high Multi-Level Vision Transformer for Image Super-Resolution

Authors: Jeongsoo Kim, Jongho Nang, Junsuk Choe

Abstract: Recent Vision Transformer (ViT)-based methods for Image Super-Resolution have demonstrated impressive performance. However, their high computational complexity leads to long inference times and heavy memory usage. Additionally, ViT models that use Window Self-Attention (WSA) struggle to process regions outside their windows. To address these issues, we propose the Low-to-high Multi-Level Transformer (LMLT), which applies attention to features of a different spatial size in each head. LMLT divides image features along the channel dimension, gradually reduces the spatial size for lower heads, and applies self-attention to each head. This approach effectively captures both local and global information. By integrating the results from lower heads into higher heads, LMLT overcomes the window boundary issues in self-attention. Extensive experiments show that our model significantly reduces inference time and GPU memory usage while maintaining or even surpassing the performance of state-of-the-art ViT-based Image Super-Resolution methods. Our code is available at this https URL.
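
The low-to-high flow described in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the function names, the 2x size reduction per level, average pooling, nearest-neighbour upsampling, and the use of global attention without learned projections (in place of windowed attention) are all assumptions made to keep the example short.

```python
import math

def avg_pool(x, k):
    """Average-pool features x[c][h][w] by factor k (assumed to divide h and w)."""
    return [[[sum(ch[i * k + a][j * k + b] for a in range(k) for b in range(k)) / (k * k)
              for j in range(len(ch[0]) // k)]
             for i in range(len(ch) // k)]
            for ch in x]

def upsample(x, k):
    """Nearest-neighbour upsampling of x[c][h][w] by factor k."""
    return [[[ch[i // k][j // k] for j in range(len(ch[0]) * k)]
             for i in range(len(ch) * k)]
            for ch in x]

def add(x, y):
    """Element-wise sum of two feature maps with identical shapes."""
    return [[[a + b for a, b in zip(row_x, row_y)]
             for row_x, row_y in zip(ch_x, ch_y)]
            for ch_x, ch_y in zip(x, y)]

def self_attention(x):
    """Scaled dot-product self-attention over all spatial positions.

    Assumption: queries, keys, and values are the raw tokens (no learned
    projections) and attention is global rather than windowed.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    tokens = [[x[ch][i][j] for ch in range(c)] for i in range(h) for j in range(w)]
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(c) for key in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(wt * v[ch] for wt, v in zip(weights, tokens)) / total
                    for ch in range(c)])
    return [[[out[i * w + j][ch] for j in range(w)] for i in range(h)] for ch in range(c)]

def multi_level_attention(feat, num_heads):
    """Split channels into heads, attend from the lowest level to the highest.

    Head 0 is the lowest level (most reduced); the last head keeps full size.
    Each head's attention output is upsampled and added to the next head's input.
    """
    per_head = len(feat) // num_heads
    outputs, carry = [], None
    for head in range(num_heads):
        chunk = feat[head * per_head:(head + 1) * per_head]
        k = 2 ** (num_heads - 1 - head)       # assumed 2x reduction per level
        x = avg_pool(chunk, k)
        if carry is not None:
            x = add(x, upsample(carry, 2))    # integrate the lower head's result
        carry = self_attention(x)
        outputs.append(upsample(carry, k))    # restore the original spatial size
    return [ch for out in outputs for ch in out]
```

Calling `multi_level_attention(feat, 3)` on a feature map with 6 channels of size 4x4 returns 6 channels of size 4x4. The lowest head attends over a single pooled token, so it is cheap but sees the whole image, while the highest head attends at full resolution with the coarser context already added in.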

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2409.03516 [cs.CV]
	(or arXiv:2409.03516v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.03516

Submission history

From: Jeongsoo Kim
[v1] Thu, 5 Sep 2024 13:29:50 UTC (29,957 KB)
