Computer Science > Computer Vision and Pattern Recognition

arXiv:2308.14960 (cs)

[Submitted on 29 Aug 2023 (v1), last revised 10 Nov 2023 (this version, v2)]

Title: Read-only Prompt Optimization for Vision-Language Few-shot Learning

Authors: Dongjun Lee, Seokwon Song, Jihee Suh, Joonmyung Choi, Sanghyeok Lee, Hyunwoo J. Kim

Abstract: In recent years, prompt tuning has proven effective in adapting pre-trained vision-language models to downstream tasks. These methods adapt the pre-trained models by introducing learnable prompts while keeping the pre-trained weights frozen. However, learnable prompts can alter the internal representations within the self-attention module, which may increase performance variance and hurt generalization, especially in data-deficient settings. To address these issues, we propose a novel approach, Read-only Prompt Optimization (RPO). RPO leverages masked attention to prevent internal representation shift in the pre-trained model. Further, to facilitate the optimization of RPO, the read-only prompts are initialized based on special tokens of the pre-trained model. Our extensive experiments demonstrate that RPO outperforms CLIP and CoCoOp in base-to-new generalization and domain generalization while displaying better robustness. The proposed method also achieves better generalization in extremely data-deficient settings, while improving parameter efficiency and reducing computational overhead. Code is available at this https URL.
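
The read-only idea in the abstract can be illustrated with a toy attention mask. The following is a minimal sketch and not the authors' implementation: the function names (`read_only_mask`, `attention`), the single-head attention with identity projections, and the choice to let prompt queries attend to every token are all assumptions made for illustration. It only shows that, when original tokens are blocked from attending to prompt tokens, their outputs do not depend on the prompt values.

```python
import math

NEG_INF = float("-inf")


def read_only_mask(n_orig, n_prompt):
    """Additive attention mask: rows are queries, columns are keys.

    Original tokens (first n_orig rows) are blocked from attending to
    prompt tokens (last n_prompt columns). Prompt rows are left unmasked,
    which is an assumption of this sketch.
    """
    n = n_orig + n_prompt
    mask = [[0.0] * n for _ in range(n)]
    for q in range(n_orig):
        for k in range(n_orig, n):
            mask[q][k] = NEG_INF
    return mask


def softmax(row):
    """Numerically stable softmax; masked (-inf) entries get weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(tokens, mask):
    """Toy single-head self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q, query in enumerate(tokens):
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(d) + mask[q][k]
            for k, key in enumerate(tokens)
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(d)]
        )
    return out


# Hypothetical toy inputs: two "pre-trained" tokens and one prompt token.
original = [[1.0, 0.0], [0.0, 1.0]]
prompt_a = [[0.5, 0.5]]
prompt_b = [[-3.0, 2.0]]

mask = read_only_mask(n_orig=2, n_prompt=1)
out_a = attention(original + prompt_a, mask)
out_b = attention(original + prompt_b, mask)
out_plain = attention(original, read_only_mask(n_orig=2, n_prompt=0))
```

In this toy setting, the first two rows of `out_a` and `out_b` match `out_plain`, while the prompt row changes with the prompt value: the prompt reads from the original tokens without writing back into them.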

Comments:	Accepted at ICCV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.14960 [cs.CV]
	(or arXiv:2308.14960v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.14960

Submission history

From: Dongjun Lee
[v1] Tue, 29 Aug 2023 01:22:30 UTC (2,491 KB)
[v2] Fri, 10 Nov 2023 03:07:22 UTC (2,487 KB)