research-article

Consistency-guided pseudo labeling for transductive zero-shot learning

Published: 18 July 2024 Publication History

Abstract

Zero-shot learning (ZSL) aims to recognize classes unseen during training. Transductive methods have advanced ZSL; however, they often rely on pseudo labels assigned by confidence scores, and the resulting noisy pseudo labels cause semantic misalignment between unseen-class image features and the corresponding class semantic descriptions. In this paper, we introduce a novel Consistency-Guided Pseudo-Labeling (CGPL) method to generate high-quality pseudo labels, achieving a robust mapping from the visual to the semantic space for unseen classes. CGPL incorporates a large-scale vision-language model as a collaborator of the ZSL model to generate high-quality pseudo labels. Pseudo-labeled samples on which the two models' predictions agree are then added to the training set to learn the visual-to-semantic mapping for unseen classes. Furthermore, we design a quasi-classification loss based on reconstructed unseen prototypes to learn an accurate visual-semantic mapping. Consequently, CGPL is encouraged to obtain higher-quality pseudo labels and progressively learns a precise visual-semantic mapping for unseen classes throughout the iterative process. Extensive experimental results across four benchmark datasets highlight the superior performance of CGPL in both the conventional (CZSL) and generalized (GZSL) zero-shot learning settings.
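The core consistency mechanism the abstract describes — keeping only the pseudo-labeled samples on which the ZSL model and the vision-language model agree — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and the toy scores are assumptions for demonstration.

```python
import numpy as np

def consistency_filter(zsl_scores, vlm_scores):
    """Keep unlabeled samples whose pseudo labels agree across both models.

    zsl_scores, vlm_scores: (N, C) arrays of class scores over unseen
    classes from the ZSL model and the vision-language model.
    Returns indices of consistent samples and their shared pseudo labels.
    """
    zsl_pred = np.argmax(zsl_scores, axis=1)
    vlm_pred = np.argmax(vlm_scores, axis=1)
    consistent = zsl_pred == vlm_pred          # agreement mask
    idx = np.nonzero(consistent)[0]            # indices kept for training
    return idx, zsl_pred[idx]

# Toy example: 4 unlabeled samples, 3 unseen classes.
zsl = np.array([[0.9, 0.05, 0.05],
                [0.1, 0.80, 0.10],
                [0.2, 0.30, 0.50],
                [0.6, 0.30, 0.10]])
vlm = np.array([[0.8, 0.1, 0.1],
                [0.2, 0.2, 0.6],
                [0.1, 0.2, 0.7],
                [0.5, 0.4, 0.1]])
idx, labels = consistency_filter(zsl, vlm)
# Samples 0, 2, and 3 agree and are pseudo-labeled; sample 1 is rejected.
```

In the iterative scheme the abstract outlines, the retained pairs `(idx, labels)` would augment the training set, the visual-to-semantic mapping would be re-learned, and the filter would be re-applied, so that label quality improves round by round.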



Published In

Information Sciences: an International Journal  Volume 670, Issue C
Jun 2024
882 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Zero-shot learning
  2. Transductive learning
  3. Vision-language model
  4. Modality alignment
  5. Representation learning
