research-article

Consistency-guided pseudo labeling for transductive zero-shot learning

Published: 18 July 2024 Publication History

Abstract

Zero-shot learning (ZSL) aims to recognize classes unseen during training. Transductive methods have advanced ZSL; however, they often rely on pseudo labels assigned by confidence scores, and the resulting noisy pseudo labels cause semantic misalignment between unseen-class image features and the corresponding class semantic descriptions. In this paper, we introduce a novel Consistency-Guided Pseudo-Labeling (CGPL) method to generate high-quality pseudo labels, achieving a robust mapping from the visual to the semantic space for unseen classes. CGPL incorporates a large-scale vision-language model as a collaborator of the ZSL model to generate high-quality pseudo labels. Pseudo-labeled samples on which the two models' predictions agree are then added to the training set to learn the visual-to-semantic mapping for unseen classes. Furthermore, we design a quasi-classification loss based on reconstructed unseen prototypes to learn an accurate visual-semantic mapping. Consequently, CGPL is encouraged to obtain higher-quality pseudo labels and progressively learns a precise visual-semantic mapping for unseen classes throughout the iterative process. Extensive experimental results across four benchmark datasets highlight the superior performance of CGPL in both the conventional (CZSL) and generalized (GZSL) zero-shot learning settings.
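The core consistency mechanism the abstract describes — keeping only the pseudo-labeled samples on which the ZSL model and the vision-language model agree — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and the toy scores are assumptions for demonstration.

```python
import numpy as np

def consistency_filter(zsl_scores, vlm_scores):
    """Keep unlabeled samples whose pseudo labels agree across both models.

    zsl_scores, vlm_scores: (N, C) arrays of class scores over unseen
    classes from the ZSL model and the vision-language model.
    Returns indices of consistent samples and their shared pseudo labels.
    """
    zsl_pred = np.argmax(zsl_scores, axis=1)
    vlm_pred = np.argmax(vlm_scores, axis=1)
    consistent = zsl_pred == vlm_pred          # agreement mask
    idx = np.nonzero(consistent)[0]            # indices kept for training
    return idx, zsl_pred[idx]

# Toy example: 4 unlabeled samples, 3 unseen classes.
zsl = np.array([[0.9, 0.05, 0.05],
                [0.1, 0.80, 0.10],
                [0.2, 0.30, 0.50],
                [0.6, 0.30, 0.10]])
vlm = np.array([[0.8, 0.1, 0.1],
                [0.2, 0.2, 0.6],
                [0.1, 0.2, 0.7],
                [0.5, 0.4, 0.1]])
idx, labels = consistency_filter(zsl, vlm)
# Samples 0, 2, and 3 agree and are pseudo-labeled; sample 1 is rejected.
```

In the iterative scheme the abstract outlines, the retained pairs `(idx, labels)` would augment the training set, the visual-to-semantic mapping would be re-learned, and the filter would be re-applied, so that label quality improves round by round.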



Published In

Information Sciences: an International Journal  Volume 670, Issue C
Jun 2024
882 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Zero-shot learning
  2. Transductive learning
  3. Vision-language model
  4. Modality alignment
  5. Representation learning
