Fine-grained Primitive Representation Learning for Compositional Zero-shot Classification
H Jiang, X Yang, C Chen, C Xu
2023 IEEE International Conference on Multimedia and Expo (ICME), 2023
Compositional zero-shot learning (CZSL) aims to recognize attribute-object compositions that are never seen in the training set. Existing methods solve this problem mainly by learning a single primitive representation for each attribute or object, ignoring the natural intra-attribute or intra-object diversity, i.e., the same attribute (or object) presents dramatically different visual appearances in different compositions. In this paper, we treat the same attribute (or object) under different compositions as fine-grained classes and propose a novel fine-grained primitive representation learning framework that learns more discriminative primitive representations through multiple compact feature sub-spaces. We employ attribute- or object-specific fine-grained primitive prototypes and embed them into a cross-modal space to improve the discriminative ability of primitive representations. Moreover, we leverage semantic-guided sample synthesis to estimate primitive representations of unseen compositions. Extensive experiments on three public benchmarks indicate that our approach can achieve state-of-the-art performance.
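
The contrast between a single primitive representation and fine-grained ones can be illustrated with a toy sketch. This is not the authors' implementation: the class-mean prototypes, the cosine scoring, the 2-D features, and all function names (`build_prototypes`, `predict_coarse`, `predict_fine`) are assumptions made for illustration, and the cross-modal embedding and sample synthesis steps are left out entirely. It only shows why keeping one prototype per (attribute, object) composition may help when an attribute looks very different across objects.

```python
import math
from collections import defaultdict


def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_prototypes(samples):
    """samples: list of (feature, attribute, object) triples.

    Returns (coarse, fine):
      coarse[attr] is a single mean vector over all samples of attr;
      fine[attr] is a list with one mean vector per seen (attr, object) pair.
    """
    by_attr = defaultdict(list)
    by_comp = defaultdict(list)
    for feat, attr, obj in samples:
        by_attr[attr].append(feat)
        by_comp[(attr, obj)].append(feat)
    coarse = {attr: mean(feats) for attr, feats in by_attr.items()}
    fine = defaultdict(list)
    for (attr, _obj), feats in by_comp.items():
        fine[attr].append(mean(feats))
    return coarse, dict(fine)


def predict_coarse(x, coarse):
    """Pick the attribute whose single prototype is most similar to x."""
    return max(coarse, key=lambda a: cosine(x, coarse[a]))


def predict_fine(x, fine):
    """Pick the attribute whose best-matching fine-grained prototype is most similar to x."""
    return max(fine, key=lambda a: max(cosine(x, p) for p in fine[a]))


# Toy, hand-made 2-D features (hypothetical data): "wet" looks very
# different on a dog than on a road, so its samples form two clusters.
samples = [
    ([1.0, 0.0], "wet", "dog"),
    ([0.9, 0.1], "wet", "dog"),
    ([0.0, 1.0], "wet", "road"),
    ([0.1, 0.9], "wet", "road"),
    ([1.0, 0.6], "dry", "sand"),
    ([0.9, 0.5], "dry", "sand"),
]
coarse, fine = build_prototypes(samples)

query = [1.0, 0.05]  # resembles the "wet dog" cluster
coarse_prediction = predict_coarse(query, coarse)
fine_prediction = predict_fine(query, fine)
```

In this toy setting the single "wet" prototype is the average of two distant clusters and ends up far from both, so the query is assigned to "dry"; scoring against the per-composition prototypes recovers "wet". Real features, learned sub-spaces, and unseen compositions would of course behave less cleanly.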