DOI: 10.1145/3474085.3475258

Mitigating Generation Shifts for Generalized Zero-Shot Learning

Published: 17 October 2021

Abstract

Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information to recognize both seen and unseen samples, where the unseen classes are not observable during training. It is natural to derive generative models that hallucinate training samples for the unseen classes based on the knowledge learned from the seen samples. However, most of these models suffer from generation shifts, where the synthesized samples may drift away from the real distribution of the unseen data. In this paper, we propose GSMFlow, a novel generative flow framework that consists of multiple conditional affine coupling layers for learning unseen data generation. In particular, we identify three potential problems that trigger generation shifts, i.e., semantic inconsistency, variance collapse, and structure disorder, and address them respectively. First, to reinforce the correlations between the generated samples and their corresponding attributes, we explicitly embed the semantic information into the transformations in each coupling layer. Second, to recover the intrinsic variance of the real unseen features, we introduce a visual perturbation strategy that diversifies the generated data and thereby helps adjust the decision boundaries of the classifiers. Third, a relative positioning strategy is proposed to revise the attribute embeddings, guiding them to fully preserve the inter-class geometric structure and further avoid structure disorder in the semantic space. Experimental results demonstrate that GSMFlow achieves state-of-the-art performance on GZSL.
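To make the coupling-layer idea concrete, below is a minimal PyTorch sketch of one conditional affine coupling layer in the RealNVP style, with the class attribute vector fed into the scale-and-shift network as the abstract describes. The hidden width, activation choice, and tanh bounding of the log-scales are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """RealNVP-style affine coupling whose scale/shift network also
    sees the semantic attribute vector (a hedged sketch, not the
    authors' exact layer)."""

    def __init__(self, dim: int, attr_dim: int, hidden: int = 512):
        super().__init__()
        self.half = dim // 2
        # Maps [first half of features ; attributes] -> (log_scale, shift)
        # for the second half of the features.
        self.st_net = nn.Sequential(
            nn.Linear(self.half + attr_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, attr):
        xa, xb = x[:, :self.half], x[:, self.half:]
        log_s, t = self.st_net(torch.cat([xa, attr], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)            # bound scales for stability
        yb = xb * torch.exp(log_s) + t       # affine-transform second half
        log_det = log_s.sum(dim=1)           # exact Jacobian log-determinant
        return torch.cat([xa, yb], dim=1), log_det

    def inverse(self, y, attr):
        ya, yb = y[:, :self.half], y[:, self.half:]
        log_s, t = self.st_net(torch.cat([ya, attr], dim=1)).chunk(2, dim=1)
        xb = (yb - t) * torch.exp(-torch.tanh(log_s))   # invert exactly
        return torch.cat([ya, xb], dim=1)
```

Stacking several such layers (with feature permutations in between) and maximizing the exact change-of-variables log-likelihood yields an invertible mapping that can be run in reverse to hallucinate unseen-class features from attributes and Gaussian noise.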

Supplementary Material

MP4 File (MM21-fp00520.mp4)
Most generative zero-shot learning models suffer from generation shifts, where the synthesized samples may drift away from the real distribution. In this paper, we propose a novel generative flow framework that consists of multiple conditional affine coupling layers. In particular, we identify three potential problems that trigger generation shifts, i.e., semantic inconsistency, variance collapse, and structure disorder, and address them respectively. First, to reinforce the correlations between the generated samples and the attributes, we explicitly embed the semantic information into each coupling layer. Second, to recover the intrinsic variance of the real unseen features, we introduce a visual perturbation strategy to diversify the generated data. Third, a relative positioning strategy is proposed to revise the attributes, guiding them to fully preserve the inter-class geometric structure. Experimental results demonstrate that GSMFlow achieves state-of-the-art performance.
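As a rough illustration of the visual perturbation strategy, the sketch below jitters synthesized features with Gaussian noise before classifier training so that their variance better matches the real unseen data; the noise family and scale here are hypothetical stand-ins, not the paper's specific scheme.

```python
import torch

def perturb_features(synth_feats: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Diversify hallucinated visual features so the downstream GZSL
    classifier sees a wider spread than the raw generator output.
    `sigma` is a hypothetical hyperparameter, not a value from the paper."""
    return synth_feats + sigma * torch.randn_like(synth_feats)
```

In a typical pipeline one would re-perturb the synthetic unseen-class features each epoch and then fit the final classifier on the real seen features plus the perturbed synthetic ones.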





    Information

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN:9781450386517
    DOI:10.1145/3474085
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 October 2021


    Author Tags

    1. generalized zero-shot learning
    2. generative flow
    3. normalizing flow
    4. zero-shot learning

    Qualifiers

    • Research-article

    Conference

    MM '21
    Sponsor:
    MM '21: ACM Multimedia Conference
    October 20 - 24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (Last 12 months): 34
    • Downloads (Last 6 weeks): 5
    Reflects downloads up to 09 Nov 2024

    Cited By
    • (2024) Multimodal few-shot classification without attribute embedding. EURASIP Journal on Image and Video Processing 2024:1. DOI: 10.1186/s13640-024-00620-9. Online publication date: 10-Jan-2024.
    • (2024) A Progressive Placeholder Learning Network for Multimodal Zero-Shot Learning. IEEE Transactions on Multimedia 26, 7933-7945. DOI: 10.1109/TMM.2024.3373248. Online publication date: 8-Mar-2024.
    • (2024) MLTU: mixup long-tail unsupervised zero-shot image classification on vision-language models. Multimedia Systems 30:3. DOI: 10.1007/s00530-024-01373-1. Online publication date: 5-Jun-2024.
    • (2023) Complex Scenario Image Retrieval via Deep Similarity-aware Hashing. ACM Transactions on Multimedia Computing, Communications, and Applications 20:4, 1-24. DOI: 10.1145/3624016. Online publication date: 11-Dec-2023.
    • (2023) Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with Differentiable Expected Calibration Error. Proceedings of the 31st ACM International Conference on Multimedia, 1167-1178. DOI: 10.1145/3581783.3611808. Online publication date: 26-Oct-2023.
    • (2023) Enhancing Domain-Invariant Parts for Generalized Zero-Shot Learning. Proceedings of the 31st ACM International Conference on Multimedia, 6283-6291. DOI: 10.1145/3581783.3611764. Online publication date: 26-Oct-2023.
    • (2023) FFM: Injecting Out-of-Domain Knowledge via Factorized Frequency Modification. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 4124-4133. DOI: 10.1109/WACV56688.2023.00412. Online publication date: Jan-2023.
    • (2023) GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning. IEEE Transactions on Multimedia 25, 5374-5385. DOI: 10.1109/TMM.2022.3190678. Online publication date: 1-Jan-2023.
    • (2023) Adaptive Bias-Aware Feature Generation for Generalized Zero-Shot Learning. IEEE Transactions on Multimedia 25, 280-290. DOI: 10.1109/TMM.2021.3125134. Online publication date: 2023.
    • (2023) Dual-Aligned Feature Confusion Alleviation for Generalized Zero-Shot Learning. IEEE Transactions on Circuits and Systems for Video Technology 33:8, 3774-3785. DOI: 10.1109/TCSVT.2023.3239390. Online publication date: 1-Aug-2023.
