DOI: 10.1145/3474085.3475258

Mitigating Generation Shifts for Generalized Zero-Shot Learning

Published: 17 October 2021

Abstract

Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information to recognize both seen and unseen samples, where the unseen classes are not observable during training. It is natural to derive generative models that hallucinate training samples for the unseen classes based on the knowledge learned from the seen samples. However, most of these models suffer from generation shifts, where the synthesized samples may drift away from the real distribution of the unseen data. In this paper, we propose GSMFlow, a novel generative flow framework that consists of multiple conditional affine coupling layers for learning unseen data generation. In particular, we identify three potential problems that trigger generation shifts, i.e., semantic inconsistency, variance collapse, and structure disorder, and address them respectively. First, to reinforce the correlations between the generated samples and their corresponding attributes, we explicitly embed the semantic information into the transformations in each coupling layer. Second, to recover the intrinsic variance of the real unseen features, we introduce a visual perturbation strategy that diversifies the generated data and thereby helps adjust the decision boundaries of the classifiers. Third, a relative positioning strategy is proposed to revise the attribute embeddings, guiding them to fully preserve the inter-class geometric structure and further avoid structure disorder in the semantic space. Experimental results demonstrate that GSMFlow achieves state-of-the-art performance on GZSL.
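To make the coupling-layer idea concrete, below is a minimal PyTorch sketch of one conditional affine coupling layer in the RealNVP style, with the class attribute vector fed into the scale-and-shift network as the abstract describes. The hidden width, activation choice, and tanh bounding of the log-scales are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """RealNVP-style affine coupling whose scale/shift network also
    sees the semantic attribute vector (a hedged sketch, not the
    authors' exact layer)."""

    def __init__(self, dim: int, attr_dim: int, hidden: int = 512):
        super().__init__()
        self.half = dim // 2
        # Maps [first half of features ; attributes] -> (log_scale, shift)
        # for the second half of the features.
        self.st_net = nn.Sequential(
            nn.Linear(self.half + attr_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, attr):
        xa, xb = x[:, :self.half], x[:, self.half:]
        log_s, t = self.st_net(torch.cat([xa, attr], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)            # bound scales for stability
        yb = xb * torch.exp(log_s) + t       # affine-transform second half
        log_det = log_s.sum(dim=1)           # exact Jacobian log-determinant
        return torch.cat([xa, yb], dim=1), log_det

    def inverse(self, y, attr):
        ya, yb = y[:, :self.half], y[:, self.half:]
        log_s, t = self.st_net(torch.cat([ya, attr], dim=1)).chunk(2, dim=1)
        xb = (yb - t) * torch.exp(-torch.tanh(log_s))   # invert exactly
        return torch.cat([ya, xb], dim=1)
```

Stacking several such layers (with feature permutations in between) and maximizing the exact change-of-variables log-likelihood yields an invertible mapping that can be run in reverse to hallucinate unseen-class features from attributes and Gaussian noise.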

Supplementary Material

MP4 File (MM21-fp00520.mp4)
Most generative zero-shot learning models suffer from generation shifts, where the synthesized samples may drift away from the real distribution. In this paper, we propose a novel generative flow framework that consists of multiple conditional affine coupling layers. In particular, we identify three potential problems that trigger generation shifts, i.e., semantic inconsistency, variance collapse, and structure disorder, and address them respectively. First, to reinforce the correlations between the generated samples and the attributes, we explicitly embed the semantic information into each coupling layer. Second, to recover the intrinsic variance of the real unseen features, we introduce a visual perturbation strategy to diversify the generated data. Third, a relative positioning strategy is proposed to revise the attributes, guiding them to fully preserve the inter-class geometric structure. Experimental results demonstrate that GSMFlow achieves state-of-the-art performance.
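As a rough illustration of the visual perturbation strategy, the sketch below jitters synthesized features with Gaussian noise before classifier training so that their variance better matches the real unseen data; the noise family and scale here are hypothetical stand-ins, not the paper's specific scheme.

```python
import torch

def perturb_features(synth_feats: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Diversify hallucinated visual features so the downstream GZSL
    classifier sees a wider spread than the raw generator output.
    `sigma` is a hypothetical hyperparameter, not a value from the paper."""
    return synth_feats + sigma * torch.randn_like(synth_feats)
```

In a typical pipeline one would re-perturb the synthetic unseen-class features each epoch and then fit the final classifier on the real seen features plus the perturbed synthetic ones.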





    Information

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN:9781450386517
    DOI:10.1145/3474085
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 October 2021


    Author Tags

    1. generalized zero-shot learning
    2. generative flow
    3. normalizing flow
    4. zero-shot learning

    Qualifiers

    • Research-article

    Conference

    MM '21
    Sponsor:
    MM '21: ACM Multimedia Conference
    October 20 - 24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (Last 12 months): 34
    • Downloads (Last 6 weeks): 5
    Reflects downloads up to 09 Nov 2024

    Cited By
    • (2024) Multimodal few-shot classification without attribute embedding. EURASIP Journal on Image and Video Processing 2024:1. DOI: 10.1186/s13640-024-00620-9. Online publication date: 10-Jan-2024.
    • (2024) A Progressive Placeholder Learning Network for Multimodal Zero-Shot Learning. IEEE Transactions on Multimedia 26, 7933-7945. DOI: 10.1109/TMM.2024.3373248. Online publication date: 8-Mar-2024.
    • (2024) MLTU: mixup long-tail unsupervised zero-shot image classification on vision-language models. Multimedia Systems 30:3. DOI: 10.1007/s00530-024-01373-1. Online publication date: 5-Jun-2024.
    • (2023) Complex Scenario Image Retrieval via Deep Similarity-aware Hashing. ACM Transactions on Multimedia Computing, Communications, and Applications 20:4, 1-24. DOI: 10.1145/3624016. Online publication date: 11-Dec-2023.
    • (2023) Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with Differentiable Expected Calibration Error. Proceedings of the 31st ACM International Conference on Multimedia, 1167-1178. DOI: 10.1145/3581783.3611808. Online publication date: 26-Oct-2023.
    • (2023) Enhancing Domain-Invariant Parts for Generalized Zero-Shot Learning. Proceedings of the 31st ACM International Conference on Multimedia, 6283-6291. DOI: 10.1145/3581783.3611764. Online publication date: 26-Oct-2023.
    • (2023) FFM: Injecting Out-of-Domain Knowledge via Factorized Frequency Modification. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 4124-4133. DOI: 10.1109/WACV56688.2023.00412. Online publication date: Jan-2023.
    • (2023) GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning. IEEE Transactions on Multimedia 25, 5374-5385. DOI: 10.1109/TMM.2022.3190678. Online publication date: 1-Jan-2023.
    • (2023) Adaptive Bias-Aware Feature Generation for Generalized Zero-Shot Learning. IEEE Transactions on Multimedia 25, 280-290. DOI: 10.1109/TMM.2021.3125134. Online publication date: 2023.
    • (2023) Dual-Aligned Feature Confusion Alleviation for Generalized Zero-Shot Learning. IEEE Transactions on Circuits and Systems for Video Technology 33:8, 3774-3785. DOI: 10.1109/TCSVT.2023.3239390. Online publication date: 1-Aug-2023.
