Set prediction in the latent space

Authors:

Konpat Preechakul,

Chawan Piansaddhayanon,

Burin Naowarat,

Tirasan Khandhawit,

Sira Sriswasdi,

Ekapol Chuangsuwanich

NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems

Article No.: 1954, Pages 25516 - 25527

Published: 10 June 2024

Abstract

Set prediction tasks require matching the predicted set against the ground-truth set in order to propagate the gradient signal. Recent works perform this matching in the original feature space, which requires a predefined distance function. We propose a method for learning the distance function by performing the matching in a latent space learned by encoding networks. This method enables the use of teacher forcing, which was not previously possible because matching in the feature space can only be computed after the entire output sequence has been generated. Nonetheless, a naive implementation of latent set prediction might not converge due to permutation instability. To address this problem, we provide sufficient conditions for permutation stability, which lead to an algorithm that improves overall model convergence. Experiments on several set prediction tasks, including image captioning and object detection, demonstrate the effectiveness of our method.
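As a rough illustration of the matching step the abstract refers to, the sketch below pairs each predicted latent vector with one encoded ground-truth latent vector so that the total distance is minimal. It is not the authors' implementation. The function names, the squared Euclidean cost, the toy vectors, and the brute-force search over permutations are all assumptions chosen to keep the example self-contained; practical systems typically use the Hungarian algorithm, and in the proposed method the latent vectors would come from learned encoding networks.

```python
from itertools import permutations


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors (assumed cost)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def match_sets(predicted, targets):
    """Find the one-to-one pairing with minimal total cost.

    Returns (assignment, cost), where assignment[i] is the index of the
    target paired with predicted[i]. Brute force over all permutations,
    so this is only practical for very small sets.
    """
    if len(predicted) != len(targets):
        raise ValueError("this sketch assumes sets of equal size")
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(targets))):
        cost = sum(
            squared_distance(predicted[i], targets[j])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


# Hypothetical latent vectors: stand-ins for the outputs of a prediction
# network and of an encoder applied to the ground-truth set elements.
predicted_latents = [[0.9, 0.1], [0.0, 1.1], [2.0, 2.0]]
target_latents = [[0.0, 1.0], [2.1, 1.9], [1.0, 0.0]]

assignment, cost = match_sets(predicted_latents, target_latents)
print(assignment, round(cost, 2))
```

Once the pairing is fixed, a training loss can be computed between each prediction and its assigned target, which is how the gradient signal reaches the individual set elements regardless of their order.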

Supplementary Material

Additional material (3540261.3542215_supp.pdf)

Published In

NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems

December 2021

30517 pages

ISBN:9781713845393

Copyright © 2021 Neural Information Processing Systems Foundation, Inc.

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Qualifiers

Research-article
Research
Refereed limited
