DOI: 10.5555/3540261.3542215
research-article

Set prediction in the latent space

Published: 10 June 2024

Abstract

Set prediction tasks require matching the predicted set to the ground-truth set in order to propagate the gradient signal. Recent works perform this matching in the original feature space and therefore require predefined distance functions. We propose a method for learning the distance function by performing the matching in the latent space learned by encoding networks. This method enables the use of teacher forcing, which was not previously possible because matching in the feature space must be computed after the entire output sequence is generated. Nonetheless, a naive implementation of latent set prediction might not converge due to permutation instability. To address this problem, we provide sufficient conditions for permutation stability, which yield an algorithm that improves overall model convergence. Experiments on several set prediction tasks, including image captioning and object detection, demonstrate the effectiveness of our method.
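
To make the matching step concrete, the following is a minimal sketch of matching a predicted set to a ground-truth set in a latent space. It is not the authors' implementation. It assumes that the predictions are already latent vectors, that a hypothetical `encode` function stands in for the learned encoding network, and that squared Euclidean distance is the latent cost. A brute-force search over permutations stands in for a proper assignment solver and is only practical for very small sets.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_in_latent_space(
    predicted_latents: Sequence[Vector],
    targets: Sequence[Vector],
    encode: Callable[[Vector], Vector] = lambda v: v,  # hypothetical encoder
) -> Tuple[Tuple[int, ...], float]:
    """Find the target ordering that minimises the total latent distance.

    Returns (assignment, cost), where assignment[i] is the index of the
    target matched to predicted element i.
    """
    if len(predicted_latents) != len(targets):
        raise ValueError("this sketch assumes sets of equal size")

    # Map ground-truth elements into the latent space (assumption: only the
    # targets need encoding; predictions are already latent vectors).
    target_latents = [encode(t) for t in targets]

    best_assignment: Tuple[int, ...] = ()
    best_cost = float("inf")
    # Exhaustive search over all orderings: O(n!), for illustration only.
    for perm in permutations(range(len(target_latents))):
        cost = sum(
            squared_distance(predicted_latents[i], target_latents[j])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_assignment, best_cost = perm, cost
    return best_assignment, best_cost


# Usage: two predicted latents, two targets, identity encoder.
predicted = [(0.9, 0.1), (0.0, 1.1)]
ground_truth = [(0.0, 1.0), (1.0, 0.0)]
assignment, cost = match_in_latent_space(predicted, ground_truth)
print(assignment, round(cost, 6))
```

In this example the first prediction is matched to the second target and vice versa, since that ordering gives the smaller total distance. Once the assignment is fixed, a training loss can be computed between matched pairs, which is what allows the gradient signal to propagate.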

Supplementary Material

Additional material (3540261.3542215_supp.pdf)
Supplemental material.


Published In

NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems
December 2021
30517 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited
