DOI: 10.1145/3664647.3681176

PS-TTL: Prototype-based Soft-labels and Test-Time Learning for Few-shot Object Detection

Published: 28 October 2024

Abstract

In recent years, Few-Shot Object Detection (FSOD) has gained widespread attention and made significant progress due to its ability to build models with good generalization power from extremely limited annotated data. The fine-tuning based paradigm currently dominates this field: detectors are first pre-trained on base classes with sufficient samples and then fine-tuned on novel classes with only a few samples. However, the scarcity of labeled samples for novel classes greatly interferes with precisely fitting their data distribution, thus hampering performance. To address this issue, we propose a new framework for FSOD, namely Prototype-based Soft-labels and Test-Time Learning (PS-TTL). Specifically, we design a Test-Time Learning (TTL) module that employs a mean-teacher network for self-training to discover novel instances from test data, allowing the detector to learn better representations and classifiers for novel classes. Furthermore, we observe that even though relatively low-confidence pseudo-labels exhibit classification confusion, they still tend to recall foreground objects. We therefore develop a Prototype-based Soft-labels (PS) strategy that assesses the similarities between low-confidence pseudo-labels and category prototypes and uses them as soft labels, unleashing their potential and substantially mitigating the constraints posed by few-shot samples. Extensive experiments on both the VOC and COCO benchmarks show that PS-TTL achieves state-of-the-art performance, highlighting its effectiveness. The code and model are available at https://github.com/gaoyingjay/PS-TTL.
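The abstract outlines two mechanisms: a mean-teacher network updated by exponential moving average (EMA) for self-training on test data, and prototype-based soft labels for low-confidence pseudo-labels. The sketch below is a minimal, hypothetical illustration of those two ideas in PyTorch; it is not the authors' released implementation (see the repository linked above), and the function names, cosine-similarity scoring, and temperature value are assumptions made for exposition.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Mean-teacher update: the teacher's weights track an exponential
    moving average of the student's weights during test-time self-training."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.data.mul_(momentum).add_(s_p.data, alpha=1.0 - momentum)

@torch.no_grad()
def prototype_soft_labels(roi_feats, prototypes, temperature=0.1):
    """Convert low-confidence pseudo-boxes into soft labels by comparing
    their RoI features with per-class prototypes.

    roi_feats:  (N, D) features of low-confidence pseudo-boxes
    prototypes: (C, D) one prototype per category (e.g., a running mean of
                features from high-confidence detections or support samples)
    returns:    (N, C) soft-label distribution for each box
    """
    sims = F.normalize(roi_feats, dim=-1) @ F.normalize(prototypes, dim=-1).T
    return F.softmax(sims / temperature, dim=-1)
```

In this reading, high-confidence pseudo-labels from the teacher supervise the student with hard labels as in standard self-training, while boxes falling below the confidence threshold are not discarded but instead reweighted by the prototype-similarity distribution above.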



Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024, 11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. few-shot object detection
  2. online learning
  3. prototype

Qualifiers

  • Research-article

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions (26%)
Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)
