Abstract
Instance segmentation is formulated as a multi-task learning problem. However, among its sub-tasks, knowledge distillation is well suited only to multi-class object classification. Motivated by this observation, we introduce a lightweight foreground-specialized (FS) teacher model that is trained on foreground-only images and highly optimized for object classification. Yet this leads to a discrepancy between the inputs to the teacher and student models. We therefore introduce a novel Foreground-Specialized model Imitation (FSI) method with two complementary components. First, a reciprocal anchor box selection method distills from the most informative outputs of the FS teacher. Second, we embed foreground awareness into the student's feature learning, either by adding a co-learned foreground segmentation branch or by applying a soft feature mask. We conduct an extensive evaluation against existing knowledge distillation methods on COCO and Pascal VOC.
D. Li and W. Li—Equal contributions.
D. Li—This work was performed at Samsung Research America.
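The soft feature mask mentioned in the abstract can be pictured with a short sketch. The following PyTorch-style code is an illustrative assumption rather than the authors' implementation: the module name, the 1x1 adaptation layer, and the exact loss normalization are all hypothetical, and the foreground mask is assumed to be a per-pixel foreground probability resized to the feature resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedFeatureImitation(nn.Module):
    """Hypothetical sketch: weight a feature-imitation loss by a soft
    foreground mask so the student imitates the FS teacher mainly on
    foreground regions."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv to match the student's channel count to the teacher's.
        self.adapt = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat, fg_mask):
        # student_feat: (N, Cs, H, W), teacher_feat: (N, Ct, H, W)
        # fg_mask: (N, 1, h, w) soft foreground probabilities in [0, 1].
        mask = F.interpolate(fg_mask, size=teacher_feat.shape[-2:],
                             mode="bilinear", align_corners=False)
        diff = (self.adapt(student_feat) - teacher_feat) ** 2  # per-pixel squared error
        weighted = diff * mask                                  # emphasize foreground pixels
        # Normalize by the total mask weight to keep the loss scale stable.
        return weighted.sum() / (mask.sum() * teacher_feat.shape[1] + 1e-6)
```

Under these assumptions, the mask down-weights the imitation error on background pixels, so the student is pushed to match the FS teacher mainly where objects are present; the mask itself could come from the co-learned foreground segmentation branch described in the abstract.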
Notes
1. In distillation, the memory cost can be reduced by pre-computing the teacher model's outputs. However, this disables on-the-fly data augmentation, a critical component for improving model accuracy, especially when the dataset is small (see the sketch after these notes).
2. Simply adding the foreground awareness (FS) to the student without a teacher can improve mAP by 0.4.
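To illustrate the trade-off described in Note 1, the following is a minimal, hypothetical sketch contrasting pre-computed (cached) teacher outputs with an on-the-fly teacher forward pass; the function names, the `distill_loss` argument, and the dataset interface are assumptions, not part of the paper.

```python
import torch

# Hypothetical illustration of Note 1: caching teacher outputs saves memory
# (the teacher never needs to be resident during student training), but each
# cached output is tied to one fixed view of the image, so random augmentations
# applied to the student's input no longer match the cached targets.

@torch.no_grad()
def precompute_teacher_outputs(teacher, dataset):
    cache = {}
    for image_id, image in dataset:            # images WITHOUT random augmentation
        cache[image_id] = teacher(image.unsqueeze(0)).cpu()
    return cache

def train_step_online(student, teacher, images, distill_loss):
    # On-the-fly: the teacher sees the SAME randomly augmented batch as the
    # student, so augmentation remains usable, at the cost of keeping the
    # teacher in memory and running its forward pass every step.
    with torch.no_grad():
        targets = teacher(images)
    return distill_loss(student(images), targets)
```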
Cite this paper
Li, D., Li, W., Jin, H. (2023). Foreground-Specialized Model Imitation for Instance Segmentation. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13847. Springer, Cham. https://doi.org/10.1007/978-3-031-26293-7_24