
SiSL-Net: Saliency-guided self-supervised learning network for image classification

Published: 21 October 2022

Highlights

A Saliency-Augmented Module is proposed to produce clean data that reflects latent object categories.
A new self-supervised learning network, SiSL-Net, is developed.
A 4.35% improvement in linear classification accuracy under self-supervised learning.
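The Saliency-Augmented Module's core idea, keeping only the salient ("latent" object) region as the clean view, can be illustrated as a bounding-box crop over a saliency map. This is an illustrative sketch under assumed interfaces, not the paper's implementation; the function name `saliency_crop`, the threshold value, and the fallback behaviour are all assumptions.

```python
import numpy as np

def saliency_crop(image, saliency, threshold=0.5):
    """Illustrative sketch (not the paper's module): crop an image to the
    bounding box of its salient pixels to obtain a "clean" augmented view.

    image:    (H, W, C) array
    saliency: (H, W) saliency map with values in [0, 1]
    Falls back to the full image when no pixel exceeds the threshold.
    """
    ys, xs = np.nonzero(saliency > threshold)
    if ys.size == 0:  # no salient region found: keep the whole image
        return image
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```

A random crop taken inside this box would then play the role of clean data for a positive pair, while crops taken from the full image correspond to the noisy data described in the abstract.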

Abstract

Contrastive self-supervised learning has proven highly effective on the pretext task of instance discrimination, which learns visual representations by maximizing agreement between different augmented views of the same image (positive pairs). However, random cropping of the original image may produce an augmented view dominated by background, referred to as noisy data. To optimize data augmentation and improve the quality of positive pairs, a Saliency-Augmented Module is proposed that produces augmented views containing only the "latent" object area, referred to as clean data. Furthermore, a Saliency-Guided Self-Supervised Learning Network (SiSL-Net) is constructed as a new pattern of contrastive learning: a symmetric structure of a trunk net and a branch net is trained to learn feature mappings from the clean data space and the noisy data space. In addition, a novel loss function combining an embedding contrastive loss and a distribution consistency loss is designed to optimize the feature representations during network training. The linear classification performance of SiSL-Net is evaluated on the miniImageNet dataset with ResNet-50. Experiments show that our method improves top-1 accuracy from 64.67% to 69.02%, outperforming the state of the art.
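The embedding contrastive loss mentioned above is, in standard contrastive frameworks, an InfoNCE-style objective over positive pairs. The sketch below shows that baseline form in NumPy; it is a generic sketch, not SiSL-Net's exact loss (which additionally includes a distribution consistency term not reproduced here), and the function name and temperature value are illustrative.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """InfoNCE-style contrastive loss over a batch of positive pairs.

    z1, z2: (N, D) arrays of embeddings; row i of z1 and row i of z2
    are two augmented views of the same image (a positive pair).
    """
    # L2-normalise so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) similarity matrix
    # Row i's positive is column i; all other columns act as negatives.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

In the setting described above, one embedding of each pair would come from a clean (saliency-cropped) view and the other from a noisy (randomly cropped) view.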




Published In

Neurocomputing, Volume 510, Issue C
October 2022, 234 pages

Publisher

Elsevier Science Publishers B.V., Netherlands

Author Tags

1. Positive pairs
2. Contrastive self-supervised learning
3. Saliency detection

Qualifiers

• Research-article
