Focusing intermediate pixels loss for salient object segmentation

Published in: Multimedia Tools and Applications

Abstract

To improve network performance in salient object segmentation, many researchers have modified loss functions and assigned weights to pixel losses. However, these loss functions pay little attention to intermediate pixels, whose predicted probabilities lie in the intermediate region between correct and incorrect classification. To address this problem, a focusing intermediate pixels loss is proposed. First, the foreground and background are each divided into correctly and incorrectly classified sets to discover the intermediate pixels whose category is difficult to determine. Second, the intermediate pixels receive more attention according to their predicted probabilities. Finally, misclassified pixels are strengthened dynamically as training epochs progress. The proposed method can 1) make the model focus on intermediate pixels, which carry more uncertainty, and 2) resolve the vanishing gradient problem of Focal Loss for well-classified pixels. Experimental results on six public datasets and two different types of network structures show that the proposed method performs better than other state-of-the-art weighted loss functions, and the average Fβ is increased by about 2.7% compared with the typical cross entropy loss.
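As an illustrative sketch only (the weighting function below is a hypothetical placeholder; the paper's actual per-pixel weighting and its epoch-dependent strengthening of misclassified pixels are given in the paper and not reproduced here), a loss of this family can be written as a pixel-weighted binary cross entropy whose weights peak for predictions near the decision boundary:

```python
import numpy as np

def weighted_bce(p, y, w, eps=1e-7):
    # Pixel-weighted binary cross entropy: w decides how much each pixel contributes.
    p = np.clip(p, eps, 1.0 - eps)
    ce = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return (w * ce).sum() / w.sum()

# One hypothetical weighting that emphasizes "intermediate" pixels:
# largest where the predicted probability is near the 0.5 decision boundary.
def intermediate_weights(p):
    return 1.0 + 4.0 * p * (1.0 - p)   # ranges from 1 (confident) to 2 (uncertain)
```

In the paper, this generic boundary-centred weighting is replaced by the proposed focusing scheme defined over the correctly and incorrectly classified sets.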

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

  1. Chen LC, Papandreou G, Schroff F et al (2017) Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587

  2. Chen Z, Zhou H, Lai J et al (2020) Contour-aware loss: boundary-aware learning for salient object segmentation. IEEE Trans Image Process 30:431–443

  3. Cheng MM, Mitra NJ, Huang X et al (2014) SalientShape: group saliency in image collections. Vis Comput 30(4):443–453

  4. Deng Z, Hu X, Zhu L et al (2018) R3Net: recurrent residual refinement network for saliency detection. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp 684–690

  5. Fan DP, Cheng MM, Liu JJ et al (2018) Salient objects in clutter: bringing salient object detection to the foreground. In: Proceedings of the European Conference on Computer Vision, pp 186–202

  6. Fan DP, Ji GP, Sun G et al (2020) Camouflaged object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2777–2787

  7. Goyal P, Dollár P, Girshick R et al (2017) Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv:1706.02677

  8. Gu J, Wang Z, Kuen J et al (2018) Recent advances in convolutional neural networks. Pattern Recogn 77:354–377

  9. He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778

  10. He T, Zhang Z, Zhang H et al (2019) Bag of tricks for image classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 558–567

  11. Hossain MS, Betts JM, Paplinski AP (2021) Dual Focal Loss to address class imbalance in semantic segmentation. Neurocomputing 462:69–87

  12. Hou Q, Cheng MM, Hu X et al (2017) Deeply supervised salient object detection with short connections. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3203–3212

  13. Ji Y, Zhang H, Zhang Z et al (2021) CNN-based encoder-decoder networks for salient object detection: a comprehensive review and recent advances. Inf Sci 546:835–857

  14. Kim T, Lee H, Kim D (2021) UACANet: uncertainty augmented context attention for polyp segmentation. In: Proceedings of the 29th ACM International Conference on Multimedia, pp 2167–2175

  15. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980

  16. Feng L, Shu S, Lin Z et al (2020) Can cross entropy loss be robust to label noise? In: Proceedings of the 29th International Joint Conference on Artificial Intelligence, pp 2206–2212

  17. Li G, Yu Y (2015) Visual saliency based on multiscale deep features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5455–5463

  18. Li Y, Hou X, Koch C et al (2014) The secrets of salient object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 280–287

  19. Li X, Yu L, Chang D et al (2019) Dual cross-entropy loss for small-sample fine-grained vehicle classification. IEEE Trans Veh Technol 68(5):4204–4212

  20. Lin TY, Goyal P, Girshick R et al (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2980–2988

  21. Liu JJ, Hou Q, Cheng MM et al (2019) A simple pooling-based design for real-time salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3917–3926

  22. Liu Z, Tang J, Xiang Q et al (2020) Salient object detection for RGB-D images by generative adversarial network. Multimed Tools Appl 79:25403–25425

  23. Liu Z, Lin Y, Cao Y et al (2021) Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE International Conference on Computer Vision, pp 10012–10022

  24. Mao Y, Zhang J, Wan Z et al (2021) Generative transformer for accurate and reliable salient object detection. arXiv:2104.10127

  25. Mavroforakis ME, Theodoridis S (2006) A geometric approach to support vector machine (SVM) classification. IEEE Trans Neural Netw 17(3):671–682

  26. Pan C, Yan WQ (2020) Object detection based on saturation of visual perception. Multimed Tools Appl 79:19925–19944

  27. Pang Y, Zhao X, Xiang TZ et al (2022) Zoom in and out: a mixed-scale triplet network for camouflaged object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2160–2170

  28. Rahman MA, Wang Y (2016) Optimizing intersection-over-union in deep neural networks for image segmentation. In: International Symposium on Visual Computing. Springer, Cham, pp 234–244

  29. Singh VK, Kumar N, Singh N (2020) A hybrid approach using color spatial variance and novel object position prior for salient object detection. Multimed Tools Appl 79:30045–30067

  30. Wang Q, Zhang L, Li Y et al (2020) Overview of deep-learning based methods for salient object detection in videos. Pattern Recogn 104:107340

  31. Wei J, Wang S, Huang Q (2020) F3Net: fusion, feedback and focus for salient object detection. Proc AAAI Conf Artif Intell 34(07):12321–12328

  32. Yang C, Zhang L, Lu H et al (2013) Saliency detection via graph-based manifold ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3166–3173

  33. Zhang P, Liu W, Lu H et al (2018) Salient object detection by lossless feature reflection. arXiv:1802.06527

  34. Zhao S, Wu B, Chu W et al (2019) Correlation maximized structural similarity loss for semantic segmentation. arXiv:1910.08711

Funding

This work was supported by the Natural Science Foundation of China (61801512, 62071484) and the Natural Science Foundation of Jiangsu Province (BK20180080).

Author information

Corresponding author

Correspondence to Tieyong Cao.

Ethics declarations

Conflict of interest/Competing interests

The authors declare that they have no conflict of interest related to this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

When calculating the gradient of the loss function, we found that the background gradient we computed (Fig. 5b) is inconsistent with the one reported for Dual Focal Loss (DFL) in Fig. 1b of [11].

On closer inspection, we found that Hossain et al. [11] misinterpreted the cross entropy loss formula. The binary cross entropy loss is calculated as follows.

$$L_{\mathrm{CE}}=-\sum_{i=1}^{T}\left[y_i\log p_i+\left(1-y_i\right)\log \left(1-p_i\right)\right]$$

For a foreground pixel, y_i is 1, so the cross entropy loss of a foreground pixel is as follows.

$$L_{\mathrm{F}}^{i}=-\log p_i$$

For a background pixel, y_i is 0, so the cross entropy loss of a background pixel is as follows.

$$L_{\mathrm{B}}^{i}=-\log \left(1-p_i\right)$$
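Differentiating the two cases with respect to the predicted probability makes the behaviour of the gradients explicit:

$$\frac{\partial L_{\mathrm{F}}^{i}}{\partial p_{i}}=-\frac{1}{p_{i}},\qquad \frac{\partial L_{\mathrm{B}}^{i}}{\partial p_{i}}=\frac{1}{1-p_{i}}$$

so the foreground gradient is negative and largest in magnitude when p_i is small, while the background gradient is positive and grows as p_i approaches 1.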

Hossain et al. [11], however, calculated the cross entropy loss of foreground and background pixels in the same way:

$$L_{\mathrm{F}}^{i}=L_{\mathrm{B}}^{i}=-\log p_i$$
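As a quick numerical illustration (a minimal NumPy sketch of the two formulas above, not code from the paper), consider background pixels with increasing predicted foreground probability:

```python
import numpy as np

p = np.array([0.1, 0.5, 0.9])  # predicted foreground probability of three background pixels

# Correct binary cross entropy for background pixels (y_i = 0)
loss_correct = -np.log(1.0 - p)   # grows as p -> 1, i.e. penalizes the misclassification

# Mistaken form that treats background pixels like foreground pixels
loss_mistaken = -np.log(p)        # shrinks as p -> 1, i.e. rewards the misclassification

print(loss_correct)    # approximately [0.105, 0.693, 2.303]
print(loss_mistaken)   # approximately [2.303, 0.693, 0.105]
```

The correct background loss increases as a background pixel is pushed toward the foreground, whereas the mistaken form decreases; its gradient is −1/p_i rather than 1/(1 − p_i), which is exactly the sign flip visible in the background gradient comparison.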

Exploring the reason for this error, we believe it may have been influenced by Focal Loss (FL) [20]. On page 3 of the FL paper, for convenience of notation, the cross entropy is written as follows.

$$\mathrm{CE}\left(p,y\right)=\mathrm{CE}\left({p}_{\mathrm{t}}\right)=-\log \left({p}_{\mathrm{t}}\right)$$

Focal loss is then written as follows.

$$\mathrm{FL}\left({p}_{\mathrm{t}}\right)=-{\left(1-{p}_{\mathrm{t}}\right)}^{\gamma}\log \left({p}_{\mathrm{t}}\right)$$

However, p_t is not simply the predicted probability of the pixel: its definition differs between foreground and background. Hossain et al. [11] may have directly regarded p_t as the predicted probability of the pixel. The relevant excerpt from the original Focal Loss paper is reproduced below.

Focal Loss

The Focal Loss is designed to address the one-stage object detection scenario in which there is an extreme imbalance between foreground and background classes during training (e.g., 1:1000). We introduce the focal loss starting from the cross entropy (CE) loss for binary classification:

$$\mathrm{CE}\left(p,y\right)=\left\{\begin{array}{ll}-\log \left(p\right) & \mathrm{if}\ y=1\\ -\log \left(1-p\right) & \mathrm{otherwise}\end{array}\right.$$
(18)

In the above, y ∈ {±1} specifies the ground-truth class and p ∈ [0, 1] is the model’s estimated probability for the class with label y = 1. For notational convenience, we define p_t:

$${p}_{\mathrm{t}}=\left\{\begin{array}{ll}p & \mathrm{if}\ y=1\\ 1-p & \mathrm{otherwise}\end{array}\right.$$
(19)

and rewrite CE(p, y) = CE(p_t) = −log(p_t).
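To see how the case distinction in Eq. (19) looks in practice, here is a minimal NumPy sketch of binary focal loss (our illustration, not the authors’ or the original paper’s implementation):

```python
import numpy as np

def binary_focal_loss(p, y, gamma=2.0, eps=1e-7):
    # p: predicted foreground probability in [0, 1]
    # y: ground-truth label, 1 for foreground and 0 for background
    p = np.clip(p, eps, 1.0 - eps)        # avoid log(0)
    p_t = np.where(y == 1, p, 1.0 - p)    # Eq. (19): p for foreground, 1 - p for background
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

# A well-classified background pixel (p = 0.1) is strongly down-weighted,
# while a misclassified one (p = 0.9) keeps a large loss.
print(binary_focal_loss(np.array([0.1, 0.9]), np.array([0, 0])))
```

Replacing p_t with the raw probability p for background pixels, as discussed above, would reverse this behaviour.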

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, L., Cao, T., Zheng, Y. et al. Focusing intermediate pixels loss for salient object segmentation. Multimed Tools Appl 83, 19747–19766 (2024). https://doi.org/10.1007/s11042-023-15873-1
