Abstract
To improve network performance in salient object segmentation, many researchers have modified loss functions and assigned weights to per-pixel losses. However, these loss functions pay little attention to intermediate pixels, whose predicted probabilities lie in the region between correct and incorrect classification. To solve this problem, a focusing intermediate pixels loss is proposed. Firstly, the foreground and background are each divided into correctly and incorrectly classified sets to identify intermediate pixels whose category is difficult to determine. Secondly, the intermediate pixels receive more attention according to their predicted probabilities. Finally, misclassified pixels are strengthened dynamically as training epochs progress. The proposed method can 1) make the model focus on intermediate pixels, which carry more uncertainty; and 2) solve the vanishing gradient problem of Focal Loss for well-classified pixels. Experimental results on six public datasets and two different types of network structures show that the proposed method outperforms other state-of-the-art weighted loss functions, and the average Fβ is increased by about 2.7% compared with typical cross entropy.
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Chen LC, Papandreou G, Schroff F et al (2017) Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587
Chen Z, Zhou H, Lai J et al (2020) Contour-aware loss: boundary-aware learning for salient object segmentation. IEEE Trans Image Process 30:431–443
Cheng MM, Mitra NJ, Huang X et al (2014) SalientShape: group saliency in image collections. Vis Comput 30(4):443–453
Deng Z, Hu X, Zhu L et al (2018) R3Net: recurrent residual refinement network for saliency detection. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. AAAI Press, Menlo Park, CA, USA, pp 684–690
Fan DP, Cheng MM, Liu JJ et al (2018) Salient objects in clutter: bringing salient object detection to the foreground. In: Proceedings of the European Conference on Computer Vision, pp 186–202
Fan DP, Ji GP, Sun G et al (2020) Camouflaged object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2777–2787
Goyal P, Dollár P, Girshick R et al (2017) Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv:1706.02677
Gu J, Wang Z, Kuen J et al (2018) Recent advances in convolutional neural networks. Pattern Recogn 77:354–377
He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
He T, Zhang Z, Zhang H et al (2019) Bag of tricks for image classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 558–567
Hossain MS, Betts JM, Paplinski AP (2021) Dual Focal Loss to address class imbalance in semantic segmentation. Neurocomputing 462:69–87
Hou Q, Cheng MM, Hu X et al (2017) Deeply supervised salient object detection with short connections. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3203–3212
Ji Y, Zhang H, Zhang Z et al (2021) CNN-based encoder-decoder networks for salient object detection: a comprehensive review and recent advances. Inf Sci 546:835–857
Kim T, Lee H, Kim D (2021) UACANet: uncertainty augmented context attention for polyp segmentation. In: Proceedings of the 29th ACM International Conference on Multimedia, pp 2167–2175
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Feng L, Shu S, Lin Z et al (2020) Can cross entropy loss be robust to label noise? In: Proceedings of the 29th International Joint Conference on Artificial Intelligence, pp 2206–2212
Li G, Yu Y (2015) Visual saliency based on multiscale deep features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5455–5463
Li Y, Hou X, Koch C et al (2014) The secrets of salient object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 280–287
Li X, Yu L, Chang D et al (2019) Dual cross-entropy loss for small-sample fine-grained vehicle classification. IEEE Trans Veh Technol 68(5):4204–4212
Lin TY, Goyal P, Girshick R et al (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2980–2988
Liu JJ, Hou Q, Cheng MM et al (2019) A simple pooling-based design for real-time salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3917–3926
Liu Z, Tang J, Xiang Q et al (2020) Salient object detection for RGB-D images by generative adversarial network. Multimed Tools Appl 79:25403–25425
Liu Z, Lin Y, Cao Y et al (2021) Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE International Conference on Computer Vision, pp 10012–10022
Mao Y, Zhang J, Wan Z et al (2021) Generative transformer for accurate and reliable salient object detection. arXiv:2104.10127
Mavroforakis ME, Theodoridis S (2006) A geometric approach to support vector machine (SVM) classification. IEEE Trans Neural Netw 17(3):671–682
Pan C, Yan WQ (2020) Object detection based on saturation of visual perception. Multimed Tools Appl 79:19925–19944
Pang Y, Zhao X, Xiang TZ et al (2022) Zoom in and out: a mixed-scale triplet network for camouflaged object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2160–2170
Rahman MA, Wang Y (2016) Optimizing intersection-over-union in deep neural networks for image segmentation. In: International Symposium on Visual Computing. Springer, Cham, pp 234–244
Singh VK, Kumar N, Singh N (2020) A hybrid approach using color spatial variance and novel object position prior for salient object detection. Multimed Tools Appl 79:30045–30067
Wang Q, Zhang L, Li Y et al (2020) Overview of deep-learning based methods for salient object detection in videos. Pattern Recogn 104:107340
Wei J, Wang S, Huang Q (2020) F3Net: fusion, feedback and focus for salient object detection. Proc AAAI Conf Artif Intell 34(07):12321–12328
Yang C, Zhang L, Lu H et al (2013) Saliency detection via graph-based manifold ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3166–3173
Zhang P, Liu W, Lu H et al (2018) Salient object detection by lossless feature reflection. arXiv:1802.06527
Zhao S, Wu B, Chu W et al (2019) Correlation maximized structural similarity loss for semantic segmentation. arXiv:1910.08711
Funding
This work was supported by the Natural Science Foundation of China (61801512, 62071484) and the Natural Science Foundation of Jiangsu Province (BK20180080).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest/Competing interests
We declare that we have no conflict of interest related to this work.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
When calculating the gradient of the loss function, we found that the gradient we computed for background pixels (Fig. 5b) is inconsistent with the one reported for Dual Focal Loss (DFL) in Fig. 1b of [11].
Upon investigation, we found that Hossain et al. [11] misinterpreted the cross entropy loss formula. The binary cross entropy loss is calculated as follows.
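$$\ell_i = -\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$

where $y_i \in \{0, 1\}$ is the ground-truth label of pixel $i$ and $p_i \in [0, 1]$ is its predicted probability of being foreground.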
In the foreground, $y_i$ is 1, so the cross entropy loss of foreground pixels is as follows.
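$$\ell_i^{fg} = -\log p_i$$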
In the background, $y_i$ is 0, so the cross entropy loss of background pixels is as follows.
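$$\ell_i^{bg} = -\log(1 - p_i)$$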
However, Hossain et al. [11] calculated the cross entropy loss of foreground and background pixels in the same way.
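To make the discrepancy concrete, the following minimal NumPy sketch (our own illustration, not code from [11]) compares the background gradient of the correct cross entropy with the gradient obtained when the foreground formula is reused for background pixels; note that the two gradients even differ in sign.

```python
import numpy as np

# Predicted foreground probabilities for background pixels (y_i = 0).
p = np.linspace(0.01, 0.99, 5)

# Correct background cross entropy -log(1 - p): gradient dL/dp = 1 / (1 - p).
grad_correct = 1.0 / (1.0 - p)

# If the foreground formula -log(p) is mistakenly reused for background
# pixels, the gradient becomes dL/dp = -1 / p instead.
grad_mistaken = -1.0 / p

for pi, gc, gm in zip(p, grad_correct, grad_mistaken):
    print(f"p={pi:.2f}  correct={gc:+7.2f}  mistaken={gm:+7.2f}")
```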
Exploring the reason for this error, we believe it may have been influenced by the notation of Focal Loss (FL) [20]. On page 3 of the FL paper, for convenience of writing, cross entropy is written as follows.
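$$\mathrm{CE}(p, y) = \mathrm{CE}(p_t) = -\log(p_t)$$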
Focal loss is written as follows.
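$$\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma}\log(p_t)$$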
However, $p_t$ is not simply the predicted probability of a pixel: its definition distinguishes between foreground and background. Hossain et al. [11] may have directly regarded $p_t$ as the predicted probability of the pixel. The relevant passage of the original Focal Loss paper is quoted below.
Focal Loss
The Focal Loss is designed to address the one-stage object detection scenario in which there is an extreme imbalance between foreground and background classes during training (e.g., 1:1000). We introduce the focal loss starting from the cross entropy (CE) loss for binary classification:
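$$\mathrm{CE}(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise.} \end{cases}$$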
In the above, y ∈ {±1} specifies the ground-truth class and p ∈ [0, 1] is the model's estimated probability for the class with label y = 1. For notational convenience, we define $p_t$:
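$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise,} \end{cases}$$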
and rewrite $\mathrm{CE}(p, y) = \mathrm{CE}(p_t) = -\log(p_t)$.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, L., Cao, T., Zheng, Y. et al. Focusing intermediate pixels loss for salient object segmentation. Multimed Tools Appl 83, 19747–19766 (2024). https://doi.org/10.1007/s11042-023-15873-1