Classification of calcareous algae under noisy labels

  • Original Article
Neural Computing and Applications

Abstract

Calcareous algae form an important marine ecosystem that is under threat from global warming and local stressors such as offshore oil and gas activity. In this context, an important Brazilian oil and gas company started to monitor that environment, and a deep learning classifier was proposed to carry out the monitoring. However, the dataset built for this purpose contained noisy labels, meaning that some samples are mislabeled, which degrades the robustness of the model. State-of-the-art models for handling noisy labels use the small-loss technique, which excludes the noisy samples from the training set and keeps the clean ones. These models then apply different techniques to the set of clean samples to improve their performance. In this work we introduce a novel framework for dealing with label noise, named retrieving discard samples (RDS), that can improve the performance of “small loss” models and that, under ideal conditions, is equivalent to training without any noise. The main idea of the method is to retrieve the discarded samples, assign a pseudo-label to each of them and return them to the training stage. This paper demonstrates that the RDS framework can be adapted to other models that use the small-loss approach. Furthermore, two novel models are proposed to handle noisy labels effectively. Results show that RDS significantly improves the accuracy of both models, achieving results superior to those of state-of-the-art approaches. We also developed a deep learning model for calcareous algae environmental monitoring based on the RDS approach that improved the F1-score by 4.2% compared with standard deep learning classifiers.
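The small-loss selection and the retrieval step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the article states that the code for its models is not public); the function names, the `keep_ratio` parameter and the choice of the model's argmax prediction as the pseudo-label are all assumptions made only for illustration.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Per-sample cross-entropy between predicted probabilities and integer labels."""
    eps = 1e-12  # avoids log(0)
    return -np.log(probs[np.arange(len(labels)), labels] + eps)


def small_loss_split(probs, labels, keep_ratio):
    """Split sample indices into a kept (small-loss) set and a discarded set."""
    losses = cross_entropy(probs, labels)
    n_keep = int(round(keep_ratio * len(labels)))
    order = np.argsort(losses, kind="stable")  # smallest loss first
    return order[:n_keep], order[n_keep:]


def retrieve_discarded(probs, labels, keep_ratio):
    """Return labels in which every discarded sample carries a pseudo-label.

    Kept samples retain their given label. Discarded samples are relabeled
    with the model's current argmax prediction (an assumed pseudo-label rule,
    not necessarily the one used in the paper).
    """
    kept, discarded = small_loss_split(probs, labels, keep_ratio)
    new_labels = labels.copy()
    new_labels[discarded] = probs[discarded].argmax(axis=1)
    return new_labels, kept, discarded


# Toy data: 4 samples, 2 classes. The last sample is predicted as class 0
# with high confidence but is labeled 1, so it has the largest loss.
probs = np.array([[0.90, 0.10],
                  [0.20, 0.80],
                  [0.70, 0.30],
                  [0.95, 0.05]])
noisy_labels = np.array([0, 1, 0, 1])

new_labels, kept, discarded = retrieve_discarded(probs, noisy_labels, keep_ratio=0.75)
print(new_labels, kept, discarded)
```

In this toy case the high-loss sample is discarded by the small-loss rule and then returned to the training set with a pseudo-label instead of being dropped. If every pseudo-label happened to match the true class, the resulting training set would contain no label noise, which is one way to read the "ideal conditions" mentioned in the abstract.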

Data availability

The benchmark datasets used to evaluate our models are available at the following locations: MNIST: https://www.tensorflow.org/datasets/catalog/mnist; CIFAR-10 and CIFAR-100: https://www.cs.toronto.edu/~kriz/cifar.html; the real-world dataset Clothing1M: https://drive.google.com/drive/folders/0B67_d0rLRTQYU2E4aHNHaE1uMTg?resourcekey=0_FShcGYZwIyESjnz6S6aLQ. The dataset used in the real application of our model, referred to as the “Calcareous Algae dataset”, is private and, in accordance with the determinations of the data owners, will not be made public for corporate reasons. The code for the RDS-C and RDS-J models will also not be made public, by determination of the data owners and for corporate reasons.

Acknowledgements

The authors would like to thank Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Capes) and Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio) for their financial support.

Author information

Corresponding author

Correspondence to Vitor Bento.

Ethics declarations

Conflict of interest

All authors state that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Table 14.

Table 14 Network details

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bento, V., Kohler, M. & Pacheco, M.A. Classification of calcareous algae under noisy labels. Neural Comput & Applic 36, 3197–3214 (2024). https://doi.org/10.1007/s00521-023-09235-z

  • DOI: https://doi.org/10.1007/s00521-023-09235-z
