Classification of calcareous algae under noisy labels

Abstract

Calcareous algae form an important marine ecosystem that is threatened by global warming and by local stressors such as offshore oil and gas activity. In this context, a major Brazilian oil and gas company began monitoring this environment, and a deep learning classifier was proposed to carry out the monitoring. However, the resulting dataset contained noisy labels: some samples are mislabeled, which degrades the robustness of the model. State-of-the-art methods for this problem rely on the small-loss technique, which treats high-loss samples as likely mislabeled, excludes them from the training set, and keeps the remaining samples as clean. These methods then apply different techniques to the clean set to improve their performance. In this work, we introduce a novel framework for handling label noise, named retrieving discarded samples (RDS), which can improve the performance of small-loss models and which, under ideal conditions, is equivalent to training without any noise. The main idea is to retrieve the discarded samples, assign them pseudo-labels, and return them to the training stage. We show that RDS can be adapted to other models that use the small-loss approach. We also propose two novel models to handle noisy labels effectively. Results show that RDS significantly improves the accuracy of both models, outperforming state-of-the-art approaches. Finally, we developed a deep learning model for the environmental monitoring of calcareous algae based on RDS, which improved the F-score by 4.2% compared with standard deep learning classifiers.
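The small-loss selection and the RDS idea summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (their code for RDS-C and RDS-J is not public): the fixed `keep_ratio`, the use of the model's most probable class (argmax) as the pseudo-label, and the function names `small_loss_split` and `rds_training_set` are all hypothetical choices. The toy run also illustrates the "ideal conditions" claim: when the split isolates exactly the mislabeled samples and the model predicts their true classes, the returned training set carries the true labels, i.e. it matches a noise-free dataset.

```python
from typing import List, Sequence, Tuple


def small_loss_split(
    losses: Sequence[float], keep_ratio: float
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (clean, discarded) with the small-loss criterion.

    The keep_ratio fraction of samples with the smallest training loss is
    treated as clean; the rest are treated as likely noisy and discarded.
    keep_ratio is an assumed fixed hyperparameter in this sketch.
    """
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in [0, 1]")
    order = sorted(range(len(losses)), key=lambda i: losses[i])  # ascending loss
    n_keep = round(keep_ratio * len(losses))
    return sorted(order[:n_keep]), sorted(order[n_keep:])


def rds_training_set(
    labels: Sequence[int],
    probs: Sequence[Sequence[float]],
    clean: Sequence[int],
    discarded: Sequence[int],
) -> List[Tuple[int, int]]:
    """Hypothetical RDS-style step: keep clean samples with their given labels,
    pseudo-label each discarded sample with the model's most probable class
    (an assumed pseudo-labeling rule), and return every (index, label) pair
    for the next training round."""
    train = [(i, labels[i]) for i in clean]
    for i in discarded:
        p = probs[i]
        pseudo = max(range(len(p)), key=lambda c: p[c])  # argmax over classes
        train.append((i, pseudo))
    return sorted(train)


if __name__ == "__main__":
    noisy_labels = [0, 1, 0, 1, 2]  # true labels are [0, 1, 2, 1, 0]
    losses = [0.10, 0.20, 2.50, 0.15, 3.00]
    clean, discarded = small_loss_split(losses, keep_ratio=0.6)
    # An "ideal" model whose argmax equals the true class of every sample.
    probs = [
        [0.90, 0.05, 0.05],
        [0.10, 0.80, 0.10],
        [0.10, 0.10, 0.80],
        [0.20, 0.70, 0.10],
        [0.70, 0.20, 0.10],
    ]
    print(clean, discarded)  # [0, 1, 3] [2, 4]
    print(rds_training_set(noisy_labels, probs, clean, discarded))
    # [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0)]
```

In practice, small-loss methods such as Co-teaching vary the fraction of kept samples over training epochs, and pseudo-labeling schemes often add a confidence threshold; both refinements are omitted here for brevity, and neither is taken from the source.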

Similar content being viewed by others

Automatic identification of harmful algae based on multiple convolutional neural networks and transfer learning

Article 28 September 2022

Multi-label noisy samples in underwater inspection from the oil and gas industry

Article 16 February 2024

FDCNet: filtering deep convolutional network for marine organism classification

Article 18 March 2017

Data availability

The benchmark datasets used to evaluate our model are available at the following locations. MNIST: https://www.tensorflow.org/datasets/catalog/mnist; CIFAR-10 and CIFAR-100: https://www.cs.toronto.edu/~kriz/cifar.html; the real-world dataset Clothing1M: https://drive.google.com/drive/folders/0B67_d0rLRTQYU2E4aHNHaE1uMTg?resourcekey=0_FShcGYZwIyESjnz6S6aLQ. The dataset used in the real-world application of our model, referred to as the “Calcareous Algae dataset”, is private; in accordance with the data owners' determinations, it will not be made public, for corporate reasons. For the same reasons, the code for the RDS-C and RDS-J models will not be made public.

Acknowledgements

The authors would like to thank Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Capes) and Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio) for their financial support.

Author information

Authors and Affiliations

Department of Electrical Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
Vitor Bento, Manoela Kohler & Marco Aurelio Pacheco

Corresponding author

Correspondence to Vitor Bento.

Ethics declarations

Conflict of interest

All authors state that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Table 14.

Table 14 Network details

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bento, V., Kohler, M. & Pacheco, M.A. Classification of calcareous algae under noisy labels. Neural Comput & Applic 36, 3197–3214 (2024). https://doi.org/10.1007/s00521-023-09235-z

Received: 30 August 2023
Accepted: 03 November 2023
Published: 02 December 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s00521-023-09235-z