Abstract
Calcareous algae form an important marine ecosystem that is under threat from global warming and local stressors such as offshore oil and gas activity. For this reason, a major Brazilian oil and gas company began monitoring that environment, and a deep learning classifier was proposed to carry out the monitoring. However, the resulting dataset contained noisy labels, meaning that some samples are mislabeled, which degrades the robustness of the model. State-of-the-art models deal with this problem using the small-loss technique, which excludes noisy-labeled samples from the training set and keeps the clean ones. These models then apply different techniques to the set of clean samples to improve their performance. In this work we introduce a novel framework for handling label noise, named retrieving discard samples (RDS), that can improve the performance of small-loss models and that, under ideal conditions, is equivalent to training without any noise. The main idea is to retrieve the discarded samples, assign them pseudo-labels, and return them to the training stage. This paper demonstrates that the RDS framework can be adapted to other models that use the small-loss approach. Furthermore, two novel models are proposed to handle noisy labels effectively. Results show that RDS significantly improves the accuracy of both models, which outperform state-of-the-art approaches. We also developed a deep learning model for calcareous algae environmental monitoring based on the RDS approach, which improved the F1-score by 4.2% compared with standard deep learning classifiers.
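The core idea described in the abstract (select small-loss samples, then retrieve the discarded ones with pseudo-labels) can be illustrated with a minimal NumPy sketch. This is not the authors' RDS-C or RDS-J code, which is not public. The function names, the `keep_ratio` parameter, and the choice of the model's most likely class as the pseudo-label are all assumptions made for illustration; the abstract does not specify how pseudo-labels are produced.

```python
import numpy as np

def small_loss_split(probs, labels, keep_ratio):
    """Split a batch into kept (small-loss) and discarded (large-loss) indices."""
    # Per-sample cross-entropy against the given, possibly noisy, labels.
    losses = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    n_keep = int(round(keep_ratio * len(labels)))
    order = np.argsort(losses)  # ascending: smallest loss first
    return order[:n_keep], order[n_keep:]

def retrieve_discarded(probs, labels, keep_ratio):
    """Return labels for the full batch, with discarded samples pseudo-labeled."""
    kept, discarded = small_loss_split(probs, labels, keep_ratio)
    new_labels = labels.copy()
    # Assumption: the pseudo-label is the model's own most likely class.
    new_labels[discarded] = probs[discarded].argmax(axis=1)
    return new_labels, kept, discarded

# Toy batch: predicted class probabilities for four samples, two classes.
probs = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.95, 0.05],
    [0.30, 0.70],
])
labels = np.array([0, 1, 1, 1])  # the third label disagrees with the model

new_labels, kept, discarded = retrieve_discarded(probs, labels, keep_ratio=0.75)
print(new_labels, kept, discarded)
```

In this toy batch the third sample has by far the largest loss, so it is the one discarded and relabeled, and all four samples return to training instead of three. In a real training loop the probabilities would come from the network at each step, and the keep ratio would typically follow a schedule tied to the estimated noise rate.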
Data availability
The benchmark datasets used to evaluate our models are publicly available: MNIST at https://www.tensorflow.org/datasets/catalog/mnist; CIFAR-10 and CIFAR-100 at https://www.cs.toronto.edu/~kriz/cifar.html; and the real-world dataset Clothing1M at https://drive.google.com/drive/folders/0B67_d0rLRTQYU2E4aHNHaE1uMTg?resourcekey=0_FShcGYZwIyESjnz6S6aLQ. The dataset used in the real application of our model, referred to as the “Calcareous Algae dataset”, is private and, in accordance with the decision of the data owners, will not be made public for corporate reasons. The code for the RDS-C and RDS-J models will likewise not be made public, by decision of the data owners, for corporate reasons.
Acknowledgements
The authors would like to thank Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Capes) and Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio) for their financial support.
Ethics declarations
Conflict of interest
All authors state that there is no conflict of interest.
About this article
Cite this article
Bento, V., Kohler, M. & Pacheco, M.A. Classification of calcareous algae under noisy labels. Neural Comput & Applic 36, 3197–3214 (2024). https://doi.org/10.1007/s00521-023-09235-z
DOI: https://doi.org/10.1007/s00521-023-09235-z