A robust optimization method for label noisy datasets based on adaptive threshold: Adaptive-k

Published: 16 December 2023

Abstract

The use of all samples in the optimization process does not produce robust results on datasets with label noise, because the gradients computed from the losses of noisy samples push the optimization in the wrong direction. In this paper, we recommend using only the samples in each mini-batch whose loss is below a threshold determined during optimization, rather than all samples. Our proposed method, Adaptive-k, aims to exclude noisy-labeled samples from the optimization process and thereby make training robust. On noisy datasets, we find that a threshold-based approach such as Adaptive-k produces better results than using all samples or a fixed number of low-loss samples per mini-batch. On the basis of our theoretical analysis and experimental results, we show that Adaptive-k comes closest to the performance of the Oracle, in which noisy samples are removed from the dataset entirely. Adaptive-k is a simple but effective method: it requires no prior knowledge of the dataset's noise ratio, no additional model training, and no significant increase in training time. In the experiments, we also show that Adaptive-k is compatible with different optimizers such as SGD, SGDM, and Adam. The code for Adaptive-k is available on GitHub.
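
To make the selection rule concrete, the Python/PyTorch sketch below shows one way a single mini-batch step could apply loss thresholding: per-sample losses are computed without reduction, only samples whose loss falls below the current threshold contribute to the gradient, and the threshold is then nudged toward the mean loss of the kept samples. This is an illustrative sketch only; the function name thresholded_step and the exponential-moving-average threshold update are assumptions made for illustration, not the Adaptive-k rule from the paper.

    import torch
    import torch.nn.functional as F

    def thresholded_step(model, optimizer, inputs, targets, threshold, momentum=0.9):
        # Illustrative sketch of loss-thresholded training, not the authors'
        # exact Adaptive-k update; the moving-average threshold rule is assumed.
        optimizer.zero_grad()
        logits = model(inputs)
        # Per-sample losses (reduction="none") so each sample can be kept or dropped.
        losses = F.cross_entropy(logits, targets, reduction="none")
        keep = losses < threshold              # samples currently treated as clean
        if keep.any():
            kept_loss = losses[keep].mean()
            kept_loss.backward()               # gradient flows only through low-loss samples
            optimizer.step()
            # Assumed placeholder update: drift the threshold toward the kept samples' mean loss.
            threshold = momentum * threshold + (1.0 - momentum) * kept_loss.item()
        return threshold

Because the selection only changes which per-sample losses enter the averaged objective, the same step can be driven by SGD, SGDM, or Adam, in line with the optimizer-agnostic behavior the abstract reports.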



Published In

Frontiers of Computer Science: Selected Publications from Chinese Universities, Volume 18, Issue 4
Aug 2024
210 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Published: 16 December 2023
Accepted: 03 April 2023
Received: 07 July 2022

Author Tags

1. robust optimization
2. label noise
3. noisy label
4. deep learning
5. noisy datasets
6. noise ratio estimation
7. robust training

Qualifiers

• Research-article
