Improved Majority Filtering Algorithm for Cleaning Class Label Noise in Supervised Learning

Abstract

In most supervised learning problems, the labelling quality of datasets plays a paramount role in the learning of high-performance classifiers. The performance of a classifier can significantly be degraded if it is trained with mislabeled data. Therefore, identification of such examples from the dataset is of critical importance. In this study, we proposed an improved majority filtering algorithm, which utilized the ability of a support vector machine in terms of capturing potentially mislabeled examples as support vectors (SVs). The key technical contribution of our work, is that the base (or component) classifiers that construct the ensemble of classifiers are trained using non-SV examples, although at the time of testing, the examples captured as SVs were employed. An example can be tagged as mislabeled if the majority of the base classifiers incorrectly classifies the example. Experimental results confirmed that our algorithm not only showed high-level accuracy with higher F₁ scores, for identifying the mislabeled examples, but was also significantly faster than the previous methods.

Content from these authors

Favorites & Alerts

Corresponding author

Register with J-STAGE for free!