Abstract
Removing or filtering outliers and mislabeled instances prior to training a learning algorithm has been shown to increase classification accuracy, especially in noisy data sets. A popular approach is to remove any instance that is misclassified by a learning algorithm. However, the use of ensemble methods has also been shown to generally increase classification accuracy. In this paper, we extensively examine filtering and ensembling. We examine 9 learning algorithms individually and ensembled together as filtering algorithms as well as the effects of filtering in the 9 chosen learning algorithms on a set of 54 data sets. We compare the filtering results with using a majority voting ensemble. We find that the majority voting ensemble significantly outperforms filtering unless there are high amounts of noise present in the data set. Additionally, for most cases, using an ensemble of learning algorithms for filtering produces a greater increase in classification accuracy than using a single learning algorithm for filtering.
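The two approaches contrasted above can be sketched in code. This is a minimal illustration only, not the paper's exact protocol: the study uses 9 learning algorithms across 54 data sets, whereas the sketch below uses 3 stand-in scikit-learn classifiers on a synthetic noisy data set. It shows a majority-based misclassification filter (an instance is removed if most learners misclassify it under cross-validation) alongside a majority-voting ensemble trained on the unfiltered data.

```python
# Illustrative sketch: ensemble misclassification filtering vs. majority voting.
# The learners, data set, and "majority of 3" threshold are stand-ins, not the
# configuration used in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Toy data with 10% label noise.
X, y = make_classification(n_samples=300, flip_y=0.1, random_state=0)
learners = [("lr", LogisticRegression(max_iter=1000)),
            ("nb", GaussianNB()),
            ("dt", DecisionTreeClassifier(random_state=0))]

# Ensemble filter: keep an instance only if a majority of the learners classify
# it correctly; predictions come from cross-validation so no instance is judged
# by a model trained on it.
correct = np.stack([cross_val_predict(clf, X, y, cv=5) == y for _, clf in learners])
keep = correct.sum(axis=0) >= 2
X_filtered, y_filtered = X[keep], y[keep]

# Majority-voting ensemble over the same learners, trained on unfiltered data.
ensemble = VotingClassifier(estimators=learners, voting="hard")
acc_ensemble = cross_val_score(ensemble, X, y, cv=5).mean()
print(f"kept {keep.sum()}/{len(y)} instances; voting-ensemble accuracy {acc_ensemble:.3f}")
```

A single-algorithm filter corresponds to replacing the majority condition with one learner's cross-validated correctness.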
Notes
As opposed to an ensemble composed of models induced by the same learning algorithm, such as bagging or boosting.
The NNge learning algorithm did not finish running on two data sets (eye-movements and Magic telescope), and RIPPER did not finish on the lung cancer data set. These data sets are omitted from the presented results; consequently, NNge was evaluated on 52 data sets and RIPPER on 53 data sets.
Appendices
Appendix 1: Statistical Significance Tables
This section provides the results of the statistical significance tests comparing no filtering with filtering using a biased filter, the ensemble filter, and the adaptive filter for each investigated learning algorithm. The results are given in Tables 10, 11, 12, 13, 14, 15, 16, 17 and 18. \(p\) values \({<}\)0.05 are shown in bold, and "greater-equal-less" denotes the number of times that the algorithm listed in the row is greater than, equal to, or less than the algorithm listed in the column.
Appendix 2: Ensemble results for each data set
This section provides the per-data-set results comparing a voting ensemble with filtering using the ensemble filter for each investigated learning algorithm, as well as with filtering a voting ensemble using the ensemble filter. Table 19 compares the voting ensemble against filtering for each investigated non-ensembled learning algorithm: bold values mark the highest classification accuracy, and rows highlighted in gray mark the data sets where filtering with the ensemble filter increased accuracy over the voting ensemble for all learning algorithms. Table 20 compares the voting ensemble with a filtered voting ensemble: a bold value in the "Ens" column indicates that the voting ensemble trained on unfiltered data achieves the higher accuracy, while a bold value in the "FEns" column indicates that the voting ensemble trained on filtered data achieves higher accuracy than the one trained on unfiltered data.
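The "Ens" versus "FEns" comparison in Table 20 can be sketched as follows. This is a hedged illustration under stand-in assumptions (3 scikit-learn learners on a synthetic noisy data set, rather than the paper's 9 algorithms and 54 data sets): the training split is cleaned with a majority-based ensemble filter, and the same voting ensemble is trained once on the unfiltered and once on the filtered data.

```python
# Illustrative sketch of the Ens vs. FEns comparison: a voting ensemble trained
# on unfiltered vs. ensemble-filtered training data. Learners and data are
# stand-ins, not the paper's experimental setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Toy data with 20% label noise, split into train and test.
X, y = make_classification(n_samples=400, flip_y=0.2, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
learners = [("lr", LogisticRegression(max_iter=1000)),
            ("nb", GaussianNB()),
            ("dt", DecisionTreeClassifier(random_state=1))]

# Ensemble filter applied to the training split only: keep an instance if a
# majority of the learners classify it correctly under cross-validation.
correct = np.stack([cross_val_predict(c, X_tr, y_tr, cv=5) == y_tr for _, c in learners])
keep = correct.sum(axis=0) >= 2

ens = VotingClassifier(estimators=learners, voting="hard").fit(X_tr, y_tr)          # Ens
fens = VotingClassifier(estimators=learners, voting="hard").fit(X_tr[keep], y_tr[keep])  # FEns
acc_ens, acc_fens = ens.score(X_te, y_te), fens.score(X_te, y_te)
print(f"Ens accuracy {acc_ens:.3f}; FEns accuracy {acc_fens:.3f}")
```

Which of the two wins on a given data set is exactly what the bolding in Table 20 records.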
Cite this article
Smith, M.R., Martinez, T. The robustness of majority voting compared to filtering misclassified instances in supervised classification tasks. Artif Intell Rev 49, 105–130 (2018). https://doi.org/10.1007/s10462-016-9518-2