The robustness of majority voting compared to filtering misclassified instances in supervised classification tasks

Published in Artificial Intelligence Review

Abstract

Removing or filtering outliers and mislabeled instances prior to training a learning algorithm has been shown to increase classification accuracy, especially in noisy data sets. A popular approach is to remove any instance that is misclassified by a learning algorithm. However, the use of ensemble methods has also been shown to generally increase classification accuracy. In this paper, we extensively examine filtering and ensembling. We examine 9 learning algorithms, used individually and ensembled together as filtering algorithms, as well as the effects of filtering on the 9 chosen learning algorithms, over a set of 54 data sets. We compare the filtering results with using a majority voting ensemble. We find that the majority voting ensemble significantly outperforms filtering unless high amounts of noise are present in the data set. Additionally, in most cases, using an ensemble of learning algorithms for filtering produces a greater increase in classification accuracy than using a single learning algorithm for filtering.
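For concreteness, the snippet below is a minimal sketch of the two strategies being compared: removing instances that a set of learning algorithms misclassifies, versus majority voting over the same algorithms on unfiltered data. It uses scikit-learn stand-ins; the estimator choices, the consensus-style filter, and the helper name `misclassification_filter` are illustrative assumptions, not the authors' exact experimental setup.

```python
# Hedged sketch of misclassification filtering vs. a majority voting
# ensemble. scikit-learn stand-ins; the estimator choices and the
# consensus-style filter are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def misclassification_filter(estimators, X, y, cv=10):
    """Return a boolean mask keeping instances that at least one filtering
    algorithm classifies correctly (consensus-style ensemble filter: an
    instance is removed only if every filter misclassifies it)."""
    misses = np.zeros(len(y), dtype=int)
    for est in estimators:
        # Out-of-fold predictions, so each instance is judged by a model
        # that never saw it during training.
        pred = cross_val_predict(est, X, y, cv=cv)
        misses += (pred != y)
    return misses < len(estimators)

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

learners = [("nb", GaussianNB()),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
            ("rf", RandomForestClassifier(random_state=0))]

# Strategy 1: filter the training set, then train a single learner.
keep = misclassification_filter([e for _, e in learners], X_tr, y_tr)
single = RandomForestClassifier(random_state=0).fit(X_tr[keep], y_tr[keep])

# Strategy 2: majority-vote the heterogeneous learners on unfiltered data.
ensemble = VotingClassifier(learners, voting="hard").fit(X_tr, y_tr)

print(single.score(X_te, y_te), ensemble.score(X_te, y_te))
```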


Notes

  1. As opposed to an ensemble composed of models induced by the same learning algorithm, such as in bagging or boosting.

  2. The NNge learning algorithm did not finish running on two data sets (eye-movements and MAGIC telescope), and RIPPER did not finish on the lung cancer data set. In these cases, the data sets are omitted from the presented results. As such, NNge was evaluated on a set of 52 data sets and RIPPER was evaluated on a set of 53 data sets.



Author information


Correspondence to Michael R. Smith.

Appendices

Appendix 1: Statistical Significance Tables

This section provides the results of the statistical significance tests comparing no filtering with filtering using a biased filter, the ensemble filter, and the adaptive filter for the investigated learning algorithms. The results are given in Tables 10, 11, 12, 13, 14, 15, 16, 17 and 18. The p values below 0.05 are shown in bold, and "greater-equal-less" refers to the number of times that the algorithm listed in the row is greater than, equal to, or less than the algorithm listed in the column.
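The appendix does not restate the test itself. As a hedged illustration, the snippet below computes the two quantities each table cell reports: a p value, here assumed to come from the Wilcoxon signed-rank test (a standard choice for comparing two classifiers across many data sets), together with the greater-equal-less counts. The function name and the choice of test are assumptions for illustration only.

```python
# Hedged sketch of one pairwise table entry: a p value plus the
# greater-equal-less counts. The Wilcoxon signed-rank test is an assumed
# (standard) choice; the appendix does not restate the test used.
from scipy.stats import wilcoxon

def pairwise_comparison(acc_a, acc_b):
    """acc_a, acc_b: per-data-set accuracies of two treatments, e.g.
    no filtering vs. the ensemble filter, aligned over the 54 data sets."""
    greater = sum(a > b for a, b in zip(acc_a, acc_b))
    equal = sum(a == b for a, b in zip(acc_a, acc_b))
    less = sum(a < b for a, b in zip(acc_a, acc_b))
    stat, p = wilcoxon(acc_a, acc_b)  # zero differences dropped by default
    return p, (greater, equal, less)  # bold the cell when p < 0.05
```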

Table 10 Pair-wise comparison of filtering for multilayer perceptrons trained with backpropagation
Table 11 Pair-wise comparison of filtering for decision trees
Table 12 Pair-wise comparison of filtering for 5-nearest neighbors
Table 13 Pair-wise comparison of filtering for locally weighted learning (LWL)
Table 14 Pair-wise comparison of filtering for naïve Bayes
Table 15 Pair-wise comparison of filtering for NNge
Table 16 Pair-wise comparison of filtering for random forests
Table 17 Pair-wise comparison of filtering for Ridor
Table 18 Pair-wise comparison of filtering for RIPPER
Table 19 Comparison of the accuracy for each data set using a voting ensemble (Ens) with using the ensemble filter for the investigated learning algorithms
Table 20 Comparison of the accuracy from a majority voting ensemble trained on unfiltered (Ens) and filtered data (FEns)

Appendix 2: Ensemble results for each data set

This section provides per-data-set results comparing a voting ensemble with filtering using the ensemble filter for each investigated learning algorithm, as well as with filtering using the ensemble filter for the voting ensemble itself. The results comparing a voting ensemble with filtering for each investigated non-ensembled learning algorithm are shown in Table 19. Bold values mark the highest classification accuracy, and the rows highlighted in gray are the data sets where filtering with the ensemble filter increased the accuracy over the voting ensemble for all learning algorithms. The results comparing a voting ensemble with a filtered voting ensemble are shown in Table 20. In the "Ens" column, bold values indicate that the voting ensemble trained on unfiltered data achieves the higher accuracy; in the "FEns" columns, bold values indicate that the voting ensemble trained on filtered data achieves higher accuracy than the one trained on unfiltered data.
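Under the same illustrative assumptions as the sketch following the abstract (scikit-learn stand-ins, plus the hypothetical misclassification_filter helper defined there), one row of the Table 20 comparison could be computed as follows.

```python
# Hedged sketch of one row of the Ens vs. FEns comparison (Table 20).
# Reuses the illustrative misclassification_filter helper from the sketch
# after the abstract; all names here are assumptions, not the authors' code.
from sklearn.base import clone
from sklearn.metrics import accuracy_score

def ens_vs_fens(ensemble, filters, X_tr, y_tr, X_te, y_te):
    ens = clone(ensemble).fit(X_tr, y_tr)                  # unfiltered (Ens)
    keep = misclassification_filter(filters, X_tr, y_tr)   # ensemble filter
    fens = clone(ensemble).fit(X_tr[keep], y_tr[keep])     # filtered (FEns)
    return (accuracy_score(y_te, ens.predict(X_te)),
            accuracy_score(y_te, fens.predict(X_te)))
```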


About this article


Cite this article

Smith, M.R., Martinez, T. The robustness of majority voting compared to filtering misclassified instances in supervised classification tasks. Artif Intell Rev 49, 105–130 (2018). https://doi.org/10.1007/s10462-016-9518-2
