Identifying Influential Cases in Kernel Fisher Discriminant Analysis by Using the Smallest Enclosing Hypersphere

  • Conference paper
  • In: Advances in Data Analysis, Data Handling and Business Intelligence

Abstract

Kernel methods have become standard tools for solving classification and regression problems in statistics. An example of a kernel-based classification method is kernel Fisher discriminant analysis (KFDA). Conceptually, KFDA entails transforming the data from the input space to a high-dimensional feature space, followed by linear discriminant analysis (LDA) performed in the feature space. Although the resulting classifier is linear in the feature space, it corresponds to a non-linear classifier in the input space. However, as with LDA, the classification performance of KFDA deteriorates in the presence of influential data points. Louw et al. (Communications in Statistics: Simulation and Computation 37:2050–2062, 2008) proposed several criteria for identifying influential cases in KFDA. In extensive simulation studies these criteria proved successful, in the sense that the error rate of the KFD classifier fitted after removal of influential cases is lower than the error rate of the classifier fitted to the entire data set. A disadvantage is that these criteria are calculated on a leave-one-out basis, which becomes computationally expensive for large data sets. In this paper we propose a two-step procedure for identifying influential cases in large data sets. First, a subset of potentially influential cases is found by constructing the smallest enclosing hypersphere (for each group) in the feature space. Second, the proposed criteria are used to identify influential cases, but only the cases in this subset are considered on a leave-one-out basis, leading to a substantial reduction in computation time. We investigate the merit of this new proposal in a simulation study, and compare the results to those obtained without the hypersphere step. We conclude that the new proposal has merit.
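The paper itself gives no code, but the first step of the procedure, constructing the smallest enclosing hypersphere of one group in feature space, can be sketched by solving the support vector domain description dual of Tax and Duin (1999), which the abstract cites as the hypersphere construction. The sketch below is illustrative only: the Gaussian kernel, the gamma and C parameters, the SLSQP solver, and all function names are our assumptions, not part of the paper.

import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2).
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def smallest_enclosing_hypersphere(X, gamma=1.0, C=1.0):
    # SVDD dual (Tax & Duin, 1999):
    #   max_a  sum_i a_i K_ii - sum_ij a_i a_j K_ij
    #   s.t.   sum_i a_i = 1,  0 <= a_i <= C.
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    dK = np.diag(K)
    res = minimize(lambda a: a @ K @ a - a @ dK,      # negated dual objective
                   np.full(n, 1.0 / n),
                   jac=lambda a: 2.0 * K @ a - dK,
                   bounds=[(0.0, C)] * n,
                   constraints={'type': 'eq', 'fun': lambda a: a.sum() - 1.0},
                   method='SLSQP')
    a = res.x
    # Squared feature-space distance from the sphere centre to each point:
    # d^2(x) = K(x, x) - 2 sum_i a_i K(x_i, x) + sum_ij a_i a_j K(x_i, x_j).
    d2 = dK - 2.0 * K @ a + a @ K @ a
    on_boundary = (a > 1e-6) & (a < C - 1e-6)
    if not on_boundary.any():        # degenerate case: no unbounded SVs
        on_boundary = a > 1e-6
    R2 = d2[on_boundary].mean()      # squared radius from boundary points
    return d2, R2

# Hypothetical usage for one group: cases on or outside the sphere form
# the candidate subset passed to the leave-one-out influence criteria of
# Louw et al. (2008), which are not reproduced here.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
d2, R2 = smallest_enclosing_hypersphere(X, gamma=0.5, C=0.2)
candidates = np.where(d2 >= R2 - 1e-8)[0]
print("potentially influential cases:", candidates)

Under these assumptions only the cases in candidates require the expensive leave-one-out evaluation, which is where the computational saving of the two-step procedure would come from.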


References

  • Critchley, F., & Vitiello, C. (1991). The influence of observations on misclassification probability estimates in linear discriminant analysis. Biometrika, 78, 677–690.

  • Croux, C., Filzmoser, P., & Joossens, K. (2008). Classification efficiencies for robust linear discriminant analysis. Statistica Sinica, 18, 581–599.

  • Flury, B.W., & Riedwyl, H. (1988). Multivariate statistics: A practical approach. London: Chapman and Hall.

  • Fung, W.K. (1992). Some diagnostic measures in discriminant analysis. Statistics and Probability Letters, 13, 279–285.

  • Fung, W.K. (1995). Diagnostics in linear discriminant analysis. Journal of the American Statistical Association, 90, 952–956.

  • Lamont, M.M.C. (2008). Assessing the influence of observations on the generalisation performance of the kernel Fisher discriminant classifier. Unpublished PhD thesis, University of Stellenbosch.

  • Louw, N., Lamont, M.M.C., & Steel, S.J. (2008). Identification of influential cases in kernel Fisher discriminant analysis. Communications in Statistics: Simulation and Computation, 37, 2050–2062.

  • Mika, S., Rätsch, G., Weston, J., Schölkopf, B., & Müller, K.-R. (1999). Fisher discriminant analysis with kernels. In Y.-H. Hu, J. Larsen, E. Wilson, & S. Douglas (Eds.), Neural networks for signal processing (pp. 41–48). New York: IEEE Press.

  • Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.

  • Tax, D.M.J., & Duin, R.P.W. (1999). Support vector domain description. Pattern Recognition Letters, 20, 1191–1199.


Author information

Correspondence to Nelmarie Louw.


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

Cite this paper

Louw, N., Steel, S., Lamont, M. (2009). Identifying Influential Cases in Kernel Fisher Discriminant Analysis by Using the Smallest Enclosing Hypersphere. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_33
