Abstract
Kernel methods have become standard tools for solving classification and regression problems in statistics. One example of a kernel-based classification method is kernel Fisher discriminant analysis (KFDA). Conceptually, KFDA entails transforming the data from the input space to a high-dimensional feature space, followed by linear discriminant analysis (LDA) performed in the feature space. Although the resulting classifier is linear in the feature space, it corresponds to a non-linear classifier in the input space. However, as with LDA, the classification performance of KFDA deteriorates in the presence of influential data points. Louw et al. (Communications in Statistics: Simulation and Computation 37:2050–2062, 2008) proposed several criteria for identifying influential cases in KFDA. In extensive simulation studies these criteria have proved successful, in the sense that the error rate of the KFD classifier based on the data set after removal of influential cases is lower than the error rate of the KFD classifier based on the entire data set. A disadvantage is that the criteria are calculated on a leave-one-out basis, which becomes computationally expensive for large data sets. In this paper we propose a two-step procedure for identifying influential cases in large data sets. Firstly, a subset of potentially influential cases is found by constructing the smallest enclosing hypersphere (for each group) in feature space. Secondly, the proposed criteria are employed to identify influential cases, but only cases in this subset are considered on a leave-one-out basis, leading to a substantial reduction in computation time. We investigate the merit of this new proposal in a simulation study and compare the results to those obtained when the hypersphere is not used as a first step. We conclude that the new proposal has merit.
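The abstract only outlines the screening step. As a rough illustration, the following is a minimal sketch of step 1 under assumptions not stated above: a Gaussian kernel, the soft smallest-enclosing-hypersphere (SVDD) dual of Tax and Duin (1999) solved with a general-purpose optimiser, and an arbitrary distance quantile used to define the candidate subset. The function names (rbf_kernel, hypersphere_candidates) and the parameters gamma, C and quantile are illustrative choices, not part of the authors' procedure.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def hypersphere_candidates(X, gamma=1.0, C=1.0, quantile=0.90):
    """Illustrative step 1: fit the smallest enclosing hypersphere (SVDD dual,
    Tax & Duin 1999) to one group in feature space and flag the cases lying
    furthest from the centre as potentially influential."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    diagK = np.diag(K)

    # SVDD dual: maximise  sum_i a_i K_ii - a' K a  s.t.  0 <= a_i <= C, sum_i a_i = 1
    def neg_dual(a):
        return -(a @ diagK - a @ K @ a)

    cons = ({'type': 'eq', 'fun': lambda a: np.sum(a) - 1.0},)
    bounds = [(0.0, C)] * n
    a0 = np.full(n, 1.0 / n)
    res = minimize(neg_dual, a0, bounds=bounds, constraints=cons, method='SLSQP')
    a = res.x

    # Squared feature-space distance of every case to the sphere centre
    d2 = diagK - 2.0 * K @ a + a @ K @ a
    # Cases beyond the chosen quantile form the subset screened in step 2
    return np.where(d2 >= np.quantile(d2, quantile))[0]
```

In step 2, the leave-one-out criteria of Louw et al. (2008) would then be evaluated only for the returned indices (per group), which is where the reduction in computation time arises.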
References
Crithchley, F., & Vitiello, C. (1991). The influence of observations on misclassification probability estimates in linear discriminant analysis. Biometrika, 78, 677–690.
Croux, C., Filzmoser, P., & Joossens, K. (2008). Classification efficiencies for robust linear discriminant analysis. Statistica Sinica, 18, 581–599.
Flury, B.W., & Riedwyl, H. (1988). Multivariate statistics: A practical approach. London: Chapman and Hall.
Fung, W.K. (1992). Some diagnostic measures in discriminant analysis. Statistics and Probability Letters, 13, 279–285.
Fung, W.K. (1995). Diagnostics in linear discriminant analysis. Journal of the American Statistical Association, 90, 952–956.
Lamont, M.M.C. (2008). Assessing the influence of observations on the generalisation performance of the kernel Fisher discriminant classifier. Unpublished PhD-thesis, University of Stellenbosch.
Louw, N., Lamont, M.M.C., & Steel, S.J. (2008). Identification of influential cases in kernel Fisher discriminant analysis. Communications in Statistics: Simulation and Computation, 37, 2050–2062.
Mika, S., Rätsch, G., Weston, J., Schölkopf, B., & Müller, K.-R. (1999). Fisher discriminant analysis with kernels. In Y.-H. Hu, J. Larsen, E. Wilson, & S. Douglas (Eds.), Neural networks for signal processing (pp. 41–48). New York: IEEE Press.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
Tax, D.M.J., & Duin, R.P.W. (1999). Support vector domain description. Pattern Recognition Letters, 120, 1191–1199.
Cite this paper
Louw, N., Steel, S., Lamont, M. (2009). Identifying Influential Cases in Kernel Fisher Discriminant Analysis by Using the Smallest Enclosing Hypersphere. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_33