Abstract
This paper provides a graphical visualization of multiple outliers based on a clustering algorithm that uses the minimal spanning tree, and proposes a modified version of this clustering algorithm for identifying multiple outliers. Graphical visualization helps in classifying multiple outliers. The proposed modified procedure is shown not only to preserve the performance of the clustering algorithm in identifying multiple outliers, but also to reduce the swamping of observations.
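The abstract does not spell out the algorithm, so the following is only a generic sketch of the idea of clustering through a minimal spanning tree, not the authors' procedure. It assumes each observation has already been reduced to a point in the plane; the function names `minimum_spanning_tree` and `flag_outliers` and the `cut_height` threshold are hypothetical and chosen for illustration. Long tree edges are cut, and every observation outside the largest remaining cluster is flagged as a candidate outlier.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of edges (i, j, distance) with len(points) - 1 entries.
    """
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # shortest known link from the tree to each vertex
    best_from = [-1] * n         # tree vertex that provides that link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the vertex closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Update the shortest links of the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def flag_outliers(points, cut_height):
    """Cut tree edges longer than cut_height (an assumed, user-chosen value)
    and return the indices that fall outside the largest cluster."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, d in minimum_spanning_tree(points):
        if d <= cut_height:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    largest = max(clusters.values(), key=len)
    return sorted(set(range(n)) - set(largest))


# Illustrative data: five tightly grouped points and two distant ones.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (3.0, 3.0), (3.1, 3.0)]
print(flag_outliers(points, cut_height=1.0))  # prints [5, 6]
```

Cutting the tree at a fixed height is equivalent to stopping single-linkage clustering at that height. How the height is chosen determines how many clean observations are wrongly flagged, which is the swamping problem the abstract refers to; the fixed value used here is purely illustrative.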
Cite this article
Kim, SS., Krzanowski, W.J. Detecting multiple outliers in linear regression using a cluster method combined with graphical visualization. Computational Statistics 22, 109–119 (2007). https://doi.org/10.1007/s00180-007-0026-3
DOI: https://doi.org/10.1007/s00180-007-0026-3