Abstract
High leverage points can have a large effect on linear regression analysis. When a dataset contains a group of high leverage points, existing detection methods fail to identify them correctly because of masking and swamping effects. To address this problem, we propose the Diagnostic Robust Generalized Potential Based on Index Set Equality (DRGP(ISE)). DRGP(ISE) builds on the Diagnostic Robust Generalized Potential Based on Minimum Volume Ellipsoid (DRGP(MVE)), but ISE runs much faster than MVE. A Monte Carlo simulation study and numerical examples indicate that DRGP(ISE) performs very well in detecting the actual high leverage points and in reducing masking and swamping effects in a linear model.
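The masking effect mentioned in the abstract can be illustrated with a small sketch. This is not DRGP(ISE) or DRGP(MVE); it only contrasts the classical hat-matrix leverage for a one-predictor model with an intercept, using the common twice-the-mean cutoff 2p/n, against a simple robust distance based on the median and the scaled MAD. The toy data, the function names, and the robust cutoff of 3 are assumptions made purely for illustration.

```python
from statistics import mean, median

def hat_values(x):
    """Leverage h_ii for simple linear regression with an intercept:
    h_ii = 1/n + (x_i - x_bar)^2 / Sxx."""
    n = len(x)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

def classical_flags(x, p=2):
    """Indices whose leverage exceeds the twice-the-mean cutoff 2p/n."""
    cutoff = 2.0 * p / len(x)
    return [i for i, h in enumerate(hat_values(x)) if h > cutoff]

def robust_flags(x, cutoff=3.0):
    """Indices whose robust distance |x_i - median| / (1.4826 * MAD)
    exceeds `cutoff` (the value 3 is an illustrative assumption)."""
    med = median(x)
    mad = 1.4826 * median([abs(xi - med) for xi in x])
    return [i for i, xi in enumerate(x) if abs(xi - med) / mad > cutoff]

# Hypothetical toy data: 14 well-behaved predictor values.
clean = [-1.3, -1.1, -0.9, -0.7, -0.5, -0.3, -0.1,
         0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3]

one_point = clean + [10.0]        # a single high leverage point
group = clean + [10.0] * 6        # a group of six high leverage points

print(classical_flags(one_point))  # [14]: the single point is detected
print(classical_flags(group))      # []: the group masks itself
print(robust_flags(group))         # [14, 15, 16, 17, 18, 19]
```

In the grouped case the six points pull the mean and inflate the sum of squares, so each of their hat values (about 0.16) stays below the cutoff of 0.2. The median and MAD are barely affected by the group, which is why the robust distance still separates it from the clean points.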
Cite this article
Lim, H.A., Midi, H. Diagnostic Robust Generalized Potential Based on Index Set Equality (DRGP (ISE)) for the identification of high leverage points in linear model. Comput Stat 31, 859–877 (2016). https://doi.org/10.1007/s00180-016-0662-6