Abstract
Inference for mixture regression models (MRMs) is traditionally based on the assumption that the component errors are normal, and hence symmetric. It is therefore sensitive to outliers and to symmetric or asymmetric errors whose tails are lighter or heavier than normal. Several new mixture regression models have recently been proposed to deal with these problems. In this paper, a general class of robust mixture regression models is presented based on the two-piece scale mixtures of normal (TP-SMN) distributions. The proposed model is flexible enough to accommodate asymmetry and heavy tails simultaneously. Its stochastic representation makes it straightforward to implement an EM-type algorithm that estimates the unknown parameters from a penalized likelihood. The performance of the resulting estimators is illustrated with a simulation study and a real data example.
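To make the two-piece idea concrete, the following is a minimal sketch of a two-piece normal density, the simplest case in which a normal kernel is given different scales on either side of the location. The function names and the `(mu, sigma_left, sigma_right)` parametrization are assumptions chosen for illustration and are not taken from the paper, which may parametrize the TP-SMN family differently. Heavy tails would additionally require replacing the normal kernel with another member of the scale-mixtures-of-normal family, which this sketch does not attempt.

```python
import math


def two_piece_normal_pdf(y, mu=0.0, sigma_left=1.0, sigma_right=1.0):
    """Two-piece normal density (illustrative parametrization, an assumption).

    A normal kernel with scale sigma_left is used below mu and one with
    scale sigma_right at or above mu. The shared constant keeps the density
    continuous at mu and makes it integrate to one.
    """
    scale = sigma_left if y < mu else sigma_right
    norm_const = math.sqrt(2.0 / math.pi) / (sigma_left + sigma_right)
    z = (y - mu) / scale
    return norm_const * math.exp(-0.5 * z * z)


def integrate(f, a, b, n):
    """Composite trapezoidal rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


if __name__ == "__main__":
    # Right scale three times the left scale gives a right-skewed density.
    pdf = lambda y: two_piece_normal_pdf(y, 0.0, 1.0, 3.0)
    print("total mass:", integrate(pdf, -30.0, 30.0, 60000))
    print("mass below mu:", integrate(pdf, -30.0, 0.0, 30000))
```

Under this parametrization the probability mass below `mu` equals `sigma_left / (sigma_left + sigma_right)`, so the ratio of the two scales controls the skewness, and equal scales recover the ordinary normal density.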
Acknowledgements
We would like to express our great appreciation to the editor, the associate editor, and two anonymous reviewers for their valuable and constructive suggestions during the planning and development of this research work.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Cite this article
Zarei, A., Khodadadi, Z., Maleki, M. et al. Robust mixture regression modeling based on two-piece scale mixtures of normal distributions. Adv Data Anal Classif 17, 181–210 (2023). https://doi.org/10.1007/s11634-022-00495-6
DOI: https://doi.org/10.1007/s11634-022-00495-6
Keywords
- ECME algorithm
- Mixture regression models
- Penalized likelihood
- Two-piece scale mixtures of normal distributions