

Estimation of l0 norm penalized models: A statistical treatment

Published: 12 April 2024

Abstract

Fitting penalized models to merge the estimation and model selection problems has become commonplace in statistical practice. Of the various regularization strategies that can be leveraged to this end, penalizing parameter estimation with the l0 norm poses the most daunting model fitting task. In fact, this particular strategy requires an end user to solve a non-convex, NP-hard optimization problem regardless of the underlying data model. For this reason, the l0 norm has been woefully underutilized as a regularization strategy. To obviate this difficulty, a strategy for solving such problems that is broadly accessible to the statistical community is developed. The approach can be adopted to solve l0 norm penalized problems across a very broad class of models, can be implemented using existing software, and is computationally efficient. The performance of the method is demonstrated through in-depth numerical experiments and through analyses of several prototypical data sets.
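The abstract does not describe the authors' algorithm itself, but the l0-penalized fitting problem it refers to can be illustrated with a standard heuristic. The sketch below (an assumption for illustration, not the paper's method) applies iterative hard thresholding to an l0-constrained least-squares problem: after each gradient step, all but the k largest-magnitude coefficients are set to zero, which is exactly the combinatorial constraint that makes the exact problem NP-hard.

```python
import numpy as np

def iht(X, y, k, step=None, n_iter=200):
    """Iterative hard thresholding for min ||y - X b||^2 s.t. ||b||_0 <= k."""
    n, p = X.shape
    if step is None:
        # step size = 1 / (largest eigenvalue of X'X), a safe gradient step
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        b = b + step * X.T @ (y - X @ b)      # gradient step on squared error
        idx = np.argsort(np.abs(b))[:-k]      # indices of all but the k largest
        b[idx] = 0.0                          # hard-threshold to k nonzeros
    return b

# Toy example: 3 true nonzero coefficients out of 20 predictors
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta = np.zeros(20)
beta[[2, 7, 11]] = [3.0, -2.0, 1.5]
y = X @ beta + 0.1 * rng.standard_normal(100)

b_hat = iht(X, y, k=3)
print(np.nonzero(b_hat)[0])  # estimated support
```

Unlike lasso-type penalties, the hard-thresholding step is non-convex, so the iteration is only a local heuristic; this is precisely the computational difficulty the paper addresses.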





Published In

Computational Statistics & Data Analysis  Volume 192, Issue C
Apr 2024
318 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Shrinkage prior
  2. EM algorithm
  3. Regularized regression
  4. BIC

Qualifiers

  • Research-article

