

Estimation of l0 norm penalized models: A statistical treatment

Published: 12 April 2024

Abstract

Fitting penalized models to merge the estimation and model selection problems has become commonplace in statistical practice. Of the various regularization strategies that can be leveraged to this end, penalizing parameter estimation with the l0 norm poses the most daunting model fitting task. In fact, this particular strategy requires an end user to solve a non-convex, NP-hard optimization problem regardless of the underlying data model. For this reason, the l0 norm has been woefully underutilized as a regularization strategy. To obviate this difficulty, a strategy for solving such problems that is broadly accessible to the statistical community is developed. The approach can be adopted to solve l0 norm penalized problems across a very broad class of models, can be implemented using existing software, and is computationally efficient. The performance of the method is demonstrated through in-depth numerical experiments and through analyses of several prototypical data sets.
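The abstract does not describe the authors' algorithm itself, but the l0-penalized fitting problem it refers to can be illustrated with a standard heuristic. The sketch below (an assumption for illustration, not the paper's method) applies iterative hard thresholding to an l0-constrained least-squares problem: after each gradient step, all but the k largest-magnitude coefficients are set to zero, which is exactly the combinatorial constraint that makes the exact problem NP-hard.

```python
import numpy as np

def iht(X, y, k, step=None, n_iter=200):
    """Iterative hard thresholding for min ||y - X b||^2 s.t. ||b||_0 <= k."""
    n, p = X.shape
    if step is None:
        # step size = 1 / (largest eigenvalue of X'X), a safe gradient step
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        b = b + step * X.T @ (y - X @ b)      # gradient step on squared error
        idx = np.argsort(np.abs(b))[:-k]      # indices of all but the k largest
        b[idx] = 0.0                          # hard-threshold to k nonzeros
    return b

# Toy example: 3 true nonzero coefficients out of 20 predictors
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta = np.zeros(20)
beta[[2, 7, 11]] = [3.0, -2.0, 1.5]
y = X @ beta + 0.1 * rng.standard_normal(100)

b_hat = iht(X, y, k=3)
print(np.nonzero(b_hat)[0])  # estimated support
```

Unlike lasso-type penalties, the hard-thresholding step is non-convex, so the iteration is only a local heuristic; this is precisely the computational difficulty the paper addresses.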





Published In

Computational Statistics & Data Analysis  Volume 192, Issue C
Apr 2024
318 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Shrinkage prior
  2. EM algorithm
  3. Regularized regression
  4. BIC

Qualifiers

  • Research-article

