Abstract
This paper investigates sparse optimization problems with a sparse group structure, in which element-wise and group-wise sparsity are considered jointly. This model has proven effective in tasks such as feature selection, parameter estimation, and improving model interpretability. Central to our study is the \(\ell _0\) and \(\ell _{2,0}\) norm regularized model, which poses substantially greater computational challenges than its surrogate formulations. We begin by analyzing the optimality conditions of the sparse group optimization problem, leveraging the notion of a \(\gamma \)-stationary point, whose relationship to local and global minimizers is established. We then develop a novel subspace Newton algorithm for the sparse group \(\ell _0\) optimization problem and prove its global convergence as well as its local second-order convergence rate. Experimental results show that our algorithm outperforms several state-of-the-art solvers in both accuracy and computational efficiency.
Data availability
The data used in Sect. 4.1 are generated randomly by Matlab's built-in generation functions. The original image data in Sect. 4.2 are publicly available from: http://www0.cs.ucl.ac.uk/staff/b.jin/software/gpdasc.zip.
Acknowledgements
The authors would like to express their sincere appreciation to the editor and all reviewers for their valuable comments and suggestions, which have greatly improved this paper. The authors also thank fellow colleagues in the research group for their assistance, and the supervisors for their dedicated guidance. This work was supported by the National Key R&D Program of China (2021YFA1000403), the National Natural Science Foundation of China (Nos. U23B2012, 11991022), and the Fundamental Research Funds for the Central Universities.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Appendix A: Proof of Theorem 2.1
Lemma A.1
For any solution \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma } (z)\), it holds that for all \(i\in {\textrm{supp}}(z)\), either \(x_i=z_i\) or \(x_i=0\), and for all \(i\notin {\textrm{supp}}(z)\), \(x_i=0\).
Proof
Define \( P_z(u):=\lambda \Vert u\Vert _{2,0}+\mu \Vert u\Vert _0+\frac{1}{2\gamma }\Vert u-z\Vert _2^2 \) and let \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma } (z)=\arg \min _u P_z(u)\).
Assume there exists \(i\in {\textrm{supp}}(z)\) such that \(x_i\ne 0\) and \(x_i \ne z_i\). Construct a new vector \(\bar{x}\) by letting \(\bar{x}_i=z_i\ne 0\) and \(\bar{x}_l=x_l\) for \(l\ne i\). Since \(x\) and \(\bar{x}\) then have the same support, it follows immediately that
\( P_z(x)-P_z(\bar{x})=\frac{1}{2\gamma }(x_i-z_i)^2>0. \)
Hence, \(P_z(x) > P_z(\bar{x})\), which contradicts \(x\in \arg \min _u P_z(u)\). Consequently, for any \(i\in {\textrm{supp}}(z)\), either \(x_i=z_i\) or \(x_i=0\).
In a similar manner, assume there exists \(i\notin {\textrm{supp}}(z)\) such that \(x_i\ne 0\). Construct a new vector \(\bar{x}\) by letting \(\bar{x}_i=0=z_i\) and \(\bar{x}_l=x_l\) for \(l\ne i\). Since zeroing out \(x_i\) removes one nonzero element and strictly decreases the quadratic term, it follows immediately that
\( P_z(x)-P_z(\bar{x})\ge \mu +\frac{1}{2\gamma }x_i^2>0. \)
The same contradiction \(P_z(x) > P_z(\bar{x})\) is derived. Hence, for any \(i\notin {\textrm{supp}}(z)\), \(x_i=0\). \(\square \)
Lemma A.2
The set \(\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\), consisting of the solutions in \({\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) with the most nonzero elements, is a singleton.
Proof
Define
\( P_{z_G}(u_{G_i}):=\lambda \Vert u_{G_i}\Vert _{2,0}+\mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2, \)
where \(\Vert u_{G_i}\Vert _{2,0}=\left\{ \begin{aligned}&1,\quad \text{if}\ \Vert u_{G_i}\Vert _2 \ne 0, \\&0,\quad \text{if}\ \Vert u_{G_i}\Vert _2=0, \end{aligned}\right. \) and \(z_{G_i}\) denotes the subvector of \(z\) indexed by \(G_i\), as in the previous notation. We then have
\( P_z(u)=\sum _{i=1}^m\lambda \Vert u_{G_i}\Vert _{2,0}+\sum _{i=1}^m\mu \Vert u_{G_i}\Vert _0+\sum _{i=1}^m\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2=\sum _{i=1}^m P_{z_G}(u_{G_i}), \quad (49) \)
where the first and second equalities hold because \(G_1,\dots ,G_m\) form a non-overlapping group partition, i.e., \(G_i\cap G_j=\emptyset \) for all \(i\ne j\) and \(\bigcup _{i=1}^m G_i=[n]\). Equation (49) implies that we can consider the minimization problem within each group.
Suppose \(x\) and \(x'\) are two different elements of \(\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\). By the definition of \(\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) as the set of solutions in \({\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) with the most nonzero elements, we have \(\Vert x\Vert _0 = \Vert x'\Vert _0\). Together with Lemma A.1, there must exist \(j \in [n]\) such that \(x_j =0\) while \(x'_j=z_j \ne 0\); otherwise \(x=x'\), contradicting \(x\ne x'\).
Construct a new vector \(\bar{x}\) by letting \(\bar{x}_j=x'_j\ne 0\) and \(\bar{x}_l=x_l\) for \(l\ne j\); then \(\Vert \bar{x}\Vert _0>\Vert x\Vert _0\). Suppose element \(j\) belongs to group \(G_i\). Since \(x\) and \(x'\) are solutions to \(P_z(u)\), it follows from (49) that both \(x_{G_i}\) and \(x'_{G_i}\) are solutions to \(\min _{u_{G_i}}P_{z_G}(u_{G_i})\). Again by (49), it can be verified that \(\bar{x}\) is a solution to \(P_z(u)\). But this is a contradiction: we have constructed a solution with more nonzero elements than \(x\) and \(x'\).
Hence, \(\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) has to be a singleton. \(\square \)
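To make the separability (49) concrete, here is a minimal Python sketch (a sanity check, not the paper's code; the function names and the \(\frac{1}{2\gamma }\) scaling follow the definitions above) that evaluates \(P_z(u)\) directly and via the groupwise sum; the two values agree for any non-overlapping partition.

```python
import numpy as np

def P_z(u, z, groups, lam, mu, gamma):
    """Objective lam*||u||_{2,0} + mu*||u||_0 + (1/(2*gamma))*||u - z||_2^2."""
    l20 = sum(1 for G in groups if np.linalg.norm(u[G]) != 0)  # group-level l_{2,0} term
    return lam * l20 + mu * np.count_nonzero(u) + np.sum((u - z) ** 2) / (2.0 * gamma)

def P_z_groupwise(u, z, groups, lam, mu, gamma):
    """The same objective computed group by group, as in (49)."""
    total = 0.0
    for G in groups:
        l20 = 1.0 if np.linalg.norm(u[G]) != 0 else 0.0
        total += lam * l20 + mu * np.count_nonzero(u[G]) \
                 + np.sum((u[G] - z[G]) ** 2) / (2.0 * gamma)
    return total
```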
Lemma A.3
Given \(z\in \mathbb {R}^d\), let \(y=\textbf{H}(z,\sqrt{2\gamma \mu })\). Then y has the most nonzero elements among all the solutions to \(\min _{u\in \mathbb {R}^d}\{\mu \Vert u\Vert _0+\frac{1}{2\gamma }\Vert u-z\Vert _2^2\}.\)
Proof
Observing that
\( \min _{u\in \mathbb {R}^d}\Big \{\mu \Vert u\Vert _0+\frac{1}{2\gamma }\Vert u-z\Vert _2^2\Big \}=\sum _{j=1}^d\min _{u_j}\Big \{\mu \Vert u_j\Vert _0+\frac{1}{2\gamma }(u_j-z_j)^2\Big \}, \)
any solution \(u^*\in \arg \min _{u\in \mathbb {R}^d}\{\mu \Vert u\Vert _0+\frac{1}{2\gamma }\Vert u-z\Vert _2^2\}\) has the form:
\( u_j^*=\left\{ \begin{aligned}&z_j,\quad&\text{if}\ |z_j|>\sqrt{2\gamma \mu }, \\&z_j\ \text{or}\ 0,\quad&\text{if}\ |z_j|=\sqrt{2\gamma \mu }, \\&0,\quad&\text{if}\ |z_j|<\sqrt{2\gamma \mu }. \end{aligned}\right. \)
From the definition of the hard-thresholding operator (4), which keeps the tie entries \(|z_j|=\sqrt{2\gamma \mu }\), the conclusion of the lemma follows. \(\square \)
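As an illustration of Lemma A.3, here is a minimal sketch of the hard-thresholding step, assuming (as the lemma requires) that \(\textbf{H}(z,\tau )\) keeps the tie entries \(|z_j|=\tau \); the function name is ours, for illustration.

```python
import numpy as np

def hard_threshold(z, tau):
    """H(z, tau): keep z_j when |z_j| >= tau, set it to 0 otherwise.

    Using >= (rather than >) keeps the tie entries |z_j| = tau, which by
    Lemma A.3 yields the solution of min mu*||u||_0 + (1/(2*gamma))*||u - z||_2^2
    with the most nonzero elements (here tau = sqrt(2*gamma*mu)).
    """
    return np.where(np.abs(z) >= tau, z, 0.0)
```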
Lemma A.4
Let \(x\) be computed through the two-step projection (5a, 5b); then \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\).
Proof
We prove the lemma within each group \(G_i\). From the definition of the hard-thresholding operator, \(y_{G_i}=\textbf{H}(z_{G_i},\sqrt{2\gamma \mu })\) is a solution to
\( \min _{u_{G_i}}\Big \{\mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2\Big \}. \quad (50) \)
(a). Suppose there exists \(j\in G_i\) such that \(|z_{G_i}|_j\ge \sqrt{2\gamma \mu }\). Then \(y_{G_i}\) is a nonzero vector, and the following equations hold:
\( \min _{u_{G_i}\ne \textbf{0}}P_{z_G}(u_{G_i})=\lambda +\min _{u_{G_i}}\Big \{\mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2\Big \}=P_{z_G}(y_{G_i}). \)
Further, define \(\widetilde{G}_i:={\textrm{supp}}(y_{{G}_i})\ne \emptyset \) and \(\widetilde{G}_i^o:=G_i\backslash \widetilde{G}_i\); it follows that \(y_{\widetilde{G}_i}=z_{\widetilde{G}_i}\) and \(y_{\widetilde{G}_i^o}=\textbf{0}\). We then obtain
\( P_{z_G}(y_{G_i})-P_{z_G}(\textbf{0})=\lambda +\mu \Vert y_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert z_{\widetilde{G}_i^o}\Vert _2^2-\frac{1}{2\gamma }\Vert z_{G_i}\Vert _2^2=\lambda +\mu \Vert y_{G_i}\Vert _0-\frac{1}{2\gamma }\Vert y_{G_i}\Vert _2^2, \quad (52) \)
where the last equality holds because \(\widetilde{G}_i\) captures all nonzero elements of \(y_{G_i}\), so that \(\Vert z_{G_i}\Vert _2^2-\Vert z_{\widetilde{G}_i^o}\Vert _2^2=\Vert z_{\widetilde{G}_i}\Vert _2^2=\Vert y_{G_i}\Vert _2^2\).
(a-1). If \(\Vert y_{G_i}\Vert _2\ge \sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)}\), then (5b) sets \(x_{G_i}=y_{G_i}\), and the quantity in (52) is nonpositive, i.e., \(P_{z_G}(y_{G_i})\le P_{z_G}(\textbf{0})\). It follows that \(x_{G_i}\) is a solution to \(\min _{u_{G_i}}P_{z_G}(u_{G_i})\). Hence, combining (49), it can be deduced that \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z).\)
(a-2). If \(\Vert y_{G_i}\Vert _2<\sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)}\), then (5b) sets \(x_{G_i}=\textbf{0}\), and we have from (52) that
\( P_{z_G}(\textbf{0})<P_{z_G}(y_{G_i})=\min _{u_{G_i}\ne \textbf{0}}P_{z_G}(u_{G_i}), \quad (53) \)
thus \(\textbf{0}\) is a solution to \(\min _{u_{G_i}}P_{z_G}(u_{G_i})\). Combining (49), it can be deduced that \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z).\)
(b). If \(|z_{G_i}|_j < \sqrt{2\gamma \mu }\) for all \(j \in G_i\), then \(y_{G_i}=\textbf{0}\). It follows that \(\sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)} > \Vert y_{G_i}\Vert _2=0\). Hence, (5b) sets \(x_{G_i}=\textbf{0}\). Besides, for any \(u_{G_i}\ne \textbf{0}\) we have
\( P_{z_G}(u_{G_i})=\lambda +\mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2>\lambda +\frac{1}{2\gamma }\Vert z_{G_i}\Vert _2^2>\frac{1}{2\gamma }\Vert z_{G_i}\Vert _2^2=P_{z_G}(\textbf{0}), \quad (54) \)
where the first inequality holds because the only solution to (50) is the zero vector. Hence, \(x_{G_i}=\textbf{0}\in \arg \min P_{z_G}(u_{G_i})\). As a conclusion, \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z).\)
In summary, the two-step projection finds an element of \({\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z).\) \(\square \)
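Putting the two steps together, here is a minimal sketch of the two-step projection (5a, 5b), reusing hard_threshold from the sketch above; groups are index arrays forming a non-overlapping partition, and the names are illustrative rather than from the paper's code.

```python
import numpy as np

def two_step_projection(z, groups, lam, mu, gamma):
    """Return an element of PROX_{lam,mu}^{gamma}(z) via (5a)-(5b)."""
    # (5a): elementwise hard thresholding with threshold sqrt(2*gamma*mu).
    y = hard_threshold(z, np.sqrt(2.0 * gamma * mu))
    # (5b): keep group G_i only if ||y_{G_i}||_2 >= sqrt(2*gamma*(lam + mu*||y_{G_i}||_0)),
    # matching cases (a-1)/(a-2) above; otherwise the whole group is set to zero.
    x = np.zeros_like(z)
    for G in groups:
        yG = y[G]
        if np.linalg.norm(yG) >= np.sqrt(2.0 * gamma * (lam + mu * np.count_nonzero(yG))):
            x[G] = yG
    return x
```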
We now prove Theorem 2.1.
Proof of Theorem 2.1
For the first part of the theorem, \(x={\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) implies \({\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) is a singleton. We need only prove that \(x\) can be calculated from the two-step projection. Let \(x'\) satisfy (5a, 5b); Lemma A.4 guarantees \(x'\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\). Hence, \(x=x'\), since \({\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) is a singleton.
For the second part of the theorem, let \(x\) be calculated from the two-step projection. It has already been proved that \(\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) is a singleton and \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\). All that remains is to prove that \(x\) has the most nonzero elements. Equivalently, letting \(\bar{x}=\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\), we need only show \({\textrm{supp}}(x)\supseteq {\textrm{supp}}(\bar{x})\). According to (49), it is necessary and sufficient to prove that \({\textrm{supp}}(x_{G_i})\supseteq {\textrm{supp}}(\bar{x}_{G_i})\) within each group \(G_i\).
Case 1. \(x_{G_i}\ne \textbf{0}\). We need only consider the case \(\bar{x}_{G_i}\ne \textbf{0}\); otherwise \({\textrm{supp}}(x_{G_i})\supseteq {\textrm{supp}}(\bar{x}_{G_i})\) holds trivially. Since both \(x_{G_i}\) and \(\bar{x}_{G_i}\) are nonzero and minimize \(P_{z_G}(u_{G_i})\), we obtain from
\( \min _{u_{G_i}\ne \textbf{0}}P_{z_G}(u_{G_i})=\lambda +\min _{u_{G_i}}\Big \{\mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2\Big \} \)
that both \(x_{G_i}\) and \(\bar{x}_{G_i}\) are solutions to the minimization on the right-hand side of the above equation. Also note that \(x_{G_i}\ne \textbf{0}\) implies (5b) does not zero out the group; therefore \(x_{G_i}=y_{G_i}=\textbf{H}(z_{G_i},\sqrt{2\gamma \mu }).\) Utilizing Lemma A.3, it follows that \({\textrm{supp}}(x_{G_i})\supseteq {\textrm{supp}}(\bar{x}_{G_i}).\)
Case 2. \(x_{G_i} = \textbf{0}\). We claim \(\bar{x}_{G_i}= \textbf{0}\) and prove this by contradiction: suppose \(\bar{x}_{G_i}\ne \textbf{0}\). From (5a, 5b), we know that either case (2a): \(|z_{G_i}|_j < \sqrt{2\gamma \mu }\) for all \(j\in G_i\), or case (2b): \(\Vert y_{G_i}\Vert _2 < \sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)}\) holds. However, case (2a) implies (54) holds, which contradicts \(\textbf{0}\ne \bar{x}_{G_i}\in \arg \min _{u_{G_i}} P_{z_G}(u_{G_i})\), since the zero vector is then the only solution to \(\min _{u_{G_i}} P_{z_G}(u_{G_i})\); case (2b) implies (53) holds, which yields the same contradiction as case (2a).
In conclusion, \({\textrm{supp}}(x_{G_i})\supseteq {\textrm{supp}}(\bar{x}_{G_i})\) holds for each group, and thus \({\textrm{supp}}(x)\supseteq {\textrm{supp}}(\bar{x})\). The second part of the theorem is proved. \(\square \)
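Theorem 2.1 can also be checked numerically on small instances: by Lemma A.1 every solution takes values in \(\{0, z_i\}\) coordinatewise, so the proximal set can be enumerated exhaustively. Here is a sketch reusing P_z and two_step_projection from the earlier sketches; the instance below is ours, for illustration only.

```python
import itertools
import numpy as np

def prox_bruteforce(z, groups, lam, mu, gamma):
    """Enumerate all u with u_i in {0, z_i} (Lemma A.1); return the minimizers of P_z."""
    best, best_val = [], np.inf
    for mask in itertools.product([0.0, 1.0], repeat=len(z)):
        u = z * np.array(mask)
        val = P_z(u, z, groups, lam, mu, gamma)
        if val < best_val - 1e-12:
            best, best_val = [u], val
        elif abs(val - best_val) <= 1e-12:
            best.append(u)
    return best

# Tiny demo: the two-step projection lands in the proximal set and has the
# largest support among its elements, as Theorem 2.1 asserts.
z = np.array([1.5, 0.2, -0.8, 0.05])
groups = [np.array([0, 1]), np.array([2, 3])]
x = two_step_projection(z, groups, lam=0.3, mu=0.1, gamma=1.0)
sols = prox_bruteforce(z, groups, lam=0.3, mu=0.1, gamma=1.0)
assert any(np.allclose(x, s) for s in sols)
assert np.count_nonzero(x) == max(np.count_nonzero(s) for s in sols)
```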
Keywords
- Sparse group \(\ell _0\) optimization
- Subspace Newton method
- Global convergence
- Second-order convergence rate