Alternating linearization for structured regularization problems

Published: 01 January 2014

Abstract

We adapt the alternating linearization method for proximal decomposition to structured regularization problems, in particular to generalized lasso problems. The method is related to two well-known operator splitting methods, the Douglas-Rachford and Peaceman-Rachford methods, but it has descent properties with respect to the objective function. This is achieved by employing a special update test, which decides whether it is beneficial to make a Peaceman-Rachford step, either of the two possible Douglas-Rachford steps, or none. The convergence mechanism of the method is related to that of bundle methods of nonsmooth optimization. We also discuss implementation for very large problems, with the use of specialized algorithms and sparse data structures. Finally, we present numerical results for several synthetic and real-world examples, including a three-dimensional fused lasso problem, which illustrate the scalability, efficacy, and accuracy of the method.
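
The abstract refers to the Douglas-Rachford and Peaceman-Rachford operator splitting methods and to generalized lasso problems of the form min_β ½‖y − Xβ‖² + λ‖Dβ‖₁. For orientation only, the sketch below shows a plain Douglas-Rachford iteration for minimizing a sum f(x) + h(x) of two convex functions through their proximal operators; it is not the paper's algorithm (in particular, it omits the descent-based update test and the penalty matrix D), and the helper names douglas_rachford, prox_f, prox_h, and soft_threshold are hypothetical.

import numpy as np

def douglas_rachford(prox_f, prox_h, z0, n_iter=200):
    # Schematic Douglas-Rachford splitting for min_x f(x) + h(x),
    # given the proximal operators of f and h (unit step length).
    z = z0.copy()
    for _ in range(n_iter):
        x = prox_f(z)            # proximal step on f
        y = prox_h(2.0 * x - z)  # proximal step on h at the reflected point
        z = z + (y - x)          # Douglas-Rachford averaging update;
                                 # a Peaceman-Rachford step would use z + 2.0 * (y - x)
    return prox_f(z)             # approximate minimizer of f + h

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Toy instance with D = I: f(x) = 0.5 * ||x - y_obs||^2 and h(x) = lam * ||x||_1.
y_obs = np.array([3.0, -0.5, 2.0, 0.1])
lam = 1.0
prox_f = lambda v: 0.5 * (v + y_obs)       # prox of 0.5 * ||. - y_obs||^2 at unit step
prox_h = lambda v: soft_threshold(v, lam)  # prox of lam * ||.||_1 at unit step
x_hat = douglas_rachford(prox_f, prox_h, np.zeros_like(y_obs))
# x_hat approaches soft_threshold(y_obs, lam), i.e. [2, 0, 1, 0].

For the full generalized lasso with a nontrivial D, the proximal step for λ‖D·‖₁ is itself a structured subproblem, which is where the specialized algorithms and sparse data structures mentioned in the abstract come in.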


Cited By

  • (2021) An outer–inner linearization method for non-convex and nondifferentiable composite regularization problems. Journal of Global Optimization, 81(1):179-202. Online publication date: 1-Sep-2021. DOI: 10.1007/s10898-021-01054-7
  • (2019) A parallelizable augmented Lagrangian method applied to large-scale non-convex-constrained optimization problems. Mathematical Programming: Series A and B, 175(1-2):503-536. Online publication date: 17-May-2019. DOI: 10.1007/s10107-018-1253-9
  • (2018) Robust multicategory support vector machines using difference convex algorithm. Mathematical Programming: Series A and B, 169(1):277-305. Online publication date: 1-May-2018. DOI: 10.1007/s10107-017-1209-5

Published In

The Journal of Machine Learning Research, Volume 15, Issue 1
January 2014
4085 pages
ISSN:1532-4435
EISSN:1533-7928
  • Editors: Kevin Murphy, Bernhard Schölkopf

Publisher

JMLR.org

Publication History

Revised: 01 May 2014
Published: 01 January 2014
Published in JMLR Volume 15, Issue 1

Author Tags

  1. fused lasso
  2. lasso
  3. nonsmooth optimization
  4. operator splitting

Qualifiers

  • Article
