
Scalable training of L1-regularized log-linear models

Published: 20 June 2007
DOI: 10.1145/1273496.1273501

Abstract

The L-BFGS limited-memory quasi-Newton method is the algorithm of choice for optimizing the parameters of large-scale log-linear models with L2 regularization, but it cannot be used for an L1-regularized loss, because that objective is non-differentiable wherever some parameter is zero. Efficient algorithms have been proposed for this task, but they are impractical when the number of parameters is very large. We present an algorithm, Orthant-Wise Limited-memory Quasi-Newton (OWL-QN), based on L-BFGS, that can efficiently optimize the L1-regularized log-likelihood of log-linear models with millions of parameters. In our experiments on a parse reranking task, our algorithm was several orders of magnitude faster than an alternative algorithm, and substantially faster than L-BFGS on the analogous L2-regularized problem. We also present a proof that OWL-QN is guaranteed to converge to a globally optimal parameter vector.
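The two ideas the abstract alludes to are a pseudo-gradient that stands in for the gradient at points where the L1 term is non-differentiable, and a projection that confines each step to a single orthant, inside which the objective is differentiable. The Python sketch below is a rough illustration only: it substitutes plain steepest descent for the paper's L-BFGS direction, and every name in it (pseudo_gradient, owlqn_sketch, the demo data) is illustrative rather than taken from the paper or any library.

```python
import numpy as np

def pseudo_gradient(x, grad_loss, C):
    """Pseudo-gradient of f(x) = loss(x) + C * ||x||_1.

    Where x_i != 0 the L1 term is differentiable; at x_i == 0 take the
    one-sided partial derivative that points downhill, or 0 if neither does.
    """
    pg = np.where(x > 0, grad_loss + C,
                  np.where(x < 0, grad_loss - C, 0.0))
    zero = (x == 0)
    right = grad_loss[zero] + C   # right partial derivative at 0
    left = grad_loss[zero] - C    # left partial derivative at 0
    pg_zero = np.zeros(right.shape)
    pg_zero[right < 0] = right[right < 0]
    pg_zero[left > 0] = left[left > 0]
    pg[zero] = pg_zero
    return pg

def project(v, orthant):
    """Zero out coordinates of v that left the chosen orthant."""
    return np.where(np.sign(v) == np.sign(orthant), v, 0.0)

def owlqn_sketch(loss_and_grad, x0, C, step0=1.0, max_iter=500, tol=1e-8):
    """Skeleton of OWL-QN with a steepest-descent direction (no L-BFGS)."""
    x = x0.copy()
    for _ in range(max_iter):
        loss, g = loss_and_grad(x)
        pg = pseudo_gradient(x, g, C)
        if np.linalg.norm(pg, np.inf) < tol:
            break
        fx = loss + C * np.abs(x).sum()
        # Orthant for this step: sign of x, or sign of -pg where x is 0.
        orthant = np.where(x != 0, np.sign(x), -np.sign(pg))
        # Backtracking line search along -pg, projecting each trial point
        # back onto the orthant: coordinates that cross zero stick at zero.
        step = step0
        while step > 1e-12:
            x_new = project(x - step * pg, orthant)
            if loss_and_grad(x_new)[0] + C * np.abs(x_new).sum() < fx:
                x = x_new
                break
            step *= 0.5
        else:
            break  # no improving step found; stop
    return x

# Tiny L1-regularized least-squares demo; A, b, and the true weights
# are made up for illustration.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
b = A @ np.array([2.0, -3.0] + [0.0] * 8) + 0.01 * rng.normal(size=50)

def loss_and_grad(x):
    r = A @ x - b
    return 0.5 * r @ r, A.T @ r

x_hat = owlqn_sketch(loss_and_grad, np.zeros(10), C=1.0)
print(np.round(x_hat, 3))  # most coordinates come out exactly 0
```

In the published algorithm the search direction instead comes from the L-BFGS two-loop recursion applied to the pseudo-gradient, with the curvature information built from the loss alone; that is what makes the method practical at millions of parameters.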




      Published In

      ICML '07: Proceedings of the 24th international conference on Machine learning
      June 2007
      1233 pages
ISBN: 9781595937933
DOI: 10.1145/1273496

      Sponsors

      • Machine Learning Journal

      Publisher

      Association for Computing Machinery

      New York, NY, United States



Conference

ICML '07 & ILP '07

      Acceptance Rates

      Overall Acceptance Rate 140 of 548 submissions, 26%


