- research-article, March 2024
Global convergence of sub-gradient method for robust matrix recovery: small initialization, noisy measurements, and over-parameterization
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No. 96, Pages 4434–4517
In this work, we study the performance of the sub-gradient method (SubGM) on a natural nonconvex and nonsmooth formulation of low-rank matrix recovery with ℓ1-loss, where the goal is to recover a low-rank matrix from a limited number of measurements, a ...
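As a concrete illustration of the update this entry analyzes, here is a minimal NumPy sketch of SubGM on a synthetic ℓ1-loss recovery instance; the problem sizes, the Gaussian measurement model, and the geometric stepsize decay are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 400                            # assumed problem sizes
U_star = rng.standard_normal((n, r))
M_star = U_star @ U_star.T                      # ground-truth low-rank PSD matrix
A = rng.standard_normal((m, n, n))              # random measurement matrices
y = np.einsum('mij,ij->m', A, M_star)           # noiseless measurements <A_i, M*>

U = 1e-3 * rng.standard_normal((n, r))          # small random initialization
step = 1e-3
for t in range(2000):
    R = np.einsum('mij,ij->m', A, U @ U.T) - y       # residuals r_i = <A_i, UU^T> - y_i
    S = np.einsum('m,mij->ij', np.sign(R), A) / m    # subgradient of (1/m) sum |r_i| in M
    U -= step * (S + S.T) @ U                        # chain rule through M = U U^T
    step *= 0.999                                    # geometrically decaying stepsize

print(np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star))
```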
- research-article, March 2024
On learning rates and Schrödinger operators
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No. 379, Pages 18153–18205
Understanding the iterative behavior of stochastic optimization algorithms for minimizing nonconvex functions remains a crucial challenge in demystifying deep learning. In particular, it is not yet understood why certain simple techniques are remarkably ...
- research-article, March 2024
Preconditioned gradient descent for overparameterized nonconvex Burer–Monteiro factorization with global optimality certification
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No. 163, Pages 7781–7835
We consider using gradient descent to minimize the nonconvex function f(X) = ϕ(XXᵀ) over an n × r factor matrix X, in which ϕ is an underlying smooth convex cost function defined over n × n matrices. While only a second-order stationary point X can be ...
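The factored update in this entry admits a short sketch. The choice ϕ(M) = ‖M − M*‖²_F/2, the damping θ, and the stepsize below are assumptions for illustration; the right-preconditioned step X ← X − η ∇f(X)(XᵀX + θI)⁻¹ is one common form of preconditioning for this factorization, not necessarily the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 30, 3
Z = rng.standard_normal((n, r))
M_star = Z @ Z.T                               # target: phi(M) = ||M - M*||_F^2 / 2

X = rng.standard_normal((n, 2 * r))            # over-parameterized factor (rank 2r)
eta, theta = 0.1, 1e-6
for _ in range(500):
    G = 2 * (X @ X.T - M_star) @ X             # gradient of phi(X X^T) in X
    P = X.T @ X + theta * np.eye(X.shape[1])   # regularized right preconditioner
    X -= eta * np.linalg.solve(P, G.T).T       # X <- X - eta * G P^{-1}

print(np.linalg.norm(X @ X.T - M_star))
```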
- research-article, March 2024
Restarted nonconvex accelerated gradient descent: no more polylogarithmic factor in the O(ε^{-7/4}) complexity
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No. 157, Pages 7459–7495
This paper studies accelerated gradient methods for nonconvex optimization with Lipschitz continuous gradient and Hessian. We propose two simple accelerated gradient methods, restarted accelerated gradient descent (AGD) and restarted heavy ball (HB) ...
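A minimal sketch of restarted heavy ball, assuming a fixed restart period (the paper's restart condition is more refined) and a toy nonconvex objective:

```python
import numpy as np

def restarted_hb(grad, x0, lr=0.01, beta=0.9, restart_every=50, steps=1000):
    """Heavy-ball iterations with momentum reset on a fixed schedule."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for t in range(steps):
        if t % restart_every == 0:
            v[:] = 0.0                         # restart: discard accumulated momentum
        v = beta * v - lr * grad(x)
        x = x + v
    return x

grad_f = lambda x: 4 * x**3 - 4 * x            # gradient of f(x) = (x^2 - 1)^2, nonconvex
print(restarted_hb(grad_f, np.array([2.0, -0.5])))
```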
- research-article, March 2024
A line-search descent algorithm for strict saddle functions with complexity guarantees
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No. 10, Pages 307–340
We describe a line-search algorithm which achieves the best-known worst-case complexity results for problems with a certain "strict saddle" property that has been observed to hold in low-rank matrix optimization problems. Our algorithm is adaptive, in ...
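The descent/line-search component of such a method can be sketched as plain gradient descent with Armijo backtracking; the strict-saddle machinery (negative-curvature steps) is omitted here, and the test function is an assumption:

```python
import numpy as np

def armijo_descent(f, grad, x0, alpha0=1.0, c=1e-4, rho=0.5, tol=1e-8, iters=500):
    """Gradient descent with Armijo backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        a = alpha0
        while f(x - a * g) > f(x) - c * a * (g @ g):  # sufficient-decrease test
            a *= rho
        x = x - a * g
    return x

f = lambda x: (x[0]**2 - 1)**2 + x[1]**2             # nonconvex toy with a saddle at 0
grad = lambda x: np.array([4 * x[0] * (x[0]**2 - 1), 2 * x[1]])
print(armijo_descent(f, grad, np.array([1.5, 0.7])))
```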
- research-article, January 2022
Simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No. 239, Pages 10891–10951
We propose and analyze several stochastic gradient algorithms for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer. First, we propose a simple proximal stochastic ...
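A minimal proximal stochastic gradient sketch for a nonsmooth-regularized finite sum, with an ℓ1 regularizer and toy least-squares data as assumptions (the paper's algorithms add refinements beyond this baseline):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_sgd(grad_i, n_samples, x0, lam=0.1, lr=0.05, epochs=30, seed=0):
    """Proximal SGD for min_x (1/n) sum_i f_i(x) + lam * ||x||_1."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            x = soft_threshold(x - lr * grad_i(x, i), lr * lam)  # step, then prox
    return x

rng = np.random.default_rng(1)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
print(prox_sgd(grad_i, 50, np.zeros(10)))
```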
- research-article, January 2022
Nonconvex matrix completion with linearly parameterized factors
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No. 207, Pages 9359–9393
Techniques of matrix completion aim to impute a large portion of missing entries in a data matrix through a small portion of observed ones. In practice, prior information and special structures are usually employed in order to improve the accuracy of ...
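For context, the unparameterized baseline is plain factored gradient descent on the observed entries, sketched below; the sizes, sampling rate, and stepsize are assumptions, and the paper's linear parameterization of the factors is not modeled:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, r = 40, 30, 2
M = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))  # ground truth
mask = rng.random((n1, n2)) < 0.4                                # observed entries

U = 0.1 * rng.standard_normal((n1, r))
V = 0.1 * rng.standard_normal((n2, r))
lr = 0.05
for _ in range(2000):
    R = mask * (U @ V.T - M)                    # residual on observed entries only
    U, V = U - lr * R @ V, V - lr * R.T @ U     # gradient step on both factors

print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```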
- research-article, January 2022
Distributed stochastic gradient descent: nonconvexity, nonsmoothness, and convergence to local minima
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No. 328, Pages 14751–14812
Gradient-descent (GD) based algorithms are an indispensable tool for optimizing modern machine learning models. The paper considers distributed stochastic GD (D-SGD), a network-based variant of GD. Distributed algorithms play an important role in large-...
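A minimal D-SGD sketch on a ring network, assuming local quadratic objectives and a doubly stochastic mixing matrix (both illustrative choices, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(2)
n_agents, d = 5, 4
C = rng.standard_normal((n_agents, d))      # local objectives f_i(x) = ||x - c_i||^2 / 2

# doubly stochastic mixing matrix for a ring network
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

X = rng.standard_normal((n_agents, d))      # one local iterate per agent
for t in range(500):
    lr = 1.0 / (t + 10)                     # decaying stepsize
    noise = 0.1 * rng.standard_normal(X.shape)
    grads = (X - C) + noise                 # noisy local gradients
    X = W @ X - lr * grads                  # mix with neighbors, then descend

print(np.abs(X - C.mean(axis=0)).max())     # all agents near the consensus optimum
```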
- research-article, January 2022
On constraints in first-order optimization: a view from non-smooth dynamical systems
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No. 256, Pages 11681–11727
We introduce a class of first-order methods for smooth constrained optimization that are based on an analogy to non-smooth dynamical systems. Two distinctive features of our approach are that (i) projections or optimizations over the entire feasible set ...
- research-article, June 2021
Greedy adversarial equilibrium: an efficient alternative to nonconvex-nonconcave min-max optimization
STOC 2021: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, Pages 896–909. https://doi.org/10.1145/3406325.3451097
Min-max optimization of an objective function f: ℝ^d × ℝ^d → ℝ is an important model for robustness in an adversarial setting, with applications to many areas including optimization, economics, and deep learning. In many applications f may be nonconvex-...
- research-article, January 2021
Optimization with momentum: dynamical, control-theoretic, and symplectic perspectives
The Journal of Machine Learning Research (JMLR), Volume 22, Issue 1, Article No. 73, Pages 3407–3456
We analyze the convergence rate of various momentum-based optimization algorithms from a dynamical systems point of view. Our analysis exploits fundamental topological properties, such as the continuous dependence of iterates on their initial conditions, ...
- research-article, August 2020
A Block Decomposition Algorithm for Sparse Optimization
KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Pages 275–285. https://doi.org/10.1145/3394486.3403070
Sparse optimization is a central problem in machine learning and computer vision. However, this problem is inherently NP-hard and thus difficult to solve in general. Combinatorial search methods find the globally optimal solution but are confined to small-...
- poster, July 2020
Continuous optimization by hierarchical Gaussian mixture with clustering embedded resource allocation
GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, Pages 191–192. https://doi.org/10.1145/3377929.3390019
Continuous optimization problems, which are highly related to real-world problems, are difficult since they are multimodal and rugged. In this paper, we introduce a new algorithm called Global and Local Adaptation with Clustering Embedded Resource ...
- research-article, January 2020
AdaGrad stepsizes: sharp convergence over nonconvex landscapes
The Journal of Machine Learning Research (JMLR), Volume 21, Issue 1, Article No. 219, Pages 9047–9076
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use in large-scale optimization for their ...
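The AdaGrad stepsize rule this entry refers to is the standard one below; the test function and hyperparameters are assumptions:

```python
import numpy as np

def adagrad(grad, x0, lr=0.1, eps=1e-8, steps=500):
    """AdaGrad: per-coordinate stepsizes from accumulated squared gradients."""
    x = np.asarray(x0, dtype=float)
    g2_sum = np.zeros_like(x)                     # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        g2_sum += g * g
        x = x - lr * g / (np.sqrt(g2_sum) + eps)  # coordinate-wise adaptive step
    return x

grad_f = lambda x: 2 * x + np.cos(x)              # gradient of ||x||^2 + sum(sin(x))
print(adagrad(grad_f, x0=np.ones(3)))
```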
- research-article, January 2020
On the theoretical guarantees for parameter estimation of Gaussian random field models: a sparse precision matrix approach
The Journal of Machine Learning Research (JMLR), Volume 21, Issue 1, Article No. 217, Pages 8962–9002
Iterative methods for fitting a Gaussian Random Field (GRF) model via maximum likelihood (ML) estimation require solving a nonconvex optimization problem. The problem is aggravated for anisotropic GRFs, where the number of covariance function parameters ...
- research-article, January 2020
Stochastic nested variance reduction for nonconvex optimization
The Journal of Machine Learning Research (JMLR), Volume 21, Issue 1, Article No. 103, Pages 4130–4192
We study nonconvex optimization problems, where the objective function is either an average of n nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, ...
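The paper nests several levels of variance reduction; the one-level (SVRG-style) estimator underlying such schemes can be sketched as follows, with toy least-squares data as an assumption:

```python
import numpy as np

def svrg(grad_full, grad_i, n, x0, lr=0.05, outer=20, inner=50, seed=0):
    """One-level variance reduction (SVRG); nested schemes stack further levels."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(outer):
        x_ref = x.copy()
        mu = grad_full(x_ref)                          # full gradient at reference point
        for _ in range(inner):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_ref, i) + mu   # variance-reduced estimator
            x -= lr * v
    return x

rng = np.random.default_rng(1)
A, b = rng.standard_normal((100, 10)), rng.standard_normal(100)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
grad_full = lambda x: A.T @ (A @ x - b) / len(b)
print(svrg(grad_full, grad_i, 100, np.zeros(10)))
```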
- research-article, January 2020
Estimation of a low-rank topic-based model for information cascades
The Journal of Machine Learning Research (JMLR), Volume 21, Issue 1, Article No. 71, Pages 2721–2767
We consider the problem of estimating the latent structure of a social network based on the observed information diffusion events, or cascades, where the observations for a given cascade consist of only the timestamps of infection for infected nodes but ...
- research-article, January 2020
Unique sharp local minimum in ℓ1-minimization complete dictionary learning
The Journal of Machine Learning Research (JMLR), Volume 21, Issue 1, Article No. 63, Pages 2354–2405
We study the problem of globally recovering a dictionary from a set of signals via ℓ1-minimization. We assume that the signals are generated as i.i.d. random linear combinations of the K atoms from a complete reference dictionary D* ∈ ℝ^{K×K}, where the ...
- research-article, October 2019
Analysis of Industrial Internet Firm Size Based on Transaction Efficiency
ICCSE'19: Proceedings of the 4th International Conference on Crowd Science and Engineering, Pages 103–109. https://doi.org/10.1145/3371238.3371255
The Industrial Internet is booming and playing an important role in industry. This paper focuses on Industrial Internet firm size and treats transaction efficiency as the driver of changes in firm size. Through building a decision model of the ...
- research-article, July 2019
Estimating Cellular Goals from High-Dimensional Biological Data
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Pages 2202–2211. https://doi.org/10.1145/3292500.3330775
Optimization-based models have been used to predict cellular behavior for over 25 years. The constraints in these models are derived from genome annotations, measured macromolecular composition of cells, and by measuring the cell's growth rate and ...
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningPages 2202–2211https://doi.org/10.1145/3292500.3330775Optimization-based models have been used to predict cellular behavior for over 25 years. The constraints in these models are derived from genome annotations, measured macromolecular composition of cells, and by measuring the cell's growth rate and ...