Low-rank univariate sum of squares has no spurious local minima
Abstract
We study the problem of decomposing a polynomial p into a sum of squares by minimizing a quadratically penalized objective f(u) = ‖p − Σᵢ uᵢ²‖². This objective is nonconvex and is equivalent to the rank-n Burer–Monteiro factorization of a semidefinite program (SDP) encoding the sum of squares decomposition. We show that for all univariate polynomials p, if n ≥ 2, then f has no spurious second-order critical points, showing that all local optima are also global optima. This is in contrast to previous work showing that for general SDPs, in addition to genericity conditions, n has to be roughly the square root of the number of constraints (the degree of p) for there to be no spurious second-order critical points. Our proof uses tools from computational algebraic geometry and can be interpreted as constructing a certificate using the first- and second-order necessary conditions. We also show that by choosing a norm based on sampling equally spaced points on the circle, the gradient of f can be computed in nearly linear time using fast Fourier transforms. Experimentally we demonstrate that this method has very fast convergence using first-order optimization algorithms such as L-BFGS, with near-linear scaling to million-degree polynomials.
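To illustrate the FFT-based gradient mentioned above, here is a minimal sketch in NumPy. It is not the paper's implementation; it assumes the sampled norm is the mean squared magnitude of the residual Σᵢ uᵢ(x)² − p(x) at the N equally spaced points xⱼ = exp(−2πij/N) on the unit circle, and that each uᵢ is stored as a row of real coefficients. All function and variable names (`eval_at_roots`, `objective_and_grad`, `U`, `p`) are illustrative choices, not from the paper.

```python
import numpy as np

def eval_at_roots(coeffs, N):
    """Evaluate a real-coefficient polynomial at the N sample points
    x_j = exp(-2*pi*1j*j/N) via a zero-padded FFT (O(N log N))."""
    padded = np.zeros(N)
    padded[: len(coeffs)] = coeffs
    return np.fft.fft(padded)

def objective_and_grad(U, p, N):
    """Sampled penalized objective
        f(U) = (1/N) * sum_j |sum_i u_i(x_j)^2 - p(x_j)|^2
    and its gradient with respect to the coefficient matrix U
    (shape n x (d+1)).  Total cost is O(n * N log N)."""
    n, m = U.shape
    Uvals = np.array([eval_at_roots(U[i], N) for i in range(n)])  # n x N values
    r = (Uvals ** 2).sum(axis=0) - eval_at_roots(p, N)            # residual at samples
    f = np.mean(np.abs(r) ** 2)
    # df/dU[i, k] = (4/N) * Re( sum_j conj(r_j) * u_i(x_j) * x_j^k ),
    # and since x_j^k = exp(-2*pi*1j*j*k/N), that inner sum is itself an FFT.
    G = np.empty_like(U)
    for i in range(n):
        G[i] = (4.0 / N) * np.real(np.fft.fft(np.conj(r) * Uvals[i])[:m])
    return f, G
```

Because both the objective and gradient reduce to a handful of length-N FFTs, this routine can be handed directly to a first-order solver such as L-BFGS, which is the setting in which the abstract reports near-linear scaling.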