Sharp Guarantees and Optimal Performance for Inference in Binary and Gaussian-Mixture Models
Figure 1. (a) Comparison between theoretical (solid lines) and empirical (markers) performance of least-squares (LS) and least-absolute deviations (LAD), as predicted by Theorem 1, and the optimal performance, as predicted by the upper bound of Theorem 2, for the signed model. Squares and circles denote the empirical performance for Gaussian and Rademacher features, respectively. (b) Illustrations of the optimal loss functions for the signed model for different values of δ, according to Theorem 3.
Figure 2. Theoretical (solid lines) and empirical (markers) classification risk in the GMM, as in Theorem 4 and (39), for the LS, LAD and logistic loss functions as a function of δ for r = 1. The vertical line marks the threshold δ⋆ ≈ 3.7, evaluated by (36). The logistic loss yields an unbounded solution if and only if δ < δ⋆.
Figure 3. (a) Comparison between analytical and empirical results for the performance of the LS, logistic, hinge and optimal loss functions for the logistic model. The vertical dashed line marks δ_f⋆ ≈ 2.275, evaluated by (35). (b) Illustrations of the optimal loss functions for different values of δ, derived according to Theorem 3 for the logistic model. To highlight the similarity of the optimal loss to the LS loss, the optimal loss functions (hardly visible) are scaled so that ℓ(1) = 0 and ℓ(2) = 1.
Figure 4. (a) Comparison between analytical and empirical results for the performance of the LS, hinge and optimal loss functions for the Probit model. The vertical dashed line marks δ_f⋆ ≈ 2.699, evaluated by (35). (b) Illustrations of the optimal loss functions for different values of δ, derived according to Theorem 3 for the Probit model. To highlight the similarity of the optimal loss to the LS loss, the optimal loss functions (hardly visible) are scaled so that ℓ(1) = 0 and ℓ(2) = 1.
Figure A1. The value of κ(σ) as in Theorem 2 for various measurement models. Since κ(σ) is a monotonic function of σ, the solution of κ(σ) = 1/δ determines the minimum possible value of σ.
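Since κ(σ) is monotone (as the caption for Figure A1 notes), the calibration equation κ(σ) = 1/δ can be solved by simple bisection. A minimal sketch, using a hypothetical decreasing κ as a stand-in for the model-dependent κ of Theorem 2:

```python
import math

def kappa(sigma):
    # Hypothetical, strictly decreasing stand-in for the kappa(sigma)
    # of Theorem 2 (the true kappa depends on the measurement model).
    return 1.0 / (1.0 + sigma ** 2)

def solve_sigma(delta, lo=0.0, hi=1e6, tol=1e-10):
    """Find the root of kappa(sigma) = 1/delta by bisection,
    assuming kappa decreases monotonically in sigma."""
    target = 1.0 / delta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if kappa(mid) > target:
            lo = mid          # kappa still too large: increase sigma
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

sigma_min = solve_sigma(delta=4.0)    # 1/(1+sigma^2) = 1/4  =>  sigma = sqrt(3)
```

Monotonicity is exactly what makes the minimum-σ value in the figure well defined: the root is unique whenever 1/δ lies in the range of κ.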
Figure A2. Probability density function of SY for the logistic and Probit models (‖x₀‖₂ = 1), compared with the probability density function of a Gaussian random variable (dashed lines) with the same mean and variance, i.e., N(E[SY], Var[SY]).
Figure A3. The value of the threshold δ_{f_ε}⋆ in (A69) as a function of the probability of error ε ∈ [0, 1/2]. For the logistic and hinge losses, the set of minimizers in (3) is bounded (as required by Theorem 1) if and only if δ > δ_{f_ε}⋆.
Figure A4. Comparison between analytical and empirical results for the least-squares (LS), least-absolute deviations (LAD) and hinge loss functions, together with the upper bound on performance and the empirical performance of the optimal loss function as in Theorem 3, for the noisy-signed measurement model with (a) ε = 0.1 and (b) ε = 0.25. The vertical dashed lines, evaluated by (A59), mark δ_{f_ε}⋆ ≈ 3 and 2.25 for ε = 0.1 and 0.25, respectively.
Abstract
1. Introduction
1.1. Motivation
1.2. Data Models
- (Noisy) Signed:
- Logistic:
- Probit:
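The three data models above admit simple samplers in their textbook forms. The parametrizations below (flip probability ε for the noisy-signed model, unit noise scale inside the Probit link) are illustrative assumptions, not necessarily the paper's exact definitions:

```python
import math
import random

def sample_label(model, margin, eps=0.1, rng=random):
    """Draw a binary label y in {-1, +1} given the linear margin a^T x0.

    Standard textbook forms, assumed here for illustration:
      - "signed":   y = sign(margin), flipped with probability eps
      - "logistic": P(y = +1) = 1 / (1 + exp(-margin))
      - "probit":   P(y = +1) = Phi(margin), Phi the standard normal CDF
    """
    if model == "signed":
        y = 1.0 if margin >= 0.0 else -1.0
        return -y if rng.random() < eps else y
    if model == "logistic":
        p = 1.0 / (1.0 + math.exp(-margin))
    elif model == "probit":
        p = 0.5 * (1.0 + math.erf(margin / math.sqrt(2.0)))  # Phi(margin)
    else:
        raise ValueError(model)
    return 1.0 if rng.random() < p else -1.0
```

For large |margin| the logistic and Probit links saturate to the signed model, which is the limit discussed in Section 4.1.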
1.3. Empirical Risk Minimization
- Least Squares (LS):
- Least-Absolute Deviations (LAD):
- Logistic Loss:
- Exponential Loss:
- Hinge Loss:
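In margin form, with t = y · (aᵀx), these losses are commonly written as below. These are the standard parametrizations; the paper's definitions may differ by an affine reparametrization:

```python
import math

# Margin-form binary-classification losses, t = y * (a^T x).
# Common textbook parametrizations, assumed here for illustration.
def ls(t):          return (t - 1.0) ** 2            # least squares
def lad(t):         return abs(t - 1.0)              # least-absolute deviations
def logistic(t):    return math.log(1.0 + math.exp(-t))
def exponential(t): return math.exp(-t)
def hinge(t):       return max(0.0, 1.0 - t)
```

All five are convex in t, which is what places them inside the class of estimators analyzed by Theorems 1-3.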
1.4. Contributions and Organization
- Precise Asymptotics: We show that the absolute correlation of the estimator to the true vector is sharply predicted by an explicit expression whose "effective noise" parameter can be computed by solving a system of three non-linear equations in three unknowns. We find that the system of equations (and, thus, the value of the effective noise parameter) depends on the loss function ℓ through its Moreau envelope function. Our prediction holds in the linear asymptotic regime, in which the number of measurements and the signal dimension grow large at a fixed ratio (see Section 2).
- Fundamental Limits: We establish fundamental limits on the performance of convex optimization-based estimators by computing an upper bound on the best possible correlation performance among all convex loss functions. We compute the upper bound by solving a certain nonlinear equation, and we show that such a solution exists for all values of δ (see Section 3.1).
- Optimal Performance and (sub)-optimality of LS for binary models: For certain binary models, including signed and logistic, we find the loss functions that achieve the optimal performance, i.e., that attain the previously derived upper bound (see Section 3.2). Interestingly, for the logistic and Probit models, we prove that the correlation performance of least-squares (LS) is at least as good as 0.9972 and 0.9804 times the optimal performance, respectively. However, as the signal strength grows large, the logistic and Probit models approach the signed model, in which case LS becomes sub-optimal (see Section 4.1).
- Extension to the Gaussian-Mixture Model: In Section 5, we extend the fundamental limits and the system of equations to the Gaussian-mixture model. Interestingly, our results indicate that, for this model, LS is optimal among all convex loss functions for all values of δ.
- Numerical Simulations: We conduct numerous experiments specializing our results to popular models and loss functions, and we provide simulation results that demonstrate the accuracy of the theoretical predictions (see Section 6 and Appendix E).
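The empirical markers in the figures come from experiments of this flavor. Below is a self-contained toy version (tiny dimensions, nowhere near the asymptotic regime, and plain gradient descent rather than the solvers used in the paper) that fits LS to noiseless signed measurements and reports the absolute correlation with the true vector:

```python
import math
import random

random.seed(1)
n, d = 200, 10                        # 20 measurements per dimension
x0 = [random.gauss(0.0, 1.0) for _ in range(d)]
nrm = math.sqrt(sum(v * v for v in x0))
x0 = [v / nrm for v in x0]            # unit-norm ground-truth vector

A = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [1.0 if sum(a[j] * x0[j] for j in range(d)) >= 0.0 else -1.0 for a in A]

# Gradient descent on the LS objective (1/n) * sum_i (a_i^T x - y_i)^2
x = [0.0] * d
for _ in range(400):
    grad = [0.0] * d
    for a, yi in zip(A, y):
        r = sum(a[j] * x[j] for j in range(d)) - yi
        for j in range(d):
            grad[j] += 2.0 * r * a[j] / n
    x = [xj - 0.05 * gj for xj, gj in zip(x, grad)]

# Absolute correlation of the LS estimate with the (unit-norm) true vector
corr = abs(sum(xj * tj for xj, tj in zip(x, x0))) / math.sqrt(sum(xj * xj for xj in x))
```

With many measurements per dimension the correlation is close to one; the theory in Section 2 predicts its exact limiting value as n and d grow proportionally.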
1.5. Related Works
- (a) Sharp asymptotics for linear measurements.
- (b) One-bit compressed sensing.
- (c) Classification in high dimensions.
2. Sharp Performance Guarantees
2.1. Definitions
2.2. A System of Equations
2.3. Asymptotic Prediction
3. On Optimal Performance
3.1. Fundamental Limits
3.2. On the Optimal Loss Function
4. Special Cases
4.1. Least-Squares
4.2. Logistic and Hinge Loss
5. Extensions to Gaussian-Mixture Models
5.1. System of Equations for GMM
5.2. Theoretical Prediction of Error for Convex Loss Functions
5.3. Special Case: Least-Squares
5.4. Optimal Risk for GMM
6. Numerical Experiments
Numerical Experiments for GMM
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. Properties of Moreau Envelopes
Appendix A.1. Derivatives
- (a) The proximal operator is unique and continuous in both of its arguments.
- (b) The value e_{λℓ}(x) is finite and depends continuously on (λ, x), with e_{λℓ}(x) → ℓ(x) for all x as λ → 0⁺.
- (c) The Moreau envelope function is differentiable with respect to both arguments.
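The derivative identities are not reproduced above. Assuming the standard definition e_{λℓ}(x) = min_v [ℓ(v) + (x − v)²/(2λ)], they read ∂e/∂x = (x − prox_{λℓ}(x))/λ and ∂e/∂λ = −(x − prox_{λℓ}(x))²/(2λ²). A numerical spot check for the squared loss ℓ(v) = (v − 1)², whose envelope has the closed form (x − 1)²/(1 + 2λ):

```python
# Moreau envelope of ell(v) = (v - 1)^2 in closed form, plus a
# finite-difference check of the two standard derivative identities.

def prox(x, lam):
    # argmin_v [(v - 1)^2 + (x - v)^2 / (2 lam)]
    return (x + 2.0 * lam) / (1.0 + 2.0 * lam)

def envelope(x, lam):
    return (x - 1.0) ** 2 / (1.0 + 2.0 * lam)

x, lam, h = 0.3, 0.7, 1e-6

# d/dx e(x) = (x - prox(x)) / lam
dx_fd = (envelope(x + h, lam) - envelope(x - h, lam)) / (2.0 * h)
dx_id = (x - prox(x, lam)) / lam

# d/dlam e(x) = -(x - prox(x))^2 / (2 lam^2)
dl_fd = (envelope(x, lam + h) - envelope(x, lam - h)) / (2.0 * h)
dl_id = -(x - prox(x, lam)) ** 2 / (2.0 * lam ** 2)
```

These identities are what let the system of equations in Section 2 be written entirely in terms of Moreau envelopes of ℓ.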
Appendix A.2. Alternative Representations of (8)
Appendix A.3. Examples of Proximal Operators
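Closed-form proximal operators for the losses used in this paper are standard. A sketch with numerical verification against the defining minimization, in the convention prox_{λf}(v) = argmin_x [f(x) + (x − v)²/(2λ)] (the formulas below are the textbook ones, stated here as assumptions):

```python
def prox_abs(v, lam):
    # prox of f(x) = |x|: soft-thresholding
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def prox_hinge(v, lam):
    # prox of f(x) = max(0, 1 - x)
    if v >= 1.0:
        return v
    if v <= 1.0 - lam:
        return v + lam
    return 1.0

def prox_square(v, lam):
    # prox of f(x) = (x - 1)^2
    return (v + 2.0 * lam) / (1.0 + 2.0 * lam)

def prox_numeric(f, v, lam, lo=-5.0, hi=5.0, steps=100000):
    # Brute-force argmin of f(x) + (x - v)^2 / (2 lam) on a grid,
    # used only to sanity-check the closed forms above.
    best_x, best_val = lo, float("inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        val = f(x) + (x - v) ** 2 / (2.0 * lam)
        if val < best_val:
            best_x, best_val = x, val
    return best_x
```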
Appendix A.4. Fenchel–Legendre Conjugate Representation
Appendix A.5. Convexity of the Moreau Envelope
Appendix A.6. The Expected Moreau-Envelope (EME) Function and its Properties
Appendix A.6.1. Derivatives
Appendix A.6.2. Strict Convexity
“ If ℓ is strictly convex and does not attain its minimum at 0, then the expected Moreau envelope is also strictly convex. ”
Appendix A.6.3. Strict Concavity
“ If ℓ is convex, continuously differentiable and ℓ′(0) ≠ 0, then the expected Moreau envelope is strictly concave. ”
Appendix A.6.4. Summary of Properties of the EME Function Ω
- (a) The function Ω is differentiable.
- (b) The function Ω is jointly convex, and it is concave in γ.
- (c) The function Ω is increasing in α.

For the statements below, further assume that ℓ is strictly convex and continuously differentiable with ℓ′(0) ≠ 0.

- (d) The function Ω is strictly convex in α and strictly concave in λ.
- (e) The function Ω is strictly increasing in α.
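Properties (c) and (e) can be illustrated numerically. Assuming, for the sake of a simple sketch, an EME of the one-Gaussian form Ω(α, λ) = E_G[e_{λℓ}(αG)] with ℓ(v) = (v − 1)² (the paper's Ω involves more arguments), the closed-form envelope e_{λℓ}(x) = (x − 1)²/(1 + 2λ) gives Ω(α, λ) = (α² + 1)/(1 + 2λ), which is strictly increasing and strictly convex in α > 0:

```python
import random

def envelope(x, lam):
    # Moreau envelope of ell(v) = (v - 1)^2 (standard definition assumed)
    return (x - 1.0) ** 2 / (1.0 + 2.0 * lam)

def eme_mc(alpha, lam, n=200000, seed=0):
    # Monte-Carlo estimate of Omega(alpha, lam) = E_G[envelope(alpha * G, lam)]
    rng = random.Random(seed)
    return sum(envelope(alpha * rng.gauss(0.0, 1.0), lam) for _ in range(n)) / n

def eme_exact(alpha, lam):
    # E[(alpha * G - 1)^2] = alpha^2 + 1, divided by (1 + 2 lam)
    return (alpha ** 2 + 1.0) / (1.0 + 2.0 * lam)
```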
Appendix B. Proof of Theorem 1
Appendix B.1. Technical Tool: CGMT
Appendix B.1.1. Gordon’s Min-Max Theorem (GMT)
Appendix B.1.2. Convex Gaussian Min-Max Theorem (CGMT)
Appendix B.2. Applying the CGMT to ERM for Binary Classification
Appendix B.3. Analysis of the Auxiliary Optimization
Appendix B.4. Convex-Concavity and First-Order Optimality Conditions
Appendix B.5. On the Uniqueness of Solutions to (A57): Proof of Proposition 1
Appendix C. Discussions on the Fundamental Limits for Binary Models
- On the Uniqueness of Solutions to Equation
Appendix C.1. Distribution of SY in Special Cases
- Signed:
- Logistic:
- Probit:
Appendix D. Proofs and Discussions on the Optimal Loss Function
Appendix D.1. Proof of Theorem 3
Appendix D.2. On the Convexity of Optimal Loss Function
Appendix D.2.1. Provable Convexity of the Optimal Loss Function for Signed Model
Appendix E. Noisy-Signed Measurement Model
Appendix F. On LS Performance for Binary Models
Appendix F.1. Proof of Corollary 2
Appendix F.2. Discussion
Linear vs. Binary
Appendix G. Fundamental Limits for Gaussian-Mixture Models: Proofs for Section 5
Appendix G.1. Proof of Corollary 3
Appendix G.2. Proof of Theorem 5
Table: predicted performance and empirical performance (Gaussian and Rademacher features) for δ = 2, 3, …, 9.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Taheri, H.; Pedarsani, R.; Thrampoulidis, C. Sharp Guarantees and Optimal Performance for Inference in Binary and Gaussian-Mixture Models. Entropy 2021, 23, 178. https://doi.org/10.3390/e23020178