Divergences Induced by the Cumulant and Partition Functions of Exponential Families and Their Deformations Induced by Comparative Convexity
Figure 1. Strictly log-convex functions form a proper subset of strictly convex functions.
Figure 2. The canonical divergence $\mathcal{D}$ and dual canonical divergence $\mathcal{D}^*$ on a dually flat space $\mathcal{M}$ equipped with potential functions $\mathcal{F}$ and $\mathcal{F}^*$ can be viewed as single-parameter contrast functions on the product manifold $\mathcal{M}\times\mathcal{M}$: the divergence $\mathcal{D}$ can be expressed using either the $\theta\times\theta$-coordinate system as a Bregman divergence or the mixed $\theta\times\eta$-coordinate system as a Fenchel–Young divergence. Similarly, the dual divergence $\mathcal{D}^*$ can be expressed using either the $\eta\times\eta$-coordinate system as a dual Bregman divergence or the mixed $\eta\times\theta$-coordinate system as a dual Fenchel–Young divergence.
Figure 3. Statistical divergences between normalized $p_\theta$ and unnormalized $\tilde{p}_\theta$ densities of an exponential family $\mathcal{E}$, with the corresponding divergences between their natural parameters. Without loss of generality, we consider a natural exponential family (i.e., $t(x)=x$ and $k(x)=0$) with cumulant function $F$ and partition function $Z$, with $J_F$ and $B_F$ respectively denoting the Jensen and Bregman divergences induced by the generator $F$. The statistical divergences $D_{R,\alpha}$ and $D_{B,\alpha}$ denote the Rényi $\alpha$-divergences and skewed $\alpha$-Bhattacharyya distances, respectively. The superscript “s” indicates rescaling by the multiplicative factor $\frac{1}{\alpha(1-\alpha)}$, while the superscript “*” denotes the reverse divergence obtained by swapping the parameter order.
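The correspondences named in the captions above can be checked numerically on a small example. The sketch below is illustrative only and is not taken from the article: it assumes the Bernoulli family written as a natural exponential family on $\{0,1\}$ with cumulant function $F(\theta)=\log(1+e^\theta)$, and the function names (`bregman`, `fenchel_young`, `skew_jensen`, `skew_bhattacharyya`, `kl`) are hypothetical helpers. It relies on three standard identities: the Bregman divergence in $\theta\times\theta$ coordinates equals the Fenchel–Young divergence in mixed $\theta\times\eta$ coordinates, the skewed $\alpha$-Bhattacharyya distance between two densities equals the skew Jensen divergence of $F$ between their natural parameters, and the Kullback–Leibler divergence equals the Bregman divergence with the parameter order swapped. The weighting convention chosen for the skew Jensen divergence is an assumption.

```python
import math

def F(theta):
    """Cumulant (log-partition) function of the Bernoulli family: log(1 + e^theta)."""
    return math.log1p(math.exp(theta))

def grad_F(theta):
    """Dual (moment) coordinate eta = F'(theta), i.e. the sigmoid."""
    return 1.0 / (1.0 + math.exp(-theta))

def F_star(eta):
    """Convex conjugate of F (negative Bernoulli entropy), valid for 0 < eta < 1."""
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)

def bregman(theta1, theta2):
    """B_F(theta1 : theta2), expressed in theta x theta coordinates."""
    return F(theta1) - F(theta2) - (theta1 - theta2) * grad_F(theta2)

def fenchel_young(theta1, eta2):
    """Same divergence in mixed theta x eta coordinates."""
    return F(theta1) + F_star(eta2) - theta1 * eta2

def skew_jensen(theta1, theta2, alpha):
    """Skew Jensen divergence J_{F,alpha}(theta1 : theta2) (assumed convention)."""
    mix = alpha * theta1 + (1.0 - alpha) * theta2
    return alpha * F(theta1) + (1.0 - alpha) * F(theta2) - F(mix)

def density(theta):
    """Normalized density p_theta(x) = exp(theta * x - F(theta)) for x in {0, 1}."""
    return [math.exp(theta * x - F(theta)) for x in (0, 1)]

def skew_bhattacharyya(theta1, theta2, alpha):
    """-log of the sum over x of p_theta1(x)^alpha * p_theta2(x)^(1 - alpha)."""
    p, q = density(theta1), density(theta2)
    return -math.log(sum(a ** alpha * b ** (1.0 - alpha) for a, b in zip(p, q)))

def kl(theta1, theta2):
    """Kullback-Leibler divergence KL(p_theta1 : p_theta2), computed on densities."""
    p, q = density(theta1), density(theta2)
    return sum(a * math.log(a / b) for a, b in zip(p, q))

# Arbitrary illustrative parameter values.
theta1, theta2, alpha = 0.3, -1.2, 0.25
print(bregman(theta1, theta2), fenchel_young(theta1, grad_F(theta2)))
print(skew_bhattacharyya(theta1, theta2, alpha), skew_jensen(theta1, theta2, alpha))
print(kl(theta1, theta2), bregman(theta2, theta1))
```

Each printed pair should agree up to floating-point rounding. The left-hand value of the last two pairs is computed from the densities themselves, while the right-hand value uses only $F$ and the natural parameters.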
Abstract
1. Introduction
2. Dual Subtractive and Divisive Normalizations of Exponential Families
2.1. Natural Exponential Families
2.2. Exponential Families
2.3. Normalizations of Exponential Families
3. Divergences Related to the Cumulant Function
4. Divergences Related to the Partition Function
5. Deforming Convex Functions and Their Induced Dually Flat Spaces
5.1. Comparative Convexity
5.2. Dually Flat Spaces
6. Conclusions and Discussion
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nielsen, F. Divergences Induced by the Cumulant and Partition Functions of Exponential Families and Their Deformations Induced by Comparative Convexity. Entropy 2024, 26, 193. https://doi.org/10.3390/e26030193