Axiomatic Characterizations of Information Measures
Abstract
1. Introduction
1.1. Historical comments
1.2. Directions of axiomatic characterizations
2. Direction (A)
- Positivity: H(P) ≥ 0
- Expansibility: “Expansion” of P by a new component equal to 0 does not change H(P)
- Symmetry: H(P) is invariant under permutations of p1,…,pn
- Continuity: H(P) is a continuous function of P (for fixed n)
- Additivity: H(P × Q) = H(P) + H(Q)
- Subadditivity: H(X, Y) ≤ H(X) + H(Y)
- Strong additivity: H(X, Y) = H(X) + H(Y|X)
- Recursivity: H(p1,…,pn) = H(p1 + p2, p3,…,pn) + (p1 + p2)H(p1/(p1 + p2), p2/(p1 + p2))
- Sum property: H(p1,…,pn) = ∑ g(pi), for some function g (two of these properties are checked numerically in the sketch below).
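A minimal numerical sketch (not from the paper, purely illustrative) of two of the listed properties of Shannon entropy: recursivity and strong additivity. The helper H, the example distributions p and pxy, and the choice of natural logarithms are assumptions of the sketch; the identities themselves hold for any logarithm base.

```python
import numpy as np

def H(p):
    """Shannon entropy H(P) = -sum_i p_i log p_i (natural log), with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Recursivity: H(p1,...,pn) = H(p1+p2, p3,...,pn) + (p1+p2) H(p1/(p1+p2), p2/(p1+p2))
p = np.array([0.1, 0.2, 0.3, 0.4])
s = p[0] + p[1]
lhs = H(p)
rhs = H([s, p[2], p[3]]) + s * H([p[0] / s, p[1] / s])
print(np.isclose(lhs, rhs))          # True

# Strong additivity: H(X, Y) = H(X) + H(Y|X), with H(Y|X) = sum_x p(x) H(Y | X = x)
pxy = np.array([[0.10, 0.15],
                [0.20, 0.05],
                [0.30, 0.20]])       # joint distribution p(x, y), rows indexed by x
px = pxy.sum(axis=1)                 # marginal distribution of X
H_cond = sum(px[x] * H(pxy[x] / px[x]) for x in range(len(px)))
print(np.isclose(H(pxy.ravel()), H(px) + H_cond))   # True
```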
2.1. Shannon entropy and I-divergence
2.2. Rényi entropies and divergences
2.3. Other entropies and divergences
2.4. Entropies and divergences of degree α
3. Direction (B)
4. Direction (C)
- (i) a probability distribution P = (p1,…,pn), or
- (ii) any P ∈ ℝn with nonnegative components pi ≥ 0, or
- (iii) any P ∈ ℝn
- Regularity: (a) Q ∈ F implies Π(F, Q) = Q, (b) F1 ⊂ F and Π(F, Q) ∈ F1 imply Π(F1, Q) = Π(F, Q), (c) for each P ≠ Q, among the feasible sets determined by a single constraint there exists a unique F such that Π(F, Q) = P , (d) Π(F, Q) depends continuously on F.
- Locality: If F1 is defined by a set of I-local constraints and F2 by a set of Ic-local ones, then the components pi∗, i ∈ I, of P∗ = Π(F1 ∩ F2, Q) are determined by F1 and {qi : i ∈ I}.
- Transitivity: If F1 ⊂ F, Π(F, Q) = P∗, then Π(F1, Q) = Π(F1, P∗).
- Semisymmetry: If F = {P : pi + pj = t} for some i ≠ j and constant t, and Q satisfies qi = qj, then P∗ = Π(F, Q) satisfies pi∗ = pj∗.
- Weak scaling (for cases (i), (ii)): For F as above, P∗ = Π(F, Q) always satisfies pi∗ = tqi/(qi + qj) and pj∗ = tqj/(qi + qj).
- (a) transitivity iff the functions fk are of Bregman form fk(p, q) = φk(p) − φk(q) − φk′(q)(p − q)
- (b) semisymmetry iff f1 = ⋯ = fn
- (c) weak scaling (in cases (i), (ii)) iff the functions f1 = ⋯ = fn are of form qf(p/q), where f is strictly convex, f(1) = f′(1) = 0, and f′(x) → −∞ as x → 0; then d(P, Q) is the f-divergence Df(P||Q) (see the numerical sketch after the results below).
- (a) translation and scale invariance (in case (iii)) iff Π(F, Q) equals the Euclidean projection of Q onto F
- (b) scale invariance (in case (ii)) iff Π(F, Q) is the minimizer of
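To illustrate the weak scaling axiom and result (c) in the I-divergence case (the f-divergence with f(x) = x log x), here is a minimal numerical sketch, not from the paper: it computes Π(F, Q) for F = {P : p1 + p2 = t} over probability vectors by direct constrained minimization of D(P||Q) and checks that the constrained components of the projection are proportional to q1 and q2. The helper names kl and i_projection, the example Q and t, and the use of scipy.optimize.minimize are assumptions of the sketch.

```python
import numpy as np
from scipy.optimize import minimize

def kl(p, q):
    """I-divergence D(P||Q) = sum_i p_i log(p_i / q_i), with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def i_projection(q, i, j, t):
    """Numerically minimize D(P||Q) over F = {P : p_i + p_j = t, sum(P) = 1, P >= 0}."""
    cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0},
            {"type": "eq", "fun": lambda p: p[i] + p[j] - t}]
    bounds = [(1e-12, 1.0)] * len(q)          # keep components (essentially) positive
    res = minimize(kl, x0=q.copy(), args=(q,), bounds=bounds, constraints=cons)
    return res.x

q = np.array([0.1, 0.2, 0.3, 0.4])
t = 0.5
p_star = i_projection(q, i=0, j=1, t=t)

# Weak scaling: the two constrained components are scaled versions of q_1, q_2,
# i.e. p*_1 / q_1 = p*_2 / q_2, equivalently p*_k = t q_k / (q_1 + q_2) for k = 1, 2.
print(p_star[0] / q[0], p_star[1] / q[1])     # the two ratios should (nearly) coincide
print(t * q[:2] / (q[0] + q[1]))              # closed-form prediction for p*_1, p*_2
```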
5. Discussion
Acknowledgement
References