Affine Statistical Bundle Modeled On A Gaussian Orlicz-Sobolev Space
https://doi.org/10.1007/s41884-022-00078-6
SURVEY PAPER
Giovanni Pistone1
Abstract
The dually flat structure of statistical manifolds can be derived in a non-parametric
way from a particular case of affine space defined on a qualified set of probability
measures. The statistically natural displacement mapping of the affine space depends
on the notion of Fisher’s score. The model space must be carefully defined if the state
space is not finite. Among various options, we discuss how to use Orlicz–Sobolev
spaces with Gaussian weight. Such a fully non-parametric set-up provides tools to
discuss intrinsically infinite-dimensional evolution problems.
Professor S.-I. Amari clearly stated in a 1987 conference paper [3] the notion of a
non-parametric fiber bundle in Information Geometry (IG). He says
A fibre bundle is constructed on a finite-dimensional parametric statistical model
with a Hilbert space as the fibre space. The Hilbert space represents the tangent
directions of the set of probability distributions in the function space. A pair of
dual linear connections is introduced in the Hilbert bundle.
A related journal paper [6] shows the applications to the statistics of semi-parametric
statistical models. Kass and Vos, in their 1997 monograph [26, 10.3], further explain
the construction by describing the tangent fiber $T_pM$ of a statistical model $M$ as a
vector space of random variables:
B Giovanni Pistone
giovanni.pistone@carloalberto.org
1 de Castro Statistics, Collegio Carlo Alberto, Piazza Vincenzo Arbarello 8, 10122 Turin, Italy
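$B(\mathcal E) \subset \cap_{\mu \in \mathcal E} L^2(\mu)$ that contains the scores of all one-dimensional statistical models.
Then, each fiber is defined to be
$$S_p\mathcal E = \left\{ u \in B(\mathcal E) \,\middle|\, \int u\, p\, d\mu = 0 \right\} .$$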
For example, assume the sample space is a compact set $K$, and $\mu$ is a diffuse
measure. Let $\mathcal E$ be the set of all continuous, strictly positive probability densities. $\mathcal E$ is
an open convex set of $C(K)$. If $\theta \mapsto p_\theta$ is a differentiable curve in $\mathcal E$, then the score
$\frac{d}{d\theta} \log p_\theta$ is a curve in $B(\mathcal E) = C(K)$ such that for each $\theta$ it holds $\int \frac{d}{d\theta} \log p_\theta \; p_\theta \, d\mu = 0$.
And conversely, for each $u \in B_\theta$ the curve $\theta \mapsto p_\theta \propto e^{\theta u}$ has values in $\mathcal E$ and its
$${}^m U_\mu^\nu \colon S_\mu \mathcal E \ni u \mapsto \frac{d\mu}{d\nu}\, u \in S_\nu \mathcal E, \tag{2}$$
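Also, there is a transport of the inner product from one fiber to the other,
$$\langle u, v \rangle_\mu = \left\langle {}^e U_\nu^\mu \, {}^e U_\mu^\nu u, v \right\rangle_\mu = \left\langle {}^e U_\mu^\nu u, {}^m U_\mu^\nu v \right\rangle_\nu .$$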
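The transport of the inner product can be checked numerically on a finite sample space. This is a minimal sketch, assuming that the exponential transport acts by re-centering, ${}^e U_\mu^\nu u = u - E_\nu[u]$ (Eq. (1) is not reproduced in this excerpt, so this form is an assumption), and that the mixture transport is the one of Eq. (2). All function names are illustrative.

```python
import numpy as np

def exp_transport(u, nu):
    """Assumed exponential transport into the fiber at nu: re-center u under nu."""
    return u - np.sum(u * nu)

def mix_transport(v, mu, nu):
    """Mixture transport of Eq. (2): multiply by the density ratio d(mu)/d(nu)."""
    return (mu / nu) * v

def inner(u, v, mu):
    """Inner product <u, v>_mu on a finite sample space."""
    return np.sum(u * v * mu)

rng = np.random.default_rng(0)
mu = rng.random(5); mu /= mu.sum()
nu = rng.random(5); nu /= nu.sum()
u = rng.normal(size=5); u -= np.sum(u * mu)   # u is centered under mu
v = rng.normal(size=5); v -= np.sum(v * mu)   # v is centered under mu

lhs = inner(u, v, mu)
rhs = inner(exp_transport(u, nu), mix_transport(v, mu, nu), nu)
```

Under these assumptions `lhs` and `rhs` agree up to rounding, and the transported vector `mix_transport(v, mu, nu)` is centered under `nu`.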
The equations above clearly define a geometry of probability measures that is related to,
but different from, the previously studied Riemannian geometry based on the notion
of Fisher-Rao information matrix taken as an expression of an inner product between
tangent vectors. This new geometry originated, at least in the statistical community,
with the idea of defining the geometry of curved exponential models as embedded in
a larger exponential family [2, 18, 21, 22].
Such a theory has been known for a long time in statistical mechanics. The main
difference is that R. Fisher and other statisticians of the same period used to think about
parsimoniously parameterized models. In contrast, physicists such as Boltzmann and
Gibbs used to think in terms of simple relations between statistical observables. The
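$$\frac{d}{dt} T_t(y, x) = \omega_H(T_t(y, x)) , \qquad \omega_H = \left( -\frac{\partial}{\partial x} H , \frac{\partial}{\partial y} H \right) ,$$
$$T_0(y, x) = (y, x) .$$

$$\frac{\partial}{\partial t} \log f_t + \omega_H \cdot \nabla \log f_t = 0 .$$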
In particular, $f_t = f$ if $f$ is a function of $H$. Among all invariant probability
densities, the curve $\theta \mapsto p_\theta = e^{\theta H}/Z(\theta)$ represents an evolution in the class of
invariant probabilities. The score of the model is
$$\frac{d}{d\theta} \log \frac{e^{\theta H}}{Z(\theta)} = H - \frac{d}{d\theta} \log Z(\theta) = H - \int H \, p_\theta \, dm .$$
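The score formula for the Gibbs model can be illustrated on a finite state space, where $m$ is the counting measure. The sketch below is only a numerical check under that assumption; the energy values in `H` and the function names are hypothetical.

```python
import numpy as np

def gibbs(theta, H):
    """Gibbs density p_theta = exp(theta * H) / Z(theta) w.r.t. counting measure."""
    w = np.exp(theta * H)
    return w / w.sum()

def score(theta, H):
    """Closed-form score H - E_theta[H]."""
    p = gibbs(theta, H)
    return H - np.sum(H * p)

H = np.array([0.0, 1.0, 2.5, 4.0])   # hypothetical energy levels
theta, eps = 0.3, 1e-6

# Central finite difference of theta -> log p_theta, state by state.
fd_score = (np.log(gibbs(theta + eps, H)) - np.log(gibbs(theta - eps, H))) / (2 * eps)
```

The finite-difference derivative `fd_score` matches `score(theta, H)`, and the score has zero expectation under `gibbs(theta, H)`.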
1. for each fixed $P$ the partial mapping $s_P \colon Q \mapsto \overrightarrow{PQ}$ is 1-to-1 and onto, and
2. the parallelogram law, $\overrightarrow{PQ} + \overrightarrow{QR} = \overrightarrow{PR}$, holds true.
The structure $(M, V, \overrightarrow{\cdot\,\cdot})$ is, by definition, the affine space. The corresponding affine
manifold is derived from the atlas of charts s P : M → V , P ∈ M. Notice that the
change of chart is the choice of a new origin. Such a structure supports a full geometrical
development; see [37].
Weyl’s axioms suggest the following definition.
Let $M$ be a set and let $B_\mu$, $\mu \in M$, be a family of real topological vector spaces.
Let $(U_\nu^\mu)$, $\nu, \mu \in M$, be a family of isomorphisms $U_\nu^\mu \colon B_\nu \to B_\mu$ satisfying the cocycle
condition,
AF0 $U_\mu^\mu = I$ and $U_\mu^\rho U_\nu^\mu = U_\nu^\rho$, where $U_\nu^\mu$ is the transport from $B_\nu$ onto $B_\mu$.
Consider a displacement mapping
$$S \colon (\nu, \mu) \mapsto s_\nu(\mu) \in B_\nu$$
is, by definition, the affine manifold associated with the given affine bundle.
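Here is our main instance. Consider the exponential transport of Eq. (1) and define
$s_p(q) = \log\frac{q}{p} - \int \log\frac{q}{p} \, p \, dm$. The parallelogram identity is
$$\begin{aligned}
&\left( \log\frac{q}{p} - \int \log\frac{q}{p} \, p \, dm \right) \\
&\quad + \left( \log\frac{r}{q} - \int \log\frac{r}{q} \, q \, dm \right) - \int \left( \log\frac{r}{q} - \int \log\frac{r}{q} \, q \, dm \right) p \, dm \\
&= \log\frac{r}{p} - \int \log\frac{r}{p} \, p \, dm .
\end{aligned}$$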
The dual instance is associated with the mixture transport of Eq. (2) and $s_p(q) = \frac{q}{p} - 1$. The parallelogram identity is
$$\left( \frac{q}{p} - 1 \right) + \frac{q}{p} \left( \frac{r}{q} - 1 \right) = \frac{r}{p} - 1 .$$
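Both parallelogram identities can be verified numerically for densities on a finite set. The sketch below assumes the exponential transport into the fiber at $p$ is the re-centering $u \mapsto u - E_p[u]$ (our reading of Eq. (1), which is not shown in this excerpt) and the mixture transport from $q$ to $p$ is multiplication by $q/p$ as in Eq. (2). The helper names are illustrative.

```python
import numpy as np

def s_exp(p, q):
    """Exponential displacement s_p(q) = log(q/p) - E_p[log(q/p)]."""
    w = np.log(q / p)
    return w - np.sum(w * p)

def s_mix(p, q):
    """Mixture displacement s_p(q) = q/p - 1."""
    return q / p - 1.0

def u_exp(u, p):
    """Assumed exponential transport into the fiber at p: re-center under p."""
    return u - np.sum(u * p)

def u_mix(v, q, p):
    """Mixture transport from the fiber at q to the fiber at p: multiply by q/p."""
    return (q / p) * v

rng = np.random.default_rng(1)
# Three strictly positive densities on a six-point sample space.
p, q, r = (row / row.sum() for row in rng.random((3, 6)) + 0.1)

# s_p(q) + U_q^p s_q(r) - s_p(r) should vanish in both geometries.
exp_gap = s_exp(p, q) + u_exp(s_exp(q, r), p) - s_exp(p, r)
mix_gap = s_mix(p, q) + u_mix(s_mix(q, r), q, p) - s_mix(p, r)
```

Both `exp_gap` and `mix_gap` are zero up to floating-point error.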
The affine bundle is a convenient expression of the tangent bundle of the affine
manifold if we define the velocity as follows. The velocity of the smooth curve $t \mapsto \gamma(t)$
of the affine manifold $M$ is the curve $t \mapsto (\gamma(t), \overset{\star}{\gamma}(t))$ of the affine bundle
whose second component is
$$\overset{\star}{\gamma}(t) = \lim_{h \to 0} h^{-1} s_{\gamma(t)}(\gamma(t + h)) = \left. \frac{d}{dh} s_{\gamma(t)}(\gamma(t + h)) \right|_{h=0} .$$
By assumption AF2 applied to the points, the expression in the chart centered at $\nu$
of $\overset{\star}{\gamma}(t)$ is $U_{\gamma(t)}^{\nu} \overset{\star}{\gamma}(t) = \frac{d}{dt} s_\nu(\gamma(t))$.
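For example, in the exponential manifold, it holds
$$\overset{\star}{\gamma}(t) = {}^e U_p^{\gamma(t)} \frac{d}{dt} \left( \log\frac{\gamma(t)}{p} - E_p\left[ \log\frac{\gamma(t)}{p} \right] \right) = \frac{d}{dt} \log \gamma(t) ,$$
so that the (affine) velocity in the exponential manifold equals Fisher’s score.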
Let $F$ be a section of the affine manifold, that is, $\mu \mapsto (\mu, F(\mu)) \in SM$. An
integral curve of the section $F$ is a curve $t \mapsto \gamma(t)$ such that $\overset{\star}{\gamma}(t) = F(\gamma(t))$. A flow
of the section $F$ is a mapping
such that for each $\nu$ the curve $t \mapsto \gamma_t(\nu)$ is an integral curve and $\gamma(0, \nu) = \nu$.
The following proposition gives a characterization of affine geodesics. The following
statements are equivalent.
1. The curve $I \ni t \mapsto \gamma(t)$ is auto-parallel, that is, $\overset{\star}{\gamma}(t) = U_{\gamma(s)}^{\gamma(t)} \overset{\star}{\gamma}(s)$, $s, t \in I$.
2. The expression of the curve in each chart is affine.
3. For all $s, t$,
$$\gamma(t) = s_{\gamma(s)}^{-1}\left( (t - s)\, \overset{\star}{\gamma}(s) \right) .$$
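The acceleration is defined as a velocity in the affine bundle. Consider the curve
$t \mapsto \mu(t)$ with velocity $t \mapsto \overset{\star}{\mu}(t)$. The acceleration $t \mapsto \overset{\star\star}{\mu}(t)$ is the velocity of the curve
$t \mapsto (\mu(t), \overset{\star}{\mu}(t))$,
$$\overset{\star\star}{\mu}(t) = U_\mu^{\mu(t)} \frac{d}{dt} U_{\mu(t)}^{\mu} \overset{\star}{\mu}(t) .$$
This equation shows that a curve with 0 acceleration is auto-parallel.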
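In the exponential case, the acceleration is computed as follows:
$$\begin{aligned}
\overset{\star\star}{p}(t) &= {}^e U_p^{p(t)} \frac{d}{dt} \, {}^e U_{p(t)}^{p} \overset{\star}{p}(t) = {}^e U_p^{p(t)} \frac{d}{dt} \left( \frac{\dot p(t)}{p(t)} - \int \frac{\dot p(t)}{p(t)} \, p \, dm \right) \\
&= \left( \frac{\ddot p(t)}{p(t)} - \left( \frac{\dot p(t)}{p(t)} \right)^2 \right) - \int \left( \frac{\ddot p(t)}{p(t)} - \left( \frac{\dot p(t)}{p(t)} \right)^2 \right) p(t) \, dm \\
&= \frac{\ddot p(t)}{p(t)} - \left( \frac{\dot p(t)}{p(t)} \right)^2 + \int \left( \frac{\dot p(t)}{p(t)} \right)^2 p(t) \, dm .
\end{aligned}$$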
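The closed form of the exponential acceleration lends itself to a quick numerical experiment on a finite sample space. The sketch below approximates $\dot p$ and $\ddot p$ by central finite differences and evaluates the last line of the display above; the two test curves (a one-parameter exponential family and a mixture segment) and all names are our own illustrative choices.

```python
import numpy as np

def exp_acceleration(curve, t, h=1e-4):
    """Exponential acceleration p''/p - (p'/p)^2 + E_p[(p'/p)^2], by finite differences."""
    p = curve(t)
    dp = (curve(t + h) - curve(t - h)) / (2 * h)
    ddp = (curve(t + h) - 2 * p + curve(t - h)) / h**2
    sc = dp / p                      # Fisher score of the curve at time t
    return ddp / p - sc**2 + np.sum(sc**2 * p)

u = np.array([-1.0, 0.2, 0.5, 2.0])  # hypothetical sufficient statistic
p0 = np.array([0.1, 0.2, 0.3, 0.4])
p1 = np.array([0.4, 0.3, 0.2, 0.1])

def exp_curve(t):
    """One-parameter exponential family through p0 with statistic u."""
    w = p0 * np.exp(t * u)
    return w / w.sum()

def mix_curve(t):
    """Mixture segment from p0 to p1."""
    return (1 - t) * p0 + t * p1

acc_exp = exp_acceleration(exp_curve, 0.3)
acc_mix = exp_acceleration(mix_curve, 0.3)
```

In this experiment `acc_exp` vanishes up to discretization error, consistent with exponential families being auto-parallel for the exponential connection, while `acc_mix` does not vanish.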
Above, we have discussed in general terms how to define an affine Banach manifold.
We now proceed to instantiate the general formalism into a specific case of Gaussian
space. In doing that, the usual toolbox of IG should be extended with other analytical
notions. A general reference is [11]. We now restrict our attention to a particular
instance of model Banach space. Precisely, we will use the generalization of Lebesgue
spaces called Orlicz spaces. General references are the monographs [36, Ch. II] and
[1, Ch. VII]. The basic technical tools are the notion of conjugation between convex
functions and the analysis of the Gaussian space. I will use my conference paper [46].
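$$\Phi(x) + \Phi_*(y) \ge x y , \quad x, y \ge 0 ,$$
The Young inequality provides a separating duality $\langle u, v \rangle_\mu = \int u v \, d\mu$ of $L^{\Phi}(\mu)$ and
$L^{\Phi_*}(\mu)$ such that $\langle u, v \rangle_\mu \le 2 \|u\|_{L^{\Phi}(\mu)} \|v\|_{L^{\Phi_*}(\mu)}$. The dual norm is called the Orlicz
norm and is equivalent to the Luxemburg norm.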
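The Luxemburg norm can be evaluated numerically. The sketch below assumes the standard definition $\|u\|_{L^\Phi(\mu)} = \inf\{\rho > 0 : \int \Phi(u/\rho)\, d\mu \le 1\}$, which is not restated in this excerpt, a finite sample space with strictly positive probabilities, and the Young function $\Phi = \cosh - 1$. The function name `luxemburg_norm` is hypothetical.

```python
import numpy as np

def luxemburg_norm(u, prob, phi=lambda x: np.cosh(x) - 1.0, tol=1e-12):
    """Luxemburg norm inf{rho > 0 : E[phi(u / rho)] <= 1}, found by bisection."""
    u = np.asarray(u, dtype=float)
    prob = np.asarray(prob, dtype=float)
    if not np.any(u):
        return 0.0

    def modular(rho):
        # Overflow for very small rho is harmless: inf <= 1 is simply False.
        with np.errstate(over="ignore", invalid="ignore"):
            return np.sum(phi(u / rho) * prob)

    hi = 1.0
    while modular(hi) > 1.0:         # grow until the modular drops below 1
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol * hi:        # rho -> modular(rho) is decreasing
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

c = 3.0
norm_const = luxemburg_norm([c, -c], [0.5, 0.5])
```

For a variable with constant modulus $c$ the modular equation reads $\cosh(c/\rho) - 1 = 1$, so `norm_const` should be close to `c / np.arccosh(2.0)`.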
The domination relation between Young functions implies continuous injection
properties for the corresponding Orlicz spaces. We say that $\Phi_2$ eventually dominates
$\Phi_1$, written $\Phi_1 \prec \Phi_2$, if there is a constant $\kappa$ such that $\Phi_1(x) \le \Phi_2(\kappa x)$ for all $x$ larger
than some $\bar x$. As, in our case, $\mu$ is a probability measure, the continuous embedding
$L^{\Phi_2}(\mu) \to L^{\Phi_1}(\mu)$ holds if, and only if, $\Phi_1 \prec \Phi_2$. See proof in [1, Th. 8.2]. If
$\Phi_1 \prec \Phi_2$, then $(\Phi_2)_* \prec (\Phi_1)_*$. Looking at the examples above, $\exp_2$ (4) and $\cosh_2$
(5) are equivalent, they both are eventually dominated by $\mathrm{gauss}_2$ (6) and eventually
dominate all powers (3).
A special case occurs when there exists a function $C$ such that $\Phi(ax) \le C(a) \Phi(x)$
for all $a \ge 0$. This is true, for example, for a power function and in the case of the
functions $(\exp_2)_*$ and $(\cosh_2)_*$. In such a case, the conjugate space is the dual space
and bounded functions form a dense set.
The spaces corresponding to case (3) are ordinary Lebesgue spaces. The cases (4)
and (5) provide isomorphic Banach spaces, which are of special interest to us as they
provide the model spaces for our non-parametric version of IG. In fact, a random
variable $u$ belongs to $L^{\exp_2}(\mu)$ if, and only if, the exponential family $p_\theta \propto e^{\theta u}$
is defined in a neighborhood of $\theta = 0$. In the conjugate space, a strictly positive
probability density $f$ has finite entropy if, and only if, the random variable $v = f - 1$
belongs to $L^{(\exp_2)_*}(\mu)$.
Another important feature of the class $L^{\cosh_2}(\mu)$ is the following. Such a class coincides
with the class of sub-exponential random variables, that is, those for which there
exist constants $C_1, C_2 > 0$ such that the large deviations admit an exponential bound
$$P_\mu(|f| \ge t) \le C_1 \exp(-C_2 t) , \quad t \ge 0 .$$
$C^1_{\mathrm{poly}}(\mathbb R^n)$, we have
$$\int f(x) \, \partial_i g(x) \, \gamma(x) \, dx = \int \delta_i f(x) \, g(x) \, \gamma(x) \, dx ,$$
We refer to Sect. 1, and [43, 44] for the definition of maximal exponential manifold
E (γ ), and of statistical bundle S E (γ ). Below we report the results that are necessary
in the context of the present paper.
A key result is the proof of the following proposition, see [16, 17, 48] and [49,
Th. 4.7].
For all $p, q \in \mathcal E(\gamma)$ it holds $q = e^{u - K_p(u)} \cdot p$, where $u \in L^{\cosh_2}(\gamma)$, $E_p[u] = 0$,
and $u$ belongs to the interior of the proper domain of the convex function $K_p$. This
property is equivalent to any of the following:
1. $p$ and $q$ are connected by an open exponential arc;
2. $L^{\cosh_2}(p) = L^{\cosh_2}(q)$ and the norms are equivalent;
3. $p/q \in \cup_{a>1} L^a(q)$ and $q/p \in \cup_{a>1} L^a(p)$.
Item 2 ensures that all the fibers of the statistical bundle, namely $S_p \mathcal E(\gamma)$, $p \in \mathcal E(\gamma)$,
are isomorphic. Item 3 gives an explicit description of the exponential manifold.
For example, let $p$ be a positive probability density with respect to $\gamma$, and take $q = 1$
and $a = 2$. Then a sufficient condition for $p \in \mathcal E(\gamma)$ is
$$\int p(x)^2 \, \gamma(x) \, dx < \infty \quad \text{and} \quad \int \frac{1}{p(x)} \, \gamma(x) \, dx < \infty .$$
By replacing the $L^2$-norm with a $\cosh_2$-Orlicz norm, one obtains a set-up for IG [30,
44]. Precisely, we have exponential families with weakly differentiable densities and
where $f \in C^1_{\mathrm{poly}}(\mathbb R^n)$. See a proof in [38, 1.4]. In terms of norms, the inequality above
is equivalent to $\|f - \bar f\|_{L^2(\gamma)} \le \| |\nabla f| \|_{L^2(\gamma)}$, where $\bar f = \int f(y) \, \gamma(y) \, dy$.
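The Gaussian Poincaré inequality in the norm form above can be illustrated in dimension one with Gauss–Hermite quadrature. This is a minimal sketch; the test functions ($x^2$ and $\sin$) and the helper names are our own choices, and `hermegauss` from NumPy provides nodes and weights for the weight $e^{-x^2/2}$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Nodes and weights for exp(-x^2/2); dividing by sqrt(2*pi) gives the density gamma.
x, w = hermegauss(40)
w = w / np.sqrt(2.0 * np.pi)

def gauss_mean(values):
    """Quadrature approximation of the gamma-expectation of the node values."""
    return np.sum(w * values)

def poincare_sides(f, df):
    """Return (variance of f, expected squared derivative) under gamma."""
    fx = f(x)
    variance = gauss_mean((fx - gauss_mean(fx)) ** 2)
    energy = gauss_mean(df(x) ** 2)
    return variance, energy

var_sq, energy_sq = poincare_sides(lambda t: t**2, lambda t: 2.0 * t)
var_sin, energy_sin = poincare_sides(np.sin, np.cos)
```

For $f(x) = x^2$ the two sides are $2$ and $4$; for $f = \sin$ they are $(1 - e^{-2})/2$ and $(1 + e^{-2})/2$, so the variance is dominated by the gradient term in both cases.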
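For example, if $p \in C^2_{\mathrm{poly}}$ is a probability density with respect to $\gamma$, then the
$\chi^2$-divergence of $P = p \cdot \gamma$ from $\gamma$ is bounded by
$$D_{\chi^2}(P | \gamma) = \int (p(x) - 1)^2 \, \gamma(x) \, dx \le \int (\delta \cdot \nabla p(x))^2 \, \gamma(x) \, dx .$$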
see [33, V-1.5] and [38, 1.3]. Notice that the OU semigroup interpolates between
$P_0 f = f$ and $P_\infty f = \bar f$. If $X$, $Y$ are independent standard Gaussian random variables
in $\mathbb R^n$, then
$$X_t = e^{-t} X + \sqrt{1 - e^{-2t}} \, Y , \quad Y_t = \sqrt{1 - e^{-2t}} \, X - e^{-t} Y$$
are independent standard Gaussian random variables for all $t \ge 0$. By the change of
variable $(X, Y) \mapsto (X_t, Y_t)$ and Jensen’s inequality, it follows for each convex $\Phi$ that
$$\int \Phi(P_t f(x)) \, \gamma(x) \, dx \le \int \Phi(f(x)) \, \gamma(x) \, dx .$$
That is, for all $t \ge 0$, the mapping $f \mapsto P_t f$ is non-expansive for the norm of each
Orlicz space $L^{\Phi}(\gamma)$.
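The contraction property can be observed numerically in dimension one. The sketch below assumes the Mehler representation $P_t f(x) = \int f(e^{-t}x + \sqrt{1 - e^{-2t}}\, y)\, \gamma(y)\, dy$, which the paper uses further on, and evaluates it by Gauss–Hermite quadrature. The test function $\sin$, the Young function $\cosh - 1$, and the helper names are illustrative choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

x, w = hermegauss(40)
w = w / np.sqrt(2.0 * np.pi)        # normalize to the standard Gaussian density

def ou(f, t):
    """Values of P_t f at the quadrature nodes, via the Mehler formula."""
    a, s = np.exp(-t), np.sqrt(1.0 - np.exp(-2.0 * t))
    return np.array([np.sum(w * f(a * xi + s * x)) for xi in x])

def modular(values, phi):
    """Quadrature approximation of the integral of phi(values) against gamma."""
    return np.sum(w * phi(values))

phi = lambda z: np.cosh(z) - 1.0    # Young function cosh - 1
t = 0.5
lhs = modular(ou(np.sin, t), phi)   # integral of phi(P_t f) against gamma
rhs = modular(np.sin(x), phi)       # integral of phi(f) against gamma
```

Here `lhs` does not exceed `rhs`. As a further consistency check, `ou(lambda z: z**3, t)` reproduces the polynomial $e^{-3t}x^3 + 3e^{-t}(1 - e^{-2t})x$ at the nodes.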
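For all convex $\Phi \colon \mathbb R \to \mathbb R$ and all $f \in C^1_{\mathrm{poly}}(\mathbb R^n)$, it holds
$$\begin{aligned}
\int \Phi\left( f(x) - \int f(y) \, \gamma(y) \, dy \right) \gamma(x) \, dx
&\le \iint \Phi\left( \frac{\pi}{2} \nabla f(x) \cdot y \right) \gamma(x) \gamma(y) \, dx \, dy \\
&= \frac{1}{\sqrt{2\pi}} \iint \Phi\left( \frac{\pi}{2} |\nabla f(x)| \, z \right) e^{-z^2/2} \, \gamma(x) \, dz \, dx \\
&= \int \widetilde\Phi(|\nabla f(x)|) \, \gamma(x) \, dx ,
\end{aligned} \tag{7}$$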
The cases $\Phi(a) = a^{2p}$ are special in that the proof can use the
multiplicative property $\Phi(ab) = \Phi(a) \Phi(b)$. The argument generalizes to the case
where the convex function $\Phi$ is a Young function whose increase is controlled through
a function $C$, $\Phi(uv) \le C(u) \Phi(v)$, and such that there exists a $\kappa > 0$ for which
$$\int C\left( \frac{\pi}{2} \kappa u \right) \gamma(u) \, du \le 1 ,$$
Assume now that $\| |\nabla f| \|_{L^{\Phi}(\gamma)} \le 1$ so that the LHS does not exceed 1. Then
$\| \kappa (f - \bar f) \|_{L^{\Phi}(\gamma)} \le 1$, which, in turn, implies the inequality
$$\| f - \bar f \|_{L^{\Phi}(\gamma)} \le \kappa^{-1} \| |\nabla f| \|_{L^{\Phi}(\gamma)} .$$
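Of particular interest is the case of the Young function $\Phi = \cosh - 1$, for which
there is no such bound. Instead, we use Eq. (8) with $\kappa$ and $-\kappa$ to get
$$\int (\cosh - 1)\left( \frac{2\kappa}{\pi} (f(x) - \bar f) \right) \gamma(x) \, dx \le \int \mathrm{gauss}_2(\kappa |\nabla f(x)|) \, \gamma(x) \, dx . \tag{9}$$
It follows that
$$\| f - \bar f \|_{L^{\cosh - 1}(\gamma)} \le \frac{\pi}{2} \| |\nabla f| \|_{L^{\mathrm{gauss}_2}(\gamma)} .$$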
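inequalities hold:
$$\left\| f - \int f(y) \, \gamma(y) \, dy \right\|_{L^{(\exp_2)_*}(\gamma)} \le C_1 \| |\nabla f| \|_{L^{(\exp_2)_*}(\gamma)} . \tag{10}$$
$$\left\| f - \int f(y) \, \gamma(y) \, dy \right\|_{L^{2p}(\gamma)} \le C_2(p) \| |\nabla f| \|_{L^{2p}(\gamma)} , \quad p > 1/2 . \tag{11}$$
$$\left\| f - \int f(y) \, \gamma(y) \, dy \right\|_{L^{\cosh_2}(\gamma)} \le C_3 \| |\nabla f| \|_{L^{\mathrm{gauss}_2}(\gamma)} . \tag{12}$$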
Other equivalent norms could be used in the inequalities above. For example,
$L^{(\exp_2)_*}(\gamma) \leftrightarrow L^{(\cosh - 1)_*}(\gamma)$ and $L^{\mathrm{gauss}_2}(\gamma) \leftrightarrow L^{2}_{\cosh - 1}(\gamma)$.
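We now consider a further set of inequalities based on the infinitesimal
generator $-\delta \cdot \nabla$ of the OU semigroup [38, 1.3.7].
We have, for all $f \in C^2_{\mathrm{poly}}(\mathbb R^n)$, that
$$f(x) - \bar f = - \int_0^\infty \frac{d}{dt} P_t f(x) \, dt = \int_0^\infty \delta \cdot \nabla P_t f(x) \, dt . \tag{13}$$
Note that
$$\begin{aligned}
\nabla P_t f(x) &= \nabla \int f(e^{-t} x + \sqrt{1 - e^{-2t}} \, y) \, \gamma(y) \, dy \\
&= e^{-t} \int \nabla f(e^{-t} x + \sqrt{1 - e^{-2t}} \, y) \, \gamma(y) \, dy = e^{-t} P_t \nabla f(x) ,
\end{aligned}$$
so that
$$P_t \, \delta \cdot \nabla f(x) = \delta \cdot \nabla P_t f(x) = e^{-t} \delta \cdot P_t \nabla f(x) .$$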
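As
$$\int \delta \cdot \nabla f(x) \, \gamma(x) \, dx = 0 ,$$
$$\mathrm{Cov}_\gamma(f, g) = \int (f(x) - \bar f) g(x) \, \gamma(x) \, dx = \int (f(x) - \bar f)(g(x) - \bar g) \, \gamma(x) \, dx .$$
$$\mathrm{Cov}_\gamma(f, g) = \int_0^\infty e^{-t} \int P_t \nabla f(x) \cdot \nabla g(x) \, \gamma(x) \, dx \, dt . \tag{15}$$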
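As a quick sanity check of Eq. (15), consider the one-dimensional case with $f(x) = g(x) = x^2$. This test function is our own choice and is not taken from the paper; the computation uses the Mehler representation of $P_t$ recalled above.

```latex
\begin{aligned}
\nabla f(x) &= 2x , \qquad
P_t \nabla f(x) = \int 2\left(e^{-t}x + \sqrt{1 - e^{-2t}}\,y\right)\gamma(y)\,dy = 2e^{-t}x , \\
\int_0^\infty e^{-t} \int P_t \nabla f(x) \cdot \nabla g(x)\,\gamma(x)\,dx\,dt
  &= \int_0^\infty e^{-t} \cdot 4e^{-t} \int x^2 \gamma(x)\,dx\,dt
   = \int_0^\infty 4e^{-2t}\,dt = 2 , \\
\mathrm{Cov}_\gamma(f, g) &= \int x^4 \gamma(x)\,dx - \left(\int x^2 \gamma(x)\,dx\right)^2 = 3 - 1 = 2 .
\end{aligned}
```

Both sides equal $2$, as expected.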
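We use here a result of [44, Prop. 5]. Let $|\cdot|_1$ and $|\cdot|_2$ be two norms on $\mathbb R^n$, such
that $|x \cdot y| \le |x|_1 |y|_2$. For a Young function $\Phi$, consider the norm of $L^{\Phi}(\gamma)$ and the
conjugate space endowed with the dual norm,
$$\| f \|_{L^{\Phi_*, *}(\gamma)} = \sup\left\{ \int f g \, \gamma \,\middle|\, \int \Phi(g) \, \gamma \le 1 \right\} .$$
The following proposition includes the standard Poincaré case provided $\Phi(u) = u^2/2$.
Given a pair of conjugate Young functions $\Phi$, $\Psi$, and norms $|\cdot|_1$, $|\cdot|_2$ on $\mathbb R^n$ such
that $x \cdot y \le |x|_1 |y|_2$, $x, y \in \mathbb R^n$, for all $f, g \in C^1_{\mathrm{poly}}(\mathbb R^n)$, it holds
$$\mathrm{Cov}_\gamma(f, g) \le \| |\nabla f|_1 \|_{L^{\Phi}(\gamma)} \, \| |\nabla g|_2 \|_{L^{\Psi, *}(\gamma)} .$$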
The case of our interest here is $\Phi = \cosh - 1$, $\Psi = (\cosh - 1)_*$. As $(\cosh - 1)_* \prec
(\cosh - 1)$, it follows, in particular, that $\mathrm{Cov}_\gamma(f, f)$ is bounded by a constant times
$\| |\nabla f| \|^2_{L^{\cosh - 1}(\gamma)}$.
A reasonable option for our model space is to assume densities $f = e^{u - K_\gamma(u)} \cdot \gamma$
in the Gaussian maximal exponential family, $f \in \mathcal E(\gamma)$, and, moreover, assume
differentiability in the form $u \in L^{2}_{\cosh_2}(\gamma) = L^{\mathrm{gauss}_2}(\gamma)$, that is, $u^2 \in L^{\cosh_2}(\gamma)$, see
Sect. 2.2.
with $\delta_j \phi = (u_j - \partial_j) \phi$. Here, the Stein operator $\delta_j$ acts on $C_0^\infty(\mathbb R^n)$ [13].
The meaning of both operators $\partial_j$ and $\delta_j = (u_j - \partial_j)$ when acting on square-integrable
random variables of the Gaussian space is well known. Still, here we
are specifically interested in the action on OSG spaces. Let us denote by $C_p^\infty(\mathbb R^n)$
the space of infinitely differentiable functions with polynomial growth of all derivatives.
Polynomial growth implies the existence of $\gamma$-moments of all derivatives, hence
$C_p^\infty(\mathbb R^n) \subset W^{1,2}_{\cosh_{2*}}(\gamma)$. If $f \in C_p^\infty(\mathbb R^n)$, then the distributional derivative and the
ordinary derivative are equal and moreover $\delta_j f \in C_p^\infty(\mathbb R^n)$. For each $\phi \in C_0^\infty(\mathbb R^n)$
we have $\langle \phi, \delta_j f \rangle_\gamma = \langle \partial_j \phi, f \rangle_\gamma$.
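The duality between $\partial_j$ and the Stein operator $\delta_j$ can be illustrated in dimension one by Gauss–Hermite quadrature. In this sketch we write the coordinate as $x$, so that $\delta f = x f - f'$, and we use polynomial test functions instead of compactly supported ones; both are simplifying assumptions, and the names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes/weights for exp(-x^2/2), normalized to the density gamma.
x, w = hermegauss(30)
w = w / np.sqrt(2.0 * np.pi)

def pairing(a, b):
    """<a, b>_gamma = integral of a(x) b(x) gamma(x) dx, by quadrature."""
    return np.sum(w * a * b)

f, df = x**2, 2.0 * x            # f(x) = x^2 and its derivative, at the nodes
phi, dphi = x**3, 3.0 * x**2     # phi(x) = x^3 and its derivative, at the nodes
delta_f = x * f - df             # one-dimensional Stein operator (x - d/dx) f

lhs = pairing(phi, delta_f)      # <phi, delta f>_gamma
rhs = pairing(dphi, f)           # <d phi, f>_gamma
```

Both pairings equal $9$: the left side is $\int (x^6 - 2x^4)\gamma = 15 - 6$ and the right side is $\int 3x^4 \gamma = 9$.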
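The OSG spaces $W^{1,2}_{\cosh_2}(\gamma)$ and $W^{1,2}_{\cosh_{2*}}(\gamma)$ are both Banach spaces [36, Sec. 10].
The norm is the graph norm,
$$\| v \|_{W^{1,2}_{\cosh_2}(\gamma)} = \| v \|_{L^{\cosh_2}(\gamma)} + \sum_{j=1}^n \| \partial_j v \|_{L^{\mathrm{gauss}_2}(\gamma)} ,$$
$$\| \eta \|_{W^{1,2}_{\cosh_{2*}}(\gamma)} = \| \eta \|_{L^{\cosh_{2*}}(\gamma)} + \sum_{j=1}^n \| \partial_j \eta \|_{L^{\mathrm{gauss}_{2*}}(\gamma)} .$$
In the cases of null integral, Eq. (12) shows that the second term only provides an
equivalent norm for $W^{1,2}_{\cosh_2}(\gamma)$.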
We review some relations between OSG spaces and Sobolev spaces without weight
[1] in the following proposition. For each ball radius $R > 0$,
$$(2\pi)^{-\frac{n}{2}} \ge \gamma(x) \ge \gamma(x)(|x| < R) \ge (2\pi)^{-\frac{n}{2}} e^{-\frac{R^2}{2}} (|x| < R) , \quad x \in \mathbb R^n .$$
Let $\Omega_R$ denote the open ball of radius $R > 0$ and consider the restriction $u \mapsto
u_R$ of $u$ to $\Omega_R$.
1. We have the continuous mappings
$$W^{1,(\cosh - 1)}(\mathbb R^n) \subset W^{1,(\cosh - 1)}(\gamma) \to W^{1,p}(\Omega_R) , \quad p \ge 1 .$$
2. We have the continuous mappings
$$W^{1,p}(\mathbb R^n) \subset W^{1,(\cosh - 1)_*}(\mathbb R^n) \subset W^{1,(\cosh - 1)_*}(\gamma) \to W^{1,1}(\Omega_R) , \quad p > 1 .$$
3. Each $u \in W^{1,(\cosh - 1)}(\gamma)$ is a.s. Hölder of all orders on each $\overline{\Omega}_R$ and hence a.s.
continuous. The restriction $W^{1,(\cosh - 1)}(\gamma) \to C(\overline{\Omega}_R)$ is compact.
$$D_H(p | q) = \frac{1}{2} \int |\nabla(u - v)|^2 \, p(x) \, \gamma(x) \, dx < +\infty$$
3 Conclusion
$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^n f(X_j) = \int f(x) \, p(x) \, \gamma(x) \, dx ,$$
with an exponential bound on the tail probability. See [52, 2.8] and [50].
I adapt [25, 40], and [5, 13.6.2] to my Gaussian case. Consider the Hyvärinen divergence
of Sect. 2.6 in the Gaussian case, that is, $P = p \cdot \gamma$ and $Q = q \cdot \gamma$. As a function
of $q$, the divergence is
$$\begin{aligned}
q \mapsto D_H(q \cdot \gamma | p \cdot \gamma) = {} & \frac{1}{2} \int |\nabla \log p(x)|^2 \, p(x) \, \gamma(x) \, dx \\
& + \frac{1}{2} \int |\nabla \log q(x)|^2 \, p(x) \, \gamma(x) \, dx - \int \nabla \log p(x) \cdot \nabla \log q(x) \, p(x) \, \gamma(x) \, dx ,
\end{aligned}$$
where the first term does not depend on $q$ and the second term is a $p \cdot \gamma$-expectation.
As $\nabla \log p = p^{-1} \nabla p$, the third term equals
$$- \int \delta \cdot \nabla \log q(x) \, p(x) \, \gamma(x) \, dx ,$$
$$S(q, x) = \frac{1}{2} |\nabla \log q(x)|^2 - \delta \cdot \nabla \log q(x)$$
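To illustrate the local score $S(q, x)$, here is a one-dimensional worked example. The Gaussian-shift family and the symbols $a$ and $m$ are our own illustrative assumptions, with $\delta \cdot F = x F - F'$ for $n = 1$.

```latex
\begin{aligned}
q(x) &= e^{a x - a^2/2} , \qquad \nabla \log q(x) = a , \qquad
\delta \cdot \nabla \log q(x) = x\,a - \partial_x a = a x , \\
S(q, x) &= \tfrac{1}{2} a^2 - a x , \\
\int S(q, x)\, p(x)\, \gamma(x)\, dx &= \tfrac{1}{2} a^2 - a m
\quad \text{if } p \cdot \gamma \text{ has mean } m .
\end{aligned}
```

The expected score is minimized at $a = m$, that is, when $q \cdot \gamma = N(a, 1)$ matches the mean of $p \cdot \gamma$.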
If $p$ and $q$ belong to the maximal exponential model of $\gamma$, then $q = e^{u - K(u)}$ with
$u \in L^{\cosh_2}(\gamma)$ and $\int u(x) \, \gamma(x) \, dx = 0$. The local score becomes $\frac{1}{2} |\nabla u|^2 - \delta \cdot \nabla u$. To
compute the $p$-expected value of the score with an independent sample of $p \cdot \gamma$, it is
in our interest to assume that the score is in $L^{\cosh_2}(\gamma)$, because this assumption implies
the good convergence of the empirical means for all $p$. Assume, for example, $\nabla u \in
L^{2}_{\cosh_2}(\gamma) = L^{\mathrm{gauss}_2}(\gamma)$. This implies directly $|\nabla u|^2 \in L^{(\cosh - 1)}(\gamma)$. Moreover, we
must assume that the $L^{\cosh_2}(\gamma)$-norm of $\delta \cdot \nabla u$ is finite. Under such assumptions,
one hopes that the minimization of a suitable model of the sample expectation of the
Hyvärinen score is consistent.
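This metric was originally defined in [39]. Let $p$ be given in the maximal exponential
model of $\gamma$, $p \in \mathcal E(\gamma)$, and let $f$ and $g$ be given in the $p$-fiber of the
statistical bundle, that is, $f, g \in W^{1,2}_{\cosh_2}(p) = W^{1,2}_{\cosh_2}(\gamma)$ and $\int f(x) \, p(x) \, \gamma(x) \, dx =
\int g(x) \, p(x) \, \gamma(x) \, dx = 0$. Otto’s inner product is
$$\begin{aligned}
(f, g) \mapsto \langle\!\langle f, g \rangle\!\rangle_p &= \int \nabla f(x) \cdot \nabla g(x) \, p(x) \, \gamma(x) \, dx \\
&= \int f(x) \, \delta \cdot (p(x) \nabla g(x)) \, \gamma(x) \, dx = \langle f, \delta \cdot (p \nabla g) \rangle_\gamma .
\end{aligned}$$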
where $x'$ is the transpose of the column vector $x$, $S^2$ is the unit sphere of $\mathbb R^3$, and $\sigma$ is
the uniform probability on $S^2$. One can prove that $f \mapsto Q(f)/f$ is a section of the
mixture bundle. The Boltzmann equation can be seen as the equation $\overset{\star}{f} = Q(f)/f$.
The smoothness of the Boltzmann section follows from a superposition of operators:
1. Product: $\mathcal E(f_0) \ni f \mapsto f \otimes f \in \mathcal E(f_0 \otimes f_0)$;
2. Interaction: $\mathcal E(f_0 \otimes f_0) \ni f \otimes f \mapsto g = B f \otimes f \in \mathcal E(f_0 \otimes f_0)$;
3. Conditioning: $\mathcal E(f_0 \otimes f_0) \ni g \mapsto \int_{S^2} g \circ A_x \, \sigma(dx) \in \mathcal E(f_0 \otimes f_0)$;
4. Marginalization.
There is a weak form of the Boltzmann section. Let $v, w$ be a couple of velocities
before the collision and let us denote by $(v_x, w_x)$ the velocities after the collision, see
[53]. For $f \in \mathcal E(\gamma)$ and $g \in L^{\cosh_2}(\gamma)$, define the operator $A$ with
$$A g(v, w) = \int_{S^2} \frac{1}{2} \left( g(v_x) + g(w_x) \right) \sigma(dx) - \frac{1}{2} \left( g(v) + g(w) \right) .$$
$A g$ is in $L^{\cosh_2}(\gamma^{\otimes 2})$ and $\langle g, Q(f)/f \rangle_f = E_{f \otimes f}[A g]$.
function. The conjugation relation holds, and the equations of mechanics provide
a dynamic picture of the statistical bundle. For example, one can consider the
Lagrangian function in which entropy plays the role of the potential energy and Fisher’s
metric that of the kinetic energy. This was done in the finite case in [20, 45].
Acknowledgements It is a pleasure to acknowledge the contribution of many people to my work in
non-parametric IG. In particular, I would like to mention Professor Shun-ichi Amari’s constant encouragement and
the critical assessment by Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhöfer in [9, 3.3]. I would also
like to mention friends and coworkers in order of appearance in this paper: Carlo Sempi, Paolo Gibilisco,
Alberto Cena, Maria Piera Rogantin, Barbara Trivellato, Paola Siri, Marina Santacroce, Luigi Malagò, Luigi
Montrucchio, Goffredo Chirco, Bertrand Lods. The author was partially supported by de Castro Statistics,
Collegio Carlo Alberto, and is a member of GNAMPA, Istituto di Alta Matematica, Rome.
Funding The author is supported by de Castro Statistics, Collegio Carlo Alberto, and INdAM-GNAMPA.
Declarations
Conflict of interest The author is on the Editorial Board of Information Geometry. The author states that
there are no other conflicts of interest.
References
1. Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Pure and Applied Mathematics (Amsterdam), vol. 140,
2nd edn., p. 305. Elsevier/Academic Press, Amsterdam (2003)
2. Amari, S.-I.: Differential geometry of curved exponential families-curvatures and information loss.
Ann. Stat. 10(2), 357–385 (1982). https://doi.org/10.1214/aos/1176345779
3. Amari, S.: Dual connections on the Hilbert bundles of statistical models. In: Dodson, C.T.J. (ed.)
Geometrization of Statistical Theory (Lancaster, 1987), pp. 123–151. ULDM Publ, Lancaster (1987)
4. Amari, S.-I.: Natural gradient works efficiently in learning. Neural Comput. 10(2), 251–276 (1998).
https://doi.org/10.1162/089976698300017746
5. Amari, S.-I.: Information Geometry and Its Applications. Applied Mathematical Sciences, vol. 194,
p. 374. Springer, Tokyo (2016). https://doi.org/10.1007/978-4-431-55978-8
6. Amari, S.-I., Kumon, M.: Estimation in the presence of infinitely many nuisance parameters—geometry
of estimating functions. Ann. Stat. 16(3), 1044–1068 (1988). https://doi.org/10.1214/aos/1176350947
7. Amari, S.-I., Nagaoka, H.: Methods of Information Geometry. Translations of Mathematical Mono-
graphs, vol. 191, p. 206. American Mathematical Society, Providence, Oxford University Press, Oxford
(2000). https://doi.org/10.1090/mmono/191 (Translated from the 1993 Japanese original by Daishi
Harada)
8. Arnold, V.I.: Mathematical Methods of Classical Mechanics. Graduate Texts in Mathematics, vol. 60,
p. 516. Springer, New York (1989). (Translated from the 1974 Russian original by K. Vogtmann and
A. Weinstein, Corrected reprint of the second (1989) edition)
9. Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Information Geometry. Ergebnisse der Mathematik und
ihrer Grenzgebiete. 3. Folge, vol. 64, p. 407. Springer, Cham (2017). https://doi.org/10.1007/978-3-
319-56478-4
10. Bauer, M., Bruveris, M., Michor, P.W.: Uniqueness of the Fisher–Rao metric on the space of smooth
densities. Bull. Lond. Math. Soc. 48(3), 499–506 (2016). https://doi.org/10.1112/blms/bdw020
11. Bogachev, V.I.: Differentiable Measures and the Malliavin Calculus. Mathematical Surveys and Mono-
graphs, vol. 164, p. 488. American Mathematical Society, Providence (2010). https://doi.org/10.1090/
surv/164
12. Bourbaki, N.: Variétés Différentielles et Analytiques. Fascicule de Résultats / Paragraphes 1 à 7.
Éléments de mathématiques, vol. XXXIII. Hermann, Paris (1971)
13. Brezis, H.: Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext, p.
599. Springer, New York (2011). https://doi.org/10.1007/978-0-387-70914-7
14. Brown, L.D.: Fundamentals of Statistical Exponential Families with Applications in Statistical Deci-
sion Theory. IMS Lecture Notes. Monograph Series, vol. 9, p. 283. Institute of Mathematical Statistics,
Hayward (1986)
15. Buldygin, V.V., Kozachenko, Y.V.: Metric Characterization of Random Variables and Random Pro-
cesses. Translations of Mathematical Monographs, vol. 188, p. 257. American Mathematical Society,
Providence (2000). (Translated from the 1998 Russian original by V. Zaiats)
16. Cena, A.: Geometric structures on the non-parametric statistical manifold. Ph.D. thesis, Università
degli Studi di Milano (2002)
17. Cena, A., Pistone, G.: Exponential statistical manifold. Ann. Inst. Stat. Math. 59(1), 27–56 (2007).
https://doi.org/10.1007/s10463-006-0096-y
18. Čencov, N.N.: Statistical Decision Rules and Optimal Inference. Translations of Mathematical Mono-
graphs, vol. 53, p. 499. American Mathematical Society, Providence (1982). https://doi.org/10.1090/
mmono/053. (Translation from the Russian edited by Lev J. Leifman)
19. Chirco, G., Pistone, G.: Dually affine Information Geometry modeled on a Banach space. (2022).
arXiv:2204.00917
20. Chirco, G., Malagò, L., Pistone, G.: Lagrangian and Hamiltonian dynamics for probabilities on the sta-
tistical bundle. Int. J. Geom. Methods Mod. Phys. (2022). https://doi.org/10.1142/s0219887822502140
21. Efron, B.: Defining the curvature of a statistical problem (with applications to second order efficiency).
Ann. Stat. 3(6), 1189–1242 (1975). https://doi.org/10.1214/aos/1176343282. (With a discussion by
C. R. Rao, Don A. Pierce, D. R. Cox, D. V. Lindley, Lucien LeCam, J. K. Ghosh, J. Pfanzagl, Niels
Keiding, A. P. Dawid, Jim Reeds and with a reply by the author)
22. Efron, B.: The geometry of exponential families. Ann. Stat. 6(2), 362–376 (1978). https://doi.org/10.
1214/aos/1176344130
23. Efron, B., Hastie, T.: Computer Age Statistical Inference. Institute of Mathematical Statistics (IMS)
Monographs. Algorithms, Evidence, and Data Science, vol. 5, p. 475. Cambridge University Press,
New York (2016). https://doi.org/10.1017/CBO9781316576533
24. Gibilisco, P., Pistone, G.: Connections on non-parametric statistical manifolds by Orlicz space geom-
etry. IDAQP 1(2), 325–347 (1998). https://doi.org/10.1142/S021902579800017X
25. Hyvärinen, A.: Estimation of non-normalized statistical models by score matching. J. Mach. Learn.
Res. 6, 695–709 (2005)
26. Kass, R.E., Vos, P.W.: Geometrical Foundations of Asymptotic Inference. Wiley Series in Prob-
ability and Statistics: Probability and Statistics. Wiley, New York (1997). https://doi.org/10.1002/
9781118165980
27. Lang, S.: Differential and Riemannian Manifolds. Graduate Texts in Mathematics, 3rd edn., p. 364.
Springer, New York (1995). https://doi.org/10.1007/978-1-4612-4182-9
28. Lê, H.V.: Natural differentiable structures on statistical models and the Fisher metric (2022).
arXiv:2208.06539
29. Li, W., Montúfar, G.: Natural gradient via optimal transport. Inf. Geom. 1(2), 181–214 (2018). https://
doi.org/10.1007/s41884-018-0015-3
30. Lods, B., Pistone, G.: Information geometry formalism for the spatially homogeneous Boltzmann
equation. Entropy 17(6), 4323–4363 (2015). https://doi.org/10.3390/e17064323
31. Lott, J.: Some geometric calculations on Wasserstein space. Commun. Math. Phys. 277(2), 423–437
(2008). https://doi.org/10.1007/s00220-007-0367-3
32. Malagò, L., Montrucchio, L., Pistone, G.: Wasserstein Riemannian geometry of Gaussian densities. Inf.
Geom. 1(2), 137–179 (2018). https://doi.org/10.1007/s41884-018-0014-4
33. Malliavin, P.: Integration and Probability. Graduate Texts in Mathematics, vol. 157, p. 322. Springer,
New York (1995). https://doi.org/10.1007/978-1-4612-4202-4. (With the collaboration of Hélène
Airault, Leslie Kay and Gérard Letac, Edited and translated from the French by Kay, With a foreword
by Mark Pinsky)
34. Malliavin, P.: Stochastic Analysis. Grundlehren der Mathematischen Wissenschaften [Fundamental
Principles of Mathematical Sciences], vol. 313, p. 343. Springer, Berlin (1997). https://doi.org/10.
1007/978-3-642-15074-6
35. Montrucchio, L., Pistone, G.: Kantorovich distance on finite metric spaces: Arens–Eells norm and cut
norms. Inf. Geom. (2021). https://doi.org/10.1007/s41884-021-00050-w
36. Musielak, J.: Orlicz Spaces and Modular Spaces. Lecture Notes in Mathematics, vol. 1034. Springer,
Berlin (1983)
37. Nomizu, K., Sasaki, T.: Affine Differential Geometry: Geometry of Affine Immersions. Cambridge
Tracts in Mathematics, vol. 111. Cambridge University Press, Cambridge (1994)
38. Nourdin, I., Peccati, G.: Normal Approximations with Malliavin Calculus. From Stein’s Method to Uni-
versality. Cambridge Tracts in Mathematics, vol. 192, p. 239. Cambridge University Press, Cambridge
(2012). https://doi.org/10.1017/CBO9781139084659
39. Otto, F.: The geometry of dissipative evolution equations: the porous medium equation. Commun.
Partial Differ. Equ. 26(1–2), 101–174 (2001). https://doi.org/10.1081/PDE-100002243
40. Parry, M., Dawid, A.P., Lauritzen, S.: Proper local scoring rules. Ann. Stat. 40(1), 561–592 (2012).
https://doi.org/10.1214/12-AOS971
41. Peyré, G., Cuturi, M.: Computational optimal transport. Found. Trends Mach. Learn. 11(5–6), 355–607
(2019). https://doi.org/10.1561/2200000073. arXiv:1803.00567
42. Pistone, G.: Examples of the application of nonparametric information geometry to statistical physics.
Entropy 15(10), 4042–4065 (2013). https://doi.org/10.3390/e15104042
43. Pistone, G.: Nonparametric information geometry. In: Nielsen, F., Barbaresco, F. (eds.) Geometric
Science of Information. Lecture Notes in Comput. Sci., vol. 8085, pp. 5–36. Springer, Heidel-
berg (2013). https://doi.org/10.1007/978-3-642-40020-9_3. First International Conference, GSI 2013
Paris, France, August 28-30, 2013 Proceedings
44. Pistone, G.: Information geometry of the Gaussian space. In: Information Geometry and Its Applica-
tions. Springer Proc. Math. Stat., vol. 252, pp. 119–155. Springer, Cham (2018). https://doi.org/10.
1007/978-3-319-97798-0_5
45. Pistone, G.: Lagrangian function on the finite state space statistical bundle. Entropy 20(2), 139 (2018).
https://doi.org/10.3390/e20020139
46. Pistone, G.: Information geometry of smooth densities on the Gaussian space: Poincaré inequalities.
In: Nielsen, F. (ed.) Progress in Information Geometry. Signals and Communication Technology, pp.
1–17. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65459-7_1
47. Pistone, G.: Statistical bundle of the transport model. In: Geometric Science of Information. Lecture
Notes in Comput. Sci., vol. 12829, pp. 752–759. Springer, Cham (2021). https://doi.org/10.1007/978-
3-030-80209-7_81
48. Pistone, G., Sempi, C.: An infinite-dimensional geometric structure on the space of all the probability
measures equivalent to a given one. Ann. Stat. 23(5), 1543–1561 (1995)
49. Santacroce, M., Siri, P., Trivellato, B.: New results on mixture and exponential models by Orlicz spaces.
Bernoulli 22(3), 1431–1447 (2016). https://doi.org/10.3150/15-BEJ698
50. Siri, P., Trivellato, B.: Robust concentration inequalities in maximal exponential models. Stat. Prob.
Lett. 170, 109001 (2021). https://doi.org/10.1016/j.spl.2020.109001
51. Susskind, L., Hrabovsky, G.: The Theoretical Minimum: What You Need to Know to Start Doing
Physics. Basic Books, New York (2013)
52. Vershynin, R.: High-dimensional Probability: an Introduction with Applications in Data Science. Cam-
bridge Series in Statistical and Probabilistic Mathematics, vol. 47, p. 284. Cambridge University Press,
Cambridge (2018). https://doi.org/10.1017/9781108231596. (With a foreword by Sara van de Geer)
53. Villani, C.: A review of mathematical topics in collisional kinetic theory. In: Handbook of Mathemat-
ical Fluid Dynamics, vol. I, pp. 71–305. North-Holland, Amsterdam (2002). https://doi.org/10.1016/
S1874-5792(02)80004-0
54. Wainwright, M.J.: High-dimensional Statistics: A Non-asymptotic Viewpoint. Cambridge Series in
Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (2019). https://doi.
org/10.1017/9781108627771
55. Weyl, H.: Space Time Matter. Dover, New York (1952). (Translation of the 1921 RAUM ZEIT
MATERIE)