Article

The Information Geometry of Sensor Configuration

1 Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, VIC 3000, Australia
2 Department of Theoretical Astrophysics, Eberhard Karls University of Tübingen, D-72076 Tübingen, Germany
3 Manly Astrophysics, 15/41-42 East Esplanade, Manly, NSW 2095, Australia
4 School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
* Author to whom correspondence should be addressed.
Sensors 2021, 21(16), 5265; https://doi.org/10.3390/s21165265
Submission received: 27 June 2021 / Revised: 27 July 2021 / Accepted: 28 July 2021 / Published: 4 August 2021
(This article belongs to the Section Intelligent Sensors)
Figure 1. Diagrammatic representation of the sensor model; sensors are at λ_i ∈ M taking measurements of a target at θ ∈ M. A measure of distance between different sensor configurations, physically corresponding to a change in information content, is obtained through a suitable restriction of the metric G_g (13) to the configuration manifold 𝓜(Γ) ⊂ 𝓜, the space of all Riemannian metrics on M. 𝓜 is almost certainly not topologically spherical; it is merely drawn here as such for simplicity.
Figure 2. Graphical illustration of D-optimal sensor dynamics; a sensor configuration Γ_0 evolves to a new configuration Γ_1 by moving the N sensors in Ω-space to new positions that are determined by maximizing the determinant of the metric G, given by Equation (14), on the sensor manifold 𝓜(Γ). Each sensor λ^i traverses a path γ_i through Ω-space to end up in the appropriate position constituting Γ_1. As shown in Section 4.3, the paths γ_i are entropy minimizing if they are geodesic on the sensor manifold 𝓜(Γ). Note that the target is shown as stationary in this particular illustration.
Figure 3. Solutions to the geodesic equation on Ω = [−10, 10] × [−10, 10] for a target starting at (−1, −3) and sensors at (−7, −6) and (0, 1) (raised at a height 1 above Ω). The differing paths correspond to the initial direction vector (cos θ, sin θ) of the target, varying as θ varies from 0 to 2π radians in steps of 0.25 radians.
Figure 4. Geodesic distance on Ω = [−10, 10] × [−10, 10] from the point (−1, −3) with sensors at (−7, −6) and (0, 1) (raised at a height 1 above Ω). The distance was calculated using a Fast Marching formulation of the geodesic equation. The fact that the geodesics follow the gradient of this distance allows comparison with Figure 3.
Figure 5. Geodesic speed for each point in Ω = [−10, 10] × [−10, 10] for targets departing each point in the direction of the vector (1, −1).
Figure 6. Contour plot of det(G), for G given by (14), for the case of three sensors, where S¹ is fixed at (−2, −2), and S² at (2, 2) (red dot). Brighter shades indicate a greater value of det(G); det(G) is maximum at (−1, −5.5). The third sensor is allowed to move. The geodesic path linking the initial and final positions of S³ starting at (6, −6) (yellow dot) is shown by the dashed yellow curve, while the dashed red curve shows another geodesic path linking (−6, −7) (red dot) to the D-optimal location (−1, −5.5). The actual positions of the sensors are raised above Ω by a distance z_i.
Figure 7. Solutions to the geodesic equation on Ω = [−10, 10] × [−10, 10] for a sensor starting at (0, 1) with the other sensors stationary at (−2, −3) and (3, 2) (raised at a height 1 above Ω). The target is located at (0, 1) on Ω. The differing paths correspond to the initial direction vector (cos φ, sin φ) of the target, varying as φ varies from 0 to 2π radians in steps of 0.25 radians.
Figure 8. Geodesic distance on Ω = [−10, 10] × [−10, 10] for a sensor starting at (0, 1) with the other sensor stationary at (−7, −6) (raised at a height 1 above Ω), using an uninformative prior for the target. The distance was calculated using a Fast Marching formulation of the geodesic equation. The fact that the geodesics follow the gradient of this distance allows comparison with Figure 7.
Figure 9. Geodesic speed for a sensor moving along a geodesic in direction (1, −1) at each point in Ω = [−10, 10] × [−10, 10]. The other sensor is stationary at (−7, −6), and an uninformative prior is used for the target. The actual positions of the sensor are raised above Ω by a distance 1.

Abstract

In problems of parameter estimation from sensor data, the Fisher information provides a measure of the performance of the sensor; effectively, in an infinitesimal sense, how much information about the parameters can be obtained from the measurements. From the geometric viewpoint, it is a Riemannian metric on the manifold of parameters of the observed system. In this paper, we consider the case of parameterized sensors and answer the question, “How best to reconfigure a sensor (vary the parameters of the sensor) to optimize the information collected?” A change in the sensor parameters results in a corresponding change to the metric. We show that the change in information due to reconfiguration exactly corresponds to the natural metric on the infinite-dimensional space of Riemannian metrics on the parameter manifold, restricted to a finite-dimensional sub-manifold determined by the sensor parameters. The distance measure on this configuration manifold is shown to provide optimal, dynamic sensor reconfiguration based on an information criterion. Geodesics on the configuration manifold are shown to optimize the information gain, but only if the change is made at a certain rate. An example of configuring two bearings-only sensors to optimally locate a target is developed in detail to illustrate the mathematical machinery, with Fast Marching methods employed to efficiently calculate the geodesics and illustrate the practicality of using this approach.

1. Introduction

This paper is an attempt to initiate the construction of an abstract theory of sensor management, in the hope that it will help to provide both a theoretical underpinning for the solution of practical problems and insights for future work. A key component of sensor management is the amount of information a sensor in a given configuration can gain from a measurement, and how that information gain changes with the configuration of the sensor. In this vein, it is interesting to observe how information theoretic surrogates have been used in a range of applications as objective functions for sensor management; see, for instance, [1,2,3]. Our aim here is to abstract, from these and other papers, including those by Kershaw and Evans [4] on sensor management, the mathematical principles required for such a theory. Our approach is set within the mathematical context of differential geometry, and we aim to show that geodesics on a particular manifold provide the theoretical best-possible sensor reconfiguration in the same way that the Cramér–Rao lower bound provides a minimum variance for an estimator.
The problem of estimation is expressed in terms of a likelihood; that is, a probability density p ( x | θ ) , where x denotes the measurement and θ the parameter to be estimated. It is well known [5] that the Fisher information associated with this likelihood provides a measure of information gained from the measurement. This in turn leads to the Fisher–Rao metric [6,7] on the parameter space. This Riemannian metric lies at the heart of our discussion.
While the concepts discussed here are, in some sense, generic and can be applied to any sensor system that has the capability to modify its characteristics, for simplicity and to keep in mind a motivating example, we focus on a particular problem: that of localization of a target. Of course, a sensor system is itself just a (more complex) aggregate sensor, but it will be convenient, for the particular problems we will discuss, to assume a discrete collection of disparate (at least in terms of location) sensors, that together provide measurements of aspects of the location of the target. This distributed collection of sensors, each drawing measurements that provide partial information about the location of a target, using known likelihoods, defines a particular sensor configuration state. As the individual sensors move, they change their sensing characteristics and thereby the collective Fisher information associated with estimation of target location. As already stated, the Fisher information matrix defines a metric, the Fisher–Rao metric, over the physical space where the target resides, or in more generality a metric over the parameter space in which the estimation process takes place, and this metric is a function of the location of the sensors. This observation permits analysis of the information content of the system, as a function of sensor parameters, in the framework of differential geometry (“information geometry”) [8,9,10,11]. A considerable literature is dedicated to the problem of optimizing the configuration so as to maximize information retrieval [1,12,13,14]. The mathematical machinery of information geometry has led to advances in several signal processing problems, such as blind source separation [15], gravitational wave parameter estimation [16], and dimensionality reduction for image retrieval [17] or shape analysis [18].
In sensor management/adaptivity applications, the performance of the sensor configuration (in terms of some function of the Fisher information) becomes a cost associated with finding the optimal sensor configuration, and tuning the metric by changing the configuration serves to optimize this cost. Literally hundreds of papers, going back to the seminal work of [4] and perhaps beyond, have used the Fisher information as a measure of sensor performance. In this context, parametrized families of Fisher–Rao metrics arise (e.g., [19,20]). Sensor management then becomes one of choosing an optimal metric (based on the value of the estimated parameter), from among a family of such, to permit the acquisition of the maximum amount of information about that parameter.
As we have stated, the focus, hopefully clarifying, example of this paper is that of estimating the location of a target using measurements from mobile sensors (cf. [21,22]). The information content of the system depends both on the location of the target and on the spatial locations of the sensors, because the covariance of measurements is sensitive to the distances and angles made between the sensors and the target. As the sensors move in space, the associated likelihoods vary, as do the resulting Fisher matrices, describing the information content of the system, for every possible sensor arrangement. It is this interaction between sensors and target that this paper sets out to elucidate from an information-geometric viewpoint.
Perhaps the key extra idea in this paper is the observation by Gil-Medrano and Michor that the collection of all Riemannian metrics on a Riemannian manifold itself admits the structure of an infinite-dimensional Riemannian manifold [23,24]. Of interest to us is only the subset of Riemannian metrics corresponding to Fisher informations of available sensor configurations, and this allows us to restrict attention to a finite-dimensional sub-manifold of the manifold of metrics, called the sensor manifold [12,13]. In particular, a continuous adjustment of the sensor configuration, say by moving one of the individual sensors, results in a continuous change in the associated Fisher metric and so a movement in the sensor manifold.
Though computationally difficult, the idea of regarding the Fisher–Rao metric as a measure of performance of a given sensor configuration and then understanding variation in sensor configuration in terms of the manifold of such metrics is a powerful concept. It permits questions concerning optimal target trajectories, as discussed here, to minimize information passed to the sensors and, as will be discussed in a subsequent paper, optimal sensor trajectories to maximize information gleaned about the target. In particular, we remark that the metric on the space of Riemannian metrics that appears naturally in a mathematical context in [23,24] also has a natural interpretation in a statistical context.
Our aims here are to further develop the information geometric view of sensor configuration begun in [12,13]. While the systems discussed are simple and narrow in focus, already they point to concepts of information collection that appear to be new. Specifically, we set up the target location problem in an information geometric context and show that the optimal (in a manner to be made precise in Section 3) sensor trajectories, in a physical sense, are determined by solving the geodesic equations on the configuration manifold (Section 4). Various properties of geodesics on this space are derived, and the mathematical machinery is demonstrated using concrete physical examples (Section 5).

2. Divergences and Metrics

Here, we discuss some generalities on statistical divergences and metrics. The two divergences of interest to us are the Kullback–Leibler divergence and the Jensen–Shannon divergence. These are defined by
$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$
$$D_{\mathrm{JS}}(p \,\|\, q) = \frac{1}{2}\left[ D_{\mathrm{KL}}\!\left(p \,\Big\|\, \frac{p+q}{2}\right) + D_{\mathrm{KL}}\!\left(q \,\Big\|\, \frac{p+q}{2}\right) \right],$$
where p and q are two probability densities on some underlying measure space.
In our case, the densities p, q are taken from a single parametrized family p(θ), where θ lies in a manifold M. More specifically, the parameter θ is the location of the target. A justification for the Kullback–Leibler divergence is that, in this context, given a measurement x of a target at θ, the likelihood that the same measurement could have been obtained from a target at θ′, L(θ, θ′), is given by the log odds expression
$$L(\theta, \theta') = \log \frac{p(x \mid \theta)}{p(x \mid \theta')},$$
and the average over all measurements is, by definition, the Kullback–Leibler divergence [25], D_KL(θ ‖ θ′).
Each of the Kullback–Leibler and Jensen–Shannon divergences is regarded as a measure of information difference between the two densities. They are both non-negative and share the property that D(p ‖ q) = 0 implies that p = q almost everywhere. The Kullback–Leibler divergence is related to mutual information, as is, somewhat more circuitously, the Jensen–Shannon divergence. We refer the reader to Section 17.1 of [5] for a discussion of this connection for the Kullback–Leibler, and to [26] for a brief discussion of the Jensen–Shannon divergence.
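As a concrete illustration of these definitions, the following sketch evaluates both divergences by numerical quadrature for two univariate Gaussian densities. The densities, grid bounds and grid resolution are arbitrary illustrative choices, not anything prescribed by the theory.

```python
import numpy as np

# Quadrature grid; bounds and resolution are arbitrary illustrative choices.
x = np.linspace(-15.0, 15.0, 20001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

p = gaussian(x, 0.0, 1.0)
q = gaussian(x, 1.0, 1.5)

def d_kl(p, q):
    # D_KL(p || q) = integral of p log(p/q).
    return np.trapz(p * np.log(p / q), dx=dx)

def d_js(p, q):
    # D_JS(p || q) = (1/2)[D_KL(p || m) + D_KL(q || m)] with m = (p + q)/2.
    m = 0.5 * (p + q)
    return 0.5 * (d_kl(p, m) + d_kl(q, m))

print("D_KL(p||q) =", d_kl(p, q))   # asymmetric: differs from d_kl(q, p)
print("D_JS(p||q) =", d_js(p, q))   # symmetric and bounded above by log 2
```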
Importantly for us, as θ′ → θ, the first non-zero term in the series expansion is second order, viz.
$$D(\theta \,\|\, \theta') = (\theta - \theta')^{\mathsf{T}} g \,(\theta - \theta') + O\big(\|\theta - \theta'\|^3\big),$$
where g = g(θ) is an n × n symmetric matrix, and n is the dimension of the manifold M.
This location-dependent matrix defines a Riemannian metric over M, the Fisher information metric [6,7]. It can also be calculated, under mild conditions, as the expectation of the tensor product of the gradients of the log-likelihood ℓ = log p(x | θ) as
$$g = g(\theta) = \mathbb{E}_{x \mid \theta}\big[\, d\ell \otimes d\ell \,\big]. \qquad (4)$$
This is an example of a more general structure encapsulated by the following:
Theorem 1.
Given a likelihood ℓ : M → [0, 1] and a divergence D(ℓ(θ) ‖ ℓ(θ′)), there is a unique largest metric d on M such that d(θ, θ′)² ≤ D(ℓ(θ) ‖ ℓ(θ′)).
The relationship between this metric and information is explored in the next section. A proof of the theorem appears in Appendix A.
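As a quick numerical check of the score form (4), the following sketch estimates the Fisher information by Monte Carlo for a toy two-dimensional Gaussian location model; the model and sample size are illustrative choices, and for a unit-variance Gaussian the analytic Fisher matrix is the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: x ~ N(theta, I) in R^2, so the score d(log p)/d(theta) equals x - theta
# and the Fisher matrix E[(x - theta)(x - theta)^T] is the 2x2 identity.
theta = np.array([1.0, -2.0])
samples = rng.normal(loc=theta, scale=1.0, size=(200000, 2))

scores = samples - theta                   # gradient of the log-likelihood in theta
g_mc = scores.T @ scores / len(samples)    # Monte Carlo estimate of E[dl (x) dl]

print(g_mc)   # close to the identity matrix, the analytic Fisher information
```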

3. The Information in Sensor Measurements

Sensor measurements, as considered in this paper, can be formulated as follows. Suppose we have, in a fixed manifold M, a collection of N sensors located at λ_i (i = 1, …, N). For instance, the manifold may be R³ and location may just mean that in the usual sense in Euclidean space. The measurements from these sensors are used to estimate the location of a target θ, also in M (see the left of Figure 1). Each sensor draws measurements x_i from a distribution with probability density function (PDF) p_i(x_i | θ, ν_i), where θ represents the state of the system we are trying to estimate and ν_i the noise parameters of each individual sensor.
A measurement x = { x i } i = 1 N is the collected set of individual measurement from each of the sensors with likelihood
$$p(x \mid \theta) = \prod_{i=1}^{N} p_i(x_i \mid \theta, \nu_i). \qquad (5)$$
These measurements are assumed independent between sensors and over time when dynamics are included. While this assumption is probably not necessary, it allows one to define the aggregate likelihood (5) as a simple product over the individual likelihoods, which renders the problem computationally more tractable.
Since Fisher information is additive over independent measurements, the Fisher information metric provides a measure of the instantaneous change in information the sensors can obtain about the target. In this paper, we adopt a relatively simplistic view that the continuous collection of measurements as exemplified by, for instance, the Kalman-Bucy filter [27], is a limit of measurements discretized over time. Because sensor measurements depend on the relative locations of the sensors and target, this incremental change depends on the direction the target is moving; the Fisher metric (4) can naturally be expressed in coordinates that represent the sensor locations (see Section 4), but also depends on parameters that represent target location, which may be functions of time in a dynamical situation. Once the Fisher metric (4) has been evaluated, one can proceed to optimize it in an appropriate manner.
Another divergence will be of interest in our discussions. This is the Jensen–Shannon divergence, defined in terms of the Kullback–Leibler divergence by
$$D_{\mathrm{JS}}(p_\theta \,\|\, p_{\theta'}) = \frac{1}{2}\left[ D_{\mathrm{KL}}\!\left(p_\theta \,\Big\|\, \frac{p_\theta + p_{\theta'}}{2}\right) + D_{\mathrm{KL}}\!\left(p_{\theta'} \,\Big\|\, \frac{p_\theta + p_{\theta'}}{2}\right) \right].$$

3.1. D-Optimality

The information of the sensor system described in Section 3 is a matrix-valued function and so problems of optimization of ‘total information’ with respect to the sensor parameters are not well defined. We require a definition of ‘optimal’ in an information theoretic context. Several different optimization criteria exist (e.g., [28,29]), achieved by constructing scalar invariants from the matrix entries of g, and maximizing those functions in the usual way.
We adopt the notion of D-optimality in this paper; we consider the maximization of the determinant of (4). Equivalently, D-optimality maximizes the differential Shannon entropy of the system with respect to the sensor parameters [30], and minimizes the volume of the elliptical confidence regions for the sensors' estimate of the location of the target θ [31].
While we have focused so far on the metric as a function of target location θ on the manifold, it is also clear that sensor parameters are involved. In applying D-optimality (or any other) criterion to this problem, we take account of the possibility that sensor locations (in the context we have described) and distributions are not fixed. Conventionally, measurements are drawn from sensors with fixed properties, with a view to estimating a parameter θ . Permitting sensors to move throughout M produces an infinite family of sensor configurations, and hence Fisher–Rao metrics (4), parametrized by the locations of the sensors. One aim of this structure is to move sensors to locations that serve to maximize information content, given some prior distribution for a particular θ M . This necessitates a tool to measure the difference between information provided by members of a family of Fisher–Rao metrics; this is explored in Section 4. We emphasize that the ideas described here will extend to situations where the sensor manifold is not identified with a copy (or several copies) of the target manifold.
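To make the D-optimality criterion concrete, the sketch below compares two hypothetical 2 × 2 Fisher matrices (the numerical values are invented for illustration): the one with the larger determinant yields the smaller differential entropy of the maximum-entropy Gaussian with covariance g⁻¹ and the smaller confidence-ellipse area, which is exactly what D-optimality rewards.

```python
import numpy as np

def gaussian_entropy(cov):
    # Differential entropy of a Gaussian with covariance cov: 0.5 log((2 pi e)^n det cov).
    n = cov.shape[0]
    return 0.5 * np.log((2.0 * np.pi * np.e) ** n * np.linalg.det(cov))

def ellipse_area(cov):
    # Area of the 1-sigma confidence ellipse, proportional to sqrt(det cov).
    return np.pi * np.sqrt(np.linalg.det(cov))

# Two hypothetical sensor configurations and their Fisher matrices (illustrative values).
g_a = np.array([[2.0, 0.3], [0.3, 1.0]])
g_b = np.array([[3.0, -0.5], [-0.5, 2.5]])

for name, g in (("A", g_a), ("B", g_b)):
    cov = np.linalg.inv(g)   # maximum-entropy (Gaussian) covariance of the estimate
    print(name, "det(g):", np.linalg.det(g),
          "entropy:", gaussian_entropy(cov),
          "ellipse area:", ellipse_area(cov))

# Configuration B has the larger det(g), hence the smaller entropy and ellipse area;
# it is the configuration the D-optimality criterion selects.
```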

3.2. Geodesics on the Sensor Manifold

We now consider the case where the target is in motion, so that θ varies along a path γ(t) ∈ M. The instantaneous information gain by the sensor(s) at time t is then g(γ′(t), γ′(t)), where g is the Fisher information metric (4). We define the information provided to the sensors about the target by the measurements as the target moves along the curve as follows. Take a δ-partition T = {0 = t_0, t_1, t_2, …, t_N = 1} of the interval [0, 1], where 0 < t_{i+1} − t_i < δ (i = 0, 1, …, N − 1). At each point γ(t_i), a measurement is taken, represented by the likelihood p(x_i | γ(t_i)). Over the interval (t_i, t_{i+1}], the increment in information could be D(γ(t_i) ‖ γ(t_{i+1}))/(t_{i+1} − t_i), but this does not make sense for an increment as it is not symmetric. Instead, we should calculate the information gain from the endpoints of the interval to the mean distribution
$$D\!\left(\gamma(t_i) \,\Big\|\, \tfrac{1}{2}\big(\gamma(t_i) + \gamma(t_{i+1})\big)\right) + D\!\left(\gamma(t_{i+1}) \,\Big\|\, \tfrac{1}{2}\big(\gamma(t_i) + \gamma(t_{i+1})\big)\right).$$
This is also known as the Jensen–Shannon divergence, D_JS(X ‖ Y). Based on the assumption that measurements taken over time are all independent, as the target traverses a curve γ, the total information I_T gained measuring at the δ-partition T is
$$I_T(\gamma) = \sum_{i=0}^{N-1} \frac{D_{\mathrm{JS}}\big(\gamma(t_i) \,\|\, \gamma(t_{i+1})\big)}{t_{i+1} - t_i}.$$
Letting δ go to zero gives us an information functional, I(γ), defined on paths γ,
$$I(\gamma) = \lim_{\delta \to 0} I_T(\gamma) = \int_0^T g\big(\gamma'(t), \gamma'(t)\big)\, dt, \qquad (8)$$
which is the equivalent of the energy functional in differential geometry (e.g., Chapter 9 of [32]). This has the same extremal paths as l_g(γ), the arc-length of the path γ,
$$l_g(\gamma) = \int_0^T \sqrt{g\big(\gamma'(t), \gamma'(t)\big)}\, dt. \qquad (9)$$
Paths with extremal values of this length are geodesics, and these can be interpreted as the evasive action that can be taken by the target to minimize the amount of information it gives to the sensors.

3.3. Kinematic Conditions on Information Management

While the curves that are extrema of the information functional and of arc-length are the same as sets, a geodesic only minimizes the information functional if traversed at a specific speed, namely,
$$\frac{dl_g}{dt} = +\sqrt{g\big(\gamma'(t), \gamma'(t)\big)}. \qquad (10)$$
In differential geometric terms, this is equivalent to requiring the arc-length parameterization of the geodesic to fulfill the energy condition. In order to minimize information about its location, the target should move along a geodesic of g at exactly the speed d l g / d t (10). This quantifies directly the relation between the speed of the target and the ability to identify its location.
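The following sketch illustrates the speed condition numerically for a fixed straight path in a made-up conformal metric (the metric, endpoints and time window are arbitrary assumptions, not part of the sensor model): by Cauchy–Schwarz the energy of a traversal over a fixed time T is bounded below by l_g²/T, with equality exactly when the metric speed is constant.

```python
import numpy as np

# Fixed straight path from a to b, traversed over time T under a made-up conformal
# metric g = f(p) * I with f(p) = 1 / (1 + |p|^2).  The energy int g(gamma', gamma') dt
# is bounded below by L^2 / T (L = metric arc-length) and attains that bound only for
# the constant-metric-speed traversal.

def f(p):
    return 1.0 / (1.0 + np.dot(p, p))

a, b = np.array([-3.0, 0.0]), np.array([3.0, 1.0])
T = 1.0
s = np.linspace(0.0, 1.0, 5001)                     # parameter along the segment
pts = a[None, :] + s[:, None] * (b - a)[None, :]
fvals = np.array([f(p) for p in pts])
seg = np.linalg.norm(b - a)

dlds = np.sqrt(fvals) * seg                         # metric arc-length element
L = np.trapz(dlds, s)

def energy(s_of_t):
    # E = int_0^T f(gamma(s(t))) * |dgamma/dt|^2 dt, with dgamma/dt = (b - a) ds/dt.
    t = np.linspace(0.0, T, len(s_of_t))
    dsdt = np.gradient(s_of_t, t)
    return np.trapz(np.interp(s_of_t, s, fvals) * (seg * dsdt) ** 2, t)

# (a) constant metric speed: choose s(t) so that metric arc-length grows linearly in t.
arclen = np.concatenate([[0.0], np.cumsum(0.5 * (dlds[1:] + dlds[:-1]) * np.diff(s))])
s_const = np.interp(np.linspace(0.0, arclen[-1], 5001), arclen, s)

# (b) the same path traversed with a non-constant speed (quadratic time warp).
s_warp = np.linspace(0.0, 1.0, 5001) ** 2

print("lower bound L^2/T:", L ** 2 / T)
print("constant speed   :", energy(s_const))   # ~ L^2 / T
print("warped traversal :", energy(s_warp))    # strictly larger
```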
While aspects of this speed constraint are still unclear to us, an analogy that may be useful is to regard the target as moving through a “tensorial information fluid” whose flow is given by the information metric. In this setting, moving slower relative to the fluid will result in “pressure” building behind, requiring information (energy) to be expended to maintain the slower speed. Moving faster also requires more information to push through the slower moving fluid. This information is directly available to the sensors, resulting in an improved target position estimate.
In the fluid dynamics analogy, the energy expended in moving through a fluid is proportional to the square of the difference in speed between the fluid and the object. The local energy is proportional to the difference between the actual speed and the speed desired by the geodesic; that is, the speed that minimizes the energy functional. Pursuing the fluid dynamics analogy, the density of the fluid mediates the relationship between the energy and the relative speed,
$$E \propto g(\delta v, \delta v).$$
In particular, the scalar curvature, which depends on G, influences the energy and hence the information flow. We will explore this issue further in a future publication.

4. The Information of Sensor Configurations

A sensor configuration is a set Γ = { λ i } i = 1 N of sensor parameters. The Fisher–Rao metric g can be viewed as a function of Γ as well as θ , the location of the target. To calculate the likelihood that a measurement came from one configuration Γ 0 over another Γ 1 requires the calculation of p ( x | Γ 0 , θ ) / p ( x | Γ 1 , θ ) , which is difficult as the value of θ is not known exactly. Measurements can be used to construct an estimate θ ^ . However, the distribution of this estimate is hard to quantify and even harder to calculate. Instead, here, the maximum entropy distribution is used. This is normally distributed with mean θ ^ , and covariance g 1 ( θ ^ ) , the inverse of the Fisher information metric at the estimated target location.
The information gain due to the sensor configuration Γ is now D(p(x | Γ, θ̂) ‖ 1), the divergence between the maximum entropy distribution p obtained after the sensors are configured and the uniform distribution 1, reflecting the absence of prior information about the location before the sensors were configured. Note that the uniform distribution 1 is, in general, an improper prior in the Bayesian sense unless the manifold M is of finite volume. It may be necessary, therefore, to restrict attention to some (compact) submanifold Ω ⊂ M for D(p(x | Γ, θ̂) ‖ 1) to be well defined (cf. the discussion below Equation (12)).
Evaluating this information gain gives
$$D(p \,\|\, 1) = \log \sqrt{(2\pi e)^n \det g^{-1}(\Gamma, \hat{\theta})}.$$
The Fisher information metric G for this divergence can be calculated from
$$G(h, k) = \mathbb{E}\big[ d_g^2\, D(p \,\|\, 1) \big] = \int_M \mathrm{tr}\big(g^{-1} h\, g^{-1} k\big)\, \mathrm{vol}(g)\, d\mu \qquad (12)$$
where h and k are tangent vectors to the space of metrics. The integral defining (12) may not converge for non-compact M, so restriction to a compact submanifold Ω of M is assumed throughout as necessary (cf. Figure 1).
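As a small sanity check of the expression for D(p ‖ 1) above: log√((2πe)^n det g⁻¹) is the differential entropy of a Gaussian with covariance g⁻¹. The sketch below compares the closed form against SciPy's Gaussian entropy for an arbitrarily chosen 2 × 2 Fisher matrix.

```python
import numpy as np
from scipy.stats import multivariate_normal

# The Fisher matrix is an arbitrary illustrative value.
g = np.array([[2.0, 0.4], [0.4, 1.5]])
n = g.shape[0]
cov = np.linalg.inv(g)   # covariance of the maximum-entropy (Gaussian) distribution

info_gain = np.log(np.sqrt((2.0 * np.pi * np.e) ** n * np.linalg.det(cov)))
entropy = multivariate_normal(mean=np.zeros(n), cov=cov).entropy()

print(info_gain, entropy)   # the two values agree
```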

4.1. The Manifold of Riemannian Metrics

The set of all Riemannian metrics over a manifold M can itself be imbued with the structure of an infinite-dimensional Riemannian manifold [23,24], which we call 𝓜. Points of 𝓜 are Riemannian metrics on M; i.e., each point G ∈ 𝓜 bijectively corresponds to a positive-definite, symmetric (0, 2)-tensor in the space S²₊T*M. Under reasonable assumptions, an L² metric on 𝓜 [23,33] may be defined as:
$$G(h, k) = \int_M \mathrm{tr}\big(g^{-1} h\, g^{-1} k\big)\, \mathrm{vol}(g)\, d\mu, \qquad (13)$$
which should be compared to (12).
It should be noted that the points of the manifold 𝓜 comprise all of the metrics that can be put on M, most of which are irrelevant for our physical sensor management problem. We restrict consideration to a sub-manifold of 𝓜 consisting only of those Riemannian metrics that are members of the family of Fisher information matrices (4) corresponding to feasible sensor configurations. This particular sub-manifold is called the ‘sensor’ or ‘configuration’ manifold [12,13] and is denoted by 𝓜(Γ), where the objects h and k are now elements of the finite-dimensional tangent space T𝓜(Γ). The dimension of 𝓜(Γ) is N × dim(M), since each point of 𝓜(Γ) is uniquely described by the locations of the N sensors, each of which requires dim(M) numbers to denote its coordinates. A visual description of these spaces is given in Figure 1. For all cases considered in this paper, the integral defined in (13) is well defined and converges (see, however, the discussion in [21]).
For the purposes of computation, it is convenient to have an expression for the metric tensor components of (13) in some local coordinate system. In particular, in a given coordinate basis z^i over 𝓜(Γ) (not to be confused with the coordinates on Ω; see Section 4), the metric (13) reads
$$G(h, k) = \int_\Omega g^{nk} g^{m\ell}\, h_{mn}\, k_{\ell k}\, \mathrm{vol}(g), \qquad (14)$$
where h and k are tangent vectors in T𝓜(Γ) given in coordinates by
$$T\mathcal{M}(\Gamma) = \mathrm{span}\left\{ \frac{\partial g_{mn}}{\partial z^i} \right\}_{i=1}^{\dim \mathcal{M}(\Gamma)}. \qquad (15)$$
From the explicit construction (14), all curvature quantities of M ( Γ ) , such as the Riemann tensor and Christoffel symbols, can be computed.

4.2. D-Optimal Configurations

D-optimality in the context of the sensor manifold described above is discussed in this section. Suppose that the sensors are arranged in some arbitrary configuration Γ 0 . The sensors now move in anticipation of target behaviour; a prior distribution is adopted to localize a target position θ . The sensors move continuously to a new configuration Γ 1 , where Γ 1 is determined by maximizing the determinant of G, i.e., Γ 1 corresponds to the sensor locations for which det ( G ) , computed from (14), is maximized. The physical situation is depicted graphically in Figure 2. This process can naturally be extended to the case where real measurement data are used. In particular, as measurements are drawn, a (continuously updated) posterior distribution for θ becomes available, and this can be used to update the Fisher metric (and hence the metric G) to define a sequence Γ t of optimal configurations; see Section 5.

4.3. Geodesics for the Configuration Manifold

While D-optimality allows us to determine where the sensors should move given some prior, it provides us with no guidance on which path the sensors should traverse, as they move through Ω , to reach their new, optimized positions.
A path Υ(t) from one configuration Γ_0 to another Γ_1 is a set of paths Γ(t) = {γ_i(t)}_{i=1}^N, one for each sensor S_i, from location λ_i^0 to λ_i^1. Varying the sensor locations is equivalent to varying the metric g(t) = g(Γ(t), θ̂(t)) and the estimate of the target location θ̂. The information gain along Υ, I(Υ), is then
$$I(\Upsilon) = \int_\Upsilon G_{g(t)}\big(g'(t), g'(t)\big)\, dt, \qquad (16)$$
and the extremal paths are again the geodesics of the metric G, by analogy with (8). The speed constraint observed earlier in Section 3.3 for the sensor geodesics also applies here, and is given by
$$\frac{dl_G}{dt} = \sqrt{G_{g(t)}\big(g'(t), g'(t)\big)}. \qquad (17)$$
Again, this leads to the conclusion that there are kinematic constraints on the rate of change of sensor parameters that lead to the collection of the maximum amount of information. In this case, it is the sensor using information to slow or increase the rate of change of its parameters relative to (17). That information is wasted (like a form of information heat) and results in less information available for localisation of the target and, hence, a poorer estimate of its location.

5. Configuration for Bearings-Only Sensors

To illustrate the mathematical machinery developed in the previous sections, consider first the configuration metric for two bearings-only sensors. The physical space Ω, where the sensors and the target reside, is chosen to be the square [−10, 10] × [−10, 10] ⊂ R², though the sensors are ‘elevated’ at some height z_i above the plane. Effectively, we assume an uninformative (uniform) prior over the square Ω.
The goal is to estimate a position θ from bearings-only measurements taken by the sensors, as in previous work [12,13]. We assume that measurements are drawn from a Von Mises distribution,
$$\mathcal{M}_n \colon \quad p_n(\,\cdot \mid \theta) = \frac{e^{\kappa \cos\left(\,\cdot\; -\, \arg(\theta - \lambda_n)\right)}}{2\pi I_0(\kappa)}, \qquad (18)$$
where κ is the concentration parameter, I_r is the r-th modified Bessel function of the first kind [34], and λ_n = (x_n, y_n) is the location of the n-th sensor in Cartesian coordinates. Note that, because the parameter κ is a measure of angular concentration, the location error will increase and the signal-to-noise ratio will decrease as the sensors move farther away from the target.
For the choice (18), the Fisher metric (4) can be computed easily, and has components
$$g = \kappa \frac{1 - I_2(\kappa)}{2 I_0(\kappa)} \sum_{i=1}^{N} \frac{1}{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2} \begin{pmatrix} (y - y_i)^2 & -(x - x_i)(y - y_i) \\ -(x - x_i)(y - y_i) & (x - x_i)^2 \end{pmatrix}, \qquad (19)$$
where κ = 1 is chosen as our noise parameter in what follows.
The target lives in the plane, thus z = 0, but the sensors live in a copy of the plane at height 1; i.e., we take z_i = 1. This prevents singularities from emerging within (19).
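For reference, the following sketch evaluates the Fisher metric (19) at a target location for a given set of elevated sensors. The helper name fisher_metric is ours, κ = 1 as in the text, and the Bessel prefactor follows the form of (19) as reconstructed above, so it should be checked against the original derivation before being relied on.

```python
import numpy as np
from scipy.special import i0, iv

KAPPA = 1.0
PREFACTOR = KAPPA * (1.0 - iv(2, KAPPA)) / (2.0 * i0(KAPPA))   # prefactor of (19)

def fisher_metric(target_xy, sensors):
    """2x2 Fisher metric (19) at a ground target (z = 0) for sensors [(x_i, y_i, z_i), ...]."""
    x, y = target_xy
    g = np.zeros((2, 2))
    for xi, yi, zi in sensors:
        dx, dy = x - xi, y - yi
        r2 = dx * dx + dy * dy + zi * zi      # (x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2 with z = 0
        g += np.array([[dy * dy, -dx * dy],
                       [-dx * dy, dx * dx]]) / r2
    return PREFACTOR * g

# Sensor locations used for Figure 3, elevated at height 1, and the target at (-1, -3).
sensors = [(-7.0, -6.0, 1.0), (0.0, 1.0, 1.0)]
g = fisher_metric((-1.0, -3.0), sensors)
print(g)
print("det g =", np.linalg.det(g))
```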

5.1. Target Geodesics

A geodesic, γ ( t ) , starting at p with initial direction v for a manifold with metric g is the solution of the coupled second-order initial value problem for the components γ i ( t ) :
$$\frac{d^2 \gamma^i}{dt^2} = -\Gamma^i_{jk} \frac{d\gamma^j}{dt} \frac{d\gamma^k}{dt}, \qquad \gamma(0) = p, \quad \gamma'(0) = v, \qquad (20)$$
where Γ j k i are the Christoffel symbols for the metric g.
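A direct way to generate paths like those in Figure 3 is to integrate (20) numerically, with the Christoffel symbols obtained by finite-differencing the metric. The sketch below does this for the bearings-only metric (19); the constant Bessel prefactor is dropped because an overall rescaling of the metric changes neither the Christoffel symbols nor the geodesics, and the step sizes and integration settings are ad hoc choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

SENSORS = [(-7.0, -6.0, 1.0), (0.0, 1.0, 1.0)]    # sensor locations of Figure 3

def metric(p):
    # Bearings-only Fisher metric (19) up to a constant factor (irrelevant for geodesics).
    x, y = p
    g = np.zeros((2, 2))
    for xi, yi, zi in SENSORS:
        dx, dy = x - xi, y - yi
        r2 = dx * dx + dy * dy + zi * zi
        g += np.array([[dy * dy, -dx * dy], [-dx * dy, dx * dx]]) / r2
    return g

def christoffel(p, h=1e-5):
    # Gamma^i_{jk} = 0.5 g^{il} (d_j g_{lk} + d_k g_{lj} - d_l g_{jk}), central differences.
    dg = np.zeros((2, 2, 2))                      # dg[m] = d g / d x^m
    for m in range(2):
        e = np.zeros(2)
        e[m] = h
        dg[m] = (metric(p + e) - metric(p - e)) / (2.0 * h)
    ginv = np.linalg.inv(metric(p))
    gamma = np.zeros((2, 2, 2))
    for i in range(2):
        for j in range(2):
            for k in range(2):
                gamma[i, j, k] = 0.5 * sum(ginv[i, l] * (dg[j][l, k] + dg[k][l, j] - dg[l][j, k])
                                           for l in range(2))
    return gamma

def rhs(t, state):
    p, v = state[:2], state[2:]
    acc = -np.einsum("ijk,j,k->i", christoffel(p), v, v)
    return np.concatenate([v, acc])

# Target starting at (-1, -3) heading along (cos 0.5, sin 0.5), integrated to t = 5.
p0, v0 = np.array([-1.0, -3.0]), np.array([np.cos(0.5), np.sin(0.5)])
sol = solve_ivp(rhs, (0.0, 5.0), np.concatenate([p0, v0]), max_step=0.05)
print("geodesic endpoint:", sol.y[:2, -1])
```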
Figure 3 shows directly integrated solutions to the geodesic Equation (20) for a target at (−1, −3) and sensors at (−7, −6) and (0, 1) (raised at a height 1 above Ω). The differing paths correspond to the initial direction vector (cos φ, sin φ) of the target, varying as φ varies from 0 to 2π radians in steps of 0.25 radians.
The solutions of the geodesic equation are generated from t = 0 to 5; this means that the length of each path corresponds to the information given away by the target. The longest line corresponds to the target forming a right angle with the sensors, which is the optimal arrangement for bearings-only sensors. The shortest geodesics occur when the target moves on the line joining the two sensors. This is usually where a location cannot be determined; that singularity is avoided in this example because the sensors are arranged above the plane inhabited by the target, but it is still the place where the location is most difficult to determine.
An alternative way to numerically compute the geodesics connecting two points on the manifold M is using the Fast Marching Method (FMM) [35,36,37]. Since the Fisher–Rao information metric is a Riemannian metric on the Riemannian manifold M, one can show that the geodesic distance map u, the geodesic distance from the initial point p to a point θ , satisfies the Eikonal equation
$$\big\| \nabla u(\theta) \big\|_{g_\theta^{-1}} = 1 \qquad (21)$$
with initial condition u ( p ) = 0 . By using a mesh over the parameter space, the (isotropic or weakly anisotropic) Eikonal Equation (21), can be solved numerically by Fast Marching. The geodesic is then extracted by integrating numerically the ordinary differential equation
$$\frac{d\gamma(t)}{dt} = -\eta_t\, g_{\theta(t)}^{-1}\, \nabla u(\theta(t)),$$
where η_t > 0 is the mesh size. The computational complexity of FMM is O(N log N), where N is the total number of mesh grid points. We perform FMM on a central computational node in this paper. If the computational capability of that node is limited, a parallel version of FMM may be used instead; we will consider this in future work. For Eikonal equations with strong anisotropy, a generalized version of FMM, the Ordered Upwind Method (OUM) [38], is preferred.
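The sketch below is a deliberately crude stand-in for the Fast Marching computation behind Figure 4: it runs Dijkstra's algorithm on an 8-connected grid over Ω, with edge lengths measured by the (reconstructed, prefactor-free) bearings-only Fisher metric at the edge midpoint. The grid resolution and connectivity are arbitrary; a real implementation would use FMM or OUM as described above.

```python
import heapq
import numpy as np

SENSORS = [(-7.0, -6.0, 1.0), (0.0, 1.0, 1.0)]    # sensors of Figures 3 and 4

def metric(x, y):
    # Bearings-only Fisher metric up to a constant factor (only rescales distances).
    g = np.zeros((2, 2))
    for xi, yi, zi in SENSORS:
        dx, dy = x - xi, y - yi
        r2 = dx * dx + dy * dy + zi * zi
        g += np.array([[dy * dy, -dx * dy], [-dx * dy, dx * dx]]) / r2
    return g

n = 81
xs = np.linspace(-10.0, 10.0, n)
ys = np.linspace(-10.0, 10.0, n)

def edge_length(i, j, di, dj):
    # Metric length of the grid step, using the metric at the midpoint of the edge.
    step = np.array([xs[i + di] - xs[i], ys[j + dj] - ys[j]])
    mid = (xs[i] + 0.5 * step[0], ys[j] + 0.5 * step[1])
    return float(np.sqrt(step @ metric(*mid) @ step))

start = (int(np.abs(xs + 1.0).argmin()), int(np.abs(ys + 3.0).argmin()))  # node nearest (-1, -3)
dist = np.full((n, n), np.inf)
dist[start] = 0.0
heap = [(0.0, start)]
moves = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]

while heap:
    d, (i, j) = heapq.heappop(heap)
    if d > dist[i, j]:
        continue
    for di, dj in moves:
        if 0 <= i + di < n and 0 <= j + dj < n:
            nd = d + edge_length(i, j, di, dj)
            if nd < dist[i + di, j + dj]:
                dist[i + di, j + dj] = nd
                heapq.heappush(heap, (nd, (i + di, j + dj)))

# Approximate geodesic distance from (-1, -3) to the node nearest (5, 5).
print(dist[int(np.abs(xs - 5.0).argmin()), int(np.abs(ys - 5.0).argmin())])
```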
Compare Figure 3 with Figure 4 which uses a Fast Marching algorithm to calculate the geodesic distance from the same point.
Figure 5 shows the speed required along a geodesic travelling in the direction of the vector (1, −1). In the case of these bearings-only sensors, the trade-off over speed is between time-on-target and rate of change of the relative angle between the sensors and the target. Travelling slowly gives the sensors more time to integrate measurements, even though the change in angle is slower, resulting in more accurate position estimates. Conversely, movement faster than the geodesic speed produces a larger change in the angle measurements, despite leaving less time for measurements, again resulting in a more accurate estimate. Only at the geodesic speed is the balance reached and the minimum information criterion achieved.

5.2. Configuration Metric Calculations

The coordinates z^i on the configuration manifold are {x_1, y_1, x_2, y_2}. The actual positions of the sensors are raised above Ω by a distance z_i. The sensor management problem amounts to, given some initial configuration, identifying a choice of {x_1, y_1, x_2, y_2} for which the determinant of (14) is maximized, and traversing geodesics γ_i in the configuration manifold, starting at the initial locations and ending at the D-optimal locations; see Figure 2. We assume that the target location is given by an uninformative prior distribution; that is, P(θ ∈ A) = vol(A)/vol(Ω) for all A ⊂ Ω.
To make the problem tractable, we consider a simple case where one of the sensor trajectories is fixed; that is, we consider a two-dimensional submanifold of 𝓜(Γ) parameterized by the coordinates (x_2, y_2) of S² only. Figure 6 shows a contour plot of det(G), as a function of (x_2, y_2), for the case where S¹ moves from (0, 1) (yellow dot) to (2, 3) (black dot). The second sensor S² begins at the point (−6, −7) (red dot). Figure 6 demonstrates that, provided S¹ moves from (0, 1) to (2, 3), det(G) is maximized at the point (−1, −5.5), implying that S² moving to (−1, −5.5) is the D-optimal choice. The geodesic path γ_1 beginning at (0, 1) and ending at (2, 3) is shown by the dashed yellow curve; this is the information-maximizing path of S¹ through Ω. Similarly for S², the geodesic path γ_2 beginning at (−6, −7) and ending at (−1, −5.5) is shown by the dotted red curve.
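The following sketch makes this procedure concrete: it assembles the 2 × 2 configuration metric (14) in the coordinates (x_2, y_2) of the movable sensor, using finite differences for the tangent vectors of (15) and a coarse quadrature over Ω, and then grid-searches for the det(G)-maximizing position. The fixed sensor position, quadrature grid, finite-difference step and near-singularity cutoff are ad hoc numerical choices, not values taken from the paper, so the output should be treated as illustrative only.

```python
import numpy as np

# Crude sketch of the Section 5.2 computation: configuration metric G in (x2, y2), with
# S1 held fixed, followed by a grid search for the S2 position maximizing det(G).
S1 = (2.0, 3.0, 1.0)        # S1 held at its final position (2, 3), elevated at height 1
Z2 = 1.0                    # elevation of the movable sensor S2

def fisher(target, s2):
    # Bearings-only Fisher metric (19) at the target, up to the constant Bessel prefactor.
    x, y = target
    g = np.zeros((2, 2))
    for xi, yi, zi in (S1, (s2[0], s2[1], Z2)):
        dx, dy = x - xi, y - yi
        r2 = dx * dx + dy * dy + zi * zi
        g += np.array([[dy * dy, -dx * dy], [-dx * dy, dx * dx]]) / r2
    return g

# Quadrature nodes over Omega (uninformative prior for the target position).
omega = [(x, y) for x in np.linspace(-9.5, 9.5, 15) for y in np.linspace(-9.5, 9.5, 15)]
dA = (19.0 / 14.0) ** 2
EPS = 1e-3                  # skip nodes where g is (nearly) singular

def config_metric(s2, h=1e-4):
    G = np.zeros((2, 2))
    for p in omega:
        g = fisher(p, s2)
        det = np.linalg.det(g)
        if det < EPS:
            continue
        ginv = np.linalg.inv(g)
        # Tangent vectors dg/dx2 and dg/dy2 by central finite differences (cf. (15)).
        dg = [(fisher(p, s2 + e) - fisher(p, s2 - e)) / (2.0 * h)
              for e in (np.array([h, 0.0]), np.array([0.0, h]))]
        for a in range(2):
            for b in range(2):
                G[a, b] += np.trace(ginv @ dg[a] @ ginv @ dg[b]) * np.sqrt(det) * dA
    return G

# Coarse grid search for the S2 position with maximal det(G).
candidates = [np.array([x, y]) for x in np.linspace(-9.0, 9.0, 13)
              for y in np.linspace(-9.0, 9.0, 13)]
best = max(candidates, key=lambda c: np.linalg.det(config_metric(c)))
print("D-optimal S2 position (coarse grid):", best)
```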

5.3. Visualizing Configuration Geodesics

Figure 7 shows solutions to the geodesic equation on Ω = [−10, 10] × [−10, 10] for a sensor starting at (0, 1), with the other sensor stationary at (−7, −6) and the target at (−1, −3). The actual positions of the sensors are raised above Ω by a distance 1. The differing paths correspond to φ varying through [0, 2π] in steps of 0.25 radians in the target initial direction vector (cos φ, sin φ). The integration of the geodesic equation is carried to t = 5, so the length of each line corresponds to the information obtained about the target. It is clear that the longer lines correspond to the sensors forming a right-angle relationship to the target, which is optimal for bearings-only sensors. This should be compared with Figure 8, which uses a Fast Marching algorithm [39,40] to calculate the geodesic distance from the same point. Figure 9 shows the speed required along a geodesic travelling through each point in the direction of the vector (1, −1).

6. Discussion

We consider the problem of sensor management from an information-geometric standpoint [8,9]. A physical space houses a target, with an unknown position, and a collection of mobile sensors, each of which takes measurements with the aim of gaining information about target location [12,13,14]. The measurement process is parametrized by the relative positions of the sensors. For example, if one considers an array of sensors that take bearings-only measurements to the target (18), the amount of information that can be extracted regarding target location clearly depends on the angles between the sensors. In general, we illustrate that in order to optimize the amount of information the sensors can obtain about the target, the sensors should move to positions which maximize the norm of the volume form (‘D-optimality’) on a particular manifold imbued with a metric (14) which measures the distance (information content difference) between Fisher matrices [30,31]. We also show that, if the sensors move along geodesics (with respect to (14)) to reach the optimal configuration, the amount of information that they give away to the target is minimized. This paves the way for (future) discussions about game-theoretic scenarios where both the target and the sensors are competitively trying to acquire information about one another from stochastic measurements; see, e.g., [41,42] for a discussion on such games. Differential games along these lines will be addressed in forthcoming work.
It is important to note that while the measurements from sensors may be discrete and subject to sampling limits, the underlying model of target and sensor parameters is continuous. We have introduced two discretisations of our manifold to calculate approximations to our optimal reconfiguration paths and to enable some helpful visualisations. The explicit solution of the geodesic equation in Figure 3 uses a numerical differential equation solver on a coordinate patch over the sensor manifold. The Fast Marching visualisation of Figure 4 uses a meshgrid over the same coordinate patch. Future work on optimal control of sensor parameters and game-theoretic competitions between the information demands of sensors and targets may require more sophisticated discretisations, which are available in the literature [43,44].
We hope that this work may eventually have realistic applications to signal processing problems involving parameter estimation using sensors. We have demonstrated that there is a theoretical way of choosing sensor positions, velocities, and possibly other parameters in an optimal manner so that the maximum amount of useful data can be harvested from a sequence of measurements taken by the sensors. For example, with sensors that take continuous or discrete measurements, this potentially allows one to design a system that minimizes the expected amount of time taken to localize (with some given precision) the position of a target. If the sensors move along paths that are geodesic with respect to (14), then the target, in some sense, learns the least about its trackers. This allows the sensors to prevent either intentional or unintentional evasive manoeuvres; a unique aspect of information-geometric considerations. Ultimately, these ideas may lead to improvements on search or tracking strategies available in the literature, e.g., [45,46]. Though we have only considered simple sensor models in this paper, the machinery can, in principle, be adopted to systems of arbitrary complexity. It would certainly be worth testing the theoretical ideas presented in this paper experimentally using various sensor setups.

Author Contributions

Conceptualization, validation, analysis, investigation, writing—original draft preparation, S.W., A.G.S. and B.M.; methodology, writing—review and editing, S.W., A.G.S., Z.W. and B.M.; software, visualization, S.W., A.G.S. and Z.W.; resources, supervision, funding acquisition, B.M.; project administration, S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the US Air Force Office of Scientific Research (AFOSR) Under Grant No. FA9550-12-1-0418. This work also received funding from the Australian Government, via grant AUS- 7 MURIB000001 associated with ONR MURI grant N00014-19-1-2571. Simon Williams acknowledges an Outside Studies Program grant from Flinders University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Proof of Theorem

Appendix A.1. Mise en Scène

Let M be a manifold parametrizing a family of probability densities M ∋ θ ↦ p(x | θ) = p_θ(x). This map is assumed 1–1 in the sense that, if p(x | θ) = p(x | θ′) for almost every x, then θ = θ′. We assume that this family is continuously differentiable in the sense that ∂p(x | θ)/∂θ is continuous as a function of θ, for almost every x ∈ Ω. It will be convenient to write the likelihood as:
$$\ell_x(\theta) = p(x \mid \theta)$$
to emphasise the dependence on θ .
This gives rise to the Fisher Information in the usual way:
$$\mathrm{FI}(\theta) = \mathbb{E}\big[\, d\ell \otimes d\ell \,\big].$$
Provided this is non-degenerate it defines a Riemannian metric on M —the Fisher-Rao metric.
The total variation metric is defined on M by
$$\rho_{\mathrm{TV}}(\theta, \theta') = \int_\Omega \big| p(x \mid \theta) - p(x \mid \theta') \big| \, dx,$$
where d x indicates the underlying measure on the space Ω . This is a true metric.
We recall two divergences: the Kullback-Leibler divergence and the Jensen-Shannon divergence:
$$D_{\mathrm{KL}}(p_\theta \,\|\, p_{\theta'}) = D_{\mathrm{KL}}(\theta \,\|\, \theta') = \int_\Omega p(x \mid \theta) \log \frac{p(x \mid \theta)}{p(x \mid \theta')} \, dx$$
$$D_{\mathrm{JS}}(\theta \,\|\, \theta') = \frac{1}{2}\big[ D_{\mathrm{KL}}(p_\theta \,\|\, q) + D_{\mathrm{KL}}(p_{\theta'} \,\|\, q) \big],$$
where q = 1 2 ( p θ + p θ ) . Pinsker’s inequality says:
$$\rho_{\mathrm{TV}}(\theta, \theta') \le \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(\theta \,\|\, \theta')}\,,$$
and Endres & Schindelin have proved [26] that D JS is a metric in its own right. Both of these satisfy the basic property of divergences:
$$D(\theta \,\|\, \theta') \ge 0 \quad \text{and} \quad D(\theta \,\|\, \theta') = 0 \iff \theta = \theta'.$$
Assume that h ∈ T_θ(M)—the tangent space of M at θ. Then
$$D_{\mathrm{KL}}\big(\exp_\theta(t h) \,\|\, \theta\big) = \tfrac{1}{2}\, t^2\, \mathrm{FI}_\theta(h, h) + O(t^3) \qquad \text{(A8)}$$
A similar equation holds for the Jensen-Shannon divergence:
$$D_{\mathrm{JS}}\big(\exp_\theta(t h) \,\|\, \theta\big) = c\, t^2\, \mathrm{FI}_\theta(h, h) + O(t^3).$$

Appendix A.2. The Information Metric

We define
$$D_S(\theta \,\|\, \theta') = \tfrac{1}{2}\big[ D_{\mathrm{KL}}(\theta \,\|\, \theta') + D_{\mathrm{KL}}(\theta' \,\|\, \theta) \big],$$
$$\rho^{\epsilon}_{\mathrm{KL}}(\gamma) = \inf\left\{ \sum_{k=1}^{N} \sqrt{D_S\big(\gamma(t_k), \gamma(t_{k-1})\big)} \;:\; \gamma(0) = \theta,\ \gamma(1) = \theta',\ 0 = t_0 < t_1 < \cdots < t_N = 1,\ D_S\big(\gamma(t_k), \gamma(t_{k-1})\big) < \epsilon\ \ \forall k \right\} \qquad \text{(A11)}$$
and
$$\rho_{\mathrm{KL}}(\theta, \theta') = \inf_{\gamma} \lim_{\epsilon \to 0} \rho^{\epsilon}_{\mathrm{KL}}(\gamma)$$
Here γ is any path from θ to θ′ for which the limit is finite. Similarly, we can define
$$\rho_{\mathrm{JS}}(\theta, \theta') = \inf_{\gamma} \lim_{\epsilon \to 0} \inf\left\{ \sum_{k=1}^{N} \sqrt{D_{\mathrm{JS}}\big(\gamma(t_k), \gamma(t_{k-1})\big)} \;:\; \gamma(0) = \theta,\ \gamma(1) = \theta',\ 0 = t_0 < t_1 < \cdots < t_N = 1,\ D_{\mathrm{JS}}\big(\gamma(t_k), \gamma(t_{k-1})\big) < \epsilon\ \ \forall k \right\}$$
The Jensen-Shannon version does not require symmetrization. Also the fact that D JS is actually a metric means that there is no problem about finiteness.
It is straightforward to show that both ρ_KL and ρ_JS satisfy the symmetry and triangle inequality properties of metrics. The zero property for ρ_JS just follows from the fact that D_JS is a metric in its own right, and that for ρ_KL follows from Pinsker's inequality:
$$\rho_{\mathrm{TV}}(\theta, \theta') \le \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(\theta \,\|\, \theta')}$$
The C¹ nature of the likelihoods and compactness of the paths γ yields the following lemma:
Lemma A1.
$$\lim_{\epsilon \to 0} \rho^{\epsilon}_{\mathrm{KL}}(\gamma) = \int_0^1 \sqrt{\tfrac{1}{2}\, \mathrm{FI}\big(\gamma(t)\big)\big(d\gamma(t), d\gamma(t)\big)} \; dt \qquad \text{(A15)}$$
where d γ ( t ) is the tangent vector of γ at γ ( t ) .
This confirms finiteness of ρ_KL(θ, θ′). Similarly for D_JS:
Lemma A2.
$$\lim_{\epsilon \to 0} \rho^{\epsilon}_{\mathrm{JS}}(\gamma) = \int_0^1 \sqrt{c\, \mathrm{FI}\big(\gamma(t)\big)\big(d\gamma(t), d\gamma(t)\big)} \; dt \qquad \text{(A16)}$$
where d γ ( t ) is the tangent vector of γ at γ ( t ) .
One obvious consequence is that ρ JS and ρ KL are just multiples of each other.
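The construction above can be checked numerically on a one-parameter family. For the scale family N(0, σ²), whose Fisher information in σ is 2/σ², the partition sums of √D_S converge to ∫√(FI/2) dσ = log(σ_b/σ_a), as Lemma A1 predicts; the family and the endpoints below are illustrative choices.

```python
import numpy as np

def d_kl(s1, s2):
    # KL divergence between N(0, s1^2) and N(0, s2^2).
    return np.log(s2 / s1) + s1 ** 2 / (2.0 * s2 ** 2) - 0.5

def d_s(s1, s2):
    # Symmetrized KL divergence D_S defined above.
    return 0.5 * (d_kl(s1, s2) + d_kl(s2, s1))

sigma_a, sigma_b = 1.0, 3.0
for n_pts in (10, 100, 1000, 10000):
    sig = np.linspace(sigma_a, sigma_b, n_pts)
    partial = sum(np.sqrt(d_s(sig[k], sig[k - 1])) for k in range(1, n_pts))
    print(n_pts, "points:", partial)

print("limit predicted by Lemma A1:", np.log(sigma_b / sigma_a))   # log 3 ~ 1.0986
```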

Appendix A.3. Length Metrics

Here we insert a brief discussion of length metrics. Our main reference is [47].
Let (X, d) be a metric space. The length of a curve (that is, a continuous function γ : [0, 1] → X) is
$$\mathrm{length}(\gamma) = \sup\left\{ \sum_{k=1}^{N} d\big(\gamma(t_k), \gamma(t_{k-1})\big) : 0 = t_0 < t_1 < \cdots < t_N = 1 \right\}. \qquad \text{(A17)}$$
Now ( X , d ) is called a length space and d a length metric if
$$d(x, y) = \inf\big\{ \mathrm{length}(\gamma) : \gamma \text{ is a curve from } x \text{ to } y \big\} \qquad \forall\, x, y \in X.$$
If, for all x, y ∈ X, there is a curve γ that achieves the infimum, then (X, d) is called a geodesic space.
A metric space is proper if every closed and bounded set is compact. We note that any noncontracting map f : X X is an isometry if X is a proper metric space. The following two results are in [47]:
Proposition A1.
Every proper length space is geodesic.
Theorem A1.
(Hopf–Rinow). Every locally compact, complete length space is proper.

Appendix A.4. Extremal Property of the Information Metric

First we sort out the differences in the definitions in (A17) and (A11) regarding the lengths of curves. We note that by the triangle inequality, we could replace (A17) by
$$\mathrm{length}(\gamma) = \lim_{\epsilon \to 0} \sup\left\{ \sum_{k=1}^{N} d\big(\gamma(t_k), \gamma(t_{k-1})\big) : 0 = t_0 < t_1 < \cdots < t_N = 1,\ d\big(\gamma(t_k), \gamma(t_{k-1})\big) < \epsilon\ \ \forall k \right\},$$
and now using (A8) we see that, for our case, the length of a curve defined in this way starting from ρ KL (resp. ρ JS ) is just the integral defined in (A15) (resp. (A16)). This proves that ρ KL and ρ JS are indeed length metrics. In fact, we have the following theorems:
Theorem A2.
The metric ρ_KL is the largest length metric ρ such that ρ(θ, θ′)² ≤ 2 D_KL(θ ‖ θ′) for all θ, θ′ ∈ M.
Theorem A3.
The metric ρ_JS is the largest length metric ρ such that ρ(θ, θ′)² ≤ 2 D_JS(θ ‖ θ′) for all θ, θ′ ∈ M.
Since M is a proper space, these are geodesic metrics—with the same geodesics.

References

1. Bell, M.R. Information theory and radar waveform design. IEEE Trans. Inf. Theory 1993, 39, 1578–1597.
2. Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306.
3. Sowelam, S.M.; Tewfik, A.H. Waveform selection in radar target classification. IEEE Trans. Inf. Theory 2000, 46, 1014–1029.
4. Kershaw, D.J.; Evans, R.J. Optimal waveform selection for tracking systems. IEEE Trans. Inf. Theory 1994, 40, 1536–1550.
5. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2012.
6. Rao, C.R. Information and accuracy attainable in estimation of statistical parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–91.
7. Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. Ser. A 1946, 186, 453–461.
8. Amari, S. Information geometry on hierarchy of probability distributions. IEEE Trans. Inf. Theory 2001, 47, 1701–1711.
9. Amari, S.I.; Nagaoka, H. Methods of Information Geometry; American Mathematical Soc.: Providence, RI, USA, 2007; Volume 191.
10. Amari, S.I. Information geometry. Jpn. J. Math. 2021, 16, 1–48.
11. Nielsen, F. (Ed.) Progress in Information Geometry: Theory and Applications; Springer International Publishing: New York, NY, USA, 2021.
12. Moran, B.; Howard, S.; Cochran, D. An information-geometric approach to sensor management. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 5261–5264.
13. Moran, W.; Howard, S.D.; Cochran, D.; Suvorova, S. Sensor management via Riemannian geometry. In Proceedings of the 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 1–5 October 2012; pp. 358–362.
14. Mentre, F.; Mallet, A.; Baccar, D. Optimal design in random-effects regression models. Biometrika 1997, 84, 429–442.
15. Amari, S.I.; Cichocki, A.; Yang, H.H. A new learning algorithm for blind signal separation. In Advances in Neural Information Processing Systems; Morgan Kaufmann Publishers: Burlington, MA, USA, 1996; pp. 757–763.
16. Vallisneri, M. Use and abuse of the Fisher information matrix in the assessment of gravitational-wave parameter-estimation prospects. Phys. Rev. D 2008, 77, 042001-1–042001-20.
17. Carter, K.M.; Raich, R.; Finn, W.G.; Hero, A.O., III. Information-geometric dimensionality reduction. IEEE Signal Process. Mag. 2011, 28, 89.
18. Peter, A.; Rangarajan, A. Shape analysis using the Fisher-Rao Riemannian metric: Unifying shape representation and deformation. In Proceedings of the 3rd IEEE International Symposium on Biomedical Imaging: Nano to Macro, Arlington, VA, USA, 6–9 April 2006; pp. 1164–1167.
19. Montúfar, G.; Rauh, J.; Ay, N. On the Fisher metric of conditional probability polytopes. Entropy 2014, 16, 3207–3233.
20. Beldjoudi, G.; Rebuffel, V.; Verger, L.; Kaftandjian, V.; Rinkel, J. An optimised method for material identification using a photon counting detector. Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 2012, 663, 26–36.
21. Suvorova, S.; Moran, B.; Howard, S.D.; Cochran, D. Control of sensing by navigation on information gradients. In Proceedings of the IEEE Global Conference on Signal and Information Processing (GlobalSIP), Austin, TX, USA, 3–5 December 2013; pp. 197–200.
22. Cochran, D.; Hero, A.O. Information-driven sensor planning: Navigating a statistical manifold. In Proceedings of the IEEE Global Conference on Signal and Information Processing (GlobalSIP), Austin, TX, USA, 3–5 December 2013; pp. 1049–1052.
23. Gil-Medrano, O.; Michor, P.W. The Riemannian manifold of all Riemannian metrics. Q. J. Math. 1991, 42, 183–202.
24. DeWitt, B.S. Quantum theory of gravity. I. The canonical theory. Phys. Rev. 1967, 160, 1113.
25. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
26. Endres, D.; Schindelin, J. A New Metric for Probability Distributions. IEEE Trans. Inf. Theory 2002, 49, 1858–1860.
27. Jazwinski, A. Stochastic Processes and Filtering Theory; Number 64 in Mathematics in Science and Engineering; Academic Press: New York, NY, USA, 1970.
28. Steinberg, D.M.; Hunter, W.G. Experimental design: Review and comment. Technometrics 1984, 26, 71–97.
29. Walter, É.; Pronzato, L. Qualitative and quantitative experiment design for phenomenological models—A survey. Automatica 1990, 26, 195–213.
30. Sebastiani, P.; Wynn, H.P. Bayesian experimental design and Shannon information. Proc. Sect. Bayesian Stat. Sci. 1997, 44, 176–181.
31. Ford, I.; Titterington, D.; Kitsos, C.P. Recent advances in nonlinear experimental design. Technometrics 1989, 31, 49–60.
32. Carmo, M.P.D. Riemannian Geometry; Birkhäuser: Basel, Switzerland, 1992.
33. Clarke, B. The Completion of the Manifold of Riemannian Metrics with Respect to its L2 Metric. arXiv 2009, arXiv:0904.0159.
34. Forbes, C.; Evans, M.; Hastings, N.; Peacock, B. Statistical Distributions; John Wiley & Sons: Hoboken, NJ, USA, 2011.
35. Sethian, J.A. A fast marching level set method for monotonically advancing fronts. Proc. Natl. Acad. Sci. USA 1996, 93, 1591–1595.
36. Sethian, J.A. Fast marching methods. SIAM Rev. 1999, 41, 199–235.
37. Tsitsiklis, J.N. Efficient algorithms for globally optimal trajectories. IEEE Trans. Autom. Control 1995, 40, 1528–1538.
38. Sethian, J.A.; Vladimirsky, A. Ordered upwind methods for static Hamilton–Jacobi equations: Theory and algorithms. SIAM J. Numer. Anal. 2003, 41, 325–363.
39. Kimmel, R.; Sethian, J.A. Computing geodesic paths on manifolds. Proc. Natl. Acad. Sci. USA 1998, 95, 8431–8435.
40. Peyré, G.; Péchaud, M.; Keriven, R.; Cohen, L.D. Geodesic methods in computer vision and graphics. Found. Trends Comput. Graph. Vis. 2010, 5, 197–397.
41. Rhodes, I.; Luenberger, D. Differential games with imperfect state information. IEEE Trans. Autom. Control 1969, 14, 29–38.
42. Hamadene, S.; Lepeltier, J.P. Zero-sum stochastic differential games and backward equations. Syst. Control Lett. 1995, 24, 259–263.
43. Keller, M. Intrinsic Metrics on Graphs: A Survey. In Mathematical Technology of Networks; Mugnolo, D., Ed.; Springer International Publishing: Cham, Switzerland, 2015; pp. 81–119.
44. Bobenko, A.I.; Sullivan, J.M.; Schröder, P.; Ziegler, G.M. (Eds.) Discrete Differential Geometry; Number 38 in Oberwolfach Seminars; Springer: Berlin/Heidelberg, Germany, 2008.
45. Padula, S.L.; Kincaid, R.K. Optimization strategies for sensor and actuator placement. In Langley Research; NASA: Washington, DC, USA, 1999.
46. Sheng, X.; Hu, Y.H. Maximum likelihood multiple-source localization using acoustic energy measurements with wireless sensor networks. IEEE Trans. Signal Process. 2005, 53, 44–53.
47. Alexander, S.; Kapovitch, V.; Petrunin, A. Invitation to Alexandrov Geometry: CAT(0) Spaces; Springer: Berlin/Heidelberg, Germany, 2003.
Figure 3. Solutions to the geodesic equation on Ω = [−10, 10] × [−10, 10] for a target starting at (1, 3) and sensors at (7, 6) and (0, 1) (raised at a height 1 above Ω). The differing paths correspond to the initial direction vector (cos θ, sin θ) of the target, as θ varies from 0 to 2π radians in steps of 0.25 radians.
Figure 4. Geodesic distance on Ω = [−10, 10] × [−10, 10] from the point (1, 3) with sensors at (7, 6) and (0, 1) (raised at a height 1 above Ω). The distance was calculated using a Fast Marching formulation of the geodesic equation. The fact that the geodesics follow the gradient of this distance allows comparison with Figure 3.
Figure 5. Geodesic speed for each point in Ω = [−10, 10] × [−10, 10] for targets departing each point in the direction of the vector (1, −1).
Figure 6. Contour plot of det(G), for G given by (14), for the case of three sensors, where S1 is fixed at (2, 2) and S2 at (2, 2) (red dot). Brighter shades indicate a greater value of det(G); det(G) is maximized at (1, 5.5). The third sensor is allowed to move. The geodesic path linking the initial and final positions of S3, starting at (6, 6) (yellow dot), is shown by the dashed yellow curve, while the dashed red curve shows another geodesic path linking (6, 7) (red dot) to the D-optimal location (1, 5.5). The actual positions of the sensors are raised above Ω by a distance z_i.
Figure 7. Solutions to the geodesic equation on Ω = [−10, 10] × [−10, 10] for a sensor starting at (0, 1) with the other sensors stationary at (2, 3) and (3, 2) (raised at a height 1 above Ω). The target is located at (0, 1) on Ω. The differing paths correspond to the initial direction vector (cos ϕ, sin ϕ) of the sensor, as ϕ varies from 0 to 2π radians in steps of 0.25 radians.
Figure 8. Geodesic distance on Ω = [−10, 10] × [−10, 10] for a sensor starting at (0, 1), with the other sensor stationary at (7, 6) (raised at a height 1 above Ω), using an uninformative prior for the target. The distance was calculated using a Fast Marching formulation of the geodesic equation. The fact that the geodesics follow the gradient of this distance allows comparison with Figure 7.
Figure 9. Geodesic speed for a sensor moving along a geodesic in the direction (1, −1) at each point in Ω = [−10, 10] × [−10, 10]. The other sensor is stationary at (7, 6), and an uninformative prior is used for the target. The actual positions of the sensor are raised above Ω by a distance 1.