1. Introduction
Consider a joint distribution over "source" variables X₀ and X₁ and a "target" variable Y. Such distributions arise in many settings: sensory integration, logical computing, neural coding, functional network inference, and many others. One promising approach to understanding how the information shared between X₀, X₁, and Y is organized is the partial information decomposition (PID) [1]. This decomposition seeks to quantify how much of the information shared between X₀, X₁, and Y is shared redundantly, how much is uniquely attributable to X₀, how much is uniquely attributable to X₁, and finally how much arises synergistically, appearing only when X₀ and X₁ are considered together.
Unfortunately, the lack of a commonly accepted method of quantifying these components has hindered PID's adoption. In point of fact, several proposed axioms are not mutually consistent [2,3]. Furthermore, to date, there is little agreement as to which should hold. Here, we take a step toward understanding these issues by adopting an operational definition for the unique information. This operational definition comes from information-theoretic cryptography and quantifies the rate at which two parties can construct a secret while a third party eavesdrops. Said more simply, for a source and the target to uniquely share information, no other variable can have any portion of that information—their uniquely shared information is a secret that only the source and target have.
There are four varieties of secret key agreement rate depending on which parties are allowed to communicate, each of which defines a different PID. Each variety also relates to a different intuition as to how the PID operates. We discuss several aspects of these different methods and further demonstrate that three of the four fail to construct an internally consistent decomposition. The surviving method induces a directionality on the PID that has not been explicitly considered before.
Our development proceeds as follows.
Section 2 briefly describes the two-source PID.
Section 3 reviews the notion of secret key agreement rate and how to quantify it in three contexts: no one communicates, only Alice communicates, and both Alice and Bob communicate.
Section 4 discusses the behavior of the PID quantified utilizing secret key agreement rates as unique information and what intuitions are implied by the choice of who is permitted to communicate.
Section 5 compares the behavior of the one consistent secret key agreement rate PID with several others proposed in the literature.
Section 6 explores two further implications of our primary results, first in a distribution where two-way communication seems to capture synergistic, third-order connected information and second in the behavior of an extant method of quantifying the PID along with maximum entropy methods. Finally,
Section 7 summarizes our findings and speculates about PID’s future.
3. Secret Key Agreement
Secret key agreement is a fundamental concept within information-theoretic cryptography [4]. Consider three parties—Alice, Bob, and Eve—who each partially observe a source of common randomness distributed according to a joint probability distribution p(a, b, e), where Alice has access only to a, Bob only to b, and Eve only to e. The central challenge is to determine whether Alice and Bob can agree on a secret key of which Eve has no knowledge. The degree to which they may generate such a secret key depends immediately on the structure of the joint distribution p(a, b, e). It also depends on whether Alice and Bob are allowed to communicate publicly.
Concretely, consider Alice, Bob, and Eve each receiving n independent, identically distributed samples drawn according to p(a, b, e)—Alice receiving Aⁿ, Bob Bⁿ, and Eve Eⁿ, where Xⁿ denotes a sequence of random variables X₁ X₂ … Xₙ. Note that, although each party's observations are independent across samples, the observations of different parties at the same time are correlated according to p(a, b, e). A secret key agreement scheme consists of functions f and g, as well as a protocol h for public communication allowing either Alice, Bob, neither, or both to communicate. In the case of a single party being permitted to communicate—say, Alice—she constructs C = h(Aⁿ) and then broadcasts it to all parties over an authenticated channel. In the case that both parties are permitted communication, they take turns constructing and broadcasting messages of the form Cᵢ = hᵢ(Aⁿ, C₁, …, Cᵢ₋₁) (Alice) and Cᵢ = hᵢ(Bⁿ, C₁, …, Cᵢ₋₁) (Bob) [5]. Said more plainly, Alice's public messages are a function of her observations and any prior public communication from both parties; likewise, Bob's public messages are a function of his observations and any prior public communication from both parties.
Formally, a secret key agreement scheme is considered R-achievable if, for all ε > 0:

K = f(Aⁿ, C),
K′ = g(Bⁿ, C),
Pr[K ≠ K′] ≤ ε,
I[K : C, Eⁿ] ≤ ε, and
(1/n) H[K] ≥ R − ε,

where f and g denote the methods by which Alice and Bob construct their keys K and K′, respectively; the third condition states that their keys must agree with arbitrarily high probability; the fourth states that the information about the key to which Eve—armed with both her private information Eⁿ as well as the public communication C—has access is arbitrarily small; and the fifth states that the key consists of approximately R bits per sample.
The greatest rate R such that an achievable scheme exists is known as the secret key agreement rate. Notational variations indicate which parties are permitted to communicate. In the case that Alice and Bob are not allowed to communicate, their rate of secret key agreement is denoted S(A ↮ B ‖ E). When only Alice is allowed to communicate, their secret key agreement rate is S(A → B ‖ E) or, equivalently, S(B ← A ‖ E). When both Alice and Bob are allowed to communicate, their secret key agreement rate is denoted S(A ↔ B ‖ E). In this, we have modified the standard notation for secret key agreement rates to emphasize which party or parties communicate.
The secret key agreement rates obey a simple partial order. S(A ↮ B ‖ E) lower bounds both S(A → B ‖ E) and S(A ← B ‖ E), since no communication is a special case of one party communicating. Similarly, both S(A → B ‖ E) and S(A ← B ‖ E) lower bound S(A ↔ B ‖ E), since only one party communicating is a special case of both parties communicating. Other than sharing these lower and upper bounds, S(A → B ‖ E) and S(A ← B ‖ E) are themselves generally incomparable.
In the case of no communication, the rate is given by [6]:

S(A ↮ B ‖ E) = H[A ⊓ B | E],

where A ⊓ B denotes the Gács–Körner common random variable [7]. It is worth noting that the entropy of this variable, the Gács–Körner common information, is not continuous under smooth changes in the probability distribution. It is also nonzero only for a measure-zero set of distributions within the simplex; specifically, the set of distributions whose joint events form bipartite graphs with multiple connected components [8].
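The bipartite-graph characterization suggests a direct way to compute the Gács–Körner common variable: treat the values of the two variables as nodes, add an edge for every pair in the joint support, and label each outcome by its connected component. Below is a minimal stdlib-only sketch; the pair distribution `pu` is the (X₀, Y) marginal of the Pointwise Unique distribution discussed later, and note that the no-communication rate additionally conditions on the eavesdropper, which this sketch omits.

```python
from collections import defaultdict
from math import log2

def gk_common(joint):
    """Distribution of the Gács–Körner common variable of (A, B).

    Values of A and B are nodes of a bipartite graph with an edge for
    every pair in the support; the common variable is the label of the
    connected component an outcome falls in.
    """
    parent = {}  # union-find over nodes ('a', value) and ('b', value)
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(x, y):
        parent[find(x)] = find(y)
    for (a, b), p in joint.items():
        if p > 0:
            union(('a', a), ('b', b))
    # probability mass per component = pmf of the common variable
    mass = defaultdict(float)
    for (a, b), p in joint.items():
        mass[find(('a', a))] += p
    return dict(mass)

def entropy(pmf):
    return sum(-p * log2(p) for p in pmf.values() if p > 0)

# (X0, Y) marginal of Pointwise Unique: support {(1,1), (2,2), (0,1), (0,2)}.
pu = {(1, 1): 0.25, (2, 2): 0.25, (0, 1): 0.25, (0, 2): 0.25}
print(entropy(gk_common(pu)))    # 0.0 -- one component, no secrecy without talking

# Two perfectly correlated bits: two components, 1 bit of common randomness.
copy = {(0, 0): 0.5, (1, 1): 0.5}
print(entropy(gk_common(copy)))  # 1.0
```

The "0" symbol in Pointwise Unique links every value of Y into a single component, which is exactly why its no-communication rate vanishes below.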
In the case of one-way communication, the rate is given by [9]:

S(A → B ‖ E) = max ( I[K : B | C] − I[K : E | C] ),

where the maximum is taken over all variables C and K such that the following Markov condition holds: C ↔ K ↔ A ↔ (B, E). This quantifies the maximum amount of information that Bob can share with the key beyond the amount that Eve shares with the key, both given the public communication. It suffices, by the Fenchel–Eggleston strengthening of Carathéodory's theorem [10], to assume that the K and C alphabets are limited: |K| ≤ |A|² and |C| ≤ |A|.
There is no such closed-form, calculable solution for S(A ↔ B ‖ E); however, various upper and lower bounds are known [5]. One simple lower bound is the supremum of the two one-way secret key agreement rates, as each is a restricted form of bidirectional communication. An even simpler upper bound that we will use is the intrinsic mutual information [11]:

S(A ↔ B ‖ E) ≤ I[A : B ↓ E] = min over p(ē | e) of I[A : B | Ē].

This states that the amount of secret information that Alice and Bob share is no greater than their mutual information conditioned on any modification of Eve's observations. Here, Ē is an arbitrary stochastic function of E or, alternatively, the result of passing E through a memoryless channel.
The unique PID component could be assigned the value of a secret key agreement rate under four different schemes. First, neither the source X_i nor the target Y may be allowed to communicate. Second, only X_i can communicate. Third, only Y is permitted to communicate. Finally, both X_i and Y may be allowed to communicate. Note that the eavesdropper (the other source) is not allowed to communicate in any of the secret sharing schemes considered here.
Secret key agreement rates have been associated with unique information before. One particular upper bound on S(A ↔ B ‖ E)—the intrinsic mutual information of Equation (7)—is known to not satisfy the consistency condition of Equation (4) [12]. Rosas et al. [13] briefly explored the idea of an eavesdropper's influence on the PID. More recently, the relationship between a particular method of quantifying unique information and the one-way secret key agreement rate has been considered [14]. Furthermore, there are analogous notions of secret key agreement rates within the channel setting, as opposed to the source setting considered here. We leave an analysis of that setting as future work.
4. Cryptographic Partial Information Decompositions
We now address the application of each form of secret key agreement rate as unique information in turn. For each resulting PID, we consider two distributions. The first, called Pointwise Unique, is chosen to exemplify the differing intuitions that can be applied to the PID. The second, entitled Problem, serves as a counterexample demonstrating that three of the four forms of secret key agreement do not result in a consistent decomposition. Both distributions are given in Figure 1. Although we only consider two distributions here, their behaviors are rich enough for us to draw out the two main results of this work. Further examples are given in Section 5.
Interpreting the Pointwise Unique [15] distribution is relatively straightforward. The target Y takes on the values "1" and "2" with equal probability. At the same time, exactly one of the two sources (again with equal probability) will be equal to Y, while the other is "0". The mutual informations are I[X₀ : Y] = ½ bit and I[X₁ : Y] = ½ bit.
The Problem distribution lacks the symmetry of Pointwise Unique, yet still consists of four equally probable events. The sources are restricted to the pairs "00", "01", "02", and "10". The target Y equals "1" if either X₀ or X₁ is "1", and "0" otherwise. With this distribution, the mutual informations are I[X₀ : Y] = 0.3113 bit and I[X₁ : Y] = ½ bit.
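These mutual informations are simple to verify numerically. A self-contained sketch, where the event dictionaries transcribe the two distributions as just described:

```python
from collections import defaultdict
from math import log2

def mutual_information(joint, i, j):
    """I(V_i : V_j) in bits for a dict {outcome_tuple: prob}."""
    pi, pj, pij = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        pi[outcome[i]] += p
        pj[outcome[j]] += p
        pij[outcome[i], outcome[j]] += p
    return sum(p * log2(p / (pi[a] * pj[b]))
               for (a, b), p in pij.items() if p > 0)

# Problem: outcomes are (x0, x1, y), each with probability 1/4.
problem = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (0, 2, 0): 0.25, (1, 0, 1): 0.25}
print(round(mutual_information(problem, 0, 2), 4))    # 0.3113
print(round(mutual_information(problem, 1, 2), 4))    # 0.5

# Pointwise Unique: exactly one source matches y, the other is 0.
pw_unique = {(1, 0, 1): 0.25, (2, 0, 2): 0.25, (0, 1, 1): 0.25, (0, 2, 2): 0.25}
print(round(mutual_information(pw_unique, 0, 2), 4))  # 0.5
```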
4.1. No Public Communication
In the first case, we consider the unique information from X_i to Y as the rate at which X_i and Y can agree on a secret key while exchanging no public communication: S(X_i ↮ Y ‖ X_j). This approach has some appeal: the PID is then defined simply by a joint distribution, without any express allowance or prohibition on public communication. However, given its quantification in terms of the Gács–Körner common information, the quantity does not vary continuously with the distribution of interest. Now, what is the behavior of this measure on our two distributions of interest?
When applied to Pointwise Unique, each source and the target are unable to construct a secret key. In turn, each unique information is determined to be 0 bits. This results in a redundancy and a synergy each of ½ bit.
The Problem distribution demonstrates the inability of S(X_i ↮ Y ‖ X_j) to construct a consistent PID. In this instance, as in the case of Pointwise Unique, no secrecy is possible and each unique information is assigned a value of 0 bits. We therefore determine from Equation (2) that the redundancy should be I[X₀ : Y] = 0.3113 bit. Equation (3), however, says that the redundancy is I[X₁ : Y] = ½ bit. This contradiction demonstrates that the no-communication secret key agreement rate cannot be used to quantify a PID's unique components.
The resulting partial information decompositions for both distributions are listed in
Table 1.
4.2. One-Way Public Communication
We next consider the situation in which one of the two parties is allowed public communication. This gives us two options: either the source X_i communicates to the target Y, or vice versa. Both situations enshrine a particular directionality in the resulting PID.
The first, where X_i constructs the public message and broadcasts it, emphasizes the channels X₀ → Y and X₁ → Y, as well as the joint channel (X₀, X₁) → Y. This creates a narrative of the sources conspiring to create the target. We call this interpretation the camel intuition, after the aphorism that a camel is a horse designed by committee. The committee members may announce what design constraints they brought to the table.
The second option, where Y constructs the public message and broadcasts it, emphasizes the channels Y → X₀ and Y → X₁. It implies a situation in which the sources are imperfect representations of the target. We call this interpretation the elephant intuition, as it recalls the parable of the blind men describing an elephant for the first time. The elephant Y may announce which of its features is revealed in a particular instance.
4.2.1. Camels
The first option adopts the unique information S(X_i → Y ‖ X_j), bringing to mind the idea of sources acting as inputs to some scheme by which the target is produced. When viewed this way, one may ask questions such as "How much information in X₀ is uniquely conveyed to Y?". Furthermore, the channels X₀ → Y, X₁ → Y, and (X₀, X₁) → Y take center stage.
Through this lens, the Pointwise Unique distribution has a clear interpretation. Given any realization, exactly one source is perfectly correlated with the target, while the other is impotently "0". From this vantage, it is clear that the unique informations should each be ½ bit, and this is borne out by the one-way secret key agreement rate. To see this, consider X₀ broadcasting each time it observes a "1" or a "2". This corresponds to the joint events "101" and "202", respectively. It is clear that, when considering just these two events, X₀ and Y can safely use their observations as key symbols: they agree exactly, and X₁ has no knowledge of them. Since these events occur half the time, we conclude that the secret key agreement rate is ½ bit. This implies that the redundancy and synergy of this decomposition are both 0 bits.
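The broadcast scheme just described can be simulated directly. In this sketch, X₀ publicly flags the rounds on which it saw a nonzero symbol; on those rounds its symbol and Y's agree exactly, while X₁ sees only "0"s and so learns nothing about the key:

```python
import random

# Events of Pointwise Unique as (x0, x1, y); all equiprobable.
EVENTS = [(1, 0, 1), (2, 0, 2), (0, 1, 1), (0, 2, 2)]

def run_protocol(n, seed=0):
    rng = random.Random(seed)
    samples = [rng.choice(EVENTS) for _ in range(n)]
    # Public message from X0: the indices at which it saw a nonzero symbol.
    flagged = [t for t, (x0, _, _) in enumerate(samples) if x0 != 0]
    key_x0 = [samples[t][0] for t in flagged]    # X0's symbols on flagged rounds
    key_y = [samples[t][2] for t in flagged]     # Y's symbols on the same rounds
    eve_view = [samples[t][1] for t in flagged]  # what X1 sees on those rounds
    return key_x0, key_y, eve_view, len(flagged) / n

key_x0, key_y, eve_view, rate = run_protocol(100_000)
print(key_x0 == key_y)  # True: the keys agree exactly
print(set(eve_view))    # {0}: X1 sees only "0"s on flagged rounds
print(rate)             # fraction of flagged rounds, close to 1/2
```

Each flagged round contributes one uniform symbol from {1, 2}, i.e., one key bit, and roughly half the rounds are flagged, recovering the ½ bit per sample rate.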
For the Problem distribution, we find that X₁ can broadcast the times at which it observed a "1" or a "2", which correspond to Y having observed a "1" or a "0", respectively. In both instances, X₀ observed a "0" and so cannot deduce what the other two have agreed upon. This leads to S(X₁ → Y ‖ X₀) being equal to ½ bit. At the same time, S(X₀ → Y ‖ X₁) vanishes. However, Problem's redundancy and synergy cannot be quantified, since the two secret key agreement schemes imply different redundancies and so are inconsistent with Equation (4).
The resulting PIDs for both are given in
Table 2.
4.2.2. Elephants
When the target Y is the one party permitted communication, one adopts the unique information S(X_i ← Y ‖ X_j), and we can interpret the sources as alternate views of the singular target. Consider, for example, journalism, where several sources give differing perspectives on the same event. When viewed this way, one might ask a question such as "How much information in Y is uniquely captured by X₀?". The channels Y → X₀, Y → X₁, and Y → (X₀, X₁) are paramount with this approach. We denote these in reverse to emphasize that Y is still the target of the PID.
Considered this way, the Pointwise Unique distribution takes on a different character. The sources each receive identical descriptions of the target—accurate half the time and erased the remainder. Nothing is uniquely provided to either source. This is reflected in the secret key agreement rates: Y can broadcast her observation, restricting events to either "011" and "101" or to "022" and "202". In either case, these restrictions do not help in the construction of a secret key, since Y cannot further restrict to the cases where it is X₀ that agrees with her and not X₁. This makes each unique information 0 bits, leaving both the redundancy and the synergy at ½ bit.
The Problem distribution's unique informations are S(X₀ ← Y ‖ X₁) = 0 bit and S(X₁ ← Y ‖ X₀) = 0.1887 bit. Unlike in the prior two decompositions, these unique informations satisfy Equation (4). The resulting redundancy is 0.3113 bit, while the synergy is ½ bit.
Their PIDs are listed in Table 3. By having Y publicly communicate, and so invoking a particular directionality, we finally obtain a consistent PID.
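The consistency of this decomposition reduces to simple bookkeeping. A sketch, assuming the values for Problem quoted in the text (mutual informations of 0.3113 and ½ bit, elephant unique values of 0 and 0.1887 bit; Equations (2) and (3) assign redundancy as a mutual information minus the corresponding unique information):

```python
# Mutual informations of the Problem distribution (bits).
i_x0_y = 0.3113
i_x1_y = 0.5
i_joint = 1.0  # I(X0, X1 : Y): the sources jointly determine Y and H(Y) = 1

# Elephant unique informations: S(X0 <- Y || X1) and S(X1 <- Y || X0).
u0, u1 = 0.0, 0.1887

red_via_x0 = i_x0_y - u0  # redundancy implied by Equation (2)
red_via_x1 = i_x1_y - u1  # redundancy implied by Equation (3)
assert abs(red_via_x0 - red_via_x1) < 1e-3  # Equation (4) is satisfied

synergy = i_joint - red_via_x0 - u0 - u1
print(round(red_via_x0, 4), round(synergy, 4))  # 0.3113 0.5
```

The same arithmetic fails for the no-communication and camel assignments above, since their two implied redundancies differ.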
4.3. Two-Way Public Communication
We finally turn to the full two-way secret key agreement rate: S(X_i ↔ Y ‖ X_j). This approach is also appealing, as it does not ascribe any directionality to the interpretation of the PID. Furthermore, it varies continuously with the distribution, unlike the no-communication case. However, this quantity is generally impossible to compute directly, with only upper and lower bounds known. Fortunately, this only slightly complicates the analyses we wish to make.
In the case of the Pointwise Unique distribution, it is not possible to extract more secret information than was done in the camel situation. Therefore, the resulting PID is identical: unique information of ½ bit and redundancy and synergy of 0 bits.
Problem, however, is again a problem. In this instance, the upper and lower bounds on S(X₁ ↔ Y ‖ X₀) converge: the larger of the two one-way secret key agreement rates forms a lower bound of ½ bit, while the upper bound provided by the intrinsic mutual information is also ½ bit, and so we know this value exactly. Utilizing the consistency relation Equation (4), we find that the other unique information must be 0.3113 bit in order for the full decomposition to be consistent. However, the intrinsic mutual information places an upper bound of 0.1887 bit on S(X₀ ↔ Y ‖ X₁). We therefore must conclude that two-way secret key agreement rates cannot be used to directly quantify unique information, and a consistent PID cannot be built from them.
The resulting PIDs for both these distributions can be seen in
Table 4.
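The 0.1887 bit intrinsic mutual information bound quoted above can be recovered numerically by minimizing I[X₀ : Y | Ē] over channels that replace X₁ with a modified variable Ē. Below is a brute-force sketch over a coarse grid of binary-output channels; a binary Ē happens to suffice for this distribution, though in general the modified alphabet may need to be as large as X₁'s:

```python
from collections import defaultdict
from itertools import product
from math import log2

# The Problem distribution over (X0, X1, Y): four equiprobable events.
P = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (0, 2, 0): 0.25, (1, 0, 1): 0.25}

def conditional_mi(joint):
    """I(X : Y | Z) in bits for a dict {(x, z, y): prob}."""
    pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, z, y), p in joint.items():
        pz[z] += p
        pxz[x, z] += p
        pyz[y, z] += p
    return sum(p * log2(pz[z] * p / (pxz[x, z] * pyz[y, z]))
               for (x, z, y), p in joint.items() if p > 0)

def apply_channel(rows):
    """Replace X1 by a binary E' with P(E' = 0 | X1 = i) = rows[i]."""
    out = defaultdict(float)
    for (x0, x1, y), p in P.items():
        out[x0, 0, y] += p * rows[x1]
        out[x0, 1, y] += p * (1 - rows[x1])
    return {k: v for k, v in out.items() if v > 0}

# Coarse grid over all binary channels from X1's alphabet {0, 1, 2}.
grid = [i / 4 for i in range(5)]
best = min(conditional_mi(apply_channel(rows))
           for rows in product(grid, repeat=3))
print(round(best, 4))  # 0.1887
```

The minimizing channel merges X₁'s symbols "0" and "1" while isolating "2", which already drives the conditional mutual information below the 0.3113 bit required for consistency.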
4.4. Summary
To conclude, then, there is only one secret-key communication scenario—that in which Y publicly communicates—that yields a consistent PID, as in Table 3. The above arguments by counterexample pruned away the unworkable scenarios, narrowing the options to one. Naturally, this does not constitute a proof that the remaining scenario always leads to a consistent PID. The narrowing did, however, allow us to turn to numerical searches using the dit [16] software package. Extensive searches were unable to find a counterexample. Thus, practically, with high probability, this scenario leads to consistent PIDs.
Though this is the singular viable secret key agreement rate-based PID, we hesitate to fully endorse its use due to the necessary directionality that comes with it. That is, one must invoke a directionality, unspecified by the PID, to obtain a consistent PID when using secret key agreement as the basis for the unique information component. It is not immediately obvious why only S(X_i ← Y ‖ X_j) results in a viable partial information decomposition, if this is indeed the case. We leave as an open question a proof of whether or not S(X₀ ← Y ‖ X₁) and S(X₁ ← Y ‖ X₀) satisfy Equation (4) for all distributions. Specifically, whether

I[X₀ : Y] − max ( I[K : X₀ | C] − I[K : X₁ | C] ) = I[X₁ : Y] − max ( I[K : X₁ | C] − I[K : X₀ | C] ),

where both optimizations are performed over the space of variables (K, C) satisfying the Markov condition C ↔ K ↔ Y ↔ (X₀, X₁). We conjecture that S(X_i ← Y ‖ X_j) is the only secret key agreement rate resulting in a viable PID due to the fact that the spaces in which the two solutions are found are identical.
The reasons why the other three secret key agreement rates fail to form consistent decompositions are likely particular to each scenario. In the case of no communication, the limitations carried over from the Gács–Körner common information play a major role—specifically, that it vanishes even for weakly mixing distributions. In the case of the one-way “camel” secret key agreement rate, it is possible that the failure arises from the optimization spaces of each unique information being different. Finally, for the case of two-way communication, we offer several speculations in
Section 6.1.
These measures of unique information can also be applied within the multivariate sources setting [1], though, like other measures of unique information [17,18], they cannot fully quantify the general decompositions there.
5. Examples
We next demonstrate the behavior of the partial information decomposition quantified using S(X_i ← Y ‖ X_j), herein denoted I_S. On many of the "standard" distributions—Rdn, Unq, Xor—I_S behaves as intuited by Griffith [19]. Here, we compare its behavior with that of several other proposed measures [1,20,17,21,18,15,22] across five distributions—And, Diff, Not Two, Pnt. Unq., and Two Bit Copy. These distributions are given in Figure 2. Note that Reference [14] proved that the one-way secret key agreement rates lower bound the unique informations of the measure of Reference [17].
The And distribution—the first set of results in Table 5—yields the same decomposition under I_S and three of the other measures. In each of these cases, there is no unique information, resulting in 0.311 bit of redundancy and ½ bit of synergy. The measure of Reference [15] produces negative unique values of −¼ bit, with a redundancy of 0.561 bit and a synergy of ¾ bit. The remaining three measures all produce positive unique values, indicating that, at least in this case, they interpret unique information as something more than the ability of source and target to agree on a secret when the target can communicate.
The Diff distribution—the second set of results in Table 5—is so named for its keen ability to differentiate PID measures. In this example, two pairs of PIDs coincide. In magnitude, I_S attributes the least to unique information for this distribution, while two of the other measures attribute the most.
The Not Two distribution—the third set of results in Table 5—is so named because it consists of all binary events over three variables that do not contain two "1"s. As far as the behavior of the various PIDs goes, this distribution is very similar to And: I_S and three of the other measures allot 0 bit to the unique informations, while the measure of Reference [15] again finds them to be negative. Here, one further measure also finds them to be negative. There is a major difference between the Not Two and And distributions, however: Not Two possesses a great deal of third-order connected information, while And possesses none. This indicates that, although the PIDs of the two distributions are qualitatively similar, their structures are in fact quite distinct.
Next, we consider the Pnt. Unq. distribution—the fourth set of results in Table 5—again. We see that I_S and three of the other measures behave as elephants, while two behave as camels. Of the remaining two, one does not commit to either intuition but leans toward elephant, while the other splits the difference.
Finally, we consider the venerable Two Bit Copy distribution and the final set of results in Table 5. Here, two of the measures stand out in assigning one bit to redundant information and one bit to synergy, while assigning nothing to the unique informations. All of the other measures assign one bit to each unique information and nothing to redundancy and synergy. Note that this is not a directionality issue: all four secret key agreement rates agree that the rate at which a secret key can be constructed is 1 bit.
7. Conclusions
At present, a primary barrier to PID's general adoption as a useful, possibly central, tool for analyzing how complex systems store and process information is the lack of an agreed-upon method of quantifying its component informations. Here, we posited that one reason for this disagreement stems from conflicting intuitions regarding the decomposition's operational behavior. To give an operational meaning to unique information and address these intuitions, we equated unique information with the ability of two parties to agree on a secret—a reasonably intuitive operationalization of what it means for two variables to share a piece of information that no others have. This led to numerous observations.
The first is that the PID, as currently defined, is ambivalent to any notion of directionality. There are, however, very clear cases in which the assumption of a directionality—or the lack thereof—is critical to the existence of unique information. Consider, for example, the McGurk effect [27], where the visual stimulus of one phoneme and the auditory stimulus of another phoneme give rise to the perception of a third phoneme. By construction, the stimuli cause the perception, and the channels implicit in the camel intuition are central. If one were to study this interaction using an elephant-like PID, it is unclear that the resulting decomposition would reflect the neurobiological mechanisms by which the perception is produced. Similarly, a camel-like measure would be inappropriate when interpreting simultaneous positron emission tomography (PET) and magnetic resonance imaging (MRI) scans of a tumor.
One can view this as the PID being inherently context-dependent and conclude that quantification requires specifying directionality. In this case, the elephant intuition is apparently more natural, as adopting closely-related notions from cryptography results in a consistent PID. If context demands the camel intuition, though, either a noncryptographic method of quantifying unique information is needed or consistency must be enforced by augmenting the secret key agreement rate. It is additionally possible that associating secret key agreement rates with unique information is fundamentally flawed and that, ultimately, PID entails quantifying unique information as something distinct from the ability to agree on a secret key. Whatever is missing has yet to be identified.
The next observation concerns the third-order connected information. We first demonstrated that such triadic information can be constructed from common information in which each constituent variable is independently and identically modified. Furthermore, it was shown that any two of three parties, when engaging in bidirectional communication, capture the triadic information. Granted, this does not generically occur. For example, if X₀, X₁, and Y are related by Xor, the distribution contains 1 bit of third-order connected information, but S(X₀ ↔ X₁ ‖ Y) (or any permutation of the variables) is equal to 0 bits. This suggests that the third-order connected information may not be an atomic quantity, but rather consists of two parts: one accessible to two communicating parties and one not. This idea has been explored in a different context in Reference [28].
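The Xor claim can be checked directly: conditioned on Y the sources share a full bit, yet marginally they are independent, so even the crudest channel (a constant one, erasing Y entirely) drives the intrinsic mutual information, and hence the two-way rate, to zero. A minimal sketch:

```python
from collections import defaultdict
from math import log2

def mi(joint, i, j, cond=None):
    """I(V_i : V_j) in bits, optionally conditioned on index `cond`."""
    pxyz, pxz, pyz, pz = (defaultdict(float) for _ in range(4))
    for o, p in joint.items():
        z = o[cond] if cond is not None else 0
        pxyz[o[i], o[j], z] += p
        pxz[o[i], z] += p
        pyz[o[j], z] += p
        pz[z] += p
    return sum(p * log2(pz[z] * p / (pxz[x, z] * pyz[y, z]))
               for (x, y, z), p in pxyz.items() if p > 0)

# Xor: y = x0 ^ x1 with uniform source bits.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

print(mi(xor, 0, 1, cond=2))  # 1.0: all shared information is conditional
print(mi(xor, 0, 1))          # 0.0: with Y erased, the sources are independent
```

Since the intrinsic mutual information minimizes over all channels applied to Y, the constant channel alone certifies that S(X₀ ↔ X₁ ‖ Y) = 0 bits.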
Our third observation regards the behavior of the BROJA measure, especially in relation to standard maximum entropy principles. We first demonstrated that its optimization indeed correlates the sources, but argued that this behavior only seems inappropriate when adopting a camel intuition. We then discussed how its intermediate distribution is as structured as the initial one and so, if the measure is indeed operating correctly, it must shunt the dependencies that result in synergy to another aspect of the distribution. Finally, we discussed how the standard maximum entropy approach may remove synergy from a distribution altogether. This calls for a more careful investigation as to whether it does (and the BROJA optimization is incorrect) or does not (and synergistic information arises from source–target marginals and Occam's razor).
Looking to the future, we trust that this exploration of the relationship between cryptographic secrecy and unique information will provide a basis for future efforts to understand and quantify the partial information decomposition. Furthermore, the explicit recognition of the role that directional intuitions play in the meaning and interpretation of a decomposition should reduce cross-talk and improve understanding as we collectively move forward.