1 Overview

The question of how probability and logic relate, and how they can or cannot be combined, has an extensive literature and a long history, but is generally considered an unsettled matter (e.g., see [46, 81]). Perhaps the prevailing conception of the relationship, at least in the field of artificial intelligence, is that these systems are complementary (e.g., see [10, 51, 69]): probability can enhance logic by extending it to handle uncertainty, and in turn, logic can enhance probability with, among other things, a notion of valid inference (i.e., probability lacks a notion of consequence analogous to that of logic, and must be extended in this direction; Footnote 1). Research on fusing these two systems has splintered into several branches (e.g., probability logic, inductive logic, statistical relational modeling), whose boundaries are not always definitive, but which generally approach the fusion from different angles (Footnote 2), often emphasizing the role of either probability or logic over the other in the fusion, as well as certain components within each. But which approach, if any, is in some sense most principled? Does one approach integrate the core components of each system more fully? This issue is important because if one system is incorporated only at a surface level, say, then the fused system forms a faulty foundation for the development of a reasoning system meant to improve upon each.

Taking a step back, the above view of a complementary relationship is in conflict with a classical conception: that for some sufficiently broad definition of logic, probability is an instance of it (i.e., probability is a logic). This view has been repeatedly expressed since the inception of the field (e.g., [24, 25, 47, 52, 67]), and the “idea runs like a thread, at times more visible, at times less, through the subsequent history of epistemic probability” [45]. At the moment, this classical view is somewhat sidelined, perhaps because convincingly answering the question “In what way is probability a logic?” requires a compelling definition of logic, broadly construed. The problem of formulating logic broadly has old origins, and is studied, for example, in the field of universal logic [9, 12, 13, 16, 82], but there remains debate on whether these abstractions successfully extract and circumscribe the heart of logic. For the classical conception of the relationship to be valid, probability must be endowed with a notion of consequence (in order to be an instance of logic; Footnote 3), whereas for the complementary view, it cannot (hence the need to extend it in this direction). The resolution of this conflict hinges on how we choose to abstract consequence beyond classical logic.

It is useful to introduce our proposed abstraction by contrasting it with a recurring theme in research on combining probability and logic: the defining of probability measures over propositions (Footnote 4). Since much research is motivated by, or can be traced back to, questions about what can be inferred about a conclusion given that premises are true with only some degree of certainty, the most striking correspondence between these two systems is that both sentences and events define subsets of a given set \(\Omega \) of possible worlds, with the valuations placed on these respective objects differing only in being binary versus real valued. This correspondence between sentences and events shapes a general perspective (Footnote 5) of the relationship and leads to the conclusion that probability does not have a consequence relation analogous to that of classical logic (Footnote 6). In this work, in contrast, we consider a different correspondence between these systems that could also dictate the nature of the relationship and leads to the opposite conclusion.

This work begins by looking at how logical consequence and probabilistic conditioning relate, and we observe that both can be described in terms of projections, allowing for a compact description of them. In turn, this leads to a general version of consequence, a relation extended from binary to real-valued functions. This conception of consequence produces an abstract formulation of logic, for which we make a case based on its simplicity and fit, and under which probability may indeed be viewed as an instance. We initially focus on logical (semantic) consequence, and first establish its abstraction before turning to the issue of proof and formality (i.e., proofs based on the syntactic form of sentences; Footnote 7). On a slightly more technical note, the concept of logical consequence can be viewed as inextricably linked to the concept of set inclusion, and the abstraction of the former as tantamount to the abstraction of the latter (Footnote 8). Indeed, abstract formulations of logic (such as the standard one by Tarski) may be viewed as efforts to abstract the concept of set inclusion from a relation on sets to a relation on sets of sets (with asymmetries in the domain). Here, we consider an extension in a different direction: we abstract essential properties of the set inclusion relation, allowing the concept of a subset to be extended to that of a sub-function.

This work proceeds as follows. We begin by looking at the relationship between logical consequence and probabilistic conditioning, noticing that both can be stated in terms of projections that dictate which transitions between functions are valid (Sect. 2). This observation then leads to an abstract logical consequence based on projective systems, required to have preservation, restriction, and consistency properties (Sect. 3). We then examine how grammars and languages allow for proofs of instances of this abstract consequence relation by providing compact symbolic representations of valuation functions (Sect. 4). These ideas lead to a formulation of logic that encompasses probability, making an explicit claim on, in particular, the relationship between probability and classical logic; we attempt to reconcile this relational interpretation with the extensive literature on combining probability and logic (Sect. 5). We conclude with a summary and discussion (Sect. 6).

2 A Simple Correspondence

In this section, we compare logical consequence and probabilistic conditioning, noting a shared property: both relations prescribe what functions are allowed to follow from others based on projections of them. This observation forms the basis for the developments in the rest of the paper. In the next section, we use it to help formulate essential properties for an abstract consequence.

Before proceeding, we note the contrasting role of structure on a set \(\Omega \) in these two systems. The modern definition of probability is general in that the definition of a probability space can be stated without reference to the structure on the sample space, if it has any (i.e., the definition is agnostic to the properties of the objects in this set). The definitions of classical logics, on the other hand, are associated with spaces of possible worlds that take specific, though complex, structures (Footnote 9). The structure of these spaces is important when defining a language, but for the purposes of this section, where we look at connections with probability to inform potential directions for abstracting logical consequence, we may defer these considerations about language and space structure (see Sect. 4.1). In addition, for developing the basic ideas here, we assume the set \(\Omega \) is countable, although the ideas extend beyond this simple setting. In uncountable spaces, the measurability issues that occur (where consistent probabilities cannot be assigned to all subsets of the space) prompted the modern definition of probability, which, while providing rigor, can sometimes obscure the underpinning concepts. We will return to this issue.

Suppose that \({\mathcal {S}}\) is a set of sentences (i.e., well-formed formulas with no free variables) in a propositional or first-order language. Let \(\Omega \) be a set of possible worlds and let \({\mathcal {F}}=2^\Omega \) be the set of binary functions over it. Let \(\tau : {\mathcal {S}} \rightarrow {\mathcal {F}}\) be a mapping taking sentences to binary functions; for each sentence \(\phi \in {\mathcal {S}}\), there is a function \(f_{\phi } \equiv \tau (\phi )\) (taking the form \(f_{\phi }:\Omega \rightarrow \{0,1\}\)) that specifies its truth value in each world \(\omega \in \Omega \), which we will refer to as a valuation function. Let \(\models \) be a relation on \({\mathcal {F}}\), referred to as logical (semantic) consequence, discussed in this and the next section, and let \(\vdash \) be a relation on the set \({\mathcal {S}}\) of sentences based on the provability of logical consequence, referred to as deductive (syntactic) consequence, discussed in the section after next. We define \(\models \) as a relation on the set \({\mathcal {F}}\) of binary functions, rather than on the set \({\mathcal {S}}\) of sentences, to make explicit that this relation is a direct function of the valuations. The relation \(\models \) on \({\mathcal {F}}\) induces a relation \(\models _{{\mathcal {S}}}\) on \({\mathcal {S}}\): for all \(\phi , \psi \in {\mathcal {S}}\), let \(\phi \models _{{\mathcal {S}}} \psi \) if and only if \(f_{\phi } \models f_{\psi }\). A logical system \(({\mathcal {S}}, \models _{{\mathcal {S}}}, \vdash )\) is sound if \(\vdash \) is a subset of \(\models _{{\mathcal {S}}}\), and is complete if \(\models _{{\mathcal {S}}}\) is a subset of \(\vdash \).

In classical logic, a valuation function in \({\mathcal {F}}\) is defined to be the logical consequence of another function in \({\mathcal {F}}\) if it is positive whenever the other is positive. That is, for \(f, f' \in {\mathcal {F}}\), we have \(f \models f'\) if and only if:

$$\begin{aligned} f(\omega )=1 \implies f'(\omega )=1 \end{aligned}$$
(2.1)

for all \(\omega \in \Omega \). Another way to state this consequence relation, which we find useful below, is based on projections. Define projections \(\pi _A\), where \(A \subseteq \Omega \), taking functions in \({\mathcal {F}}\) to functions in \({\mathcal {F}}\), as follows:

$$\begin{aligned} f_A(\omega ) \equiv \pi _A(f)(\omega ) = {\left\{ \begin{array}{ll} f(\omega ), \; & \text {if } \omega \in A \\ 0, \; & \text {otherwise} \end{array}\right. }, \end{aligned}$$
(2.2)

where \(f \in {\mathcal {F}}\) and \(\omega \in \Omega \). Then, for two valuation functions \(f, f' \in {\mathcal {F}}\), we have that \(f \models f'\) if and only if there exists a projection in this projection family that takes \(f'\) to f (i.e., there exists an \(A \subseteq \Omega \) such that \(\pi _A(f')=f\)). To establish (prove) instances of this relation, these binary functions must be given symbolic representations (Sect. 4).
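To make the projection formulation concrete, here is a minimal Python sketch over a small finite \(\Omega \); the helper names (`project`, `entails`, `subsets`) are ours for illustration, and the search over all subsets \(A \subseteq \Omega \) is brute force, viable only for small spaces.

```python
from itertools import chain, combinations

OMEGA = ("w1", "w2", "w3")  # a small finite set of possible worlds

def project(f, A):
    """Projection pi_A from Eq. (2.2): keep f on A, zero elsewhere."""
    return {w: (f[w] if w in A else 0) for w in OMEGA}

def subsets(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def entails(f, f_prime):
    """f |= f' iff some projection in the family takes f' to f."""
    return any(project(f_prime, set(A)) == f for A in subsets(OMEGA))

f_phi = {"w1": 1, "w2": 0, "w3": 0}   # e.g., the valuation of a conjunction
f_psi = {"w1": 1, "w2": 1, "w3": 0}   # a weaker sentence
assert entails(f_phi, f_psi)          # pi_{w1}(f_psi) = f_phi
assert not entails(f_psi, f_phi)      # projections can only zero mass out
```

Equivalently, \(f \models f'\) holds exactly when f is point-wise below \(f'\); the projection view restates this in a form that generalizes beyond binary values.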

Probability uses real-valued functions and is also endowed with a binary relation on them, defined in terms of conditional distributions. Let \({\mathcal {P}}\) be the set of all probability measures with respect to \(\Omega \), and define projections on this function space as follows:

$$\begin{aligned} P_A(\omega ) \equiv \pi _A(P)(\omega ) = {\left\{ \begin{array}{ll} P(\omega )/P(A), \; & \text {if } \omega \in A \\ 0, \; & \text {otherwise} \end{array}\right. }, \end{aligned}$$
(2.3)

where \(P \in {\mathcal {P}}\), \(\omega \in \Omega \), and \(A \subseteq \Omega \) such that \(P(A)>0\). These projections, after restrictions on the function’s domain, define conditional distributions: for two probability measures \(P,P' \in {\mathcal {P}}\), we have that P is a conditional distribution of \(P'\) if and only if there exists a projection in this projection family that takes \(P'\) to P (i.e., there exists an \(A \subseteq \Omega \) such that \(\pi _A(P')=P\)). Analogous to the above logical consequence, for a given probability distribution, this relation dictates which other distributions follow from it, e.g., what distributions result from incorporating ‘evidence’.
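The same kind of sketch, with the same illustrative naming conventions, implements Eq. 2.3: conditioning renormalizes the mass within the conditioning event and zeroes it elsewhere, and projecting twice changes nothing (idempotence).

```python
OMEGA = ("w1", "w2", "w3")

def condition(P, A):
    """Projection pi_A from Eq. (2.3): renormalize P's mass within A."""
    mass = sum(P[w] for w in A)
    if mass == 0:
        raise ValueError("conditioning on a null event")
    return {w: (P[w] / mass if w in A else 0.0) for w in OMEGA}

P = {"w1": 0.5, "w2": 0.3, "w3": 0.2}
P_A = condition(P, {"w1", "w2"})        # P conditioned on the event {w1, w2}
assert abs(P_A["w1"] - 0.625) < 1e-12   # 0.5 / 0.8
assert P_A["w3"] == 0.0
# Idempotence: projecting onto the same event twice changes nothing.
P_AA = condition(P_A, {"w1", "w2"})
assert all(abs(P_AA[w] - P_A[w]) < 1e-12 for w in OMEGA)
```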

These relations in classical logic and probability are alike in that they are both based on projections that restrict the mass of functions to some subset of their domain, preserve the mass of functions within that domain, and have certain consistency properties. This is suggestive of a possible abstraction of logical consequence, which we consider in the next section.

In comparing these relations in classical logic and probability, the directionality of their respective projections needs to be distinguished. For a given function f, one can check whether a function \(f'\) entails it (i.e., whether there exists a projection taking f to \(f'\)), or alternatively, whether it entails \(f'\) (i.e., whether there exists a projection taking \(f'\) to f); we will refer to these as projecting down and projecting up, respectively. Conditioning a probability distribution on evidence projects it down to another distribution. A binary valuation function, in contrast, logically entails another binary function if the latter projects down to the former. In other words, probabilistic conditioning corresponds to projecting down and logical consequence to projecting up, and hence, conditioning and consequence are opposite in their projective directionality (Footnote 10).

3 Logical Consequence

Motivated by the above discussion, which illustrated how projections can be used to dictate the allowable transitions of ‘knowledge’ from one state to another, we consider an abstraction of logical consequence based on projections. We begin with a brief review of relevant literature to situate our developments. In the first subsection, we state a standard formulation of abstract consequence, allowing us to contrast the direction in which it extends the classical version with the direction taken here. Then, in the following two subsections, we discuss the relevant consequence relations used in non-classical logics, in particular, in many-valued and multiple conclusion logics. We then develop our proposed version, which is based on abstracting the essential properties of the subset relation in order to create a relation analogous to it, but over more general objects. We consider languages and deductive proof in Sect. 4.

3.1 Standard Abstract Consequence

A standard abstraction of consequence is as follows [77, 78]:

Definition 3.1

(Tarskian Consequence). A Tarskian consequence relation on a set \(\Psi \) is a relation \(\models \; \subseteq {\mathbb {P}}(\Psi ) \times \Psi \) such that, for all \(A,B \subseteq \Psi \) and for all \(a \in \Psi \), we have:

  1. (Reflexivity) If \(a \in A\), then \(A \models a\)

  2. (Monotonicity) If \(A \models a\) and \(A \subseteq B\), then \(B \models a\)

  3. (Transitivity) If \(A \models a\) and \(B \models b\) for every \(b \in A\), then \(B \models a\)

In this definition, the set \(\Psi \) is generic, and the elements in it could be well-formed formulas, valuation functions, or something else; no particular structure is assumed of them. The structure that gives this relation substance, then, lies not in the set \(\Psi \), but in the form assumed of the space \({\mathbb {P}}(\Psi ) \times \Psi \), a point to which we will return below. This abstract formulation of consequence has many logics falling under its umbrella; one instantiation is many-valued logics, to which we now turn.

3.2 Many-Valued Logic

Suppose we have an ordered (Footnote 11) set \({\mathcal {V}}\) of truth values, containing 0 and 1, where the latter is the greatest element and the former the least. The numerous proposals for logical consequence in the many-valued setting can be categorized as follows [22]: those based on the preservation of designated values from the premises to the conclusion are pure consequences (e.g., Łukasiewicz’s three-valued logic [58]); those based on the conclusion being at least as true as the falsest premise are order-theoretic consequences (e.g., certain probability logics [4], also [59]); and those based on premises having some designated set of truth values, and conclusions having some other designated set of truth values, are mixed consequences (e.g., q-consequence [60]). Each many-valued version is motivated by different criteria, but for our purposes here, the thing to note about these variations is that they are all defined locally (referred to as ‘truth-relational’ in [22]), whereby one can evaluate an entailment by evaluating it individually at each possible world:

Definition 3.2

(Locality). A consequence relation \(\models \) is local if there exists a relation \(\sim \; \subseteq {\mathbb {P}}({\mathcal {V}}) \times {\mathcal {V}}\) such that for all \(A \subseteq \Psi \) and for all \(a \in \Psi \), we have:

$$\begin{aligned} A \models a \text { if and only if } A(\omega ) \sim a(\omega ) \text { for all } \omega \in \Omega . \end{aligned}$$

We will say a consequence relation is global if it is not local.
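For a concrete contrast, consider the following Python sketch (helper names are illustrative): classical consequence passes the locality test via the point-wise relation \(f(\omega ) \le f'(\omega )\), whereas probabilistic conditioning does not, since the normalizer couples the worlds together.

```python
OMEGA = ("w1", "w2", "w3")

def classical_entails(f, f_prime):
    # Local (Definition 3.2): reduces to the point-wise check f(w) <= f'(w).
    return all(f[w] <= f_prime[w] for w in OMEGA)

def is_conditional(P0, P1):
    """P0 |= P1 iff P0 equals P1 conditioned on the support of P0."""
    A = {w for w in OMEGA if P0[w] > 0}
    mass = sum(P1[w] for w in A)
    if mass == 0:
        return False
    return all(abs(P0[w] - (P1[w] / mass if w in A else 0.0)) < 1e-9
               for w in OMEGA)

assert classical_entails({"w1": 1, "w2": 0, "w3": 0},
                         {"w1": 1, "w2": 1, "w3": 0})

P1 = {"w1": 0.25, "w2": 0.25, "w3": 0.5}
P0 = {"w1": 0.5, "w2": 0.5, "w3": 0.0}
assert is_conditional(P0, P1)           # a valid conditioning on {w1, w2}

# Each point-wise value pair below occurs in SOME valid conditioning
# ((0.5, 0.25) above; (0, 0.25) by excluding w2; (0.5, 0.5) by an identity
# projection), yet jointly they fail: no fixed relation on values alone
# decides the matter, so conditioning is a global relation.
Q0 = {"w1": 0.5, "w2": 0.0, "w3": 0.5}
assert not is_conditional(Q0, P1)
```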

This property is a cornerstone of most logics because the values in \({\mathcal {V}}\) are interpreted, for example, as degrees of truth [42], quasi truth values [74], gaps in truth [54], or informational states [11], all of which depend only on a given possible world (Footnote 12). In this work, however, we are interested in exploring formulations that encompass probability, which requires principles of inference for states of ‘knowledge’ that take a more general form. For this reason, we consider formulations in which logical consequence is a global relation, i.e., it is not restricted by a locality condition that allows for point-wise evaluation. We now turn to the topic of symmetry in logics, which is also relevant to our developments.

3.3 Multiple Conclusion Logic

One way classical logic can be extended in a direction different than in Tarskian consequence (Definition 3.1) is to endow the relation with a symmetry between premises and conclusions, a topic studied in the field of multiple conclusion logic [73]. In this logic, both premises and conclusions are sets, and consequence relations take the form \(\models \; \subseteq {\mathbb {P}}(\Psi ) \times {\mathbb {P}}(\Psi )\), where \(\Psi \) is some set.

One way to define a consequence relation in this setting is based on the disjunction of the conclusions: if all premises are true, then at least one conclusion must be true (Footnote 13). The idea of giving symmetry to logic has early roots, going back at least to Gentzen’s sequent calculus [31, 32] and Tait’s version of it (whereby finite sets are considered instead of sequences), and continuing with the work of Carnap [17], where consequence and rules of inference were defined, Kneale [53], where a proof technique was devised for these rules, and Scott [71], which made important contributions, for example, generalizing Lindenbaum’s theorem to this setting.

In this work, we take symmetry—not the multiplicity of conclusions—as the indispensable concept within multiple conclusion logic to be retained. If we assume that we can define a set \({\mathcal {F}}\) of all possible valuation functions, then for the purposes of defining a logical consequence relation, the particular language is immaterial (e.g., two different languages may have the same logical consequence relation). Thus, in this work, we make the assumption that for a given language and set \({\mathcal {S}}\) of well-formed formulas in it, there exists a mapping \(\tau : {\mathbb {P}}({\mathcal {S}}) \rightarrow {\mathcal {F}}\) taking sets of formulas to valuation functions. This assumption holds in classical logic, where consequence is defined based on the conjunction of premises, which results in a single sentence (hence any set of premises gets mapped to a valuation function). This assumption allows us to formulate logical consequence as a relation between a single premise and single conclusion, rather than between a set of premises and set of conclusions. This is not as restrictive as it may appear, since we may always construct logical systems such that this assumption holds (Footnote 14), e.g., by adding values to the truth set \({\mathcal {V}}\) and increasing the size of \({\mathcal {F}}\), a point that we will discuss further below.

This assumption might cause the following concern: how can we perform proofs if consequence involves only a single premise? However, as we will see, the rules of inference in this setting will still have asymmetries and involve compositions of sets of sentences, and indeed, we argue that this is the proper location for asymmetry (see Sect. 4.2.1). Thus, without loss of generality, we may define logical consequence as a binary relation on \({\mathcal {F}}\), rather than as a relation between sets drawn from \({\mathbb {P}}(\Psi )\). However, a logic in which consequence is defined between a single premise and a single conclusion raises the question of where the necessary structure on consequence will be located, a topic to which we now turn.

3.4 A Symmetric, Global Logical Consequence

We now consider an extension of logical consequence from binary to infinitely many truth values. In the classical version, logical consequence may be written as a relation \(\models \; \subseteq {\mathbb {P}}(\Omega ) \times {\mathbb {P}}(\Omega )\), where \(\Omega \) is a set of possible worlds, such that, for all \(A,B \subseteq \Omega \), we have:

$$\begin{aligned} A \models B \text { if and only if } B \subseteq A. \end{aligned}$$
(3.1)

To extend this relation to more general functions over \(\Omega \), the comparison of classical logic with probability elucidates a path.

Let \({\mathcal {F}}_{\Omega }\) be a set of valuation functions over some domain \(\Omega \) and let their codomain be \({\mathcal {V}} = [0,1]\) (chosen here for concreteness, though it can be abstracted); i.e., \({\mathcal {F}}_{\Omega }\) generalizes \({\mathbb {P}}(\Omega )\) used in the classical version of the relation above. The discussion in the previous section suggests an abstraction of logical consequence based on projections that define valid transitions between functions. From the relations in equations 2.2 and 2.3, we see that desirable properties of these projections include that they: (i) restrict the domain of valuation functions to some given region; (ii) preserve their mass within that region; and (iii) are consistent with each other. Let \(\Pi _{\Omega }\) be a set of projections on \({\mathcal {F}}_{\Omega }\), i.e., containing projections of the form \(\pi :{\mathcal {F}}_{\Omega } \rightarrow {\mathcal {F}}_{\Omega }\), and let them be indexed by subsets \(A \subseteq \Omega \). Recall that a projection \(\pi \) is an idempotent function, i.e., \(\pi \circ \pi = \pi \): projecting an object more than once does not change the result.

We begin by defining some basic properties that a projection family must have in order for it to produce a partial ordering over \({\mathcal {F}}_{\Omega }\), which is a necessary (but we argue not sufficient) requirement for a relation to be considered a consequence relation.

Definition 3.3

(Regular Projections). A projective system \(\Pi _{\Omega }\) over \({\mathcal {F}}_{\Omega }\) is regular if:

  1. (Consistency) For all \(A, B \subseteq \Omega \), we have \(\pi _A \circ \pi _B = \pi _{A \cap B}\). In particular, we have that if \(A \subseteq B\), then \(\pi _A \circ \pi _{B} = \pi _A\).

  2. (Identity Map) For all \(f \in {\mathcal {F}}_{\Omega }\), we have \(\pi _A(f)=f\) for all \(A \subseteq \Omega \) that contain the positive mass of f (i.e., \(A \supseteq \{\omega \in \Omega | f(\omega )>0\}\)). In particular, \(\pi _{\Omega }(f) = f\) for all \(f \in {\mathcal {F}}_{\Omega }\).

These basic properties allow for a projective system to define a partial ordering over the functions in \({\mathcal {F}}_{\Omega }\); if there exists a projection taking a function h to a function f, then we say \(f \le h\). The consistency property is necessary for producing relations that are transitive:

Proposition 3.4

(Projective Ordering). For a function space \({\mathcal {F}}_{\Omega }\) and a regular projective system \(\Pi _{\Omega }\) over it, define a relation \(\le \) over \({\mathcal {F}}_{\Omega }\) as follows:

$$\begin{aligned} f \le h \text { if and only if } \exists \pi \in \Pi _{\Omega } \text { such that } \pi (h)=f, \end{aligned}$$

for all \(f,h \in {\mathcal {F}}_{\Omega }\). Then \(\le \) is a partial ordering.

Proof

The relation \(\le \) is a partial ordering if it is reflexive, transitive, and antisymmetric. Reflexivity (i.e., \(f \le f\)) follows directly from the identity projections \(\pi _{\Omega }\) in the regular projective system. To show transitivity, suppose that \(f \le h\) and \(h \le g\). Then there exist projections \(\pi _A\) and \(\pi _B\) such that \(f = \pi _A(h)\) and \(h = \pi _B(g)\), and hence, by the consistency property, there exists a projection \(\pi _{A \cap B} = \pi _A \circ \pi _B\) such that \(f = \pi _{A \cap B}(g)\), implying \(f \le g\). To show antisymmetry, suppose that \(f \le h\) and \(h \le f\). Then there exist projections \(\pi _A\) and \(\pi _B\) such that \(f = \pi _A(h)\) and \(h = \pi _B(f)\), and by the consistency property, we have that \(f = \pi _A \circ \pi _B (f) = \pi _{A \cap B} (f)\) and \(h = \pi _B \circ \pi _A (h) = \pi _{A \cap B} (h)\). Also by the consistency property, we have that \(\pi _{A \cap B} \circ \pi _B = \pi _{A \cap B}\), which implies that \(\pi _{A \cap B}(f) = \pi _{A \cap B} \circ \pi _B (f) = \pi _{A \cap B}(h)\). As a result, we have that \(f = h\). \(\square \)
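As a sanity check (not a proof), one can verify the regularity properties numerically for the conditioning projections of Sect. 2 on a small space; the sketch below, with illustrative helper names, exhausts all pairs of subsets.

```python
from itertools import combinations

OMEGA = ("w1", "w2", "w3")

def subsets(xs):
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def project(P, A):
    """Probability projection (Eq. 2.3), sending null events to the zero function."""
    mass = sum(P[w] for w in A)
    if mass == 0:
        return {w: 0.0 for w in OMEGA}
    return {w: (P[w] / mass if w in A else 0.0) for w in OMEGA}

def close(P, Q, tol=1e-9):
    return all(abs(P[w] - Q[w]) < tol for w in OMEGA)

P = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
for A in subsets(OMEGA):
    for B in subsets(OMEGA):
        # Consistency: pi_A(pi_B(P)) = pi_{A & B}(P).
        assert close(project(project(P, B), A), project(P, A & B))
# Identity: projecting onto a superset of the support changes nothing.
assert close(project(P, set(OMEGA)), P)
```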

We refer to the partial ordering defined from a projective system as a projective ordering. Now, we consider what additional properties a projective system must have in order to produce a relation we might call a consequence relation. In particular, these additional properties abstract the notion of the set inclusion relation. Assume the function space \({\mathcal {F}}_{\Omega }\) contains the zero function (Footnote 15) over \(\Omega \), denoted by \({\textbf {0}}\), and to simplify the notation, for a projection \(\pi _A\), where \(A \subseteq \Omega \), let \(f_A \equiv \pi _A(f)\) for each function \(f \in {\mathcal {F}}_{\Omega }\).

Definition 3.5

(Focal Projections). A projective system \(\Pi _{\Omega }\) over \({\mathcal {F}}_{\Omega }\) is focalizing if it is regular, and for all \(A \subseteq \Omega \) and for all \(f \in {\mathcal {F}}_{\Omega }\), we have:

  1. (Mass Restriction) For all \(\omega \notin A\), we have \(f_A(\omega ) = 0\).

  2. (Mass Preservation) For all \(\omega \in A\):

    $$\begin{aligned} \text {if } f(\omega )>0, \text { then } f_A(\omega ) > 0. \end{aligned}$$
    (3.2)

  3. (Order Preservation) For all \(\omega ,\omega ' \in A\):

    $$\begin{aligned} \text {if } f(\omega ) \le f(\omega '), \text { then } f_A(\omega ) \le f_A(\omega '). \end{aligned}$$
    (3.3)

A focalizing projective system can induce a relation on real-valued functions that may be viewed as abstracting the basic properties of the subset (or set inclusion) relation; we refer to these induced relations as sub-function relations.

Definition 3.6

(Sub-Functions). For a function space \({\mathcal {F}}_{\Omega }\) and a focalizing projective system \(\Pi _{\Omega }\) over it, define the relation \(\subseteq _{\Pi }\) over \({\mathcal {F}}_{\Omega }\) with respect to them, referred to as a sub-function relation, as follows:

$$\begin{aligned} f \subseteq _{\Pi } h \text { if and only if } \exists \pi \in \Pi _{\Omega } \text { such that } \pi (h)=f, \end{aligned}$$

for all \(f,h \in {\mathcal {F}}_{\Omega }\).

This definition of a sub-function relation may be viewed as extending the concept of a subset relation from \(2^{\Omega }\) to \([0,1]^{\Omega }\). It extends the basic, point-wise version of a sub-function, in which \(h:\Omega ' \rightarrow [0,1]\), with \(\Omega ' \subseteq \Omega \), is a sub-function of \(f:\Omega \rightarrow [0,1]\) if \(f(\omega )=h(\omega )\) for all \(\omega \in \Omega '\) [34]. This definition also allows other basic set theory concepts to be extended in terms of it (see Appendix A). With this sub-function relation \(\subseteq _{\Pi }\) in hand, we may state the definition of logical consequence in terms of it:

Definition 3.7

(Logical Consequence). A logical consequence relation with respect to a function space \({\mathcal {F}}_{\Omega }\) and a focalizing projective system \(\Pi _{\Omega }\) over it, is a relation \(\models \; \subseteq {\mathcal {F}}_{\Omega } \times {\mathcal {F}}_{\Omega }\) such that:

$$\begin{aligned} f \models h \text { if and only if } f \subseteq _{\Pi } h \end{aligned}$$

for all \(f,h \in {\mathcal {F}}_{\Omega }\).

This definition, by design, subsumes both logical consequence in classical logic and conditioning in probability. For logic, we may let \({\mathcal {F}}_{\Omega }\) be the space of binary functions and define a projection family as in Eq. 2.2; then we see that these projections produce the subset relation and, in turn, the classical consequence relation. For probability, we may let \({\mathcal {F}}_{\Omega } = {\mathcal {P}}_{\Omega } \cup \{ {\textbf {0}} \}\), where \({\mathcal {P}}_{\Omega }\) is the set of probability measures over \(\Omega \) and \({\textbf {0}}\) is the zero function over \(\Omega \) (denoting the degenerate distribution), and define the projections as in Eq. 2.3 (except now, for \(A \subseteq \Omega \) and \(P \in {\mathcal {P}}_{\Omega }\) such that \(P(A)=0\), letting the projection \(\pi _A(P) = {\textbf {0}}\)); then this projection family produces a sub-function relation that coincides with the probabilistic conditioning relation. This consequence definition also encompasses many of the extensions of consequence considered for many-valued logics (see, for example, [22]).
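The two instantiations can be exercised with a single generic entailment check, parameterized by the projection family; the Python sketch below uses illustrative names and exact rational arithmetic (`fractions.Fraction`) so that equality testing is safe.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = ("w1", "w2", "w3")

def subsets(xs):
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def entails(f, h, project):
    """Definition 3.7: f |= h iff some projection in the family takes h to f."""
    return any(project(h, A) == f for A in subsets(OMEGA))

def logic_project(f, A):
    """Eq. (2.2): the classical family, whose sub-function relation is subset-hood."""
    return {w: (f[w] if w in A else 0) for w in OMEGA}

def prob_project(P, A):
    """Eq. (2.3), with projections onto null events yielding the zero function."""
    mass = sum(P[w] for w in A)
    if mass == 0:
        return {w: Fraction(0) for w in OMEGA}
    return {w: (P[w] / mass if w in A else Fraction(0)) for w in OMEGA}

# Classical logic as an instance: f |= h iff f is a 'sub-function' of h.
f = {"w1": 1, "w2": 0, "w3": 0}
h = {"w1": 1, "w2": 1, "w3": 0}
assert entails(f, h, logic_project)

# Probability as an instance: P0 |= P1 iff P0 is a conditional of P1.
P1 = {w: Fraction(1, 3) for w in OMEGA}
P0 = {"w1": Fraction(1, 2), "w2": Fraction(1, 2), "w3": Fraction(0)}
assert entails(P0, P1, prob_project)
```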

As mentioned above, this formulation does not conform to the traditional interpretations of consequence based on truth-preservation, but rather, is suggestive of a more general notion of valid inference. The valuation functions may be interpreted in different ways, but in their usual application, they describe a state of knowledge, broadly construed, about \(\Omega \), the set of possible worlds. The knowledge about one world may not be independent of the knowledge about another (if there are dependencies, these must be preserved). This knowledge can be refined or reformulated based on restrictions of \(\Omega \) to subsets of possible worlds. The projections define, as these restrictions vary, the valuation functions that are allowed to follow from one another (i.e., what may be inferred). The consistency condition results in consequence relations that are transitive, which is important since conclusions should hold no matter the route taken to arrive at them, whether direct or more circuitous. This, of course, is the case in classical logic and probability, allowing, for example, a sequence of logical entailments to arrive at some conclusion, or similarly, a sequence of conditionings (based on, say, incoming evidence) to arrive at some probabilistic inference.

In this section, we defined a logical consequence relation that is symmetric, global, and infinite-valued. We now compare its properties to those of other many-valued logical consequence relations.

3.5 Comparing Consequences

The proposed definition of consequence differs from Tarskian consequence (Definition 3.1) in that it assumes a symmetry between premise and conclusion. Under the assumption that there exists a mapping \(\tau : {\mathbb {P}}({\mathcal {S}}) \rightarrow {\mathcal {F}}\) taking sets of sentences to valuation functions, the Tarskian definition of consequence collapses to a trivial form (i.e., letting \(\Psi = {\mathcal {S}}\) in Definition 3.1). This raises the question: is this structure sufficient for defining the essential shape of consequence, or has too much been lost in assuming generic elements (i.e., is it too abstract)? To add constraints to these general structural properties, most logical systems are also defined with operational rules that consequence has to respect (e.g., the operational rules in Gentzen’s sequent calculus, see Sect. 4.2.1). In the next section, we consider rules of inference, and we argue that consequence should be used to define them, rather than them being used to help define consequence. Here, we were able to define a logical consequence independently of any such rules because we required the elements to be structured (i.e., to be functions) and gave consequence its form based on that structure.

We now contrast the proposed logical consequence with those in most literature on many-valued logic. Due to the numerous proposals, the work [22] explores fundamental properties (or constraints) that they should satisfy in order to be viewed as ‘respectable’ consequence relations in this setting. These properties include: (i) bivalence-compliance, where the restriction of the relation to binary values respects classical consequence; (ii) locality, where consequence can be evaluated in terms of individual worlds (Definition 3.2); (iii) value-monotonicity, where the relation is monotonic with respect to the ordering of the truth values (see Definition 3.9 below); and (iv) validity-coherence, where every function is entailed by the zero function 0. Our proposed consequence relation, in general, violates all but the last of these properties.

Consider, for example, the value-monotonicity property. A focalizing projection family defines a partial order relation \(\le \) on \({\mathcal {F}}\) (Proposition 3.4), which may be contrasted with the standard point-wise ordering:

Definition 3.8

(Point-wise Ordering). A relation \(\le \) on \({\mathcal {F}}\) is a point-wise ordering if, for all \(f_0,f_1 \in {\mathcal {F}}\), we have:

$$\begin{aligned} f_0 \le f_1 \text { if and only if } f_0(\omega ) \le f_1(\omega ) \text { for all } \omega \in \Omega . \end{aligned}$$

This ordering gives rise to a notion of the strength of functions, where, loosely speaking, a function is stronger than another if it entails more, and in turn, to a notion of monotonicity:

Definition 3.9

(Value-Monotonicity). A consequence relation is value-monotonic if, for all \(f_0,f_1,f_0',f_1' \in {\mathcal {F}}\) such that \(f_0' \le f_0\) and \(f_1 \le f_1'\), we have:

$$\begin{aligned} \text {If } f_0 \models f_1, \text { then } f_0' \models f_1'. \end{aligned}$$

Notice that a focalizing projection family does not, in general, induce a partial ordering that satisfies the point-wise ordering in Definition 3.8. In probability, for example, for two probability distributions such that \(P_0 \subseteq P_1\) (i.e., \(P_0\) is a conditional distribution of \(P_1\)), there may exist \(\omega \in \Omega \) such that \(P_0(\omega ) > P_1(\omega )\) (e.g., consider conditioning on a single element \(a \in \Omega \), so that all mass is placed on it). Rather than requiring point-wise orderings, we specify a requirement about maintaining the relative orderings (Eq. 3.3). This issue presents itself in probability because the amount of mass contained in these functions must add up to one, which creates interdependencies among the values that can be assigned to different worlds \(\omega \in \Omega \).
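The probability counterexample mentioned above is easy to make explicit; a minimal sketch:

```python
from fractions import Fraction

OMEGA = ("a", "b", "c")

# P1 is uniform; conditioning on the single world 'a' gives P0.
P1 = {w: Fraction(1, 3) for w in OMEGA}
P0 = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(0)}

# P0 is a conditional of P1 (so P0 sits below P1 in the projective order),
# yet P0 is not below P1 point-wise:
assert P0["a"] > P1["a"]   # 1 > 1/3
```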

We note that under the single premise, single conclusion assumption used in this work, the monotonicity property—formalizing what we mean by a function being stronger than another—is the core of the relation, and hence the reason for using projections to give the concept more form. Additional constraints could be placed on projective systems so as to satisfy the above properties, but for the purposes of creating a formulation that encompasses classical logic and probability, our definition suffices. With a notion of logical consequence in hand, we now turn to the other fundamental component of logic.

4 Deductive Consequence

Thus far, we have developed a relation and referred to it as a logical consequence. However, if we wish to create an abstract logic (and further claim that probability is an instance), then we must contend with deduction and language, and how probability relates. The aim of this section is not to produce a particular logic, with particular rules of inference, but rather, we attempt to understand how the components of logic fit together.

There are two general approaches to constructing logical systems. In the first, which is the traditional approach, we begin with a language \({\mathcal {L}}\), which defines a space of sentences \({\mathcal {S}}_{{\mathcal {L}}}\), and we then proceed to define consequence relations over it. In the second approach, we instead begin with a space of functions \({\mathcal {F}}_{\Omega }\) (over a domain \(\Omega \)), and then proceed to define a consequence relation over it. The systems \(({\mathcal {F}}_{\Omega }, \models )\) and \(({\mathcal {S}}_{{\mathcal {L}}}, \vdash )\) are duals of each other; given either a mapping \(\tau :{\mathcal {S}}_{{\mathcal {L}}} \rightarrow {\mathcal {F}}_{\Omega }\) (that, loosely speaking, assigns valuations to sentences), or a mapping \(\tau ': {\mathcal {F}}_{\Omega } \rightarrow {\mathcal {S}}_{{\mathcal {L}}}\) (that assigns symbolic representations to functions), once one system is defined, then so is the other. In the literature, logical systems are typically constructed by first defining a language (and propositional space) (Footnote 16).

In this work, on the other hand, we consider the latter approach in which the function space is first defined. We believe considering this approach is instructive as it allows us to view the purpose of logical components from another angle. In this section, because this approach is somewhat atypical, we begin by discussing how a language can be constructed with respect to a function space. Then we consider the use of this language in constructing rules of inference for a given consequence relation. Finally, we consider the role of deduction in probability.

4.1 Language

A proof must take the form of a finite sequence of formulas, each derivable from the previous ones in the sequence, whose validity can be determined by inspection; unlike in the previous section, a language is critical here. In a formal language, the set of symbol strings is defined by a grammar, a set of rules for taking symbol strings and constructing larger ones, loosely speaking. Grammars define compositional objects, whereby objects are composed of parts, which in turn are composed of parts, etc. Grammars can also be used for constructing compositional objects besides strings, for example, graphs and trees (e.g., using graph grammars [68] or tree grammars [50]), which could be used as well. In classical logics, these symbol strings are used to represent binary functions, allowing for a compressed (and, under certain assumptions, finite) representation of them.

Consider the relationship between vocabulary and structured spaces. To begin in a simple setting, suppose \(\Omega = \Omega _1 \times \cdots \times \Omega _n\) is a product space and let \({\mathcal {F}} = 2^{\Omega }\) be the space of binary functions over it. To construct compact representations of these functions, we begin by specifying ‘primitive’ functions that only depend on parts of \(\omega \in \Omega \). Assign symbols \(a,b,c,\ldots \) to represent functions \(f_a,f_b,f_c,\ldots \in {\mathcal {F}}\) that only depend on a single component (Footnote 17). We refer to these as non-logical symbols due to the domain of their corresponding functions being the set \(\Omega \). Next, for a truth set \({\mathcal {V}}=\{0,1\}\), we can define connective symbols \(c_0,c_1,\ldots \) that represent functions of the form \(c:\{0,1\}^m \rightarrow \{0,1\}\), where m is a positive integer. These symbols allow for the construction of symbol strings (that represent additional functions), where a grammar is imposed: the set of well-formed sentences must correspond to the set of well-defined function compositions.
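The following Python sketch illustrates this compositional reading on a three-bit product space; the names (`atom`, `AND`, `OR`, `NOT`) are illustrative.

```python
from itertools import product

# A product space: each world is a triple of bits.
OMEGA = list(product([0, 1], repeat=3))

# 'Primitive' (non-logical) valuation functions, each depending on one component.
def atom(i):
    return lambda w: w[i]

a, b, c = atom(0), atom(1), atom(2)

# Connectives are world-free functions {0,1}^m -> {0,1}, composed point-wise.
def NOT(f):     return lambda w: 1 - f(w)
def AND(f, g):  return lambda w: f(w) * g(w)
def OR(f, g):   return lambda w: max(f(w), g(w))

# The string 'a AND (NOT b OR c)' names this composed valuation function:
sentence = AND(a, OR(NOT(b), c))
print([w for w in OMEGA if sentence(w) == 1])
# [(1, 0, 0), (1, 0, 1), (1, 1, 1)]
```

The grammar's role is visible here: a symbol string is well formed exactly when the corresponding composition of functions is well defined.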

Now suppose the space \(\Omega \) has a more general structure than a product space (see Appendix B), and the above language is no longer sufficient for representing all the functions over it. The non-logical symbols are again assigned to the primitive functions, as defined by the projective structure on this space. To increase the number of functions that can be represented, we can introduce free symbols \(x,y,z,\ldots \), which we can place in strings where non-logical symbols occur, allowing for the representation of a set of functions: for a well-formed formula \(\sigma (x)\) with free variable x, let \(f_{\sigma (x)}\) be a function of the form

$$\begin{aligned} f_{\sigma (x)}:\Omega \rightarrow \bigcup _{k=1}^{\infty } \{0,1\}^{k}, \end{aligned}$$
(4.1)

defined as \(f_{\sigma (x)}(\omega ) = \{f_{\sigma (x \rightarrow a)}(\omega ) \; | \; a \text { is a non-logical symbol in } \omega \}\), where \(\sigma (x \rightarrow a)\) is the symbol string formed from replacing the symbol x with the symbol a. Now we may introduce quantifier symbols \(\forall , \exists ,\ldots \), that correspond to mappings of the form \(\bigcup _{k=1}^{\infty } \{0,1\}^{k} \mapsto \{0,1\}\), allowing its composition with functions of the form in Eq. 4.1 to create a function in \({\mathcal {F}}\).
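A small continuation of the earlier sketch (again with illustrative names) shows a free variable producing a set-valued function as in Eq. 4.1, and a quantifier collapsing it back into a binary valuation.

```python
from itertools import product

OMEGA = list(product([0, 1], repeat=3))
atoms = [lambda w, i=i: w[i] for i in range(3)]   # non-logical symbols a, b, c
OR = lambda f, g: (lambda w: max(f(w), g(w)))

# A formula sigma(x) with a free variable denotes a set-valued function
# (Eq. 4.1): at each world, collect the values under every substitution.
def open_formula(sigma):
    return lambda w: tuple(sigma(f)(w) for f in atoms)

# Quantifiers are world-free mappings from value tuples to {0,1} ...
FORALL = lambda vals: int(all(vals))
EXISTS = lambda vals: int(any(vals))

# ... whose composition with Eq. (4.1) yields an ordinary valuation function.
sigma = open_formula(lambda x: OR(x, atoms[1]))     # 'x OR b'
g = lambda w: FORALL(sigma(w))                      # 'forall x (x OR b)'
assert [w for w in OMEGA if g(w)] == [w for w in OMEGA if w[1] == 1]
```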

We note that symbols can be defined as logical constants in two ways. In the traditional approach, a symbol is a constant if, roughly speaking, admissible replacements of the other symbols in sentences do not change inferences. Alternatively, a symbol can be defined to be a constant if its corresponding function does not depend on the set \(\Omega \), e.g., functions whose domains (and codomains) only involve the set \({\mathcal {V}}\) of valuations. For example, the above connective and quantifier symbols correspond to functions of the form \(\{0,1\}^m \mapsto \{0,1\}\) and \(\bigcup _{k=1}^{\infty } \{0,1\}^{k} \mapsto \{0,1\}\), respectively, which could be referred to as world-free functions (Footnote 18). We now give an example of a structured space \(\Omega \) where a first-order language is required for representing the functions over it.

Example

(First Order Language). Let \({\mathcal {U}}\) be a set referred to as the universe. Suppose \(\Omega \) is a space with structured elements, where each \(\omega \in \Omega \) takes the form of a hypergraph \(\omega = (D,e^{(0)},e^{(1)},\ldots )\), where \(D \subseteq {\mathcal {U}}\) is a subset of the universe, referred to as a domain of discourse, and each \(e^{(i)}\) is an edge function of the form \(e^{(i)}:D^{i} \rightarrow \{0,1\}^{m_i}\). Let \({\mathcal {F}}\) be the set of binary functions over \(\Omega \). We define a language for describing these functions as follows.

For each \(\omega \in \Omega \), let \(D[\omega ]\) denote the domain in \(\omega \) and let \(e^{(i)}[\omega ]\) denote the function \(e^{(i)}\) in \(\omega \). We define projections of \(\omega \in \Omega \) onto its function \(e^{(0)}\). Define functions \(f_k: \Omega \rightarrow \{0,1\}\), for \(k=1,\ldots ,m_0\), as follows: \(f_k(\omega ) = e^{(0)}_k [\omega ]\), where the subscript denotes the vector’s \(k^{\text {th}}\) component. Assign symbols \(a_1,\ldots ,a_{m_0}\) to these functions. Now we define projections of \(\omega \in \Omega \) onto its function \(e^{(1)}\). Define functions \(f_{k,a}:\Omega \rightarrow \{0,1\}\), for \(k=1,\ldots ,m_1\) and \(a \in {\mathcal {U}}\), as follows:

$$\begin{aligned} f_{k,a}(\omega ) = {\left\{ \begin{array}{ll} e^{(1)}_k[\omega ](a), \; & \text {if } a \in D[\omega ] \\ 0, \; & \text {otherwise} \end{array}\right. }. \end{aligned}$$

Assign symbols \(P_k(a)\) to each one of these functions. Continuing in this fashion, we construct functions of the form \(f_{k,a_1,\ldots ,a_i}:\Omega \rightarrow \{0,1\}\) and symbols \(P_k(a_1,\ldots ,a_i)\) that represent them, for \(i = 1,2,\ldots \) and \(k = 1,\ldots , m_i\). Assign symbols, e.g., \(\{\lnot , \wedge , \vee , \rightarrow \}\), to functions of the form \(\{0,1\}^2 \mapsto \{0,1\}\). Finally, introduce variable and quantifier symbols for expanding the number of functions that can be represented. Define a grammar for this set of symbols that corresponds to valid compositions of their corresponding functions such that the output is binary. We note that the structure of \(\Omega \) in this example, where there are functional dependences between edges and the objects in the domain, can be described in terms of a projective system (Appendix B).
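The guarded projection \(f_{k,a}\) above is simple to realize in code; here is a minimal sketch with a single unary edge function (all names illustrative).

```python
# A world is a pair (domain, edges): a domain of discourse D from a
# universe U, plus a unary predicate table e1 : D -> {0,1} (one unary
# predicate for brevity).
UNIVERSE = ("u1", "u2", "u3")

worlds = [
    ({"u1", "u2"}, {"u1": 1, "u2": 0}),
    ({"u1", "u2", "u3"}, {"u1": 1, "u2": 1, "u3": 0}),
]

def P(a):
    """The symbol P(a) names this projection of a world onto its edge
    function, guarded by membership of a in the domain of discourse."""
    def f(world):
        D, e1 = world
        return e1[a] if a in D else 0
    return f

f1 = P("u3")
print([f1(w) for w in worlds])   # [0, 0]: u3 is outside the first domain
```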

To summarize, for a given domain \(\Omega \), we may assign symbols to primitive functions on \(\Omega \), which describe components of \(\omega \in \Omega \), then build other functions over \(\Omega \) in terms of them, where the use of variables and quantifiers in the language allow the representation of more complex functions.

4.2 Proofs

Proofs require symbolic representations. Suppose that binary functions over \(\Omega \) do not have compact descriptions (with respect to a given vocabulary). Then these functions must be specified by enumeration (i.e., an exhaustive listing of the function’s output for every input), and in order to conclude that one function entails another, an exhaustive inspection is required (e.g., checking over each \(\omega \in \Omega \); Footnote 19). Further, even if the domain \(\Omega \) is finite, it may be the case that evaluating the function \(f(\omega )\) is too expensive or impossible, for example, because the element \(\omega \in \Omega \) is itself infinite in size (Footnote 20); this also necessitates compact representations, allowing for proofs that do not require direct evaluations of f. Symbolic representations allow for proofs of consequence based only on the syntactic form of sentences.

The abstract definition of consequence given by Tarski (Definition 3.1) specifies general structural properties (e.g., transitivity, monotonicity, etc.). To construct a proof theory for such relations, we must introduce a language with logical operators (i.e., logical constants), and define how consequence and the logical operators relate. Thus, in Gentzen’s sequent calculus [32], in addition to structural rules, we have operational rules that specify this relationship. These rules produce a joint definition; the definition of consequence is in terms of logical operators, and vice versa. Thus, when using these operational rules to construct a logic (e.g., a many-valued logic where consequence and constants must be defined), if we specify one, we can derive the other: i.e., we may define consequence with respect to given logical operators [14, 80], or conversely, define logical operators with respect to given consequence relations [7, 15, 21]. Notice, though, that in constructing a proof calculus, there are three objects in play: the operational rules, the consequence relation, and the logical operators. Given any two of them, we may derive the third (since they must be in alignment with each other). In this work, we consider an approach in which we first define consequence and logical constants, and then derive the operational rules (or rules of inference) from them. This was the approach taken in constructing Gentzen’s sequent calculus; the operational rules were clearly motivated by definitions of consequence and constants in classical logic (Footnote 21).

Thus far, we have defined a logical consequence on valuation functions (Sect. 3.4), and we defined logical constants for the symbolic representation of these functions in the previous subsection. The constants serve dual purposes, however: they are used both in the representation of functions and in the rules for transforming functions into others. These dual roles make them indispensable for characterizing the operational rules for consequence, the valid transformations from one function to another that can be verified by inspection.

4.2.1 Rules of Inference

Rules of inference specify the sequences of sentences that are valid for proofs. In the approach taken in this work, these rules are defined so as to be in alignment with the chosen definition of the logical consequence relation \(\models \) and the definition of the logical symbols in the language. Thus, we define a relation \(\models _{{\mathcal {S}}}\) over \({\mathcal {S}}\) (where \(\models _{{\mathcal {S}}} \; \subseteq {\mathcal {S}} \times {\mathcal {S}}\)) such that \(\phi \models _{{\mathcal {S}}} \psi \) if and only if \(f_{\phi } \models f_{\psi }\). We wish to construct rules of inference that define a relation \(\vdash \) that is equivalent to \(\models _{{\mathcal {S}}}\).

There are two general approaches to defining logical rules. In the Hilbert style, every line in a proof is an unconditional tautology, and we let the rules of inference be some relation \(\vdash '\) on \({\mathcal {S}}\) that can be evaluated by inspection. Then a sequence \((\phi _0, \phi _1,\ldots ,\phi _m)\) is valid with respect to these rules if \(\phi _0 \vdash ' \phi _1 \vdash ' \cdots \vdash ' \phi _m\), and we define \(\vdash \) as the set of relations derivable from \(\vdash '\). To define the relation \(\vdash '\) of inspectable consequences, we include in \(\vdash '\) only transformations that involve the application of a small number of logical constants.

For example, suppose we have logical operators \(\wedge \) and \(\rightarrow \) (e.g., functions of the form \({\mathcal {V}}^2 \mapsto {\mathcal {V}}\)) such that, for all \(f_0,f_1 \in {\mathcal {F}}\), we have \(f_0 \wedge (f_0 \rightarrow f_1) \models f_1\). Then, we may define a rule of the form \(\phi \wedge (\phi \rightarrow \psi ) \vdash \psi \), where, to simplify notation, we let the same symbols stand for these operators and their corresponding functions. This rule corresponds to a relation \(\vdash _{r_1}\) on \({\mathcal {S}}\) such that:

$$\begin{aligned} (q,r) \in \; \vdash _{r_1} \text { if and only if } q = \phi \wedge (\phi \rightarrow \psi ) \text { and } r = \psi \text { for some } \phi ,\psi \in {\mathcal {S}}, \end{aligned}$$

where \(q,r \in {\mathcal {S}}\). As another example, if we have a logical operator \(\vee \) such that, for all \(f_0,f_1 \in {\mathcal {F}}\), we have \(f_0 \models f_0 \vee f_1\), then we may define a rule of the form \(\phi \vdash \phi \vee \psi \) and a relation \(\vdash _{r_2}\) that corresponds to it. Supposing we have a set of such rules corresponding to a set of relations \(\vdash _{r_1},\ldots ,\vdash _{r_l}\), we define \(\vdash '\), the set of inspectable consequences, as their union, and define \(\vdash \), the set of provable consequences, in terms of it. Similarly, we can construct Gentzen style rules, where each line in a proof is a conditional tautology (Footnote 22).
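A minimal Python sketch of the Hilbert-style setup: sentences are nested tuples, each rule is a syntactic check on adjacent lines, and a sequence is a valid proof when every step is licensed by some rule (names are illustrative).

```python
# Sentences as nested tuples, e.g. ('and', phi, ('imp', phi, psi)).
def r1(q, r):
    """phi AND (phi -> psi) |- psi : inspects q's syntactic form only."""
    return (isinstance(q, tuple) and len(q) == 3 and q[0] == "and"
            and isinstance(q[2], tuple) and q[2][0] == "imp"
            and q[2][1] == q[1] and q[2][2] == r)

def r2(q, r):
    """phi |- phi OR psi."""
    return isinstance(r, tuple) and len(r) == 3 and r[0] == "or" and r[1] == q

RULES = (r1, r2)

def valid_proof(seq):
    """A sequence is a proof if each step is an inspectable consequence |-'."""
    return all(any(rule(q, r) for rule in RULES)
               for q, r in zip(seq, seq[1:]))

phi, psi, chi = "p", "q", "r"
proof = [("and", phi, ("imp", phi, psi)),   # premise
         psi,                               # by r1
         ("or", psi, chi)]                  # by r2
assert valid_proof(proof)
```

Note that the rules inspect only the syntactic form of each line; the valuation functions the strings represent are never evaluated.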

Any rule of inference constructed in this manner can be used for valid inference, as they are valid by construction. These rules only depend on form, not on the particular values assigned to the non-logical symbols in them. A sufficient number of rules must be constructed such that any instance of \(\models _{{\mathcal {S}}}\) is derivable from them. Thus, these rules define a grammar for valid proofs, where the functions represented by symbol strings need not be made explicit.

Importantly, the methodology for constructing rules of inference applies more broadly than to just consequence relations, and can be used for any relation. For example, let \({\mathcal {F}}\) be the set of real-valued functions over a space \(\Omega \), let \(=\) be the equality relation on \({\mathcal {F}}\) (or the order relation \(\le \)), and let \(+\) be the binary operator corresponding to addition. Then, for all \(f_0,f_1,f_2 \in {\mathcal {F}}\), we have that \(f_0 = f_1\) if and only if \(f_0 + f_2 = f_1 + f_2\). This is the rule of inference concerning the preservation of equality under addition. For constructing such rules, the logical operators need not be limited to point-wise operators of the form \({\mathcal {V}}^n \mapsto {\mathcal {V}}\), but can be extended to the more general form \({\mathcal {F}}^n \mapsto {\mathcal {F}}\), which may be useful when relations are not local (Definition 3.2). The construction of rules of inference can thus be abstracted from such mechanisms; the rules of inference in classical logic are one instance, and the standard rules routinely used in mathematics, for example, are others.

A point worth emphasizing is that even though the deductive consequence we have been considering is a symmetric relation (i.e., a subset of \({\mathcal {S}} \times {\mathcal {S}}\)) rather than the traditional relation between a set of premises and a conclusion (i.e., a subset of \({\mathbb {P}}({\mathcal {S}}) \times {\mathcal {S}}\)), we still have that the rules of inference are in terms of sets of sentences. That is, to prove instances of symmetric relations, the structure (or form) of a given sentence is represented by a set of component sentences and the rules specify valid manipulations of these components (e.g., if a sentence can be written as \(\phi \vee \psi \), we can conclude that it entails \(\phi \)). Thus, we view this as the location in logical systems where asymmetries are present, not within the consequence relations themselves.

In summary, given a consequence relation such as our proposal from the previous section and a language (with logical constants) for symbolically representing functions, we can construct Hilbert style and Gentzen style rules of inference. We contend that the rules of inference are in service of proving instances of consequence relations, and hence the approach of first specifying logical consequence is a natural one.

4.3 Probability

Does probability have some language for compactly representing its functions? And if so, does it also have a deductive apparatus for making proofs regarding its consequence relation?

Real-valued functions are significantly more complex than binary functions, in that a more complicated language is necessary to represent them. Indeed, in general, real-valued function spaces are too complex for every function in them to be compactly represented by a finite string (with finite vocabulary), i.e., languages will not be functionally complete (where there exists a surjective function \(\tau : {\mathcal {S}} \rightarrow {\mathcal {F}}\) such that \(\tau ( {\mathcal {S}} ) = {\mathcal {F}}\)). There are, however, functions in them that can be represented, for example, those expressible in closed form (e.g., using standard mathematical notation and grammar), or, closely related, those that belong to parameterized models. Let \({\mathcal {P}}\) denote the set of probability measures with respect to a space \(\Omega \), and let \({\mathcal {S}}\) denote a set of finite symbol strings such that for each string \(\phi \in {\mathcal {S}}\), there is a probability measure \(P_{\phi } \in {\mathcal {P}}\) corresponding to it. Let \(\tilde{{\mathcal {P}}} \subseteq {\mathcal {P}}\) denote the subset that can be expressed in closed form with respect to the language (i.e., there exist finite symbol strings corresponding to the probability measures). As an example, instead of specifying the distribution by an exhaustive listing, we can write the Ising model compactly as follows:

$$\begin{aligned} P(\omega ) = \frac{1}{Z} \exp \left[ - \sum _{i<j} a_{i,j} x_i x_j - \sum _{i} u_i x_i \right] , \; \omega \in \{+1,-1\}^n, \end{aligned}$$

where each \(x_i = \pi _i(\omega )\), the \(a_{i,j}\) and \(u_i\) are real values, and Z is a normalizing constant. Here, the symbols \(x_1,\ldots ,x_n\) are non-logical symbols, corresponding to projections of \(\omega \) onto its components, and the remaining symbols are logical. Notice the ‘abuse’ of notation in this equation: the left-hand side denotes a function P, whereas the right-hand side is a symbol string \(\phi \in {\mathcal {S}}\), and the equality symbol \(=\) expresses their correspondence (this is analogous to writing that a logical sentence equals its valuation function).
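To make the compactness point tangible, the following Python sketch renders the symbol string above as an executable function for a small n (the couplings and fields are chosen arbitrarily for illustration), and then applies the conditioning projection of Sect. 2 to it.

```python
from itertools import product
from math import exp

n = 3
a = {(0, 1): 0.5, (0, 2): -0.2, (1, 2): 0.1}   # couplings a_{i,j}, i < j
u = [0.3, 0.0, -0.4]                           # fields u_i

def unnormalized(omega):
    """The closed-form expression inside the exponential, evaluated at omega."""
    energy = sum(a[i, j] * omega[i] * omega[j] for (i, j) in a)
    energy += sum(u[i] * omega[i] for i in range(n))
    return exp(-energy)

OMEGA = list(product([+1, -1], repeat=n))
Z = sum(unnormalized(w) for w in OMEGA)        # normalizing constant

def P(omega):
    return unnormalized(omega) / Z

assert abs(sum(P(w) for w in OMEGA) - 1.0) < 1e-12
# Conditioning (the projection of Sect. 2) is again a renormalization,
# here on the event that the first spin is +1:
A = [w for w in OMEGA if w[0] == +1]
P_A = {w: P(w) / sum(P(v) for v in A) for w in A}
```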

We may construct rules of inference from the consequence relation and the language. In addition to a more complicated language, the rules of inference are more complicated as well. For classical logic, the operators were of the form \({\mathcal {V}}^2 \mapsto {\mathcal {V}}\) and were applied point-wise, whereas in probability, we need logical operators of the form \({\mathcal {P}}^2 \mapsto {\mathcal {P}}\) due to its non-local dependencies. For distributions \(P_0,P_1 \in {\mathcal {P}}\), we defined that \(P_0 \models P_1\) if there exists a focalizing projection such that \(\pi (P_1)=P_0\). Thus, rules concerning operations that maintain the equality relation are relevant, as are, more generally, the standard rules of inference used in everyday mathematics. It is beyond the scope of this work to catalogue these; they are merely the methods of proof already in use.

For our purposes here, it is sufficient to note that probability has a language for compactly representing real-valued functions, making it possible to prove that certain functions entail others, i.e., there are deductive tools for its consequence relation. Of course, deduction plays dissimilar roles in probability and classical logic; in probability, although proofs of consequence are important, they are not usually viewed as a defining feature of the system itself. The divergent interpretations that have developed regarding the purposes of these systems reflect their different applications.

5 Interfaces, Fusions, and Foundations

The previous sections have sketched a general formulation for logic, a system involving a binary relation over functions—with certain preservation, restriction, and consistency properties—and a mechanism for proving instances of it, which requires compact representations of the functions involved. Each function specifies a state of ‘knowledge’, at least in its usual application, and the relation specifies the valid transformations and inferences from it. The formulation contains as instances propositional logic, predicate logic, and probability, as well as others, which together compose the (formal) tools of reasoning that have found, by far, the most real-world application.

The implications of the proposed abstraction are significant, and are both positive, concerning how to develop alternative reasoning systems, and negative, concerning how not to combine probability and logic. In this section, we briefly evaluate, from the lens of the proposed relational interpretation, some of the common approaches to the fusion in the literature, and the degree to which they integrate both systems.

A large amount of research into the coupling of probability and logic begins with the placement of probabilities on propositions. This presupposes a certain relationship between these systems. Suppose we have a function space \({\mathcal {F}}_{\Omega }\), a sentence space \({\mathcal {S}}_{{\mathcal {L}}}\), and a mapping \(\tau :{\mathcal {S}}_{{\mathcal {L}}} \rightarrow {\mathcal {F}}_{\Omega }\) (that assigns valuations to sentences). The placement of probabilities on sentences amounts to the presupposition that, for sentences \(\phi \in {\mathcal {S}}_{{\mathcal {L}}}\), their binary valuation functions \(\tau (\phi ) \in {\mathcal {F}}_{\Omega }\) correspond to probabilistic events (subsets of \(\Omega \)), and it is this correspondence that defines the relationship between probability and logic.

There is another possibility, however: the binary valuation functions \(\tau (\phi ) \in {\mathcal {F}}_{\Omega }\) can instead correspond to probability densities, the valuation functions used in probability. That is, since both sentence valuations in logic and densities in probability define functions over a set \(\Omega \) of possible worlds, the former binary and the latter real valued, sentences could be taken to correspond, not to probabilistic events (the usual correspondence), but directly to probability measures.
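
As a toy illustration of the two candidate correspondences over a finite \(\Omega \) (normalizing the binary valuation into a uniform density is merely one illustrative way of realizing the alternative correspondence, not a prescription):

```python
def truth_set(sentence, worlds):
    """Usual correspondence: a sentence picks out an event, a subset of Omega."""
    return {w for w in worlds if sentence(w)}

def as_measure(sentence, worlds):
    """Alternative correspondence (one illustrative choice): normalize the
    binary valuation function into a density, i.e., uniform on the truth set.
    Only satisfiable sentences normalize into a measure."""
    A = truth_set(sentence, worlds)
    if not A:
        raise ValueError("an unsatisfiable sentence yields no density")
    return {w: (1.0 / len(A) if w in A else 0.0) for w in worlds}

worlds = range(6)
phi = lambda w: w % 2 == 0       # the sentence "w is even"
event = truth_set(phi, worlds)   # phi as an event: {0, 2, 4}
P_phi = as_measure(phi, worlds)  # phi as a probability measure
```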

Considerations of this alternative correspondence led us to our proposed abstraction. In the following subsections, we consider its implications with respect to the fields of statistical relational modeling, probability logic, and inductive logic, as well as to some foundational issues.

5.1 Statistical Relational Modeling

We first consider the fusion approach in the field of statistical relational modeling, which explores, among other things, the practical construction of probability distributions over spaces of possible worlds in first-order logic, where such distributions allow consistent probabilities to be assigned to sentences in the language.

Suppose we want to construct a probability distribution over a space in which elements represent possible worlds, where these worlds are composed of objects with attributes (e.g., child wearing a hat, blue sedan, etc.) and relationships between objects (e.g., holding hands, driving, etc.), and further, worlds vary in the objects they contain (e.g., they may be empty or may have numerous objects). Given a vocabulary in a logical language, a space \(\Omega \) can be associated with it, and a distribution over this space allows consistent probabilities to be assigned to sentences in the language, a central problem in the field [30, 63, 72]. The set \(\Omega \) is not equivalent to a product space in terms of its structural constraints (an idea that can be formalized in terms of structure-preserving mappings), which prevents the straightforward application of standard modeling approaches to it.

The set \(\Omega \) of possible worlds will, in general, be too massive to make learning a distribution over it feasible without the use of invariance assumptions (i.e., constraints, structure). These invariances can be specified using the logical language. If we have a probability distribution over a countable space \(\Omega \), then sentences of a logical language about this space (where sentences correspond to binary functions of the form \(f:\Omega \rightarrow \{0,1\}\)) can be assigned probabilities. (That is, a sentence is assigned probability equal to the probability of the subset of \(\Omega \) in which it is true.) If, on the other hand, the distribution over a space \(\Omega \) is unknown and we want to learn it, logical expressions can be used to express invariances in the distribution. For example, invariances can be defined on distributions by constraining the distributions to those that assign certain probabilities to certain sentences.
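
A minimal sketch of this assignment on a finite \(\Omega \) (the function name is ours):

```python
def sentence_probability(P, sentence):
    """Assign a sentence the probability of the subset of Omega where it is true."""
    return sum(p for w, p in P.items() if sentence(w))

P = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
prob_even = sentence_probability(P, lambda w: w % 2 == 0)   # 0.1 + 0.3 = 0.4
```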

More generally, invariances can be defined on these distributions in terms of logical expressions about the distribution itself [8, 28, 43], referred to as probability expressions (these correspond to functionals of the form \(f:{\mathcal {P}}_{\Omega } \rightarrow \{0,1\}\), where \({\mathcal {P}}_{\Omega }\) denotes the set of distributions over \(\Omega \)). This creates a modeling framework for \(\Omega \) based on general invariances, as expressed by logical expressions. This level of expressiveness in invariances, however, can result in the specification of a set of invariances that is inconsistent in the sense that there does not exist a well-defined distribution that satisfies it. This problem has led researchers to consider forgoing some of the expressive power in these logics to ensure consistency, and since graphical models provide consistency guarantees for their invariances, to research extensions of graphical models to more general spaces.
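
For the simplest probability expressions, those of the form \(P(\phi ) = b\), checking the consistency of a set of invariances reduces to the feasibility of a linear program. The following sketch (our own encoding, on a small finite \(\Omega \), using scipy) illustrates:

```python
import numpy as np
from scipy.optimize import linprog

def consistent(worlds, constraints):
    """Do invariances of the form P(phi) = b admit a well-defined distribution?

    `constraints` is a list of (phi, b) pairs with phi a subset of worlds;
    the invariances are consistent iff the linear program is feasible.
    """
    n = len(worlds)
    index = {w: i for i, w in enumerate(worlds)}
    A_eq, b_eq = [np.ones(n)], [1.0]          # total probability mass is 1
    for phi, b in constraints:
        row = np.zeros(n)
        for w in phi:
            row[index[w]] = 1.0
        A_eq.append(row)
        b_eq.append(b)
    # Default variable bounds (0, +inf) enforce nonnegativity of the masses.
    res = linprog(c=np.zeros(n), A_eq=np.array(A_eq), b_eq=np.array(b_eq))
    return res.success

worlds = [0, 1, 2, 3]
print(consistent(worlds, [({0, 1}, 0.7), ({1, 2}, 0.5)]))   # True: consistent
print(consistent(worlds, [({0}, 0.8), ({1}, 0.8)]))         # False: mass exceeds 1
```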

We now consider two issues concerning the depth of this general approach to fusing probability and logic. The first issue is that it is not clear where deduction comes into play in the fused system. Given a vocabulary in a logical language, a set of possible worlds can be defined, and from the perspective of its statistical modeling, it is only the structure of this space that matters. Since the structure of this space can be described without recourse to a logical system, the extra machinery (e.g., a notion of logical consequence, rules of inference, etc.) that comes with it can be forgone when defining distributions over it. In the case where additional structure is placed on the set (e.g., constraining certain sentences to have certain probabilities), while it may be convenient to describe these constraints using sentences, the defining characteristics of logic are still not being employed and the constraints could be equivalently described by functions. In the proposed interpretation, one of the primary purposes of sentences in logical systems—symbolic representations of the state of the world—is that they allow for proofs of instances of logical consequence. Without a notion of consequence, or a need for its proof, a logical language is not strictly necessary. This raises the question: in fusions of probability and logic based on assigning probabilities to sentences, where do logical consequence, and the need for a language, enter? Are we to assume that logical consequence takes the form of probabilistic conditioning in this fusion? If so, then not only is the role of probability being prioritized, but the very idea that probability needs to be extended with a notion of valid inference and that the systems are complementary is contradicted.

A second issue with this approach to a fusion concerns the interpretation of knowledge bases. In statistical relational modeling, many approaches define a knowledge base as a set of sentences and a corresponding set of weights, and use it when defining probability distributions over (subsets of) a set \(\Omega \) of possible worlds. In this formulation, knowledge bases define constraints on the allowable set of distributions. The use of constraints when defining distributions, however, is a standard practice in statistical modeling, which raises the question of whether the treatment of knowledge bases as a set of constraints is the proper abstraction of the concept. In the proposed interpretation of the relationship, both classical logic and probability share a common conception of what a knowledge base is, i.e., a designated valuation function over \(\Omega \) (or its corresponding symbol string) that expresses a state of knowledge and from which inferences can be made. Thus, the proposed interpretation is in conflict with the interpretation given to knowledge bases in this fusion approach. Finally, the notion of logical consequence in the fused system takes the form of probabilistic conditioning, and hence the role of probability is prioritized in this fusion.

5.2 Probability Logic

We now consider, through the lens of the proposed interpretation, the approach to the fusion in the field of probability logic [5, 35, 36, 55], where symbols are incorporated into a logical language for referencing, or quantifying over, probability measures. These logics can be roughly divided into two categories: (1) those in which logical consequence maintains its traditional definition based on the preservation of sentence truth values; and (2) those in which it does not. We consider each in turn. Let \(\Omega \) be a (countable) set of possible worlds and \({\mathcal {P}}_{\Omega }\) the set of probability measures over it. In the first category, a logic is defined over a space of probability measures \({\mathcal {P}}_{\Omega }\) (see, e.g., [28, 36, 37]); in this approach, a vocabulary and grammar are specified, producing symbol strings that represent binary functions over \({\mathcal {P}}_{\Omega }\). Basic sentences in these logics often take the form \(P(\phi ) \ge b\), where \(\phi \subseteq \Omega \), and more complex sentences are possible such as linear combinations \(a_1 P(\phi _1) + \cdots + a_n P(\phi _n) \ge b\), as well as others [8, 28, 43, 64, 66]. These sentences are either true or false for a given \(P \in {\mathcal {P}}_{\Omega }\), and the consequence relation is defined based on these truth values as usual.
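
For concreteness, a small sketch of how such sentences denote binary functions on \({\mathcal {P}}_{\Omega }\) for a finite \(\Omega \) (the names are ours):

```python
def prob(P, phi):
    """P(phi) for an event phi, a subset of the finite space Omega."""
    return sum(p for w, p in P.items() if w in phi)

def holds(P, terms, b):
    """Truth value at P of the sentence a_1 P(phi_1) + ... + a_n P(phi_n) >= b,
    where `terms` is a list of (a_i, phi_i) pairs."""
    return sum(a * prob(P, phi) for a, phi in terms) >= b

P = {0: 0.5, 1: 0.3, 2: 0.2}
sentence = [(1.0, {0, 1}), (-2.0, {2})]   # the sentence P({0,1}) - 2 P({2}) >= b
print(holds(P, sentence, 0.3))            # 0.8 - 0.4 = 0.4 >= 0.3, so True
```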

In this fused reasoning system, sentences correspond to binary valuation functions of the form f(P), where \(P \in {\mathcal {P}}_{\Omega }\). This may be contrasted with the reasoning system discussed in the previous section (Sect. 5.1), where probability measures were defined that take the form P(f), where \(f \in {\mathcal {F}}_{\Omega }\) is a binary function over \(\Omega \) and logical consequence is discarded in favor of probabilistic conditioning; here the opposite occurs. In this sense, there is a duality between these two approaches.Footnote 23 The problem, however, is that neither approach to the coupling produces a novel (or extended) reasoning system in the sense that the consequence relation essentially reduces to that of one or the other.

In the second category of probability logic, the logical consequence relation cannot be stated based on the truth values of sentences, but instead is redefined. We consider a representative example here [3, 5, 6], although others could have been employed as well. Let the language be limited to sentences of the form “\(P(\beta | \alpha )\) is high”, where \(\beta , \alpha \subseteq \Omega \), and where the meaning of this sentence is defined indirectly through the consequence relation described below. That is, the truth of a sentence “\(P(\beta | \alpha )\) is high” cannot be evaluated for a given \(P \in {\mathcal {P}}_{\Omega }\) (i.e., sentences do not represent binary functions on \({\mathcal {P}}_{\Omega }\)). Sentences do not have independent meanings here, but can only be interpreted in the context of others. Suppose we have a knowledge base \(\Gamma \) of the form:

$$\begin{aligned} \left\{ ``P(\psi _1 | \phi _1) \text { is high''}, \ldots , ``P(\psi _n | \phi _n) \text { is high''} \right\} . \end{aligned}$$

Then, define logical consequence as follows: \(\Gamma \) \(\models \)\(P(\beta | \alpha )\) is high” if and only if for all \(\epsilon >0\), there is a \(\delta >0\), such that for all probability measures \(P \in {\mathcal {P}}_{\Omega }\):

$$\begin{aligned} P(\psi _i | \phi _i) \ge 1-\delta \text { for each } i \implies P(\beta | \alpha ) \ge 1 - \epsilon . \end{aligned}$$

The first thing to notice is that this consequence is not a binary relation on pairs of sentences, i.e., the premises are not individual sentences (possibly formed from a conjunction of others), but sets of sentences. (The language does not contain connectives or other logical symbols for combining sentences to form others.)
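
Because the definition quantifies over all measures in \({\mathcal {P}}_{\Omega }\) and over \(\epsilon \) and \(\delta \), instances are not straightforward to verify; the following sketch (a numerical probe only, not a decision procedure; the names are ours) searches for counterexample measures on a small finite \(\Omega \):

```python
import random

def cond(P, b, a):
    """P(b | a) over a finite space; None if P(a) = 0."""
    pa = sum(p for w, p in P.items() if w in a)
    if pa == 0:
        return None
    return sum(p for w, p in P.items() if w in a and w in b) / pa

def probe(worlds, premises, conclusion, eps, delta, trials=100_000):
    """Search for a measure with every premise conditional >= 1 - delta while
    the conclusion conditional falls below 1 - eps. Finding one refutes
    consequence at this (eps, delta); finding none proves nothing."""
    cb, ca = conclusion
    for _ in range(trials):
        weights = [random.random() for _ in worlds]
        total = sum(weights)
        P = {w: x / total for w, x in zip(worlds, weights)}
        if all((c := cond(P, b, a)) is not None and c >= 1 - delta
               for b, a in premises):
            c = cond(P, cb, ca)
            if c is not None and c < 1 - eps:
                return P          # a counterexample measure
    return None
```

Here the premises and the conclusion are given as pairs \((\psi , \phi )\) of subsets of worlds, mirroring the sentences “\(P(\psi | \phi )\) is high”.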

This probability logic integrates both logical consequence and probabilistic conditioning into a single notion of consequence, unlike those in the previous category, and thereby creates a novel reasoning system (one that does not reduce to one or the other). While this logic fits into the standard abstraction of logic (e.g., involving Definition 3.1), it radically departs from the abstraction proposed here (i.e., sentences are not symbolic representations of functions, nor is logical consequence a relation on them). As a result, this probability logic deviates far from our intuitive understanding of what logic is. It is, in a sense, less constrained, as well as more complicated, than ones belonging to the proposed abstraction; such an unconstrained version of reasoning has less precedent in terms of supporting examples used in practice.

5.3 Inductive Logic

We now consider our interpretation’s fit with inductive logic [18, 19, 20, 33], where the main idea is to extend deductive (syntactic) consequence, a relation on symbol strings, to cases where premises provide less than conclusive support for conclusions, referred to as partial entailment. The goal is to define the degree (of confirmation) to which premises support conclusions, and to specify these values solely in terms of their syntactic structure. Probability appears to be a natural tool for specifying these degrees: assign to them the probabilities of conclusions after conditioning on premises. The view that probability defines partial entailments is referred to as the “logical interpretation of probability”Footnote 24 [48, 52].

In the proposed interpretation, sentences are symbolic representations of valuation functions, allowing for proofs. In both logic and probability, deductive consequence then concerns the provability of instances of logical consequence for their respective valuation functions (over any appropriate set of possible worlds). Extending deductive consequence from a binary-valued to a real-valued function over pairs of sentences therefore amounts to an assumption about partial degrees of provability (regarding proof of the entailment of one binary function from another), an assumption that contradicts the common conception of what a proof is. Hence, the interface between probability and logic expressed in inductive logic is at odds with the proposed interpretation.

5.4 Foundations

In research concerning the foundations of probability, there is debate about whether conditional probabilities should be taken as primary, with unconditional probabilities defined from them, or vice versa; it appears, however, that no argument on the matter is decisive [27, 38, 39]. In this work, we took unconditional probability as primary based on our interpretation of probability’s relationship with logic, i.e., since we took conditioning to be a form of consequence, it was defined in a secondary manner in terms of projections between unconditional distributions (or more generally, between valuation functions). Finally, we note that: (i) both logic and probability may be considered monotonic in the sense that consequence projections always constrict the regions of positive mass of functions, never expand them; and (ii) there may be implications concerning the dichotomy between deductive and inductive reasoning: if probability is an instance of logic, how can it be inherently inductive (except by way of its application)?

5.5 Narrow Interpretations

Although probability and classical logic may be viewed as instances of logic, broadly construed, the logical consequence relation in probability may also be considered an extension of that in classical logic in the sense that there exists a mapping taking probability measures to binary functions that preserves it. That is, if we let \(m:{\mathcal {P}}_{\Omega } \rightarrow {\mathcal {F}}_{\Omega }\) be the mapping that takes probability measures and identifies the portion of their domain with positive probability mass (i.e., \(m(P)(\omega ) = {\mathbb {I}}(P(\omega )>0)\), where \(\omega \in \Omega \) and \({\mathbb {I}}\) is the indicator function), then if a probability measure P entails \(P'\), we have that the binary function m(P) entails \(m(P')\). Thus, the latter relation can be recovered from the former, based on a simple mapping identifying positive probability mass. This suggests that, in this narrow sense, probability may be viewed as an extension of any particular classical logic.
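
Continuing the finite-\(\Omega \) sketches above (the names are ours), the mapping m and the preservation claim can be illustrated as follows:

```python
def m(P, tol=1e-12):
    """Map a measure to the binary function marking its positive-mass region."""
    return {w: int(p > tol) for w, p in P.items()}

def binary_entails(f, g):
    """Classical entailment of binary valuations: f |= g iff f <= g point-wise."""
    return all(f.get(w, 0) <= g.get(w, 0) for w in set(f) | set(g))

P1 = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
P0 = {1: 0.5, 2: 0.5, 3: 0.0, 4: 0.0}    # a focalization of P1, so P0 |= P1
assert binary_entails(m(P0), m(P1))      # entailment is preserved under m
```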

One of the main objections against extensional interpretations of the relationship between logic and probability, aside from the latter lacking a consequence relation, is that probability does not have “extensional” operations ([5, 77]; see also [36]). This objection is based on the observation that logical connectives can be used in logic to produce, loosely speaking, more complex sentences (e.g., taking two sentences \(\phi \) and \(\psi \) and forming their conjunction \(\phi \wedge \psi \)), whereas there are no corresponding connectives for probability measures (in the sense that \(P(\phi \wedge \psi )\) cannot be expressed as a function of \(P(\phi )\) and \(P(\psi )\)). While this observation argues against viewing probability measures defined over sentences as an extended version of logic, it does not conflict with the above extensional interpretation.

Another objection occasionally encountered against an extensional interpretation is that probability lacks the “expressive power” of logic. Loosely speaking, the expressivity of a logic concerns the number of valuation functions that can be expressed with it. However one formalizes these concepts, by most reasonable measures the expressivity of probability (the number of functions it can express) is not less than that of classical logic.

6 Summary and Discussion

This work concerns the foundational question of what is the core relationship between probability and logic. Is there a correct relational interpretation, or are more than one equally legitimate? In our investigation, we return to basic considerations regarding the correspondences between the two systems that could dictate the nature of their relationship. This work began with an observation: both logical consequence and probabilistic conditioning may be described in terms of a set of projections that prescribe what (valuation) functions are allowed to follow from others under refinements to the possible worlds. When these projections are endowed with preservation and restriction properties, they result in an extension of the subset relation to a sub-function relation. We view this more general relation as a natural direction to abstract logical consequence, using probability and classical logic as our guiding examples.

Inferences with respect to these relations are made relative to some designated function (e.g., a ‘knowledge base’), where in classical logic, this is a binary function over some structured space \(\Omega \), perhaps described by some conjunction of sentences, and in probability, a probability measure over it. The valuation functions, in their usual application, describe a state of ‘knowledge’, e.g., which possible worlds can occur or how likely they are to occur, and consequence may now be interpreted as a notion of how knowledge is allowed to flow or transition from one state to another. For proving instances of this consequence relation, it is necessary that the valuation functions have compact representations, which can be accomplished by assigning labels to elementary functions, and using them as building blocks for constructing more complex ones. Given these labels, grammars can then be used for specifying well-defined symbolic representations, as well as consequence-preserving operations on them, necessary for deduction.

One critique of the proposed abstraction may be related to the use of logic as a foundation of mathematics: how can logic and probability be considered to be on an equal footing (in the sense that both are instances of a broadly construed logic) if, for example, probability theory requires logic in the proof of its theoremsFootnote 25? This interpretation of the relationship, however, is at a stratospheric level and was never considered satisfactory by those searching for an intimate understanding of it. Another critique may concern our use of countable instead of uncountable spaces \(\Omega \) in our presentation, where in the latter case, consistent probabilities can generally no longer be assigned to all subsets and must be limited to only those in some sub-collection. However, the proposed interpretation of the relationship, based on the basic observation that probability may be interpreted as having a consequence relation analogous to logic, is not changed by the technicalities of measure theory. The function projections defined in Sect. 2, for example, can be adjusted to accommodate \(\sigma \)-fields in a straightforward mannerFootnote 26.

6.1 Why Abstract?

In general, abstractions can only be evaluated through appeals to meta-level principles such as aesthetics, simplicity, fit (to guiding examples), and other such qualities. The value of an abstraction derives, to a large extent, from the clarity and understanding that it affords.

The proposed abstraction encompasses probability and classical logic, and importantly, does so using minimal machinery. This allows for a compressed description of these systems. It accomplishes this without appeals to higher mathematics and without stretching or contorting the original conception of logic. It also has an explanatory property, providing a simple narrative concerning the purpose of components (i.e., the abstraction is not too abstract). We found this beneficial in understanding how components should and should not fit together in reasoning systems, and used it to analyze different approaches to fusing probability and logic (Sect. 5). Finally, the proposed abstraction, like any abstraction of value, suggests possible directions for the development of alternative instances.

Importantly, the proposed formulation, while encompassing those most used, does not encompass all logics that have been proposed, which provides a useful demarcation. The Tarskian abstraction (Definition 3.1), which extends classical logic in a different direction, encapsulates a greater number of examples, including many of the offshoot logics that have been developed in the literature in recent years. The proposed abstraction is not a replacement for the traditional Tarskian abstraction; they abstract in different directions, in the service of different aims.

6.2 Probability and Logic Revisited

The abstraction proposed in this work furnishes an interpretation of the relationship between probability and logic that is in agreement with the classical conception of it, that for some sufficiently broad definition of logic, probability is an instanceFootnote 27. This interpretation conflicts with the conception that these systems are fundamentally complementary in nature; as a result, our proposal is also in conflict with the approaches to their coupling proposed in the literature.

We considered several such research branches, and from the lens of the proposed interpretation, evaluated their consistency with the designated purpose of components in logical systems. For example, in the proposed interpretation, the basic purpose of a language in logical systems is that it allows for symbolic representations of functions (a requirement for proofs of instances of logical consequence).Footnote 28 Many fusions in the literature violate this raison d’être, as well as others (Sect. 5). We also appraised fusions based on their degree of integration and compactness. For example, a fusion in which one system is stacked on top of the other is not as compactly integrated as a fusion in which the systems are more intertwined (see Sects. 5.1 and 5.2). Similarly, a fusion in which each system is only partially incorporated is not as fully integrated as one that includes more of each.

Although probability and logic can combine/interface in many ways, ranging from the trivial to the complex, the issue at hand is whether there exists a deep fusion that forms a principled foundation for the development of reasoning systems more generally. If one accepts the proposed abstraction of logic in which probability is an instance, then it appears that there does not exist such a fusion between probability and any particular classical logic,Footnote 29 since one is already an extension of the other (Sect. 5.5). The proposed abstraction of logic, however, focuses our attention on other, perhaps more promising, directions in which new reasoning systems could be developed.