1 Overview

The question of how probability and logic relate, and how they can or cannot be combined, has an extensive literature and a long history, but is generally considered an unsettled matter (e.g., see [46, 81]). Perhaps the prevailing conception of the relationship, at least in the field of artificial intelligence, is that these systems are complementary (e.g., see [10, 51, 69]): probability can enhance logic by extending it to handle uncertainty, and in turn, logic can enhance probability with, among other things, a notion of valid inference (i.e., probability lacks a notion of consequence analogous to that of logic, and must be extended in this direction; Footnote 1). Research on fusing these two systems has splintered into several branches (e.g., probability logic, inductive logic, statistical relational modeling), whose boundaries are not always definitive, but which generally approach the fusion from different angles (Footnote 2), often emphasizing the role of either probability or logic over the other in the fusion, as well as certain components within each. But which approach, if any, is in some sense most principled? Does one approach integrate the core components of each system more fully? This issue is important because if one system is incorporated only at a surface level, say, then the fused system forms a faulty foundation for the development of a reasoning system meant to improve upon each.

Taking a step back, the above view of a complementary relationship is in conflict with a classical conception: that for some sufficiently broad definition of logic, probability is an instance of it (i.e., probability is a logic). This view has been repeatedly expressed since the inception of the field (e.g., [24, 25, 47, 52, 67]), and the “idea runs like a thread, at times more visible, at times less, through the subsequent history of epistemic probability” [45]. At the moment, this classical view is somewhat sidelined, perhaps because convincingly answering the question “In what way is probability a logic?” requires a compelling definition of logic, broadly construed. The problem of formulating logic broadly has old origins, and is studied, for example, in the field of universal logic [9, 12, 13, 16, 82], but there remains debate on whether these abstractions successfully extract and circumscribe the heart of logic. For the classical conception of the relationship to be valid, probability must be endowed with a notion of consequence (in order to be an instance of logic; Footnote 3), whereas for the complementary view, it cannot (hence the need to extend it in this direction). The resolution of this conflict hinges on how we choose to abstract consequence beyond classical logic.

It is useful to introduce our proposed abstraction by contrasting it with a recurring theme in research on combining probability and logic: the defining of probability measures over propositions (Footnote 4). Since much research is motivated by, or can be traced back to, questions about what can be inferred about a conclusion given that premises are true with only some degree of certainty, the most striking correspondence between these two systems is that both sentences and events define subsets of a given set \(\Omega \) of possible worlds, with the valuations placed on these respective objects differing only in being binary versus real valued. This correspondence between sentences and events shapes a general perspective (Footnote 5) of the relationship and leads to the conclusion that probability does not have a consequence relation analogous to that of classical logic (Footnote 6). In this work, in contrast, we consider a different correspondence between these systems that could also dictate the nature of the relationship and leads to the opposite conclusion.

This work begins by looking at how logical consequence and probabilistic conditioning relate, and we observe that both can be described in terms of projections, allowing for a compact description of them. In turn, this leads to a general version of consequence, a relation extended from binary to real-valued functions. This conception of consequence produces an abstract formulation of logic, for which we make a case based on its simplicity and fit, and under which probability may indeed be viewed as an instance. We initially focus on logical (semantic) consequence, and first establish its abstraction before turning to the issue of proof and formality (i.e., proofs based on the syntactic form of sentences; Footnote 7). On a slightly more technical note, the concept of logical consequence can be viewed as inextricably linked to the concept of set inclusion, and the abstraction of the former as tantamount to the abstraction of the latter (Footnote 8). Indeed, abstract formulations of logic (such as the standard one by Tarski) may be viewed as efforts to abstract the concept of set inclusion from a relation on sets to a relation on sets of sets (with asymmetries in the domain). Here, we consider an extension in a different direction: we abstract essential properties of the set inclusion relation, allowing the concept of a subset to be extended to that of a sub-function.

This work proceeds as follows. We begin by looking at the relationship between logical consequence and probabilistic conditioning, noticing that both can be stated in terms of projections that dictate which transitions between functions are valid (Sect. 2). This observation then leads to an abstract logical consequence based on projective systems, required to have preservation, restriction, and consistency properties (Sect. 3). We then examine how grammars and languages allow for proofs of instances of this abstract consequence relation by providing compact symbolic representations of valuation functions (Sect. 4). These ideas lead to a formulation of logic that encompasses probability, making an explicit claim on, in particular, the relationship between probability and classical logic; we attempt to reconcile this relational interpretation with the extensive literature on combining probability and logic (Sect. 5). We conclude with a summary and discussion (Sect. 6).

2 A Simple Correspondence

In this section, we compare logical consequence and probabilistic conditioning, noting a shared property: both relations prescribe what functions are allowed to follow from others based on projections of them. This observation forms the basis for the developments in the rest of the paper. In the next section, we use it to help formulate essential properties for an abstract consequence.

Before proceeding, we note the contrasting role of structure on a set \(\Omega \) in these two systems. The modern definition of probability is general in that the definition of a probability space can be stated without reference to the structure on the sample space, if it has any (i.e., the definition is agnostic to the properties of the objects in this set). The definitions of classical logics, on the other hand, are associated with spaces of possible worlds that take specific, though complex, structures (Footnote 9). The structure of these spaces is important when defining a language, but for the purposes of this section, where we look at connections with probability to inform potential directions for abstracting logical consequence, we may defer these considerations about language and space structure (see Sect. 4.1). In addition, for developing the basic ideas here, we assume the set \(\Omega \) is countable, although the ideas extend beyond this simple setting. In uncountable spaces, the measurability issues that occur (where consistent probabilities cannot be assigned to all subsets of the space) prompted the modern definition of probability, which, while providing rigor, can sometimes obscure the underpinning concepts. We will return to this issue.

Suppose that \({\mathcal {S}}\) is a set of sentences (i.e., well-formed formulas with no free variables) in a propositional or first-order language. Let \(\Omega \) be a set of possible worlds and let \({\mathcal {F}}=2^\Omega \) be the set of binary functions over it. Let \(\tau : {\mathcal {S}} \rightarrow {\mathcal {F}}\) be a mapping taking sentences to binary functions; for each sentence \(\phi \in {\mathcal {S}}\), there is a function \(f_{\phi } \equiv \tau (\phi )\) (taking the form \(f_{\phi }:\Omega \rightarrow \{0,1\}\)) that specifies its truth value in each world \(\omega \in \Omega \), which we will refer to as a valuation function. Let \(\models \) be a relation on \({\mathcal {F}}\), referred to as logical (semantic) consequence, discussed in this and the next section, and let \(\vdash \) be a relation on the set \({\mathcal {S}}\) of sentences based on the provability of logical consequence, referred to as deductive (syntactic) consequence, discussed in the section after next. We define \(\models \) as a relation on the set \({\mathcal {F}}\) of binary functions, rather than on the set \({\mathcal {S}}\) of sentences, to make explicit that this relation is a direct function of the valuations. The relation \(\models \) on \({\mathcal {F}}\) induces a relation \(\models _{{\mathcal {S}}}\) on \({\mathcal {S}}\): for all \(\phi , \psi \in {\mathcal {S}}\), let \(\phi \models _{{\mathcal {S}}} \psi \) if and only if \(f_{\phi } \models f_{\psi }\). A logical system \(({\mathcal {S}}, \models _{{\mathcal {S}}}, \vdash )\) is sound if \(\vdash \) is a subset of \(\models _{{\mathcal {S}}}\), and is complete if \(\models _{{\mathcal {S}}}\) is a subset of \(\vdash \).

In classical logic, a valuation function in \({\mathcal {F}}\) is defined to be the logical consequence of another function in \({\mathcal {F}}\) if it is positive whenever the other is positive. That is, for \(f, f' \in {\mathcal {F}}\), we have \(f \models f'\) if and only if:

$$\begin{aligned} f(\omega )=1 \implies f'(\omega )=1 \end{aligned}$$
(2.1)

for all \(\omega \in \Omega \). Another way to state this consequence relation, which we find useful below, is based on projections. Define projections \(\pi _A\), where \(A \subseteq \Omega \), taking functions in \({\mathcal {F}}\) to functions in \({\mathcal {F}}\), as follows:

$$\begin{aligned} f_A(\omega ) \equiv \pi _A(f)(\omega ) = {\left\{ \begin{array}{ll} f(\omega ), \; & \text {if } \omega \in A \\ 0, \; & \text {otherwise} \end{array}\right. }, \end{aligned}$$
(2.2)

where \(f \in {\mathcal {F}}\) and \(\omega \in \Omega \). Then, for two valuation functions \(f, f' \in {\mathcal {F}}\), we have that \(f \models f'\) if and only if there exists a projection in this projection family that takes \(f'\) to f (i.e., there exists an \(A \subseteq \Omega \) such that \(\pi _A(f')=f\)). To establish (prove) instances of this relation, these binary functions must be given symbolic representations (Sect. 4).
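To make the projection formulation concrete, here is a minimal Python sketch over a small finite \(\Omega \); the helper names (`project`, `entails`, `subsets`) are ours for illustration, and the search over all subsets \(A \subseteq \Omega \) is brute force, viable only for small spaces.

```python
from itertools import chain, combinations

OMEGA = ("w1", "w2", "w3")  # a small finite set of possible worlds

def project(f, A):
    """Projection pi_A from Eq. (2.2): keep f on A, zero elsewhere."""
    return {w: (f[w] if w in A else 0) for w in OMEGA}

def subsets(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def entails(f, f_prime):
    """f |= f' iff some projection in the family takes f' to f."""
    return any(project(f_prime, set(A)) == f for A in subsets(OMEGA))

f_phi = {"w1": 1, "w2": 0, "w3": 0}   # e.g., the valuation of a conjunction
f_psi = {"w1": 1, "w2": 1, "w3": 0}   # a weaker sentence
assert entails(f_phi, f_psi)          # pi_{w1}(f_psi) = f_phi
assert not entails(f_psi, f_phi)      # projections can only zero mass out
```

Equivalently, \(f \models f'\) holds exactly when f is point-wise below \(f'\); the projection view restates this in a form that generalizes beyond binary values.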

Probability uses real-valued functions and is also endowed with a binary relation on them, defined in terms of conditional distributions. Let \({\mathcal {P}}\) be the set of all probability measures with respect to \(\Omega \), and define projections on this function space as follows:

$$\begin{aligned} P_A(\omega ) \equiv \pi _A(P)(\omega ) = {\left\{ \begin{array}{ll} P(\omega )/P(A), \; & \text {if } \omega \in A \\ 0, \; & \text {otherwise} \end{array}\right. }, \end{aligned}$$
(2.3)

where \(P \in {\mathcal {P}}\), \(\omega \in \Omega \), and \(A \subseteq \Omega \) such that \(P(A)>0\). These projections, after restrictions on the function’s domain, define conditional distributions: for two probability measures \(P,P' \in {\mathcal {P}}\), we have that P is a conditional distribution of \(P'\) if and only if there exists a projection in this projection family that takes \(P'\) to P (i.e., there exists an \(A \subseteq \Omega \) such that \(\pi _A(P')=P\)). Analogous to the above logical consequence, for a given probability distribution, this relation dictates which other distributions follow from it, e.g., what distributions result from incorporating ‘evidence’.
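The same kind of sketch, with the same illustrative naming conventions, implements Eq. 2.3: conditioning renormalizes the mass within the conditioning event and zeroes it elsewhere, and projecting twice changes nothing (idempotence).

```python
OMEGA = ("w1", "w2", "w3")

def condition(P, A):
    """Projection pi_A from Eq. (2.3): renormalize P's mass within A."""
    mass = sum(P[w] for w in A)
    if mass == 0:
        raise ValueError("conditioning on a null event")
    return {w: (P[w] / mass if w in A else 0.0) for w in OMEGA}

P = {"w1": 0.5, "w2": 0.3, "w3": 0.2}
P_A = condition(P, {"w1", "w2"})        # P conditioned on the event {w1, w2}
assert abs(P_A["w1"] - 0.625) < 1e-12   # 0.5 / 0.8
assert P_A["w3"] == 0.0
# Idempotence: projecting onto the same event twice changes nothing.
P_AA = condition(P_A, {"w1", "w2"})
assert all(abs(P_AA[w] - P_A[w]) < 1e-12 for w in OMEGA)
```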

These relations in classical logic and probability are alike in that they are both based on projections that restrict the mass of functions to some subset of their domain, preserve the mass of functions within that domain, and have certain consistency properties. This is suggestive of a possible abstraction of logical consequence, which we consider in the next section.

In comparing these relations in classical logic and probability, the directionality of their respective projections needs to be distinguished. For a given function f, one can check whether a function \(f'\) entails it (i.e., whether there exists a projection taking f to \(f'\)), or alternatively, whether it entails \(f'\) (i.e., whether there exists a projection taking \(f'\) to f); we will refer to these as projecting down and projecting up, respectively. Conditioning a probability distribution on evidence projects it down to another distribution. A binary valuation function, in contrast, logically entails another binary function if the latter projects down to the former. In other words, probabilistic conditioning corresponds to projecting down and logical consequence to projecting up, and hence, conditioning and consequence are opposite in their projective directionality (Footnote 10).

3 Logical Consequence

Motivated by the above discussion, which illustrated how projections can be used to dictate the allowable transitions of ‘knowledge’ from one state to another, we consider an abstraction of logical consequence based on projections. We begin with a brief review of relevant literature to situate our developments. In the first subsection, we state a standard formulation of abstract consequence, allowing us to contrast the direction in which it extends the classical version with the direction taken here. Then, in the following two subsections, we discuss the relevant consequence relations used in non-classical logics, in particular, in many-valued and multiple conclusion logics. We then develop our proposed version, which is based on abstracting the essential properties of the subset relation in order to create a relation analogous to it, but over more general objects. We consider languages and deductive proof in Sect. 4.

3.1 Standard Abstract Consequence

A standard abstraction of consequence is as follows [77, 78]:

Definition 3.1

(Tarskian Consequence). A Tarskian consequence relation on a set \(\Psi \) is a relation \(\models \; \subseteq {\mathbb {P}}(\Psi ) \times \Psi \) such that, for all \(A,B \subseteq \Psi \) and for all \(a \in \Psi \), we have:

  1. (Reflexivity) If \(a \in A\), then \(A \models a\)

  2. (Monotonicity) If \(A \models a\) and \(A \subseteq B\), then \(B \models a\)

  3. (Transitivity) If \(A \models a\) and \(B \models b\) for every \(b \in A\), then \(B \models a\)

In this definition, the set \(\Psi \) is generic, and the elements in it could be well-formed formulas, valuation functions, or something else; no particular structure is assumed of them. The structure that gives this relation substance, then, lies not in the set \(\Psi \), but in the form assumed of the space \({\mathbb {P}}(\Psi ) \times \Psi \), a point to which we will return below. This abstract formulation of consequence has many logics falling under its umbrella; one instantiation is many-valued logics, to which we now turn.

3.2 Many-Valued Logic

Suppose we have an ordered (Footnote 11) set \({\mathcal {V}}\) of truth values, containing 0 and 1, where the latter is the greatest element and the former the least. The numerous proposals for logical consequence in the many-valued setting can be categorized as follows [22]: those based on the preservation of designated values from the premises to the conclusion are pure consequences (e.g., Łukasiewicz’s three-valued logic [58]); those based on the conclusion being at least as true as the falsest premise are order-theoretic consequences (e.g., certain probability logics [4], also [59]); and those based on premises having some designated set of truth values, and conclusions having some other designated set of truth values, are mixed consequences (e.g., q-consequence [60]). Each many-valued version is motivated by different criteria, but for our purposes here, the thing to note about these variations is that they are all defined locally (referred to as ‘truth-relational’ in [22]), whereby one can evaluate an entailment by evaluating it individually at each possible world:

Definition 3.2

(Locality). A consequence relation \(\models \) is local if there exists a relation \(\sim \; \subseteq {\mathbb {P}}({\mathcal {V}}) \times {\mathcal {V}}\) such that for all \(A \subseteq \Psi \) and for all \(a \in \Psi \), we have:

$$\begin{aligned} A \models a \text { if and only if } A(\omega ) \sim a(\omega ) \text { for all } \omega \in \Omega . \end{aligned}$$

We will say a consequence relation is global if it is not local.
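For a concrete contrast, consider the following Python sketch (helper names are illustrative): classical consequence passes the locality test via the point-wise relation \(f(\omega ) \le f'(\omega )\), whereas probabilistic conditioning does not, since the normalizer couples the worlds together.

```python
OMEGA = ("w1", "w2", "w3")

def classical_entails(f, f_prime):
    # Local (Definition 3.2): reduces to the point-wise check f(w) <= f'(w).
    return all(f[w] <= f_prime[w] for w in OMEGA)

def is_conditional(P0, P1):
    """P0 |= P1 iff P0 equals P1 conditioned on the support of P0."""
    A = {w for w in OMEGA if P0[w] > 0}
    mass = sum(P1[w] for w in A)
    if mass == 0:
        return False
    return all(abs(P0[w] - (P1[w] / mass if w in A else 0.0)) < 1e-9
               for w in OMEGA)

assert classical_entails({"w1": 1, "w2": 0, "w3": 0},
                         {"w1": 1, "w2": 1, "w3": 0})

P1 = {"w1": 0.25, "w2": 0.25, "w3": 0.5}
P0 = {"w1": 0.5, "w2": 0.5, "w3": 0.0}
assert is_conditional(P0, P1)           # a valid conditioning on {w1, w2}

# Each point-wise value pair below occurs in SOME valid conditioning
# ((0.5, 0.25) above; (0, 0.25) by excluding w2; (0.5, 0.5) by an identity
# projection), yet jointly they fail: no fixed relation on values alone
# decides the matter, so conditioning is a global relation.
Q0 = {"w1": 0.5, "w2": 0.0, "w3": 0.5}
assert not is_conditional(Q0, P1)
```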

This property is a cornerstone of most logics because the values in \({\mathcal {V}}\) are interpreted, for example, as degrees of truth [42], quasi truth values [74], gaps in truth [54], or informational states [11], all of which depend only on a given possible world (Footnote 12). In this work, however, we are interested in exploring formulations that encompass probability, which requires principles of inference for states of ‘knowledge’ that take a more general form. For this reason, we consider formulations in which logical consequence is a global relation, i.e., it is not restricted by a locality condition that allows for point-wise evaluation. We now turn to the topic of symmetry in logics, which is also relevant to our developments.

3.3 Multiple Conclusion Logic

One way classical logic can be extended in a direction different than in Tarskian consequence (Definition 3.1) is to endow the relation with a symmetry between premises and conclusions, a topic studied in the field of multiple conclusion logic [73]. In this logic, both premises and conclusions are sets, and consequence relations take the form \(\models \; \subseteq {\mathbb {P}}(\Psi ) \times {\mathbb {P}}(\Psi )\), where \(\Psi \) is some set.

One way to define a consequence relation in this setting is based on the disjunction of the conclusions: if all premises are true, then at least one conclusion must be true (Footnote 13). The idea of giving symmetry to logic has early roots, going back at least to Gentzen’s sequent calculus [31, 32] and Tait’s version of it (whereby finite sets are considered instead of sequences), and continuing with the work of Carnap [17], where consequence and rules of inference were defined, Kneale [53], where a proof technique was devised for these rules, and Scott [71], which made important contributions, for example, generalizing Lindenbaum’s theorem to this setting.

In this work, we take symmetry—not the multiplicity of conclusions—as the indispensable concept within multiple conclusion logic to be retained. If we assume that we can define a set \({\mathcal {F}}\) of all possible valuation functions, then for the purposes of defining a logical consequence relation, the particular language is immaterial (e.g., two different languages may have the same logical consequence relation). Thus, in this work, we make the assumption that for a given language and set \({\mathcal {S}}\) of well-formed formulas in it, there exists a mapping \(\tau : {\mathbb {P}}({\mathcal {S}}) \rightarrow {\mathcal {F}}\) taking sets of formulas to valuation functions. This assumption holds in classical logic, where consequence is defined based on the conjunction of premises, which results in a single sentence (hence any set of premises gets mapped to a valuation function). This assumption allows us to formulate logical consequence as a relation between a single premise and single conclusion, rather than between a set of premises and set of conclusions. This is not as restrictive as it may appear, since we may always construct logical systems such that this assumption holds (Footnote 14), e.g., by adding values to the truth set \({\mathcal {V}}\) and increasing the size of \({\mathcal {F}}\), a point that we will discuss further below.

This assumption might cause the following concern: how can we perform proofs if consequence involves only a single premise? However, as we will see, the rules of inference in this setting will still have asymmetries and involve compositions of sets of sentences, and indeed, we argue that this is the proper location for asymmetry (see Sect. 4.2.1). Thus, without loss of generality, we may define logical consequence as a binary relation on \({\mathcal {F}}\), rather than as a relation between sets drawn from \({\mathbb {P}}(\Psi )\). However, a logic in which consequence is defined between a single premise and a single conclusion raises the question of where the necessary structure on consequence will be located, a topic to which we now turn.

3.4 A Symmetric, Global Logical Consequence

We now consider an extension of logical consequence from binary to infinitely many truth values. In the classical version, logical consequence may be written as a relation \(\models \; \subseteq {\mathbb {P}}(\Omega ) \times {\mathbb {P}}(\Omega )\), where \(\Omega \) is a set of possible worlds, such that, for all \(A,B \subseteq \Omega \), we have:

$$\begin{aligned} A \models B \text { if and only if } B \subseteq A. \end{aligned}$$
(3.1)

To extend this relation to more general functions over \(\Omega \), the comparison of classical logic with probability elucidates a path.

Let \({\mathcal {F}}_{\Omega }\) be a set of valuation functions over some domain \(\Omega \) and let their codomain be \({\mathcal {V}} = [0,1]\) (chosen here for concreteness, though it can be abstracted); i.e., \({\mathcal {F}}_{\Omega }\) generalizes \({\mathbb {P}}(\Omega )\) used in the classical version of the relation above. The discussion in the previous section suggests an abstraction of logical consequence based on projections that define valid transitions between functions. From the relations in equations 2.2 and 2.3, we see that desirable properties of these projections include that they: (i) restrict the domain of valuation functions to some given region; (ii) preserve their mass within that region; and (iii) are consistent with each other. Let \(\Pi _{\Omega }\) be a set of projections on \({\mathcal {F}}_{\Omega }\), i.e., containing projections of the form \(\pi :{\mathcal {F}}_{\Omega } \rightarrow {\mathcal {F}}_{\Omega }\), and let them be indexed by subsets \(A \subseteq \Omega \). Recall that a projection \(\pi \) is an idempotent function, i.e., \(\pi \circ \pi = \pi \): projecting an object more than once does not change the result.

We begin by defining some basic properties that a projection family must have in order for it to produce a partial ordering over \({\mathcal {F}}_{\Omega }\), which is a necessary (but we argue not sufficient) requirement for a relation to be considered a consequence relation.

Definition 3.3

(Regular Projections). A projective system \(\Pi _{\Omega }\) over \({\mathcal {F}}_{\Omega }\) is regular if:

  1. (Consistency) For all \(A, B \subseteq \Omega \), we have \(\pi _A \circ \pi _B = \pi _{A \cap B}\). In particular, we have that if \(A \subseteq B\), then \(\pi _A \circ \pi _{B} = \pi _A\).

  2. (Identity Map) For all \(f \in {\mathcal {F}}_{\Omega }\), we have \(\pi _A(f)=f\) for all \(A \subseteq \Omega \) that contain the positive mass of f (i.e., \(A \supseteq \{\omega \in \Omega | f(\omega )>0\}\)). In particular, \(\pi _{\Omega }(f) = f\) for all \(f \in {\mathcal {F}}_{\Omega }\).

These basic properties allow for a projective system to define a partial ordering over the functions in \({\mathcal {F}}_{\Omega }\); if there exists a projection taking a function h to a function f, then we say \(f \le h\). The consistency property is necessary for producing relations that are transitive:

Proposition 3.4

(Projective Ordering). For a function space \({\mathcal {F}}_{\Omega }\) and a regular projective system \(\Pi _{\Omega }\) over it, define a relation \(\le \) over \({\mathcal {F}}_{\Omega }\) as follows:

$$\begin{aligned} f \le h \text { if and only if } \exists \pi \in \Pi _{\Omega } \text { such that } \pi (h)=f, \end{aligned}$$

for all \(f,h \in {\mathcal {F}}_{\Omega }\). Then \(\le \) is a partial ordering.

Proof

The relation \(\le \) is a partial ordering if it is reflexive, transitive, and antisymmetric. Reflexivity (i.e., \(f \le f\)) follows directly from the identity projections \(\pi _{\Omega }\) in the regular projective system. To show transitivity, suppose that \(f \le h\) and \(h \le g\). Then there exist projections \(\pi _A\) and \(\pi _B\) such that \(f = \pi _A(h)\) and \(h = \pi _B(g)\), and hence, by the consistency property, there exists a projection \(\pi _{A \cap B} = \pi _A \circ \pi _B\) such that \(f = \pi _{A \cap B}(g)\), implying \(f \le g\). To show antisymmetry, suppose that \(f \le h\) and \(h \le f\). Then there exist projections \(\pi _A\) and \(\pi _B\) such that \(f = \pi _A(h)\) and \(h = \pi _B(f)\), and by the consistency property, we have that \(f = \pi _A \circ \pi _B (f) = \pi _{A \cap B} (f)\) and \(h = \pi _B \circ \pi _A (h) = \pi _{A \cap B} (h)\). Also by the consistency property, we have that \(\pi _{A \cap B} \circ \pi _B = \pi _{A \cap B}\), which implies that \(\pi _{A \cap B}(f) = \pi _{A \cap B} \circ \pi _B (f) = \pi _{A \cap B}(h)\). As a result, we have that \(f = h\). \(\square \)
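As a sanity check (not a proof), one can verify the regularity properties numerically for the conditioning projections of Sect. 2 on a small space; the sketch below, with illustrative helper names, exhausts all pairs of subsets.

```python
from itertools import combinations

OMEGA = ("w1", "w2", "w3")

def subsets(xs):
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def project(P, A):
    """Probability projection (Eq. 2.3), sending null events to the zero function."""
    mass = sum(P[w] for w in A)
    if mass == 0:
        return {w: 0.0 for w in OMEGA}
    return {w: (P[w] / mass if w in A else 0.0) for w in OMEGA}

def close(P, Q, tol=1e-9):
    return all(abs(P[w] - Q[w]) < tol for w in OMEGA)

P = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
for A in subsets(OMEGA):
    for B in subsets(OMEGA):
        # Consistency: pi_A(pi_B(P)) = pi_{A & B}(P).
        assert close(project(project(P, B), A), project(P, A & B))
# Identity: projecting onto a superset of the support changes nothing.
assert close(project(P, set(OMEGA)), P)
```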

We refer to the partial ordering defined from a projective system as a projective ordering. Now, we consider what additional properties a projective system must have in order to produce a relation we might call a consequence relation. In particular, these additional properties abstract the notion of the set inclusion relation. Assume the function space \({\mathcal {F}}_{\Omega }\) contains the zero function (Footnote 15) over \(\Omega \), denoted by \({\textbf {0}}\), and to simplify the notation, for a projection \(\pi _A\), where \(A \subseteq \Omega \), let \(f_A \equiv \pi _A(f)\) for each function \(f \in {\mathcal {F}}_{\Omega }\).

Definition 3.5

(Focal Projections). A projective system \(\Pi _{\Omega }\) over \({\mathcal {F}}_{\Omega }\) is focalizing if it is regular, and for all \(A \subseteq \Omega \) and for all \(f \in {\mathcal {F}}_{\Omega }\), we have:

  1. (Mass Restriction) For all \(\omega \notin A\), we have \(f_A(\omega ) = 0\).

  2. (Mass Preservation) For all \(\omega \in A\):

    $$\begin{aligned} \text {if } f(\omega )>0, \text { then } f_A(\omega ) > 0. \end{aligned}$$
    (3.2)

  3. (Order Preservation) For all \(\omega ,\omega ' \in A\):

    $$\begin{aligned} \text {if } f(\omega ) \le f(\omega '), \text { then } f_A(\omega ) \le f_A(\omega '). \end{aligned}$$
    (3.3)

A focalizing projective system can induce a relation on real-valued functions that may be viewed as abstracting the basic properties of the subset (or set inclusion) relation; we refer to these induced relations as sub-function relations.

Definition 3.6

(Sub-Functions). For a function space \({\mathcal {F}}_{\Omega }\) and a focalizing projective system \(\Pi _{\Omega }\) over it, define the relation \(\subseteq _{\Pi }\) over \({\mathcal {F}}_{\Omega }\) with respect to them, referred to as a sub-function relation, as follows:

$$\begin{aligned} f \subseteq _{\Pi } h \text { if and only if } \exists \pi \in \Pi _{\Omega } \text { such that } \pi (h)=f, \end{aligned}$$

for all \(f,h \in {\mathcal {F}}_{\Omega }\).

This definition of a sub-function relation may be viewed as extending the concept of a subset relation from \(2^{\Omega }\) to \([0,1]^{\Omega }\). It extends the basic, point-wise version of a sub-function, in which \(h:\Omega ' \rightarrow [0,1]\), with \(\Omega ' \subseteq \Omega \), is a sub-function of \(f:\Omega \rightarrow [0,1]\) if \(f(\omega )=h(\omega )\) for all \(\omega \in \Omega '\) [34]. This definition also allows other basic set theory concepts to be extended in terms of it (see Appendix A). With this sub-function relation \(\subseteq _{\Pi }\) in hand, we may state the definition of logical consequence in terms of it:

Definition 3.7

(Logical Consequence). A logical consequence relation with respect to a function space \({\mathcal {F}}_{\Omega }\) and a focalizing projective system \(\Pi _{\Omega }\) over it, is a relation \(\models \; \subseteq {\mathcal {F}}_{\Omega } \times {\mathcal {F}}_{\Omega }\) such that:

$$\begin{aligned} f \models h \text { if and only if } f \subseteq _{\Pi } h \end{aligned}$$

for all \(f,h \in {\mathcal {F}}_{\Omega }\).

This definition, by design, subsumes both logical consequence in classical logic and conditioning in probability. For logic, we may let \({\mathcal {F}}_{\Omega }\) be the space of binary functions and define a projection family as in Eq. 2.2; then we see that these projections produce the subset relation and, in turn, the classical consequence relation. For probability, we may let \({\mathcal {F}}_{\Omega } = {\mathcal {P}}_{\Omega } \cup \{ {\textbf {0}} \}\), where \({\mathcal {P}}_{\Omega }\) is the set of probability measures over \(\Omega \) and \({\textbf {0}}\) is the zero function over \(\Omega \) (denoting the degenerate distribution), and define the projections as in Eq. 2.3 (except now, for \(A \subseteq \Omega \) and \(P \in {\mathcal {P}}_{\Omega }\) such that \(P(A)=0\), letting the projection \(\pi _A(P) = {\textbf {0}}\)); then this projection family produces a sub-function relation that coincides with the probabilistic conditioning relation. This consequence definition also encompasses many of the extensions of consequence considered for many-valued logics (see, for example, [22]).
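The two instantiations can be exercised with a single generic entailment check, parameterized by the projection family; the Python sketch below uses illustrative names and exact rational arithmetic (`fractions.Fraction`) so that equality testing is safe.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = ("w1", "w2", "w3")

def subsets(xs):
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def entails(f, h, project):
    """Definition 3.7: f |= h iff some projection in the family takes h to f."""
    return any(project(h, A) == f for A in subsets(OMEGA))

def logic_project(f, A):
    """Eq. (2.2): the classical family, whose sub-function relation is subset-hood."""
    return {w: (f[w] if w in A else 0) for w in OMEGA}

def prob_project(P, A):
    """Eq. (2.3), with projections onto null events yielding the zero function."""
    mass = sum(P[w] for w in A)
    if mass == 0:
        return {w: Fraction(0) for w in OMEGA}
    return {w: (P[w] / mass if w in A else Fraction(0)) for w in OMEGA}

# Classical logic as an instance: f |= h iff f is a 'sub-function' of h.
f = {"w1": 1, "w2": 0, "w3": 0}
h = {"w1": 1, "w2": 1, "w3": 0}
assert entails(f, h, logic_project)

# Probability as an instance: P0 |= P1 iff P0 is a conditional of P1.
P1 = {w: Fraction(1, 3) for w in OMEGA}
P0 = {"w1": Fraction(1, 2), "w2": Fraction(1, 2), "w3": Fraction(0)}
assert entails(P0, P1, prob_project)
```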

As mentioned above, this formulation does not conform to the traditional interpretations of consequence based on truth-preservation, but rather, is suggestive of a more general notion of valid inference. The valuation functions may be interpreted in different ways, but in their usual application, they describe a state of knowledge, broadly construed, about \(\Omega \), the set of possible worlds. The knowledge about one world may not be independent of the knowledge about another (if there are dependencies, these must be preserved). This knowledge can be refined or reformulated based on restrictions of \(\Omega \) to subsets of possible worlds. The projections define, as these restrictions vary, the valuation functions that are allowed to follow from one another (i.e., what may be inferred). The consistency condition results in consequence relations that are transitive, which is important since conclusions should hold no matter the route taken to arrive at them, whether direct or more circuitous. This, of course, is the case in classical logic and probability, allowing, for example, a sequence of logical entailments to arrive at some conclusion, or similarly, a sequence of conditionings (based on, say, incoming evidence) to arrive at some probabilistic inference.

In this section, we defined a logical consequence relation that is symmetric, global, and infinite-valued. We now compare its properties to those of other many-valued logical consequence relations.

3.5 Comparing Consequences

The proposed definition of consequence differs from Tarskian consequence (Definition 3.1) in that it assumes a symmetry between premise and conclusion. Under the assumption that there exists a mapping \(\tau : {\mathbb {P}}({\mathcal {S}}) \rightarrow {\mathcal {F}}\) taking sets of sentences to valuation functions, the Tarskian definition of consequence collapses to a trivial form (i.e., letting \(\Psi = {\mathcal {S}}\) in Definition 3.1). This raises the question: is this structure sufficient for defining the essential shape of consequence, or has too much been lost in assuming generic elements (i.e., is it too abstract)? To add constraints to these general structural properties, most logical systems are also defined with operational rules that consequence has to respect (e.g., the operational rules in Gentzen’s sequent calculus, see Sect. 4.2.1). In the next section, we consider rules of inference, and we argue that consequence should be used to define them, rather than them being used to help define consequence. Here, we were able to define a logical consequence independently of any such rules because we required the elements to be structured (i.e., to be functions) and gave consequence its form based on that structure.

We now contrast the proposed logical consequence with those in most literature on many-valued logic. Due to the numerous proposals, the work [22] explores fundamental properties (or constraints) that they should satisfy in order to be viewed as ‘respectable’ consequence relations in this setting. These properties include: (i) bivalence-compliance, where the restriction of the relation to binary values respects classical consequence; (ii) locality, where consequence can be evaluated in terms of individual worlds (Definition 3.2); (iii) value-monotonicity, where the relation is monotonic with respect to the ordering of the truth values (see Definition 3.9 below); and (iv) validity-coherence, where every function is entailed by the zero function 0. Our proposed consequence relation, in general, violates all but the last of these properties.

Consider, for example, the value-monotonicity property. A focalizing projection family defines a partial order relation \(\le \) on \({\mathcal {F}}\) (Proposition 3.4), which may be contrasted with the standard point-wise ordering:

Definition 3.8

(Point-wise Ordering). A relation \(\le \) on \({\mathcal {F}}\) is a point-wise ordering if, for all \(f_0,f_1 \in {\mathcal {F}}\), we have:

$$\begin{aligned} f_0 \le f_1 \text { if and only if } f_0(\omega ) \le f_1(\omega ) \text { for all } \omega \in \Omega . \end{aligned}$$

This ordering gives rise to a notion of the strength of functions, where, loosely speaking, a function is stronger than another if it entails more, and in turn, to a notion of monotonicity:

Definition 3.9

(Value-Monotonicity). A consequence relation is value-monotonic if, for all \(f_0,f_1,f_0',f_1' \in {\mathcal {F}}\) such that \(f_0' \le f_0\) and \(f_1 \le f_1'\), we have:

$$\begin{aligned} \text {If } f_0 \models f_1, \text { then } f_0' \models f_1'. \end{aligned}$$

Notice that a focalizing projection family does not, in general, induce a partial ordering that satisfies the point-wise ordering in Definition 3.8. In probability, for example, for two probability distributions such that \(P_0 \subseteq P_1\) (i.e., \(P_0\) is a conditional distribution of \(P_1\)), there may exist \(\omega \in \Omega \) such that \(P_0(\omega ) > P_1(\omega )\) (e.g., consider conditioning on a single element \(a \in \Omega \), so that all mass is placed on it). Rather than requiring point-wise orderings, we specify a requirement about maintaining the relative orderings (Eq. 3.3). This issue presents itself in probability because the amount of mass contained in these functions must add up to one, which creates interdependencies among the values that can be assigned to different worlds \(\omega \in \Omega \).
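The probability counterexample mentioned above is easy to make explicit; a minimal sketch:

```python
from fractions import Fraction

OMEGA = ("a", "b", "c")

# P1 is uniform; conditioning on the single world 'a' gives P0.
P1 = {w: Fraction(1, 3) for w in OMEGA}
P0 = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(0)}

# P0 is a conditional of P1 (so P0 sits below P1 in the projective order),
# yet P0 is not below P1 point-wise:
assert P0["a"] > P1["a"]   # 1 > 1/3
```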

We note that under the single premise, single conclusion assumption used in this work, the monotonicity property—formalizing what we mean by a function being stronger than another—is the core of the relation, and hence the reason for using projections to give the concept more form. Additional constraints could be placed on projective systems so as to satisfy the above properties, but for the purposes of creating a formulation that encompasses classical logic and probability, our definition suffices. With a notion of logical consequence in hand, we now turn to the other fundamental component of logic.

4 Deductive Consequence

Thus far, we have developed a relation and referred to it as a logical consequence. However, if we wish to create an abstract logic (and further claim that probability is an instance), then we must contend with deduction and language, and how probability relates. The aim of this section is not to produce a particular logic, with particular rules of inference, but rather, we attempt to understand how the components of logic fit together.

There are two general approaches to constructing logical systems. In the first, which is the traditional approach, we begin with a language \({\mathcal {L}}\), which defines a space of sentences \({\mathcal {S}}_{{\mathcal {L}}}\), and we then proceed to define consequence relations over it. In the second approach, we instead begin with a space of functions \({\mathcal {F}}_{\Omega }\) (over a domain \(\Omega \)), and then proceed to define a consequence relation over it. The systems \(({\mathcal {F}}_{\Omega }, \models )\) and \(({\mathcal {S}}_{{\mathcal {L}}}, \vdash )\) are duals of each other; given either a mapping \(\tau :{\mathcal {S}}_{{\mathcal {L}}} \rightarrow {\mathcal {F}}_{\Omega }\) (that, loosely speaking, assigns valuations to sentences), or a mapping \(\tau ': {\mathcal {F}}_{\Omega } \rightarrow {\mathcal {S}}_{{\mathcal {L}}}\) (that assigns symbolic representations to functions), once one system is defined, then so is the other. In the literature, logical systems are typically constructed by first defining a language (and propositional space) (Footnote 16).

In this work, on the other hand, we consider the latter approach in which the function space is first defined. We believe considering this approach is instructive as it allows us to view the purpose of logical components from another angle. In this section, because this approach is somewhat atypical, we begin by discussing how a language can be constructed with respect to a function space. Then we consider the use of this language in constructing rules of inference for a given consequence relation. Finally, we consider the role of deduction in probability.

4.1 Language

A proof must take the form of a finite sequence of formulas, each derivable from the previous ones in the sequence, whose validity can be determined by inspection; unlike in the previous section, a language is critical here. In a formal language, the set of symbol strings is defined by a grammar, a set of rules for taking symbol strings and constructing larger ones, loosely speaking. Grammars define compositional objects, whereby objects are composed of parts, which in turn are composed of parts, etc. Grammars can also be used for constructing compositional objects besides strings, for example, graphs and trees (e.g., using graph grammars [68] or tree grammars [50]), which could be used as well. In classical logics, these symbol strings are used to represent binary functions, allowing for a compressed (and, under certain assumptions, finite) representation of them.

Consider the relationship between vocabulary and structured spaces. To begin in a simple setting, suppose \(\Omega = \Omega _1 \times \cdots \times \Omega _n\) is a product space and let \({\mathcal {F}} = 2^{\Omega }\) be the space of binary functions over it. To construct compact representations of these functions, we begin by specifying ‘primitive’ functions that only depend on parts of \(\omega \in \Omega \). Assign symbols \(a,b,c,\ldots \) to represent functions \(f_a,f_b,f_c,\ldots \in {\mathcal {F}}\) that only depend on a single component (Footnote 17). We refer to these as non-logical symbols due to the domain of their corresponding functions being the set \(\Omega \). Next, for a truth set \({\mathcal {V}}=\{0,1\}\), we can define connective symbols \(c_0,c_1,\ldots \) that represent functions of the form \(c:\{0,1\}^m \rightarrow \{0,1\}\), where m is a positive integer. These symbols allow for the construction of symbol strings (that represent additional functions), where a grammar is imposed: the set of well-formed sentences must correspond to the set of well-defined function compositions.
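The following Python sketch illustrates this compositional reading on a three-bit product space; the names (`atom`, `AND`, `OR`, `NOT`) are illustrative.

```python
from itertools import product

# A product space: each world is a triple of bits.
OMEGA = list(product([0, 1], repeat=3))

# 'Primitive' (non-logical) valuation functions, each depending on one component.
def atom(i):
    return lambda w: w[i]

a, b, c = atom(0), atom(1), atom(2)

# Connectives are world-free functions {0,1}^m -> {0,1}, composed point-wise.
def NOT(f):     return lambda w: 1 - f(w)
def AND(f, g):  return lambda w: f(w) * g(w)
def OR(f, g):   return lambda w: max(f(w), g(w))

# The string 'a AND (NOT b OR c)' names this composed valuation function:
sentence = AND(a, OR(NOT(b), c))
print([w for w in OMEGA if sentence(w) == 1])
# [(1, 0, 0), (1, 0, 1), (1, 1, 1)]
```

The grammar's role is visible here: a symbol string is well formed exactly when the corresponding composition of functions is well defined.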

Now suppose the space \(\Omega \) has a more general structure than a product space (see Appendix B), and the above language is no longer sufficient for representing all the functions over it. The non-logical symbols are again assigned to the primitive functions, as defined by the projective structure on this space. To increase the number of functions that can be represented, we can introduce free symbols \(x,y,z,\ldots \), which we can place in strings where non-logical symbols occur, allowing for the representation of a set of functions: for a well-formed formula \(\sigma (x)\) with free variable x, let \(f_{\sigma (x)}\) be a function of the form

$$\begin{aligned} f_{\sigma (x)}:\Omega \rightarrow \bigcup _{k=1}^{\infty } \{0,1\}^{k}, \end{aligned}$$
(4.1)

defined as \(f_{\sigma (x)}(\omega ) = \{f_{\sigma (x \rightarrow a)}(\omega ) \; | \; a \text { is a non-logical symbol in } \omega \}\), where \(\sigma (x \rightarrow a)\) is the symbol string formed from replacing the symbol x with the symbol a. Now we may introduce quantifier symbols \(\forall , \exists ,\ldots \), that correspond to mappings of the form \(\bigcup _{k=1}^{\infty } \{0,1\}^{k} \mapsto \{0,1\}\), allowing its composition with functions of the form in Eq. 4.1 to create a function in \({\mathcal {F}}\).
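A small continuation of the earlier sketch (again with illustrative names) shows a free variable producing a set-valued function as in Eq. 4.1, and a quantifier collapsing it back into a binary valuation.

```python
from itertools import product

OMEGA = list(product([0, 1], repeat=3))
atoms = [lambda w, i=i: w[i] for i in range(3)]   # non-logical symbols a, b, c
OR = lambda f, g: (lambda w: max(f(w), g(w)))

# A formula sigma(x) with a free variable denotes a set-valued function
# (Eq. 4.1): at each world, collect the values under every substitution.
def open_formula(sigma):
    return lambda w: tuple(sigma(f)(w) for f in atoms)

# Quantifiers are world-free mappings from value tuples to {0,1} ...
FORALL = lambda vals: int(all(vals))
EXISTS = lambda vals: int(any(vals))

# ... whose composition with Eq. (4.1) yields an ordinary valuation function.
sigma = open_formula(lambda x: OR(x, atoms[1]))     # 'x OR b'
g = lambda w: FORALL(sigma(w))                      # 'forall x (x OR b)'
assert [w for w in OMEGA if g(w)] == [w for w in OMEGA if w[1] == 1]
```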

We note that symbols can be defined as logical constants in two ways. In the traditional approach, a symbol is a constant if, roughly speaking, admissible replacements of the other symbols in sentences do not change inferences. Alternatively, a symbol can be defined to be a constant if its corresponding function does not depend on the set \(\Omega \), e.g., functions whose domains (and codomains) only involve the set \({\mathcal {V}}\) of valuations. For example, the above connective and quantifier symbols correspond to functions of the form \(\{0,1\}^m \mapsto \{0,1\}\) and \(\bigcup _{k=1}^{\infty } \{0,1\}^{k} \mapsto \{0,1\}\), respectively, which could be referred to as world-free functions (Footnote 18). We now give an example of a structured space \(\Omega \) where a first-order language is required for representing the functions over it.

Example

(First Order Language). Let \({\mathcal {U}}\) be a set referred to as the universe. Suppose \(\Omega \) is a space with structured elements, where each \(\omega \in \Omega \) takes the form of a hypergraph \(\omega = (D,e^{(0)},e^{(1)},\ldots )\), where \(D \subseteq {\mathcal {U}}\) is a subset of the universe, referred to as a domain of discourse, and each \(e^{(i)}\) is an edge function of the form \(e^{(i)}:D^{i} \rightarrow \{0,1\}^{m_i}\). Let \({\mathcal {F}}\) be the set of binary functions over \(\Omega \). We define a language for describing these functions as follows.

For each \(\omega \in \Omega \), let \(D[\omega ]\) denote the domain in \(\omega \) and let \(e^{(i)}[\omega ]\) denote the function \(e^{(i)}\) in \(\omega \). We define projections of \(\omega \in \Omega \) onto its function \(e^{(0)}\). Define functions \(f_k: \Omega \rightarrow \{0,1\}\), for \(k=1,\ldots ,m_0\), as follows: \(f_k(\omega ) = e^{(0)}_k [\omega ]\), where the subscript denotes the vector’s \(k^{\text {th}}\) component. Assign symbols \(a_1,\ldots ,a_{m_0}\) to these functions. Now we define projections of \(\omega \in \Omega \) onto its function \(e^{(1)}\). Define functions \(f_{k,a}:\Omega \rightarrow \{0,1\}\), for \(k=1,\ldots ,m_1\) and \(a \in {\mathcal {U}}\), as follows:

$$\begin{aligned} f_{k,a}(\omega ) = {\left\{ \begin{array}{ll} e^{(1)}_k[\omega ](a), \; & \text {if } a \in D[\omega ] \\ 0, \; & \text {otherwise} \end{array}\right. }. \end{aligned}$$

Assign symbols \(P_k(a)\) to each one of these functions. Continuing in this fashion, we construct functions of the form \(f_{k,a_1,\ldots ,a_i}:\Omega \rightarrow \{0,1\}\) and symbols \(P_k(a_1,\ldots ,a_i)\) that represent them, for \(i = 1,2,\ldots \) and \(k = 1,\ldots , m_i\). Assign symbols, e.g., \(\{\lnot , \wedge , \vee , \rightarrow \}\), to functions of the form \(\{0,1\}^2 \mapsto \{0,1\}\). Finally, introduce variable and quantifier symbols for expanding the number of functions that can be represented. Define a grammar for this set of symbols that corresponds to valid compositions of their corresponding functions such that the output is binary. We note that the structure of \(\Omega \) in this example, where there are functional dependences between edges and the objects in the domain, can be described in terms of a projective system (Appendix B).
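The guarded projection \(f_{k,a}\) above is simple to realize in code; here is a minimal sketch with a single unary edge function (all names illustrative).

```python
# A world is a pair (domain, edges): a domain of discourse D from a
# universe U, plus a unary predicate table e1 : D -> {0,1} (one unary
# predicate for brevity).
UNIVERSE = ("u1", "u2", "u3")

worlds = [
    ({"u1", "u2"}, {"u1": 1, "u2": 0}),
    ({"u1", "u2", "u3"}, {"u1": 1, "u2": 1, "u3": 0}),
]

def P(a):
    """The symbol P(a) names this projection of a world onto its edge
    function, guarded by membership of a in the domain of discourse."""
    def f(world):
        D, e1 = world
        return e1[a] if a in D else 0
    return f

f1 = P("u3")
print([f1(w) for w in worlds])   # [0, 0]: u3 is outside the first domain
```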

To summarize, for a given domain \(\Omega \), we may assign symbols to primitive functions on \(\Omega \), which describe components of \(\omega \in \Omega \), then build other functions over \(\Omega \) in terms of them, where the use of variables and quantifiers in the language allow the representation of more complex functions.

4.2 Proofs

Proofs require symbolic representations. Suppose that binary functions over \(\Omega \) do not have compact descriptions (with respect to a given vocabulary). Then these functions must be specified by enumeration (i.e., an exhaustive listing of the function’s output for every input), and in order to conclude that one function entails another, an exhaustive inspection is required (e.g., checking over each \(\omega \in \Omega \); Footnote 19). Further, even if the domain \(\Omega \) is finite, it may be the case that evaluating the function \(f(\omega )\) is too expensive or impossible, for example, because the element \(\omega \in \Omega \) is itself infinite in size (Footnote 20); this also necessitates compact representations, allowing for proofs that do not require direct evaluations of f. Symbolic representations allow for proofs of consequence based only on the syntactic form of sentences.

The abstract definition of consequence given by Tarski (Definition 3.1) specifies general structural properties (e.g., transitivity, monotonicity, etc.). To construct a proof theory for such relations, we must introduce a language with logical operators (i.e., logical constants), and define how consequence and the logical operators relate. Thus, in Gentzen’s sequent calculus [32], in addition to structural rules, we have operational rules that specify this relationship. These rules produce a joint definition; the definition of consequence is in terms of logical operators, and vice versa. Thus, when using these operational rules to construct a logic (e.g., a many-valued logic where consequence and constants must be defined), if we specify one, we can derive the other: i.e., we may define consequence with respect to given logical operators [14, 80], or conversely, define logical operators with respect to given consequence relations [7, 15, 21]. Notice, though, that in constructing a proof calculus, there are three objects in play: the operational rules, the consequence relation, and the logical operators. Given any two of them, we may derive the third (since they must be in alignment with each other). In this work, we consider an approach in which we first define consequence and logical constants, and then derive the operational rules (or rules of inference) from them. This was the approach taken in constructing Gentzen’s sequent calculus; the operational rules were clearly motivated by definitions of consequence and constants in classical logic (Footnote 21).

Thus far, we have defined a logical consequence on valuation functions (Sect. 3.4), and we defined logical constants for the symbolic representation of these functions in the previous subsection. The constants serve dual purposes, however: they are used both in the representation of functions and in the rules for transforming functions into others. These dual roles make them indispensable for characterizing the operational rules for consequence, the valid transformations from one function to another that can be verified by inspection.

4.2.1 Rules of Inference

Rules of inference specify the sequences of sentences that are valid for proofs. In the approach taken in this work, these rules are defined so as to be in alignment with the chosen definition of the logical consequence relation \(\models \) and the definition of the logical symbols in the language. Thus, we define a relation \(\models _{{\mathcal {S}}}\) over \({\mathcal {S}}\) (where \(\models _{{\mathcal {S}}} \; \subseteq {\mathcal {S}} \times {\mathcal {S}}\)) such that \(\phi \models _{{\mathcal {S}}} \psi \) if and only if \(f_{\phi } \models f_{\psi }\). We wish to construct rules of inference that define a relation \(\vdash \) that is equivalent to \(\models _{{\mathcal {S}}}\).

There are two general approaches to defining logical rules. In the Hilbert style, every line in a proof is an unconditional tautology, and we let the rules of inference be some relation \(\vdash '\) on \({\mathcal {S}}\) that can be evaluated by inspection. Then a sequence \((\phi _0, \phi _1,\ldots ,\phi _m)\) is valid with respect to these rules if \(\phi _0 \vdash ' \phi _1 \vdash ' \cdots \vdash ' \phi _m\), and we define \(\vdash \) as the set of relations derivable from \(\vdash '\). To define the relation \(\vdash '\) of inspectable consequences, we include in \(\vdash '\) only transformations that involve the application of a small number of logical constants.

For example, suppose we have logical operators \(\wedge \) and \(\rightarrow \) (e.g., functions of the form \({\mathcal {V}}^2 \mapsto {\mathcal {V}}\)) such that, for all \(f_0,f_1 \in {\mathcal {F}}\), we have \(f_0 \wedge (f_0 \rightarrow f_1) \models f_1\). Then, we may define a rule of the form \(\phi \wedge (\phi \rightarrow \psi ) \vdash \psi \), where, to simplify notation, we let the same symbols stand for these operators and their corresponding functions. This rule corresponds to a relation \(\vdash _{r_1}\) on \({\mathcal {S}}\) such that:

$$\begin{aligned} (q,r) \in \; \vdash _{r_1} \text { if and only if } q = \phi \wedge (\phi \rightarrow \psi ) \text { and } r = \psi \text { for some } \phi ,\psi \in {\mathcal {S}}, \end{aligned}$$

where \(q,r \in {\mathcal {S}}\). As another example, if we have a logical operator \(\vee \) such that, for all \(f_0,f_1 \in {\mathcal {F}}\), we have \(f_0 \models f_0 \vee f_1\), then we may define a rule of the form \(\phi \vdash \phi \vee \psi \) and a relation \(\vdash _{r_2}\) that corresponds to it. Supposing we have a set of such rules corresponding to a set of relations \(\vdash _{r_1},\ldots ,\vdash _{r_l}\), we define \(\vdash '\), the set of inspectable consequences, as their union, and define \(\vdash \), the set of provable consequences, in terms of it. Similarly, we can construct Gentzen style rules, where each line in a proof is a conditional tautology (Footnote 22).
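A minimal Python sketch of the Hilbert-style setup: sentences are nested tuples, each rule is a syntactic check on adjacent lines, and a sequence is a valid proof when every step is licensed by some rule (names are illustrative).

```python
# Sentences as nested tuples, e.g. ('and', phi, ('imp', phi, psi)).
def r1(q, r):
    """phi AND (phi -> psi) |- psi : inspects q's syntactic form only."""
    return (isinstance(q, tuple) and len(q) == 3 and q[0] == "and"
            and isinstance(q[2], tuple) and q[2][0] == "imp"
            and q[2][1] == q[1] and q[2][2] == r)

def r2(q, r):
    """phi |- phi OR psi."""
    return isinstance(r, tuple) and len(r) == 3 and r[0] == "or" and r[1] == q

RULES = (r1, r2)

def valid_proof(seq):
    """A sequence is a proof if each step is an inspectable consequence |-'."""
    return all(any(rule(q, r) for rule in RULES)
               for q, r in zip(seq, seq[1:]))

phi, psi, chi = "p", "q", "r"
proof = [("and", phi, ("imp", phi, psi)),   # premise
         psi,                               # by r1
         ("or", psi, chi)]                  # by r2
assert valid_proof(proof)
```

Note that the rules inspect only the syntactic form of each line; the valuation functions the strings represent are never evaluated.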

Any rule of inference constructed in this manner can be used for valid inference, as they are valid by construction. These rules only depend on form, not on the particular values assigned to the non-logical symbols in them. A sufficient number of rules must be constructed such that any instance of \(\models _{{\mathcal {S}}}\) is derivable from them. Thus, these rules define a grammar for valid proofs, where the functions represented by symbol strings need not be made explicit.

Importantly, the methodology for constructing rules of inference applies more broadly than to just consequence relations, and can be used for any relation. For example, let \({\mathcal {F}}\) be the set of real-valued functions over a space \(\Omega \), let \(=\) be the equality relation on \({\mathcal {F}}\) (or the order relation \(\le \)), and let \(+\) be the binary operator corresponding to addition. Then, for all \(f_0,f_1,f_2 \in {\mathcal {F}}\), we have that \(f_0 = f_1\) if and only if \(f_0 + f_2 = f_1 + f_2\). This is the rule of inference concerning the preservation of equality under addition. For constructing such rules, the logical operators need not be limited to point-wise operators of the form \({\mathcal {V}}^n \mapsto {\mathcal {V}}\), but can be extended to the more general form \({\mathcal {F}}^n \mapsto {\mathcal {F}}\), which may be useful when relations are not local (Definition 3.2). The construction of rules of inference can thus be abstracted from such mechanisms; the rules of inference in classical logic are one instance, and the standard rules routinely used in mathematics, for example, are others.

A point worth emphasizing is that even though the deductive consequence we have been considering is a symmetric relation (i.e., a subset of \({\mathcal {S}} \times {\mathcal {S}}\)) rather than the traditional relation between a set of premises and a conclusion (i.e., a subset of \({\mathbb {P}}({\mathcal {S}}) \times {\mathcal {S}}\)), we still have that the rules of inference are in terms of sets of sentences. That is, to prove instances of symmetric relations, the structure (or form) of a given sentence is represented by a set of component sentences and the rules specify valid manipulations of these components (e.g., if a sentence can be written as \(\phi \vee \psi \), we can conclude that it entails \(\phi \)). Thus, we view this as the location in logical systems where asymmetries are present, not within the consequence relations themselves.

In summary, given a consequence relation such as our proposal from the previous section and a language (with logical constants) for symbolically representing functions, we can construct Hilbert style and Gentzen style rules of inference. We contend that the rules of inference are in service of proving instances of consequence relations, and hence the approach of first specifying logical consequence is a natural one.

4.3 Probability

Does probability have some language for compactly representing its functions? And if so, does it also have a deductive apparatus for making proofs regarding its consequence relation?

Real-valued functions are significantly more complex than binary functions, in that a more complicated language is necessary to represent them. Indeed, in general, real-valued function spaces are too complex for every function in them to be compactly represented by a finite string (with finite vocabulary), i.e., languages will not be functionally complete (where there exists a surjective function \(\tau : {\mathcal {S}} \rightarrow {\mathcal {F}}\) such that \(\tau ( {\mathcal {S}} ) = {\mathcal {F}}\)). There are, however, functions in them that can be represented, for example, those expressible in closed form (e.g., using standard mathematical notation and grammar), or, closely related, those that belong to parameterized models. Let \({\mathcal {P}}\) denote the set of probability measures with respect to a space \(\Omega \), and let \({\mathcal {S}}\) denote a set of finite symbol strings such that for each string \(\phi \in {\mathcal {S}}\), there is a probability measure \(P_{\phi } \in {\mathcal {P}}\) corresponding to it. Let \(\tilde{{\mathcal {P}}} \subseteq {\mathcal {P}}\) denote the subset that can be expressed in closed form with respect to the language (i.e., there exist finite symbol strings corresponding to the probability measures). As an example, instead of specifying the distribution by an exhaustive listing, we can write the Ising model compactly as follows:

$$\begin{aligned} P(\omega ) = \frac{1}{Z} \exp \left[ - \sum _{i<j} a_{i,j} x_i x_j - \sum _{i} u_i x_i \right] , \; \omega \in \{+1,-1\}^n, \end{aligned}$$

where each \(x_i = \pi _i(\omega )\), the \(a_{i,j}\) and \(u_i\) are real values, and Z is a normalizing constant. Here, the symbols \(x_1,\ldots ,x_n\) are non-logical symbols, corresponding to projections of \(\omega \) onto its components, and the remaining symbols are logical. Notice the ‘abuse’ of notation in this equation: the left-hand side denotes a function P, whereas the right-hand side is a symbol string \(\phi \in {\mathcal {S}}\), and the equality symbol \(=\) expresses their correspondence (this is analogous to writing that a logical sentence equals its valuation function).
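To make the compactness point tangible, the following Python sketch renders the symbol string above as an executable function for a small n (the couplings and fields are chosen arbitrarily for illustration), and then applies the conditioning projection of Sect. 2 to it.

```python
from itertools import product
from math import exp

n = 3
a = {(0, 1): 0.5, (0, 2): -0.2, (1, 2): 0.1}   # couplings a_{i,j}, i < j
u = [0.3, 0.0, -0.4]                           # fields u_i

def unnormalized(omega):
    """The closed-form expression inside the exponential, evaluated at omega."""
    energy = sum(a[i, j] * omega[i] * omega[j] for (i, j) in a)
    energy += sum(u[i] * omega[i] for i in range(n))
    return exp(-energy)

OMEGA = list(product([+1, -1], repeat=n))
Z = sum(unnormalized(w) for w in OMEGA)        # normalizing constant

def P(omega):
    return unnormalized(omega) / Z

assert abs(sum(P(w) for w in OMEGA) - 1.0) < 1e-12
# Conditioning (the projection of Sect. 2) is again a renormalization,
# here on the event that the first spin is +1:
A = [w for w in OMEGA if w[0] == +1]
P_A = {w: P(w) / sum(P(v) for v in A) for w in A}
```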

We may construct rules of inference from the consequence relation and the language. In addition to a more complicated language, the rules of inference are more complicated as well. For classical logic, the operators were of the form \({\mathcal {V}}^2 \mapsto {\mathcal {V}}\) and were applied point-wise, whereas in probability, we need logical operators of the form \({\mathcal {P}}^2 \mapsto {\mathcal {P}}\) due to its non-local dependencies. For distributions \(P_0,P_1 \in {\mathcal {P}}\), we defined that \(P_0 \models P_1\) if there exists a focalizing projection such that \(\pi (P_1)=P_0\). Thus, rules concerning operations that maintain the equality relation are relevant, as are, more generally, the standard rules of inference used in everyday mathematics. It is beyond the scope of this work to catalogue these; they are merely the methods of proof already in use.

For our purposes here, it is sufficient to note that probability has a language for compactly representing real-valued functions, making it possible to prove that certain functions entail others, i.e., there are deductive tools for its consequence relation. Of course, deduction plays dissimilar roles in probability and classical logic; in probability, although proofs of consequence are important, they are not usually viewed as a defining feature of the system itself. The divergent interpretations that have developed regarding the purposes of these systems reflect their different applications.

5 Interfaces, Fusions, and Foundations

The previous sections have sketched a general formulation for logic, a system involving a binary relation over functions—with certain preservation, restriction, and consistency properties—and a mechanism for proving instances of it, which requires compact representations of the functions involved. Each function specifies a state of ‘knowledge’, at least in its usual application, and the relation specifies the valid transformations and inferences from it. The formulation contains as instances propositional logic, predicate logic, and probability, as well as others, which together compose the (formal) tools of reasoning that have found, by far, the most real-world application.

The implications of the proposed abstraction are significant, and are both positive, concerning how to develop alternative reasoning systems, and negative, concerning how not to combine probability and logic. In this section, we briefly evaluate, from the lens of the proposed relational interpretation, some of the common approaches to the fusion in the literature, and the degree to which they integrate both systems.

A large amount of research into the coupling of probability and logic begins with the placement of probabilities on propositions. This presupposes a certain relationship between these systems. Suppose we have a function space \({\mathcal {F}}_{\Omega }\), a sentence space \({\mathcal {S}}_{{\mathcal {L}}}\), and a mapping \(\tau :{\mathcal {S}}_{{\mathcal {L}}} \rightarrow {\mathcal {F}}_{\Omega }\) (that assigns valuations to sentences). The placement of probabilities on sentences amounts to the presupposition that, for sentences \(\phi \in {\mathcal {S}}_{{\mathcal {L}}}\), their binary valuation functions \(\tau (\phi ) \in {\mathcal {F}}_{\Omega }\) correspond to probabilistic events (subsets of \(\Omega \)), and it is this correspondence that defines the relationship between probability and logic.

There is another possibility, however: the binary valuation functions \(\tau (\phi ) \in {\mathcal {F}}_{\Omega }\) can instead correspond to probability densities, the valuation functions used in probability. That is, since both sentence valuations in logic and densities in probability define functions over a set \(\Omega \) of possible worlds, the former binary and the latter real valued, sentences could be taken to correspond, not to probabilistic events (the usual correspondence), but directly to probability measures.
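
As a toy illustration of the two candidate correspondences over a finite \(\Omega \) (normalizing the binary valuation into a uniform density is merely one illustrative way of realizing the alternative correspondence, not a prescription):

```python
def truth_set(sentence, worlds):
    """Usual correspondence: a sentence picks out an event, a subset of Omega."""
    return {w for w in worlds if sentence(w)}

def as_measure(sentence, worlds):
    """Alternative correspondence (one illustrative choice): normalize the
    binary valuation function into a density, i.e., uniform on the truth set.
    Only satisfiable sentences normalize into a measure."""
    A = truth_set(sentence, worlds)
    if not A:
        raise ValueError("an unsatisfiable sentence yields no density")
    return {w: (1.0 / len(A) if w in A else 0.0) for w in worlds}

worlds = range(6)
phi = lambda w: w % 2 == 0       # the sentence "w is even"
event = truth_set(phi, worlds)   # phi as an event: {0, 2, 4}
P_phi = as_measure(phi, worlds)  # phi as a probability measure
```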

Considerations of this alternative correspondence led us to our proposed abstraction. In the following subsections, we consider its implications with respect to the fields of statistical relational modeling, probability logic, and inductive logic, as well as to some foundational issues.

5.1 Statistical Relational Modeling

We first consider the fusion approach in the field of statistical relational modeling, which explores, among other things, the practical construction of probability distributions over spaces of possible worlds in first-order logic, where such distributions allow consistent probabilities to be assigned to sentences in the language.

Suppose we want to construct a probability distribution over a space in which elements represent possible worlds, where these worlds are composed of objects with attributes (e.g., child wearing a hat, blue sedan, etc.) and relationships between objects (e.g., holding hands, driving, etc.), and further, worlds vary in the objects they contain (e.g., they may be empty or may have numerous objects). Given a vocabulary in a logical language, a space \(\Omega \) can be associated with it, and a distribution over this space allows consistent probabilities to be assigned to sentences in the language, a central problem in the field [30, 63, 72]. The set \(\Omega \) is not equivalent to a product space in terms of its structural constraints (an idea that can be formalized in terms of structure-preserving mappings), which prevents the straightforward application of standard modeling approaches to it.

The set \(\Omega \) of possible worlds will, in general, be too massive to make learning a distribution over it feasible without the use of invariance assumptions (i.e., constraints, structure). These invariances can be specified using the logical language. If we have a probability distribution over a countable space \(\Omega \), then sentences of a logical language about this space (where sentences correspond to binary functions of the form \(f:\Omega \rightarrow \{0,1\}\)) can be assigned probabilities. (That is, a sentence is assigned probability equal to the probability of the subset of \(\Omega \) in which it is true.) If, on the other hand, the distribution over a space \(\Omega \) is unknown and we want to learn it, logical expressions can be used to express invariances in the distribution. For example, invariances can be defined on distributions by constraining the distributions to those that assign certain probabilities to certain sentences.
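
A minimal sketch of this assignment on a finite \(\Omega \) (the function name is ours):

```python
def sentence_probability(P, sentence):
    """Assign a sentence the probability of the subset of Omega where it is true."""
    return sum(p for w, p in P.items() if sentence(w))

P = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
prob_even = sentence_probability(P, lambda w: w % 2 == 0)   # 0.1 + 0.3 = 0.4
```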

More generally, invariances can be defined on these distributions in terms of logical expressions about the distribution itself [8, 28, 43], referred to as probability expressions (these correspond to functionals of the form \(f:{\mathcal {P}}_{\Omega } \rightarrow \{0,1\}\), where \({\mathcal {P}}_{\Omega }\) denotes the set of distributions over \(\Omega \)). This creates a modeling framework for \(\Omega \) based on general invariances, as expressed by logical expressions. This level of expressiveness in invariances, however, can result in the specification of a set of invariances that is inconsistent in the sense that there does not exist a well-defined distribution that satisfies it. This problem has led researchers to consider forgoing some of the expressive power in these logics to ensure consistency, and since graphical models provide consistency guarantees for their invariances, to research extensions of graphical models to more general spaces.
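
For the simplest probability expressions, those of the form \(P(\phi ) = b\), checking the consistency of a set of invariances reduces to the feasibility of a linear program. The following sketch (our own encoding, on a small finite \(\Omega \), using scipy) illustrates:

```python
import numpy as np
from scipy.optimize import linprog

def consistent(worlds, constraints):
    """Do invariances of the form P(phi) = b admit a well-defined distribution?

    `constraints` is a list of (phi, b) pairs with phi a subset of worlds;
    the invariances are consistent iff the linear program is feasible.
    """
    n = len(worlds)
    index = {w: i for i, w in enumerate(worlds)}
    A_eq, b_eq = [np.ones(n)], [1.0]          # total probability mass is 1
    for phi, b in constraints:
        row = np.zeros(n)
        for w in phi:
            row[index[w]] = 1.0
        A_eq.append(row)
        b_eq.append(b)
    # Default variable bounds (0, +inf) enforce nonnegativity of the masses.
    res = linprog(c=np.zeros(n), A_eq=np.array(A_eq), b_eq=np.array(b_eq))
    return res.success

worlds = [0, 1, 2, 3]
print(consistent(worlds, [({0, 1}, 0.7), ({1, 2}, 0.5)]))   # True: consistent
print(consistent(worlds, [({0}, 0.8), ({1}, 0.8)]))         # False: mass exceeds 1
```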

We now consider two issues concerning the depth of this general approach to fusing probability and logic. The first issue is that it is not clear where deduction comes into play in the fused system. Given a vocabulary in a logical language, a set of possible worlds can be defined, and from the perspective of its statistical modeling, it is only the structure of this space that matters. Since the structure of this space can be described without recourse to a logical system, the extra machinery (e.g., a notion of logical consequence, rules of inference, etc.) that comes with it can be forgone when defining distributions over it. In the case where additional structure is placed on the set (e.g., constraining certain sentences to have certain probabilities), while it may be convenient to describe these constraints using sentences, the defining characteristics of logic are still not being employed and the constraints could be equivalently described by functions. In the proposed interpretation, one of the primary purposes of sentences in logical systems—symbolic representations of the state of the world—is that they allow for proofs of instances of logical consequence. Without a notion of consequence, or a need for its proof, a logical language is not strictly necessary. This raises the question: in fusions of probability and logic based on assigning probabilities to sentences, where do logical consequence, and the need for a language, enter? Are we to assume that logical consequence takes the form of probabilistic conditioning in this fusion? If so, then not only is the role of probability being prioritized, but the very idea that probability needs to be extended with a notion of valid inference and that the systems are complementary is contradicted.

A second issue with this approach to a fusion concerns the interpretation of knowledge bases. In statistical relational modeling, many approaches define a knowledge base as a set of sentences and a corresponding set of weights, and use it when defining probability distributions over (subsets of) a set \(\Omega \) of possible worlds. In this formulation, knowledge bases define constraints on the allowable set of distributions. The use of constraints when defining distributions, however, is a standard practice in statistical modeling, which raises the question of whether the treatment of knowledge bases as a set of constraints is the proper abstraction of the concept. In the proposed interpretation of the relationship, both classical logic and probability share a common conception of what a knowledge base is, i.e., a designated valuation function over \(\Omega \) (or its corresponding symbol string) that expresses a state of knowledge and from which inferences can be made. Thus, the proposed interpretation is in conflict with the interpretation given to knowledge bases in this fusion approach. Finally, the notion of logical consequence in the fused system takes the form of probabilistic conditioning, and hence the role of probability is prioritized in this fusion.

5.2 Probability Logic

We now consider, through the lens of the proposed interpretation, the approach to the fusion in the field of probability logic [5, 35, 36, 55], where symbols are incorporated into a logical language for referencing, or quantifying over, probability measures. These logics can be roughly divided into two categories: (1) those in which logical consequence maintains its traditional definition based on the preservation of sentence truth values; and (2) those in which it does not. We consider each in turn. Let \(\Omega \) be a (countable) set of possible worlds and \({\mathcal {P}}_{\Omega }\) the set of probability measures over it. In the first category, a logic is defined over a space of probability measures \({\mathcal {P}}_{\Omega }\) (see, e.g., [28, 36, 37]); in this approach, a vocabulary and grammar are specified, producing symbol strings that represent binary functions over \({\mathcal {P}}_{\Omega }\). Basic sentences in these logics often take the form \(P(\phi ) \ge b\), where \(\phi \subseteq \Omega \), and more complex sentences are possible such as linear combinations \(a_1 P(\phi _1) + \cdots + a_n P(\phi _n) \ge b\), as well as others [8, 28, 43, 64, 66]. These sentences are either true or false for a given \(P \in {\mathcal {P}}_{\Omega }\), and the consequence relation is defined based on these truth values as usual.
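
For concreteness, a small sketch of how such sentences denote binary functions on \({\mathcal {P}}_{\Omega }\) for a finite \(\Omega \) (the names are ours):

```python
def prob(P, phi):
    """P(phi) for an event phi, a subset of the finite space Omega."""
    return sum(p for w, p in P.items() if w in phi)

def holds(P, terms, b):
    """Truth value at P of the sentence a_1 P(phi_1) + ... + a_n P(phi_n) >= b,
    where `terms` is a list of (a_i, phi_i) pairs."""
    return sum(a * prob(P, phi) for a, phi in terms) >= b

P = {0: 0.5, 1: 0.3, 2: 0.2}
sentence = [(1.0, {0, 1}), (-2.0, {2})]   # the sentence P({0,1}) - 2 P({2}) >= b
print(holds(P, sentence, 0.3))            # 0.8 - 0.4 = 0.4 >= 0.3, so True
```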

In this fused reasoning system, sentences correspond to binary valuation functions of the form f(P), where \(P \in {\mathcal {P}}_{\Omega }\). This may be contrasted with the reasoning system discussed in the previous section (Sect. 5.1), where probability measures were defined that take the form P(f), where \(f \in {\mathcal {F}}_{\Omega }\) is a binary function over \(\Omega \) and logical consequence is discarded in favor of probabilistic conditioning; here the opposite occurs. In this sense, there is a duality between these two approaches.Footnote 23 The problem, however, is that neither approach to the coupling produces a novel (or extended) reasoning system in the sense that the consequence relation essentially reduces to that of one or the other.

In the second category of probability logic, the logical consequence relation cannot be stated based on the truth values of sentences, but instead is redefined. We consider a representative example here [3, 5, 6], although others could have been employed as well. Let the language be limited to sentences of the form “\(P(\beta | \alpha )\) is high”, where \(\beta , \alpha \subseteq \Omega \), and where the meaning of this sentence is defined indirectly through the consequence relation described below. That is, the truth of a sentence “\(P(\beta | \alpha )\) is high” cannot be evaluated for a given \(P \in {\mathcal {P}}_{\Omega }\) (i.e., sentences do not represent binary functions on \({\mathcal {P}}_{\Omega }\)). Sentences do not have independent meanings here, but can only be interpreted in the context of others. Suppose we have a knowledge base \(\Gamma \) of the form:

$$\begin{aligned} \left\{ ``P(\psi _1 | \phi _1) \text { is high''}, \ldots , ``P(\psi _n | \phi _n) \text { is high''} \right\} . \end{aligned}$$

Then, define logical consequence as follows: \(\Gamma \) \(\models \)\(P(\beta | \alpha )\) is high” if and only if for all \(\epsilon >0\), there is a \(\delta >0\), such that for all probability measures \(P \in {\mathcal {P}}_{\Omega }\):

$$\begin{aligned} P(\psi _i | \phi _i) \ge 1-\delta \text { for each } i \implies P(\beta | \alpha ) \ge 1 - \epsilon . \end{aligned}$$

The first thing to notice is that this consequence is not a binary relation on pairs of sentences, i.e., the premises are not individual sentences (possibly formed from a conjunction of others), but sets of sentences. (The language does not contain connectives or other logical symbols for combining sentences to form others.)
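
Because the definition quantifies over all measures in \({\mathcal {P}}_{\Omega }\) and over \(\epsilon \) and \(\delta \), instances are not straightforward to verify; the following sketch (a numerical probe only, not a decision procedure; the names are ours) searches for counterexample measures on a small finite \(\Omega \):

```python
import random

def cond(P, b, a):
    """P(b | a) over a finite space; None if P(a) = 0."""
    pa = sum(p for w, p in P.items() if w in a)
    if pa == 0:
        return None
    return sum(p for w, p in P.items() if w in a and w in b) / pa

def probe(worlds, premises, conclusion, eps, delta, trials=100_000):
    """Search for a measure with every premise conditional >= 1 - delta while
    the conclusion conditional falls below 1 - eps. Finding one refutes
    consequence at this (eps, delta); finding none proves nothing."""
    cb, ca = conclusion
    for _ in range(trials):
        weights = [random.random() for _ in worlds]
        total = sum(weights)
        P = {w: x / total for w, x in zip(worlds, weights)}
        if all((c := cond(P, b, a)) is not None and c >= 1 - delta
               for b, a in premises):
            c = cond(P, cb, ca)
            if c is not None and c < 1 - eps:
                return P          # a counterexample measure
    return None
```

Here the premises and the conclusion are given as pairs \((\psi , \phi )\) of subsets of worlds, mirroring the sentences “\(P(\psi | \phi )\) is high”.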

This probability logic integrates both logical consequence and probabilistic conditioning into a single notion of consequence, unlike those in the previous category, and thereby creates a novel reasoning system (one that does not reduce to one or the other). While this logic fits into the standard abstraction of logic (e.g., involving Definition 3.1), it radically departs from the abstraction proposed here (i.e., sentences are not symbolic representations of functions, nor is logical consequence a relation on them). As a result, this probability logic deviates far from our intuitive understanding of what logic is. It is, in a sense, less constrained, as well as more complicated, than ones belonging to the proposed abstraction; such an unconstrained version of reasoning has less precedent in terms of supporting examples used in practice.

5.3 Inductive Logic

We now consider our interpretation’s fit with inductive logic [18, 19, 20, 33], where the main idea is to extend deductive (syntactic) consequence, a relation on symbol strings, to cases where premises provide less than conclusive support for conclusions, referred to as partial entailment. The goal is to define the degree (of confirmation) to which premises support conclusions, and to specify these values solely in terms of their syntactic structure. Probability appears to be a natural tool for specifying these degrees: assign to them the probabilities of conclusions after conditioning on premises. The view that probability defines partial entailments is referred to as the “logical interpretation of probability”Footnote 24 [48, 52].

In the proposed interpretation, sentences are symbolic representations of valuation functions, allowing for proofs. In both logic and probability, deductive consequence then concerns the provability of instances of logical consequence for their respective valuation functions (over any appropriate set of possible worlds). Extending deductive consequence from a binary-valued to a real-valued function over pairs of sentences therefore amounts to an assumption about partial degrees of provability (regarding proof of the entailment of one binary function from another), an assumption that contradicts the common conception of what a proof is. Hence, the interface between probability and logic expressed in inductive logic is at odds with the proposed interpretation.

5.4 Foundations

In research concerning the foundations of probability, there is debate about whether conditional probabilities should be taken as primary, with unconditional probabilities defined from them, or vice versa; it appears, however, that no argument on the matter is decisive [27, 38, 39]. In this work, we took unconditional probability as primary based on our interpretation of probability’s relationship with logic, i.e., since we took conditioning to be a form of consequence, it was defined in a secondary manner in terms of projections between unconditional distributions (or more generally, between valuation functions). Finally, we note that: (i) both logic and probability may be considered monotonic in the sense that consequence projections always constrict the regions of positive mass of functions, never expand them; and (ii) there may be implications concerning the dichotomy between deductive and inductive reasoning: if probability is an instance of logic, how can it be inherently inductive (except by way of its application)?

5.5 Narrow Interpretations

Although probability and classical logic may be viewed as instances of logic, broadly construed, the logical consequence relation in probability may also be considered an extension of that in classical logic in the sense that there exists a mapping taking probability measures to binary functions that preserves it. That is, if we let \(m:{\mathcal {P}}_{\Omega } \rightarrow {\mathcal {F}}_{\Omega }\) be the mapping that takes probability measures and identifies the portion of their domain with positive probability mass (i.e., \(m(P)(\omega ) = {\mathbb {I}}(P(\omega )>0)\), where \(\omega \in \Omega \) and \({\mathbb {I}}\) is the indicator function), then if a probability measure P entails \(P'\), we have that the binary function m(P) entails \(m(P')\). Thus, the latter relation can be recovered from the former, based on a simple mapping identifying positive probability mass. This suggests that, in this narrow sense, probability may be viewed as an extension of any particular classical logic.
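
Continuing the finite-\(\Omega \) sketches above (the names are ours), the mapping m and the preservation claim can be illustrated as follows:

```python
def m(P, tol=1e-12):
    """Map a measure to the binary function marking its positive-mass region."""
    return {w: int(p > tol) for w, p in P.items()}

def binary_entails(f, g):
    """Classical entailment of binary valuations: f |= g iff f <= g point-wise."""
    return all(f.get(w, 0) <= g.get(w, 0) for w in set(f) | set(g))

P1 = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
P0 = {1: 0.5, 2: 0.5, 3: 0.0, 4: 0.0}    # a focalization of P1, so P0 |= P1
assert binary_entails(m(P0), m(P1))      # entailment is preserved under m
```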

One of the main objections against extensional interpretations of the relationship between logic and probability, aside from the latter lacking a consequence relation, is that probability does not have “extensional” operations ([5, 77]; see also [36]). This objection is based on the observation that logical connectives can be used in logic to produce, loosely speaking, more complex sentences (e.g., taking two sentences \(\phi \) and \(\psi \) and forming their conjunction \(\phi \wedge \psi \)), whereas there are no corresponding connectives for probability measures (in the sense that \(P(\phi \wedge \psi )\) cannot be expressed as a function of \(P(\phi )\) and \(P(\psi )\)). While this observation argues against viewing probability measures defined over sentences as an extended version of logic, it does not conflict with the above extensional interpretation.

Another objection occasionally encountered against an extensional interpretation is that probability lacks the “expressive power” of logic. Loosely speaking, the expressivity of a logic concerns the number of valuation functions that can be expressed with it. However one formalizes these concepts, by most reasonable measures the expressivity of probability (the number of functions it can express) is not less than that of classical logic.

6 Summary and Discussion

This work concerns the foundational question of what is the core relationship between probability and logic. Is there a correct relational interpretation, or are more than one equally legitimate? In our investigation, we return to basic considerations regarding the correspondences between the two systems that could dictate the nature of their relationship. This work began with an observation: both logical consequence and probabilistic conditioning may be described in terms of a set of projections that prescribe what (valuation) functions are allowed to follow from others under refinements to the possible worlds. When these projections are endowed with preservation and restriction properties, they result in an extension of the subset relation to a sub-function relation. We view this more general relation as a natural direction to abstract logical consequence, using probability and classical logic as our guiding examples.

Inferences with respect to these relations are made relative to some designated function (e.g., a ‘knowledge base’), where in classical logic, this is a binary function over some structured space \(\Omega \), perhaps described by some conjunction of sentences, and in probability, a probability measure over it. The valuation functions, in their usual application, describe a state of ‘knowledge’, e.g., which possible worlds can occur or how likely they are to occur, and consequence may now be interpreted as a notion of how knowledge is allowed to flow or transition from one state to another. For proving instances of this consequence relation, it is necessary that the valuation functions have compact representations, which can be accomplished by assigning labels to elementary functions, and using them as building blocks for constructing more complex ones. Given these labels, grammars can then be used for specifying well-defined symbolic representations, as well as consequence-preserving operations on them, necessary for deduction.

One critique of the proposed abstraction may be related to the use of logic as a foundation of mathematics: how can logic and probability be considered to be on an equal footing (in the sense that both are instances of a broadly construed logic) if, for example, probability theory requires logic in the proof of its theoremsFootnote 25? This interpretation of the relationship, however, is at a stratospheric level and was never considered satisfactory by those searching for an intimate understanding of it. Another critique may concern our use of countable instead of uncountable spaces \(\Omega \) in our presentation, where in the latter case, consistent probabilities can generally no longer be assigned to all subsets and must be limited to only those in some sub-collection. However, the proposed interpretation of the relationship, based on the basic observation that probability may be interpreted as having a consequence relation analogous to logic, is not changed by the technicalities of measure theory. The function projections defined in Sect. 2, for example, can be adjusted to accommodate \(\sigma \)-fields in a straightforward mannerFootnote 26.

6.1 Why Abstract?

In general, abstractions can only be evaluated through appeals to meta-level principles such as aesthetics, simplicity, fit (to guiding examples), and other such qualities. The value of an abstraction derives, to a large extent, from the clarity and understanding that it affords.

The proposed abstraction encompasses probability and classical logic, and importantly, does so using minimal machinery. This allows for a compressed description of these systems. It accomplishes this without appeals to higher mathematics and without stretching or contorting the original conception of logic. It also has an explanatory property, providing a simple narrative concerning the purpose of components (i.e., the abstraction is not too abstract). We found this beneficial in understanding how components should and should not fit together in reasoning systems, and used it to analyze different approaches to fusing probability and logic (Sect. 5). Finally, the proposed abstraction, like any abstraction of value, suggests possible directions for the development of alternative instances.

Importantly, the proposed formulation, while encompassing those most used, does not encompass all logics that have been proposed, which provides a useful demarcation. The Tarskian abstraction (Definition 3.1), which extends classical logic in a different direction, encapsulates a greater number of examples, including many of the offshoot logics that have been developed in the literature in recent years. The proposed abstraction is not a replacement for the traditional Tarskian abstraction; they abstract in different directions, in the service of different aims.

6.2 Probability and Logic Revisited

The abstraction proposed in this work furnishes an interpretation of the relationship between probability and logic that is in agreement with the classical conception of it, that for some sufficiently broad definition of logic, probability is an instanceFootnote 27. This interpretation conflicts with the conception that these systems are fundamentally complementary in nature; as a result, our proposal is also in conflict with the approaches to their coupling proposed in the literature.

We considered several such research branches, and from the lens of the proposed interpretation, evaluated their consistency with the designated purpose of components in logical systems. For example, in the proposed interpretation, the basic purpose of a language in logical systems is that it allows for symbolic representations of functions (a requirement for proofs of instances of logical consequence).Footnote 28 Many fusions in the literature violate this raison d’être, as well as others (Sect. 5). We also appraised fusions based on their degree of integration and compactness. For example, a fusion in which one system is stacked on top of the other is not as compactly integrated as a fusion in which the systems are more intertwined (see Sects. 5.1 and 5.2). Similarly, a fusion in which each system is only partially incorporated is not as fully integrated as one that includes more of each.

Although probability and logic can combine/interface in many ways, ranging from the trivial to the complex, the issue at hand is whether there exists a deep fusion that forms a principled foundation for the development of reasoning systems more generally. If one accepts the proposed abstraction of logic in which probability is an instance, then it appears that there does not exist such a fusion between probability and any particular classical logic,Footnote 29 since one is already an extension of the other (Sect. 5.5). The proposed abstraction of logic, however, focuses our attention on other, perhaps more promising, directions in which new reasoning systems could be developed.