PROBABILISTIC ENTAILMENT ON FIRST ORDER LANGUAGES AND REASONING WITH INCONSISTENCIES

Published online by Cambridge University Press: 07 July 2022

SOROUSH RAFIEE RAD

SOROUSH RAFIEE RAD*: Affiliation:
DUTCH INSTITUTE FOR EMERGENT PHENOMENA (DIEP) UNIVERSITY OF AMSTERDAM, AMSTERDAM, THE NETHERLANDS and THE INSTITUTE FOR LOGIC LANGUAGE AND COMPUTATION (ILLC), AMSTERDAM, THE NETHERLANDS
*: E-mail: S.RafieeRad@uva.nl

Abstract

We investigate an approach for drawing logical inference from inconsistent premisses. The main idea in this approach is that the inconsistencies in the premisses should be interpreted as uncertainty of the information. We propose a mechanism, based on Knight’s [14] study of inconsistency, for revising an inconsistent set of premisses to a minimally uncertain, probabilistically consistent one. We will then generalise the probabilistic entailment relation introduced in [15] for propositional languages to the first order case to draw logical inference from a probabilistic set of premisses. We will show how this combination can allow us to limit the effect of uncertainty introduced by inconsistent premisses to only the reasoning on the part of the premise set that is relevant to the inconsistency.

Keywords

probabilistic reasoning; inconsistency

MSC classification

Primary: 03-02: Research exposition (monographs, survey articles)

Secondary: 03B42: Logics of knowledge and belief (including belief change); 03B48: Probability and inductive logic

Type: Research Article
Information: The Review of Symbolic Logic, Volume 16, Issue 2, June 2023, pp. 351–368

DOI: https://doi.org/10.1017/S1755020322000235
Copyright: © The Author(s), 2022. Published by Cambridge University Press on behalf of The Association for Symbolic Logic

1 Introduction

The treatment of inconsistencies is a long-standing issue for mathematical logic. Classical logic comes along with strong built-in consistency assumptions and it follows that the full force of the classical entailment relation is too strong for reasoning with inconsistencies. There are, however, many different motivations for the development of logics that can accommodate inconsistencies. Although limiting the scope of logical inference to only consistent domains fits well with the spirit of what one requires from reasoning in mathematical contexts, there are many contexts where it does not. In particular, we have the case when the context of the reasoning is not assumed to represent some factual property of a structure nor objective facts concerning the real state of things but some not-necessarily-certain information or approximations regarding those facts.

Hence there have been many attempts in the literature to develop logical systems and inference processes that allow for reasoning with inconsistent premisses. The main difference between these attempts arises from the way that the inconsistent evidence is interpreted. One motivation stems from adopting the philosophical position of dialetheism as advocated by Priest [23, 25]. This position is characterised by submitting to the thesis that there are true contradictions, that is, by accepting that there are sentences which are true and false simultaneously. One way to formalise this view is to develop logics that would allow for evaluating a sentence as both true and false, for example by adopting a three-valued logic with truth values $\{0, 1, \{0, 1\}\}$ with truth value $\{0, 1\}$ for the sentences that are assumed to be both true and false. The most notable example of such logical systems is arguably the logic LP [22, 24].

Other motivations can arise from more pragmatic reasons which deal with reasoning in non-ideal contexts. Here the inconsistencies are interpreted as a property of the information and are taken to be anomalies that point out errors or shortcomings of the reasoners’ information (or maybe communication channels). The idea, however, is that despite this shortcoming it is still useful to have formal systems that allow logical inference from such sets of information without submitting to dialetheism. The approaches that arise from this latter motivation can be divided into two groups. The first group aims at developing formal systems with mechanisms for dealing with inconsistent information. These include, amongst others, discussive logic [13], adaptive logic introduced by Batens [3], da Costa’s logics of formal inconsistency [6, 7], Dunn–Belnap four-valued logic [4, 10], and the relevant logic of Anderson and Belnap [2] and their variants. The second group attempts to approach reasoning with inconsistent premisses by reducing the context of reasoning to a consistent one, for instance by defining the logical consequences on the basis of maximal consistent subsets, as in the logical system of Rescher and Manor [26], or by first revising the inconsistent sets of premisses to consistent ones, as in the AGM belief revision process [1], and then reasoning on the basis of this consistent revision.

Our approach will fall into this last group and is in line with the view that treats inconsistencies as a property of evidence, pointing to some shortcoming or inadequacy of the information, and defines the logical consequence on the basis of some consistent revision of the premisses. In this setting, the existence of inconsistencies points, foremost, to the unreliability of the information, and hence the revision process shifts the context of reasoning from a set of categorically true premisses to uncertain information, expressed probabilistically. As will become clear shortly, however, our approach to revising inconsistent premisses is radically different from that of AGM.

The idea in an AGM-like belief revision process is that upon receiving some information $\phi $ that is inconsistent with the current knowledge base, one will first retract the part of the premisses that contradicts this new information and then expand the remaining premise set by adding $\phi $ . The assumption here, however, is that the new information is always more reliable than the old. This assumption is counter-intuitive in many aspects of reasoning; for example when the context of reasoning consists of statements derived from potentially unreliable sources or processes that are subject to errors. Even more pointed are cases that deal with statements accumulated through different sources and processes which do not necessarily agree. This is the case in almost all applications of reasoning outside mathematics. In many such cases as the information set expands by acquiring new information through possibly conflicting sources and processes, it may very well come to include conflicting and inconsistent evidence without any second order information that warrants discarding parts of the premisses in favour of others. At the same time keeping the evidence set whole will void the possibility of using classical entailment (or other variations of it which still get trivialised in the presence of inconsistencies). In this sense having some inconsistency in a (possibly very large) set of evidence will render it completely useless for reasoning. This has motivated a large body of work that deals with “non-prioritised” belief revision [12].

There are many applications of reasoning, however, in which the inconsistencies should intuitively affect the reasoning only partially. Consider for example sentences $\phi $ and $\psi $ that share no syntactic component, and the entailment $ \{\phi , \psi , \neg \phi \} \vDash \neg \psi $ , many instances of which are counter-intuitive. For instance, assume $\phi $ and $\neg \phi $ are acquired through different sources, say $S_1$ and $S_2$ , where both sources agree on $\psi $ . Here one would expect the inconsistency to affect one’s evaluation of the reliability of the data and thus to produce uncertainty in the information. At the same time, one would want it to introduce as little uncertainty as possible, and only where necessary. This is the motivation for what we shall pursue in this paper and the aspect of the literature we hope to contribute to.

Our approach comes in two parts. First is the revision of an inconsistent set of categorical premisses to a probabilistically consistent set of uncertain premisses. In light of the discussion above this should be done in a way that limits the introduction of the uncertainty to only those premisses that are affected by the inconsistency. The second component will then be an entailment relation that allows reasoning on the basis of the new probabilistically consistent premisses. To define the revision process we use the work proposed by Knight [14] in his study of inconsistency. A similar approach has been studied by Picado Muiño [20] and Thimm [27, 28] for measuring the inconsistency of a set of probabilistic assertions (see also De Bona and Finger [8] and De Bona et al. [9]), and by Potyka and Thimm [21] for inconsistency tolerant reasoning. We will then follow the work developed in Knight [15], Picado Muiño [20] and Paris, Picado-Muiño and Rosefield [19] on probabilistic entailment for propositional languages, and will extend their work to the first order case.

The rest of this paper is organised as follows. In Section 2 we will set up our notation and preliminaries. In Section 3 we will investigate a revision process for reducing inconsistent sets of sentences to probabilistically consistent, uncertain ones. We will investigate revision of categorical inconsistent sets of sentences in Section 3.1, and the revision of inconsistent probabilistic assertions and prioritised sets of sentences in Section 3.2. In Section 4 we generalise the probabilistic entailment relation of [15] to first order languages. Finally, we will show, in Section 4.3, how generalisation to multiple thresholds, as suggested in [15], would allow us to limit the effect of inconsistency to only part of the reasoning.

2 Preliminaries and notation

Throughout this paper we will work with a first order language L with finitely many relation symbols, no function symbols and countably many constant symbols $a_{1},a_{2},a_{3},\ldots $ . Furthermore we assume that these constants exhaust the universe. This means, in particular, that we have a name for every element in our universe. Thus a model is a structure M for the language L with countably infinite domain $|M|=\{ \, a_{i} \, | \, i=1,2,\ldots \}$ where every constant symbol is interpreted as itself. Let $RL$ and $SL$ denote the set of relation symbols and the set of sentences of L respectively.

Definition 2.1. A function $w: SL \rightarrow [0\, , \, 1]$ is called a probability function if for every $\phi , \psi , \exists x \psi (x) \in SL$ ,

– P1. If $\models \phi $ then $w(\phi )=1$ .
– P2. $w(\phi \vee \psi )= w(\phi )+ w(\psi )- w(\phi \wedge \psi )$ .
– P3. $w(\exists x \psi (x))= \lim _{n \to \infty } w(\bigvee _{i=1}^{n} \psi (a_{i}))$ .

We will denote the set of all probability functions on $SL$ by $\mathbb {P}_L$ .

Let $L_{prop}$ be a propositional language with propositional variables $p_{1},p_{2},\ldots ,p_{n}$ . By atoms of $L_{prop}$ we mean sentences $At=\{\,\alpha _{i}\,|\, i=1,\ldots ,J \}$ , $J=2^{n}$ , of the form

$$ \begin{align*}p_{1}^{\epsilon_1} \wedge p_{2}^{\epsilon_2} \wedge \cdots \wedge p_{n}^{\epsilon_n}, \end{align*} $$

where $\epsilon _i \in \{0, 1\}$ and $p^1=p$ and $p^0=\neg p$ . By the disjunctive normal form theorem, for every sentence $\phi \in SL_{prop}$ there is a unique set $\varGamma _{\phi } \subseteq At$ such that $\models \phi \leftrightarrow \bigvee _{\alpha _{i} \in \varGamma _{\phi }} \alpha _{i}$ . It can be easily checked that $\varGamma _{\phi }=\{\, \alpha _{j}\,|\, \alpha _{j}\vDash \phi \,\}$ . Thus if $w: SL_{prop} \rightarrow [0\, ,\, 1]$ is a probability function then $w(\phi )= w(\bigvee _{\alpha _{i} \vDash \phi } \alpha _{i})=\sum _{\alpha _{i} \vDash \phi } w(\alpha _{i})$ as the $\alpha _{i}$ ’s are mutually inconsistent. On the other hand, since $\models \bigvee _{i=1}^{J} \alpha _{i}$ we have $\sum _{i=1}^{J} w(\alpha _{i})=1$ . So the probability function w will be uniquely determined by its values on the $\alpha _i$ ’s, that is by the vector $( w(\alpha _{1}),\ldots ,w(\alpha _{J}) ) \in \mathbb {D}^{L_{prop}}$ where ${\mathbb {D}^{L_{prop}}=\{\, \vec {x}\in \mathbb {R}^{J}\, |\, \vec {x} \geq 0, \, \sum _{i=1}^{J} x_{i}=1\}}$ . Conversely if $\vec {a} \in \mathbb {D}^{L_{prop}}$ we can define a probability function $w^{\prime }: SL_{prop} \rightarrow [0\,,\,1]$ such that $(w^{\prime }(\alpha _{1}),\ldots ,w^{\prime }(\alpha _{J})) = \vec {a}$ by setting $w^{\prime }(\phi )= \sum _{\alpha _{i}\vDash \phi } a_{i}$ . This gives a one-to-one correspondence between the probability functions on $L_{prop}$ and the points in $\mathbb {D}^{L_{prop}}$ (see [17] for more details).
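The correspondence between probability functions and points of $\mathbb {D}^{L_{prop}}$ can be illustrated with a small Python sketch. It is not part of the original development: the helper names `atoms` and `probability`, the choice $n=2$ and the particular vector `vec` are assumptions made only for illustration. Sentences are represented as predicates on atoms, and $w(\phi )$ is computed as the sum of the weights of the atoms at which $\phi $ holds.

```python
from fractions import Fraction
from itertools import product

def atoms(n):
    """All 2**n atoms of a language with variables p_1..p_n, as tuples of truth values."""
    return list(product((1, 0), repeat=n))

def probability(sentence, weights):
    """w(phi): the sum of the weights of the atoms at which phi holds."""
    return sum((x for atom, x in weights.items() if sentence(atom)), Fraction(0))

# Hypothetical point of the simplex for n = 2; atoms are ordered
# p1&p2, p1&~p2, ~p1&p2, ~p1&~p2.
vec = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
assert all(x >= 0 for x in vec) and sum(vec) == 1
weights = dict(zip(atoms(2), vec))

p1 = lambda a: a[0] == 1
p2 = lambda a: a[1] == 1

w_p1 = probability(p1, weights)
w_p2 = probability(p2, weights)
w_or = probability(lambda a: p1(a) or p2(a), weights)
w_and = probability(lambda a: p1(a) and p2(a), weights)
w_taut = probability(lambda a: p1(a) or not p1(a), weights)

print(w_p1, w_p2, w_or, w_and, w_taut)
```

For this vector the values satisfy P1 and P2: the tautology receives probability 1, and `w_or` equals `w_p1 + w_p2 - w_and`.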

The situation for first order languages is a bit more complicated since defining atoms in a way similar to the propositional case will require the use of infinite conjunctions. Instead, we have the state descriptions that play a role similar to those of atoms for propositional languages.

Definition 2.2. Let L be a first order language with the set of relation symbols $RL$ and let $L^{(k)}$ be a sub-language of L with only constant symbols $a_1,\ldots ,a_k$ . The state descriptions of $L^{(k)}$ are the sentences $\Theta _{1}^{(k)},\ldots ,\Theta _{n_{k}}^{(k)}$ of the form

$$ \begin{align*}\bigwedge_{{i_{1},\ldots ,i_{j} \leq k} \atop{{ R\,\, j-ary} \atop{R \in RL, j \in N^{+}}}} R(a_{i_{1}},\ldots ,a_{i_{j}})^{\epsilon_{i_1, \ldots, i_j}}.\end{align*} $$

where $\epsilon_{i_1, \ldots, i_j} \in \{0, 1\}$ and as before $R(a_{i_1}, \ldots, a_{i_j})^1 = R(a_{i_1}, \ldots, a_{i_j})$ and $R(a_{i_1}, \ldots, a_{i_j})^0 = \neg R(a_{i_1}, \ldots, a_{i_j})$ . The following theorem, due to Gaifman [11], provides a similar result to the one we had above, for the case of a first order language L. Let $QFSL$ be the set of quantifier-free sentences of L:

Theorem 2.3. Let $v: QFSL \rightarrow [0\, , \,1]$ satisfy P1 and P2 for $\phi , \psi \in QFSL$ . Then v has a unique extension $w: SL \rightarrow [0\, ,\, 1]$ that satisfies P1, P2 and P3. In particular if $w: SL \rightarrow [0\, ,\, 1]$ satisfies P1, P2 and P3 then w is uniquely determined by its restriction to $QFSL$ .

The language $L^{(k)}$ can be thought of as a propositional language with propositional variables $R(a_{i_{1}},\ldots ,a_{i_{j}})$ for $i_{1},\ldots ,i_{j} \leq k$ , $R \in RL$ and $R$ $j$-ary. With this in mind, for $\phi \in QFSL$ let k be an upper bound on the i such that $a_{i}$ appears in $\phi $ . Then $\phi $ can be thought of as a propositional formula in $L^{(k)}$ , the sentences $\Theta _{i}^{(k)}$ will be the atoms of $L^{(k)}$ , and

$$ \begin{align*}\phi \leftrightarrow \bigvee_{\Theta_{i}^{(k)}\vDash \phi} \Theta_{i}^{(k)} \quad \text{so} \quad w(\phi)=\sum_{\Theta_{i}^{(k)} \vDash \phi} w(\Theta_{i}^{(k)}).\end{align*} $$

Thus to determine the value $w(\phi )$ we only need to determine the values $w(\Theta _{i}^{(k)})$ and to require

– (i) $w(\Theta _{i}^{(k)}) \geq 0$ and $\sum _{i=1}^{n_{k}}w(\Theta _{i}^{(k)})=1$ ,
– (ii) $w(\Theta _{i}^{(k)})= \sum _{\Theta _{j}^{(k+1)}\vDash \Theta _{i}^{(k)}}w(\Theta _{j}^{(k+1)})$ ,

to ensure that P1 and P2 are satisfied.
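Conditions (i) and (ii) can be checked mechanically on a small instance. The following Python sketch assumes a hypothetical signature with a single binary relation symbol and a hypothetical product measure in which every ground atom holds independently with probability 1/3; neither choice comes from the text. It enumerates the state descriptions of $L^{(1)}$ and $L^{(2)}$ and verifies normalisation and the marginalisation condition between the two levels.

```python
from fractions import Fraction
from itertools import product

RELATIONS = {"R": 2}  # hypothetical signature: one binary relation symbol

def ground_atoms(k):
    """Ground atoms R(a_i1, ..., a_ij) of L^(k); constants are named by their index 1..k."""
    return [(r, args) for r, arity in sorted(RELATIONS.items())
            for args in product(range(1, k + 1), repeat=arity)]

def state_descriptions(k):
    """Each state description fixes a truth value (1 or 0) for every ground atom of L^(k)."""
    ga = ground_atoms(k)
    return [dict(zip(ga, eps)) for eps in product((1, 0), repeat=len(ga))]

def w(theta, p=Fraction(1, 3)):
    """Illustrative product measure: each ground atom holds independently with probability p."""
    value = Fraction(1)
    for truth in theta.values():
        value *= p if truth else 1 - p
    return value

def extends(theta_next, theta):
    """theta_next entails theta iff they agree on every ground atom of the smaller language."""
    return all(theta_next[g] == t for g, t in theta.items())

k = 1
small, large = state_descriptions(k), state_descriptions(k + 1)

# Condition (i): non-negative values that sum to 1 at level k.
total = sum(w(t) for t in small)

# Condition (ii): each value at level k is the sum over its extensions at level k + 1.
marginals_ok = all(w(t) == sum(w(s) for s in large if extends(s, t)) for t in small)

print(len(small), len(large), total, marginals_ok)
```

With one binary relation there is one ground atom over $a_1$ and four over $a_1, a_2$ , so the sketch compares 2 state descriptions against 16.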

Definition 2.4. A set $K=\{w(\phi _i)= a_i \, \vert \, i=1, \ldots , n\}$ is probabilistically consistent if there is a probability function $w: SL \to [0, 1]$ that satisfies the constraints given in K.

3 Probabilistically consistent revisions

3.1 Probabilistic revision of inconsistent sets of sentences

Consider a consistent set of sentences $\varGamma =\{\phi _1, \ldots , \phi _n\}$ and let $\theta $ be such that $\varGamma \cup \{\theta \}$ is inconsistent. In the setting we shall present here, this inconsistency will be characterised as uncertainty and will thus result in moving to some probabilistically consistent revision of $\varGamma \cup \{\theta \}$ , $\varGamma '$ , i.e., a set $\varGamma '$ consisting of jointly satisfiable probabilistic statements of the form $w(\phi ) = a$ for $\phi \in \varGamma \cup \{\theta \}$ .

If a set of sentences $\varGamma $ is classically consistent then the sentences in $\varGamma $ can be simultaneously assigned probability one. That is, there are probability functions that assign probability one to all sentences in $\varGamma $ . This will, however, be impossible for an inconsistent $\varGamma $ , in which case, the highest probability that can be simultaneously assigned to all sentences of $\varGamma $ will be strictly less than 1. Following Knight [14, 15] we define:

Definition 3.1. Let $\mathbb {A}$ be a set of probability functions on $SL$ . A set of sentences $\varGamma \subseteq SL$ is $\zeta $ -consistent in $\mathbb {A}$ , if there is a probability function $w \in \mathbb {A}$ such that $w(\phi ) \geq \zeta $ for all $\phi \in \varGamma $ . We say that $\varGamma \subseteq SL$ is $\zeta $ -consistent, if it is $\zeta $ -consistent in $\mathbb {P}_L$ .

Notice that if $\varGamma \nvDash \bot $ then $\varGamma $ is $1$ -consistent and if $\varGamma \vDash \bot $ and is $\zeta $ -consistent then necessarily $\zeta < 1$ . For $\varGamma =\{\phi _1, \ldots , \phi _n\}$ let $\beta ^{\varGamma }_{i}$ , $i=1, \ldots , m \leq 2^n$ enumerate the consistent sentences of the form

$$ \begin{align*}\bigwedge_{i=1}^{n} \phi_{i}^{\epsilon_{i}},\end{align*} $$

where $\epsilon _{i} \in \{0, 1\}$ , $\phi ^1= \phi $ and $\phi ^0=\neg \phi $ . For a probability function w on $SL$ , let $\vec {w}_{\varGamma }=(w(\beta ^{\varGamma }_{1}), \ldots , w(\beta ^{\varGamma }_{m})).$ We will drop the superscript and subscript $\varGamma $ when it is clear from the context. Next we will give a simple lemma that plays a crucial role in what follows:

Lemma 3.2. Take $\phi _1, \ldots , \phi _n \in SL$ , and let $\beta _1, \ldots , \beta _m$ enumerate the sentences $\bigwedge _{i=1}^{n} \phi _i^{\epsilon _i}$ as above. Let $v: \{\beta_1, \ldots, \beta_m\} \to \mathbb{R}^{+}$ be such that $\sum _{i=1}^{m} v(\beta _i)=1$ . Then there is a probability function w on $SL$ for which $w(\beta _i)=v(\beta _i)$ for all $i=1, \ldots , m$ .

Proof. It is enough to define w on $QFSL$ , the quantifier-free sentences of L. Choose any probability function u on $SL$ such that $u(\beta _i) \neq 0$ for $i=1, \dots , m$ and for each n and state description $\Theta ^{(n)}$ , of $L^{(n)}$ , define $w(\Theta ^{(n)})= \sum _{i=1}^{m} v(\beta _i) u(\Theta ^{(n)} \, \vert \, \beta _i)$ .

Then clearly $w(\Theta ^{(n)}) \geq 0$ , and also $\sum _{j=1}^{n_n} w(\Theta _j^{(n)})= \sum _{j=1}^{n_n} \sum _{i=1}^{m} v(\beta _i) u(\Theta _j^{(n)} \, \vert \, \beta _i) = \sum _{i=1}^{m} v(\beta _i)\sum _{j=1}^{n_n} u(\Theta _j^{(n)} \, \vert \, \beta _i)= \sum _{i=1}^{m} v(\beta _i)=1$ where the second-to-last equality follows from the fact that $u(- \vert \beta_i)$ is a probability function. Also

$$ \begin{align*}\sum_{\Theta_{j}^{(n+1)} \vDash \Theta^{(n)}} w(\Theta_j^{(n+1)})= \sum_{\Theta_{j}^{(n+1)} \vDash \Theta^{(n)}} \sum_{i=1}^{m} v(\beta_i) u(\Theta_j^{(n+1)} \, \vert\, \beta_i)\end{align*} $$

$$ \begin{align*}= \sum_{i=1}^{m} v(\beta_i)\sum_{\Theta_{j}^{(n+1)} \vDash \Theta^{(n)}} u(\Theta_j^{(n+1)} \, \vert\, \beta_i)= \sum_{i=1}^{m} v(\beta_i) u(\Theta^{(n)} \, \vert \beta_i)=w(\Theta^{(n)}). \end{align*} $$

These ensure that w satisfies P1 and P2 and will thus have a unique extension to $SL$ by Gaifman’s Theorem. It is clear that $w(\beta _i)=v(\beta _i)$ .
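The mixing construction used in this proof, $w = \sum _{i} v(\beta _i)\, u(\cdot \, \vert \, \beta _i)$ , can be illustrated in a finite propositional analogue. In the Python sketch below the three-variable language, the uniform function `u`, the sentences $\phi _1 = p$ and $\phi _2 = q$ and the target values `v` are all assumptions chosen for illustration. The first-order case additionally needs Gaifman's Theorem, which the sketch does not model.

```python
from fractions import Fraction
from itertools import product

ATOMS = list(product((1, 0), repeat=3))   # truth values of (p, q, r)
u = {a: Fraction(1, 8) for a in ATOMS}    # any u with u(beta_i) != 0 would do; uniform here

def beta_of(atom):
    """The cell beta = phi_1^e1 & phi_2^e2 (with phi_1 = p, phi_2 = q) containing this atom."""
    return atom[:2]

# Hypothetical target values on the cells; they are non-negative and sum to 1.
v = {(1, 1): Fraction(1, 2), (1, 0): Fraction(1, 2),
     (0, 1): Fraction(0), (0, 0): Fraction(0)}

def u_of_beta(beta):
    """u(beta): total u-weight of the atoms inside the cell."""
    return sum(x for a, x in u.items() if beta_of(a) == beta)

def mix(atom):
    """w(atom) = sum_i v(beta_i) * u(atom | beta_i); only the atom's own cell contributes."""
    beta = beta_of(atom)
    return v[beta] * u[atom] / u_of_beta(beta)

w = {a: mix(a) for a in ATOMS}
w_of_beta = {b: sum(x for a, x in w.items() if beta_of(a) == b) for b in v}

print(sum(w.values()), w_of_beta == v)
```

The resulting `w` is a point of the simplex over the eight atoms and agrees with `v` on every cell, which is the finite counterpart of the conclusion $w(\beta _i)=v(\beta _i)$ .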

Proposition 3.3. Let $\varGamma =\{\phi _1, \ldots , \phi _n\} \subset SL$ . Set $C_{\varGamma } =\{\zeta \, \vert \, \varGamma \text { is } \zeta \text {-consistent}\}$ and $\eta = \sup C_{\varGamma }$ . Then $\varGamma $ is $\eta $ -consistent.

Proof. Take a non-decreasing sequence $\zeta _n \in C_{\varGamma }$ with $\lim _{n \to \infty } \zeta _n= \eta $ . Since each $\zeta _n$ is in $C_{\varGamma }$ , there is a probability function $w_n$ on $SL$ such that $w_n(\phi ) \geq \zeta _n$ for all $\phi \in \varGamma $ . Let $\beta _i$ enumerate sentences $\bigwedge _{i=1}^{n} \phi _{i}^{\epsilon _{i}}$ and $\vec {w_n} = (w_n(\beta _1), \ldots , w_n(\beta _{m}))$ as above. Since $w_n(\beta _1)$ is a bounded sequence, it has a convergent subsequence, say $w^{1}_{n}(\beta _1)$ converging to, say $b_1$ . Let $\vec {w}^{1}_{n}= (w^{1}_n(\beta _1), \ldots , w^{1}_n(\beta _{m}))$ be a subsequence of $\vec {w}_n$ specified by the subsequence $w^{1}_{n}(\beta _1)$ (so the first coordinates of $\vec{w}_{n}^{1}$ converge to $b_1$ ). In the same way $w^{1}_{n}(\beta _2)$ is a bounded sequence and has a convergent subsequence, say $w^{2}_{n}(\beta _2)$ , converging to, say $b_2$ . Let $\vec {w}^{2}_n=(w^{2}_n(\beta _1), w^{2}_n(\beta _2), \ldots , w^{2}_n(\beta _{m}))$ be the subsequence of $\vec {w}^{1}_n$ specified by $w^{2}_n(\beta _2)$ (so the first coordinates converge to $b_1$ and second coordinates converge to $b_2$ ). Continuing the same way, after m steps, we construct a sequence $\vec {w}^{m}_n$ that converges to $(b_1, b_2, \ldots , b_m)$ . Notice that $\sum _{i=1}^{m} b_i = \sum _{i=1}^{m} \lim _{n \to \infty } w^{m}_n(\beta _i) =1$ . Thus by Lemma 3.2 there is a probability function w on $SL$ such that $w(\beta _i)= b_i=\lim _{n \to \infty } w^{m}_n(\beta _i)$ for all $i=1, \ldots , m$ . Then for all $\phi \in \varGamma $

$$ \begin{align*}w(\phi)= \sum_{\beta_k \vDash \phi} w(\beta_k) = \sum_{\beta_k \vDash \phi} b_k = \sum_{\beta_k \vDash \phi} \lim_{n \to \infty} w^{m}_n(\beta_k)\end{align*} $$

$$ \begin{align*}= \lim_{n \to \infty} \sum_{\beta_k \vDash \phi} w^{m}_n (\beta_k) =\lim_{n \to \infty} w^{m}_n(\phi) \geq \lim_{n \to \infty} \zeta^{m}_n =\eta. \end{align*} $$

Definition 3.4. For a set of sentences $\varGamma \subset SL$ the maximal consistency of $\varGamma $ , denoted by $mc(\varGamma )$ , is defined as $mc(\varGamma )= \max \{ \eta \, \vert \, \varGamma \text { is } \eta \text {-consistent}\}$ .

The maximal consistency of $\varGamma $ is thus the largest probability $\eta $ that can be simultaneously guaranteed for all $\phi \in \varGamma $ by some probability function on $SL$ . Notice that by Proposition 3.3, the largest such $\eta $ exists because the supremum of those $\eta $ that satisfy this condition will be witnessed by some probability function.
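As a hedged illustration of maximal consistency, consider the propositional set $\{p,\ q,\ \neg (p \wedge q)\}$ , which is not taken from the text. For every probability function w we have $w(p)+w(q)+w(\neg (p\wedge q)) \leq 2$ , since each atom satisfies at most two of the three sentences, so no threshold above $2/3$ can be met by all three. The Python sketch below searches a finite grid of the simplex by brute force for the best common threshold. A grid search yields only a lower bound on $mc$ in general; here it reaches the bound $2/3$ because the grid resolution `N` was chosen so that thirds are representable.

```python
from fractions import Fraction
from itertools import product

ATOMS = list(product((1, 0), repeat=2))   # truth values of (p, q)
GAMMA = {
    "p": lambda a: a[0] == 1,
    "q": lambda a: a[1] == 1,
    "not (p and q)": lambda a: not (a[0] == 1 and a[1] == 1),
}

def min_probability(weights):
    """Smallest probability given to a sentence of GAMMA by the function with these atom weights."""
    return min(sum(x for a, x in zip(ATOMS, weights) if holds(a))
               for holds in GAMMA.values())

def grid(parts, total):
    """All tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in grid(parts - 1, total - first):
            yield (first,) + rest

N = 12  # grid resolution (an assumption): multiples of 1/12 include 1/3
best = max(min_probability([Fraction(c, N) for c in counts])
           for counts in grid(len(ATOMS), N))

print(best)
```

The maximum is attained by putting weight $1/3$ on each of the atoms $p \wedge q$ , $p \wedge \neg q$ and $\neg p \wedge q$ , so each sentence receives probability $2/3$ .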

The next result shows that the maximal consistency of a set of sentences $\varGamma $ is determined by a fixed subset of it, in the sense that any probability function that realizes the maximal consistency should assign probability equal to $mc(\varGamma )$ to every sentence in this subset. In other words, the maximal consistency of $\varGamma $ is the highest probability that can be jointly realized by sentences in this subset.¹ Call this subset $\varGamma _1$ and let $\eta_1=mc(\varGamma)$ . Repeating the same observation we can now look at $\varGamma \setminus \varGamma _1$ and enquire about the highest probability that can be guaranteed for sentences in $\varGamma \setminus \varGamma _1$ , this time over those probability functions that realize the maximal consistency.² Call this $\eta_2 $ . With the same reasoning as for $\varGamma $ we will find a fixed subset $\varGamma _2 \subseteq \varGamma \setminus \varGamma _1$ such that any probability function w that realizes the maximal consistency for $\varGamma $ and $\eta_2 $ for $\varGamma \setminus \varGamma _1$ ³ should have $w(\psi ) = \eta_2 $ for all $\psi \in \varGamma _2$ . Repeating this process finitely many times we will end up with a partition $\varGamma = \bigcup _{i=1}^m \varGamma _i$ and a sequence $\eta _1, \eta _2, \ldots , \eta _m$ , where $\eta _i$ is the highest probability that can be assigned to the sentences in $\varGamma _i$ .

Lemma 3.5. Let $\mathbb {P}_L$ be the set of probability functions on $SL$ and $\varGamma =\{\phi _1, \ldots , \phi _n\} \subset SL$ with $mc(\varGamma )=\eta $ . Then:

– (i) There is a non-empty subset of $\varGamma $ , say $\varGamma _1$ , such that for every probability function $w \in \mathbb {P}_L$ if $w(\phi ) \geq \eta $ for all $\phi \in \varGamma $ then $w(\phi ) = \eta $ for all $\phi \in \varGamma _1$ .
– (ii) There is a partition $\varGamma = \varGamma _1 \cup \varGamma _2 \cup \cdots \cup \varGamma _m$ , values $\eta _1< \cdots < \eta _m$ and $\mathbb {P}_L= \mathbb {P}_0 \supseteq \mathbb {P}_1 \supseteq \cdots \supseteq \mathbb {P}_m$ such that:
  – $\eta _i = \max \{\zeta \, \vert \, \varGamma \setminus \bigcup _{j=1}^{i-1} \varGamma _j \text { is } \zeta \text {-consistent in } \mathbb {P}_{i-1}\} $ ,
  – $\mathbb {P}_i =\{w \in \mathbb {P}_{i-1}\, \vert \, w(\phi ) \geq \eta _i \text { for all } \phi \in \varGamma \setminus \bigcup _{j=1}^{i-1} \varGamma _j\}$ , $i=1, \ldots , m$ , and
  – for all $w \in \mathbb {P}_i$ , $w(\psi )=\eta _i$ for all $\psi \in \varGamma _i$ .

Proof. For (i), suppose not. Then for every $\psi \in \varGamma $ there is a probability function $w_{\psi }$ (not necessarily distinct) such that $w_{\psi } (\phi ) \geq \eta $ for all $\phi \in \varGamma $ and $w_{\psi }(\psi )> \eta $ . Let $w= \frac{1}{n} \sum _{\psi \in \varGamma } w_{\psi }$ . Then w is a probability function, and for every $\phi \in \varGamma $ we have $w(\phi ) = \frac{1}{n} \sum _{\psi \in \varGamma } w_{\psi }(\phi )> \eta $ , since $w_{\psi }(\phi ) \geq \eta $ for every $\psi \neq \phi $ and $w_{\phi }(\phi )>\eta $ . Since $\varGamma $ is finite, $\varGamma $ is then $\zeta $ -consistent for some $\zeta> \eta $ , which contradicts $mc(\varGamma )=\eta $ .

For (ii), first for a set of probability functions $\mathbb {A}$ and a set of sentences $\Delta $ define

$$ \begin{align*}mc_{\mathbb{A}}(\Delta)= \max \{\zeta\, \vert \, \Delta \text{ is } \zeta\text{-consistent in } \mathbb{A}\} \end{align*} $$

where the maximum exists. Next notice that for sets of probability functions $\mathbb {P}_i$ above and any finite set of sentences $\Delta $ , $mc_{\mathbb {P}_i}(\Delta )$ is well defined. This follows by an argument similar to that of Proposition 3.3 by noticing that if we restrict the construction in the proof of Proposition 3.3 to some $\mathbb {P}_i$ then the probability function w constructed in that proof that witnesses the threshold $\sup \{\zeta \, \vert \, \Delta \text { is } \zeta \text {-consistent in } \mathbb {P}_i\} $ will also be in $\mathbb {P}_i$ .

Now, let $\eta _1 = mc_{\mathbb {P}_0}(\varGamma )= mc(\varGamma )= \eta $ , $\varGamma _1$ as in (i), and let $\eta _2=mc_{\mathbb {P}_1}(\varGamma \setminus \varGamma _1)$ , that is, the highest threshold that can be simultaneously satisfied by all sentences in $\varGamma \setminus \varGamma _1$ assuming that all sentences in $\varGamma $ have probability at least $\eta _1$ . With the same argument as in (i), one can show that there is a fixed subset $\varGamma _2 \subseteq \varGamma \setminus \varGamma _1$ such that $w(\theta )= \eta _2$ for $\theta \in \varGamma _2$ and $w(\theta ) \geq \eta _2$ for $\theta \in \varGamma \setminus (\varGamma _1 \cup \varGamma _2)$ for every probability function $w \in \mathbb {P}_2$ . Following the same process finitely many times one is left with a partition $\varGamma = \varGamma _1 \cup \varGamma _2 \cup \cdots \cup \varGamma _m$ and values $\eta _1, \ldots , \eta _m$ .

Take a fixed enumeration of $\varGamma $ as $\{\phi _1, \ldots , \phi _n\}$ . Let $\varGamma = \varGamma _1 \cup \varGamma _2 \cup \cdots \cup \varGamma _m$ and $\eta _1< \cdots < \eta _m$ be as in Lemma 3.5 and set

$$ \begin{align*}\vec{mc}(\varGamma)= (\delta_{1}, \ldots, \delta_{n}), \,\,\, \text{where} \,\,\, \delta_{j}= \eta_k \iff \phi_j \in \varGamma_k.\end{align*} $$

Intuitively the values given in $\vec {mc}(\varGamma )$ are the highest probabilities that can be assigned to the sentences in $\varGamma $ coherently, in the sense that there is no probability function that can assign a probability higher than $\eta _1$ to all the sentences in $\varGamma _1$ simultaneously, and the same for $\eta _2$ and $\varGamma _2$ , and so on. Notice that $\vec {mc}(\varGamma )$ depends on the enumeration of $\varGamma $ that we fixed. To be more precise, $\vec {mc}(\varGamma )$ is not invariant under permutations of $\varGamma $ , although it is invariant under those that are composed of separate permutations of the individual $\varGamma _i$ ’s. If we take $\vec {1}=(1, \ldots , 1)$ as an n-vector representing the assignment of probability $1$ to all sentences $\phi _1, \ldots , \phi _n$ (which will be impossible if $\varGamma $ is inconsistent) then for any probability function w and $\vec {w}= (w(\phi _1), \ldots , w(\phi _n))$ , we have $d(\vec {1}, \vec {mc}(\varGamma )) \leq d(\vec {1}, \vec {w})$ where d is the Euclidean distance,⁴ which accounts for $\vec {mc}(\varGamma )$ being the closest we can get to the assumption that all sentences in $\varGamma $ are true.

Definition 3.6. Let $\varGamma =\{\phi _1, \ldots , \phi _n\} \subset SL$ be a consistent set of sentences and $\phi _{n+1} \in SL$ be such that $\varGamma \cup \{\phi _{n+1}\} \vDash \bot $ . The revision of $\varGamma $ by $\phi _{n+1}$ is defined as

$$ \begin{align*}\varGamma'=\{w(\phi_1)=a_1, \ldots, w(\phi_n)=a_n, w(\phi_{n+1})=a_{n+1}\}, \end{align*} $$

where $(a_1, \ldots , a_{n}, a_{n+1})= \vec {mc}(\{\phi _1, \ldots , \phi _n, \phi _{n+1}\})$ .

Definition 3.6 is intended to capture the idea that the revised assignments of probabilities to the sentences $\phi _1, \ldots , \phi _n, \phi _{n+1}$ remain as close as possible to $1$ , i.e., to assign the highest reliability to the information that is probabilistically consistently possible. We will show the uniqueness of $\vec {mc}$ in the next section for a more general setting.

Example 3.7. Let L be a first order language with a single binary relation R and equality. Consider the following sentences that express properties of R as a partial order: $\phi _1: \forall x \neg R(x,x)$ , asserting that R is anti-reflexive, $\phi _2: \forall x,y (R(x,y) \wedge R(y, x) \to x=y)$ , asserting that R is anti-symmetric, $\phi _3: \forall x \exists y R(x, y)$ , asserting that each element in the domain has an R-successor, $\phi _4: \forall x, y ((x \neq y) \to R(x,y) \vee R(y, x))$ , asserting that R is a total relation, $\phi _5: \phi _4 \to \phi _1 \wedge \phi _2$ (if R is total then it is anti-reflexive and anti-symmetric) and $\phi _6: \exists x R(x,x)$ . Let $\varGamma =\{\phi _1, \ldots , \phi _6\}$ .

Then $\varGamma $ is clearly inconsistent as $\phi _1$ and $\phi _6$ are inconsistent; indeed $\phi _6$ is equivalent to $\neg \phi _1$ and thus for any probability function w, $w(\phi _1) \geq 1/2$ if and only if $w(\phi _6) \leq 1/2$ . Hence the largest $\eta $ such that both $w(\phi _1), w(\phi _6) \geq \eta $ for some probability function w is 1/2. Therefore the highest $\eta $ such that $w(\phi _i) \geq \eta $ for $i=1, \ldots , 6$ is at most 1/2. It is easy to check that $\varGamma $ is $1/2$ -consistent. So $\eta _1$ is 1/2 and $\varGamma _1=\{\phi _1, \phi _6\}$ . Amongst the probability functions that assign a probability of at least $1/2$ to $\phi _1, \ldots , \phi _6$ , the highest $\eta _2$ such that the probability of each of $\phi _2, \phi _3, \phi _4$ and $\phi _5$ is at least $\eta _2$ is $3/4$ . To see this, consider $\beta _j$ that enumerate sentences of the form $\bigwedge _{i=1}^{6} \phi _{i}^{\epsilon _{i}}$ with $\epsilon _{i} \in \{0, 1\}$ as before, and let w be any probability function on $SL$ such that $w(\phi _i) \geq 1/2$ for $i=1, \ldots , 6$ . Now observe that $\sum _{\beta _j \vDash \phi _1} w(\beta _j)= w(\phi _1)=1/2$ , and that $\beta _j \vDash \phi _4 \wedge \phi _5$ implies that $\beta _j \vDash \phi _1$ . Thus $\sum _{\beta _j \vDash \phi _4 \wedge \phi _5} w(\beta _j)\leq \sum _{\beta _j \vDash \phi _1} w(\beta _j)= w(\phi _1)=1/2$ . Next note that

$$ \begin{align*}w(\phi_4)=\sum_{\beta_j \vDash \phi_4} w(\beta_j)= \sum_{\beta_j \vDash \phi_4 \wedge \phi_5} w(\beta_j) + \sum_{\beta_j \vDash \phi_4 \wedge \neg \phi_5} w(\beta_j) \leq 1/2+ \sum_{\beta_j \vDash \phi_4 \wedge \neg \phi_5} w(\beta_j). \end{align*} $$

Hence if $w(\phi _4)> 3/4$ then $\sum _{\beta _j \vDash \phi _4 \wedge \neg \phi _5} w(\beta _j)> 1/4$ but then

$$ \begin{align*}w(\neg \phi_5) = \sum_{\beta_j \vDash \neg \phi_5} w(\beta_j) = \sum_{\beta_j \vDash \neg \phi_5 \wedge \phi_4} w(\beta_j) + \sum_{\beta_j \vDash \neg \phi_5 \wedge \neg \phi_4} w(\beta_j)> 1/4 \end{align*} $$

and consequently $w(\phi _5) < 3/4$ . Therefore $\phi _4$ and $\phi _5$ cannot have probabilities higher than $3/4$ simultaneously. It is easy to see that there is a probability function that assigns probabilities at least 1/2 to all sentences in $\varGamma $ and at least $3/4$ to both $\phi _4$ and $\phi _5$ . So we get $\eta _2 =3/4$ and $\varGamma _2=\{\phi _4, \phi _5\}$ . For the next step we observe that $\phi _2$ and $\phi _3$ are consistent with each other and with all of $\phi _1 =\neg \phi _6, \phi _4, \neg \phi _4, \phi _5, \neg \phi _5$ and $\phi _6= \neg \phi _1$ , and so we get $\eta _3=1$ and $\varGamma _3=\{\phi _2, \phi _3\}.$ Thus $\vec{mc}(\varGamma)= (1/2, 1 , 1 , 3/4, 3/4, 1/2)$ .
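The existence claim in Example 3.7 can be checked on a concrete witness. The sketch below is only an illustration: the three worlds and their masses are an assumption of this sketch and are not taken from the text. Each world fixes the truth values of $\phi _1, \ldots , \phi _4$ , with $\phi _5$ and $\phi _6$ derived from them, and each is satisfiable by a first order structure (named in the comments).

```python
from fractions import Fraction as F

# A "world" fixes the truth values of phi_1..phi_4; phi_5 and phi_6 are
# derived from them exactly as the sentences are defined in Example 3.7.
def truth_values(p1, p2, p3, p4):
    p5 = (not p4) or (p1 and p2)   # phi_5: phi_4 -> (phi_1 and phi_2)
    p6 = not p1                    # phi_6 is equivalent to not phi_1
    return (p1, p2, p3, p4, p5, p6)

# Hypothetical witness distribution (an assumption of this sketch).
witness = {
    (True, True, True, True): F(1, 2),    # e.g. the strict order < on the naturals
    (False, True, True, True): F(1, 4),   # e.g. the order <= on the naturals
    (False, True, True, False): F(1, 4),  # e.g. equality on a two-element domain
}

def marginals(dist):
    """Probability of each of phi_1..phi_6 under a distribution over worlds."""
    probs = [F(0)] * 6
    for world, mass in dist.items():
        for i, holds in enumerate(truth_values(*world)):
            if holds:
                probs[i] += mass
    return tuple(probs)

probs = marginals(witness)
print(probs)
```

Under these assumptions the marginals come out as $(1/2, 1, 1, 3/4, 3/4, 1/2)$ , so the bounds derived in the example are attained.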

3.2 Revision of probabilistic assertions

Using the revision process described above, one will move, in the presence of inconsistencies, from a set of sentences to one consisting of probabilistic assertions on those sentences. To use this as a process for iterated revision one needs to define the revision process also on the sets of probabilistic assertions. The latter will be more general and include the categorical sets by identifying a set $\{\phi _1, \ldots , \phi _n\}$ with the set of probabilistic assertions $\{w(\phi _1)=1, \ldots , w(\phi _n)=1\}$ .

Notice that in revising $\varGamma =\{\phi _1, \ldots , \phi _n\}$ , with a sentence $\phi _{n+1}$ , the notion of maximal consistency of $\varGamma \cup \{\phi _{n+1}\}$ represents an attempt to jointly assign probabilities to these sentences while remaining as close as possible to their “prior probabilities” (namely, $1$ ). The approach when dealing with inconsistent sets of probabilistic assertions is going to be the same. We shall try to jointly revise the probability assignments while remaining as close as possible to the prior probabilities, which might not necessarily be $1$ any more. To this end we first generalise the notion of maximal consistency given above.

Definition 3.8. Let $\varGamma =\{w(\phi _1)=a_1, \ldots , w(\phi _n)=a_n\}$ be a (possibly inconsistent) set of probabilistic assertions. Minimal change consistency of $\varGamma $ , $\vec {mcc}(\varGamma )$ , is defined as the n-vector

$$ \begin{align*}\vec{q} \in \{\vec{b} \in [0, 1]^n \, \vert \, \text{there is a probability function } W \text{ on } SL \text{ with } W(\phi_i)=\vec{b}_i, i=1, \ldots, n\} \end{align*} $$

for which $d(\vec {q}, \vec {a})$ is minimal, where $\vec {a}=(a_1, \ldots , a_n)$ and d is the Euclidean distance.

Proposition 3.9. Let $\varGamma = \{w(\phi _1)=a_1, \ldots , w(\phi _n)=a_n\}$ be inconsistent. Then there is a unique $\vec {b} \in [0, 1]^n$ such that $\vec {mcc}(\varGamma )= \vec {b}$ .

Proof. Let $\Lambda =\{\vec {x} \in [0, 1]^n \, \vert \, \text {there is a probability function } w \text { on } SL \text { with } w(\phi _i)=x_i, i=1, \ldots , n\}$ . Then $\Lambda $ is convex. To see this let $\vec {x}, \vec {y} \in \Lambda $ and $\vec {z}= t \vec {x} + (1-t) \vec {y}$ for some $t \in [0, 1]$ . Since $\vec {x}, \vec {y} \in \Lambda $ , there are probability functions $v, w$ on $SL$ such that $v(\phi _i)= \vec {x}_i$ and $w(\phi _i) = \vec {y}_i$ , $i=1, \ldots , n$ . Let $u(\psi ) = t v(\psi )+ (1-t) w(\psi )$ for all $\psi \in SL$ . Then u is a probability function on $SL$ and $u(\phi _i) = \vec {z}_i$ for $i=1, \ldots , n$ . Thus $\vec {z} \in \Lambda $ . Next notice that if d is the Euclidean distance then $f: [0, 1]^n \to \mathbb {R}$ defined as $f(\vec {x}) = d(\vec {x}, \vec {a})$ is a convex function whose square is strictly convex, and that $\Lambda $ is closed, being (by Lemma 3.2) the convex hull of the finitely many $0/1$ truth vectors of the consistent sentences $\bigwedge _{i=1}^{n} \phi _{i}^{\epsilon _{i}}$ . Hence f has a unique minimum on $\Lambda $ .

It is immediate from the definition that for a set of categorical sentences $\varGamma $ , i.e., when $a_1= \cdots =a_n=1$ , $\vec {mcc}(\varGamma )$ coincides with $\vec {mc}(\varGamma )$ . Notice also that for consistent $\varGamma =\{w(\phi _1)=a_1, \ldots , w(\phi _n)=a_n\}$ we have $\vec {mcc}(\varGamma )= \vec {a}$ . The process of revising a set of probabilistic assertions $\varGamma =\{w(\phi _1)=a_1, \ldots , w(\phi _n)=a_n\}$ with the statement $w(\phi _{n+1})=a_{n+1}$ is the same as revising a non-probabilistic set of sentences but with $\vec {mcc}(\varGamma \cup \{w(\phi _{n+1})=a_{n+1}\})$ instead of $\vec {mc}(\varGamma \cup \{\phi _{n+1}\})$ .

Definition 3.10. Let $\varGamma =\{w(\phi _1)=a_1, \ldots , w(\phi _n)=a_n\}$ , where $\{\phi _1, \ldots , \phi _n\} \subset SL$ and $\phi _{n+1} \in SL$ be such that $\varGamma \cup \{w(\phi _{n+1})=a_{n+1}\} $ is inconsistent, then the revision of $\varGamma $ by $w(\phi _{n+1})=a_{n+1}$ is defined as $\varGamma '=\{w(\phi _i)=q_i \vert i=1, \ldots , n+1\}$ where $\vec {q}= \vec {mcc}(\varGamma \cup \{w(\phi _{n+1})= a_{n+1}\})$ .

One thing worth noting here is that classically there are different ways to eliminate inconsistencies from a set of sentences. One can, for instance, adopt any of its maximal consistent subsets, or eliminate inconsistencies in a number of different ways by deleting different sentences. However, as pointed out above, for a set of categorical sentences $\varGamma =\{\phi _1, \ldots , \phi _n\}$ , $\vec {mcc}(\varGamma )= \vec {mc}(\varGamma )$ . Thus Proposition 3.9 ensures that there is a unique way of probabilistically eliminating inconsistency from a set of sentences in the manner that we propose here.

One can immediately notice that in the revision process described above all the sentences are given the same priority. This can be readily relaxed. One can modify the distance used in the definition of $\vec {mcc}$ (or $\vec {mc}$ ) to account for a higher degree of reliability or trust in one or some of the probabilistic assertions that are to be revised. Hence we can revise the definition of minimum change consistency as follows.

Definition 3.11. Let $\varGamma =\{w(\phi _i)=a_i\, \vert \, i=1, \ldots , n\}$ and let $k_1, \ldots , k_n$ be weights. Then $\vec {mcc}({\varGamma })$ is the n-vector $\vec {q} \in \{(b_1, \ldots , b_n) \, \vert \, \text {there is a probability function } W \text { on } SL \text { with } W(\phi _i)=b_i, i=1, \ldots , n\}$ for which $m(\vec {q}, \vec {a}):= \sqrt {\sum _{i=1}^{n} k_{i}(q_i - a_i)^{2}}$ is minimal, where $\vec {a}=(a_1, \ldots , a_n)$ .

We take the assignment of weights $k_i$ to be a context-dependent process. There are different approaches one might take for this. When $\varGamma $ is taken as the set of probabilistic beliefs of an agent, the $k_i$ ’s can be regarded as what is referred to as the degrees of entrenchment of the beliefs in $\phi _i$ , expressing how strongly the agent holds their (probabilistic) belief in $\phi _i$ compared to their belief in $\phi _j$ , $j \neq i$ . One can achieve the same goal by taking a more detailed approach using some notion of ordinal ranking. To see this take the language $L^{(k)}$ (see footnote 5) and either let $\varGamma $ consist of only quantifier-free sentences, or let k be large enough that $L^{(k)}$ captures a good approximation of the real world for the context. As described in Section 2, $\phi _{i}^{(k)}$ , $i=1, \ldots , n$ , can be viewed as sentences in the propositional language with propositional variables $R_{i}(a_{j_{1}}, \dots , a_{j_{s_{i}}})$ and atoms of this language are the sentences of the form

$$ \begin{align*}\bigwedge_{{j_{1},\ldots ,j_{s_{i}} \leq k} \atop{{ R\,\, s_{i}-ary} \atop{R_i \in RL, j \in N^{+}}}} R_{i}(a_{j_{1}},\ldots , a_{j_{s_{i}}})^{\epsilon_{j_1, \ldots, j_{s_i}}}. \end{align*} $$

Then, given an ordinal ranking on these atoms, expressing what the agent takes to be more likely to be the real world, in a way that contradictions are given rank $0$ , and the more plausible atoms get assigned a higher ordinal, one can take the coefficients $k_i$ above as the highest rank such that there is an atom of that rank consistent with $\phi _i$ . That is, $k_i$ is the highest rank of a possible world, of appropriate size, consistent with $\phi _i$ . On other contextual considerations one might choose the coefficients $k_i$ to represent the reliability of the source or the process from which the information is acquired, etc.
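To illustrate the effect of the weights, the following sketch revises the inconsistent pair $\{w(\phi )=1, w(\neg \phi )=1\}$ . It assumes that the weighted distance of Definition 3.11 is read as $\sqrt {\sum _i k_i (q_i - a_i)^2}$ ; the function names and the grid resolution are hypothetical choices of this sketch. Every coherent assignment to $(\phi , \neg \phi )$ has the form $(x, 1-x)$ , so a scan over x suffices in this toy case.

```python
from fractions import Fraction as F

def weighted_sq_distance(q, a, k):
    """Squared weighted distance: sum_i k_i * (q_i - a_i)^2."""
    return sum(ki * (qi - ai) ** 2 for qi, ai, ki in zip(q, a, k))

def revise_pair(k_phi, k_not_phi, steps=100):
    """Grid search for the revision of {w(phi)=1, w(not phi)=1}.

    Candidates are the coherent pairs (x, 1 - x) with x a multiple of
    1/steps; the one closest to the target (1, 1) is returned.
    Minimising the squared distance also minimises the distance itself.
    """
    target = (F(1), F(1))
    weights = (F(k_phi), F(k_not_phi))
    candidates = [(F(i, steps), 1 - F(i, steps)) for i in range(steps + 1)]
    return min(candidates, key=lambda q: weighted_sq_distance(q, target, weights))

print(revise_pair(1, 1))  # equal weights
print(revise_pair(3, 1))  # phi weighted three times as heavily as its negation
```

With equal weights the revision splits the difference at $(1/2, 1/2)$ ; with weights 3 and 1 it moves to $(3/4, 1/4)$ , in line with the closed form $x = k_1/(k_1+k_2)$ obtained by differentiating $k_1(1-x)^2 + k_2 x^2$ .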

Observation 3.12. Similar to AGM revision, probabilistic consistent revision is order-dependent.

Take for example three sentences $\phi , \psi $ and $\theta $ where each two are mutually consistent but $\phi \wedge \psi \wedge \theta \vDash \bot $ . Then maximum consistency of $\{\phi , \psi , \theta \}$ is 2/3. Take the set $\varGamma =\{\phi , \psi \}$ . Then take the revision of $\varGamma $ by $\theta $ and $\neg \psi $ . If we revise $\varGamma $ first by $\neg \psi $ , then the consistent probabilistic revision will give us $\varGamma + \neg \psi =\{w(\phi ) =1, w(\psi ) = 1/2, w(\neg \psi ) =1/2\}$ . If we then revise $\varGamma + \neg \psi $ by $\theta $ we get $(\varGamma + \neg \psi ) + \theta =\{w(\phi ) =3/4, w(\psi )=1/2, w(\neg \psi )= 1/2, w(\theta )=3/4\}$ . Clearly, 1/2 is the largest value that $\psi $ and $\neg \psi $ can be jointly assigned and given probability 1/2 for $\psi $ and $\neg \psi $ , the largest probability that $\phi $ and $\theta $ can jointly have is 3/4, because if $w(\psi )=1/2$ and $w(\phi ) \geq 3/4+ \epsilon $ then necessarily $w(\theta ) \leq 3/4 - \epsilon $ . To see this notice that from $w(\psi )=1/2$ and $w(\phi ) \geq 3/4+ \epsilon $ we get $w(\psi \wedge \phi ) \geq 1/4 + \epsilon $ but $\psi \wedge \phi \vDash \neg \theta $ and so $w(\neg \theta ) \geq 1/4 + \epsilon $ and thus $w(\theta ) \leq 3/4 - \epsilon $ . Thus these are upper bounds on the values that can be jointly assigned to $\phi , \psi , \neg \psi $ and $\theta $ . To see these values can be realized take the assignment $v(\phi \wedge \psi \wedge \neg \theta )= v(\neg \phi \wedge \psi \wedge \theta ) =1/4$ , $v(\phi \wedge \neg \psi \wedge \theta )=1/2$ and $v(\phi \wedge \psi \wedge \theta )= v(\phi \wedge \neg \psi \wedge \neg \theta )=v(\neg \phi \wedge \psi \wedge \neg \theta )=v(\neg \phi \wedge \neg \psi \wedge \theta )=v(\neg \phi \wedge \neg \psi \wedge \neg \theta )=0$ . Then by Lemma 3.2 there is a probability function w on the whole language that assigns these values to $\phi , \psi $ and $\theta $ .
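The realising assignment v given above can be checked mechanically. The sketch below represents each world as a triple of truth values for $(\phi , \psi , \theta )$ and lists only the worlds with positive mass; the helper name `prob` is a hypothetical choice of this sketch.

```python
from fractions import Fraction as F

# Worlds are triples (phi, psi, theta); worlds with mass 0 are omitted.
v = {
    (True, True, False): F(1, 4),   # phi & psi & not theta
    (False, True, True): F(1, 4),   # not phi & psi & theta
    (True, False, True): F(1, 2),   # phi & not psi & theta
}

def prob(dist, index):
    """Total mass of the worlds in which the sentence at `index` is true."""
    return sum((m for world, m in dist.items() if world[index]), F(0))

w_phi, w_psi, w_theta = (prob(v, i) for i in range(3))
w_not_psi = 1 - w_psi
print(w_phi, w_psi, w_not_psi, w_theta)
```

The masses sum to 1 and the marginals are $w(\phi )=3/4$ , $w(\psi )=w(\neg \psi )=1/2$ and $w(\theta )=3/4$ , as stated.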

On the other hand if we first revise $\varGamma $ by $\theta $ we will have $\varGamma + \theta =\{w(\phi )=2/3 , w(\psi )=2/3, w(\theta )=2/3\}$ . If we then revise this with $\neg \psi $ then by using minimal change consistency we get $(\varGamma + \theta ) + \neg \psi =\{w(\phi ) =2/3, w(\psi ) = 1/3, w(\theta )=2/3, w(\neg \psi ) =2/3\}$ . So the order by which we revise $\varGamma $ with $\theta $ and $\neg \psi $ makes a difference in the final revised belief set.
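The second step of this revision order, from $\varGamma + \theta $ to $(\varGamma + \theta ) + \neg \psi $ , can be reproduced by a brute-force search under the Euclidean distance of Definition 3.8. This is a sketch under stated assumptions rather than a general algorithm: it assumes that every truth assignment to $(\phi , \psi , \theta )$ other than the all-true one is consistent, and it only searches distributions whose masses are multiples of $1/6$ , which happens to contain the exact minimiser here. The names `WORLDS`, `SENTENCES`, `compositions` and `mcc_grid` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Consistent worlds for (phi, psi, theta): every truth assignment except the
# one making all three true, since phi & psi & theta is contradictory.
WORLDS = [w for w in product([True, False], repeat=3) if not all(w)]

# The sentences being revised, as functions of a world: phi, psi, theta, not psi.
SENTENCES = [lambda w: w[0], lambda w: w[1], lambda w: w[2], lambda w: not w[1]]

def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def mcc_grid(target, steps=6):
    """Coherent probability vector closest to `target` among grid distributions."""
    best, best_dist = None, None
    for counts in compositions(steps, len(WORLDS)):
        masses = [F(c, steps) for c in counts]
        q = tuple(sum((m for m, w in zip(masses, WORLDS) if s(w)), F(0))
                  for s in SENTENCES)
        dist = sum((qi - ai) ** 2 for qi, ai in zip(q, target))
        if best_dist is None or dist < best_dist:
            best, best_dist = q, dist
    return best

# Priors after revising by theta, plus the new assertion w(not psi) = 1.
prior = (F(2, 3), F(2, 3), F(2, 3), F(1))
print(mcc_grid(prior))
```

Under these assumptions the search returns $(2/3, 1/3, 2/3, 2/3)$ for $(\phi , \psi , \theta , \neg \psi )$ . A grid search of this kind is exact only when the true minimiser lies on the grid, so for other inputs a finer grid or a proper quadratic programming solver would be needed.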

Observation 3.13. Probabilistic consistent revision is not idempotent.

Take for example the three sentences $\phi , \psi $ and $\theta $ from Observation 3.12 and for every probability function $\mu $ , let $\vec {\mu }= (\mu (\phi ), \mu (\psi ), \mu (\theta ), \mu (\theta ))$ . Take $\varGamma =\{\phi , \psi \}$ . Then revising $\varGamma $ with $\theta $ will result in the updated belief set $\varGamma +\theta =\{w(\phi )=2/3, w(\psi )=2/3, w(\theta )=2/3\}$ as before. Learning $\theta $ again will then update the belief set to $(\varGamma +\theta )+\theta =\{w(\phi )= \mu _{\phi }, w(\psi )= \mu _{\psi }, w(\theta )= \mu _{\theta }\}$ such that there is a probability function $\mu $ with $\vec {\mu }= (\mu _{\phi }, \mu _{\psi }, \mu _{\theta }, \mu _{\theta })$ , and for $\vec {\gamma }=(2/3, 2/3, 2/3, 1)$ , $d(\vec {\gamma }, \vec {\mu }) < d(\vec {\gamma }, \vec {\nu })$ for any probability function $\nu $ . It is then easy to check that $\vec {\mu } =(7/12, 7/12, 5/6, 5/6)$ . Thus learning $\theta $ again has the effect of increasing the belief in $\theta $ and thus reducing the belief in $\phi $ and $\psi $ .

Notice that this last observation is indeed in line with the intuition of revising belief by learning new information from different (and possibly contradictory) sources; learning some $\phi $ (possibly in conflict with the rest of available information) from multiple sources should increase the belief in $\phi $ possibly at the cost of decreasing belief in other parts of the belief set in conflict with it.

4 Probabilistic entailment

We started with the problem of drawing logical inference from an inconsistent set of premisses. Following the intuition that inconsistencies in the premisses should be interpreted not as a property of the world but rather as a deficiency of the information, we proposed that the presence of inconsistencies should be understood as an inadequacy and hence uncertainty of the information. With this view, inconsistencies should be treated by moving from reasoning in a categorical context to the reasoning in uncertain ones, hence moving from categorical premisses to probabilistic (-ally consistent) ones. The previous section addressed the issue of how to reduce an inconsistent categorical set of premisses to a consistent set of probabilistic assertions. This however leaves open the question of how one should draw logically valid inferences from these sets of probabilistic assertions. We now move to this question.

The classical entailment relation is defined as a process that preserves the truth (given model theoretically). That is, the entailment ensures the truth of the conclusion (in a model) given the truth of the premisses (in that model). In the probabilistic setting the truth can be identified by assignment of probability 1. However, for probabilities less than 1 one has to settle for a weaker notion. The precise nature of this weaker notion seems context-dependent, but instances of it would be, for example, reliability or acceptability, which in our setting will be represented by a probability threshold. The entailment relation for the sets of probabilistic assertions would then be defined by ensuring the preservation of this weaker notion. Such an entailment relation has been proposed by Knight in [15] and further studied by Paris [18] and Paris, Picado-Muiño and Rosefield [19]. In this section we will extend this entailment relation to first order languages and will investigate some of its properties, which follow with a straightforward modification of the results for the propositional case. We will then look briefly at its generalisation to multiple probability thresholds as in [15], which can be used to limit the pathological effect of inconsistencies only to the relevant part of the premisses. As will be clear shortly, the probabilistic entailment we study provides a spectrum of consequence relations, allowing for reasoning at different degrees of reliability or acceptability.

4.1 The $^{\eta }\rhd _{\zeta }$ entailment

If we identify truth by the probabilistic threshold 1, the classical consequence relation can be read as if all the premisses are reliable with threshold 1, then so is the conclusion. The weakening of this relation in our setting is captured by allowing for thresholds less than 1.

Definition 4.1 [18].

Let $\varGamma \subset SL$ , $\psi \in SL$ and $\eta , \zeta \in [0, 1]$ .

$$ \begin{align*}\varGamma ^{\eta}\rhd_{\zeta} \psi \iff \text{ for all probability functions } w \text{ on } L, \text{ if } w(\varGamma) \geq \eta \text{ then } w(\psi) \geq \zeta. \end{align*} $$

The idea here is that as long as one is in the position to assign to each of the sentences in $\varGamma $ a probability of at least $\eta $ , one is also in the position to assign a probability of at least $\zeta $ to the sentence $\psi $ . The intuition for defining such a probabilistic entailment is more evident when $\eta =\zeta $ are interpreted as the thresholds for acceptance. In this situation the entailment relation $\varGamma ^{\eta }\rhd _{\eta } \psi $ can be read as: as long as we are prepared to accept all the sentences in $\varGamma $ we are bound to accept $\psi $ . With this reading, the $^{\eta }\rhd _{\eta }$ relation connects very well with the idea of belief in terms of the Lockean Thesis [16] with $\eta $ as the Lockean threshold for belief. The entailment relation can then be read as one of belief: ensuring belief in a sentence $\psi $ conditioned on belief in a set of sentences $\varGamma $ . There are situations, however, where the context of reasoning justifies different thresholds of belief (or acceptance) for the assumptions and conclusion.

It can then be immediately observed that for the right value of $\eta $ this will avoid explosion on inconsistent premisses. To be more precise, when dealing with an inconsistent $\varGamma $ one can avoid the trivialisation of the entailment relation $^{\eta }\rhd _{\zeta }$ on $\varGamma $ as long as one chooses $\eta \leq mc(\varGamma )$ . Reading the entailment relation as one of belief, the constraint imposed by $mc(\varGamma )$ on the thresholds for belief is conceptually reminiscent of the context-dependency of the Lockean threshold in Leitgeb’s stability theory of belief [16].

For the rest of this section we shall restrict ourselves to $\eta \in [0, mc(\varGamma )]$ whenever we make a reference to $\varGamma ^{\eta }\rhd _{\zeta } $ .
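A failure of $\varGamma ^{\eta }\rhd _{\zeta } \psi $ is witnessed by a single probability function, so a simple search can refute the relation at a given $\zeta $ . The sketch below is a propositional toy case with hypothetical names (`least_conclusion_prob`, `compositions`): premisses and conclusion are Boolean functions of worlds over two variables p and q, and the search ranges over distributions whose masses are multiples of $1/4$ . The value it returns is only an upper bound on the least probability the conclusion can take, so it can refute an entailment but cannot by itself establish one.

```python
from fractions import Fraction as F
from itertools import product

def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def least_conclusion_prob(premises, conclusion, eta, n_vars, steps=4):
    """Smallest w(conclusion) among grid distributions with w(premise) >= eta.

    Returns None when no grid distribution meets the threshold.
    """
    worlds = list(product([True, False], repeat=n_vars))
    best = None
    for counts in compositions(steps, len(worlds)):
        dist = [(F(c, steps), w) for c, w in zip(counts, worlds)]
        prob = lambda s: sum((m for m, w in dist if s(w)), F(0))
        if all(prob(p) >= eta for p in premises):
            value = prob(conclusion)
            if best is None or value < best:
                best = value
    return best

p = lambda w: w[0]
q = lambda w: w[1]
p_and_q = lambda w: w[0] and w[1]

bound = least_conclusion_prob([p, q], p_and_q, eta=F(3, 4), n_vars=2)
print(bound)
```

Here the search finds a distribution with $w(p), w(q) \geq 3/4$ and $w(p \wedge q) = 1/2$ , so $\{p, q\} ^{3/4}\rhd _{\zeta } p \wedge q$ fails for every $\zeta> 1/2$ . Since $w(p \wedge q) \geq w(p) + w(q) - 1$ for any probability function, the relation does hold for $\zeta \leq 1/2$ in this toy case.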

4.2 Some properties of $^{\eta }\rhd _{\zeta }$

Following [19] we now show some elementary properties of $^{\eta }\rhd _{\zeta }$ for the first order case. These results follow by a straightforward modification of the same results for the propositional case given in [18, 19].

Proposition 4.2. For any $\varGamma =\{\phi _1, \ldots , \phi _n\} \subset SL$ and $\psi \in SL$ ,

(i) $\varGamma ^{\eta }\rhd _{0} \psi $ .
(ii) For $\zeta>0$ , $\varGamma ^{1}\rhd _{\zeta } \psi \iff \varGamma \vDash \psi $ .
(iii) For $\eta> mc(\varGamma )$ , $\varGamma ^{\eta }\rhd _{1} \psi $ .
(iv) For $\zeta>0$ , $\varGamma ^{0}\rhd _{\zeta } \psi \iff \vDash \psi $ .

Proof. Parts (i) and (iii) are immediate from the definition: for (i) observe that the probability of $\psi $ is non-negative for any probability function, independent of what probability it assigns to sentences of $\varGamma $ , and (iii) is vacuously true since there are no probability functions that assign probabilities higher than the maximal consistency of $\varGamma $ to all sentences in $\varGamma $ simultaneously. For (ii) notice that classical valuations on L are themselves probability functions. Thus for consistent $\varGamma $ , $\varGamma ^{1}\rhd _{\zeta } \psi $ implies that $v(\psi ) \geq \zeta $ for all valuations v for which $v(\phi)=1\ \textrm{for all}\ \phi \in \varGamma$ . Since $\zeta>0$ this implies that $v(\psi )=1$ and thus $\varGamma \vDash \psi $ . If $\varGamma $ is inconsistent then (ii) follows trivially. Conversely suppose $\varGamma \vDash \psi $ and $w(\phi)=1\ \textrm{for all}\ \phi \in \varGamma$ . Let $\beta _j$ , $1 \leq j \leq m$ , enumerate the consistent sentences of the form $\bigwedge _{i=1}^{n} \phi _{i}^{\epsilon _{i}}$ where $\epsilon _i \in \{0, 1\}$ and $\phi _{i}^{1}= \phi _i \text { and } \phi _{i}^{0}= \neg \phi _i$ . Then for any $\beta _j$ such that $w(\beta _j)> 0$ we have $\beta _j \vDash \phi _i$ for all $1 \leq i \leq n$ , since otherwise we will have $w(\phi _i) = \sum _{\beta _l \vDash \phi _i} w(\beta _l) <1$ . So $\beta _j \vDash \bigwedge \varGamma $ and since $\bigwedge \varGamma \vDash \psi $ , $ w(\psi ) \geq w(\bigwedge \varGamma ) = \sum _{\beta _j \vDash \bigwedge \varGamma } w(\beta _j)=1 \geq \zeta $ as required. For (iv), if $\nvDash \psi $ then there is a valuation v for which $v(\psi )=0$ . Since v is also a probability function and $v(\phi ) \geq 0$ for all $\phi \in \varGamma $ , $\varGamma ^{0}\rhd _{\zeta } \psi $ will fail for any $\zeta>0$ . Conversely if $\varGamma ^{0}\rhd _{\zeta } \psi $ fails then there is a probability function w for which $w(\psi ) < \zeta \leq 1$ and thus $\nvDash \psi $ .

Proposition 4.3. Assume that $\varGamma ^{\eta }\rhd _{\zeta } \psi $ . Then:

(i) If $\tau \geq \eta $ and $\nu \leq \zeta $ , then $\varGamma ^{\tau }\rhd _{\nu } \psi $ .
(ii) If $\tau \geq 0$ and $\eta + \tau , \zeta + \tau \leq 1$ , then $\varGamma ^{\eta + \tau }\rhd _{\zeta + \tau } \psi $ .

Proof. (i) is immediate from the definition. For (ii) suppose that $\varGamma ^{\eta + \tau }\rhd _{\zeta + \tau } \psi $ failed. Thus there is a probability function w for which $w(\phi ) \geq \eta + \tau $ for all $\phi \in \varGamma $ but $w(\psi ) < \zeta +\tau $ . If $w(\psi ) < \zeta $ we will have that $\varGamma ^{\eta }\rhd _{\zeta } \psi $ fails. Otherwise let $\gamma \geq 0$ be such that

$$ \begin{align*}\gamma < \zeta < \gamma + (\zeta + \tau -w(\psi)). \end{align*} $$

Let $\beta _i$ enumerate all the consistent sentences of the form $\bigwedge _{i=1}^{n} \phi _i^{\epsilon _i} \wedge \psi ^{\epsilon _{n+1}}$ with $\epsilon_j \in \{0, 1\}$ and $\phi^1, \phi^0, \psi^1$ and $\psi^0$ defined as above. Pick a $\beta _i$ such that $w(\beta _i)>0$ and $\beta _i \nvDash \psi $ (such a $\beta _i$ exists otherwise we should have $w(\psi )=1$ and $\varGamma ^{\eta + \tau }\rhd _{\zeta + \tau } \psi $ will hold). Define

$$ \begin{align*}v(\beta_k)= \begin{cases} w(\beta_k) \cdot (\gamma/w(\psi)), & \mbox{if } \beta_k \vDash \psi,\\ w(\beta_k), & \mbox{if } \beta_k \nvDash \psi, \beta_k\neq \beta_i,\\ w(\beta_i) + w(\psi)-\gamma, & \mbox{if } \beta_k=\beta_i, \end{cases}\end{align*} $$

so $\sum _{k=1}^{2^{n+1}} v(\beta _k)=1$ . Using Lemma 3.2, we can find a probability function $w'$ on $SL$ such that $w'(\beta _k)= v(\beta _k)$ for $k=1, \ldots , 2^{n+1}$ . Then we have $w'(\psi )= \sum _{\beta _k \vDash \psi } w'(\beta _k)= \sum _{\beta _k \vDash \psi } w(\beta _k) \cdot \gamma /w(\psi )=\gamma $ , and for $\phi \in \varGamma $ we have $w(\phi )- w'(\phi )\leq \sum _{\beta _k \vDash \phi \wedge \psi } w(\beta _k)(1-\gamma /w(\psi )) \leq w(\psi )-\gamma $ , since all other $\beta _k$ remain the same or increase in probability under $w'$ . Hence $w'(\phi ) \geq \eta +\tau -(w(\psi )-\gamma )> \eta $ , where the strict inequality uses $w(\psi )-\gamma < \tau $ , which follows from the choice of $\gamma $ . So we have $w'(\phi )> \eta $ for all $\phi \in \varGamma $ while $w'(\psi ) = \gamma < \zeta $ , which contradicts $\varGamma ^{\eta }\rhd _{\zeta } \psi $ .
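The mass-shifting construction used in this proof can be traced on a small example. The sketch below is illustrative only: the distribution w, the choice $\varGamma =\{p\}$ , $\psi = q$ , $\gamma = 1/4$ and the function names are all hypothetical. Worlds play the role of the sentences $\beta _k$ , and `pivot` plays the role of the chosen $\beta _i$ with positive mass that falsifies $\psi $ .

```python
from fractions import Fraction as F

def prob(dist, sentence):
    """Total mass of the worlds satisfying `sentence`."""
    return sum((m for world, m in dist.items() if sentence(world)), F(0))

def shift_mass(w, psi, gamma, pivot):
    """Build v from w as in the proof: rescale the worlds satisfying psi so
    that v(psi) = gamma, and move the removed mass onto `pivot`."""
    w_psi = prob(w, psi)
    v = {}
    for world, m in w.items():
        if psi(world):
            v[world] = m * gamma / w_psi
        elif world == pivot:
            v[world] = m + w_psi - gamma
        else:
            v[world] = m
    return v

# Hypothetical example: worlds are pairs (p, q), Gamma = {p}, psi = q.
w = {(True, True): F(1, 2), (True, False): F(1, 4),
     (False, True): F(1, 8), (False, False): F(1, 8)}
p = lambda world: world[0]
q = lambda world: world[1]
gamma = F(1, 4)

v = shift_mass(w, psi=q, gamma=gamma, pivot=(True, False))
print(prob(v, q), prob(v, p))
```

In this instance v still sums to 1, $v(q)$ drops from $5/8$ to $\gamma = 1/4$ , and $v(p) = 33/40$ , so the probability of the premiss falls by no more than $w(\psi ) - \gamma $ (here it actually rises, because the pivot world satisfies p).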

The next result shows that the entailment relation $^{\eta }\rhd _{\zeta } $ does not depend on the choice of the language. More precisely, let $L_1, L_2$ be finite first order languages such that $\varGamma \subset SL_1\cap SL_2$ and $\psi \in SL_1 \cap SL_2$ , then $w_1(\psi ) \geq \zeta $ for every probability function $w_1$ on $SL_1$ such that $w_1(\varGamma ) \geq \eta $ if and only if $w_2(\psi ) \geq \zeta $ for every probability function $w_2$ on $SL_2$ such that $w_2(\varGamma ) \geq \eta $ .

Proposition 4.4. The relation $^{\eta }\rhd _{\zeta } $ is language invariant. That is, for two languages $L, L'$ with $SL \subseteq SL'$ , and any $\varGamma \subset SL$ and $\psi \in SL$ , $\varGamma ^{\eta }\rhd _{\zeta } \psi $ in the context of the language L (i.e., for every probability function w on $SL$ if $w(\varGamma ) \geq \eta $ then $w(\psi )\geq \zeta $ ) if and only if $\varGamma ^{\eta }\rhd _{\zeta } \psi $ in the context of the language $L'$ (i.e., for every probability function $w'$ on $SL'$ , if $w'(\varGamma ) \geq \eta $ then $w'(\psi ) \geq \zeta $ ).

Proof. For the forward direction assume that $w'$ is a probability function on $SL'$ such that $w'(\varGamma ) \geq \eta $ but $w'(\psi ) < \zeta $ . Let w be the restriction of $w'$ to $SL$ . Then w will be a probability function that agrees with $w'$ on $\varGamma $ and $\psi $ and thus $\varGamma ^{\eta }\rhd _{\zeta } \psi $ will fail in the context of the language L. Conversely let w be a probability function on $SL$ such that $w(\varGamma ) \geq \eta $ but $w(\psi ) < \zeta $ . Let $\varGamma = \{\phi _1, \ldots , \phi _n\}$ and as before let $\beta _i$ enumerate the sentences of the form $\bigwedge _{i=1}^{n} \phi _i^{\epsilon _i} \wedge \psi ^{\epsilon _{n+1}}$ , so that $w(\psi )= \sum _{\beta _i \vDash \psi } w(\beta _i) < \zeta $ . Since $L \subset L'$ , we have $\beta _i \in SL'$ and since w is a probability function we have that $\sum _{i=1}^{2^{n+1}} w(\beta _i)=1$ . Using Lemma 3.2, we can find a probability function $w'$ on $SL'$ with $w'(\beta _i)=w(\beta _i)$ . With the notation of Lemma 3.2, for $\phi \in \varGamma $ ,

$$ \begin{align*}w'(\phi)= \sum_{i=1}^{2^{n+1}} w(\beta_i) u(\phi \vert \beta_i)= \sum_{\beta_i \vDash \phi} w(\beta_i)= w(\phi) \geq \eta\end{align*} $$

and

$$ \begin{align*}w'(\psi)= \sum_{i=1}^{2^{n+1}} w(\beta_i) u(\psi \vert \beta_i)= \sum_{\beta_i \vDash \psi} w(\beta_i)= w(\psi) < \zeta.\end{align*} $$

Hence $\varGamma ^{\eta }\rhd _{\zeta } \psi $ fails in the context of language $L'$ .

With this in place we can now talk about making logical inference from an inconsistent set of premisses. Let $\varGamma \vDash \bot $ and $\eta = mc(\varGamma )$ . As pointed out in the previous section, $\eta $ can be regarded as the highest threshold of reliability that can be jointly satisfied by all sentences in $\varGamma $ . One can then devise a spectrum of entailment relations ${}^{\eta } \rhd _{\zeta }$ for $\zeta \in [0, 1]$ . Given the intuition we started with, it seems more reasonable however to limit the spectrum to $\zeta \in [\eta , 1]$ . With $\zeta =1$ one would effectively define the logical inference from $\varGamma $ as the set of sentences that will be categorically true if one was to accept sentences in $\varGamma $ with the highest possible threshold. Similarly values of $\zeta \in [\eta , 1)$ can correspond to more relaxed criteria of acceptability for what can be considered as a consequence of $\varGamma $ . Given a set of sentences $\varGamma \subset SL$ , let $\eta = mc(\varGamma )$ and, for each $\zeta $ , take $\psi $ to be a consequence of $\varGamma $ at reliability degree $\zeta $ when $\varGamma ^{\eta }\rhd _{\zeta } \psi $ . Notice that if we denote the set of consequences of $\varGamma $ at reliability degree $\zeta $ by $C_{\varGamma }^{\zeta }$ then for $\zeta \leq \delta $ we have $C_{\varGamma }^{\delta } \subseteq C_{\varGamma }^{\zeta }$ .

4.3 Generalising to multiple thresholds; $^{\vec {\eta }}\rhd _{\zeta }$

What is missing from this picture so far is the promise of an entailment relation that can limit the effect of inconsistencies to only the part of reasoning that is relevant to them. For this we should look at $\vec {mc}(\varGamma )$ and generalise the entailment relation to allow for multiple thresholds.

Definition 4.5 [15].

Let $\varGamma = \{\phi _1, \ldots , \phi _n\} \subset SL$ , $\psi \in SL$ and $\vec {\eta } \in [0, 1]^{n}, \zeta \in [0, 1]$ . Define

$$ \begin{align*} \varGamma ^{\vec{\eta}}\rhd_{\zeta} \psi \iff \text{ for all probability functions } w \text{ on } L,\\ \text{ if } w(\phi_i) \geq (\vec{\eta})_i \text{ for } i=1, \ldots, n \text{ then } w(\psi) \geq \zeta. \end{align*} $$

Let $\varGamma =\{\phi _1, \ldots , \phi _n\}$ be an inconsistent set of sentences. The notion of $\vec {mc}(\varGamma )$ , introduced in the previous section, was meant to capture the highest probability that can be simultaneously assigned to sentences in $\varGamma $ , capturing the highest degree of reliability that one can consider for them. In this sense the entailment $\varGamma ^{\vec {mc}(\varGamma )}\rhd _{\zeta }$ allows us to relax the notion of logical consequence of a set $\varGamma $ by considering not only the models in which sentences in $\varGamma $ hold categorically (of which there are none since $\varGamma $ is inconsistent) but extend to probabilistic models in which sentences in $\varGamma $ are as reliable as possible. The next result shows how this can be employed to limit the effect of inconsistencies to only reasoning from the relevant part of the premisses.

Proposition 4.6. Let $\varGamma =\{\theta _1, \ldots , \theta _k\}$ and $\Pi \subseteq \mathcal {P}(\varGamma )$ be the set of maximally consistent subsets of $\varGamma $ and let $\Delta = \bigcap \Pi $ , then $\left (\vec {mc}(\varGamma )\right )_i = 1$ whenever $\theta _i \in \Delta $ .

Indeed $\Delta $ can be regarded as the part of $\varGamma $ to which the inconsistency is irrelevant.

Proof. Let $\Delta =\{\phi _1, \ldots , \phi _t\}= \bigcap \Pi $ , $\varGamma '= \varGamma \setminus \Delta =\{\psi _1, \ldots , \psi _n\}$ ( $k=t+n$ ) and $\vec {mc}(\varGamma ')=\vec {\zeta }= (\zeta _1, \ldots , \zeta _n)$ . Let $\varGamma '=\varGamma ^{\prime }_1 \cup \cdots \cup \varGamma ^{\prime }_m$ and $\eta _1, \ldots , \eta _m$ ( $m \leq n$ ) be as in Lemma 3.5, so that for all $1 \leq i\leq n$ , if $\psi _i \in \varGamma ^{\prime }_j$ then $\zeta _i=\eta _j$ . Then there is a probability function u on $SL$ such that $u(\psi _i)=\zeta _i$ , $1\leq i\leq n$ .

Let $\alpha _{j}$ enumerate all the satisfiable sentences of the form $\bigwedge _{k=1}^{n} \psi _{k}^{ \epsilon _{k}}$ (of which there are at most $2^{n}$ ) and let the sentences $\beta _{j, \vec {\epsilon }}$ enumerate all the sentences of the form

$$ \begin{align*}\beta_{j, \vec{\epsilon}} = \alpha_j \wedge \bigwedge_{k=1}^{t} \phi_{k}^{(\vec{\epsilon})_k}.\end{align*} $$

Define $v(\beta _{j, \vec {\epsilon }})= u(\alpha _j)$ if $\vec {\epsilon }= \vec {1}$ and $v(\beta _{j, \vec {\epsilon }})= 0$ otherwise. Then $\sum _{j, \vec {\epsilon }} v(\beta _{j, \vec {\epsilon }}) = \sum _{j} u(\alpha _j)=1$ . By Lemma 3.2 there is a probability function w on $SL$ such that $w(\beta _{j, \vec {\epsilon }})= v(\beta _{j, \vec {\epsilon }})$ . Then $w(\psi _i)= u(\psi _i)= \zeta _i$ , $i=1, \ldots , n$ , and $w(\phi _j)=1$ , $j=1, \ldots , t$ . So w assigns probability 1 to all sentences in $\Delta $ . Then $\varGamma = \varGamma ^{\prime }_1 \cup \cdots \cup \varGamma ^{\prime }_m \cup \varGamma ^{\prime }_{m+1}$ with $\varGamma ^{\prime }_{m+1}=\Delta $ , together with $\eta _1, \ldots , \eta _m ,\eta _{m+1}=1$ , gives a partitioning of $\varGamma $ and the corresponding probability bounds satisfying the conditions in Lemma 3.5. Therefore by definition of $\vec {mc}(\varGamma )$ , $\vec {mc}(\varGamma )_i =1$ for all i with $\theta _i \in \Delta $ .

This ensures that $\vec {mc}(\varGamma )$ assigns probability 1 to all sentences that are not affected by the inconsistency in $\varGamma $ . That is, with $\vec {\eta }= \vec {mc}(\varGamma )$ , for any $\psi \in SL$ such that $\Delta \vDash \neg \psi $ , $\varGamma \not { ^{\vec {\eta }}\rhd _{\zeta }} \psi $ for all $\zeta>0$ , and if $\Delta \vDash \psi $ then $\varGamma ^{\vec {\eta }}\rhd _{\zeta } \psi $ for all $\zeta \in [0, 1]$ .

Example 4.7. Consider $L_1$ and $L_2$ to be disjoint languages with $L=L_1 \cup L_2$ and let $\varGamma _1 \subset SL_1$ and $\varGamma _2 \subset SL_2$ and $\varGamma = \varGamma _1 \cup \varGamma _2 \subset SL$ . Let $\varGamma _1= \{\phi _{1}, \ldots , \phi _{n}\}$ be inconsistent with $\vec {mc}(\varGamma _1)= (\eta _1, \ldots , \eta _n)$ and assume that $\varGamma _2=\{\psi _{1}, \ldots , \psi _{m}\}$ is consistent and so $\vec {mc}(\varGamma _2)= (1, \ldots , 1)$ . Then taking $\varGamma =\{\phi _{1}, \ldots , \phi _{n}, \psi _{1}, \ldots , \psi _{m} \}$ in this fixed order, we have $\vec {\eta }= \vec {mc}(\varGamma )= (\eta _1, \ldots , \eta _{n}, 1, \ldots , 1)$ . Consider the entailment relations $\varGamma ^{\vec {\eta }}\rhd _{\zeta }$ for $\zeta \in [0, 1]$ . Again, we have a spectrum of entailment relations from the set $\varGamma $ , each at a different degree of reliability in $[0, 1]$ . Now for $\theta \in SL_{2}\subset SL$ and $\zeta> 0$ we have $\varGamma ^{\vec {\eta }}\rhd _{\zeta } \theta $ if and only if $\varGamma _2 \vDash \theta $ , thus reducing the inference on sentences of $L_2$ , where the relevant knowledge is consistent, to the classical inference, hence limiting the pathological effect of the inconsistency only to inferences on sentences of $L_{1}$ , where the knowledge is inconsistent.

5 Conclusion

One approach to dealing with inconsistencies is motivated by reasoning in non-ideal contexts and is based on the assumption that inconsistent evidence does not point to inconsistencies in the reality under investigation but to an inconsistent evaluation of facts. Receiving contradictory information should thus cast doubt on those evaluations. In this view, receiving some piece of information $\phi $ while having $\neg \phi $ in our knowledge base has the effect of changing the (categorical or probabilistic) evaluation of $\phi $ (and thus $\neg \phi $). In the case of categorical knowledge (with truth values of zero or one), this means moving from categorical belief in $\phi $ or $\neg \phi $ to some uncertain evaluation of them. This can be done, for example, by assigning reliabilities to information sources and judging the evidence in light of these reliabilities [5, 29]. Similarly, in the case of probabilistic knowledge this would entail re-evaluation of the probabilities. This approach, which we followed here, is based on two assumptions:

– The inconsistencies are identified with the uncertainty that they induce in the information set.
– The information is assumed to be as reliable as consistency considerations allow.

Hence, receiving inconsistent information will change the context of reasoning from a categorical one to an uncertain one that is expressed probabilistically. We built upon the work introduced by Knight [14, 15] and argued that it is possible to do so in a way that allows limiting the pathological effect of an inconsistency to the part of the reasoning that is relevant to it.

How the change induced by the inconsistency is carried out in the information set depends on how one weighs the new information against the old. For example, if we take the new information to be infinitely more reliable than the old, we end up with the same retraction and expansion process as in AGM. But as we have seen, one can also devise the change in a manner that allows a wider range of epistemic attitudes towards the new information in comparison to the old. Since the inconsistencies reduce our categorical knowledge to probabilistic knowledge, any inference based on such knowledge will essentially be probabilistic. We then studied a probabilistic entailment relation on propositional languages, introduced by Knight, and showed that it can be extended to the first order case in a straightforward manner. The idea behind this entailment relation is to generalise the classical consequence relation from a relation that preserves truth to one that preserves, or more precisely ensures, some degree of reliability.

It is also worth mentioning that one can choose a different route altogether and deal with the inconsistent evidence by adopting a richer language in which the source and/or reliability of information is also coded in the information. Thus, for example, $\phi $ received from source S is replaced by $(\phi )_{S}$ to the effect that “according to S, $\phi $.” In this approach receiving $(\phi )_{S}$ and $(\neg \phi )_{S'}$ no longer poses a contradiction, while contradictory information from the same source has the effect of reducing the reliability of the source. The evaluation of information should then depend on the reliability of the sources. This approach, however, can at least to some extent be covered by our setting. The simplest case we discussed corresponds to dealing with equally reliable pieces of information, and the notion of maximal consistency can be regarded as the highest reliability that one can assign to a source that gives contradictory information. The case of prioritised evidence can cover dealing with (possibly inconsistent) information from sources that have different reliabilities. The approach given here, however, has the advantage of avoiding unnecessary complication of the language.

Of course, our notion of “closeness” when revising inconsistent theories into probabilistically consistent ones is open to debate. The use of Euclidean distance was motivated by trying to choose the closest values for all sentences simultaneously. It would be interesting to investigate whether other notions of “closeness” can improve this approach.
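As a small illustration of why the choice of distance matters, the Python sketch below compares the Euclidean distance to the all-ones vector with one possible alternative on a toy case. The toy case (probabilities $(x, 1-x)$ for an inconsistent pair such as $\{p, \neg p\}$), the 0.1 grid, the Manhattan distance as the alternative, and all function names are assumptions introduced here for illustration. None of them are taken from the text.

```python
import math

def euclidean_to_ones(x):
    """d(x, 1) = sqrt(sum_i (x_i - 1)^2), the Euclidean distance to the all-ones vector."""
    return math.sqrt(sum((xi - 1.0) ** 2 for xi in x))

def manhattan_to_ones(x):
    """Hypothetical alternative notion of closeness: sum_i |x_i - 1|."""
    return sum(abs(xi - 1.0) for xi in x)

# Assumed toy case: probabilities (x, 1 - x) for an inconsistent pair {p, not p}.
# Coherence forces the two values to sum to 1; the 0.1 grid is illustrative only.
candidates = [(k / 10, 1 - k / 10) for k in range(11)]

# Grid point closest to (1, 1) in Euclidean distance.
closest = min(candidates, key=euclidean_to_ones)

# Manhattan distances of all candidates, rounded to absorb floating-point noise.
manhattan_values = [round(manhattan_to_ones(c), 9) for c in candidates]

print(closest)
print(set(manhattan_values))
```

On this toy case the Euclidean distance singles out the point (0.5, 0.5), whereas the Manhattan distance takes the same value for every candidate and so cannot discriminate between them.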

Footnotes

1 If the probability of any sentence in this subset is increased above $mc(\varGamma )$, the probability of some other sentence in it will necessarily decrease below it.

2 Thus already ensuring the highest possible probability for sentences in $\varGamma _1$ .

3 So $w(\phi ) \geq \eta _1= mc(\varGamma )$ for all $\phi \in \varGamma $ and $w(\psi ) \geq \eta _2$ for all $\psi \in \varGamma \setminus \varGamma _1$.

4 So $d(\vec {x}, \vec {1}) = \sqrt {\sum _{i=1}^{n}(x_i-1)^2}$ .

5 The language with the same relation symbols as L, say $R_{1}, \ldots , R_{t}$, but with the domain restricted to $\{a_1, \ldots , a_k\}$.

References

Alchourrón, C. E., Gärdenfors, P., & Makinson, D. (1985). On the logic of theory change: Partial meet contraction and revision functions. Journal of Symbolic Logic, 50, 510–530.

Anderson, A. R., & Belnap, N. (1975). Entailment: The Logic of Relevance and Necessity, Vol. 1. Princeton: Princeton University Press.

Batens, D. (2001). A general characterization of adaptive logics. Logique et Analyse, 44(173–175), 45–68.

Belnap, N. D. (1992). A useful four-valued logic: How a computer should think. In Anderson, A. R., Belnap, N. D. Jr., and Dunn, J. M., editors. Entailment: The Logic of Relevance and Necessity, Vol. II. Princeton: Princeton University Press.

Bovens, L., & Hartmann, S. (2003). Bayesian Epistemology. Oxford: Oxford University Press.

da Costa, N. C. A. (1974). On the theory of inconsistent formal systems. Notre Dame Journal of Formal Logic, 15(4), 497–510.

da Costa, N. C. A., & Subrahmanian, V. S. (1989). Paraconsistent logic as a formalism for reasoning about inconsistent knowledge bases. Artificial Intelligence in Medicine, 1, 167–174.

De Bona, G., & Finger, M. (2015). Measuring inconsistency in probabilistic logic: Rationality postulates and Dutch book interpretation. Artificial Intelligence, 227, 140–164.

De Bona, G., Finger, M., Ribeiro, M., Santos, Y., & Wassermann, R. (2016). Consolidating probabilistic knowledge bases via belief contraction. In Proceeding of International Conference on the Principles of Knowledge Representation and Reasoning KR2016. Palo Alto: AAAI Press.

Dunn, J. M. (1976). Intuitive semantics for first degree entailment and “coupled trees”. Philosophical Studies, 29(3), 149–168.

Gaifman, H. (1964). Concerning measures in first order calculi. Israel Journal of Mathematics, 2(1), 1–18.

Hansson, S. O. (1999). A survey of non-prioritized belief revision. Erkenntnis, 50(2–3), 413–427.

Jaśkowski, S. (1948 [1969]). Propositional calculus for contradictory deductive systems. Studia Logica, 24, 143–157.

Knight, K. M. (2002). Measuring inconsistency. Journal of Philosophical Logic, 31(1), 77–98.

Knight, K. M. (2003). Probabilistic entailment and a non-probabilistic logic. Logic Journal of the IGPL, 11(3), 353–365.

Leitgeb, H. (2014). The stability theory of belief. The Philosophical Review, 123(2), 131–171.

Paris, J. B. (1994). The Uncertain Reasoners’ Companion: A Mathematical Perspective, Cambridge Tracts in Theoretical Computer Science, Vol. 39. Cambridge: Cambridge University Press.

Paris, J. B. (2004). Deriving information from inconsistent knowledge bases: A completeness theorem. Logic Journal of the IGPL, 12, 345–353.

Paris, J. B., Picado-Muino, D., & Rosefield, M. (2009). Inconsistency as qualified truth: A probability logic approach. International Journal of Approximate Reasoning, 50, 1151–1163.

Picado Muiño, D. (2011). Measuring and repairing inconsistency in probabilistic knowledge bases. International Journal of Approximate Reasoning, 52, 828–840.

Potyka, N., & Thimm, M. (2017). Inconsistency-tolerant reasoning over linear probabilistic knowledge bases. International Journal of Approximate Reasoning, 88, 209–236.

Priest, G. (1979). Logic of paradox. Journal of Philosophical Logic, 8, 219–241.

Priest, G. (1987). In Contradiction. Nijhoff International Philosophy Series. Dordrecht: Springer.

Priest, G. (2002). Paraconsistent logic. In Gabbay, D. M. and Guenthner, F., editors. Handbook of Philosophical Logic, Vol. 6, pp. 287–393. Dordrecht: Springer.

Priest, G. (2007). Paraconsistency and dialetheism. In Gabbay, D. and Woods, J., editors. Handbook of the History of Logic, Vol. 8, pp. 129–204.

Rescher, N., & Manor, R. (1970). On inference from inconsistent premisses. Theory and Decision, 1(2), 179–217.

Thimm, M. (2009). Measuring inconsistency in probabilistic knowledge bases. In Bilmes, J. and Ng, A., editors, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI’09), pp. 530–537. Arlington: AUAI Press.

Thimm, M. (2013). Inconsistency measures for probabilistic logic. Artificial Intelligence, 197, 1–24.

Williamson, J. (2015). Deliberation, judgement and the nature of evidence. Economics and Philosophy, 31, 27–65.