#### Supplement to Imprecise Probabilities

## Formal appendix

Some formal machinery was introduced in section 1.1. Refer back to that section for definitions. Recall that the objects of belief are an algebra of subsets of some set of states \(\Omega\).

- 1. Properties of strength of belief
- 2. The betting justification of IP
- 3. From lower probabilities to credal sets

### 1. Properties of strength of belief

We are interested in *strength of belief* and thus in some
sort of graded notion of belief. One straightforward and popular way
to represent strength of belief is with a function that maps the
objects of belief to real numbers: bigger numbers represent more
strongly believed sentences. Let’s consider a
function, \(b\), from subsets
of \(\Omega\) to real numbers. In this section
I outline a few properties we might want these numerical
representations of belief to satisfy (see also
Huber 2014).

By convention, let \(b(\Omega) = 1\)
and \(b(\emptyset) = 0\). It seems reasonable
to suppose that your beliefs in other sentences are bounded above and
below by these expressions: it would be odd if there were something
you believed more strongly than the necessary event, or less strongly
than the impossible event. Call this
property—\(b(\emptyset) \le b(X) \le
b(\Omega)\) for
all \(X\)—*boundedness*.

If \(X\) is always true
whenever \(Y\) is, then it seems sensible to
require that you believe \(X\) at least as
much as you do \(Y\). Call this
property—if \(Y \subseteq X\)
then \(b(Y)\le
b(X)\)—*monotonicity*.

Consider the following property: If \(X\)
and \(Y\) are
incompatible—if \(X \cap Y =
\emptyset\)—then \(b(X\cup Y) \ge b(X) +
b(Y)\). This property is called *superadditivity*. If the
“\(\ge\)” is replaced by a
“\(\le\)” we have the property
of *subadditivity*.

A slight strengthening of superadditivity gives
us *2-monotone*: \(b(X \cup Y) \ge b(X) +
b(Y) - b(X \cap Y)\). It is straightforward to show that this
entails superadditivity, and only a little more work to show that the
converse is false.
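
To see that the converse can fail, here is a minimal sketch in Python. The three-element state space and the particular values of \(b\) are assumptions chosen purely for illustration; they are not taken from the literature. The function is superadditive, but it violates the 2-monotone inequality for the overlapping events \(\{a,b\}\) and \(\{b,c\}\).

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset("abc")  # hypothetical three-state space


def events(omega):
    """Return every subset of omega as a frozenset."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# Illustrative belief assignment: only {a,b}, {b,c} and Omega get positive weight.
b = {e: Fraction(0) for e in events(OMEGA)}
b[frozenset("ab")] = Fraction(3, 5)
b[frozenset("bc")] = Fraction(3, 5)
b[OMEGA] = Fraction(1)


def is_superadditive(b, omega):
    """Check b(X u Y) >= b(X) + b(Y) for all incompatible X, Y."""
    evs = events(omega)
    return all(b[x | y] >= b[x] + b[y]
               for x in evs for y in evs if not (x & y))


def is_2_monotone(b, omega):
    """Check b(X u Y) >= b(X) + b(Y) - b(X n Y) for all X, Y."""
    evs = events(omega)
    return all(b[x | y] >= b[x] + b[y] - b[x & y]
               for x in evs for y in evs)
```

With these values, \(b(\{a,b\}) + b(\{b,c\}) - b(\{b\}) = 6/5\), which exceeds \(b(\Omega) = 1\), so 2-monotonicity fails even though every superadditivity constraint holds.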

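A further strengthening of this idea is *n-monotone*. This
is slightly more complicated than the previous ones. For events \(X_1, \dots, X_n\): \[ b\left(\bigcup_{i=1}^{n} X_i\right) \ge \sum_{i=1}^{n} (-1)^{i+1} \left( \sum_{\{I \subseteq \{1,\dots,n\} : |I| = i\}} b\left(\bigcap_{j\in I} X_j\right) \right) \]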

Let’s break this down. On the left hand side, we are constraining
belief in unions of \(n\) events. How are we
constraining them? Look at the inner summation on the right hand
side. This is the sum of all the \(i\)-event
intersections. If \(i\) is even, we take each
of these away, if \(i\) is odd, we add
them. The outer sum ranges over \(i\). So we
add the one member intersections, take away the two element
intersections, add the three element intersections, and so on up
to \(n\). If \(n=2\)
then we have exactly the formula we had above for
2-monotone. If \(b\)
is \(n\)-monotone, then it
is \((n-1)\)-monotone. If \(b\)
is \(n\)-monotone for
all \(n\), then \(b\)
is called *infinite monotone* or *totally monotone*.
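
The alternating-sum condition just described can be checked by brute force on a small state space. The following Python sketch assumes a three-element \(\Omega\); the names `uniform`, `vacuous` and `lopsided` are hypothetical examples introduced here. It adds the odd-sized intersections, subtracts the even-sized ones, and compares the result with the belief in the union.

```python
from fractions import Fraction
from itertools import combinations, product

OMEGA = frozenset("abc")  # hypothetical three-state space


def events(omega):
    """Return every subset of omega as a frozenset."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def alternating_sum(b, xs):
    """Add i-event intersections for odd i, subtract them for even i."""
    total = Fraction(0)
    for i in range(1, len(xs) + 1):
        sign = 1 if i % 2 == 1 else -1
        for idx in combinations(range(len(xs)), i):
            meet = frozenset.intersection(*[xs[j] for j in idx])
            total += sign * b[meet]
    return total


def is_n_monotone(b, omega, n):
    """Check the n-monotone inequality for every n-tuple of events."""
    return all(b[frozenset().union(*xs)] >= alternating_sum(b, xs)
               for xs in product(events(omega), repeat=n))


# A probability function: satisfies the inequality with equality.
uniform = {e: Fraction(len(e), len(OMEGA)) for e in events(OMEGA)}
# Full belief in Omega and none in anything else.
vacuous = {e: Fraction(int(e == OMEGA)) for e in events(OMEGA)}
# Superadditive but not 2-monotone (illustrative values).
lopsided = {e: Fraction(0) for e in events(OMEGA)}
lopsided[frozenset("ab")] = Fraction(3, 5)
lopsided[frozenset("bc")] = Fraction(3, 5)
lopsided[OMEGA] = Fraction(1)
```

On this sketch, `uniform` and `vacuous` pass the check for \(n = 2\) and \(n = 3\), while `lopsided` fails for both, in line with the fact that failing to be \((n-1)\)-monotone rules out being \(n\)-monotone.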

One final property is *additivity* which results from
replacing the inequality in the description of superadditivity with an
equality: if \(X\)
and \(Y\) are incompatible,
then \(b(X\cup Y) = b(X) + b(Y)\). A bounded,
monotonic function on an algebra of events is called
a *capacity*. A bounded, infinite monotone function on an
algebra of events is called a *belief function*. A bounded,
additive function on an algebra of events is a *probability
function*. Belief functions and capacities represent one way of
going beyond orthodox probability theory.

One might want to move beyond orthodox probability theory by
relaxing the constraint that degree of belief be represented by
a *real valued* function. One might instead take belief to be
represented by a function that maps events to *intervals* of
the real line, or to *sets* of real
numbers (Weichselberger 2000). These
approaches can, to some extent, be subsumed under the more general way
of thinking about imprecise belief using sets of probabilities that we
encountered in section 1.1, where your credal
state is represented by a set of probability
functions \(P\). Recall that your *lower
envelope* of \(X\)
is: \(\underline{P}(X)=\inf P(X)\); and
your *upper envelope*
is \(\overline{P}(X)=\sup
P(X)\). \(\underline{P}\) is a
superadditive capacity, \(\overline{P}\) is a
subadditive capacity.
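
As a concrete illustration, the sketch below computes lower and upper envelopes for a small finite credal set. The three probability functions are assumptions made up for this example, and because the set is finite the infimum and supremum reduce to a minimum and a maximum.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = ("a", "b", "c")  # hypothetical state space

# Hypothetical credal set: three probability functions over OMEGA.
CREDAL_SET = [
    {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)},
    {"a": Fraction(1, 5), "b": Fraction(3, 5), "c": Fraction(1, 5)},
    {"a": Fraction(1, 3), "b": Fraction(1, 3), "c": Fraction(1, 3)},
]

EVENTS = [frozenset(c) for r in range(len(OMEGA) + 1)
          for c in combinations(OMEGA, r)]


def prob(p, event):
    """Probability of an event under a single probability function."""
    return sum((p[w] for w in event), Fraction(0))


def lower(event):
    """Lower envelope: the smallest probability any member assigns."""
    return min(prob(p, event) for p in CREDAL_SET)


def upper(event):
    """Upper envelope: the largest probability any member assigns."""
    return max(prob(p, event) for p in CREDAL_SET)


def disjoint_pairs():
    """All ordered pairs of incompatible events."""
    return [(x, y) for x in EVENTS for y in EVENTS if not (x & y)]


lower_is_superadditive = all(lower(x | y) >= lower(x) + lower(y)
                             for x, y in disjoint_pairs())
upper_is_subadditive = all(upper(x | y) <= upper(x) + upper(y)
                           for x, y in disjoint_pairs())
```

For instance, with these numbers \(\underline{P}(\{a\}) = 1/5\) and \(\underline{P}(\{b\}) = 1/4\), while \(\underline{P}(\{a,b\}) = 2/3\), so the superadditivity inequality is strict here.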

This brief survey doesn’t come close to mapping out the whole range of possible representations of uncertainty (but see Halpern 2003; Haenni et al. 2011; Augustin et al. 2014; Huber 2014). There is, however, a philosophical tradition that puts sets of probabilities and belief functions foremost, and this article is primarily about that tradition. The theory of credal sets is also very general and subsumes many of the other representations. So in what follows the discussion will be in terms of sets of probability functions.

One might wonder whether it would be better to take representors to
be sets of *conditional probabilities*, treating these as the basic
items. Hájek
(2003) argues that conditional probabilities should be taken to
be the basic entity, and unconditional probabilities should be defined
out of them. In what follows, this point won’t really make any
difference, so I will continue to talk mostly of unconditional
probabilities. Bear in mind that these are “really”
conditional probabilities conditioned on the tautology.

### 2. The betting justification of IP

Bruno de Finetti argued that your degrees of belief should have the structure of a probability function in the following way. Belief is understood as a summary of your attitudes to gambles. That is, belief is what determines your dispositions to choose among bets, and thus determines your betting behaviour. He showed that if you satisfy some reasonable constraints on how your attitudes to gambles are structured, then your degrees of belief are, in a certain sense, probabilistic (de Finetti 1964; 1990 [1974]).

The insight behind de Finetti’s approach was that your attitudes to how likely various events are could be read off your attitudes to gambles. Recall that gambles are bounded real valued functions (Troffaes and de Cooman (2014) extend the theory to unbounded gambles). Of particular interest are the indicator functions that output 1 if the proposition in question is true and 0 otherwise.

De Finetti essentially appealed to the claim that if you are unwilling to buy a bet at a particular price, you must be happy to sell the bet for that price. This was one of his criteria of “coherence”. C.A.B. Smith questioned this assumption (Smith 1961) and Peter Walley built an extremely powerful and rich theory on the basis of these insights (Walley 1991). This provides a nice foundation for IP.

There’s quite a lot of set up required here, but the punchline is worth it. If we assume that you are neither risk seeking nor risk averse, and that the gambles are denominated in some currency such that your utility is linear in that currency, then your attitude about the contingencies can be read off your attitude about gambles in a straightforward way. Before we explain this, we add some structure to the set of gambles. This presentation follows Quaeghebeur (2014). Call \(\mathcal{L}\) the set of bounded real valued functions on \(\Omega\). If \(f,g\in \mathcal{L}\) are gambles, then so is their “pointwise sum” \((f+g)(\omega) = f(\omega)+g(\omega)\) for all \(\omega\). For a real number \(\lambda\), if \(f\) is a gamble then so is \((\lambda f)(\omega) = \lambda f(\omega)\) for all \(\omega\). Let real number \(\lambda\) stand also for the constant gamble that pays out \(\lambda\): \(\lambda(\omega) = \lambda\) for all \(\omega\).

Consider the set of desirable
gambles \(D\). These are gambles that you
would be willing to accept if they were offered to you. Which gambles
you would willingly accept reflects your opinions in the following
way: the more you would be willing to pay for a
gamble \(f\), the more likely you consider the
contingencies where \(f\) gets you a good
prize. Consider the indicator function \(I_X\)
for an event \(X\). This is the function that
outputs 1 if \(\omega\) is
in \(X\) and 0 otherwise. The more you would
be willing to pay for the gamble \(I_X\), the
more likely you take \(X\) to be. In what
follows I will use \(X\) interchangeably for
the event and its indicator function. Being willing to
pay \(\alpha\) for the
gamble \(f\) is the same as finding the
gamble \(f-\alpha\) acceptable. Consider the
following function: \(\underline{E}(f) =
\sup\{\alpha \in \mathbb{R}, f-\alpha \in D\}\). Call this your
“lower prevision” for \(f\). That
is, your lower prevision for the gamble \(f\)
is the largest amount you would pay to receive the
gamble \(f\). We can also
define \(\overline{E}(f) = \inf\{\alpha
\in\mathbb{R}, \alpha - f \in D\}\): the smallest amount you’d
be willing to sell the gamble \(f\) for. These
two functions are *conjugate* in the sense
that \(\underline{E}(f) = -
\overline{E}(-f)\). We will be particularly interested in lower
previsions of indicator functions: these directly reflect how likely
you think the event in question
is. \(X-\alpha\) is effectively a unit bet
on \(X\) that costs
you \(\alpha\). De Finetti focussed on the
case where if \(f-\alpha \notin D\)
then \(\alpha - f \in D\) for
all \(f\) and
all \(\alpha\). Then \(\underline{E}(f)
= \overline{E}(f)\) and we can describe this number—call
it \(E(f)\)—as your “fair
price” for the gamble \(f\). It’s the
price that makes you indifferent between buying and selling
gamble \(f\). Smith
(1961) allowed your fair selling price and your fair buying
price to differ.
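
The definitions of \(\underline{E}\) and \(\overline{E}\) can be made concrete with a small numerical sketch. Everything specific here is an assumption for illustration: the two-state space, the two probability functions, and in particular the choice to model \(D\) as the set of gambles with strictly positive expectation under every member of a finite credal set. Given such a \(D\), the supremum and infimum in the definitions can be approximated by bisection on \(\alpha\).

```python
OMEGA = ("x", "y")  # hypothetical two-state space

# Hypothetical credal set used only to generate a set of desirable gambles.
CREDAL_SET = [
    {"x": 0.3, "y": 0.7},
    {"x": 0.6, "y": 0.4},
]


def expectation(p, f):
    """Expected payoff of gamble f under probability function p."""
    return sum(p[w] * f[w] for w in OMEGA)


def desirable(f):
    """Assumed model of D: positive expectation under every p."""
    return all(expectation(p, f) > 0 for p in CREDAL_SET)


def lower_prevision(f, tol=1e-9):
    """Approximate sup{alpha : f - alpha in D} by bisection."""
    lo = min(f.values()) - 1.0  # f - lo is everywhere positive, so desirable
    hi = max(f.values()) + 1.0  # f - hi is everywhere negative, so not desirable
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if desirable({w: f[w] - mid for w in OMEGA}):
            lo = mid
        else:
            hi = mid
    return lo


def upper_prevision(f, tol=1e-9):
    """Approximate inf{alpha : alpha - f in D} by bisection."""
    lo = min(f.values()) - 1.0  # lo - f is everywhere negative, so not desirable
    hi = max(f.values()) + 1.0  # hi - f is everywhere positive, so desirable
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if desirable({w: mid - f[w] for w in OMEGA}):
            hi = mid
        else:
            lo = mid
    return hi


I_X = {"x": 1.0, "y": 0.0}  # indicator function of the event {x}
```

Under these assumptions the buying and selling prices for the unit bet on \(\{x\}\) come apart: `lower_prevision(I_X)` is approximately 0.3 and `upper_prevision(I_X)` is approximately 0.6, and the two functions are conjugate up to the bisection tolerance.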

Here’s how we now proceed. We describe some reasonable “coherence constraints” on sets of desirable gambles, and then describe what structure such constraints put on \(\underline{E}\). Call \(\mathcal{L}^+\) the set of gambles that are always positive, that is \(f\in\mathcal{L}^+\) iff \(f(\omega) > 0\) for all \(\omega\). Likewise \(\mathcal{L}^-\) is the set of everywhere negative gambles. There are four constraints:

- \(D\) should be closed under addition: if \(f,g\in D\) then \(f + g\in D\), or more succinctly, \(D + D \subseteq D\).
- \(D\) should be closed under multiplication by a positive constant \(\lambda >0\): if \(f\in D\) then \(\lambda f \in D\), or more succinctly, \(\lambda D = D\).
- Always positive gambles should be in \(D\): \(\mathcal{L}^+ \subseteq D\).
- No always negative gambles should be in \(D\): \(\mathcal{L}^- \cap D = \emptyset\).

Now, \(D\) is a coherent set of gambles—i.e., it satisfies the above four properties—if and only if \(\underline{E}\) has the following structure:

- \(\underline{E}(f) \ge \inf_{\Omega} \{f(\omega), \omega\in\Omega\}\)
- \(\underline{E}(f+g) \ge \underline{E}(f) + \underline{E}(g)\)
- if \(\lambda >0\) then \(\underline{E}(\lambda f) = \lambda \underline{E}(f)\)

Call a lower prevision coherent if and only if it satisfies these
properties. Call a set of
gambles \(D\) *maximal*
when \(f\in D\)
iff \(-f\notin D\) for
all \(f\neq
0\). \(\underline{E} = \overline{E}\)
iff \(D\) is coherent and maximal. The above
is, strictly speaking, not quite true. There are some subtleties about
topological properties of sets of desirable gambles—whether bets
on the boundary of \(D\) are
in \(D\)—that preclude the link being
quite this straightforward, but these need not detain
us (Quaeghebeur
2014). When \(\underline{E} =
\overline{E}\), we call this
function, \(E\), a *linear prevision*
because such previsions satisfy the following properties:

- \(E(f) \ge \inf_{\Omega} \{f(\omega), \omega\in\Omega\}\)
- \(E(f+g) = E(f) + E(g)\)
- and therefore \(E(\lambda f) = \lambda E(f)\) for all \(\lambda \in\mathbb{R}\)
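
The difference between coherent lower previsions and linear previsions can be checked on a finite example. The sketch below assumes, for illustration only, that the lower prevision is computed as the minimum expectation over a small hypothetical credal set, and it tests the properties on a small grid of gambles rather than on all of \(\mathcal{L}\).

```python
from fractions import Fraction
from itertools import product

OMEGA = ("a", "b", "c")  # hypothetical state space

# Hypothetical credal set; a singleton set stands in for a precise agent.
CREDAL_SET = [
    {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)},
    {"a": Fraction(1, 5), "b": Fraction(3, 5), "c": Fraction(1, 5)},
    {"a": Fraction(1, 3), "b": Fraction(1, 3), "c": Fraction(1, 3)},
]

# A small grid of test gambles with payoffs drawn from {-1, 0, 2}.
GAMBLES = [dict(zip(OMEGA, vals)) for vals in product((-1, 0, 2), repeat=3)]


def lower_prev(f, credal_set):
    """Minimum expected payoff of f over the credal set."""
    return min(sum(p[w] * f[w] for w in OMEGA) for p in credal_set)


def add(f, g):
    """Pointwise sum of two gambles."""
    return {w: f[w] + g[w] for w in OMEGA}


def scale(lam, f):
    """Pointwise multiple of a gamble."""
    return {w: lam * f[w] for w in OMEGA}


def satisfies_coherence(credal_set):
    """Check the three coherence properties on the test grid."""
    for f in GAMBLES:
        if lower_prev(f, credal_set) < min(f.values()):
            return False
        for lam in (Fraction(1, 2), 3):
            if lower_prev(scale(lam, f), credal_set) != lam * lower_prev(f, credal_set):
                return False
        for g in GAMBLES:
            if lower_prev(add(f, g), credal_set) < (
                    lower_prev(f, credal_set) + lower_prev(g, credal_set)):
                return False
    return True


def is_additive(credal_set):
    """Check E(f + g) = E(f) + E(g) on the test grid."""
    return all(lower_prev(add(f, g), credal_set)
               == lower_prev(f, credal_set) + lower_prev(g, credal_set)
               for f in GAMBLES for g in GAMBLES)
```

On this grid the three-member set passes `satisfies_coherence` but not `is_additive`, whereas a singleton such as `[CREDAL_SET[0]]` passes both, which is the sense in which a linear prevision behaves like an expectation.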

Now, here is the punchline: if we restrict attention to linear previsions of indicator functions, we get exactly the probability functions! (De Finetti didn’t use the “sets of desirable gambles” approach to coherence, but rather something almost equivalent to it; see Quaeghebeur (2014) and Miranda and de Cooman (2014) for details.) The same move in the case of lower previsions, restricting attention to indicator functions, yields lower probabilities. Besides Smith, P.M. Williams also considered weakening de Finetti’s requirement in 1975, although his paper was published only in 2007 (Williams 2007; Vicig, Zaffalon, and Cozman 2007). Building on Smith’s and Williams’ insights, Walley (1991) builds a rich and sophisticated theory of statistical inference on lower previsions and lower probabilities. See Augustin et al. (2014) for a modern introduction to IP, and Vicig and Seidenfeld (2012) for a more careful discussion of the history of IP. Because of the conjugacy of the upper and lower probabilities, that is, because \(\underline{P}(X) = 1-\overline{P}(\neg X)\), the lower probability tells us all we need to know about your belief state. In fact, coherent \(\underline{P}\) and \(\overline{P}\) are related as described in Figure F1 (Walley 1991: 84 ff).

An arrow between two expressions means that the higher one is greater than or equal to the lower one. The solid lines highlight the nontrivial inequalities. If \(A\) and \(B\) are incompatible, it is clear that this tangle of inequalities simplifies to: \[ \overline{P}(A) + \overline{P}(B) \ge \overline{P}(A\cup B) \ge \overline{P}(A) + \underline{P}(B) \ge \underline{P}(A\cup B) \ge \underline{P}(A) + \underline{P}(B) \]
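
The chain of inequalities for incompatible events can be verified numerically when the lower and upper probabilities arise as envelopes of a finite credal set. The credal set below is a hypothetical example, and the check covers only this one case, so it illustrates the chain rather than proving it.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = ("a", "b", "c")  # hypothetical state space

# Hypothetical credal set over OMEGA.
CREDAL_SET = [
    {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)},
    {"a": Fraction(1, 5), "b": Fraction(3, 5), "c": Fraction(1, 5)},
    {"a": Fraction(1, 3), "b": Fraction(1, 3), "c": Fraction(1, 3)},
]

EVENTS = [frozenset(c) for r in range(len(OMEGA) + 1)
          for c in combinations(OMEGA, r)]


def prob(p, event):
    """Probability of an event under a single probability function."""
    return sum((p[w] for w in event), Fraction(0))


def lower(event):
    """Lower envelope of the credal set at this event."""
    return min(prob(p, event) for p in CREDAL_SET)


def upper(event):
    """Upper envelope of the credal set at this event."""
    return max(prob(p, event) for p in CREDAL_SET)


def chain(a, b):
    """The five terms of the chain, in order, for incompatible a and b."""
    u = a | b
    return [upper(a) + upper(b), upper(u), upper(a) + lower(b),
            lower(u), lower(a) + lower(b)]


def chain_holds(a, b):
    """True if each term is at least as large as the next."""
    terms = chain(a, b)
    return all(x >= y for x, y in zip(terms, terms[1:]))


all_chains_hold = all(chain_holds(a, b)
                      for a in EVENTS for b in EVENTS if not (a & b))
```

For \(A = \{a\}\) and \(B = \{b\}\) the five terms come out as \(11/10\), \(4/5\), \(3/4\), \(2/3\) and \(9/20\), so every inequality in the chain is strict in this example.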

### 3. From lower probabilities to credal sets

To see how the above foundation relates to the credal sets approach,
we need a final couple of formal tools. Say that prevision
\(\underline{E}\) *dominates* \(\underline{E'}\) if, for
all \(f\in \mathcal{L}\), \(\underline{E}(f) \ge
\underline{E'}(f)\). For a lower prevision \(\underline{E}\),
define the *envelope* of \(\underline{E}\) as the set of linear
previsions that dominate \(\underline{E}\). Call this
\(M(\underline{E})\). For a set of linear previsions \(M\), define
their *lower envelope* as \(\inf\{E(f), E\in
M\}\). The *envelope theorem* states that \(\underline{E}\) is
coherent iff \(\underline{E}\) is the lower envelope of
\(M(\underline{E})\). Restricted to the case of lower previsions of
indicator functions—lower probabilities—this states that
your betting behaviour is coherent if and only if your fair buying
prices are the lower envelope of a set of probabilities.

This puts the imprecise probabilities approach on almost the same
footing as the precise probability approach in that we have an exact
analogue of de Finetti’s version of the so-called Dutch book
theorem: the theorem has less demanding—more
reasonable—premises, and the conclusion is that rational
agents’ beliefs should have the structure of a lower probability
and thus through the envelope theorem, the structure of a credal
set. One disanalogy between de Finetti’s result and the IP
one-sided result is that your precise fair prices *determine*
your probability function whereas in the one-sided betting case, the
coherence requirements only constrain your representor. That is, a
complete set of coherent fair prices picks out a unique probability,
while several distinct sets of probabilities can give rise to the same
set of one-sided previsions.
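
A simple way to see this last point is to compare a credal set with the same set enlarged by a mixture of its members. In the sketch below, the two probability functions and the grid of test gambles are assumptions made for illustration, and linear previsions are represented by expectations. Adding the midpoint changes the set but leaves every lower prevision on the grid unchanged.

```python
from fractions import Fraction
from itertools import product

OMEGA = ("a", "b", "c")  # hypothetical state space

P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
Q = {"a": Fraction(1, 5), "b": Fraction(3, 5), "c": Fraction(1, 5)}
# The equal-weight mixture of P and Q is itself a probability function.
MIDPOINT = {w: (P[w] + Q[w]) / 2 for w in OMEGA}

SMALL_SET = [P, Q]
LARGER_SET = [P, Q, MIDPOINT]

# A small grid of test gambles with payoffs drawn from {-1, 0, 2}.
GAMBLES = [dict(zip(OMEGA, vals)) for vals in product((-1, 0, 2), repeat=3)]


def expectation(p, f):
    """Expected payoff of gamble f under probability function p."""
    return sum(p[w] * f[w] for w in OMEGA)


def lower_envelope(f, credal_set):
    """Smallest expectation of f over the credal set."""
    return min(expectation(p, f) for p in credal_set)


same_lower_previsions = all(
    lower_envelope(f, SMALL_SET) == lower_envelope(f, LARGER_SET)
    for f in GAMBLES)
```

The reason is that the expectation of any gamble under the mixture lies between its expectations under `P` and `Q`, so the mixture never achieves a new minimum.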