Supplement to Imprecise Probabilities
Formal appendix
Some formal machinery was introduced in section 1.1. Refer back to that section for definitions. Recall that the objects of belief are an algebra of subsets of some set of states \(\Omega\).
- 1. Properties of strength of belief
- 2. The betting justification of IP
- 3. From lower probabilities to credal sets
1. Properties of strength of belief
We are interested in strength of belief and thus in some sort of graded notion of belief. One straightforward and popular way to represent strength of belief is with a function that maps the objects of belief to real numbers: bigger numbers represent more strongly believed sentences. Let’s consider a function, \(b\), from subsets of \(\Omega\) to real numbers. In this section I outline a few properties we might want these numerical representations of belief to satisfy (see also Huber 2014).
By convention, let \(b(\Omega) = 1\) and \(b(\emptyset) = 0\). It seems reasonable to suppose that your beliefs in other sentences are bounded above and below by these expressions: it would be odd if there were something you believed more strongly than the necessary event, or less strongly than the impossible event. Call this property—\(b(\emptyset) \le b(X) \le b(\Omega)\) for all \(X\)—boundedness.
If \(X\) is always true whenever \(Y\) is, then it seems sensible to require that you believe \(X\) at least as much as you do \(Y\). Call this property—if \(Y \subseteq X\) then \(b(Y)\le b(X)\)—monotonicity.
Consider the following property: If \(X\) and \(Y\) are incompatible—if \(X \cap Y = \emptyset\)—then \(b(X\cup Y) \ge b(X) + b(Y)\). This property is called superadditivity. If the “\(\ge\)” is replaced by a “\(\le\)” we have the property of subadditivity.
A slight strengthening of superadditivity gives us 2-monotone: \(b(X \cup Y) \ge b(X) + b(Y) - b(X \cap Y)\). It is straightforward to show that this entails superadditivity, and only a little more work to show that the converse is false.
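Both claims can be checked by brute force on a small example. The following Python sketch is illustrative only: the three-element state space and the particular function `b` (0 on singletons, 3/5 on two-element events, 1 on \(\Omega\)) are assumptions chosen for this example. The entailment holds because, when \(X \cap Y = \emptyset\), the 2-monotone inequality reduces to superadditivity given \(b(\emptyset) = 0\). The code exhibits a function that is superadditive but not 2-monotone, which is enough to show that the converse fails.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset({1, 2, 3})  # hypothetical three-state space


def powerset(states):
    """All subsets of `states`, as frozensets."""
    items = sorted(states)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def b(event):
    """Illustrative set function: 0 on the empty set and singletons,
    3/5 on two-element events, 1 on the whole space."""
    values = {0: Fraction(0), 1: Fraction(0), 2: Fraction(3, 5), 3: Fraction(1)}
    return values[len(event)]


def is_superadditive(belief, states):
    """belief(X | Y) >= belief(X) + belief(Y) for all disjoint X, Y."""
    events = powerset(states)
    return all(belief(x | y) >= belief(x) + belief(y)
               for x in events for y in events if not (x & y))


def is_two_monotone(belief, states):
    """belief(X | Y) >= belief(X) + belief(Y) - belief(X & Y) for all X, Y."""
    events = powerset(states)
    return all(belief(x | y) >= belief(x) + belief(y) - belief(x & y)
               for x in events for y in events)


print(is_superadditive(b, OMEGA))  # True
print(is_two_monotone(b, OMEGA))   # False
```

The failing instance is \(X = \{1,2\}\), \(Y = \{2,3\}\): here \(b(X \cup Y) = 1\), while \(b(X) + b(Y) - b(X \cap Y) = 6/5\).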
A further strengthening of this idea is n-monotone. This is slightly more complicated than the previous ones.
\[ b\left(\bigcup_{i=1}^{n}X_i\right) \ge \sum_{i=1}^{n} \sum_{I\subseteq \{1,\ldots,n\},\, |I| = i} (-1)^{i+1}\, b\left(\bigcap_{j\in I} X_j\right) \]
Let’s break this down. On the left hand side, we are constraining belief in unions of \(n\) events. How are we constraining them? Look at the inner summation on the right hand side. This is the sum of all the \(i\)-event intersections. If \(i\) is even, we take each of these away, if \(i\) is odd, we add them. The outer sum ranges over \(i\). So we add the one member intersections, take away the two element intersections, add the three element intersections, and so on up to \(n\). If \(n=2\) then we have exactly the formula we had above for 2-monotone. If \(b\) is \(n\)-monotone, then it is \((n-1)\)-monotone. If \(b\) is \(n\)-monotone for all \(n\), then \(b\) is called infinite monotone or totally monotone.
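As a worked instance, offered only as an illustration, here is the alternating sum written out in full for \(n=3\): the single events are added, the pairwise intersections are subtracted, and the triple intersection is added back.

\[ b(X_1 \cup X_2 \cup X_3) \ge b(X_1) + b(X_2) + b(X_3) - b(X_1 \cap X_2) - b(X_1 \cap X_3) - b(X_2 \cap X_3) + b(X_1 \cap X_2 \cap X_3) \]

Setting \(X_3 = \emptyset\) and using \(b(\emptyset) = 0\) removes every term that mentions \(X_3\) and leaves \(b(X_1 \cup X_2) \ge b(X_1) + b(X_2) - b(X_1 \cap X_2)\), the 2-monotone condition. The same substitution shows in general why an \(n\)-monotone function is \((n-1)\)-monotone.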
One final property is additivity which results from replacing the inequality in the description of superadditivity with an equality: if \(X\) and \(Y\) are incompatible, then \(b(X\cup Y) = b(X) + b(Y)\). A bounded, monotonic function on an algebra of events is called a capacity. A bounded, infinite monotone function on an algebra of events is called a belief function. A bounded, additive function on an algebra of events is a probability function. Belief functions and capacities represent one way of going beyond orthodox probability theory.
One might want to move beyond orthodox probability theory by relaxing the constraint that degree of belief be represented by a real valued function. One might instead take belief to be represented by a function that maps events to intervals of the real line, or to sets of real numbers (Weichselberger 2000). These approaches can, to some extent, be subsumed under the more general way of thinking about imprecise belief using sets of probabilities that we encountered in section 1.1, where your credal state is represented by a set of probability functions \(P\). Recall that your lower envelope of \(X\) is: \(\underline{P}(X)=\inf P(X)\); and your upper envelope is \(\overline{P}(X)=\sup P(X)\). \(\underline{P}\) is a superadditive capacity, \(\overline{P}\) is a subadditive capacity.
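To make the envelope definitions concrete, here is a minimal Python sketch. The state space, the two probability mass functions in `CREDAL_SET`, and all function names are hypothetical and chosen only for illustration. Because the set is finite, `min` and `max` coincide with the infimum and supremum. The sketch computes lower and upper envelopes and checks superadditivity and subadditivity over all pairs of incompatible events.

```python
from fractions import Fraction as F
from itertools import combinations

OMEGA = ("a", "b", "c")  # hypothetical state space

# Hypothetical credal set: two probability mass functions on OMEGA.
CREDAL_SET = [
    {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)},
    {"a": F(1, 5), "b": F(3, 5), "c": F(1, 5)},
]


def prob(p, event):
    """Probability of `event` (a collection of states) under mass function p."""
    return sum((p[w] for w in event), F(0))


def lower(event):
    """Lower envelope: smallest probability any member assigns to `event`."""
    return min(prob(p, event) for p in CREDAL_SET)


def upper(event):
    """Upper envelope: largest probability any member assigns to `event`."""
    return max(prob(p, event) for p in CREDAL_SET)


def events():
    """All subsets of OMEGA, as frozensets."""
    return [frozenset(c) for r in range(len(OMEGA) + 1)
            for c in combinations(OMEGA, r)]


disjoint_pairs = [(x, y) for x in events() for y in events() if not (x & y)]

superadditive = all(lower(x | y) >= lower(x) + lower(y) for x, y in disjoint_pairs)
subadditive = all(upper(x | y) <= upper(x) + upper(y) for x, y in disjoint_pairs)

print(lower({"a"}), upper({"a"}))                        # 1/5 1/2
print(lower({"a", "b"}), lower({"a"}) + lower({"b"}))    # 3/4 9/20
print(superadditive, subadditive)                        # True True
```

The second printed line shows that the inequality can be strict: the lower envelope of \(\{a, b\}\) exceeds the sum of the lower envelopes of \(\{a\}\) and \(\{b\}\), so this lower envelope is superadditive without being additive.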
This brief survey doesn’t come close to mapping out the whole range of possible representations of uncertainty (but see Halpern 2003; Haenni et al. 2011; Augustin et al. 2014; Huber 2014). There is, however, a philosophical tradition that puts sets of probabilities and belief functions foremost, and this article is primarily about that tradition. Credal sets are also a very general theory that subsumes many of the other representations. So in what follows the discussion will be in terms of sets of probability functions.
One might wonder whether taking representors to be sets of conditional probabilities as the basic item might not be a better way to do things. Hájek (2003) argues that conditional probabilities should be taken to be the basic entity, and unconditional probabilities should be defined out of them. In what follows, this point won’t really make any difference, so I will continue to talk mostly of unconditional probabilities. Bear in mind that these are “really” conditional probabilities conditioned on the tautology.
2. The betting justification of IP
Bruno de Finetti argued that your degrees of belief should have the structure of a probability function in the following way. Belief is understood as a summary of your attitudes to gambles. That is, belief is what determines your dispositions to choose among bets, and thus determines your betting behaviour. He showed that if you satisfy some reasonable constraints on how your attitudes to gambles are structured, then your degrees of belief are, in a certain sense, probabilistic (de Finetti 1964; 1990 [1974]).
The insight behind de Finetti’s approach was that your attitudes to how likely various events are could be read off your attitudes to gambles. Recall that gambles are bounded real valued functions (Troffaes and de Cooman (2014) extend the theory to unbounded gambles). Of particular interest are the indicator functions that output 1 if the proposition in question is true and 0 otherwise.
De Finetti essentially appealed to the claim that if you are unwilling to buy a bet at a particular price, you must be happy to sell the bet for that price. This was one of his criteria of “coherence”. C.A.B. Smith questioned this assumption (Smith 1961) and Peter Walley built an extremely powerful and rich theory on the basis of these insights (Walley 1991). This provides a nice foundation for IP.
There’s quite a lot of set up required here, but the punchline is worth it. If we assume that you are neither risk seeking nor risk averse, and that the gambles are denominated in some currency such that your utility is linear in that currency, then your attitude about the contingencies can be read off your attitude about gambles in a straightforward way. Before we explain this, we add some structure to the set of gambles. This presentation follows Quaeghebeur (2014). Call \(\mathcal{L}\) the set of bounded real valued functions on \(\Omega\). If \(f,g\in \mathcal{L}\) are gambles, then so is their “pointwise sum” \((f+g)(\omega) = f(\omega)+g(\omega)\) for all \(\omega\). For a real number \(\lambda\), if \(f\) is a gamble then so is \((\lambda f)(\omega) = \lambda f(\omega)\) for all \(\omega\). Let real number \(\lambda\) stand also for the constant gamble that pays out \(\lambda\): \(\lambda(\omega) = \lambda\) for all \(\omega\).
Consider the set of desirable gambles \(D\). These are gambles that you would be willing to accept if they were offered to you. Which gambles you would willingly accept reflects your opinions in the following way: the more you would be willing to pay for a gamble \(f\), the more likely you consider the contingencies where \(f\) gets you a good prize. Consider the indicator function \(I_X\) for an event \(X\). This is the function that outputs 1 if \(\omega\) is in \(X\) and 0 otherwise. The more you would be willing to pay for the gamble \(I_X\), the more likely you take \(X\) to be. In what follows I will use \(X\) interchangeably for the event and its indicator function.
Being willing to pay \(\alpha\) for the gamble \(f\) is the same as finding the gamble \(f-\alpha\) acceptable. Consider the following function: \(\underline{E}(f) = \sup\{\alpha \in \mathbb{R}, f-\alpha \in D\}\). Call this your “lower prevision” for \(f\). That is, your lower prevision for the gamble \(f\) is the largest amount you would pay to receive the gamble \(f\). We can also define \(\overline{E}(f) = \inf\{\alpha \in\mathbb{R}, \alpha - f \in D\}\): the smallest amount you’d be willing to sell the gamble \(f\) for. These two functions are conjugate in the sense that \(\underline{E}(f) = - \overline{E}(-f)\). We will be particularly interested in lower previsions of indicator functions: these directly reflect how likely you think the event in question is. \(X-\alpha\) is effectively a unit bet on \(X\) that costs you \(\alpha\).
De Finetti focussed on the case where if \(f-\alpha \notin D\) then \(\alpha - f \in D\) for all \(f\) and all \(\alpha\). Then \(\underline{E}(f) = \overline{E}(f)\) and we can describe this number (call it \(E(f)\)) as your “fair price” for the gamble \(f\). It’s the price that makes you indifferent between buying and selling gamble \(f\). Smith (1961) allowed your fair selling price and your fair buying price to differ.
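The conjugacy claim, \(\underline{E}(f) = -\overline{E}(-f)\), can be checked directly from the two definitions. The following short derivation is a sketch; the variable \(\beta = -\alpha\) is introduced here only for the calculation.

\[ -\overline{E}(-f) = -\inf\{\alpha \in \mathbb{R}, \alpha - (-f) \in D\} = -\inf\{\alpha \in \mathbb{R}, f - (-\alpha) \in D\} = \sup\{\beta \in \mathbb{R}, f - \beta \in D\} = \underline{E}(f) \]

The third equality uses the fact that negating every member of a set of reals turns its infimum into the negative of the supremum of the negated set. In betting terms: selling the gamble \(-f\) at price \(\alpha\) is the same transaction as buying \(f\) at price \(-\alpha\).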
Here’s how we now proceed. We describe some reasonable “coherence constraints” on sets of desirable gambles, and then describe what structure such constraints put on \(\underline{E}\). \(D\) should be closed under addition. That is, if \(f,g\in D\) then \(f + g\in D\). Or, more succinctly, \(D + D \subseteq D\). \(D\) should also be closed under multiplication by a positive constant \(\lambda >0\): if \(f\in D\) then \(\lambda f \in D\), or more succinctly \(\lambda D = D\). Call \(\mathcal{L}^+\) the set of gambles that are always positive, that is \(f\in\mathcal{L}^+\) iff \(f(\omega) > 0\) for all \(\omega\). Likewise \(\mathcal{L}^-\) is the set of everywhere negative gambles. Our third condition is that always positive gambles should be in \(D\): \(\mathcal{L}^+ \subseteq D\). And fourth, no always negative gambles should be in \(D\): \(\mathcal{L}^- \cap D = \emptyset\).
Now, \(D\) is a coherent set of gambles—i.e., it satisfies the above four properties—if and only if \(\underline{E}\) has the following structure:
- \(\underline{E}(f) \ge \inf_{\Omega} \{f(\omega), \omega\in\Omega\}\)
- \(\underline{E}(f+g) \ge \underline{E}(f) + \underline{E}(g)\)
- if \(\lambda >0\) then \(\underline{E}(\lambda f) = \lambda \underline{E}(f)\)
Call a lower prevision coherent if and only if it satisfies these properties. Call a set of gambles \(D\) maximal when \(f\in D\) iff \(-f\notin D\) for all \(f\neq 0\). \(\underline{E} = \overline{E}\) iff \(D\) is coherent and maximal. The above is, strictly speaking, not quite true. There are some subtleties about topological properties of sets of desirable gambles—whether bets on the boundary of \(D\) are in \(D\)—that preclude the link being quite this straightforward, but these need not detain us (Quaeghebeur 2014). When \(\underline{E} = \overline{E}\), we call this function, \(E\), a linear prevision because such previsions satisfy the following properties:
- \(E(f) \ge \inf_{\Omega} \{f(\omega), \omega\in\Omega\}\)
- \(E(f+g) = E(f) + E(g)\)
- and therefore \(E(\lambda f) = \lambda E(f)\) for all \(\lambda \in\mathbb{R}\)
Now, here is the punchline: if we restrict attention to linear previsions of indicator functions, we get exactly the probability functions! (De Finetti didn’t use the “sets of desirable gambles” approach to coherence, but rather something almost equivalent to it, see Quaeghebeur (2014) and Miranda and de Cooman (2014) for details). The same move in the case of lower previsions—restricting attention to indicator functions—yields lower probabilities. As well as Smith, P.M. Williams was another author to consider weakening de Finetti’s requirement in 1975, although the paper was published only in 2007 (Williams 2007; Vicig, Zaffalon, and Cozman 2007). Building on Smith’s and Williams’ insight, Walley (1991) builds a rich and sophisticated theory of statistical inference on lower previsions and lower probabilities. See Augustin et al. (2014) for a modern introduction to IP, and Vicig and Seidenfeld (2012) for a more careful discussion of the history of IP. Because of the conjugacy of the upper and lower probabilities—because \(\underline{P}(X) = 1-\overline{P}(\neg X)\)—the lower probability tells us all we need to know about your belief state. In fact, coherent \(\underline{P}\) and \(\overline{P}\) are related as described in Figure F1 (Walley 1991: 84 ff).
An arrow between two expressions means that the higher one is greater than or equal to the lower one. The solid lines highlight the nontrivial inequalities. If \(A\) and \(B\) are incompatible, it is clear that this tangle of inequalities simplifies to: \[ \overline{P}(A) + \overline{P}(B) \ge \overline{P}(A\cup B) \ge \overline{P}(A) + \underline{P}(B) \ge \underline{P}(A\cup B) \ge \underline{P}(A) + \underline{P}(B) \]
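One way to see where the two middle inequalities come from is the following sketch. It assumes, as in section 1, that \(\underline{P}\) and \(\overline{P}\) are the lower and upper envelopes of a set of probability functions; the symbol \(p\) for a generic member of that set is introduced here for the calculation. For incompatible \(A\) and \(B\), every \(p\) in the set satisfies \(p(A\cup B) = p(A) + p(B)\), so

\[ \overline{P}(A\cup B) \ge p(A) + p(B) \ge p(A) + \underline{P}(B) \quad\text{and}\quad \underline{P}(A\cup B) \le p(A) + p(B) \le \overline{P}(A) + p(B). \]

Taking the supremum over \(p\) on the right of the first chain, and the infimum over \(p\) on the right of the second, gives \(\overline{P}(A\cup B) \ge \overline{P}(A) + \underline{P}(B) \ge \underline{P}(A\cup B)\).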
3. From lower probabilities to credal sets
To see how the above foundation relates to the credal sets approach, we need a final couple of formal tools. Say that a lower prevision \(\underline{E}\) dominates \(\underline{E'}\) if, for all \(f\in \mathcal{L}\), \(\underline{E}(f) \ge \underline{E'}(f)\). For a lower prevision \(\underline{E}\), define the envelope of \(\underline{E}\) as the set of linear previsions that dominate \(\underline{E}\). Call this \(M(\underline{E})\). For a set of linear previsions \(M\), define their lower envelope as the function that assigns \(\inf\{E(f), E\in M\}\) to each gamble \(f\). The envelope theorem states that \(\underline{E}\) is coherent iff \(\underline{E}\) is the lower envelope of \(M(\underline{E})\). Restricted to the case of lower previsions of indicator functions (that is, lower probabilities) this states that your betting behaviour is coherent if and only if your fair buying prices are the lower envelope of a set of probabilities.
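One direction of the envelope theorem can be illustrated numerically: the lower envelope of a set of linear previsions satisfies the three coherence properties listed in section 2. The Python sketch below is a spot check, not a proof. The state space, the two mass functions in `M`, the gambles `f` and `g`, and the helper names are all hypothetical. Each linear prevision is taken to be an expectation under a probability mass function, and `min` stands in for the infimum because `M` is finite.

```python
from fractions import Fraction as F

OMEGA = ("w1", "w2", "w3")  # hypothetical state space

# Hypothetical set M of linear previsions, each given by a mass function.
M = [
    {"w1": F(1, 2), "w2": F(1, 4), "w3": F(1, 4)},
    {"w1": F(1, 5), "w2": F(3, 5), "w3": F(1, 5)},
]


def expectation(p, f):
    """Linear prevision of gamble f (a dict from states to payoffs) under p."""
    return sum((p[w] * f[w] for w in OMEGA), F(0))


def lower_prevision(f):
    """Lower envelope of M evaluated at the gamble f."""
    return min(expectation(p, f) for p in M)


def add(f, g):
    """Pointwise sum of two gambles."""
    return {w: f[w] + g[w] for w in OMEGA}


def scale(lam, f):
    """Gamble f multiplied pointwise by the constant lam."""
    return {w: lam * f[w] for w in OMEGA}


f = {"w1": F(4), "w2": F(-2), "w3": F(1)}
g = {"w1": F(-3), "w2": F(5), "w3": F(0)}

# The three coherence properties, checked on these two gambles only.
bounded_below = lower_prevision(f) >= min(f.values())
superadditive = lower_prevision(add(f, g)) >= lower_prevision(f) + lower_prevision(g)
homogeneous = lower_prevision(scale(F(3), f)) == 3 * lower_prevision(f)

print(lower_prevision(f), lower_prevision(g), lower_prevision(add(f, g)))  # -1/5 -1/4 3/2
print(bounded_below, superadditive, homogeneous)  # True True True
```

Here the superadditivity inequality is strict: the lower prevision of \(f+g\) is \(3/2\), while the two lower previsions sum to \(-9/20\), because different members of `M` attain the minimum for `f` and for `g`.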
This puts the imprecise probabilities approach on almost the same footing as the precise probability approach in that we have an exact analogue of de Finetti’s version of the so-called Dutch book theorem: the theorem has less demanding (more reasonable) premises, and the conclusion is that rational agents’ beliefs should have the structure of a lower probability and thus, through the envelope theorem, the structure of a credal set. One disanalogy between de Finetti’s result and the IP one-sided result is that your precise fair prices determine your probability function whereas in the one-sided betting case, the coherence requirements only constrain your representor. That is, a complete set of coherent fair prices picks out a unique probability, while several distinct sets of probabilities can give rise to the same set of one-sided previsions.
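A simple way to see the last point is to compare a set of probabilities with the same set enlarged by a mixture of its members. The Python sketch below uses hypothetical mass functions `p` and `q` on a hypothetical two-state space and checks only a handful of gambles, so it is an illustration rather than a proof. The general reason is that an expectation under a mixture lies between the expectations under the mixed functions, so adding the mixture never lowers the minimum.

```python
from fractions import Fraction as F

OMEGA = ("w1", "w2")  # hypothetical two-state space

p = {"w1": F(1, 4), "w2": F(3, 4)}
q = {"w1": F(3, 4), "w2": F(1, 4)}
mid = {w: (p[w] + q[w]) / 2 for w in OMEGA}  # even mixture of p and q

SET_A = [p, q]
SET_B = [p, q, mid]


def lower_prevision(credal_set, f):
    """Smallest expectation of gamble f over the members of credal_set."""
    return min(sum((r[w] * f[w] for w in OMEGA), F(0)) for r in credal_set)


gambles = [
    {"w1": F(1), "w2": F(0)},    # indicator of {w1}
    {"w1": F(0), "w2": F(1)},    # indicator of {w2}
    {"w1": F(2), "w2": F(-1)},
    {"w1": F(-5), "w2": F(3)},
]

same = all(lower_prevision(SET_A, f) == lower_prevision(SET_B, f) for f in gambles)
print(SET_A != SET_B, same)  # True True
```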