Notes to Bayes’ Theorem

1. Though one can view conditional probabilities as basic, and even make sense of them when the conditioning event has probability zero, we stick to the standard definition here. Whenever PE appears it is assumed that E's probability is positive. For discussion of generalized conditional probabilities and useful references see (Renyi, 1955), (Harper, 1976), (Spohn, 1986), (Hammond, 1994), (McGee, 1994) and (Joyce 1999, 201-213).

2. PE must be a real-valued function, bounded between zero and one, that satisfies:

  • Normalization. PE(T) = 1 where T is any logical truth.
  • Countable Additivity. PE(X) = Σi PE(Xi) when {X1, X2, X3,…} is any set of pairwise incompatible propositions whose disjunction is X.

3. More generally, if {E1, E2, E3,…} is a countable partition of evidence propositions, mixing entails that P(H) = Σi P(Ei)PEi(H). One can think of the Ei as a set of mutually exclusive and collectively exhaustive results for some experiment. Mixing then says that H's unconditional probability is the expectation of its probability conditional on the results of the experiment.
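
As a minimal sketch of the mixing identity, the following computes P(H) as a weighted average for a three-outcome experiment. The numbers and the names p_e and p_h_given_e are illustrative assumptions, not values from the text.

```python
from fractions import Fraction as F

# Hypothetical experiment with three mutually exclusive, exhaustive results.
p_e = [F(1, 2), F(3, 10), F(1, 5)]            # P(Ei), assumed values
p_h_given_e = [F(9, 10), F(1, 2), F(1, 10)]   # PEi(H), assumed values

# The Ei form a partition, so their probabilities must sum to one.
assert sum(p_e) == 1

# Mixing: P(H) is the expectation of PEi(H) with respect to the P(Ei).
p_h = sum(pe * ph for pe, ph in zip(p_e, p_h_given_e))
print(p_h)
```

Exact fractions are used so that the identity can be checked without rounding error.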

4. If H1, H2, H3,…, Hn is a partition for which each of the inverse probabilities PHi(E) is known, then one can express the direct probability as PE(Hi) = P(Hi)PHi(E) / [Σj P(Hj)PHj(E)].
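
The partition form of Bayes' Theorem can be sketched as follows. The function name posterior and the sample priors and likelihoods are assumptions made for illustration only.

```python
from fractions import Fraction as F

def posterior(priors, likelihoods):
    """Return the list of PE(Hi) given priors P(Hi) and inverse probabilities PHi(E)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Hi)PHi(E)
    p_e = sum(joint)                                       # Σj P(Hj)PHj(E)
    if p_e == 0:
        # The standard definition requires P(E) > 0.
        raise ValueError("E has probability zero, so PE is undefined")
    return [j / p_e for j in joint]

# Hypothetical three-hypothesis partition.
priors = [F(1, 2), F(3, 10), F(1, 5)]
likelihoods = [F(1, 10), F(1, 2), F(9, 10)]
post = posterior(priors, likelihoods)
print(post)
```

The denominator is just the mixing sum applied to E, which is why the direct probabilities always sum to one.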

5. For an excellent general discussion of subjectivism see the entry interpretations of probability in this Encyclopedia.

6. When the person's opinions about H are not sufficiently definite to be measured by a single number, her belief state is represented by a family of probability functions. For useful discussions of indeterminate belief states see (Levi, 1985), (Jeffrey, 1987) and (Kaplan 1996, pp. 23-31).

7. One can have a determinate subjective probability for H conditional on E even when one lacks determinate probabilities for H & E and E. Statistical evidence often justifies assignments of conditional probability without providing any information about underlying unconditional probabilities. For example, careful study of actuarial tables might convince me that my chances of living past eighty given that I have a serious heart attack in my fifties are between 0.02 and 0.04, but the same tables might give me no information whatever about the chances that I will suffer a serious heart attack in my fifties, and so no information about my unconditional chances of living to eighty.

8. I take this term from Alvin Goldman, who argues against the view (Goldman 1986, 89-93). While not all Bayesians accept evidence proportionism, the account of incremental evidence as change in subjective probability really only makes sense if one supposes that a subject's level of confidence in a proposition varies directly with the strength of her evidence for its truth.

9. The distinction between the total and incremental evidence is essentially the same as Carnap's distinction between "confirmation as firmness" and "confirmation as increase in firmness" (Carnap 1962, new preface). Compare (Maher 1996, 162).

10. On a pure subjectivist conception, a person's total evidence for a hypothesis derives from her own "subjective" views about the intrinsic plausibilities of propositions and from information she acquires via learning. The person begins with a subjective probability P0 that encapsulates her "prior" judgments about the plausibilities of propositions (or, more accurately, her initial epistemic prejudices). She subsequently revises these probabilities in light of experience, thereby incorporating new information into her doxastic system. Her subjective probabilities at any time are thus the result of augmenting her "prior" opinions about the intuitive plausibilities of propositions with information acquired via learning. While some subjectivists speak as if each person starts her epistemic life with a kind of "ur-prior" that captures the state of her opinions before any empirical information comes in, such talk is misleading. Talk of "prior" and "posterior" probabilities only makes sense relative to a specific prospective sequence of learning experiences. The "prior" is nothing more than the probability function that reflects the person's beliefs, however arrived at, before learning commences. As Elliott Sober puts it, "the prior probability is properly so called, not because it is a priori (it is not), but because it is in place prior to taking the new evidence into account" (Sober 2002, 24). Some probabilists, less inclined to subjective approaches, adopt less permissive conceptions of evidence. See, for example, (Williamson 2000, 184-208) and (Maher 1996).

11. For a thorough survey of the literature in this area, with penetrating commentary, see Fitelson 2001, chapters 1 and 2, a link to which can be found in the Other Internet Resources section of this entry. Other functions have been proposed as measures of evidential support as well, among them P(H & E) − P(H)P(E), see (Carnap 1962, 360); PH(E) − P~H(E), see (Nozick 1981, 252); PH(E) − P(E), see (Halina 1988); and PE(H) − P~E(H), see (Joyce 1999) and (Christensen 1999). As Fitelson shows, none of these measures satisfies the second clause of (2.1), which means that none captures the notion of incremental evidence we are after.
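
For concreteness, the four alternative measures listed in this note can be computed on a small example. This is only a sketch: the joint probabilities and all function names are assumptions, and the code does not test the clause of (2.1) that Fitelson discusses.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over H and E (assumed values).
p_he, p_h_note, p_noth_e, p_noth_note = F(3, 10), F(1, 10), F(1, 5), F(2, 5)

p_h = p_he + p_h_note          # P(H)
p_e = p_he + p_noth_e          # P(E)

measures = {
    "P(H & E) - P(H)P(E)": p_he - p_h * p_e,
    "PH(E) - P~H(E)": p_he / p_h - p_noth_e / (1 - p_h),
    "PH(E) - P(E)": p_he / p_h - p_e,
    "PE(H) - P~E(H)": p_he / p_e - p_h_note / (1 - p_e),
}

for name, value in measures.items():
    print(name, "=", value)
```

Each of the last three measures equals the first divided by a positive quantity, so all four agree in sign while differing in magnitude.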

12. While the term "effective evidence" is not standard, the idea that the disparity between PE(H) and P~E(H) captures an important evidential relationship is defended in (Joyce 1999, 203-213) and (Christensen 1999). Both authors argue that this measure helps Bayesians circumvent the so-called "old evidence" problem described in (Glymour 1980).

13. An alternative would be to take the logarithm of each measure. Logarithmic scales are quite different from additive scales, however, since they express quantities in terms of multiples of a common base rather than distances from a common zero. Thus, on a logarithmic scale equal distances represent equal ratios of increases in evidence, not equal increments of evidence. To compare apples and apples, we must express the multiplicative measures on an additive scale (or the additive ones on a multiplicative scale).

14. Consider, for example, two probabilities, P and Q, related by the following transformation:

Q(X) = hPH(X) + [P(E) − hPH(E)]P~H&E(X) + [P(~E) − hPH(~E)]P~H&~E(X)

where 1/PR(H, E) > Q(H) = h > 0. The reader may verify that the Q and P probability ratios are the same, but that the Q-probability difference is Q(H)/P(H) times the P-probability difference. Similarly, if P and Q are related by

Q(X) = hPH(X) + (1 − h)P~H(X)

where 1 > h > 0, then the P and Q likelihood ratios are the same even though their odds differences are a factor of [Q(H)/Q(~H)]/[P(H)/P(~H)] apart.
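
The verification left to the reader in this note can be sketched numerically on a four-atom space. The starting distribution P, the value h = 1/5, and every helper name below are illustrative assumptions; the "odds difference" is taken here to be the odds of H conditional on E minus the unconditional odds of H.

```python
from fractions import Fraction as F

# Atoms are (H true?, E true?) pairs.
ATOMS = [(h, e) for h in (True, False) for e in (True, False)]
H = lambda a: a[0]
E = lambda a: a[1]
NOT_H = lambda a: not a[0]
NOT_E = lambda a: not a[1]

def both(x, y):
    return lambda a: x(a) and y(a)

def prob(p, x):
    return sum(p[a] for a in ATOMS if x(a))

def given(p, c):
    """Conditionalize p on c (c must have positive probability)."""
    z = prob(p, c)
    return {a: (p[a] / z if c(a) else F(0)) for a in ATOMS}

def mix(parts):
    """Pointwise weighted sum of distributions."""
    return {a: sum(w * d[a] for w, d in parts) for a in ATOMS}

def prob_ratio(p): return prob(given(p, E), H) / prob(p, H)
def prob_diff(p): return prob(given(p, E), H) - prob(p, H)
def odds(p): return prob(p, H) / prob(p, NOT_H)
def odds_diff(p): return odds(given(p, E)) - odds(p)
def likelihood_ratio(p): return prob(given(p, H), E) / prob(given(p, NOT_H), E)

# Hypothetical P and h, chosen so that 1/PR(H, E) > h > 0.
P = {(True, True): F(3, 10), (True, False): F(1, 10),
     (False, True): F(1, 5), (False, False): F(2, 5)}
h = F(1, 5)
assert 0 < h < 1 / prob_ratio(P)

P_H = given(P, H)
# First transformation: same probability ratio, rescaled probability difference.
Q1 = mix([(h, P_H),
          (prob(P, E) - h * prob(P_H, E), given(P, both(NOT_H, E))),
          (prob(P, NOT_E) - h * prob(P_H, NOT_E), given(P, both(NOT_H, NOT_E)))])
assert prob(Q1, H) == h
assert prob_ratio(Q1) == prob_ratio(P)
assert prob_diff(Q1) == (prob(Q1, H) / prob(P, H)) * prob_diff(P)

# Second transformation: same likelihood ratio, rescaled odds difference.
Q2 = mix([(h, P_H), (1 - h, given(P, NOT_H))])
assert likelihood_ratio(Q2) == likelihood_ratio(P)
assert odds_diff(Q2) == (odds(Q2) / odds(P)) * odds_diff(P)
```

A single example does not prove the general claims, but it shows how the ratio measures stay fixed while the difference measures are rescaled by the stated factors.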

15. For a clear statement of this position see (Royall 1997, 8-11).

16. See, for example, (Teller 1976), (Armendt 1980), (Skyrms 1987) and (van Fraassen 1999).

Copyright © 2003 by
James Joyce
