## Notes to Bayes’ Theorem

1.
Though one can
view conditional probabilities as basic, and even make sense of them
when the conditioning event has probability zero, we stick to the
standard definition here. Whenever **P**_{E} appears it is assumed that *E*'s probability is positive. For discussion of generalized conditional probabilities and useful references see (Renyi, 1955), (Harper, 1976), (Spohn, 1986), (Hammond, 1994), (McGee, 1994) and (Joyce 1999, 201-213).

2.
**P**_{E} must be a real-valued function,
bounded between zero and one, that satisfies:

*Normalization*. **P**_{E}(*T*) = 1 where *T* is any logical truth.

*Countable Additivity*. **P**_{E}(*X*) = Σ_{i} **P**_{E}(*X*_{i}) when {*X*_{1}, *X*_{2}, *X*_{3},…} is any set of pairwise incompatible propositions whose disjunction is *X*.

3.
More generally,
if {*E*_{1}, *E*_{2},
*E*_{3},…} is a countable *partition* of
evidence propositions, mixing entails that
**P**(*H*) = Σ_{i} **P**(*E*_{i})**P**_{E_{i}}(*H*). One can think of
the *E*_{i} as a set of mutually exclusive and
collectively exhaustive results for some experiment. Mixing then says
that *H*'s unconditional probability is the *expectation*
of its probability conditional on the results of the experiment.
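
As a minimal sketch of the mixing identity in this note, the following computes *H*'s unconditional probability as a weighted average of its conditional probabilities. The three experimental results and all of the numbers are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical numbers: three mutually exclusive, jointly exhaustive results.
p_E = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]           # P(E_i)
p_H_given_E = [Fraction(9, 10), Fraction(1, 2), Fraction(1, 10)]  # P_{E_i}(H)


def mix(weights, conditionals):
    """Return sum_i P(E_i) * P_{E_i}(H), the expectation of H's conditional probability."""
    if sum(weights) != 1:
        raise ValueError("the E_i must form a partition: their probabilities sum to 1")
    return sum(w * c for w, c in zip(weights, conditionals))


p_H = mix(p_E, p_H_given_E)
print(p_H)  # 31/50
```

Exact fractions are used here so that the sum can be checked by hand: 9/20 + 3/20 + 1/50 = 31/50.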

4.
If *H*_{1}, *H*_{2}, *H*_{3},…, *H*_{n} is a partition for which each of the inverse probabilities **P**_{H_{i}}(*E*) is known, then one can express the direct probability as

**P**_{E}(*H*_{i}) = **P**(*H*_{i})**P**_{H_{i}}(*E*) / [Σ_{j} **P**(*H*_{j})**P**_{H_{j}}(*E*)].
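
The formula in this note can be sketched in a few lines of code. The function name `posterior` and the prior and inverse probabilities below are hypothetical, introduced only to illustrate the calculation over a three-member partition.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P_E(H_i) for each i, given P(H_i) and the inverse probabilities P_{H_i}(E)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(H_i) * P_{H_i}(E)
    total = sum(joint)                                    # equals P(E), by mixing
    if total == 0:
        raise ValueError("E has probability zero, so P_E is undefined here")
    return [j / total for j in joint]


# Hypothetical partition H_1, H_2, H_3.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]        # P(H_i)
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]  # P_{H_i}(E)

post = posterior(priors, likelihoods)
print([str(p) for p in post])  # ['5/38', '15/38', '9/19']
```

Note that the denominator is simply the sum of the numerators over the whole partition, which is why the resulting direct probabilities always sum to one.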

5.
For an excellent general discussion of subjectivism see the entry on
interpretations of probability in this *Encyclopedia*.

6.
When the person's
opinions about *H* are not sufficiently definite to be measured
by a single number, her belief state is represented by a family of
probability functions. For useful discussions of indeterminate belief
states see (Levi, 1985), (Jeffrey, 1987) and (Kaplan 1996, pp.
23-31).

7.
One can have a
determinate subjective probability for *H* conditional on
*E* even when one lacks determinate probabilities for *H*
& *E* and *E*. Statistical evidence often justifies
assignments of conditional probability without providing any
information about underlying unconditional probabilities. For example,
careful study of actuarial tables might convince me that my chances of
living past eighty given that I have a serious heart attack in my
fifties are between 0.02 and 0.04, but the same tables might give me no
information whatever about the chances that I will suffer a serious
heart attack in my fifties, and so no information about my
unconditional chances of living to eighty.

8.
I take this term
from Alvin Goldman, who argues *against* the view (Goldman 1986,
89-93). While not all Bayesians accept evidence proportionism, the
account of incremental evidence as change in subjective probability
really only makes sense if one supposes that a subject's level of
confidence in a proposition varies directly with the strength of her
evidence for its truth.

9. The distinction between the total and incremental evidence is essentially the same as Carnap's distinction between "confirmation as firmness" and "confirmation as increase in firmness" (Carnap 1962, new preface). Compare (Maher 1996, 162).

10.
On a pure
subjectivist conception, a person's total evidence for a hypothesis
derives from her own "subjective" views about the intrinsic
plausibilities of propositions and from information she acquires via
learning. The person begins with a subjective probability
**P**_{0} that encapsulates her "prior" judgments
about the plausibilities of propositions (or, more accurately, her
initial epistemic prejudices). She subsequently revises these
probabilities in light of experience, thereby incorporating new
information into her doxastic system. Her subjective probabilities at
any time are thus the result of augmenting her "prior" opinions about
the intuitive plausibilities of propositions with information acquired
via learning. While some subjectivists speak as if each person starts
her epistemic life with a kind of "ur-prior" that captures the state of
her opinions before any empirical information comes in, such talk is
misleading. Talk of "prior" and "posterior" probabilities only makes
sense relative to a specific prospective sequence of learning
experiences. The "prior" is nothing more than the probability function
that reflects the person's beliefs, however arrived at, before learning
commences. As Elliott Sober puts it, "the prior probability is properly
so called, not because it is *a priori* (it is not), but because
it is in place prior to taking the new evidence into account." (Sober
2002, 24). Some probabilists, less inclined to subjective approaches,
adopt less permissive conceptions of evidence. See, for example,
(Williamson 2000, 184-208) and (Maher 1996).

11.
For a
thorough survey of the literature in this area, with penetrating
commentary, see Fitelson 2001, chapters 1 and 2, a link to which can be
found in the Other Internet Resources section of this entry. Other
functions have been proposed as measures of evidential support as well,
among them **P**(*H* & *E*) − **P**(*H*)**P**(*E*), see
(Carnap 1962, 360); **P**_{H}(*E*) − **P**_{~H}(*E*), see
(Nozick 1981, 252); **P**_{H}(*E*) − **P**(*E*), see (Halina 1988); and
**P**_{E}(*H*) − **P**_{~E}(*H*), see (Joyce 1999)
and (Christensen 1999). As Fitelson shows, none of these measures
satisfies the second clause of (2.1), which means that none captures
the notion of *incremental* evidence we are after.
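
To see how the four alternative measures listed in this note can come apart numerically, here is a small sketch that evaluates each of them on a single joint distribution over *H* and *E*. The distribution is hypothetical, and the variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution over H and E (an assumption for illustration).
p_H_and_E = Fraction(3, 10)
p_H_and_notE = Fraction(1, 10)
p_notH_and_E = Fraction(2, 10)
p_notH_and_notE = Fraction(4, 10)

p_H = p_H_and_E + p_H_and_notE               # P(H)
p_E = p_H_and_E + p_notH_and_E               # P(E)
p_E_given_H = p_H_and_E / p_H                # P_H(E)
p_E_given_notH = p_notH_and_E / (1 - p_H)    # P_~H(E)
p_H_given_E = p_H_and_E / p_E                # P_E(H)
p_H_given_notE = p_H_and_notE / (1 - p_E)    # P_~E(H)

# The four difference measures, keyed by the formula each one computes.
measures = {
    "P(H & E) - P(H)P(E)": p_H_and_E - p_H * p_E,
    "P_H(E) - P_~H(E)": p_E_given_H - p_E_given_notH,
    "P_H(E) - P(E)": p_E_given_H - p_E,
    "P_E(H) - P_~E(H)": p_H_given_E - p_H_given_notE,
}

for name, value in measures.items():
    print(f"{name} = {value}")
```

On these numbers the four measures take the values 1/10, 5/12, 1/4 and 2/5 respectively, so they agree that *E* supports *H* while disagreeing about the degree of support.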

12.
While the
term "effective evidence" is not standard, the idea that the disparity
between **P**_{E}(*H*) and
**P**_{~E}(*H*) captures an
important evidential relationship is defended in (Joyce 1999, 203-213)
and (Christensen 1999). Both authors argue that this measure helps
Bayesians circumvent the so-called "old evidence" problem described in
(Glymour 1980).

13.
An
alternative would be to take the *logarithm* of each measure.
Logarithmic scales are quite different from additive scales, however,
since they express quantities in terms of multiples of a common base
rather than distances from a common zero. Thus, on a logarithmic scale
equal distances represent equal *ratios* of increases in
evidence, not equal *increments* of evidence. To compare apples
and apples, we must express the multiplicative measures on an additive
scale (or the additive ones on a multiplicative scale).

14.
Consider, for
example, two probabilities, **P** and **Q**,
related by the following transformation:

**Q**(*X*) = *h* **P**_{H}(*X*) + [**P**(*E*) − *h* **P**_{H}(*E*)]**P**_{~H & E}(*X*) + [**P**(~*E*) − *h* **P**_{H}(~*E*)]**P**_{~H & ~E}(*X*)

where 1/**PR**(*H*, *E*) > **Q**(*H*) = *h* ≥ 0. The reader may verify that the **Q** and **P** probability ratios are the same, but that the **Q**-probability difference is **Q**(*H*)/**P**(*H*) times the **P**-probability difference. Similarly, if **P** and **Q** are related by

**Q**(*X*) = *h* **P**_{H}(*X*) + (1 − *h*)**P**_{~H}(*X*)

where 1 > *h* > 0, then the **P** and
**Q** likelihood ratios are the same even though their
odds differences are a factor of
[**Q**(*H*)/**Q**(~*H*)]/[**P**(*H*)/**P**(~*H*)] apart.
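
The verification left to the reader in this note can be sketched numerically. The code below builds both transformations from one hypothetical **P** with *h* = 1/5 and checks the four claims with exact arithmetic. It assumes the usual definitions: the probability ratio is **P**_{E}(*H*)/**P**(*H*), the probability difference is **P**_{E}(*H*) − **P**(*H*), the likelihood ratio is **P**_{H}(*E*)/**P**_{~H}(*E*), and the odds difference is posterior odds minus prior odds. All function names are illustrative.

```python
from fractions import Fraction as F

# Atoms are (H true?, E true?) pairs; a probability maps each atom to a weight.
ATOMS = [(True, True), (True, False), (False, True), (False, False)]


def H(a): return a[0]
def E(a): return a[1]
def not_H(a): return not a[0]
def not_E(a): return not a[1]
def both(f, g): return lambda a: f(a) and g(a)


def prob(p, event):
    return sum(p[a] for a in ATOMS if event(a))


def cond(p, given):
    """Condition p on `given`, which must have positive probability."""
    z = prob(p, given)
    return {a: (p[a] / z if given(a) else F(0)) for a in ATOMS}


def mixture(parts):
    """Weighted sum of distributions; `parts` is a list of (weight, distribution)."""
    return {a: sum(w * d[a] for w, d in parts) for a in ATOMS}


def odds(p): return prob(p, H) / prob(p, not_H)
def prob_ratio(p): return prob(cond(p, E), H) / prob(p, H)
def prob_diff(p): return prob(cond(p, E), H) - prob(p, H)
def likelihood_ratio(p): return prob(cond(p, H), E) / prob(cond(p, not_H), E)
def odds_diff(p): return odds(cond(p, E)) - odds(p)


# Hypothetical P, with PR(H, E) = 3/2, so h = 1/5 satisfies 1/PR > h > 0.
P = {(True, True): F(3, 10), (True, False): F(1, 10),
     (False, True): F(2, 10), (False, False): F(4, 10)}
h = F(1, 5)
P_H = cond(P, H)

# First transformation: same probability ratio, rescaled probability difference.
Q1 = mixture([
    (h, P_H),
    (prob(P, E) - h * prob(P_H, E), cond(P, both(not_H, E))),
    (prob(P, not_E) - h * prob(P_H, not_E), cond(P, both(not_H, not_E))),
])
assert prob_ratio(Q1) == prob_ratio(P)
assert prob_diff(Q1) == prob(Q1, H) / prob(P, H) * prob_diff(P)

# Second transformation: same likelihood ratio, rescaled odds difference.
Q2 = mixture([(h, P_H), (1 - h, cond(P, not_H))])
assert likelihood_ratio(Q2) == likelihood_ratio(P)
assert odds_diff(Q2) == odds(Q2) / odds(P) * odds_diff(P)
print("all four checks pass")
```

A single numerical instance is of course not a proof, but it shows why the identities hold: the first transformation fixes **Q**(*E*) = **P**(*E*) and **Q**_{H} = **P**_{H} while changing **Q**(*H*), and the second leaves both **P**_{H} and **P**_{~H} untouched.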

15. For a clear statement of this position see (Royall 1997, 8-11).

16. See, for example, (Teller 1976), (Armendt 1980), (Skyrms 1987) and (van Fraassen 1999).