Supplement to Inductive Logic
The Effect on EQI of Partitioning the Outcome Space More Finely—Including Proof of the Nonnegativity of EQI
Given some experiment or observation (or series of them) c, is there any special advantage to parsing the space of possible outcomes O into more alternatives rather than fewer alternatives? Couldn't we do as well at evidentially evaluating hypotheses by parsing the space of outcomes into just a few alternatives, e.g., one possible outcome that h_{i} says is very likely and h_{j} says is rather unlikely, one that h_{i} says is rather unlikely and h_{j} says is very likely, and perhaps a third outcome on which h_{i} and h_{j} pretty much agree? The answer is “No!”. Parsing the space of outcomes into a larger number of empirically distinct possible outcomes never provides a worse measure of evidential support, and it provides a strictly better one whenever the finer outcomes differ in how strongly they discriminate between the hypotheses.
To see this intuitively, suppose some outcome description o can be parsed into two distinct outcome descriptions, o_{1} and o_{2}, where o is equivalent to (o_{1}∨o_{2}), and suppose that h_{i} differs from h_{j} much more on the likelihood of o_{1} than on the likelihood of o_{2}. Then, intuitively, when o is found to be true, whichever of the more precise descriptions, o_{1} or o_{2}, is true should make a difference as to how strong the comparative support for the two hypotheses turns out to be. Reporting whichever of o_{1} or o_{2} occurs will be more informative than simply reporting o. That is, if the outcome of the experiment is only described as o, relevant information is lost.
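As a rough numerical illustration of this intuition, the sketch below compares the likelihood ratio obtained from reporting only o with the ratios obtained from reporting o_{1} or o_{2}. The likelihood values are assumptions chosen purely for illustration (they do not come from the text), and o_{1} and o_{2} are assumed to be mutually exclusive.

```python
# Hypothetical likelihoods, chosen only for illustration.
# p_i holds likelihoods relative to h_i, p_j those relative to h_j.
p_i = {"o1": 0.30, "o2": 0.20}
p_j = {"o1": 0.05, "o2": 0.25}

# Assumption: o is equivalent to (o1 or o2) and the two are mutually
# exclusive, so the likelihood of o is the sum of the finer likelihoods.
ratio_o = (p_i["o1"] + p_i["o2"]) / (p_j["o1"] + p_j["o2"])
ratio_o1 = p_i["o1"] / p_j["o1"]
ratio_o2 = p_i["o2"] / p_j["o2"]

print(f"likelihood ratio if only o is reported: {ratio_o:.3f}")
print(f"likelihood ratio if o1 is reported:     {ratio_o1:.3f}")
print(f"likelihood ratio if o2 is reported:     {ratio_o2:.3f}")
```

With these made-up numbers, the coarse report o favors h_{i} only moderately, whereas o_{1} favors h_{i} much more strongly and o_{2} mildly favors h_{j}; that difference is exactly what the coarse description hides.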
It turns out that EQI measures how well possible outcomes can distinguish between hypotheses in a way that reflects the intuition that a finer partition of the possible outcomes is more informative. The numerical value of EQI is always made larger by parsing the outcome space more finely, provided that the likelihoods for outcomes in the finer parsing differ at least a bit from some of the likelihoods for outcomes of the less refined parsing. This is important for our main convergence result because in that theorem we want the average value of EQI for the whole sequence of experiments and observations to be positive, and the larger the better.
The following Partition Theorem implies the Nonnegativity of EQI result as well. It shows that each EQI[c_{k} | h_{i}/h_{j} | b] must be nonnegative; and it will be positive just in case for at least one possible outcome o_{ku}, P[o_{ku} | h_{j}·b·c_{k}] ≠ P[o_{ku} | h_{i}·b·c_{k}]. This theorem will also show that EQI[c_{k} | h_{i}/h_{j} | b] generally becomes larger whenever the outcome space is partitioned more finely. It follows immediately that the average value of EQI for a sequence of experiments or observations, EQI[c^{n} | h_{i}/h_{j} | b], averaged over the sequence of observations c^{n}, is nonnegative, and must be positive if for even one of the c_{k} that contribute to it, at least one possible outcome o_{ku} distinguishes between the two hypotheses by making P[o_{ku} | h_{j}·b·c_{k}] ≠ P[o_{ku} | h_{i}·b·c_{k}].
Partition Theorem:
For any positive real numbers r_{1}, r_{2}, s_{1}, s_{2}:
(1) if r_{1}/s_{1} > (r_{1}+r_{2})/(s_{1}+s_{2}), then
(r_{1}+r_{2})×log[(r_{1}+r_{2})/(s_{1}+s_{2})] < r_{1}×log[r_{1}/s_{1}] + r_{2}×log[r_{2}/s_{2}];
and
(2) if r_{1}/s_{1} = (r_{1}+r_{2})/(s_{1}+s_{2}), then
r_{1}×log[r_{1}/s_{1}] + r_{2}×log[r_{2}/s_{2}] = (r_{1}+r_{2})×log[(r_{1}+r_{2})/(s_{1}+s_{2})].
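Before turning to the proof, the two cases can be spot-checked numerically. The following is only a sanity check on hand-picked values, not a substitute for the proof; the function names and the sample numbers are assumptions introduced for illustration, and natural logarithms are used (the choice of base does not affect which side is larger).

```python
import math

def coarse(r1, r2, s1, s2):
    """Quantity for the two cells lumped together."""
    return (r1 + r2) * math.log((r1 + r2) / (s1 + s2))

def fine(r1, r2, s1, s2):
    """Quantity for the two cells kept apart."""
    return r1 * math.log(r1 / s1) + r2 * math.log(r2 / s2)

# Case (1): r1/s1 = 3 exceeds (r1+r2)/(s1+s2) = 1, so coarse < fine is expected.
case1 = (0.3, 0.2, 0.1, 0.4)
# Case (2): r1/s1 = 2 equals (r1+r2)/(s1+s2) = 2, so the two sides should agree.
case2 = (0.2, 0.4, 0.1, 0.2)

for name, args in [("case (1)", case1), ("case (2)", case2)]:
    print(name, "coarse =", round(coarse(*args), 6), "fine =", round(fine(*args), 6))
```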
To prove this theorem first notice that
r_{1}/s_{1} = (r_{1}+r_{2})/(s_{1}+s_{2}) iff r_{1}s_{1} + r_{1}s_{2} = s_{1}r_{1} + s_{1}r_{2} iff r_{1}s_{2} = s_{1}r_{2} iff r_{1}/s_{1} = r_{2}/s_{2}.
We'll draw on this little result immediately below. It is clearly relevant to the antecedent of case (2) of the theorem we want to prove.
We establish case (2) first. Suppose the antecedent of case (2) holds. Then, from the little result just proved, we have
r_{1} log[r_{1}/s_{1}] + r_{2} log[r_{2}/s_{2}] = r_{1} log[(r_{1}+r_{2})/(s_{1}+s_{2})] + r_{2} log[(r_{1}+r_{2})/(s_{1}+s_{2})] = (r_{1} + r_{2}) log[(r_{1}+r_{2})/(s_{1}+s_{2})].
That establishes case (2).
To get case (1), consider the following function of p:
f(p) = p log[p/u] + (1−p) log[(1−p)/v],
where we only assume that u > 0, v > 0, and 0 < p < 1.
This function has its minimum value when p = u/(u+v). (This is easily verified by setting the derivative of f(p) with respect to p equal to 0 to find the minimum value of f(p); and it is easy to verify that this is a minimum rather than a maximum value.) At this minimum, where p = u/(u+v), we have
f(p) = −u/(u+v) log[u+v] − v/(u+v) log[u+v] = −log[u+v].
Thus, for all values of p other than u/(u+v),
−log[u+v] < f(p) = p log[p/u] + (1−p) log[(1−p)/v].
That is, if p ≠ u/(u+v), −log[u+v] < p log[p/u] + (1−p) log[(1−p)/v].
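For readers who want the omitted calculus spelled out, here is one way to fill in the verification. It assumes that log is the natural logarithm; with any other base, f is just multiplied by a positive constant, so the location of the minimum is unchanged.

```latex
\begin{align*}
f(p)   &= p\log\frac{p}{u} + (1-p)\log\frac{1-p}{v},\\
f'(p)  &= \log\frac{p}{u} + 1 - \log\frac{1-p}{v} - 1
        = \log\frac{p\,v}{(1-p)\,u},\\
f'(p) = 0 &\iff p\,v = (1-p)\,u \iff p = \frac{u}{u+v},\\
f''(p) &= \frac{1}{p} + \frac{1}{1-p} > 0 \quad\text{for } 0 < p < 1.
\end{align*}
```

Because the second derivative is positive throughout the interval 0 < p < 1, the stationary point is the unique minimum there, which is what the strict inequality for p ≠ u/(u+v) relies on.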
Now, let p = r_{1}/(r_{1}+r_{2}), let u = s_{1}/(r_{1}+r_{2}), and let v = s_{2}/(r_{1}+r_{2}). Plugging into the previous formula, and multiplying both sides by (r_{1}+r_{2}), we get:
if r_{1}/(r_{1}+r_{2}) ≠ s_{1}/(s_{1}+s_{2}) (i.e., equivalently, if r_{1}/s_{1} ≠ (r_{1}+r_{2})/(s_{1}+s_{2})),
then
log[(r_{1}+r_{2})/(s_{1}+s_{2})] < [r_{1}/(r_{1}+r_{2})] log[r_{1}/s_{1}] + (1−[r_{1}/(r_{1}+r_{2})]) log[r_{2}/s_{2}]
(i.e., equivalently, (r_{1}+r_{2}) log[(r_{1}+r_{2})/(s_{1}+s_{2})] < r_{1} log[r_{1}/s_{1}] + r_{2} log[r_{2}/s_{2}]).
Thus, from the two equivalents, we've proved case (1), in fact in a slightly stronger form that needs only ≠ rather than > in the antecedent:
if r_{1}/s_{1} ≠ (r_{1}+r_{2})/(s_{1}+s_{2}),
then
(r_{1}+r_{2}) log[(r_{1}+r_{2})/(s_{1}+s_{2})] < r_{1} log[r_{1}/s_{1}] + r_{2} log[r_{2}/s_{2}].
This completes the proof of the theorem.
To apply this result to EQI[c_{k} | h_{i}/h_{j} | b] recall that
EQI[c_{k} | h_{i}/h_{j} | b] = ∑_{u: P[o_{ku} | h_{j}·b·c_{k}] > 0} log[P[o_{ku} | h_{i}·b·c_{k}]/P[o_{ku} | h_{j}·b·c_{k}]] × P[o_{ku} | h_{i}·b·c_{k}].
Suppose c_{k} has m alternative outcomes o_{ku} on which both P[o_{ku} | h_{j}·b·c_{k}] > 0 and P[o_{ku} | h_{i}·b·c_{k}] > 0. Let's label their likelihoods relative to h_{i} (i.e., their likelihoods P[o_{ku} | h_{i}·b·c_{k}]) as r_{1}, r_{2}, …, r_{m}. And let's label their likelihoods relative to h_{j} as s_{1}, s_{2}, …, s_{m}. In terms of this notation,
EQI[c_{k} | h_{i}/h_{j} | b] = ∑_{u=1}^{m} r_{u}×log[r_{u}/s_{u}].
Notice also that (r_{1}+r_{2}+r_{3}+…+r_{m}) = 1 and (s_{1}+s_{2}+s_{3}+…+s_{m}) = 1.
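In the r, s notation just introduced, EQI is straightforward to compute. The sketch below is a minimal illustration, not a reference implementation: the function name `eqi` and the sample likelihoods are assumptions, natural logarithms are used, and every listed outcome is assumed to have positive likelihood on both hypotheses, as stipulated above.

```python
import math

def eqi(r, s):
    """Return the sum of r_u * log(r_u / s_u) over the listed outcomes.

    r: likelihoods of the outcomes relative to h_i.
    s: likelihoods of the same outcomes relative to h_j.
    All entries are assumed to be positive.
    """
    if len(r) != len(s):
        raise ValueError("r and s must list likelihoods for the same outcomes")
    if any(x <= 0 for x in r) or any(x <= 0 for x in s):
        raise ValueError("all likelihoods must be positive")
    return sum(ru * math.log(ru / su) for ru, su in zip(r, s))

# Hypothetical likelihoods for three outcomes; each list sums to 1.
r = [0.7, 0.2, 0.1]   # relative to h_i
s = [0.2, 0.3, 0.5]   # relative to h_j

print(eqi(r, s))      # positive: the hypotheses disagree about the outcomes
print(eqi(r, r))      # zero: identical likelihoods give no discrimination
```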
Now, think of EQI[c_{k} | h_{i}/h_{j} | b] as generated by applying the theorem in successive steps:
0 = 1×log[1/1]
= (r_{1}+r_{2}+r_{3}+…+r_{m})×log[(r_{1}+r_{2}+r_{3}+…+r_{m})/(s_{1}+s_{2}+s_{3}+…+s_{m})]
≤ r_{1}×log[r_{1}/s_{1}] + (r_{2}+r_{3}+…+r_{m})×log[(r_{2}+r_{3}+…+r_{m})/(s_{2}+s_{3}+…+s_{m})]
≤ r_{1}×log[r_{1}/s_{1}] + r_{2}×log[r_{2}/s_{2}] + (r_{3}+…+r_{m})×log[(r_{3}+…+r_{m})/(s_{3}+…+s_{m})]
≤ …
≤ ∑_{u=1}^{m} r_{u}×log[r_{u}/s_{u}]
= EQI[c_{k} | h_{i}/h_{j} | b].
The theorem also says that at each step equality holds just in case
r_{u}/s_{u} = (r_{u}+r_{u+1}+…+r_{m})/(s_{u}+s_{u+1}+…+s_{m}),
which itself holds just in case
r_{u}/s_{u} = (r_{u+1}+…+r_{m})/(s_{u+1}+…+s_{m}).
So,
EQI[c_{k} | h_{i}/h_{j} | b] = 0
just in case
1 = (r_{1}+r_{2}+r_{3}+…+r_{m})/(s_{1}+s_{2}+s_{3}+…+s_{m}) = r_{1}/s_{1} = (r_{2}+r_{3}+…+r_{m})/(s_{2}+s_{3}+…+s_{m}) = r_{2}/s_{2} = (r_{3}+…+r_{m})/(s_{3}+…+s_{m}) = r_{3}/s_{3} = … = r_{m}/s_{m}.
That is,
EQI[c_{k} | h_{i}/h_{j} | b] = 0
just in case for all o_{ku} such that P[o_{ku} | h_{j}·b·c_{k}] > 0 and P[o_{ku} | h_{i}·b·c_{k}] > 0,
P[o_{ku} | h_{i}·b·c_{k}]/P[o_{ku} | h_{j}·b·c_{k}] = 1.
Otherwise,
EQI[c_{k} | h_{i}/h_{j} | b] > 0;
and for each successive step in partitioning the outcome space to generate EQI[c_{k} | h_{i}/h_{j} | b], if
r_{u}/s_{u} ≠ (r_{u}+r_{u+1}+…+r_{m})/(s_{u}+s_{u+1}+…+s_{m}),
we have the strict inequality:
(r_{u}+r_{u+1}+…+r_{m}) × log[(r_{u}+r_{u+1}+…+r_{m})/(s_{u}+s_{u+1}+…+s_{m})] <
r_{u}×log[r_{u}/s_{u}] + (r_{u+1}+…+r_{m})×log[(r_{u+1}+…+r_{m})/(s_{u+1}+…+s_{m})].
So each such division of (o_{ku}∨o_{ku+1}∨…∨o_{km}) into two separate statements, o_{ku} and (o_{ku+1}∨…∨o_{km}), adds a strictly positive contribution to the size of EQI[c_{k} | h_{i}/h_{j} | b] just when P[o_{ku} | h_{i}·b·c_{k}] / P[o_{ku} | h_{j}·b·c_{k}] ≠ P[(o_{ku+1}∨…∨o_{km}) | h_{i}·b·c_{k}] / P[(o_{ku+1}∨…∨o_{km}) | h_{j}·b·c_{k}].
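The successive-splitting argument can be traced numerically. The sketch below is an illustration under assumed values, not part of the proof: the helper names `term` and `partition_chain` and the sample likelihoods are hypothetical, and natural logarithms are used. It starts from a single lumped outcome and splits off one outcome at a time, recording the value of the sum at each stage.

```python
import math

def term(r_sum, s_sum):
    """Contribution of one cell with total likelihoods r_sum and s_sum."""
    return r_sum * math.log(r_sum / s_sum)

def partition_chain(r, s):
    """Values of the sum as outcomes are split off one at a time.

    Entry k has outcomes 0..k-1 separated and outcomes k..m-1 lumped
    into a single cell, so the last entry is the fully partitioned sum.
    """
    values = []
    for k in range(len(r)):
        separated = sum(term(r[u], s[u]) for u in range(k))
        lumped = term(sum(r[k:]), sum(s[k:]))
        values.append(separated + lumped)
    return values

# Hypothetical likelihoods for three outcomes; each list sums to 1.
r = [0.5, 0.25, 0.25]   # relative to h_i
s = [0.25, 0.25, 0.5]   # relative to h_j

chain = partition_chain(r, s)
for k, value in enumerate(chain):
    print(f"{k} outcome(s) split off: {value:.5f}")
```

For these values the chain starts at 0 and strictly increases at each step, since at each split the likelihood ratio of the outcome being separated differs from that of the remaining lumped outcomes.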