Supplement to Inductive Logic
Proof of the Non-Falsifying Refutation Theorem
The proof of Convergence Theorem 2 requires the introduction of one more concept, that of the variance in the quality of information for a sequence of experiments or observations, VQI[cn | hi/hj | b]. The quality of the information QI from a specific outcome sequence en may vary somewhat from the expected quality of information for conditions cn. A common statistical measure of how widely individual values tend to vary from an expected value is given by the expected squared distance from the expected value, which is called the variance.
Definition: VQI—the Variance in the Quality of Information.
For hj outcome-compatible with hi on ck, define
VQI[ck | hi/hj | b] =
∑u (QI[oku | hi/hj | b·ck] − EQI[ck | hi/hj | b])2 × P[oku | hi·b·ck].
For a sequence cn of observations on which hj is outcome-compatible with hi, define
VQI[cn | hi/hj | b] =
∑{en} (QI[en | hi/hj | b·cn] − EQI[cn | hi/hj | b])2 × P[en | hi·b·cn].
Clearly VQI will be positive unless hi and hj agree on the likelihoods of all possible outcome sequences in the evidence stream, in which case both EQI[cn | hi/hj | b] and VQI[cn | hi/hj | b] equal 0.
When both Independent Evidence Conditions hold, VQI[cn | hi/hj | b] decompose into the sum of the VQI for individual experiments or observations ck.
Theorem: The VQI Decomposition Theorem for Independent
Evidence on Each Hypothesis:
Suppose both condition independence
and result-independence hold. Then
VQI[cn | hi/hj | b] = n
∑
k=1VQI[ck | hi/hj | b].
For the Proof, we employ the following abbreviations:
Q[ek] = QI[ek | hi/hj | b·ck] Q[ek] = QI[ek | hi/hj | b·ck] E[ck] = EQI[ck | hi/hj | b] E[ck] = EQI[ck | hi/hj | b] V[ck] = VQI[ck | hi/hj | b] V[ck] = VQI[ck | hi/hj | b]
The equation stated by the theorem may be derived as follows:
= | ∑{en} (Q[en] − E[cn])2 × P[en | hi·b·cn] |
= | ∑{en}
((Q[en]+Q[en−1]) −
(E[cn]+E[cn−1]))2 × P[en | hi·b·cn]×P[en−1 | hi·b·cn−1] |
= | ∑{en−1}
∑{en}
((Q[en]−E[cn]) +
(Q[en−1]−E[cn−1]))2 × P[en | hi·b·cn]×P[e n−1 | hi·b·cn−1] |
= | ∑{en−1}
∑{en} (
(Q[en]−E[cn])2 +
(Q[en−1]−E[cn−1])2
+ 2×(Q[en]−E[cn])×(Q[en−1]−E[c n−1]) ) × P[en | hi·b·cn]×P[en−1 | hi·b·cn−1] |
= | ∑{en−1}
∑{en}
(Q[en]−E[cn])2 ×
P[en | hi·b·cn]×P[en−1 | hi·b·cn−1] + ∑{en−1} ∑{en}(Q[en−1]−E[cn−1])2 × P[en | hi·b·cn]×P[en−1 | hi·b·cn−1] + ∑{en−1} ∑{en} 2×(Q[en]−E[cn])·(Q[en−1]−E[c n−1]) × P[en | hi·b·cn] × P[en−1 | hi·b·cn−1] |
= | V[cn] + V[cn−1] + 2×∑{en−1} ∑{en}(Q[en]×Q[en−1] − Q[en]×E[cn−1] − E[cn]×Q[en−1] + E[cn]×E[cn−1]) × P[en | hi·b·cn]×P[e n−1 | hi·b×cn−1] |
= | V[cn] + V[cn−1] + 2× (∑{en−1} ∑{en} Q[en]×Q[en−1]× P[en | hi·b·cn]×P[e n−1 | hi·b×cn−1] − ∑{en−1} ∑{en} Q[en]×E[cn−1]× P[en | hi·b·cn]×P[e n−1 | hi·b×cn−1] − ∑{en−1} ∑{en} E[cn]×Q[en−1]× P[en | hi·b·cn]×P[e n−1 | hi·b×cn−1] + ∑{en−1} ∑{en} E[cn]×E[cn−1] × P[en | hi·b·cn]×P[e n−1 | hi·b×cn−1]) |
= | V[cn] + V[cn−1] + 2 × (E[cn]×E[cn−1] − E[cn]×E[cn−1] − E[cn]×E[cn−1] + E[cn]×E[cn−1]) |
= | V[cn] + V[cn−1] |
= | … |
= |
n ∑ k = 1 |
VQI[ck | hi/hj | b]. |
By averaging the values of VQI[cn | hi/hj | b] over the number of observations n we obtain a measure of the average variance in the quality of the information due to cn. We represent this average by overlining ‘VQI’.
Definition: The Average Variance in the Quality of Information
VQI[cn | hi/hj | b] = VQI[cn | hi/hj | b] ÷ n.
We are now in a position to state a very general version of the second part of the Likelihood Ratio Convergence Theorem. It applies to all evidence streams not containing possibly falsifying outcomes for hj. That is, it applies to all evidence streams for which hj is fully outcome-compatible with hi on each ck in the evidence stream. This theorem is essentially a specialized version of Chebyshev's Theorem, which is a Weak Law of Large Numbers.
Likelihood Ratio Convergence Theorem 2*—The Non-Falsifying Refutation Theorem.
Suppose the evidence stream cn contains only experiments or observations on which hj is fully outcome-compatible with hi—i.e. suppose that for each condition ck in sequence cn, for each of its possible outcomes possible outcomes oku, either P[oku | hi·b·ck] = 0 or P[oku | hj·b·ck] > 0.
And suppose that the Independent Evidence Conditions hold for evidence stream cn with respect to each of these hypotheses.
Now, choose any positive ε < 1, as small as you like,
but large enough (for the number of observations n being contemplated)
that the value of EQI[cn | hi/hj | b] > −(log ε)/n. Then:
P[∨{ en : P[en | hj·b·cn] / P[en | hi·b·cn] < ε} | hi·b·cn]
> 1 −
1 n ×
VQI[cn | hi/hj | b] (EQI[cn | hi/hj | b] + (log ε)/n )2
Thus, provided that the average expected quality of the information, EQI[cn | hi/hj | b], for the stream of experiments and observations cn doesn't get too small (as n increases), and provided that the average variance, VQI[cn | hi/hj | b], doesn't blow up (e.g. it is bounded above), hypothesis hi (together with b·cn) says it is highly likely that outcomes of cn will be such as to make the likelihood ratio against hj as compared to hi as small as you like, as n increases.
Proof: Let
V = VQI[cn | hi/hj | b] E = EQI[cn | hi/hj | b] Q[en] = QI[en | hi/hj | b·cn] = log(P[en | hi·b·cn]/P[en | hj·b·cn])
Choose any small ε > 0, and suppose (for n large enough) that E > −(log ε)/n. Then we have
V | = | ∑{en: P[en | hj·b·cn] > 0} (E − Q)2 × P[en | hi·b·cn] |
≥ | ∑{en: P[en|hj·b·cn] > 0 & Q[en] ≤ −(log ε)} (E − Q)2 × P[en | hi·b·cn] | |
≥ | (E + (log ε))2 × ∑{en: P[enhj·b·cn] > 0 & Q[en] ≤ −(log ε)} P[en | hi·b·cn] | |
= | (E + (log ε))2 × P[∨{en: P[en | hj·b·cn] >0 & Q[en]≤log(1/ε)} | hi·b·cn] | |
= | (E + (log ε))2 × P[∨{en: P[en | hj·b·cn]/P[en | hi·b·cn] ≥ ε} | hi·b·cn] |
So,
|
= | V/(E + (log ε))2 |
≥ P[∨{en: P[en | hj·b·cn]/P[en | hi·b·cn] ≥ ε} | hi·b·cn] |
= 1 − P[∨{en: P[en | hj·b·cn]/P[en | hi·b·cn] < ε} | hi·b·cn] |
Thus, for any small ε > 0,
P[∨{en: P[en | hj·b·cn]/P[en | hi·b·cn] < ε} | hi·b·cn] | ≥ |
|
(End of Proof)
This theorem shows that when VQI is bounded above and EQI has a positive lower bound, a sufficiently long stream of evidence will very likely result in the refutation of false competitors of a true hypothesis. We can show that VQI will indeed be bounded above when a very simple condition is satisfied. This gives us the version of the theorem stated in the main text.
Likelihood Ratio Convergence Theorem 2—The Non-Falsifying Refutation Theorem.
Suppose the evidence stream cn contains only experiments or observations on which hj is fully outcome-compatible with hi—i.e. suppose that for each condition ck in sequence cn, for each of its possible outcomes possible outcomes oku, either P[oku | hi·b·ck] = 0 or P[oku | hj·b·ck] > 0. In addition (as a slight strengthening of the previous supposition),
for some γ > 0 a
number smaller than 1/e2 (≈ .135; where
‘e’ is the base of the natural logarithm), suppose that for each possible outcome oku of each
observation condition ck in cn, either P[oku | hi·b·ck] = 0 or P[oku | hj·b·ck] / P[oku | hi·b·ck] ≥ γ.
And suppose that the Independent Evidence Conditions hold for evidence stream cn with respect to each of these hypotheses.
Now, choose any positive ε < 1, as small as you like,
but large enough (for the number of observations n being contemplated)
that the value of EQI[cn | hi/hj | b] > −(log ε)/n. Then:
P[∨{ en : P[en | hj·b·cn] / P[en | hi·b·cn] < ε} | hi·b·cn]
> 1 −
1 n ×
(log γ)2 (EQI[cn | hi/hj | b] + (log ε)/n )2
Proof: This follows from Theorem 2* together with the following observation:
If for each ck in cn, for each of its possible outcomes oku, either P[oku | hj·b·ck] = 0 or P[oku | hj·b·ck]/P[oku | hi·b·ck] ≥ γ > 0, for some lower bound γ < 1/e2 (≈ .135; where ‘e’ is the base of the natural logarithm), then V = VQI[cn | hi/hj | b] ≤ (log γ)2.
To see that this observation holds, assume its antecedent.
- First notice that when 0 < P[ek |
hj·b·ck] < P[ek |
hi·b·ck] we have
(log[P[ek | hi·b·ck]/P[e k | hj·b·ck]])2 × P[ek | hi·b·ck]
≤ (log γ)2 × P[ek | hi·b·ck].So we only need establish that when P[ek | hj·b·ck] > P[ek | hi·b·ck] > 0, we will also have this relationship—i.e., we will also have
(log[P[ek | hi·b·ck]/P[e k | hj·b·ck]])2 × P[ek | hi·b·ck]
≤ (log γ)2 × P[ek | hi·b·ck].(Then it will follow easily that VQI[cn | hi/hj | b] ≤ (log γ)2, and we'll be done.)
- To establish the needed relationship, suppose that
P[ek |
hj·b·ck] >
P[ek |
hi·b·ck]
> 0. Notice that for all p ≤ q, p and q
between 0 and 1, the function g(p) =
(log(p/q))2 × p
has a minimum at p = q, where
g(p) = 0,
and (for p < q) has a
maximum value at p =
q/e2—i.e.,
at p/q =
1/e2.
(To get this, take the derivative of
g(p)
with respect to p and set it equal to 0; this gives a maximum for
g(p)
at p = q/e2.)
So, for 0 < P[ek | hi·b·ck] < P[ek | hj·b·ck] we have
(log(P[ek | hi·b·ck]/P[e k | hj·b·ck]))2 × P[ek | hi·b·ck]
≤ (log(1/e2))2 × P[ek | hj·b·ck] ≤ (log γ)2 × P[ek | hj·b·ck](since, for γ ≤ 1/e2 we have log γ ≤ log(1/e2) < 0; so (log γ)2 ≥ (log(1/e 2))2 > 0).
- Now (assuming the antecedent of the theorem), for
each ck,
VQI[ck | hi/hj | b] = ∑{oku: P[oku | h j·b·ck] > 0} (EQI[ck] − QI[ck])2 × P[oku | hi·b·ck] = ∑{oku: P[oku | hj·b·ck] > 0} (EQI[ck]2 − 2×QI[ck]×EQI[ck] + QI[ck]2) × P[oku | hi·b·ck] = ∑{oku: P[oku | hj·b·ck] > 0} EQI[ck]2× P[oku | hi·b·ck] −
2×EQI[ck] × ∑{oku: P[oku | hj·b·ck] > 0} QI[ck]× P[oku | hi·b·ck] +
∑{oku: P[oku | hj·b·ck] > 0} QI[ck]2 × P[oku | hi·b·ck]= ∑{oku: P[oku | hj·b·ck] > 0} QI[ck]2 × P[oku | hi·b·ck] − EQI[ck]2 ≤ ∑{oku: P[oku | hj·b·ck] > 0} QI[ck]2 × P[oku | hi·b·ck] ≤ ∑{oku: P[oku | hj·b·ck] > 0} (log γ)2 × P[oku | hi·b·ck] ≤ (log γ)2.
So,
VQI[ck | hi/hj | b] =
(1/n) × n
∑
k = 1VQI[ck|hi/hj | b] ≤ (log γ)2.