Supplement to Inductive Logic
Proof of the Non-Falsifying Refutation Theorem
The proof of Convergence Theorem 2 requires the introduction of one more concept, that of the variance in the quality of information for a sequence of experiments or observations, VQI[c^{n} | h_{i}/h_{j} | b]. The quality of the information QI from a specific outcome sequence e^{n} may vary somewhat from the expected quality of information EQI for conditions c^{n}. A common statistical measure of how widely individual values tend to vary from an expected value is the expected squared distance from the expected value, which is called the variance.
Definition: VQI—the Variance in the Quality of Information.
For h_{j} outcome-compatible with h_{i} on c_{k}, define
VQI[c_{k} | h_{i}/h_{j} | b] =
∑_{u} (QI[o_{ku} | h_{i}/h_{j} | b·c_{k}] − EQI[c_{k} | h_{i}/h_{j} | b])^{2} × P[o_{ku} | h_{i}·b·c_{k}].
For a sequence c^{n} of observations on which h_{j} is outcome-compatible with h_{i}, define
VQI[c^{n} | h_{i}/h_{j} | b] =
∑_{e^{n}} (QI[e^{n} | h_{i}/h_{j} | b·c^{n}] − EQI[c^{n} | h_{i}/h_{j} | b])^{2} × P[e^{n} | h_{i}·b·c^{n}].
Clearly VQI will be positive unless h_{i} and h_{j} agree on the likelihoods of all possible outcome sequences in the evidence stream, in which case both EQI[c^{n} | h_{i}/h_{j} | b] and VQI[c^{n} | h_{i}/h_{j} | b] equal 0.
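The definitions above are easy to check numerically. Here is a minimal sketch in Python; the function names and the likelihood vectors `like_i` and `like_j` (standing in for P[o_{ku} | h_{i}·b·c_{k}] and P[o_{ku} | h_{j}·b·c_{k}]) are illustrative assumptions, not from the text:

```python
import math

def qi(p_i, p_j):
    # QI for a single outcome: the log of the likelihood ratio P[o|h_i.b.c] / P[o|h_j.b.c]
    return math.log(p_i / p_j)

def eqi(like_i, like_j):
    # EQI: QI averaged over the possible outcomes, weighted by the likelihoods under h_i
    return sum(qi(p, q) * p for p, q in zip(like_i, like_j) if p > 0)

def vqi(like_i, like_j):
    # VQI: expected squared deviation of QI from EQI (a variance), weighted under h_i
    e = eqi(like_i, like_j)
    return sum((qi(p, q) - e) ** 2 * p for p, q in zip(like_i, like_j) if p > 0)

# Hypothetical likelihoods for the three possible outcomes of one condition c_k
like_i = [0.5, 0.3, 0.2]   # P[o_ku | h_i.b.c_k]
like_j = [0.2, 0.3, 0.5]   # P[o_ku | h_j.b.c_k]

assert vqi(like_i, like_i) == 0   # agreement on all likelihoods gives VQI = 0
assert vqi(like_i, like_j) > 0    # any disagreement makes VQI positive
```

The two assertions mirror the remark above: VQI is 0 exactly when the hypotheses agree on every likelihood, and positive otherwise.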
When both Independent Evidence Conditions hold, VQI[c^{n} | h_{i}/h_{j} | b] decomposes into the sum of the VQI values for the individual experiments or observations c_{k}.
Theorem: The VQI Decomposition Theorem for Independent Evidence on Each Hypothesis:
Suppose both condition-independence and result-independence hold. Then
VQI[c^{n} | h_{i}/h_{j} | b] = ∑_{k=1}^{n} VQI[c_{k} | h_{i}/h_{j} | b].
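A quick numeric spot-check of the decomposition (a minimal Python sketch; the likelihood vectors are hypothetical, and the independence conditions are built in by multiplying outcome probabilities across two experiments):

```python
import math
from itertools import product

def eqi(like_i, like_j):
    # EQI: expected log-likelihood-ratio, weighted by the likelihoods under h_i
    return sum(math.log(p / q) * p for p, q in zip(like_i, like_j) if p > 0)

def vqi(like_i, like_j):
    # VQI: expected squared deviation of the log-likelihood-ratio from EQI
    e = eqi(like_i, like_j)
    return sum((math.log(p / q) - e) ** 2 * p for p, q in zip(like_i, like_j) if p > 0)

# Two hypothetical independent experiments c_1 and c_2
c1_i, c1_j = [0.7, 0.3], [0.4, 0.6]
c2_i, c2_j = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]

# Likelihoods of the six outcome sequences e^2: independence makes them multiply
seq_i = [a * b for a, b in product(c1_i, c2_i)]
seq_j = [a * b for a, b in product(c1_j, c2_j)]

# VQI for the sequence equals the sum of the VQIs of the individual experiments
assert abs(vqi(seq_i, seq_j) - (vqi(c1_i, c1_j) + vqi(c2_i, c2_j))) < 1e-12
```

The assertion holds because, under independence, the QI of a sequence is the sum of the QIs of its parts, so the variance of the sum is the sum of the variances, exactly as the theorem states.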
For the Proof, we employ the following abbreviations:
Q[e_{k}] = QI[e_{k} | h_{i}/h_{j} | b·c_{k}]
Q[e^{k}] = QI[e^{k} | h_{i}/h_{j} | b·c^{k}]
E[c_{k}] = EQI[c_{k} | h_{i}/h_{j} | b]
E[c^{k}] = EQI[c^{k} | h_{i}/h_{j} | b]
V[c_{k}] = VQI[c_{k} | h_{i}/h_{j} | b]
V[c^{k}] = VQI[c^{k} | h_{i}/h_{j} | b]
The equation stated by the theorem may be derived as follows:

V[c^{n}]
= ∑_{e^{n}} (Q[e^{n}] − E[c^{n}])^{2} × P[e^{n} | h_{i}·b·c^{n}]
= ∑_{e^{n−1}} ∑_{e_{n}} ((Q[e_{n}] + Q[e^{n−1}]) − (E[c_{n}] + E[c^{n−1}]))^{2} × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}]
  (by the Independent Evidence Conditions, which make Q and E additive and factor the probabilities)
= ∑_{e^{n−1}} ∑_{e_{n}} ((Q[e_{n}] − E[c_{n}]) + (Q[e^{n−1}] − E[c^{n−1}]))^{2} × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}]
= ∑_{e^{n−1}} ∑_{e_{n}} ((Q[e_{n}] − E[c_{n}])^{2} + (Q[e^{n−1}] − E[c^{n−1}])^{2} + 2×(Q[e_{n}] − E[c_{n}])×(Q[e^{n−1}] − E[c^{n−1}])) × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}]
= ∑_{e^{n−1}} ∑_{e_{n}} (Q[e_{n}] − E[c_{n}])^{2} × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}]
  + ∑_{e^{n−1}} ∑_{e_{n}} (Q[e^{n−1}] − E[c^{n−1}])^{2} × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}]
  + ∑_{e^{n−1}} ∑_{e_{n}} 2×(Q[e_{n}] − E[c_{n}])×(Q[e^{n−1}] − E[c^{n−1}]) × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}]
= V[c_{n}] + V[c^{n−1}] + 2×∑_{e^{n−1}} ∑_{e_{n}} (Q[e_{n}]×Q[e^{n−1}] − Q[e_{n}]×E[c^{n−1}] − E[c_{n}]×Q[e^{n−1}] + E[c_{n}]×E[c^{n−1}]) × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}]
= V[c_{n}] + V[c^{n−1}] + 2×(∑_{e^{n−1}} ∑_{e_{n}} Q[e_{n}]×Q[e^{n−1}] × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}]
  − ∑_{e^{n−1}} ∑_{e_{n}} Q[e_{n}]×E[c^{n−1}] × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}]
  − ∑_{e^{n−1}} ∑_{e_{n}} E[c_{n}]×Q[e^{n−1}] × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}]
  + ∑_{e^{n−1}} ∑_{e_{n}} E[c_{n}]×E[c^{n−1}] × P[e_{n} | h_{i}·b·c_{n}] × P[e^{n−1} | h_{i}·b·c^{n−1}])
= V[c_{n}] + V[c^{n−1}] + 2×(E[c_{n}]×E[c^{n−1}] − E[c_{n}]×E[c^{n−1}] − E[c_{n}]×E[c^{n−1}] + E[c_{n}]×E[c^{n−1}])
= V[c_{n}] + V[c^{n−1}]
= …
= ∑_{k=1}^{n} VQI[c_{k} | h_{i}/h_{j} | b].
By averaging the values of VQI[c^{n} | h_{i}/h_{j} | b] over the number of observations n we obtain a measure of the average variance in the quality of the information due to c^{n}. We represent this average by overlining ‘VQI’ (rendered here as VQI‾).
Definition: The Average Variance in the Quality of Information
VQI‾[c^{n} | h_{i}/h_{j} | b] = VQI[c^{n} | h_{i}/h_{j} | b] ÷ n.
We are now in a position to state a very general version of the second part of the Likelihood Ratio Convergence Theorem. It applies to all evidence streams not containing possibly falsifying outcomes for h_{j}—that is, to all evidence streams for which h_{j} is fully outcome-compatible with h_{i} on each c_{k} in the evidence stream. This theorem is essentially a specialized version of Chebyshev's Theorem, a form of the Weak Law of Large Numbers.
Likelihood Ratio Convergence Theorem 2*—The Non-Falsifying Refutation Theorem.
Suppose the evidence stream c^{n} contains only experiments or observations on which h_{j} is fully outcome-compatible with h_{i}—i.e. suppose that for each condition c_{k} in sequence c^{n}, for each of its possible outcomes o_{ku}, either P[o_{ku} | h_{i}·b·c_{k}] = 0 or P[o_{ku} | h_{j}·b·c_{k}] > 0.
And suppose that the Independent Evidence Conditions hold for evidence stream c^{n} with respect to each of these hypotheses.
Now, choose any positive ε < 1, as small as you like, but large enough (for the number of observations n being contemplated) that the value of EQI‾[c^{n} | h_{i}/h_{j} | b] > −(log ε)/n. Then:
P[∨{e^{n} : P[e^{n} | h_{j}·b·c^{n}] / P[e^{n} | h_{i}·b·c^{n}] < ε} | h_{i}·b·c^{n}]
> 1 − (1/n) × VQI‾[c^{n} | h_{i}/h_{j} | b] / (EQI‾[c^{n} | h_{i}/h_{j} | b] + (log ε)/n)^{2}.
Thus, provided that the average expected quality of the information, EQI‾[c^{n} | h_{i}/h_{j} | b], for the stream of experiments and observations c^{n} doesn't get too small (as n increases), and provided that the average variance, VQI‾[c^{n} | h_{i}/h_{j} | b], doesn't blow up (e.g. it is bounded above), hypothesis h_{i} (together with b·c^{n}) says it is highly likely that outcomes of c^{n} will be such as to make the likelihood ratio of h_{j} over h_{i} as small as you like, as n increases.
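For a concrete feel for the bound, the following sketch checks the theorem exactly in a small hypothetical case: one binary experiment repeated n times under the Independent Evidence Conditions (so the average EQI and VQI just equal the per-experiment values), with the refutation probability computed from the binomial distribution. All the numbers here are illustrative assumptions, not from the text:

```python
from math import comb, log

# One hypothetical binary experiment, repeated n times
a = [0.7, 0.3]          # P[o | h_i.b.c] for the two outcomes
b = [0.4, 0.6]          # P[o | h_j.b.c] for the two outcomes
n, eps = 50, 0.05

qi = [log(a[u] / b[u]) for u in range(2)]
E = sum(qi[u] * a[u] for u in range(2))             # per-experiment EQI = average EQI
V = sum((qi[u] - E) ** 2 * a[u] for u in range(2))  # per-experiment VQI = average VQI

# Exact P[likelihood ratio < eps | h_i] via the binomial distribution over outcome counts
p_refute = sum(comb(n, k) * a[0] ** k * a[1] ** (n - k)
               for k in range(n + 1)
               if k * log(b[0] / a[0]) + (n - k) * log(b[1] / a[1]) < log(eps))

# The Chebyshev-style lower bound from the theorem
bound = 1 - (1 / n) * V / (E + log(eps) / n) ** 2

assert E > -log(eps) / n    # the theorem's applicability condition on epsilon and n
assert p_refute > bound     # the exact refutation probability beats the bound
```

As Chebyshev-style bounds go, the guarantee is loose: in this example the exact probability of driving the likelihood ratio below ε is well above the bound.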
Proof: Let
V = VQI[c^{n} | h_{i}/h_{j} | b],
E = EQI[c^{n} | h_{i}/h_{j} | b],
Q[e^{n}] = QI[e^{n} | h_{i}/h_{j} | b·c^{n}] = log(P[e^{n} | h_{i}·b·c^{n}]/P[e^{n} | h_{j}·b·c^{n}]).
Choose any small ε > 0, and suppose (for n large enough) that E/n > −(log ε)/n—i.e. that E > −(log ε). Then we have
V = ∑_{e^{n}: P[e^{n} | h_{j}·b·c^{n}] > 0} (E − Q[e^{n}])^{2} × P[e^{n} | h_{i}·b·c^{n}]
≥ ∑_{e^{n}: P[e^{n} | h_{j}·b·c^{n}] > 0 & Q[e^{n}] ≤ −(log ε)} (E − Q[e^{n}])^{2} × P[e^{n} | h_{i}·b·c^{n}]
≥ (E + (log ε))^{2} × ∑_{e^{n}: P[e^{n} | h_{j}·b·c^{n}] > 0 & Q[e^{n}] ≤ −(log ε)} P[e^{n} | h_{i}·b·c^{n}]
  (since Q[e^{n}] ≤ −(log ε) and E > −(log ε) give E − Q[e^{n}] ≥ E + (log ε) > 0)
= (E + (log ε))^{2} × P[∨{e^{n}: P[e^{n} | h_{j}·b·c^{n}] > 0 & Q[e^{n}] ≤ log(1/ε)} | h_{i}·b·c^{n}]
= (E + (log ε))^{2} × P[∨{e^{n}: P[e^{n} | h_{j}·b·c^{n}]/P[e^{n} | h_{i}·b·c^{n}] ≥ ε} | h_{i}·b·c^{n}].
So,
V/(E + (log ε))^{2} ≥ P[∨{e^{n}: P[e^{n} | h_{j}·b·c^{n}]/P[e^{n} | h_{i}·b·c^{n}] ≥ ε} | h_{i}·b·c^{n}]
= 1 − P[∨{e^{n}: P[e^{n} | h_{j}·b·c^{n}]/P[e^{n} | h_{i}·b·c^{n}] < ε} | h_{i}·b·c^{n}].
Thus, for any small ε > 0,
P[∨{e^{n}: P[e^{n} | h_{j}·b·c^{n}]/P[e^{n} | h_{i}·b·c^{n}] < ε} | h_{i}·b·c^{n}] ≥ 1 − V/(E + (log ε))^{2}
= 1 − (1/n) × VQI‾[c^{n} | h_{i}/h_{j} | b]/(EQI‾[c^{n} | h_{i}/h_{j} | b] + (log ε)/n)^{2}.
(End of Proof)
This theorem shows that when VQI‾ is bounded above and EQI‾ has a positive lower bound, a sufficiently long stream of evidence will very likely result in the refutation of false competitors of a true hypothesis. We can show that VQI‾ will indeed be bounded above when a very simple condition is satisfied. This gives us the version of the theorem stated in the main text.
Likelihood Ratio Convergence Theorem 2—The Non-Falsifying Refutation Theorem.
Suppose the evidence stream c^{n} contains only experiments or observations on which h_{j} is fully outcome-compatible with h_{i}—i.e. suppose that for each condition c_{k} in sequence c^{n}, for each of its possible outcomes o_{ku}, either P[o_{ku} | h_{i}·b·c_{k}] = 0 or P[o_{ku} | h_{j}·b·c_{k}] > 0. In addition (as a slight strengthening of the previous supposition), for some γ > 0 smaller than 1/e^{2} (≈ .135, where ‘e’ is the base of the natural logarithm), suppose that for each possible outcome o_{ku} of each observation condition c_{k} in c^{n}, either P[o_{ku} | h_{i}·b·c_{k}] = 0 or P[o_{ku} | h_{j}·b·c_{k}] / P[o_{ku} | h_{i}·b·c_{k}] ≥ γ.
And suppose that the Independent Evidence Conditions hold for evidence stream c^{n} with respect to each of these hypotheses.
Now, choose any positive ε < 1, as small as you like, but large enough (for the number of observations n being contemplated) that the value of EQI‾[c^{n} | h_{i}/h_{j} | b] > −(log ε)/n. Then:
P[∨{e^{n} : P[e^{n} | h_{j}·b·c^{n}] / P[e^{n} | h_{i}·b·c^{n}] < ε} | h_{i}·b·c^{n}]
> 1 − (1/n) × (log γ)^{2} / (EQI‾[c^{n} | h_{i}/h_{j} | b] + (log ε)/n)^{2}.
Proof: This follows from Theorem 2* together with the following observation:
If for each c_{k} in c^{n}, for each of its possible outcomes o_{ku}, either P[o_{ku} | h_{i}·b·c_{k}] = 0 or P[o_{ku} | h_{j}·b·c_{k}]/P[o_{ku} | h_{i}·b·c_{k}] ≥ γ > 0, for some lower bound γ < 1/e^{2} (≈ .135), then VQI‾[c^{n} | h_{i}/h_{j} | b] ≤ (log γ)^{2}.
To see that this observation holds, assume its antecedent.
First notice that when 0 < P[e_{k} | h_{j}·b·c_{k}] < P[e_{k} | h_{i}·b·c_{k}], we have γ ≤ P[e_{k} | h_{j}·b·c_{k}]/P[e_{k} | h_{i}·b·c_{k}] < 1, so 0 < log[P[e_{k} | h_{i}·b·c_{k}]/P[e_{k} | h_{j}·b·c_{k}]] ≤ −(log γ); hence
(log[P[e_{k} | h_{i}·b·c_{k}]/P[e_{k} | h_{j}·b·c_{k}]])^{2} × P[e_{k} | h_{i}·b·c_{k}] ≤ (log γ)^{2} × P[e_{k} | h_{i}·b·c_{k}].
So we need only establish that when P[e_{k} | h_{j}·b·c_{k}] > P[e_{k} | h_{i}·b·c_{k}] > 0, we will also have this relationship—i.e., we will also have
(log[P[e_{k} | h_{i}·b·c_{k}]/P[e_{k} | h_{j}·b·c_{k}]])^{2} × P[e_{k} | h_{i}·b·c_{k}] ≤ (log γ)^{2} × P[e_{k} | h_{i}·b·c_{k}].
(Then it will follow easily that VQI‾[c^{n} | h_{i}/h_{j} | b] ≤ (log γ)^{2}, and we'll be done.)
To establish the needed relationship, suppose that P[e_{k} | h_{j}·b·c_{k}] > P[e_{k} | h_{i}·b·c_{k}] > 0. Notice that for all p ≤ q, with p and q between 0 and 1, the function g(p) = (log(p/q))^{2} × p has a minimum at p = q, where g(p) = 0, and (for p < q) has a maximum value at p = q/e^{2}—i.e., at p/q = 1/e^{2}. (To get this, take the derivative of g(p) with respect to p and set it equal to 0; this yields a maximum for g(p) at p = q/e^{2}.)
So, for 0 < P[e_{k} | h_{i}·b·c_{k}] < P[e_{k} | h_{j}·b·c_{k}] we have
(log(P[e_{k} | h_{i}·b·c_{k}]/P[e_{k} | h_{j}·b·c_{k}]))^{2} × P[e_{k} | h_{i}·b·c_{k}]
≤ (log(1/e^{2}))^{2} × P[e_{k} | h_{j}·b·c_{k}]
≤ (log γ)^{2} × P[e_{k} | h_{j}·b·c_{k}]
(since, for γ ≤ 1/e^{2}, we have log γ ≤ log(1/e^{2}) < 0; so (log γ)^{2} ≥ (log(1/e^{2}))^{2} > 0).
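The claim about g(p) can be verified numerically. This sketch (our own, with an arbitrary illustrative value for q) scans g over a fine grid of p-values in (0, q) and confirms that the maximum sits at p = q/e^{2}:

```python
import math

q = 0.8  # an arbitrary fixed value standing in for P[e_k | h_j.b.c_k]

def g(p):
    # g(p) = (log(p/q))^2 * p, the quantity bounded in the proof
    return (math.log(p / q)) ** 2 * p

# Scan p over (0, q): g rises from 0, peaks, and falls back to g(q) = 0
grid = [q * t / 100000 for t in range(1, 100000)]
p_star = max(grid, key=g)

assert abs(p_star - q / math.e ** 2) < 1e-4                 # maximizer is p = q / e^2
assert g(p_star) <= (math.log(1 / math.e ** 2)) ** 2 * q    # the bound used above
```

Setting the derivative g′(p) = log(p/q) × (2 + log(p/q)) to zero gives log(p/q) = −2, i.e. p = q/e^{2}, matching what the grid search finds.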
Now (assuming the antecedent of the theorem), for each c_{k},
VQI[c_{k} | h_{i}/h_{j} | b]
= ∑_{o_{ku}: P[o_{ku} | h_{j}·b·c_{k}] > 0} (EQI[c_{k}] − QI[o_{ku}])^{2} × P[o_{ku} | h_{i}·b·c_{k}]
= ∑_{o_{ku}: P[o_{ku} | h_{j}·b·c_{k}] > 0} (EQI[c_{k}]^{2} − 2×QI[o_{ku}]×EQI[c_{k}] + QI[o_{ku}]^{2}) × P[o_{ku} | h_{i}·b·c_{k}]
= ∑_{o_{ku}: P[o_{ku} | h_{j}·b·c_{k}] > 0} EQI[c_{k}]^{2} × P[o_{ku} | h_{i}·b·c_{k}]
  − 2×EQI[c_{k}] × ∑_{o_{ku}: P[o_{ku} | h_{j}·b·c_{k}] > 0} QI[o_{ku}] × P[o_{ku} | h_{i}·b·c_{k}]
  + ∑_{o_{ku}: P[o_{ku} | h_{j}·b·c_{k}] > 0} QI[o_{ku}]^{2} × P[o_{ku} | h_{i}·b·c_{k}]
= ∑_{o_{ku}: P[o_{ku} | h_{j}·b·c_{k}] > 0} QI[o_{ku}]^{2} × P[o_{ku} | h_{i}·b·c_{k}] − EQI[c_{k}]^{2}
≤ ∑_{o_{ku}: P[o_{ku} | h_{j}·b·c_{k}] > 0} QI[o_{ku}]^{2} × P[o_{ku} | h_{i}·b·c_{k}]
≤ ∑_{o_{ku}: P[o_{ku} | h_{j}·b·c_{k}] > 0} (log γ)^{2} × P[o_{ku} | h_{i}·b·c_{k}]
≤ (log γ)^{2}.
So,
VQI‾[c^{n} | h_{i}/h_{j} | b] = (1/n) × ∑_{k=1}^{n} VQI[c_{k} | h_{i}/h_{j} | b] ≤ (log γ)^{2}.
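Finally, a numeric sanity check of the observation that drives Theorem 2: for hypothetical likelihoods whose ratios P[o_{ku} | h_{j}·b·c_{k}]/P[o_{ku} | h_{i}·b·c_{k}] all stay at or above an assumed γ < 1/e^{2}, the VQI of the experiment indeed stays below (log γ)^{2}. The likelihood vectors and the value of γ are illustrative choices:

```python
import math

def vqi(like_i, like_j):
    # VQI for one experiment, per the definition in this supplement
    e = sum(math.log(p / q) * p for p, q in zip(like_i, like_j) if p > 0)
    return sum((math.log(p / q) - e) ** 2 * p
               for p, q in zip(like_i, like_j) if p > 0)

# Hypothetical likelihoods for the three outcomes of one experiment
like_i = [0.6, 0.3, 0.1]   # P[o_ku | h_i.b.c_k]
like_j = [0.15, 0.35, 0.5]  # P[o_ku | h_j.b.c_k]

gamma = 0.1  # an assumed lower bound on the ratios P[o|h_j.b.c] / P[o|h_i.b.c]

assert gamma < 1 / math.e ** 2                              # the condition on gamma
assert all(q / p >= gamma for p, q in zip(like_i, like_j))  # the antecedent holds
assert vqi(like_i, like_j) <= math.log(gamma) ** 2          # VQI <= (log gamma)^2
```

Since each experiment's VQI is bounded by (log γ)^{2}, so is their average, which is exactly what the bound in Theorem 2 substitutes for VQI‾ in the Theorem 2* inequality.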