#### Supplement to Chance versus Randomness

## C. Proofs of Selected Theorems

*Proof of Theorem
1.*
This follows from Church's observation by a result of Doob (1936) or a
similar result of Wald (1938). Doob's result is that the measure of the
set of infinite sequences which can result from the application of a
admissible place selection φ on a member of some subset
*S* of the Cantor space is the same as the measure of *S*
(i.e., μ(φ^{−1}(*S*)) =
μ(*S*)) (see also Feller 1950: 203). Since the
measure of the set of Borel *abnormal* sequences *A* is
zero, Doob's theorem shows that the measure of
φ^{−1}(*A*) is zero—i.e., that
the set of sequences such that an admissible place selection gives a
biased subsequence has measure zero. Since for von Mises only a
non-random sequence has biased admissible subsequences, the set of
non-random sequences is the set of all sequences such that for some
place selection, the result is a biased subsequence, so is the union of
φ^{−1}(*A*) for all φ.
If there are only countably many φ (as there are on
Church's conception of place selections), since the measure of the
union of countably many sets is less than or equal to the sum of their
measures, the measure of the non-random sequences is zero, and the von
Mises-random sequences therefore exist and form a measure one subset of
all infinite binary sequences. See van Lambalgen (1987a:
§2.5.2).

*Proof of Corollary
1.*
(This proof is due to Chris Porter.) To begin, suppose
that the place selection
*f*_{σ}(*x*_{1}…*
x*_{i-1}) = 1 iff there exists a *j* <
*i* such that σ =
*x _{j}*…

*x*. That is, on input some initial segment of sequence

_{i}*x*of length

*i*−1, this place selection relative to σ selects the

*i*-th digit of

*x*iff σ is a suffix of the input. It is easy to see that, for each finite binary sequence σ,

*f*

_{σ}will be effectively computable, so it is a plausible assumption that {

*f*

_{σ}: σ ∈

*X*

^{<ω}} is a subset of the set of admissible place selections. It can then be shown (by induction over the length of σ) that any sequence which satisfies the property of large numbers, and all of its admissible subsequences do too, will be Borel normal.

- The base case, for the empty string ∅, is guaranteed by the
requirement that the sequence
*x*satisfies the property of large numbers, as every outcome place is preceded by an initial subsequence which has ∅ as a suffix. - Next suppose that every string of length
*n*appears in*x*with limit frequency 2^{−n}. Then for each σ of length*n*,*f*will select from_{σ}*x*a subsequence that satisfies the property of large numbers, as roughly half the time, when σ appears in*x*, it will be followed by a 0, and the other half of the time it will be followed by a 1. Since this holds for any σ of length*n*, it follows that σ0 and σ1, the possible strings of length*n*+1, all occur with limit frequency ½ · 2^{−n}= 2^{−(n+1)}, as required.

*finite automata*(finite state machines) to characterise admissible place selections, a result of Agafonov (1968) shows that the class of resulting sequences is exactly the class of normal sequences. (Of course, since not every normal sequence is random—as the result of Ville, Theorem 2, shows—that class of sequences isn't a good candidate for the class of random sequences.)

*Proof of Theorem
3.*
Every rejected sequence is rejected at some level. So every rejected
sequence has some initial sequence which is rejected at that level. To
be rejected at a level 2^{−m} is to fall into a
set of sequences *U _{m}* which has measure less than the
size of the level, by definition. A test is just a collection of such
sets

*U*; it is clear that if a sequence is rejected at a level, it is rejected at every smaller level, so

_{m}*U*⊇

_{m}*U*

_{m+1}. It is effectively determinable whether a given sequence is rejected; since the set of pairs ⟨

*m*,

*n*⟩ is effectively enumerable (using Cantor's pairing function), if a sequence should be rejected, there is an effective procedure which establishes that. These two results show that the set of rejected sequences is effective measure zero: each test is a sequence of effective open sets of sequences. To be rejected is to be a member of some such set. So the set of all rejected sequences is in the intersection of all the test sets

*U*. By construction, the rejected sequences, by a particular test, are thus of effective measure zero.

_{m}
Martin-Löf then shows that the set of all tests is itself
effectively enumerable. Using the fact that a test is an enumerable set
of sets, and Martin-Löf explicitly constructs a function that
assigns a test to a given triple of numbers (which, in effect, encodes
the test), and shows that the set *T* of such triples is
effectively enumerable. The universal test *U* is obtained by
fixing on some constant, *c*, for each test *V*, and
letting the universal test be obtained from the set *T*, using
the triple which is the level of significance, the length of the
sequence, and the constant (which may as well be the Gödel number
of the test *V*). The construction ensures that for any test
*V*, *V*_{m+c} ⊆
*U _{m}*; that is, the universal test is one that rejects
any sequence at a given level iff there is some test that rejects it at
a related level (which level depending on the test). The details of the
construction (Martin-Löf 1966: 606, 609–10) show that the
universal test also rejects only an effective measure zero set of
sequences, so the complement of the rejected sequences—the set of
ML-random sequences—is effective measure one by means of a single
universal property.

*Proof of Theorem
4.*
The proof is a fairly immediate corollary of the existence of a
enumeration of the partial recursive functions. Each function
*g* has an associated code number (Gödel number)
γ. Let α^{[γ]}
represent the string consisting of a block of γ
instances of the digit α. We now choose an *f*
that, on input 1^{[γ]}0δ,
produces *g*(δ). Such functions exist, because of
the existence of universal Turing machines (let γ be the
Gödel number of *g* under some standard coding, then a
universal Turing machine given this input may emulate the operation of
*g* on input δ). It is obvious that they are
near-superior to any *g*, for
*c*_{f}(δ) =
*c*_{g}(δ) + (γ
+1). Any universal function will be able to emulate any other, which
also shows that they are all complexity equivalent to one another.

*Proof of Theorem
5.*
Suppose there was an algorithm *h* that on input *i*
yields a random sequence ρ such that |ρ| =
*i*. ρ is random, so *C*(ρ)
≈ *i*; yet *h* is near-inferior to some universal
function, so that *C*(ρ) ≤
*C*_{h}(ρ) + *k*, for some
*k* depending on *h*. Let |ρ| increase, so
*k* becomes negligible with respect to *i*; then
*i* ≤ *C*_{h}(ρ) +
*k* ≈ *C*_{h}(ρ). But
*C*_{h}(ρ) ≈
log_{2}*i* (because that is the length of the binary
representation of *i*); it follows that *i* ≲
log_{2}*i*, contradiction. So there can be no such
algorithm *h* (van Lambalgen 1995: 11).

*Proof of Theorem
6.*
We show that, for fixed *k*. If μ is sufficiently
long then there is an initial segment σ of μ
such that *C*(σ) < |σ| −
*k*.

Suppose that we have some sequence μ, with initial
segment ν, such that ν is the *n*-th
string in the length-lexicographic ordering of the finite binary
strings. Let ρ be the next *n* digits of
μ, and let σ = ν
⁀ρ. We can generate σ from
ρ alone, because the length of ρ will give us
ν using the standard ordering via some constant decoder. So
we know that *C*(σ) ≈
*C*(ρ) ≤ |ρ| + *c*. We also
know, however, that |σ| = |ν| +
|ρ|. If ν is chosen to be longer than the sum
of the constants (i.e., |ν| > *c* + *k*),
it's obvious that *C*(σ) < |σ|
− *k*.

*Proof of Theorem
7.*
*Only if:* The set *U*_{k} consists of
those sequences which have finite initial subsequences which are
compressible by *k* bits. That is,
*U*_{k} = ∪_{σ ∈
R}
*N*(σ), where σ ∈
*R* iff
*K*(σ) ≤ |σ| −
*k*. The set of *k*-compressible strings is recursively
enumerable; on input *n* ∈ ℕ, take the universal
prefix-free decompression function *u*, and check if
|*u*(*n*)| − |*n*| < *k*. So each
*U*_{k} is effective open; indeed, this
algorithm shows the *U*_{k} to be uniformly
effective open (supplement
C).

Recall that as a proportion of all finite strings, there are at most
1/2^{k} compressible strings
(§2.2.1).
Since there are fewer prefix-free
*k*-compressible sequences than plain *k*-compressible
sequences, the upper bound still holds for prefix-free compressibility.
As this holds independently of the length of sequences, we can use it
to establish a measure for *U*_{k}:
μ(*U*_{k}) = 2^{−k}.
So μ(*U*_{k})
≤ 2^{−k}. So
∩_{k}*U*_{k} is effective
measure zero, and is a Martin-Löf test. Suppose *x* is
ML-random; thus *x* ∉
∩_{k}*U*_{k}. Therefore
there is a *k* such that *x* has no initial sequence
which is compressible by *k* bits, and is therefore prefix-free
Kolmogorov random.

*If:* The ‘if’ direction is a little trickier, and
depends in part on the details of how the sequential significance
tests for ML-randomness are defined. For details, see Li and
Vitányi (2008: 221–3) (as well as Chaitin 1987). The
general idea is to show that for a sequence ω which is not
ML-random, and fails some sequential test, then there is a way of
constructing a Turing machine that will compress initial subsequences
of ω arbitrarily much as the length of the initial subsequence
increases (i.e., the difference between *n* and the prefix-free
complexity of the length *n* initial subsequence of ω is
positively unbounded).