Supplement to Chance versus Randomness
B. Further Details Concerning Algorithmic Randomness
- B.1 Further Details Concerning Martin-Löf Randomness
- B.2 Lebesgue Measure and Effective Measure
Martin-Löf makes use of the notion of a statistical test. In statistical hypothesis testing, we will reject a hypothesis if an outcome is observed that is improbable (supposing the hypothesis were true) at some set significance level; in that case, the outcome is significant. Correspondingly, we may define a sequence to be significant if it is a member of an effective measure zero set. A random sequence, then, is not a significant sequence.
How are these recursive significance tests defined? We illustrate the idea using Martin-Löf's (1966: 604) example of the property of large numbers (having a digit frequency equal to ½). He restricts attention to significance levels of the form 2^−m, and suggests that we would reject a finite initial substring x1…xn at the level 2^−m if the observed relative frequency in that string differed ‘too much’ from ½. By ‘too much’ Martin-Löf means that fewer than 2^(n−m) strings which are in fact initial subsequences of a random sequence would have a relative frequency differing by that much from ½. (By the definitions in supplement B.2, this means that the string is rejected if it is a member of a set which has measure less than 2^(n−m)/2^n = 2^−m.)
An infinite sequence x is rejected at level 2^−m iff we would reject some initial string of x at that level. Finally, an infinite sequence is rejected tout court if, for every level of significance, there is an initial string of that sequence that we would reject at that level (Martin-Löf 1966: 609). A Martin-Löf random sequence is one we would not reject.
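The rejection procedure can be sketched concretely. The following toy example is our own illustration, not Martin-Löf's construction: it collects the length-n strings whose relative frequencies deviate most from ½, stopping while the region's measure is still below the significance level 2^−m.

```python
from itertools import product

def rejection_region(n, m):
    """Collect the length-n strings whose relative frequency deviates
    most from 1/2, keeping fewer than 2**(n-m) strings so that the
    region's measure stays below the significance level 2**-m."""
    by_deviation = sorted(product([0, 1], repeat=n),
                          key=lambda s: abs(sum(s) / n - 0.5),
                          reverse=True)
    budget = 2 ** (n - m)
    region = []
    for s in by_deviation:
        if len(region) + 1 >= budget:  # adding s would reach measure 2**-m
            break
        region.append(s)
    return region

region = rejection_region(n=10, m=4)
assert len(region) / 2 ** 10 < 2 ** -4  # measure of the region < 2**-4
```

A string is then rejected at level 2^−m just in case it falls in such a region, and an infinite sequence is rejected if some initial string of it is.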
B.1.2 Relations between Martin-Löf Randomness, von Mises Randomness, and other Measure Theoretic Accounts, including Schnorr Randomness
If φ is an effectively computable (recursive) place selection, the compound test of first applying φ and then some recursive significance test is also a recursive significance test (since the class of recursive functions is closed under functional composition). Therefore the class of ML-random sequences is closed under effective place selections, and contains only sequences with the right limit relative frequencies. Given the characterisation of vM-random sequences in §2.1.1, it is clear that the ML-random sequences are a subset of the vM-random sequences. Ville's result shows that this inclusion is strict, a result strengthened by a result of van Lambalgen (1987b: §4). It also follows immediately from the above claims that every admissible subsequence of an ML-random sequence is itself Borel normal. (It is not known whether the requirement that all admissible subsequences be themselves Borel normal is sufficient for ML-randomness.)
Various refinements of ML-randomness have been offered, usually by varying the conditions on what should count as an effective statistical test. Schnorr (1971), for example, restricts the effective tests to a narrower class than Martin-Löf permits, and thus has a correspondingly broader set of random sequences. For Schnorr, the effective uniform measure zero sets should be defined as the intersection of a sequence of sets whose measures are not merely bounded by computable reals (as in supplement B.2), but are themselves computable reals. Schnorr offers an argument that this requirement—that the significance tests should be total recursive, not just recursive as Martin-Löf requires—is needed for genuinely effective significance tests. Schnorr contends that only when the test is total recursive can we ‘visualize’ the property of stochasticity that is being tested for, and this is the only constraint on whether we can genuinely grasp the property (van Lambalgen 1987a: 70).
Schnorr also provides an interesting connection with the gambling motivation behind von Mises' view, as Schnorr is able to characterise both his and Martin-Löf's conception of randomness in terms of the failure of a generalised kind of betting strategy called a martingale, which generalises the place selection characterisation of betting strategies (van Lambalgen 1987a: §3.4; Martin-Löf 1969b: §9). A martingale determines a betting strategy that depends not only on the outcomes observed, but also on the accumulated winnings of the gambler employing it. (In more detail: a martingale is a function d from finite binary sequences into the non-negative reals such that d(ω) = (d(ω0) + d(ω1))/2. The value d(ω) is the expected winnings of a gambler following the strategy on the initial string ω of a given sequence, subject to the condition that the expected winnings on a string is the average of the expected winnings on its two possible one-bit extensions—in effect, the extended string that becomes actual pays double the wager placed on it. The values of d on ω, ω0, and ω1 determine the entire betting strategy.) A martingale succeeds on a sequence if the gambler's expected winnings are arbitrarily great (i.e., the limit of d as the length of the initial string increases is unbounded). A constructive martingale is one for which a computable function (of ω and some natural number t) monotonically approaches d(ω) from below as t increases. Schnorr proves that a sequence is ML-random iff no constructive martingale succeeds on it. A computable martingale is one where d itself is computable. He defines a Schnorr-random sequence as one on which no computable martingale succeeds sufficiently quickly to be effectively detected. It is clear that every ML-random sequence is Schnorr random, but not vice versa.
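The averaging condition on martingales is easy to check for a concrete betting strategy. The sketch below is our own illustration, not Schnorr's notation: a martingale that always stakes half its current capital on the next outcome being 1, where the realised extension pays double the stake.

```python
def martingale(sigma):
    """Capital after betting along the finite string sigma, starting
    from 1 and staking half the current capital on '1' each round."""
    d = 1.0
    for bit in sigma:
        stake = d / 2
        # winning pays double the stake; losing forfeits it
        d = d + stake if bit == 1 else d - stake
    return d

# fairness condition: d(w) = (d(w0) + d(w1)) / 2
w = [1, 0, 1]
assert martingale(w) == (martingale(w + [0]) + martingale(w + [1])) / 2
```

This martingale succeeds on the all-1s sequence (its capital grows without bound) but not on sequences where 1s are sufficiently rare; neither constructivity nor the ‘sufficiently quickly’ clause is modelled here.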
Schnorr motivates his ‘sufficiently quickly’ condition by observing that the idea of Martin-Löf tests is to represent statistical tests, but it ends up including many statistical tests that do not ‘have a physical meaning’ (Schnorr 1971: 255). He continues, observing that the statistical properties with a physical meaning
are intuitively those properties that can be established by statistical experience. This means that a sequence fails to be random in this sense if and only if there is an effective process in which this failure becomes evident. On the other hand, it is clear that if there is no effective process in which the failure of the sequence to be random appears, then the sequence behaves like a random sequence. Therefore the definition of random sequence has to be made in such a way that this sequence is random by definition. (Schnorr 1971: 255)
Schnorr 1971 actually discriminates a number of different randomness properties using martingales, and further definitions give rise to other natural classes of random sequences: see also Li and Vitányi 2008: §4.5.8, Dasgupta forthcoming: §11, Downey and Griffiths 2004, and Downey and Hirschfeldt 2010: ch. 7. But Schnorr proposes that, for the reasons given, his account of random sequences is more conceptually adequate; and suggests a rival to the Martin-Löf-Chaitin thesis:
Thesis: A sequence behaves within all effective procedures like a p-random sequence iff it is Schnorr p-random. (Schnorr 1977: 198)
However, the philosophical (as opposed to mathematical) rationale behind these notions is not yet clear. Schnorr's notion of visualisability, whatever its merits, is not clear enough to be decisive, especially because even arbitrary total recursive tests are still far from being graspable in any ordinary everyday sense. And Schnorr's restriction has other drawbacks; for one, there is no universal test of Schnorr randomness (Downey and Hirschfeldt 2010: §7.1). This is because, when the measures involved in a test are computable (as Schnorr requires for visualisability), one can effectively construct a computable sequence which is not rejected by that test. If there were a universal test for Schnorr randomness, there would therefore be a computable sequence that is not rejected by it—but that would be a computable random sequence, which cannot be. It is of course an intended feature of Schnorr's account that there be no universal test; but that does make the space of randomness properties a little more unmanageable on Schnorr's view. In general, one might be worried (for good reason) about Schnorr's operationalist idea that a non-random sequence must be one that can be effectively determined to be non-random: we don't normally take the existence of good evidence for a property to be equivalent to the presence of the property.
Another potential difficulty with Schnorr randomness is that it is not quite as robust as ML-randomness. The class of ML-random sequences has many apparently very different mathematical definitions (in terms of effective tests, in terms of martingales, and—as we will see in §2.3—in terms of complexity) and is thus apparently a mathematically robust notion; it is not clear that Schnorr randomness has the same robustness. There is a complexity-based account of Schnorr randomness in terms of computable prefix free machines (Downey and Griffiths 2004), as well as one in terms of computable martingales (Downey and Hirschfeldt 2010: §7.1). But ML-randomness satisfies van Lambalgen's theorem on relative randomness (van Lambalgen 1987a; Dasgupta forthcoming: §10): if x ⊕ y is a random sequence which results from interleaving two other sequences x and y, then those two sequences will be random relative to each other (that is, even complete knowledge of the outcomes in x—i.e., using x as an oracle—won't improve predictability of the outcomes in y). Schnorr random sequences do not satisfy van Lambalgen's theorem (Downey and Hirschfeldt 2010: §5.9)—in fact, there is a Schnorr random sequence such that the even outcomes compute the odd outcomes and vice versa. However, the strongest reason for suspecting that Schnorr randomness is not the correct account of random sequences is that there is a Schnorr random sequence that is not von Mises-random (Wang 1996: Theorem 3.3.5). So there is a Schnorr random sequence that does not have invariant subsequence frequencies under admissible place selections, and is therefore susceptible to exploitation by a gambling system. This seems compelling reason to think that Schnorr randomness is not conceptually adequate.
Having said this, these objections to Schnorr randomness open the door to similar objections to ML-randomness, though one might wonder whether the purported insufficiencies of ML-randomness are nearly as worrisome as the insufficiencies of Schnorr randomness, no matter how much Schnorr might embrace as intended some of these apparent counterexamples. But a more nuanced response, opting for neither ML-randomness nor Schnorr randomness as the ‘correct’ account of our intuitive notion of randomness, might appeal in light of some recent results. (This pluralist approach to randomness is reminiscent of the pluralist position with respect to the informal notion of logical consequence defended by Beall and Restall (2006).)
The basic idea is the familiar one that random sequences are typical, in the sense of most expected (§2.1). This notion of typicality is connected to a mathematically important style of quantification, where a given result is said to hold for almost all members of a given domain. For example, consider this theorem:
Theorem 9 (Lebesgue, via Jordan). Every real-valued function of bounded variation is differentiable almost everywhere. (Brattka, Miller, and Nies 2011: §1.1)
This theorem applies to arbitrary functions. If we restrict our attention to computable functions, we can obtain this result:
Theorem 10 (Demuth; Brattka, Miller, and Nies 2011: §3.2). Every computable real-valued function of bounded variation is differentiable at exactly those reals which are ML-random.
So, for computable functions, the Martin-Löf random reals (those represented by a ML-random sequence) provide a precise measure one set that corresponds to the ‘almost everywhere’ of the non-effective Theorem 9. The ML-random reals are exactly those which are typical with respect to this property of computable functions.
The interesting thing is that other ‘almost everywhere’ results of classical analysis do not end up holding at only the ML-random reals when we move to computable functions. For example:
Theorem 11 (Rute 2011; Pathak 2009; Pathak, Rojas, and Simpson). Every variation computable absolutely continuous function is differentiable at exactly those reals which are Schnorr random.
Thus another almost everywhere property of classical analysis, when effectivised, holds at just the Schnorr reals. In general, it is possible to characterise randomness notions (such as ML-randomness, Schnorr randomness, computable randomness, etc.) in terms of the differentiability of some appropriate class of computable functions (the ‘main thesis’ of Nies 2011). Although the different notions of randomness end up corresponding to the differentiability of slightly different classes of appropriate computable functions, in a sense there is a common core notion of typicality in play here. But unlike the case of Church's thesis, the existence of many overlapping but distinct precisifications of ‘typical with respect to differentiability of (some sort of) computable function’ seems to undermine the idea that there is any one precisification that perfectly deserves the name. On the other hand, the thesis that randomness is intimately linked to typical differentiability is not one that has much intuitive support in the first place, and a kind of pluralism with respect to the different realisers of this description may not appeal. The more direct considerations for ML-randomness, and against Schnorr randomness, in particular that ML-random sequences are strictly a subset of the Schnorr random sequences, might justify us in continuing to give some support to the Martin-Löf-Chaitin Thesis.
Standard approaches to algorithmic randomness, those summarised in the state-of-the-art volumes of Downey and Hirschfeldt (2010) and Nies (2009), are based on randomness with respect to the standard Lebesgue measure over the space of sequences (discussed in supplement B.2). The limitation this poses is that we might also ordinarily want to attribute randomness to kinds of objects other than equifrequent sequences. As discussed in §4.3, to connect chance with randomness will in particular involve consideration of other Bernoulli processes which produce sequences with unequal outcome frequencies. Also in §4.3, a generalisation of Schnorr's theorem to arbitrary computable measures was introduced. While this allows biased sequences to count as both Kolmogorov and ML-random, and looks to be a natural way to maintain a univocal notion of randomness in the face of the compressibility of biased sequences, computable measures can have some interesting features.
The feature most of interest here is that some computable measures assign positive measure to some individual infinite binary sequences. (The uniform Lebesgue measure cannot assign positive measure to any point of the Cantor space, on pain of violating the axioms of measure theory.) Any sequence which receives positive measure is called an atom of the measure under discussion. Examples are easily constructed: consider the computable measure which assigns measure 1/2 to the individual sequence consisting only of 0s, and divides the remaining measure uniformly over the other sequences. It turns out that the only individual sequences which can get such positive probability are the computable sequences: a sequence is an atom of a computable measure if and only if the sequence is computable (Kautz 1991). But this seems to pose a problem, for an individual sequence can be random with respect to a measure λ, because it has positive probability (and therefore is a member of every λ-measure 1 set of sequences), even though it is computable. Atoms are thus sometimes referred to as ‘pseudorandom’ sequences, not because they mimic a Lebesgue-random sequence, but because they are functionally like random sequences with respect to their preferred measure, yet are in some sense not ‘really’ random because of their computability. Such pseudorandom sequences might pose a problem for the generalisation of Kolmogorov complexity to biased sequences, because it seems that the generalisation ends up classifying as random some atoms, which are apparently not random because they are computable.
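The atomic measure just described can be written down directly. In this sketch (the helper name `mu` is our own label), the measure of a basic set N(σ) is computed by putting half the mass on the all-0s sequence and spreading the other half uniformly:

```python
from fractions import Fraction

def mu(sigma):
    """Measure of N(sigma) under the computable measure giving the
    all-0s sequence probability 1/2, with the remaining 1/2 spread
    uniformly (Lebesgue-style) over the Cantor space."""
    uniform = Fraction(1, 2 ** len(sigma))
    atom_part = Fraction(1, 2) if all(b == 0 for b in sigma) else Fraction(0)
    return atom_part + Fraction(1, 2) * uniform

# additivity over one-bit extensions, and total mass 1
assert mu([]) == 1
assert mu([0, 0]) == mu([0, 0, 0]) + mu([0, 0, 1])
```

Every cylinder around the all-0s sequence retains measure above 1/2, which is why that computable sequence ends up counting as ‘random’ with respect to this measure.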
There is an interesting result that pushes back against this unwelcome conclusion (Downey and Hirschfeldt 2010: §6.12). It turns out that for every computable measure λ, if σ is random with respect to that measure and is not an atom of λ (i.e., is not itself a computable sequence), then there is an ML-random sequence σ′ that is in the same Turing degree as σ (a result due to Levin and Zvonkin (1970) and Kautz (1991)). A Turing degree, for our purposes, is a set of sequences such that for any pair of sequences σ1 and σ2 in the set, there exists a Turing machine equipped with an oracle concerning σ1 that computes σ2. (An oracle concerning a sequence is a special state in a Turing machine that allows that machine, having computed a string x, to obtain a ‘yes/no’ answer concerning whether the x-th member of the sequence has value 1; equivalently, conceiving of the sequence as coding a set of natural numbers, whether or not x is a member of the set—see Li and Vitányi 2008: 38.) So if a given λ-random sequence is of the same Turing degree as an ML-random sequence, it is possible to effectively recover an unbiased random sequence (one that is ML-random) from an uncomputable biased sequence (one that is λ-random). Arguably, this shows that it is ‘at least as “hard”’ (Li and Vitányi 2008: 44) to determine the content of the λ-random sequence as the genuine ML-random sequence (and vice versa). In that sense, the non-computable λ-random sequences are at least as disorderly as the ML-random sequences, and having admitted that the latter capture the notion of randomness for a sequence tolerably well, we should also admit that the former do so as well.
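The notion of oracle computation in play can be sketched in miniature. In the toy below (all names are our own), a ‘sequence’ is modelled as a function from indices to bits, and an oracle machine computes the even-position subsequence of x by querying the oracle for x; effective transformations of this kind, running in both directions, are what witness membership in a common Turing degree:

```python
def even_part(oracle, n):
    """Compute bit n of a derived sequence by querying the oracle:
    'is bit 2n of the underlying sequence equal to 1?'"""
    return oracle(2 * n)

# a stand-in computable 'sequence': bit i is 1 iff i is divisible by 3
x = lambda i: 1 if i % 3 == 0 else 0
derived = [even_part(x, n) for n in range(6)]
assert derived == [1, 0, 0, 1, 0, 0]
```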
One way of thinking about this result is that random sequences are those non-computable sequences that are random with respect to a computable measure, for those sequences are in some sense equivalent to the sequences that are random with respect to the Lebesgue measure. The non-redundancy of the clause requiring the non-computability of the sequence is due to the existence of atoms in measures other than the Lebesgue measure, but that those atoms have non-zero measure may be regarded as an unintended side-effect.
Another way of thinking about it would involve excluding atomic computable measures from the domain of legitimate probability functions over the space of sequences. One way of doing this would be to argue that proper chance functions are such that sequences with the same frequencies should have the same measure. Since any atomic infinite sequence shares a frequency with the countably many sequences that differ from it in only finitely many places, imposing that condition would exclude atomic measures. We could then classify the random sequences as those that are random with respect to some proper measure (where the propriety of a measure is justified in some way from the idea that genuine chances that underlie some particular sequences should be proper in the sense just introduced). This approach secures a notion of randomness for biased sequences, but at the cost—noted in the main text—of putting a connection with chance in by hand. If that connection with chance is not presupposed, the exclusion of atoms looks unmotivated, except by the result connecting ML-random sequences with the uncomputable λ-random sequences. And there is reason to think that this approach won't work in any case: some recent work suggests that every point in the Cantor space is random with respect to some atomless measure, with the exception of those points falling in a poorly understood countable set known as NCR1 (Reimann and Slaman 2008). So not every atomless measure will be proper; what the precise role of atomlessness is with respect to λ-randomness is yet to be made entirely clear.
B.2 Lebesgue Measure and Effective Measure
Let X be the set of all sequences of outcomes of a binary process. Let X<ω be the set of finite sequences of outcomes of a binary process, or strings. If x ∈ X, let sx(i) be the i-th member of sequence x. If σ is a finite sequence of outcomes, let N(σ) be the set of all (finite or infinite) x ∈ X which begin with σ.
We now define a uniform Lebesgue measure μ on subsets of the Cantor space (Dasgupta forthcoming: §2; Earman 1986: 138–9; Gowers 2008: III.55). We begin by assigning measures to the sets N(σ), called the basic sets. Note that half of the possible outcome sequences begin with 1, and half with 0; and of those that begin with 1, half begin 11, the other half 10; and so on. In general, where |σ| denotes the length of the string σ, the proportion of all possible outcome sequences that begin with σ—the measure of N(σ), denoted μ(N(σ))—is 1/2^|σ|. We now extend this measure to the full space. An open set is defined to be an arbitrary union of basic sets. A basic set N(σ) is maximally contained in a set A if N(σ) ⊆ A and for every proper initial segment τ of σ, it is not the case that N(τ) ⊆ A. It turns out that every open set G is a union of non-overlapping (‘disjoint’) basic sets that are maximally contained in G. The measure of a union of disjoint sets is the sum of the measures of the sets, so if the open set G decomposes into disjoint basic sets N(σ1), N(σ2), …, then μ(G) = ∑i μ(N(σi)).
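The measure of basic sets, and its additivity over a disjoint decomposition, can be verified directly. A minimal sketch (the function name is ours; Python strings stand in for the strings σ of the text):

```python
from fractions import Fraction

def mu_basic(sigma):
    """Measure of the basic set N(sigma): 1 / 2**|sigma|."""
    return Fraction(1, 2 ** len(sigma))

# the open set G = N("1") ∪ N("01") decomposes into disjoint basic
# sets, so its measure is the sum of their measures
mu_G = mu_basic("1") + mu_basic("01")
assert mu_G == Fraction(3, 4)
```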
A set E is said to have measure zero if there is an infinite sequence of open sets G1, G2, … such that (i) E is contained in each Gi (equivalently, E ⊆ ∩iGi); and (ii) μ(Gi) ≤ 2^−i. (The idea is that a small set should be coverable by an open set which is also small. Conditions which do the same job as (ii) can be formulated in various ways, with various exponents; it is convenient to have the measures of the open sets diminish quickly.) A set E is measurable iff it can be approximated arbitrarily closely by the union of finitely many basic sets. (More precisely, if for any ε > 0, there is a finite union of basic sets F such that the symmetric difference between E and F can be covered by a set of measure less than ε. The symmetric difference is the set of elements which are in exactly one of E and F; it is the ‘mismatch’ between the two sets.)
The measurable sets form a σ-algebra, which is a set with the nice properties of being closed under complementation, union, and intersection. Moreover, it can be shown that the measure μ can be naturally and uniquely extended to the set of all measurable sets, where the measure defined has corresponding nice properties (so that, e.g., the measure of the complement of E is 1 − μ(E), if E ⊆ F then μ(E) ≤ μ(F), etc.). The measure so-defined is the Lebesgue (uniform) probability measure on the Cantor space.
Each individual x ∈ X forms a singleton {x} of measure zero, since if σn is the string consisting of the initial n elements of x, then x ∈ N(σn) for all n. Each N(σn) is an open set, because it is a basic set. Moreover, there are infinitely many such sets, and for each n, μ(N(σn)) = 2^−n. So the singleton set satisfies the conditions for a measure zero set. (A countable union of measure zero sets also has measure zero, so all countable sets of sequences are also of measure zero.)
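The covering argument for the singleton can be made concrete. This sketch (notation ours) takes the nested basic sets N(σn) along an arbitrary sequence and checks that their measures satisfy condition (ii) of the measure zero definition:

```python
x = "0100110101"  # an initial segment of an arbitrary sequence

# G_n = N(x1...xn): an open (indeed basic) set containing x, of
# measure 2**-n, so condition (ii) mu(G_n) <= 2**-n holds exactly
cover_measures = [2.0 ** -n for n in range(1, len(x) + 1)]
assert all(m <= 2 ** -n for n, m in enumerate(cover_measures, start=1))
```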
We now apply to measure theory some ideas from the theory of computability. A set is recursively enumerable iff it is the range of a (total or partial) recursive function on the natural numbers (Boolos et al. 2003: §8.3). Since the recursive functions are just those which are effectively computable, by the Church-Turing thesis, we may also call these sets computably enumerable. We defined an open set as an arbitrary union of basic sets; we now define an effective open set as a union of basic sets which are members of a recursively enumerable set. There are only countably many recursive functions, so there are only countably many recursively enumerable sets of basic sets. Since there are countably many basic sets, there are uncountably many sets of basic sets. So while there are uncountably many open sets, there are only countably many effective open sets; the latter notion is much more restrictive. A sequence of sets G1, G2, … is uniformly effective open if there is a recursive function f from pairs of natural numbers (m, n) to basic sets such that each Gm = ∪n N(f(m, n)). That is, the sequence is uniformly effective open if not only is each member effective open, but there is a single recursive function that enumerates, for each member of the sequence, the set of basic sets of which it is the union.
Adapting the earlier definition in the obvious way, a set E is said to have effective measure zero if there is an infinite sequence of uniformly effective open sets G1, G2, … such that (i) E is covered by each Gi; and (ii) μ(Gi) ≤ 2^−i. (The idea is that an effectively small set should be coverable by an effective open set which is also small.) A set is effective measure one iff its complement is effective measure zero.
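The uniformity requirement can be illustrated with a single recursive function f witnessing that a computable sequence has effective measure zero. In this sketch (the fixed sequence, and the choice of repeating the same basic set for every n, are our own assumptions):

```python
def x(i):
    """A fixed computable sequence: alternating 0s and 1s."""
    return i % 2

def f(m, n):
    """Recursive function enumerating basic sets: for every n, the
    m-th cover G_m is just N(x1...xm), of measure 2**-m <= 2**-m."""
    return tuple(x(i) for i in range(m))

# G_m = union over n of N(f(m, n)); here each union is a single basic
# set containing x, so {x} has effective measure zero
assert f(3, 1) == (0, 1, 0)
assert f(3, 99) == f(3, 1)  # the enumeration is uniform in m and n
```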