Notes to Chance versus Randomness
1. By the theorem of total probability, if Qi is the proposition that the chance of p is xi, C(p) = ∑iC(Qi) C(p|Qi). Suppose that one has arrived at one's current credence C by conditionalising a reasonable initial function on admissible evidence; then if the PP is true (and the NP is approximately true), it follows that one's credence C(p) is equal to ∑iC(Qi)xi. In other words, one's credence in p is simply one's subjective expectation of the chance of p (always assuming one has no inadmissible evidence).
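The arithmetic of this expectation can be sketched numerically; the three hypotheses and the numbers below are invented purely for illustration.

```python
# Invented example: three hypotheses Qi about the chance of heads for some coin.
credences = [0.5, 0.3, 0.2]   # C(Q1), C(Q2), C(Q3): one's credences, summing to 1
chances = [0.5, 0.7, 0.9]     # x1, x2, x3: the chance of heads according to each Qi

# By total probability plus the PP, C(heads) = sum_i C(Qi) * xi:
# one's credence in heads is one's subjective expectation of its chance.
credence_in_heads = sum(c * x for c, x in zip(credences, chances))
print(round(credence_in_heads, 4))  # 0.5*0.5 + 0.3*0.7 + 0.2*0.9 = 0.64
```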
2. The proposition expressed by the relevant utterance of the sentence ‘The coin lands heads’ exists even when the coin doesn't land heads. Can it be the bearer of the chance? It seems implausible, since the proposition exists necessarily, and is intrinsically qualitatively identical, even when the chance varies; so the chance cannot supervene on properties of this proposition.
3. This conclusion, that chanciness is a feature of a generating process, is resisted by those philosophers, like von Mises, who reject single-case chance. If von Mises' frequentism were the only theoretically adequate account of chance, there would be some force to his contention, but that view is widely believed to be inadequate as an account of chance—see the discussion at supplement A.3. Another argument offered against single-case chance is Milne's generalisation of Humphreys 1985, ‘directed against any realist single-case interpretations of probability’ (Milne 1985: 130). His argument, in essence, is that single-case conditional chances only exist when the conditioned event is not determined at the time of the conditioning event, but that this situation is very rare, and so almost all the conditional chances we make use of would be illegitimate. Milne's argument rests in part on making a close connection between chance and determinism, a controversial issue we will return to below (§7). But a more immediate possible line of objection to his argument is that Milne takes a feature of one theory of single-case chance—the causal tendency view of Giere (1973), which requires that conditional chances be understood as the chance of the conditioned event, given that the conditioning event has occurred—and applies it to theories which make no such assumption. In particular, he applies it to views according to which chances vary with time, even though Milne makes no explicit reference to any temporal parameter on the chance functions he uses in his argument. These possible responses to Milne's argument make it reasonable to conclude that single-case chance is consistent and an integral part of our ordinary beliefs about chance.
4. This does not apply to Bohmian mechanics, which is a different theory from orthodox elementary quantum mechanics, though it makes the same experimental predictions (Albert 1994, chapter ). This theory has a determinate prior state for every quantity; it reproduces the observed predictions of quantum mechanics by permitting non-local interactions between arbitrarily separated parts of entangled systems.
5. This inference shouldn't be taken too far; even fundamental theories which do not mention chances may nevertheless be true in chancy worlds—just as there are people despite the fact that our fundamental physics doesn't mention people (Eagle 2011: §2).
6. More formally, a sequence is Borel normal if the frequency of every binary string σ in the first n digits of the sequence approaches 1/2^|σ| (where |σ| is the length of σ) as n → ∞. This obviously entails the strong law of large numbers.
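A finite approximation of this condition can be checked directly; the pseudorandom sequence below is only a stand-in for a genuinely random one, and the block-counting helper is my own illustrative construction.

```python
import random
from collections import Counter

def block_frequencies(bits, k):
    """Frequency of each binary string of length k among the overlapping
    length-k blocks of `bits`."""
    n = len(bits) - k + 1
    counts = Counter(bits[i:i + k] for i in range(n))
    return {block: counts[block] / n for block in counts}

# For a Borel normal sequence, each of the 2^k strings of length k has limiting
# frequency 1/2^k; here we inspect a long pseudorandom prefix with k = 2.
random.seed(0)
bits = ''.join(random.choice('01') for _ in range(100_000))
freqs = block_frequencies(bits, 2)   # '00', '01', '10', '11' each near 1/4
```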
7. Von Mises himself gives a more general characterisation, as he is concerned to define the probability of an arbitrary type of outcome in an arbitrary sequence of outcomes, so he insists only that each type of outcome should have a well defined limit frequency in the overall sequence, and that frequency should remain constant in all admissibly selected subsequences, whether or not that frequency is ½.
8. We must restrict this proposal to include only those computable functions which take the value 1 an infinite number of times, and thus select only infinite subsequences of the original sequence. The problem with finite selection rules (those that take the value 1 only finitely often) is that every such rule will select outcomes drawn from some finite initial segment of the sequence. But, by the law of the iterated logarithm, which we return to below in connection with the law of symmetric oscillation, infinitely many initial segments of a random sequence will have more 0s than 1s (and infinitely many will have more 1s than 0s). A selection rule that draws from just the first n outcomes in a sequence, for infinitely many n, will be selecting from a subsequence with the wrong frequency, and so will infinitely often yield a finite subsequence with the wrong frequency.
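The contrast between infinite and finite selection can be illustrated concretely; the ‘select what follows a 1’ rule and the pseudorandom stand-in sequence are my own choices, not examples from the text.

```python
import random

random.seed(1)
seq = [random.randint(0, 1) for _ in range(100_000)]

# An admissible selection rule in von Mises' sense, computable from the
# outcomes seen so far: select each outcome that immediately follows a 1.
# It selects an infinite subsequence, and on a random sequence the selected
# subsequence should again have limiting relative frequency near 1/2.
selected = [seq[i] for i in range(1, len(seq)) if seq[i - 1] == 1]
freq = sum(selected) / len(selected)

# A *finite* selection rule, by contrast, only ever draws from some fixed
# initial segment, whose frequency may deviate badly from 1/2.
initial_segment_freq = sum(seq[:10]) / 10
```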
9. For more details of Ville's theorem, see van Lambalgen (1987b: 730–1, 745–8), Downey and Hirschfeldt (2010: §5.5), and Lieb et al. (2006, Other Internet Resources).
10. The law of symmetric oscillation follows immediately from the law of the iterated logarithm, a celebrated result on the asymptotic behaviour of Sn, which states that for almost all sequences the deviations of Sn from the mean (n/2) approach an asymptotic limit of (2 log log n)^½·σn, where σn is the standard deviation √n/2. The result holds for much more general kinds of random variables (Kolmogorov 1929; Feller 1945).
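The envelope this law describes is easy to compute; in the sketch below a pseudorandom walk stands in for a genuinely chancy one, and at a single moderate n its deviation typically sits well inside the bound.

```python
import math
import random

def lil_envelope(n):
    """The asymptotic bound on |S_n - n/2| from the law of the iterated
    logarithm: (2 log log n)^(1/2) * sigma_n, with sigma_n = sqrt(n)/2."""
    return math.sqrt(2 * math.log(math.log(n))) * math.sqrt(n) / 2

random.seed(2)
n = 100_000
s_n = sum(random.randint(0, 1) for _ in range(n))   # number of 1s among n tosses
deviation = abs(s_n - n / 2)
bound = lil_envelope(n)   # roughly 349.5 for n = 100000
```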
11. As an interesting aside, Kolmogorov randomness provides a good explication of Lewis' notion of a ‘quasi-miracle’, crucial to his treatment of chancy counterfactuals. Lewis says that a quasi-miracle is a low-probability event that is also remarkable:
What makes a quasi-miracle is not improbability per se but rather the remarkable way in which the chance outcomes seem to conspire to produce a pattern. (Lewis, 1979a: 60)
The classic case is that of a monkey typing Hamlet by randomly striking the keyboard; that is a remarkable event, but no more improbable than any other sequence of characters the monkey might have produced. The probability of producing a given sequence of length n is 1/2^n; this is equal for any sequence of the given length, orderly or not. So the probability does not account for the fact that different sequences of the same length can differ with respect to their remarkableness. The orderliness of a sequence σ may be defined as 1/2^C(σ); orderly sequences are such that they exhibit patterns, and for such a patterned sequence C(σ) will be low, and 1/2^C(σ) correspondingly high. We can then define the remarkableness of σ as the orderliness of σ divided by its probability: (1/2^C(σ))/(1/2^|σ|) = 2^(|σ| − C(σ)). If σ is both orderly and low-probability, it will be highly remarkable. So we might say that the occurrence of a remarkable event is a quasi-miracle. (This suggestion dovetails nicely with the revised Lewis-style approach to counterfactuals, using what amounts to a tweaked notion of quasi-miraculousness, in Williams 2008: §3.)
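Since C is uncomputable, remarkableness cannot be computed exactly, but the contrast it draws can be sketched with an off-the-shelf compressor (zlib) as a crude proxy for C; the function names and string lengths below are mine, purely for illustration.

```python
import random
import zlib

def proxy_complexity_bits(s):
    """Crude, illustrative stand-in for C(s): length in bits of the
    zlib-compressed string. (C itself is uncomputable; this proxy only
    preserves the rough contrast between orderly and disorderly strings.)"""
    return 8 * len(zlib.compress(s.encode()))

def log2_remarkableness(s):
    """log2 of the remarkableness ratio, i.e. |s| - C(s), with the proxy in
    place of C: high for strings that are both long (hence low-probability)
    and compressible (hence orderly)."""
    return len(s) - proxy_complexity_bits(s)

orderly = '01' * 500                                  # patterned 1000-digit string
random.seed(3)
disorderly = ''.join(random.choice('01') for _ in range(1000))

# Both outcomes have the same probability (1/2^1000) under fair coin tosses,
# but only the orderly one counts as 'remarkable' in the above sense.
```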
12. To see why, suppose the coding were not prefix-free. Then the code of one string σ would be a proper initial part of the code of another string, τ. Both codes begin with a block of 1s followed by the first 0 (1^|σ|0 and 1^|τ|0 respectively); so if the code of σ is an initial part of the code of τ, those blocks of 1s must coincide, and σ and τ must be the same length. But then the code of τ will be the same length as the code of σ, contrary to the assumption that the latter code is a proper part of the former code.
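A minimal sketch of this coding, prefixing each string with a block of 1s recording its length and then a separating 0, makes the prefix-freeness easy to check mechanically:

```python
def encode(s):
    """Prefix-free code for a binary string: |s| ones, a zero, then s itself."""
    return '1' * len(s) + '0' + s

def decode(code):
    """Count the leading 1s to learn |s|, skip the separating 0, then read s."""
    n = code.index('0')
    return code[n + 1:n + 1 + n]

# No codeword is a proper prefix of another: agreement on the leading block of
# 1s forces the encoded strings to have equal length, hence equal code lengths,
# so neither codeword can be a *proper* part of the other.
codes = [encode(s) for s in ('0', '1', '00', '01', '10', '11')]
proper_prefixes = [(a, b) for a in codes for b in codes
                   if a != b and b.startswith(a)]
```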
13. Note the presence in the upper bound for K(σ) of C(σ); since C is not a recursive function, this is not a computable upper bound on K (since |σ| is an upper bound on C(σ), we do in that case have a computable upper bound). There is a somewhat less well-behaved computable upper bound on K (Downey and Hirschfeldt, 2010: §2.12).
14. Rather than solve the reference class problem, von Mises proposed to sidestep it entirely, and to deny that there is any such thing as the single-case chance of a particular event:
A probability of death is attached to this class of men or to another class that can be defined in a similar way. We can say nothing about the probability of death of an individual even if we know his condition of life and health in detail. The phrase ‘probability of death’, when it refers to a single person, has no meaning at all for us. (von Mises, 1957: 11)
15. This is contrary to what some have argued: for example, Earman (1986: 143–4) argues that there is no natural way to extend Kolmogorov randomness to biased sequences, because biased sequences are Kolmogorov compressible with respect to the Lebesgue measure. But that seems to hold ‘compressibility’ to a double standard that we aren't holding ML-randomness to; the natural generalisation in the main text brings the two approaches back into parity. I'm indebted to Christopher Porter for drawing my attention to this problem for Earman, and for helpful discussion of this generalisation of Kolmogorov randomness to the case of biased sequences.
16. The Champernowne number fails the law of the iterated logarithm, for example (Dasgupta, forthcoming: §3.4).
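The binary Champernowne sequence is simple to construct, and a finite prefix already hints at the failure: since the leading digit of every numeral is a 1, prefixes carry a persistent excess of 1s, which shrinks only on the order of n/log n, far outside the (n log log n)^½ envelope of the law of the iterated logarithm. (The generator below is my own sketch of the standard construction.)

```python
from itertools import count, islice

def champernowne_bits():
    """The binary Champernowne sequence: the binary numerals for 1, 2, 3, ...
    (1, 10, 11, 100, ...) concatenated into one infinite digit stream."""
    for n in count(1):
        yield from format(n, 'b')

prefix = ''.join(islice(champernowne_bits(), 100_000))
# Every numeral starts with a 1, so finite prefixes show a marked excess of 1s
# even though the limiting frequency of 1 is 1/2 (the sequence is Borel normal).
ones_frequency = prefix.count('1') / len(prefix)
```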
17. If we represent the system by a single bi-infinite sequence, this symbol shift characterisation becomes more obvious.
18. The pseudorandom sequences we considered in §4.5 lacked randomness because the finite seed entailed repetition in the sequence after some finite period. In the baker's transformation, the initial seed is a real number, and a measure one set of reals have an infinite random binary expansion (Downey and Hirschfeldt, 2010: Part II). If the initial seed is a random real in this sense, then the product will be a random sequence, even though the process is akin to the algorithms which produce merely pseudorandom sequences (of course, since no algorithm can perfectly represent arbitrary reals, it cannot be the same algorithm).
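The point can be made concrete with the doubling map x ↦ 2x mod 1, the expanding direction of the baker's transformation, which reads off the binary expansion of its seed digit by digit; the sketch below is my own, and uses exact rational arithmetic precisely because no finite program can supply an arbitrary (random) real seed.

```python
from fractions import Fraction

def doubling_map_digits(seed, n):
    """Iterate x -> 2x mod 1 (the expanding direction of the baker's
    transformation) and record a 1 whenever the iterate lies in [1/2, 1):
    this reads off the binary expansion of the seed, digit by digit."""
    x, digits = seed, []
    for _ in range(n):
        digits.append(1 if x >= Fraction(1, 2) else 0)
        x = (2 * x) % 1
    return digits

# A computable (here, rational) seed yields an eventually periodic digit
# stream, pseudorandom at best; only a random real seed would yield a random
# sequence, and no algorithm can supply such a seed exactly.
digits = doubling_map_digits(Fraction(1, 3), 12)   # 1/3 = 0.010101... in binary
```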
19. Consider some small bundle of initial states S, and some state s0 ∈ S. Then, for some systems,
∃ε>0 ∀δ>0 ∃s0′∈S ∃t>0(|s0 − s0′| < δ ∧ |st − st′| > ε).
In fact, for many chaotic systems, all neighbouring trajectories within the bundle of states diverge exponentially fast.
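This exponential divergence is easy to exhibit numerically; the doubling map below is a standard toy chaotic system standing in for the general case, and the initial condition and perturbation size are arbitrary choices of mine.

```python
def orbit(x, steps):
    """Trajectory of x under the doubling map x -> 2x mod 1."""
    xs = [x]
    for _ in range(steps):
        x = (2 * x) % 1.0
        xs.append(x)
    return xs

# Two trajectories starting 10^-9 apart: their separation roughly doubles at
# each step (exponential divergence) until it reaches order 1.
a = orbit(0.3, 30)
b = orbit(0.3 + 1e-9, 30)
separations = [abs(p - q) for p, q in zip(a, b)]
```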
20. Not all chaotic systems are dissipative; Werndl (2009) argues that displaying mixing behaviour is precisely what chaos more generally amounts to.
21. This does not, despite first appearances, violate the law of inertia—for at every instant, the law (in the form: bodies with no net force acting on them are unaccelerated) is true. At t = t*, the body is at rest at the only place on the dome where the forces balance; and at every t > t*, the body is accelerating because at every such time the body is at a point where a net force is being exerted. (This does suggest, however, that the law of inertia in the form bodies continue in uniform motion (including rest) if not subject to force yields a misleading dynamical reading that would rule out the dome example.)
22. Norton's footnote: Since all excitation times T would have to be equally probable, the probability that the time is in each of the infinitely many time intervals, (0, 1), (1, 2), (2, 3), (3, 4),… would have to be the same, so that zero probability must be assigned to each of these intervals. Summing over all intervals, this distribution entails a zero probability of excitation ever happening.
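Norton's reasoning can be reconstructed as a one-line calculation; writing c for the common probability each unit interval would have to receive under a uniform distribution (this rendering is mine):

```latex
1 = P\bigl(T \in (0,\infty)\bigr)
  = \sum_{n=0}^{\infty} P\bigl(T \in (n, n+1]\bigr)
  = \sum_{n=0}^{\infty} c
  = \begin{cases} 0 & \text{if } c = 0,\\ \infty & \text{if } c > 0. \end{cases}
```

Since c > 0 makes the sum diverge, uniformity forces c = 0 for every interval; but then countable additivity assigns probability zero to excitation ever happening, which is Norton's conclusion.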
23. Eagle (2005: §4–5) suggests that a system is predictable iff, conditional on what we know about the past states of the system, and knowing the laws, we may have a posterior credence in future states that is closer to the truth than our prior credence (where closeness to the truth is characterised by having a more inaccuracy-minimising credence, as in Joyce 1998). Conditioning on what we know about a system, rather than everything that is true, makes unpredictability appropriately relative to us, especially given the existence of unknown truths. It follows that a system will be unpredictable to the extent that information we can obtain about the past states of the system is, or is very close to, probabilistically independent of future states, holding fixed our knowledge of the laws. Werndl (2009: §5) endorses a similar characterisation when she proposes that approximate probabilistic irrelevance is the hallmark of unpredictability, and proves that mixing systems exhibit such unpredictability. If randomness is unpredictability, then, sequences produced by random processes will be at least approximately Bernoulli, which does explain the appeal of KML-randomness. Berkowitz et al. 2006 argue that in fact an epistemic conception of randomness as unpredictability is the only way to understand the ergodic hierarchy of ergodicity, mixing, and Bernoulli properties of systems.
24. The existence of such a ‘prediction algorithm’ should not be taken to mean that the system really is predictable. We could never know that the algorithm made correct predictions until after they had come true, so they would hardly have the epistemic status that a reliable prediction should have for us, namely, to guide our future expectations.
25. An even more radical denial of universalism comes from the debate over free will. Libertarians believe that our free will is not determined by the past states of the universe, but also that our exercises of will are not purely by chance—rather, we make them happen (through the determinations of the will, not the determination of past history). If this view is coherent, it may provide a case where there is indeterminism but no chance.
26. Admittedly, showing that a sequence is vM-random is not yet to show that it is KML-random. Yet the role of effective tests in Martin-Löf's construction ensures that the non-random sequences will be effectively determined to satisfy some measure zero property. Such effective tests are specified by properties low down on the arithmetical hierarchy (ML-randomness is also known as Σ^0_1-randomness because passing a ML-randomness test can be specified as not being in the intersection of a sequence of Σ^0_1 classes), so it is plausible that some specification higher up the hierarchy (as Humphreys' is, at least Π^0_2) will define a deterministic sequence which does not violate any effective measure one property.
Notes to Supplement A. Basic Principles About Chance
1. One recent objection to PP worth noting arises from apparent direct counterexamples to the PP, derived from the existence of contingent a priori truths, which illustrate the possibility of a formal mismatch between chance functions and credence functions (Hawthorne and Lasonen-Aarnio, 2009; Williamson, 2006). This example gives the flavour of both arguments: consider the sentence ‘Actually A iff A’. This sentence is a priori true, and so should get credence 1. Yet the sentence is contingent. Suppose that A is actually true; then according to the standard logic of the ‘actually’ operator (Davies and Humberstone, 1980), ‘Actually A iff A’ is true in exactly the same possibilities as A (this is because ‘Actually A’ is necessarily true if true). So if A is a statement with a non-trivial chance, ‘Actually A iff A’ also has a non-trivial chance, differing from the credence one should have in it, which poses an obvious problem for the PP. One response to this kind of argument is to take the entities in the domain of the chance and credence functions to be propositions rather than sentences (as Lewis originally did in formulating the PP), and suggest that the problem does not arise because the contingent a priori only emerges at the level of sentences. To have different credences in A and ‘Actually A iff A’, even though they express the same proposition when A is actually true, is to violate some widely accepted (though not uncontroversial) norms on rational credence. A sentential account of the a priori (such as the two-dimensionalist account offered in Stalnaker 1978) can accept this norm, accept the PP, and nevertheless offer some kind of explanation of the apparent counterexample. Whether this sort of response succeeds remains a matter for debate.
It is also worth noting that de Finetti (1974), for one, did deny that there really are chances, hoping to do everything with exchangeable credence alone; this may not be right, but perhaps Lewis is too strong in calling this program ‘silly’.
2. Others have argued that the original puzzle only arises because even the original PP inappropriately conditionalises the credence on evidence E which includes information about the chances. Ismael (forthcoming) argues that the real principle to adopt is the following, where Ht is just the history up to t:
(UPP) C(p|Ht) = Cht(p)
This principle also is not susceptible to undermining, because one never conditionalises on the theory of chance (assuming that the past history itself does not fix the chances). One won't in general know the right-hand side of this equation; but by the theorem of total probability and general principles about current estimates of unknown quantities, it can be estimated as the sum of the chances assigned by the various possible future histories, weighted by one's credences in those histories. Ismael's final recommendation is ‘that you should adjust credence in A to your best estimate of the chances’.
3. Schaffer endorses a strengthened version of the BCP, which he calls the Realization Principle (RP): this is the claim that if the present chance of A is greater than zero, there is a world where A is true which matches ours in history and natural laws (not just laws of chance, but all laws).
4. Although there is an argument that something like the BCP can be defended on elementary grounds: It is an axiom of probability theory that tautologies should receive probability one; Hájek suggests that necessary truths also should receive probability one. This is perhaps debatable in the case of subjective probability, for perhaps it is sometimes possible to rationally not be fully confident in a necessary truth (for example, if identity statements are necessary if true, it may yet be rational on occasion to doubt a true identity claim). But in the case of objective chance, it is hard to dispute that if p is necessary, then the chance of p should be 1. By substitution, if ¬p is necessary, then Ch(¬p) = 1, and hence Ch(p) = 0. Equivalently, necessarily: if it is not possible that p, then the chance of p is zero. By contraposition, we get that if the chance of p is non-zero, p is possible. While Hájek's argument seems sound, its conclusion is notably weaker than the BCP as defended by Bigelow et al., Mellor, Schaffer, or Eagle. For Hájek's claim is just that something with a non-zero chance is true at some possible world—it says nothing about whether that possibility should be one which is systematically similar to actuality with respect to the facts that ground the actual chance. But all the other formulations do entail that the possibilities where the outcome does happen should be similar with respect to the actual past, or to the actual properties of the chance device. The BCP proper is thus stronger than the relatively trivial conclusion of Hájek's argument, and correspondingly more vulnerable to objections.
5. The downside is that his theory of ‘L*-chance’ is subject to the objections raised by Arntzenius and Hall (2003) that we discussed above.
6. Though Strevens (1999) raises an epistemological objection to the effect that, even if it is a truth about chance, the PP cannot be justified at all, on any conception of chance. His basic concern is analogous to Hume's worries about traditional principles of induction—that setting one's credences equal to the chances is rational only if one is already convinced that the high objective probability of PP leading to epistemic success justifies confidence in the PP, i.e., is justified only if one already accepts the PP. (Hall 2004 develops a more limited version of this conclusion, arguing that reductionists about chance can't justify the PP.) Hoefer (2007) responds using his particular account of chance. Others may well adapt their own preferred response to the Humean problem of induction, or simply deny that the justification of the PP matters as much to the theory of objective chance as its truth—and its truth will do to secure the frequency-chance connection needed here.
7. Popper (1959: 34) took himself to be objecting to frequentism with his example of a sequence of mixed tosses of differently biased dice, arguing that ‘the frequency theorist is forced to introduce a modification of his theory… He will now say that an admissible sequence of events (a reference sequence, a ‘collective’) must always be a sequence of repeated experiments.’ But von Mises (1957: 14) had already imposed this requirement, and indeed had already made progress towards identifying the basis of a collective with a physical property of the chance setup: ‘The probability of a 6 is a physical property of a given die and is a property analogous to its mass, specific heat, or electrical resistance. Similarly, for a given pair of dice (including of course the total setup) the probability of a “double 6” is a characteristic property, a physical constant belonging to the experiment as a whole and comparable with all its other physical properties.’