## Notes to Counterfactuals

1. See Declerck and Reed (2001: 99) and Brée 1982. See also Fintel 1999 for a closely related definition.

2. D. Lewis (1973b: 3) attributes (4) and (3), and this observation, to Adams (1970). But the actual pair in Adams (1970) is if Oswald hadn’t shot Kennedy, Kennedy would be alive today and if Oswald didn’t shoot Kennedy, Kennedy is alive today, and the observation made there is that the subjunctive is justified while the indicative is not. Adams is careful to say that this does not imply that one is true while the other is false.

3. Where “V” is a variable ranging over un-tensed verbs.

4. Those labels are used across languages to distinguish two broad functional categories of verbal mood that indicate whether the speaker is committing to the occurrence of the event described by that verb (Palmer 1986: §1.1.2), much as verbal tense indicates whether that event occurred in the past, present or future. Indicative indicates the clause is being committed to, while the subjunctive is noncommittal (it often includes imperatives, optatives, interrogatives). While were-conditionals such as (7) could be said to have an antecedent in the subjunctive mood, the same cannot be said of (4), which is formally indicative past perfect. Further, some languages have a widely used subjunctive mood, but do not employ it in the relevant conditionals (Palmer 1986; Iatridou 2000). Many linguists working on non-Indo-European languages use the labels “realis” and “irrealis” in a related but different way (Palmer 1986: §§6-7). Stone (1997: 8) suggests terminology implicit in typological work: remote and vivid modality.

5. For surveys on indicative conditionals see the complementary entry Indicative Conditionals and Gillies (2012). For a survey of subjunctive conditionals see Fintel (2012).

6. This entry will use $$\phi,\psi$$ as variables ranging over any sentences of the language, A as a variable ranging over atomic sentences, and $$\mathsf{A,{\ldots},Z}$$ as particular atomic sentences.

7. Some incompatibilists instead boldly reject the forms of determinism and indeterminism that lead to this conflict. This could be seen as providing an alternative account of what is physically possible so as to make (9) true.

8. See Kant (1781: A533/B560–A558/B586) and Smilansky (2000) for something like this pragmatic view, and Pereboom (2014: 176–178) for criticisms of it.

9. As Marr (1982) puts it: an abstract mathematical theory of how cash registers work leaves open what system of numerical representation is used (binary, Arabic, Roman) and what algorithms are employed to manipulate those representations to perform arithmetic operations.

10. See Interpretations of Probability (§1) for details about the probability calculus.

11. Pearl (2009) uses “=” instead of “$$\dequal$$”, but this can obscure the fact that this is an asymmetric relation: the left-hand side is determined by the right.

12. The corresponding joint probability distribution requires storing $$2^8=256$$ probability values—one for each Boolean combination of the variables—while this Bayesian network would require only 18—one conditional probability for each Boolean combination of the parent variables, and one for each of the two independent variables. See Sloman (2005: Ch.4) and Pearl (2009: Ch.1) for details.
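
The count in note 12 can be reproduced with a short sketch. The parent sets below are a purely hypothetical assumption, chosen only so that the network has eight Boolean variables, two of which are parentless; the actual structure of the network discussed in the main text is not given in this note. The names `parents`, `joint_table_size`, and `network_table_size` are illustrative.

```python
# Hypothetical parent sets (an assumption for illustration only):
# 8 Boolean variables, of which A and B have no parents.
parents = {
    "A": [], "B": [],
    "C": ["A"], "D": ["A"], "E": ["B"], "F": ["B"],
    "G": ["C", "D"], "H": ["E", "F"],
}

def joint_table_size(variables):
    """One probability value per Boolean combination of all variables."""
    return 2 ** len(variables)

def network_table_size(parent_map):
    """One conditional probability per Boolean combination of each
    variable's parents; a parentless variable contributes 2**0 = 1."""
    return sum(2 ** len(ps) for ps in parent_map.values())

full = joint_table_size(parents)       # size of the full joint distribution
compact = network_table_size(parents)  # size of the factored representation
print(full, compact)
```

Under this assumed structure the factored representation needs 18 values against 256 for the full joint distribution; a different parent structure would give a different, but typically still much smaller, count.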

13. Spirtes, Glymour, and Scheines (1993, 2000), Pearl (2000, 2009), and Halpern and Pearl (2005a,b) instead treat the equations as representing the “basic mechanisms” or laws of a causal system. This interpretation is best construed as a non-reductive analysis of causation, rather than analyzing causation in terms of basic counterfactuals. The entry Causation and Manipulability describes how such a view fits into manipulationist theories of causation and the entry Probabilistic Causation describes how it fits into probabilistic theories of causation.

14. While Wilson (2018) explicitly interprets the structural equations as basic counterfactuals, Schaffer (2016) is less clear on this point. It may be better to read Schaffer (2016) as taking those equations to be basic grounding claims. However, as with causation, there are good reasons to view these equations as counterfactuals (Hitchcock 2001 and Woodward 2002, 2003).

15. K. Bennett (2017: §3.3) rejects a counterfactual theory of building relations while taking causation and grounding to be kinds of building relations.

16. For Woodward (2003: §5.6), explanations need not involve laws of nature. They only need to involve “invariants” like the relationships represented in a system of structural equations.

17. See Ichikawa (2011), K. Lewis (2016, 2017), and Ippolito (2016) for further discussion of the context-sensitivity of counterfactuals.

18. Rather than a sphere of accessibility, Kripke (1963) uses an accessibility relation $$R(w,w')$$. Accessibility spheres will fit more smoothly with the presentation here and can be defined in terms of an accessibility relation: $$R(w)\dequal {\{w'\mid R(w,w')\}}$$.
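
As a minimal sketch of the definition in note 18, an accessibility relation can be represented as a set of ordered pairs and the sphere $$R(w)$$ computed from it. The world names and the particular relation below are hypothetical, as is the function name `sphere`.

```python
def sphere(relation, w):
    """Return R(w) = {w' | R(w, w')}, with the relation given as a set of pairs."""
    return {target for (source, target) in relation if source == w}

# A hypothetical accessibility relation over three worlds.
relation = {("w0", "w0"), ("w0", "w1"), ("w1", "w2")}

print(sphere(relation, "w0"))
print(sphere(relation, "w2"))
```

A world with no outgoing pairs, such as `w2` here, gets the empty sphere.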

19. Here, v is an atomic valuation which assigns every atomic sentence to exactly one truth-value in each possible world. Atomic valuations correspond to one line in a truth-table in classical logic.

20. I would like to thank Gabriel Greenberg for allowing me to use this (modified) version of a diagram from one of his unpublished papers.

21. I would like to thank Gabriel Greenberg for allowing me to use this (modified) version of a diagram from one of his unpublished papers.

22. Auxiliary assumption: if $${\llbracket}\phi_2{\rrbracket}^R_v\subseteq{\llbracket}\phi_1{\rrbracket}^R_v$$ then $$\phi_2>\phi_1$$ is true at any w in $$R,v$$. Suppose that the antecedent of antecedent monotonicity holds so (a) $$\phi_1>\psi$$ is true at some $$w,R,v$$ and (b) $${\llbracket}\phi_2{\rrbracket}^R_v\subseteq{\llbracket}\phi_1{\rrbracket}^R_v$$. From (b) and the auxiliary assumption it follows that $$\phi_2>\phi_1$$ is true at $$w,R,v$$. By Transitivity, $$\phi_2>\psi$$ follows.

23. Auxiliary assumption: if $$\phi>\psi_1$$ is true and

${\llbracket \psi_1 \rrbracket}^R_v \subseteq {\llbracket \psi_2 \rrbracket}^R_{v},$

then $$\phi>\psi_2$$ is true. Suppose that the antecedent of antecedent monotonicity holds so

$\tag{a} \phi_1>\psi \textrm{ is true at some } w,R,v$

and

$\tag{b} {\llbracket \phi_2 \rrbracket}^R_{v} \subseteq {\llbracket\phi_1 \rrbracket}^R_{v}.$

$$\neg\psi>\neg\phi_1$$ follows from (a) by Contraposition. From (b) it follows that

${\llbracket \neg\phi_1 \rrbracket}^R_v \subseteq {\llbracket\neg\phi_2 \rrbracket}^R_v,$

since

$W-{\llbracket \phi_1 \rrbracket}^R_v \subseteq W-{\llbracket \phi_2 \rrbracket}^R_v$

follows from (b) and the set-theoretic fact that

$A\subseteq B \iff (W-B)\subseteq (W-A).$

From this and the auxiliary assumption $$\neg\psi>\neg\phi_2$$ follows. By Contraposition again, $$\phi_2>\psi$$ follows.

24. Peirce (1896: 33) attributes this view to Philo the Logician, a member of the early Hellenistic Dialectical School. However, Bobzien (2011: §3.1) presents Philo as a material implication theorist. For these historical issues see Sanford (1989: Ch.2), Copeland 2002, and Zeman (1997).

25. Saying that $$\phi\supset\psi$$ is true throughout $$R(w)$$ is equivalent to saying that $$\psi$$ is true throughout the $$\phi$$-worlds in $$R(w)$$. More formally:

$\tag{a} R(w)\subseteq ((W-{\llbracket \phi \rrbracket}^R_v) \cup {\llbracket\psi\rrbracket}^R_v)$

holds if and only if

$\tag{b} (R(w)\cap{\llbracket\phi\rrbracket}^R_v) \subseteq {\llbracket\psi\rrbracket}^R_v.$

Suppose (a) and that $$w'\in R(w)\cap{\llbracket\phi\rrbracket}^R_v$$. Then $$w'\in R(w)$$ and by (a)

$w'\in((W-{\llbracket\phi\rrbracket}^R_v) \cup {\llbracket\psi\rrbracket}^R_v).$

This entails $$w'\in{\llbracket\psi\rrbracket}^R_v$$: after all, $$w'\in{\llbracket\phi \rrbracket}^R_v$$, in which case

$w'\notin(W-{\llbracket\phi\rrbracket}^R_v).$

Thus (b) follows from (a). Now suppose (b) and that $$w'\in R(w)$$. Either $$w'\in{\llbracket\phi \rrbracket}^R_v$$ or $$w'\notin{\llbracket\phi \rrbracket}^R_v$$. Suppose the former. Then

$w'\in R(w)\cap{\llbracket\phi \rrbracket}^R_v.$

So by (b) $$w'\in{\llbracket\psi \rrbracket}^R_v$$ and so

$w'\in((W-{\llbracket\phi \rrbracket}^R_v) \cup {\llbracket\psi \rrbracket}^R_v).$

Suppose the latter. Then $$w'\in(W-{\llbracket\phi \rrbracket}^R_v)$$ and so

$w'\in((W-{\llbracket\phi \rrbracket}^R_v) \cup {\llbracket\psi \rrbracket}^R_v).$

Thus (a) follows from (b).
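
The equivalence of (a) and (b) can also be checked by brute force on a small model. The sketch below is an illustration rather than a substitute for the proof: it assumes a hypothetical four-element set of worlds `W` and lets the accessibility sphere and the two propositions range over all of its subsets. The names `subsets`, `cond_a`, and `cond_b` are illustrative.

```python
from itertools import combinations

W = {0, 1, 2, 3}  # a hypothetical finite set of worlds

def subsets(s):
    """All subsets of s, as Python sets."""
    items = sorted(s)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def cond_a(sphere, phi, psi):
    """(a): R(w) is included in (W - [[phi]]) union [[psi]]."""
    return sphere <= ((W - phi) | psi)

def cond_b(sphere, phi, psi):
    """(b): R(w) intersected with [[phi]] is included in [[psi]]."""
    return (sphere & phi) <= psi

# Compare (a) and (b) for every choice of sphere, [[phi]] and [[psi]].
equivalent = all(
    cond_a(r, p, q) == cond_b(r, p, q)
    for r in subsets(W) for p in subsets(W) for q in subsets(W)
)
print(equivalent)
```

The check covers only this one finite model; the general claim rests on the argument given in the note.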

26. For this see the entry Modal Logic.

27. For a more exhaustive study of the logic of strict conditionals see Cresswell and Hughes (1996: Ch. 11). For conditionals generally, see The Logic of Conditionals and Nute 1980b.

28. See Gillies (2007: 335 fn10).

29. The contemporaneity of Sprigge, Lewis, and Stalnaker is stated in a letter from Lewis to Sprigge published in Sprigge (2006). According to Nute (1975b: 773n3), Nute (1975a) was accepted before the appearance of D. Lewis (1973b), but Nute (1975a) discusses the already-published Stalnaker (1968) and Stalnaker and Thomason (1970). I am indebted to a discussion of these issues by Marcus Arvan, Jessica Wilson, David Balcarras, Benj Hellie, and Christopher Gauker on the blog Philosophers’ Cocoon as part of the Campaign for Better Citation and Credit-Giving Practices in Philosophy sub-blog.

30. See D. Lewis (1973b: §2.7) for a translation from set selection functions to using system-of-spheres formulations. Stalnaker (1968) uses a world selection function which by definition requires uniqueness. It also requires positing an “absurd world” to return when p is contradictory. Nute (1975b) uses set selection functions, and Nute (1975a: 777) contends that his formulation does not require the limit assumption, and therefore contradicts D. Lewis (1973b: 58), who says that the system-of-spheres approach is more general than the set selection approach because the latter requires the limit assumption. This technical issue needs further investigation.

31. Walters (2014) and Morreau (2009) try to break this stalemate, in favor of similarity analyses. But their arguments are not completely decisive. While Morreau’s (2009: 447–448) counterexample to Transitivity differs from Stalnaker’s (1968) (27), it comes down to whether or not it is true to assert If it had rained, there would have been an ordinary rain shower, not a thunderstorm in a context where a thunderstorm is a real, but unlikely, possibility. My intuitions on this aren’t clear. Even if it is intuitively true, a strict theorist can say it is due to accommodating a presupposition: that we can rule out the unlikely possibility. Walters (2014) asks us to consider a context where I went to a show and my view was obstructed, and I didn’t see it. Intuitively, (32a) is true. Walters (2014) argues that (32b) must be true because the consequent is true and the antecedent and consequent are independent of each other. Obviously, (32c) is not true, although it follows from (32a) and (32b) by Transitivity.

(32)
a. If I had been an inch taller than I actually am, I would have seen the show.
b. I would not have been an inch taller than I actually am if I had seen the show.
c. If I had been an inch taller than I actually am, I would not be an inch taller than I actually am.

But, I have a hard time hearing (32b) as true, precisely because (32a)’s truth makes it hard to regard being an inch taller and me seeing the show as independent.

32. On a similarity analysis, when the nested counterfactual $$\phi_2>\psi$$ is evaluated in $$\phi_1>(\phi_2 >\psi)$$, it is free to select $$\neg\phi_1$$-worlds. So $$\phi_1>(\phi_2>\psi)$$ will not guarantee that all most similar $$\phi_1\land\phi_2$$-worlds are $$\psi$$-worlds.

33. Champollion, Ciardelli, and Zhang (2016) experimentally confirm that this is a robust intuition across English speakers.

34. A counterexample to SNCA is also mentioned by Willer (2017: §4.2), and attributed to an anonymous referee. But (39) brings out the intuition more robustly.

35. The only secondary literature on Stalnaker’s (1984: Ch.7) projection strategy is Pendlebury (1989) who argues that it leads to a crucially different semantics for counterfactuals.

36. As Francis Fairbairn (p.c.) has pointed out to me, the counterexamples proposed by Kment (2006) and Wasserman (2006) assume that Lewis’ second constraint requires maximizing not just the continuous spatio-temporal region of exact match before a small miracle, but match over subsequent regions discontinuous with the initial one too. It is somewhat difficult to see how that interpretation of Lewis’ second constraint would adequately address the original future similarity objection.

37. Veltman (2005) does not include (42b). Presumably it just ensures that there are both $$\mathsf{R\land W}$$ and $$\mathsf{R\land\neg W}$$ worlds.

38. Situations are an essential part of both Veltman (1985) and Kratzer (1989, 2012), although Kratzer (1989, 2012) offers a much more intricate theory of situations. See the entry Situations in Natural Language Semantics.

39. The fact that you didn’t bet determines the fact that you didn’t win, so $${\langle \mathsf{B},0\rangle}$$ and $${\langle \mathsf{W},0\rangle}$$ cannot both be in a basis for $$w_2$$. But neither fact determines whether the coin came up heads, and whether the coin came up heads does not determine whether you bet, or whether you won.

40. See the entry Logic and Probability for a detailed discussion of probabilistic tools.

41. Loewer (2007) finesses this issue, but only by restricting attention to counterfactuals that have an explicit probability value in the consequent.

42. While Hawthorne (2005) has criticized probabilistic approaches for not validating Agglomeration

$\phi>\psi_1, \phi>\psi_2 \textrm{ therefore } \phi>(\psi_1\land\psi_2)$

this pattern is validated by theories like Adams 1976 and Leitgeb (2012a,b).

43. Although this kind of reasoning is sometimes referred to as “backtracking”, the conditional (48c) is not a backtracker in the sense of D. Lewis (1979): the consequent event does not occur before the antecedent event—they are simultaneous.

44. For more on this see §5.2 of the entry Assertion.

45. A moment’s thought reveals the staggering complexity here. For simple subject-predicate sentences there are at least 12 tense-aspect combinations. So even ignoring conditionals containing conditionals, there are at least 144 combinations to investigate in a conditional sentence that combines two sentences. The most comprehensive study for English is Declerck and Reed (2001: §5.7.5), which finds nine importantly distinct tense combinations in counterfactuals—and this excludes many variants with modals or special verb forms in antecedent and consequent.

46. If the if’s involved were mere homonyms then all conditionals should admit of both interpretations, which they do not. And, we should not expect to find the same particles used for the two constructions across unrelated languages, but we do.

47. There are parallel examples for were and simple past tense subjunctives (Iatridou 2000).

48. Fintel (1999: §1) adds the requirement that the consequent have an overt modal like might or would. This definition of the indicative/subjunctive distinction is quite useful in English, but there is a question whether it is sufficiently cross-linguistically stable. For example, Bittner (2011) considers conditionals in the tenseless language Kalaallisut, where counterfactuals do not bear any morphological affinity with past tense. There, counterfactuals look just like indicatives except for the inclusion of a remote modality suffix -galuar in both antecedent and consequent.

49. This is the version preferred by Schulz (2007, 2014) and Starr (2014), while Iatridou (2000) prefers a stronger position: subjunctive antecedents evoke a scenario which is assumed to be partly incompatible with what is being assumed about the actual world in the discourse. The most comprehensive study of the possibilities here is Fintel (1999), although that study is more broadly construed so as to be applicable to Past Modality approaches: what do indicatives presuppose about the possibilities they describe?

50. Roughly, re-categorization is the adaptation of an old form to a new use which exploits some similarity between the uses. Since the uses are not identical, a new convention evolves and hence a new morpheme is born.

51. See also the King Ludwig example in Kratzer (1989: 640). For a detailed discussion of these examples see Schulz (2007: §5.6).

52. Stalnaker (1984: 129) puts it this way:

This is the requirement that if one possible world is selected over another relative to one antecedent, then it must be favored relative to any antecedent for which both are eligible.

But it takes some work to see how this is a paraphrase of uniformity.

53. For LT, one can get from the second premise to the conclusion by SSE if $$(\phi_1\land\phi_2)>\phi_1$$ and $$\phi_1>(\phi_1\land\phi_2)$$ can be established. The former is a logical truth, and the latter follows from the first premise. For LAS, one can get from the second premise to the conclusion by SSE, relying as before on $$(\phi_1\land\phi_2)>\phi_1$$ and $$\phi_1>(\phi_1\land\phi_2)$$.

54. CN requires adding $${\Diamond}\phi$$ to cover the case where $$\phi$$ is contradictory and thus both $$\phi>\psi$$ and $$\phi>\neg\psi$$ are vacuously true and $$\neg(\phi>\psi)$$ false. $${\Diamond}\phi$$ is given a similarity analysis as well, where it’s true just in case $$f(w,{\llbracket\phi \rrbracket}^f_v)\neq{\emptyset}$$.

55. Evaluated in $$w_0$$, $$\mathsf{O>{\mathsf{Could}}(T)}$$ says that $$\mathsf{{\mathsf{Could}}(T)}$$ is true in the $$\mathsf{O}$$-world most similar to $$w_0$$, call it $$w_1$$. Either $$w_1$$ is a $$\mathsf{T}$$-world or a $$\mathsf{\neg T}$$-world. If $$w_1$$ is a $$\mathsf{T}$$-world, then $$\mathsf{O>T}$$ is true. If $$w_1$$ is a $$\mathsf{\neg T}$$-world, then $$\mathsf{O>\neg T}$$ is true.

56. Though it is syntactically suspect, one might try to analyze it as a conjunction of conditionals. But this distorts the anaphoric relations: if John had come to the party, he would have had a drink and it might be that if he had come to the party, he would have liked it. On this re-analysis, its dominant construal is the party, unlike (65).