## Notes to Counterfactuals

1. See Declerck and Reed (2001: 99) and Brée (1982). See also Fintel (1999) for a closely related definition.

2.
D. Lewis (1973b: 3) attributes
(4)
and
(3),
and this observation, to Adams (1970).
But the actual pair in Adams (1970) is
*if Oswald hadn’t shot Kennedy, Kennedy would be alive
today* and *if Oswald didn’t shoot Kennedy, Kennedy is
alive today*, and the observation made there is that the
subjunctive is *justified* while the indicative is not. Adams
is careful to say that this does not imply that one is true while the
other is false.

3.
Where “*V*” is a variable ranging over un-tensed
verbs.

4.
Those labels are used across languages to distinguish two broad
functional categories of verbal mood that indicate whether the
speaker is committing to the occurrence of the event described by that
verb (Palmer 1986:
§1.1.2)—much as verbal tense indicates whether that
event occurred in the past, present or future. Indicative indicates
the clause is being committed to, while the subjunctive is
noncommittal (it often includes imperatives, optatives,
interrogatives). While *were*-conditionals such as
(7)
could be said to have an antecedent in the subjunctive mood, the same
cannot be said of
(4),
which is formally indicative past perfect. Further, some languages
have a widely used subjunctive mood, but do not employ it in the
relevant conditionals (Palmer 1986; Iatridou
2000). Many linguists working on non-Indo-European languages
use the labels “realis” and “irrealis” in a
related but different way (Palmer 1986:
§§6-7). Stone (1997: 8)
suggests terminology implicit in typological work: *remote* and
*vivid* modality.

5. For surveys on indicative conditionals see the complementary entry Indicative Conditionals and Gillies (2012). For a survey of subjunctive conditionals see Fintel (2012).

6.
This entry will use \(\phi,\psi\) as variables ranging over any
sentences of the language, *A* as a variable ranging over atomic
sentences, and \(\mathsf{A,{\ldots},Z}\) as particular atomic
sentences.

7. Some incompatibilists instead boldly reject the forms of determinism and indeterminism that lead to this conflict. This could be seen as providing an alternative account of what is physically possible so as to make (9) true.

8. See Kant (1781: A533/B560–A558/B586) and Smilansky (2000) for something like this pragmatic view, and Pereboom (2014: 176–178) for criticisms of it.

9. As Marr (1982) puts it: an abstract mathematical theory of how cash registers work leaves open what system of numerical representation is used (binary, Arabic, Roman) and what algorithms are employed to manipulate those representations to perform arithmetic operations.

10. See Interpretations of Probability (§1) for details about the probability calculus.

11. Pearl (2009) uses “=” instead of “\(\dequal \)”, but this can obscure the fact that this is an asymmetric relation: the left-hand side is determined by the right.

12. The corresponding joint probability distribution requires storing \(2^8=256\) probability values—one for each Boolean combination of the variables—while this Bayesian network would require only 18—one conditional probability for each Boolean combination of the parent variables, and one for each of the two independent variables. See Sloman (2005: Ch.4) and Pearl (2009: Ch.1) for details.
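
The arithmetic behind this comparison can be sketched as follows. This is only an illustration: the eight variable names and the parent structure in `HYPOTHETICAL_PARENTS` are assumptions made for the example, not the network discussed in the entry. They were chosen so that there are two parentless variables and the count comes out at 18.

```python
def joint_table_size(n_vars):
    """Number of Boolean combinations of n_vars binary variables."""
    return 2 ** n_vars

def network_parameter_count(parents):
    """One conditional probability per Boolean combination of each
    variable's parents; a parentless variable contributes 2**0 = 1."""
    return sum(2 ** len(ps) for ps in parents.values())

# Hypothetical parent structure (an assumption for illustration only).
HYPOTHETICAL_PARENTS = {
    "A": [],          # independent variable
    "B": [],          # independent variable
    "C": ["A"],
    "D": ["B"],
    "E": ["C", "D"],
    "F": ["E"],
    "G": ["E"],
    "H": ["F", "G"],
}

print(joint_table_size(len(HYPOTHETICAL_PARENTS)))
print(network_parameter_count(HYPOTHETICAL_PARENTS))
```

The saving comes from each variable depending only on its parents, so the count grows with the number of parents per variable rather than with the total number of variables.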

13. Spirtes, Glymour, and Scheines (1993, 2000), Pearl (2000, 2009), and Halpern and Pearl (2005a,b) instead treat the equations as representing the “basic mechanisms” or laws of a causal system. This interpretation is best construed as a non-reductive analysis of causation, rather than analyzing causation in terms of basic counterfactuals. The entry Causation and Manipulability describes how such a view fits into manipulationist theories of causation and the entry Probabilistic Causation describes how it fits into probabilistic theories of causation.

14. While Wilson (2018) explicitly interprets the structural equations as basic counterfactuals, Schaffer (2016) is less clear on this point. It may be better to read Schaffer (2016) as taking those equations to be basic grounding claims. However, as with causation, there are good reasons to view these equations as counterfactuals (Hitchcock 2001 and Woodward 2002, 2003).

15. K. Bennett (2017: §3.3) rejects a counterfactual theory of building relations while taking causation and grounding to be kinds of building relations.

16. For Woodward (2003: §5.6), explanations need not involve laws of nature. They only need to involve “invariants” like the relationships represented in a system of structural equations.

17. See Ichikawa (2011), K. Lewis (2016, 2017), and Ippolito (2016) for further discussion of the context-sensitivity of counterfactuals.

18.
Rather than a *sphere* of accessibility, Kripke
(1963) uses an accessibility relation
\(R(w,w')\). Accessibility spheres will fit more smoothly with the
presentation here and can be defined in terms of an accessibility
relation: \(R(w)\dequal {\{w'\mid R(w,w')\}}\).
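
As a minimal sketch of this definition, assuming a finite toy model in which the accessibility relation is stored as a set of ordered pairs (the world names and the function name `sphere` are hypothetical):

```python
# Toy accessibility relation R(w, w'), given as a set of ordered pairs.
# The world names are hypothetical and chosen only for illustration.
R = {("w0", "w0"), ("w0", "w1"), ("w1", "w2")}

def sphere(relation, w):
    """Return R(w) = {w' | R(w, w')}: the worlds accessible from w."""
    return {target for (source, target) in relation if source == w}

print(sphere(R, "w0"))  # the sphere around w0
print(sphere(R, "w2"))  # empty: nothing is accessible from w2 here
```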

19.
Here, *v* is an atomic valuation which assigns every atomic
sentence to exactly one truth-value in each possible world. Atomic
valuations correspond to one line in a truth-table in classical
logic.

20. I would like to thank Gabriel Greenberg for allowing me to use this (modified) version of a diagram from one of his unpublished papers.

21. I would like to thank Gabriel Greenberg for allowing me to use this (modified) version of a diagram from one of his unpublished papers.

22.
Auxiliary assumption: if
\({\llbracket}\phi_2{\rrbracket}^R_v\subseteq{\llbracket}\phi_1{\rrbracket}^R_v\)
then \(\phi_2>\phi_1\) is true at any *w* in \(R,v\). Suppose
that the antecedent of
antecedent monotonicity
holds so (a) \(\phi_1>\psi\) is true at some \(w,R,v\) and (b)
\({\llbracket}\phi_2{\rrbracket}^R_v\subseteq{\llbracket}\phi_1{\rrbracket}^R_v\).
From (b) and the auxiliary assumption it follows that
\(\phi_2>\phi_1\) is true at \(w,R,v\). By
Transitivity,
\(\phi_2>\psi\) follows.

23. Auxiliary assumption: if \(\phi>\psi_1\) is true and

\[{\llbracket \psi_1 \rrbracket}^R_v \subseteq {\llbracket \psi_2 \rrbracket}^R_{v}, \]then \(\phi>\psi_2\) is true. Suppose that the antecedent of antecedent monotonicity holds so

\[\tag{a} \phi_1>\psi \textrm{ is true at some } w,R,v\]and

\[\tag{b} {\llbracket \phi_2 \rrbracket}^R_{v} \subseteq {\llbracket\phi_1 \rrbracket}^R_{v}. \]\(\neg\psi>\neg\phi_1\) follows from (a) by Contraposition. From (b) it follows that

\[{\llbracket \neg\phi_1 \rrbracket}^R_v \subseteq {\llbracket\neg\phi_2 \rrbracket}^R_v,\]since

\[W-{\llbracket \phi_1 \rrbracket}^R_v \subseteq W-{\llbracket \phi_2 \rrbracket}^R_v\]follows from (b) and the set-theoretic fact that

\[A\subseteq B \iff (W-B)\subseteq (W-A).\]From this and the auxiliary assumption \(\neg\psi>\neg\phi_2\) follows. By Contraposition again, \(\phi_2>\psi\) follows.

24. Peirce (1896: 33) attributes this view to Philo the Logician, a member of the early Hellenistic Dialectical School. However, Bobzien (2011: §3.1) presents Philo as a material implication theorist. For these historical issues see Sanford (1989: Ch.2), Copeland (2002), and Zeman (1997).

25. Saying that \(\phi\supset\psi\) is true throughout \(R(w)\) is equivalent to saying that \(\psi\) is true throughout the \(\phi\)-worlds in \(R(w)\). More formally:

\[\tag{a} R(w)\subseteq ((W-{\llbracket \phi \rrbracket}^R_v) \cup {\llbracket\psi\rrbracket}^R_v)\]holds if and only if

\[\tag{b} (R(w)\cap{\llbracket\phi\rrbracket}^R_v) \subseteq {\llbracket\psi\rrbracket}^R_v.\]Suppose (a) and that \(w'\in R(w)\cap{\llbracket\phi\rrbracket}^R_v\). Then \(w'\in R(w)\) and by (a)

\[w'\in((W-{\llbracket\phi\rrbracket}^R_v) \cup {\llbracket\psi\rrbracket}^R_v).\]This entails \(w'\in{\llbracket\psi\rrbracket}^R_v\): after all, \(w'\in{\llbracket\phi \rrbracket}^R_v\), in which case

\[w'\notin(W-{\llbracket\phi\rrbracket}^R_v).\]Thus (b) follows from (a). Now suppose (b) and that \(w'\in R(w)\). Either \(w'\in{\llbracket\phi \rrbracket}^R_v\) or \(w'\notin{\llbracket\phi \rrbracket}^R_v\). Suppose the former. Then

\[w'\in R(w)\cap{\llbracket\phi \rrbracket}^R_v.\]So by (b) \(w'\in{\llbracket\psi \rrbracket}^R_v\) and so

\[w'\in((W-{\llbracket\phi \rrbracket}^R_v) \cup {\llbracket\psi \rrbracket}^R_v).\]Suppose the latter. Then \(w'\in(W-{\llbracket\phi \rrbracket}^R_v)\) and so

\[w'\in((W-{\llbracket\phi \rrbracket}^R_v) \cup {\llbracket\psi \rrbracket}^R_v).\]Thus (a) follows from (b).
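
The equivalence of (a) and (b) can also be checked by brute force on a small model. The sketch below is a sanity check rather than a substitute for the proof: it assumes a three-world set `W` and treats \(R(w)\) and the two propositions as arbitrary subsets of it. The function names are illustrative.

```python
from itertools import combinations

def subsets(worlds):
    """Yield every subset of `worlds` as a frozenset."""
    items = list(worlds)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def equivalence_holds(W):
    """Check (a) <=> (b) for every choice of R(w), [[phi]], [[psi]] within W."""
    W = frozenset(W)
    for R_w in subsets(W):
        for phi in subsets(W):
            for psi in subsets(W):
                a = R_w <= ((W - phi) | psi)   # (a): R(w) within (W - phi) or psi
                b = (R_w & phi) <= psi         # (b): phi-worlds in R(w) are psi-worlds
                if a != b:
                    return False
    return True

print(equivalence_holds({0, 1, 2}))
```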

26. For this see the entry Modal Logic.

27. For a more exhaustive study of the logic of strict conditionals see Cresswell and Hughes (1996: Ch. 11). For conditionals generally, see The Logic of Conditionals and Nute (1980b).

28. See Gillies (2007: 335 fn10).

29.
The contemporaneity of Sprigge, Lewis, and Stalnaker is stated in a
letter from Lewis to Sprigge published in Sprigge
(2006). According to Nute
(1975b: 773n3), Nute
(1975a) was accepted before the
appearance of D. Lewis (1973b), but
Nute (1975a) discusses the
already-published Stalnaker (1968) and
Stalnaker and Thomason (1970). I am
indebted to a discussion of these issues by Marcus Arvan, Jessica
Wilson, David Balcarras, Benj Hellie, and Christopher Gauker on the
blog *Philosophers’ Cocoon* as part of the Campaign for
Better Citation and Credit-Giving Practices in Philosophy
sub-blog.

30.
See D. Lewis (1973b: §2.7) for a
translation from set selection functions to using system-of-spheres
formulations. Stalnaker (1968) uses a
**world** selection function which by definition requires
uniqueness.
It also requires positing an “absurd world” to return
when *p* is contradictory. Nute
(1975b) uses set selection functions, and Nute
(1975a: 777) contends that his
formulation does not require the limit assumption, and therefore
contradicts D. Lewis (1973b: 58), who
says that the system-of-spheres approach is more general than the set
selection approach because the latter requires the limit assumption.
This technical issue needs further investigation.

31.
Walters (2014) and Morreau
(2009) try to break this stalemate, in
favor of similarity analyses. But their arguments are not completely
decisive. While Morreau’s (2009:
447–448) counterexample to
Transitivity
differs from Stalnaker’s (1968)
(27),
it comes down to whether or not it is true to assert *If it had
rained, there would have been an ordinary rain shower, not a
thunderstorm* in a context where a thunderstorm is a real, but
unlikely, possibility. My intuitions on this aren’t clear. Even
if it is intuitively true, a strict theorist can say it is due to
accommodating a presupposition: that we can rule out the unlikely
possibility. Walters (2014) asks us to
consider a context where I went to a show and my view was obstructed,
and I didn’t see it. Intuitively,
(32a)
is true. Walters (2014) argues that
(32b)
must be true because the consequent is true and the antecedent and
consequent are independent of each other. Obviously,
(32c)
is not true, although it follows from
(32a)
and
(32b)
by
Transitivity.

- (32) a. If I had been an inch taller than I actually am, I would have seen the show.
- b. I would not have been an inch taller than I actually am if I had seen the show.
- c. If I had been an inch taller than I actually am, I would not be an inch taller than I actually am.

But I have a hard time hearing (32b) as true, precisely because (32a)’s truth makes it hard to regard my being an inch taller and my seeing the show as independent.

32. On a similarity analysis, when the nested counterfactual \(\phi_2>\psi\) is evaluated in \(\phi_1>(\phi_2 >\psi)\), it is free to select \(\neg\phi_1\)-worlds. So \(\phi_1>(\phi_2>\psi)\) will not guarantee that all of the most similar \(\phi_1\land\phi_2\)-worlds are \(\psi\)-worlds.

33. Champollion, Ciardelli, and Zhang (2016) experimentally confirm that this is a robust intuition across English speakers.

34. A counterexample to SNCA is also mentioned by Willer (2017: §4.2), and attributed to an anonymous referee. But (39) brings out the intuition more robustly.

35. The only secondary literature on Stalnaker’s (1984: Ch.7) projection strategy is Pendlebury (1989), who argues that it leads to a crucially different semantics for counterfactuals.

36.
As Francis Fairbairn (p.c.) has pointed out to me, the
counterexamples proposed by Kment (2006)
and Wasserman (2006) assume that
Lewis’ second constraint requires maximizing not just *the
continuous* spatio-temporal region of exact match before a small
miracle, but match over subsequent regions discontinuous with the
initial one too. It is somewhat difficult to see how that
interpretation of Lewis’ second constraint would adequately
address the original future similarity objection.

37. Veltman (2005) does not include (42b). Presumably it just ensures that there are both \(\mathsf{R\land W}\) and \(\mathsf{R\land\neg W}\) worlds.

38. Situations are an essential part of both Veltman (1985) and Kratzer (1989, 2012), although Kratzer (1989, 2012) offers a much more intricate theory of situations. See the entry Situations in Natural Language Semantics.

39. The fact that you didn’t bet determines the fact that you didn’t win, so \({\langle \mathsf{B},0\rangle}\) and \({\langle \mathsf{W},0\rangle}\) cannot both be in a basis for \(w_2\). But neither fact determines whether the coin came up heads, and whether the coin came up heads does not determine whether you bet, or whether you won.

40. See the entry Logic and Probability for a detailed discussion of probabilistic tools.

41. Loewer (2007) finesses this issue, but only by restricting attention to counterfactuals that have an explicit probability value in the consequent.

42. While Hawthorne (2005) has criticized probabilistic approaches for not validating Agglomeration

\[\phi>\psi_1, \phi>\psi_2 \textrm{ therefore } \phi>(\psi_1\land\psi_2)\]this pattern is validated by theories like Adams (1976) and Leitgeb (2012a,b).

43. Although this kind of reasoning is sometimes referred to as “backtracking”, the conditional (48c) is not a backtracker in the sense of D. Lewis (1979): the consequent event does not occur before the antecedent event—they are simultaneous.

45. A moment’s thought reveals the staggering complexity here. For simple subject-predicate sentences there are at least 12 tense-aspect combinations. So even ignoring conditionals containing conditionals, there are at least 144 combinations to investigate in a conditional sentence that combines two sentences. The most comprehensive study for English is Declerck and Reed (2001: §5.7.5), which finds nine importantly distinct tense combinations in counterfactuals—and this excludes many variants with modals or special verb forms in antecedent and consequent.

46.
If the *if*’s involved were mere homonyms then all
conditionals should admit of both interpretations, which they do not.
And, we should not expect to find the same particles used for the two
constructions across unrelated languages, but we do.

47.
There are parallel examples for *were* and simple past tense
subjunctives (Iatridou 2000).

48.
Fintel (1999: §1) adds the
requirement that the consequent have an overt modal like
*might* or *would*. This definition of the
indicative/subjunctive distinction is quite useful in English, but
there is a question whether it is sufficiently cross-linguistically
stable. For example, Bittner (2011)
considers conditionals in the tenseless language Kalaallisut, where
counterfactuals do not bear any morphological affinity with past
tense. There, counterfactuals look just like indicatives except for
the inclusion of a remote modality suffix *-galuar* in both
antecedent and consequent.

49. This is the version preferred by Schulz (2007, 2014) and Starr (2014), while Iatridou (2000) prefers a stronger position: subjunctive antecedents evoke a scenario which is assumed to be partly incompatible with what is being assumed about the actual world in the discourse. The most comprehensive study of the possibilities here is Fintel (1999), although that study is more broadly construed so as to be applicable to Past Modality approaches: what do indicatives presuppose about the possibilities they describe?

50. Roughly, re-categorization is the adaptation of an old form to a new use which exploits some similarity between the uses. Since the uses are not identical, a new convention evolves and hence a new morpheme is born.

51. See also the King Ludwig example in Kratzer (1989: 640). For a detailed discussion of these examples see Schulz (2007: §5.6).

52. Stalnaker (1984: 129) puts it this way:

This is the requirement that if one possible world is selected over another relative to one antecedent, then it must be favored relative to any antecedent for which both are eligible.

But it takes some work to see how this is a paraphrase of uniformity.

53. For LT, one can get from the second premise to the conclusion by SSE if \((\phi_1\land\phi_2)>\phi_1\) and \(\phi_1>(\phi_1\land\phi_2)\) can be established. The former is a logical truth, and the latter follows from the first premise. For LAS, one can likewise get from the second premise to the conclusion by SSE, given, as before, \((\phi_1\land\phi_2)>\phi_1\) and \(\phi_1>(\phi_1\land\phi_2)\).

54. CN requires adding \({\Diamond}\phi\) to cover the case where \(\phi\) is contradictory and thus both \(\phi>\psi\) and \(\phi>\neg\psi\) are vacuously true and \(\neg(\phi>\psi)\) false. \({\Diamond}\phi\) is given a similarity analysis as well, where it’s true just in case \(f(w,{\llbracket\phi \rrbracket}^f_v)\neq{\emptyset}\).
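
The vacuous case can be illustrated with a toy selection function. The sketch assumes a three-world model and a hypothetical strict similarity ranking relative to \(w_0\) (so that at most one world is ever selected). It also takes \(\phi>\psi\) to be true just in case every selected \(\phi\)-world is a \(\psi\)-world. The names `select`, `would` and `possible` are illustrative.

```python
W = frozenset({"w0", "w1", "w2"})

# Hypothetical similarity ranking relative to w0 (lower = more similar).
RANK_FROM_W0 = {"w0": 0, "w1": 1, "w2": 2}

def select(phi):
    """f(w0, phi): the phi-worlds most similar to w0; empty if phi is empty."""
    if not phi:
        return frozenset()
    best = min(RANK_FROM_W0[w] for w in phi)
    return frozenset(w for w in phi if RANK_FROM_W0[w] == best)

def would(phi, psi):
    """phi > psi at w0: every selected phi-world is a psi-world."""
    return select(phi) <= psi

def possible(phi):
    """Diamond phi at w0: the selection for phi is non-empty."""
    return len(select(phi)) > 0

contradiction = frozenset()      # a proposition true at no world
psi = frozenset({"w1"})
not_psi = W - psi

# With a contradictory antecedent both conditionals are vacuously true,
# so the negation of the first is false even though the second is true.
print(would(contradiction, psi), would(contradiction, not_psi))
print(possible(contradiction))

# With a possible antecedent, exactly one of the two conditionals holds.
phi = frozenset({"w1", "w2"})
print(possible(phi), would(phi, psi), would(phi, not_psi))
```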

55. Evaluated in \(w_0\), \(\mathsf{O>{\mathsf{Could}}(T)}\) says that \(\mathsf{{\mathsf{Could}}(T)}\) is true in the \(\mathsf{O}\)-world most similar to \(w_0\), call it \(w_1\). Either \(w_1\) is a \(\mathsf{T}\)-world or a \(\mathsf{\neg T}\)-world. If \(w_1\) is a \(\mathsf{T}\)-world, then \(\mathsf{O>T}\) is true. If \(w_1\) is a \(\mathsf{\neg T}\)-world, then \(\mathsf{O>\neg T}\) is true.

56.
Though it is syntactically suspect, one might try to analyze it as a
conjunction of conditionals. But this distorts the anaphoric
relations: *if John had come to the party, he would have had a
drink and it might be that if he had come to the party, he would have
liked it*. On this re-analysis, *it*’s dominant
construal is *the party*, unlike
(65).