1. We write variables $$X$$, $$Y$$, etc., in italics and their (binary) realizations $$\r{X}$$, $$\neg\r{X}$$, etc. in roman script.

2. Whereas theorists of probabilistic causality viewed events as the causal relata, throughout this article, we use variables, which can represent a range of relata. We write binary and numerical variables $$V$$ in italics and their instantiations $$\r{V}$$ and $$\neg\r{V}$$ in roman letters.

3. Reichenbach’s Principle of the Common Cause (1956: 163) states that if two events are probabilistically dependent but neither is the cause of the other, the dependence must be explained by a common cause. This principle has been generalized to the causal Markov condition in graphical approaches. Malinas (2001: 277) falsely claims that Simpson’s Paradox yields counterexamples to Reichenbach’s principle, since any probabilistic dependency will be screened off by an indefinite number of partitioning variables, many of which are not common causes. But the principle does not entail that all screening-off variables are common causes.
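The screening-off relation at issue can be stated in a standard form (the variable names $$A$$, $$B$$, $$C$$ are ours): a common cause $$C$$ screens off its joint effects $$A$$ and $$B$$ when, conditional on $$C$$, they are probabilistically independent.

```latex
% C screens off A from B:
p(\r{A} \wedge \r{B} \mid \r{C}) = p(\r{A} \mid \r{C})\, p(\r{B} \mid \r{C})
% equivalently, given C, learning A is irrelevant to B:
p(\r{B} \mid \r{A} \wedge \r{C}) = p(\r{B} \mid \r{C})
```

Malinas’s observation is that many variables satisfy these equations without being common causes; the reply above is that Reichenbach’s principle never asserts the converse.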

4. Skyrms (1980) gives the weaker requirement that causes must raise the probabilities of their effects in some contexts and lower them in none. In the econometrics literature, this assumption has been studied extensively under the label “monotonicity” (Imbens & Angrist 1994).
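Skyrms’s weaker requirement can be rendered formally for a partition of background contexts $$K_1, \ldots, K_n$$ (the notation is ours, not Skyrms’s): a cause $$C$$ of $$E$$ must raise the probability of $$E$$ in at least one context and lower it in none.

```latex
\exists i:\; p(\r{E} \mid \r{C} \wedge \r{K}_i) > p(\r{E} \mid \neg\r{C} \wedge \r{K}_i)
\quad\text{and}\quad
\forall i:\; p(\r{E} \mid \r{C} \wedge \r{K}_i) \geq p(\r{E} \mid \neg\r{C} \wedge \r{K}_i)
```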

5. In practical contexts, knowing the average effect of a treatment in a population may be of limited use for determining how an individual in the population would respond to the treatment. But this is an issue with averages generally, and does not make average effects any less genuinely causal (cf. Hausman 2010). See Pearl (2000 [2009: 396–400]) for further results about the quantitative relationship between the effects in populations and the effects for individuals.

6. Here (and below) the qualification “typically” is a placeholder for “assuming the causal Faithfulness condition” (see Weinberger 2018).

7. Note that while the back-door criterion is sufficient for identifiability, it is not necessary. For example, Pearl’s front-door criterion (2000 [2009: 82]) licenses identifiability in certain scenarios in which one cannot block all back-door paths. In such cases (and many others) the probabilistic formula identifying the effect will be more complicated, and the relationship between the effect in the population and in its subpopulations will be less transparent from the formula alone.
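The contrast in complexity can be seen by placing the two adjustment formulas side by side (with generic variables $$X$$, $$Y$$, $$Z$$; in the back-door case $$Z$$ is a set of variables blocking all back-door paths, in the front-door case $$Z$$ is a mediator satisfying the front-door conditions):

```latex
% back-door adjustment:
p(y \mid \do(x)) = \sum_{z} p(y \mid x, z)\, p(z)
% front-door adjustment:
p(y \mid \do(x)) = \sum_{z} p(z \mid x) \sum_{x'} p(y \mid x', z)\, p(x')
```

The back-door formula is a simple average of conditional probabilities over subpopulations; the front-door formula nests a second summation, which is why the population–subpopulation relationship is harder to read off.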

8. Here we follow Pearl in assuming that in decision-theoretic contexts, actions should be modeled as interventions. See Stern (2019) for critical discussion.

9. Whether sampling assumptions in fact have “nothing” to do with causality is non-trivial, and has not been addressed in the literature. For instance, if, following Malinas (2001), the variables $$T$$, $$R$$, and $$M$$ refer to letters on balls in an urn in the same proportions as in table 1, then sorting the balls into two urns based on whether they have an $$M$$ or not prior to drawing from one of the urns would be probabilistically equivalent to intervening on $$M$$.
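Since Table 1 is not reproduced here, the equivalence can be illustrated with a hypothetical population of balls (the counts below are invented solely to exhibit a Simpson-style reversal): sorting the balls by their $$M$$ letter and then drawing from the $$M$$-urn reproduces exactly the distribution obtained by conditioning on $$M$$ in the full population, which in this structureless sampling setup is what intervening on $$M$$ would amount to.

```python
import random

# Hypothetical (T, R, M) counts, chosen to exhibit a Simpson-style
# reversal; they are NOT the actual Table 1 figures.
counts = {
    (1, 1, 1): 18, (1, 0, 1): 12,   # T, M:        18/30 recover
    (0, 1, 1): 7,  (0, 0, 1): 3,    # not-T, M:     7/10 recover
    (1, 1, 0): 2,  (1, 0, 0): 8,    # T, not-M:     2/10 recover
    (0, 1, 0): 9,  (0, 0, 0): 21,   # not-T, not-M: 9/30 recover
}
population = [ball for ball, n in counts.items() for _ in range(n)]

# Malinas's sorting procedure: place the M-balls in their own urn
# before drawing.
m_urn = [ball for ball in population if ball[2] == 1]

rng = random.Random(0)
draws = [rng.choice(m_urn) for _ in range(100_000)]
freq_T = sum(ball[0] for ball in draws) / len(draws)

# Conditioning on M in the whole population yields the same value:
p_T_given_M = sum(ball[0] for ball in m_urn) / len(m_urn)  # 30/40 = 0.75
print(freq_T, p_T_given_M)
```

The sampled frequency of $$\r{T}$$ converges on $$p(\r{T}\mid \r{M})$$, as expected; whether this licenses reading the sorting procedure as an intervention on $$M$$ is exactly the non-trivial question the footnote raises.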

10. Fitelson (2017: 305 fn. 17) conjectures that there are cases in which $$T$$ has a minor influence on $$M$$, and thus $$p(\r{T}\mid \do (\r{M})) \neq p(\r{T})$$, but where Simpson’s reversals would still seem paradoxical. These test cases suggest a basis for empirically distinguishing between Fitelson’s and Pearl’s explanations of the paradox.

11. Cases such as this one, in which certain factors change systematically with time, need to be modeled using non-stationary time series. The methods for establishing probabilistic associations among non-stationary time series are distinct from those for stationary ones, and thus pose additional problems for causal inference from probabilities (Hoover 2003). These methods are beyond the scope of the present entry.

12. The discussion here follows Sober (2000 [2018]).