Notes to Experimental Moral Philosophy
1. As Quine is said to have put it, “The universe is no university.”
2. Isen & Levin (1972).
3. John Stuart Mill had something like this in mind when he wrote of “experiments of living” (1977, 260–261).
4. Statistical power is the probability that a test will reject the null hypothesis (the statement that the conditions of the experiment do not differ) when the null hypothesis is false. It is therefore 1 minus the probability of a Type II error (failure to detect a relationship that is really there). Other things being equal, statistical power increases as the sample size grows.
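As a rough illustration of note 4, the sketch below approximates the power of a two-sided, two-sample z-test and shows it rising with sample size. The function name, the standardized effect size of 0.3, and the group sizes are assumptions chosen for illustration; they do not come from any study cited here, and a z-test is only one of many tests for which power can be computed.

```python
from statistics import NormalDist


def power_two_sample_z(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference between the two
    conditions (difference in means divided by the common standard
    deviation). This is an illustrative normal approximation.
    """
    std_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region.
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


# Hypothetical effect size of 0.3; power grows as each group gets larger.
for n in (20, 50, 100, 200):
    print(n, round(power_two_sample_z(0.3, n), 3))
```

When the effect size is zero, the function returns alpha, which is the Type I error rate rather than power in any useful sense.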
5. There is a good deal of exciting new work on questions often characterized as matters of public policy, but which have important moral dimensions. See, among others, Sunstein & Thaler (2008), Sunstein (2013), Gigerenzer & Muir Gray (2011), and Conly (2012). Also see the work of behavioral economists such as George Loewenstein (Kriss, Loewenstein, Wang, and Weber, 2011; Cain, Loewenstein, and Moore, 2011).
6. The distinction between moral intuitions and moral judgments is fraught. Here, we treat moral intuitions as moral seemings and moral judgments as moral beliefs.
7. See Doris & Stich (2006) and Stich & Weinberg (2001).
8. For incisive criticisms of this claim, see Banerjee, Huebner, & Hauser (2010), Sytsma & Livengood (2011), and Lam (2010).
9. Others prominently expressing concern about the bearing of experimental results such as these on philosophers' reliance on moral intuitions include Kwame Anthony Appiah (2008) and Peter Singer (2005).
10. This and related research are discussed in more detail in subsection 2.3.
11. Fiery Cushman and Liane Young (2009, 2011) have developed an alternative dual-process model for moral (and non-moral) reasoning, as has Daniel Kahneman (2011). For more on dual-system approaches see Section 3.3, below.
12. Mediation analysis attempts to determine whether one variable (the predictor) affects a second variable (the outcome) by influencing a third, mediating variable (Baron & Kenny 1986). Structural equation modeling allows the analyst to assess and compare various models relating predictors, outcomes, mediators, and moderators (Kline 2005).
13. See Nadelhoffer (2004, 2006); Knobe & Mendlow (2004); Knobe (2004a, 2004b, 2007); Pettit & Knobe (2009); Tannenbaum, Ditto, & Pizarro (2007); Beebe & Buckwalter (2010); Beebe & Jensen (2012); Alfano, Beebe, & Robinson (2012); Robinson, Stey, & Alfano (2013).
14. Such scales are named for their inventor, Rensis Likert [pronounced “LICK-urt”] (1932). The participant is presented a statement and then asked to agree or disagree with it on a numeric scale. Commonly, scales run from 1 to 7, 1 to 5, −3 to 3, or −2 to 2. Almost always, the endpoints are labeled ‘strongly disagree’ and ‘strongly agree’. Quite often, the midpoint is labeled ‘neither agree nor disagree’. Sometimes other points on the scale are labeled as well.
15. The idea that seemingly predictive and explanatory concepts might also have a normative component is not entirely original with Knobe; Bernard Williams pointed out that virtues and vices have such a dual nature (1985, 129).
16. Owen Flanagan (1991) considered some of the same evidence before Doris and Harman, but he was reluctant to draw the pessimistic conclusions they did about virtue ethics.
17. When it comes to explaining variance in behavior, the basic idea is that the statistical analysis of experimental results yields a correlation between a personality variable (such as extroversion) and a behavioral variable (such as an act of helping). Correlations range from −1 to +1. A correlation of 0 means that the individual variable is of literally no use in predicting the behavioral outcome; a correlation of 1 means that the individual variable is a perfect positive predictor; a correlation of −1 means that the individual variable is a perfect negative predictor. Actual correlations tend to be between −.3 and +.3. The amount of variance explained by a given predictor variable is the square of the correlation between that variable and the behavior in question. So, for instance, if extroversion is correlated with helping behavior at .25, then extroversion explains 6.25% of the variance in helping behavior. Although this is only one, rather simplistic, measure of explanatory power, personality variables do not look better on other measures, such as Cohen's d, eta-squared (η²), or partial eta-squared.
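The arithmetic in note 17 can be sketched in a few lines. The helper names and the small data set below are hypothetical and serve only to show how a correlation is computed and then squared; the .25 figure is the one used in the note.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


def variance_explained(r):
    """Proportion of variance explained is the square of the correlation."""
    return r ** 2


# The example from the note: a correlation of .25 explains 6.25% of the variance.
print(variance_explained(0.25))  # 0.0625

# Hypothetical scores, invented for illustration only.
extroversion = [1, 2, 3, 4, 5]
helping = [2, 1, 4, 3, 5]
r = pearson_r(extroversion, helping)
print(r, variance_explained(r))
```

Because the correlation is squared, even a correlation at the upper end of the typical range (.3) accounts for only 9% of the variance.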
18. Merritt (2000) was the first to suggest that the situationist critique could be handled by offloading some of the responsibility for virtue onto the social environment in something like this way.
19. One might hope that philosophical reflection on ethics would promote moral behavior. Eric Schwitzgebel has recently begun to investigate whether professional ethicists behave better morally than their non-ethicist philosophical peers, and claims that, on most measures, the two groups are indistinguishable (Schwitzgebel 2009; Schwitzgebel & Rust 2010; Schwitzgebel et al. 2011).
20. See, for instance, Diener, Scollon, & Lucas (2003).
21. See Schimmack & Oishi (2005) for a critical reply, which argues that chronically accessible information is a much better predictor of life satisfaction responses than temporarily accessible information, such as how many dates one went on last week.
22. See Haybron (2008).
23. See Nichols (2002; 2004, ch. 5).
24. See Kennett (2006), Roskies (2003), and Shoemaker (2011). See Strandberg & Björklund (2013) for a different sort of experimentally-motivated argument against internalism. See also Buckwalter & Turri (unpublished manuscript) for an argument that internalism and externalism draw on distinct folk notions of belief. Some doubt the relevance of empirical considerations to the debate over internalism in ethics, arguing that since internalism claims a necessary or conceptual link between moral judgments and motivation, externalism is compatible with any merely contingent connection.
25. See May (2014) for a criticism of these findings, and Kelly (2011, especially chapter 1) for a comprehensive literature review.
26. Fitting attitude theorists needn't deny this. Some experiences of disgust are not fitting attitudes upon which to ground wellbeing, they could say. And similar arguments could perhaps be deployed on behalf of the other metaethical views mentioned above. Even so, experimental moral philosophy could play a useful role, helping us to identify suspect experiences of disgust.
27. There does seem to be some potential for fine-tuning, however (Rozin 2008; Case et al. 2006).
28. 1) Moral Metaphysics: Is morality a realm of fact? 2) Moral Semantics: What is the function of moral language? How are moral terms to be defined? 3) Moral Reasons: Why be moral? What is the connection between moral principles and reasons for action, or between moral judgments and motivations to act in accord with them? What, indeed, are reasons for action and are they all grounded in our desires? 4) Moral Epistemology: How (if at all) is moral knowledge possible? Is there such a thing as good moral reasoning and if so, what are its precepts?
29. These may themselves be relevant to questions in category 1, moral metaphysics. The categories are somewhat vague, admit of a good deal of overlap, and are by no means exhaustive, but they will serve our purpose here.
30. We also leave aside important issues related to the evolution of morality (see, e.g., Trivers, 1971, and Cosmides and Tooby, 2013), itself the basis for an important challenge to moral realist (and other) views. The latter, evolutionary debunking arguments, are discussed in the entries “Morality and Evolutionary Biology” and “Moral Epistemology”.
31. See Kluckhohn (1959) and Benedict (1959).
32. Moral disagreement is still often cited in philosophical discussions of moral relativism. For example, John Cook (1999) and Richard Shweder (2012) take empirical evidence concerning apparent moral disagreement seriously.
33. Other philosophers notably recognizing the relevance of empirical inquiry to the argument from moral disagreement include Michele Moody-Adams (1997), Francis Snare (1980 and 1984), and William Tolhurst (1987).
34. For a fuller discussion see the entry in this encyclopedia by Stich and Doris, “Moral Psychology: Empirical Approaches”, Section 6.
35. Ayer (1936), who makes a similar argument, would have eschewed labeling the conclusion of such an argument as metaphysical, since he thought the issue of moral objectivity couldn't be articulated in a meaningful way. (Ayer used the word ‘metaphysical’ in a very different sense than we do here.)
36. This is Stich and Weinberg's (2001) term.
37. Frank Jackson (1998) displays what some might think is a too-cavalier attitude toward empirical investigation of moral language: “I am sometimes asked—in a tone that suggests that the question is a major objection—why, if conceptual analysis is concerned to elucidate what governs our classificatory practice, don't I advocate doing serious opinion polls on people's responses to various cases? My answer is that I do—when it is necessary. Everyone who presents the Gettier cases to a class of students is doing their own bit of fieldwork, and we all know the answer they get in the vast majority of cases.” (36–37.)
38. The scandal over replication has (rightly or wrongly) assumed such proportions recently that John Doris has taken to calling it “Repligate”.
39. There is also an ongoing controversy surrounding null-hypothesis significance testing (NHST). In a nutshell, the problem is that a p-value is a conditional probability, but not the conditional probability that one might expect. A p-value is the probability that a result at least as extreme as the one in hand would have been observed given the null hypothesis, i.e., given that nothing interesting is happening (no positive correlations, no negative correlations, no interaction effects, and so on). This is sometimes inverted by sloppy researchers and interpreters, who gloss the p-value as the probability of the null hypothesis given the observation. Symbolically, the difference is between P(observation | null) and P(null | observation). The latter, more desirable, conditional probability can be estimated using Bayesian statistical analysis, but seldom is (and there are controversies surrounding Bayesian analysis, especially the arbitrariness of prior probabilities). For an introduction to these problems, see Abelson (1997), Cohen (1994), and Wagenmakers et al. (2012).
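The gap between P(observation | null) and P(null | observation) in note 39 can be made concrete with Bayes' theorem. The sketch below is a simplification: it treats the observation as "the test came out significant" rather than as an exact p-value, and the prior probabilities, the significance level, and the power figure are all assumed values chosen for illustration.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """P(null | significant result) by Bayes' theorem.

    prior_null: assumed prior probability that the null hypothesis is true
    alpha:      P(significant | null), the significance level
    power:      P(significant | null is false)
    """
    p_sig_and_null = prior_null * alpha
    p_sig_and_alt = (1 - prior_null) * power
    return p_sig_and_null / (p_sig_and_null + p_sig_and_alt)


# Same alpha and power, different (hypothetical) priors.
for prior in (0.5, 0.9):
    posterior = prob_null_given_significant(prior, alpha=0.05, power=0.8)
    print(prior, round(posterior, 3))
```

With these assumed numbers, the probability of the null given a significant result differs from the .05 significance level and shifts considerably with the prior, which is the dependence on prior probabilities that the note flags as controversial.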
42. In a recent critique of this kind of fallacious statistical thinking, Peter Austin, Muhammad Mamdani, David Juurlink, and Janet Hux (2006) describe statistical arguments purporting to show that Canadian patients' astrological signs were often correlated with their pathologies. For instance, using the same statistical techniques favored by many experimental philosophers, one would be led to conclude that Geminis are 30% more likely to be alcoholics (p < 0.02), Scorpios have an 80% higher risk of developing leukemia (p < 0.05), and Virgo women suffer 40% more from excessive vomiting during pregnancy (p < 0.04). These are presumably statistical anomalies, not indicators of genuine health risks.
43. If something is actual, for example, then it is also possible.
44. We are here indebted to Chris Heathwood.