Epistemic Utility Arguments for Probabilism

First published Fri Sep 23, 2011; substantive revision Thu Dec 17, 2015

Our beliefs come in degrees; we believe some more strongly than others. For instance, I believe that the sun will rise tomorrow more strongly than I believe that it will rise every morning for the coming week; and I believe both of these propositions much more strongly than I believe that there will be an earthquake tomorrow in Bristol. We call the strength or the degree of our belief in a proposition our credence in that proposition. Suppose I know that a die is to be rolled, and I believe that it will land on six more strongly than I believe that it will land on an even number. In this case, we would say that there is something wrong with my credences, for if it lands on six, it lands on an even number, and I ought not to believe a proposition more strongly than I believe any of its logical consequences. This is a consequence of a popular doctrine in the epistemology of credences called Probabilism, which says that our credences at a given time ought to satisfy the axioms of the probability calculus (given in detail below). Since this says something about how our credences ought to be rather than how they in fact are, we call this an epistemic norm.

In this entry, we explore a particular strategy that we might deploy when we wish to establish an epistemic norm such as Probabilism. It is called epistemic utility theory, or sometimes cognitive decision theory, epistemic decision theory, or even accuracy-first or accuracy-centered epistemology. We will use the first of these names. Epistemic utility theory is inspired by traditional utility theory, so let’s begin with a quick summary of that.

Traditional utility theory (also known as decision theory, see entry on normative theories of rational choice: expected utility) explores a particular strategy for establishing the norms that govern which actions it is rational for us to perform in a given situation. Given a particular situation, the framework for the theory includes states of the world that are relevant to the situation, actions that are available to the agent in the situation, and the agent’s utility function, which takes a state of the world and an action and returns a measure of the extent to which she values the outcome of performing that action at that world. We call this measure the utility of the outcome at the world. For example, there might be just two relevant states of the world: one in which it rains and one in which it does not. And there might be just two relevant actions from which to choose: take an umbrella when you leave the house or don’t. Then your utility function will measure how much you value the outcomes of each action at each state of the world: that is, it will give the value of being in the rain without an umbrella, being in the rain with an umbrella, being with an umbrella when there is no rain, and being without an umbrella when there is no rain. With this framework in hand, we can state certain very general norms of action in terms of it. For instance, we might say that an agent ought not to perform an action if there is some other action that has greater utility than it at every possible state of the world. This norm is called Naive Dominance. We will have a lot to say about it in section 5.1 below.
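The Naive Dominance norm just stated can be illustrated with a minimal sketch. The utility numbers, and the third action "take broken umbrella", are hypothetical additions chosen only so that one action is dominated; nothing in the text fixes these values.

```python
# Hypothetical utilities for the umbrella example; the numbers and the
# "take broken umbrella" action are illustrative assumptions.
UTILITY = {
    ("take umbrella", "rain"): 5,
    ("take umbrella", "no rain"): 3,
    ("leave umbrella", "rain"): -10,
    ("leave umbrella", "no rain"): 4,
    ("take broken umbrella", "rain"): -12,
    ("take broken umbrella", "no rain"): 2,
}
STATES = ["rain", "no rain"]
ACTIONS = ["take umbrella", "leave umbrella", "take broken umbrella"]


def dominates(a, b):
    """True if action a has greater utility than action b at every state."""
    return all(UTILITY[(a, s)] > UTILITY[(b, s)] for s in STATES)


def ruled_out_by_naive_dominance():
    """Actions that some other action beats at every state of the world."""
    return [b for b in ACTIONS
            if any(dominates(a, b) for a in ACTIONS if a != b)]


print(ruled_out_by_naive_dominance())
```

On these assumed numbers, only the broken umbrella is ruled out. Neither of the other two actions dominates the other, so Naive Dominance is silent between them.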

In epistemic utility theory, the states of the world remain the same, but the possible actions an agent might perform are replaced by the possible epistemic states she might adopt, and the utility function is replaced, for each agent, by an epistemic utility function, which takes a state of the world and a possible epistemic state and returns a measure of the purely epistemic value that the agent attaches to being in that epistemic state at that state of the world. So, in epistemic utility theory, we appeal to epistemic utility to ask which of a range of possible epistemic states it is rational to adopt, just as in traditional utility theory we appeal to non-epistemic, pragmatic utility to ask which of a range of possible actions it is rational to perform. In fact, we will often talk of epistemic disutility rather than epistemic utility in this entry. But it is easy to translate between them. If \(\mathfrak{EU}\) is an epistemic utility function, then \(-\mathfrak{EU}\) is an epistemic disutility function, and vice versa.

Again, certain very general norms may be stated, such as the obvious analogue of Naive Dominance from above. Thus, before the die is rolled, we might ask whether I should adopt an epistemic state in which I believe that the die will land on six more strongly than I believe that it will land on an even number. And we might be able to show that I shouldn’t because there is some other epistemic state I could adopt instead that will have greater epistemic utility however the world turns out. In this case, we appeal to the epistemic version of Naive Dominance to show what is wrong with my credences. This is an example of how epistemic utility theory might come to justify Probabilism. As we will see, arguments just like this have indeed been given. In this entry, we explore these arguments.

1. Modelling Epistemic States

In formal epistemology, epistemic states are modelled in many different ways (see entry on formal representations of belief). Given an epistemic agent and a time \(t\), we might model her epistemic state at \(t\) using any of the following:

  • the set of propositions she believes at \(t\);
  • the set of propositions she believes at \(t\) together with an entrenchment ordering, which specifies the order in which she is prepared to abandon her beliefs in the light of conflicting evidence;
  • her credence function at \(t\), which takes each proposition about which she has an opinion and returns her credence in that proposition at \(t\);
  • a set of credence functions, each of which is a precisification of her otherwise vague or imprecise or indeterminate credences at \(t\);
  • her upper and lower probability functions at \(t\);
  • and so on.

Epistemic utility theory may be applied to any one of these ways of modelling epistemic states. Whichever we choose, we define an epistemic disutility function to be a function that takes an epistemic state modelled in this way, together with a state of the world, to a non-negative real number or the number \(\infty\), and we take this number to measure the epistemic disutility of having that epistemic state at that world.

However, the vast majority of work carried out so far in epistemic utility theory has taken an agent’s epistemic state at time \(t\) to be modelled by her credence function at \(t\). And, in any case, the epistemic norm of Probabilism that interests us here governs agents modelled in this way. Thus, we focus on this case. In section 7, we will consider how the argument strategy employed here to justify Probabilism for agents with precise credences might be employed to establish other norms either for agents also represented as having precise credences or for agents represented in other ways.

So, henceforth, we model an agent’s epistemic state at \(t\) by her credence function at \(t\). We now make more precise what this means. We assume that the set of propositions about which an agent has an opinion is finite and forms an algebra \(\mathcal{F}\). That is:

  1. It contains a contradictory proposition (\(\bot\)). This is a proposition that is false at all worlds.
  2. It contains a tautologous proposition (\(\top\)). This is a proposition that is true at all worlds.
  3. It is closed under disjunction, conjunction, and negation. That is, if \(A\) and \(B\) are in \(\mathcal{F}\), then \(A \vee B\), \(A\ \&\ B\), \(\neg A\), and \(\neg B\) are also in \(\mathcal{F}\).

We then assume that our agent’s credence in a proposition in \(\mathcal{F}\) can be measured by a real number between 0 and 1 inclusive, where 0 represents minimal credence, and 1 represents maximal credence. Then her credence function at \(t\) is a function \(c\) from \(\mathcal{F}\) to the closed unit interval \([0, 1]\). If \(A\) is in \(\mathcal{F}\), then \(c(A)\) is our agent’s credence in \(A\) at \(t\). Throughout, we denote by \(\mathcal{C_F}\) the set of possible credence functions defined on \(\mathcal{F}\). There is no principled reason for restricting to the case in which \(\mathcal{F}\) is finite. We do it here only because the majority of work on this problem has been carried out under this assumption. It is an interesting question how the results here might be extended to the case in which \(\mathcal{F}\) is infinite, but we will not explore it here (again, see section 7).

So, an epistemic utility function for credences takes a credence function, together with a way the world might be, and returns a measure of the epistemic utility of having that credence function if the world were that way.

2. The Form of Arguments in Epistemic Utility Theory

In epistemic utility theory, we attempt to justify an epistemic norm N using the following two ingredients:

  • (Q) A norm of standard utility theory (or decision theory), which is to be applied, using epistemic utility functions, to discover which credence functions it is rational for an agent to adopt in a given situation.
  • (E) A set of conditions that a legitimate measure of epistemic utility must satisfy.

Typically, the inference from Q and E to N appeals to a mathematical theorem, which shows that, applied to any epistemic utility function that satisfies the conditions E, the norm Q entails the norm N.

Given that the existing arguments of epistemic utility theory share this common form, we might organize these arguments by the norms they attempt to justify, or by the norms of standard utility theory they employ, or by the set of constraints on epistemic utility functions they impose. We will take the last of these courses in this survey.

In sections 4 and 5, we identify a specific epistemic goal and treat epistemic disutility functions as measures of the distance of an epistemic state from that goal in a given situation; we lay down conditions that it is claimed all such measures must satisfy. In section 6, we take an alternative route: we lay down putative general conditions on any epistemic disutility function, which it is claimed such a function must satisfy regardless of whether or not it is a measure of distance from a specified epistemic goal. In the next section, we state Probabilism precisely, so that we can refer back to it later.

3. The Epistemic Norm of Probabilism

Probabilism is often said to be a coherence constraint on credence functions, which would mean that it governs how an agent’s credences in some propositions should relate to her credences in other, related propositions. It is often likened to the consistency constraint on sets of full beliefs. In fact, this isn’t quite right. Condition 2 below is certainly a coherence constraint, but condition 1 is not.

Probabilism A rational agent’s credence function \(c\) at a given time is a probability function. That is:

  1. \(c(\bot) = 0\) and \(c(\top) = 1\).
  2. \(c(A \vee B) = c(A) + c(B)\), for all mutually exclusive \(A\) and \(B\) in \(\mathcal{F}\).

Note that any agent who satisfies Probabilism must be logically omniscient: that is, she must be certain of every tautology. Some other consequences of Probabilism:

  • \(c(A) \leq c(A \vee B)\) for any \(A\), \(B\) in \(\mathcal{F}\).
  • \(c(A\ \&\ B) \leq c(A)\) for any \(A\), \(B\) in \(\mathcal{F}\).
  • \(c(A) = c(B)\) if \(A\) and \(B\) are logically equivalent.
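To make the two conditions concrete, here is a minimal sketch that checks them on a small algebra. It assumes, purely for illustration, that propositions are modelled as sets of three coarse-grained worlds for the die example ("six", "other even", "odd"); the names `is_probability_function` and `from_weights` are hypothetical helpers, not part of any standard library.

```python
from itertools import combinations

# Assumed coarse-grained worlds for the die example.
WORLDS = ("six", "other even", "odd")

# The algebra is the power set of WORLDS: each proposition is the set of
# worlds at which it is true.
ALGEBRA = [frozenset(c) for n in range(len(WORLDS) + 1)
           for c in combinations(WORLDS, n)]


def is_probability_function(c, tol=1e-9):
    """Check conditions 1 and 2 of Probabilism, up to a float tolerance."""
    bottom, top = frozenset(), frozenset(WORLDS)
    if abs(c[bottom]) > tol or abs(c[top] - 1) > tol:
        return False
    for a in ALGEBRA:
        for b in ALGEBRA:
            # a and b are mutually exclusive when they share no world.
            if not (a & b) and abs(c[a | b] - (c[a] + c[b])) > tol:
                return False
    return True


def from_weights(weights):
    """Build a credence function by summing weights over worlds."""
    return {a: sum(weights[w] for w in a) for a in ALGEBRA}


coherent = from_weights({"six": 1 / 6, "other even": 1 / 3, "odd": 1 / 2})

# Raise the credence in "six" above the credence in "even".
incoherent = dict(coherent)
incoherent[frozenset({"six"})] = 0.6

print(is_probability_function(coherent))
print(is_probability_function(incoherent))
```

The second credence function is the one from the opening example: it believes "six" more strongly than "even", and it fails the additivity check because "even" is the disjunction of the mutually exclusive "six" and "other even".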

4. Calibration Arguments

In this section, we consider the conditions imposed on an epistemic disutility function when we treat it as a measure of the distance of an epistemic state from the goal of being actually or hypothetically calibrated (van Fraassen 1983; Lange 1999; Shimony 1988). We say that a credence function is actually calibrated at a particular possible world if the credence it assigns to a proposition matches the relative frequency with which propositions of that kind are true at that world. Thus, credence 0.2 in proposition \(A\) is actually calibrated if one-fifth of propositions like \(A\) are actually true. And we say that a credence function is hypothetically calibrated if the credence it assigns to a proposition matches the limiting relative frequency with which propositions of that kind would be true were there more propositions of that kind. Thus, credence 0.2 in proposition \(A\) is hypothetically calibrated if, as we move to worlds with more and more propositions like \(A\), the proportion of such propositions that are true approaches one-fifth in the limit. According to the calibration arguments, matching the relative frequencies or limiting relative frequencies is an epistemic goal. And they attempt to justify Probabilism by appealing to this goal and measures of distance from it.

4.1 Calibration measures

First, we must make precise what we mean by actual and hypothetical calibration; then we can say which functions will count as measuring distance from these putative goals. We treat actual calibration first. Since we are talking of relative frequencies, we will need to assign to each proposition in \(\mathcal{F}\) its reference class: that is, the set of propositions that are relevantly similar to it. Thus, we require an equivalence relation \(\sim\) on \(\mathcal{F}\), where \(A \sim B\) iff \(A\) and \(B\) are relevantly similar. For instance, if our algebra of propositions contains Heads on first toss of coin, Heads on second toss of coin, and Six on first roll of die, we might plausibly say that the first two are relevantly similar, but neither first nor second is relevantly similar to the third. Proponents of calibration arguments do not claim to give an account of how the equivalence relation is determined. Nor do they claim that there is a single, objectively correct equivalence relation on a given algebra of propositions: this is the notorious problem of the reference class that haunts frequentist interpretations of objective probability. Rather they treat the equivalence relation as a component of the agent’s epistemic state, along with her credence function. Indeed, for van Fraassen, it is determined entirely by the credence function together with the form of the propositions in \(\mathcal{F}\) (van Fraassen 1983: 299). However, they do impose some rational constraints on \(\sim\) in order to establish their conclusion. We will not discuss these conditions in any detail. Rather we denote them \(C(\sim)\), and keep in mind that this is a placeholder for a full account of conditions on \(\sim\). Detailed accounts of these conditions have been given by van Fraassen (1983) and Shimony (1988). We say that a credence function \(c\), together with an equivalence relation \(\sim\), is perfectly calibrated or not relative to a way the world might be. 
We are now ready to give our first definitions; but we preface these with an example.

Suppose a coin is to be flipped 1000 times. And suppose that \(A\) is the proposition Heads on toss 1. And suppose that the propositions that are relevantly similar to \(A\) in algebra \(\mathcal{F}\) are: Heads on toss 1, …, Heads on toss 1000. Finally, suppose that \(w\) is a possible world, that is, a way that the world might be. In fact, throughout this article, we need not quantify over genuine possible worlds, which are maximally specific ways the world might be; we need only quantify over ways the world might be that are specific enough to assign truth values to each of the propositions in the algebra \(\mathcal{F}\). Let’s call these possible worlds relative to \(\mathcal{F}\) and let \(\mathcal{W_F}\) be the set of them for a given algebra \(\mathcal{F}\). Then the relative frequency of \(A\) at \(w\) (written \(\mathrm{Freq}(\mathcal{F}, A, \sim, w)\)) is the proportion of the propositions relevantly similar to \(A\) that are true at \(w\): that is, the frequency of heads amongst the 1000 coin tosses at that world. For instance, if every second toss lands heads at \(w\), or if the first five hundred land heads and the rest land tails at \(w\), then \(\mathrm{Freq}(\mathcal{F}, A, \sim, w) = \frac{1}{2}\). If every third toss lands heads at \(w\), then \(\mathrm{Freq}(\mathcal{F}, A, \sim, w) = \frac{333}{1000}\), which is close to, but not exactly, \(\frac{1}{3}\). And so on.

Now we give the definition in full generality. Suppose \(\sim\) is an equivalence relation on \(\mathcal{F}\), and \(w\) is a possible world relative to \(\mathcal{F}\). Then:

  • For each \(A\) in \(\mathcal{F}\), the relative frequency of truths amongst propositions like \(A\) is defined as follows: \[\mathrm{Freq}(\mathcal{F}, A, \sim, w) := \frac{|\{ X \in \mathcal{F} : X \sim A\ \&\ v_w(X) = 1\}|}{|\{X \in \mathcal{F} : X \sim A\}|}\] where \(|X|\) is the cardinality of the set \(X\) and \(v_w\) is the standard numerical truth value assignment at that world, so that \(v_w(X) = 1\) if \(X\) is true at \(w\) and \(v_w(X) = 0\) if \(X\) is false at \(w\) (we call \(v_w\) the omniscient credence function at \(w\)). Thus, \(\mathrm{Freq}(\mathcal{F}, A, \sim, w)\) is the proportion of true propositions amongst all propositions in \(\mathcal{F}\) that are relevantly similar to the proposition \(A\).
  • Relative to \(\sim\), the credence \(r\) in proposition \(A\) is actually calibrated at \(w\) if \(r = \mathrm{Freq}(\mathcal{F}, A, \sim, w)\).
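The definition of relative frequency can be sketched for the coin example as follows. The representation is an assumption made for illustration: the reference class \(\{X \in \mathcal{F} : X \sim A\}\) is passed in as a list of proposition names, and a world is given by the set of propositions true at it. The function name `freq` is hypothetical.

```python
from fractions import Fraction


def freq(reference_class, true_at_w):
    """Proportion of propositions in the reference class that are true at w."""
    true_members = [x for x in reference_class if x in true_at_w]
    return Fraction(len(true_members), len(reference_class))


# The 1000 propositions relevantly similar to "Heads on toss 1".
tosses = [f"Heads on toss {i}" for i in range(1, 1001)]

# Two worlds from the example: every second toss lands heads, or the
# first five hundred tosses land heads and the rest land tails.
every_second = {f"Heads on toss {i}" for i in range(2, 1001, 2)}
first_half = {f"Heads on toss {i}" for i in range(1, 501)}

print(freq(tosses, every_second))  # 1/2
print(freq(tosses, first_half))    # 1/2
```

Exact fractions are used so that the result is visibly a rational number, as the definition guarantees for a finite reference class.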

The idea is that, if \(\sim\) satisfies constraints \(C(\sim)\), then the function \(\mathrm{Freq}(\mathcal{F}, \cdot, \sim, w)\) is always a probability function on \(\mathcal{F}\).

It is clear from this definition that the calibration arguments will work only for finite algebras \(\mathcal{F}\). For an infinite algebra, the definition just given will often make no sense, since the cardinalities of the two sets involved in the ratio will often be infinite.

Next, we treat hypothetical calibration. For this, we need the notion of the limiting relative frequency of truths amongst propositions of a certain sort. The idea is that, for each proposition \(A\) in \(\mathcal{F}\), there is not just a fact of the matter about what the frequency of truths amongst propositions like \(A\) actually is; there is also a fact of the matter about what the frequency of truths amongst propositions like \(A\) would be if there were more propositions like \(A\). For instance, there is not just a fact of the matter about how many actual tosses of a given coin will land heads; there is also a fact of the matter about the frequency of heads amongst hypothetical further tosses of the same coin. In general, suppose we have a possible world \(w\), an extension \(\mathcal{F}'\) of \(\mathcal{F}\) (containing new propositions like \(A\)), and an extension \(\sim'\) of \(\sim\) to cover the new propositions in \(\mathcal{F}'\). Then there is a single unique number \(\mathrm{Freq}(\mathcal{F}', A, \sim', w)\) that gives what the relative frequency of truths amongst propositions like \(A\) would be were there all the propositions in \(\mathcal{F}'\) and where the relation of similarity amongst them is given by \(\sim'\), where this counterfactual is evaluated at the world \(w\). Again, let us illustrate this using our example of the coin toss from above.

Suppose again that \(A\) is the proposition Heads on toss 1 and that the propositions in \(\mathcal{F}\) that are relevantly similar to \(A\) according to \(\sim\) are Heads on toss 1, …, Heads on toss 1000. Now suppose that \(\mathcal{F}_1\) extends \(\mathcal{F}\) by introducing a new proposition about a further hypothetical toss of the coin (as well as perhaps other propositions). That is, it introduces Heads on toss 1001 (and closes out under negation, disjunction, and conjunction). And suppose that \(\sim_1\) extends \(\sim\), so that the new proposition Heads on toss 1001 is considered relevantly similar to each Heads on toss 1, …, Heads on toss 1000. Then those who appeal to hypothetical limiting frequencies must claim that there is a unique number that gives what the frequency of heads would be, were the coin tossed 1001 times. They denote this number \(\mathrm{Freq}(\mathcal{F}_1, A, \sim_1, w)\). Now suppose that \(\mathcal{F}_2\) extends \(\mathcal{F}_1\) by adding the new proposition Heads on toss 1002 and \(\sim_2\) extends \(\sim_1\), so that the new proposition Heads on toss 1002 is considered relevantly similar to each Heads on toss 1, …, Heads on toss 1001. And so on. Then the limiting relative frequency of \(A\) at \(w\) (written \(\mathrm{LimFreq}(\mathcal{F}, A, \sim, w)\)) is the number towards which the following sequence tends: \[\mathrm{Freq}(\mathcal{F}, A, \sim, w), \mathrm{Freq}(\mathcal{F}_1, A, \sim_1, w), \mathrm{Freq}(\mathcal{F}_2, A, \sim_2, w), \ldots\]

In general, for each algebra \(\mathcal{F}\) and equivalence relation \(\sim\), there is an infinite sequence \[(\mathcal{F}, \sim) = (\mathcal{F}_0, \sim_0), (\mathcal{F}_1, \sim_1), (\mathcal{F}_2, \sim_2), \ldots\] of pairs of algebras \(\mathcal{F}_i\) and equivalence relations \(\sim_i\) such that each \(\mathcal{F}_{i+1}\) is an extension of \(\mathcal{F}_i\) and each \(\sim_{i+1}\) is an extension of \(\sim_i\) and, for each \(i\), \(C(\sim_i)\). Using this, we can define the notion of limiting relative frequency and the associated notion of hypothetical calibration in full generality. Suppose \(\sim\) is an equivalence relation on \(\mathcal{F}\) and \(w\) is a possible world. And suppose \[(\mathcal{F}_0, \sim_0), (\mathcal{F}_1, \sim_1), (\mathcal{F}_2, \sim_2), \ldots\] is the sequence just mentioned. Then:

  • For each \(A\) in \(\mathcal{F}\), the limiting relative frequency of truths amongst propositions like \(A\) is \[\mathrm{LimFreq}(\mathcal{F}, A, \sim, w) := \lim_{n \rightarrow \infty} \mathrm{Freq}(\mathcal{F}_n, A, \sim_n, w)\] That is, the limiting relative frequency of \(A\) is the number approached arbitrarily closely by the hypothetical relative frequencies of truths as we extend the algebra \(\mathcal{F}\) to include more and more propositions like \(A\).
  • Relative to \(\sim\), the credence \(r\) in proposition \(A\) is hypothetically calibrated at \(w\) if \[r = \mathrm{LimFreq}(\mathcal{F}, A, \sim, w)\]
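The limit in this definition can be illustrated with a small sketch. It assumes a hypothetical world, specified by a rule rather than by a finite list, in which toss \(k\) lands heads exactly when \(k\) is divisible by 4, and it treats the \(n\)-th algebra in the sequence as containing tosses 1 to \(n\). Both assumptions are ours, made only to have something to compute.

```python
from fractions import Fraction


def heads_on_toss(k):
    """Assumed world: toss k lands heads exactly when k is divisible by 4."""
    return k % 4 == 0


def freq_after(n):
    """Relative frequency of heads among tosses 1..n at that world."""
    heads = sum(1 for k in range(1, n + 1) if heads_on_toss(k))
    return Fraction(heads, n)


# Relative frequencies as the algebra is extended with further tosses.
for n in (1000, 1001, 1002, 10_000, 100_001):
    print(n, freq_after(n), float(freq_after(n)))
```

The frequency is exactly 1/4 after 1000 tosses, dips slightly below it when tosses 1001 and 1002 are added, and stays within \(1/n\) of 1/4 thereafter, so the limiting relative frequency at this world is 1/4.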

According to some calibration arguments, actual calibration is an epistemic goal; according to others, hypothetical calibration is the goal. Whichever it is, the epistemic disutility of a credence ought to be given by its distance from this epistemic goal. We say that an epistemic disutility function is local if it measures only the epistemic disutility of an individual credence at a world; we say that it is global if it measures the epistemic disutility of an entire credence function at a world. In this section, we will be concerned only with local epistemic disutility functions. In sections 5 and 6, we will be concerned instead with global epistemic disutility functions.

The goals of actual calibration and hypothetical calibration give rise to the following definitions of two sorts of local epistemic disutility function:

  • An actual calibration measure is a function of the form \[\mathfrak{c}(r, A, \mathcal{F}, \sim, w) = f(|\mathrm{Freq}(\mathcal{F}, A, \sim, w) - r|)\] where \(f : [0, 1] \rightarrow \mathbb{R}\) is a strictly increasing continuous function with \(f(0) = 0\). Let Actual Calibration be the claim that \(\mathfrak{c}\) is the measure of epistemic disutility.
  • A hypothetical calibration measure is a function of the form \[\mathfrak{hc}(r, A, \mathcal{F}, \sim, w) = f(|\mathrm{LimFreq}(\mathcal{F}, A, \sim, w) - r|)\] where again \(f : [0, 1] \rightarrow \mathbb{R}\) is a strictly increasing continuous function with \(f(0) = 0\). Let Hypothetical Calibration be the claim that \(\mathfrak{hc}\) is the measure of epistemic disutility.
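As a sketch of the first definition, the following computes an actual calibration measure for a single credence, given the relevant relative frequency. The default choice of \(f\) as the identity, and the alternative \(f(d) = d^2\), are illustrative assumptions; the definition requires only that \(f\) be strictly increasing and continuous with \(f(0) = 0\).

```python
from fractions import Fraction


def actual_calibration_measure(r, frequency, f=lambda d: d):
    """f(|Freq - r|), where f is strictly increasing, continuous, f(0) = 0."""
    return f(abs(frequency - r))


# Suppose half of the propositions relevantly similar to A are true at w.
half = Fraction(1, 2)

print(actual_calibration_measure(Fraction(1, 2), half))   # 0
print(actual_calibration_measure(Fraction(7, 10), half))  # 1/5
print(actual_calibration_measure(Fraction(7, 10), half, f=lambda d: d ** 2))  # 1/25
```

A hypothetical calibration measure has the same shape, with the limiting relative frequency passed in place of the relative frequency.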

Our next task is to identify the norms of standard decision theory/utility theory that are deployed in conjunction with this characterization to derive Probabilism.

4.2 Calibration arguments for Probabilism

In this section, we consider the two accounts of epistemic disutility for credences given in the previous section and we combine them with decision-theoretic norms to derive epistemic norms. When we state the decision-theoretic norms in question, we state them in full generality. In practical decision theory, we evaluate acts: it is acts that have practical disutilities at worlds. In epistemic decision theory, on the other hand, we evaluate credence functions: it is credence functions that have epistemic disutilities at worlds. And in another context still, we might wish to use decision theory to evaluate some other sort of thing, such as a scientific theory (Maher 1993). So we want to state the decision-theoretic norms in a way that is neutral between these. We will talk of options as the things that are being evaluated and that have utilities at worlds. Options can thus be acts or credence functions or scientific theories or some other sort of thing.

Here’s our first putative norm of standard decision theory (van Fraassen 1983: 297):

Possibility of Vindication A rational agent will not adopt an option that has no possibility of attaining minimal disutility, when such a minimum exists.

Here it is a little more formally: Suppose \(\mathcal{O}\) is a set of options, \(\mathcal{W}\) is the set of possible worlds, and \(\mathfrak{U}\) is a disutility function. Then, if \(o^*\) is an option, and there is no \(w^*\) in \(\mathcal{W}\) such that \[\mathfrak{U}(o^*, w^*) = \min \{\mathfrak{U}(o, w) : o \in \mathcal{O}\ \&\ w \in \mathcal{W}\}\] (when this minimum exists), then \(o^*\) is irrational.
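For finite sets of options and worlds, where the minimum always exists, the formal statement can be sketched directly. The option names, world names, and disutility values below are hypothetical and serve only to exercise the check.

```python
def unvindicable_options(disutility, options, worlds):
    """Options that attain the global minimum disutility at no world."""
    global_min = min(disutility[(o, w)] for o in options for w in worlds)
    return [o for o in options
            if all(disutility[(o, w)] > global_min for w in worlds)]


# Hypothetical disutility table: rows are options, columns are worlds.
OPTIONS = ["o1", "o2", "o3"]
WORLDS = ["w1", "w2"]
DISUTILITY = {
    ("o1", "w1"): 0, ("o1", "w2"): 9,
    ("o2", "w1"): 2, ("o2", "w2"): 2,
    ("o3", "w1"): 7, ("o3", "w2"): 0,
}

print(unvindicable_options(DISUTILITY, OPTIONS, WORLDS))  # ['o2']
```

On this table the norm declares `o2` irrational, since it never reaches the minimum of 0, even though its worst-case disutility is lower than that of the other two options.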

It can be shown that, together with Actual Calibration from the previous section and suitable constraints \(C(\sim)\) on the equivalence relation \(\sim\), this norm entails something stronger than Probabilism. It entails:

Rational-valued Probabilism At any time \(t\), a rational agent’s credence function \(c\) is a probability function that takes only values in \(\mathbb{Q}\) (where \(\mathbb{Q}\) is the set of rational numbers).

This is a consequence of the following theorem:

Theorem 1 Suppose \(\mathfrak{c}\) is an actual calibration measure and suppose \(C(\sim)\). Then the following are equivalent:

  1. \(c\) is a probability function on \(\mathcal{F}\) that takes only values in \(\mathbb{Q}\);
  2. There is a world at which \(c\) is actually calibrated. That is, there is a world \(w\) in \(\mathcal{W}\) such that, for all \(A\) in \(\mathcal{F}\), \(\mathfrak{c}(c(A), A, \mathcal{F}, \sim, w) = 0\).

Different versions of this theorem result from different constraints \(C(\sim)\) on the equivalence relation \(\sim\) (van Fraassen 1983; Shimony 1988), but the result is not surprising. An agent will satisfy Possibility of Vindication just in case her credences match the relative frequencies at some world. And those relative frequencies will satisfy the probability axioms if \(C(\sim)\) and if we have specified that condition correctly. That they will be rational numbers follows from the definition of the relative frequency of a proposition at a world.

Thus, we have the following argument:

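Actual Calibration argument for Rational-valued Probabilism

  1. Actual Calibration
  2. Possibility of Vindication
  3. Theorem 1
  4. Therefore, Rational-valued Probabilism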

Most proponents of the calibration argument are reluctant to accept a norm that rules out every credence given by an irrational number. To establish the weaker norm of Probabilism, there are two strategies they might adopt. The first is to appeal to the epistemic goal of hypothetical calibration instead of actual calibration. This, together with Possibility of Vindication, gives us Probabilism via the following theorem:

Theorem 2 Suppose \(C(\sim)\). Then the following are equivalent:

  1. \(c\) is a probability function on \(\mathcal{F}\).
  2. There is a world at which \(c\) is hypothetically calibrated. That is, there is a world \(w\) in \(\mathcal{W}\) such that, for all \(A\) in \(\mathcal{F}\), \(\mathfrak{hc}(c(A), A, \mathcal{F}, \sim, w) = 0\).

The reason is that, while relative frequencies are always rational numbers, the limit of an infinite sequence of rational numbers may be an irrational number. And, in fact, for any irrational number, there is a sequence of rational numbers that approaches it in the limit (indeed, there are infinitely many such sequences).
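A minimal sketch of this point: we construct a hypothetical sequence of tosses whose relative frequencies are all rational but whose limit is irrational. The target \(\sqrt{2} - 1\) and the construction (keep the running count of heads at \(\lfloor n x \rfloor\)) are our own illustrative choices, and the floating-point value only approximates the irrational number.

```python
import math

# An irrational target in [0, 1], about 0.41421 (float approximation).
X = math.sqrt(2) - 1


def heads_count(n):
    """Heads among tosses 1..n in an assumed world where the running
    count of heads is always floor(n * X)."""
    return math.floor(n * X)


def lands_heads(k):
    """Toss k lands heads exactly when the running count goes up by one."""
    return heads_count(k) - heads_count(k - 1) == 1


def freq_after(n):
    """Relative frequency of heads among tosses 1..n: a ratio of integers."""
    return heads_count(n) / n


for n in (10, 1000, 100_000, 1_000_000):
    print(n, heads_count(n), freq_after(n), abs(freq_after(n) - X))
```

Because \(0 < x < 1\), the running count rises by either 0 or 1 at each toss, so this is a genuine sequence of heads and tails. Each relative frequency is a ratio of integers, and it differs from \(x\) by less than \(1/n\).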

Thus, we have the following argument:

Hypothetical Calibration argument for Probabilism

  1. Hypothetical Calibration
  2. Possibility of Vindication
  3. Theorem 2
  4. Therefore, Probabilism

An alternative route to Probabilism changes the decision-theoretic norm to which we appeal, rather than the sort of calibration from which we wish our epistemic disutility function to measure distance. The alternative norm is:

Possibility of Arbitrary Closeness to Vindication. An agent ought not to adopt an option unless there are worlds at which it is arbitrarily close to achieving minimal disutility.

That is: Suppose \(\mathcal{O}\) is a set of options, \(\mathcal{W}\) is the set of possible worlds, and \(\mathfrak{U}\) is a disutility function. Then, if \(o^*\) is an option, and if it is not the case that, for any \(\varepsilon > 0\), there is a possible world \(w^*_\varepsilon\) in \(\mathcal{W}\) such that \[| \mathfrak{U}(o^*, w^*_\varepsilon) - \min \{\mathfrak{U}(o, w) : o \in \mathcal{O}\ \&\ w \in \mathcal{W}\}| < \varepsilon\] (when this minimum exists), then \(o^*\) is irrational.

Together with the characterization of calibration measures given above, suitable constraints \(C(\sim)\) on the equivalence relation \(\sim\), and two extra assumptions, this norm does establish Probabilism. The extra assumptions are these: First, if our agent has a credence function \(c\) in \(\mathcal{C_F}\), the possible worlds that we are considering include not only all (consistent) truth assignments to \(\mathcal{F}\), but also any (consistent) truth assignments to any (finite) algebra \(\mathcal{F}'\) that extends \(\mathcal{F}\). And, second, given any such \(\mathcal{F}'\), the equivalence relation \(\sim\) can be extended in any possible way, providing the extension \(\sim'\) of \(\sim\) satisfies \(C(\sim')\).

Theorem 3 Suppose \(C(\sim)\). Then the following are equivalent:

  1. \(c\) is a probability function on \(\mathcal{F}\).
  2. For all \(\varepsilon > 0\), there is a finite extension \(\mathcal{F}'\) of \(\mathcal{F}\) and an extension \(\sim'\) of \(\sim\) that satisfies \(C(\sim')\), and a possible world \(w'\) in \(\mathcal{W}\) such that, for all \(A\) in \(\mathcal{F}\), \(\mathfrak{c}(c(A), A, \mathcal{F}', \sim', w') < \varepsilon\).

Thus, if our agent satisfies Probabilism, then however close she would like to be to actual calibration, there is some possible world at which she is that close. And conversely.
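The arithmetic behind this can be sketched for a single credence. Given a credence \(r\) in \([0, 1]\) and a tolerance \(\varepsilon\), the hypothetical helper below finds a reference class size \(n\) and a number \(k\) of truths with \(|k/n - r| < \varepsilon\). This is a simplification: it ignores the constraints \(C(\sim')\) and treats one proposition at a time, whereas Theorem 3 requires closeness for every proposition in \(\mathcal{F}\) at once.

```python
import math


def close_frequency(r, eps):
    """Return (k, n) with |k/n - r| < eps, for r in [0, 1] and eps > 0."""
    n = math.ceil(1 / eps)  # a reference class of this size is large enough
    k = round(r * n)        # nearest achievable number of truths
    return k, n


# An irrational credence, for which exact actual calibration is impossible.
r = math.pi / 4

for eps in (0.1, 0.01, 0.0001):
    k, n = close_frequency(r, eps)
    print(eps, k, n, abs(k / n - r))
```

Rounding puts \(k/n\) within \(1/(2n)\) of \(r\), and \(n \geq 1/\varepsilon\) makes that at most \(\varepsilon/2\), so the gap is always below \(\varepsilon\).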

Thus, we have the following argument:

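Actual Calibration argument for Probabilism

  1. Actual Calibration
  2. Possibility of Arbitrary Closeness to Vindication
  3. Theorem 3
  4. Therefore, Probabilism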

These are the calibration arguments for Probabilism. In the next section, we consider objections that may be raised against them.

4.3 Objections to calibration arguments for Probabilism

Objection 1: Calibration is not an epistemic goal. It may be objected that neither actual nor hypothetical calibration measures are truth-directed epistemic disutility functions, where this is taken to be a necessary condition on such a function (Joyce 1998: 595; Seidenfeld 1985). We say that a local epistemic disutility function (that is, recall, an epistemic disutility function defined for individual credences) is truth-directed if the disutility that it assigns to a credence in a true proposition increases as the credence decreases, and the disutility it assigns to a credence in a false proposition increases as the credence increases. Calibration measures do not have this property. To see this, let us return to our toy example: the propositions Heads on toss 1, …, Heads on toss 1000 are in \(\mathcal{F}\) and they are all relevantly similar according to \(\sim\). Now suppose that the first coin toss lands heads, but all the others land tails. Then credence 0.001 in Heads on toss 1 is actually calibrated, since exactly one out of one thousand relevantly similar propositions is true; so it has epistemic disutility 0. Credence 0.993, on the other hand, is not, and thus receives a positive epistemic disutility. However, it is a higher credence in a true proposition, and thus should be assigned a lower epistemic disutility, according to the requirement of truth-directedness. One natural response to this objection is that it is question-begging. Proponents of the calibration argument will simply reject the claim that an epistemic disutility function must be truth-directed. Credences, unlike beliefs, they might say, are not in the business of getting close to the truth; they are in the business of getting close to being calibrated.
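The numbers in this example can be checked with a short sketch. It assumes the actual calibration measure with \(f\) taken to be the identity; any other strictly increasing \(f\) with \(f(0) = 0\) would order the two credences in the same way.

```python
from fractions import Fraction

# One true proposition (Heads on toss 1) among 1000 relevantly similar ones.
FREQ = Fraction(1, 1000)


def calibration_disutility(r):
    """Actual calibration measure with f the identity (an assumed choice)."""
    return abs(FREQ - r)


low, high = Fraction(1, 1000), Fraction(993, 1000)

print(calibration_disutility(low))   # 0
print(calibration_disutility(high))  # 124/125

# Heads on toss 1 is true, so truth-directedness would require the higher
# credence to receive the lower disutility. The measure does the opposite.
print(calibration_disutility(high) < calibration_disutility(low))  # False
```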

Objection 2: Limiting relative frequencies are not well-defined. To define the limiting relative frequency of \(A\) at a world \(w\), we require that there is a unique sequence of extensions of the algebra each of which contains more propositions that are relevantly similar to \(A\) than the previous extension, and a corresponding sequence of relative frequencies of truths amongst the propositions like \(A\) in the corresponding algebra. But the assumption of such a unique sequence is extremely controversial and the problems to which it gives rise have haunted hypothetical frequentism about objective probability (Hájek 2009).

Objection 3: Neither Possibility of Vindication nor Possibility of Arbitrary Closeness to Vindication is a norm. It might be that the only actions that give rise to the possibility of vindication or of arbitrary closeness to vindication also give rise to the possibility of maximal distance from vindication. And it might be that there are actions that do not give rise to the possibility of vindication or of arbitrary closeness to vindication, but do limit the distance from vindication that is risked by choosing that action. In such cases, it is not at all clear that it is rationally required of an agent that she ought to risk maximal distance from vindication in order to leave open the possibility of vindication or of arbitrary closeness to vindication. Compare: I have two options—if I choose option 1, I will receive £0 or £100, but I don’t know which; if I choose option 2, I will receive £99 for sure. Even before they know the objective chances of the two possibilities that the first option creates, many people will opt for the second. However, by doing so, they rule out the possibility that they will receive the maximum possible utility, which is obtained by option 1 if I receive £100. It seems that ruling out such a possibility is not irrational. To put it another way: Possibility of Vindication and Possibility of Arbitrary Closeness to Vindication are extreme risk-seeking norms. That is, they suggest that we make our decisions by trying to maximise the utility we obtain in our best-case scenario. But while it might be rationally permissible to be so risk-seeking, it is certainly not mandatory (Easwaran & Fitelson forthcoming: Section 8).
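The contrast in the monetary example can be sketched as two decision rules. This is an illustration only: it assumes that utility is simply the amount in pounds, and the labels maximax (maximise the best case) and maximin (maximise the worst case) are standard decision-theoretic names that the passage above does not itself use.

```python
# Possible payoffs in pounds for each option, taken from the example above.
options = {
    "option 1": [0, 100],  # receive either 0 or 100, unknown which
    "option 2": [99],      # receive 99 for sure
}

# Maximax: choose the option whose best-case payoff is highest.
maximax_choice = max(options, key=lambda o: max(options[o]))

# Maximin: choose the option whose worst-case payoff is highest.
maximin_choice = max(options, key=lambda o: min(options[o]))

print("maximax picks:", maximax_choice)
print("maximin picks:", maximin_choice)
```

A rule that only looks at the best case selects option 1, whereas a cautious rule selects option 2. The objection is that nothing makes the first rule rationally mandatory.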

Objection 4: The constraints on \(\sim\) are ill-motivated. This objection will vary with the constraints \(C(\sim)\) that are imposed on \(\sim\). One uncontroversial constraint is this: If \(A \sim B\), then \(c(A) = c(B)\). The further constraints imposed by Shimony (1988) and van Fraassen (1983) are more controversial (Joyce 1998: 594–6). Moreover, they limit the application of the result, since they involve assumptions about the form of the propositions in \(\mathcal{F}\). Thus, the calibration arguments do not show in general, of any finite algebra \(\mathcal{F}\), that a credence function on \(\mathcal{F}\) ought to be a probability function, since not every such algebra will contain propositions with the form required by the constraints \(C(\sim)\).

5. Accuracy Arguments

In this section, we move from calibration arguments to accuracy arguments for Probabilism. These arguments have the same structure as the calibration arguments. They consist of a mathematically-precise account of epistemic disutility and a decision-theoretic norm. And they derive, from that norm together with that account of disutility, an epistemic norm. In particular, they derive Probabilism. And that derivation goes via a mathematical theorem. However, they will use different accounts of epistemic disutility and different decision-theoretic norms.

In this section, we will begin with the original accuracy-based argument for Probabilism due to James M. Joyce (1998; see also Rosenkrantz 1981). Then we’ll consider its various components in turn, and explore the objections they have elicited and the adjustments that have been made to them.

5.1 Joyce’s accuracy argument for Probabilism

Joyce’s argument consists of an account of the epistemic disutility of credences and a decision-theoretic norm. Let’s consider each in turn.

Joyce’s account of the epistemic disutility of credences itself consists of two components. The first identifies epistemic disutility with gradational inaccuracy; the second gives a mathematically-precise account of gradational inaccuracy.

In more detail: The first component of Joyce’s account of epistemic disutility for credences is the claim (which we will call Credal Veritism, partly following Goldman (2002: 58)) that the only source of value for credences that is relevant to their epistemic status is their gradational accuracy, where the gradational accuracy of a credence in a true proposition is higher when the credence is closer to 1, which we might think of as the ideal or vindicated credence in a true proposition, while the gradational accuracy of a credence in a false proposition is higher when the credence is closer to 0, which we might think of as the ideal or vindicated credence in a false proposition. Thus, the only source of disvalue for credences is their gradational inaccuracy.

The second component of Joyce’s account of epistemic disutility for credences is a set of mathematically-precise conditions that a measure of the gradational inaccuracy of a credence function at a given possible world must satisfy. A putative inaccuracy measure for credence functions over an algebra \(\mathcal{F}\) is a mathematical function \(\mathfrak{I}\) that takes a credence function \(c\) in \(\mathcal{C_F}\) and a possible world \(w\) in \(\mathcal{W_F}\) and returns a number \(\mathfrak{I}(c, w)\) in \([0, \infty]\) that measures the inaccuracy of \(c\) at \(w\). (The set \([0, \infty]\) contains all non-negative real numbers together with \(\infty\).) Here is an example, called the Brier score: \[\mathfrak{B}(c, w) := \sum_{X \in \mathcal{F}} |v_w(X) - c(X)|^2\] Thus, the Brier score measures the inaccuracy of a credence function at a world as follows: it takes each proposition to which the credence function assigns a credence; it takes the difference between the credence that the credence function assigns to that proposition and the ideal or vindicated credence in that proposition at that world; it squares this difference; and it sums up the results. I will not give all of Joyce’s conditions here, but I will note that the Brier score just defined satisfies them all. Let us say that any putative inaccuracy measure \(\mathfrak{I}\) that satisfies these conditions is a Joycean inaccuracy measure. And let Joycean Inaccuracy be the claim that all legitimate inaccuracy measures are Joycean inaccuracy measures.
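The recipe for the Brier score can be written out directly. The sketch below makes some representational assumptions that are not part of the entry: a credence function is a dictionary from proposition names to credences, and a world is the set of propositions true at it. The function names and the example credences are hypothetical.

```python
from fractions import Fraction as F

def vindicated(world, propositions):
    """v_w: credence 1 in each proposition true at the world, 0 in each false one."""
    return {X: (F(1) if X in world else F(0)) for X in propositions}

def brier_score(credence, world):
    """Sum over propositions of the squared gap between v_w(X) and c(X)."""
    v = vindicated(world, credence.keys())
    return sum((v[X] - credence[X]) ** 2 for X in credence)

# A hypothetical credence function on Heads and its negation Tails.
c = {"Heads": F(4, 5), "Tails": F(1, 5)}

score_if_heads = brier_score(c, {"Heads"})  # world at which Heads is true
score_if_tails = brier_score(c, {"Tails"})  # world at which Tails is true
```

Exact fractions are used so that the arithmetic is not blurred by floating-point rounding. Here the score is 2/25 at the Heads world and 32/25 at the Tails world, reflecting that \(c\) leans towards Heads.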

Combining Credal Veritism and Joycean Inaccuracy, we have the claim that the epistemic disutility of a credence function at a world is given by its inaccuracy at that world as measured by a Joycean inaccuracy measure.

Let us turn now to the decision-theoretic norm to which Joyce appeals. We have met it already above in the introduction to this article: it is the norm of Naive Dominance. We will state it here precisely:

Naive Dominance A rational agent will not adopt an option when there is another option that has lower disutility at all worlds.

That is: Suppose \(\mathcal{O}\) is a set of options, \(\mathcal{W}\) is the set of possible worlds, and \(\mathfrak{U}\) is a disutility function. Then, if \(o^*\) is an option, and if there is another option \(o'\) such that \(\mathfrak{U}(o', w) < \mathfrak{U}(o^*, w)\) for all worlds \(w\) in \(\mathcal{W}\), then \(o^*\) is irrational. (In this situation, we say that \(o'\) \(\mathfrak{U}\)-dominates \(o^*\).)

The idea behind Naive Dominance is this: If there is one option that is guaranteed to have lower disutility than another option, then the latter is guaranteed to be worse than the former; so the agent can know a priori that the latter is worse than the former. And surely it is irrational to adopt an option if there is another that you know a priori to be better.
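Naive Dominance can be stated as a simple check over a disutility table. The following sketch uses a hypothetical table with made-up numbers for three options and two worlds; only the strict, world-by-world comparison comes from the statement of the norm above.

```python
def dominates(better, worse, disutility, worlds):
    """True if `better` has strictly lower disutility than `worse` at every world."""
    return all(disutility[better][w] < disutility[worse][w] for w in worlds)

def permitted_by_naive_dominance(options, disutility, worlds):
    """Options that are not dominated by any other option."""
    return [
        o for o in options
        if not any(dominates(other, o, disutility, worlds)
                   for other in options if other != o)
    ]

# Hypothetical disutilities: disutility[option][world].
worlds = ["w1", "w2"]
disutility = {
    "a": {"w1": 3, "w2": 5},
    "b": {"w1": 2, "w2": 4},  # lower than "a" at both worlds
    "c": {"w1": 1, "w2": 6},  # better than "a" at w1, worse at w2
}

permitted = permitted_by_naive_dominance(list(disutility), disutility, worlds)
```

In this table, option a is ruled out because b does better at every world, while b and c survive: neither is better than the other at both worlds. The norm says nothing about how to choose between undominated options.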

Thus, we have the substantial components of Joyce’s argument: Credal Veritism, Joycean Inaccuracy, and Naive Dominance. From these, we can derive Probabilism via the following mathematical theorem (Joyce 1998: 597–600):

Theorem 4 (Joyce’s Main Theorem) Suppose \(\mathcal{F}\) is an algebra and \(\mathfrak{I} : \mathcal{C_F} \times \mathcal{W_F} \rightarrow [0, \infty]\) is a Joycean inaccuracy measure for the credence functions on \(\mathcal{F}\). Now suppose that \(c^*\) is a credence function in \(\mathcal{C_F}\) that violates Probabilism. Then there is a credence function \(c'\) in \(\mathcal{C_F}\) such that \(\mathfrak{I}(c', w) < \mathfrak{I}(c^*, w)\) for all \(w\) in \(\mathcal{W_F}\). (In this situation, we say that \(c'\) accuracy dominates \(c^*\) relative to \(\mathfrak{I}\).)

Figure 1 illustrates this result in the particular very simple case in which \(\mathcal{F}\) contains just a proposition, Heads, and its negation, Tails, and inaccuracy is measured using the Brier score.

Thus, we have the following argument:

Joyce’s accuracy argument for Probabilism

Figure 1: In this figure, we plot the various possible credence functions defined on a proposition Heads and its negation Tails in the unit square. Thus, we plot the credence in Heads along the horizontal axis and the credence in Tails up the vertical axis. We also plot the vindicated credence functions \(v_{w_1}\) and \(v_{w_2}\) for the two worlds \(w_1\) (at which Tails is true and Heads is false) and \(w_2\) (at which Heads is true and Tails is false). The diagonal line between them contains all and only the credence functions on these two propositions that are probability functions and thus satisfy Probabilism. \(c^*\) (which assigns 0.7 to Heads and 0.6 to Tails) violates Probabilism. The lower right-hand arc contains all the credence functions that are exactly as inaccurate as \(c^*\) at world \(w_2\), where that inaccuracy is measured using the Brier score. To see this, note that the Brier score of \(c^*\) at \(w_2\) is the square of the Euclidean distance of the point \(c^*\) from the point \(v_{w_2}\). Thus, the credence functions that have exactly the same Brier score as \(c^*\) at \(w_2\) are those that lie equally far from \(v_{w_2}\). For the same reason, the upper left-hand arc contains all the credence functions that are exactly as inaccurate as \(c^*\) at world \(w_1\). Every credence function that lies between the two arcs is more accurate than \(c^*\) at both worlds. These are the ones whose squared Euclidean distance from \(v_{w_2}\) is less than the squared Euclidean distance of \(c^*\) from \(v_{w_2}\), and similarly for \(v_{w_1}\). \(c'\) is such a credence function: it assigns 0.55 to Heads and 0.45 to Tails, and it also satisfies Probabilism.
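The numbers in Figure 1 can be verified by direct computation. The sketch below represents each credence function as a pair (credence in Heads, credence in Tails). It also computes the point on the diagonal nearest to \(c^*\) in Euclidean distance; that this nearest point coincides with the \(c'\) of the figure is an observation about this particular example, not a claim about how Joyce's proof proceeds.

```python
from fractions import Fraction as F

def brier(c, v):
    """Brier score of credences c against vindicated credences v (same ordering)."""
    return sum((vi - ci) ** 2 for ci, vi in zip(c, v))

# Coordinates are (credence in Heads, credence in Tails), as in Figure 1.
v_w1 = (F(0), F(1))  # Tails true, Heads false
v_w2 = (F(1), F(0))  # Heads true, Tails false
c_star = (F(7, 10), F(6, 10))

# Nearest point on the line x + y = 1: subtract half the excess from each coordinate.
excess = sum(c_star) - 1
c_prime = tuple(x - excess / 2 for x in c_star)

for name, v in (("w1", v_w1), ("w2", v_w2)):
    print(name, float(brier(c_star, v)), float(brier(c_prime, v)))

# c_prime accuracy dominates c_star if it is strictly less inaccurate at both worlds.
dominated = all(brier(c_prime, v) < brier(c_star, v) for v in (v_w1, v_w2))
```

The computed \(c'\) assigns 0.55 to Heads and 0.45 to Tails. Its Brier scores are 0.605 at \(w_1\) and 0.405 at \(w_2\), against 0.65 and 0.45 for \(c^*\), so \(c'\) is strictly less inaccurate at both worlds.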

5.2 The source(s) of epistemic disutility

Let us start by considering the first of the two components that comprise Joyce’s account of epistemic disutility for credences, namely, Credal Veritism. This says that the sole fundamental source of epistemic disutility for a credence is its gradational inaccuracy. Any other vice that the credence has, it is claimed, must derive from this vice (Goldman 2002: 52).

First, let’s note why it is important to make this assumption. Would it not be sufficient to say merely that one of the sources of disutility for a credence is its inaccuracy, and then to point out that any credence function that isn’t a probability function is accuracy dominated? If it could always be guaranteed that one of the credence functions that does the accuracy dominating does not have some other epistemic vice to a greater degree than does the credence function it accuracy dominates, then this would be sufficient. But if it were possible that all of the accuracy dominating credence functions, while guaranteed to be better along the dimension of inaccuracy, were worse along some other dimension of epistemic disutility, then being accuracy dominated would not rule out a credence function as irrational. Thus, we must claim, with Credal Veritism, that inaccuracy is the only source of epistemic disutility for credences.

How are we to establish this? How can we be sure there aren’t other sources of disutility? For instance, perhaps it is a virtue of a credence function if the credences it assigns cohere with one another in a particular way, and a vice if they do not. This is a coherentist claim of the sort endorsed for full beliefs, rather than credences, by the likes of BonJour (1985) and Harman (1973). Or perhaps it is a virtue of a credence in a particular proposition if it matches the degree of support given to that proposition by the agent’s current total evidence. This claim is dubbed evidential proportionalism by Goldman (2002: 55–7). Recent proponents might include Williamson (2000) and White (2009). Both of these seem plausible. How is the credal veritist to answer the objection that there are sources of epistemic disutility, such as these, that go beyond inaccuracy? Of course, it is notoriously difficult to prove a negative existential claim, such as the credal veritist claim that there are no other epistemic vices beyond inaccuracy. But here is a natural strategy: for each proposed candidate epistemic vice besides inaccuracy, the credal veritist should provide an account of how its badness derives from the badness of inaccuracy.

In the case of the coherentist described above, who proposes that it is a vice to have credences that fail to cohere in a particular way, there is a very natural instance of this strategy. The coherence that we demand of credences is precisely that they relate to one another in the way that Probabilism demands, so that, for instance, no disjunction is assigned lower credence than is assigned to either of the disjuncts, no proposition is assigned very high credence at the same time that its negation is also assigned very high credence, and so on. If that is correct, then of course Joyce’s accuracy argument for Probabilism detailed above provides an argument that this vice derives its badness from the badness of inaccuracy: after all, if a credence function lacks the coherence that the coherentist considers virtuous, it will be accuracy dominated.

What of the evidential proportionalist? Here it is a little more difficult. There are principles that the evidential proportionalist will take to govern evidential support that go beyond merely Probabilism, which is a relatively weak and undemanding principle. So it is not sufficient to point to the accuracy argument for that principle in the way we did in response to the coherentist. However, here is an attempt at an answer. It comes from collecting together a series of accuracy arguments for other principles of rationality that we take to govern our credences. For instance, Greaves & Wallace (2006) give an accuracy argument for the principle of conditionalization, which says that, if an agent is rational, her credence function at a later time will be obtained from her credence function at an earlier time by conditionalizing on the total evidence she obtains between those two times; Easwaran (2013) and Huttegger (2013) extend the argument, and Schoenfield (ms.) clarifies the norm that it establishes. Moreover, Pettigrew (2013a) gives an accuracy argument for the Principal Principle, which says that, if an agent is rational, her credences in propositions concerning the objective chances will relate in a particular way to her credences in the propositions to which those chances attach. Pettigrew (2014b) and Konek (ms.) give rather different accuracy-based arguments for the Principle of Indifference, which says how a rational agent with no evidence will distribute their credences. Moss (2011), Lam (2013), and Levinstein (2015) describe principles that rational agents will obey in the presence of peer disagreement and provide accuracy-based arguments in their favour. And finally Horowitz (2014) uses accuracy-based arguments to evaluate various species of permissivism. The point is that, piece by piece, the principles that are taken to govern the degree of support provided to a proposition by a body of evidence are being shown to follow from accuracy considerations alone. This, it seems, constitutes a response to the concerns of the evidential proportionalist.

However, both the response to the coherentist and the response to the evidential proportionalist leave the accuracy argument for Probabilism in a strange position. The argument for, or defence of, one component of its first premise, namely, Credal Veritism, appeals to the argument of which it is a premise! In fact, this isn’t problematic. The credal veritist and her opponent might agree that the argument at least establishes a conditional: if Credal Veritism is true, then Probabilism is true. You need not accept Credal Veritism to accept that conditional. And it is that conditional to which the credal veritist appeals in defending her position against the coherentist and the evidential proportionalist. Having successfully defended Credal Veritism in this way, she can then appeal to its truth to derive Probabilism.

5.3 Measures of inaccuracy

5.3.1 Joyce on Convexity

So much, then, for the first component of the first premise of the accuracy argument for Probabilism, namely, Credal Veritism. In this section, we turn to the second component, namely, Joycean Inaccuracy. I will focus on a particular condition that Joyce places on measures of inaccuracy, namely, Strong Convexity. (Joyce calls it Weak Convexity, but I change the name in this presentation because, as Patrick Maher (2002) points out, it is considerably stronger than Joyce imagines.)

Strong Convexity Suppose \(\mathfrak{I}\) is a legitimate inaccuracy measure. Then if \(c \neq c'\) and \(\mathfrak{I}(c, w) = \mathfrak{I}(c', w)\), then \[\mathfrak{I}\left(\frac{1}{2}c + \frac{1}{2}c', w\right) < \mathfrak{I}(c, w) = \mathfrak{I}(c', w)\] (Given two credence functions, \(c\) and \(c'\), we define a third credence function \(\frac{1}{2}c + \frac{1}{2} c'\) as follows: the credence that \(\frac{1}{2}c + \frac{1}{2}c'\) assigns to a proposition is the straight average of the credences that \(c\) and \(c'\) assign to it. Thus, \((\frac{1}{2}c + \frac{1}{2}c')(X) = \frac{1}{2}c(X) + \frac{1}{2}c'(X).\) We call this the equal mixture of \(c\) and \(c'\).)

This says that, for any two distinct credence functions that are equally inaccurate at a given world, the third credence function obtained by “splitting the difference” between them and taking an equal mixture of the two is less inaccurate than either of them. Here is Joyce’s justification of this condition:

[Strong] Convexity is motivated by the intuition that extremism in the pursuit of accuracy is no virtue. It says that if a certain change in a person’s degrees of belief does not improve accuracy then a more radical change in the same direction and of the same magnitude should not improve accuracy either. Indeed, this is just what the principle says. (Joyce 1998: 596)

Joyce’s point is this: Suppose we have three credence functions, \(c\), \(m\), and \(c'\). And suppose that, to move from \(m\) to \(c'\) is just to move in the same direction and by the same amount as to move from \(c\) to \(m\), which is exactly what will be true if \(m\) is the equal mixture of \(c\) and \(c'\). Now suppose that \(m\) is at least as inaccurate as \(c\)—that is, the change from \(c\) to \(m\) does not “improve accuracy”. Then, Joyce claims, \(c'\) must be at least as inaccurate as \(m\)—that is, the change from \(m\) to \(c'\) also does not “improve accuracy”.

Objection: The justification given doesn’t justify Strong Convexity. The problem with this justification is that it establishes a weaker principle than Strong Convexity. This was first pointed out by Patrick Maher (2002), who noted that Joyce’s justification in fact motivates the following weaker principle:

Weak Convexity Suppose \(\mathfrak{I}\) is a legitimate inaccuracy measure. Then if \(c \neq c'\) and \(\mathfrak{I}(c, w) = \mathfrak{I}(c', w)\), then \[\mathfrak{I}\left(\frac{1}{2}c + \frac{1}{2}c', w\right) \leq \mathfrak{I}(c, w) = \mathfrak{I}(c', w)\]

That is, Joyce’s motivation rules out situations in which inaccuracy increases from \(c\) to \(m\) and then decreases from \(m\) to \(c'\). And this is what Weak Convexity also rules out. But Strong Convexity furthermore rules out situations in which inaccuracy remains the same from \(c\) to \(m\) and then from \(m\) to \(c'\). And Joyce has given no reason to think that such changes are problematic. What’s more, as Maher proves, the stronger convexity condition is crucial for Joyce’s proof. With only the weaker condition, the theorem is false.
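The gap between the two convexity conditions can be seen in a single worked instance. The sketch below picks two hypothetical credence functions on Heads and Tails that are equally inaccurate at the world where Heads is true, and compares them with their equal mixture under the Brier score of section 5.1 and under the absolute value score (the sum of absolute, rather than squared, differences). One instance illustrates the conditions; it does not prove that either measure satisfies or violates them in general.

```python
from fractions import Fraction as F

def brier(c, v):
    """Sum of squared differences between vindicated and actual credences."""
    return sum((vi - ci) ** 2 for ci, vi in zip(c, v))

def absolute_value_score(c, v):
    """Sum of absolute differences between vindicated and actual credences."""
    return sum(abs(vi - ci) for ci, vi in zip(c, v))

def equal_mixture(c, c2):
    """Credence function assigning the straight average of c and c2."""
    return tuple((a + b) / 2 for a, b in zip(c, c2))

v_w = (F(1), F(0))      # Heads true, Tails false
c = (F(1, 2), F(0))     # hypothetical credences (Heads, Tails)
c2 = (F(1), F(1, 2))    # distinct from c, equally inaccurate at v_w
m = equal_mixture(c, c2)

for name, score in (("Brier", brier), ("absolute value", absolute_value_score)):
    print(name, score(c, v_w), score(c2, v_w), score(m, v_w))

strict_for_brier = brier(m, v_w) < brier(c, v_w) == brier(c2, v_w)
only_weak_for_absolute = (
    absolute_value_score(m, v_w)
    == absolute_value_score(c, v_w)
    == absolute_value_score(c2, v_w)
)
```

Under the Brier score the mixture is strictly less inaccurate (1/8 against 1/4), as Strong Convexity requires. Under the absolute value score all three are equally inaccurate (1/2 each), which Weak Convexity tolerates but Strong Convexity forbids.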

5.3.2 Local and global inaccuracy

In this section, we consider alternative sets of conditions on inaccuracy measures that are presented by Leitgeb & Pettigrew (2010a). They propose that we replace the claim Joycean Inaccuracy in Joyce’s accuracy argument for Probabilism with an alternative claim that says that the legitimate inaccuracy measures are (amongst) those that satisfy Leitgeb and Pettigrew’s alternative conditions. Unlike Joyce’s conditions, these are sufficient to narrow the field of legitimate inaccuracy measures to just a single one, namely, the Brier score \(\mathfrak{B}\) that we met in section 5.1 above. Let us say that Brier Inaccuracy is the claim that the Brier score is the only legitimate measure of inaccuracy. And note that, if we replace Joycean Inaccuracy with Brier Inaccuracy in Joyce’s argument for Probabilism, we retain our argument for that epistemic norm:

Brier accuracy-based argument for Probabilism: I

So far, in this section, we have been concerned only with what we might call global measures of inaccuracy—that is, measures of the inaccuracy of entire credence functions. Leitgeb and Pettigrew are certainly interested in those. But they are also interested in what we might call local measures of inaccuracy—that is, measures of the inaccuracy of individual credences. Indeed, they are interested in how these two sorts of inaccuracy measure interact. They lay down constraints on each of the inaccuracy measures individually, and then they lay down constraints on how they combine. The guiding idea in each case is that any feature of the inaccuracy of credences that is determined from the point of view of local inaccuracy measures—such as their total inaccuracy, or the urgency with which an agent with inaccurate credences should change them—should match that same feature when it is determined from the point of view of global inaccuracy measures. If this doesn’t happen, then the agent will face a rational dilemma when choosing which of the two ways she should use to determine that feature. Here, I will focus only on one of the most powerful of Leitgeb and Pettigrew’s conditions, which also turns out to be the most problematic. Here it is:

Global Normality and Dominance If \(\mathfrak{I}\) is a legitimate global inaccuracy measure, there is a strictly increasing \(f:[0, \infty) \rightarrow [0, \infty)\) such that \[\mathfrak{I}(c, w) = f(||v_w - c||_2),\] where, for any two credence functions \(c\), \(c'\) defined on \(\mathcal{F}\), \[||c - c'||_2 := \sqrt{\sum_{X \in \mathcal{F}} |c(X) - c'(X)|^2}\] and we call \(||c - c'||_2\) the Euclidean distance between \(c\) and \(c'\); and, recall, \(v_w\) is the omniscient credence function at \(w\), so that \(v_w(X) = 1\) if \(X\) is true at \(w\) and \(v_w(X) = 0\) if \(X\) is false at \(w\).

Thus, Global Normality and Dominance says that the inaccuracy of a credence function at a world should supervene in a certain way upon the Euclidean distance between that credence function and the omniscient credence function at that world. Indeed, it should be a strictly increasing function of that distance between them.
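The Brier score is one measure of the required form: it is the Euclidean distance from the omniscient credence function passed through \(f(x) = x^2\), which is strictly increasing on \([0, \infty)\). The sketch below checks this on a few hypothetical credence functions over Heads and Tails, and confirms that ranking them by distance and ranking them by Brier score give the same order.

```python
import math

def euclidean_distance(c, c2):
    """Square root of the sum of squared differences between two credence functions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, c2)))

def brier(c, v):
    """Sum of squared differences between vindicated and actual credences."""
    return sum((vi - ci) ** 2 for ci, vi in zip(c, v))

def f(x):
    """Strictly increasing on [0, infinity)."""
    return x ** 2

v_w = (1.0, 0.0)  # omniscient credences when Heads is true
candidates = [(0.7, 0.6), (0.55, 0.45), (0.9, 0.1), (0.2, 0.8)]  # hypothetical

# The Brier score agrees with f applied to Euclidean distance (up to rounding).
agrees = all(
    math.isclose(brier(c, v_w), f(euclidean_distance(c, v_w)))
    for c in candidates
)

# Because f is strictly increasing, both measures rank the candidates identically.
by_distance = sorted(candidates, key=lambda c: euclidean_distance(c, v_w))
by_brier = sorted(candidates, key=lambda c: brier(c, v_w))
```

Floating-point numbers are used here because of the square root, so the comparison is made with `math.isclose` rather than exact equality.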

Objection 1: There is no motivation for the appeal to Euclidean distance. Leitgeb and Pettigrew show that the only inaccuracy measure that satisfies Global Normality and Dominance, together with their other conditions on inaccuracy measures, is the Brier score, which we defined above. That is, imposing these conditions entails Brier Inaccuracy. The problem with this characterization, however, is that it depends crucially on the appeal to the Euclidean distance made in Global Normality and Dominance, and no reason is given for appealing to the Euclidean distance measure in particular, rather than some other measure of distance between credence functions. Suppose we replace that condition with one that says that a legitimate global inaccuracy measure must be a strictly increasing function of the so-called Manhattan or city block distance measure, where the distance between two credence functions measured in this way is defined as follows: \[||c - c'||_1 := \sum_{X \in \mathcal{F}} |c(X) - c'(X)|\] That is, the Manhattan distance between two credence functions is obtained by summing the absolute differences between the credences they each assign to the various propositions on which they are defined. Together with the other constraints that Leitgeb and Pettigrew place on inaccuracy measures, this alternative constraint entails that the only legitimate inaccuracy measure is the so-called absolute value score, which is defined as follows: \[\mathfrak{A}(c, w) := \sum_{X \in \mathcal{F}} |v_w(X) - c(X)|\]

Now, it turns out that the absolute value score cannot ground an accuracy argument for Probabilism. In fact, there are situations in which non-probabilistic credence functions accuracy dominate probabilistic credence functions when inaccuracy is measured using the absolute value score. Let \(\mathcal{F} = \{X_1, X_2, X_3\}\), where \(X_1\), \(X_2\), and \(X_3\) are mutually exclusive and exhaustive propositions. And consider the following two credence functions: \(c(X_i) = \frac{1}{3}\) for each \(i = 1, 2, 3\); \(c'(X_i) = 0\) for each \(i = 1, 2, 3\). The former, \(c\), is probabilistic; the latter, \(c'\), is not. But, if we measure inaccuracy using the absolute value score, the inaccuracy of \(c\) at each of the three possible worlds is \(\frac{4}{3}\), whereas the inaccuracy of \(c'\) at each of the three possible worlds is \(1\). The upshot of this observation is that it is crucial, if our accuracy argument for Probabilism is to succeed, to rule out the absolute value score. The problem with the Leitgeb and Pettigrew characterization is that it rules out this measure essentially by fiat. It rules it out by demanding that the inaccuracy of a credence function at a world supervenes on the Euclidean distance between the credence function and the omniscient credence function at that world. But it gives no reason for favouring this measure of distance over another, such as Manhattan distance.
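The three-proposition counterexample can be reproduced in a few lines. The sketch below computes the absolute value score of \(c\) and \(c'\) at each of the three worlds and, for contrast, their Brier scores. The comparison with the Brier score is added here for illustration and is not part of the objection itself.

```python
from fractions import Fraction as F

def absolute_value_score(c, v):
    """Sum of absolute differences between vindicated and actual credences."""
    return sum(abs(vi - ci) for ci, vi in zip(c, v))

def brier(c, v):
    """Sum of squared differences between vindicated and actual credences."""
    return sum((vi - ci) ** 2 for ci, vi in zip(c, v))

# Exactly one of X1, X2, X3 is true at each world.
worlds = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

c = (F(1, 3), F(1, 3), F(1, 3))  # probabilistic
c_prime = (F(0), F(0), F(0))     # not probabilistic: credences sum to 0

abs_scores = [(absolute_value_score(c, v), absolute_value_score(c_prime, v))
              for v in worlds]
brier_scores = [(brier(c, v), brier(c_prime, v)) for v in worlds]

# Under the absolute value score, c_prime beats c at every world.
c_prime_dominates_abs = all(s_prime < s for s, s_prime in abs_scores)
# Under the Brier score, the verdict is reversed at every world.
c_dominates_brier = all(s < s_prime for s, s_prime in brier_scores)
```

At every world the absolute value score gives \(\frac{4}{3}\) to \(c\) and \(1\) to \(c'\), so the non-probabilistic \(c'\) dominates. The Brier score gives \(\frac{2}{3}\) to \(c\) and \(1\) to \(c'\), so on that measure it is \(c\) that does better at every world.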

Objection 2: Using the Brier score to measure inaccuracy has unintuitive consequences. A further objection to Leitgeb and Pettigrew’s characterization of inaccuracy measures is given by Levinstein (2012). In the sequel to the paper in which they give this characterization, Leitgeb and Pettigrew use it to argue in favour of an updating rule for credences that applies in the same situations as so-called Jeffrey Conditionalization (or Probability Kinematics) but offers different advice (Jeffrey 1965; Leitgeb & Pettigrew 2010b). Levinstein objects to the use of the Brier score to measure inaccuracy on the grounds that this alternative updating rule gives deeply unintuitive results.

5.3.3 Calibration and accuracy

The final characterization of inaccuracy measures that I will consider here is due to Pettigrew (forthcoming-a). Again, I won’t enumerate all of the conditions here. Instead, I’ll describe the most contentious and mathematically powerful of the conditions: the one that in some sense does the main mathematical “heavy lifting” when it comes to showing what putative inaccuracy measures these conditions permit.

So far in this entry, we have presented calibration accounts of epistemic utility and accuracy accounts as separate and incompatible. The condition on inaccuracy measures that Pettigrew proposes and that we consider in this section denies that. Rather, it claims that closeness to calibration in fact plays a role in determining the accuracy of a credence function; the difference between this approach and the calibration arguments of section 4 is that Pettigrew does not think that closeness to calibration is the whole story. Let \(\mathfrak{D}\) be a putative measure of the distance between two credence functions. That is, \(\mathfrak{D} : \mathcal{C_F} \times \mathcal{C_F} \rightarrow [0, \infty]\), and we’ll assume that \(\mathfrak{D}(c, c') = 0\) iff \(c = c'\). Now first we use this measure of distance to define a measure of the distance that a credence function lies from being perfectly calibrated at a world. Then, following a point already made above in our treatment of calibration arguments for Probabilism, we note that this, on its own, cannot define a measure of inaccuracy because it lacks a crucial feature that we demand of any such measure: it is not truth-directed. However, we then note how to supplement the measure of distance from calibration in order to give an inaccuracy measure that does have the crucial feature. And we claim that all inaccuracy measures are produced by supplementing a measure of distance from calibration in this way.

As in section 4.1, we let \(\sim\) be an equivalence relation on the set \(\mathcal{F}\) of propositions to which our agent assigns opinions. It is the relation of relevant similarity between two propositions. In section 4.1, we said that we would impose conditions \(C(\sim)\) on this equivalence relation, but we said no more to identify those conditions. In this section, we in fact define this equivalence relation. We take it to be relative to a credence function \(c\), so we write it \(\sim_c\), and we define it as follows: \(A \sim_c B\) iff \(c(A) = c(B)\). That is, two propositions are relevantly similar for our agent with credence function \(c\) if \(c\) assigns them the same credence. Thus, given a possible world \(w\), we say that a credence function \(c\) is perfectly calibrated at \(w\) if, for each \(A\) in \(\mathcal{F}\), \[c(A) = \mathrm{Freq}(\mathcal{F}, A, \sim_c, w)\]

Next, given a credence function \(c\) and a world \(w\), the perfectly calibrated counterpart of \(c\) at \(w\) is a credence function also defined on \(\mathcal{F}\) that is defined as follows: for each \(A\) in \(\mathcal{F}\) \[c^w(A) = \mathrm{Freq}(\mathcal{F}, A, \sim_c, w)\] That is, the perfectly calibrated counterpart of \(c\) at \(w\) assigns to each proposition \(A\) the frequency of truths at \(w\) amongst all propositions to which \(c\) assigns the same credence that it assigns to \(A\). Note that \(c^w\) is perfectly calibrated at \(w\). And if \(c\) is perfectly calibrated at \(w\), then \(c^w = c\). Now, we define the distance that a credence function \(c\) lies from calibration at a world \(w\) to be the distance, \(\mathfrak{D}(c^w, c)\), from \(c^w\) to \(c\). Now, as we saw in Objection 1 from section 4.3 above, this measure does not itself give a measure of epistemic disutility. The problem is that an agent can move closer to calibration at a world \(w\) while moving uniformly further from the omniscient credence function at that world: that is, the measure of epistemic disutility provided by the distance of the credence function from its perfectly calibrated counterpart is not truth-directed. Thus, if an agent’s distance from her perfectly calibrated counterpart is to contribute to a measure of her inaccuracy, it must be supplemented by something that ensures that the resulting measure avoids this consequence. The idea that Pettigrew proposes is this: the inaccuracy of \(c\) at \(w\) is given by the distance of \(c\) from the omniscient credence function \(v_w\) at \(w\); and that is given by adding the distance of \(c\) from its perfectly calibrated counterpart \(c^w\) to the distance of \(c^w\) from \(v_w\). Thus, while moving to a credence function that is closer to its perfectly calibrated counterpart may move you further from the omniscient credence function, this can only be because the perfectly calibrated counterpart of your new credence function is further from the omniscient credence function than the perfectly calibrated counterpart of your current credence function. If their perfectly calibrated counterparts are the same, or if they are different but equally close to the omniscient credence function, then moving closer to them will move you closer to the omniscient credence function. Thus, Pettigrew imposes the following constraint:

Decomposition Suppose \(\mathfrak{I}\) is a legitimate inaccuracy measure and \(\mathfrak{D}\) is a distance measure such that \(\mathfrak{I}(c, w) = \mathfrak{D}(v_w, c)\). Then \[\mathfrak{I}(c, w) = \mathfrak{D}(v_w, c) = \mathfrak{D}(c^w, c) + \mathfrak{D}(v_w, c^w)\]

Together with the other conditions that Pettigrew imposes, Decomposition narrows down the class of legitimate inaccuracy measures to a single one, namely, the Brier score. That is, imposing these conditions entails Brier Inaccuracy.
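To make Decomposition concrete, the following Python sketch computes the perfectly calibrated counterpart \(c^w\) of a small credence function and checks the identity when \(\mathfrak{D}\) is squared Euclidean distance, the distance that underlies the Brier score. The six proposition names, the credences, and the truth values are illustrative assumptions, not taken from Pettigrew's discussion. \(\mathrm{Freq}\) is read here as the proportion of truths among the propositions to which \(c\) assigns the same credence.

```python
from collections import defaultdict
from fractions import Fraction


def calibrated_counterpart(c, truth):
    """Return c^w: each proposition receives the frequency of truths among
    the propositions to which c assigns the same credence."""
    classes = defaultdict(list)
    for prop, credence in c.items():
        classes[credence].append(prop)
    c_w = {}
    for props in classes.values():
        freq = Fraction(sum(truth[p] for p in props), len(props))
        for p in props:
            c_w[p] = freq
    return c_w


def squared_distance(a, b):
    """Squared Euclidean distance between two credence functions."""
    return sum((a[p] - b[p]) ** 2 for p in a)


# Hypothetical agent: credence 4/5 in A-D, credence 3/10 in E and F.
c = {"A": Fraction(4, 5), "B": Fraction(4, 5), "C": Fraction(4, 5),
     "D": Fraction(4, 5), "E": Fraction(3, 10), "F": Fraction(3, 10)}
# Hypothetical world w, given as the omniscient credence function v_w.
v_w = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0, "F": 1}

c_w = calibrated_counterpart(c, v_w)
total = squared_distance(v_w, c)              # D(v_w, c)
calibration_part = squared_distance(c_w, c)   # D(c^w, c)
remainder = squared_distance(v_w, c_w)        # D(v_w, c^w)
decomposition_holds = (total == calibration_part + remainder)
```

Exact fractions are used so that the equality can be tested without rounding error. The check covers one credence function at one world and one choice of \(\mathfrak{D}\). It does not show that no other measure satisfies the condition.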

Objection 1: Appeal to summation is arbitrary. One concern about Decomposition is this: it is crucial for the proof that the Brier score and only the Brier score satisfies all of Pettigrew’s conditions that in Decomposition we combine the distance between \(c\) and \(c^w\) with the distance between \(c^w\) and \(v_w\) by summing them together. But we could have combined those quantities in other ways: we might have multiplied them together, for instance; or, we might have summed them and then taken a strictly increasing function of that sum. It might be mathematically natural simply to add them together: but that doesn’t privilege that means of combining them for philosophical purposes. However, if we combine them in any of these alternative ways, Pettigrew’s conditions will no longer hold of the Brier score.

Objection 2: Proximity to calibration is not a good. Another concern is that, while proximity to being perfectly calibrated seems epistemically good in the standard cases that are used to motivate calibrationist accounts, it seems less compelling in other cases. For instance, suppose you have opinions only about three propositions: First coin toss lands heads, Second coin toss lands heads, Third coin toss lands heads. And suppose you assign to each of them the same credence, \(\frac{1}{3}\). Then, in that situation, it seems plausible that you are doing better if one out of the three tosses comes up heads. Now suppose that I have opinions only about three propositions: Djibouti is the capital of Ghana, Serena Williams is a badminton player, Doris Lessing wrote The Golden Notebook. And suppose I assign to each of them the same credence, \(\frac{1}{3}\). Then, in that situation, do we really retain the intuition that I do best if one out of the three turns out true?

5.4 Dominance principles

So far, we have considered the two components of the first premise of Joyce’s accuracy argument for Probabilism: Credal Veritism and Joycean Inaccuracy. We have left the former intact, but we have seen concerns with the latter, and we have considered arguments for a stronger claim, Brier Inaccuracy, though these also face difficulties. In this section, we move from the account of epistemic disutility on which the argument is based to the decision-theoretic principle to which we appeal in order to derive Probabilism from this account. Let’s recall the version of the principle to which Joyce appeals in his original paper:

Naive Dominance A rational agent will not adopt an option when there is another option that has lower disutility at all worlds.

That is: Suppose \(\mathcal{O}\) is a set of options, \(\mathcal{W}\) is the set of possible worlds, and \(\mathfrak{U}\) is a disutility function. Then, if \(o^*\) is an option, and if there is another option \(o'\) such that \(\mathfrak{U}(o', w) < \mathfrak{U}(o^*, w)\) for all worlds \(w\) in \(\mathcal{W}\), then \(o^*\) is irrational.

Thus, according to Joyce, a credence function is irrational if it is accuracy dominated.

In this section, we’ll consider four objections that have been raised against Naive Dominance in the context of the accuracy argument for Probabilism.

5.4.1 The Bronfman objection

The first objection to the application of Naive Dominance in the context of the accuracy argument for Probabilism was first stated in an unpublished manuscript by Aaron Bronfman entitled “A Gap in Joyce’s Proof of Probabilism”; it has been discussed by Hájek (2008) and Pettigrew (2010, 2013b). The starting point for the objection is the observation that Credal Veritism and Joycean Inaccuracy do not together narrow down the class of legitimate measures of epistemic disutility to a single function; they characterize a family of such measures. But, for all that Theorem 4 (Joyce’s Main Theorem) tells us, it may well be that, for a given non-probabilistic credence function \(c^*\), different measures in this family of legitimate inaccuracy measures give different sets of credence functions that accuracy dominate \(c^*\). Thus, an agent with a non-probabilistic credence function \(c^*\) might be faced with a range of credence functions, each of which accuracy dominates \(c^*\) relative to a different legitimate inaccuracy measure. Moreover, it may be that any credence function that accuracy dominates \(c^*\) relative to Joycean inaccuracy measure \(\mathfrak{I}\) does not accuracy dominate \(c^*\) relative to the alternative Joycean measure \(\mathfrak{I}'\); indeed, it may be that any credence function that dominates \(c^*\) relative to \(\mathfrak{I}\) risks very high inaccuracy at some world relative to \(\mathfrak{I}'\), and vice versa. In this situation, it is plausible that the agent is rationally permitted to stick with her non-probabilistic credence function \(c^*\).

There are two replies to this objection. According to the first, the objection relies on a false meta-normative claim; according to the second, it misunderstands the purpose of Joyce’s conditions.

Reply 1: No requirement to give advice. The meta-normative claim on which the objection seems to rely is the following: For a norm to hold, there must be specific advice available to those who violate that norm concerning how to improve their behaviour. Bronfman’s objection begins with the observation that, for any specific advice that one might give to a non-probabilistic agent concerning which credence function she should adopt in favour of her own, there will be inaccuracy measures that satisfy Joyce’s conditions, but don’t sanction this advice; indeed, there will be inaccuracy measures relative to which that advice is very bad. Thus, Joyce’s accuracy argument violates the meta-normative constraint. But, the reply submits, the meta-normative claim is false: for a norm to hold, it is sufficient that there is a serious defect suffered by those who violate the norm that is not shared by those who satisfy the norm; it is not also required that there should be advice on which specific action an agent should perform to improve her behaviour. And Joyce’s argument satisfies this sufficient condition. An agent ought to satisfy Probabilism because non-probabilistic credence functions suffer from a serious epistemic defect (namely, being accuracy dominated) that does not beset probabilistic ones. And this fact is “supertrue”, so to speak: that is, it is true on any precisification of the notion of accuracy that obeys Joyce’s conditions on an inaccuracy measure.

Reply 2: Each agent uses a single inaccuracy measure. The second reply to this objection does not take issue with the meta-normative claim mentioned above; indeed, on the understanding of the accuracy argument for Probabilism that it proposes, the argument satisfies the necessary condition imposed by that claim. That is, according to this reply, the accuracy argument, properly understood, does in fact provide specific advice to non-probabilistic agents. The idea is this: There are (at least) three ways to understand the purpose of Joyce’s conditions on inaccuracy measures. First, we might think that the notion of inaccuracy is vague; and we might say that any inaccuracy measure that satisfies the conditions is a legitimate precisification of it. This is a supervaluationist approach. On this approach, there is no specific advice available to non-probabilistic agents that is sanctioned by all precisifications. Second, we might think that the notion of inaccuracy is precise, but that we have only limited knowledge about it, and that the sum total of our knowledge is embodied in the conditions. This is an epistemicist approach. On this approach, there is specific advice, but it is not available to us. Third, we might think that there is no objectively correct inaccuracy measure; rather, any inaccuracy measure that satisfies the conditions is rationally permissible. But nonetheless, any particular agent has exactly one such measure. This is a subjectivist approach. On this understanding, there is specific advice for any non-probabilistic agent. Any such agent uses an inaccuracy measure that satisfies Joyce’s conditions. And this gives, for any non-probabilistic credence function, a probabilistic credence function that strongly dominates it. So the specific advice is this: adopt one of the probabilistic credence functions that strongly dominates your non-probabilistic credence function relative to your favoured measure of inaccuracy. 
This gives us Probabilism and does so without violating the meta-normative claim on which Bronfman’s objection relies.

However, this response isn’t without its own problems. For instance, it assumes that each agent values inaccuracy in a sufficiently specific way that they narrow down the class of inaccuracy measures to a single measure that they can then use to obtain this advice. But, at least for those who think that Joycean Inaccuracy is the strongest condition we can place on the inaccuracy measures, this seems too strong. How can we assume that each rational agent will have a unique inaccuracy measure in mind when we don’t think that there are conditions that demand that we narrow down the class of legitimate inaccuracy measures this far?

5.4.2 Undominated dominance

The second objection to Naive Dominance comes from Pettigrew (2014a). Here, Pettigrew observes that there are decisions in which Naive Dominance does not seem to hold because the irrationality of being dominated depends on the status of the dominating options in some way. Here’s Pettigrew’s central example:

Name Your Fortune\(^*\) You have a choice: play a game with God or don’t. If you don’t, you receive 2 utiles for sure. If you do, you then pick an integer. If you pick \(k\), God will then do one of two things: (i) give you \(2^{k-1}\) utiles; or (ii) give you \(2 - \frac{1}{2^{k-1}}\) utiles. (Pettigrew 2014a: 587)

In this example, the only option that isn’t dominated is the option in which you do not play the game with God. If you choose that option, you get 2 utiles for sure. If, on the other hand, you choose to play the game and pick integer \(k\), then choosing integer \(k+1\) will be guaranteed to get you more utility: either \(2^k\) utiles compared with \(2^{k-1}\), or \(2 - \frac{1}{2^k}\) utiles compared with \(2 - \frac{1}{2^{k-1}}\). However, the option in which you get 2 utiles for sure seems a lousy option given the other possibilities available. One way to see this is as follows: Take a probability distribution over the two possibilities (i) and (ii) between which God will choose if you choose to play; then, providing it doesn’t assign all probability to God choosing (ii), there will be some option you can take if you play the game that has greater expected utility than the option of not playing the game. If Naive Dominance is correct, however, not playing the game is the only rational option. This seems to tell against Naive Dominance.
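The arithmetic behind this example can be checked with a short Python sketch. It uses the payoffs stated in Name Your Fortune\(^*\) and assumes, for illustration only, that the integer picked is at least 1. The function names and the search bound `max_k` are hypothetical. The sketch shows that each pick is beaten by the next one under both of God's responses, and that once payoff (i) receives any positive probability, some pick has expected utility above the 2 utiles obtained by not playing.

```python
from fractions import Fraction

NOT_PLAYING = Fraction(2)  # utility of declining the game


def payoffs(k):
    """Payoffs for picking k (assumed k >= 1): (response (i), response (ii))."""
    return (Fraction(2) ** (k - 1), 2 - Fraction(1, 2 ** (k - 1)))


def next_pick_dominates(k):
    """True if picking k + 1 pays more than picking k under both responses."""
    now, nxt = payoffs(k), payoffs(k + 1)
    return nxt[0] > now[0] and nxt[1] > now[1]


def expected_utility(k, p_i):
    """Expected utility of picking k when response (i) has probability p_i."""
    pay_i, pay_ii = payoffs(k)
    return p_i * pay_i + (1 - p_i) * pay_ii


def first_pick_beating_not_playing(p_i, max_k=200):
    """Smallest k <= max_k whose expected utility exceeds 2, else None."""
    for k in range(1, max_k + 1):
        if expected_utility(k, p_i) > NOT_PLAYING:
            return k
    return None


every_pick_is_dominated = all(next_pick_dominates(k) for k in range(1, 50))
pick_when_i_is_unlikely = first_pick_beating_not_playing(Fraction(1, 100))
pick_when_i_is_impossible = first_pick_beating_not_playing(Fraction(0))
```

With probability \(\frac{1}{100}\) on response (i), a pick with expected utility above 2 is found within the first few integers. With probability 0 on response (i), no pick is found, since every pick then pays strictly less than 2.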

The moral that Pettigrew draws from this example is the following. Not all dominated options are irrational. Whether or not a dominated option is irrational depends on the status of the options that dominate it. If all of the options that dominate a given option are themselves dominated, then being dominated does not rule out the given option as irrational. Thus, in Name Your Fortune\(^*\), none of the options are ruled irrational because they are dominated; after all, all of the dominated options are only dominated by other options that are also themselves dominated. Thus, Pettigrew instead suggests a decision-theoretic principle to replace Naive Dominance. To state it, we must distinguish between two notions of dominance: a strong notion and a weak notion. Suppose \(o^*\) and \(o'\) are options. We say that \(o^*\) strongly dominates \(o'\) if \(o^*\) has greater utility than \(o'\) at all worlds. We say that \(o^*\) weakly dominates \(o'\) if \(o^*\) has at least as great utility as \(o'\) at all worlds and greater utility at some world.

Undominated Dominance A rational agent will not adopt an option that is strongly dominated by an option that is not itself even weakly dominated.

Now, it turns out that, if we accept Brier Inaccuracy, we can still derive Probabilism using only Undominated Dominance. This is a consequence of the following theorem:

Theorem 5 (de Finetti) Suppose that \(c^*\) is a credence function in \(\mathcal{C_F}\) that violates Probabilism. Then there is a credence function \(c'\) in \(\mathcal{C_F}\) such that (i) \(\mathfrak{B}(c', w) < \mathfrak{B}(c^*, w)\) for all \(w\) in \(\mathcal{W_F}\), and (ii) there is no credence function \(c\) such that \(\mathfrak{B}(c, w) \leq \mathfrak{B}(c', w)\) for all \(w\) in \(\mathcal{W_F}\) and \(\mathfrak{B}(c, w) < \mathfrak{B}(c', w)\) for some \(w\) in \(\mathcal{W_F}\).
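For the simplest case, where credences are defined only on Heads and Tails, the theorem can be illustrated with a minimal Python sketch. The non-probabilistic credences 0.7 and 0.6 are an illustrative choice, and taking the dominating function to be the orthogonal projection onto the line of probability functions is an assumption that suits this two-proposition case, not a general construction. The grid search for clause (ii) is a sanity check, not a proof.

```python
from fractions import Fraction

# Omniscient credences as (Heads, Tails): Heads true, then Tails true.
WORLDS = [(1, 0), (0, 1)]


def brier(c, w):
    """Brier inaccuracy of credence function c = (heads, tails) at world w."""
    return sum((v - x) ** 2 for v, x in zip(w, c))


def strongly_dominates(a, b):
    """True if a is strictly less inaccurate than b at every world."""
    return all(brier(a, w) < brier(b, w) for w in WORLDS)


def weakly_dominates(a, b):
    """True if a is never more inaccurate than b and sometimes less."""
    return (all(brier(a, w) <= brier(b, w) for w in WORLDS)
            and any(brier(a, w) < brier(b, w) for w in WORLDS))


def project_onto_probabilities(c):
    """Nearest point (in Euclidean distance) with heads + tails = 1."""
    heads, tails = c
    shift = (heads + tails - 1) / 2
    return (heads - shift, tails - shift)


c_star = (Fraction(7, 10), Fraction(6, 10))   # violates Probabilism
c_prime = project_onto_probabilities(c_star)  # candidate dominator

clause_i = strongly_dominates(c_prime, c_star)

# Clause (ii), checked only on a 101 x 101 grid of credence functions.
grid = [Fraction(i, 100) for i in range(101)]
clause_ii_on_grid = not any(
    weakly_dominates((h, t), c_prime) for h in grid for t in grid
)
```

Here `c_prime` assigns \(\frac{11}{20}\) to Heads and \(\frac{9}{20}\) to Tails. Exact fractions avoid spurious dominance verdicts caused by floating-point rounding.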

Thus, we have the following argument:

Brier-based accuracy argument for Probabilism: II

5.4.3 Evidence and Accuracy

The next objection to Naive Dominance is similar to the objection raised in the previous section. In the previous section, the moral we drew from Name Your Fortune\(^*\) is that a dominated option is only ruled irrational in virtue of being dominated if at least one of the options that dominate it is not itself dominated. But there may be other features that a credence function might have besides itself being dominated such that being dominated by that credence function does not entail irrationality. Easwaran & Fitelson (2012) suggest the following feature. Suppose that your credence function is non-probabilistic, but it matches the evidence that you have: that is, the credence it assigns to a proposition matches the extent to which your evidence supports that proposition. And suppose that none of the credence functions that accuracy dominate your credence function have that feature. Then, we might say, the fact that your credence function is accuracy dominated does not rule it irrational. After all, it is dominated only by credence functions that violate the constraints that your evidence imposes on your credences. Thus, Easwaran and Fitelson suggest the following decision-theoretic principle, which applies only when the options in question are credence functions:

Evidential Dominance A rational agent will not adopt a credence function that is strongly dominated by an alternative credence function that is not itself even weakly dominated and which matches the agent’s evidence if the dominated credence function does.

Easwaran and Fitelson then object that there are situations in which Evidential Dominance does not entail Probabilism. For instance, suppose that a trick coin is about to be tossed. Your evidence tells you that the chance of it landing heads is 0.7. Your credence that it will land heads is 0.7 and your credence that it will land tails is 0.6. Then you might think that your credences match your evidence, because you have evidence only about it landing heads and your credence that it will land heads equals the known chance that it will land heads. However, it turns out that all of the credence functions that accuracy dominate your credence function (when accuracy is measured by the Brier score) fail to match this evidence: that is, they assign credence other than 0.7 to Heads. Thus, Evidential Dominance does not entail that your credence function is irrational. Figure 2 illustrates this result. Pettigrew (2014a) responds to this objection on behalf of the accuracy argument for Probabilism.

[Figure 2 description: the same as Figure 1, except that there is a vertical dashed line going through the point labelled \(c^*\). (Figure 1 description repeated: a graph of two vertical lines and two horizontal lines forming a square, with the lines extending beyond the intersections. The left vertical line is labelled 'Tails' and the lower horizontal line is labelled 'Heads'. The upper left corner is labelled \(v_{w_1}\), the lower right corner is labelled \(v_{w_2}\), and a diagonal line connects the two. Two arcs, one stretching from the lower left vertical line to the upper right horizontal line and the other from the upper right vertical line to the lower left horizontal line, intersect twice. The upper right intersection is labelled \(c^*\), and a point on the diagonal line in the middle of the intersection space is labelled \(c'\).)]

Figure 2: In this figure, as in Figure 1, we plot the various possible credence functions defined on a proposition Heads and its negation Tails in the unit square. The diagonal line contains all and only the probability functions. Let \(c^*\) be your credence function: that is, it assigns 0.7 to Heads and 0.6 to Tails. So it violates Probabilism. The credence functions that lie between the two arcs are all and only the credence functions that accuracy dominate \(c^*\). The credence functions on the dashed line are all and only the credence functions that match your evidence that the chance of Heads is 0.7. Notice that the dashed line does not overlap with the set of credence functions that accuracy dominate yours at any point. This is the crucial fact on which Easwaran and Fitelson’s objection rests.
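The fact on which the objection rests can be checked numerically. The Python sketch below fixes the credence in Heads at 0.7, as the evidence requires, and searches over credences in Tails for a Brier dominator of \(c^*\). The step size of \(\frac{1}{1000}\) and the function names are illustrative assumptions, and a grid search is evidence for the claim, not a proof of it.

```python
from fractions import Fraction

# Omniscient credences as (Heads, Tails): Heads true, then Tails true.
WORLDS = [(1, 0), (0, 1)]


def brier(c, w):
    """Brier inaccuracy of credence function c = (heads, tails) at world w."""
    return sum((v - x) ** 2 for v, x in zip(w, c))


def accuracy_dominates(a, b):
    """True if a is strictly less inaccurate than b at every world."""
    return all(brier(a, w) < brier(b, w) for w in WORLDS)


c_star = (Fraction(7, 10), Fraction(6, 10))

# Credence functions on the dashed line: Heads fixed at 0.7, Tails varying.
evidence_line = [(Fraction(7, 10), Fraction(i, 1000)) for i in range(1001)]
dominators_on_line = [c for c in evidence_line if accuracy_dominates(c, c_star)]

# For contrast: a probability function off the dashed line that dominates.
off_line_dominator = (Fraction(11, 20), Fraction(9, 20))
off_line_dominates = accuracy_dominates(off_line_dominator, c_star)
```

The reason no point on the line succeeds is visible in the arithmetic: with Heads held at 0.7, doing better when Heads is true requires a credence in Tails below 0.6, while doing better when Tails is true requires one above 0.6.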

5.4.4 Dominance and Act-State Dependence

The final objection to Naive Dominance comes from Hilary Greaves (2013) and Michael Caie (2013), who point out that, in practical decision theory, only a restricted version of that principle is accepted (see also Jenkins 2007; Berker 2013a,b; Carr ms.). To see why such a restriction is needed, consider the following case:

Driving Test My driving test is in a week’s time. I can choose now whether or not I will practise for it. Other things being equal, I prefer not to practise. But I also want to pass the test, and I know that I won’t pass if I don’t practise, and I will pass if I do. Here is my decision table:

                  Pass    Fail
Practise            10       2
Don’t Practise      15       7

According to Naive Dominance, it is irrational to practise. After all, whether or not I pass or fail, I obtain higher utility if I don’t practise, so not practising strongly dominates practising. But this is clearly the wrong result. The reason is that I should not compare practising at the world at which I pass with not practising at that world, and practising at the world at which I fail with not practising at that world. For if I practise, I will pass; and if I don’t, I will fail. Moreover, I know all this. So I should compare practising at the world at which I pass with not practising at the world at which I fail. And then my utility is higher if I practise.
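The contrast between the two comparisons can be made explicit in a short Python sketch. The utilities are those in the decision table. The dictionary `OUTCOME_OF`, which records that practising leads to passing and not practising to failing, and the function names are illustrative assumptions.

```python
# Utilities from the Driving Test decision table, keyed by (act, state).
UTILITY = {
    ("Practise", "Pass"): 10,
    ("Practise", "Fail"): 2,
    ("Don't practise", "Pass"): 15,
    ("Don't practise", "Fail"): 7,
}
ACTS = ["Practise", "Don't practise"]
STATES = ["Pass", "Fail"]

# Act-state dependence: the state that obtains depends on the act chosen.
OUTCOME_OF = {"Practise": "Pass", "Don't practise": "Fail"}


def strongly_dominates(a, b):
    """Naive comparison: a beats b state by state, ignoring dependence."""
    return all(UTILITY[(a, s)] > UTILITY[(b, s)] for s in STATES)


def realised_utility(act):
    """Utility of an act at the state that the act itself brings about."""
    return UTILITY[(act, OUTCOME_OF[act])]


naive_verdict = strongly_dominates("Don't practise", "Practise")
best_act = max(ACTS, key=realised_utility)
```

The state-by-state comparison favours not practising, while the comparison of each act at the state it brings about favours practising (10 utiles against 7).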

The moral of this example is that Naive Dominance should be restricted so that it applies only in situations in which the options between which the agent is choosing will not influence the way the world is if they are adopted. Such situations are sometimes called situations of act-state independence. In situations in which the acts (options) influence the states (of the world), Naive Dominance does not apply. To see how this affects the accuracy argument for Probabilism, consider the following example, which borrows from Caie’s and Greaves’ examples:

Thwarted Accuracy Suppose I can read your mind. You have opinions only about two propositions, \(A\) and \(\neg A\). And suppose that I have control over the truth of \(A\) and \(\neg A\). I decide to do the following. First, define the non-probabilistic credence function \(c^\dag(A) = 0.99\) and \(c^\dag(\neg A) = 0.005\). Then:

  1. If your credence function is \(c^\dag\), I will make \(A\) true (and thereby make your credence function very accurate);
  2. If your credence function is not \(c^\dag\) and your credence in \(A\) is greater than 0.5, I will make \(A\) false (and thereby make your credence function rather inaccurate);
  3. If your credence function is not \(c^\dag\) and your credence in \(A\) is at most 0.5, I will make \(A\) true (and thereby make your credence function rather inaccurate).

In this case, since the credence function \(c^\dag\) is not a probability function, it is accuracy dominated by Joyce’s theorem and thus it is ruled out as irrational by Naive Dominance, just as the option of practising is ruled out as irrational in Driving Test. However, this is a situation in which adopting an option influences the way the world is in such a way that it affects the utility of the option, just as choosing whether or not to practise does in Driving Test. If I were to have credence function \(c^\dag\), I would be more accurate than I would be were I to have any other credence function. Thus, it seems that, just as we said that practising is in fact the only option that shouldn’t be ruled irrational in Driving Test, so now we must say that credence function \(c^\dag\) is the only option that shouldn’t be ruled irrational in Thwarted Accuracy. But of course, it then follows that Probabilism is false, for there are situations such as this one in which it is irrational to do anything other than have a non-probabilistic credence function.
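A minimal Python sketch of Thwarted Accuracy is given below. It assumes, for illustration, that inaccuracy is measured by the Brier score, which is only one of the measures permitted by Joyce's conditions. The candidate dominator is obtained by projecting \(c^\dag\) onto the probability functions, and the grid of rival credence functions is likewise an assumption of the sketch.

```python
from fractions import Fraction

# c-dagger as (credence in A, credence in not-A).
C_DAGGER = (Fraction(99, 100), Fraction(1, 200))


def brier(c, a_is_true):
    """Brier inaccuracy of c at the world where A has the given truth value."""
    v = (1, 0) if a_is_true else (0, 1)
    return sum((vi - ci) ** 2 for vi, ci in zip(v, c))


def a_made_true(c):
    """The mind-reader's rule: is A made true, given the agent's credences?"""
    if c == C_DAGGER:
        return True
    return c[0] <= Fraction(1, 2)


def realised_inaccuracy(c):
    """Inaccuracy of c at the world that adopting c brings about."""
    return brier(c, a_made_true(c))


# Projection of c-dagger onto the line where the two credences sum to 1.
shift = (C_DAGGER[0] + C_DAGGER[1] - 1) / 2
c_prime = (C_DAGGER[0] - shift, C_DAGGER[1] - shift)

# World by world, c_prime is more accurate than c-dagger...
dominates_world_by_world = all(
    brier(c_prime, t) < brier(C_DAGGER, t) for t in (True, False)
)
# ...but once the dependence is taken into account, c-dagger does better.
dagger_wins_when_adopted = (
    realised_inaccuracy(C_DAGGER) < realised_inaccuracy(c_prime)
)

# Best realised inaccuracy among rivals on a 101 x 101 grid.
grid = [Fraction(i, 100) for i in range(101)]
best_rival = min(realised_inaccuracy((a, n)) for a in grid for n in grid)
```

On these assumptions, no rival on the grid achieves a realised inaccuracy below \(\frac{1}{4}\), whereas \(c^\dag\) achieves \(\frac{1}{8000}\).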

There are three responses available here: the first is to bite the bullet, accept the restriction to Naive Dominance, and therefore accept a restriction on the cases in which Probabilism holds; the second is to argue that the practical case and the epistemic case are different, with different decision-theoretic principles applying to each; the third, of course, is to abandon the accuracy argument for Probabilism. Joyce (forthcoming) and Pettigrew (forthcoming-b) argue for the first response. They advocate different decision-theoretic principles to replace Naive Dominance in the epistemic case: Joyce advocates standard causal decision theory together with a Ratifiability condition (Jeffrey 1983); Pettigrew omits the ratifiability condition. But they both agree that these principles will agree with Naive Dominance in cases of act-state independence; and they agree with the verdict that \(c^\dag\) is the only credence function that isn’t ruled out as irrational in Thwarted Accuracy. Konek & Levinstein (ms) argue for the second response, claiming that, since doxastic states and actions have different directions of fit, different decision-theoretic principles will govern them. They hold that Naive Dominance (or, perhaps, Undominated Dominance) is the correct principle when the options are credence functions, even though it is not the correct principle when the options are actions. Caie (2013) and Berker (2013b), on the other hand, argue for the third option.

6. Epistemic disutility arguments

So far, we have considered calibration arguments and accuracy arguments for Probabilism. In each of these cases, we identify a particular feature of a credence function (the proximity of its credences to being calibrated, or their proximity to the omniscient credences), we claim that it is the source of all epistemic utility, and we attempt to characterize the mathematical functions that legitimately measure the extent to which the credence function has that feature. In this section, we consider an argument, due again to Joyce, that attempts to characterize epistemic disutility functions directly (Joyce 2009). Here, I focus only on the central condition:

Coherent Admissibility Suppose \(\mathfrak{D}\) is a measure of epistemic disutility. Then, if \(c^*\) is a probabilistic credence function, then \(c^*\) is not weakly dominated relative to \(\mathfrak{D}\). That is, for any probabilistic credence function \(c^*\), there is no credence function \(c'\) such that (i) \(\mathfrak{D}(c', w) \leq \mathfrak{D}(c^*, w)\) for all \(w\); and (ii) \(\mathfrak{D}(c', w) < \mathfrak{D}(c^*, w)\) for some \(w\).

Together with Undominated Dominance, Joyce’s new set of conditions on an epistemic disutility function entails Probabilism. Let’s say that Joycean Disutility is the claim that all legitimate measures of epistemic disutility satisfy Coherent Admissibility along with the other new conditions that Joyce imposes. Then we have:

Theorem 5 (Joyce 2009) Suppose \(\mathcal{F}\) is an algebra and \(\mathfrak{D} : \mathcal{C_F} \times \mathcal{W_F} \rightarrow [0, \infty]\) is a Joycean epistemic disutility function for the credence functions on \(\mathcal{F}\). Now suppose that \(c^*\) is a credence function in \(\mathcal{C_F}\) that violates Probabilism. Then there is a credence function \(c'\) in \(\mathcal{C_F}\) such that (i) \(\mathfrak{D}(c', w) < \mathfrak{D}(c^*, w)\) for all \(w\) in \(\mathcal{W_F}\), and (ii) there is no credence function \(c\) such that \(\mathfrak{D}(c, w) \leq \mathfrak{D}(c', w)\) for all \(w\) in \(\mathcal{W_F}\) and \(\mathfrak{D}(c, w) < \mathfrak{D}(c', w)\) for some \(w\) in \(\mathcal{W_F}\).

Thus, we have the following argument:

Joycean epistemic disutility argument for Probabilism

Joyce argues for Coherent Admissibility as follows.

  • \((1)\) For each probabilistic credence function \(c\), there is a possible world at which \(c\) is the objective chance function.
  • \((2)\) If an agent learns with certainty that \(c\) is the objective chance function, and nothing more, then the unique rational response to her evidence is to set her credence function to \(c\). (This is close to David Lewis’ Principal Principle (Lewis 1980).)
  • \((3)\) Thus, by (1) and (2): for each probabilistic credence function \(c\), there is an evidential situation in which an agent might find herself such that \(c\) is the unique rational response to that evidential situation.
  • \((4)\) Thus, by (3): Let \(c^*\) be a probabilistic credence function. Then there is an evidential situation in which \(c^*\) is the unique rational response.
  • \((5)\) If \(c'\) weakly dominates \(c^*\) relative to a legitimate measure of epistemic disutility, and \(c^*\) is rationally permitted, then \(c'\) is also rationally permitted.
  • \((6)\) Thus, by (4) and (5): if \(c^*\) is weakly dominated, there is no evidential situation in which \(c^*\) is the unique rational response.
  • Therefore,
  • \((7)\) \(c^*\) is not weakly dominated relative to any legitimate measure of epistemic disutility.

Alan Hájek (2008) has raised two objections to this argument.

Objection 1: Not all probabilistic credence functions could be chance functions. The first objection denies (1). As Hájek notes, if \(c\) is defined on propositions concerning ethical matters, or mathematical matters, or aesthetic matters, or facts about the current time or the agent’s current location, it is not clear that it could possibly be the chance function of any world, since chances cannot attach to these sorts of proposition. Pettigrew (2014b: 5.2.1) replies on Joyce’s behalf.

Objection 2: The argument over-generates. The second objection claims that, in the absence of Probabilism, which is supposed to be the conclusion of the argument of which Coherent Admissibility is a crucial part, this argument over-generates. Consider, for instance, the following claim:

  • (\(2'\)) If an agent learns with certainty that \(c\) is the credence function that constitutes the unique rational response to her evidence at that time, and nothing more, then the unique rational response is to set her credence function to \(c\).

Now, suppose \(c^\dag\) is a non-probabilistic credence function and apply the version of Joyce’s argument that results from replacing (2) with (2’). That is, we assume that it is possible that the agent learn with certainty that \(c^\dag\) is the unique rational response to her evidence, even if in fact it is not. We might assume, for instance, that a mischievous God whispers in the agent’s ear that this is the case. Then we must conclude that \(c^\dag\) is not weakly dominated relative to any legitimate measure of epistemic disutility. But now we have that no credence function is weakly dominated, whether it is probabilistic or not. And, combined with Joyce’s other considerations, this is impossible. If no probabilistic credence functions are weakly dominated relative to an epistemic disutility function, then all of the non-probabilistic credence functions are: that’s the lesson of Theorem 5 above. Of course, the natural response to this objection is to note that (2’) only holds when \(c\) is a probabilistic credence function. But such a restriction is unmotivated until we have established Probabilism.

7. Related issues

That completes our survey of the existing literature on the epistemic utility arguments for Probabilism. We have considered three families of argument: calibration arguments, accuracy arguments, and epistemic disutility arguments. In this final section, we briefly consider ways in which the argument strategy employed here (and described in section 2) might be generalised.

7.1 Infinite probability spaces

We have assumed throughout that the set of propositions on which an agent’s credence function is defined is finite. What happens when we lift this restriction? Can we justify Countable Additivity, for instance? Some work has been done in this area, but there is great scope for further investigation (Easwaran 2013; Huttegger 2013; Konek ms.).

7.2 Other principles of rationality for credences

We have focussed here on the synchronic coherence principle of Probabilism. But there are many other principles that are thought to govern rational credence. It is natural to ask whether we can give similar arguments for those. As we saw above in section 5.2, a number of epistemic norms have been explored in this framework, but of course there are many more still to consider.

7.3 Other doxastic states

In this entry, we have considered agents represented as having precise credence functions. But there are, of course, many other models of doxastic states that are considered in current epistemology. As mentioned at the outset, we might represent an agent by the set of propositions that they believe; or we might represent them using a set of precise probability functions; or a comparative confidence ordering; or a precise primitive conditional probability function. And, when modelled in this way, there are principles of rationality that apply to these agents. Are there accuracy arguments in their favour? See (Easwaran 2015) and (Easwaran & Fitelson forthcoming) for some work on this question for the case of full beliefs. And see Seidenfeld, Schervish, & Kadane 2012, Schoenfield 2015, and Mayo-Wilson & Wheeler forthcoming for results that suggest that it may be difficult to extend the framework to the case of imprecise credences.

Bibliography

  • Ahlstrom-Vij, K. & J. Dunn (eds.), forthcoming, Epistemic Consequentialism, Oxford: Oxford University Press.
  • Berker, S., 2013a, “Epistemic Teleology and the Separateness of Propositions”, Philosophical Review, 122(3): 337–393.
  • –––, 2013b, “The Rejection of Epistemic Consequentialism”, Philosophical Issues (Supp. Noûs), 23(1): 363–387.
  • BonJour, L., 1985, The Structure of Empirical Knowledge, Cambridge, MA: Harvard University Press.
  • Caie, M., 2013, “Rational Probabilistic Incoherence”, Philosophical Review, 122(4): 527–575.
  • Carr, J., ms., Epistemic Utility Theory and the Aim of Belief.
  • Easwaran, K., 2013, “Expected Accuracy Supports Conditionalization—and Conglomerability and Reflection”, Philosophy of Science, 80(1): 119–142.
  • –––, 2015, “Dr Truthlove, Or: How I Learned to Stop Worrying and Love Bayesian Probabilities”, Noûs, doi:10.1111/nous.12099
  • Easwaran, K. & B. Fitelson, 2012, “An ‘evidentialist’ worry about Joyce’s argument for Probabilism”, Dialectica, 66(3): 425–433.
  • –––, forthcoming, “Accuracy, Coherence, and Evidence”, Oxford Studies in Epistemology, 5.
  • Fraassen, B.C. van, 1983, “Calibration: Frequency Justification for Personal Probability”, in R.S. Cohen & L. Laudan (eds.), Physics, Philosophy, and Psychoanalysis, Dordrecht: Springer.
  • Goldman, A.I., 2002, Pathways to Knowledge: Private and Public, New York: Oxford University Press.
  • Greaves, H., 2013, “Epistemic Decision Theory”, Mind, 122(488): 915–952.
  • Greaves, H. & D. Wallace, 2006, “Justifying Conditionalization: Conditionalization Maximizes Expected Epistemic Utility”, Mind, 115(459): 607–632.
  • Harman, G., 1973, Thought, Princeton, NJ: Princeton University Press.
  • Hájek, A., 2008, “Arguments For—Or Against—Probabilism?”, The British Journal for the Philosophy of Science, 59(4): 793–819.
  • –––, 2009, “Fifteen Arguments against Hypothetical Frequentism”, Erkenntnis, 70: 211–235.
  • Horowitz, S., 2014, “Immoderately rational”, Philosophical Studies, 167: 41–56.
  • Huttegger, S.M., 2013, “In Defense of Reflection”, Philosophy of Science, 80(3): 413–433.
  • Jeffrey, R., 1965, The Logic of Decision, New York: McGraw-Hill.
  • –––, 1983, The Logic of Decision, 2nd edition, Chicago and London: University of Chicago Press.
  • Jenkins, C.S., 2007, “Entitlement and Rationality”, Synthese, 157: 25–45.
  • Joyce, J.M., 1998, “A Nonpragmatic Vindication of Probabilism”, Philosophy of Science, 65(4): 575–603.
  • –––, 2009, “Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief”, in F. Huber & C. Schmidt-Petri (eds.), Degrees of Belief, Springer.
  • –––, forthcoming, “The True Consequences of Epistemic Consequentialism”, in Ahlstrom-Vij & Dunn forthcoming.
  • Konek, J., ms., “Probabilistic Knowledge and Cognitive Ability”.
  • Konek, J. & B.A. Levinstein, ms., The Foundations of Epistemic Decision Theory.
  • Lam, B., 2013, “Calibrated Probabilities and the Epistemology of Disagreement”, Synthese, 190(6): 1079–1098.
  • Lange, M., 1999, “Calibration and the Epistemological Role of Bayesian Conditionalization”, The Journal of Philosophy, 96(6): 294–324.
  • Leitgeb, H. & R. Pettigrew, 2010a, “An Objective Justification of Bayesianism I: Measuring Inaccuracy”, Philosophy of Science, 77: 201–235.
  • –––, 2010b, “An Objective Justification of Bayesianism II: The Consequences of Minimizing Inaccuracy”, Philosophy of Science, 77: 236–272.
  • Levinstein, B.A., 2012, “Leitgeb and Pettigrew on Accuracy and Updating”, Philosophy of Science, 79(3): 413–424.
  • –––, 2015, “With All Due Respect: The Macro-Epistemology of Disagreement”, Philosophers’ Imprint, 15(3): 1–20.
  • Lewis, D., 1980, “A Subjectivist’s Guide to Objective Chance”, in R.C. Jeffrey (ed.), Studies in Inductive Logic and Probability (Vol. II). Berkeley: University of California Press.
  • Maher, P., 1993, Betting on Theories, Cambridge: Cambridge University Press.
  • –––, 2002, “Joyce’s Argument for Probabilism”, Philosophy of Science, 69(1): 73–81.
  • Mayo-Wilson, C. & G. Wheeler, forthcoming, “Scoring Imprecise Credences: A Mildly Immodest Proposal”, Philosophy and Phenomenological Research.
  • Moss, S., 2011, “Scoring Rules and Epistemic Compromise”, Mind, 120(480): 1053–1069.
  • Pettigrew, R., 2010, “Modelling uncertainty”, Grazer Philosophische Studien, 80.
  • –––, 2013a, “A New Epistemic Utility Argument for the Principal Principle”, Episteme, 10(1): 19–35.
  • –––, 2013b, “Epistemic Utility and Norms for Credence”, Philosophy Compass, 8(10): 897–908.
  • –––, 2014a, “Accuracy and Evidence”, Dialectica.
  • –––, 2014b, “Accuracy, Risk, and the Principle of Indifference”, Philosophy and Phenomenological Research.
  • –––, forthcoming-a, Accuracy and the Laws of Credence, Oxford: Oxford University Press.
  • –––, forthcoming-b, “Making Things Right: the true consequences of decision theory in epistemology”, in Ahlstrom-Vij & Dunn forthcoming.
  • Rosenkrantz, R.D., 1981, Foundations and Applications of Inductive Probability, Atascadero, CA: Ridgeview Press.
  • Schoenfield, M., ms., “Conditionalization does not (in general) Maximize Expected Accuracy”.
  • –––, 2015, “The Accuracy and Rationality of Imprecise Credences”, Noûs, doi:10.1111/nous.12105.
  • Seidenfeld, T., 1985, “Calibration, Coherence, and Scoring Rules”, Philosophy of Science, 52(2): 274–294.
  • Seidenfeld, T., M.J. Schervish, & J.B. Kadane, 2012, “Forecasting with imprecise probabilities”, International Journal of Approximate Reasoning, 53: 1248–1261.
  • Shimony, A., 1988, “An Adamite Derivation of the Calculus of Probability”, in J. Fetzer (ed.), 1988, Probability and Causality: Essays in Honor of Wesley C. Salmon, Dordrecht: Reidel.
  • White, R., 2009, “Evidential Symmetry and Mushy Credence”, Oxford Studies in Epistemology, 3: 161–186.
  • Williamson, T., 2000, Knowledge and its Limits, Oxford: Oxford University Press.

Copyright © 2015 by
Richard Pettigrew <Richard.Pettigrew@bris.ac.uk>

This is a file in the archives of the Stanford Encyclopedia of Philosophy.