Stanford Encyclopedia of Philosophy
This is a file in the archives of the Stanford Encyclopedia of Philosophy.

Defeasible Reasoning

First published Fri Jan 21, 2005; substantive revision Fri Dec 11, 2009

Reasoning is defeasible when the corresponding argument is rationally compelling but not deductively valid. The truth of the premises of a good defeasible argument provides support for the conclusion, even though it is possible for the premises to be true and the conclusion false. In other words, the relationship of support between premises and conclusion is a tentative one, potentially defeated by additional information. Philosophers have studied the nature of defeasible reasoning since Aristotle's analysis of dialectical reasoning in the Topics and the Posterior Analytics, but the subject has been studied with unique intensity over the last forty years, largely due to the interest it attracted from the artificial intelligence movement in computer science. There have been two approaches to the study of reasoning: treating it either as a branch of epistemology (the study of knowledge) or as a branch of logic. In recent work, the term defeasible reasoning has typically been limited to inferences involving rough-and-ready, exception-permitting generalizations, that is, inferring what has happened or will happen on the basis of what normally happens. This narrower sense of defeasible reasoning, which will be the subject of this article, excludes from the topic the study of other forms of non-deductive reasoning, including inference to the best explanation, abduction, analogical reasoning, and scientific induction. This exclusion is to some extent artificial, but it reflects the fact that the formal study of these other forms of non-deductive reasoning remains quite rudimentary.


1. History

Defeasible reasoning has been the subject of study by both philosophers and computer scientists (especially those involved in the field of artificial intelligence). The philosophical history of the subject goes back to Aristotle, while the field of artificial intelligence has greatly intensified interest in it over the last forty years.

1.1 Philosophy

According to Aristotle, deductive logic (especially in the form of the syllogism) plays a central role in the articulation of scientific understanding, deducing observable phenomena from definitions of natures that hold universally and without exception. However, in the practical matters of everyday life, we rely upon generalizations that hold only “for the most part”, under normal circumstances, and the application of such common sense generalizations involves merely dialectical reasoning, reasoning that is defeasible and falls short of deductive validity. Aristotle lays out a large number and great variety of examples of such reasoning in his work entitled the Topics.

Investigations in logic after Aristotle (from later antiquity through the twentieth century) seem to have focused exclusively on deductive logic. This continued to be true as the predicate logic was developed by Peirce, Frege, Russell, Whitehead and others in the late nineteenth and early twentieth centuries. With the collapse of logical positivism in the mid twentieth century (and the abandonment of attempts to treat the physical world as a logical construction from facts about sense data), new attention was given to the relationship between sense perception and the external world. Roderick Chisholm (Chisholm 1957; Chisholm 1966) argued that sensory appearances give good, but defeasible, reasons for believing in corresponding facts about the physical world. If I am “appeared to redly” (have the sensory experience as of being in the presence of something red), then, Chisholm argued, I may presume that I really am in the presence of something red. This presumption can, of course, be defeated, if, for example, I learn that my environment is relevantly abnormal (for instance, all the ambient light is red).

John L. Pollock developed Chisholm's idea into a theory of prima facie reasons and defeaters of those reasons (Pollock 1967; Pollock 1974; Pollock 1979). Pollock distinguished between two kinds of defeaters of a defeasible inference: rebutting defeaters (which give one a prima facie reason for believing the denial of the original conclusion) and undercutting defeaters (which give one a reason for doubting that the usual relationship between the premises and the conclusion holds in the given case). According to Pollock, a conclusion is warranted, given all of one's evidence, if it is supported by an ultimately undefeated argument whose premises are drawn from that evidence.

1.2 Artificial Intelligence

As the subdiscipline of artificial intelligence took shape in the 1960s, pioneers like John M. McCarthy and Patrick J. Hayes soon discovered the need to represent and implement the sort of defeasible reasoning that had been identified by Aristotle and Chisholm. McCarthy and Hayes (McCarthy and Hayes 1969) developed a formal language they called the “situation calculus”, for use by expert systems attempting to model changes and interactions among a domain of objects and actors. McCarthy and Hayes encountered what they called the frame problem: the problem of deciding which conditions will not change in the wake of an event. They required a defeasible principle of inertia: the presumption that any given condition will not change, unless required to do so by actual events and dynamic laws. In addition, they encountered the qualification problem: the need for a presumption that an action can be successfully performed, once a short list of essential prerequisites has been met. McCarthy (McCarthy 1977, 1038-1044) suggested that the solution lay in a logical principle of circumscription: the presumption that the actual situation is as unencumbered with abnormalities and oddities (including unexplained changes and unexpected interferences) as is consistent with our knowledge of it (McCarthy 1982; McCarthy 1986). In effect, McCarthy suggests that it is warranted to believe whatever is true in all the minimal (or otherwise preferred) models of one's initial information set.
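
The idea of believing whatever holds in all minimal models can be illustrated with a small propositional sketch. It is only an approximation of circumscription under stated assumptions: the three-atom vocabulary, the single "abnormal" atom, and the function names are hypothetical, and models are compared globally rather than with the fixed and varying predicates of McCarthy's full definition.

```python
from itertools import product

# Hypothetical vocabulary for a single individual.
ATOMS = ("bird", "abnormal", "flies")

def models(constraints):
    """Return every truth assignment over ATOMS that satisfies all constraints."""
    found = []
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if all(constraint(world) for constraint in constraints):
            found.append(world)
    return found

def minimal_models(constraints, minimized=("abnormal",)):
    """Keep the models whose set of true minimized atoms is subset-minimal."""
    candidates = models(constraints)

    def oddities(world):
        return {atom for atom in minimized if world[atom]}

    return [w for w in candidates
            if not any(oddities(v) < oddities(w) for v in candidates)]

def circumscriptively_entails(constraints, query):
    """A query counts as warranted if it holds in every minimal model."""
    return all(query(w) for w in minimal_models(constraints))

# "Birds fly unless they are abnormal", written as a material conditional.
birds_fly = lambda w: w["flies"] or w["abnormal"] or not w["bird"]
is_bird = lambda w: w["bird"]
is_abnormal = lambda w: w["abnormal"]
flies = lambda w: w["flies"]

print(circumscriptively_entails([birds_fly, is_bird], flies))
print(circumscriptively_entails([birds_fly, is_bird, is_abnormal], flies))
```

With only the information that the individual is a bird, the sole minimal model is one without abnormality, so flying is warranted. Once abnormality is asserted, both flying and non-flying models are minimal and the conclusion is withdrawn.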

In the early 1980s, several systems of defeasible reasoning were proposed by others in the field of artificial intelligence: Ray Reiter's default logic (Reiter 1980; Etherington & Reiter 1983, 104-108), McDermott and Doyle's Non-Monotonic Logic I (McDermott and Doyle 1982), Robert C. Moore's Autoepistemic Logic (Moore 1985), and Hector Levesque's formalization of the “all I know” operator (Levesque 1990). These early proposals involved the search for a kind of fixed point or cognitive equilibrium. Special rules (called default rules by Reiter) permit drawing certain conclusions so long as these conclusions are consistent with what one knows, including all that one knows on the basis of these very default rules. In some cases, no such fixed point exists, and, in others, there are multiple, mutually inconsistent fixed points. In addition, these systems were procedural or computational in nature, in contrast to the semantic characterization of warranted conclusions (in terms of preferred models) in McCarthy's circumscription system. Later work in artificial intelligence has tended to follow McCarthy's lead in this respect.
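
The possibility of multiple, mutually inconsistent fixed points can be seen in a minimal sketch restricted to normal defaults over literals (rules of the form "if the prerequisite is believed and the conclusion is consistent with what is believed, conclude it"). The string encoding of literals, the function names, and the Quaker/Republican example data are assumptions made for illustration. This restricted fragment always has at least one fixed point, so it does not exhibit the case in which none exists.

```python
from itertools import permutations

def neg(literal):
    """Negate a literal written as 'p' or '~p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def extensions(facts, defaults):
    """Collect the belief sets reached by applying normal defaults in every order.

    defaults is a list of (prerequisite, conclusion) pairs over literals.
    A default fires when its prerequisite is believed and the negation of
    its conclusion is not.
    """
    found = set()
    for order in permutations(defaults):
        beliefs = set(facts)
        changed = True
        while changed:
            changed = False
            for prerequisite, conclusion in order:
                applicable = (prerequisite in beliefs
                              and conclusion not in beliefs
                              and neg(conclusion) not in beliefs)
                if applicable:
                    beliefs.add(conclusion)
                    changed = True
                    break  # rescan from the start of the ordering
        found.add(frozenset(beliefs))
    return found

# Hypothetical example: Quakers are normally pacifists, Republicans normally are not.
facts = {"quaker", "republican"}
defaults = [("quaker", "pacifist"), ("republican", "~pacifist")]

all_extensions = extensions(facts, defaults)
skeptical = frozenset.intersection(*all_extensions)
print(len(all_extensions), sorted(skeptical))
```

Here two equilibria are found, one containing "pacifist" and one containing "~pacifist"; only the original facts belong to both.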

2. Applications and Motivation

Philosophers and theorists of artificial intelligence have found a wide variety of applications for defeasible reasoning. In some cases, the defeasibility seems to be grounded in some aspect of the subject or the context of communication, and in other cases in facts about the objective world. The former include defeasible rules as communicative or representational conventions and autoepistemic reasoning (reasoning about one's own knowledge and lack of knowledge). The latter, the objective sources of defeasibility, include defeasible obligations, defeasible laws of nature, induction, abduction, and Ockham's razor (the presumption that the world is as uncomplicated as possible).

2.1 Defeasibility as a Convention of Communication

Much of John McCarthy's early work in artificial intelligence concerned the interpretation of stories and puzzles (McCarthy and Hayes 1969; McCarthy 1977). McCarthy found that we often make assumptions based on what is not said. So, for example, in a puzzle about safely crossing a river by canoe, we assume that there are no bridges or other means of conveyance available. Similarly, when using a database to store and convey information, the information that, for example, no flight is scheduled at a certain time is represented simply by not listing such a flight. Inferences based on these conventions are defeasible, however, because the conventions can themselves be explicitly abrogated or suspended.
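
The database convention can be made concrete with a short sketch. The timetable entries and the function name are hypothetical; the point is only that absence from the list is read as a negative fact, and that this reading is withdrawn when the list is extended.

```python
# Hypothetical timetable: (origin, destination, departure time).
timetable = {("SFO", "JFK", "09:00"), ("SFO", "JFK", "15:30")}

def is_scheduled(timetable, origin, destination, time):
    """Closed-world reading: a flight that is not listed is presumed not to exist."""
    return (origin, destination, time) in timetable

# By convention, silence about a noon flight is taken to mean there is none.
before = is_scheduled(timetable, "SFO", "JFK", "12:00")

# New information defeats the earlier negative conclusion.
timetable.add(("SFO", "JFK", "12:00"))
after = is_scheduled(timetable, "SFO", "JFK", "12:00")

print(before, after)
```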

2.2 Autoepistemic Reasoning

Robert C. Moore (Moore 1985) pointed out that we sometimes infer things about the world based on our not knowing certain things. So, for instance, I might infer that I do not have a sister, since, if I did, I would certainly know it, and I do not in fact know that I have a sister. Such an inference is, of course, defeasible, since if I subsequently learn that I have a sister, after all, the basis for the original inference is nullified.

2.3 Semantics for Generics and the Progressive

Generic terms (like birds in Birds fly) are expressed in English by means of bare common noun phrases (without determiner). Adverbs like normally and typically are also indicators of generic predication. As Asher and Pelletier (Asher and Pelletier 1997) have argued, the semantics for such sentences seems to involve intensionality: a generic sentence can be true even if the majority of the kind, or even all of the kind, fail to conform to the generalization. It can be true that birds fly even if, as a result of a freakish accident, all surviving birds are abnormally flightless. A promising semantic theory for the generic is to represent generic predication by means of a defeasible rule or conditional.

The progressive verb involves a similar kind of intensionality (Asher 1992). If Jones is crossing the street, then it would normally be the case that Jones will succeed in crossing the street. However, this inference is clearly defeasible: Jones might be hit by a truck midway across and never complete the crossing.

2.4 Defeasible Obligations

Philosophers have, for quite some time, been interested in defeasible obligations, which give rise to defeasible inferences about what we are, all things considered, obliged to do. David Ross, in 1930, discussed the phenomenon of prima facie obligations (Ross 1930). The existence of a prima facie obligation gives one good but defeasible grounds for believing that one ought to fulfill that obligation. When formal deontic logic was developed by Chisholm and others in the 1960s (Chisholm 1963), the use of classical logic gave rise to certain paradoxes, such as Chisholm's paradox of contrary-to-duty imperatives. These paradoxes can be resolved by recognizing that the inference from imperative to actual duty is a defeasible one (Asher and Bonevac 1996; Nute 1997).

2.5 Defeasible Laws of Nature

Philosophers David M. Armstrong and Nancy Cartwright have argued that the actual laws of nature are oaken rather than iron, to use Armstrong's terms (Armstrong 1983; Armstrong 1997, 230-231; Cartwright 1983). Oaken laws admit of exceptions: they have tacit ceteris paribus (other things being equal) or ceteris absentibus (other things being absent) conditions. As Cartwright points out, an inference based on such a law of nature is always defeasible, since we may discover that additional phenomenological factors must be added to the law in question in special cases.

There are several reasons to think that deductive logic is not an adequate tool for dealing with this phenomenon. In order to apply deduction to the laws and the initial conditions, the laws must be represented in a form that admits of no exceptions. This would require explicitly stating each potentially relevant condition in the antecedent of each law-stating conditional. This is impractical, not only because it makes the statement of each and every law extremely cumbersome, but also because we know that there are many exceptional cases that we have not yet encountered and may not be able to imagine. Defeasible laws enable us to express what we really know to be the case, rather than forcing us to pretend that we can make an exhaustive list of all the possible exceptions.

2.6 Defeasible Principles in Metaphysics and Epistemology

Many classical philosophical arguments, especially those in the perennial philosophy that endured from Plato and Aristotle to the end of scholasticism, can be fruitfully reconstructed by means of defeasible logic. Metaphysical principles, like the laws of nature, may hold in normal cases, while admitting of occasional exceptions. The principle of causality, for example, that plays a central role in the cosmological argument for God's existence, can plausibly be construed as a defeasible generalization (Koons 2001).

As discussed above (in section 1.1), prima facie reasons and defeaters of those reasons play a central role in contemporary epistemology, not only in relation to perceptual knowledge, but also in relation to every other source of knowledge: memory, imagination (as an indicator of possibility) and testimony, at the very least. In each case, an impression or appearance provides good but defeasible evidence of a corresponding reality.

2.7 Occam's Razor and the Assumption of a “Closed World”

Prediction always involves an element of defeasibility. If one predicts what will, or what would, under some hypothesis, happen, one must presume that there are no unknown factors that might interfere with those factors and conditions that are known. Any prediction can be upset by such unanticipated interventions. Prediction thus proceeds from the assumption that the situation as modeled constitutes a closed world: that nothing outside that situation could intrude in time to upset one's predictions. In addition, we seem to presume that any factor that is not known to be causally relevant is in fact causally irrelevant, since we are constantly encountering new factors and novel combinations of factors, and it is impossible to verify their causal irrelevance in advance. This closed-world assumption is one of the principal motivations for McCarthy's logic of circumscription (McCarthy 1982; McCarthy 1986).

3. Varieties of Approaches

We can treat the study of defeasible reasoning either as a branch of epistemology (the theory of knowledge), or as a branch of logic. In the epistemological approach, defeasible reasoning is studied as a form of inference, that is, as a process by which we add to our stock of knowledge. The epistemological approach is concerned with the transmission of warrant, with the question of when an inference, starting with justified or warranted beliefs, produces a new belief that is also warranted. This approach focuses explicitly on the norms of belief change.

In contrast, a logical approach to defeasible reasoning fastens on a relationship between propositions or possible bodies of information. Just as deductive logic consists of the study of a certain consequence relation between propositions or sets of propositions (the relation of valid implication), so defeasible (or nonmonotonic) logic consists of the study of a different kind of consequence relation. Deductive consequence is monotonic: if a set of premises logically entails a conclusion, then any superset (any set of premises that includes all of the first set) will also entail that same conclusion. In contrast, defeasible consequence is nonmonotonic. A conclusion follows defeasibly or nonmonotonically from a set of premises just in case it is true in nearly all of the models that verify the premises, or in the most normal models that do.
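
The contrast can be sketched by brute force over a tiny propositional language. The three atoms, the numerical abnormality ranking, and the function names below are hypothetical choices made for illustration; the sketch implements only the "most normal models" reading, not the "nearly all models" reading.

```python
from itertools import product

ATOMS = ("bird", "penguin", "flies")
WORLDS = [dict(zip(ATOMS, values))
          for values in product([False, True], repeat=len(ATOMS))]

def abnormality(world):
    """Hypothetical ranking of worlds: a lower score means a more normal world."""
    score = 0
    if world["bird"] and not world["flies"]:
        score += 1  # birds normally fly
    if world["penguin"] and world["flies"]:
        score += 2  # penguins normally do not fly
    if world["penguin"] and not world["bird"]:
        score += 4  # penguins are birds
    return score

def classically_entails(premises, conclusion):
    """The conclusion holds in every model of the premises."""
    return all(conclusion(w) for w in WORLDS if all(p(w) for p in premises))

def defeasibly_entails(premises, conclusion):
    """The conclusion holds in the most normal models of the premises."""
    satisfying = [w for w in WORLDS if all(p(w) for p in premises)]
    if not satisfying:
        return True
    best = min(abnormality(w) for w in satisfying)
    return all(conclusion(w) for w in satisfying if abnormality(w) == best)

bird = lambda w: w["bird"]
penguin = lambda w: w["penguin"]
flies = lambda w: w["flies"]

print(defeasibly_entails([bird], flies))
print(defeasibly_entails([bird, penguin], flies))
```

On this ranking, "flies" follows defeasibly from "bird" alone but not from "bird" together with "penguin", so enlarging the premise set retracts a conclusion. No such retraction is possible for `classically_entails`, since every model of the larger premise set is also a model of the smaller one.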

The two approaches are related. In particular, a logical theory of defeasible consequence will have epistemological consequences. It is presumably true that an ideally rational thinker will have a set of beliefs that are closed under defeasible, as well as deductive, consequence. However, a logical theory of defeasible consequence would have a wider scope of application than a merely epistemological theory of inference. Defeasible logic would provide a mechanism for engaging in hypothetical reasoning, not just reasoning from actual beliefs.

Conversely, as David Makinson and Peter Gärdenfors have pointed out (Makinson and Gärdenfors 1991, 185-205; Makinson 2005), an epistemological theory of belief change can be used to define a set of nonmonotonic consequence relations (one relation for each initial belief state). We can define the consequence relation α |~ β, for a given set of beliefs T, as holding just in case the result of adding belief α to T would include belief in β. However, on this approach, there would be many distinct nonmonotonic consequence relations, instead of a single perspective-independent one.

4. Epistemological Approaches

There have been three versions of the epistemological approach, each of which attempts to define how a cognitively ideal agent arrives at warranted conclusions, given an initial input. The first two of these, John L. Pollock's theory of defeasible reasoning and the theory of semantic inheritance networks, are explicitly computational in nature. They take as input a complex, structured state, representing the data available to the agent, and they define a procedure by which new conclusions can be warranted. The third approach, based on the theory of belief change (the AGM model) developed by Alchourrón, Gärdenfors and Makinson (Alchourrón, Gärdenfors and Makinson 1982), instead lays down a set of conditions that an ideal process of belief change ought to satisfy. The AGM model can be used to define a nonmonotonic consequence relation that is temporary and local. This can represent reasoning that is hypothetically or counterfactually defeasible, in the sense that what “follows” from a conjunctive proposition (p & q) need not be a superset of what “follows” from p alone.

4.1 Formal Epistemology

John Pollock's approach to defeasible reasoning consists of enumerating a set of rules that are constructive and effectively computable, and that aim at describing how an ideal cognitive agent builds up a rich set of beliefs, beginning with a relatively sparse data set (consisting of beliefs about immediate sensory appearances, apparent memories, and such things). The inferences involved are not, for the most part, deductive. Instead, Pollock defines, first, what it is for one belief to be a prima facie reason for believing another proposition. In addition, Pollock defines what it is for one belief, say in p, to be a defeater for q as a prima facie reason for r. In fact, Pollock distinguishes two kinds of defeaters: rebutting defeaters, which are themselves prima facie reasons for believing the negation of the conclusion, and undercutting defeaters, which provide a reason for doubting that q provides any support, in the actual circumstances, for r (Pollock 1987, 484). A belief is ultimately warranted in relation to a data set (or epistemic basis) just in case it is supported by some ultimately undefeated argument proceeding from that epistemic basis.

In his most recent work (Pollock 1995), Pollock uses a directed graph to represent the structure of an ideal cognitive state. Each directed link in the network represents the first node's being a prima facie reason for the second. The new theory includes an account of hypothetical, as well as categorical reasoning, since each node of the graph includes a (possibly empty) set of hypotheses. Somewhat surprisingly, Pollock assumes a principle of monotonicity with respect to hypotheses: a belief that is warranted relative to a set of hypotheses is also warranted with respect to any superset of hypotheses. Pollock also permits conditionalization and reasoning by cases.

An argument is self-defeating if it supports a defeater for one of its own defeasible steps. Here is an interesting example: (1) Robert says that the elephant beside him looks pink. (2) Robert's color vision becomes unreliable in the presence of pink elephants. Ordinarily, belief 1 would support the conclusion that the elephant is pink, but this conclusion undercuts the argument, thanks to belief 2. Thus, the argument that the elephant is pink is self-defeating. Pollock argues that all self-defeating arguments should be rejected, and that they should not be allowed to defeat other arguments. In addition, a set of nodes can experience mutual destruction or collective defeat if each member of the set is defeated by some other member, and no member of the set is defeated by an undefeated node that is outside the set.
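
The notion of collective defeat can be sketched as a labelling of a defeat graph. The procedure below is not Pollock's own definition of defeat status; it is a simplified, grounded-style labelling of the kind used in abstract argumentation, swapped in for illustration, and it does not model self-defeating arguments. The node names and the three status labels are hypothetical.

```python
def defeat_status(nodes, defeats):
    """Label each node 'undefeated', 'defeated', or 'provisional'.

    defeats is a set of (attacker, target) pairs. A node is undefeated when
    all of its attackers are defeated, defeated when some attacker is
    undefeated, and provisional when neither can be established.
    """
    status = {}
    changed = True
    while changed:
        changed = False
        for node in nodes:
            if node in status:
                continue
            attackers = [a for (a, target) in defeats if target == node]
            if all(status.get(a) == "defeated" for a in attackers):
                status[node] = "undefeated"
                changed = True
            elif any(status.get(a) == "undefeated" for a in attackers):
                status[node] = "defeated"
                changed = True
    return {node: status.get(node, "provisional") for node in nodes}

# Hypothetical graph: A and B defeat each other, A also defeats C,
# D is unattacked and defeats E, and E defeats F.
nodes = ["A", "B", "C", "D", "E", "F"]
defeats = {("A", "B"), ("B", "A"), ("A", "C"), ("D", "E"), ("E", "F")}

labels = defeat_status(nodes, defeats)
print(labels)
```

In this graph A and B are each defeated only by the other and by no undefeated outside node, so neither is warranted; C inherits that unsettled status. F is reinstated because its only defeater, E, is itself defeated by the unattacked node D.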

In formalizing the undercutting rebuttal, Pollock introduces a new connective, ⊗, where p ⊗ q means that it is not the case that p wouldn't be true unless q were true. Pollock uses rules, rather than conditional propositions, to express the prima facie relation. If he had, instead, introduced a special connective ⇒, with p ⇒ q meaning that p would be a prima facie reason for q, then undercutting defeaters could be represented by means of negating this conditional. To express the fact that r is an undercutting defeater of p as a prima facie reason for q, we could state both that (p ⇒ q) and ¬((p & r) ⇒ q).

In the case of conflicting prima facie reasons, Pollock rejects the principle of specificity, a widely accepted principle according to which the defeasible rule with the more specific antecedent takes priority over conflicting rules with less specific antecedents. Pollock does, however, accept a special case of specificity in the area of statistical syllogisms with projectible properties (Pollock 1995, 64-66). So, if I know that most As are Bs, and that most ACs are not Bs, then I should, upon learning that individual b is both A and C, give priority to the AC generalization over the A generalization (concluding that b is not a B).

Pollock's theory of warrant is intended to provide normative rules for belief, of the form: if you have warranted beliefs that are prima facie reasons for some further belief, and you have no ultimately undefeated defeaters for those reasons, then that further belief is warranted and should be believed. For more details of Pollock's theory, see the following supplementary document:

John Pollock's System

Wolfgang Spohn (Spohn 2002) has argued that Pollock's system is normatively defective because, in the end, Pollock has no normative standard to appeal to, other than ad hoc intuitions about how a reasonable person would respond to this or that cognitive situation. Spohn suggests that, with respect to the state of development of the study of defeasible reasoning, Pollock's theory corresponds to C. I. Lewis's early investigations into modal logic. Lewis suggested a number of possible axiom systems, but lacked an adequate semantic theory that could provide an independent check on the correctness or completeness of any given list (of the kind that was later provided by Kripke and Kanger). Analogously, Spohn argues that Pollock's system is in need of a unifying normative standard. This very same criticism can be lodged, with equal justice, against a number of other theories of defeasible reasoning, including semantic inheritance networks and default logic.

4.2 Semantic Inheritance Networks

The system of semantic inheritance networks, developed by Horty, Thomason and Touretzky (Horty, Thomason and Touretzky 1990), is similar to Pollock's system. Both represent cognitive states by means of directed graphs, with links representing defeasible inferences. The semantic inheritance network theory has an intentionally narrower scope: the initial nodes of the network represent particular individuals, and all non-initial nodes represent kinds, categories or properties. A link from an initial (individual) node to a category node represents simply predication: that Felix (initial node) is a cat (category node), for example. Links between category nodes represent defeasible or generic inclusion: that birds (normally or usually) are flying things. To be more precise, there are both positive (“is a”) and negative (“is not a”) links. The negative links are usually represented by means of a slash through the body of the arrow.

Semantic inheritance networks differ from Pollock's system in two important ways. First, they cannot represent one fact's constituting an undercutting defeater of an inference, although they can represent rebutting defeaters. For example, they do not allow an inference from the apparent color of an elephant to its actual color to be undercut by the information that my color vision is unreliable, unless I have information about the actual color of the elephant that contradicts its apparent color. Secondly, they do incorporate the principle of specificity (the principle that rules with more specific antecedents take priority in case of conflict) into the very definition of a warranted conclusion. In fact, in contrast to Pollock, the semantic inheritance approach gives priority to rules whose antecedents are weakly or defeasibly more specific. That is, if the antecedent of one rule is defeasibly linked to the antecedent of a second rule, the first rule gains priority. For example, if Quakers are typically pacifists, then, when reasoning about a Quaker pacifist, rules pertaining to Quakers would override rules pertaining to pacifists. For the details of semantic inheritance theory, see the following supplementary document:

Semantic Inheritance Networks.
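
The role of specificity in such networks can be illustrated with a deliberately simplified sketch. It is not the Horty-Thomason-Touretzky definition of a supported path: class membership is computed by plain reachability along positive links, and a conflicting rule is discarded only when a rule of opposite sign has a more specific antecedent. The link encoding, the function names, and the example networks are assumptions.

```python
def reachable(links, start):
    """Nodes reachable from start along positive links (preemption is ignored here)."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for source, target, positive in links:
            if source == node and positive and target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen

def has_property(links, individual, prop):
    """Return True, False, or None (ambiguous or unknown) for the given property."""
    member_of = reachable(links, individual) | {individual}
    candidates = [(source, positive) for source, target, positive in links
                  if target == prop and source in member_of]
    # A rule is preempted when a conflicting rule has an antecedent that is
    # (defeasibly) more specific, i.e. linked by a positive path to its antecedent.
    surviving = [(source, positive) for source, positive in candidates
                 if not any(other_sign != positive
                            and source in reachable(links, other)
                            for other, other_sign in candidates)]
    signs = {positive for _, positive in surviving}
    return signs.pop() if len(signs) == 1 else None

# Links are (source, target, positive). Hypothetical networks:
tweety = [("tweety", "penguin", True), ("penguin", "bird", True),
          ("bird", "flies", True), ("penguin", "flies", False)]
nixon = [("nixon", "quaker", True), ("nixon", "republican", True),
         ("quaker", "pacifist", True), ("republican", "pacifist", False)]

print(has_property(tweety, "tweety", "flies"))
print(has_property(nixon, "nixon", "pacifist"))
```

In the first network the rule about penguins overrides the rule about birds, because penguins are linked to birds; in the second neither antecedent is more specific than the other, so the sketch returns no verdict. Undercutting defeat has no counterpart in this representation.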

David Makinson (Makinson 1994) has pointed out that semantic network theory is very sensitive to the form in which defeasible information is represented. There is a great difference between having a direct link between two nodes and having a path between the two nodes being supported by the graph as a whole. The notion of preemption gives special powers to explicitly given premises over conclusions. Direct links always take priority over longer paths. Consequently, inheritance networks lack two desirable metalogical properties: cut and cautious monotony (which will be covered in more detail in the section on Logical Approaches).

Cumulativity (Cut plus Cautious Monotony) corresponds to reasoning by lemmas or subconclusions. The Horty-Thomason-Touretzky system does satisfy special cases of Cut and Cautious Monotony: if A is an atomic statement (a link from an individual to a category), then if graph G supports A, then for any statement B, G ∪ {A} supports B if and only if G supports B.

Another form of inference that is not supported by semantic inheritance networks is that of reasoning by cases or by dilemma. In addition, semantic networks do not license modus-tollens-like inferences: from the fact that birds normally fly and Tweety does not fly, we are not licensed to infer that Tweety is not a bird. (This feature is also lacking in Pollock's system.)

4.3 Belief Revision Theory

Alchourrón, Gärdenfors and Makinson (Alchourrón, Gärdenfors and Makinson 1982) developed a formal theory of belief revision and contraction, drawing largely on Willard van Orman Quine's model of the web of belief (Quine and Ullian 1970). The cognitive agent is modelled as believing a set of propositions that are ordered by their degree of entrenchment. This model provides the basis for a set of normative constraints on belief contraction (subtracting a belief) and belief revision (adding a new belief that is inconsistent with the original set). When a belief is added that is logically consistent with the original belief set, the agent is supposed to believe the logical closure of the original set plus the new belief. When a belief is added that is inconsistent with the original set, the agent retreats to the most entrenched of the maximal subsets of the set that are consistent with the new belief, adding the new proposition to that set and closing under logical consequence. For the axioms of the AGM model, see the following supplementary document:

AGM Postulates

AGM belief revision theory can be used as the basis for a system of defeasible reasoning or nonmonotonic logic, as Gärdenfors and Makinson have recognized (Makinson and Gärdenfors 1991). If K is an epistemic state, then a nonmonotonic consequence relation |~ can be defined as follows: A |~ B iff B ∈ K*A. Unlike Pollock's system or semantic inheritance networks, this defeasible consequence relation depends upon a background epistemic state. Thus, the belief revision approach gives rise, not to a single nonmonotonic consequence relation, but to a family of relations. Each background state K gives rise to its own characteristic consequence relation.

One significant limitation of the belief-revision approach is that there is no representation in the object-language of a defeasible or default rule or conditional (that is, of a conditional of the form If p, then normally q or That p would be a prima facie reason for accepting that q). In fact, Gärdenfors (Gärdenfors 1978; Gärdenfors 1986) proved that no conditional satisfying the Ramsey test can be added to the AGM system without trivializing the revision relation.[1] (A conditional ⇒ satisfies the Ramsey test just in case, for every epistemic state K, K includes (A ⇒ B) iff K*A includes B.)
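
The dependence of the consequence relation on the background state can be illustrated with a small sketch. It does not implement AGM revision on logically closed belief sets; instead it uses a greedy revision of a finite, prioritized belief base, swapped in as a simplification, in which the new belief is kept and old beliefs are retained in order of entrenchment as long as consistency is preserved. The atoms, the numerical entrenchment values, and the function names are hypothetical.

```python
from itertools import product

ATOMS = ("bird", "penguin", "flies")
WORLDS = [dict(zip(ATOMS, values))
          for values in product([False, True], repeat=len(ATOMS))]

def consistent(formulas):
    """True when some world satisfies every formula."""
    return any(all(f(w) for f in formulas) for w in WORLDS)

def revise(base, new_belief):
    """Revise a prioritized base, given as (formula, entrenchment) pairs.

    The new belief is always kept; old beliefs are re-admitted from most to
    least entrenched whenever they remain consistent with what is kept.
    """
    kept = [new_belief]
    for formula, _ in sorted(base, key=lambda pair: -pair[1]):
        if consistent(kept + [formula]):
            kept.append(formula)
    return kept

def follows(base, a, b):
    """B follows defeasibly from A relative to base: B holds in the base revised by A."""
    revised = revise(base, a)
    return all(b(w) for w in WORLDS if all(f(w) for f in revised))

birds_fly = lambda w: w["flies"] or not w["bird"]
penguins_do_not_fly = lambda w: not (w["penguin"] and w["flies"])
penguins_are_birds = lambda w: w["bird"] or not w["penguin"]

# Two hypothetical background states with opposite entrenchment orderings.
state_1 = [(penguins_do_not_fly, 3), (penguins_are_birds, 3), (birds_fly, 1)]
state_2 = [(birds_fly, 3), (penguins_do_not_fly, 1)]

bird = lambda w: w["bird"]
bird_and_penguin = lambda w: w["bird"] and w["penguin"]
flies = lambda w: w["flies"]

print(follows(state_1, bird, flies),
      follows(state_1, bird_and_penguin, flies),
      follows(state_2, bird_and_penguin, flies))
```

Relative to the first state, "flies" follows from "bird" but not from "bird and penguin"; relative to the second state, which entrenches the generalization about birds more deeply, it follows from both. The same pair of propositions thus stands in the relation for one background state and not for another.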

Since the AGM system cannot include conditional beliefs, it cannot elucidate the question of what logical relationships hold between conditional defaults.

The lack of a representation of conditional beliefs is closely connected to another limitation of the AGM system: its inability to model repeated or iterated belief revision. The input to a belief change is an epistemic state, consisting both of a set of propositions believed and an entrenchment relation on that set. The output of an AGM revision, in contrast, consists simply of a set of beliefs. The system provides no guidance on the question of what would be the result of revising an epistemic state in two or more steps. If the entrenchment relation could be explicitly represented by means of conditional propositions, then it would be possible to define the new entrenchment relation that would result from a single belief revision, making iterated belief revision representable. A number of proposals along these lines have been made. The difficulty lies in defining exactly what would constitute a minimal change in the relative entrenchment or epistemic ranking of a set of beliefs. To this point, no clear consensus has emerged on this question. (See Spohn 1988; Nayak 1994; Wobcke 1995; Bochman 2001.)

On the larger question of the relation between belief revision and defeasible reasoning, there are two possibilities: that a theory of defeasible reasoning should be grounded in a theory of belief revision, and that a theory of belief revision should be grounded in a theory of defeasible reasoning. The second view has been defended by John Pollock (Pollock 1987; Pollock 1995) and by Hans Rott (Rott 1989). On this second view, we must make a sharp distinction between basic or foundational beliefs on the one hand and inferred or derived beliefs on the other. We can then model belief change on the assumption that new beliefs are added to the foundation (and are logically consistent with the existing set of those beliefs). Beliefs can be added which are inconsistent with previous inferred beliefs, and the new belief state consists simply in the closure of the new foundational set under the relation of defeasible consequence. On such an approach, default conditionals can be explicitly represented among the agent's beliefs. Gärdenfors's triviality result is then avoided by rejecting one of the assumptions of the theorem, preservation:

Preservation: If ¬A ∉ K, then K ⊆ K*A.

From the perspective that uses defeasible reasoning to define belief revision, there is no good reason to accept Preservation. One can add a belief that is consistent with what one already believes and thereby lose beliefs, since the new information might be an undercutting defeater to some defeasible inference that had been successful.
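
A failure of Preservation of this kind can be exhibited in a minimal sketch of the foundations-plus-closure picture. The rule format, the restriction that undercutters are checked only against foundational beliefs, and the red-lighting example data are assumptions introduced for illustration.

```python
def defeasible_closure(foundation, rules):
    """Close a set of foundational beliefs under defeasible rules.

    Each rule is (premise, conclusion, undercutters). A rule fires when its
    premise is believed and none of its undercutters is among the
    foundational beliefs.
    """
    beliefs = set(foundation)
    changed = True
    while changed:
        changed = False
        for premise, conclusion, undercutters in rules:
            undercut = bool(set(undercutters) & set(foundation))
            if premise in beliefs and conclusion not in beliefs and not undercut:
                beliefs.add(conclusion)
                changed = True
    return beliefs

# Hypothetical rule: looking red is a prima facie reason for being red,
# undercut by the information that the ambient lighting is red.
rules = [("looks_red", "is_red", ["red_lighting"])]

K = defeasible_closure({"looks_red"}, rules)
K_revised = defeasible_closure({"looks_red", "red_lighting"}, rules)

print(sorted(K), sorted(K_revised))
```

The added belief "red_lighting" contradicts nothing in K, yet "is_red" belongs to K and not to the revised state, so K is not a subset of the result of the revision.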

5. Logical Approaches

Logical approaches to defeasible reasoning treat the subject as a part of logic: the study of nonmonotonic consequence relations (in contrast to the monotonicity of classical logic). These relations are defined on propositions, not on the beliefs of an agent, so the focus is not on epistemology per se, although a theory of nonmonotonic logic will certainly have implications for epistemology.

5.1 Relations of Logical Consequence

A consequence relation is a mathematical relation that models what follows logically from what. Consequence relations can be defined in a variety of ways: Hilbert, Tarski and Scott relations. A Hilbert consequence relation is a relation between pairs of formulas, a Tarski relation is a relation between sets of formulas (possibly infinite) and individual formulas, and a Scott relation is a relation between two sets of formulas. In the case of Hilbert and Tarski relations, A ⊨ B or Γ ⊨ B mean that the formula B follows from formula A or from set of formulas Γ. In the case of Scott consequence relations, Γ ⊨ Δ means that the joint truth of all the members of Γ implies (in some sense) the truth of at least one member of Δ. To this point, studies of nonmonotonic logic have defined nonmonotonic consequence relations in the style of Hilbert or Tarski, rather than Scott.

A (Tarski) consequence relation is monotonic just in case it satisfies the following condition, for all formulas p and all sets Γ and Δ:

Monotonicity: If Γ ⊨ p, then Γ ∪ Δ ⊨ p.

Any consequence relation that fails this condition is nonmonotonic. A relation of defeasible consequence clearly must be nonmonotonic, since a defeasible inference can be defeated by adding additional information that constitutes a rebutting or undercutting defeater.

5.2 Metalogical Desiderata

Once monotonicity is given up, the question arises: why call the relation of defeasible consequence a logical consequence relation at all? What properties do defeasible consequence and classical logical consequence have in common, that would justify treating them as sub-classes of the same category? What justifies calling nonmonotonic consequence logical?

To count as logical, there are certain minimal properties that a relation must satisfy. First, the relation ought to permit reasoning by lemmas or subconclusions. That is, if a proposition p already follows from a set Γ, then it should make no difference to add p to Γ as an additional premise. Relations that satisfy this condition are called cumulative. Cumulative relations satisfy the following two conditions (where “C(Γ)” represents the set of defeasible consequences of Γ):

Cut: If Γ ⊆ Δ ⊆ C(Γ), then C(Δ) ⊆ C(Γ).

Cautious Monotony: If Γ ⊆ Δ ⊆ C(Γ), then C(Γ) ⊆ C(Δ).

In addition, a defeasible consequence relation ought to be supraclassical: if p follows from q in classical logic, then it ought to be included in the defeasible consequences of q as well. A formula q ought to count as an (at least) defeasible consequence of itself, and anything included in the content of q (any formula p that follows from q in classical logic) ought to count as a defeasible consequence of q as well. Moreover, the defeasible consequences of a set Γ ought to depend only on the content of the formulas in Γ, not on how that content is represented. Consequently, the defeasible consequence relation ought to treat Γ and the classical logical closure of Γ (which I'll represent as “Cn(Γ)”) in exactly the same way. A consequence relation that satisfies these two conditions is said to satisfy full absorption (see Makinson 1994, 47).

Full Absorption: Cn(C(Γ)) = C(Γ) = C(Cn(Γ))

Finally, a genuinely logical consequence relation ought to enable us to reason by cases. So, it should satisfy a principle called distribution: if a formula p follows defeasibly from both q and r, then it ought to follow from their disjunction. (To require the converse principle would be to reinstate monotonicity.) The relevant principle is this:

Distribution: C(Γ) ∩ C(Δ) ⊆ C(Cn(Γ) ∩ Cn(Δ)).

Consequence relations that are cumulative, strongly absorptive and distributive satisfy a number of other desirable properties, including conditionalization: if a formula p is a defeasible consequence of Γ ∪ {q}, then the material conditional (q → p) is a defeasible consequence of Γ alone. In addition, such logics satisfy the property of loop: if p1 dproves p2, …, pn-1 dproves pn, and pn dproves p1 (where “dproves” represents the defeasible consequence relation), then the defeasible consequences of pi and pj are exactly the same, for any i or j.[2]

There are three further conditions that have been much discussed in the literature, but whose status remains controversial: disjunctive rationality, rational monotony and consistency preservation.

Disjunctive Rationality: If Γ ∪ {p} not-dproves r, and Γ ∪ {q} not-dproves r, then Γ ∪ {(p ∨ q)} not-dproves r.

Rational Monotony: If Γ dproves A, then either Γ ∪ {B} dproves A or Γ dproves ¬B.

Consistency Preservation: If Γ is classically consistent, then so is C(Γ) (the set of defeasible consequences of Γ).

All three properties seem desirable, but they set a very high standard for the defeasible reasoner.

5.3 Default Logic

Ray Reiter's default logic (Reiter 1980; Etherington and Reiter 1983) was part of the first generation of defeasible systems developed in the field of artificial intelligence. The relative ease of computing default extensions has made it one of the more popular systems.

Reiter's system is based on the use of default rules. A default rule consists of three formulas: the prerequisite, the justification, and the consequent. If one accepts the prerequisite of a default rule, and the justification is consistent with all one knows (including what one knows on the basis of the default rules themselves), then one is entitled to accept the consequent. The most popular use of default logic relies solely on normal defaults, in which the justification and the consequent are identical. Thus, a normal default of the form (p ; q ∴ q) allows one to infer q from p, so long as q is consistent with one's endpoint (the extension of the default theory).

A default theory consists of a set of formulas (the facts), together with a set of default rules. An extension of a default theory is a fixed point of a particular inferential process: an extension E must be a consistent theory (a consistent set closed under classical consequence) that contains all of the facts of the default theory T, and, in addition, for each normal default (p ; q ∴ q), if p belongs to E, and q is consistent with E, then q must belong to E also.

Since the consequence relation is defined by a fixed-point condition, there are default theories that have no extension at all, and other theories that have multiple, mutually inconsistent extensions. For example, the theory consisting of the fact p and the pair of defaults (p ; (q & r) ∴ q) and (q ; ¬r ∴ ¬r) has no extension. If the first default is applied, then the second must be, and if the second default is not applied, the first must be. However, the conclusion of the second default contradicts the justification of the first, so the first cannot be applied if the second is. There are many default theories that have multiple extensions. Consider the theory consisting of the facts q and r and the pair of defaults (q ; p ∴ p) and (r ; ¬p ∴ ¬p). One or the other of these defaults, but not both, must be applied.
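The two-extension example can be checked mechanically. The following is a minimal sketch, not Reiter's own procedure: it assumes a finite propositional language, handles normal defaults only (so it does not cover the no-extension example, which uses a non-normal default), and represents a theory by the truth assignments that satisfy it. The helper names (atom, neg, extensions, and so on) are hypothetical. For normal defaults, applying defaults greedily in every possible priority order reaches every extension.

```python
from itertools import permutations, product

ATOMS = ("p", "q", "r")
# Every truth assignment to the atoms; a theory is represented by the
# indices of the assignments (worlds) that satisfy it.
WORLDS = [dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=len(ATOMS))]

def atom(name):
    return lambda w: w[name]

def neg(f):
    return lambda w: not f(w)

def models(formulas):
    return frozenset(i for i, w in enumerate(WORLDS) if all(f(w) for f in formulas))

def entails(ms, formula):
    return all(formula(WORLDS[i]) for i in ms)

def consistent_with(ms, formula):
    return any(formula(WORLDS[i]) for i in ms)

def extensions(facts, defaults):
    """Extensions of a NORMAL default theory; each default is a
    (prerequisite, consequent) pair whose justification equals its consequent."""
    found = set()
    for order in permutations(defaults):
        ms = models(facts)
        fired = True
        while fired:
            fired = False
            for prerequisite, consequent in order:
                if (entails(ms, prerequisite)
                        and consistent_with(ms, consequent)
                        and not entails(ms, consequent)):
                    # Accept the consequent: keep only the worlds where it holds.
                    ms = frozenset(i for i in ms if consequent(WORLDS[i]))
                    fired = True
                    break  # restart from the highest-priority default
        found.add(ms)
    return found

p, q, r = atom("p"), atom("q"), atom("r")
# Facts q and r; defaults (q ; p ∴ p) and (r ; ¬p ∴ ¬p).
exts = extensions(facts=[q, r], defaults=[(q, p), (r, neg(p))])
labels = sorted("p" if entails(e, p) else "not p" for e in exts)
print(labels)
```

Each extension found contains exactly one of p and ¬p, which is the sense in which one or the other default, but not both, is applied.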

Furthermore, there is no guarantee that if E and E′ are both extensions of theory T, then the intersection of E and E′ is also an extension (the intersection of two fixed points need not be itself a fixed point). Default logic is usually interpreted as a credulous system: as a system of logic that allows the reasoner to select any extension of the theory and believe all of the members of that extension, even though many of the resulting beliefs will involve propositions that are missing from other extensions (and may even be contradicted in some of those extensions).

Default logic fails many of the tests for a logical relation that were introduced in the previous section. It satisfies Cut and Full Absorption, but it fails Cautious Monotony (and thus fails to be cumulative). In addition, it fails Distribution, a serious limitation that rules out reasoning by cases. For example, if one knows that Smith is either Amish or Quaker, and both Quakers and Amish are normally pacifists, one cannot infer that Smith is a pacifist. Default logic also fails to represent Pollock's undercutting defeaters. Finally, default logic does not incorporate any form of the principle of Specificity, the principle that defaults with more specific prerequisites ought, in cases of conflict, to take priority over defaults with less specific prerequisites. Recently, John Horty (Horty 2007) has examined the implications of adding priorities among defaults (in the form of a partial ordering), which would permit the recognition of specificity and other grounds for preferring one default to another.
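The Amish/Quaker example can be worked through in a few lines. This is an illustrative sketch under simplifying assumptions: a three-atom propositional language about Smith alone, normal defaults only, and hypothetical helper names. Because both defaults share one consequent, the order of application cannot matter here, so a single fixed-point loop suffices.

```python
from itertools import product

ATOMS = ("amish", "quaker", "pacifist")
WORLDS = [dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=len(ATOMS))]

def apply_defaults(facts, defaults):
    """Fire normal defaults (prerequisite, consequent) until none applies.
    Returns the truth assignments compatible with the resulting belief set."""
    ms = [w for w in WORLDS if all(f(w) for f in facts)]
    fired = True
    while fired:
        fired = False
        for prerequisite, consequent in defaults:
            known = all(prerequisite(w) for w in ms)        # prerequisite is believed
            consistent = any(consequent(w) for w in ms)      # consequent can be added
            already = all(consequent(w) for w in ms)         # nothing new to add
            if known and consistent and not already:
                ms = [w for w in ms if consequent(w)]
                fired = True
    return ms

def believes(ms, formula):
    return all(formula(w) for w in ms)

amish = lambda w: w["amish"]
quaker = lambda w: w["quaker"]
pacifist = lambda w: w["pacifist"]
amish_or_quaker = lambda w: w["amish"] or w["quaker"]

defaults = [(amish, pacifist), (quaker, pacifist)]

from_amish = believes(apply_defaults([amish], defaults), pacifist)
from_quaker = believes(apply_defaults([quaker], defaults), pacifist)
from_disjunction = believes(apply_defaults([amish_or_quaker], defaults), pacifist)
print(from_amish, from_quaker, from_disjunction)
```

Neither prerequisite is itself believed when only the disjunction is known, so neither default fires, even though each disjunct taken separately yields the conclusion.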

5.4 Nonmonotonic Logic I and Autoepistemic Logic

In both McDermott-Doyle's Nonmonotonic Logic I and Moore's Autoepistemic logic (McDermott and Doyle, 1982; Moore, 1985; Konolige 1994), a modal operator M (representing a kind of epistemic possibility) is used. Default rules take the following form: ((p & Mq) → q), that is, if p is true and q is “possible” (in the relevant sense), then q is also true. In both cases, the extension of a theory is defined, as in Reiter's default logic, by means of a fixed-point operation. Mp represents the fact that ¬p does not belong to the extension. For example, in Moore's case, a set Δ is a stable expansion of a theory Γ just in case Δ is the set of classical consequences of the set Γ ∪ {Mp: ¬p ∉ Δ} ∪ {¬Mp: ¬p ∈ Δ}. As in the case of Reiter's default logic, some theories will lack a stable expansion, or have more than one. In addition, these systems fail to incorporate Specificity.

5.5 Circumscription

In circumscription (McCarthy 1982; McCarthy 1986; Lifschitz 1988), one or more predicates of the language are selected for minimization (there is, in addition, a further technical question of which predicates to treat as fixed and which to treat as variable). The nonmonotonic consequences of a theory T then consist of all the formulas that are true in every model of T that minimizes the extensions of the selected predicates. One model M of T is preferred to another, M', if and only if, for each designated predicate F, the extension of F in M is a subset of the extension of F in M', and, for some such predicate, the extension in M is a proper subset of the extension in M'.

The relation of circumscriptive consequence has all the desirable meta-logical properties. It is cumulative (satisfies Cut and Cautious Monotony), strongly absorptive and distributive. In addition, it satisfies Consistency Preservation, although not Rational Monotony.

The most critical problem in applying circumscription is that of deciding on what predicates to minimize. Most often what is done is to introduce a family of abnormality predicates ab1, ab2, etc. A default rule then can be written in the form: ∀x((F(x) & ¬abi(x)) → G(x)), where “→” is the ordinary material conditional of classical logic. To derive the consequences of a theory, all of the abnormality predicates are simultaneously minimized. This simple approach fails to satisfy the principle of Specificity, since each default is given its own, independent abnormality predicate, and each is therefore treated with the same priority. It is possible to add special rules for the prioritizing of circumscription, but these are, of necessity, ad hoc and exogenous, rather than a natural result of the definition of the consequence relation.

Circumscription does have the capacity of representing the existence of undercutting defeaters. Suppose that satisfying predicate F provides a prima facie reason for supposing something to be a G, and suppose that we use the abnormality predicate ab1 in representing this default rule. We can state that the predicate H provides an undercutting defeater to this inference by simply adding the rule: ∀ x (H(x) → ab1(x)), stating that all Hs are abnormal in respect number 1.
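The effect of minimizing an abnormality predicate, and of adding an undercutting defeater, can be illustrated with a brute-force sketch. It rests on strong simplifying assumptions: a single individual, so that F, G, H and ab1 become propositional atoms, and all atoms other than ab1 are allowed to vary. The function names are hypothetical.

```python
from itertools import product

ATOMS = ("F", "G", "H", "ab1")
MINIMIZED = ("ab1",)  # the abnormality atoms selected for minimization

def all_models(theory):
    worlds = (dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=len(ATOMS)))
    return [w for w in worlds if all(axiom(w) for axiom in theory)]

def abnormalities(w):
    return {a for a in MINIMIZED if w[a]}

def preferred_models(theory):
    """Models whose set of true abnormality atoms is minimal under inclusion."""
    ms = all_models(theory)
    return [m for m in ms if not any(abnormalities(n) < abnormalities(m) for n in ms)]

def follows(theory, formula):
    return all(formula(m) for m in preferred_models(theory))

# (F & ¬ab1) → G, written as a material conditional.
default = lambda w: not (w["F"] and not w["ab1"]) or w["G"]
# H → ab1: being an H is abnormal in respect number 1.
undercutter = lambda w: not w["H"] or w["ab1"]
is_F = lambda w: w["F"]
is_H = lambda w: w["H"]
is_G = lambda w: w["G"]
not_G = lambda w: not w["G"]

plain = follows([default, is_F], is_G)
defeated = follows([default, undercutter, is_F, is_H], is_G)
defeated_neg = follows([default, undercutter, is_F, is_H], not_G)
print(plain, defeated, defeated_neg)
```

With the fact H added, ab1 is forced to be true in every model, so neither G nor ¬G follows: the inference is undercut rather than rebutted.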

5.6 Preferential Logics

Circumscription is a special case of a wider class of defeasible logics, the preferential logics (Shoham 1987). In preferential logics, Γ dproves p iff p is true in all of the most preferred models of Γ. In the case of circumscription, the most preferred models are those that minimize the extension of certain predicates, but many other kinds of preference relations can be used instead, so long as the preference relations are transitive and irreflexive (a strict partial order). A structure consisting of a set of models of a propositional or first-order language, together with a preference order on those models, is called a preferential structure. The symbol ≺ shall represent the preference relation. M ≺ M′ means that M is strictly preferred to M′. A most preferred model is one that is minimal in the ordering.
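A small finite preferential structure makes the definition concrete and shows monotonicity failing. The sketch below is illustrative only: the three atoms and the numeric "oddity" score that induces the preference order are assumptions chosen for the example, and any transitive, irreflexive relation could be substituted for prefers.

```python
from itertools import product

ATOMS = ("bird", "penguin", "flies")
MODELS = [dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=len(ATOMS))]

def oddity(m):
    """Hypothetical score: how unusual a model is."""
    if m["penguin"] and (m["flies"] or not m["bird"]):
        return 2
    if m["bird"] and not m["flies"]:
        return 1
    return 0

def prefers(m1, m2):
    """m1 ≺ m2: m1 is strictly preferred to m2 (transitive and irreflexive)."""
    return oddity(m1) < oddity(m2)

def dproves(premises, conclusion):
    """Γ dproves p iff p holds in every most preferred model of Γ."""
    ms = [m for m in MODELS if all(f(m) for f in premises)]
    minimal = [m for m in ms if not any(prefers(n, m) for n in ms)]
    return all(conclusion(m) for m in minimal)

bird = lambda m: m["bird"]
penguin = lambda m: m["penguin"]
flies = lambda m: m["flies"]
not_flies = lambda m: not m["flies"]

print(dproves([bird], flies))               # the default conclusion
print(dproves([bird, penguin], flies))      # lost when a premise is added
print(dproves([bird, penguin], not_flies))  # the more specific conclusion
```

Adding the premise penguin changes which models are minimal, so a conclusion drawn from bird alone is withdrawn.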

In order to give rise to a cumulative logic (one that satisfies Cut and Cautious Monotony), we must add an additional condition to the preferential structures, a Limit Assumption (also known as the condition of stopperedness or smoothness):

Limit Assumption: Given a theory T, and M, a non-minimal model of T, there exists a model M′ which is preferred to M and which is a minimal model of T.

The Limit Assumption is satisfied if the preferential structure does not contain any infinite descending chains of more and more preferred models, with no minimal member. This is a difficult condition to motivate as natural, but without it, we can find preferential structures that give rise to nonmonotonic consequence relations that fail to be cumulative.

Once we have added the Limit Assumption, it is easy to show that any consequence relation based upon a preferential model is not only cumulative but also supraclassical, strongly absorptive and distributive. Let's call such logics preferential. In fact, Kraus, Lehmann and Magidor (Kraus, Lehmann and Magidor 1990; Makinson 1994, 77; Makinson 2005) proved the following representation theorem for preferential logics:

Representation Theorem for Preferential Logics: if dproves is a cumulative, supraclassical, strongly absorptive, and distributive consequence relation (i.e., a preferential relation) then there is a preferential structure 𝓜 satisfying the Limit Assumption such that for all finite theories T, the set of dproves-consequences of T is exactly the set of formulas true in every preferred model of T in 𝓜.[3]

There are preferential logics that fail to satisfy consistency preservation, as well as disjunctive rationality and rational monotony:

Disjunctive Rationality:
If Γ ∪ {p} not-dproves r, and Γ ∪ {q} not-dproves r, then Γ ∪ {(p ∨ q)} not-dproves r.

Rational Monotony:
If Γ dproves p, then either Γ ∪ {q} dproves p or Γ dproves ¬q.

A very natural condition has been found by Kraus, Lehmann and Magidor that corresponds to Rational Monotony: that of ranked models. (No condition on preference structures has been found that ensures disjunctive rationality without also ensuring rational monotony.) A preferential structure 𝓜 satisfies the Ranked Models condition just in case there is a function r that assigns an ordinal number to each model in such a way that M ≺ M′ iff r(M) < r(M′). Let's say that a preferential consequence relation is a rational relation just in case it satisfies Rational Monotony, and that a preferential structure is a rational structure just in case it satisfies the ranked models condition. Kraus, Lehmann and Magidor (Kraus, Lehmann and Magidor 1990; Makinson 1994, 71-81) also proved the following representation theorem:

Representation Theorem for Rational Logics: if dproves is a rational consequence relation (i.e., a preferential relation that satisfies Rational Monotony) then there is a preferential structure 𝓜 satisfying the Limit Assumption and the Ranked Models Assumption such that for all finite theories T, the set of dproves-consequences of T is exactly the set of formulas true in every preferred model of T in 𝓜.
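The link between ranked models and Rational Monotony can be tested exhaustively on a tiny structure. In the sketch below, which is an illustration and not part of the cited proofs, worlds are bare labels, propositions are sets of worlds, and Rational Monotony is checked in its single-premise form: if A dproves C and A does not dprove ¬B, then A & B dproves C. The two orderings are hypothetical examples, one ranked and one a strict partial order that admits no ranking.

```python
from itertools import combinations

def minimal(prop, prefers):
    """The most preferred worlds of a proposition."""
    return {w for w in prop if not any(prefers(v, w) for v in prop)}

def dproves(a, c, prefers):
    return minimal(a, prefers) <= c

def powerset(worlds):
    ws = list(worlds)
    return [frozenset(c) for n in range(len(ws) + 1) for c in combinations(ws, n)]

def rational_monotony_holds(worlds, prefers):
    universe = frozenset(worlds)
    props = powerset(worlds)
    for a in props:
        for b in props:
            if dproves(a, universe - b, prefers):
                continue  # A dproves ¬B, so the principle demands nothing
            for c in props:
                if dproves(a, c, prefers) and not dproves(a & b, c, prefers):
                    return False
    return True

WORLDS = ("a", "b", "c")

# Ranked: the order is induced by a numeric rank on worlds.
rank = {"a": 0, "b": 1, "c": 1}
ranked = lambda v, w: rank[v] < rank[w]

# Not ranked: a ≺ b only; c is incomparable to both a and b.
unranked = lambda v, w: (v, w) == ("a", "b")

print(rational_monotony_holds(WORLDS, ranked))
print(rational_monotony_holds(WORLDS, unranked))
```

In the unranked structure the failing instance is A = {a, b, c}, B = {b, c}, C = {a, c}: the minimal worlds of A are a and c, but once B is added the world b becomes minimal and falls outside C.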

Freund proved an analogous representation result for preferential logics that satisfy disjunctive rationality, replacing the ranking condition with a weaker condition of filtered models: a filtered model is one such that, for every formula, if two worlds non-minimally satisfy the formula, then there is a world less than both of them that also satisfies the formula (Freund 1993).

5.7 Logics of Extreme Probabilities

Lehmann and Magidor (Lehmann and Magidor 1992) noticed an interesting coincidence: the metalogical conditions for preferential consequence relations correspond exactly to the axioms for a logic of conditionals developed by Ernest W. Adams (Adams 1975).[4] Adams's logic was based on a conditional, ⇒, intended to represent a relation of very high conditional probability: (p ⇒ q) means that the conditional probability Pr(q/p) is extremely close to 1. Adams used the standard delta-epsilon definition of the calculus to make this idea precise. Let us suppose that a theory T consists of a set of conditional-free formulas (the facts) and a set of probabilistic conditionals. A conclusion p follows defeasibly from T if and only if every probability function satisfies the following condition:

For every δ, there is an ε such that, if every fact in T is assigned a probability at least as high as 1 - ε, and every conditional in T is assigned a conditional probability at least as high as 1 - ε, then the probability of the conclusion p is at least 1 - δ.

The resulting defeasible consequence relation is a preferential relation. (It need not, however, be consistency-preserving.) This consequence relation also corresponds to a relation, 0-entailment, defined by Judea Pearl (Pearl 1990), as the common core to all defeasible consequence relations.

Lehmann and Magidor (1992) proposed a variation on Adams's idea. Instead of using the delta-epsilon construction, they made use of nonstandard measure theory, that is, a theory of probability functions that can take values that are infinitesimals (infinitely small numbers). In addition, instead of defining the consequence relation by quantifying over all probability functions, Lehmann and Magidor assume that we can select a single probability function (representing something like the ideally rational or objective probability). On their construction, a conclusion p follows from T just in case the probability of p is infinitely close to 1, on the assumption that the probabilities assigned to members of T are infinitely close to 1. Lehmann and Magidor proved that the resulting consequence relation is always not only preferential but also rational. The logic defined by Lehmann and Magidor also corresponds exactly to the theory of Popper functions, another extension of probability theory designed to handle cases of conditioning on propositions with infinitesimal probability (see Harper 1976; Hawthorne 1998). For a brief discussion of Popper functions, see the following supplementary document:

Popper Functions

Arló Costa and Parikh, using van Fraassen's account (van Fraassen, 1995) of primitive conditional probabilities (a variant of Popper functions), proved a representation result for both finite and infinite languages (Arló Costa and Parikh, 2005). For infinite languages, they assumed an axiom of countable additivity for probabilities.

Kraus, Lehmann and Magidor proved that, for every preferential consequence relation dproves that is probabilistically admissible,[5] there is a unique rational consequence relation dproves* that minimally extends it (that is, that the intersection of all the rational consequence relations extending dproves is also a rational consequence relation). This relation, dproves*, is called the rational closure of dproves. To find the rational closure of a preferential relation, one can perform the following operation on a preferential structure that supports that relation: assign to each model in the structure the smallest number possible, respecting the preference relation. Judea Pearl also proposed the very same idea under the name 1-entailment or System Z (Pearl 1990).

A critical advantage of the Lehmann-Magidor-Pearl 1-entailment system over Adams's epsilon-entailment lies in the way in which 1-entailment handles irrelevant information. Suppose, for example, that we know that birds fly (B ⇒ F), Tweety is a bird (B) and Nemo is a whale (W). These premises do not epsilon-entail F (that Tweety flies), since there is no guarantee that a probability function assigns a high probability to F, given the conjunction of B and W. In contrast, 1-entailment does give us the conclusion F.

Moreover, 1-entailment satisfies a condition of weak independence of defaults: conditionals with logically unrelated antecedents can “fire” independently of each other: one can warrant a conclusion even though we are given an explicit exception to the other. Consider, for example, the following case: birds fly (B ⇒ F), Tweety is a bird that doesn't fly (B & ¬F), whales are large (W ⇒ L), and Nemo is a whale (W). These premises 1-entail that Nemo is large (L). In addition, 1-entailment automatically satisfies the principle of Specificity: conditionals with more specific antecedents are always given priority over those with less specific antecedents.

There is another form of independence, strong independence, that even 1-entailment fails to satisfy. If we are given one exception to a rule involving a given antecedent, then we are unable to use any conditional with the same antecedent to derive any conclusion whatsoever. Suppose, for example, that we know that birds fly (B ⇒ F), Tweety is a bird that doesn't fly (B & ¬F), and birds lay eggs (B ⇒ E). Even under 1-entailment, the conclusion that Tweety lays eggs (E) fails to follow. This failure to satisfy Strong Independence is also known as the Drowning Problem (since all conditionals with the same antecedent are “drowned” by a single exception).
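The handling of irrelevant information, Specificity, and the Drowning Problem can all be reproduced with a short brute-force sketch of Pearl's System Z ranking. The sketch assumes a finite propositional language and the standard published construction: defaults are layered by "toleration", a world's rank is 0 if it violates no default and otherwise one more than the highest layer it violates, and a conclusion is 1-entailed when the best world satisfying the facts and the conclusion outranks (has a strictly lower rank than) the best world satisfying the facts and its negation. The function names and the atom P (for penguin) are hypothetical.

```python
from itertools import product

def atom(name):
    return lambda w: w[name]

def neg(f):
    return lambda w: not f(w)

def conj(*fs):
    return lambda w: all(f(w) for f in fs)

def z_ranks(defaults, worlds):
    """Layer the defaults: a default (a, b) is tolerated by a set of defaults if
    some world satisfies a & b and the material conditional of every member."""
    remaining = set(range(len(defaults)))
    rank, level = {}, 0
    while remaining:
        def respects_remaining(w):
            return all(not defaults[j][0](w) or defaults[j][1](w) for j in remaining)
        tolerated = {i for i in remaining
                     if any(defaults[i][0](w) and defaults[i][1](w) and respects_remaining(w)
                            for w in worlds)}
        if not tolerated:
            raise ValueError("no default is tolerated: the default set is inconsistent")
        for i in tolerated:
            rank[i] = level
        remaining -= tolerated
        level += 1
    return rank

def one_entails(atoms, defaults, facts, conclusion):
    worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))]
    rank = z_ranks(defaults, worlds)

    def kappa(w):
        violated = [rank[i] for i, (a, b) in enumerate(defaults) if a(w) and not b(w)]
        return 1 + max(violated) if violated else 0

    def best(condition):
        ranks = [kappa(w) for w in worlds if facts(w) and condition(w)]
        return min(ranks) if ranks else float("inf")

    return best(conclusion) < best(neg(conclusion))

B, F, W, E, P = (atom(n) for n in "BFWEP")

# Irrelevant information: birds fly; Tweety is a bird; Nemo is a whale.
irrelevance = one_entails(("B", "F", "W"), [(B, F)], conj(B, W), F)
# Specificity: birds fly, penguins are birds, penguins don't fly; given a penguin.
specificity = one_entails(("B", "F", "P"), [(B, F), (P, B), (P, neg(F))], P, neg(F))
# Drowning: birds fly, birds lay eggs; Tweety is a bird that doesn't fly.
drowning = one_entails(("B", "F", "E"), [(B, F), (B, E)], conj(B, neg(F)), E)
print(irrelevance, specificity, drowning)
```

In the drowning case both candidate worlds violate a default of the lowest layer, so they receive the same rank and E is not concluded; without the exception (facts B alone) the same function does return E.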

A consensus is growing that the Drowning Problem should not be “solved” (see Pelletier and Elio 1994; Wobcke 1995, 85; Bonevac, 2003, 461-462). Consider the following variant on the problem: birds fly, Tweety is a bird that doesn't fly, and birds have strong forelimb muscles. Here it seems we should refrain from concluding that Tweety has strong forelimb muscles, since there is reason to doubt that the strength of wing muscles is causally (and hence, probabilistically) independent of capacity for flight. Once we know that Tweety is an exceptional bird, we should refrain from applying other conditionals with “Tweety is a bird” as their antecedents, unless we know that these conditionals are independent of flight, that is, unless we know that the conditional with the stronger antecedent, “Tweety is a non-flying bird”, is also true.

Nonetheless, several proposals have been made for securing strong independence and solving the Drowning Problem. Geffner and Pearl (Geffner and Pearl 1992) proposed a system of conditional entailment, a variant of circumscription, in which the preference relation on models is defined in terms of the sets of defaults that are satisfied. This enables Geffner and Pearl to satisfy both the Specificity principle and Strong Independence. Another proposal is the maximum entropy approach (Pearl 1988, 490-496; Goldszmidt, Morris and Pearl, 1993; Pearl 1990). A theory T, consisting of defaults Δ and facts F, entails p just in case the probability of p, conditional on F, approaches 1 as the probabilities associated with Δ approach 1, using the entropy-maximizing[6] probability function that respects the defaults in Δ. The maximum-entropy approach satisfies both Specificity and Strong Independence.

Every attempt to solve the drowning problem (including conditional entailment and the maximum-entropy approach) comes at the cost of sacrificing cumulativity. Securing strong independence makes the systems very sensitive to the exact form in which the default information is stored. Consider, for example, the following case: Swedes are (normally) fair, Swedes are (normally) tall, Jon is a short Swede. Conditional entailment and maximum-entropy entailment would permit the conclusion that Jon is fair in this case. However, if we replace the first two default conditionals by the single default, Swedes are normally both tall and fair, then the conclusion no longer follows, despite the fact that the new conditional is logically equivalent to the conjunction of the two original conditionals.

Applying the logic of extreme probabilities to real-world defeasible reasoning generates an obvious problem, however. We know perfectly well that, in the case of the default rules we actually use, the conditional probability of the conclusion on the premises is nowhere near 1. For example, the probability that an arbitrary bird can fly is certainly not infinitely close to 1. This problem resembles that of using idealizations in science, such as frictionless planes and ideal gases. It seems reasonable to think that, in deploying the machinery of defeasible logic, we indulge in the degree of make-believe necessary to make the formal models applicable. Nonetheless, this is clearly a problem warranting further attention.

5.8 Fully Expressive Languages: Conditional Logics and Higher-Order Probabilities

With relatively few exceptions, the logical approaches to defeasible reasoning developed so far put severe restrictions on the logical form of propositions included in a set of premises. In particular, they require the default conditional operator, ⇒, to have wide scope in every formula in which it appears. Default conditionals are not allowed to be nested within other default conditionals, or within the scope of the usual Boolean operators of propositional logic (negation, conjunction, disjunction, material conditional). This is a very severe restriction and one that is quite difficult to defend. For example, in representing undercutting defeaters, it would be very natural to use a negated default conditional of the form ¬((p & q) ⇒ r) to signify that q defeats p as a prima facie reason for r. In addition, it seems plausible that one might come to gain disjunctive default information: for example, that either customers are gullible or salesmen are wily.

Asher and Pelletier (Asher and Pelletier 1997) have argued that, when translating generic sentences in natural language, it is essential that we be allowed to nest default conditionals. For example, consider the following English sentences:

Close friends are (normally) people who (normally) trust one another.

People who (normally) rise early (normally) go to bed early.

In the first case, a conditional is nested within the consequent of another conditional:

∀x∀y(Friend(x,y) ⇒ ∀z(Time(z) ⇒ Trust(x,y,z)))

In the second case, we seem to have conditionals nested within both the antecedent and the consequent of a third conditional, something like:

∀x(Person(x) → (∀y(Day(y) ⇒ Rise-early(x,y)) ⇒ ∀z(Day(z) ⇒ Bed-early(x,z))))

This nesting of conditionals can be made possible by borrowing and modifying the semantics of the subjunctive or counterfactual conditional, developed by Robert Stalnaker and David K. Lewis (Lewis 1973). For an axiomatization of Lewis's conditional logic, see the following supplementary document:

David Lewis's Conditional Logic

The only modification that is essential is to drop the condition of Centering (both strong and weak), a condition that makes modus ponens (affirming the antecedent) logically valid. If the conditional ⇒ is to represent a default conditional, we do not want modus ponens to be valid: we do not want (p ⇒ q) and p to entail q classically (i.e., monotonically). If Centering is dropped, the resulting logic can be made to correspond exactly to either a preferential or a rational defeasible entailment relation. For example, the condition of Rational Monotony is the exact counterpart of the CV axiom of Lewis's logic:

CV: (p ⇒ q) → [((p & r) ⇒ q) ∨ (p ⇒ ¬r)]

Something like this was proposed first by James Delgrande (Delgrande 1987), and the idea has been most thoroughly developed by Nicholas Asher and his collaborators (Asher and Morreau 1991; Asher 1995; Asher and Bonevac 1996; Asher and Mao 2001) under the name Commonsense Entailment.[7] Commonsense Entailment is a preferential (although not a rational) consequence relation, and it automatically satisfies the Specificity principle. It permits the arbitrary nesting of default conditionals within other logical operators, and it can be used to represent undercutting defeaters, through the use of negated defaults (Asher and Mao 2001).

The models of Commonsense Entailment differ significantly from those of preferential logic and the logic of extreme probabilities. Instead of having structures that contain sets of models of a standard, default-free language, a model of the language of Commonsense Entailment includes a set of possible worlds, together with a function that assigns a standard interpretation (a model of the default-free language) to each world. In addition, there is a function * that assigns a set of worlds *(w,A) to each pair consisting of a world w and a set of worlds (proposition) A. The set *(w,A) is the set of most normal A-worlds, from the perspective of w. A default conditional (p ⇒ q) is true in a world w (in such a model) just in case all of the most normal p worlds (from w's perspective) are worlds in which q is also true. Since we can assign truth-conditions to each such conditional, we can define the truth of nested conditionals, whether the conditionals are nested within Boolean operators or within other conditionals. Moreover, we can define both a classical, monotonic consequence relation for this class of models and a defeasible, nonmonotonic relation (in fact, the nonmonotonic consequence relation can be defined in a variety of ways). We can then distinguish between a default conditional's following with logical necessity from a default theory and its following defeasibly from that same theory. Contraposition, for example (inferring (¬q ⇒ ¬p) from (p ⇒ q)), is not logically valid for default conditionals, but it might be a defeasibly correct inference.[8]
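The truth-conditions just described lend themselves to a direct recursive evaluator. The following toy model is an illustrative sketch, not a reconstruction of any published system: the three worlds, the set NORMAL, and the particular selection function (which ignores the world w) are all assumptions made for the example. Formulas are nested tuples, so default conditionals can occur inside Boolean operators or inside other conditionals.

```python
# Each world carries a standard interpretation of the default-free atoms.
WORLDS = {
    "w0": {"p": True, "q": False},
    "w1": {"p": True, "q": True},
    "w2": {"p": False, "q": False},
}
NORMAL = frozenset({"w1", "w2"})  # hypothetical: the worlds counted as normal

def star(w, proposition):
    """Hypothetical selection function *(w, A): the most normal A-worlds.
    In this toy model it does not depend on w."""
    return proposition & NORMAL

def holds(formula, w):
    op = formula[0]
    if op == "atom":
        return WORLDS[w][formula[1]]
    if op == "not":
        return not holds(formula[1], w)
    if op == "and":
        return holds(formula[1], w) and holds(formula[2], w)
    if op == "=>":
        # (A => C) is true at w iff C holds at every world in *(w, [[A]]).
        antecedent = frozenset(v for v in WORLDS if holds(formula[1], v))
        return all(holds(formula[2], v) for v in star(w, antecedent))
    raise ValueError(f"unknown operator: {op!r}")

p, q = ("atom", "p"), ("atom", "q")
p_default_q = ("=>", p, q)

# Without Centering, w0 need not be among its own most normal p-worlds.
print(holds(p, "w0"), holds(p_default_q, "w0"), holds(q, "w0"))

# Conditionals nested in an antecedent and in a consequent.
nested_antecedent = ("=>", ("and", p, p_default_q), q)
nested_consequent = ("=>", p, p_default_q)
print(all(holds(nested_antecedent, w) for w in WORLDS))
print(all(holds(nested_consequent, w) for w in WORLDS))
```

At w0 both p and (p ⇒ q) are true while q is false, so modus ponens is not classically valid in this model, even though the nested conditionals receive definite truth values at every world.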

The one critical drawback to Commonsense Entailment, when compared to the logic of extreme probabilities, is that it lacks a single, clear standard of normativity. The truth-conditions of the default conditional and the definition of nonmonotonic consequence can be fine-tuned to match many of our intuitions, but at the end of the day, the theory of Commonsense Entailment offers no simple answer to the question of what its conditional or its consequence relation is supposed (ideally) to represent.

Logics of extreme probability (beginning with the work of Ernest Adams) did not permit the nesting of default conditionals for this reason: the conditionals were supposed to represent something like subjective conditional probabilities of the agent, to which the agent was supposed to have perfect introspective access. Consequently, it made no sense to nest these conditionals within disjunctions (as though the agent couldn't tell which disjunct represented his actual probability assignment) or within other conditionals (since the subjective probability of a subjective probability is always trivial: either exactly 1 or exactly 0). However, there is no reason why the logic of extreme probabilities couldn't be given a different interpretation, with (p ⇒ q) representing something like “the objective probability of q, conditional on p, is infinitely close to 1”. In this case, it makes perfect sense to nest such statements of objective conditional probability within Boolean operators (either the probability of q on p is close to 1, or the probability of r on s is close to 1), or within operators of objective probability (the objective probability that the objective probability of p is close to 1 is itself close to 1). What is required in the latter case is a theory of higher-order probabilities.

Fortunately, such a theory of higher-order probabilities is available (see Skyrms 1980; Gaifman 1988). The central principle of this theory is Miller's principle. For a description of the models of the logic of extreme, higher-order probability, see the following supplementary document:

Models of Higher-Order Probability
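For orientation, Miller's principle is commonly stated along the following lines. The symbols are notational assumptions made here, not taken from the supplementary document: P_1 is a first-order probability function and P_2 a second-order probability function defined over claims about P_1.

```latex
% Miller's principle (a common formulation; notation assumed for illustration):
% conditional on the first-order probability of A being x, the higher-order
% probability of A is x.
P_2\bigl(A \mid P_1(A) = x\bigr) = x
```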

The following proposition is logically valid in this logic, representing the presence of a defeasible modus ponens rule:

((p & (p ⇒ q)) ⇒ q)

This system can be the basis for a family of rational nonmonotonic consequence relations that include the Adams ε-entailment system as a proper part (see Koons 2000, 298-319).

5.9 Objections to Nonmonotonic Logic

Confusing Logic and Epistemology?

In an early paper (Israel 1980), David Israel raised a number of objections to the very idea of nonmonotonic logic. First, he pointed out that the nonmonotonic consequences of a finite theory are typically not semi-decidable (recursively enumerable). This remains true of most current systems, but it is also true of second-order logic, infinitary logic, and a number of other systems that are now accepted as logical in nature.

Secondly, and more to the point, Israel argued that the concept of nonmonotonic logic evinces a confusion between the rules of logic and rules of inference. In other words, Israel accused defenders of nonmonotonic logic of confusing a theory of defeasible inference (a branch of epistemology) with a theory of genuine consequence relations (a branch of logic). Inference is nonmonotonic, but logic (according to Israel) is essentially monotonic.

The best response to Israel is to point out that, like deductive logic, a theory of nonmonotonic or defeasible consequence has a number of applications besides that of guiding actual inference. Defeasible logic can be used as part of a theory of scientific explanation, and it can be used in hypothetical reasoning, as in planning. It can be used to interpret implicit features of stories, even fantastic ones, so long as it is clear which actual default rules to suspend. Thus, defeasible logic extends far beyond the boundaries of the theory of epistemic justification. Moreover, as we have seen, nonmonotonic consequence relations (especially the preferential ones) share a number of very significant formal properties with classical consequence, warranting the inclusion of them all in a larger family of logics. From this perspective, classical deductive logic is simply a special case: the study of indefeasible consequence.

Problems with the Deduction Theorem

In a recent paper, Charles Morgan (Morgan 2000) has argued that nonmonotonic logic is impossible. Morgan offers a series of impossibility proofs. All of Morgan's proofs turn on the fact that nonmonotonic logics cannot support a generalized deduction theorem, i.e., something of the following form:

Γ ∪ {p} |~ q iff Γ |~ (p ⇒ q)

Morgan is certainly right about this.

However, there are good grounds for thinking that a system of nonmonotonic logic should fail to include a generalized deduction theorem. The very nature of defeasible consequence ensures that it must be so. Consider, for example, the left-to-right direction: suppose that Γ ∪ {p} |~ q, that is, that q follows defeasibly from Γ ∪ {p}. Should it follow that Γ |~ (p ⇒ q)? Not at all. It may be that, normally, if p then ¬q, but Γ may contain defaults and information that defeat and override this inference. For instance, it might contain the fact r and the default ((r & p) ⇒ q). Similarly, consider the right-to-left direction: suppose that Γ |~ (p ⇒ q). Should it follow that Γ ∪ {p} |~ q? Again, clearly not. Γ might contain both r and a default ((p & r) ⇒ ¬q), in which case Γ ∪ {p} |~ ¬q.

It would be reasonable, however, to demand that a system of nonmonotonic logic satisfy the following special deduction theorem:

{p} |~ q iff ∅ |~ (p ⇒ q)

This is certainly possible. The special deduction theorem holds trivially if we define {p} |~ q as ∅ ⊨ (p ⇒ q), that is, {p} defeasibly entails q if and only if (by definition) (p ⇒ q) is a theorem of the classical conditional logic.[9]

6. Causation and Defeasible Reasoning

6.1 The Need for Explicit Causal Information

Hanks and McDermott, computer scientists at Yale, demonstrated that the existing systems of nonmonotonic logic were unable to give the right solution to a simple problem about predicting the course of events (Hanks and McDermott 1987). The problem became known as the Yale shooting problem. Hanks and McDermott assume that some sort of law of inertia holds: that normally properties of things do not change. In the Yale shooting problem, there are two relevant properties: being loaded (a property of a gun) and being alive (a property of the intended victim of the shooting). Let's assume that in the initial situation, s0, the gun is loaded and the victim is alive, Loaded(s0) and Alive(s0), and that two actions are performed in sequence: Wait and Shoot. Let's call the situation that results from a moment of waiting s1, and the situation that follows both waiting and then shooting s2. There are then three instances of the law of inertia that are relevant:

Alive(s0) ⇒ Alive(s1)
Loaded(s0) ⇒ Loaded(s1)
Alive(s1) ⇒ Alive(s2)

We need to make one final assumption: that shooting the victim with a loaded gun results in death (not being alive):

Intuitively, we should be able to derive the defeasible conclusion that the victim is still alive after waiting, but dead after waiting and shooting: Alive(s1) & ¬Alive(s2). However, none of the nonmonotonic logics described above gives us this result, since each of the three instances of the law of inertia can be violated: by the victim's inexplicably dying while we are waiting, by the gun's miraculously becoming unloaded while we are waiting, or by the victim's dying as a result of the shooting. Nothing introduced into nonmonotonic logic up to this point provides us with a basis for preferring the last of these exceptions (the victim's dying as a result of the shooting) to the other two. What's missing is a recognition of the importance of causal structure to defeasible consequence.[10]
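The difficulty can be reproduced with a brute-force sketch. It is a simplified stand-in for minimal-abnormality reasoning, not an implementation of any particular system discussed above. Two assumptions are made for illustration: the three inertia instances are read as Alive(s0) ⇒ Alive(s1), Loaded(s0) ⇒ Loaded(s1) and Alive(s1) ⇒ Alive(s2), and the effect of shooting is encoded as a strict law that a loaded gun at s1 excludes the victim's being alive at s2.

```python
from itertools import product

# Given facts about the initial situation s0.
alive0, loaded0 = True, True

# A world fixes the three open facts: (Alive(s1), Loaded(s1), Alive(s2)).
# Each default maps to a test that is True when the default is respected.
DEFAULTS = {
    "Alive(s0) => Alive(s1)":   lambda a1, l1, a2: (not alive0) or a1,
    "Loaded(s0) => Loaded(s1)": lambda a1, l1, a2: (not loaded0) or l1,
    "Alive(s1) => Alive(s2)":   lambda a1, l1, a2: (not a1) or a2,
}

def causal_law(a1, l1, a2):
    """Assumed strict law: a loaded gun at s1 means the victim is dead at s2."""
    return (not l1) or (not a2)

def violated(world):
    """Names of the defaults that the given world violates."""
    return frozenset(name for name, ok in DEFAULTS.items() if not ok(*world))

# Worlds consistent with the strict causal law.
candidates = [w for w in product([True, False], repeat=3) if causal_law(*w)]
violations = {w: violated(w) for w in candidates}

# Keep the worlds whose set of violated defaults is subset-minimal.
minimal = {
    w: v for w, v in violations.items()
    if not any(other < v for other in violations.values())
}

for (a1, l1, a2), v in sorted(minimal.items(), reverse=True):
    print(f"Alive(s1)={a1}, Loaded(s1)={l1}, Alive(s2)={a2}: violates {sorted(v)}")
```

Three worlds survive, each violating exactly one default, so minimizing exceptions alone cannot single out the intended outcome in which the victim dies as a result of the shooting.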

There are several even simpler examples that illustrate the need to include explicitly causal information in the input to defeasible reasoning. Consider, for instance, this problem of Judea Pearl's (Pearl 1988): if the sprinkler is on, then normally the sidewalk is wet, and, if the sidewalk is wet, then normally it is raining. However, we should not infer that it is raining from the fact that the sprinkler is on. (See Lifschitz 1990 and Lin and Reiter 1994 for additional examples of this kind.) Similarly, if we also know that if the sidewalk is wet, then it is slippery, we should be able to infer that the sidewalk is slippery if the sprinkler is on and it is not raining.

6.2 Causally-Grounded Independence Relations

Hans Reichenbach, in his analysis of the interaction of causality and probability (Reichenbach 1956), observed that the immediate causes of an event probabilistically screen off from that event any other event that is not causally posterior to it. This means that, given the immediate causal antecedents of an event, the occurrence of that event is rendered probabilistically independent of any information about non-posterior events. When this insight is applied to the nonmonotonic logic of extreme probabilities, we can use causal information to identify which defaults function independently of others: that is, we can decide when the fact that one default conditional has an exception is irrelevant to the question of whether a second conditional is also violated (see Koons 2000, 320-323). In effect, we have a selective version of Independence of Defaults that is grounded in causal information, enabling us to dissolve the Drowning Problem.
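In symbols, the screening-off condition can be written as follows. The labels are assumptions introduced here for illustration: E is the event in question, C is the conjunction of its immediate causes, and F is any event that is not causally posterior to E.

```latex
% Screening off: given the immediate causes C of E, information about any
% non-posterior event F makes no difference to the probability of E.
P(E \mid C \wedge F) = P(E \mid C)
```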

For example, in the case of Pearl's sprinkler, since rain is causally prior to the sidewalk's being wet, the causal structure of the situation does not ensure that the rain is probabilistically independent of whether the sprinkler is on, given the fact that the sidewalk is wet. That is, we have no grounds for thinking that the probability of rain, conditional on the sidewalk's being wet, is identical to the probability of rain, conditional on the sidewalk's being wet and the sprinkler's being on (presumably, the former is higher than the latter). This failure of independence prevents us from using the (Wet ⇒ Rain) default, in the presence of the additional fact that the sprinkler is on.
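A small numerical sketch illustrates the direction of the inequality. All of the numbers are invented for illustration, and they are ordinary rather than extreme probabilities; the sketch also assumes that rain and the sprinkler are independent of each other before the state of the sidewalk is taken into account.

```python
from itertools import product

# Assumed, purely illustrative prior probabilities (not from the source).
P_RAIN = 0.2
P_SPRINKLER = 0.3

def p_wet(rain, sprinkler):
    """Assumed chance that the sidewalk is wet, given its two possible causes."""
    if rain and sprinkler:
        return 0.99
    if rain or sprinkler:
        return 0.9
    return 0.01

def joint(rain, sprinkler, wet):
    """Joint probability of one complete assignment (rain, sprinkler, wet)."""
    pr = P_RAIN if rain else 1 - P_RAIN
    ps = P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
    pw = p_wet(rain, sprinkler)
    return pr * ps * (pw if wet else 1 - pw)

def prob(event, given=lambda r, s, w: True):
    """Conditional probability of `event` given `given`, by enumeration."""
    worlds = list(product([True, False], repeat=3))
    den = sum(joint(*x) for x in worlds if given(*x))
    num = sum(joint(*x) for x in worlds if given(*x) and event(*x))
    return num / den

rain = lambda r, s, w: r
p_rain_given_wet = prob(rain, lambda r, s, w: w)
p_rain_given_wet_and_sprinkler = prob(rain, lambda r, s, w: w and s)

print(round(p_rain_given_wet, 3), round(p_rain_given_wet_and_sprinkler, 3))
```

With these assumed numbers, learning that the sprinkler is on lowers the probability of rain given a wet sidewalk, which is the failure of independence described above.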

In the case of the Yale shooting problem, the state of the gun's being loaded in the aftermath of waiting, Loaded(s1), has as its only causal antecedent the fact that the gun is loaded in s0. The fact of Loaded(s0) screens off the fact that the victim is alive in s0 from the conclusion Loaded(s1). Similarly, the fact that the victim is alive in s0 screens off the fact that the gun is loaded in s0 from the conclusion that the victim is still alive in s1. In contrast, the fact that the victim is alive at s1 does not screen off the fact that the gun is loaded at s1 from the conclusion that the victim is still alive at s2. Thus, we can assign higher priority to the law of inertia with respect to both Loaded and Alive at s0, and we can conclude that the victim is alive and the gun is loaded at s1. The causal law for shooting then gives us the desired conclusion, namely, that the victim is dead at s2.
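The effect of giving the inertia defaults at s0 higher priority can be sketched as a two-stage filter over candidate worlds. This is a hypothetical simplification: it counts respected defaults within each priority layer instead of implementing the probabilistic machinery, and it again assumes a strict causal law on which a loaded gun at s1 excludes the victim's being alive at s2.

```python
from itertools import product

# A world fixes (Alive(s1), Loaded(s1), Alive(s2)); Alive(s0) and Loaded(s0) hold.
# Assumed strict causal law: not both Loaded(s1) and Alive(s2).
worlds = [w for w in product([True, False], repeat=3) if not (w[1] and w[2])]

# Higher-priority layer: inertia from s0 to s1 for both properties.
high = [
    lambda a1, l1, a2: a1,   # Alive(s0) => Alive(s1), with Alive(s0) true
    lambda a1, l1, a2: l1,   # Loaded(s0) => Loaded(s1), with Loaded(s0) true
]
# Lower-priority layer: inertia of Alive from s1 to s2.
low = [
    lambda a1, l1, a2: (not a1) or a2,   # Alive(s1) => Alive(s2)
]

def prefer(candidates, layer):
    """Keep the candidates that respect the most defaults in this layer."""
    score = lambda w: sum(d(*w) for d in layer)
    best = max(score(w) for w in candidates)
    return [w for w in candidates if score(w) == best]

# Apply the layers in priority order.
result = prefer(prefer(worlds, high), low)
print(result)
```

Under these assumptions the only surviving world is the one in which the victim is alive and the gun loaded at s1, and the victim is dead at s2.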

6.3 Causal Circumscription

Our knowledge of causal relatedness is itself very partial. In particular, it is difficult for us to verify conclusively that any two randomly selected facts are or are not causally related. It seems that in practice we apply something like Occam's razor, assuming that two randomly selected facts are not causally related unless we have positive reason for thinking otherwise. This invites the use of something like circumscription, minimizing the extension of the predicate causes. Once we have a set of tentative conclusions about the causal structure of the world, we can use Reichenbach's insight to enable us to determine which default rules should be rendered independent of exceptions to other default rules. Since circumscription is itself a nonmonotonic logical system, there are at least two independent sources of nonmonotonicity or defeasibility: the minimization or circumscription of causal relevance, and the application of defeasible causal laws and laws of inertia.

Bibliography

Other Internet Resources

Related Entries

artificial intelligence: logic and | causation: probabilistic | epistemology: Bayesian | logic: modal | logic: non-monotonic | logic: of belief revision | probability, interpretations of