A problem presented to an automated reasoning program consists of two main items: a statement expressing the particular question being asked, called the problem's conclusion, and a collection of statements expressing all the relevant information available to the program -- the problem's assumptions. Solving a problem means proving the conclusion from the given assumptions by the systematic application of rules of deduction embedded within the reasoning program. The problem-solving process ends when one such proof is found, when the program is able to detect the non-existence of a proof, or when it simply runs out of resources.
A first important consideration in the design of an automated reasoning program is to delineate the class of problems that the program will be required to solve -- the problem domain. The domain can be very large, as would be the case for a general-purpose theorem prover for first-order logic, or be more restricted in scope as in a special-purpose theorem prover for Tarski's geometry, or the modal logic K. A typical approach in the design of an automated reasoning program is to provide it first with sufficient logical power (e.g., first-order logic) and then further demarcate its scope to the particular domain of interest defined by a set of domain axioms. To illustrate, EQP, a theorem-proving program for equational logic, was used to solve an open question in Robbins algebra (McCune 1997): Are all Robbins algebras Boolean? For this, the program was provided with the axioms defining a Robbins algebra:
(A1) x + y = y + x   (commutativity)
(A2) (x + y) + z = x + (y + z)   (associativity)
(A3) -(-(x + y) + -(x + -y)) = x   (Robbins equation)
The program was then used to show that a characterization of Boolean algebra that uses Huntington's equation,
-(-x + y) + -(-x + -y) = x,
follows from the axioms. We should remark that this problem is non-trivial since deciding whether a finite set of equations provides a basis for Boolean algebra is undecidable, that is, it does not permit an algorithmic representation; also, the problem was attacked by Robbins, Huntington, Tarski and many of his students with no success. The key step was to establish that all Robbins algebras satisfy (∃x)(∃y)(x + y = x), since it was known that this formula is a sufficient condition for a Robbins algebra to be Boolean. When EQP was supplied with this piece of information, the program provided invaluable assistance by completing the proof automatically.
A special-purpose theorem prover does not draw its main benefit from restricting its attention to the domain axioms but from the fact that the domain may enjoy particular theorem-proving techniques which can be hardwired -- coded -- within the reasoning program itself and which may result in a more efficient logic implementation. Much of EQP's success at settling the Robbins question can be attributed to its built-in associative-commutative inference mechanisms.
A second important consideration in the building of an automated
reasoning program is to decide (1) how problems in its domain will be
presented to the reasoning program; (2) how they will actually be
represented internally within the program; and, (3) how the solutions
found -- completed proofs -- will be displayed back to the user.
There are several formalisms available for this, and the choice is
dependent on the problem domain and the underlying deduction calculus
used by the reasoning program. The most commonly used formalisms
include standard first-order logic, typed λ-calculus, and clausal logic. We take up clausal logic here and assume that the reader is familiar with the rudiments of first-order logic; for the typed λ-calculus the reader may want to check Church 1940. Clausal logic is a
quantifier-free variation of first-order logic and has been the most
widely used notation within the automated reasoning community. Some
definitions are in order: A term is a constant, a
variable, or a function whose arguments are themselves terms. For
example, a, x, f(x), and
h(c,f(z),y) are all terms.
A literal is either an atomic formula, e.g.
F(x), or the negation of an atomic formula, e.g.
~R(x,f(a)). Two literals are
complementary if one is the negation of the other. A
clause is a (possibly empty) finite disjunction of literals l1 ∨ … ∨ ln where no literal appears more than once in the clause (that is, clauses can alternatively be treated as sets of literals).
Ground terms, ground literals, and ground clauses have
no variables. The empty clause, [ ], is the clause
having no literals and, hence, is unsatisfiable -- false under any
interpretation. Some examples: ~R(a,b) and F(a) ∨ ~R(f(x),b) ∨ F(z) are both examples of
clauses but only the former is ground. The general idea is to be able
to express a problem's formulation as a set of clauses or,
equivalently, as a formula in conjunctive normal form,
that is, as a conjunction of clauses.
For formulas already expressed in standard logic notation, there is
a systematic two-step procedure for transforming them into conjunctive
normal form. The first step consists in re-expressing a formula into a
semantically equivalent formula in prenex normal form,
(Q1x1) … (Qnxn) φ(x1, …, xn), consisting of a string of quantifiers (Q1x1) … (Qnxn) followed by a quantifier-free expression φ(x1, …, xn) called the matrix. The second step in the
transformation first converts the matrix into conjunctive normal form
by using well-known logical equivalences such as DeMorgan's
laws, distribution, double-negation, and others; then, the
quantifiers in front of the matrix, which is now in conjunctive
normal form, are dropped according to certain rules. In the presence
of existential quantifiers, this latter step does not always preserve
equivalence and requires the introduction of Skolem
functions whose role is to "simulate" the behaviour of
existentially quantified variables. For example, applying the
skolemizing process to the formula
(∀x)(∃y)(∀z)(∃u)(∀v)[R(x,y,v) ∨ ~K(x,z,u,v)]
requires the introduction of a one-place and a two-place Skolem function, f and g respectively, resulting in the formula
(∀x)(∀z)(∀v)[R(x,f(x),v) ∨ ~K(x,z,g(x,z),v)]
The universal quantifiers can then be removed to obtain the final clause, R(x,f(x),v) ∨ ~K(x,z,g(x,z),v) in our example. The Skolemizing process may not preserve
equivalence but maintains satisfiability, which is enough for
clause-based automated reasoning.
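Since the two-step procedure is entirely mechanical, it is easy to prototype. The following Python fragment is a minimal sketch of the skolemizing step only, under a representation of our own devising (the quantifier prefix as a list of pairs, terms as nested tuples); the names are illustrative and not taken from any existing prover:

import itertools

_fresh = itertools.count()

def skolemize(prefix, matrix):
    # Drop the quantifier prefix, replacing each existentially
    # quantified variable by a Skolem function of the universally
    # quantified variables that precede it in the prefix.
    universals = []
    substitution = {}
    for quantifier, var in prefix:
        if quantifier == 'forall':
            universals.append(var)
        else:  # 'exists': introduce a fresh Skolem function symbol
            symbol = 'f%d' % next(_fresh)
            substitution[var] = (symbol, tuple(universals))
    return substitute(matrix, substitution)

def substitute(term, sub):
    # A term is a variable (a plain string) or a (functor, args) pair.
    if isinstance(term, str):
        return sub.get(term, term)
    functor, args = term
    return (functor, tuple(substitute(a, sub) for a in args))

# The example in the text: (forall x)(exists y)(forall z)(exists u)(forall v).
prefix = [('forall', 'x'), ('exists', 'y'), ('forall', 'z'),
          ('exists', 'u'), ('forall', 'v')]
print(skolemize(prefix, ('R', ('x', 'y', 'v'))))
# -> ('R', ('x', ('f0', ('x',)), 'v')), i.e. R(x,f(x),v)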
Although clausal form provides a more uniform and economical notation -- there are no quantifiers and all formulas are disjunctions -- it has certain disadvantages. One drawback is the exponential increase in the size of the resulting formula when transformed from standard logic notation into clausal form. The increase in size is accompanied by an increase in cognitive complexity that makes it harder for humans to read proofs written with clauses. Another disadvantage is that the syntactic structure of a formula in standard logic notation can be used to guide the construction of a proof but this information is completely lost in the transformation into clausal form.
A third important consideration in the building of an automated reasoning program is the selection of the actual deduction calculus that will be used by the program to perform its inferences. As indicated before, the choice is highly dependent on the nature of the problem domain and there is a fair range of options available: General-purpose theorem proving and problem solving (first-order logic, simple type theory), program verification (first-order logic), distributed and concurrent systems (modal and temporal logics), program specification (intuitionistic logic), hardware verification (higher-order logic), logic programming (Horn logic), and so on.
A deduction calculus consists of a set of logical axioms and a
collection of deduction rules for deriving new formulas from previously
derived formulas. Solving a problem in the program's problem
domain then really means establishing a particular formula α -- the problem's conclusion -- from the extended set Γ consisting of the logical axioms, the domain axioms, and the problem assumptions. That is, the program needs to determine if Γ ⊨ α. How the program goes about establishing this semantic fact depends, of course, on the calculus it implements. Some programs may take a very direct route and attempt to establish that Γ ⊨ α by actually constructing a step-by-step proof of α from Γ. If successful, this shows of course that Γ derives -- proves -- α, a fact we denote by writing Γ ⊢ α. Other reasoning programs may instead opt for a more indirect approach and try to establish that Γ ⊨ α by showing that Γ ∪ {~α} is inconsistent which, in turn, is shown by deriving a contradiction, ⊥, from the set Γ ∪ {~α}. Automated systems that implement the former approach include natural deduction systems; the latter approach is used by systems based on resolution, sequent deduction, and matrix connection methods.
Soundness and completeness are two (metatheoretical) properties of a
calculus that are particularly important for automated deduction.
Soundness states that the rules of the calculus are
truth-preserving. For a direct calculus this means that if Γ ⊢ α then Γ ⊨ α. For indirect calculi, soundness means that if Γ ∪ {~α} ⊢ ⊥ then Γ ⊨ α. Completeness in a direct calculus states that if Γ ⊨ α then Γ ⊢ α. For indirect calculi, the completeness property is expressed in terms of refutations since one establishes that Γ ⊨ α by showing the existence of a proof, not of α from Γ, but of ⊥ from Γ ∪ {~α}. Thus, an indirect calculus is refutation complete if Γ ⊨ α implies Γ ∪ {~α} ⊢ ⊥.
Of the two properties, soundness is the more desirable. An
incomplete calculus indicates that there are entailment relations
that cannot be established within the calculus. For an automated
reasoning program this means, informally, that there are true
statements that the program cannot prove. Incompleteness may be an
unfortunate affair but lack of soundness is a truly problematic
situation since an unsound reasoning program would be able to
generate false conclusions from perfectly true information.
It is important to appreciate the difference between a logical calculus and its corresponding implementation in a reasoning program. The implementation of a calculus invariably involves making some modifications to the calculus and this results, strictly speaking, in a new calculus. The most important modification to the original calculus is the mechanization of its deduction rules, that is, the specification of the systematic way in which the rules are to be applied. In the process of doing so, one must exercise care to preserve the metatheoretical properties of the original calculus.
Two other metatheoretical properties of importance to automated
deduction are decidability and complexity. A calculus is
decidable if it admits an algorithmic representation,
that is, if there is an algorithm that, for any given Γ and α, can determine in a finite amount of time the answer, Yes or No, to the question Does Γ ⊨ α?
A calculus may be undecidable, in which case one needs to determine which decidable fragment of it to implement. The time-space complexity of a calculus specifies how efficient its algorithmic representation is. Automated reasoning is made all the more challenging because many calculi of interest are not decidable and have poor complexity measures, forcing researchers to seek tradeoffs between deductive power and algorithmic efficiency.
Of the many calculi used in the implementation of reasoning programs,
the ones based on the resolution principle have been
the most popular. Resolution is modeled after the chain rule (of which
Modus Ponens is a special case) and essentially states that from p ∨ q and ~q ∨ r one can infer p ∨ r. More formally, let C − l denote the clause C with the literal l removed. Assume that C1 and C2 are ground clauses containing, respectively, a positive literal l1 and a negative literal ~l2 such that l1 and ~l2 are complementary. Then, the rule of ground resolution states that, as a result of resolving C1 and C2, one can infer (C1 − l1) ∨ (C2 − ~l2):

    C1        C2
    ------------------------- (ground resolution)
    (C1 − l1) ∨ (C2 − ~l2)
Herbrand's theorem (Herbrand 1930) assures us that the
non-satisfiability of any set of clauses, ground or not, can
be established by using ground resolution. This is a very significant
result for automated deduction since it tells us that if a set of clauses is not satisfied by any of the infinitely many
interpretations, this fact can be determined in finitely many
steps. Unfortunately, a direct implementation of ground resolution
using Herbrand's theorem requires the generation of a vast number
of ground terms making this approach hopelessly inefficient. This issue
was effectively addressed by generalizing the ground resolution rule to
binary resolution and by introducing the notion of
unification (Robinson 1965a). Unification allows resolution proofs to
be lifted and conducted at a more general level;
clauses only need to be instantiated at the moment where they are to be
resolved. Moreover, the clauses resulting from the instantiation
process do not have to be ground instances and may still contain
variables. The introduction of binary resolution and unification is
considered one of the most important developments in the field of
automated reasoning.
A substitution is a unifier for two expressions if, when applied to both, it makes them equal. For example, {x → b, y → b, z → f(a,b)} is a unifier for R(x,f(a,y)) and R(b,z) since when applied to both expressions it makes them equal:

R(x,f(a,y)){x → b, y → b, z → f(a,b)} = R(b,f(a,b)) = R(b,z){x → b, y → b, z → f(a,b)}

A most general unifier (mgu) produces the most general instance shared by two unifiable expressions; in the previous example, the substitution {x → b, z → f(a,y)} is an mgu. If C1 and C2 are clauses with no variables in common containing, respectively, a positive literal l1 and a negative literal ~l2 such that l1 and l2 unify with mgu σ, then the clause (C1σ − l1σ) ∨ (C2σ − ~l2σ) can be inferred by binary resolution:

    C1        C2
    ----------------------------- (binary resolution)
    (C1σ − l1σ) ∨ (C2σ − ~l2σ)
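Unification itself admits a compact implementation. The sketch below (Python, with a term representation and names of our own choosing) computes a most general unifier when one exists and fails otherwise; note the occurs-check, whose omission for efficiency reasons is discussed in the logic programming section below:

def unify(s, t, sub=None):
    # Return an mgu of s and t as a dict of bindings, or None if the
    # terms do not unify. Variables are plain strings; constants and
    # compound terms are (functor, args) tuples.
    if sub is None:
        sub = {}
    s, t = resolve(s, sub), resolve(t, sub)
    if s == t:
        return sub
    if isinstance(s, str):                      # s is an unbound variable
        return None if occurs(s, t, sub) else {**sub, s: t}
    if isinstance(t, str):
        return unify(t, s, sub)
    (f, f_args), (g, g_args) = s, t
    if f != g or len(f_args) != len(g_args):    # functor clash
        return None
    for a, b in zip(f_args, g_args):
        sub = unify(a, b, sub)
        if sub is None:
            return None
    return sub

def resolve(term, sub):
    # Follow variable bindings until an unbound variable or a
    # non-variable term is reached.
    while isinstance(term, str) and term in sub:
        term = sub[term]
    return term

def occurs(var, term, sub):
    # The occurs-check: does var occur inside term under sub?
    term = resolve(term, sub)
    if term == var:
        return True
    return not isinstance(term, str) and \
        any(occurs(var, a, sub) for a in term[1])

# Unifying R(x,f(a,y)) and R(b,z) yields {x -> b, z -> f(a,y)}:
a, b = ('a', ()), ('b', ())
print(unify(('R', ('x', ('f', (a, 'y')))), ('R', (b, 'z'))))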
Resolution proofs, more precisely refutations, are constructed by
deriving the empty clause [ ] from
{~
}
using resolution; this will always be possible if
{~
} is unsatisfiable since resolution is
refutation complete (Robinson 1965a). As an example of a resolution
proof, we show that the set
{(∀x)(P(x) ∨ Q(x)), (∀x)(P(x) → R(x)), (∀x)(Q(x) → R(x))}, denoted by Γ, entails the formula (∀x)R(x). The first step is to find the clausal form of Γ ∪ {~(∀x)R(x)};
the resulting clause set, denoted by S0, is
shown in steps 1 to 4 in the refutation below. The refutation is
constructed by using a level-saturation method: Compute all the
resolvents of the initial set, S0, add them to
the set and repeat the process until the empty clause is
derived. (This produces the sequence of increasingly larger sets:
S0, S1, S2, … .) The only constraint that we impose is
that we do not resolve the same two clauses more than once.
S0
1    P(x) ∨ Q(x)     Assumption
2    ~P(x) ∨ R(x)    Assumption
3    ~Q(x) ∨ R(x)    Assumption
4    ~R(a)           Negation of the conclusion

S1
5    Q(x) ∨ R(x)     Res 1 2
6    P(x) ∨ R(x)     Res 1 3
7    ~P(a)           Res 2 4
8    ~Q(a)           Res 3 4

S2
9    Q(a)            Res 1 7
10   P(a)            Res 1 8
11   R(x)            Res 2 6
12   R(x)            Res 3 5
13   Q(a)            Res 4 5
14   P(a)            Res 4 6
15   R(a)            Res 5 8
16   R(a)            Res 6 7

S3
17   R(a)            Res 2 10
18   R(a)            Res 2 14
19   R(a)            Res 3 9
20   R(a)            Res 3 13
21   [ ]             Res 4 11
Although the resolution proof is successful in deriving [ ], it has some significant drawbacks. To start with, the refutation is too long as it takes 21 steps to reach the contradiction, [ ]. This is due to the naïve brute-force nature of the implementation. The approach not only generates too many formulas but some are clearly redundant. Note how R(a) is derived six times; also, R(x) has more information content than R(a) and one should keep the former and disregard the latter. Resolution, like all other automated deduction methods, must be supplemented by strategies aimed at improving the efficiency of the deduction process. The above sample derivation has 21 steps but research-type problems command derivations with thousands or hundreds of thousands of steps.
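For ground clauses the level-saturation procedure is short enough to sketch in full. The toy Python program below (illustrative names; nothing here is taken from an actual prover) saturates a clause set and reports whether the empty clause is derivable, and it exhibits exactly the brute-force redundancy just criticized:

def resolvents(c1, c2):
    # All ground resolvents of clauses c1 and c2. A clause is a
    # frozenset of literal strings; '~p' is the complement of 'p'.
    out = set()
    for lit in c1:
        comp = lit[1:] if lit.startswith('~') else '~' + lit
        if comp in c2:
            out.add((c1 - {lit}) | (c2 - {comp}))
    return out

def saturate(clauses):
    # Level saturation: resolve every pair of clauses, add the
    # resolvents, and repeat until [ ] appears or nothing new is added.
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                if c1 is not c2:
                    new |= resolvents(c1, c2)
        if frozenset() in new:
            return True          # derived the empty clause: unsatisfiable
        if new <= clauses:
            return False         # saturated without deriving [ ]
        clauses |= new

# A ground rendering of the example above:
print(saturate([{'P', 'Q'}, {'~P', 'R'}, {'~Q', 'R'}, {'~R'}]))  # True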
Instead of removing redundant clauses, some strategies prevent the
generation of useless clauses in the first place. The
set-of-support strategy (Wos, Carson and Robinson
1965) is one of the most powerful strategies of this kind. A subset
T of the set S, where S is initially
Γ ∪ {~α}, is called a set of support of S iff S − T is satisfiable. Set-of-support resolution dictates that the resolved clauses are not both from S − T. The motivation behind set-of-support is that since the set Γ is usually satisfiable it might be wise not to resolve two clauses from Γ against each other. Hyperresolution (Robinson 1965b) reduces
the number of intermediate resolvents by combining several resolution
steps into a single inference step.
Independently co-discovered, linear resolution (Loveland 1970, Luckham 1970) always resolves a clause against the most recently derived resolvent. This gives the deduction a simple linear structure affording a straightforward implementation; yet, linear resolution preserves refutation completeness. Using linear resolution we can derive the empty clause in the above example in only eight steps:
1    P(x) ∨ Q(x)     Assumption
2    ~P(x) ∨ R(x)    Assumption
3    ~Q(x) ∨ R(x)    Assumption
4    ~R(a)           Negation of the conclusion
5    ~P(a)           Res 2 4
6    Q(a)            Res 1 5
7    R(a)            Res 3 6
8    [ ]             Res 4 7
With the exception of unrestricted subsumption, all the strategies mentioned so far preserve refutation completeness. Efficiency is an important consideration in automated reasoning and one may sometimes be willing to trade completeness for speed. Unit resolution and input resolution are two such refinements of linear resolution. In the former, one of the resolved clauses is always a literal; in the latter, one of the resolved clauses is always selected from the original set to be refuted. Albeit efficient, neither strategy is complete. Ordering strategies impose some form of partial ordering on the predicate symbols, terms, literals, or clauses occurring in the deduction. Ordered resolution treats clauses not as sets of literals but as sequences -- linear orders -- of literals. Ordered resolution is extremely efficient but, like unit and input resolution, is not refutation complete. To end, it must be noted that some strategies improve certain aspects of the deduction process at the expense of others. For instance, a strategy may reduce the size of the proof search space at the expense of increasing, say, the length of the shortest refutations.
There are several automated reasoning programs that are based on resolution, or refinements of resolution. Otter is one of the most versatile among these programs and is being used in a growing number of applications (Wos, Overbeek, Lusk and Boyle 1984). Resolution also provides the underlying logico-computational mechanism for the popular logic programming language Prolog (Clocksin and Mellish 1981).
Hilbert-style calculi (Hilbert and Ackermann 1928) have been traditionally used to characterize logic systems. These calculi usually consist of a few axiom schemata and a small number of rules that typically include modus ponens and the rule of substitution. Although they meet the necessary theoretical requisites (soundness, completeness, etc.), proof construction in these calculi is difficult and does not reflect standard practice. It was Gentzen's goal to set up a formalism that reflects as accurately as possible the actual logical reasoning involved in mathematical proofs (Gentzen 1935). To carry out this task, Gentzen analyzed the proof-construction process and then devised two deduction calculi for classical logic: the natural deduction calculus, NK, and the sequent calculus, LK. (Gentzen actually designed NK first and then introduced LK to pursue metatheoretical investigations.) The calculi met his goal to a large extent while at the same time managing to secure soundness and completeness. Both calculi are characterized by a relatively larger number of deduction rules and a simple axiom schema. Of the two calculi, LK is the one that has been most widely used in implementations of automated reasoning programs, and it is the one that we will discuss first; NK will be discussed in the next section.
Although the application of the LK rules affects logic formulas, the rules are seen as manipulating not logic formulas themselves but sequents. Sequents are expressions of the form Γ ⇒ Δ, where both Γ and Δ are (possibly empty) sets of formulas. Γ is the sequent's antecedent and Δ its succedent. Sequents can be interpreted thus: Let I be an interpretation. Then, I satisfies the sequent Γ ⇒ Δ (written as: I ⊨ Γ ⇒ Δ) iff either I ⊭ α (for some α in Γ) or I ⊨ β (for some β in Δ). In other words, I ⊨ Γ ⇒ Δ iff I ⊨ (α1 & … & αn) → (β1 ∨ … ∨ βm), where α1 & … & αn is the iterated conjunction of the formulas in Γ and β1 ∨ … ∨ βm is the iterated disjunction of those in Δ.
Axioms:

    Γ, α ⇒ Δ, α

Cut Rule:

    Γ ⇒ Δ, α        α, Γ ⇒ Δ
    -------------------------
    Γ ⇒ Δ

Antecedent Rules:

    &⇒:     α, β, Γ ⇒ Δ
            -------------
            α & β, Γ ⇒ Δ

    ∨⇒:     α, Γ ⇒ Δ        β, Γ ⇒ Δ
            -------------------------
            α ∨ β, Γ ⇒ Δ

    →⇒:     Γ ⇒ Δ, α        β, Γ ⇒ Δ
            -------------------------
            α → β, Γ ⇒ Δ

    ~⇒:     Γ ⇒ Δ, α
            ----------
            ~α, Γ ⇒ Δ

    ∃⇒:     α(a/x), Γ ⇒ Δ
            ----------------
            (∃x)α(x), Γ ⇒ Δ

    ∀⇒:     α(t/x), (∀x)α(x), Γ ⇒ Δ
            -------------------------
            (∀x)α(x), Γ ⇒ Δ

Succedent Rules:

    ⇒&:     Γ ⇒ Δ, α        Γ ⇒ Δ, β
            -------------------------
            Γ ⇒ Δ, α & β

    ⇒∨:     Γ ⇒ Δ, α, β
            -------------
            Γ ⇒ Δ, α ∨ β

    ⇒→:     α, Γ ⇒ Δ, β
            -------------
            Γ ⇒ Δ, α → β

    ⇒~:     α, Γ ⇒ Δ
            ----------
            Γ ⇒ Δ, ~α

    ⇒∃:     Γ ⇒ Δ, α(t/x), (∃x)α(x)
            -------------------------
            Γ ⇒ Δ, (∃x)α(x)

    ⇒∀:     Γ ⇒ Δ, α(a/x)
            ----------------
            Γ ⇒ Δ, (∀x)α(x)
The sequents above a rule's line are called the
rule's premises and the sequent below the line
is the rule's conclusion. The quantification
rules ∃⇒ and ⇒∀ have an eigenvariable condition that restricts their applicability, namely that a must not occur in Γ, Δ,
or in the quantified sentence. The purpose of this restriction is to
ensure that the choice of parameter, a, used in the substitution
process is completely arbitrary.
Proofs in LK are represented as trees where each node in the tree is
labeled with a sequent, and where the original sequent sits at the root
of the tree. The children of a node are the premises of the rule being
applied at that node. The leaves of the tree are labeled with axioms.
Here is the LK-proof of (∀x)R(x) from the set {(∀x)(P(x) ∨ Q(x)), (∀x)(P(x) → R(x)), (∀x)(Q(x) → R(x))}. In the tree below, Γ stands for this set:
[Proof tree not reproduced here: read backwards from the root sequent Γ ⇒ (∀x)R(x), applications of the ⇒∀, ∀⇒, ∨⇒ and →⇒ rules reduce it to leaf sequents, each of which is an axiom.]
In our example, all the leaves in the proof tree are labeled with
axioms. This establishes the validity of
the sequent Γ ⇒ (∀x)R(x) and, hence, the fact that Γ ⊨ (∀x)R(x).
LK takes an indirect approach at proving the conclusion and this is
an important difference between LK and NK. While NK constructs an
actual proof (of the conclusion from the given assumptions), LK
instead constructs a proof that proves the existence of a proof (of
the conclusion from the assumptions). For instance, to prove that α is entailed by Γ, NK constructs a step-by-step proof of α from Γ (assuming that one exists); in contrast, LK first constructs the sequent Γ ⇒ α which it then attempts to prove valid by showing that it cannot be made false. This is done by searching for a counterexample that makes (all the sentences in) Γ true and makes α false: If the search fails then a counterexample does not exist and the sequent is therefore valid. In this respect, proof trees in LK are actually refutation proofs. Like resolution, LK is refutation complete: If Γ ⊨ α then the sequent Γ ⇒ α has a proof tree.
As it stands, LK is unsuitable for automated deduction and there are
some obstacles that must be overcome before it can be efficiently
implemented. The reason is, of course, that the statement of the
completeness of LK only has to assert, for each entailment relation,
the existence of a proof tree but a reasoning program has the more
difficult task of actually having to construct one. Some of the main
obstacles: First, LK does not specify the order in which the rules must
be applied in the construction of a proof tree. Second, and as a
particular case of the first problem, the premises in the ∃⇒ and ⇒∀ rules inherit the quantificational formula to
which the rule is applied, meaning that the rules can be applied
repeatedly to the same formula sending the proof search into an endless
loop. Third, LK does not indicate which formula must be selected next
in the application of a rule. Fourth, the quantifier rules provide no
indication as to what terms or free variables must be used in their
deployment. Fifth, and as a particular case of the previous problem,
the application of a quantifier rule can lead into an infinitely long
tree branch because the proper term to be used in the instantiation
never gets chosen. Fortunately, as we will hint at below, each of these
problems can be successfully addressed.
Axiom sequents in LK are valid, and the conclusion of a rule is valid iff its premises are. This fact allows us to apply the LK rules in either direction, forwards from axioms to conclusion, or backwards from conclusion to axioms. Also, with the exception of the cut rule, all the rules' premises are subformulas of their respective conclusions. For the purposes of automated deduction this is a significant fact and we would want to dispense with the cut rule; fortunately, the cut-free version of LK preserves its refutation completeness (Gentzen 1935). These results provide a strong case for constructing proof trees in a backwards fashion; indeed, by working this way a refutation in cut-free LK gets increasingly simpler as it progresses since subformulas are simpler than their parent formulas. Moreover, and as far as propositional rules go, the new subformulas entered into the tree are completely dictated by the cut-free LK rules. Furthermore, and assuming the proof tree can be brought to completion, branches eventually end up with atoms and the presence of axioms can be quickly determined. Another reason for working backwards is that the truth-functional fragment of cut-free LK is confluent in the sense that the order in which the non-quantifier rules are applied is irrelevant: If there is a proof, regardless of what you do, you will run into it! To bring the quantifier rules into the picture, things can be arranged so that all rules have a fair chance of being deployed: Apply, as far as possible, all the non-quantifier rules before applying any of the quantifier rules. This takes care of the first and second obstacles, and it is not too difficult to see how the third one would now be handled. The fourth and fifth obstacles can be addressed by requiring that the terms to be used in the substitutions be suitably selected from the Herbrand universe (Herbrand 1930).
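The backward search just described is easy to prototype for the truth-functional fragment. The Python sketch below (our own formula encoding; a didactic toy, not a real prover) applies the cut-free propositional rules backwards; because these rules are invertible, it may commit to the first applicable rule without backtracking:

def provable(ante, succ):
    # Backward proof search for a cut-free propositional sequent
    # ante => succ. Atoms are strings; compound formulas are tuples:
    # ('~', a), ('&', a, b), ('v', a, b), ('->', a, b).
    if any(a in succ for a in ante):            # axiom: shared formula
        return True
    for i, f in enumerate(ante):                # antecedent rules
        if isinstance(f, tuple):
            rest = ante[:i] + ante[i+1:]
            if f[0] == '~':
                return provable(rest, succ + [f[1]])
            if f[0] == '&':
                return provable(rest + [f[1], f[2]], succ)
            if f[0] == 'v':                     # branch on both disjuncts
                return (provable(rest + [f[1]], succ) and
                        provable(rest + [f[2]], succ))
            if f[0] == '->':
                return (provable(rest, succ + [f[1]]) and
                        provable(rest + [f[2]], succ))
    for i, f in enumerate(succ):                # succedent rules
        if isinstance(f, tuple):
            rest = succ[:i] + succ[i+1:]
            if f[0] == '~':
                return provable(ante + [f[1]], rest)
            if f[0] == '&':
                return (provable(ante, rest + [f[1]]) and
                        provable(ante, rest + [f[2]]))
            if f[0] == 'v':
                return provable(ante, rest + [f[1], f[2]])
            if f[0] == '->':
                return provable(ante + [f[1]], rest + [f[2]])
    return False   # only atoms left and no axiom: a countermodel exists

# The sequent P & (P -> Q) => Q is provable:
print(provable([('&', 'P', ('->', 'P', 'Q'))], ['Q']))   # True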
The use of sequent-type calculi in automated theorem proving was initiated by efforts to mechanize mathematics (Wang 1960). At the time, resolution captured most of the attention of the automated reasoning community but during the 1970's some researchers started to further investigate non-resolution methods (Bledsoe 1977), prompting a fruitful and sustained effort to develop more human-oriented theorem proving systems (Bledsoe 1975, Nevins 1974). Eventually, sequent-type deduction gained momentum again, particularly in its re-incarnation as analytic tableaux (Fitting 1990). The method of deduction used in tableaux is essentially cut-free LK's with sets used in lieu of sequents.
Although LK and NK are both commonly labeled as natural
deduction systems, it is the latter which better deserves the
title due to its more natural, human-like, approach to proof
construction. The rules of NK are typically presented as acting on
standard logic formulas in an implicitly understood context, but they
are also commonly given in the literature as acting more explicitly on
judgements, that is, expressions of the form Γ ⊢ α where Γ is a set of formulas and α is a formula. This form is typically understood as making the metastatement that there is a proof of α from Γ (Kleene 1962). Following Gentzen 1935 and Prawitz 1965, here we take
the former approach. The system NK has no logical axioms and provides
two introduction-elimination rules for each logical connective:
Introduction Rules (I):

    &I:     α        β
            ------------
            α & β

    ∨I:     αi
            ---------   (for i = 1,2)
            α1 ∨ α2

    →I:     [α --- β]
            ---------
            α → β

    ↔I:     [α0 --- α1]        [α1 --- α0]
            ------------------------------
            α0 ↔ α1

    ~I:     [α --- ⊥]
            ---------
            ~α

    ∃I:     α(t/x)
            ---------
            (∃x)α(x)

    ∀I:     α(a/x)
            ---------
            (∀x)α(x)

Elimination Rules (E):

    &E:     α1 & α2
            --------   (for i = 1,2)
            αi

    ∨E:     α ∨ β        [α --- γ]        [β --- γ]
            ---------------------------------------
            γ

    →E:     α → β        α
            --------------
            β

    ↔E:     αi        α0 ↔ α1
            -----------------   (for i = 0,1)
            α1−i

    ~E:     [~α --- ⊥]
            ----------
            α

    ∃E:     (∃x)α(x)        [α(a/x) --- β]
            -------------------------------
            β

    ∀E:     (∀x)α(x)
            ---------
            α(t/x)
A few remarks: First, the expression [α --- β] represents the fact that α is an auxiliary assumption in the proof of β that eventually gets discharged, i.e. discarded. For example, ∃E tells us that if in the process of constructing a proof one has already derived (∃x)α(x) and also β with α(a/x) as an auxiliary assumption then the inference to β is allowed. Second, the eigenparameter, a, in ∃E and ∀I must be foreign to the premises, undischarged -- active -- assumptions, to the rule's conclusion and, in the case of ∃E, to (∃x)α(x). Third, ⊥ is shorthand for two contradictory formulas, α and ~α. Finally, NK is complete: If Γ ⊨ α then there is a proof of α from Γ using the rules of NK.
As in LK, proofs constructed in NK are represented as trees with the
proof's conclusion sitting at the root of the tree, and the
problem's assumptions sitting at the leaves. (Proofs are also
typically given as sequences of judgements,
Γ ⊢ α,
running from the top to the bottom of the printed page.) Here is a
natural deduction proof tree of (∀x)R(x) from (∀x)(P(x) ∨ Q(x)), (∀x)(P(x) → R(x)) and (∀x)(Q(x) → R(x)):

(∀x)(P(x) → R(x))     (∀x)(Q(x) → R(x))     (∀x)(P(x) ∨ Q(x))
-----------------     -----------------     -----------------
  P(a) → R(a)           Q(a) → R(a)           P(a) ∨ Q(a)

               [P(a)]   P(a) → R(a)     [Q(a)]   Q(a) → R(a)
               --------------------     --------------------
P(a) ∨ Q(a)           R(a)                     R(a)
------------------------------------------------------------
                            R(a)
                         ----------
                         (∀x)R(x)
As in LK, a forward-chaining strategy for proof construction is not
well focused. So, although proofs are read forwards, that is,
from leaves to root or, logically speaking, from assumptions to
conclusion, that is not the way in which they are typically
constructed. A backward-chaining strategy implemented by
applying the rules in reverse order is more effective. Many of the
obstacles that were discussed above in the implementation of sequent
deduction are applicable to natural deduction as well. These issues can
be handled in a similar way, but natural deduction introduces some
issues of its own. For example, as suggested by the →-Introduction rule, to prove a goal of the form α → β one could attempt to prove β on the assumption that α. But note that although the goal β does not match the conclusion of any other introduction rule, it matches the conclusion of all elimination rules and the reasoning program would need to consider those routes too. Similarly to forward-chaining, here there is the risk of setting goals that are irrelevant to the proof and that could lead the program astray. To wit: What prevents a program from entering the never-ending process of building, say, larger and larger conjunctions? Or, what is there to prevent an uncontrolled chain of backward applications of, say, ∨-Elimination?
Fortunately, NK enjoys the subformula property in the sense that each
formula entering into a natural deduction proof can be restricted to
being a subformula of Γ ∪ {α} ∪ Σ, where Σ is the set of auxiliary assumptions made by the ~-Elimination
rule. By exploiting the subformula property a natural deduction
automated theorem prover can drastically reduce its search space and
bring the backward application of the elimination rules under control
(Portoraro 1998, Sieg and Byrnes 1996). Further gains can be realized
if one is willing to restrict the scope of NK's logic to its
intuitionistic fragment where every proof has a normal form in the
sense that no formula is obtained by an introduction rule and then is
eliminated by an elimination rule (Prawitz 1965).
Implementations of automated theorem proving systems using NK deduction have been motivated by the desire to have the program reason with precisely the same proof format and methods employed by the human user. This has been particularly true in the area of education where the student is engaged in the interactive construction of formal proofs in an NK-like calculus working under the guidance of a theorem prover ready to provide assistance when needed (Portoraro 1994, Suppes 1981). Other, research-oriented, theorem provers true to the spirit of NK exist (Pelletier 1998) but are rare.
The name of the matrix connection method (Bibel 1981) is indicative of
the way it operates. The term matrix refers to the form
in which the set of logic formulas expressing the problem is
represented; the term connection refers to the way the
method operates on these formulas. To illustrate the method at work, we
will use an example from propositional logic and show that
R is entailed by P ∨ Q, P → R, and Q → R. This is done by establishing that the formula

(P ∨ Q) & (P → R) & (Q → R) & ~R

is unsatisfiable. To do this, we begin by transforming it into conjunctive normal form:

(P ∨ Q) & (~P ∨ R) & (~Q ∨ R) & ~R
This formula is then represented as a matrix, one conjunct per row and, within a row, one disjunct per column:
P    Q
~P   R
~Q   R
~R
The idea now is to explore all the possible vertical paths running through this matrix. A vertical path is a set of literals selected from each row in the matrix such that each literal comes from a different row. The vertical paths:
Path 1: P, ~P, ~Q and ~R
Path 2: P, ~P, R and ~R
Path 3: P, R, ~Q and ~R
Path 4: P, R, R and ~R
Path 5: Q, ~P, ~Q and ~R
Path 6: Q, ~P, R and ~R
Path 7: Q, R, ~Q and ~R
Path 8: Q, R, R and ~R
A path is complementary if it contains two literals which are complementary. For example, Path 2 is complementary since it has both P and ~P but so is Path 6 since it contains both R and ~R. Note that as soon as a path includes two complementary literals there is no point in pursuing it further, since any path extending it is also complementary. This typically allows for a large reduction in the number of paths to be inspected. In any event, all the paths in the above matrix are complementary and this fact establishes the unsatisfiability of the original formula. This is the essence of the matrix connection method. The method can be extended to predicate logic but this demands additional logical apparatus: Skolemization, variable renaming, quantifier duplication, complementarity of paths via unification, and simultaneous substitution across all matrix paths (Bibel 1981, Andrews 1981). Variations of the method have been implemented in reasoning programs in higher-order logic (Andrews 1981) and non-classical logics (Wallen 1990).
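The heart of the method, the path check, is also easy to sketch. The Python fragment below (a matrix encoded, by our own convention, as a list of rows of literal strings) naïvely enumerates every vertical path; real implementations abandon a path as soon as it becomes complementary instead of generating it in full:

from itertools import product

def complement(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def all_paths_complementary(matrix):
    # The matrix is unsatisfiable iff every vertical path (one literal
    # chosen from each row) contains a complementary pair of literals.
    for path in product(*matrix):
        if not any(complement(lit) in path for lit in path):
            return False       # an open path: the matrix is satisfiable
    return True

# The matrix for (P v Q) & (~P v R) & (~Q v R) & ~R:
matrix = [['P', 'Q'], ['~P', 'R'], ['~Q', 'R'], ['~R']]
print(all_paths_complementary(matrix))   # True: all eight paths close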
Equality is an important logical relation whose behavior within
automated deduction deserves its own separate treatment.
Equational logic and, more generally, term
rewriting treat equality-like equations as rewrite
rules, also known as reduction or demodulation rules. An
equality statement like f(a) = a allows the
simplification of a term like
g(c,f(a)) to
g(c,a). However, the same equation also
has the potential to generate an unboundedly large term:
g(c,f(a)), g(c,f(f(a))), g(c,f(f(f(a)))), and so on. What distinguishes term rewriting from equational logic is
that in term rewriting equations are used as unidirectional reduction
rules as opposed to equality which works in both directions. Rewrite
rules have the form t1 → t2 and the basic idea is to look for terms t occurring in expressions e such that t unifies with t1 with unifier σ so that the occurrence t1σ in e can be replaced by t2σ. For example, the rewrite rule x + 0 → x allows the rewriting of succ(succ(0) + 0) as succ(succ(0)).
To illustrate the main ideas in term rewriting, let us explore an example involving symbolic differentiation (the example and ensuing discussion are adapted from Chapter 1 of Baader and Nipkow 1998). Let der denote the derivative with respect to x, let y be a variable different from x, and let u and v be variables ranging over expressions. We define the rewrite system:
(R1) der(x) → 1
(R2) der(y) → 0
(R3) der(u + v) → der(u) + der(v)
(R4) der(u × v) → (u × der(v)) + (der(u) × v)
Again, the symbol → indicates that a term matching the left-hand side of a rewrite rule should be replaced by the rule's right-hand side. To see the differentiation system at work, let us compute the derivative of x × x with respect to x, der(x × x):
der(x × x)
  → (x × der(x)) + (der(x) × x)    by R4
  → (x × 1) + (der(x) × x)         by R1
  → (x × 1) + (1 × x)              by R1
At this point, since none of the rules (R1)-(R4) applies, no further reduction is possible and the rewriting process ends. The final expression obtained is called a normal form, and its existence motivates the following question: Is there an expression whose reduction process will never terminate when applying the rules (R1)-(R4)? Or, more generally: Under what conditions will a set of rewrite rules always stop, for any given expression, at a normal form after finitely many applications of the rules? This fundamental question is called the termination problem of a rewrite system, and we state without proof that the system (R1)-(R4) meets the termination condition.
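A system of this size is simple to prototype. In the Python sketch below the four rules are hard-coded as patterns (standing in for a general matching mechanism; all names are illustrative) and applied repeatedly until a normal form is reached:

def rewrite_step(e):
    # Try (R1)-(R4) at the root of e; otherwise recurse into the
    # subexpressions. Expressions are strings ('x', 'y', '0', '1') or
    # tuples ('+', u, v), ('*', u, v), ('der', u). Returns the reduced
    # expression, or None if no rule applies anywhere.
    if isinstance(e, tuple) and e[0] == 'der':
        u = e[1]
        if u == 'x':                                  # (R1)
            return '1'
        if u == 'y':                                  # (R2)
            return '0'
        if isinstance(u, tuple) and u[0] == '+':      # (R3)
            return ('+', ('der', u[1]), ('der', u[2]))
        if isinstance(u, tuple) and u[0] == '*':      # (R4)
            return ('+', ('*', u[1], ('der', u[2])),
                         ('*', ('der', u[1]), u[2]))
    if isinstance(e, tuple):
        for i, sub in enumerate(e[1:], 1):
            r = rewrite_step(sub)
            if r is not None:
                return e[:i] + (r,) + e[i+1:]
    return None

def normal_form(e):
    # Keep rewriting until no rule applies; this terminates for
    # (R1)-(R4), as noted in the text.
    while True:
        r = rewrite_step(e)
        if r is None:
            return e
        e = r

# der(x * x) reduces to (x * 1) + (1 * x):
print(normal_form(('der', ('*', 'x', 'x'))))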
There is the possibility that when reducing an expression, the set of rules of a rewrite system could be applied in more than one way. This is actually the case in the system (R1)-(R4) where in the reduction of der(x × x) we could have applied R1 first to the second sub-expression in (x × der(x)) + (der(x) × x), as shown below:
der(x × x)
  → (x × der(x)) + (der(x) × x)    by R4
  → (x × der(x)) + (1 × x)         by R1
  → (x × 1) + (1 × x)              by R1
Following this alternative course of action, the reduction terminates with the same normal form as in the previous case. This fact, however, should not be taken for granted: A rewriting system is said to be (globally) confluent if and only if independently of the order in which its rules are applied every expression always ends up being reduced to its one and only normal form. It can be shown that (R1)-(R4) is confluent and, hence, we are entitled to say: Compute the derivative of an expression (as opposed to simply a derivative). Adding more rules to a system in an effort to make it more practical can have undesired consequences. For example, if we add the rule
(R5) u + 0 → u
to (R1)-(R4) then we will be able to further reduce certain expressions but at the price of losing confluency. The following reductions show that der(x + 0) now has two normal forms:
der(x + 0)
  → der(x) + der(0)    by R3
  → 1 + der(0)         by R1
der(x + 0)
  → der(x)    by R5
  → 1         by R1
Adding the rule (R6) der(0) → 0
would allow the further reduction of 1 + der(0) to 1 + 0
and then, by R5, to 1. Although the presence of this new rule
actually increases the number of alternative paths --
der(x + 0) can now be reduced in four possible ways
-- they all end up with the same normal form, namely 1. This is no
coincidence as it can be shown that (R6) actually restores
confluency. This motivates another fundamental question: Under what
conditions can a non-confluent system be made into an equivalent
confluent one? The Knuth-Bendix completion algorithm
(Knuth and Bendix 1970) gives a partial answer to this question.
Term rewriting, like any other automated deduction method, needs strategies to direct its application. Rippling (Bundy, Stevens and Harmelen 1993, Basin and Walsh 1996) is a heuristic, with origins in inductive theorem proving, that uses annotations to selectively restrict the rewriting process.
Mathematical induction is a very important technique of theorem proving in mathematics and computer science. Problems stated in terms of objects or structures that involve recursive definitions or some form of repetition invariably require mathematical induction for their solving. In particular, reasoning about the correctness of computer systems requires induction and an automated reasoning program that effectively implements induction will have important applications.
To illustrate the need for mathematical induction, assume that a
property φ is true of the number zero and also that, if it is true of a number, then it is true of its successor. Then, with our deductive systems, we can deduce that for any given number n, φ is true of it, φ(n). But we cannot deduce that φ is true of all numbers, (∀x)φ(x); this inference step requires the rule of mathematical induction:

    φ(0)        [φ(n) --- φ(succ(n))]
    --------------------------------- (mathematical induction)
    (∀x)φ(x)

In other words, to prove that (∀x)φ(x) one proves that φ(0) is the case, and that φ(succ(n)) follows from the assumption that φ(n).
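Proof assistants mechanize precisely this schema. As a small illustration, the following hypothetical Lean 4 proof establishes 0 + n = n for every natural number by explicit induction (the library already provides this fact as Nat.zero_add; it is redone here only to display the base and inductive steps):

theorem zero_add_sketch (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl              -- base case: 0 + 0 = 0
  | succ k ih =>             -- inductive step, with hypothesis ih : 0 + k = k
    rw [Nat.add_succ, ih]    -- 0 + succ k = succ (0 + k) = succ k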
The implementation of induction in a reasoning system presents very
challenging search control problems. The most important of these is
the ability to determine the particular way in which induction will
be applied during the proof, that is, finding the appropriate
induction schema. Related issues include selecting the proper
variable of induction, and recognizing all the possible cases for the
base and the inductive steps.
Nqthm (Boyer and Moore 1979) has been one of the most successful implementations of automated inductive theorem proving. In the spirit of Gentzen, Boyer and Moore were interested in how people prove theorems by induction. Their theorem prover is written in the functional programming language Lisp, which is also the language in which theorems are represented. For instance, to express the commutativity of addition the user would enter the Lisp expression (EQUAL (PLUS X Y) (PLUS Y X)). Everything defined in the system is a functional term, including its basic predicates: T, F, EQUAL X Y, IF X Y Z, AND, NOT, etc. The program operates largely as a black box, that is, the inner working details are hidden from the user; proofs are conducted by rewriting terms that possess recursive definitions, ultimately reducing the conclusion's statement to the T predicate. The Boyer-Moore theorem prover has been used to check the proofs of some quite deep theorems (Boyer, Kaufmann, and Moore 1995). Lemma caching, problem statement generalization, and proof planning are techniques particularly useful in inductive theorem proving (Bundy, Harmelen and Hesketh 1991).
Higher-order logic differs from first-order logic in that
quantification over functions and predicates is allowed. The statement
Any two people are related to each other in one way or
another can be legally expressed in higher-order logic as
(∀x)(∀y)(∃R)R(x,y)
but not in first-order logic. Higher-order logic is inherently more
expressive than first-order logic and is closer in spirit to actual
mathematical reasoning. For example, the notion of set finiteness
cannot be expressed as a first-order concept. Due to its richer
expressiveness, it should not come as a surprise that implementing an
automated theorem prover for higher-order logic is more challenging
than for first-order logic. This is largely due to the fact that
unification in higher-order logic is more complex than in the
first-order case: unifiable terms do not always possess a most general
unifier, and higher-order unification is itself undecidable. Finally,
given that higher-order logic is incomplete, there are always proofs
that will be entirely out of reach for any automated reasoning
program.
Methods used to automate first-order deduction can be adapted to
higher-order logic. TPS (Andrews et al. 1996) is a theorem proving
system for higher-order logic that uses Church's typed
λ-calculus as its logical
representation language and is based on a connection-type deduction
mechanism that incorporates Huet's unification algorithm (Huet
1975). As a sample of the capabilities of TPS, the program has proved
automatically that a subset of a finite set is finite, the
equivalence among several formulations of the Axiom of Choice, and
Cantor's Theorem that a set has more subsets than members. The
latter was proved by the program by asserting that there is no onto
function from individuals to sets of individuals, with the proof
proceeding by a diagonal argument. HOL (Gordon and Melham 1993) is
another higher-order proof development system primarily used as an
aid in the development of hardware and software safety-critical
systems. HOL is based on the LCF approach to interactive theorem
proving (Gordon, Milner and Wadsworth 1979), and it is built on the
strongly typed functional programming language ML. HOL, like TPS, can
operate in automatic and interactive mode. Availability of the latter
mode is welcomed since the most useful automated reasoning systems
may well be those which place an emphasis on interactive theorem
proving (Farmer, Guttman and Thayer 1993) and can be used as
assistants operating under human guidance. Isabelle (Paulson 1994) is
a generic, higher-order, framework for rapid prototyping of deductive
systems. Object logics can be formulated within Isabelle's
metalogic by using its many syntactic and deductive tools. Isabelle
also provides some ready-made theorem proving environments, including
Isabelle/HOL, Isabelle/ZF and Isabelle/FOL, which can be used as
starting points for applications and further development by the
user. Isabelle/ZF has been used to prove equivalent formulations of
the Axiom of Choice, formulations of the Well-Ordering Principle, as
well as the key result about cardinal arithmetic that, for any
infinite cardinal
κ, κ · κ = κ (Paulson and Grabczewski 1996).
Non-classical logics such as modal logics, intuitionistic logic, multi-valued logics, autoepistemic logics, non-monotonic reasoning, commonsense and default reasoning, relevant logic, paraconsistent logic, and so on, have been increasingly gaining the attention of the automated reasoning community. One of the reasons has been the natural desire to extend automated deduction techniques to new domains of logic. Another reason has been the need to mechanize non-classical logics as an attempt to provide a suitable foundation for artificial intelligence. A third reason has been the desire to attack some problems that are combinatorially too large to be handled by paper and pencil. Indeed, some of the work in automated non-classical logic provides a prime example of automated reasoning programs at work. To illustrate, the Ackermann Constant Problem asks for the number of non-equivalent formulas in the relevant logic R. There are actually 3,088 such formulas (Slaney 1984) and the number was found by sandwiching it between a lower and an upper limit, a task that involved constraining a vast universe of 20^400 20-element models in search of those models that rejected non-theorems in R. It is safe to say that this result would have been impossible to obtain without the assistance of an automated reasoning program.
There have been three basic approaches to automate the solving of problems in non-classical logic (McRobie 1991). One approach has been, of course, to try to mechanize the non-classical deductive calculi. Another has been to simply provide an equivalent formulation of the problem in first-order logic and let a classical theorem prover handle it. A third approach has been to formulate the semantics of the non-classical logic in a first-order framework where resolution or connection-matrix methods would apply.
Modal logic. Modal logics find extensive use in computing science as logics of knowledge and belief, logics of programs, and in the specification of distributed and concurrent systems. Thus, a program that automates reasoning in a modal logic such as K, K4, T, S4, or S5 would have important applications. With the exception of S5, these logics share some of the important metatheoretical results of classical logic, such as cut-elimination, and hence cut-free (modal) sequent calculi can be provided for them, along with techniques for their automation. Connection methods (Andrews 1981, Bibel 1981) have played an important role in helping to understand the source of redundancies in the search space induced by these modal sequent calculi and have provided a unifying framework not only for modal logics but also for intuitionistic and classical logic as well (Wallen 1990).
Intuitionistic logic. There are different ways in which intuitionistic logic can be automated. One is to directly implement the intuitionistic versions of Gentzen's sequent and natural deduction calculi, LJ and NJ respectively. This approach inherits the stronger normalization results enjoyed by these calculi, allowing for a more compact mechanization than their classical counterparts. Another approach at mechanizing intuitionistic logic is to exploit its semantic similarities with the modal logic S4 and piggyback on an automated implementation of S4. Automating intuitionistic logic has applications in software development since writing a program that meets a specification corresponds to the problem of proving the specification within an intuitionistic logic (Martin-Löf 1982). A system that automated the proof construction process would have important applications in algorithm design but also in constructive mathematics. Nuprl (Constable et al. 1986) is a computer system supporting a particular mathematical theory, namely constructive type theory, and whose aim is to provide assistance in the proof development process. The focus is on logic-based tools to support programming and on implementing formal computational mathematics. Over the years the scope of the Nuprl project has expanded from proofs-as-programs to systems-as-theories.
Logic programming, particularly represented by the language Prolog (Colmerauer et al. 1973), is probably the most important and widespread application of automated theorem proving. During the early 1970s, it was discovered that logic could be used as a programming language (Kowalski 1974). What distinguishes logic programming from other traditional forms of programming is that logic programs, in order to solve a problem, do not explicitly state how a specific computation is to be performed; instead, a logic program states what the problem is and then delegates the task of actually solving it to an underlying theorem prover. In Prolog, the theorem prover is based on a refinement of resolution known as SLD-resolution. SLD-resolution is a variation of linear input resolution that incorporates a special rule for selecting the next literal to be resolved upon; SLD-resolution also takes into consideration the fact that, in the computer's memory, the literals in a clause are actually ordered, that is, they form a sequence as opposed to a set. A Prolog program consists of clauses stating known facts and rules. For example, the following clauses make some assertions about flight connections:
flight(toronto, london).
flight(london,rome).
flight(chicago,london).
flight(X,Y) :- flight(X,Z) , flight(Z,Y).
The clause flight(toronto, london) is a
fact while flight(X,Y) :-
flight(X,Z) ,
flight(Z,Y) is a rule, written by
convention as a reversed conditional (the symbol :- means
if; the comma means and; terms starting in
uppercase are variables). The former states that there is a flight
connection between Toronto and London; the latter states that there
is a flight between cities X and Y if, for some
city Z, there is a flight between X and Z
and one between Z and Y. Clauses in Prolog programs
are a special type of Horn clauses having precisely one positive
literal: Facts are program clauses with no negative
literals while rules have at least one negative
literal. (Note that in standard clause notation the program rule in
the previous example would be written as
flight(X,Y) ∨ ~flight(X,Z) ∨ ~flight(Z,Y).)
The specific form of the program rules is to effectively express
statements of the form: If these conditions over here are
jointly met then this other fact will follow. Finally, a
goal is a Horn clause with no positive literals. The
idea is that, once a Prolog program Γ has been written, we can then try to determine if a new clause γ, the goal, is entailed by Γ, Γ ⊨ γ; the Prolog prover does this by attempting to derive a contradiction from Γ ∪ {~γ}.
We should remark that program facts and rules alone cannot produce a
contradiction; a goal must enter into the process. Like input
resolution, SLD-resolution is not refutation complete for first-order
logic but it is complete for the Horn logic of Prolog programs. The
fundamental theorem: If Γ is a Prolog program and γ is the goal clause then Γ ⊨ γ iff Γ ∪ {~γ} ⊢ [ ] by SLD-resolution (Lloyd 1984).
For instance, to find out if there is a flight from Toronto to Rome one asks the Prolog prover to see if the clause flight(toronto, rome) follows from the given program. To do this, the prover adds ~flight(toronto, rome) to the program clauses and attempts to derive the empty clause, [ ], by SLD-resolution:
1    flight(toronto,london)                       Program clause
2    flight(london,rome)                          Program clause
3    flight(chicago,london)                       Program clause
4    flight(X,Y) ∨ ~flight(X,Z) ∨ ~flight(Z,Y)    Program clause
5    ~flight(toronto,rome)                        Negation of the conclusion
6    ~flight(toronto,Z) ∨ ~flight(Z,rome)         Res 5 4
7    ~flight(london,rome)                         Res 6 1
8    [ ]                                          Res 7 2
The conditional form of rules in Prolog programs adds to their
readability and also allows reasoning about the underlying
refutations in a more friendly way: To prove that there is a flight
between Toronto and Rome,
flight(toronto,rome), unify this clause
with the consequent flight(X,Y) of the
fourth clause in the program which itself becomes provable if both
flight(toronto,Z) and
flight(Z,rome) can be proved. This can be
seen to be the case under the substitution {Z → london} since both
flight(toronto,london) and
flight(london,rome) are themselves
provable. Note that the theorem prover not only establishes that
there is a flight between Toronto and Rome but it can also come up
with an actual itinerary, Toronto-London-Rome, by extracting it from
the unifications used in the proof.
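In an actual Prolog session the program above could be queried directly; the following transcript is hypothetical (the semicolon asks the interpreter for further answers):

?- flight(toronto, rome).
true.

?- flight(toronto, Y).
Y = london ;
Y = rome ;
...

Asking for yet further answers drives the interpreter back into the recursive rule and, with this program, into an infinite loop -- an instance of the control problem discussed next.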
There are at least two broad problems that Prolog must address in order to achieve the ideal of a logic programming language. Logic programs consist of facts and rules describing what is true; anything that is not provable from a program is deemed to be false. With regard to our previous example, flight(toronto, boston) is not true since this literal cannot be deduced from the program. The identification of falsity with non-provability is further exploited in most Prolog implementations by incorporating an operator, not, that allows programmers to explicitly express the negation of literals (or even subclauses) within a program. By definition, not l succeeds if the literal l itself fails to be deduced. This mechanism, known as negation-by-failure, has been the target of criticism. Negation-by-failure does not fully capture the standard notion of negation and there are significant logical differences between the two. Standard logic, including Horn logic, is monotonic, which means that enlarging an axiom set by adding new axioms simply enlarges the set of theorems derivable from it; negation-by-failure, however, is non-monotonic and the addition of new program clauses to an existing Prolog program may cause some goals to cease to be theorems. A second issue is the control problem. Currently, programmers need to provide a fair amount of control information if a program is to achieve acceptable levels of efficiency. For example, a programmer must be careful with the order in which the clauses are listed within a program, or how the literals are ordered within a clause. Failure to do a proper job can result in an inefficient or, worse, non-terminating program. Programmers must also embed hints within the program clauses to prevent the prover from revisiting certain paths in the search space (by using the cut operator) or to prune them altogether (by using fail). Last but not least, in order to improve their efficiency, many implementations of Prolog do not implement unification fully and bypass a time-consuming yet critical test -- the so-called occurs-check -- responsible for checking the suitability of the unifiers being computed. This results in an unsound calculus and may cause a goal to be entailed by a Prolog program (from a computational point of view) when in fact it should not (from a logical point of view).
There are variations of Prolog intended to extend its scope. By
implementing a model elimination procedure, the Prolog Technology
Theorem Prover (PTTP) (Stickel 1992) extends Prolog into full
first-order logic. The implementation achieves both soundness and
completeness. Moving beyond first-order logic, λProlog (Miller and Nadathur 1988) bases the language on higher-order constructive logic.
Automated reasoning is a growing field that provides a healthy interplay between basic research and application. Automated deduction is being conducted using a multiplicity of theorem-proving methods, including resolution, sequent calculi, natural deduction, matrix connection methods, term rewriting, mathematical induction, and others. These methods are implemented using a variety of logic formalisms such as first-order logic, type theory and higher-order logic, clause and Horn logic, non-classical logics, and so on. Automated reasoning programs are being applied to solve a growing number of problems in formal logic, mathematics and computer science, logic programming, software and hardware verification, circuit design, and many others. One of the results of this variety of formalisms and automated deduction methods has been the proliferation of a large number of theorem proving programs. To test the capabilities of these different programs, selections of problems have been proposed against which their performance can be measured (McCharen, Overbeek and Wos 1976, Pelletier 1986). The TPTP (Sutcliffe and Suttner 1998) is a library of such problems that is updated on a regular basis. There is also a competition among automated theorem provers held regularly at the CADE conference; the problems for the competition are selected from the TPTP library.
Initially, computers were used to aid scientists with their complex and often tedious numerical calculations. The power of the machines was then extended from the numeric into the symbolic domain where infinite-precision computations performed by computer algebra programs have become an everyday affair. The goal of automated reasoning has been to further extend the machine's reach into the realm of deduction, where it can be used as a reasoning assistant in helping its users establish truth through proof.
Frederic Portoraro
frederic@ai.toronto.edu