The Notation in Principia Mathematica

First published Thu Aug 19, 2004; substantive revision Sun Jul 17, 2016

Principia Mathematica [PM] by A.N. Whitehead and Bertrand Russell, published 1910–1913 in three volumes by Cambridge University Press, contains a derivation of large portions of mathematics using notions and principles of symbolic logic. The notation in that work has been superseded by the subsequent development of logic during the 20th century, to the extent that the beginner has trouble reading PM at all. This article provides an introduction to the symbolism of PM, showing how that symbolism can be translated into a more contemporary notation which should be familiar to anyone who has had a first course in symbolic logic. This translation is offered as an aid to learning the original notation, which itself is a subject of scholarly dispute, and embodies substantive logical doctrines so that it cannot simply be replaced by contemporary symbolism. Learning the notation, then, is a first step to learning the distinctive logical doctrines of Principia Mathematica.

1. Why Learn the Symbolism in Principia Mathematica?

Principia Mathematica [PM] was written jointly by Alfred North Whitehead and Bertrand Russell over several years, and published in three volumes, which appeared between 1910 and 1913. It presents a system of symbolic logic and then turns to the foundations of mathematics to carry out the logicist project of defining mathematical notions in terms of logical notions and proving the fundamental axioms of mathematics as theorems of logic. While hugely important in the development of logic, philosophy of mathematics and more broadly of “Early Analytic Philosophy”, the work itself is no longer studied for these topics. As a result the very notation of the work has become alien to contemporary students of logic, and that has become a barrier to the study of Principia Mathematica.

This entry is intended to assist the student of PM in reading the symbolic portion of the work. What follows is a partial translation of the symbolism into a more contemporary notation, which should be familiar from other articles in this Encyclopedia, and which is quite standard in contemporary textbooks of symbolic logic. No complete algorithm is supplied; rather, various suggestions are intended to help the reader learn the symbolism of PM. Many issues of interpretation would be prejudged by only using contemporary notation, and many details that are unique to PM depend on that notation. It will be seen below, with some of the more contentious aspects of the notation, that doctrines of substance are built into the notation of PM. Replacing the notation with a more modern symbolism would drastically alter the very content of the book.

2. Primitive Symbols

Below the reader will find, in the order in which they are introduced in PM, the following symbols, which are briefly described. More detail is provided in what follows:

∗ pronounced “star”; indicates a number, or chapter, as in ∗1, or ∗20.
· a centered dot (an old British decimal point); indicates a numbered sentence in the order by first digit (all the 0s preceding all the 1s etc.), then second digit, and so on. The first definitions and propositions of ∗1 illustrate this “lexicographical” ordering: 1·01, 1·1, 1·11, 1·2, 1·3, 1·4, 1·5, 1·6, 1·7, 1·71, 1·72.
\(\vdash\) the assertion-sign; indicates an assertion, either an axiom (i.e., a primitive proposition, which are also annotated “\(\Pp\)”) or a theorem.
\(\Df\) the definition sign; follows a definition.
\(.\),   \( :\),   \(:.\),   \(::\),  etc. are dots used for delimiting punctuation; in contemporary logic, we use ( ), [ ], \(\{\ \}\), etc.
\(p, q, r\), etc. are propositional variables.
\(\lor\), \(\supset\), \(\osim \), \(\equiv\), \(\sdot\) are the familiar sentential connectives, corresponding to “or”, “if-then”, “not”, “if and only if” and “and”, respectively. [In the Second Edition of PM, 1925–27, the Sheffer Stroke “\(\mid\)” is the one primitive connective. It means “not both ___ and ___”.]
\(x, y, z\), etc. are individual variables, which are to be read with “typical ambiguity”, i.e., with their logical types to be filled in (see below).
\(a, b, c\), etc. are individual constants, and stand for individuals (of the lowest type). These occur only in the Introduction to PM, and not in the official system.
\(xRy, aRb, R(x)\), etc. are atomic predications, in which the objects named by the variables or constants stand in the relation \(R\) or have the property \(R\). These occur only in the Introduction. “\(a\)” and “\(b\)” occur as constants only in the Second Edition. The predications \(R(x), R(x,y)\), etc., are used only in the Second Edition.
\(\phi\), \(\psi\), \(\chi\), etc., and \(f, g\), etc. are variables which range over propositional functions, no matter whether those functions are simple or complex.
\(\phi x\), \(\psi x\), \(\phi(x,y)\), etc. are open atomic formulas in which both “\(x\)” and “\(\phi\)” are free. [An alternative interpretation is to view “\(\phi x\)” as a schematic letter standing for a formula in which the variable “\(x\)” is free.]
\(\hat{\phantom{x}}\) the circumflex; when placed over a variable in an open formula (as in “\(\phi \hat{x}\)”) results in a term for a function. [This matter is controversial. See Landini 1998.] When the circumflected variable precedes a complex variable, the result indicates a class, as in \(\hat{x}\phi x\).
\(\phi\hat{x}, \psi\hat{x}, \phi(\hat{x},\hat{z}),\) etc. Terms for propositional functions. Here are examples of such terms which are constants: “\(\hat{x}\) is happy”, “\(\hat{x}\) is bald and \(\hat{x}\) is happy”, “\(4 \lt \hat{x} \lt 6\)”, etc. If we apply, for example, the function “\(\hat{x}\) is bald and \(\hat{x}\) is happy” to the particular individual \(b\), the result is the proposition “\(b\) is bald and \(b\) is happy”.
\(\exists\) and ( ) are the quantifiers “there exists” and “for all” (“every”), respectively. For example, where \(\phi x\) is a simple or complex open formula,
\((\exists x)\phi x\) asserts “there exists an \(x\) such that \(\phi x\)”
\((\exists \phi)\phi x\) asserts “there exists a propositional function \(\phi\) such that \(\phi x\)”
\((x)\phi x\) asserts “every \(x\) is such that \(\phi x\)”
\((\phi)\phi x\) asserts “every propositional function \(\phi\) is such that \(\phi x\)”

[These were used by Peano. More recently, \(\forall\) has been added for symmetry with \(\exists\). Some scholars see the quantifiers \((\phi)\) and \((\exists \phi)\) as substitutional.]

\(\phi x \supset_x \psi x\)
\(\phi x \equiv_x \psi x\)
This notation abbreviates universally quantified conditionals and biconditionals. In modern notation, these become \(\forall x(\phi x \supset \psi x)\) and \(\forall x(\phi x \equiv \psi x)\), respectively. See the definitions for this notation at the end of Section 3.2 below.
\(\bang\) pronounced “shriek”; indicates that a function is predicative, as in \(\phi \bang x\) or \(\phi\bang \hat{x}\). See Section 7.
= the identity symbol; expresses identity, which is a defined notion in PM, not primitive as in contemporary logic.
\(\atoi\) read as “the”; is the inverted iota or description operator and is used in expressions for definite descriptions, such as \((\atoi x)\phi x\) (which is read: the \(x\) such that \(\phi x\)).
[\((\atoi x)\phi x\)] a definite description in brackets; this is a scope indicator for definite descriptions.
\(E\bang \) is defined at ∗14·02, in the context \(E\bang (\atoi x)\phi x\), to mean that the description \((\atoi x)\phi x\) is proper, i.e., there is exactly one \(\phi\).
\(\exists\bang \) is defined at ∗24·03, in the context \(\exists \bang \alpha\), to mean that the class \(\alpha\) is non-empty, i.e., has a member.
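The “lexicographical” ordering of numbered sentences described above for the centered dot can be mimicked with a short sort key. The following Python sketch is only an illustration; the function name `pm_sort_key` and the sample list are assumptions made for the example, not anything found in PM. It compares chapter numbers as integers and the digits after the dot from left to right, as strings.

```python
def pm_sort_key(label):
    """Sort key for a PM proposition number written like '1·11'."""
    chapter, _, digits = label.partition("·")
    # Chapters compare as integers; the digits after the dot compare
    # left to right as a string, so '01' < '1' < '11' < '2'.
    return (int(chapter), digits)


# The first definitions and propositions of ∗1, deliberately shuffled.
labels = ["1·2", "1·71", "1·01", "1·11", "1·7", "1·1", "1·72", "1·3"]
ordered = sorted(labels, key=pm_sort_key)
print(ordered)
```

Comparing the digits after the dot as whole numbers instead would wrongly place 1·2 before 1·11; comparing them digit by digit gives the order found in PM.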

3. The Use of Dots for Punctuation

An immediate obstacle to reading PM is the unfamiliar use of dots for punctuation, instead of the more common parentheses and brackets. The system is precise, and can be learned with just a little practice. The use of dots for punctuation is not unique to PM. Originating with Peano, it was later used in works by Alonzo Church, W.V.O. Quine, and others, but it has now largely disappeared. (The use of dots is of some historical interest, as Alan Turing made a study of the use of dots from a computational point of view in 1942, presumably in his spare time after a day's work at Bletchley Park breaking the codes of the Enigma Machine.) The best way to learn to use it is to look at a few samples which are translated to formulae using parentheses, and thus to get the feel for it. What follows is an explanation as presented in PM, pages 9–10, followed by a number of examples which illustrate each of its clauses:

The use of dots. Dots on the line of the symbols have two uses, one to bracket off propositions, the other to indicate the logical product of two propositions. Dots immediately preceded or followed by “\(\lor\)” or “\(\supset\)” or “\(\equiv\)” or “\(\vdash\)”, or by “\((x)\)”, “\((x,y)\)”, “\((x,y,z)\)” … or “\((\exists x)\)”, “\((\exists x,y)\)”, “\((\exists x,y,z)\)” … or “\([(\atoi x)(\phi x)]\)” or “\([R‘y]\)” or analogous expressions, serve to bracket off a proposition; dots occurring otherwise serve to mark a logical product. The general principle is that a larger number of dots indicates an outside bracket, a smaller number indicates an inside bracket. The exact rule as to the scope of the bracket indicated by dots is arrived at by dividing the occurrences of dots into three groups which we will name I, II, and III. Group I consists of dots adjoining a sign of implication \((\supset)\) or equivalence \((\equiv)\) or of disjunction \((\lor)\) or of equality by definition \((=\Df)\). Group II consists of dots following brackets indicative of an apparent variable, such as \((x)\) or \((x,y)\) or \((\exists x)\) or \((\exists x,y)\) or \([(\atoi x)(\phi x)]\) or analogous expressions. Group III consists of dots which stand between propositions in order to indicate a logical product. Group I is of greater force than Group II, and Group II than Group III. The scope of the bracket indicated by any collection of dots extends backwards or forwards beyond any smaller number of dots, or any equal number from a group of less force, until we reach either the end of the asserted proposition or a greater number of dots or an equal number belonging to a group of equal or superior force. Dots indicating a logical product have a scope which works both backwards and forwards; other dots only work away from the adjacent sign of disjunction, implication, or equivalence, or forward from the adjacent symbol of one of the other kinds enumerated in Group II.
Some examples will serve to illustrate the use of dots. (PM, 9–10)

3.1 Some Basic Examples

Consider the following series of extended examples, in which we examine propositions in PM and then discuss how to translate them step by step into modern notation. (Symbols below are sometimes used as names for themselves, thus avoiding some otherwise needed quotation marks. Russell is often accused of confusing use and mention, so there may well be some danger in this practice.)

Example 1

\[\tag*{∗1·2} {\vdash} \colon p \lor p \ldot {\supset} \ldot p \quad\Pp \]

This is the second assertion of “star” 1. It is in fact an axiom or “Primitive Proposition” as indicated by the “\(\Pp\)”. That this is an assertion (axiom or theorem) and not a definition is indicated by the use of “\(\vdash\)”. (By contrast, a definition would omit the assertion sign but conclude with a “\(\Df\)” sign.) Now the first step in the process of translating ∗1·2 into modern notation is to note the colon. Recall, from the above quoted passage, that “a larger number of dots indicates an outside bracket, a smaller number indicates an inside bracket”. Thus, the colon here (which consists of a larger number of dots than the single dots occurring on the line in ∗1·2) represents an outside bracket. So, the first step is to translate ∗1·2 to:

\[ \vdash[ p \lor p \ldot {\supset} \ldot p] \]

So the brackets “[” and “]” represent the colon in ∗1·2. The scope of the colon thus extends past any smaller number of dots (i.e., one dot) to the end of the formula. Since formulas are read from left to right the expression “past” means “to the right of”.

Next, the dots around the “\(\supset\)” are represented in modern notation by the parentheses around the antecedent and consequent. Recall, in the above passage, we find “… dots only work away from the adjacent sign of disjunction, implication, or equivalence …”. Thus, the next step in the translation process is to move to the formula: \[ \vdash [(p \lor p) \supset(p)] \]

Finally, standard modern conventions allow us to delete the outer brackets and the parentheses around single letters, yielding:

\[ \vdash(p \lor p) \supset p \]

Our next example involves conjunction, which is indicated by simple juxtaposition of atomic sentences, or with a dot when a substitution instance might be considered, as in the definition of conjunction in the following:

Example 2

\[ \tag*{∗3·01} p \sdot q \ldot {=} \ldot \osim(\osim p \lor \osim q) \quad\Df \]

Here we have a case in which dots indicate both a “logical product” (i.e., conjunction) and delimiting brackets. As a first step in translating ∗3·01 into modern notation, we replace the first dot by an ampersand (and its corresponding scope delimiters) and replace “\(\ldot {=} \ldot\)” by “\(=_{df}\)”, to yield:

\[ (p \amp q) =_{df} [\osim (\osim p \lor \osim q)] \]

The above step clearly illustrates how a “dot indicating a logical product has a scope which works both backwards and forwards”. Note that the first dot in ∗3·01, i.e., between the \(p\) and \(q\), is really optional, given the above quotation from PM. However, since we may sometimes want to substitute entire formulas for \(p\) and \(q\), the dot indicates the extent of the substituted formulas. Thus, we might have, as a substitution instance: \(r \lor s \sdot q \supset s\) (in PM notation) or \((r \lor s) \amp(q \supset s)\) (in contemporary symbols).

Finally, our modern conventions allow us to eliminate the outer parentheses from the definiendum and the brackets “[” and “]” from the definiens, yielding:

\[ p \amp q =_{df} \osim (\osim p \lor \osim q) \]

Notice that the scope of the negation sign “\(\osim \)” in ∗3·01 is not indicated with dots, even in the PM system, but rather requires parentheses.

Example 3

\[ \tag*{∗9·01} \osim \{(x) \sdot \phi x\} \ldot {=} \ldot (\exists x) \sdot \osim \phi x \quad\Df \]

If we apply the rule “dots only work away from the adjacent sign of disjunction, implication, or equivalence, or forward from the adjacent symbol of one of the other kinds enumerated in Group II” (where Group II includes “\((\exists x)\)”), then the modern equivalent would be: \[ \osim (x)\phi x =_{df} (\exists x)\osim \phi x \] or \[ \osim \forall x\phi x =_{df} \exists x\osim \phi x \]

3.2 The Force of Connectives

The ranking of connectives in terms of relative “force”, or scope, is a standard convention in contemporary logic. If there are no explicit parentheses to indicate the scope of a connective, those which have precedence in the ranking are presumed to be the principal connective, and so on for subformulas. Thus, instead of formulating the following De Morgan’s law as the cumbersome:

\[ [(\osim p) \lor (\osim q)] \equiv[\osim (p \amp q)] \]

we nowadays write it as:

\[ \osim p \lor \osim q \equiv\osim (p \amp q) \]

This simpler formulation is natural because \(\equiv\) takes precedence over (has wider “scope” than) \(\lor\) and \(\amp\), and the latter take precedence over \(\osim \). Indeed parentheses are often unneeded around \(\equiv\), given a further convention on which \(\equiv\) takes precedence over \(\supset\). Thus, the formula \(p \supset q \equiv\osim p\lor q\) becomes unambiguous. We might represent these conventions by listing the connectives in groups with those with widest scope at the top:

\[\begin{array}{c} \equiv \\ \supset \\ \amp, \lor \\ \osim \end{array}\]

For Whitehead and Russell, however, the symbols \(\supset\), \(\equiv\), \(\lor\) and \(\ldots =\ldots \Df\), in Group I, are of equal force. Group II consists of the variable binding expressions, quantifiers and scope indicators for definite descriptions, and Group III consists of conjunctions. Negation is below all of these. So the ranking in PM would be:

\[\begin{array}{c} \supset, \equiv, \lor \text{ and } \ldots =\ldots \quad\Df \\ (x), (x,y) \ldots (\exists x), (\exists x,y) \ldots [(\atoi x)\phi x] \\ p \sdot q \quad \text{(conjunction)} \\ \osim \end{array}\]

This is what Whitehead and Russell seem to mean when they say “Group I is of greater force than Group II, and Group II than Group III.” Consider the following:

Example 4

\[ \tag*{∗3·12} {\vdash} \colon \osim p \ldot {\lor} \ldot \osim q \ldot {\lor} \ldot p \sdot q \]

This theorem illustrates how to read multiple uses of the same number of dots within one formula. Grouping “associates to the left” both for dots and for a series of disjunctions, following the convention of reading from left to right and the definition:

\[ \tag*{∗2·33} p \vee q \vee r \ldot {=} \ldot (p \vee q) \vee r \quad\Df \]

So, in ∗3·12, the first two dots around the \(\lor\) simply “work away” from the connective. The second “extends” until it meets with the next of the same number (the third single dot). That third dot and the fourth “work away” from the second \(\lor\), and the final dot indicates a conjunction with narrowest scope. The result, formulated with all possible punctuation for maximum explicitness, is:

\[ \{[(\osim p) \lor (\osim q)] \lor (p \amp q)\} \]

If we employ all the standard conventions for dropping parentheses, this becomes:

\[ (\osim p \lor \osim q) \lor (p \amp q) \]

This illustrates the passage in the above quotation which says “The scope of the bracket indicated by any collection of dots extends backwards or forwards beyond any smaller number of dots, or any equal number from a group of less force, until we reach either the end of the asserted proposition or a greater number of dots or an equal number belonging to a group of equal or superior force.”

Before we look at a wider range of examples, a detailed example involving quantified variables will prove to be instructive. Whitehead and Russell follow Peano’s practice of expressing universally quantified conditionals (such as “All \(\phi\)s are \(\psi\)s”) with the bound variable subscripted under the conditional sign. Similarly with universally quantified biconditionals (“All and only \(\phi\)s are \(\psi\)s”). That is, the expressions “\(\phi x \supset_x \psi x\)” and “\(\phi x \equiv_x \psi x\)” are defined as follows:

\[ \tag*{∗10·02} \phi x \supset_x \psi x \ldot {=} \ldot (x) \ldot \phi x \supset \psi x \quad\Df \] \[ \tag*{∗10·03} \phi x \equiv_x \psi x \ldot {=} \ldot (x) \ldot \phi x \equiv \psi x \quad\Df \]

and correspond to the following more modern formulas, respectively:

\[ \forall x(\phi x \supset \psi x) \] \[ \forall x(\phi x \equiv \psi x) \]

As an exercise the reader might be inclined to formulate a rigorous algorithm for converting PM into a particular contemporary symbolism (with conventions for dropping parentheses), but the best way to learn the system is to look over a few more examples of translations, and then simply begin to read formulae directly.
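No full algorithm is attempted here, but the rules for Group I and Group III dots can be sketched for the propositional fragment alone. The following Python sketch is an illustration under stated assumptions, not a method given in PM: it uses ASCII stand-ins (`v` for \(\lor\), `>` for \(\supset\), `=` for \(\equiv\), `~` for \(\osim\)), types dots as runs of `.` and `:`, omits the assertion sign and “Pp”/“Df” annotations, ignores Group II (quantifiers and descriptions) entirely, and resolves ties between equally dotted Group I connectives by associating to the left, as in Example 4. The names `dots`, `translate` and `pm_to_modern` are hypothetical.

```python
import re

# ASCII stand-ins assumed by this sketch: v = ∨, > = ⊃, = is ≡, ~ is negation.
# Dot punctuation is typed as runs of '.' and ':' (so ':.' is three dots).
TOKEN = re.compile(r"[.:]+|[a-uw-z]|[v>=~()]")
BINARY = {"v": "∨", ">": "⊃", "=": "≡"}


def dots(token):
    """Count the dots in a punctuation token; other tokens count as 0."""
    if token[0] in ".:":
        return token.count(".") + 2 * token.count(":")
    return 0


def translate(tokens):
    """Return a fully parenthesized rendering of a list of PM tokens."""
    # Dots at either edge only marked the boundary of this subformula.
    while tokens and dots(tokens[0]):
        tokens = tokens[1:]
    while tokens and dots(tokens[-1]):
        tokens = tokens[:-1]
    if not tokens or tokens[0] in BINARY or tokens[-1] in BINARY:
        raise ValueError("not a well-formed formula")

    candidates = []  # entries are (number of dots, group force, position)
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif depth > 0:
            continue  # material inside parentheses is handled recursively
        elif tok in BINARY:
            # Group I: a connective is as strong as its larger dot group.
            strength = max(dots(tokens[i - 1]), dots(tokens[i + 1]))
            candidates.append((strength, 1, i))
        elif (dots(tok) and tokens[i - 1] not in BINARY
              and tokens[i + 1] not in BINARY):
            # Group III: a dot between two propositions is a conjunction.
            candidates.append((dots(tok), 0, i))

    if candidates:
        # Most dots wins; Group I beats Group III on a tie; among equals
        # the rightmost is principal, which associates to the left.
        _, is_connective, i = max(candidates)
        op = BINARY[tokens[i]] if is_connective else "&"
        return f"({translate(tokens[:i])} {op} {translate(tokens[i + 1:])})"
    if tokens[0] == "~":
        return "~" + translate(tokens[1:])
    if tokens[0] == "(" and tokens[-1] == ")":
        return translate(tokens[1:-1])
    if len(tokens) == 1 and tokens[0].isalpha():
        return tokens[0]
    raise ValueError("cannot parse: " + " ".join(tokens))


def pm_to_modern(text):
    """Translate a PM propositional formula typed with the ASCII stand-ins."""
    return translate(TOKEN.findall(text))


print(pm_to_modern(": p v p . > . p"))
print(pm_to_modern(": ~p . v . ~q . v . p . q"))
```

On ∗1·2 and ∗3·12 this yields `((p ∨ p) ⊃ p)` and `((~p ∨ ~q) ∨ (p & q))`, matching the fully explicit forms in Examples 1 and 4. The sketch assumes well-formed input and makes no attempt to handle the cases where PM's own conventions differ from simple left association.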

3.3 More Examples

In the examples below, each formula number is followed first by Principia notation and then its modern translation. Notice that in ∗1·5 parentheses are used for punctuation in addition to dots. (Primitive Propositions ∗1·2, ∗1·3, ∗1·4, ∗1·5, and ∗1·6 together constitute the axioms for propositional logic in PM.) Proposition ∗1·5 was shown to be redundant by Paul Bernays in 1926. It can be derived from appropriate instances of the others and the rule of modus ponens.

∗1·3 \({\vdash} \colon q \ldot {\supset} \ldot p \lor q \quad\Pp\)
\(q \supset p \lor q\)
∗1·4 \({\vdash} \colon p \lor q \ldot {\supset} \ldot q \lor p \quad\Pp\)
\(p \lor q \supset q \lor p\)
∗1·5 \({\vdash} \colon p \lor (q \lor r ) \ldot {\supset} \ldot q \lor (p \lor r ) \quad\Pp\)
\(p \lor (q \lor r ) \supset q \lor (p \lor r )\)
∗1·6 \({\vdash} \colondot q \supset r \ldot {\supset} \colon p \lor q \ldot {\supset} \ldot p \lor r \quad\Pp\)
\((q \supset r ) \supset(p \lor q \supset p \lor r )\)
∗2·03 \({\vdash} \colon p \supset \osim q \ldot {\supset} \ldot q \supset\osim p \)
\((p \supset\osim q) \supset(q \supset\osim p)\)
∗3·3 \({\vdash} \colondot p \sdot q \ldot {\supset} \ldot r \colon {\supset} \colon p \ldot {\supset} \ldot q \supset r\)
\([(p \amp q) \supset r] \supset [p \supset(q \supset r)]\)
∗4·15 \({\vdash} \colondot p \sdot q \ldot {\supset} \ldot \osim r \colon {\equiv} \colon q \sdot r \ldot {\supset} \ldot \osim p\)
\(p \amp q \supset\osim r \equiv q \amp r \supset\osim p\)
∗5·71 \({\vdash} \colondot q \supset\osim r \ldot {\supset} \colon p \lor q \sdot r \ldot {\equiv} \ldot p \sdot r\)
\((q \supset\osim r) \supset [(p \lor q) \amp r \equiv p \amp r]\)
∗9·04 \( p \ldot {\lor} \ldot (x) \ldot \phi x \colon {=} \ldot (x) \ldot \phi x \lor p \quad\Df\)
\(p \lor \forall x\phi x =_{df} \forall x(\phi x \lor p)\)
∗9·521 \({\vdash} \colons (\exists x) \ldot \phi x \ldot {\supset} \ldot q \colon {\supset} \colondot (\exists x) \ldot \phi x \ldot {\lor} \ldot r \colon {\supset} \ldot q \lor r\)
\([(\exists x\phi x) \supset q] \supset [((\exists x\phi x) \lor r) \supset (q \lor r)]\)
∗10·55 \({\vdash} \colondot (\exists x) \ldot \phi x \sdot \psi x \colon \phi x \supset_x \psi x \colon {\equiv} \colon (\exists x) \ldot \phi x \colon \phi x \supset_x \psi x\)
\(\exists x(\phi x \amp \psi x) \amp \forall x(\phi x \supset \psi x) \equiv \exists x\phi x \amp \forall x(\phi x \supset \psi x)\)
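One way to gain confidence in a translation of a propositional formula is to check that the modern version is a tautology. The Python sketch below does this by brute-force truth tables for ∗1·2 (from Example 1) and for the purely propositional entries in the list above. The helper names `implies`, `iff` and `is_tautology`, and the dictionary `theorems`, are assumptions made for the example.

```python
from itertools import product


def implies(a, b):
    """Material implication, the modern reading of ⊃."""
    return (not a) or b


def iff(a, b):
    """Material equivalence, the modern reading of ≡."""
    return a == b


# Modern translations, written as Python functions of p, q and r.
theorems = {
    "*1.2": lambda p, q, r: implies(p or p, p),
    "*1.3": lambda p, q, r: implies(q, p or q),
    "*1.4": lambda p, q, r: implies(p or q, q or p),
    "*1.5": lambda p, q, r: implies(p or (q or r), q or (p or r)),
    "*1.6": lambda p, q, r: implies(implies(q, r),
                                    implies(p or q, p or r)),
    "*2.03": lambda p, q, r: implies(implies(p, not q),
                                     implies(q, not p)),
    "*3.3": lambda p, q, r: implies(implies(p and q, r),
                                    implies(p, implies(q, r))),
    "*4.15": lambda p, q, r: iff(implies(p and q, not r),
                                 implies(q and r, not p)),
    "*5.71": lambda p, q, r: implies(implies(q, not r),
                                     iff((p or q) and r, p and r)),
}


def is_tautology(formula):
    """True if the formula holds under all eight assignments to p, q, r."""
    return all(formula(*row) for row in product([True, False], repeat=3))


results = {name: is_tautology(f) for name, f in theorems.items()}
print(results)
```

A check of this kind confirms only that a translation is a theorem of classical propositional logic; it cannot show that the parentheses were placed where the dots of PM require them.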

4. Propositional Functions

There are two kinds of functions in PM. Propositional functions such as “\(\hat{x}\) is a natural number” are to be distinguished from the more familiar mathematical functions, which are called “descriptive functions” (PM, 31). Descriptive functions are defined using relations and definite descriptions. Examples of descriptive functions are \(x + y\) and “the successor of \(n\)”.

Focusing on propositional functions, Whitehead and Russell distinguish between expressions with a free variable (such as “\(x\) is hurt”) and names of functions (such as “\(\hat{x}\) is hurt”) (PM, 14–15). The propositions which result from the formula by assigning allowable values to the free variable “\(x\)” are said to be the “ambiguous values” of the function. Expressions using the circumflex notation, such as \(\phi \hat{x}\), only occur in the introductory material in the technical sections of PM and not in the technical sections themselves (with the exception of the sections on the theory of classes), prompting some scholars to say that such expressions do not really occur in the formal system of PM. This issue is distinct from that surrounding the interpretation of such symbols. Are they “term-forming operators” which turn an open formula into a name for a function, or simply a syntactic device, a placeholder, for indicating the variable for which a substitution can be made in an open formula? If they are to be treated as term-forming operators, the modern notation for \(\phi \hat{x}\) would be “\(\lambda x\phi x\)”. The \(\lambda\)-notation has the advantage of clearly revealing that the variable \(x\) is bound by the term-forming operator \(\lambda\), which takes a predicate \(\phi\) and yields a term \(\lambda x\phi x\) (which in some logics is a singular term that can occur in the subject position of a sentence, while in other logics is a complex predicative expression). Unlike \(\lambda\)-notation, the PM notation using the circumflex cannot indicate scope. The function expression “\(\phi(\hat{x},\hat{y})\)” is ambiguous between “\(\lambda x\lambda y\phi xy\)” and “\(\lambda y\lambda x\phi xy\)”, without some further convention. Indeed, Whitehead and Russell specified this convention for relations in extension (on p. 200 in the introductory material of ∗21, in terms of the order of the variables), but the ambiguity is brought out most clearly by using \(\lambda\)-notation: the first denotes the relation of being an \(x\) and \(y\) such that \(\phi xy\) and the second denotes the converse relation of being a \(y\) and \(x\) such that \(\phi xy\).
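The difference between the two \(\lambda\)-terms can be made concrete with nested functions in Python. This is a sketch only: the relation “less than” is a hypothetical stand-in for \(\phi xy\), and the names `phi`, `relation` and `converse` are assumptions made for the example.

```python
# Hypothetical two-place matrix standing in for φxy: "x is less than y".
def phi(x, y):
    return x < y


# λxλy φxy: the first argument supplied fills the x position.
relation = lambda x: lambda y: phi(x, y)

# λyλx φxy: the first argument supplied fills the y position.
converse = lambda y: lambda x: phi(x, y)

print(relation(1)(2))  # evaluates phi(1, 2)
print(converse(1)(2))  # evaluates phi(2, 1)
```

Applied to the same arguments in the same order, the two terms evaluate the matrix with its positions exchanged, which is exactly the difference between a relation and its converse.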

5. The Missing Notation for Types and Orders

This section explains notation that is not in Principia Mathematica. Except for some notation for “relative” types in Volume II, there are famously no symbols for types in Principia Mathematica! Sentences are generally to be taken as “typically ambiguous”, and so as standing for expressions of a whole range of types. Just as there are no individual or predicate constants, there are no particular functions of any specific type. So not only does one not see how to symbolize the argument:

All men are mortal
Socrates is a man
Therefore, Socrates is mortal

but also there is no indication of the logical type of the function “\(\hat{x}\) is mortal”. The project of PM is to reduce mathematics to logic, and part of the view of logic behind this project is that logical truths are all completely general. The derivation of truths of mathematics from definitions and truths of logic will thus not involve any particular constants other than those introduced by definition from purely logical notions. As a result no notation is included in PM for describing those types. Those of us who wish to consider PM as a logic which can be applied must supplement it with some indication of types.

Readers should note that the explanation of types outlined below is not going to correspond with the statements about types in the text of PM. Alonzo Church [1976] developed a simple, rational reconstruction of the notation for both the simple and ramified theory of types as implied by the text of PM. (There are alternative, equivalent notations for the theory of types.) The full theory can be seen as a development of the simple theory of types.

5.1 Simple Types

A definition of the simple types can be given as follows:

  • \(\iota\) (Greek iota) is the type for an individual.
  • Where \(\tau_1,\ldots,\tau_n\) are any types, then \(\ulcorner(\tau_1,\ldots,\tau_n)\urcorner\) is the type of a propositional function whose arguments are of types \(\tau_1,\ldots,\tau_n\), respectively.
  • \(\ulcorner\)( )\(\urcorner\) is the type of propositions.

Here are some intuitive ways to understand the definition of type. Suppose that “Socrates” names an individual. (We are here ignoring Russell’s considered opinion that such ordinary individuals are in fact classes of classes of sense data, and so of a much higher type.) Then the individual constant “Socrates” would be of type \(\iota\). A monadic propositional function which takes individuals as arguments is of type \((\iota)\). Suppose that “is mortal” is a predicate expressing such a function. The function “\(\hat{x}\) is mortal” will also be of type \((\iota)\). A two-place or binary relation between individuals is of type \((\iota,\iota)\). Thus, a relation expression like “parent of” and the function “\(\hat{x}\) is a parent of \(\hat{z}\)” will be of type \((\iota,\iota)\).

Propositional functions of type \((\iota)\) are often called “first order”; hence the name “first order logic” for the familiar logic where the variables only range over arguments of first order functions. A monadic function of arguments of type \(\tau\) is of type \((\tau)\), and so functions of such functions are of type \(((\tau))\). “Second order logic” will have variables for the arguments of such functions (as well as variables for individuals). Binary relations between functions of type \(\tau\) are of type \((\tau,\tau)\), and so on for relations having more than two arguments. Mixed types are defined by the above. A relation between an individual and a proposition (such as “\(\hat{x}\) believes that \(\hat{P}\)”) will be of type \((\iota,(\ ))\).
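The recursive definition of simple types lends itself to a direct encoding. In the Python sketch below, which is an illustration and not part of Church's presentation, the string `"ι"` stands for the type of individuals and a tuple of types stands for the type of a propositional function taking arguments of those types, so the empty tuple is the type of propositions. The names `IOTA`, `function_of`, `is_simple_type` and `show` are hypothetical.

```python
IOTA = "ι"  # the type of individuals


def function_of(*argument_types):
    """Type of a propositional function with arguments of the given types."""
    return tuple(argument_types)


def is_simple_type(t):
    """True if t is ι or a (possibly empty) tuple of simple types."""
    if t == IOTA:
        return True
    return isinstance(t, tuple) and all(is_simple_type(arg) for arg in t)


def show(t):
    """Render a type in the notation used above, e.g. (ι,ι) or ((ι))."""
    if t == IOTA:
        return IOTA
    return "(" + ",".join(show(arg) for arg in t) + ")"


PROPOSITION = function_of()                # type of propositions
MORTAL = function_of(IOTA)                 # "x is mortal"
PARENT_OF = function_of(IOTA, IOTA)        # "x is a parent of z"
OF_FUNCTIONS = function_of(MORTAL)         # a function of such functions
BELIEVES = function_of(IOTA, PROPOSITION)  # "x believes that P"

for t in (PROPOSITION, MORTAL, PARENT_OF, OF_FUNCTIONS, BELIEVES):
    print(show(t))
```

The rendering prints the type of propositions as `()` without the inner space used in the text; otherwise the output follows the notation of this section.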

5.2 Ramified Types

To construct a notation for the full ramified theory of types of PM, another piece of information must be encoded in the symbols. Church calls the resulting system one of r-types. The key idea of ramified types is that any function defined using quantification over functions of some given type has to be of a higher “order” than those functions. To use Russell’s example:

\(\hat{x}\) has all the qualities that great generals have

is a function true of persons (i.e., individuals), and from the point of view of simple type theory, it has the same simple logical type as particular qualities of individuals (such as bravery and decisiveness). However, in ramified type theory, the above function will be of a higher order than those particular qualities of individuals, since unlike those particular qualities, it involves a quantification over those qualities. So, whereas the expression “\(\hat{x}\) is brave” denotes a function of r-type \((\iota)/1\), the expression “\(\hat{x}\) has all the qualities that great generals have” will have r-type \((\iota)/2\). In these r-types, the number after the “/” indicates the level of the function. The order of the functions will be defined and computed given the following definitions.

Church defines the r-types as follows:

  • \(\iota\) (Greek iota) is the r-type for an individual.
  • Where \(\tau_1,\ldots,\tau_m\) are any r-types, \(\ulcorner(\tau_1,\ldots,\tau_m)/n\urcorner\) is an r-type; this is the r-type of an \(m\)-ary propositional function of level \(n\), which has arguments of r-types \(\tau_1,\ldots,\tau_m\).

The order of an entity is defined as follows (here we no longer follow Church, for he defines orders for variables, i.e., expressions, instead of orders for the things the variables range over):

  • the order of an individual (of r-type \(\iota\)) is 0,
  • the order of a function of r-type \((\tau_1,\ldots,\tau_m)/n\) is \(n+N\), where \(N\) is the greatest of the orders of the arguments \(\tau_1,\ldots,\tau_m\).

These two definitions are supplemented with a principle which identifies the levels of particular defined functions, namely, that the level of a defined function should be one higher than the highest order entity having a name or variable that appears in the definition of that function.

To see how these definitions and principles can be used to compute the order of the function “\(\hat{x}\) has all the qualities that great generals have”, note that the function can be represented as follows, where “\(x, y\)” are variables ranging over individuals of r-type \(\iota\) (order 0), “GreatGeneral\((y)\)” is a predicate denoting a propositional function of r-type \((\iota)/1\) (and so of order 1), and “\(\phi\)” is a variable ranging over propositional functions of r-type \((\iota)/1\) (and so of order 1) such as great general, bravery, leadership, skill, foresight, etc.:

\[ (\phi)\{[(y)(\textrm{GreatGeneral}(y) \supset \phi(y))] \supset \phi \hat{x} \} \]

We first note that given the above principle, the r-type of this function is \((\iota)/2\); the level is 2 because the level of the r-type of this function has to be one higher than the highest order of any entity named (or in the range of a variable used) in the definition. In this case, the denotation of GreatGeneral, and the range of the variable “\(\phi\)”, is of order 1, and no other expression names or ranges over an entity of higher order. Thus, the level of the function named above is defined to be 2. Finally, we compute the order of the function denoted above as it was defined: the sum of the level plus the greatest of the orders of the arguments of the above function. Since the only arguments in the above function are individuals (of order 0), the order of our function is just 2.

Quantifying over functions of r-type \((\tau)/n\) of order \(k\) in a definition of a new function yields a function of r-type \((\tau)/n+1\), and so a function of order one higher, \(k+1\). Two kinds of functions, then, can be of the second order: (1) functions of first-order functions of individuals, of r-type \(((\iota)/1)/1\), and (2) functions of r-type \((\iota)/2\), such as our example “\(\hat{x}\) has all the qualities that great generals have”. This latter will be a function true of individuals such as Napoleon, but of a higher order than simple functions such as “\(\hat{x}\) is brave”, which are of r-type \((\iota)/1\).
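The computation of levels and orders described above can be traced in a few lines of code. The Python sketch below is an illustration only; the class names `Individual` and `RType` and the function `order` are assumptions made for the example, and treating a function with no arguments as having \(N = 0\) is a further assumption not stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """The r-type ι of individuals."""


@dataclass(frozen=True)
class RType:
    """The r-type (τ1,...,τm)/n of an m-ary function of level n."""
    args: tuple
    level: int


def order(t):
    """Order of an entity of r-type t: 0 for individuals, else n + N."""
    if isinstance(t, Individual):
        return 0
    # N is the greatest order among the arguments (assumed 0 if none).
    highest = max((order(arg) for arg in t.args), default=0)
    return t.level + highest


IOTA = Individual()
brave = RType((IOTA,), 1)  # "x is brave": (ι)/1

# Entities named or ranged over in the definition of "x has all the
# qualities that great generals have": the individuals x and y,
# GreatGeneral, and the range of the variable φ.
mentioned = [order(IOTA), order(brave), order(brave)]
level = 1 + max(mentioned)  # one higher than the highest order

all_qualities = RType((IOTA,), level)  # (ι)/2
of_functions = RType((brave,), 1)      # ((ι)/1)/1

print(order(brave), order(all_qualities), order(of_functions))
```

Both `all_qualities` and `of_functions` come out as second-order, the first because of its level and the second because of the order of its argument, which are the two kinds of second-order function distinguished above.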

Logicians today use a different notion of “order”. Today, first-order logic is a logic with only variables for individuals. Second-order logic is a logic with variables for both individuals and properties of individuals. Third-order logic is a logic with variables for individuals, properties of individuals, and properties of properties of individuals. And so forth. By contrast, Church would call these logics, respectively, the logic of functions of the types \((\iota)/1\) and \((\iota,\ldots,\iota)/1\), the logic of functions of the types \(((\iota)/1)/1\) and \(((\iota,\ldots,\iota)/1,\ldots,(\iota,\ldots,\iota)/1)/1\), and the logic of functions of the types \((((\iota)/1)/1)/1\) etc. (i.e., the level-one functions of the functions of the preceding type). Given Church’s definitions, these are logics of first-, second- and third-order functions, respectively, thus coinciding with the modern terminology of “\(n\)th-order logic”.

6. Variables

As mentioned previously, there are no individual or predicate constants in the formal system of PM, only variables. The Introduction, however, makes use of the example “\(a\) standing in the relation \(R\) to \(b\)” in a discussion of atomic facts (PM, 43). Although “\(R\)” is later used as a variable that ranges over relations in extension, and “\(a,b,c,\ldots\)” are individual variables, let us temporarily add them to the system as predicate and individual constants, respectively, in order to discuss the use of variables in PM.

PM makes special use of the distinction between “real”, or free, variables and “apparent”, or bound, variables. Since “\(x\)” is a variable, “\(xRy\)” will be an atomic formula in our extended language, with “\(x\)” and “\(y\)” real variables. When such formulae are combined with the propositional connectives \(\osim\), \(\lor\), etc., the result is a matrix. For example, “\(aRx \ldot {\lor} \ldot xRy\)” would be a matrix.

As we saw earlier, there are also variables which range over functions: “\(\phi\), \(\psi\), \(\ldots,f, g\)”, etc. The expression “\(\phi x\)” thus contains two variables and stands for a proposition, in particular, the result of applying the function \(\phi\) to the individual \(x\).

Theorems are stated with real variables, which gives them a special significance with regard to the theory. For example,

\[ \tag*{∗10·1} \vdash \colon (x) \ldot \phi x \ldot {\supset} \ldot \phi y \quad\Pp \]

is a fundamental axiom of the quantificational theory of PM. In this Primitive Proposition the variables “\(\phi\)” and “\(y\)” are real (free), and the “\(x\)” is apparent (bound). As there are no constants in the system, this is the closest that PM comes to a rule of universal instantiation.

Whitehead and Russell interpret “\((x) \sdot \phi x\)” as “the proposition which asserts all the values for \(\phi \hat{x}\)” (PM 41). The use of the word “all” has special significance within the theory of types. They present the “vicious circle principle”, which underlies the theory of types, as asserting that

… generally, given any set of objects such that, if we suppose the set to have a total, it will contain members which presuppose this total, then such a set cannot have a total. By saying that the set has “no total”, we mean, primarily, that no significant statement can be made about “all its members”. (PM, 37)

Specifically, then, a quantified expression, since it talks about “all” the members of a totality, must range over a specific logical type in order to observe the vicious circle principle. Thus, when interpreting a bound variable, we must assume that it ranges over a specific type of entity, and so types must be assigned to the other entities represented by expressions in the formula, in accordance with the theory of types.

A question arises, however, once one realizes that the statements of primitive propositions and theorems in PM such as ∗10·1 are taken to be “typically ambiguous” (i.e., ambiguous with respect to type). These statements are actually schematic and represent all the possible specific assertions which can be derived from them by interpreting types appropriately. But if statements like ∗10·1 are schemata and yet have bound variables, how do we assign types to the entities over which the bound variables range? The answer is to first decide which type of thing the free variables in the statement range over. For example, assuming that the variable \(y\) in ∗10·1 ranges over individuals (of type \(\iota)\), then the variable \(\phi\) must range over functions of type \((\iota)/n\), for some \(n\). Then the bound variable \(x\) will also range over individuals. If, however, we assume that the variable \(y\) in ∗10·1 ranges over functions of type \((\iota)/1\), then the variable \(\phi\) must range over functions of type \(((\iota)/1)/m\), for some \(m\). In this case, the bound variable \(x\) will range over functions of type \((\iota)/1\).

So \(y\) and \(\phi\) are called “real” variables in ∗10·1 not only because they are free but also because they can range over any type. Whitehead and Russell frequently say that real variables are taken to ambiguously denote “any” of their instances, while bound variables (which also ambiguously denote) range over “all” of their instances (within a legitimate totality, i.e. type).
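
The way a typically ambiguous statement such as ∗10·1 is resolved into a specific instance can be sketched as follows. The function names and the string representation of r-types are assumptions made only for this illustration; the sketch simply records the rule described above, that fixing the type of \(y\) fixes the type of the bound \(x\) and constrains \(\phi\) to functions taking arguments of that type.

```python
def r_type(arg_type: str, level: int) -> str:
    """Render the r-type (arg_type)/level as a string (illustrative encoding)."""
    return f"({arg_type})/{level}"


def instance_of_10_1(y_type: str, level: int) -> dict:
    """Assign types to the variables of *10.1 once the type of y is chosen.

    The bound variable x gets the same type as the free variable y, and phi
    ranges over functions of arguments of that type, at the chosen level.
    """
    return {"y": y_type, "x": y_type, "phi": r_type(y_type, level)}


# y ranges over individuals:
over_individuals = instance_of_10_1("iota", 1)
# y ranges over first-order functions of individuals:
over_functions = instance_of_10_1("(iota)/1", 1)
```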

7. Predicative Functions and Identity

The exclamation mark “!” following a variable for a function and preceding the argument, as in “\(f\bang \hat{x}\)”, “\(\phi \bang x\)”, “\(\phi\bang \hat{x}\)”, indicates that the function is predicative, that is, of the lowest order which can apply to its arguments. In Church’s notation, this means that predicative functions are all of the first level, with types of the form \((\ldots)/1\). As a result, predicative functions will be of order one more than the highest order of any of their arguments. This analysis is based on quotations like the following, in the Introduction to PM:

We will define a function of one variable as predicative when it is of the next order above that of its argument, i.e., of the lowest order compatible with its having that argument. (PM, 53)

Unfortunately, in the summary of ∗12, we find “A predicative function is one which contains no apparent variables, i.e., is a matrix” (PM, 167). Reconciling this statement with the definition in the Introduction is a problem for scholars.

To see the shriek notation in action, consider the following definition of identity:

\[ \tag*{∗13·01} x = y \ldot {=} \colon (\phi) \colon \phi \bang x \ldot {\supset} \ldot \phi \bang y \quad\Df \]

That is, \(x\) is identical with \(y\) if and only if \(y\) has every predicative function \(\phi\) which is possessed by \(x\). (Of course the second occurrence of “=” indicates a definition, and does not independently have meaning. It is the first occurrence, relating individuals \(x\) and \(y\), which is defined.)

To see how this definition reduces to the more familiar definition of identity (on which objects are identical iff they share the same properties), we need the Axiom of Reducibility. The Axiom of Reducibility states that for any function there is an equivalent function (i.e., one true of all the same arguments) which is predicative:

Axiom of Reducibility: \[ \tag*{∗12·1} \vdash \colon (\exists f) \colon \phi x \ldot {\equiv_x} \ldot f\bang x \quad\Pp \]

To see how this axiom implies the more familiar definition of identity, note that the more familiar definition of identity is:

\[ x = y \ldot {=} \colon (\phi) \colon \phi x \ldot {\supset} \ldot \phi y \quad\Df \]

for \(\phi\) of “any” type. (Note that this differs from ∗13·01 in that the shriek no longer appears.) Now to prove this, assume both ∗13·01 and the Axiom of Reducibility, and suppose, for proof by reductio, that \(x = y\), and \(\phi x\), and not \(\phi y\), for some function \(\phi\) of arbitrary type. Then, the Axiom of Reducibility ∗12·1 guarantees that there will be a predicative function \(\psi \bang \), which is coextensive with \(\phi\) such that \(\psi \bang x\) but not \(\psi \bang y\), which contradicts ∗13·01.
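
The reductio just described can be set out step by step. This is only a sketch: the line numbering and the auxiliary bound variable \(\chi\) are introduced here for illustration and are not PM’s own.

\[ \begin{align} &(1)\ x = y \sdot \phi x \sdot \osim \phi y &&\text{hypothesis for reductio}\\ &(2)\ \phi z \ldot {\equiv_z} \ldot \psi \bang z &&\text{from ∗12·1, for some predicative } \psi\\ &(3)\ \psi \bang x \sdot \osim \psi \bang y &&\text{from (1) and (2)}\\ &(4)\ (\chi) \colon \chi \bang x \ldot {\supset} \ldot \chi \bang y &&\text{from (1) by ∗13·01}\\ &(5)\ \psi \bang y &&\text{from (3) and (4), contradicting (3)} \end{align} \]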

8. Definite Descriptions

The inverted Greek letter iota “\(\atoi\)” is used in PM, always followed by a variable, to begin a definite description. \((\atoi x) \phi x\) is read as “the \(x\) such that \(x\) is \(\phi\)”, or more simply, as “the \(\phi\)”. Such expressions may occur in subject position, as in \(\psi(\atoi x) \phi x\), read as “the \(\phi\) is \(\psi\)”. The formal part of Russell’s famous “theory of definite descriptions” consists of a definition of all formulas “…\(\psi(\atoi x) \phi x\)…” in which a description occurs. To distinguish the portion \(\psi\) from the rest of a larger sentence (indicated by the ellipses above) in which the expression \(\psi(\atoi x) \phi x\) occurs, the scope of the description is indicated by repeating the definite description within brackets:

\[ [(\atoi x) \phi x] \sdot \psi(\atoi x) \phi x \]

The notion of scope is meant to explain a distinction which Russell famously discusses in “On Denoting” (1905). Russell says that the sentence “The present King of France is not bald” is ambiguous between two readings: (1) the reading where it says of the present King of France that he is not bald, and (2) the reading on which it denies that the present King of France is bald. The former reading requires that there be a unique King of France on the list of things that are not bald, whereas the latter simply says that there is not a unique King of France that appears on the list of bald things. Russell says the latter, but not the former, can be true in a circumstance in which there is no King of France. Russell analyzes this difference as a matter of the scope of the definite description, though as we shall see, some modern logicians tend to think of this situation as a matter of the scope of the negation sign. Thus, Russell introduces a method for indicating the scope of the definite description.

To see how Russell’s method of scope works for this case, we must understand the definition which introduces definite descriptions (i.e., the inverted iota operator). Whitehead and Russell define:

\[ \tag*{∗14·01} [(\atoi x) \phi x] \sdot \psi(\atoi x) \phi x \ldot {=} \colon (\exists b) \colon \phi x \ldot {\equiv_x} \ldot x=b \colon \psi b \quad\Df \]

This kind of definition is called a contextual definition, to be contrasted with an explicit definition. An explicit definition of the definite description would have to look something like the following:

\[ (\atoi x)(\phi x) = \colon \ldots \quad\Df \]

which would allow the definite description to be replaced in any context by whichever defining expression fills in the ellipsis. By contrast, ∗14·01 shows how a sentence, in which there is an occurrence of a description \((\atoi x)(\phi x)\) in a context \(\psi\), can be replaced by some other sentence (involving \(\phi\) and \(\psi\)) which is equivalent. To develop an instance of this definition, start with the following example:

Example.
The present King of France is bald.

Using \(PKFx\) to represent the propositional function of being a present King of France and \(B\) to represent the propositional function of being bald, Whitehead and Russell would represent the above claim as:

\[ [(\atoi x)(PKFx)] \sdot B(\atoi x)(PKFx) \]

which by ∗14·01 means:

\[ (\exists b) \colon PKFx \ldot {\equiv_x} \ldot x=b \colon Bb \]

In words, there is one and only one \(b\) which is a present King of France and which is bald. In modern symbols, using \(b\) non-standardly, as a variable, this becomes:

\[ (\exists b)[\forall x(PKFx \equiv x=b) \amp Bb] \]

Now we return to the example which shows how the scope of the description makes a difference:

Example.
The present King of France is not bald.

There are two options for representing this sentence.

\[ [(\atoi x)(PKFx)] \sdot \osim B(\atoi x)(PKFx) \]

and

\[ \osim [(\atoi x)(PKFx)] \sdot B(\atoi x)(PKFx) \]

In the first, the description has “wide” scope, and in the second, the description has “narrow” scope. Russell says that the description has “primary occurrence” in the former, and “secondary occurrence” in the latter. Given the definition ∗14·01, the two PM formulas immediately above become expanded into primitive notation as:

\[ \begin{align} (\exists b) \colon PKFx \equiv_x x=b \colon \osim Bb\\ \osim (\exists b) \colon PKFx \equiv_x x=b \colon Bb \end{align} \]

In modern notation these become:

\[ \begin{align} \exists x[\forall y(PKFy \equiv y=x) \amp \osim Bx]\\ \osim \exists x[\forall y(PKFy \equiv y=x) \amp Bx] \end{align} \]

The former says that there is one and only one object which is a present King of France and which is not bald; i.e., there is exactly one present King of France and he is not bald. This reading is false, given that there is no present King of France. The latter says it is not the case that there is exactly one present King of France which is bald. This reading is true.
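
The difference between the readings can be checked mechanically on a small finite model. The following is a minimal sketch, assuming a three-element domain in which nothing is a present King of France; the domain, the extension of “bald”, and the helper name `the` are all hypothetical choices made for illustration.

```python
# Hypothetical finite model: three individuals, none a present King of France.
DOMAIN = ["a", "b", "c"]
PKF = set()     # extension of "x is a present King of France" (empty)
BALD = {"b"}    # assumed extension of "x is bald"; illustrative only


def the(phi, psi, domain=DOMAIN):
    """*14.01: there is a b such that phi holds of exactly b, and psi holds of b."""
    return any(
        all(phi(x) == (x == b) for x in domain) and psi(b)
        for b in domain
    )


def pkf(x):
    return x in PKF


def bald(x):
    return x in BALD


def not_bald(x):
    return x not in BALD


plain = the(pkf, bald)           # "The present King of France is bald"
primary = the(pkf, not_bald)     # description with wide scope
secondary = not the(pkf, bald)   # description with narrow scope
```

With no King of France in the model, `plain` and `primary` both evaluate to false and `secondary` to true, in agreement with the readings given above.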

Although Whitehead and Russell take the descriptions in these examples to be the expressions which have scope, the above readings in both expanded PM notation and in modern notation suggest why some modern logicians take the difference in readings here to be a matter of the scope of the negation sign.

9. Classes

The circumflex “ˆ” over a variable preceding a formula is used to indicate a class, thus \(\hat{x} \psi x\) is the class of things \(x\) which are such that \(\psi x\). In modern notation we represent this class as \(\{x \mid \psi x\}\), which is read: the class of \(x\) which are such that \(x\) has \(\psi\). Recall that “\(\phi \hat{x}\)”, with the circumflex over a variable after the predicate variable, expresses the propositional function of being an \(x\) such that \(\phi x\). In the type theory of PM, the class \(\hat{x} \phi x\) has the same logical type as the function \(\phi \hat{x}\). This makes it appropriate to use the following contextual definition, which allows one to eliminate the class term \(\hat{x} \psi x\) from occurrences in the context \(f\):

\[ \tag*{∗20·01} f\{ \hat{z}(\psi z)\} \ldot {=} \colon (\exists \phi) \colon \phi \bang x \ldot {\equiv_x} \ldot \psi x \colon f \{ \phi\bang \hat{z}\} \quad\Df \]

or in modern notation:

\[ f\{z \mid \psi z\} =_{df} \exists \phi[\forall x(\phi x \equiv \psi x) \amp f(\lambda x \phi x)] \]

where \(\phi\) is a predicative function of \(x\).

Note that \(f\) has to be interpreted as a higher-order function which is predicated of the function \(\phi \bang \hat{z}\). In the modern notation used above, the language has to be a typed language in which \(\lambda\) expressions are allowed in argument position. As was pointed out later (Chwistek 1924, Gödel 1944, and Carnap 1947) there should be scope indicators for class expressions just as there are for definite descriptions. Chwistek, for example, proposed copying the notation for definite descriptions, thus replacing ∗20·01 with:

\[ [\hat{z}(\psi z)] \sdot f\{ \hat{z}(\psi z)\} \ldot {=} \colon (\exists \phi) \colon \phi \bang x \ldot {\equiv_x} \ldot \psi x \colon f \{ \phi\bang \hat{z} \} \]

Contemporary formalizations of set theory make use of something like these contextual definitions, when they require an “existence” theorem of the form \(\exists x\forall y(y \in x \equiv \ldots y\ldots)\), in order to justify the introduction of a singular term \(\{y \mid \ldots y\ldots \}\). (Given the law of extensionality, it follows from \(\exists x\forall y(y \in x \equiv \ldots y\ldots)\) that there is a unique such set.) The relation of membership in classes \(\in\) is defined in PM by first defining a similar relationship between objects and propositional functions:

\[ \tag*{∗20·02} x \in (\phi\bang \hat{z}) \ldot {=} \ldot \phi \bang x \quad\Df \]

or, in modern notation:

\[ x \in \lambda z\phi z =_{df} \phi x \]

∗20·01 and ∗20·02 together are then used to define the more familiar notion of membership in a class. The formal expression “\(y \in \{ \hat{z}(\phi z)\}\)” can now be seen as a context in which the class term occurs; it is then eliminated by the contextual definition ∗20·01. (Exercise)
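
One way of working the exercise can be sketched as follows, on the assumption that the context \(f\{\ldots\}\) of ∗20·01 is taken to be “\(y \in \ldots\)” and that the bound function variable is renamed \(\psi\) to avoid a clash with the \(\phi\) of the class term. The first line applies ∗20·01 and the second applies ∗20·02:

\[ \begin{align} y \in \hat{z}(\phi z) \ldot &{=} \colon (\exists \psi) \colon \psi \bang x \ldot {\equiv_x} \ldot \phi x \colon y \in (\psi \bang \hat{z}) \\ &{=} \colon (\exists \psi) \colon \psi \bang x \ldot {\equiv_x} \ldot \phi x \colon \psi \bang y \end{align} \]

That is, \(y\) is a member of the class of \(\phi\)s just in case some predicative function coextensive with \(\phi\) is true of \(y\). Given the Axiom of Reducibility ∗12·1, this is equivalent to \(\phi y\).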

PM also has Greek letters for classes: \(\alpha, \beta, \gamma\), etc. These will appear as real (free) variables, apparent (bound) variables and in abstracts for propositional functions true of classes, as in \(\phi \hat{\alpha}\). Only definitions of the bound Greek variables appear in the body of the text; the others are informally defined in the Introduction:

\[ \tag*{∗20·07} (\alpha) \sdot f \alpha \ldot {=} \ldot (\phi) \sdot f \{ \hat{z}(\phi\bang z)\} \quad\Df \]

or, in modern notation,

\[ \forall \alpha\, f\alpha =_{df} \forall \phi f\{z\mid\phi z\} \]

where \(\phi\) is a predicative function.

Thus universally quantified class variables are defined in terms of quantifiers ranging over predicative functions. Likewise for existential quantification:

\[ \tag*{∗20·071} (\exists \alpha) \sdot f \alpha \ldot {=} \ldot (\exists \phi) \sdot f \{ \hat{z}(\phi\bang z)\} \quad\Df \]

or, in modern notation,

\[ \exists \alpha\, f\alpha =_{df} \exists \phi f\{z\mid\phi z\} \]

where \(\phi\) is a predicative function.

Expressions with a Greek variable to the left of \(\in\) are defined: \[ \tag*{∗20·081} \alpha \in \psi\bang \hat{\alpha} \ldot {=} \ldot \psi \bang \alpha \quad\Df \]

These definitions do not cover all possible occurrences of Greek variables. In the Introduction to PM, further definitions of \(f \alpha\) and \(f \hat{\alpha}\) are proposed, but it is remarked that the definitions are in some way peculiar and they do not appear in the body of the work. The definition considered for \(f \hat{\alpha}\) is:

\[ f \hat{\alpha} \ldot {=} \ldot (\exists \psi) \sdot \hat{\phi} \bang x \equiv_x \psi \bang x \sdot f \{ \psi\bang \hat{z} \} \]

or, in modern notation,

\[ \lambda \alpha\, f\alpha =_{df} \lambda \phi f\{x \mid \phi x\} \]

That is, \(f \hat{\alpha}\) is an expression naming the function which takes a function \(\phi\) to a proposition which asserts \(f\) of the class of \(\phi\)s. (The modern notation shows that in the proposed definition of \(f \hat{\alpha}\) in PM notation, we shouldn’t expect \(\alpha\) in the definiens, since it is really a bound variable in \(f \hat{\alpha}\); similarly, we shouldn’t expect \(\phi\) in the definiendum because it is a bound variable in the definiens.) One might also expect definitions like ∗20·07 and ∗20·071 to hold for cases in which the Roman letter “\(z\)” is replaced by a Greek letter. The definitions in PM are thus not complete, but it is possible to guess at how they would be extended to cover all occurrences of Greek letters. This would complete the project of the “no-classes” theory of classes by showing how all talk of classes can be reduced to the theory of propositional functions.

10. Prolegomena to Cardinal Arithmetic

Although students of philosophy usually read no further than ∗20 in PM, this is in fact the point where the “construction” of mathematics really begins. ∗21 presents the “General Theory of Relations” (the theory of relations in extension; in contemporary logic these are treated as sets of ordered pairs, following Wiener). \(\hat{x} \hat{y} \psi(x, y)\) is the relation between \(x\) and \(y\) which obtains when \(\psi(x, y)\) is true. In modern notation we represent this as the set of ordered pairs \(\{\langle x, y \rangle \mid \psi( x, y ) \}\), which is read: the set of ordered pairs \(\langle x, y \rangle\) which are such that \(x\) bears the relation \(\psi\) to \(y\).

The following contextual definition (∗21·01) allows one to eliminate the relation term \(\hat{x} \hat{y}\psi (x, y)\) from occurrences in the context \(f\):

\[ f \{ \hat{x} \hat{y} \psi ( x, y )\} \ldot {=} \colondot (\exists \phi) \colon \phi \bang ( x, y ) \ldot {\equiv_{x,y}} \ldot \psi( x, y ) \colon f \{ \phi\bang (\hat{u}, \hat{v} )\} \quad\Df \]

or in modern notation:

\[ f \{\langle x, y \rangle \mid \psi( x, y )\} =_{df} \exists \phi[\forall xy (\phi(x, y) \equiv \psi( x, y) ) \amp f ( \lambda u \lambda v \phi(u,v))] \]

where \(\phi\) is a predicative function of \(u\) and \(v\).

Principia does not analyze relations (or mathematical functions) in terms of sets of ordered pairs, but rather takes the notion of propositional function as primitive and defines relations and functions in terms of them. The upper case letters \({R}, {S}\) and \({T}\), etc., are used after ∗21 to stand for these “relations in extension”, and are distinguished from propositional functions by being written between the arguments. Thus it is \(\psi(x,y)\) with arguments after the propositional function symbol, but \(xRy\). From ∗21 functions “\(\phi\) and \(\psi\)”, etc., disappear and only relations in extension, \({R}\), \({S}\) and \({T}\), etc., appear in the pages of Principia. While propositional functions might be true of the same objects yet not be identical, no two relations in extension are true of the same objects. The logic of Principia is thus “extensional”, from page 200 in Volume I, through to the end in Volume III.

∗22 on the “Calculus of Classes” presents the elementary set theory of intersections, unions and the empty set which is often all the set theory used in elementary mathematics of other sorts. The student looking for the set theory of Principia to compare it with, say, the Zermelo-Fraenkel system, will have to look at various numbers later in the text. The Axiom of Choice is defined at ∗88 as the “Multiplicative Axiom” and a version of the Axiom of Infinity appears at ∗120 in Volume II as “Infin ax”. The set theory of Principia comes closest to Zermelo’s axioms of 1908 among the various familiar axiom systems, which means that it lacks the Axiom of Foundation and Axiom of Replacement of the now standard Zermelo-Fraenkel axioms of set theory. The system of Principia differs importantly from Zermelo’s in that it is formulated in the simple theory of types. As a result, for example, there are no quantifiers ranging over all sets, and there is a set of all things (for each type).

∗30 on “Descriptive Functions” provides Whitehead and Russell’s analysis of mathematical functions in terms of relations and definite descriptions. Frege had used the notion of function, in the mathematical sense, as a basic notion in his logical system. Thus a Fregean “concept” is a function from objects as arguments to one of the two “truth values” as its values. A concept yields the value “True” for each object to which the concept applies, and “False” for all others. Russell, from 1904, well before the writing of Principia, had preferred to analyze functions in terms of the relation between each argument and value, and the notion of “uniqueness”. With modern symbolism, his view would be expressed as follows. For each function \(\lambda x f(x)\), there will be some relation (in extension) \(R\), such that the value of the function for an argument \(a\), that is \(f(a)\), will be the unique individual which bears the relation \(R\) to \(a\). (Nowadays we reduce functions to a binary relation between the argument in the first place and value in the second place.) The result is that there are no function symbols in Principia. As Whitehead and Russell say, the familiar mathematical expressions such as “\(\sin \pi/2\)” will be analyzed with a relation and a definite description, as a “descriptive function”. The “descriptive function”, \(R‘y\) (the \(R\) of \(y)\), is defined as follows:

\[ \tag*{∗30·01} R‘y = (\atoi x)xRy \quad\Df \]
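
The idea behind ∗30·01 can be sketched by treating a relation in extension, in the modern way, as a set of ordered pairs. The helper name `descriptive` and the example relation are hypothetical. Returning `None` when there is no unique \(x\) is a modelling choice made only for this sketch: in PM the description is eliminated contextually by ∗14·01 rather than assigned a value.

```python
def descriptive(R, y):
    """Return R'y, the unique x with (x, y) in R, or None if no unique x exists."""
    candidates = {x for (x, y2) in R if y2 == y}
    return next(iter(candidates)) if len(candidates) == 1 else None


# Hypothetical relation: x is the successor of y, for y = 0, ..., 4.
SUCC = {(n + 1, n) for n in range(5)}

# A relation that is not one-valued: both 1 and 2 bear it to 0.
MANY = {(1, 0), (2, 0)}
```

Here `descriptive(SUCC, 3)` picks out 4, “the successor of 3”, while `descriptive(MANY, 0)` has no value because the uniqueness condition fails.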

We conclude this section by presenting a number of prominent examples from these later numbers below, with their intuitive meaning, location in PM, definition in PM, and a modern equivalent. (Some of these numbers are theorems rather than definitions.) Note, however, that the modern equivalent will sometimes logically differ from the original version in PM, such as by treating relations as sets of ordered pairs, etc. In his account of the logic of Principia, W.V. Quine (1951) objects to the complexity and even redundancy of much of this symbolism. These formulas can be worked out, however, with a step by step application of the definitions.

For each formula number, we present the information in the following format:

PM Symbol (Intuitive Meaning)    [Location]
PM Definition
Modern Equivalent
\(\alpha \subset \beta\) (\(\alpha\) is a subset of \(\beta\))    [∗22·01]
\(x\in \alpha \ldot {\supset_x} \ldot x\in \beta\)
\(\alpha \subseteq \beta\)
\(\alpha \cap \beta\) (the intersection of \(\alpha\) and \(\beta)\)    [∗22·02]
\(\hat{x} (x \in \alpha \sdot x \in \beta\))
\(\alpha \cap \beta\)
\(\alpha \cup \beta\) (the union of \(\alpha\) and \(\beta\))    [∗22·03]
\(\hat{x} (x \in \alpha \lor x \in \beta\))
\(\alpha \cup \beta\)
\(-\alpha\) (the complement of \(\alpha)\)    [∗22·04]
\(\hat{x} (x\osim \in \alpha\)) [i.e., \(\hat{x} \osim (x \in \alpha\)) by ∗20·06]
\(\{x \mid x \not\in \alpha \}\)
\(\alpha - \beta\) (\(\alpha\) minus \(\beta)\)    [∗22·05]
\(\alpha \cap -\beta\)
\(\{x \mid x\in \alpha \amp x\not\in \beta \}\)
\(\mathrm{V}\) (the universal class)    [∗24·01]
\(\hat{x} (x = x)\)
\(\mathrm{V}\) or \(\{x \mid x = x\}\)
\(\Lambda\) (the empty class)    [∗24·02]
\(-\mathrm{V}\)
\(\varnothing\)
\(R‘y\) (the \(R\) of \(y)\) (a descriptive function)    [∗30·01]
\((\atoi x)(xRy)\)
\(f^{-1}(y)\), where \(f = \{\langle x,y\rangle \mid Rxy \}\)
\(\breve{R}\) (the converse of \(R)\)    [∗31·02]
\(\hat{x} \hat{z} (zRx)\)
\(\{\langle x,z\rangle \mid Rzx\}\)
\(\overrightarrow{R}‘y\) (the R-predecessors of \(y)\)    [∗32·01]
\(\hat{x} (xRy)\)
\(\{x \mid Rxy \}\)
\(\overleftarrow{R}‘x\) (the R-successors of \(x)\)    [∗32·02]
\(\hat{z} (xRz)\)
\(\{z \mid Rxz \}\)
\(D‘R\) (the domain of \(R)\)    [∗33·11]
\(\hat{x} \{ (\exists y) \sdot xRy \}\)
\(\{x \mid \exists yRxy \}\)
\(\backd‘R\) (the range of \(R)\)    [∗33·111]
\(\hat{z} \{(\exists x) \sdot xR z \}\)
\(\{z \mid \exists x Rxz \}\)
\(C‘R\) (the field of \(R)\)    [∗33·112]
\(\hat{x} \{(\exists y): xRy \ldot {\lor} \ldot yRx\}\)
\(\{x \mid \exists y (xRy \lor yRx)\}\)
\(R\mid S\) (the relative product of \(R\) and \(S)\)    [∗34·01]
\(\hat{x} \hat{z} \{(\exists y) \sdot xRy \sdot ySz \}\)
\(\{\langle x,z\rangle \mid \exists y(xRy \amp ySz)\}\)
\(R \restriction \beta\) (the restriction of \(R\) to \(\beta)\)    [∗35·02]
\(\hat{x} \hat{z}[xRz \sdot z\in \beta]\)
\(\{\langle x,z\rangle \mid z\in \beta \amp Rxz \}\)
\(\alpha \uparrow \beta\) (the Cartesian product of \(\alpha\) and \(\beta)\)    [∗35·04]
\(\hat{x} \hat{z}[x \in \alpha \sdot z \in \beta]\)
\(\alpha \times \beta\), or \(\{\langle x,z\rangle \mid x\in \alpha \amp z\in \beta \}\)
\(R‘‘\beta\) (the projection of \(\beta\) by \(R)\)    [∗37·01]
\(\hat{x} \{(\exists y) \sdot y\in \beta \sdot x Ry\}\)
\(\{x \mid \exists y(y\in \beta \amp Rxy)\}\)
\(\iota‘x\) (singleton of x)    [∗51·11]
\(\hat{z} (z = x)\)
\(\{x\}\)
\(\mathbf{1}\) (the cardinal number 1)    [∗52·01]
\( \hat{\alpha} \{ (\exists x) \sdot \alpha = \iota‘x \} \)
\( \{ x \mid \exists y \; ( x = \{y \} ) \}\) (the class of all singletons)
\(\mathbf{2}\) (the cardinal number 2)    [∗54·02]
\( \hat{\alpha} \{ (\exists x,y) \sdot x \neq y \sdot \alpha = \iota‘x \cup \iota‘y \} \)
\( \{ x \mid \exists y \exists z( y \neq z \amp x = \{y \} \cup \{z\} ) \}\) (the class of all pairs)
\(x \downarrow y\) (the ordinal couple of \(x\) and \(y\))    [∗55·01]
\( \iota‘x \uparrow \iota‘y\)
\( \langle x, y \rangle\) (the ordered pair \(\langle x,y \rangle\))
Note: The paperback abridged edition of PM to ∗56 only goes this far, so the remaining definitions have only been available to those with access to the full three volumes of PM.
\(\alpha \rightarrow \beta\) [∗70·01]
\(\hat{R} (\overrightarrow{R}“\backd ‘R \subset \alpha \sdot \overleftarrow{R}“D‘R \subset \beta) \)
\(f : \alpha \rightarrow \beta \) (the functions \(f\) from \(\alpha\) to \(\beta\))
\(\alpha \mathbin{\overline{\mathrm{sm}}} \beta\) (the class of similarity relations between \(\alpha\) and \(\beta\))    [∗73·01]
\(1 \rightarrow 1 \cap \overleftarrow{D}‘\alpha \cap \overleftarrow{\backd}‘\beta \)
\(\{f \mid f : \alpha \stackrel{1-1}{\longrightarrow} \beta\} \)
\(\mathrm{sm}\) (the relation of similarity)    [∗73·02]
\(\hat{\alpha} \hat{\beta}(\exists! \alpha \mathbin{\overline{\mathrm{sm}}} \beta) \)
\(\alpha \approx \beta \)
\(R_*\) (the ancestral of \(R)\)    [∗90·01]
\(\hat{x} \hat{y} \{ x \in C‘ R \colon \breve{R}“\mu \subset \mu \sdot x \in \mu \ldot {\supset_{\mu}} \ldot y \in \mu \}\)
Now written \(R^*\); this follows Frege’s definition: \(y\) is in all the \(R\)-hereditary classes that \(x\) is in.
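
Several of the modern equivalents listed above can be tried out on finite relations. The following is a minimal sketch, assuming relations are represented as Python sets of ordered pairs; the function names are hypothetical, and, as noted above, treating relations as sets of pairs departs from PM itself.

```python
def converse(R):
    """The converse of R: {<x, z> | Rzx}."""
    return {(x, z) for (z, x) in R}


def domain(R):
    """D'R: {x | there is a y with Rxy}."""
    return {x for (x, _) in R}


def converse_domain(R):
    """The converse domain (range) of R: {z | there is an x with Rxz}."""
    return {z for (_, z) in R}


def field(R):
    """C'R: everything that bears R to something or has R borne to it."""
    return domain(R) | converse_domain(R)


def relative_product(R, S):
    """R|S: {<x, z> | there is a y with xRy and ySz}."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


def restrict(R, beta):
    """R restricted to beta: {<x, z> | z in beta and Rxz}."""
    return {(x, z) for (x, z) in R if z in beta}


def image(R, beta):
    """R''beta: {x | there is a y in beta with Rxy}."""
    return {x for (x, y) in R if y in beta}


def ancestral(R):
    """R_*: identity on the field of R, closed under taking further R-steps."""
    closure = {(x, x) for x in field(R)} | set(R)
    while True:
        step = closure | relative_product(closure, R)
        if step == closure:
            return closure
        closure = step


# Hypothetical example relation with a R b and b R c.
R = {("a", "b"), ("b", "c")}
```

For this `R`, `ancestral(R)` relates `"a"` to `"c"` and each member of the field to itself, but does not relate `"c"` to `"a"`.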

11. Arithmetic in Volume II

Volume II of Principia Mathematica begins with Part III, “Cardinal Arithmetic”. The notions of cardinal numbers are developed in full generality, extending to infinite cardinals. Consequently the theory of natural numbers, which are called “Inductive Cardinals” in PM, is introduced with a series of definitions of special cases of notions that are first introduced in a general form applying to any numbers or classes. For example, the addition of natural numbers, as in the famous proof that 1 + 1 = 2 in ∗110·643, is treated as a special case of the addition of classes that applies to cardinal numbers, ‘\(+_c\)’. These definitions, ending with the Axiom of Infinity at ∗120·03, conclude this introduction to the symbolism of Principia Mathematica.

\(\mathrm{N_c}\) (the Cardinal Numbers)    [∗100·01]
\(\overrightarrow{\mathrm{sm}}\)
This is actually the relation between a class and its cardinal number.
\(\{x \mid \forall y (y \in x \leftrightarrow \forall z \forall w (z,w \in y \leftrightarrow z \approx w))\}\)
Cardinal numbers are classes of equinumerous (similar) classes.
\(\mathbf{0}\) (the cardinal number 0)    [∗101·01]
\(0 = \mathrm{N_c}‘\Lambda \)
\(\{\varnothing\}\)
The class of all classes equinumerous with the empty set is just the singleton containing the empty set.
\(\alpha + \beta\) (the arithmetic sum of \(\alpha\) and \(\beta\))    [∗110·01]
\(\downarrow (\Lambda \cap \beta)“\iota“\alpha \cup (\Lambda \cap \alpha) \downarrow“\iota“\beta\)
This is the union of \(\alpha\) and \(\beta\) after they are made disjoint by pairing each element of \(\beta\) with \(\{ \alpha \}\) and each element of \(\alpha\) with \(\{ \beta \} \). The classes \(\alpha\) and \(\beta\) are intersected with the empty class, \(\Lambda\), to adjust the type of the elements of the sum.
\((\beta \times \{\alpha\}) \cup (\alpha \times \{\beta\})\)
\(\mu +_c \nu\) (the cardinal sum of \(\mu\) and \(\nu\))    [∗110·02]
\(\hat{\xi}\{(\exists \alpha,\beta) \sdot \mu = \mathrm{N_0 c}‘\alpha \sdot \nu = \mathrm{N_0 c}‘\beta\sdot\xi\,\mathrm{sm}(\alpha + \beta)\}\)
Cardinal addition is the arithmetical sum of “homogeneous cardinals”, cardinals of a uniform type, to which \(\alpha\) and \(\beta\) are related by \( \mathrm{N_0 c}\) (itself defined [∗103·01]).
\(\{ x \mid x \approx (\beta \times \{\alpha\}) \cup (\alpha \times \{\beta\}) \}\)

The reader can now appreciate why this elementary theorem is not proved until page 83 of Volume II of PM:
\[\tag*{∗110·643} 1 +_c 1 = 2 \]

Whitehead and Russell remark that “The above proposition is occasionally useful. It is used at least three times, in …”. This joke reminds us that the theory of natural numbers, so central to Frege’s works, appears in PM as only a special case of a general theory of cardinal and ordinal numbers and even more general classes of isomorphic structures.
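
The role of the arithmetic sum in this theorem can be illustrated on finite classes. The sketch below is offered under several assumptions: it reads the definiens of ∗110·01 as forming couples \(\iota‘x \downarrow \Lambda\) for \(x\) in \(\alpha\) and \(\Lambda \downarrow \iota‘y\) for \(y\) in \(\beta\), it ignores all questions of type, and it replaces similarity by a comparison of sizes, which is adequate only for finite classes. The function names are hypothetical.

```python
EMPTY = frozenset()  # stands in for the empty class Λ


def arithmetic_sum(alpha, beta):
    """alpha + beta: tag the members of each class so the two parts cannot overlap."""
    from_alpha = {(frozenset({x}), EMPTY) for x in alpha}  # couples ({x}, Λ)
    from_beta = {(EMPTY, frozenset({y})) for y in beta}    # couples (Λ, {y})
    return from_alpha | from_beta


def in_cardinal(n, alpha):
    """Finite stand-in for membership in the cardinal n: alpha has exactly n members."""
    return len(alpha) == n


one_a = {"a"}
one_b = {"a"}  # deliberately the same singleton as one_a
total = arithmetic_sum(one_a, one_b)
```

Because the members are tagged before the union is taken, the sum of a singleton with itself is still a class with two members, which is the finite content of \(1 +_c 1 = 2\).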

This survey of the notation in PM concludes with the definition of the natural numbers and a statement of the Axiom of Infinity, which allow the proof of the other axioms of Peano Arithmetic as, again, special cases of more general notions.

NC induct (the Inductive Cardinals)    [∗120·01]
\(\hat{\alpha}\{\alpha({+_c}1)_* 0\} \)
\(\{x \mid 0 S^* x\}\)
The inductive cardinals are the “natural numbers”, that is, 0 and all those cardinal numbers that are related to 0 by the ancestral of the “successor relation” \(S\), where \(xSy\) just in case \(y = x +1\).
Infin ax (the Axiom of Infinity)    [∗120·03]
\(\alpha \in \text{NC induct} \sdot \supset_{\alpha} \sdot \exists!\alpha\)
\(\forall y (y \in \{x \mid 0S^* x\} \supset y \neq \varnothing) \)
The Axiom of Infinity asserts that all inductive cardinals are non-empty. (Recall that 0 = \(\{ \varnothing \}\), and so 0 is not empty.) The Axiom of Infinity is not a “primitive proposition” but is instead to be listed as a “hypothesis” wherever it is used, that is, as the antecedent of a conditional whose consequent is then said to depend on the axiom. Technically it is not an axiom of PM, since [∗120·03] is a definition, so this is just further notation in PM!
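What the hypothesis rules out can be seen in a small model. The sketch below assumes a universe of just three individuals and treats the cardinal k as the class of all k-membered classes of those individuals. In that model 0 comes out as \(\{\varnothing\}\), while the cardinal 4 is empty. The function name `cardinal` and the three-member universe are illustrative assumptions, and the type distinctions of PM are ignored.

```python
from itertools import combinations


def cardinal(k, universe):
    """The class of all k-membered subclasses of `universe`
    (a hypothetical finite stand-in for the cardinal number k)."""
    return {frozenset(c) for c in combinations(universe, k)}


universe = {"a", "b", "c"}  # assumption: exactly three individuals

print(cardinal(0, universe) == {frozenset()})  # True: 0 is {empty class}, so non-empty
for k in range(5):
    # Number of members of the cardinal k in this model: 1, 3, 3, 1, 0
    print(k, len(cardinal(k, universe)))
```

With only three individuals the cardinal 4 has no members, so in such a model not every inductive cardinal is non-empty. This is the situation that taking Infin ax as a hypothesis excludes.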

12. Conclusion

The definitions up to ∗120·03 constitute only about half of the definitions in PM. The last eight pages (667–674) of Volume I of the second edition (1925) consist of a complete “List of Definitions” from all three volumes. Correspondence in the Bertrand Russell Archives suggests that this list may have been compiled by Dorothy Wrinch. The list can be used to trace every one of the defined expressions of PM back to the notation discussed in this entry.

Bibliography

  • Carnap, R., 1947, Meaning and Necessity, Chicago: University of Chicago Press.
  • Church, A., 1976, “Comparison of Russell’s Resolution of the Semantical Antinomies with That of Tarski”, Journal of Symbolic Logic, 41: 747–60.
  • Chwistek, L., 1924, “The Theory of Constructive Types”, Annales de la Société Polonaise de Mathématique (Rocznik Polskiego Towarzystwa Matematycznego), II: 9–48.
  • Feys, R. and Fitch, F.B., 1969, Dictionary of Symbols of Mathematical Logic, Amsterdam: North Holland.
  • Gödel, K., 1944, “Russell’s Mathematical Logic”, in P.A. Schilpp, ed., The Philosophy of Bertrand Russell, LaSalle: Open Court, 125–153.
  • Landini, G., 1998, Russell’s Hidden Substitutional Theory, New York and Oxford: Oxford University Press.
  • Linsky, B., 1999, Russell’s Metaphysical Logic, Stanford: CSLI Publications.
  • –––, 2009, “From Descriptive Functions to Sets of Ordered Pairs”, in Reduction – Abstraction – Analysis, A. Hieke and H. Leitgeb (eds.), Ontos: Munich, 259–272.
  • –––, 2011, The Evolution of Principia Mathematica: Bertrand Russell’s Manuscripts and Notes for the Second Edition, Cambridge: Cambridge University Press.
  • Quine, W.V.O., 1951, “Whitehead and the Rise of Modern Logic”, in P.A. Schilpp, ed., The Philosophy of Alfred North Whitehead, 2nd edition, New York: Tudor Publishing, 127–163.
  • Russell, B., 1905, “On Denoting”, Mind (N.S.), 14: 530–538.
  • Turing, A.M., 1942, “The Use of Dots as Brackets in Church’s System”, Journal of Symbolic Logic, 7: 146–156.
  • Whitehead, A.N. and B. Russell, [PM], Principia Mathematica, Cambridge: Cambridge University Press, 1910–13, 2nd edition, 1925–27.
  • Whitehead, A.N. and B. Russell, 1927, Principia Mathematica to ∗56, Cambridge: Cambridge University Press.

Other Internet Resources

  • Principia Mathematica, reproduced in the University of Michigan Historical Math Collection.
  • Russell’s “On Denoting”, from the reprint in Logic and Knowledge (R. Marsh, ed., 1956) of the original article in Mind 1905, typed into HTML by Cosma Shalizi (Center for the Study of Complex Systems, U. Michigan)

Acknowledgments

The author would like to thank: Gregory Landini, Dick Schmitt, Franz Fritsche, Rafal Urbaniak, Adam Trybus, Pawel Manczyk, Kenneth Blackwell, and Dirk Schlimm for corrections to this entry.

Copyright © 2016 by
Bernard Linsky <bernard.linsky@ualberta.ca>

This is a file in the archives of the Stanford Encyclopedia of Philosophy.