# Zermelo's Axiomatization of Set Theory

*First published Tue Jul 2, 2013*

The first axiomatisation of set theory was given by Zermelo in his
1908 paper “*Untersuchungen über die Grundlagen der
Mengenlehre, I*” (Zermelo 1908b), which became the basis for
the modern theory of sets. This entry focuses on the 1908
axiomatisation; a further entry will consider later axiomatisations of
set theory in the period 1920–1940, including Zermelo's second
axiomatisation of 1930.

## 1. The Axioms

The introduction to Zermelo's paper makes it clear that set theory is regarded as a fundamental theory:

Set theory is that branch of mathematics whose task is to investigate mathematically the fundamental notions “number”, “order”, and “function”, taking them in their pristine, simple form, and to develop thereby the logical foundations of all of arithmetic and analysis; thus it constitutes an indispensable component of the science of mathematics. (1908b: 261)

^{[1]}

This is followed by an acknowledgment that it is necessary to replace the central assumption that we can ‘assign to an arbitrary logically definable notion a “set”, or “class”, as its “extension” ’ (1908b: 261). Zermelo goes on:

In solving the problem [this presents] we must, on the one hand, restrict these principles [distilled from the actual operation with sets] sufficiently to exclude all contradictions and, on the other, take them sufficiently wide to retain all that is valuable in this theory. (1908b: 261)

The ‘central assumption’ which Zermelo describes (let us call it the Comprehension Principle, or CP) had come to be seen by many as the principle behind the derivation of the set-theoretic inconsistencies. Russell (1903: §104) says the following:

Perhaps the best way to state the suggested solution [of the Russell-Zermelo contradiction] is to say that, if a collection of terms can only be defined by a variable propositional function, then, though a class as many may be admitted, a class as one must be denied. We took it as axiomatic that the class as one is to be found wherever there is a class as many; but this axiom need not be universally admitted, and appears to have been the source of the contradiction. By denying it, therefore, the whole difficulty will be overcome.

But it is by no means clear that ‘the whole difficulty’
is thereby ‘overcome’. Russell makes a clear
identification of the principle he cites (a version of CP) as the
source of error, but this does not in the least make it clear what is
to take its
place.^{[2]}
In his *Grundgesetze* (see e.g., Frege
1903: §146–147) Frege recognises that his (in)famous Law V
is based on a conversion principle which allows us to assume that for
any concept (function), there is an object which contains precisely
those things which fall under that concept (or for which the function
returns the value ‘True’). Law V is then the principle
which says that two such extension objects *a*, *b* stemming
from two concepts *F*, *G* are the same if, and only
if, *F* and *G* are extensionally equivalent. Frege clearly
considers the ‘conversion’ of concepts to extensions as
fundamental; he also regards it as widely used in mathematics (even if
only implicitly), and thus that he is not ‘doing anything
new’ by using such a principle of conversion and the attendant
‘basic law of logic’, Law V. (The CP follows immediately
from Law V.) Frege was made aware by Russell (1902) that his Law V is
contradictory, since Russell's paradox flows easily from it. In the
Appendix to *Grundgesetze* (Frege 1903), Frege says this:

Hardly anything more unwelcome can befall a scientific writer than to have one of the foundations of his edifice shaken after the work is finished. This is the position into which I was put by a letter from Mr Bertrand Russell as the printing of this volume was nearing completion. The matter concerns my Basic Law (V). I have never concealed from myself that it is not as obvious as the others nor as obvious as must properly be required of a logical law. Indeed, I pointed out this very weakness in the foreword to the first volume, p. VII. I would gladly have dispensed with this foundation if I had known of some substitute for it. Even now, I do not see how arithmetic can be founded scientifically, how the numbers can be apprehended as logical objects and brought under consideration, if it is not—at least conditionally—permissible to pass from a concept to its extension. May I always speak of the extension of a concept, of a class? And if not, how are the exceptions to be recognised? May one always infer from the extension of one concept's coinciding with that of a second that every object falling under the first concept also falls under the latter? These questions arise from Mr Russell's communication. …What is at stake here is not my approach to a foundation in particular, but rather the very possibility of any logical foundation of arithmetic. (p. 253)

^{[3]}

The difficulty could hardly be summed up more succinctly. It was the replacement of assumptions involving the unfettered conversion of concepts to objects which was Zermelo's main task in his axiomatisation.

Zermelo's system was based on the presupposition that

Set theory is concerned with a “domain” 𝔅 of individuals, which we shall call simply “objects” and among which are the “sets”. If two symbols, *a* and *b*, denote the same object, we write *a* = *b*, otherwise *a* ≠ *b*. We say of an object *a* that it “exists” if it belongs to the domain 𝔅; likewise we say of a class 𝔎 of objects that “there exist objects of the class 𝔎” if 𝔅 contains at least one individual of this class. (1908b: 262)

Given this, the one fundamental relation is that of set membership,
‘ε’, which allows one to state that an
object *a* belongs to, or is in, a set *b*, written
‘*a* ε
*b*’.^{[4]}
Zermelo then laid down seven axioms which
give a partial description of what is to be found in 𝔅. These
can be described as follows:

*Extensionality*

This says roughly that sets are determined by the elements they contain.

*Axiom of Elementary Sets*

This asserts (a) the existence of a set which contains no members (denoted ‘0’ by Zermelo, now commonly denoted by ‘∅’); (b) the existence, for any object *a*, of the singleton set {*a*} which has *a* as its sole member; and (c) the existence, for any two objects *a*, *b*, of the unordered pair {*a*, *b*}, which has just *a*, *b* as its members.

*Separation* (*Aussonderungsaxiom*)

This asserts that, for any given set *a*, and any given ‘definite’ property of elements in 𝔅 (more on this below), one can ‘separate’ out from *a* as a set just those elements which satisfy the given property.

*Power Set*

This says that for any set, the collection of all subsets of that set is also a set.

*Union*

This says that for any set, the collection of the members of the members of that set also forms a set.

*Choice*

This says that for any set of pairwise disjoint, non-empty sets, there exists a set (which is a subset of the union set to which the given set gives rise) which contains exactly one member from each member of the given set.

*Infinity*

This final axiom asserts the existence of an infinitely large set which contains the empty set, and for each set *a* that it contains, also contains the set {*a*}. (Thus, this infinite set must contain ∅, {∅}, {{∅}}, ….)
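For orientation, the seven axioms can be paraphrased in modern first-order notation. This is only a sketch: it ignores the non-set objects that Zermelo allows in 𝔅, it uses ∈ in place of Zermelo's ε, and the bound variable names are chosen here for illustration. In Separation, φ ranges over ‘definite’ properties and must not mention the set *s* being defined.

```latex
% Modern paraphrase of the 1908 axioms (assumption: every object is a set).
\begin{align*}
&\text{Extensionality:} && \forall a\,\forall b\,\bigl[\forall x\,(x\in a\leftrightarrow x\in b)\rightarrow a=b\bigr]\\
&\text{Elementary sets (a):} && \exists e\,\forall x\,(x\notin e)\\
&\text{Elementary sets (b):} && \forall a\,\exists s\,\forall x\,(x\in s\leftrightarrow x=a)\\
&\text{Elementary sets (c):} && \forall a\,\forall b\,\exists p\,\forall x\,(x\in p\leftrightarrow x=a\lor x=b)\\
&\text{Separation:} && \forall a\,\exists s\,\forall x\,\bigl(x\in s\leftrightarrow x\in a\land\varphi(x)\bigr)\\
&\text{Power set:} && \forall a\,\exists p\,\forall x\,(x\in p\leftrightarrow x\subseteq a)\\
&\text{Union:} && \forall a\,\exists u\,\forall x\,\bigl(x\in u\leftrightarrow\exists y\,(y\in a\land x\in y)\bigr)\\
&\text{Choice:} && \forall a\,\Bigl[\bigl(\forall y\in a\;y\neq\emptyset\bigr)\land
   \bigl(\forall y,z\in a\;(y\neq z\rightarrow y\cap z=\emptyset)\bigr)\\
& && \qquad\rightarrow\exists c\subseteq\textstyle\bigcup a\;\;\forall y\in a\;\exists!\,x\,(x\in c\cap y)\Bigr]\\
&\text{Infinity:} && \exists z\,\bigl[\emptyset\in z\land\forall a\,(a\in z\rightarrow\{a\}\in z)\bigr]
\end{align*}
```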

With the inclusion of this last, Zermelo explicitly rejects any
attempt to *prove* the existence of an infinite collection from other
principles, as we find in Dedekind (1888: §66), or in Frege via
the establishment of what is known as ‘Hume's Principle’.

The four central axioms of Zermelo's system are the Axioms of
Infinity and Power Set, which together show the existence of
uncountable sets, the Axiom of Choice, to which we will devote some
space below, and the Axiom of Separation. This latter allows that any
‘definite’ property φ does in fact give rise to a set,
namely the set of all those things which are already included in some
set *a* and which have the property φ, in other words, gives
rise to a certain subset of *a*, namely the subset of all the
φ-things in *a*. Thus, it follows from this latter that there
will generally be many sets giving partial extensions of φ, namely
the φ-things in *a*, the φ-things in *b*, the
φ-things in *c*, and so on. However, there will be no
guarantee of the existence of a unique extension-set for φ, as, of
course, there is under the CP, namely *a* = {*x* :
φ(*x*)}.
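The contrast between the two principles can be displayed schematically. The notation below is a modern sketch, not Zermelo's own; the subscripted names *a*_φ, *b*_φ, *c*_φ are introduced here purely for illustration.

```latex
\begin{align*}
\text{CP:}\quad & \exists s\,\forall x\,\bigl(x\in s\leftrightarrow\varphi(x)\bigr)
   && \text{a single extension } \{x:\varphi(x)\}\\
\text{Separation:}\quad & a_\varphi=\{x\in a:\varphi(x)\},\;\;
   b_\varphi=\{x\in b:\varphi(x)\},\;\;
   c_\varphi=\{x\in c:\varphi(x)\},\ \dots
   && \text{one subset for each given set}
\end{align*}
% If the CP extension E = {x : \varphi(x)} existed, these would be a \cap E, b \cap E, c \cap E, ...
% Separation asserts the partial extensions without asserting E itself.
```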

Zermelo shows that, on the basis of his system, the two central paradoxes, that of the greatest set and that of Russell, cannot arise. In fact, Zermelo proves:

Every set *M* possesses at least one subset *M*_{0} that is not an element of *M*. (1908b: 265)

The proof is an easy modification of the argument for Russell's
Paradox, using the contradiction this time as a *reductio*. By
Separation, let *M*_{0} be the subset of *M*
consisting of those elements *x* of *M* such
that
*x* ∉ *x*. Now either
*M*_{0} ∈ *M*_{0}
or *M*_{0} ∉ *M*_{0}. Assume
that
*M*_{0} ∈ *M*_{0}. Since
*M*_{0} is a subset
of *M*, this tells us that
*M*_{0} ∈ *M*. But *M*_{0} is then a member of *M*
which fails to satisfy the condition for belonging
to *M*_{0}, showing
that
*M*_{0} ∉ *M*_{0}, which is a
contradiction. Hence,
necessarily,
*M*_{0} ∉ *M*_{0}. But now
if we suppose that *M*_{0} were in *M*,
then *M*_{0} itself is bound to be
in *M*_{0} by the defining condition of this
set. Hence,
*M*_{0} ∉ *M* on pain of
contradiction. The argument for the Russell paradox is used here to
constructive effect: one person's contradiction is another person's
*reductio*. Zermelo comments:

It follows from the theorem that not all objects *x* of the domain 𝔅 can be elements of one and the same set; that is, the domain 𝔅 is not itself a set, and this disposes of the “Russell antinomy” so far as we are concerned. (1908b: 265)

For, in the absence of something like the CP, there is no
overriding reason to think that there must *be* a universal
set.^{[5]}
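The steps of Zermelo's argument, and the corollary about the universal set, can be laid out line by line. This is a sketch in modern notation using ∈ for membership; *U* is a hypothetical name for a supposed universal set.

```latex
\begin{align*}
&M_0 := \{x\in M : x\notin x\} && \text{(exists by Separation)}\\
&\forall y\,\bigl(y\in M_0\leftrightarrow y\in M\land y\notin y\bigr) && \text{(defining condition)}\\
&M_0\in M_0\rightarrow M_0\notin M_0 && \text{(left to right, with } y:=M_0\text{)}\\
&\therefore\; M_0\notin M_0 && \text{(reductio)}\\
&M_0\in M\rightarrow\bigl(M_0\in M\land M_0\notin M_0\bigr)\rightarrow M_0\in M_0 && \text{(right to left, with } y:=M_0\text{)}\\
&\therefore\; M_0\notin M && \text{(reductio)}
\end{align*}
% Corollary: if U contained every object of the domain, then U_0 would be
% an object with U_0 \notin U, so no such U exists.
```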

But although this deals with the Russell paradox and the paradox of the universal set, it does not tackle the general consistency of the system. Zermelo was well aware of this, as is clear from the Introduction to his paper:

I have not yet even been able to prove rigorously that my axioms are “consistent”, though this is certainly very essential; instead I have had to confine myself to pointing out now and then that the “antinomies” discovered so far vanish one and all if the principles here proposed are adopted as a basis. But I hope to have done at least some useful spadework hereby for subsequent investigations in such deeper problems. (1908b: 262)

It should be remarked in passing that Zermelo doesn't deal specifically with the
Burali-Forti paradox either, for the simple reason that it cannot be properly
formulated in his system, since it deals either with well-orderings
generally or with the general concept of ordinal number. We will come
back to this below. However, assuming that the known paradoxes *can* be
avoided, another question comes to the fore: if the Separation Axiom
is to be the basic principle for the workaday creation of sets, is it
*adequate*? This question, too, will be taken up later.

There were attempts at the statement of axioms before Zermelo, both
publicly and in private
correspondence.^{[6]}
In particular, Cantor, in correspondence
with Hilbert and Dedekind in the late 1890s, had endeavoured to
describe some principles of set
existence^{[7]}
which he thought were legitimate, and would
not give rise to the construction of what he called
‘inconsistent totalities’, totalities which engender
contradictions. (The best known of these totalities were the totality
of all ordinals and the totality of all cardinals.) These principles
included those of set union and a form of the replacement axiom, as
well as principles which seem to guarantee that every cardinal number
is an aleph, which we call for short the ‘Aleph Hypothesis
(AH)’.

Despite this, there are reasons for calling Zermelo's system the first real axiomatisation of set theory. It is clear above all that Zermelo's intention was to reveal the fundamental nature of the theory of sets and to preserve its achievements, while at the same time providing a general replacement for the CP.

## 2. The Background to Zermelo's Axiomatisation

### 2.1 Hilbert's Axiomatic Method

Hilbert's early work on the axiomatic method is an important part
of the context of Zermelo's axiomatisation. Hilbert developed a
particular version of the axiomatic approach to fundamental
mathematical theories in his work on geometry in the period
1894–1904 (see Hallett and Majer 2004). This was to be seen as a
distinct alternative to what Hilbert called the ‘genetic
approach’ to mathematics. (For a short, historically informed
description, see Felgner 2010: 169–174.) Ebbinghaus's book on
Zermelo makes it very clear how embedded Zermelo was in the Hilbert
foundational circle in the early years of the
century.^{[8]}
This is not meant to suggest that Zermelo adopted Hilbert's approach
to the foundations of mathematics in all its aspects. Indeed, Zermelo
developed his own, distinctive approach to foundational matters which
was very different from Hilbert's, something which emerges quite
clearly from his later work. Nevertheless, there are two elements of
Zermelo's procedure which fit very well with Hilbert's foundational
approach in the early part of the century. The first element concerns
what might be called the programmatic element of Hilbert's treatment
of the foundations of mathematics as it emerged in the later 1890s,
and especially with regard to the notion of mathematical
existence. And the second concerns proof analysis, a highly important
part of Hilbert's work on Euclidean geometry and geometrical systems
generally. These matters are intricate, and cannot be discussed
adequately here (for fuller discussion, see both Hallett 2008 and
2010a). But it is important for understanding Zermelo's work fully
that a rough account be given.

#### 2.1.1 Programmatic elements

First, Hilbert adopted the view that a mature presentation of a mathematical theory must be given axiomatically. This, he claims, requires several things:

- The postulation of the existence of a domain, of a ‘system (or systems) of things’.
- The insistence, however, that nothing is known about those things except what is expressed in, or can be derived from, a finite list of axioms.
- The requirement, along with this, of finite proofs, which begin with axioms and proceed from these to a conclusion by a ‘finite number of inferences’ (i.e., acceptable inferential steps).
- The rather imprecise notion of the ‘completeness’ of the axiomatisation, which involves, loosely, showing that the axioms can prove all that they ‘ought’ to prove.
- The provision of a consistency proof for these axioms, showing that no contradiction is derivable by a proof constructed in the system given.

For one thing, Hilbert was very clear (especially in his unpublished lectures on geometry: see Hallett and Majer 2004) that, although a domain is asserted to ‘exist’, all that is known about the objects in the domain is what is given to us by the axioms and what can be derived from these through ‘finite proof’. In other words, while a domain is postulated, nothing is taken to be known about the things in it independently of the axioms laid down and what they entail. The basic example was given by geometrical systems of points, lines and planes; although the geometrical domain is made up of these things, nothing can be assumed known about them (in particular no ‘intuitive’ geometrical knowledge from whatever source) other than what is given in the axioms or which can be derived from them by legitimate inference. (The axioms themselves might sum up, or be derived from, ‘intuitive’ knowledge, but that is a different matter. And even here it is important that we can detach the axioms from their intuitive meanings.)

Secondly, while ‘existence’ of the objects is just a
matter (as Zermelo says) of belonging to the domain (a fact which is
established by the axioms or by proofs from those axioms), the
mathematical existence of the domain itself, and (correspondingly) of
the system set out by the axioms, is established only by a consistency
proof for the axioms. Thus, to take the prime example, the
‘existence’ of Euclidean geometry (or more accurately
Euclidean geometries) is shown by the consistency proofs given by
means of analytic
geometry.^{[9]}
Thus, the unit of consistency is not the
concept nor the individual propositions, but rather the system of
axioms as a whole, and different systems necessarily give accounts of
different primitives. The expectation is that when a domain is
axiomatised, attention will turn (at some point) to a consistency
proof, and this will deal finally with the question of mathematical
existence. In any case, the task of showing existence is a
mathematical one and there is no further ontological or metaphysical
mystery to be solved once the axioms are laid down.

Many aspects of Hilbert's position are summed up in this passage from his 1902 lectures on the foundations of geometry: the axioms ‘create’ the domains, and the consistency proofs justify their existence. As he puts it:

The things with which mathematics is concerned are defined through axioms, *brought into life*. The axioms can be taken quite arbitrarily. However, if these axioms contradict each other, then no logical consequences can be drawn from them; the system defined then does not exist for the mathematician. (Hilbert 1902: 47 or Hallett and Majer 2004: 563)

This notion of ‘definition through axioms’, what came to be known as the method of ‘implicit definition’, can be seen in various writings of Hilbert's from around 1900. His attitude to existence is illustrated in the following passage from his famous paper on the axiomatisation of the reals:

The objections which have been raised against the existence of the totality of all real numbers and infinite sets generally lose all their justification once one has adopted the view stated above [the axiomatic method]. By the set of the real numbers we do not have to imagine something like the totality of all possible laws governing the development of a fundamental series, but rather, as has been set out, a system of things whose mutual relations are given by the *finite and closed* systems of axioms I–IV [for complete ordered fields] given above, and about which statements only have validity in the case where one can derive them via a finite number of inferences from those axioms. (Hilbert 1900b: 184)^{[10]}

The parallels between this ‘axiomatic method’ of
Hilbert's and Zermelo's axiomatisation of set theory are
reasonably clear, if not
exact.^{[11]}
Particularly clear are the assumption of the existence of a
‘domain’ 𝔅, the statement of a finite list of
axioms governing its contents, and the recognition of the requirement
of a general consistency proof. There's also implicit
recognition of the requirements of ‘finite proof’; this
leads us to the second important aspect of the Hilbertian background,
namely proof analysis and the use of the Axiom of Choice.

#### 2.1.2 Proof analysis and Zermelo's Well-Ordering Theorem [WOT]

A great deal of Hilbert's work on geometry concerned the analysis
of proofs, of what can, or cannot, be derived from what. Much of
Hilbert's novel work on geometry involved the clever use of
(arithmetical) models for geometrical systems to demonstrate a
succession of independence results, which, among other things, often
show how finely balanced various central assumptions
are.^{[12]}
Moreover, a close reading of Hilbert's work makes it clear that the
development of an appropriate axiom system itself goes hand-in-hand
with the reconstruction and analysis of proofs.

One straightforward kind of proof analysis was designed to reveal what assumptions there are behind accepted ‘theorems’, and this is clearly pertinent in the case of Zermelo's Axiom of Choice (his sixth axiom) and the WOT. What Zermelo's work showed, in effect, is that the ‘choice’ principle behind the Axiom is a necessary and sufficient condition for WOT; and he shows this by furnishing a Hilbertian-style proof for the theorem, i.e., a conclusion which follows from (fairly) clear assumptions by means of a finite number of inferential steps. Indeed, the Axiom is chosen so as to make the WOT provable, and it transpired subsequently that it also made provable a vast array of results, mainly (but not solely) in set theory and in set-theoretic algebra. To understand the importance of Zermelo's work, it's necessary to appreciate the centrality of the WOT.

### 2.2 The Well-Ordering Problem and the Well-Ordering Theorem

#### 2.2.1 The importance of the problem before Zermelo

In one of the fundamental papers in the genesis of set theory,
Cantor (1883a) isolated the notion of a well-ordering on a collection
as one of the central conceptual pillars on which number is
built. Cantor took the view that the notion of a counting number must
be based on an underlying ordering of the set of things being counted,
an ordering in which there is a first element counted, and, following
any collection of elements counted, there must be a next element
counted, assuming that there are elements still uncounted. This kind
of ordering he called a ‘well-ordering’, which we now
define as a total-ordering with an extra condition, namely that any
non-empty subset has a least element in the ordering. Cantor recognised that
each distinct well-ordering of the elements gives rise to a distinct
counting number, what he originally called an ‘*Anzahl*
[enumeral]’, later an ‘*Ordnungszahl* [ordinal
number]’, numbers which are conceptually quite different from
*cardinal numbers* or *powers*, meant to express just the size of
collections.^{[13]}
This distinction is hard to perceive at first sight. Before Cantor and
the rise of the modern theory of transfinite numbers, the standard
counting numbers were the ordinary finite
numbers.^{[14]}
And, crucially, for finite collections, it turns out that any two
orderings of the same underlying elements, which are certainly
well-orderings in Cantor's sense, are order-isomorphic, i.e., not
essentially
distinct.^{[15]}
This means that one can in effect
identify a number arrived at by counting (an ordinal number) with the
cardinal number of the collection counted. Thus, the ordinary natural
numbers appear in two guises, and it is possible to determine the size
of a finite collection directly by counting it. Cantor observed that
this ceases to be the case in rather dramatic fashion once one
considers infinite collections; here, the same elements can give rise
to a large variety of distinct well-orderings.
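A small worked example shows how the same infinite collection supports distinct well-orderings. The relation ≺ below is an illustrative construction, not one taken from Cantor's text; it simply moves 0 to the end of the natural numbers.

```latex
\begin{align*}
&\text{usual order } <: && 0<1<2<3<\cdots && \text{order type } \omega\\
&\text{reordering } \prec: && 1\prec 2\prec 3\prec\cdots\prec 0 && \text{order type } \omega+1
\end{align*}
% Definition: m \prec n  iff  (m \neq 0 and n = 0)  or  (m \neq 0, n \neq 0 and m < n).
% Well-ordering: a non-empty subset containing some non-zero number has its
%   <-least non-zero member as \prec-least element; otherwise it is {0}.
% Not order-isomorphic: (N, \prec) has a greatest element, (N, <) has none.
```

Both orderings have the same underlying set, hence the same power ℵ_{0}, yet they count as different ordinal numbers. For a finite set, moving one element to the end in this way yields an ordering isomorphic to the original.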

Nevertheless, Cantor noticed that if one collects together all the
countable ordinal numbers, i.e., the numbers representing
well-orderings of the set of natural numbers, this collection, which
Cantor called the *second number-class* (the first being the set of
natural numbers), must be of greater cardinality than that of the
collection of natural numbers itself. Moreover, this size is the
cardinal *successor* to the size of the natural numbers in the very
clear sense that any infinite subset of the second number-class is
either of the power of the natural numbers or of the power of the
whole class; thus, there can be no size which is strictly
intermediate. The process generalises: collect together all the
ordinal numbers representing well-orderings of the second number-class
to form the third number-class, and this must be the immediate
successor in size to that of the second number-class, and so on. In
this way, Cantor could use the ordinal numbers to generate an infinite
sequence of cardinalities or powers. This sequence was later
(Cantor 1895) called the aleph-sequence, ℵ_{0} (the
size of the natural numbers), ℵ_{1} (expressing the
size of the second number-class), ℵ_{2} (expressing the
size of the third number-class), and so on. Since the intention was
that ordinal numbers could be generated arbitrarily far, then so too,
it seems, could the alephs.
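The construction of the number-classes described above can be summarised as follows. The labels (I), (II), (III) are used here for illustration, and the display is a sketch of the pattern rather than Cantor's own formulation.

```latex
\begin{align*}
(\mathrm{I}) &= \{\alpha : \alpha \text{ is a finite number}\}, & |(\mathrm{I})| &= \aleph_0\\
(\mathrm{II}) &= \{\alpha : \alpha \text{ represents a well-ordering of } (\mathrm{I})\}, & |(\mathrm{II})| &= \aleph_1\\
(\mathrm{III}) &= \{\alpha : \alpha \text{ represents a well-ordering of } (\mathrm{II})\}, & |(\mathrm{III})| &= \aleph_2
\end{align*}
% Successor property for (II): every infinite X \subseteq (II) has
% |X| = \aleph_0 or |X| = \aleph_1, so no power lies strictly between them.
```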

This raises the possibility of reinstating the centrality of the ordinal numbers as the fundamental numbers even in the case of infinite sets, thus making ordinality the foundation of cardinality for all sets. In work after 1883, Cantor attempted to show that the alephs actually represent a scale of infinite cardinal number. For instance, it is shown that the ordinal numbers are comparable, i.e., for any two ordinal numbers α, β, either α < β, α = β or α > β, a desirable, perhaps essential, property of counting numbers. Comparability thereby transfers to the alephs, and Cantor was able to define clear and appropriate arithmetical operations of addition, multiplication and exponentiation, generalising the corresponding notions for finite collections, and to state and prove general laws concerning them.

In 1878, Cantor had put forward the hypothesis that there is no
infinite power between that of the natural numbers and the
continuum. This became known as Cantor's Continuum Hypothesis
(CH). With the adumbration of the number classes, CH takes on the form
that the continuum has the power of the second number-class, and with
the development of the aleph-scale, it assumes the form of a
conjecture about the exponentiation operation in the generalised
cardinal arithmetic, for it can be expressed in the form
2^{ℵ_{0}} =
ℵ_{1}. The *continuum problem* more generally
construed is really the problem of where the power of the continuum is
in the scale of aleph numbers, and the generalised continuum
hypothesis is the conjecture that taking the power set of an infinite
set corresponds to moving up just one level in the aleph scale. For
example, in 1883, Cantor had assumed (without remark) that the set of
all real functions has the size of the third number-class. Given the
CH, this then becomes the conjecture that
2^{ℵ_{1}} = ℵ_{2}.
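The arithmetic behind these reformulations can be made explicit. The following is a sketch in modern cardinal notation; writing the set of all real functions as ℝ^ℝ is a convention assumed here.

```latex
\begin{align*}
\text{CH:}\quad & 2^{\aleph_0}=\aleph_1\\
\text{GCH:}\quad & 2^{\aleph_\alpha}=\aleph_{\alpha+1}\quad\text{for every ordinal } \alpha\\
\text{real functions:}\quad & |\mathbb{R}^{\mathbb{R}}|
   =\bigl(2^{\aleph_0}\bigr)^{2^{\aleph_0}}
   =2^{\aleph_0\cdot 2^{\aleph_0}}
   =2^{2^{\aleph_0}}\\
\text{given CH:}\quad & |\mathbb{R}^{\mathbb{R}}|=2^{\aleph_1},
   \ \text{so Cantor's assumption } |\mathbb{R}^{\mathbb{R}}|=\aleph_2
   \text{ reads } 2^{\aleph_1}=\aleph_2
\end{align*}
```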

But adopting the aleph scale as a framework for infinite
cardinality depends on significant assumptions. It is clear that any
collection in well-ordered form (given that it is represented by an
ordinal) must have an aleph-number representing its size, so clearly
the aleph-sequence represents the sizes (or *powers* as Cantor called
them) of all the well-ordered sets. However, can *any* set be put into
well-ordered form? A particular question of this form concerns the
continuum itself: if the continuum is equivalent to the second
number-class, then clearly it can be well-ordered, and indeed this is
a necessary condition for showing that the continuum is represented at
all in the scale. But *can* it be well-ordered? More generally, to
assume that *any* cardinality is represented in the scale of aleph
numbers is to assume in particular that *any* set can be
well-ordered. And to assume that the aleph-sequence is *the* scale of
infinite cardinal number is to assume at the very least that sets
generally can be compared cardinally; i.e., that for any *M*, *N*, either
*M* ≼ *N* or
*N* ≼ *M*, COMP for short. But is this
correct?
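The hypotheses in play, and the dependencies just described, can be set side by side. This is a summary sketch; the symbol ≼ is read in the usual modern way, as the existence of a one-to-one map.

```latex
% \preccurlyeq requires the amssymb package.
\begin{align*}
M\preccurlyeq N\ &:\Longleftrightarrow\ \text{there is a one-to-one map from } M \text{ into } N\\
\text{WOH:}\quad & \text{every set can be put into well-ordered form}\\
\text{AH:}\quad & \text{the cardinal number of every infinite set is an aleph}\\
\text{COMP:}\quad & \text{for all sets } M, N:\ M\preccurlyeq N\ \text{or}\ N\preccurlyeq M\\
& \text{WOH}\ \Longrightarrow\ \text{AH}\ \Longrightarrow\ \text{COMP}
\end{align*}
% WOH => AH: a well-ordered set is represented by an ordinal, hence by an aleph.
% AH => COMP: the alephs inherit comparability from the ordinals.
```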

When introducing the notion of well-ordering in 1883, Cantor expressed his belief that the fact that any set (‘manifold’) can be well-ordered is ‘a law of thought [Denkgesetz]’, thus putting forward what for convenience we can call the well-ordering hypothesis (WOH):

The concept of *well-ordered set* reveals itself as fundamental for the theory of manifolds. That it is always possible to arrange any *well-defined* set in the form of a *well-ordered* set is, it seems to me, a very basic law of thought, rich in consequences, and particularly remarkable in virtue of its general validity. I will return to this in a later memoir. (Cantor 1883a or 1932: 169)

Cantor says nothing about what it might mean to call the
well-ordering hypothesis a ‘law of thought’, and he never
did return to this question directly; however, in one form or another,
this claim is key. It could be that Cantor at this time considered the
WOH as something like a logical
principle.^{[16]} This, however, is not
particularly clear, especially since the study of formal logic
adequate for mathematical reasoning was only in its infancy, and the
set concept itself was new and rather unclearly delimited. Another
suggestion is that well-orderability is intrinsic to the way that
‘well-defined’ sets are either presented or conceived,
e.g., that it is impossible to think of a collection's being a
set without at the same time allowing that its elements can be
arranged ‘discretely’ in some way, or even that such
arrangement can be automatically deduced from the
‘definition’. Thus, if one views sets as necessary for
mathematics, and one holds that the concept of set itself necessarily
involves the discrete arrangement of the elements of the set, then WOH
might appear necessary, too. But all of this is imprecise, not least
because the notion of set itself was imprecise and imprecisely
formulated. One clear implication of Cantor's remark is that he
regards the WOH as something which does not require
proof. Nonetheless, not long after he had stated this, Cantor clearly
had doubts both about the well-orderability of the continuum and about
cardinal comparability (see Moore 1982: 44). All of
this suggested that the WOH, and the associated hypothesis that the
alephs represent the scale of infinite cardinality, do require proof,
and cannot just be taken as ‘definitional’. Thus, it
seemed clear that the whole Cantorian project of erecting a scale of
infinite size depends at root on the correctness of the WOH.

Work subsequent to 1884 suggests that Cantor felt the need to
supply arguments for well-ordering. For instance, to
show that every infinite set *T* has a countable subset (and thus
that ℵ_{0} is the smallest infinite cardinality), Cantor (1895: 493) set
out to *prove the existence* of a subset of *T* which is
well-ordered like the natural numbers. The key point to observe here
is that Cantor felt it necessary to *exhibit* a well-ordered subset
of *T*, and did not simply proceed by first assuming (by appeal
to his ‘*Denkgesetz*’) that *T* can be
arranged in well-ordered form. He exhibits such a subset in the
following way:

*Proof.* If one has removed from *T* a finite number of elements *t*_{1}, *t*_{2}, …, *t*_{ν−1} according to some rule, then the possibility always remains of extracting a further element *t*_{ν}. The set {*t*_{ν}}, in which ν denotes an arbitrary finite, cardinal number, is a subset of *T* with the cardinal number ℵ_{0}, because {*t*_{ν}} ∼ {ν}. (Cantor 1895: 493)

In 1932, Zermelo edited Cantor's collected papers (Cantor 1932), and commented on this particular proof as follows:

The “proof” of Theorem A, which is purely intuitive and logically unsatisfactory, recalls the well-known primitive attempt to arrive at a *well-ordering* of a given set by successive removal of arbitrary elements. We arrive at a correct proof only when we *start from* an already *well-ordered* set, whose smallest transfinite initial segment in fact has the cardinal number ℵ_{0} sought. (Zermelo in Cantor 1932: 352)
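The gap Zermelo points to can be located by writing Cantor's successive extraction as a recursion. The following is a sketch in modern notation, not Cantor's or Zermelo's own formulation; the selector γ is a name introduced here for illustration.

```latex
% Assumption: \gamma assigns to each non-empty S \subseteq T an element \gamma(S) \in S.
\begin{align*}
t_1 &:= \gamma(T),\\
t_\nu &:= \gamma\bigl(T\setminus\{t_1,\dots,t_{\nu-1}\}\bigr)\qquad(\nu=2,3,\dots)
\end{align*}
% T infinite  =>  T \setminus \{t_1,...,t_{\nu-1}\} is non-empty at every finite stage,
% so each t_\nu is defined and the t_\nu are pairwise distinct: \{t_\nu\} \sim \{\nu\}.
```

Once γ is given, the rest is routine. What Cantor's appeal to ‘some rule’ leaves unsecured is the existence of a single γ that selects an element from every remainder.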

The second context in which an argument was given was an attempt
by Cantor (in correspondence first with Hilbert and then Dedekind) to
show that every set must have an aleph-number as a
cardinal.^{[17]} What Cantor attempts to
show, in effect, is the following. Assume that Ω represents the
sequence of all ordinal numbers, and assume (for a *reductio* argument)
that *V* is a ‘multiplicity’ which is not equivalent
to any aleph. Then Cantor argues that Ω can be
‘projected’ into *V*, in turn showing that *V*
must be what he calls an ‘inconsistent multiplicity’,
i.e., not a legitimate set. It will follow that all sets have alephs
as cardinals, since they will always be ‘exhausted’ by
such a projection by some ordinal or other, in which case they will be
cardinally equivalent to some ordinal
number-class.^{[18]} Zermelo's
dismissal of this attempted proof is no surprise, given the comments
just quoted. But he also comments further here exactly on this
‘projection’:

The weakness of the proof outlined lies precisely here. It is *not* proved that the whole series of numbers Ω can be “projected into” any multiplicity *V* which does not have an aleph as a cardinal number, but this is rather taken from a somewhat vague “intuition”. Apparently *Cantor* imagines the numbers of Ω successively and arbitrarily assigned to elements of *V* in such a way that every element of *V* is only used once. *Either* this process must then come to an end, in that all elements of *V* are used up, in which case *V* would then be coordinated with an *initial* segment of the number series, and its power consequently an aleph, contrary to assumption; *or* *V* would remain inexhaustible and would then contain a component equivalent to the whole of Ω, thus an inconsistent component. Here, the intuition of time [Zeitanschauung] is being applied to a process which goes beyond all intuition, and a being [Wesen] supposed which can make *successive* arbitrary choices and thereby define a subset *V*′ of *V* which is not definable by the conditions given. (Zermelo in Cantor 1932: 451)^{[19]}

If it really is ‘successive’ selection which is relied
on, then it seems that one must be assuming a subset of instants of
time which is well-ordered and which forms a base ordering from which
the ‘successive’ selections are made. In short, what is
really presupposed is a well-ordered subset of temporal instants which
acts as the basis for a recursive definition. Even in the case of
countable subsets, if the ‘process’ is actually to come to
a conclusion, the ‘being’ presupposed would presumably
have to be able to distinguish a (countably) infinite, discrete
sequence of instants within a finite time, and this assumption is, as
is well-known, a notoriously controversial one. In the general case,
the position is actually worse, for here the question of the
well-orderability of the given set depends at the very least on the
existence of a well-ordered subset of temporal instants of arbitrarily
high infinite cardinality. This appears to go against the assumption
that time is an ordinary continuum, i.e., of cardinality
2^{ℵ_{0}}, unless of course the power set of
the natural numbers itself is too ‘big’ to be counted by
any ordinal, in which case much of the point of the argument would be
lost, for one of its aims is presumably to show that the power of the
continuum is somewhere in the
aleph-sequence.^{[20]}

Part of what is at issue here, at least implicitly, is what
constitutes a proof. It seems obvious that if a set is non-empty, then
it must be possible to ‘choose’ an element from it (i.e.,
there must exist an element in it). Indeed, the obviousness of this is
enshrined in the modern logical calculus by the way the inference
principle of Existential Instantiation (EI) usually works: from
∃*x**P**x* one assumes *Pc*, where
‘*c*’ is a new constant, and reasons on that basis;
whatever can be inferred from
*Pc* (as long as it does not itself contain the new constant
‘*c*’) is then taken to be inferable from ∃*x**P**x*
alone. Furthermore, it is clear how this extends to finite sets (or
finite extensions) by stringing together successive inferential
steps. But how can such an inferential procedure be extended to
infinite sets, if at all?

Some evidence of the centrality of WOH is provided by Problem 1 on Hilbert's list of mathematical problems in his famous lecture to the International Congress of Mathematicians in Paris in 1900. He notes Cantor's conviction of the correctness of CH, and its ‘great probability’, then goes on to mention another ‘remarkable assertion’ of Cantor's, namely his belief that the continuum, although not (in its natural order) in well-ordered form, can be rearranged as a well-ordered set. However, Russell, writing at roughly the same time, expressed doubts about precisely this:

Cantor assumes as an axiom that every class is the field of some well-ordered series, and deduces that all cardinals can be correlated with ordinals …. This assumption seems to me unwarranted, especially in view of the fact that no one has yet succeeded in arranging a class of 2^{α_{0}} terms in a well-ordered series. (Russell 1903: 322–323)

He goes on:

We do not know that of any two different cardinal numbers one must be the greater, and it may be that 2^{α_{0}} is neither greater nor less than α_{1} and α_{2} and their successors, which may be called well-ordered cardinals because they apply to well-ordered series. (Russell 1903: 323)^{[21]}

And recall that, at the International Congress of Mathematicians in
Heidelberg in 1904, König had given an apparently convincing
proof that the continuum *cannot* be an aleph. König's
argument, as we know, turned out to contain fatal flaws, but in any
case, the confusion it exhibits is
instructive.^{[22]}

In short, the clear impression in the immediate period leading up
to Zermelo's work was *both* that only the WOH would provide a
solid foundation on which to build a reasonable notion of infinite
cardinal number as a proper framework for tackling CH, *and* that WOH
requires justification, that it must become, in effect, the WOT, the
WO-Theorem. Establishing the WOT was thus closely bound up with
the clarification of what it is to count as a set.

#### 2.2.2 Zermelo's 1904 Proof of the Well-Ordering Theorem

Zermelo's approach to the well-ordering problem took place in three stages. He published a proof of WOT in 1904 (Zermelo 1904, an extract from a letter to Hilbert), where he first introduced the ‘choice’ principle. Despite the name it has come to bear, this principle was designed to move away from the Cantorian ‘choosing’ arguments which almost universally preceded Zermelo's work; it postulates instead that arbitrary ‘choices’ have already been made. This paper produced an outcry, to which Zermelo responded by producing a new proof (1908a), which again uses the choice principle, but this time in a somewhat different form and expressed now explicitly as an axiom. The first three pages of this paper give the new proof; these are followed by seventeen pages which reply in great detail to many of the objections raised against the first proof. These consisted in objections to the choice principle itself, and also objections to the unclarity of the underlying assumptions about, and operation with, sets used in the proof. This paper was followed just two weeks later by Zermelo's official axiomatisation (1908b), an axiomatisation which to a large degree was prefigured in the paper (1908a).

Zermelo's 1904 proof can be briefly described.

- (1) Let *M* be an arbitrarily given set, and let **M** be its power set. Assume given what Zermelo calls a ‘covering’ of **M**, i.e., a function γ from non-empty elements of **M** to *M* such that γ(*X*) ∈ *X*, in other words, what would now be called a choice function. The argument then shows that such a γ determines a unique well-ordering of *M*.^{[23]}
- (2) Using a fixed such γ, Zermelo then defines the so-called γ-sets *M*_{γ}. These satisfy the following conditions:
    - (a) *M*_{γ} ⊆ *M*;
    - (b) *M*_{γ} is well-ordered by some ordering ≺ specific to *M*_{γ};
    - (c) if *a* ∈ *M*_{γ}, then *a* must determine an initial segment *A* of *M*_{γ} under ≺; but now γ and ≺ must be related in such a way that *a* = γ(*M* − *A*), i.e., *a* is the ‘distinguished element’ (as Zermelo calls it) of the complement of *A* in *M*.
- (3) There clearly are γ-sets: {*m*_{1}} is one such, where *m*_{1} = γ(*M*) and where we take the trivial well-ordering. The set {*m*_{1}, *m*_{2}} is also a γ-set, where again *m*_{1} = γ(*M*), *m*_{2} = γ(*M* − {*m*_{1}}), and {*m*_{1}, *m*_{2}} is given the ordering which places *m*_{2} after *m*_{1}. (Note that {*m*_{1}, *m*_{2}} with the other ordering would not be a γ-set.) In fact, it is easy to see that if *M*′ ⊆ *M* is to be a γ-set, then condition (2)(c) means that ≺ is uniquely (one is tempted to say, recursively) determined.
- (4) Indeed, following this, Zermelo shows that of any two distinct γ-sets, one is identical to an initial segment of the other, and the well-ordering of the latter extends the well-ordering of the former.
- (5) Zermelo now considers the set *L*_{γ}, which is the union taken over all the γ-sets. It is not difficult to see that *L*_{γ} itself must be a γ-set, indeed, the largest such. By definition, *L*_{γ} ⊆ *M*; but Zermelo shows that equality must hold. If not, then *M* − *L*_{γ} would be a non-empty subset of *M*, in which case we can consider γ(*M* − *L*_{γ}) = *m*_{1}′. Now form *L*_{γ}′ = *L*_{γ} ∪ {*m*_{1}′}, and supply it with the well-ordering which is the same as that in *L*_{γ}, except that we extend it by fixing that *x* ≺ *m*_{1}′ for any *x* ∈ *L*_{γ}. Clearly now *L*_{γ}′ is a γ-set, but one which properly extends *L*_{γ}, which is a contradiction. Thus *L*_{γ} = *M*, and so *M* can be well-ordered by the ordering of *L*_{γ}.^{[24]}
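
The way a covering γ fixes the ordering through condition (2)(c) can be made concrete in the finite case. The following is only an illustrative sketch, not Zermelo's own procedure: the names `Covering`, `is_gamma_set` and `largest_gamma_set` are hypothetical, a γ-set is represented as a Python sequence listed in its ordering, and γ is assumed to be an ordinary function on non-empty finite sets. For a finite *M* the construction reduces to repeated selection, which is exactly what Zermelo's argument avoids relying on in the infinite case.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence

# Hypothetical type for a "covering" gamma: it maps each non-empty
# subset X of M to an element gamma(X) of X.
Covering = Callable[[FrozenSet[Hashable]], Hashable]


def is_gamma_set(seq: Sequence[Hashable], M: FrozenSet[Hashable],
                 gamma: Covering) -> bool:
    """Check conditions (2)(a)-(c) for a finite sequence listed in its ordering."""
    segment = set()  # the initial segment A determined by the current element
    for a in seq:
        if a not in M or a in segment:
            return False  # not a subset of M, or an element is repeated
        if gamma(M - segment) != a:
            return False  # violates a = gamma(M - A)
        segment.add(a)
    return True


def largest_gamma_set(M: FrozenSet[Hashable], gamma: Covering) -> List[Hashable]:
    """Build L_gamma for a finite M by applying a = gamma(M - A) repeatedly."""
    ordered: List[Hashable] = []
    while len(ordered) < len(M):
        ordered.append(gamma(M - frozenset(ordered)))
    return ordered


# Illustration with M = {1, 2, 3} and gamma = min (an assumed covering).
M = frozenset({3, 1, 2})
print(largest_gamma_set(M, min))     # [1, 2, 3]
print(is_gamma_set([1, 2], M, min))  # True
print(is_gamma_set([2, 1], M, min))  # False: the other ordering is not a gamma-set
```

With `max` in place of `min` the same *M* receives the reverse ordering, which illustrates the sense in which the well-ordering is determined by γ and by nothing else.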

As Zermelo points out (p. 516 of his paper), the WOT establishes a
firm foundation for the theory of infinite cardinality; in particular,
it shows, he says, that every set (‘for which the totality of
its subsets etc. has a sense’) can be considered as a
well-ordered set ‘and its power considered as an
aleph’. Later work of Hartogs (see Hartogs 1915) showed not only
that WOT implies COMP, as Zermelo shows, but also that COMP itself
implies WOT, and thus in turn Zermelo's choice principle. Thus,
it is not just COMP which is necessary for a reasonable theory of
infinite cardinality, but WOT itself. Despite Zermelo's
endorsement here, the correctness of the hypothesis that the scale of
aleph numbers represents *all* cardinals (AH, for short) is a more
complicated matter, for it involves the claim that every set is
actually equivalent to an initial segment of the ordinals, and not
just well-orderable. In axiomatic frameworks for sets, therefore, the
truth of AH depends very much on which ordinals are present as sets in
the system.

The subsequent work showing the independence of AC from the other
axioms of set theory vindicates Zermelo's pioneering work; in
this respect, it puts Zermelo's revelation of the choice
principle in a position similar to that which Hilbert ascribes to the
Parallel Postulate in Euclid's work. Hilbert claims that Euclid
must have realised that to establish certain ‘obvious’
facts about triangles, rectangles etc., an entirely *new* axiom
(Euclid's Parallel Postulate) was necessary, and moreover that
Gauß was the first mathematician ‘for 2100 years’ to
see that Euclid had been right (see Hallett and Majer 2004: 261–263 and 343–345).
This ‘pragmatic attitude’, which is on display in
Zermelo's second paper on well-ordering from 1908, became, in
effect, the reigning attitude towards the choice principle: If certain
problems are to be solved, then the choice principle must be
adopted. In 1908, Zermelo brings out this parallel explicitly:

Banishing fundamental facts or problems from science merely because they cannot be dealt with by means of certain prescribed principles would be like forbidding the further extension of the theory of parallels in geometry because the axiom upon which this theory rests has been shown to be unprovable. (Zermelo 1908a: 115)

Zermelo does not in 1904 call the choice principle an axiom; it is, rather, designated a ‘logical principle’. What Zermelo has to say by way of an explanation is very short:

This logical principle cannot, to be sure, be reduced to a still simpler one, but it is applied without hesitation everywhere in mathematical deduction. (Zermelo 1904: 516)

It is not clear from this whether he thinks of the choice principle
as a ‘law of thought’, as the term ‘logical
principle’ might suggest, or whether he thinks it is just
intrinsic to mathematical reasoning whenever sets are involved, a
position suggested by the reference to its application
‘everywhere in mathematical deduction’. By the time of his
second well-ordering paper of 1908, Zermelo seems to have moved away
from the idea of AC as a ‘logical’ principle in the sense
of a logical law, and appears to put the emphasis more on the view of
it as intrinsic to the subject matter; there it appears as Axiom IV,
and, as we saw, Axiom VI of Zermelo
1908b.^{[25]}

#### 2.2.3 Objections to the 1904 Proof

There were three central objections.

- Objections to the Choice Principle.
- Objections to Zermelo's general operation with sets, especially well-orderings.
- Objections to impredicative definitions.

Let us briefly deal with these.

(a) The objections to the choice principle were of two kinds. The
main objection was put forward by Borel in 1905 in
the *Mathematische Annalen* (Borel 1905), the journal which
published Zermelo's paper, and it is also widely discussed in
correspondence between some leading French mathematicians, and also
published in that year in the same Journal (see Hadamard et
al. 1905). The objection is basically that Zermelo's principle fails
to specify a ‘law’ or ‘rule’ by which the
choices are effected; in other words, the covering used is not
explicitly defined, which means that the resulting well-ordering is
not explicitly defined either. In a letter to Borel, Hadamard makes it
clear that the opposition in question is really that between the
assumption of the existence of an object which is fully described, and
of the existence of an object which is *not* fully described (see
Hadamard et al. 1905, esp. 262). In his reply, Zermelo remarks that
the inability to describe the choices is why the choice principle is
in effect an *axiom*, which has to be added to the other principles. In
effect, the position is that if one wants to do certain things which,
e.g., rely on the WOT, then the choice principle is indispensable. His
position, to repeat, is like the one that Euclidean geometry takes
towards parallels.

(b) An objection to the choice principle was also put forward by
Peano. This objection seems to be that since the choice principle
cannot be proved ‘syllogistically’ (i.e., from the
principles of Peano's *Formulario*), then it has to be rejected (see
Peano 1906). (Peano does think, however, that finite versions of the
choice principle are provable, relying essentially on repeated
applications of a version for classes of the basic logical principle
EI mentioned above (§2.2.1).)
Zermelo's reply is the following. Axiom systems like Peano's are
constructed so as to be adequate for mathematics; but how does one go
about selecting the ‘basic principles’ required? One
cannot assemble a complete list of adequate principles, says Zermelo,
without careful inspection of actual mathematics and thereby a careful
assessment of what principles are actually necessary to such a list,
and such inspection would show that the choice principle is surely one
such; in other words, a selection of principles such as Peano's is
very much a *post hoc* procedure. The reply to Peano is of a piece with
the reply to Borel, and recalls strongly the invocation in Zermelo
(1908b: 261), that it is necessary to distill principles from the
actual operation with sets. He supports his claim that the choice
principle is necessary by a list of seven problems which ‘in my
opinion, could not be dealt with at all without the principle of
choice’ (Zermelo 1908a:
113).^{[26]}
In particular he points out that the
principle is indispensable for any reasonable theory of infinite
cardinality, for only it guarantees the right results for infinite
unions/sums, and in addition is vital for making sense of the very
definition of infinite product. That Peano cannot establish the choice
principle from his principles, says Zermelo, strongly suggests that
his list of principles is not ‘complete’ (Zermelo 1908a:
112).

(c) Another line of objection, represented in different ways by
Bernstein (Bernstein 1905), Jourdain (Jourdain 1904, 1905b) and Schoenflies (Schoenflies 1905), was that Zermelo's general
operation with sets in his proof was dangerous and flirted with
paradox. (See also Hallett 1984: 176–182.) In its imprecise form, the objection is that Zermelo is less
than explicit about the principles he uses in 1904, and that he
employs procedures which are reminiscent of those used crucially in
the generation of the Burali-Forti antinomy, e.g., in showing that if
the set
*L*_{γ} ≠ *M*, then it can be extended.
(What if *L*_{γ} is already the collection *W*?)

Zermelo's reply is dismissive, but there is something to the
criticism. Certainly Zermelo's 1904 proof attempts to show that WOT
can be proved while bypassing the general abstract theory of
well-ordering and its association with the Cantorian ordinals, and
therefore also bypassing ‘the set *W*’ (as it was
widely known) of *all* Cantorian ordinals (denoted ‘Ω’
by Cantor), and consequently the Burali-Forti antinomy. However,
whatever Zermelo's *intention*, there is no *explicit* attempt to exclude
the possibility that *L*_{γ} = *W* and thus the
suggestion that antinomy might threaten. Of course, Zermelo, referring
to critics who ‘base their objections upon the
“Burali-Forti antinomy” ’, declares that this
antinomy ‘*is without significance* for my point of view, since
the principles I employed *exclude* the existence of a set *W* [of
all ordinals]’ (Zermelo 1908a: 128, with earlier hints on
118–119), and that the real problem is with the ‘more
elementary’ Russell antinomy. It is also true that at the end of
the 1904 paper, Zermelo states that the argument holds for those
sets *M* ‘for which the totality of subsets, and so on, is
meaningful’, which, in retrospect, is clearly a hint at important
restrictions on set formation. Even so, Zermelo's attitude is
unfair. It could be that the remark about ‘the totality of
subsets etc.’ is an indirect reference to difficulties with the
comprehension principle, but even so the principle is not repudiated
explicitly in the 1904 paper, neither does Zermelo put in its place
another principle for the conversion of properties to sets, which is
what the *Aussonderungsaxiom* of the 1908 axiomatisation
does. Moreover, he does not say that the existence principles on which
the proof is based are the *only* set existence principles, and he does
not divorce the proof of the theorem from the Cantorian assumptions
about well-ordering and ordinals. Indeed, Zermelo assumes that
‘every set can be well-ordered’ is equivalent to the
Cantorian ‘every cardinality is an aleph’ (Zermelo 1904:
141). And despite his later claim (Zermelo 1908a: 119), he does *appear*
to use the ordinals and the informal theory of well-ordering in his
definition of γ-sets, where a γ-set is ‘any
well-ordered *M*_{γ}…’, without any
specification of how ‘well-ordered set’ is to be
defined. What assurance is there that *this* can all be reduced to
Zermelo's principles? One important point here is that it had not yet
been shown that all the usual apparatus of set-theoretic mathematics
(relations, ordering relations, functions, cardinal equivalence
functions, order-isomorphisms, etc.) could be reduced to a few simple
principles of set existence. All of this was to come in the wake of
Zermelo's axiomatisation, and there is little doubt that this line of
criticism greatly influenced the shape of the second proof given in
1908, of which a little more below.

(d) The last line of objection was to a general feature of the
1904 proof, which was not changed in the second proof, namely the use
of what became known as ‘impredicative definition’. An
impredicative definition is one which defines an object *a* by a
property *A* which itself involves reference, either direct or
indirect, to all the things with that property, and this must, of
course, include *a* itself. There is a sense, then, in which the
definition of *a* involves a circle. Both Russell and
Poincaré became greatly exercised about this form of
definition, and saw the circle involved as being
‘vicious’, responsible for all the paradoxes. If one
thinks of definitions as like construction principles, then indeed
they are illegitimate. But if one thinks of them rather as ways of
singling out things which are already taken to exist, then they are
not illegitimate. In this respect, Zermelo endorses Hilbert's view of
existence. To show that some particular thing ‘exists’ is
to show that it is in 𝔅, i.e., to show by means of a finite
proof from the axioms that it exists in 𝔅. What
‘exists’, then, is really a matter of what the axioms,
taken as a whole, determine. If the separation, power set and choice
principles are axioms, then for a given *M* in the domain, there
will be choice functions/sets on the subsets of *M*, consequently
well-orderings, and so forth; if these principles are not included as
axioms, then such demonstrations of existence will not be
forthcoming. From this point of view, defining within the language
deployed is much more like what Zermelo calls
‘determination’, since definitions, although in a certain
sense arbitrary, have to be supported by existence proofs, and of
course in general it will turn out that a given extension can be
picked out by several, distinct ‘determinations’. In
short, Zermelo's view is that definitions pick out (or determine)
objects from among the others in the domain being axiomatised; they
are not themselves responsible for showing their *existence*. In
the end, the existence of a domain 𝔅 has to be guaranteed by a
consistency proof for the collection of axioms. Precisely this view
about impredicative definitions was put forward in Ramsey (1926:
368–369) and then later in Gödel's 1944 essay on Russell's
mathematical logic as part of his analysis of the various things which
could be meant by Russell's ambiguously stated Vicious Circle
Principle. (See Gödel 1944: 136, 127–128 of the reprinting
in Gödel 1990. See also Hadamard's letters in Hadamard et
al. 1905.) To support his view, Zermelo points out that impredicative
definitions are taken as standard in established mathematics,
particularly in the way that the least upper bound is defined; witness
the Cauchy proof of the Fundamental Theorem of Algebra. Once again,
Zermelo's reply is coloured by the principle of looking at the actual
practice of mathematics.^{[27]}

#### 2.2.4 Zermelo's second proof of the WOT, 1908

As mentioned, Zermelo published a second proof of the WOT,
submitted to *Mathematische Annalen* just two weeks before the
submission of his ‘official’ axiomatisation, and published
in the same volume as that axiomatisation. This proof is too elaborate
to be described here; a much fuller description can be found in
Hallett (2010b: 94–103), but some brief remarks about it must be
made nevertheless. Recall that the purpose of the proof was, in large
part, to reply to (some of) the criticisms raised in objection to the
1904 proof, and not least to clarify the status of the choice
principle.

Suppose *M* is the set given, and suppose (using Zermelo's
notation) that 𝔘*M* is the set of its subsets
(‘*Untermengen*’). The basic procedure in the 1904 proof was
to single out certain subsets of *M* and to show that these can
in effect be ‘chained’ together, starting from modest
beginnings (and using the choice function γ); thus we have
{*m*_{1}}, where
*m*_{1} = γ(*M*),
{*m*_{1}, *m*_{2}}, where
again
*m*_{1} = γ(*M*)
and
*m*_{2} = γ(*M* −
{*m*_{1}}), and so on. In this way, the proof
shows that one can ‘build up’ to the whole of *M*
itself.^{[28]} This
‘build-up’ is one of the things which provoked scepticism,
and particularly the step which shows that *M* itself must be
embraced by it. In the 1908 proof, the basic idea is to start
from *M* itself, and consider ‘cutting down’ by the
element ‘chosen’ by the choice principle, instead of
building up. Thus, if one accepts that if *M* is a legitimate
set, then so is 𝔘*M*, there is not the same danger of
extending into inconsistent sets, not even the appearance of
danger. Again the key thing is to show that the sets defined are in
fact ‘chained’ together and are in the right way
exhaustive.

In the 1904 proof, there are points where it looks as if Zermelo is appealing to arbitrary well-orderings, and thus indirectly arbitrary ordinals. This is avoided in the 1908 proof (as it could have been in the 1904 proof) by focusing on the particular ‘chain’ which the proof gives rise to. It is this chain itself which exhibits the well-ordering.

In the modern understanding of set theory, to show that there is a
well-ordering on *M* would be to show that there is a set of
ordered pairs of members of *M* which is a relation satisfying
the right properties of a well-ordering relation over *M*. It is
well to remember that Zermelo's task in 1908 was constrained in that he had to
establish the existence of a well-ordering using only the
set-theoretical material *available to him*. This material did not
involve the general notion of ordinal and cardinal numbers, not even
the general notions of relation and function. What Zermelo used,
therefore, was the *particular* relation
*a* ⊆ *b* of being a subset,
and it is important to observe that the chain produced is
ordered by this relation.

Why would one expect this latter to work? Well, the chain produced
is naturally a subset well-ordering, for it is both linear and also
such that the intersection of arbitrary elements of members of the
chain is itself a member of the chain, and thus there is a natural
subset-least element for each subset of members of the chain. But the
wider explanation is hinted at towards the end of Zermelo's
proof. Suppose a set *M* is (speaking informally) *de facto*
well-ordered by an ordering relation ≺. Call the set
ℜ_{≼}(*a*) = {*x*
∈ *M* : *a* ≼ *x*} the
‘remainder [*Rest*]’ determined by *a* and the ordering
≺. Consider now the set of ‘remainders’ given by
this ordering, i.e.,
{ℜ_{≼}(*x*) : *x*
∈ *M*}. This set is in fact well-ordered by reverse
inclusion, where the successor remainder to
ℜ_{≼}(*a*) is just the remainder determined
by *a*'s successor *a*′ under ≺, and where
intersections are taken at the limit elements (the intersection of a
set of remainders is again a remainder). But not only is this set
well-ordered by reverse inclusion, the ordering is *isomorphic* to the
ordering ≺ on *M*, that is:

*a* ≺ *b* if and only if ℜ_{≼}(*b*) ⊂ ℜ_{≼}(*a*).
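
The isomorphism just stated can be checked mechanically on a small finite example. The sketch below is an illustration under stated assumptions rather than a rendering of Zermelo's construction: the function names `remainders` and `order_from_chain` are hypothetical, the ordering ≺ is given as a Python sequence of distinct elements, and only the finite case is covered, so the limit steps where intersections are taken do not arise.

```python
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set


def remainders(order: Sequence[Hashable]) -> Dict[Hashable, FrozenSet[Hashable]]:
    """Map each a to its remainder {x in M : a ≼ x}; `order` lists M in ≺-order."""
    return {a: frozenset(order[i:]) for i, a in enumerate(order)}


def order_from_chain(chain: Set[FrozenSet[Hashable]]) -> List[Hashable]:
    """Recover the ordering of M from the set of remainders alone."""
    # Reverse inclusion: larger remainders come first.
    by_size = sorted(chain, key=len, reverse=True)
    result = []
    for i, rem in enumerate(by_size):
        smaller = by_size[i + 1] if i + 1 < len(by_size) else frozenset()
        (first,) = rem - smaller  # the single element dropped at this step
        result.append(first)
    return result


# An assumed ordering c ≺ a ≺ d ≺ b on M = {a, b, c, d}.
order = ["c", "a", "d", "b"]
R = remainders(order)

# a ≺ b holds exactly when the remainder of b is a proper subset of that of a.
isomorphic = all(
    (order.index(a) < order.index(b)) == (R[b] < R[a])
    for a in order
    for b in order
)
print(isomorphic)                          # True
print(order_from_chain(set(R.values())))   # ['c', 'a', 'd', 'b']
```

The second function uses only the subset relation between the remainders, which mirrors the point that the inclusion ordering on the chain is enough to fix the ordering on *M*.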

Zermelo's 1908 construction is now meant to define a
‘remainder set’ directly without detour through some
≺; the resultant inclusion ordering is then
‘mirrored’ on *M*. The key thing is to show that the
chain of subsets of *M* picked out really matches *M*
itself. But if there were some element *a*
∈ *M* which did not correspond to a remainder
ℜ_{≼}(*a*), then it must be possible to use
the choice function to ‘squeeze’ another remainder into
the chain, which would contradict the assumption that all the sets
with the appropriate definition are already in the
chain.^{[29]} We
have spoken of functions and relations here. But in fact Zermelo
avoids such talk. He defines *M* as being
‘well-ordered’ when each element in *M*
‘corresponds’ uniquely to such a ‘remainder’
(Zermelo 1908a: 111). This shows, says Zermelo, that the theory of
well-ordering rests ‘exclusively upon the elementary notions of
set theory’, and that ‘the uninformed are only too prone
to look for some mystical meaning behind Cantor's relation
*a* ≺ *b*’ (Zermelo 1908a).

One can be considerably more precise about the relation between
orderings on *M* and ‘remainder inclusion orderings’
in 𝔘*M*. Much of this was worked out in Hessenberg (1906), and
was therefore known to Zermelo (Zermelo and Hessenberg were in regular
contact), and simplified greatly by Kuratowski in the 1920s. We will
have reason to refer to Kuratowski briefly in the next
section.^{[30]}

What about the choice principle? In 1904, this is framed in effect
as a choice function, whose domain is the non-empty subsets
of *M*. But in 1908, Zermelo frames it differently:

Axiom IV. A set *S* that can be decomposed into a set of disjoint parts *A*, *B*, *C*, …, each containing at least one element, possesses at least one subset *S*_{1} having exactly one element in common with each of the parts *A*, *B*, *C*, … considered. (Zermelo 1908a: 110)

In other words, the choice principle is now cast in a *set* form, and
not in the function form of 1904.
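
The difference between the two forms can be seen in a small finite illustration. This is a sketch under assumptions, not part of Zermelo's text: `is_choice_set` and `choice_set_from_function` are hypothetical names, the parts are assumed pairwise disjoint and non-empty, and `min` stands in for a choice function. In the finite case a rule such as `min` is always available; the axiom matters precisely where no such rule can be given.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

Part = FrozenSet[Hashable]


def is_choice_set(S1: Part, parts: Iterable[Part]) -> bool:
    """True if S1 is a subset of S with exactly one element in each part."""
    parts = list(parts)
    S = frozenset().union(*parts)  # the decomposed set S is the union of its parts
    return S1 <= S and all(len(S1 & part) == 1 for part in parts)


def choice_set_from_function(parts: Iterable[Part],
                             gamma: Callable[[Part], Hashable]) -> Part:
    """Collect gamma's values into one set (valid for pairwise disjoint parts)."""
    return frozenset(gamma(part) for part in parts)


# An assumed decomposition of S = {1, ..., 6} into disjoint non-empty parts.
parts = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]
S1 = choice_set_from_function(parts, min)
print(sorted(S1))                                   # [1, 3, 4]
print(is_choice_set(S1, parts))                     # True
print(is_choice_set(frozenset({1, 2, 3, 4}), parts))  # False: two elements from one part
```

The set form asserts only that some such *S*_{1} exists as a single object; nothing in it describes the choices being made one after another.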

In the 1908 axiomatisation, the axiom is stated in much the same
way, but is called there (though not in the well-ordering paper) the
‘Axiom of Choice’. However, the 1908 paper on WOT does say
that the axiom provides a set (the *S*_{1}) of
‘simultaneous choices’, to distinguish them from the
‘successive choices’ used in the pre-Zermelo versions of
well-ordering. It is to be noted that in 1921, Zermelo wrote to
Fraenkel in partial repudiation of the designation ‘Axiom of
Choice’, saying that ‘there is no sense in which my theory
deals with a real
“choice” ’.^{[31]}

#### 2.2.5 The Axioms of the 1908 WOT Paper

What axioms governing set-existence does Zermelo rely on in Zermelo (1908a)? At the start of the paper, Zermelo lists two ‘postulates’ that he explicitly depends on, a version of the separation axiom, and the power set axiom. Later on he lists Axiom IV, which, as noted, asserts the existence of a choice set for any set of disjoint non-empty sets. In addition to this, Zermelo makes use of the existence of various elementary sets, though he does not say exactly which principles he relies on. In the axiomatisation which follows two weeks later, Zermelo adopts all these axioms, but adds clarification about the elementary sets. He also adds the Axiom of Infinity, to guarantee that there are infinite sets, and the Axiom of Extensionality, which codifies the assumption that sets are really determined by their members, and not by the accidental way in which these members are selected. In addition, as we have noted, he now calls the Axiom of Choice by this name.

## 3. The Major Problems with Zermelo's System

Zermelo's system, although it forms the root of all modern axiomatisations of set theory, initially faced various difficulties. These were:

- Problems with the Axiom of Choice.
- Problem with the formulation of the Separation Axiom.
- Problems of ‘completeness’, one of Hilbert's important desiderata on the adequacy of an axiom system. Specifically, there were problems representing ordinary mathematics purely set-theoretically, and also problems representing fully the transfinite extension of mathematics which Cantor had pioneered.

The problems concerning the Axiom of Choice were discussed above; we now discuss the difficulties with the formulation of Separation and those of ‘completeness’.

### 3.1 Separation

The problem with the Axiom of Separation is not with the
obviousness of the principle; it *seems* straightforward to accept that
if one has a set of objects, one can separate off a subclass of this
set by specifying a property, and treat this in turn as a set. The
question here is a subtler one, namely that of how to formulate this
principle as an axiom. What means of ‘separating off’ are
to be accepted? What are allowable as the properties? As a matter of
practice, we use a language to state the properties, and in informal
mathematics, this is a mixture of natural language and special
mathematical language. The Richard Paradox (see Richard 1905 and also
the papers of Poincaré 1905, 1906a,b) makes it clear that one
has to be careful when defining properties, and that the unregulated
use of ‘ordinary language’ can lead to unexpected
difficulties.

Zermelo's answer to this, in moving from the system of the second
well-ordering paper to the axiomatisation, is to try specifying what
properties are to be allowed. He calls the properties to be allowed
‘definite properties’
(‘*Klassenaussagen*’ or ‘propositional
functions’), and states:

A question or assertion 𝔈 is said to be “*definite*” if the fundamental relations of the domain, by means of the axioms and the universally valid laws of logic, determine without arbitrariness whether it holds or not. Likewise a “propositional function” 𝔈(*x*), in which the variable term *x* ranges over all individuals of a class 𝔎, is said to be “definite” if it is definite for each single individual *x* of the class 𝔎. Thus the question whether *a* ε *b* or not is always definite, as is the question whether *M* ⊆ *N* or not.

Zermelo asserts that this shows that paradoxes involving the notions of definability (e.g., Richard's) or denotation (König's) are avoided, implying that what is crucial is the restriction to the ‘fundamental relations of the domain’ (so, ε, =).

The basic problem is that Zermelo does not explain what the precise route is from the fundamental relations ε and = to a given ‘definite property’; it is this which gives rise to a general doubt as to whether the Separation Axiom is, in fact, a safe replacement for the comprehension principle (see Fraenkel 1927: 104). This plays into the hands of those who, like Poincaré, consider adoption of the Separation Axiom insufficiently radical in the search for a solution to the paradoxes. Poincaré writes:

Mr. Zermelo does not allow himself to consider the set of all the objects which satisfy a certain condition because it seems to him that this set is never closed; that it will always be possible to introduce new objects. On the other hand, he has no scruple in speaking of the set of objects which are part of a certain *Menge* *M* and which also satisfy a certain condition. It seems to him that one cannot possess a *Menge* without possessing at the same time all its elements. Among these elements, he will choose those which satisfy a given condition, and will be able to make this choice very calmly, without fear of being disturbed by the introduction of new and unforeseen elements, since he already has all these elements in his hands. By positing beforehand this *Menge* *M*, he has erected an enclosing wall which keeps out the intruders who could come from without. But he does not query whether there could be intruders from within whom he enclosed inside his wall. (Poincaré 1909: 477; p. 59 of the English translation)

Here, Poincaré is referring indirectly to his view that the paradoxes are due to impredicative set formation, and this of course will still be possible even with the adoption of the Axiom of Separation.

The problem of the lack of clarity in Zermelo's account was
addressed by Weyl in 1910 (Weyl 1910; see especially p. 113) and then
again by Skolem in 1922 (Skolem 1923, p. 139 of the reprint). What
Weyl and Skolem both proposed, in effect, is that the question of what
‘definite properties’ are can be solved by taking these to
be the properties expressed by 1-place predicate formulas in what we
now call first-order logic. In effect, we thus have a recursive
definition which makes the definite properties completely transparent
by giving each time the precise route from ε, = to the
definite property in question. This does not deal with all aspects of
Poincaré's worry, but it does make it quite clear what definite
properties are, and it does also accord with Zermelo's view that the
relations =, ε are at root the only ones
used.^{[32]}
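The recursive character of the Weyl/Skolem proposal can be illustrated with a small sketch. It is only a toy under stated assumptions: the class names (`Mem`, `Eq`, `Not`, `And`, `Exists`) and the helpers `holds` and `separate` are hypothetical, sets are modelled by Python `frozenset`s, and the quantifier ranges over a small finite domain rather than over all sets. What it shows is that each admissible property is built step by step from ε and =, and that separation then picks out the elements of a given set satisfying such a property.

```python
from dataclasses import dataclass
from typing import Union

# Formulas are generated recursively from the two atomic relations
# (membership and equality). All names here are illustrative assumptions.

@dataclass(frozen=True)
class Mem:      # atomic formula: left ε right
    left: str
    right: str

@dataclass(frozen=True)
class Eq:       # atomic formula: left = right
    left: str
    right: str

@dataclass(frozen=True)
class Not:
    body: "Formula"

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Exists:   # quantifier; in this toy it ranges over a finite domain only
    var: str
    body: "Formula"

Formula = Union[Mem, Eq, Not, And, Exists]


def holds(phi, env, domain):
    """Evaluate phi under the variable assignment env, quantifying over domain."""
    if isinstance(phi, Mem):
        return env[phi.left] in env[phi.right]
    if isinstance(phi, Eq):
        return env[phi.left] == env[phi.right]
    if isinstance(phi, Not):
        return not holds(phi.body, env, domain)
    if isinstance(phi, And):
        return holds(phi.left, env, domain) and holds(phi.right, env, domain)
    if isinstance(phi, Exists):
        return any(holds(phi.body, {**env, phi.var: d}, domain) for d in domain)
    raise TypeError(f"not a formula: {phi!r}")


def separate(m, phi, var, domain):
    """The subset of m whose elements satisfy the one-place formula phi(var)."""
    return frozenset(x for x in m if holds(phi, {var: x}, domain))


empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
domain = [empty, one, two]

# "x is non-empty", written using only ε: there is a y with y ε x.
non_empty = Exists("y", Mem("y", "x"))
result = separate(frozenset(domain), non_empty, "x", domain)
```

Here `result` contains exactly the two non-empty sets of the domain. The restriction to a finite domain is a simplification for the sake of executability; in the first-order formulation the quantifiers range over the whole universe of sets.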

Fraenkel (1922 and later) took a different approach, giving a rather complicated direct axiomatisation of the notion of definite property by recursive generation from the basic properties; the resulting notion appears to be a subset of the recursively defined first-order properties.

Zermelo accepted none of these approaches, for two reasons. First,
he thought that the recursive definitions involved make direct use of
the notion of finite number (a fact pointed out by Weyl 1910), which
it ought to be the business of set theory to explain, not to
presuppose. Secondly, he became aware that using essentially a
first-order notion condemns the axiomatic system to countable models,
the fundamental fact pointed out in Skolem (1923). His own approach
was, first, to give a different kind of axiomatisation (see Zermelo
1929), and then to use (in Zermelo 1930) an essentially second-order
notion in characterising the axiom of
separation.^{[33]}

### 3.2 Completeness

There were also problems with the completeness of Zermelo's theory, since there were important theoretical matters with which Zermelo does not deal, either for want of appropriate definitions showing how certain constructions can be represented in a pure theory of sets, or because the axioms set out in Zermelo's system are not strong enough.

#### 3.2.1 Representing Ordinary Mathematics

Zermelo gives no obvious way of representing much of
‘ordinary mathematics’, yet it is clear from his opening
remarks that he takes the task of the theory of sets to be that of standing as *the*
fundamental theory which should ‘investigate mathematically the
fundamental notions “number”, “order”, and
“function” ’.
(See §1.)

The first obvious question concerns the representation of the
ordinary number systems. The natural numbers are represented by
Zermelo by ∅, {∅}, {{∅}}, …, and the Axiom
of Infinity gives us a set of these. Moreover, it seems that, since
both the set of natural numbers and the power set axiom are available,
there are enough sets to represent the rationals and the reals,
functions on reals etc. What are missing, though, are the details: how
exactly does one represent the right equivalence classes, sequences
etc.? And assuming that one *could* define the real numbers, how does
one characterise the field operations on them? In addition, as
mentioned previously, Zermelo has no natural way of representing
either the general notions of relation or of function. This means that
his presentation of set theory has no natural way of representing
those parts of mathematics (like real analysis) in which the general
notion of function plays a fundamental part.

A further difficulty is that the lack of the notion of function
makes the general theory of the comparison of sets by size (or indeed
by order) cumbersome. Zermelo does develop a way of expressing, for
disjoint sets *a*, *b*, that *a* is of the same size
as *b*, by first defining a ‘product’ of two disjoint
sets, and then isolating a set of unordered pairs (a certain subset of
this product) which ‘maps’ one of the sets one-to-one onto
the other. But this is insufficiently general, and does not in any
case indicate any way to introduce ‘the’ size
of *a*. Russell's method (defining the cardinality of *M* as
the set *card*(*M*) = {*N* : *N*
∼ *M*}, where ‘∼’ means
‘cardinally equivalent to’) is clearly inappropriate,
since with a set *a* = {*b*},
*card*(*a*) (which should be the cardinal number 1) is as big as
the universe, and the union set of 1 would indeed be the
universal ‘set’. Over and above this, there is the more
specific problem of defining the aleph numbers.

The second major difficulty is along the same lines, concerning, not functions, but relations, and thus ordering relations and ordinal numbers. As we have seen (in §2.2.4), Zermelo has the beginnings of an answer to this in his second proof of the WOT, for this uses a theory of subset-orderings to represent the underlying ordering of a set. It turns out that the method given in this particular case suggests the right way to capture the general notion.

#### 3.2.2 Ordinality

Kuratowski pursued Zermelo's idea (1908a) in the 1920s,
generalising and systematising work, not just of Zermelo, but
of Hessenberg and Hausdorff too, and giving a simple set of necessary and
sufficient conditions for a subset ordering to represent a linear
ordering. He also argues forcefully that it is in fact *undesirable* for
set theory to go beyond this and present a general theory of ordinal
*numbers*:

In reasoning with transfinite numbers one implicitly uses an axiom asserting their *existence*; but it is desirable both from the logical and mathematical point of view to pare down the system of axioms employed in demonstrations. Besides, this reduction will free such reasoning from a foreign element, which increases its æsthetic value. (Kuratowski 1922: 77)

The assumption here is clearly that the (transfinite) numbers will
have to be added to set theory as new primitives. Kuratowski however
undertakes to *prove* that the transfinite numbers can be dispensed with
for a significant class of
applications.^{[34]}
Application of the ordinal numbers in
analysis, topology, etc. often focuses on some process of definition
by transfinite recursion over these numbers. Kuratowski succeeds in
showing that in a significant class of cases of this kind, the
ordinals can be avoided by using purely set-theoretic methods which
are reproducible in Zermelo's system. As he notes:

From the viewpoint of Zermelo's axiomatic theory of sets, one can say that the method explained here allows us to deduce theorems of a certain well-determined general type *directly* from Zermelo's axioms, that is to say, without the introduction of any independent, supplementary axiom about the existence of transfinite numbers. (Kuratowski 1922: 77)^{[35]}

It is in this reductionist context that Kuratowski develops his
very general theory of maximal inclusion orderings, which shows, in
effect, that all orderings on *a* can really be represented as
inclusion orderings on appropriate subsets of the power set
of *a*, thus reducing ordering to Zermelo's primitive relation
ε.

One immediate, and quite remarkable, result of this work is that it
shows how one can *define* the general notions of relation and function
in purely set-theoretic terms. It had long been recognised that
relations/functions can be conceived as sets of ordered pairs, and
Kuratowski's work now shows how to define the ordered pair
primitively. The ordered pair (*a*, *b*) can be considered
informally as the unordered pair *M* = {*a*, *b*},
together with an ordering relation *a* < *b*. Suppose
this relation is treated now via the theory of inclusion chains. The
only maximal inclusion chains in the power set of *M* are
{∅, {*a*}, {*a*, *b*}} and {∅,
{*b*}, {*a*, *b*}}. Using
Kuratowski's definition of the ordering ‘<’ derived
from a maximal inclusion chain, these chains must then correspond to
the orderings *a* < *b* and *b* < *a*
on {*a*, *b*} respectively. If
∅ is ignored, the resulting chain {{*a*},
{*a*, *b*}} is thus associated with the
relation *a* < *b*, and so with the ordered set (pair)
(*a*, *b*). It is then quite natural to *define*
(*a*, *b*) as {{*a*},
{*a*, *b*}} (see Kuratowski 1921: 170–171). One
can now define the product *a*
× *b* of *a* and *b* as the set of all
ordered pairs whose first member is in *a* and whose second
member is in *b*; relations on *a* can now be treated as
subsets of *a* × *a*, and
functions from *a* to *b* as certain subsets
of *a* × *b*. Thus, many of
the representational problems faced by Zermelo's theory are solved at
a stroke by Kuratowski's work, building as it does on Zermelo's
own.
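The Kuratowski definition can be checked mechanically on small examples. The following is a minimal sketch, assuming that sets are modelled by Python `frozenset`s and that a few strings serve as stand-in objects; the function names `pair`, `cartesian` and `is_function` are hypothetical. It verifies the characteristic property of ordered pairs, (*a*, *b*) = (*c*, *d*) just in case *a* = *c* and *b* = *d*, and treats a function from *a* to *b* as a suitable subset of *a* × *b*.

```python
from itertools import product


def pair(a, b):
    """Kuratowski's ordered pair (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def cartesian(a, b):
    """The product a × b: all ordered pairs with first member in a, second in b."""
    return frozenset(pair(x, y) for x in a for y in b)


def is_function(f, a, b):
    """f is a subset of a × b containing exactly one pair (x, y) for each x in a."""
    return f <= cartesian(a, b) and all(
        sum(1 for y in b if pair(x, y) in f) == 1 for x in a
    )


# Stand-in objects; any hashable values would do.
atoms = ["p", "q", "r"]

# Characteristic property: (a, b) = (c, d) exactly when a = c and b = d.
characteristic = all(
    (pair(a, b) == pair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(atoms, repeat=4)
)

a, b = frozenset({"p", "q"}), frozenset({"r"})
constant_r = frozenset({pair("p", "r"), pair("q", "r")})  # sends p and q to r
```

Note that `pair(a, a)` collapses to {{*a*}}, which is harmless: the characteristic property still holds, and that property is all that the treatment of relations and functions requires.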

#### 3.2.3 Cardinality

But there was a problem concerning cardinality which is independent
of the problem of definitional reduction. It was pointed out by both
Fraenkel and Skolem in the early 1920s that Zermelo's theory cannot
provide an adequate account of cardinality. The axiom of infinity and
the power set axiom together allow the creation of sets of
cardinality ≥ ℵ_{n}
for each natural number *n*, but this (in the absence of a result
showing that 2^{ℵ_{0}} >
ℵ_{n} for every natural number *n*) is not
enough to guarantee a set whose power is ≥
ℵ_{ω}, and a set of power
ℵ_{ω} is a natural next step (in the Cantorian
theory) after those of power ℵ_{n}. Fraenkel
proposed a remedy for this (as did Skolem independently), namely
what was called the *Ersetzungsaxiom*, the Axiom of Replacement (see
Fraenkel 1922: 231 and Skolem 1923: 225–226). This says,
roughly, that the ‘functional image’ of a set must itself
be a set; thus, if *a* is a set,
then {*F*(*x*) : *x*
∈ *a*} must also be a set, where
‘*F*’ represents a functional correspondence. Such an
axiom is certainly sufficient; assume that *a*_{0} is the
set of natural numbers {0, 1, 2, …}, and now assume that to
each number *n* is associated an *a*_{n} with
power ℵ_{n}. Then according to the replacement
axiom, *a* =
{*a*_{0}, *a*_{1}, *a*_{2},
…} must be a set, too. This set is countable, of course, but
(assuming that the *a*_{n} are all disjoint) the union set of *a* must have cardinality at
least ℵ_{ω}.

The main difficulty with the Replacement Axiom is that of how to
formulate the notion of a functional correspondence. This was not
solved satisfactorily by Fraenkel, but the Weyl/Skolem solution works
here, too: a functional correspondence is (in effect) just any
first-order 2-place predicate ϕ(*x*, *y*) which
satisfies the condition of uniqueness,
i.e., ∀*x*, *y*, *z*{[ϕ(*x*, *y*)
∧ ϕ(*x*, *z*)] → *y* = *z*}.
With this solution, the Replacement Axiom will be (as required)
stronger than Zermelo's original Separation Axiom and indeed can
replace it; however, in Fraenkel's system, one can prove his version
of the Replacement Axiom from his version of the Separation Axiom,
which shows that his separate definition of function is not
sufficiently strong. (For details, see Hallett 1984:
282–286.)

Zermelo initially had doubts about the Replacement Axiom (see the
letter to Fraenkel from 1922 published in Ebbinghaus 2007: 137), but
he eventually accepted it, and a form of it was included in his new
axiomatisation published in 1930 (Zermelo 1930). Skolem's formulation
is the one usually adopted, though it should be noted that von
Neumann's own formulation is rather different and indeed
stronger.^{[36]}

#### 3.2.4 Ordinals

Although Kuratowski's work solved many of the representational
problems for Zermelo's theory, and the Replacement Axiom shows how the
most obvious cardinality gap can be closed, there still remained the
issue (Kuratowski's view to one side) of representing accurately the
full extent of the theory which Cantor had developed, with the
transfinite numbers as fully fledged objects which
‘mirror’ the size/ordering of sets. Once the ordinal
number-classes are present, the representation of the alephs is not a
severe problem, which means that the representation of transfinite
numbers amounts to assuring the existence of sufficiently many
transfinite *ordinal* numbers. Indeed, as was stated above, the
hypothesis that the scale of aleph numbers is sufficient amounts to
the claim that any set can be ‘counted’ by some
ordinal. There are then two interrelated problems for the
‘pure’ theory of sets: one is to show how to define
ordinals as sets in such a way that the natural numbers generalise;
the other problem is to make sure that there are enough ordinals to
‘count’ all the sets.

The problem was fully solved by von Neumann in his work on
axiomatic set theory from the early 1920s. Cantor's fundamental
theorems about ordinal numbers, showing that the ordinals are the
*representatives* of well-ordered sets, are the theorem that every
well-ordered set is order-isomorphic to an initial segment of the
ordinals, and that every ordinal is itself the order-type of the set
of ordinals which precede it. These results prove crucial in the von
Neumann treatment. Von Neumann's basic idea was explained by him as
follows:

What we really wish to do is to take as the basis of our considerations the proposition: ‘Every ordinal is the type of the set of all ordinals that precede it’. But in order to avoid the vague notion ‘type’, we express it in the form: ‘Every ordinal is the set of the ordinals that precede it’. (von Neumann 1923, p. 347 of the English translation)

According to von Neumann's idea, 1 is just {0}, 2 is just {0, 1}, 3
is just {0, 1, 2} and so on. On this conception, the first transfinite
ordinal ω is just {0, 1, 2, 3, …, *n*, …},
and generally it's clear that the immediate successor of any ordinal
α is just α ∪ {α}. If we
identify 0 with ∅, as Zermelo did, then we have available a
reduction of the general notion of ordinal to pure set theory, where
the canonical well-ordering on the von Neumann ordinals is just the
subset relation, i.e., α < β just in case α ⊂
β, which von Neumann later shows is itself equivalent to saying
α ∈ β. (See von Neumann 1928, p. 328 of the
reprinting.) So again, inclusion orderings are fundamental.

Von Neumann gives a general definition of his ordinals, namely that
a set α is an ordinal number if and only if it is a set ordered
by inclusion, the inclusion ordering is a well-ordering, and each
element ξ in α equals the set of elements in the initial
segment of the ordering determined by ξ. This connects directly
with Kuratowski's work in the following way. Suppose *M* is a
well-ordered set which is then mirrored by an inclusion
chain **M** in the power set of *M*. Then the first few
elements of the inclusion chain will be the sets ∅, {*a*},
{*a*, *b*}, {*a*, *b*, *c*}, …,
where *a*, *b*, *c*, … are the first, second,
third, … elements in the well-ordering of *M*. The von
Neumann ordinal corresponding to *M* will also be an inclusion
ordering whose first elements will be

∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}, …

(in other words, 0, 1, 2, 3…), and we have 0 ⊂ 1 ⊂ 2
⊂ 3 ⊂… in mirror image of
∅ ⊂ {*a*} ⊂ {*a*, *b*}
⊂ {*a*, *b*, *c*}
⊂ …
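For the finite case, the construction just described can be sketched directly. The code below is illustrative only: it models sets by Python `frozenset`s, the names `successor`, `von_neumann` and `zermelo` are hypothetical, and nothing transfinite is represented. It checks that each finite von Neumann ordinal is the set of its predecessors, that membership and proper inclusion coincide on them, and, for contrast, that Zermelo's numerals ∅, {∅}, {{∅}}, … never have more than one element.

```python
def successor(alpha):
    """The immediate successor of alpha, namely alpha ∪ {alpha}."""
    return alpha | frozenset({alpha})


def von_neumann(n):
    """The finite von Neumann ordinal n, starting from 0 = ∅."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha


def zermelo(n):
    """Zermelo's numeral n in the sequence ∅, {∅}, {{∅}}, ..."""
    z = frozenset()
    for _ in range(n):
        z = frozenset({z})
    return z


ordinals = [von_neumann(n) for n in range(6)]

# Each ordinal is the set of the ordinals that precede it.
each_is_set_of_predecessors = all(
    ordinals[n] == frozenset(ordinals[:n]) for n in range(6)
)

# On these ordinals, membership and proper inclusion give the same ordering
# (for frozensets, a < b tests proper inclusion).
orders_agree = all((a in b) == (a < b) for a in ordinals for b in ordinals)

# Zermelo's numerals, by contrast, do not have n elements.
zermelo_sizes = [len(zermelo(n)) for n in range(4)]
```

The contrast in the last line indicates why the von Neumann representation generalises more readily: the ordinal *n* itself has *n* elements and so can serve as a measure of size and order, whereas a Zermelo numeral beyond 0 is always a singleton.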

These von Neumann ordinals had, in effect, been developed before
von Neumann's work. The fullest published theory, and closest to the
modern account, is to be found in Mirimanoff's work published in 1917
and 1921 (see Mirimanoff 1917a,b, 1921), though he doesn't take the
final step of identifying the sets he characterises with the ordinals
(for an account of Mirimanoff's work, see Hallett 1984:
273–275). It is also clear that Russell, Grelling and Hessenberg
were close to von Neumann's general set-theoretic definition of
ordinals. But crucially Zermelo himself developed the von Neumann
conception of ordinals in the years 1913–1916 (for a full
account, see Hallett 1984: 277–280 and Ebbinghaus 2007:
133–134). Zermelo's idea was evidently well-known to the
Göttingen mathematicians, and there is an account of it in
Hilbert's lectures ‘*Probleme der mathematischen
Logik*’ from 1920,
pp. 12–15.^{[37]}

Despite all these anticipations, it is still right to ascribe the theory to von Neumann. For it was von Neumann who revealed the extent to which a full theory of the ordinals depends on the Axiom of Replacement. As he wrote later:

A treatment of ordinal number closely related to mine was known to Zermelo in 1916, as I learned subsequently from a personal communication. Nevertheless, the fundamental theorem, according to which to each well-ordered set there is a similar ordinal, could not be rigorously proved because the replacement axiom was unknown. (von Neumann 1928: 374, n. 2)

The theorem von Neumann states is the central result of Cantor's
mentioned here in the second paragraph of this section. As von Neumann goes on to point out
here (also p. 374), it is the possibility of definition by transfinite
induction which is key, and a rigorous treatment of this requires
being able to prove at each stage in a transfinite inductive process
that the collection of functional correlates to a set is itself a set
which can thus act as a new argument at the next stage. It is just
this which the replacement axiom guarantees. Once justified,
definition by transfinite induction can be used as the basis for
completely general definitions of the arithmetic operations on ordinal
numbers, for the definition of the aleph numbers, and so on. It also
allows a fairly direct transformation of Zermelo's first (1904) proof
of the WOT into a proof that every set can be represented by (is
equipollent with) an ordinal number, which shows that in the Zermelo
system with the Axiom of Replacement added there *are* enough ordinal
numbers.^{[38]}

It is thus remarkable that von Neumann's work, designed to show how
the transfinite ordinals can be incorporated directly into a pure
theory of sets, builds on and coalesces with both Kuratowski's work,
designed to show the *dispensability* of the theory of transfinite
ordinals, and also the axiomatic extension of Zermelo's theory
suggested by Fraenkel and Skolem.

## 4. Further reading

For a summary of the Cantorian theory as it stood in the early years of the twentieth century, see Young and Young (1906), and the magisterial Hausdorff (1914); for further reading on the development of set theory, see the books Ferreiros 1999, Hallett 1984, Hawkins 1970, and Moore 1982. See also the various papers on the history of set theory by Akihiro Kanamori (especially Kanamori 1996, 1997, 2003, 2004, 2012) and the joint paper with Dreben (Dreben and Kanamori 1997). For the place of set theory in the development of modern logic, see Mancosu et al. 2009, especially pages 345–352.

For an account of the various axiom systems and the role of the different axioms, see Fraenkel et al. (1973). For a detailed summary of the role of the Axiom of Choice, and insight into the question of its status as a logical principle, see Bell (2009).

This entry will be supplemented by a further entry on axiomatizations of set theory after Zermelo from 1920 to 1940.

## Bibliography

Most of the original sources surrounding Zermelo's work were written in German, and some in French; when translations of these works into English are available, bibliographic information for the translations follows the citation of the original text. Similarly for older, relatively inaccessible texts that have been republished in more current works.

- Bell, J., 2009,
*The Axiom of Choice*, London: College Publications. - Benacerraf, P. and H. Putnam (eds.), 1964,
*Philosophy of Mathematics: Selected Readings*, Oxford: Basil Blackwell. - ––– (eds.), 1983,
*Philosophy of Mathematics: Selected Readings*, Second Edition, Cambridge: Cambridge University Press. - Bernstein, F., 1905, “Über die Reihe der transfiniten Ordnungszahlen”,
*Mathematische Annalen*60: 187–193. - Borel, E., 1905, “Quelque remarques sur les principes de la théorie des ensembles”,
*Mathematische Annalen*60: 194–195. - Browder, F. (ed.), 1976,
*Mathematical Developments Arising from the Hilbert Problems*, Volume 28 of*Proceedings of Symposia in Pure Mathematics*, Providence: American Mathematical Society. - Cantor, G., 1883a, “Ueber unendliche, lineare Punktmannichfaltigkeiten”
*Mathematische Annalen*21: 545–591. Reprinted in Cantor 1883b and in Cantor 1932: 165–209. English translation in Ewald 1996, Volume 2. - –––, 1883b,
*Grundlagen einer allegemeinen Mannigfaltichkeitslehre. Ein mathematisch-philosophischer Versuch in der Lehre des Unendlichen*, Leipzig: B. G. Teubner. - –––, 1895, “Beiträge zur Begründung der transfiniten Mengenlehre, Erster Artikel”,
*Mathematische Annalen*46: 481–512. Reprinted in Cantor 1932: 282–311. English translation in Cantor 1915. - –––, 1897, “Beiträge zur Begründung der transfiniten Mengenlehre, Zweiter Artikel”,
*Mathematische Annalen*49: 207–246. Reprinted in Cantor 1932: 312–351. English translation in Cantor 1915. - –––, 1915,
*Contributions to the Founding of the Theory of Transfinite Numbers*, La Salle: Open Court. English translation of Cantor 1895, 1897 by Philip E. B. Jourdain. - –––, 1932,
*Gesammelte Abhandlungen mathematischen und philosophischen Inhalts, mit eläuternden Anmerkungen sowie mit Ergänzungen aus dem Briefwechsel Cantor-Dedekind herausgegeben von Ernst Zermelo*, Berlin: Springer. - –––, 1991,
*Georg Cantor: Briefe. Herausgegeben von Herbert Meschkowski*, Berlin: Springer - Dedekind, R., 1888,
*Was sind und was sollen die Zahlen?*, Braunschweig: Vieweg und Sohn. Also reprinted in Dedekind 1932: 335–391; English translation in Ewald 1996: 787–833. - –––, 1932,
*Gesammelte mathematische Werke. Band 3. Herausgegeben von Robert Fricke, Emmy Noether and Öystein Ore*, Braunschweig: Friedrich Vieweg und Sohn. Reprinted with some omissions by Chelsea Publishing Co., New York, 1969. - Dreben, B. and A. Kanamori, 1997, “Hilbert and set theory”,
*Synthese*, 110: 77–125. - Ebbinghaus, H.-D., 2007,
*Ernst Zermelo: An Approach to His Life and Work*, Berlin: Springer. - –––, 2010, “Introductory note to
*Über den Begriff von Definitheit in der Axiomatik*[Zermelo 1929]”, in Zermelo 2010: 352–357. - Ewald, W. (ed.), 1996,
*From Kant to Hilbert*, Oxford: Oxford University Press. - Ewald, W., W. Sieg, and M. Hallett (eds.), 2013,
*David Hilbert's Lectures on the Foundations of Logic and Arithmetic, 1917–1933*, Volume 3 of*Hilbert's Lectures on the Foundations of Mathematics and Physics, 1891–1933*, Berlin: Springer. - Felgner, U., 2010, “Introductory note to
*Untersuchungen über die Grundlagen der Mengenlehre, I*[Zermelo 1908b]”, in Zermelo 2010: 160–188. - Ferreiros, J., 1999,
*Labyrinth of Thought: A History of Set Theory and its Role in Modern Mathematics*, Science Networks Historical Studies, Basel: Birkhäuser. Second Revised Edition, 2007 - Fraenkel, A., Y. Bar-Hillel, and A. Levy, 1973,
*Foundations of Set Theory*. Amsterdam: North-Holland Publishing. - Fraenkel, A. A., 1922, “Zu den Grundlagen der Cantor-Zermeloschen Mengenlehre”,
*Mathematische Annalen*86: 230–237. - –––, 1927,
*Zehn Vorlesungen über die Grundlegung der Mengenlehre*, Leipzig: B. G. Teubner. - Frege, G., 1879,
*Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens*, Halle: Louis Nebert. Reprinted in Frege 1964, English translation in van Heijenoort 1967: 1–82. - –––, 1893,
*Grundgesetze der Arithmetik*, Band 1, Jena: Hermann Pohle. English translation by Philip Ebert and Marcus Rossberg,*Frege, The Basic Laws of Arithmetic, Derived using Concept-Script*, Oxford: Oxford University Press, forthcoming. - –––, 1903,
*Grundgesetze der Arithmetik*, Band II, Jena: Hermann Pohle. English translation by Philip Ebert and Marcus Rossberg,*Frege, The Basic Laws of Arithmetic, Derived using Concept-Script*, Oxford: Oxford University Press, forthcoming. - –––, 1964,
*Begriffsschrift und andere Aufsätze. Mit E. Husserls und H. Scholz’ Anmerkungen herausgegeben von Ignacio Angelelli*, Darmstadt: Wissenschaftliche Buchgesellschaft. - Gödel, K., 1944, “Russell's mathematical logic”, in P. A. Schillp (ed.),
*The Philosophy of Bertrand Russell*, pp. 125–153, La Salle: Open Court. Reprinted in Benacerraf and Putnam 1964: 211–232; Benacerraf and Putnam 1983: 447–469; and in Gödel 1990: 119–141. - –––, 1990,
*Kurt Gödel: Collected Works*, Volume 2, edited by Solomon Feferman et al., Oxford: Oxford University Press. - Haaparanta, L. (ed.), 2009,
*The Development of Modern Logic*, Oxford: Oxford University Press. - Hadamard, J. et al., 1905, “Cinq letters sur la théorie des ensembles”,
*Bulletin de la société mathématique de France*, 33: 261–273. Letters between Baire, Borel, Lebesgue and Hadamard on objections to, and defense of, Zermelo's 1904 proof of the well-ordering theorem. - Hallett, M., 1981, “Russell, Jourdain and ‘limitation of size’”,
*British Journal for the Philosophy of Science*, 32: 381–399. - –––, 1984,
*Cantorian Set Theory and Limitation of Size*, Oxford: Clarendon Press. - –––, 2008, “The ‘purity of method’ in Hilbert's
*Grundlagen der Geometrie*”, in P. Mancosu (ed.),*The Philosophy of Mathematical Practice*, pp. 198–255, Oxford: Clarendon Press. - –––, 2010a, “Frege and Hilbert”, in M. Potter and T. Ricketts (eds.),
*The Cambridge Companion to Frege*, Cambridge: Cambridge University Press. - –––, 2010b, “Introductory note to Zermelo's two papers on the well-ordering theorem”, in Zermelo 2010: 80–115.
- Hallett, M. and U. Majer (eds.), 2004,
*David Hilbert's Lectures on the Foundations of Geometry, 1891–1902*, Volume 1 of*Hilbert's Lectures on the Foundations of Mathematics and Physics, 1891–1933*, Berlin: Springer. - Hardy, G. H., 1904, “A theorem concerning the infinite cardinal numbers”,
*Quarterly Journal of Pure and Applied Mathematics*35: 87–94. - Hartogs, F., 1915, “Über das Problem der Wohlordnung”,
*Mathematische Annalen*, 76: 438–442. - Harward, A. E., 1905, “On the transfinite numbers”,
*Philosophical Magazine*10(6): 439–460. - Hausdorff, F., 1914,
*Grundzüge der Mengenlehre*, Leipzig: Von Veit. - Hawkins, T., 1970,
*Lebesgue's Theory of Integration*. New York: Blaisdell. Reprinted by the Chelsea Publishing Company, New York, 1979. - van Heijenoort, J. (ed.), 1967,
*From Frege to Gödel: A Source Book in Mathematical Logic*, Cambridge, Massachusetts: Harvard University Press. - Heinzmann, G., 1986,
*Poincaré, Russell, Zermelo et Peano. Textes de la discusion (1906–1912) sur les fondements des mathématiques: des antinomies à la prédicativité*, Paris: Albert Blanchard. - Hessenberg, G., 1906, “Grundbegriffe der Mengenlehre”,
*Abhandlungen der neuen Fries'schen Schule (Neue Folge)*1: 479–706. - Hilbert, D., 1899, “Grundlagen der Geometrie”, in
*Festschrift zur Feier der Enthüllung des Gauss-Weber-Denkmals in Göttingen*, Leipzig: B. G. Teubner. Republished as Chapter 5 in Hallett and Majer 2004. - –––, 1900a, “Mathematische Probleme”,
*Nachrichten von der königlichen Gesellschaft der Wissenschaften zu Göttingen, mathematisch-physikalische Klasse*, pp. 253–296. English translation by Mary Winston Newson, 1902, “Mathematical Problems”*Bulletin of the American Mathematical Society*8: 437–479. - –––, 1900b, “Über den Zahlbegriff”,
*Jahresbericht der deutschen Mathematiker-Vereinigung*8: 180–185. Reprinted (with small modifications) in Second to Seventh Editions of Hilbert 1899. - –––, 1902, “Grundlagen der Geometrie”. Ausarbeitung by August Adler for lectures in the Sommersemester of 1902 at the Georg-August Universität, Göttingen. Library of the Mathematisches Institut. Published as Chapter 6 in Hallett and Majer 2004.
- –––, 1918, “Axiomatisches Denken”,
*Mathematische Annalen*, 78: 405–415. Reprinted in Hilbert 1935: 146–156; English translation in Ewald 1996: volume 2, pp. 1105–1115. - –––, 1920, “Probleme der mathematischen Logik”, Lecture notes for a course held in the Wintersemester of 1920 at the Georg-August Universität, Göttingen, ausgearbeitet by Moses Schönfinkel and Paul Bernays. Library of the Mathematisches Institut, Universität Göttingen. Published in Ewald et al. 2013, Chapter 2.
- –––, 1935,
*Gesammelte Abhandlungen, Band 3*. Berlin: Julius Springer. - Jourdain, P. E. B., 1904, “On the the transfinite cardinal numbers of well-ordered aggregates”,
*Philosophical Magazine*7(6): 61–75. - –––, 1905a, “On a proof that every aggregate can be well-ordered”,
*Mathematische Annalen*60: 465–470. - –––, 1905b, “On transfinite numbers of the exponential form”,
*Philosophical Magazine*9(6): 42–56. - Kanamori, A., 1996, “The mathematical development of set theory from Cantor to Cohen”,
*Bulletin of Symbolic Logic*2: 1–71. - –––, 1997, “The mathematical import of Zermelo's well-ordering theorem”,
*Bulletin of Symbolic Logic*3: 281–311. - –––, 2003, “The empty set, the singleton, and the ordered pair”,
*Bulletin of Symbolic Logic*9: 273–298. - –––, 2004, “Zermelo and set theory”,
*Bulletin of Symbolic Logic*10: 487–553. - –––, 2012, “In praise of replacement”,
*Bulletin of Symbolic Logic*, 18: 46–90. - Kuratowski, C., 1921, “Sur la notion de l'ordre dans la théorie des ensembles”,
*Fundamenta Mathematicae*2: 161–171. - –––, 1922, “Une méthode d'élimination des nombres transfini des raisonnements mathématiques”,
*Fundamenta Mathematicae*3: 76–108. - Mancosu, P., 2009, “Measuring the size of infinite collections of natural numbers: was Cantor's theory of infinite number inevitable?”,
*Review of Symbolic Logic*2: 612–646. - –––, 2010,
*The Adventure of Reason: Interplay Between Philosophy of Mathematics and Mathematical Logic, 1900–1940*. Oxford: Oxford University Press. - Mancosu, P., R. Zach, and C. Badesa, 2009, “The development of mathematical logic from Russell to Tarski, 1900–1935”, in Haaparanta 2009: 318–470. Reprinted in Mancosu 2010: 5–119.
- Mirimanoff, D., 1917a, “Les antinomies de Russell et de Burali-Forti et le problème fondamental de la théorie des ensembles”, *L'enseignement mathématique* 19: 37–52.
- –––, 1917b, “Remarques sur la théorie des ensembles et les antinomies Cantoriennes (I)”, *L'enseignement mathématique* 19: 208–217.
- –––, 1921, “Remarques sur la théorie des ensembles et les antinomies Cantoriennes (II)”, *L'enseignement mathématique* 21: 29–52.
- Moore, G., 1976, “Ernst Zermelo, A. E. Harward, and the axiomatisation of set theory”, *Historia Mathematica* 3: 206–209.
- –––, 1982, *Zermelo's Axiom of Choice: Its Origins, Development and Influence*, Berlin: Springer.
- Peano, G., 1906, “Additione”, *Revista di mathematica* 8: 143–157. Reprinted in Heinzmann 1986: 106–120.
- Peckhaus, V., 1990, *Hilbertprogramm und kritische Philosophie: das Göttinger Modell interdisziplinärer Zusammenarbeit zwischen Mathematik und Philosophie*, Volume 7 of *Studien zur Wissenschafts- Sozial- und Bildungsgeschichte*, Göttingen: Vandenhoeck and Ruprecht.
- Poincaré, H., 1905, “Les mathématiques et la logique”, *Revue de métaphysique et de morale* 13: 815–835. Reprinted with alterations in Poincaré 1908: Part II, Chapter 3; and, with these alterations noted, in Heinzmann 1986: 11–34. English translation in Ewald 1996: 1021–1038.
- –––, 1906a, “Les mathématiques et la logique”, *Revue de métaphysique et de morale* 14: 17–34. Reprinted with alterations in Poincaré 1908: Part II, Chapter 3; and, with these alterations noted, in Heinzmann 1986: 35–53. English translation in Ewald 1996: 1038–1052.
- –––, 1906b, “Les mathématiques et la logique”, *Revue de métaphysique et de morale* 14: 294–317. Reprinted with alterations in Poincaré 1908: Part II, Chapter 5; and, with these alterations noted, in Heinzmann 1986: 35–53. English translation in Ewald 1996: 1052–1071.
- –––, 1908, *Science et méthode*, Paris: Ernest Flammarion. English translation in Poincaré 1913b, and retranslated by Francis Maitland as *Science and Method*, New York: Dover Publications.
- –––, 1909, “La logique de l'infini”, *Revue de métaphysique et de morale* 17: 462–482. Reprinted in Poincaré 1913a: 7–31.
- –––, 1913a, *Dernières Pensées*, Paris: Ernest Flammarion. English translation published in 1963 as *Mathematics and Science: Last Essays*, New York: Dover Publications.
- –––, 1913b, *The Foundations of Science*, New York: Science Press. Preface by Poincaré and an Introduction by Josiah Royce. Contains English translation by G. B. Halsted of Poincaré 1908.
- Ramsey, F. P., 1926, “The foundations of mathematics”, *Proceedings of the London Mathematical Society* 25 (Second Series): 338–384. Reprinted in Ramsey 1931: 1–61, and Ramsey 1978: 152–212.
- –––, 1931, *The Foundations of Mathematics and Other Logical Essays*, R. B. Braithwaite (ed.), London: Routledge and Kegan Paul.
- –––, 1978, *Foundations: Essays in Philosophy, Logic, Mathematics and Economics*, D. H. Mellor (ed.), London: Routledge and Kegan Paul.
- Richard, J., 1905, “Les principes des mathématiques et le problème des ensembles”, *Revue générale des sciences pures et appliquées* 16: 541. English translation in van Heijenoort 1967: 142–144.
- Russell, B., 1902, Letter to Frege. In van Heijenoort 1967: 124–125.
- –––, 1903, *The Principles of Mathematics*, Volume 1, Cambridge: Cambridge University Press.
- Schoenflies, A., 1905, “Über wohlgeordnete Mengen”, *Mathematische Annalen* 60: 181–186.
- Skolem, T., 1923, “Einige Bemerkungen zur axiomatischen Begründung der Mengenlehre”, *Matematikerkongressen i Helsingfors den 4–7 Juli 1922, Den femte skandinaviska matematikerkongressen, redogörelse, 1923*, pp. 217–232. Reprinted in Skolem 1970: 137–152, which also preserves the original pagination. English translation in van Heijenoort 1967: 290–301.
- –––, 1970, *Selected Papers in Logic*, Oslo: Universitetsforlaget. Edited by Jens Erik Fenstad.
- von Neumann, J., 1923, “Zur Einführung der transfiniten Zahlen”, *Acta Litterarum ac Scientiarum Regiæ Universitatis Hungaricæ Francisco-Josephinæ. Sectio Scientiæ-Mathematicæ* 1, pp. 199–208. Reprinted in von Neumann 1961: 24–33. English translation in van Heijenoort 1967: 346–354.
- –––, 1928, “Über die Definition durch transfinite Induktion und verwandte Fragen der allgemeinen Mengenlehre”, *Mathematische Annalen* 99: 373–391. Reprinted in von Neumann 1961: 320–338.
- –––, 1961, *John von Neumann: Collected Works*, Volume 1, Oxford: Pergamon Press.
- Weyl, H., 1910, “Über die Definitionen der mathematischen Grundbegriffe”, *Mathematisch-naturwissenschaftliche Blätter* 7, pp. 93–95, 109–113. Reprinted in Weyl 1968, Volume 1, 298–304.
- –––, 1968, *Gesammelte Abhandlungen*, 4 Volumes, Berlin: Springer.
- Young, W. H. and G. C. Young, 1906, *The Theory of Sets of Points*, Cambridge: Cambridge University Press.
- Zermelo, E., 1904, “Beweis, daß jede Menge wohlgeordnet werden kann”, *Mathematische Annalen* 59: 514–516. Reprinted in Zermelo 2010: 114–119, with a facing-page English translation, and an Introduction by Michael Hallett (2010b). English translation also in van Heijenoort 1967: 139–141.
- –––, 1908a, “Neuer Beweis für die Möglichkeit einer Wohlordnung”, *Mathematische Annalen* 65: 107–128. Reprinted in Zermelo 2010: 120–159, with a facing-page English translation, and an Introduction by Michael Hallett (2010b). English translation also in van Heijenoort 1967: 183–198.
- –––, 1908b, “Untersuchungen über die Grundlagen der Mengenlehre, I”, *Mathematische Annalen* 65: 261–281. Reprinted in Zermelo 2010: 189–228, with a facing-page English translation, and an Introduction by Ulrich Felgner (2010). English translation also in van Heijenoort 1967: 201–215.
- –––, 1929, “Über den Begriff von Definitheit in der Axiomatik”, *Fundamenta Mathematicae* 14: 339–344. Reprinted with facing-page English translation in Zermelo 2010: 358–367, with an Introduction by Heinz-Dieter Ebbinghaus (2010).
- –––, 1930, “Über Grenzzahlen und Mengenbereiche: Neue Untersuchungen über die Grundlagen der Mengenlehre”, *Fundamenta Mathematicae* 16: 29–47. Reprinted with facing-page English translation in Zermelo 2010: 400–431, with an Introduction by Akihiro Kanamori. English translation also in Ewald 1996, Volume 2, pp. 1219–1233.
- –––, 2010, *Collected Works. Volume I: Set Theory, Miscellanea*, H.-D. Ebbinghaus and A. Kanamori (eds.), Berlin: Springer.

## Related Entries

Russell's paradox | set theory | set theory: alternative axiomatic theories | set theory: early development