The Axiom of Choice

First published Tue Jan 8, 2008; substantive revision Fri Dec 10, 2021

The principle of set theory known as the Axiom of Choice has been hailed as “probably the most interesting and, in spite of its late appearance, the most discussed axiom of mathematics, second only to Euclid’s axiom of parallels which was introduced more than two thousand years ago” (Fraenkel, Bar-Hillel & Levy 1973, §II.4). The fulsomeness of this description might lead those unfamiliar with the axiom to expect it to be as startling as, say, the Principle of the Constancy of the Velocity of Light or the Heisenberg Uncertainty Principle. But in fact the Axiom of Choice as it is usually stated appears humdrum, even self-evident. For it amounts to nothing more than the claim that, given any collection of mutually disjoint nonempty sets, it is possible to assemble a new set—a transversal or choice set—containing exactly one element from each member of the given collection. Nevertheless, this seemingly innocuous principle has far-reaching mathematical consequences—many indispensable, some startling—and has come to figure prominently in discussions on the foundations of mathematics. It (or its equivalents) have been employed in countless mathematical papers, and a number of monographs have been exclusively devoted to it.

1. Origins and Chronology of the Axiom of Choice

In 1904 Ernst Zermelo formulated the Axiom of Choice (abbreviated as AC throughout this article) in terms of what he called coverings (Zermelo 1904). He starts with an arbitrary set \(M\) and uses the symbol \(M'\) to denote an arbitrary nonempty subset of \(M\), the collection of which he denotes by M. He continues:

Imagine that with every subset \(M'\) there is associated an arbitrary element \(m_{1}'\), that occurs in \(M'\) itself; let \(m_{1}'\) be called the “distinguished” element of \(M'\). This yields a “covering” \(g\) of the set \(M\) by certain elements of the set \(M\). The number of these coverings is equal to the product [of the cardinalities of all the subsets \(M'\)] and is certainly different from 0.

The last sentence of this quotation—which asserts, in effect, that coverings always exist for the collection of nonempty subsets of any (nonempty) set—is Zermelo’s first formulation of the Axiom of Choice[1]. This is now usually stated in terms of choice functions: here a choice function on a collection \(\sH\) of nonempty sets is a map \(f\) with domain \(\sH\) such that \(f(X) \in X\) for every \(X \in \sH\).

As a very simple example, let \(\sH\) be the collection of nonempty subsets of \(\{0, 1\}\), i.e., \(\sH = \{\{0\}, \{1\}, \{0,1\}\}\). Then \(\sH\) has the two distinct choice functions \(f_{1}\) and \(f_{2}\) given by:

\begin{align} f_{1}(\{0\}) &= 0 \\ f_{1}(\{1\}) &= 1 \\ f_{1}(\{0, 1\}) &= 0 \\ f_{2}(\{0\}) &= 0 \\ f_{2}(\{1\}) &= 1 \\ f_{2}(\{0, 1\}) &= 1 \end{align}

A more interesting example of a choice function is provided by taking \(\sH\) to be the set of (unordered) pairs of real numbers and the function to be that assigning to each pair its least element. A different choice function is obtained by assigning to each pair its greatest element. Clearly many more choice functions on \(\sH\) can be defined.

Stated in terms of choice functions, Zermelo’s first formulation of AC reads:

Any collection of nonempty sets has a choice function.

AC1 can be reformulated in terms of indexed or variable sets. An indexed collection of sets \(\sA = \{A_{i}: i \in I\}\) may be conceived as a variable set, to wit, as a set varying over the index set \(I\). Each \(A_{i}\) is then the “value” of the variable set \(\sA\) at stage \(i\). A choice function on \(\sA\) is a map \(f: I \rightarrow \bigcup_{i\in I} A_{i}\) such that \(f(i) \in A_{i}\) for all \(i\in I\). A choice function on \(\sA\) is thus a “choice” of an element of the variable set \(\sA\) at each stage; in other words, a choice function on \(\sA\) is a variable element of \(\sA\). AC1 is then equivalent to the assertion

Any indexed collection of sets has a choice function.

Informally speaking, AC2 amounts to the assertion that a variable set with an element at each stage has a variable element.

AC1 can also be reformulated in terms of relations, viz.

For any relation \(R\) between sets \(A\), \(B\),

\[ {\forall x\inn A}\ \exists y\inn B[R(x,y)] \Rightarrow \exists f[f: A \rightarrow B \amp {\forall x\inn A}[R(x,fx)]]. \]

In other words, every relation contains a function having the same domain.

Finally AC3 is easily shown to be equivalent (in the usual set theories) to:[2]

Any surjective function has a right inverse.

In a 1908 paper Zermelo introduced a modified form of AC. Let us call a transversal (or choice set) for a family of sets \(\sH\) any subset \(T \subseteq \bigcup \sH\) for which each intersection \(T \cap X\) for \(X \in \sH\) has exactly one element. As a very simple example, let \(\sH = \{\{0\}, \{1\}, \{2, 3\}\}\). Then \(\sH\) has the two transversals \(\{0, 1, 2\}\) and \(\{0, 1, 3\}\). A more substantial example is afforded by letting \(\sH\) be the collection of all lines in the Euclidean plane parallel to the \(x\)-axis. Then the set \(T\) of points on the \(y\)-axis is a transversal for \(\sH\).

Stated in terms of transversals, then, Zermelo’s second (1908) formulation of AC amounts to the assertion that any family of mutually disjoint nonempty sets has a transversal.[3]

Zermelo asserts that “the purely objective character” of this principle “is immediately evident.” In making this assertion Zermelo meant to emphasize the fact that in this form the principle makes no appeal to the possibility of making “choices”. It may also be that Zermelo had the following “combinatorial” justification of the principle in mind. Given a family \(\sH\) of mutually disjoint nonempty sets, call a subset \(S \subseteq \bigcup \sH\) a selector for \(\sH\) if \(S\cap X \ne \varnothing\) for all \(X \in \sH\). Clearly selectors for \(\sH\) exist; \(\bigcup \sH\) itself is an example. Now one can imagine taking a selector \(S\) for \(\sH\) and “thinning out” each intersection \(S \cap X\) for \(X\in \sH\) until it contains just a single element. The result is a transversal for \(\sH\). This argument, suitably refined, yields a precise derivation of AC in this formulation from the set-theoretical principle known as Zorn’s lemma (see below).

Let us call Zermelo’s 1908 formulation the combinatorial axiom of choice:

Any collection of mutually disjoint nonempty sets has a transversal.

It is to be noted that AC1 and CAC for finite collections of sets are both provable (by induction) in the usual set theories. But in the case of an infinite collection, even when each of its members is finite, the question of the existence of a choice function or a transversal is problematic[4]. For example, as already mentioned, it is easy to come up with a choice function for the collection of pairs of real numbers (simply choose the smaller element of each pair). But it is by no means obvious how to produce a choice function for the collection of pairs of arbitrary sets of real numbers.

Zermelo’s original purpose in introducing AC was to establish a central principle of Cantor’s set theory, namely, that every set admits a well-ordering and so can also be assigned a cardinal number. Zermelo’s 1904 introduction of the axiom, as well as the use to which he put it, provoked considerable criticism from the mathematicians of the day. The chief objection raised was to what some saw as its highly non-constructive, even idealist, character: while the axiom asserts the possibility of making a number of—perhaps even an uncountable number—of arbitrary “choices”, it gives no indication whatsoever of how these latter are actually to be effected, of how, otherwise put, choice functions are to be defined. This was particularly objectionable to mathematicians of a “constructive” bent such as the so-called French Empiricists Baire, Borel and Lebesgue, for whom a mathematical object could be asserted to exist only if it can be defined in such a way as to characterize it uniquely. Zermelo’s response to his critics came in the form in two papers in 1908. In the first of these, as remarked above, he reformulated AC in terms of transversals; in the second (1908a) he made explicit the further assumptions needed to carry through his proof of the well-ordering theorem. These assumptions constituted the first explicit presentation of an axiom system for set theory.

As the debate concerning the Axiom of Choice rumbled on, it became apparent that the proofs of a number of significant mathematical theorems made essential use of it, thereby leading many mathematicians to treat it as an indispensable tool of their trade. Hilbert, for example, came to regard AC as an essential principle of mathematics[5] and employed it in his defence of classical mathematical reasoning against the attacks of the intuitionists. Indeed his ε-operators are essentially just choice functions (see the entry on the epsilon calculus).

Although the usefulness of AC quickly become clear, doubts about its soundness remained. These doubts were reinforced by the fact that it had certain strikingly counterintuitive consequences. The most spectacular of these was Banach and Tarski’s paradoxical decompositions of the sphere (Banach and Tarski 1924): any solid sphere can be split into finitely many pieces which can be reassembled to form two solid spheres of the same size; and any solid sphere can be split into finitely many pieces in such a way as to enable them to be reassembled to form a solid sphere of arbitrary size. (See Wagon 1993.)

It was not until the middle 1930s that the question of the soundness of AC was finally put to rest with Kurt Gödel’s proof of its consistency relative to the other axioms of set theory.

Here is a brief chronology of AC:[6]

1904/1908 Zermelo introduces axioms of set theory, explicitly formulates AC and uses it to prove the well-ordering theorem, thereby raising a storm of controversy.
1904 Russell recognizes AC as the multiplicative axiom: the product of arbitrary nonzero cardinal numbers is nonzero.
1914 Hausdorff derives from AC the existence of nonmeasurable sets in the “paradoxical” form that \(\bfrac{1}{2}\) of a sphere is congruent to \(\bfrac{1}{3}\) of it (Hausdorff 1914).
1922 Fraenkel introduces the “permutation method” to establish independence of AC from a system of set theory with atoms (Fraenkel 1922).
1924 Building on the work of Hausdorff, Banach and Tarski derive from AC their paradoxical decompositions of the sphere.
1926 Hilbert introduces into his proof theory the “transfinite” or “epsilon” axiom as a version of AC . (Hilbert 1926).
1936 Lindenbaum and Mostowski extend and refine Fraenkel’s permutation method, and prove the independence of various statements of set theory weaker than AC . (Lindenbaum and Tarski 1938)
1935–38 Gödel establishes relative consistency of AC with the axioms of set theory (Gödel 1938a, 1938b, 1939, 1940).
1950s Mendelson, Shoenfield and Specker, working independently, use the permutation method to establish the independence of various forms of AC from a system of set theory without atoms, but also lacking the axiom of foundation (Mendelson 1956, 1958, Shoenfield 1955, Specker 1957).
1963 Paul Cohen proves independence of AC from the standard axioms of set theory (Cohen 1963, 1964).

2. Independence and Consistency of the Axiom of Choice

As stated above, in 1922 Fraenkel proved the independence of AC from a system of set theory containing “atoms”. Here by an atom is meant a pure individual, that is, an entity having no members and yet distinct from the empty set (so a fortiori an atom cannot be a set). In a system of set theory with atoms it is assumed that one is given an infinite set \(A\) of atoms. One can build a universe \(V(A)\) of sets over \(A\) by starting with \(A\), adding all the subsets of \(A\), adjoining all the subsets of the result, etc., and iterating transfinitely. \(V(A)\) is then a model of set theory with atoms. The kernel of Fraenkel’s method for proving the independence of AC is the observation that, since atoms cannot be set-theoretically distinguished, any permutation of the set \(A\) of atoms induces a structure-preserving permutation—an automorphism—of the universe \(V(A)\) of sets built from \(A\). This idea may be used to construct another model \(Sym(V)\) of set theory—a permutation or symmetric model—in which a set of mutually disjoint pairs of elements of \(A\) has no choice function.

Now let us suppose that we are given a group \(G\) of automorphisms of \(A\). Let us say that an automorphism \(\pi\); of \(A\) fixes an element \(x\) of \(V(A)\) if \(\pi(x) = x\). Clearly, if \(\pi \in G\) fixes every element of \(A\), it also fixes every element of \(V(A)\). Now it may be the case that, for certain elements \(x \in V(A)\), the fixing of the elements of a subset of \(A\) by any \(\pi\in G\) suffices to fix \(x\). We are therefore led to define a support for \(x\) to be a subset \(X\) of \(A\) such that, whenever \(\pi\in G\) fixes each member of \(X\), it also fixes \(x\). Members of \(V(A)\) possessing a finite support are called symmetric.

We next define the universe \(Sym(V)\) to consist of the hereditarily symmetric members of \(V(A)\), that is, those \(x\in V(A)\) such that \(x\), the elements of \(x\), the elements of elements of \(x\), etc., are all symmetric. \(Sym(V)\) is also a model of set theory with set of atoms \(A\), and \(\pi\) induces an automorphism of \(Sym(V)\).

Now suppose \(A\) to be partitioned into a (necessarily infinite) mutually disjoint set \(P\) of pairs. Take \(G\) to be the group of permutations of \(A\) which fix all the pairs in \(P\). Then \(P\in Sym(V)\); it can now be shown that \(Sym(V)\) contains no choice function on \(P\). For suppose \(f\) were a choice function on \(P\) and \(f \in Sym(V)\). Then \(f\) has a finite support which may be taken to be of the form \(\{a_{1}, \ldots, a_{n}, b_{1},\ldots,b_{n}\} with each pair \{a_{i}, b_{i}\} \in P\). Since \(P\) is infinite, we may select a pair \(\{c, d\} = U\) from \(P\) different from all the \(\{a_{i}, b_{i}\}\). Now we define \(\pi \in G\) so that \(\pi\) fixes each \(a_{i}\) and \(b_{i}\) and interchanges \(c\) and \(d\). Then \(\pi\) also fixes \(f\). Since \(f\) was supposed to be a choice function on \(P\), and \(U \in P\), we must have \(f(U) \in U\), that is, \(f(U) = c\) or \(f(U) = d\). Since \(\pi\) interchanges \(c\) and \(d\), it follows that \(\pi(f(U)) \ne f(U)\). But since \(\pi\) is an automorphism, it also preserves function application, so that \(\pi(f(U)) = \pi f(\pi(U))\). But \(\pi(U) = U\) and \(\pi f = f\), whence \(\pi(f(U)) = f(U)\). We have duly arrived at a contradiction, showing that the universe \(Sym(V)\) contains no choice function on \(P\).

The point here is that for a symmetric function \(f\) defined on \(P\) there is a finite list \(L\) of pairs from \(P\) the fixing of all of whose elements suffices to fix \(f\), and hence also all the values of \(f\). Now, for any pair \(U\) in \(P\) but not in \(L\) , a permutation \(\pi\) can always be found which fixes all the elements of the pairs in \(L\), but does not fix the members of \(U\). Since \(\pi\) must fix the value of \(f\) at \(U\), that value cannot lie in \(U\). Therefore \(f\) cannot “choose” an element of \(U\), so a fortiori \(f\) cannot be a choice function on \(P\).

This argument shows that collections of sets of atoms need not necessarily have choice functions, but it fails to establish the same fact for the “usual” sets of mathematics, for example the set of real numbers. This had to wait until 1963 when Paul Cohen showed that it is consistent with the standard axioms of set theory (which preclude the existence of atoms) to assume that a countable collection of pairs of sets of real numbers fails to have a choice function. The core of Cohen’s method of proof—the celebrated method of forcing—was vastly more general than any previous technique; nevertheless his independence proof also made essential use of permutation and symmetry in essentially the form in which Fraenkel had originally employed them.

Gödel’s proof of the relative consistency of AC with the axioms of set theory (see the entry on Kurt Gödel) rests on an entirely different idea: that of definability. He introduced a new hierarchy of sets—the constructible hierarchy—by analogy with the cumulative type hierarchy. We recall that the latter is defined by the following recursion on the ordinals, where \(\sP(X)\) is the power set of \(X\), \(\alpha\) is an ordinal, and \(\lambda\) is a limit ordinal::

\begin{align} V_0 &= \varnothing \\ V_{\alpha+1} &= \sP(V_{\alpha}) \\ V_{\lambda} &= {\bigcup}_{\alpha \lt \lambda} V_{\alpha} \end{align}

The constructible hierarchy is defined by a similar recursion on the ordinals, where \(\Def(X)\) is the set of all subsets of \(X\) which are first-order definable in the structure \((X, \in, (x)_{x\in X})\):[7]

\begin{align} L_0 &= \varnothing \\ L_{\alpha+1} &= \Def(L_{\alpha}) \\ L_{\lambda} &= {\bigcup}_{\alpha\lt \lambda} L_{\alpha} \end{align}

The constructible universe is the class \(L = \bigcup_{\alpha\in \Ord} L_{\alpha}\); the members of \(L\) are the constructible sets. Gödel showed that (assuming the axioms of Zermelo-Fraenkel set theory ZF) the structure \((L, \in)\) is a model of ZF and also of AC as well as the Generalized Continuum Hypothesis). The relative consistency of AC with ZF follows.

It was also observed by Gödel (1964) (and, independently, by Myhill and Scott 1971, Takeuti 1963 and Post 1951) that a simpler proof of the relative consistency of AC can be formulated in terms of ordinal definability. If we write \(\rD(X)\) for the set of all subsets of \(X\) which are first-order definable in the structure \((X, \in)\), then the class OD of ordinal definable sets is defined to be the union \(\bigcup_{\alpha\in \Ord} D(V_{\alpha})\). The class HOD of hereditarily ordinal definable sets consists of all sets \(a\) for which \(a\), the members of \(a\), the members of members of \(a\), … etc., are all ordinal definable. It can then be shown that the structure (HOD, \(\in\)) is a model of ZF + AC , from which the relative consistency of AC with ZF again follows.[8]

3. Maximal Principles and Zorn’s Lemma

The Axiom of Choice is closely allied to a group of mathematical propositions collectively known as maximal principles. Broadly speaking, these propositions assert that certain conditions are sufficient to ensure that a partially ordered set contains at least one maximal element, that is, an element such that, with respect to the given partial ordering, no element strictly exceeds it.

To see the connection between the idea of a maximal element and AC, let us return to the latter’s formulation AC2 in terms of indexed sets. Accordingly suppose we are given an indexed family of nonempty sets \(\sA = \{A_{i}:i \in I\}.\) Let us define a potential choice function on \(\sA\) to be a function \(f\) whose domain is a subset of \(I\) such that \(f(i) \in A_{i}\) for all \(i\in J\). (Here the use of the qualifier potential is suggested by the fact that the domain is a subset of \(I\); recall that a choice function \(f\) on \(\sA\) has the same properties as what we are now calling potential choice functions except that the domain of \(f\) is required to be all of \(I\), not just a subset.) The set \(P\) of potential choice functions on \(\sA\) can be partially ordered by inclusion: we agree that, for potential choice functions \(f, g \in P\), the relation \(f \le g\) holds provided that the domain of \(f\) is included in that of \(g\) and the value of \(f\) at an element of its domain coincides with the value of \(g\) there. It is now easy to see that the maximal elements of \(P\) with respect to the partial ordering \(\le\) are precisely the choice functions on \(\sA\).

Zorn’s Lemma is the best-known principle ensuring the existence of such maximal elements. To state it, we need a few definitions. Given a partially ordered set \((P, \le)\), an upper bound for a subset \(X\) of \(P\) is an element \(a\in P\) for which \(x\le a\) for every \(x\in X\); a maximal element of \(P\) may then be defined as an element \(a\) for which the set of upper bounds of \(\{a\}\) coincides with \(\{a\}\), which essentially means that no element of \(P\) is strictly larger than \(a\). A chain in \((P,\le)\) is a subset \(C\) of \(P\) such that, for any \(x\), \(y\in P\), either \(x\le y\) or \(y\le x\). \(P\) is said to be inductive if every chain in \(P\) has an upper bound. Now Zorn’s Lemma asserts:

Zorn’s Lemma (ZL):
Every nonempty inductive partially ordered set has a maximal element.

Why is Zorn’s Lemma plausible? Here is an informal argument. Given a nonempty inductive partially ordered set \((P,\le)\), pick an arbitrary element \(p_{0}\) of \(P\). If \(p_{0}\) is maximal, stop there. Otherwise pick an element \(p_{1} \gt p_{0}\); if \(p_{1}\) is maximal, stop there. Otherwise pick an element \(p_{2} \gt p_{1}\), and repeat the process. If none of the elements \(p_{0} \lt p_{1} \lt p_{2} \lt \ldots \lt p_{n} \lt \ldots\) is maximal, the \(p_{i}\) form a chain which, since \(P\) is inductive, has an upper bound \(q_{0}\). If \(q_{0}\) is maximal, stop there. Otherwise the procedure can be repeated with \(q_{0} \lt q_{1}\), …, and then iterated. This process must eventually terminate, since otherwise the union of the chains so generated would constitute a proper class, making \(P\) itself a proper class contrary to assumption. The point at which the process terminates yields a maximal element of \(P\).

This argument, suitably rigorized, gives a proof[9] of ZL from AC1 in Zermelo-Fraenkel set theory: in this proof AC1 is used to “pick” the elements referred to in the informal argument.

Another version of Zorn’s Lemma can be given in terms of collections of sets. Given a collection \(\sH\) of sets, let us call a nest in \(\sH\) any subcollection \(\sN\) of \(\sH\) such that, for any pair of members of \(\sN\), one is included in the other.[10] Call \(\sH\) strongly inductive if the union of any nest in \(\sH\) is a member of \(\sH\). Zorn’s Lemma may then be equivalently restated as the assertion that any nonempty strongly inductive collection \(\sH\) of sets has a maximal member, that is, a member properly included in no member of \(\sH\). This may in turn be formulated in a dual form. Call a family of sets strongly reductive if it is closed under intersections of nests. Then any nonempty strongly reductive family of sets has a minimal element, that is, a member properly including no member of the family.

AC2 is now easily derived from Zorn’s Lemma in this alternative form. For the set \(P\) of potential choice functions on an indexed family of sets \(\sA\) is clearly nonempty and is readily shown to be strongly inductive; so Zorn’s lemma yields the existence of a choice function on \(\sA\).

CAC can be derived from ZL in a way echoing the “combinatorial” justification of CAC sketched above. Accordingly suppose we are given a family \(\sH\) of mutually disjoint nonempty sets; call a subset \(S \subseteq \bigcup \sH\) a sampling for \(\sH\) if, for any \(X\in \sH\), either \(X \subseteq S\) or \(S \cap X\) is nonempty and finite. Minimal samplings are precisely transversals for \(\sH\);[11] and the collection \(\sT\) of samplings is clearly nonempty since it contains \(\bigcup \sH\). So if it can be shown that \(\sT\) is strongly reductive,[12] Zorn’s lemma will yield a minimal element of \(\sT\) and so a transversal for \(\sH\). The strong reductiveness of \(\sT\) may be seen as follows: suppose that \(\{S_{i}: i\in I\}\) is a nest of samplings; let \(S = {\bigcap}_{i\in I} S_{i}\). We need to show that \(S\) is itself a sampling; to this end let \(X \in \sH\) and suppose \(\neg(X \subseteq S)\). Then there is \(i \in I\) for which \(\neg(X \subseteq S_{i})\); since \(S_{i}\) is a sampling, \(S_{i}\cap X\) is finite and nonempty, say \(S_{i}\cap X = \{x_{1}, \ldots, x_{n}\}\). Clearly \(S\cap X\) is then finite; suppose for the sake of contradiction that \(S\cap X = \varnothing\). Then for each \(k = 1,\ldots,n\) there is \(i_k \in I\) for which \(\neg(x_k \in S_{i_k})\). It follows that \(\neg(S_{i} \subseteq S_{i_k})\), for \(k = 1, \ldots, n\). So, since the \(S_{i}\) form a chain, each \(S_{i_k}\) is a subset of \(S_{i}\). Let \(S_{j}\) be the least of \(S_{i_1},\ldots, S_{i_k}\); then \(S_{j}\subseteq S_{i}\). But since \(\neg(x_{k} \in S_{j})\), for \(k = 1, \ldots, n\), it now follows that \(S_{j} \cap X = \varnothing\), contradicting the fact that \(S_{j}\) is a sampling. Therefore \(S \cap X\ne \varnothing\); and \(S\) is a sampling as claimed.

We note that while Zorn’s lemma and the Axiom of Choice are set-theoretically equivalent, it is much more difficult to derive the former from the latter than vice-versa.

Here is a brief chronology of maximal principles.

1909 Hausdorff introduces first explicit formulation of a maximal principle and derives it from AC (Hausdorff 1909)
1914 Hausdorff’s Grundzüge der Mengenlehre (one of the first books on set theory and general topology) includes several maximal principles.
1922 Kuratowski formulates and employs several maximal principles in avoiding use of transfinite ordinals (Kuratowski 1922).
1926–28 Bochner and others independently introduce maximal principles (Bochner 1928, Moore 1932).
1935 Max Zorn, apparently unacquainted with previous formulations of maximal principles, publishes (Zorn 1935) his definitive version thereof later to become celebrated as his lemma (ZL). ZL was first formulated in Hamburg in 1933, where Chevalley and Artin quickly “adopted” it. It seems to have been Artin who first recognized that ZL would yield AC, so that the two are equivalent (over the remaining axioms of set theory). Zorn regarded his principle less as a theorem than as an axiom—he hoped that it would supersede cumbersome applications in algebra of transfinite induction and well-ordering, which algebraists in the Noether school had come to regard as “transcendental” devices.
1939–40 Teichmüller, Bourbaki and Tukey independently reformulate ZL in terms of “properties of finite character”(Bourbaki 1939, Teichmuller 1939, Tukey 1940).

4. Mathematical Applications of the Axiom of Choice

The Axiom of Choice has numerous applications in mathematics, a number of which have proved to be formally equivalent to it[13]. Historically the most important application was the first, namely:

The Well-Ordering Theorem (Zermelo 1904, 1908). Every set can be well-ordered.

After Zermelo published his 1904 proof of the well-ordering theorem from AC, it was quickly seen that the two are equivalent.

Another early equivalent of AC is

The Multiplicative Axiom (Russell 1906). The product of any set of non-zero cardinal numbers is non-zero.

Early applications of AC include:

  • Every infinite set has a denumerable subset. This principle, again weaker than AC, cannot be proved without it in the context of the remaining axioms of set theory.
  • Every infinite cardinal number is equal to its square. This was proved equivalent to AC in Tarski 1924.
  • Every vector space has a basis (initiated by Hamel 1905). This was proved equivalent to AC in Blass 1984.
  • Every field has an algebraic closure (Steinitz 1910). This assertion is weaker than AC, indeed is a consequence of the (weaker) compactness theorem for first-order logic (see below).
  • There is a Lebesgue nonmeasurable set of real numbers (Vitali 1905). This was shown much later to be a consequence of BPI (see below) and hence weaker than AC. Solovay (1970) established its independence of the remaining axioms of set theory.

A significant “folklore” equivalent of AC is

The Set-Theoretic Distributive Law. For an arbitrary doubly-indexed family of sets \(\{M_{i,j}: i \in I,j \in J\}\), and where \(J^I\) is the set of all functions with domain \(I\) and which take values in \(J\):

\[ \bigcap_{i\in I} \bigcup_{j\in J} M_{i,j} = \bigcup_{f\in J^I} \bigcap_{i\in I} M_{i,f(i)} \]

A much-studied special case of AC is the

Principle of Dependent Choices (Bernays 1942, Tarski 1948). For any nonempty relation \(R\) on a set \(A\) for which \(\range(R) \subseteq \domain(R)\), there is a function \(f: \omega \rightarrow A\) such that, for all \(n\in \omega, R(f(n),f(n+1))\). This principle, although (much) weaker than AC, cannot be proved without it in the context of the remaining axioms of set theory.

Mathematical equivalents of AC include:

  • Tychonov’s Theorem (1930): the product of compact topological spaces is compact. This was proved equivalent to AC in Kelley 1950. But for compact Hausdorff spaces it is equivalent to BPI (see below) and hence weaker than AC
  • Löwenheim-Skolem-Tarski Theorem (Löwenheim 1915, Skolem 1920, Tarski and Vaught 1957): a first-order sentence having a model of infinite cardinality \(\kappa\) also has a model of any infinite cardinality \(\mu\) such that \(\mu \le \kappa\). This was proved equivalent to AC by Tarski.
  • Krein-Milman Theorem: the unit ball \(B\) of the dual of a real normed linear space has an extreme point, that is, one which is not an interior point of any line segment in \(B\). This was proved equivalent to AC in Bell and Fremlin 1972a. There it is shown that, given any indexed family \(\sA\) of nonempty sets, there is a natural bijection between choice functions on \(\sA\) and the extreme points of the unit ball of the dual of a certain real normed linear space constructed from \(\sA\).
  • Every distributive lattice has a maximal ideal. This was proved equivalent to AC in Klimovsky 1958, and for lattices of sets in Bell and Fremlin 1972.
  • Every commutative ring with identity has a maximal ideal. This was proved equivalent to AC by Hodges 1979.

There are a number of mathematical consequences of AC which are known to be weaker[14] than it, in particular:

  • The Boolean Prime Ideal Theorem (BPI): every Boolean algebra has a maximal (or prime) ideal. This was shown to be weaker than AC in Halpern and Levy 1971.
  • The Stone Representation Theorem for Boolean algebras (Stone 1936): every Boolean algebra is isomorphic to a field of sets. This is equivalent to BPI and hence weaker than AC
  • Compactness Theorem for First-Order Logic (Gödel 1930, Malcev 1937, others): if every finite subset of a set of first-order sentences has a model, then the set has a model. This was shown, in Henkin 1954, to be equivalent to BPI, and hence weaker than AC.
  • Completeness Theorem for First-Order Logic (Gödel 1930, Henkin 1954): each consistent set of first-order sentences has a model. This was shown by Henkin in 1954 to be equivalent to BPI, and hence weaker than AC. If the cardinality of the model is specified in the right way, the assertion becomes equivalent to AC.

Finally, there is

  • The Sikorski Extension Theorem for Boolean algebras (Sikorski 1949): every complete Boolean algebra is injective, i.e., for any Boolean algebra \(A\) and any complete Boolean algebra \(B\), any homomorphism of a subalgebra of \(A\) into \(B\) can be extended to the whole of \(A\).

The question of the equivalence of this with AC is one of the few remaining interesting open questions in this area; while it clearly implies BPI, it was proved independent of BPI in Bell 1983.

Many of these theorems are discussed in Bell and Machover (1977).

5. The Axiom of Choice and Logic

An initial connection between AC and logic emerges by returning to its formulation AC3 in terms of relations, namely: any binary relation contains a function with the same domain. This version of AC is naturally expressible within a second-order language \(L\) with individual variables \(x\), \(y\), \(z\), … and function variables \(f\), \(g\), \(h\), …. In \(L\), binary relations are represented by formulas \(\phi(x, y)\) with two free individual variables \(x\), \(y\). The counterpart in \(L\) of the assertion AC3 is then

\(\forall x \exists y \phi(x, y) \rightarrow \exists f \forall x \phi(x, fx).\)

This scheme of sentences is the standard logical form of AC.

Zermelo’s original form of the Axiom of Choice, AC1, can be expressed as a scheme of sentences within a suitably strengthened version of \(L\). Accordingly we now suppose \(L\) to contain in addition predicate variables \(X\), \(Y\), \(Z\), … and second-order function variables \(F\), \(G\), \(H\), …. Here a second-order function variable \(F\) may be applied to a predicate variable \(X\) to yield an individual term \(FX\). The scheme of sentences

\(\forall X [\Phi(X) \rightarrow \exists x X(x)] \rightarrow \exists F \forall X[\Phi(X) \rightarrow X(FX)]\)

is the direct counterpart of AC1 in this strengthened second-order language. In words, AC1L asserts that, if each predicate having a certain property \(\Phi\) has instances, then there is a function \(F\) on predicates such that, for any predicate \(X\) satisfying \(\Phi\), \(FX\) is an instance of \(X\). Here predicates are playing the role of sets.

Up to now we have tacitly assumed our background logic to be the usual classical logic. But the true depth of the connection between AC and logic emerges only when intuitionistic or constructive logic is brought into the picture. It is a remarkable fact that, assuming only the framework of intuitionistic logic together with certain mild further presuppositions, the Axiom of Choice can be shown to entail the cardinal rule of classical logic, the law of excluded middle—the assertion that \(A \vee \neg A\) for any proposition \(A\). To be precise, using the rules of intuitionistic logic within our augmented language \(L\), we shall derive[15] the law of excluded middle from AC1L conjoined with the following additional principles:

Predicative Comprehension:
\(\exists X \forall x[X(x) \leftrightarrow \phi(x)]\), where \(\phi\) contains no bound function or predicate variables.

Extensionality of Functions:
\(\forall X \forall Y \forall F[X \approx Y \rightarrow FX = FY]\), where \(X \approx Y\) is an abbreviation for \(\forall x[X(x) \leftrightarrow Y(x)]\), that is, \(X\) and \(Y\) are extensionally equivalent.

Two Distinct Individuals:
\(\uO \ne \uI\), where \(\uO\) and \(\uI\) are individual constants.

Now let \(A\) be a given proposition. By Predicative Comprehension, we may introduce predicate constants \(U\), \(V\) together with the assertions

\begin{align} \tag{1} &\forall x[U(x) \leftrightarrow (A \vee x = 0)] \\ \notag &\forall x[V(x) \leftrightarrow (A \vee x = 1)] \end{align}

Let \(\Phi(X)\) be the formula \(X \approx U \vee X \approx V\). Then clearly we may assert \(\forall X[\Phi(X) \rightarrow \exists xX(x)]\) so AC1L may be invoked to assert \(\exists F \forall X[\Phi(X) \rightarrow X(FX)]\). Now we can introduce a function constant \(K\) together with the assertion

\[ \tag{2} \forall X[\Phi(X) \rightarrow X(KX)]. \]

Since evidently we may assert \(\Phi(U)\) and \(\Phi(V)\), it follows from (2) that we may assert \(U(KU)\) and \(V(KV)\), whence also, using (1),

\[ [A \vee KU = 0] \wedge [A \vee KV = 1]. \]

Using the distributive law (which holds in intuitionistic logic), it follows that we may assert

\[ A \vee [KU = 0 \wedge KV = 1]. \]

From the presupposition that \(0 \ne 1\) it follows that

\[ \tag{3} A \vee KU \ne KV \]

is assertable. But it follows from (1) that we may assert \(A \rightarrow U ≈ V\), and so also, using the Extensionality of Functions, \(A \rightarrow KU = KV\). This yields the assertability of \(KU \ne KV \rightarrow \neg A\), which, together with (3) in turn yields the assertability of

\[ A \vee \neg A \]

that is, the law of excluded middle.

The fact that the Axiom of Choice implies Excluded Middle seems at first sight to be at variance with the fact that the former is often taken as a valid principle in systems of constructive mathematics governed by intuitionistic logic, e.g. Bishop’s Constructive Analysis[16] and Martin-Löf’s Constructive Type Theory[17], in which Excluded Middle is not affirmed. In Bishop’s words, “A choice function exists in constructive mathematics because a choice is implied by the very meaning of existence.” Thus, for example, the antecedent \(\forall x \exists y \phi(x,y)\) of ACL, given a constructive construal, just means that we have a procedure which, applied to each \(x\), yields a \(y\) for which \(\phi(x, y)\). But this is precisely what is expressed by the consequent \(\exists f \forall x \phi(x,fx)\) of ACL.

To resolve the difficulty, we note that in deriving Excluded Middle from ACL1 essential use was made of the principles of Predicative Comprehension and Extensionality of Functions[18]. It follows that, in systems of constructive mathematics affirming AC (but not Excluded Middle) either the principle of Predicative Comprehension or the principle of Extensionality of Functions must fail. While the principle of Predicative Comprehension can be given a constructive justification, no such justification can be provided for the principle of Extensionality of Functions. Functions on predicates are given intensionally, and satisfy just the corresponding principle of Intensionality \(\forall X \forall Y \forall F[X = Y \rightarrow FX = FY]\). The principle of Extensionality can easily be made to fail by considering, for example, the predicates \(P\): rational featherless biped and \(Q\): human being and the function \(K\) on predicates which assigns to each predicate the number of words in its description. Then we can agree that \(P \approx Q\) but \(KP = 3\) and \(KQ = 2\).

In intuitionistic set theory (that is, set theory based on intuitionistic as opposed to classical logic—we shall abbreviate this as IST) and in topos theory the principles of Predicative Comprehension and Extensionality of Functions (both appropriately construed) hold and so there AC implies Excluded Middle.[19] , [20]

The derivation of Excluded Middle from AC was first given by Diaconescu (1975) in a category-theoretic setting. His proof employed essentially different ideas from the proof presented above; in particular, it makes no use of extensionality principles but instead employs the idea of the quotient of an object (or set) by an equivalence relation. It is instructive to formulate Diaconescu’s argument within IST. To do this, let us call a subset \(U\) of a set \(A\) detachable if there is a subset \(V\) of \(A\) for which \(U \cap V = \varnothing\) and \(U \cup V = A\). Diaconescu’s argument amounts to a derivation from AC4 (see above) of the assertion that every subset of a set is detachable, from which Excluded Middle readily follows. Here it is.

First, given \(U \subseteq A\), an indicator for \(U\) (in \(A\)) is a map \(g: A \times 2 \rightarrow 2\) satisfying

\[ U = \{ x \in A: g(x,0) = g(x,1) \} \]

It is then easy to show that a subset is detachable if and only if it has an indicator.

Now we show that, if AC4 holds, then any subset of a set has an indicator, and hence is detachable.

For \(U \subseteq A\), let \(R\) be the binary relation on \(A + A = A \times \{0\} \cup A \times \{1\}\) given by

\[ R = \{ ((x,0),(x,0)) : x \in A \} \cup \{ ((x,1),(x,1)) : x \in A \} \cup \{ ((x,0),(x,1) : x \in A \} \cup \{ ((x,1),(x,0) \} : x \in A \]

It can be checked that \(R\) is an equivalence relation. Write \(r\) for the natural map from \(A + A\) to the quotient[21] \(Q\) of \((A + A)\) by \(R\) which carries each member of \(A + A\) to its \(R-\)equivalence class.

Now apply AC4 to obtain a map \(f: Q \rightarrow A + A\) satisfying \(f(X) \in X\) for all \(X \in Q\). It is then not hard to show that, writing \(\pi_{1}\) for projection on the first coordinate,

\[ \tag{*} \mbox{for } n=0,1 \mbox{ and } x \in A, \pi_1(f(r(x,n))) = x; \]


\[ \tag{**} x \in U \leftrightarrow f(r(x,0)) = f(r(x,1)). \]

Now define \(g: A \times 2 \rightarrow 2\) by \(g = \pi_{2} \circ f \circ r\), where \(\pi_2\) is a projection on the second coordinate. Then \(g\) is an indicator for \(U\), as the following equivalences show:

\begin{align} x \in U &\leftrightarrow f(r(x,0))=f(r(x,1)) \ldots \mbox{by (**)} \\ &\leftrightarrow \pi_{1}(f(r(x,0))) = \pi_{1}(f(r(x,1))) \wedge \pi_{2}(f(r(x,0))) = \pi_{2}(f(r(x,1))) \\ &\leftrightarrow \pi_{2}(f(r(x,0))) = \pi_{2}(f(r(x,1))) \ldots \mbox{using (*)} \\ &\leftrightarrow g(x,0) = g(x,1). \end{align}

The proof is complete.

It can be shown (Bell 2006) that each of a number of intuitionistically invalid logical principles, including the law of excluded middle, is equivalent (in intuitionistic set theory) to a suitably weakened version of the axiom of choice. Accordingly these logical principles may be viewed as choice principles.

Here are the logical principles at issue:

SLEM \(\alpha \vee \neg \alpha\) (\(\alpha\) any sentence)
Lin \((\alpha \rightarrow \beta) \vee (\beta \rightarrow \alpha)\) (\(\alpha\), \(\beta\) any sentences)
Stone \(\neg \alpha \vee \neg\neg \alpha\) (\(\alpha\) any sentence)
Ex \(\exists x[ \exists x \alpha(x) \rightarrow \alpha(x)]\) (\(\alpha(x)\) any formula with at most \(x\) free)
Un \(\exists x[\alpha(x) \rightarrow \forall x\alpha(x)]\) (\(\alpha(x)\) any formula with at most \(x\) free)
Dis \(\forall x[\alpha \vee \beta(x)] \rightarrow \alpha\vee\forall x\beta(x)\) (\(\alpha\) any sentence, \(\beta(x)\) any formula with at most \(x\) free)

Over intuitionistic logic, Lin, Stone and Ex are consequences of SLEM; and Un implies Dis. All of these schemes follow, of course, from the full law of excluded middle, that is SLEM for arbitrary formulas.

In what follows the empty set is denoted by 0, \(\{0\}\) by 1, and \(\{0, 1\}\) by 2.

We formulate the following choice principles—here \(X\) is an arbitrary set, \(\Fun(X)\) the class of functions with domain \(X\) and \(\phi(x,y)\) an arbitrary formula of the language of set theory with at most the free variables \(x\), \(y\):

AC\(_X\) \(\forall x\in X\ \exists y \phi(x,y) \rightarrow \exists f \in \Fun(X)\ \forall x \in X\ \phi(x,fx)\)
AC\(^*_X\) \({\exists f \inn \Fun(X)} [{\forall x\inn X}\ \exists y \phi(x,y) \rightarrow {\forall x \inn X}\ \phi(x,fx)]\)
DAC\(_X\) \(\forall f \inn Fun(X)\ {\exists x \inn X}\ \phi(x,fx) \rightarrow {\exists x \inn X}\ \forall y \phi(x,y)\)
DAC\(^*_X\) \({\exists f \inn \Fun(X)}\ [{\exists x\inn X}\ \phi(x,fx) \rightarrow {\exists x \inn X}\ \forall y \phi(x,y)]\)

The first two of these are forms of AC for \(X\); while classically equivalent, in IST AC\(^*_X\) implies AC\(_X\), but not conversely. The principles DAC\(_X\) and DAC\(^*_X\) are dual forms of the axiom of choice for \(X\): classically they are both equivalent to AC\(_X\) and AC\(^*_X\) but intuitionistically DAC\(^*_X\) implies DAC\(_X\), and not conversely.

We also formulate the weak extensional selection principle, in which \(\alpha(x)\) and \(\beta(x)\) are any formulas with at most the variable \(x\) free:

\({\exists x\inn 2}\ \alpha(x) \wedge {\exists x \inn 2}\ \beta(x) \rightarrow {\exists x \inn 2}\ {\exists y \inn 2}\ [\alpha(x) \wedge \beta(y) \wedge [{\forall x \inn 2}\ [\alpha(x) \leftrightarrow \beta(x)] \rightarrow x = y]].\)

This principle, a straightforward consequence of the axiom of choice, asserts that, for any pair of instantiated properties of members of 2, instances may be assigned to the properties in a manner that depends just on their extensions.

Each of the logical principles tabulated above is equivalent (in IST) to a choice principle. In fact:

  • WESP and SLEM are equivalent over IST.
  • AC\(^*_1\) and Ex are equivalent over IST.

Further, while DAC\(_1\) is easily seen to be provable in IST, we have

  • DAC\(^*_1\) and Un are equivalent over IST.

Next, while AC\(_2\) is easily proved in IST, by contrast we have

  • DAC\(_2\) and Dis are equivalent over IST.
  • Over IST, DAC\(^*_2\) is equivalent to Un, and hence also to DAC\(^*_1\).

In order to provide choice schemes equivalent to Lin and Stone we introduce

\({\exists f \inn 2^X} [{\forall x\inn X}\ {\exists y \inn 2}\ \phi(x,y) \rightarrow {\exists x \inn X}\ \phi(x,fx)]\)

\({\exists f \inn 2^X} [{\forall x \inn X}\ {\exists y \inn 2}\ \phi(x,y) \rightarrow {\forall x \inn X}\ \phi(x,fx)]\) provided it is provable in IST that \(\forall x[\phi(x,0) \rightarrow \neg\phi(x,1)]\)

Clearly ac\(^*_X\) is equivalent to

\({\exists f \inn 2^X}[{\forall x \inn X}[\phi(x,0) \vee \phi(x,1)] \rightarrow {\forall x \inn X}\ \phi(x,fx)]\)

and similarly for dac\(^*_X\).

Then, over IST, ac\(^*_1\) and dac\(^*_1\) are equivalent, respectively, to Lin and Stone.

These results show just how deeply choice principles interact with logic, when the background logic is assumed to be intuitionistic. In a classical setting where the Law of Excluded Middle is assumed these connections are obliterated.

Readers interested in the topic of the axiom of choice and type theory may consult the following supplementary document:

The Axiom of Choice and Type Theory


  • Aczel, P., 1978. “The type-theoretic interpretation of constructive set theory,” in A. ManIntyre, L. Pacholski, and J. Paris (eds.), Logic Colloquium 77, Amsterdam: North-Holland, pp. 55–66.
  • –––, 1982. “The type-theoretic interpretation of constructive set theory: choice principles,” in A. S. Troelstra and D. van Dalen (eds.), The L.E.J. Brouwer Centenary Symposium, Amsterdam: North-Holland, pp. 1–40.
  • Aczel, P. and N. Gambino, 2002. “Collection principles in dependent type theory,” in P. Callaghan, Z. Luo, J. McKinna and R. Pollack (eds.), Types for Proofs and Programs (Lecture Notes on Computer Science, Volume 2277), Berlin: Springer, pp. 1–23.
  • –––, 2005. “The generalized type-theoretic interpretation of constructive set theory,” Journal of Symbolic Logic, 71/1: 67–103. [Preprint available online in compressed Postscript]
  • Aczel, P., Berg, B. and J. Granstrom, 2013. “Are there enough injective sets?,” Studia Logica, 101(3): 467–482.
  • Aczel, P. and M. Rathjen, 2001. Notes on Constructive Set Theory. Technical Report 40, Mittag-Leffler Institute, The Swedish Royal Academy of Sciences. [Preprint available online]
  • Banach, S. and Tarski, A., 1924. “Sur la décomposition des ensembles de points en parties respectivement congruentes,” Fundamenta Mathematicae, 6: 244–277.
  • Bell, J.L., 1983. “On the strength of the Sikorski extension theorem for Boolean algebras,” Journal of Symbolic Logic, 48: 841–846.
  • –––, 1988. Toposes and Local Set Theories: An Introduction, Oxford: Clarendon Press.
  • –––, 1997. “Zorn’s lemma and complete Boolean algebras in intuitionistic type theories,” Journal of Symbolic Logic, 62: 1265–1279.
  • –––, 2003. “Some new intuitionistic equivalents of Zorn’s Lemma,” Archive for Mathematical Logic, 42: 811–814.
  • –––, 2005. Set Theory: Boolean-valued Models and Independence Proofs, Oxford: Clarendon Press.
  • –––, 2006. “Choice principles in intuitionistic set theory,” in A Logical Approach to Philosophy, Devidi, D. and Kenyon, T.(eds.), Berlin: Springer, 36–44.
  • –––, 2008. “The axiom of choice and the law of excluded middle in weak set theories,” Mathematical Logic Quarterly, 48: 841–846.
  • –––, 2009. The Axiom of Choice, London: College Publications.
  • –––, 2011. “The axiom of choice in an elementary theory of operations and sets” in Analysis and Interpretation in the Exact Sciences, Devidi, D. and Kenyon, T.(eds.), Berlin: Springer, 163–175.
  • Bell, J.L. and Fremlin, D., 1972. “The maximal ideal theorem for lattices of sets,” Bulletin of the London Mathematical Society, 4: 1–2.
  • –––, 1972a. “A geometric form of the axiom of choice,” Fundamenta Mathematicae, 77: 167–170.
  • Bell, J.L. and Machover, M. , 1977. A Course in Mathematical Logic. Amsterdam: North-Holland.
  • Bernays, P., 1942. “A system of axiomatic set theory, Part III,” Journal of Symbolic Logic, 7: 65–89.
  • Bishop, E. and Bridges, D., 1985. Constructive Analysis, Berlin: Springer.
  • Blass, A., 1984. “Existence of bases implies the axiom of choice,” in Axiomatic Set Theory, Baumgartner, Martin and Shelah (eds.) (Contemporary Mathematics Series, Volume 31), American Mathematical Society, pp. 31–33.
  • Bochner, S., 1928. “Fortsetzung Riemannscher Flachen,” Mathematische Annalen, 98: 406–421.
  • Bourbaki, N., 1939. Elements de Mathematique, Livre I: Theorie des Ensembles, Paris: Hermann.
  • –––, 1950.“Sur le theoreme de Zorn,” Archiv dem Mathematik, 2: 434–437.
  • Cohen, P.J., 1963. “The independence of the continuum hypothesis I,” Proceedings of the U.S. National Academy of Sciemces, 50: 1143–48.

    [The independence of the Axiom of Choice from the standard axioms of set theory, ZF, is part of Theorem 1 of this paper.]

  • –––, 1964. “The independence of the continuum hypothesis II,” Proceedings of the U.S. National Academy of Sciemces, 51: 105–110.
  • –––, 1966. Set Theory and the Continuum Hypotheis, New York: Benjamin.
  • Curry, H.B. and R. Feys, 1958. Combinatory Logic, Amsterdam: North Holland.
  • Devidi, D., 2004. “Choice principles and constructive logics,” Philosophia Mathematica, 12(3): 222–243.
  • Diaconescu, R., 1975. “Axiom of choice and complementation,” Proceedings of the American Mathematical Society, 51: 176–8.
  • Fraenkel, A., 1922. “Zu den Grundlagen der Cantor-Zermeloschen Mengenlehre”, Mathematische Annalen, 86: 230–237.
  • Fraenkel, A., 1922a.“Über den Begriff ‘definit’ und die Unabhängigkeit des Auswahlsaxioms,” Sitzungsberichte der Preussischen Akademie der Wissenschaften, Physik-math. Klasse, 253–257. Translated in van Heijenoort, From Frege to Gödel: A Source Book in Mathematical Logic 1879–1931, Harvard University Press, 1967, pp. 284–289.
  • Fraenkel, A., Y. Bar-Hillel and A. Levy, 1973. Foundations of Set Theory, Amsterdam: North-Holland, 2nd edition.
  • Gödel, K., 1938a. “The consistency of the axiom of choice and of the generalized continuum-hypothesis,” Proceedings of the U.S. National Academy of Sciences, 24: 556–7.
  • Gödel, K., 1938b. “Consistency-proof for the generalized continuum-hypothesis,” Proceedings of the U.S. National Academy of Sciemces, 25: 220–4.
  • Gödel, K., 1940. The Consistency of the Axiom of Choice and of the Generalized Continuum-Hypothesis with the Axioms of Set Theory, Annals of Mathematics Studies, No. 3, Princeton: Princeton University Press.
  • Gödel, K., 1964. “Remarks before the Princeton Bicentennial Conference,” in The Undecidable, Martin Davis (ed.), CITY: Raven Press, pp. 84–88.
  • Goodman, N. and Myhill, J., 1978. “Choice implies excluded middle,” Zeitschrift fur Mathematische Logik und Grundlagen der Mathematik , 24(25–30): 461.
  • Grattan-Guinness, I., 2012. “Jourdain, Russell and the axiom of choice: a new document,” Russell: The Journal of the Bertrand Russell Archives, 32(1): 69–74.
  • Grayson, R.J., 1975. “A sheaf approach to models of set theory,” M.Sc. Thesis, Mathematics Department, Oxford University.
  • Halpern, , J.D. and Levy, A., 1971. “The Boolean prime ideal theorem does not imply the axiom of choice,” Axiomatic Set Theory, Proceedings of Symposia in Pure Mathematics, Vol. XIII, Part I. American Mathematical Society, pp. 83–134.
  • Hamel, G., 1905. “Eine Basis aller Zahlen und die unstetigen Lösungen der Funktionalgleichung: f(x + y) = f(x) + f(y),” Mathematische Annalen, 60: 459–62.
  • Hausdorff, F., 1909. “Die Graduierung nach dem Endverlauf,” Königlich Sächsichsen Gesellschaft der Wissenschaften zu Leipzig, Math.—Phys. Klasse, Sitzungberichte, 61: 297–334.
  • –––, 1914. Grundzüge der Mengenlehre, Leipzig: de Gruyter. Reprinted, New York: Chelsea, 1965.
  • –––, 1914a. “Bemerkung über den Inhalt von Punktmengen,” Mathematische Annalen, 75: 428–433.
  • Hilbert D., 1926. “Über das Unendliche,” Mathematische Annalen, 95. Translated in J. van Heijenoort (ed.) From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, Cambridge, MA: Harvard University Press, 1967, pp. 367–392.
  • Hodges, W., 1979. “Krull implies Zorn,” Journal of the London Mathematical Society, 19: 285–7.
  • Howard, P. and Rubin, J. E., 1998. Consequences of the Axiom of Choice, American Mathematical Society Surveys and Monographs, Vol. 59.
  • Howard, W. A., 1980. “The formulae-as-types notion of construction,” in J. R. Hindley and J. P. Seldin (eds.), To H. B. Curry: Essays on Combinatorial Logic. Lambda Calculus and Formalism, New York and London: Academic Press, pp. 479–490.
  • Jacobs, B., 1999. Categorical Logic and Type Theory, Amsterdam: Elsevier.
  • Jech, T., 1973. The Axiom of Choice, Amsterdam: North-Holland.
  • Kelley, J.L., 1950. “The Tychonoff product theorem implies the axiom of choice,” Fundamenta Mathematicae, 37: 75–76.
  • Klimovsky, G., 1958. “El teorema de Zorn y la existencia de filtros a ideales maximales en los reticulados distributivos,” Revista de la Union Matematica Argentina , 18: 160–64.
  • Kuratowski, K., 1922. “Une méthode d’élimination des nombres transfinis des raissonements mathématiques,” Fundamenta Mathematicae, 3: 76–108.
  • Lawvere, F. W. and Rosebrugh, R., 2003. Sets for Mathematics, Cambridge: Cambridge University Press.
  • Lindenbaum, A., and Mostowski, A., 1938. “Über die Unabhängigkeit des Auswahlsaxioms und einiger seiner Folgerungen,” Comptes Rendus des Séances de la Société des Sciences et des Lettres de Varsovie, 31: 27–32.
  • Maietti, M.E., 2005. “Modular correspondence between dependent type theories and categories including pretopoi and topoi,” Mathematical Structures in Computer Science, 15/6: 1089–1145.
  • Martin-Löf, P., 1975. “An Intuitionistic theory of types; predicative part,” in H. E. Rose and J. C. Shepherdson (eds.), Logic Colloquium 73, Amsterdam: North-Holland, pp. 73–118.
  • –––, 1982. “Constructive mathematics and computer programming,” in L. C. Cohen, J. Los, H. Pfeiffer, and K.P. Podewski (eds.), Logic, Methodology and Philosophy of Science VI, Amsterdam: North-Holland, pp. 153–179.
  • –––, 1984. Intuitionistic Type Theory, Naples: Bibliopolis.
  • –––, 2006. “100 years of Zermelo’s axiom of choice: what was the problem with it?,” The Computer Journal, 49(3): 345–350.
  • Mendelson, E., 1956. “The independence of a weak axiom of choice,” Journal of Symbolic Logic, 21: 350–366.
  • –––, 1958. “The axiom of fundierung and the axiom of choice,” Arkiv fur Mathematische Logik und Grundlagenforschung, 4: 67–70.
  • –––, 1987. Introduction to Mathematical Logic, CITY: Wadsworth & Brooks, 3rd edition.
  • Moore, G.H., 1982. Zermelo’s Axiom of Choice, Berlin: Springer-Verlag.
  • Moore, R.L., 1932. Foundations of Point Set Theory, Anerican Mathematical Society Colloquium Publications, vol. 13.
  • Myhill, J. and Scott, D.S., 1971. “Ordinal definability,” Axiomatic Set Theory. Proceedings of Symposia in Pure Mathematics, Vol. XIII, Part I. American Mathematical Society, pp. 271–8.
  • Post, E.L., 1953. “A necessary condition for definability for transfinite von Neumann-Gödel set theory sets, with an application to the problem of the existence of a definable well-ordering of the continuum.” Preliminary Report, Bulletin of the American Mathematical Society, 59: 246.
  • Ramsey, F. P., 1926. “The foundations of mathematics,” Proceedings of the London Mathematical Society, 25: 338–84. Reprinted in The Foundations of Mathematics and Other Essays, D.H. Mellor, ed. London: Routledge, 2001.
  • Rubin, H. and Rubin, J. E., 1985. Equivalents of the Axiom of Choice II, Amsterdam: North-Holland.
  • Rubin, H. and Scott, D.S., 1954. “Some topological theorems equivalent to the prime ideal theorem,” Bulletin of the American Mathematical Society, 60: 389.
  • Russell, B., 1906. “On some difficulties in the theory of transfinite numbers and order types,” Proceedings of the London Mathematical Society, 4(2): 29–53.
  • Shoenfield, J. R., 1955. “The independence of the axiom of choice,” Journal of Symbolic Logic, 20: 202.
  • Sikorski, R., 1948. “A theorem on extensions of homomorphisms,” Annales de la Societé Polonaise de Mathématiques, 21: 332–35.
  • Solovay, R., 1970. “A model of set theory in which every set of reals is Lebesgue measurable,” Annals of Mathematics, 92: 1–56.
  • Specker, E., 1957. “Zur Axiomatik der Mengenlehre (Fundierungs- und Auswahlaxiom),” Zeit. Math. Logik und Grund., 3: 173–210.
  • Steinitz, E., 1910. “Algebraische Theorie der Körper,” Journal für die Reine und angewandte Mathematik (Crelle), 137: 167–309.
  • Stone, M.H., 1936. “The theory of representations for Boolean algebras,” Transactions of the American Mathematical Society, 40: 37–111.
  • Tait, W. W., 1994. “The law of excluded middle and the axiom of choice,” in Mathematics and Mind, A. George (ed.), New York: Oxford University Press, pp. 45–70.
  • Takeuti, G., 1961. “Remarks on Cantor’s Absolute,” Journal of the Mathematical Society of Japan, 13: 197–206.
  • Tarski, A., 1948. “Axiomatic and algebraic aspects of two theorems on sums of cardinals,” Fundamenta Mathematicae, 35: 79–104.
  • Tarski, A., and Robert L. Vaught, 1957. “Arithmetical extensions of relational systems,” Compositio Mathematica, 13: 81–102. [Tarski & Vaught 1957 available online]
  • Teichmuller, O., 1939. “Brauch der Algebraiker das Auswahlaxiom?” Deutsches Mathematik, 4: 567–577.
  • Vitali, G., 1905. Sul problema della misura dei gruppi di punti di una retta, Bologna: Tip. Gamberini e Parmeggiani.
  • Wagon, S., 1993.The Banach-Tarski Paradox, Cambridge University Press.
  • Zermelo, E., 1904. “Neuer Beweis, dass jede Menge Wohlordnung werden kann (Aus einem an Herrn Hilbert gerichteten Briefe)”, Mathematische Annalen, 59: 514–16. Translated in J. van Heijenoort (ed.), From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, Cambridge, MA: Harvard University Press, 1967, pp. 139–141.
  • –––, 1908. Neuer Beweis für die Möglichkeit einer Wohlordnung, Mathematische Annalen, 65: 107–128. Translated in J. van Heijenoort (ed.), From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, Cambridge, MA: Harvard University Press, 1967, pp. 183–198.
  • –––, 1908a.“Untersuchungen uber die Grundlagen der Mengenlehre,” Mathematische Annalen, 65: 107–128. Translated in J. van Heijenoort (ed.), From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, Cambridge, MA: Harvard University Press, 1967, pp. 199–215.
  • Zorn, M., 1935. A remark on method in transfinite algebra, Bulletin of the American Mathematical Society, 41: 667–70.

Other Internet Resources


The author and editors would like to thank Jesse Alama for carefully reading this piece and making many valuable suggestions for improvement.

Copyright © 2021 by
John L. Bell <>

Open access to the SEP is made possible by a world-wide funding initiative.
The Encyclopedia Now Needs Your Support
Please Read How You Can Help Keep the Encyclopedia Free