Set Theory

Cardinal and Ordinal Numbers
Math 6300
Klaus Kaiser
April 9, 2007
Contents
1 Introduction 2
2 The Zermelo Fraenkel Axioms of Set Theory 5
3 Ordinals 14
3.1 Well-Orderings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Ordinals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 The Orderstructure of the Ordinal Sum, the Ordinal Product and Ordinal Exponentia-
tion. Finite Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4 The Axiom of Choice 32
5 The Axiom of Foundation 34
6 Cardinals 38
6.1 Equivalence of Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.2 Cardinals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1
Chapter 1
Introduction
In this course, we will develop set theory like any other mathematical theory on the basis of a few
given axioms and generally accepted practices of logic. When we are studying algebraic structures like
groups, we have in mind structures like S n , Z, V, G whose elements are called permutations, integers,
vectors or just group elements, and we use suggestive notations like , n, v, g to denote the objects
these groups are made of. For groups, we also have a binary operation acting on the elements which
we like to denote as composition, addition, translation or multiplication and we use suggestive symbols,
like for composition and + for addition. A group has a unique identity element which is usually
denoted as e. As a matter of convenience, an additional unary operation is then added, which assigns
to a group element its unique inverse. The theory of groups is then governed by a few simple axioms
x(yz) = (xy)z, xx1 = x1 x = e and xe = ex = x. Groups are then developed solely on the basis of
these axioms with the aforementioned examples serving as illustrations and motivations.
The situation for set theory is somewhat different. Unless you have already seen some axiomatic
set theory or mathematical logic, you probably have not the fuzziest idea about different models of set
theory. Sets are just arbitrary collections of objects and manipulations of sets , like forming intersection,
union, and complement correspond to basic logical connectives, namely and, or, not. It seems that
there is only one universe of sets. Our knowledge about it, however, may increase over time. Before
Borel and Lebesgue, mathematicians didnt recognize measurable sets of real numbers. But they were
there, just as the planet Pluto existed before it was discovered around 1930. Of course, mathematical
objects are not physical. They are mental constructs owing their existence to our ability to speak and
think. But is it safe to speak about all possible sets? Indeed, most mathematicians believe that it is
safe to accept the idea of a universe of sets in which all of mathematics is performed. Mathematical
statements then should be either true or false even when we know that they are undecidable right
now. There is a belief that further insights will eventually resolve all open problems one way or the
other. However, certain precautions must be exercised in order to avoid inconsistencies., like Russels
paradox about the set of all sets which dont contain themselves. (If r(x) stands for the predicate
not(x x), then forming the Russel class r = {x|r(x)} leads to the contradiction (r r) iff not(r r).
In Naive Set Theory, methods for constructing new sets from given ones are presented and some sort of
etiquette for doing it right is established. Such an approach, however, can be confusing. For example,
most mathematicians dont feel any need for Kuratowskis definition of an ordered pair (a, b) as the set
{{a}, {a, b}} or to go through a lengthy justification that s(n, 0) = n, s(n, m0 ) = s(n, m)0 defines the
sum of two natural numbers n and m. On the other hand, certain proofs in analysis where sequences
s(n) are constructed argument by argument leaving at every n infinitely many options open, certainly
take some time for getting used to. How in the world can we talk about the sequence s(n), n N
as a finished product, when at point n we just dont know what s(m) for m n will be? Sure, the
2
Axiom of choice is supposed to do the job. But what kind of axiom is this anyway? Is it part of our
logic, or is it a technical property of a theory of sets? How come that generations of mathematicians
didnt notice that they had been using it all the time? In Axiomatic Set Theory we assume that there
is a mathematical structure U which we call the universe and whose elements are called sets. On U
a binary relation is defined which is called the membership relation. The basic assumption then is
that (U, ) satisfies the Zermelo-Fraenkel Axioms of set theory. We think that U is a set in the naive,
familiar sense whose objects are called sets. Because we dont want U to be a member of U, we call U
the universe. This is mainly a precaution which we exercise in other branches of mathematics, too. A
function space consists of functions, but is itself not a function. The relation is a relation between
sets in U. We use the notation only for this relation, in particular, instead of x U we say x is in U
or that x belongs to U. We are now going to describe the axioms of set theory. These are statements
about the universe of sets every mathematician would consider as self-evident. We are going to claim
that there are sets, in particular an empty set and an infinite set and that we can construct from given
sets certain new sets, like the union and the power set of a set. There is one difficulty in defining sets by
properties. Because we only have the membership relation at our disposal, any property about sets
should be expressed in terms of and logical procedures. For this reason we have to develop a language
of axiomatic set theory first. The existence of sets sharing a common property is then governed by
the axiom of comprehension. It will turn out that for example x = x or not (x x) never define sets,
no matter what the universe is, resolving Russells paradox. A certain amount of mathematical logic
seems to be unavoidable in doing axiomatic set theory. But all that is necessary is to explain the syntax
of the first order language for set theory. We do not have to say what a formal proof is. Similarly,
the interpretation of formulas in the model (U, ) is considered as self evident; delving into semantic
considerations is equally unnecessary. Also, this is standard mathematical practice. In algebra you
have no problems understanding the meaning of any particular equation, say the commutative law,
but it needs to be made clear what a polynomial as a formal expression is. Because in axiomatic set
theory we have to make statements concerning all formulas, we have to say what a formula is. Only
the most important facts about set theoretic constructions, cardinals and ordinals are discussed in
this course. Advanced topics of topology, for example, need more set theory. But these notes contain
enough material for understanding classical algebra and analysis.
References
K. Devlin, Fundamentals of Contemporary Set Theory. Springer (1979).
K. Devlin, The Joy of Sets. Springer (1993).
K. Hrbacek, T. Jech, Introduction to Set Theory. Marcel Dekker, Inc. (1984).
J. L. Krivine, Introduction to Axiomatic Set Theory. Reidel (1971).
K. Kunen, Set Theory. North Holland (1980).
Y. Moschovakis, Notes on Set Theory. Springer (1994).
J. D. Monk, Introduction to Set Theory. McGraw-Hill Book Company (1969)
J. H. Shoenfield, Mathematical Logic. Addison Wesley (1967).
R. Vaught, Set Theory. Birkhauser (1994).
These are very good text books on set theory and logic. The book by Monk is still useful for
learning the basics of cardinal and ordinal arithmetic. Devlins 93 book contains a chapter on recent
3
research on P. Aczels Anti-Foundation-Axiom. The books by Kunen, Krivine and Shoenfield are
advanced graduate texts, i.e., aimed at students who want to specialize in logic. The book by Kunen
is a comprehensive text on set theory while Krivine is a good introduction into the classical relative
consistency proofs, that is, the ones based on inner models of set theory. Shoenfield contains a final, far
reaching, chapter on set theory. The following two articles are quite interesting. Shoenfield analyzes the
truth of the ZF axioms, while Hilbert outlines the transition from finitary, constructive mathematics
(which underlies, for example, our intuitive understanding of the natural numbers as well as the syntax
of logic) towards a formalistic point of view about mathematics.
D. Hilbert,Uber das Unendliche. Math. Annalen 25 (1925).
J. H. Shoenfield, Axioms of Set Theory. In: Handbook of Mathematical Logic. (North Holland,
Amsterdam)
The Independence Proofs of Cohen are clearly presented in:
P. J. Cohen, Set Theory and the Continuum Hypothesis. Benjamin (1966).
The formal analysis of logic and set theory has important practical applications in form of non-
standard methods. There is an extensive literature on this vital subject. The following books are
exceptionally well written; the book by Robinson is a classic of this field.
R. Goldblatt, Lectures on the Hyperreals. An Introduction to Nonstandard Analysis. Springer (1998).
A. E. Hurd, P.A. Loeb, An Introduction to Nonstandard Real Analysis. Academic Press (1985).
E. Nelson, Radically Elementary Probability Theory. Princeton (1987).
A. Robinson, Non-Standard Analysis. North-Holland (1974).
The Axiom of Foundation, that is, a set cannot contain itself, should be true for the universe of
sets, but it does not have any significant consequences. So we have separated it from the other axioms
and develop set theory without this axiom. The following two books analyze strong negations of the
Foundation Axiom and provide applications to self referential statements and Computer Science.
P. Aczel, Non-Well-Founded-Sets. Center for the Study of Language and Information Publications,
Stanford (1988).
J. Barwise, J. Etchemendy, The Liar. Oxford (1987).
A good deal of the history of modern set theory is contained in

John W. Dawson, Jr. Logical Dilemmas,The Life and Work of Kurt Godel. A K Peters, Wellesley,
Massachusetts (1997).
4
Chapter 2
The Zermelo Fraenkel Axioms of

Set Theory
The Axiom of Extensionality. If every element of the set a is an element of the set b and every
element of the set b is an element of the set a, then a = b.
In other words, two sets are equal iff they contain the same elements. This should not be considered
as a definition of equality of sets. Equality is an undefined, primitive relation and clearly, equal sets
have the same elements. The axiom of extensionality merely states a condition on the relation . We
may formalize extensionality:

xy z (z x) (z y) (x = y)
The elements of the universe (U , ) are in the first place just objects without any structure. What
matters is their relationship to other elements with respect to . We may think of U as a directed
graph where the sets in U are nodes and a b corresponds to an edge a b. Part of the universe may
have nodes called 0, 1, 2, {1} and edges 0 1, 0 2, 1 2, 1 {1}:
2 {1}
Figure 2.1: Snapshot of the Universe
An edge 0 {1} would violate the axiom of extensionality, because then 2 and {1} would have the
same elements.
The Null Set Axiom. There is a set with no elements:
xy (y x)
By extensionality, there is only one such set. It is denoted by and called the empty set. It is a
constant within the universe U, i.e., a unique element defined by a formula.
5
The Pairing Axiom. For any sets a and b there is a set c whose only elements are a and b:

xyzt (t z) (t = x) (t = y)
By extensionality again, there is for given a and b only one such set c. We write c = {a, b} for the
set whose only elements are a and b. If a and b are the same set, then c has only one element, namely
a. That is, for any set a of the universe U there is a set whose only element is a. This set is called the
singleton {a}; {a, b} is called a pair if a is different from b. Three applications of the pairing axiom
lead to the existence of the set {{a}, {a, b}}. This is Kuratowskis definition of the ordered pair (a, b)
of a and b. One easily proves the
Theorem 2.1 One has that (a, b) = (a0 , b0 ) if and only if a = a0 and b = b0 .
The Union Axiom. For any set a there is a set b whose members are precisely the members of
members of a:
xyz (z y) t (t x) (z t)
S S
The set b is called the union of a and denoted by a or {x|x a}. We mention some consequences:
For any
S sets a, b, c there is a set d whose elements are a, b and c:
d = {{a, b}, {c}}
The union of c = {a, b} is denoted by a b. It is easy to see that
a b = {x|x a or x b}.
Let a and b be sets. We say that a is a subset of b if every element of a is also an element of b:

(x y) z (z x) (z y)
The left-hand side is not a formula, because is the only relation of our universe; (x y) is only
an abbreviation of the formula in the variables x and y on the right hand side. In particular we have
by extensionality that
xy (x = y) (x y) (y x)
The Power Set Axiom. Let a be a set of the universe U. Then there is a set b whose elements are
precisely the subsets of a:
xyz (z y) (z x)
The set b is called the power set of a and we use the notation b = P(a) . We have P() = {} ,
P({}) = {, {}} , P({, {}}) = {, {}, {{}}, {, {}}}.
If a is any set of our universe, any c P(a) corresponds to an intuitive subset of a, namely {d|d c}
where for each such d, d a holds. However, not every proper collection of edges d a will lend
itself to a set c of the universe. For example, if U happens to be countable then any infinite set a in U
will have subsets which dont correspond to sets in U. What kind of properties now lead to subsets?
We have reached the point where we have to talk a bit about mathematical logic.
The Language of Axiomatic Set Theory

We are going to describe a formal language that has the following ingredients.
1. Symbols
(a) An unlimited supply of variables x0 , x1 , x2 . . ..
6
(b) The elements of the universe U are the constants of the language.
(c) The membership symbol and the equality symbol =.
(d) The symbols for the propositional connectives: which stands for and, which stands for
or, which stands for not, which stands for if, then, which stands for if and only
if.
(e) For each variable xn one has the universal quantifier xn which stands for for all xn and
the existential quantifier xn which stands for there exists some xn .
2. Formation Rules for Formulas
(a) Let u and v stand for any variable or constant. Then (u v) and (u = v) are formulas.
These are the atomic formulas.
(b) If P and Q are formulas then (P Q), (P Q), P , (P Q), (P Q) are formulas.
(c) If P is a formula then xn P and xn P are formulas.
Only expressions that can be constructed by finitely many applications of these rules are formulas.
For better readability, different kinds of parentheses will be used, and letters, like x, y, z, . . . will stand
for variables. There are standard conventions concerning the priorities of the binary propositional
connectives in order to avoid an excessive accumulation of parentheses.
The axioms of set theory as stated so far are all formulas, actually sentences, that is, all occurrences
of variables are bound. If Q is a formula then every occurrence of xn within P of a subformula xn P
or xn P of Q is said to be bound. Variables xn which are not bound, i.e., which are not within the
scope of a quantifier xn or xn of Q, are said to be free. If we underline in a formula a variable then
this variable is meant to occur only bound.
Formulas can be represented by certain labelled, directed trees. An atomic formula is just a node,
e.g.,
(x a)
which is a tree. If 1 is the tree for P1 and if 2 is the tree for P2 , then the tree for (P1 P2 ) is
the graph:

1 2
Figure 2.2: The Graph of a Conjunction
Any node of the tree for the formula Q determines a subformula P of Q. For example, a
node labelled determines a conjunction P (P1 P2 ) as a subformula of Q, where P1 and P2 are
subformulas of P ; P1 and P2 are the scope of the node . Similarly, a node x determines a subformula
P xn P1 , where the subformula P1 of P is the scope of the node xn within Q.
Whenever we indicate a formula P as P (x0 , x1 , . . . , xn1 ), it is understood that the free variables
of P , if there are any, are are among x0 , x1 , . . . , xn . The constants within a formula are often called
parameters. So we write P (x0 , . . . , xn1 , a0 , . . . , am1 ) to indicate the free variables and parameters of
a formula. A sentence P is either true or false in the universe U. More generally, if P (x0 , . . . , xn1 )
is a formula with free variables x0 , . . . , xn1 and if a0 , . . . , an1 belong to U, then a simultaneous
7
x
(x = y)
(z x) (z y)
Figure 2.3: The Graph of the Extensionality Axiom
8
substitution of the xi by the ai makes P (a0 , . . . an1 ) either true or false. When we say that a formula
P (x0 , . . . xn1 ) holds on U , it is meant that its closure, i.e.,
x0 . . . xn1 P (x0 , . . . , xn1 )
holds on U . Because we have used the equality sign = as a symbol within the language, equality of
formulas, or more generally their equivalence, is denoted by , e.g., x = y y = x. That is, we
write P Q if and only if P Q is a theorem of logic. Formulas without parameters are called pure
formulas of set theory.
A formula in one free variable, or argument, is called a class.
S(x, a) (x a) and R(x) (x x)
are examples of classes. The first class defines a set, namely a, while the second one does not define a
set: b satisfies S(x, a) iff b a; there is no set r such that b satisfies R(x) iff b r.
Formulas P (x0 , . . . , xn1 ) are called n-ary relations. Formulas in two arguments are called binary
relations. We also use the terms predicates, properties and expressions for formulas. Let E(x) be a
class. We say that R(x, y) is a relation on E(x) if

xy R(x, y) E(x) E(y)
holds on U .
Let R(x, y) be a binary relation. We define domain and range as the classes
dom of R(x, y) yR(x, y) and ran of R(x, y) xR(x, y)
Then R(x, y) is a relation on
E(z) dom of R(z, y) ran of R(x, z)
which we call the extent of R(x, y).

A binary relation R(x, y) is called reflexive if

xy R(x, y) (R(x, x) R(y, y)
holds on U .
The relation R(x, y) is symmetric if

xy R(x, y) R(y, x)
holds on U .
The relation R(x, y) is transitive if

xyz R(x, y) R(y, z) R(x, z)
holds on U .
The binary relation E(x, y) is called an equivalence if it is reflexive, symmetric and transitive. It is
easy to see that for any reflexive relation, e.g., an equivalence E(x, y) one has that,
dom of E(x, y) E(x, x) and ran of E(x, y) E(y, y)
and therefore E(x, y) is a relation on its domain D(x).
9
The binary relation R(x, y) is called anti-symmetric if

xy R(x, y) R(y, x) (x = y)
holds on U .
A binary relation P O(x, y) which is reflexive, transitive and anti-symmetric is called a partial order.
Again we have by reflexivity that domain and range define the same class and that P O(x, y) is a relation
on its domain. A partial order L(x, y) is called linear or total, if

xy D(x) D(y) L(x, y) L(y, x)
holds on U . D(x) denotes the domain of L(x, y).

An (n + 1)-ary relation F (x0 , . . . , xn1 , y) is called functional if

x . . . xn1 y1 y2 F (x0 , . . . , xn1 , y1 ) F (x0 , . . . , xn1 , y2 )) (y1 = y2
holds on U .
We define for any relation P (x0 , . . . , xn1 , y) domain and range
dom of P (x0 , . . . , xn1 , y) yP (xo , . . . , xn1 , y)

ran of P (x0 , . . . , xn1 , y) x0 . . . xn1 P (x0 , . . . , xn1 , y)
The domain is an n ary relation D(x0 , . . . , xn1 ) while the range is a class R(y).
The binary relation
P (x, y) z (z y) (z x)
is functional in the variable x. It assigns to a set a the power set b = P(a). We have that

dom of P (x, y) (x = x) and ran of P (x, y) xz (z y) (z x)
Instead of P (x, y) we will often use the more suggestive notation y = P(x). We similarly write
z = (x, y), z = {x, y} and z = x y for the corresponding functional predicates.
We define for a formula F (x, y, z) the expression

f un F (x, y, z) xy1 y2 F (x, y1 , z) F (x, y2 , z) y1 = y2
which holds for a set a in U if and only if F (x, y, a) is functional.
The Schemes of Replacement and Comprehension

In the previous section we didnt stipulate the existence of sets. For example, domain and range of
a binary relation were defined as classes, i.e., as formulas in one variable. Of course, given a binary
relation on a given set a, domain and range should be subsets of a. The existence of sets according to
standard constructions in mathematics is guaranteed by
The Axiom Scheme of Replacement. Let F (x, y, x0 , . . . , xn1 ) be a pure formula of axiomatic set
theory such that for sets a0 , . . . , an1 the binary relation F (x, y, a0 , a1 , . . . , an1 ) is functional.
Let a be any set. Then there is a set b such that d b holds if and only if there is some c a
such that F (c, d, a0 , . . . , an1 ) holds on U:

x0 . . . xn1 f un F (x, y, x0 , . . . , xn1 ) xyv[v y u[u x F (u, v, x0 , . . . , xn1 )]]
10
Because this is supposed to hold for every pure formula F (x, y, x0 , . . . , xn1 ), where at least x and y
are free, this list of axioms is called a scheme. It is called replacement because it allows us to replace
some of the elements c of the set a simultaneously by sets d in order to create a set b. As a first
application of replacement we will deduce its weaker cousin
The Scheme of Comprehension. Let A(x, x0 , . . . , xn1 ) be a pure formula of axiomatic set theory
and let a0 , . . . , an1 be sets. Then for any set a there is a set b which consists exactly of those
elements c of a for which A(c, a0 , . . . , an1 ) holds on U:

x0 . . . xn1 xyz z y z x A(z, x0 , . . . , xn1 )
In order to deduce this from replacement, we only have to note that

F (x, y, x0 , . . . , xn1 ) (y = x A(x, x0 , . . . , xn1 )
is functional.
The standard notation for the subset b of a, which is defined by the property A(x, a, .., a), is
b = {x|x a A(x, a0 , . . . , an1 )}
Constructions within the Universe

The existence of the union of a set a was stipulated as an axiom. We dont need a further axiom for
the intersection.
The Intersection of a Set. Let a be non-empty set. Then there is a set b whose members are
precisely the members of all members of a.

x (x = ) yz (z y) t t x z t)
This follows at once from comprehension. Note that the intersection of theT set aTis contained in any
of its members c. The standard notation for the intersection of a set a is a or {x|x a}. Why is
it important to assume that the set a is non-empty?
The Cartesian product of Two Sets. Let a and b be sets. Then there is a set c such that e c if,
and only if, e = (f, g) where f a and g b:

xyzt (t z) uv t = (u, v) (u x) (v y)
The equation z = (x, y) is shorthand for the functional relation Q(x, y, z) which says that z is the
ordered pair (x, y), which according to Kuratowskis definition is the set {{x}, {x, y}}. Thus:
h n oi
Q(x, y, z) t (t z) uv t = ut = v s s u s = x s0 s0 v s0 = xs0 = y
If = (f, g) = {{f }, {f, g}} where f a and g b then {f } P(a) and {f, g} P(a b). Hence
(f, g) P(P(a b)). We now apply comprehension to

P (z, a, b) uv Q(u, v, z) u a v b
11
which says that z is an ordered pair whose two components belong to a and b, respectively and get
the desired result as
c = {e|e P(P(a b)) P (e, a, b)}
The set c is called the cartesian product a b of a and b. The cartesian product of finitely many sets
is similarly defined. The formula

C(x, y, z) t t z uv Q(u, v, t) (u x) (v y)
is functional and says that z is the cartesian product of x and y.

We remark that a binary relation R(x, y) may be perceived as a unary relation R (z) :

R (z) xy Q(x, y, z) R(x, y)
That is, R (e) holds if and only if e = (c, d) and R(c, d) holds.
Relations as Sets. Let R(x, y) be a binary relation. Assume that domain and range of R(x, y) are
sets a and b, respectively. Then define the set
r = {e|e = (c, d) a b, R(c, d)}
We now have e = (c, d) r if and only if R (e) holds. In this sense we may identify a binary relation,
for which the extent is a set, by a set of ordered pairs.
Graphs of Functions. If the binary relation F (x, y) is functional and the domain of F (x, y) is a set
a, then, according to replacement, the range is also a set. Let b be any set containing the range
of F (x, y). The set
{e|e = (c, d) a b, F (c, d)}
is called the graph of the function f : a b.
The projections are important examples of functional relations:
F1 (x, y, z, p) Q(x, y, z) p = x
F2 (x, y, z, q) Q(x, y, z) q = y
We have that F1 (a, b, c, d) holds on U iff c = (a, b) and d = a, i.e., d is the first component of the
ordered pair c. The predicate
P1 (t, x) uv F1 (u, v, t, x)
then holds if x is the first component of the ordered pair t. That f is a function from a to b is
expressed by F (a, b, f ) where

F (x, y, z) p C(x, y, p) z p u u x t t z P1 (t, u)

tt0 suu0 t z t0 z P1 (t, s) P1 (t0 , s) P2 (t, u) P2 (t0 , u0 ) u = u0
The Exponentiation of Sets. Let a and b be sets. Then there is a set c whose elements are given
by the functions f : a b.
The function f : a b is an element of P(a b). Hence c is defined by comprehension:
c = {f |f P(a b), F (a, b, f )}
For the set c one uses exponential notation c = ba .
12
Union and Intersection of a Family of Sets. A function s with domain i is sometimes called a
family of sets aj , j i, where, of course aj = s(j). The union of the family s is the
Sunion of the
range
S r of s, which is, according to the replacement axiom, a set u. We write u = {aj |j i} =
s. The intersection of a non-empty family s is defined similarly.
The Cartesian Product of a Family of Sets. Let s be a family of sets, indexed by the set i. A
function f : i u from i into the union u of the range r of s is called a choice function if for every
j i one has that f (j) aj . Then there is a set c whose members are all the choice Q functions
for s. This set is called the cartesian product of the family s and is denoted by c = {aj |j i}.
This follows from comprehension: We will

use the expression (x,
y) z as shorthand
for p(Q(x, y, p)
(p z)). Then c = {f |f ui xyz (x, y) f (x, z) s (y z) }
The Remaining Axioms of ZF

Within the universe U we certainly can find the sets 0 = , 1 = {0}, 2 = {0, 1},. . ., n = {0, 1, . . . , n 1}.
Note that n + 1 = n {n} where n is not a member of n. Hence n has exactly n elements and n 7 n
is an injective map from the set N of natural numbers into the universe U. The sets n are called the
natural number objects of U. Notice that we have n < m if and only if n m, and n m is the same
as n m, standing for strict inclusion. On the basis of the axioms stated so far we have no way of
telling whether there is a set whose elements are exactly the sets n.
The Axiom of Infinity. There is a set whose elements are exactly the natural number sets n.
This concludes the list ZF of axioms for axiomatic set theory. Our definition of as the set of all
natural number sets n is only preliminary; it is not even given by a first order sentence of our language
of set theory. After we have studied ordinals in general, will be defined as the set of all finite ordinal
numbers1 . Of course, there is no danger to think that the finite ordinals are just the ordinary finite
numbers n. And the vast majority of mathematicians feel that way. On the other hand, any axiomatic
definition of allows for elements which are nonstandard, i.e., different from any ordinary number n.
However, whether one realizes this possibility or not seems to be irrelevant for the formal development
of mathematics.
There are two more axioms most mathematicians consider astrue, the Axiom of Choice and the
Axiom of Foundation. These axioms are listed separately, mainly because because a great deal of set
theory can be developed without them.
The Axiom of Choice (AC). The Cartesian product of a family of non-empty sets is itself non-
empty.
The Axiom of Foundation (AF). Every non-empty set a contains a set b which is disjoint to a.
Both axioms are independent of ZF. The axiom of choice is necessary for proving many essential
theorems concerning infinite sets, e.g., that every vector space has a basis. The axiom of foundation
provides the universe U with more structure in the sense that every set will have a rank as measure of
its complexity. However, even strong negations of AF are consistent with ZF and such models of set
theory have become an important research tool in computing science for the analysis of self-referential
statements.
1 More elementary is Dedekinds approach: For any set x, x {x} is called the successor x+ of x. A set i is called
inductive if we ave that i and x i implies that x+ is in i. Dedekinds version of the axiom of infinity then says
that there is an inductive set i0 . It is obvious that i0 must contain all n. Then he defines the set of natural numbers
as the intersection of all inductive sets. This can be done because it is enough to intersect all inductive subsets of i0 .
13
Chapter 3
Ordinals
3.1 Well-Orderings
A binary relation r on a set w is called a well-ordering if it is a partial order such that every non-empty
subset has a smallest element. Because any pair has a minimum, r is actually a total order. Instead
of (c, d) r one writes c d. Actually, for well-orderings one prefers the irreflexive or strict version
of r. If r is a partial order then one has
(i) r \ = r0 is irreflexive, i.e., (c, c)
/ r.
(ii) r0 is transitive.
The conditions (i) and (ii) imply that
(iii) r0 is anti-symmetric, i.e., c < d implies that d < c cannot hold.
On the other hand, if r0 is an irreflexive, transitive relation then r = r0 is a partial order.
If we assume the axiom of infinity in its naive version, i.e., is the set of the standard natural
numbers n, then a prime example of a well-ordered set is provided by , together with the relation
restricted to the set . We have:
n < m iff n m iff n m
Because N = (N, <) is well-ordered, the same holds true for (, ). Notice, that N lives outside the
universe U and n 7 n is an isomorphism which is not an element of U. That N is well-ordered by <
is equivalent to the induction axiom, which we may take for granted.
On the other hand, we can prove, with the help of the axiom of foundation, that (, ) is well-
ordered. Let b be a non-empty subset of . According to AF there is some n in b which is disjoint to
b. That is, m n yields m / b. In other words, n is the smallest element of b.
Definition 3.1 A binary relation r on a set w is well-founded if there is no strictly decreasing map
f : w, n 7 an , i.e., a0 > a1 > a2 > . . . which belongs to U.
Proposition 3.1 A well-ordering is well-founded.
Proof. Otherwise the range {a0 , a1 , . . .} would be a set b without a smallest element. 2
Definition 3.2 A binary relation r on a set a satisfies the minimal condition if any non-empty subset
b of a contains a minimal element, i.e., there is some c b such that for no d b one has that (d, c) r.
14
Definition 3.3 Let be a partial order on the set a. Then any c a determines the (initial) segment
s(c) = {b|b a, b < c}.
Proposition 3.2 (AC) A partial order on a set a which is well-founded satisfies the minimal
condition.
Proof. Assume that a non-empty subset b of a does not have a minimal element. Then the set
d = {s(c)|c b} does not contain the empty set. Let f be a choice function on d. Pick any c0 from
b. Then 0 7 c0 , 1 7 f (s(c0 )) = c1 , 2 7 f (s(c2 )) = c2 , . . . defines a strictly decreasing sequence which
belongs to U. (The proof that we can define in such a way a sequence is a good excercise.) This
contradicts the well-foundedness of our relation. 2
In particular:
Proposition 3.3 (AC) A total, well-founded order is a well ordering. 2
As a generalization of complete induction for natural numbers we have:

Proposition 3.4 (The Principle of Proof by Induction) Let (w, <) be well-ordered system and
let a be a subset of w. Assume that
(i) The set a contains the smallest element o of w.
(ii) Assume that c a in case that d a for all d < c.
Then one has that a = w.
Proof. Assume that b = w \ a is non-empty. Let c be the smallest element of b. We have c > o by
(i). Now, any d < c belongs to a and therefore c belongs to a, by (ii). But this contradicts the choice
of c. 2
Notice that (ii) actually implies (i). There is no d < o, and therefore o a, by default.
Definition 3.4 Let R(x, y) be a partial ordering. The class S(x) is called a segment of R(x, y) if
xy R(x, y) S(y) S(x) holds on U.
Proposition 3.5 The proper segments of a well-ordered set w are exactly the segments s(a).
Proof. Assume that the subset s of w is a proper segment. Then w \ s is non-empty and has a
smallest element a. But then s = s(a). 2
Corollary 3.6 The map a 7 s(a) is strictly increasing and establishes a bijective order preserving
map between the well-ordered set (w, <), and the ordered set system S = ({s(a)|a w}, ) of segments
of w. 2
Definition 3.5 An injective, order preserving map map between partially ordered sets is called an
order embedding. A bijective order embedding for which the inverse is also order preserving is called
an (order) isomorphism. An order isomorphism of a partially ordered set to itself is called an (order)
automorphism.
Proposition 3.7 If f is a bijective order embedding from the totally ordered set (a, 1 ) to the partially
ordered set (b, 2 ) then 2 is a total order and f is an order isomorphism. 2
Corollary 3.8 A well-ordered set is order isomorphic to its system of proper segments. 2
15
Proposition 3.9 Let (w, <) be a well ordered system and f : w w be strictly increasing. Then one
has f (c) c for all c w.
Proof. For the proof assume that the set a = {c|f (c) < c} is non-empty. Then a has a smallest
element b. Then f (b) < b because b a. Hence f (f (b) < f (b) because f is strictly increasing. Therefore,
f (b) a. But then one has that b f (b), which is a contradiction. 2
Proposition 3.10 Let f be an automorphism of the well-ordered system (w, <). Then f is the identity
map.
Proof. Assume otherwise. Then a = {b|f (b) > b} is non-empty and has a smallest element b0 . Then
f (b0 ) > b0 because b0 a. Let b0 = f (c0 ). From f (b0 ) > f (c0 ) one infers b0 > c0 and therefore, by the
choice of b0 , f (c0 ) = c0 . Hence b0 = f (c0 ) = c0 , which is a contradiction. 2
Corollary 3.11 An isomorphism between isomorphic well-ordered systems (w, <) and (w0 , <0 ) is unique.
Proof. If f and g are two such isomorphisms then g 1 f is the identity on w. That is, f = g. 2
Corollary 3.12 A well-ordered system (w, <) is not isomorphic to any of its segments s(a).
Proof. For any such isomorphism we would have f (a) a, because of Proposition 3.9, and f (a)
s(a), i.e., f (a) < a. 2
Corollary 3.13 Assume that s(a) = s(a0 ) holds in the well-ordered system (w, <). Then a = a0 and
the isomorphism is the identity.
Proof. Assume a < a0 . Then s(a) s(a0 ) and the well-ordered system w0 = s(a0 ) would be isomorphic
to its segment s(a). This contradicts Corollary 3.12. 2
Theorem 3.14 Any two well-ordered systems (w1 , <1 ) and (w2 , <2 ) are either isomorphic or one is
isomorphic to a segment of the other one. Any such isomorphism is unique.
Proof. We cannot have w1 = (s(b), <2 ) and w2 = (s(a), <1 ) because this would yield w1
= s(b)
w2
= s(a), hence w
1 = s(a0
) for some a0
< a. This contradicts the statement of Corollary 3.12. For the
proof of the theorem we define a binary relation
F = {(a, b)|s(a)
= s(b)} w1 w2 .
It is easy to see that the domain and the range of F are segments of w1 and of w2 , respectively.
Moreover, F is functional and strictly increasing. If we had dom(F ) = s(a0 ) and ran(F ) = s(b0 ) then
F would establish an isomorphism s(a0 ) = s(b0 ). Hence (a0 , b0 ) F, and therefore a0 s(a0 ), which is
a contradiction. Hence, either dom(F ) or ran(F ), or both, are all of w1 or w2 , respectively. So either
F : w1 s(b0 ) or F : w1 w2 or F 1 : w2 s(a0 ). 2
Definition 3.6 Let R(x, y) be a total order relation on its domain D(x). Then R(x,y) is called a
well-ordering if for any element a in the domain one has that S(x, a) R(x, a) x 6= a is a set and
well-ordered by R(x, y).
This terminology is justified by the following

Lemma 3.15 Let R(x, y) be a well-ordering with domain D(x) and let T (x) be any non-empty subclass
of D(x). Then T (x) has a smallest element.
16
Proof. Let a be a set which satisfies T (x) and D(x). Then T (x) S(x, a) is a set. If this set is empty,
then a is the smallest element of T (x). Otherwise it has a smallest element with respect to R(x, y)
which is then the smallest element of T (x). 2
Corollary 3.16 Let R1 (x, y) and R2 (x, y) be two well-orderings with proper classes D1 (x) and D2 (y)
as their domains. Then there is a unique functional relation F (x, y) which defines an isomorphism
between D1 (x) and D2 (x).
Proof. This is just a small modification of the proof for Theorem 3.14. As before, define
F (x, y) D1 (x) D2 (y) {u|S1 (u, x)}

= {v|S2 (v, y)}.
Because D1 (x) is a proper class, it cannot be isomorphic to any S2 (y, b) because these segments are
by definition sets. 2
Corollary 3.17 Let R1 (x, y) and R2 (x, y) be two well-orderings with domains D1 (x) and D2 (x), re-
spectively. Assume that D1 (x) is a set a and D2 (x) is a proper class. Then a is order isomorphic to a
unique initial segment S(x, b) of R2 (x, y). 2
The question now is whether there are any well-ordered classes. This brings us to ordinals.
3.2 Ordinals
A set a is transitive if every element b of a is a subset of a. That is, the set a satisfies the predicate

T rans(x) z (z x) (z x) .
For example, every natural number set n is transitive. The set a is transitive if c b and b a implies
that c a. This explains the terminology. The set is called an ordinal if
(i) is transitive.
(ii) if restricted to is a strict well-ordering of .
The second condition is formalized by the predicate:

W ell(z) xy x z y z (x y) (y x)

uvw u z v z w z u v v w u w
h n oi
x x z x 6= u u x y y x u y u = y
Well() holds if restricted to is an irreflexive , transitive relation and where every subset s of
contains some such that for any s one has that / . It is customary to denote ordinals by
small Greek letters. The class of ordinals is defined by the predicate
Ord(x) T rans(x) W ell(x)
Proposition 3.18 Let be an ordinal. Then is not an element of itself.

Proof. That is an ordinal means that (, <) is a well-ordered system where < holds for elements
, iff . If we had , then as an element of itself would violate the irreflexivity of <
on . 2
17
Proposition 3.19 Every element of an ordinal is itself an ordinal.
Proof. We first have to show that is transitive. Let and . We need to show that .
Because of the transitivity of we have that the element of is also a subset of , i.e., .
Hence yields . But then, by transitivity of again, yields . So gives
. Hence the elements , , are all in and < < . Hence < and this is .
Because is a subset of , it is well-ordered by , restricted to . 2
Proposition 3.20 The segments of an ordinal are and the elements of .
Proof. Because is well-ordered we have that the segments of are and the sets s() for .
But s() = {| , < } = {| , } = {| } = . 2
Corollary 3.21 Let and be ordinals where . Then .
Proof. We show that is a proper segment of . Indeed, let and < where . But then
by definition of < on . Because is an ordinal we have that . So . Hence is a
segment and as a proper subset of , it is an element of . 2
Corollary 3.22 Let and be ordinals. Then if and only if . 2
Proposition 3.23 Let be any ordinal. Then + = {} is also an ordinal.
Proof. Let + and . If then and therefore , and + because of

+ . If = then and + . Hence + is transitive.
The elements of + are the elements of and . It is easy to see that < iff b defines a total
ordering on + with as largest element. If s is any non-empty subset of + , then is the minimum
of s in case that s = {}, otherwise it is min(s ). Hence + is well-ordered by . 2
Lemma 3.24 Let and be ordinals. Then = is an ordinal which is equal to or equal to
.
Proof. We show that is a segment of . Clearly, . Let and < where .
According to the definition of < on we have that . Now, and therefore , i.e.,
because is an ordinal. Therefore, . Hence, = . Thus is a segment of and
therefore = or . We need to show that implies that = . By symmetry we also have
that or = . But and leads to = , i.e., which is impossible for
ordinals. 2
Theorem 3.25 Let and be ordinals. Then = or or and these cases are mutually
exclusive
Proof. = is the same as and is the same as . The claim now follows
immediately from the lemma. 2
Let and be any ordinals. We define that < iff . The relation
R(x, y) Ord(x) Ord(y) (x y)
is a strict order on its domain, which is the class Ord(x). For any ordinal , we have that R(x, ) is
the set which is well-ordered by R(x, y).
Theorem 3.26 The class Ord(x) of ordinals is well-ordered by the membership relation , i.e., by
< iff . 2
18
S
Proposition 3.27 Let = a be the union of a set a of ordinals. Then is an ordinal and one has
for every a. Furthermore, if is an ordinal such that holds for each a, then
. That is, every set of ordinals has a least upper bound within the class of ordinals.
Proof. By the very definition of the union of a set a, we have that the union of a contains every
member of a: for each a; and obviously, any set c with that property must contain .
For ordinals, is the same as < or = and therefore we only have to show that is an
ordinal. We first show that is transitive. So let and . Then for some a. Now
is an ordinal, therefore . But then and because of . Now let and be
any elements of . Because they are ordinals we have that either = or or . That is,
is totally ordered by . If c is any non-empty subset of then c must be non-empty for at least
one a. Let = min(c ) and let c. We cannot have because was the smallest
element of c in . Hence . Therefore, is well-ordered by . 2
Theorem 3.28 The class Ord(x) of ordinals is a proper class, i.e., there is no set which contains all
ordinals.
Proof. Assume that a set b contains all ordinals. Then, by comprehension,
S there would be a set a
that consists exactly of all ordinals. By the previous proposition, = a would be an ordinal and
therefore + a. But then by the definition of union. But this is impossible for ordinals.2
Corollary 3.29 Let R(x, y) be any well-ordering where the domain D(x) of R(x, y) is a proper class.
Then there is exactly one functional relation between D(x) and Ord(x) which defines an order isomor-
phism between these classes. 2
Corollary 3.30 Let (w, <) be any well-ordered system. Then there is exactly one ordinal such that
(w, <) and (, ) are order isomorphic. 2
For any ordinal one has that + = {} is the smallest ordinal greater than . It is called the
successor of and commonly denoted as + 1. The successor of + 1 is denoted as + 2, etc. The
empty set is the smallest ordinal 0, and 1 = {0} is the successor of 0.
0 < 1 < 2 < ...
are the finite ordinals n and [

= {n|n }
is the smallest infinite ordinal. The existence of this ordinal has been explicitly stated as an axiom.
This slow march through the ordinals continues with
+ 1 < + 2 < ...
and then comes as the second infinite limit ordinal

[
+ = { + n|n } = 2
that is the next ordinal after which is not a successor, i.e., a limit ordinal ; 0 is the only finite limit
ordinal.
Theorem 3.31 (Proof by Induction on Ordinals) Let P (x) be any property. Assume that:
(i) P (0) holds.
19
(ii) P () holds provided P () holds for all < .
Then P(x) holds for all ordinals.
Proof. Indeed, if we had an ordinal for which P (x) would not hold, then we could find a smallest
such ordinal . But then P () for all < and therefore P () by condition (ii). Note that (ii)
actually implies (i). 2
Induction on Ordinals generalizes Complete Induction on N. We are now going to describe how
functions can be defined recursively on the ordinals. Let F (x, y) be any functional relation on the
class D(x) and let T (x) be a subclass of D(x), i.e., T (x) D(x) holds on U. Then F (x, y) T (x)
T (x) F (x, y) is called the restriction of F (x, y) to T (x).
Whenever we perceive a functional relation as a set, then we are actually
identifying the function with
its graph. So, for example, F (x, y) a stands for graph F (x, y a = {c|c (ab), c = (d, e), F (d, e)},
where b is the range of F (x, y) a.
Theorem 3.32 (Definition by Recursion on Ordinals) Let H(x, y, z) be a functional relation where
the domain is D(x, y) Ord(x) (y = y). That is, for any ordinal and any set a there is a unique
set b such that H(, a, b) holds on U. For this we write: z = H(x, y), Ord(x). Then there is a unique
functional relation F (x, y) on Ord(x) such that for any ordinal one has that
F (, b) iff H(, F , b)
i.e.,
F () = H(, F )
Proof. Let be any fixed ordinal. We first show that there is a unique function f on such that
() f () = H(, f ), <
Note, if one has that H(0, , a0 ), then necessarily
f (0) = a0 , f (1) = H(1, {(0, a0 )}) = a1 , f (2) = H(2, {(0, a0 ), (1, a1 )}), . . .
In order to have a formal proof for uniqueness, assume that we have functions f and g both satisfying
the condition () . Then f (0) = g (0) = a0 . We wish to show that if f and g agree for all
< , , then f () = g (). Now: f () = H(, f ) = H(, g ) = g (). Hence, by the
induction principle, f () = g () holds for all < , i.e., f = g .
With respect to the existence of f , note that we have already f0 , f1 , . . . and we may form the set
of all ordinals for which f exists:
= {| < , f c f : c , f () = H(, f ), < } = { < |f () }
We are going to show that is a segment of . Let and < . Put f = f . Then for < ,
f () = (f )() = f () = H(, f ) = H(, (f ) ) = H(, f )
That is, f satisfies () . Hence, , and moreover, we have shown that if and < , one
has that f = f . As a segment of the ordinal , is also an ordinal . In order to show that
= , we define a function f by
f () = H(, f ),
We wish to show that f satisfies () . To this end let < < . Then one has:
f () = H(, f ) = H(, f ) = f ()
20
Hence: f = f and therefore, f () = H(, f ) holds for all < . Thus the function f
satisfies () . If we had < , then by the definition of . But is impossible for ordinals
as we have shown before.
The functions f , an ordinal, form a class G(x). The predicate G(x) is the formalization of: f
is a function, dom(f ) is some ordinal , < f () = H(, f ) . As we already have shown, the
functions for G(x) are pairwise compatible, i.e., restrictions of each other. Hence they can be glued
together to a functional relation F (x, y) where we have:

F (, b) iff f () = b iff f+ () = b iff H(, f+ ) = b iff H(, F ) = b , i.e.
F () = H(, F )
Finally, we must show uniqueness of our F (x, y). So assume that F 0 (x, y) also satisfies the recursion
formula: F 0 () = H(, F 0 ). Put F 0 = f0 and let < . Then:
f0 () = F 0 () = H(, F 0 ) = H(, (F 0 ) ) = H(, f0 ).
According to the very first part of the proof we conclude that f = f0 holds for every . Hence
F (x, y) F 0 (x, y). 2
Here is a first application of the Recursion Principle: Define a functional relation

S
{P(f ())| < } if x is an ordinal and y a function f on
H(x, y) =

otherwise
This
S leads to a recursively
defined functional
S relation V (x) on the ordinals: SV () = H(, V ) =
P(V ())| < . We have V (0) = = and if < , V () = P(V ())| <
S +
P(V ())| < } = V (). Because taking the power set is a monotone
operation, we have V ( )
S
P(V ()) P((V ()), . Thus, V (+ ) = P(V ())| = P(V ()). If is a limit ordinal
+
S S +
S
then
< yields < . Hence, V () = P(V ())| < = V ( )| < = {V ()| <
.
The Zermelo-Fraenkel Hierarchy of Sets. The cumulative hierarchy of sets is defined by
[
V0 = ; V+ = P(V ) ; V = V | < , if is a limit ordinal.

V (x) = z Ord(z) x Vz is the class of sets belonging to that hierarchy.
In order to define addition of ordinals, we take the functional relation:

S
H(x, y) = if x = 0; H(x, y) = f ()+ if x = + and y is a function f on + ; H(x, y) = f ()| < }
if x is is a limit ordinal , y a function on ; H(x, y) = , otherwise.
We then have a unique S function

S(x) on Ord(x) such that S() = H(, S ). This is S(0) = ;
S(+ ) = S()+ ; S() = S()| < . Instead of S() we write ( + ).
S
The Addition of Ordinals + 0 = ; ( + + ) = ( + )+ ; ( + ) = + | < if 6= 0
is a limit ordinal, defines a unique operation on Ord(x). It generalizes the ordinary addition of
natural numbers.
21
S
The Multiplication of Ordinals 0 = 0; + = + ; = ( )| < if 6= 0 is
a limit ordinal, defines a unique operation on Ord(x). It generalizes the ordinary multiplication
of natural numbers.
+ S
The Exponentiation of Ordinal Numbers 0 = 1; = ; = | < if 6= 0 is
a limit ordinal, defines a unique operation on Ord(x). It generalizes the ordinary exponentiation
of natural numbers.
For example, + 1 = ( + 0)+ = + , which is in agreement with our earlier convention. Also,
0+ = {} = {} = 1. We get:
0+1 = (0 + 0)+ = 0+ = 1 i.e., 0+ = 1

0+2 = (0 + 1)+ = (0+ )+ = 1+ = 2, i.e., (0+ )+ = 2
0+3 = (0 + 2)+ = ((0+ )+ )+ = 2+ = 3, i.e., ((0+ )+ )+ = 3
0+4 = (0 + 3)+ = (((0+ )+ )+ )+ = 3+ = 4, i.e., (((0+ )+ )+ )+ = 4
Hence:
2 + 2 = (0+ )+ + (0+ )+ = ((0+ )+ + (0+ ))+ = (((0+ )+ + 0)+ )+ = (((0+ )+ )+ )+ = 4
Kant claimed that a statement like 2 + 2 = 4 is a synthetic judgment, i.e., a fact which must be
considered as being a priori true. Our argument showed that it can be proven on the basis of our
ability to recognize symbols like (((0+ )+ )+ )+ as 3+ , i.e., as 4. In this sense, 2 + 2 = 4 admits a proof,
i.e., it is the result of an analytical judgment.
At no point in our development of ordinals did we use our axiom of infinity, i.e., that there is a set
of natural numbers n, n N. We are now in a position to adopt a more general approach towards
this axiom.
The Finite Ordinals. An ordinal > 0 is called finite, if as well as every ordinal smaller than
, except 0, is a successor, i.e., it has a predecessor. The class of finite ordinals is given by the
formula:

F Ord(x) = Ord(x) y Ord(y) (y x) (y 6= ) z y = z {z}
Ordinals which are not finite are called infinite. If is a finite ordinal then all are finite. If
is finite then + is also finite. Clearly, all standard finite ordinals, i.e., the ordinals n are finite.
However, there is no way of showing that all finite ordinals are standard. A definition of finite as being
some n is a descriptive definition in contrast to the formal one above. According to a fundamental
result of mathematical logic, the so called Lowenheim-Skolem Theorem, there is no scheme of formulas
F (x, a), a b, which is satisfied exactly by the standard numbers n. Finite ordinals which are
nonstandard are of infinite magnitude in the sense that n holds for all n N. This is very easy to
see.
The Axiom of Infinity (revised). There is a set whose elements are exactly the finite ordinals.
S
We see that = | , hence is an ordinal, and is not finite because of / . Hence
cannot have a predecessor, i.e., is a limit ordinal. Because all ordinals smaller than are finite, i.e.,
they are zero or have predecessors, is the smallest infinite limit ordinal. The existence of an infinite
limit ordinal is equivalent to the axiom of infinity.
Theorem 3.33 (Proof by Induction for Finite Ordinals) Let P (x) be a property. Assume:
22
(i) P (0) holds.
(ii) P ( + ) holds, in case that P () holds.
Then P () holds for all .
Proof. Indeed, if there is some ordinal for which P () is not true, then > 0 for the smallest
such ordinal, because of (i), and because of (ii). 2
Addition and multiplication of finite ordinals provide the foundation of a rigorous development
of real numbers and the calculus. By the very nature of any such axiomatic or formal approach,
infinitesimals are lurking in the shadows e.g. as reciprocals of nonstandard finite numbers. However,
they cannot be discovered by any formal means. An approach of the Infinitesimal Calculus within any
model of ZF by adding an additional predicate which expresses the extraterrestrial quality of being
standard has been advocated by Ed. Nelson in his book on Radically Elementary Probability Theory.
It is quite interesting to note that Fraenkel discarded a theory of infinitesimals as useless for serious
mathematics. However axiomatic set theory together with formal logic finally reestablished the intuitive
methods of Euler and Cauchy, and made them again available not only as an attractive alternative to
the approach of the calculus but also as an intuitive and powerful tool for even the most advanced
parts of applied mathematics. And most of this NonStandard Analysis was developed by Abraham
Robinson who studied as a freshman set theory and mathematical logic under A. A. Fraenkel.
3.3 The Orderstructure of the Ordinal Sum, the Ordinal Prod-

uct and Ordinal Exponentiation. Finite Arithmetic
We are going to analyze the well ordered systems ( + , ), ( ) and in terms of (, ) and
(, ). We first need to prove a few facts about the addition and multiplication of ordinals.
+ 0 = ;0 + = (3.1)
The first equation is part of the recursive definition for addition. In order to prove the second equation,
we use induction on . Assume that we have 0 + = for every < . If = + then
0 + = 0 + + = (0 + )+ = + =
If is a limit ordinal then
[ [
0+= 0 + | < = | < = .
= + is a limit ordinal in case that is a limit ordinal (3.2)

S
Assume < = { + | < . Then + for some < . But then one has that
+ ( + )+ = + + , because + .
In
S particular, 1 + must be a limit ordinal while + 1 is the successor of . Actually 1 + =
1 + | = . Hence, addition of ordinals is not commutative.
+ < + if and only if < (3.3)
We proceed by induction on . That is, we assume that for every ordinal < one has that + < +
in case that < . If = + , then and
+ ( + ) < ( + )+ = + + = + .
23
If is a limit ordinal, then + < and
[
+ < ( + )+ = + + + | < = +
On the other hand, if + < + then 6= and < , by what we have shown.
If + = + then = (3.4)
This is trivial, because 6= leads according to 3.3 to an inequality.
If < then + + (3.5)
We prove this by induction on . That is, we assume the claim for all < . If = + , then
+ = + + = ( + )+ ( + )+ = + + = +
If is a limit ordinal, then

[ [
+ = + | < + | < = +
Note that 0 < 1 but 0 + = 1 + . Hence, we have right cancellation but not left cancellation.
( + ) + = + ( + ) (3.6)
We prove this by induction on . That is, we assume that associativity holds for every < . If
= + then ,
( + ) + + = (( + ) + ))+ = ( + ( + ))+ = + ( + )+ = + ( + + ).

[ [
( + ) + = ( + ) + | < = + ( + )| <
For = + , where < , we have that = + < + . Hence:

[ [
( + ( + )| < + | < +
S
Now assume that < + . Then + | < , i.e., + for some < . Hence,
+ + ( + ). This is:
[ [
+ | < + + ( + )| <
Thus: [ [
+ ( + )| < = + | < +
We already know that + is a limit ordinal. Hence:
[
+ | < + = + ( + )
This proves associativity.
< if and only if + = for a unique > 0. (3.7)
24
If > 0, then = + 0 < + = . We also know that there can be only one such that + = .
Let < . We induct on , i.e., we assume that for every , < , one can find some such
that + = .
If = + , then < and + = for some . Hence,
+ + = ( + )+ = + =
If is a limit ordinal, then let

s = {| + = < }
The set s is because of 3.3 a proper segment of ordinals, and therefore an ordinal, i.e.,s = 0 . If 0
then, because is a limit ordinal, also + 0 . That is, 0 is a limit ordinal. But then:
[ [
+ 0 = + | 0 = | < =
Theorem 3.34 (The Order Structure of the Ordinal Sum) If = + then = ( \ )

decomposes the well-ordered system w = (, ) into two disjoint subsets w1 = (, ) and w2 = ( \, )
where each element of w1 precedes each element of w2 and where w1 is order-isomorphic to and w2
is order-isomorphic to . That is, is the order sum of and .
Proof. w1 is actually equal to , and therefore isomorphic to ; by what we have shown, the map
7 + is an isomorphism between and w2 . 2
In order to analyze the structure of the ordinal product, we need a few facts.
0=0=0; 1=1= (3.8)
We have 0 = 0 by definition and 1 = 0+ = 0 + = .

We prove 0 = 0 by induction on . Hence we assume that for every < one has that 0 = 0
holds. If = + , then
0 = 0 + = 0 + 0 = 0 + 0 = 0
If is a limit ordinal, then [ [
0= 0 | < = 0=0
We prove 1 = by induction on . Hence we assume that for every < one has that 1 =
holds. If = + , then
1 = 1 + = 1 + 1 = + 1 = + =
[ [
1= 1 | < = | < =
= is a limit ordinal in case that or are limit ordinals. (3.9)
If is a limit ordinal, then [

= | <
Let < , i.e., for some < . We must have > 0, by 3.8, and therefore, by 3.3,
< < + = + < + + = ++ ; where ++
25
Now, < is the same as + and we conclude that
+ < ++ , i.e., + <
Hence, is a limit ordinal in case that is a limit ordinal. Now assume that is a limit ordinal. But
then it is enough to assume that is a successor, i.e., = + . But then:
= + = +
which is a limit ordinal by 3.2. S

Hence, 2 as well as 2 are limit ordinals. But 2 = 2 | = while 2 =
1+ = 1 + = + . We conclude that multiplication of ordinals is not commutative. Also,
2 = (1 + 1) 6= 1 + 1 = + , i.e., right distributivity fails.
If > 0, then (3.10)
We prove this by induction on . That is, we assume for every < . If is a successor, i.e.,
= + , then:
< + = + = , hence = + + =

[ [
= | < | < = , i.e., .
If > 0 and if > 0 then > 0 (3.11)
This is an immediate consequence of 3.10.
If > 0 and < , then < (3.12)
We induct on . That is, we assume for all < < that < . If = + , then and
. This is obvious for = , and for < it is our induction hypothesis. Hence,
< + = + =

[
= | < + = + > , i.e., >
In particular,
If > 0 then = only if =
If > 0 and > 1, then < (3.13)
Note, that = 1 < by 3.8 and 3.12.
If < , then (3.14)
We induct on . That is, we assume for every < that . If = + then, by 3.3:
= + = + + < + = + =
26
[ [
= | < | < = , i.e.,
We have already noticed, that 1 < 2 but 1 = = 2 . Hence, the sign in 3.14 cannot be replaced
by <.
( + ) = + (3.15)
We induct on . That is, we assume ( + ) = + for every < . If = + then:
( + ) = ( + + ) = ( + ( + 1)) = (( + ) + 1) = ( + ) +
= ( + ) + = + ( + ) = + +
= +
If is a limit ordinal, then + is a limit ordinal. Hence:

[ [ [
( + ) = | < + = | < + = | = + , <
[
= + | <
We may assume that > 0, otherwise our claim 3.15 becomes obvious. Hence, for = where
< we have by 3.12 that < . This yields:
[ [
+ | < + | <
Now let < . The set 0 = {| } is obviously a segment and therefore an ordinal.
Because S < one has that < , for each 0 . We claim that 0 is a successor. Otherwise,
0 = | < 0 } and, because holds for each 0 , one has that 0 . But this
is 0 0 , which is a contradiction. Hence, 0 = + and < shows that < . But then
also + < , because is a limit ordinal. This is, 0 > where 0 < . Hence:
[ [
+ | < + | <
This is: [ [
+ | < = + | <
Because is a limit ordinal, one has that
[
+ | < = + .
This proves left distributivity. As an application we note that 2 = (1 + 1) = 1 + 1 = + .

We extract from the proof of 3.15 the following
Lemma 3.35 Let < . Then there is a unique < such that < + . If is a limit
ordinal then one has that < 0 holds for some 0 < .
Proof. The only thing left to show is uniqueness of . Assume that we have some > . Then +
and therefore + > . If < then + and + . That is, only satisfies
< + . 2
( ) = ( ) (3.16)
27
We induct on , i.e., we assume that ( ) holds for each < . If = + , then
( + ) = ( ( + 1)) = ( + ) = ( ) + = ( ) + = ( ) +
This is, ( ) = ( ) .
If is a limit ordinal, then is a limit ordinal as well as ( ). Hence:
[
( ) = | <
For proving our claim 3.16 we may assume > 0. Now, if = where < , then, according to
3.12, one has that < . On the other hand, if < , then according to the last Lemma, and
again by 3.12, one has some < such that < < . Therefore,
[ [
| < = ( )| <
S S
But, ( )| < = ( ) )| < = ( ) . This proves associativity.
Lemma 3.36 Let < . Then = + for unique < and < .
Proof. According to Lemma 3.35, we can find some < such that < + . This is,
< + . Using 3.7, + = , and < by 3.3.
Now note that and have to be different from 0 in order for to be less than Assume
= 1 + 1 = 2 + 2 where < , 1 , 2 < and 1 , 2 < . We wish to show that 1 = 2
and 1 = 2 .
If 1 = 2 , then 1 = 2 and, a fortiori, 1 = 2 by 3.4. If 1 > 2 , then 1 (2 + 1). Hence,
1 (2 + 1) = 2 + . By 3.7 we can find some 0 such that 1 = ( 2 + ) + .
We conclude, 2 + 2 = 1 + 1 = 2 + + + 1 , i.e., 2 = + + 1 , which is a
contradiction. 2
Theorem 3.37 The map (, ) 7 + , and , defines a bijection between and

. One has 1 = 1 + 1 > 2 = 2 + 2 if, and only if, either 1 > 2 or in case that
1 = 2 that 1 > 2 . Hence, ( , ) is order isomorphic to ( , <al ) where <al denotes the
anti-lexicographic order on .
Proof. Note that if < then + . Thus, + = + . For any < , , we get
+ < + , i.e., + < . Together with Lemma 3.36 we have established the
said bijection between the Cartesian product and the ordinal .
We pull back the order on in order to make to a well-ordered system. That is,
(1 , 1 ) > (2 , 2 ) iff 1 = 1 + 1 > 2 = 2 + 2
If 1 > 2 , then 1 + +
2 and 1 2 = 2 + > 2 +2 . Thus, 1 +1 1 > 2 +2 .
Hence,
(1 , 1 ) > (2 , 2 ) in case that 1 > 2
If 1 = 2 then 1 = 1 + 1 > 2 = 2 + 2 yields 1 > 2 . Hence,
(1 , 1 ) > (2 , 2 ) in case that 1 = 2 and 1 > 2
This proves our theorem. 2
28
( , <al ) looks like a matrix with many rows of copies of :
(0, 0) < (1, 0) < ... < (, 0) < ...

(0, 1) < (1, 1) < ... < (, 1) < ...
..... . ..... . ... . ..... . ...
..... . ..... . ... . ..... . ...
(0, ) < (1, ) < ... < (, ) < ... ,
..... . ..... . ... . ..... . ...
..... . ..... . ... . ..... . ...
That is, can be realized as the order sum of many copies of . There are many layers of
copies of .
Corollary 3.38 (The Division Algorithm) Let > 0. Then every admits a unique representa-
tion = + where and < .
Proof. If 1 then, using the monotinicity relation of 3.14, we infer from 1 that 1 <
+ . Thus, < + and the claim now follows from Theorem 3.37. 2
Interesting special cases are obtained by taking = 2 and = :

For any ordinal one has that = 2 + where is either 0 or 1. That is, is either even or
odd.
For any ordinal one has that = + where . That is, any ordinal is the sum of a
limit ordinal which is equal to many copies of pieced together, plus a finite ordinal .
Of course, for finite ordinals we can say more about the sum and product. But notice, our proofs
are completely formal and dont rely on an intuitive understanding of what a number is and how we
learned to add and multiply.
Theorem 3.39 The set of finite ordinals is closed under addition and multiplication. Moreover,
addition and multiplication of finite ordinals is commutative. That is, if and then
(i) + ;
(ii) + = + ; =
Proof. In order to prove this, we induct on . If = 0, then + 0 = by 3.1 . If = + then
+ = ( + )+ = + , by the induction hypothesis. Also, 0 = 0 by 3.8. If = + then
= + = + by the first part of (i).
We first establish commutativity of addition for = 0 and = 1.
We have that + 0 = 0 + = by 3.1.
If = 1, then + 1 = + = 1 + . We prove this by induction on . For = 0 this is 0 + 1 = 0+ = 1 + 0
which is true by 3.1 and because of 0+ = 1. If = + we have
+ 1 = + + 1 = ( + 1) + 1 = (1 + ) + 1 = 1 + ( + 1) = 1 + + = 1 +
by induction and 3.6.

If = + we conclude, by induction, 3.6, and 1 + = + 1:
+ + = ( + )+ = ( + )+ = ( + ) + 1 = + ( + 1) = + (1 + ) = ( + 1) + 1 = + +
29
Hence, + = + .
In order to prove commutativity for multiplication, we need
(*) ( + 1) = +
We prove this by induction on . For = 0 this is ( + 1) 0 = 0 + 0, which is true because both
sides are zero by 3.8 and 3.1. If = + then
( + 1) + = ( + 1) + ( + 1) = ( + ) + ( + 1) = ( + ) + ( + 1) = + + +
which is (*) for .
If = 0, then 0 = 0 = 0 by 3.8. If = + then
+ = + = +
by the definition of ordinal multiplication and by the induction hypothesis. But also,
+ = +
by (*). This proves = . 2
Concerning exponentiation of ordinals, we must note that the notation = is ambiguous. The
ordinals and are sets but is not the set of maps from into . For example:
[
2 = 2 | =
while the set of maps from into 2 is equivalent to R, i.e., to the set of real numbers. Monk uses
for ordinal exponentiation. We do not adopt his convention. Later on, it will become clear from
the context what kind of exponentiation is meant. We already remark, that there is no natural way
to make 2 , perceived as the set of maps from into 2 to a well ordered set. One needs the AC to
well-order this set, i.e., R. However, one has the following
Proposition 3.40 A finite direct product
a = a0 . . . a1
of well-ordered sets a is well-ordered by the lexicographic order:
b = (b0 , . . . , b1 ) < c = (c0 , . . . , c1 ) iff b < c if is the first index where a and b differ.
Proof. This relation < is obviously irreflexive and one can easily prove that it is also transitive. We
need to show that every subset s of a has a smallest element. We define successively subsets s of s
and elements c a :
s0 = {b0 |b0 first component of some b s}
c0 = min(s0 )
s1 = {b1 |b1 second component of some b s where the first component is c0 }
c1 = min(s1 )

s1 = {b1 |b1 last component of some b s where first components are c0 , c1 , . . . , c2 }
c1 = min(s1 )
It is easy to see that c = (c0 , , c1 ) is the minimum of s. 2
30
Definition 3.7 We define for ordinals and the weak cartesian power :
() = {f |f () > 0 for only finitely many }
If f and g are different elements of () then, because f and g differ at only finitely many s there is
a largest where f () 6= g(). We define f < g in case that f () < g().
For example, we have in 2() that:
(0, 0, . . .) < (0, 1 . . .) < (0, 0, 1, . . .) < . . .
Theorem 3.41 The relation < is a well-order on ()

Proof. It is obviously irreflexive and transitive. We need to show that every nonempty subset s of
() has a minimum. For f in () we define 0 (f ) = max{|f () > 0}. We can do this because the
functions in () are nonzero for only finitely many ordinals in . Then let
0 = min{0 (f )|f s}, 0 = min{f (0 )|f s where 0 (f ) = 0 }, s0 = {f |f (0 ) = 0 , 0 (f ) = 0 }
We have g > f for g / s0 and f s0 . Thus, only functions f s0 are candidates for min(s). If s0 is a
singleton, we are done. Otherwise, we continue and define, similarly as before, 1 (f ) = max{|f () >
0, < 0 }. Then let
1 = min{1 (f )|f s0 }, 1 = min{f (1 )|f s0 where 1 (f ) = 1 }, s1 = {f |f (1 ) = 1 , 1 (f ) = 1 }
We have 0 > 1 and g > f for g s0 \ s1 and f s1 . We can continue that way and create a strictly
decreasing sequence of ordinals 0 > 1 . . . which must be finite1 . Thus s is a singleton for a finite
ordinal . The element of s then is the minimum for s. 2
The following relations are quite obvious from the definitions:

[
0 = {} = 1; (+1) = () ; () = () | < , if is a limit ordinal (3.17)
These identities make the following theorem quite obvious:

Theorem 3.42 The order type of the exponentiation of ordinals is order isomorphic to the weak carte-
sian power, i.e.,

= ()
as well-ordered sets. 2
One can use this isomorphism to prove various exponential identities, e.g., + = and =
( ) . The corresponding identities for the weak cartesian power, i.e., (+) = () () and
()
() = () are quite easily established.
1 A formalization of this process is based on Definition by Recursion on Ordinals(Theorem 3.32), and is left as an
excercise. For example, 1 should be 1 etc.
31
Chapter 4
The Axiom of Choice
The axiom of choice (AC) states that every family of non-empty sets admits a choice function. As an
important consequence we note that every set p of mutually disjoint and non-empty sets c admits a
cross-section, i.e., a set s such that s c 6= for each c p. For s we may take the range of a choice
function for the family which is theS identity on p, i.e., p p, c 7 c. The sets c are called the classes
of the partition p of the set a = c, and s is also called a set of representatives for the partition p.
Because the sets c are pairwise disjoint, s c is a singleton for each c p. The statement that any
partition p of a set a into non-empty classes admits a set of representatives is actually equivalent to
AC. Given any family a, j i, of non-empty sets we can make it to a set p of disjoint sets by defining
p = {cj = {j} aj |j i}. Clearly, if j 6= j 0 then cj cj 0 = . Then if s is a cross-section for p, we
have s cj = (j, sj ) for some sj aj , and the map j 7 sj is a choice function for the family aj , j i.
A choice function is easy to find in case that all sets aj are well-ordered.
S One
may pick from every aj
just the smallest element. Notice that a well-ordering of a = aj |j i induces a well-ordering of
each aj and produces a choice function. The most substantial consequence of AC is
Theorem 4.1 (Zermelos Well-Ordering Principle (WO)) Every set can be well-ordered.
Proof. Let f be any choice function on P(a) \ {}. The idea to well-order a is quite simple. Define
a0 = f (a), a1 = f (a \ {a0 }), a2 = f (a \ {a0 , a1 }), . . . and set a0 < a1 < . . . The process of assigning
ordinals to elements of a should terminate because at every such step we are taking away an element
from a. The formal definition of this process is done by recursion. We pick some c / a and define a
functional relation by

f (a \ ran(g)) if x = a and y = g : a, ran(g) a
H(x, y) =

c otherwise
Because the universe U is not a set it is clear that such a set c must exist for a. By the Recursion
Theorem we have a functional relation F (x, y) such that for each ordinal , F () = H(, F ) holds.
If F () 6= c holds, then F is a function whose domain is , the range is {F ()| < } a and,
moreover, F () = f (a \ {F ()| < }). We conclude that F () 6= F () for all < . If we had
F () 6= c for all ordinals , then F (x) would establish an injective functional relation from Ord(x) into
the set a. It would follow from comprehension and replacement that Ord(x) is a set. Hence, there is a
smallest ordinal such that F () = c. For each < we have that F () a. Hence F : a and
the map is injective. But rang(F ) = a, because F () = c. Thus F is bijective and well-orders a.
2
32
Let (p, ) be a partially ordered set. An element m p is called maximal if m a implies that
m = a. The open interval (0, 1) of real numbers is an example of a partially ordered set which does
not have maximal elements. The maximal elements of the partially ordered set of all proper subspaces
of Rn are the hyperplanes. In a totally unordered set, any element is maximal. In order to find a
condition which guarantees the existence of maximal elements let us try to find one in (p, ). Pick any
element a0 in p. If a0 is not maximal, pick an element a1 > a0 . . . etc. If the so created sequence does
not terminate at some finite n , we have to define some a > an , i.e., we need to have at least one
upper bound for the chain {an |n }. This heuristic construction suggests
Theorem 4.2 (Zorns Lemma (ZL)) Let (p, ) be a partially ordered set where every chain has an
upper bound in p. Then p contains a maximal element.
Proof. The element u p is called a strict upper bound for the subset a of p, if u > b holds for each
b a. Any element of p is a strict upper bound for . There is no strict upper bound for p. Let d be
the set of all subsets of p which have a strict upper bound and for a d let u(a) be the non-empty set
of strict upper bounds. For a choice function f on P(p) \ {} and a d define v(a) = f (u(a)), i.e., v
selects a strict upper bound for a subset a of p, if there is one. Of course, v(a)
/ a for each a d. We
define
v(rang(g)) if x = and y = g : p, ran(g) d
H(x, y) =

c otherwise
As before, we choose for c a set which is not a member of p. According to the Recursion Principle
we have a functional relation F (x, y) such that F () = H(, F ). If F () 6= c, then F is a
function from into p, where the range {F ()| < } belongs to d, and F () = v({F ()| < }).
Therefore, F () 6= F (), < . If we had F () 6= c for all ordinals then F (x, y) would define an
injective functional relation of the class Ord(x) into the set p. Hence F () = c for a smallest ordinal
. But then for < < we have, because of F () = v({F ()| < }), that F () > F (). Hence
{F ()| < } is a chain in p and must have an upper bound m. On the other hand, this chain cannot
have a strict upper bound because F () = c. Hence = + and F () = m and there cannot be any
element larger than m. This is, m is maximal. 2
Corollary 4.3 Let (p, ) be a partially ordered set where every chain has an upper bound. Then for
any a p there is some maximal element m in p such that m a.
Proof. We only have to apply Zorns Lemma to the partially ordered set {b|b a}. 2
Corollary 4.4 Let (p, ) be a partially ordered set where every well-ordered chain has an upper bound.
Then p has a maximal element.
Proof. For the proof of ZL we only needed an upper bound for the well-ordered chain {F ()| < }.
2
Theorem 4.5 (ZL) implies within ZF Set Theory (WO) and, a fortiori, also (AC).
Proof. Let a be a set we wish to well-order. To this end let s be the set of all well-orderings of subsets
b of a, i.e., s = {r|r b b, b a, r well-order on b}. We partially order s by
S saying that
r1 r2 if b1
is a segment of b2. Let c be any chain in s. It is quite obvious that u = r|r c is a well-order
S
on the set b = dom(r)|r c}, i.e., u s, and u r for each r c. Hence any chain in s has an
upper bound. By ZL we conclude that s has a maximal element w. If we had b = dom(w) a, then
any element d a \ b put on top of (b, w) would violate the maximality of w.
That WO implies AC is a trivial observation which we have stated before. 2
33
Chapter 5
The Axiom of Foundation
The axiom of foundation says that every non-empty set a contains an element b such that b a = .
It follows that a set can never be an element of itself, (x x), or more generally, there are no finite
cycles, i.e., maps defined on a finite ordinal such that a0 = a1 , a a+1 , 0 < 1; and
there is no map on such that a+1 a . The range of such maps would violate AF. An element
b a such that b a = , is minimal in (a, ), i.e., b a and for no c a one has that c b. From
this remark we conclude
Proposition 5.1 (AF) A set is an ordinal if and only if
(i) is transitive.
(ii) (, ) is totally ordered.
Proof. Recall that an ordinal is a set which is transitive on which is a strict well-ordering.
Because of AF, is irreflexive. If we have that is a total order then a minimal element of is
actually the minimum. 2
If a is transitive and b a then b a; thus b a, in the presence of AF.

Theorem 5.2 (AF) A set is an ordinal if and only if
(i) and all of its elements are transitive.
Proof. We need to show (ii). To this end assume that there are , which are different and
incomparable with respect to . Hence the set

a = | , ( 6= ) ( ) ( )
is non-empty and has a minimal element 0 . Because is transitive we have 0 . Hence every
element 0 belongs to but does not meet the condition for a. That is:
(*) If 0 then we have for any that = or or .
On the other hand, 0 belongs to a, hence the set

b = | , ( 6= 0 ) ( 0 ) (0 )
is non-empty and has a minimal element 0 . Because is transitive, every element 0 belongs to
but does not meet the condition for b. That is
34
(**) If 0 then we have = 0 or 0 or 0 .
We are going to deduce a contradiction from (*) and (**). First, we derive 0 0 . Let 0 ; we
use (**) to conclude 0 . If we had = 0 , then 0 0 would contradict 0 b. If we had 0
then 0 together with the transitivity of 0 would yield 0 0 , contradicting again 0 b.
Now, 0 0 , and 0 = 0 cannot hold because of 0 b. This is 0 0 and we may pick some
0 \ 0 . By (*) we have 0 = or 0 or 0 where we have already excluded 0 . Now,
0 = yields 0 0 which contradicts 0 b; 0 and 0 yields by transitivity of the set
0 again the contradiction 0 0 . 2
We have already introduced the cumulative ZF-Hierarchy of Sets:

[
V0 = ; V+ = P(V ) ; V = V | < , if is a limit ordinal.

and the class V (x) = z Ord(z) x Vz of sets which belong to that hierarchy. We are going to
show that under the assumption of AF, every set belongs to that hierarchy. Actually, AF is equivalent
to this statement. Not making any use of AF, we are going to prove a few useful facts about V (x).
But first a rather trivial
Lemma 5.3 Assume that the set a is transitive. Then P(a) is transitive. If a
/ a then P(a)
/ P(a).
Proof. Let a be transitive and assume b P(a); that is b a. But then, if c b one has that c a.
Because a is transitive, we have that c a. Hence, c P(a). That is, b P(a).
Now assume that we have a / a but P(a) P(a), that is, P(a) a. But then a P(a) a, hence
a a, which is a contradiction. 2
Lemma 5.4 Each V is transitive and one has that V
/ V . If < then V V and V V
Proof. Assume for all < that V isS transitive.

If is a limit ordinal then we have V = V | < . Assume that a V . Then a V for some
< . But by transitivity of V we have a V and therefore a V .
If is a successor, then = + where V = P(v ) and the claim follows from Lemma 5.3.
Assume V / V for all < .
Assume V V . If is a limit ordinal then V V for some < . But V is transitive, so V V .
But V V+ V V . Hence, V V , a contradiction.
If is a successor, then = + and the claim follows from Lemma 5.3.
Assume for all < < that V V and V V .
If is a limit ordinal then V V and V V + V . This is V V . We have V V + V ,
thus V V .
If is a successor, then V = V + . We have V V + = V , i.e., V V . Because V is transitive, we
have already V V . If we had V = V , then V = V V , but we have already V V ruled out.
2
Assume that V (a) holds for the set a. Then let be the smallest ordinal such that a V holds.
Because V0 = , we have > 0. It is also clear that must be a successor ordinal. For a limit ordinal
, each element a of V belongs to some V for some < . Thus = + and we call (a) the rank
of a: (a) = if a V where = + is the smallest ordinal such that a V , or a V
Assume a V . If is a limit ordinal then (a) < . If = + one has that (a) < . Thus,
a V iff (a) < .
35
Proposition 5.5 V (a) holds if and only if V (b) holds for every b a. Moreover, (b) < (a) for
every b a. If c is a subset of a then V (c) holds and (c) (a).
Proof. Let (a) = . Then a V+ = P(V ), i.e., a V . If b a then b V . Hence, V (b) and
(b) < = (a). S
Now assume V (b) for every b a. Let = (b)+ |b a . Then b V(b)+ V for every b a.
Hence, b V for every b a which is a V , i.e., a V+ .
We have c a V , hence c V which is (c) = (a) 2
Proposition 5.6 V () holds for every ordinal and one has the () = .
Proof. By Proposition 5.5 we have that V () holds in case that V () holds for all < . So V ()
holds for every ordinal . S
If we have V + for each , then V and because of V = P(V )| < , one has
V for each . Hence V , i.e., V+ . This is V+ for each ordinal . Assume
that there is an ordinal
S where () < , i.e., an ordinal for which V . We pick the smallest such
. Then V = P(V )| < yields some < such that V , i.e., V for some
< . However, was the smallest such ordinal. 2
Proposition 5.7 Each set a for which V (a) holds contains an element b such that a b =
Proof. {(b)|b a} is a set of ordinals and has a smallest element . Hence (b) = for some b a
and according to Proposition 5.5, (c) < rho(b) for each c b. Hence c
/ a for each c b. 2
We are going to show that the axiom of foundation implies that each set a of the universe belongs to
the class V (x). A possible argument runs roughly like this: Assume that there is a set a such that
V (a) holds. Then a must contain some element a1 , such that V (a1 ). But then a1 must contain
some element a2 such that V (a2 ) etc. That is, we create a sequence a 3 a1 3 a2 3 . . . which violates
AF. A complete proof along these lines requires obviously AC. We are tempted
to define recursively a
sequence a , , such that a0 = a, and a+1 = f b|b a , V (b) for some choice function f .
The trouble with this definition of the a is that we first have to specify a suitable family of non-empty
sets1 in order to have a choice function for the generation the a . There is a somewhat more elementary
approach possible, which is based on the transitive closure of a set.
Lemma 5.8 For any set a there is a set tr(a) which contains a and which is transitive. Moreover,
any transitive set which contains a contains also tr(a).
S S S
Proof. We define sets a0 = a, a1 = b|b a0 , a2 = b|b a1 , . . . a+1 = b|b a and
S
tr(a) = a | .
We have tr(a) a and if b tr(a) then b a for some . But then b a+1 tr(a), i.e., tr(a) is
transitive.
Now assume that c a where c is transitive. SAssume that
a c. Let b a c. Then b c and
b c, because c is transitive. Hence a+1 = b|b a c. 2
Theorem 5.9 The axiom of foundation holds if and only if every set belongs to the ZF-Hierarchy:
AF xV (x)
Proof. We already showed that every set a which belongs to V (x) has an element b which is disjoint
to a. So xV (x) implies the axiom of foundation.
1 However, Hilbert postulated the existence of a universal choice (class-)function for which a 6= (a) a holds.
He called it the logical choice function. However, nobody assumes this strong version of AC anymore
36
For the converse, assume that we have AF but that there is a set a which does not belong to V (x).
Then tr(a) does not satisfy V (x) as a set which contains a. Let b = {c|c tr(a), V (c)}.We are going
to show that b violates AF. Let c b. Then, because of V (c), c must contain an element d such that
V (d). Now, d c tr(a) yields d tr(a). But tr(a) is transitive, thus d tr(a). Together with
V (d) this yields d b. Hence, c b 6= . Because this holds for every element c b, the AF cannot
hold. 2
37
Chapter 6
Cardinals
6.1 Equivalence of Sets

Two sets a and b are said to be equivalent if there is some bijection from a onto b. This is obviously
an equivalence relation whose domain is the class of all sets. We write a b for equivalent sets a and
b. Intuitively, the sets a and b are equivalent if they have the same number of elements. That a has
not more elements than b can be formalized by defining: a in b iff there is an injection from a to b;
or quite similarly: a pr b iff there is a surjection from b onto a. Both relations are quasi orders (i.e.,
reflexive and transitive relations) on the class of all sets. Clearly, in a for every set a 6= . In the
following we always assume that a 6= .
Proposition 6.1 Let f : a b and g : b a be maps. Assume that g f = ida . Then f is injective
and g is surjective.
Proposition 6.2 Assume that f : a b is injective. Then there is some map g : b a such that
g f = ida . That is, every injective map has at least one left inverse.
Proposition 6.3 (AC) Assume that g : b a is surjective. Then there is some map f : a b such
that g f = ida . That is, under the assumption of AC, every surjective map has at least one right
inverse.
The proofs are very easy. The map f for Proposition 6.3 is defined with the help of a choice function
on P(b) \ {} which picks for every c a some element d g 1 (a) = {d|g(d) = a}.
Hence, a in b always yields a pr b but the converse needs the AC. Thus a in b iff a pr b holds
under the assumption of the axiom of choice.
For every map f : a b the equivalence kernel, or just the kernel, is defined by c1 f c2 iff f (c1 ) =
f (c2 ). This is an equivalence relation on the set a where the classes are the largest subsets of a on
which the map f is constant. As usual, a/ f denotes the set of equivalence classes and c 7 [c] is the
canonical projection qf . The map [c] 7 f (c) then is the canonical injection f.
Proposition 6.4 Every map f : a b decomposes into a surjection followed by an injection: f qf =

f.
Theorem 6.5 (Cantor-Bernstein) a in b and b in a if and only if a b.
38
d0 d1 d ................
2
g f g f g
f
c0 c
1 c2 c3 .....
Figure 6.1: A moving element
Proof. Let f : a b and g : b a be injections. We need to find a bijection from a to b. We call

an element c0 a moving if it allows for an infinite diagram as in figure 6.1. That is, we can define
two sequences c = g(d ) and d = f (c+1 ), , where c has a (unique) counter image d in b and
where d then has a (unique) counter image c+1 in a. We call an element c a stationary if it is not
moving. An element c is stationary if for a first we dont have a d , i.e., c is not in the range of the
map g (c gets stopped in a) or we dont have a c+1 , i.e., d is not in the range of f (c gets stopped
in b). Let a1 be the subset of a consisting of all moving elements and the elements which are stopped
in b; the set a2 then is the complement of a1 , i.e., the set of all elements of a which are stopped in a.
We define a map h : a b by pieces. On a1 an element c is mapped to d, where g(d) = c. This makes
sense. If c is moving, then clearly c ran(g). If c is not moving, then it got stopped in b, so again we
must have that at least c ran(g). On a2 an element c is mapped to f (c).
The map h is injective because f and g are injective, and g(d0 ) = c0 and d0 = f (c) for c0 a1 and
c a2 cannot happen. If c0 is moving then d0 = f (c1 ) for some c1 a1 . If c0 is stationary, it is
stopped in b; hence again we conclude from d0 = f (c) that c = c1 a1 .
Let d b. If g(d) is moving, then h(g(d)) = d. If g(d) is not moving and got stopped in b, then
again h(g(d)) = d. If g(d) = c got stopped in a, then we must have some c1 in a such that f (c1 ) = d,
otherwise c would have been stopped at d in b. Clearly c1 a2 and h(c1 ) = d. 2
Corollary 6.6 If a in b in c and a c then a b.
Proof. We have a in b and b in c a, yields also b in a. The claim follows now from Cantor-
Bernstein. 2
Theorem 6.7 (Cantor) For any set a one has that a <in P(a). That is a in P(a) but not P(a) in
a.
Proof. We have that c 7 {c} establishes an injective map from a into P(a). We need to show that
there is no surjection from a to P(a). Let h : a P(a) be any map. The set r = {c|c a, c / h(c)}
then is not within the range of h: h(c0 ) = r yields the Russel Paradox c0 h(c0 ) iff c0 r iff c0
/ h(c0 ).
Hence h is not surjective. 2
For any quasi-order the relation a b which is defined by (a b and b a) is an equivalence

and [a] [b] iff a b is a partial order for the equivalence classes. It is called the contraction of .
The equivalence for in is according to the Cantor-Bernstein Theorem the bijective equivalence and
[a] [b] iffa in b defines a partial order on the classes of equivalent sets. It is quite obvious that
many set operations are invariant under , e.g., if a b then P(a) P(b). However, some familiar
facts, e.g., cartesian products are equivalent if the factors are equivalent, require (in case of infinite
products) the help of the axiom of choice.
Proposition 6.8 Let a,b,c be sets. Then:

(i) P(a) 2a ;
39
(ii) (ab )c a(bc) ;
(iii) a(bc) ab ac in case that b c = ;
(iv) ac bc (a b)c
Proof. Subsets can be identified with their characteristic functions. Actually P(a) and 2a are
isomorphic as boolean algebras. Families of maps in one variable are just maps in two variables. This
is (ii). The statement (iii) says that maps on disjoint domains can be patched together; (iv) is the
basis of the categorical characterization of the cartesian product, i.e., pairs of maps (f, g) on a common
domain correspond to maps h into the direct product of the codomains. 2
Let r be the set of real numbers and (a, b) and (c, d) any two (proper) open intervals. Then (a, b)
(c, d) by means of a simple linear equation. Clearly (1, 1) in [1, 1] in (2, 2) and (1, 1) (2, 2)
then yields (1, 1) [1, 1]. Hence any two proper intervals, whether open, closed or half-open, are
equivalent. The arctangent function maps r bijectively onto the open interval (/2, /2). Hence r
is equivalent to any of its proper intervals. The function 1/x maps (0, 1] to [1, ) and, as before, any
two improper intervals are equivalent. Thus r is equivalent to any of its intervals. On the other hand,
with the help of the binary representation of real numbers, one easily establishes [0, 1] 2 .
Proposition 6.9 The set r of real numbers, the continuum, is equivalent to the powerset of the set
of natural numbers. 2
Proposition 6.10 (Cantor)
Proof. Every , 1, is of the form 2x (2 y + 1) with uniquely determined x and y. This

follows from the Prime Factorization Theorem. The following map: f (x, y) = 2x (2 y + 1) 1 then
provides a bijection between and . 2
Cantors well-known proof based on an enumeration of the tableau is easy to remember but a
bit more difficult to formalize.
Corollary 6.11 for every \ {0}. 2
Corollary 6.12 r r r, \ {0}.
Proof. We have 2 (2 ) 2 2 (2 ) 2 2 and r (2 )

and r (2 ) . We wrote instead of in and we will from now on continue to do so for similar
calculations. 2
Hence the continuum r is equivalent to the finite dimensional spaces r and even equivalent to
the infinite dimensional space of all real sequences. The set q + of non-negative rational numbers is
equivalent to a subset of , by means of the unique representation of a rational number as a reduced
fraction. Hence, q + . But q + , therefore q + . It follows that the set q of rational
numbers is equivalent to the set of finite ordinals: q + q {1, 1} q + 2 . Two
continuous real functions which agree on the set q are equal. Let c be the set of all continuous functions.
Then r c because every constant is continuous, and c rq with f 7 f q. So r c rq r r.
Hence r is equivalent to the set of all continuous functions.
We have 2 2 2 2 r r r, thus 2 2 and therefore

Corollary 6.13 r < 2r r (2 )r 2r 22 2r . 2
40
Therefore, the set of all real valued functions is equivalent to the set of functions which take on only
two values and this set is strictly larger than the continuum.
For any sets a and b where a / a and b / b one has that a b iff a {a} = a+ b+ = b {b}. Let
+ +
g : a b be a bijection. If g(a) = b, then g a is a bijection between a and b. If g(c) = b, then with
the transposition (a, c) we get in g (a, c) a bijection g 0 : a+ b+ which maps a to b. In particular,
0 <in 1, leads to 1 <in 2. It follows, by induction, that for any finite ordinal one has that <in + .
Hence, for finite ordinals one has that < iff <in . (Assume < . Then + and certainly
+ in , by means of the inclusion map. Hence <in + in . So < yields <in . Assume
<in . Then < , because otherwise and therefore, in .) A set a is called finite if
it is equivalent to some finite ordinal . This ordinal is then unique and called the cardinal of a;
= card(a) is a canonical representative of the class of all sets which are equivalent to the set a. Sets
which are not finite are called infinite.
For the ordinal we have that + . The map 7 0, 7 + is a bijection between and + .
Hence is an infinite set. Our next goal is to show that for any ordinal one has an ordinal such
that <in . Then automatically < . Otherwise , so in , thus . We are going to
show somewhat more:
Proposition 6.14 For any set a there is an ordinal which is not equivalent to any subset b of a.
The smallest such is called the Hartogs number of a
Proof. Let k(a) = {w|w P(a a), w a well-order of a subset b of a}. If w is in k(a) then w is
isomorphic to a unique ordinal . That is, we have a functional relation H(w, ), w k(a). If is any
ordinal which is equivalent to some subset b of a, then b carries a well-ordering w such that H(w, ).
The range of H(w, ) is by replacement a set and obviously a segment. So it is of the form S() where
is the first ordinal not equivalent to any subset of a. 2
Definition 6.1 An ordinal is called an initial ordinal if < yields <in ; that is, no ordinal
smaller than is equivalent to . Initial ordinals are also called cardinals.
Proposition 6.15 For any set a the Hartogs number h(a) = is a cardinal.
Proof. Assume for some < that . Then is equivalent to some subset b of a because
was the smallest ordinal for which this is not the case. But would make b well-orderable by .
This is a contradiction. 2
Proposition 6.16 The class of cardinals is not a set.
S
Proof. Let a be any set of cardinals. Then = a is an upper bound for a. But then h(b) >
holds for each a. That is, the cardinal h() does not belong to the set a. 2
Proposition 6.17 All infinite cardinals are limit ordinals.
Proof. Assume that is an ordinal and . We define a bijection f : {} by setting
f (0) = ; f () = 1, \ {0}; f () = for < . Hence + is not a cardinal. 2
We have that all finite ordinals are cardinals. If is any ordinal then there is a smallest ordinal
equivalent to it: card() = min{| }. For example, = card() = card( + 1) = card( + 2) =
. . . and we have already shown that there are cardinals larger than the first infinite cardinal, which
is . An ordinal is countably infinite, or denumerable, if = card(). Let 1 be the first cardinal
which is not countable. Then 1 = {| < 1 } tells us that 1 is just the set of all countable ordinals,
i.e., the ordinals which are either finite or denumerable.
41
Let a be a set which can be well-ordered. We then define:
card(a) = min{| a}
In particular, card(a) a. If is an ordinal then card() because is well-ordered by by the

identity map. If is a cardinal then card() = because no smaller ordinal is equivalent to . If
and are cardinals such that , then = .
Lemma 6.18 Let s be a subset of the ordinal . Then (s, ) is well-ordered and card(s) .
Proof. Indeed, we have that (s, ) = or (s, )
= S() = for some , or
= {| s, < },
for some in s. The last case is impossible because it would give us some strictly increasing map
f : and we know from Proposition 3.9 on Ordinals that for any such map f () holds.
In particular, f () , so f () is not in the range of f which consists only of ordinals less than .
The first two possibilities tell us that s can be well-ordered by or by an ordinal less than . Hence,
card(s) . 2
In particular, h() = {| s } = {| or } = 1 .
Corollary 6.19 If b in a where a can be well-ordered, then b can be well-ordered and one has that
card(b) card(a).
Proof. We have b s0 for some subset s0 of a and because of a card(a) we get b s for some
subset s of card(a). Hence, card(b) = card(s) card(a). 2
Corollary 6.20 Assume for the ordinal that card() < where is a cardinal. Then < .
Proof. Indeed, if we had , then would be a subset of . The smallest ordinal that well-orders
, i.e., card(), would also well-order the set . This well-ordering of is order-isomorphic to an
ordinal where card(). But was a cardinal, thus , and therefore card() which
contradicts card() < . 2
Thus, an ordinal is a cardinal if and only if for every ordinal one has that
< iff card() <in
The Continuum Hypothesis (CH) is the assertion that any subset s of real numbers which is not
finite is equivalent to or to r.
We can avoid the reals in the formulation of CH by asserting that every subset of 2 which is infinite
is equivalent to or to the whole set 2 .
Proposition 6.21 If 2 can be well-ordered then 1 = card(2 ) is equivalent to the Continuum Hy-
pothesis.
Proof. If 2 can be well-ordered then 1 card(2 ) because <in 2 card(2 ) but 1 is the
smallest cardinal such that < . It follows that 2 contains a subset s which is equivalent to 1 .
If we assume CH then s 2 , hence 1 2 and 1 = card(1 ) card(2 ). But equivalent cardinals
are equal, so 1 = card(2 ).
Assume that 2 can be well-ordered and that 1 = card(2 ). Then let s be any infinite subset of 2 .
If card(s) > , then 1 card(s) card(2 ). Hence, card(s) = card(2 ) and it follows that s 2 ,
i.e., CH. 2
Cantor stated the Continuum Hypothesis in 1878 and it was listed by Hilbert in his famous address of
1900 as the most important mathematical problem of the twentieth century. Hilbert believed in the
42
truth of CH and called it actually the Continuum Theorem in his 1925 paper About Infinity. Godel
proved in 1938 that AC and CH are simultaneously consistent with ZF. Within any universe he formed
the hierarchy of constructible sets: Roughly speaking, sets are constructed in stages. The critical case
is the one from stage to + . If the set a has already been constructed at stage then at + all
subsets of a are collected which are given by formulas Q(x) where all occurring parameters are sets
already constructed at stages . This constructible universe L settled many other open problems in
the affirmative and there are mathematicians who firmly believe in V = L, i.e., the ZF-Hierarchy V
should consist of constructible sets only. However, P. Cohen showed in 1963 that CH does not follow
from AC and he even constructed a model where the reals cannot be well-ordered, i.e., where AC is not
true. And CH can be true without having V = L. He invented a new method which he called forcing in
order to obtain these models. His research revolutionized set theory and gave new impetus to systems
of non-classical logics. The CH and its generalization to arbitrary infinite sets implies, however, AC.
It is quite interesting to note that P. Cohen believes that CH is obviously false. He argues that 1 as
the set of all types of well-orderings realizable on is, after all, a set one gets by adding one element at
a time: 1 = {1, 2, . . . , , + 1, + 2, . . . , + , . . .} while forming the power set should be perceived
as a much stronger tool to generate a set of a larger cardinal.
6.2 Cardinals
If we assume AC, and we will do so from now on, then any set has a cardinal. Let ()j , j i, be any
family of cardinals. We define the cardinal sum as the cardinal of the disjoint union:
X [
j = card {j} j
ji ji
If aj is a family of pairwise disjoint sets where card(aj ) = j , then the sum of the j is the cardinal of
the union of the aj . This is very easy to see. The cardinal product of the j is defined as the cardinal
of the cartesian product: Y Y
j = card j
ji ji
If aj is any family of sets where card(aj ) = j , then the product of the j is the cardinal of the
cartesian product of the aj . Again, this is very easy to see. If and are cardinals then + and
denote the sum and product, respectively. It is obvious that sum and product are commutative
and associative and that the multiplication is distributive over addition; for exponentiation we have
the familiar rules:
= ( )
= ( )
= +
Finite sums and products agree with ordinary addition and multiplication. These are the addition
and multiplication rules of finite combinatorics. Whenever we use cardinal and ordinal addition or
multiplication simultaneously, the ordinal operations are encircled, e.g., = 2 while + =
2 = = . Another convention is to use Hebrew letters for cardinals, like Aleph() or Beth(i).
We already know that the cardinals form a class, well-ordered by . Hence there is a unique order
isomorphic correspondence between all ordinals and all infinite cardinals. This correspondence was
called by Cantor Aleph(, ) and for which he introduced the notation . So in particular:

= 0 , 0 + 0 = 0 , 0 0 = 0 , , 0 < 20 , 0 0 = 20
43
The single most useful fact about cardinal arithmetic is
Theorem 6.22 Let be an infinite cardinal. Then = .
Proof. For the proof we restate two auxiliary Lemmas which can be easily proven ad hoc. Recall,
however, Theorem 3.34 and Theorem 3.37.
Lemma 6.23 Let (1 , <1 ) and (w2 , <2 ) be two well-ordered systems. Then the cartesian product
1 2 becomes a well-ordered set by defining: (c, d) < (c0 , d0 ) iff c <1 c0 or if c = c0 then d <2 d0 .
(w, <) = (w1 , <1 ) (w2 , <2 ) is called the lexicographic product. The ordinal product is
isomorphic to the lexicographic product of and .
Lemma 6.24 Let (w , < ), < , be a family of well-ordered

S systems indexed by the ordinal .
Assume that the w are pairwise disjoint. Then w = {w | < } becomes a well-ordered set by
defining: c < d iff c < d in case that c and d belong to the same w or if c w and d w then
< . The set w is called the order sum. The ordinal sum is isomorphic to the order sum of
and .
If we write |a| for card(a) then according to the last statement of the preceding Lemma we have that
| | = || + ||. For example, if is an infinite ordinal then | 1| = | {}| = || according to
7 0, 7 + 1 for < , 7 , for < .
The proof of the theorem generalizes a particular proof for S. Defineon the following
well-ordering < . Let p = {(, )| + = }. Then = p | where, of course, the
union is disjoint. Each p carries the lexicographic ordering which it inherited from . The order
sum of the p defines a well-ordering < on which is order-isomorphic to . We have:
(0, 0) < (0, 1) < (1, 0) < (0, 2) < (1, 1) < (2, 0) < . . .
It is plain that this enumeration of defines an isomorphism f between (, ) and ( , < ). A

formal, recursive definition of f is left as an easy exercise.
In order to prove the theorem, assume that there is a smallest cardinal such that 6= . Because
of we may assume > for a smallest and = < for every < . We
already know that > , but we will not need this fact.
Let , . If and are both finite, then < . Otherwise, | | = || + || + =
2 = < where is the maximum of || and ||. Hence, by Corollary 6.20, < for
all , .
Let p = = {(, )|, }. For each , let p = {(, )| = }. The p constitute a
partition of p. Each p is well-ordered as a subset of the lexicographic product . We now take on
p the well-ordering < which is the order sum of the p .
We have (, ) < ( 0 , 0 ) where 0 0 = 0 iff ( = 0 and < 0 or if = 0 one has that <
0 ) or ( < 0 ) . At any rate, if (, ) (0 , 0 ) then, , 0 .
The well-ordered system (p, < ) is order isomorphic to some ordinal . Because card(p) = > , we
have that > . Hence is isomorphic to a segment of (p, < ), say = S(0 , 0 ) where (0 , 0 ) p0
for some 0 < .
If (, ) S(0 , 0 ), then , < 0 1. Hence S(0 , 0 ) (0 1) (0 1). Thus = |S(0 , 0 )|
| 1| | 1| = | 1| < , which is a contradiction. (For the last step we needed that is an infinite
cardinal: If 0 < , then |0 | < as well as | 1| < ). 2
Corollary 6.25 Let and be cardinals where at least one is infinite. Then
+ = max(, ) =
44
Proof. We first assume that . Then + 2 = .
If is infinite and finite then + = : + + = ; = : = . 2
Corollary 6.26 Let and be cardinals such that . Then
= 2
Proof. We have that 2 (2 ) = 2 = 2 . 2

Corollary 6.27 Assume . Let (aj )ji be a family of sets where card(aj ) = j , j i, and
card(i) . Then [
card( aj .
ji
S P
Proof. This follows from card aj j card(i) = . 2
ji ji
In particular, a countable union of countable sets is countable. It is quite remarkable that one cannot
prove this without using AC.
Let j , j i, be a family of cardinals with supremum . Then is a cardinal. Indeed, j yields
card() j for all j i. Hence card() is an upper bound for the j ; but then card() yields
card() = .
Corollary 6.28 Let j 1, j i, be a family of cardinals where = sup{j |j i} and = card(i).
Then, if or are infinite, one has that
X
j =
ji
P P P P
Proof. j card(i) = . On the other hand, = 1 j . And for each j i
j jP P j P j
one has that j j , therefore j . Hence = max(, ) j . 2
j j
Theorem 6.29 (Zermelo-Konig) Let for j i, aj bj where aj <in bj and where the aj are
pairwise disjoint. Then [ Y
aj <in bj
Proof. Because cj = bj \ aj are non-empty, S we haveQ a choice function j 7 dj where the dj belong
to cj . This gives us an injective map f : aj bj from the disjoint sum a of the Q aj into the
direct product b of the bj : If e a, then e ak for a unique k i. We define f (e) bj such that
pj (f (e)) = dj for j 6= k, and pk (f (e)) = e. It is easy to see that f is injective: Assume that e 6= e0 . If
e and e0 belong to the same ak , then f (e) and f (e0 ) have different kcomponents, namely e and e0 . If
e ak and e0 ak0 , where k 6= k 0 , then the kcomponent of f (e) is e, while the kcomponent of f (e0 )
is dk ck , soScertainly
Q different from e, because dk / ak . Hence again, f (e) 6= f (e0 ).
Now let f : aj bj be any function. We are going to show that f is not surjective. Because of
aj <in bj none of the maps pj f aj from aj into bj is surjective (again by AC, see Proposition 6.3).
We have a choice function j 7 dj where dj bj but dj is not the jcomponent of any f (e) where
e aj . Hence (dj )ji is an element of b which is not in the range of the map f . 2
P Q
Corollary 6.30 Assume for the family of cardinals that j < 0j , j i. Then j < 0j .
45
Proof. We apply the previous theorem to bj = {j} 0j and aj = {j} j . 2
A function whose domain is an ordinal is called a transfinite sequence. Let ( ) be an increasing

sequence of ordinals. If is a successor ordinal
S then the sequence has a largest element, namely
where + = . At any rate, we can form = { | < } which is the supremum of the . If 0 < ,
then one has some 0 < such that for all 0 < one has that 0 < . We call therefore the
limit of the increasing sequence ( )< .
A subset c of an ordinal is said to be bounded in if there is some < such that holds for all
c.
The increasing sequence = ( ) is said to be cofinal in if the range of , ran() = { | < },
is an unbounded subset of .
Assume that = ( ) is cofinal in . Then and have to be limit ordinals. If is a successor,
i.e., = 0+ , then has maximum 0 < , so ran() is bounded in . On the other hand, any
sequence with values within a successor ordinal has to be bounded, namely by the predecessor of .
So is a limit ordinal.
The increasing sequence = ( ) is cofinal in iff = lim(). Indeed, the ordinal is an upper
bound of ran(). It is the least upper bound iff ran() is not bounded in , i.e., is cofinal in .
Proposition 6.31 For any limit ordinal , the identity on is cofinal in . If : 0 is cofinal in
0 and : 0 is cofinal in then : is cofinal in . 2

We remark that the subset
S a of
S is unbounded in iff a = {| , a} is unbounded in . In
this case one has that a = a = .
Definition 6.2 Let be a limit ordinal. The cofinality of , cf (), is the least ordinal such that
there is an increasing transfinite sequence = ( ) of ordinals within which is cofinal in .
Proposition 6.32 Let be a limit ordinal and = card(). Then cf () .
Proof. We have to find an increasing map g : which is cofinal in and where . We

know that there is a bijection f : because . However, we will only need S the fact that
ran(f ) is unbounded in . The map f gives rise to the map g 0 : {}, g 0 () = {f ()| < }.
Clearly, g 0 is increasing. If , then + < because is a limit ordinal. There is some such
+ + 0 + 0
that f () S and we may choose = min{| , f () }. Then g ( ) = g () f ().
Hence, {g 0 ()| } = . If g 0 () for all , then let g 0 = g and = . Otherwise,
let be the smallest ordinal such that g 0 () = and put g = g 0 . If we had = + , then
g 0 ( + ) = gS
0
()
f () = which
S contradicts
0
g ()
< , f () < . Hence, is a limit ordinal and we
conclude: g()| g( + )| f ()| = g 0 () = . 2
Remark 6.1 Let and be limit ordinals where < cf (). Let f : be any function from
into . Then f is bounded, i.e., there is some such that f () holds for all .
This follows from the proof of the proposition.
Corollary 6.33 Let be a limit ordinal. Then cf () is a cardinal.
Proof. Let = card(cf ()). By the previous theorem we have a cofinal map : cf (), ,
and a cofinal map : cf () . The composition : is cofinal and < cf () would
contradict the definition of cf (). 2
Proposition 6.34 Let be a limit ordinal. Then cf (cf ()) = cf ().
46
Proof. We have cf (cf ()) cf () and, as in the previous proof, cf (cf ()) < cf () would contradict
the definition of cf (). 2
Definition 6.3 An infinite cardinal is called a regular cardinal if = cf (). An infinite cardinal
which is not regular is called singular.
For example, cf (cf ()) is for any limit ordinal some regular cardinal. The first infinite cardinal, , is
obviously a regular cardinal.
Recall that the infinite cardinals form a well ordered class and that there is a unique order isomorphism
between the class of all ordinals and the class of all infinite cardinals which we called Aleph(, ), i.e.,
= . For example, 0 = and card(2 ) = for some . We are going to show that cannot be
, one of the few definitive statements one can make on the basis of ZF with the AC.
Definition 6.4 A cardinal is called a successor cardinal if = for some successor ordinal .
Otherwise it is called a limit cardinal, i.e., = for some limit ordinal .
Proposition 6.35 Let be a limit ordinal. Then : , 7 is cofinal.
Proof. Let < . Then card() < , thus card() = for some < . Because is a limit
ordinal, we have that + < . Hence, card() = < + < . 2
The second limit cardinal, , is singular because of cf ( ) = by means of the map 7 .
By the previous proposition, : , 7 is cofinal. Hence, cf ( ) = .
Proposition 6.36 All infinite successor cardinals are regular.

Proof. Let be an infinite cardinal and : + be an increasing map where < + . For any
Swe have that () < + . We have card() S and card(()) , . It follows that
card( {())| } is at most = , hence a = {())| } + , and therefore cannot
be cofinal. 2
Theorem 6.37 (Hausdorff s Formula) Let and be infinite cardinals. Then
(+ ) = +
Here + is the smallest cardinal larger than .

Proof. If + then (by Corollary 6.26 of Theorem 6.22 one has (+ ) = 2 ; = 2 , + < 2 .
Hence, (+ ) = 2 and + = 2 2 = 2 .
Recall that (+ ) is the cardinal of the set of all functions f from into + . If < + , then by
+ +
Remark 6.1 after the proof of Proposition 6.32 and because
S of cf+( ) = , every such function f
+ +
is bounded by an element of . That is,S( ) = { | P } as sets; but the cardinal of the
right-hand side can be estimated by: card( { | + }) + .
+
On the other hand, + (+ ) , (+ ) and therefore + (+ ) . 2
Corollary 6.38 Let be an infinite cardinal. Then (+ ) = 2 .
Proof. (+ ) = + = 2 + = 2 . 2
In Aleph notation, Hausdorffs formula reads

1 = 1
47
Proposition 6.39 Let be an infinite limit ordinal. Then cf ( ) = cf ().
Proof. Let : be cofinal. Then 0 : , 7 () is increasing, because Aleph is
strictly increasing. We show that 0 is cofinal. Let : , 7 if is an infinite ordinal and
card() = , and 7 0 if is finite. (Here we use that > 0, there is no map from 0 into the
empty set.) Now let and = () . Then < () for some . But then < ()
and because of card() we conclude () = 0 ().
Now let 0 : be cofinal. The map : , 7 ( 0 ()) is obviously increasing. In order to
show cofinality, let . Then < + < 0 () for some but then ( 0 ()) + > . 2
For = 0, the last proposition is not true: cf (0 ) = 0 > 0; 0 is a limit cardinal and regular.
Definition 6.5 An uncountable cardinal is called weakly inaccessible if it is
(i) a limit cardinal, i.e., = for some limit ordinal ,
(ii) regular, i.e., cf ( ) = .
Corollary 6.40 Let be a weakly inaccessible cardinal. Then = .
Proof. By the previous proposition, cf ( ) = cf () because is uncountable and a limit cardinal.

We certainly have because Aleph is strictly increasing on Ord(x) (compare Proposition 3.9 on
ordinals). But if we assume that < , then cf ( ) = cf () < shows that cannot be
regular. 2
The existence of weakly inaccessible cardinals cannot be proven on the basis of ZF.
Definition 6.6 A cardinal is called inaccessible if it is weakly inaccessible and if 2 < holds
for all < .
One can show that V is a model of ZF in case that is an inaccessible cardinal.

Theorem 6.41 cf (20 ) > .
S P
Proof. Let : 20 be any increasing map. Then card ()| , where
0
P
0 0
= card(()) < 2 . Hence, by Zermelo-Konigs theorem, we conclude that < 2 = 2 0 .
S
Hence, ()| 20 . Thus the map is not cofinal. 2
Corollary 6.42 20 6= , or more generally, 20 6= where is a limit ordinal and cf () = .

Proof. We have already noticed that cf ( ) = . The claim now follows from Theorem 6.41. 2

If we set 2 = E() then E() = 1 is the generalized continuum hypothesis. It is known that
any monotone functional relation E on the ordinals subject to the conditions that E() 1 and
cf (E() ) > can solve 2 = E() in a suitable model of set theory.
48

Set Theory

Uploaded by

Copyright:

Available Formats

Set Theory

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Set Theory

Uploaded by

Copyright:

Available Formats

Cardinal and Ordinal Numbers

2 The Zermelo Fraenkel Axioms of Set Theory 5

4 The Axiom of Choice 32

5 The Axiom of Foundation 34

D. Hilbert,Uber das Unendliche. Math. Annalen 25 (1925).

The Independence Proofs of Cohen are clearly presented in:

P. J. Cohen, Set Theory and the Continuum Hypothesis. Benjamin (1966).

R. Goldblatt, Lectures on the Hyperreals. An Introduction to Nonstandard Analysis. Springer (1998).

J. Barwise, J. Etchemendy, The Liar. Oxford (1987).

A good deal of the history of modern set theory is contained in

The Zermelo Fraenkel Axioms of

Figure 2.1: Snapshot of the Universe

The Null Set Axiom. There is a set with no elements:

The Language of Axiomatic Set Theory

Figure 2.2: The Graph of a Conjunction

Figure 2.3: The Graph of the Extensionality Axiom

x0 . . . xn1 P (x0 , . . . , xn1 )

S(x, a) (x a) and R(x) (x x)

dom of R(x, y) yR(x, y) and ran of R(x, y) xR(x, y)

Then R(x, y) is a relation on

E(z) dom of R(z, y) ran of R(x, z)

which we call the extent of R(x, y).

dom of E(x, y) E(x, x) and ran of E(x, y) E(y, y)

and therefore E(x, y) is a relation on its domain D(x).

holds on U . D(x) denotes the domain of L(x, y).

dom of P (x0 , . . . , xn1 , y) yP (xo , . . . , xn1 , y)

which holds for a set a in U if and only if F (x, y, a) is functional.

The Schemes of Replacement and Comprehension

In order to deduce this from replacement, we only have to note that

b = {x|x a A(x, a0 , . . . , an1 )}

Constructions within the Universe

is functional and says that z is the cartesian product of x and y.

r = {e|e = (c, d) a b, R(c, d)}

c = {f |f P(a b), F (a, b, f )}

For the set c one uses exponential notation c = ba .

This follows from comprehension: We will

The Remaining Axioms of ZF

Proposition 3.1 A well-ordering is well-founded.

Proposition 3.3 (AC) A total, well-founded order is a well ordering. 2

As a generalization of complete induction for natural numbers we have:

This terminology is justified by the following

F (x, y) D1 (x) D2 (y) {u|S1 (u, x)}

Ord(x) T rans(x) W ell(x)

Proposition 3.18 Let be an ordinal. Then is not an element of itself.

Proposition 3.23 Let be any ordinal. Then + = {} is also an ordinal.

Proof. Let + and . If then and therefore , and + because of

R(x, y) Ord(x) Ord(y) (x y)

0 < 1 < 2 < ...

are the finite ordinals n and [

+ 1 < + 2 < ...

and then comes as the second infinite limit ordinal

Note, if one has that H(0, , a0 ), then necessarily

= {| < , f c f : c , f () = H(, f ), < } = { < |f () }

f () = (f )() = f () = H(, f ) = H(, (f ) ) = H(, f )

f0 () = F 0 () = H(, F 0 ) = H(, (F 0 ) ) = H(, f0 ).

Here is a first application of the Recursion Principle: Define a functional relation

In order to define addition of ordinals, we take the functional relation:

We then have a unique S function

0+1 = (0 + 0)+ = 0+ = 1 i.e., 0+ = 1

2 + 2 = (0+ )+ + (0+ )+ = ((0+ )+ + (0+ ))+ = (((0+ )+ + 0)+ )+ = (((0+ )+ )+ )+ = 4