Set Theory
Set Theory
Set Theory
Math 6300
Klaus Kaiser
April 9, 2007
Contents
1 Introduction 2
3 Ordinals 14
3.1 Well-Orderings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Ordinals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 The Orderstructure of the Ordinal Sum, the Ordinal Product and Ordinal Exponentia-
tion. Finite Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
6 Cardinals 38
6.1 Equivalence of Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.2 Cardinals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1
Chapter 1
Introduction
In this course, we will develop set theory like any other mathematical theory on the basis of a few
given axioms and generally accepted practices of logic. When we are studying algebraic structures like
groups, we have in mind structures like S n , Z, V, G whose elements are called permutations, integers,
vectors or just group elements, and we use suggestive notations like , n, v, g to denote the objects
these groups are made of. For groups, we also have a binary operation acting on the elements which
we like to denote as composition, addition, translation or multiplication and we use suggestive symbols,
like for composition and + for addition. A group has a unique identity element which is usually
denoted as e. As a matter of convenience, an additional unary operation is then added, which assigns
to a group element its unique inverse. The theory of groups is then governed by a few simple axioms
x(yz) = (xy)z, xx1 = x1 x = e and xe = ex = x. Groups are then developed solely on the basis of
these axioms with the aforementioned examples serving as illustrations and motivations.
The situation for set theory is somewhat different. Unless you have already seen some axiomatic
set theory or mathematical logic, you probably have not the fuzziest idea about different models of set
theory. Sets are just arbitrary collections of objects and manipulations of sets , like forming intersection,
union, and complement correspond to basic logical connectives, namely and, or, not. It seems that
there is only one universe of sets. Our knowledge about it, however, may increase over time. Before
Borel and Lebesgue, mathematicians didnt recognize measurable sets of real numbers. But they were
there, just as the planet Pluto existed before it was discovered around 1930. Of course, mathematical
objects are not physical. They are mental constructs owing their existence to our ability to speak and
think. But is it safe to speak about all possible sets? Indeed, most mathematicians believe that it is
safe to accept the idea of a universe of sets in which all of mathematics is performed. Mathematical
statements then should be either true or false even when we know that they are undecidable right
now. There is a belief that further insights will eventually resolve all open problems one way or the
other. However, certain precautions must be exercised in order to avoid inconsistencies., like Russels
paradox about the set of all sets which dont contain themselves. (If r(x) stands for the predicate
not(x x), then forming the Russel class r = {x|r(x)} leads to the contradiction (r r) iff not(r r).
In Naive Set Theory, methods for constructing new sets from given ones are presented and some sort of
etiquette for doing it right is established. Such an approach, however, can be confusing. For example,
most mathematicians dont feel any need for Kuratowskis definition of an ordered pair (a, b) as the set
{{a}, {a, b}} or to go through a lengthy justification that s(n, 0) = n, s(n, m0 ) = s(n, m)0 defines the
sum of two natural numbers n and m. On the other hand, certain proofs in analysis where sequences
s(n) are constructed argument by argument leaving at every n infinitely many options open, certainly
take some time for getting used to. How in the world can we talk about the sequence s(n), n N
as a finished product, when at point n we just dont know what s(m) for m n will be? Sure, the
2
Axiom of choice is supposed to do the job. But what kind of axiom is this anyway? Is it part of our
logic, or is it a technical property of a theory of sets? How come that generations of mathematicians
didnt notice that they had been using it all the time? In Axiomatic Set Theory we assume that there
is a mathematical structure U which we call the universe and whose elements are called sets. On U
a binary relation is defined which is called the membership relation. The basic assumption then is
that (U, ) satisfies the Zermelo-Fraenkel Axioms of set theory. We think that U is a set in the naive,
familiar sense whose objects are called sets. Because we dont want U to be a member of U, we call U
the universe. This is mainly a precaution which we exercise in other branches of mathematics, too. A
function space consists of functions, but is itself not a function. The relation is a relation between
sets in U. We use the notation only for this relation, in particular, instead of x U we say x is in U
or that x belongs to U. We are now going to describe the axioms of set theory. These are statements
about the universe of sets every mathematician would consider as self-evident. We are going to claim
that there are sets, in particular an empty set and an infinite set and that we can construct from given
sets certain new sets, like the union and the power set of a set. There is one difficulty in defining sets by
properties. Because we only have the membership relation at our disposal, any property about sets
should be expressed in terms of and logical procedures. For this reason we have to develop a language
of axiomatic set theory first. The existence of sets sharing a common property is then governed by
the axiom of comprehension. It will turn out that for example x = x or not (x x) never define sets,
no matter what the universe is, resolving Russells paradox. A certain amount of mathematical logic
seems to be unavoidable in doing axiomatic set theory. But all that is necessary is to explain the syntax
of the first order language for set theory. We do not have to say what a formal proof is. Similarly,
the interpretation of formulas in the model (U, ) is considered as self evident; delving into semantic
considerations is equally unnecessary. Also, this is standard mathematical practice. In algebra you
have no problems understanding the meaning of any particular equation, say the commutative law,
but it needs to be made clear what a polynomial as a formal expression is. Because in axiomatic set
theory we have to make statements concerning all formulas, we have to say what a formula is. Only
the most important facts about set theoretic constructions, cardinals and ordinals are discussed in
this course. Advanced topics of topology, for example, need more set theory. But these notes contain
enough material for understanding classical algebra and analysis.
References
K. Devlin, Fundamentals of Contemporary Set Theory. Springer (1979).
K. Devlin, The Joy of Sets. Springer (1993).
K. Hrbacek, T. Jech, Introduction to Set Theory. Marcel Dekker, Inc. (1984).
J. L. Krivine, Introduction to Axiomatic Set Theory. Reidel (1971).
K. Kunen, Set Theory. North Holland (1980).
Y. Moschovakis, Notes on Set Theory. Springer (1994).
J. D. Monk, Introduction to Set Theory. McGraw-Hill Book Company (1969)
J. H. Shoenfield, Mathematical Logic. Addison Wesley (1967).
R. Vaught, Set Theory. Birkhauser (1994).
These are very good text books on set theory and logic. The book by Monk is still useful for
learning the basics of cardinal and ordinal arithmetic. Devlins 93 book contains a chapter on recent
3
research on P. Aczels Anti-Foundation-Axiom. The books by Kunen, Krivine and Shoenfield are
advanced graduate texts, i.e., aimed at students who want to specialize in logic. The book by Kunen
is a comprehensive text on set theory while Krivine is a good introduction into the classical relative
consistency proofs, that is, the ones based on inner models of set theory. Shoenfield contains a final, far
reaching, chapter on set theory. The following two articles are quite interesting. Shoenfield analyzes the
truth of the ZF axioms, while Hilbert outlines the transition from finitary, constructive mathematics
(which underlies, for example, our intuitive understanding of the natural numbers as well as the syntax
of logic) towards a formalistic point of view about mathematics.
J. H. Shoenfield, Axioms of Set Theory. In: Handbook of Mathematical Logic. (North Holland,
Amsterdam)
The formal analysis of logic and set theory has important practical applications in form of non-
standard methods. There is an extensive literature on this vital subject. The following books are
exceptionally well written; the book by Robinson is a classic of this field.
A. E. Hurd, P.A. Loeb, An Introduction to Nonstandard Real Analysis. Academic Press (1985).
E. Nelson, Radically Elementary Probability Theory. Princeton (1987).
A. Robinson, Non-Standard Analysis. North-Holland (1974).
The Axiom of Foundation, that is, a set cannot contain itself, should be true for the universe of
sets, but it does not have any significant consequences. So we have separated it from the other axioms
and develop set theory without this axiom. The following two books analyze strong negations of the
Foundation Axiom and provide applications to self referential statements and Computer Science.
P. Aczel, Non-Well-Founded-Sets. Center for the Study of Language and Information Publications,
Stanford (1988).
4
Chapter 2
The Axiom of Extensionality. If every element of the set a is an element of the set b and every
element of the set b is an element of the set a, then a = b.
In other words, two sets are equal iff they contain the same elements. This should not be considered
as a definition of equality of sets. Equality is an undefined, primitive relation and clearly, equal sets
have the same elements. The axiom of extensionality merely states a condition on the relation . We
may formalize extensionality:
xy z (z x) (z y) (x = y)
The elements of the universe (U , ) are in the first place just objects without any structure. What
matters is their relationship to other elements with respect to . We may think of U as a directed
graph where the sets in U are nodes and a b corresponds to an edge a b. Part of the universe may
have nodes called 0, 1, 2, {1} and edges 0 1, 0 2, 1 2, 1 {1}:
2 {1}
An edge 0 {1} would violate the axiom of extensionality, because then 2 and {1} would have the
same elements.
xy (y x)
By extensionality, there is only one such set. It is denoted by and called the empty set. It is a
constant within the universe U, i.e., a unique element defined by a formula.
5
The Pairing Axiom. For any sets a and b there is a set c whose only elements are a and b:
xyzt (t z) (t = x) (t = y)
By extensionality again, there is for given a and b only one such set c. We write c = {a, b} for the
set whose only elements are a and b. If a and b are the same set, then c has only one element, namely
a. That is, for any set a of the universe U there is a set whose only element is a. This set is called the
singleton {a}; {a, b} is called a pair if a is different from b. Three applications of the pairing axiom
lead to the existence of the set {{a}, {a, b}}. This is Kuratowskis definition of the ordered pair (a, b)
of a and b. One easily proves the
Theorem 2.1 One has that (a, b) = (a0 , b0 ) if and only if a = a0 and b = b0 .
The Union Axiom. For any set a there is a set b whose members are precisely the members of
members of a:
xyz (z y) t (t x) (z t)
S S
The set b is called the union of a and denoted by a or {x|x a}. We mention some consequences:
For any
S sets a, b, c there is a set d whose elements are a, b and c:
d = {{a, b}, {c}}
The union of c = {a, b} is denoted by a b. It is easy to see that
a b = {x|x a or x b}.
Let a and b be sets. We say that a is a subset of b if every element of a is also an element of b:
(x y) z (z x) (z y)
The left-hand side is not a formula, because is the only relation of our universe; (x y) is only
an abbreviation of the formula in the variables x and y on the right hand side. In particular we have
by extensionality that
xy (x = y) (x y) (y x)
The Power Set Axiom. Let a be a set of the universe U. Then there is a set b whose elements are
precisely the subsets of a:
xyz (z y) (z x)
The set b is called the power set of a and we use the notation b = P(a) . We have P() = {} ,
P({}) = {, {}} , P({, {}}) = {, {}, {{}}, {, {}}}.
If a is any set of our universe, any c P(a) corresponds to an intuitive subset of a, namely {d|d c}
where for each such d, d a holds. However, not every proper collection of edges d a will lend
itself to a set c of the universe. For example, if U happens to be countable then any infinite set a in U
will have subsets which dont correspond to sets in U. What kind of properties now lead to subsets?
We have reached the point where we have to talk a bit about mathematical logic.
6
(b) The elements of the universe U are the constants of the language.
(c) The membership symbol and the equality symbol =.
(d) The symbols for the propositional connectives: which stands for and, which stands for
or, which stands for not, which stands for if, then, which stands for if and only
if.
(e) For each variable xn one has the universal quantifier xn which stands for for all xn and
the existential quantifier xn which stands for there exists some xn .
2. Formation Rules for Formulas
(a) Let u and v stand for any variable or constant. Then (u v) and (u = v) are formulas.
These are the atomic formulas.
(b) If P and Q are formulas then (P Q), (P Q), P , (P Q), (P Q) are formulas.
(c) If P is a formula then xn P and xn P are formulas.
Only expressions that can be constructed by finitely many applications of these rules are formulas.
For better readability, different kinds of parentheses will be used, and letters, like x, y, z, . . . will stand
for variables. There are standard conventions concerning the priorities of the binary propositional
connectives in order to avoid an excessive accumulation of parentheses.
The axioms of set theory as stated so far are all formulas, actually sentences, that is, all occurrences
of variables are bound. If Q is a formula then every occurrence of xn within P of a subformula xn P
or xn P of Q is said to be bound. Variables xn which are not bound, i.e., which are not within the
scope of a quantifier xn or xn of Q, are said to be free. If we underline in a formula a variable then
this variable is meant to occur only bound.
Formulas can be represented by certain labelled, directed trees. An atomic formula is just a node,
e.g.,
(x a)
which is a tree. If 1 is the tree for P1 and if 2 is the tree for P2 , then the tree for (P1 P2 ) is
the graph:
1 2
Any node of the tree for the formula Q determines a subformula P of Q. For example, a
node labelled determines a conjunction P (P1 P2 ) as a subformula of Q, where P1 and P2 are
subformulas of P ; P1 and P2 are the scope of the node . Similarly, a node x determines a subformula
P xn P1 , where the subformula P1 of P is the scope of the node xn within Q.
Whenever we indicate a formula P as P (x0 , x1 , . . . , xn1 ), it is understood that the free variables
of P , if there are any, are are among x0 , x1 , . . . , xn . The constants within a formula are often called
parameters. So we write P (x0 , . . . , xn1 , a0 , . . . , am1 ) to indicate the free variables and parameters of
a formula. A sentence P is either true or false in the universe U. More generally, if P (x0 , . . . , xn1 )
is a formula with free variables x0 , . . . , xn1 and if a0 , . . . , an1 belong to U, then a simultaneous
7
x
(x = y)
(z x) (z y)
8
substitution of the xi by the ai makes P (a0 , . . . an1 ) either true or false. When we say that a formula
P (x0 , . . . xn1 ) holds on U , it is meant that its closure, i.e.,
holds on U . Because we have used the equality sign = as a symbol within the language, equality of
formulas, or more generally their equivalence, is denoted by , e.g., x = y y = x. That is, we
write P Q if and only if P Q is a theorem of logic. Formulas without parameters are called pure
formulas of set theory.
A formula in one free variable, or argument, is called a class.
are examples of classes. The first class defines a set, namely a, while the second one does not define a
set: b satisfies S(x, a) iff b a; there is no set r such that b satisfies R(x) iff b r.
Formulas P (x0 , . . . , xn1 ) are called n-ary relations. Formulas in two arguments are called binary
relations. We also use the terms predicates, properties and expressions for formulas. Let E(x) be a
class. We say that R(x, y) is a relation on E(x) if
xy R(x, y) E(x) E(y)
holds on U .
Let R(x, y) be a binary relation. We define domain and range as the classes
holds on U .
The relation R(x, y) is symmetric if
xy R(x, y) R(y, x)
holds on U .
The relation R(x, y) is transitive if
xyz R(x, y) R(y, z) R(x, z)
holds on U .
The binary relation E(x, y) is called an equivalence if it is reflexive, symmetric and transitive. It is
easy to see that for any reflexive relation, e.g., an equivalence E(x, y) one has that,
9
The binary relation R(x, y) is called anti-symmetric if
xy R(x, y) R(y, x) (x = y)
holds on U .
A binary relation P O(x, y) which is reflexive, transitive and anti-symmetric is called a partial order.
Again we have by reflexivity that domain and range define the same class and that P O(x, y) is a relation
on its domain. A partial order L(x, y) is called linear or total, if
xy D(x) D(y) L(x, y) L(y, x)
holds on U .
We define for any relation P (x0 , . . . , xn1 , y) domain and range
The domain is an n ary relation D(x0 , . . . , xn1 ) while the range is a class R(y).
The binary relation
P (x, y) z (z y) (z x)
is functional in the variable x. It assigns to a set a the power set b = P(a). We have that
dom of P (x, y) (x = x) and ran of P (x, y) xz (z y) (z x)
Instead of P (x, y) we will often use the more suggestive notation y = P(x). We similarly write
z = (x, y), z = {x, y} and z = x y for the corresponding functional predicates.
We define for a formula F (x, y, z) the expression
f un F (x, y, z) xy1 y2 F (x, y1 , z) F (x, y2 , z) y1 = y2
10
Because this is supposed to hold for every pure formula F (x, y, x0 , . . . , xn1 ), where at least x and y
are free, this list of axioms is called a scheme. It is called replacement because it allows us to replace
some of the elements c of the set a simultaneously by sets d in order to create a set b. As a first
application of replacement we will deduce its weaker cousin
The Scheme of Comprehension. Let A(x, x0 , . . . , xn1 ) be a pure formula of axiomatic set theory
and let a0 , . . . , an1 be sets. Then for any set a there is a set b which consists exactly of those
elements c of a for which A(c, a0 , . . . , an1 ) holds on U:
x0 . . . xn1 xyz z y z x A(z, x0 , . . . , xn1 )
is functional.
The standard notation for the subset b of a, which is defined by the property A(x, a, .., a), is
The Intersection of a Set. Let a be non-empty set. Then there is a set b whose members are
precisely the members of all members of a.
x (x = ) yz (z y) t t x z t)
This follows at once from comprehension. Note that the intersection of theT set aTis contained in any
of its members c. The standard notation for the intersection of a set a is a or {x|x a}. Why is
it important to assume that the set a is non-empty?
The Cartesian product of Two Sets. Let a and b be sets. Then there is a set c such that e c if,
and only if, e = (f, g) where f a and g b:
xyzt (t z) uv t = (u, v) (u x) (v y)
The equation z = (x, y) is shorthand for the functional relation Q(x, y, z) which says that z is the
ordered pair (x, y), which according to Kuratowskis definition is the set {{x}, {x, y}}. Thus:
h n oi
Q(x, y, z) t (t z) uv t = ut = v s s u s = x s0 s0 v s0 = xs0 = y
If = (f, g) = {{f }, {f, g}} where f a and g b then {f } P(a) and {f, g} P(a b). Hence
(f, g) P(P(a b)). We now apply comprehension to
P (z, a, b) uv Q(u, v, z) u a v b
11
which says that z is an ordered pair whose two components belong to a and b, respectively and get
the desired result as
c = {e|e P(P(a b)) P (e, a, b)}
The set c is called the cartesian product a b of a and b. The cartesian product of finitely many sets
is similarly defined. The formula
C(x, y, z) t t z uv Q(u, v, t) (u x) (v y)
That is, R (e) holds if and only if e = (c, d) and R(c, d) holds.
Relations as Sets. Let R(x, y) be a binary relation. Assume that domain and range of R(x, y) are
sets a and b, respectively. Then define the set
We now have e = (c, d) r if and only if R (e) holds. In this sense we may identify a binary relation,
for which the extent is a set, by a set of ordered pairs.
Graphs of Functions. If the binary relation F (x, y) is functional and the domain of F (x, y) is a set
a, then, according to replacement, the range is also a set. Let b be any set containing the range
of F (x, y). The set
{e|e = (c, d) a b, F (c, d)}
is called the graph of the function f : a b.
The projections are important examples of functional relations:
F1 (x, y, z, p) Q(x, y, z) p = x
F2 (x, y, z, q) Q(x, y, z) q = y
We have that F1 (a, b, c, d) holds on U iff c = (a, b) and d = a, i.e., d is the first component of the
ordered pair c. The predicate
P1 (t, x) uv F1 (u, v, t, x)
then holds if x is the first component of the ordered pair t. That f is a function from a to b is
expressed by F (a, b, f ) where
F (x, y, z) p C(x, y, p) z p u u x t t z P1 (t, u)
tt0 suu0 t z t0 z P1 (t, s) P1 (t0 , s) P2 (t, u) P2 (t0 , u0 ) u = u0
The Exponentiation of Sets. Let a and b be sets. Then there is a set c whose elements are given
by the functions f : a b.
The function f : a b is an element of P(a b). Hence c is defined by comprehension:
12
Union and Intersection of a Family of Sets. A function s with domain i is sometimes called a
family of sets aj , j i, where, of course aj = s(j). The union of the family s is the
Sunion of the
range
S r of s, which is, according to the replacement axiom, a set u. We write u = {aj |j i} =
s. The intersection of a non-empty family s is defined similarly.
The Cartesian Product of a Family of Sets. Let s be a family of sets, indexed by the set i. A
function f : i u from i into the union u of the range r of s is called a choice function if for every
j i one has that f (j) aj . Then there is a set c whose members are all the choice Q functions
for s. This set is called the cartesian product of the family s and is denoted by c = {aj |j i}.
The Axiom of Infinity. There is a set whose elements are exactly the natural number sets n.
This concludes the list ZF of axioms for axiomatic set theory. Our definition of as the set of all
natural number sets n is only preliminary; it is not even given by a first order sentence of our language
of set theory. After we have studied ordinals in general, will be defined as the set of all finite ordinal
numbers1 . Of course, there is no danger to think that the finite ordinals are just the ordinary finite
numbers n. And the vast majority of mathematicians feel that way. On the other hand, any axiomatic
definition of allows for elements which are nonstandard, i.e., different from any ordinary number n.
However, whether one realizes this possibility or not seems to be irrelevant for the formal development
of mathematics.
There are two more axioms most mathematicians consider astrue, the Axiom of Choice and the
Axiom of Foundation. These axioms are listed separately, mainly because because a great deal of set
theory can be developed without them.
The Axiom of Choice (AC). The Cartesian product of a family of non-empty sets is itself non-
empty.
The Axiom of Foundation (AF). Every non-empty set a contains a set b which is disjoint to a.
Both axioms are independent of ZF. The axiom of choice is necessary for proving many essential
theorems concerning infinite sets, e.g., that every vector space has a basis. The axiom of foundation
provides the universe U with more structure in the sense that every set will have a rank as measure of
its complexity. However, even strong negations of AF are consistent with ZF and such models of set
theory have become an important research tool in computing science for the analysis of self-referential
statements.
1 More elementary is Dedekinds approach: For any set x, x {x} is called the successor x+ of x. A set i is called
inductive if we ave that i and x i implies that x+ is in i. Dedekinds version of the axiom of infinity then says
that there is an inductive set i0 . It is obvious that i0 must contain all n. Then he defines the set of natural numbers
as the intersection of all inductive sets. This can be done because it is enough to intersect all inductive subsets of i0 .
13
Chapter 3
Ordinals
3.1 Well-Orderings
A binary relation r on a set w is called a well-ordering if it is a partial order such that every non-empty
subset has a smallest element. Because any pair has a minimum, r is actually a total order. Instead
of (c, d) r one writes c d. Actually, for well-orderings one prefers the irreflexive or strict version
of r. If r is a partial order then one has
(i) r \ = r0 is irreflexive, i.e., (c, c)
/ r.
(ii) r0 is transitive.
The conditions (i) and (ii) imply that
(iii) r0 is anti-symmetric, i.e., c < d implies that d < c cannot hold.
On the other hand, if r0 is an irreflexive, transitive relation then r = r0 is a partial order.
If we assume the axiom of infinity in its naive version, i.e., is the set of the standard natural
numbers n, then a prime example of a well-ordered set is provided by , together with the relation
restricted to the set . We have:
n < m iff n m iff n m
Because N = (N, <) is well-ordered, the same holds true for (, ). Notice, that N lives outside the
universe U and n 7 n is an isomorphism which is not an element of U. That N is well-ordered by <
is equivalent to the induction axiom, which we may take for granted.
On the other hand, we can prove, with the help of the axiom of foundation, that (, ) is well-
ordered. Let b be a non-empty subset of . According to AF there is some n in b which is disjoint to
b. That is, m n yields m / b. In other words, n is the smallest element of b.
Definition 3.1 A binary relation r on a set w is well-founded if there is no strictly decreasing map
f : w, n 7 an , i.e., a0 > a1 > a2 > . . . which belongs to U.
Proof. Otherwise the range {a0 , a1 , . . .} would be a set b without a smallest element. 2
Definition 3.2 A binary relation r on a set a satisfies the minimal condition if any non-empty subset
b of a contains a minimal element, i.e., there is some c b such that for no d b one has that (d, c) r.
14
Definition 3.3 Let be a partial order on the set a. Then any c a determines the (initial) segment
s(c) = {b|b a, b < c}.
Proposition 3.2 (AC) A partial order on a set a which is well-founded satisfies the minimal
condition.
Proof. Assume that a non-empty subset b of a does not have a minimal element. Then the set
d = {s(c)|c b} does not contain the empty set. Let f be a choice function on d. Pick any c0 from
b. Then 0 7 c0 , 1 7 f (s(c0 )) = c1 , 2 7 f (s(c2 )) = c2 , . . . defines a strictly decreasing sequence which
belongs to U. (The proof that we can define in such a way a sequence is a good excercise.) This
contradicts the well-foundedness of our relation. 2
In particular:
Proposition 3.5 The proper segments of a well-ordered set w are exactly the segments s(a).
Proof. Assume that the subset s of w is a proper segment. Then w \ s is non-empty and has a
smallest element a. But then s = s(a). 2
Corollary 3.6 The map a 7 s(a) is strictly increasing and establishes a bijective order preserving
map between the well-ordered set (w, <), and the ordered set system S = ({s(a)|a w}, ) of segments
of w. 2
Definition 3.5 An injective, order preserving map map between partially ordered sets is called an
order embedding. A bijective order embedding for which the inverse is also order preserving is called
an (order) isomorphism. An order isomorphism of a partially ordered set to itself is called an (order)
automorphism.
Proposition 3.7 If f is a bijective order embedding from the totally ordered set (a, 1 ) to the partially
ordered set (b, 2 ) then 2 is a total order and f is an order isomorphism. 2
Corollary 3.8 A well-ordered set is order isomorphic to its system of proper segments. 2
15
Proposition 3.9 Let (w, <) be a well ordered system and f : w w be strictly increasing. Then one
has f (c) c for all c w.
Proof. For the proof assume that the set a = {c|f (c) < c} is non-empty. Then a has a smallest
element b. Then f (b) < b because b a. Hence f (f (b) < f (b) because f is strictly increasing. Therefore,
f (b) a. But then one has that b f (b), which is a contradiction. 2
Proposition 3.10 Let f be an automorphism of the well-ordered system (w, <). Then f is the identity
map.
Proof. Assume otherwise. Then a = {b|f (b) > b} is non-empty and has a smallest element b0 . Then
f (b0 ) > b0 because b0 a. Let b0 = f (c0 ). From f (b0 ) > f (c0 ) one infers b0 > c0 and therefore, by the
choice of b0 , f (c0 ) = c0 . Hence b0 = f (c0 ) = c0 , which is a contradiction. 2
Corollary 3.11 An isomorphism between isomorphic well-ordered systems (w, <) and (w0 , <0 ) is unique.
Proof. If f and g are two such isomorphisms then g 1 f is the identity on w. That is, f = g. 2
Corollary 3.12 A well-ordered system (w, <) is not isomorphic to any of its segments s(a).
Proof. For any such isomorphism we would have f (a) a, because of Proposition 3.9, and f (a)
s(a), i.e., f (a) < a. 2
Corollary 3.13 Assume that s(a) = s(a0 ) holds in the well-ordered system (w, <). Then a = a0 and
the isomorphism is the identity.
Proof. Assume a < a0 . Then s(a) s(a0 ) and the well-ordered system w0 = s(a0 ) would be isomorphic
to its segment s(a). This contradicts Corollary 3.12. 2
Theorem 3.14 Any two well-ordered systems (w1 , <1 ) and (w2 , <2 ) are either isomorphic or one is
isomorphic to a segment of the other one. Any such isomorphism is unique.
Proof. We cannot have w1 = (s(b), <2 ) and w2 = (s(a), <1 ) because this would yield w1
= s(b)
w2
= s(a), hence w
1 = s(a0
) for some a0
< a. This contradicts the statement of Corollary 3.12. For the
proof of the theorem we define a binary relation
F = {(a, b)|s(a)
= s(b)} w1 w2 .
It is easy to see that the domain and the range of F are segments of w1 and of w2 , respectively.
Moreover, F is functional and strictly increasing. If we had dom(F ) = s(a0 ) and ran(F ) = s(b0 ) then
F would establish an isomorphism s(a0 ) = s(b0 ). Hence (a0 , b0 ) F, and therefore a0 s(a0 ), which is
a contradiction. Hence, either dom(F ) or ran(F ), or both, are all of w1 or w2 , respectively. So either
F : w1 s(b0 ) or F : w1 w2 or F 1 : w2 s(a0 ). 2
Definition 3.6 Let R(x, y) be a total order relation on its domain D(x). Then R(x,y) is called a
well-ordering if for any element a in the domain one has that S(x, a) R(x, a) x 6= a is a set and
well-ordered by R(x, y).
16
Proof. Let a be a set which satisfies T (x) and D(x). Then T (x) S(x, a) is a set. If this set is empty,
then a is the smallest element of T (x). Otherwise it has a smallest element with respect to R(x, y)
which is then the smallest element of T (x). 2
Corollary 3.16 Let R1 (x, y) and R2 (x, y) be two well-orderings with proper classes D1 (x) and D2 (y)
as their domains. Then there is a unique functional relation F (x, y) which defines an isomorphism
between D1 (x) and D2 (x).
Proof. This is just a small modification of the proof for Theorem 3.14. As before, define
Because D1 (x) is a proper class, it cannot be isomorphic to any S2 (y, b) because these segments are
by definition sets. 2
Corollary 3.17 Let R1 (x, y) and R2 (x, y) be two well-orderings with domains D1 (x) and D2 (x), re-
spectively. Assume that D1 (x) is a set a and D2 (x) is a proper class. Then a is order isomorphic to a
unique initial segment S(x, b) of R2 (x, y). 2
The question now is whether there are any well-ordered classes. This brings us to ordinals.
3.2 Ordinals
A set a is transitive if every element b of a is a subset of a. That is, the set a satisfies the predicate
T rans(x) z (z x) (z x) .
For example, every natural number set n is transitive. The set a is transitive if c b and b a implies
that c a. This explains the terminology. The set is called an ordinal if
(i) is transitive.
(ii) if restricted to is a strict well-ordering of .
The second condition is formalized by the predicate:
W ell(z) xy x z y z (x y) (y x)
uvw u z v z w z u v v w u w
h n oi
x x z x 6= u u x y y x u y u = y
Well() holds if restricted to is an irreflexive , transitive relation and where every subset s of
contains some such that for any s one has that / . It is customary to denote ordinals by
small Greek letters. The class of ordinals is defined by the predicate
17
Proposition 3.19 Every element of an ordinal is itself an ordinal.
Proof. We first have to show that is transitive. Let and . We need to show that .
Because of the transitivity of we have that the element of is also a subset of , i.e., .
Hence yields . But then, by transitivity of again, yields . So gives
. Hence the elements , , are all in and < < . Hence < and this is .
Because is a subset of , it is well-ordered by , restricted to . 2
Proposition 3.20 The segments of an ordinal are and the elements of .
Proof. Because is well-ordered we have that the segments of are and the sets s() for .
But s() = {| , < } = {| , } = {| } = . 2
Corollary 3.21 Let and be ordinals where . Then .
Proof. We show that is a proper segment of . Indeed, let and < where . But then
by definition of < on . Because is an ordinal we have that . So . Hence is a
segment and as a proper subset of , it is an element of . 2
Corollary 3.22 Let and be ordinals. Then if and only if . 2
is a strict order on its domain, which is the class Ord(x). For any ordinal , we have that R(x, ) is
the set which is well-ordered by R(x, y).
Theorem 3.26 The class Ord(x) of ordinals is well-ordered by the membership relation , i.e., by
< iff . 2
18
S
Proposition 3.27 Let = a be the union of a set a of ordinals. Then is an ordinal and one has
for every a. Furthermore, if is an ordinal such that holds for each a, then
. That is, every set of ordinals has a least upper bound within the class of ordinals.
Proof. By the very definition of the union of a set a, we have that the union of a contains every
member of a: for each a; and obviously, any set c with that property must contain .
For ordinals, is the same as < or = and therefore we only have to show that is an
ordinal. We first show that is transitive. So let and . Then for some a. Now
is an ordinal, therefore . But then and because of . Now let and be
any elements of . Because they are ordinals we have that either = or or . That is,
is totally ordered by . If c is any non-empty subset of then c must be non-empty for at least
one a. Let = min(c ) and let c. We cannot have because was the smallest
element of c in . Hence . Therefore, is well-ordered by . 2
Theorem 3.28 The class Ord(x) of ordinals is a proper class, i.e., there is no set which contains all
ordinals.
Proof. Assume that a set b contains all ordinals. Then, by comprehension,
S there would be a set a
that consists exactly of all ordinals. By the previous proposition, = a would be an ordinal and
therefore + a. But then by the definition of union. But this is impossible for ordinals.2
Corollary 3.29 Let R(x, y) be any well-ordering where the domain D(x) of R(x, y) is a proper class.
Then there is exactly one functional relation between D(x) and Ord(x) which defines an order isomor-
phism between these classes. 2
Corollary 3.30 Let (w, <) be any well-ordered system. Then there is exactly one ordinal such that
(w, <) and (, ) are order isomorphic. 2
For any ordinal one has that + = {} is the smallest ordinal greater than . It is called the
successor of and commonly denoted as + 1. The successor of + 1 is denoted as + 2, etc. The
empty set is the smallest ordinal 0, and 1 = {0} is the successor of 0.
that is the next ordinal after which is not a successor, i.e., a limit ordinal ; 0 is the only finite limit
ordinal.
Theorem 3.31 (Proof by Induction on Ordinals) Let P (x) be any property. Assume that:
(i) P (0) holds.
19
(ii) P () holds provided P () holds for all < .
Then P(x) holds for all ordinals.
Proof. Indeed, if we had an ordinal for which P (x) would not hold, then we could find a smallest
such ordinal . But then P () for all < and therefore P () by condition (ii). Note that (ii)
actually implies (i). 2
Induction on Ordinals generalizes Complete Induction on N. We are now going to describe how
functions can be defined recursively on the ordinals. Let F (x, y) be any functional relation on the
class D(x) and let T (x) be a subclass of D(x), i.e., T (x) D(x) holds on U. Then F (x, y) T (x)
T (x) F (x, y) is called the restriction of F (x, y) to T (x).
Whenever we perceive a functional relation as a set, then we are actually
identifying the function with
its graph. So, for example, F (x, y) a stands for graph F (x, y a = {c|c (ab), c = (d, e), F (d, e)},
where b is the range of F (x, y) a.
Theorem 3.32 (Definition by Recursion on Ordinals) Let H(x, y, z) be a functional relation where
the domain is D(x, y) Ord(x) (y = y). That is, for any ordinal and any set a there is a unique
set b such that H(, a, b) holds on U. For this we write: z = H(x, y), Ord(x). Then there is a unique
functional relation F (x, y) on Ord(x) such that for any ordinal one has that
F (, b) iff H(, F , b)
i.e.,
F () = H(, F )
Proof. Let be any fixed ordinal. We first show that there is a unique function f on such that
() f () = H(, f ), <
f (0) = a0 , f (1) = H(1, {(0, a0 )}) = a1 , f (2) = H(2, {(0, a0 ), (1, a1 )}), . . .
In order to have a formal proof for uniqueness, assume that we have functions f and g both satisfying
the condition () . Then f (0) = g (0) = a0 . We wish to show that if f and g agree for all
< , , then f () = g (). Now: f () = H(, f ) = H(, g ) = g (). Hence, by the
induction principle, f () = g () holds for all < , i.e., f = g .
With respect to the existence of f , note that we have already f0 , f1 , . . . and we may form the set
of all ordinals for which f exists:
We are going to show that is a segment of . Let and < . Put f = f . Then for < ,
That is, f satisfies () . Hence, , and moreover, we have shown that if and < , one
has that f = f . As a segment of the ordinal , is also an ordinal . In order to show that
= , we define a function f by
f () = H(, f ),
We wish to show that f satisfies () . To this end let < < . Then one has:
f () = H(, f ) = H(, f ) = f ()
20
Hence: f = f and therefore, f () = H(, f ) holds for all < . Thus the function f
satisfies () . If we had < , then by the definition of . But is impossible for ordinals
as we have shown before.
The functions f , an ordinal, form a class G(x). The predicate G(x) is the formalization of: f
is a function, dom(f ) is some ordinal , < f () = H(, f ) . As we already have shown, the
functions for G(x) are pairwise compatible, i.e., restrictions of each other. Hence they can be glued
together to a functional relation F (x, y) where we have:
F (, b) iff f () = b iff f+ () = b iff H(, f+ ) = b iff H(, F ) = b , i.e.
F () = H(, F )
Finally, we must show uniqueness of our F (x, y). So assume that F 0 (x, y) also satisfies the recursion
formula: F 0 () = H(, F 0 ). Put F 0 = f0 and let < . Then:
According to the very first part of the proof we conclude that f = f0 holds for every . Hence
F (x, y) F 0 (x, y). 2
This
S leads to a recursively
defined functional
S relation V (x) on the ordinals: SV () = H(, V ) =
P(V ())| < . We have V (0) = = and if < , V () = P(V ())| <
S +
P(V ())| < } = V (). Because taking the power set is a monotone
operation, we have V ( )
S
P(V ()) P((V ()), . Thus, V (+ ) = P(V ())| = P(V ()). If is a limit ordinal
+
S S +
S
then
< yields < . Hence, V () = P(V ())| < = V ( )| < = {V ()| <
.
The Zermelo-Fraenkel Hierarchy of Sets. The cumulative hierarchy of sets is defined by
[
V0 = ; V+ = P(V ) ; V = V | < , if is a limit ordinal.
V (x) = z Ord(z) x Vz is the class of sets belonging to that hierarchy.
21
S
The Multiplication of Ordinals 0 = 0; + = + ; = ( )| < if 6= 0 is
a limit ordinal, defines a unique operation on Ord(x). It generalizes the ordinary multiplication
of natural numbers.
+ S
The Exponentiation of Ordinal Numbers 0 = 1; = ; = | < if 6= 0 is
a limit ordinal, defines a unique operation on Ord(x). It generalizes the ordinary exponentiation
of natural numbers.
For example, + 1 = ( + 0)+ = + , which is in agreement with our earlier convention. Also,
0+ = {} = {} = 1. We get:
Kant claimed that a statement like 2 + 2 = 4 is a synthetic judgment, i.e., a fact which must be
considered as being a priori true. Our argument showed that it can be proven on the basis of our
ability to recognize symbols like (((0+ )+ )+ )+ as 3+ , i.e., as 4. In this sense, 2 + 2 = 4 admits a proof,
i.e., it is the result of an analytical judgment.
At no point in our development of ordinals did we use our axiom of infinity, i.e., that there is a set
of natural numbers n, n N. We are now in a position to adopt a more general approach towards
this axiom.
The Finite Ordinals. An ordinal > 0 is called finite, if as well as every ordinal smaller than
, except 0, is a successor, i.e., it has a predecessor. The class of finite ordinals is given by the
formula:
F Ord(x) = Ord(x) y Ord(y) (y x) (y 6= ) z y = z {z}
Ordinals which are not finite are called infinite. If is a finite ordinal then all are finite. If
is finite then + is also finite. Clearly, all standard finite ordinals, i.e., the ordinals n are finite.
However, there is no way of showing that all finite ordinals are standard. A definition of finite as being
some n is a descriptive definition in contrast to the formal one above. According to a fundamental
result of mathematical logic, the so called Lowenheim-Skolem Theorem, there is no scheme of formulas
F (x, a), a b, which is satisfied exactly by the standard numbers n. Finite ordinals which are
nonstandard are of infinite magnitude in the sense that n holds for all n N. This is very easy to
see.
The Axiom of Infinity (revised). There is a set whose elements are exactly the finite ordinals.
S
We see that = | , hence is an ordinal, and is not finite because of / . Hence
cannot have a predecessor, i.e., is a limit ordinal. Because all ordinals smaller than are finite, i.e.,
they are zero or have predecessors, is the smallest infinite limit ordinal. The existence of an infinite
limit ordinal is equivalent to the axiom of infinity.
Theorem 3.33 (Proof by Induction for Finite Ordinals) Let P (x) be a property. Assume:
22
(i) P (0) holds.
(ii) P ( + ) holds, in case that P () holds.
Then P () holds for all .
Proof. Indeed, if there is some ordinal for which P () is not true, then > 0 for the smallest
such ordinal, because of (i), and because of (ii). 2
Addition and multiplication of finite ordinals provide the foundation of a rigorous development
of real numbers and the calculus. By the very nature of any such axiomatic or formal approach,
infinitesimals are lurking in the shadows e.g. as reciprocals of nonstandard finite numbers. However,
they cannot be discovered by any formal means. An approach of the Infinitesimal Calculus within any
model of ZF by adding an additional predicate which expresses the extraterrestrial quality of being
standard has been advocated by Ed. Nelson in his book on Radically Elementary Probability Theory.
It is quite interesting to note that Fraenkel discarded a theory of infinitesimals as useless for serious
mathematics. However axiomatic set theory together with formal logic finally reestablished the intuitive
methods of Euler and Cauchy, and made them again available not only as an attractive alternative to
the approach of the calculus but also as an intuitive and powerful tool for even the most advanced
parts of applied mathematics. And most of this NonStandard Analysis was developed by Abraham
Robinson who studied as a freshman set theory and mathematical logic under A. A. Fraenkel.
23
If is a limit ordinal, then + < and
[
+ < ( + )+ = + + + | < = +
On the other hand, if + < + then 6= and < , by what we have shown.
If + = + then = (3.4)
We prove this by induction on . That is, we assume the claim for all < . If = + , then
+ = + + = ( + )+ ( + )+ = + + = +
Note that 0 < 1 but 0 + = 1 + . Hence, we have right cancellation but not left cancellation.
( + ) + = + ( + ) (3.6)
We prove this by induction on . That is, we assume that associativity holds for every < . If
= + then ,
( + ) + + = (( + ) + ))+ = ( + ( + ))+ = + ( + )+ = + ( + + ).
Thus: [ [
+ ( + )| < = + | < +
We already know that + is a limit ordinal. Hence:
[
+ | < + = + ( + )
24
If > 0, then = + 0 < + = . We also know that there can be only one such that + = .
Let < . We induct on , i.e., we assume that for every , < , one can find some such
that + = .
If = + , then < and + = for some . Hence,
+ + = ( + )+ = + =
Proof. w1 is actually equal to , and therefore isomorphic to ; by what we have shown, the map
7 + is an isomorphism between and w2 . 2
In order to analyze the structure of the ordinal product, we need a few facts.
25
Now, < is the same as + and we conclude that
Hence, is a limit ordinal in case that is a limit ordinal. Now assume that is a limit ordinal. But
then it is enough to assume that is a successor, i.e., = + . But then:
= + = +
We prove this by induction on . That is, we assume for every < . If is a successor, i.e.,
= + , then:
< + = + = , hence = + + =
We induct on . That is, we assume for all < < that < . If = + , then and
. This is obvious for = , and for < it is our induction hypothesis. Hence,
< + = + =
In particular,
If > 0 then = only if =
We induct on . That is, we assume for every < that . If = + then, by 3.3:
= + = + + < + = + =
26
If is a limit ordinal, then
[ [
= | < | < = , i.e.,
We have already noticed, that 1 < 2 but 1 = = 2 . Hence, the sign in 3.14 cannot be replaced
by <.
( + ) = + (3.15)
( + ) = ( + + ) = ( + ( + 1)) = (( + ) + 1) = ( + ) +
= ( + ) + = + ( + ) = + +
= +
We may assume that > 0, otherwise our claim 3.15 becomes obvious. Hence, for = where
< we have by 3.12 that < . This yields:
[ [
+ | < + | <
Now let < . The set 0 = {| } is obviously a segment and therefore an ordinal.
Because S < one has that < , for each 0 . We claim that 0 is a successor. Otherwise,
0 = | < 0 } and, because holds for each 0 , one has that 0 . But this
is 0 0 , which is a contradiction. Hence, 0 = + and < shows that < . But then
also + < , because is a limit ordinal. This is, 0 > where 0 < . Hence:
[ [
+ | < + | <
This is: [ [
+ | < = + | <
Because is a limit ordinal, one has that
[
+ | < = + .
( ) = ( ) (3.16)
27
We induct on , i.e., we assume that ( ) holds for each < . If = + , then
( + ) = ( ( + 1)) = ( + ) = ( ) + = ( ) + = ( ) +
This is, ( ) = ( ) .
If is a limit ordinal, then is a limit ordinal as well as ( ). Hence:
[
( ) = | <
For proving our claim 3.16 we may assume > 0. Now, if = where < , then, according to
3.12, one has that < . On the other hand, if < , then according to the last Lemma, and
again by 3.12, one has some < such that < < . Therefore,
[ [
| < = ( )| <
S S
But, ( )| < = ( ) )| < = ( ) . This proves associativity.
Lemma 3.36 Let < . Then = + for unique < and < .
Proof. According to Lemma 3.35, we can find some < such that < + . This is,
< + . Using 3.7, + = , and < by 3.3.
Now note that and have to be different from 0 in order for to be less than Assume
= 1 + 1 = 2 + 2 where < , 1 , 2 < and 1 , 2 < . We wish to show that 1 = 2
and 1 = 2 .
If 1 = 2 , then 1 = 2 and, a fortiori, 1 = 2 by 3.4. If 1 > 2 , then 1 (2 + 1). Hence,
1 (2 + 1) = 2 + . By 3.7 we can find some 0 such that 1 = ( 2 + ) + .
We conclude, 2 + 2 = 1 + 1 = 2 + + + 1 , i.e., 2 = + + 1 , which is a
contradiction. 2
Proof. Note that if < then + . Thus, + = + . For any < , , we get
+ < + , i.e., + < . Together with Lemma 3.36 we have established the
said bijection between the Cartesian product and the ordinal .
We pull back the order on in order to make to a well-ordered system. That is,
If 1 > 2 , then 1 + +
2 and 1 2 = 2 + > 2 +2 . Thus, 1 +1 1 > 2 +2 .
Hence,
(1 , 1 ) > (2 , 2 ) in case that 1 > 2
If 1 = 2 then 1 = 1 + 1 > 2 = 2 + 2 yields 1 > 2 . Hence,
28
( , <al ) looks like a matrix with many rows of copies of :
That is, can be realized as the order sum of many copies of . There are many layers of
copies of .
Corollary 3.38 (The Division Algorithm) Let > 0. Then every admits a unique representa-
tion = + where and < .
Proof. If 1 then, using the monotinicity relation of 3.14, we infer from 1 that 1 <
+ . Thus, < + and the claim now follows from Theorem 3.37. 2
(ii) + = + ; =
Proof. In order to prove this, we induct on . If = 0, then + 0 = by 3.1 . If = + then
+ = ( + )+ = + , by the induction hypothesis. Also, 0 = 0 by 3.8. If = + then
= + = + by the first part of (i).
We first establish commutativity of addition for = 0 and = 1.
We have that + 0 = 0 + = by 3.1.
If = 1, then + 1 = + = 1 + . We prove this by induction on . For = 0 this is 0 + 1 = 0+ = 1 + 0
which is true by 3.1 and because of 0+ = 1. If = + we have
+ 1 = + + 1 = ( + 1) + 1 = (1 + ) + 1 = 1 + ( + 1) = 1 + + = 1 +
+ + = ( + )+ = ( + )+ = ( + ) + 1 = + ( + 1) = + (1 + ) = ( + 1) + 1 = + +
29
Hence, + = + .
In order to prove commutativity for multiplication, we need
(*) ( + 1) = +
We prove this by induction on . For = 0 this is ( + 1) 0 = 0 + 0, which is true because both
sides are zero by 3.8 and 3.1. If = + then
( + 1) + = ( + 1) + ( + 1) = ( + ) + ( + 1) = ( + ) + ( + 1) = + + +
which is (*) for .
If = 0, then 0 = 0 = 0 by 3.8. If = + then
+ = + = +
by the definition of ordinal multiplication and by the induction hypothesis. But also,
+ = +
by (*). This proves = . 2
Concerning exponentiation of ordinals, we must note that the notation = is ambiguous. The
ordinals and are sets but is not the set of maps from into . For example:
[
2 = 2 | =
while the set of maps from into 2 is equivalent to R, i.e., to the set of real numbers. Monk uses
for ordinal exponentiation. We do not adopt his convention. Later on, it will become clear from
the context what kind of exponentiation is meant. We already remark, that there is no natural way
to make 2 , perceived as the set of maps from into 2 to a well ordered set. One needs the AC to
well-order this set, i.e., R. However, one has the following
Proposition 3.40 A finite direct product
a = a0 . . . a1
of well-ordered sets a is well-ordered by the lexicographic order:
b = (b0 , . . . , b1 ) < c = (c0 , . . . , c1 ) iff b < c if is the first index where a and b differ.
Proof. This relation < is obviously irreflexive and one can easily prove that it is also transitive. We
need to show that every subset s of a has a smallest element. We define successively subsets s of s
and elements c a :
s0 = {b0 |b0 first component of some b s}
c0 = min(s0 )
s1 = {b1 |b1 second component of some b s where the first component is c0 }
c1 = min(s1 )
s1 = {b1 |b1 last component of some b s where first components are c0 , c1 , . . . , c2 }
c1 = min(s1 )
It is easy to see that c = (c0 , , c1 ) is the minimum of s. 2
30
Definition 3.7 We define for ordinals and the weak cartesian power :
If f and g are different elements of () then, because f and g differ at only finitely many s there is
a largest where f () 6= g(). We define f < g in case that f () < g().
We have g > f for g / s0 and f s0 . Thus, only functions f s0 are candidates for min(s). If s0 is a
singleton, we are done. Otherwise, we continue and define, similarly as before, 1 (f ) = max{|f () >
0, < 0 }. Then let
We have 0 > 1 and g > f for g s0 \ s1 and f s1 . We can continue that way and create a strictly
decreasing sequence of ordinals 0 > 1 . . . which must be finite1 . Thus s is a singleton for a finite
ordinal . The element of s then is the minimum for s. 2
1 A formalization of this process is based on Definition by Recursion on Ordinals(Theorem 3.32), and is left as an
31
Chapter 4
The axiom of choice (AC) states that every family of non-empty sets admits a choice function. As an
important consequence we note that every set p of mutually disjoint and non-empty sets c admits a
cross-section, i.e., a set s such that s c 6= for each c p. For s we may take the range of a choice
function for the family which is theS identity on p, i.e., p p, c 7 c. The sets c are called the classes
of the partition p of the set a = c, and s is also called a set of representatives for the partition p.
Because the sets c are pairwise disjoint, s c is a singleton for each c p. The statement that any
partition p of a set a into non-empty classes admits a set of representatives is actually equivalent to
AC. Given any family a, j i, of non-empty sets we can make it to a set p of disjoint sets by defining
p = {cj = {j} aj |j i}. Clearly, if j 6= j 0 then cj cj 0 = . Then if s is a cross-section for p, we
have s cj = (j, sj ) for some sj aj , and the map j 7 sj is a choice function for the family aj , j i.
A choice function is easy to find in case that all sets aj are well-ordered.
S One
may pick from every aj
just the smallest element. Notice that a well-ordering of a = aj |j i induces a well-ordering of
each aj and produces a choice function. The most substantial consequence of AC is
Theorem 4.1 (Zermelos Well-Ordering Principle (WO)) Every set can be well-ordered.
Proof. Let f be any choice function on P(a) \ {}. The idea to well-order a is quite simple. Define
a0 = f (a), a1 = f (a \ {a0 }), a2 = f (a \ {a0 , a1 }), . . . and set a0 < a1 < . . . The process of assigning
ordinals to elements of a should terminate because at every such step we are taking away an element
from a. The formal definition of this process is done by recursion. We pick some c / a and define a
functional relation by
f (a \ ran(g)) if x = a and y = g : a, ran(g) a
H(x, y) =
c otherwise
Because the universe U is not a set it is clear that such a set c must exist for a. By the Recursion
Theorem we have a functional relation F (x, y) such that for each ordinal , F () = H(, F ) holds.
If F () 6= c holds, then F is a function whose domain is , the range is {F ()| < } a and,
moreover, F () = f (a \ {F ()| < }). We conclude that F () 6= F () for all < . If we had
F () 6= c for all ordinals , then F (x) would establish an injective functional relation from Ord(x) into
the set a. It would follow from comprehension and replacement that Ord(x) is a set. Hence, there is a
smallest ordinal such that F () = c. For each < we have that F () a. Hence F : a and
the map is injective. But rang(F ) = a, because F () = c. Thus F is bijective and well-orders a.
2
32
Let (p, ) be a partially ordered set. An element m p is called maximal if m a implies that
m = a. The open interval (0, 1) of real numbers is an example of a partially ordered set which does
not have maximal elements. The maximal elements of the partially ordered set of all proper subspaces
of Rn are the hyperplanes. In a totally unordered set, any element is maximal. In order to find a
condition which guarantees the existence of maximal elements let us try to find one in (p, ). Pick any
element a0 in p. If a0 is not maximal, pick an element a1 > a0 . . . etc. If the so created sequence does
not terminate at some finite n , we have to define some a > an , i.e., we need to have at least one
upper bound for the chain {an |n }. This heuristic construction suggests
Theorem 4.2 (Zorns Lemma (ZL)) Let (p, ) be a partially ordered set where every chain has an
upper bound in p. Then p contains a maximal element.
Proof. The element u p is called a strict upper bound for the subset a of p, if u > b holds for each
b a. Any element of p is a strict upper bound for . There is no strict upper bound for p. Let d be
the set of all subsets of p which have a strict upper bound and for a d let u(a) be the non-empty set
of strict upper bounds. For a choice function f on P(p) \ {} and a d define v(a) = f (u(a)), i.e., v
selects a strict upper bound for a subset a of p, if there is one. Of course, v(a)
/ a for each a d. We
define
v(rang(g)) if x = and y = g : p, ran(g) d
H(x, y) =
c otherwise
As before, we choose for c a set which is not a member of p. According to the Recursion Principle
we have a functional relation F (x, y) such that F () = H(, F ). If F () 6= c, then F is a
function from into p, where the range {F ()| < } belongs to d, and F () = v({F ()| < }).
Therefore, F () 6= F (), < . If we had F () 6= c for all ordinals then F (x, y) would define an
injective functional relation of the class Ord(x) into the set p. Hence F () = c for a smallest ordinal
. But then for < < we have, because of F () = v({F ()| < }), that F () > F (). Hence
{F ()| < } is a chain in p and must have an upper bound m. On the other hand, this chain cannot
have a strict upper bound because F () = c. Hence = + and F () = m and there cannot be any
element larger than m. This is, m is maximal. 2
Corollary 4.3 Let (p, ) be a partially ordered set where every chain has an upper bound. Then for
any a p there is some maximal element m in p such that m a.
Proof. We only have to apply Zorns Lemma to the partially ordered set {b|b a}. 2
Corollary 4.4 Let (p, ) be a partially ordered set where every well-ordered chain has an upper bound.
Then p has a maximal element.
Proof. For the proof of ZL we only needed an upper bound for the well-ordered chain {F ()| < }.
2
Theorem 4.5 (ZL) implies within ZF Set Theory (WO) and, a fortiori, also (AC).
Proof. Let a be a set we wish to well-order. To this end let s be the set of all well-orderings of subsets
b of a, i.e., s = {r|r b b, b a, r well-order on b}. We partially order s by
S saying that
r1 r2 if b1
is a segment of b2. Let c be any chain in s. It is quite obvious that u = r|r c is a well-order
S
on the set b = dom(r)|r c}, i.e., u s, and u r for each r c. Hence any chain in s has an
upper bound. By ZL we conclude that s has a maximal element w. If we had b = dom(w) a, then
any element d a \ b put on top of (b, w) would violate the maximality of w.
That WO implies AC is a trivial observation which we have stated before. 2
33
Chapter 5
The axiom of foundation says that every non-empty set a contains an element b such that b a = .
It follows that a set can never be an element of itself, (x x), or more generally, there are no finite
cycles, i.e., maps defined on a finite ordinal such that a0 = a1 , a a+1 , 0 < 1; and
there is no map on such that a+1 a . The range of such maps would violate AF. An element
b a such that b a = , is minimal in (a, ), i.e., b a and for no c a one has that c b. From
this remark we conclude
Proposition 5.1 (AF) A set is an ordinal if and only if
(i) is transitive.
(ii) (, ) is totally ordered.
Proof. Recall that an ordinal is a set which is transitive on which is a strict well-ordering.
Because of AF, is irreflexive. If we have that is a total order then a minimal element of is
actually the minimum. 2
is non-empty and has a minimal element 0 . Because is transitive we have 0 . Hence every
element 0 belongs to but does not meet the condition for a. That is:
(*) If 0 then we have for any that = or or .
On the other hand, 0 belongs to a, hence the set
b = | , ( 6= 0 ) ( 0 ) (0 )
is non-empty and has a minimal element 0 . Because is transitive, every element 0 belongs to
but does not meet the condition for b. That is
34
(**) If 0 then we have = 0 or 0 or 0 .
We are going to deduce a contradiction from (*) and (**). First, we derive 0 0 . Let 0 ; we
use (**) to conclude 0 . If we had = 0 , then 0 0 would contradict 0 b. If we had 0
then 0 together with the transitivity of 0 would yield 0 0 , contradicting again 0 b.
Now, 0 0 , and 0 = 0 cannot hold because of 0 b. This is 0 0 and we may pick some
0 \ 0 . By (*) we have 0 = or 0 or 0 where we have already excluded 0 . Now,
0 = yields 0 0 which contradicts 0 b; 0 and 0 yields by transitivity of the set
0 again the contradiction 0 0 . 2
Assume that V (a) holds for the set a. Then let be the smallest ordinal such that a V holds.
Because V0 = , we have > 0. It is also clear that must be a successor ordinal. For a limit ordinal
, each element a of V belongs to some V for some < . Thus = + and we call (a) the rank
of a: (a) = if a V where = + is the smallest ordinal such that a V , or a V
Assume a V . If is a limit ordinal then (a) < . If = + one has that (a) < . Thus,
a V iff (a) < .
35
Proposition 5.5 V (a) holds if and only if V (b) holds for every b a. Moreover, (b) < (a) for
every b a. If c is a subset of a then V (c) holds and (c) (a).
Proof. Let (a) = . Then a V+ = P(V ), i.e., a V . If b a then b V . Hence, V (b) and
(b) < = (a). S
Now assume V (b) for every b a. Let = (b)+ |b a . Then b V(b)+ V for every b a.
Hence, b V for every b a which is a V , i.e., a V+ .
We have c a V , hence c V which is (c) = (a) 2
Proposition 5.6 V () holds for every ordinal and one has the () = .
Proof. By Proposition 5.5 we have that V () holds in case that V () holds for all < . So V ()
holds for every ordinal . S
If we have V + for each , then V and because of V = P(V )| < , one has
V for each . Hence V , i.e., V+ . This is V+ for each ordinal . Assume
that there is an ordinal
S where () < , i.e., an ordinal for which V . We pick the smallest such
. Then V = P(V )| < yields some < such that V , i.e., V for some
< . However, was the smallest such ordinal. 2
Proposition 5.7 Each set a for which V (a) holds contains an element b such that a b =
Proof. {(b)|b a} is a set of ordinals and has a smallest element . Hence (b) = for some b a
and according to Proposition 5.5, (c) < rho(b) for each c b. Hence c
/ a for each c b. 2
We are going to show that the axiom of foundation implies that each set a of the universe belongs to
the class V (x). A possible argument runs roughly like this: Assume that there is a set a such that
V (a) holds. Then a must contain some element a1 , such that V (a1 ). But then a1 must contain
some element a2 such that V (a2 ) etc. That is, we create a sequence a 3 a1 3 a2 3 . . . which violates
AF. A complete proof along these lines requires obviously AC. We are tempted
to define recursively a
sequence a , , such that a0 = a, and a+1 = f b|b a , V (b) for some choice function f .
The trouble with this definition of the a is that we first have to specify a suitable family of non-empty
sets1 in order to have a choice function for the generation the a . There is a somewhat more elementary
approach possible, which is based on the transitive closure of a set.
Lemma 5.8 For any set a there is a set tr(a) which contains a and which is transitive. Moreover,
any transitive set which contains a contains also tr(a).
S S S
Proof. We define sets a0 = a, a1 = b|b a0 , a2 = b|b a1 , . . . a+1 = b|b a and
S
tr(a) = a | .
We have tr(a) a and if b tr(a) then b a for some . But then b a+1 tr(a), i.e., tr(a) is
transitive.
Now assume that c a where c is transitive. SAssume that
a c. Let b a c. Then b c and
b c, because c is transitive. Hence a+1 = b|b a c. 2
Theorem 5.9 The axiom of foundation holds if and only if every set belongs to the ZF-Hierarchy:
AF xV (x)
Proof. We already showed that every set a which belongs to V (x) has an element b which is disjoint
to a. So xV (x) implies the axiom of foundation.
1 However, Hilbert postulated the existence of a universal choice (class-)function for which a 6= (a) a holds.
He called it the logical choice function. However, nobody assumes this strong version of AC anymore
36
For the converse, assume that we have AF but that there is a set a which does not belong to V (x).
Then tr(a) does not satisfy V (x) as a set which contains a. Let b = {c|c tr(a), V (c)}.We are going
to show that b violates AF. Let c b. Then, because of V (c), c must contain an element d such that
V (d). Now, d c tr(a) yields d tr(a). But tr(a) is transitive, thus d tr(a). Together with
V (d) this yields d b. Hence, c b 6= . Because this holds for every element c b, the AF cannot
hold. 2
37
Chapter 6
Cardinals
Proposition 6.2 Assume that f : a b is injective. Then there is some map g : b a such that
g f = ida . That is, every injective map has at least one left inverse.
Proposition 6.3 (AC) Assume that g : b a is surjective. Then there is some map f : a b such
that g f = ida . That is, under the assumption of AC, every surjective map has at least one right
inverse.
The proofs are very easy. The map f for Proposition 6.3 is defined with the help of a choice function
on P(b) \ {} which picks for every c a some element d g 1 (a) = {d|g(d) = a}.
Hence, a in b always yields a pr b but the converse needs the AC. Thus a in b iff a pr b holds
under the assumption of the axiom of choice.
For every map f : a b the equivalence kernel, or just the kernel, is defined by c1 f c2 iff f (c1 ) =
f (c2 ). This is an equivalence relation on the set a where the classes are the largest subsets of a on
which the map f is constant. As usual, a/ f denotes the set of equivalence classes and c 7 [c] is the
canonical projection qf . The map [c] 7 f (c) then is the canonical injection f.
38
d0 d1 d ................
2
g f g f g
f
c0 c
1 c2 c3 .....
39
(ii) (ab )c a(bc) ;
(iii) a(bc) ab ac in case that b c = ;
(iv) ac bc (a b)c
Proof. Subsets can be identified with their characteristic functions. Actually P(a) and 2a are
isomorphic as boolean algebras. Families of maps in one variable are just maps in two variables. This
is (ii). The statement (iii) says that maps on disjoint domains can be patched together; (iv) is the
basis of the categorical characterization of the cartesian product, i.e., pairs of maps (f, g) on a common
domain correspond to maps h into the direct product of the codomains. 2
Let r be the set of real numbers and (a, b) and (c, d) any two (proper) open intervals. Then (a, b)
(c, d) by means of a simple linear equation. Clearly (1, 1) in [1, 1] in (2, 2) and (1, 1) (2, 2)
then yields (1, 1) [1, 1]. Hence any two proper intervals, whether open, closed or half-open, are
equivalent. The arctangent function maps r bijectively onto the open interval (/2, /2). Hence r
is equivalent to any of its proper intervals. The function 1/x maps (0, 1] to [1, ) and, as before, any
two improper intervals are equivalent. Thus r is equivalent to any of its intervals. On the other hand,
with the help of the binary representation of real numbers, one easily establishes [0, 1] 2 .
Proposition 6.9 The set r of real numbers, the continuum, is equivalent to the powerset of the set
of natural numbers. 2
Hence the continuum r is equivalent to the finite dimensional spaces r and even equivalent to
the infinite dimensional space of all real sequences. The set q + of non-negative rational numbers is
equivalent to a subset of , by means of the unique representation of a rational number as a reduced
fraction. Hence, q + . But q + , therefore q + . It follows that the set q of rational
numbers is equivalent to the set of finite ordinals: q + q {1, 1} q + 2 . Two
continuous real functions which agree on the set q are equal. Let c be the set of all continuous functions.
Then r c because every constant is continuous, and c rq with f 7 f q. So r c rq r r.
Hence r is equivalent to the set of all continuous functions.
40
Therefore, the set of all real valued functions is equivalent to the set of functions which take on only
two values and this set is strictly larger than the continuum.
For any sets a and b where a / a and b / b one has that a b iff a {a} = a+ b+ = b {b}. Let
+ +
g : a b be a bijection. If g(a) = b, then g a is a bijection between a and b. If g(c) = b, then with
the transposition (a, c) we get in g (a, c) a bijection g 0 : a+ b+ which maps a to b. In particular,
0 <in 1, leads to 1 <in 2. It follows, by induction, that for any finite ordinal one has that <in + .
Hence, for finite ordinals one has that < iff <in . (Assume < . Then + and certainly
+ in , by means of the inclusion map. Hence <in + in . So < yields <in . Assume
<in . Then < , because otherwise and therefore, in .) A set a is called finite if
it is equivalent to some finite ordinal . This ordinal is then unique and called the cardinal of a;
= card(a) is a canonical representative of the class of all sets which are equivalent to the set a. Sets
which are not finite are called infinite.
For the ordinal we have that + . The map 7 0, 7 + is a bijection between and + .
Hence is an infinite set. Our next goal is to show that for any ordinal one has an ordinal such
that <in . Then automatically < . Otherwise , so in , thus . We are going to
show somewhat more:
Proposition 6.14 For any set a there is an ordinal which is not equivalent to any subset b of a.
The smallest such is called the Hartogs number of a
Proof. Let k(a) = {w|w P(a a), w a well-order of a subset b of a}. If w is in k(a) then w is
isomorphic to a unique ordinal . That is, we have a functional relation H(w, ), w k(a). If is any
ordinal which is equivalent to some subset b of a, then b carries a well-ordering w such that H(w, ).
The range of H(w, ) is by replacement a set and obviously a segment. So it is of the form S() where
is the first ordinal not equivalent to any subset of a. 2
Definition 6.1 An ordinal is called an initial ordinal if < yields <in ; that is, no ordinal
smaller than is equivalent to . Initial ordinals are also called cardinals.
Proposition 6.15 For any set a the Hartogs number h(a) = is a cardinal.
Proof. Assume for some < that . Then is equivalent to some subset b of a because
was the smallest ordinal for which this is not the case. But would make b well-orderable by .
This is a contradiction. 2
Proposition 6.16 The class of cardinals is not a set.
S
Proof. Let a be any set of cardinals. Then = a is an upper bound for a. But then h(b) >
holds for each a. That is, the cardinal h() does not belong to the set a. 2
Proposition 6.17 All infinite cardinals are limit ordinals.
Proof. Assume that is an ordinal and . We define a bijection f : {} by setting
f (0) = ; f () = 1, \ {0}; f () = for < . Hence + is not a cardinal. 2
We have that all finite ordinals are cardinals. If is any ordinal then there is a smallest ordinal
equivalent to it: card() = min{| }. For example, = card() = card( + 1) = card( + 2) =
. . . and we have already shown that there are cardinals larger than the first infinite cardinal, which
is . An ordinal is countably infinite, or denumerable, if = card(). Let 1 be the first cardinal
which is not countable. Then 1 = {| < 1 } tells us that 1 is just the set of all countable ordinals,
i.e., the ordinals which are either finite or denumerable.
41
Let a be a set which can be well-ordered. We then define:
card(a) = min{| a}
Corollary 6.19 If b in a where a can be well-ordered, then b can be well-ordered and one has that
card(b) card(a).
Proof. We have b s0 for some subset s0 of a and because of a card(a) we get b s for some
subset s of card(a). Hence, card(b) = card(s) card(a). 2
Corollary 6.20 Assume for the ordinal that card() < where is a cardinal. Then < .
Proof. Indeed, if we had , then would be a subset of . The smallest ordinal that well-orders
, i.e., card(), would also well-order the set . This well-ordering of is order-isomorphic to an
ordinal where card(). But was a cardinal, thus , and therefore card() which
contradicts card() < . 2
Thus, an ordinal is a cardinal if and only if for every ordinal one has that
The Continuum Hypothesis (CH) is the assertion that any subset s of real numbers which is not
finite is equivalent to or to r.
We can avoid the reals in the formulation of CH by asserting that every subset of 2 which is infinite
is equivalent to or to the whole set 2 .
Proposition 6.21 If 2 can be well-ordered then 1 = card(2 ) is equivalent to the Continuum Hy-
pothesis.
Proof. If 2 can be well-ordered then 1 card(2 ) because <in 2 card(2 ) but 1 is the
smallest cardinal such that < . It follows that 2 contains a subset s which is equivalent to 1 .
If we assume CH then s 2 , hence 1 2 and 1 = card(1 ) card(2 ). But equivalent cardinals
are equal, so 1 = card(2 ).
Assume that 2 can be well-ordered and that 1 = card(2 ). Then let s be any infinite subset of 2 .
If card(s) > , then 1 card(s) card(2 ). Hence, card(s) = card(2 ) and it follows that s 2 ,
i.e., CH. 2
Cantor stated the Continuum Hypothesis in 1878 and it was listed by Hilbert in his famous address of
1900 as the most important mathematical problem of the twentieth century. Hilbert believed in the
42
truth of CH and called it actually the Continuum Theorem in his 1925 paper About Infinity. Godel
proved in 1938 that AC and CH are simultaneously consistent with ZF. Within any universe he formed
the hierarchy of constructible sets: Roughly speaking, sets are constructed in stages. The critical case
is the one from stage to + . If the set a has already been constructed at stage then at + all
subsets of a are collected which are given by formulas Q(x) where all occurring parameters are sets
already constructed at stages . This constructible universe L settled many other open problems in
the affirmative and there are mathematicians who firmly believe in V = L, i.e., the ZF-Hierarchy V
should consist of constructible sets only. However, P. Cohen showed in 1963 that CH does not follow
from AC and he even constructed a model where the reals cannot be well-ordered, i.e., where AC is not
true. And CH can be true without having V = L. He invented a new method which he called forcing in
order to obtain these models. His research revolutionized set theory and gave new impetus to systems
of non-classical logics. The CH and its generalization to arbitrary infinite sets implies, however, AC.
It is quite interesting to note that P. Cohen believes that CH is obviously false. He argues that 1 as
the set of all types of well-orderings realizable on is, after all, a set one gets by adding one element at
a time: 1 = {1, 2, . . . , , + 1, + 2, . . . , + , . . .} while forming the power set should be perceived
as a much stronger tool to generate a set of a larger cardinal.
6.2 Cardinals
If we assume AC, and we will do so from now on, then any set has a cardinal. Let ()j , j i, be any
family of cardinals. We define the cardinal sum as the cardinal of the disjoint union:
X [
j = card {j} j
ji ji
If aj is a family of pairwise disjoint sets where card(aj ) = j , then the sum of the j is the cardinal of
the union of the aj . This is very easy to see. The cardinal product of the j is defined as the cardinal
of the cartesian product: Y Y
j = card j
ji ji
If aj is any family of sets where card(aj ) = j , then the product of the j is the cardinal of the
cartesian product of the aj . Again, this is very easy to see. If and are cardinals then + and
denote the sum and product, respectively. It is obvious that sum and product are commutative
and associative and that the multiplication is distributive over addition; for exponentiation we have
the familiar rules:
= ( )
= ( )
= +
Finite sums and products agree with ordinary addition and multiplication. These are the addition
and multiplication rules of finite combinatorics. Whenever we use cardinal and ordinal addition or
multiplication simultaneously, the ordinal operations are encircled, e.g., = 2 while + =
2 = = . Another convention is to use Hebrew letters for cardinals, like Aleph() or Beth(i).
We already know that the cardinals form a class, well-ordered by . Hence there is a unique order
isomorphic correspondence between all ordinals and all infinite cardinals. This correspondence was
called by Cantor Aleph(, ) and for which he introduced the notation . So in particular:
= 0 , 0 + 0 = 0 , 0 0 = 0 , , 0 < 20 , 0 0 = 20
43
The single most useful fact about cardinal arithmetic is
Theorem 6.22 Let be an infinite cardinal. Then = .
Proof. For the proof we restate two auxiliary Lemmas which can be easily proven ad hoc. Recall,
however, Theorem 3.34 and Theorem 3.37.
Lemma 6.23 Let (1 , <1 ) and (w2 , <2 ) be two well-ordered systems. Then the cartesian product
1 2 becomes a well-ordered set by defining: (c, d) < (c0 , d0 ) iff c <1 c0 or if c = c0 then d <2 d0 .
(w, <) = (w1 , <1 ) (w2 , <2 ) is called the lexicographic product. The ordinal product is
isomorphic to the lexicographic product of and .
If we write |a| for card(a) then according to the last statement of the preceding Lemma we have that
| | = || + ||. For example, if is an infinite ordinal then | 1| = | {}| = || according to
7 0, 7 + 1 for < , 7 , for < .
The proof of the theorem generalizes a particular proof for S. Defineon the following
well-ordering < . Let p = {(, )| + = }. Then = p | where, of course, the
union is disjoint. Each p carries the lexicographic ordering which it inherited from . The order
sum of the p defines a well-ordering < on which is order-isomorphic to . We have:
(0, 0) < (0, 1) < (1, 0) < (0, 2) < (1, 1) < (2, 0) < . . .
+ = max(, ) =
44
Proof. We first assume that . Then + 2 = .
If is infinite and finite then + = : + + = ; = : = . 2
= 2
In particular, a countable union of countable sets is countable. It is quite remarkable that one cannot
prove this without using AC.
Let j , j i, be a family of cardinals with supremum . Then is a cardinal. Indeed, j yields
card() j for all j i. Hence card() is an upper bound for the j ; but then card() yields
card() = .
Corollary 6.28 Let j 1, j i, be a family of cardinals where = sup{j |j i} and = card(i).
Then, if or are infinite, one has that
X
j =
ji
P P P P
Proof. j card(i) = . On the other hand, = 1 j . And for each j i
j jP P j P j
one has that j j , therefore j . Hence = max(, ) j . 2
j j
Theorem 6.29 (Zermelo-Konig) Let for j i, aj bj where aj <in bj and where the aj are
pairwise disjoint. Then [ Y
aj <in bj
Proof. Because cj = bj \ aj are non-empty, S we haveQ a choice function j 7 dj where the dj belong
to cj . This gives us an injective map f : aj bj from the disjoint sum a of the Q aj into the
direct product b of the bj : If e a, then e ak for a unique k i. We define f (e) bj such that
pj (f (e)) = dj for j 6= k, and pk (f (e)) = e. It is easy to see that f is injective: Assume that e 6= e0 . If
e and e0 belong to the same ak , then f (e) and f (e0 ) have different kcomponents, namely e and e0 . If
e ak and e0 ak0 , where k 6= k 0 , then the kcomponent of f (e) is e, while the kcomponent of f (e0 )
is dk ck , soScertainly
Q different from e, because dk / ak . Hence again, f (e) 6= f (e0 ).
Now let f : aj bj be any function. We are going to show that f is not surjective. Because of
aj <in bj none of the maps pj f aj from aj into bj is surjective (again by AC, see Proposition 6.3).
We have a choice function j 7 dj where dj bj but dj is not the jcomponent of any f (e) where
e aj . Hence (dj )ji is an element of b which is not in the range of the map f . 2
P Q
Corollary 6.30 Assume for the family of cardinals that j < 0j , j i. Then j < 0j .
45
Proof. We apply the previous theorem to bj = {j} 0j and aj = {j} j . 2
Remark 6.1 Let and be limit ordinals where < cf (). Let f : be any function from
into . Then f is bounded, i.e., there is some such that f () holds for all .
This follows from the proof of the proposition.
Corollary 6.33 Let be a limit ordinal. Then cf () is a cardinal.
Proof. Let = card(cf ()). By the previous theorem we have a cofinal map : cf (), ,
and a cofinal map : cf () . The composition : is cofinal and < cf () would
contradict the definition of cf (). 2
Proposition 6.34 Let be a limit ordinal. Then cf (cf ()) = cf ().
46
Proof. We have cf (cf ()) cf () and, as in the previous proof, cf (cf ()) < cf () would contradict
the definition of cf (). 2
Definition 6.3 An infinite cardinal is called a regular cardinal if = cf (). An infinite cardinal
which is not regular is called singular.
For example, cf (cf ()) is for any limit ordinal some regular cardinal. The first infinite cardinal, , is
obviously a regular cardinal.
Recall that the infinite cardinals form a well ordered class and that there is a unique order isomorphism
between the class of all ordinals and the class of all infinite cardinals which we called Aleph(, ), i.e.,
= . For example, 0 = and card(2 ) = for some . We are going to show that cannot be
, one of the few definitive statements one can make on the basis of ZF with the AC.
Definition 6.4 A cardinal is called a successor cardinal if = for some successor ordinal .
Otherwise it is called a limit cardinal, i.e., = for some limit ordinal .
Proof. Let < . Then card() < , thus card() = for some < . Because is a limit
ordinal, we have that + < . Hence, card() = < + < . 2
The second limit cardinal, , is singular because of cf ( ) = by means of the map 7 .
By the previous proposition, : , 7 is cofinal. Hence, cf ( ) = .
(+ ) = +
47
Proposition 6.39 Let be an infinite limit ordinal. Then cf ( ) = cf ().
Proof. Let : be cofinal. Then 0 : , 7 () is increasing, because Aleph is
strictly increasing. We show that 0 is cofinal. Let : , 7 if is an infinite ordinal and
card() = , and 7 0 if is finite. (Here we use that > 0, there is no map from 0 into the
empty set.) Now let and = () . Then < () for some . But then < ()
and because of card() we conclude () = 0 ().
Now let 0 : be cofinal. The map : , 7 ( 0 ()) is obviously increasing. In order to
show cofinality, let . Then < + < 0 () for some but then ( 0 ()) + > . 2
For = 0, the last proposition is not true: cf (0 ) = 0 > 0; 0 is a limit cardinal and regular.
Definition 6.5 An uncountable cardinal is called weakly inaccessible if it is
(i) a limit cardinal, i.e., = for some limit ordinal ,
(ii) regular, i.e., cf ( ) = .
48