Intro to Axiomatic (ZF) Set Theory


Modern Zermelo-Fraenkel set theory is currently the official rulebook for doing most mathematics. For this reason alone it is worth our attention, since our development of computability will be given in set theory. But it also formalizes the demonstration of existence claims in mathematics and illustrates the kind of free-wheeling assumptions that made constructivists nervous. Nobody knows if it is consistent. Gödel's theorem will tell us that if it is consistent, it can't tell us that it is. But even if it did tell us it was consistent we wouldn't care because it would have told us if it were inconsistent anyway.


The Language

The language L of set theory is a marvel to behold. It is just the logical language L with one binary relation added:

All of the symbols in calculus are definable in set theory in terms of the relation x in y. For legibility, I will replace the

"official" form

with


The Axioms

Axioms are just statements that you can stick into a proof wherever you want without further justification. If you want, you can assume that the power set of a given set always exists. Or you can assume that pink elephants exist. Assumptions are as easy to make as they are to write down. These are the axioms of ZF set theory.


Set Existence

Some set exists. We don't assume anything else about it, and the E elimination rule of logic makes sure that we don't fool ourselves into thinking that we do!

(Set Existence) Ex x = x.

Actually, this axiom is redundant because it follows from our logic.

  1. x = x [ref]
  2. Ex x = x [E I, x is free for x in x = x]


Extensionality Axiom

The axiom of extensionality says that sets with the same elements are identical.

There is no color , shape, or other characteristic about whatever it is that holds all the elements together. That lopsided circle with a ball and an a dog in it that they showed you in grade school therefore wasn't really a set, since two lopsided circles with different shapes would be distinct even though they have the same elements.

Extensionality doesn't have an existensial quantifier, so it isn't used to prove existence. It is used to prove uniqueness, which is essential for the introduction of function symbols into the language.


Separation Axiom Schema

The original set theories of Gottlob Frege and Georg Cantor assumed the following, which we will most definitely not assume. Let P(x, z1, ..., zn) be a formula of L free in x, z1, ..., zn. Then the following is an axiom:

Thus, comprehension is not a single axiom, but an axiom schema, each instance of which is an axiom. Nothing prevents a set of axioms from being infinite, so long as you say which sentences count as axioms. After all, the point is to know what you get to put in a proof without further justification.

The axiom says that for each z1, ..., zn, we are entitled to assume the existence of a set whose members are exactly the sets x satisfying P(x, z1, ..., zn). In other words, comprehension turns definable properties into things (sets). This is reminiscent to Plato's trick of turning the predicate "is even" into an object: the Form of Evenness.

Naive comprehension is the immediate source of Russell's paradox. Here is an official derivation. Note that squiggly braces have nothing to do with it. They are introduced by an explicit definition, and hence could not introduce an inconsistency unless it could be derived from the axioms without them. Choose P(x) = not (x in x). Then we have:

  1. Ey Ax (x in y <---> x notin x) [comprehension]
  2. | Ax (x in y <---> x notin x) [hyp for EE]
  3. | y in y <---> y notin y [AE, y is free for x in (x in y <---> not x in x)]
  4. | _|_
  5. _|_ [EE, y does not occur free in _|_ or in any formula on which _|_ depends except for the EE hyp]

By the [ _|_ ] rule, every Q follows from naive comprehension. You may be wondering how we get _|_ from A <---> -A in step 4. Well, let's do it once.

  1. A <---> -A
  2. A ---> (A ---> _|_) [defn <--->]
  3. (A ---> _|_) ---> A [defn <--->]
  4. | A [hyp]
  5. | (A ---> _|_) [---> E]
  6. | _|_ [---> E]
  7. (A ---> _|_) [---> I]
  8. A [---> I]
  9. _|_

Is this proof intuitionistically valid?

Zermelo's idea was to block this argument by means of a slight modification of the comprehension axiom, which is called separation. This one is official:

The idea is that separation yields only a subset y of a set z that is already known to exist. Thus, P(x, z1, ..., zn) is a "plan" for "carving" y out of a block z of given material. The plan may be fine, but if there is no sufficiently big block of marble available, it can't be produced.

Watch what happens when we try to re-run Russell's paradox using this axiom:

  1. Az Ey Ax (x in y <---> x in z & x notin x) [comprehension]
  2. Ey Ax (x in y <---> x in z & x notin x) [AE]
  3. | Ax (x in y <---> x in z & x notin x) [hyp for EE]
  4. | y in y <---> y in z & y notin y [AE]
  5. | ?

It doesn't work. Suppose we give ourselves enough marble to work with. Since assuming is as easy as writing, let's overcome the restrictiveness of zermelo's comprehension axiom by means of the...

This says that some set contains all sets. So it must contain itself as well. And the singleton containing itself, etc. Well, assumptions are easy to make, as I said. Now, together with comprehension, we have:

  1. Ew Ax (x in w) [mother]
  2. Az Ey Ax (x in y <---> x in z & x notin x) [comprehension]
  3. | Ax (x in w) [EE hyp]
  4. | Ey Ax (x in y <---> x in w & x notin x) [AE]
  5. | Ax (x in y <---> x in w & x notin x) [hyp for EE]
  6. | y in y <---> y in w & y notin y [AE]
  7. | y in w [AE]
  8. | _|_ [some propositional logic]
  9. _|_ [EE]

So Russell returns if we assume we have enough marble to carve the Russell set out of. But we don't assume [mother] and we hope we haven't assumed enough for the Russell set to come back, though we can't really be sure.

Note that if we put a subproof line next to the whole proof just given, it is actually an intuitionistically valid proof that

So there you have it: the bigger they are, the harder they fall! Russell's paradox has been tamed to prove a theorem.

Keeping with the David and Goliath theme, the littlest thing does exist. We are going to introduce the empty set. I will call it "0", which will turn out to be entirely appropriate when we construct the natural numbers.

"0" is a constant symbol. But it is not a constant symbol in L. We have to introduce it . A constant symbol may be thought of as a 0-ary function symbol. In general, here is how to introduce a function symbol without getting into trouble.


Very important digression on introducing new function symbols

Let P(x1, ..., xn, , y) be a formula of L free in x, y. Let T be a set of axioms. Suppose the axioms are such that

Then we may introduce a new symbol f into the language and use the symbol's definition:

as a new axiom. Existence and uniqueness come out of the quantifier rules automatically, once the definition is allowed into proofs. That's why you better make sure you can prove them before you add it! Just watch:

Existence

  1. Ax1 ... Axn Ay f(x1, ..., xn) = y <---> P(x1, ..., xn, y) [defn f]
  2. f(x1, ..., xn) = f(x1, ..., xn) [=]
  3. Ey f(x1, ..., xn) = y [EI f(x1, ..., xn) is free for y in f(x1, ..., xn) = y]
  4. | f(x1, ..., xn) = y [EE hyp]
  5. | P(x1, ..., xn, y) [---> E]
  6. | P(x1, ..., xn, f(x1, ..., xn))
  7. P(x1, ..., xn, f(x1, ..., xn)) [EE]
  8. Ez P(x1, ..., xn, z) [EI]
  9. Ax1 ... Axn Ez P(x1, ..., xn, z) [AI, x1, ..., xn not free in any uncancelled hypothesis on which prev stmt depends]

Uniqueness

  1. Ax1 ... Axn Ay f(x1, ..., xn) = y <---> P(x1, ..., xn, y) [defn f]
  2. | P(x1, ..., xn, y) & P(x1, ..., xn, z) [hyp]
  3. | f(x1, ..., xn) = y [---> E, AE]
  4. | f(x1, ..., xn) = z [---> E, AE]
  5. | y = z [=]
  6. P(x1, ..., xn, y) & P(x1, ..., xn, z) ---> y = z [---> I]
  7. Ax1 ... Axn Ay P(x1, ..., xn, y) & P(x1, ..., xn, z) ---> y = z [AI]

What justifies this procedure? Just that if we always follow it, we have the property that the theorems of T in L are exactly the theorems in L that are provable from T together with the definition of f. We say that the definition conservatively extends T. Thus, definitions may serve to shorten proofs, but they never allow us to prove things we couldn't have proved without them. IF, that is, we follow the rules just outlined.


Introducing 0

Let's introduce the constant (0-ary function symbol) 0 according to the above rule. Choose

Now we must find derivations witnessing:

  1. ZF |- Ey Ax x notin y.
  2. ZF |- Ay Az (Ax x notin y.) & (Ax x notin z) ---> y = z.

For existence, we have

  1. Ew w = w[existence]
  2. Az Ey Ax (x in y <---> (x in z & _|_)) [comprehension]
  3. | w = w [EE hyp]
  4. | Ey Ax (x in y <---> (x in w & _|_)) [AE]
  5. | | Ax x in y <---> (x in w & _|_) [EE hyp]
  6. | | x in y <---> (x in w & _|_) [AE]
  7. | | x in y <---> _|_ [propositional logic]
  8. | | x notin y [propositional logic]
  9. | | Ax x notin y [AI]
  10. | | Ey Ax x notin y [EI]
  11. | Ey Ax x notin y [EE]
  12. Ey Ax x notin y [EE]

For uniqueness, we have

  1. Ay Az (Ax( x in y <---> x in z) ---> y = z) [extensionality]
  2. | (Ax x in y <---> _|_) & (Ax x in z <---> _|_.) [hyp]
  3. | (Ax x in y <---> _|_) & (Ax x in z <---> _|_.) [AE]
  4. | x in y <---> _|_ [&E, AE]
  5. | x in z <---> _|_ [&E, AE]
  6. | x in y <---> x in z
  7. | Ax (x in y <---> x in z)
  8. | (Ax( x in y <---> x in z) ---> y = z) [AE]
  9. | y = z
  10. (Ax x in y <---> _|_) & (Ax x in z <---> _|_.) ---> y = z [---> I]
  11. Ay Az (Ax x in y <---> _|_) & (Ax x in z <---> _|_.) ---> y = z [AI]

Note that at this level, we shouldn't be getting too uptight about dinky propositional logic UNLESS we are particularly interested in intuitionistic as opposed to classical results. Eventually, you will be so used to the axioms that you can apply them without writing them down first, and similarly for definitions. But there must always be a reference tot he axiom or definition when you do so. Also, for now, quantifier rules should always be applied rigorously.

Now we are entitled to use the constant (0-ary function symbol) 0, together with the definition

This may look funny, but it is just a 0-ary instance of the familiar


Introducing Squiggly Brackets

You are no doubt thinking that it would have been much easier to just say 0 = {x in z: not x = x}, where z is an arbitrary existing set. But where are those wonderful squiggly brackets {x in z: P(x)}? Not in L. So they must be introduced by a definition.

What the heck are those squiggly brackets anyway? They should denote a set y carved out of some given set z according to "plan" P(x). Hence, despite their strange appearance, they denote a distinct function symbol introduced into L for each choice of P(x). This becomes apparent if we write them more conventionally as

Logic doesn't really care if you use funny symbols for function symbols, as long as you use the notation consistently. For f(x, y) you could write x___y, if you like. Don't get hung up staring at notation trying to guess what it might mean. Find the definition. That's all that matters.

So we must show existence and uniqueness in order to introduce this funny looking function symbol. It's easy! Let the set theoretic relation defining comprehension be

We need to show

  1. ZF |- Az Ey Q(z, y)
  2. ZF |- Az Ax Ay Q(z, x) & Q(z, y) ---> x = y.

Existence is a one liner. That's what comprehension is for.

Uniqueness is also easy. That's what extensionality is for.

  1. Ay Az (Ax( x in y <---> x in z) ---> y = z) [extensionality]
  2. | Ax x in y <---> x in z & P(x) & Ax x in w <---> x in z & P(x) [hyp]
  3. | x in y <---> x in z & P(x) [&E, AE]
  4. | x in w <---> x in z & P(x) [&E, AE]
  5. | x in y <---> x in w [propositional logic]
  6. | Ax (x in y <---> x in w ) [AI]
  7. | Ax( x in y <---> x in w) ---> y = w
  8. | y = w
  9. Ax x in y <---> x in z & P(x) & Ax x in w <---> x in z & P(x) ---> y = w [---> I]
  10. Ay Aw (Ax x in y <---> x in z & P(x) & Ax x in w <---> x in z & P(x) ---> y = w) [A I]

A proof like this can be streamlined.

Here is a streamlined version. You won't lose any points for a proof like this. You will earn the grader's undying gratitude.

  1. | x in y <---> x in z & P(x) [hyp]
  2. | x in w <---> x in z & P(x) [hyp]
  3. | x in y <---> x in w [logic]
  4. | Ax (x in y <---> x in w ) [AI]
  5. | y = w [extensionality]

Now we can introduce the squiggly brackets notation with a definition:

Again, this looks more normal in usual function notation:

Exercise 1: Use the new notation to define the function intersect(c), which returns the intersection of all the sets in c.


Pairing Axiom

Comprehension is a powerful tool for carving pretty sets out of big lumpy sets. But we don't have any big lumpy sets. We have only the empty set. That's not too exciting.

Maybe we can get more. How about {0, 0}? Nope. We don't get anything out of the air. We don't get anything unless we assume it into being or prove it to exist from other assumptions. Now we just plain assume that every two sets are elements of another set. Nothing subtle about that.

Now consider the definition

We have to justify it by showing the existence and uniqueness of z.

  1. Ez (x in z & y in z) [pairing]
  2. | x in z & y in z [EE hyp]
  3. | w = w [=]
  4. | w = {q in z: q = x or q = z = y }
  5. | q in w <---> q in z & (q = x or q = y)
  6. | q in w <---> (q = x or q = y) [logic, using 2, 5]
  7. | Aq q in w <---> (q = x or q = y) [AI]
  8. | Ew Aq q in w <---> (q = x or q = y) [EI]
  9. Ew Aq q in w <---> (q = x or q = y) [EE]

Uniqueness is easy.

Yippee. Now we have 0 and we can make pairs. We can also define without further ado:

Existence and uniqueness come immediately from the fact that {x, x} is already a term in the extended language.

We can also define

So we have infinitely many sets that look like binary trees:

And we have a clunky way to represent finite sequences.

Later, we will represent finite sequences as functions on a finite initial segment of the natural numbers, but we don't have even finite functions and relations yet.

Still, not bad for such a dumpy axiom.

Now if we had the set of natural numbers and we could form unions and cross products and power sets, we'd be cooking. But we can't do any of that yet. We have to assume every bit of it, with a separate existence axiom for each. Do you understand now why some mathematicians might have felt queasy about this? Where is the certitude? Where is the intersubjecivity? Where is the guarantee of consistency? Assume, assume, assume.


Union Axiom

We would like to let Uz be the union of all sets yin z. But we don't know that such a set exists. Intuitively it should, since it is no "bigger" than z. So since assumptions are free, let's assume it!

Now define

As usual, we must establish existence and uniquenss before using this definition in a proof. Existence is immediate by the axiom. I leave uniqueness to you.

Exercise 2: prove the uniqueness claim for union.

Exercise 3: Suppose x, y exist. Do we now know that x U y exists? Prove it.

So we now have lots of sets. In fact, we have each of the natural numbers

Note: each natural number is the set of all its predecessors, starting with the empty set.

Indeed, we have the successor operation on the natural numbers:

We also have ordered pairs of natural numbers. But we don't have the set of all natural numbers. That's another matter entirely. It is a "completed" rather than a "potential" infinity.


Defining relations

Function symbols are not the only extra notation we need. It is also convenient to add new relations. This is easier, because we don't have to establish uniqueness and existence. Every formula of set theory expresses a definable relation.

Also, we can define the ordering relation for natural numbers

We can also define interesting properties. Each natural number set defind above has the following property

Defined vocabulary can be used to state new axioms in a more succinct manner


Replacement Axiom Schema

This is one of the most useful axioms, since it yields the cartesian product function and hence relations and functions on a given domain. It falls short, however, of establishing the set of all functions on a given domain. That requires the power set axiom. Each time we get new objects, the set of all those objects becomes a new question requiring more assumptions.

The replacement schema says that if we can define a function symbol in set theory, then the image of any set under the function is also a set. The motivation is the same as in the union case: the image can't be any "bigger" than the original set (because of uniqueness), so such an axiom can't take us up to "scary", huge things like Russell's set.

Let f be an introduced function symbol. Then the following is an axiom:

We can use replacement to define a function taking each y to the image of y under defined function f:

Exercise 4: Prove existence and uniqueness to justify the definition.

Then we can introduce the following, more familiar notation

This isn't the notation justified by separation (why not?). Don't confuse the two.

We can use replacement twice and union to define the familiar cartesian product operation a X b.


Sets as Relations and Functions

We have seen how to add new function symbols and predicates to L by way of definitions. But we can also view sets as relations and functions.

Then define the special conventional notation:

Now you can define reflexivity, symmetry, transitivity, partial order, simple order, and equivalence relations as it is usually done in modern math textbooks.

We can't define the domain of x using the separation schema as it is usually done in elementary textbooks:

The right way to do it is to define the projection functions for ordered pairs as follows, establishing existence and uniqueness, as always:

Then use the replacement schema to define domain and range

Functions are now single-valued relations

Introduce the conventional notation:

To show that a function f exists as a set , follow this procedure:

  1. you must show that sets x, y, containing the domain and range of f exist.
  2. then use separation to carve f out of x X y.
  3. Then show that f is a single-valued relation.
  4. After that, you may only use the notation f(w) if you prove that w in dom(f).


Operations vs. functions

An important word of caution: the operations corresponding to introduced function symbols do not exist as sets.

Exercise 5: Show that the successor operation s does not exist as a set theoretic function. More precisely,

Why does it matter? I'll tell you...


Cardinality

We compare sizes of sets by associating each element of one set with a unique element of the other. Such an association is an injective function. If the function is also onto, then the sets have the same size or cardinality.

Notice that cardinality is defined with an existential quantifier. One must show the existence of f. Since quantifiers range over sets, that means that you must go through the above procedure to show that f exists as a function.


The Power Set Axiom

The power set axiom is as follows:

This axiom allows us to introduce a new squiggly bracket notation:

Notice that this is neither the replacement operation nor the separation operation. It requires the power set axiom to introduce. An alternate notation is:


Cantor's diagonal argument

The most basic result of set theory is that no set is the same size as its power set.

How do we prove this? It requires us to show that a function does not exist:

  1. | x > P(x) [hyp]
  2. | Ef (f is a an injection from x onto P(x)) [defn >]
  3. | | f is a an injection from x onto P(x) [EE hyp]
  4. | | Ay (y in P(x) ---> Ex f(x) = y) [defn onto]
  5. | | {y in x: y notin f(x)} is a subset of x [easy lemma using defn {}]
  6. | | {y in x: y notin f(x}} in P(x) [defn P]
  7. | | Ez (f(z) = {y in x: y notin f(x)}) [AE]
  8. | | | f(z) = {y in x: y notin f(x)} [EE hyp]
  9. | | | f(x) in f(z) <---> f(z) notin f(z) [follows using defn {}]
  10. | | | _|_ [logic]
  11. | | _|_ [EE] are the conditions met?
  12. | _|_ [EE]
  13. x > P(x) ---> _|_ [---> I]
  14. x < P(x)

Make sure you can fill in the missing steps rigorously.


Ordinal Numbers (Optional)

Some definitions:

Thus

are natural numbers.

is not a natural number.

We don't know that the set of all natural numbers exists yet. That is the job of the infinity axiom.


Axiom of Infinity and the Natural Numbers (Optional)

Recall that we don't yet have the set of all natural numbers, so we don't have relations or functions on the natural numbers either (although we do have finite functions defined on finite sets of natural numbers). How do we get the set of all natural numbers? Well, we assume that it exists.

To get the natural numbers themselves, define

If you like, show existence and uniqueness using the infinity axiom and extensionality.


Integers (Optional)

We can define addition and multiplication on the natural numbers recursively, using successor.

[It must be shown in set theory that such recurrence equations define a unique operation on N. That is the first lapse of rigor in these notes.]

Then we can define the integers Z as follows:


Rationals (Optional)

The rationals are obtained similarly. Define multiplication on the natural numbers as follows:

Then define the rationals Q as follows.


Reals (Optional)

So far, we haven't had to use the power set axiom. We need it to get from the rationals to the reals because the set R of all real numbers will be represented as a set of infinite sequences of rationals.

A Dedekind cut is a set of rationals (an element of P(Q)) that is

  1. nontrivial (nonempty, not = to Q)
  2. closed downward (if q is in, then each q' < q is in), and
  3. has no greatest element

R = {c in P(Q): c is a Dedekind cut}.