Malament D., Geometry - and - Spacetime PDF
David B. Malament
Department of Logic and Philosophy of Science
University of California, Irvine
dmalamen@uci.edu
Contents
1 Preface
2 Preliminary Mathematics
2.1 Vector Spaces
2.2 Affine Spaces
2.3 Metric Affine Spaces
2.4 Euclidean Geometry
These notes have not yet been fully de-bugged. Please read them with
caution. (Corrections will be much appreciated.)
1 Preface
The notes that follow bring together a somewhat unusual collection of topics.
In section 3, I discuss the foundations of “special relativity”. I emphasize
the invariant, “geometrical approach” to the theory, and spend a fair bit of
time on one special topic: the status of the relative simultaneity relation within
the theory. At issue is whether the standard relation, the one picked out by
Einstein’s “definition” of simultaneity, is conventional in character, or is rather
in some significant sense forced on us.
Section 2 is preparatory. When the time comes, I take “Minkowski space-
time” to be a four-dimensional affine space endowed with a Lorentzian inner
product. So, to prepare the way, I first give a brief account of “metric affine
spaces” that is sufficiently general to include the Minkowskian variety. There is
more material in this section than is strictly necessary for what follows. But it
is helpful to have it available for reference purposes.
Section 4 is an afterthought. It deals with non-Euclidean (i.e., hyperbolic)
plane geometry. This geometry was, of course, first developed by Gauss, Lo-
batchevsky, Bolyai et al. in the 19th century. But it turns out that one of the
nicest routes to it is via special relativity. More precisely: one gets a simple,
easily visualized model of hyperbolic plane geometry (the so-called “hyperboloid
model”) if one starts with three-dimensional Minkowski spacetime, and then re-
stricts attention to a particular surface in it. With Minkowskian geometry in
hand, very little additional work is needed to develop this application; and it is
very tempting indeed to do it!
2 Preliminary Mathematics
It is one of the assumptions behind this course that a relatively modest invest-
ment in abstract mathematics pays significant dividends for the understanding
of the special theory of relativity and recent debates among philosophers con-
cerning its foundations. It provides the resources with which to pose and answer
precise questions that bear on those debates.
We begin the investment in this first part of the course. Here we review
certain facts about: (i) affine spaces, (ii) metric affine spaces, and (iii) Euclidean
spaces. This will help prepare the way for our consideration of Minkowskian
spaces in the next part. (It is helpful to think of Euclidean and Minkowskian spaces
as but different species of metric affine space, and develop them in parallel.)
affine spaces
↓
metric affine spaces
↙ ↘
Euclidean spaces Minkowskian spaces
2.1 Vector Spaces
One has to start somewhere. In what follows, we take for granted familiarity with
basic facts about vector spaces and linear maps (equivalent to, say, chapters 1
and 3 of Lang [8]). Here we give a quick summary to establish notation and
terminology, and list a number of problems for review purposes.
A vector space (over R) is a structure (V, +, 0, ·) where V is a set (whose
elements are called “vectors”), 0 is an element of V (the “zero element”), + is
a map from V × V to V (“vector addition”) and · is a map from R × V to V
(“scalar multiplication”) satisfying the following conditions.
(VS1) For all u, v in V , u + v = v + u.
(VS2) For all u, v, w in V , (u + v) + w = u + (v + w).
(VS3) For all u in V , u + 0 = u.
(VS4) For all u in V , there is a v in V such that u + v = 0.
(VS5) For all a in R, and all u, v in V , a · (u + v) = a · u + a · v.
(VS6) For all a, b in R, and all u in V , (a + b) · u = a · u + b · u.
(VS7) For all a, b in R, and all u in V , (a b) · u = a · (b · u).
(VS8) For all u in V , 1 · u = u.
(Of course, one sometimes considers vector spaces defined over fields other than
R. But we have no need to do so. For us, a “vector space” will always be a
vector space over the reals.) Sometimes we will abuse our notation and write
a u rather than a · u. And sometimes, following standard practice, we will be a
bit casual about the distinction between a vector space V = (V, +, 0, ·), and its
underlying point set V . We will refer to “the vector space V ”, etc.
In what follows, let (V, +, 0, ·) be a vector space.
Problem 2.1.1. Prove that for all vectors u in V , there is a unique vector v
in V such that u + v = 0. (We write the latter as (−u); given vectors u and v,
we sometimes write (u − v) for (u + (−v)).)
Problem 2.1.2. Prove that for all vectors u in V , if u + u = u, then u = 0.
Problem 2.1.3. Prove that for all vectors u in V , and all real numbers a,
(i) 0 · u = 0
(ii) −u = (−1) · u
(iii) a · 0 = 0.
Example 2.1.1. For every n ≥ 1, the set Rn = {(a1, ..., an) : ai ∈ R for all i} has a natural vector space structure (Rn, +, 0, ·), where 0 is (0, ..., 0), and the operations + and · are defined by

(a1, ..., an) + (b1, ..., bn) = (a1 + b1, ..., an + bn)
a · (b1, ..., bn) = (a b1, ..., a bn).
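The example can be spot-checked numerically. The following sketch verifies a few of the axioms for R3; the helper names vadd and smul are illustrative, not from the text.

```python
# A quick numerical spot-check of the vector space structure on R^3 from
# example 2.1.1. The helper names vadd and smul are illustrative only.

def vadd(u, v):
    """Componentwise vector addition in R^n."""
    return tuple(ui + vi for ui, vi in zip(u, v))

def smul(a, u):
    """Scalar multiplication in R^n."""
    return tuple(a * ui for ui in u)

zero = (0.0, 0.0, 0.0)
u, v, w = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)

assert vadd(u, v) == vadd(v, u)                                   # (VS1)
assert vadd(vadd(u, v), w) == vadd(u, vadd(v, w))                 # (VS2)
assert vadd(u, zero) == u                                         # (VS3)
assert smul(2.0, vadd(u, v)) == vadd(smul(2.0, u), smul(2.0, v))  # (VS5)
assert smul(1.0, u) == u                                          # (VS8)
```

Of course, checking axioms on a few sample vectors is no substitute for the (easy) general proofs; it only illustrates what the operations do.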
Now let W be a subset of V. We say that W is a subspace of V if it is closed under vector addition and scalar multiplication, i.e., satisfies the following two conditions.
(SS1) For all u, v in V , if u and v both belong to W , then so does (u + v).
(SS2) For all u in V , and all a in R, if u belongs to W , then so does (a u).
So, for example, the set of all vectors of the form (a, 0, 0) is a subspace of R3 ,
and so is the set of all vectors of the form (a, 0, c). But the set of all vectors of
the form (a, 0, a + 1) is not a subspace of R3 .
If W is a subspace of V , then it forms a vector space in its own right if we
define vector addition and scalar multiplication as in V , and use the same zero
element 0 as in V . (Question: How do we know that if W is a subspace of V , 0
belongs to W ?) Conditions (SS1) and (SS2) are precisely the conditions needed
to guarantee that these operations are well defined over W .
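The closure conditions can be tested mechanically on the examples just given. The sketch below checks finitely many sample vectors and scalars, which is enough to expose the failure of (SS1) for the set of vectors of the form (a, 0, a + 1); the helper names are illustrative.

```python
# Sketch: testing the closure conditions (SS1) and (SS2) on the two examples
# in the text. Each candidate subspace of R^3 is represented by a membership
# predicate; the function names are illustrative, not from the text.

def in_plane(v):             # vectors of the form (a, 0, c)
    return v[1] == 0

def in_shifted(v):           # vectors of the form (a, 0, a + 1)
    return v[1] == 0 and v[2] == v[0] + 1

def closed_under_ops(member, vectors, scalars):
    """Check (SS1) and (SS2) on finitely many sample vectors and scalars."""
    for u in vectors:
        for v in vectors:
            if member(u) and member(v):
                s = tuple(ui + vi for ui, vi in zip(u, v))
                if not member(s):          # (SS1) fails
                    return False
        for a in scalars:
            if member(u):
                s = tuple(a * ui for ui in u)
                if not member(s):          # (SS2) fails
                    return False
    return True

samples = [(1, 0, 2), (2, 0, 3), (-1, 0, 0)]
assert closed_under_ops(in_plane, samples, [0, 2, -3])        # (a, 0, c): closed
assert not closed_under_ops(in_shifted, samples, [0, 2, -3])  # (a, 0, a+1): not
```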
Problem 2.1.4. Prove that the intersection of any non-empty set of subspaces
of V is a subspace of V .
Let S be a subset of V , and let L(S) be the intersection of all subspaces of V
that contain S (as a subset). Then L(S) is itself a subspace of V . (This follows
from the result in problem 2.1.4.) We call it the (sub)space spanned by S or the
linear span of S. This definition makes sense for any subset S of V , empty or
non-empty, finite or infinite. (The linear span of the empty set ∅ turns out to
be the singleton set {0}. This follows since every subspace of V contains ∅ as a
subset, and the intersection of all subspaces is {0}.) But if S is non-empty, an
equivalent (and more familiar) characterization is available. In this case, we can
then take L(S) to be the set of all (finite) linear combinations of vectors in S,
i.e., all vectors of form a1 u1 + ... + ak uk , where k ≥ 1 and u1 , ..., uk are elements
of S. (Question: Why are we justified in writing the sum without parentheses?)
Problem 2.1.5. Let S be a subset of V . Show that L(S) = S iff S is a subspace
of V .
Let S be a subset of V. We say that S is linearly dependent if there exist (distinct) vectors u1, ..., uk (k ≥ 1) in S and real numbers a1, ..., ak, not all 0, such that a1 u1 + ... + ak uk = 0. Otherwise, S is linearly independent. And we say that S is a basis for V if it is linearly independent and spans V, i.e., L(S) = V. On these definitions, the empty set qualifies as a basis for the trivial vector space {0}. (In part, it was
to secure this result that we took the linear span of the empty set to be {0}. One would like to be able to assert that all vector spaces
have bases. If we had taken the linear span of a set S to be the set of finite
linear combinations of elements in S – even when S is the empty set – then the
trivial vector space {0} would not have had one.)
In the next proposition we collect several important facts about bases. To
simplify our formulation, we limit attention to the case where V is finite dimen-
sional, i.e., the case where there exists a finite subset S of V with L(S) = V .
(So, for example, the vector space Rn is finite dimensional for every n ≥ 1.) Not
all vector spaces are finite dimensional, but they are the only ones of concern
to us in what follows.
Proposition 2.1.1. Let V be finite dimensional. Then all the following hold.
(i) There exists a finite basis for V .
(ii) All bases for V have the same (finite) number of elements. (That number
is called the dimension of V , and is denoted dim(V ).)
(iii) If dim(V ) = n, every linearly independent subset of V with n elements is
a basis for V .
(iv) If dim(V ) = n, every subset of V with n elements that spans V is a basis
for V .
(v) If W is a subspace of V , W is finite dimensional, and dim(W ) ≤ dim(V ).
We skip the proof. It can be found in Lang [8] and almost any other basic
text in linear algebra.
For all n ≥ 1, the vector space Rn has dimension n. A trivial vector space
with one element (the zero vector 0) has dimension 0 (since the empty set is
a basis for the space). As we will observe in a moment, these are, “up to
isomorphism”, the only finite dimensional vector spaces.
If {u1, ..., un} is a basis for V (n ≥ 1), every vector u in V can be expressed uniquely in the form u = a1 u1 + ... + an un. That u can be expressed in this form at all follows from the fact that u belongs to the linear span of {u1, ..., un}, namely V. That the expression is unique follows from the linear independence of {u1, ..., un}. (For if we had two representations

a1 u1 + ... + an un = u = b1 u1 + ... + bn un,

it would follow that

(a1 − b1) u1 + ... + (an − bn) un = 0.

And so, by linear independence, it would have to be the case that ai = bi for all i.)
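For a worked instance in R2: relative to the basis {(1, 1), (1, −1)} (an illustrative choice), the coordinates of a vector can be computed by Cramer's rule.

```python
# Sketch: the unique representation u = a1 u1 + a2 u2 relative to a basis of
# R^2, computed with Cramer's rule. The basis below is an illustrative choice.

def coords_2d(u, b1, b2):
    """Solve a1*b1 + a2*b2 = u for (a1, a2), assuming b1, b2 independent."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    a1 = (u[0] * b2[1] - b2[0] * u[1]) / det
    a2 = (b1[0] * u[1] - u[0] * b1[1]) / det
    return a1, a2

b1, b2 = (1.0, 1.0), (1.0, -1.0)
a1, a2 = coords_2d((3.0, 1.0), b1, b2)
assert (a1, a2) == (2.0, 1.0)        # (3, 1) = 2*(1, 1) + 1*(1, -1)
```

Uniqueness shows up in the fact that the linear system has exactly one solution whenever the basis vectors are independent (the determinant is non-zero).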
Now consider two vector spaces (V, +, 0, ·) and (V′, +′, 0′, ·′). A map Φ : V → V′ is linear if, for all u, v in V, and all a in R,

(1) Φ(u + v) = Φ(u) +′ Φ(v)
(2) Φ(a · u) = a ·′ Φ(u).

These two conditions imply that Φ(0) = 0′, and

Φ(a1 · u1 + ... + ak · uk) = a1 ·′ Φ(u1) +′ ... +′ ak ·′ Φ(uk)

for all k ≥ 1, all real numbers a1, a2, ..., ak, and all vectors u1, u2, ..., uk in V. It follows that Φ[V] is a subspace of V′, and that, for all subsets S of V, Φ[L(S)] = L(Φ[S]). (Two points about notation: (1) If T is a subset of V, Φ[T] is here understood to be the image of T under Φ, i.e., the set of all vectors of form Φ(u) in V′ where u is a vector in V. (2) When there is no danger of confusion, we will use a uniform notation for the vector space operations and zero elements
in different vector spaces. That will allow us to write the equation above, more simply, in the form:

Φ(a1 u1 + ... + ak uk) = a1 Φ(u1) + ... + ak Φ(uk).)

2.2 Affine Spaces

An affine space is a structure (A, V, +) where A is a non-empty set, V is a vector space, and + is a map from A × V to A satisfying the following two conditions.

(AS1) For all p, q in A, there is a unique vector u in V such that q = p + u.
(AS2) For all p in A, and all vectors u, v in V, (p + u) + v = p + (u + v).

(Here our notation is imperfect because the symbol ‘+’ is used for two different maps: the old map from V × V to V (addition within the vector space V), and the new one from A × V to A. But no ambiguity arises in practice.) If q = p + u, we write u as →pq. Thus, q = p + →pq. We refer to the elements of A as “points”, and refer to →pq as the “vector that runs from p to q”. Behind the formalism is an intuitive picture. We think of →pq as an arrow with tail p and head q. (So the assertion q = p + →pq can be understood to mean that if one starts at p, and then follows the arrow →pq, one ends up at q.)
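In coordinates, the affine structure is easy to realize. The sketch below models the affine space R2; the helper names act and vec are assumptions of the sketch, with vec(p, q) playing the role of →pq.

```python
# Sketch of the affine space R^2: points and vectors are both tuples, and
# the names act and vec are illustrative. vec(p, q) plays the role of →pq.

def act(p, u):
    """The affine action p + u of a vector u on a point p."""
    return tuple(pi + ui for pi, ui in zip(p, u))

def vec(p, q):
    """The unique u with q = p + u (condition (AS1)), i.e. →pq = q − p."""
    return tuple(qi - pi for pi, qi in zip(p, q))

p, q = (1.0, 2.0), (4.0, 6.0)
u, v = (0.5, 0.5), (2.0, -1.0)

assert act(p, vec(p, q)) == q     # q = p + →pq: follow the arrow from p to q

# Condition (AS2): acting by u and then v is acting by u + v.
assert act(act(p, u), v) == act(p, tuple(a + b for a, b in zip(u, v)))
```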
In what follows, let (A, V, +) be a finite dimensional affine space. (We
understand the dimension of an affine space to be the dimension of its underlying
vector space.)
Proposition 2.2.1. For all points p, q, r in A,

(i) →pp = 0 (or, equivalently, p + 0 = p)
(ii) →pq = 0 ⇒ p = q
(iii) →qp = −→pq
(iv) →pq + →qr = →pr

Proof. (i) By (AS1), there is a unique u such that p + u = p. Hence, using (AS2),

p + (u + u) = (p + u) + u = p + u = p.

So, by uniqueness, u + u = u and, therefore (recall problem 2.1.2), u = 0. Thus p + 0 = p, i.e., →pp = 0.
(ii) Assume →pq = 0. Then q = p + →pq = p + 0 = p.
(iii) If q = p + u, then u = →pq, and q + (−u) = (p + u) + (−u) = p + (u + (−u)) = p + 0 = p. So →qp = −u = −→pq.
(iv) If q = p + u and r = q + v, then u = →pq, v = →qr, and r = (p + u) + v = p + (u + v). So →pr = u + v = →pq + →qr.

It follows from the proposition that, given any point p in A, the rule of association q ↦ →pq determines a bijection between A and V. To verify that it is injective, assume →pq = →pr for points q and r in A. Then, by clauses (iii) and (iv), →qr = →qp + →pr = −→pq + →pr = 0. Hence, r = q by clause (ii). To verify that the rule of association maps A onto V, we must show that given any vector u in V, there is a point q in A such that →pq = u. But this condition is clearly satisfied by the point q = p + u.
Given any point p in A and any subspace W of V , the set
p + W = {p + u : u ∈ W }
is a subset of A containing p. If W is one-dimensional, we call p + W a line; if W is two-dimensional, we call it a plane.

Problem 2.2.1. Let p and q be points in A, and let W be a subspace of V. Show that the following conditions are all equivalent.

(i) q belongs to p+W
(ii) p belongs to q+W
(iii) →pq ∈ W
(iv) p+W and q+W coincide (i.e., contain the same points)
(v) p+W and q+W intersect (i.e., have at least one point in common)
Problem 2.2.2. Let p1 + W1 and p2 + W2 be lines, and let u1 and u2 be non-zero vectors, respectively, in W1 and W2. Show that the lines intersect iff →p1p2 is a linear combination of u1 and u2.
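For lines whose direction vectors u1 and u2 are linearly independent, the condition in the problem can be tested with a determinant: →p1p2 is a linear combination of u1 and u2 iff det(u1, u2, →p1p2) = 0. Here is a sketch in R3; the particular lines are illustrative choices.

```python
# Numeric sketch of the criterion in problem 2.2.2, for lines in R^3 whose
# direction vectors u1, u2 are linearly independent: →p1p2 is a combination
# of u1 and u2 iff det(u1, u2, →p1p2) = 0.

def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def vec(p, q):
    return tuple(qi - pi for pi, qi in zip(p, q))

u1, u2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
p1 = (0.0, 0.0, 0.0)

# p2 in the plane spanned by u1, u2: the lines p1 + span{u1} and
# p2 + span{u2} intersect (at (2, 0, 0), as one can check directly).
p2 = (2.0, 3.0, 0.0)
assert det3(u1, u2, vec(p1, p2)) == 0.0

# p2 off that plane: the lines are skew and do not intersect.
p2 = (2.0, 3.0, 1.0)
assert det3(u1, u2, vec(p1, p2)) != 0.0
```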
We say that the lines p1 + W1 and p2 + W2 are parallel if W1 = W2 . (We are
allowing a line to count as parallel to itself.) An equivalent characterization is
given in the following proposition. The equivalence should seem obvious, but a
bit of work is necessary to give a complete proof.
Proposition 2.2.2. Two lines are parallel iff either they coincide, or they are
co-planar (i.e., subsets of some plane) and do not intersect.
Proof. Let p1 + W1 and p2 + W2 be any two lines, and let u1 and u2 be non-zero vectors, respectively, in W1 and W2.
Assume first that the lines are parallel (W1 = W2), but do not coincide. Then, by problem 2.2.1, they do not intersect and →p1p2 ∉ W1. The latter assertion implies that the vectors u1 and →p1p2 are linearly independent and, so, span a two-dimensional subspace W of V. To complete the first half of the proof, it will suffice for us to show that the lines p1 + W1 and p2 + W2 are both subsets of the plane p1 + W. Certainly p1 + W1 is a subset of p1 + W, since W1 is a subset of W. Similarly, p2 + W2 is a subset of p2 + W. But p1 + W = p2 + W. (This follows from problem 2.2.1 again and the fact that →p1p2 ∈ W.) So, as claimed, both lines are subsets of p1 + W.
For the converse, assume first that p1 + W1 = p2 + W2. Then p2 belongs to p1 + W1 and, therefore, p2 = p1 + k1 u1 for some k1. So →p1p2 = k1 u1 ∈ W1. It follows (by problem 2.2.1) that p1 + W1 = p2 + W1. So p2 + W1 = p2 + W2. Hence (since p2 + u1 clearly belongs to p2 + W1), p2 + u1 belongs to p2 + W2. So there is a number k2 such that p2 + u1 = p2 + k2 u2. It follows that p2 = p2 + (u1 − k2 u2) and, therefore, u1 = k2 u2. Thus the non-zero vectors u1 and u2 are linearly dependent. So W1 = W2, i.e., our lines are parallel.
Alternatively – we are still working on the converse – assume that the lines p1 + W1 and p2 + W2 do not intersect, and are both subsets of the plane q + W (where W is some two-dimensional subspace of V). Since p1 and p2 both belong to q + W, it must be the case (problem 2.2.1) that →p1q ∈ W and →p2q ∈ W. So, since W is a subspace, →p1p2 ∈ W. Furthermore, since p1 + W1 and p2 + W2 are subsets of q + W, it must be the case that u1 and u2 belong to W. (Consider the point r = p1 + u1 in p1 + W1. Since it belongs to q + W, →qr ∈ W. But u1 = →p1r = →p1q + →qr. So, since both →p1q and →qr belong to W, u1 does too. And similarly for u2.) Since the three vectors u1, u2, and →p1p2 all belong to a two-dimensional subspace (namely W), they cannot be linearly independent. So there are numbers a, b, c, not all 0, such that a u1 + b u2 + c →p1p2 = 0. But c must be 0. Otherwise, we could divide by c and express →p1p2 as a linear combination of u1 and u2. And this, by problem 2.2.2, would contradict our assumption that the lines p1 + W1 and p2 + W2 do not intersect. So u1 and u2 are linearly dependent. Thus, in this case too, W1 = W2, i.e., our lines are parallel.
Let p and q be any two (distinct) points in A, and let W be the subspace of V spanned by the vector →pq. We take the line determined by p and q to be the set

L(p, q) = p + W = {p + a →pq : a ∈ R}.

It is easy to verify (e.g., as a consequence of problem 2.2.1) that L(q, p) = L(p, q), and that L(p, q) is the only line that contains both p and q. We take the line segment determined by p and q, in contrast, to be the subset

LS(p, q) = {p + a →pq : 0 ≤ a ≤ 1}.

Again we have symmetry: LS(q, p) = LS(p, q). Three points p, q, r are said to be collinear, of course, if there is a line to which they all belong. If they are distinct, this is equivalent to the requirement that L(p, q) = L(q, r) = L(r, p). (If the points are not all distinct, they are automatically collinear.)
Proposition 2.2.3. Let p, q be distinct points in A, and let o be any point in A whatsoever (not necessarily distinct from p and q). Then, for every point r on L(p, q), there is a unique number a such that →or = a →op + (1 − a) →oq. Conversely, for every number a, the right side expression defines a point on L(p, q).

Proof. Let r be a point on L(p, q). We can express it in the form r = p + b →pq for some number b. Hence

→or = →op + →pr = →op + b →pq = →op + b (−→op + →oq) = (1 − b) →op + b →oq.

So →or assumes the desired form iff a = (1 − b). We can reverse the argument for the converse assertion.
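Here is a small numerical check of proposition 2.2.3 in the affine space R2; the particular points and the helper names vec and comb are illustrative.

```python
# Numeric check of proposition 2.2.3 in R^2: for r = p + b*→pq and any
# choice of origin o, →or = a*→op + (1 − a)*→oq with a = 1 − b.

def vec(p, q):
    return tuple(qi - pi for pi, qi in zip(p, q))

def comb(a, u, b, v):
    """The linear combination a*u + b*v of two vectors."""
    return tuple(a * ui + b * vi for ui, vi in zip(u, v))

p, q, o = (0.0, 0.0), (2.0, 4.0), (5.0, 1.0)
b = 0.25
r = tuple(pi + b * d for pi, d in zip(p, vec(p, q)))   # a point on L(p, q)

a = 1 - b
assert vec(o, r) == comb(a, vec(o, p), 1 - a, vec(o, q))
```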
Problem 2.2.3. Let p, q, r, s be any four (distinct) points in A. Show that the following conditions are equivalent.

(i) →pr = →sq
(ii) →sp = →qr
(iii) The midpoints of the line segments LS(p, q) and LS(s, r) coincide, i.e., p + (1/2) →pq = s + (1/2) →sr.
Problem 2.2.4. Let p1, ..., pn (n ≥ 1) be distinct points in A. Show that there is a point o in A such that →op1 + ... + →opn = 0. (If particles are present at the points p1, ..., pn, and all have the same mass, then o is the “center of mass” of the n particle system. Hint: Let q be any point at all, and take

o = q + (1/n) (→qp1 + ... + →qpn).)
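The hint can be checked directly in coordinates. In the sketch below (the auxiliary point q and the sample points are arbitrary choices), o is computed as in the hint and the sum of the vectors →opi is confirmed to vanish.

```python
# Sketch of the construction in the hint to problem 2.2.4, for points in R^2:
# take o = q + (1/n)(→qp1 + ... + →qpn) and check that →op1 + ... + →opn = 0.
# The sample points and the auxiliary point q are arbitrary choices.

def vec(p, q):
    return tuple(qi - pi for pi, qi in zip(p, q))

def vsum(vectors):
    """Componentwise sum of a collection of vectors."""
    return tuple(map(sum, zip(*vectors)))

points = [(0.0, 0.0), (4.0, 0.0), (0.0, 8.0), (4.0, 8.0)]
n = len(points)
q = (10.0, -3.0)                                   # any point at all
shift = tuple(s / n for s in vsum(vec(q, p) for p in points))
o = tuple(qi + si for qi, si in zip(q, shift))

assert o == (2.0, 4.0)                             # the centroid of the points
total = vsum(vec(o, p) for p in points)
assert all(abs(c) < 1e-12 for c in total)          # →op1 + ... + →opn = 0
```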
It is not our purpose to develop affine geometry systematically. But we will
present one classic result, Desargues’ theorem. It provides a nice example of the
use of our algebraic methods. First we need a simple lemma.
Proposition 2.2.4. (Collinearity Criterion) Let p, q, r be distinct points in A. They are collinear iff given any point o (a choice of “origin”), there exist numbers a, b, c, not all 0, such that a + b + c = 0 and a →op + b →oq + c →or = 0.

Proof. Assume first the points are collinear. Let o be any point. Since r lies on L(p, q), it follows from proposition 2.2.3 that there is a number k such that →or = k →op + (1 − k) →oq. The desired conditions will be satisfied if we take a = k, b = (1 − k), and c = −1. The argument can also be reversed. Let o be any point and assume there exist numbers a, b, c, not all 0, satisfying the given equations. Without loss of generality (interchanging the roles of p, q, r, if necessary) we may assume that c ≠ 0. If we take k = −a/c, then (1 − k) = −b/c, and →or = k →op + (1 − k) →oq. So, by proposition 2.2.3 again, r belongs to L(p, q).
Figure 2.2.1: Triangles Perspective from a Point
A triangle, for us, is just a set of three (distinct) non-collinear points. Desargues’ theorem deals with the “perspective” properties of triangles in affine spaces of dimension at least 2. (Assume for the moment that our background affine space satisfies this condition.) Let T and T′ be disjoint triangles satisfying two conditions: (i) no point of one is equal to any point of the other, and (ii) no pair of points in one triangle determines the same line as any pair of points in the other. We say they are perspective from a point o if we can label their points so that T = {p, q, r}, T′ = {p′, q′, r′}, and the lines L(p, p′), L(q, q′), and L(r, r′) all contain the point o. (See figure 2.2.1.) We say they are perspective from a line L if we can label them so the lines determined by corresponding sides intersect, and the intersection points L(p, q) ∩ L(p′, q′), L(q, r) ∩ L(q′, r′), and L(p, r) ∩ L(p′, r′) are all on L. (See figure 2.2.2.)
Proposition 2.2.5. (Desargues’ Theorem) Consider any two triangles satis-
fying conditions (i) and (ii). Assume they are perspective from a point o, and
(with labels as above) the lines determined by their corresponding sides intersect
in points
x = L(p, q) ∩ L(p′, q′)
y = L(q, r) ∩ L(q′, r′)
z = L(p, r) ∩ L(p′, r′).
Then x, y, and z are collinear (and so the triangles are perspective from a line).
Figure 2.2.2: Triangles Perspective from a Line
Proof. (Roe [10]) We are assuming that the triples {o, p, p′}, {o, q, q′}, and {o, r, r′} are all collinear. So there are numbers a, b, c such that →op′ = a →op, →oq′ = b →oq, and →or′ = c →or. (Well, not quite. We will only be able to find the numbers if o is distinct from p, q and r. But if it is equal to one of them, then it must be distinct from p′, q′ and r′. So in that case we can run the argument with the roles of p, q, r and p′, q′, r′ interchanged.) The numbers a, b, c must all be different from 1 (since p ≠ p′, etc.).

Now since x lies on both L(p, q) and L(p′, q′), it follows from proposition 2.2.3 that there are numbers d and f such that

d →op + (1 − d) →oq = →ox = f →op′ + (1 − f) →oq′.

Hence (substituting a →op for →op′ and b →oq for →oq′ in the expression on the right),

(d − af) →op + (1 − d − b + bf) →oq = 0.

But →op and →oq are linearly independent. (They could only be proportional if the points o, p, p′, q, q′ were collinear, violating our assumption that the lines L(p, q) and L(p′, q′) are distinct.) So d = af and (1 − d) = b(1 − f). Now it cannot be the case that b = a, since otherwise it would follow that a = 1, contradicting our remark above. So we can solve these equations for d in terms of a and b, and obtain

d = a(1 − b)/(a − b)

and, hence,

→ox = [a(1 − b)/(a − b)] →op − [b(1 − a)/(a − b)] →oq.

Similarly, we have b ≠ c, a ≠ c,

→oy = [b(1 − c)/(b − c)] →oq − [c(1 − b)/(b − c)] →or,

and

→oz = [a(1 − c)/(a − c)] →op − [c(1 − a)/(a − c)] →or.

Therefore,

(a − b)(1 − c) →ox + (b − c)(1 − a) →oy − (a − c)(1 − b) →oz = 0.

And the factors (a − b), (b − c), (a − c), (1 − a), (1 − b), (1 − c) are all non-zero. Note also that the three coefficients sum to 0: (a − b)(1 − c) + (b − c)(1 − a) − (a − c)(1 − b) = 0. So it follows by the collinearity criterion (proposition 2.2.4) that the points x, y, z are collinear.
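Desargues' theorem lends itself to a direct numerical check. The sketch below sets up two triangles in the affine plane R2 that are perspective from a point (the particular configuration and the helper functions intersect and collinear are illustrative choices), computes the three intersection points, and confirms that they are collinear.

```python
# Numeric illustration of Desargues' theorem in the affine plane R^2.
# The configuration is an arbitrary choice; line intersections are computed
# by solving 2x2 linear systems (Cramer's rule).

def vec(p, q):
    return (q[0] - p[0], q[1] - p[1])

def intersect(p1, d1, p2, d2):
    """Intersection of lines p1 + t*d1 and p2 + s*d2 (assumed non-parallel)."""
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    rhs = vec(p1, p2)
    t = (rhs[0] * (-d2[1]) - (-d2[0]) * rhs[1]) / det
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

def collinear(x, y, z, tol=1e-9):
    """Three points are collinear iff the cross product of →xy and →xz is 0."""
    u, v = vec(x, y), vec(x, z)
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

# Two triangles perspective from the point o = (0, 0):
o = (0.0, 0.0)
p, q, r = (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)
a, b, c = 2.0, 3.0, 5.0                      # scale factors along the rays
p2 = (a * p[0], a * p[1])                    # p' = o + a*→op, and so on
q2 = (b * q[0], b * q[1])
r2 = (c * r[0], c * r[1])

x = intersect(p, vec(p, q), p2, vec(p2, q2))
y = intersect(q, vec(q, r), q2, vec(q2, r2))
z = intersect(p, vec(p, r), p2, vec(p2, r2))
assert collinear(x, y, z)                    # the triangles are perspective
                                             # from a line, as the theorem says
```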
Note that our formulation of Desargues’ theorem does not (quite) assert that
if two triangles (satisfying conditions (i) and (ii)) are perspective from a point,
then they are perspective from a line. We have to assume that the lines deter-
mined by corresponding sides of the triangles intersect. (Otherwise we would
not have points x, y, z that are, at least, candidates for being collinear.) But
there is a more general version of Desargues’ theorem within “projective geom-
etry” in which this assumption is not necessary. In this more general version,
the collinear intersection points that figure in the conclusion of the proposition
can be points “at infinity”. (Further discussion of Desargues’ theorem can be
found in almost any book on projective geometry.)
We know that, up to isomorphism, there is only one n-dimensional vector
space (for any n ≥ 0). (Recall problem 2.1.7.) This result extends easily
to affine spaces. The only slight subtlety is in the way one characterizes an
“isomorphism” between affine spaces.
Consider first the canonical examples. We get a 0-dimensional affine space
if we take A to be a singleton set {p}, take V to be a trivial vector space whose
only element is the zero vector 0, and take + to be the map that associates with
p and 0 the element p (i.e., p + 0 = p). Note that AS1 and AS2 are satisfied.
For n ≥ 1, we get an n-dimensional affine space if we take A to be the set
Rn , take V to be the vector space Rn , and take + to be the operation that
associates with a point p = (a1 , ..., an ) and a vector v = (b1 , ..., bn ) the point
p + v = (a1 + b1 , ..., an + bn ). We refer to it as the “affine space Rn ”.
Now let (A, V, +) and (A′, V′, +′) be affine spaces (not necessarily finite dimensional). We say that a bijection ϕ : A → A′ between their underlying point sets is an (affine space) isomorphism if there is a (vector space) isomorphism Φ : V → V′ satisfying the following condition.

(I1) For all p and q in A, →ϕ(p)ϕ(q) = Φ(→pq).
Of course, the two spaces are said to be isomorphic if there exists an isomorphism
mapping one onto the other. (See figure 2.2.3.)
Figure 2.2.3
This definition may seem less than perfectly clear. The idea is this. To qualify as an (affine space) isomorphism, the bijection ϕ must induce a map from V to V′ (sending →pq to →ϕ(p)ϕ(q)) that itself qualifies as a (vector space) isomorphism. Notice that (I1) can also be formulated, equivalently, as follows:

(I2) For all p in A, and all u in V, ϕ(p + u) = ϕ(p) + Φ(u).

((I2) follows from (I1) if we take q = p + u. Conversely, if (I2) holds then, for all p and q in A,

ϕ(q) = ϕ(p + →pq) = ϕ(p) + Φ(→pq).

So →ϕ(p)ϕ(q) = Φ(→pq). This gives us (I1).)
Associated with every affine space isomorphism ϕ : A → A′ is a unique vector space isomorphism Φ : V → V′ satisfying condition (I1). (It is unique because if Φ1 and Φ2 both satisfy the condition, then (using the (I2) formulation)

Φ1(u) = →ϕ(p)ϕ(p + u) = Φ2(u)

for all p in A and all u in V. So Φ1 = Φ2.) The association is not invertible. One cannot recover ϕ from Φ. (There will always be infinitely many bijections ϕ : A → A′ that, together with a given Φ, satisfy (I1).) But as the next proposition indicates, the recovery is possible once one adds as a side constraint the requirement that ϕ take some particular point o in A to some particular point o′ in A′. (We can think of o and o′ as “origins” for A and A′.) We will use the proposition repeatedly in what follows.
Proposition 2.2.6. Let (A, V, +) and (A′, V′, +′) be affine spaces. Further, let Φ : V → V′ be an isomorphism, let o and o′ be points, respectively, in A and A′, and let ϕ : A → A′ be defined by setting ϕ(p) = o′ + Φ(→op) for all points p in A. Then

(i) ϕ(o) = o′
(ii) →ϕ(p)ϕ(q) = Φ(→pq) for all p and q in A
(iii) ϕ is a bijection
(iv) ϕ is the only bijection between A and A′ satisfying conditions (i) and (ii).

Proof. Clause (i) holds because we have

ϕ(o) = o′ + Φ(→oo) = o′ + Φ(0) = o′ + 0′ = o′.

(Here and in what follows we use proposition 2.2.1 and the fact that Φ is an isomorphism.) For (ii), notice that

ϕ(q) = o′ + Φ(→oq) = o′ + Φ(→op) − Φ(→op) + Φ(→oq) = ϕ(p) + [−Φ(→op) + Φ(→oq)]
     = ϕ(p) + Φ(−→op + →oq) = ϕ(p) + Φ(→pq)

for all p and q in A. For (iii) we show, in order, that ϕ is injective and that it maps A onto A′. Assume first there are points p and q in A such that ϕ(p) = ϕ(q). Then Φ(→pq) = →ϕ(p)ϕ(q) = →ϕ(p)ϕ(p) = 0′. Since Φ is injective, it follows that →pq = 0. Therefore, p = q. Thus, ϕ is injective. Next, let p′ be any point in A′. Consider the point p = o + Φ⁻¹(→o′p′). Clearly, →op = Φ⁻¹(→o′p′). Hence, →o′p′ = Φ(→op) and, therefore, p′ = o′ + Φ(→op) = ϕ(p). Since the arbitrary point p′ has a preimage under ϕ, we see that ϕ maps A onto A′. So ϕ is a bijection. Finally, for (iv), note that if ψ is any bijection satisfying conditions (i) and (ii), then, for all p in A, →ψ(o)ψ(p) = Φ(→op) and, therefore, ψ(p) = ψ(o) + Φ(→op) = o′ + Φ(→op) = ϕ(p). So ψ = ϕ.
Proposition 2.2.7. Finite dimensional affine spaces are isomorphic iff they
have the same dimension.
Proof. Let A = (A, V, +) and A′ = (A′, V′, +′) be finite dimensional affine spaces. Assume first that there exists an isomorphism ϕ : A → A′ with corresponding map Φ : V → V′. Since Φ is a (vector space) isomorphism, V and V′ have the same dimension. (Recall problem 2.1.7.) So, by the way we have characterized the dimension of affine spaces, A and A′ have the same dimension.
Conversely, assume V and V′ have the same dimension. Then, by problem 2.1.7 again, there exists a (vector space) isomorphism Φ : V → V′. Let o be a point in A, let o′ be a point in A′, and let ϕ : A → A′ be defined by setting ϕ(p) = o′ + Φ(→op) for all p. We know from proposition 2.2.6 that ϕ is an isomorphism with associated map Φ. So A and A′ are isomorphic.
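The construction in propositions 2.2.6 and 2.2.7 can be made concrete for the affine space R2. In the sketch below, phi_linear is an arbitrary invertible linear map (an assumption of the example, not from the text), and phi is built from it exactly as in proposition 2.2.6.

```python
# Sketch of the construction in proposition 2.2.6 for the affine space R^2:
# given a linear isomorphism Phi and chosen origins o and o', the map
# phi(p) = o' + Phi(→op) satisfies conditions (i) and (I1).

def vec(p, q):
    return (q[0] - p[0], q[1] - p[1])

def phi_linear(u):
    """An invertible linear map on R^2 (an illustrative choice)."""
    return (2.0 * u[0] + u[1], u[0] + u[1])   # matrix [[2, 1], [1, 1]], det 1

o, o2 = (1.0, 1.0), (-3.0, 4.0)               # the origins o and o'

def phi(p):
    """The induced affine map phi(p) = o' + Phi(→op)."""
    img = phi_linear(vec(o, p))
    return (o2[0] + img[0], o2[1] + img[1])

p, q = (2.0, 5.0), (0.0, -1.0)
assert phi(o) == o2                                   # clause (i)
assert vec(phi(p), phi(q)) == phi_linear(vec(p, q))   # condition (I1)
```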
Problem 2.2.5. Let (A, V, +) be a two-dimensional affine space. Let {p1, q1, r1} and {p2, q2, r2} be two sets of non-collinear points in A. Show that there is a unique isomorphism ϕ : A → A such that ϕ(p1) = p2, ϕ(q1) = q2, and ϕ(r1) = r2. Hint: Use proposition 2.2.6 and the fact that a vector space isomorphism is uniquely determined by its action on the elements of any basis.
2.3 Metric Affine Spaces

In this section we consider vector spaces endowed with a (generalized) inner product. Let V be a finite dimensional vector space. A generalized inner product on V is a map ⟨ , ⟩ : V × V → R satisfying the following conditions.

(IP1) For all u, v in V, ⟨u, v⟩ = ⟨v, u⟩.
(IP2) For all u, v in V, and all a in R, ⟨u, a v⟩ = a ⟨u, v⟩.
(IP3) For all u, v, w in V, ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.
(IP4) For all u in V, if u ≠ 0, there is a v in V such that ⟨u, v⟩ ≠ 0.

A metric affine space is an affine space (A, V, +) together with a generalized inner product ⟨ , ⟩ on V. We say that vectors u and v are orthogonal if ⟨u, v⟩ = 0, and, given a vector u, we write u⊥ for the set of all vectors in V orthogonal to u; it is a subspace of V.

Problem 2.3.1. (Polarization identity) Show that for all u, v in V, ⟨u, v⟩ = (1/2) [⟨u + v, u + v⟩ − ⟨u, u⟩ − ⟨v, v⟩].
Proposition 2.3.1. (Projection Theorem) Assume V has dimension n ≥ 1, and assume u is a vector in V such that ⟨u, u⟩ ≠ 0. Then all the following hold.

(i) Every vector v in V has a unique decomposition of the form v = a u + w, where w ∈ u⊥.
(ii) If S is a basis for u⊥, S ∪ {u} is a basis for V.
(iii) u⊥ has dimension (n − 1).

Proof. Since ⟨u, u⟩ ≠ 0, u cannot be the zero-vector. We will use this fact repeatedly.
(i) Let v be any vector in V. Since ⟨u, u⟩ ≠ 0, there is a real number a such that a ⟨u, u⟩ = ⟨u, v⟩. Hence ⟨u, v − a u⟩ = 0. Take w = v − a u. For uniqueness, note that if w1 = v − a1 u and w2 = v − a2 u are both orthogonal to u, then so is the difference vector

w1 − w2 = (v − a1 u) − (v − a2 u) = (a2 − a1) u.
Since ⟨u, w1 − w2⟩ = 0 and ⟨u, u⟩ ≠ 0, it follows that a2 = a1 (and, therefore, w1 = w2).
(ii) Let S be a basis for u⊥. Given any v in V, we can express it in the form v = a u + w with w in u⊥, by (i), and then express w as a linear combination of elements of S. So S ∪ {u} spans V. Moreover, S ∪ {u} is linearly independent. (If u were a linear combination of elements of S, u would be orthogonal to itself.) So S ∪ {u} is a basis for V.
(iii) Let S be a basis for u⊥. By (ii), S ∪ {u} is a basis for V and so contains n elements. And u does not belong to S. (The elements of
S are orthogonal to u, and we have assumed that u is not orthogonal to itself.)
So S contains (n − 1) vectors. Since S is a basis for u⊥ , it follows that u⊥ has
dimension (n − 1).
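Clause (i) of the projection theorem is easy to realize numerically, and it holds even when the inner product is not positive definite. The sketch below uses an illustrative Minkowski-type inner product on R2; the function names ip and project are assumptions of the sketch, not notation from the text.

```python
# Sketch of the decomposition in clause (i) of the projection theorem, for a
# generalized inner product on R^2 of Minkowski type, <u, v> = u1*v1 - u2*v2
# (an illustrative choice; any non-degenerate signature works the same way).

def ip(u, v):
    return u[0] * v[0] - u[1] * v[1]

def project(v, u):
    """Return (a, w) with v = a*u + w and <u, w> = 0; requires <u, u> != 0."""
    a = ip(u, v) / ip(u, u)       # the unique a with a<u, u> = <u, v>
    w = (v[0] - a * u[0], v[1] - a * u[1])
    return a, w

u = (2.0, 1.0)                    # <u, u> = 3, so u is not null
v = (5.0, 7.0)
a, w = project(v, u)
assert ip(u, w) == 0.0            # w lies in the orthogonal complement of u
assert (a * u[0] + w[0], a * u[1] + w[1]) == v   # v = a*u + w
```

Note that the recipe breaks down exactly when ⟨u, u⟩ = 0, which is why the theorem assumes u is not orthogonal to itself.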
The projection theorem has an important corollary. Let S be a basis for V. We say that it is orthonormal (with respect to the inner product ⟨ , ⟩) if, for all u, v in S,

(i) u ≠ v ⇒ ⟨u, v⟩ = 0
(ii) ⟨u, u⟩² = 1.

(It is important that we are not insisting that ⟨u, u⟩ = 1 for all u in S. We are allowing for the possibility that, for at least some u, the inner product is −1.)
Proposition 2.3.2. V has an orthonormal basis.
Proof. The empty set qualifies as an orthonormal basis for any vector space of dimension 0. So we may assume that dim(V) ≥ 1. We claim, first, that there exists a vector u in V such that ⟨u, u⟩ ≠ 0. Suppose not. Then by the polarization identity (problem 2.3.1), it follows that ⟨v, w⟩ = 0 for all v and w. But this is impossible. Since dim(V) ≥ 1, there exists a non-zero vector v in V, and so, by (IP4), there is a vector w in V (corresponding to v) such that ⟨v, w⟩ ≠ 0. Thus, as claimed, there is a vector u in V such that ⟨u, u⟩ ≠ 0.
Now we proceed by induction on n = dim(V). Assume, first, that n = 1, and consider the vector

u′ = u / |⟨u, u⟩|^(1/2).

Clearly, ⟨u′, u′⟩ is either 1 or −1 (depending on whether ⟨u, u⟩ is positive or negative). Either way, {u′} qualifies as an orthonormal basis for V.
Next, assume that n ≥ 2, and that the proposition holds for vector spaces of dimension (n − 1). By the projection theorem, u⊥ has dimension (n − 1). We claim that the induction hypothesis is, therefore, applicable to u⊥ (together with the restriction of ⟨ , ⟩ to u⊥). But there is something here that must be checked. We need to know that the restriction of ⟨ , ⟩ is a generalized inner product, i.e., satisfies conditions (IP1) – (IP3) and (IP4). The first three are automatically inherited under restriction. What we need to check is that the fourth is as well, i.e., that for all nonzero vectors v in u⊥, there is a w in u⊥ (not just in V) such that ⟨v, w⟩ ≠ 0. But it is not hard to do so. Assume to the contrary that v is a nonzero vector in u⊥ that is orthogonal to all vectors in u⊥. Then v must be orthogonal to all vectors v′ in V. (Why? Consider any such vector v′. It can be expressed in the form v′ = a u + w where w ∈ u⊥. By our assumption, v is orthogonal to w (since w is in u⊥). And it is also orthogonal to u (since v is in u⊥). So ⟨v, v′⟩ = ⟨v, a u + w⟩ = a ⟨v, u⟩ + ⟨v, w⟩ = 0.) But this is impossible. By (IP4) again, there is no non-zero vector orthogonal to all vectors in V.
Thus, as claimed, our induction hypothesis is applicable to the (n − 1) dimensional space u⊥ (together with the induced inner product on it). So it must be the case that u⊥ has an orthonormal basis S. And, therefore, by the projection theorem, S′ = S ∪ {u′} qualifies as an orthonormal basis for V. (Here u′ is, again, just the normalized version of u considered above.) Thus the proposition holds in the case where the dimension of V is n. It follows, by the principle of induction, that it holds no matter what the dimension of V.
Given an orthonormal basis S of V (with respect to the generalized inner
product ⟨ , ⟩), there is some number of vectors u in S (possibly 0) such that
⟨u, u⟩ = 1, and some number of vectors u in S (possibly 0) such that ⟨u, u⟩ = −1.
We next show that these two numbers are the same in all bases. We do so by
giving the two numbers an invariant, i.e., basis independent, characterization.
Let us say that a subspace W of V is (with respect to ⟨ , ⟩) positive definite
if, for all w in W, w ≠ 0 ⇒ ⟨w, w⟩ > 0; negative definite if, for all w in W,
w ≠ 0 ⇒ ⟨w, w⟩ < 0; and definite if it is one or the other.
Problem 2.3.2. Let W be a subspace of V. Show that the following conditions
are equivalent.
(i) W is definite.
(ii) There does not exist a non-zero vector w in W with ⟨w, w⟩ = 0.
(Hint: To show that (ii) implies (i), assume that W is neither positive definite
nor negative definite. Then there exist non-zero vectors u and v in W such
that ⟨u, u⟩ ≤ 0 and ⟨v, v⟩ ≥ 0. Consider the function f : [0, 1] → R defined by
f(x) = ⟨x u + (1 − x)v, x u + (1 − x)v⟩. It is continuous. (Why?) So ... .)
The signature of ⟨ , ⟩ is a pair of non-negative integers (n₊, n₋) where

  n₊ = the maximal possible dimension for a positive definite subspace
  n₋ = the maximal possible dimension for a negative definite subspace.

(This definition makes sense. Positive and negative definite subspaces are not,
in general, unique. But among all such, there must be ones with maximal
dimension (since V itself is of finite dimension).)
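For readers who like to check such claims numerically, here is a small Python sketch (an illustration of mine, not part of the notes' formal development): for the inner product on R⁴ with matrix diag(1, −1, −1, −1), it counts the signs that ⟨u, u⟩ takes on two different orthonormal bases and confirms that the counts agree, as proposition 2.3.3 below asserts. The "boosted" basis is a hypothetical example of my own choosing.

```python
from math import cosh, sinh

def ip(u, v, g=(1, -1, -1, -1)):
    """Generalized inner product <u, v> with diagonal matrix g."""
    return sum(gi * ui * vi for gi, ui, vi in zip(g, u, v))

def signature_counts(basis):
    """Return (n_plus, n_minus): how many basis vectors have <u,u> > 0 / < 0."""
    plus = sum(1 for u in basis if ip(u, u) > 0)
    minus = sum(1 for u in basis if ip(u, u) < 0)
    return plus, minus

# Standard orthonormal basis of R^4.
e = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

# A "boosted" orthonormal basis: mixing e0 and e1 with cosh/sinh
# preserves orthonormality, since cosh^2 - sinh^2 = 1.
t = 0.7
b = [(cosh(t), sinh(t), 0, 0), (sinh(t), cosh(t), 0, 0),
     (0, 0, 1, 0), (0, 0, 0, 1)]

print(signature_counts(e), signature_counts(b))  # both are (1, 3)
```

Both bases yield the counts (1, 3), in accord with the basis-independence claim.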
Proposition 2.3.3. Let ⟨ , ⟩ have signature (n₊, n₋), and let S be an orthonormal
basis for V. Then there are n₊ vectors u in S such that ⟨u, u⟩ = 1, and n₋
vectors u in S such that ⟨u, u⟩ = −1. (And, therefore, n₊ + n₋ = dim(V).)
Proof. We give the proof for n₊. (The proof for n₋ is the same except for
obvious modifications.) Let n = dim(V), and let m be the number of vectors u
in S such that ⟨u, u⟩ > 0. We must show that m = n₊. If n = 0, the assertion is
trivial. (For then S is the empty set, and n₊ = 0 = m.) So we may assume that
n ≥ 1. Let S be {u₁, ..., uₙ}. We may also assume that m ≥ 1. For suppose
m = 0, i.e., ⟨uᵢ, uᵢ⟩ = −1 for all i. Then given any non-zero vector u in V, we
can express it as a linear combination u = a₁u₁ + ... + aₙuₙ with at least one
non-zero coefficient, and it follows that

  ⟨u, u⟩ = ⟨a₁u₁ + ... + aₙuₙ, a₁u₁ + ... + aₙuₙ⟩
         = Σᵢ,ⱼ aᵢaⱼ⟨uᵢ, uⱼ⟩ = Σᵢ aᵢ²⟨uᵢ, uᵢ⟩ = −a₁² − ... − aₙ² < 0.

Thus, if m = 0, there are no non-zero vectors u in V such that ⟨u, u⟩ > 0 and,
therefore, n₊ = 0 = m, as required.
So we may assume that n ≥ 1 and m ≥ 1. Reordering the elements in S if
necessary, we may also assume that, for all i,

  1 ≤ i ≤ m ⟹ ⟨uᵢ, uᵢ⟩ = +1
  m < i ≤ n ⟹ ⟨uᵢ, uᵢ⟩ = −1.
the affine space Rⁿ, and let ⟨ , ⟩ be the generalized inner product that assigns
to vectors u = (a₁, ..., aₙ) and v = (b₁, ..., bₙ) the number
(One gets a 0-dimensional metric affine space with signature (0, 0) if one takes
A to be the 0-dimensional affine space ({p}, {0}, +) discussed in section 2.2, and
takes ⟨ , ⟩ to be the trivial inner product that makes the assignment ⟨0, 0⟩ = 0.)
These examples are, in an appropriate sense, the only ones. To make that sense
precise, we need a definition.
Let (A, ⟨ , ⟩) and (A′, ⟨ , ⟩′) be metric affine spaces (with corresponding
underlying vector spaces V and V′). Recall that an (affine space) isomorphism
between A and A′ is a bijection ϕ : A → A′ satisfying the requirement that there
exist a (vector space) isomorphism Φ : V → V′ such that, for all p and q in A,
→ϕ(p)ϕ(q) = Φ(→pq). (As we observed in section 2.2, if one exists, it is certainly
unique.) We say that ϕ is an isometry (between (A, ⟨ , ⟩) and (A′, ⟨ , ⟩′)) if, in
addition, ⟨u, v⟩ = ⟨Φ(u), Φ(v)⟩′ for all vectors u and v in V. (So, to qualify as
an isometry, ϕ must respect the linear structure of V and the metric structure
of ⟨ , ⟩.) (A, ⟨ , ⟩) and (A′, ⟨ , ⟩′) are said to be isometric, of course, if there
exists an isometry mapping one onto the other.
Proposition 2.3.4. Two finite dimensional metric affine spaces are isometric
iff they have the same dimension and signature.
Proof. Let (A, ⟨ , ⟩) and (A′, ⟨ , ⟩′) be finite dimensional metric affine spaces
(with corresponding underlying vector spaces V and V′). Assume there exists
an isometry ϕ : A → A′ with corresponding map Φ : V → V′. Since ϕ is an
(affine space) isomorphism, it follows from proposition 2.2.7 that A and A′
have the same dimension. Since Φ preserves inner products, it takes positive
and negative definite subspaces in V onto subspaces of the same type in V′. So ⟨ , ⟩
and ⟨ , ⟩′ necessarily have the same signature.
Conversely, suppose that the two metric affine spaces have the same dimension
n ≥ 0 and same signature (n₊, n₋). It follows, we claim, that there exists
an isomorphism Φ : V → V′ between their underlying vector spaces that preserves
inner products, i.e., such that ⟨u, v⟩ = ⟨Φ(u), Φ(v)⟩′ for all vectors u and
v in V. If n = 0, the claim is trivial (since then the map Φ taking 0 in V to 0′
in V′ qualifies as an inner product preserving isomorphism). If n ≥ 1, we can
generate Φ by considering orthonormal bases for V and V′. Suppose {u₁, ..., uₙ}
and {u′₁, ..., u′ₙ} are two such, and suppose the two are ordered so that, for all i,
for all numbers a₁, ..., aₙ. It follows easily that Φ is an isomorphism and preserves
inner products. The latter holds since, for all vectors u and v in V, if
u = a₁u₁ + ... + aₙuₙ and v = b₁u₁ + ... + bₙuₙ, then
that ⟨ , ⟩ is positive definite.) It follows that

  ‖u‖ ≥ 0
  ‖u‖ = 0 ⟺ u = 0
  ‖a u‖ = |a| ‖u‖

for all real numbers a. If we think of u as an arrow, we can think of ‖u‖ as its
length.
The polarization identity (problem 2.3.1) can (in the present context) be
cast in the form:

  ⟨v, w⟩ = ½ (‖v‖² + ‖w‖² − ‖v − w‖²).

This tells us that we can reverse the order of definition, and recover the inner
product ⟨ , ⟩ from its associated norm. Another useful identity is formulated
in the following problem. It is called the “parallelogram identity”. (Can you
explain where the name comes from?)
Problem 2.4.1. Prove that for all vectors u and v in V,

  ‖u + v‖² + ‖u − v‖² = 2(‖u‖² + ‖v‖²).
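A quick numeric sanity check of the polarization and parallelogram identities can be run in a few lines of Python; the vectors here are an arbitrary choice of mine, and the standard dot product on R³ stands in for a positive definite ⟨ , ⟩.

```python
from math import isclose, sqrt

def dot(u, v):
    """Standard Euclidean dot product on R^3."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return sqrt(dot(u, u))

v = (1.0, -2.0, 3.0)
w = (0.5, 4.0, -1.0)
vw_sum = tuple(a + b for a, b in zip(v, w))
vw_diff = tuple(a - b for a, b in zip(v, w))

# Polarization: <v, w> = (||v||^2 + ||w||^2 - ||v - w||^2) / 2
polar = (norm(v) ** 2 + norm(w) ** 2 - norm(vw_diff) ** 2) / 2
assert isclose(polar, dot(v, w))

# Parallelogram: ||v + w||^2 + ||v - w||^2 = 2(||v||^2 + ||w||^2)
assert isclose(norm(vw_sum) ** 2 + norm(vw_diff) ** 2,
               2 * (norm(v) ** 2 + norm(w) ** 2))
print("both identities hold")
```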
Now ⟨w, w⟩ ≥ 0 (since this inequality holds for all vectors, whether zero or
nonzero). So

  ⟨v, v⟩ − ⟨u, v⟩²/⟨u, u⟩ ≥ 0,

or, equivalently,

  ⟨u, v⟩² ≤ ⟨u, u⟩⟨v, v⟩.

Taking positive square roots of both sides yields the desired inequality.
Now assume that |⟨u, v⟩| = ‖u‖ ‖v‖, i.e., ⟨u, v⟩² = ⟨u, u⟩⟨v, v⟩. Then (by the
string of equations above), ⟨w, w⟩ = 0. So, by (E2), w = 0. Hence,

  v = (⟨u, v⟩/⟨u, u⟩) u.

Thus the two vectors are proportional. Conversely, assume that one of the two
vectors is a multiple of the other. Since u is non-zero, we have v = a u for some
a. Then
It follows from (E2) that f (x) ≥ 0 for all x, and f (x) = 0 iff xu + v = 0. These
conditions impose constraints on the coefficients of f .
So ‖u + v‖ ≤ ‖u‖ + ‖v‖. Furthermore, equality will hold iff all the sums in the
sequence are equal. Thus, it will hold iff |⟨u, v⟩| = ⟨u, v⟩ and |⟨u, v⟩| = ‖u‖ ‖v‖.
But, by the second half of proposition 2.4.1, these two conditions will both hold
iff one of the two vectors is a multiple of the other and the proportionality factor
is nonnegative.
Now we come to Euclidean geometry proper. We take an n-dimensional
Euclidean space to be a metric affine space with dimension n and signature
(n, 0). As noted above, “up to isometry”, there is only one such.
In what follows, let (A, h , i) be an n-dimensional Euclidean space with
n ≥ 2.
All the usual notions and theorems of Euclidean geometry can be recovered
within our framework. First, ⟨ , ⟩ induces a distance function on A. Given
points p and q in A, we take the distance between them to be

  d(p, q) = ‖→pq‖ = ⟨→pq, →pq⟩^½.
It follows easily that, for all points p, q, and r in A,
d(p, p) = 0
d(p, q) = d(q, p)
d(p, r) ≤ d(p, q) + d(q, r).
Second, ⟨ , ⟩ induces a notion of angular measure. Given points p, o, q in A,
with p and q distinct from o, let θ be the number determined by

  cos θ = ⟨→op, →oq⟩ / (‖→op‖ ‖→oq‖)

and 0 ≤ θ ≤ π. (We write this θ as ∠(p, o, q).) Notice that the definition is well
posed since, by the Schwarz inequality, the expression on the right side is
between −1 and 1 (and for any number x in that interval, there is a unique number
θ in the interval [0, π] such that cos θ = x). Notice too that ∠(p, o, q) = ∠(q, o, p).
Finally, notice that the number ∠(p, o, q) does not depend on the length of the
segments LS(o, p) and LS(o, q), but only on their relative orientation. More
precisely, if p′ and q′ are such that →op′ = a →op and →oq′ = b →oq, with a, b > 0, then
∠(p′, o, q′) = ∠(p, o, q).
Problem 2.4.3. (The measure of a straight angle is π.) Let p, q, r be (distinct)
collinear points, and suppose that q is between p and r (i.e., →pq = a →pr with
0 < a < 1). Show that ∠(p, q, r) = π.
Problem 2.4.4. (Law of Cosines) Let p, q, r be points, with q distinct from p
and r. Show that

  ‖→pr‖² = ‖→qp‖² + ‖→qr‖² − 2‖→qp‖ ‖→qr‖ cos ∠(p, q, r).
(Hint: Use the polarization identity (problem 2.3.1).)
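The Law of Cosines is easy to spot-check numerically. The following Python sketch (with points of my own choosing, not taken from the text) computes ∠(p, q, r) via the inverse cosine and compares the two sides of the identity.

```python
from math import acos, cos, isclose, sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return sqrt(dot(u, u))

def vec(p, q):
    """The vector pq from point p to point q."""
    return tuple(b - a for a, b in zip(p, q))

# Three arbitrary (non-collinear) points in the plane, with vertex q.
p, q, r = (0.0, 0.0), (3.0, 1.0), (1.0, 4.0)
qp, qr, pr = vec(q, p), vec(q, r), vec(p, r)

theta = acos(dot(qp, qr) / (norm(qp) * norm(qr)))  # angle(p, q, r)

lhs = norm(pr) ** 2
rhs = norm(qp) ** 2 + norm(qr) ** 2 - 2 * norm(qp) * norm(qr) * cos(theta)
assert isclose(lhs, rhs)
print("law of cosines holds:", lhs)
```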
Problem 2.4.5. (Right Angle in a Semicircle Theorem) Let p, q, r, o be (distinct)
points such that (i) p, o, r are collinear, and (ii) ‖→op‖ = ‖→oq‖ = ‖→or‖. (So
q lies on a semicircle with diameter LS(p, r) and center o.) Show that →qp ⊥ →qr,
and so ∠(p, q, r) = π/2. (Hint: First show that →op = −→or, →qp = −→oq + →op, and
→qr = −→oq − →op. Then expand the inner product ⟨→qp, →qr⟩ using the latter two
equalities.)
Problem 2.4.6. (Stewart’s Theorem) Let p, q, r, s be points (not necessarily
distinct) with s (collinear with and) between q and r. (So →qs = a →qr for some
a ∈ [0, 1].) Show that

  ‖→pq‖² ‖→sr‖ + ‖→pr‖² ‖→qs‖ − ‖→ps‖² ‖→qr‖ = ‖→qr‖ ‖→qs‖ ‖→sr‖.
The next proposition shows that our choice for the angular measure function
(involving the inverse cosine function) satisfies an additivity condition that one
would expect any natural notion of angular measure to satisfy. It turns out that
it is the only candidate, up to a constant, that satisfies the additivity condition
as well as certain other modest (invariance and continuity) conditions. This
goes some way to explaining where our choice “comes from”. We will not
prove the uniqueness theorem here, but will prove a corresponding theorem
for Minkowskian angular measure in section 3.3. And we will leave it as an
exercise there (problem 3.3.1) to rework the proof so as to apply to the present
(Euclidean) case.
Given co-planar points o, p, q, r, with o distinct from p, q, and r, we say the
vector →oq is between →op and →or if there exist a, b ≥ 0 such that →oq = a →op + b →or.
(This notion of “betweenness” is a bit delicate. It need not be the case (even
when o, p, q, and r are co-planar) that one of the three vectors →op, →oq, →or qualifies
as being between the other two. It could be the case, for example, that the
points are co-planar and the three angles ∠(p, o, q), ∠(q, o, r), and ∠(r, o, p) are
all equal.)
Figure 2.4.1: ∠(p, o, q) + ∠(q, o, r) = ∠(p, o, r)
and hence,

  cos²θ₁ = a² + 2ab ⟨→op, →or⟩ + b² ⟨→op, →or⟩²                 (2.4.1)
  cos²θ₂ = a² ⟨→op, →or⟩² + 2ab ⟨→op, →or⟩ + b²                 (2.4.2)
  cos θ₁ cos θ₂ = (a² + b²)⟨→op, →or⟩ + ab ⟨→op, →or⟩² + ab.    (2.4.3)

Taking the norm of →oq, we also have

  1 = ⟨→oq, →oq⟩ = a² + b² + 2ab ⟨→op, →or⟩.                    (2.4.4)
Subtracting first (2.4.1) from (2.4.4), and then (2.4.2) from (2.4.4), we arrive at

  sin²θ₁ = b² (1 − ⟨→op, →or⟩²)
  sin²θ₂ = a² (1 − ⟨→op, →or⟩²).

Hence, since a, b ≥ 0,

  sin θ₁ sin θ₂ = ab (1 − ⟨→op, →or⟩²).                         (2.4.5)

Combining (2.4.3), (2.4.4), and (2.4.5) yields

  cos θ₁ cos θ₂ = ⟨→op, →or⟩ + ab (1 − ⟨→op, →or⟩²) = ⟨→op, →or⟩ + sin θ₁ sin θ₂.

So

  cos ∠(p, o, r) = ⟨→op, →or⟩ = cos θ₁ cos θ₂ − sin θ₁ sin θ₂ = cos (θ₁ + θ₂)
                 = cos (∠(p, o, q) + ∠(q, o, r)).

Since cos is injective over the domain [0, π], ∠(p, o, q) + ∠(q, o, r) = ∠(p, o, r).
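The additivity claim just proved is easy to confirm numerically. In the Python sketch below (my own illustrative setup), →op and →or are unit vectors in the plane, and →oq is taken to be a nonnegative combination of them, so it is "between" them in the sense defined above.

```python
from math import acos, cos, sin, isclose, sqrt

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return sqrt(dot(u, u))

def angle(u, v):
    """Angular measure: the unique theta in [0, pi] with the given cosine."""
    return acos(dot(u, v) / (norm(u) * norm(v)))

op = (1.0, 0.0)
o_r = (cos(1.2), sin(1.2))   # unit vector at angle 1.2 from op
# oq = 1*op + 0.5*o_r, a nonnegative combination, hence "between" op and o_r
oq = (op[0] + 0.5 * o_r[0], op[1] + 0.5 * o_r[1])

assert isclose(angle(op, oq) + angle(oq, o_r), angle(op, o_r))
print("additivity holds; angle(p,o,r) =", angle(op, o_r))
```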
The proposition has as a corollary the following classic result.
Figure 2.4.2: ∠(p, r, q) + ∠(r, q, p) + ∠(q, p, r) = π
  ∠(q, p, r) = cos⁻¹ (⟨→pq, →pr⟩ / (‖→pq‖ ‖→pr‖)) = cos⁻¹ (⟨→rs, →rt⟩ / (‖→rs‖ ‖→rt‖)) = ∠(s, r, t).

Furthermore, since ⟨→qr, −→rs⟩ = ⟨−→qr, →rs⟩ = ⟨→rq, →rs⟩,

  ∠(r, q, p) = cos⁻¹ (⟨→qr, →qp⟩ / (‖→qr‖ ‖→qp‖)) = cos⁻¹ (⟨→qr, −→rs⟩ / (‖→qr‖ ‖−→rs‖))
             = cos⁻¹ (⟨→rq, →rs⟩ / (‖→rq‖ ‖→rs‖)) = ∠(q, r, s).
Hence
But by the additivity of angular measure, the sum on the right side is equal to
the straight angle ](p, r, t). And the latter, by problem 2.4.3, is equal to π. So
we are done.
3 Minkowskian Geometry and Its Physical Significance
We take an n-dimensional Minkowskian space (with n ≥ 2) to be a metric affine
space of dimension n and signature (1, n − 1). We know (by proposition 2.3.4)
that, up to isometry, there is only one such space. In section 3.1, we proceed
formally and develop certain elements of Minkowskian geometry in parallel to
our development of Euclidean geometry in 2.4. Then, in section 3.2, we turn to
the physical significance of Minkowskian geometry (in the case where n is 4).
There we take the underlying set of points to represent the totality of all point-
event locations in spacetime, and relate the geometry to physical processes in
the world (involving point particles, light rays, clocks, and so forth). In section
3.3, we show that the standard Minkowskian angular measure function is the
only candidate that satisfies certain natural conditions. Finally, in section 3.4
we consider the relative simultaneity relation in special relativity, and prove a
cluster of related uniqueness results for it as well. The latter are motivated
by a longstanding debate over the status of the relative simultaneity relation in
special relativity. (Is it “conventional” in character, or rather, in some significant
sense, forced on us?)
(iii) u⊥ has an orthonormal basis {u₂, u₃, ..., uₙ} (with respect to the inner
product ⟨ , ⟩, as restricted to u⊥). This follows from proposition 2.3.2. (We saw
in the proof of that proposition that the restriction of ⟨ , ⟩ to u⊥ qualifies as a
generalized inner product on that subspace.) Let u₁ be the normalized vector
u/⟨u, u⟩^½. Then the expanded set {u₁, ..., uₙ} is an orthonormal basis for V (by
proposition 2.3.1). Since the signature of ⟨ , ⟩ is (1, n − 1), and ⟨u₁, u₁⟩ = 1,
it must be the case that ⟨uᵢ, uᵢ⟩ = −1, for i = 2, ..., n. It follows that u⊥ is a
negative definite subspace. For given any vector v = a₂u₂ + ... + aₙuₙ in u⊥,
is 0 since v and w are null, and v ⊥ w.) So, by proposition 3.1.1 again, v − a w =
0, i.e., v and w are proportional.
Problem 3.1.1. Show that there are no subspaces of dimension higher than 1
all of whose vectors are causal.
Since ⟨u, u⟩, ⟨w, w⟩, and k are all positive, it follows that ⟨u, w⟩ is positive as
well. So, again, we are led to the conclusion that the pair u, w is co-oriented.
We call the equivalence classes of timelike vectors under this relation temporal
lobes. There must be at least two lobes since, for any timelike vector u
in V, u and −u are not equivalent. There cannot be more than two since, for
all timelike vectors u, v in V, either ⟨u, v⟩ > 0 or ⟨−u, v⟩ > 0. (Remember, two
timelike vectors cannot be orthogonal.) Hence there are exactly two lobes. It
is easy to check that each lobe is convex, i.e., if u, v are co-oriented timelike
vectors and a, b > 0, then (a u + b v) is a timelike vector co-oriented with u and
v.
The relation of co-orientation can easily be extended to the larger set of non-zero
causal (i.e., timelike or null) vectors. Given any two such vectors u and v,
we take them to be co-oriented if either ⟨u, v⟩ > 0 or v = a u with a > 0. (The
second possibility must be allowed since we want a null vector to count as co-oriented
with itself.) Once again, co-orientation turns out to be an equivalence
relation with two equivalence classes that we call causal lobes. These lobes, too,
are convex. (Only minor changes in the proof of proposition 3.1.4 are required
to establish that the extended co-orientation relation is transitive.)
Problem 3.1.2. One might be tempted to formulate the extended definition this
way: two causal vectors u and v are “co-oriented” if ⟨u, v⟩ ≥ 0. But this will not
work. Explain why.
We take a temporal orientation of Minkowski spacetime to be a specification
of one temporal (or causal) lobe as the “future lobe”. In the presence of such
an orientation, we can speak of “future directed” and “past directed” timelike
(and null) vectors.
The next proposition formulates Minkowskian counterparts to propositions
2.4.1 and 2.4.2 (concerning Euclidean spaces).
Proposition 3.1.5. Let u and v be causal vectors in V.
(i) (“Wrong way Schwarz inequality”): |⟨u, v⟩| ≥ ‖u‖ ‖v‖, with equality iff u
and v are proportional.
(ii) (“Wrong way triangle inequality”): If u and v are co-oriented,

  ‖u + v‖ ≥ ‖u‖ + ‖v‖,

with equality iff u and v are proportional.
Proof. (i) If both u and v are null, the assertion follows immediately from proposition
3.1.2. So we may assume that one of the vectors, say u, is timelike. We
can express v in the form v = a u + w where w ∈ u⊥. Hence, ⟨u, v⟩ = a⟨u, u⟩ and
⟨v, v⟩ = a²⟨u, u⟩ + ⟨w, w⟩. Since w belongs to u⊥, it must be spacelike or the
zero vector. In either case, ⟨w, w⟩ ≤ 0. So, since u and v are causal, it follows
that

  ⟨u, v⟩² = a²⟨u, u⟩² = (⟨v, v⟩ − ⟨w, w⟩)⟨u, u⟩ ≥ ⟨v, v⟩⟨u, u⟩ = ‖u‖² ‖v‖².

Equality holds iff ⟨w, w⟩ = 0. But, as noted above, w is either spacelike or the
zero vector. So, equality holds iff w = 0, i.e., v = a u.
(ii) Assume u, v are co-oriented. Then either ⟨u, v⟩ > 0, or both vectors are
null and v = a u for some number a > 0. In the latter case, ‖u + v‖ = ‖u‖ =
‖v‖ = 0, and the assertion follows trivially. So we may assume that ⟨u, v⟩ > 0.
Hence (by clause (i)), ⟨u, v⟩ ≥ ‖u‖ ‖v‖. Therefore,

  (‖u‖ + ‖v‖)² = ‖u‖² + 2‖u‖ ‖v‖ + ‖v‖² ≤ ⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩
               = ⟨u + v, u + v⟩ = ‖u + v‖².

(For the last equality we need the fact that, since u, v are co-oriented, u + v is
causal.) Equality holds here iff ⟨u, v⟩ = ‖u‖ ‖v‖. But, by clause (i) again, this
condition holds iff u and v are proportional.
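Both "wrong way" inequalities can be checked on concrete vectors. The Python sketch below (my own example, using the 2-dimensional Minkowski inner product with matrix diag(1, −1)) picks two co-oriented timelike vectors and verifies both clauses of the proposition.

```python
from math import sqrt

def mip(u, v):
    """Minkowski inner product on R^2 with matrix diag(1, -1)."""
    return u[0] * v[0] - u[1] * v[1]

def mnorm(u):
    """Minkowski norm ||u|| = |<u, u>|^(1/2)."""
    return sqrt(abs(mip(u, u)))

u = (2.0, 0.5)   # timelike: <u,u> = 4 - 0.25 > 0
v = (3.0, -1.0)  # timelike: <v,v> = 9 - 1 > 0; <u,v> = 6.5 > 0, so co-oriented

# Wrong way Schwarz: |<u,v>| >= ||u|| ||v||
assert abs(mip(u, v)) >= mnorm(u) * mnorm(v)

# Wrong way triangle: ||u + v|| >= ||u|| + ||v||
s = (u[0] + v[0], u[1] + v[1])
assert mnorm(s) >= mnorm(u) + mnorm(v)
print("wrong-way inequalities hold")
```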
So far we have formulated our remarks in terms of vectors in V. Now we
switch and formulate them directly in terms of points in A. Our terminology
carries over naturally. Given points p, q in A, we say they are timelike (null,
etc.) related if →pq is timelike (null, etc.). If p and q are both causally related to
a third point o, we say that p and q are in the same causal lobe of o if →op and
→oq are co-oriented. And so forth.
Problem 3.1.3. Let o, p, q be three points in A such that p is spacelike related
to o, and q is timelike related to o. Show that any two of the following conditions
imply the third.
(i) →pq is null.
(ii) →op ⊥ →oq.
(iii) ‖→op‖ = ‖→oq‖.
Given points o, p, q in A, with o distinct from p, q, the vectors →op and →oq
form a (possibly degenerate) angle. There are two special cases in which we
can associate a natural angular measure ∠(p, o, q) with it. If →op and →oq are
both spacelike, we can proceed much as in the case of Euclidean geometry. We
can take ∠(p, o, q) to be the unique number θ in the interval [0, π] such that
⟨→op, →oq⟩ = −‖→op‖ ‖→oq‖ cos θ. (We need to insert the minus sign because the
restriction of the inner product ⟨ , ⟩ to the subspace spanned by two spacelike
vectors is negative definite.)
Of greater interest to us in what follows is the second case. If →op and →oq
are timelike and co-oriented, we take ∠(p, o, q) to be the unique number θ ≥ 0
such that ⟨→op, →oq⟩ = ‖→op‖ ‖→oq‖ cosh θ. (Note that by the wrong way Schwarz
inequality, ⟨→op, →oq⟩ / (‖→op‖ ‖→oq‖) ≥ 1. So existence and uniqueness follow from the fact
that the hyperbolic cosine function maps [0, ∞) onto [1, ∞) injectively. Basic
facts about the hyperbolic functions are summarized at the end of the section.)
In both special cases, it follows immediately that ∠(p, o, q) = ∠(q, o, p), and
that ∠(p, o, q) does not depend on the length of the vectors →op and →oq. More
precisely, if →op′ = a →op and →oq′ = b →oq, with a, b > 0, then ∠(p′, o, q′) = ∠(p, o, q).
These angular measure functions satisfy additivity conditions much like the
one considered in proposition 2.4.3 (in the context of Euclidean geometry). For
the case of “spacelike angles”, the proof is essentially the same. For “timelike
angles”, a few systematic changes are necessary. One arrives at the new version
of the proof by systematically substituting the hyperbolic functions cosh and
sinh for the trigonometric functions cos and sin.
Proposition 3.1.6. Let o, p, q, r be co-planar points with o distinct from
p, q, r. Suppose that (i) the vectors →op, →oq, →or are timelike and co-oriented, and
(ii) →oq is between →op and →or. Then ∠(p, o, q) + ∠(q, o, r) = ∠(p, o, r).
Proof. We may assume ‖→op‖ = ‖→oq‖ = ‖→or‖ = 1. Let θ₁ = ∠(p, o, q), θ₂ =
∠(q, o, r), and let a, b ≥ 0 be such that →oq = a →op + b →or. Then we have

  cosh θ₁ = ⟨→op, →oq⟩ = a + b ⟨→op, →or⟩
  cosh θ₂ = ⟨→oq, →or⟩ = a ⟨→op, →or⟩ + b,
and hence,

  cosh²θ₁ = a² + 2ab ⟨→op, →or⟩ + b² ⟨→op, →or⟩²                (3.1.1)
  cosh²θ₂ = a² ⟨→op, →or⟩² + 2ab ⟨→op, →or⟩ + b²                (3.1.2)
  cosh θ₁ cosh θ₂ = (a² + b²)⟨→op, →or⟩ + ab ⟨→op, →or⟩² + ab.  (3.1.3)

Taking the norm of →oq we also have

  1 = ⟨→oq, →oq⟩ = a² + b² + 2ab ⟨→op, →or⟩.                    (3.1.4)

Subtracting first (3.1.4) from (3.1.1), and then (3.1.4) from (3.1.2), we arrive at

  sinh²θ₁ = b² (⟨→op, →or⟩² − 1)
  sinh²θ₂ = a² (⟨→op, →or⟩² − 1).

Hence, since a, b ≥ 0 and |⟨→op, →or⟩| ≥ ‖→op‖ ‖→or‖ = 1,

  sinh θ₁ sinh θ₂ = ab (⟨→op, →or⟩² − 1).                       (3.1.5)

Combining (3.1.3), (3.1.4), (3.1.5) yields

  cosh θ₁ cosh θ₂ = ⟨→op, →or⟩ − ab (⟨→op, →or⟩² − 1) = ⟨→op, →or⟩ − sinh θ₁ sinh θ₂.

So

  cosh ∠(p, o, r) = ⟨→op, →or⟩ = cosh θ₁ cosh θ₂ + sinh θ₁ sinh θ₂
                  = cosh (θ₁ + θ₂) = cosh (∠(p, o, q) + ∠(q, o, r)).

Since cosh is injective over the domain [0, ∞), ∠(p, o, q) + ∠(q, o, r) = ∠(p, o, r).
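The additivity of timelike angular measure can also be confirmed on a concrete example. In the Python sketch below (coordinates and metric diag(1, −1) are my own illustrative choices), unit timelike vectors are parametrized by "rapidity" t, so that the timelike angle between rapidities t₁ and t₂ works out to |t₁ − t₂|, and additivity is immediate to test.

```python
from math import acosh, cosh, sinh, isclose

def mip(u, v):
    """Minkowski inner product on R^2 with matrix diag(1, -1)."""
    return u[0] * v[0] - u[1] * v[1]

def unit_timelike(t):
    """Unit timelike vector at rapidity t: <u,u> = cosh^2 t - sinh^2 t = 1."""
    return (cosh(t), sinh(t))

def angle(u, v):
    """Timelike angular measure for unit u, v: cosh(theta) = <u, v>."""
    return acosh(mip(u, v))

op, oq, o_r = unit_timelike(0.0), unit_timelike(0.8), unit_timelike(2.0)
# oq is a nonnegative combination of op and o_r (one can solve for the
# coefficients a, b and check both are >= 0), so oq is "between" them.
assert isclose(angle(op, oq) + angle(oq, o_r), angle(op, o_r))
print("timelike angles add; angle(p,o,r) is approximately 2.0")
```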
In section 3.3 we will show that the angular measure function we have intro-
duced for timelike angles is (up to a constant) the only candidate that satisfies
both the additivity condition above and certain natural continuity and invari-
ance conditions.
The next proposition gives a Minkowskian analogue of the Pythagorean the-
orem and the standard projection formulas of Euclidean trigonometry.
Proposition 3.1.7. Let p and q be points timelike related to o, falling in the
same temporal lobe of o, such that →op ⊥ →pq (see figure 3.1.1). Then
(i) ‖→oq‖² = ‖→op‖² − ‖→pq‖².
(ii) ‖→op‖ = ‖→oq‖ cosh θ and ‖→pq‖ = ‖→oq‖ sinh θ, where θ = ∠(p, o, q).
Proof. (i) Since →oq = →op + →pq and →op ⊥ →pq, ⟨→oq, →oq⟩ = ⟨→op, →op⟩ + ⟨→pq, →pq⟩. Our
result now follows because →oq and →op are timelike, and →pq is spacelike or the
zero vector (by clause (iii) of proposition 3.1.1).
(ii) We have ⟨→op, →oq⟩ = ‖→op‖ ‖→oq‖ cosh θ and

  ⟨→op, →oq⟩ = ⟨→op, →op + →pq⟩ = ⟨→op, →op⟩ = ‖→op‖².

Hence, ‖→op‖ = ‖→oq‖ cosh θ. This with (i) yields

  ‖→pq‖² = ‖→op‖² − ‖→oq‖² = ‖→oq‖² (cosh²θ − 1) = ‖→oq‖² sinh²θ.

Since θ ≥ 0, sinh θ ≥ 0. So ‖→pq‖ = ‖→oq‖ sinh θ.
Figure 3.1.1: The “Minkowskian Pythagorean Theorem”
Problem 3.1.4. Let p, q, r, s be distinct points in A such that (see figure 3.1.2)
(i) r, q, s lie on a timelike line with q between r and s, i.e., →rs is timelike and
→rq = a →rs where 0 < a < 1;
(ii) →rp and →ps are null.
Show that →qp is spacelike, and ‖→qp‖² = ‖→rq‖ ‖→qs‖.
Problem 3.1.5. Let L be a timelike line and let p be any point in A. Show the
following.
(i) There is a unique point q on L such that →pq ⊥ L.
(ii) If p ∉ L, there are exactly two points on L that are null related to p. (If
p ∈ L, there is exactly one such point, namely p itself.)
Our final topic in this section concerns the length of “timelike curves” in
Minkowskian spaces. To prepare the way, we need to say something about
limits and differentiability.
Let {uᵢ} be a sequence of vectors in V. Given any vector u in V, we need
to know what it means to say that {uᵢ} converges to u or, equivalently, that
u is the limit of {uᵢ}. (All other notions of concern to us can be defined in
terms of this one.) We will take it to mean that, for all vectors w in V, the
sequence ⟨uᵢ, w⟩ converges to ⟨u, w⟩. (The latter assertion makes sense because
the elements involved are real numbers. Here we are back in the domain of
basic calculus.) It should be noted that we cannot take the statement to mean
that ‖u − uᵢ‖ converges to 0. This characterization is available if the inner
product with which one is working is positive (or negative) definite. But it is
not available here – essentially because the Minkowski norm of a vector can be
0 without the vector being 0. For example, let u be a non-zero null vector, and
let uᵢ = 10u for all i. Then ‖u − uᵢ‖ = ‖9u‖ = 0 for all i, but we don’t want to
say that the sequence {uᵢ} converges to u.
Next, consider a map u : I → V where I is an interval of the form (a, b), [a, b),
(a, b] or [a, b], with a and b elements of the extended real line R ∪ {±∞}. We say it
is continuous at s in I, naturally, if, for any sequence {sᵢ} in I converging to s,
the sequence u(sᵢ) converges to u(s). And we say it is differentiable at s if, for
any sequence {sᵢ} in I converging to s (with sᵢ ≠ s), the sequence

  (u(s) − u(sᵢ)) / (s − sᵢ)

converges to some vector in V. When the condition obtains, we take that vector
to be the derivative of u at s and use the notation du/ds (s) or u′(s) for it.
Various facts about derivatives can now be established much as they would
be in a standard course in calculus. Suppose u : I → V and v : I → V are
differentiable and so is the real valued function f : I → R. Then, for example,
both the following hold for all s in I:

  d(u + v)/ds = du/ds + dv/ds                                 (3.1.6)
  d(f u)/ds = f (du/ds) + (df/ds) u.                          (3.1.7)

And if w₀ is any individual vector, then

  (d/ds) ⟨u, w₀⟩ = ⟨du/ds, w₀⟩.                               (3.1.8)
Now, finally, consider a curve in A, i.e., a map γ : I → A, where I is an
interval as above. Given any point p in A, we can represent γ in the form
γ(s) = p + u(s), where u : I → V is defined by setting u(s) = →pγ(s). We say
that γ is differentiable at s in I if u is, and, when the condition is satisfied, take
its derivative or tangent vector there to be u′(s). (We use the notation dγ/ds (s)
or γ′(s) for this derivative.) Of course, we need to verify that our definition of
γ′(s) is well-posed, i.e., does not depend on the initial choice of “base point” p.
But this is easy. Let q and w : I → V be such that γ(s) = p + u(s) = q + w(s),
for all s in I. Then w(s) = →qp + u(s) and, therefore,

  w(s) − w(sᵢ) = (→qp + u(s)) − (→qp + u(sᵢ)) = u(s) − u(sᵢ)
for all s in I.
We know now what it means to say that a curve in A is differentiable. Other
basic notions from calculus (e.g., second (and higher order) differentiability,
piecewise differentiability, and so forth) can be handled similarly.
Let γ : I → A be differentiable. We say it is timelike (respectively null,
causal, spacelike) if its tangent vector γ 0 (s) is timelike (respectively null, causal,
spacelike) at all points s in I. We can picture timelike curves, for example,
as ones that thread the null cones of all the points through which they pass.
And we can picture null curves as ones whose tangent vectors at every point are
tangent to the null cone based at that point.
Note that null curves need not be straight. (A curve γ : I → A is straight if it
can be represented in the form γ(s) = p + f(s) v, where v is a vector in V, and
f is a real-valued function on I, i.e., if its tangent vectors at different points are
all proportional to one another.) For example, one can have a null curve in the
shape of a helix. (All that is required is that the helix have exactly the right
pitch.) Let’s verify this explicitly, just for the practice. Let p be a point in A,
let u be a unit timelike vector in V, and let v and w be unit spacelike vectors
orthogonal to each other and to u. Consider the curve γ : R → A defined by
setting γ(s) = p + s u + (cos s) v + (sin s) w. We have γ′(s) = u − (sin s) v + (cos s) w
and, so, ⟨γ′(s), γ′(s)⟩ = 1 − sin²s − cos²s = 0 for all s, i.e., γ is a null curve.
(So, for example, if γ is null, then ‖γ′(s)‖ = 0 for all s, and so ‖γ‖ = 0.) And
we can take the length of a piecewise (“jointed”) differentiable curve to be the
sum of the lengths of its pieces. But this notion of length is not of much interest
in general. One reason is the following. Given any two points in A, one can
connect them with a piecewise differentiable curve whose length is 0. (It suffices
to consider a zig-zag (piecewise differentiable) curve all of whose segments are
null.) One can also connect them with differentiable curves of arbitrarily large
length. Thus, in Minkowskian spaces, straight lines can neither be characterized
as (images of) differentiable curves that minimize length between pairs of points,
nor as ones that maximize length between them.
The situation does not get much better if one restricts attention to spacelike
curves. For given any two spacelike related points in A, and any ε > 0, one can
connect them with a differentiable spacelike curve whose length is less than ε
(as well as with differentiable spacelike curves of arbitrarily large length). One
can arrive at the former by first connecting the points in question with a two
segment piecewise differentiable null curve (with length 0), and then approximating
it sufficiently closely with a differentiable spacelike curve. Continuity
considerations guarantee that the approximating curve will have length close to
0.
But one does get an interesting theory of length if one restricts attention to
timelike curves. Given any two timelike related points in A, and any ε > 0, one
can connect them with a differentiable timelike curve whose length is less than
ε. (The argument here is much the same as in the case of spacelike curves. One
looks to timelike curves that closely approximate a jointed null curve.) But –
here is the asymmetry – there exists a longest timelike curve connecting them
(unique up to reparametrization), namely a straight curve. (See figure 3.1.3.)
This follows as a consequence of the wrong way triangle inequality. We capture
the claim in the next proposition. It will be important later in connection with
our discussion of the (so-called) “clock paradox”.
Proposition 3.1.8. Let p and q be points in A that are timelike related, and let
γ : [s₁, s₂] → A be a differentiable timelike curve with γ(s₁) = p and γ(s₂) = q.
Then ‖γ‖ ≤ ‖→pq‖, and equality holds iff γ is straight.
Proof. We can express γ in the form γ(s) = p + u(s) where u : [s₁, s₂] → V is
differentiable. Since →pq is timelike, it follows from proposition 3.1.1 that we can
express u(s) in the form u(s) = f(s) →pq + w(s) where, for all s, w(s) is orthogonal
to →pq. Here f : [s₁, s₂] → R and w : [s₁, s₂] → V are maps that, as is easy to
check, make the assignments

  f(s) = ⟨u(s), →pq⟩ / ⟨→pq, →pq⟩  and  w(s) = u(s) − f(s) →pq.  (3.1.9)

Note that since u is differentiable, so are f and w. Moreover,

  γ′(s) = u′(s) = f′(s) →pq + w′(s)                             (3.1.10)

for all s. These claims all follow from general facts about differentiability noted
above. (Recall (3.1.6), (3.1.7), and (3.1.8).) It also follows that w′(s) ⊥ →pq for
all s, since, by (3.1.8),

  ⟨w′(s), →pq⟩ = (d/ds) ⟨w(s), →pq⟩ = 0.

We claim next that

  u(s₁) = 0     f(s₁) = 0    w(s₁) = 0
  u(s₂) = →pq   f(s₂) = 1    w(s₂) = 0.
To see this, note first that p = γ(s1 ) = p + u(s1 ) and q = γ(s2 ) = p + u(s2 ).
This gives us the equations in the first column. The equations in the next two
columns then follow from (3.1.9).
Now recall (3.1.10). Since w′(s) is orthogonal to the timelike vector →pq, it
must be spacelike or the zero vector. Hence

    kγ′(s)k² = (f′(s))² k→pqk² − kw′(s)k² ≤ (f′(s))² k→pqk²

for all s. But f′(s) > 0 for all s. (Why? γ′(s), but not w′(s), is timelike for
all s. So, by (3.1.10) again, f′(s) ≠ 0 for all s. Since f(s1) < f(s2), it must
be the case that f is everywhere increasing. So f′(s) > 0 for all s.) Thus,
kγ′(s)k ≤ f′(s) k→pqk, with equality iff w′(s) = 0 for all s. It follows that

    kγk = ∫_{s1}^{s2} kγ′(s)k ds ≤ ∫_{s1}^{s2} f′(s) k→pqk ds = k→pqk (f(s2) − f(s1)) = k→pqk,

with equality iff w′(s) = 0 for all s. But the latter condition (the vanishing
of w′(s) for all s) holds iff w(s) = 0 for all s (since w(s1) = w(s2) = 0). So
kγk = k→pqk precisely when γ(s) = p + f(s) →pq, i.e., when γ is a straight line
segment connecting p and q.
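Proposition 3.1.8 lends itself to a quick numerical check. The following sketch (not part of the text; the function name and the sample curves are illustrative) approximates a curve's Minkowskian length in two dimensions by summing segment lengths √(∆t² − ∆x²), and confirms that a straight worldline from p to q is longer than a bent one:

```python
import math

def minkowski_length(points):
    """Sum sqrt(dt^2 - dx^2) over successive (t, x) pairs;
    every segment must be timelike (dt^2 > dx^2)."""
    total = 0.0
    for (t0, x0), (t1, x1) in zip(points, points[1:]):
        dt, dx = t1 - t0, x1 - x0
        assert dt * dt > dx * dx, "segment must be timelike"
        total += math.sqrt(dt * dt - dx * dx)
    return total

# Straight worldline from p = (0, 0) to q = (10, 0): length ||pq|| = 10.
straight = [(float(t), 0.0) for t in range(11)]

# A bent worldline through (5, 2.5): every segment is still timelike,
# but its Minkowskian length is strictly smaller.
bent = [(0.0, 0.0), (5.0, 2.5), (10.0, 0.0)]
```

Making the detour sharper (closer to a jointed null curve) drives the length toward zero, in line with the remark at the start of this passage.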
Recall the hyperbolic functions:

    sinh θ = (e^θ − e^(−θ))/2
    cosh θ = (e^θ + e^(−θ))/2
    tanh θ = sinh θ / cosh θ.
They satisfy the following relations:

    cosh² θ − sinh² θ = 1
    sinh(θ1 + θ2) = sinh θ1 cosh θ2 + cosh θ1 sinh θ2
    sinh(θ1 − θ2) = sinh θ1 cosh θ2 − cosh θ1 sinh θ2
    cosh(θ1 + θ2) = cosh θ1 cosh θ2 + sinh θ1 sinh θ2
    cosh(θ1 − θ2) = cosh θ1 cosh θ2 − sinh θ1 sinh θ2
    tanh(θ1 + θ2) = (tanh θ1 + tanh θ2) / (1 + tanh θ1 tanh θ2)
    tanh(θ1 − θ2) = (tanh θ1 − tanh θ2) / (1 − tanh θ1 tanh θ2).
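These identities are easy to spot-check numerically. A minimal sketch (the function name is ours, not the text's):

```python
import math

def check_identities(t1, t2, tol=1e-9):
    """Verify the listed hyperbolic identities at sample angles t1, t2."""
    assert abs(math.cosh(t1) ** 2 - math.sinh(t1) ** 2 - 1.0) < tol
    assert abs(math.sinh(t1 + t2)
               - (math.sinh(t1) * math.cosh(t2) + math.cosh(t1) * math.sinh(t2))) < tol
    assert abs(math.cosh(t1 + t2)
               - (math.cosh(t1) * math.cosh(t2) + math.sinh(t1) * math.sinh(t2))) < tol
    assert abs(math.tanh(t1 + t2)
               - (math.tanh(t1) + math.tanh(t2)) / (1 + math.tanh(t1) * math.tanh(t2))) < tol
    return True
```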
(CP 2) Timelike lines represent the spacetime trajectories of free massive point
particles, i.e., massive point particles that are not subject to any forces.
(CP 3) Null lines represent the spacetime trajectories of light rays (traveling in
a vacuum).
Group 2 (concerning clocks)
(CP 4) If p and q are timelike related points in A, then k→pqk is the elapsed time
    between them as recorded by a freely falling natural clock whose spacetime
    trajectory contains these points.

(CP 4′) More generally, if p and q are timelike related points in A, and γ is a
    timelike curve that connects them, then the length kγk of γ is the elapsed
    time recorded by a natural clock with spacetime trajectory γ.
(5) We have built in the requirement that “curves” be smooth. So, depend-
ing on how one models collisions of point particles, one might want to restrict
attention here to particles that do not experience collisions.
(6) The principles all involve complex idealizations. For example, (CP 4) and
(CP 4′) take for granted that “natural clocks” can be represented as timelike
curves. But real clocks exhibit some spatial extension, and so they are properly
represented, not as timelike curves, but as “world tubes”. What is true is that,
in some circumstances, it is convenient and harmless to speak of point-sized
“clocks”. If pressed, one might try to cash this out in terms of a sequence
of smaller and smaller clocks whose respective world tubes converge to a single
timelike curve.
(7) Without further qualification, (CP 4′) is really not even close to being
true. It is formulated in terms of arbitrary natural clocks traversing arbitrary
timelike curves (not just freely falling clocks traversing straight ones). But no
clock does very well when subjected to extreme acceleration. Try smashing your
wristwatch against the wall. A more careful formulation of (CP 4′) would have to
be qualified along the following lines: natural clocks measure the Minkowskian
length of the worldlines they traverse so long as they are not subjected to ac-
celerations (or tidal forces) exceeding certain characteristic limits (that depend
on the type of clock involved).
The issues raised here, particularly the role of idealization in the formulation
of physical theory, are interesting and important. But they do not have much
to do with relativity theory as such. They would arise in much the same way
if we undertook to describe spacetime structure in classical Newtonian physics,
and formulated a corresponding set of interpretive principles appropriate to that
setting.
It would take us too far afield to properly discuss particle dynamics in rela-
tivity theory. But certain parts of the story are already to be found in the first
two principles. (CP 2) captures a relativistic version of Newton’s first law of
motion. It asserts that free massive particles travel with constant (subluminal)
velocity. And it follows from (CP 1) that no matter what forces we impress on
a massive particle in a particle accelerator, we will never succeed in pushing it
to the speed of light.
(CP 4′) gives the whole story of relativistic clock behavior (modulo the con-
cerns mentioned above). In particular, it implies the “path dependence” of clock
readings. If two clocks start at a spacetime point p, and travel along different
trajectories to a spacetime point q, then, in general, they will record different
elapsed times for the trip. (E.g., one records an elapsed time of 365 seconds,
the other 28 seconds.) This is true no matter how similar the clocks. (We may
stipulate that they came off the same assembly line.) This is the case because,
as (CP 4′) asserts, the elapsed time recorded by each of the clocks is just the
length of the timelike curve that the clock traversed in getting from p to q and,
in general, those lengths will be different.
In particular, if one clock (A) gets from p to q along a free fall trajectory (i.e.,
if it traverses a straight timelike line), and if the other (B) undergoes acceleration
at some point during the trip (and so has a trajectory that is not straight), then
A will record a greater elapsed time than B. This follows because, as we know
from proposition 3.1.8, the length of A’s trajectory will be k→pqk and the length
of B’s will be some number smaller than k→pqk. In Minkowskian geometry, again,
of all timelike curves connecting p and q, the straight line connecting them is
the longest, not the shortest. (Here is one way to remember how things work.
To get a clock to read a smaller elapsed time between p and q than the maximal
value, one will have to accelerate the clock. Now acceleration requires fuel, and
fuel is not free. So we have the principle that saving time costs money!)
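The asymmetry between A and B can be made concrete with a toy computation (a sketch with illustrative numbers, not from the text): if B moves away from A at constant speed v for half the trip and returns at the same speed, each leg of B's trajectory has its Minkowskian length shortened by the factor (1 − v²)^(1/2).

```python
import math

def elapsed_time_inertial(T):
    # Clock A: straight worldline from p to q; elapsed time is ||pq|| = T.
    return T

def elapsed_time_out_and_back(T, v):
    # Clock B: out at speed v for coordinate time T/2, then back at speed v.
    # Each leg is a timelike segment of Minkowskian length (T/2)*sqrt(1 - v*v).
    assert 0 <= v < 1
    return T * math.sqrt(1 - v * v)
```

With T = 10 and v = 0.6, for instance, B records 8 units of elapsed time to A's 10; as v approaches 1 (the jointed null curve of figure 3.1.3), B's reading approaches 0.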
The situation described here was once thought paradoxical because it was
believed that, “according to relativity theory”, we are equally well entitled to
think of clock B as being in a state of free fall and clock A as being the one that
undergoes acceleration. And hence, by parity of reasoning, it should be clock
B that records the greater elapsed time. The resolution, if one can call it that,
is that relativity theory makes no such claim. The situations of A and B are
not symmetric. B accelerates; A does not. The distinction between accelerated
motion and free fall makes every bit as much (observer independent) sense in
relativity theory as it does in classical physics.
Though in this course we are only dealing with so-called “special relativ-
ity”, that special, limiting case of relativity in which all spacetime curvature is
assumed to vanish, it is worth making one remark here about the general situ-
ation. If one considers only Minkowski spacetime, one might imagine that the
distinction between free fall and accelerated motion plays a more important role
in the determination of clock behavior than it does in fact. It is true there (in
Minkowski spacetime) that given any two timelike related points p and q, there
is a unique straight timelike line connecting them, and it is longer than any other
(non-straight) timelike curve connecting the points. But relativity theory ad-
mits spacetime models (with curvature) in which the situation is qualitatively
different. It can be the case that there is more than one timelike “geodesic”
connecting two points, and these geodesics can have (and, in general, will have)
different lengths. So it can be the case that clocks passing between two points
record different elapsed times even though both are in a state of free fall. Fur-
thermore – this follows from the preceding claim by continuity considerations
alone – it can be the case that of two clocks passing between the points, the
one that undergoes acceleration during the trip records a greater elapsed time
than the one that remains in a state of free fall. (What does remain true in
all relativistic spacetime models is a local version of the situation in Minkowski
spacetime. If one restricts attention to sufficiently small (and properly shaped)
neighborhoods, then, given any two timelike related points in the neighborhood,
there is a unique geodesic (in the neighborhood) connecting them, and its length
is greater than that of any other timelike curve (in the neighborhood) connecting
them.)
(CP 4′) has many interesting consequences. Here is one more. Given any two
timelike related points p and q in A, and any ε > 0, it is possible to travel from
one to the other in such a way that the elapsed time of the trip (as recorded
by the stopwatch one carries) is less than ε. This follows immediately since,
as we saw in section 3.1, there is a timelike curve connecting p and q whose
length is less than ε. (Once again, there is a zig-zag null curve connecting them,
and it can be approximated arbitrarily closely with a timelike curve. See figure
3.1.3.) It suffices to follow that curve. (Here we pass over the possibility that
the accelerations required for the trip would tear us and our stopwatch apart!
We have encountered the principle that saving time costs money. Now we see
that with enough money – enough to pay for all the fuel needed to zig and zag
– one can save as much time as one wants!) This example shows the power
of thinking about special relativity in geometric terms. The claim made is a
striking one. But it is obvious if one has a good intuitive understanding of
Minkowskian geometry (and keeps (CP 4′) in mind).
And speaking of intuitions, we should mention that though the path depen-
dent behavior of clocks in relativity theory may seem startling at first, it does
not take long to become perfectly comfortable with it. It helps to keep the anal-
ogy with automobile odometers in mind. If two cars are driven from NY to LA
along different routes, their odometers will, in general, record different elapsed
distances. Why? Because their routes have different lengths and odometers
record route length. That is what the latter do. Similarly clocks record the
length of their (four-dimensional) routes through Minkowski spacetime. That
is what they do. The one phenomenon is no more puzzling than the other.
[figure 3.2.1: points p1, p2, p3 on one timelike worldline and q1, q2 on a
parallel worldline, joined by null segments]
ray is a (zig-zag) null line, then the successive intervals k→p1p2k, k→p2p3k, ... are
all equal. So the number of light arrivals – “ticks” of the clock – is a measure
of aggregate elapsed distance along the mirror’s worldline. (The computation is
given in the following proposition.)
Proposition 3.2.1. Let p1, p2, p3, q1, q2 be points satisfying the following con-
ditions. (See figure 3.2.1.)

(i) →p1p2 is timelike

(ii) →q1q2 = a →p1p2 for some a > 0

(iii) →p2p3 = b →p1p2 for some b > 0

(iv) The vectors →p1q1, →q1p2, →p2q2, →q2p3 are all null.

(v) The vectors in (iv) all have the same orientation as →p1p2. (This assump-
tion is actually redundant. It follows from the others.)

Then →p1p2 = →p2p3.
Proof. Since →p1q1 + →q1p2 = →p1p2 and →q1p2 + →p2q2 = →q1q2 = a →p1p2, we have

    (1 − a) →p1p2 = →p1q1 − →p2q2.

Taking the inner product of each side with itself, and using (i) and (iv), we
arrive at

    −2 h→p1q1, →p2q2i = (1 − a)² k→p1p2k² ≥ 0.

Hence, h→p1q1, →p2q2i ≤ 0. But, by (v), →p1q1 and →p2q2 are co-oriented. So it must
be the case that h→p1q1, →p2q2i = 0 and, therefore, a = 1. Thus →p1p2 = →q1q2, as
claimed.
Let us now, finally, consider measuring rods and (CP5). It should be said
immediately that, from the vantage point of relativity theory, measuring rods
are extremely complicated objects. One can, for certain purposes, represent
clocks by timelike curves. Correspondingly, one can (for certain purposes) rep-
resent measuring rods by two-dimensional timelike surfaces. (Let us agree that
a surface is timelike if, at every point, there is a tangent vector to the surface
that is timelike.) But the latter are much more complicated geometrically than
timelike curves. In fact, we will consider only the special case of rods in a state
of non-rotational free fall so that the timelike surfaces we have to work with will
be (fragments of) two-dimensional planes.
Figure 3.2.2 shows one such fragment of a plane. The indicated parallel
timelike lines are supposed to represent the worldlines of distance markers on
our (non-rotating, free falling) measuring rod. Now suppose p and q are spacelike
related points in the plane so situated that →pq is orthogonal to the indicated
timelike lines. (This amounts to saying, as we shall soon see, that p and q are
simultaneous relative to an observer co-moving with the ruler.) To keep things
simple, imagine that p and q both fall on marker lines (as in the figure). Our
observer co-moving with the rod will naturally take the spatial distance between
p and q to be the number of marker spaces between those lines. (CP5) tells us
that this distance is (relative to our choice of units) precisely the magnitude
k→pqk. Thus this coordinating principle too correlates geometrical magnitudes
(determined by the inner product h , i) with (idealized) measurement proce-
dures.
This completes our very brief discussion of principles (CP1) through (CP5).
We now turn to the relation of “relative simultaneity” and the magnitudes
derived from it – relative speed, relative temporal distance, and relative spatial
distance.
Given a timelike vector u (or a timelike line L generated by u), we say that
two points p and q are simultaneous relative to u (or relative to L) if →pq is
orthogonal to u. In section 3.4 we will consider to what extent this standard notion
of relative simultaneity is arbitrary and to what extent it is forced. But for the
moment, we put such concerns aside, and simply work out the consequences of
adopting this notion.
Given any timelike vector u, the relation of simultaneity relative to u is an
equivalence relation. Its associated equivalence classes may be called simultane-
ity slices relative to u. Each is a three-dimensional affine subspace generated by
u⊥ . In particular, given any point p, the slice containing p is just p+u⊥ . (Recall
from our discussion of affine subspaces in section 2.2 that a point q belongs to
this set iff it is of the form q = p + v where v is in u⊥. The latter condition holds
iff →pq is orthogonal to u. So q belongs to p + u⊥ iff q is simultaneous with p
relative to u. Recall too that, by proposition 3.1.1, u⊥ is a three-dimensional,
spacelike subspace of V.) One speaks of a “foliation” of A by (relative) simultaneity
slices. (See figure 3.2.3.)
[figure 3.2.3: simultaneity slice p + u⊥ orthogonal to the timelike vector u]

The standard textbook expressions for relative speed, relative temporal dis-
tance, relative spatial distance and the like can all be derived using a simple
formalism for projecting vectors. Let u be a unit timelike vector in V, which
we may as well take to be future-directed. Let Pu : V → V and Pu⊥ : V → V
be the associated linear maps that project a vector w onto its components,
respectively, proportional to, and orthogonal to, u. Thus,

    Pu(w) = hw, ui u   and   Pu⊥(w) = w − hw, ui u.
[figure 3.2.4: timelike lines L and L′ meeting at o, with p on L and q on L′]
It also follows immediately from (3.2.2) that the relative speed v between freely
falling individuals (we may safely drop the subscript) satisfies 0 ≤ v < 1. Note
for future reference that, since

    v = tanh θ = sinh θ / cosh θ = (cosh² θ − 1)^(1/2) / cosh θ,

we have

    cosh θ = 1 / (1 − v²)^(1/2).        (3.2.3)
Now let L1 , L2 , L3 be three timelike lines representing three freely falling
individuals O1 , O2 , O3 that pass each other at point o. Further let it be the case
that the three lines are co-planar, with L2 between the other two. Let θij be
the hyperbolic angle between lines Li and Lj , and let vij be the relative speed
between individuals Oi and Oj. By proposition 3.1.6, θ13 = θ12 + θ23. Hence,
by the addition formula for tanh,

    v13 = tanh θ13 = tanh(θ12 + θ23) = (v12 + v23) / (1 + v12 v23).
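Since hyperbolic angles add (proposition 3.1.6) while relative speeds are their tanh values, speeds combine by the tanh addition formula. A numerical sketch (the function name is illustrative):

```python
import math

def add_speeds(v12, v23):
    """Combine relative speeds via hyperbolic angles:
    tanh(theta12 + theta23) = (v12 + v23) / (1 + v12*v23)."""
    theta12 = math.atanh(v12)
    theta23 = math.atanh(v23)
    return math.tanh(theta12 + theta23)

# Agrees with the closed-form expression, and never reaches 1.
for v12, v23 in [(0.5, 0.5), (0.9, 0.9), (0.1, 0.8)]:
    closed_form = (v12 + v23) / (1 + v12 * v23)
    assert abs(add_speeds(v12, v23) - closed_form) < 1e-9
    assert add_speeds(v12, v23) < 1
```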
Problem 3.2.1. Let o, p, q, r, s be distinct points where (see figure 3.2.5)

(i) o, p, q lie on a timelike line L with p between o and q;

(ii) o, r, s lie on a timelike line L′ with r between o and s;

(iii) →pr and →qs are null;

(iv) →oq, →pr, and →qs are co-oriented.

Show that

    k→rsk / k→pqk = [(1 + v)/(1 − v)]^(1/2),

where v is the speed that the individual with worldline L attributes to the indi-
vidual with worldline L′. This formula arises in discussions of the “relativistic
Doppler effect”. (Hint: First show that k→rsk/k→pqk = k→osk/k→oqk. Then show that if X
is the ratio k→osk/k→oqk, then X² − 2X(1 − v²)^(−1/2) + 1 = 0. Finally, show that X > 1. It
follows that the equation has a unique solution: X = [(1 + v)/(1 − v)]^(1/2).)
[figure 3.2.5: timelike lines L and L′ through o, with q on L and r, s on L′]
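One can verify numerically that X = [(1 + v)/(1 − v)]^(1/2) satisfies the quadratic given in the hint (a sketch; the function names are ours):

```python
import math

def doppler_factor(v):
    """Ratio ||rs||/||pq|| = sqrt((1 + v)/(1 - v)) for relative speed 0 <= v < 1."""
    assert 0 <= v < 1
    return math.sqrt((1 + v) / (1 - v))

def satisfies_hint_quadratic(v, tol=1e-9):
    """Check that X = doppler_factor(v) solves X^2 - 2*X*(1 - v^2)^(-1/2) + 1 = 0."""
    X = doppler_factor(v)
    return abs(X * X - 2 * X / math.sqrt(1 - v * v) + 1) < tol
```

For v = 0.6, the factor is exactly 2; for v = 0 it is 1, as one would expect when the two worldlines coincide in direction.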
point on L such that →op ⊥ →pq. In this case, arguing much as before, and using,
for example, the result of problem 3.1.3, we reach the conclusion that

    the speed that O attributes to the light ray = k→pqk / k→opk = 1.        (3.2.5)
Thus, the speed of light turns out to be constant in this strong sense: it is
the same, for all observers, in all directions. In particular, if a light ray travels
from o to q, and is then reflected back to L (after encountering a mirror),
then, as judged by O, the speed of the light ray is the same on its outgoing trip
as it is on its return trip. (In both cases, the speed is 1.) This follows as an
immediate consequence of the way we have construed the relation of relative
simultaneity. We will return to consider the relation between (relative) light
speed and (relative) simultaneity in section 3.4.
Next we consider the notion of “relative elapsed time”. Consider again the
situation represented in figure 3.2.4. Timelike lines L and L′ represent two
freely falling individuals O and O′ that pass each other at point o. O′ says that
the elapsed time between o and q is k→oqk, but O says that the elapsed time is
k→opk, since he takes p and q to be simultaneous. But, by proposition 3.1.7 and
equation (3.2.3), if v is the speed of O′ relative to O,
    k→opk = k→oqk cosh θ = k→oqk / (1 − v²)^(1/2) > k→oqk.
Thus,
    if v is the relative speed between two freely falling individuals O and
    O′, and if ∆T is the elapsed interval of time between two events on
    O′’s worldline as determined by O′, then O will determine the time
    interval to be ∆T / (1 − v²)^(1/2).
O determines the elapsed interval to be greater than does O′. (“O says that O′’s
clock is running too slowly.”) Again, our formulation is perfectly symmetric;
the assertion remains true if one interchanges the roles of O and O′. (“O′ also
says that O’s clock is running too slowly.”) Though one does understand the
statements in quotation marks, they are misleading and, in fact, have led to
considerable misunderstanding. The relativistic “time dilation effect” should
not be understood as resulting from some kind of clock “disturbance”.
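Put numerically (a sketch with an illustrative name, not from the text): for v = 0.6 the dilation factor 1/(1 − v²)^(1/2) is 1.25, and it grows without bound as v approaches 1.

```python
import math

def dilated_interval(delta_T, v):
    # The interval O attributes to an elapsed time delta_T on the other
    # observer's worldline, at relative speed 0 <= v < 1.
    assert 0 <= v < 1
    return delta_T / math.sqrt(1 - v * v)
```

Because the formula depends only on the relative speed v, the same factor applies whichever of the two observers does the determining, reflecting the symmetry noted above.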
We close this section, finally, with a brief account of “relative spatial length”.
Consider a measuring rod in a state of free fall. Let its front and back ends be
represented by timelike lines L′ and L′′. (See figure 3.2.7.) Further, let L be a
timelike line, co-planar with L′ and L′′, that intersects the former at o. We take
L to represent a free-falling observer O. Let θ be the hyperbolic angle between
L and L′, and let v be their relative speed. Finally, let p and r be points on
L′, and q a point on L′′, such that (see figure 3.2.7), (i) →oq is orthogonal to
L′, (ii) →pq is orthogonal to L, and (iii) →qr is co-aligned with L. It follows that
∠(o, r, q) = ∠(p, r, q) = θ.
[figure 3.2.7: timelike lines L, L′, L′′, with points o, p, q]
It follows immediately that

    k→pqk = k→oqk / cosh θ = k→oqk (1 − v²)^(1/2) < k→oqk.
Thus,
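Numerically (a sketch; the name is illustrative): a rod of rest length k→oqk, at relative speed v, is assigned the shorter length k→oqk (1 − v²)^(1/2).

```python
import math

def contracted_length(rest_length, v):
    # ||pq|| = ||oq|| * sqrt(1 - v**2): the spatial length O assigns to a
    # rod of the given rest length, at relative speed 0 <= v < 1.
    assert 0 <= v < 1
    return rest_length * math.sqrt(1 - v * v)
```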
Then there is a constant K such that, for all p and q in Ho+, f(p, q) = K ∠(p, o, q).
(Recall that ∠(p, o, q) is defined by the requirement that h→op, →oqi = cosh ∠(p, o, q).)
Note that we have the resources in hand for understanding the requirement
that f : Ho+ × Ho+ → R be “continuous”. This comes out as the condition that,
for all p and q in Ho+ , and all sequences {pi } and {qi } in Ho+ , if {pi } converges to
p and {qi } converges to q, then f (pi , qi ) converges to f (p, q). (And the condition
that {pi} converges to p can be understood to mean that the vector sequence
{→pip} converges to the zero vector 0.)
Note also that the invariance condition is well formulated. For if ϕ : A → A
is an isometry of (A, h , i) that keeps o fixed and preserves temporal orientation,
then ϕ(p) and ϕ(q) are both points on Ho+ (and so (ϕ(p), ϕ(q)) is in the domain
of f). ϕ(p) belongs to Ho+ since

    k→oϕ(p)k = k→ϕ(o)ϕ(p)k = kΦ(→op)k = k→opk = 1

and →oϕ(p) is future-directed. And similarly for ϕ(q). (Here Φ is the vector space
isomorphism associated with ϕ. Recall the discussion preceding proposition
2.2.6.)
Proof. Given any four points p1, q1, p2, q2 in Ho+ with h→op1, →oq1i = h→op2, →oq2i,
there is a temporal orientation preserving isometry ϕ : A → A such that ϕ(o) =
o, ϕ(p1) = p2, and ϕ(q1) = q2. (We prove this after completing the main
part of the argument.) It follows from the invariance condition that f(p1, q1) =
f(p2, q2). Thus we see that the number f(p, q) depends only on the inner product
h→op, →oqi, i.e., there is a map g : [1, ∞) → R such that

    f(p, q) = g(h→op, →oqi),
Clearly, q and r belong to Ho+ (since cosh² θ − sinh² θ = 1 for all θ). Multiplying
the first of these equations by sinh(θ1 + θ2), the second by sinh θ2, and then
subtracting the second from the first, yields

    sinh(θ1 + θ2) →oq − (sinh θ2) →or
        = [sinh(θ1 + θ2) cosh θ2 − cosh(θ1 + θ2) sinh θ2] →op
        = [sinh((θ1 + θ2) − θ2)] →op = (sinh θ1) →op.
It follows that →oq is between →op and →or. (If θ1 = 0 = θ2, then q = p = r and
the three vectors are identical. Alternatively, if either θ1 > 0 or θ2 > 0, we can
express →oq in the form →oq = a →op + b →or, with non-negative coefficients

    a = sinh θ1 / sinh(θ1 + θ2)
    b = sinh θ2 / sinh(θ1 + θ2).)
    g(h→op, →ori) = f(p, r) = f(p, q) + f(q, r) = g(h→op, →oqi) + g(h→oq, →ori).    (3.3.4)
    h→op, →ori = cosh(θ1 + θ2)
    h→op, →oqi = cosh θ2
    h→oq, →ori = cosh(θ1 + θ2) cosh θ2 − sinh(θ1 + θ2) sinh θ2
              = cosh((θ1 + θ2) − θ2) = cosh θ1.
g ◦ cosh : [0, ∞) → R
    f(p, q) = g(h→op, →oqi) = g(cosh ∠(p, o, q)) = K ∠(p, o, q).
    Φ(→op1) = →op2
    Φ(→oq1) = →oq2.
For then the corresponding map ϕ : A → A defined by setting ϕ(p) = o + Φ(→op)
will be an isometry of (A, h , i) that makes the correct assignments to o, p1, and
q1:

    ϕ(o) = o + Φ(→oo) = o + Φ(0) = o + 0 = o
    ϕ(p1) = o + Φ(→op1) = o + →op2 = p2
    ϕ(q1) = o + Φ(→oq1) = o + →oq2 = q2.
(Once again, recall the discussion preceding proposition 2.2.6.) Moreover, ϕ will
preserve temporal orientation, i.e., for all p and q in A, if →pq is timelike, then
→ϕ(p)ϕ(q) is co-oriented with →pq. (Why? Assume without loss of generality that
→pq is future-directed. (If not, we can work with →qp instead.) So h→pq, →op1i > 0
and, hence,

    h→ϕ(p)ϕ(q), →op2i = hΦ(→pq), Φ(→op1)i = h→pq, →op1i > 0.

Thus, →ϕ(p)ϕ(q) is co-oriented with the future-directed vector →op2. So →ϕ(p)ϕ(q)
is future-directed itself.)
We will realize Φ as a composition of two (Minkowski inner product pre-
serving) vector space isomorphisms. The first will be a “boost” (or “timelike
rotation”) Φ1 : V → V that takes →op1 to →op2. The second will be a spatial rota-
tion Φ2 : V → V that leaves →op2 fixed, and takes Φ1(→oq1) to →oq2. (Clearly, if these
conditions are satisfied, then (Φ2 ◦ Φ1)(→op1) = →op2 and (Φ2 ◦ Φ1)(→oq1) = →oq2.)
We consider Φ1 and Φ2 in turn.
If p1 = p2, we can take Φ1 to be the identity map. Otherwise, the vectors
→op1 and →op2 span a two-dimensional subspace W of V. In this case, we define
Φ1 by setting

    Φ1(→op1) = →op2
    Φ1(→op2) = −→op1 + 2 h→op1, →op2i →op2
    Φ1(w) = w for all w in W⊥.
    hΦ1(→op1), Φ1(→op2)i = h→op2, −→op1 + 2 h→op1, →op2i →op2i
        = −h→op1, →op2i + 2 h→op1, →op2i h→op2, →op2i = h→op1, →op2i,

since h→op2, →op2i = 1.)
Next we turn to Φ2. Since →op2 is a unit timelike vector, it follows from
proposition 3.1.1 that we can express Φ1(→oq1) and →oq2 in the form

    Φ1(→oq1) = a →op2 + u        (3.3.5)
    →oq2 = b →op2 + v,           (3.3.6)
where u and v are spacelike vectors orthogonal to →op2. Now we must have a = b
since, by our initial assumption that h→op1, →oq1i = h→op2, →oq2i,

    a = h→op2, Φ1(→oq1)i = hΦ1(→op1), Φ1(→oq1)i = h→op1, →oq1i = h→op2, →oq2i = b.
Moreover, since Φ1(→oq1) and →oq2 are both unit timelike vectors, it follows from
(3.3.5) and (3.3.6) that

    a² − kuk² = 1 = b² − kvk².

So kuk = kvk.
Now the restriction of the Minkowski inner product to the three-dimensional
subspace (→op2)⊥ is (negative) definite. So (→op2)⊥ together with that inner prod-
uct is, essentially, just three-dimensional Euclidean space (conceived as an affine
metric space). But given any two vectors in Euclidean space of the same length,
there is a rotation that takes one to the other. Thus we can find a vector space
isomorphism of (→op2)⊥ onto itself that preserves the induced inner product, and
takes u to v. We can extend it to a vector space isomorphism Φ2 : V → V that
preserves the Minkowskian inner product by simply adding the requirement that
Φ2 leave →op2 fixed. This map serves our purposes because it takes Φ1(→oq1) to
→oq2, as required:

    Φ2(Φ1(→oq1)) = Φ2(a →op2 + u) = a Φ2(→op2) + Φ2(u) = b →op2 + v = →oq2.
The second lemma that we need to complete the proof of proposition 3.3.1
is the following.
Proposition 3.3.3. Let h : [0, ∞) → R be a map that is additive and
continuous. (Additivity here means that, for all x and y, h(x+y) = h(x)+h(y).)
Then, for all x,
h(x) = Kx (3.3.7)
where K = h(1).
Proof. It follows by induction, of course, that
h(x1 + x2 + ... + xn ) = h(x1 ) + h(x2 ) + ... + h(xn ),
for all n ≥ 1, and all x1 , x2 , ..., xn in [0, ∞). Using just this condition (i.e.,
without appealing to the continuity of h), we can show that (3.3.7) holds for all
rational x. The argument proceeds in three stages. First, for all n ≥ 1, we have
    h(n) = h(1 + ··· + 1) = h(1) + ··· + h(1) = n h(1) = K n

(with n terms in each sum).
(This also holds, trivially, if n = 0, since h(0) = h(0 + 0) = h(0) + h(0), and so
h(0) = 0 = K 0.) Next, for all m ≥ 1,

    h(1) = h((1/m) + ··· + (1/m)) = h(1/m) + ··· + h(1/m) = m h(1/m)

(with m terms in each sum) and, so, h(1/m) = (1/m) h(1) = K (1/m). It follows
that, for all n ≥ 0 and all m ≥ 1,

    h(n/m) = h((1/m) + ··· + (1/m)) = h(1/m) + ··· + h(1/m)
           = n h(1/m) = n (K/m) = K (n/m)

(with n terms in each sum).
Thus, as claimed, (3.3.7) holds for all rational x. If we now invoke continuity,
we can extend the claim to all reals. This is clear since every real r in the
interval [0, ∞) can be realized as the limit of a sequence of rationals {qi } in that
interval and, so,
h(r) = h(lim qi ) = lim h(qi ) = lim (Kqi ) = K lim qi = K r.
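The rational stages of the argument can be mirrored in a short sketch using exact rational arithmetic (illustrative only; the `fractions` module and the function name are ours): from the value h(1) alone, additivity determines h at every non-negative rational.

```python
from fractions import Fraction

def h_on_rationals(h1, n, m):
    """Compute h(n/m) from h(1) = h1 using only the additivity argument."""
    # Stage 2: h(1) = m * h(1/m), so h(1/m) = h(1)/m.
    h_unit_fraction = Fraction(h1) / m
    # Stage 3: h(n/m) = n * h(1/m).
    return n * h_unit_fraction

K = Fraction(7, 2)                                  # sample value for h(1)
assert h_on_rationals(K, 3, 4) == K * Fraction(3, 4)  # h(3/4) = K * (3/4)
```

Continuity then extends the identity h(x) = K x from the rationals to all of [0, ∞), exactly as in the last step of the proof.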
It follows that

    h→pq, →rsi = h→pr + →rq, →rsi = h→pr + ε →rs, →rsi
              = h→pr, →rsi + ε h→rs, →rsi = (ε − 1/2) h→rs, →rsi.

Thus,

    ε = 1/2 ⇐⇒ →pq ⊥ →rs.        (3.4.1)
So the standard (orthogonality) relation of relative simultaneity in special rel-
ativity may equally well be described as the “ε = 1/2” relation of relative simul-
taneity.
Yet another equivalent formulation involves the “one-way speed of light”.
Suppose a light ray travels from r to p with speed c+ relative to L, and from
p to s with speed c− relative to L. We saw in section 3.2 that if one adopts
the standard criterion of relative simultaneity, then it follows that c+ = c− .
(Indeed, in this case, both c+ and c− turn out to be 1.) The converse is true as
well. For if c+ = c− , then, as determined relative to L, it should take as much
time for light to travel from r to p as from p to s. And in that case, a point q on
L should be judged simultaneous with p relative to L precisely if it is midway
between r and s. So we are led, once again, to the “ε = 1/2” relation of relative
simultaneity.
Now is adoption of the standard relation a matter of convention, or is it in
some significant sense forced on us?
There is a large literature devoted to this question. (Classic statements of
the conventionalist position can be found in Reichenbach [9] and Grünbaum [4].
Grünbaum has recently responded to criticism of his views in [3]. An overview
of the debate with many references can be found in Janis [7].) It is not my
purpose to review it here, but I do want to draw attention to certain remarks
of Howard Stein [12, pp. 153-4] that seem to me particularly insightful.
He makes the point that determinations of conventionality require a context.
There are really two distinct aspects to the issue of the “convention-
ality” of Einstein’s concept of relative simultaneity. One may assume
the position of Einstein himself at the outset of his investigation –
that is, of one confronted by a problem, trying to find a theory that
will deal with it satisfactorily; or one may assume the position of
(for instance) Minkowski – that is, of one confronted with a theory
already developed, trying to find its most adequate and instructive
formulation.
The problem Einstein confronted was (in part) that of trying to account
for our apparent inability to detect any motion of the earth with respect to
the “aether”. A crucial element of his solution was the proposal that we think
about simultaneity a certain way (i.e., in terms of the “ε = 1/2 criterion”), and
resolutely follow through on the consequences of doing so. Stein emphasizes
just how different that proposal looks when we consider it, not from Einstein’s
initial position, but rather from the vantage point of the finished theory, i.e.,
relativity theory conceived as an account of invariant spacetime structure.
(S1) S is an equivalence relation (i.e., S is reflexive, symmetric, and transitive).
(S2) For all points p ∈ A, there is a unique point q ∈ L such that (p, q) ∈ S.
If S satisfies (S1), it has an associated family of equivalence classes. We can
think of them as “simultaneity slices” (as determined relative to L). Then (S2)
asserts that every simultaneity slice intersects L in exactly one point. Note
that if S = SimL , then (S1) and (S2) are satisfied. For in this case, the
equivalence classes associated with S are hyperplanes orthogonal to L, and
these clearly intersect L in exactly one point. (A hyperplane is a subspace of
V whose dimension is one less than that of V . So if V is four-dimensional,
hyperplanes are three-dimensional subspaces.)
The third, invariance condition is intended to capture the requirement that
S is determined by, or definable in terms of, the background geometric structure
of Minkowski spacetime and by L itself. The one subtle point here is whether
temporal orientation is taken to count as part of that background geometric
structure or not. Let’s assume for the moment that it does not.
Let ϕ : A → A be an isometry of (A, h , i), i.e., an affine space isomorphism
that preserves the Minkowski inner product h , i. We will say it is an L-isometry
if, in addition, it preserves L, i.e., if, for all points p in A, p ∈ L ⇐⇒ ϕ(p) ∈ L.
We will be interested in L-isometries of three types.
(a) translations along L
In this case, there exist points r, s on L such that, for all p, ϕ(p) = p + →rs.
(b) isometries that leave L fixed
In this case, ϕ(p) = p for all p ∈ L, and the restriction of ϕ to any
hyperplane orthogonal to L is a reflection or rotation.
(c) temporal reflections with respect to hyperplanes orthogonal to L.
In this case, there is a point o on L such that, for all p, if p = o + v + w,
where v is parallel to L and w is orthogonal to it (the representation is
unique by proposition 3.1.1), ϕ(p) = o − v + w.
(It turns out that every L-isometry can be expressed as a composition of L-
isometries of these three basic types. But that fact will not be needed in what
follows.) We will say that our two-place relation S is L-invariant if it is preserved
under all L-isometries, i.e., if for all L-isometries ϕ : A → A, and all points
p, q ∈ A,
(p, q) ∈ S ⇐⇒ (ϕ(p), ϕ(q)) ∈ S. (3.4.2)
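As a concrete illustration of the invariance condition (again a sketch of mine, with two-dimensional coordinates and the reflection point o = (2, 0) chosen arbitrarily), the following builds a temporal reflection of type (c) and checks that it is an isometry, that it preserves L, and that SimL satisfies (3.4.2) under it.

```python
import random

def minkowski(v, w):
    return v[0]*w[0] - v[1]*w[1]  # signature (+, -)

def phi(p):
    # temporal reflection of type (c) about o = (2, 0): write p = o + v + w
    # with v parallel to L (the time axis) and w orthogonal; send p to o - v + w
    o = (2.0, 0.0)
    v = (p[0] - o[0], 0.0)
    w = (0.0, p[1] - o[1])
    return (o[0] - v[0], o[1] + w[1])

def sim_L(p, q):
    # pq is orthogonal to the time axis iff p and q share their time coordinate
    return abs(p[0] - q[0]) < 1e-9

random.seed(1)
pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(20)]

for p in pts:
    assert phi((p[0], 0.0))[1] == 0.0   # phi preserves L = {(t, 0)}
    for q in pts:
        pq = (q[0] - p[0], q[1] - p[1])
        fp, fq = phi(p), phi(q)
        fpq = (fq[0] - fp[0], fq[1] - fp[1])
        # phi preserves the inner product of connecting vectors (isometry) ...
        assert abs(minkowski(pq, pq) - minkowski(fpq, fpq)) < 1e-9
        # ... and Sim_L is preserved, as required by (3.4.2)
        assert sim_L(p, q) == sim_L(fp, fq)
```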
We can now formulate the first uniqueness result. (It is a close variant of one
presented in Hogarth [6].)
Proposition 3.4.1. Let L be a timelike line, and let S be a two-place relation on
A that satisfies conditions (S1) and (S2), and is L-invariant. Then S = SimL .
Proof. Assume the three conditions hold. For every point p ∈ A, let f(p) be
the unique point q on L such that \overrightarrow{pq} ⊥ L. (q is the intersection of L with the
hyperplane through p orthogonal to L.) So, clearly, the following conditions
hold.
(i) For all p ∈ A, (p, f(p)) ∈ SimL.
(ii) For all p, p′ ∈ A, (p, p′) ∈ SimL ⇐⇒ f(p) = f(p′).
We claim the following condition holds as well.
(iii) For all p ∈ A, (p, f (p)) ∈ S.
For suppose p is a point in A. By (S2), there is a unique point q on L such
that (p, q) ∈ S. Now let ϕ : A → A be a temporal reflection with respect to
the hyperplane orthogonal to L that contains p and f (p). (See figure 3.4.2.)
Then ϕ is an L-isometry, ϕ(p) = p, and \overrightarrow{f(p)ϕ(q)} = \overrightarrow{qf(p)} or, equivalently,
ϕ(q) = f(p) + \overrightarrow{qf(p)}. Since (p, q) ∈ S, it follows by L-invariance of S that
(p, ϕ(q)) = (ϕ(p), ϕ(q)) ∈ S. But q and ϕ(q) are both on L. Hence, by the
uniqueness condition in (S2), ϕ(q) = q. Therefore, \overrightarrow{f(p)q} = \overrightarrow{f(p)ϕ(q)} = \overrightarrow{qf(p)}.
So \overrightarrow{f(p)q} = 0 and, therefore, q = f(p). Thus (p, f(p)) = (p, q) ∈ S. So we have
(iii).
[figure 3.4.2: the temporal reflection ϕ fixes p and f(p) and carries q to ϕ(q), the mirror image of q through f(p) on L]
Our conclusion now follows easily from (i), (ii), and (iii). To see this, assume
first that (p, p′) ∈ S. By (iii), we have (p′, f(p′)) ∈ S. So, by the transitivity
of S, (p, f(p′)) ∈ S. But we also have (p, f(p)) ∈ S, by (iii) again. Since f(p)
and f(p′) are both on L, it follows from the uniqueness condition in (S2) that
f(p) = f(p′). Hence, by (ii), (p, p′) ∈ SimL. Thus we have S ⊆ SimL. Assume,
conversely, that (p, p′) ∈ SimL. It follows, by (ii), that f(p) = f(p′). Hence, by
(iii), we have (p, f(p)) ∈ S and (p′, f(p)) = (p′, f(p′)) ∈ S. Hence, since S is
symmetric and transitive, (p, p′) ∈ S. Thus SimL ⊆ S, and we are done.
Notice that we have not used the full force of L-invariance in our proof. We
have only used the fact that S is preserved under L-isometries of type (c).
Suppose now that we do want to consider temporal orientation as part of the
background structure that may play a role in the determination of S. Then we
need to recast the invariance condition. Let us say that an L-isometry ϕ : A → A
is an (L, ↑)-isometry if it (also) preserves temporal orientation, i.e., if for all
timelike vectors \overrightarrow{pq}, \overrightarrow{ϕ(p)ϕ(q)} is co-oriented with \overrightarrow{pq}. And let us say that S is
(L, ↑)-invariant if it is preserved under all (L, ↑)-isometries. (So, to be (L, ↑)-
invariant, S must be preserved under all L-isometries of type (a) and (b), but
need not be preserved under those of type (c).)
(L, ↑)-invariance is a weaker condition than L-invariance and, in fact, is too
weak to deliver the uniqueness result we want. There are many two-place rela-
tions S on A other than SimL that satisfy (S1), (S2), and are (L, ↑)-invariant.
They include, for example, ones whose associated “simultaneity slices” are “flat
cones” (see figure 3.4.3) that are preserved under L-isometries of type (a) and
(b), but not (c).
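A numerical sketch of the kind of counterexample intended (my own construction, restricted to two dimensions, where the type (b) isometries reduce to the spatial reflection about L, and with cone slope k = 1/2 chosen arbitrarily): the slice through the vertex (c, 0) on L is the "flat cone" {x : x₀ − k|x₁| = c}, and the resulting relation satisfies (S1) and (S2) and is preserved under isometries of types (a) and (b), but not (c).

```python
k = 0.5  # slope of the "flat cone" slices; an arbitrary choice with 0 < k < 1

def vertex(p):
    # the slice through p is the cone {x : x0 - k*|x1| = c}; its vertex (c, 0)
    # is the unique point where the slice meets L (here, the time axis)
    return p[0] - k*abs(p[1])

def S(p, q):
    # p and q are related iff they lie on the same flat cone; since this is
    # equality of vertex(.), S is automatically an equivalence relation (S1)
    return abs(vertex(p) - vertex(q)) < 1e-9

p = (1.0, 1.0)
q = (vertex(p), 0.0)
assert S(p, q)   # S relates each point to the vertex of its slice, as in (S2)

# S is preserved under translations along L (type (a)) ...
assert S((p[0] + 3.0, p[1]), (q[0] + 3.0, q[1]))
# ... and under the spatial reflection about L (type (b), in two dimensions) ...
assert S((p[0], -p[1]), (q[0], -q[1]))
# ... but not under the temporal reflection about (0, 0) (type (c))
assert not S((-p[0], p[1]), (-q[0], q[1]))
```

Temporal reflection flips the cones over, carrying upward-opening slices to downward-opening ones, which is why the type (c) isometries are exactly the ones that rule such relations out.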
We say (of course) that S is 𝓛-invariant if it is preserved under all 𝓛-
isometries, and (𝓛, ↑)-invariant if it is preserved under all (𝓛, ↑)-isometries.
Our second uniqueness result comes out as follows. (It is closely related to
propositions in Spirtes [11], Stein [12], and Budden [1].)
Proposition 3.4.2. Let 𝓛 be a frame, and let S be a two-place relation on A.
Suppose S satisfies (S1) and, for some line L in 𝓛, satisfies (S2). Further, suppose
S is (𝓛, ↑)-invariant. Then S = Sim𝓛.
Proof. The proof is very much like that of proposition 3.4.1. Assume S satisfies
the hypotheses of the proposition. Then there is a line L in 𝓛 such that S
satisfies (S2). Just as before, for every point p ∈ A, let f(p) be the unique point
q on L such that \overrightarrow{pq} ⊥ L. (q is the intersection of L with the hyperplane through
p that is orthogonal to L.) Once again, the following three conditions hold.
(i) For all p ∈ A, (p, f(p)) ∈ SimL.
(ii) For all p, p′ ∈ A, (p, p′) ∈ SimL ⇐⇒ f(p) = f(p′).
(iii) For all p ∈ A, (p, f(p)) ∈ S.
The first two are immediate. Only the third requires argument. But once we
have verified (iii), we will be done. For the rest of the argument – that (i),
(ii), and (iii) collectively imply S = SimL (and, so, S = Sim𝓛) – goes through
intact.
Let p be a point in A. By (S2), there is a unique point q on L such that
(p, q) ∈ S. (3.4.3)
Let ϕ1 be the (𝓛, ↑)-isometry that fixes every point of L and, on each hyperplane
orthogonal to L, reflects points through the hyperplane's intersection with L (an
isometry of type (b)). Then

ϕ1(q) = q
ϕ1(p) = f(p) + \overrightarrow{pf(p)}.

[figure: the isometries ϕ1 and ϕ2 used in the proof, showing p, q, f(p), ϕ1(p), and ϕ2(ϕ1(p)) relative to the line L]
4 From Minkowskian Geometry to Hyperbolic
Plane Geometry
In section 4.1, we first present Tarski’s axiomatization of first-order Euclidean
plane geometry, and a variant axiomatization of first-order hyperbolic (i.e., Lo-
batchevskian) plane geometry. Then we formulate (without proof) parallel com-
pleteness theorems for the two axiom systems. The second completeness the-
orem will be formulated in terms of the Klein-Beltrami model for hyperbolic
geometry. (This brief excursion into the metamathematics of plane geome-
try should be of some interest in its own right.) In section 4.2 we establish
the connection with Minkowskian geometry. There we use a three-dimensional
Minkowskian space to give a second, more intuitive model for hyperbolic plane
geometry and show that it is isomorphic to the Klein-Beltrami model. (In doing
so we gain a sense of “where the latter comes from”.)
(4) Reflexivity Axiom for Congruence
(∀x)(∀y)Cxyyx
(5) Identity Axiom for Congruence
(∀x)(∀y)(∀z)(Cxyzz → x = y)
(8) Euclid’s Axiom (This is Tarski’s name for the axiom, but it does not, in
fact, appear in Euclid.)
(∀x)(∀y)(∀z)(∀u)(∀v)(Bxyz & Buyv & x ≠ y
→ (∃w)(∃t)(Bxvw & Bxut & Btzw))
Problem 4.1.1. Exhibit a sentence φpar in the language L that captures the
“parallel postulate”, the assertion that given a line L1 and a point p not on L1 ,
there is a unique line L2 that contains p and does not intersect L1 .
We consider three theories: TAbs (absolute geometry), TEuc (Euclidean geom-
etry), and THyp (hyperbolic geometry). Let φEuc be Euclid’s axiom. Then we
take
TAbs = the set of all axioms listed above except for Euclid’s axiom, i.e., axioms
(1)–(7), (9)–(12), and all instances of schema (13).
TEuc = TAbs ∪ {φEuc }
THyp = TAbs ∪ {¬φEuc }.
(Here, and in what follows, d : |E| × |E| → R is the standard Euclidean distance
function. If p = (p1, p2) and q = (q1, q2), then d(p, q) = [(q1 − p1)² + (q2 − p2)²]^{1/2}.)
The basic completeness theorem for TEuc is the following.
Proposition 4.1.1. For all sentences φ in L,
TEuc ⊢ φ ⇐⇒ φ is true under interpretation in E.
(Here we assume that we have in place some (sound and complete) derivation
system for first-order logic. For any set of sentences T in L, and any individual
sentence φ in L, we understand T ⊢ φ to be the relation that holds if there is
a derivation of φ from T in that derivation system.) To prove the soundness
(left to right) half of the proposition, it suffices to check that all the sentences
in TEuc are true in E. That is relatively straightforward. It is the converse
direction that requires work. (A sketch of the proof can be found in the Tarski
article cited at the beginning of the section.)
Our second interpretation of L is the Klein-Beltrami model for THyp. Its
domain |K| is the interior of the unit circle in R², i.e., the set {p ∈ R² : d(p, 0) < 1},
where 0 = (0, 0). The assignment to ‘B’ is just the restriction of the Euclidean
betweenness relation to |K|, i.e., K(B) = E(B) ∩ (|K| × |K| × |K|).
To specify the assignment to ‘C’, we first need to define the hyperbolic distance
function dK : |K| × |K| → R on |K|. Given two points p and q in |K|, if p = q,
we set dK(p, q) = 0. If p ≠ q, the line determined by p and q intersects the unit
circle in two points. If r is the point on the circle so situated that p is between
r and q, and s is the one such that q is between p and s (see figure 4.1.1), then
we take the cross-ratio of p and q to be the fraction

CR(p, q) = [d(p, r) d(q, s)] / [d(p, s) d(q, r)]

and set dK(p, q) = −(1/2) log(CR(p, q)). The assignment to ‘C’ in the Klein-
Beltrami model is the congruence relation determined by dK, i.e.,
K(C) = {(p1, q1, p2, q2) ∈ |K|⁴ : dK(p1, q1) = dK(p2, q2)}.
Let’s first observe that the function dK has certain properties that make it
at least a plausible candidate for a “distance function” on |K|.
(1) For all p and q in |K|, dK (p, q) ≥ 0, and dK (p, q) = 0 iff p = q.
This follows from the fact that if p ≠ q, then CR(p, q) < 1 and, hence,
log(CR(p, q)) < 0.
(2) For all p1 , p2 , and p3 in |K|, if the points are collinear with p2
between p1 and p3 ,
dK (p1 , p3 ) = dK (p1 , p2 ) + dK (p2 , p3 ).
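The definition of dK is easy to test numerically. The following sketch (an illustration of mine; the sample points on the chord x₂ = 0.1 are arbitrary) computes r and s by intersecting the line through p and q with the unit circle, evaluates the cross-ratio, and checks properties (1) and (2).

```python
import math

def d(p, q):
    # standard Euclidean distance on R^2
    return math.hypot(q[0] - p[0], q[1] - p[1])

def dK(p, q):
    # Klein-Beltrami distance: find the chord endpoints r (behind p) and
    # s (beyond q), form the cross-ratio, and take -(1/2) log
    if p == q:
        return 0.0
    length = d(p, q)
    u = ((q[0] - p[0])/length, (q[1] - p[1])/length)   # unit vector p -> q
    # points p + t*u on the unit circle satisfy t^2 + 2*b*t + c = 0
    b = p[0]*u[0] + p[1]*u[1]
    c = p[0]**2 + p[1]**2 - 1.0
    disc = math.sqrt(b*b - c)
    s = (p[0] + (-b + disc)*u[0], p[1] + (-b + disc)*u[1])
    r = (p[0] + (-b - disc)*u[0], p[1] + (-b - disc)*u[1])
    CR = (d(p, r)*d(q, s)) / (d(p, s)*d(q, r))
    return -0.5*math.log(CR)

# three collinear points in |K|, with p2 between p1 and p3
p1, p2, p3 = (-0.5, 0.1), (0.0, 0.1), (0.7, 0.1)

assert dK(p1, p2) > 0 and dK(p1, p1) == 0.0                # property (1)
assert abs(dK(p1, p3) - (dK(p1, p2) + dK(p2, p3))) < 1e-9  # property (2)
```

Property (2) holds exactly because collinear points share the same boundary points r and s, so the cross-ratios multiply and the logarithms add.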
In the next section we will show, in a sense, where the odd-looking function dK
“comes from” and, in so doing, it will become more or less obvious that it
satisfies the triangle inequality. The basic completeness theorem for THyp (the
proof of which is similar to that for TEuc ) is the following.
Proposition 4.1.2. For all sentences φ in L,
THyp ⊢ φ ⇐⇒ φ is true under interpretation in K.
To prove the soundness (left to right) half of the proposition, one needs to
check that all the sentences in THyp are true in K. Note, for example, that
Euclid’s axiom is not true under interpretation in K (and so its negation is true
in that interpretation). To see this, it suffices to consider the counterexample
in figure 4.1.2. Again, it is the converse direction that requires work. (A proof
can be found in Szmielew [13].)
[figure 4.1.2: a counterexample to Euclid’s axiom under interpretation in K, with points x, y, z]
TAbs ⊢ (φEuc → φ) and TAbs ⊢ (¬φEuc → φ). Hence (by basic principles of
sentential logic), TAbs ⊢ φ.
For (ii) assume first that TAbs ⊢ (φ ↔ φEuc). Then, by part (i), (φ ↔ φEuc)
is true in both E and K. So φ and φEuc have the same truth value in the two
interpretations. But φEuc is true in E and false in K. So φ too is true in E and
false in K. Conversely, assume that φ is true in E, but false in K. Then, since
φEuc has those same truth values in E and K, the biconditional (φ ↔ φEuc)
must be true in both interpretations. Hence, by part (i) again, it follows that
TAbs ⊢ (φ ↔ φEuc).
The proof for (iii) is very much the same as that for (ii).
It follows from the corollary, of course, that in our formalization of Euclidean
geometry we can substitute for φEuc any sentence in L that is true in E, but
false in K. In particular, we can substitute the parallel postulate. (And in our
formalization of hyperbolic geometry we can substitute for ¬φEuc any sentence
in L that is true in K, but false in E.)
γ(s) = o + (cosh s) \overrightarrow{op} + (sinh s) w. Clearly, γ(0) = o + \overrightarrow{op} = p, γ(θ) = o + \overrightarrow{oq} = q,
and the image of γ is just the “line segment” in which we are interested. Since
γ′(s) = (sinh s) \overrightarrow{op} + (cosh s) w, ‖γ′(s)‖ = 1, and the length of the segment is

‖γ‖ = ∫₀^θ ‖γ′(s)‖ ds = θ = dH(p, q).
Thus, as claimed, dH(p, q) is the length of the “line segment” in |H| connecting
p to q as determined by the inner product ⟨ , ⟩. Though we will not stop to
do so, one can prove that, of all differentiable curves in Ho+ connecting p and
q, it is those whose images are “line segments” that have minimal length with
respect to ⟨ , ⟩. (Thus one can think of those segments as geodesics (or geodesic
segments) with respect to the metric structure induced on Ho+ by ⟨ , ⟩.) This
makes it clear that dH satisfies the triangle inequality.
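One can confirm the length computation numerically. In the sketch below (mine; the signature convention (+, −, −), the point p, and the parameter θ = 1.3 are all choices made for illustration), the length of γ over [0, θ] is integrated directly and compared with θ = cosh⁻¹⟨\overrightarrow{op}, \overrightarrow{oq}⟩.

```python
import math

def mink(v, w):
    # Minkowski inner product on R^3, signature (+, -, -)
    return v[0]*w[0] - v[1]*w[1] - v[2]*w[2]

# a point p on the unit hyperboloid (with o at the origin), and a unit
# spacelike vector w tangent to the hyperboloid at p: <w,w> = -1, <op,w> = 0
a = 0.8
p = (math.cosh(a), math.sinh(a), 0.0)
w = (math.sinh(a), math.cosh(a), 0.0)
assert abs(mink(p, p) - 1.0) < 1e-12 and abs(mink(p, w)) < 1e-12

theta = 1.3
def gamma(s):
    return tuple(math.cosh(s)*p[i] + math.sinh(s)*w[i] for i in range(3))

q = gamma(theta)
assert abs(math.acosh(mink(p, q)) - theta) < 1e-9   # theta = dH(p, q)

# integrate ||gamma'(s)|| over [0, theta]; gamma'(s) is spacelike, so its
# norm is (-<gamma', gamma'>)^(1/2)
N = 100000
length = 0.0
for j in range(N):
    s = theta*(j + 0.5)/N
    gp = tuple(math.sinh(s)*p[i] + math.cosh(s)*w[i] for i in range(3))
    length += math.sqrt(-mink(gp, gp))*(theta/N)

assert abs(length - theta) < 1e-6   # the segment's length equals dH(p, q)
```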
Given points p and q in |H|, we know what the “line segment” in |H| con-
necting them is. We get the (full) line in |H| containing them, naturally enough,
by extending the segment, i.e., by adding all points r in |H| such that either
(p, q, r) or (r, p, q) stands in the relation H(B). Equivalently, we can
characterize it as the set of all points of the form o + (cosh s) \overrightarrow{op} + (sinh s) w,
where w is as in the preceding paragraph and s is now any real number. Yet a third
equivalent characterization is available. We can think of the line containing p
and q as just the intersection of Ho+ with the two-dimensional affine subspace
of A that contains o, p, and q. (See figure 4.2.1.)
[figure 4.2.1: the hyperboloid Ho+ with points p and q, the disk D through t, and the projected points ϕ(p) and ϕ(q) lying between the boundary points r and s]
Now we proceed to show how one gets from K to H, i.e., from the Klein-
Beltrami model of hyperbolic plane geometry to the hyperboloid model. Let t be
a point (any point) in |H|, and let D be the set of all points d such that \overrightarrow{td} ⊥ \overrightarrow{ot}
and ‖\overrightarrow{td}‖ < 1. We think of D as a copy of the unit disk that is the domain of K.
Consider the map ϕ : |H| → A, defined by setting ϕ(p) = o + ⟨\overrightarrow{op}, \overrightarrow{ot}⟩⁻¹ \overrightarrow{op} for
all points p in |H|. It is not hard to check that ϕ determines a bijection between
all points p in H. It is not hard to check that ϕ determines a bijection between
|H| and D. (See problem 4.2.1.) Intuitively, it is a downward projection map.
It assigns to p that point where the line through p and o meets the disk D.
Problem 4.2.1. Verify that the map ϕ defined in the preceding paragraph is,
as claimed, a bijection between Ho+ and D.
Notice next that ϕ preserves the betweenness relation, i.e., given any three
points p, m, q in |H|, m is between p and q in the sense of H(B) iff ϕ(m) is
between ϕ(p) and ϕ(q) in the sense of K(B). Formally, this is the assertion that
\overrightarrow{om} = a \overrightarrow{op} + b \overrightarrow{oq} for some numbers a ≥ 0 and b ≥ 0 iff \overrightarrow{ϕ(p)ϕ(m)} = k \overrightarrow{ϕ(p)ϕ(q)}
for some number k, where 0 ≤ k ≤ 1. One can certainly give an analytic proof
of this fact. But it should be intuitively clear. Given points p and q in |H|,
consider the two-dimensional affine subspace containing them and the point o.
The intersection of that subspace with Ho+ is a line in H; its intersection with D
plays the role of a line in K; and our projection map takes the first intersection
set to the second. So it is clear that ϕ takes lines in the first interpretation to
lines in the second (and preserves the order of points on a line). This is precisely
our claim.
Now, finally, we consider the distance functions dH on |H| and dK on D.
We claim that for all points p and q in |H|,

dH(p, q) = dK(ϕ(p), ϕ(q)). (4.2.1)
(This will imply, of course, that ϕ preserves the congruence relation, i.e., given
any four points p1 , q1 , p2 , q2 in |H|, the pairs {p1 , q1 } and {p2 , q2 } are congruent
in the sense of H(C) iff the pairs {ϕ(p1 ), ϕ(q1 )} and {ϕ(p2 ), ϕ(q2 )} are congruent
in the sense of K(C).) To set this up, let p and q be given, and let r and s be
points on the boundary of D as called for in the definition of the cross ratio of
ϕ(p), ϕ(q) – with ϕ(p) between r and ϕ(q), and ϕ(q) between ϕ(p) and s. (See
figure 4.2.1.) Now dH(p, q) = cosh⁻¹(⟨\overrightarrow{op}, \overrightarrow{oq}⟩) and

dK(ϕ(p), ϕ(q)) = −(1/2) log(CR(ϕ(p), ϕ(q))) = −(1/2) log [ (‖\overrightarrow{ϕ(p)r}‖ ‖\overrightarrow{ϕ(q)s}‖) / (‖\overrightarrow{ϕ(p)s}‖ ‖\overrightarrow{ϕ(q)r}‖) ].
We proceed by deriving an expression for the latter and showing it equal to the
former. This involves a long, but relatively straightforward, computation. We
start by deriving expressions for r and s. Since the points are collinear with ϕ(p)
and ϕ(q) (and are positioned the way they are), there exist numbers α+ > 1
and α− < 0 such that

s = ϕ(p) + α+ \overrightarrow{ϕ(p)ϕ(q)}
r = ϕ(p) + α− \overrightarrow{ϕ(p)ϕ(q)}.
We can determine α+ and α− using the fact that \overrightarrow{or} and \overrightarrow{os} are null vectors,
and will do so in a moment. But first we note that

‖\overrightarrow{ϕ(p)r}‖ = |α−| ‖\overrightarrow{ϕ(p)ϕ(q)}‖ = (−α−) ‖\overrightarrow{ϕ(p)ϕ(q)}‖
‖\overrightarrow{ϕ(q)s}‖ = ‖\overrightarrow{ϕ(p)s}‖ − ‖\overrightarrow{ϕ(p)ϕ(q)}‖ = α+ ‖\overrightarrow{ϕ(p)ϕ(q)}‖ − ‖\overrightarrow{ϕ(p)ϕ(q)}‖ = (α+ − 1) ‖\overrightarrow{ϕ(p)ϕ(q)}‖
‖\overrightarrow{ϕ(p)s}‖ = α+ ‖\overrightarrow{ϕ(p)ϕ(q)}‖
‖\overrightarrow{ϕ(q)r}‖ = ‖\overrightarrow{ϕ(p)r}‖ + ‖\overrightarrow{ϕ(p)ϕ(q)}‖ = (1 − α−) ‖\overrightarrow{ϕ(p)ϕ(q)}‖.

Hence

CR(ϕ(p), ϕ(q)) = [(−α−)(α+ − 1)] / [α+(1 − α−)] = (α− − α−α+) / (α+ − α−α+).

Since \overrightarrow{os} = \overrightarrow{oϕ(p)} + \overrightarrow{ϕ(p)s} = \overrightarrow{oϕ(p)} + α+ \overrightarrow{ϕ(p)ϕ(q)} is null, we have

0 = ‖\overrightarrow{oϕ(p)}‖² + 2α+ ⟨\overrightarrow{oϕ(p)}, \overrightarrow{ϕ(p)ϕ(q)}⟩ − (α+)² ‖\overrightarrow{ϕ(p)ϕ(q)}‖².
Using the abbreviations

A = ‖\overrightarrow{ϕ(p)ϕ(q)}‖²
B = −2 ⟨\overrightarrow{oϕ(p)}, \overrightarrow{ϕ(p)ϕ(q)}⟩
C = −‖\overrightarrow{oϕ(p)}‖²,

this becomes A(α+)² + Bα+ + C = 0, and the corresponding null condition on \overrightarrow{or}
yields the same quadratic in α−. So α+ and α− are the two roots
[−B ± (B² − 4AC)^{1/2}]/(2A), with α+ the larger. Since α+α− = C/A, substituting
into the expression for the cross-ratio above gives

CR(ϕ(p), ϕ(q)) = [(−B − 2C) − (B² − 4AC)^{1/2}] / [(−B − 2C) + (B² − 4AC)^{1/2}].
It remains to derive expressions for the two terms appearing on the right side
of the equation. For the first, we have

−B = 2⟨\overrightarrow{oϕ(p)}, \overrightarrow{ϕ(p)ϕ(q)}⟩ = 2⟨\overrightarrow{oϕ(p)}, −\overrightarrow{oϕ(p)} + \overrightarrow{oϕ(q)}⟩
= −2‖\overrightarrow{oϕ(p)}‖² + 2⟨\overrightarrow{oϕ(p)}, \overrightarrow{oϕ(q)}⟩ = 2C + 2⟨\overrightarrow{oϕ(p)}, \overrightarrow{oϕ(q)}⟩.

So (−B − 2C) = 2⟨\overrightarrow{oϕ(p)}, \overrightarrow{oϕ(q)}⟩ and, hence,

A = −⟨−\overrightarrow{oϕ(p)} + \overrightarrow{oϕ(q)}, −\overrightarrow{oϕ(p)} + \overrightarrow{oϕ(q)}⟩ = C + (−B − 2C) − ‖\overrightarrow{oϕ(q)}‖²
= −(B + C) − ‖\overrightarrow{oϕ(q)}‖².
Therefore,

(B² − 4AC) = B² + 4C(B + C) + 4C‖\overrightarrow{oϕ(q)}‖² = (B + 2C)² + 4C‖\overrightarrow{oϕ(q)}‖²
= 4⟨\overrightarrow{oϕ(p)}, \overrightarrow{oϕ(q)}⟩² − 4‖\overrightarrow{oϕ(p)}‖² ‖\overrightarrow{oϕ(q)}‖².
So

(−B − 2C) = 2⟨\overrightarrow{op}, \overrightarrow{ot}⟩⁻¹ ⟨\overrightarrow{oq}, \overrightarrow{ot}⟩⁻¹ ⟨\overrightarrow{op}, \overrightarrow{oq}⟩ = 2⟨\overrightarrow{op}, \overrightarrow{ot}⟩⁻¹ ⟨\overrightarrow{oq}, \overrightarrow{ot}⟩⁻¹ (cosh ∠(p, o, q))

(B² − 4AC)^{1/2} = 2⟨\overrightarrow{op}, \overrightarrow{ot}⟩⁻¹ ⟨\overrightarrow{oq}, \overrightarrow{ot}⟩⁻¹ (⟨\overrightarrow{op}, \overrightarrow{oq}⟩² − 1)^{1/2} = 2⟨\overrightarrow{op}, \overrightarrow{ot}⟩⁻¹ ⟨\overrightarrow{oq}, \overrightarrow{ot}⟩⁻¹ (sinh ∠(p, o, q))

and therefore,

CR(ϕ(p), ϕ(q)) = [(−B − 2C) − (B² − 4AC)^{1/2}] / [(−B − 2C) + (B² − 4AC)^{1/2}]
= [cosh ∠(p, o, q) − sinh ∠(p, o, q)] / [cosh ∠(p, o, q) + sinh ∠(p, o, q)] = e^{−2∠(p, o, q)}.

Hence

dK(ϕ(p), ϕ(q)) = −(1/2) log(CR(ϕ(p), ϕ(q))) = ∠(p, o, q) = dH(p, q).

This gives us (4.2.1) above, and we are done.
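The identity (4.2.1) can be spot-checked numerically. In the sketch below (my own setup, not part of the text: o at the origin in a three-dimensional Minkowski space of signature (+, −, −), t = (1, 0, 0), and the disk D identified with the plane x₀ = 1), random points on the hyperboloid are projected by ϕ, and the cross-ratio distance on the disk is compared with cosh⁻¹⟨\overrightarrow{op}, \overrightarrow{oq}⟩.

```python
import math, random

def mink(v, w):
    return v[0]*w[0] - v[1]*w[1] - v[2]*w[2]   # signature (+, -, -)

def on_H(x1, x2):
    # lift spatial coordinates to the unit hyperboloid (o at the origin)
    return (math.sqrt(1.0 + x1*x1 + x2*x2), x1, x2)

def proj(p):
    # the map phi: scale op by <op, ot>^(-1) = 1/p0, landing on the plane
    # x0 = 1 through t = (1, 0, 0); keep the two disk coordinates
    return (p[1]/p[0], p[2]/p[0])

def d(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])

def dK(p, q):
    # Klein-Beltrami distance on the unit disk via the cross-ratio
    length = d(p, q)
    u = ((q[0] - p[0])/length, (q[1] - p[1])/length)
    b = p[0]*u[0] + p[1]*u[1]
    disc = math.sqrt(b*b - (p[0]**2 + p[1]**2 - 1.0))
    s = (p[0] + (-b + disc)*u[0], p[1] + (-b + disc)*u[1])
    r = (p[0] + (-b - disc)*u[0], p[1] + (-b - disc)*u[1])
    return -0.5*math.log((d(p, r)*d(q, s)) / (d(p, s)*d(q, r)))

random.seed(2)
for _ in range(50):
    p = on_H(random.uniform(-2, 2), random.uniform(-2, 2))
    q = on_H(random.uniform(-2, 2), random.uniform(-2, 2))
    if d(proj(p), proj(q)) < 1e-6:
        continue
    dH = math.acosh(mink(p, q))                    # = angle(p, o, q)
    assert abs(dH - dK(proj(p), proj(q))) < 1e-8   # identity (4.2.1)
```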
References
[1] T. Budden. Geometric simultaneity and the continuity of special relativity.
Foundations of Physics Letters, 11:343–357, 1998.
[2] B. Gelbaum and J. Olmsted. Counterexamples in Analysis. Dover Publi-
cations, 1964.
[3] A. Grünbaum. David Malament and the conventionality of simultaneity:
A reply. Foundations of Physics, forthcoming. Also available on the
Pittsburgh Philosophy of Science Archive:
http://philsci-archive.pitt.edu/archive/00000184/.
[4] A. Grünbaum. Philosophical Problems of Space and Time. Reidel, 2nd
enlarged edition, 1973.
[12] H. Stein. On relativity theory and the openness of the future. Philosophy
of Science, 58:147–167, 1991.
[13] W. Szmielew. Some metamathematical problems concerning elementary
hyperbolic geometry. In L. Henkin, P. Suppes, and A. Tarski, editors, The
Axiomatic Method, with Special Reference to Geometry and Physics. North
Holland, 1959.
[14] A. Tarski. What is elementary geometry? In L. Henkin, P. Suppes, and
A. Tarski, editors, The Axiomatic Method, with Special Reference to Ge-
ometry and Physics. North Holland, 1959. The article was reprinted in
[5].