General Relativity, Black Holes, and Cosmology
Andrew J. S. Hamilton
21 February 2018
Contents
Table of contents page iii
List of illustrations xviii
List of tables xxiii
List of exercises and concept questions xxiv
Legal notice 1
Notation 2
PART ONE FUNDAMENTALS 5
Concept Questions 7
What’s important? 9
1 Special Relativity 10
1.1 Motivation 10
1.2 The postulates of special relativity 11
1.3 The paradox of the constancy of the speed of light 13
1.4 Simultaneity 17
1.5 Time dilation 18
1.6 Lorentz transformation 19
1.7 Paradoxes: Time dilation, Lorentz contraction, and the Twin paradox 22
1.8 The spacetime wheel 26
1.9 Scalar spacetime distance 30
1.10 4-vectors 32
1.11 Energy-momentum 4-vector 34
1.12 Photon energy-momentum 36
1.13 What things look like at relativistic speeds 38
1.14 Occupation number, phase-space volume, intensity, and flux 43
1.15 How to program Lorentz transformations on a computer 45
Concept Questions 47
What’s important? 49
2 Fundamentals of General Relativity 50
2.1 Motivation 51
2.2 The postulates of General Relativity 52
2.3 Implications of Einstein’s principle of equivalence 54
2.4 Metric 56
2.5 Timelike, spacelike, proper time, proper distance 57
2.6 Orthonormal tetrad basis γ_m 57
2.7 Basis of coordinate tangent vectors e_µ 58
2.8 4-vectors and tensors 59
2.9 Covariant derivatives 62
2.10 Torsion 65
2.11 Connection coefficients in terms of the metric 67
2.12 Torsion-free covariant derivative 67
2.13 Mathematical aside: What if there is no metric? 71
2.14 Coordinate 4-velocity 71
2.15 Geodesic equation 71
2.16 Coordinate 4-momentum 72
2.17 Affine parameter 73
2.18 Affine distance 73
2.19 Riemann tensor 75
2.20 Ricci tensor, Ricci scalar 78
2.21 Einstein tensor 79
2.22 Bianchi identities 79
2.23 Covariant conservation of the Einstein tensor 79
2.24 Einstein equations 80
2.25 Summary of the path from metric to the energy-momentum tensor 80
2.26 Energy-momentum tensor of a perfect fluid 81
2.27 Newtonian limit 81
3 More on the coordinate approach 89
3.1 Weyl tensor 89
3.2 Evolution equations for the Weyl tensor, and gravitational waves 90
3.3 Geodesic deviation 92
4 Action principle for point particles 94
4.1 Principle of least action for point particles 94
4.2 Generalized momentum 96
4.3 Lagrangian for a test particle 96
4.4 Massless test particle 98
4.5 Effective Lagrangian for a test particle 99
4.6 Nice Lagrangian for a test particle 100
4.7 Action for a charged test particle in an electromagnetic field 101
4.8 Symmetries and constants of motion 103
4.9 Conformal symmetries 104
4.10 (Super-)Hamiltonian 107
4.11 Conventional Hamiltonian 108
4.12 Conventional Hamiltonian for a test particle 108
4.13 Effective (super-)Hamiltonian for a test particle with electromagnetism 110
4.14 Nice (super-)Hamiltonian for a test particle with electromagnetism 110
4.15 Derivatives of the action 112
4.16 Hamilton-Jacobi equation 113
4.17 Canonical transformations 113
4.18 Symplectic structure 115
4.19 Symplectic scalar product and Poisson brackets 117
4.20 (Super-)Hamiltonian as a generator of evolution 117
4.21 Infinitesimal canonical transformations 118
4.22 Constancy of phase-space volume under canonical transformations 119
4.23 Poisson algebra of integrals of motion 119
Concept Questions 121
What’s important? 123
5 Observational Evidence for Black Holes 124
6 Ideal Black Holes 126
6.1 Definition of a black hole 126
6.2 Ideal black hole 127
6.3 No-hair theorem 127
7 Schwarzschild Black Hole 129
7.1 Schwarzschild metric 129
7.2 Stationary, static 130
7.3 Spherically symmetric 131
7.4 Energy-momentum tensor 132
7.5 Birkhoff’s theorem 132
7.6 Horizon 133
7.7 Proper time 134
7.8 Redshift 135
7.9 “Schwarzschild singularity” 135
7.10 Weyl tensor 136
7.11 Singularity 136
7.12 Gullstrand-Painlevé metric 138
7.13 Embedding diagram 146
7.14 Schwarzschild spacetime diagram 147
7.15 Gullstrand-Painlevé spacetime diagram 149
7.16 Eddington-Finkelstein spacetime diagram 149
7.17 Kruskal-Szekeres spacetime diagram 150
7.18 Antihorizon 152
7.19 Analytically extended Schwarzschild geometry 152
7.20 Penrose diagrams 155
7.21 Penrose diagrams as guides to spacetime 156
7.22 Future and past horizons 159
7.23 Oppenheimer-Snyder collapse to a black hole 159
7.24 Apparent horizon 160
7.25 True horizon 161
7.26 Penrose diagrams of Oppenheimer-Snyder collapse 162
7.27 Illusory horizon 163
7.28 Collapse of a shell of matter on to a black hole 166
7.29 The illusory horizon and black hole thermodynamics 167
7.30 Rindler space and Rindler horizons 167
7.31 Rindler observers who start at rest, then accelerate 170
7.32 Killing vectors 174
7.33 Killing tensors 178
7.34 Lie derivative 179
8 Reissner-Nordström Black Hole 185
8.1 Reissner-Nordström metric 185
8.2 Energy-momentum tensor 186
8.3 Weyl tensor 187
8.4 Horizons 187
8.5 Gullstrand-Painlevé metric 187
8.6 Radial null geodesics 189
8.7 Finkelstein coordinates 190
8.8 Kruskal-Szekeres coordinates 191
8.9 Analytically extended Reissner-Nordström geometry 194
8.10 Penrose diagram 194
8.11 Antiverse: Reissner-Nordström geometry with negative mass 196
8.12 Outgoing, ingoing 196
8.13 The inflationary instability 197
8.14 The X point 199
8.15 Extremal Reissner-Nordström geometry 199
8.16 Super-extremal Reissner-Nordström geometry 200
8.17 Reissner-Nordström geometry with imaginary charge 202
9 Kerr-Newman Black Hole 204
9.1 Boyer-Lindquist metric 204
9.2 Oblate spheroidal coordinates 205
9.3 Time and rotation symmetries 207
9.4 Ring singularity 207
9.5 Horizons 207
9.6 Angular velocity of the horizon 209
9.7 Ergospheres 209
9.8 Turnaround radius 209
9.9 Antiverse 210
9.10 Sisytube 210
9.11 Extremal Kerr-Newman geometry 210
9.12 Super-extremal Kerr-Newman geometry 212
9.13 Energy-momentum tensor 212
9.14 Weyl tensor 213
9.15 Electromagnetic field 213
9.16 Principal null congruences 213
9.17 Finkelstein coordinates 214
9.18 Doran coordinates 214
9.19 Penrose diagram 217
Concept Questions 218
What’s important? 220
10 Homogeneous, Isotropic Cosmology 221
10.1 Observational basis 221
10.2 Cosmological Principle 226
10.3 Friedmann-Lemaître-Robertson-Walker metric 227
10.4 Spatial part of the FLRW metric: informal approach 227
10.5 Comoving coordinates 229
10.6 Spatial part of the FLRW metric: more formal approach 230
10.7 FLRW metric 232
10.8 Einstein equations for FLRW metric 232
10.9 Newtonian “derivation” of Friedmann equations 232
10.10 Hubble parameter 234
10.11 Critical density 236
10.12 Omega 236
10.13 Types of mass-energy 237
10.14 Redshifting 239
10.15 Evolution of the cosmic scale factor 241
10.16 Age of the Universe 242
10.17 Conformal time 243
10.18 Looking back along the lightcone 244
10.19 Hubble diagram 245
10.20 Recombination 247
10.21 Horizon 248
10.22 Inflation 250
10.23 Evolution of the size and density of the Universe 252
10.24 Evolution of the temperature of the Universe 254
10.25 Neutrino mass 258
10.26 Occupation number, number density, and energy-momentum 260
10.27 Occupation numbers in thermodynamic equilibrium 262
10.28 Maximally symmetric spaces 267
PART TWO TETRAD APPROACH TO GENERAL RELATIVITY 279
Concept Questions 281
What’s important? 283
11 The tetrad formalism 284
11.1 Tetrad 284
11.2 Vierbein 285
11.3 The metric encodes the vierbein 285
11.4 Tetrad transformations 286
11.5 Tetrad vectors and tensors 287
11.6 Index and naming conventions for vectors and tensors 289
11.7 Gauge transformations 290
11.8 Directed derivatives 290
11.9 Tetrad covariant derivative 291
11.10 Relation between tetrad and coordinate connections 293
11.11 Antisymmetry of the tetrad connections 293
11.12 Torsion tensor 293
11.13 No-torsion condition 294
11.14 Tetrad connections in terms of the vierbein 294
11.15 Torsion-free covariant derivative 295
11.16 Riemann curvature tensor 295
11.17 Ricci, Einstein, Bianchi 298
11.18 Expressions with torsion 299
11.19 General relativity in 2 spacetime dimensions 300
12 Spin and Newman-Penrose tetrads 305
12.1 Spin tetrad formalism 305
12.2 Newman-Penrose tetrad formalism 308
12.3 Weyl tensor 310
12.4 Petrov classification of the Weyl tensor 313
13 The geometric algebra 315
13.1 Products of vectors 316
13.2 Geometric product 317
13.3 Reverse 319
13.4 The pseudoscalar and the Hodge dual 319
13.5 Multivector metric 321
13.6 General products of multivectors 321
13.7 Reflection 323
13.8 Rotation 325
13.9 A multivector rotation is an active rotation 328
13.10 A rotor is a spin-1/2 object 328
13.11 Generator of a rotation 329
13.12 2D rotations and complex numbers 329
13.13 Quaternions 331
13.14 3D rotations and quaternions 332
13.15 Pauli matrices 334
13.16 Pauli spinors as quaternions, or scaled rotors 335
13.17 Spin axis 337
14 The spacetime algebra 339
14.1 Spacetime algebra 339
14.2 Complex quaternions 341
14.3 Lorentz transformations and complex quaternions 342
14.4 Spatial Inversion (P) and Time Reversal (T) 346
14.5 How to implement Lorentz transformations on a computer 347
14.6 Killing vector fields of Minkowski space 351
14.7 Dirac matrices 354
14.8 Dirac spinors 356
14.9 Dirac spinors as complex quaternions 357
14.10 Non-null Dirac spinor 362
14.11 Null Dirac Spinor 363
15 Geometric Differentiation and Integration 367
15.1 Covariant derivative of a multivector 368
15.2 Riemann tensor of bivectors 370
15.3 Torsion tensor of vectors 371
15.4 Covariant spacetime derivative 371
15.5 Torsion-full and torsion-free covariant spacetime derivative 373
15.6 Differential forms 374
15.7 Wedge product of differential forms 375
15.8 Exterior derivative 376
15.9 An alternative notation for differential forms 378
15.10 Hodge dual form 379
15.11 Relation between coordinate- and tetrad-frame volume elements 380
15.12 Generalized Stokes’ theorem 381
15.13 Exact and closed forms 383
15.14 Generalized Gauss’ theorem 384
15.15 Dirac delta-function 385
15.16 Integration of multivector-valued forms 386
16 Action principle for electromagnetism and gravity 388
16.1 Euler-Lagrange equations for a generic field 389
16.2 Super-Hamiltonian formalism 391
16.3 Conventional Hamiltonian formalism 391
16.4 Symmetries and conservation laws 392
16.5 Electromagnetic action 393
16.6 Electromagnetic action in forms notation 400
16.7 Gravitational action 406
16.8 Variation of the gravitational action 409
16.9 Trading coordinates and momenta 411
16.10 Matter energy-momentum and the Einstein equations with matter 413
16.11 Spin angular-momentum 413
16.12 Lagrangian as opposed to Hamiltonian formulation 421
16.13 Gravitational action in multivector notation 422
16.14 Gravitational action in multivector forms notation 427
16.15 Space+time (3+1) split in multivector forms notation 443
16.16 Loop Quantum Gravity 453
16.17 Bianchi identities in multivector forms notation 459
17 Conventional Hamiltonian (3+1) approach 465
17.1 ADM formalism 466
17.2 ADM gravitational equations of motion 476
17.3 Conformally scaled ADM 482
17.4 Bianchi spacetimes 484
17.5 Friedmann-Lemaître-Robertson-Walker spacetimes 490
17.6 BKL oscillatory collapse 491
17.7 BSSN formalism 500
17.8 Pretorius formalism 504
17.9 M+N split 505
17.10 2+2 split 507
18 Singularity theorems 508
18.1 Congruences 508
18.2 Raychaudhuri equations 510
18.3 Raychaudhuri equations for a timelike geodesic congruence 511
18.4 Raychaudhuri equations for a null geodesic congruence 513
18.5 Sachs optical coefficients 515
18.6 Hypersurface-orthogonality for a timelike congruence 516
18.7 Hypersurface-orthogonality for a null congruence 519
18.8 Focusing theorems 522
18.9 Singularity theorems 523
Concept Questions 527
What’s important? 528
19 Black hole waterfalls 529
19.1 Tetrads move through coordinates 529
19.2 Gullstrand-Painlevé waterfall 530
19.3 Boyer-Lindquist tetrad 536
19.4 Doran waterfall 538
20 General spherically symmetric spacetimes 544
20.1 Spherical spacetime 544
20.2 Spherical line element 544
20.3 Rest diagonal line element 546
20.4 Comoving diagonal line element 547
20.5 Tetrad connections 548
20.6 Riemann, Einstein, and Weyl tensors 549
20.7 Einstein equations 550
20.8 Choose your frame 551
20.9 Interior mass 551
20.10 Energy-momentum conservation 553
20.11 Structure of the Einstein equations 554
20.12 Comparison to ADM (3+1) formulation 557
20.13 Spherical electromagnetic field 557
20.14 General relativistic stellar structure 558
20.15 Freely-falling dust 560
20.16 Naked singularities in dust collapse 562
20.17 Self-similar spherically symmetric spacetime 565
21 The interiors of accreting, spherical black holes 580
21.1 Boundary conditions and equation of state 582
21.2 Black hole accreting a neutral relativistic plasma 584
21.3 Black hole accreting a charged relativistic plasma 586
21.4 Black hole accreting charged baryons and dark matter 587
21.5 The black hole particle accelerator 589
21.6 The mechanism of mass inflation 591
21.7 The far future? 594
21.8 Weak null singularity on the Cauchy horizon? 594
21.9 Black hole accreting a fluid with an ultrahard equation of state 599
21.10 Black hole accreting a conducting charged plasma 600
21.11 Weird stuff at the outer horizon? 605
22 Ideal rotating black holes 607
22.1 Separable geometries 607
22.2 Horizons 609
22.3 Conditions from Hamilton-Jacobi separability 610
22.4 Electrovac solutions from separation of Einstein’s equations 613
22.5 Electrovac solutions of Maxwell’s equations 617
22.6 Λ-Kerr-Newman boundary conditions 619
22.7 Taub-NUT geometry 621
23 Trajectories in ideal rotating black holes 625
23.1 Hamilton-Jacobi equation 625
23.2 Particle with magnetic charge 627
23.3 Killing vectors and Killing tensor 627
23.4 Turnaround 628
23.5 Constraints on the Hamilton-Jacobi parameters P_t and P_x 628
23.6 Principal null congruences 629
23.7 Carter integral Q 630
23.8 Penrose process 633
23.9 Constant latitude trajectories in the Kerr-Newman geometry 633
23.10 Circular orbits in the Kerr-Newman geometry 634
23.11 General solution for circular orbits 635
23.12 Circular geodesics (orbits for particles with zero electric charge) 640
23.13 Null circular orbits 644
23.14 Marginally stable circular orbits 646
23.15 Circular orbits at constant latitude in the Antiverse 647
23.16 Circular orbits at the horizon of an extremal black hole 647
23.17 Equatorial circular orbits in the Kerr geometry 649
23.18 Thin disk accretion 651
23.19 Circular orbits in the Reissner-Nordström geometry 654
23.20 Hypersurface-orthogonal congruences 655
23.21 The principal null and Doran congruences 661
23.22 Pretorius-Israel double-null congruence 662
24 The interiors of rotating black holes 667
24.1 Nonlinear evolution 667
24.2 Focussing along principal null directions 668
24.3 Conformally separable geometries 668
24.4 Conditions from conformal Hamilton-Jacobi separability 668
24.5 Tetrad-frame connections 668
24.6 Inevitability of mass inflation 669
24.7 The black hole particle accelerator 670
Concept Questions 673
What’s important? 675
25 Perturbations and gauge transformations 676
25.1 Notation for perturbations 676
25.2 Vierbein perturbation 676
25.3 Gauge transformations 677
25.4 Tetrad metric assumed constant 677
25.5 Perturbed coordinate metric 678
25.6 Tetrad gauge transformations 678
25.7 Coordinate gauge transformations 679
25.8 Scalar, vector, tensor decomposition of perturbations 681
26 Perturbations in a flat space background 684
26.1 Classification of vierbein perturbations 684
26.2 Metric, tetrad connections, and Einstein and Weyl tensors 687
26.3 Spin components of the Einstein tensor 689
26.4 Too many Einstein equations? 690
26.5 Action at a distance? 690
26.6 Comparison to electromagnetism 691
26.7 Harmonic gauge 695
26.8 Newtonian (Copernican) gauge 697
26.9 Synchronous gauge 698
26.10 Newtonian potential 699
26.11 Dragging of inertial frames 700
26.12 Quadrupole pressure 704
26.13 Gravitational waves 705
26.14 Energy-momentum carried by gravitational waves 708
Concept Questions 713
27 An overview of cosmological perturbations 714
28 Cosmological perturbations in a flat Friedmann-Lemaître-Robertson-Walker background 719
28.1 Unperturbed line-element 719
28.2 Comoving Fourier modes 720
28.3 Classification of vierbein perturbations 720
28.4 Residual global gauge freedoms 722
28.5 Metric, tetrad connections, and Einstein tensor 724
28.6 Gauge choices 728
28.7 ADM gauge choices 728
28.8 Conformal Newtonian (Copernican) gauge 728
28.9 Conformal synchronous gauge 734
29 Cosmological perturbations: a simplest set of assumptions 736
29.1 Perturbed FLRW line-element 737
29.2 Energy-momenta of perfect fluids 737
29.3 Entropy conservation at superhorizon scales 740
29.4 Unperturbed background 743
29.5 Generic behaviour of non-baryonic cold dark matter 744
29.6 Generic behaviour of radiation 745
29.7 Equations for the simplest set of assumptions 747
29.8 On the numerical computation of cosmological power spectra 750
29.9 Analytic solutions in various regimes 751
29.10 Superhorizon scales 752
29.11 Radiation-dominated, adiabatic initial conditions 755
29.12 Radiation-dominated, isocurvature initial conditions 759
29.13 Subhorizon scales 760
29.14 Fluctuations that enter the horizon during the matter-dominated epoch 763
29.15 Matter-dominated regime 766
29.16 Baryons post-recombination 767
29.17 Matter with dark energy 768
29.18 Matter with dark energy and curvature 769
29.19 Primordial power spectrum 772
29.20 Matter power spectrum 773
29.21 Nonlinear evolution of the matter power spectrum 776
29.22 Statistics of random fields 777
30 Non-equilibrium processes in the FLRW background 783
30.1 Conditions around the epoch of recombination 784
30.2 Overview of recombination 785
30.3 Energy levels and ionization state in thermodynamic equilibrium 786
30.4 Occupation numbers 790
30.5 Boltzmann equation 790
30.6 Collisions 792
30.7 Non-equilibrium recombination 794
30.8 Recombination: Peebles approximation 797
30.9 Recombination: Seager et al. approximation 802
30.10 Sobolev escape probability 804
31 Cosmological perturbations: the hydrodynamic approximation 806
31.1 Electron-photon (Thomson) scattering 808
31.2 Summary of equations in the hydrodynamic approximation 809
31.3 Standard cosmological parameters 814
31.4 The photon-baryon fluid in the tight-coupling approximation 816
31.5 WKB approximation 818
31.6 Including quadrupole pressure in the momentum conservation equation 819
31.7 Photon diffusion (Silk damping) 820
31.8 Viscous baryon drag damping 821
31.9 Photon-baryon wave equation with dissipation 822
31.10 Baryon loading 823
31.11 Neutrinos 825
32 Cosmological perturbations: Boltzmann treatment 826
32.1 Summary of equations in the Boltzmann treatment 828
32.2 Boltzmann equation in a perturbed FLRW geometry 831
32.3 Non-baryonic cold dark matter 834
32.4 Boltzmann equation for the temperature fluctuation 836
32.5 Spherical harmonics of the temperature fluctuation 838
32.6 The Boltzmann equation for massless particles 838
32.7 Energy-momentum tensor for massless particles 839
32.8 Nonrelativistic electron-photon (Thomson) scattering 839
32.9 The photon collision term for electron-photon scattering 840
32.10 Boltzmann equation for photons 844
32.11 Baryons 844
32.12 Boltzmann equation for relativistic neutrinos 845
32.13 Truncating the photon Boltzmann hierarchy 847
32.14 Massive neutrinos 848
32.15 Appendix: Legendre polynomials 849
33 Fluctuations in the Cosmic Microwave Background 851
33.1 Radiative transfer of CMB photons 851
33.2 Harmonics of the CMB photon distribution 853
33.3 CMB in real space 862
33.4 Observing CMB power 867
33.5 Large-scale CMB fluctuations (Sachs-Wolfe effect) 867
33.6 Radiative transfer of neutrinos 869
33.7 Appendix: Integrals over spherical Bessel functions 872
34 Polarization of the Cosmic Microwave Background 873
34.1 Photon polarization 873
34.2 Spin sign convention 875
34.3 Photon density matrix 875
34.4 Temperature fluctuation for polarized photons 878
34.5 Boltzmann equations for polarized photons 878
34.6 Spherical harmonics of the polarized photon distribution 879
34.7 Vector and tensor Einstein equations 881
34.8 Reality conditions on the polarized photon distribution 882
34.9 Polarized Thomson scattering 883
34.10 Summary of Boltzmann equations for polarized photons 887
34.11 Radiative transfer of the polarized CMB 888
34.12 Harmonics of the polarized CMB photon distribution 889
34.13 Harmonics of the polarized CMB in real space 892
34.14 Polarized CMB power spectra 893
34.15 Appendix: Spin-weighted spherical harmonics 895
PART THREE SPINORS 901
35 The super geometric algebra 903
35.1 Spin basis vectors in 3D 903
35.2 Spin weight 904
35.3 Pauli representation of spin basis vectors 904
35.4 Spinor basis elements 905
35.5 Pauli spinor 906
35.6 Spinor metric 906
35.7 Row basis spinors 907
35.8 Inner products of spinor basis elements 908
35.9 Lowering and raising spinor indices 908
35.10 Outer products of spinor basis elements 909
35.11 The 3D super geometric algebra 911
35.12 C-conjugate Pauli spinor 912
35.13 Scalar products of spinors and conjugate spinors 914
35.14 C-conjugate multivectors 915
36 Super spacetime algebra 932
36.1 Newman-Penrose formalism 932
36.2 Chiral representation of γ-matrices 934
36.3 Spinor basis elements 935
36.4 Dirac and Weyl spinors 937
36.5 Spinor scalar product 938
36.6 Super spacetime algebra 941
36.7 C-conjugation 945
36.8 Anticommutation of Dirac spinors 950
36.9 Discrete transformations P, T 952
37 Geometric Differentiation and Integration of Spinors 957
37.1 Covariant derivative of a spinor 957
37.2 Covariant derivative in a spinor basis 958
37.3 Covariant spacetime derivative of a spinor 959
37.4 Gauss’ theorem for spinors 960
38 Action principle for spinor fields 961
38.1 Fermion content of the Standard Model of Physics 961
38.2 Dirac spinor field 967
38.3 Dirac field with electromagnetism 971
38.4 CPT-conjugates: Antiparticles 972
38.5 Negative mass particles? 973
38.6 Neutrinos and Majorana spinors 975
38.7 Spinors that are their own anti-particles are not self-conjugate 978
38.8 Dotted and undotted index notation 979
Bibliography 983
Illustrations
1.1 Vermilion emits a flash of light 13
1.2 Spacetime diagram 14
1.3 Spacetime diagram of Vermilion emitting a flash of light 14
1.4 Cerulean’s spacetime is skewed compared to Vermilion’s 15
1.5 Distances perpendicular to the direction of motion are unchanged 16
1.6 Vermilion defines hypersurfaces of simultaneity 17
1.7 Cerulean defines hypersurfaces of simultaneity similarly 18
1.8 Light clocks 19
1.9 Spacetime diagram illustrating the construction of hypersurfaces of simultaneity 20
1.10 Time dilation and Lorentz contraction spacetime diagrams 23
1.11 A cube: are the lengths of its sides all equal? 24
1.12 Twin paradox spacetime diagram 25
1.13 Wheel 26
1.14 Spacetime wheel 27
1.15 The right quadrant of the spacetime wheel represents uniformly accelerating observers 29
1.16 Spacetime diagram illustrating timelike, lightlike, and spacelike intervals 31
1.17 The longest proper time between two events is a straight line 34
1.18 Superluminal motion of the M87 jet 39
1.19 The rules of 4-dimensional perspective 41
1.20 Tachyon spacetime diagram 45
2.1 The principle of equivalence implies that gravity curves spacetime 51
2.2 A 2-sphere must be covered with at least two charts 53
2.3 The principle of equivalence implies the gravitational redshift and the gravitational bending of light 55
2.4 Tetrad vectors γ_m and tangent vectors e_µ 58
2.5 Derivatives of tangent vectors e_µ defined by parallel transport 63
2.6 Shapiro time delay 86
2.7 Lensing diagram 87
2.8 The appearance of a source lensed by a point lens 87
4.1 Action principle 95
4.2 Rindler wedge 105
6.1 Fishes in a black hole waterfall 126
7.1 The singularity is not a point 137
7.2 Waterfall model of a Schwarzschild black hole 138
7.3 A body cannot remain rigid as it approaches the Schwarzschild singularity 144
7.4 Embedding diagram of the Schwarzschild geometry 146
7.5 Schwarzschild spacetime diagram 148
7.6 Gullstrand-Painlevé spacetime diagram 149
7.7 Finkelstein spacetime diagram 150
7.8 Kruskal-Szekeres spacetime diagram 151
7.9 Morph Finkelstein to Kruskal-Szekeres spacetime diagram 152
7.10 Embedding diagram of the analytically extended Schwarzschild geometry 153
7.11 Analytically extended Kruskal-Szekeres spacetime diagram 153
7.12 Sequence of embedding diagrams of the analytically extended Schwarzschild geometry 154
7.13 Penrose spacetime diagram 156
7.14 Morph Kruskal-Szekeres to Penrose spacetime diagram 156
7.15 Penrose spacetime diagram of the analytically extended Schwarzschild geometry 157
7.16 Penrose diagram of the Schwarzschild geometry 157
7.17 Penrose diagram of the analytically extended Schwarzschild geometry 158
7.18 Oppenheimer-Snyder collapse of a pressureless star 160
7.19 Spacetime diagrams of Oppenheimer-Snyder collapse 161
7.20 Penrose diagrams of Oppenheimer-Snyder collapse 162
7.21 Penrose diagram of a collapsed spherical star 163
7.22 Visualization of falling into a Schwarzschild black hole 164
7.23 Collapse of a shell on to a pre-existing black hole 165
7.24 Finkelstein spacetime diagram of a shell collapsing on to a pre-existing black hole 166
7.25 Rindler diagram 168
7.26 Penrose Rindler diagram 169
7.27 Spacetime diagram of Minkowski space showing observers who start at rest and then accelerate 171
7.28 Formation of the Rindler illusory horizon 171
7.29 Penrose diagram of Rindler space 172
7.30 Killing vector field on a 2-sphere 176
8.1 Waterfall model of a Reissner-Nordström black hole 188
8.2 Spacetime diagram of the Reissner-Nordström geometry 190
8.3 Finkelstein spacetime diagram of the Reissner-Nordström geometry 191
8.4 Kruskal spacetime diagram of the Reissner-Nordström geometry 192
8.5 Kruskal spacetime diagram of the analytically extended Reissner-Nordström geometry 193
8.6 Penrose diagram of the Reissner-Nordström geometry 195
8.7 Penrose diagram illustrating why the Reissner-Nordström geometry is subject to the inflationary instability 197
8.8 Waterfall model of an extremal Reissner-Nordström black hole 200
8.9 Penrose diagram of the extremal Reissner-Nordström geometry 201
8.10 Waterfall model of an extremal Reissner-Nordström black hole 202
8.11 Penrose diagram of the Reissner-Nordström geometry with imaginary charge Q 203
9.1 Geometry of Kerr and Kerr-Newman black holes 206
9.2 Contours of constant ρ, and their normals, in Boyer-Lindquist coordinates 208
9.3 Geometry of extremal Kerr and Kerr-Newman black holes 211
9.4 Geometry of a super-extremal Kerr black hole 212
9.5 Waterfall model of a Kerr black hole 215
9.6 Penrose diagram of the Kerr-Newman geometry 216
10.1 Hubble diagram of Type Ia supernovae 222
10.2 Power spectrum of fluctuations in the CMB 224
10.3 Power spectrum of galaxies 225
10.4 Embedding diagram of the FLRW geometry 228
10.5 Poincaré disk 231
10.6 Newtonian picture of the Universe as a uniform density gravitating ball 233
10.7 Behaviour of the mass-energy density of various species as a function of cosmic time 238
10.8 Distance versus redshift in the FLRW geometry 245
10.9 Spacetime diagram of FLRW Universe 248
10.10 Evolution of redshifts of objects at fixed comoving distance 249
10.11 Cosmic scale factor and Hubble distance as a function of cosmic time 253
10.12 Mass-energy density of the Universe as a function of cosmic time 254
10.13 Temperature of the Universe as a function of cosmic time 257
10.14 A massive fermion flips between left- and right-handed as it propagates through spacetime 259
10.15 Embedding spacetime diagram of de Sitter space 269
10.16 Penrose diagram of de Sitter space 271
10.17 Embedding spacetime diagram of anti de Sitter space 273
10.18 Penrose diagram of anti de Sitter space 275
11.1 Tetrad vectors γ_m 284
11.2 Derivatives of tetrad vectors γ_m defined by parallel transport 291
13.1 Vectors, bivectors, and trivectors 316
13.2 Reflection of a vector through an axis 324
13.3 Rotation of a vector by a bivector 325
13.4 Right-handed rotation of a vector by angle θ 326
14.1 Lorentz boost of a vector by rapidity θ 344
14.2 Killing trajectories in Minkowski space 352
15.1 Partition of unity 382
17.1 Cosmic scale factors in BKL collapse 499
18.1 Expansion, vorticity, and shear 516
18.2 Formation of caustics in a hypersurface-orthogonal timelike congruence 517
18.3 Caustics in the galaxy NGC 474 518
18.4 Formation of caustics in a hypersurface-orthogonal null congruence 521
18.5 Spacetime diagram of the dog-leg proposition 524
18.6 Null boundary of the future of a 2-dimensional spacelike surface 525
19.1 Waterfall model of Schwarzschild and Reissner-Nordström black holes 531
19.2 Waterfall model of a Kerr black hole 542
20.1 Spacetime diagram of a naked singularity in dust collapse 563
21.1 Uncharged baryonic plasma falls into spherical black hole 585
21.2 Charged, non-conducting plasma falls into a spherical black hole 586
21.3 Charged baryonic matter and neutral dark matter fall into a spherical black hole 587
21.4 Smaller accretion rates provoke faster inflation 588
21.5 Collision rates in the black hole particle accelerator 590
21.6 Spacetime diagram illustrating qualitatively the three successive phases of mass inflation 592
21.7 Charged plasma with an ultrahard equation of state falls into a black hole 599
21.8 Charged plasma with near critical conductivity falls into a black hole, creating huge entropy inside the horizon 602
21.9 Accreting spherical charged black hole that creates even more entropy inside the horizon 603
21.10 Penrose diagram of entropy production inside a black hole 604
22.1 Geometry of a Kerr-NUT black hole 622
23.1 Values of 1/P for circular orbits of a charged particle about a Kerr-Newman black hole 638
23.2 Location of stable and unstable circular orbits in the Kerr geometry 641
23.3 Location of stable and unstable circular orbits in the super-extremal Kerr geometry 642
23.4 Radii of null circular orbits for a Kerr black hole 645
23.5 Radii of marginally stable circular orbits for a Kerr black hole 646
23.6 Values of the Hamilton-Jacobi parameter P_t for circular orbits in the equatorial plane of a near-extremal Kerr black hole 650
23.7 Energy and angular momentum on the ISCO 650
23.8 Accretion efficiency of a Kerr black hole 652
23.9 Outgoing and ingoing null coordinates 660
23.10 Pretorius-Israel double-null hypersurface-orthogonal congruence 663
24.1 Particles falling from infinite radius can be either outgoing or ingoing at the inner horizon only up to a maximum latitude 671
26.1 The two polarizations of gravitational waves 706
28.1 Evolution of the tensor potential h_ab 732
29.1 Evolution of dark matter and radiation in the simple model 748
29.2 Overdensities and velocities in the simple approximation 749
29.3 Regimes in the evolution of fluctuations 751
29.4 Superhorizon scales 752
29.5 Evolution of the scalar potential Φ at superhorizon scales 754
29.6 Radiation-dominated regime 755
29.7 Evolution of the potential Φ and the radiation monopole Θ_0 756
29.8 Evolution of the dark matter overdensity δ_c 757
29.9 Subhorizon scales 760
29.10 Growth of the dark matter overdensity δ_c through matter-radiation equality 762
29.11 Fluctuations that enter the horizon in the matter-dominated regime 764
29.12 Evolution of the potential Φ and the radiation monopole Θ_0 for long wavelength modes 765
29.13 Matter-dominated regime 766
29.14 Growth factor g(a) 771
29.15 Matter power spectrum for the simple model 776
xxii
30.1
30.2
30.3
30.4
31.1
31.2
31.3
31.4
32.1
32.2
32.3
32.4
32.5
33.1
33.2
33.3
33.4
33.4
33.5
33.6
33.7
33.8
34.1
34.2
Illustrations
Ion fractions in thermodynamic equilibrium
Recombination of Hydrogen
Departure coefficients in the recombination of Hydrogen
Recombination of Hydrogen and Helium
Overdensities and velocities in the hydrodynamic approximation
Photon and neutrino multipoles in the hydrodynamic approximation
Evolution of matter and photon monopoles in the hydrodynamic approximation
Matter power spectrum in the hydrodynamic approximation
Overdensities and velocities from a Boltzmann computation
Photon and neutrino multipoles
Evolution of matter and photon monopoles in a Boltzmann computation
Difference Ψ − Φ in scalar potentials
Matter power spectrum
Visibility function
Factors in the solution of the radiative transfer equation
ISW integrand
CMB transfer functions Tℓ (η0 , k) for a selection of harmonics ℓ
(continued)
Thomson scattering source functions at recombination
CMB transfer functions in the rapid recombination approximation
CMB power spectrum in the rapid recombination approximation
Multipole contributions to the CMB power spectrum
Thomson scattering of polarized light
Angles between photon momentum, scattered photon momentum, and wavevector
Tables
1.1
10.1
10.2
10.3
10.4
12.1
17.1
17.2
23.1
35.1
36.1
38.1
38.2
Trip across the Universe
Cosmic inventory
Properties of universes dominated by various species
Evolution of cosmic scale factor in universes dominated by various species
Effective entropy-weighted number of relativistic particle species
Petrov classification of the Weyl tensor
Classification of Bianchi spaces
Bianchi vierbein
Signs of Pt and Px in various regions of the Kerr-Newman geometry
Symmetry of spinor metric
Square of the conjugation operator
Conserved charges in the Standard Model
Coincidences of dimensions of Lie algebras of Spin(K) and SU(M )
Exercises and ∗Concept questions
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
1.10
1.11
1.12
1.13
1.14
1.15
1.16
1.17
1.18
1.19
1.20
1.21
1.22
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
∗
Does light move differently depending on who emits it?
Challenge problem: the paradox of the constancy of the speed of light
Pictorial derivation of the Lorentz transformation
3D model of the Lorentz transformation
Mathematical derivation of the Lorentz transformation
∗
Determinant of a Lorentz transformation
Time dilation
Lorentz contraction
∗
Is one side of a cube shorter than the other?
Twin paradox
∗
What breaks the symmetry between you and your twin?
∗
Proper time, proper distance
Scalar product
The principle of longest proper time
Superluminal jets
The rules of 4-dimensional perspective
Circles on the sky
Lorentz transformation preserves angles on the sky
The aberration of starlight
∗
Apparent (affine) distance
Brightness of a star
Tachyons
The equivalence principle implies the gravitational redshift of light, Part 1
The equivalence principle implies the gravitational redshift of light, Part 2
∗
Does covariant differentiation commute with the metric?
∗
Parallel transport when torsion is present
∗
Can the metric be Minkowski in the presence of torsion?
Covariant curl and coordinate curl
Covariant divergence and coordinate divergence
∗
If torsion does not vanish, does torsion-free covariant differentiation commute with the metric?
2.9
2.10
2.11
2.12
2.13
2.14
2.15
2.16
2.17
2.18
2.19
3.1
3.2
3.3
3.4
4.1
4.2
4.3
4.4
4.5
4.6
7.1
7.2
7.3
7.4
7.5
7.6
7.7
7.8
7.9
7.10
7.11
7.12
7.13
7.14
7.15
7.16
7.17
7.18
7.19
7.20
7.21
7.22
Gravitational redshift in a stationary metric
Gravitational redshift in Rindler space
Gravitational redshift in a uniformly rotating space
Derivation of the Riemann tensor
Jacobi identity
Einstein tensor in 3 or more dimensions
Special and general relativistic corrections for clocks on satellites
Equations of motion in weak gravity
Deflection of light by the Sun
Shapiro time delay
Gravitational lensing
Number of components of the Riemann, Ricci, and Weyl tensors in arbitrary dimensions
Weyl tensor in arbitrary dimensions
Number of Bianchi identities
Wave equation for the Riemann and Weyl tensors
∗
Redundant time coordinates?
∗
Throw a clock up in the air
∗
Conventional Lagrangian
Geodesics in Rindler space
∗
Action vanishes along a null geodesic, but its gradient does not
∗
How many integrals of motion can there be?
Schwarzschild metric in isotropic form
Derivation of the Schwarzschild metric
∗
Going forwards or backwards in time inside the horizon
∗
Is the singularity of a Schwarzschild black hole a point?
∗
Separation between infallers who fall in at different times
Geodesics in the Schwarzschild geometry
Geodesics in the Schwarzschild geometry in 3 or more dimensions
General relativistic precession of Mercury
A body cannot remain rigid as it approaches the Schwarzschild singularity
Affine distance between infallers who fall along different radial directions
Maximum transverse velocity of a light signal inside the horizon
∗
Penrose diagram of Minkowski space
∗
Penrose diagram of a thin spherical shell collapsing on to a Schwarzschild black hole
∗
Spherical Rindler space
Rindler illusory horizon
Area of the Rindler horizon
∗
What use is a Lie derivative?
Equivalence of expressions for the Lie derivative
Commutator of Lie derivatives
Lie derivative of the metric
Lie derivative of the inverse metric
Lie derivative of the metric determinant
8.1
8.2
10.1
10.2
10.3
10.4
10.5
10.6
10.7
10.8
10.9
10.10
10.11
10.12
10.13
10.14
10.15
10.16
10.17
10.18
10.19
10.20
10.21
10.22
10.23
11.1
11.2
11.3
11.4
11.5
11.6
11.7
11.8
11.9
11.10
11.11
13.1
13.2
13.3
13.4
13.5
13.6
13.7
∗
Units of charge of a charged black hole
Blueshift of a photon crossing the inner horizon of a Reissner-Nordström black hole
Isotropic (Poincaré) form of the FLRW metric
Omega in photons
Mass-energy in a FLRW Universe
∗
Mass of a ball of photons or of vacuum
Geodesics in the FLRW geometry
Age of a FLRW universe containing matter and vacuum
Age of a FLRW universe containing radiation and matter
Relation between conformal time and cosmic scale factor
Hubble diagram
Horizon size at recombination
The horizon problem
Relation between horizon and flatness problems
Distribution of non-interacting particles initially in thermodynamic equilibrium
The first law of thermodynamics with non-conserved particle number
Number, energy, pressure, and entropy of a relativistic ideal gas at zero chemical potential
A relation between thermodynamic integrals
Relativistic particles in the early Universe had approximately zero chemical potential
Entropy per particle
Photon temperature at high redshift versus today
Cosmic Neutrino Background
Maximally symmetric spaces
∗
Milne Universe
∗
Stationary FLRW metrics with different curvature constants describe the same spacetime
∗
Schwarzschild vierbein
Generators of Lorentz transformations are antisymmetric
Riemann tensor
Antisymmetry of the Riemann tensor
Cyclic symmetry of the Riemann tensor
Symmetry of the Riemann tensor
Number of components of the Riemann tensor
∗
Must connections vanish if Riemann vanishes?
Black holes in 2 spacetime dimensions?
Tidal forces falling into a Schwarzschild black hole
Totally antisymmetric tensor
Schur’s lemma
∗
What is the dimension of the rotor group in N dimensions?
∗
How fast do bivectors rotate?
Rotation of a vector
3D rotation matrices
∗
Properties of Pauli matrices
Translate a rotor into an element of SU(2)
13.8
13.9
13.10
14.1
14.2
14.3
14.4
14.5
14.6
14.7
14.8
14.10
14.11
14.12
14.13
14.14
14.15
14.16
14.17
14.18
15.1
15.2
16.1
16.2
16.3
16.4
16.5
16.6
16.7
16.8
16.9
16.10
16.11
16.12
16.13
17.1
17.2
17.3
17.4
17.5
17.6
17.7
18.1
Translate a Pauli spinor into a quaternion
Translate a quaternion into a Pauli spinor
Orthonormal eigenvectors of the spin operator
Null complex quaternions
Nilpotent complex quaternions
Lorentz boost
Factor a Lorentz rotor into a boost and a rotation
Topology of the group of Lorentz rotors
Interpolate a Lorentz transformation
Spline a Lorentz transformation
The wrong way to implement a Lorentz transformation
Translate a Dirac spinor into a complex quaternion
Translate a complex quaternion into a Dirac spinor
Dual Dirac spinor
Relation between ψ and ψ †
Translate a Dirac spinor into a pair of Pauli spinors
Is the group of Lorentz rotors isomorphic to SU(4)?
∗
Is ψψ real or complex?
∗
The boost axis of a null spinor is Lorentz-invariant
∗
What makes Weyl spinors special?
∗
Commutator versus wedge product of multivectors
Leibniz rule for the covariant spacetime derivative
∗
Can the coordinate metric be Minkowski in the presence of torsion?
∗
What kinds of metric or vierbein admit torsion?
∗
Why the names matter energy-momentum and spin angular-momentum?
Energy-momentum and spin angular-momentum of the electromagnetic field
Energy-momentum and spin angular-momentum of a Dirac (or Majorana) field
Electromagnetic field in the presence of torsion
Dirac (or Majorana) spinor field in the presence of torsion
Commutation of multivector forms
∗
Scalar product of the interval form e
Triple products involving products of the interval form e
Lie derivative of a form
Gravitational equations in arbitrary spacetime dimensions
Volume of a ball and area of a sphere
∗
Does Nature pick out a preferred foliation of time?
Energy and momentum constraints
Geodesics in Bianchi spacetimes
Kasner spacetime
Schwarzschild interior as a Bianchi spacetime
Kasner spacetime for a perfect fluid
Oscillatory Belinskii-Khalatnikov-Lifshitz (BKL) instability
Raychaudhuri equations for a non-geodesic timelike congruence
18.2
18.3
19.1
19.2
19.3
19.4
19.5
19.6
20.1
20.2
20.3
20.4
20.5
20.6
21.1
22.1
22.2
22.3
23.1
23.2
23.3
23.4
23.5
23.6
23.7
23.8
23.9
24.1
24.2
25.1
25.2
25.3
26.1
26.2
26.3
26.4
26.5
26.6
26.7
26.8
26.9
28.1
∗
How do singularity theorems apply to the Kerr geometry?
How do singularity theorems apply to the Reissner-Nordström geometry?
Tetrad frame of a rotating wheel
Coordinate transformation from Schwarzschild to Gullstrand-Painlevé
Velocity of a person who free-falls radially from rest
Dragging of inertial frames around a Kerr-Newman black hole
River model of the Friedmann-Lemaı̂tre-Robertson-Walker metric
Program geodesics in a rotating black hole
Apparent horizon
Birkhoff’s theorem
Naked singularities in spherical spacetimes
Constant density star
Oppenheimer-Snyder collapse
Self-similar line element
A collisionless two-stream model of inflation
Explore other separable solutions
Explore separable solutions in an arbitrary number N of spacetime dimensions
Area of the horizon
Near the Kerr-Newman singularity
When must t and φ progress forwards on a geodesic?
Inside the sisytube
Gödel’s Universe
Negative energy trajectories outside the horizon
∗
Are principal null geodesics circular orbits?
Icarus
Interstellar
Expansion, vorticity, and shear along the principal null congruences of the Λ-Kerr-Newman
geometry
∗
Which Einstein equations are redundant?
Can accretion fuel outgoing and ingoing streams at the inner horizon?
∗
Non-infinitesimal tetrad transformations in perturbation theory?
∗
Should not the Lie derivative of a tetrad tensor be a tetrad tensor?
∗
Variation of unperturbed quantities under coordinate gauge transformations?
Classification of perturbations in N spacetime dimensions
∗
Are gauge-invariant potentials Lorentz-invariant?
∗
What parts of Maxwell’s equations can be discarded?
Einstein tensor in harmonic gauge
∗
Independent evolution of scalar, vector, and tensor modes
Gravity Probe B and the geodetic and frame-dragging precession of gyroscopes
Scalar potentials outside a spherical body
∗
Units of the gravitational quadrupole radiation formula
Hulse-Taylor binary
∗
Global curvature as a perturbation?
∗
28.2
28.3
28.4
28.5
28.6
28.7
29.1
29.2
29.3
29.4
29.5
29.6
29.7
29.8
29.9
29.10
29.11
29.12
29.13
29.14
29.15
30.1
30.2
30.3
30.4
30.5
30.6
30.7
31.1
31.2
31.3
31.4
31.5
31.6
31.7
32.1
32.2
32.3
32.4
32.5
33.1
33.2
Can the Universe at large rotate?
Evolution of vector perturbations in FLRW spacetimes
Evolution of tensor perturbations (gravitational waves) in FLRW spacetimes
∗
Scalar, vector, tensor components of energy-momentum conservation
∗
What frame does the CMB define?
∗
Are congruences of comoving observers in cosmology hypersurface-orthogonal?
Entropy perturbation
∗
Entropy perturbation when number is conserved
Relation between entropy and ζ
∗
If the Friedmann equations enforce conservation of entropy, where does the entropy of the
Universe come from?
∗
What is meant by the horizon in cosmology?
Redshift of matter-radiation equality
Generic behaviour of dark matter
Generic behaviour of radiation
∗
Can neutrinos be treated as a fluid?
Program the equations for the simplest set of cosmological assumptions
Radiation-dominated fluctuations
∗
Does the radiation monopole oscillate after recombination?
Growth of baryon fluctuations after recombination
∗
Curvature scale
Power spectrum of matter fluctuations: simple approximation
Proton and neutron fractions
∗
Level populations of hydrogen near recombination
∗
Ionization state of hydrogen near recombination
∗
Atomic structure notation
∗
Dimensional analysis of the collision rate
Detailed balance
Recombination
Thomson scattering rate
Program the equations in the hydrodynamic approximation
Power spectrum of matter fluctuations: hydrodynamic approximation
Effect of massive neutrinos on the matter power spectrum
Behaviour of radiation in the presence of damping
Diffusion scale
Generic behaviour of neutrinos
Program the Boltzmann equations
Power spectrum of matter fluctuations: Boltzmann treatment
Boltzmann equation factors in a general gauge
Moments of the non-baryonic cold dark matter Boltzmann equation
Initial conditions in the presence of neutrinos
CMB power spectrum in the instantaneous and rapid recombination approximation
CMB power spectra from CAMB
33.3
34.1
34.2
34.3
34.4
34.5
34.6
34.7
35.1
35.2
35.3
36.1
36.2
36.3
36.4
36.5
36.6
36.7
38.1
38.2
Cosmic Neutrino Background
∗
Relation of the polarization vector to the electromagnetic potential
∗
Elliptically polarized light
∗
Fluctuations with |m| ≥ 3?
Photon diffusion including polarization
Boltzmann code including polarization
∗
Scalar, vector, tensor power spectra?
CMB polarized power spectrum
Consistency of spinor and multivector scalar products
∗
Imaginary spinor metric?
Generalize the super geometric algebra to an arbitrary number of dimensions
∗
Boost and spin weight
∗
Lorentz transformation of the phase of a spinor
∗
Alternative scalar product of Dirac spinors?
Consistency of spinor and multivector scalar products
∗
Chiral scalar
Complex conjugate of a product of spinors and multivectors
Generalize the super spacetime algebra to an arbitrary number of dimensions
Prove that SU(2) × SU(2) is isomorphic to Spin(4)
Prove that SU(4) is isomorphic to Spin(6)
Legal notice
If you have obtained a copy of this proto-book from anywhere other than my website
http://jila.colorado.edu/~ajsh/
then that copy is illegal. This version of the proto-book is linked at
http://jila.colorado.edu/~ajsh/astr3740_17/notes.html
For the time being, this proto-book is free. Do not fall for third party scams that attempt to profit from
this proto-book.
© A. J. S. Hamilton 2014-17
Notation
Except where actual units are needed, units are such that the speed of light is one, c = 1, and Newton’s
gravitational constant is one, G = 1.
The metric signature is −+++.
Greek (brown) letters κ, λ, ..., denote spacetime (4D, usually) coordinate indices. Latin (black) letters k,
l, ..., denote spacetime (4D, usually) tetrad indices. Early-alphabet greek letters α, β, ... denote spatial (3D,
usually) coordinate indices. Early-alphabet latin letters a, b, ... denote spatial (3D, usually) tetrad indices.
To avoid distraction, colouring is applied only to coordinate indices, not to the coordinates themselves.
Early-alphabet latin letters a, b, ... are also used to denote spinor indices.
Sequences of indices, as encountered in multivectors (Chapter 13) and differential forms (Chapter 15), are
denoted by capital letters. Greek (brown) capital letters Λ, Π, ... denote sequences of spacetime (4D, usually)
coordinate indices. Latin (black) capital letters K, L, ... denote sequences of spacetime (4D, usually) tetrad
indices. Early-alphabet capital letters denote sequences of spatial (3D, usually) indices, coloured brown A,
B, ... for coordinate indices, and black A, B, ... for tetrad indices.
Specific (non-dummy) components of a vector are labelled by the corresponding coordinate (brown) or
tetrad (black) direction, for example $A^\mu = \{A^t, A^x, A^y, A^z\}$ or $A^m = \{A^t, A^x, A^y, A^z\}$. Sometimes it is
convenient to use numerical indices, as in $A^\mu = \{A^0, A^1, A^2, A^3\}$ or $A^m = \{A^0, A^1, A^2, A^3\}$. Allowing the
same label to denote either a coordinate or a tetrad index risks ambiguity, but it should be apparent from
the context (or colour) what is meant. Some texts distinguish coordinate and tetrad indices for example by
a caret on the latter (there is no widespread convention), but this produces notational overload.
Boldface denotes abstract vectors, in either 3D or 4D. In 4D, $\boldsymbol{A} = A^\mu \boldsymbol{e}_\mu = A^m \boldsymbol{\gamma}_m$, where $\boldsymbol{e}_\mu$ denote
coordinate tangent axes, and $\boldsymbol{\gamma}_m$ denote tetrad axes.
Repeated paired dummy indices are summed over, the implicit summation convention. In special and
general relativity, one index of a pair must be up (contravariant), while the other must be down (covariant).
If the space being considered is Euclidean, then both indices may be down.
$\partial/\partial x^\mu$ denotes coordinate partial derivatives, which commute. $\partial_m$ denotes tetrad directed derivatives,
which do not commute. $D_\mu$ and $D_m$ denote respectively coordinate-frame and tetrad-frame covariant derivatives.
Choice of metric signature
There is a tendency, by no means unanimous, for general relativists to prefer the −+++ metric signature,
while particle physicists prefer +−−−.
For someone like me who does general relativistic visualization, there is no contest: the choice has to be
−+++, so that signs remain consistent between 3D spatial vectors and 4D spacetime vectors. For example,
the 3D industry knows well that quaternions provide the most efficient and powerful way to implement
spatial rotations. As shown in Chapter 13, complex quaternions provide the best way to implement Lorentz
transformations, with the subgroup of real quaternions continuing to provide spatial rotations. Compatibility
requires −+++. Actually, OpenGL and other graphics languages put spatial coordinates in the first three
indices, leaving time to occupy the fourth index; but in these notes I stick to the physics convention of
putting time in the zeroth index.
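As a concrete illustration of the remark about quaternions and spatial rotations, here is a minimal sketch in Python. It assumes the common convention in which a unit quaternion q = (w, x, y, z) rotates a vector v by forming q v q*; the function names are hypothetical, and the rotor conventions adopted in Chapter 13 may differ in sign or ordering.

```python
import math

def quat_mul(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )

def quat_conj(q):
    """Quaternion conjugate: flip the sign of the vector part."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotation_quat(axis, angle):
    """Unit quaternion for a right-handed rotation by `angle` about `axis` (assumed convention)."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2) / norm
    return (math.cos(angle / 2), ax * s, ay * s, az * s)

def rotate(v, q):
    """Rotate the 3-vector v by the unit quaternion q, via q v q*."""
    p = (0.0, v[0], v[1], v[2])  # embed v as a pure quaternion
    w, x, y, z = quat_mul(quat_mul(q, p), quat_conj(q))
    return (x, y, z)

# Rotate the x-axis by 90 degrees about the z-axis; the result is the y-axis up to rounding.
q = rotation_quat((0.0, 0.0, 1.0), math.pi / 2)
rotated = rotate((1.0, 0.0, 0.0), q)
print(rotated)
```

Only four numbers are stored per rotation, and composing two rotations is a single quaternion product, which is one reason the approach is popular in 3D graphics.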
In practical calculations it is convenient to be able to switch transparently between boldface and index notation in both 3D and 4D contexts. This is where the +−−− signature poses greater potential for
misinterpretation in 3D. For example, with this signature, what is the sign of the 3D scalar product
$$\boldsymbol{a} \cdot \boldsymbol{b} \ ?$$
Is it $\boldsymbol{a} \cdot \boldsymbol{b} = \sum_{a=1}^{3} a^a b_a$ or $\boldsymbol{a} \cdot \boldsymbol{b} = \sum_{a=1}^{3} a_a b_a$? To be consistent with common 3D usage, it must be the
latter. With the +−−− signature, it must be that $\boldsymbol{a} \cdot \boldsymbol{b} = -a^a b_a$, where the repeated indices signify implicit
summation over spatial indices. So you have to remember to introduce a minus sign in switching between
boldface and index notation.
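The sign issue with the scalar product can be checked numerically. The following is a minimal sketch, assuming diagonal (Minkowski) metrics and vectors given by their contravariant components; the helper names are hypothetical and chosen only for illustration.

```python
# Diagonal Minkowski metrics in the two common signature conventions.
MOSTLY_PLUS = (-1, 1, 1, 1)    # the -+++ signature used in this book
MOSTLY_MINUS = (1, -1, -1, -1)  # the +--- signature

def lower_index(upper, metric_diag):
    """Lower the index of a 4-vector using a diagonal metric."""
    return tuple(g * v for g, v in zip(metric_diag, upper))

def spatial_contraction(a_upper, b_upper, metric_diag):
    """Sum a^a b_a over the three spatial indices only."""
    b_lower = lower_index(b_upper, metric_diag)
    return sum(a_upper[i] * b_lower[i] for i in (1, 2, 3))

def euclidean_dot(a_upper, b_upper):
    """Ordinary 3D dot product of the spatial components."""
    return sum(a_upper[i] * b_upper[i] for i in (1, 2, 3))

# Two purely spatial vectors (zero time component).
a = (0.0, 1.0, 2.0, 3.0)
b = (0.0, 4.0, 5.0, 6.0)

print(euclidean_dot(a, b))                      # 32.0
print(spatial_contraction(a, b, MOSTLY_PLUS))   # 32.0
print(spatial_contraction(a, b, MOSTLY_MINUS))  # -32.0
```

With −+++ the index contraction agrees with the familiar 3D dot product, whereas with +−−− the two differ by a sign, which is the minus sign that has to be remembered when switching notations.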
As another example, what is the sign of the 3D vector product
$$\boldsymbol{a} \times \boldsymbol{b} \ ?$$
Is it $\boldsymbol{a} \times \boldsymbol{b} = \sum_{b,c=1}^{3} \varepsilon_{abc} a^b b^c$ or $\boldsymbol{a} \times \boldsymbol{b} = \sum_{b,c=1}^{3} \varepsilon^a{}_{bc} a^b b^c$ or $\boldsymbol{a} \times \boldsymbol{b} = \sum_{b,c=1}^{3} \varepsilon^{abc} a_b b_c$? Well, if you want to switch
transparently between boldface and index notation, and you decide that you want boldface consistently to
signify a vector with a raised index, then maybe you’d choose the middle option. To be consistent with
standard 3D convention for the sign of the vector product, maybe you’d choose $\varepsilon^a{}_{bc}$ to have positive sign for
abc an even permutation of xyz.
Finally, what is the sign of the 3D spatial gradient operator
$$\boldsymbol{\nabla} \equiv \frac{\partial}{\partial \boldsymbol{x}} \ ?$$
Is it $\boldsymbol{\nabla} = \partial/\partial x^a$ or $\boldsymbol{\nabla} = \partial/\partial x_a$? Convention dictates the former, in which case it must be that some boldface
3D vectors must signify a vector with a raised index, and others a vector with a lowered index. Oh dear.
PART ONE
FUNDAMENTALS
Concept Questions
1. What does c = universal constant mean? What is speed? What is distance? What is time?
2. c + c = c. How can that be possible?
3. The first postulate of special relativity asserts that spacetime forms a 4-dimensional continuum. The
fourth postulate of special relativity asserts that spacetime has no absolute existence. Isn’t that a
contradiction?
4. The principle of special relativity says that there is no absolute spacetime, no absolute frame of reference
with respect to which position and velocity are defined. Yet does not the cosmic microwave background
define such a frame of reference?
5. How can two people moving relative to each other at near c both think each other’s clock runs slow?
6. How can two people moving relative to each other at near c both think the other is Lorentz-contracted?
7. All paradoxes in special relativity have the same solution. In one word, what is that solution?
8. All conceptual paradoxes in special relativity can be understood by drawing what kind of diagram?
9. Your twin takes a trip to α Cen at near c, then returns to Earth at near c. Meeting your twin, you see
that the twin has aged less than you. But from your twin’s perspective, it was you that receded at near
c, then returned at near c, so your twin thinks you aged less. Is it true?
10. Blobs in the jet of the galaxy M87 have been tracked by the Hubble Space Telescope to be moving at
about 6c. Does this violate special relativity?
11. If you watch an object move at near c, does it actually appear Lorentz-contracted? Explain.
12. You speed towards the centre of our Galaxy, the Milky Way, at near c. Does the centre appear to you
closer or farther away?
13. You go on a trip to the centre of the Milky Way, 30,000 lightyears distant, at near c. How long does the
trip take you?
14. You surf a light ray from a distant quasar to Earth. How much time does the trip take, from your
perspective?
15. If light is a wave, what is waving?
16. As you surf the light ray, how fast does it appear to vibrate?
17. How does the phase of a light ray vary along the light ray? Draw surfaces of constant phase on a
spacetime diagram.
18. You see a distant galaxy at a redshift of z = 1. If you could see a clock on the galaxy, how fast would
the clock appear to tick? Could this be tested observationally?
19. You take a trip to α Cen at near c, then instantaneously accelerate to return at near c. If you are
looking through a telescope at a clock on the Earth while you instantaneously accelerate, what do you
see happen to the clock?
20. In what sense is time an imaginary spatial dimension?
21. In what sense is a Lorentz boost a rotation by an imaginary angle?
22. You know what it means for an object to be rotating at constant angular velocity. What does it mean
for an object to be boosting at a constant rate?
23. A wheel is spinning so that its rim is moving at near c. The rim is Lorentz-contracted, but the spokes
are not. How can that be?
24. You watch a wheel rotate at near the speed of light. The spokes appear bent. How can that be?
25. Does a sunbeam appear straight or bent when you pass by it at near the speed of light?
26. Energy and momentum are unified in special relativity. Explain.
27. In what sense is mass equivalent to energy in special relativity? In what sense is mass different from
energy?
28. Why is the Minkowski metric unchanged by a Lorentz transformation?
29. What is the best way to program Lorentz transformations on a computer?
What’s important?
1. The postulates of special relativity.
2. Understanding conceptually the unification of space and time implied by special relativity.
   1. Spacetime diagrams.
   2. Simultaneity.
   3. Understanding the paradoxes of relativity — time dilation, Lorentz contraction, the twin paradox.
3. The mathematics of spacetime transformations.
   1. Lorentz transformations.
   2. Invariant spacetime distance.
   3. Minkowski metric.
   4. 4-vectors.
   5. Energy-momentum 4-vector. $E = mc^2$.
   6. The energy-momentum 4-vector of massless particles, such as photons.
4. What things look like at relativistic speeds.
1 Special Relativity
Special relativity is a fundamental building block of general relativity. General relativity postulates that the
local structure of spacetime is that of special relativity.
The primary goal of this Chapter is to convey a clear conceptual understanding of special relativity.
Everyday experience gives the impression that time is absolute, and that space is entirely distinct from time,
as Galileo and Newton postulated. Special relativity demands, in apparent contradiction to experience, the
revolutionary notion that space and time are united into a single 4-dimensional entity, called spacetime.
The revolution forces conclusions that appear paradoxical: how can two people moving relative to each other
both measure the speed of light to be the same, both think each other’s clock runs slow, and both think the
other is Lorentz-contracted?
In fact special relativity does not contradict everyday experience. It is just that we humans move through
our world at speeds that are so much smaller than the speed of light that we are not aware of relativistic
effects. The correctness of special relativity is confirmed every day in particle accelerators that smash particles
together at highly relativistic speeds.
See http://jila.colorado.edu/~ajsh/sr/ for animated versions of several of the diagrams in this Chapter.
1.1 Motivation
The history of the development of special relativity is rich and human, and it is beyond the intended scope
of this book to give any reasonable account of it. If you are interested in the history, I recommend starting
with the popular account by Thorne (1994).
As first proposed by James Clerk Maxwell in 1864, light is an electromagnetic wave. Maxwell believed
(Goldman, 1984) that electromagnetic waves must be carried by some medium, the luminiferous aether,
just as sound waves are carried by air. However, Maxwell knew that his equations of electromagnetism had
empirical validity without any need for the hypothesis of an aether.
For Albert Einstein, the theory of special relativity was motivated by the curious circumstance that
Maxwell’s equations of electromagnetism seemed to imply that the speed of light was independent of the
motion of an observer. Others before Einstein had noticed this curious feature of Maxwell’s equations.
Joseph Larmor, Hendrik Lorentz, and Henri Poincaré all noticed that the form of Maxwell’s equations
could be preserved if lengths and times measured by an observer were somehow altered by motion through
the aether. The transformations of special relativity were discovered before Einstein by Lorentz (1904), the
name “Lorentz transformations” being conferred by Poincaré (1905).
Einstein’s great contribution was to propose (Einstein, 1905) that there was no aether, no absolute spacetime. From this simple and profound idea stemmed his theory of special relativity.
1.2 The postulates of special relativity
The theory of special relativity can be derived formally from a small number of postulates:
1. Space and time form a 4-dimensional continuum;
2. The existence of globally inertial frames;
3. The speed of light is constant;
4. The principle of special relativity.
The first two postulates are assertions about the structure of spacetime, while the last two postulates form
the heart of special relativity. Most books mention just the last two postulates, but I think it is important
to know that special (and general) relativity simply postulate the 4-dimensional character of spacetime, and
that special relativity postulates moreover that spacetime is flat.
1. Space and time form a 4-dimensional continuum. The correct mathematical word for continuum
is manifold. A 4-dimensional manifold is defined mathematically to be a topological space that is locally
homeomorphic to Euclidean 4-space R4 .
The postulate that spacetime forms a 4-dimensional continuum is a generalization of the classical Galilean
concept that space and time form separate 3 and 1 dimensional continua. The postulate of a 4-dimensional
spacetime continuum is retained in general relativity.
Physicists widely believe that this postulate must ultimately break down, that space and time are quantized
over extremely small intervals of space and time, the Planck length $\sqrt{G\hbar/c^3} \approx 10^{-35}$ m, and the Planck time
$\sqrt{G\hbar/c^5} \approx 10^{-43}$ s, where $G$ is Newton’s gravitational constant, $\hbar \equiv h/(2\pi)$ is Planck’s constant divided by
$2\pi$, and $c$ is the speed of light.
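The orders of magnitude quoted for the Planck length and Planck time can be checked with a few lines of Python. This is a minimal sketch; the numerical constants are the CODATA 2018 values in SI units, which are not given in the text and are supplied here as an assumption.

```python
import math

# Physical constants in SI units (CODATA 2018 values; supplied for this sketch).
G = 6.67430e-11         # Newton's gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.054571817e-34  # Planck's constant divided by 2 pi, J s
C = 299_792_458.0       # speed of light, m/s (exact by definition)

# Planck length sqrt(G hbar / c^3) and Planck time sqrt(G hbar / c^5).
planck_length = math.sqrt(G * HBAR / C**3)
planck_time = math.sqrt(G * HBAR / C**5)

print(f"Planck length = {planck_length:.3e} m")
print(f"Planck time   = {planck_time:.3e} s")
```

The Planck time is simply the Planck length divided by $c$, the time light takes to cross one Planck length.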
2. The existence of globally inertial frames. Statement: “There exist global spacetime frames with
respect to which unaccelerated objects move in straight lines at constant velocity.”
A spacetime frame is a system of coordinates for labelling space and time. Four coordinates are needed,
because spacetime is 4-dimensional. A frame in which unaccelerated objects move in straight lines at constant velocity is called an inertial frame. One can easily think of non-inertial frames: a rotating frame, an
accelerating frame, or simply a frame with some bizarre Dahlian labelling of coordinates.
A globally inertial frame is an inertial frame that covers all of space and time. The postulate that
globally inertial frames exist is carried over from classical mechanics (Newton’s first law of motion).
Notice the subtle shift from the Newtonian perspective. The postulate is not that particles move in straight
lines, but rather that there exist spacetime frames with respect to which particles move in straight lines.
Implicit in the assumption of the existence of globally inertial frames is the assumption that the geometry of
spacetime is flat, the geometry of Euclid, where parallel lines remain parallel to infinity. In general relativity,
this postulate is replaced by the weaker postulate that local (not global) inertial frames exist. A locally
inertial frame is one which is inertial in a “small neighbourhood” of a spacetime point. In general relativity,
spacetime can be curved.
3. The speed of light is constant. Statement: “The speed of light c is a universal constant, the same in
any inertial frame.”
This postulate is the nub of special relativity. The immediate challenge of this chapter, §1.3, is to confront
its paradoxical implications, and to resolve them.
Measuring speed requires being able to measure intervals of both space and time: speed is distance travelled
divided by time elapsed. Inertial frames constitute a special class of spacetime coordinate systems; it is with
respect to distance and time intervals in these special frames that the speed of light is asserted to be constant.
In general relativity, arbitrarily weird coordinate systems are allowed, and light need move neither in
straight lines nor at constant velocity with respect to bizarre coordinates (why should it, if the labelling
of space and time is totally arbitrary?). However, general relativity asserts the existence of locally inertial
frames, and the speed of light is a universal constant in those frames.
In 1983, the General Conference on Weights and Measures officially defined the speed of light to be
$$ c \equiv 299{,}792{,}458 \ \mathrm{m\,s^{-1}} , \tag{1.1} $$
and the metre, instead of being a primary measure, became a secondary quantity, defined in terms of the
second and the speed of light.
4. The principle of special relativity. Statement: “The laws of physics are the same in any inertial
frame, regardless of position or velocity.”
Physically, this means that there is no absolute spacetime, no absolute frame of reference with respect to
which position and velocity are defined. Only relative positions and velocities between objects are meaningful.
Mathematically, the principle of special relativity requires that the equations of special relativity be
Lorentz covariant.
It is to be noted that the principle of special relativity does not imply the constancy of the speed of light,
although the postulates are consistent with each other. Moreover the constancy of the speed of light does
not imply the Principle of Special Relativity, although for Einstein the former appears to have been the
inspiration for the latter.
An example of the application of the principle of special relativity is the construction of the energy-momentum 4-vector of a particle, which should have the same form in any inertial frame (§1.11).
1.3 The paradox of the constancy of the speed of light
The postulate that the speed of light is the same in any inertial frame leads immediately to a paradox.
Resolution of this paradox compels a revolution in which space and time are united from separate 3 and
1-dimensional continua into a single 4-dimensional continuum.
Figure 1.1 shows Vermilion emitting a flash of light, which expands away from her in all directions.
Vermilion thinks that the light moves outward at the same speed in all directions. So Vermilion thinks that
she is at the centre of the expanding sphere of light.
Figure 1.1 shows also Cerulean, who is moving away from Vermilion at about half the speed of light. But,
says special relativity, Cerulean also thinks that the light moves outward at the same speed in all directions
from him. So Cerulean should be at the centre of the expanding light sphere too. But he’s not, is he. Paradox!
Figure 1.1 Vermilion emits a flash of light, which (from left to right) expands away from her in all directions. Since
the speed of light is constant in all directions, she finds herself at the centre of the expanding sphere of light. Cerulean
is moving to the right at half of the speed of light relative to Vermilion. Special relativity declares that Cerulean too
thinks that the speed of light is constant in all directions. So should not Cerulean think that he too is at the centre
of the expanding sphere of light? Paradox!
Concept question 1.1 Does light move differently depending on who emits it? Would the light
have expanded differently if Cerulean had emitted the light?
Exercise 1.2 Challenge problem: the paradox of the constancy of the speed of light. Can you
figure out a solution to the paradox? Somehow you have to arrange that both Vermilion and Cerulean regard
themselves as being in the centre of the expanding sphere of light.
1.3.1 Spacetime diagram
A spacetime diagram suggests a way of thinking, first advocated by Minkowski, 1909, that leads to the
solution of the paradox of the constancy of the speed of light. Indeed, spacetime diagrams provide the way
to resolve all conceptual paradoxes in special relativity, so it is thoroughly worthwhile to understand them.
A spacetime diagram, Figure 1.2, is a diagram in which the vertical axis represents time, while the
horizontal axis represents space. Really there are three dimensions of space, which can be thought of as
filling additional horizontal dimensions. But for simplicity a spacetime diagram usually shows just one spatial
dimension.
Figure 1.2 A spacetime diagram shows events in space and time. In a spacetime diagram, time goes upward, while
space dimensions are horizontal. Really there should be 3 space dimensions, but usually it suffices to show 1 spatial
dimension, as here. In a spacetime diagram, the units of space and time are chosen so that light goes one unit of
distance in one unit of time, i.e. the units are such that the speed of light is one, c = 1. Thus light moves upward and
outward at 45° from vertical in a spacetime diagram.
In a spacetime diagram, the units of space and time are chosen so that light goes one unit of distance in
one unit of time, i.e. the units are such that the speed of light is one, c = 1. Thus light always moves upward
at 45° from vertical in a spacetime diagram. Each point in 4-dimensional spacetime is called an event. Light
signals converging to or expanding from an event follow a 3-dimensional hypersurface called the lightcone.
Light converging on to an event is on the past lightcone, while light emerging from an event is on the
future lightcone.
Figure 1.3 Spacetime diagram of Vermilion emitting a flash of light. This is a spacetime diagram version of the
situation illustrated in Figure 1.1. The lines along which Vermilion and Cerulean move through spacetime are called
their worldlines. Each point in 4-dimensional spacetime is called an event. Light signals converging to or expanding
from an event follow a 3-dimensional hypersurface called the lightcone. In the diagram, the sphere of light expanding
from the emission event is following the future lightcone. There is also a past lightcone, not shown here.
Figure 1.3 shows a spacetime diagram of Vermilion emitting a flash of light, and Cerulean moving relative
to Vermilion at about 1/2 the speed of light. This is a spacetime diagram version of the situation illustrated in
Figure 1.1. The lines along which Vermilion and Cerulean move through spacetime are called their worldlines.
Consider again the challenge problem. The problem is to arrange that both Vermilion and Cerulean are
at the centre of the lightcone, from their own points of view.
Here’s a clue. Cerulean’s concept of space and time may not be the same as Vermilion’s.
1.3.2 Centre of the lightcone
The solution to the paradox is that Cerulean’s spacetime is skewed compared to Vermilion’s, as illustrated
by Figure 1.4. The thing to notice in the diagram is that Cerulean is in the centre of the lightcone, according
to the way Cerulean perceives space and time. Vermilion remains at the centre of the lightcone according
to the way Vermilion perceives space and time. In the diagram Vermilion and her space are drawn at one
“tick” of her clock past the point of emission, and likewise Cerulean and his space are drawn at one “tick”
of his identical clock past the point of emission. Of course, from Cerulean’s point of view his spacetime is
quite normal, and it is Vermilion’s spacetime that is skewed.
Figure 1.4 The solution to how both Vermilion and Cerulean can consider themselves to be at the centre of the
lightcone. Cerulean’s spacetime is skewed compared to Vermilion’s. Cerulean is in the centre of the lightcone, according
to the way Cerulean perceives space and time, while Vermilion remains at the centre of the lightcone according to the
way Vermilion perceives space and time. In the diagram Vermilion (red) and her space are drawn at one “tick” of her
clock past the point of emission, and likewise Cerulean (blue) and his space are drawn at one “tick” of his identical
clock past the point of emission.
In special relativity, the transformation between the spacetime frames of two inertial observers is called a
Lorentz transformation. In general, a Lorentz transformation consists of a spatial rotation about some
spatial axis, combined with a Lorentz boost by some velocity in some direction.
Only space along the direction of motion gets skewed with time. Distances perpendicular to the direction
of motion remain unchanged. Why must this be so? Consider two hoops which have the same size when at
rest relative to each other. Now set the hoops moving towards each other. Which hoop passes inside the
other? Neither! For suppose Vermilion thinks Cerulean’s hoop passed inside hers; by symmetry, Cerulean
must think Vermilion’s hoop passed inside his; but both cannot be true; the only possibility is that the hoops
remain the same size in directions perpendicular to the direction of motion.
If you have understood all this, then you have understood the crux of special relativity, and you can
now go away and figure out all the mathematics of Lorentz transformations. The mathematical problem is:
what is the relation between the spacetime coordinates {t, x, y, z} and {t′, x′, y′, z′} of a spacetime interval,
a 4-vector, in Vermilion’s versus Cerulean’s frames, if Cerulean is moving relative to Vermilion at velocity v
in, say, the x direction? The solution follows from requiring
1. that both observers consider themselves to be at the centre of the lightcone, as illustrated by Figure 1.4,
and
2. that distances perpendicular to the direction of motion remain unchanged, as illustrated by Figure 1.5.
An alternative version of the second condition is that a Lorentz transformation at velocity v followed by a
Lorentz transformation at velocity −v should yield the unit transformation.
Note that the postulate of the existence of globally inertial frames implies that Lorentz transformations
are linear, that straight lines (4-vectors) in one inertial spacetime frame transform into straight lines in other
inertial frames.
You will solve this problem in the next section but two, §1.6. As a prelude, the next two sections, §1.4 and
§1.5 discuss simultaneity and time dilation.
Figure 1.5 Same as Figure 1.4, but with Cerulean moving into the page instead of to the right. This is just Figure 1.4
spatially rotated by 90° in the horizontal plane. Distances perpendicular to the direction of motion are unchanged.
1.4 Simultaneity
Most (all?) of the apparent paradoxes of special relativity arise because observers moving at different velocities relative to each other have different notions of simultaneity.
1.4.1 Operational definition of simultaneity
How can simultaneity, the notion of events occurring at the same time at different places, be defined operationally?
One way is illustrated in the sequence of spacetime diagrams in Figure 1.6. Vermilion surrounds herself
with a set of mirrors, equidistant from Vermilion. She sends out a flash of light, which reflects off the mirrors
back to Vermilion. How does Vermilion know that the mirrors are all the same distance from her? Because the
reflected flash from the mirrors arrives back to Vermilion all at the same instant. Vermilion asserts that the
light flash must have hit all the mirrors simultaneously. Vermilion also asserts that the instant when the light
hit the mirrors must have been the instant, as registered by her wristwatch, precisely half way between the
moment she emitted the flash and the moment she received it back again. If it takes, say, 2 seconds between
flash and receipt, then Vermilion concludes that the mirrors are 1 lightsecond away from her. The spatial
hyperplane passing through these events is a hypersurface of simultaneity. More generally, from Vermilion’s
perspective, each horizontal hyperplane in the spacetime diagram is a hypersurface of simultaneity.
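The operational rule above can be sketched in a few lines of code. This is only an illustration, not part of the original construction: the function name `radar_locate` and its arguments are hypothetical, and units are assumed to have c = 1 (seconds and lightseconds, say).

```python
def radar_locate(t_emit, t_receive):
    """Time and distance an inertial observer assigns to a reflection event.

    t_emit, t_receive: readings of the observer's own clock when the flash
    leaves and when its reflection returns. Units are assumed to have c = 1.
    """
    t_event = 0.5 * (t_emit + t_receive)    # halfway between emission and receipt
    distance = 0.5 * (t_receive - t_emit)   # the light covers the distance twice
    return t_event, distance


# The example in the text: 2 seconds between flash and receipt.
print(radar_locate(0.0, 2.0))   # (1.0, 1.0): 1 second after emission, 1 lightsecond away
```

All events to which Vermilion assigns the same `t_event` lie on one of her hypersurfaces of simultaneity.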
Cerulean defines surfaces of simultaneity using the same operational setup: he encompasses himself with
mirrors, arranging them so that a flash of light returns from them to him all at the same instant. But whereas
Cerulean concludes that his mirrors are all equidistant from him and that the light bounces off them all at the
same instant, Vermilion thinks otherwise. From Vermilion’s point of view, the light bounces off Cerulean’s
mirrors at different times and moreover at different distances from Cerulean, as illustrated in Figure 1.7.
Only so can the speed of light be constant, as Vermilion sees it, and yet the light return to Cerulean all at
the same instant.
Figure 1.6 How Vermilion defines hypersurfaces of simultaneity. She surrounds herself with (green) mirrors all at the
same distance. She sends out a light beam, which reflects off the mirrors, and returns to her all at the same moment.
She knows that the mirrors are all at the same distance precisely because the light returns to her all at the same
moment. The events where the light bounced off the mirrors define a hypersurface of simultaneity for Vermilion.
Figure 1.7 Cerulean defines hypersurfaces of simultaneity using the same operational setup as Vermilion: he bounces
light off (green) mirrors all at the same distance from him, arranging them so that the light returns to him all at the
same time. But from Vermilion’s frame, Cerulean’s experiment looks skewed, as shown here.
Of course from Cerulean’s point of view all is fine: he thinks his mirrors are equidistant from him, and
that the light bounces off them all at the same instant. The inevitable conclusion is that Cerulean must
measure space and time along axes that are skewed relative to Vermilion’s. Events that happen at the same
time according to Cerulean happen at different times according to Vermilion; and vice versa. Cerulean’s
hypersurfaces of simultaneity are not the same as Vermilion’s.
From Cerulean’s point of view, Cerulean remains always at the centre of the lightcone. Thus for Cerulean,
as for Vermilion, the speed of light is constant, the same in all directions.
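The disagreement about simultaneity can be checked numerically. The sketch below borrows the Lorentz boost formula that is only derived later, in §1.6, and makes several assumptions for illustration: one space dimension, units with c = 1, Cerulean moving at v = 0.5, and his mirrors one unit away from him in his own frame. The function name `to_vermilion_frame` is hypothetical.

```python
import math


def to_vermilion_frame(t_p, x_p, v):
    """Convert an event (t', x') in Cerulean's frame to (t, x) in Vermilion's.

    Cerulean moves at velocity v along x relative to Vermilion; c = 1.
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t_p + v * x_p), gamma * (x_p + v * t_p)


v = 0.5  # assumed speed, roughly that drawn in the figures

# In Cerulean's frame the flash leaves at (0, 0), bounces off mirrors at
# x' = -1 and x' = +1 at the same time t' = 1, and returns to him at (2, 0).
left_bounce = to_vermilion_frame(1.0, -1.0, v)
right_bounce = to_vermilion_frame(1.0, +1.0, v)
flash_return = to_vermilion_frame(2.0, 0.0, v)

print("left bounce  (t, x):", left_bounce)
print("right bounce (t, x):", right_bounce)   # later than the left bounce
print("return       (t, x):", flash_return)

# Speed of the reflected light as Vermilion measures it on each return leg.
for bounce in (left_bounce, right_bounce):
    dt = flash_return[0] - bounce[0]
    dx = flash_return[1] - bounce[1]
    print("return-leg speed:", abs(dx / dt))   # 1 up to rounding
```

In Vermilion's frame the two bounces happen at different times, yet both reflected beams travel at the speed of light and arrive back at Cerulean at a single event.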
1.5 Time dilation
Vermilion and Cerulean construct identical clocks, Figure 1.8, consisting of a light beam which bounces off
a mirror. Tick, the light beam hits the mirror, tock, the beam returns to its owner. As long as Vermilion
and Cerulean remain at rest relative to each other, both agree that each other’s clock tick-tocks at the same
rate as their own.
But now suppose Cerulean goes off at velocity v relative to Vermilion, in a direction perpendicular to the
direction of the mirror. As far as Cerulean is concerned, his clock tick-tocks at the same rate as before, a tick
at the mirror, a tock on return. But from Vermilion’s point of view, although the distance between Cerulean
and his mirror at any instant remains the same as before, the light has farther to go. And since the speed
of light is constant, Vermilion thinks it takes longer for Cerulean’s clock to tick-tock than her own. Thus
Vermilion thinks Cerulean’s clock runs slow relative to her own.
Figure 1.8 Vermilion and Cerulean construct identical clocks, consisting of a light beam that bounces off a (green)
mirror and returns to them. In the left panel, Cerulean is at rest relative to Vermilion. They both agree that their
clocks are identical. In the middle panel, Cerulean is moving to the right at speed v relative to Vermilion. The vertical
distance to the mirror is unchanged by Cerulean’s motion in a direction orthogonal to the direction to the mirror.
Whereas Cerulean thinks his clock ticks at the usual rate, Vermilion sees the path of the light taken by Cerulean’s
clock is longer, by a factor γ, than the path of light taken by her own clock. Since the speed of light is constant,
Vermilion thinks Cerulean’s clock takes longer to tick, by a factor γ, than her own. The sides of the triangle formed
by the distance 1 to the mirror, the length γ of the lightpath to Cerulean’s clock, and the distance γv travelled by
Cerulean, form a right-angled triangle, illustrated in the right panel.
1.5.1 Lorentz gamma factor
How much slower does Cerulean’s clock run, from Vermilion’s point of view? In special relativity the factor
is called the Lorentz gamma factor γ, introduced by the Dutch physicist Hendrik A. Lorentz in 1904, one
year before Einstein proposed his theory of special relativity.
In units where the speed of light is one, c = 1, Vermilion’s mirror in Figure 1.8 is one tick away from her,
and from her point of view the vertical distance between Cerulean and his mirror is the same, one tick. But
Vermilion thinks that the distance travelled by the light beam between Cerulean and his mirror is γ ticks.
Cerulean is moving at speed v, so Vermilion thinks he moves a distance of γv ticks during the γ ticks of time
taken by the light to travel from Cerulean to his mirror. Thus, from Vermilion’s point of view, the vertical
line from Cerulean to his mirror, Cerulean’s light beam, and Cerulean’s path form a triangle with sides 1,
γ, and γv, as illustrated in Figure 1.8. Pythagoras’ theorem implies that
$$ 1^2 + (\gamma v)^2 = \gamma^2 . \tag{1.2} $$
From this it follows that the Lorentz gamma factor γ is related to Cerulean’s velocity v by
$$ \gamma = \frac{1}{\sqrt{1 - v^2}} , \tag{1.3} $$
which is Lorentz’s famous formula.
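Rearranging (1.2) gives γ²(1 − v²) = 1, which is the step that leads to (1.3). A minimal numerical sketch follows, assuming units with c = 1; the function name `lorentz_gamma` and the sample speed are illustrative choices, not anything prescribed by the text.

```python
import math


def lorentz_gamma(v):
    """Lorentz gamma factor for speed v in units where c = 1, equation (1.3)."""
    if not -1.0 < v < 1.0:
        raise ValueError("speed must satisfy |v| < 1 in units where c = 1")
    return 1.0 / math.sqrt(1.0 - v * v)


v = 0.6                      # assumed example speed
gamma = lorentz_gamma(v)     # close to 1.25

# Sides of the triangle in Figure 1.8: 1 and gamma*v, with hypotenuse gamma.
print(gamma, math.hypot(1.0, gamma * v))   # the two numbers agree up to rounding
```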
1.6 Lorentz transformation
A Lorentz transformation is a rotation of space and time. Lorentz transformations form a 6-dimensional
group, with 3 dimensions from spatial rotations, and 3 dimensions from Lorentz boosts.
If you wish to understand special relativity mathematically, then it is essential for you to go through the
exercise of deriving the form of Lorentz transformations for yourself. Indeed, this problem is the challenge
problem posed in §1.3, recast as a mathematical exercise. For simplicity, it is enough to consider the case of
a Lorentz boost by velocity v along the x-axis.
You can derive the form of a Lorentz transformation either pictorially (geometrically), or algebraically.
Ideally you should do both.
Figure 1.9 Spacetime diagram representing the experiments shown in Figures 1.6 and 1.7. The right panel shows a
detail of how the spacetime diagram can be drawn using only a straight edge and a compass. If Cerulean’s position is
drawn first, then Vermilion’s position follows from drawing the arc as shown.
Exercise 1.3 Pictorial derivation of the Lorentz transformation. Construct, with ruler and compass,
a spacetime diagram that looks like the one in Figure 1.9. You should recognize that the square represents the
paths of lightrays that Vermilion uses to define a hypersurface of simultaneity, while the rectangle represents
the same thing for Cerulean. Notice that Cerulean’s worldline and line of simultaneity are diagonals along his
light rectangle, so the angles between those lines and the lightcone are equal. Notice also that the areas of the
square and the rectangle are the same, which expresses the fact that the area is multiplied by the determinant
of the Lorentz transformation matrix, which must be one (why?). Use your geometric construction to derive
the mathematical form of the Lorentz transformation.
Exercise 1.4 3D model of the Lorentz transformation. Make a 3D spacetime diagram of the Lorentz
transformation, something like that in Figure 1.4, with not only an x-dimension, as in Exercise 1.3, but also
a y-dimension. You can use a 3D computer modelling program, or you can make a real 3D model. Make the
lightcone from flexible paperboard, the spatial hypersurface of simultaneity from stiff paperboard, and the
worldline from wooden dowel.
Exercise 1.5 Mathematical derivation of the Lorentz transformation. Relative to person A (Vermilion, unprimed frame), person B (Cerulean, primed frame) moves at velocity v along the x-axis. Derive
the form of the Lorentz transformation between the coordinates (t, x, y, z) of a 4-vector in A’s frame and the
corresponding coordinates (t′, x′, y′, z′) in B’s frame from the assumptions:
1. that the transformation is linear;
2. that the spatial coordinates in the directions orthogonal to the direction of motion are unchanged;
3. that the speed of light c is the same for both A and B, so that x = t in A’s frame transforms to x′ = t′
in B’s frame, and likewise x = −t in A’s frame transforms to x′ = −t′ in B’s frame;
4. the definition of speed; if B is moving at speed v relative to A, then x = vt in A’s frame transforms to
x′ = 0 in B’s frame;
5. spatial isotropy; specifically, show that if A thinks B is moving at velocity v, then B must think that A
is moving at velocity −v, and symmetry (spatial isotropy) between these two situations then fixes the
Lorentz γ factor.
Your logic should be precise, and explained in clear, concise English.
You should find that the Lorentz transformation for a Lorentz boost by velocity v along the x-axis is
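$$
\begin{aligned}
t' &= \gamma t - \gamma v x , & \qquad t &= \gamma t' + \gamma v x' , \\
x' &= -\gamma v t + \gamma x , & \qquad x &= \gamma v t' + \gamma x' , \\
y' &= y , & \qquad y &= y' , \\
z' &= z , & \qquad z &= z' .
\end{aligned}
\tag{1.4}
$$
The transformation can be written more elegantly in matrix notation:
$$
\begin{pmatrix} t' \\ x' \\ y' \\ z' \end{pmatrix}
=
\begin{pmatrix}
\gamma & -\gamma v & 0 & 0 \\
-\gamma v & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix} ,
\tag{1.5}
$$
with inverse
$$
\begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix}
\gamma & \gamma v & 0 & 0 \\
\gamma v & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} t' \\ x' \\ y' \\ z' \end{pmatrix} .
\tag{1.6}
$$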
A Lorentz transformation at velocity v followed by a Lorentz transformation at velocity v in the opposite
direction, i.e. at velocity −v, yields the unit transformation, as it should:
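$$
\begin{pmatrix}
\gamma & \gamma v & 0 & 0 \\
\gamma v & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\gamma & -\gamma v & 0 & 0 \\
-\gamma v & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} .
\tag{1.7}
$$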
The determinant of the Lorentz transformation is one, as it should be:
$$
\begin{vmatrix}
\gamma & -\gamma v & 0 & 0 \\
-\gamma v & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{vmatrix}
= \gamma^2 (1 - v^2) = 1 .
\tag{1.8}
$$
Indeed, requiring that the determinant be one provides another derivation of the formula (1.3) for the Lorentz
gamma factor.
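The matrix statements above are easy to check numerically. The following is a sketch under stated assumptions: units with c = 1, a boost along x only, and plain nested lists rather than a matrix library. The helper names `boost_matrix`, `matmul` and `det_boost` are hypothetical, and `det_boost` uses a shortcut that is valid only for matrices of the block form shown in (1.5).

```python
import math


def boost_matrix(v):
    """4x4 matrix of a Lorentz boost by velocity v along x, as in (1.5); c = 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def det_boost(m):
    """Determinant of a matrix of the form (1.5).

    The y and z rows and columns are those of the unit matrix, so the 4x4
    determinant reduces to that of the 2x2 block mixing t and x.
    """
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


v = 0.6  # assumed example speed
product = matmul(boost_matrix(-v), boost_matrix(v))   # boost by v, then by -v
for row in product:
    print(row)                                        # unit matrix up to rounding
print(det_boost(boost_matrix(v)))                     # 1 up to rounding
```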
Concept question 1.6 Determinant of a Lorentz transformation. Why must the determinant of a
Lorentz transformation be one?
1.7 Paradoxes: Time dilation, Lorentz contraction, and the Twin paradox
There are several classic paradoxes in special relativity. One of them has already been met above, the paradox
of the constancy of the speed of light in §1.3. This section collects three famous paradoxes: time dilation,
Lorentz contraction, and the Twin paradox.
If you wish to understand special relativity conceptually, then you should work through all these paradoxes
yourself. As remarked in §1.4, most (all?) paradoxes in special relativity arise because different observers
have different notions of simultaneity, and most (all?) paradoxes can be solved using spacetime diagrams.
The Twin paradox is particularly helpful because it illustrates several different facets of special relativity,
not only time dilation, but also how light travel time modifies what an observer actually sees.
1.7.1 Time dilation
If a timelike interval {t, r} corresponds to motion at velocity v, then r = vt. The proper time along the
interval is
$$ \tau = \sqrt{t^2 - r^2} = t \sqrt{1 - v^2} = \frac{t}{\gamma} . \tag{1.9} $$
This is Lorentz time dilation: the proper time interval τ experienced by a moving person is a factor γ less
than the time interval t according to an onlooker.
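As a quick check of (1.9), the sketch below evaluates the proper time both from the interval and from the gamma factor. The function name `proper_time` and the sample numbers are assumptions made for illustration; units have c = 1.

```python
import math


def proper_time(t, r):
    """Proper time along a timelike interval {t, r}, equation (1.9); c = 1."""
    s2 = t * t - r * r
    if s2 < 0.0:
        raise ValueError("interval is spacelike: |r| exceeds t")
    return math.sqrt(s2)


v = 0.8          # assumed speed of the moving person
t = 10.0         # assumed time elapsed according to the onlooker
tau = proper_time(t, v * t)
gamma = 1.0 / math.sqrt(1.0 - v * v)
print(tau, t / gamma)   # both close to 6
```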
1.7.2 Fitzgerald-Lorentz contraction
Consider a rocket of proper length l, so that in the rocket’s own rest frame (primed) the back and front ends
of the rocket move through time t′ with coordinates
$$ \{t', x'\} = \{t', 0\} \quad \text{and} \quad \{t', l\} . \tag{1.10} $$
From the perspective of an observer who sees the rocket move at velocity v in the x-direction, the worldlines
of the back and front ends of the rocket are at
$$ \{t, x\} = \{\gamma t', \gamma v t'\} \quad \text{and} \quad \{\gamma t' + \gamma v l, \ \gamma v t' + \gamma l\} . \tag{1.11} $$
However, the observer measures the length of the rocket simultaneously in their own frame, not the rocket
frame. Solving for γt′ = t at the back and γt′ + γvl = t at the front gives
$$ \{t, x\} = \{t, vt\} \quad \text{and} \quad \left\{ t, \ vt + \frac{l}{\gamma} \right\} \tag{1.12} $$
which says that the observer measures the front end of the rocket to be a distance l/γ ahead of the back
end. This is Lorentz contraction: an object of proper length l is measured by a moving person to be shorter
by a factor γ.
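The steps from (1.11) to (1.12) can be followed numerically. This sketch assumes units with c = 1; the function name `measured_length` and its parameters are hypothetical. It locates both ends of the rocket at one and the same observer time t, exactly as in the derivation above.

```python
import math


def measured_length(l, v, t=0.0):
    """Length of a rocket of proper length l, moving at velocity v,
    measured at a single instant t of the observer's time; c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    # Back end, from (1.11): gamma t' = t, and x = gamma v t'.
    t_p_back = t / gamma
    x_back = gamma * v * t_p_back
    # Front end, from (1.11): gamma t' + gamma v l = t, and x = gamma v t' + gamma l.
    t_p_front = (t - gamma * v * l) / gamma
    x_front = gamma * v * t_p_front + gamma * l
    return x_front - x_back


print(measured_length(1.0, 0.6))          # close to 0.8, that is l / gamma
print(measured_length(1.0, 0.6, t=5.0))   # the same at any other instant
```

Note that the two ends are sampled at different rocket times t′, which is where the differing notions of simultaneity enter.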
Exercise 1.7 Time dilation. On a spacetime diagram such as that in the left panel of Figure 1.10, show
how two observers moving relative to each other can both consider the other’s clock to run slow compared
to their own.
Figure 1.10 (Left) Time dilation, and (right) Lorentz contraction spacetime diagrams.
Exercise 1.8 Lorentz contraction. On a spacetime diagram such as that in the right panel of Figure 1.10,
show how two observers moving relative to each other can both consider the other to be contracted along
the direction of motion.
Concept question 1.9 Is one side of a cube shorter than the other? Figure 1.11 shows a picture
of a 3-dimensional cube. Is one edge shorter than the other? Projected on to the page, it appears so, but in
reality all the edges have equal length. In what ways is this situation similar or dissimilar to time dilation
and Lorentz contraction in 4-dimensional relativity?
Figure 1.11 A cube. Are the lengths of its sides all equal?
Exercise 1.10 Twin paradox. Your twin leaves you on Earth and travels to the spacestation Alpha,
ℓ = 3 lyr away, at a good fraction of the speed of light, then immediately returns to Earth at the same speed.
Figure 1.12 shows on a spacetime diagram the corresponding worldlines of both you and your twin. Aside
from part 1 and the first part of 2, you should derive your answers mathematically, using logic and Lorentz
transformations. However, the diagram is accurately drawn, and you should be able to check your answers
by measuring.
1. On a spacetime diagram such as that in Figure 1.12, label the worldlines of you and your twin. Draw the
worldline of a light signal which travels from you on Earth, hits Alpha just when your twin arrives,
and immediately returns to Earth. Draw the twin’s “now” when just arriving at Alpha, and the twin’s
“now” just departing from Alpha (in the first case the twin is moving toward Alpha, while in the second
case the twin is moving back toward Earth).
2. From the diagram, measure the twin’s speed v relative to you, in units where the speed of light is unity,
c = 1. Deduce the Lorentz gamma factor γ, and the redshift factor 1 + z = [(1 + v)/(1 − v)]^{1/2}, in the
cases (i) where the twin is receding, and (ii) where the twin is approaching.
3. Choose the spacetime origin to be the event where the twin leaves Earth. Argue that the position
4-vector of the twin on arrival at Alpha is
{t, x, y, z} = {ℓ/v, ℓ, 0, 0} .
(1.13)
Lorentz transform this 4-vector to determine the position 4-vector of the twin on arrival at Alpha, in
the twin’s frame. Express your answer first in terms of ℓ, v, and γ, and then in (light)years. State in
words what this position 4-vector means.
4. How much do you and your twin age respectively during the round trip to Alpha and back? What is
the ratio of these ages? Express your answers first in terms of ℓ, v, and γ, and then in years.
5. What is the distance between the Earth and Alpha from the twin’s point of view? What is the ratio
of this distance to the distance between Earth and Alpha from your point of view? Explain how you
arrived at your result. Express your answer first in terms of ℓ, v, and γ, and then in lightyears.
6. You watch your twin through a telescope. How much time do you see (through the telescope) elapse
on your twin’s wristwatch between launch and arrival on Alpha? How much time passes on your own
wristwatch during this time? What is the ratio of these two times? Express your answers first in terms
of ℓ, v, and γ, and then in years.
Figure 1.12 Twin paradox spacetime diagram. [Axes: Time (vertical), Space (horizontal); scale marks of 1 yr and 1 lyr.]
7. On arrival at Alpha, your twin looks back through a telescope at your wristwatch. How much time does
your twin see (through the telescope) has elapsed since launch on your watch? How much time has
elapsed on the twin’s own wristwatch during this time? What is the ratio of these two times? Express
your answers first in terms of ℓ, v, and γ, and then in years.
8. You continue to watch your twin through a telescope. How much time elapses on your twin’s wristwatch,
as seen by you through the telescope, during the twin’s journey back from Alpha to Earth? How much
time passes on your own watch as you watch (through the telescope) the twin journey back from Alpha
to Earth? What is the ratio of these two times? Express your answers first in terms of ℓ, v, and γ, and
then in years.
9. During the journey back from Alpha to Earth, your twin likewise continues to look through a telescope
at the time registered on your watch. How much time passes on your wristwatch, as seen by your twin
through the telescope, during the journey back? How much time passes on the twin’s wristwatch from
the twin’s point of view during the journey back? What is the ratio of these two times? Express your
answers first in terms of ℓ, v, and γ, and then in years.
Concept question 1.11 What breaks the symmetry between you and your twin? From your
point of view, you saw the twin recede from you at velocity v on the outbound journey, then approach you
at velocity v on the inbound journey. But the twin saw essentially the same thing: from the twin’s point of
view, the twin saw you recede at velocity v on the outbound journey, then approach the twin at velocity
v on the inbound journey. Isn’t the situation symmetrical, so shouldn’t you and the twin age identically?
What breaks the symmetry, allowing your twin to age less?
1.8 The spacetime wheel
1.8.1 Wheel
Figure 1.13 shows an ordinary 3-dimensional wheel. As the wheel rotates, a point on the wheel describes an
invariant circle. The coordinates {x, y} of a point on the wheel relative to its centre change, but the distance
r between the point and the centre remains constant
$$ r^2 = x^2 + y^2 = \text{constant} . \tag{1.14} $$
More generally, the coordinates {x, y, z} of the interval between any two points in 3-dimensional space (a
vector) change when the coordinate system is rotated in 3 dimensions, but the separation r of the two points
remains constant
$$ r^2 = x^2 + y^2 + z^2 = \text{constant} . \tag{1.15} $$
Figure 1.13 A wheel.
Figure 1.14 A spacetime wheel.
1.8.2 Spacetime wheel
Figure 1.14 shows a spacetime wheel. The diagram here is a spacetime diagram, with time t vertical and
space x horizontal. A rotation between time t and space x is a Lorentz boost in the x-direction. As the
spacetime wheel boosts, a point on the wheel describes an invariant hyperbola. The spacetime coordinates
{t, x} of a point on the wheel relative to its centre change, but the spacetime separation s between the point
and the centre remains constant
$$ s^2 = - t^2 + x^2 = \text{constant} . \tag{1.16} $$
More generally, the coordinates {t, x, y, z} of the interval between any two events in 4-dimensional spacetime
(a 4-vector) change when the coordinate system is boosted or rotated, but the spacetime separation s of the
two events remains constant
$$ s^2 = - t^2 + x^2 + y^2 + z^2 = \text{constant} . \tag{1.17} $$
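The invariance of the spacetime separation under a boost can be checked directly. In this sketch the event coordinates and boost velocities are arbitrary assumed values, the helper names `boost_x` and `separation2` are hypothetical, and units have c = 1.

```python
import math


def boost_x(event, v):
    """Lorentz boost of an event {t, x, y, z} by velocity v along x; c = 1."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t), y, z)


def separation2(event):
    """Squared spacetime separation s^2 of an event from the origin, as in (1.17)."""
    t, x, y, z = event
    return -t * t + x * x + y * y + z * z


event = (3.0, 1.0, 2.0, -1.0)   # arbitrary example event, with s^2 = -3
for v in (0.0, 0.3, 0.9, -0.5):
    # The coordinates change with v, but s^2 stays the same up to rounding.
    print(v, boost_x(event, v), separation2(boost_x(event, v)))
```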
1.8.3 Lorentz boost as a rotation by an imaginary angle
The − sign instead of a + sign in front of the t² in the spacetime separation formula (1.17) means that time
t can often be treated mathematically as if it were an imaginary spatial dimension. That is, t = iw where
i ≡ √−1 and w is a “fourth spatial coordinate.”
A Lorentz boost by a velocity v can likewise be treated as a rotation by an imaginary angle. Consider a
normal spatial rotation in which a primed frame is rotated in the wx-plane clockwise by an angle a about
the origin, relative to the unprimed frame. The relation between the coordinates {w′ , x′ } and {w, x} of a
point in the two frames is
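$$
\begin{pmatrix} w' \\ x' \end{pmatrix}
=
\begin{pmatrix} \cos a & -\sin a \\ \sin a & \cos a \end{pmatrix}
\begin{pmatrix} w \\ x \end{pmatrix} .
\tag{1.18}
$$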
Now set t = iw and α = ia with t and α both real. In other words, take the spatial coordinate w to be
imaginary, and the rotation angle a likewise to be imaginary. Then the rotation formula above becomes
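$$ \begin{pmatrix} t' \\ x' \end{pmatrix} = \begin{pmatrix} \cosh\alpha & -\sinh\alpha \\ -\sinh\alpha & \cosh\alpha \end{pmatrix} \begin{pmatrix} t \\ x \end{pmatrix} . \tag{1.19} $$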
This agrees with the usual Lorentz transformation formula (1.5) if the boost velocity v and boost angle α
are related by
v = tanh α ,
(1.20)
so that
γ = cosh α ,
γv = sinh α .
(1.21)
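The substitution t = iw, α = ia can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `rotate`, `boost`, and `boost_via_rotation` are hypothetical, and the code simply evaluates the rotation (1.18) with complex arguments and compares the result with the boost (1.19).

```python
import cmath
import math

def rotate(a, w, x):
    """Rotation (1.18) of the point (w, x) by angle a; a, w, x may be complex."""
    return (cmath.cos(a) * w - cmath.sin(a) * x,
            cmath.sin(a) * w + cmath.cos(a) * x)

def boost(alpha, t, x):
    """Lorentz boost (1.19) of the event (t, x) by real boost angle alpha."""
    return (math.cosh(alpha) * t - math.sinh(alpha) * x,
            -math.sinh(alpha) * t + math.cosh(alpha) * x)

def boost_via_rotation(alpha, t, x):
    """Same boost, obtained by rotating through the imaginary angle a = -i alpha."""
    a = -1j * alpha            # alpha = i a  implies  a = -i alpha
    w = -1j * t                # t = i w      implies  w = -i t
    w_new, x_new = rotate(a, w, x)
    t_new = 1j * w_new         # convert back to time: t' = i w'
    # The imaginary parts vanish up to rounding error.
    return t_new.real, x_new.real
```

For real t, x, and α the two routes agree to rounding error, which is the content of the statement that a boost is a rotation by an imaginary angle.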
The boost angle α is commonly called the rapidity. This provides a convenient way to add velocities in
special relativity: the rapidities simply add (for boosts along the same direction), just as spatial rotation
angles add (for rotations about the same axis). Thus a boost by velocity v1 = tanh α1 followed by a boost
by velocity v2 = tanh α2 in the same direction gives a net velocity boost of v = tanh α where
α = α1 + α2 .
(1.22)
The equivalent formula for the velocities themselves is
$$ v = \frac{v_1 + v_2}{1 + v_1 v_2} , \tag{1.23} $$
the special relativistic velocity addition formula.
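As a quick numerical illustration (a sketch only, with hypothetical function names, in units c = 1), adding rapidities as in (1.22) and converting back with (1.20) gives the same answer as the velocity addition formula (1.23):

```python
import math

def add_velocities(v1, v2):
    """Velocity addition formula (1.23), units c = 1."""
    return (v1 + v2) / (1.0 + v1 * v2)

def add_via_rapidity(v1, v2):
    """Add the rapidities (1.22), then convert back to a velocity with (1.20)."""
    alpha = math.atanh(v1) + math.atanh(v2)
    return math.tanh(alpha)
```

For example, two successive boosts of 0.5 give a net velocity of 0.8 by either route, never exceeding the speed of light.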
1.8.4 Trip across the Universe at constant acceleration
Suppose that you took a trip across the Universe in a spaceship, accelerating all the time at one Earth
gravity g. How far would you travel in how much time?
The spacetime wheel offers a cute way to solve this problem, since the rotating spacetime wheel can be
regarded as representing spacetime frames undergoing constant acceleration. Points on the right quadrant of
the rotating spacetime wheel, Figure 1.15, represent worldlines of persons who accelerate with constant acceleration in their own frame. The spokes of the spacetime wheel are lines of simultaneity for the accelerating
persons.
If the units of space and time are chosen so that the speed of light and the gravitational acceleration are
both one, c = g = 1, then the proper time experienced by the accelerating person is the rapidity α, and the
time and space coordinates of the accelerating person, relative to a person who remains at rest, are those of
a point on the spacetime wheel, namely
{t, x} = {sinh α, cosh α} .
(1.24)
Figure 1.15 The right quadrant of the spacetime wheel represents the worldlines and lines of simultaneity of persons
who accelerate in the x direction with uniform acceleration in their own frames.
In the case where the acceleration is one Earth gravity, g = 9.80665 m s−2 , the unit of time is
$$ \frac{c}{g} = \frac{299{,}792{,}458 \ \mathrm{m\,s^{-1}}}{9.80665 \ \mathrm{m\,s^{-2}}} = 0.97 \ \mathrm{yr} , \tag{1.25} $$
just short of one year. For simplicity, Table 1.1, which tabulates some milestones along the way, takes the
unit of time to be exactly one year, which would be the case if you were accelerating at 0.97 g = 9.5 m s−2 .
Table 1.1 Trip across the Universe.
Time elapsed     Time elapsed     Distance travelled   To
on spaceship     on Earth         in lightyears
in years         in years
α                sinh α           cosh α − 1
0                0                0                    Earth (starting point)
1                1.175            0.5431
2                3.627            2.762
2.34             5.12             4.22                 Proxima Cen
3.962            26.3             25.3                 Vega
6.60             368              367                  Pleiades
10.9             2.7 × 10^4       2.7 × 10^4           Centre of Milky Way
15.4             2.44 × 10^6      2.44 × 10^6          Andromeda galaxy
18.4             4.9 × 10^7       4.9 × 10^7           Virgo cluster
19.2             1.1 × 10^8       1.1 × 10^8           Coma cluster
25.3             5 × 10^10        5 × 10^10            Edge of observable Universe
After a slow start, you cover ground at an ever increasing rate, crossing 50 billion lightyears, the distance
to the edge of the currently observable Universe, in just over 25 years of your own time.
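The entries of Table 1.1 follow directly from equation (1.24). A minimal sketch, assuming (as the table does) an acceleration for which the unit of time c/g is exactly one year, so that times are in years and distances in lightyears; the function names are hypothetical:

```python
import math

def earth_time(ship_time):
    """Time elapsed on Earth, t = sinh(alpha), with alpha the ship's proper time."""
    return math.sinh(ship_time)

def distance_travelled(ship_time):
    """Distance from the starting point, cosh(alpha) - 1, since x = 1 at alpha = 0."""
    return math.cosh(ship_time) - 1.0

def ship_time_to_reach(distance):
    """Invert the distance relation: alpha = arccosh(1 + distance)."""
    return math.acosh(1.0 + distance)
```

Because cosh α grows exponentially, `ship_time_to_reach` grows only logarithmically with distance, which is why 5 × 10^10 lightyears takes only about 25 years of ship time.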
Does this mean you go faster than the speed of light? No. From the point of view of a person at rest
on Earth, you never go faster than the speed of light. From your own point of view, distances along your
direction of motion are Lorentz-contracted, so distances that are vast from Earth’s point of view appear
much shorter to you. Fast as the Universe rushes by, it never goes faster than the speed of light.
This rosy picture of being able to flit around the Universe has drawbacks. Firstly, it would take a huge
amount of energy to keep you accelerating at g. Secondly, you would use up a huge amount of Earth time
travelling around at relativistic speeds. If you took a trip to the edge of the Universe, then by the time
you got back not only would all your friends and relations be dead, but the Earth would probably be gone,
swallowed by the Sun in its red giant phase, the Sun would have exhausted its fuel and shrivelled into a
cold white dwarf star, and the Solar System, having orbited the Galaxy a thousand times, would be lost
somewhere in its milky ways.
Technical point. The Universe is expanding, so the distance to the edge of the currently observable Universe
is increasing. Thus it would actually take longer than indicated in the table to reach the edge of the currently
observable Universe. Moreover if the Universe is accelerating, as evidence from the Hubble diagram of Type Ia
Supernovae indicates, then you will never be able to reach the edge of the currently observable Universe,
however fast you go.
1.9 Scalar spacetime distance
The fact that Lorentz transformations leave unchanged a certain distance, the spacetime distance, between
any two events in spacetime is one of the most fundamental features of Lorentz transformations. The scalar
spacetime distance ∆s between two events separated by {∆t, ∆x, ∆y, ∆z} is given by
$$ \Delta s^2 = -\Delta t^2 + \Delta r^2 = -\Delta t^2 + \Delta x^2 + \Delta y^2 + \Delta z^2 . \tag{1.26} $$
A quantity such as ∆s2 that remains unchanged under any Lorentz transformation is called a scalar. You
should check yourself that ∆s2 is unchanged under Lorentz transformations (see Exercise 1.13). Lorentz
transformations can be defined as linear spacetime transformations that leave ∆s2 invariant.
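The single scalar spacetime squared interval $\Delta s^2$ replaces the two scalar quantities
$$ \begin{array}{ll} \text{time interval} & \Delta t \\ \text{spatial interval} & \Delta r = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2} \end{array} \tag{1.27} $$
of classical Galilean spacetime.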
1.9.1 Timelike, lightlike, spacelike
The scalar spacetime distance squared ∆s2 , equation (1.26), between two events can be negative, zero, or
positive. A spacetime interval {∆t, ∆x, ∆y, ∆z} ≡ {∆t, ∆r} is called
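$$ \begin{array}{lll} \text{timelike} & \text{if } \Delta t > \Delta r & \text{or equivalently if } \Delta s^2 < 0 , \\ \text{null or lightlike} & \text{if } \Delta t = \Delta r & \text{or equivalently if } \Delta s^2 = 0 , \\ \text{spacelike} & \text{if } \Delta t < \Delta r & \text{or equivalently if } \Delta s^2 > 0 , \end{array} \tag{1.28} $$
as illustrated in Figure 1.16.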
Figure 1.16 Spacetime diagram illustrating timelike, lightlike, and spacelike intervals.
1.9.2 Proper time, proper distance
The scalar spacetime distance squared ∆s2 has a physical meaning.
If an interval {∆t, ∆r} is timelike, ∆t > ∆r, then the square root of minus the spacetime interval squared
is the proper time ∆τ along it
$$ \Delta\tau = \sqrt{-\Delta s^2} = \sqrt{\Delta t^2 - \Delta r^2} . \tag{1.29} $$
This is the time experienced by an observer moving along that interval.
If an interval {∆t, ∆r} is spacelike, ∆t < ∆r, then the square root of the spacetime interval squared is the
proper distance ∆l along it
$$ \Delta l = \sqrt{\Delta s^2} = \sqrt{\Delta r^2 - \Delta t^2} . \tag{1.30} $$
This is the distance between two events measured by an observer for whom those events are simultaneous.
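The classification (1.28) and the definitions (1.29) and (1.30) translate into a few lines of code. This is an illustrative sketch in units c = 1; the function names are hypothetical and not from the text:

```python
import math

def interval_squared(dt, dx, dy, dz):
    """Scalar spacetime distance squared, equation (1.26)."""
    return -dt**2 + dx**2 + dy**2 + dz**2

def classify(dt, dx, dy, dz):
    """Classify an interval by the sign of its squared length, equation (1.28).

    Exact comparison with zero is only reliable for exactly representable
    inputs; real data would need a tolerance.
    """
    ds2 = interval_squared(dt, dx, dy, dz)
    if ds2 < 0:
        return "timelike"
    if ds2 > 0:
        return "spacelike"
    return "lightlike"

def proper_time(dt, dx, dy, dz):
    """Proper time along a timelike interval, equation (1.29)."""
    ds2 = interval_squared(dt, dx, dy, dz)
    if ds2 >= 0:
        raise ValueError("interval is not timelike")
    return math.sqrt(-ds2)

def proper_distance(dt, dx, dy, dz):
    """Proper distance along a spacelike interval, equation (1.30)."""
    ds2 = interval_squared(dt, dx, dy, dz)
    if ds2 <= 0:
        raise ValueError("interval is not spacelike")
    return math.sqrt(ds2)
```

For instance, the interval {∆t, ∆x} = {5, 3} is timelike with proper time 4, while {3, 5} is spacelike with proper distance 4.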
Concept question 1.12
Proper time, proper distance. Justify the assertions (1.29) and (1.30).
1.9.3 Minkowski metric
It is convenient to denote an interval using an index notation,
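$$ \Delta x^m \equiv \{\Delta t, \Delta \boldsymbol{r}\} \equiv \{\Delta t, \Delta x, \Delta y, \Delta z\} . \tag{1.31} $$
The indices run over m = t, x, y, z, or sometimes m = 0, 1, 2, 3. The scalar spacetime length squared $\Delta s^2$ of
an interval $\Delta x^m$ can be abbreviated
$$ \Delta s^2 = \eta_{mn} \Delta x^m \Delta x^n , \tag{1.32} $$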
where ηmn is the Minkowski metric
$$ \eta_{mn} \equiv \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} . \tag{1.33} $$
Equation (1.32) uses the implicit summation convention, according to which paired indices, one lowered
and one raised, are implicitly summed over.
1.10 4-vectors
1.10.1 Contravariant 4-vector
Under a Lorentz transformation, a coordinate interval $\Delta x^m$ transforms as
$$ \Delta x^m \rightarrow \Delta x'^m = L^m{}_n \Delta x^n , \tag{1.34} $$
where $L^m{}_n$ denotes a Lorentz transformation. The paired indices n on the right hand side of equation (1.34),
one lowered and one raised, are implicitly summed over. In matrix notation, $L^m{}_n$ is a 4 × 4 matrix. For
example, for a Lorentz boost by velocity v along the x-axis, $L^m{}_n$ is the matrix on the right hand side of
equation (1.5).
In special relativity a contravariant 4-vector is defined to be a quantity
$$ a^m \equiv \{a^t, a^x, a^y, a^z\} , \tag{1.35} $$
that transforms under Lorentz transformations like an interval $\Delta x^m$ of spacetime,
$$ a^m \rightarrow a'^m = L^m{}_n a^n . \tag{1.36} $$
The indices run over m = t, x, y, z, or sometimes m = 0, 1, 2, 3.
1.10.2 Covariant 4-vector
In special and general relativity, besides the contravariant 4-vector $a^m$, with raised indices, it is convenient
to introduce a covariant 4-vector $a_m$, with lowered indices, obtained by multiplying the contravariant
4-vector by the metric,
$$ a_m \equiv \eta_{mn} a^n . \tag{1.37} $$
With the Minkowski metric (1.33), the covariant components $a_m$ are
$$ a_m = \{-a^t, a^x, a^y, a^z\} , \tag{1.38} $$
which differ from the contravariant components $a^m$ only in the sign of the time component.
The reason for introducing the two species of vector is that their implicitly summed product
$$ a^m a_m \equiv \eta_{mn} a^m a^n = a^t a_t + a^x a_x + a^y a_y + a^z a_z = -(a^t)^2 + (a^x)^2 + (a^y)^2 + (a^z)^2 \tag{1.39} $$
is a Lorentz scalar, a fact you will prove in Exercise 1.13.
The notation may seem overly elaborate, but it proves extremely useful in general relativity, where the
metric is more complicated than Minkowski. Further discussion of the formalism of 4-vectors is deferred to
Chapter 2.
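To make the index notation concrete, here is a small sketch that spells out the implicit sums of equations (1.36), (1.37), and (1.39) as explicit loops. The names `ETA`, `lower`, `dot`, `boost_x`, and `transform` are hypothetical, and the boost matrix is assumed to have the form shown in equation (1.56), with units c = 1. The numerical agreement it shows is an illustration, not a substitute for the proof asked for in Exercise 1.13.

```python
import math

# Minkowski metric (1.33), signature -+++.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

def lower(a):
    """Covariant components a_m = eta_mn a^n, equation (1.37)."""
    return [sum(ETA[m][n] * a[n] for n in range(4)) for m in range(4)]

def dot(a, b):
    """Scalar product a^m b_m, with the paired index m summed explicitly."""
    b_low = lower(b)
    return sum(a[m] * b_low[m] for m in range(4))

def boost_x(v):
    """Matrix L^m_n for a boost by velocity v along the x-axis (units c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0, 0],
            [-g * v, g, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def transform(L, a):
    """Transformed components a'^m = L^m_n a^n, equation (1.36)."""
    return [sum(L[m][n] * a[n] for n in range(4)) for m in range(4)]
```

Evaluating `dot` before and after applying `transform` to both vectors gives the same number to rounding error for any |v| < 1.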
Exercise 1.13 Scalar product. Suppose that $a^m$ and $b^m$ are two 4-vectors. Show that $a^m b_m$ is a scalar,
that is, it is unchanged by any Lorentz transformation. [Hint: For the Minkowski metric of special relativity,
$a^m b_m = -a^t b^t + a^x b^x + a^y b^y + a^z b^z$. Show that $a'^m b'_m = a^m b_m$. You may assume without proof the familiar
result that the 3D scalar product $\boldsymbol{a} \cdot \boldsymbol{b} = a^x b^x + a^y b^y + a^z b^z$ of two 3-vectors is unchanged by any spatial
rotation, so it suffices to consider a Lorentz boost, say in the x direction.]
Exercise 1.14 The principle of longest proper time. Consider a person whose worldline goes from
spacetime event P0 to spacetime event P1 at velocity v1 relative to some inertial frame, and then from P1
to spacetime event P2 at velocity v2 , as illustrated in Figure 1.17. Assume for simplicity that the velocities
are both in the (positive or negative) x-direction. Show that the proper time along a straight line from P0
to P2 is always greater than or equal to the sum of the proper times along the two straight lines from P0
to P1 followed by P1 to P2 . Hence conclude that the longest proper time between two events is a straight
line. What does this imply about the twin paradox? [Hint: It is simplest to use rapidities α rather than
velocities. Let the segment from P0 to P1 be {t1 , x1 } = τ1 {cosh α1 , sinh α1 }, and the segment from P1 to P2
be {t2 , x2 } = τ2 {cosh α2 , sinh α2 }. The segment from P0 to P2 is the sum of these, {t, x} = {t1 + t2 , x1 + x2 }.
Show that
$$ \tau^2 - (\tau_1 + \tau_2)^2 = 4 \tau_1 \tau_2 \sinh^2 \frac{\alpha_2 - \alpha_1}{2} , \tag{1.40} $$
which is a minimum for α2 = α1 .]
Figure 1.17 The longest proper time between P0 and P2 is a straight line.
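Before attempting the algebra, it can help to confirm equation (1.40) numerically. The sketch below uses the parametrization given in the hint; the function names are hypothetical, and a numerical check of this kind is no replacement for the derivation the exercise asks for.

```python
import math

def straight_proper_time(tau1, alpha1, tau2, alpha2):
    """Proper time along the straight segment from P0 to P2."""
    t = tau1 * math.cosh(alpha1) + tau2 * math.cosh(alpha2)
    x = tau1 * math.sinh(alpha1) + tau2 * math.sinh(alpha2)
    return math.sqrt(t * t - x * x)

def excess(tau1, alpha1, tau2, alpha2):
    """Left hand side of (1.40): tau^2 - (tau1 + tau2)^2."""
    tau = straight_proper_time(tau1, alpha1, tau2, alpha2)
    return tau**2 - (tau1 + tau2)**2

def rhs(tau1, alpha1, tau2, alpha2):
    """Right hand side of (1.40): 4 tau1 tau2 sinh^2((alpha2 - alpha1)/2)."""
    return 4.0 * tau1 * tau2 * math.sinh(0.5 * (alpha2 - alpha1))**2
```

The right hand side is never negative, so the straight path always has at least as much proper time as the bent one, with equality only when the two rapidities coincide.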
1.11 Energy-momentum 4-vector
The foremost example of a 4-vector other than the interval ∆xm is the energy-momentum 4-vector.
One of the great insights of modern physics is that conservation laws are associated with symmetries.
The Principle of Special Relativity asserts that the laws of physics should take the same form at any point.
There is no preferred origin in spacetime in special relativity. In special relativity, spacetime has translation
symmetry with respect to both time and space. Associated with those symmetries are laws of conservation
of energy and momentum:
Symmetry             Conservation law
Time translation     Energy
Space translation    Momentum
Since one-dimensional time and three-dimensional space are united in special relativity, this suggests that
the single component of energy and the three components of momentum should be combined into a 4-vector:
$$ \left. \begin{array}{l} \text{energy} = \text{time component} \\ \text{momentum} = \text{space component} \end{array} \right\} \ \text{of a 4-vector.} \tag{1.41} $$
The Principle of Special Relativity requires that the equation of energy-momentum conservation
$$ \begin{pmatrix} \text{energy} \\ \text{momentum} \end{pmatrix} = \text{constant} \tag{1.42} $$
should take the same form in any inertial frame. The equation should be Lorentz covariant, that is, the
equation should transform like a Lorentz 4-vector.
1.11.1 Construction of the energy-momentum 4-vector
The energy-momentum 4-vector of a particle of mass m at position {t, r} moving at velocity v = dr/dt can
be derived by requiring
1. that it is a 4-vector, and
2. that it goes over to the Newtonian limit as v → 0.
In the Newtonian limit, the 3-momentum p equals mass m times velocity v,
$$ \boldsymbol{p} = m \boldsymbol{v} = m \frac{d\boldsymbol{r}}{dt} . \tag{1.43} $$
To obtain a 4-vector, two things must be done to the Newtonian momentum:
1. replace r by a 4-vector $x^n = \{t, \boldsymbol{r}\}$, and
2. replace dt by a scalar; the only available scalar measure of time is the proper time interval dτ along the
worldline of the particle.
The result is the energy-momentum 4-vector $p^n$:
$$ p^n = m \frac{dx^n}{d\tau} = m \left\{ \frac{dt}{d\tau}, \frac{d\boldsymbol{r}}{d\tau} \right\} = m \{\gamma, \gamma \boldsymbol{v}\} . \tag{1.44} $$
The components of the energy-momentum 4-vector are the special relativistic versions of energy E and
momentum p,
$$ p^n = \{E, \boldsymbol{p}\} = \{m\gamma, m\gamma\boldsymbol{v}\} . \tag{1.45} $$
1.11.2 Special relativistic energy
From equation (1.45), the special relativistic energy E is the product of the rest mass and the Lorentz
γ-factor,
E = mγ
(units c = 1) ,
(1.46)
or, restoring standard units,
$$ E = mc^2 \gamma . \tag{1.47} $$
For small velocities v, the Taylor expansion of the Lorentz factor γ is
$$ \gamma = \frac{1}{\sqrt{1 - v^2/c^2}} = 1 + \frac{1}{2} \frac{v^2}{c^2} + ... . \tag{1.48} $$
Thus for small velocities, the special relativistic energy E Taylor expands as
$$ E = mc^2 \left( 1 + \frac{1}{2} \frac{v^2}{c^2} + ... \right) = mc^2 + \frac{1}{2} m v^2 + ... . \tag{1.49} $$
The first term, $mc^2$, is the rest-mass energy. The second term, $\frac{1}{2} m v^2$, is the non-relativistic kinetic energy.
Higher-order terms give relativistic corrections to the kinetic energy.
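The quality of the two-term approximation (1.49) is easy to gauge numerically. A minimal sketch in SI units, with hypothetical function names; the constant `C` is the defined value of the speed of light:

```python
import math

C = 299_792_458.0  # speed of light in m/s

def energy(m, v):
    """Exact special relativistic energy E = m c^2 gamma, equation (1.47)."""
    return m * C**2 / math.sqrt(1.0 - (v / C)**2)

def energy_two_terms(m, v):
    """Rest-mass energy plus non-relativistic kinetic energy, from (1.49)."""
    return m * C**2 + 0.5 * m * v**2

def relative_error(m, v):
    """Fractional error of the two-term expansion relative to the exact energy."""
    exact = energy(m, v)
    return abs(exact - energy_two_terms(m, v)) / exact
```

At v = 0.01c the fractional error is of order 10^-9, whereas at v = 0.5c it is a few percent, which is where the higher-order relativistic corrections become important.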
Einstein did not discard the constant term, but rather interpreted it seriously as indicating that mass
contains energy, the rest-mass energy
$$ E = mc^2 , \tag{1.50} $$
perhaps the most famous equation in all of physics.
1.11.3 Rest mass is a scalar
The scalar quantity constructed from the energy-momentum 4-vector $p^n = \{E, \boldsymbol{p}\}$ is
$$ p_n p^n = -E^2 + p^2 = -m^2 (\gamma^2 - \gamma^2 v^2) = -m^2 , \tag{1.51} $$
minus the square of the rest mass. The minus sign is associated with the choice −+++ of metric signature
in this book.
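Equations (1.45) and (1.51) can be illustrated together. In this sketch (units c = 1, hypothetical function names) the 4-momentum is built from the mass and 3-velocity, and the rest mass is then recovered from the 4-momentum alone:

```python
import math

def four_momentum(m, v):
    """Energy-momentum 4-vector p^n = {m gamma, m gamma v}, equation (1.45).

    v is a 3-tuple of velocity components in units c = 1.
    """
    v2 = sum(c * c for c in v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    return [m * gamma] + [m * gamma * c for c in v]

def rest_mass(p):
    """Rest mass from p_n p^n = -E^2 + p^2 = -m^2, equation (1.51)."""
    E, px, py, pz = p
    return math.sqrt(E * E - px * px - py * py - pz * pz)
```

A particle of mass 2 moving at v = 0.6 has γ = 1.25, hence E = 2.5 and momentum 1.5, and `rest_mass` returns 2 whatever the velocity.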
Elementary texts sometimes state that special relativity implies that the mass of a particle increases as its
velocity increases, but this is a confusing way of thinking. Mass is rest mass m, a scalar, not to be confused
with energy. That being said, Einstein’s famous equation (1.50) does suggest that rest mass is a form of
energy, and indeed that proves to be the case. Rest mass is routinely converted into energy in chemical or
nuclear reactions that liberate heat.
1.12 Photon energy-momentum
The energy-momentum 4-vectors of photons are of special interest because when you move through a scene
at near the speed of light, the scene appears distorted by the Lorentz transformation of the photon 4-vectors
that you see.
A photon has zero rest mass
$$ m = 0 . \tag{1.52} $$
Its scalar energy-momentum squared is thus zero,
$$ p^n p_n = -E^2 + p^2 = -m^2 = 0 . \tag{1.53} $$
Consequently the 3-momentum of a photon equals its energy (in units c = 1),
p ≡ |p| = E .
(1.54)
The energy-momentum 4-vector of a photon therefore takes the form
$$ p^n = \{E, \boldsymbol{p}\} = E\{1, \boldsymbol{n}\} = h\nu\{1, \boldsymbol{n}\} \tag{1.55} $$
where ν is the photon frequency. The photon velocity is n, a unit vector. The photon speed is one, the speed
of light.
1.12.1 Lorentz transformation of the photon energy-momentum 4-vector
The energy-momentum 4-vector pm of a photon follows the usual rules for 4-vectors under Lorentz transformations. In the case that the emitter (primed frame) is moving at velocity v along the x-axis relative to the
observer (unprimed frame), the transformation is
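$$ \begin{pmatrix} p'^t \\ p'^x \\ p'^y \\ p'^z \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p^t \\ p^x \\ p^y \\ p^z \end{pmatrix} = \begin{pmatrix} \gamma (p^t - v p^x) \\ \gamma (p^x - v p^t) \\ p^y \\ p^z \end{pmatrix} . \tag{1.56} $$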
Equivalently
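$$ h\nu' \begin{pmatrix} 1 \\ n'^x \\ n'^y \\ n'^z \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} h\nu \begin{pmatrix} 1 \\ n^x \\ n^y \\ n^z \end{pmatrix} = h\nu \begin{pmatrix} \gamma (1 - n^x v) \\ \gamma (n^x - v) \\ n^y \\ n^z \end{pmatrix} . \tag{1.57} $$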
These mathematical relations imply the rules of 4-dimensional perspective, §1.13.2.
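The transformation (1.56) applied to a photon can be sketched as follows, in units c = 1 and with hypothetical function names. The code checks nothing by itself; it simply applies the boost component by component so that the null condition (1.53) and the energy factor in (1.57) can be inspected.

```python
import math

def photon(E, n):
    """Photon 4-momentum p^n = E{1, n} for energy E and unit direction n, (1.55)."""
    return [E, E * n[0], E * n[1], E * n[2]]

def boost_x(p, v):
    """Apply the boost (1.56) by velocity v along the x-axis to a 4-vector p."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    pt, px, py, pz = p
    return [g * (pt - v * px), g * (px - v * pt), py, pz]

def norm_squared(p):
    """Scalar p^n p_n = -(p^t)^2 + (p^x)^2 + (p^y)^2 + (p^z)^2."""
    return -p[0]**2 + p[1]**2 + p[2]**2 + p[3]**2
```

Boosting `photon(1.0, (0.6, 0.8, 0.0))` by v = 0.5 leaves `norm_squared` at zero to rounding error, and changes the energy by the factor γ(1 − n^x v) that appears in the first row of (1.57).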
1.12.2 Redshift
The wavelength λ of a photon is related to its frequency ν by
λ = c/ν .
(1.58)
Astronomers define the redshift z of a photon by the shift of the observed wavelength λobs compared to its
emitted wavelength λem ,
$$ z \equiv \frac{\lambda_{\rm obs} - \lambda_{\rm em}}{\lambda_{\rm em}} . \tag{1.59} $$
In relativity, it is often more convenient to use the redshift factor 1 + z,
$$ 1 + z \equiv \frac{\lambda_{\rm obs}}{\lambda_{\rm em}} = \frac{\nu_{\rm em}}{\nu_{\rm obs}} . \tag{1.60} $$
Sometimes it is useful to use a blueshift factor which is just the reciprocal of the redshift factor,
$$ \frac{1}{1+z} \equiv \frac{\lambda_{\rm em}}{\lambda_{\rm obs}} = \frac{\nu_{\rm obs}}{\nu_{\rm em}} . \tag{1.61} $$
1.12.3 Special relativistic Doppler shift
If the emitter frame (primed) is moving with velocity v in the x-direction relative to the observer frame
(unprimed) then the emitted and observed frequencies are related by, equation (1.57),
$$ h\nu_{\rm em} = h\nu_{\rm obs} \, \gamma (1 - n^x v) . \tag{1.62} $$
The redshift factor is therefore
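$$ 1 + z = \frac{\nu_{\rm em}}{\nu_{\rm obs}} = \gamma (1 - n^x v) = \gamma (1 - \boldsymbol{n} \cdot \boldsymbol{v}) . \tag{1.63} $$
Equation (1.63) is the general formula for the special relativistic Doppler shift. In special cases,
$$ 1 + z = \begin{cases} \sqrt{\dfrac{1-v}{1+v}} & \text{velocity directly towards observer (v aligned with n)} , \\ \gamma & \text{velocity in the transverse direction } (\boldsymbol{v} \cdot \boldsymbol{n} = 0) , \\ \sqrt{\dfrac{1+v}{1-v}} & \text{velocity directly away from observer (v anti-aligned with n)} . \end{cases} \tag{1.64} $$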
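The general formula (1.63) and its special cases (1.64) can be compared numerically. A minimal sketch in units c = 1; the function name `redshift_factor` is hypothetical:

```python
import math

def redshift_factor(v, n):
    """Redshift factor 1 + z = gamma (1 - n.v), equation (1.63).

    v is the emitter's 3-velocity relative to the observer and n is the unit
    3-vector along the photon's direction of travel, both as 3-tuples.
    """
    v2 = sum(c * c for c in v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    n_dot_v = sum(ni * vi for ni, vi in zip(n, v))
    return gamma * (1.0 - n_dot_v)
```

For an emitter moving at v = 0.6 the three cases of (1.64) give 1 + z = 0.5 (approaching), 1.25 (transverse), and 2 (receding).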
1.13 What things look like at relativistic speeds
1.13.1 Light travel time effects
When you move through a scene at near the speed of light, the scene appears distorted not only by time
dilation and Lorentz contraction, but also by differences in the light travel time from different parts of the
scene. The effect of differential light travel times is comparable to the effects of time dilation and Lorentz
contraction, and cannot be ignored.
An excellent way to see the importance of light travel time is to work through the twin paradox, Exercise 1.10. Nature provides a striking example of the importance of light travel time in the form of superluminal
(faster-than-light) jets in galaxies, the subject of Exercise 1.15.
Exercise 1.15 Superluminal jets.
Radio observations of galaxies show in many cases twin jets emerging from the nucleus of the galaxy. The
jets are typically narrow and long, often penetrating beyond the optical extent of the galaxy. The jets are
frequently one-sided, and in some cases that are favourable to observation the jets are found to be superluminal. A celebrated example is the giant elliptical galaxy M87 at the centre of the Local Supercluster, whose
jet is observed over a broad range of wavelengths, including optical wavelengths. Hubble Space Telescope
observations, Figure 1.18, show blobs in the M87 jet moving across the sky at approximately 6c.
1. Draw a spacetime diagram of the situation, in Earth’s frame of reference. Assume that the velocity of
the galaxy M87 relative to Earth is negligible. Let the x-axis be the direction to M87. Choose the y-axis
so the jet lies in the x–y-plane. Let the jet be moving at velocity v at angle θ away from the direction
towards us on Earth, so that its spatial velocity relative to Earth is v ≡ {vx , vy } = {−v cos θ, v sin θ}.
2. In Earth coordinates {t, x, y}, the jet moves in time t a distance l = {lx , ly } = vt. Argue that during an
Earth time t, the jet has moved a distance lx nearer to the Earth (the distances lx and ly are both tiny
compared to the distance to M87), so the apparent time as seen through a telescope is not t, but rather
t diminished by the light travel time lx (units c = 1). Hence conclude that the apparent transverse
velocity on the sky is
$$ v_{\rm app} = \frac{v \sin\theta}{1 - v \cos\theta} . \tag{1.65} $$
Figure 1.18 The left panel shows an image of the galaxy M87 taken with the Advanced Camera for Surveys on the
Hubble Space Telescope. A jet, bluish compared to the starry background of the galaxy, emerges from the galaxy’s
central nucleus. Radio observations, not shown here, reveal that there is a second jet in the opposite direction. Credit:
STScI/AURA. The right panel is a sequence of Hubble images showing blobs in the jet moving superluminally, at
approximately 6c. The slanting lines track the moving features, with speeds given in units of c. The upper strip shows
where in the jet the blobs were located. Credit: John Biretta, STScI.
3. Sketch the apparent velocity vapp as a function of θ for some given velocity v. In terms of v and the
Lorentz factor γ, what are the values of θ and of vapp at the point where vapp reaches its maximum?
What can you conclude about the jet in M87?
4. What is the expected redshift 1 + z, or equivalently blueshift 1/(1 + z), of the jet as a function of v and
θ? By expressing v in terms of vapp and θ using equation (1.65), show that the blueshift factor is
$$ \frac{1}{1+z} = \sqrt{1 + 2 v_{\rm app} \cot\theta - v_{\rm app}^2} . \tag{1.66} $$
[Hint: Remember to use the correct redshift formula, equation (1.63).]
5. In terms of vapp , at what value of θ is the blueshift (i) infinite, or (ii) zero? What are these angles in the
case of M87? If the redshift of the jet were measurable, could you deduce the velocity v and opening
angle θ? Unfortunately the redshift of a superluminal jet is not usually observable, because the emission
is a continuum of synchrotron emission over a broad range of wavelengths, with no sharp atomic or ionic
lines to provide a redshift.
6. Why is the opposing jet not visible?
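For exploring Exercise 1.15 numerically, equations (1.65) and (1.66) can be coded directly. This is a sketch with hypothetical function names, in units c = 1; `blueshift_direct` assumes that equation (1.63) applies with n · v = v cos θ, that is, with n pointing from the jet towards the observer. The sample values suggested afterwards are illustrative only and are not measurements of M87.

```python
import math

def apparent_velocity(v, theta):
    """Apparent transverse velocity on the sky, equation (1.65)."""
    return v * math.sin(theta) / (1.0 - v * math.cos(theta))

def blueshift_from_apparent(v_app, theta):
    """Blueshift factor 1/(1+z) in terms of v_app and theta, equation (1.66)."""
    cot = math.cos(theta) / math.sin(theta)
    return math.sqrt(1.0 + 2.0 * v_app * cot - v_app**2)

def blueshift_direct(v, theta):
    """Blueshift factor from (1.63), assuming n.v = v cos(theta)."""
    return math.sqrt(1.0 - v * v) / (1.0 - v * math.cos(theta))
```

With v = 0.99 and θ = 0.2 radians, `apparent_velocity` returns a value well above 1 even though v < 1, and the two blueshift functions agree.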
1.13.2 The rules of 4-dimensional perspective
The distortion of a scene when you move through it at near the speed of light can be calculated most directly
from the Lorentz transformation of the energy-momentum 4-vectors of the photons that you see. The result
is what I call the “Rules of 4-dimensional perspective.”
Figure 1.19 illustrates the rules of 4-dimensional perspective, also called “special relativistic beaming,”
which describe how a scene appears when you move through it at near light speed.
On the left, you are at rest relative to the scene. Imagine painting the scene on a celestial sphere around
you. The arrows represent the directions of light rays (photons) from the scene on the celestial sphere to you
at the center.
On the right, you are moving to the right through the scene, at 0.8 times the speed of light. The celestial
sphere is stretched along the direction of your motion by the Lorentz gamma-factor $\gamma = 1/\sqrt{1 - 0.8^2} = 5/3$
into a celestial ellipsoid. You, the observer, are not at the centre of the ellipsoid, but rather at one of its foci
(the left one, if you are moving to the right). The focus of the celestial ellipsoid, where you the observer are, is
displaced from centre by γv = 4/3. The scene appears relativistically aberrated, which is to say concentrated
ahead of you, and expanded behind you.
The lengths of the arrows are proportional to the energies, or frequencies, of the photons that you see.
When you are moving through the scene at near light speed, the arrows ahead of you, in your direction
of motion, are longer than at rest, so you see the photons blue-shifted, increased in energy, increased in
frequency. Conversely, the arrows behind you are shorter than at rest, so you see the photons red-shifted,
decreased in energy, decreased in frequency. Since photons are good clocks, the change in photon frequency
also tells you how fast or slow clocks attached to the scene appear to you to run.
Figure 1.19 The rules of 4-dimensional perspective. In special relativity, the scene seen by an observer moving through
the scene (right) is relativistically beamed compared to the scene seen by an observer at rest relative to the scene
(left). On the left, the observer at the center of the circle is at rest relative to the surrounding scene. On the right,
the observer is moving to the right through the same scene at v = 0.8 times the speed of light. The arrowed lines
represent energy-momenta of photons. The length of an arrowed line is proportional to the perceived energy of the
photon. The scene ahead of the moving observer appears concentrated, blueshifted, and farther away, while the scene
behind appears expanded, redshifted, and closer.
This table summarizes the four effects of relativistic beaming on the appearance of a scene ahead of you
and behind you as you move through it at near the speed of light:
Effect        Ahead          Behind
Aberration    Concentrated   Expanded
Colour        Blueshifted    Redshifted
Brightness    Brighter       Dimmer
Time          Speeded up     Slowed down
Mathematical details of the rules of 4-dimensional perspective are explored in the next several Exercises.
Exercise 1.16 The rules of 4-dimensional perspective.
1. In terms of the photon energy-momentum 4-vector $p^k$ in an unprimed frame, what is the photon energy-momentum
4-vector $p'^k$ in a primed frame of reference moving at speed v in the x direction relative to
the unprimed frame? Argue that the photon 4-vectors in the unprimed and primed frames are related
geometrically by the “celestial ellipsoid” transformation illustrated in Figure 1.19. Bear in mind that
the photon vector is pointed towards the observer.
2. Aberration. The photon 4-vector seen by an observer is the null vector pk = E(1, −n), where E is the
photon energy, and n is a unit 3-vector in the direction away from the observer, the minus sign taking
into account the fact that the photon vector is pointed towards the observer. An object appears in the
unprimed frame at angle θ to the x-direction and in the primed frame at angle θ′ to the x-direction.
Show that µ′ ≡ cos θ′ and µ ≡ cos θ are related by
$$ \mu' = \frac{\mu + v}{1 + v \mu} . \tag{1.67} $$
3. Redshift. By what factor a = E ′ /E is the observed photon frequency from the object changed? Express
your answer as a function of γ, v, and µ.
4. Brightness. Photons at frequency E in the unprimed frame appear at frequency E ′ in the primed
frame. Argue that the brightness F (E), the number of photons per unit time per unit solid angle per
log interval of frequency (about E in the unprimed frame, and E ′ in the primed frame),
$$ F(E) \equiv \frac{dN(E)}{dt \, do \, d\ln E} , \tag{1.68} $$
goes as
$$ \frac{F'(E')}{F(E)} = \frac{E'}{E} \frac{d\mu}{d\mu'} = a^3 . \tag{1.69} $$
[Hint: Photon number conservation implies that dN ′ (E ′ ) = dN (E).]
5. Time. By what factor does the rate at which a clock ticks appear to change?
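The aberration formula (1.67) can be cross-checked against a direct boost of the photon 4-vector $p^k = E(1, -\boldsymbol{n})$ of part 2. The sketch below uses hypothetical function names, units c = 1, and assumes the boost takes the form of equation (1.56); it is a numerical check, not the derivation the exercise asks for.

```python
import math

def aberrate(mu, v):
    """Aberration formula (1.67): mu' = (mu + v) / (1 + v mu)."""
    return (mu + v) / (1.0 + v * mu)

def aberrate_by_boost(mu, v):
    """Same quantity from boosting p^k = E(1, -n) with n = (mu, sqrt(1 - mu^2), 0)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    pt = 1.0            # photon energy E, set to 1 without loss of generality
    px = -mu            # x-component of -n
    pt_new = g * (pt - v * px)
    px_new = g * (px - v * pt)
    # In the primed frame p'^k = E'(1, -n'), so mu' = -p'^x / p'^t.
    return -px_new / pt_new
```

Because µ′ > µ for v > 0, directions are pulled towards the direction of motion, which is the concentration of the scene ahead described in the table of effects.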
Exercise 1.17 Circles on the sky. Show that a circle on the sky Lorentz transforms to a circle on the sky.
Let the primed frame be moving at velocity v in the x-direction, let θ be the angle between the x-direction
and the direction m to the center of the circle, and let α be the angle between the circle axis m and the
photon direction n. Show that the angle θ′ in the primed frame is given by
$$ \tan\theta' = \frac{\sin\theta}{\gamma (v \cos\alpha + \cos\theta)} , \tag{1.70} $$
and that the angular radius α′ in the primed frame is given by
$$ \tan\alpha' = \frac{\sin\alpha}{\gamma (\cos\alpha + v \cos\theta)} . \tag{1.71} $$
[This result was first obtained by Penrose (1959) and Terrell (1959), prior to which it had been widely
thought that circles would appear Lorentz-contracted and therefore squashed. The following simple proof
was told to me by Engelbert Schucking (NYU). The set of null 4-vectors $p^k = E\{1, -\boldsymbol{n}\}$ on the circle
satisfies the Lorentz-invariant equation $x^k p_k = 0$, where $x^k = |\boldsymbol{x}|\{-\cos\alpha, \boldsymbol{m}\}$ is a spacelike 4-vector whose
spatial components $|\boldsymbol{x}| \boldsymbol{m}$ point to the center of the circle. Note that $|\boldsymbol{x}|$ is a magnitude of a 3-vector, not a
Lorentz-invariant scalar.]
Exercise 1.18 Lorentz transformation preserves angles on the sky. From equation (1.67), show
that the angular metric $do^2 \equiv d\theta^2 + \sin^2\theta \, d\phi^2$ on the sky Lorentz transforms as
$$ do'^2 = \frac{1 - v^2}{(1 + v \cos\theta)^2} \, do^2 . \tag{1.72} $$
This kind of transformation, which multiplies the metric by an overall factor, called a conformal factor, is
called a conformal transformation. The conformal transformation (1.72) of the angular metric shrinks
and expands patches on the sky while preserving their shapes, that is, while preserving angles between lines.
Exercise 1.19 The aberration of starlight. The aberration of starlight was discovered by James Bradley
(1728) through precision measurements of the position of γ Draconis observed from London with a specially
commissioned “zenith sector.” Stellar aberration results from the annual motion of the Earth about the
Sun. Calculate the size of the effect, in arcseconds. Are special relativistic effects important? How does the
observational signature of stellar aberration differ from that of stellar parallax?
Concept question 1.20 Apparent (affine) distance. The rules of 4-dimensional perspective illustrated
in Figure 1.19 suggest that when you move through a scene at near lightspeed, the scene ahead looks farther
away (and not Lorentz-contracted at all). Is the scene really farther away, or is it just an illusion? Answer.
What is reality? In a deep sense, reality is what can be observed (by something, not necessarily a person).
So yes, the scene ahead really is farther away. Let the observer take a tape measure that is at rest relative
to the observer, and lay it out to the emitter. The laying has to be done in advance, because the emitter
is moving. Observers who move at different velocities lay out tapes that move at different velocities. The
observer moving faster toward the emitter indeed sees the emitter farther away, according to their tape
measure. The distance measured in this fashion is called the affine distance, §2.18, a measure of distance
along the past lightcone of the observer.
1.14 Occupation number, phase-space volume, intensity, and flux
Exercise 1.16 asked you to discover how the appearance of an emitter changes when the observer boosts
into a different frame. The change (1.69) in brightness can be derived at a more fundamental level from the
concepts of occupation number and phase-space volume.
The intensity of light can be described by the number dN of photons in a 3-volume element $d^3r$ of space
(as measured by an observer in their own rest frame) with momenta in a 3-volume element $d^3p$ of momentum
(again as measured by an observer). The 6-dimensional product $d^3r \, d^3p$ of spatial and momentum 3-volumes,
called the phase-space volume, is Lorentz-invariant, unchanged by a boost or rotation of the observer’s frame
(see §10.26.1 for a proof). Indeed, as shown in §4.22, the phase-space volume element $d^3r \, d^3p$ is invariant
under a wide range of transformations (called canonical transformations, §4.17).
In quantum mechanics, the phase volume divided by $(2\pi\hbar)^3$ (which is the same as $h^3$; but in quantum
mechanics ħ is a more natural unit; for example, angular momentum is quantized in units of ħ, and spin in
units of $\frac{1}{2}\hbar$) counts the number of free states of particles, here photons. Particles typically have spin, and an
associated discrete number of distinct spin states. Photons have spin 1, and two spin states. The occupation
number f (t, r, p) is defined to be the number of photons per state at time t and spatial position r with
momenta p. The number dN of photons is the product of the occupation number f , the number g of spin
states, and the number $d^3r \, d^3p / (2\pi\hbar)^3$ of free quantum states,
$$ dN(t, \boldsymbol{r}, \boldsymbol{p}) = f(t, \boldsymbol{r}, \boldsymbol{p}) \, \frac{g \, d^3r \, d^3p}{(2\pi\hbar)^3} . \tag{1.73} $$
The number dN of photons, the occupation number f , the number g of spin states, and the phase volume
$d^3r \, d^3p / (2\pi\hbar)^3$ are all Lorentz invariant.
Astronomers conventionally define the intensity Iν of light observed from an object to be the energy
received per unit time t per unit area A (of the telescope mirror or lens) per unit solid angle o per unit
frequency ν. Often intensity is quoted per unit wavelength λ or per unit energy E instead of per unit
frequency ν, and the intensity is subscripted accordingly, Iλ or IE . The intensity measures are related by
$I_\nu \, d\nu = I_\lambda \, d\lambda = I_E \, dE$ with λ = c/ν and E = 2πħν. The intensity $I_E$ per unit energy is related to the
occupation number f by
$$ I_E \equiv \frac{E \, dN}{dt \, dA \, do \, dE} = c f \, \frac{g \, p^3}{(2\pi\hbar)^3} , \tag{1.74} $$
the spatial and momentum 3-volumes being $d^3r = c \, dt \, dA$ and $d^3p = p^2 \, dp \, do$. The $p^3$ factor in equation (1.74)
reproduces the brightness factor $a^3 \equiv (E'/E)^3$ in equation (1.69).
Stars typically appear to astronomers as point sources. Astronomers define the flux F_ν from a source to be
the intensity I_ν integrated over the solid angle of the source. Again, flux is often quoted per unit wavelength
λ or per unit energy E, and subscripted accordingly, F_λ or F_E.
Concept question 1.21 Brightness of a star. How does the flux from a star change when an observer
boosts into another frame? The flux that an observer, or a telescope, actually sees depends on the spectrum
of the light incident on the observer (the flux as a function of photon energy) and on the sensitivity of the
detector as a function of photon energy. But imagine a perfect detector that sees all photons incident on it,
of any photon energy.
Solution. The flux F_E in an interval dE of energy is
\[ F_E \equiv \frac{E \, dN}{dt \, dA \, dE} = c \int f \, \frac{g \, p^3}{(2\pi\hbar)^3} \, do . \tag{1.75} \]
Since the solid angle varies as do ∝ p^{−2}, while the occupation number f is Lorentz invariant, and the photon
energy and momentum are related by E = pc, the flux F_E varies as
\[ F_E \propto E , \tag{1.76} \]
that is, the flux is proportional to the blueshift factor. Physically, the observed number of photons per unit
time increases in proportion to the photon frequency. The flux integrated over d ln E counts the total number
of photons observed per unit time, which again increases in proportion to the blueshift factor,
\[ \int F_E \, d\ln E \propto E . \tag{1.77} \]
The flux integrated over dE counts the total energy observed per unit time, which increases as the square
of the blueshift factor,
\[ \int F_E \, dE \propto E^2 . \tag{1.78} \]
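The three scalings (1.76) to (1.78) follow from counting powers of the photon momentum. The steps can be sketched as follows, on the assumption that the source is small enough that the blueshift factor a ≡ E′/E of equation (1.69) is effectively constant across it:

```latex
\begin{align*}
F_E &= c \int f \, \frac{g \, p^3}{(2\pi\hbar)^3} \, do
  \;\propto\; p^3 \cdot p^{-2} = p \;\propto\; E
  && \text{($f$ invariant, $do \propto p^{-2}$, $E = pc$)} , \\
\int F_E \, d\ln E &\;\propto\; E
  && \text{($d\ln E$ is unchanged when every $E$ is scaled by $a$)} , \\
\int F_E \, dE &\;\propto\; E \cdot E = E^2
  && \text{($dE \propto E$)} .
\end{align*}
```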
1.15 How to program Lorentz transformations on a computer
3D gaming programmers are familiar with the fact that the best way to program spatial rotations on a
computer is with quaternions. Compared to standard rotation matrices, quaternions offer increased speed
and require less storage, and their algebraic properties simplify interpolation and splining.
Section 1.8 showed that a Lorentz boost is mathematically equivalent to a rotation by an imaginary
angle. This suggests that Lorentz transformations might be treated as complexified spatial rotations, which
proves to be true. Indeed, the best way to program Lorentz transformations on a computer is with complex
quaternions, §14.5.
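As a minimal illustration of the statement that a boost is a rotation by an imaginary angle, the following Python sketch treats a boost in 1+1 dimensions only. It is not the complex-quaternion method of §14.5. The function names, the sign convention of the boost, and the units c = 1 are assumptions made for the example.

```python
import cmath
import math


def boost(rapidity):
    """2x2 Lorentz boost acting on (t, x), in units c = 1 (sign convention assumed)."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [[ch, sh], [sh, ch]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(m, v):
    """Apply a 2x2 matrix to the pair v = [t, x]."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def interval(v):
    """Scalar spacetime distance squared, -t^2 + x^2."""
    return -v[0] ** 2 + v[1] ** 2


def boost_via_imaginary_rotation(rapidity, t, x):
    """Rotate the pair (x, y = i t) by the imaginary angle i * rapidity."""
    alpha = 1j * rapidity
    y = 1j * t
    x_new = x * cmath.cos(alpha) - y * cmath.sin(alpha)
    y_new = x * cmath.sin(alpha) + y * cmath.cos(alpha)
    return (y_new / 1j).real, x_new.real


if __name__ == "__main__":
    event = [2.0, 1.0]
    print(apply(boost(0.7), event))
    print(boost_via_imaginary_rotation(0.7, *event))
    print(interval(event), interval(apply(boost(0.7), event)))
```

Composing two boosts along the same axis multiplies the matrices and adds the rapidities, just as composing two rotations about the same axis adds the angles. The velocity corresponding to rapidity θ is tanh θ.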
Figure 1.20 Tachyon spacetime diagram.
Exercise 1.22 Tachyons. A tachyon is a hypothetical particle that moves faster than the speed of light.
The purpose of this problem is to discover that the existence of tachyons would imply a violation of causality.
1. On a spacetime diagram such as that in Figure 1.20, show how a tachyon emitted by Vermilion at speed
v > 1 can appear to go backwards in time, with v < −1, in another frame, that of Cerulean.
2. What is the smallest velocity that Cerulean must be moving relative to Vermilion in order that the
tachyon appears to go backwards in Cerulean’s time?
3. Suppose that Cerulean returns the tachyonic signal at the same speed v > 1 relative to his own frame.
Show on the spacetime diagram how Cerulean’s tachyonic signal can reach Vermilion before she sent
out the original tachyon.
4. What is the smallest velocity that Cerulean must be moving relative to Vermilion in order that his
tachyon reach Vermilion before she sent out her tachyon?
5. Why is the situation problematic?
6. If it is possible for Vermilion to send out a particle with v > 1, do you think it should also be possible
for her to send out a particle backward in time, with v < −1, from her point of view? Explain how she
might do this, or not, as the case may be.
Concept Questions
1. What assumption of general relativity makes it possible to introduce a coordinate system?
2. Is the speed of light a universal constant in general relativity? If so, in what sense?
3. What does “locally inertial” mean? How local is local?
4. Why is spacetime locally inertial?
5. What assumption of general relativity makes it possible to introduce clocks and rulers?
6. Consider two observers at the same point and with the same instantaneous velocity, but one is accelerating and the other is in free-fall. What is the relation between the proper time or proper distance along
an infinitesimal interval measured by the two observers? What assumption of general relativity implies
this?
7. Does Einstein’s principle of equivalence imply that two unequal masses will fall at the same rate in a
gravitational field? Explain.
8. In what respects is Einstein’s principle of equivalence (gravity is equivalent to acceleration) stronger
than the weak principle of equivalence (gravitating mass equals inertial mass)?
9. Standing on the surface of the Earth, you hold an object of negative mass in your hand, and drop it.
According to the principle of equivalence, does the negative mass fall up or down?
10. Same as the previous question, but what does Newtonian gravity predict?
11. You have a box of negative mass particles, and you remove energy from it. Do the particles move faster
or slower? Does the entropy of the box increase or decrease? Does the pressure exerted by the particles
on the walls of the box increase or decrease?
12. You shine two light beams along identical directions in a gravitational field. The two light beams are
identical in every way except that they have two different frequencies. Does the equivalence principle
imply that the interference pattern produced by each of the beams individually is the same?
13. What is a “straight line,” according to the principle of equivalence?
14. If all objects move on straight lines, how is it that when, standing on the surface of the Earth, you throw
two objects in the same direction but with different velocities, they follow two different trajectories?
15. In relativity, what is the generalization of the “shortest distance between two points”?
16. What kinds of general coordinate transformations are allowed in general relativity?
17. In general relativity, what is a scalar? A 4-vector? A tensor? Which of the following is a scalar/vector/tensor/none-of-the-above?
(a) a set of coordinates x^µ; (b) a coordinate interval dx^µ; (c) proper time τ?
18. What does general covariance mean?
19. What does parallel transport mean?
20. Why is it important to define covariant derivatives that behave like tensors?
21. Is covariant differentiation a derivation? That is, is covariant differentiation a linear operation, and does
it obey the Leibniz rule for the derivative of a product?
22. What is the covariant derivative of the metric tensor? Explain.
23. What does a connection coefficient Γ^κ_{µν} mean physically? Is it a tensor? Why, or why not?
24. An astronaut is in free-fall in orbit around the Earth. Can the astronaut detect that there is a gravitational field?
25. Can a gravitational field exist in flat space?
26. How can you tell whether a given metric is equivalent to the Minkowski metric of flat space?
27. How many degrees of freedom does the metric have? How many of these degrees of freedom can be
removed by arbitrary transformations of the spacetime coordinates, and therefore how many physical
degrees of freedom are there in spacetime?
28. If you insist that the spacetime is spherical, how many physical degrees of freedom are there in the
spacetime?
29. If you insist that the spacetime is spatially homogeneous and isotropic (the cosmological principle), how
many physical degrees of freedom are there in the spacetime?
30. In general relativity, you are free to prescribe any spacetime (any metric) you like, including metrics
with wormholes and metrics that connect the future to the past so as to violate causality. True or false?
31. If it is true that in general relativity you can prescribe any metric you like, then why aren’t you bumping
into wormholes and causality violations all the time?
32. How much mass does it take to curve space significantly (significantly meaning by of order unity)?
33. What is the relation between the energy-momentum 4-vector of a particle and the energy-momentum
tensor?
34. It is straightforward to go from a prescribed metric to the energy-momentum tensor. True or false?
35. It is straightforward to go from a prescribed energy-momentum tensor to the metric. True or false?
36. Does the principle of equivalence imply Einstein’s equations?
37. What do Einstein’s equations mean physically?
38. What does the Riemann curvature tensor R_{κλµν} mean physically? Is it a tensor?
39. The Riemann tensor splits into compressive (Ricci) and tidal (Weyl) parts. What do these parts mean,
physically?
40. Einstein’s equations imply conservation of energy-momentum, but what does that mean?
41. Do Einstein’s equations describe gravitational waves?
42. Do photons (massless particles) gravitate?
43. How do different forms of mass-energy gravitate?
44. How does negative mass gravitate?
What’s important?
1. The postulates of general relativity. How do the various postulates imply the mathematical structure of
general relativity?
2. The road from spacetime curvature to energy-momentum:
metric g_{µν}
→ connection coefficients Γ^κ_{µν}
→ Riemann curvature tensor R_{κλµν}
→ Ricci tensor R_{κµ} and scalar R
→ Einstein tensor G_{κµ} = R_{κµ} − ½ g_{κµ} R
→ energy-momentum tensor T_{κµ}
3. 4-velocity and 4-momentum. Geodesic equation.
4. Bianchi identities guarantee conservation of energy-momentum.
2
Fundamentals of General Relativity
As of writing (2013), general relativity continues to beat all-comers in the Darwinian struggle to be top theory
of gravity and spacetime (Will, 2005). Despite its success, most physicists accept that general relativity cannot
ultimately be correct, because of the difficulty in reconciling it with that other pillar of physics, quantum
mechanics. The other three known forces of Nature, the electromagnetic, weak, and colour (strong) forces,
are described by renormalizable quantum field theories, the so-called Standard Model of Physics, that agree
extraordinarily well with experiment, and whose predictions have continued to be confirmed by ever more
precise measurements. Attempts to quantize general relativity in a similar fashion fail. The attempt to unite
general relativity and quantum mechanics continues to exercise some of the brightest minds in physics.
One place where general relativity predicts its own demise is at singularities inside black holes. What
physics replaces general relativity at singularities? This is a deep question, providing one of the motivations
for this book’s emphasis on black hole interiors.
The aim of this chapter is to give a condensed introduction to the fundamentals of general relativity, using
the traditional coordinate-based approach to general relativity. The approach is neither the most insightful
nor the most powerful, but it is the fastest route to connecting the metric to the energy-momentum content
of spacetime. The chapter does not attempt to convey a deep conceptual understanding, which I think is
difficult to gain from the mathematics by itself. Later chapters, starting with Chapter 7 on the Schwarzschild
geometry, present visualizations intended to aid conceptual understanding.
One of the drawbacks of the coordinate approach is that it works with frames that are aligned at each point
with the tangent vectors e_µ to the coordinates at that point. General relativity postulates the existence of
locally inertial frames, so the coordinates at any point can always be arranged such that the tangent vectors
at that one point are orthonormal, and the spacetime is locally flat (Minkowski) about that point. But
in a curved spacetime it is impossible to arrange the coordinate tangent vectors e_µ to be orthonormal
everywhere. Thus the coordinate approach inevitably presents quantities in a frame that is skewed compared
to the natural, orthonormal frame. It is like looking at a scene with your eyes crossed. The problem is not so
bad if the spacetime is empty of energy-momentum, as in the Schwarzschild and Kerr geometries for ideal
spherical and rotating black holes, but it becomes a significant handicap in realistic spacetimes that contain
energy-momentum.
The coordinate approach is adequate to deal with ideal black holes, Chapters 6 to 9, and with the
Friedmann-Lemaître-Robertson-Walker spacetime of a homogeneous, isotropic cosmology, Chapter 10. After that, the
book restarts essentially from scratch. Chapter 11 introduces the tetrad formalism, the springboard for
further explorations of gravity, black holes, and cosmology.
The convention in this book is that greek (brown) dummy indices label curved spacetime coordinates,
while latin (black) dummy indices label locally inertial (more generally, tetrad) coordinates.
2.1 Motivation
Special relativity was unsatisfactory almost from the outset. Einstein had conceived special relativity by
abolishing the aether. Yet for something that had no absolute substance, the spacetime of special relativity
had strikingly absolute properties: in special relativity, two particles on parallel trajectories would remain
parallel for ever, just as in Euclidean geometry.
Moreover whereas special relativity neatly accommodated the electromagnetic force, which propagated
at the speed of light, it did not accommodate the other force known at the beginning of the 20th century,
gravity. Plainly Newton’s theory of gravity could not be correct, since it posited instantaneous transmission
of the gravitational force, whereas special relativity seemed to preclude anything from moving faster than
light, Exercise 1.22. You might think that gravity, an inverse square law like electromagnetism, might satisfy
a similar set of equations, but this is not so. Whereas an electromagnetic wave carries no electric charge, and
therefore does not interact with itself, any wave of gravity must carry energy, and therefore must interact
with itself. This proves to be a considerable complication.
A partial solution, the principle of equivalence of gravity and acceleration, occurred to Einstein while
working on an invited review on special relativity (Einstein, 1907). Einstein realised that “if a person falls
freely, he will not feel his own weight,” an idea that Einstein would later refer to as “the happiest thought of
my life.” The principle of equivalence meant that gravity could be reinterpreted as a curvature of spacetime.
In this picture, the trajectories of two freely-falling particles that pass either side of a massive body are caused
to converge not because of a gravitational force, but rather because the massive body curves spacetime, and
the particles follow straight lines in the curved spacetime, Figure 2.1.
Figure 2.1 Particles initially on parallel trajectories passing either side of the Earth are caused to converge by the
Earth’s gravity. According to Einstein’s principle of equivalence, the situation is equivalent to one where the particles
are moving in straight lines in local free-fall frames. This allows the gravitational force to be reinterpreted as being
produced by a curvature of spacetime induced by the presence of the Earth.
Einstein’s principle of equivalence is only half the story. The principle of equivalence determines how
particles must move in a spacetime of given curvature, but it does not determine how spacetime is itself
curved by mass. That was a much more difficult problem, which Einstein took several more years to crack.
The eventual solution was Einstein’s equations, the final version of which he set out in a presentation to the
Prussian Academy at the end of November 1915 (Einstein, 1915).
Contemporaneously with Einstein’s discovery, David Hilbert derived Einstein’s equations independently
and elegantly from an action principle (Hilbert, 1915). In the present chapter, Einstein’s equations are simply
postulated, since their real justification is that they reproduce experiment and observation. A derivation of
Einstein’s equations from the Hilbert action is deferred to Chapter 16.
2.2 The postulates of General Relativity
General relativity follows from three postulates:
1. Spacetime is a 4-dimensional differentiable manifold;
2. Einstein’s principle of equivalence;
3. Einstein’s equations.
2.2.1 Spacetime is a 4-dimensional differentiable manifold
A 4-dimensional manifold is defined mathematically to be a topological space that is locally homeomorphic
to Euclidean 4-space R^4. A homeomorphism is a continuous map that has a continuous inverse.
The postulate that spacetime is a 4-dimensional manifold means that it is possible to set up a coordinate
system, possibly in patches, called charts,
\[ x^\mu \equiv \{x^0, x^1, x^2, x^3\} \tag{2.1} \]
such that each point of a chart of the spacetime has a unique coordinate.
It is not always possible to cover a manifold with a single chart, that is, with a coordinate system such
that every point of spacetime has a unique coordinate. A simple example of a 2-dimensional manifold that
cannot be covered with a single chart is the 2-sphere S^2, the 2-dimensional surface of a 3-dimensional sphere,
as illustrated in Figure 2.2. Inevitably, lines of constant coordinate must cross somewhere on the 2-sphere.
At least two charts are required to cover a 2-sphere.
When more than one chart is necessary, neighbouring charts are required to overlap, in order that the
structure of the manifold be consistent across the overlap. General relativity postulates that the mapping
between the coordinates of overlapping charts be at least doubly differentiable. A manifold subject to this
property is called differentiable.
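The two-chart covering of the 2-sphere can be made concrete with stereographic projection. The Python sketch below is an illustration only: the choice of stereographic charts, the function names, and the unit radius are assumptions, not something prescribed by the text. One chart misses only the north pole, the other misses only the south pole, and on the overlap the two sets of coordinates are related by a smooth map.

```python
import math


def sphere_point(theta, phi):
    """Point on the unit 2-sphere embedded in Euclidean 3-space."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def chart_a(p):
    """Projection from the north pole: covers every point except z = +1."""
    x, y, z = p
    return (x / (1.0 - z), y / (1.0 - z))


def chart_b(p):
    """Projection from the south pole: covers every point except z = -1."""
    x, y, z = p
    return (x / (1.0 + z), y / (1.0 + z))


def transition_a_to_b(u, v):
    """Map chart-a coordinates to chart-b coordinates on the overlap.

    The map (u, v) -> (u, v) / (u^2 + v^2) is a rational function with a
    non-vanishing denominator on the overlap, hence differentiable there.
    """
    s = u * u + v * v
    return (u / s, v / s)


if __name__ == "__main__":
    p = sphere_point(1.1, 0.4)
    print(chart_b(p))
    print(transition_a_to_b(*chart_a(p)))
```

Neither chart alone covers the sphere: `chart_a` divides by zero at the north pole and `chart_b` at the south pole, which is the coordinate failure that the second chart repairs.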
In practice one often uses coordinate systems that misbehave at some points, but in an innocuous fashion.
The 2-sphere again provides a classic example, where the standard choice of polar coordinates x^µ = {θ, φ}
misbehaves at the north and south poles, Figure 2.2. A person passing smoothly through a pole sees the
azimuthal coordinate jump discontinuously by π. This is called a coordinate singularity. It is innocuous
because it can be removed by excising a patch around the pole, and pasting on a separate chart.
Figure 2.2 The 2-sphere is a 2-manifold, a topological space that is locally homeomorphic to Euclidean 2-space R^2.
Any attempt to cover the surface of a 2-sphere with a single chart, that is, with coordinates x and y such that each
point on the sphere is specified by a unique coordinate {x, y}, fails at at least one point. In the left panel, a coordinate
grid draped over the sphere fails at one point, the south pole, where coordinate lines cross. At least two charts are
required to cover the surface of a 2-sphere, as illustrated in the middle panel, where one chart covers the north pole,
the other the south pole. Where the two charts overlap, the two sets of coordinates are related differentiably. The right
panel shows standard polar coordinates θ, φ on the 2-sphere. The polar coordinatization fails at the north and south
poles, where lines of longitude cross, the azimuthal angle φ is not unique, and a person passing smoothly through the
pole would see the azimuthal angle jump by π. Such misbehaving points, called coordinate singularities, are however
innocuous: they can be removed by cutting out a patch around the coordinate singularity, and pasting on a separate
chart.
2.2.2 Principle of equivalence
The weak principle of equivalence states that: “Gravitating mass equals inertial mass.” General relativity
satisfies the weak principle of equivalence, but then so also does Newtonian gravity.
Einstein’s principle of equivalence is actually two separate statements: “The laws of physics in a
gravitating frame are equivalent to those in an accelerating frame,” and “The laws of physics in a non-accelerating, or free-fall, frame are locally those of special relativity.”
Einstein’s principle of equivalence implies that it is possible to remove the effects of gravity locally by going
into a non-accelerating, or free-fall, frame. The structure of spacetime in a non-accelerating, or free-fall, frame
is locally inertial, with the local structure of Minkowski space. By locally inertial is meant that at each point
of spacetime it is possible to choose coordinates such that (a) the metric at that point is Minkowski, and (b)
the first derivatives of the metric are all zero¹. In other words, Einstein’s principle of equivalence asserts the
existence of locally inertial frames.
¹ Actually, general relativity goes a step further. The metric is the scalar product of coordinate tangent axes,
equation (2.26). General relativity postulates, §2.10.1, that the first derivatives not only of the metric, but also of the
tangent axes themselves, vanish. See also Concept question 2.5.
Since special relativity is a metric theory, and the principle of equivalence asserts that general relativity
looks locally like special relativity, general relativity inherits from special relativity the property of being a
metric theory. A notable consequence is that the proper times and distances measured by an accelerating
observer are the same as those measured by a freely-falling observer at the same point and with the same
instantaneous velocity.
2.2.3 Einstein’s equations
Einstein’s equations comprise a 4 × 4 symmetric matrix of equations
\[ G_{\mu\nu} = 8\pi G \, T_{\mu\nu} . \tag{2.2} \]
Here G is the Newtonian gravitational constant, G_{µν} is the Einstein tensor, and T_{µν} is the energy-momentum tensor.
Physically, Einstein’s equations signify
(compressive part of) curvature = energy-momentum content .
(2.3)
Einstein’s equations generalize Poisson’s equation
\[ \nabla^2 \Phi = 4\pi G \rho \tag{2.4} \]
where Φ is the Newtonian gravitational potential, and ρ the mass-energy density. Poisson’s equation is the
time-time component of Einstein’s equations in the limit of a weak gravitational field and slowly moving
matter, §2.27.
2.3 Implications of Einstein’s principle of equivalence
2.3.1 The gravitational redshift of light
Einstein’s principle of equivalence implies that light will redshift in a gravitational field. In a weak gravitational field, the gravitational redshift of light can be deduced quantitatively from the equivalence principle
without any further assumption (such as Einstein’s equations), Exercises 2.1 and 2.2. A fully general relativistic treatment for the redshift between observers at rest in a stationary gravitational field is given in
Exercise 2.9.
Exercise 2.1 The equivalence principle implies the gravitational redshift of light, Part 1. A
rigorous general relativistic version of this exercise is Exercise 2.10. A person standing at rest on the surface
of the Earth is to a good approximation in a uniform gravitational field, with gravitational acceleration g.
The principle of equivalence asserts that the situation is equivalent to that of a frame uniformly accelerating
at g. Assume that the non-accelerating, free-fall frame is Minkowski to a good approximation. Define the
potential Φ by the usual Newtonian formula g = −∇Φ. Show that for small differences in their gravitational
potentials, B perceives the light emitted by A to be redshifted by (with units restored)
\[ z = \frac{\Phi_{\rm obs} - \Phi_{\rm em}}{c^2} . \tag{2.5} \]
Figure 2.3 Einstein’s principle of equivalence implies the gravitational redshift of light, and the gravitational bending
of light. In the left panel, persons A and B are at rest relative to each other in a uniform gravitational field. They
are shown moving to the right to bring out the evolution of the system in time. A sends a beam of light upward to
B. The principle of equivalence asserts that the uniform gravitational field is equivalent to a uniformly accelerating
frame. The right panel shows the equivalent uniformly accelerating situation as perceived by a person in free-fall. In
the free-fall frame, the light moves on a straight line, and has constant frequency. Back in the gravitating/accelerating
frame in the left panel, the light appears to bend, and to redshift as it climbs from A to B.
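To get a feel for the size of the effect, formula (2.5) can be evaluated numerically. The Python sketch below does not derive the result, which is the point of the exercise; it only plugs numbers into it. The uniform-field potential Φ = g h, the surface gravity of 9.81 m/s², and the height of 22.5 m are illustrative assumptions.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def redshift_uniform_field(g, height):
    """Redshift z = (Phi_obs - Phi_em) / c^2 for an observer a distance
    `height` above the emitter in a uniform field, assuming Phi = g * h."""
    return g * height / SPEED_OF_LIGHT ** 2


if __name__ == "__main__":
    # Illustrative numbers only: Earth-like surface gravity, a 22.5 m climb.
    print(redshift_uniform_field(9.81, 22.5))
```

With these numbers the redshift is a few parts in 10^15, which shows why the effect is imperceptible in everyday life.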
Exercise 2.2 The equivalence principle implies the gravitational redshift of light, Part 2.
A rigorous general relativistic version of this exercise is Exercise 2.11. Consider a person who, at rest in
Minkowski space, whirls a clock around them on the end of a string, so fast that the clock is moving at near
the speed of light. The person sees the clock redshifted by the Lorentz γ-factor (the string is of fixed length,
so the light travel time from clock to observer is always the same, and does not affect the redshift). Tugged
on by the string, the clock experiences a centripetal acceleration towards the whirling person. According to
the principle of equivalence, the centripetal acceleration is equivalent to a centrifugal gravitational force. In a
Newtonian approximation, if the clock is whirling around at angular velocity ω, then the effective centrifugal
potential at radius r from the observer is
\[ \Phi = -\tfrac{1}{2} \omega^2 r^2 . \tag{2.6} \]
Show that, for non-relativistic velocities ωr ≪ c, the observer perceives the light emitted from the clock to
be redshifted by (with units restored)
\[ z = -\frac{\Phi}{c^2} . \tag{2.7} \]
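The claim can be checked numerically before it is proved. The Python sketch below compares the exact redshift of the whirling clock, 1 + z = γ, with the approximation (2.7) built from the centrifugal potential (2.6). The function names and the default units c = 1 are assumptions made for the example.

```python
import math


def exact_redshift(omega, r, c=1.0):
    """Redshift from the Lorentz gamma-factor of the clock: z = gamma - 1."""
    v = omega * r
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2) - 1.0


def potential_redshift(omega, r, c=1.0):
    """Redshift from equation (2.7), z = -Phi / c^2, with Phi from (2.6)."""
    phi = -0.5 * omega ** 2 * r ** 2
    return -phi / c ** 2


if __name__ == "__main__":
    for v in (0.01, 0.1, 0.9):
        print(v, exact_redshift(v, 1.0), potential_redshift(v, 1.0))
```

The two agree closely when ωr ≪ c and part company as ωr approaches c, as expected of a Newtonian approximation.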
2.3.2 The gravitational bending of light
The principle of equivalence also implies that light will appear to bend in a gravitational field, as illustrated
by Figure 2.3. However, a quantitative prediction for the bending of light requires full general relativity. The
bending of light in a weak gravitational field is the subject of Exercise 2.16.
2.4 Metric
Postulate (1) of general relativity means that it is possible to choose coordinates
\[ x^\mu \equiv \{x^0, x^1, x^2, x^3\} \tag{2.8} \]
covering a patch of spacetime.
Postulate (2) of general relativity implies that at each point of spacetime it is possible to choose locally
inertial coordinates
\[ \xi^m \equiv \{\xi^0, \xi^1, \xi^2, \xi^3\} \tag{2.9} \]
such that the metric is Minkowski,
\[ ds^2 = \eta_{mn} \, d\xi^m d\xi^n , \tag{2.10} \]
in an infinitesimal neighbourhood of the point. Infinitesimal neighbourhood means that the metric is the
Minkowski metric η_{mn} at the point, and that the first derivatives of the metric vanish at the point. The
spacetime distance squared ds² is a scalar, a quantity that is unchanged by the choice of coordinates.
Whereas in special relativity the Minkowski formula (1.26) for the spacetime distance ∆s² held for finite
intervals ∆x^m, in general relativity the metric formula (2.10) holds only for infinitesimal intervals dξ^m.
General relativity requires, postulate (1), that two sets of coordinates are differentiably related, so locally
inertial intervals dξ^m and coordinate intervals dx^µ are related by the Leibniz rule,
\[ d\xi^m = \frac{\partial \xi^m}{\partial x^\mu} \, dx^\mu . \tag{2.11} \]
It follows that the scalar spacetime distance squared is
\[ ds^2 = \eta_{mn} \, \frac{\partial \xi^m}{\partial x^\mu} \frac{\partial \xi^n}{\partial x^\nu} \, dx^\mu dx^\nu , \tag{2.12} \]
which can be written in terms of coordinate intervals dx^µ as
\[ ds^2 = g_{\mu\nu} \, dx^\mu dx^\nu , \tag{2.13} \]
where g_{µν} is the metric, a 4 × 4 symmetric matrix
\[ g_{\mu\nu} = \eta_{mn} \, \frac{\partial \xi^m}{\partial x^\mu} \frac{\partial \xi^n}{\partial x^\nu} . \tag{2.14} \]
The metric is the essential mathematical object that converts an infinitesimal interval dx^µ to a proper
measurement of an interval of time or space.
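Equation (2.14) can be tried out on the simplest possible case, flat space described in spherical polar coordinates. The Python sketch below is an illustration under stated assumptions: the spacetime is Minkowski, so the inertial coordinates ξ^m = (t, r sin θ cos φ, r sin θ sin φ, r cos θ) happen to be valid globally and not just locally, the signature is (−, +, +, +), and the function names are invented for the example.

```python
import math

# Minkowski metric eta_{mn}, signature (-, +, +, +).
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]


def jacobian_spherical(t, r, theta, phi):
    """J[m][mu] = d xi^m / d x^mu for x^mu = (t, r, theta, phi)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, st * cp, r * ct * cp, -r * st * sp],
        [0.0, st * sp, r * ct * sp, r * st * cp],
        [0.0, ct, -r * st, 0.0],
    ]


def metric_from_jacobian(jac):
    """g_{mu nu} = eta_{mn} (d xi^m / d x^mu) (d xi^n / d x^nu), equation (2.14)."""
    return [[sum(ETA[m][n] * jac[m][mu] * jac[n][nu]
                 for m in range(4) for n in range(4))
             for nu in range(4)]
            for mu in range(4)]


if __name__ == "__main__":
    for row in metric_from_jacobian(jacobian_spherical(0.0, 2.0, 0.6, 1.3)):
        print([round(entry, 12) for entry in row])
```

The result is the diagonal matrix with entries −1, 1, r², and r² sin²θ, the familiar flat-space metric in spherical polar coordinates.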
2.5 Timelike, spacelike, proper time, proper distance
General relativity inherits from special relativity the physical meaning of the scalar spacetime distance
squared ds² along an interval dx^µ. The scalar spacetime distance squared can be negative, zero, or positive,
and accordingly timelike, lightlike, or spacelike:
\[ \begin{array}{lll}
\text{timelike:} & ds^2 < 0 , & d\tau = \sqrt{-ds^2} = \text{interval of proper time} , \\
\text{lightlike:} & ds^2 = 0 , & \\
\text{spacelike:} & ds^2 > 0 , & dl = \sqrt{ds^2} = \text{interval of proper distance} .
\end{array} \tag{2.15} \]
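The classification (2.15) translates directly into a few lines of code. In the Python sketch below the function names are hypothetical, the metric is passed in as a 4 × 4 nested list, and small finite intervals in Minkowski space stand in for the infinitesimal interval dx^µ.

```python
import math

# Minkowski metric, used here only as an example metric.
MINKOWSKI = [[-1, 0, 0, 0],
             [0, 1, 0, 0],
             [0, 0, 1, 0],
             [0, 0, 0, 1]]


def interval_squared(metric, dx):
    """ds^2 = g_{mu nu} dx^mu dx^nu."""
    return sum(metric[m][n] * dx[m] * dx[n] for m in range(4) for n in range(4))


def classify(metric, dx):
    """Return the kind of interval and its proper time or proper distance."""
    ds2 = interval_squared(metric, dx)
    if ds2 < 0:
        return ("timelike", math.sqrt(-ds2))   # proper time d tau
    if ds2 > 0:
        return ("spacelike", math.sqrt(ds2))   # proper distance dl
    return ("lightlike", 0.0)


if __name__ == "__main__":
    print(classify(MINKOWSKI, [5, 3, 0, 0]))
    print(classify(MINKOWSKI, [1, 1, 0, 0]))
    print(classify(MINKOWSKI, [3, 5, 0, 0]))
```

In floating-point work an exact test for ds² = 0 is fragile, so a practical version would compare against a small tolerance; the exact comparison is kept here to mirror (2.15).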
2.6 Orthonormal tetrad basis γ_m
You are familiar with the idea that in ordinary 3-dimensional Euclidean geometry it is often convenient to
treat vectors in an abstract coordinate-independent formalism. Thus for example a 3-vector is commonly
written as an abstract quantity r. The coordinates of the vector r may be {x, y, z} in some particular
coordinate system, but one recognizes that the vector r has a meaning, a magnitude and a direction, that is
independent of the coordinate system adopted. In an arbitrary Cartesian coordinate system, the Euclidean
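3-vector r can be expressed
\[ \boldsymbol{r} = \sum_a \hat{\boldsymbol{x}}_a x_a = \hat{\boldsymbol{x}} \, x + \hat{\boldsymbol{y}} \, y + \hat{\boldsymbol{z}} \, z \tag{2.16} \]
where x̂_a ≡ {x̂, ŷ, ẑ} are unit vectors along each of the coordinate axes. The unit vectors satisfy a Euclidean
metric
\[ \hat{\boldsymbol{x}}_a \cdot \hat{\boldsymbol{x}}_b = \delta_{ab} . \tag{2.17} \]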
The same kind of abstract notation is useful in general relativity. Because the spacetime of general relativity
is only locally inertial, not globally inertial, vectors must be thought of as living not in the spacetime manifold
itself, but rather in the tangent space of the manifold. The existence and structure of such a tangent space
follows from the postulate of the existence of locally inertial frames. Let ξ^m be a set of locally inertial
coordinates at a point of spacetime. Define the vectors γ_m, called a tetrad, to be tangent to the locally
inertial coordinates at the point in question,
\[ \boldsymbol{\gamma}_m \equiv \{\boldsymbol{\gamma}_0, \boldsymbol{\gamma}_1, \boldsymbol{\gamma}_2, \boldsymbol{\gamma}_3\} , \tag{2.18} \]
as illustrated in the left panel of Figure 2.4. Each tetrad basis vector γ_m is a 4-dimensional object, with
both magnitude and direction. The basis vectors γ_m are introduced so that vectors in spacetime can be
expressed in an abstract coordinate-independent fashion. The prototypical vector is an infinitesimal interval
dξ^m of spacetime, which can be expressed in coordinate-independent fashion as the abstract vector interval
dx defined by
\[ d\boldsymbol{x} \equiv \boldsymbol{\gamma}_m \, d\xi^m = \boldsymbol{\gamma}_0 \, d\xi^0 + \boldsymbol{\gamma}_1 \, d\xi^1 + \boldsymbol{\gamma}_2 \, d\xi^2 + \boldsymbol{\gamma}_3 \, d\xi^3 . \tag{2.19} \]
Figure 2.4 (Left) The tetrad vectors γ_m form an orthonormal basis of vectors tangent to a set of locally inertial
coordinates ξ^m at a point. (Right) The coordinate tangent vectors e_µ are the basis of vectors tangent to the coordinates
at each point. The background square grid represents a locally inertial frame, the existence of which is asserted by
general relativity.
The interval dξ^m transforms under a Lorentz transformation of the locally inertial coordinates as a contravariant Lorentz vector. To make the abstract vector interval dx invariant under Lorentz transformation,
the basis vectors γ_m must transform as a covariant Lorentz vector.
The scalar length squared of the abstract vector interval dx is
\[ ds^2 = d\boldsymbol{x} \cdot d\boldsymbol{x} = \boldsymbol{\gamma}_m \cdot \boldsymbol{\gamma}_n \, d\xi^m d\xi^n . \tag{2.20} \]
Since this must reproduce the locally inertial metric (2.10), the scalar products of the tetrad vectors γ_m
must form the Minkowski metric
\[ \boldsymbol{\gamma}_m \cdot \boldsymbol{\gamma}_n = \eta_{mn} . \tag{2.21} \]
A basis of tetrad vectors whose scalar products form the Minkowski metric is called orthonormal.
Tetrads are explored in depth in Chapter 11.
2.7 Basis of coordinate tangent vectors eµ
In general relativity, coordinates can be chosen arbitrarily, subject to differentiability conditions. In an
arbitrary system of coordinates x^µ, the coordinate tangent vectors e_µ at each point,
\[ \boldsymbol{e}_\mu \equiv \{\boldsymbol{e}_0, \boldsymbol{e}_1, \boldsymbol{e}_2, \boldsymbol{e}_3\} , \tag{2.22} \]
are defined to satisfy
\[ d\boldsymbol{x} \equiv \boldsymbol{e}_\mu \, dx^\mu = \boldsymbol{\gamma}_m \, d\xi^m . \tag{2.23} \]
The letter e derives from the German word Einheit, meaning unity. The relation (2.11) between coordinate
intervals dx^µ and locally inertial coordinate intervals dξ^m implies that the coordinate tangent vectors e_µ
must be related to the orthonormal tetrad vectors γ_m by
\[ \boldsymbol{e}_\mu = \boldsymbol{\gamma}_m \, \frac{\partial \xi^m}{\partial x^\mu} . \tag{2.24} \]
Like the tetrad axes γ_m, each coordinate tangent axis e_µ is a 4-dimensional vector object, with both magnitude and direction, as illustrated in the right panel of Figure 2.4.
The scalar length squared of the abstract vector interval dx is
\[ ds^2 = d\boldsymbol{x} \cdot d\boldsymbol{x} = \boldsymbol{e}_\mu \cdot \boldsymbol{e}_\nu \, dx^\mu dx^\nu , \tag{2.25} \]
from which it follows that the scalar products of the coordinate tangent axes e_µ must equal the coordinate
metric g_{µν},
\[ g_{\mu\nu} = \boldsymbol{e}_\mu \cdot \boldsymbol{e}_\nu . \tag{2.26} \]
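A two-dimensional example shows how coordinate tangent vectors can fail to be orthonormal. The Python sketch below is an illustration only, with assumed names and an assumed choice of coordinates: flat 1+1 dimensional spacetime is described by coordinates x^µ = (α, ρ) related to inertial coordinates by ξ^0 = ρ sinh α, ξ^1 = ρ cosh α. The components of each e_µ along the tetrad are the derivatives ∂ξ^m/∂x^µ of equation (2.24), and the metric follows from the scalar products (2.26).

```python
import math


def minkowski_dot(u, v):
    """Scalar product of two vectors given by their tetrad components (c = 1)."""
    return -u[0] * v[0] + u[1] * v[1]


def tangent_vectors(alpha, rho):
    """Tetrad components of e_alpha and e_rho, i.e. d xi^m / d x^mu."""
    e_alpha = (rho * math.cosh(alpha), rho * math.sinh(alpha))
    e_rho = (math.sinh(alpha), math.cosh(alpha))
    return e_alpha, e_rho


def coordinate_metric(alpha, rho):
    """g_{mu nu} = e_mu . e_nu, equation (2.26)."""
    basis = tangent_vectors(alpha, rho)
    return [[minkowski_dot(u, v) for v in basis] for u in basis]


if __name__ == "__main__":
    for row in coordinate_metric(0.4, 3.0):
        print([round(entry, 12) for entry in row])
```

The tangent vectors come out orthogonal but not normalized: e_ρ has unit length, while e_α has squared length −ρ², so the metric in these coordinates is diagonal with entries −ρ² and 1.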
Like the orthonormal tetrad vectors γ_m, the coordinate tangent vectors e_µ form a basis for the 4-dimensional tangent space at each point. The tangent space has three basic mathematical properties. First,
the tangent space is a vector space, that is, it has the properties of linearity that define a vector space.
Second, the tangent space has an inner (or scalar) product, defined by the metric (2.26). Third, vectors in
the tangent space can be differentiated with respect to coordinates, as will be elucidated in §2.9.3.
Some texts represent the tangent vectors e_µ with the notation ∂_µ, on the grounds that e_µ transforms
like the coordinate derivatives ∂_µ ≡ ∂/∂x^µ. This notation is not used in this book, to avoid the potential
confusion between ∂_µ as a derivative and ∂_µ as a vector.
2.8 4-vectors and tensors
2.8.1 Contravariant coordinate 4-vector
Under a general coordinate transformation
$$ x^\mu \rightarrow x'^\mu , \tag{2.27} $$
a coordinate interval $dx^\mu$ transforms as
$$ dx'^\mu = \frac{\partial x'^\mu}{\partial x^\nu} \, dx^\nu . \tag{2.28} $$
In general relativity, a coordinate 4-vector is defined to be a quantity $A^\mu = \{ A^0 , A^1 , A^2 , A^3 \}$ that transforms under a coordinate transformation (2.27) like a coordinate interval
$$ A'^\mu = \frac{\partial x'^\mu}{\partial x^\nu} \, A^\nu . \tag{2.29} $$
Just because something has an index on it does not make it a 4-vector. The essential property of a contravariant coordinate 4-vector is that it transforms like a coordinate interval, equation (2.29).
2.8.2 Abstract 4-vector
A 4-vector may be written in coordinate-independent fashion as
$$ \boldsymbol{A} = \boldsymbol{e}_\mu A^\mu . \tag{2.30} $$
The quantity A is an abstract 4-vector. Although A is a 4-vector, it is by construction unchanged by a
coordinate transformation, and is therefore a coordinate scalar. See §2.8.7 for commentary on the distinction
between abstract and coordinate vectors.
2.8.3 Lowering and raising indices
Define $g^{\mu\nu}$ to be the inverse metric, satisfying
$$ g_{\lambda\mu} \, g^{\mu\nu} = \delta_\lambda^\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} . \tag{2.31} $$
The metric $g_{\mu\nu}$ and its inverse $g^{\mu\nu}$ provide the means of lowering and raising coordinate indices. The
components of a coordinate 4-vector $A^\mu$ with raised index are called its contravariant components, while
those $A_\mu$ with lowered indices are called its covariant components,
$$ A_\mu = g_{\mu\nu} A^\nu , \tag{2.32} $$
$$ A^\mu = g^{\mu\nu} A_\nu . \tag{2.33} $$
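The lowering and raising operations (2.32) and (2.33) are plain matrix products at each point. The sketch below illustrates this with a hypothetical example metric (spherical coordinates on flat spacetime, evaluated at one point) and an arbitrary vector; both are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical example metric: spherical coordinates (t, r, theta, phi) on flat
# spacetime, evaluated at r = 2, theta = pi/3.
r, theta = 2.0, np.pi / 3
g_lower = np.diag([-1.0, 1.0, r**2, (r * np.sin(theta))**2])   # g_{mu nu}
g_upper = np.linalg.inv(g_lower)                                # g^{mu nu}

A_upper = np.array([1.5, 0.2, -0.3, 0.4])   # contravariant components A^mu
A_lower = g_lower @ A_upper                 # equation (2.32): A_mu = g_{mu nu} A^nu
A_back = g_upper @ A_lower                  # equation (2.33): A^mu = g^{mu nu} A_nu

print(np.allclose(g_lower @ g_upper, np.eye(4)))   # equation (2.31)
print(np.allclose(A_back, A_upper))                # lowering then raising is the identity
```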
2.8.4 Dual basis eµ
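The contravariant dual basis elements $\boldsymbol{e}^\mu$ are defined by raising the indices of the covariant tangent basis
elements $\boldsymbol{e}_\nu$,
$$ \boldsymbol{e}^\mu \equiv g^{\mu\nu} \boldsymbol{e}_\nu . \tag{2.34} $$
You can check that the dual vectors $\boldsymbol{e}^\mu$ transform as a contravariant coordinate 4-vector. The dot products
of the dual basis elements $\boldsymbol{e}^\mu$ with each other are
$$ \boldsymbol{e}^\mu \cdot \boldsymbol{e}^\nu = g^{\mu\nu} . \tag{2.35} $$
The dot products of the dual and tangent basis elements are
$$ \boldsymbol{e}^\mu \cdot \boldsymbol{e}_\nu = \delta^\mu_\nu . \tag{2.36} $$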
2.8.5 Covariant coordinate 4-vector
Under a general coordinate transformation (2.27), the covariant components $A_\mu$ of a coordinate 4-vector
transform as
$$ A'_\mu = \frac{\partial x^\nu}{\partial x'^\mu} \, A_\nu . \tag{2.37} $$
You can check that the transformation law (2.37) for the covariant components $A_\mu$ is consistent with the
transformation law (2.29) for the contravariant components $A^\mu$.
You can check that the tangent vectors $\boldsymbol{e}_\mu$ transform as a covariant coordinate 4-vector.
2.8.6 Scalar product
If $A^\mu$ and $B^\mu$ are coordinate 4-vectors, then their scalar product is
$$ A^\mu B_\mu = A_\mu B^\mu = g_{\mu\nu} A^\mu B^\nu . \tag{2.38} $$
This is a coordinate scalar, a quantity that remains invariant under general coordinate transformations.
The ability to form a scalar by contracting over paired indices, always one raised and one lowered, is what
makes the introduction of two species of vector, contravariant (raised index) and covariant (lowered index),
so advantageous.
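In abstract vector formalism, the scalar product of two 4-vectors $\boldsymbol{A} = \boldsymbol{e}_\mu A^\mu$ and $\boldsymbol{B} = \boldsymbol{e}_\mu B^\mu$ is
$$ \boldsymbol{A} \cdot \boldsymbol{B} = \boldsymbol{e}_\mu \cdot \boldsymbol{e}_\nu \, A^\mu B^\nu = g_{\mu\nu} A^\mu B^\nu . \tag{2.39} $$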
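The invariance of the scalar product can be checked numerically at a single point. In the sketch below, the metric, the two vectors, and the matrix `L` standing in for the partial derivatives $\partial x'^\mu / \partial x^\nu$ of a coordinate transformation are all made-up values chosen for illustration; any invertible matrix would serve.

```python
import numpy as np

# Hypothetical data at a single point of spacetime.
g = np.diag([-1.0, 1.0, 4.0, 3.0])            # metric g_{mu nu}
A = np.array([1.0, 0.5, -0.2, 0.3])           # A^mu
B = np.array([0.7, -1.0, 0.4, 0.1])           # B^mu

# L[mu, nu] = d x'^mu / d x^nu, an invertible matrix (determinant 2.02).
L = np.array([[1.0, 0.2, 0.0, 0.0],
              [0.1, 1.0, 0.3, 0.0],
              [0.0, 0.0, 2.0, 0.5],
              [0.0, 0.4, 0.0, 1.0]])
L_inv = np.linalg.inv(L)                      # L_inv[nu, mu] = d x^nu / d x'^mu

A_new = L @ A                                 # contravariant law, equation (2.29)
B_new = L @ B
g_new = L_inv.T @ g @ L_inv                   # two covariant indices, as in (2.37)

scalar_old = A @ g @ B                        # g_{mu nu} A^mu B^nu, equation (2.38)
scalar_new = A_new @ g_new @ B_new
print(np.isclose(scalar_old, scalar_new))
```

The components of the vectors and of the metric all change, but the factors of `L` and `L_inv` cancel in the contraction, which is the content of the statement that (2.38) is a coordinate scalar.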
2.8.7 Comment on vector naming and notation
Different texts follow different conventions for naming and notating vectors and tensors.
This book follows the convention of calling both Aµ (with a dummy index µ) and A ≡ Aµ eµ vectors.
Although Aµ and A are both vectors, they are mathematically different objects.
If the index on a vector indicates a specific coordinate, then the indexed vector is the component of the
vector; for example A0 (or At ) is the x0 (or time t) component of the coordinate 4-vector Aµ .
In this book, the different species of vector are distinguished by an adjective:
1. A coordinate vector Aµ , identified by greek (brown) indices µ, is one that changes in a prescribed
way under coordinate transformations. A coordinate transformation is one that changes the coordinates
of the spacetime without actually changing the spacetime or whatever lies in it.
2. An abstract vector A, identified by boldface, is the thing itself, and is unchanged by the choice of
coordinates. Since the abstract vector is unchanged by a coordinate transformation, it is a coordinate
scalar.
All the types of vector have the properties of linearity (additivity, multiplication by scalars) that identify
them mathematically as belonging to vector spaces. The important distinction between the types of vector
is how they behave under transformations.
In referring to both Aµ and A as vectors, this book follows the standard physics practice of mentally
regarding Aµ and A as equivalent objects. You are familiar with the advantages of treating a vector in
3-dimensional Euclidean space either as an abstract vector A, or as a coordinate vector Aa . Depending on
the problem, sometimes the abstract notation A is more convenient, and sometimes the coordinate notation
Aa is more convenient. Sometimes it’s convenient to switch between the two in the middle of a calculation.
Likewise in general relativity it is convenient to have the flexibility to work in either coordinate or abstract
notation, whatever suits the problem of the moment.
2.8.8 Coordinate tensor
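In general, a coordinate tensor $A^{\kappa\lambda...}_{\mu\nu...}$ is an object that transforms under general coordinate transformations (2.27) as
$$ A'^{\kappa\lambda...}_{\mu\nu...} = \frac{\partial x'^\kappa}{\partial x^\pi} \frac{\partial x'^\lambda}{\partial x^\rho} \, ... \, \frac{\partial x^\sigma}{\partial x'^\mu} \frac{\partial x^\tau}{\partial x'^\nu} \, ... \, A^{\pi\rho...}_{\sigma\tau...} . \tag{2.40} $$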
You can check that the metric tensor gµν and its inverse g µν are indeed coordinate tensors, transforming
like (2.40).
The rank of a tensor is the number of indices of its expansion $A^{\kappa\lambda...}_{\mu\nu...}$ in components. A scalar is a tensor
of rank 0. A 4-vector is a tensor of rank 1. The metric and its inverse are tensors of rank 2. The
rank of a tensor with $n$ contravariant (upstairs) and $m$ covariant (downstairs) indices is sometimes written $\binom{n}{m}$.
2.9 Covariant derivatives
2.9.1 Derivative of a coordinate scalar
Suppose that Φ is a coordinate scalar. Then the coordinate derivative of Φ is a coordinate 4-vector
$$ \frac{\partial \Phi}{\partial x^\mu} \qquad \text{a coordinate tensor} \tag{2.41} $$
transforming like equation (2.37).
As a shorthand, the ordinary partial derivative is often denoted in the literature with a comma
$$ \frac{\partial \Phi}{\partial x^\mu} = \Phi_{,\mu} . \tag{2.42} $$
For the most part this book does not use the comma notation.
2.9.2 Derivative of a coordinate 4-vector
The ordinary partial derivative of a covariant coordinate 4-vector $A_\mu$ is not a tensor
$$ \frac{\partial A_\mu}{\partial x^\nu} \qquad \text{not a coordinate tensor} \tag{2.43} $$
because it does not transform like a coordinate tensor.
Figure 2.5 The change $\delta \boldsymbol{e}_0$ in the tangent vector $\boldsymbol{e}_0$ over a small interval $\delta x^1$ of spacetime is defined to be the difference
between the tangent vector $\boldsymbol{e}_0 (x^1 + \delta x^1)$ at the shifted position $x^1 + \delta x^1$ and the tangent vector $\boldsymbol{e}_0 (x^1)$ at the original
position $x^1$, parallel-transported to the shifted position. The parallel-transported vector is shown as a dashed arrowed
line. The parallel transport is defined with respect to a locally inertial frame, shown as a background square grid.
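However, the 4-vector $\boldsymbol{A} = \boldsymbol{e}_\mu A^\mu$, being by construction invariant under coordinate transformations, is a
coordinate scalar, and its partial derivative is a coordinate 4-vector
$$ \begin{aligned}
\frac{\partial \boldsymbol{A}}{\partial x^\nu} &= \frac{\partial ( \boldsymbol{e}_\mu A^\mu )}{\partial x^\nu} \\
&= \boldsymbol{e}_\mu \frac{\partial A^\mu}{\partial x^\nu} + \frac{\partial \boldsymbol{e}_\mu}{\partial x^\nu} A^\mu \qquad \text{a coordinate tensor} .
\end{aligned} \tag{2.44} $$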
The last line of equation (2.44) assumes that it is legitimate to differentiate the tangent vectors eµ , but
what does this mean? The partial derivatives of basis vectors eµ are defined in the usual way by
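$$ \frac{\partial \boldsymbol{e}_\mu}{\partial x^\nu} \equiv \lim_{\delta x^\nu \rightarrow 0} \frac{\boldsymbol{e}_\mu (x^0 , ..., x^\nu + \delta x^\nu , ..., x^3) - \boldsymbol{e}_\mu (x^0 , ..., x^\nu , ..., x^3)}{\delta x^\nu} . \tag{2.45} $$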
This definition relies on being able to compare the vectors eµ (x) at some point x with the vectors eµ (x+δx)
at another point x+δx a small distance away. The comparison between two vectors a small distance apart
is made possible by the existence of locally inertial frames. In a locally inertial frame, two vectors a small
distance apart can be compared by parallel-transporting one vector to the location of the other along
the small interval between them, that is, by transporting the vector without accelerating or precessing with
respect to the locally inertial frame. Thus the right hand side of equation (2.45) should be interpreted as
eµ (x+δx) minus the value of eµ (x) parallel-transported from position x to position x+δx along the small
interval δx between them, as illustrated in Figure 2.5.
The notion of the tangent space at a point on a manifold was introduced in §2.6. Parallel transport allows
the tangent spaces at neighbouring points to be adjoined in a well-defined fashion to form the tangent
manifold, whose dimension is twice that of the underlying spacetime. Coordinates for the tangent manifold
are provided by a combination {xµ , ξ m } of coordinates xµ on the parent manifold and tangent space coordinates ξ m extrapolated from a locally inertial frame about each point. The tangent space coordinates ξ m
vary smoothly over the manifold provided that the locally inertial frames are chosen to vary smoothly.
2.9.3 Coordinate connection coefficients
The partial derivatives of the basis vectors eµ that appear on the right hand side of equation (2.44) define
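the coordinate connection coefficients $\Gamma^\kappa_{\mu\nu}$,
$$ \frac{\partial \boldsymbol{e}_\mu}{\partial x^\nu} \equiv \Gamma^\kappa_{\mu\nu} \, \boldsymbol{e}_\kappa \qquad \text{not a coordinate tensor} . \tag{2.46} $$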
The definition (2.46) shows that the connection coefficients express how each tangent vector eµ changes,
relative to parallel-transport, when shifted along an interval δxν .
2.9.4 Covariant derivative of a contravariant 4-vector
Expression (2.44) along with the definition (2.46) of the connection coefficients implies that
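$$ \begin{aligned}
\frac{\partial \boldsymbol{A}}{\partial x^\nu} &= \boldsymbol{e}_\mu \frac{\partial A^\mu}{\partial x^\nu} + \Gamma^\kappa_{\mu\nu} \, \boldsymbol{e}_\kappa A^\mu \\
&= \boldsymbol{e}_\kappa \left( \frac{\partial A^\kappa}{\partial x^\nu} + \Gamma^\kappa_{\mu\nu} A^\mu \right) \qquad \text{a coordinate tensor} .
\end{aligned} \tag{2.47} $$
The expression in parentheses is a coordinate tensor, and defines the covariant derivative $D_\nu A^\kappa$ of the
contravariant coordinate 4-vector $A^\kappa$
$$ D_\nu A^\kappa \equiv \frac{\partial A^\kappa}{\partial x^\nu} + \Gamma^\kappa_{\mu\nu} A^\mu \qquad \text{a coordinate tensor} . \tag{2.48} $$
As a shorthand, the covariant derivative is often denoted in the literature with a semi-colon
$$ D_\nu A^\kappa = A^\kappa{}_{;\nu} . \tag{2.49} $$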
For the most part this book does not use the semi-colon notation.
2.9.5 Covariant derivative of a covariant coordinate 4-vector
Similarly,
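$$ \frac{\partial \boldsymbol{A}}{\partial x^\nu} = \boldsymbol{e}^\kappa D_\nu A_\kappa \qquad \text{a coordinate tensor} \tag{2.50} $$
where $D_\nu A_\kappa$ is the covariant derivative of the covariant coordinate 4-vector $A_\kappa$
$$ D_\nu A_\kappa \equiv \frac{\partial A_\kappa}{\partial x^\nu} - \Gamma^\mu_{\kappa\nu} A_\mu \qquad \text{a coordinate tensor} . \tag{2.51} $$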
2.9.6 Covariant derivative of a coordinate tensor
In general, the covariant derivative of a coordinate tensor is
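$$ D_\pi A^{\kappa\lambda...}_{\mu\nu...} = \frac{\partial A^{\kappa\lambda...}_{\mu\nu...}}{\partial x^\pi} + \Gamma^\kappa_{\rho\pi} A^{\rho\lambda...}_{\mu\nu...} + \Gamma^\lambda_{\rho\pi} A^{\kappa\rho...}_{\mu\nu...} + ... - \Gamma^\rho_{\mu\pi} A^{\kappa\lambda...}_{\rho\nu...} - \Gamma^\rho_{\nu\pi} A^{\kappa\lambda...}_{\mu\rho...} - ... \tag{2.52} $$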
with a positive Γ term for each contravariant index, and a negative Γ term for each covariant index.
Concept question 2.3 Does covariant differentiation commute with the metric? Answer. Yes,
essentially by construction. The covariant derivative of a tangent basis vector eµ ,
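$$ D_\nu \boldsymbol{e}_\mu = \frac{\partial \boldsymbol{e}_\mu}{\partial x^\nu} - \Gamma^\kappa_{\mu\nu} \, \boldsymbol{e}_\kappa = 0 , \tag{2.53} $$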
vanishes by definition of the coordinate connections, equation (2.46). Consequently the covariant derivative of
the metric gµν ≡ eµ · eν also vanishes. As a corollary, covariant differentiation commutes with the operations
of raising and lowering indices, and of contraction.
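The vanishing of the covariant derivative of the metric can also be checked directly in components. The following is a sketch of that check, applying the general rule (2.52) to the two covariant indices of $g_{\lambda\mu}$, and then substituting the expansion of the metric derivative that follows from the definition (2.46) and the product rule (the same expansion is derived as equation (2.59) in §2.11):

```latex
\begin{aligned}
D_\nu g_{\lambda\mu}
  &= \frac{\partial g_{\lambda\mu}}{\partial x^\nu}
     - \Gamma^\kappa_{\lambda\nu} \, g_{\kappa\mu}
     - \Gamma^\kappa_{\mu\nu} \, g_{\lambda\kappa} \\
  &= \left( g_{\lambda\kappa} \Gamma^\kappa_{\mu\nu} + g_{\mu\kappa} \Gamma^\kappa_{\lambda\nu} \right)
     - g_{\kappa\mu} \Gamma^\kappa_{\lambda\nu}
     - g_{\lambda\kappa} \Gamma^\kappa_{\mu\nu} \\
  &= 0 .
\end{aligned}
```

No symmetry of the connection in its last two indices is used in this check, so the result holds whether or not torsion vanishes.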
2.10 Torsion
2.10.1 No-torsion condition
The existence of locally inertial frames requires that it must be possible to arrange not only that the tangent
axes eµ are orthonormal at a point, but also that they remain orthonormal to first order in a Taylor expansion
about the point. That is, it must be possible to choose the coordinates such that the tangent axes eµ are
orthonormal, and unchanged to linear order:
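$$ \boldsymbol{e}_\mu \cdot \boldsymbol{e}_\nu = \eta_{\mu\nu} , \tag{2.54a} $$
$$ \frac{\partial \boldsymbol{e}_\mu}{\partial x^\nu} = 0 . \tag{2.54b} $$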
In view of the definition (2.46) of the connection coefficients, the second condition (2.54b) is equivalent to
the vanishing of all the connection coefficients:
$$ \Gamma^\kappa_{\mu\nu} = 0 . \tag{2.55} $$
Under a general coordinate transformation xµ → x′µ , the tangent axes transform as eµ = ∂x′κ /∂xµ e′κ .
The 4 × 4 matrix ∂x′κ /∂xµ of partial derivatives provides 16 degrees of freedom in choosing the tangent axes
at a point. The 16 degrees of freedom are enough — more than enough — to accomplish the orthonormality
condition (2.54a), which is a symmetric 4 × 4 matrix equation with 10 degrees of freedom. The additional
16 − 10 = 6 degrees of freedom are Lorentz transformations, which rotate the tangent axes eµ , but leave the
metric ηµν unchanged.
Just as it is possible to reorient the tangent axes eµ at a point by adjusting the matrix ∂x′κ /∂xµ of first
partial derivatives of the coordinate transformation xµ → x′µ , so also it is possible to reorient the derivatives
∂eµ /∂xν of the tangent axes by adjusting the matrix ∂ 2 x′κ /∂xµ ∂xν of second partial derivatives of the
coordinate transformation. The second partial derivatives comprise a set of 4 symmetric 4 × 4 matrices, for
a total of 4 × 10 = 40 degrees of freedom. However, there are 4 × 4 × 4 = 64 connection coefficients Γκµν ,
all of which the condition (2.55) requires to vanish. The matrix of second derivatives is thus 64 − 40 = 24
degrees of freedom short of being able to make all the connections vanish. The resolution of the problem
is that, as shown below, equation (2.58), there are 24 combinations of the connections that form a tensor,
the torsion tensor. If a tensor is zero in one frame, then it is automatically zero in any other frame. Thus
the requirement that all the connections vanish requires that the torsion tensor vanish. This requires, from
the expression (2.58) for the torsion tensor, the no-torsion condition that the connection coefficients are
symmetric in their last two indices
$$ \Gamma^\kappa_{\mu\nu} = \Gamma^\kappa_{\nu\mu} . \tag{2.56} $$
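The counting of degrees of freedom in the argument above is simple arithmetic, and can be reproduced for any number of dimensions. The short sketch below does so for `n = 4`; the variable names are illustrative only.

```python
from math import comb

n = 4  # number of spacetime dimensions

first_derivs = n * n                         # matrix d x'^kappa / d x^mu: 16
metric_conditions = n * (n + 1) // 2         # symmetric condition (2.54a): 10
lorentz = first_derivs - metric_conditions   # 6 left over: Lorentz transformations

second_derivs = n * (n * (n + 1) // 2)       # n symmetric n x n matrices: 40
connections = n ** 3                         # Gamma^kappa_{mu nu}: 64
shortfall = connections - second_derivs      # 24

# Torsion S^mu_{kappa lambda} is antisymmetric in kappa lambda: n * C(n, 2) components.
torsion_components = n * comb(n, 2)          # 24

print(lorentz, shortfall, torsion_components)
```

The shortfall equals the number of independent torsion components, which is why requiring the torsion to vanish is exactly what is needed for all the connections to be removable at a point.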
It should be emphasized that the condition of vanishing torsion is an assumption of general relativity, not
a mathematical necessity. It has been shown in this section that torsion vanishes if and only if spacetime is
locally flat, meaning that at any point coordinates can be found such that conditions (2.54) are true. The
assumption of local flatness is central to the idea of the principle of equivalence. But it is an assumption,
not a consequence, of the theory.
Concept question 2.4 Parallel transport when torsion is present. If torsion does not vanish, then
there is no locally inertial frame. What does parallel-transport mean in such a case? Answer. A general
coordinate transformation can always be found such that the connection coefficients Γκµν vanish along any
one direction ν. Parallel-transport along that direction can be defined relative to such a frame. For any given
direction ν, there are 16 second partial derivatives ∂ 2 x′κ /∂xµ ∂xν , just enough to make vanish the 4 × 4 = 16
coefficients Γκµν .
2.10.2 Torsion tensor
General relativity assumes no torsion, but it is possible to consider generalizations to theories with torsion.
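The torsion tensor $S^\mu_{\kappa\lambda}$ is defined by the commutator of the covariant derivative acting on a scalar $\Phi$
$$ [ D_\kappa , D_\lambda ] \, \Phi = S^\mu_{\kappa\lambda} \frac{\partial \Phi}{\partial x^\mu} \qquad \text{a coordinate tensor} . \tag{2.57} $$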
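Note that the covariant derivative of a scalar is just the ordinary derivative, $D_\lambda \Phi = \partial \Phi / \partial x^\lambda$. The expression (2.51) for the covariant derivatives shows that the torsion tensor is
$$ S^\mu_{\kappa\lambda} = \Gamma^\mu_{\kappa\lambda} - \Gamma^\mu_{\lambda\kappa} \qquad \text{a coordinate tensor} \tag{2.58} $$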
which is evidently antisymmetric in the indices κλ.
In Einstein-Cartan theory, the torsion tensor is related to the spin content of spacetime. Since this vanishes
in empty space, Einstein-Cartan theory is indistinguishable from general relativity in experiments carried
out in vacuum.
2.11 Connection coefficients in terms of the metric
The connection coefficients have been defined, equation (2.46), as derivatives of the tangent basis vectors eµ .
However, the connection coefficients can be expressed purely in terms of the (first derivatives of the) metric,
without reference to the individual basis vectors. The partial derivatives of the metric are
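$$ \begin{aligned}
\frac{\partial g_{\lambda\mu}}{\partial x^\nu} &= \frac{\partial ( \boldsymbol{e}_\lambda \cdot \boldsymbol{e}_\mu )}{\partial x^\nu} \\
&= \boldsymbol{e}_\lambda \cdot \frac{\partial \boldsymbol{e}_\mu}{\partial x^\nu} + \boldsymbol{e}_\mu \cdot \frac{\partial \boldsymbol{e}_\lambda}{\partial x^\nu} \\
&= \boldsymbol{e}_\lambda \cdot \boldsymbol{e}_\kappa \, \Gamma^\kappa_{\mu\nu} + \boldsymbol{e}_\mu \cdot \boldsymbol{e}_\kappa \, \Gamma^\kappa_{\lambda\nu} \\
&= g_{\lambda\kappa} \Gamma^\kappa_{\mu\nu} + g_{\mu\kappa} \Gamma^\kappa_{\lambda\nu} \\
&= \Gamma_{\lambda\mu\nu} + \Gamma_{\mu\lambda\nu} ,
\end{aligned} \tag{2.59} $$
which is a sum of two connection coefficients. Here $\Gamma_{\lambda\mu\nu}$ with all indices lowered is defined to be $\Gamma^\kappa_{\mu\nu}$ with
the first index lowered by the metric,
$$ \Gamma_{\lambda\mu\nu} \equiv g_{\lambda\kappa} \Gamma^\kappa_{\mu\nu} . \tag{2.60} $$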
Combining the metric derivatives in the following fashion yields an expression for a single connection,
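$$ \begin{aligned}
\frac{\partial g_{\lambda\mu}}{\partial x^\nu} + \frac{\partial g_{\lambda\nu}}{\partial x^\mu} - \frac{\partial g_{\mu\nu}}{\partial x^\lambda}
&= \Gamma_{\lambda\mu\nu} + \Gamma_{\mu\lambda\nu} + \Gamma_{\lambda\nu\mu} + \Gamma_{\nu\lambda\mu} - \Gamma_{\mu\nu\lambda} - \Gamma_{\nu\mu\lambda} \\
&= 2 \, \Gamma_{\lambda\mu\nu} - S_{\lambda\mu\nu} - S_{\mu\nu\lambda} - S_{\nu\mu\lambda} ,
\end{aligned} \tag{2.61} $$
with $S_{\lambda\mu\nu} \equiv g_{\lambda\kappa} S^\kappa_{\mu\nu}$, which shows that, in the presence of torsion,
$$ \Gamma_{\lambda\mu\nu} = \frac{1}{2} \left( \frac{\partial g_{\lambda\mu}}{\partial x^\nu} + \frac{\partial g_{\lambda\nu}}{\partial x^\mu} - \frac{\partial g_{\mu\nu}}{\partial x^\lambda} + S_{\lambda\mu\nu} + S_{\mu\nu\lambda} + S_{\nu\mu\lambda} \right) \qquad \text{not a coordinate tensor} . \tag{2.62} $$
If torsion vanishes, as general relativity assumes, then
$$ \Gamma_{\lambda\mu\nu} = \frac{1}{2} \left( \frac{\partial g_{\lambda\mu}}{\partial x^\nu} + \frac{\partial g_{\lambda\nu}}{\partial x^\mu} - \frac{\partial g_{\mu\nu}}{\partial x^\lambda} \right) \qquad \text{not a coordinate tensor} . \tag{2.63} $$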
This is the formula that allows connection coefficients to be calculated from the metric.
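A minimal numerical sketch of the torsion-free formula (2.63) follows. It approximates the metric derivatives by central differences and then raises the first index with the inverse metric. The example metric (the flat plane in polar coordinates), the function names, and the step size `h` are assumptions made for illustration; in practice the derivatives would usually be taken analytically.

```python
import numpy as np

def metric_polar(x):
    """Hypothetical example metric: flat plane in polar coordinates x = (r, phi)."""
    r, _ = x
    return np.diag([1.0, r**2])

def christoffel_lower(metric, x, h=1e-6):
    """Gamma_{lambda mu nu} from equation (2.63), using central differences."""
    n = len(x)
    dg = np.zeros((n, n, n))                  # dg[l, m, v] = d g_{l m} / d x^v
    for v in range(n):
        step = np.zeros(n)
        step[v] = h
        dg[:, :, v] = (metric(x + step) - metric(x - step)) / (2 * h)
    # Gamma_{l m v} = (d_v g_{l m} + d_m g_{l v} - d_l g_{m v}) / 2
    return 0.5 * (dg + np.transpose(dg, (0, 2, 1)) - np.transpose(dg, (2, 0, 1)))

def christoffel_upper(metric, x, h=1e-6):
    """Gamma^kappa_{mu nu} = g^{kappa lambda} Gamma_{lambda mu nu}."""
    g_inv = np.linalg.inv(metric(x))
    return np.einsum('kl,lmv->kmv', g_inv, christoffel_lower(metric, x, h))

x = np.array([2.0, 0.5])                      # r = 2, phi = 0.5
gamma_upper = christoffel_upper(metric_polar, x)
print(gamma_upper[0, 1, 1])   # Gamma^r_{phi phi}, analytically -r
print(gamma_upper[1, 0, 1])   # Gamma^phi_{r phi}, analytically 1/r
```

The connections are non-zero even though the plane is flat, which illustrates that they are not coordinate tensors: they vanish identically in Cartesian coordinates.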
2.12 Torsion-free covariant derivative
Einstein’s principle of equivalence postulates that a locally inertial frame exists at each point of spacetime,
and this implies that torsion vanishes. However, torsion is of special interest as a generalization of general
relativity because, as discussed in §2.19.2, the torsion tensor and the Riemann curvature tensor can
be regarded as fields associated with local gauge groups of respectively displacements and Lorentz transformations. Together displacements and Lorentz transformations form the Poincaré group of symmetries of
spacetime. Torsion arises naturally in extensions of general relativity such as supergravity (Nieuwenhuizen,
1981), which unify the Poincaré group by adjoining supersymmetry transformations.
The torsion-free part of the covariant derivative is a covariant derivative even when torsion is present (that
is, it yields a tensor when acting on a tensor). The torsion-free covariant derivative is important, even when
torsion is present, for several reasons. Firstly, as will be discovered from an action principle in Chapter 4,
the covariant derivative that goes in the geodesic equation (2.88) is the torsion-free covariant derivative,
equation (2.90). Secondly, the torsion-free covariant curl defines the exterior derivative in the theory of
differential forms, §15.6. The exterior derivative has the property that it is inverse to integration over curved
hypersurfaces. Integration is central to various aspects of general relativity, such as the development of
Lagrangian and Hamiltonian mechanics. Thirdly, the Lie derivative, §7.34, is a covariant derivative defined
in terms of torsion-free covariant derivatives. Finally, Yang-Mills gauge symmetries, such as the U (1) gauge
symmetry of electromagnetism, require the gauge field to be defined in terms of the torsion-free covariant
derivative, in order to preserve the gauge symmetry.
When torsion is present and it is desirable to make the torsion part explicit, it is convenient to distinguish
torsion-free quantities with a ˚ overscript. The torsion-free part $\mathring{\Gamma}_{\lambda\mu\nu}$ of the connection, also called the Levi-Civita connection, is given by the right hand side of equation (2.63). When expressed in a coordinate
frame (as opposed to a tetrad frame, §11.15), the components of the torsion-free connections $\mathring{\Gamma}_{\lambda\mu\nu}$ are also
called Christoffel symbols. Sometimes, the components $\mathring{\Gamma}_{\lambda\mu\nu}$ with all indices lowered are called Christoffel
symbols of the first kind, while components $\mathring{\Gamma}^\lambda_{\mu\nu}$ with first index raised are called Christoffel symbols of the
second kind. There is no need to remember the jargon, but it is useful to know what it means if you meet it.
The torsion-full connection Γλµν is a sum of the torsion-free connection Γ̊λµν and a tensor called the
contortion tensor (not contorsion!) Kλµν ,
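$$ \Gamma_{\lambda\mu\nu} = \mathring{\Gamma}_{\lambda\mu\nu} + K_{\lambda\mu\nu} \qquad \text{not a coordinate tensor} . \tag{2.64} $$
From equation (2.62), the contortion tensor $K_{\lambda\mu\nu}$ is related to the torsion tensor $S_{\lambda\mu\nu}$ by
$$ K_{\lambda\mu\nu} = \tfrac{1}{2} \left( S_{\lambda\mu\nu} + S_{\mu\nu\lambda} + S_{\nu\mu\lambda} \right) = - S_{\nu\lambda\mu} + \tfrac{3}{2} S_{[\lambda\mu\nu]} \qquad \text{a coordinate tensor} . \tag{2.65} $$
The contortion $K_{\lambda\mu\nu}$ is antisymmetric in its first two indices,
$$ K_{\lambda\mu\nu} = - K_{\mu\lambda\nu} , \tag{2.66} $$
and thus like the torsion tensor $S_{\lambda\mu\nu}$ has 6 × 4 = 24 degrees of freedom. The torsion tensor $S_{\lambda\mu\nu}$ can be
expressed in terms of the contortion tensor $K_{\lambda\mu\nu}$,
$$ S_{\lambda\mu\nu} = K_{\lambda\mu\nu} - K_{\lambda\nu\mu} = - K_{\mu\nu\lambda} + 3 K_{[\lambda\mu\nu]} \qquad \text{a coordinate tensor} . \tag{2.67} $$
The torsion-full covariant derivative $D_\nu$ differs from the torsion-free covariant derivative $\mathring{D}_\nu$ by the contortion,
$$ D_\nu A^\kappa \equiv \mathring{D}_\nu A^\kappa + K^\kappa_{\mu\nu} A^\mu \qquad \text{a coordinate tensor} . \tag{2.68} $$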
In this book torsion will not be assumed automatically to vanish, and thus by default the symbol Dν will
denote the torsion-full covariant derivative. When torsion is assumed to vanish, or when Dν denotes the
torsion-free covariant derivative, it will be explicitly stated so.
Concept question 2.5 Can the metric be Minkowski in the presence of torsion? In §2.10.1 it was
argued that the postulate of the existence of locally inertial frames implies that torsion vanishes. The basis
of the argument was the proposition that derivatives of the tangent axes vanish, equation (2.54b). Impose
instead the weaker condition that the derivatives of the metric (i.e. scalar products of tangent axes) vanish,
$$ \frac{\partial g_{\lambda\mu}}{\partial x^\nu} = 0 . \tag{2.69} $$
Can torsion be non-vanishing under this weaker condition? Answer. Yes. In fact torsion may exist even
in flat (Minkowski) space, where the metric is everywhere Minkowski, gλµ = ηλµ . The condition (2.69) of
vanishing metric derivatives is equivalent to the vanishing of the torsion-free connections,
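$$ \frac{1}{2} \frac{\partial g_{\lambda\mu}}{\partial x^\nu} = \Gamma_{(\lambda\mu)\nu} = \mathring{\Gamma}_{(\lambda\mu)\nu} + K_{(\lambda\mu)\nu} = \mathring{\Gamma}_{(\lambda\mu)\nu} = 0 . \tag{2.70} $$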
Thus the condition (2.69) of vanishing metric derivatives imposes no condition on torsion.
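Exercise 2.6 Covariant curl and coordinate curl. Show that the covariant curl of a covariant vector
$A_\lambda$ is
$$ D_\kappa A_\lambda - D_\lambda A_\kappa = \frac{\partial A_\lambda}{\partial x^\kappa} - \frac{\partial A_\kappa}{\partial x^\lambda} + S^\mu_{\kappa\lambda} A_\mu . \tag{2.71} $$
Conclude that the coordinate curl of a vector equals its torsion-free covariant curl,
$$ \mathring{D}_\kappa A_\lambda - \mathring{D}_\lambda A_\kappa = \frac{\partial A_\lambda}{\partial x^\kappa} - \frac{\partial A_\kappa}{\partial x^\lambda} . \tag{2.72} $$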
Of course, if torsion vanishes as general relativity assumes, then the covariant curl is the torsion-free covariant
curl. Note that since $D_\kappa A_\lambda - D_\lambda A_\kappa$ on the left hand side and $S^\mu_{\kappa\lambda} A_\mu$ on the right hand side of
equation (2.71) are both tensors, it follows that the coordinate curl $\partial A_\lambda / \partial x^\kappa - \partial A_\kappa / \partial x^\lambda$ is a tensor even in
the presence of torsion.
Exercise 2.7 Covariant divergence and coordinate divergence. Show that the covariant divergence
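of a contravariant vector $A^\mu$ is
$$ D_\mu A^\mu = \frac{1}{\sqrt{-g}} \frac{\partial ( \sqrt{-g} \, A^\mu )}{\partial x^\mu} + S^\nu_{\mu\nu} A^\mu , \tag{2.73} $$
where $g \equiv | g_{\mu\nu} |$ is the determinant of the metric matrix. Conclude that the torsion-free covariant divergence
is
$$ \mathring{D}_\mu A^\mu = \frac{1}{\sqrt{-g}} \frac{\partial ( \sqrt{-g} \, A^\mu )}{\partial x^\mu} . \tag{2.74} $$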
Of course, if torsion vanishes as general relativity assumes, then the covariant divergence is the torsion-free
covariant divergence. Note that since the covariant divergence on the left hand side of equation (2.73)
and the torsion term on the right hand side of equation (2.73) are both tensors, the torsion-free covariant
divergence (2.74) is a tensor even in the presence of torsion.
Solution. The covariant divergence is
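$$ D_\mu A^\mu = \frac{\partial A^\mu}{\partial x^\mu} + \Gamma^\nu_{\mu\nu} A^\mu . \tag{2.75} $$
From equation (2.62),
$$ \begin{aligned}
\Gamma^\nu_{\mu\nu} &= \frac{1}{2} g^{\lambda\nu} \frac{\partial g_{\lambda\nu}}{\partial x^\mu} + S^\nu_{\mu\nu} \\
&= \frac{\partial \ln | \sqrt{-g} |}{\partial x^\mu} + S^\nu_{\mu\nu} .
\end{aligned} \tag{2.76} $$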
The second line of equations (2.76) follows because for any matrix M ,
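$$ \begin{aligned}
\delta \ln |M| &= \ln |M + \delta M| - \ln |M| \\
&= \ln | M^{-1} ( M + \delta M ) | \\
&= \ln | 1 + M^{-1} \delta M | \\
&= \ln ( 1 + \operatorname{Tr} M^{-1} \delta M ) \\
&= \operatorname{Tr} M^{-1} \delta M .
\end{aligned} \tag{2.77} $$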
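The identity (2.77) holds to first order in $\delta M$, and is easy to test numerically. In the sketch below the matrix `M` and the small perturbation `dM` are arbitrary made-up values, chosen only so that `M` is invertible and `dM` is small.

```python
import numpy as np

# Hypothetical invertible matrix and a small perturbation of it.
M = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.5, 0.2],
              [0.0, 0.2, 1.0]])
dM = 1e-6 * np.array([[1.0, -2.0, 0.5],
                      [0.4, 0.3, -1.0],
                      [2.0, 0.0, 0.7]])

# Left hand side: change in the logarithm of the determinant.
lhs = np.log(abs(np.linalg.det(M + dM))) - np.log(abs(np.linalg.det(M)))
# Right hand side: trace of M^{-1} dM.
rhs = np.trace(np.linalg.inv(M) @ dM)

print(np.isclose(lhs, rhs, rtol=1e-4, atol=0.0))
```

The two sides differ only at second order in the perturbation, so the agreement improves as `dM` is made smaller, until floating-point rounding in the determinant takes over.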
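The torsion-free covariant divergence is
$$ \mathring{D}_\mu A^\mu = \frac{\partial A^\mu}{\partial x^\mu} + \mathring{\Gamma}^\nu_{\mu\nu} A^\mu , \tag{2.78} $$
where the torsion-free coordinate connection is
$$ \mathring{\Gamma}^\nu_{\mu\nu} = \frac{1}{2} g^{\lambda\nu} \frac{\partial g_{\lambda\nu}}{\partial x^\mu} = \frac{\partial \ln | \sqrt{-g} |}{\partial x^\mu} . \tag{2.79} $$
Concept question 2.8 If torsion does not vanish, does torsion-free covariant differentiation
commute with the metric? Answer. Yes. Unlike the torsion-full covariant derivative, Concept Question 2.3, the torsion-free covariant derivative of the tangent basis vectors $\boldsymbol{e}_\kappa$ does not vanish, but rather
depends on the contortion $K^\nu_{\kappa\mu} \boldsymbol{e}_\nu$,
$$ \mathring{D}_\mu \boldsymbol{e}_\kappa = D_\mu \boldsymbol{e}_\kappa + K^\nu_{\kappa\mu} \boldsymbol{e}_\nu = K^\nu_{\kappa\mu} \boldsymbol{e}_\nu . \tag{2.80} $$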
However, the torsion-free covariant derivative of the metric, that is, of scalar products of the tangent basis
vectors, does vanish,
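$$ \mathring{D}_\mu g_{\kappa\lambda} = \mathring{D}_\mu ( \boldsymbol{e}_\kappa \cdot \boldsymbol{e}_\lambda ) = K^\nu_{\kappa\mu} \, \boldsymbol{e}_\nu \cdot \boldsymbol{e}_\lambda + K^\nu_{\lambda\mu} \, \boldsymbol{e}_\kappa \cdot \boldsymbol{e}_\nu = K_{\lambda\kappa\mu} + K_{\kappa\lambda\mu} = 0 , \tag{2.81} $$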
thanks to the antisymmetry of the contortion tensor in its first two indices. As a corollary, torsion-free
covariant differentiation commutes with the operations of raising and lowering indices, and of contraction.
2.13 Mathematical aside: What if there is no metric?
General relativity is a metric theory. Many of the structures introduced above can be defined mathematically
without a metric. For example, it is possible to define the tangent space of vectors with basis eµ , and to
define a dual vector space with basis eµ such that eµ · eν = δνµ , equation (2.36). Elements of the dual vector
space are called covectors. Similarly it is possible to define connections and covariant derivatives without a
metric. However, this book follows general relativity in assuming that spacetime has a metric.
2.14 Coordinate 4-velocity
Consider a particle following a worldline
$$ x^\mu (\tau) , \tag{2.82} $$
where $\tau$ is the particle's proper time. The proper time along any interval of the worldline is $d\tau \equiv \sqrt{- ds^2}$.
Define the coordinate 4-velocity $u^\mu$ by
$$ u^\mu \equiv \frac{dx^\mu}{d\tau} \qquad \text{a coordinate 4-vector} . \tag{2.83} $$
The magnitude squared of the 4-velocity is constant
$$ u_\mu u^\mu = g_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = \frac{ds^2}{d\tau^2} = -1 . \tag{2.84} $$
The negative sign arises from the choice of metric signature: with the signature −+++ adopted here, there
is a − sign between ds2 and dτ 2 . Equation (2.84) can be regarded as an integral of motion associated with
conservation of particle rest mass.
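As a quick numerical check of the normalization (2.84), the sketch below constructs the 4-velocity of a particle moving through flat (Minkowski) space and contracts it with the metric. The restriction to flat space in Cartesian coordinates, the units with $c = 1$, and the function name are assumptions made for this example.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])   # Minkowski metric, signature -+++

def four_velocity(v):
    """u^mu = dx^mu / dtau = gamma (1, v) for a 3-velocity v, in units c = 1."""
    v = np.asarray(v, dtype=float)
    gamma = 1.0 / np.sqrt(1.0 - v @ v)
    return gamma * np.concatenate(([1.0], v))

u = four_velocity([0.6, 0.0, 0.3])
norm = u @ ETA @ u                      # u_mu u^mu = g_{mu nu} u^mu u^nu
print(np.isclose(norm, -1.0))
```

The result is independent of the chosen 3-velocity, as long as its magnitude is less than 1.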
2.15 Geodesic equation
Let u ≡ eµ uµ be the 4-velocity in coordinate-independent notation. The principle of equivalence (which
imposes vanishing torsion) implies that the geodesic equation, the equation of motion of a freely-falling
particle, is
$$ \frac{d \boldsymbol{u}}{d\tau} = 0 . \tag{2.85} $$
Why? Because du/dτ = 0 in the particle’s own free-fall frame, and the equation is coordinate-independent.
In the particle’s own free-fall frame, the particle’s 4-velocity is uµ = {1, 0, 0, 0}, and the particle’s locally
inertial axes eµ = {e0 , e1 , e2 , e3 } are constant.
What does the equation of motion look like in coordinate notation? The acceleration is
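$$ \begin{aligned}
\frac{d \boldsymbol{u}}{d\tau} &= \frac{dx^\nu}{d\tau} \frac{\partial \boldsymbol{u}}{\partial x^\nu} \\
&= u^\nu \boldsymbol{e}_\kappa D_\nu u^\kappa \\
&= u^\nu \boldsymbol{e}_\kappa \left( \frac{\partial u^\kappa}{\partial x^\nu} + \Gamma^\kappa_{\mu\nu} u^\mu \right) \\
&= \boldsymbol{e}_\kappa \left( \frac{d u^\kappa}{d\tau} + \Gamma^\kappa_{\mu\nu} u^\mu u^\nu \right) .
\end{aligned} \tag{2.86} $$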
The geodesic equation is then
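$$ \frac{d u^\kappa}{d\tau} + \Gamma^\kappa_{\mu\nu} u^\mu u^\nu = 0 . \tag{2.87} $$
Another way of writing the geodesic equation is
$$ \frac{D u^\kappa}{D\tau} = 0 , \tag{2.88} $$
where $D / D\tau$ is the covariant proper time derivative
$$ \frac{D}{D\tau} \equiv u^\nu D_\nu . \tag{2.89} $$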
The above derivation of the geodesic equation invoked the principle of equivalence, which postulates that
locally inertial frames exist, and thus that torsion vanishes. What happens if torsion does not vanish? In
Chapter 4, equation (4.15), it will be shown from an action principle that in the presence of torsion, the
covariant derivative in the geodesic equation should simply be replaced by the torsion-free covariant derivative
$\mathring{D} / D\tau = u^\mu \mathring{D}_\mu$,
$$ \frac{\mathring{D} u^\kappa}{D\tau} = 0 . \tag{2.90} $$
Thus the geodesic motion of particles is unaffected by the presence of torsion.
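To see the component form (2.87) of the geodesic equation at work, the sketch below integrates it in a deliberately simple setting: the flat Euclidean plane in polar coordinates, used here as a stand-in for a spacetime, with arc length playing the role of proper time. The connection coefficients quoted in the code are those of the plane in polar coordinates; the integrator, step size, and initial conditions are assumptions made for illustration. A geodesic of the flat plane must be a straight line, which gives an easy check.

```python
import numpy as np

def geodesic_rhs(state):
    """Right hand side of equation (2.87) for the flat plane in polar coordinates.

    state = (r, phi, u_r, u_phi). The non-zero connection coefficients are
    Gamma^r_{phi phi} = -r and Gamma^phi_{r phi} = Gamma^phi_{phi r} = 1/r.
    """
    r, phi, ur, uphi = state
    dur = r * uphi**2                 # - Gamma^r_{phi phi} u^phi u^phi
    duphi = -2.0 * ur * uphi / r      # - 2 Gamma^phi_{r phi} u^r u^phi
    return np.array([ur, uphi, dur, duphi])

def integrate(state, dtau, steps):
    """Fixed-step fourth-order Runge-Kutta integration of the geodesic equation."""
    for _ in range(steps):
        k1 = geodesic_rhs(state)
        k2 = geodesic_rhs(state + 0.5 * dtau * k1)
        k3 = geodesic_rhs(state + 0.5 * dtau * k2)
        k4 = geodesic_rhs(state + dtau * k3)
        state = state + dtau * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return state

# Start at Cartesian (x, y) = (1, 0), moving in the +y direction at unit speed.
start = np.array([1.0, 0.0, 0.0, 1.0])
r, phi, _, _ = integrate(start, dtau=1e-3, steps=2000)
x, y = r * np.cos(phi), r * np.sin(phi)
print(x, y)   # a straight line keeps x near 1 while y grows to about 2
```

Although both $r$ and $\phi$ vary along the path, the connection terms conspire to keep the Cartesian $x$ fixed, which is the coordinate-independent statement $d\boldsymbol{u}/d\tau = 0$ expressed in curvilinear coordinates.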
2.16 Coordinate 4-momentum
The coordinate 4-momentum of a particle of rest mass m is defined to be
$$ p^\mu \equiv m u^\mu = m \frac{dx^\mu}{d\tau} \qquad \text{a coordinate 4-vector} . \tag{2.91} $$
The momentum squared is, from equation (2.84),
$$ p_\mu p^\mu = m^2 u_\mu u^\mu = - m^2 \tag{2.92} $$
minus the square of the rest mass. Again, the minus sign arises from the choice −+++ of metric signature.
2.17 Affine parameter
For photons, the rest mass is zero, $m = 0$, but the 4-momentum $p^\mu$ remains finite. Define the affine
parameter $\lambda$ by
$$ \lambda \equiv \frac{\tau}{m} \qquad \text{a coordinate scalar} \tag{2.93} $$
which remains finite in the limit $m \rightarrow 0$. The affine parameter $\lambda$ is unique up to an overall linear transformation (that is, $\alpha \lambda + \beta$ is also an affine parameter, for constant $\alpha$ and $\beta$), because of the freedom in the
choice of mass $m$ and the zero point of proper time $\tau$. In terms of the affine parameter, the 4-momentum is
$$ p^\mu = \frac{dx^\mu}{d\lambda} . \tag{2.94} $$
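The geodesic equation is then in coordinate-independent notation
$$ \frac{d \boldsymbol{p}}{d\lambda} = 0 , \tag{2.95} $$
or in component form
$$ \frac{d p^\kappa}{d\lambda} + \Gamma^\kappa_{\mu\nu} p^\mu p^\nu = 0 , \tag{2.96} $$
which works for massless as well as massive particles.
Another way of writing this is
$$ \frac{D p^\kappa}{D\lambda} = 0 , \tag{2.97} $$
where $D / D\lambda$ is the covariant affine derivative
$$ \frac{D}{D\lambda} \equiv p^\nu D_\nu . \tag{2.98} $$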
In the presence of torsion, the connection in the geodesic equation (2.96) should be interpreted as the
torsion-free connection $\mathring{\Gamma}^\kappa_{\mu\nu}$, and the covariant derivatives in equations (2.97) and (2.98) as torsion-free
covariant derivatives.
2.18 Affine distance
The freedom in the overall scaling of the affine parameter can be removed by setting it equal to the proper
distance near the observer in the observer’s locally inertial rest frame. With the scaling fixed in this fashion,
the affine parameter is called the affine distance, so called because it provides a measure of distance along
null geodesics. When you look at a scene with your eyes, you are looking along null geodesics, and the natural
measure of distance to objects that you see is the affine distance (Hamilton and Polhemus, 2010).
In special relativity, the affine distance coincides with the perceived (e.g. binocular) distance to objects.
Exercise 2.9 Gravitational redshift in a stationary metric. Let xµ ≡ {t, xα } constitute time t
and spatial coordinates xα of a spacetime. The metric gµν is said to be stationary if it is independent of
the coordinate t. A comoving observer in the spacetime is one that is at rest in the spatial coordinates,
dxα /dτ = 0.
1. Argue that the coordinate 4-velocity uν ≡ dxν /dτ of a comoving observer in a stationary spacetime is
$$ u^\nu = \{ \gamma , 0 , 0 , 0 \} , \qquad \gamma \equiv \frac{1}{\sqrt{- g_{tt}}} . \tag{2.99} $$
2. Argue that the proper energy E of a particle, massless or massive, with energy-momentum 4-vector pν
seen by a comoving observer with 4-velocity uν , equation (2.99), is
$$ E = - u^\nu p_\nu . \tag{2.100} $$
3. Consider a particle, massless or massive, that follows a geodesic between two comoving observers. Since
the metric is independent of the time coordinate t, the covariant momentum pt is a constant of motion,
equation (4.50). Argue that the ratio $E_{\rm obs} / E_{\rm em}$ of the observed to emitted energies between two comoving
observers is
$$ \frac{E_{\rm obs}}{E_{\rm em}} = \frac{\gamma_{\rm obs}}{\gamma_{\rm em}} . \tag{2.101} $$
4. Can comoving observers exist where gtt is positive?
Exercise 2.10 Gravitational redshift in Rindler space. Rindler space is Minkowski space expressed in
the coordinates of uniformly accelerating observers, called Rindler observers. Rindler observers are precisely
the observers in the right quadrant of the spacetime wheel, Figure 1.14.
1. Start with Minkowski space in a Cartesian coordinate system {t, x, y, z}. Define Rindler coordinates l, α
by
$$ t = l \sinh \alpha , \qquad x = l \cosh \alpha . \tag{2.102} $$
Show that the line-element in Rindler coordinates is
$$ ds^2 = - l^2 d\alpha^2 + dl^2 + dy^2 + dz^2 . \tag{2.103} $$
2. A Rindler observer is a comoving observer in Rindler space, one who follows a worldline of constant l,
y, and z. Since Rindler spacetime is stationary, conclude that the ratio Eobs /Eem of the observed to
emitted energies between two Rindler observers is, equation (2.101),
$$ \frac{E_{\rm obs}}{E_{\rm em}} = \frac{l_{\rm em}}{l_{\rm obs}} . \tag{2.104} $$
3. Can Rindler space be considered equivalent to a spacetime containing a uniform gravitational field? Do
Rindler observers all accelerate at the same rate?
Exercise 2.11 Gravitational redshift in a uniformly rotating space. Start with Minkowski space
in cylindrical coordinates {t, r, φ, z},
$$ ds^2 = - dt^2 + dr^2 + r^2 d\phi^2 + dz^2 . \tag{2.105} $$
Define a uniformly rotating azimuthal angle $\chi$ by
$$ \chi \equiv \phi - \omega t , \tag{2.106} $$
which is constant for observers who are at rest in a system rotating uniformly at angular velocity $\omega$. The
line-element in uniformly rotating coordinates is
$$ ds^2 = - dt^2 + dr^2 + r^2 ( d\chi + \omega \, dt )^2 + dz^2 . \tag{2.107} $$
1. A comoving observer in the uniformly rotating system follows a worldline at constant r, χ, and z. Since
the uniformly rotating spacetime is stationary, conclude that the ratio Eobs /Eem of the observed to
emitted energies between two comoving observers is, equation (2.101),
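$$ \frac{E_{\rm obs}}{E_{\rm em}} = \frac{\gamma_{\rm obs}}{\gamma_{\rm em}} , $$
(2.108)
where
$$ \gamma = \frac{1}{\sqrt{1 - v^2}} , \qquad v = \omega r . $$
(2.109)
2. What happens where v > 1?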
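As a numerical companion to Exercises 2.9–2.11, the following sketch evaluates equations (2.99) and (2.101) for the Rindler and uniformly rotating metrics. The function names and the sample values of l, ω, and r are illustrative assumptions, not taken from the text.

```python
import math


def gamma_comoving(g_tt):
    """gamma = 1/sqrt(-g_tt) for a comoving observer, equation (2.99)."""
    if g_tt >= 0.0:
        raise ValueError("no comoving observer can exist where g_tt >= 0")
    return 1.0 / math.sqrt(-g_tt)


def energy_ratio(g_tt_em, g_tt_obs):
    """E_obs / E_em = gamma_obs / gamma_em, equation (2.101)."""
    return gamma_comoving(g_tt_obs) / gamma_comoving(g_tt_em)


def g_tt_rindler(l):
    """Coefficient of dalpha^2 in the Rindler line-element (2.103)."""
    return -l * l


def g_tt_rotating(omega, r):
    """Coefficient of dt^2 in the rotating line-element (2.107), with v = omega r."""
    return -(1.0 - (omega * r) ** 2)


# Rindler: emitter at l = 1, observer at l = 2 (arbitrary sample values).
ratio_rindler = energy_ratio(g_tt_rindler(1.0), g_tt_rindler(2.0))

# Rotating frame: emitter on the axis, observer at v = omega r = 0.6.
ratio_rotating = energy_ratio(g_tt_rotating(0.5, 0.0), g_tt_rotating(0.5, 1.2))

print(ratio_rindler, ratio_rotating)
```

The guard in `gamma_comoving` reflects the question asked at the end of Exercises 2.9 and 2.11: the square root becomes imaginary where gtt is positive, that is, where v > 1 in the rotating frame.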
2.19 Riemann tensor
2.19.1 Riemann curvature tensor
The Riemann curvature tensor Rκλµν is defined by the commutator of the covariant derivative acting
on a 4-vector. In the presence of torsion,
$$ [D_\kappa, D_\lambda] A_\mu = S^\nu_{\kappa\lambda} D_\nu A_\mu + R_{\kappa\lambda\mu\nu} A^\nu \qquad \text{a coordinate tensor} . $$
(2.110)
If torsion vanishes, as general relativity assumes, then the definition (2.110) reduces to
$$ [D_\kappa, D_\lambda] A_\mu = R_{\kappa\lambda\mu\nu} A^\nu \qquad \text{a coordinate tensor} . $$
(2.111)
The expression (2.51) for the covariant derivative yields the following formula for the Riemann tensor in
terms of connection coefficients
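$$ R_{\kappa\lambda\mu\nu} = \frac{\partial \Gamma_{\mu\nu\lambda}}{\partial x^\kappa} - \frac{\partial \Gamma_{\mu\nu\kappa}}{\partial x^\lambda} + \Gamma^\pi_{\mu\lambda} \Gamma_{\pi\nu\kappa} - \Gamma^\pi_{\mu\kappa} \Gamma_{\pi\nu\lambda} \qquad \text{a coordinate tensor} . $$
(2.112)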
This is the formula that allows the Riemann tensor to be calculated from the connection coefficients. The
same formula (2.112) remains valid if torsion does not vanish, but the connection coefficients Γλµν themselves
are given by (2.62) in place of (2.63).
In flat (Minkowski) space, covariant derivatives reduce to partial derivatives, Dκ → ∂/∂xκ , and
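$$ [D_\kappa, D_\lambda] \rightarrow \left[ \frac{\partial}{\partial x^\kappa} , \frac{\partial}{\partial x^\lambda} \right] = 0 \quad \text{in flat space} , $$
(2.113)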
so that Rκλµν = 0 in flat space.
Exercise 2.12 Derivation of the Riemann tensor. Confirm expression (2.112) for the Riemann tensor.
This is an exercise that any serious student of general relativity should do. However, you might like to defer
this rite of passage to Chapter 11, where Exercises 11.3–11.6 take you through the derivation and properties
of the tetrad-frame Riemann tensor.
2.19.2 Commutator of the covariant derivative acting on a general tensor
The commutator of the covariant derivative is of fundamental importance because it defines what is meant
by the field in gauge theories.
It was seen above that the commutator of the covariant derivative acting on a scalar defined the torsion
tensor, equation (2.57), which general relativity assumes vanishes, while the commutator of the covariant
derivative acting on a vector defined the Riemann tensor, equation (2.111). Does the commutator of the
covariant derivative acting on a general tensor introduce any other distinct tensor? No: the torsion and
Riemann tensors completely define the action of the commutator of the covariant derivative on any tensor.
Acting on a general tensor, the commutator of the covariant derivative is
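$$ [D_\kappa, D_\lambda] A^{\pi\rho...}_{\mu\nu...} = S^\sigma_{\kappa\lambda} D_\sigma A^{\pi\rho...}_{\mu\nu...} + R_{\kappa\lambda\mu}{}^{\sigma} A^{\pi\rho...}_{\sigma\nu...} + R_{\kappa\lambda\nu}{}^{\sigma} A^{\pi\rho...}_{\mu\sigma...} - R_{\kappa\lambda\sigma}{}^{\pi} A^{\sigma\rho...}_{\mu\nu...} - R_{\kappa\lambda\sigma}{}^{\rho} A^{\pi\sigma...}_{\mu\nu...} . $$
(2.114)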
In more abstract notation, the commutator of the covariant derivative is the operator
$$ [D_\kappa, D_\lambda] = S^\mu_{\kappa\lambda} D_\mu + \hat{R}_{\kappa\lambda} $$
(2.115)
where the Riemann curvature operator R̂κλ is an operator whose action on any tensor is specified by equation (2.114). The action of the operator R̂κλ is analogous to that of the covariant derivative (2.52): there’s
a positive R term for each covariant index, and a negative R term for each contravariant index. The action
of R̂κλ on a scalar is zero, which reflects the fact that a scalar is unchanged by a Lorentz transformation.
The general expression (2.114) for the commutator of the covariant derivative reveals the meaning of the
torsion and Riemann tensors. The torsion and Riemann tensors describe respectively the displacement and
the Lorentz transformation experienced by an object when parallel-transported around a curve. Displacements and Lorentz transformations together constitute the Poincaré group, the complete group of symmetries
of flat spacetime.
How can an object detect a displacement when parallel-transported around a curve? If you go around
a curve back to the same coordinate in spacetime where you began, won’t you necessarily be at the same
position? This is a question that goes to the heart of the meaning of spacetime. To answer the question, you
have to consider how fundamental particles are able to detect position, orientation, and velocity. Classically,
particles may be structureless points, but quantum mechanically, particles possess frequency, wavelength,
spin, and (in the relativistic theory) boost, and presumably it is these properties that allow particles to
“measure” the properties of the spacetime in which they live. For example, a Dirac spinor (relativistic spin-1/2
particle) Lorentz transforms under the fundamental (spin-1/2) representation of the Lorentz group, and is
thus endowed with precisely the properties that allow it to “measure” boost and rotation, §14.10. The Dirac
wave equation shows that a Dirac spinor propagating through spacetime varies as ∼ e^{i p_µ x^µ}, whose phase
encodes the displacement of the Dirac spinor. Thus a Dirac spinor could potentially detect a displacement
through a change in its phase when parallel-transported around a curve back to the same point in spacetime.
Since a change in phase is indistinguishable from a spatial rotation about the spin axis of the Dirac spinor,
operationally torsion rotates particles, whence the name torsion.
2.19.3 No torsion
In the remainder of this chapter, torsion will be assumed to vanish, as general relativity postulates. A
decomposition of the Riemann tensor into torsion-free and contortion parts is deferred to §11.18.
2.19.4 Symmetries of the Riemann tensor
In a locally inertial frame (necessarily, with vanishing torsion), the connection coefficients all vanish, Γλµν = 0,
but their partial derivatives, which are proportional to second derivatives of the metric tensor, do not vanish.
Thus in a locally inertial frame the Riemann tensor is
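$$ \begin{aligned}
R_{\kappa\lambda\mu\nu}
&= \frac{\partial \Gamma_{\mu\nu\lambda}}{\partial x^\kappa} - \frac{\partial \Gamma_{\mu\nu\kappa}}{\partial x^\lambda} \\
&= \frac{1}{2} \left( \frac{\partial^2 g_{\mu\nu}}{\partial x^\kappa \partial x^\lambda} + \frac{\partial^2 g_{\mu\lambda}}{\partial x^\kappa \partial x^\nu} - \frac{\partial^2 g_{\nu\lambda}}{\partial x^\kappa \partial x^\mu} - \frac{\partial^2 g_{\mu\nu}}{\partial x^\lambda \partial x^\kappa} - \frac{\partial^2 g_{\mu\kappa}}{\partial x^\lambda \partial x^\nu} + \frac{\partial^2 g_{\nu\kappa}}{\partial x^\lambda \partial x^\mu} \right) \\
&= \frac{1}{2} \left( \frac{\partial^2 g_{\mu\lambda}}{\partial x^\kappa \partial x^\nu} - \frac{\partial^2 g_{\nu\lambda}}{\partial x^\kappa \partial x^\mu} - \frac{\partial^2 g_{\mu\kappa}}{\partial x^\lambda \partial x^\nu} + \frac{\partial^2 g_{\nu\kappa}}{\partial x^\lambda \partial x^\mu} \right) .
\end{aligned} $$
(2.116)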
You can check that the bottom line of equation (2.116):
1. is antisymmetric in κ ↔ λ,
2. is antisymmetric in µ ↔ ν,
3. is symmetric in κλ ↔ µν,
4. has the property that the sum of the cyclic permutations of the last three (or first three, or indeed any
three) indices vanishes
Rκλµν + Rκνλµ + Rκµνλ = 0 .
(2.117)
Actually, as shown in Exercise 11.6, the last two of the four symmetries, the symmetric symmetry and the
cyclic symmetry, imply each other. The first three of the four symmetries can be expressed compactly
Rκλµν = R([κλ][µν]) ,
(2.118)
in which [ ] denotes anti-symmetrization and ( ) symmetrization, as in
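$$ A_{[\kappa\lambda]} \equiv \tfrac{1}{2} ( A_{\kappa\lambda} - A_{\lambda\kappa} ) , \qquad A_{(\kappa\lambda)} \equiv \tfrac{1}{2} ( A_{\kappa\lambda} + A_{\lambda\kappa} ) . $$
(2.119)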
The symmetries (2.118) imply that the Riemann tensor is a symmetric matrix of antisymmetric matrices. An
antisymmetric tensor is also known as a bivector, much more about which you can discover in Chapter 13
on the geometric algebra. An antisymmetric matrix, or bivector, in 4 dimensions has 6 degrees of freedom.
A symmetric matrix of bivectors is a 6 × 6 symmetric matrix, which has 21 degrees of freedom. The final,
cyclic symmetry of the Riemann tensor, equation (2.117), which can be abbreviated
Rκ[λµν] = 0 ,
(2.120)
removes 1 degree of freedom. Thus the Riemann tensor has a net 20 degrees of freedom.
Although the above symmetries were derived in a locally inertial frame, the fact that the Riemann tensor
is a tensor means that the symmetries hold in any frame. If you prefer, you can add back the products of
connection coefficients in equation (2.112), and check that the claimed symmetries remain.
Some of the symmetries of the Riemann tensor persist when torsion is present, and others do not. The
relation between symmetries of the Riemann tensor and torsion is deferred to Exercises 11.4–11.6.
2.20 Ricci tensor, Ricci scalar
The Ricci tensor Rκµ and Ricci scalar R are the essentially unique contractions of the Riemann curvature
tensor. The Ricci tensor, the compressive part of the Riemann tensor, is
Rκµ ≡ g λν Rκλµν
a coordinate tensor .
(2.121)
If torsion vanishes as general relativity assumes, then the Ricci tensor is symmetric
Rκµ = Rµκ
(2.122)
and therefore has 10 independent components.
The Ricci scalar is
R ≡ g κµ Rκµ
a coordinate tensor (a scalar) .
(2.123)
2.21 Einstein tensor
The Einstein tensor Gκµ is defined by
$$ G_{\kappa\mu} \equiv R_{\kappa\mu} - \tfrac{1}{2} g_{\kappa\mu} R \qquad \text{a coordinate tensor} . $$
(2.124)
For vanishing torsion, the symmetry of the Ricci and metric tensors implies that the Einstein tensor is likewise
symmetric
Gκµ = Gµκ ,
(2.125)
and thus has 10 independent components.
2.22 Bianchi identities
The Jacobi identity
[Dκ , [Dλ , Dµ ]] + [Dλ , [Dµ , Dκ ]] + [Dµ , [Dκ , Dλ ]] = 0
(2.126)
implies the Bianchi identities which, for vanishing torsion, are
Dκ Rλµνπ + Dλ Rµκνπ + Dµ Rκλνπ = 0 .
(2.127)
The torsion-free Bianchi identities can be written in shorthand
D[κ Rλµ]νπ = 0 .
(2.128)
The Bianchi identities constitute a set of differential relations between the components of the Riemann
tensor, which are distinct from the algebraic symmetries of the Riemann tensor. There are 4 ways to pick
[κλµ], and 6 ways to pick antisymmetric νπ, giving 4 × 6 = 24 Bianchi identities, but 4 of the identities,
D[κ Rλµν]π = 0, are implied by the cyclic symmetry (2.120), which is a consequence of vanishing torsion.
Thus there are 24 − 4 = 20 non-trivial torsion-free Bianchi identities on the 20 components of the torsion-free
Riemann tensor.
Exercise 2.13
Jacobi identity. Prove the Jacobi identity (2.126).
2.23 Covariant conservation of the Einstein tensor
The most important consequence of the torsion-free Bianchi identities (2.128) is obtained from the double
contraction
g κν g λπ (Dκ Rλµνπ + Dλ Rµκνπ + Dµ Rκλνπ ) = −Dκ Rκµ − Dλ Rλµ + Dµ R = 0 ,
(2.129)
or equivalently
Dκ Gκµ = 0 ,
(2.130)
where Gκµ is the Einstein tensor, equation (2.124). Equation (2.130) is a primary motivation for the form
of the Einstein equations, since it implies energy-momentum conservation, equation (2.132).
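The step from the doubly-contracted identity (2.129) to equation (2.130) can be spelled out as follows. This is a sketch that assumes, as in the torsion-free theory, that the covariant derivative of the metric vanishes, so that the metric can be moved through Dκ.

```latex
\begin{aligned}
0 &= - D^\kappa R_{\kappa\mu} - D^\lambda R_{\lambda\mu} + D_\mu R
   = - 2\, D^\kappa R_{\kappa\mu} + D_\mu R , \\
\Longrightarrow \quad
D^\kappa G_{\kappa\mu}
  &= D^\kappa \left( R_{\kappa\mu} - \tfrac{1}{2}\, g_{\kappa\mu} R \right)
   = D^\kappa R_{\kappa\mu} - \tfrac{1}{2}\, D_\mu R
   = 0 .
\end{aligned}
```

The factor 1/2 in the definition (2.124) of the Einstein tensor is thus exactly what is needed for the divergence to vanish.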
2.24 Einstein equations
Einstein’s equations are
Gκµ = 8πGTκµ
a coordinate tensor equation .
(2.131)
What motivates the form of Einstein’s equations?
1. The equation is generally covariant.
2. For vanishing torsion, the Bianchi identities (2.128) guarantee covariant conservation of the Einstein
tensor, equation (2.130), which in turn guarantees covariant conservation of energy-momentum,
Dκ Tκµ = 0 .
(2.132)
3. The Einstein tensor depends on the lowest (second) order derivatives of the metric tensor that do not
vanish in a locally inertial frame.
In Chapter 16, the Einstein equations will be derived from an action principle. Although Einstein derived his
equations from considerations of theoretical elegance, the real justification for them is that they reproduce
observation.
Einstein’s equations (2.131) constitute a complete set of gravitational equations, generalizing Poisson’s
equation of Newtonian gravity. However, Einstein’s equations by themselves do not constitute a closed set
of equations: in general, other equations, such as Maxwell’s equations of electromagnetism, and equations
describing the microphysics of the energy-momentum, must be adjoined to form a closed set.
Exercise 2.14 Einstein tensor in 3 or more dimensions. What is the Einstein tensor in N ≥ 3
spacetime dimensions?
Solution. The Einstein tensor must be covariantly conserved to ensure that its source, energy-momentum,
is covariantly conserved. The doubly-contracted Bianchi identities (3.6) hold as long as there are at least 3
spacetime dimensions. In N = 2 spacetime dimensions, there are zero Bianchi identities (2.128), since there
are zero ways of picking 3 distinct indices. Thus the expression (2.124) for the Einstein tensor holds in any
number N ≥ 3 of spacetime dimensions. See §11.19 for general relativity in 2 spacetime dimensions.
2.25 Summary of the path from metric to the energy-momentum tensor
1. Start by defining the metric gµν .
2. Compute the connection coefficients Γλµν from equation (2.63).
3. Compute the Riemann tensor Rκλµν from equation (2.112).
4. Compute the Ricci tensor Rκµ from equation (2.121), the Ricci scalar R from equation (2.123), and the
Einstein tensor Gκµ from equation (2.124).
5. The Einstein equations (2.131) then imply the energy-momentum tensor Tκµ .
The path from metric to energy-momentum tensor is straightforward to program on a computer, but
the results are typically messy and complicated, even for fairly simple spacetimes. Inverting the path to
recover the metric from a given energy-momentum content is typically highly non-trivial, the subject of a
vast literature.
The great majority of metrics gµν yield an energy-momentum tensor Tκµ that cannot be achieved with
normal matter.
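To illustrate how the path from metric to Einstein tensor might be programmed, here is a minimal numerical sketch in pure Python. It is not the author's code. It uses central finite differences in place of analytic derivatives, and it builds the curvature in the common mixed-index form R^ρ_σµν, which is assumed here to agree with equations (2.112) and (2.121) after lowering and contracting indices. The step size H and all function names are assumptions made for illustration.

```python
import math

H = 1e-4  # finite-difference step; an assumed value, tune it to the metric's scale


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def scaled_diff(a, b, s):
    """Elementwise (a - b) * s for nested lists of equal shape."""
    if isinstance(a, list):
        return [scaled_diff(u, v, s) for u, v in zip(a, b)]
    return (a - b) * s


def partials(f, x):
    """Central differences: result[k] approximates the derivative of f along x[k]."""
    out = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += H
        xm[k] -= H
        out.append(scaled_diff(f(xp), f(xm), 0.5 / H))
    return out


def christoffel(metric, x):
    """gam[l][m][n] = Gamma^l_{mn}, the torsion-free connection of the metric."""
    dim = len(x)
    ginv = inverse(metric(x))
    dg = partials(metric, x)  # dg[k][i][j] = d g_ij / d x^k
    return [[[0.5 * sum(ginv[l][k] * (dg[m][k][n] + dg[n][k][m] - dg[k][m][n])
                        for k in range(dim))
              for n in range(dim)] for m in range(dim)] for l in range(dim)]


def einstein(metric, x):
    """Return (Ricci tensor, Ricci scalar, Einstein tensor) with lower indices."""
    dim = len(x)
    g = metric(x)
    ginv = inverse(g)
    gam = christoffel(metric, x)
    dgam = partials(lambda y: christoffel(metric, y), x)  # dgam[k][l][m][n]

    def riemann(r, s, m, n):
        """R^r_{s m n} built from the connection and its first derivatives."""
        return (dgam[m][r][n][s] - dgam[n][r][m][s]
                + sum(gam[r][m][l] * gam[l][n][s] - gam[r][n][l] * gam[l][m][s]
                      for l in range(dim)))

    ricci = [[sum(riemann(r, s, r, n) for r in range(dim)) for n in range(dim)]
             for s in range(dim)]
    scalar = sum(ginv[i][j] * ricci[i][j] for i in range(dim) for j in range(dim))
    ein = [[ricci[i][j] - 0.5 * g[i][j] * scalar for j in range(dim)]
           for i in range(dim)]
    return ricci, scalar, ein


def sphere(x, a=2.0):
    """Metric of a 2-sphere of radius a in coordinates (theta, phi)."""
    return [[a * a, 0.0], [0.0, (a * math.sin(x[0])) ** 2]]


ricci, scalar, ein = einstein(sphere, [1.0, 0.3])
print(scalar)  # should be close to 2 / a**2 = 0.5, up to finite-difference error
```

The 2-sphere is used as a test case only because its curvature is known in closed form. A symbolic algebra package would normally be preferable to finite differences, since nested numerical derivatives lose several digits of precision.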
2.26 Energy-momentum tensor of a perfect fluid
The simplest non-trivial energy-momentum tensor is that of a perfect fluid. In this case T µν is taken to be
isotropic in the locally inertial rest frame of the fluid, taking the form
$$ T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix} $$
(2.133)
where
ρ is the proper mass-energy density ,
p is the proper pressure .
(2.134)
The expression (2.133) is valid only in the locally inertial rest frame of the fluid. An expression that is valid
in any frame is
T µν = (ρ + p)uµ uν + p g µν ,
(2.135)
where uµ is the 4-velocity of the fluid. Equation (2.135) is valid because it is a tensor equation, and it is true
in the locally inertial rest frame, where uµ = {1, 0, 0, 0}.
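The frame-independent expression (2.135) can be checked numerically in flat spacetime. The sketch below, with illustrative names and sample values that are not from the text, builds T^µν for a fluid at rest and for a boosted fluid, and compares the trace g_µν T^µν, which should equal −ρ + 3p in every frame.

```python
import math

# Minkowski metric with signature -+++; it is its own inverse.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def four_velocity(vx, vy, vz):
    """4-velocity u^mu = gamma {1, v} for 3-velocity v in units of c."""
    gamma = 1.0 / math.sqrt(1.0 - (vx * vx + vy * vy + vz * vz))
    return [gamma, gamma * vx, gamma * vy, gamma * vz]


def perfect_fluid(rho, p, u, ginv=ETA):
    """T^{mu nu} = (rho + p) u^mu u^nu + p g^{mu nu}, equation (2.135)."""
    return [[(rho + p) * u[m] * u[n] + p * ginv[m][n] for n in range(4)]
            for m in range(4)]


def trace(t, g=ETA):
    """Invariant trace g_{mu nu} T^{mu nu}."""
    return sum(g[m][n] * t[m][n] for m in range(4) for n in range(4))


rho, p = 1.0, 1.0 / 3.0          # sample values (radiation-like fluid)
t_rest = perfect_fluid(rho, p, four_velocity(0.0, 0.0, 0.0))
t_boost = perfect_fluid(rho, p, four_velocity(0.6, 0.0, 0.0))

print(t_rest[0][0], t_rest[1][1])   # rest frame reproduces rho and p of (2.133)
print(trace(t_rest), trace(t_boost))
```

In the boosted frame the individual components change, but the trace does not, as expected of a scalar.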
2.27 Newtonian limit
The Newtonian limit is obtained in the limit of a weak gravitational field and non-relativistic (pressureless)
matter. In Cartesian coordinates, the metric in the Newtonian limit is (see Chapter 26)
ds2 = − (1 + 2Φ)dt2 + (1 − 2Φ)(dx2 + dy 2 + dz 2 ) ,
(2.136)
in which
Φ(x, y, z) = Newtonian potential
(2.137)
is a function only of the spatial coordinates x, y, z, not of time t.
For this metric, to first order in the potential Φ the only non-vanishing component of the Einstein tensor
is the time-time component
Gtt = 2∇2 Φ ,
(2.138)
where ∇2 = ∂ 2 /∂x2 +∂ 2 /∂y 2 +∂ 2 /∂z 2 is the usual 3-dimensional Laplacian operator. This component (2.138)
of the Einstein tensor plugged into Einstein’s equations (2.131) implies Poisson’s equation (2.4).
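The last step can be made explicit. The sketch assumes pressureless matter at rest with proper mass density ρ, so that to lowest order the only significant component of the energy-momentum tensor is Ttt = ρ, and it assumes that equation (2.4) is Poisson's equation in its usual form.

```latex
G_{tt} = 8\pi G \, T_{tt}
\quad \Longrightarrow \quad
2 \nabla^2 \Phi = 8\pi G \rho
\quad \Longrightarrow \quad
\nabla^2 \Phi = 4\pi G \rho .
```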
Exercise 2.15 Special and general relativistic corrections for clocks on satellites. The metric
just above the surface of the Earth is well-approximated by
ds2 = − (1 + 2Φ)dt2 + (1 − 2Φ)dr2 + r2 (dθ2 + sin2 θ dφ2 ) ,
(2.139)
where
$$ \Phi(r) = - \frac{G M}{r} $$
(2.140)
is the familiar Newtonian gravitational potential.
1. Proper time. Consider an object at fixed radius r, moving along the equator θ = π/2 with constant
non-relativistic velocity r dφ/dt = v. Compare the proper time of this object with that at rest at infinity.
[Hint: Work to first order in the potential Φ. Regard v 2 as first order in Φ. Why is that reasonable?]
2. Orbits. Consider a satellite in orbit about the Earth. The conservation of energy E per unit mass,
angular momentum L per unit mass, and rest mass per unit mass are expressed by (§4.8)
$$ u_t = - E , \qquad u_\phi = L , \qquad u_\mu u^\mu = - 1 . $$
(2.141)
For equatorial orbits, θ = π/2, show that the radial component u^r of the 4-velocity satisfies
$$ u^r = \sqrt{2 (\Delta E - U)} , $$
(2.142)
where ∆E is the energy per unit mass of the particle excluding its rest mass energy,
$$ \Delta E = E - 1 , $$
(2.143)
and the effective potential U is
$$ U = \Phi + \frac{L^2}{2 r^2} . $$
(2.144)
[Hint: Neglect air resistance. Remember to work to first order in Φ. Treat ∆E and L2 as first order in
Φ. Why is that reasonable?]
3. Circular orbits. From the condition that the potential U be an extremum, find the circular orbital
velocity v = r dφ/dt of a satellite at radius r.
4. Special and general relativistic corrections for satellites. Compare the proper time of a satellite
in circular orbit to that of a person at rest at infinity. Express your answer in the form
$$ \frac{d\tau_{\rm satellite}}{dt} - 1 = - \Phi_\oplus ( f_{\rm GR} + f_{\rm SR} ) , $$
(2.145)
where fGR and fSR are the general relativistic and special relativistic corrections, and Φ⊕ is the dimensionless gravitational potential at the surface of the Earth,
$$ \Phi_\oplus = - \frac{G M_\oplus}{c^2 R_\oplus} . $$
(2.146)
What is the value of Φ⊕ in milliseconds per year?
5. Special and general relativistic corrections for satellites vs. Earth observer. Compare the
proper time of a satellite in circular orbit to that of a person on Earth at one of the poles (so the person
has no motion from the Earth’s rotation). Express your answer in the form
$$ \frac{d\tau_{\rm satellite}}{dt} - \frac{d\tau_{\rm person}}{dt} = - \Phi_\oplus ( f_{\rm GR} + f_{\rm SR} ) . $$
(2.147)
At what satellite radius r, in units of Earth radius R⊕ , do the special and general relativistic corrections
cancel?
6. Special and general relativistic corrections for ISS and GPS satellites. What are the corrections
(be careful to get the sign right!) in units of Φ⊕ , and in units of ms yr−1 , for (i) a satellite in low Earth
orbit, such as the International Space Station; (ii) a nearly geostationary satellite, such as a GPS
satellite? Google the numbers that you may need.
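For readers who want to check their arithmetic, the following sketch evaluates the first-order clock rate dτ/dt − 1 ≈ Φ/c² − v²/(2c²) for a circular orbit and compares it with a clock at rest at a pole. It is a numerical aid, not a substitute for the derivation asked for above. The physical constants and the GPS orbital radius are assumed round values, and the function names are illustrative.

```python
import math

C = 299_792_458.0        # speed of light, m/s
GM_EARTH = 3.986004e14   # G M of the Earth, m^3/s^2 (assumed value)
R_EARTH = 6.371e6        # mean Earth radius, m (assumed value)
SECONDS_PER_DAY = 86_400.0


def rate_offset(r, v):
    """dtau/dt - 1 to first order: Phi(r)/c^2 - v^2/(2 c^2)."""
    phi = -GM_EARTH / r
    return (phi - 0.5 * v * v) / (C * C)


def circular_speed(r):
    """Newtonian circular orbital speed at radius r."""
    return math.sqrt(GM_EARTH / r)


def satellite_minus_pole(r):
    """Rate of an orbiting clock minus that of a clock at rest at a pole."""
    return rate_offset(r, circular_speed(r)) - rate_offset(R_EARTH, 0.0)


r_gps = 2.656e7  # assumed GPS orbital radius, m
gps_us_per_day = satellite_minus_pole(r_gps) * SECONDS_PER_DAY * 1e6
print(gps_us_per_day)  # positive: the satellite clock runs fast
```

With these inputs the net offset comes out at a few tens of microseconds per day, the familiar size of the relativistic correction applied to GPS clocks.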
Exercise 2.16 Equations of motion in weak gravity. Take the metric to be the Newtonian metric (2.136) with the Newtonian potential Φ(x, y, z) a function only of the spatial coordinates x, y, z, not of
time t, equation (2.137).
1. Confirm that the non-zero connection coefficients are (coefficients as below but with the last two indices
swapped are the same by the no-torsion condition Γκµν = Γκνµ )
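$$ \Gamma^t_{t\alpha} = \Gamma^\alpha_{tt} = \Gamma^\alpha_{\beta\beta} = - \Gamma^\beta_{\beta\alpha} = - \Gamma^\alpha_{\alpha\alpha} = \frac{\partial \Phi}{\partial x^\alpha} \qquad (\alpha \neq \beta = x, y, z) . $$
(2.148)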
[Hint: Work to linear order in Φ.]
2. Consider a massive, non-relativistic particle moving with 4-velocity u^µ ≡ dx^µ/dτ = {u^t, u^x, u^y, u^z}.
Show that u_µ u^µ = −1 implies that
$$ u^t = 1 + \tfrac{1}{2} u^2 - \Phi , $$
(2.149)
whereas
$$ u_t = - \left( 1 + \tfrac{1}{2} u^2 + \Phi \right) , $$
(2.150)
where u ≡ [(u^x)^2 + (u^y)^2 + (u^z)^2]^{1/2}. One of u^t or u_t is constant. Which one? [Hint: Work to linear
order in Φ. Note that u^2 is of linear order in Φ.]
3. Equation of motion of a massive particle. From the geodesic equation
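$$ \frac{du^\kappa}{d\tau} + \Gamma^\kappa_{\mu\nu} u^\mu u^\nu = 0 $$
(2.151)
show that
$$ \frac{du^\alpha}{dt} = - \frac{\partial \Phi}{\partial x^\alpha} \qquad \alpha = x, y, z . $$
(2.152)
Why is it legitimate to replace dτ by dt? Show further that
$$ \frac{du^t}{dt} = - 2\, u^\alpha \frac{\partial \Phi}{\partial x^\alpha} $$
(2.153)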
with implicit summation over α = x, y, z. Does the result agree with what you would expect from
equation (2.149)?
4. For a massless particle, the proper time along a geodesic is zero, and the affine parameter λ must be
used instead of the proper time. The 4-velocity of a massless particle can be defined to be (and really
this is just the 4-momentum p^µ up to an arbitrary overall factor) v^µ ≡ dx^µ/dλ = {v^t, v^x, v^y, v^z}. Show
that v_µ v^µ = 0 implies that
$$ v^t = (1 - 2\Phi) v , $$
(2.154)
whereas
$$ v_t = - v , $$
(2.155)
where v ≡ [(v^x)^2 + (v^y)^2 + (v^z)^2]^{1/2}. One of v^t or v_t is constant. Which one?
5. Equation of motion of a massless particle. From the geodesic equation
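$$ \frac{dv^\kappa}{d\lambda} + \Gamma^\kappa_{\mu\nu} v^\mu v^\nu = 0 $$
(2.156)
show that the spatial components v ≡ {v^x, v^y, v^z} satisfy
$$ \frac{d\mathbf{v}}{d\lambda} = 2\, \mathbf{v} \times ( \mathbf{v} \times \nabla \Phi ) , $$
(2.157)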
where boldface symbols represent 3D vectors, and in particular ∇Φ is the spatial 3D gradient ∇Φ ≡
∂Φ/∂xα = {∂Φ/∂x, ∂Φ/∂y, ∂Φ/∂z}.
6. Interpret your answer, equation (2.157). In what ways does this equation for the acceleration of photons
differ from the equation governing the acceleration of massive particles? [Hint: Without loss of generality,
the affine parameter can be normalized so that the photon speed is one, v = 1, so that v is a unit vector
representing the direction of the photon.]
7. Consider an observer who happens to be at rest in the Newtonian metric, so that ux = uy = uz = 0.
Argue that the energy of a photon observed by this observer, relative to an observer at rest at zero
potential, is
$$ - u_\mu v^\mu = 1 - \Phi . $$
(2.158)
Does the observed photon have higher or lower energy in a deeper potential well?
Exercise 2.17 Deflection of light by the Sun.
1. Consider light that passes by a spherical mass M sufficiently far away that the potential Φ is always
weak. The potential at distance r from the spherical mass can be approximated by the Newtonian
potential
$$ \Phi = - \frac{G M}{r} . $$
(2.159)
Approximate the unperturbed path of light past the mass as a straight line. The plan is to calculate
the deflection as a perturbation to the straight line (physicists call this the Born approximation). For
definiteness, take the light to be moving in the x-direction, offset by a constant amount y away from
the mass in the y-direction (so y is the impact parameter, or periapsis). Argue that equation (2.157)
becomes
$$ \frac{dv^y}{d\lambda} = v^x \frac{dv^y}{dx} = - 2\, (v^x)^2 \frac{\partial \Phi}{\partial y} . $$
(2.160)
Integrate this equation to show that
$$ \frac{\Delta v^y}{v^x} = - \frac{4 G M}{y} . $$
(2.161)
Argue that this equals the deflection angle ∆φ.
2. Calculate the predicted deflection angle ∆φ in arcseconds for light that just grazes the limb of the Sun.
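The perturbative result (2.161) can be compared with a direct numerical integration of equation (2.157). The sketch below works in units c = 1 with GM and lengths in the same units, places the mass at the origin, and uses a simple midpoint integrator. The step count, the integration range, and the function names are assumptions chosen for illustration.

```python
import math


def grad_phi(x, y, gm):
    """Gradient of Phi = -GM/r in the x-y plane."""
    r3 = (x * x + y * y) ** 1.5
    return gm * x / r3, gm * y / r3


def accel(x, y, vx, vy, gm):
    """dv/dlambda = 2 v x (v x grad Phi) = 2 [v (v . grad Phi) - v^2 grad Phi]."""
    gx, gy = grad_phi(x, y, gm)
    v_dot_g = vx * gx + vy * gy
    v2 = vx * vx + vy * vy
    return 2.0 * (vx * v_dot_g - v2 * gx), 2.0 * (vy * v_dot_g - v2 * gy)


def deflection(gm, b, half_length, steps):
    """Integrate a ray that starts at (-half_length, b) moving in the +x direction."""
    x, y, vx, vy = -half_length, b, 1.0, 0.0
    h = 2.0 * half_length / steps
    for _ in range(steps):
        ax, ay = accel(x, y, vx, vy, gm)
        # Midpoint rule: evaluate position and velocity half a step ahead.
        xm, ym = x + 0.5 * h * vx, y + 0.5 * h * vy
        vxm, vym = vx + 0.5 * h * ax, vy + 0.5 * h * ay
        ax, ay = accel(xm, ym, vxm, vym, gm)
        x, y = x + h * vxm, y + h * vym
        vx, vy = vx + h * ax, vy + h * ay
    return math.atan2(vy, vx)


gm, b = 1.0, 1000.0   # weak field: GM/b = 1e-3
angle = deflection(gm, b, half_length=200.0 * b, steps=40_000)
print(angle, -4.0 * gm / b)
```

The numerical angle is negative, meaning the ray bends toward the mass, and it should agree with −4GM/b to within the accuracy expected of a first-order calculation.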
Exercise 2.18 Shapiro time delay. The three classic tests of general relativity are the gravitational
redshift (Exercise 2.9), the gravitational bending of light around the Sun (Exercise 2.17), and the precession
of Mercury (Exercise 7.8). Shapiro (1964) pointed out a fourth test, that the round-trip time for a light
beam bounced off a planet or spacecraft would be lengthened slightly by the passage of the light through
the gravitational potential of the Sun. The experiment could be done with radio signals, since the Sun does
not overwhelm a radio signal passing near its limb. In Exercise 2.16 you showed that the time component of
the 4-velocity v µ ≡ dxµ /dλ of a massless particle moving through a weak gravitational potential Φ is (units
c = 1)
$$ v^\mu \equiv \left\{ \frac{dt}{d\lambda} , \frac{d\mathbf{x}}{d\lambda} \right\} = \{ v^t , \mathbf{v} \} = \{ 1 - 2\Phi , \mathbf{v} \} , $$
(2.162)
where v is a 3-vector of unit magnitude. Equation (2.162) implies that
$$ \frac{dt}{dl} = 1 - 2\Phi , $$
(2.163)
where dl ≡ |dx| is the magnitude of the 3-vector interval dx. The Shapiro time delay comes from the 2Φ
correction.
1. Time delay. The potential Φ at distance r from the Sun is
$$ \Phi = - \frac{G M_\odot}{r} . $$
(2.164)
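Figure 2.6 A person on Earth sends out a radio signal that passes by the Sun, bounces off the planet Venus, and returns to Earth. [Diagram labels: impact parameter b; distances rE and rV of Earth and Venus from the Sun; distances lE and lV of Earth and Venus from the point of closest approach.]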
Assume that the path of the light can be well-approximated as a straight line, as illustrated in Figure 2.6.
Show that the round-trip time ∆t is, with units of c restored,
$$ \Delta t = \frac{2}{c} ( l_E + l_V ) + \frac{4 G M_\odot}{c^3} \ln \left[ \frac{( r_E + l_E )( r_V + l_V )}{b^2} \right] , $$
(2.165)
where, as illustrated in Figure 2.6, rE and rV are the distances of Earth and Venus from the Sun, b is the
impact parameter, and lE and lV are the distances of Earth and Venus from the point of closest approach.
The first term in equation (2.165) is the Newtonian expectation, while the last term in equation (2.165)
is the Shapiro term.
2. Shapiro time delay for the Earth-Venus-Sun system. Evaluate the Shapiro time delay, in milliseconds, for the Earth-Venus-Sun system when the radio signal just grazes the limb of the Sun,
with b = R⊙ . [Hint: The Earth-Sun distance is rE = 1.496 × 10^11 m, while the Venus-Sun distance
is rV = 1.082 × 10^11 m.]
3. Change in the time delay as the planets orbit. Assume that Earth and Venus are in circular orbit
about the Sun (so rE and rV are constant). What are the derivatives dlE /db and dlV /db, in terms of lE ,
lV , and b? Deduce an expression for c d∆t/db. Identify which is the Newtonian contribution, and which
the Shapiro contribution. Among the terms in the Shapiro contribution, which one term dominates for
small impact parameters, where b ≪ rE and b ≪ rV ?
4. Relative sizes of Newtonian and Shapiro terms. From your results in part 3, calculate approximately the relative sizes of the Newtonian and Shapiro contributions to the variation c d∆t/db of the
time delay when the radio signal just grazes the limb of the Sun, b = R⊙ . Comment.
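The Shapiro term of equation (2.165) is easy to evaluate numerically. The sketch below uses the orbital radii given in the hint to part 2, together with assumed values for the solar mass parameter and solar radius. The function name is illustrative.

```python
import math

C = 299_792_458.0          # speed of light, m/s
GM_SUN = 1.32712e20        # G M of the Sun, m^3/s^2 (assumed value)
R_SUN = 6.957e8            # solar radius, m (assumed nominal value)
R_EARTH_ORBIT = 1.496e11   # Earth-Sun distance, m (from the hint in part 2)
R_VENUS_ORBIT = 1.082e11   # Venus-Sun distance, m (from the hint in part 2)


def shapiro_delay(r_e, r_v, b):
    """Second term of (2.165): (4 G M / c^3) ln[(r_E + l_E)(r_V + l_V) / b^2]."""
    l_e = math.sqrt(r_e * r_e - b * b)   # Earth to point of closest approach
    l_v = math.sqrt(r_v * r_v - b * b)   # Venus to point of closest approach
    return 4.0 * GM_SUN / C ** 3 * math.log((r_e + l_e) * (r_v + l_v) / (b * b))


delay_ms = 1e3 * shapiro_delay(R_EARTH_ORBIT, R_VENUS_ORBIT, R_SUN)
print(delay_ms)  # round-trip Shapiro delay in milliseconds for a grazing ray
```

Because the delay depends only logarithmically on the impact parameter, it is a fraction of a millisecond even for a ray that grazes the solar limb.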
Exercise 2.19 Gravitational lensing. In Exercise 2.17 you found that, in the weak field limit, light
passing a spherical mass M at impact parameter y is deflected by angle
$$ \Delta\phi = \frac{4 G M}{y c^2} . $$
(2.166)
Figure 2.7 Lensing diagram. [Diagram labels: Observer, Lens, Source, Image; angles α, β, ∆φ; transverse offsets yL, yA, yS; distances DL, DLS, DS.]
Figure 2.8 The appearance of a source lensed by a point lens. The lens in this case is a black hole, whose physical size is the filled circle, and whose apparent (lensed) size is the surrounding unfilled circle. However, any mass, not just a black hole, will lens a background source. [Diagram labels: Image, Source, Lens, 2nd image.]
1. Lensing equation. Argue that the deflection angle ∆φ is related to the angles α and β illustrated in
the lensing diagram in Figure 2.7 by
αDS = βDS + ∆φDLS .
(2.167)
Hence or otherwise obtain the “lensing equation” in the form commonly used by astronomers
$$ \beta = \alpha - \frac{\alpha_E^2}{\alpha} , $$
(2.168)
where
$$ \alpha_E = \left( \frac{4 G M D_{LS}}{c^2 D_L D_S} \right)^{1/2} . $$
(2.169)
2. Solutions. Equation (2.168) has two solutions for the apparent angles α in terms of β. What are they?
Sketch both solutions on a lensing diagram similar to Figure 2.7.
3. Magnification. Figure 2.8 illustrates the appearance of a finite-sized source lensed by a point gravitational lens. If the source is far from the lens, then the source redshift is unchanged by the gravitational
lensing. But the distortion changes the apparent brightness of the source by a magnification µ equal to
the ratio of the apparent area of the lensed source to that of the unlensed source. For a small source,
the ratio of areas is
$$ \mu = \frac{y_A \, dy_A}{y_S \, dy_S} . $$
(2.170)
What is the magnification of a small source in terms of α and αE ? When is the magnification largest?
4. Einstein ring around the Sun? The case α = αE evidently corresponds to the case where the source
is exactly behind the lens, β = 0. In this case the lensed source appears as an “Einstein ring” of light
around the lens. Could there be an Einstein ring around the Sun, as seen from Earth?
5. Einstein ring around Sgr A∗ . What is the maximum possible angular size of an Einstein ring around
the 4 × 10^6 M⊙ black hole at the center of our Milky Way, 8 kpc away? Might this be observable?
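The lensing equation (2.168) is a quadratic in α, so its two image angles are straightforward to compute. The sketch below also evaluates the Einstein angle (2.169) for the mass and distance quoted for Sgr A∗, taking the ratio DLS/DS as a free parameter no larger than 1. The physical constants are assumed values and the function names are illustrative.

```python
import math

G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2 (assumed value)
C = 299_792_458.0        # speed of light, m/s
M_SUN = 1.989e30         # solar mass, kg (assumed value)
PARSEC = 3.0857e16       # parsec, m (assumed value)
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def einstein_angle(mass, d_l, ratio_ls_s=1.0):
    """alpha_E of (2.169) in radians, with ratio_ls_s = D_LS / D_S <= 1."""
    return math.sqrt(4.0 * G * mass * ratio_ls_s / (C * C * d_l))


def image_angles(beta, alpha_e):
    """Both roots of alpha^2 - beta alpha - alpha_E^2 = 0, equivalent to (2.168)."""
    root = math.sqrt(beta * beta + 4.0 * alpha_e * alpha_e)
    return 0.5 * (beta + root), 0.5 * (beta - root)


# Largest Einstein angle for Sgr A*: source much farther away than the lens.
sgr_a_arcsec = einstein_angle(4e6 * M_SUN, 8e3 * PARSEC) * ARCSEC_PER_RAD

# Image angles for a source at beta = 0.5 alpha_E, in units of alpha_E.
a_plus, a_minus = image_angles(0.5, 1.0)
print(sgr_a_arcsec, a_plus, a_minus)
```

The two roots have opposite signs, corresponding to images on opposite sides of the lens, as in Figure 2.8.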
3
More on the coordinate approach
3.1 Weyl tensor
The trace-free, or tidal, part of the Riemann curvature tensor defines the Weyl tensor Cκλµν
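$$ C_{\kappa\lambda\mu\nu} \equiv R_{\kappa\lambda\mu\nu} - \tfrac{1}{2} ( g_{\kappa\mu} R_{\lambda\nu} - g_{\kappa\nu} R_{\lambda\mu} + g_{\lambda\nu} R_{\kappa\mu} - g_{\lambda\mu} R_{\kappa\nu} ) + \tfrac{1}{6} ( g_{\kappa\mu} g_{\lambda\nu} - g_{\kappa\nu} g_{\lambda\mu} ) R \qquad \text{a coordinate tensor} . $$
(3.1)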
The Weyl tensor is by construction trace-free, meaning that it vanishes on contraction of any two indices,
which is true with or without torsion.
If torsion vanishes as general relativity assumes, then the Weyl tensor has 10 independent components,
which together with the 10 components of the Ricci tensor account for the 20 distinct components of the
Riemann tensor. The Weyl tensor Cκλµν inherits the symmetries (2.118) of the Riemann tensor, which for
vanishing torsion are
Cκλµν = C([κλ][µν]) .
(3.2)
Whereas the Einstein tensor Gκµ necessarily vanishes in a region of spacetime where there is no energy-momentum, Tκµ = 0, the Weyl tensor does not. The Weyl tensor expresses the presence of tidal gravitational
forces, and of gravitational waves.
If torsion does not vanish, then the Weyl tensor has 20 independent components, which together with the
16 components of the Ricci tensor account for the 36 distinct components of the Riemann tensor with torsion.
The 6 antisymmetric components G[κµ] of the Einstein tensor vanish if torsion vanishes, and likewise the 10
antisymmetric components C[[κλ][µν]] of the Weyl tensor vanish if torsion vanishes. With or without torsion,
the 10 symmetric components C([κλ][µν]) of the Weyl tensor encode gravitational waves that propagate in
empty space.
Exercise 3.1 Number of components of the Riemann, Ricci, and Weyl tensors in arbitrary
dimensions. How many components do the Riemann, Ricci, and Weyl tensors have in N spacetime dimensions, assuming vanishing torsion?
Solution. In N spacetime dimensions, the number of components of the torsion-free Riemann tensor is
$$ \text{Riemann:} \quad \tfrac{1}{12} (N - 1) N^2 (N + 1) . $$
(3.3)
In 2 spacetime dimensions, the usual Einstein equations do not apply, Exercise ??. For N = 2, the Riemann
tensor has 1 component, the Ricci tensor 1 component, and the Weyl tensor 0 components. In N ≥ 3
spacetime dimensions, the number of components of the Ricci and Weyl tensors are
$$ \text{Ricci:} \quad \tfrac{1}{2} N (N + 1) , \qquad \text{Weyl:} \quad \tfrac{1}{12} (N - 3) N (N + 1)(N + 2) . $$
(3.4)
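The counts (3.3) and (3.4) can be checked against each other: for N ≥ 3 the Ricci and Weyl components should add up to the Riemann components. A short sketch, with illustrative function names:

```python
def riemann_components(n):
    """Independent components of the torsion-free Riemann tensor, equation (3.3)."""
    return (n - 1) * n * n * (n + 1) // 12


def ricci_components(n):
    """Independent components of the symmetric Ricci tensor, equation (3.4)."""
    return n * (n + 1) // 2


def weyl_components(n):
    """Independent components of the torsion-free Weyl tensor, equation (3.4)."""
    return (n - 3) * n * (n + 1) * (n + 2) // 12


for n in range(3, 8):
    total = ricci_components(n) + weyl_components(n)
    print(n, riemann_components(n), ricci_components(n), weyl_components(n), total)
```

For N = 4 this reproduces the split of 20 Riemann components into 10 Ricci and 10 Weyl components stated in §3.1, and for N = 3 it shows the Weyl tensor has no components at all.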
Exercise 3.2 Weyl tensor in arbitrary dimensions. What is the Weyl tensor in N spacetime dimensions?
Solution. The Weyl tensor is the trace-free part of the Riemann tensor. In N spacetime dimensions it is
given by the same expression (3.1) but with different coefficients,
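$$ C_{\kappa\lambda\mu\nu} \equiv R_{\kappa\lambda\mu\nu} - \frac{1}{N - 2} ( g_{\kappa\mu} R_{\lambda\nu} - g_{\kappa\nu} R_{\lambda\mu} + g_{\lambda\nu} R_{\kappa\mu} - g_{\lambda\mu} R_{\kappa\nu} ) + \frac{1}{(N - 1)(N - 2)} ( g_{\kappa\mu} g_{\lambda\nu} - g_{\kappa\nu} g_{\lambda\mu} ) R . $$
(3.5)
The Weyl tensor vanishes identically in N = 2 and 3 spacetime dimensions.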
3.2 Evolution equations for the Weyl tensor, and gravitational waves
This section shows how the evolution equations for the Weyl tensor resemble Maxwell’s equations for the
electromagnetic field, and how the Weyl tensor encodes gravitational waves. In this section, torsion is taken
to vanish, as general relativity assumes.
Contracted on one index, the torsion-free Bianchi identities (2.127) are
$$ D_{[\kappa} R_{\lambda\mu]\nu}{}^{\kappa} = D_\kappa R_{\lambda\mu\nu}{}^{\kappa} + D_\lambda R_{\mu\nu} - D_\mu R_{\lambda\nu} = 0 . $$
(3.6)
In 4-dimensional spacetime, there are 20 such independent contracted identities, consisting of 4 trace identities obtained by contracting over λν, and 16 trace-free identities. Since this is the same as the number of
independent torsion-free Bianchi identities, it follows that the contracted Bianchi identities (3.6) are equivalent to the full set of Bianchi identities (2.128). An explicit expression for the Bianchi identities in terms of
the contracted Bianchi identities is, in 4-dimensions (in 5 or higher dimensions there are additional terms),
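$$ D_{[\kappa} R_{\lambda\mu]}{}^{\nu\pi} = \left( 18\, \delta^\rho_{[\kappa} \delta^\sigma_\lambda \delta^{[\nu}_{\mu]} \delta^{\pi]}_\tau + 9\, \delta^\rho_\tau \delta^\sigma_{[\kappa} \delta^\nu_\lambda \delta^\pi_{\mu]} \right) D_{[\upsilon} R_{\rho\sigma]}{}^{\tau\upsilon} \qquad \text{(4D spacetime)} . $$
(3.7)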
If the Riemann tensor is separated into its trace (Ricci) and traceless (Weyl) parts, equation (3.1), then the
contracted Bianchi identities (3.6) become the Weyl evolution equations
Dκ Cκλµν = Jλµν ,
(3.8)
where Jλµν is the Weyl current
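$$ J_{\lambda\mu\nu} \equiv \tfrac{1}{2} ( D_\mu G_{\lambda\nu} - D_\nu G_{\lambda\mu} ) - \tfrac{1}{6} ( g_{\lambda\nu} D_\mu G - g_{\lambda\mu} D_\nu G ) . $$
(3.9)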
The Weyl evolution equations (3.8) can be regarded as the gravitational analogue of Maxwell’s equations of
electromagnetism.
The Weyl current Jλµν is a vector of bivectors, which would suggest that it has 4 × 6 = 24 components,
but it loses 4 of those components because of the cyclic identity (2.117), valid for vanishing torsion, which
implies the cyclic symmetry
J[λµν] = 0 .
(3.10)
Thus the torsion-free Weyl current Jλµν has 20 independent components, in agreement with the above
assertion that there are 20 independent torsion-free contracted Bianchi identities. Since the Weyl tensor is
traceless, contracting the Weyl evolution equations (3.8) on λµ yields zero on the left hand side, so that the
contracted Weyl current satisfies
J λ λν = 0 .
(3.11)
This doubly-contracted Bianchi identity, which is the same as equation (2.130), enforces conservation of
energy-momentum. Unlike the cyclic symmetry (3.10), which follows from the cyclic symmetry of the Riemann tensor and is not a differential condition on the Riemann tensor, equations (3.11) constitute a nontrivial set of 4 differential conditions on the Einstein tensor. Besides the algebraic relations (3.10) and (3.11),
the Weyl current satisfies 6 differential identities comprising the conservation law
Dλ Jλµν = 0
(3.12)
in view of equation (3.8) and the antisymmetry of Cκλµν with respect to the indices κλ. The Weyl current
conservation law (3.12) follows from the form (3.9) of the Weyl current, coupled with covariant conservation
of the Einstein tensor, equation (2.130), so does not impose any additional non-trivial conditions on the
Riemann tensor. The Weyl current conservation law (3.12) is the gravitational analogue of the conservation
law for electric current that follows from Maxwell’s equations.
The 4 relations (3.11) and the 6 identities (3.12) account for 10 of the 20 contracted torsion-free Bianchi
identities (3.6). The remaining 10 equations comprise Maxwell-like equations (3.8) for the evolution of the
10 components of the Weyl tensor.
Whereas the Einstein equations relating the Einstein tensor to the energy-momentum tensor are postulated
equations of general relativity, the 10 evolution equations for the Weyl tensor, and the 4 equations enforcing
covariant conservation of the Einstein tensor, follow mathematically from the Bianchi identities, and do not
represent additional assumptions of the theory.
Exercise 3.3 Number of Bianchi identities. Confirm the counting of degrees of freedom.
Exercise 3.4 Wave equation for the Riemann and Weyl tensors. From the Bianchi identities, show
that the Riemann tensor satisfies the covariant wave equation
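\[
  \Box R_{\kappa\lambda\mu\nu} = D_\kappa D_\mu R_{\lambda\nu} - D_\kappa D_\nu R_{\lambda\mu}
  + D_\lambda D_\nu R_{\kappa\mu} - D_\lambda D_\mu R_{\kappa\nu} , \tag{3.13}
\]
where $\Box$ is the D’Alembertian operator, the 4-dimensional wave operator
\[
  \Box \equiv D^\pi D_\pi . \tag{3.14}
\]
Show that contracting equation (3.13) with $g^{\lambda\nu}$ yields the identity $\Box R_{\kappa\mu} = \Box R_{\kappa\mu}$. Conclude that the wave
equation (3.13) is non-trivial only for the trace-free part of the Riemann tensor, the Weyl tensor $C_{\kappa\lambda\mu\nu}$.
Show that the wave equation for the Weyl tensor is
\[
  \begin{aligned}
  \Box C_{\kappa\lambda\mu\nu} ={}& (D_\kappa D_\mu - \tfrac{1}{2} g_{\kappa\mu} \Box) R_{\lambda\nu}
  - (D_\kappa D_\nu - \tfrac{1}{2} g_{\kappa\nu} \Box) R_{\lambda\mu} \\
  &+ (D_\lambda D_\nu - \tfrac{1}{2} g_{\lambda\nu} \Box) R_{\kappa\mu}
  - (D_\lambda D_\mu - \tfrac{1}{2} g_{\lambda\mu} \Box) R_{\kappa\nu} \\
  &+ \tfrac{1}{6} (g_{\kappa\mu} g_{\lambda\nu} - g_{\kappa\nu} g_{\lambda\mu}) \Box R .
  \end{aligned} \tag{3.15}
\]
Conclude that in a vacuum, where $R_{\kappa\mu} = 0$,
\[
  \Box C_{\kappa\lambda\mu\nu} = 0 . \tag{3.16}
\]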
3.3 Geodesic deviation
This section on geodesic deviation is included not because the equation of geodesic deviation is crucial to
everyday calculations in general relativity, but rather for two reasons. First, the equation offers insight into
the physical meaning of the Riemann tensor. Second, the derivation of the equation offers a fine illustration
of the fact that in general relativity, whenever you take differences at infinitesimally separated points in
space or time, you should always take covariant differences.
Consider two objects that are free-falling along two infinitesimally separated geodesics. In flat space the
acceleration between the two objects would be zero, but in curved space the curvature induces a finite
acceleration between the two objects. This is how an observer can measure curvature, at least in principle:
set up an ensemble of objects initially at rest a small distance away from the observer in the observer’s
locally inertial frame, and watch how the objects begin to move. The equation (3.23) that describes this
acceleration between objects an infinitesimal distance apart is called the equation of geodesic deviation.
The covariant difference in the velocities of two objects an infinitesimal distance δxµ apart is
\[
  \frac{D \delta x^\mu}{D\tau} = \delta u^\mu . \tag{3.17}
\]
In general relativity, the ordinary difference between vectors at two points a small interval apart is not
a physically meaningful thing, because the frames of reference at the two points are different. The only
physically meaningful difference is the covariant difference, which is the difference in the two vectors parallel-transported across the gap between them. It is only this covariant difference that is independent of the frame
of reference. On the left hand side of equation (3.17), the proper time derivative must be the covariant proper
time derivative, D/Dτ = uλ Dλ . On the right hand side of equation (3.17), the difference in the 4-velocity
at two points δxκ apart must be the covariant difference δ = δxκ Dκ . Thus equation (3.17) means explicitly
the covariant equation
\[
  u^\lambda D_\lambda \delta x^\mu = \delta x^\kappa D_\kappa u^\mu . \tag{3.18}
\]
To derive the equation of geodesic deviation, first vary the geodesic equation Duµ /Dτ = 0 (the index µ is
put downstairs so that the final equation (3.23) looks cosmetically better, but of course since everything is
covariant the µ index could just as well be put upstairs everywhere):
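\[
  \begin{aligned}
  0 &= \delta \frac{D u_\mu}{D\tau} \\
  &= \delta x^\kappa D_\kappa \left( u^\lambda D_\lambda u_\mu \right) \\
  &= \delta u^\lambda D_\lambda u_\mu + \delta x^\kappa u^\lambda D_\kappa D_\lambda u_\mu .
  \end{aligned} \tag{3.19}
\]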
On the second line, the covariant difference δ between quantities a small distance δxκ apart has been set
equal to δxκ Dκ , while D/Dτ has been set equal to the covariant time derivative uλ Dλ along the geodesic.
On the last line, $\delta x^\kappa D_\kappa u^\lambda$ has been replaced by $\delta u^\lambda$. Next, consider the covariant acceleration of the interval
δxµ , which is the covariant proper time derivative of the covariant velocity difference δuµ :
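\[
  \begin{aligned}
  \frac{D^2 \delta x_\mu}{D\tau^2} &= \frac{D \delta u_\mu}{D\tau} \\
  &= u^\lambda D_\lambda \left( \delta x^\kappa D_\kappa u_\mu \right) \\
  &= \delta u^\kappa D_\kappa u_\mu + \delta x^\kappa u^\lambda D_\lambda D_\kappa u_\mu .
  \end{aligned} \tag{3.20}
\]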
As in the previous equation (3.19), on the second line D/Dτ has been set equal to uλ Dλ , while δ has been
set equal to $\delta x^\kappa D_\kappa$. On the last line, $u^\lambda D_\lambda \delta x^\kappa$ has been set equal to $\delta u^\kappa$, equation (3.18). Subtracting (3.19)
from (3.20) gives
\[
  \frac{D^2 \delta x_\mu}{D\tau^2} = \delta x^\kappa u^\lambda [D_\lambda , D_\kappa] u_\mu , \tag{3.21}
\]
or equivalently
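\[
  \frac{D^2 \delta x_\mu}{D\tau^2} + S^{\nu}_{\kappa\lambda} \delta x^\kappa u^\lambda D_\nu u_\mu
  + R_{\kappa\lambda\mu\nu} \delta x^\kappa u^\lambda u^\nu = 0 . \tag{3.22}
\]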
If torsion vanishes as general relativity assumes, then
\[
  \frac{D^2 \delta x_\mu}{D\tau^2} + R_{\kappa\lambda\mu\nu} \delta x^\kappa u^\lambda u^\nu = 0 , \tag{3.23}
\]
which is the desired equation of geodesic deviation.
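As a numerical illustration of equation (3.23), the following Python sketch uses a stand-in that is not part of the text: the unit 2-sphere with Riemannian (not Lorentzian) signature, whose geodesics are great circles. There the equation of geodesic deviation reduces to d²ξ/ds² = −ξ for the proper separation ξ of two neighbouring geodesics at arc length s. The function names and the finite-difference step are hypothetical choices made for this illustration only.

```python
import math

def meridian_point(phi, s):
    """Point on the unit sphere after arc length s along the meridian of
    longitude phi, starting on the equator and heading north."""
    return (math.cos(s) * math.cos(phi),
            math.cos(s) * math.sin(phi),
            math.sin(s))

def separation(dphi, s):
    """Great-circle distance between two meridians dphi apart, at arc length s."""
    chord = math.dist(meridian_point(0.0, s), meridian_point(dphi, s))
    return 2.0 * math.asin(chord / 2.0)

def second_derivative(f, s, h=1e-2):
    """Central finite-difference estimate of f''(s)."""
    return (f(s + h) - 2.0 * f(s) + f(s - h)) / h ** 2

def deviation_ratio(dphi, s):
    """Ratio (d^2 xi / ds^2) / xi, which geodesic deviation predicts to be -1."""
    xi = lambda t: separation(dphi, t)
    return second_derivative(xi, s) / xi(s)
```

For a small initial angle such as dphi = 1e-3, the ratio returned by `deviation_ratio` should be close to −1 at any arc length short of the pole: the two geodesics accelerate towards each other at a rate set by the curvature.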
4
Action principle for point particles
This chapter describes the action principle for point particles in a prescribed gravitational field. The action
principle provides a powerful way to obtain equations of motion for particles in a given spacetime, such as
a black hole, or a cosmological spacetime. An action principle for the gravitational field itself is deferred to
Chapter 16, after development of the tetrad formalism in Chapter 11.
Hamilton’s principle of least action postulates that any dynamical system is characterized by a scalar
action S, which has the property that when the system evolves from one specified state to another, the path
by which it gets between the two states is such as to minimize the action. The action need not be a global
minimum, just a local minimum with respect to small variations in the path between fixed initial and final
states.
That nature appears to respect a principle of such simplicity and power is quite remarkable, and a deep
mystery. But it works, and in modern physics, the principle of least action has become a basic building block
with which physicists construct theories.
From a practical perspective, the principle of least action, in either Lagrangian or Hamiltonian form,
provides the most powerful way to solve equations of motion. For example, integrals of motion associated
with symmetries of the spacetime emerge automatically in the Lagrangian or Hamiltonian formalisms.
4.1 Principle of least action for point particles
The path of a point particle through spacetime is specified by its coordinates xµ (λ) as a function of some
arbitrary parameter λ. In non-relativistic mechanics it is usual to take the parameter λ to be the time t, and
the path of a particle through space is then specified by three spatial coordinates xa (t). In relativity however
it is more natural to treat the time and space coordinates on an equal footing, and to regard the path of a
particle as being specified by four spacetime coordinates xµ (λ) as a function of an arbitrary parameter λ, as
illustrated in Figure 4.1. The parameter λ is simply a differentiable parameter that labels points along the
path, and has no physical significance (for example, it is not necessarily an affine parameter).
Figure 4.1 (axes $x^1$, $x^2$, $\lambda$) The action principle considers various paths through spacetime between fixed initial and final conditions, and chooses that path that minimizes the action.
The path of a system of N point particles through spacetime is specified by 4N coordinates $x^\mu(\lambda)$. The
action principle postulates that, for a system of N point particles, the action S is an integral of a Lagrangian
$L(x^\mu , dx^\mu/d\lambda)$ which is a function of the 4N coordinates $x^\mu(\lambda)$ together with the 4N velocities $dx^\mu/d\lambda$ with
respect to the arbitrary parameter $\lambda$. The action from an initial state at $\lambda_i$ to a final state at $\lambda_f$ is thus
\[
  S = \int_{\lambda_i}^{\lambda_f} L\!\left( x^\mu , \frac{dx^\mu}{d\lambda} \right) d\lambda . \tag{4.1}
\]
The principle of least action demands that the actual path taken by the system between given initial and
final coordinates xµi and xµf is such as to minimize the action. Thus the variation δS of the action must be
zero under any change δxµ in the path, subject to the constraint that the coordinates at the endpoints are
fixed, δxµi = 0 and δxµf = 0,
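\[
  \delta S = \int_{\lambda_i}^{\lambda_f} \left( \frac{\partial L}{\partial x^\mu} \delta x^\mu
  + \frac{\partial L}{\partial (dx^\mu/d\lambda)} \delta (dx^\mu/d\lambda) \right) d\lambda = 0 . \tag{4.2}
\]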
Linearity of the derivative,
\[
  \frac{d}{d\lambda} \left( x^\mu + \delta x^\mu \right) = \frac{dx^\mu}{d\lambda} + \frac{d(\delta x^\mu)}{d\lambda} , \tag{4.3}
\]
shows that the change in the velocity along the path equals the velocity of the change, δ(dxµ /dλ) =
d(δxµ )/dλ. Integrating the second term in the integrand of equation (4.2) by parts yields
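\[
  \delta S = \left[ \frac{\partial L}{\partial (dx^\mu/d\lambda)} \delta x^\mu \right]_{\lambda_i}^{\lambda_f}
  + \int_{\lambda_i}^{\lambda_f} \left( \frac{\partial L}{\partial x^\mu}
  - \frac{d}{d\lambda} \frac{\partial L}{\partial (dx^\mu/d\lambda)} \right) \delta x^\mu \, d\lambda = 0 . \tag{4.4}
\]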
The surface term in equation (4.4) vanishes, since by hypothesis the coordinates are held fixed at the
endpoints, so δxµ = 0 at the endpoints. Therefore the integral in equation (4.4) must vanish. Indeed least
action requires the integral to vanish for all possible variations δxµ in the path. The only way this can happen
is that the integrand must be identically zero. The result is the Euler-Lagrange equations of motion
\[
  \frac{d}{d\lambda} \frac{\partial L}{\partial (dx^\mu/d\lambda)} - \frac{\partial L}{\partial x^\mu} = 0 . \tag{4.5}
\]
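The variational argument above can be checked numerically. The sketch below is a toy illustration, not taken from the text: it assumes a free particle in one dimension with Lagrangian L = ½ (dx/dλ)², for which the Euler-Lagrange equation (4.5) gives d²x/dλ² = 0, a straight line. It compares the discretised action of the straight path with that of perturbed paths sharing the same endpoints. The function names and the sinusoidal perturbation are hypothetical choices.

```python
import math

def action(path, dlam):
    """Discretised action: sum of L * dlam with L = (1/2) (dx/dlam)^2."""
    return sum(0.5 * ((b - a) / dlam) ** 2 * dlam
               for a, b in zip(path, path[1:]))

def make_path(eps, n=200):
    """Straight line from x = 0 to x = 1 over lam in [0, 1], plus a
    perturbation eps * sin(pi * lam) that vanishes at both endpoints."""
    return [k / n + eps * math.sin(math.pi * k / n) for k in range(n + 1)]

def action_of_perturbed_path(eps, n=200):
    """Action of the path with perturbation amplitude eps."""
    return action(make_path(eps, n), 1.0 / n)
```

With these choices the unperturbed path (eps = 0) has action 1/2, and any nonzero eps raises the action, consistent with the straight line being the minimizing path between fixed endpoints.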
It might seem that the Euler-Lagrange equations (4.5) are inadequately specified, since they depend on
some arbitrary unknown parameter λ. But in fact the Euler-Lagrange equations are the same regardless of
the choice of λ. An example of the arbitrariness of λ will be seen in §4.3. Since λ can be chosen arbitrarily,
it is common to choose it in some convenient fashion. For a massive particle, λ can be taken equal to the
proper time τ of the particle. For a massless particle, whose proper time never progresses, λ can be taken
equal to an affine parameter.
Concept question 4.1 Redundant time coordinates? How can it be possible to treat the time coordinate t for each particle as an independent coordinate? Isn’t the time coordinate t the same for all N
particles? Answer. Different particles follow different trajectories in spacetime. One is free to choose t(λ)
to be a different function of the parameter λ for each particle, in the same way that the spatial coordinate
xα (λ) may be a different function for each particle.
4.2 Generalized momentum
The left hand side of the Euler-Lagrange equations of motion (4.5) involves the partial derivative of the
Lagrangian with respect to the velocity dxµ /dλ. This quantity plays a fundamental role in the Hamiltonian
formulation of the action principle, §4.10, and is called the generalized momentum πµ conjugate to the
coordinate xµ ,
\[
  \pi_\mu \equiv \frac{\partial L}{\partial (dx^\mu/d\lambda)} . \tag{4.6}
\]
4.3 Lagrangian for a test particle
According to the principle of equivalence, a test particle in a gravitating system moves along a geodesic, a
straight line relative to local free-falling frames. A geodesic is the shortest distance between two points. In
relativity this translates, for a massive particle, into the longest proper time between two points. The proper
time along any path is $d\tau = \sqrt{-ds^2} = \sqrt{-g_{\mu\nu}\, dx^\mu dx^\nu}$. Thus the action $S_m$ of a test particle of constant rest
mass m in a gravitating system is
\[
  S_m = -m \int_{\lambda_i}^{\lambda_f} d\tau
  = -m \int_{\lambda_i}^{\lambda_f} \sqrt{ -g_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} } \; d\lambda . \tag{4.7}
\]
The factor of rest mass m brings the action, which has units of angular momentum, to standard normalization.
The overall minus sign comes from the fact that the action is a minimum whereas the proper time is a
maximum along the path. The action principle requires that the Lagrangian L(xµ , dxµ /dλ) be written as a
function of the coordinates xµ and velocities dxµ /dλ, and it is seen that the integrand in the last expression
of equation (4.7) has the desired form, the metric gµν being considered a given function of the coordinates.
Thus the Lagrangian Lm of a test particle of mass m is
\[
  L_m = -m \sqrt{ -g_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} } . \tag{4.8}
\]
The partial derivatives that go in the Euler-Lagrange equations (4.5) are then
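\[
  \frac{\partial L_m}{\partial (dx^\kappa/d\lambda)}
  = -m \, \frac{ - g_{\kappa\nu} \dfrac{dx^\nu}{d\lambda} }{ \sqrt{ -g_{\pi\rho} (dx^\pi/d\lambda)(dx^\rho/d\lambda) } } , \tag{4.9a}
\]
\[
  \frac{\partial L_m}{\partial x^\kappa}
  = -m \, \frac{ - \dfrac{1}{2} \dfrac{\partial g_{\mu\nu}}{\partial x^\kappa} \dfrac{dx^\mu}{d\lambda} \dfrac{dx^\nu}{d\lambda} }{ \sqrt{ -g_{\pi\rho} (dx^\pi/d\lambda)(dx^\rho/d\lambda) } } . \tag{4.9b}
\]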
The denominators in the expressions (4.9) for the partial derivatives of the Lagrangian are
$\sqrt{ -g_{\pi\rho} (dx^\pi/d\lambda)(dx^\rho/d\lambda) } = d\tau/d\lambda$. It was not legitimate to make this substitution before taking the partial
derivatives, since the Euler-Lagrange equations require that the Lagrangian be expressed in terms of xµ and
dxµ /dλ, but it is fine to make the substitution now that the partial derivatives have been obtained. The
partial derivatives (4.9) thus simplify to
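\[
  \frac{\partial L_m}{\partial (dx^\kappa/d\lambda)} = m g_{\kappa\nu} \frac{dx^\nu}{d\lambda} \frac{d\lambda}{d\tau} = m u_\kappa , \tag{4.10a}
\]
\[
  \frac{\partial L_m}{\partial x^\kappa} = \frac{1}{2} m \frac{\partial g_{\mu\nu}}{\partial x^\kappa} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} \frac{d\lambda}{d\tau}
  = m \Gamma_{\mu\nu\kappa} u^\mu u^\nu \frac{d\tau}{d\lambda} , \tag{4.10b}
\]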
in which uκ ≡ dxκ /dτ is the usual 4-velocity, and the derivative of the metric has been replaced by connections
in accordance with equation (2.59). The generalized momentum πκ , equation (4.6), of the test particle
coincides with its ordinary momentum pκ :
πκ = pκ ≡ muκ .
(4.11)
The resulting Euler-Lagrange equations of motion (4.5) are
\[
  \frac{d\, m u_\kappa}{d\lambda} = m \Gamma_{\mu\nu\kappa} u^\mu u^\nu \frac{d\tau}{d\lambda} . \tag{4.12}
\]
As remarked in §4.1, the choice of the arbitrary parameter λ has no effect on the equations of motion. With
a factor of m dτ /dλ cancelled, equation (4.12) becomes
\[
  \frac{d u_\kappa}{d\tau} = \Gamma_{\mu\nu\kappa} u^\mu u^\nu . \tag{4.13}
\]
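Splitting the connection $\Gamma_{\mu\nu\kappa}$ into its torsion-free part $\mathring{\Gamma}_{\mu\nu\kappa}$ and the contortion $K_{\mu\nu\kappa}$, equation (2.64), gives
\[
  \frac{d u_\kappa}{d\tau} = ( \mathring{\Gamma}_{\mu\nu\kappa} + K_{\mu\nu\kappa} ) u^\mu u^\nu = \mathring{\Gamma}_{\mu\kappa\nu} u^\mu u^\nu , \tag{4.14}
\]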
where the last step follows from the symmetry of the torsion-free connection Γ̊µνκ in its last two indices,
and the antisymmetry of the contortion tensor Kµνκ in its first two indices. With or without torsion, equation (4.14) yields the torsion-free geodesic equation of motion,
\[
  \frac{\mathring{D} u_\kappa}{D\tau} = 0 . \tag{4.15}
\]
Equation (4.15) shows that presence of torsion does not affect the geodesic motion of particles.
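The statement that the test-particle action (4.7) is a minimum where the proper time is a maximum can be illustrated in the simplest setting. The sketch below assumes flat 1+1 dimensional Minkowski space with c = 1, which is a special case chosen for illustration and not a calculation from the text. It compares the proper time along a straight worldline with that along worldlines making an excursion between the same two events. The function names and the sinusoidal excursion are hypothetical.

```python
import math

def proper_time(ts, xs):
    """Proper time along a sampled worldline x(t) in 1+1 Minkowski space:
    the sum of sqrt(dt^2 - dx^2) over the segments."""
    return sum(math.sqrt((t1 - t0) ** 2 - (x1 - x0) ** 2)
               for t0, t1, x0, x1 in zip(ts, ts[1:], xs, xs[1:]))

def action(m, ts, xs):
    """Test-particle action S_m = -m * (proper time), as in equation (4.7)."""
    return -m * proper_time(ts, xs)

def worldline(eps, n=400, T=1.0):
    """Worldline from (t, x) = (0, 0) to (T, 0) with an excursion
    x = eps * sin(pi t / T); eps = 0 is the straight (geodesic) path.
    The speed stays below 1 provided |eps| * pi / T < 1."""
    ts = [T * k / n for k in range(n + 1)]
    xs = [eps * math.sin(math.pi * t / T) for t in ts]
    return ts, xs
```

In this flat-space example the straight worldline accumulates the full coordinate time T, while every excursion accumulates less, so the action −mτ is least on the geodesic.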
Concept question 4.2 Throw a clock up in the air.
1. This question is posed by Rovelli (2007). Standing on the surface of the Earth, you throw a clock up in
the air, and catch it. Which clock shows more time elapsed, the one you threw up in the air, or the one
on your wrist?
2. Suppose you throw the clock so hard that it goes around the Moon. Which clock shows more time
elapsed?
4.4 Massless test particle
The equation of motion for a massless test particle is obtained from that for a massive particle in the limit of
zero mass, m → 0. The proper time τ along the path of a massless particle is zero, but an affine parameter
λ ≡ τ /m proportional to proper time can be defined, equation (2.93), which remains finite in the limit
m → 0. In terms of the affine parameter λ, the momentum pκ of a particle can be written
\[
  p^\kappa \equiv m u^\kappa = \frac{dx^\kappa}{d\lambda} , \tag{4.16}
\]
and the equation of motion (4.15) becomes
\[
  \frac{\mathring{D} p^\kappa}{D\lambda} = 0 , \tag{4.17}
\]
which works for massless as well as massive particles.
The action for a test particle in terms of the affine parameter $\lambda$ defined by equation (2.93) is
\[
  S = -m^2 \int d\lambda , \tag{4.18}
\]
which vanishes for m → 0. One might be worried that the action seemingly vanishes identically for a massless
particle. An alternative nice action is given below, equation (4.30), that vanishes in the massless limit only
after the equations of motion are imposed.
Concept question 4.3 Conventional Lagrangian. In the conventional Lagrangian approach, the parameter λ is set equal to the time coordinate t, and the Lagrangian L(t, xα , dxα /dt) of a system of N particles
is considered to be a function of the time t, the 3N spatial coordinates xα , and the 3N spatial velocities
dxα /dt. Compare the conventional and covariant Lagrangian approaches for a point particle. Answer. The
Euler-Lagrange equations in the conventional Lagrangian approach are
\[
  \frac{d}{dt} \frac{\partial L}{\partial (dx^\alpha/dt)} - \frac{\partial L}{\partial x^\alpha} = 0 . \tag{4.19}
\]
For a point particle, the Euler-Lagrange equations (4.19) yield the spatial components of the geodesic equation of motion (4.17),
\[
  \frac{\mathring{D} p^\alpha}{D\lambda} = 0 . \tag{4.20}
\]
What about the time component of the geodesic equation of motion? The geodesic equation for the time
component is a consequence of the geodesic equations for the spatial components, coupled with conservation
of rest mass m,
\[
  p_0 \frac{\mathring{D} p^0}{D\lambda}
  = \frac{1}{2} \frac{\mathring{D} (p_0 p^0)}{D\lambda}
  = - \frac{1}{2} \frac{\mathring{D} (p_\alpha p^\alpha + m^2)}{D\lambda}
  = - p_\alpha \frac{\mathring{D} p^\alpha}{D\lambda}
  = 0 . \tag{4.21}
\]
Put another way, the covariant Lagrangian approach applied to a point particle enforces conservation of the
rest mass m of the particle, a conservation law that the conventional Lagrangian approach simply assumes.
Invariance of the action with respect to reparametrization of λ implies conservation of rest mass.
4.5 Effective Lagrangian for a test particle
A drawback of the test particle Lagrangian (4.8) is that it involves a square root. This proves to be problematic
for various reasons, among which is that it is an obstacle to deriving a satisfactory super-Hamiltonian, §4.12.
This section describes an alternative approach that gets rid of the square root, making the test particle
Lagrangian quadratic in velocities dxµ /dλ, equation (4.25).
After equations of motion are imposed, the Lagrangian (4.8) for a test particle of constant rest mass m is
\[
  L_m = -m \frac{d\tau}{d\lambda} . \tag{4.22}
\]
If the parameter $\lambda$ is chosen such that $d\tau/d\lambda$ is constant,
\[
  \frac{d\tau}{d\lambda} = \mbox{constant} , \tag{4.23}
\]
so that the Lagrangian Lm is constant after equations of motion are imposed, then the Euler-Lagrange
equations of motion (4.5) are unchanged if the Lagrangian is replaced by any function of it,
L′m = f (Lm ) .
(4.24)
A convenient choice of alternative Lagrangian L′m , also called an effective Lagrangian, is
\[
  L'_m = - \frac{L_m^2}{2 m^2} = \frac{1}{2} g_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} . \tag{4.25}
\]
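For the effective Lagrangian (4.25), the partial derivatives (4.9) are
\[
  \frac{\partial L'_m}{\partial (dx^\kappa/d\lambda)} = g_{\kappa\nu} \frac{dx^\nu}{d\lambda} , \tag{4.26a}
\]
\[
  \frac{\partial L'_m}{\partial x^\kappa} = \frac{1}{2} \frac{\partial g_{\mu\nu}}{\partial x^\kappa} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda}
  = \Gamma_{\mu\nu\kappa} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} . \tag{4.26b}
\]
The Euler-Lagrange equations of motion (4.5) are then
\[
  \frac{d}{d\lambda} \left( g_{\kappa\nu} \frac{dx^\nu}{d\lambda} \right) = \Gamma_{\mu\nu\kappa} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} . \tag{4.27}
\]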
Equations (4.27) are valid subject to the condition (4.23), which asserts that dλ ∝ dτ . The constant of
proportionality does not affect the equations of motion (4.27), which thus reproduce the earlier equations of
motion in either of the forms (4.15) or (4.17).
If the test particle is moving in a prescribed gravitational field and there are no other fields, then the
equations of motion are unchanged by the normalization of the effective Lagrangian L′m . But if there are other
fields that affect the particle’s motion, such as an electromagnetic field, §4.7, then the effective Lagrangian
L′m must be normalized correctly if it is to continue to recover the correct equations of motion. The correct
normalization is such that the generalized momentum of the test particle, defined by equation (4.26a), equals
its ordinary momentum $p_\kappa$, in agreement with equation (4.11),
\[
  g_{\kappa\nu} \frac{dx^\nu}{d\lambda} = p_\kappa \equiv g_{\kappa\nu} \, m \frac{dx^\nu}{d\tau} . \tag{4.28}
\]
This requires that the constant in equation (4.23) must equal the rest mass m,
\[
  \frac{d\tau}{d\lambda} = m . \tag{4.29}
\]
This is just the definition of the affine parameter λ, equation (2.93). Thus the λ in the definition (4.25) of
the effective Lagrangian L′m should be interpreted as the affine parameter.
Notice that the value of the effective Lagrangian L′m after condition (4.29) is applied (after equations of
motion are imposed) is −m2 /2, which is half the value of the original Lagrangian Lm (4.8).
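The factor of one half can be made explicit with a short worked step. It uses only the relation $d\tau^2 = -g_{\mu\nu}\, dx^\mu dx^\nu$ from §4.3 together with condition (4.29); it is offered as a check on the statement above rather than as part of the original derivation.

```latex
\[
  L'_m = \frac{1}{2} g_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda}
       = - \frac{1}{2} \left( \frac{d\tau}{d\lambda} \right)^2
       = - \frac{m^2}{2} ,
  \qquad
  L_m = - m \frac{d\tau}{d\lambda} = - m^2 .
\]
```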
4.6 Nice Lagrangian for a test particle
The effective Lagrangian (4.25) has the advantage that it does not involve a square root, but this advantage
was achieved at the expense of imposing the condition (4.29) ad hoc after the equations of motion are
derived. It is possible to retain the advantage of a Lagrangian quadratic in velocities, but get rid of the ad
hoc condition, by modifying the Lagrangian so that the ad hoc condition essentially emerges as an equation
of motion. I call the resulting Lagrangian (4.31) the “nice” Lagrangian.
As seen in §4.1, the equations of motion are independent of the choice of the arbitrary parameter λ
that labels the path of the particle between its fixed endpoints. The equations of motion are said to be
reparametrization independent. Introduce, therefore, a parameter µ(λ), an arbitrary function of λ, that
rescales the parameter λ, and let the action for a test particle of mass m be
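\[
  S_m = \int \frac{1}{2} \left( g_{\mu\nu} \frac{dx^\mu}{\mu \, d\lambda} \frac{dx^\nu}{\mu \, d\lambda} - m^2 \right) \mu \, d\lambda , \tag{4.30}
\]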
with nice Lagrangian
\[
  L_m = \frac{\mu}{2} \left( g_{\mu\nu} \frac{dx^\mu}{\mu \, d\lambda} \frac{dx^\nu}{\mu \, d\lambda} - m^2 \right) . \tag{4.31}
\]
Variation of the action with respect to xµ and dxµ /dλ yields the Euler-Lagrange equations in the form
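\[
  \frac{d}{\mu \, d\lambda} \left( g_{\kappa\nu} \frac{dx^\nu}{\mu \, d\lambda} \right)
  = \Gamma_{\mu\nu\kappa} \frac{dx^\mu}{\mu \, d\lambda} \frac{dx^\nu}{\mu \, d\lambda} . \tag{4.32}
\]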
Variation of the action with respect to the parameter µ gives
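\[
  \delta S_m = \int \frac{1}{2} \left( - g_{\mu\nu} \frac{dx^\mu}{\mu \, d\lambda} \frac{dx^\nu}{\mu \, d\lambda} - m^2 \right) \delta\mu \, d\lambda , \tag{4.33}
\]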
and requiring that this be an extremum imposes
\[
  g_{\mu\nu} \frac{dx^\mu}{\mu \, d\lambda} \frac{dx^\nu}{\mu \, d\lambda} = - m^2 . \tag{4.34}
\]
Equation (4.34) is equivalent to
\[
  \mu \, d\lambda = \frac{d\tau}{m} , \tag{4.35}
\]
where the sign has been taken positive without loss of generality. Substituting equation (4.35) into the
equations of motion (4.32) recovers the usual equations of motion (4.15).
Condition (4.35) substituted into the action (4.30) recovers the standard test particle action (4.7) with
the correct sign and normalization.
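The recovery of the standard action can be spelled out in one line. The following worked step substitutes the constraint (4.34) and then the relation (4.35) into the action (4.30); it is a sketch of the intermediate algebra, using no input beyond those three equations.

```latex
\[
  S_m = \int \frac{1}{2} \left( - m^2 - m^2 \right) \mu \, d\lambda
      = - m^2 \int \mu \, d\lambda
      = - m^2 \int \frac{d\tau}{m}
      = - m \int d\tau .
\]
```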
4.7 Action for a charged test particle in an electromagnetic field
The equations of motion for a test particle of charge q in a prescribed gravitational and electromagnetic
field can be obtained by adding to the test particle action Sm an interaction action Sq that characterizes the
interaction between the charge and the electromagnetic field,
S = Sm + Sq .
(4.36)
In flat (Minkowski) space, experiment shows that the required equation of motion is the classical Lorentz
force law (4.45). The Lorentz force law is recovered with the interaction action
\[
  S_q = q \int_{\lambda_i}^{\lambda_f} A_\mu \, dx^\mu = q \int_{\lambda_i}^{\lambda_f} A_\mu \frac{dx^\mu}{d\lambda} \, d\lambda , \tag{4.37}
\]
where Aµ is the electromagnetic 4-vector potential. The interaction Lagrangian Lq corresponding to the
action (4.37) is
\[
  L_q = q A_\mu \frac{dx^\mu}{d\lambda} . \tag{4.38}
\]
If the electromagnetic potential Aµ is taken to be a prescribed function of the coordinates xµ along the
path of the particle, then the Lagrangian Lq (4.38) is a function of coordinates xµ and velocities dxµ /dλ
as required by the action principle. The partial derivatives of the interaction Lagrangian Lq with respect to
velocities and coordinates are
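\[
  \frac{\partial L_q}{\partial (dx^\kappa/d\lambda)} = q A_\kappa , \tag{4.39a}
\]
\[
  \frac{\partial L_q}{\partial x^\kappa} = q \frac{\partial A_\mu}{\partial x^\kappa} \frac{dx^\mu}{d\lambda}
  = q \frac{\partial A_\mu}{\partial x^\kappa} u^\mu \frac{d\tau}{d\lambda} . \tag{4.39b}
\]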
The generalized momentum πκ , equation (4.6), of the test particle of mass m and charge q in the electromagnetic field of potential Aµ is, from equations (4.10a) and (4.39a),
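\[
  \pi_\kappa \equiv \frac{\partial (L_m + L_q)}{\partial (dx^\kappa/d\lambda)} = m u_\kappa + q A_\kappa . \tag{4.40}
\]
Applied to the Lagrangian $L = L_m + L_q$, the Euler-Lagrange equations (4.5) are
\[
  \frac{d}{d\lambda} \left( m u_\kappa + q A_\kappa \right)
  = \left( m \Gamma_{\mu\nu\kappa} u^\mu u^\nu + q \frac{\partial A_\mu}{\partial x^\kappa} u^\mu \right) \frac{d\tau}{d\lambda} , \tag{4.41}
\]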
which rearranges to
\[
  \frac{d\, m u_\kappa}{d\tau} = m \Gamma_{\mu\nu\kappa} u^\mu u^\nu + q F_{\kappa\mu} u^\mu , \tag{4.42}
\]
where the antisymmetric electromagnetic field tensor Fκµ is defined to be the torsion-free covariant curl of
the electromagnetic potential Aµ ,
\[
  F_{\kappa\mu} \equiv \frac{\partial A_\mu}{\partial x^\kappa} - \frac{\partial A_\kappa}{\partial x^\mu} . \tag{4.43}
\]
The definition (4.43) of the electromagnetic field holds even in the presence of torsion (see §16.5). Splitting
the connection in equation (4.42) into its torsion-free part and the contortion, as done previously in equation (4.14), yields the Lorentz force law for a test particle of mass m and charge q moving in a prescribed
gravitational and electromagnetic field, with or without torsion,
\[
  \frac{\mathring{D} \, m u_\kappa}{D\tau} = q F_{\kappa\mu} u^\mu . \tag{4.44}
\]
Equation (4.44), which involves the torsion-free covariant derivative D̊/Dτ , shows that the Lorentz force law
is unaffected by the presence of torsion.
In flat (Minkowski) space, the spatial components of equation (4.44) reduce to the classical special relativistic Lorentz force law
\[
  \frac{d\mathbf{p}}{dt} = q \left( \mathbf{E} + \mathbf{v} \times \mathbf{B} \right) . \tag{4.45}
\]
In equation (4.45), $\mathbf{p}$ is the 3-momentum and $\mathbf{v}$ is the 3-velocity, related to the 4-momentum and 4-velocity
by $p^k = \{p^t , \mathbf{p}\} = m u^k = m u^t \{1, \mathbf{v}\}$ (note that $d/dt = (1/u^t) \, d/d\tau$). In flat space, the components of the
electric and magnetic fields $\mathbf{E} = \{E_x , E_y , E_z\}$ and $\mathbf{B} = \{B_x , B_y , B_z\}$ are related to the electromagnetic field
tensor $F_{mn}$ by (the signs in the expression (4.46) are arranged precisely so as to agree with the classical
law (4.45))
\[
  F_{mn} =
  \begin{pmatrix}
    0 & -E_x & -E_y & -E_z \\
    E_x & 0 & B_z & -B_y \\
    E_y & -B_z & 0 & B_x \\
    E_z & B_y & -B_x & 0
  \end{pmatrix} ,
  \quad
  F^{mn} =
  \begin{pmatrix}
    0 & E_x & E_y & E_z \\
    -E_x & 0 & B_z & -B_y \\
    -E_y & -B_z & 0 & B_x \\
    -E_z & B_y & -B_x & 0
  \end{pmatrix} . \tag{4.46}
\]
If the electromagnetic 4-potential $A^m$ is written in terms of a classical electric potential $\phi$ and magnetic
3-vector potential $\mathbf{A} \equiv \{A_x , A_y , A_z\}$,
\[
  A^m = \{\phi , \mathbf{A}\} , \tag{4.47}
\]
then in flat space equation (4.43) reduces to the traditional relations for the electric and magnetic fields E
and B in terms of the potentials φ and A,
\[
  \mathbf{E} = - \nabla\phi - \frac{\partial \mathbf{A}}{\partial t} , \quad
  \mathbf{B} = \nabla \times \mathbf{A} , \tag{4.48}
\]
where ∇ ≡ {∂/∂x, ∂/∂y, ∂/∂z} is the spatial 3-gradient.
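The agreement between the covariant force law (4.44) and the classical law (4.45) in flat space can be checked numerically. The Python sketch below builds the covariant field tensor of equation (4.46) from given E and B, contracts it with a 4-velocity, and compares the spatial components with q(E + v × B). It assumes units with c = 1 and metric signature (−,+,+,+); the function names are hypothetical, chosen for this illustration.

```python
import math

def field_tensor(E, B):
    """Covariant F_{mn} in flat space, laid out as in equation (4.46)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, Bz, -By],
            [Ey, -Bz, 0.0, Bx],
            [Ez, By, -Bx, 0.0]]

def four_velocity(v):
    """u^k = u^t {1, v} with u^t = 1 / sqrt(1 - v^2)."""
    ut = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [ut] + [ut * c for c in v]

def covariant_force(q, E, B, v):
    """Right hand side of (4.44): q F_{km} u^m, for k = t, x, y, z."""
    F, u = field_tensor(E, B), four_velocity(v)
    return [q * sum(F[k][m] * u[m] for m in range(4)) for k in range(4)]

def classical_force(q, E, B, v):
    """Right hand side of (4.45): q (E + v x B)."""
    cross = (v[1] * B[2] - v[2] * B[1],
             v[2] * B[0] - v[0] * B[2],
             v[0] * B[1] - v[1] * B[0])
    return [q * (E[i] + cross[i]) for i in range(3)]
```

Dividing the spatial components of `covariant_force` by u^t converts d/dτ to d/dt, after which they should coincide with `classical_force`. The time component reduces to −q u^t E·v, the rate of work done by the electric field with the sign appropriate to the covariant component p_t.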
4.8 Symmetries and constants of motion
If a spacetime possesses a symmetry of some kind, then a test particle moving in that spacetime possesses
an associated constant of motion. The Lagrangian formalism makes it transparent how to relate symmetries
to constants of motion.
Suppose that the Lagrangian of a particle has some spacetime symmetry, such as time translation symmetry, or spatial translation symmetry, or rotational symmetry. In a suitable coordinate system, the symmetry
is expressed by the condition that the Lagrangian L is independent of some coordinate, call it ξ. In the case
of time translation symmetry, for example, the coordinate would be a suitable time coordinate t. Coordinate
independence requires that the metric gµν , along with any other field, such as an electromagnetic field, that
may affect the particle’s motion, is independent of the coordinate ξ. Then the Euler-Lagrange equations
of motion (4.5) imply that the derivative of the covariant ξ-component $\pi_\xi$ of the conjugate momentum of
the particle vanishes along the trajectory of the particle,
\[
  \frac{d\pi_\xi}{d\lambda} = \frac{\partial L}{\partial \xi} = 0 . \tag{4.49}
\]
Thus the covariant momentum πξ is a constant of motion,
πξ = constant .
(4.50)
4.9 Conformal symmetries
Sometimes the Lagrangian possesses a weaker kind of symmetry, called conformal symmetry, in which
the Lagrangian L depends on a coordinate ξ only through an overall scaling of the Lagrangian,
\[
  L = e^{2\xi} \tilde{L} , \tag{4.51}
\]
where the conformal Lagrangian $\tilde{L}$ is independent of $\xi$. The factor $e^{\xi}$ is called a conformal factor. The
Euler-Lagrange equation of motion (4.5) for the conformal coordinate $\xi$ is then
\[
  \frac{d\pi_\xi}{d\lambda} = \frac{\partial L}{\partial \xi} = 2 L . \tag{4.52}
\]
As an example, consider a test particle moving in a spacetime with conformally symmetric metric
\[
  g_{\mu\nu} = e^{2\xi} \tilde{g}_{\mu\nu} , \tag{4.53}
\]
where the conformal metric g̃µν is independent of the coordinate ξ. The effective Lagrangian L′m of the test
particle is given by equation (4.25). The equation of motion (4.52) becomes
\[
  \frac{dp_\xi}{d\lambda} = 2 L'_m = - m^2 . \tag{4.54}
\]
If the test particle is massive, $m \neq 0$, then equation (4.54) integrates to
\[
  p_\xi = - m \tau , \tag{4.55}
\]
where a constant of integration has been absorbed, without loss of generality, into a shift of the zero point
of the proper time τ of the particle. If the test particle is massless, m = 0, then equation (4.54) implies that
pξ = constant .
(4.56)
Exercise 4.4 Geodesics in Rindler space. The Rindler line-element (2.103) can be written
\[
  ds^2 = e^{2\xi} \left( - d\alpha^2 + d\xi^2 \right) + dy^2 + dz^2 , \tag{4.57}
\]
where the Rindler coordinates $\alpha$ and $\xi$ are related to Minkowski coordinates $t$ and $x$ by
\[
  t = e^{\xi} \sinh\alpha , \quad x = e^{\xi} \cosh\alpha . \tag{4.58}
\]
What are the constants of motion of a test particle? Integrate the Euler-Lagrange equations of motion.
Solution. The Rindler metric is independent of the coordinates α, y, and z. The three corresponding
constants of motion are
\[
  p_\alpha , \quad p_y , \quad p_z . \tag{4.59}
\]
A fourth integral of motion follows from conservation of rest mass
\[
  p_\nu p^\nu = - m^2 . \tag{4.60}
\]
Figure 4.2 (axes $t$, $x$) Rindler wedge of Minkowski space. Purple and blue lines are lines of constant Rindler time $\alpha$ and constant
Rindler spatial coordinate $\xi$ respectively. The grid of lines is equally spaced by 0.2 in each of $\alpha$ and $\xi$. The Rindler
coordinates $\alpha$ and $\xi$, each extending over the interval $(-\infty, \infty)$, cover only the $x > |t|$ quadrant of Minkowski space.
The fact that the Rindler metric is conformally Minkowski in $\alpha$ and $\xi$ (the line-element is proportional to $-d\alpha^2 + d\xi^2$,
equation (4.57)) shows up in the fact that small areal elements of the $\alpha$–$\xi$ grid are rhombi with null (45°) diagonals.
The straight black line is a representative geodesic. The solid dot marks the point where the geodesic goes through
$\{\alpha_0 , \xi_0\}$. Open circles mark $\alpha = \mp\infty$, where the geodesic passes through the null boundaries $t = \mp x$ of the Rindler
wedge.
Equation (4.60) rearranges to give
\[
  \frac{d\xi}{d\lambda} \equiv p^\xi = e^{-\xi} \sqrt{ (e^{-\xi} p_\alpha)^2 - \mu^2 } , \tag{4.61}
\]
where $\mu$ is the positive constant
\[
  \mu \equiv \sqrt{ p_y^2 + p_z^2 + m^2 } . \tag{4.62}
\]
Equation (4.61) integrates to give ξ as a function of λ,
\[
  e^{2\xi} = \frac{p_\alpha^2}{\mu^2} - \mu^2 \lambda^2 , \tag{4.63}
\]
where a constant of integration has been absorbed without loss of generality into a shift of the zero point of
the affine parameter λ along the trajectory of the particle. The coordinate ξ passes through its maximum
value ξ0 where λ = 0, at which point
\[
  e^{\xi_0} = - \frac{p_\alpha}{\mu} , \tag{4.64}
\]
the sign coming from the fact that $p_\alpha = g_{\alpha\alpha} p^\alpha = -e^{2\xi} \, d\alpha/d\lambda$ must be negative, since the particle must move
forward in Rindler time α. The trajectory is illustrated in Figure 4.2; the trajectory is of course a straight
line in the parent Minkowski space.
The evolution equation (4.63) for ξ(λ) can be derived alternatively from the Euler-Lagrange equation for
ξ,
\[
  \frac{dp_\xi}{d\lambda} = - \mu^2 . \tag{4.65}
\]
The Euler-Lagrange equation (4.65) integrates to
\[
  p_\xi = - \mu^2 \lambda , \tag{4.66}
\]
where a constant of integration has again been absorbed into a shift of the zero point of the affine parameter
$\lambda$ (this choice is consistent with the previous one). Given that $p_\xi = g_{\xi\xi} p^\xi = e^{2\xi} \, d\xi/d\lambda$, equation (4.66)
integrates to yield the same result (4.63), the constant of integration being established by the rest-mass
relation (4.60).
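The evolution of Rindler time $\alpha$ along the particle’s trajectory follows from integrating $p_\alpha = g_{\alpha\alpha} p^\alpha =
-e^{2\xi} \, d\alpha/d\lambda$, which, with $p_\alpha = -\mu e^{\xi_0}$ from equation (4.64) so that $\alpha$ increases with $\lambda$, gives
\[
  \alpha - \alpha_0 = \frac{1}{2} \ln \frac{ e^{\xi_0} + \mu\lambda }{ e^{\xi_0} - \mu\lambda } , \tag{4.67}
\]
where $\alpha_0$ is the value of $\alpha$ for $\lambda = 0$, where $\xi$ takes its maximum $\xi_0$. The Rindler time coordinate $\alpha$ varies
between limits $\mp\infty$ at $\mu\lambda = \mp e^{\xi_0}$.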
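The remark that the trajectory is a straight line in the parent Minkowski space can be confirmed numerically. The Python sketch below evaluates ξ(λ) from equation (4.63), with $p_\alpha^2/\mu^2 = e^{2\xi_0}$ from (4.64), and takes α(λ) from integrating dα/dλ = −p_α e^{−2ξ} with p_α negative, which gives α − α0 = artanh(μλ e^{−ξ0}). It then maps to Minkowski coordinates with equation (4.58). The function names and sample parameter values are hypothetical.

```python
import math

def rindler_geodesic(lam, mu, xi0, alpha0):
    """Rindler coordinates (alpha, xi) at affine parameter lam, valid for
    |mu * lam| < exp(xi0)."""
    e2xi = math.exp(2.0 * xi0) - (mu * lam) ** 2        # equation (4.63)
    xi = 0.5 * math.log(e2xi)
    alpha = alpha0 + math.atanh(mu * lam * math.exp(-xi0))
    return alpha, xi

def to_minkowski(alpha, xi):
    """Minkowski (t, x) from Rindler (alpha, xi), equation (4.58)."""
    return math.exp(xi) * math.sinh(alpha), math.exp(xi) * math.cosh(alpha)

def minkowski_slope(lam_a, lam_b, mu, xi0, alpha0):
    """Slope dx/dt of the chord between two points on the trajectory."""
    ta, xa = to_minkowski(*rindler_geodesic(lam_a, mu, xi0, alpha0))
    tb, xb = to_minkowski(*rindler_geodesic(lam_b, mu, xi0, alpha0))
    return (xb - xa) / (tb - ta)
```

If the trajectory is straight, the slope returned by `minkowski_slope` is the same for every pair of points. Working through the hyperbolic identities suggests that it equals tanh α0, the Minkowski velocity in x of a particle whose Rindler velocity dξ/dα vanishes at α0.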
4.10 (Super-)Hamiltonian
The Lagrangian approach characterizes the paths of particles through spacetime in terms of their 4N coordinates xµ and corresponding velocities dxµ /dλ along those paths. The Hamiltonian approach on the other
hand characterizes the paths of particles through spacetime in terms of 4N coordinates xµ and the 4N generalized momenta πµ , which are treated as independent from the coordinates. In the Hamiltonian approach,
the Hamiltonian H(xµ , πµ ) is considered to be a function of coordinates and generalized momenta, and
the action is minimized with respect to independent variations of those coordinates and momenta. In the
Hamiltonian approach, the coordinates and momenta are treated essentially on an equal footing.
The Hamiltonian H can be defined in terms of the Lagrangian L by
\[
  H \equiv \pi_\mu \frac{dx^\mu}{d\lambda} - L . \tag{4.68}
\]
Here, as previously in §4.1, the parameter λ is to be regarded as an arbitrary parameter that labels the
path of the system through the 8N -dimensional phase space of coordinates and momenta of the N particles.
Misner, Thorne, and Wheeler (1973) call the Hamiltonian (4.68) the super-Hamiltonian, to distinguish
it from the conventional Hamiltonian, equation (4.74), where the parameter λ is taken equal to the time
coordinate t. Here however the super-Hamiltonian (4.68) is simply referred to as the Hamiltonian, for brevity.
In terms of the Hamiltonian (4.68), the action (4.1) is
\[
  S = \int_{\lambda_i}^{\lambda_f} \left( \pi_\mu \frac{dx^\mu}{d\lambda} - H \right) d\lambda . \tag{4.69}
\]
In accordance with Hamilton’s principle of least action, the action must be varied with respect to the
coordinates and momenta along the path. The variation of the first term in the integrand of equation (4.69)
can be written
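$$ \delta \left( \pi_\mu \frac{dx^\mu}{d\lambda} \right) = \delta\pi_\mu \frac{dx^\mu}{d\lambda} + \pi_\mu \frac{d \delta x^\mu}{d\lambda} = \delta\pi_\mu \frac{dx^\mu}{d\lambda} + \frac{d}{d\lambda} \left( \pi_\mu \delta x^\mu \right) - \frac{d\pi_\mu}{d\lambda} \delta x^\mu . \tag{4.70} $$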
The middle term on the right hand side of equation (4.70) yields a surface term on integration. Thus the
variation of the action is
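$$ \delta S = \left[ \pi_\mu \delta x^\mu \right]_{\lambda_i}^{\lambda_f} + \int_{\lambda_i}^{\lambda_f} \left[ - \left( \frac{d\pi_\mu}{d\lambda} + \frac{\partial H}{\partial x^\mu} \right) \delta x^\mu + \left( \frac{dx^\mu}{d\lambda} - \frac{\partial H}{\partial \pi_\mu} \right) \delta\pi_\mu \right] d\lambda , \tag{4.71} $$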
which takes into account that the Hamiltonian is to be considered a function H(xµ , πµ ) of coordinates and
momenta. The principle of least action requires that the action is a minimum with respect to variations of
the coordinates and momenta along the paths of particles, the coordinates and momenta at the endpoints
λi and λf of the integration being held fixed. Since the coordinates are fixed at the endpoints, δxµ = 0, the
surface term in equation (4.71) vanishes. Minimization of the action with respect to arbitrary independent
variations of the coordinates and momenta then yields Hamilton’s equations of motion
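$$ \frac{dx^\mu}{d\lambda} = \frac{\partial H}{\partial \pi_\mu} , \qquad \frac{d\pi_\mu}{d\lambda} = - \frac{\partial H}{\partial x^\mu} . \tag{4.72} $$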
Action principle for point particles
4.11 Conventional Hamiltonian
The conventional Hamiltonian of classical mechanics is not the same as the super-Hamiltonian (4.68). In the
conventional approach, the parameter λ is set equal to the time coordinate t. The Lagrangian is taken to be
a function L(t, xα , dxα/dt) of time t and of the 3N spatial coordinates xα and 3N spatial velocities dxα/dt.
The generalized momenta are defined to be, analogously to (4.6),
$$ \pi_\alpha \equiv \frac{\partial L}{\partial (dx^\alpha / dt)} . \tag{4.73} $$
The conventional Hamiltonian is taken to be a function $H(t, x^\alpha, \pi_\alpha)$ of time t and of the 3N spatial coordinates $x^\alpha$ and corresponding 3N generalized momenta $\pi_\alpha$. The conventional Hamiltonian is related to the conventional Lagrangian by
$$ H \equiv \pi_\alpha \frac{dx^\alpha}{dt} - L . \tag{4.74} $$
The conventional Hamilton’s equations are
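$$ \frac{dx^\alpha}{dt} = \frac{\partial H}{\partial \pi_\alpha} , \qquad \frac{d\pi_\alpha}{dt} = - \frac{\partial H}{\partial x^\alpha} . \tag{4.75} $$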
The advantage of the super-Hamiltonian (4.68) over the conventional Hamiltonian (4.74) in general relativity will become apparent in the sections following.
4.12 Conventional Hamiltonian for a test particle
The test-particle Lagrangian (4.8) is
$$ L_m = -m \sqrt{ - g_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} } . \tag{4.76} $$
The corresponding test-particle Hamiltonian is supposedly given by equation (4.68). However, one runs into
a difficulty. The Hamiltonian is supposed to be expressed in terms of coordinates xµ and momenta pµ . But
the expression (4.68) for the Hamiltonian depends on the arbitrary parameter λ, whereas as seen in §4.3 the
coordinates xµ and momenta pµ are (before the least action principle is applied) independent of the choice
of λ. There is no way to express the Hamiltonian in the prescribed form without imposing some additional
constraint on λ. Two ways to achieve this are described in the next two sections, §4.13 and §4.14.
A third approach is to revert to the conventional approach of fixing the arbitrary parameter λ equal to
coordinate time t. This choice eliminates the time coordinate and corresponding generalized momentum as
parameters to be determined by the least action principle. It also breaks manifest covariance, by singling out
the time coordinate for special treatment. For simplicity, consider flat space, where the metric is Minkowski
ηmn . The Lagrangian (4.76) becomes
$$ L_m = -m \sqrt{ - \eta_{mn} \frac{dx^m}{dt} \frac{dx^n}{dt} } = -m \sqrt{1 - v^2} , \tag{4.77} $$
where $v \equiv \sqrt{\eta_{ab} v^a v^b}$ is the magnitude of the 3-velocity $v^a$,
$$ v^a \equiv \frac{dx^a}{dt} . \tag{4.78} $$
The generalized momentum $\pi_a$ defined by (4.73) equals the ordinary momentum $p_a$,
$$ \pi_a = p_a \equiv \frac{m v_a}{\sqrt{1 - v^2}} . \tag{4.79} $$
The Hamiltonian (4.74) is
$$ H = p_a v^a - L = \frac{m}{\sqrt{1 - v^2}} . \tag{4.80} $$
Expressed in terms of the spatial momenta $p_a$, the Hamiltonian is
$$ H = \sqrt{p^2 + m^2} , \tag{4.81} $$
where $p \equiv \sqrt{\eta^{ab} p_a p_b}$ is the magnitude of the 3-momentum $p_a$. Hamilton’s equations (4.75) are
$$ \frac{dx^a}{dt} = \frac{p^a}{\sqrt{p^2 + m^2}} , \qquad \frac{dp_a}{dt} = 0 . \tag{4.82} $$
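The flat-space relations (4.77) to (4.82) can be checked numerically. The following is a minimal sketch, assuming units with c = 1 and arbitrary illustrative values for the mass and 3-velocity; the function names are hypothetical and chosen only for this example.

```python
import math

def lagrangian(m, v):
    """Free-particle Lagrangian L = -m sqrt(1 - v^2), equation (4.77)."""
    return -m * math.sqrt(1.0 - sum(c * c for c in v))

def momentum(m, v):
    """Momentum p_a = m v_a / sqrt(1 - v^2), equation (4.79)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [m * gamma * c for c in v]

def hamiltonian_from_velocity(m, v):
    """Legendre transform H = p_a v^a - L, equation (4.80)."""
    p = momentum(m, v)
    return sum(pa * va for pa, va in zip(p, v)) - lagrangian(m, v)

def hamiltonian(m, p):
    """H = sqrt(p^2 + m^2), equation (4.81)."""
    return math.sqrt(sum(c * c for c in p) + m * m)

def velocity_from_hamiltonian(m, p, h=1e-6):
    """dx^a/dt = dH/dp_a, first of equations (4.82), by central differences."""
    v = []
    for a in range(len(p)):
        up = list(p)
        dn = list(p)
        up[a] += h
        dn[a] -= h
        v.append((hamiltonian(m, up) - hamiltonian(m, dn)) / (2 * h))
    return v

m = 2.0                      # illustrative rest mass
v = [0.3, -0.4, 0.5]         # illustrative 3-velocity, |v|^2 = 0.5 < 1
p = momentum(m, v)
H_legendre = hamiltonian_from_velocity(m, v)   # from (4.80)
H_momentum = hamiltonian(m, p)                 # from (4.81)
v_back = velocity_from_hamiltonian(m, p)       # should reproduce v
```

The two values of the Hamiltonian agree, and differentiating (4.81) with respect to the momenta recovers the velocity that was put in, as the first of Hamilton’s equations (4.82) requires.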
The Hamiltonian (4.81) can be recognized as the energy of the particle, or minus the covariant time component of the 4-momentum,
$$ H = -p_0 . \tag{4.83} $$
A similar, more complicated, analysis in curved space leads to the same conclusion, that the conventional Hamiltonian H is minus the covariant time component of the 4-momentum,
$$ H = -p_t . \tag{4.84} $$
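The expression for the Hamiltonian in terms of spatial coordinates $x^\alpha$ and momenta $p_\alpha$ can be inferred from conservation of rest mass,
$$ g^{\mu\nu} p_\mu p_\nu + m^2 = 0 . \tag{4.85} $$
Explicitly, solving the quadratic (4.85) for $p_t$, the conventional Hamiltonian is
$$ H = -p_t = \frac{1}{g^{tt}} \left[ g^{t\alpha} p_\alpha - \sqrt{ ( g^{t\alpha} g^{t\beta} - g^{tt} g^{\alpha\beta} ) p_\alpha p_\beta - g^{tt} m^2 } \right] , \tag{4.86} $$
the sign of the square root being the one for which, with $g^{tt} < 0$, equation (4.86) reduces to (4.81) in flat space.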
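Solving the mass-shell condition (4.85) for $-p_t$ can be sketched in a few lines. The example below is a minimal illustration, assuming signature −+++ (so $g^{tt} < 0$) and choosing the root that gives positive energy in flat space; the inverse metric with a time-space cross term uses made-up illustrative numbers, and the function names are hypothetical.

```python
import math

def conventional_hamiltonian(ginv, p_space, m):
    """Return H = -p_t solving g^{mu nu} p_mu p_nu + m^2 = 0, equation (4.85).

    ginv is the 4x4 inverse metric with index 0 = t; p_space holds the
    covariant spatial momenta p_alpha. The root chosen is the one that is
    positive when g^{tt} < 0 (an assumption of this sketch).
    """
    gtt = ginv[0][0]
    cross = sum(ginv[0][a + 1] * p_space[a] for a in range(3))      # g^{t alpha} p_alpha
    spatial = sum(ginv[a + 1][b + 1] * p_space[a] * p_space[b]
                  for a in range(3) for b in range(3))              # g^{alpha beta} p_alpha p_beta
    disc = cross * cross - gtt * (spatial + m * m)
    return (cross - math.sqrt(disc)) / gtt

def mass_shell(ginv, p, m):
    """Left hand side of equation (4.85) for a full covariant 4-momentum p."""
    return sum(ginv[i][j] * p[i] * p[j] for i in range(4) for j in range(4)) + m * m

minkowski = [[-1.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]
# Illustrative inverse metric with a g^{tx} cross term (hypothetical values).
ginv_cross = [[-1.0, 0.3, 0.0, 0.0],
              [0.3, 0.91, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]]

m = 1.0
p_space = [0.5, -0.2, 0.1]
H_flat = conventional_hamiltonian(minkowski, p_space, m)
H_cross = conventional_hamiltonian(ginv_cross, p_space, m)
residual = mass_shell(ginv_cross, [-H_cross] + p_space, m)
```

In the Minkowski case the result is $\sqrt{p^2 + m^2}$, equation (4.81), and in the second case substituting $p_t = -H$ back into (4.85) gives a residual at the level of rounding error.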
In the presence of an electromagnetic field, replace the momenta $p_t$ and $p_\alpha$ in equation (4.86) by $p_\mu = \pi_\mu - q A_\mu$, and set the Hamiltonian equal to $-\pi_t$,
$$ H = -\pi_t . \tag{4.87} $$
The super-Hamiltonians (4.90) and (4.96) derived in the next two sections are more elegant than the
conventional Hamiltonian (4.86). All lead to the same equations of motion, but the super-Hamiltonian
exhibits general covariance more clearly.
4.13 Effective (super-)Hamiltonian for a test particle with electromagnetism
In the effective approach, the condition (4.29) on the parameter λ is applied after equations of motion are
derived. The effective test-particle Lagrangian (4.25), coupled to electromagnetism, is
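$$ L = L_m + L_q = \frac{1}{2} g_{\mu\nu} \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} + q A_\mu \frac{dx^\mu}{d\lambda} , \tag{4.88} $$
where the metric $g_{\mu\nu}$ and electromagnetic potential $A_\mu$ are considered to be given functions of the coordinates $x^\mu$. The corresponding generalized momentum (4.6) is
$$ \pi_\mu = g_{\mu\nu} \frac{dx^\nu}{d\lambda} + q A_\mu . \tag{4.89} $$
The (super-)Hamiltonian (4.68) expressed in terms of coordinates $x^\mu$ and momenta $\pi_\mu$ as required is
$$ H = \frac{1}{2} g^{\mu\nu} ( \pi_\mu - q A_\mu ) ( \pi_\nu - q A_\nu ) . \tag{4.90} $$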
Hamilton’s equations (4.72) are
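$$ \frac{dx^\mu}{d\lambda} = p^\mu , \qquad \frac{dp_\kappa}{d\lambda} = \Gamma_{\mu\nu\kappa} p^\mu p^\nu + q F_{\kappa\mu} p^\mu , \tag{4.91} $$
where $p_\mu$ is defined by
$$ p_\mu \equiv \pi_\mu - q A_\mu . \tag{4.92} $$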
The equations of motion (4.91) having been derived from the Hamiltonian (4.90), the parameter λ is set equal to the affine parameter in accordance with condition (4.29). In particular, the first of equations (4.91) together with condition (4.29) implies that $p^\mu = m \, dx^\mu/d\tau$, as it should be. The equations of motion (4.91) thus reproduce the equations (4.42) derived in the Lagrangian approach. The value of the Hamiltonian (4.90) after the equations of motion and condition (4.29) are imposed is constant,
$$ H = - \frac{m^2}{2} . \tag{4.93} $$
Recall that the super-Hamiltonian H is a scalar, associated with rest mass, to be distinguished from the conventional Hamiltonian, which is the time component of a vector, associated with energy. The minus sign in equation (4.93) is associated with the choice of metric signature −+++, where scalar products of timelike quantities are negative. The negative Hamiltonian (4.93) signifies that the particle is propagating along a timelike direction. If the particle is massless, m = 0, then the Hamiltonian is zero (after equations of motion are imposed), signifying that the particle is propagating along a null direction.
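The constancy of the effective super-Hamiltonian, equation (4.93), can be illustrated numerically. The sketch below is a simplified assumption-laden example, not a case treated in the text: flat spacetime with one time and one space dimension, a uniform field described by the potential $A_t = -E x$, $A_x = 0$ (the sign convention is a choice made for this example), and a fourth-order Runge-Kutta integrator. All names and numerical values are illustrative.

```python
import math

M, Q, E = 1.0, 1.0, 0.5   # mass, charge, field strength (illustrative values)

def kinetic_momentum(state):
    """Return (p_t, p_x) = pi_mu - q A_mu, equation (4.92), for A_t = -E x, A_x = 0."""
    t, x, pi_t, pi_x = state
    return pi_t + Q * E * x, pi_x

def hamiltonian(state):
    """Effective super-Hamiltonian (4.90) with eta^{mu nu} = diag(-1, 1)."""
    p_t, p_x = kinetic_momentum(state)
    return 0.5 * (-p_t * p_t + p_x * p_x)

def rhs(state):
    """Hamilton's equations (4.72) for the state (t, x, pi_t, pi_x)."""
    p_t, p_x = kinetic_momentum(state)
    return (-p_t,          # dt/dlambda = dH/dpi_t = p^t
            p_x,           # dx/dlambda = dH/dpi_x = p^x
            0.0,           # dpi_t/dlambda = -dH/dt = 0, since t is ignorable
            Q * E * p_t)   # dpi_x/dlambda = -dH/dx

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h in the parameter lambda."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

state = (0.0, 0.0, -M, 0.0)      # initially at rest at x = 0: p_t = -m, p_x = 0
H_start = hamiltonian(state)
steps, h = 1000, 0.002           # integrate from lambda = 0 to lambda = 2
for _ in range(steps):
    state = rk4_step(state, h)
H_end = hamiltonian(state)
```

The Hamiltonian starts at $-m^2/2$ and stays there to integration accuracy, while $\pi_t$ is conserved exactly because the time coordinate is ignorable. For this field the momentum follows $p_x = -m \sinh(qE\lambda)$, which the integration reproduces.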
4.14 Nice (super-)Hamiltonian for a test particle with electromagnetism
The nice test-particle Lagrangian (4.31), coupled to electromagnetism, is
$$ L = \frac{\mu}{2} \left( g_{\mu\nu} \frac{dx^\mu}{\mu \, d\lambda} \frac{dx^\nu}{\mu \, d\lambda} - m^2 \right) + q A_\mu \frac{dx^\mu}{d\lambda} . \tag{4.94} $$
The corresponding generalized momentum (4.6) is
$$ \pi_\mu = g_{\mu\nu} \frac{dx^\nu}{\mu \, d\lambda} + q A_\mu . \tag{4.95} $$
The associated nice (super-)Hamiltonian (4.68) expressed in terms of coordinates $x^\mu$ and momenta $\pi_\mu$ as required is
$$ H = \frac{\mu}{2} \left[ g^{\mu\nu} ( \pi_\mu - q A_\mu ) ( \pi_\nu - q A_\nu ) + m^2 \right] . \tag{4.96} $$
The nice Hamiltonian H, equation (4.96), depends on the auxiliary parameter µ as well as on xµ and πµ ,
and the action must be varied with respect to all of these to obtain all the equations of motion. Compared
to the variation (4.71), the variation of the action contains an additional term proportional to δµ:
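$$ \delta S = \left[ \pi_\mu \delta x^\mu \right]_{\lambda_i}^{\lambda_f} + \int_{\lambda_i}^{\lambda_f} \left[ - \left( \frac{d\pi_\mu}{d\lambda} + \frac{\partial H}{\partial x^\mu} \right) \delta x^\mu + \left( \frac{dx^\mu}{d\lambda} - \frac{\partial H}{\partial \pi_\mu} \right) \delta\pi_\mu - \frac{\partial H}{\partial \mu} \delta\mu \right] d\lambda . \tag{4.97} $$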
Requiring that the variation (4.97) of the action vanish under arbitrary variations of the coordinates xµ and
momenta πµ yields Hamilton’s equations (4.72), which here are
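$$ \frac{dx^\mu}{\mu \, d\lambda} = p^\mu , \qquad \frac{dp_\kappa}{\mu \, d\lambda} = \Gamma_{\mu\nu\kappa} p^\mu p^\nu + q F_{\kappa\mu} p^\mu , \tag{4.98} $$
with $p_\mu$ defined by
$$ p_\mu \equiv \pi_\mu - q A_\mu . \tag{4.99} $$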
The condition (4.103) found below, substituted into the first of Hamilton’s equations (4.98), implies that pµ
coincides with the usual ordinary momentum pµ = m dxµ /dτ , as it should. Requiring that the variation (4.97)
of the action vanish under arbitrary variation of the parameter µ yields the additional equation of motion
$$ \frac{\partial H}{\partial \mu} = 0 . \tag{4.100} $$
The additional equation of motion (4.100) applied to the Hamiltonian (4.96) implies that
$$ g^{\mu\nu} ( \pi_\mu - q A_\mu ) ( \pi_\nu - q A_\nu ) = -m^2 . \tag{4.101} $$
From the first of the equations of motion (4.98) along with the definition (4.99), equation (4.101) is the same
as
$$ g_{\mu\nu} \frac{dx^\mu}{\mu \, d\lambda} \frac{dx^\nu}{\mu \, d\lambda} = -m^2 , \tag{4.102} $$
which in turn is equivalent to
$$ \mu \, d\lambda = \frac{d\tau}{m} , \tag{4.103} $$
recovering equation (4.35) derived using the Lagrangian formalism. Inserting the condition (4.103) into
Hamilton’s equations (4.98) recovers the equations of motion (4.42) for a test particle in a prescribed gravitational and electromagnetic field. The value of the Hamiltonian (4.96) after the equation of motion (4.101)
is imposed is zero,
$$ H = 0 . \tag{4.104} $$
4.15 Derivatives of the action
Besides being a scalar whose minimum value between fixed endpoints defines the path between those points,
the action S can also be treated as a function of its endpoints along the actual path. Along the actual path,
the equations of motion are satisfied, so the integral in the variation (4.4) or (4.71) of the action vanishes
identically. The surface term in the variation (4.4) or (4.71) then implies that δS = πµ δxµ . This means that
the partial derivatives of the action with respect to the coordinates are equal to the generalized momenta,
$$ \frac{\partial S}{\partial x^\mu} = \pi_\mu . \tag{4.105} $$
This is the basis of the Hamilton-Jacobi method for solving equations of motion, §4.16.
By definition, the total derivative of the action S with respect to the arbitrary parameter λ along the
actual path equals the Lagrangian L. In addition to being a function of the coordinates xµ along the actual
path, the action may also be an explicit function S(λ, xµ ) of the parameter λ. The total derivative of the
action along the path may thus be expressed
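$$ \frac{dS}{d\lambda} = L = \frac{\partial S}{\partial \lambda} + \frac{\partial S}{\partial x^\mu} \frac{dx^\mu}{d\lambda} . \tag{4.106} $$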
Comparing equation (4.106) to the definition (4.68) of the Hamiltonian shows that the partial derivative of
the action with respect to the parameter λ is minus the Hamiltonian
$$ \frac{\partial S}{\partial \lambda} = -H . \tag{4.107} $$
In the conventional approach where the parameter λ is fixed equal to the time coordinate t, equations (4.105) and (4.107) together show that
$$ \frac{\partial S}{\partial t} = \pi_t = -H , \tag{4.108} $$
in agreement with equation (4.87). In the super-Hamiltonian approach, the Hamiltonian H is constant, equal
to −m2 /2 in the effective approach, equation (4.93), and equal to zero in the nice approach, equation (4.104).
Concept question 4.5 Action vanishes along a null geodesic, but its gradient does not. How can it be that the gradient of the action $p_\mu = \partial S/\partial x^\mu$ is non-zero along a null geodesic, yet the variation of the action dS = −m dτ is identically zero along the same null geodesic? Answer. This has to do with the fact that a vector can be finite yet null,
$$ \frac{dS}{d\lambda} = \frac{dx^\mu}{d\lambda} \frac{\partial S}{\partial x^\mu} = \pi^\mu \pi_\mu = -m^2 = 0 \quad \text{for } m = 0 . \tag{4.109} $$
4.16 Hamilton-Jacobi equation
The Hamilton-Jacobi equation provides a powerful way to solve equations of motion. The Hamilton-Jacobi
equation proves to be separable in the Kerr-Newman geometry for an ideal rotating black hole, Chapter 23.
The hypothesis that the Hamilton-Jacobi equation be separable provides one way to derive the Kerr-Newman
line-element, Chapter 22, and to discover other separable spacetimes.
The Hamilton-Jacobi equation is obtained by writing down the expression for the Hamiltonian H in terms of coordinates $x^\mu$ and generalized momenta $\pi_\mu$, and replacing the Hamiltonian H by $-\partial S/\partial\lambda$ in accordance with equation (4.107), and the generalized momenta $\pi_\mu$ by $\partial S/\partial x^\mu$ in accordance with equation (4.105).
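For the effective Hamiltonian (4.90), the resulting Hamilton-Jacobi equation is
$$ - \frac{\partial S}{\partial \lambda} = \frac{1}{2} g^{\mu\nu} \left( \frac{\partial S}{\partial x^\mu} - q A_\mu \right) \left( \frac{\partial S}{\partial x^\nu} - q A_\nu \right) , \tag{4.110} $$
whose left hand side is $-m^2/2$, equation (4.93). For the nice Hamiltonian (4.96), the resulting Hamilton-Jacobi equation is
$$ - \frac{\partial S}{\mu \, \partial \lambda} = \frac{1}{2} \left[ g^{\mu\nu} \left( \frac{\partial S}{\partial x^\mu} - q A_\mu \right) \left( \frac{\partial S}{\partial x^\nu} - q A_\nu \right) + m^2 \right] , \tag{4.111} $$
whose left hand side is zero, equation (4.104). The Hamilton-Jacobi equations (4.110) and (4.111) agree, as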
they should. The Hamilton-Jacobi equation (4.110) or (4.111) is a partial differential equation for the action
S(λ, xµ ). In spacetimes with sufficient symmetry, such as Kerr-Newman, the partial differential equation can
be solved by separation of variables. This will be done in §22.3.
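As a minimal worked example, assuming flat Minkowski spacetime with no electromagnetic field (an illustrative special case chosen here, not one worked in the text), a trial solution of the effective Hamilton-Jacobi equation (4.110) can be checked directly. The constants $p_m$ are separation constants introduced for this sketch.

```latex
% Trial action, linear in \lambda and x^m, with constant p_m:
S(\lambda, x^m) = p_m x^m - \tfrac{1}{2} \eta^{mn} p_m p_n \, \lambda .
% Its partial derivatives are
\frac{\partial S}{\partial x^m} = p_m , \qquad
- \frac{\partial S}{\partial \lambda} = \tfrac{1}{2} \eta^{mn} p_m p_n ,
% so equation (4.110) with g^{\mu\nu} = \eta^{mn} and A_\mu = 0 holds identically.
% Imposing H = -m^2/2, equation (4.93), fixes \eta^{mn} p_m p_n = -m^2, giving
S = p_m x^m + \tfrac{1}{2} m^2 \lambda .
```

Along the straight-line path $x^m = p^m \lambda$ (up to a constant offset) this action satisfies $dS/d\lambda = -m^2/2$, which equals the value of the effective Lagrangian (4.88) on the path, consistent with equation (4.106).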
4.17 Canonical transformations
The Lagrangian equations of motion (4.5) take the same form regardless of the choice of coordinates xµ of
the underlying spacetime. This expresses general covariance: the form of the Lagrangian equations of motion
is unchanged by general coordinate transformations.
Coordinate transformations also preserve Hamilton’s equations of motion (4.72). But the Hamiltonian
formalism allows a wider range of transformations that preserve the form of Hamilton’s equations. Transformations of the coordinates and momenta that preserve Hamilton’s equations are called canonical transformations. The construction of canonical transformations is addressed in §4.17.1.
The wide range of possible canonical transformations means that the coordinates and momenta lose much
of their original meaning as actual spacetime coordinates and momenta of particles. For example, there is
a canonical transformation (4.117) that simply exchanges coordinates and their conjugate momenta. It is
common therefore to refer to general systems of coordinates and momenta that satisfy Hamilton’s equations
as generalized coordinates and generalized momenta, and to denote them by $q^\mu$ and $p_\mu$,
$$ q^\mu , \qquad p_\mu . \tag{4.112} $$
4.17.1 Construction of canonical transformations
Consider a canonical transformation of coordinates and momenta
$$ \{ q^\mu , p_\mu \} \rightarrow \{ q'^\mu (q, p) , p'_\mu (q, p) \} . \tag{4.113} $$
By definition of canonical transformation, both the original and transformed sets of coordinates and momenta
satisfy Hamilton’s equations.
For the equations of motion to take Hamiltonian form, the original and transformed actions S and S ′ must
take the form
$$ S = \int_{\lambda_i}^{\lambda_f} p_\mu \, dq^\mu - H \, d\lambda , \qquad S' = \int_{\lambda_i}^{\lambda_f} p'_\mu \, dq'^\mu - H' \, d\lambda . \tag{4.114} $$
One way for the original and transformed coordinates and momenta to yield equivalent equations of motion
is that the integrands of the actions differ by the total derivative dF of some function F ,
$$ dF = p_\mu \, dq^\mu - p'_\mu \, dq'^\mu - (H - H') \, d\lambda . \tag{4.115} $$
When the actions S and S ′ are varied, the difference in the variations is the difference in the variation of F
between the initial and final points λi and λf , which vanishes provided that whatever F depends on is held
fixed on the initial and final points,
$$ \delta S - \delta S' = \left[ \delta F \right]_{\lambda_i}^{\lambda_f} = 0 . \tag{4.116} $$
Because the variations of the actions are the same, the resulting equations of motion are equivalent. The
function F is called the generator of the canonical transformation between the original and transformed
coordinates.
Given any function $F(\lambda, q, q')$, equation (4.115) determines $p_\mu$, $-p'_\mu$, and $-(H - H')$ as partial derivatives of F with respect to $q^\mu$, $q'^\mu$, and λ. For example, the function $F = \sum_\mu q'^\mu q^\mu$ generates a canonical transformation that simply trades coordinates and momenta,
$$ p_\mu = \frac{\partial F}{\partial q^\mu} = q'^\mu , \qquad p'_\mu = - \frac{\partial F}{\partial q'^\mu} = - q^\mu . \tag{4.117} $$
The generating function $F(\lambda, q, q')$ depends on $q^\mu$ and $q'^\mu$. Other generating functions depending on either of $q^\mu$ or $p_\mu$, and either of $q'^\mu$ or $p'_\mu$, are obtained by subtracting $p_\mu q^\mu$ and/or adding $p'_\mu q'^\mu$ to F. For example, equation (4.115) can be rearranged as
$$ dG = p_\mu \, dq^\mu + q'^\mu \, dp'_\mu - (H - H') \, d\lambda , \tag{4.118} $$
where $G \equiv F + p'_\nu q'^\nu$ is now some function $G(\lambda, q, p')$. For example, the function $G(q, p') = \sum_\mu f^\mu(q) \, p'_\mu$, in which $f^\mu(q)$ is some function of the coordinates $q^\nu$ but not of the momenta $p_\nu$, generates the canonical transformation
$$ p_\mu = \frac{\partial G}{\partial q^\mu} = \sum_\nu \frac{\partial f^\nu}{\partial q^\mu} p'_\nu , \qquad q'^\mu = \frac{\partial G}{\partial p'_\mu} = f^\mu(q) . \tag{4.119} $$
This is just a coordinate transformation $q^\mu \rightarrow q'^\mu = f^\mu(q)$.
If the generator of a canonical transformation does not depend on the parameter λ, then the Hamiltonians
are the same in the original and transformed systems,
$$ H(q^\mu , p_\mu) = H'(q'^\mu , p'_\mu) . \tag{4.120} $$
In the super-Hamiltonian approach, where the parameter λ is arbitrary, the Hamiltonian is without loss of
generality independent of λ, and there is no physical significance to canonical transformations generated by
functions that depend on λ. The super-Hamiltonian H(q µ , pµ ) is then a scalar, invariant with respect to
canonical transformations that do not depend explicitly on λ. This contrasts with the conventional Hamiltonian approach, where the parameter λ is set equal to the coordinate time t, and the conventional Hamiltonian is the time component of a 4-vector, which varies under canonical transformations generated by
functions that depend on time t.
4.17.2 Evolution is a canonical transformation
The evolution of the system from some initial hypersurface λ = 0 to some final hypersurface λ is itself a
canonical transformation. This is evident from the fact that Hamilton’s equations (4.72) hold for any value of
the parameter λ, so in particular Hamilton’s equations are unchanged when initial coordinates and momenta
q µ (0) and pµ (0) are replaced by evolved values q µ (λ) and pµ (λ),
$$ q^\mu(0) \rightarrow q'^\mu = q^\mu(\lambda) , \qquad p_\mu(0) \rightarrow p'_\mu = p_\mu(\lambda) . \tag{4.121} $$
The action varies by the total derivative dS = pµ dq µ − H dλ along the actual path of the system, equation (4.106), so the initial and evolved actions differ by a total derivative, equation (4.115),
$$ dF = p_\mu(0) \, dq^\mu(0) - p_\mu(\lambda) \, dq^\mu(\lambda) - \left[ H(0) - H(\lambda) \right] d\lambda = dS(0) - dS(\lambda) . \tag{4.122} $$
Thus the canonical transformation from an initial λ = 0 to a final λ is generated by the difference in the
actions along the actual path of the system,
$$ F = S(0) - S(\lambda) . \tag{4.123} $$
4.18 Symplectic structure
The generalized coordinates q µ and momenta pµ of a dynamical system of particles have a geometrical structure that transcends the geometrical structure of the underlying spacetime manifold. For N coordinates q µ
and N momenta pµ , the geometrical structure is a 2N -dimensional manifold called a symplectic manifold.
A symplectic manifold is also called phase space, and the coordinates {q µ , pµ } of the manifold are called
phase-space coordinates.
A central property of a symplectic manifold is that the Hamiltonian dynamics define a scalar product with
antisymmetric symplectic metric $\omega_{ij}$. Let $z^i$ with i = 1, ..., 2N denote the combined set of 2N generalized coordinates and momenta $\{q^\mu , p_\mu\}$,
$$ \{ z^1 , ..., z^N , z^{N+1} , ..., z^{2N} \} \equiv \{ q^1 , ..., q^N , p_1 , ..., p_N \} . \tag{4.124} $$
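Hamilton’s equations (4.72) can be written
$$ \frac{dz^i}{d\lambda} = \omega^{ij} \frac{\partial H}{\partial z^j} , \tag{4.125} $$
where $\omega^{ij}$ is the antisymmetric symplectic metric (actually the inverse symplectic metric)
$$ \omega^{ij} \equiv \delta_{i+N, j} - \delta_{i, j+N} = \begin{cases} 1 & \text{if } z^i = q^\mu \text{ and } z^j = p_\mu , \\ -1 & \text{if } z^i = p_\mu \text{ and } z^j = q^\mu , \\ 0 & \text{otherwise} . \end{cases} \tag{4.126} $$
As a matrix, the symplectic metric $\omega^{ij}$ is the 2N × 2N matrix
$$ \omega^{ij} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} , \tag{4.127} $$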
where 1 denotes the N × N unit matrix. Inverting the inverse symplectic metric $\omega^{ij}$ yields the symplectic metric $\omega_{ij}$, which is the same matrix but flipped in sign,
$$ \omega_{ij} \equiv (\omega^{ij})^{-1} = (\omega^{ij})^\top = - \omega^{ij} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} . \tag{4.128} $$
Let z ′i be another set of generalized coordinates and momenta satisfying Hamilton’s equations with the same
Hamiltonian H,
$$ \frac{dz'^i}{d\lambda} = \omega^{ij} \frac{\partial H}{\partial z'^j} . \tag{4.129} $$
It is being assumed here that the Hamiltonian H does not depend explicitly on the parameter λ. In the super-Hamiltonian approach, there is no loss of generality in taking the Hamiltonian H to be independent of λ,
since the parameter λ is arbitrary, without physical significance. The important point about equation (4.129)
is that the symplectic metric ω ij is the same regardless of the choice of phase-space coordinates. Under a
canonical transformation z i → z ′i (z) of generalized coordinates and momenta, dz ′i /dλ transforms as
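$$ \frac{dz'^i}{d\lambda} = \frac{\partial z'^i}{\partial z^k} \frac{dz^k}{d\lambda} = \frac{\partial z'^i}{\partial z^k} \, \omega^{kl} \frac{\partial H}{\partial z^l} = \frac{\partial z'^i}{\partial z^k} \, \omega^{kl} \frac{\partial z'^j}{\partial z^l} \frac{\partial H}{\partial z'^j} . \tag{4.130} $$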
Comparing equations (4.129) and (4.130) shows that the symplectic matrix ω ij is invariant under a canonical
transformation,
$$ \omega^{ij} = \frac{\partial z'^i}{\partial z^k} \, \omega^{kl} \frac{\partial z'^j}{\partial z^l} . \tag{4.131} $$
Equation (4.131) can be expressed as the invariance under canonical transformations of
$$ \omega^{ij} \frac{\partial}{\partial z^i} \frac{\partial}{\partial z^j} = \omega^{ij} \frac{\partial}{\partial z'^i} \frac{\partial}{\partial z'^j} . \tag{4.132} $$
Equivalently,
$$ \omega_{ij} \, dz^i \, dz^j = \omega_{ij} \, dz'^i \, dz'^j . \tag{4.133} $$
The invariance of the symplectic metric ωij under canonical transformations can be thought of as analogous
to the invariance of the Minkowski metric ηmn under Lorentz transformations. But whereas the Minkowski
metric ηmn is symmetric, the symplectic metric ωij is antisymmetric.
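The invariance (4.131) can be checked numerically for a concrete canonical transformation. The sketch below is an illustrative assumption: it takes the point transformation from Cartesian to polar coordinates in a plane (N = 2), with momenta transformed as in equation (4.119), and estimates the Jacobian $\partial z'^i/\partial z^k$ by central differences. The function names and the sample phase-space point are hypothetical.

```python
import math

def to_polar(z):
    """Point transformation (x, y, p_x, p_y) -> (r, phi, p_r, p_phi)."""
    x, y, px, py = z
    r = math.hypot(x, y)
    return [r, math.atan2(y, x), (x * px + y * py) / r, x * py - y * px]

def jacobian(fn, z, h=1e-6):
    """Matrix M[i][k] = dz'^i/dz^k estimated by central differences."""
    n = len(z)
    columns = []
    for k in range(n):
        up = list(z)
        dn = list(z)
        up[k] += h
        dn[k] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(fn(up), fn(dn))])
    return [[columns[k][i] for k in range(n)] for i in range(n)]

def symplectic(n):
    """Inverse symplectic metric omega^{ij} of equation (4.127) for N = n."""
    w = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        w[i][i + n] = 1.0
        w[i + n][i] = -1.0
    return w

def transformed_metric(M, w):
    """Right hand side of (4.131): (dz'^i/dz^k) omega^{kl} (dz'^j/dz^l)."""
    n = len(w)
    return [[sum(M[i][k] * w[k][l] * M[j][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

z = [1.2, 0.7, -0.4, 0.9]     # sample point (x, y, p_x, p_y), illustrative
omega = symplectic(2)
omega_new = transformed_metric(jacobian(to_polar, z), omega)
max_dev = max(abs(omega_new[i][j] - omega[i][j])
              for i in range(4) for j in range(4))
```

The transformed matrix agrees with $\omega^{ij}$ to within the finite-difference error, as equation (4.131) requires of any canonical transformation.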
4.19 Symplectic scalar product and Poisson brackets
Let f (z i ) and g(z i ) be two functions of phase-space coordinates z i . Their tangent vectors in the phase space
are ∂f /∂z i and ∂g/∂z i . The symplectic scalar product of the tangent vectors defines the Poisson bracket
of the two functions f and g,
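$$ [f, g] \equiv \omega^{ij} \frac{\partial f}{\partial z^i} \frac{\partial g}{\partial z^j} = \frac{\partial f}{\partial q^\mu} \frac{\partial g}{\partial p_\mu} - \frac{\partial f}{\partial p_\mu} \frac{\partial g}{\partial q^\mu} . \tag{4.134} $$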
The invariance (4.132) of the symplectic metric implies that the Poisson bracket is a scalar, invariant under
canonical transformations of the phase-space coordinates z i . The Poisson bracket is antisymmetric thanks
to the antisymmetry of the symplectic metric ω ij ,
$$ [f, g] = - [g, f] . \tag{4.135} $$
4.19.1 Poisson brackets of phase-space coordinates
The Poisson brackets of the phase-space coordinates and momenta themselves satisfy
$$ [z^i , z^j] = \omega^{ij} . \tag{4.136} $$
Explicitly in terms of the generalized coordinates and momenta $q^\mu$ and $p_\mu$,
$$ [q^\mu , p_\nu] = \delta^\mu_\nu , \qquad [q^\mu , q^\nu] = 0 , \qquad [p_\mu , p_\nu] = 0 . \tag{4.137} $$
Reinterpreting equations (4.137) as operator equations provides a path from classical to quantum mechanics.
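The Poisson bracket (4.134) is straightforward to evaluate numerically. The following is a minimal sketch, assuming N = 2 and central-difference derivatives; the helper names, the sample point, and the test functions f and g are hypothetical choices made for illustration.

```python
def poisson_bracket(f, g, z, n, h=1e-5):
    """[f, g] of equation (4.134) at the point z = (q^1..q^n, p_1..p_n)."""
    def partial(fn, i):
        up = list(z)
        dn = list(z)
        up[i] += h
        dn[i] -= h
        return (fn(up) - fn(dn)) / (2 * h)
    return sum(partial(f, mu) * partial(g, mu + n)
               - partial(f, mu + n) * partial(g, mu)
               for mu in range(n))

def coordinate(i):
    """Return the function that picks out the phase-space coordinate z^i."""
    return lambda z: z[i]

N = 2
z0 = [0.3, -1.1, 0.8, 0.5]    # (q^1, q^2, p_1, p_2), illustrative values

# Brackets of the phase-space coordinates with each other, equation (4.136).
brackets = [[poisson_bracket(coordinate(i), coordinate(j), z0, N)
             for j in range(2 * N)] for i in range(2 * N)]

# Two sample functions: f = (q^1)^2 p_2 and g = q^2 p_1.
def f(z):
    return z[0] ** 2 * z[3]

def g(z):
    return z[1] * z[2]

fg = poisson_bracket(f, g, z0, N)
gf = poisson_bracket(g, f, z0, N)
fg_exact = 2 * z0[0] * z0[1] * z0[3] - z0[0] ** 2 * z0[2]   # from (4.134) by hand
```

The matrix of coordinate brackets reproduces $\omega^{ij}$, equations (4.136) and (4.137), and the bracket of the two sample functions is antisymmetric, equation (4.135).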
4.20 (Super-)Hamiltonian as a generator of evolution
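The Poisson bracket of a function $f(z^i)$ with the Hamiltonian H is
$$ [f, H] = \frac{\partial f}{\partial q^\mu} \frac{\partial H}{\partial p_\mu} - \frac{\partial f}{\partial p_\mu} \frac{\partial H}{\partial q^\mu} . \tag{4.138} $$
Inserting Hamilton’s equations (4.72) implies
$$ [f, H] = \frac{\partial f}{\partial q^\mu} \frac{dq^\mu}{d\lambda} + \frac{\partial f}{\partial p_\mu} \frac{dp_\mu}{d\lambda} = \frac{df}{d\lambda} . \tag{4.139} $$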
That is, the evolution of a function f (q µ , pµ ) of generalized coordinates and momenta is its Poisson bracket
with the Hamiltonian H,
$$ \frac{df}{d\lambda} = [f, H] . \tag{4.140} $$
Equation (4.140) shows that the (super-)Hamiltonian defined by equation (4.68) can be interpreted as generating the evolution of the system.
The same derivation holds in the conventional case where λ is taken to be time t, but generically the
function f (t, q α , pα ) and conventional Hamiltonian H(t, q α , pα ) must be allowed to be explicit functions of
time t as well as of generalized spatial coordinates and momenta q α and pα . Equation (4.140) becomes in
the conventional case
$$ \frac{df}{dt} = \frac{\partial f}{\partial t} + [f, H] . \tag{4.141} $$
4.21 Infinitesimal canonical transformations
A canonical transformation generated by $G = q^\mu p'_\mu$ is the identity transformation, since it leaves the coordinates and momenta unchanged. Consider a canonical transformation with generator infinitesimally shifted from the identity transformation, with ε an infinitesimal parameter,
$$ G = q^\mu p'_\mu + \epsilon \, g(q, p') . \tag{4.142} $$
The resulting canonical transformation is, from equation (4.119),
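$$ q'^\mu = \frac{\partial G}{\partial p'_\mu} = q^\mu + \epsilon \frac{\partial g}{\partial p'_\mu} , \qquad p_\mu = \frac{\partial G}{\partial q^\mu} = p'_\mu + \epsilon \frac{\partial g}{\partial q^\mu} . \tag{4.143} $$
Because ε is infinitesimal, the term $\epsilon \, \partial g/\partial p'_\mu$ can be replaced by $\epsilon \, \partial g/\partial p_\mu$ to linear order, yielding
$$ q'^\mu = q^\mu + \epsilon \frac{\partial g}{\partial p_\mu} , \qquad p'_\mu = p_\mu - \epsilon \frac{\partial g}{\partial q^\mu} . \tag{4.144} $$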
Equations (4.144) imply that the changes $\delta p_\mu$ and $\delta q^\mu$ in the coordinates and momenta under an infinitesimal canonical transformation (4.142) are their Poisson brackets with g,
$$ \delta p_\mu = \epsilon \, [p_\mu , g] , \qquad \delta q^\mu = \epsilon \, [q^\mu , g] . \tag{4.145} $$
As a particular example, the evolution of the system under an infinitesimal change δλ in the parameter λ
is, in accordance with the evolutionary equation (4.140), generated by a canonical transformation with g in
equation (4.142) set equal to the Hamiltonian H,
$$ \delta p_\mu = \delta\lambda \, [p_\mu , H] , \qquad \delta q^\mu = \delta\lambda \, [q^\mu , H] . \tag{4.146} $$
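A generator other than the Hamiltonian gives a simple illustration of equations (4.144) and (4.145). In the sketch below, which is an example assumed for illustration rather than one taken from the text, the generator is $g = x p_y - y p_x$ in a plane; repeated infinitesimal transformations then build up a finite rotation of both coordinates and momenta. The function name and numerical values are hypothetical.

```python
import math

def rotate_infinitesimally(q, p, eps):
    """Apply equation (4.144) once with generator g = x p_y - y p_x."""
    x, y = q
    px, py = p
    dg_dp = (-y, x)        # (dg/dp_x, dg/dp_y)
    dg_dq = (py, -px)      # (dg/dx, dg/dy)
    q_new = (x + eps * dg_dp[0], y + eps * dg_dp[1])   # q' = q + eps dg/dp
    p_new = (px - eps * dg_dq[0], py - eps * dg_dq[1]) # p' = p - eps dg/dq
    return q_new, p_new

theta, n = 0.5, 100000     # total angle and number of infinitesimal steps
q, p = (1.0, 0.0), (0.2, 0.7)
for _ in range(n):
    q, p = rotate_infinitesimally(q, p, theta / n)

# Finite rotation by theta, for comparison.
c, s = math.cos(theta), math.sin(theta)
q_exact = (c * 1.0 - s * 0.0, s * 1.0 + c * 0.0)
p_exact = (c * 0.2 - s * 0.7, s * 0.2 + c * 0.7)
```

Composing many small steps approaches the exact rotation as the step size shrinks, the residual error being second order in ε per step, in line with the linear-order approximation made in passing from (4.143) to (4.144).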
4.22 Constancy of phase-space volume under canonical transformations
The invariance of the symplectic metric under canonical transformations implies the invariance of phase-space
volume under canonical transformations.
The volume V of a region of 2N -dimensional phase space is
$$ V \equiv \int dV \equiv \int dz^1 ... dz^{2N} \equiv \int dq^1 ... dq^N \, dp_1 ... dp_N , \tag{4.147} $$
integrated over the region. Under a canonical transformation $z^i \rightarrow z'^i(z)$ of phase-space coordinates, the phase-space volume element dV transforms by the Jacobian of the transformation, which is the determinant $|\partial z'^i/\partial z^j|$,
$$ dV' = \left| \frac{\partial z'^i}{\partial z^j} \right| dV . \tag{4.148} $$
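But equation (4.131) implies that
$$ \left| \omega^{ij} \right| = \left| \frac{\partial z'^i}{\partial z^k} \right| \left| \omega^{kl} \right| \left| \frac{\partial z'^j}{\partial z^l} \right| , \tag{4.149} $$
so the Jacobian must be 1 in absolute magnitude,
$$ \left| \frac{\partial z'^i}{\partial z^j} \right| = \pm 1 . \tag{4.150} $$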
If the canonical transformation can be obtained by a continuous transformation from the identity, then the
Jacobian must equal 1. As a particular case, the Jacobian equals 1 for the canonical transformation generated
by evolution, §4.22.1, since evolution is continuous from initial to final conditions.
4.22.1 Constancy of phase-space volume under evolution
Since evolution is a canonical transformation, §4.17.2 and §4.21, phase-space volume V is preserved under
evolution of the system. Each phase-space point inside the volume V evolves according to the equations
of motion. As the system of points evolves, the region distorts, but the magnitude of the volume V of the
region remains constant. The constancy of phase-space volume as it evolves was proved explicitly in 1871 by
Boltzmann, who later referred to the result as “Liouville’s theorem” since the proof was based in part on a
mathematical theorem proved by Liouville (see Nolte, 2010).
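Constancy of phase-space volume under evolution can be seen in a small numerical experiment. The sketch below assumes a one-dimensional pendulum with conventional Hamiltonian $H = p^2/2 - \cos q$ (an illustrative system, not one discussed in the text) and follows the three corners of a tiny triangle in phase space with a fourth-order Runge-Kutta integrator. All names and values are hypothetical.

```python
import math

def rhs(z):
    """Hamilton's equations for H = p^2/2 - cos q: dq/dt = p, dp/dt = -sin q."""
    q, p = z
    return (p, -math.sin(q))

def evolve(z, t_end, steps):
    """Integrate a phase-space point from 0 to t_end with classical RK4."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(z)
        k2 = rhs((z[0] + h / 2 * k1[0], z[1] + h / 2 * k1[1]))
        k3 = rhs((z[0] + h / 2 * k2[0], z[1] + h / 2 * k2[1]))
        k4 = rhs((z[0] + h * k3[0], z[1] + h * k3[1]))
        z = (z[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
             z[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)
    return z

def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c in the (q, p) plane."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))

size = 1e-5                                   # small, so the region stays nearly a triangle
corners = [(1.0, 0.5), (1.0 + size, 0.5), (1.0, 0.5 + size)]
evolved = [evolve(z, 3.0, 3000) for z in corners]
ratio = triangle_area(*evolved) / triangle_area(*corners)
```

The triangle shears as it evolves, but the ratio of final to initial area stays equal to 1 up to corrections that shrink with the size of the triangle, since the edges of the evolved region are only approximately straight.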
4.23 Poisson algebra of integrals of motion
A function $f(z^i)$ of the generalized coordinates and momenta is said to be an integral of motion if it is constant as the system evolves. In view of equation (4.140), a function $f(z^i)$ is an integral of motion if and only if its Poisson bracket with the Hamiltonian vanishes,
$$ [f, H] = 0 . \tag{4.151} $$
As a particular example, the antisymmetry of the Poisson bracket implies that the Poisson bracket of the
Hamiltonian with itself is zero,
[H, H] = 0 ,
(4.152)
so the Hamiltonian H is itself a constant of motion. The super-Hamiltonian H is a constant of motion in
general, while the conventional Hamiltonian H is constant provided that it does not depend explicitly on
time t.
Suppose that $f(z^i)$ and $g(z^i)$ are both integrals of motion. Then their Poisson bracket with each other is also an integral of motion,
$$ [[f, g], H] = - [[g, H], f] - [[H, f], g] = 0 , \tag{4.153} $$
the first equality of which expresses the Jacobi identity, and the last equality of which follows because the Poisson bracket of each of f and g with the Hamiltonian H vanishes. The Poisson bracket of two integrals of motion f and g may or may not yield a further distinct integral of motion. A set of linearly independent integrals of motion whose Poisson brackets close forms a Lie algebra, called a Poisson algebra.
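A familiar instance of a Poisson algebra is provided by the components of angular momentum. The sketch below is an illustrative assumption: it takes three spatial dimensions, a spherically symmetric Hamiltonian $H = (|p|^2 + |q|^2)/2$ chosen only for the example, and central-difference derivatives. It checks that each component has vanishing bracket with H and that the bracket of two components returns the third. The names are hypothetical.

```python
def poisson_bracket(f, g, z, n, h=1e-4):
    """[f, g] of equation (4.134) at z = (q^1..q^n, p_1..p_n)."""
    def partial(fn, i):
        up = list(z)
        dn = list(z)
        up[i] += h
        dn[i] -= h
        return (fn(up) - fn(dn)) / (2 * h)
    return sum(partial(f, k) * partial(g, k + n)
               - partial(f, k + n) * partial(g, k)
               for k in range(n))

# Phase-space point s = (x, y, z, p_x, p_y, p_z).
def Lx(s):
    return s[1] * s[5] - s[2] * s[4]

def Ly(s):
    return s[2] * s[3] - s[0] * s[5]

def Lz(s):
    return s[0] * s[4] - s[1] * s[3]

def H(s):
    """Spherically symmetric Hamiltonian, assumed for illustration."""
    return 0.5 * (s[3] ** 2 + s[4] ** 2 + s[5] ** 2) \
         + 0.5 * (s[0] ** 2 + s[1] ** 2 + s[2] ** 2)

point = [0.4, -0.7, 1.1, 0.9, 0.2, -0.5]      # illustrative values

# Each component is an integral of motion, equation (4.151).
conserved_error = max(abs(poisson_bracket(L, H, point, 3)) for L in (Lx, Ly, Lz))

# The brackets close: [L_x, L_y] = L_z.
closure_error = abs(poisson_bracket(Lx, Ly, point, 3) - Lz(point))
```

Here the bracket of two integrals of motion yields a third integral that is linearly independent of the first two, so the three components together close into a Lie algebra in the sense described above.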
Concept question 4.6 How many integrals of motion can there be? How many distinct integrals
of motion can there be in a dynamical system described by N coordinates and N momenta? A distinct
integral of motion is one that cannot be expressed as a function of the other integrals of motion (this is more
stringent than the condition that the integrals be linearly independent). Answer. The dynamical motion of
the system is described by a 1-dimensional line in a 2N -dimensional phase-space manifold consisting of the N
coordinates and N momenta. Any constant of motion f (xµ , πµ ) defines a (2N −1)-dimensional submanifold
of the phase-space manifold. A 1-dimensional line can be the intersection of no more than 2N −1 distinct
such submanifolds, so there can be at most 2N −1 distinct constants of motion. In the super-Hamiltonian
formulation, the phase space of a single particle in 4 spacetime dimensions is 8-dimensional, and there are
at most 7 distinct integrals of motion. A particle moving along a straight line in Minkowski space provides an example of a system with a full set of 7 integrals of motion: 4 integrals constitute the covariant energy-momentum 4-vector $p_m$, and a further 3 integrals of motion comprise $x^a - v^a t = x^a(0)$ where $v^a \equiv p^a/p^0$ is the velocity, and $x^m(0)$ is the origin of the line at t = 0. In the conventional Hamiltonian formulation, the
phase space of a single particle is 6-dimensional, and there are at most 5 distinct integrals of motion. The
apparent discrepancy in the number of integrals occurs because in the super-Hamiltonian formalism the time
t and time component πt of the generalized momentum are treated as distinct variables whose equations
of motion are determined by Hamilton’s equations, whereas in the conventional Hamiltonian formalism the
arbitrary parameter λ is set equal to the time t, which is therefore no longer an independent variable, and
the generalized momentum πt , which equals minus the conventional Hamiltonian H, equation (4.108), is
eliminated as an independent variable by re-expressing it in terms of the spatial coordinates and momenta.
Concept Questions
1. What evidence do astronomers currently accept as indicating the presence of a black hole in a system?
2. Why can astronomers measure the masses of supermassive black holes only in relatively nearby galaxies?
3. To what extent (with what accuracy) are real black holes in our Universe described by the no-hair
theorem?
4. Does the no-hair theorem apply inside a black hole?
5. Black holes lose their hair on a light-crossing time. How long is a light-crossing time for a typical
stellar-sized or supermassive astronomical black hole?
6. Relativists say that the metric is gµν , but they also say that the metric is ds2 = gµν dxµ dxν . How can
both statements be correct?
7. The Schwarzschild geometry is said to describe the geometry of spacetime outside the surface of the
Sun or Earth. But the Schwarzschild geometry supposedly describes non-rotating masses, whereas the
Sun and Earth are rotating. If the Sun or Earth collapsed to a black hole conserving their mass M and
angular momentum L, roughly what would the spin $a/M = L/M^2$ of the black hole be relative to the
maximal spin a/M = 1 of a Kerr black hole?
8. What happens at the horizon of a black hole?
9. As cold matter becomes denser, it goes through the stages of being solid/liquid like a planet, then
electron degenerate like a white dwarf, then neutron degenerate like a neutron star, then finally it
collapses to a black hole. Why could there not be a denser state of matter, denser than a neutron star,
that brings a star to rest inside its horizon?
10. How can an observer determine whether they are “at rest” in the Schwarzschild geometry?
11. An observer outside the horizon of a black hole never sees anything pass through the horizon, even to
the end of the Universe. Does the black hole then ever actually collapse, if no one ever sees it do so?
12. If nothing can ever get out of a black hole, how does its gravity get out?
13. Why did Einstein believe that black holes could not exist in nature?
14. In what sense is a rotating black hole “stationary” but not “static”?
15. What is a white hole? Do they exist?
16. Could the expanding Universe be a white hole?
17. Could the Universe be the interior of a black hole?
18. You know the Schwarzschild metric for a black hole. What is the corresponding metric for a white hole?
19. What is the best kind of black hole to fall into if you want to avoid being tidally torn apart?
20. Why do astronomers often assume that the inner edge of an accretion disk around a black hole occurs
at the innermost stable orbit?
21. A collapsing star of uniform density has the geometry of a collapsing Friedmann-Lemaître-Robertson-Walker cosmology. If a spatially flat FLRW cosmology corresponds to a star that starts from zero velocity
at infinity, then to what do open or closed FLRW cosmologies correspond?
22. Your friend falls into a black hole, and you watch her image freeze and redshift at the horizon. A shell
of matter falls on to the black hole, increasing the mass of the black hole. What happens to the image
of your friend? Does it disappear, or does it remain on the horizon?
23. Is the singularity of a Reissner-Nordström black hole gravitationally attractive or repulsive?
24. If you are a charged particle, which dominates near the singularity of the Reissner-Nordström geometry,
the electrical attraction/repulsion or the gravitational attraction/repulsion?
25. Is a white hole gravitationally attractive or repulsive?
26. What happens if you fall into a white hole?
27. Which way does time go in Parallel Universes in the Reissner-Nordström geometry?
28. What does it mean that geodesics inside a black hole can have negative energy?
29. Can geodesics have negative energy outside a black hole? How about inside the ergosphere?
30. Physically, what causes mass inflation?
31. Is mass inflation likely to occur inside real astronomical black holes?
32. What happens at the X point, where the outgoing and ingoing inner horizons of the Reissner-Nordström
geometry intersect?
33. Can a particle like an electron or proton, whose charge far exceeds its mass (in geometric units), be
modelled as a Reissner-Nordström black hole?
34. Does it make sense that a person might be at rest in the Kerr-Newman geometry? How would the
Boyer-Lindquist coordinates of such a person vary along their worldline?
35. In identifying M as the mass and a as the angular momentum per unit mass of the black hole in the
Boyer-Lindquist metric, why is it sufficient to consider the behaviour of the metric at r → ∞?
36. Does space move faster than light inside the ergosphere?
37. If space moves faster than light inside the ergosphere, why is the outer boundary of the ergosphere not
a horizon?
38. Do closed timelike curves make sense?
39. What does Carter’s fourth integral of motion Q signify physically?
40. What is special about a principal null congruence?
41. Evaluated in the locally inertial frame of a principal null congruence, the spin-0 component of the Weyl
scalar of the Kerr geometry is $C = -M/(r - ia\cos\theta)^3$, which looks like the Weyl scalar $C = -M/r^3$ of the
Schwarzschild geometry but with radius r replaced by the complex radius $r - ia\cos\theta$. Is there something
deep here? Can the Kerr geometry be constructed from the Schwarzschild geometry by complexifying
the radial coordinate r?
What’s important?
1. Astronomical evidence suggests that stellar-sized and supermassive black holes exist ubiquitously in
nature.
2. The no-hair theorem, and when and why it applies.
3. The physical picture of black holes as regions of spacetime where space is falling faster than light.
4. A physical understanding of how the metric of a black hole relates to its physical properties.
5. Penrose (conformal) diagrams. In particular, the Penrose diagrams of the various kinds of vacuum black
hole: Schwarzschild, Reissner-Nordström, Kerr-Newman.
6. What really happens inside black holes. Collapse of a star. Mass inflation instability.
5
Observational Evidence for Black Holes
It is beyond the intended scope of this book to discuss the extensive and rapidly evolving observational
evidence for black holes in any detail. However, it is useful to summarize a few facts.
1. Observational evidence supports the idea that black holes occur ubiquitously in nature. They are not
observed directly, but reveal themselves through their effects on their surroundings. Two kinds of black
hole are observed: stellar-sized black holes in x-ray binary systems, mostly in our own Milky Way galaxy,
and supermassive black holes in Active Galactic Nuclei (AGN) found at the centres of our own and other
galaxies.
2. The primary evidence that astronomers accept as indicating the presence of a black hole is a lot of mass
compacted into a tiny space.
1. In an x-ray binary system, if the mass of the compact object exceeds 3 M⊙ , the maximum theoretical
mass of a neutron star, then the object is considered to be a black hole. Many hundreds of x-ray
binary systems are known in our Milky Way galaxy, but only tens of these have measured masses,
and in about 20 the measured mass indicates a black hole (McClintock et al., 2011).
2. Several tens of thousands of AGN have been catalogued, identified either in the radio, optical,
or x-rays. But only in nearby galaxies can the mass of a supermassive black hole be measured
directly. This is because it is only in nearby galaxies that the velocities of gas or stars can be
measured sufficiently close to the nuclear centre to distinguish a regime where the velocity becomes
constant, so that the mass can be attributed to an unresolved central point as opposed to a continuous
distribution of stars. The masses of about 40 supermassive black holes have been measured in this
way (Kormendy and Gebhardt, 2001). The masses range from the $4 \times 10^6\,M_\odot$ mass of the black hole
at the centre of the Milky Way (Ghez et al., 2008; Gillessen et al., 2009) to the $(6.6 \pm 0.4) \times 10^9\,M_\odot$
mass of the black hole at the centre of the M87 galaxy at the centre of the Virgo cluster at the
centre of the Local Supercluster of galaxies (Gebhardt et al., 2011).
3. Secondary evidence for the presence of a black hole includes:
1. high luminosity;
2. non-stellar spectrum, extending from radio to gamma-rays;
3. rapid variability;
4. relativistic jets.
Jets in AGN are often one-sided, and a few that are bright enough to be resolved at high angular
resolution show superluminal motion. Both lines of evidence indicate that jets are commonly relativistic, moving
at close to the speed of light. There are a few cases of jets in x-ray binary systems, sometimes called
microquasars.
4. Stellar-sized black holes are thought to be created in supernovae as the result of the core-collapse of
stars more massive than about 25 M⊙ (this number depends in part on uncertain computer simulations).
Supermassive black holes are probably created initially in the same way, but they then grow by accretion
of gas funnelled to the centre of the galaxy. The growth rates inferred from AGN luminosities are
consistent with this picture.
5. Long gamma-ray bursts (lasting more than about 2 seconds) are associated observationally with supernovae. It is thought that in such bursts we are seeing the formation of a black hole. As the black
hole gulps down the huge quantity of material needed to make it, it regurgitates a relativistic jet that
punches through the envelope of the star. If the jet happens to be pointed in our direction, then we see
it relativistically beamed as a gamma-ray burst.
6. Astronomical black holes present the only realistic prospect for testing general relativity in the strong
field regime, since such fields cannot be reproduced in the laboratory. At the present time the observational tests of general relativity from astronomical black holes are at best tentative. One test is the
redshifting of 7 keV iron lines in a small number of AGN, notably MCG-6-30-15, which can be interpreted
as being emitted by matter falling on to a rotating (Kerr) black hole.
7. The first direct detection of gravitational waves was with the Laser Interferometer Gravitational-Wave
Observatory (LIGO) on 14 September 2015 (Abbott et al., 2016). The waveform was consistent with
the merger of two black holes of masses 29 and 36 M⊙ .
8. Before gravitational waves were detected directly, their existence was inferred from the gradual speeding up of the orbit of the Hulse-Taylor binary, which consists of two neutron stars, one of which,
PSR1913+16, is a pulsar. The parameters of the orbit have been measured with exquisite precision, and
the rate of orbital speed-up is in good agreement with the energy loss by quadrupole gravitational wave
emission predicted by general relativity.
6
Ideal Black Holes
6.1 Definition of a black hole
What is a black hole? Doubtless you have heard the standard definition: It is a region whose gravity is so
strong that not even light can escape.
But why can light not escape from a black hole? A standard answer, which John Michell (1784) would
have found familiar, is that the escape velocity exceeds the speed of light. But that answer brings to mind
a Newtonian picture of light going up, turning around, and coming back down, that is altogether different
from what general relativity actually predicts.
Figure 6.1 The fish upstream can make way against the current, but the fish downstream is swept to the bottom of
the waterfall (Art by Wildrose Hamilton). This painting appeared on the cover of the June 2008 issue of the American
Journal of Physics (Hamilton and Lisle, 2008). A similar depiction appeared in Susskind (2003).
A better definition of a black hole is that it is a
region where space is falling faster than light.
Inside the horizon, light emitted outwards is carried inward by the faster-than-light inflow of space, like a
fish trying but failing to swim up a waterfall, Figure 6.1.
The definition may seem jarring. If space has no substance, how can it fall faster than light? It means that
inside the horizon any locally inertial frame is compelled to fall to smaller radius as its proper time goes by.
This fundamental fact is true regardless of the choice of coordinates.
A similar concept of space moving arises in cosmology. Astronomers observe that the Universe is expanding. Cosmologists find it convenient to conceptualize the expansion by saying that space itself is expanding.
For example, the picture that space expands makes it more straightforward, both conceptually and mathematically, to deal with regions of spacetime beyond the horizon, the surface of infinite redshift, of an observer.
6.2 Ideal black hole
The simplest kind of black hole, an ideal black hole, is one that is stationary, and electrovac outside its
singularity. Electrovac means that the energy-momentum tensor Tµν is zero except for the contribution
from a stationary electromagnetic field. The most important ideal black holes are those that extend to
asymptotically flat empty space (Minkowski space) at infinity. There are ideal black hole solutions that do
not asymptote to flat empty space, but most of these have little relevance to reality. The most important
ideal black hole solutions that are not flat at infinity are those containing a non-zero cosmological constant.
The next several chapters deal with ideal black holes in asymptotically flat space. The importance of ideal
black holes stems from the no-hair theorem, discussed in the next section. The no-hair theorem has the
consequence that, except during their initial collapse, or during a merger, real astronomical black holes are
accurately described as ideal outside their horizons.
6.3 No-hair theorem
I will state and justify the no-hair theorem, but I will not prove it mathematically, since the proof is technical.
The no-hair theorem states that a stationary black hole in asymptotically flat space is characterized by
just three quantities:
1. Mass M ;
2. Electric charge Q;
3. Spin, usually parametrized by the angular momentum a per unit mass.
The mechanism by which a black hole loses its hair is gravitational radiation. When initially formed,
whether from the collapse of a massive star or from the merger of two black holes, a black hole will form a
complicated, oscillating region of spacetime. But over the course of several light crossing times, the oscillations
lose energy by gravitational radiation, and damp out, leaving a stationary black hole.
Real astronomical black holes are not isolated, and continue to accrete (cosmic microwave background
photons, if nothing else). However, the timescale (a light crossing time) for oscillations to damp out by
gravitational radiation is usually far shorter than the timescale for accretion, so in practice real black holes
are extremely well described by no-hair solutions almost all of their lives.
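To give a feel for how short the damping timescale is, the following minimal sketch evaluates a light-crossing time for a few black hole masses. It assumes that the light-crossing time can be taken as the Schwarzschild radius divided by the speed of light, r_s/c = 2GM/c^3; that choice, the rounded SI constants, the function name, and the 10 M⊙ stellar example are all assumptions made for illustration. The two supermassive masses are those quoted in Chapter 5.

```python
# Rounded SI constants (assumed values, not taken from the text).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m s^-1
M_SUN = 1.989e30   # solar mass, kg


def light_crossing_time(mass_in_solar_masses):
    """Return r_s / c = 2 G M / c^3 in seconds for a black hole of the given mass."""
    return 2.0 * G * mass_in_solar_masses * M_SUN / C**3


# Illustrative masses: an assumed 10 Msun stellar black hole, plus the
# Milky Way and M87 black hole masses quoted in Chapter 5.
for label, mass in [("stellar, 10 Msun", 10.0),
                    ("Milky Way, 4e6 Msun", 4.0e6),
                    ("M87, 6.6e9 Msun", 6.6e9)]:
    print(f"{label:>20s}: {light_crossing_time(mass):.3g} s")
```

On this estimate the timescale ranges from a fraction of a millisecond for a stellar-sized black hole to under a day for the largest supermassive ones, in either case very short compared with any plausible accretion timescale.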
The physical reason that the no-hair theorem applies is that space is falling faster than light inside the
horizon. Consequently, unlike a star, no energy can bubble up from below to replace the energy lost by
gravitational radiation. The loss of energy by gravitational radiation brings the black hole to a state where it
can no longer radiate gravitational energy. The properties of a no-hair black hole are characterized entirely
by conserved quantities.
As a corollary, the no-hair theorem does not apply from the inner horizon of a black hole inward, because
space ceases to fall superluminally inside the inner horizon.
If there exist other absolutely conserved quantities, such as magnetic charge (magnetic monopoles), or
various supersymmetric charges in theories where supersymmetry is not broken, then the black hole will also
be characterized by those quantities.
Black holes are expected not to conserve quantities such as baryon or lepton number that are thought not
to be absolutely conserved, even though they appear to be conserved in low energy physics.
It is legitimate to think of the process of reaching a stationary state as analogous to reaching a condition
of thermodynamic equilibrium, in which a macroscopic system is described by a small number of parameters
associated with the conserved quantities of the system.
7
Schwarzschild Black Hole
The Schwarzschild geometry was discovered by Karl Schwarzschild in late 1915 at essentially the same
time that Einstein was arriving at his final version of the General Theory of Relativity. Schwarzschild
was Director of the Astrophysical Observatory in Potsdam, perhaps the foremost astronomical position in
Germany. Despite his position, he joined the German army at the outbreak of World War I, and was serving
on the front at the time of his discovery. Sadly, Schwarzschild contracted a rare skin disease on the front.
Returning to Berlin, he died in May 1916 at the age of 42.
That the Schwarzschild geometry describes a collapsed object, a black hole, was not understood by Einstein and his contemporaries. Understanding did not emerge until many decades later, in the
late 1950s. Thorne (1994) gives a delightful popular account of the history.
7.1 Schwarzschild metric
The Schwarzschild metric was discovered first by Karl Schwarzschild (1916b), and then independently
by Johannes Droste (1916). In a polar coordinate system {t, r, θ, φ}, and in geometric units c = G = 1, the
Schwarzschild metric is
\[ ds^2 = - \left( 1 - \frac{2M}{r} \right) dt^2 + \left( 1 - \frac{2M}{r} \right)^{-1} dr^2 + r^2 do^2 , \tag{7.1} \]
where $do^2$ (this is the Landau & Lifshitz notation) is the metric of a unit 2-sphere,
\[ do^2 = d\theta^2 + \sin^2\theta \, d\phi^2 . \tag{7.2} \]
With units restored, the time-time component $g_{tt}$ of the Schwarzschild metric is
\[ g_{tt} = - \left( 1 - \frac{2GM}{c^2 r} \right) . \tag{7.3} \]
The Schwarzschild geometry describes the simplest kind of black hole: a black hole with mass M , but no
electric charge, and no spin.
The geometry describes not only a black hole, but also any empty space surrounding a spherically symmetric mass. Thus the Schwarzschild geometry describes to a good approximation the spacetimes outside
the surfaces of the Sun and the Earth.
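As a rough numerical illustration of how close to flat these spacetimes are, the sketch below evaluates the dimensionless quantity 2GM/(c^2 r) that appears in the time-time component (7.3), at the surfaces of the Sun and the Earth. The rounded masses, radii, and constants, and the function name, are assumptions supplied for the example and do not come from the text.

```python
# Rounded SI constants (assumed values).
G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8     # speed of light, m s^-1


def metric_deviation(mass_kg, radius_m):
    """Return 2GM/(c^2 r), the amount by which -g_tt falls short of 1 in equation (7.3)."""
    return 2.0 * G * mass_kg / (C**2 * radius_m)


# Approximate masses (kg) and surface radii (m); illustrative values only.
BODIES = {
    "Sun": (1.989e30, 6.957e8),
    "Earth": (5.972e24, 6.371e6),
}

for name, (mass, radius) in BODIES.items():
    eps = metric_deviation(mass, radius)
    print(f"{name:>5s}: 2GM/(c^2 R) = {eps:.3e}")
```

The deviation is of order 10^-6 at the surface of the Sun and of order 10^-9 at the surface of the Earth, so the Schwarzschild corrections to the Minkowski metric are tiny there.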
Comparison with the spherically symmetric Newtonian metric
\[ ds^2 = - (1 + 2\Phi) \, dt^2 + (1 - 2\Phi)(dr^2 + r^2 do^2) \tag{7.4} \]
with Newtonian potential
\[ \Phi(r) = - \frac{M}{r} \tag{7.5} \]
establishes that the M in the Schwarzschild metric is to be interpreted as the mass of the black hole
(Exercise 7.1).
The Schwarzschild geometry is asymptotically flat, because the metric tends to the Minkowski metric in
polar coordinates at large radius
\[ ds^2 \rightarrow - dt^2 + dr^2 + r^2 do^2 \quad \text{as } r \rightarrow \infty . \tag{7.6} \]
Exercise 7.1 Schwarzschild metric in isotropic form. The Schwarzschild metric (7.1) does not have
the same form as the spherically symmetric Newtonian metric (7.4). By a suitable transformation of the
radial coordinate r, bring the Schwarzschild metric (7.1) to the isotropic form
\[ ds^2 = - \left( \frac{1 - M/2R}{1 + M/2R} \right)^2 dt^2 + \left( 1 + M/2R \right)^4 (dR^2 + R^2 do^2) . \tag{7.7} \]
What is the relation between R and r? Hence conclude that the identification (7.5) is correct, and therefore
that M is indeed the mass of the black hole. Is the isotropic form (7.7) of the Schwarzschild metric valid
inside the horizon?
7.2 Stationary, static
The Schwarzschild geometry is stationary. A spacetime is said to be stationary if and only if there exists
a timelike coordinate t such that the metric is independent of t. In other words, the spacetime possesses
time translation symmetry: the metric is unchanged by a time translation t → t + t0 where t0 is some
constant. Evidently the Schwarzschild metric (7.1) is independent of the timelike coordinate t, and is therefore
stationary, time translation symmetric.
As will be found below, §7.6, the Schwarzschild time coordinate t is timelike outside the horizon, but
spacelike inside. Some authors therefore refer to the spacetime inside the horizon of a stationary black hole
as being homogeneous. However, I think it is less confusing to refer to time translation symmetry, which is
a single symmetry of the spacetime, by a single name, stationarity, everywhere in the spacetime.
The Schwarzschild geometry is also static. A spacetime is static if and only if in addition to being
stationary with respect to a time coordinate t, spatial coordinates can be chosen that do not change along
the direction of the tangent vector $e_t$. This requires that the tangent vector $e_t$ be orthogonal to all the spatial
tangent vectors $e_\alpha$
\[ e_t \cdot e_\alpha = g_{t\alpha} = 0 . \tag{7.8} \]
The Kerr geometry for a rotating black hole is an example of a geometry that is stationary but not static. If
time t and azimuthal φ coordinates are coordinates associated with time and azimuthal symmetry, then the
scalar product et · eφ of their tangent vectors in the Kerr geometry is a non-vanishing scalar, §9.3. Physically,
in a static geometry, a system of static observers, those who are at rest in static spatial coordinates, see each
other to remain at rest as time passes. In a non-static geometry, no such system of static observers exists.
The Gullstrand-Painlevé metric for the Schwarzschild geometry, discussed in §7.12, is an example of a
metric that is stationary, since the metric coefficients are independent of the free-fall time tff , but not explicitly
static. Observers at rest with respect to Gullstrand-Painlevé spatial coordinates fall into the black hole, and
do not see each other as remaining at rest as time goes by. The Schwarzschild geometry is nevertheless static
because there exist coordinates, the Schwarzschild coordinates, with respect to which the metric is explicitly
static, gtα = 0. The Schwarzschild time coordinate t is thus identified as a special one: it is the unique time
coordinate with respect to which the Schwarzschild geometry is manifestly static.
7.3 Spherically symmetric
The Schwarzschild geometry is also spherically symmetric. This is evident from the fact that the angular
part r2 do2 of the metric is the metric of a 2-sphere of radius r. This can be seen as follows. Consider the
metric of ordinary flat 3-dimensional Euclidean space in Cartesian coordinates {x, y, z}:
\[ ds^2 = dx^2 + dy^2 + dz^2 . \tag{7.9} \]
Convert to polar coordinates {r, θ, φ}, defined so that
\[ x = r \sin\theta \cos\phi , \tag{7.10a} \]
\[ y = r \sin\theta \sin\phi , \tag{7.10b} \]
\[ z = r \cos\theta . \tag{7.10c} \]
Substituting equations (7.10) into the Euclidean metric (7.9) gives
\[ ds^2 = dr^2 + r^2 (d\theta^2 + \sin^2\theta \, d\phi^2) . \tag{7.11} \]
Restricting to a surface r = constant of constant radius then gives the metric of a 2-sphere of radius r
\[ ds^2 = r^2 (d\theta^2 + \sin^2\theta \, d\phi^2) \tag{7.12} \]
as claimed.
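The substitution can be checked numerically. The sketch below, which is only an illustration and not part of the original derivation, builds the matrix of partial derivatives of (x, y, z) with respect to (r, θ, φ) from equations (7.10) and contracts it with itself to obtain the components of the Euclidean metric in polar coordinates. The function names and the sample point are arbitrary choices.

```python
import math


def jacobian(r, theta, phi):
    """Partial derivatives of (x, y, z) with respect to (r, theta, phi), from equations (7.10)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [
        [st * cp, r * ct * cp, -r * st * sp],  # dx/dr, dx/dtheta, dx/dphi
        [st * sp, r * ct * sp, r * st * cp],   # dy/dr, dy/dtheta, dy/dphi
        [ct, -r * st, 0.0],                    # dz/dr, dz/dtheta, dz/dphi
    ]


def induced_metric(r, theta, phi):
    """Metric components g_ij = sum_k (dx^k/dq^i)(dx^k/dq^j) in coordinates q = (r, theta, phi)."""
    jac = jacobian(r, theta, phi)
    return [[sum(jac[k][i] * jac[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Arbitrary sample point; any point with r > 0 and 0 < theta < pi would do.
r, theta, phi = 2.0, 0.7, 1.3
g = induced_metric(r, theta, phi)
for row in g:
    print(["%.6f" % value for value in row])
print("expected diagonal:", 1.0, r**2, (r * math.sin(theta))**2)
```

The off-diagonal components vanish to rounding error, and the diagonal reproduces the coefficients 1, r^2, and r^2 sin^2 θ of equation (7.11).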
The radius r in Schwarzschild coordinates is the circumferential radius, defined such that the proper
circumference of the 2-sphere measured by observers at rest in Schwarzschild coordinates is 2πr. This is a
coordinate-invariant definition of the meaning of r, which implies that r is a scalar.
7.4 Energy-momentum tensor
It is straightforward (especially if you use a computer algebraic manipulation program) to follow the cookbook
summarized in §2.25 to check that the Einstein tensor that follows from the Schwarzschild metric (7.1) is
zero. Einstein’s equations then imply that the Schwarzschild geometry has zero energy-momentum tensor.
If the Schwarzschild geometry is empty, should not the spacetime be flat, the Minkowski spacetime? There
are two answers to this question. Firstly, the Schwarzschild geometry describes the geometry of empty space
around a static spherically symmetric mass, such as the Sun or Earth. The geometry inside the spherically
symmetric mass is described by some other metric, which connects continuously and differentiably (but not
necessarily doubly differentiably, if the spherical object has an abrupt surface) to the Schwarzschild metric.
The second answer is that the Schwarzschild geometry describes the geometry of a collapsed object, a
black hole, which becomes singular at its centre, r = 0, but is otherwise empty of energy-momentum.
Exercise 7.2 Derivation of the Schwarzschild metric. There are neater and more insightful ways
to derive it, but the Schwarzschild metric can be derived by turning a mathematical crank without the
need for deeper conceptual understanding. Start with the assumption that the metric of a static, spherically
symmetric object can be written in polar coordinates {t, r, θ, φ} as
\[ ds^2 = - A(r) \, dt^2 + B(r) \, dr^2 + r^2 (d\theta^2 + \sin^2\theta \, d\phi^2) , \tag{7.13} \]
where A(r) and B(r) are some to-be-determined functions of radius r. Write down the components of the
metric $g_{\mu\nu}$, and deduce its inverse $g^{\mu\nu}$. Compute all the components of the coordinate connections $\Gamma_{\lambda\mu\nu}$,
equation (2.63). Of the 40 distinct connections, 9 should be non-vanishing. Compute all the components of
the Riemann tensor $R_{\kappa\lambda\mu\nu}$, equation (2.112). There should be 6 distinct non-zero components. Compute all
the components of the Ricci tensor $R_{\kappa\mu}$, equation (2.121). There should be 4 distinct non-zero components.
Now impose that the spacetime be empty, that is, the energy-momentum tensor is zero. Einstein’s equations
then demand that the Ricci tensor vanishes identically. Use the requirement that $g^{tt} R_{tt} - g^{rr} R_{rr} = 0$ to
show that AB = 1. Then use $g^{tt} R_{tt} = 0$ to derive the functional form of A. Finally, use the Newtonian limit
$-g_{tt} \approx 1 + 2\Phi$ with $\Phi = -GM/r$, valid at large radius r, to fix A.
7.5 Birkhoff ’s theorem
Birkhoff ’s theorem, whose proof is deferred to Chapter 20, Exercise 20.2, states that the geometry of
empty space surrounding a spherically symmetric matter distribution is the Schwarzschild geometry. That
is, if the metric is of the form
\[ ds^2 = A(t, r) \, dt^2 + B(t, r) \, dt \, dr + C(t, r) \, dr^2 + D(t, r) \, do^2 , \tag{7.14} \]
where the metric coefficients A, B, C, and D are allowed to be arbitrary functions of t and r, and if the
energy-momentum tensor vanishes, $T_{\mu\nu} = 0$, outside some value of the circumferential radius $r'$ defined by
$r'^2 = D$, then the geometry is necessarily Schwarzschild outside that radius.
This means that if a mass undergoes spherically symmetric pulsations, then those pulsations do not affect
the geometry of the surrounding spacetime. This reflects the fact that there are no spherically symmetric
gravitational waves.
7.6 Horizon
The horizon of the Schwarzschild geometry lies at the Schwarzschild radius r = rs
\[ r_s = \frac{2GM}{c^2} , \tag{7.15} \]
where units of c and G have been momentarily restored. Where does this come from? The Schwarzschild
metric shows that the scalar spacetime distance squared $ds^2$ along an interval at rest in Schwarzschild
coordinates, dr = dθ = dφ = 0, is timelike, lightlike, or spacelike depending on whether the radius is greater
than, equal to, or less than $r_s$:
\[ ds^2 = - \left( 1 - \frac{r_s}{r} \right) dt^2 \ \begin{cases} < 0 & \text{if } r > r_s , \\ = 0 & \text{if } r = r_s , \\ > 0 & \text{if } r < r_s . \end{cases} \tag{7.16} \]
Since the worldline of a massive observer must be timelike, it follows that a massive observer can remain at
rest only outside the horizon, r > rs . An object at rest at the horizon, r = rs , follows a null geodesic, which
is to say it is a possible worldline of a massless particle, a photon. Inside the horizon, r < rs , neither massive
nor massless objects can remain at rest. To remain at rest, a particle inside the horizon would have to go
faster than light.
A full treatment of what is going on requires solving the geodesic equation in the Schwarzschild geometry,
but the results may be anticipated already at this point. In effect, space is falling into the black hole. Outside
the horizon, space is falling less than the speed of light; at the horizon space is falling at the speed of light;
and inside the horizon, space is falling faster than light, carrying everything with it. This is why light cannot
escape from a black hole: inside the horizon, space falls inward faster than light, carrying light inward even if
that light is pointed radially outward. The statement that space is falling superluminally inside the horizon
of a black hole is a coordinate-invariant statement: massive or massless particles are carried inward whatever
their state of motion and whatever the coordinate system.
Whereas an interval of coordinate time t switches from timelike outside the horizon to spacelike inside the
horizon, an interval of coordinate radius r does the opposite: it switches from spacelike to timelike:
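\[ ds^2 = \left( 1 - \frac{r_s}{r} \right)^{-1} dr^2 \ \begin{cases} > 0 & \text{if } r > r_s , \\ = \infty & \text{if } r = r_s , \\ < 0 & \text{if } r < r_s . \end{cases} \tag{7.17} \]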
It appears then that the Schwarzschild time and radial coordinates swap roles inside the horizon. Inside the
horizon, the radial coordinate becomes timelike, meaning that it becomes a possible worldline of a massive
observer. That is, a trajectory at fixed t and decreasing r is a possible worldline. Again this reflects the fact
that space is falling faster than light inside the horizon. A person inside the horizon is inevitably compelled,
as their proper time goes by, to move to smaller radial coordinate r.
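The swap of roles can be made concrete with a small sketch that evaluates the sign of ds^2 for a pure time interval, equation (7.16), and for a pure radial interval, equation (7.17), on either side of the horizon. Radii are measured in units of the Schwarzschild radius; the function names and sample radii are hypothetical choices for illustration.

```python
def causal_character(ds2):
    """Classify an interval by the sign of its squared spacetime distance."""
    if ds2 < 0:
        return "timelike"
    if ds2 > 0:
        return "spacelike"
    return "lightlike"


def ds2_time_interval(r, rs=1.0, dt=1.0):
    """ds^2 along an interval at rest, dr = dtheta = dphi = 0, equation (7.16)."""
    return -(1.0 - rs / r) * dt**2


def ds2_radial_interval(r, rs=1.0, dr=1.0):
    """ds^2 along an interval with dt = dtheta = dphi = 0, equation (7.17); singular at r = rs."""
    return dr**2 / (1.0 - rs / r)


# One radius outside the horizon and one inside, in units of rs.
for r in (2.0, 0.5):
    print(f"r = {r} rs: dt interval is {causal_character(ds2_time_interval(r))}, "
          f"dr interval is {causal_character(ds2_radial_interval(r))}")
```

Outside the horizon the dt interval is timelike and the dr interval spacelike; inside, the two classifications are exchanged. The radial expression is deliberately not evaluated at r = r_s, where it diverges.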
Concept question 7.3 Going forwards or backwards in time inside the horizon. Inside the
horizon, can a person go forwards or backwards in Schwarzschild time t? What does that mean?
7.7 Proper time
The proper time experienced by an observer at rest in Schwarzschild coordinates, dr = dθ = dφ = 0, is
\[ d\tau = \sqrt{-ds^2} = \left( 1 - \frac{r_s}{r} \right)^{1/2} dt . \tag{7.18} \]
For an observer at rest at infinity, r → ∞, the proper time is the same as the coordinate time,
\[ d\tau \rightarrow dt \quad \text{as } r \rightarrow \infty . \tag{7.19} \]
Among other things, this implies that the Schwarzschild time coordinate t is a scalar, because it has a
coordinate-invariant definition: it is the unique time coordinate with respect to which the metric is
manifestly static, and it coincides with the proper time of observers at rest at infinity.
At finite radii outside the horizon, r > rs , the proper time dτ is less than the Schwarzschild time dt, so
the clocks of observers at rest run slower at smaller than at larger radii.
At the horizon, r = rs , the proper time dτ of an observer at rest goes to zero,
\[ d\tau \rightarrow 0 \quad \text{as } r \rightarrow r_s . \tag{7.20} \]
This reflects the fact that an object at rest at the horizon is following a null geodesic, and as such experiences
zero proper time.
7.8 Redshift
An observer at rest at infinity looking through a telescope at an emitter at rest at radius r sees the emitter
redshifted by a factor
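\[ 1 + z \equiv \frac{\lambda_{\rm obs}}{\lambda_{\rm em}} = \frac{\nu_{\rm em}}{\nu_{\rm obs}} = \frac{d\tau_{\rm obs}}{d\tau_{\rm em}} = \left( 1 - \frac{r_s}{r} \right)^{-1/2} . \tag{7.21} \]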
This is an example of the universally valid statement that photons are good clocks: the redshift factor is given
by the rate at which the emitter’s clock appears to tick relative to the observer’s own clock. Equation (7.21)
is an example of the general formula (2.101) for the redshift between two comoving (= rest) observers in a
stationary spacetime.
It should be emphasized that the redshift factor (7.21) is valid only for an observer and an emitter at rest
in the Schwarzschild geometry. If the observer and emitter are not at rest, then additional special relativistic
factors will fold into the redshift.
The redshift goes to infinity for an emitter at the horizon
\[ 1 + z \rightarrow \infty \quad \text{as } r \rightarrow r_s . \tag{7.22} \]
Here the redshift tends to infinity regardless of the motion of the observer or emitter. An observer watching
an emitter fall through the horizon will see the emitter appear to freeze at the horizon, becoming ever slower
and more redshifted. Physically, photons emitted vertically upward at the horizon by an infaller remain at
the horizon for ever, taking an infinite time to get out to the outside observer.
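The divergence of the redshift can be seen numerically by evaluating the redshift factor (7.21) for an emitter at rest at a sequence of radii approaching the horizon. This is a minimal sketch for an emitter and observer both at rest, the only case in which (7.21) holds as written; the function name and the sample radii, in units of r_s, are assumptions made for illustration.

```python
import math


def redshift_factor(r_over_rs):
    """Return 1 + z = (1 - rs/r)^(-1/2) for an emitter at rest at r, observed from rest at infinity."""
    if r_over_rs <= 1.0:
        # Equation (7.21) assumes an emitter at rest, which is possible only outside the horizon.
        raise ValueError("emitter must be at rest outside the horizon, r > rs")
    return 1.0 / math.sqrt(1.0 - 1.0 / r_over_rs)


# Sample emitter radii approaching the horizon from outside.
for x in (10.0, 2.0, 1.1, 1.001, 1.000001):
    print(f"r/rs = {x:<9}  1 + z = {redshift_factor(x):.4g}")
```

The factor stays close to 1 far from the black hole and grows without bound as r approaches r_s.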
7.9 “Schwarzschild singularity”
The apparent singularity in the Schwarzschild metric at the horizon rs is not a real singularity, because it
can be removed by a change of coordinates, such as to Gullstrand-Painlevé coordinates, equation (7.27).
Einstein, and other influential physicists such as Eddington, failed to appreciate this. Einstein thought that
the “Schwarzschild singularity” at r = rs marked the physical boundary of the Schwarzschild spacetime.
After all, an outside observer watching stuff fall in never sees anything beyond that boundary.
Schwarzschild’s choice of coordinates was certainly a natural one. It was natural to search for static
solutions, and his time coordinate t is the only one with respect to which the metric is manifestly static.
The problem is that physically there can be no static observers inside the horizon: they must necessarily fall
inward as time passes. The fact that Schwarzschild’s coordinate system shows an apparent singularity at the
horizon reflects the fact that the assumption of a static spacetime necessarily breaks down at the horizon,
where space is falling at the speed of light.
Does stuff “actually” fall in, even though no outside observer ever sees it happen? The answer is yes: when
a black hole forms, it does actually collapse, and when an observer falls through the horizon, they really do
fall through the horizon. The reason that an outside observer sees everything freeze at the horizon is simply
a light travel time effect: it takes an infinite time for light to lift off the horizon and make it to the outside
world.
7.10 Weyl tensor
For Schwarzschild, the Einstein tensor vanishes identically (because the spacetime is by assumption empty of
energy-momentum). The only part of the Riemann curvature tensor that does not vanish is the Weyl tensor.
The non-vanishing Weyl tensor says that gravitational tidal forces are present, even though the spacetime
is empty of energy-momentum. Non-vanishing gravitational tidal forces are the signature that spacetime is
curved.
The covariant (all indices down) components Cκλµν of the coordinate-frame Weyl tensor of the Schwarzschild geometry, computed from equation (3.1), appear at first sight to be a mess (go ahead, compute them).
However, the mess is an artefact of looking at the tensor through the distorting lens of the coordinate basis
vectors eµ , which are not orthonormal. After tetrads, Chapter 11, it will be found that the 10 components
of the Weyl tensor, the tidal part of the Riemann tensor, can be decomposed in any locally inertial frame
into 5 complex components of spin 0, ±1, and ±2. In a locally inertial frame whose radial direction coincides
with the radial direction of the Schwarzschild metric, all components of the Weyl tensor of the Schwarzschild
geometry vanish except the real spin-0 component. Spin 0 means that the Weyl tensor is unchanged under
a spatial rotation about the radial direction (and it is also unchanged by a Lorentz boost in the radial direction). This spin-0 component is a coordinate-invariant scalar, the Weyl scalar C. The fact that the Weyl
tensor of the Schwarzschild geometry has only a single independent non-vanishing component is plausible
from the fact that the non-zero components of the coordinate-frame Weyl tensor written with two indices
up and two indices down are (no implicit summation over repeated indices)
$$ -\tfrac{1}{2} C^{tr}{}_{tr} = -\tfrac{1}{2} C^{\theta\phi}{}_{\theta\phi} = C^{t\theta}{}_{t\theta} = C^{t\phi}{}_{t\phi} = C^{r\theta}{}_{r\theta} = C^{r\phi}{}_{r\phi} = C , \tag{7.23} $$
where C is the Weyl scalar,
$$ C = - \frac{M}{r^3} . \tag{7.24} $$
The trick of writing the 4-index Weyl tensor with 2 indices up and 2 indices down, in order to reveal a simple
pattern, works in a simple spacetime like Schwarzschild, but fails in more complicated spacetimes.
7.11 Singularity
The Weyl scalar, equation (7.24), goes to infinity at zero radius,
$$ C \to \infty \quad \text{as} \quad r \to 0 . \tag{7.25} $$
The diverging Weyl tensor implies that the tidal force diverges at zero radius, signalling that there is a
genuine singularity at zero radius in the Schwarzschild geometry.
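To get a feel for the size of the curvature, the following minimal Python sketch evaluates the Weyl scalar of equation (7.24), C = −M/r³, in SI units. The function names, the SI form C = −GM/(c² r³) (in inverse square metres), and the rounded physical constants are assumptions made for illustration; they are not taken from the text.

```python
# Illustrative sketch: Weyl scalar of eq. (7.24) converted to SI units.
# Constants are rounded values and are assumptions of this example.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C_LIGHT = 2.998e8    # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg


def schwarzschild_radius(mass_kg):
    """Schwarzschild radius r_s = 2GM/c^2 in metres."""
    return 2.0 * G * mass_kg / C_LIGHT**2


def weyl_scalar(mass_kg, r_m):
    """Weyl scalar C = -GM/(c^2 r^3), in m^-2 (eq. 7.24 with G and c restored)."""
    return -G * mass_kg / (C_LIGHT**2 * r_m**3)


if __name__ == "__main__":
    mass = 4e6 * M_SUN  # hypothetical example mass
    rs = schwarzschild_radius(mass)
    for fraction in (1.0, 0.1, 0.01):
        r = fraction * rs
        print(f"r = {fraction:5.2f} r_s : C = {weyl_scalar(mass, r):.3e} m^-2")
```

In this sketch the dimensionless product C r_s² at the horizon is −1/2 whatever the mass, and the magnitude of C grows as 1/r³ towards the singularity.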
Concept question 7.4 Is the singularity of a Schwarzschild black hole a point? Is the singularity
at the centre of the Schwarzschild geometry a point? Answer. No. Familiar experience in 3-dimensional
space would suggest the answer is yes, but that conception is misleading. In the first place, general relativity
Figure 7.1 The light (yellow) shaded region shows the region visible to an infaller (blue) who falls radially to the
singularity of a Schwarzschild black hole; the dark (grey) shaded region shows the region that remains invisible to
the infaller. If another infaller (purple) falls along a different radial direction, the two infallers not only fail to meet
at the singularity, they lose causal contact with each other already some distance from the singularity. Since the two
infallers fall to two causally disconnected points, the singularity cannot be a point.
fails at singularities: the locally inertial description of spacetime fails, and general relativity cannot continue
worldlines of infallers beyond a singularity. Therefore singularities are not part of the spacetime described by
general relativity. Presumably some other physical theory takes over at singularities, but what that theory
is remains unknown at the present time. In the second place, infallers who fall into a Schwarzschild black
hole at different angular positions do not approach each other as they approach the singularity. Rather,
the diverging tidal force near the singularity funnels each infaller along radially converging lines, effectively
keeping the infallers isolated from each other. Moreover, the future lightcones of infallers who fall in at the
same time t but at different angular positions cease to intersect once they are close enough to the singularity.
Thus the infallers not only fail to touch each other, they cease even to be able to communicate with each other
as they approach the singularity, as illustrated in Figure 7.1. The reader may object that the Schwarzschild
metric shows that the proper angular distance between two observers separated by angle φ is r dφ, which
goes to zero at the singularity r → 0. This objection fails because infallers approaching the singularity cease
to be able to measure angular distances, since angularly separated points cease to be causally accessible to
the infaller. The region accessible to an infaller is cusp-like near the singularity. See Exercise 7.9 for a more
quantitative treatment of this problem.
Concept question 7.5 Separation between infallers who fall in at different times. Consider two
infallers who free-fall radially into the black hole at the same angular position, but at different times t. What
is the proper spatial separation between the two observers at the instants they hit the singularity, at r → 0?
Answer. Infinity. At the same angular position, dθ = dφ = 0, the proper radial separation is
$$ dl = \sqrt{ds^2} = \sqrt{\frac{r_s}{r} - 1} \; dt \to \infty \quad \text{as} \quad r \to 0 . \tag{7.26} $$
7.12 Gullstrand-Painlevé metric
An alternative metric for the Schwarzschild geometry was discovered independently by Allvar Gullstrand and
Paul Painlevé in 1921 (Gullstrand, 1922; Painlevé, 1921). (Gullstrand has priority because his paper, though
published in 1922, was submitted in May 1921, whereas Painlevé’s paper was a write-up of a presentation
to L’Académie des Sciences in Paris in October 1921). After tetrads, it will become clear that the standard
way in which metrics are written encodes not only a metric but also a tetrad. The Gullstrand-Painlevé line element (7.27) encodes a tetrad that represents locally inertial frames free-falling radially into the black hole
at the Newtonian escape velocity, Figure 7.2, although at the time no one, including Einstein, Gullstrand,
and Painlevé, understood this. Unlike Schwarzschild coordinates, there is no singularity at the horizon
in Gullstrand-Painlevé coordinates. It is striking that the mathematics was known long before physical
understanding emerged.
Figure 7.2 The Gullstrand-Painlevé metric for the Schwarzschild geometry encodes locally inertial frames (tetrads)
that free-fall radially into the black hole at the Newtonian escape velocity β, equation (7.28). The infall velocity is
less than the speed of light outside the horizon, equal to the speed of light at the horizon, and faster than light inside
the horizon. The infall velocity tends to infinity at the central singularity.
The Gullstrand-Painlevé metric is
$$ ds^2 = - dt_{\rm ff}^2 + (dr - \beta \, dt_{\rm ff})^2 + r^2 do^2 . \tag{7.27} $$
Here β is the Newtonian escape velocity (with a minus sign because space is falling inward),
$$ \beta = - \left( \frac{2GM}{r} \right)^{1/2} , \tag{7.28} $$
and $t_{\rm ff}$ is the proper time experienced by an object that free falls radially inward from zero velocity at infinity.
The free fall time $t_{\rm ff}$ is related to the Schwarzschild time coordinate t by
$$ dt_{\rm ff} = dt - \frac{\beta}{1 - \beta^2} \, dr , \tag{7.29} $$
which integrates to
$$ t_{\rm ff} = t + r_s \left( 2 \sqrt{r/r_s} + \ln \left| \frac{\sqrt{r/r_s} - 1}{\sqrt{r/r_s} + 1} \right| \right) . \tag{7.30} $$
The time axis $e_{t_{\rm ff}}$ in Gullstrand-Painlevé coordinates is not orthogonal to the radial axis $e_r$, but rather is
tilted along the radial axis, $e_{t_{\rm ff}} \cdot e_r = g_{t_{\rm ff} r} = -\beta$.
The proper time of a person at rest in Gullstrand-Painlevé coordinates, dr = dθ = dφ = 0, is
$$ d\tau = dt_{\rm ff} \sqrt{1 - \beta^2} . \tag{7.31} $$
The horizon occurs where this proper time vanishes, which happens when the infall velocity β is the speed
of light
$$ |\beta| = 1 . \tag{7.32} $$
According to equation (7.28), this happens at r = rs , which is the Schwarzschild radius, as it should be.
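The relation between the free-fall time and the Schwarzschild time can be checked numerically. The sketch below, in units c = 1 with r_s = 2GM, differentiates the integrated expression (7.30) by finite differences and compares the result with the coefficient of dr in equation (7.29). The function names and the step size are assumptions of this example, and the absolute value inside the logarithm is used so that the expression stays real inside the horizon.

```python
import math


def beta(r, rs=1.0):
    """Infall velocity of eq. (7.28) in units c = 1, with rs = 2GM (negative: inward)."""
    return -math.sqrt(rs / r)


def tff_minus_t(r, rs=1.0):
    """t_ff - t from eq. (7.30); abs() keeps the logarithm real for r < rs."""
    x = math.sqrt(r / rs)
    return rs * (2.0 * x + math.log(abs((x - 1.0) / (x + 1.0))))


def dtff_dr_expected(r, rs=1.0):
    """Coefficient of dr in eq. (7.29): -beta / (1 - beta^2)."""
    b = beta(r, rs)
    return -b / (1.0 - b * b)


def dtff_dr_numeric(r, rs=1.0, h=1e-6):
    """Central finite difference of eq. (7.30) with respect to r at fixed t."""
    return (tff_minus_t(r + h, rs) - tff_minus_t(r - h, rs)) / (2.0 * h)


if __name__ == "__main__":
    for r in (0.25, 0.5, 2.0, 4.0):  # radii in units of rs, avoiding r = rs
        print(r, beta(r), dtff_dr_expected(r), dtff_dr_numeric(r))
```

The radius r = r_s itself is skipped because the coordinate relation (7.29) is singular there, even though the Gullstrand-Painlevé metric is not.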
Exercise 7.6
Geodesics in the Schwarzschild geometry. The Schwarzschild metric is
$$ ds^2 = - \Delta(r) \, dt^2 + \frac{1}{\Delta(r)} \, dr^2 + r^2 ( d\theta^2 + \sin^2\!\theta \, d\phi^2 ) , \tag{7.33} $$
where ∆(r) is the horizon function
$$ \Delta(r) = 1 - \frac{2M}{r} . \tag{7.34} $$
1. Constants of motion. Argue that, without loss of generality, the trajectory of a freely falling particle
may be taken to lie in the equatorial plane, θ = π/2. Argue that, for a massive particle, conservation
of energy per unit rest mass E, angular momentum per unit rest mass L, and rest mass per unit rest
mass implies that the 4-velocity $u^\mu \equiv dx^\mu / d\tau$ satisfies
$$ u_t = -E , \tag{7.35a} $$
$$ u_\phi = L , \tag{7.35b} $$
$$ u_\mu u^\mu = -1 . \tag{7.35c} $$
2. Effective potential. Show that the radial component $u^r$ of the 4-velocity satisfies
$$ u^r = \pm \left( E^2 - U \right)^{1/2} , \tag{7.36} $$
where U is the effective potential
$$ U = \left( 1 + \frac{L^2}{r^2} \right) \Delta . \tag{7.37} $$
3. Proper time in radial free-fall. What is the proper time τ for an observer to free-fall from radius
r to the singularity at zero radius, for the particular case of an observer who falls radially from rest
at infinity? [Hint: What are the energy E and angular momentum L for an observer who falls radially
starting from rest at infinity?]
4. Proper time in radial free-fall — numbers. Evaluate the proper time, in seconds, to fall from the
horizon to the singularity in the case of a black hole with the mass $4 \times 10^6 \, M_\odot$ of the black hole at the
centre of our Galaxy, the Milky Way.
5. Circular orbits. Circular orbits occur where the effective potential U is an extremum. Find the radii
at which this occurs, as a function of angular momentum L. Solutions exist only if the absolute value
|L| of the angular momentum exceeds a certain critical value Lc . What is this critical value Lc ?
6. Graph. Graph the effective potential U for values of L (i) less than, (ii) equal to, (iii) greater than the
critical value Lc . Describe physically, in words, what the possible orbital trajectories are for the various
cases. [Hint: For cases (i) and (iii), values near the critical value Lc show the distinction most clearly.]
7. Range of orbits. Identify the ranges of radii over which circular orbits are: (i) stable, (ii) unstable, (iii)
non-existent. [Hint: Stability depends on whether the extremum of the effective potential is a minimum
or a maximum. Which is which? You will find it helps to consider U as a function of 1/r rather than r.]
8. Angular momentum and energy in circular orbit. Show that the angular momentum per unit
mass for a circular orbit at radius r satisfies
$$ |L| = \frac{r}{(r/M - 3)^{1/2}} , \tag{7.38} $$
and hence show also that the energy per unit mass in the circular orbit is
$$ E = \frac{r - 2M}{[ r ( r - 3M ) ]^{1/2}} . \tag{7.39} $$
9. Drop in orbit. There is a certain circular orbit that has the same energy as a massive particle at rest
at infinity. This is useful for starship captains to know, because it is possible to drop into this orbit
using only a small amount of energy. What is the radius of the orbit? Is it stable or unstable?
10. Photon sphere. There is a radius where photons can orbit in circular orbits. What is the radius of
this orbit? [Hint: Photons can be taken as the limit of a massive particle whose energy per unit mass E
vastly exceeds its rest mass energy per unit mass, which is 1.]
11. Orbital period. Show that the orbital period t, as measured by an observer at rest at infinity, of a
particle in circular orbit at radius r is given by Kepler’s 3rd law (remarkably, Kepler’s 3rd law remains
true even in the fully general relativistic case, as long as t is taken to be the time measured at infinity),
$$ \frac{G M t^2}{(2\pi)^2} = r^3 . \tag{7.40} $$
[Hint: Argue that the azimuthal angle φ evolves according to $d\phi/dt = u^\phi / u^t = L \Delta / (E r^2)$.]
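Several parts of this exercise can be checked numerically before or after working them by hand. The following Python sketch, in geometric units G = c = 1, codes the effective potential (7.37) and the circular-orbit expressions (7.38) and (7.39), and integrates dτ = dr/|u^r| from equation (7.36) for radial infall from rest at infinity (E = 1, L = 0). All function names, the midpoint-rule integrator, and the rounded value of G M_⊙/c³ used to convert to seconds are assumptions of this example rather than anything prescribed by the text.

```python
import math

G_MSUN_OVER_C3 = 4.925e-6  # seconds; rounded value of G*M_sun/c^3 (assumed)


def horizon_function(r, M=1.0):
    """Delta(r) = 1 - 2M/r, eq. (7.34)."""
    return 1.0 - 2.0 * M / r


def effective_potential(r, L, M=1.0):
    """U = (1 + L^2/r^2) Delta, eq. (7.37)."""
    return (1.0 + L * L / (r * r)) * horizon_function(r, M)


def circular_orbit_L(r, M=1.0):
    """|L| for a circular orbit, eq. (7.38); requires r > 3M."""
    return r / math.sqrt(r / M - 3.0)


def circular_orbit_E(r, M=1.0):
    """E for a circular orbit, eq. (7.39); requires r > 3M."""
    return (r - 2.0 * M) / math.sqrt(r * (r - 3.0 * M))


def dU_dr(r, L, M=1.0, h=1e-5):
    """Central finite-difference derivative of U with respect to r."""
    return (effective_potential(r + h, L, M)
            - effective_potential(r - h, L, M)) / (2.0 * h)


def infall_proper_time(r_start, M=1.0, steps=100000):
    """Midpoint-rule integral of dr/|u^r| from 0 to r_start for E = 1, L = 0."""
    dr = r_start / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        ur = math.sqrt(1.0 - effective_potential(r, 0.0, M))  # (E^2 - U)^(1/2), E = 1
        total += dr / ur
    return total


def horizon_to_singularity_seconds(mass_in_solar_masses):
    """Proper time from r = 2M to r = 0, scaled from M = 1 and converted to seconds."""
    return infall_proper_time(2.0, M=1.0) * mass_in_solar_masses * G_MSUN_OVER_C3


if __name__ == "__main__":
    r = 10.0
    L = circular_orbit_L(r)
    print("dU/dr at circular orbit:", dU_dr(r, L))
    print("E^2 - U at circular orbit:", circular_orbit_E(r) ** 2 - effective_potential(r, L))
    print("fall time for 4e6 solar masses (s):", horizon_to_singularity_seconds(4e6))
```

The first two printed quantities should be close to zero if (7.38) and (7.39) are consistent with the effective potential, which is a useful sanity check on the algebra of parts 5 and 8.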
Exercise 7.7 Geodesics in the Schwarzschild geometry in 3 or more dimensions. Standard
general relativity breaks down in N = 2 spacetime dimensions, §11.19, and there are no black holes in N = 2
spacetime dimensions in the closest approximation to general relativity, Exercise 11.9 (there are however
black holes in N = 2 spacetime dimensions in extensions of general relativity). The Schwarzschild metric in
N ≥ 3 spacetime dimensions is
$$ ds^2 = - \Delta(r) \, dt^2 + \frac{1}{\Delta(r)} \, dr^2 + r^2 do^2 , \tag{7.41} $$
where $do^2$ is the metric of a unit (N−2)-sphere, and ∆(r) is the horizon function
$$ \Delta(r) = 1 - \frac{2M}{r^{N-3}} . \tag{7.42} $$
What happens when N = 3? What happens when N ≥ 5? Argue that equations (7.35)–(7.37) hold, with ∆
in the effective potential U being given by equation (7.42).
Solution. For N = 3, the horizon function (7.42) is constant, $\Delta = 1 - 2M$. For N = 3, a coordinate
transformation to coordinates $t' = t \sqrt{\Delta}$ and $r' = r / \sqrt{\Delta}$ brings the Schwarzschild line element (7.41) to
$$ ds^2 = - dt'^2 + dr'^2 + r'^2 \Delta \, do^2 , \tag{7.43} $$
which is the metric of a cone, with angle $2\pi \sqrt{\Delta}$ around a circumference. The spacetime looks flat except for
a conical vertex at r′ = 0. A mass M bends geodesics around it, but there are no bound orbits.
The condition for a circular orbit is that the effective potential be an extremum, dU/dr = 0. The boundary
between stable and unstable circular orbits occurs when the potential is a double extremum, $dU/dr =
d^2 U/dr^2 = 0$. The boundary between stable and unstable circular orbits occurs at
$$ \frac{r_c}{r_s} = \left( \frac{N-1}{5-N} \right)^{1/(N-3)} , \quad \frac{L_c}{r_s} = \left( \frac{N-1}{5-N} \right)^{(5-N)/[2(N-3)]} , \tag{7.44} $$
which has real finite solutions only for 2 ≤ N ≤ 4. For N = 2, equations (7.44) do not apply. For N = 3,
equations (7.44) give rc /rs = e and Lc /rs = e (where e is the exponential); but these values are really valid
not for N = 3, but rather for values of N infinitesimally close to but not equal to 3.
For N ≥ 5, there are no stable circular orbits. For N ≥ 5, the only circular orbits are unstable, which
occur for L > 1 if N = 5 or L > 0 if N ≥ 6. Besides unstable circular orbits, there are unbound geodesics,
and geodesics that fall into the black hole. The case N = 4 is the only dimension for which stable circular
orbits exist.
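The critical values of equation (7.44) can be evaluated and cross-checked against the double-extremum condition with a short Python sketch. It works in units r_s = 1 and assumes that r_s is defined by r_s^(N−3) = 2M, so that the horizon function (7.42) becomes ∆ = 1 − (r_s/r)^(N−3); that definition, the function names, and the finite-difference step are assumptions of this example.

```python
import math


def critical_radius(N):
    """r_c / r_s from eq. (7.44); finite and real for N < 5, N != 3."""
    return ((N - 1.0) / (5.0 - N)) ** (1.0 / (N - 3.0))


def critical_angular_momentum(N):
    """L_c / r_s from eq. (7.44); finite and real for N < 5, N != 3."""
    return ((N - 1.0) / (5.0 - N)) ** ((5.0 - N) / (2.0 * (N - 3.0)))


def effective_potential(r, L, N):
    """U of eq. (7.37) with Delta = 1 - r^(3-N), in units r_s = 1 (assumed definition of r_s)."""
    return (1.0 + L * L / (r * r)) * (1.0 - r ** (3.0 - N))


def first_and_second_derivative(r, L, N, h=1e-4):
    """Central finite-difference estimates of dU/dr and d^2U/dr^2."""
    up = effective_potential(r + h, L, N)
    mid = effective_potential(r, L, N)
    down = effective_potential(r - h, L, N)
    return (up - down) / (2.0 * h), (up - 2.0 * mid + down) / (h * h)


if __name__ == "__main__":
    N = 4
    rc, Lc = critical_radius(N), critical_angular_momentum(N)
    print("N = 4:", rc, Lc, first_and_second_derivative(rc, Lc, N))
    # Approaching N = 3 from above, both ratios tend towards e.
    print("N -> 3:", critical_radius(3.000001), critical_angular_momentum(3.000001))
```

For N = 4 the sketch gives r_c = 3 r_s and L_c = √3 r_s, that is r_c = 6M and L_c = 2√3 M, and both derivatives of U vanish there to within finite-difference error.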
Exercise 7.8 General relativistic precession of Mercury.
1. Conclude from Exercise 7.6 that the 4-velocity $u^\mu \equiv dx^\mu / d\tau$ of a massive particle on a geodesic in the
equ