Stony Brook University

Academic Commons

Essential Graduate Physics
Department of Physics and Astronomy

2013

Part SM: Statistical Mechanics


Konstantin Likharev
SUNY Stony Brook, konstantin.likharev@stonybrook.edu


Recommended Citation
Likharev, Konstantin, "Part SM: Statistical Mechanics" (2013). Essential Graduate Physics. 5.
https://commons.library.stonybrook.edu/egp/5

Konstantin K. Likharev
Essential Graduate Physics
Lecture Notes and Problems
Beta version

Open online access at


http://commons.library.stonybrook.edu/egp/
and
https://sites.google.com/site/likharevegp/

Part SM:
Statistical Mechanics
Last corrections: May 12, 2023

A version of this material was published in 2019 under the title

Statistical Mechanics: Lecture notes


IOPP, Essential Advanced Physics – Volume 7, ISBN 978-0-7503-1416-6,
with the model solutions of the exercise problems published under the title

Statistical Mechanics: Problems with solutions


IOPP, Essential Advanced Physics – Volume 8, ISBN 978-0-7503-1420-6

However, by now this online version of the lecture notes, as well as the problem solutions available from the author, have been corrected more thoroughly.

About the author:


https://you.stonybrook.edu/likharev/

© K. Likharev

Table of Contents

Chapter 1. Review of Thermodynamics (24 pp.)


1.1. Introduction: Statistical physics and thermodynamics
1.2. The 2nd law of thermodynamics, entropy, and temperature
1.3. The 1st and 3rd laws of thermodynamics, and heat capacity
1.4. Thermodynamic potentials
1.5. Systems with a variable number of particles
1.6. Thermal machines
1.7. Exercise problems (18)
Chapter 2. Principles of Physical Statistics (44 pp.)
2.1. Statistical ensemble and probability
2.2. Microcanonical ensemble and distribution
2.3. Maxwell’s Demon, information, and computing
2.4. Canonical ensemble and the Gibbs distribution
2.5. Harmonic oscillator statistics
2.6. Two important applications
2.7. Grand canonical ensemble and distribution
2.8. Systems of independent particles
2.9. Exercise problems (36)
Chapter 3. Ideal and Not-so-Ideal Gases (34 pp.)
3.1. Ideal classical gas
3.2. Calculating μ
3.3. Degenerate Fermi gas
3.4. The Bose-Einstein condensation
3.5. Gases of weakly interacting particles
3.6. Exercise problems (30)
Chapter 4. Phase Transitions (36 pp.)
4.1. First order phase transitions
4.2. Continuous phase transitions
4.3. Landau’s mean-field theory
4.4. Ising model: Weiss’ molecular-field approximation
4.5. Ising model: Exact and numerical results
4.6. Exercise problems (24)
Chapter 5. Fluctuations (44 pp.)
5.1. Characterization of fluctuations
5.2. Energy and the number of particles
5.3. Volume and temperature
5.4. Fluctuations as functions of time
5.5. Fluctuations and dissipation


5.6. The Kramers problem and the Smoluchowski equation


5.7. The Fokker-Planck equation
5.8. Back to the correlation function
5.9. Exercise problems (30)
Chapter 6. Elements of Kinetics (38 pp.)
6.1. The Liouville theorem and the Boltzmann equation
6.2. The Ohm law and the Drude formula
6.3. Electrochemical potential and drift-diffusion equation
6.4. Charge carriers in semiconductors: Statics and kinetics
6.5. Thermoelectric effects
6.6. Exercise problems (18)

***

Additional file (available from the author upon request):


Exercise and Test Problems with Model Solutions (156 + 26 = 182 problems; 273 pp.)

***

Author’s Remarks

This graduate-level course in statistical mechanics (including an introduction to thermodynamics


in Chapter 1) has a more or less traditional structure, with two notable exceptions. First, because of the
growing interest in nanoscale systems and ultrasensitive physical measurements, considerable attention is
paid to fluctuations of various physical variables. Namely, their discussion (in Chapter 5) includes not only
the traditional topic of variances in thermal equilibrium, but also the characterization of their time
dependence (including the correlation function and the spectral density), and also the Smoluchowski and
Fokker–Planck equations for Brownian systems. A part of this chapter, including the discussion of the
most important fluctuation-dissipation theorem (FDT), has an unavoidable overlap with Chapter 7 of the
QM part of this series, devoted to open quantum systems. I tried to keep this overlap to a minimum, for
example using a different (and much shorter) derivation of the FDT in this course. The second deviation
from the tradition is Chapter 6 on physical kinetics, reflecting my belief that such key theoretical tools
as the Boltzmann kinetic equation and such key notions as the difference between the electrostatic and
electrochemical potentials in conductors, and band bending in semiconductors have to be in the arsenal
of every educated scientist.


Chapter 1. Review of Thermodynamics


This chapter starts with a brief discussion of the subject of statistical physics and thermodynamics, and
the relation between these two disciplines. Then I proceed to review the basic notions and relations of
thermodynamics. Most of this material is supposed to be known to the reader from their undergraduate
studies,1 so the discussion is rather brief.

1.1. Introduction: Statistical physics and thermodynamics


Statistical physics (alternatively called “statistical mechanics”) and thermodynamics are two
different but related approaches to the same goal: an approximate description of the “internal”2
properties of large physical systems, notably those consisting of N >> 1 identical particles – or other
components. The traditional example of such a system is a human-scale portion of gas, with the number
N of atoms/molecules3 of the order of the Avogadro number N_A ~ 10^{24} (see Sec. 4 below).
The motivation for the statistical approach to such systems is straightforward: even if the laws
governing the dynamics of each particle and their interactions were exactly known, and we had infinite
computing resources at our disposal, calculating the exact evolution of the system in time would be
impossible, at least because it is completely impracticable to measure the exact initial state of each
component – in the classical case, the initial position and velocity of each particle. The situation is
further exacerbated by the phenomena of chaos and turbulence,4 and the quantum-mechanical
uncertainty, which do not allow the exact calculation of positions and velocities of the component
particles even if their initial state is known with the best possible precision. As a result, in most
situations, only statistical predictions about the behavior of such systems may be made, with probability
theory becoming a major tool of the mathematical arsenal.
However, the statistical approach is not as bad as it may look. Indeed, it is almost self-evident
that any measurable macroscopic variable characterizing a stationary system of N >> 1 particles as a
whole (think, e.g., about the stationary pressure P of the gas contained in a fixed volume V) is nearly
constant in time. Indeed, as we will see below, besides certain exotic exceptions, the relative magnitude
of fluctuations – either in time, or among many macroscopically similar systems – of such a variable is
of the order of 1/N^{1/2}, and for N ~ N_A is extremely small. As a result, the average values of appropriate
macroscopic variables may characterize the state of the system quite well – satisfactory for nearly all
practical purposes. The calculation of relations between such average values is the only task of
thermodynamics and the main task of statistical physics. (Fluctuations may be important, but due to
their smallness, in most cases their analysis may be based on perturbative approaches – see Chapter 5.)
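As a simple numerical illustration of this smallness: for a gas portion with N ~ N_A ~ 10^{24}, the relative fluctuation of, say, its pressure may be expected at the scale

$$ \frac{\delta P}{P} \sim \frac{1}{N^{1/2}} \sim \frac{1}{(10^{24})^{1/2}} = 10^{-12}, $$

i.e. some twelve orders of magnitude below the measured value itself.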

1 For remedial reading, I can recommend, for example (in the alphabetical order): C. Kittel and H. Kroemer,
Thermal Physics, 2nd ed., W. H. Freeman (1980); F. Reif, Fundamentals of Statistical and Thermal Physics,
Waveland (2008); D. V. Schroeder, Introduction to Thermal Physics, Addison Wesley (1999).
2 Here “internal” is an (admittedly loose) term meaning all the physics unrelated to the motion of the system as a
whole. The most important example of internal dynamics is the thermal motion of atoms and molecules.
3 This is perhaps my best chance to reverently mention Democritus (circa 460-370 BC) – the Ancient Greek
genius who was apparently the first one to conjecture the atomic structure of matter.
4 See, e.g., CM Chapters 8 and 9.


Now let us have a fast look at typical macroscopic variables that statistical physics and
thermodynamics should operate with. Since I have already mentioned pressure P and volume V, let us
start with this famous pair of variables. First of all, note that volume is an extensive variable, i.e. a
variable whose value for a system consisting of several non-interacting parts is the sum of those of its
parts. On the other hand, pressure is an example of an intensive variable whose value is the same for
different parts of a system – if they are in equilibrium. To understand why P and V form a natural pair of
variables, let us consider the classical playground of thermodynamics: a portion of a gas contained in a
cylinder closed with a movable piston of area A (Fig. 1).

Fig. 1.1. Compressing gas. [A cylinder of cross-section area A, with gas at pressure P filling volume V, closed with a piston displaced by x under an external force F.]
Neglecting the friction between the walls and the piston, and assuming that it is being moved so
slowly that the pressure P is virtually the same for all parts of the volume at any instant,5 the elementary
work of the external force F = PA compressing the gas, at a small piston’s displacement dx = –dV/A, is

Work on a gas:
$$ dW \equiv F\,dx = \left(\frac{F}{A}\right)(A\,dx) = -P\,dV. \tag{1.1} $$
The last expression is more general than the model shown in Fig. 1, and does not depend on the
particular shape of the system’s surface. (Note that in the notation of Eq. (1), which will be used
throughout this course, the elementary work done by the gas on its environment equals –dW.)
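As a trivial worked illustration of Eq. (1): if a gas portion is compressed quasistatically, at a constant pressure P, from a volume V to V/2, the work done on it by the external force is

$$ W = -\int_V^{V/2} P\,dV' = P\,\frac{V}{2} > 0, $$

while the work done by the gas on its environment is the negative of this value.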
From the point of view of analytical mechanics,6 V and (–P) are just one of many possible canonical
pairs of generalized coordinates qj and generalized forces Fj, whose products dWj = Fjdqj give
independent contributions to the total work of the environment on the system under analysis. For
example, the reader familiar with the basics of electrostatics knows that if the spatial distribution E(r) of
an external electric field does not depend on the electric polarization P(r) of a dielectric medium placed
into this field, its elementary work on the medium is

$$ dW = \int \mathbf{E}(\mathbf{r}) \cdot d\mathbf{P}(\mathbf{r})\; d^3r = \sum_{j=1}^{3}\int E_j(\mathbf{r})\, dP_j(\mathbf{r})\; d^3r. \tag{1.2a} $$

The most important cases when this condition is fulfilled (and hence Eq. (2a) is valid) are, first, long
cylindrical samples in a parallel external field (see, e.g., EM Fig. 3.13) and, second, the polarization of a
sample (of any shape) due to that of discrete electric dipoles p_k, whose electric interaction is negligible.
In the latter case, Eq. (2a) may be also rewritten as the sum over the single dipoles located at points r_k:7

5 Such slow processes are called quasistatic; in the absence of static friction, they are (macroscopically)
reversible: at their inversion, the system runs back through the same sequence of its macroscopic states.
6 See, e.g., CM Chapters 2 and 10.
7 Some of my SBU students needed an effort to reconcile the positive signs in Eq. (2b) with the negative sign in

the well-known relation dUk = –E(rk)dp k for the potential energy of a dipole in an external electric field – see, e.g.,


$$ dW = \sum_k dW_k\,, \qquad \text{with} \quad dW_k = \mathbf{E}(\mathbf{r}_k) \cdot d\mathbf{p}_k\,. \tag{1.2b} $$

Very similarly, and at similar conditions on an external magnetic field H(r), its elementary work on a
magnetic medium may be also represented in either of two forms:8
$$ dW = \mu_0 \int \mathbf{H}(\mathbf{r}) \cdot d\mathbf{M}(\mathbf{r})\; d^3r = \mu_0 \sum_{j=1}^{3}\int H_j(\mathbf{r})\, dM_j(\mathbf{r})\; d^3r, \tag{1.3a} $$

$$ dW = \sum_k dW_k\,, \qquad \text{with} \quad dW_k = \mu_0\, \mathbf{H}(\mathbf{r}_k) \cdot d\mathbf{m}_k\,, \tag{1.3b} $$

where M and m_k are the vectors of, respectively, the medium’s magnetization and the magnetic
moment of a single dipole. Formulas (2) and (3) show that the roles of generalized coordinates may be
played by Cartesian components of the vectors P (or p) and M (or m), with the components of the
electric and magnetic fields playing the roles of the corresponding generalized forces. This list may be
extended to other interactions (such as gravitation, surface tension in fluids, etc.). Following tradition, I
will use the {–P, V } pair in almost all the formulas below, but the reader should remember that they all
are valid for any other pair {Fj, qj}.9
Again, the specific relations between the variables of each pair listed above may depend on the
statistical properties of the system under analysis, but their definitions are not based on statistics. The
situation is very different for a very specific pair of variables, temperature T and entropy S, although
these “sister variables” participate in many formulas of thermodynamics exactly as if they were just one
more canonical pair {Fj, qj}. However, the very existence of these two notions is due to statistics.
Namely, temperature T is an intensive variable that characterizes the degree of thermal “agitation” of the
system’s components. On the contrary, the entropy S is an extensive variable that in most cases evades
immediate perception by human senses; it is a quantitative measure of the disorder of the system, i.e. the
degree of our ignorance about its exact microscopic state.10
The reason for the appearance of the {T, S} pair of variables in formulas of thermodynamics and
statistical mechanics is that the statistical approach to large systems of particles brings some
qualitatively new results, most notably the possibility of irreversible time evolution of collective
(macroscopic) variables describing the system. On one hand, irreversibility looks absolutely natural in
such phenomena as the diffusion of an ink drop in a glass of water. In the beginning, the ink molecules
are located in a certain small part of the system’s volume, i.e. are to some extent ordered, while at the
late stages of diffusion, the position of each molecule in the glass is essentially random. However, on

EM Eqs. (3.15). The resolution of this paradox is simple: each term of Eq. (2b) describes the work dW_k of the
electric field on the internal degrees of freedom of the k-th dipole, changing its internal energy E_k: dE_k = dW_k. This
energy change may be viewed as coming from the dipole’s potential energy in the field: dE_k = –dU_k.
8 Here, as in all my series, I am using the SI units; for their conversion to the Gaussian units, I have to refer the
reader to the EM part of the series.
9 Note that in systems of discrete particles, most generalized forces, including the fields E and H, differ from the

mechanical pressure P in the sense that their work may be explicitly partitioned into single-particle components –
see Eqs. (2b) and (3b). This fact gives some discretion for the approaches based on thermodynamic potentials –
see Sec.4 below.
10 The notion of entropy was introduced into thermodynamics in 1865 by Rudolf Julius Emanuel Clausius on a
purely phenomenological basis. In the absence of a clear understanding of entropy’s microscopic origin (which
had to wait for the works by L. Boltzmann and J. Maxwell), this was an amazing intellectual achievement.


second thought, the irreversibility is rather surprising, taking into account that the “microscopic” laws
governing the motion of the system’s components are time-reversible – such as Newton’s laws or the
basic laws of quantum mechanics.11 Indeed, if at a late stage of the diffusion process, we reversed the
velocities of all molecules exactly and simultaneously, the ink molecules would again gather (for a
moment) into the original spot.12 The problem is that getting the information necessary for the exact
velocity reversal is not practicable.
A quantitative discussion of the reversibility-irreversibility dilemma requires a strict definition of
the basic notion of statistical mechanics (and indeed of the probability theory): the statistical ensemble,
and I would like to postpone it until the beginning of Chapter 2. In particular, in that chapter, we will see
that the basic law of irreversible behavior is a growth or constancy of the entropy S in any closed
system. Thus, the statistical mechanics, without defying the “microscopic” laws governing the evolution
of the system’s components, introduces on top of them some new “macroscopic” laws, intrinsically
related to information, i.e. the depth of our knowledge of the microscopic state of the system.
To conclude this brief discussion of variables, let me mention that as in all fields of physics, a
very special role in statistical mechanics is played by the energy E. To emphasize the commitment to
disregard the motion of the system as a whole in this subfield of physics, the E considered in
thermodynamics is frequently called the internal energy, though just for brevity, I will skip this
adjective in most cases. The simplest example of such E is the sum of kinetic energies of molecules in a
dilute gas at their thermal motion, but in general, the internal energy also includes not only the
individual energies of the system’s components but also their interactions with each other. Besides a few
“pathological” cases of very-long-range interactions, the interactions may be treated as local; in this
case, the internal energy is proportional to N, i.e. is an extensive variable. As will be shown below, other
extensive variables with the dimension of energy are often very useful as well, including the
(Helmholtz) free energy F, the Gibbs energy G, the enthalpy H, and the grand potential Ω. (The
collective name for such variables is thermodynamic potentials.)
Now, we are ready for a brief discussion of the relationship between statistical physics and
thermodynamics. While the task of statistical physics is to calculate the macroscopic variables discussed
above13 for various microscopic models of the system, the main role of thermodynamics is to derive
some general relations between the average values of the macroscopic variables (also called
thermodynamic variables) that do not depend on specific models. Surprisingly, it is possible to
accomplish such a feat using just a few either evident or very plausible general assumptions (sometimes
called the laws of thermodynamics), which find their proofs in statistical physics.14 Such general
relations allow for a substantial reduction of the number of calculations we have to do in statistical
physics: in most cases, it is sufficient to calculate from the statistics just one or two variables, and then

11 Because of that, the possibility of the irreversible macroscopic behavior of microscopically reversible systems
was questioned by some serious scientists as recently as in the late 19th century – notably by J. Loschmidt in 1876.
12 While quantum-mechanical effects, with their intrinsic uncertainty, may be quantitatively important in this
example, our qualitative discussion does not depend on them.
13 Several other important quantities, for example the heat capacity C, may be calculated as partial derivatives of
the basic variables discussed above. Also, at certain conditions, the number of particles N in a system cannot be
fixed and should also be considered as an (extensive) variable – see Sec. 5 below.
14 Admittedly, some of these proofs are based on other plausible but deeper postulates, for example, the central
statistical hypothesis (see Sec. 2.2 below) whose best proof, to my knowledge, is just the whole body of
experimental data.


use thermodynamic relations to get all other properties of interest. Thus the science of thermodynamics,
sometimes snubbed as a phenomenology, deserves every respect not only as a useful theoretical tool but
also as a discipline more general than any particular statistical model. This is why the balance of this
chapter is devoted to a brief review of thermodynamics.

1.2. The 2nd law of thermodynamics, entropy, and temperature


Thermodynamics accepts a phenomenological approach to the entropy S, postulating that there is
such a unique extensive measure of the aggregate disorder, and that in a closed system (defined as a
system completely isolated from its environment, i.e. the system with its internal energy fixed) it may
only grow in time, reaching its constant (maximum) value at equilibrium:15

2nd law of thermodynamics:
$$ dS \geq 0. \tag{1.4} $$

This postulate is called the 2nd law of thermodynamics – arguably its only substantial new law.16,17
This law, together with the additivity of S (as an extensive variable) in composite systems of
non-interacting parts is sufficient for a formal definition of temperature, and a derivation of its basic
properties that comply with our everyday notion of this key variable. Indeed, let us consider a closed
system consisting of two fixed-volume subsystems (Fig. 2) whose internal relaxation is very fast in
comparison with the rate of the thermal flow (i.e. the energy and entropy exchange) between the parts.
In this case, on the latter time scale, each part is always in a quasistatic state, which may be described by
a unique relation E(S) between its energy and entropy.18

Fig. 1.2. A composite thermodynamic system. [Two subsystems, with energies and entropies E_1, S_1 and E_2, S_2, exchanging dE and dS.]

Neglecting the energy of interaction between the parts (which is always possible at N >> 1, and
in the absence of very-long-range interactions), we may use the extensive character of the variables E
and S to write
$$ E = E_1(S_1) + E_2(S_2), \qquad S = S_1 + S_2, \tag{1.5} $$

for the full energy and entropy of the system. Now let us use them to calculate the following derivative:

15 Implicitly, this statement also postulates the existence, in a closed system, of thermodynamic equilibrium, an
asymptotically reached state in which all macroscopic variables, including entropy, remain constant. Sometimes
this postulate is called the 0th law of thermodynamics.
16 Two initial formulations of this law, later proved equivalent, were put forward independently by Lord Kelvin
(born William Thomson) in 1851 and by Rudolf Clausius in 1854.
17 Note that according to Eq. (4), a macroscopically reversible process is possible only when the net entropy (of
the system under consideration plus its environment involved in the process) does not change.
18 Here we strongly depend on a very important (and possibly the least intuitive) aspect of the 2nd law, namely that
entropy is a unique macroscopic measure of disorder.


$$ \frac{dS}{dE_1} = \frac{dS_1}{dE_1} + \frac{dS_2}{dE_1} = \frac{dS_1}{dE_1} + \frac{dS_2}{dE_2}\,\frac{dE_2}{dE_1} = \frac{dS_1}{dE_1} + \frac{dS_2}{dE_2}\,\frac{d(E - E_1)}{dE_1}. \tag{1.6} $$
Since the total energy E of the closed system is fixed and hence independent of its re-distribution
between the subsystems, we have to take dE/dE_1 = 0, and Eq. (6) yields
$$ \frac{dS}{dE_1} = \frac{dS_1}{dE_1} - \frac{dS_2}{dE_2}. \tag{1.7} $$
According to the 2nd law of thermodynamics, when the two parts have reached the thermodynamic
equilibrium, the total entropy S reaches its maximum, so dS/dE1 = 0, and Eq. (7) yields
$$ \frac{dS_1}{dE_1} = \frac{dS_2}{dE_2}. \tag{1.8} $$
This equality shows that if a thermodynamic system may be partitioned into weakly interacting
macroscopic parts, their derivatives dS/dE should be equal in equilibrium. The reciprocal of this
derivative is called temperature. Taking into account that our analysis pertains to the situation (Fig. 2)
when both volumes V_{1,2} are fixed, we may write this definition as

Definition of temperature:
$$ \left(\frac{\partial E}{\partial S}\right)_V \equiv T, \tag{1.9} $$
the subscript V meaning that volume is kept constant at the differentiation. (Such notation is common
and very useful in thermodynamics, with its broad range of variables.)
Note that according to Eq. (9), if the temperature is measured in energy units19 (as I will do in
this course for the brevity of notation), then S is dimensionless. The transfer to the SI or Gaussian units,
i.e. to the temperature T_K measured in kelvins (not “Kelvins”, and not “degrees Kelvin”, please!), is
given by the relation T = k_B T_K, where the Boltzmann constant k_B ≈ 1.38×10^{-23} J/K = 1.38×10^{-16} erg/K.20
In those units, the entropy becomes dimensional: S_K = k_B S.
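For example, at the “room temperature” T_K ≈ 300 K,

$$ T = k_B T_K \approx (1.38\times10^{-23}\,\text{J/K}) \times (300\,\text{K}) \approx 4.1\times10^{-21}\,\text{J} \approx 26\,\text{meV}. $$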
The definition of temperature given by Eq. (9) is of course in sharp contrast with the popular
notion of T as a measure of the average energy of one particle. However, as we will repeatedly see
below, in many cases these two notions may be reconciled, with Eq. (9) being more general. In
particular, the so-defined T is in agreement with our everyday notion of temperature:21
(i) according to Eq. (9), the temperature is an intensive variable (since both E and S are
extensive), i.e., in a system of similar particles, it is independent of the particle number N;

19 Here I have to mention a traditional unit of thermal energy, the calorie, still being used in some applied fields.
In the most common modern definition (as the so-called thermochemical calorie), it equals exactly 4.184 J.
20 For the more exact values of this and other constants, see appendix UCA: Selected Units and Constants. Note
that both T and T_K define the natural absolute (also called “thermodynamic”) scale of temperature, vanishing at
the same point – in contrast to such artificial scales as the degrees Celsius (“centigrades”), defined as T_C ≡ T_K –
273.15, or the degrees Fahrenheit: T_F ≡ (9/5)T_C + 32.
21 Historically, this notion was initially only qualitative – just as something distinguishing “hot” from “cold”.
After the invention of thermometers (the first one by Galileo Galilei in 1592), mostly based on the thermal
expansion of fluids, this notion had become quantitative but not very deep: understood as just “what
thermometers measure” – until its physical sense, as a measure of thermal motion’s intensity, was revealed in the
19th century.


(ii) temperatures of all parts of a system are equal at equilibrium – see Eq. (8);
(iii) in a closed system whose parts are not in equilibrium, thermal energy (heat) always flows
from the warmer part (with a higher T) to the colder part.
In order to prove the last property, let us revisit the closed composite system shown in Fig. 2,
and consider another derivative:
$$ \frac{dS}{dt} = \frac{dS_1}{dt} + \frac{dS_2}{dt} = \frac{dS_1}{dE_1}\,\frac{dE_1}{dt} + \frac{dS_2}{dE_2}\,\frac{dE_2}{dt}. \tag{1.10} $$
If the internal state of each part is very close to equilibrium (as was assumed from the very beginning) at
each moment of time, we can use Eq. (9) to replace the derivatives dS1,2/dE1,2 with 1/T1,2, getting
$$ \frac{dS}{dt} = \frac{1}{T_1}\,\frac{dE_1}{dt} + \frac{1}{T_2}\,\frac{dE_2}{dt}. \tag{1.11} $$
Since in a closed system E = E1 + E2 = const, these time derivatives are related as dE2/dt = –dE1/dt, and
Eq. (11) yields
$$ \frac{dS}{dt} = \left(\frac{1}{T_1} - \frac{1}{T_2}\right)\frac{dE_1}{dt}. \tag{1.12} $$
But according to the 2nd law of thermodynamics, this derivative cannot be negative: dS/dt ≥ 0. Hence,
1 1  dE1
    0. (1.13)
 T1 T2  dt
For example, if T_1 > T_2, then dE_1/dt ≤ 0, i.e. the warmer part gives energy to its colder counterpart.
Note also that such a heat exchange, at fixed volumes V_{1,2} and T_1 ≠ T_2, increases the total
system’s entropy, without performing any “useful” mechanical work – see Eq. (1).
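As a simple worked example of this statement: if T_1 = 2T_2, and a small energy δE > 0 is transferred from the warmer part 1 to the colder part 2 (so dE_1 = –δE), then Eq. (12) yields

$$ dS = \left(\frac{1}{T_1} - \frac{1}{T_2}\right)(-\delta E) = \frac{\delta E}{2T_2} > 0, $$

in agreement with the 2nd law (4).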

1.3. The 1st and 3rd laws of thermodynamics, and heat capacity
Now let us consider a thermally insulated system whose volume V may be changed by force –
see, for example, Fig. 1. Such a system is different from the fully closed one, because its energy E may
be changed by the external force’s work – see Eq. (1):
$$ dE = dW = -P\,dV. \tag{1.14} $$
Let the volume change be not only quasistatic but also static-friction-free (reversible), so that the system
is virtually at equilibrium at any instant. Such a reversible process, in the particular case of a thermally
insulated system, is also called adiabatic. If the pressure P (or any generalized external force F_j) is
deterministic, i.e. is a predetermined function of time, independent of the state of the system under
analysis, it may be considered as coming from a fully ordered system, i.e. the one with zero entropy,
with the aggregate system (consisting of the system under our analysis plus the source of the force)
completely closed. Since the entropy of the total closed system should stay constant (see the second of
Eqs. (5) above), the S of the system under analysis should stay constant on its own. Thus we arrive at a
very important conclusion: at an adiabatic process in a system, its entropy cannot change. (Sometimes
such a process is called isentropic.) This means that we may use Eq. (14) to write

$$ P = -\left(\frac{\partial E}{\partial V}\right)_S. \tag{1.15} $$
Now let us consider a more general thermodynamic system that may also exchange thermal
energy (“heat”) with its environment (Fig. 3).

Fig. 1.3. An example of the thermodynamic process involving both the mechanical work by the environment and the heat exchange with it. [A system with energy E(S, V) receives heat dQ and work dW.]

For such a system, our previous conclusion about the entropy’s constancy is not valid, so in
equilibrium, S may be a function of not only the system’s energy E but also of its volume: S = S(E, V).
Let us consider this relation resolved for energy: E = E(S, V), and write the general mathematical
expression for the full differential of E as a function of these two independent arguments:

$$ dE = \left(\frac{\partial E}{\partial S}\right)_V dS + \left(\frac{\partial E}{\partial V}\right)_S dV. \tag{1.16} $$
This formula, based on the stationary relation E = E(S, V), is evidently valid not only in equilibrium but
also for all reversible22 processes. Now, using Eqs. (9) and (15), we may rewrite Eq. (16) as
Energy differential:
$$ dE = T\,dS - P\,dV. \tag{1.17} $$

According to Eq. (1), the second term on the right-hand side of this equation is just the work of the
external force, so due to the conservation of energy,23 the first term has to be equal to the heat dQ
transferred from the environment to the system (see Fig. 3):

1st law of thermodynamics:
$$ dE = dQ + dW, \tag{1.18} $$
$$ dQ = T\,dS. \tag{1.19} $$

The last relation, divided by T and then integrated along an arbitrary (but reversible!) process,

$$ S = \int \frac{dQ}{T} + \text{const}, \tag{1.20} $$
is sometimes used as an alternative definition of entropy S – provided that temperature is defined not by
Eq. (9), but in some independent way. It is useful to recognize that entropy (like energy) may be defined

22Let me emphasize again that any adiabatic process is reversible, but not vice versa.
23Such conservation, expressed by Eqs. (18)-(19), is commonly called the 1st law of thermodynamics. While it (in
contrast with the 2nd law) does not present any new law of nature, and in particular was already used de-facto to
write the first of Eqs. (5) and also Eq. (14), such a grand name was absolutely justified in the early 19th century
when the mechanical nature of internal energy (including the motion of atoms and molecules) was not at all clear.
In this context, the names of at least three great scientists: Benjamin Thompson (who gave, in 1799, convincing
arguments that heat cannot be anything but a form of particle motion), Julius Robert von Mayer (who conjectured
the conservation of the sum of the thermal and macroscopic mechanical energies in 1841), and James Prescott
Joule (who proved this conservation experimentally two years later), have to be reverently mentioned.


only to an arbitrary constant, which does not affect any other thermodynamic observables. The common
convention is to take
$$ S \to 0 \quad \text{at} \quad T \to 0. \tag{1.21} $$

This condition is sometimes called the “3rd law of thermodynamics”, but it is important to realize that
this is just a convention rather than a real law.24 Indeed, the convention corresponds well to the notion of
the full order at T = 0 in some systems (e.g., separate atoms or perfect crystals), but creates ambiguity
for other systems, e.g., amorphous solids (like the usual glasses) that may remain highly disordered for
“astronomic” times, even at T → 0.
Now let us discuss the notion of heat capacity that, by definition, is the ratio dQ/dT, where dQ is
the amount of heat that should be given to a system to raise its temperature by a small amount dT. 25
(This notion is important because the heat capacity may be most readily measured experimentally.) The
heat capacity depends, naturally, on whether the heat dQ goes only into an increase of the internal
energy dE of the system (as it does if its volume V is constant), or also into the mechanical work (–dW)
performed by the system at its expansion – as it happens, for example, if the pressure P, rather than the
volume V, is fixed (the so-called isobaric process – see Fig. 4).

Fig. 1.4. The simplest example of the isobaric process. [A gas cylinder of cross-section area A, closed with a piston loaded with a weight Mg, so that P = const, while heat dQ is supplied.]

Hence we should discuss at least two different quantities,26 the heat capacity at fixed volume,

Heat capacity definitions:
$$ C_V \equiv \left(\frac{\partial Q}{\partial T}\right)_V, \tag{1.22} $$

and the heat capacity at fixed pressure,

$$ C_P \equiv \left(\frac{\partial Q}{\partial T}\right)_P, \tag{1.23} $$

24Actually, the 3rd law (also called the Nernst theorem) as postulated by Walter Hermann Nernst in 1912 was
different – and really meaningful: “It is impossible for any procedure to lead to the isotherm T = 0 in a finite
number of steps.” I will discuss this theorem at the end of Sec. 6.
25 By this definition, the full heat capacity of a system is an extensive variable, but it may be used to form such
intensive variables as the heat capacity per particle, called the specific heat capacity, or just the specific heat.
(Please note that the last terms are rather ambiguous: they are also used for the heat capacity per unit mass and per
unit volume, so some caution is in order.)
26 Dividing both sides of Eq. (19) by dT, we get the general relation dQ/dT = T dS/dT, which may be used to
rewrite the definitions (22) and (23) in the following forms:
$$ C_V = T\left(\frac{\partial S}{\partial T}\right)_V, \qquad C_P = T\left(\frac{\partial S}{\partial T}\right)_P, $$
which are more convenient for some applications.


and expect that for all “normal” (mechanically stable) systems, C_P ≥ C_V. The difference between C_P and
CV is minor for most liquids and solids, but may be very substantial for gases – see the next section.

1.4. Thermodynamic potentials


Since for a fixed volume, dW = –PdV = 0 and Eq. (18) yields dQ = dE, we may rewrite Eq. (22)
in another convenient form:

$$ C_V = \left(\frac{\partial E}{\partial T}\right)_V, \tag{1.24} $$

so to calculate C_V from a certain statistical-physics model, we only need to calculate E as a function of
temperature and volume. If we want to obtain a similarly convenient expression for CP, the best way is
to introduce a new notion of so-called thermodynamic potentials – whose introduction and effective use
is perhaps one of the most impressive techniques of thermodynamics. For that, let us combine Eqs. (1)
and (18) to write the 1st law of thermodynamics in its most common form

$$ dQ = dE + P\,dV. \tag{1.25} $$
At an isobaric process (Fig. 4), i.e. at P = const, this expression is reduced to
dQ P  dE P  d ( PV ) P  d ( E  PV ) P . (1.26)
Thus, if we introduce a new function with the dimensionality of energy:27
Enthalpy definition:
$$ H \equiv E + PV, \tag{1.27} $$

called enthalpy (or, sometimes, the “heat function” or the “heat contents”),28 we may rewrite Eq. (23) as

$$ C_P = \left(\frac{\partial H}{\partial T}\right)_P. \tag{1.28} $$
Comparing Eqs. (28) and (24), we see that for the heat capacity, the enthalpy H plays the same role at
fixed pressure as the internal energy E plays at fixed volume.
Now let us explore the enthalpy’s properties at an arbitrary reversible process, lifting the
restriction P = const, but keeping the definition (27). Differentiating this equality, we get
$$ dH = dE + P\,dV + V\,dP. \tag{1.29} $$
Plugging into this relation Eq. (17) for dE, we see that the terms PdV cancel, yielding a very simple
expression
Enthalpy differential:
$$ dH = T\,dS + V\,dP, \tag{1.30} $$

whose right-hand side differs from Eq. (17) only by the swap of P and V in the second term, with the
simultaneous change of its sign. Formula (30) shows that if H has been found (say, experimentally

27From the point of view of mathematics, Eq. (27) is a particular case of the so-called Legendre transformations.
28This function (as well as the Gibbs free energy G, see below), had been introduced in 1875 by J. Gibbs, though
the term “enthalpy” was coined (much later) by H. Onnes.


measured or calculated for a certain microscopic model) as a function of the entropy S and the pressure
P of a system, we can calculate its temperature T and volume V by simple partial differentiation:

$$ T = \left(\frac{\partial H}{\partial S}\right)_P, \qquad V = \left(\frac{\partial H}{\partial P}\right)_S. \tag{1.31} $$
The comparison of the first of these relations with Eq. (9) shows that not only for the heat capacity but
for the temperature as well, at fixed pressure, enthalpy plays the same role as played by internal energy
at fixed volume.
This success immediately raises the question of whether we could develop this idea further on,
by defining other useful thermodynamic potentials – the variables with the dimensionality of energy that
would have similar properties – first of all, a potential that would enable a similar swap of T and S in its
full differential, in comparison with Eq. (30). We already know that an adiabatic process is a reversible
process with constant entropy, inviting analysis of a reversible process with constant temperature. Such
an isothermal process may be implemented, for example, by placing the system under consideration into
thermal contact with a much larger system (called either the heat bath, or “heat reservoir”, or
“thermostat”) that remains in thermodynamic equilibrium at all times – see Fig. 5.

Fig. 1.5. The simplest example of the isothermal process. [The system exchanges heat dQ with a much larger heat bath kept at temperature T.]

Due to its very large size, the heat bath’s temperature T does not depend on what is being done
with our system. If the change is being done sufficiently slowly (i.e. reversibly), this temperature is also
the temperature of our system – see Eq. (8) and its discussion. Let us calculate the elementary
mechanical work dW (1) at such a reversible isothermal process. According to the general Eq. (18), dW
= dE – dQ. Plugging dQ from Eq. (19) into this equality, for T = const we get

dW T  dE  TdS  d ( E  TS )  dF , (1.32)


where the following combination,
Free energy definition:
$$ F \equiv E - TS, \tag{1.33} $$

is called the free energy (or the “Helmholtz free energy”, or just the “Helmholtz energy”29). Just as we
have done for the enthalpy, let us establish properties of this new thermodynamic potential for an
arbitrarily small, reversible (now not necessarily isothermal!) variation of variables, while keeping the
definition (33). Differentiating this relation and then using Eq. (17), we get
Free energy differential:
$$ dF = -S\,dT - P\,dV. \tag{1.34} $$

29 It was named after Hermann von Helmholtz (1821-1894). The last of the listed terms for F was recommended
by the most recent (1988) IUPAC’s decision, but I will use the first term, which prevails in the physics literature.
The origin of the adjective “free” stems from Eq. (32): F may be interpreted as the part of the internal energy that is
“free” to be transferred to the mechanical work, at the (very common) isothermal process.


Thus, if we know the function F(T, V), we can calculate S and P by simple differentiation:

$$ S = -\left(\frac{\partial F}{\partial T}\right)_V, \qquad P = -\left(\frac{\partial F}{\partial V}\right)_T. \tag{1.35} $$
Now we may notice that the system of all partial derivatives may be made full and symmetric if
we introduce one more thermodynamic potential. Indeed, we have already seen that each of the three
already introduced thermodynamic potentials (E, H, and F) has an especially simple full differential if it
is considered as a function of its two canonical arguments: one of the “thermal variables” (either S or T)
and one of the “mechanical variables” (either P or V):30

$$ E = E(S, V); \qquad H = H(S, P); \qquad F = F(T, V). \tag{1.36} $$
In this list of pairs of four arguments, only one pair is missing: {T, P}. The thermodynamic function of
this pair, which gives the two remaining variables (S and V) by simple differentiation, is called the
Gibbs energy (or sometimes the “Gibbs free energy”): G = G(T, P). The way to define it in a symmetric
fashion is evident from the so-called circular diagram shown in Fig. 6.

Fig. 1.6. (a) The circular diagram and (b) an example of its use for variable calculation. The thermodynamic potentials are typeset in red, each flanked with its two canonical arguments. [Diagram layout: corners S, P, T, and V; sides occupied by the potentials: E between S and V, H between S and P, G between P and T, and F between T and V; the arrows marked +PV and –TS show how H, F, and G are obtained from E.]

In this diagram, each thermodynamic potential is placed between its two canonical arguments –
see Eq. (36). The left two arrows in Fig. 6a show the way the potentials H and F have been obtained
from energy E – see Eqs. (27) and (33). This diagram hints that G has to be defined as shown by either
of the two right arrows on that panel, i.e. as
Gibbs energy definition:
$$ G \equiv F + PV = H - TS = E - TS + PV. \tag{1.37} $$

In order to verify this idea, let us calculate the full differential of this new thermodynamic potential,
using, e.g., the first form of Eq. (37) together with Eq. (34):
Gibbs energy differential:
$$ dG = dF + d(PV) = (-S\,dT - P\,dV) + (P\,dV + V\,dP) = -S\,dT + V\,dP, \tag{1.38} $$

so if we know the function G(T, P), we can indeed readily calculate both entropy and volume:

$$ S = -\left(\frac{\partial G}{\partial T}\right)_P, \qquad V = \left(\frac{\partial G}{\partial P}\right)_T. \tag{1.39} $$

30 Note the similarity of this situation with that in analytical mechanics (see, e.g., CM Chapters 2 and 10): the
Lagrangian function may be used for a simple derivation of the equations of motion if it is expressed as a function
of generalized coordinates and their velocities, while to use the Hamiltonian function in a similar way, it has to be
expressed as a function of the generalized coordinates and the corresponding momenta.


The circular diagram completed in this way is a good mnemonic tool for recalling Eqs. (9), (15),
(31), (35), and (39), which express thermodynamic variables as partial derivatives of the thermodynamic
potentials. Indeed, the variable in any corner of the diagram may be found as a partial derivative of either
of the two potentials that are not its direct neighbors, over the variable in the opposite corner. For example,
the green line in Fig. 6b corresponds to the second of Eqs. (39), while the blue line, to the second of Eqs.
(31). In this procedure, all the derivatives giving the variables of the upper row (S and P) have to be
taken with negative signs, while those giving the variables of the bottom row (V and T), with positive
signs.31
Now I have to justify the collective name “thermodynamic potentials” used for E, H, F, and G.
For that, let us consider a macroscopically irreversible process, for example, direct thermal contact of
two bodies with different initial temperatures. As was discussed in Sec. 2, at such a process, the entropy
may grow even without the external heat flow: dS > 0 at dQ = 0 – see Eq. (12). This means that at a
more general process with dQ ≠ 0, the entropy may grow faster than predicted by Eq. (19), which has
been derived for a reversible process, so

$$ dS \geq \frac{dQ}{T}, \tag{1.40} $$
with the equality approached in the reversible limit. Plugging Eq. (40) into Eq. (18) (which, being just
the energy conservation law, remains valid for irreversible processes as well), we get

$$ dE \leq T\,dS - P\,dV. \tag{1.41} $$


We can use this relation to have a look at the behavior of other thermodynamic potentials in
irreversible situations, still keeping their definitions given by Eqs. (27), (33), and (37). Let us start from
the (very common) case when both the temperature T and the volume V of a system are kept constant. If
the process is reversible, then according to Eq. (34), the full time derivative of the free energy F would
equal zero. Eq. (41) says that in an irreversible process, this is not necessarily so: if dT = dV = 0, then

$$ \frac{dF}{dt} = \frac{d}{dt}(E - TS)\Big|_T = \frac{dE}{dt} - T\frac{dS}{dt} \leq 0. \tag{1.42} $$
Hence, in the general (irreversible) situation, F can only decrease, but not increase in time. This means
that F eventually approaches its minimum value F(T, V) given by the equations of reversible
thermodynamics. To re-phrase this important conclusion, in the case T = const and V = const, the free
energy F, i.e. the difference E – TS, plays the role of the potential energy in the classical mechanics of
dissipative processes: its minimum corresponds to the (in the case of F, thermodynamic) equilibrium of
the system. This is one of the key results of thermodynamics, and I invite the reader to give it some
thought. One possible handwaving interpretation of this fact is that the heat bath with fixed T > 0,
i.e. with a substantial thermal agitation of its components, “wants” to impose thermal disorder in the
system immersed into it, by “rewarding” it with lower F for any increase of the disorder.

31 There is also a wealth of other relations between thermodynamic variables that may be represented as second
derivatives of the thermodynamic potentials, including four Maxwell relations such as (∂S/∂V)_T = (∂P/∂T)_V, etc.
(They may be readily recovered from the well-known property of a function of two independent arguments, say,
f(x, y): ∂(∂f/∂x)/∂y = ∂(∂f/∂y)/∂x.) In this chapter, I will list only the thermodynamic relations that will be used
later in the course; a more complete list may be found, e.g., in Sec. 16 of the book by L. Landau and E. Lifshitz,
Statistical Physics, Part 1, 3rd ed., Pergamon, 1980 (and its later re-printings).
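For the reader who prefers to check such identities by computer algebra, a minimal sketch (using, for example, the SymPy Python library) may look as follows; it confirms that the Maxwell relation quoted above is just the equality of the mixed second derivatives of F(T, V):

```python
# A minimal SymPy sketch: the Maxwell relation (dS/dV)_T = (dP/dT)_V
# follows from Eqs. (35) for an arbitrary smooth function F(T, V),
# because the mixed second derivatives of F commute.
import sympy as sp

T, V = sp.symbols('T V', positive=True)
F = sp.Function('F')(T, V)   # an arbitrary free energy F(T, V)

S = -sp.diff(F, T)           # Eqs. (35): S = -(dF/dT) at fixed V
P = -sp.diff(F, V)           # Eqs. (35): P = -(dF/dV) at fixed T

# The difference (dS/dV)_T - (dP/dT)_V vanishes identically:
print(sp.simplify(sp.diff(S, V) - sp.diff(P, T)))   # prints: 0
```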


Repeating the calculation for a different case, T = const, P = const, it is easy to see that in this
case the same role is played by the Gibbs energy:

$$ \frac{dG}{dt} = \frac{d}{dt}(E - TS + PV) = \frac{dE}{dt} - T\frac{dS}{dt} + P\frac{dV}{dt} \leq \left(T\frac{dS}{dt} - P\frac{dV}{dt}\right) - T\frac{dS}{dt} + P\frac{dV}{dt} = 0, \tag{1.43} $$
so the thermal equilibrium now corresponds to the minimum of G rather than F.
For the two remaining thermodynamic potentials, E and H, calculations similar to Eqs. (42) and
(43) are possible but make less sense because that would require keeping S = const (with V = const for
E, and P = const for H) for an irreversible process, but it is usually hard to prevent the entropy from
growing if initially it had been lower than its equilibrium value, at least on the long-term basis.32 Thus
physically, the circular diagram is not so symmetric after all: G and F are somewhat more useful for
most practical calculations than E and H.
Note that the difference G – F = PV between the two “more useful” potentials has very little to
do with thermodynamics at all because this difference exists (although is not much advertised) in
classical mechanics as well.33 Indeed, the difference may be generalized as G – F = –$\mathcal{F}$q, where q is a
generalized coordinate, and $\mathcal{F}$ is the corresponding generalized force. The minimum of F corresponds
to the equilibrium of an autonomous system (with $\mathcal{F}$ = 0), while the equilibrium position of the same
system under the action of external force $\mathcal{F}$ is given by the minimum of G. Thus the external force
“wants” the system to subdue to its effect, “rewarding” it with lower G.
Moreover, the difference between F and G becomes a bit ambiguous (approach-dependent) when
the product $\mathcal{F}$q may be partitioned into single-particle components – just as it is done in Eqs. (2b) and
(3b) for the electric and magnetic fields. Here the applied field may be taken into account on the
microscopic level, by including its effect directly into the energy ε_k of each particle. In this case, the
field contributes to the total internal energy E directly, and hence the thermodynamic equilibrium (at T =
const) is described as the minimum of F. (We may say that in this case F = G, unless a difference
between these thermodynamic potentials is created by the actual mechanical pressure P.) However, in
some cases, typically for condensed systems with their strong interparticle interactions, the easier (and
sometimes the only practicable34) way to account for the field is on the macroscopic level, by taking G
= F – $\mathcal{F}$q. In this case, the same equilibrium state is described as the minimum of G. (Several examples
of this dichotomy will be given later in this course.) Whatever the choice, one should mind not to take the
same field effect into account twice.

32 There are a few practicable systems, notably including the so-called adiabatic magnetic refrigerators (to be
discussed in Chapter 2), where the unintentional growth of S is so slow that the condition S = const may be
closely approached during a finite but substantial time interval.
33 It is convenient to describe it as the difference between the “usual” (internal) potential energy U of the system
and its “Gibbs potential energy” U_G – see CM Sec. 1.4. For the readers who skipped that discussion: my pet
example is the usual elastic spring with U = kx²/2, under the effect of an external force $\mathcal{F}$, whose equilibrium
position (x₀ = $\mathcal{F}$/k) evidently corresponds to the minimum of U_G = U – $\mathcal{F}$x, rather than just U.
34 An example of such an extreme situation is the case when an external magnetic field H is applied to a
macroscopic sample of a type-1 superconductor in its so-called intermediate state, in which the sample partitions
into domains of the “normal” phase with B = μ₀H, and the superconducting phase with B = 0. (For more on this
topic see, e.g., EM Secs. 6.4-6.5.) In this case, the field is effectively applied to the interfaces between the
domains, very similarly to the mechanical pressure applied to a gas portion via a piston – see Fig. 1 again.


One more important conceptual question I would like to discuss here is why usually statistical
physics pursues the calculation of thermodynamic potentials rather than just of a relation between P, V,
and T. (Such a relation is called the equation of state of the system.) Let us explore this issue on the
particular but very important example of an ideal classical gas in thermodynamic equilibrium, for which
the equation of state should be well known to the reader from undergraduate physics:35
Ideal gas equation of state:
$$ PV = NT, \tag{1.44} $$

where N is the number of particles in volume V. (In Chapter 3, we will derive Eq. (44) from statistics.)
Let us try to use it for the calculation of all thermodynamic potentials, and all other thermodynamic
variables discussed above. We may start, for example, from the calculation of the free energy F. Indeed,
integrating the second of Eqs. (35) with the pressure calculated from Eq. (44), P = NT/V, we get

$$ F = -\int P\,dV\,\Big|_{T=\text{const}} = -NT\int\frac{dV}{V} = -NT\int\frac{d(V/N)}{(V/N)} = -NT\ln\frac{V}{N} + Nf(T), \tag{1.45} $$
where V has been divided by N in both instances just to represent F as a manifestly extensive variable, in
this uniform system proportional to N. The integration “constant” f(T) is some function of temperature,
which cannot be recovered from the equation of state. This function affects all other thermodynamic
potentials, and the entropy as well. Indeed, using the first of Eqs. (35) together with Eq. (45), we get

$$ S = -\left(\frac{\partial F}{\partial T}\right)_V = N\left[\ln\frac{V}{N} - \frac{df(T)}{dT}\right], \tag{1.46} $$
and now may combine Eqs. (33) with (46) to calculate the (internal) energy of the gas,36

$$ E = F + TS = \left[-NT\ln\frac{V}{N} + Nf(T)\right] + TN\left[\ln\frac{V}{N} - \frac{df(T)}{dT}\right] = N\left[f(T) - T\frac{df(T)}{dT}\right]. \tag{1.47} $$
From here, we may use Eqs. (27), (44), and (47) to calculate the gas’ enthalpy,

$$ H = E + PV = E + NT = N\left[f(T) - T\frac{df(T)}{dT} + T\right], \tag{1.48} $$
and, finally, plug Eqs. (44) and (45) into Eq. (37) to calculate its Gibbs energy

$$ G = F + PV = N\left[-T\ln\frac{V}{N} + f(T) + T\right]. \tag{1.49} $$

35 The long history of the gradual discovery of this relation includes the very early (circa 1662) work by R. Boyle
and R. Townely, followed by contributions from H. Power, E. Mariotte, J. Charles, J. Dalton, and J. Gay-Lussac.
It was fully formulated by Benoît Paul Émile Clapeyron in 1834, in the form PV = nRT_K, where n is the number of
moles in the gas sample, and R ≈ 8.31 J/(mole·K) is the so-called gas constant. This form is equivalent to Eq. (44),
taking into account that R = k_B N_A, where N_A = 6.022 140 76×10^{23} mole⁻¹ is the Avogadro number, i.e. the number
of molecules per mole. (By the mole’s definition, N_A is just the reciprocal mass, in grams, of the 1/12th part of the
¹²C atom, which is close to the masses of one proton or one neutron – see Appendix UCA: Selected Units and
Constants.) Historically, this equation of state was the main argument for the introduction of the absolute
temperature T, because only with it, the equation acquires the spectacularly simple form (44).
36 Note that Eq. (47), in particular, describes a very important property of the ideal classical gas: its energy
depends only on temperature (and the number of particles), but not on volume or pressure.


One might ask whether the function f(T) is physically significant, or whether it is something like the
inconsequential, arbitrary constant – like the one that may be always added to the potential energy in
non-relativistic mechanics. In order to address this issue, let us calculate, from Eqs. (24) and (28), both
heat capacities, which are evidently measurable quantities:

$$ C_V = \left(\frac{\partial E}{\partial T}\right)_V = -NT\frac{d^2 f}{dT^2}, \tag{1.50} $$

$$ C_P = \left(\frac{\partial H}{\partial T}\right)_P = -N\left(T\frac{d^2 f}{dT^2} - 1\right) = C_V + N. \tag{1.51} $$
We see that the function f(T), or at least its second derivative, is measurable.37 (In Chapter 3, we
will calculate this function for two simple “microscopic” models of the ideal classical gas.) The meaning
of this function is evident from the physical picture of the ideal gas: the pressure P exerted on the walls
of the containing volume is produced only by the translational motion of the gas molecules, while their
internal energy E (and hence other thermodynamic potentials) may be also contributed by the internal
dynamics of the molecules – their rotation, vibration, etc. Thus, the equation of state does not give us the
full thermodynamic description of a system, while the thermodynamic potentials do.
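The straightforward but tedious bookkeeping of Eqs. (45)-(51) may also be verified by computer algebra; for example, a short SymPy (Python) sketch might look as follows:

```python
# A sketch verifying Eqs. (45)-(51) for the ideal classical gas,
# starting from F = -N*T*ln(V/N) + N*f(T), with an arbitrary f(T).
import sympy as sp

T, V, N = sp.symbols('T V N', positive=True)
f = sp.Function('f')(T)            # the "integration constant" f(T)

F = -N*T*sp.log(V/N) + N*f         # Eq. (45)
S = -sp.diff(F, T)                 # Eq. (46)
E = sp.simplify(F + T*S)           # Eq. (47): equals N*(f - T*f')
H = E + N*T                        # Eq. (48), using PV = NT
C_V = sp.diff(E, T)                # Eq. (50): equals -N*T*f''
C_P = sp.diff(H, T)                # Eq. (51)

print(sp.diff(E, V))               # 0: E does not depend on volume
print(sp.simplify(C_P - C_V))      # N, independently of f(T)
```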

1.5. Systems with a variable number of particles


Now we have to consider one more important case: when the number N of particles in a system
is not rigidly fixed, but may change as a result of a thermodynamic process. A typical example of such a
system is a gas sample separated from the environment by a penetrable partition – see Fig. 7.38

Fig. 1.7. An example of a system with a variable number of particles. [A system exchanging dN particles with its environment through a penetrable partition.]

Let us analyze this situation in the simplest case when all the particles are similar. (In Sec. 4.1,
this analysis will be extended to systems with particles of several sorts). In this case, we may consider N
as an independent thermodynamic variable whose variation may change the energy E of the system, so
Eq. (17) (valid for a slow, reversible process) should now be generalized as
Chemical potential definition:
$$ dE = T\,dS - P\,dV + \mu\,dN, \tag{1.52} $$

37 Note, however, that the difference C_P – C_V = N is independent of f(T). (If the temperature is measured in
kelvins, this relation takes a more familiar form C_P – C_V = nR.) It is straightforward (and hence left for the reader’s
exercise) to prove that the difference C_P – C_V of any system is fully determined by its equation of state.
38 Another important example is a gas in contact with an open-surface liquid or solid of similar molecules.


where  is some new function of state, called the chemical potential.39 Keeping the definitions of other
thermodynamic potentials, given by Eqs. (27), (33), and (37) intact, we see that the expressions for their
differentials should be generalized as
dH  TdS  VdP  dN , (1.53a)
dF   SdT  PdV  dN , (1.53b)
dG   SdT  VdP  dN , (1.53c)
so the chemical potential may be calculated as either of the following partial derivatives:40

 E   H   F   G 
         . (1.54)
 N  S ,V  N  S , P  N  T ,V  N  T , P
Despite the formal similarity of all Eqs. (54), one of them is more consequential than the others. Indeed, the Gibbs energy G is the only thermodynamic potential that is a function of two intensive parameters, T and P. However, just as all thermodynamic potentials, G has to be extensive, so in a system of similar particles it has to be proportional to N:

$$G = Ng, \tag{1.55}$$

where g is some function of T and P. Plugging this expression into the last of Eqs. (54), we see that μ equals exactly this function, so

$$\mu = \frac{G}{N}, \tag{1.56}$$

i.e. the chemical potential is just the Gibbs energy per particle.
In order to demonstrate how vital the notion of chemical potential may be, let us consider the situation (parallel to that shown in Fig. 2) when a system consists of two parts, with equal pressure and temperature, that can exchange particles at a relatively slow rate (much slower than the speed of the internal relaxation of each part). Then we can write two equations similar to Eqs. (5):

$$N = N_1 + N_2, \qquad G = G_1 + G_2, \tag{1.57}$$

where N = const, and Eq. (56) may be used to describe each component of G:

$$G = \mu_1 N_1 + \mu_2 N_2. \tag{1.58}$$

Plugging N₂ expressed from the first of Eqs. (57), N₂ = N – N₁, into Eq. (58), we see that

$$\frac{dG}{dN_1} = \mu_1 - \mu_2, \tag{1.59}$$

so the minimum of G is achieved at μ₁ = μ₂. Hence, in the conditions of fixed temperature and pressure, i.e. when G is the appropriate thermodynamic potential, the chemical potentials of the system parts should be equal – the so-called chemical equilibrium.
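This minimization is easy to visualize numerically. The sketch below is only a toy illustration (the convex functions G₁,₂ are hypothetical, chosen just to make the chemical potentials μ_i = dG_i/dN_i depend on the particle numbers): it minimizes the total Gibbs energy of two particle-exchanging parts and checks that μ₁ = μ₂ at the minimum, as Eq. (59) requires.

```python
import numpy as np

N_total = 100.0
G1 = lambda n: n*np.log(n) - n          # hypothetical G1(N1); mu1 = ln(N1)
G2 = lambda n: 2*(n*np.log(n) - n)      # hypothetical G2(N2); mu2 = 2 ln(N2)

N1 = np.linspace(1.0, N_total - 1.0, 100001)
G = G1(N1) + G2(N_total - N1)           # Eqs. (57)-(58), with N = const
i = np.argmin(G)

print(f"N1 at the minimum of G: {N1[i]:.2f}")
print(f"mu1 = {np.log(N1[i]):.4f}, mu2 = {2*np.log(N_total - N1[i]):.4f}")
# The two chemical potentials indeed coincide at the minimum of G.
```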
39 This name, of a historic origin, is misleading: as evident from Eq. (52), μ has a clear physical sense of the average energy cost of adding one more particle to a system with N >> 1.
40 Note that strictly speaking, Eqs. (9), (15), (31), (35), and (39) should be now generalized by adding another lower index, N, to the corresponding derivatives; I will just imply them being calculated at constant N.
Finally, later in the course, we will also run into several cases when the volume V of a system, its temperature T, and the chemical potential μ are all fixed. (The last condition may be readily implemented by allowing the system of our interest to exchange particles with an environment so large that its μ stays constant.) The thermodynamic potential appropriate for this case may be obtained by subtraction of the product μN from the free energy F, resulting in the so-called grand thermodynamic (or "Landau") potential:

$$\Omega \equiv F - \mu N = F - \frac{G}{N}N = F - G = -PV. \tag{1.60}$$

Indeed, for a reversible process, the full differential of this potential is

$$d\Omega = dF - d(\mu N) = (-SdT - PdV + \mu dN) - (\mu dN + Nd\mu) = -SdT - PdV - Nd\mu, \tag{1.61}$$

so if Ω has been calculated as a function of T, V, and μ, the other thermodynamic variables may be found as

$$S = -\left(\frac{\partial \Omega}{\partial T}\right)_{V,\mu}, \qquad P = -\left(\frac{\partial \Omega}{\partial V}\right)_{T,\mu}, \qquad N = -\left(\frac{\partial \Omega}{\partial \mu}\right)_{T,V}. \tag{1.62}$$
Now acting exactly as we have done for the other potentials, it is straightforward to prove that an irreversible process with fixed T, V, and μ provides dΩ/dt ≤ 0, so the system's equilibrium indeed corresponds to the minimum of its grand potential Ω. We will repeatedly use this fact in this course.

1.6. Thermal machines


In order to complete this brief review of thermodynamics, I cannot completely bypass the topic of thermal machines – not because it will be used much in this course, but mostly because of its practical and historic significance.41 Figure 8a shows the generic scheme of a thermal machine that may perform mechanical work on its environment (in our notation, equal to –W) during each cycle of the expansion/compression of some "working gas", by transferring different amounts of heat from a high-temperature heat bath (Q_H) and to the low-temperature bath (Q_L).

Fig. 1.8. (a) The simplest implementation of a thermal machine: a "working gas" receiving heat Q_H from a bath of temperature T_H, rejecting heat Q_L to a bath of temperature T_L, and performing work (–W); and (b) the graphic representation, on the [P, V] plane, of the mechanical work it performs. On panel (b), the solid arrow indicates the heat engine cycle direction, while the dashed arrow, the refrigerator cycle direction.

41 The whole field of thermodynamics was spurred by the famous 1824 work by Nicolas Léonard Sadi Carnot, in which he, in particular, gave an alternative, indirect form of the 2nd law of thermodynamics – see below.
One relation between the three amounts Q_H, Q_L, and W is immediately given by the energy conservation (i.e. by the 1st law of thermodynamics):

$$Q_H = Q_L - W. \tag{1.63}$$

From Eq. (1), the mechanical work during the cycle may be calculated as

$$-W = \oint P\,dV, \tag{1.64}$$
and hence represented by the area enclosed by the trajectory of the state-representing point on the [P, V] plane – see
Fig. 8b. Note that the sign of this circular integral depends on the direction of the point’s rotation; in
particular, the work (–W) done by the working gas is positive at its clockwise rotation (pertinent to heat
engines) and negative in the opposite case (implemented in refrigerators and heat pumps – see below).
Evidently, the work depends on the exact form of the cycle, which in turn may depend not only on TH
and TL, but also on the working gas’ properties.
An exception to this rule is the famous Carnot cycle, consisting of two isothermal and two
adiabatic processes (all reversible!). In its heat engine’s form, the cycle may start, for example, from an
isothermal expansion of the working gas in contact with the hot bath (i.e. at T = TH). It is followed by its
additional adiabatic expansion (with the gas being disconnected from both heat baths) until its
temperature drops to TL. Then an isothermal compression of the gas is performed in its contact with the
cold bath (at T = TL), followed by its additional adiabatic compression to raise T to TH again, after which
the cycle is repeated again and again. Note that during this specific cycle, the working gas is never in
contact with both heat baths simultaneously, thus avoiding the irreversible heat transfer between them.
The cycle’s shape on the [V, P] plane (Fig. 9a) depends on the exact properties of the working gas and
may be rather complicated. However, since the system’s entropy is constant at any adiabatic process, the
Carnot cycle’s shape on the [S, T] plane is always rectangular – see Fig. 9b.

Fig. 1.9. Representations of the Carnot cycle: (a) on the [V, P] plane (schematically), and (b) on the [S, T] plane, where it is a rectangle bounded by the isotherms T = T_H, T = T_L and the constant-entropy lines S = S₁, S = S₂. The meaning of the arrows is the same as in Fig. 8.

Since during each isotherm, the working gas is brought into thermal contact only with the corresponding heat bath, i.e. its temperature is constant, the relation (19), dQ = TdS, may be immediately integrated to yield

$$Q_H = T_H(S_2 - S_1), \qquad Q_L = T_L(S_2 - S_1). \tag{1.65}$$

Hence the ratio of these two heat flows is completely determined by their temperature ratio:

$$\frac{Q_H}{Q_L} = \frac{T_H}{T_L}, \tag{1.66}$$
regardless of the working gas properties. Formulas (63) and (66) are sufficient to find the ratio of the work (–W) to either of Q_H and Q_L. For example, the main figure-of-merit of a thermal machine used as a heat engine (Q_H > 0, Q_L > 0, –W = |W| > 0) is its efficiency

$$\eta \equiv \frac{-W}{Q_H} = \frac{Q_H - Q_L}{Q_H} = 1 - \frac{Q_L}{Q_H} \le 1. \tag{1.67}$$

For the Carnot cycle, this definition, together with Eq. (66), immediately yields the famous relation

$$\eta_{\text{Carnot}} = 1 - \frac{T_L}{T_H}, \tag{1.68}$$

which shows that at a given T_L (that is typically the ambient temperature ~300 K), the efficiency may be increased, ultimately to 1, by raising the temperature T_H of the heat source.42
The unique nature of the Carnot cycle (see Fig. 9b again) makes its efficiency (68) the upper
limit for any heat engine.43 Indeed, in this cycle, the transfer of heat between any heat bath and the
working gas is performed reversibly, when their temperatures are equal. (If this is not so, some heat may
flow from the hotter to the colder bath without performing any work.) In particular, Eq. (68) shows that η_max = 0 at T_H = T_L, i.e., no heat engine can perform mechanical work in the absence of temperature gradients.44
On the other hand, if the cycle is reversed (see the dashed arrows in Figs. 8 and 9), the same thermal machine may serve as a refrigerator, providing heat removal from the low-temperature bath (Q_L < 0) at the cost of consuming external mechanical work: W > 0. This reversal does not affect the basic relation (63), which now may be used to calculate the relevant figure-of-merit, called the cooling coefficient of performance (COP_cooling):

$$\text{COP}_{\text{cooling}} \equiv \frac{Q_L}{W} = \frac{Q_L}{Q_H - Q_L}. \tag{1.69}$$

Notice that this coefficient may be above unity; in particular, for the Carnot cycle we may use Eq. (66) (which is also unaffected by the cycle reversal) to get

$$(\text{COP}_{\text{cooling}})_{\text{Carnot}} = \frac{T_L}{T_H - T_L}, \tag{1.70}$$

so this value is larger than 1 at T_H < 2T_L, and may be even much larger when the temperature difference (T_H – T_L) sustained by the refrigerator tends to zero. For example, in a typical air-conditioning system, this difference is of the order of 10 K, while T_L ~ 300 K, so (T_H – T_L) ~ T_L/30, i.e. the Carnot value of

42 Semi-quantitatively, this trend is valid also for other, less efficient but more practicable heat engine cycles – see
Problems 15-18. This trend is the leading reason why internal combustion engines, with TH of the order of 1,500
K, are more efficient than steam engines, with TH of at most a few hundred K.
43 In some alternative axiomatic systems of thermodynamics, this fact is postulated and serves the role of the 2nd
law. This is why it is under persisting (predominantly, theoretical) attacks by suggestions of more efficient heat
engines – notably, with quantum systems. To the best of my knowledge, reliable analyses of all the suggestions
put forward so far have confirmed that the Carnot efficiency (68) cannot be exceeded even using quantum-
mechanical cycles – see, e.g., the recent review by S. Bhattacharjee and A. Dutta, Eur. Phys. J. B 94, 239 (2021).
44 Such a hypothetical heat engine that would violate the 2nd law of thermodynamics, is called the “perpetual
motion machine of the 2nd kind” – in contrast to any (also hypothetical) “perpetual motion machine of the 1st
kind” that would violate the 1st law, i.e., the energy conservation.
COP_cooling is as high as ~30. (In the state-of-the-art commercial HVAC systems, it is within the range of 3 to 4.) This is why the term "cooling efficiency", used in some textbooks instead of COP_cooling, may be misleading.
Since in the reversed cycle Q_H = –W + Q_L < 0, i.e. the system provides heat flow into the high-temperature heat bath, it may be used as a heat pump for heating purposes. The figure-of-merit appropriate for this application is different from Eq. (69):

$$\text{COP}_{\text{heating}} \equiv \frac{Q_H}{W} = \frac{Q_H}{Q_H - Q_L}, \tag{1.71}$$

so for the Carnot cycle, using Eq. (66) again, we get

$$(\text{COP}_{\text{heating}})_{\text{Carnot}} = \frac{T_H}{T_H - T_L}. \tag{1.72}$$

Note that this COP is always larger than 1, meaning that the Carnot heat pump is always more efficient than the direct conversion of work into heat (when Q_H = –W, so COP_heating = 1), though practical electricity-driven heat pumps are substantially more complex and hence more expensive than simple electric heaters. Such heat pumps, with typical COP_heating values of around 4 in summer and 2 in winter, are frequently used for heating large buildings.
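For a feel of the numbers, here is a minimal script (with sample temperatures chosen only for illustration) evaluating the three Carnot figures-of-merit (68), (70), and (72):

```python
T_H, T_L = 310.0, 300.0          # kelvins: e.g., an air conditioner on a warm day

eta      = 1 - T_L/T_H           # Eq. (68): heat engine efficiency
cop_cool = T_L/(T_H - T_L)       # Eq. (70): refrigerator
cop_heat = T_H/(T_H - T_L)       # Eq. (72): heat pump

print(f"eta = {eta:.4f}, COP_cooling = {cop_cool:.0f}, COP_heating = {cop_heat:.0f}")
# Prints eta = 0.0323, COP_cooling = 30, COP_heating = 31; note that
# COP_heating - COP_cooling = 1 identically, by Eqs. (69) and (71).
```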
Finally, note that according to Eq. (70), the COP_cooling of the Carnot cycle tends to zero at T_L → 0, making it impossible to reach the absolute zero of temperature, and hence illustrating the meaningful (Nernst's) formulation of the 3rd law of thermodynamics, cited in Sec. 3. Indeed, let us prescribe a finite but very large heat capacity C(T) to the low-temperature bath, and use the definition of this variable to write the following expression for the relatively small change of its temperature as a result of dn similar refrigeration cycles:

$$C(T_L)dT_L = Q_L dn. \tag{1.73}$$

Together with Eq. (66), this relation yields

$$\frac{C(T_L)dT_L}{T_L} = \frac{Q_H}{T_H}dn. \tag{1.74}$$

If T_L → 0, so T_H >> T_L and Q_H ≈ –W = const, the right-hand side of this equation does not depend on T_L, so integrating it over many (n >> 1) cycles, we get the following simple relation between the initial and final values of T_L:

$$\int_{T_{\text{ini}}}^{T_{\text{fin}}} \frac{C(T)dT}{T} = \frac{Q_H}{T_H}n. \tag{1.75}$$

For example, if C(T) is a constant, Eq. (75) yields an exponential law,

$$T_{\text{fin}} = T_{\text{ini}} \exp\left\{-\frac{|Q_H|}{CT_H}n\right\}, \tag{1.76}$$

with the absolute zero of temperature not reached at any finite n. Even for an arbitrary function C(T) that does not vanish at T → 0, Eq. (74) proves the Nernst theorem, because the number dn of cycles needed for any further temperature reduction diverges at T_L → 0.
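This approach to the absolute zero is easy to reproduce numerically; the following sketch (with arbitrary parameter values) iterates the refrigeration cycles of Eqs. (66) and (73) one by one, and compares the result with the exponential law (76):

```python
import numpy as np

T_H, C, Q_H = 300.0, 1.0, 1.0    # hypothetical values; heat |Q_H| per cycle = const
T_L, n = 100.0, 2000
for _ in range(n):
    Q_L = Q_H*T_L/T_H            # Eq. (66): heat removed from the cold bath
    T_L -= Q_L/C                 # Eq. (73): the resulting temperature drop
print(T_L, 100.0*np.exp(-Q_H*n/(C*T_H)))   # ~0.126 vs ~0.127: T_L > 0 at any finite n
```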
1.7. Exercise problems

1.1. Two bodies, with temperature-independent heat capacities C1 and C2, and different initial
temperatures T1 and T2, are placed into a weak thermal contact. Calculate the change of the total entropy
of the system by the time it reaches thermal equilibrium.

1.2. A gas portion has the following properties:
(i) its heat capacity C_V = aT^b, and
(ii) the work W_T needed for its isothermal compression from V₂ to V₁ equals cT ln(V₂/V₁),
where a, b, and c are some constants. Find the equation of state of the gas, and calculate the temperature dependence of its entropy S and thermodynamic potentials E, H, F, G, and Ω.

1.3. A closed volume with an ideal classical gas of similar molecules is separated with a partition
in such a way that the number N of molecules in each part is the same, but their volumes are different.
The gas is initially in thermal equilibrium, and its pressure in one part is P1, and in the other part, P2.
Calculate the change of entropy resulting from a fast removal of the partition, and analyze the result.

1.4. An ideal classical gas of N particles is initially confined to volume V, and is in thermal
equilibrium with a heat bath of temperature T. Then the gas is allowed to expand to volume V’ > V in
one of the following ways:
(i) The expansion is slow, so due to the sustained thermal contact with the heat bath, the gas
temperature remains equal to T.
(ii) The partition separating the volumes V and (V’ –V) is removed very fast, allowing the gas to
expand rapidly.
For each process, calculate the changes of pressure, temperature, energy, and entropy of the gas
during its expansion.

1.5. For an ideal classical gas with temperature-independent specific heat, derive the relation
between P and V at the adiabatic expansion/compression.

1.6. Calculate the speed and the wave impedance of acoustic waves propagating in an ideal
classical gas with temperature-independent specific heat, in the limits when the propagation may be
treated as:
(i) an isothermal process, and
(ii) an adiabatic process.
Which of these limits is achieved at higher wave frequencies?

1.7. As will be discussed in Sec. 3.5, the so-called “hardball” models of classical particle
interaction yield the following equation of state of a gas of such particles:
P  T n  ,
where n = N/V is the particle density, and the function (n) is generally different from that (ideal(n) = n)
of the ideal gas, but still independent of temperature. For such a gas, with temperature-independent cV,
calculate:
(i) the energy of the gas, and
(ii) its pressure as a function of n at the adiabatic compression.

1.8. For an arbitrary thermodynamic system with a fixed number of particles, prove the four
Maxwell relations (already mentioned in Sec. 4):
$$(i): \left(\frac{\partial S}{\partial V}\right)_T = \left(\frac{\partial P}{\partial T}\right)_V, \qquad (ii): \left(\frac{\partial V}{\partial S}\right)_P = \left(\frac{\partial T}{\partial P}\right)_S,$$

$$(iii): \left(\frac{\partial S}{\partial P}\right)_T = -\left(\frac{\partial V}{\partial T}\right)_P, \qquad (iv): \left(\frac{\partial P}{\partial S}\right)_V = -\left(\frac{\partial T}{\partial V}\right)_S,$$

and also the following formula:

$$(v): \left(\frac{\partial E}{\partial V}\right)_T = T\left(\frac{\partial P}{\partial T}\right)_V - P.$$

1.9. Express the heat capacity difference, CP – CV, via the equation of state P = P(V, T) of the
system.

1.10. Prove that the isothermal compressibility45
$$\kappa_T \equiv -\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_{T,N}$$

of a system of N similar particles may be expressed in two different ways:

$$\kappa_T = \frac{V^2}{N^2}\left(\frac{\partial^2 P}{\partial \mu^2}\right)_T = \frac{V}{N^2}\left(\frac{\partial N}{\partial \mu}\right)_{T,V}.$$

1.11. Throttling46 is gas expansion by driving it through either a small hole (called the throttling
valve) or a porous partition, by an externally sustained constant difference of pressure on two sides of
such an obstacle.
(i) Prove that in the absence of heat exchange with the environment, the enthalpy of the
transferred gas does not change.
(ii) Express the so-called Joule-Thomson coefficient (∂T/∂P)_H, which characterizes the gas temperature change at throttling, via its thermal expansion coefficient α ≡ (∂V/∂T)_P/V.

1.12. A system with a fixed number of particles is in thermal and mechanical contact with its
environment of temperature T0 and pressure P0. Assuming that the internal relaxation of the system is
sufficiently fast, derive the conditions of stability of its equilibrium with the environment with respect to
small perturbations.

45 Note that the compressibility is just the reciprocal bulk modulus, κ = 1/K – see, e.g., CM Sec. 7.3.
46 Sometimes it is called the Joule-Thomson process, though more typically, the latter term refers to the possible gas cooling at the throttling.
1.13. Derive the analog of the relation for the difference (CP – CV), whose derivation was the
task of Problem 9, for a fixed-volume sample with a uniform magnetization M parallel to the uniform
external field H. Spell out this result for a paramagnet that obeys the Curie law M ∝ H/T – the relation
to be derived and discussed later in this course.

1.14. Two bodies have equal, temperature-independent heat capacities C, but different
temperatures, T1 and T2. Calculate the maximum mechanical work obtainable from this system, using a
heat engine.
1.15. Express the efficiency η of a heat engine that uses the so-called Joule (or "Brayton") cycle, consisting of two adiabatic (S = const) and two isobaric processes, at P = P_max and P = P_min (see the figure on the right), via the minimum and maximum values of pressure, and compare the result with η_Carnot. Assume an ideal classical working gas with temperature-independent C_P and C_V.
1.16. Calculate the efficiency of a heat engine using the Otto cycle,47 which consists of two adiabatic (S = const) and two isochoric (constant-volume, at V = V₀ and V = rV₀) reversible processes – see the figure on the right. Explore how the efficiency depends on the ratio r ≡ V_max/V_min, and compare it with the Carnot cycle's efficiency. Assume an ideal classical working gas with temperature-independent heat capacity.
1.17. A heat engine's cycle consists of two isothermal (T = const, at T = T_H and T = T_L) and two isochoric (V = const, at V = V₁ and V = V₂) processes – see the figure on the right.48
(i) Assuming that the working gas is an ideal classical gas of N particles, calculate the mechanical work performed by the engine during one cycle.
(ii) Are the specified conditions sufficient to calculate the engine's efficiency? (Justify your answer.)

P  const
1.18. The Diesel cycle (an approximate model of the Diesel P
internal combustion engine’s operation) consists of two adiabatic 2 3
S  const
processes, one isochoric process, and one isobaric process – see the
4
figure on the right. Assuming an ideal working gas with temperature-
S  const V  const
independent CV and CP, express the efficiency  of the heat engine using
this cycle via the gas temperature values in its transitional states 1
corresponding to the corners of the cycle diagram. 0 V
47 This name stems from the fact that the cycle is an approximate model of operation of the four-stroke internal
combustion engine, which was improved and made practicable (though not invented!) by N. Otto in 1876.
48 The reversed cycle of this type is a reasonable approximation for the operation of the Stirling and Gifford-
McMahon (GM) refrigerators, broadly used for cryocooling – for a recent review see, e.g., A. de Waele, J. Low
Temp. Phys. 164, 179 (2011).
Chapter 2. Principles of Physical Statistics


This chapter is the keystone of this course. It starts with a brief discussion of such basic notions of
statistical physics as statistical ensembles, probability, and ergodicity. Then the so-called
microcanonical distribution postulate is formulated, simultaneously with the statistical definition of
entropy. This basis enables a ready derivation of the famous Gibbs (“canonical”) distribution – the
most frequently used tool of statistical physics. Then we will discuss one more, “grand canonical”
distribution, which is more convenient for some tasks. In particular, it is immediately used for the
derivation of the most important Boltzmann, Fermi-Dirac, and Bose-Einstein statistics of independent
particles, which will be repeatedly utilized in the following chapters.

2.1. Statistical ensembles and probability


As was already discussed in Sec. 1.1, statistical physics deals with situations when either
unknown initial conditions, or system’s complexity, or the laws of its motion (as in the case of quantum
mechanics) do not allow a definite prediction of measurement results. The main formalism for the
analysis of such systems is the probability theory, so let me start with a very brief review of its basic
concepts, using an informal “physical” language – less rigorous but (hopefully) more transparent than
standard mathematical treatments,1 and quite sufficient for our purposes.
Consider N >> 1 independent similar experiments carried out with apparently similar systems (i.e. systems with identical macroscopic parameters such as volume, pressure, etc.), but still giving, for any of the reasons listed above, different results of measurements. Such a collection of experiments, together with a fixed method of result processing, is a good example of a statistical ensemble. Let us start from the case when each experiment may have M different discrete outcomes, and the number of experiments giving these outcomes is N₁, N₂,…, N_M, so

$$\sum_{m=1}^{M} N_m = N. \tag{2.1}$$

The probability of each outcome, for the given statistical ensemble, is then defined as

$$W_m \equiv \lim_{N\to\infty} \frac{N_m}{N}. \tag{2.2}$$
Though this definition is so close to our everyday experience that it is almost self-evident, a few remarks
may still be relevant.
First, the probabilities Wm depend on the exact statistical ensemble they are defined for, notably
including the method of result processing. As the simplest example, consider throwing the standard
cubic-shaped dice many times. For the ensemble of all thrown and counted dice, the probability of each
outcome (say, “1”) is 1/6. However, nothing prevents us from defining another statistical ensemble of
dice-throwing experiments in which all outcomes “1” are discounted. Evidently, the probability of

1 For the reader interested in a more rigorous approach, I can recommend, for example, Chapter 18 of the famous handbook by G. Korn and T. Korn – see MA Sec. 16(ii).
finding the outcome “1” in this modified (but legitimate) ensemble is 0, while for all other five
outcomes (“2” to “6”), it is 1/5 rather than 1/6.
Second, a statistical ensemble does not necessarily require N similar physical systems, e.g., N
distinct dice. It is intuitively clear that tossing the same die N times constitutes an ensemble with similar
statistical properties. More generally, a set of N experiments with the same system gives a statistical
ensemble equivalent to the set of experiments with N different systems, provided that the experiments
are kept independent, i.e. that outcomes of past experiments do not affect the experiments to follow.
Moreover, for many physical systems of interest, no special preparation of each new experiment is
necessary, and N experiments separated by sufficiently long time intervals, form a “good” statistical
ensemble – the property called ergodicity.2
Third, the reference to infinite N in Eq. (2) does not strip the notion of probability of its practical relevance. Indeed, it is easy to prove (see Chapter 5) that, under very general conditions, at finite but sufficiently large N, the numbers N_m approach their average (or expectation) values3

$$\langle N_m \rangle = W_m N, \tag{2.3}$$

with the relative deviations decreasing as ~1/⟨N_m⟩^{1/2}, i.e. as 1/N^{1/2}.
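This convergence is easy to observe in a simple numerical experiment (an illustration, not part of the original text), which simulates the fair-die ensemble discussed above and prints the frequency N_m/N of the outcome "1" for growing N:

```python
import numpy as np

rng = np.random.default_rng(1)
for N in (10**2, 10**4, 10**6):
    rolls = rng.integers(1, 7, size=N)           # N throws of a fair die
    print(N, np.count_nonzero(rolls == 1)/N)     # N_m/N -> W_m = 1/6 ~ 0.1667
```

The scatter of the printed frequencies around 1/6 indeed shrinks roughly as 1/N^{1/2}.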


Now let me list those properties of probabilities that we will immediately need. First, dividing both sides of Eq. (1) by N and following the limit N → ∞, we get the well-known normalization condition

$$\sum_{m=1}^{M} W_m = 1; \tag{2.4}$$

just remember that it is true only if each experiment definitely yields one of the outcomes N₁, N₂,…, N_M.
Second, if we have an additive function of the results,

$$f = \frac{1}{N}\sum_{m=1}^{M} N_m f_m, \tag{2.5}$$

where f_m are some definite (deterministic) coefficients, the statistical average (also called the expectation value) of this function is naturally defined as

2 The most popular counter-examples are provided by some energy-conserving systems. Consider, for example, a
system of particles placed in a potential that is a quadratic-polynomial function of its coordinates. The theory of
oscillations tells us (see, e.g., CM Sec. 6.2) that this system is equivalent to a set of non-interacting harmonic
oscillators. Each of these oscillators conserves its own initial energy Ej forever, so the statistics of N
measurements of one such system may differ from that of N different systems with a random distribution of Ej,
even if the total energy of the system, E = jEj, is the same. Such non-ergodicity, however, is a rather feeble
phenomenon and is readily destroyed by any of many mechanisms, such as weak interaction with the environment
(leading, in particular, to oscillation damping), potential anharmonicity (see, e.g., CM Chapter 5), and chaos (CM
Chapter 9), all of them growing fast with the number of particles in the system, i.e. the number of its degrees of
freedom. This is why an overwhelming part of real-life systems are ergodic; for readers interested in non-ergodic
exotics, I can recommend the monograph by V. Arnold and A. Avez, Ergodic Problems of Classical Mechanics,
Addison-Wesley, 1989.
3 Here (and everywhere in this series) angle brackets ⟨…⟩ mean averaging over a statistical ensemble, which is generally different from averaging over time – as it will be the case in quite a few examples considered below.
$$\langle f \rangle \equiv \lim_{N\to\infty} \frac{1}{N}\sum_{m=1}^{M} N_m f_m, \tag{2.6}$$

so using Eq. (3) we get

$$\langle f \rangle = \sum_{m=1}^{M} W_m f_m. \tag{2.7}$$

Note that Eq. (3) may be considered as the particular form of this general result, when all f_m = 1. Eq. (5) with these f_m defines what is sometimes called the counting function.
Next, the spectrum of possible experimental outcomes is frequently continuous for all practical purposes. (Think, for example, about the set of positions of the marks left by bullets fired into a target from afar.) The above formulas may be readily generalized to this case; let us start from the simplest situation when all different outcomes may be described by just one continuous scalar variable q, which replaces the discrete index m in Eqs. (1)-(7). The basic relation for this case is the self-evident fact that the probability dW of having an outcome within a very small interval dq surrounding some point q is proportional to the magnitude of that interval:

$$dW = w(q)dq, \tag{2.8}$$

where w(q) is some function of q, which does not depend on dq. This function is called the probability density. Now all the above formulas may be recast by replacing the probabilities W_m with the products (8), and the summation over m, with the integration over q. In particular, instead of Eq. (4), the normalization condition now becomes

$$\int w(q)dq = 1, \tag{2.9}$$

where the integration should be extended over the whole range of possible values of q. Similarly, instead of the discrete values f_m participating in Eq. (5), it is natural to consider a function f(q). Then instead of Eq. (7), the expectation value of the function may be calculated as

$$\langle f \rangle = \int w(q)f(q)dq. \tag{2.10}$$
It is also straightforward to generalize these formulas to the case of more variables. For example,
the state of a classical particle with three degrees of freedom may be fully described by the probability
density w defined in the 6D space of its generalized radius-vector q and momentum p. As a result, the
expectation value of a function of these variables may be expressed as a 6D integral

f   w(q, p) f (q, p) d 3 qd 3 p. (2.11)

Some systems considered in this course consist of components whose quantum properties cannot be ignored, so let us discuss how ⟨f⟩ should be calculated in this case. If by f_m we mean measurement results, then Eq. (7) (and its generalizations) remains valid, but since these numbers themselves may be affected by the intrinsic quantum-mechanical uncertainty, it may make sense to have a bit deeper look into this situation. Quantum mechanics tells us4 that the most general expression for the expectation value of an observable f in a certain ensemble of macroscopically similar systems is

4 See, e.g., QM Sec. 7.1.
f  W
m ,m '
mm' f m'm  Tr ( Wf ) . (2.12)

Here fmm’ are the matrix elements of the quantum-mechanical operator fˆ corresponding to the
observable f, in a full basis of orthonormal states m,
f mm '  m fˆ m' , (2.13)

while the coefficients Wmm’ are the elements of the so-called density matrix W, which represents, in the
same basis, the density operator Ŵ describing properties of this ensemble. Eq. (12) is evidently more
general than Eq. (7), and is reduced to it only if the density matrix is diagonal:
Wmm '  Wm mm ' (2.14)
(where mm’ is the Kronecker symbol), when the diagonal elements Wm play the role of probabilities of
the corresponding states.
Thus formally, the largest difference between the quantum and classical descriptions is the presence, in Eq. (12), of the off-diagonal elements of the density matrix. They have the largest values in a pure (also called "coherent") ensemble, in which the state of the system may be described with state vectors, e.g., the ket-vector

$$|\alpha\rangle = \sum_m \alpha_m |m\rangle, \tag{2.15}$$

where α_m are some (generally, complex) coefficients. In this case, the density matrix elements are merely

$$W_{mm'} = \alpha_m \alpha_{m'}^*, \tag{2.16}$$

so the off-diagonal elements are of the same order as the diagonal elements. For example, in the very important particular case of a two-level system, the pure-state density matrix is

$$\mathbf{W} = \begin{pmatrix} \alpha_1\alpha_1^* & \alpha_1\alpha_2^* \\ \alpha_2\alpha_1^* & \alpha_2\alpha_2^* \end{pmatrix}, \tag{2.17}$$

so the product of its off-diagonal components is as large as that of the diagonal components.
so the product of its off-diagonal components is as large as that of the diagonal components.
In the most important basis of stationary states, i.e. the eigenstates of the system's time-independent Hamiltonian, the coefficients α_m oscillate in time as5

$$\alpha_m(t) = \alpha_m(0)\exp\left\{-i\frac{E_m}{\hbar}t\right\} \equiv |\alpha_m|\exp\left\{-i\frac{E_m}{\hbar}t + i\varphi_m\right\}, \tag{2.18}$$

where E_m are the corresponding eigenenergies, φ_m are constant phase shifts, and ℏ is the Planck constant. This means that while the diagonal terms of the density matrix (16) remain constant, its off-diagonal components are oscillating functions of time:

5 Here I use the Schrödinger picture of quantum dynamics, in which the matrix elements fnn’ representing
quantum-mechanical operators, do not evolve in time. The final results of this discussion do not depend on the
particular picture – see, e.g., QM Sec. 4.6.
 E  E m' 

Wmm'   m'  m   m'  m expi m t  exp i ( m'   m ). (2.19)
  
Due to the extreme smallness of  on the human scale of things), minuscule random perturbations of
eigenenergies are equivalent to substantial random changes of the phase multipliers, so the time average
of any off-diagonal matrix element tends to zero. Moreover, even if our statistical ensemble consists of
systems with exactly the same Em, but different values m (which are typically hard to control at the
initial preparation of the system), the average values of all Wmm’ (with m  m’) vanish again.
This is why, besides some very special cases, typical statistical ensembles of quantum particles
are far from being pure, and in most cases (certainly including the thermodynamic equilibrium), a good
approximation for their description is given by the opposite limit of the so-called classical mixture, in
which all off-diagonal matrix elements of the density matrix equal zero, and its diagonal elements Wmm
are merely the probabilities Wm of the corresponding eigenstates. In this case, for the observables
compatible with energy, Eq. (12) is reduced to Eq. (7), with fm being the eigenvalues of the variable f, so
we may base our further discussion on this key relation and its continuous extensions (10)-(11).
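The phase-averaging argument may be illustrated by a short numerical sketch (with arbitrarily chosen amplitudes): averaging the pure-state density matrix (16)-(17) of a two-level system over an ensemble of random, uncontrolled phases φ_m suppresses the off-diagonal elements, while leaving the diagonal probabilities intact.

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.array([0.6, 0.8])                    # fixed |alpha_m|: |a1|^2 + |a2|^2 = 1
M = 10**5                                   # ensemble size
phi = rng.uniform(0, 2*np.pi, size=(M, 2))  # a random phase pair per member
alpha = a*np.exp(1j*phi)                    # one pure state per ensemble member
W = np.einsum('km,kn->mn', alpha, alpha.conj())/M   # averaged W_mm', Eq. (16)
print(np.round(W, 2))   # diagonal -> (0.36, 0.64); off-diagonal elements -> 0
```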

2.2. Microcanonical ensemble and distribution


Now we may move to the now-standard approach to statistical mechanics, based on the three statistical ensembles introduced in the 1870s by Josiah Willard Gibbs.6 The most basic of them is the so-called microcanonical statistical ensemble7 defined as a set of macroscopically similar closed (isolated) systems with virtually the same total energy E. Since in quantum mechanics the energy of a closed system is quantized, in order to make the forthcoming discussion suitable for quantum systems as well, it is convenient to include in the ensemble all systems with energies E_m within a relatively narrow interval ΔE << E (see Fig. 1) that is nevertheless much larger than the average distance δE between the energy levels, so that the number M of different quantum states within the interval ΔE is large, M >> 1. Such a choice of ΔE is only possible if δE << E; however, the reader should not worry too much about this condition, because the most important applications of the microcanonical ensemble are for very large systems (and/or very high energies) when the energy spectrum is very dense.8

Fig. 2.1. A very schematic image of the microcanonical ensemble: many closely spaced energy levels within a narrow interval ΔE. (Actually, the ensemble deals with quantum states rather than energy levels. An energy level may be degenerate, i.e. correspond to several states.)

6 Personally, I believe that the genius of J. Gibbs, praised by Albert Einstein as the “greatest mind in American
history”, is still insufficiently appreciated, and agree with R. Millikan that Gibbs “did for statistical mechanics and
thermodynamics what […] Maxwell did for electrodynamics”.
7 The terms “microcanonical”, as well as “canonical” (see Sec. 4 below) are apparently due to Gibbs and I was
unable to find out his motivation for the former name. (“Canonical” in the sense of “standard” or “common” is
quite appropriate, but why “micro”? Perhaps to reflect the smallness of ΔE?)
8 Formally, the main result of this section, Eq. (20), is valid for any M (including M = 1); it is just less informative
for small M (and trivial for M = 1).
This ensemble serves as the basis for the formulation of the postulate which is most frequently
called the microcanonical distribution (or, more adequately, “the main statistical postulate” or “the main
statistical hypothesis”): in the thermodynamic equilibrium of a microcanonical ensemble, all its states
have equal probabilities,
$$W_m = \frac{1}{M} = \text{const}. \tag{2.20}$$
Though in some constructs of statistical mechanics this equality is derived from other axioms, which
look more plausible to their authors, I believe that Eq. (20) may be taken as the starting point of the
statistical physics, supported “just” by the compliance of all its corollaries with experimental
observations.
Note that the postulate (20) is closely related to the macroscopic irreversibility of the systems
that are microscopically virtually reversible (closed): if such a system was initially in a certain state, its
time evolution with even minuscule interactions with the environment (which is necessary for reaching
the thermodynamic equilibrium) eventually leads to the uniform distribution of its probability among all
states with essentially the same energy. Each of these states is not “better” than the initial one; rather, in
a macroscopic system, there are just so many of these states that the chance to find the system in the
initial state is practically nil – again, think about the ink drop diffusion into a glass of water.9
Now let us find a suitable definition of entropy S of a microcanonical ensemble’s member – for
now, in the thermodynamic equilibrium only. This was done in 1877 by another giant of statistical
physics, Ludwig Eduard Boltzmann – on the basis of the prior work by James Clerk Maxwell on the
kinetic theory of gases – see Sec. 3.1 below. In present-day terminology, since S is a measure of
disorder, it should be related to the amount of information10 lost when the system went irreversibly from
the full order to its current state – in equilibrium, to the full disorder, i.e. from one definite state to the
microcanonical distribution (20). In an even more convenient formulation, this is the amount of
information necessary to find the exact state of a certain system in a microcanonical ensemble.
In the information theory, the amount of information necessary to make a definite choice
between two options with equal probabilities (Fig. 2a) is defined as
$$I(2) \equiv \log_2 2 = 1. \tag{2.21}$$

Fig. 2.2. "Logarithmic trees" of binary decisions for choosing between (a) M = 2, and (b) M = 4 opportunities with equal probabilities; each branching costs 1 bit.
9 Though I have to move on, let me note that the microcanonical distribution (20) is a very nontrivial postulate,
and my advice to the reader is to find some time to give additional thought to this keystone of the whole building
of statistical mechanics.
10 I will rely on the reader’s common sense and intuitive understanding of what information is, because even in
the formal information theory, this notion is essentially postulated – see, e.g., the wonderfully clear short textbook
by J. Pierce, An Introduction to Information Theory, Dover, 1980.
This unit of information is called a bit. Now, if we need to make a choice between four equally probable opportunities, it can be made in two similar steps (Fig. 2b), each requiring one bit of information, so the total amount of information necessary for the choice is

$$I(4) = 2I(2) = 2 \equiv \log_2 4. \tag{2.22}$$

An obvious extension of this process to the choice between M = 2^m states gives

$$I(M) = mI(2) = m \equiv \log_2 M. \tag{2.23}$$

This measure, if extended naturally to any integer M, is quite suitable for the definition of entropy at equilibrium, with the only difference that, following tradition, the binary logarithm is replaced with the natural one:11

$$S \equiv \ln M. \tag{2.24a}$$

Using Eq. (20), we may recast this definition in its most frequently used form

$$S = \ln\frac{1}{W_m} = -\ln W_m. \tag{2.24b}$$

(Again, please note that Eqs. (24) are valid in thermodynamic equilibrium only!)
Note that Eq. (24) satisfies the major properties of the entropy discussed in thermodynamics.
First, it is a unique characteristic of the disorder. Indeed, according to Eq. (20), M (at fixed E) is the
only possible measure characterizing the microcanonical distribution, and so is its unique function lnM.
This function also satisfies another thermodynamic requirement to the entropy, of being an extensive
variable. Indeed, for several independent systems, the joint probability of a certain state is just a product
of the partial probabilities, and hence, according to Eq. (24), their entropies just add up.
Now let us see whether Eqs. (20) and (24) are compatible with the 2nd law of thermodynamics.
For that, we need to generalize Eq. (24) for S to an arbitrary state of the system (generally, out of
thermodynamic equilibrium), with an arbitrary set of state probabilities Wm. Let us first recognize that M
in Eq. (24) is just the number of possible ways to commit a particular system to a certain state m (m = 1,
2,…M), in a statistical ensemble where each state is equally probable. Now let us consider a more
general ensemble, still consisting of a large number N >> 1 of similar systems, but with a certain number
Nm = WmN >> 1 of systems in each of M states, with the factors Wm not necessarily equal. In this case,
the evident generalization of Eq. (24) is that the entropy SN of the whole ensemble is

S N  ln M ( N1 , N 2 ,..) , (2.25)
where M (N1,N2,…) is the number of ways to commit a particular system to a certain state m while
keeping all numbers Nm fixed. This number M (N1,N2,…) is clearly equal to the number of ways to
distribute N distinct balls between M different boxes, with the fixed number Nm of balls in each box, but

11 This is of course just the change of a constant factor: S(M) = ln M = ln 2 × log₂M = ln 2 × I(M) ≈ 0.693 I(M). A review of Chapter 1 shows that nothing in thermodynamics prevents us from choosing such a constant coefficient arbitrarily, with the corresponding change of the temperature scale – see Eq. (1.9). In particular, in the SI units, where Eq. (24b) becomes S = –k_B ln W_m, one bit of information corresponds to the entropy change ΔS = k_B ln 2 ≈ 0.693 k_B ≈ 0.965×10⁻²³ J/K. (The formula "S = k logW" is engraved on L. Boltzmann's tombstone in Vienna.)
in no particular order within it. Comparing this description with the definition of the so-called multinomial coefficients,12 we get

$$M(N_1, N_2, \ldots) = {}^N\!C_{N_1, N_2, \ldots, N_M} \equiv \frac{N!}{N_1!\,N_2!\ldots N_M!}, \qquad \text{with } N = \sum_{m=1}^{M} N_m. \tag{2.26}$$

To simplify the resulting expression for S_N, we can use the famous Stirling formula, in its crudest, de Moivre's form,13 whose accuracy is suitable for most purposes of statistical physics:

$$\ln(N!)\big|_{N\to\infty} \approx N(\ln N - 1). \tag{2.27}$$
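Its accuracy is easy to gauge directly – for example, with the following few lines (the N values are arbitrary):

```python
import math

for N in (10, 100, 1000):
    exact = math.lgamma(N + 1)               # ln(N!)
    approx = N*(math.log(N) - 1)             # Eq. (27)
    print(N, exact, approx, (exact - approx)/exact)   # relative error drops with N
```

Already at N = 1000, the relative error is below 0.1% – quite negligible for the N >> 1 systems of interest here.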

When applied to our current problem, the Stirling formula gives the following average entropy per system:14

$$S \equiv \frac{S_N}{N} = \frac{1}{N}\left[\ln(N!) - \sum_{m=1}^{M}\ln(N_m!)\right] \approx \frac{1}{N}\left[N(\ln N - 1) - \sum_{m=1}^{M} N_m(\ln N_m - 1)\right] = -\sum_{m=1}^{M}\frac{N_m}{N}\ln\frac{N_m}{N}, \tag{2.28}$$

and since this result is only valid in the limit N_m → ∞ anyway, we may use Eq. (2) to represent it as

$$S = -\sum_{m=1}^{M} W_m \ln W_m = \sum_{m=1}^{M} W_m \ln\frac{1}{W_m}. \tag{2.29}$$

This extremely important result15 may be interpreted as the average of the entropy values given by Eq. (24), weighed with specific probabilities W_m per the general formula (7).16
Now let us find what distribution of probabilities W_m provides the largest value of the entropy (29). The answer is almost evident from a good glance at Eq. (29). For example, if for a subgroup of M' ≤ M states the coefficients W_m are constant and equal to 1/M', while W_m = 0 for all other states, all M' non-zero terms in the sum (29) are equal to each other, so

$$S = M'\frac{1}{M'}\ln M' = \ln M', \tag{2.30}$$

and the closer M' is to its maximum value M, the larger S. Hence, the maximum of S is reached at the uniform distribution, where it takes the value (24).
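This claim is also easy to check numerically; the following sketch (an illustration only) compares the entropy (29) of the uniform distribution with that of a few randomly generated ones:

```python
import numpy as np

rng = np.random.default_rng(3)
M = 6
S = lambda W: -np.sum(W*np.log(W))          # Eq. (29)
print("uniform:", S(np.full(M, 1/M)), "= ln M =", np.log(M))
for _ in range(3):
    W = rng.random(M); W /= W.sum()         # a random normalized distribution
    print("random: ", S(W))                 # always below ln M
```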

12 See, e.g., MA Eq. (2.3). Despite the intimidating name, Eq. (26) may be very simply derived. Indeed, N! is just
the number of all possible permutations of N balls, i.e. of the ways to place them in certain positions – say, inside
M boxes. Now to take into account that the particular order of the balls in each box is not important, that number
should be divided by all numbers Nm! of possible permutations of balls within each box – that’s it!
13 See, e.g., MA Eq. (2.10).
14 Strictly speaking, I should use the notation ⟨S⟩ here. However, following the style accepted in thermodynamics, I will drop the averaging signs until we really need them, to avoid confusion. Again, this shorthand is not too bad because the relative fluctuations of entropy (as those of any macroscopic variable) are very small at N >> 1.
15 With the replacement of ln W_m with log₂ W_m (i.e. division of both sides by ln 2), Eq. (29) becomes the famous Shannon (or "Boltzmann-Shannon") formula for the average information I per symbol in a long communication string using M different symbols, with probability W_m each.
16 In some textbooks, this interpretation is even accepted as the derivation of Eq. (29); however, it is evidently
less rigorous than the one outlined above.
In order to prove this important fact more strictly, let us find the maximum of the function given by Eq. (29). If its arguments W₁, W₂, …W_M were completely independent, this could be done by finding the point (in the M-dimensional space of the coefficients W_m) where all partial derivatives ∂S/∂W_m equal zero. However, since the probabilities are constrained by the condition (4), the differentiation has to be carried out more carefully, taking into account this interdependence:

$$\left[\frac{\partial}{\partial W_m}S(W_1, W_2, \ldots)\right]_{\text{cond}} = \frac{\partial S}{\partial W_m} + \sum_{m'\neq m}\frac{\partial S}{\partial W_{m'}}\frac{\partial W_{m'}}{\partial W_m}. \tag{2.31}$$

At the maximum of the function S, such expressions should be equal to zero for all m. This condition yields ∂S/∂W_m = λ, where the so-called Lagrange multiplier λ is independent of m. Indeed, at such a point Eq. (31) becomes

$$\left[\frac{\partial}{\partial W_m}S(W_1, W_2, \ldots)\right]_{\text{cond}} = \lambda + \sum_{m'\neq m}\lambda\frac{\partial W_{m'}}{\partial W_m} = \lambda\frac{\partial}{\partial W_m}\left(W_m + \sum_{m'\neq m}W_{m'}\right) = \lambda\frac{\partial}{\partial W_m}(1) = 0. \tag{2.32}$$

For our particular expression (29), the condition ∂S/∂W_m = λ yields

$$\frac{\partial S}{\partial W_m} \equiv -\frac{d}{dW_m}\left(W_m\ln W_m\right) = -(\ln W_m + 1) = \lambda. \tag{2.33}$$

The last equality holds for all m (and hence the entropy reaches its maximum value) only if W_m is independent of m. Thus the entropy (29) indeed reaches its maximum value (24) at equilibrium.
To summarize, we see that the statistical definition (24) of entropy does fit all the requirements
imposed on this variable by thermodynamics. In particular, we have been able to prove the 2nd law of
thermodynamics using that definition together with the fundamental postulate (20).
Now let me discuss one possible point of discomfort with that definition: the values of M, and hence W_m, depend on the accepted energy interval ΔE of the microcanonical ensemble, for whose choice no exact guidance is offered. However, if the interval ΔE contains many states, M >> 1, as was assumed before, then with a very small relative error (vanishing in the limit M → ∞), M may be represented as

$$M = g(E)\Delta E, \tag{2.34}$$

where g(E) is the density of states of the system:

$$g(E) \equiv \frac{d\Sigma(E)}{dE}, \tag{2.35}$$

Σ(E) being the total number of states with energies below E. (Note that the average interval δE between energy levels, mentioned at the beginning of this section, is just ΔE/M = 1/g(E).) Plugging Eq. (34) into Eq. (24), we get

$$S = \ln M = \ln g(E) + \ln \Delta E, \tag{2.36}$$

so the only effect of a particular choice of ΔE is an offset of the entropy by a constant, and in Chapter 1 we have seen that such a constant offset does not affect any measurable quantity. Of course, Eq. (34), and hence Eq. (36), are only precise in the limit when the density of states g(E) is so large that the range available for the appropriate choice of ΔE,

$$g^{-1}(E) \equiv \delta E << \Delta E << E, \tag{2.37}$$

is sufficiently broad: g(E)ΔE = ΔE/δE >> 1.


In order to get some gut feeling of the functions g(E) and S(E) and the feasibility of the condition (37), and also to see whether the microcanonical distribution may be directly used for calculations of thermodynamic variables in particular systems, let us apply it to a microcanonical ensemble of many sets of N >> 1 independent, similar harmonic oscillators with frequency ω. (Please note that the requirement of a virtually fixed energy is applied, in this case, to the total energy E_N of each set of oscillators, rather than to the energy E of a single oscillator – which may be virtually arbitrary, though certainly much less than E_N ~ NE >> E.) Basic quantum mechanics tells us17 that the eigenenergies of such an oscillator form a discrete, equidistant spectrum:

$$E_m = \hbar\omega\left(m + \frac{1}{2}\right), \qquad \text{where } m = 0, 1, 2, \ldots \tag{2.38}$$

If ω is kept constant, the ground-state energy ℏω/2 does not contribute to any thermodynamic properties of the system,18 so for the sake of simplicity we may take that point as the energy origin, and replace Eq. (38) with E_m = mℏω. Let us carry out an approximate analysis of the system for the case when its average energy per oscillator,

$$E \equiv \frac{E_N}{N}, \tag{2.39}$$

is much larger than the energy quantum ℏω.
For one oscillator, the number of states with energy ε₁ below a certain value E₁ >> ℏω is evidently Σ(E₁) ≈ E₁/ℏω ≡ (E₁/ℏω)/1! (Fig. 3a). For two oscillators, all possible values of the total energy (ε₁ + ε₂) below some level E₂ correspond to the points of a 2D square grid within the right triangle shown in Fig. 3b, giving Σ(E₂) ≈ (1/2)(E₂/ℏω)² ≡ (E₂/ℏω)²/2!. For three oscillators, the possible values of the total energy (ε₁ + ε₂ + ε₃) correspond to those points of the 3D cubic grid that fit inside the right pyramid shown in Fig. 3c, giving Σ(E₃) ≈ (1/3)[(1/2)(E₃/ℏω)³] ≡ (E₃/ℏω)³/3!, etc.

Fig. 2.3. Calculating the functions Σ(E_N) for systems of (a) one, (b) two, and (c) three harmonic oscillators.

17 See, e.g., QM Secs. 2.9 and 5.4.


18 Let me hope that the reader knows that the ground-state energy is experimentally measurable – for example,
using the famous Casimir effect – see, e.g., QM Sec. 9.1. (In Sec. 5.5 below I will briefly discuss another method
of experimental observation of that energy.)
An evident generalization of these formulas to arbitrary N gives the number of states19

$$\Sigma(E_N) \approx \frac{1}{N!}\left(\frac{E_N}{\hbar\omega}\right)^N. \tag{2.40}$$
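For small N, Eq. (40) may be verified by direct state counting; in the following brute-force sketch (with arbitrarily chosen N, and E_N measured in units of ℏω), the relative difference between the exact count and Eq. (40) decreases as the ratio E_N/ℏω grows:

```python
from itertools import product
from math import factorial

N, E = 3, 30     # E = E_N/(hbar*omega), i.e. the energy in units of hbar*omega
count = sum(1 for m in product(range(E), repeat=N) if sum(m) < E)
print(count, E**N/factorial(N))   # 4960 vs 4500.0; the ratio -> 1 as E grows
```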
Differentiating Eq. (40) over the energy, we get

$$g(E_N) \equiv \frac{d\Sigma(E_N)}{dE_N} \approx \frac{1}{(N-1)!}\frac{E_N^{N-1}}{(\hbar\omega)^N}, \tag{2.41}$$

so

$$S_N(E_N) \equiv \ln g(E_N) + \text{const} = -\ln[(N-1)!] + (N-1)\ln E_N - N\ln(\hbar\omega) + \text{const}. \tag{2.42}$$

For N >> 1, we may ignore the difference between N and (N – 1) in both instances, and use the Stirling formula (27) to simplify this result as

$$S_N(E) \approx \text{const} + N\left[\ln\frac{E_N}{N\hbar\omega} + 1\right] \equiv \text{const} + N\left[\ln\frac{E}{\hbar\omega} + 1\right] \approx \text{const} + N\ln\frac{E}{\hbar\omega}. \tag{2.43}$$

(The last step is only valid at very high E/ℏω ratios, when the logarithm in Eq. (43) is substantially larger than 1.) Returning for a second to the density of states, we see that in the limit N → ∞, it is exponentially large:

$$g(E_N) = e^{S_N} \propto \left(\frac{E}{\hbar\omega}\right)^N, \tag{2.44}$$

so the conditions (37) may be indeed satisfied within a very broad range of ΔE.
so the conditions (37) may be indeed satisfied within a very broad range of ΔE.
Now we can use Eq. (43) to find all thermodynamic properties of the system, though only in the limit E >> ℏω. Indeed, according to thermodynamics, if the system's volume and the number of particles in it are fixed, the derivative dS/dE is nothing else than the reciprocal temperature in thermal equilibrium – see Eq. (1.9). In our current case, we imply that the harmonic oscillators are distinct, for example by their spatial positions. Hence, even if we can speak of some volume of the system, it is certainly fixed.20 Differentiating Eq. (43) over the energy E, we get

$$\frac{1}{T} \equiv \frac{dS_N}{dE_N} = \frac{N}{E_N} = \frac{1}{E}. \tag{2.45}$$

Reading this result backward, we see that the average energy E of a harmonic oscillator equals T (i.e. k_B T_K in SI units). At this point, the first-time student of thermodynamics should be very much relieved to see that the counter-intuitive thermodynamic definition (1.9) of temperature does indeed correspond to what we all have known about this notion from our kindergarten physics courses.
The result (45) may be readily generalized. Indeed, in quantum mechanics, a harmonic oscillator with eigenfrequency ω may be described by the Hamiltonian operator

19 The coefficient 1/N! in this formula has the geometrical meaning of the (hyper)volume of the N-dimensional
right pyramid with unit sides.
20 For the same reason, the notion of pressure P in such a system is not clearly defined, and neither are any
thermodynamic potentials but E and F.
pˆ 2 qˆ 2
Hˆ   , (2.46)
2m 2
where q is some generalized coordinate, p is the corresponding generalized momentum, m is the
oscillator’s mass,21 and  is its spring constant, so  = (/m)1/2. Since in the thermodynamic equilibrium
the density matrix is always diagonal in the basis of stationary states m (see Sec. 1 above), the quantum-
mechanical averages of the kinetic and potential energies may be found from Eq. (7):

p2 
pˆ 2 q 2 
qˆ 2
  Wm m m,   Wm m m, (2.47)
2m m 0 2m 2 m 0 2
where Wm is the probability to occupy the mth energy level, while bra- and ket-vectors describe the
stationary state corresponding to that level.22 However, both classical and quantum mechanics teach us
that for any m, the bra-ket expressions under the sums in Eqs. (47), which represent the average kinetic
and mechanical energies of the oscillator on its mth energy level, are equal to each other, and hence each
of them is equal to Em/2. Hence, even though we do not know the exact probability distribution Wm yet
(it will be calculated in Sec. 5 below), we may conclude that in the “classical limit” T >> ,

p2 q 2 T Equipartition
  . (2.48) theorem
2m 2 2
Now let us consider a system with an arbitrary number of degrees of freedom, described by a more general Hamiltonian:23

$$\hat{H} = \sum_j \hat{H}_j, \qquad \text{with } \hat{H}_j = \frac{\hat{p}_j^2}{2m_j} + \frac{\kappa_j\hat{q}_j^2}{2}, \tag{2.49}$$

with (generally, different) frequencies ω_j = (κ_j/m_j)^{1/2}. Since the "modes" (effective harmonic oscillators) contributing to this Hamiltonian are independent, the result (48) is valid for each of the modes. This is the famous equipartition theorem: at thermal equilibrium with temperature T >> ℏω_j, the average energy of each so-called half-degree of freedom (which is defined as any variable, either p_j or q_j, giving a quadratic contribution to the system's Hamiltonian) is equal to T/2.24 In particular, for each of the three Cartesian component contributions to the kinetic energy of a free-moving particle, this theorem is valid at any temperature, because such components may be considered as 1D harmonic oscillators with vanishing potential energy, i.e. ω_j = 0, so the condition T >> ℏω_j is fulfilled at any temperature.
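Anticipating the Gibbs distribution to be derived in Sec. 4 (in the classical limit, it makes p and q of an oscillator Gaussian random variables with variances mT and T/κ, respectively), the result (48) may be verified by direct sampling; this is only an illustrative sketch, with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(4)
T, m, kappa = 2.0, 1.5, 0.7      # arbitrary units, with k_B = 1
p = rng.normal(0.0, np.sqrt(m*T),     size=10**6)
q = rng.normal(0.0, np.sqrt(T/kappa), size=10**6)
print(np.mean(p**2/(2*m)), np.mean(kappa*q**2/2), T/2)   # both averages ~ T/2
```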

21 I am using this fancy font for the mass to avoid any chance of its confusion with the state number.
22 Note again that while we have committed the energy E_N of N oscillators to be fixed (to apply Eq. (36), valid only for a microcanonical ensemble at thermodynamic equilibrium), the single oscillator's energy E in our analysis may be arbitrary – within the very broad limits ℏω << E ≤ E_N ~ NT.
23 As a reminder, the Hamiltonian of any system whose classical Lagrangian function is an arbitrary quadratic form of its generalized coordinates and the corresponding generalized velocities may be brought to the form (49) by an appropriate choice of "normal coordinates" q_j, which are certain linear combinations of the original coordinates – see, e.g., CM Sec. 6.2.
24 This also means that in the classical limit, the heat capacity of a system is equal to one-half of the number of its half-degrees of freedom (in the SI units, multiplied by k_B).
I believe that this case study of harmonic oscillator systems was a fair illustration of both the strengths and the weaknesses of the microcanonical ensemble approach.25 On one hand, we could readily calculate virtually everything we wanted in the classical limit T >> ℏω, but calculations for an arbitrary T ~ ℏω, though possible, would be rather unpleasant, because for that, all vertical steps of the function Σ(E_N) would have to be carefully counted. In Sec. 4 below, we will see that other statistical ensembles are much more convenient for such calculations.
Let me conclude this section with a short notice on deterministic classical systems with just a
few degrees of freedom (and even simpler mathematical objects called “maps”) that may exhibit
essentially disordered behavior, called deterministic chaos.26 Such a chaotic system may be approximately
characterized by an entropy defined similarly to Eq. (29), where Wm are the probabilities to find it in
different small regions of phase space, at well-separated small time intervals. On the other hand, one can
use an expression slightly more general than Eq. (29) to define the so-called Kolmogorov (or
"Kolmogorov-Sinai") entropy K that characterizes the speed of loss of the information about the initial
state of the system, and hence what is called the "chaos depth". In the definition of K, the sum over m is
replaced with the summation over all possible permutations {m} = m0, m1, …, mN–1 of small space
regions, and Wm is replaced with W{m}, the probability of finding the system in the corresponding
regions m at time moments tm = mτ, in the limit τ → 0, with Nτ = const. For chaos in the simplest
objects, 1D maps, K is equal to the Lyapunov exponent λ > 0.27 For systems of higher dimensionality,
which are characterized by several Lyapunov exponents λ, the Kolmogorov entropy is equal to the
phase-space average of the sum of all positive λ. These facts provide a much more practicable way of
(typically, numerical) calculation of the Kolmogorov entropy than the direct use of its definition.28
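For 1D maps, the equality of K and λ is easy to probe numerically. A minimal sketch (the logistic map x → rx(1 – x) at r = 4 is chosen only because its Lyapunov exponent is known exactly, λ = ln 2; the initial point is arbitrary):

    # Estimating the Lyapunov exponent of the logistic map x -> r x (1 - x)
    # as the orbit average of ln|df/dx|; at r = 4 the exact value is ln 2.
    import math

    r, x = 4.0, 0.3141
    for _ in range(1000):                      # let the transient die out
        x = r * x * (1.0 - x)
    s, n_steps = 0.0, 100_000
    for _ in range(n_steps):
        x = r * x * (1.0 - x)
        s += math.log(abs(r * (1.0 - 2.0 * x)))
    print(s / n_steps, math.log(2.0))          # ~0.693 for both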

2.3. Maxwell’s Demon, information, and computation


Before proceeding to other statistical distributions, I would like to make a detour to address one
more popular concern about Eq. (24) – the direct relation between entropy and information. Some
physicists are still uneasy with entropy being nothing else than the (deficit of) information, though, to
the best of my knowledge, nobody has yet been able to suggest any experimentally verifiable difference
between these two notions. Let me give one example of their direct relationship.29 Consider a cylinder
containing just one molecule (considered as a point particle), and separated into two halves by a
movable partition with a door that may be opened and closed at will, at no energy cost – see Fig. 4a. If
the door is open and the system is in thermodynamic equilibrium, we do not know on which side of the
partition the molecule is. Here the disorder, i.e. the entropy, has the largest value, and there is no way to
get, from a large ensemble of such systems in equilibrium, any useful mechanical energy.

25 The reader is strongly urged to solve Problem 2, whose task is to do a similar calculation for another key (“two-
level”) physical system, and compare the results.
26 See, e.g., CM Chapter 9 and the literature therein.
27 For the definition of , see, e.g., CM Eq. (9.9).
28 For more discussion, see, e.g., either Sec. 6.2 of the monograph H. G. Schuster and W. Just, Deterministic
Chaos, 4th ed., Wiley-VHS, 2005, or the monograph by Arnold and Avez, cited in Sec. 1.
29 This system is frequently called the Szilard engine, after L. Szilard who published its detailed theoretical
discussion in 1929, but is essentially a straightforward extension of the thought experiment suggested by J.
Maxwell as early as 1867.


Fig. 2.4. The Szilard engine: a cylinder with a single molecule and a movable partition: (a) before and (b) after closing the door, and (c) after opening the door at the end of the expansion stage.

Now, let us consider that we know (as instructed by, in Lord Kelvin’s formulation, an omniscient
Maxwell’s Demon) on which side of the partition the molecule is currently located. Then we may close
the door, trapping the molecule, so its repeated impacts on the partition create, on average, a pressure
force F directed toward the empty part of the volume (in Fig. 4b, the right one). Now we can get from
the molecule some mechanical work, say by allowing the force F to move the partition to the right, and
picking up the resulting mechanical energy by some deterministic (zero-entropy) external mechanism.
After the partition has been moved to the right end of the volume, we can open the door again (Fig. 4c),
equalizing the molecule’s average pressure on both sides of the partition, and then slowly move the
partition back to the middle of the volume – without its resistance, i.e. without doing any substantial
work. With the continuing help from Maxwell’s Demon, we can repeat the cycle again and again, and
hence make the system perform unlimited mechanical work, fed “only” by the molecule’s thermal
motion, and the information about its position – thus implementing the perpetual motion machine of the
2nd kind – see Sec. 1.6. The fact that such heat engines do not exist means that getting any new
information, at a non-zero temperature (i.e. at a substantial thermal agitation of particles) has a non-zero
energy cost.
In order to evaluate this cost, let us calculate the maximum work per cycle that can be made by
the Szilard engine (Fig. 4), assuming that it is constantly in thermal equilibrium with a heat bath of
temperature T. Formula (21) tells us that the information supplied by the Demon (on which half of
the volume contains the molecule) is exactly one bit, I(2) = 1. According to Eq. (24), this means that by
getting this information we are changing the entropy of our system by

$$\Delta S_I = -\ln 2. \quad (2.50)$$
Now, it would be a mistake to plug this (negative) entropy change into Eq. (1.19). First, that relation is
only valid for slow, reversible processes. Moreover (and more importantly), this equation, as well as its
irreversible version (1.41), is only valid for a fixed statistical ensemble. The change ΔSI does not belong
to this category and may be formally described by the change of the statistical ensemble – from the one
consisting of all similar systems (experiments) with an unknown location of the molecule to a new
ensemble consisting of the systems with the molecule in its certain (in Fig. 4, left) half.30
Now let us consider a slow expansion of the “gas” after the door had been closed. At this stage,
we do not need the Demon’s help any longer (i.e. the statistical ensemble may be fixed), and can indeed
use the relation (1.19). At the assumed isothermal conditions (T = const), this relation may be integrated

30This procedure of the statistical ensemble re-definition is the central point of the connection between physics
and information theory, and is crucial in particular for any (or rather any meaningful :-) discussion of
measurements in quantum mechanics – see, e.g., QM Secs. 2.5 and 10.1.


over the whole expansion process, getting ΔQ = TΔS. At the final position shown in Fig. 4c, the
system's entropy should be the same as initially, i.e. before the door had been opened, because we again
do not know where in the volume the molecule is. This means that the entropy was replenished, during
the reversible expansion, from the heat bath, by ΔS = –ΔSI = +ln 2, so ΔQ = TΔS = T ln 2. Since by the end
of the whole cycle the internal energy E of the system is the same as before, all this heat could have
gone into the mechanical energy obtained during the expansion. Thus the maximum obtained work per
cycle (i.e. for each obtained information bit) is T ln 2 (kBTK ln 2 in the SI units), about 3×10⁻²¹ J at
room temperature. This is exactly the energy cost of getting one bit of new information about a system at
temperature T. The smallness of that amount on the everyday human scale has left the Szilard engine an
academic theoretical exercise for almost a century. However, recently several such devices, of various
physical nature, were implemented experimentally (with the Demon's role played by an instrument
measuring the position of the particle without a substantial effect on its motion), and the relation ΔQ =
T ln 2 was proved, with a gradually increasing precision.31
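As a trivial arithmetic cross-check of this energy scale (a sketch using the exact SI value of the Boltzmann constant):

    # The Szilard/Landauer energy scale T ln 2 per bit at room temperature.
    import math
    k_B = 1.380649e-23               # J/K, exact by the 2019 SI definition
    print(k_B * 300 * math.log(2))   # ~2.87e-21 J, i.e. about 3e-21 J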
Actually, discussion of another issue closely related to Maxwell’s Demon, namely energy
consumption at numerical calculations, was started earlier, in the 1960s. It was motivated by the
exponential (Moore's-law) progress of digital integrated circuits, which has led, in particular, to a
fast reduction of the energy E "spent" (turned into heat) per one binary logic operation. In the recent
generations of semiconductor digital integrated circuits, the typical E is still above 10⁻¹⁷ J, i.e. still
exceeds the room-temperature value of T ln 2 ≈ 3×10⁻²¹ J by several orders of magnitude.32 Still, some
engineers believe that thermodynamics imposes this important lower limit on E and hence presents an
insurmountable obstacle to the future progress of computation. Unfortunately, in the 2000s this delusion
resulted in a substantial and unjustified shift of electron device research resources toward using “non-
charge degrees of freedom” such as spin (as if they do not obey the general laws of statistical physics!),
so the issue deserves at least a brief discussion.
Let me believe that the reader of these notes understands that, in contrast to naïve popular talk,
computers do not create any new information; all they can do is reshape (“process”) the input
information, losing most of it on the go. Indeed, any digital computation algorithm may be decomposed
into simple, binary logical operations, each of them performed by a circuit called the logic gate. Some of
these gates (e.g., the logical NOT performed by inverters, as well as memory READ and WRITE
operations) do not change the amount of information in the computer. On the other hand, such
information-irreversible logic gates as two-input NAND (or NOR, or XOR, etc.) erase one bit at each
operation, because they turn two input bits into one output bit – see Fig. 5a.
In 1961, Rolf Landauer argued that each logic operation should turn into heat at least the energy

$$E_{\min} = T\ln 2 = k_B T_K \ln 2. \quad (2.51) \quad [\text{Irreversible computation: energy cost}]$$

This result may be illustrated with the Szilard engine (Fig. 4), operated in a reversed cycle. At the first
stage, with the partition's door closed, it uses external mechanical work ΔE = T ln 2 to reduce the volume
in which the molecule is confined, from V to V/2, pumping heat ΔQ = ΔE into the heat bath. To model a
logically irreversible logic gate, let us now open the door in the partition, and thus lose one bit of

31 See, for example, A. Bérut et al., Nature 483, 187 (2012); J. Koski et al., PNAS USA 111, 13786 (2014); Y. Jun
et al., Phys. Rev. Lett. 113, 190601 (2014); J. Peterson et al., Proc. Roy. Soc. A 472, 20150813 (2016).
32 In practical computers, the effective E is even much higher (currently, above ~10⁻¹⁵ J) due to the high energy
cost of moving data across a multi-component system, in particular between its logic and memory chips.


information about the molecule's position. Then we will never get the work T ln 2 back, because moving
the partition back to the right, with the door open, takes place at zero average pressure. Hence, Eq. (51)
gives a fundamental limit for the energy loss (per bit) at logically irreversible computation.

Fig. 2.5. Simple examples of (a) irreversible and (b) potentially reversible logic circuits. Each rectangle denotes a circuit storing one bit of information.

However, in 1973 Charles Bennett came up with convincing arguments that it is possible to
avoid such energy loss by using only operations that are reversible not only physically, but also
logically.33 For that, one has to avoid any loss of information, i.e. any erasure of intermediate results, for
example in the way shown in Fig. 5b.34 At the end of all calculations, after the result has been copied
into memory, the intermediate results may be “rolled back” through reversible gates to be eventually
merged into a copy of input data, again without erasing a single bit. The minimal energy dissipation at
such reversible calculation tends to zero as the operation speed is decreased, so the average energy loss
per bit may be less than the perceived “fundamental thermodynamic limit” (51). The price to pay for this
ultralow dissipation is a very high complexity of the hardware necessary for the storage of all
intermediate results. However, using irreversible gates sparsely, it may be possible to reduce the
complexity dramatically, so in the future such mostly reversible computation may be able to reduce
energy consumption in practical digital electronics.35
Before we leave Maxwell’s Demon behind, let me use it to revisit, for one more time, the
relation between the reversibility of the classical and quantum mechanics of Hamiltonian systems and
the irreversibility possible in thermodynamics and statistical physics. In the thought experiment shown
in Fig. 4, the laws of mechanics governing the motion of the molecule are reversible at all times. Still, during
the partition's motion to the right, driven by molecular impacts, the entropy grows, because the molecule
picks up the heat ΔQ > 0, and hence the entropy ΔS = ΔQ/T > 0, from the heat bath. The physical
mechanism of this irreversible entropy (read: disorder) growth is the interaction of the molecule with
uncontrollable components of the heat bath and the resulting loss of information about its motion.
Philosophically, such emergence of irreversibility in large systems is a strong argument against
reductionism – a naïve belief that by knowing the exact laws of Nature at the lowest, most fundamental

33 C. Bennett, IBM J. Res. Devel. 17, 525 (1973); see also C. Bennett, Int. J. Theor. Phys. 21, 905 (1982).
34 For that, all gates have to be physically reversible, with no static power consumption. Such logic devices do
exist, though they are still not very practicable – see, e.g., K. Likharev, Int. J. Theor. Phys. 21, 311 (1982).
(Another reason why I am citing, rather reluctantly, my own paper is that it also gave a constructive proof that the
reversible computation may also beat the perceived "fundamental quantum limit", ΔEΔt > ℏ, where Δt is the time
of the binary logic operation.)
35 Many currently explored schemes of quantum computing are also reversible – see, e.g., QM Sec. 8.5 and
references therein.


level of its complexity, we can readily understand all phenomena on the higher levels of its organization.
In reality, the macroscopic irreversibility of large systems is a good example36 of a new law (in this case,
the 2nd law of thermodynamics) that becomes relevant on a substantially new, higher level of complexity
– without defying the lower-level laws. Without such new laws, very little of the higher-level
organization of Nature may be understood.

2.4. Canonical ensemble and the Gibbs distribution


As was shown in Sec. 2, the microcanonical distribution may be directly used for solving some
important problems. However, its further development, also due to J. Gibbs, turns out to be much more
convenient for calculations.
Let us consider a statistical ensemble of macroscopically similar systems, each in thermal
equilibrium with a heat bath of the same temperature T (Fig. 6a). Such an ensemble is called canonical.

Fig. 2.6. (a) A system in a heat bath (i.e. a canonical ensemble's member) and (b) the energy spectrum of the composite system (including the heat bath).

It is intuitively evident that if the heat bath is sufficiently large, any thermodynamic variables
characterizing the system under study should not depend on the heat bath’s environment. In particular,
we may assume that the heat bath is thermally insulated, so the total energy EΣ of the composite system,
consisting of the system of our interest plus the heat bath, does not change in time. For example, if the
system under study is in a certain (say, mth ) quantum state, then the sum
$$E_\Sigma = E_m + E_{HB} \quad (2.52)$$
is time-independent. Now let us partition the considered canonical ensemble of such systems into much
smaller sub-ensembles, each being a microcanonical ensemble of composite systems whose total, time-
independent energies EΣ are the same – as was discussed in Sec. 2, within a certain small energy interval
ΔE << EΣ – see Fig. 6b. Due to the very large size of each heat bath in comparison with that of the
system under study, the heat bath's density of states gHB is very high, and ΔE may be selected so that

$$\frac{1}{g_{HB}} \ll \Delta E \ll \left| E_m - E_{m'} \right| \ll E_{HB}, \quad (2.53)$$
where m and m’ are any states of the system of our interest.

36 Another famous example is Charles Darwin’s theory of biological evolution.


According to the microcanonical distribution, within each of these microcanonical sub-


ensembles, the probabilities to find the composite system in any state are equal. Still, the heat bath
energies EHB = EΣ – Em (Fig. 6b) of the members of this sub-ensemble may be different – due to the
difference in Em. The probability W(Em) to find the system of our interest (within the selected sub-
ensemble) in a state with energy Em is proportional to the number ΔM of the corresponding heat baths in
the sub-ensemble. As Fig. 6b shows, in this case we may write ΔM = gHB(EHB)ΔE. As a result, within
the microcanonical sub-ensemble with the total energy EΣ,

$$W_m \propto \Delta M = g_{HB}(E_{HB})\,\Delta E = g_{HB}(E_\Sigma - E_m)\,\Delta E. \quad (2.54)$$


Let us simplify this expression further, using the Taylor expansion with respect to relatively
small Em << EΣ. However, here we should be careful. As we have seen in Sec. 2, the density of states of
a large system is a nearly-exponential function of energy, so if we applied the Taylor expansion directly
to Eq. (54), the Taylor series would only converge for very small Em. A much broader applicability
range may be obtained by taking the logarithms of both parts of Eq. (54) first:

$$\ln W_m = \text{const} + \ln\left[g_{HB}(E_\Sigma - E_m)\right] + \ln\Delta E = \text{const} + S_{HB}(E_\Sigma - E_m), \quad (2.55)$$

where the last step used Eq. (36) for the heat bath, and incorporated ln(ΔE) into the (inconsequential)
constant. Now, we can Taylor-expand the (much more smooth) function of energy on the right-hand side
of Eq. (55), and limit ourselves to the two leading terms of the series:

$$\ln W_m \approx \text{const} + S_{HB}\big|_{E_m=0} - \frac{dS_{HB}}{dE_{HB}}\bigg|_{E_m=0} E_m. \quad (2.56)$$
But according to Eq. (1.9), the derivative participating in this expression is nothing other than
the reciprocal temperature of the heat bath, which (due to the large bath size) does not depend on
whether Em is equal to zero or not. Since our system of interest is in the thermal equilibrium with the
bath, this is also the temperature T of the system – see Eq. (1.8). Hence Eq. (56) is merely
$$\ln W_m = \text{const} - \frac{E_m}{T}. \quad (2.57)$$
This equality describes a substantial decrease of Wm as Em is increased by ~T, and hence our linear
approximation (56) is virtually exact as soon as EHB is much larger than T – the condition that is rather
easy to satisfy, because as we have seen in Sec. 2, the average energy per one degree of freedom of the
system of the heat bath is also of the order of T, so its total energy is much larger than T because of its
much larger size.
Now we should be careful again because so far, Eq. (57) was only derived for a sub-ensemble
with a certain fixed EΣ. However, since the second term on the right-hand side of Eq. (57) includes only
Em and T, which are independent of EΣ, this relation, perhaps with different constant terms, is valid for
all sub-ensembles of the canonical ensemble, and hence for that ensemble as a whole. Hence for the
total probability to find our system of interest in a state with energy Em, in the canonical ensemble with
temperature T, we can write

$$W_m = \text{const}\times\exp\left\{-\frac{E_m}{T}\right\} \equiv \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\}. \quad (2.58) \quad [\text{Gibbs distribution}]$$


This is the famous Gibbs distribution,37 sometimes called the "canonical distribution", which is
arguably the summit of statistical physics,38 because it may be used for a straightforward (or at least
conceptually straightforward :-) calculation of all statistical and thermodynamic variables of a vast range
of systems. Its physical sense is very clear: the interaction with the heat bath "punishes" the system states
(by the reduction of their probability) for having higher energies – on the scale T of its thermal agitation.
Now let us calculate the coefficient Z participating in Eq. (58). Requiring, per Eq. (4), the sum of
all Wm to be equal to 1, we get

$$Z = \sum_m \exp\left\{-\frac{E_m}{T}\right\}, \quad (2.59) \quad [\text{Statistical sum}]$$
where the summation is formally extended to all quantum states of the system, though in practical
calculations, the sum may be truncated to include only the states that are noticeably occupied. The
apparently humble normalization coefficient Z turns out to be so important for applications that it has a
special name – or actually, two names: either the statistical sum or the partition function of the system.
To appreciate the importance of Z, let us use the general expression (29) for entropy to calculate it for
the particular case of the canonical ensemble, i.e. the Gibbs distribution (58) of the probabilities Wm:

$$S = -\sum_m W_m\ln W_m = \frac{\ln Z}{Z}\sum_m \exp\left\{-\frac{E_m}{T}\right\} + \frac{1}{ZT}\sum_m E_m\exp\left\{-\frac{E_m}{T}\right\}. \quad (2.60)$$
On the other hand, according to the general rule (7), the thermodynamic (i.e. ensemble-averaged) value
E of the internal energy of the system is
$$E = \sum_m W_m E_m = \frac{1}{Z}\sum_m E_m\exp\left\{-\frac{E_m}{T}\right\}, \quad (2.61a)$$
so the second term on the right-hand side of Eq. (60) is just E/T, while the first term equals lnZ, due to
Eq. (59). (By the way, using the notion of reciprocal temperature β ≡ 1/T, with the account of Eq. (59),
Eq. (61a) may be also rewritten as

$$E = -\frac{\partial(\ln Z)}{\partial\beta}. \quad (2.61b) \quad [E \text{ from } Z]$$
This formula is very convenient for calculations if our prime interest is the average internal energy E
rather than F or Wm.) With these substitutions, Eq. (60) yields a very simple relation between the
statistical sum and the entropy of the system:

$$S = \frac{E}{T} + \ln Z. \quad (2.62)$$

37 The temperature dependence of the type exp{-const/T}, especially when showing up in rates of certain events,
e.g., chemical reactions, is also frequently called the Arrhenius law – after chemist S. Arrhenius who has noticed
this law in numerous experimental data. In all cases I am aware of, the Gibbs distribution is the underlying reason
of the Arrhenius law. (We will see several examples of that later in this course.)
38 This is the opinion of many physicists, including Richard Feynman – who climbs on this “summit” already on
the first page of his brilliant book Statistical Mechanics, CRC Press, 1998. (This is a collection of lectures on a
few diverse, mostly advanced topics of statistical physics, rather than its systematic course, so it can hardly be
used as the first textbook on the subject. However, I can highly recommend its first chapter to all my readers.)


Now using Eq. (1.33), we see that Eq. (62) gives a straightforward way to calculate the free
energy F of the system from nothing other than its statistical sum (and temperature):

$$F \equiv E - TS = -T\ln Z. \quad (2.63) \quad [F \text{ from } Z]$$

The relations (61b) and (63) play a key role in the connection of statistics to thermodynamics,
because they enable the calculation, from Z alone, of the thermodynamic potentials of the system in
equilibrium, and hence of all other variables of interest, using the general thermodynamic relations – see
especially the circular diagram shown in Fig. 1.6, and its discussion in Sec. 1.4. Let me only note that to
calculate pressure P from the second of Eqs. (1.35), we would need to know the explicit dependence of
F, and hence of the statistical sum Z on the system’s volume V. This would require the calculation, by
appropriate methods of either classical or quantum mechanics, of the dependence of the eigenenergies
Em on the volume. Numerous examples of such calculations will be given later in the course.
Before proceeding to such examples, let us notice that Eqs. (59) and (63) may be readily
combined to give an elegant equality,

$$\exp\left\{-\frac{F}{T}\right\} = \sum_m \exp\left\{-\frac{E_m}{T}\right\}. \quad (2.64)$$
This formula, together with Eq. (59), enables us to rewrite the Gibbs distribution (58) in another form:

$$W_m = \exp\left\{\frac{F - E_m}{T}\right\}, \quad (2.65)$$
more convenient for some applications. In particular, this expression shows that since all probabilities
Wm are below 1, F is always lower than the lowest energy level. Also, Eq. (65) clearly shows that the
probabilities Wm do not depend on the energy reference, i. e. on an arbitrary constant added to all Em –
and hence to E and F.
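These relations are also easy to exercise numerically. Here is a minimal sketch with a made-up four-level spectrum (all values are arbitrary illustration choices, in units with kB = 1):

    # A minimal illustration of Eqs. (58)-(63): given any set of levels E_m,
    # the statistical sum Z yields the probabilities W_m, and then E, F,
    # and S follow directly; Eqs. (62) and (63) are verified at the end.
    import numpy as np

    T = 1.3
    E_m = np.array([0.0, 0.7, 1.1, 2.4])      # hypothetical energy levels
    Z = np.sum(np.exp(-E_m / T))              # statistical sum, Eq. (59)
    W = np.exp(-E_m / T) / Z                  # Gibbs distribution, Eq. (58)
    E = np.sum(W * E_m)                       # average energy, Eq. (61a)
    F = -T * np.log(Z)                        # free energy, Eq. (63)
    S = -np.sum(W * np.log(W))                # entropy, Eq. (29)
    print(np.isclose(S, E / T + np.log(Z)))   # Eq. (62): True
    print(np.isclose(F, E - T * S))           # F = E - TS: True

Note that adding a common constant to all E_m changes Z, E, and F, but leaves all the probabilities W_m intact – in agreement with the last remark above.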

2.5. Harmonic oscillator statistics


The last property may be immediately used in our first example of the Gibbs distribution
application to a particular but very important system – the harmonic oscillator, for a much more general
case than was done in Sec. 2, namely for an arbitrary relation between T and ℏω.39 Let us consider a
canonical ensemble of similar oscillators, each in a contact with a heat bath of temperature T. Selecting
the ground-state energy ℏω/2 for the origin of E, the oscillator eigenenergies (38) become Em = mℏω
(with m = 0, 1,…), so the Gibbs distribution (58) for probabilities of these states is

$$W_m = \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\} = \frac{1}{Z}\exp\left\{-\frac{m\hbar\omega}{T}\right\}, \quad (2.66)$$
with the following statistical sum:

$$Z = \sum_{m=0}^{\infty}\exp\left\{-\frac{m\hbar\omega}{T}\right\} \equiv \sum_{m=0}^{\infty}\lambda^m, \qquad \text{where } \lambda \equiv \exp\left\{-\frac{\hbar\omega}{T}\right\} < 1. \quad (2.67)$$

39The task of making a very similar (and even simpler) calculation for another key class of quantum-mechanical
objects, two-level systems, is left for the reader’s exercise.


This is just the well-known infinite geometric progression (the "geometric series"),40 with the sum

$$Z = \frac{1}{1-\lambda} = \frac{1}{1 - e^{-\hbar\omega/T}}, \quad (2.68)$$

so Eq. (66) yields

$$W_m = \left(1 - e^{-\hbar\omega/T}\right)e^{-m\hbar\omega/T}. \quad (2.69) \quad [\text{Quantum oscillator: statistics}]$$

Figure 7a shows Wm for several lower energy levels, as functions of temperature, or rather of the
T/ℏω ratio. The plots show that the probability to find the oscillator in each particular state (except for
the ground one, with m = 0) vanishes in both low- and high-temperature limits, and reaches its
maximum value Wm ~ 0.3/m at T ~ mℏω, so the contribution mℏωWm of each excited level to the average
oscillator energy E is always smaller than ℏω.

Fig. 2.7. Statistical and thermodynamic parameters of a harmonic oscillator, as functions of temperature: (a) the state probabilities W0–W3, and (b) E, S, C, and F, all vs. T/ℏω.

This average energy may be calculated in either of two ways: either using Eq. (61a) directly:

$$E = \sum_{m=0}^{\infty} E_m W_m = \left(1 - e^{-\hbar\omega/T}\right)\hbar\omega\sum_{m=0}^{\infty} m\,e^{-m\hbar\omega/T}, \quad (2.70)$$

or (simpler) using Eq. (61b), as

$$E = -\frac{\partial}{\partial\beta}\ln Z = \frac{\partial}{\partial\beta}\ln\left(1 - e^{-\beta\hbar\omega}\right), \qquad \text{where } \beta \equiv \frac{1}{T}. \quad (2.71)$$
Both methods give (of course) the same result,41

40 See, e.g., MA Eq. (2.8b).


$$E = E(\omega, T) = \frac{\hbar\omega}{e^{\hbar\omega/T} - 1}, \quad (2.72) \quad [\text{Quantum oscillator: average energy}]$$

which is valid for arbitrary temperature and plays a key role in many fundamental problems of physics.
The red line in Fig. 7b shows this result as a function of the normalized temperature. At relatively low
temperatures, T << ℏω, the oscillator is predominantly in its lowest (ground) state, and its energy (on top
of the constant zero-point energy ℏω/2, which was used in our calculation as the reference) is
exponentially small: E ≈ ℏω exp{–ℏω/T} << T, ℏω. On the other hand, in the high-temperature limit, the
energy tends to T. This is exactly the result (a particular case of the equipartition theorem) that was
obtained in Sec. 2 from the microcanonical distribution. Please note how much simpler the calculation
using the Gibbs distribution is, even for an arbitrary ratio T/ℏω.
To complete the discussion of the thermodynamic properties of the harmonic oscillator, we can
calculate its free energy using Eq. (63):
$$F = T\ln\frac{1}{Z} = T\ln\left(1 - e^{-\hbar\omega/T}\right). \quad (2.73)$$
Now the entropy may be found from thermodynamics: either from the first of Eqs. (1.35), S = –(∂F/∂T)V,
or (even more easily) from Eq. (1.33): S = (E – F)/T. Both relations give, of course, the same result:

$$S = \frac{\hbar\omega}{T}\,\frac{1}{e^{\hbar\omega/T} - 1} - \ln\left(1 - e^{-\hbar\omega/T}\right). \quad (2.74)$$

Finally, since in the general case, the dependence of the oscillator properties (essentially, of ω) on
volume V is not specified, such variables as P, μ, G, W, and Ω are not defined, and what remains is to
calculate the average heat capacity C per one oscillator:

$$C \equiv \frac{\partial E}{\partial T} = \left(\frac{\hbar\omega}{T}\right)^2\frac{e^{\hbar\omega/T}}{\left(e^{\hbar\omega/T} - 1\right)^2} \equiv \left[\frac{\hbar\omega/2T}{\sinh(\hbar\omega/2T)}\right]^2. \quad (2.75)$$

The calculated thermodynamic variables are plotted in Fig. 7b as functions of temperature. In the
low-temperature limit (T << ℏω), they all tend to zero. On the other hand, in the high-temperature limit
(T >> ℏω), F → –T ln(T/ℏω) → –∞, S → ln(T/ℏω) → +∞, and C → 1 (in the SI units, C → kB). Note that
the last limit is the direct corollary of the equipartition theorem: each of the two "half-degrees of
freedom" of the oscillator gives, in the classical limit, the same contribution C = ½ into its heat capacity.
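These dependences are easy to evaluate numerically as well – a minimal sketch (with all energies in units of ℏω, kB = 1, and arbitrary sample temperatures t ≡ T/ℏω):

    # Harmonic oscillator thermodynamics, Eqs. (72)-(75), in units where
    # hbar*omega = k_B = 1, so t is the normalized temperature.
    import numpy as np

    def E(t): return 1.0 / np.expm1(1.0 / t)           # Eq. (72)
    def F(t): return t * np.log(-np.expm1(-1.0 / t))   # Eq. (73)
    def C(t):                                          # Eq. (75)
        x = 1.0 / (2.0 * t)
        return (x / np.sinh(x))**2

    for t in (0.1, 1.0, 10.0, 100.0):
        print(t, E(t), (E(t) - F(t)) / t, C(t))        # E, S = (E-F)/T, C
    # At t >> 1: E -> t and C -> 1 (equipartition), while S grows
    # logarithmically; at t << 1, all three tend to zero, as stated above.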
Now let us use Eq. (69) to discuss the statistics of the quantum oscillator described by the
Hamiltonian (46), in the coordinate representation. Again using the density matrix’s diagonality in
thermodynamic equilibrium, we may use a relation similar to Eqs. (47) to calculate the probability
density to find the oscillator at coordinate q:

$$w(q) = \sum_{m=0}^{\infty} W_m w_m(q) = \sum_{m=0}^{\infty} W_m\left|\psi_m(q)\right|^2 = \left(1 - e^{-\hbar\omega/T}\right)\sum_{m=0}^{\infty} e^{-m\hbar\omega/T}\left|\psi_m(q)\right|^2, \quad (2.76)$$

41It was first obtained in 1924 by S. Bose and is sometimes called the Bose distribution – a particular case of the
Bose-Einstein distribution to be discussed in Sec. 8 below.


where ψm(q) is the normalized eigenfunction of the mth stationary state of the oscillator. Since each
ψm(q) is proportional to the Hermite polynomial42 that requires at least m elementary functions for its
representation, working out the sum in Eq. (76) is a bit tricky,43 but the final result is rather simple: w(q)
is just a normalized Gaussian distribution (the "bell curve"),

$$w(q) = \frac{1}{(2\pi)^{1/2}\,\delta q}\exp\left\{-\frac{q^2}{2(\delta q)^2}\right\}, \quad (2.77)$$

with ⟨q⟩ = 0, and

$$(\delta q)^2 \equiv \langle q^2\rangle - \langle q\rangle^2 = \frac{\hbar}{2m\omega}\coth\frac{\hbar\omega}{2T}. \quad (2.78)$$
Since the function coth ξ tends to 1 at ξ → ∞ and diverges as 1/ξ at ξ → 0, Eq. (78) shows that the width
δq of the coordinate distribution is nearly constant (and equal to that, (ℏ/2mω)1/2, of the ground-state
wavefunction ψ0) at T << ℏω, and grows as (T/mω2)1/2 ≡ (T/κ)1/2 at T/ℏω → ∞.
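This result may be cross-checked without any Hermite-polynomial algebra: using the standard Fock-state matrix elements ⟨m|q̂2|m⟩ = (ℏ/mω)(m + ½), the sum of these values with the weights (69) has to reproduce Eq. (78). A minimal sketch (in units ℏ = m = ω = kB = 1; the temperature is arbitrary):

    # Cross-check of Eq. (78): the Gibbs-weighted sum of the matrix
    # elements <m|q^2|m> = m + 1/2 (in units hbar = mass = omega = 1)
    # must equal (1/2) coth(1/2T).
    import numpy as np

    T = 0.8
    lam = np.exp(-1.0 / T)                    # lambda of Eq. (67)
    m = np.arange(200)                        # enough levels for T ~ 1
    W = (1.0 - lam) * lam**m                  # Eq. (69)
    print(np.sum(W * (m + 0.5)))              # ~0.9016
    print(0.5 / np.tanh(0.5 / T))             # ~0.9016, per Eq. (78)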
As a sanity check, we may use Eq. (78) to write the following expression,

$$U \equiv \left\langle\frac{\kappa q^2}{2}\right\rangle = \frac{\hbar\omega}{4}\coth\frac{\hbar\omega}{2T} \to \begin{cases} \hbar\omega/4, & \text{for } T \ll \hbar\omega, \\ T/2, & \text{for } \hbar\omega \ll T, \end{cases} \quad (2.79)$$
for the average potential energy of the oscillator. To comprehend this result, let us recall that Eq. (72)
for the average full energy E was obtained by counting it from the ground-state energy ℏω/2 of the
oscillator. If we add this reference energy to that result, we get

$$E = \frac{\hbar\omega}{e^{\hbar\omega/T} - 1} + \frac{\hbar\omega}{2} = \frac{\hbar\omega}{2}\coth\frac{\hbar\omega}{2T}. \quad (2.80) \quad [\text{Quantum oscillator: total average energy}]$$
We see that for arbitrary temperature, U = E/2, as was already discussed in Sec. 2. This means that the
average kinetic energy, equal to E – U, is also the same:44

$$\left\langle\frac{p^2}{2m}\right\rangle = \left\langle\frac{\kappa q^2}{2}\right\rangle = \frac{E}{2} = \frac{\hbar\omega}{4}\coth\frac{\hbar\omega}{2T}. \quad (2.81)$$

In the classical limit T >> ℏω, each of these energies tends to T/2, reproducing the equipartition theorem (48).

2.6. Two important applications


The results of the previous section, especially Eq. (72), have innumerable applications in physics
and related disciplines, and here I have time for a brief discussion of only two of them.
(i) Blackbody radiation. Let us consider a free-space volume V limited by non-absorbing (i.e.
ideally reflecting) walls. Electrodynamics tells us45 that the electromagnetic field in such a “cavity” may
be represented as a sum of modes with a time evolution similar to that of the usual harmonic oscillator.

42 See, e.g., QM Sec. 2.10.


43 The calculation may be found, e.g., in QM Sec. 7.2.
44 As a reminder: the equality of these two averages, for arbitrary temperatures, was proved already in Sec. 2.


If the volume V is large enough,46 the number of these modes within a small range dk of the wave vector
magnitude k is

$$dN = \frac{gV}{(2\pi)^3}\,d^3k = \frac{gV}{(2\pi)^3}\,4\pi k^2 dk, \quad (2.82)$$

where for electromagnetic waves, the degeneracy factor g is equal to 2, due to their two different
independent (e.g., linear) polarizations of waves with the same wave vector k. With the linear, isotropic
dispersion relation for waves in vacuum, k = ω/c, Eq. (82) yields

$$dN = \frac{2V}{(2\pi)^3}\,4\pi\frac{\omega^2 d\omega}{c^3} = V\frac{\omega^2}{\pi^2 c^3}\,d\omega. \quad (2.83)$$
On the other hand, quantum mechanics says47 that the energy of such a "field oscillator" is
quantized per Eq. (38), so at thermal equilibrium its average energy is described by Eq. (72). Plugging
that result into Eq. (83), we see that the spectral density of the electromagnetic field's energy, per unit
volume, is

$$u(\omega) \equiv \frac{E}{V}\frac{dN}{d\omega} = \frac{\hbar\omega^3}{\pi^2 c^3}\,\frac{1}{e^{\hbar\omega/T} - 1}. \quad (2.84) \quad [\text{Planck's radiation law}]$$

This is the famous Planck’s blackbody radiation law.48 To understand why its common name
mentions radiation, let us consider a small planar part, of area dA, of a surface that completely absorbs
electromagnetic waves incident from any direction. (Such “perfect black body” approximation may be
closely approached using special experimental structures, especially in limited frequency intervals.)
Figure 8 shows that if the arriving wave was planar, with the incidence angle θ, then the power dP(ω)
absorbed by the surface of small area dA, within a small frequency interval dω, i.e. the energy incident
at that area in unit time, would be equal to the radiation energy within the same frequency interval,
contained inside an imaginary cylinder (shaded in Fig. 8) of height c, base area dA cos θ, and hence
volume dV = c dA cos θ:

$$dP(\omega) = u(\omega)d\omega\,dV = u(\omega)d\omega\;c\,dA\cos\theta. \quad (2.85)$$

Fig. 2.8. Calculating the relation between dP(ω) and u(ω)dω.

45 See, e.g., EM Sec. 7.8.


46 In our current context, the volume should be much larger than (ℏc/T)3, where c ≈ 3×10⁸ m/s is the speed of
light. For room temperature (T ≡ kB·300 K ≈ 4×10⁻²¹ J), this lower bound is of the order of 10⁻¹⁶ m3.
47 See, e.g., QM Sec. 9.1.
48 Let me hope the reader knows that this law was first suggested in 1900 by Max Planck as an empirical fit for

the experimental data on blackbody radiation, and this was the historic point at which the Planck constant ℏ (or
rather h ≡ 2πℏ) was introduced – see, e.g., QM Sec. 1.1.


Since the thermally-induced field is isotropic, i.e. propagates equally in all directions, this result
should be averaged over all solid angles within the polar angle interval 0 ≤ θ ≤ π/2:

$$\frac{d\overline{P}(\omega)}{dA\,d\omega} = \frac{1}{4\pi}\oint\frac{dP(\omega)}{dA\,d\omega}\,d\Omega = c\,u(\omega)\,\frac{1}{4\pi}\int_0^{\pi/2}\sin\theta\,d\theta\int_0^{2\pi}d\varphi\,\cos\theta = \frac{c}{4}\,u(\omega). \quad (2.86)$$

Hence Planck’s expression (84), multiplied by c/4, gives the power absorbed by such a “blackbody”
surface. But at thermal equilibrium, this absorption has to be exactly balanced by the surface’s own
radiation, due to its non-zero temperature T.
I hope the reader is familiar with the main features of the Planck law (84), including its general
shape (Fig. 9), with the low-frequency asymptote u(ω) ∝ ω2 (due to its historic significance, bearing the
special name of the Rayleigh-Jeans law), the exponential drop at high frequencies (the Wien law), and
the resulting maximum of the function u(ω), reached at the frequency ωmax with

$$\hbar\omega_{\max} \approx 2.82\,T, \quad (2.87)$$

i.e. at the wavelength λmax = 2π/kmax = 2πc/ωmax ≈ 2.22 ℏc/T.
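The numerical constant in Eq. (87) is easy to reproduce: the extremum condition for the function x3/(ex – 1), with x ≡ ℏω/T, reduces to x = 3(1 – e–x), which may be solved, e.g., by fixed-point iteration – a minimal sketch:

    # Locating the maximum of the Planck law (84): d/dx [x^3/(e^x - 1)] = 0
    # reduces to x = 3(1 - exp(-x)); the iteration below converges fast.
    import math

    x = 3.0                      # any positive starting point works here
    for _ in range(50):
        x = 3.0 * (1.0 - math.exp(-x))
    print(x)                     # 2.8214..., the constant in Eq. (87)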

Fig. 2.9. The frequency dependence of the blackbody radiation density, normalized by u0 ≡ T3/π2ℏ2c3, as a function of ℏω/T, according to the Planck law (red line) and the Rayleigh-Jeans law (blue line).
Still, I cannot help mentioning a few important particular values: one corresponding to the
visible light (λmax ~ 500 nm) for the Sun's effective surface temperature TK ≈ 6,000 K, and another one
corresponding to the mid-infrared range (λmax ~ 10 μm) for the Earth's surface temperature TK ≈ 300 K.
The balance of these two radiations, absorbed and emitted by the Earth, determines its surface
temperature and hence has the key importance for all life on our planet. This is why it is at the front and
center of the current climate change discussions. As one more example, the cosmic microwave
background (CMB) radiation, closely following the Planck law with TK = 2.725 K (and hence having the
maximum density at λmax ≈ 1.9 mm), and in particular its (very small) anisotropy, is a major source of
data for modern cosmology.
Now let us calculate the total energy E of the blackbody radiation inside some volume V. It may
be found from Eq. (84) by its integration over all frequencies: 49,50

49 The last step in Eq. (88) uses a table integral, equal to Γ(4)ζ(4) = (3!)(π4/90) = π4/15 – see, e.g., MA Eq. (6.8b),
with s = 4, and then MA Eqs. (6.7e) and (2.7b).


$$E = V\int_0^\infty u(\omega)\,d\omega = V\int_0^\infty \frac{\hbar}{\pi^2 c^3}\,\frac{\omega^3 d\omega}{e^{\hbar\omega/T} - 1} = \frac{VT^4}{\pi^2\hbar^3 c^3}\int_0^\infty \frac{\xi^3 d\xi}{e^\xi - 1} = \frac{\pi^2}{15\hbar^3 c^3}\,VT^4. \quad (2.88)$$
Using Eq. (86) to recast Eq. (88) into the total power radiated by a blackbody surface, we get the well-
known Stefan (or "Stefan-Boltzmann") law51

$$\frac{dP}{dA} = \frac{\pi^2}{60\hbar^3 c^2}\,T^4 \equiv \sigma T_K^4, \quad (2.89a) \quad [\text{Stefan law}]$$

where σ is the Stefan-Boltzmann constant

$$\sigma \equiv \frac{\pi^2}{60\hbar^3 c^2}\,k_B^4 \approx 5.67\times 10^{-8}\,\frac{\mathrm{W}}{\mathrm{m}^2\mathrm{K}^4}. \quad (2.89b) \quad [\text{Stefan-Boltzmann constant}]$$
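Both numerical factors here are easy to verify – a minimal sketch using the CODATA values of the fundamental constants:

    # Check of Eqs. (88)-(89): a simple quadrature reproduces the table
    # integral pi^4/15, and the constants give sigma = 5.67e-8 W/(m^2 K^4).
    import numpy as np

    xi = np.linspace(1e-6, 60.0, 200_001)     # the integrand is negligible
    f = xi**3 / np.expm1(xi)                  # beyond xi ~ 60
    print((np.sum(f) - 0.5 * (f[0] + f[-1])) * (xi[1] - xi[0]))  # ~6.4939
    print(np.pi**4 / 15)                                         # ~6.4939

    k_B, hbar, c = 1.380649e-23, 1.054571817e-34, 2.99792458e8   # SI values
    print(np.pi**2 * k_B**4 / (60 * hbar**3 * c**2))   # Eq. (89b): ~5.67e-8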

By this point, the thoughtful reader should have an important concern ready: Eq. (84) and hence
Eq. (88) are based on Eq. (72) for the average energy of each oscillator, referred to its ground-state
energy ℏω/2. However, the radiation power should not depend on the energy origin; why have not we
included the ground energy of each oscillator into the integration (88), as we have done in Eq. (80)? The
answer is that usual radiation detectors only measure the difference between the power Pin of the
incident radiation (say, that of a blackbody surface with temperature T) and their own back-radiation
power Pout, corresponding to some effective temperature Td of the detector – see Fig. 10. But however
low Td is, the temperature-independent contribution ℏω/2 of the ground-state energy to the back
radiation is always there. Hence, the term ℏω/2 drops out from the balance, and cannot be detected – at
least in this simple way. This is the reason why we had the right to ignore this contribution in Eq. (88) –
very fortunately, because it would lead to the integral's divergence at its upper limit. However, let me
repeat that the ground-state energy of the electromagnetic field oscillators is physically real and may be
important – see Sec. 5.5 below.

Fig. 2.10. The power balance at the electromagnetic radiation power measurement: the detector at temperature Td receives dPin ∝ [ℏω/2 + E(ω, T)]dω from the source at temperature T, and emits dPout ∝ [ℏω/2 + E(ω, Td)]dω.
One more interesting result may be deduced from the free energy F of the electromagnetic
radiation, which may be calculated by integration of Eq. (73) over all the modes, with the appropriate
weight (83):

50 Note that the heat capacity CV ≡ (∂E/∂T)V, following from Eq. (88), is proportional to T3 at any temperature, and
hence does not obey the trend CV → const at T → ∞. This is the result of the unlimited growth, with temperature,
of the number of thermally-excited field oscillators with frequencies ω below T/ℏ.
51 Its functional part (E ∝ T4) was deduced in 1879 by Joseph Stefan from earlier experiments by John Tyndall.
Theoretically, it was proved in 1884 by L. Boltzmann, using a result derived earlier by Adolfo Bartoli from the
Maxwell equations for the electromagnetic field – all well before Max Planck's work.


$$F = \int_0^\infty T\ln\left(1 - e^{-\hbar\omega/T}\right)\frac{dN}{d\omega}\,d\omega = \int_0^\infty T\ln\left(1 - e^{-\hbar\omega/T}\right)V\frac{\omega^2}{\pi^2 c^3}\,d\omega. \quad (2.90)$$

Representing 2d as d(3)/3, we can readily work out this integral by parts, reducing it to a table
integral similar to that in Eq. (88), and getting a surprisingly simple result:
2 E
F  V 3 3
T4  . (2.91)
45 c 3
Now we can use the second of the general thermodynamic relations (1.35) to calculate the pressure
exerted by the radiation on the walls of the containing volume V:52

$$P = -\left(\frac{\partial F}{\partial V}\right)_T = \frac{\pi^2}{45\hbar^3 c^3}\,T^4 \equiv \frac{E}{3V}. \quad (2.92a)$$

Rewritten in the form

$$PV = \frac{E}{3}, \quad (2.92b) \quad [\text{Photon gas: } PV \text{ vs. } E]$$
this result may be considered as the equation of state of the electromagnetic field, i.e. from the quantum-
mechanical point of view, of the photon gas. Note that the equation of state (1.44) of the ideal classical
gas may be represented in a similar form, but with a coefficient generally different from Eq. (92).
Indeed, according to the equipartition theorem, for an ideal gas of non-relativistic particles whose
internal degrees of freedom are in a fixed (say, ground) state, the temperature-dependent energy is that
of the three translational “half-degrees of freedom”, E = 3N(T/2). Expressing from here the product NT
= (2E/3), and plugging it into Eq. (1.44), we get a relation similar to Eq. (92), but with a twice larger
factor before E. On the other hand, a relativistic treatment of the classical gas shows that Eq. (92) is
valid for any gas in the ultra-relativistic limit, T >> mc2, where m is the rest mass of the gas’ particle.
Evidently, photons (i.e. particles with m = 0) satisfy this condition at any energy.53
Finally, let me note that Eq. (92) allows for the following interesting interpretation. The last of
Eqs. (1.60), being applied to Eq. (92), shows that in this particular case the grand thermodynamic
potential Ω equals (–E/3), so according to Eq. (91), it is equal to F. But according to the definition of Ω,
i.e. the first of Eqs. (1.60), this means that the chemical potential of the electromagnetic field excitations
(photons) vanishes:

$$\mu = \frac{F - \Omega}{N} = 0. \quad (2.93)$$
In Sec. 8 below, we will see that the same result follows from the comparison of Eq. (72) and the
general Bose-Einstein distribution for arbitrary bosons. So, from the statistical point of view, photons
may be considered bosons with zero chemical potential.
(ii) Specific heat of solids. The heat capacity of solids is readily measurable, and in the early
1900s, its experimentally observed temperature dependence served as an important test for the then-

52 This formula may be also derived from the expression for the forces exerted by the electromagnetic radiation on
the walls (see, e.g. EM Sec. 9.8), but the above calculation is much simpler.
53 Note that according to Eqs. (1.44), (88), and (92), the difference between the equations of state of the photon
gas and an ideal gas of non-relativistic particles, expressed in the more usual form P = P(V, T), is much more
dramatic: P ∝ T4V0 vs. P ∝ T1V–1.


emerging quantum theories. However, the theoretical calculation of CV is not simple54 – even for
insulators, whose specific heat at realistic temperatures is due to thermally-induced vibrations of their
crystal lattice alone.55 Indeed, at relatively low frequencies, a solid may be treated as an elastic
continuum. This continuum supports three different modes of mechanical waves with the same
frequency ω, which all obey linear dispersion laws, ω = vk, but the velocity v = vl for one of these modes
(the longitudinal sound) is higher than that (vt) of the two other modes (the transverse sound).56 At such
frequencies, the wave mode density may be described by an evident generalization of Eq. (83):

$$dN = V\frac{1}{(2\pi)^3}\left(\frac{1}{v_l^3} + \frac{2}{v_t^3}\right)4\pi\omega^2\,d\omega. \quad (2.94a)$$

For what follows, it is convenient to rewrite this relation in a form similar to Eq. (83):

$$dN = \frac{3V}{(2\pi)^3}\,4\pi\frac{\omega^2 d\omega}{v^3}, \qquad \text{with } v \equiv \left[\frac{1}{3}\left(\frac{1}{v_l^3} + \frac{2}{v_t^3}\right)\right]^{-1/3}. \quad (2.94b)$$
However, the basic wave theory shows57 that as the frequency ω of a sound wave in a periodic
structure is increased so much that its half-wavelength π/k approaches the crystal period d, the
dispersion law ω(k) becomes nonlinear before the frequency reaches its maximum at k = π/d. To make
things even more complex, 3D crystals are generally anisotropic, so the dispersion law is different in
different directions of the wave propagation. As a result, the exact statistics of thermally excited sound
waves, and hence the heat capacity of crystals, is rather complicated and specific for each particular
crystal type.
In 1912, P. Debye suggested an approximate theory of the specific heat's temperature
dependence, which is in surprisingly good agreement with experiment for many insulators, including
polycrystalline and amorphous materials. In his model, the linear (acoustic) dispersion law ω = vk, with
the effective sound velocity v defined by the second of Eqs. (94b), is assumed to be exact all the way up
to some cutoff frequency ωD, the same for all three wave modes. This Debye frequency may be defined
by the requirement that the total number of acoustic modes, calculated within this model from Eq. (94b),

$$N = V\frac{3}{(2\pi)^3 v^3}\int_0^{\omega_D} 4\pi\omega^2\,d\omega = V\frac{\omega_D^3}{2\pi^2 v^3}, \quad (2.95)$$
is equal to the universal number N = 3nV of the degrees of freedom (and hence of independent
oscillation modes) in a 3D system of nV elastically coupled particles, where n is the atomic density of
the crystal, i.e. the number of atoms per unit volume.58 For this model, Eq. (72) immediately yields the
following expression for the average energy and specific heat (in thermal equilibrium at temperature T):

$$E = V\frac{3}{(2\pi)^3 v^3}\int_0^{\omega_D}\frac{\hbar\omega}{e^{\hbar\omega/T} - 1}\,4\pi\omega^2\,d\omega = 3nVT\,D(x)\big|_{x=T_D/T}, \quad (2.96)$$

54 Due to the rather low thermal expansion of solids, the difference between their CV and CP is small.
55 In good conductors (e.g., metals), specific heat is contributed (and at low temperatures, dominated) by free
electrons – see Sec. 3.3 below.
56 See, e.g., CM Sec. 7.7.
57 See, e.g., CM Sec. 6.3, in particular Fig. 6.5 and its discussion.
58 See, e.g., CM Sec. 6.2.


$$c_V \equiv \frac{C_V}{nV} = \frac{1}{nV}\left(\frac{\partial E}{\partial T}\right)_V = 3\left[D(x) - x\frac{dD(x)}{dx}\right]_{x=T_D/T}, \quad (2.97) \quad [\text{Debye law}]$$

where TD ≡ ℏωD is called the Debye temperature,59 and

$$D(x) \equiv \frac{3}{x^3}\int_0^x \frac{\xi^3 d\xi}{e^\xi - 1} \to \begin{cases} 1, & \text{for } x \to 0, \\ \pi^4/5x^3, & \text{for } x \to \infty, \end{cases} \quad (2.98)$$

is the Debye function. The red lines in Fig. 11 show the temperature dependence of the specific heat cV
(per particle) within the Debye model. At high temperatures, it approaches a constant value of three,
corresponding to the energy E = 3nVT, in agreement with the equipartition theorem for each of the three
degrees of freedom (i.e. six half-degrees of freedom) of each atom. (This value of cV is known as the
Dulong-Petit law.) In the opposite limit of low temperatures, the specific heat is much smaller:

$$c_V = \frac{12\pi^4}{5}\left(\frac{T}{T_D}\right)^3 \ll 1, \quad (2.99)$$

reflecting the reduction of the number of excited phonons with ℏω < T as the temperature is decreased.
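Since the Debye function (98) has no elementary closed form, Eq. (97) is usually evaluated numerically. A convenient shortcut follows directly from the definition (98): dD/dx = –3D(x)/x + 3/(ex – 1), so Eq. (97) may be recast as cV = 3[4D(x) – 3x/(ex – 1)] at x = TD/T. A minimal sketch, also listing the Einstein result (101) (with ℏωE = TD, as in Fig. 11) and the asymptote (99):

    # Debye (97) vs Einstein (101) specific heat, with t = T/T_D.
    import numpy as np

    def debye_D(x, n=100_000):                # Eq. (98), by quadrature
        xi = np.linspace(1e-8, x, n)
        f = xi**3 / np.expm1(xi)
        return (3.0 / x**3) * (np.sum(f) - 0.5 * (f[0] + f[-1])) * (xi[1] - xi[0])

    def cV_debye(t):                          # Eq. (97), recast as above
        x = 1.0 / t
        return 3.0 * (4.0 * debye_D(x) - 3.0 * x / np.expm1(x))

    def cV_einstein(t):                       # Eq. (101)
        x = 1.0 / (2.0 * t)
        return 3.0 * (x / np.sinh(x))**2

    for t in (0.05, 0.2, 1.0, 2.0):
        print(t, cV_debye(t), cV_einstein(t), (12 * np.pi**4 / 5) * t**3)
    # At t << 1, the Debye result follows the T^3 law (99), while the
    # Einstein result drops exponentially - cf. the discussion below.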

Fig. 2.11. The specific heat as a function of temperature in the Debye (red lines) and Einstein (blue lines) models, plotted on the linear (left panel) and log-log (right panel) scales.

As a historic curiosity, P. Debye's work followed one by A. Einstein, who had suggested (in
1907) a simpler model of crystal vibrations. In his model, all 3nV independent oscillatory modes of nV
atoms of the crystal have approximately the same frequency, say ωE, and Eq. (72) immediately yields

$$E = 3nV\,\frac{\hbar\omega_E}{e^{\hbar\omega_E/T} - 1}, \quad (2.100)$$
so the specific heat is functionally similar to Eq. (75):

59 In the SI units, the Debye temperature TD is of the order of a few hundred K for most simple solids (e.g., ~430
K for aluminum and ~340 K for copper), with somewhat lower values for crystals with heavy atoms (~105 K for
lead), and reaches its highest value ~2200 K for diamond, with its relatively light atoms and very stiff lattice.


$$c_V \equiv \frac{1}{nV}\left(\frac{\partial E}{\partial T}\right)_V = 3\left[\frac{\hbar\omega_E/2T}{\sinh(\hbar\omega_E/2T)}\right]^2. \quad (2.101)$$
This dependence cV(T) is shown with blue lines in Fig. 11 (assuming, for the sake of simplicity,
that ℏωE = TD). At high temperatures, this result does satisfy the universal Dulong-Petit law (cV = 3), but
for T << TD, Einstein's model predicts a much faster (exponential) drop of the specific heat as the
temperature is reduced. (The difference between the Debye and Einstein models is not too spectacular
on the linear scale, but in the log-log plot shown on the right panel of Fig. 11, it is rather dramatic.60)
The Debye model is in much better agreement with experimental data for simple, monoatomic crystals,
thus confirming the conceptual correctness of his wave-based approach.
Note, however, that when a genius such as Albert Einstein makes an error, there is usually some
deep and important background under it. Indeed, crystals with the basic cell consisting of atoms of two
or more types (such as NaCl, etc.), feature two or more separate branches of the dispersion law ω(k) –
see, e.g., Fig. 12. While the lower, "acoustic" branch is virtually similar to those for monoatomic
crystals and may be approximated by the Debye model, ω = vk, reasonably well, the upper ("optical"61)
branch does not approach ω = 0 at any k. Moreover, for large values of the atomic mass ratio r, the
optical branches are almost flat, with virtually k-independent frequencies ω0, which correspond to
simple oscillations of each light atom between its heavy neighbors. For thermal excitations of such
oscillations, and their contribution to the specific heat, Einstein's model (with ωE = ω0) gives a very
good approximation, so for such solids, the specific heat may be well described by a sum of the Debye
and Einstein laws (97) and (101), with appropriate weights.

Fig. 2.12. The dispersion relation ω(k) (in arbitrary units, on the linear scale, as a function of kd/π) for mechanical waves in a simple 1D model of a solid, with similar interparticle distances d but alternating particle masses, plotted for a particular mass ratio r = 5, showing the lower "acoustic" and the upper "optical" branches – see CM Chapter 6.

2.7. Grand canonical ensemble and distribution


As we have seen, the Gibbs distribution is a very convenient way to calculate the statistical and
thermodynamic properties of systems with a fixed number N of particles. However, for systems in which
N may vary, another distribution is preferable for applications. Several examples of such situations (as

60 This is why there is the following general “rule of thumb” in quantitative sciences: if you plot your data on a
linear rather than log scale, you better have a good excuse ready. (An example of a valid excuse: the variable you
are plotting changes its sign within the range you want to exhibit.)
61 This term stems from the fact that at k → 0, the mechanical waves corresponding to these branches have phase
velocities vph ≡ ω(k)/k that are much higher than that of the acoustic waves, and may approach the speed of light.
As a result, these waves can strongly interact with electromagnetic (practically, optical) waves of the same
frequency, while acoustic waves cannot.

Chapter 2 Page 30 of 44
Essential Graduate Physics SM: Statistical Mechanics

well as the basic thermodynamics of such systems) have already been discussed in Sec. 1.5. Perhaps
even more importantly, statistical distributions for systems with variable N are also applicable to some
ensembles of independent particles in certain single-particle states even if the number of the particles is
fixed – see the next section.
With this motivation, let us consider what is called the grand canonical ensemble (Fig. 13). It is
similar to the canonical ensemble discussed in Sec. 4 (see Fig. 6) in all aspects, besides that now the
system under study and the heat bath (in this case, more often called the environment) may exchange not
only heat but also particles. In this ensemble, all environments are in both the thermal and chemical
equilibrium, with their temperatures T and chemical potentials μ the same for all members.

Fig. 2.13. A member of the grand canonical ensemble.

Let us assume that the system of interest is also in chemical and thermal equilibrium with its
environment. Then using exactly the same arguments as in Sec. 4 (including the specification of
microcanonical sub-ensembles with fixed EΣ and NΣ), we may generalize Eq. (55), taking into account
that the entropy Senv of the environment is now a function of not only its energy Eenv = EΣ – Em,N,62 but
also of the number of particles Nenv = NΣ – N, with EΣ and NΣ fixed:

$$\ln W_{m,N} = \ln\Delta M + \text{const} = \ln\left[g_{env}(E_\Sigma - E_{m,N},\,N_\Sigma - N)\,\Delta E\right] + \text{const} = S_{env}(E_\Sigma - E_{m,N},\,N_\Sigma - N) + \text{const}$$
$$\approx S_{env}\big|_{E_\Sigma, N_\Sigma} - \frac{\partial S_{env}}{\partial E_{env}}\bigg|_{E_\Sigma, N_\Sigma} E_{m,N} - \frac{\partial S_{env}}{\partial N_{env}}\bigg|_{E_\Sigma, N_\Sigma} N + \text{const}. \quad (2.102)$$
To simplify this relation, let us rewrite Eq. (1.52) in the following equivalent form:

$$dS = \frac{1}{T}dE + \frac{P}{T}dV - \frac{\mu}{T}dN. \quad (2.103)$$

Hence, if the entropy S of a system is expressed as a function of E, V, and N, then

$$\left(\frac{\partial S}{\partial E}\right)_{V,N} = \frac{1}{T}, \qquad \left(\frac{\partial S}{\partial V}\right)_{E,N} = \frac{P}{T}, \qquad \left(\frac{\partial S}{\partial N}\right)_{E,V} = -\frac{\mu}{T}. \quad (2.104)$$
Applying the first one and the last one of these relations to the last form of Eq. (102), and using the
equality of the temperatures T and the chemical potentials μ in the system under study and its
environment, at equilibrium (as was discussed in Sec. 1.5), we get

62The additional index in the new notation Em,N for the energy of the system of interest reflects the fact that its
spectrum is generally dependent on the number N of particles in it.


$$\ln W_{m,N} = S_{env}(E_\Sigma, N_\Sigma) - \frac{1}{T}E_{m,N} + \frac{\mu}{T}N + \text{const}. \quad (2.105)$$
Again, exactly as at the derivation of the Gibbs distribution in Sec. 4, we may argue that since Em,N, T,
and μ do not depend on the choice of the environment's size, i.e. on EΣ and NΣ, the probability Wm,N for a
system to have N particles and be in the mth quantum state in the whole grand canonical ensemble should
also obey Eq. (105). As a result, we get the so-called grand canonical distribution:

$$W_{m,N} = \frac{1}{Z_G}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\} \equiv \frac{1}{Z_G}\exp\left\{\frac{\mu N}{T}\right\}\exp\left\{-\frac{E_{m,N}}{T}\right\}, \quad (2.106) \quad [\text{Grand canonical distribution}]$$

where, just as in the case of the Gibbs distribution (2.58), the constant ZG (most often called the grand
statistical sum, but sometimes the “grand partition function”) should be determined from the probability
normalization condition. However, now the summation of the probabilities Wm,N should be over all
possible values of both m and N:

$$Z_G = \sum_{m,N}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \quad (2.107) \quad [\text{Grand canonical sum}]$$

The last multiplier in the last form of Eq. (106) is the same as in the Gibbs distribution, and its
physical interpretation is similar: states are "punished" by a lower probability for their higher
energies. The handwaving interpretation of the first multiplier, with its opposite sign, is different: in the
absence of the energy-related penalty Em,N, the environment with an average particle energy μ > 0
"wants" to flood the system with more particles.
Now let us see how the grand canonical distribution may be used for calculations of measurable
variables. First, using the general Eq. (29) to calculate the entropy from Eq. (106) (exactly like we did it
for the canonical ensemble), we get the following expression,

$$S = -\sum_{m,N} W_{m,N}\ln W_{m,N} = \frac{E}{T} - \frac{\mu\langle N\rangle}{T} + \ln Z_G, \quad (2.108)$$

which is evidently a generalization of Eq. (62).63 We see that now the grand thermodynamic potential Ω
(rather than the free energy F) may be expressed directly via the normalization coefficient ZG:

$$\Omega \equiv F - \mu\langle N\rangle = E - TS - \mu\langle N\rangle = T\ln\frac{1}{Z_G} = -T\ln\sum_{m,N}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \quad (2.109) \quad [\Omega \text{ from } Z_G]$$
Finally, solving the last equality for ZG, and plugging the result back into Eq. (106), we can rewrite the
grand canonical distribution in the form

$$W_{m,N} = \exp\left\{\frac{\Omega + \mu N - E_{m,N}}{T}\right\}, \tag{2.110}$$
similar to Eq. (65) for the Gibbs distribution. Indeed, in the particular case when the number N of
particles is fixed, ⟨N⟩ = N, so that Ω + μN = Ω + μ⟨N⟩ ≡ F, and Eq. (110) is reduced to Eq. (65).

63 The average number of particles ⟨N⟩ is exactly what was called N in thermodynamics (see Chapter 1), but I
keep this explicit notation here to make a clear distinction between this average value of the variable and its
particular values participating in Eqs. (102)-(110).


2.8. Systems of independent particles


Now let us apply the general statistical distributions discussed above to a simple but very
important case when the system we are considering consists of many similar particles whose direct
interactions are negligible. As a result, each particular energy value Em,N of such a system may be
represented as a sum of energies εk of the particles, where the index k numbers single-particle states –
rather than those of the whole system as the index m does.
Let us start with the classical limit. In classical mechanics, the energy quantization effects are
negligible, i.e. there is a formally infinite number of quantum states k within each finite energy interval.
However, it is convenient to keep, for the time being, the discrete-state language, with the understanding
that the average number ⟨Nk⟩ of particles in each of these states, usually called the state occupancy, is
very small. In this case, we may apply the Gibbs distribution to the canonical ensemble of single
particles, and hence use it with the substitution Em → εk, so Eq. (58) becomes

$$\langle N_k\rangle = c\exp\left\{-\frac{\varepsilon_k}{T}\right\} \ll 1, \tag{2.111}$$

where the constant c should be found from the normalization condition:

$$\sum_k\langle N_k\rangle = 1. \tag{2.112}$$
This is the famous Boltzmann distribution.64 Despite its formal similarity to the Gibbs
distribution (58), let me emphasize the conceptual difference between these two important formulas. The
Gibbs distribution describes the probability of finding the whole system in one of its states with energy Em,
and it is always valid – more exactly, for any canonical ensemble of systems in thermodynamic
equilibrium. On the other hand, the Boltzmann distribution describes the occupancy of an energy level
of a single particle, and, as we will see in just a minute, is valid for quantum particles only in the
classical limit ⟨Nk⟩ << 1, even if the particles do not interact directly.
The last fact may be surprising, because it may seem that as soon as particles of the system are
independent, nothing prevents us from using the Gibbs distribution to derive Eq. (111), regardless of the
value of ⟨Nk⟩. This is indeed true if the particles are distinguishable, i.e. may be distinguished from
each other – say by their definitely different spatial positions, or by the states of certain internal degrees
of freedom (say, spin), or by any other “pencil mark”. However, it is an experimental fact that
elementary particles of each particular type (say, electrons) are identical to each other, i.e. cannot be
“pencil-marked”.65 For such particles we have to be more careful: even if they do not interact directly,

64 The distribution was first suggested in 1877 by L. Boltzmann. For the particular case when  is the kinetic
energy of a free classical particle (and hence has a continuous spectrum), it is reduced to the Maxwell distribution
(see Sec. 3.1 below), which was derived earlier – in 1860.
65 This fact invites a natural question: what particles are “elementary enough” for their identity? For example,
protons and neutrons have an internal structure, in some sense consisting of quarks and gluons; can they be
considered elementary? Next, if protons and neutrons are elementary, are atoms? molecules? What about really
large molecules (such as proteins)? viruses? The general answer to these questions, given by quantum mechanics
(or rather experiment :-), is that any particles/systems, no matter how large and complex they are, are identical if
they not only have the same internal structure but also are exactly in the same internal quantum state – for
example, in the ground state of all their internal degrees of freedom. Evidently, the more complex are the
particles/systems, the harder it is to enforce this situation in experiment.


there is still some indirect dependence in their behavior, which is especially evident for the so-called
fermions (elementary particles with semi-integer spin): they obey the Pauli exclusion principle that
forbids two identical particles to be in the same quantum state, even if they do not interact directly.66
Note that the term “the same quantum state” carries a heavy meaning load here. For example, if
two particles are confined to stay at different spatial positions (say, reliably locked in different boxes),
they are distinguishable even if they are internally identical. Thus the Pauli principle, as well as other
particle identity effects such as the Bose-Einstein condensation to be discussed in the next chapter, are
important only when identical particles may move in the same spatial region. To emphasize this fact, it
is common to use, instead of “identical”, a more precise (though grammatically rather unpleasant)
adjective indistinguishable.
In order to take these effects into account, let us examine the statistical properties of a system of
many non-interacting but indistinguishable particles (at the first stage of calculation, either fermions or
bosons) in equilibrium, applying the grand canonical distribution (109) to a very unusual grand
canonical ensemble: a subset of particles in the same quantum state k (Fig. 14).

[Fig. 2.14. The grand canonical ensemble of particles (#1, 2, …, j, …) in the same quantum state with energy εk – schematically.]

In this ensemble, the role of the environment may be played just by the set of particles in all
other states k' ≠ k, because due to infinitesimal interactions, the particles may gradually change their
states. In the resulting equilibrium, the chemical potential μ and temperature T of the system should not
depend on the state number k, though the grand thermodynamic potential Ωk of the chosen particle
subset may. Replacing N with Nk – the particular (not average!) number of particles in the selected kth
state, and the particular energy value Em,N with εkNk, we reduce the final form of Eq. (109) to

$$\Omega_k = -T\ln\sum_{N_k}\exp\left\{\frac{\mu N_k - \varepsilon_k N_k}{T}\right\} = -T\ln\sum_{N_k}\left[\exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}\right]^{N_k}, \tag{2.113}$$
where the summation should be carried out over all possible values of Nk. For the final calculation of
this sum, the elementary particle type is essential.
On one hand, for fermions, obeying the Pauli principle, the numbers Nk in Eq. (113) may take
only two values, either 0 (the state k is unoccupied) or 1 (the state is occupied), and the summation gives

$$\Omega_k = -T\ln\sum_{N_k=0,1}\left[\exp\left\{\frac{\mu-\varepsilon_k}{T}\right\}\right]^{N_k} = -T\ln\left[1+\exp\left\{\frac{\mu-\varepsilon_k}{T}\right\}\right]. \tag{2.114}$$

66 For a more detailed discussion of this issue, see, e.g., QM Sec. 8.1.


Now the state occupancy may be calculated from the last of Eqs. (1.62) – in this case, with the (average)
N replaced with ⟨Nk⟩:

$$\langle N_k\rangle = -\left(\frac{\partial\Omega_k}{\partial\mu}\right)_{T,V} = \frac{1}{e^{(\varepsilon_k-\mu)/T}+1}. \tag{2.115}$$
This is the famous Fermi-Dirac distribution, derived in 1926 independently by Enrico Fermi and Paul
Dirac.
On the other hand, bosons do not obey the Pauli principle, and for them the numbers Nk can take
any non-negative integer values. In this case, Eq. (113) turns into the following equality:

$$\Omega_k = -T\ln\sum_{N_k=0}^{\infty}\left[\exp\left\{\frac{\mu-\varepsilon_k}{T}\right\}\right]^{N_k} = -T\ln\sum_{N_k=0}^{\infty}\lambda^{N_k}, \quad \text{with } \lambda \equiv \exp\left\{\frac{\mu-\varepsilon_k}{T}\right\}. \tag{2.116}$$
This sum is just the usual geometric series, which converges if λ < 1, giving

$$\Omega_k = -T\ln\frac{1}{1-\lambda} = T\ln\left[1-\exp\left\{\frac{\mu-\varepsilon_k}{T}\right\}\right], \quad \text{for } \mu < \varepsilon_k. \tag{2.117}$$
In this case, the average occupancy, again calculated using Eq. (1.62) with N replaced with ⟨Nk⟩, obeys
the Bose-Einstein distribution,

$$\langle N_k\rangle = -\left(\frac{\partial\Omega_k}{\partial\mu}\right)_{T,V} = \frac{1}{e^{(\varepsilon_k-\mu)/T}-1}, \quad \text{for } \mu < \varepsilon_k, \tag{2.118}$$
which was derived in 1924 by Satyendra Nath Bose (for the particular case μ = 0) and generalized in
1925 by Albert Einstein to the case of arbitrary chemical potential. In particular, comparing Eq. (118)
with Eq. (72), we see that harmonic oscillator's excitations,67 each with energy ℏω, may be considered
as bosons, with the chemical potential equal to zero. As a reminder, we have already obtained this
equality (μ = 0) in a different way – see Eq. (93). Its physical interpretation is that the oscillator
excitations may be created inside the system, so there is no energy cost μ of moving them into the
system under consideration from its environment.
The simple form of Eqs. (115) and (118), and their similarity (besides "only" the difference of
the signs before the unity in their denominators), is one of the most beautiful results of physics. This
similarity, however, should not disguise the fact that the energy dependences of the occupancies ⟨Nk⟩
given by these two formulas are very much different – see their linear and semi-log plots in Fig. 15.

In the Fermi-Dirac statistics, the level occupancy is not only finite but stays below 1 at any energy,
while in the Bose-Einstein statistics it may be above 1, and diverges at εk → μ. However, as the temperature is
increased, it eventually becomes much larger than the difference (εk – μ). In this limit, ⟨Nk⟩ << 1, so both
quantum distributions coincide with each other, as well as with the classical Boltzmann distribution
(111) with c = exp{μ/T}:

67 As the reader certainly knows, for electromagnetic field oscillators, such excitations are called photons; for
mechanical oscillation modes, phonons, etc. It is important, however, not to confuse such mode excitations with
the oscillators as such, and be very careful in prescribing to them certain spatial locations – see, e.g., QM Sec. 9.1.


$$\langle N_k\rangle \to \exp\left\{\frac{\mu-\varepsilon_k}{T}\right\}, \quad \text{for } \langle N_k\rangle \to 0. \tag{2.119}$$

This distribution (also shown in Fig. 15) may therefore also be understood as the high-temperature
limit for indistinguishable particles of both sorts.
[Fig. 2.15. The Fermi-Dirac (blue line), Bose-Einstein (red line), and Boltzmann (dashed line) distributions
for indistinguishable quantum particles, plotted as functions of (εk – μ)/T on linear and semi-log scales.
(The last distribution is valid only asymptotically, at ⟨Nk⟩ << 1.)]
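This convergence is easy to check numerically; the following Python sketch evaluates all three occupancies at the same reduced energy x ≡ (εk – μ)/T (the sample values of x are arbitrary):

import math

def fermi_dirac(x):   return 1.0 / (math.exp(x) + 1.0)   # Eq. (115)
def bose_einstein(x): return 1.0 / (math.exp(x) - 1.0)   # Eq. (118), x > 0
def boltzmann(x):     return math.exp(-x)                # Eq. (119)

for x in (0.5, 2.0, 5.0):
    print(x, fermi_dirac(x), bose_einstein(x), boltzmann(x))

Already at x = 5, the three results agree to better than 1%, illustrating the classical limit (119).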

A natural question now is how to find the chemical potential μ participating in Eqs. (115), (118),
and (119). In the grand canonical ensemble as such (Fig. 13), with a variable number of particles, the
value of μ is imposed by the system's environment. However, both the Fermi-Dirac and Bose-Einstein
distributions are also approximately applicable (in thermal equilibrium) to systems with a fixed but very
large number N of particles. Under these conditions, the role of the environment for some subset of N' << N
particles is essentially played by the remaining N – N' particles. In this case, μ may be found by
calculating ⟨N⟩ from the corresponding probability distribution, and then requiring the result to equal
the actual number of particles in the system. In the next section, we will perform such
calculations for several particular systems.
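Since such calculations frequently have to be completed numerically, here is a Python sketch of the procedure for a system of fermions, with an arbitrary toy spectrum (the equidistant levels below do not represent any particular system); it exploits the fact that the total occupancy Σk⟨Nk⟩, given by Eq. (115), grows monotonically with μ, so simple bisection suffices:

import math

def total_n(mu, levels, T):
    # the sum of the Fermi-Dirac occupancies (115) over the given levels
    return sum(1.0 / (math.exp((eps - mu) / T) + 1.0) for eps in levels)

def find_mu(levels, n_target, T, lo=-100.0, hi=100.0, tol=1e-12):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total_n(mid, levels, T) < n_target else (lo, mid)
    return 0.5 * (lo + hi)

levels = [0.1 * k for k in range(100)]   # a toy equidistant spectrum
mu = find_mu(levels, n_target=10, T=0.5)
print(mu, total_n(mu, levels, T=0.5))    # the latter recovers N = 10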
For that and other applications, it will be convenient for us to have ready formulas for the
entropy S of a general (i.e. not necessarily equilibrium) state of systems of independent Fermi and Bose
particles, expressed not as a function of Wm of the whole system, as in Eq. (29), but via the occupancy
numbers ⟨Nk⟩. For that, let us consider an ensemble of composite systems, each consisting of M >> 1
similar but distinct component systems, numbered by index m = 1, 2, …, M, with independent (i.e. not
directly interacting) particles – see Fig. 16. Let us assume that though in each of the M component
systems, the number Nk(m) of particles in their kth quantum state may be different, their total number Nk(Σ)
in the composite system is fixed. As a result, the total energy of the composite system is fixed as well,

$$\sum_{m=1}^{M}N_k^{(m)} = N_k^{(\Sigma)} = \text{const}, \qquad E_k = \sum_{m=1}^{M}N_k^{(m)}\varepsilon_k = N_k^{(\Sigma)}\varepsilon_k = \text{const}, \tag{2.120}$$


so an ensemble of many such composite systems (with the same εk), in equilibrium, is microcanonical.
According to Eq. (24a), the average entropy Sk per component system in this microcanonical ensemble
may be calculated as

$$S_k = \lim_{M\to\infty}\frac{\ln M_k}{M}, \tag{2.121}$$

where Mk is the number of possible different ways such a composite system (with fixed Nk(Σ)) may be
implemented.

[Fig. 2.16. An example of a composite system of Nk(Σ) particles in the kth quantum state, distributed
between M component systems (numbered m = 1, 2, …, M), with Nk(1), Nk(2), …, Nk(m), …, Nk(M) particles in each.]

Let us start with the calculation of Mk for Fermi particles – for which the Pauli principle is valid.
Here the level occupancies Nk(m) may only be equal to either 0 or 1, so the distribution problem is
solvable only if Nk(Σ) ≤ M, and is evidently equivalent to the choice of Nk(Σ) balls (in arbitrary order) from
the total number of M distinct balls. Comparing this formulation with the definition of the binomial
coefficient,68 we immediately get

$$M_k = {}^{M}C_{N_k^{(\Sigma)}} \equiv \frac{M!}{(M-N_k^{(\Sigma)})!\,N_k^{(\Sigma)}!}. \tag{2.122}$$
From here, using the Stirling formula (again, in its simplest form (27)), we get

$$S_k = -\langle N_k\rangle\ln\langle N_k\rangle - \left(1-\langle N_k\rangle\right)\ln\left(1-\langle N_k\rangle\right), \tag{2.123}$$

where

$$\langle N_k\rangle \equiv \lim_{M\to\infty}\frac{N_k^{(\Sigma)}}{M} \tag{2.124}$$

is exactly the average occupancy of the kth single-particle state in each system, which was discussed
earlier in this section. Since for a Fermi system, ⟨Nk⟩ is always between 0 and 1, its entropy (123)
cannot be negative – see the blue line in Fig. 17.
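The limit (121) leading to Eq. (123) may be verified directly; the following Python sketch computes the exact per-system entropy ln Mk/M from the binomial coefficient (122), via the log-gamma function, for a growing M (the value ⟨Nk⟩ = 0.3 is an arbitrary illustration):

import math

def exact_entropy(m, n_sigma):
    # ln(M_k)/M, with M_k = M!/((M - N)!N!) of Eq. (2.122)
    ln_mk = (math.lgamma(m + 1) - math.lgamma(m - n_sigma + 1)
             - math.lgamma(n_sigma + 1))
    return ln_mk / m

def stirling_entropy(n):
    # the Stirling-formula result, Eq. (2.123)
    return -n * math.log(n) - (1 - n) * math.log(1 - n)

n_avg = 0.3
for m in (10, 100, 10_000):
    print(m, exact_entropy(m, int(n_avg * m)))
print("limit:", stirling_entropy(n_avg))   # ~0.6109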
In the Bose case, where the Pauli principle is not valid, the number Nk(m) of particles in the kth
state in each of the systems is an arbitrary (non-negative) integer. Let us consider Nk(Σ) particles and (M
– 1) partitions (shown by vertical lines in Fig. 16) between M systems as (M – 1 + Nk(Σ)) mathematical
objects ordered along one axis. Each specific location of the partitions evidently fixes all Nk(m). Hence
Mk may be calculated as the number of possible ways to distribute the (M – 1) indistinguishable
partitions among these (M – 1 + Nk(Σ)) ordered objects, i.e. as the following binomial coefficient:69

68 See, e.g., MA Eq. (2.2).


69 See also MA Eq. (2.4).


M  N k 1 ( M  1  N k  )!
Mk  C M 1  . (2.125)
( M  1)! N k !
Applying the Stirling formula (27) again, we get the following result,

$$S_k = -\langle N_k\rangle\ln\langle N_k\rangle + \left(1+\langle N_k\rangle\right)\ln\left(1+\langle N_k\rangle\right), \tag{2.126}$$

which again differs from the Fermi case (123) "only" by the signs in the second term, and is valid for
any positive ⟨Nk⟩ – see the red line in Fig. 17.
In the classical limit, when the average occupancy ⟨Nk⟩ of the state is much smaller than 1, the
Fermi and Bose expressions for Sk tend to the same Boltzmann limit:

$$S_k \to -\langle N_k\rangle\ln\langle N_k\rangle + \langle N_k\rangle = \langle N_k\rangle\ln\frac{e}{\langle N_k\rangle}, \quad \text{for } \langle N_k\rangle \to 0. \tag{2.127}$$

(The last expression may also be obtained from the functionally similar Eq. (29), by considering an
ensemble of systems consisting of just one classical particle each, so Em → εk and Wm → ⟨Nk⟩.)

[Fig. 2.17. Entropy of particles in a quantum state as a function of its average occupancy ⟨Nk⟩, for
fermions (blue line) and bosons (red line). The dashed line shows their common asymptote at ⟨Nk⟩ → 0,
given by the first of Eqs. (127).]
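A brief numerical comparison of Eqs. (123), (126), and (127) confirms the behavior shown in Fig. 17 (the sample occupancies below are arbitrary):

import math

def s_fermi(n): return -n * math.log(n) - (1 - n) * math.log(1 - n)  # (123)
def s_bose(n):  return -n * math.log(n) + (1 + n) * math.log(1 + n)  # (126)
def s_limit(n): return n * math.log(math.e / n)                      # (127)

for n in (0.5, 0.1, 0.01):
    print(n, s_fermi(n), s_bose(n), s_limit(n))

The three values converge as ⟨Nk⟩ → 0, as stated above.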
Expressions (123) and (126) are valid for an arbitrary (possibly, non-equilibrium) case; they may
also be used for an alternative derivation of the Fermi-Dirac (115) and Bose-Einstein (118) distributions,
which are valid only in equilibrium. For that, we may use the method of Lagrange multipliers, requiring
(just as was done in Sec. 2) the total entropy of a system of N independent, similar particles,

$$S = \sum_k S_k, \tag{2.128}$$

considered as a function of the state occupancies ⟨Nk⟩, to attain its maximum, under the conditions of a
fixed total number of particles N and a fixed total energy E:

$$\sum_k\langle N_k\rangle = N = \text{const}, \qquad \sum_k\langle N_k\rangle\,\varepsilon_k = E = \text{const}. \tag{2.129}$$

The completion of this calculation is left for the reader's exercise.


2.9. Exercise problems

2.1. A famous example of macroscopic irreversibility was suggested in 1907 by P. Ehrenfest.


Two dogs share 2N >> 1 fleas. Each flea may jump onto another dog, and the rate  of such events (i.e.
the probability of jumping per unit time) does not depend either on time or on the location of other fleas.
Find the time evolution of the average number of fleas on a dog, and of the flea-related part of the total
dogs’ entropy (at arbitrary initial conditions), and prove that the entropy can only grow.70

2.2. Use the microcanonical distribution to calculate thermodynamic properties (including the
entropy, all relevant thermodynamic potentials, and the heat capacity) of a two-level system in
thermodynamic equilibrium with its environment, at temperature T that is comparable with the energy
gap Δ. For each variable, sketch its temperature dependence, and find its asymptotic values (or trends) in
the low-temperature and high-temperature limits.

Hint: The two-level system is any quantum system with just two different stationary states,
whose energies (say, E0 and E1) are separated by a gap Δ ≡ E1 – E0. Its most popular (but by no means
the only!) example is the spin-½ of a particle, e.g., an electron, in an external magnetic field.71

2.3. Solve the previous problem using the Gibbs distribution. Also, calculate the probabilities of
the energy level occupation, and give physical interpretations of your results, in both temperature limits.

2.4. A quantum spin-½ particle with a gyromagnetic ratio γ is placed into an external magnetic
field H = Hnz. Neglecting the possible orbital motion of the particle, calculate its average
magnetization ⟨mz⟩ as a function of H, and in particular its low-field magnetic susceptibility χ, in thermal
equilibrium at temperature T. Calculate the same characteristics for a classical spontaneous magnetic
dipole m of a fixed magnitude m0, free to change its direction in space, and compare the results.

Hint: The low-field magnetic susceptibility of a single particle is defined72 as

$$\chi \equiv \left.\frac{\partial\langle m_z\rangle}{\partial H}\right|_{H\to 0}.$$

2.5.* Calculate the weak-field magnetic susceptibility of a hydrogen atom at room
temperature. Is this response to the field paramagnetic or diamagnetic? Compare the result with the
estimated susceptibility of a hydrogen molecule H2.

2.6. N similar stiff rods of length l are connected to form a 1D
chain (see the figure on the right) by joints that allow
free 3D rotation. The chain, in thermal equilibrium at

70 This is essentially a simpler (and funnier :-) version of the particle scattering model used by L. Boltzmann to
prove his famous H-theorem (1872). Besides the historic significance of that theorem, the model used in it (see
Sec. 6.2 below) is as cartoonish, and not more general.
71 See, e.g., QM Secs. 4.6 and 5.1, for example, Eq. (4.167).
72 This "atomic" (or "molecular") susceptibility should be distinguished from the "volumic" susceptibility χm ≡
∂Mz/∂H, where M is the magnetization, i.e. the magnetic moment of a unit volume of a system – see, e.g., EM
Eq. (5.111). For a uniform medium with n ≡ N/V non-interacting dipoles per unit volume, χm = χn.


temperature T, is stretched by a fixed force 𝒯. Calculate the spring constant κ of the chain in the elastic
limit 𝒯 → 0.

2.7. Calculate the low-field magnetic susceptibility of a particle with an arbitrary (either integer
or semi-integer) spin s, neglecting its orbital motion. Compare the result with the solution of the
previous problem.
Hint: Quantum mechanics73 tells us that the Cartesian component mz of the magnetic moment of
such a particle, in the direction of the applied field, has (2s + 1) stationary values:
$$m_z = \gamma\hbar m_s, \quad \text{with } m_s = -s, -s+1, \ldots, s-1, s,$$

where γ is the gyromagnetic ratio of the particle, and ℏ is Planck's constant.

2.8.* Analyze the possibility of using a system of non-interacting spin-½ particles, placed into a
strong, controllable external magnetic field, for refrigeration.

2.9. A rudimentary "zipper" model of DNA replication is a chain
of N links (numbered 1, 2, …, n, n+1, …, N – see the figure on the right) that may be either open or closed.
Opening a link increases the system's energy by Δ > 0; a link may
change its state (either open or closed) only if all links to the left of it are
open, while those to the right of it are closed. Calculate the average number of open links in thermal
equilibrium, and analyze its temperature dependence, especially for the case N >> 1.

2.10. Use the microcanonical distribution to calculate the average entropy, energy, and pressure
of a classical particle of mass m, with no internal degrees of freedom, free to move in volume V, at
temperature T.
Hint: Try to make a more accurate calculation than has been done in Sec. 2.2 for the system of N
harmonic oscillators. For that, you will need to know the volume Vd of a d-dimensional hypersphere of
the unit radius. To avoid being too cruel, I am giving it to you:

$$V_d = \frac{\pi^{d/2}}{\Gamma(d/2+1)},$$

where Γ(ξ) is the gamma function.74

2.11. Solve the previous problem using the Gibbs distribution.

2.12. Calculate the average energy, entropy, free energy, and the equation of state of a classical
2D particle (without internal degrees of freedom), free to move within area A, at temperature T, starting
from:
(i) the microcanonical distribution, and
(ii) the Gibbs distribution.
Hint: For the equation of state, make the appropriate modification of the notion of pressure.

73 See, e.g., QM Sec. 5.7, in particular Eq. (5.169).


74 For its definition and main properties, see, e.g., MA Eqs. (6.6)-(6.9).


2.13. A quantum particle of mass m is confined to free motion along a 1D segment of length a.
Using any approach you like, calculate the average force the particle exerts on the “walls” (ends) of such
a “1D potential well” in thermal equilibrium, and analyze its temperature dependence, focusing on the
low-temperature and high-temperature limits.
Hint: You may consider the sum $\sum_{n=1}^{\infty}\exp\{-\lambda n^2\}$ a known function of λ.75

2.14. Rotational properties of diatomic molecules (such as N2, CO, etc.) may be reasonably well
described by the so-called dumbbell model: two point particles, of masses m1 and m2, with a fixed
distance d between them. Ignoring the translational motion of the molecule as a whole, use this model to
calculate its heat capacity, and spell out the result in the limits of low and high temperatures. Is your
solution valid for the so-called homonuclear molecules, consisting of two similar atoms, such as H2,
O2, N2, etc.?

2.15.* Modify the solution of the previous problem for homonuclear molecules. Specifically,
consider the cases of molecules H2 and N2. For the first of them, compute the equilibrium ratio of the
number of the ortho- and parahydrogen molecules as a function of temperature.

2.16. Calculate the heat capacity of a heteronuclear diatomic molecule, using the simple model
described in Problem 14, but now assuming that the rotation is confined to one plane.76

2.17. A classical, rigid, strongly elongated body (such as a thin needle) is free to rotate about its
center of mass, and is in thermal equilibrium with its environment. Are the angular velocity vector ω
and the angular momentum vector L, on average, directed along the elongation axis of the body, or
normal to it?

2.18. Two similar classical electric dipoles, of a fixed magnitude d, are separated by a fixed
distance r. Assuming that each dipole moment vector d may point at any direction and that the system is
in thermal equilibrium, write general expressions for its statistical sum Z, average interaction energy E,
heat capacity C, and entropy S, and calculate them explicitly in the high-temperature limit.

2.19. A classical 1D particle of mass m, residing in the potential well



U x    x , with   0 ,
is in thermal equilibrium with its environment, at temperature T. Calculate the average values of its
potential energy U and the full energy E, using two approaches:
(i) directly from the Gibbs distribution, and
(ii) using the virial theorem of classical mechanics.77

75 It may be reduced to the so-called elliptic theta-function 3(z, ) for a particular case z = 0 – see, e.g., Sec. 16.27
in the Abramowitz-Stegun handbook cited in MA Sec. 16(ii). However, you do not need that (or any other)
handbook to solve this problem.
76 This is a reasonable model of the constraints imposed on small atomic groups (e.g., ligands) by their atomic
environment inside some large molecules.
77 See, e.g., CM Problem 1.12.


2.20. For a slightly anharmonic classical 1D oscillator with mass m and potential energy

$$U(x) = \frac{\kappa x^2}{2} + \alpha x^3,$$

with a small coefficient α, calculate:
(i) the statistical average of the coordinate x, and
(ii) the deviation of the heat capacity from its basic value C = 1,
in the first (linear) approximation in low temperature T.

2.21.* A small conductor (in this context, usually called the
single-electron island) is placed between two conducting
electrodes, with voltage V applied between them. The gap between
one of the electrodes and the island is so narrow that electrons may
tunnel quantum-mechanically through this gap (the "weak tunnel
junction") – see the figure on the right, which also shows the junction's
capacitance C, the capacitance C0 of the other gap, and the island's charge Q = –ne.
Calculate the average charge of the island as a function of V at temperature T.

Hint: The quantum-mechanical tunneling of an electron
through a weak junction78 between two macroscopic conductors, and their subsequent energy relaxation,
may be considered as a single inelastic (energy-dissipating) event, so the only energy relevant for the
thermal equilibrium of the system is its electrostatic potential energy.

2.22. An LC circuit (see the figure on the right) is in thermodynamic
equilibrium with its environment. Calculate the r.m.s. fluctuation δV ≡ ⟨V²⟩^{1/2}
of the voltage across it, for an arbitrary ratio T/ℏω, where ω = (LC)^{–1/2} is the
resonance frequency of this "tank circuit".

2.23. Derive Eq. (92) from simplistic arguments, representing the blackbody radiation as an ideal
gas of photons treated as classical ultra-relativistic particles. What do similar arguments give for an
ideal gas of classical but non-relativistic particles?

2.24. Calculate the enthalpy, the entropy, and the Gibbs energy of blackbody electromagnetic
radiation with temperature T inside volume V, and then use these results to find the law of temperature
and pressure drop at an adiabatic expansion.

2.25. As was mentioned in Sec. 6(i), the relation between the temperature T of the visible
Sun's surface and that (To) of the Earth's surface follows from the balance of the thermal radiation they
emit. Prove that the experimentally observed relation indeed follows, with good precision, from a simple
model in which the surfaces radiate as perfect black bodies with constant temperatures.
Hint: You may pick up the experimental values you need from any reliable source.

78 In this particular context, the adjective “weak” denotes a junction with the tunneling transparency so low that
the tunneling electron’s wavefunction loses its quantum-mechanical coherence before the electron has a chance to
tunnel back. In a typical junction of a macroscopic area this condition is fulfilled if its effective resistance is much
higher than the quantum unit of resistance (see, e.g., QM Sec. 3.2), RQ ≡ πℏ/2e² ≈ 6.5 kΩ.


2.26. If a surface is not perfectly radiation-absorbing ("black"), the electromagnetic power of its
thermal radiation differs from the Planck radiation law by a frequency-dependent factor ε < 1, called
emissivity. Prove that such a surface reflects the (1 – ε) fraction of the incident radiation.

2.27. If two black surfaces, facing each other, have different
temperatures T1 > T2 (see the figure on the right), then according to the Stefan
radiation law (89), there is a net flow of thermal radiation, from the warmer
surface to the colder one:

$$\frac{P_{\text{net}}}{A} = \sigma\left(T_1^4 - T_2^4\right).$$

For many applications, notably including most low-temperature
experiments, this flow is detrimental. One way to suppress it is to reduce the emissivity ε (for its
definition, see the previous problem) of both surfaces – say, by covering them with shiny metallic films.
An alternative way toward the same goal is to place, between the surfaces, a thin layer (usually called
the thermal shield), with a low emissivity of both of its surfaces – see the dashed line in the figure above. Assuming
that the emissivity is the same in both cases, find out which way is more efficient.
2.28. Two parallel, well-conducting plates of area A are separated by a free-space gap of a
constant thickness t << A1/2. Calculate the energy of the thermally-induced electromagnetic field inside
the gap at thermal equilibrium with temperature T in the range

$$\frac{\hbar c}{A^{1/2}} \ll T \ll \frac{\hbar c}{t}.$$
Does the field push the plates apart?

2.29. Use the Debye theory to estimate the specific heat of aluminum at room temperature (say,
300 K), and express the result in the following popular units:
(i) eV/K per atom,
(ii) J/K per mole, and
(iii) J/K per gram.
Compare the last number with the experimental value (from a reliable book or online source).

2.30. Low-temperature specific heat of some solids has a considerable contribution from the
thermal excitation of spin waves, whose dispersion law scales as ω ∝ k² at k → 0.79 Neglecting
anisotropy, calculate the temperature dependence of this contribution to CV at low temperatures, and
discuss conditions of its experimental observation.
Hint: Just as the photons and phonons discussed in section 2.6, the quantum excitations of spin
waves (called magnons) may be considered as non-interacting bosonic quasiparticles with zero chemical
potential, whose statistics obeys Eq. (2.72).

2.31. Derive a general expression for the specific heat of a very long,
straight chain of similar particles of mass m, spaced by distance d, confined to move only in the
direction of the chain, and elastically interacting with effective spring
79 Note that the same dispersion law is typical for bending waves in thin elastic rods – see, e.g., CM Sec. 7.8.


constants κ – see the figure on the right. Spell out the result in the limits of very low and very high
temperatures. Would using the Debye approximation change these results?

Hint: You may like to use the following integral:80 $\int_0^\infty \frac{\xi^2\,d\xi}{\sinh^2\xi} = \frac{\pi^2}{6}$.
2.32. Use the Debye approximation to obtain a general expression for the longitudinal-phonon
contribution to the specific heat of a stand-alone monatomic layer of an elastic material (such as
graphene). Find its explicit temperature dependence at T → 0.

2.33. Calculate the r.m.s. thermal fluctuation of an arbitrary point of a uniform guitar string of
length l, stretched by force 𝒯, at temperature T. Evaluate your result for l = 0.7 m, 𝒯 = 10³ N, and room
temperature.

Hint: You may like to use the following series: $\sum_{n=1}^{\infty}\frac{\sin^2 n\xi}{n^2} = \frac{\xi(\pi-\xi)}{2}$, valid for 0 ≤ ξ ≤ π.

2.34. Use the general Eq. (123) to re-derive the Fermi-Dirac distribution (115) for a system in
equilibrium.

2.35. Each of two identical particles, not interacting directly, may be in any of two quantum
states, with single-particle energies equal to 0 and ε. Write down the statistical sum Z of the system,
and use it to calculate its average total energy ⟨E⟩ at temperature T, for the cases when the particles are:
(i) distinguishable (say, by their spatial positions);
(ii) indistinguishable fermions;
(iii) indistinguishable bosons.
Analyze and interpret the temperature dependence of ⟨E⟩ for each case, assuming that ε > 0.

2.36. Each of N >> 1 indistinguishable fermions has two non-degenerate energy levels separated
by gap Δ. Calculate the chemical potential of their system in thermal equilibrium at temperature T, if the
direct interaction of the particles is negligible.

80 It may be reduced, via integration by parts, to the table integral MA Eq. (6.8d) with n = 1.


Chapter 3. Ideal and Not-So-Ideal Gases


In this chapter, the general principles of thermodynamics and statistics, discussed in the previous two
chapters, are applied to examine the basic physical properties of gases, i.e. collections of identical
particles (for example, atoms or molecules) that are free to move inside a certain volume, either not
interacting or weakly interacting with each other. We will see that due to the quantum statistics,
properties of even the simplest, so-called ideal gases, with negligible direct interactions between
particles, may be highly nontrivial.

3.1. Ideal classical gas


Direct interactions of typical atoms and molecules are well localized, i.e. rapidly decreasing
when the distance r exceeds a certain scale r0. In a gas of N particles inside volume V, the average
distance rave between the particles is (V/N)1/3. As a result, if the gas density n  N/V = (rave)-3 is much
lower than r0-3, i.e. if nr03 << 1, the chance for its particles to approach each other and interact
significantly is rather small. The model in which such direct interactions are completely ignored is
called the ideal gas.
Let us start with a classical ideal gas, which may be defined as an ideal gas in whose behavior
the quantum effects are also negligible. As was discussed in Sec. 2.8, the condition for that is a low
average occupancy of each quantum state:

$$\langle N_k\rangle \ll 1. \tag{3.1}$$

It may seem that we have already found all properties of such a system, in particular the equilibrium
occupancy of its states – see Eq. (2.111):

$$\langle N_k\rangle = \text{const}\times\exp\left\{-\frac{\varepsilon_k}{T}\right\}. \tag{3.2}$$
In some sense this is true, but we still need, first, to see what exactly Eq. (2) means for the gas, i.e. a
system with an essentially continuous energy spectrum, and, second, to show that, rather surprisingly,
the particles’ indistinguishability affects some properties of even classical gases.
The first of these tasks is evidently easiest for a gas out of any external fields, and with no internal
degrees of freedom.1 In this case, εk is just the kinetic energy of the particle, which is an isotropic and
parabolic function of p:

$$\varepsilon_k = \frac{p^2}{2m} = \frac{p_x^2+p_y^2+p_z^2}{2m}. \tag{3.3}$$
Now we have to use two facts from other fields of physics, hopefully well known to the reader. First, in
quantum mechanics, the linear momentum p is associated with the wavevector k of the de Broglie wave,

1 In more realistic cases when particles do have internal degrees of freedom, but they are all in a certain (say,
ground) quantum state, Eq. (3) is valid as well, with εk counted from the fixed (e.g., ground-state) internal energy.
The effect of thermal excitation of the internal degrees of freedom will be discussed at the end of this section.


p = ℏk. Second, the eigenvalues of k for any waves (including the de Broglie waves) in free space are
uniformly distributed in the momentum space, with a constant density of states, given by Eq. (2.82):

$$\frac{dN_{\text{states}}}{d^3k} = \frac{gV}{(2\pi)^3}, \quad \text{i.e.} \quad \frac{dN_{\text{states}}}{d^3p} = \frac{gV}{(2\pi\hbar)^3}, \tag{3.4}$$
where g is the degeneracy of the particle’s internal states (for example, for all spin-½ particles, the spin
contribution to the internal degeneracy g = 2s + 1 = 2). Even regardless of the exact proportionality
coefficient between dNstates and d3p, the very fact that this coefficient does not depend on p means that
the probability dW to find the particle in a small region d3p = dp1dp2dp3 of the momentum space is
proportional to the right-hand side of Eq. (2), with εk given by Eq. (3):

$$dW = C\exp\left\{-\frac{p^2}{2mT}\right\}d^3p \equiv C\exp\left\{-\frac{p_1^2+p_2^2+p_3^2}{2mT}\right\}dp_1\,dp_2\,dp_3. \tag{3.5}$$
This is the famous Maxwell distribution.2 The normalization constant C may be readily found
from the last form of Eq. (5), by requiring the integral of dW over all the momentum space to equal 1.
Namely, such an integral is evidently a product of three similar 1D integrals over each Cartesian
component pj of the momentum (j = 1, 2, 3), which may be readily reduced to the well-known
dimensionless Gaussian integral,3 so we get

$$C^{-1} = \left[\int_{-\infty}^{+\infty}\exp\left\{-\frac{p_j^2}{2mT}\right\}dp_j\right]^3 = \left[(2mT)^{1/2}\int_{-\infty}^{+\infty}e^{-\xi^2}d\xi\right]^3 = (2\pi mT)^{3/2}. \tag{3.6}$$
As a sanity check, let us use the Maxwell distribution to calculate the average energy
corresponding to each half-degree of freedom:

$$\left\langle\frac{p_j^2}{2m}\right\rangle = \int\frac{p_j^2}{2m}\,dW = C^{1/3}\int_{-\infty}^{+\infty}\frac{p_j^2}{2m}\exp\left\{-\frac{p_j^2}{2mT}\right\}dp_j\left[C^{1/3}\int_{-\infty}^{+\infty}\exp\left\{-\frac{p_{j'}^2}{2mT}\right\}dp_{j'}\right]^2 = \frac{T}{\pi^{1/2}}\int_{-\infty}^{+\infty}\xi^2e^{-\xi^2}d\xi. \tag{3.7}$$

The last, dimensionless integral equals π^{1/2}/2,4 so, finally,

$$\left\langle\frac{p_j^2}{2m}\right\rangle = \left\langle\frac{mv_j^2}{2}\right\rangle = \frac{T}{2}. \tag{3.8}$$

2 This formula had been suggested by J. C. Maxwell as early as 1860, i.e. well before the Boltzmann and Gibbs
distributions were developed. Note also that the term "Maxwell distribution" is often associated with the
distribution of the particle momentum (or velocity) magnitude,

$$dW = 4\pi Cp^2\exp\left\{-\frac{p^2}{2mT}\right\}dp = 4\pi Cm^3v^2\exp\left\{-\frac{mv^2}{2T}\right\}dv, \quad \text{with } 0 \le p, v < \infty,$$

which immediately follows from the first form of Eq. (5), combined with the expression d³p = 4πp²dp due to the
spherical symmetry of the distribution in the momentum/velocity space.
3 See, e.g., MA Eq. (6.9b).
4 See, e.g., MA Eq. (6.9c).


This result is (fortunately :-) in agreement with the equipartition theorem (2.48). It also means
that the r.m.s. velocity of each particle is

$$v \equiv \langle v^2\rangle^{1/2} = \left\langle\sum_{j=1}^{3}v_j^2\right\rangle^{1/2} = \left(3\langle v_j^2\rangle\right)^{1/2} = \left(3\frac{T}{m}\right)^{1/2}. \tag{3.9}$$

For a typical gas (say, for N2, our air’s main component), with m  28mp  4.710-26 kg, this velocity, at
room temperature (T = kBTK  kB300 K  4.110-21 J) is about 500 m/s, comparable with the sound
velocity in the same gas – and well above with the muzzle velocity of a typical handgun bullet. Still, it is
measurable using even simple table-top equipment (say, a set of two concentric, rapidly rotating
cylinders with a thin slit collimating an atomic beam emitted at the axis) that was available in the end of
the 19th century. Experiments using such equipment gave convincing early confirmations of the
Maxwell distribution.
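This estimate is easy to reproduce with Eq. (9) in SI units – a quick Python check:

import math

K_B = 1.380649e-23     # the Boltzmann constant, J/K
M_P = 1.67262192e-27   # the proton mass, kg

m_n2 = 28 * M_P        # the N2 molecular mass (~4.7e-26 kg)
v_rms = math.sqrt(3 * K_B * 300.0 / m_n2)
print(round(v_rms), "m/s")   # ~515 m/s, i.e. "about 500 m/s" indeed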
This is all very simple (isn’t it?), but actually the thermodynamic properties of a classical gas,
especially its entropy, are more intricate. To show that, let us apply the Gibbs distribution to a gas
portion consisting of N particles rather than just one of them. If the particles are exactly similar, the
eigenenergy spectrum {εk} of each of them is also exactly the same, and each value Em of the total
energy is just the sum of particular energies εk(l) of the particles, where k(l), with l = 1, 2, …, N, is the
number of the energy level on which the lth particle resides. Moreover, since the gas is classical, ⟨Nk⟩
<< 1, the probability of having two or more particles in any state may be ignored. As a result, we can
use Eq. (2.59) to write

$$Z = \sum_m\exp\left\{-\frac{E_m}{T}\right\} = \sum_{k(1)}\sum_{k(2)}\cdots\sum_{k(N)}\exp\left\{-\frac{1}{T}\sum_l\varepsilon_{k(l)}\right\}, \tag{3.10}$$
where the summation has to be carried over all possible states of each particle. Since the summation
over each set {k(l)} concerns only one of the operands of the product of exponents under the sum, it is
very tempting to complete the calculation as follows:
$$Z \to Z_{\text{dist}} = \left[\sum_{k(1)}\exp\left\{-\frac{\varepsilon_{k(1)}}{T}\right\}\right]\left[\sum_{k(2)}\exp\left\{-\frac{\varepsilon_{k(2)}}{T}\right\}\right]\cdots\left[\sum_{k(N)}\exp\left\{-\frac{\varepsilon_{k(N)}}{T}\right\}\right] = \left[\sum_k\exp\left\{-\frac{\varepsilon_k}{T}\right\}\right]^N, \tag{3.11}$$
where the final summation is over all states of one particle. This formula is indeed valid for
distinguishable particles.5 However, if the particles are indistinguishable (again, meaning that they are
internally identical and free to move within the same spatial region), Eq. (11) has to be modified by
what is called the correct Boltzmann counting:
$$Z = \frac{1}{N!}\left[\sum_k\exp\left\{-\frac{\varepsilon_k}{T}\right\}\right]^N, \tag{3.12}$$

which considers all quantum states that differ only by particle permutations as the same state.
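The effect of this factor may be illustrated by a toy Python sketch (the three-level spectrum and the values of N and T below are arbitrary):

import math

def z_single(levels, T):
    # the single-particle statistical sum entering Eqs. (11) and (12)
    return sum(math.exp(-eps / T) for eps in levels)

levels, n, T = [0.0, 1.0, 2.0], 4, 1.5
z_dist = z_single(levels, T) ** n       # Eq. (3.11): distinguishable
z = z_dist / math.factorial(n)          # Eq. (3.12): correct counting
print(z_dist, z)                        # the two differ by N! = 24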

5 Since, by our initial assumption, each particle belongs to the same portion of gas, i.e. cannot be distinguished
from others by its spatial position, this requires some internal “pencil mark” for each particle – for example, a
specific structure or a specific quantum state of its internal degrees of freedom.


This expression is valid for any set {εk} of eigenenergies. Now let us use it for the translational
3D motion of free particles, taking into account that the fundamental relation (4) implies the following
rule for the replacement of a sum over quantum states of such motion with an integral:6

$$\sum_k\ldots \to \int\ldots dN_{\text{states}} = \frac{gV}{(2\pi)^3}\int\ldots d^3k = \frac{gV}{(2\pi\hbar)^3}\int\ldots d^3p. \tag{3.13}$$

In application to Eq. (12), this rule yields


$$Z = \frac{1}{N!}\left[\frac{gV}{(2\pi\hbar)^3}\left(\int_{-\infty}^{+\infty}\exp\left\{-\frac{p_j^2}{2mT}\right\}dp_j\right)^3\right]^N. \tag{3.14}$$
The integral in the square brackets is the same one as in Eq. (6), i.e. is equal to (2πmT)^{1/2}, so finally

$$Z = \frac{1}{N!}\left[\frac{gV}{(2\pi\hbar)^3}(2\pi mT)^{3/2}\right]^N = \frac{1}{N!}\left[gV\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\right]^N. \tag{3.15}$$

Now, assuming that N >> 1,7 and applying the Stirling formula, we can calculate the gas' free energy:

$$F = -T\ln Z = -NT\ln\frac{V}{N} + Nf(T), \tag{3.16a}$$

with8

$$f(T) = -T\left\{\ln\left[g\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\right] + 1\right\}. \tag{3.16b}$$

The first of these relations exactly coincides with Eq. (1.45), which was derived in Sec. 1.4 from
the equation of state PV = NT, using thermodynamic identities. At that stage, this equation of state was
just postulated, but now we can derive it by calculating the pressure from Eq. (16a), using the second of
Eqs. (1.35):

$$P = -\left(\frac{\partial F}{\partial V}\right)_T = \frac{NT}{V}. \tag{3.17}$$

So, the equation of state of the ideal classical gas, with density n ≡ N/V, is indeed given by Eq. (1.44):

$$P = \frac{NT}{V} = nT. \tag{3.18}$$
Hence we may use Eqs. (1.46)-(1.51), derived from this equation of state, to calculate all other
thermodynamic variables of the gas. For example, using Eq. (1.47) with f(T) given by Eq. (16b), for the
internal energy and the specific heat of the gas we immediately get

6 As a reminder, we have already used this rule twice in Sec. 2.6, with particular values of g.
7 For the opposite limit when N = g = 1, Eq. (15) yields the results obtained, by two alternative methods, in the
solutions of Problems 2.8 and 2.9. Indeed, for N = 1, the “correct Boltzmann counting” factor N! equals 1, so that
the particle distinguishability effects vanish – naturally.
8 This formula was derived (independently) by O. Sackur and H. Tetrode as early as in 1911, i.e. well before the
final formulation of quantum mechanics in the late 1920s.

$$E = N\left[f(T) - T\frac{df(T)}{dT}\right] = \frac{3}{2}NT, \qquad c_V \equiv \frac{C_V}{N} = \frac{1}{N}\left(\frac{\partial E}{\partial T}\right)_V = \frac{3}{2}, \tag{3.19}$$
in full agreement with Eq. (8) and hence with the equipartition theorem.
Much less trivial is the result for entropy (essentially, conjectured in Sec. 1.4):

$$S = -\left(\frac{\partial F}{\partial T}\right)_V = N\left[\ln\frac{V}{N} - \frac{df(T)}{dT}\right]. \tag{3.20}$$
This formula provides the means to resolve the following gas mixing paradox (sometimes called the
“Gibbs paradox”). Consider two volumes, V1 and V2, separated by a partition, each filled with the same
gas, with the same density n, at the same temperature T, and hence with the same pressure P. Now let us
remove the partition and let the gas portions mix; would the total entropy change? According to Eq.
(20), it would not, because the ratio V/N, and hence the expression in the square brackets is the same in
the initial and the final state, so the entropy is additive, as any extensive variable should be. This makes
full sense if the gas particles in both parts of the volume are truly identical, i.e. the partition’s removal
does not change our information about the system. However, let us assume that all particles are
distinguishable; then the entropy should clearly increase because the mixing would decrease our
information about the system, i.e. increase its disorder. A quantitative description of this effect may be
obtained using Eq. (11). Repeating for Zdist the calculations made above for Z, we readily get a different
formula for entropy:

$$S_{\text{dist}} = N\left[\ln V - \frac{df_{\text{dist}}(T)}{dT}\right], \qquad f_{\text{dist}}(T) = -T\ln\left[g\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\right]. \tag{3.21}$$

Please notice that in contrast to the S given by Eq. (20), this entropy includes the term lnV
Please notice that in contrast to the S given by Eq. (20), this entropy includes the term lnV
instead of ln(V/N), so Sdist is not proportional to N (at fixed temperature T and density N/V). While for
distinguishable particles this fact does not present any conceptual problem, for indistinguishable
particles it would mean that entropy was not an extensive variable, i.e. would contradict the basic
assumptions of thermodynamics. This fact emphasizes again the necessity of the correct Boltzmann
counting in the latter case.
Using Eq. (21), we can calculate the change of entropy due to mixing two gas portions, with N1
and N2 distinguishable particles, at a fixed temperature T (and hence at unchanged function fdist):
$$\Delta S_{\text{dist}} = (N_1+N_2)\ln(V_1+V_2) - \left[N_1\ln V_1 + N_2\ln V_2\right] = N_1\ln\frac{V_1+V_2}{V_1} + N_2\ln\frac{V_1+V_2}{V_2}, \tag{3.22}$$

so the change is positive even for V1/N1 = V2/N2. Note that for the particular case V1 = V2 = V/2, Eq. (22)
reduces to the simple result ΔSdist = (N1 + N2) ln 2, which may be readily understood in terms of
information theory. Indeed, allowing each particle of the total number N = N1 + N2 to spread to a twice
larger volume, we lose one bit of information per particle, i.e. ΔI = (N1 + N2) bits for the whole system.
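Here is a one-function numerical check of Eq. (22) (the particle numbers are arbitrary):

import math

def delta_s_dist(n1, v1, n2, v2):
    # the mixing entropy of Eq. (3.22), in units of k_B
    v = v1 + v2
    return n1 * math.log(v / v1) + n2 * math.log(v / v2)

ds = delta_s_dist(1000, 1.0, 1000, 1.0)
print(ds, ds / (2000 * math.log(2)))   # the ratio equals 1: ln2 per particle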
Let me leave it for the reader to show that Eq. (22) remains valid if particles in each sub-volume are
indistinguishable from each other, but different from those in another sub-volume, i.e. for mixing of two
different gases.9 However, it is certainly not applicable to the system where all particles are identical,

9 By the way, if an ideal classical gas consists of particles of several different sorts, its full pressure is a sum of
independent partial pressures exerted by each component – the so-called Dalton law. While this fact was an
important experimental discovery in the early 1800s, for statistical physics it is just a straightforward corollary
of Eq. (18), because in an ideal gas, the component particles do not interact.


stressing again that the correct Boltzmann counting (12) does indeed affect the gas entropy, even though
it may not be as consequential as the Maxwell distribution (5), the equation of state (18), and the
average energy (19).
In this context, one may wonder whether the change (22) (called the mixing entropy) is
experimentally observable. The answer is yes. For example, after free mixing of two different gases, and
hence increasing their total entropy by Sdist, one can use a thin movable membrane that is
semipermeable, i.e. whose pores are penetrable for particles of one type only, to separate them again,
thus reducing the entropy back to the initial value, and measure either the necessary mechanical work
ΔW = TΔSdist, or the corresponding heat discharge into the heat bath. Practically, measurements of this
type are easier in weak solutions10 – systems with a small relative concentration c << 1 of particles of
one sort (solute) within much more abundant particles of another sort (solvent). The mixing entropy also
affects the thermodynamics of chemical reactions in gases and liquids.11 Note that besides purely
thermal-mechanical measurements, the mixing entropy in some conducting solutions (electrolytes) is
also measurable by a purely electrical method, called cyclic voltammetry, in which a low-frequency ac
voltage, applied between two solid-state electrodes embedded in the solution, is used to periodically
separate different ions, and then mix them again.12
Now let us briefly discuss two generalizations of our results for ideal classical gases. First, let us
consider such a gas in an external field of potential forces. It may be described by replacing Eq. (3) with

$$\varepsilon_k = \frac{p_k^2}{2m} + U(\mathbf{r}_k), \tag{3.23}$$

where rk is the position of the kth particular particle, and U(r) is the potential energy of the particle. If
the potential U(r) is changing in space sufficiently slowly,13 Eq. (4) is still applicable, but only to small
volumes, V → dV = d³r, whose linear size is much smaller than the spatial scale of substantial variations
of the function U(r). Hence, instead of Eq. (5), we may only write the probability dW of finding the
particle in a small volume d³rd³p of the 6-dimensional phase space:

$$dW = w(\mathbf{r},\mathbf{p})\,d^3r\,d^3p, \qquad w(\mathbf{r},\mathbf{p}) = \text{const}\times\exp\left\{-\frac{p^2}{2mT}-\frac{U(\mathbf{r})}{T}\right\}. \tag{3.24}$$
 2mT T 

10 The statistical mechanics of weak solutions is very similar to that of ideal gases, with Eq. (18) recast into the
following formula (derived in 1885 by J. van ’t Hoff), PV = cNT, for the partial pressure of the solute. One of its
corollaries is that the net force (called the osmotic pressure) exerted on a semipermeable membrane is
proportional to the difference in the solute concentrations it is supporting.
11 Unfortunately, I do not have time for even a brief introduction to this important field, and have to refer the
interested reader to specialized textbooks – for example, P. A. Rock, Chemical Thermodynamics, University
Science Books, 1983; or P. Atkins, Physical Chemistry, 5th ed., Freeman, 1994; or G. M. Barrow, Physical
Chemistry, 6th ed., McGraw-Hill, 1996.
12 See, e.g., either Chapter 6 in A. Bard and L. Falkner, Electrochemical Methods, 2nd ed., Wiley, 2000 (which is a
good introduction to electrochemistry as a whole); or Sec. II.8.3.1 in F. Scholz (ed.), Electroanalytical Methods,
2nd ed., Springer, 2010.
13 Quantitatively, the spatial scale of substantial variations of the potential, [∇U(r)/T]⁻¹, has to be much larger than
the mean free path l of the gas particles, i.e. the average distance a particle passes between successive collisions
with its counterparts. (For more on this notion, see Chapter 6 below.)


Hence, the Maxwell distribution of particle velocities is still valid at each point r, so the equation of
state (18) is also valid locally. A new issue here is the spatial distribution of the total density,

$$n(\mathbf{r}) = N\int w(\mathbf{r},\mathbf{p})\,d^3p, \tag{3.25}$$

of all gas particles, regardless of their momentum/velocity. For this variable, Eq. (24) yields14

$$n(\mathbf{r}) = n(0)\exp\left\{-\frac{U(\mathbf{r})}{T}\right\}, \tag{3.26}$$

where the potential energy at the origin (r = 0) is used as the reference for U. The local gas pressure may
still be calculated from the local form of Eq. (18):

$$P(\mathbf{r}) = n(\mathbf{r})T = P(0)\exp\left\{-\frac{U(\mathbf{r})}{T}\right\}. \tag{3.27}$$
A simple example of numerous applications of Eq. (27) is an approximate description of the
Earth's atmosphere. At all heights h << RE ~ 6×10⁶ m above the Earth's surface (say, above the sea
level), we may describe the Earth’s gravity effect by the potential U = mgh, and Eq. (27) yields the so-
called barometric formula
$$P(h) = P(0)\exp\left\{-\frac{h}{h_0}\right\}, \quad \text{with } h_0 \equiv \frac{T}{mg} \equiv \frac{k_BT_K}{mg}. \tag{3.28}$$

For the same N2, the main component of the atmosphere, at TK = 300 K, this yields h0 ≈ 9 km. This result gives the
correct order of magnitude of the atmosphere's thickness, though the exact law of the pressure change
differs somewhat from Eq. (28), because electromagnetic radiation flows result in a relatively small
deviation of the atmospheric air from thermal equilibrium, namely a drop of its temperature T with
height, with the so-called lapse rate of about 2% (~6.5 K) per kilometer.
The second generalization I need to discuss is to particles with internal degrees of freedom. Now
ignoring the potential energy U(r), we may describe them by replacing Eq. (3) with
$$\varepsilon_k = \frac{p^2}{2m} + \varepsilon_{k'}, \tag{3.29}$$

where εk' describes the internal energy spectrum of the kth particle. If the particles are similar, we may
repeat all the above calculations, and see that all their results (including the Maxwell distribution, and
the equation of state) are still valid, with the only exception of Eq. (16), which now becomes

$$f(T) = -T\left\{\ln\left[g\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\right] + 1 + \ln\sum_{k'}\exp\left\{-\frac{\varepsilon_{k'}}{T}\right\}\right\}. \tag{3.30}$$

As we already know from Eqs. (1.50)-(1.51), this change may affect both specific heats of the
ideal gas – though not their difference, cP – cV = 1. They may be readily calculated for usual atoms and
molecules, at not very high temperatures (say, the room temperature of ~25 meV), because in these
conditions, εk' >> T for most of their internal degrees of freedom, including the electronic and

14 In some textbooks, Eq. (26) is also called the Boltzmann distribution, though it certainly should be
distinguished from Eq. (2.111).

Chapter 3 Page 7 of 34
Essential Graduate Physics SM: Statistical Mechanics

vibrational ones. (The typical energy of the lowest electronic excitations is of the order of a few eV, and
that of the lowest vibrational excitations is only an order of magnitude lower.) As a result, these degrees
of freedom are "frozen out": they are in their ground states, so their contributions exp{–εk'/T} to the sum
in Eq. (30), and hence to the heat capacity, are negligible. In monoatomic gases, this is true for all
degrees of freedom besides those of the translational motion, already taken into account by the first term
in Eq. (30), i.e. by Eq. (16b), so their specific heat is typically well described by Eq. (19).
The most important exception is the rotational degrees of freedom of diatomic and polyatomic
molecules. As quantum mechanics shows,15 the excitation energy of these degrees of freedom scales as
ℏ²/2I, where I is the molecule's relevant moment of inertia. In the most important molecules, this energy
is rather low (for example, for N2, it is close to 0.25 meV, i.e. ~1% of the room temperature), so at usual
conditions they are well excited and, moreover, behave virtually as classical degrees of freedom, each
giving a quadratic contribution to the molecule’s kinetic energy. As a result, they obey the equipartition
theorem, each giving an extra contribution of T/2 to the energy, i.e. ½ to the specific heat.16 In
polyatomic molecules, there are three such classical degrees of freedom (corresponding to their rotations
about the three principal axes17), but in diatomic molecules, only two.18 Hence, these contributions may
be described by the following generalization of Eq. (19):

$$c_V = \begin{cases}3/2, & \text{for monoatomic gases},\\ 5/2, & \text{for gases of diatomic molecules},\\ 3, & \text{for gases of polyatomic molecules}.\end{cases} \tag{3.31}$$
Please keep in mind, however, that as the above discussion shows, this simple result is invalid at
very low and very high temperatures. In the latter case, the most frequent violations of Eq. (31) are due
to the thermal activation of the vibrational degrees of freedom; for many important molecules, it starts at
temperatures of a few thousand K.

3.2. Calculating 
Now let us discuss the properties of ideal gases of free, indistinguishable particles in more detail,
paying special attention to the chemical potential μ – which, for some readers, may still be a somewhat
mysterious aspect of the Fermi and Bose distributions. Note again that particle indistinguishability is
conditioned by the absence of thermal excitations of their internal degrees of freedom, so in the balance
of this chapter such excitations will be ignored, and the particle's energy εk will be associated with its
"external" energy alone: for a free particle in an ideal gas, with its kinetic energy (3).

Let us start from the classical gas, and recall the conclusion of thermodynamics that μ is just the
Gibbs potential per particle – see Eq. (1.56). Hence we can calculate μ = G/N from Eqs. (1.49) and
(16b). The result,

15 See, e.g., either the model solution of Problem 2.14 (and references therein) or QM Secs. 3.6 and 5.6.
16 This result may be readily obtained again from the last term of Eq. (30) by treating it exactly like the first one
was and then applying the general Eq. (1.50).
17 See, e.g., CM Sec. 4.1.
18 This conclusion of the quantum theory may be interpreted as the indistinguishability of the rotations about the
molecule’s symmetry axis.

Chapter 3 Page 8 of 34
Essential Graduate Physics SM: Statistical Mechanics

V N  2 2 
3/ 2

  T ln  f T   T  T ln    , (3.32a)
N  gV  mT  
which may be rewritten as
3/ 2
   N  2 
2
exp    
  , (3.32b)
 T  gV  mT 
gives us some idea about  not only for a classical gas but for quantum (Fermi and Bose) gases as well.
Indeed, we already know that for indistinguishable particles, the Boltzmann distribution (2.111) is valid
only if  Nk  << 1. Comparing this condition with the quantum statistics (2.115) and (2.118), we see
again that the condition of the gas behaving classically may be expressed as
  k  (3.33)
exp   1
 T 
for all k. Since the lowest value of k given by Eq. (3) is zero, Eq. (33) may be satisfied only if
exp{/T} << 1. This means that the chemical potential of a classical gas has to be not just negative, but
also “strongly negative” in the sense
   T . (3.34a)
According to Eq. (32), this condition may be represented as
T  T0 , (3.34b)
with T0 defined as
2/3 2/3
Quantum 2  N  2 n 2
scale of T0        , (3.35)
temperature m  gV  m g g 2 / 3 mrave
2

where rave is the average distance between the gas particles:


1/ 3
1 V 
rave    . (3.36)
n1 / 3 N
With the last form of T0, the condition (34) is very transparent physically: disregarding the factor
2/3
g (which is typically of the order of 1), it means that the average thermal energy of a particle, which is
always of the order of T, has to be much larger than the energy of quantization of particle’s motion at
the length rave. An alternative form of the same condition is

rave  g 1 / 3 rc , where rc  . (3.37)
(mT )1 / 2
In quantum mechanics, the parameter rc so defined is frequently called the correlation length.19 For a
typical gas (say, N2, with m  14mp  2.310-26 kg) at the standard room temperature (T = kB300K 
4.110-21 J), the correlation length rc is close to 10-11 m, i.e. is significantly smaller than the physical size
r0 ~ 310-10 m of the molecule. This estimate shows that at room temperatures, as soon as any practical
gas is rare enough to be ideal (rave >> r0), it is classical. Thus, the only way to observe quantum effects
in the translational motion of molecules is by using very deep refrigeration. According to Eq. (37), for

19See, e.g., QM Sec. 7.2 and in particular Eq. (7.37).

Chapter 3 Page 9 of 34
Essential Graduate Physics SM: Statistical Mechanics

the same nitrogen molecule, taking rave ~ 102r0 ~ 10-8 m (to ensure that direct interaction effects are
negligible), the temperature should be well below 1 mK.
In order to analyze quantitatively what happens with gases when T is reduced to such low values,
we need to calculate  for an arbitrary ideal gas of indistinguishable particles. Let us use the lucky fact
that the Fermi-Dirac and the Bose-Einstein statistics may be represented with one formula:
1
N        / T , (3.38)
e 1
where (and everywhere in the balance of this section) the top sign stands for fermions and the lower one
for bosons, to discuss fermionic and bosonic ideal gases in one shot.
If we deal with a member of the grand canonical ensemble (Fig. 2.13), in which not only T but
also  is externally fixed, we may use Eq. (38) to calculate the average number N of particles in volume
V. If the volume is so large that N >> 1, we may use the general state counting rule (13) to get

gV gV d3p gV 4p 2 dp
N   d 3 k 
2 3  2 3  e[ ( p) ] / T 2 3 0 e[ ( p) ] / T
N  . (3.39)
1 1
In most practical cases, however, the number N of gas particles is fixed by particle confinement (i.e. the
gas portion under study is a member of a canonical ensemble – see Fig. 2.6), and hence  rather than N
should be calculated. Let us use the trick already mentioned in Sec. 2.8: if N is very large, the relative
fluctuation of the particle number, at fixed , is negligibly small (N/N ~ 1/N << 1), and the relation
between the average values of N and  should not depend on which of these variables is exactly fixed.
Hence, Eq. (39), with  having the sense of the average chemical potential, should be valid even if N is
exactly fixed, so the small fluctuations of N are replaced with (equally small) fluctuations of .
Physically, in this case the role of the -fixing environment for any sub-portion of the gas is played by
the rest of it, and Eq. (39) expresses the condition of self-consistency of such chemical equilibrium.
So, at N >> 1, Eq. (39) may be used to calculate the average  as a function of two independent
parameters: N (i.e. the gas density n = N/V) and temperature T. For carrying out this calculation, it is
convenient to convert the right-hand side of Eq. (39) to an integral over the particle’s energy (p) =
p2/2m, so p = (2m)1/2, and dp = (m/2)1/2d, getting

gVm 3 / 2  1 / 2 d Basic
N
2 2  3
 e (  ) / T  1 .
0
(3.40) equation
for 

This key result may be represented in two other, sometimes more convenient forms. First, Eq.
(40), derived for our current (3D, isotropic and parabolic-dispersion) approximation (3), is just a
particular case of the following self-evident state-counting relation

N   g ( ) N   d , (3.41)
0

where
g    dN states d (3.42)

is the temperature-independent density of all quantum states of a particle – regardless of whether they
are occupied or not. Indeed, according to the general Eq. (4), for the simple isotopic model (3),

Chapter 3 Page 10 of 34
Essential Graduate Physics SM: Statistical Mechanics

dN states d  gV 4 3  gVm 3 / 2 1 / 2
g    g 3 ( )    p    , (3.43)
d d  2 3 3  2 2  3
so plugging it into Eq. (41), we return to Eq. (39).
On the other hand, for some calculations, it is convenient to introduce the following
dimensionless energy variable:   /T, to express Eq. (40) via a dimensionless integral:

gV (mT ) 3 / 2  1 / 2 d
N
2 2  3
 e  /T  1 .
0
(3.44)

As a sanity check, in the classical limit (34), the exponent in the denominator of the fraction under the
integral is much larger than 1, and Eq. (44) reduces to
 
gV (mT ) 3 / 2  1 / 2 d gV (mT ) 3 / 2  
N  e  / T  exp    1 / 2 e  d , at    T . (3.45)
2 2  3 0 2 2  3 T  0
By the definition of the gamma function (),20 the last integral is just (3/2) = 1/2/2, and we get
3/ 2
  2 2  3 2  T 
exp   N   2 0  , (3.46)
T  gV (mT ) 3/ 2
  T 
which is exactly the same result as given by Eq. (32), which was obtained earlier in a rather different
way – from the Boltzmann distribution and thermodynamic identities.
Unfortunately, in the general case of arbitrary , the integral in Eq. (44) cannot be worked out
analytically.21 The best we can do is to use the T0 defined by Eq. (35), to rewrite Eq. (44) in the
following convenient, fully dimensionless form:
2 / 3
T  1 

 1 / 2 d

T0  2 2 0 e  / T  1 , (3.47)

and then use this relation to calculate the ratios T/T0 and /T0  (/T)(T/T0), as functions of /T
numerically. After that, we may plot the results versus each other, now considering the former ratio as
the argument. Figure 1 below shows the resulting plots for both particle types. They show that at high
temperatures, T >> T0, the chemical potential is negative and approaches the classical behavior given by
Eq. (46) for both fermions and bosons – just as we could expect. However, at temperatures T ~ T0 the
type of statistics becomes crucial. For fermions, the reduction of temperature leads to  changing its
sign from negative to positive, and then approaching a constant positive value called the Fermi energy,
F  7.595 T0 at T  0. On the contrary, the chemical potential of a bosonic gas stays negative and then
turns into zero at a certain critical temperature Tc  3.313 T0. Both these limits, which are very
important for applications, may (and will be :-) explored analytically, separately for each statistics.

20See, e.g., MA Eq. (6.7a).


21For the reader’s reference only: for the upper sign, the integral in Eq. (40) is a particular form (for s = ½) of a
special function called the complete Fermi-Dirac integral Fs, while for the lower sign, it is a particular case (for s
= 3/2) of another special function called the polylogarithm Lis. (In what follows, I will not use these notations.)

Chapter 3 Page 11 of 34
Essential Graduate Physics SM: Statistical Mechanics

10
F 8 fermions
T0 6
4
 2
0 Fig. 3.1. The chemical potential of an ideal
T0 gas of N >> 1 indistinguishable quantum
2
4
particles, as a function of temperature at a
bosons fixed gas density n  N/V (i.e. fixed T0  n2/3),
6 Tc
for two different particle types. The dashed
8
line shows the classical approximation (46),
10 valid only asymptotically at T >> T0.
0 1 2 3 4 5 6 7 8 9 10
T / T0

Before carrying out such analyses (in the next two sections), let me show that rather surprisingly,
for any non-relativistic ideal quantum gas, the relation between the product PV and the energy,
2 Ideal gas:
PV  E, (3.48) PV vs. E
3
is exactly the same as follows from Eqs. (18) and (19) for the classical gas, and hence does not depend
on the particle statistics. To prove this, it is sufficient to use Eqs. (2.114) and (2.117) for the grand
thermodynamic potential of each quantum state, which may be conveniently represented by a single
formula,
(   k ) / T 
 k  T ln1  e , (3.49)
 
and sum them over all states k, using the general summation formula (13). The result for the total grand
potential of a 3D gas with the dispersion law (3) is

 
 3/ 2 
gV  (   p 2 / 2m) / T  4 p 2 dp  T gVm
  T 0 ln1  e  2 3 
ln 1  e (   ) / T  1 / 2 d . (3.50)
2  3  2  0
Working out this integral by parts, exactly as we did it with the one in Eq. (2.90), we get
 
2 gVm 3 / 2  3 / 2 d 2
  g 3 ( ) N   d .
3 0
   (3.51)
3 2 2  3 0 e (   ) / T  1
But the last integral is just the total energy E of the gas:
  
gV p2 4p 2 dp gVm 3 / 2  3 / 2 d Ideal
E
2  3 0 2m e[ ( p) ] / T  1 2 2  3 0 e (  ) / T  1  0  g 3 ( ) N   d ,
 (3.52) gas:
energy

so for any temperature and any particle type,  = –(2/3)E. But since, from thermodynamics,  = –PV,
we have Eq. (48) proved. This universal relation22 will be repeatedly used below.

22For gases of diatomic and polyatomic molecules, whose rotational degrees of freedom are usually thermally
excited, Eq. (48) is valid only for the translational-motion energy.

Chapter 3 Page 12 of 34
Essential Graduate Physics SM: Statistical Mechanics

3.3. Degenerate Fermi gas


Analysis of low-temperature properties of a Fermi gas is very simple in the limit T = 0. Indeed,
in this limit, the Fermi-Dirac distribution (2.115) is just the step function:

 1, for    ,
N     (3.53)
 0, for    ,
– see the bold line in Fig. 2a. Since the function  = p2/2m is isotropic in the momentum space, in that
space the particles, at T = 0, fully occupy all possible quantum states inside a sphere (called either the
Fermi sphere or the Fermi sea) with some radius pF (Fig. 2b), while all states above the sea surface are
empty. Such degenerate Fermi gas is a striking manifestation of the Pauli principle: though in
thermodynamic equilibrium at T = 0 all particles try to lower their energies as much as possible, only g
of them may occupy each translational (“orbital”) quantum state. As a result, the sphere’s volume is
proportional to the particle number N, or rather to their density n = N/V.

N   (a) pz (b)
1 T 0
Fig. 3.2. Representations of the
py
pF 0 Fermi sea: (a) on the Fermi
T   F px distribution plot, and (b) in the
momentum space.
0 F 

Indeed, the radius pF may be readily related to the number of particles N using Eq. (39), with the
upper sign, whose integral in this limit is just the Fermi sphere’s volume:
pF
gV gV 4 3
2   
N 4 p 2
dp  pF . (3.54)
3
0 2   3
3

Now we can use Eq. (3) to express via N the chemical potential  (which, in the limit T = 0, bears the
special name of the Fermi energy F)23:
2/3 1/ 3
Fermi p2 2  2 N   9 4 
F    F   6     T0  7.595 T0 , (3.55a)
2m 2m  gV 
energy T 0
 2 
where T0 is the quantum temperature scale defined by Eq. (35). This formula quantifies the low-
temperature trend of the function (T), clearly visible in Fig. 1, and in particular, explains the ratio F/T0
mentioned in Sec. 2. Note also a simple and very useful relation,
3 N 3 N
F  , i.e. g 3 ( F )  , (3.55b)
2 g 3 ( F ) 2 F
that may be obtained immediately from the comparison of Eqs. (43) and (54).
The total energy of the degenerate Fermi gas may be (equally easily) calculated from Eq. (52):

23 Note that in the electronic engineering literature,  is usually called the Fermi level, for any temperature.

Chapter 3 Page 13 of 34
Essential Graduate Physics SM: Statistical Mechanics

pF
gV p2 gV 4 p F5 3
2  3 0
E 4p 2 dp    N, (3.56)
2m 2  3 2m 5 5 F
showing that the average energy,    E/N, of a particle inside the Fermi sea is equal to 3/5 of that (F)
of the particles in the most energetic occupied states, on the Fermi surface. Since, according to the basic
formulas of Chapter 1, at zero temperature H = G = N, and F = E, the only thermodynamic variable
still to be calculated is the gas pressure P. For it, we could use any of the thermodynamic relations P =
(H – E)/V or P = –(F/V)T, but it is even easier to use our recent result (48). Together with Eq. (56), it
yields
1/ 3
2 E 2 N F  36 4   2n5/ 3
P     P0  3.035 P0 , where P0  nT0  . (3.57)
3V 5 V  125  mg 2 / 3
From here, it is straightforward to calculate the isothermal bulk modulus (reciprocal compressibility),24
 P  2 N
K T  V    F , (3.58)
 V  T 3 V
which is frequently simpler to measure experimentally than P.
Perhaps the most important example25 of the degenerate Fermi gas is the conduction electrons in
metals – the electrons that belong to the outer shells of isolated atoms but become shared in solid metals,
and as a result, can move through the crystal lattice almost freely. Though the electrons (which are
fermions with spin s = ½ and hence with the spin degeneracy g = 2s + 1 = 2) are negatively charged, the
Coulomb interaction of the conduction electrons with each other is substantially compensated by the
positively charged ions of the atomic lattice, so they follow the simple model discussed above, in which
the interaction is disregarded, reasonably well. This is especially true for alkali metals (forming Group 1
of the periodic table of elements), whose experimentally measured Fermi surfaces are spherical within
1% – even within 0.1% for Na.
Table 1 lists, in particular, the experimental values of the bulk modulus for such metals, together
with the values given by Eq. (58) using the F calculated from Eq. (55) with the experimental density of
the conduction electrons. The agreement is pretty impressive, taking into account that the simple theory
described above completely ignores the Coulomb and exchange interactions of the electrons. This
agreement implies that, surprisingly, the experimentally observed firmness of solids (or at least metals)
is predominantly due to the kinetic energy (3) of the conduction electrons, rather than any electrostatic
interactions – though, to be fair, these interactions are the crucial factor defining the equilibrium value

24 For a general discussion of this notion, see, e.g., CM Eqs. (7.32) and (7.36).
25 Recently, nearly degenerate gases (with F ~ 5T) have been formed of weakly interacting Fermi atoms as well –
see, e.g., K. Aikawa et al., Phys. Rev. Lett. 112, 010404 (2014), and references therein. Another interesting
example of the system that may be approximately treated as a degenerate Fermi gas is the set of Z >> 1 electrons
in a heavy atom. However, in this system the account of electron interaction via the electrostatic field they create
is important. Since for this Thomas-Fermi model of atoms, the thermal effects are unimportant, it was discussed
already in the quantum-mechanical part of this series (see QM Chapter 8). However, its analysis may be
streamlined using the notion of the chemical potential, introduced only in this course – the problem left for the
reader’s exercise.

Chapter 3 Page 14 of 34
Essential Graduate Physics SM: Statistical Mechanics

of n. Numerical calculations using more accurate approximations (e.g., the Density Functional
Theory26), which agree with experiment with a few-percent accuracy, confirm this conclusion.27

Table 3.1. Experimental and theoretical parameters of electrons’ Fermi sea in some alkali metals28
Metal F (eV) K (GPa) K (GPa)  (mcal/moleK2)  (mcal/moleK2)
Eq. (55) Eq. (58) experiment Eq. (69) experiment
Na 3.24 923 642 0.26 0.35
K 2.12 319 281 0.40 0.47
Rb 1.85 230 192 0.46 0.58
Cs 1.59 154 143 0.53 0.77

Looking at the values of F listed in this table, note that room temperatures (TK ~ 300 K)
correspond to T ~ 25 meV. As a result, most experiments with metals, at least in their solid or liquid
form, are performed in the limit T << F. According to Eq. (39), at such temperatures, the occupancy
step described by the Fermi-Dirac distribution has a non-zero but relatively small width of the order of T
– see the dashed line in Fig. 2a. Calculations for this case are much facilitated by the so-called
Sommerfeld expansion formula29 for the integrals like those in Eqs. (41) and (52):
 
 2 2 d (  )
Sommerfeld I (T )    ( ) N   d    ( )d  T , for T   , (3.59)
expansion
0 0
6 d
where () is an arbitrary function that is sufficiently smooth at  =  and integrable at  = 0. To prove
this formula, let us introduce another function,

df  
f ( )    (' )d' , so that     , (3.60)
0
d
and work out the integral I(T) by parts:
df  
  
I (T )   N   d   N   df
0
d  0
  
  N   
  N   f   0      f   
 
 f  d N    
 d . (3.61)
 0 0  

26 See, e.g., QM Sec. 8.4.


27 Note also a huge difference between the very high bulk modulus of metals (K ~ 1011 Pa) and its very low values
in usual, atomic gases (for them, at ambient conditions, K ~105 Pa). About four orders of magnitude of this
difference is due to that in the particle density N/V, but the balance is due to the electron gas’ degeneracy. Indeed,
in an ideal classical gas, K = P = T(N/V), so that the factor (2/3)F in Eq. (58), of the order of a few eV in metals,
should be compared with the factor T  25 meV in the classical gas at room temperature.
28 Data from N. Ashcroft and N. D. Mermin, Solid State Physics, W. B. Saunders, 1976.
29 Named after Arnold Sommerfeld, who was the first (in 1927) to apply quantum mechanics to degenerate Fermi
gases, in particular to electrons in metals, and may be credited for most of the results discussed in this section.

Chapter 3 Page 15 of 34
Essential Graduate Physics SM: Statistical Mechanics

As evident from Eq. (2.115) and/or Fig. 2a, at T <<  the function –N()/ is close to zero for all
energies, besides a narrow peak of the unit area, at   . Hence, if we expand the function f() in the
Taylor series near this point, just a few leading terms of the expansion should give us a good
approximation:
2    N   

 df 1 d2 f
I (T )    f (  )                    d
0 
d 2 d 2   
 
  N    
  N   
   (' )d'    d   (  )        d
   
0 0  0 
(3.62)
2   N   

1 d (  )
 
2 d 0
      d .
  

In the last form of this relation, the first integral over  equals N( = 0) – N( =  = 1, the second
one vanishes (because the function under it is antisymmetric with respect to the point  = ), and only
the last one needs to be dealt with explicitly, by working it out by parts and then using a table integral:30

  N    
2 d  1 

d 2 
2

       d  T    d  4T  
2 2
 2
 4T (3.63)
0    
d  e  1  0 e 1
12

Being plugged into Eq. (62), this result proves the Sommerfeld formula (59).
The last preparatory step we need to make is to account for a possible small difference (as we
will see below, also proportional to T2) between the temperature-dependent chemical potential (T) and
the Fermi energy defined as F  (0), in the largest (first) term on the right-hand side of Eq. (59):
F
2 d (  )  2 2 d (  )
I (T )    ( ) d     F  (  )  T2  I (0)     F  (  )  T . (3.64)
0
6 d 6 d

Now, applying this formula to Eq. (41) and the last form of Eq. (52), we get the following results
(which are valid for any dispersion law (p) and even any dimensionality of the gas):
2
dg (  )
N (T )  N (0)     F g (  )  T2 , (3.65)
6 d
2 2 d
E (T )  E (0)     F g (  )  T g (  ) . (3.66)
6 d
If the number of particles does not change with temperature, N(T) = N(0), as in most experiments, Eq.
(65) gives the following formula for finding the temperature-induced change of :
 2 2 1 dg (  )
 F   T . (3.67)
6 g (  ) d
Note that the change is quadratic in T and negative, in agreement with the numerical results shown with
the red line in Fig. 1. Plugging this expression (which is only valid when the magnitude of the change is
much smaller than F) into Eq. (66), we get the following temperature correction to the energy:

30 See, e.g., MA Eqs. (6.8c) and (2.12b), with n = 1.

Chapter 3 Page 16 of 34
Essential Graduate Physics SM: Statistical Mechanics

2
E (T )  E (0)  g (  )T 2 , (3.68)
6
where within the accuracy of our approximation,  may be replaced with F. (Due to the universal
relation (48), this result also gives the temperature correction to the Fermi gas’ pressure.) Now we may
use Eq. (68) to calculate the heat capacity of the degenerate Fermi gas:
Low-T
 E  2
heat CV     T , with   g  F  . (3.69)
capacity
 T V 3
According to Eq. (55b), in the particular case of a 3D gas with the isotropic and parabolic dispersion law
(3), Eq. (69) reduces to
2 N C 2 T
  , i.e. cV  V   1 . (3.70)
2 F N 2 F
This important result deserves a discussion. First, note that within the range of validity of the
Sommerfeld approximation (T << F), the specific heat of the degenerate gas is much smaller than that
of the classical gas, even without internal degrees of freedom: cV = 3/2 – see Eq. (19). The physical
reason for such a low heat capacity is that the particles deep inside the Fermi sea cannot pick up thermal
excitations with available energies of the order of T << F, because the states immediately above them
are already occupied. The only particles (or rather quantum states, due to the particle
indistinguishability) that may be excited with such small energies are those at the Fermi surface, more
exactly within a surface layer of thickness  ~ T << F, and Eq. (70) presents a very vivid manifestation
of this fact.
The second important feature of Eqs. (69)-(70) is the linear dependence of the heat capacity on
temperature, which decreases with a reduction of T much slower than that of crystal vibrations – see Eq.
(2.99). This means that in metals, the specific heat at temperatures T << TD is dominated by the
conduction electrons. Indeed, experiments confirm not only the linear dependence (70) of the specific
heat,31 but also the values of the proportionality coefficient   CV/T for cases when F can be calculated
independently, for example for alkali metals – see the two rightmost columns of Table 1 above. More
typically, Eq. (69) is used for the experimental measurement of the density of states on the Fermi
surface, g(F) – the factor which participates in many theoretical results, in particular in transport
properties of degenerate Fermi gases (see Chapter 6 below).

3.4. Bose-Einstein condensation


Now let us explore what happens at the cooling of an ideal gas of bosons. Figure 3a shows the
same plot as Fig. 1b, i.e. the result of a numerical solution of Eq. (47) with the appropriate (lower) sign
in the denominator, on a more appropriate log-log scale. One can see that the chemical potential 
indeed tends to zero at some finite “critical temperature” Tc. It may be found by taking  = 0 in Eq. (47),
thus reducing it to a table integral:32

31 Solids, with their low thermal expansion coefficients, provide virtually-fixed-volume confinement for the
electron gas, so that the specific heat measured at ambient conditions may be legitimately compared with the
calculated cV.
32 See, e.g., MA Eq. (6.8b) with s = 3/2, and then Eqs. (2.7b) and (6.7e).

Chapter 3 Page 17 of 34
Essential Graduate Physics SM: Statistical Mechanics

2 / 3 2 / 3
 1   1 / 2 d   1  3   3  BEC:
2  
Tc  T0    T0         3.313 T0 , (3.71) critical
 2 0 e  1   2  2   2 
2
temperature

explaining the Tc/T0 ratio which was already mentioned in Sec. 2 and indicated in Figs. 1 and 3.

(a) (b)
100 10

8
PV  NT
10
6
PV 3.313
)
NT 0
 4
 ) 1
T0
2
1 .701

0.1 0
0 2 4 6 8 10
T / T0
Fig. 3.3. The Bose-Einstein condensation:
(a) the chemical potential of the gas and (b)
0.01
1 10 100 its pressure, as functions of temperature. The
T / T0 dashed line corresponds to the classical gas.
3.313

Let us have a good look at the temperature interval 0 < T < Tc, which cannot be directly
described by Eq. (40) (with the appropriate negative sign in the denominator), and hence may look
rather mysterious. Indeed, within this range, the chemical potential  cannot either be negative or equal
zero because according to Eq. (71); in this case, Eq. (40) would give a value of N smaller than the
number of particles we actually have. On the other hand,  cannot be positive either, because the
integral (40) would diverge at    due to the divergence of the factor N() – see, e.g., Fig. 2.15.
The only possible resolution of the paradox, suggested by A. Einstein in 1925, is as follows: at T
< Tc, the chemical potential of each particle of the system still equals exactly zero, but a certain number
(N0 of N) of them are in the ground state (with   p2/2m = 0), forming the so-called Bose-Einstein
condensate, usually referred to as the BEC. Since the condensate particles do not contribute to Eq. (40)
(because of the factor 1/2 = 0), their number N0 may be calculated by using that formula or,
equivalently, Eq. (44) with  = 0, to find the number (N – N0) of particles still remaining in the gas, i.e.
having energies  > 0:

gV (mT ) 3 / 2  1 / 2 d
N  N0 
2 2  3
 e  1 .
0
(3.72)

Chapter 3 Page 18 of 34
Essential Graduate Physics SM: Statistical Mechanics

This result is even simpler than it may look. Indeed, let us write it for the case T = Tc, when N0 = 0:33
gV (mTc ) 3 / 2   1 / 2 d
N
2 2  3
 e  1 .
0
(3.73)

Dividing both sides of Eqs. (72) and (73), we get an extremely simple and elegant result:

N  N0  T 
3/ 2
  T 3/ 2 
  ,
 so that N 0  N 1    , for T  Tc . (3.74a)
N  Tc    Tc  
Please note that this result is only valid for the particles whose motion, within the volume V, is
free – in other words, for a system of free particles confined within a rigid-wall box of volume V. In
most experiments with the Bose-Einstein condensation of dilute gases of neutral (and hence very weakly
interacting) atoms, they are held not in such a box, but at the bottom of a “soft” potential well, which
may be well approximated by a 3D quadratic parabola: U(r) = m2r2/2. It is straightforward (and hence
left for the reader’s exercise) to show that in this case, the dependence of N0(T) is somewhat different:
   
3
T
N 0  N 1   *  , for T  Tc* , (3.74b)
  Tc  
 
where Tc is a different critical temperature, which now depends on , i.e. on the confining potential’s
*

“steepness”. (In this case, V is not exactly fixed; however, the effective volume occupied by the particles
at T = Tc* is related to this temperature by a formula close to Eq. (71), so all estimates given above are
still valid.) Figure 4 shows one of the first sets of experimental data for the Bose-Einstein condensation
of a dilute gas of neutral atoms. Taking into account the finite number of particles in the experiment, the
agreement with the simple theory is surprisingly good.
Returning to the spatially-uniform Bose system, let us explore what happens below the critical
temperature with its other parameters. Formula (52) with the appropriate (lower) sign shows that
approaching Tc from higher temperatures, the gas energy and hence its pressure do not vanish – see the
red line in Fig. 3b. Indeed, at T = Tc (where  = 0), that formula yields34

m 3 / 2Tc5/2   3 / 2 d m 3 / 2Tc5/2  5   5 
E (Tc )  gV
2 2  3
 e  1
0
 gV      0.7701 NTc ,
2 2  3  2   2 
(3.75)

so using the universal relation (48), we get the following pressure value:
2 E (Tc )  (5 / 2) N N
P Tc    Tc  0.5134 Tc  1.701 P0 , (3.76)
3 V  (3 / 2 ) V V
which is somewhat lower than, but comparable with P(0) for the fermions – cf. Eq. (57).

33 This is, of course, just another form of Eq. (71).


34 For the involved dimensionless integral see, e.g., MA Eqs. (6.8b) with s = 5/2, and then Eqs. (2.7b) and (6.7c).

Chapter 3 Page 19 of 34
Essential Graduate Physics SM: Statistical Mechanics

Fig. 3.4. The total number N of trapped 87Rb


atoms (inset) and their ground-state fraction
N0/N, as functions of the ratio T/Tc, as measured
N0 in one of the pioneering experiments – see J.
Ensher et al., Phys. Rev. Lett. 77, 4984 (1996). In
N
this experiment, Tc* was as low as 0.2810-6 K.
The solid line shows the simple theoretical
dependence N(T) given by Eq. (74b), while other
lines correspond to more detailed theories taking
into account the finite number N of trapped
atoms. © 1996 APS, reproduced with permission.
T / Tc*

Now we can use the same Eq. (52), also with  = 0, to calculate the energy of the gas at T < Tc,

m 3 / 2 T 5 / 2  3 / 2 d
E T   gV  e  1 . (3.77)
2 2  3 0

Comparing this relation with the first form of Eq. (75), which features the same integral, we
immediately get one more simple temperature dependence:
5/ 2
T 
E (T )  E Tc   , for T  Tc . (3.78) BEC:
energy
 Tc 
From the universal relation (48), we immediately see that the gas pressure follows the same dependence:
5/ 2
T 
P (T )  PTc   , for T  Tc . (3.79) BEC:
pressure
 Tc 
This temperature dependence of pressure is shown with the blue line in Fig. 3b. The plot shows that for
all temperatures (both below and above Tc) the pressure is lower than that of the classical gas of the
same density. Now note also that since, according to Eqs. (57) and (76), P(Tc)  P0  V-5/3, while
according to Eqs. (35) and (71), Tc  T0  V-2/3, the pressure (79) is proportional to V-5/3/(V-2/3)5/2 = V0,
i.e. does not depend on the volume at all! The physics of this result (which is valid at T < Tc only) is that
as we decrease the volume at a fixed total number N of particles, more and more of them go to the
condensate, decreasing the number (N – N0) of particles in the gas phase, but not changing its spatial
density and pressure. Such behavior is very typical for the coexistence of two different phases of the
same matter – see, in particular, the next chapter.
The last thermodynamic variable of major interest is heat capacity, because it may be most
readily measured. For temperatures T  Tc, it may be easily calculated from Eq. (78):
 E  5 T 3/ 2
CV (T )     E (Tc ) , (3.80)
 T  N ,V 2 Tc5/2
so below Tc, the capacity increases with temperature, at the critical temperature reaching the value

Chapter 3 Page 20 of 34
Essential Graduate Physics SM: Statistical Mechanics

5 E (Tc )
CV (Tc ) 
 1.925 N , (3.81)
2 Tc
which is approximately 28% above that (3N/2) of the classical gas. (As a reminder, in both cases we
ignore possible contributions from the internal degrees of freedom.) The analysis for T  Tc is a little bit
more cumbersome because differentiating E over temperature – say, using Eq. (52) – one should also
take into account the temperature dependence of  that follows from Eq. (40) – see also Fig. 1.
However, the most important feature of the result may be predicted without such calculation (which is
left for the reader’s exercise). Namely, since at T >> Tc the heat capacity has to approach the classical
value 1.5N, a temperature increase from Tc up must decrease CV from the value (81), thus forming a
sharp maximum (a “cusp”) at the critical point T = Tc – see Fig. 5.

3.313
2.5

2 1.925
CV
N 1.5

Fig. 3.5. Temperature dependences of the heat


0.5
capacity of an ideal Bose-Einstein gas,
numerically calculated from Eqs. (52) and (40)
0
0 2 4 6 8 10 for T  Tc, and given by Eq. (80) for T  Tc.
T / T0

Such a cusp is a good indication of the Bose-Einstein condensation in virtually any experimental
system, especially because inter-particle interactions (unaccounted for in our simple discussion)
typically make this feature even more substantial, frequently turning it into a weak (logarithmic)
singularity. Historically, such a singularity was the first noticed, though not immediately understood
sign of the Bose-Einstein condensation observed in 1931 by W. Keesom and K. Clusius in liquid 4He at
its -point (called so exactly because of the characteristic shape of the CV(T) plot) T = Tc  2.17 K.
Other major milestones of the Bose-Einstein condensation research history include:
- the experimental discovery of superconductivity (which was later explained as the result of the
Bose-Einstein condensation of electron pairs) by H. Kamerlingh-Onnes in 1911;
- the development of the Bose-Einstein statistics, and predicting the condensation, by S. Bose
and A. Einstein, in 1924-1925;
- the discovery of superfluidity in liquid 4He by P. Kapitza and (independently) by J. Allen and
D. Misener in 1937, and its explanation as a result of the Bose-Einstein condensation by F. and H.
Londons and L. Titza, with further significant elaborations by L. Landau – all in 1938;
- the explanation of superconductivity as a result of electron binding into Cooper pairs, with a
simultaneous Bose-Einstein condensation of the resulting bosons, by J. Bardeen, L. Cooper, and J.
Schrieffer in 1957;
- the discovery of superfluidity of two different phases of 3He, due to the similar Bose-Einstein
condensation of pairs of its fermion atoms, by D. Lee, D. Osheroff, and R. Richardson in 1972;

Chapter 3 Page 21 of 34
Essential Graduate Physics SM: Statistical Mechanics

- the first observation of the Bose-Einstein condensation in dilute gases (87Ru by E. Cornell, C.
Wieman, et al., and 23Na by W. Ketterle et al.) in 1995.
The importance of the last achievement stems from the fact that in contrast to other mentioned
Bose-Einstein condensates, in dilute gases (with the typical density n as low as ~1014 cm-3) the particles
interact very weakly, and hence many experimental results are very close to the simple theory described
above and its straightforward elaborations – see, e.g., Fig. 4.35 On the other hand, the importance of
other Bose-Einstein condensates, which involve more complex and challenging physics, should not be
underestimated – as it sometimes is.
Perhaps the most important feature of any Bose-Einstein condensate is that all N0 condensed
particles are in the same quantum state, and hence are described by exactly the same wavefunction. This
wavefunction is substantially less “feeble” than that of a single particle – in the following sense. In the
second quantization language,36 the well-known commutation relations for the generalized coordinates
and momenta may be rewritten for the creation/annihilation operators; in particular, for bosons,

aˆ, aˆ   Iˆ .

(3.82)
Since â and â † are the quantum-mechanical operators of the complex amplitude a = Aexp{i} and its
complex conjugate a* = Aexp{–i}, where A and  are the real amplitude and phase of the
wavefunction, Eq. (82) yields the following approximate uncertainty relation (strict in the limit  << 1)
between the number of particles N = AA* and the phase :
N  ½ . (3.83)
This means that a condensate of N >> 1 bosons may be in a state with both phase and amplitude
of the wavefunction behaving virtually as c-numbers, with very small relative uncertainties: N << N,
 << 1. Moreover, such states are much less susceptible to unintentional perturbations including the
instruments used for measurements. For example, the electric current carried along a superconducting
wire by a coherent Bose-Einstein condensate of Cooper pairs may be as high as hundreds of amperes.
As a result, the “strange” behaviors predicted by the quantum mechanics are not averaged out as in the
usual particle ensembles (see, e.g., the discussion of the density matrix in Sec. 2.1), but may be directly
revealed in macroscopic, measurable dynamics of the condensate.
For example, the density j of the electric “supercurrent” of the Cooper pairs may be described by
the same formula as the well-known usual probability current density of a single quantum particle,37 just
multiplied by the electric charge q = –2e of a single pair, and the pair density n:
 q 
j  qn    A  , (3.84)
m  

35 Such controllability of theoretical description has motivated the use of dilute-gas BECs for modeling of
renowned problems of many-body physics – see, e.g. the review by I. Bloch et al., Rev. Mod. Phys. 80, 885
(2008). These efforts are assisted by the development of better techniques for reaching the necessary sub-K
temperatures – see, e.g., the recent work by J. Hu et al., Science 358, 1078 (2017). For a more general, detailed
discussion see, e.g., C. Pethick and H. Smith, Bose-Einstein Condensation in Dilute Gases, 2nd ed., Cambridge U.
Press, 2008.
36 See, e.g., QM Sec. 8.3.
37 See, e.g., QM Eq. (3.28).

Chapter 3 Page 22 of 34
Essential Graduate Physics SM: Statistical Mechanics

where A is the vector potential of the (electro)magnetic field. If a superconducting wire is not extremely
thin, the supercurrent does not penetrate into its interior.38 As a result, the contour integral of Eq. (84),
taken along a closed superconducting loop inside its interior (where j = 0), yields
q
 C
A  dr  Δ  2M , (3.85)

where M is an integer. But, according to the basic electrodynamics, the integral on the left-hand side of
this relation is nothing more than the flux  of the magnetic field B piercing the wire loop area A. Thus
we immediately arrive at the famous magnetic flux quantization effect:
2
Φ   Bn d 2 r  MΦ 0 , where Φ 0   2.07  10 15 Wb , (3.86)
A
q
which was theoretically predicted in 1950 and experimentally observed in 1961. Amazingly, this effect
holds even (citing H. Casimir’s famous expression) “over miles of dirty lead wire”, sustained by the
coherence of the Bose-Einstein condensate of Cooper pairs.
Other prominent examples of such macroscopic quantum effects in Bose-Einstein condensates
include not only the superfluidity and superconductivity as such, but also the Josephson effect,
quantized Abrikosov vortices, etc. Some of these effects are discussed in other parts of this series.39

3.5. Gases of weakly interacting particles


Now let us discuss the effects of weak particle interaction effects on the properties of their gas.
(Unfortunately, I will have time to do that only very briefly, and only for classical gases.40) In most
cases of interest, particle interaction may be well described by a certain potential energy U, so in the
simplest model, the total energy is
N
p2
E   k  U r1 ,.., rk ,..., rN  , (3.87)
k 1 2m

where rk is the radius vector of the kth particle’s center.41 First, let us see how far would the statistical
physics allow us to proceed for an arbitrary potential U. For N >> 1, at the calculation of the Gibbs
statistical sum (2.59), we may perform the usual transfer from the summation over all quantum states of
the system to the integration over the 6N-dimensional space, with the correct Boltzmann counting:

 Em / T 1 gN  N p 2j  3  U (r1 ,...rN )  3
Z  e  3N 
exp  d p1 ...d p N  exp
3 3
d r1 ...d rN
m N ! 2   k 1 2mT  rk V  T 

38 This is the Meissner-Ochsenfeld (or just “Meissner”) effect which may be also readily explained using Eq. (84)
combined with the Maxwell equations – see, e.g., EM Sec. 6.4.
39 See EM Secs. 6.4-6.5, and QM Secs. 1.6 and 3.1.
40 Discussions of the effects of weak interactions on the properties of quantum gases may be found, for example,
in the textbooks by Huang and by Pathria and Beale – see References.
41 One of the most significant effects neglected by Eq. (87) is the influence of atomic/molecular angular
orientations on their interactions.

Chapter 3 Page 23 of 34
Essential Graduate Physics SM: Statistical Mechanics

 1 g NV N  N p 2j  3   1  U (r1 ,...rN )  3 

 exp  
 N! 2 3 N  d p1 ...d p N    N  exp
 3
d r1 ...d rN  . (3.88)
3

  k 1 2mT    V rk V  T  
But according to Eq. (14), the first operand in the last product is just the statistical sum of an ideal gas
(with the same g, N, V, and T), so we may use Eq. (2.63) to write
 1  U (r1 ,...rN )  3 
F  Fideal  T ln  N  exp  d
 1 r ...d 3
rN

V r V  T  
 k 
(3.89)
 
1
 
 Fideal  T ln 1  N  e U / T  1 d 3 r1 ...d 3 rN  ,
 V r V 
 k 
where Fideal is the free energy of the ideal gas (i.e. of the same gas but with U = 0), given by Eq. (16).
I believe that Eq. (89) is a very convincing demonstration of the enormous power of statistical
physics methods. Instead of trying to solve an impossibly complex problem of classical dynamics of N
>> 1 (think of N ~ 1023) interacting particles, and only then calculating appropriate ensemble averages,
the Gibbs approach reduces finding the free energy (and then, from thermodynamic relations, all other
thermodynamic variables) to the calculation of just one integral on its right-hand side of Eq. (89). Still,
this integral is 3N-dimensional and may be worked out analytically only if the particle interactions are
weak in some sense. Indeed, the last form of Eq. (89) makes it especially evident that if U  0
everywhere, the term in the parentheses under the integral vanishes, and so does the integral itself, and
hence the addition to Fideal.
Now let us see what would this integral yield for the simplest, short-range interactions, in which
the potential U is substantial only when the mutual distance rkk’  rk – rk’ between the centers of two
particles is smaller than a certain value 2r0, where r0 may be interpreted as the particle’s radius. If the
gas is sufficiently dilute, so the radius r0 is much smaller than the average distance rave between the
particles, the integral in the last form of Eq. (89) is of the order of (2r0)3N, i.e. much smaller than (rave)3N
 VN. Then we may expand the logarithm in that expression into the Taylor series with respect to the
small second term in the square brackets, and keep only its first non-zero term:

F  Fideal 
T
N
V r V  
e U / T  1 d 3 r1 ...d 3 rN . (3.90)
k

Moreover, if the gas density is so low, the chances for three or more particles to come close to
each other and interact (collide) simultaneously are typically very small, so pair collisions are the most
important ones. In this case, we may recast the integral in Eq. (90) as a sum of N(N – 1)/2  N2/2 similar
terms describing such pair interactions, each of the type

 e U (rkk ' ) / T  1 d 3 r d 3 r .


V N 2    k k'
(3.91)
rk ,rk 'V
It is convenient to think about the rkk’  rk – rk’ as the radius vector of the particle number k in the
reference frame with the origin placed at the center of the particle number k’ – see Fig. 6a. Then in Eq.
(91), we may first calculate the integral over rk’, while keeping the distance vector rkk’, and hence
U(rkk’), constant, getting one more factor V. Moreover, since all particle pairs are similar, in the
remaining integral over rkk’ we may drop the radius vector’s index, so Eq. (90) becomes

Chapter 3 Page 24 of 34
Essential Graduate Physics SM: Statistical Mechanics

F  Fideal 
V
T N 2 N 1
N
2
  T
V  e U (r ) / T  1 d 3 r  Fideal  N 2 B(T ),
V
(3.92)

where the function B(T), called the second virial coefficient,42 has an especially simple form for
spherically-symmetric interactions:

   

Second 1 1
virial B (T )   1  e U (r ) / T d 3 r   4r 2 dr 1  e U ( r ) / T . (3.93)
coefficient 2 20
From Eq. (92), and the second of the thermodynamic relations (1.35), we already know something
particular about the equation of state P(V, T) of such a gas:
 F  N 2T N N2 
P     Pideal  2 B (T )  T   B(T ) 2  . (3.94)
 V  T , N V V V 
We see that at a fixed gas density n = N/V, the pair interaction creates additional pressure, proportional
to (N/V)2 = n2 and a function of temperature, B(T)T.

(a) (b)

r"
r  rkk' r"'
Fig. 3.6. The definition of the
particle k r' interparticle distance vectors
at their (a) pair and (b) triple
interactions.
particle k’

Let us calculate B(T) for a few simple models of particle interactions. The solid curve in Fig. 7
shows (schematically) a typical form of the interaction potential between electrically neutral
atoms/molecules. At large distances the interaction of particles without their own permanent electrical
dipole moment p, is dominated by the attraction (the so-called London dispersion force) between the
correlated components of the spontaneously induced dipole moments, giving U(r)  r–6 at r  .43 At
closer distances the potential is repulsive, growing very fast at r  0, but its quantitative form is specific
for particular atoms/molecules.44 The crudest description of such repulsion is given by the so-called
hardball (or “hard-sphere”) model:

42 The term “virial”, from the Latin viris (meaning “force”), was introduced to molecular physics by R. Clausius.
The motivation for the adjective “second” for B(T) is evident from the last form of Eq. (94), with the “first virial
coefficient”, standing before the N/V ratio and sometimes denoted A(T), equal to 1 – see also Eq. (100) below.
43 Indeed, independent fluctuation-induced components p(t) and p’(t) of dipole moments of two particles have

random mutual orientation, so that the time average of their interaction energy, proportional to p(t)p’(t)/r3,
vanishes. However, the electric field E of each dipole p, proportional to r-3, induces a correlated component of p’,
also proportional to r-3, giving interaction energy U(r) proportional to p’E  r-6, with a non-zero statistical
average. Quantitative discussions of this effect, within several models, may be found, for example, in QM
Chapters 3, 5, and 6.
44 Note that the particular form of the first term in the approximation U(r) = a/r12 – b/r6 (called either the
Lennard-Jones potential or the “12-6 potential”), that had been suggested in 1924, lacks physical justification,
and in professional physics was soon replaced with other approximations, including the so-called exp-6 model,

Chapter 3 Page 25 of 34
Essential Graduate Physics SM: Statistical Mechanics

 , for 0  r  2r0 ,
U (r )   (3.95)
0, for 2r0  r  ,
– see the dashed line and the inset in Fig. 7. (The distance 2r0 is sometimes called the van der Waals
radius of the particle.)

U (r )

Fig. 3.7. Pair interactions of particles.


2r0 Solid line: a typical interaction potential;
0 dashed line: its hardball model (95);
r dash-dotted line: the improved model
(97) – all schematically. The inset
U min illustrates the hardball model’s physics.
2r0

As Eq. (93) shows, in this model the second virial coefficient is temperature-independent:
2r
1 0 2 4 3
B (T )  b   4r 2 dr  2r0 3  4V0 , where V0  r0 , (3.96)
2 0 3 3
so the equation of state (94) still gives a linear dependence of pressure on temperature.
A correction to this result may be obtained by the following approximate account of the long-
range attraction (see the dash-dotted line in Fig. 7):45
  , for 0  r  2r0 ,
U (r )   (3.97)
U (r )  0, with U  T , for 2r0  r   .
For this improved model, Eq. (93) yields:
 
1 U (r ) a
B (T )  b   4r 2 dr b , with a  2  U (r ) r dr  0 .
2
(3.98)
2 2 r0 T T 2 r0

In this model, the equation of state (94) acquires a temperature-independent term:


 N  N 2  a  N N 
2
N
2

P  T      b    T   b     a   . (3.99)
 V  V   T   V  V   V 
Still, the correction to the ideal-gas pressure is proportional to (N/V)2 and has to be relatively small for
this result to be valid.

which fits most experimental data much better. However, the Lennard-Jones potential still keeps creeping from
one undergraduate textbook to another one, apparently for a not better reason than enabling a simple analytical
calculation of the equilibrium distance between the particles at T  0.
45 The strong inequality U << T in this model is necessary not only to make the calculations simpler. A deeper
reason is that if (–Umin) becomes comparable with T, particles may become trapped in this potential well, forming
a different phase – a liquid or a solid. In such phases, the probability of finding more than two particles interacting
simultaneously is high, so that Eq. (92), on which Eqs. (93)-(94) and Eqs. (98)-(99) are based, becomes invalid.

Chapter 3 Page 26 of 34
Essential Graduate Physics SM: Statistical Mechanics

Generally, the right-hand side of Eq. (99) may be considered as the sum of two leading terms in
the general expansion of P into the Taylor series in the low density n = N/V of the gas:
Pressure: N N
2
N
3

virial P  T   B(T )   C (T )   ... , (3.100)
expansion
 V V  V  
where C(T) is called the third virial coefficient. It is natural to ask how can we calculate C(T) and the
higher virial coefficients. This may be done by a tedious direct analysis of Eq. (90),46 but the
calculations may be streamlined using a different, rather counter-intuitive approach called the cluster
expansion method.47
Let us apply to our system, with the energy given by Eq. (87), the grand canonical distribution.
(Just as in Sec. 2, we may argue that if the average number N of particles in each member of a grand
canonical ensemble, with fixed  and T, is much larger than 1, the relative fluctuations of N are small,
so all its thermodynamic properties should be similar to those when N is exactly fixed.) For our current
purposes, Eq. (2.109) may be rewritten in the form

 Em , N / T N
p k2
  T ln  Z N , with Z N  e N / T  e , E m, N    U (r1 ,..., rN ) . (3.101)
N 0 m k 1 2m

(Notice that here, as at all discussions of the grand canonical distribution, N means a particular rather
than the average number of particles.) Now let us try to forget for a minute that in real systems of
interest the number of particles is extremely large, and start to calculate, one by one, the first terms ZN.
In the term with N = 0, both contributions to Em,N vanish, and so does the factor N/T, and hence
Z0 = 1. In the next term, with N = 1, the interaction term vanishes, so Em,1 is reduced to the kinetic
energy of one particle, giving
 p2 
Z 1  e  / T  exp k  . (3.102)
k  2mT 
Making the usual transition from the summation to integration, we may write
gV  p2  3
Z 1  ZI 1 , where Z  e  / T   2mT  d p, and I1  1 .
exp (3.103)
2 3
This is the same simple (Gaussian) integral as in Eq. (6), giving
3/ 2
 /T gV  /T  mT 
Z e 2mT  3/ 2
e gV  2 
. (3.104)
2 3  2 
Now let us explore the next term, with N = 2, which describes, in particular, pair interactions U =
U(r), with r = r – r’. Due to the assumed particle indistinguishability, this term needs the “correct
Boltzmann counting” factor 1/2! – cf. Eqs. (12) and (88):

46 L. Boltzmann has used that way to calculate the 3rd and 4th virial coefficients for the hardball model – as much
as can be done analytically.
47 This method was developed in 1937-38 by J. Mayer and collaborators for the classical gas, and generalized to
quantum systems in 1938 by B. Kahn and G. Uhlenbeck.

Chapter 3 Page 27 of 34
Essential Graduate Physics SM: Statistical Mechanics

1   p k2 p k'2  U (r ) / T 
Z 2  e 2 / T   exp   e  . (3.105)
2! k ,k'   2mT 2mT  
Since U is coordinate-dependent, here the transfer from the summation to integration should be done
more carefully than in the first term – cf. Eqs. (24) and (88):
1 ( gV ) 2  p2  3  p' 2  3 1 U (r ) / T 3
Z 2  e 2 / T 6 
exp    d p   exp  d p'   e d r. (3.106)
2! 2   2mT   2mT  V
Comparing this expression with Eq. (103) for the parameter Z, we get
Z2 1 U (r ) / T 3
V
Z2  I2, where I 2  e d r. (3.107)
2!
Acting absolutely similarly, for the third term of the grand canonical sum we may get
Z3 1
Z3  I 3 , where I 3  2  e U (r' ,r" ) / T d 3 r'd 3 r" , (3.108)
3! V
where r’ and r” are the vectors characterizing the mutual positions of three particles in their “cluster” –
see Fig. 6b.
These results may be readily generalized to clusters of arbitrary size N. Plugging the resulting
expression for ZN into the first of Eqs. (101) and recalling that  = –PV, we get the equation of state of
the gas in the form
T  Z2 Z3 
P  ln1  ZI 1  I2  I 3  ... . (3.109)
V  2! 3! 
As a sanity check: at U = 0, all integrals IN are equal to 1, and the expression under the logarithm is just
the Taylor expansion of the function eZ, giving P = TZ/V, and  = –PV = –TZ. In this case, according to
the last of Eqs. (1.62), the average number of particles in the system is N = –(/)T,V = Z, because
since Z  exp{/T}, Z/ = Z/T.48 Thus, in this limit, we have happily recovered the equation of state
of the ideal gas.
Returning to the general case of non-zero interactions, let us assume that the logarithm in Eq.
(109) may be also represented as a direct Taylor expansion in Z:
 Cluster
T Jl l
P
V

l 1 l!
Z , (3.110) expansion:
pressure

where Jl are some Z-independent coefficients, still to be calculated. (The lower limit of the sum reflects
the fact that according to Eq. (109), at Z = 0, P = (T/V) ln1 = 0, so the coefficient J0 in a more complete
version of Eq. (110) would equal 0 anyway.) According to Eq, (1.60), this expansion corresponds to the
grand potential

J
   PV  T  l Z l . (3.111)
l 1 l!

48Actually, the fact that in that case N = Z could have been noted earlier – just by comparing Eq. (104) with Eq.
(32b).

Chapter 3 Page 28 of 34
Essential Graduate Physics SM: Statistical Mechanics

Again using the last of Eqs. (1.62), and the already mentioned fact that according to Eq. (104), Z/ =
Z/, we get

Cluster Jl
expansion: N  Zl. (3.112)
N l 1 (l  1)!
(Note that this sum differs from that in Eq. (110) “only” by an extra factor l in each term.)
Equations (110) and (112) essentially give the solution of our problem by representing the
equation of state of the gas in the parametric form, with the factor Z serving as the parameter. The only
remaining conceptual action item is to express the coefficients Jl via the integrals IN participating in the
expansion (109). This may be done using the well-known Taylor expansion of the logarithm function, 49

l
ln (1   )    1
l 1
. (3.113)
l 1 l
Applying it to Eq. (109), we get a Taylor series in Z, starting as

P
T
 Z 
Z2
( I 2  1) 
Z3
( I 3  1)  3( I 2  1)  ... . (3.114)
V  2! 3! 
Comparing this expression with Eq. (110), we see that
J 1  1,

J2  I2 1 
V
1
  
e U (r ) / T  1 d 3 r ,
(3.115)
J 3  ( I 3  1)  3( I 2  1)

 e 
1 U (r' ,r" ) / T
  e U (r' ) / T  e U (r" ) / T  e U (r''' ) / T  2 d 3 r'd 3 r" , ...
V2
where r'''  r'  r" – see Fig. 6b. The expression for J2, describing the pair interactions of particles,
shows that besides a factor of (V/2), this is just the second virial coefficient B(T) – cf. Eq. (93). As a
reminder, the subtraction of 1 from the integral I2 in that expression makes the contribution of each
elementary 3D volume d3r into the integral J2 different from zero only if at this r two particles interact
(U  0). Very similarly, in the last of Eqs. (115), the subtraction of three pair-interaction terms from (I3
– 1) makes the contribution from an elementary 6D volume d3r’d3r” into the integral J3 different from
zero only if at that mutual location of particles, all three of them interact simultaneously, etc.

49 Looking at Eq. (109), one might think that since  = Z + Z2I2/2 +… is of the order of at least Z ~ N, the
expansion (113), which converges only if   < 1, is illegitimate for N >> 1. However, it is justified by the result
(114), in which the nth term is of the order of Nn(V0/V)n-1/n!, so that the series does converge if the gas density is
sufficiently low: N/V << 1/V0, i.e. rave >> r0. This is the very beauty of the cluster expansion whose few first
terms, perhaps unexpectedly, give good approximation even for gases with N >> 1 particles. The physics behind
this trick is that the subtraction of 1 from each exponent of the type exp{-U/T} automatically includes, to the final
result, contributions from only minor but the only important parts of the 6N-dimensional phase space, in which
the particles interact. As a result, the sum (114) is over the number of particles in each cluster (not in the whole
gas!), with an analytical summation of equal contributions from all possible clusters with the same number l of
particles in each of them – just as it is done by Eqs. (92)-(93) for the particular case l = 2.

Chapter 3 Page 29 of 34
Essential Graduate Physics SM: Statistical Mechanics

The relations (110), (114), and (115) give the final result of the cluster expansion. To see this
result at work, let us eliminate the factor Z from this system of equations, with accuracy up to terms
O(Z2). For that, we need to spell out each of these relations up to terms O(Z3):
PV J J
 J 1 Z  2 Z 2  3 Z 3  ..., . (3.116)
T 2 6
J
N  J 1 Z  J 2 Z 2  3 Z 3  ... , (3.117)
2
and then divide these two expressions, getting the result
PV 1  ( J 2 / 2 J 1 ) Z  ( J 3 / 6 J 1 ) Z 2  ... J2  J 22 J 
  1 Z   2  3  Z 2 . (3.118)
N T 1  ( J 2 / J 1 ) Z  ( J 3 / 2 J 1 ) Z  ...
2
2J1  2 J 1 3J 1 
In this approximation, we may again use Eq. (117), now solved for Z with the same accuracy O(Z2):
J2 2
Z N  N . (3.119)
J1
Plugging this expression into Eq. (118), we get the virial expansion (100) with
J2 J2 J  nd
2 and 3
rd

B (T )   V, C (T )   22  3  V 2 . (3.120) virial
2J1  J 1 3J 1  coefficients

The first of these relations, combined with the first two of Eqs. (115), yields for the 2nd virial
coefficient the same Eq. (93) that was obtained from the Gibbs distribution, in particular Eq. (96), B(T)
= 4V0, for the hardball model. The second of these results enables the calculation of the 3rd virial
coefficient; for the hardball model, C(T) = 10V02. (Let me leave the proof of the last result for the
reader’s exercise.) Evidently, a more complete expansion of Eqs. (110), (114), and (115) may be used
to calculate an arbitrary virial coefficient, though starting from the 5th of them, the calculations of the
necessary coefficients Jl may be completed only numerically even for the simplest hardball model.

3.6. Exercise problems

3.1. Use the Maxwell distribution for an alternative (statistical) calculation of the mechanical
work performed by the Szilard engine discussed in Sec. 2.3.
Hint: You may assume the simplest geometry of the engine – see Fig. 2.4.

3.2. Use the Maxwell distribution to calculate the drag


A
coefficient   –F/u, where F is the force exerted by an ideal
classical gas on a piston moving with a low velocity u, in the simplest u
geometry shown in the figure on the right, assuming that collisions of
the gas particles with the piston are elastic.

3.3. Derive the equation of state of an ideal classical gas from the grand canonical distribution.

Chapter 3 Page 30 of 34
Essential Graduate Physics SM: Statistical Mechanics

3.4. Prove that Eq. (22), derived for the change of entropy at mixing of two ideal classical gases
of completely distinguishable particles (that initially had equal densities N/V and temperatures T), is also
valid if particles in each of the initial volumes are indistinguishable from each other but different from
those in the counterpart volume. For simplicity, you may assume that the masses and internal
degeneracy factors of all particles are equal.

3.5. A round cylinder of radius R and length L, containing an ideal classical gas of N >> 1
particles of mass m each, is rotated about its symmetry axis with an angular velocity . Assuming that
the gas as a whole rotates with the cylinder, and is in thermal equilibrium at temperature T,
(i) calculate the gas pressure distribution along its radius, and analyze the result in the low-
temperature and high-temperature limits, and
(ii) neglecting the internal degrees of freedom of the particles, calculate the total energy of the
gas and its heat capacity in the high- and low-temperature limits.

3.6. N >> 1 classical, non-interacting, indistinguishable particles of mass m are confined in a


parabolic, spherically-symmetric 3D potential well U(r) = r2/2. Use two different approaches to
calculate all major thermodynamic characteristics of the system, in thermal equilibrium at temperature
T, including its heat capacity. Which of the results should be changed if the particles are distinguishable,
and how?
Hint: Suggest a replacement of the notions of volume and pressure, appropriate for this system.

3.7. In the simplest model of thermodynamic equilibrium between the liquid and gas phases of
the same molecules, temperature and pressure do not affect the molecule's condensation energy .
Calculate the density and pressure of such saturated vapor, assuming that it behaves as an ideal gas of
classical particles.

3.8. An ideal classical gas of N >> 1 particles is confined in a container of volume V and wall
surface area A. The particles may condense on container walls, releasing energy  per particle, and
forming an ideal 2D gas. Calculate the equilibrium number of condensed particles and the gas pressure,
and discuss their temperature dependences.

3.9. The inner surfaces of the walls of a closed container of volume V, filled with N >> 1
particles, have NS >> 1 similar traps (small potential wells). Each trap can hold only one particle, at
potential energy – < 0. Assuming that the gas of the particles in the volume is ideal and classical,
derive an equation for the chemical potential  of the system in equilibrium, and use it to calculate the
potential and the gas pressure in the limits of small and large values of the N/NS ratio.

3.10. Calculate the magnetic response (the Pauli paramagnetism) of a degenerate ideal gas of
spin-½ particles to a weak external magnetic field, due to a partial spin alignment with the field.

3.11. Calculate the magnetic response (the Landau diamagnetism) of a degenerate ideal gas of
electrically charged fermions to a weak external magnetic field, due to their orbital motion.

3.12.* Explore the Thomas-Fermi model of a heavy atom, with nuclear charge Q = Ze >> e, in
which the electrons are treated as a degenerate Fermi gas, interacting with each other only via their

Chapter 3 Page 31 of 34
Essential Graduate Physics SM: Statistical Mechanics

contribution to the common electrostatic potential (r). In particular, derive the ordinary differential
equation obeyed by the radial distribution of the potential, and use it to estimate the effective radius of
the atom.50

3.13.* Use the Thomas-Fermi model, explored in the previous problem, to calculate the total
binding energy of a heavy atom. Compare the result with that for a simpler model, in that the Coulomb
electron-electron interaction of electrons is completely ignored.

3.14. Calculate the characteristic Thomas-Fermi length TF of weak electric field’s screening by
conduction electrons in a metal, modeling their ensemble as a degenerate, isotropic Fermi gas, with the
electrons’ interaction limited (as in the two previous problems) by their contribution to the common
electrostatic potential.
Hint: Assume that TF is much larger than the Bohr radius rB.

3.15. For a degenerate ideal 3D Fermi gas of N particles confined in a rigid-wall box of volume
V, calculate the temperature effects on its pressure P and the heat capacity difference (CP – CV), in the
leading approximation in T << F. Compare the results with those for the ideal classical gas.
Hint: You may like to use the solution of Problem 1.9.

3.16. How would the Fermi statistics of an ideal gas affect the barometric formula (28)?

3.17. Derive general expressions for the energy E and the chemical potential  of a uniform
Fermi gas of N >> 1 non-interacting, indistinguishable, ultra-relativistic particles.51 Calculate E, and
also the gas pressure P explicitly in the degenerate gas limit T  0. In particular, is Eq. (48) valid in this
case?

3.18. Use Eq. (49) to calculate the pressure of an ideal gas of ultra-relativistic, indistinguishable
quantum particles, for an arbitrary temperature, as a function of the total energy E of the gas and its
volume V. Compare the result with the corresponding relations for the electromagnetic blackbody
radiation and for an ideal gas of non-relativistic particles.

3.19.* Calculate the speed of sound in an ideal gas of ultra-relativistic fermions of density n at
negligible temperature.

3.20. Calculate basic thermodynamic characteristics, including all relevant thermodynamic


potentials, specific heat, and the surface tension of a uniform, non-relativistic 2D electron gas with given
areal density n  N/A:
(i) at T = 0, and

50 Since this problem and the next one are important for atomic physics, and at their solution, thermal effects may
be ignored, they were given in Chapter 8 of the QM part of the series as well, for the benefit of readers who would
not take this SM course. Note, however, that the argumentation in their solutions may be streamlined by using the
notion of the chemical potential , which was introduced only in this course.
51 This is, for example, an approximate but reasonable model for electrons in white dwarf stars, whose Coulomb
interaction is mostly compensated by the charge of nuclei of fully ionized helium atoms.

Chapter 3 Page 32 of 34
Essential Graduate Physics SM: Statistical Mechanics

(ii) at low temperatures (in the lowest order in T/F << 1, giving a nonzero result),
neglecting the Coulomb interaction effects.52

3.21. Calculate the effective latent heat ef ≡ –N(Q/N0)N,V of evaporation of the spatially-
uniform Bose-Einstein condensate as a function of temperature T. Here Q is the heat absorbed by the
(condensate + gas) system of N >> 1 particles as a whole, while N0 is the number of particles in the
condensate alone.

3.22.* For an ideal, spatially-uniform Bose gas, calculate the law of the chemical potential’s
disappearance at T  Tc, and use the result to prove that at the critical point T = Tc, the heat capacity CV
is a continuous function of temperature.

3.23. In Chapter 1, several thermodynamic relations involving entropy have been discussed,
including the first of Eqs. (1.39):
S  G / T P .

If we combine this expression with Eq. (1.56), G = N, it looks like that for the Bose-Einstein
condensate, whose chemical potential  equals zero at temperatures below the critical point Tc, the
entropy should vanish as well. On the other hand, by dividing both parts of Eq. (1.19) by dT, and
assuming that at this temperature change the volume is kept constant, we get
CV  T S / T V .
(This equality was also mentioned in Chapter 1.) If the CV is known as a function of temperature, the last
relation may be integrated over T to calculate S:
CV (T )
S  dT  const.
V  const
T
According to Eq. (80), the specific heat for the Bose-Einstein condensate is proportional to T 3/2, so the
integration gives a non-zero entropy S  T 3/2. Resolve this apparent contradiction, and calculate the
genuine entropy at T = Tc.

3.24. The standard analysis of the Bose-Einstein condensation, outlined in Sec. 4, may seem to
ignore the energy quantization of the particles confined in volume V. Use the particular case of a cubic
confining volume V = aaa with rigid walls to analyze whether the main conclusions of the standard
theory, in particular Eq. (71) for the critical temperature of the system of N >> 1 particles, are affected
by such quantization.

3.25.* N >> 1 non-interacting bosons are confined in a soft, spherically-symmetric potential well
U(r) = m2r2/2. Develop the theory of the Bose-Einstein condensation in this system; in particular,

52This condition may be approached reasonably well, for example, in 2D electron gases formed in semiconductor
heterostructures (see, e.g., the discussion in QM Sec. 1.6, and the solution of Problem 3.2 of that course), due to
not only the electron field’s compensation by background ionized atoms, but also by its screening by highly
doped semiconductor bulk.

Chapter 3 Page 33 of 34
Essential Graduate Physics SM: Statistical Mechanics

prove Eq. (74b), and calculate the critical temperature Tc*. Looking at the solution, what is the most
straightforward way to detect the condensation in experiment?

3.26. Calculate the chemical potential of an ideal, uniform 2D gas of spin-0 Bose particles as a
function of its areal density n (the number of particles per unit area), and find out whether such gas can
condense at low temperatures. Review your result for the case of a large (N >> 1) but finite number of
particles.

3.27. Can the Bose-Einstein condensation be achieved in a 2D system of N >> 1 non-interacting


bosons placed into a soft, axially-symmetric potential well, whose potential may be approximated as
U(r) = m22/2, where 2  x2 + y2, and {x, y} are the Cartesian coordinates in the particle confinement
plane? If yes, calculate the critical temperature of the condensation.

3.28. Use Eqs. (115) and (120) to calculate the third virial coefficient C(T) for the hardball
model of particle interactions.

3.29. Assuming the hardball model, with volume V0 per molecule, for the liquid phase, describe
how the results of Problem 7 change if the liquid forms spherical drops of radius R >> V01/3. Briefly
discuss the implications of the result for water cloud formation.
Hint: Surface effects in macroscopic volumes of liquids may be well described by attributing an
additional energy  (equal to the surface tension) to the unit surface area.53

3.30. For a 1D Tonks’ gas of N classical hard rods of length l, confined to a segment of length L
> Nl, in thermal equilibrium at temperature T:
(i) Calculate the system’s average internal energy, entropy, both heat capacities, and the average
force F exerted by the rods on the “walls” confining them to the segment L.
(ii) Expand the calculated equation of state F(L, T) to the Taylor series in linear density N/L of
the rods, find all virial coefficients, and compare the 2nd of them with the result following from the 1D
version of Eq. (93).

53 See, e.g., CM Sec. 8.2.

Chapter 3 Page 34 of 34
Essential Graduate Physics SM: Statistical Mechanics

Chapter 4. Phase Transitions


This chapter gives a brief discussion of the coexistence between different states (“phases”) of systems
consisting of many similar interacting particles, and transitions between these phases. Due to the
complexity of these phenomena, quantitative analytical results in this field have been obtained only for a
few very simple models, typically giving rather approximate descriptions of real systems.

4.1. First-order phase transitions


From our everyday experience, say with water ice, liquid water, and water vapor, we know that
one chemical substance (i.e. a system of many similar particles) may exist in different stable states –
phases. A typical substance may have:
(i) a dense solid phase, in which interparticle forces keep all atoms/molecules in virtually fixed
relative positions, with just small thermal fluctuations about them;
(ii) a liquid phase, of comparable density, in which the relative distances between atoms or
molecules are almost constant, but these particles are virtually free to move around each other, and
(iii) a gas phase, typically of a much lower density, in which the molecules are virtually free to
move all around the containing volume.1
Experience also tells us that at certain conditions, two different phases may be in thermal and
chemical equilibrium – say, ice floating on water at the freezing-point temperature. Actually, in Sec. 3.4
we already discussed a quantitative theory of one such equilibrium: the Bose-Einstein condensate’s
coexistence with the uncondensed gas of similar particles. However, this is a rather exceptional case
when the phase coexistence is due to the quantum nature of the particles (bosons) rather than their direct
interaction. Much more frequently, the formation of different phases and transitions between them are
due to particle repulsive and attractive interactions, briefly discussed in Sec. 3.5.
Phase transitions are sometimes classified by their order.2 I will start their discussion with the
so-called first-order phase transitions that feature non-zero latent heat  – the thermal energy that is
necessary to turn one phase into another phase completely, even if temperature and pressure are kept
constant.3 Unfortunately, even the simplest “microscopic” models of particle interaction, such as those
discussed in Sec. 3.5, give rather complex equations of state. (As a reminder, even the simplest hardball
model leads to the series (3.100), whose higher virial coefficients defy analytical calculation.) This is

1 Plasma, in which atoms are partly or completely ionized, is frequently mentioned on one more phase, on equal
footing with the three phases listed above, but one has to remember that in contrast to them, a typical
electroneutral plasma consists of particles of two very different sorts – positive ions and electrons.
2 Such classification schemes, started by Paul Ehrenfest in the early 1930s, have been repeatedly modified to
accommodate new results for particular systems, and by now only the “first-order phase transition” is still a
generally accepted term, but with a definition different from the original one.
3 For example, for water the latent heat of vaporization at the ambient pressure is as high as ~2.2106 J/kg, i.e. ~
0.4 eV per molecule, making this ubiquitous liquid indispensable for fire fighting. (The latent heat of water ice’s
melting is an order of magnitude lower.)

© K. Likharev
Essential Graduate Physics SM: Statistical Mechanics

why I will follow the tradition to discuss the first-order phase transitions using a simple
phenomenological model suggested in 1873 by Johannes Diderik van der Waals.
For its introduction, it is useful to recall that in Sec. 3.5 we have derived Eq. (3.99) – the
equation of state for a classical gas of weakly interacting particles, which takes into account (albeit
approximately) both interaction components necessary for a realistic description of gas
condensation/liquefaction: the long-range attraction of the particles and their short-range repulsion. Let
us rewrite that result as follows:
N 2 NT  Nb 
Pa 2  1  . (4.1)
V V  V 
As we saw while deriving this formula, the physical meaning of the constant b is the effective volume of
space taken by a particle pair collision – see Eq. (3.96). The relation (1) is quantitatively valid only if
the second term in the parentheses is small, Nb << V, i.e. if the total volume excluded from particles’
free motion because of their collisions is much smaller than the whole volume V. In order to describe the
condensed phase (which I will call “liquid” 4), we need to generalize this relation to the case Nb ~ V.
Since the effective volume left for particles’ motion is V – Nb, it is very natural to make the following
replacement: V  V – Nb, in the equation of state of the ideal gas.5 If we also keep on the left-hand side
the term aN2/V2, which describes the long-range attraction, we get the so-called van der Waals equation
of state:
N2 NT Van der
Pa 2  . (4.2) Waals
V V  Nb equation

Taylor-expanding the right-hand side of this relation in small Nb/V << 1, we see that its first two
terms return us to the microscopically-justified Eq. (1); however, already the next term of the expansion
gives a virial coefficient C(T) different from the microscopically-derived Eq. (3.120), due to the
phenomenological nature of Eq. (2). Let us explore the basic properties of this famous model.
It is frequently convenient to discuss any equation of state in terms of its isotherms, i.e. the P(V)
curves plotted at constant T. As Eq. (2) shows, in the van der Waals model, such a plot depends on four
parameters: a, b, N, and T, making formulas bulky. To simplify them, it is convenient to introduce
dimensionless variables: pressure p  P/Pc, volume v  V/Vc, and temperature t  T/Tc, all normalized to
their so-called critical values,
1 a 8 a
Pc  2
, Vc  3 Nb, Tc  , (4.3)
27 b 27 b
whose meaning will be clear in a minute. In this notation, Eq. (2) acquires the following form
(historically called the law of corresponding states):
3 8t
p  , (4.4)
v 2
3v  1
so the isotherms p(v) depend on only one parameter, the normalized temperature t – see Fig. 1.

4 Due to the phenomenological character of the van der Waals model, one cannot say for sure whether the
condensed phase it predicts corresponds to a liquid or a solid. However, for most real substances at ambient
conditions, gas coexists with liquid, hence the name.
5 For the 1D gas of non-zero-size particles (“rods”) with hard-core next-neighbor interactions, such replacement
gives the exact result - see the solution of Problem 3.30. Unfortunately, this is not true in higher dimensions.

Chapter 4 Page 2 of 36
Essential Graduate Physics SM: Statistical Mechanics

2
1.2

1.1
P
p
Pc
t  1.0
1

0.9 Fig. 4.1. The van der Waals equation of state,


plotted on the [p, v] plane for several values of
the normalized temperature t  T /Tc. Shading
0.8 shows the range of single-phase instability
0
0 1 2
where (P/V)T > 0.
v  V /Vc
The most important property of these plots is that the isotherms have qualitatively different
shapes in two temperature regions. At t > 1, i.e. T > Tc, pressure increases monotonically at gas
compression (qualitatively, as in an ideal classical gas, with P = NT/V, to which the van der Waals
system tends at T >> Tc), i.e. with (P/V)T < 0 at all points.6 However, below the critical temperature
Tc, any isotherm features a segment with (P/V)T >0. It is easy to understand that, at least in a constant-
pressure experiment (see, for example, Fig. 1.5),7 these segments describe a mechanically unstable
equilibrium. Indeed, if due to a random fluctuation, the volume deviated upward from its equilibrium
value, the pressure would also increase, forcing the environment (say, the heavy piston in Fig. 1.5) to
allow further expansion of the system, leading to an even higher pressure, etc. A similar deviation of
volume downward would lead to its similar avalanche-like decrease. Such avalanche instability would
develop further and further until the system has reached one of the stable branches with a negative slope
(P/V)T. In the range where the single-phase equilibrium state is unstable, the system as a whole may
be stable only if it consists of the two phases (one with a smaller, and another with a higher density n =
N/V) that are described by the two stable branches – see Fig. 2.

P
stable liquid
phase liquid and gas
2' in equilibrium
Au
P0 (T ) 1 2
Ad unstable stable gaseous
branch phase
1' Fig. 4.2. Phase equilibrium
0 at T < Tc (schematically).
V

6 The special choice of the numerical coefficients in Eq. (3) makes the border between these two regions take
place exactly at t = 1, i.e. at the temperature equal to Tc, with the critical point’s coordinates equal to Pc and Vc.
7 Actually, this assumption is not crucial for our analysis of mechanical stability, because if a fluctuation takes
place in a small part of the total volume V, its other parts play the role of a pressure-fixing environment.

Chapter 4 Page 3 of 36
Essential Graduate Physics SM: Statistical Mechanics

In order to understand the basic properties of this two-phase system, let us recall the general
conditions of the thermodynamic equilibrium of two systems, which have been discussed in Chapter 1:
T1  T2 (thermal equilibrium), (4.5)
1   2 (“chemical” equilibrium), (4.6)
Phase
equilibrium
the latter condition meaning that the average energy of a single particle in both systems has to be the conditions
same. To those, we should add the evident condition of mechanical equilibrium,
P1  P2 (mechanical equilibrium), (4.7)
which immediately follows from the balance of the normal forces exerted on any inter-phase boundary.
If we discuss isotherms, Eq. (5) is fulfilled automatically, while Eq. (7) means that the effective
isotherm P(V) describing a two-phase system should be a horizontal line – see Fig. 2:8
P  P0 (T ) . (4.8)
Along this line, the internal properties of each phase do not change; only the particle distribution is: it
evolves gradually from all particles being in the liquid phase at point 1 to all particles being in the gas
phase at point 2.9 In particular, according to Eq. (6), the chemical potentials  of the phases should be
equal at each point of the horizontal line (8). This fact enables us to find the line’s position: it has to
connect points 1 and 2 in that the chemical potentials of the two phases are equal to each other. Let us
recast this condition as
2 2

 d  0, i.e.  dG  0 ,
1 1
(4.9)

where the integral may be taken along the single-phase isotherm. (For this mathematical calculation, the
mechanical instability of states on a certain part of this curve is not important.) By definition, along that
curve, N = const and T = const, so according to Eq. (1.53c), dG = –SdT + VdP +dN, for a slow
(reversible) change, dG = VdP. Hence Eq. (9) yields
2

 VdP  0 .
1
(4.10)

This equality means that in Fig. 2, the shaded areas Ad and Au should be equal. 10

8 Frequently, P0(T) is called the saturated vapor pressure.


9 A natural question: is the two-phase state with P = P0(T) the only state existing between points 1 and 2? Indeed,
the branches 1-1’ and 2-2’ of the single-phase isotherm also have negative derivatives (P/V)T and hence are
mechanically stable with respect to small perturbations. However, these branches are actually metastable, i.e.
have larger Gibbs energy per particle (i.e. ) than the counterpart phase at the same P, and are hence unstable to
larger perturbations – such as foreign microparticles (say, dust), protrusions on the confining walls, etc. In very
controlled conditions, these single-phase “superheated” and “supercooled” states can survive almost all the way to
the zero-derivative points 1’ and 2’, leading to sudden jumps of the system into the counterpart phase. (At fixed
pressure, such jumps go as shown by dashed lines in Fig. 2.) In particular, at the atmospheric pressure, purified
water may be supercooled to almost –50C, and superheated to nearly +270C. However, at more realistic
conditions, unavoidable perturbations result in the two-phase coexistence formation close to points 1 and 2.
10 This Maxwell equal-area rule (also called “Maxwell’s construct”) was suggested by J. C. Maxwell in 1875
using more complex reasoning.

Chapter 4 Page 4 of 36
Essential Graduate Physics SM: Statistical Mechanics

As the same Fig. 2 figure shows, the Maxwell rule may be rewritten in a different form,
Maxwell 2
equal-area
rule  P  P (T ) dV  0 ,
1
0 (4.11)

which is more convenient for analytical calculations than Eq. (10) if the equation of state may be
explicitly solved for P – as it is in the van der Waals model (2). Such calculation (left for the reader’s
exercise) shows that for that model, the temperature dependence of the saturated vapor pressure at low T
is exponential,11
  a 27
P0 (T )  Pc exp , with    Tc , for T  Tc , (4.12)
 T b 8
corresponding very well to the physical picture of the particle’s thermal activation from a potential well
of depth .
The signature parameter of a first-order phase transition, the latent heat of evaporation
Latent 2
heat:    dQ , (4.13)
definition
1

may also be found by a similar integration along the single-phase isotherm. Indeed, using Eq. (1.19), dQ
= TdS, we get
2
   TdS  T ( S 2  S1 ) . (4.14)
1

Let us express the right-hand side of Eq. (14) via the equation of state. For that, let us take the full
derivative of both sides of Eq. (6) over temperature, considering the value of G = N for each phase as a
function of P and T, and taking into account that according to Eq. (7), P1 = P2 = P0(T):

 G1   G  dP0  G2   G  dP0


   1     2  . (4.15)
 T  P  P  T dT  T  P  P  T dT
According to the first of Eqs. (1.39), the partial derivative (G/T)P is just minus the entropy, while
according to the second of those equalities, (G/P)T is the volume. Thus Eq. (15) becomes
dP0 dP
 S1  V1   S 2  V2 0 . (4.16)
dT dT
Solving this equation for (S2 – S1), and plugging the result into Eq. (14), we get the following
Clapeyron-Clausius formula:
Clapeyron- dP
Clausius   T (V2  V1 ) 0 . (4.17)
formula dT
For the van der Waals model, this formula may be readily used for the analytical calculation of  in two
limits: T << Tc and (Tc – T) << Tc – the exercises left for the reader. In the latter limit,   (Tc – T)1/2,
naturally vanishing at the critical temperature.

11 It is fascinating how well this Arrhenius exponent is hidden in the polynomial van der Waals equation (2)!

Chapter 4 Page 5 of 36
Essential Graduate Physics SM: Statistical Mechanics

Finally, some important properties of the van der Waals’ model may be revealed more easily by
looking at the set of its isochores P = P(T) for V = const, rather than at the isotherms. Indeed, as Eq. (2)
shows, all single-phase isochores are straight lines. However, if we interrupt these lines at the points
when the single phase becomes metastable, and complement them with the (very nonlinear!)
dependence P0(T), we get the pattern (called the phase diagram) shown schematically in Fig. 3a.

(a) (b)
P V  Vc V  Vc P
critical
V  Vc points
Pc
solid liquid
liquid gas
triple
point
Pt gas
0 Tc 0 Tt
T T
Fig. 4.3. (a) Van der Waals model’s isochores, the saturated gas pressure diagram, and the
critical point, and (b) the phase diagram of a typical three-phase system (all schematically).

In this plot, one more meaning of the critical point {Pc, Tc} becomes very vivid. At fixed
pressure P < Pc, the liquid and gaseous phases are clearly separated by the saturated pressure line P0(T),
so if we achieve the transition between the phases just by changing temperature (see the red horizontal
arrow in Fig. 3a), we have to pass through the phase equilibrium point, being delayed there to either put
the latent heat into the system or take it out. However, if we perform the transition between the same
initial and final points by changing both the pressure and temperature in a way to go around the critical
point (see the blue arrow in Fig. 3a), no definite point of transition may be observed: the substance stays
in a single phase, and it is a subjective judgment of the observer in which region that phase should be
called the liquid, and in which region, the gas. For water, the critical point corresponds to the
temperature of 647 K (374C), and the pressure Pc  22.1 MPa (i.e. ~200 bars), so a lecture
demonstration of its critical behavior would require substantial safety precautions. This is why such
demonstrations are typically carried out with other substances such as either diethyl ether,12 with its
much lower Tc (194C) and Pc (3.6 MPa), or the now-infamous carbon dioxide CO2, with even lower Tc
(31.1C), though higher Pc (7.4 MPa). Though these substances are colorless and clear in both gas and
liquid phases, their separation (by gravity) is still visible, due to small differences in the optical
refraction coefficient, at P < Pc, but not above Pc.13
Thus, in the van der Waals model, two phases may coexist, though only at certain conditions – in
particular, T < Tc. Now a natural, more general question is whether the coexistence of more than two

12 (CH3-CH2)-O-(CH2-CH3), historically the first popular general anesthetic.


13 It is interesting that very close to the critical point the substance suddenly becomes opaque – in the case of
ether, whitish. The qualitative explanation of this effect, called the critical opalescence, is simple: at this point,
the difference of the Gibbs energies per particle (i.e. the chemical potentials) of the two phases becomes so small
that unavoidable thermal fluctuations lead to spontaneous appearance and disappearance of relatively large (a-
few-m-scale) single-phase regions in all the volume. A large concentration of boundaries of such randomly-
shaped regions leads to strong light scattering.

Chapter 4 Page 6 of 36
Essential Graduate Physics SM: Statistical Mechanics

phases of the same substance is possible. For example, can the water ice, the liquid water, and the water
vapor (steam) all be in thermodynamic equilibrium? The answer is essentially given by Eq. (6). From
thermodynamics, we know that for a uniform system (i.e. a single phase), pressure and temperature
completely define the chemical potential (P, T). Hence, dealing with two phases, we had to satisfy just
one chemical equilibrium condition (6) for two common arguments P and T. Evidently, this leaves us
with one extra degree of freedom, so the two-phase equilibrium is possible within a certain range of P at
fixed T (or vice versa) – see again the horizontal line in Fig. 2 and the bold line in Fig. 3a. Now, if we
want three phases to be in equilibrium, we need to satisfy two equations for these variables:
1 ( P, T )   2 ( P, T )   3 ( P, T ) . (4.18)

Typically, the functions (P, T) are monotonic, so the two equations (18) have just one solution, the so-
called triple point {Pt, Tt}. Of course, this triple point of equilibrium between three phases should not be
confused with the partial critical points {Pc, Tc} for each of the two-phase pairs. Fig. 3b shows, very
schematically, their relation for a typical three-phase system solid-liquid-gas. For example, water, ice,
and water vapor are at equilibrium at a triple point with Pt  0.612 kPa14 and Tt = 273.16 K. The
practical importance of this particular temperature point is that by an international agreement it has been
accepted for the definition of not only the Kelvin temperature scale but also of the Celsius scale’s
reference, as 0.01C, so the absolute temperature zero corresponds to exactly –273.15C.15 More
generally, triple points of other purified simple substances (such as H2, N2, O2, Ar, Hg, and H2O) are
also used for thermometer calibration, defining the so-called international temperature scales including
the currently accepted scale ITS-90.
This analysis may be readily generalized to multi-component systems consisting of particles of
several (say, L) sorts.16 If such a mixed system is in a single phase, i.e. is macroscopically uniform, its
chemical potential may be defined by a natural generalization of Eq. (1.53c):
L
dG   SdT  VdP    l  dN l  . (4.19)
l 1

The last term reflects the fact that usually, every single phase is not a pure chemical substance, but has
particles of all other components, so (l) may depend not only on P and T but also on the concentrations
c(l)  N(l)/N of particles of each sort. If the total number N of particles is fixed, the number of
independent concentrations is (L – 1). For the chemical equilibrium of R phases, all R values of r(l) (r =
1, 2, …, R) have to be equal for particles of each sort: 1(l) = 2(l) = … = R(l), with each r(l) depending
on (L – 1) concentrations cr(l), and also on P and T. This requirement gives L(R – 1) equations for (L –
1)R concentrations cr(l), plus two common arguments P and T, i.e. for [(L –1)R + 2] independent
variables. This means that the number of phases has to satisfy the limitation
Gibbs
phase L( R  1)  ( L  1) R  2, i.e. R  L  2 , (4.20)
rule

14 Please note that for water, Pt is much lower than the normal atmospheric pressure (1 bar = 101.325 kPa).
15 Note the recent (2018) re-definition of the “legal” kelvin via joule (see, Appendix UCA: Selected Units and
Constants); however, the new definition is compatible, within experimental accuracy, with that mentioned above.
16 Perhaps the most practically important example is the air/water system. For its detailed discussion, based on Eq.
(19), the reader may be referred, e.g., to Sec. 3.9 in F. Schwabl, Statistical Mechanics, Springer (2000). Other
important applications include liquid solutions, and metallic alloys – solid solutions of metal elements.

Chapter 4 Page 7 of 36
Essential Graduate Physics SM: Statistical Mechanics

where the equality sign may be reached at just one point in the whole parameter space. This is the Gibbs
phase rule. As a sanity check, for a single-component system, L = 1, the rule yields R  3 – exactly the
result we have already discussed.

4.2. Continuous phase transitions


As Fig. 2 illustrates, if we fix pressure P in a system with a first-order phase transition, and start
changing its temperature, then the complete crossing of the transition-point line, defined by the equation
P0(T) = P, requires the insertion (or extraction) some non-zero latent heat . Formulas (14) and (17)
show that  is directly related to non-zero differences between the entropies and volumes of the two
phases (at the same pressure). As we know from Chapter 1, both S and V may be represented as the first
derivatives of appropriate thermodynamic potentials. This is why P. Ehrenfest called such transitions,
involving jumps of potentials’ first derivatives, the first-order phase transitions.
On the other hand, there are phase transitions that have no first derivative jumps at the transition
temperature Tc, so the temperature point may be clearly marked, for example, by a jump of the second
derivative of a thermodynamic potential – for example, the derivative C/T which, according to Eq.
(1.24), equals to 2E/T2. In the initial Ehrenfest classification, this was an example of a second-order
phase transition. However, most features of such phase transitions are also pertinent to some systems in
which the second derivatives of potentials are continuous as well. Due to this reason, I will use a more
recent terminology (suggested in 1967 by M. Fisher), in which all phase transitions with  = 0 are called
continuous.
Most (though not all) continuous phase transitions result from particle interactions. Here are
some representative examples:
(i) At temperatures above ~490 K, the crystal lattice of barium titanate (BaTiO3) is cubic, with a
Ba ion in the center of each Ti-cornered cube (or vice versa) – see Fig. 4a. However, as the temperature
is being lowered below that critical value, the sublattice of Ba ions is displaced along one of six sides of
the TiO3 sublattice, leading to a small deformation of both lattices – which become tetragonal. This is a
typical example of a structural transition, in this particular case combined with a ferroelectric
transition, because (due to the positive electric charge of the Ba ions) below the critical temperature the
BaTiO3 crystal has a spontaneous electric polarization even in the absence of external electric field.

(a) (b)

Cu
Ba Ti O Zn
Fig. 4.4. Single cells of
crystal lattices of (a)
BaTiO3 and (b) CuZn.

(ii) A different kind of phase transition happens, for example, in CuxZn1-x alloys (called brasses).
Their crystal lattice is always cubic, but above certain critical temperature Tc (which depends on x) any
of its nodes may be occupied by either a copper atom or a zinc atom, at random. At T < Tc, a trend
toward ordered atom alternation arises, and at low temperatures, the atoms are fully ordered, as shown
in Fig. 4b for the stoichiometric case x = 0.5. This is a good example of an order-disorder transition.

Chapter 4 Page 8 of 36
Essential Graduate Physics SM: Statistical Mechanics

(iii) At ferromagnetic transitions (such as the one taking place, for example, in Fe at 1,388 K)
and antiferromagnetic transitions (e.g., in MnO at 116 K), lowering of temperature below the critical
value17 does not change atom positions substantially, but results in a partial ordering of atomic spins,
eventually leading to their full ordering (Fig. 5).

(a) (b)

Fig. 4.5. Classical images


of fully ordered phases: (a)
a ferromagnet, and (b) an
antiferromagnet.

Note that, as it follows from Eqs. (1.1)-(1.3), at ferroelectric transitions the role of pressure is
played by the external electric field E , and at the ferromagnetic transitions, by the external magnetic
field H. As we will see very soon, even in systems with continuous phase transitions, a gradual change
of such an external field, at a fixed temperature, may induce jumps between metastable states, similar to
those in systems with first-order phase transitions (see, e.g., the dashed arrows in Fig. 2), with non-zero
decreases of the appropriate free energy.
Besides these standard examples, some other threshold phenomena, such as the formation of a
coherent optical field in a laser, and even the self-excitation of oscillators with negative damping (see,
e.g., CM Sec. 5.4), may be treated, at certain conditions, as continuous phase transitions.18
The general feature of all these transitions is the gradual formation, at T < Tc, of certain ordering,
which may be characterized by some order parameter   0. The simplest example of such an order
parameter is the magnetization at the ferromagnetic transitions, and this is why continuous phase
transitions are usually discussed for certain models of ferromagnetism. (I will follow this tradition but
mention in passing other important cases that require a substantial modification of the theory.) Most of
such models are defined on an infinite 3D cubic lattice (see, e.g., Fig. 5), with evident generalizations to
lower dimensions. For example, the Heisenberg model of a ferromagnet (suggested in 1928) is defined
by the following Hamiltonian:
Heisenberg
model Hˆ   J  σˆ
k , k '
k  σˆ k '   h  σˆ k ,
k
(4.21)

where σ̂ k is the Pauli vector operator19 acting on the kth spin, and h is the normalized external magnetic
field:

17 For ferromagnets, this point is usually referred to at the Curie temperature, and for antiferromagnets, as the
Néel temperature.
18 Unfortunately, I will have no time/space for these interesting (and practically important) generalizations, and
have to refer the interested reader to the famous monograph by R. Stratonovich, Topics in the Theory of Random
Noise, in 2 vols., Gordon and Breach, 1963 and 1967, and/or the influential review by H. Haken,
Ferstkörperprobleme 10, 351 (1970).

Chapter 4 Page 9 of 36
Essential Graduate Physics SM: Statistical Mechanics

h  m0  0 H . (4.22)

(Here m0 is the magnitude of the spin’s magnetic moment; for the Heisenberg model to be realistic for
usual solids, it should be of the order of the Bohr magneton B  e/2me  0.92710-23 J/T.) The figure
brackets {k, k’} in Eq. (21) denote the summation over the pairs of adjacent lattice sites, so the
magnitude of the constant J may be interpreted as the maximum coupling energy per “bond” between
two adjacent particles. At J > 0, the coupling tries to keep spins aligned, i.e. to install the ferromagnetic
ordering.20 The second term in Eq. (21) describes the effect of the external magnetic field, which tries to
orient all spin magnetic moments along its direction.21
However, even the Heisenberg model, while being rather rudimentary (in particular because its
standard form (21) is only valid for spins-½), is still rather complex for analysis. This is why most
theoretical results have been obtained for its classical twin, the Ising model:22

s sk '  h  sk .
Ising
Em   J k (4.23) model
 
k ,k ' k

Here Em are the particular values of the system’s energy in each of its 2N possible states with all possible
combinations of the binary classical variables sk = 1, while h is the normalized external magnetic
field’s magnitude – see Eq. (22). (Despite its classical character, the variable sk, modeling the field-
oriented Cartesian component of the real spin, is usually called “spin” for brevity, and I will follow this
tradition.) Somewhat shockingly, even for this toy model, no exact analytical 3D solution that would be
valid at arbitrary temperatures has been found yet, and the solution of its 2D version by L. Onsager in
1944 (see Sec. 5 below) is still considered one of the top intellectual achievements of statistical physics.
Still, Eq. (23) is very useful for the introduction of the basic notions of continuous phase transitions and
methods of their analysis, so for my brief discussion, I will mostly use this model.23
Evidently, if T = 0 and h = 0, the lowest possible energy,
E min   JNd , (4.24)

where d is the lattice dimensionality, is achieved in the “ferromagnetic” phase in which all spins sk are
equal to either +1 or –1, so  sk  = 1 as well. On the other hand, at J = 0, the spins are independent, and
if h = 0 as well, all sk are completely random, with a 50% probability to take either of values 1, so sk
= 0. Hence in the general case (with arbitrary J and h), we may use the average
Ising
  sk (4.25) model:
order
parameter

19 See, e.g., QM Sec. 4.4.


20 At J < 0, the first term of Eq. (21) gives a reasonable model of an antiferromagnet, but in this case, the external
magnetic field effects are more subtle; I will not have time to discuss them.
21 See, e.g., QM Eq. (4.163).
22 Named after Ernst Ising who explored the 1D version of the model in detail in 1925, though a similar model
was discussed earlier (in 1920) by Wilhelm Lenz.
23 For more detailed discussions of phase transition theories (including other popular models of the ferromagnetic
phase transition, e.g., the Potts model), see, e.g., either H. Stanley, Introduction to Phase Transitions and Critical
Phenomena, Oxford U. Press, 1971; or A. Patashinskii and V. Pokrovskii, Fluctuation Theory of Phase
Transitions, Pergamon, 1979; or B. McCoy, Advanced Statistical Mechanics, Oxford U. Press, 2010. For a very
concise text, I can recommend J. Yeomans, Statistical Mechanics of Phase Transitions, Clarendon, 1992.

Chapter 4 Page 10 of 36
Essential Graduate Physics SM: Statistical Mechanics

as a good measure of spin ordering, i.e. as the order parameter. Since in a real ferromagnet, each spin
carries a magnetic moment, the order parameter  is proportional to the Cartesian component of the
system’s average magnetization, in the direction of the applied magnetic field.
Now that the Ising model gave us a very clear illustration of the order parameter, let me use this
notion for a semi-quantitative characterization of continuous phase transitions. Due to the difficulty of
theoretical analyses of most models of the transitions at arbitrary temperatures, their theoretical
discussions are focused mostly on a very close vicinity of the critical point Tc. Both experiment and
theory show that in the absence of an external field, the function (T) is close to a certain power,

   ~t 
 ~ ,
 t for ~
t  0, i.e. T  Tc , (4.26)

of the small deviation from the critical temperature – which is conveniently normalized as
~ T  Tc
t   t  1. (4.27)
Tc
Most other key variables follow a similar temperature behavior, with critical exponents frequently being
the same for both signs of ~ t – though typically with very different proportionality factors. In particular,
the heat capacity at a fixed magnetic field behaves as24
~ 
ch  t . (4.28)

Similarly, the (normalized) low-field susceptibility25


 
 h 0  ~
t . (4.29)
h
Two other important critical exponents,  and , describe the temperature behavior of the
correlation function sksk’, whose dependence on the distance rkk’ between two spins may be well fitted
by the following law,
1  r  (4.30)
s k s k'  d 2 exp kk' ,
rkk ' r
 c 
with the correlation radius

rc  ~t . (4.31)

Finally, three more critical exponents, usually denoted , , and , describe the external field
dependences of, respectively, c, , and rc at ~
t  0. For example,  may be defined as

  h1  ~
t 0
. (4.32)

(Other field exponents are used less frequently, and for their discussion, the interested reader is referred
to the special literature that was cited above.)

The forms of this and other functions of  are selected to make all critical exponents non-negative.
24
25In most models of ferromagnetic phase transitions, this variable is proportional to the genuine low-field
magnetic susceptibility m of the material – see, e.g., EM Eq. (5.111).

Chapter 4 Page 11 of 36
Essential Graduate Physics SM: Statistical Mechanics

The leftmost column of Table 1 shows the ranges of experimental values of these critical
exponents for various 3D physical systems featuring continuous phase transitions. One can see that their
values vary from system to system, leaving no hope for a universal theory that would describe them all
exactly. However, certain combinations of the exponents are much more reproducible – see the four
bottom lines of the table.

Table 4.1. Major critical exponents of continuous phase transitions


Exponents and Experimental Landau’s 2D Ising 3D Ising 3D Heisenberg
combinations range (3D)(a) theory model model Model(d)
 0 – 0.14 0(b) (c)
0.12 –0.14
 0.32 – 0.39 1/2 1/8 0.31 0.3
 1.3 – 1.4 1 7/4 1.25 1.4
 4–5 3 15 5 ?
 0.6 – 0.7 1/2 1 0.64 0.7
 0.05 0 1/4 0.05 0.04
( + 2 + )/2 1.00  0.005 1 1 1 1
 – / 0.93  0.08 1 1 1 ?
(2 – )/ 1.02  0.05 1 1 1 1
(2 – )/d ? 4/d 1 1 1
(a)
Experimental data are from the monograph by A. Patashinskii and V. Pokrovskii, cited above.
(b)
Discontinuity at T = Tc – see below.
(c)
Instead of following Eq. (28), in this case, ch diverges as ln ~t .
(d)
With the order parameter  defined as jB/B.

Historically the first (and perhaps the most fundamental) of these universal relations was derived
in 1963 by J. Essam and M. Fisher:
  2    2 . (4.33)
It may be proved, for example, by finding the temperature dependence of the magnetic field value ht that
changes the order parameter by the same amount as a finite temperature deviation ~
t < 0 gives at h = 0.
Comparing Eqs. (26) and (29), we get
~   .
ht  t (4.34)

By the physical sense of ht, we may expect that such a field has to affect the system’s free energy26 F by
an amount comparable to the effect of a bare temperature change ~ t . Ensemble-averaging the last term

26 As was already discussed in Secs. 1.4 and 2.4, there is some dichotomy of terminology for free energies in
literature. In models (21) and (23), the magnetic field effects are accounted for at the microscopic level, by the
inclusion of the corresponding term into each particular value Em. From this point of view, the list of macroscopic

Chapter 4 Page 12 of 36
Essential Graduate Physics SM: Statistical Mechanics

of Eq. (23) and using the definition (25) of the order parameter , we see that the change of F (per
particle) due to the field equals –ht and, according to Eq. (26), scales as ht ~
t  ~
t (2 + ).

In order to estimate the thermal effect on F, let me first elaborate a bit more on the useful
thermodynamic formula already mentioned in Sec. 1.3:
 S 
CX  T  , (4.35)
 T  X
where X means the variable(s) maintained constant at the temperature variation. In the standard “P-V”
thermodynamics, we may use Eqs. (1.35) for X = V, and Eqs. (1.39) for X = P, to write
 S   2F   S    2G 
CV  T    T  2  , CP  T    T  2  . (4.36)
 T V , N  T V , N  T  P , N  T  P , N
As was just discussed, in the ferromagnetic models of the type (21) or (23), at a constant field h, the role
of G is played by F, so Eq. (35) yields
 S   2F 
Ch  T    T  2  . (4.37)
 T  h , N  T  h , N
The last form of this relation means that F may be found by double integration of (–Ch/T) over
temperature. In the limit ~t << 1, the factor T may be treated as a constant, so with Eq. (28) for ch 
C , the free energy scales as the double integral of c  ~
h t – over ~
h t , i.e. as ~
t (2 – ). Requiring this
change to be proportional to the same power of ~
t as the field-induced part of the energy, we finally
get the Essam-Fisher relation (33).
Using similar reasoning, it is straightforward to derive a few other universal relations of critical
exponents, including the Widom relation:

  1, (4.38)

very similar relations for other high-field exponents  and  (which I do not have time to discuss), and
the Fisher relation:
 2      . (4.39)

A slightly more complex reasoning, involving the so-called scaling hypothesis, yields the following
dimensionality-dependent Josephson relation
 d  2  . (4.40)
The second column of Table 1 shows that at least three of these relations are in very reasonable
agreement with experiment, so their set may be used as a testbed for various theoretical approaches to
continuous phase transitions.

variables in these systems does not include either P and V or their magnetic analogs, so we may take G  F + PV
= F + const, and the equilibrium (at fixed h, T and N) corresponds to the minimum of the Helmholtz free energy F.

Chapter 4 Page 13 of 36
Essential Graduate Physics SM: Statistical Mechanics

4.3. Landau’s mean-field theory


The highest-level (i.e. the most phenomenological) approach to continuous phase transitions,
formally not based on any particular microscopic model (though implying either the Ising model (23) or
one of its siblings), is the mean-field theory developed in 1937 by L. Landau, on the basis of prior ideas
by P. Weiss – to be discussed in the next section. The main idea of this phenomenological approach is to
represent the free energy’s change F at the phase transition as an explicit function of the order
parameter  (25). Since at T  Tc, the order parameter has to tend to zero, this change,
F  F (T )  F (Tc ) , (4.41)

may be expanded into the Taylor series in , and only a few, most important first terms of that
expansion retained. In order to keep the symmetry between two possible signs of the order parameter
(i.e. between two possible spin directions in the Ising model) in the absence of the external field, at h = 0
this expansion should include only even powers of :
F 1
f h 0  h 0  A(T ) 2  B (T ) 4  ..., at T  Tc . (4.42)
V 2
As Fig. 6 shows, at A(T) < 0, and B(T) > 0, these two terms are sufficient to describe the minimum of the
free energy at 2 > 0, i.e. to calculate the stationary values of the order parameter; this is why Landau’s
theory ignores higher terms of the Taylor expansion – which are much smaller at   0.

F F
V A0 V A0
A0  A/ B
A0 Fig. 4.6. The Landau free
energy (42) as a function of
0 (a)  and (b) 2, for two signs
0  A2 2 of the coefficient A(T), both

2B for B(T) > 0.

Now let us discuss the temperature dependence of the coefficients A and B. As Eq. (42) shows,
first of all, the coefficient B(T) has to be positive for any sign of ~
t  (T – Tc), to ensure the equilibrium
at a finite value of  . Thus, it is reasonable to ignore the temperature dependence of B near the critical
2

temperature altogether, i.e. use the approximation


B(T )  b  0. (4.43)
On the other hand, as Fig. 6 shows, the coefficient A(T) has to change sign at T = Tc, to be positive at T
> Tc and negative at T < Tc, to ensure the transition from  = 0 at T > Tc to a certain non-zero value of
the order parameter at T < Tc. Assuming that A is a smooth function of temperature, we may
approximate it by the leading term of its Taylor expansion in the difference ~
t  T – Tc < 0:
A(T )  a ~
t , with a  0 , (4.44)
so Eq. (42) becomes
~ 1
f h 0  a t  2  b 4 . (4.45)
2

Chapter 4 Page 14 of 36
Essential Graduate Physics SM: Statistical Mechanics

In this rudimentary form, the Landau theory may look almost trivial, and its main strength is the
possibility of its straightforward extension to the effects of the external field and of spatial variations of
the order parameter. First, as the field terms in Eqs. (21) or (23) show, the applied field gives such
systems, on average, the energy addition of –ηh per particle, i.e. –nηh per unit volume, where n is the
particle density. Second, since according to Eq. (31) (with ν > 0, see Table 1) the correlation radius
diverges at t̃ → 0, in this limit the spatial variations of the order parameter should be slow, ∇η → 0.
Hence, the effects of the gradient on ΔF may be approximated by the first non-zero term of its expansion
into the Taylor series in (∇η)². As a result, Eq. (45) may be generalized as

Landau theory: free energy:
ΔF = ∫ Δf d³r, with Δf = at̃η² + ½bη⁴ – nηh + c(∇η)², (4.46)
where c is a coefficient independent of ∇η. To avoid the unphysical effect of spontaneous formation of
spatial variations of the order parameter, that factor has to be positive at all temperatures and hence may
be taken for a constant in a small vicinity of Tc – the only region where Eq. (46) may be expected to
provide quantitatively correct results.
Let us find out what critical exponents are predicted by this phenomenological approach. First of
all, we may find the equilibrium values of the order parameter from the condition of ΔF having a
minimum, ∂ΔF/∂η = 0. At h = 0, it is easier to use the equivalent equation ∂ΔF/∂(η²) = 0, where ΔF is given
by Eq. (45) – see Fig. 6b. This immediately yields
η = (–at̃/b)^1/2 for t̃ < 0,   η = 0 for 0 < t̃. (4.47)
Comparing this result with Eq. (26), we see that in the Landau theory, β = ½. Next, plugging the result
(47) back into Eq. (45), for the equilibrium (minimal) value of the free energy, we get
Δf = –a²t̃²/2b for t̃ < 0,   Δf = 0 for 0 < t̃. (4.48)
From here and Eq. (37), the specific heat,
Ch/V = (a²/b)Tc for t̃ < 0,   Ch/V = 0 for 0 < t̃, (4.49)
has, at the critical point, a discontinuity rather than a singularity, so we need to prescribe zero value to
the critical exponent α.
In the presence of a uniform field, the equilibrium order parameter should be found from the
condition ∂(Δf)/∂η = 0 applied to Eq. (46) with ∇η = 0, giving
∂(Δf)/∂η = 2at̃η + 2bη³ – nh = 0. (4.50)
In the limit of a small order parameter, η → 0, the term with η³ is negligible, and Eq. (50) gives the
solution (stable only for t̃ > 0):
η = nh/(2at̃), (4.51)


so according to Eq. (29), γ = 1. On the other hand, at t̃ → 0 (or at sufficiently high fields at other
temperatures), the cubic term in Eq. (50) is much larger than the linear one, and this equation yields
η = (nh/2b)^1/3, (4.52)
so comparison with Eq. (32) yields δ = 3. Finally, according to Eq. (30), the last term in Eq. (46) scales
as cη²/rc². (If rc → ∞, the effects of the pre-exponential factor in Eq. (30) are negligible.) As a result, the
gradient term's contribution is comparable27 with the two leading terms in Δf (which, according to Eq.
(47), are of the same order), if
rc = (c/a|t̃|)^1/2, (4.53)
so according to the definition (31) of the critical exponent ν, in the Landau theory it is equal to ½.
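The first three of these exponents are simple enough to be cross-checked symbolically. The short sketch below (in Python with the sympy library; the symbol tau stands for |t̃|, and the other names mirror Eqs. (45) and (50)) merely repeats the above minimizations, and is an illustration rather than a part of the argument:

import sympy as sp

eta = sp.symbols('eta', positive=True)
a, b, n, h, tau = sp.symbols('a b n h tau', positive=True)   # tau = |t~| > 0

# Below Tc (t~ = -tau): minimize Eq. (45), f = -a*tau*eta**2 + b*eta**4/2
f = -a*tau*eta**2 + b*eta**4/2
print(sp.solve(sp.diff(f, eta), eta))      # eta grows as tau**(1/2), i.e. beta = 1/2

# Above Tc, in a small field: the linearized Eq. (50), 2*a*tau*eta - n*h = 0
print(sp.solve(2*a*tau*eta - n*h, eta))    # eta proportional to h/tau, i.e. gamma = 1

# Exactly at Tc: Eq. (50) reduces to 2*b*eta**3 - n*h = 0
print(sp.solve(2*b*eta**3 - n*h, eta))     # eta proportional to h**(1/3), i.e. delta = 3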
The third column in Table 1 summarizes the critical exponents and their combinations in
Landau’s theory. It shows that these values are somewhat out of the experimental ranges, and while
some of their “universal” relations are correct, some are not; for example, the Josephson relation would
be only correct at d = 4 (not the most realistic spatial dimensionality :-) The main reason for this
disappointing result is that describing the spin interaction with the field, the Landau mean-field theory
neglects spin randomness, i.e. fluctuations. Though a quantitative theory of fluctuations will be
discussed only in the next chapter, we can readily perform their crude estimate. Looking at Eq. (46), we
see that its first term is a quadratic function of the effective "half-degree of freedom", η. Hence per the
equipartition theorem (2.28), we may expect that the average square of its thermal fluctuations, within a
d-dimensional volume with a linear size of the order of rc, should be of the order of T/2 (close to the
critical temperature, Tc/2 is a good enough approximation):
a|t̃| ⟨η̃²⟩ rc^d ~ Tc/2. (4.54)
In order to be negligible, the variance has to be small in comparison with the average η² ~ a|t̃|/b – see
Eq. (47). Plugging in the T-dependences of the operands of this relation, and values of the critical
exponents from the Landau theory, for |t̃| > 0 we get the so-called Levanyuk-Ginzburg criterion of its
validity:
(Tc/2a|t̃|)·(a|t̃|/c)^d/2 << a|t̃|/b. (4.55)
We see that for any realistic dimensionality, d < 4, the order parameter's fluctuations grow at t̃ → 0
faster than its average value, and hence the theory becomes invalid.
Thus the Landau mean-field theory is not a perfect approach to finding critical indices at
continuous phase transitions in Ising-type systems with their next-neighbor interactions between the
particles. However, any long-range interactions between particles increase the correlation radius rc, and

27 According to Eq. (30), the correlation radius may be interpreted as the distance at which the order parameter η
relaxes to its equilibrium value if it is deflected from that value at some point. Since the law of such spatial
change may be obtained by a variational differentiation of ΔF, for the actual relaxation law, all major terms of (46)
have to be comparable.


hence suppress the order parameter fluctuations. As one example, at laser self-excitation, the emerging
coherent optical field couples essentially all photon-emitting particles in the electromagnetic cavity
(resonator). As another example, in superconductors, the role of the correlation radius is played by the
Cooper-pair size ξ0, which is typically of the order of 10⁻⁶ m, i.e. much larger than the average distance
between the pairs (~10⁻⁸ m). As a result, the mean-field theory remains valid at all temperatures besides
an extremely small temperature interval near Tc – for bulk superconductors, of the order of 10⁻⁶ K.
The real strength of Landau’s mean-field theory is that despite its classical character, it may be
readily generalized to a description of various Bose-Einstein condensates, i.e. quantum fluids. Of those
generalizations, the most famous is the Ginzburg-Landau theory of superconductivity. It was developed
in 1950, i.e. even before the microscopic-level explanation of this phenomenon by J. Bardeen, L.
Cooper, and R. Schrieffer in 1956-57. In this theory, the real order parameter η is replaced with the
modulus of a complex function ψ, physically the wavefunction of the coherent Bose-Einstein
condensate of Cooper pairs. Since each pair carries the electric charge q = –2e and has zero spin, it
interacts with the magnetic field in a way different from that described by the Heisenberg or Ising
models. Namely, as was already discussed in Sec. 3.4, in the magnetic field, the del operator ∇ in Eq.
(46) has to be amended with the term –i(q/ℏ)A, where A is the vector potential of the total magnetic
field B = ∇×A, including not only the external magnetic field H but also the field induced by the
supercurrent itself. With the account for the well-known formula for the magnetic field energy, Eq. (46)
is now replaced with

GL free energy:
Δf = at̃|ψ|² + ½b|ψ|⁴ + (ℏ²/2m)|(∇ – i(q/ℏ)A)ψ|² + B²/2μ₀, (4.56)
where m is a phenomenological coefficient rather than the actual particle’s mass.
The variational minimization of the resulting Gibbs energy density g ≡ f – μ₀H·M ≡ f –
H·B + const28 over the variables ψ and B (which is suggested for the reader's exercise) yields two
differential equations:

GL equations:
∇×B/μ₀ = q[–(iℏ/2m)ψ*(∇ – i(q/ℏ)A)ψ + c.c.], (4.57a)
–at̃ψ = b|ψ|²ψ – (ℏ²/2m)(∇ – i(q/ℏ)A)²ψ. (4.57b)
The first of these Ginzburg-Landau equations (57a) should be no big surprise for the reader,
because according to the Maxwell equations, in magnetostatics the left-hand side of Eq. (57a) has to be
equal to the electric current density, while its right-hand side is the usual quantum-mechanical
probability current density multiplied by q, i.e. the density j of the electric current of the Cooper pair
condensate. (Indeed, after plugging ψ = n^1/2 exp{iφ} into that expression, we come back to Eq. (3.84)
which, as we already know, explains such macroscopic quantum phenomena as the magnetic flux
quantization and the Meissner-Ochsenfeld effect.)

28 As an immediate elementary sanity check of this relation, resulting from the analogy of Eqs. (1.1) and (1.3), the
minimization of g in the absence of superconductivity (ψ = 0) gives the correct result B = μ₀H. Note that this
account of the difference between f and g is necessary here because (unlike Eqs. (21) and (23)), the Ginzburg-
Landau free energy (56) does not take into account the effect of the field on each particle directly.


However, Eq. (57b) is new for us – at least for this course.29 Since the last term on its right-hand
side is the standard wave-mechanical expression for the kinetic energy of a particle in the presence of a
magnetic field,30 if this term dominates that side of the equation, Eq. (57b) is reduced to the stationary
Schrödinger equation Eψ = Ĥψ, for the ground state of free Cooper pairs, with the energy E = –at̃ > 0.
However, in contrast to the usual (single-particle) Schrödinger equation, in which the amplitude of ψ is determined by
the normalization condition, the Cooper pair condensate density n = |ψ|² is determined by the
thermodynamic balance of the condensate with the ensemble of "normal" (unpaired) electrons, which
plays the role of the uncondensed part of the particles in the usual Bose-Einstein condensate – see Sec.
3.4. In Eq. (57b), such balance is enforced by the first term b|ψ|²ψ on the right-hand side. As we have
already seen, in the absence of magnetic field and spatial gradients, such term yields |ψ| ∝ |t̃|^1/2 ∝ (Tc
– T)^1/2 – see Eq. (47).
As a parenthetic remark, from the mathematical standpoint, the term b|ψ|²ψ, which is nonlinear
in ψ, makes Eq. (57b) a member of the family of the so-called nonlinear Schrödinger equations.
Another important member of this family is the Gross-Pitaevskii equation,

Gross-Pitaevskii equation:
–at̃ψ = b|ψ|²ψ – (ℏ²/2m)∇²ψ + U(r)ψ, (4.58)

which gives a reasonable (albeit approximate) description of gradient and field effects on Bose-Einstein
condensates of electrically neutral atoms at T ≈ Tc. The differences between Eqs. (58) and (57) reflect,
first, the zero electric charge q of the atoms (so Eq. (57a) becomes trivial) and, second, the fact that the
atoms forming the condensates may be readily placed in external potentials U(r) ≠ const (including the
time-averaged potentials of optical traps – see EM Chapter 7), while in superconductors such potential
profiles are much harder to create due to the screening of external electric and optical fields by
conductors – see, e.g., EM Sec. 2.1.
Returning to the discussion of Eq. (57b), it is easy to see that its last term increases as either the
external magnetic field or the density of current passed through a superconductor are increased,
increasing the vector potential. In the Ginzburg-Landau equation, this increase is matched by a
corresponding decrease of |ψ|², i.e. of the condensate density n, until it is completely suppressed. This
balance describes the well-documented effect of superconductivity suppression by an external magnetic
field and/or the supercurrent passed through the sample. Moreover, together with Eq. (57a), naturally
describing the flux quantization (see Sec. 3.4), Eq. (57b) explains the existence of the so-called
Abrikosov vortices – thin magnetic-field tubes, each carrying one quantum Φ₀ of magnetic flux – see Eq.
(3.86). At the core of the vortex, |ψ|² is suppressed (down to zero at its central line) by the persistent,
dissipation-free current of the superconducting condensate that circulates around the core and screens
the rest of the superconductor from the magnetic field carried by the vortex.31 The penetration of such
vortices into the so-called type-II superconductors enables them to sustain zero dc resistance up to very
high magnetic fields of the order of 20 T, and as a result, to be used in very compact magnets –
including those used for beam bending in particle accelerators.

29 It is also discussed in EM Sec. 6.5.


30 See, e.g., QM Sec. 3.1.
31 See, e.g., EM Sec. 6.5.


Moreover, generalizing Eqs. (57) to the time-dependent case, just as it is done with the usual
Schrödinger equation, one can describe other fascinating quantum macroscopic phenomena such as the
Josephson effects, including the generation of oscillations with frequency ωJ = (q/ℏ)V by weak links
between two superconductors, biased by dc voltage V. Unfortunately, time/space restrictions do not
allow me to discuss these effects in any detail in this course, and I have to refer the reader to special
literature.32 Let me only note that in the limit T → Tc, and for not extremely pure superconductor
crystals (in which the so-called non-local transport phenomena may be important), the Ginzburg-Landau
equations are exact, and may be derived (and their parameters Tc, a, b, q, and m determined) from the
standard “microscopic” theory of superconductivity, based on the initial work by Bardeen, Cooper, and
Schrieffer.33 Most importantly, such derivation proves that q = –2e – the electric charge of a single
Cooper pair.

4.4. Ising model: Weiss’ molecular-field approximation


The Landau mean-field theory is phenomenological in the sense that even within the range of its
validity, it tells us nothing about the value of the critical temperature Tc and other parameters (in Eq.
(46), the coefficients a, b, and c), so they have to be found from a particular “microscopic” model of the
system under analysis. In this course, we would have time to discuss only the Ising model (23) for
various dimensionalities d.
The most simplistic mean-field approach to this model is to assume that all spins are exactly
equal, sk = η, with an additional condition η² ≤ 1, ignoring for a minute the fact that in the genuine Ising
model, sk may equal only +1 or –1. Plugging this relation into Eq. (23), we get34
F = –NJdη² – Nηh. (4.59)
This energy is plotted in Fig. 7a as a function of η, for several values of h.

Fig. 4.7. Field dependences of (a) the free energy profile and (b) the order parameter η (i.e. magnetization), in the crudest mean-field approach to the Ising model.

32 See, e.g., M. Tinkham, Introduction to Superconductivity, 2nd ed., McGraw-Hill, 1996. A short discussion of
the Josephson effects and Abrikosov vortices may be found in QM Sec. 1.6 and EM Sec. 6.5 of this series.
33 See, e.g., Sec. 45 in E. Lifshitz and L. Pitaevskii, Statistical Physics, Part 2, Pergamon, 1980.
34 In this naïve approach, we neglect the fluctuations of spin, i.e. their disorder. This full ordering assumption
implies S = 0, so F ≡ E – TS = E, and we may use either notation for the system's energy.


The plots show that at h = 0, the system may be in either of two stable states, with η = ±1,
corresponding to two different spin directions (i.e. two different directions of magnetization), with equal
energy.35 (Formally, the state with η = 0 is also stationary, because at this point ∂F/∂η = 0, but it is
unstable because for the ferromagnetic interaction, J > 0, the second derivative ∂²F/∂η² is always
negative.) As the external field is increased, it tilts the potential profile, and finally at the critical field,
h = hc ≡ 2Jd, (4.60)
the state with η = –1 becomes unstable, leading to the system's jump into the only remaining state with
opposite magnetization, η = +1 – see the arrow in Fig. 7a. Application of a similar external field of the
opposite polarity leads to a similar switching, back to η = –1, at the field h = –hc, so the full field
dependence of η follows the hysteretic pattern shown in Fig. 7b.36
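This switching logic is easy to trace numerically as well. The sketch below (plain Python; the values of J and d, and the field sweep range, are arbitrary illustration choices) follows the locally stable value of η from Eq. (59) as h is cycled, and reproduces the hysteresis loop of Fig. 7b:

import numpy as np

J, d = 1.0, 2.0
h_c = 2 * J * d                          # the critical (coercive) field, Eq. (60)

def dF_deta(eta, h):                     # derivative of F/N = -J*d*eta**2 - h*eta
    return -2 * J * d * eta - h

eta = -1.0                               # start fully magnetized "down"
sweep = np.concatenate([np.linspace(-1.5 * h_c, 1.5 * h_c, 31),
                        np.linspace(1.5 * h_c, -1.5 * h_c, 31)])
for h in sweep:
    # the endpoint eta = -1 remains a local minimum of F while dF/deta > 0 there,
    # and eta = +1 while dF/deta < 0 there; otherwise the system jumps
    if eta < 0 and dF_deta(-1.0, h) < 0:
        eta = +1.0                       # "down -> up" jump at h = +h_c
    elif eta > 0 and dF_deta(+1.0, h) > 0:
        eta = -1.0                       # "up -> down" jump at h = -h_c
    print(f"h = {h:+7.2f}   eta = {eta:+.0f}")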
Such a pattern is the most visible experimental feature of actual ferromagnetic materials, with
the coercive magnetic field Hc of the order of 10³ A/m, and the saturated (or "remnant") magnetization
corresponding to fields B of the order of a few teslas. The most important property of these materials,
also called permanent magnets, is their stability, i.e. the ability to retain the history-determined direction
of magnetization in the absence of an external field, for a very long time. In particular, this property is
the basis of all magnetic systems for data recording, including the ubiquitous hard disk drives with their
incredible information density approaching 1 Terabit per square inch.37
So, already the simplest mean-field approximation (59) does give a (crude) description of the
ferromagnetic ordering. However, it grossly overestimates the stability of these states with respect to
thermal fluctuations. Indeed, in this theory, there is no thermally-induced randomness at all, until T
becomes comparable with the height of the energy barrier separating two stable states,
F  F (  0)  F (  1)  NJd , (4.61)
which is proportional to the number of particles. At N  , this value diverges, and in this sense, the
critical temperature is infinite, while numerical experiments and more refined theories of the Ising
model show that actually its ferromagnetic phase is suppressed at T > Tc ~ Jd – see below.
The accuracy of this theory may be dramatically improved by even an approximate account for
thermally-induced randomness. In this approach (suggested in 1907 by Pierre-Ernest Weiss), called the
molecular-field approximation,38 random deviations of individual spin values from the lattice average,

35 The fact that the stable states always correspond to η = ±1 partly justifies the treatment, in this crude
approximation, of the order parameter η as a continuous variable.
36 Since these magnetization jumps are accompanied by (negative) jumps of the free energy F, they are sometimes
called first-order phase transitions. Note, however, that in this simple theory, these transitions are between two
physically similar fully-ordered phases.
37 For me, it was always shocking how little my graduate physics students knew about this fascinating (and very
important) field of modern engineering, which involves so much interesting physics and fantastic
electromechanical technology. For getting acquainted with it, I may recommend, for example, the monograph by
C. Mee and E. Daniel, Magnetic Recording Technology, 2nd ed., McGraw-Hill, 1996.
38 In some texts, this approximation is called the “mean-field theory”. This terminology may lead to confusion,
because the molecular-field theory belongs to a different, deeper level of the theoretical hierarchy than, say, the
(more phenomenological) Landau-style mean-field theories. For example, for a given microscopic model, the


s̃k ≡ sk – η, with η ≡ ⟨sk⟩, (4.62)
are allowed, but considered relatively small: |s̃k| << η. This assumption allows us, after plugging the
resulting expression sk = η + s̃k into the first term on the right-hand side of Eq. (23),

Em = –J Σ_{k,k'} (η + s̃k)(η + s̃k') – h Σk sk = –J Σ_{k,k'} [η² + η(s̃k + s̃k') + s̃k s̃k'] – h Σk sk, (4.63)
ignore the last term in the square brackets. Making the replacement (62) in the terms proportional to s̃k,
we may rewrite the result as
Em ≈ Em' ≡ NJdη² – hef Σk sk, (4.64)
where hef is defined as the sum
hef ≡ h + 2Jdη. (4.65)
This sum may be interpreted as an effective external field, which (besides the genuine external field h)
takes into account the effect that would be exerted on spin sk by its 2d next neighbors if they all had non-
fluctuating (but possibly continuous) spin values sk' = η. Such addition to the external field,

Weiss molecular field:
hmol ≡ hef – h = 2Jdη, (4.66)
is called the molecular field – giving its name to Weiss' approach.


From the point of view of statistical physics, at fixed parameters of the system (including the
order parameter η), the first term on the right-hand side of Eq. (64) is merely a constant energy offset,
and hef is just another constant, so
Em' = const + Σk εk, with εk ≡ –hef sk = { –hef for sk = +1; +hef for sk = –1. } (4.67)
Such separability of the energy means that in the molecular-field approximation the fluctuations of
different spins are independent of each other, and their statistics may be examined individually, using
the energy spectrum εk. But this is exactly the two-level system that was the subject of Problems 2.2-
2.4. Actually, its statistics is so simple that it is easier to redo it from scratch, rather than to use the
results of those exercises (which would require changing notation). Indeed, according to the Gibbs
distribution (2.58)-(2.59), the equilibrium probabilities of the states sk = ±1 may be found as
W± = (1/Z)exp{±hef/T}, with Z = exp{hef/T} + exp{–hef/T} = 2cosh(hef/T). (4.68)
From here, we may readily calculate F = –TlnZ and all other thermodynamic variables, but let us
immediately use Eq. (68) to calculate the statistical average of sj, i.e. the order parameter:
η ≡ ⟨sj⟩ = (+1)W₊ + (–1)W₋ = (exp{hef/T} – exp{–hef/T}) / 2cosh(hef/T) = tanh(hef/T). (4.69)

molecular-field approach may be used for the (approximate) calculation of the parameters a, b, and Tc
participating in Eq. (46) – the starting point of the Landau theory.


Now comes the punch line of Weiss' approach: plugging this result back into Eq. (65), we may write the
condition of self-consistency of the molecular-field approach:

Self-consistency equation:
hef = h + 2Jd tanh(hef/T). (4.70)
This is a transcendental equation, which evades an explicit analytical solution, but whose properties may
be readily analyzed by plotting both its sides as functions of the same argument, so the stationary
state(s) of the system corresponds to the intersection point(s) of these plots.
First of all, let us explore the field-free case (h = 0), when hef = hmol ≡ 2dJη, so Eq. (70) is
reduced to
η = tanh(2Jdη/T), (4.71)
giving one of the patterns sketched in Fig. 8, depending on the dimensionless parameter 2Jd/T.

Fig. 4.8. The ferromagnetic phase transition in Weiss' molecular-field theory: two sides of Eq. (71) sketched as functions of η for three different temperatures: above Tc (red), below Tc (blue), and equal to Tc (green).

If this parameter is small, the right-hand side of Eq. (71) grows slowly with η (see the red line in
Fig. 8), and there is only one intersection point with the left-hand side plot, at η = 0. This means that the
spin system has no spontaneous magnetization; this is the so-called paramagnetic phase. However, if
the parameter 2Jd/T exceeds 1, i.e. if T is decreased below the following critical value,

Critical ("Curie") temperature:
Tc = 2Jd, (4.72)
the right-hand side of Eq. (71) grows, at small η, faster than its left-hand side, so their plots intersect
at three points: η = 0 and η = ±η₀ – see the blue line in Fig. 8. It is almost evident that the former
stationary point is unstable, while the two latter points are stable. (This fact may be readily verified by
using Eq. (68) to calculate F. Now the condition ∂F/∂η|h=0 = 0 returns us to Eq. (71), while calculating
the second derivative, for T < Tc we get ∂²F/∂η² > 0 at η = ±η₀, and ∂²F/∂η² < 0 at η = 0.) Thus, below
Tc the system is in the ferromagnetic phase, with one of two possible directions of the average
spontaneous magnetization, so the critical (Curie39) temperature given by Eq. (72) marks the transition
between the paramagnetic and ferromagnetic phases. (Since the stable minimum value of the free energy
F is a continuous function of temperature at T = Tc, this phase transition is continuous.)
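The stable intersection points are also easy to find numerically, by iterating Eq. (71) as a fixed-point map. A minimal sketch (plain Python; the values of J and d are arbitrary illustration choices):

import numpy as np

J, d = 1.0, 3.0
T_c = 2 * J * d                                    # the Weiss prediction, Eq. (72)

for T in [0.50 * T_c, 0.90 * T_c, 0.99 * T_c, 1.01 * T_c, 1.50 * T_c]:
    eta = 1.0                                      # start from the fully ordered state
    for _ in range(10_000):                        # fixed-point iteration of Eq. (71)
        eta = np.tanh(2 * J * d * eta / T)
    print(f"T/Tc = {T/T_c:5.2f}   eta = {eta:.6f}")

Below Tc the iteration converges to a non-zero η (the ferromagnetic phase), while above Tc it decays to zero (the paramagnetic phase); just below Tc, the printed values follow the mean-field square-root law η ≈ [3(1 – T/Tc)]^1/2 – cf. Eq. (47).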
Now let us repeat this graphical analysis to examine how each of these phases responds to an
external magnetic field h ≠ 0. According to Eq. (70), the effect of h is just a horizontal shift of the
straight-line plot of its left-hand side – see Fig. 9. (Note a different, here more convenient, normalization
of both axes.)

39 Named after Pierre Curie, rather than his (more famous) wife Marie Skłodowska-Curie.


Fig. 4.9. External field effects on (a) a paramagnet (T > Tc), and (b) a ferromagnet (T < Tc).

In the paramagnetic case (Fig. 9a), the resulting dependence hef(h) is evidently continuous, but
the coupling effect (J > 0) makes it steeper than it would be without spin interaction. This effect may be
quantified by the calculation of the low-field susceptibility defined by Eq. (29). To calculate it, let us
notice that for small h, and hence small hef, the function tanh is approximately equal to its argument, so
Eq. (70) is reduced to
hef ≈ h + (2Jd/T)hef, for hef << T. (4.73)
Solving this equation for hef, and then using Eq. (72), we get
hef = h/(1 – 2Jd/T) ≡ h/(1 – Tc/T). (4.74)
Recalling Eq. (66), we can rewrite this result for the order parameter:
η = (hef – h)/Tc = h/(T – Tc), (4.75)
so the low-field susceptibility

Curie-Weiss law:
χ ≡ ∂η/∂h|h→0 = 1/(T – Tc), for T > Tc. (4.76)
This is the famous Curie-Weiss law, which shows that the susceptibility diverges at the approach to the
Curie temperature Tc.
In the ferromagnetic case, the graphical solution (Fig. 9b) of Eq. (70) gives a qualitatively
different result. A field increase leads, depending on the spontaneous magnetization, either to the further
saturation of hmol (with the order parameter η gradually approaching 1), or, if the initial η was negative,
to a jump to positive η at some critical (coercive) field hc. In contrast with the crude approximation (59),
at T > 0 the coercive field is smaller than that given by Eq. (60), and the magnetization saturation is
gradual, in a good (semi-quantitative) accordance with experiment.
To summarize, the Weiss molecular-field approach gives an approximate but realistic description
of the ferromagnetic and paramagnetic phases in the Ising model, and a very simple prediction (72) of
the temperature of the transition between them, for an arbitrary dimensionality d of the cubic lattice. It
also enables the calculation of other parameters of Landau’s mean-field theory for this model – an
exercise left for the reader. Moreover, the molecular-field approximation allows one to obtain similarly
reasonable analytical results for some other models of phase transitions – see, e.g., the exercise
problems at the end of Sec. 6.


4.5. Ising model: Exact and numerical results


In order to evaluate the main prediction (72) of the Weiss theory, let us now discuss the exact
(analytical) and quasi-exact (numerical) results obtained for the Ising model, going from the lowest
value of dimensionality, d = 0, to its higher values. The zero dimensionality means that the spin has no
nearest neighbors at all, so the first term of Eq. (23) vanishes. Hence Eq. (64) is exact, with hef = h, and
so is its solution (69). Now we can simply use Eq. (76), with J = 0, i.e. Tc = 0, reducing this result to the
so-called Curie law:
χ = 1/T. (4.77)
It shows that the system is paramagnetic at any temperature. One may say that for d = 0 the Weiss
molecular-field theory is exact – or even trivial. (However, in some sense it is more general than the
Ising model because as we know from Chapter 2, it gives the exact result for a fully quantum-
mechanical treatment of any two-level system, including spin-½.) Experimentally, the Curie law is
approximately valid for many so-called paramagnetic materials, i.e. 3D systems of particles with
spontaneous spins and sufficiently weak interaction between them.
The case d = 1 is more complex but has an exact analytical solution. A simple way to obtain it
for a uniform chain is to use the so-called transfer matrix approach.40 For this, first of all, we may argue
that most properties of a 1D system of N >> 1 spins (say, put at equal distances on a straight line) should
not change too much if we bend that line gently into a closed ring (Fig. 10), assuming that spins s1 and
sN interact exactly as all other next-neighbor pairs. Then the energy (23) becomes

E m   Js1 s 2  Js 2 s3  ...  Js N s1   hs1  hs 2  ...  hs N  . (4.78)


Fig. 4.10. The closed-ring version of the 1D Ising system.

Let us regroup the terms of this sum in the following way:
Em = –[(h/2)s₁ + Js₁s₂ + (h/2)s₂] – [(h/2)s₂ + Js₂s₃ + (h/2)s₃] – … – [(h/2)sN + JsNs₁ + (h/2)s₁], (4.79)
so the group inside each pair of parentheses depends only on the state of two adjacent spins. The
corresponding statistical sum,

40 This approach was developed in 1941 by H. Kramers and G. Wannier. Note that for the field-free case h = 0, an
even simpler solution of the problem, valid for chains with an arbitrary number N of “spins” and arbitrary
coefficients Jk, is possible. For that, one needs to calculate the explicit relation between the statistical sums Z for
systems with N and (N + 1) spins first, and then apply it sequentially to systems starting from Z = 2. I am leaving
this calculation for the reader’s exercise.


Z = Σ_{sk=±1; k=1,2,…,N} exp{(h/2T)s₁ + (J/T)s₁s₂ + (h/2T)s₂} exp{(h/2T)s₂ + (J/T)s₂s₃ + (h/2T)s₃} … exp{(h/2T)sN + (J/T)sNs₁ + (h/2T)s₁}, (4.80)

still has 2N terms, each corresponding to a certain combination of signs of N spins. However, each
operand of the product under the sum may take only four values, corresponding to four different
combinations of its two arguments:
exp J  h  / T , for s k  s k 1  1,
 sk s k s k 1 s k 1  
exph J h   expJ  h  / T , for s k  s k 1  1, (4.81)
 2T T 2T  
 exp J / T , for s k   s k 1  1.

These values do not depend on the site number k,41 and may be represented as the elements Mj,j’ (with j,
j’ = 1, 2) of the so-called transfer matrix
 exp J  h /T  exp J/T  
M   , (4.82)
 exp J/T  exp J  h /T 
so the whole statistical sum (80) may be recast as a product:

Z  M j j M j j ...M j j M j j .
1 2 2 3 N 1 N N 1
(4.83)
jk 1, 2
According to the basic rule of matrix multiplication, this sum is just

Z  Tr M N  . (4.84)
Linear algebra tells us that this trace may be represented just as
Z   N   N , (4.85)
where  are the eigenvalues of the transfer matrix M, i.e. the roots of its characteristic equation,
exp J  h /T    exp J/T 
 0. (4.86)
exp J/T  exp J  h /T   
A straightforward solution of this equation yields two roots:
 J   4J  
1/ 2
h  2 h
  exp cosh   sinh  exp    . (4.87)
T  T  T  T   
The last simplification comes from the condition N >> 1 – which we need anyway, to make the
ring model sufficiently close to the infinite linear 1D system. In this limit, even a small difference of the
exponents, λ₊ > λ₋, makes the second term in Eq. (85) negligible, so we finally get
Z ≈ λ₊^N = exp{NJ/T} [cosh(h/T) + (sinh²(h/T) + exp{–4J/T})^1/2]^N. (4.88)

41 This is a result of the "translational" (or rather rotational) symmetry of the system, i.e. its invariance to the
index replacement k → k + 1 in all terms of Eq. (78).


From here, we can find the free energy per particle:
F/N = (T/N) ln(1/Z) = –J – T ln[cosh(h/T) + (sinh²(h/T) + exp{–4J/T})^1/2], (4.89)
and then use thermodynamics to calculate such variables as entropy – see the first of Eqs. (1.35).
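The chain of Eqs. (80)-(88) is also easy to verify numerically. The sketch below (plain Python with numpy; the values of J, h, T, and N are arbitrary illustration choices) compares the transfer-matrix trace with the brute-force sum over all 2^N spin configurations of the closed ring:

import itertools
import numpy as np

J, h, T, N = 1.0, 0.3, 1.5, 10

M = np.array([[np.exp((J + h) / T), np.exp(-J / T)],
              [np.exp(-J / T),      np.exp((J - h) / T)]])    # the matrix (82)

Z_tm = np.trace(np.linalg.matrix_power(M, N))                 # Eq. (84)

Z_bf = 0.0                                                    # the sum (80)
for s in itertools.product([+1, -1], repeat=N):
    E = -sum(J * s[k] * s[(k + 1) % N] + h * s[k] for k in range(N))
    Z_bf += np.exp(-E / T)

lam = np.exp(J / T) * (np.cosh(h / T) +
                       np.sqrt(np.sinh(h / T)**2 + np.exp(-4 * J / T)))
print(Z_tm, Z_bf, lam**N)    # the first two coincide; the third is close at N >> 1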
However, we are mostly interested in the order parameter defined by Eq. (25): η ≡ ⟨sj⟩. The
conceptually simplest approach to the calculation of this statistical average would be to use the sum
(2.7), with the Gibbs probabilities Wm = Z⁻¹exp{–Em/T}. However, the number of terms in this sum is 2^N,
so for N >> 1 this approach is completely impracticable. Here the analogy between the canonical pair {–
P, V} and other generalized force-coordinate pairs {F, q}, in particular {μ₀H(rk), mk} for the magnetic
field, discussed in Secs. 1.1 and 1.4, becomes invaluable – see in particular Eq. (1.3b). (In our
normalization (22), and for a uniform field, the pair {μ₀H(rk), mk} becomes {h, sk}.) Indeed, in this
analogy the last term of Eq. (23), i.e. the sum of N products (–hsk) for all spins, with the statistical
average (–Nηh), is similar to the product PV, i.e. the difference between the thermodynamic potentials F
and G ≡ F + PV in the usual "P-V thermodynamics". Hence, the free energy F given by Eq. (89) may be
understood as the Gibbs energy of the Ising system in the external field, and the equilibrium value of the
order parameter may be found from the last of Eqs. (1.39) with the replacements –P → h, V → Nη:
Nη = –(∂F/∂h)T, i.e. η = –(∂(F/N)/∂h)T. (4.90)
Note that this formula is valid for any model of ferromagnetism, of any dimensionality, if it has the same
form of interaction with the external field as the Ising model.
For the 1D Ising ring with N >> 1, Eqs. (89) and (90) yield
η = sinh(h/T) / [sinh²(h/T) + exp{–4J/T}]^1/2, giving χ ≡ ∂η/∂h|h→0 = (1/T) exp{2J/T}. (4.91)
This result means that the 1D Ising model does not exhibit a phase transition, i.e., in this model Tc = 0.
However, its susceptibility grows, at T → 0, much faster than the Curie law (77). This gives us a hint
that at low temperatures the system is “virtually ferromagnetic”, i.e. has the ferromagnetic order with
some rare random violations. (Such violations are commonly called low-temperature excitations.) This
interpretation may be confirmed by the following approximate calculation. It is almost evident that the
lowest-energy excitation of the ferromagnetic state of an open-end 1D Ising chain at h = 0 is the reversal
of signs of all spins in one of its parts – see Fig. 11.

Fig. 4.11. A Bloch wall in an open-end 1D Ising chain (spin pattern: + + + + – – – –).

Indeed, such an excitation (called either the “magnetic domain wall” or just the Bloch wall42)
involves the change of sign of just one product sksk’, so according to Eq. (23), its energy EW (defined as

42Named after Felix Bloch who was the first one to discuss such excitations. More complex excitations such as
skyrmions (see, e.g., A. Fert et al., Nature Review Materials 2, 17031 (2017)) have higher energies.


the difference between the values of Em with and without the excitation) equals 2J, regardless of the wall
position.43 Since in the ferromagnetic Ising model, the parameter J is positive, EW > 0. If the system
"tried" to minimize its internal energy, having any wall in the system would be energy-disadvantageous.
However, thermodynamics tells us that at T ≠ 0, the system's thermal equilibrium corresponds to the
minimum of the free energy F ≡ E – TS, rather than just energy E.44 Hence, we have to calculate the
Bloch wall's contribution FW to the free energy. Since in an open-end linear chain of N >> 1 spins, the
wall can take (N – 1) ≈ N positions with the same energy EW, we may claim that the entropy SW
associated with this excitation is lnN, so
FW = EW – TSW = 2J – T lnN. (4.92)

This result tells us that in the limit N → ∞, and at T ≠ 0, the Bloch walls are always free-energy-
beneficial, thus explaining the absence of the perfect ferromagnetic order in the 1D Ising system. Note,
however, that since the logarithmic function changes extremely slowly at large values of its argument,
one may argue that a large but finite 1D system should still feature a quasi-critical temperature
"Tc" = 2J/lnN, (4.93)
below which it would be in a virtually complete ferromagnetic order. (The exponentially large
susceptibility (91) is another manifestation of this fact.)
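How weakly this boundary depends on the system size is easy to appreciate from a one-line estimate (a Python sketch taking J = 1, with arbitrary example values of N):

import numpy as np
for N in [1e2, 1e6, 1e23]:
    print(f"N = {N:.0e}:  'Tc' = {2 / np.log(N):.4f} J")
# even for a "macroscopic" chain of 10^23 spins, "Tc" only drops to ~0.04 J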
Now let us apply a similar approach to estimate Tc of a 2D Ising model, with open borders. Here
the Bloch wall is a line of a certain total length L – see Fig. 12. (For the example presented in that
figure, counting from the left to the right, L = 2 + 1 + 4 + 2 + 3 = 12 lattice periods.)

+ + + + + + + + +
+ + + + + + + + +
+ + + + + + + + +
+ + + + + + - - -
+ + + + + + - - -
+ + - - - - - - -
- - - - - - - - -
Fig. 4.12. A Bloch wall in a 2D Ising system.

Evidently, the additional energy associated with such a wall is EW = 2JL, while the wall’s
entropy SW may be estimated using the following reasoning. A continuous Bloch wall may be thought
about as the path of a “Manhattan pedestrian” crossing the system between its nodes. At each junction of
straight segments of the path, the pedestrian may select 3 choices of 4 possible directions (except the
one that leads backward), so for a path without self-crossings, there are 3^(L–1) ≈ 3^L options for a walk
starting from a certain point. Now taking into account that the open borders of a square-shaped lattice
with N spins have a length of the order of N^1/2, and the Bloch wall may start from any of them, there are
approximately M ~ N^(1/2)·3^L different walks between two borders. Again estimating SW as lnM, we get

43 For the closed-ring model (Fig. 10) such analysis gives an almost similar prediction, with the difference that in
that system, the Bloch walls may appear only in pairs, so EW = 4J, and SW = ln[N(N – 1)] ≈ 2lnN.
44 This is a very vivid application of one of the core results of thermodynamics. If the reader is still uncomfortable
with it, they are strongly encouraged to revisit Eq. (1.42) and its discussion.


FW = EW – TSW ≈ 2JL – T ln(N^(1/2)·3^L) = L(2J – T ln3) – (T/2)lnN. (4.94)
(Actually, since L scales as N^1/2 or higher, at N → ∞ the last term in Eq. (94) is negligible.) We see that
the sign of the derivative ∂FW/∂L depends on whether the temperature is higher or lower than the
following critical value:
Tc = 2J/ln3 ≈ 1.82 J. (4.95)
At T < Tc, the free energy's minimum corresponds to L → 0, i.e. the Bloch walls are free-energy-
detrimental, and the system is in the purely ferromagnetic phase.
So, for d = 2 the simple estimate predicts a non-zero critical temperature of the same order as
the Weiss theory (according to Eq. (72), in this case Tc = 4J). The major approximation implied in the
calculation leading to Eq. (95) is disregarding possible self-crossings of the “Manhattan walk”. The
accurate counting of such self-crossings is rather difficult. It had been carried out in 1944 by L.
Onsager; since then his calculations have been redone in several easier ways, but even they are rather
cumbersome, and I will not have time to discuss them.45 The final result, however, is surprisingly
simple:
Onsager's exact result:
Tc = 2J/ln(1 + √2) ≈ 2.269 J, (4.96)
i.e. showing that the simple estimate (95) is off the mark by only ~20%.

i.e. showing that the simple estimate (95) is off the mark by only ~20%.
The Onsager solution, as well as all alternative solutions of the problem that were found later,
are so “artificial” (2D-specific) that they do not give a clear way towards their generalization to other
(higher) dimensions. As a result, the 3D Ising problem is still unsolved analytically. Nevertheless, we do
know Tc for it with extremely high precision – at least to the 6th decimal place. This has been achieved
by numerical methods; they deserve a discussion because of their importance for the solution of other
similar problems as well. Conceptually, the task is rather simple: just compute, to the desired precision,
the statistical sum of the system (23):
Z = Σ_{sk=±1; k=1,2,…,N} exp{(J/T) Σ_{k,k'} sksk' + (h/T) Σk sk}. (4.97)

As soon as this has been done for a sufficient number of values of the dimensionless parameters J/T and
h/T, everything is easy; in particular, we can compute the dimensionless function
F/T = –lnZ, (4.98)
and then find the ratio J/Tc as the smallest value of the parameter J/T at which the ratio F/T (as a
function of h/T) has a minimum at zero field. However, for any system of a reasonable size N, the
“exact” computation of the statistical sum (97) is impossible, because it contains too many terms for any
supercomputer to handle. For example, let us take a relatively small 3D lattice with N = 10×10×10 = 10³
spins, which still features substantial boundary artifacts even using the periodic boundary conditions, so
its phase transition is smeared about Tc by ~3%. Still, even for such a crude model, Z would include

45For that, the interested reader may be referred to either Sec. 151 in the textbook by Landau and Lifshitz, or
Chapter 15 in the text by Huang.


21,000  (210)100  (103)100  10300 terms. Let us suppose we are using a modern exaflops-scale
supercomputer performing 1018 floating-point operations per second, i.e. ~1026 such operations per year.
With those resources, the computation of just one statistical sum would require ~10(300-26) = 10274 years.
To call such a number “astronomic” would be a strong understatement. (As a reminder, the age of our
Universe is close to 1.41010 years – a very humble number in comparison.)
This situation may be improved dramatically by noticing that any statistical sum,
Z = Σm exp{–Em/T}, (4.99)
is dominated by terms with lower values of Em. To find those lowest-energy states, we may use the
following powerful approach (belonging to a broad class of numerical Monte-Carlo techniques), which
essentially mimics one (randomly selected) path of the system’s evolution in time. One could argue that
for that we would need to know the exact laws of evolution of statistical systems,46 that may differ from
one system to another, even if their energy spectra Em are the same. This is true, but since the genuine
value of Z should be independent of these details, it may be evaluated using any reasonable kinetic
model that satisfies certain general rules. In order to reveal these rules, let us start from a system with
just two states, with energies Em and Em' ≡ Em + Δ – see Fig. 13.

W m'
E m'  E m  
  Fig. 4.13. Deriving the detailed
Em balance relation.
Wm

In the absence of quantum coherence between the states (see Sec. 2.1), the equations for the time
evolution of the corresponding probabilities Wm and Wm’ should depend only on the probabilities (plus
certain constant coefficients). Moreover, since the equations of quantum mechanics are linear, these
master equations should be also linear. Hence, it is natural to expect them to have the following form,
Master equations:
dWm/dt = Wm'Γ↓ – WmΓ↑,   dWm'/dt = WmΓ↑ – Wm'Γ↓, (4.100)
where the coefficients Γ↑ and Γ↓ have the physical sense of the rates of the corresponding transitions
(see Fig. 13); for example, Γ↑dt is the probability of the system's transition into the state m' during an
infinitesimal time interval dt, provided that at the beginning of that interval it was in the state m with full
certainty: Wm = 1, Wm' = 0.47 Since for the system with just two energy levels, the time derivatives of the
probabilities have to be equal and opposite, Eqs. (100) describe a redistribution of the probabilities
between the energy levels, while keeping their sum W = Wm + Wm’ constant. According to Eqs. (100), at
t  , the probabilities settle to their stationary values related as
Wm' 
 . (4.101)
Wm 

46 Discussion of such laws is the task of physical kinetics, which will be briefly reviewed in Chapter 6.
47 The calculation of these rates for several particular cases is described in QM Secs. 6.6, 6.7, and 7.6 – see, e.g.,
QM Eq. (7.196), which is valid for a very general model of a quantum system.


Now let us require these stationary values to obey the Gibbs distribution (2.58); from it
Wm'/Wm = exp{(Em – Em')/T} = exp{–Δ/T} < 1. (4.102)
Comparing these two expressions, we see that the rates have to satisfy the following detailed balance
relation:

Detailed balance:
Γ↑/Γ↓ = exp{–Δ/T}. (4.103)
Now comes the final step: since the rates of transition between two particular states should not depend
on other states and their occupation, Eq. (103) has to be valid for each pair of states of any multi-state
system. (By the way, this relation may serve as an important sanity check: the rates calculated using any
reasonable model of a quantum system have to satisfy it.)
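For the skeptical reader, this chain of statements is easy to test numerically: integrating Eqs. (100) with any pair of rates satisfying Eq. (103), from any initial condition, must reproduce the Gibbs ratio (102). A minimal sketch (plain Python; the values of Δ/T, the rate normalization, and the time step are arbitrary illustration choices):

import numpy as np

T, delta = 1.0, 0.7
G_up, G_down = np.exp(-delta / T), 1.0    # any pair with the ratio (103) would do
Wm, Wm2 = 1.0, 0.0                        # start fully in the lower state
dt = 0.01
for _ in range(5000):                     # crude Euler integration of Eqs. (100)
    flow = (Wm * G_up - Wm2 * G_down) * dt   # net probability flow m -> m'
    Wm, Wm2 = Wm - flow, Wm2 + flow
print(Wm2 / Wm, np.exp(-delta / T))       # the two numbers virtually coincide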
The detailed balance yields only one equation for the two rates Γ↑ and Γ↓; if our only goal is the
calculation of Z, the choice of the other equation is not too critical. A very simple choice is
Γ(Δ) = { 1, if Δ < 0; exp{–Δ/T}, otherwise, } (4.104)
where Δ is the energy change resulting from the transition. This model, which evidently satisfies the
detailed balance relation (103), is very popular (despite the unphysical cusp this function has at Δ = 0),
because it enables the following simple Metropolis algorithm (Fig. 14).

set up an initial state

- flip a random spin


- calculate 
- calculate  ()

generate random 
(0   1)

Fig. 4.14. A crude scheme of


reject  < compare  > accept the Metropolis algorithm for
spin flip  spin flip the Ising model simulation.

The calculation starts by setting a certain initial state of the system. At relatively high
temperatures, the state may be generated randomly; for example, in the Ising system, the initial state of
each spin sk may be selected independently, with a 50% probability. At low temperatures, starting the
calculations from the lowest-energy state (in particular, for the Ising model, from the ferromagnetic state
sk = sgn(h) = const) may give the fastest convergence. Now one spin is flipped at random, the


corresponding change  of the energy is calculated,48 and plugged into Eq. (104) to calculate (). Next,
a pseudo-random number generator is used to generate a random number , with the probability density
being constant on the segment [0, 1]. (Such functions are available in virtually any numerical library.) If
the resulting  is less than (), the transition is accepted, while if  > (), it is rejected. Physically,
this means that any transition down the energy spectrum ( < 0) is always accepted, while those up the
energy profile ( > 0) are accepted with the probability proportional to exp{–/T}.49 After sufficiently
many such steps, the statistical sum (99) may be calculated approximately as a partial sum over the
states passed by the system. (It may be better to discard the contributions from a few first steps, to avoid
the effects of the initial state choice.)
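For the reader's convenience, here is a minimal implementation of this scheme for the field-free 2D Ising model (plain Python with numpy; the lattice size, temperature, and number of steps are arbitrary illustration choices, and such practical details as equilibration control are omitted):

import numpy as np

rng = np.random.default_rng(0)
L, J, T = 16, 1.0, 2.0                   # T below Tc = 2.269 J, so order should emerge
s = rng.choice([-1, 1], size=(L, L))     # random initial state (fine at high T)

for step in range(200_000):
    i, j = rng.integers(L, size=2)       # flip a random spin: sum its 4 neighbors
    nb = s[(i+1) % L, j] + s[(i-1) % L, j] + s[i, (j+1) % L] + s[i, (j-1) % L]
    delta = 2 * J * s[i, j] * nb         # the energy change of the flip, from Eq. (23)
    if delta < 0 or rng.random() < np.exp(-delta / T):    # the acceptance rule (104)
        s[i, j] = -s[i, j]

print("average spin <s> =", s.mean())    # close to +1 or -1 below Tc, near 0 above it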
This algorithm is extremely efficient. Even with the modest computers available in the 1980s, it
has allowed simulating a 3D Ising system of (128)³ spins to get the following result: J/Tc ≈ 0.221650 ±
0.000005. For all practical purposes, this result is exact – so perhaps the largest benefit of the possible
future analytical solution of the infinite 3D Ising problem will be a virtually certain Nobel Prize for its
author. Table 2 summarizes the values of Tc for the Ising model. Very visible is the fast improvement of
the prediction accuracy of the molecular-field approximation, because it is asymptotically correct at d → ∞.

Table 4.2. The critical temperature Tc (in the units of J) of the Ising model
of a ferromagnet (J > 0), for several values of dimensionality d
d | Molecular-field approximation – Eq. (72) | Exact value | Exact value's source
0 | 0 | 0      | Gibbs distribution
1 | 2 | 0      | Transfer matrix theory
2 | 4 | 2.269… | Onsager's solution
3 | 6 | 4.513… | Numerical simulation

Finally, I need to mention the renormalization-group (“RG”) approach,50 despite its low
efficiency for Ising-type problems. The basic idea of this approach stems from the scaling law (30)-(31):
at T = Tc the correlation radius rc diverges. Hence, the critical temperature may be found from the
requirement for the system to be spatially self-similar. Namely, let us form larger and larger groups
(“blocks”) of adjacent spins, and require that all properties of the resulting system of the blocks
approach those of the initial system, as T approaches Tc.
Let us see how this idea works for the simplest nontrivial (1D) case described by the statistical
sum (80). Assuming N to be even (which does not matter at N → ∞), and adding an inconsequential
constant C to each exponent (for the purpose that will be clear soon), we may rewrite this expression as

48 Note that a flip of a single spin changes the signs of only (2d + 1) terms in the sum (23), i.e. does not require
the re-calculation of all (2d + 1)N terms of the sum, so the computation of Δ takes just a few multiply-and-
accumulate operations even at N >> 1.
49 The latter step is necessary to avoid the system’s trapping in local minima of its multidimensional energy
profile Em(s1, s2,…, sN).
50 Initially developed in the quantum field theory in the 1950s, it was adapted to statistics by L. Kadanoff in 1966,
with a spectacular solution of the so-called Kondo problem by K. Wilson in 1972, later awarded with a Nobel Prize.


Z = Σ_{sk=±1} Π_{k=1,2,…,N} exp{(h/2T)sk + (J/T)sksk+1 + (h/2T)sk+1 + C}. (4.105)

Let us group each pair of adjacent exponents to recast this expression as a product over only even
numbers k,
Z = Σ_{sk=±1} Π_{k=2,4,…,N} exp{(h/2T)sk–1 + (J/T)sk(sk–1 + sk+1) + (h/T)sk + (h/2T)sk+1 + 2C}, (4.106)
and carry out the summation over two possible states of the internal spin sk explicitly:
Z = Σ_{sk=±1} Π_{k=2,4,…,N} [ exp{(h/2T)sk–1 + (J/T)(sk–1 + sk+1) + h/T + (h/2T)sk+1 + 2C}
+ exp{(h/2T)sk–1 – (J/T)(sk–1 + sk+1) – h/T + (h/2T)sk+1 + 2C} ]
= Σ_{sk=±1} Π_{k=2,4,…,N} 2cosh{(J/T)(sk–1 + sk+1) + h/T} exp{(h/2T)(sk–1 + sk+1) + 2C}. (4.107)
Now let us require this statistical sum (and hence all statistical properties of the system of two-
spin blocks) to be identical to that of the Ising system of N/2 spins, numbered by odd k:
Z' = Σ_{sk=±1} Π_{k=2,4,…,N} exp{(J'/T)sk–1sk+1 + (h'/T)sk+1 + C'}, (4.108)

with some different parameters h', J', and C', for all four possible values of sk–1 = ±1 and sk+1 = ±1.
Since the right-hand side of Eq. (107) depends only on the sum (sk–1 + sk+1), this requirement yields only
three (rather than four) independent equations for finding h', J', and C'. Of them, the equations for h'
and J' depend only on h and J (but not on C),51 and may be represented in an especially simple form,

RG equations for 1D Ising model:
x' = x(1 + y)² / [(x + y)(1 + xy)],   y' = y(x + y) / (1 + xy), (4.109)
if the following notation is used:
x ≡ exp{–4J/T},   y ≡ exp{–2h/T}. (4.110)
Now the grouping procedure may be repeated, with the same result (109)-(110). Hence these
equations may be considered as recurrence relations describing repeated doubling of the spin block size.
Figure 15 shows (schematically) the trajectories of this dynamic system on the phase plane [x, y]. (Each
trajectory is defined by the following property: for each of its points {x, y}, the point {x’, y’} defined by
the “mapping” Eq. (109) is also on the same trajectory.) For the ferromagnetic coupling (J > 0) and h >
0, we may limit the analysis to the unit square 0 ≤ x, y ≤ 1. If this flow diagram had a stable fixed point
with x' = x = x∞ ≠ 0 (i.e. T/J < ∞) and y' = y = 1 (i.e. h = 0), then the first of Eqs. (110) would
immediately give us the critical temperature of the phase transition in the field-free system:

51 This might be expected because physically C is just a certain constant addition to the system’s energy.
However, the introduction of that constant was mathematically necessary, because Eqs. (107) and (108) may be
reconciled only if C' ≠ C.


Tc = 4J / ln(1/x∞). (4.111)
However, Fig. 15 shows that the only fixed point of the 1D system is x = y = 0, which (at a finite
coupling J) should be interpreted as Tc = 0. This is of course in agreement with the exact result of the
transfer-matrix analysis but does not provide much additional information.

y  exp 2h / T 
h0
1

T 0 T 
Fig. 4.15. The RG flow
diagram of the 1D Ising
0 h 1 x  exp{4 J / T } system (schematically).

Unfortunately, for higher dimensionalities, the renormalization-group approach rapidly becomes


rather cumbersome and requires certain approximations whose accuracy cannot be easily controlled. For
the 2D Ising system, such approximations lead to the prediction Tc ≈ 2.55 J, i.e. to a substantial
difference from the exact result (96).

4.6. Exercise problems

4.1. Calculate the entropy, the internal energy, and the specific heat cV of the van der Waals gas,
and discuss the results. For the gas with temperature-independent cV, find the relation between V and T
during an adiabatic process.

4.2. Use two different approaches to calculate the coefficient (∂E/∂V)T for the van der Waals gas,
and the change of temperature of such a gas, with a temperature-independent CV, at its very fast
expansion.

4.3. For real gases, the Joule-Thomson coefficient (∂T/∂P)H (and hence the gas temperature
change at its throttling, see Problem 1.11) inverts its sign at crossing the so-called inversion curve
Tinv(P). Calculate this curve for the van der Waals gas.

4.4. Calculate the difference CP – CV for the van der Waals gas, and compare the result with that
for an ideal classical gas.

4.5. Calculate the temperature dependence of the phase-equilibrium pressure P0(T) and the latent
heat (T), for the van der Waals model, in the low-temperature limit T << Tc.

4.6. Perform the same tasks as in the previous problem in the opposite limit – in close vicinity of
the critical point Tc.


4.7. Calculate CV and CP for the stable gas-liquid system described by the van der Waals
equation, for V = Vc and 0 < Tc – T << Tc.

4.8. Calculate the critical values Pc, Vc, and Tc for the so-called Redlich-Kwong model of the real
gas, with the following equation of state:52
a NT ,
P 
V V  Nb T 1/ 2
V  Nb
with constant parameters a and b.
Hint: Be prepared to solve a cubic equation with particular (numerical) coefficients.

4.9. Calculate the critical values Pc, Vc, and Tc for the phenomenological Dieterici model, with
the following equation of state:53
NT  a 
P exp ,
V b  NTV 
with constant parameters a and b. Compare the value of the dimensionless factor PcVc/NTc with those
given by the van der Waals and Redlich-Kwong models.

4.10. In the crude sketch shown in Fig. 3b, the derivatives dP/dT of the phase transitions liquid-
gas (“vaporization”) and solid-gas (“sublimation”), at the triple point, are different, with

(dPv/dT)T=Tt < (dPs/dT)T=Tt.
Is this accidental? What relation between these derivatives can be obtained from thermodynamics?

4.11. Use the Clapeyron-Clausius formula (17) to calculate the latent heat Λ of the Bose-Einstein
condensation, and compare the result with that obtained in the solution of Problem 3.21.

4.12. As was discussed in Sec. 4.1 of the lecture notes, properties of systems with first-order
phase transitions (such as the van der Waals gas) change qualitatively at the critical temperature: at T <
Tc, the system may include two different phases of the same substance. Since the difference in density of
these phases, in equilibrium, is a continuous function of the difference Tc – T, this change itself is
sometimes considered a continuous phase transition between the purely gaseous phase and the gas-plus-
liquid “phase”. From this viewpoint, what are the most reasonable analogs of the critical exponents , ,
and , defined in Sec. 4.2, for such a continuous transition? Evaluate these exponents for the van der
Waals model.

52 This equation of state, suggested in 1948, describes most real gases better than not only the original van der
Waals model, but also other two-parameter alternatives, such as the Berthelot, modified-Berthelot, and Dieterici
models, though some approximations with more fitting parameters (such as the Soave-Redlich-Kwong model)
work even better.
53 This model is currently less popular than the Redlich-Kwong one (also with two fitting parameters), whose
analysis was the task of the previous problem.


4.13.
(i) Compose the effective Hamiltonian for which the usual single-particle stationary Schrödinger equation coincides with the Gross-Pitaevskii equation (58).
(ii) Use this Gross-Pitaevskii Hamiltonian, with the trapping potential U(r) = mω²r²/2, to calculate the energy E of N >> 1 trapped particles, assuming the trial solution ψ ∝ exp{–r²/2r0²}, as a function of the parameter r0.54
(iii) Explore the function E(r0) for positive and negative values of the constant b, and interpret the results.
(iv) For small b < 0, estimate the largest number N of particles that may form a metastable Bose-Einstein condensate.

4.14. Superconductivity may be suppressed by a sufficiently strong magnetic field. In the simplest case of a bulk, long cylindrical sample of a type-I superconductor, placed into an external magnetic field Hext parallel to its surface, this suppression takes a simple form of a simultaneous transition of the whole sample from the superconducting state to the "normal" (non-superconducting) state at a certain value Hc(T) of the field's magnitude. This critical field gradually decreases with temperature from its maximum value Hc(0) at T → 0 to zero at the critical temperature Tc. Assuming that the function Hc(T) is known, calculate the latent heat of this phase transition as a function of temperature, and spell out its values at T → 0 and T = Tc.
Hint: In this context, "bulk sample" means a sample much larger than the intrinsic length scales of the superconductor (such as the London penetration depth δL and the coherence length ξ).55 For such bulk superconductors, magnetic properties of the superconducting phase may be well described just as the perfect diamagnetism, with B = 0 inside it.

4.15. In some textbooks, the discussion of thermodynamics of superconductivity is started by displaying, as self-evident, the following formula:
$$F_n(T) - F_s(T) = \frac{\mu_0 H_c^2(T)}{2}V,$$
where Fs and Fn are the free energy values in the superconducting and non-superconducting ("normal") phases, and Hc(T) is the critical value of the magnetic external field. Is this formula correct, and if not, what qualification is necessary to make it valid? Assume that all conditions of the simultaneous field-induced phase transition in the whole sample, spelled out in the previous problem, are satisfied.

4.16. Consider a ring of N = 3 Ising "spins" (sk = ±1), with similar ferromagnetic coupling J between all sites, in thermal equilibrium.
(i) Calculate the order parameter η and the low-field susceptibility χ ≡ ∂η/∂h|h=0.
(ii) Use the low-temperature limit of the result for χ to predict it for a ring with an arbitrary N, and verify your prediction by a direct calculation (in this limit).
(iii) Discuss the relation between the last result, in the limit N → ∞, and Eq. (91).

54 This task is essentially the first step of the variational method of quantum mechanics – see, e.g., QM Sec. 2.9.
55 A discussion of these parameters, as well as of the difference between the type-I and type-II superconductivity, may be found in EM Secs. 6.4-6.5. However, those details are not needed for the solution of this problem.


4.17. Calculate the average energy, entropy, and heat capacity of a three-site ring of Ising-type "spins" (sk = ±1), with anti-ferromagnetic coupling (of magnitude J) between the sites, in thermal equilibrium at temperature T, with no external magnetic field. Find the asymptotic behavior of its heat capacity for low and high temperatures, and give an interpretation of the results.

4.18. Using the results discussed in Sec. 5, calculate the average energy, free energy, entropy,
and heat capacity (all per one “spin”) as functions of temperature T and external field h, for the infinite
1D Ising model. Sketch the temperature dependence of the heat capacity for various values of the h/J
ratio, and give a physical interpretation of the result.

4.19. Calculate the specific heat (per “spin”) for the d-dimensional Ising problem in the absence
of the external field, in the molecular-field approximation. Sketch the temperature dependence of C and
compare it with the corresponding plot in the previous problem’s solution.

4.20. Prove that in the limit T  Tc, the molecular-field approximation, applied to the Ising
model with spatially-constant order parameter, gives results similar to those of Landau’s mean-field
theory with certain coefficients a and b. Calculate these coefficients, and list the critical exponents
defined by Eqs. (26), (28), (29), and (32), given by this approximation.

4.21. Assuming that the statistical sum ZN of a field-free, open-ended 1D Ising system of N
“spins” with arbitrary coefficients Jk is known, calculate ZN+1. Then use this relation to obtain an explicit
expression for ZN, and compare it with Eq. (88).

4.22. Use the molecular-field approximation to calculate the critical temperature and the low-field susceptibility of a d-dimensional cubic lattice of spins, described by the so-called classical Heisenberg model:56
$$E_m = -J\sum_{\{k,k'\}}\mathbf{s}_k\cdot\mathbf{s}_{k'} - \sum_k\mathbf{h}\cdot\mathbf{s}_k.$$
Here, in contrast to the (otherwise, very similar) Ising model (23), the spin of each site is described as a classical 3D vector sk = {sxk, syk, szk} of unit length: sk² = 1.

4.23. Use the molecular-field approximation to calculate the coefficient a of the Landau
expansion (46) for a 3D cubic lattice of spins described by the classical Heisenberg model (whose
analysis was the subject of the previous problem).

4.24. Use the molecular-field approximation to calculate the critical temperature of the ferromagnetic transition for the d-dimensional cubic Heisenberg lattice of arbitrary (either integer or half-integer) quantum spins s.
Hint: This model is described by Eq. (4.21) of the lecture notes, with σ̂ now meaning the vector operator of spin s, in units of Planck's constant ħ.

56 This classical model is formally similar to the generalization of the genuine (quantum) Heisenberg model (21)
to arbitrary spin s, and serves as its infinite-spin limit.


Chapter 5. Fluctuations
This chapter discusses fluctuations of macroscopic variables, mostly in thermodynamic equilibrium. In
particular, it describes the intimate connection between fluctuations and dissipation (damping) in
dynamic systems coupled to multi-particle environments. This connection culminates in the Einstein
relation between the diffusion coefficient and mobility, the Nyquist formula, and its quantum-
mechanical generalization – the fluctuation-dissipation theorem. An alternative approach to the same
problem, based on the Smoluchowski and Fokker-Planck equations, is also discussed in brief.

5.1. Characterization of fluctuations


At the beginning of Chapter 2, we discussed the notion of averaging a variable f over a statistical ensemble – see Eqs. (2.7) and (2.10). Now, the fluctuation of the variable is defined simply as its deviation from the average ⟨f⟩:
Fluctuation:
$$\tilde f \equiv f - \langle f\rangle; \tag{5.1}$$

this deviation is, generally, also a random variable. The most important property of any fluctuation is that its average over the same statistical ensemble equals zero; indeed:
$$\langle\tilde f\rangle = \langle f - \langle f\rangle\rangle = \langle f\rangle - \langle f\rangle = 0. \tag{5.2}$$

As a result, such an average cannot be used to characterize fluctuations' intensity, and the simplest meaningful characteristic of the intensity is the variance (sometimes called "dispersion"):
Variance (definition):
$$\langle\tilde f^2\rangle \equiv \langle(f - \langle f\rangle)^2\rangle. \tag{5.3}$$

The following simple property of the variance is frequently convenient for its calculation:
$$\langle\tilde f^2\rangle = \langle(f - \langle f\rangle)^2\rangle = \langle f^2 - 2f\langle f\rangle + \langle f\rangle^2\rangle = \langle f^2\rangle - 2\langle f\rangle^2 + \langle f\rangle^2, \tag{5.4a}$$
so, finally:
Variance via averages:
$$\langle\tilde f^2\rangle = \langle f^2\rangle - \langle f\rangle^2. \tag{5.4b}$$

As the simplest example, consider a variable that takes only two values, ±1, with equal probabilities Wj = ½. For such a variable, the basic Eq. (2.7) yields
$$\langle f\rangle = \sum_j W_j f_j = \frac{1}{2}(+1) + \frac{1}{2}(-1) = 0; \tag{5.5a}$$
however,
$$\langle f^2\rangle = \sum_j W_j f_j^2 = \frac{1}{2}(+1)^2 + \frac{1}{2}(-1)^2 = 1, \quad\text{so that}\quad \langle\tilde f^2\rangle = \langle f^2\rangle - \langle f\rangle^2 = 1 \neq 0. \tag{5.5b}$$
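As a quick sanity check of these formulas, here is a minimal numerical sketch (in Python, assuming only the standard NumPy library; the sample size and random seed are arbitrary illustrative choices) that samples this two-valued variable and compares the variance definition (3) with the shortcut (4b):

```python
# Sampling the two-valued variable f = +/-1 with equal probabilities, and
# comparing the variance definition (5.3) with the shortcut formula (5.4b).
import numpy as np

rng = np.random.default_rng(0)
f = rng.choice([-1.0, +1.0], size=1_000_000)   # W(+1) = W(-1) = 1/2

avg_f = f.mean()                               # <f>, close to 0
var_def = np.mean((f - avg_f) ** 2)            # Eq. (5.3)
var_short = np.mean(f ** 2) - avg_f ** 2       # Eq. (5.4b)

print(f"<f> = {avg_f:+.4f}, variance by (5.3) = {var_def:.4f}, "
      f"by (5.4b) = {var_short:.4f}")          # both close to 1, cf. Eq. (5.5b)
```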

The square root of the variance,
r.m.s. fluctuation:
$$\delta f \equiv \langle\tilde f^2\rangle^{1/2}, \tag{5.6}$$


is called the root-mean-square (r.m.s.) fluctuation. An advantage of this measure is that it has the same dimensionality as the variable itself, so the ratio δf/⟨f⟩ is dimensionless, and is used to characterize the relative intensity of fluctuations.
As has been mentioned in Chapter 1, all results of thermodynamics are valid only if the fluctuations of thermodynamic variables (the internal energy E, entropy S, etc.) are relatively small.1 Let us make a simple estimate of the relative intensity of fluctuations, taking as the simplest example a system of N independent, similar parts (e.g., particles), and an extensive variable
$$F \equiv \sum_{k=1}^N f_k, \tag{5.7}$$
where all single-particle functions fk are similar, besides that each of them depends on the state of only "its own" (kth) part. The statistical average of such F is evidently
$$\langle F\rangle = \sum_{k=1}^N\langle f_k\rangle = N\langle f\rangle, \tag{5.8}$$
while the variance of its fluctuations is
$$\langle\tilde F^2\rangle = \langle\tilde F\tilde F\rangle = \left\langle\sum_{k=1}^N\tilde f_k\sum_{k'=1}^N\tilde f_{k'}\right\rangle = \left\langle\sum_{k,k'=1}^N\tilde f_k\tilde f_{k'}\right\rangle = \sum_{k,k'=1}^N\langle\tilde f_k\tilde f_{k'}\rangle. \tag{5.9}$$

Now we may use the fact that for two independent variables,
$$\langle\tilde f_k\tilde f_{k'}\rangle = \langle\tilde f_k\rangle\langle\tilde f_{k'}\rangle = 0, \quad\text{for } k' \neq k; \tag{5.10}$$
indeed, the first of these equalities may be used as the mathematical definition of their independence. Hence, only the terms with k' = k make nonzero contributions to the right-hand side of Eq. (9):
$$\langle\tilde F^2\rangle = \sum_{k,k'=1}^N\langle\tilde f_k^2\rangle\,\delta_{k,k'} = N\langle\tilde f^2\rangle. \tag{5.11}$$

Comparing Eqs. (8) and (11), we see that the relative intensity of fluctuations of the variable F,
Relative fluctuation estimate:
$$\frac{\delta F}{\langle F\rangle} = \frac{1}{N^{1/2}}\frac{\delta f}{\langle f\rangle}, \tag{5.12}$$

tends to zero as the system size grows (N → ∞). It is this fact that justifies the thermodynamic approach to typical physical systems, with the number N of particles of the order of the Avogadro number NA ~ 10²⁴. Nevertheless, in many situations even small fluctuations of variables are important, and in this chapter, we will calculate their basic properties, starting with the variance.
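The scaling law (12) is easy to see in a small Monte Carlo experiment. The sketch below (assuming NumPy; the single-part average ⟨fk⟩ = 3 and the ensemble size are arbitrary illustrative choices) builds the extensive variable (7) from N two-valued parts and compares the measured relative fluctuation with Eq. (12):

```python
# Monte Carlo illustration of Eq. (5.12): the relative fluctuation of an
# extensive variable F (a sum of N independent parts) scales as 1/N^(1/2).
import numpy as np

rng = np.random.default_rng(1)
ensemble = 100_000                       # systems in the statistical ensemble

for N in (10, 100, 1_000, 10_000):
    # each part contributes f_k = 3 +/- 1; 'plus' counts the +1 outcomes
    plus = rng.binomial(N, 0.5, size=ensemble)
    F = 3.0 * N + (2 * plus - N)         # extensive variable of Eq. (5.7)
    print(f"N = {N:6d}: dF/<F> = {F.std() / F.mean():.5f}, "
          f"Eq. (12) prediction = {1 / (3 * np.sqrt(N)):.5f}")
```

Here δf/⟨f⟩ = 1/3 for each part, so Eq. (12) predicts δF/⟨F⟩ = 1/(3N^{1/2}), which the sampled numbers reproduce.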
It should be comforting for the reader to notice that for one very important case, such a calculation has already been done in our course. Indeed, for any generalized coordinate q and generalized momentum p that give quadratic contributions of the type (2.46) to the system's Hamiltonian (as in a harmonic oscillator), we have derived the equipartition theorem (2.48), valid in the classical limit. Since the average values of these variables, in the thermodynamic equilibrium, equal zero, Eq. (6) immediately yields their r.m.s. fluctuations:
$$\delta p = (mT)^{1/2}, \qquad \delta q = \left(\frac{T}{\kappa}\right)^{1/2} = \left(\frac{T}{m}\right)^{1/2}\frac{1}{\omega}, \quad\text{where }\omega \equiv \left(\frac{\kappa}{m}\right)^{1/2}. \tag{5.13}$$

1 Let me remind the reader that up to this point, the averaging signs ⟨…⟩ were dropped in most formulas, for the sake of notation simplicity. In this chapter, I have to restore these signs to avoid confusion. The only exception will be temperature – whose average, following (probably, bad :-) tradition, will be still called just T everywhere, besides the last part of Sec. 3, where temperature fluctuations are discussed explicitly.
The generalization of these classical relations to the quantum-mechanical case (T ~ ħω) is provided by Eqs. (2.78) and (2.81):
$$\delta p = \left(\frac{\hbar m\omega}{2}\coth\frac{\hbar\omega}{2T}\right)^{1/2}, \qquad \delta q = \left(\frac{\hbar}{2m\omega}\coth\frac{\hbar\omega}{2T}\right)^{1/2}. \tag{5.14}$$
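The classical-to-quantum crossover between Eqs. (13) and (14) may be traced numerically. In the sketch below (assuming NumPy, and using illustrative units with ħ = m = ω = 1), the quantum result (14) approaches the classical value (13) at T >> ħω, but saturates at the ground-state fluctuation (ħ/2mω)^{1/2} as T → 0:

```python
# Comparing the classical (5.13) and quantum (5.14) r.m.s. coordinate
# fluctuations of a harmonic oscillator, in units with hbar = m = omega = 1.
import numpy as np

m = omega = 1.0                      # illustrative parameter choices
kappa = m * omega**2

for T in (0.01, 0.1, 0.5, 1.0, 5.0, 50.0):
    dq_cl = np.sqrt(T / kappa)                                     # Eq. (5.13)
    dq_qm = np.sqrt(0.5 / (m * omega) / np.tanh(omega / (2 * T)))  # Eq. (5.14)
    print(f"T = {T:6.2f}: quantum dq = {dq_qm:.4f}, classical dq = {dq_cl:.4f}")
# At T >> hbar*omega the two values merge; at T -> 0 the quantum dq saturates
# at the ground-state value (hbar/2m*omega)^(1/2) ~ 0.7071.
```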
However, the intensity of fluctuations in other systems requires special calculations. Moreover,
only a few cases allow for general, model-independent results. Let us review some of them.

5.2. Energy and the number of particles


First of all, note that fluctuations of macroscopic variables depend on particular conditions.2 For example, in a mechanically- and thermally-insulated system with a fixed number of particles, i.e. a member of a microcanonical ensemble, the internal energy does not fluctuate: δE = 0. However, if such a system is in thermal contact with the environment, i.e. is a member of a canonical ensemble (Fig. 2.6), the situation is different. Indeed, for such a system we may apply the general Eq. (2.7), with Wm given by the Gibbs distribution (2.58)-(2.59), not only to E but also to E². As we already know from Sec. 2.4, the former average,
$$\langle E\rangle = \sum_m W_m E_m, \quad W_m = \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\}, \quad Z = \sum_m\exp\left\{-\frac{E_m}{T}\right\}, \tag{5.15}$$
yields Eq. (2.61b), which may be rewritten in the form
$$\langle E\rangle = -\frac{1}{Z}\frac{\partial Z}{\partial\beta}, \quad\text{where }\beta \equiv \frac{1}{T}, \tag{5.16}$$
which is more convenient for our current purposes. Let us carry out a similar calculation for ⟨E²⟩:
$$\langle E^2\rangle = \sum_m W_m E_m^2 = \frac{1}{Z}\sum_m E_m^2\exp\{-\beta E_m\}. \tag{5.17}$$

It is straightforward to verify, by double differentiation, that the last expression may be rewritten in a form similar to Eq. (16):
$$\langle E^2\rangle = \frac{1}{Z}\frac{\partial^2}{\partial\beta^2}\sum_m\exp\{-\beta E_m\} = \frac{1}{Z}\frac{\partial^2 Z}{\partial\beta^2}. \tag{5.18}$$
Now it is easy to use Eqs. (4) to calculate the variance of energy fluctuations:
$$\langle\tilde E^2\rangle = \langle E^2\rangle - \langle E\rangle^2 = \frac{1}{Z}\frac{\partial^2 Z}{\partial\beta^2} - \left(\frac{1}{Z}\frac{\partial Z}{\partial\beta}\right)^2 = \frac{\partial}{\partial\beta}\left(\frac{1}{Z}\frac{\partial Z}{\partial\beta}\right) = -\frac{\partial\langle E\rangle}{\partial\beta}. \tag{5.19}$$

2Unfortunately, even in some renowned textbooks, certain formulas pertaining to fluctuations are either incorrect
or given without specifying the conditions of their applicability, so the reader’s caution is advised.


Since Eqs. (15)-(19) are valid only if the system's volume V is fixed (because its change may affect the energy spectrum Em), it is customary to rewrite this important result as follows:
Fluctuations of E:
$$\langle\tilde E^2\rangle = -\frac{\partial\langle E\rangle}{\partial(1/T)} = T^2\left(\frac{\partial\langle E\rangle}{\partial T}\right)_V = C_V T^2. \tag{5.20}$$
This is a remarkably simple, fundamental result. As a sanity check, for a system of N similar, independent particles, ⟨E⟩ and hence CV are proportional to N, so Eq. (20) yields δE ∝ N^{1/2} and δE/⟨E⟩ ∝ N^{–1/2}, in agreement with Eq. (12). Let me emphasize that the classically-looking Eq. (20) is based on the general Gibbs distribution, and hence is valid for any system (either classical or quantum) in thermal equilibrium.
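Equation (20) may be verified numerically for any system with a known energy spectrum. The following sketch (assuming NumPy; the two-level system with levels 0 and Δ is an illustrative choice) computes both ⟨Ẽ²⟩ and CVT² directly from the Gibbs distribution (15):

```python
# Numerical check of Eq. (5.20), <E~^2> = C_V*T^2, for a two-level system
# with energy levels {0, Delta}, using the Gibbs distribution (5.15).
import numpy as np

Delta = 1.0                        # level splitting (arbitrary energy units)
E = np.array([0.0, Delta])

def averages(T):
    W = np.exp(-E / T)
    W /= W.sum()                   # Gibbs probabilities, Eq. (5.15)
    return (W * E).sum(), (W * E**2).sum()

for T in (0.2, 0.5, 1.0, 2.0):
    avg_E, avg_E2 = averages(T)
    var_E = avg_E2 - avg_E**2      # left-hand side of Eq. (5.20)
    dT = 1e-6                      # numerical derivative for C_V = d<E>/dT
    C_V = (averages(T + dT)[0] - averages(T - dT)[0]) / (2 * dT)
    print(f"T = {T}: <E~^2> = {var_E:.6f}, C_V*T^2 = {C_V * T**2:.6f}")
```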
Some corollaries of this result will be discussed in the next section, and now let us carry out a very similar calculation for a system whose number N of particles is not fixed, because particles may be exchanged with its environment at will. If the chemical potential μ of the environment and its temperature T are fixed, i.e. we are dealing with the grand canonical ensemble (Fig. 2.13), we may use the grand canonical distribution (2.106)-(2.107):
$$W_{m,N} = \frac{1}{Z_G}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}, \quad Z_G = \sum_{N,m}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \tag{5.21}$$
Acting exactly as we did above for the internal energy, we get
$$\langle N\rangle = \frac{1}{Z_G}\sum_{m,N}N\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\} = \frac{T}{Z_G}\frac{\partial Z_G}{\partial\mu}, \tag{5.22}$$
$$\langle N^2\rangle = \frac{1}{Z_G}\sum_{m,N}N^2\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\} = \frac{T^2}{Z_G}\frac{\partial^2 Z_G}{\partial\mu^2}, \tag{5.23}$$
so the particle number's variance is
Fluctuations of N:
$$\langle\tilde N^2\rangle = \langle N^2\rangle - \langle N\rangle^2 = \frac{T^2}{Z_G}\frac{\partial^2 Z_G}{\partial\mu^2} - \frac{T^2}{Z_G^2}\left(\frac{\partial Z_G}{\partial\mu}\right)^2 = T\frac{\partial}{\partial\mu}\left(\frac{T}{Z_G}\frac{\partial Z_G}{\partial\mu}\right) = T\frac{\partial\langle N\rangle}{\partial\mu}, \tag{5.24}$$
in full analogy with Eq. (19).3


In particular, for an ideal classical gas, we may combine the last result with Eq. (3.32b). (As was already emphasized in Sec. 3.2, though that result has been obtained for the canonical ensemble, in which the number of particles N is fixed, at N >> 1 the fluctuations of N in the grand canonical ensemble should be relatively small, so the same relation should be valid for the average ⟨N⟩ in that ensemble.) Easily solving Eq. (3.32b) for N, we get
$$\langle N\rangle = \text{const}\times\exp\left\{\frac{\mu}{T}\right\}, \tag{5.25}$$
where "const" means a factor constant at the partial differentiation of ⟨N⟩ over μ, required by Eq. (24). Performing the differentiation and then using Eq. (25) again,
$$\frac{\partial\langle N\rangle}{\partial\mu} = \text{const}\times\frac{1}{T}\exp\left\{\frac{\mu}{T}\right\} = \frac{\langle N\rangle}{T}, \tag{5.26}$$
we get from Eq. (24) a very simple result:
Fluctuations of N (classical gas):
$$\langle\tilde N^2\rangle = \langle N\rangle, \quad\text{i.e. }\delta N = \langle N\rangle^{1/2}. \tag{5.27}$$

3 Note, however, that for the grand canonical ensemble, Eq. (19) is generally invalid.

This relation is so important that I will also show how it may be derived differently. As a by-product of this new derivation, we will prove that this result is valid for systems with an arbitrary (say, small) N, and also get more detailed information about the statistics of fluctuations of that number. Let us consider an ideal classical gas of N0 particles in a volume V0, and calculate the probability WN to have exactly N ≤ N0 of these particles in its part of volume V ≤ V0 – see Fig. 1.

Fig. 5.1. Deriving the binomial, Poisson, and Gaussian distributions: a part, of volume V, of the total volume V0, containing N of the N0 gas particles.

For one particle such probability is obviously W = V/V0 = ⟨N⟩/N0 ≤ 1, while the probability to have that particle in the remaining part of the volume is W' = 1 – W = 1 – ⟨N⟩/N0. If all particles are distinct, the probability of having N ≤ N0 specific particles in volume V and (N0 – N) specific particles in volume (V0 – V) is W^N W'^(N0–N). However, if we do not want to distinguish the particles, we should multiply this probability by the number of possible particle combinations keeping the numbers N and N0 constant, i.e. by the binomial coefficient N0!/[N!(N0 – N)!].4 As the result, the required probability is
Binomial distribution:
$$W_N = W^N W'^{\,N_0-N}\frac{N_0!}{N!(N_0-N)!} = \left(\frac{\langle N\rangle}{N_0}\right)^N\left(1-\frac{\langle N\rangle}{N_0}\right)^{N_0-N}\frac{N_0!}{N!(N_0-N)!}. \tag{5.28}$$

This is the so-called binomial probability distribution,5 valid for any ⟨N⟩ and N0.
Still keeping ⟨N⟩ arbitrary, we can simplify the binomial distribution by assuming that the whole volume V0, and hence N0, are very large:
$$N_0 >> N, \tag{5.29}$$
where N means any value of interest, including ⟨N⟩. Indeed, in this limit we can neglect N in comparison with N0 in the second exponent of Eq. (28), and also approximate the fraction N0!/(N0 – N)!, i.e. the product of N terms, (N0 – N + 1)(N0 – N + 2)…(N0 – 1)N0, by just N0^N. As a result, we get
$$W_N \approx \left(\frac{\langle N\rangle}{N_0}\right)^N\left(1-\frac{\langle N\rangle}{N_0}\right)^{N_0}\frac{N_0^N}{N!} = \frac{\langle N\rangle^N}{N!}(1-W)^{N_0} = \frac{\langle N\rangle^N}{N!}\left[(1-W)^{1/W}\right]^{\langle N\rangle}, \tag{5.30}$$
where, as before, W = ⟨N⟩/N0. In the limit (29), W → 0, so the factor inside the square brackets tends to 1/e, the reciprocal of the natural logarithm base.6 Thus, we get an expression independent of N0:
Poisson distribution:
$$W_N = \frac{\langle N\rangle^N}{N!}e^{-\langle N\rangle}. \tag{5.31}$$

4 See, e.g., MA Eq. (2.2).
5 It was derived by Jacob Bernoulli (1655-1705).
This is the much-celebrated Poisson distribution7 which describes a very broad family of random phenomena. Figure 2 shows this distribution for several values of ⟨N⟩ – which, in contrast to N, is not necessarily an integer.

Fig. 5.2. The Poisson distribution for several values of ⟨N⟩ (0.1, 1, 2, 4, and 8). In contrast to that average, the argument N may take only integer values, so that the lines in these plots are only guides for the eye.

In the limit of very small ⟨N⟩, the function WN(N) is close to an exponent, WN ≈ W^N ∝ ⟨N⟩^N, while in the opposite limit, ⟨N⟩ >> 1, it rapidly approaches the Gaussian (or "normal") distribution8
Gaussian distribution:
$$W_N \approx \frac{1}{(2\pi)^{1/2}\delta N}\exp\left\{-\frac{(N-\langle N\rangle)^2}{2(\delta N)^2}\right\}. \tag{5.32}$$
(Note that the Gaussian distribution is also valid if both N and N0 are large, regardless of the relation between them – see Fig. 3.)

Fig. 5.3. The hierarchy of the three major probability distributions: the binomial distribution (28) reduces to the Poisson distribution (31) at N << N0, and both reduce to the Gaussian distribution (32) at 1 << N, N0 (for the binomial case) or 1 << N (for the Poisson case).

6 Indeed, this is just the most popular definition of that major mathematical constant – see, e.g., MA Eq. (1.2a) with n = –1/W.
7 Named after the same Siméon Denis Poisson (1781-1840) who is also responsible for other major mathematical results and tools used in this series, including the Poisson equation – see, e.g., Sec. 6.4 below.
8 Named after Carl Friedrich Gauss (1777-1855), though Pierre-Simon Laplace (1749-1827) is credited for substantial contributions to its development.


A major property of the Poisson (and hence of the Gaussian) distribution is that it has the same variance as given by Eq. (27):
$$\langle\tilde N^2\rangle \equiv \langle(N - \langle N\rangle)^2\rangle = \langle N\rangle. \tag{5.33}$$
(This is not true for the general binomial distribution.) For our current purposes, this means that for the ideal classical gas, Eq. (27) is valid for any number of particles.
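The property (33) is easy to confirm by direct sampling (a minimal sketch using NumPy's built-in Poisson generator; the values of ⟨N⟩ follow those of Fig. 2):

```python
# Sampling check of Eq. (5.33): for the Poisson distribution (5.31), the
# variance of N equals its average, for any <N> (not necessarily large).
import numpy as np

rng = np.random.default_rng(2)
for mean_N in (0.1, 1.0, 2.0, 4.0, 8.0):
    N = rng.poisson(lam=mean_N, size=1_000_000)
    print(f"<N> = {mean_N:4.1f}: sample mean = {N.mean():.4f}, "
          f"sample variance = {N.var():.4f}")   # the two should coincide
```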

5.3. Volume and temperature


What are the r.m.s. fluctuations of other thermodynamic variables – like V, T, etc.? Again, the answer depends on specific conditions. For example, if the volume V occupied by a gas is externally fixed (say, by rigid walls), it obviously does not fluctuate at all: δV = 0. On the other hand, the volume may fluctuate in the situation when the average pressure is fixed – see, e.g., Fig. 1.5. A formal calculation of these fluctuations, using the approach applied in the last section, is complicated by the fact that in most cases of interest, it is physically impracticable to fix its conjugate variable P, i.e. suppress its fluctuations. For example, the force F(t) exerted by an ideal classical gas on a container's wall (whose measure the pressure is) is the result of individual, independent hits of the wall by particles (Fig. 4), with the time scale τc ~ rB/⟨v²⟩^{1/2} ~ rB/(T/m)^{1/2} ~ 10⁻¹⁶ s, so its spectrum extends to very high frequencies, virtually impossible to control.

Fig. 5.4. The force F(t) exerted by gas particles on a container's wall, as a function of time (schematically); only its average ⟨F⟩ is well defined on long time scales.

However, we can use the following trick, very typical for the theory of fluctuations. It is almost evident that the r.m.s. fluctuations of the gas volume are independent of the shape of the container. Let us consider a particular situation similar to that shown in Fig. 1.5, with a container of a cylindrical shape, with the base area A.9 Then the coordinate of the piston is just q = V/A, while the average force exerted by the gas on the piston is ⟨F⟩ = ⟨P⟩A – see Fig. 5. Now if the piston is sufficiently massive, the frequency Ω of its free oscillations near the equilibrium position is low enough to satisfy the following three conditions.
First, besides balancing the average force ⟨F⟩ and thus sustaining the average pressure ⟨P⟩ of the gas, the interaction between the heavy piston and the relatively light particles of the gas is weak, because of a relatively short duration of the particle hits (Fig. 4). As a result, the full energy of the system may be represented as a sum of those of the particles and the piston, with a quadratic contribution to the piston's potential energy by small deviations from the equilibrium:

9 As a math reminder, the term "cylinder" does not necessarily mean the "circular cylinder"; the shape of its cross-section may be arbitrary; it just should not change with height.


$$U_p = \frac{\kappa}{2}\tilde q^2, \quad\text{where }\tilde q \equiv q - \langle q\rangle = \frac{\tilde V}{A}, \tag{5.34}$$
and κ is the effective spring constant arising from the finite compressibility of the gas.

Fig. 5.5. Deriving Eq. (37): a gas in a cylinder of base area A, closed with a heavy piston of mass M; q = V/A, V = ⟨V⟩ + Ṽ(t), and ⟨F⟩ = ⟨P⟩A.

Second, at Ω = (κ/M)^{1/2} → 0, this spring constant may be calculated just as for constant variations of the volume, with the gas remaining in quasi-equilibrium at all times:
$$\kappa = -\frac{\partial\langle F\rangle}{\partial q} = -A^2\frac{\partial\langle P\rangle}{\partial V}. \tag{5.35}$$
This partial derivative10 should be calculated at whatever the thermal conditions are, e.g., with S = const for adiabatic conditions (i.e., a thermally insulated gas), or with T = const for isothermal conditions (including a good thermal contact between the gas and a heat bath), etc. With that constant denoted as X, Eqs. (34)-(35) give
$$U_p = \frac{1}{2}\left[-A^2\left(\frac{\partial\langle P\rangle}{\partial V}\right)_X\right]\left(\frac{\tilde V}{A}\right)^2 = \frac{1}{2}\left[-\left(\frac{\partial\langle P\rangle}{\partial V}\right)_X\right]\tilde V^2. \tag{5.36}$$

Finally, assuming that Ω is also small in the sense ħΩ << T, we may apply, to the piston's fluctuations, the classical equipartition theorem: ⟨Up⟩ = T/2, giving11
Fluctuations of V:
$$\langle\tilde V^2\rangle_X = -T\left(\frac{\partial V}{\partial\langle P\rangle}\right)_X. \tag{5.37a}$$
Since this result is valid for any A and Ω, it should not depend on the system's geometry and the piston's mass, provided that this mass is large in comparison with the effective mass of a single system component (say, a gas molecule) – the condition that is naturally fulfilled in most experiments. For the particular case of fluctuations at constant temperature (X = T),12 we may use the definition (3.58) of the isothermal bulk modulus (reciprocal compressibility) KT of the gas to rewrite Eq. (37a) as
$$\langle\tilde V^2\rangle_T = \frac{TV}{K_T}. \tag{5.37b}$$

10 As already was discussed in Sec. 4.1 in the context of the van der Waals equation, for the mechanical stability of a gas (or liquid), the derivative ∂P/∂V has to be negative, so κ is positive.
11 One may meet statements that a similar formula,
$$\langle\tilde P^2\rangle_X = -T\left(\frac{\partial\langle P\rangle}{\partial V}\right)_X, \quad\text{(WRONG!)}$$
is valid for pressure fluctuations. However, this equality does not take into account a different physical nature of pressure (Fig. 4), with its very broad frequency spectrum. This issue will be discussed later in this chapter.
For an ideal classical gas of N particles, with the equation of state V = NT/P, it is easier to use directly Eq. (37a), again with X = T, to get
$$\langle\tilde V^2\rangle_T = T\frac{NT}{P^2} = \frac{V^2}{N}, \quad\text{i.e. }\frac{\delta V}{V} = \frac{1}{N^{1/2}}, \tag{5.38}$$
in full agreement with the general trend given by Eq. (12).
Now let us proceed to fluctuations of temperature, for simplicity focusing on the case V = const. Let us again assume that the sample we are considering is weakly coupled to a heat bath of temperature T0, in the sense that the time τ of temperature equilibration between the two is much larger than the time of the internal equilibration, called thermalization. Then we may assume that, on the former time scale, temperature T describes the whole sample, though it may fluctuate:
$$T = \langle T\rangle + \tilde T. \tag{5.39}$$
Moreover, due to the (relatively) large τ, we may use the stationary relation between small fluctuations of temperature and the internal energy of the system:
$$\tilde T = \frac{\tilde E}{C_V}, \quad\text{so that }\delta T = \frac{\delta E}{C_V}. \tag{5.40}$$
With those assumptions, Eq. (20) immediately yields the famous expression for the so-called thermodynamic fluctuations of temperature:
Fluctuations of T:
$$\delta T = \frac{\delta E}{C_V} = \frac{T}{C_V^{1/2}}. \tag{5.41}$$
The most straightforward application of this result is to analyses of so-called bolometers – broadband detectors of electromagnetic radiation in microwave and infrared frequency bands. (In particular, they are used for measurements of the CMB radiation, which was discussed in Sec. 2.6.) In such a detector (Fig. 6), the incoming radiation is focused on a small sensor (e.g., a small piece of a germanium crystal, a superconducting thin film at temperature T ≈ Tc, etc.), which is well isolated thermally from the environment. As a result, the absorption of even a small radiation power P leads to a noticeable change ΔT of the sensor's average temperature ⟨T⟩ and hence of its electric resistance R, which is probed by low-noise external electronics.13 If the power does not change in time too fast, ΔT is a certain function of P, turning to 0 at P = 0. Hence, if ΔT is much lower than the environment temperature T0, we may keep only the main, linear term in its Taylor expansion in small P:

12 In this case, we may also use the second of Eqs. (1.39) to rewrite Eq. (37) via the second derivative (∂²G/∂P²)T.
13 Besides low internal electric noise, a good sensor should have a sufficiently large temperature responsivity dR/dT, making the noise contribution by the readout electronics insignificant – see below.


$$\Delta T \equiv \langle T\rangle - T_0 = \frac{P}{G}, \tag{5.42}$$
where the coefficient G ≡ P/ΔT is called the thermal conductance of the (perhaps unintentional but unavoidable) thermal coupling between the sensor and the heat bath – see Fig. 6.

Fig. 5.6. The conceptual scheme of a bolometer: a sensor of resistance R(T) and temperature T = ⟨T⟩ + T̃(t), with ⟨T⟩ = T0 + ΔT, absorbs the radiation power P and is coupled, via the thermal conductance G, to a heat bath of temperature T0; its resistance is probed by readout electronics.

The power may be detected if the electric signal from the sensor, which results from the change ΔT, is not drowned in spontaneous fluctuations. In practical systems, these fluctuations are contributed by several sources including electronic amplifiers. However, in modern systems, these "technical" contributions to noise are successfully suppressed,14 and the dominating noise source is the fundamental sensor temperature fluctuations described by Eq. (41). In this case, the so-called noise-equivalent power ("NEP"), defined as the level of P that produces the signal equal to the r.m.s. value of noise, may be calculated by equating the expressions (41) (with T = T0) and (42):
$$\text{NEP} \equiv P\big|_{\Delta T = \delta T} = \frac{T_0 G}{C_V^{1/2}}. \tag{5.43}$$
This expression shows that to decrease the NEP, i.e. improve the detector's sensitivity, both the environment temperature T0 and the thermal conductance G should be reduced. In modern receivers of radiation, their typical values are of the order of 0.1 K and 10⁻¹⁰ W/K, respectively.
On the other hand, Eq. (43) implies that to increase the bolometer's sensitivity, i.e. to reduce the NEP, the CV of the sensor, and hence its mass, should be increased. This conclusion is valid only to a certain extent, because due to technical reasons (parameter drifts and the so-called 1/f noise of the sensor and external electronics), the incoming power has to be modulated with as high a frequency ω as technically possible (in practical receivers, the cyclic frequency ν = ω/2π of the modulation is between 10 and 1,000 Hz), so the electrical signal might be picked up from the sensor at that frequency. As a result, the CV may be increased only until the thermal relaxation constant of the sensor,
$$\tau = \frac{C_V}{G}, \tag{5.44}$$
becomes close to 1/ω, because at ωτ >> 1 the useful signal drops faster than noise. So, the lowest (i.e. the best) values of the NEP,
$$(\text{NEP})_{\min} = \beta\,T_0(G\omega)^{1/2}, \quad\text{with }\beta \sim 1, \tag{5.45}$$
are reached at ωτ ≈ 1. (The exact values of the optimal product ωτ, and of the numerical constant β ~ 1 in Eq. (45), depend on the exact law of the power modulation and the readout signal processing procedure.) With the parameters cited above, this estimate yields (NEP)min/Δν^{1/2} ~ 3×10⁻¹⁷ W/Hz^{1/2} – a very low power indeed.

14 An important trend in this progress [see, e.g., P. Day et al., Nature 425, 817 (2003)] is the replacement of the resistive temperature sensors R(T) with thin and narrow superconducting strips with temperature-sensitive kinetic inductance Lk(T) – see the model solution of EM Problem 6.20. Such inductive sensors have zero dc resistance, and hence vanishing Johnson-Nyquist noise at typical signal pickup frequencies of a few kHz – see Eq. (81) and its discussion below.
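This estimate may be reproduced in a few lines (a sketch in SI units, taking β = 1 and the representative values T0 = 0.1 K, G = 10⁻¹⁰ W/K, and ν = 10 Hz quoted above):

```python
# Reproducing the order-of-magnitude estimate following Eq. (5.45), in SI
# units: (NEP)_min ~ T0*(kB*G*omega)^(1/2), with beta taken equal to 1.
import math

kB = 1.380649e-23      # Boltzmann constant, J/K
T0 = 0.1               # heat bath temperature, K (typical value from text)
G = 1e-10              # thermal conductance, W/K (typical value from text)
nu = 10.0              # modulation frequency, Hz (lower end of the range)
omega = 2 * math.pi * nu

NEP_min = T0 * math.sqrt(kB * G * omega)
print(f"(NEP)_min ~ {NEP_min:.1e} W")   # ~3e-17 W, as quoted in the text
```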
However, perhaps counter-intuitively, the power modulation allows the bolometric (and other broadband) receivers to register radiation with power much lower than this NEP! Indeed, picking up the sensor signal at the modulation frequency ω, we can use the subsequent electronics stages to filter out all the noise besides its components within a very narrow band, of width Δν << ν, around the modulation frequency (Fig. 7). This is the idea of a microwave radiometer,15 currently used in all sensitive broadband receivers of radiation.

input
power modulation
frequency 

  
noise density

0 Fig. 5.7. The basic idea of the Dicke


frequency radiometer.
pick-up
to output

In order to analyze this opportunity, we need to develop theoretical tools for a quantitative
description of the spectral distribution of fluctuations. Another motivation for that description is a need
for analysis of variables dominated by fast (high-frequency) components, such as pressure – please have
one more look at Fig. 4. Finally, during such an analysis, we will run into the fundamental relation
between fluctuations and dissipation, which is one of the main results of statistical physics as a whole.

5.4. Fluctuations as functions of time


In most discussions of the previous three sections, the averaging ⟨…⟩ of variables was assumed to be over an appropriate statistical ensemble of many similar systems. However, as was discussed in Sec. 2.1, most physical systems of interest are ergodic. If such a system is also stationary, i.e. the statistical averages of its variables do not change with time, the averaging may be also understood as that over a sufficiently long time interval. In this case, we may think about fluctuations of any variable f as about a random process taking place in just one particular system, but developing in time: f̃ = f̃(t).
There are two mathematically equivalent approaches to the description of such random functions of time, called the time-domain picture and the frequency-domain picture, their relative convenience depending on the particular problem to be solved. In the time domain, we need to characterize random fluctuations f̃(t) by some deterministic function of time. Evidently, the average ⟨f̃(t)⟩ cannot be used for this purpose, because it equals zero – see Eq. (2). Of course, the variance (3) does not have to equal zero, but if the system is stationary, that average cannot depend on time either. Because of that, let us consider the following average:
$$\langle\tilde f(t)\tilde f(t')\rangle. \tag{5.46}$$

15 It was pioneered in the 1950s by Robert Henry Dicke, so the device is frequently called the Dicke radiometer. Note that the optimal strategy of using similar devices for time- and energy-resolved detection of single high-energy photons is different – though even it is essentially based on Eq. (41). For a recent brief review of such detectors see, e.g., K. Morgan, Phys. Today 71, 29 (Aug. 2018), and references therein.

Generally, this is a function of two arguments. However, in a stationary system, the average (46) may depend only on the difference,
$$\tau \equiv t' - t, \tag{5.47}$$
between the two observation times. In this case, the average (46) is called the correlation function of the variable f:
Correlation function:
$$K_f(\tau) \equiv \langle\tilde f(t)\tilde f(t+\tau)\rangle. \tag{5.48}$$

Again, here the averaging may be understood as that either over a statistical ensemble of macroscopically similar systems or over a sufficiently long interval of the time argument t, with the argument τ kept constant. The correlation function's name16 catches the idea of this notion very well: Kf(τ) characterizes the mutual relation between the fluctuations of the variable f at two times separated by the given interval τ. Let us list the basic properties of this function.17
First of all, Kf(τ) has to be an even function of the time delay τ. Indeed, we may write
$$K_f(-\tau) = \langle\tilde f(t)\tilde f(t-\tau)\rangle = \langle\tilde f(t-\tau)\tilde f(t)\rangle = \langle\tilde f(t')\tilde f(t'+\tau)\rangle, \tag{5.49}$$
with t' ≡ t – τ. For stationary processes, this average cannot depend on the common shift of two observation times t and t', so the averages (48) and (49) have to be equal:
$$K_f(-\tau) = K_f(\tau). \tag{5.50}$$

Second, at τ → 0 the correlation function tends to the variance:
$$K_f(0) = \langle\tilde f(t)\tilde f(t)\rangle = \langle\tilde f^2\rangle > 0. \tag{5.51}$$
In the opposite limit, when τ is much larger than a certain characteristic correlation time τc of the system,18 the correlation function has to tend to zero because the fluctuations separated by such time interval are virtually independent (uncorrelated) – see Eq. (10). As a result, the correlation function typically looks like one of the plots sketched in Fig. 8.

16 Another term, the autocorrelation function, is sometimes used for the average (48) to distinguish it from the mutual correlation function, ⟨f̃1(t)f̃2(t + τ)⟩, of two different stationary processes.
17 Note that this correlation function is the direct temporal analog of the spatial correlation function briefly discussed in Sec. 4.2 – see Eq. (4.30).
18 Note that the correlation time τc is the direct temporal analog of the correlation radius rc that was discussed in Sec. 4.2 – see the same Eq. (4.30).


K f ( )
fˆ 2

Fig. 5.8. The correlation function of


 c 0 c  fluctuations: two typical examples.

Note that on a time scale much longer than τc, any physically-realistic correlation function may be well approximated with a delta function of τ. For example, for a process that is a sum of independent very short pulses, e.g., the gas pressure force exerted on the container wall (Fig. 4), this approximation is legitimate on time scales much longer than the single pulse duration, e.g., the time of the particle's interaction with the wall at the impact.
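For a hands-on illustration of the properties (50)-(51) and of the decay on the scale τc, the correlation function may be estimated by time averaging over a single long realization, in the spirit of the ergodicity argument above. The sketch below (assuming NumPy) uses, as an illustrative choice, exponentially correlated noise generated by a simple one-step recurrence:

```python
# Estimating K_f(tau) of a stationary random process by time averaging.
# The process is exponentially correlated noise with correlation time tau_c,
# generated by low-pass filtering of discrete white noise.
import numpy as np

rng = np.random.default_rng(3)
dt, tau_c, n = 0.01, 1.0, 500_000
a = np.exp(-dt / tau_c)                   # one-step memory factor

noise = rng.normal(scale=np.sqrt(1.0 - a * a), size=n)
f = np.empty(n)
f[0] = 0.0
for j in range(n - 1):
    f[j + 1] = a * f[j] + noise[j]        # unit stationary variance

f -= f.mean()                             # work with the fluctuation f~
for lag in (0, 50, 100, 200, 400):        # tau = lag*dt; by Eq. (5.50),
    K = np.mean(f[:n - lag] * f[lag:])    # the same value is found at -tau
    print(f"tau = {lag * dt:4.1f}: K_f = {K:+.4f}, "
          f"expected exp(-tau/tau_c) = {np.exp(-lag * dt / tau_c):.4f}")
# K_f(0) reproduces the variance, Eq. (5.51), and K_f decays on the scale tau_c.
```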
In the reciprocal, frequency domain, the same process f̃(t) is represented as a Fourier integral,19
$$\tilde f(t) = \int_{-\infty}^{+\infty} f_\omega e^{-i\omega t}\,d\omega, \tag{5.52}$$
with the reciprocal transform being
$$f_\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\tilde f(t)\,e^{i\omega t}\,dt. \tag{5.53}$$
If the function f̃(t) is random (as it is in the case of fluctuations), with zero average, its Fourier transform fω is also a random function (now, of frequency), also with a vanishing statistical average. Indeed, now thinking of the operation ⟨…⟩ as an ensemble averaging, we may write
$$\langle f_\omega\rangle = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\langle\tilde f(t)\rangle e^{i\omega t}\,dt = 0. \tag{5.54}$$

The simplest non-zero average may be formed similarly to Eq. (46), but with due respect to the complex-variable character of the Fourier images:
$$\langle f_\omega f^*_{\omega'}\rangle = \frac{1}{(2\pi)^2}\int_{-\infty}^{+\infty}dt\int_{-\infty}^{+\infty}dt'\,\langle\tilde f(t)\tilde f(t')\rangle\,e^{-i(\omega' t' - \omega t)}. \tag{5.55}$$

It turns out that for a stationary process, the averages (46) and (55) are directly related. Indeed, since the integration over t' in Eq. (55) is in infinite limits, we may replace it with the integration over τ ≡ t' – t (at fixed t), also in infinite limits. Replacing t' with t + τ in the expressions under the integral, we see that the average is just the correlation function Kf(τ), while the time exponent is equal to exp{i(ω – ω')t}exp{–iω'τ}. As a result, changing the order of integration, we get
$$\langle f_\omega f^*_{\omega'}\rangle = \frac{1}{(2\pi)^2}\int_{-\infty}^{+\infty}d\tau\,K_f(\tau)\,e^{-i\omega'\tau}\int_{-\infty}^{+\infty}e^{i(\omega-\omega')t}\,dt. \tag{5.56}$$

19 The argument of the function fω is represented as its index to emphasize that this function is different from f̃(t), while (very conveniently) still using the same letter for the same variable.


But the last integral is just 2πδ(ω – ω'),20 so we finally get
$$\langle f_\omega f^*_{\omega'}\rangle = S_f(\omega)\,\delta(\omega - \omega'), \tag{5.57}$$
where the real function of frequency,
Spectral density of fluctuations:
$$S_f(\omega) \equiv \frac{1}{2\pi}\int_{-\infty}^{+\infty}K_f(\tau)\,e^{-i\omega\tau}\,d\tau = \frac{1}{\pi}\int_0^{\infty}K_f(\tau)\cos\omega\tau\,d\tau, \tag{5.58}$$
is called the spectral density of fluctuations at frequency ω. According to Eq. (58), the spectral density is just the Fourier image of the correlation function, and hence the reciprocal Fourier transform is:21,22
Wiener-Khinchin theorem:
$$K_f(\tau) = \int_{-\infty}^{+\infty}S_f(\omega)\,e^{i\omega\tau}\,d\omega = 2\int_0^{\infty}S_f(\omega)\cos\omega\tau\,d\omega. \tag{5.59}$$

In particular, for the fluctuation variance, Eq. (59) yields
$$\langle\tilde f^2\rangle = K_f(0) = \int_{-\infty}^{+\infty}S_f(\omega)\,d\omega = 2\int_0^{\infty}S_f(\omega)\,d\omega. \tag{5.60}$$
The last relation shows that the term "spectral density" describes the physical sense of the function Sf(ω) very well. Indeed, if a random signal f(t) had been passed through a frequency filter with a small bandwidth Δν << ν of positive cyclic frequencies, the integral in the last form of Eq. (60) could be limited to the interval Δω = 2πΔν, i.e. the variance of the filtered signal would become
$$\langle\tilde f^2\rangle_{\Delta\nu} = 2S_f(\omega)\,\Delta\omega = 4\pi S_f(\omega)\,\Delta\nu. \tag{5.61}$$
(A popular alternative definition of the spectral density, S̃f(ν) ≡ 4πSf(ω), makes the average (61) equal to just S̃f(ν)Δν.)
To conclude this introductory (mostly mathematical) section, let me note an important particular case. If the spectral density of some process is nearly constant within the frequency range of interest, Sf(ω) = const = Sf(0),23 Eq. (59) shows that its correlation function may be well approximated with a delta function:
$$K_f(\tau) = S_f(0)\int_{-\infty}^{+\infty}e^{i\omega\tau}\,d\omega = 2\pi S_f(0)\,\delta(\tau). \tag{5.62}$$
From this relation stems another popular name of the white noise, the delta-correlated process. We have already seen that this is a very reasonable approximation, for example, for the gas pressure force fluctuations (Fig. 4). Of course, for the spectral density of a realistic, limited physical variable the approximation of constant spectral density cannot be true for all frequencies (otherwise, for example, the integral (60) would diverge, giving an unphysical, infinite value of its variance), and may be valid only at frequencies much lower than 1/τc.

20 See, e.g., MA Eq. (14.4).
21 The second form of Eq. (59) uses the fact that, according to Eq. (58), Sf(ω) is an even function of frequency – just as Kf(τ) is an even function of time.
22 Although Eqs. (58) and (59) look not much more than straightforward corollaries of the Fourier transform, they bear a special name of the Wiener-Khinchin theorem – after the mathematicians N. Wiener and A. Khinchin who have proved that these relations are valid even for the functions f(t) that are not square-integrable, so from the point of view of standard mathematics, their Fourier transforms are not well defined.
23 Such a process is frequently called the white noise, because it consists of all frequency components with equal amplitudes, reminding the white light, which consists of many monochromatic components with close amplitudes.
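The white-noise relations (60)-(62) invite a numerical cross-check. The sketch below (assuming NumPy; the normalization follows the Fourier conventions (52)-(53) for a process sampled with step Δt, and all parameter values are illustrative) estimates the spectral density of computer-generated white noise, for which the discrete analog of Eq. (62) predicts the flat value Sf = σ²Δt/2π, and also verifies the variance sum rule (60):

```python
# Periodogram check of the white-noise spectral density: discrete noise of
# variance sigma^2 and step dt (delta-correlated on that scale) should have
# the flat spectral density S_f(omega) = sigma^2*dt/(2*pi).
import numpy as np

rng = np.random.default_rng(4)
n, dt, sigma = 1 << 18, 1e-3, 2.0
f = rng.normal(scale=sigma, size=n)          # discrete white noise

F = np.fft.rfft(f)                           # discrete analog of Eq. (5.53)
S = dt / (2 * np.pi * n) * np.abs(F) ** 2    # periodogram estimate of S_f

print(f"mean S_f = {S[1:].mean():.3e}, "
      f"expected sigma^2*dt/(2*pi) = {sigma**2 * dt / (2 * np.pi):.3e}")

domega = 2 * np.pi / (n * dt)                # frequency grid step
print(f"2*sum(S_f)*domega = {2 * S.sum() * domega:.3f} "
      f"vs variance <f~^2> = {f.var():.3f}")   # sum rule of Eq. (5.60)
```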

5.5. Fluctuations and dissipation


Now we are equipped mathematically to address a major issue of statistical physics, the relation between fluctuations and dissipation. This relation is especially simple for the following hierarchical situation: a relatively "heavy", slowly evolving system, weakly interacting with an environment consisting of many rapidly moving, "light" components. A popular theoretical term for such a system is the Brownian particle, named after botanist Robert Brown who was the first to notice (in 1827) the random motion of small particles (in his case, pollen grains), caused by their random hits by fluid's molecules, under a microscope. However, the family of such systems is much broader than that of small mechanical particles. Just for a few examples, such a description is valid for an atom interacting with electromagnetic field modes of the surrounding space, a macroscopic mechanical system interacting with molecules of air around it, current in a macroscopic electric circuit interacting with microscopic charge carriers, etc.24
One more important assumption of this theory is that the system's motion does not violate the thermal equilibrium of the environment – well fulfilled in many cases. (Think, for example, about a typical mechanical pendulum – its motion does not overheat the air around it to any noticeable extent.) In this case, the averaging over a statistical ensemble of similar environments at a fixed, specific motion of the system of interest, may be performed assuming their thermal equilibrium.25 I will denote such a "primary" averaging by the usual angle brackets ⟨…⟩. At a later stage, we may carry out additional, "secondary" averaging, over an ensemble of many similar systems of interest, coupled to similar environments. When we do, such secondary averaging will be denoted by double angle brackets ⟨⟨…⟩⟩.
Let me start with a simple classical system, a 1D harmonic oscillator whose equation of evolution may be represented as
$$m\ddot q + \kappa q = F_{\rm det}(t) + F_{\rm env}(t) \equiv F_{\rm det}(t) + \langle F\rangle + \tilde F(t), \quad\text{with }\langle\tilde F(t)\rangle = 0, \tag{5.63}$$

where q is the (generalized) coordinate of the oscillator, Fdet(t) is the deterministic external force, while both components of the force Fenv(t) represent the impact of the environment on the oscillator's motion. Again, on the time scale of the fast-moving environmental components, the oscillator's motion is slow. The average component ⟨F⟩ of the force exerted by the environment on such a slowly moving object is frequently independent of its coordinate q but does depend on its velocity q̇. For most such systems, the Taylor expansion of the force in small velocity starts with a non-zero linear term:
$$\langle F\rangle = -\eta\dot q, \tag{5.64}$$

24 To emphasize this generality, in the forthcoming discussion of the 1D case, I will use the letter q rather than x for the system's displacement.
25 For a usual (ergodic) environment, the primary averaging may be interpreted as that over relatively short time intervals, τc << Δt << τ, where τc is the correlation time of the environment, while τ is the characteristic time scale of motion of our "heavy" system of interest.


where the constant  is usually called the drag (or “kinematic friction”, or “damping”) coefficient, so
Eq. (63) may be rewritten as
Langevin
~ equation
mq   q  q  Fdet (t )  F (t ) . (5.65) for classical
oscillator

This method of describing the environmental effects on an otherwise Hamiltonian system is called the Langevin equation.26 Due to the linearity of the differential equation (65), its general solution may be represented as a sum of two independent parts: the deterministic motion of the damped linear oscillator due to the external force Fdet(t), and its random fluctuations due to the random force F̃(t) exerted by the environment. The former effects are well known from classical dynamics,27 so let us focus on the latter part by taking Fdet(t) = 0. The remaining term on the right-hand side of Eq. (65) describes the fluctuating part of the environmental force; in contrast to the average component (64), its intensity (read: its spectral density at relevant frequencies ω ~ ω0 ≡ (κ/m)^{1/2}) does not vanish at q̇(t) = 0, and hence may be evaluated ignoring the system's motion.28
Plugging into Eq. (65) the representation of both variables in the Fourier form similar to Eq. (52), and requiring the coefficients before the same exp{–iωt} to be equal on both sides of the equation, for their Fourier images we get the following relation:
$$-m\omega^2 q_\omega - i\omega\eta\, q_\omega + \kappa q_\omega = F_\omega, \tag{5.66}$$
which immediately gives us qω, i.e. the (random) complex amplitude of the coordinate fluctuations:
$$q_\omega = \frac{F_\omega}{(\kappa - m\omega^2) - i\omega\eta} = \frac{F_\omega}{m(\omega_0^2 - \omega^2) - i\omega\eta}. \tag{5.67}$$

Now multiplying Eq. (67) by its complex conjugate for another frequency (say, ω'), averaging both parts of the resulting equation, and using the formulas similar to Eq. (57) for each of them,29 we get the following relation between the spectral densities of the oscillations and the random force:30
$$S_q(\omega) = \frac{S_F(\omega)}{m^2(\omega_0^2 - \omega^2)^2 + \omega^2\eta^2}. \tag{5.68}$$

26 Named after Paul Langevin, whose 1908 work was the first systematic development of A. Einstein's ideas on the Brownian motion (see below) using this formalism. A detailed discussion of this approach, with numerical examples of its application, may be found, e.g., in the monograph by W. Coffey, Yu. Kalmykov, and J. Waldron, The Langevin Equation, World Scientific, 1996.
27 See, e.g., CM Sec. 5.1. Again, here I assume that the variable f(t) is classical, with the discussion of the quantum case postponed until the end of the section.
28 Note that the direct secondary statistical averaging of Eq. (65) with Fdet = 0 yields ⟨⟨q⟩⟩ = 0! This, perhaps a bit counter-intuitive result becomes less puzzling if we recognize that this is the averaging over a large statistical ensemble of random sinusoidal oscillations with all values of their phase, and that the (equally probable) oscillations with opposite phases give mutually canceling contributions to the sum in Eq. (2.6).
29 At this stage, we restrict our analysis to random, stationary processes q(t), so Eq. (57) is valid for this variable as well, if the averaging in it is understood in the ⟨⟨…⟩⟩ sense.
30 Regardless of the physical sense of such a function of ω, and of whether its maximum is situated at a finite frequency ω0 as in Eq. (68) or at ω = 0, it is often referred to as the Lorentzian (or "Breit-Wigner") line.


In the so-called low-damping limit (η << mω0), the fraction on the right-hand side of Eq. (68) has a sharp peak near the oscillator's own frequency ω0 (describing the well-known effect of high-Q resonance), and may be well approximated in that vicinity as
$$\frac{1}{m^2(\omega_0^2 - \omega^2)^2 + \omega^2\eta^2} \approx \frac{1}{\omega_0^2\eta^2(\xi^2 + 1)}, \quad\text{with }\xi \equiv \frac{2m(\omega - \omega_0)}{\eta}. \tag{5.69}$$

In contrast, the spectral density of fluctuations of a typical environment is changing relatively slowly, so for the purpose of integration over frequencies near ω0, we may replace SF(ω) with SF(ω0). As a result, the variance of the environment-imposed random oscillations may be calculated, using Eq. (60), as31
$$\langle\langle\tilde q^2\rangle\rangle = 2\int_0^{\infty}S_q(\omega)\,d\omega \approx \frac{2S_F(\omega_0)}{\omega_0^2\eta^2}\,\frac{\eta}{2m}\int_{-\infty}^{+\infty}\frac{d\xi}{\xi^2 + 1}. \tag{5.70}$$
This is a well-known table integral,32 equal to π, so, finally:
$$\langle\langle\tilde q^2\rangle\rangle = \frac{2S_F(\omega_0)}{\omega_0^2\eta^2}\,\frac{\eta}{2m}\,\pi = \frac{\pi}{m\omega_0^2\eta}\,S_F(\omega_0) \equiv \frac{\pi}{\kappa\eta}\,S_F(\omega_0). \tag{5.71}$$
But on the other hand, any weak interaction with the environment should keep the oscillator in thermodynamic equilibrium at the same temperature T. Since our analysis has been based on the classical Langevin equation (65), we may only use it in the classical limit ħω0 << T, in which we may use the equipartition theorem (2.48). In our current notation, it yields
$$\frac{\kappa}{2}\langle\langle\tilde q^2\rangle\rangle = \frac{T}{2}. \tag{5.72}$$
Comparing Eqs. (71) and (72), we see that the spectral density of the random force exerted by the environment has to be fundamentally related to the damping it provides:
$$S_F(\omega_0) = \frac{\eta}{\pi}T. \tag{5.73a}$$
Now we may argue (rather convincingly :-) that since this relation does not depend on the oscillator's parameters m and κ, and hence its eigenfrequency ω0 = (κ/m)^{1/2},33 it should be valid at any relatively low frequency (ωτc << 1). Using Eq. (58) with ω → 0, it may be also rewritten as a formula for the effective low-frequency drag coefficient:
No dissipation without fluctuation:
$$\eta = \frac{1}{T}\int_0^{\infty}K_F(\tau)\,d\tau \equiv \frac{1}{T}\int_0^{\infty}\langle\tilde F(0)\tilde F(\tau)\rangle\,d\tau. \tag{5.73b}$$

31 Since in this case the process in the oscillator is entirely due to its environment, its variance should be obtained by statistical averaging over an ensemble of many similar (oscillator + environment) systems, and hence, following our convention, it is denoted by double angular brackets.
32 See, e.g., MA Eq. (6.5a).
33 Moreover, it does not depend on the assumption η << mω0, made above only for the sake of calculation simplicity. Indeed, for a frequency-independent spectral density SF, such as the one given by Eq. (73a), the integration of both sides of Eq. (68) over all frequencies yields Eq. (71) for any η.

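Relation (73a) also invites a direct numerical experiment: integrating the Langevin equation (65) with a delta-correlated random force of the spectral density (73a) should drive the oscillator's coordinate variance to the equipartition value (72). A minimal simulation sketch (in Python with NumPy; the Euler-type time stepping, the random seed, and all parameter values are merely illustrative choices):

```python
# Integrating the Langevin equation (5.65) with a random force of spectral
# density S_F = eta*T/pi (Eq. 5.73a), to verify that the coordinate variance
# settles at the equipartition value T/kappa (Eq. 5.72).
import numpy as np

rng = np.random.default_rng(5)
m, kappa, eta, T = 1.0, 1.0, 0.5, 1.0        # illustrative parameters
dt, n_steps = 0.02, 500_000

# Discrete force with <F_j F_k> = (2*eta*T/dt)*delta_jk, the time-domain
# counterpart of K_F(tau) = 2*pi*S_F(0)*delta(tau) with S_F(0) = eta*T/pi.
F = rng.normal(scale=np.sqrt(2 * eta * T / dt), size=n_steps)

q, v = 0.0, 0.0
q2_sum, count = 0.0, 0
for j in range(n_steps):
    v += dt * (F[j] - eta * v - kappa * q) / m   # semi-implicit Euler step
    q += dt * v
    if j > n_steps // 10:                        # skip the initial transient
        q2_sum += q * q
        count += 1

print(f"simulated <<q~^2>> = {q2_sum / count:.3f}, "
      f"equipartition T/kappa = {T / kappa:.3f}")
```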

Formulas (73) reveal an intimate, fundamental relation between the fluctuations and the
dissipation provided by a thermally-equilibrium environment. Parroting the famous political slogan,
there is “no dissipation without fluctuation”. This means in particular that the phenomenological
description of dissipation by the drag force alone in classical mechanics34 is (approximately) valid only
when the energy scale of the considered process is much larger than T. To the best of my knowledge,
this fact was first recognized in 1905 by A. Einstein,35 for the following particular case.
Let us apply our result (73) to a free 1D Brownian particle, by taking Fdet(t) = 0 and κ → 0. In this limit, both relations (71) and (72) lead to infinite coordinate variance. To understand the reason for that divergence, let us go back to the Langevin equation (65) with not only κ = 0 and Fdet(t) = 0, but also m → 0 – just for the sake of simplicity. (The latter approximation, frequently called the overdamping limit, is quite appropriate, for example, for the motion of small particles in sufficiently viscous fluids – such as in R. Brown's experiments.) In this approximation, Eq. (65) is reduced to a simple equation,
$$\eta\dot q = \tilde F(t), \quad\text{with }\langle\tilde F(t)\rangle = 0, \tag{5.74}$$
which may be readily integrated to give the particle's displacement during a finite time interval t:
$$\Delta q(t) \equiv q(t) - q(0) = \frac{1}{\eta}\int_0^t\tilde F(t')\,dt'. \tag{5.75}$$

Evidently, at the full statistical averaging of the displacement, the fluctuation effects vanish, but this does not mean that the particle does not move – just that it has equal probabilities to be shifted in either of two possible directions. To see that, let us calculate the variance of the displacement:
$$\langle\langle\Delta\tilde q^2(t)\rangle\rangle = \frac{1}{\eta^2}\int_0^t dt'\int_0^t dt''\,\langle\langle\tilde F(t')\tilde F(t'')\rangle\rangle = \frac{1}{\eta^2}\int_0^t dt'\int_0^t dt''\,K_F(t' - t''). \tag{5.76}$$

As we already know, at times τ >> τc, the correlation function may be well approximated by the delta function – see Eq. (62). In this approximation, with SF(0) expressed by Eq. (73a), we get
$$\langle\langle\Delta\tilde q^2(t)\rangle\rangle = \frac{2\pi}{\eta^2}S_F(0)\int_0^t dt'\int_0^t dt''\,\delta(t'' - t') = \frac{2\pi}{\eta^2}\frac{\eta T}{\pi}\int_0^t dt' = \frac{2T}{\eta}t \equiv 2Dt, \tag{5.77}$$
with
Einstein's relation:
$$D = \frac{T}{\eta}. \tag{5.78}$$

The final form of Eq. (77) describes the well-known law of diffusion ("random walk") of a 1D system, with the r.m.s. deviation from the point of origin growing as (2Dt)^{1/2}. The coefficient D in this relation is called the coefficient of diffusion, and Eq. (78) describes the extremely simple and important36 Einstein's relation between that coefficient and the drag coefficient. Often this relation is rewritten, in the SI units of temperature, as D = μmkBTK, where μm ≡ 1/η is the mobility of the particle. The physical sense of μm becomes clear from the expression for the deterministic velocity (particle's "drift"), which follows from the averaging of both sides of Eq. (74) after the restoration of the term Fdet(t) in it:
$$\langle v_{\rm drift}\rangle = \langle\dot q(t)\rangle = \frac{1}{\eta}F_{\rm det}(t) \equiv \mu_m F_{\rm det}(t), \tag{5.79}$$
so the mobility is just the drift velocity given to the particle by a unit force.37

34 See, e.g., CM Sec. 5.1.
35 It was obtained in one of the three papers of Einstein's celebrated 1905 "triad". As a reminder, another of the papers started the relativity theory, and one more was the quantum description of the photoelectric effect, essentially starting quantum mechanics. Not too bad for one year of one young scientist's life!
36 In particular, in 1908, i.e. very soon after Einstein's publication, it was used by J. Perrin for an accurate determination of the Avogadro number NA. (It was Perrin who graciously suggested naming this constant after A. Avogadro, honoring his pioneering studies of gases in the 1810s.)
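The diffusion law (77) and the Einstein relation (78) are also easy to check by simulating the overdamped equation (74) for an ensemble of independent particles (a sketch assuming NumPy, with illustrative parameter values):

```python
# Simulating the overdamped Langevin equation (5.74) for many independent
# Brownian particles, to check <<(dq)^2>> = 2*D*t with D = T/eta (Eq. 5.78).
import numpy as np

rng = np.random.default_rng(6)
eta, T = 2.0, 1.0                  # illustrative drag coefficient, temperature
D = T / eta                        # Einstein's relation (5.78)
dt, n_steps, n_particles = 0.01, 1_000, 20_000

q = np.zeros(n_particles)          # all particles start at q(0) = 0
for j in range(1, n_steps + 1):
    # eta*dq = F~*dt with the delta-correlated force (73a): each time step
    # adds an independent Gaussian increment of variance 2*D*dt
    q += rng.normal(scale=np.sqrt(2 * D * dt), size=n_particles)
    if j in (100, 300, 1_000):
        t = j * dt
        print(f"t = {t:5.2f}: <<(dq)^2>> = {q.var():.4f}, 2*D*t = {2*D*t:.4f}")
```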
Another famous embodiment of the general Eq. (73) is the thermal (or "Johnson", or "Johnson-Nyquist", or just "Nyquist") noise in resistive electron devices. Let us consider a lumped,38 two-terminal, dissipation-free "probe" circuit, playing the role of the harmonic oscillator in the analysis carried out above, connected to a resistive device (Fig. 9), playing the role of the probe circuit's environment. (The noise is generated by the thermal motion of numerous electrons, randomly moving inside the resistive device.) For this system, one convenient choice of the conjugate variables (the generalized coordinate and generalized force) is, respectively, the electric charge Q ≡ ∫I(t)dt that has passed through the "probe" circuit by time t, and the voltage V across its terminals, with the polarity shown in Fig. 9. (Indeed, the product VdQ is the elementary work dW done by the environment on the probe circuit.)

Fig. 5.9. A resistive device (of resistance R, at temperature T) as a dissipative environment of a two-terminal probe circuit, with the current I flowing through the circuit and the voltage V across the device's terminals.

Making the corresponding replacements, q → Q and F → V, in Eq. (64), we see that it becomes
$$\langle V\rangle = -\eta\dot Q = -\eta I. \tag{5.80}$$
Comparing this relation with Ohm's law, ⟨V⟩ = R(–I),39 we see that in this case, the coefficient η has the physical sense of the usual Ohmic resistance R of our dissipative device,40 so Eq. (73a) becomes
$$S_V(\omega) = \frac{R}{\pi}T. \tag{5.81a}$$

Using the last equality in Eq. (61), and expressing temperature in the SI units (T = kBTK), we may bring
this famous Nyquist formula41 to its most popular form:

37 Note that in solid-state physics and electronics, the charge carrier mobility is usually defined as ⟨vdrift⟩/E = e⟨vdrift⟩/Fdet ≡ eμ (where E is the applied electric field), and is traditionally measured in cm²/V·s.
38 As a (good :-) student of classical electrodynamics should know, lumped (compact) electric circuits may be
described by the usual Kirchhoff laws, neglecting the wave propagation effects – see, e.g., EM Sec. 6.6.
39 The minus sign here is due to the fact that in our notation, the current flowing in the resistor, from the terminal
assumed to be positive to the negative one, is (-I) – see Fig. 9.
40 Due to this fact, Eq. (64) is often called the Ohmic model of the environment’s response, even if the physical
nature of the variables q and F is completely different from the electric charge and voltage.


$$\langle \tilde V^2\rangle = 4k_B T_K R\,\Delta\nu \qquad \text{(Nyquist formula)}. \tag{5.81b}$$

Note that according to Eq. (65), this result is only valid at a negligible speed of change of the coordinate q (in our current case, negligible current I), i.e. Eq. (81) describes the voltage fluctuations as would be measured by a virtually ideal voltmeter, with its input resistance much higher than R. On the other hand, using a different choice of generalized coordinate and force, q → Φ, F → I (where Φ ≡ ∫V(t)dt is the generalized magnetic flux, so dW = IV(t)dt ≡ IdΦ), we get η → 1/R, and Eq. (73) yields the thermal fluctuations of the current through the resistive device, as would be measured by a virtually ideal ammeter, i.e. at V → 0:

$$S_I(\omega) = \frac{T}{\pi R}, \qquad \text{i.e.}\quad \langle \tilde I^2\rangle = \frac{4k_B T_K}{R}\,\Delta\nu. \tag{5.81c}$$
The nature of Eqs. (81) is so fundamental that they may be used, in particular, for the so-called Johnson
noise thermometry.42
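For the reader who prefers numbers, here is a minimal illustration of Eqs. (81b)-(81c); the resistance, temperature, and bandwidth values are arbitrary, but typical for room-temperature electronics:

from math import sqrt

k_B = 1.380649e-23                    # Boltzmann constant, J/K
R, T_K, bandwidth = 1e3, 300.0, 1e6   # 1 kOhm, 300 K, 1 MHz (illustrative)

v_rms = sqrt(4 * k_B * T_K * R * bandwidth)   # Nyquist formula (81b)
i_rms = sqrt(4 * k_B * T_K * bandwidth / R)   # Eq. (81c)
print(f"V_rms = {v_rms * 1e6:.2f} uV, I_rms = {i_rms * 1e9:.2f} nA")

The result, about 4 μV and 4 nA, is a good reminder of why low-noise electronics design is nontrivial.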
Note, however, that these relations are valid for noise in thermal equilibrium only. In electric circuits that may be readily driven out of equilibrium by an applied voltage V, other types of noise are frequently important, notably the shot noise that arises in short conductors, e.g., tunnel junctions, at applied voltages with ⟨V⟩ >> T/q, due to the discreteness of charge carriers.43 A straightforward analysis (left for the reader's exercise) shows that this noise may be characterized by current fluctuations with the following low-frequency spectral density:

$$S_I(\omega) = \frac{q\bar I}{2\pi}, \qquad \text{i.e.}\quad \langle \tilde I^2\rangle = 2q\bar I\,\Delta\nu \qquad \text{(Schottky formula)}, \tag{5.82}$$

where q is the electric charge of a single current carrier. This is the Schottky formula,44 valid for any
relation between the average I and V. The comparison of Eqs. (81c) and (82) for a device that obeys the
Ohm law shows that the shot noise has the same intensity as the thermal noise with the effective
temperature
$$T_{\rm ef} = \frac{q\langle V\rangle}{2} \gg T. \tag{5.83}$$
This relation may be interpreted as a result of charge carrier overheating by the applied electric field,
and explains why the Schottky formula (82) is only valid in conductors much shorter than the energy

41 It is named after Harry Nyquist who derived this formula in 1928 (independently of the prior work by A.
Einstein, M. Smoluchowski, and P. Langevin) to describe the “Johnson-Nyquist” noise that had been just
discovered experimentally by his Bell Labs colleague John Bertrand Johnson. The derivation of Eq. (73) and
hence of Eq. (81) in these notes is essentially a twist of the derivation used by H. Nyquist.
42 See, e.g., J. Crossno et al., Appl. Phys. Lett. 106, 023121 (2015), and references therein.
43 Another practically important type of fluctuations in electronic devices is the low-frequency 1/f noise that was
already mentioned in Sec. 3 above. I will briefly discuss it in Sec. 8.
44 It was derived by Walter Hans Schottky as early as 1918, i.e. even before Nyquist’s work.


relaxation length le of the charge carriers.45 (Another mechanism of shot noise suppression, which may
become noticeable in highly conductive nanoscale devices, is the Fermi-Dirac statistics of electrons.46)
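A quick numerical corollary of Eq. (83), given here only as an illustration: for an Ohmic conductor, the shot noise (82) overtakes the thermal noise (81c) when Tef exceeds T, i.e. at voltages above 2kBTK/q:

k_B, e = 1.380649e-23, 1.602176634e-19   # SI values of k_B and of the electron charge
T_K = 300.0                              # room temperature, an arbitrary choice
V_crossover = 2 * k_B * T_K / e          # from q<V>/2 = k_B*T_K, cf. Eq. (83)
print(f"shot noise dominates above <V> ~ {V_crossover * 1e3:.0f} mV")

giving approximately 52 mV at room temperature.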
Now let us return for a minute to the bolometric Dicke radiometer (see Figs. 6-7 and their discussion in Sec. 4), and use the Langevin formalism to finalize its analysis. For this system, the Langevin equation is an extension of the usual equation of heat balance:

$$C_V\frac{dT}{dt} = -G(T-T_0) + P_{\rm det}(t) + \tilde P(t), \tag{5.84}$$

where Pdet ≡ ⟨P⟩ describes the (deterministic) power of the absorbed radiation and P̃ represents the effective source of temperature fluctuations. Now we can use Eq. (84) to carry out a calculation of the spectral density ST(ω) of temperature fluctuations absolutely similarly to how this was done with Eq. (65), assuming that the frequency spectrum of the fluctuation source is much broader than the intrinsic bandwidth 1/τ = G/CV of the bolometer, so its spectral density at frequencies ω ~ 1/τ may be well approximated by its low-frequency value SP(0):

$$S_T(\omega) = \left|\frac{1}{-i\omega C_V + G}\right|^2 S_P(0). \tag{5.85}$$
Then, requiring the variance of temperature fluctuations, calculated from this formula and Eq. (60),

$$\langle \tilde T^2\rangle = 2\int_0^\infty S_T(\omega)\,d\omega = 2S_P(0)\int_0^\infty \frac{d\omega}{\omega^2 C_V^2 + G^2} = \frac{2S_P(0)}{C_V^2}\int_0^\infty \frac{d\omega}{\omega^2 + (G/C_V)^2} = \frac{\pi S_P(0)}{G C_V}, \tag{5.86}$$

to coincide with our earlier "thermodynamic fluctuation" result (41), we get

$$S_P(0) = \frac{G}{\pi}\,T_0^2. \tag{5.87}$$

The r.m.s. value of the "power noise" within a bandwidth Δν << 1/τ (see Fig. 7) becomes equal to the deterministic signal power Pdet (or more exactly, the main harmonic of its modulation law) at

$$P \equiv P_{\min} = \langle \tilde P^2\rangle^{1/2} = \left[2\pi S_P(0)\,\Delta\nu\right]^{1/2} = \left(2G\,\Delta\nu\right)^{1/2}T_0. \tag{5.88}$$

This result shows that our earlier prediction (45) may be improved by a substantial factor of the order of (ω/Δν)^{1/2}, where ω is the power modulation frequency. Here the reduction of the output bandwidth Δν is limited only by the signal accumulation time Δt ~ 1/Δν, while the increase of ω is limited by the speed of (typically, mechanical) devices performing the power modulation. In practical systems this factor may improve the sensitivity by a couple of orders of magnitude, enabling observation of extremely weak radiation. Maybe the most spectacular example is the recent measurements of the CMB radiation, which corresponds to blackbody temperature TK ≈ 2.726 K, with accuracy ΔTK ~ 10⁻⁶ K, using microwave receivers with the physical

45 See, e.g., Y. Naveh et al., Phys. Rev. B 58, 15371 (1998). In practically used metals, le is of the order of a few
nanometers, so the usual “macroscopic” resistors do not exhibit the shot noise.
46 For a review of this effect see, e.g., Ya. Blanter and M. Büttiker, Phys. Repts. 336, 1 (2000).


temperature of all their components much higher than ΔT. The observed weak (~10⁻⁵ K) anisotropy of the CMB radiation is a major experimental basis of all modern cosmology.47
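To get a feeling for the numbers implied by Eq. (88), here is a small estimate in SI units, where that formula reads Pmin = (2𝒢kBΔν)^{1/2}TK, with 𝒢 the thermal conductance in W/K; all parameter values below are arbitrary placeholders rather than data for any real receiver:

from math import sqrt

k_B = 1.380649e-23
calG = 1e-10        # thermal conductance to the bath, W/K (hypothetical)
T0_K = 0.3          # bath temperature, K (hypothetical)
d_nu = 1.0          # output bandwidth, Hz (hypothetical)

P_min = sqrt(2 * calG * k_B * d_nu) * T0_K   # Eq. (88) rewritten in SI units
print(f"P_min ~ {P_min:.1e} W")              # ~1.6e-17 W for these numbers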
Returning to the discussion of our main result, Eq. (73), let me note that it may be readily generalized to the case when the environment's response is different from the Ohmic form (64). This opportunity is virtually evident from Eq. (66): by its derivation, the second term on its left-hand side is just the Fourier component of the average response of the environment to the system's displacement:

$$\langle F\rangle_\omega = i\omega\eta\,q_\omega. \tag{5.89}$$

Now let the response be still linear, but have an arbitrary frequency dispersion,

$$\langle F\rangle_\omega = \chi(\omega)\,q_\omega, \tag{5.90}$$

where the function χ(ω), called the generalized susceptibility (in our current case, of the environment) may be complex, i.e. have both the imaginary and real parts:

$$\chi(\omega) = \chi'(\omega) + i\chi''(\omega). \tag{5.91}$$

Then Eq. (73) remains valid with the replacement η → χ″(ω)/ω:48

$$S_F(\omega) = \frac{\chi''(\omega)}{\omega}\,\frac{T}{\pi}. \tag{5.92}$$

This fundamental relation49 may be used not only to calculate the fluctuation intensity from the known generalized susceptibility (i.e. the deterministic response of the system to a small perturbation) but also in reverse – to calculate such a linear response from the known fluctuations. The latter use is especially attractive in numerical simulations of complex systems, e.g., those based on molecular-dynamics approaches, because it circumvents the need for extracting a weak response to a small perturbation out of a noisy background.
Now let us discuss what generalization of Eq. (92) is necessary to make that fundamental result suitable for arbitrary temperatures, T ~ ħω. The calculations we had performed were based on the apparently classical equation of motion, Eq. (63). However, quantum mechanics shows50 that a similar equation is valid for the corresponding Heisenberg-picture operators, so by repeating all the arguments that have led us to the Langevin equation (65), we may write its quantum-mechanical version

$$m\ddot{\hat q} + \eta\dot{\hat q} + \kappa\hat q = \hat F_{\rm det} + \hat{\tilde F} \qquad \text{(Heisenberg-Langevin equation)}. \tag{5.93}$$

47 See, e.g., a concise book by A. Balbi, The Music of the Big Bang, Springer, 2008.
48 Reviewing the calculations leading to Eq. (73), we may see that the possible real part χ′(ω) of the susceptibility just adds up to (κ – mω²) in the denominator of Eq. (67), resulting in a change of the oscillator's frequency ω₀. This renormalization is insignificant if the oscillator-to-environment coupling is weak, i.e. if the susceptibility χ(ω) is small – as had been assumed at the derivation of Eq. (69) and hence Eq. (73).
49 It is sometimes called the Green-Kubo (or just “Kubo”) formula. This is hardly fair, because, as the reader
could see, Eq. (92) is just an elementary generalization of the Nyquist formula (81). Moreover, the corresponding
works of M. Green and R. Kubo were published, respectively, in 1954 and 1957, i.e. after the 1951 paper by H.
Callen and T. Welton, where a more general result (98) had been derived. Much more adequately, the
Green/Kubo names are associated with Eq. (102) below.
50 See, e.g., QM Sec. 4.6.


This is the so-called Heisenberg-Langevin (or "quantum Langevin") equation – in this particular case, for a harmonic oscillator.

The further operations, however, require certain caution, because the right-hand side of the equation is now an operator, and has some nontrivial properties. For example, the "values" of the Heisenberg operator, representing the same variable f(t) at different times, do not necessarily commute:

$$\hat{\tilde f}(t)\,\hat{\tilde f}(t') \neq \hat{\tilde f}(t')\,\hat{\tilde f}(t), \qquad \text{if } t' \neq t. \tag{5.94}$$

As a result, the function defined by Eq. (46) may not be a symmetric function of the time delay τ ≡ t′ – t even for a stationary process, making it inadequate for the representation of the actual correlation function – which has to obey Eq. (50). This technical difficulty may be overcome by the introduction of the following symmetrized correlation function51

$$K_f(\tau) \equiv \frac{1}{2}\left\langle \hat{\tilde f}(t)\hat{\tilde f}(t+\tau) + \hat{\tilde f}(t+\tau)\hat{\tilde f}(t)\right\rangle \equiv \frac{1}{2}\left\langle\left\{\hat{\tilde f}(t),\hat{\tilde f}(t+\tau)\right\}\right\rangle \tag{5.95}$$

(where {…, …} denotes the anticommutator of the two operators), and, similarly, the symmetrized spectral density Sf(ω), defined by the following relation:

$$S_f(\omega)\,\delta(\omega-\omega') \equiv \frac{1}{2}\left\langle \hat f_\omega\hat f_{\omega'}^* + \hat f_{\omega'}^*\hat f_\omega\right\rangle \equiv \frac{1}{2}\left\langle\left\{\hat f_\omega,\hat f_{\omega'}^*\right\}\right\rangle, \tag{5.96}$$

with Kf(τ) and Sf(ω) still related by the Fourier transform (59).
Now we may repeat all the analysis that was carried out for the classical case, and get Eq. (71) again, but now this expression has to be compared not with the equipartition theorem, but with its quantum-mechanical generalization (14), which, in our current notation, reads

$$\langle \tilde q^2\rangle = \frac{\hbar\omega_0}{2\kappa}\coth\frac{\hbar\omega_0}{2T}. \tag{5.97}$$

As a result, we get the following quantum-mechanical generalization of Eq. (92):

$$S_F(\omega) = \frac{\chi''(\omega)}{\pi}\,\frac{\hbar}{2}\coth\frac{\hbar\omega}{2T} \qquad \text{(FDT)}. \tag{5.98}$$
This is the much-celebrated fluctuation-dissipation theorem, usually referred to just as the FDT, first derived in 1951 by Herbert Bernard Callen and Theodore A. Welton – in a somewhat different way.

As natural as it seems, this generalization of the relation between fluctuations and dissipation poses a very interesting conceptual dilemma. Let, for the sake of clarity, temperature be relatively low, T << ħω; then Eq. (98) gives a temperature-independent result

$$S_F(\omega) = \frac{\chi''(\omega)}{\pi}\,\frac{\hbar}{2} \qquad \text{(quantum noise)}, \tag{5.99}$$

51 Here (and to the end of this section) the averaging ⟨…⟩ should be understood in the general quantum-statistical sense – see Eq. (2.12). As was discussed in Sec. 2.1, for the classical-mixture state of the system, this does not create any difference in either the mathematical treatment of the averages or their physical interpretation.


which describes what is frequently called quantum noise. According to the quantum Langevin equation (93), nothing but the random force F̃ exerted by the environment, with the spectral density (99) proportional to the imaginary part of susceptibility (i.e. damping), is the source of the ground-state "fluctuations" of the coordinate and momentum of a quantum harmonic oscillator, with the r.m.s. values

$$\delta q = \langle \tilde q^2\rangle^{1/2} = \left(\frac{\hbar}{2m\omega_0}\right)^{1/2}, \qquad \delta p = \langle \tilde p^2\rangle^{1/2} = \left(\frac{\hbar m\omega_0}{2}\right)^{1/2}, \tag{5.100}$$

and the total energy ħω₀/2. On the other hand, basic quantum mechanics tells us that exactly these formulas describe the ground state of a dissipation-free oscillator, not coupled to any environment, and are a direct corollary of the basic commutation relation

$$[\hat q, \hat p] = i\hbar. \tag{5.101}$$
So, what is the genuine source of the uncertainty described by Eqs. (100)?
The best resolution of this paradox I can offer is that either interpretation of Eqs. (100) is
legitimate, with their relative convenience depending on the particular application. One may say that
since the right-hand side of the quantum Langevin equation (93) is a quantum-mechanical operator
rather than a classical force, it “carries the quantum uncertainty relation within itself”. However, this
(admittedly, opportunistic:-) view leaves the following question open: is the quantum noise (99) of an
environment observable directly, without any probe oscillator subjected to it? An experimental
resolution of this dilemma is not quite simple, because usual scientific instruments have their own
ground-state uncertainty, i.e. their own quantum fluctuations, which may be readily confused with those
of the system under study. Fortunately, this difficulty may be overcome, for example, by using unique
frequency-mixing (“down-conversion”) properties of Josephson junctions. Special low-temperature
experiments using such down-conversion52 have confirmed that the quantum noise (99) is quite real and
measurable.
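The crossover between the classical formula (92) and the quantum noise (99), both contained in the FDT (98), is easy to trace numerically. Below is a minimal sketch for an Ohmic environment, χ″(ω) = ηω, in units with ħ = 1 and with arbitrary illustrative values of η and T:

from math import pi, tanh

eta, T = 1.0, 1.0                     # arbitrary illustrative values; hbar = 1
for omega in (0.01, 1.0, 10.0):
    s_fdt = (eta * omega / pi) * 0.5 / tanh(omega / (2 * T))   # Eq. (98)
    print(f"omega = {omega:5.2f}:  FDT {s_fdt:.4f},  "
          f"classical {eta * T / pi:.4f},  quantum {eta * omega / (2 * pi):.4f}")

At ω << T the FDT result approaches the classical, frequency-independent value ηT/π of Eq. (92), while at ω >> T it crosses over to the linearly growing quantum noise ηω/2π of Eq. (99).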
Finally, let me mention an alternative derivation53 of the fluctuation-dissipation theorem (98) from the general quantum mechanics of open systems. This derivation is substantially longer than that presented above but gives an interesting sub-product, the Green-Kubo formula

$$\left\langle\left[\hat{\tilde F}(t),\hat{\tilde F}(t+\tau)\right]\right\rangle = i\hbar\,G(\tau), \tag{5.102}$$

where G(τ) is the temporal Green's function of the environment, defined by the following relation:

$$\langle F(t)\rangle = \int_0^\infty G(\tau)\,q(t-\tau)\,d\tau \equiv \int_{-\infty}^t G(t-t')\,q(t')\,dt'. \tag{5.103}$$

Plugging the Fourier transforms of all three functions of time participating in Eq. (103) into this relation, it is straightforward to check that this Green's function is just the Fourier image of the complex susceptibility χ(ω) defined by Eq. (90):

$$\int_0^\infty G(\tau)\,e^{i\omega\tau}\,d\tau = \chi(\omega); \tag{5.104}$$

52 R. Koch et al., Phys. Rev. B 26, 74 (1982), and references therein.


53 See, e.g., QM Sec. 7.4.


here 0 is used as the lower limit instead of (–∞) just to emphasize that due to the causality principle, Green's function has to be equal to zero for τ < 0.54
In order to reveal the real beauty of Eq. (102), we may use the Wiener-Khinchin theorem (59) to rewrite the fluctuation-dissipation theorem (98) in a similar time-domain form:

$$\left\langle\left\{\hat{\tilde F}(t),\hat{\tilde F}(t+\tau)\right\}\right\rangle = 2K_F(\tau), \tag{5.105}$$

where the symmetrized correlation function KF(τ) is most simply described by its Fourier transform, which is, according to Eq. (58), equal to SF(ω), so using the FDT, we get

$$\int_0^\infty K_F(\tau)\cos\omega\tau\,d\tau = \frac{\chi''(\omega)}{2}\,\frac{\hbar}{2}\coth\frac{\hbar\omega}{2T}. \tag{5.106}$$

The comparison of Eqs. (102) and (104), on one hand, and Eqs. (105)-(106), on the other hand, shows that both the commutation and anticommutation properties of the Heisenberg-Langevin force operator at different moments of time are determined by the same generalized susceptibility χ(ω) of the environment. However, the averaged anticommutator also depends on temperature, while the averaged commutator does not – at least explicitly. (The complex susceptibility of an environment may be temperature-dependent as well.)

5.6. The Kramers problem and the Smoluchowski equation


Returning to the classical case, it is evident that the Langevin equations of the type (65) provide
means not only for the analysis of stationary fluctuations, but also for the description of the time
evolution of (classical) systems coupled to their environments – which, again, may provide both
dissipation and fluctuations. However, this approach to evolution analysis suffers from two major
handicaps.
First, the Langevin equation does enable a straightforward calculation of the statistical average
of the variable q, and its fluctuation variance – i.e., in the common mathematical terminology, the first
and second moments of the probability density w(q, t) – as functions of time, but not of the probability
distribution as such. Admittedly, this is rarely a big problem, because in most cases the distribution is
Gaussian – see, e.g., Eq. (2.77).
The second, more painful drawback of the Langevin approach is that it is instrumental only for
“linear” systems – i.e., the systems whose dynamics may be described by linear differential equations,
such as Eq. (65). However, as we know from classical dynamics, many important problems (for
example, the Kepler problem of planetary motion55) are reduced to motion in potentials Uef(q) that
substantially differ from quadratic parabolas, giving nonlinear equations of motion. If the energy of
interaction between the system and its random environment is factorable – i.e. is a product of variables
belonging to these subsystems (as it is very frequently the case), we may repeat all arguments of the last
section to derive the following generalized version of the 1D Langevin equation:

54 See, e.g., CM Sec. 5.1.


55 See, e.g., CM Secs. 3.4-3.6.


$$m\ddot q + \eta\dot q + \frac{\partial U(q,t)}{\partial q} = \tilde F(t), \tag{5.107}$$
valid for an arbitrary, possibly time-dependent potential U(q, t).56 Unfortunately, the solution of this
equation may be very hard. Indeed, its Fourier analysis carried out in the last section was essentially
based on the linear superposition principle, which is invalid for nonlinear equations.
If the fluctuation intensity is low, δq << ⟨q⟩, where ⟨q⟩(t) is the deterministic solution of Eq. (107) in the absence of fluctuations, this equation may be linearized57 with respect to small fluctuations q̃ ≡ q – ⟨q⟩ to get a linear equation of motion:

$$m\ddot{\tilde q} + \eta\dot{\tilde q} + \kappa(t)\,\tilde q = \tilde F(t), \qquad \text{with}\quad \kappa(t) \equiv \frac{\partial^2 U}{\partial q^2}\bigg|_{q=\langle q\rangle(t),\,t}. \tag{5.108}$$
This equation differs from Eq. (65) only by the time dependence of the effective spring constant κ(t), and may be solved by the Fourier expansion of both the fluctuations and the function κ(t). Such calculations may be more cumbersome than those performed above, but still doable (especially if the unperturbed motion ⟨q⟩(t) is periodic), and sometimes give useful analytical results.58
However, some important problems cannot be solved by linearization. Perhaps, the most
apparent (and practically very important) example is the so-called Kramers problem59 of finding the
lifetime of a metastable state of a 1D classical system in a potential well separated from the region of
unlimited motion with a potential barrier – see Fig. 10.

Fig. 5.10. The Kramers problem: the potential profile U(q), with the well's bottom at q₁ and the barrier's top at q₂ (of height U₀ over the bottom); q′ and q″ are intermediate points used in the calculation below.

In the absence of fluctuations, the system, if initially at rest close to the well's bottom (in Fig. 10, at q ≈ q₁), would stay there forever. Fluctuations result not only in a finite spread of the probability density w(q, t) around that point but also in a gradual decrease of the total probability

$$W(t) = \int_{\text{well's bottom}} w(q,t)\,dq \tag{5.109}$$

56 The generalization of Eq. (107) to higher spatial dimensionality is also straightforward, with the scalar variable q replaced with a multi-dimensional vector q, and the scalar derivative dU/dq replaced with the vector ∇U, where ∇ is the del vector-operator in the q-space.
57 See, e.g., CM Secs. 3.2, 5.2, and beyond.
58 See, e.g., QM Problem 7.8, and also Chapters 5 and 6 in the monograph by W. Coffey et al., cited above.
59 It was named after Hendrik Anthony (“Hans”) Kramers who, besides solving this conceptually important
problem in 1940, has made several other seminal contributions to physics, including the famous Kramers-Kronig
dispersion relations (see, e.g., EM Sec. 7.4) and the WKB (Wentzel-Kramers-Brillouin) approximation in
quantum mechanics – see, e.g., QM Sec. 2.4.


to find the system in the well, because of a non-zero rate of its escape from it, over the potential barrier, due to thermal activation. What may be immediately expected of the situation is that if the barrier height,

$$U_0 \equiv U(q_2) - U(q_1), \tag{5.110}$$

is much larger than the temperature T,60 the Boltzmann distribution w ∝ exp{–U(q)/T} should be still approximately valid in most of the well, so the probability for the system to overcome the barrier in unit time should scale as exp{–U₀/T}. From these handwaving arguments, one may reasonably expect that if the probability W(t) of the system's still residing in the well by time t obeys the usual "decay law"

$$\dot W = -\frac{W}{\tau}, \tag{5.111a}$$

then the lifetime τ has to obey the general Arrhenius law:

$$\tau = \tau_A\exp\left\{\frac{U_0}{T}\right\}. \tag{5.111b}$$
However, these relations need to be proved, and the pre-exponential coefficient τA (usually called the attempt time) needs to be calculated. This cannot be done by the linearization of Eq. (107), because this approximation is equivalent to a quadratic approximation of the potential U(q), which evidently cannot describe the potential well and the potential barrier simultaneously – see Fig. 10 again.
This and other essentially nonlinear problems may be addressed using an alternative approach to
fluctuations, dealing directly with the time evolution of the probability density w(q, t). Due to the
shortage of time/space, I will review this approach using mostly handwaving arguments, and refer the
interested reader to special literature61 for strict mathematical proofs. Let us start with the diffusion of a
free classical 1D particle with inertial effects negligible in comparison with damping. It is described by
the Langevin equation (74) with Fdet = 0. Let us assume that at all times the probability distribution
stays Gaussian:
 q  q 0  
2
1
w(q, t )  exp   , (5.112)
2 1 / 2 q(t )  2q 2 (t ) 
where q0 is the initial position of the particle, and q(t) is the time-dependent distribution width, whose
growth in time is described, as we already know, by Eq. (77):

q(t )  2 Dt 1 / 2 . (5.113)
Then it is straightforward to verify, by substitution, that this solution satisfies the following simple
partial differential equation,
w 2w
D 2 , (5.114)
t q
with the delta-functional initial condition

60 If U₀ is comparable with T, the system's behavior also depends substantially on the initial probability distribution, i.e., does not follow the simple law (111).
61 See, e.g., either R. Stratonovich, Topics in the Theory of Random Noise, vol. 1., Gordon and Breach, 1963, or
Chapter 1 in the monograph by W. Coffey et al., cited above.


$$w(q,0) = \delta(q-q_0). \tag{5.115}$$

The simple and important equation of diffusion (114) may be naturally generalized to the 3D motion:62

$$\frac{\partial w}{\partial t} = D\,\nabla^2 w \qquad \text{(equation of diffusion)}. \tag{5.116}$$
Now let us compare this equation with the probability conservation law,63

$$\frac{\partial w}{\partial t} + \nabla\cdot\mathbf j_w = 0, \tag{5.117a}$$

where the vector jw has the physical sense of the probability current density. (The validity of this relation is evident from its integral form,

$$\frac{d}{dt}\int_V w\,d^3r + \oint_S \mathbf j_w\cdot d^2\mathbf r = 0, \tag{5.117b}$$

resulting from the integration of Eq. (117a) over an arbitrary time-independent volume V limited by surface S, and the application of the divergence theorem64 to the second term of the result.)
The continuity relation (117a) coincides with Eq. (116), with D given by Eq. (78), if we take

$$\mathbf j_w = -D\,\nabla w = -\frac{T}{\eta}\,\nabla w. \tag{5.118}$$

The first form of this relation65 allows a simple interpretation: the probability flow is proportional to the spatial gradient of the probability density (i.e., in application to N >> 1 similar and independent particles, just to the gradient of their concentration n = Nw), with the sign corresponding to the flow from the higher to lower concentrations. This flow is the very essence of the effect of diffusion. The second form of Eq. (118) is also not very surprising: the diffusion speed scales as temperature and is inversely proportional to the viscous drag.
The fundamental Eq. (117) has to be satisfied also in the case of a force-driven particle at negligible diffusion (D → 0); in this case

$$\mathbf j_w = w\mathbf v, \tag{5.119}$$

where v is the deterministic velocity of the particle. In the high-damping limit we are considering right now, v has to be just the drift velocity:

$$\mathbf v = \frac{1}{\eta}\,\mathbf F_{\rm det} = -\frac{1}{\eta}\,\nabla U(\mathbf q), \tag{5.120}$$

62 As will be discussed in Chapter 6, the equation of diffusion also describes several other physical phenomena –
in particular, the heat propagation in a uniform, isotropic solid, and in this context is called the heat conduction
equation or (rather inappropriately) just the “heat equation”.
63 Both forms of Eq. (117) are similar to the mass conservation law in classical dynamics (see, e.g., CM Sec. 8.2),
the electric charge conservation law in electrodynamics (see, e.g., EM Sec. 4.1), and the probability conservation
law in quantum mechanics (see, e.g., QM Sec. 1.4).
64 See, e.g., MA Eq. (12.2).
65 In application to systems of many similar, weakly-interacting particles (to be discussed in the next chapter)
where w is proportional to the particle density n, this expression is sometimes called the Fick’s law, after the
physiologist A. Fick who suggested it in 1855.


where Fdet is the deterministic force described by the potential energy U(q).
Now that we have descriptions of jw due to both the drift and the diffusion separately, we may rationally assume that in the general case when both effects are present, the corresponding components (118) and (119) of the probability current just add up, so

$$\mathbf j_w = -\frac{1}{\eta}\left(w\,\nabla U + T\,\nabla w\right), \tag{5.121}$$

and Eq. (117a) takes the form

$$\frac{\partial w}{\partial t} = \frac{1}{\eta}\left[\nabla\cdot\left(w\,\nabla U\right) + T\,\nabla^2 w\right] \qquad \text{(Smoluchowski equation)}. \tag{5.122}$$

This is the Smoluchowski equation (or “Smoluchowski diffusion equation”),66 which is closely related to
the drift-diffusion equation in multi-particle kinetics – to be discussed in the next chapter.
As a sanity check, let us see what the Smoluchowski equation gives in the stationary limit, ∂w/∂t → 0 (which evidently may be eventually achieved only if the deterministic potential U is time-independent). Then Eq. (117a) yields jw = const, where the constant describes the deterministic motion of the system as the whole. If such a motion is absent, jw = 0, then according to Eq. (121),

$$w\,\nabla U + T\,\nabla w = 0, \qquad \text{i.e.}\quad \frac{\nabla w}{w} = -\frac{\nabla U}{T}. \tag{5.123}$$

Since the left-hand side of the last relation is just ∇(ln w), it may be easily integrated over q, giving

$$\ln w = -\frac{U}{T} + \ln C, \qquad \text{i.e.}\quad w(\mathbf q) = C\exp\left\{-\frac{U(\mathbf q)}{T}\right\}, \tag{5.124}$$

where C is a normalization constant. With both sides multiplied by the number N of similar, independent systems, with the spatial density n(q) = Nw(q), this equality becomes the Boltzmann distribution (3.26).
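This stationary solution is easy to verify numerically. The following minimal sketch (with an arbitrary quartic test potential and arbitrary parameter values) integrates the 1D version of Eq. (122) by explicit finite differences and compares the long-time result with the Boltzmann distribution (124):

import numpy as np

eta, T = 1.0, 0.3                      # arbitrary illustrative values
q = np.linspace(-2.5, 2.5, 401)
dq = q[1] - q[0]
dUdq = q**3                            # dU/dq for the test potential U = q^4/4

w = np.exp(-(q - 1.0)**2 / 0.02)       # some off-equilibrium initial state
w /= w.sum() * dq
dt = 0.2 * dq**2 * eta / T             # a stable explicit time step

for _ in range(100_000):
    j = -(w * dUdq + T * np.gradient(w, dq)) / eta   # probability current (121)
    w -= dt * np.gradient(j, dq)                     # continuity equation (117a)
    w /= w.sum() * dq                                # guard against roundoff drift

w_B = np.exp(-q**4 / (4 * T)); w_B /= w_B.sum() * dq # Boltzmann distribution (124)
print(f"max |w - w_Boltzmann| = {np.abs(w - w_B).max():.1e}")

The final deviation should be small (limited only by the finite-difference discretization), regardless of the initial distribution chosen.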
As a less trivial example of the Smoluchowski equation's applications, let us use it to solve the 1D Kramers problem (Fig. 10) in the corresponding high-damping limit, m << ητA, where τA (still to be calculated) is some time scale of the particle's motion inside the well. It is straightforward to verify that the 1D version of Eq. (121),

$$I_w = -\frac{1}{\eta}\left(w\frac{\partial U}{\partial q} + T\frac{\partial w}{\partial q}\right) \tag{5.125a}$$

(where Iw is the probability current at point q, rather than its density) is mathematically equivalent to

$$I_w = -\frac{T}{\eta}\exp\left\{-\frac{U(q)}{T}\right\}\frac{\partial}{\partial q}\left[w\exp\left\{\frac{U(q)}{T}\right\}\right], \tag{5.125b}$$

so we may write

$$I_w\exp\left\{\frac{U(q)}{T}\right\} = -\frac{T}{\eta}\,\frac{\partial}{\partial q}\left[w\exp\left\{\frac{U(q)}{T}\right\}\right]. \tag{5.126}$$

66 It is named after Marian Smoluchowski, who developed this formalism in 1906. Note that sometimes Eq. (122) is referred to as the Fokker-Planck equation, but it is more common to use that name for a more general equation discussed in the next section.


As was discussed above, the notion of a metastable state's lifetime is well-defined only for sufficiently low temperatures,

$$T \ll U_0, \tag{5.127}$$

when the lifetime is relatively long: τ >> τA. Since according to Eq. (111a), the first term of the continuity equation (117b) has to be of the order of W/τ, in this limit that term, and hence the gradient of Iw, are exponentially small, so the probability current virtually does not depend on q in the potential barrier region. Let us use this fact in the integration of both sides of Eq. (126) over that region:

$$I_w\int_{q'}^{q''}\exp\left\{\frac{U(q)}{T}\right\}dq = -\frac{T}{\eta}\left[w\exp\left\{\frac{U(q)}{T}\right\}\right]_{q'}^{q''}, \tag{5.128}$$

where the integration limits q′ and q″ (shown schematically in Fig. 10) are selected so that

$$T \ll U(q') - U(q_1),\; U(q_2) - U(q'') \ll U_0. \tag{5.129}$$
(Obviously, such a selection is only possible if condition (127) is satisfied.) In this limit, the contribution from the point q″ to the right-hand side of Eq. (128) is negligible because the probability density behind the barrier is exponentially small. On the other hand, the probability at the point q′ has to be close to the value given by its quasi-stationary Boltzmann distribution (124), so

$$w(q')\exp\left\{\frac{U(q')}{T}\right\} = w(q_1)\exp\left\{\frac{U(q_1)}{T}\right\}, \tag{5.130}$$

and Eq. (128) yields

$$I_w = \frac{T}{\eta}\,w(q_1)\left[\int_{q'}^{q''}\exp\left\{\frac{U(q)-U(q_1)}{T}\right\}dq\right]^{-1}. \tag{5.131}$$

Patience, my reader, we are almost done. The probability density w(q₁) at the well's bottom may be expressed in terms of the total probability W of the particle being in the well by using the normalization condition

$$W = w(q_1)\int_{\text{well's bottom}}\exp\left\{-\frac{U(q)-U(q_1)}{T}\right\}dq; \tag{5.132}$$

the integration here may be limited to the region where the difference U(q) – U(q₁) is much smaller than U₀ – cf. Eq. (129). According to the Taylor expansion, the shape of virtually any smooth potential U(q) near the point q₁ of its minimum may be well approximated with a quadratic parabola:

$$U(q \approx q_1) \approx U(q_1) + \frac{\kappa_1}{2}(q-q_1)^2, \qquad \text{where}\quad \kappa_1 \equiv \frac{d^2U}{dq^2}\bigg|_{q=q_1} > 0. \tag{5.133}$$

With this approximation, Eq. (132) is reduced to the standard Gaussian integral:67

$$W = w(q_1)\int_{\text{well's bottom}}\exp\left\{-\frac{\kappa_1(q-q_1)^2}{2T}\right\}dq \approx w(q_1)\int_{-\infty}^{+\infty}\exp\left\{-\frac{\kappa_1\tilde q^2}{2T}\right\}d\tilde q = w(q_1)\left(\frac{2\pi T}{\kappa_1}\right)^{1/2}. \tag{5.134}$$

67 If necessary, see MA Eq. (6.9b) again.


To complete the calculation, we may use a similar approximation for the barrier top:

$$U(q \approx q_2) \approx U(q_2) - \frac{\kappa_2}{2}(q-q_2)^2 \equiv U(q_1) + U_0 - \frac{\kappa_2}{2}(q-q_2)^2, \qquad \text{where}\quad \kappa_2 \equiv -\frac{d^2U}{dq^2}\bigg|_{q=q_2} > 0, \tag{5.135}$$

and work out the remaining integral in Eq. (131), because in the limit (129) it is dominated by the contribution from a region very close to the barrier top, where the approximation (135) is asymptotically exact. As a result, we get

$$\int_{q'}^{q''}\exp\left\{\frac{U(q)-U(q_1)}{T}\right\}dq \approx \exp\left\{\frac{U_0}{T}\right\}\left(\frac{2\pi T}{\kappa_2}\right)^{1/2}. \tag{5.136}$$

Plugging Eq. (136), and the w(q₁) expressed from Eq. (134), into Eq. (131), we finally get

$$I_w = W\,\frac{(\kappa_1\kappa_2)^{1/2}}{2\pi\eta}\exp\left\{-\frac{U_0}{T}\right\}. \tag{5.137}$$
This expression should be compared with the 1D version of Eq. (117b) for the segment [–∞, q′]. Since this interval covers the region near q₁ where most of the probability density resides, and Iw(–∞) = 0, this equation is merely

$$\frac{dW}{dt} + I_w(q') = 0. \tag{5.138}$$

In our approximation, Iw(q′) does not depend on the exact position of the point q′, and is given by Eq. (137), so plugging it into Eq. (138), we recover the exponential decay law (111a), with the lifetime τ obeying the Arrhenius law (111b), and the following attempt time:

$$\tau_A = \frac{2\pi\eta}{(\kappa_1\kappa_2)^{1/2}} \equiv 2\pi\left(\tau_1\tau_2\right)^{1/2}, \qquad \text{where}\quad \tau_{1,2} \equiv \frac{\eta}{\kappa_{1,2}} \qquad \text{(Kramers formula for high damping)}. \tag{5.139}$$
Thus the metastable state lifetime is indeed described by the Arrhenius law, with the attempt time scaling as the geometric mean of the system's "relaxation times" near the potential well bottom (τ₁) and the potential barrier top (τ₂).68 Let me leave it for the reader's exercise to prove that if the potential profile near the well's bottom and/or top is sharp, the expression for the attempt time should be modified, but the Arrhenius decay law (111) is not affected.
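These results are easy to test by a brute-force Monte Carlo experiment. The sketch below (with purely illustrative parameter values) simulates overdamped Brownian motion in the cubic test potential U(q) = q²/2 – q³/3, for which κ₁ = κ₂ = 1 and U₀ = 1/6, and estimates the lifetime from the survival probability; since U₀/T is taken only moderately large here, merely an order-of-magnitude agreement with Eqs. (111) and (139) may be expected:

import numpy as np

rng = np.random.default_rng(2)
eta, T = 1.0, 1.0 / 12.0          # so U_0/T = 2 - marginal for the Arrhenius law
dt, t_max, n_traj = 5e-3, 100.0, 2000

q = np.zeros(n_traj)              # all trajectories start at the well's bottom q1 = 0
alive = np.ones(n_traj, dtype=bool)
for _ in range(int(t_max / dt)):
    noise = rng.standard_normal(n_traj)
    q[alive] += -(q[alive] - q[alive]**2) / eta * dt \
                + np.sqrt(2 * T * dt / eta) * noise[alive]
    alive &= (q < 2.0)            # escape declared well beyond the barrier top q2 = 1

tau_sim = -t_max / np.log(alive.mean())    # from W(t_max) = exp{-t_max/tau}
tau_th = 2 * np.pi * eta * np.exp(2.0)     # Eqs. (111b), (139): (k1*k2)^1/2 = 1
print(f"simulated tau ~ {tau_sim:.0f}, Eqs. (111b)+(139) give ~ {tau_th:.0f}")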

5.7. The Fokker-Planck equation


Formula (139) is just a particular, high-damping limit of a more general result obtained by Kramers. In order to get all of it (and much more), we need to generalize the Smoluchowski equation to arbitrary values of damping η. In this case, the probability density w is a function of not only the particle's position q (and time t) but also of its momentum p – see Eq. (2.11). Thus the continuity equation (117) needs to be generalized to the 6D phase space {q, p}. Such generalization is very natural:

68 Actually, 2 describes the characteristic time of the exponential growth of small deviations from the unstable
fixed point q2 at the barrier top, rather than their decay, as near the stable point q1.


$$\frac{\partial w}{\partial t} + \nabla_q\cdot\mathbf j_q + \nabla_p\cdot\mathbf j_p = 0, \tag{5.140}$$

where jq (which was called jw in the last section) is the probability current density in the coordinate space, and ∇q (which was denoted as ∇ in that section) is the usual vector operator in that space, while jp is the current density in the momentum space, and ∇p is the similar vector operator in that space:

$$\nabla_q \equiv \sum_{j=1}^3 \mathbf n_j\frac{\partial}{\partial q_j}, \qquad \nabla_p \equiv \sum_{j=1}^3 \mathbf n_j\frac{\partial}{\partial p_j}. \tag{5.141}$$

At negligible fluctuations (T → 0), jp may be composed using the natural analogy with jq – see Eq. (119). In our new notation, that relation reads

$$\mathbf j_q = w\dot{\mathbf q} = w\frac{\mathbf p}{m}, \tag{5.142}$$

so it is natural to take

$$\mathbf j_p = w\dot{\mathbf p} = w\langle\mathbf F\rangle, \tag{5.143a}$$

where the (statistical-ensemble) averaged force ⟨F⟩ includes not only the contribution due to the potential's gradient, but also the drag force –ηv provided by the environment – see Eq. (64) and its discussion:

$$\mathbf j_p = w\left(-\nabla_q U - \eta\mathbf v\right) = -w\left(\nabla_q U + \eta\frac{\mathbf p}{m}\right). \tag{5.143b}$$

As a sanity check, it is straightforward to verify that the diffusion-free equation resulting from the combination of Eqs. (140), (142) and (143),

$$\left.\frac{\partial w}{\partial t}\right|_{\rm drift} = -\nabla_q\cdot\left(w\frac{\mathbf p}{m}\right) + \nabla_p\cdot\left[w\left(\nabla_q U + \eta\frac{\mathbf p}{m}\right)\right], \tag{5.144}$$

allows the following particular solution:

$$w(\mathbf q,\mathbf p,t) = \delta\left(\mathbf q - \langle\mathbf q\rangle(t)\right)\,\delta\left(\mathbf p - \langle\mathbf p\rangle(t)\right), \tag{5.145}$$

where the statistical-averaged coordinate and momentum satisfy the deterministic equations of motion,

$$\langle\dot{\mathbf q}\rangle = \frac{\langle\mathbf p\rangle}{m}, \qquad \langle\dot{\mathbf p}\rangle = -\nabla_q U - \eta\frac{\langle\mathbf p\rangle}{m}, \tag{5.146}$$

describing the particle's drift, with the usual deterministic initial conditions.
describing the particle’s drift, with the usual deterministic initial conditions.
In order to understand how the diffusion should be accounted for, let us consider a statistical ensemble of free (∇qU = 0, η = 0) particles that are uniformly distributed in the direct space q (so ∇qw = 0), but possibly localized in the momentum space. In this case, the right-hand side of Eq. (144) vanishes, i.e. the time evolution of the probability density w may be only due to diffusion. In the corresponding limit ⟨F⟩ → 0, the Langevin equation (107) for each Cartesian coordinate is reduced to

$$m\ddot q_j = \tilde F_j(t), \qquad \text{i.e.}\quad \dot p_j = \tilde F_j(t). \tag{5.147}$$


The last equation is identical to the high-damping 1D equation (74) (with Fdet = 0), with the replacement q → pj/η, and hence the corresponding contribution to ∂w/∂t may be described by the last term of Eq. (122), with that replacement:

$$\left.\frac{\partial w}{\partial t}\right|_{\rm diffusion} = D\,\nabla^2_{p/\eta}\,w = \eta^2 D\,\nabla_p^2 w = T\eta\,\nabla_p^2 w. \tag{5.148}$$

Now the reasonable assumption that in the arbitrary case the drift and diffusion contributions to ∂w/∂t just add up immediately leads us to the full Fokker-Planck equation:69

$$\frac{\partial w}{\partial t} = -\nabla_q\cdot\left(w\frac{\mathbf p}{m}\right) + \nabla_p\cdot\left[w\left(\nabla_q U + \eta\frac{\mathbf p}{m}\right)\right] + T\eta\,\nabla_p^2 w \qquad \text{(Fokker-Planck equation)}. \tag{5.149}$$
As a sanity check, let us use this equation to calculate the stationary probability distribution of the momentum of particles with an arbitrary damping η but otherwise free, in the momentum space, assuming (just for simplicity) their uniform distribution in the direct space, ∇qw = 0. In this case, Eq. (149) is reduced to

$$\eta\,\nabla_p\cdot\left(\frac{\mathbf p}{m}\,w\right) + T\eta\,\nabla_p^2 w = 0, \qquad \text{i.e.}\quad \nabla_p\cdot\left[\frac{\mathbf p}{m}\,w + T\,\nabla_p w\right] = 0. \tag{5.150}$$

The first integration over the momentum space yields

$$\frac{\mathbf p}{m}\,w + T\,\nabla_p w = \mathbf j_w, \qquad \text{i.e.}\quad w\,\nabla_p\left(\frac{p^2}{2m}\right) + T\,\nabla_p w = \mathbf j_w, \tag{5.151}$$

where jw is a vector constant describing a possible general probability flow in the system. In the absence of such flow, jw = 0, we get

$$\nabla_p\left(\frac{p^2}{2m}\right) + T\,\frac{\nabla_p w}{w} \equiv \nabla_p\left(\frac{p^2}{2m} + T\ln w\right) = 0, \qquad \text{giving}\quad w = \text{const}\times\exp\left\{-\frac{p^2}{2mT}\right\}, \tag{5.152}$$

i.e. the Maxwell distribution (3.5). However, the result (152) is more general than that obtained in Sec. 3.1, because it shows that the distribution stays the same even at non-zero damping. It is also straightforward to verify that in the more general case of an arbitrary stationary potential U(q), Eq. (149) is satisfied with the stationary solution (3.24), also giving jw = 0.
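A quick numerical illustration of this damping independence: the sketch below samples the stationary momentum distribution of free 1D particles directly from the Langevin equation ṗ = –(η/m)p + F̃(t), and compares its variance with the equipartition value mT (parameter values are arbitrary; note the deliberately weak damping):

import numpy as np

rng = np.random.default_rng(3)
m, eta, T = 1.0, 0.2, 1.0          # arbitrary illustrative values
dt, n_steps, n_part = 2e-2, 20_000, 5000

p = np.zeros(n_part)
for _ in range(n_steps):
    # Euler-Maruyama step: drag plus a Wiener increment of variance 2*eta*T*dt
    p += -(eta / m) * p * dt + np.sqrt(2 * eta * T * dt) * rng.standard_normal(n_part)

print(f"<p^2>/(mT) = {p.var() / (m * T):.3f}   (the Maxwell distribution gives 1)")

The printed ratio should be close to 1 for any η, in agreement with Eq. (152).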
In the limit where the damping is large, i.e. the inertial effects are relatively small, the solution of the Fokker-Planck equation tends, relatively rapidly, to the following product

$$w(\mathbf q,\mathbf p,t) \to \exp\left\{-\frac{\left[\mathbf p - \mathbf p_0(\mathbf q,t)\right]^2}{2mT}\right\}\,w(\mathbf q,t), \tag{5.153}$$

where p₀ ≡ –(m/η)∇qU, followed by a much slower time evolution of the direct-space distribution w(q, t), described by the Smoluchowski equation (122).
Another important particular case is that of a quasi-periodic motion of a particle, with low
damping, in a soft potential well. In this case, the Fokker-Planck equation describes both the diffusion of

69It was first derived by Adriaan Fokker in 1913 in his PhD thesis, and further elaborated by Max Planck in 1917.
(Curiously, A. Fokker is more famous for his work on music theory, and the invention and construction of several
new keyboard instruments, than for this and several other important contributions to theoretical physics.)


the effective phase φ of such a (generally nonlinear, "anharmonic") oscillator, and a slow relaxation of its energy. If we are only interested in the latter effect, Eq. (149) may be reduced to the so-called energy diffusion equation,70 which is much easier to solve.
However, in most practically interesting cases, solutions of Eq. (149) are rather complicated. (Indeed, the reader should remember that these solutions embody, in the particular case T = 0, all classical dynamics of a particle.) Because of this, I will present (rather than derive) only one more of them: the Kramers' solution71 of his problem (Fig. 10) for η/mω₂ ~ 1. In this general case, the metastable state's lifetime turns out to be again given by the Arrhenius formula (111b), with the following reciprocal attempt time:

$$\frac{1}{\tau_A} = \frac{\omega_1}{2\pi}\left[\left(1+\frac{\eta^2}{4m^2\omega_2^2}\right)^{1/2} - \frac{\eta}{2m\omega_2}\right] \to \begin{cases}\dfrac{\omega_1\omega_2 m}{2\pi\eta} \equiv \dfrac{1}{2\pi(\tau_1\tau_2)^{1/2}}, & \text{for } \eta/m\omega_2 \gg 1,\\[1.5ex] \dfrac{\omega_1}{2\pi}, & \text{for } 1 \gg \eta/m\omega_2,\end{cases} \tag{5.154}$$

where ω₁,₂ ≡ (κ₁,₂/m)^{1/2}. Thus, in the limit η/mω₂ << 1, Eqs. (111b) and (154) give a very simple result,

$$\tau = \frac{2\pi}{\omega_1}\exp\left\{\frac{U_0}{T}\right\}. \tag{5.155}$$
Note, however, that this result is strictly valid only at η/mω₂ >> T/U₀ (as a reminder, the latter ratio has to be much smaller than 1 in order for the very notion of the lifetime τ to be meaningful); at lower damping, its pre-exponential factor requires a correction, which may be calculated using the already mentioned energy diffusion equation.72
The Kramers' result for the classical thermal activation of a system over a potential barrier may be compared with that for its quantum-mechanical tunneling through the barrier.73 The WKB approximation for the latter effect gives the expression

$$\tau_Q = \tau_A\exp\left\{2\int_{\kappa^2(q)>0}\kappa(q)\,dq\right\}, \qquad \text{with}\quad \frac{\hbar^2\kappa^2(q)}{2m} \equiv U(q) - E, \tag{5.156}$$

showing that generally, the classical and quantum lifetimes of a metastable state have different
dependences on the barrier shape. For example, for a nearly-rectangular potential barrier, the exponent
that determines the classical lifetime (155) depends (linearly) only on the barrier height U0, while that
defining the quantum lifetime (156) is proportional to the barrier width and to the square root of U0.
However, in the important case of “soft” potential profiles, which are typical for the case of emerging
(or nearly disappearing) quantum wells (Fig. 11), the classical and quantum results are closely related.
Fig. 5.11. Cubic-parabolic potential profile and its parameters (the well's bottom at q₁, the barrier's top at q₂, of height U₀).

70 An example of such an equation, for the particular case of a harmonic oscillator, is given by QM Eq. (7.214).
The Fokker-Planck equation, of course, can give only its classical limit, with n, ne >> 1.
71 H. Kramers, Physica 7, 284 (1940); see also the model solution of Problem 27.
72 See, e.g., the review paper by O. Mel’nikov, Phys. Repts. 209, 1 (1991), and references therein.
73 See, e.g., QM Secs. 2.4-2.6.


Indeed, such a potential profile U(q) may be well approximated by four leading terms of its Taylor expansion, with the highest term proportional to (q – q₀)³, near any point q₀ in the vicinity of the well. In this approximation, the second derivative d²U/dq² vanishes at the inflection point q₀ = (q₁ + q₂)/2, exactly between the well's bottom and the barrier's top (in Fig. 11, q₁ and q₂). Selecting the origin at this point, as is done in Fig. 11, we may reduce the approximation to just two terms:74

$$U(q) = aq - \frac{b}{3}\,q^3. \tag{5.157}$$

(For the particle's escape into the positive direction of the q-axis, we should have a, b > 0.) An easy calculation gives all essential parameters of this cubic parabola: the positions of its minimum and maximum,

$$q_2 = -q_1 = \left(\frac{a}{b}\right)^{1/2}, \tag{5.158}$$

the barrier height over the well's bottom,

$$U_0 \equiv U(q_2) - U(q_1) = \frac{4}{3}\left(\frac{a^3}{b}\right)^{1/2}, \tag{5.159}$$

and the effective spring constants at these points:

$$\kappa_1 = \kappa_2 = \left|\frac{d^2U}{dq^2}\right|_{q=q_{1,2}} = 2(ab)^{1/2}. \tag{5.160}$$

Hence for this potential profile, Eq. (155) may be rewritten as

$$\tau = \frac{2\pi}{\omega_0}\exp\left\{\frac{U_0}{T}\right\}, \qquad \text{with}\quad \omega_0^2 \equiv \frac{2(ab)^{1/2}}{m} \qquad \text{(soft well: thermal lifetime)}. \tag{5.161}$$

On the other hand, for the same profile, the WKB approximation (156) (which is accurate when the height of the metastable state energy over the well's bottom, E – U(q₁) ≈ ħω₀/2, is much lower than the barrier height U₀) yields

$$\tau_Q = \frac{2\pi}{\omega_0}\left(\frac{\pi\hbar\omega_0}{864\,U_0}\right)^{1/2}\exp\left\{\frac{36}{5}\,\frac{U_0}{\hbar\omega_0}\right\} \qquad \text{(soft well: quantum lifetime)}. \tag{5.162}$$
The comparison of the dominating, exponential factors in these two results shows that the thermal activation yields a lower lifetime (i.e., dominates the metastable state decay) if the temperature is above the crossover value

$$T_c = \frac{5}{36}\,\hbar\omega_0 \equiv \frac{\hbar\omega_0}{7.2}. \tag{5.163}$$

This expression for the cubic-parabolic barrier may be compared with a similar crossover for a quadratic-parabolic barrier,76 for which Tc = ħω₀/2π ≡ ħω₀/6.28. We see that the numerical factors for

74 As a reminder, a similar approximation arises for the P(V) function, at the analysis of the van der Waals model
near the critical temperature – see Problem 4.6.
75 The main, exponential factor in this result may be obtained simply by ignoring the difference between E and U(q₁), but the correct calculation of the pre-exponential factor requires taking this difference, ħω₀/2, into account – see, e.g., the model solution of QM Problem 2.43.
76 See, e.g., QM Sec. 2.4.


the quantum-to-classical crossover temperature for these two different soft potential profiles are close to
each other – and much larger than 1, which could result from a naïve estimate.

5.8. Back to the correlation function


Unfortunately, I will not have time/space to either derive or even review solutions of other
problems using the Smoluchowski and Fokker-Planck equations, but have to mention one conceptual
issue. Since it is intuitively clear that the solution w(q, p, t) of the Fokker-Planck equation for a system
provides full statistical information about it, one may wonder how it may be used to find its temporal
characteristics that were discussed in Secs. 4-5 using the Langevin formalism. For any statistical
average of a function taken at the same time instant, the answer is clear – cf. Eq. (2.11):

f q(t ), p(t )   f (q, p) w(q, p, t )d 3 qd 3 p , (5.164)

but what if the function depends on variables taken at different times, for example as in the correlation
function Kf() defined by Eq. (48)?
To answer this question, let us start from the discrete-variable case when Eq. (164) takes the
form (2.7), which, for our current purposes, may be rewritten as

f t    f mWm (t ) . (5.165)
m

In plain English, this is a sum of all possible values of the function, each multiplied by its probability as a function of time. But this implies that the average ⟨f(t)f(t′)⟩ may be calculated as the sum of all possible products fmfm′, multiplied by the joint probability to measure outcome m at moment t, and outcome m′ at moment t′. The joint probability may be represented as a product of Wm(t) by the conditional probability W(m′, t′ | m, t). Since the correlation function is well defined only for stationary systems, in the last expression we may take t = 0, i.e. look for the conditional probability as the solution, Wm′(τ), of the equation describing the system's probability evolution, at time τ = t′ – t (rather than t′), with the special initial condition

$$W_{m'}(0) = \delta_{m',m}. \tag{5.166}$$

On the other hand, since the average ⟨f(t)f(t + τ)⟩ of a stationary process should not depend on t, instead of Wm(0) we may take the stationary probability distribution Wm(∞), independent of the initial conditions, which may be found as the same special solution, but at time τ → ∞. As a result, we get

$$\langle f(t)\,f(t+\tau)\rangle = \sum_{m,m'} f_m W_m(\infty)\,f_{m'}W_{m'}(\tau) \qquad \text{(correlation function: discrete system)}. \tag{5.167}$$

This expression looks simple, but note that this recipe requires solving the time evolution equations for each Wm′(τ) for all possible initial conditions (166). To see how this recipe works in practice, let us revisit the simplest two-level system (see, e.g., Fig. 4.13, which is reproduced in Fig. 12 below in a notation more convenient for our current purposes) and calculate the correlation function of its energy fluctuations.


Fig. 5.12. Dynamics of a two-level system: the energy levels E₀ = 0 and E₁ = Δ, their occupation probabilities W₀(t) and W₁(t), and the transition rates Γ↑ and Γ↓.

The stationary probabilities of the system's states (i.e. their probabilities for τ → ∞) have been calculated in problems of Chapter 2, and then again in Sec. 4.4 – see Eq. (4.68). In our current notation (Fig. 12),

$$W_0(\infty) = \frac{1}{1+e^{-\Delta/T}}, \qquad W_1(\infty) = \frac{1}{e^{\Delta/T}+1}, \qquad \text{so that}\quad \langle E(\infty)\rangle = W_0(\infty)\cdot 0 + W_1(\infty)\cdot\Delta = \frac{\Delta}{e^{\Delta/T}+1}. \tag{5.168}$$
e 1
To calculate the conditional probabilities Wm′(τ) with the initial conditions (166) (according to Eq. (167), we need all four of them, for {m, m′} = {0, 1}), we may use the master equations (4.100), in our current notation reading

$$\frac{dW_1}{d\tau} = -\frac{dW_0}{d\tau} = \Gamma_\uparrow W_0 - \Gamma_\downarrow W_1. \tag{5.169}$$

Since Eq. (169) conserves the total probability, W₀ + W₁ = 1, only one probability (say, W₁) is an independent variable, and for it, Eq. (169) gives a simple, linear differential equation,

$$\frac{dW_1}{d\tau} = \Gamma_\uparrow(1-W_1) - \Gamma_\downarrow W_1 \equiv \Gamma_\uparrow - \Gamma W_1, \qquad \text{where}\quad \Gamma \equiv \Gamma_\uparrow + \Gamma_\downarrow, \tag{5.170}$$

which may be readily integrated for an arbitrary initial condition:

$$W_1(\tau) = W_1(0)\,e^{-\Gamma\tau} + W_1(\infty)\left(1 - e^{-\Gamma\tau}\right), \tag{5.171}$$

where W₁(∞) is given by the second of Eqs. (168). (It is straightforward to verify that the solution for W₀(τ) may be represented in a form similar to Eq. (171), with the corresponding replacement of the state index.)
Now everything is ready to calculate the average ⟨E(t)E(t + τ)⟩ using Eq. (167), with fm,m′ = E0,1. Thanks to our (smart :-) choice of the energy reference, of the four terms in the double sum (167), all three terms that include at least one factor E₀ = 0 vanish, and we have only one term left to calculate:

$$\langle E(t)E(t+\tau)\rangle = E_1 W_1(\infty)\times E_1\left.W_1(\tau)\right|_{W_1(0)=1} = E_1^2\,W_1(\infty)\left[e^{-\Gamma\tau} + W_1(\infty)\left(1-e^{-\Gamma\tau}\right)\right] = \Delta^2\,\frac{1 + e^{\Delta/T}e^{-\Gamma\tau}}{\left(e^{\Delta/T}+1\right)^2}. \tag{5.172}$$
From here and the last of Eqs. (168), the correlation function of energy fluctuations is77

77 The step from the first line of Eq. (173) to its second line utilizes the fact that our system is stationary, so ⟨E(t + τ)⟩ = ⟨E(t)⟩ = ⟨E(∞)⟩ = const.


K E ( )  E (t ) E (t   )  E (t )  E (t ) E (t   )  E (t ) 
~ ~

e /T  (5.173)
 E (t ) E (t   )  E  
2
 2 e
e  / T  1 2
,

so its variance, equal to KE(0), does not depend on the transition rates  and . However, since the
rates have to obey the detailed balance relation (4.103), / = exp{/T}, for this variance we may
formally write
K E 0 e /T  /    
     2 , (5.174)
 2
e /T  1
2
 
 /   1     
2 2

so Eq. (173) may be represented in a simpler form:


Energy
    fluctuations:
K E ( )    2  e  .
2
(5.175) two-level
 system

We see that the correlation function of energy fluctuations decays exponentially with time, with the rate Γ. Now using the Wiener-Khinchin theorem (58) to calculate its spectral density, we get

$$S_E(\omega) = \frac{1}{\pi}\,\Delta^2\,\frac{\Gamma_\uparrow\Gamma_\downarrow}{\Gamma^2}\int_0^\infty e^{-\Gamma\tau}\cos\omega\tau\,d\tau = \frac{\Delta^2}{\pi}\,\frac{\Gamma_\uparrow\Gamma_\downarrow}{\Gamma}\,\frac{1}{\Gamma^2+\omega^2}. \tag{5.176}$$
Such a Lorentzian dependence on frequency is very typical for discrete-state systems described by master equations. It is interesting that the most widely accepted explanation of the 1/f noise (also called the "flicker" or "excess" noise), which was mentioned in Sec. 5, is that it is a result of thermally-activated jumps between states of two-level systems with an exponentially-broad statistical distribution of the transition rates Γ. Such a broad distribution follows from the Kramers formula (155), which is approximately valid for the lifetimes of both states of systems with double-well potential profiles (Fig. 13), for a statistical ensemble with a smooth statistical distribution of the energy barrier heights U₀. Such profiles are typical, in particular, for electrons in disordered (amorphous) solid-state materials, which indeed feature high 1/f noise.

Fig. 5.13. Typical double-well potential profile, with the barrier height U₀.
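The random telegraph process behind Eqs. (173)-(176) is also simple to simulate. The minimal sketch below generates jumps of a two-level system with rates satisfying detailed balance for Δ/T = 1 (all values being arbitrary illustration choices), and compares the measured energy correlations with Eq. (175):

import numpy as np

rng = np.random.default_rng(4)
delta = 1.0
g_down = 1.0
g_up = g_down * np.exp(-1.0)       # detailed balance (4.103) for Delta/T = 1
gamma = g_up + g_down
dt, n_steps = 1e-2, 2_000_000

state = np.zeros(n_steps, dtype=np.int8)
s = 0                              # start in the ground state
for k in range(1, n_steps):
    if rng.random() < (g_up if s == 0 else g_down) * dt:   # jump within dt
        s = 1 - s
    state[k] = s

E = delta * state
dE = E - E.mean()
for lag in (0, 50, 100):                                   # tau = 0, 0.5, 1.0
    K_sim = np.mean(dE[:n_steps - lag] * dE[lag:])
    K_th = delta**2 * g_up * g_down / gamma**2 * np.exp(-gamma * lag * dt)  # Eq. (175)
    print(f"tau = {lag * dt:3.1f}:  simulated {K_sim:.4f},  Eq. (175) {K_th:.4f}")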

Returning to the Fokker-Planck equation, we may use the following evident generalization of Eq. (167) to the continuous-variable case:

$$\langle f(t)\,f(t+\tau)\rangle = \int d^3q\,d^3p\int d^3q'\,d^3p'\,f(\mathbf q,\mathbf p)\,w(\mathbf q,\mathbf p,\infty)\,f(\mathbf q',\mathbf p')\,w(\mathbf q',\mathbf p',\tau) \qquad \text{(correlation function: continuous system)}, \tag{5.177}$$

where both probability densities are particular values of the equation's solution with the delta-functional initial condition


$$w(\mathbf q',\mathbf p',0) = \delta(\mathbf q'-\mathbf q)\,\delta(\mathbf p'-\mathbf p). \tag{5.178}$$

For the Smoluchowski equation, valid in the high-damping limit, the expressions are similar, albeit with a lower dimensionality:

$$\langle f(t)\,f(t+\tau)\rangle = \int d^3q\int d^3q'\,f(\mathbf q)\,w(\mathbf q,\infty)\,f(\mathbf q')\,w(\mathbf q',\tau), \tag{5.179}$$

$$w(\mathbf q',0) = \delta(\mathbf q'-\mathbf q). \tag{5.180}$$

To see this formalism in action, let us use it to calculate the correlation function Kq(τ) of a linear relaxator, i.e. an overdamped 1D harmonic oscillator with mω₀ << η. In this limit, as Eq. (65) shows, the oscillator's coordinate averaged over the ensemble of environments obeys a linear equation,

$$\eta\langle\dot q\rangle + \kappa\langle q\rangle = 0, \tag{5.181}$$

which describes its exponential relaxation from the initial position q₀ to the equilibrium position ⟨q⟩ = 0, with the reciprocal time constant Δ = κ/η:

$$\langle q\rangle(t) = q_0\,e^{-\Delta t}. \tag{5.182}$$

The deterministic equation (181) corresponds to the quadratic potential energy U(q) = κq²/2, so the 1D version of the corresponding Smoluchowski equation (122) takes the form

$$\frac{\partial w}{\partial t} = \frac{\kappa}{\eta}\,\frac{\partial}{\partial q}(qw) + \frac{T}{\eta}\,\frac{\partial^2 w}{\partial q^2}. \tag{5.183}$$

It is straightforward to check, by substitution, that this equation, rewritten for the function w(q′, τ), with the 1D version of the delta-functional initial condition (180), w(q′, 0) = δ(q′ – q), is satisfied with a Gaussian function:

$$w(q',\tau) = \frac{1}{(2\pi)^{1/2}\,\delta q(\tau)}\exp\left\{-\frac{\left[q' - \langle q\rangle(\tau)\right]^2}{2\,\delta q^2(\tau)}\right\}, \tag{5.184}$$

with its center ⟨q⟩(τ) moving in accordance with Eq. (182), and a time-dependent variance

$$\delta q^2(\tau) = \delta q^2(\infty)\left(1 - e^{-2\Delta\tau}\right), \qquad \text{where}\quad \delta q^2(\infty) = \langle \tilde q^2\rangle = \frac{T}{\kappa}. \tag{5.185}$$

(As a sanity check, the last equality coincides with the equipartition theorem's result.)
probability under the integral in Eq. (179) may be found from Eq. (184) in the limit    (in which
q()  0), by replacing q’ with q:
1  q2 
w(q, )  exp   . (5.186)
2 1 / 2 q()  2q 2 () 
Now all ingredients of the recipe (179) are ready, and we can spell it out, for f (q) = q, as

1
 
 q2  
 q'  qe    .
2

2 q ( )q ()  


q (t )q (t   )  dq dq' q exp    q' exp  (5.187)
 2q ()  2q 2 ( )
2
 


The integral over q′ may be worked out first, by replacing this integration variable with (q″ + qe^{–Δτ}) and hence dq′ with dq″:

$$\langle q(t)q(t+\tau)\rangle = \frac{1}{2\pi\,\delta q(\tau)\,\delta q(\infty)}\int_{-\infty}^{+\infty}q\exp\left\{-\frac{q^2}{2\,\delta q^2(\infty)}\right\}\left[\int_{-\infty}^{+\infty}\left(q'' + qe^{-\Delta\tau}\right)\exp\left\{-\frac{q''^2}{2\,\delta q^2(\tau)}\right\}dq''\right]dq. \tag{5.188}$$

The internal integral of the first term in the parentheses equals zero (as that of an odd function in symmetric integration limits), while that with the second term is the standard Gaussian integral, so

$$\langle q(t)q(t+\tau)\rangle = \frac{1}{(2\pi)^{1/2}\,\delta q(\infty)}\,e^{-\Delta\tau}\int_{-\infty}^{+\infty}q^2\exp\left\{-\frac{q^2}{2\,\delta q^2(\infty)}\right\}dq = \frac{2T}{\pi^{1/2}\kappa}\,e^{-\Delta\tau}\int_{-\infty}^{+\infty}\xi^2\exp\left\{-\xi^2\right\}d\xi. \tag{5.189}$$


The last integral78 equals π^{1/2}/2, so taking into account that for this stationary system, centered at the coordinate origin, ⟨q⟩(∞) = 0, we finally get a very simple result:

$$K_q(\tau) \equiv \langle \tilde q(t)\tilde q(t+\tau)\rangle = \langle q(t)q(t+\tau)\rangle - \langle q\rangle^2 = \langle q(t)q(t+\tau)\rangle = \frac{T}{\kappa}\,e^{-\Delta\tau} \qquad \text{(correlation function: linear relaxator)}. \tag{5.190}$$

As a sanity check, for τ = 0 it yields Kq(0) ≡ ⟨q̃²⟩ = T/κ, in accordance with Eq. (185). As τ is increased, the correlation function decreases monotonically – see the solid-line sketch in Fig. 8.
So, the solution of this very simple problem has required straightforward but somewhat bulky calculations. On the other hand, the same result may be obtained literally in one line using the Langevin formalism – namely, as the Fourier transform (59) of the spectral density (68) in the corresponding limit mω << η, with SF(ω) given by Eq. (73a):79

$$K_q(\tau) = 2\int_0^\infty S_q(\omega)\cos\omega\tau\,d\omega = 2\int_0^\infty \frac{\eta T}{\pi}\,\frac{\cos\omega\tau}{\kappa^2 + \omega^2\eta^2}\,d\omega = \frac{2T}{\pi\eta}\int_0^\infty \frac{\cos\omega\tau}{(\kappa/\eta)^2 + \omega^2}\,d\omega = \frac{T}{\kappa}\,e^{-\Delta\tau}. \tag{5.191}$$

This example illustrates the fact that for fluctuations in linear systems (and small fluctuations in
nonlinear systems) the Langevin approach is usually much simpler than the one based on the Fokker-
Planck or Smoluchowski equations. However, again, the latter approach is indispensable for the analysis
of fluctuations of arbitrary intensity in nonlinear systems.
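For completeness, here is the corresponding one-trajectory numerical check of Eq. (190), with arbitrary parameter values: an Euler-Maruyama simulation of the overdamped Langevin equation ηq̇ + κq = F̃(t), followed by a direct estimate of the correlation function.

import numpy as np

rng = np.random.default_rng(5)
eta, kappa, T = 1.0, 1.0, 0.5          # arbitrary illustrative values
Delta = kappa / eta
dt, n_steps = 1e-2, 1_000_000

q = np.empty(n_steps)
q[0] = 0.0
xi = np.sqrt(2 * T * dt / eta) * rng.standard_normal(n_steps - 1)
for k in range(1, n_steps):
    q[k] = q[k - 1] * (1.0 - Delta * dt) + xi[k - 1]   # Euler-Maruyama step

for tau in (0.0, 1.0, 2.0):
    lag = int(tau / dt)
    K_sim = np.mean(q[:n_steps - lag] * q[lag:])
    K_th = (T / kappa) * np.exp(-Delta * tau)          # Eq. (190)
    print(f"tau = {tau:3.1f}:  simulated {K_sim:.3f},  theory {K_th:.3f}")

Both the τ = 0 value (the equipartition variance T/κ) and the exponential decay rate Δ should be reproduced within a couple of percent for this trajectory length.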
To conclude this chapter, I have to emphasize again that the Fokker-Planck and Smoluchowski
equations give a quantitative description of the time evolution of nonlinear Brownian systems with
dissipation in the classical limit. The description of the corresponding properties of such dissipative
(“open”) and nonlinear quantum systems is more complex,80 and only a few simple problems of their
theory have been solved analytically so far,81 typically using particular models of the environment, e.g.,
as a large set of harmonic oscillators with various statistical distributions of their parameters, each leading to a specific function χ(ω) for the generalized susceptibility.

78 See, e.g., MA Eq. (6.9c).
79 The involved table integral may be found, e.g., in MA Eq. (6.11).
80 See, e.g., QM Sec. 7.6.
81 See, e.g., the solutions of the 1D Kramers problem for quantum systems with low damping by A. Caldeira and A. Leggett, Phys. Rev. Lett. 46, 211 (1981), and with high damping by A. Larkin and Yu. Ovchinnikov, JETP Lett. 37, 382 (1983).


5.10. Exercise problems

5.1. Treating the first 30 digits of the number π = 3.1415… as a statistical ensemble of integers k (equal to 3, 1, 4, 1, 5,…), calculate the average ⟨k⟩ and the r.m.s. fluctuation δk. Compare the results with those for the ensemble of randomly selected decimal integers 0, 1, 2,…, and 9.

5.2. An ideal classical gas of N similar particles fills a spherical cavity of radius R. Calculate the
variance of fluctuations of the position r of its center of mass, in equilibrium.

5.3. Calculate the variance of fluctuations of a magnetic moment m placed into an external
magnetic field H, within the same two models as in Problem 2.4:
(i) a quantum spin-½ with a gyromagnetic ratio γ, and
(ii) a classical magnetic moment m of a fixed magnitude m₀ but an arbitrary orientation,
both in thermal equilibrium at temperature T. Compare the results.82
Hint: Mind all three Cartesian components of the vector m.

5.4. For a field-free, two-site Ising system with energy values Em = –Js1s2, in thermal equilibrium
at temperature T, calculate the variance of energy fluctuations. Explore the low-temperature and high-
temperature limits of the result.

5.5. In a system in thermodynamic equilibrium with fixed T and μ, both the number N of
particles and the internal energy E may fluctuate. Express the mutual correlation factor of these
fluctuations via the average of E. Spell out your result for an ideal classical gas of N >> 1 particles.

5.6. As was mentioned in Sec. 2, the variance of energy fluctuations in a system with fixed T and μ (i.e. a member of a grand canonical ensemble) is generally different from that in a similar system in
which T and N are fixed, i.e. a member of a canonical ensemble. Calculate and interpret the difference.

5.7. For a uniform, three-site Ising ring with ferromagnetic coupling (and no external field), in thermal equilibrium at temperature T, calculate the correlation coefficients K_s ≡ ⟨s_k s_{k'}⟩ for both k = k' and k ≠ k'.

5.8.* For a field-free 1D Ising system of N >> 1 "spins", in thermal equilibrium at temperature T, calculate the correlation coefficient K_s ≡ ⟨s_l s_{l+n}⟩, where l and (l + n) are the numbers of two specific spins in the chain.
Hint: Consider a mixed partial derivative of the statistical sum calculated in Problem 4.21 for an
Ising chain with an arbitrary set of Jk, over a part of these parameters.

82 Note that these two cases may be considered as the non-interacting limits of, respectively, the Ising model
(4.23) and the classical limit of the Heisenberg model (4.21), whose analysis within the Weiss approximation was
the subject of Problems 4.22 and 4.23.


5.9. Within the framework of the Weiss molecular-field approximation, calculate the variance of
spin fluctuations in the d-dimensional Ising model. Use the result to derive the conditions of quantitative
validity of the approximation.

5.10. Calculate the variance of energy fluctuations in a quantum harmonic oscillator with frequency ω, in thermal equilibrium at temperature T, and express it via the average energy.

5.11. The spontaneous electromagnetic field inside a closed volume V is in thermal equilibrium
at temperature T. Assuming that V is sufficiently large, calculate the variance of fluctuations of the total
energy of the field, and express the result via its average energy and temperature. How large should the
volume V be for your results to be quantitatively valid? Evaluate this limitation for room temperature.

5.12. Express the r.m.s. uncertainty of the occupancy N_k of a certain energy level ε_k by non-interacting:
(i) classical particles,
(ii) fermions, and
(iii) bosons,
in thermodynamic equilibrium, via the level's average occupancy ⟨N_k⟩, and compare the results.

5.13. Write a general expression for the variance of the number of particles in the ideal gases of bosons and fermions, at fixed V, T, and μ. Spell out the result for the degenerate Fermi gas.
5.14. Express the variance of the number of particles, ⟨Ñ²⟩_{V,T,μ}, of a single-phase system in equilibrium, via its isothermal compressibility κ_T ≡ –(∂V/∂P)_{T,N}/V.

5.15.* Calculate the low-frequency spectral density of fluctuations of the pressure P of an ideal classical gas, in thermal equilibrium at temperature T, and estimate their variance. Compare the former result with the solution of Problem 3.2.
Hints: You may consider a cylindrically-shaped container of volume V = LA (see the figure on the right), and start by using the Maxwell distribution of velocities to calculate the spectral density of the force F(t) exerted by the confined particles on its plane lid of area A, approximating it as a delta-correlated process.
5.16. Calculate the low-frequency spectral density of fluctuations of the electric current I(t) due to the random passage of charged particles between two conducting electrodes – see the figure on the right. Assume that the particles are emitted, at random times, by one of the electrodes, and are fully absorbed by the counterpart electrode. Can your result be mapped on some aspect of the electromagnetic blackbody radiation?
Hint: For the current I(t), use the same delta-correlated-process approximation as for the force F(t) in the previous problem.

5.17. Perhaps the simplest model of diffusion is the 1D discrete random walk: each time interval τ, a particle leaps, with equal probability, to any of two adjacent sites of a 1D lattice with spatial
period a. Prove that the particle's displacement during a time interval t >> τ obeys Eq. (77), and calculate the corresponding diffusion coefficient D.

5.18.83 A long uniform string, of mass μ per unit length, is attached to a firm support and stretched with a constant force ("tension") 𝒯 – see the figure on the right. Calculate the spectral density of the random force F(t) exerted by the string on the support point, within the plane normal to its length, in thermal equilibrium at temperature T.
Hint: You may assume that the string is so long that the transverse wave propagating along it
from the support point, never comes back.

5.19.84 Each of the two 3D harmonic oscillators, with mass m, resonance frequency ω₀, and damping η > 0, has electric dipole moment d = qs, where s is the vector of the oscillator's displacement from its equilibrium position. Use the Langevin formalism to calculate the average potential of electrostatic interaction of these oscillators (a particular case of the so-called London dispersion force), separated by distance r >> (T/m)^{1/2}/ω₀, in thermal equilibrium at temperature T >> ℏω₀. Also, explain why the approach used to solve a very similar Problem 2.18 is not directly applicable to this case.
Hint: You may like to use the following integral: $$ \int_0^\infty \frac{\xi^2\,d\xi}{\left[(1-\xi^2)^2 + (a\xi)^2\right]^2} = \frac{\pi}{4a^3}. $$

5.20.* Within the van der Pol approximation,85 calculate major statistical properties of
fluctuations of classical self-oscillations (including their linewidth), for:
(i) a free (“autonomous”) run of the oscillator, and
(ii) its phase locked by an external sinusoidal force,
assuming that the fluctuations are caused by a weak external noise with a smooth spectral density S_f(ω).

5.21. Calculate the correlation function of the coordinate of a 1D harmonic oscillator with small
Ohmic damping, in thermal equilibrium. Compare the solution with that of the previous problem.

5.22. A lumped electric circuit, consisting of a capacitor C shorted with an Ohmic resistor R, is in thermal equilibrium at temperature T. Use two different approaches to calculate the variance of the thermal fluctuations of the capacitor's electric charge Q. Estimate the effect of quantum fluctuations.

83 This problem, conceptually important for the quantum mechanics of open systems, was also given in Chapter 7
of the QM part of this series.
84 This problem, for the case of arbitrary temperature, was the subject of QM Problem 7.6, with QM Problem 5.15 serving as the background. However, the method used in the model solutions of those problems requires one to prescribe, to the oscillators, different frequencies ω₁ and ω₂ at first, and only after this more general problem has been solved, pursue the limit ω₁ → ω₂, while neglecting dissipation altogether. The goal of this problem is to show that the result of that solution is valid even at non-zero damping.
85 See, e.g., CM Secs. 5.2-5.5. Note that in quantum mechanics, a similar approach is called the rotating-wave
approximation (RWA) – see, e.g., QM Secs. 6.5, 7.6, 9.2, and 9.4.

5.23. Consider a very long, uniform, two-wire transmission line (see the figure on the right) with wave impedance Z, which allows propagation of TEM electromagnetic waves with negligible attenuation. Calculate the variance ⟨V²⟩_Δν of spontaneous fluctuations of the voltage V between the wires within a small interval Δν of cyclic frequencies, in thermal equilibrium at temperature T.
Hint: As an E&M reminder,86 in the absence of dispersive materials, TEM waves propagate with a frequency-independent velocity, with the voltage V and the current I (see the figure above) related as V(x,t)/I(x,t) = ±Z, where Z is the line's wave impedance.

5.24. Now consider a similar long transmission line but terminated, at one end, with an impedance-matching Ohmic resistor R = Z. Calculate the variance ⟨V²⟩_Δν of the voltage across the resistor, and discuss the relation between the result and the Nyquist formula (81b), including numerical
factors.
Hint: A termination with resistance R = Z absorbs incident TEM waves without reflection.
5.25. An overdamped classical 1D particle escapes from a potential well with a smooth bottom but a sharp top of the barrier – see the figure on the right. Perform the necessary modification of the Kramers formula (139).

5.26.* Similar particles, whose spontaneous electric dipole moments p have a field-independent
magnitude p0, are uniformly distributed in space with a density n so low that their mutual interaction is
negligible. Each particle may rotate without substantial inertia but under a kinematic friction torque
proportional to its angular velocity. Use the Smoluchowski equation to calculate the complex dielectric
constant () of such a medium, in thermal equilibrium with temperature T, for a weak, linearly-
polarized rf electric field.

5.27.* Prove that for systems with relatively low inertia (i.e. relatively high damping), at not very
high temperatures, the Fokker-Planck equation (149) reduces to the Smoluchowski equation (122) – in
the sense described by Eq. (153) and the accompanying statement.

5.28.* Use the 1D version of the Fokker-Planck equation (149) to prove the solution (156) of the
Kramers problem.

5.29. A constant external torque, applied to a mechanical pendulum with mass m and length l, has displaced it by angle θ₀ < π/2 from the vertical position. Calculate the average rate of the pendulum's
rotation induced by relatively small thermal fluctuations.

5.30. A classical particle may occupy any of N similar sites. Its weak interaction with the
environment induces random, incoherent jumps from the occupied site to any other site, with the same
time-independent rate Γ. Calculate the correlation function and the spectral density of fluctuations of the
instant occupancy n(t) (equal to either 1 or 0) of a site.

86 See, e.g., EM Sec. 7.6.


Chapter 6. Elements of Kinetics


This chapter gives a brief introduction to the basic notions of physical kinetics. Its main focus is on the
Boltzmann transport equation, especially within the simple relaxation-time approximation (RTA), which
allows an approximate but reasonable and simple description of transport phenomena (such as the
electric current and thermoelectric effects) in gases, including electron gases in metals and
semiconductors.

6.1. The Liouville theorem and the Boltzmann equation


Physical kinetics (not to be confused with “kinematics”!) is the branch of statistical physics that
deals with systems out of thermodynamic equilibrium. Major effects addressed by kinetics include:
(i) for autonomous systems (those out of external fields): the transient processes (relaxation),
that lead from an arbitrary initial state of a system to its thermodynamic equilibrium;
(ii) for systems in time-dependent (say, sinusoidal) external fields: the field-induced periodic
oscillations of the system’s variables; and
(iii) for systems in time-independent (“dc”) external fields: dc transport.
In the last case, we are dealing with stationary (∂/∂t = 0 everywhere), but non-equilibrium
situations, in which the effect of an external field, continuously driving the system out of equilibrium, is
partly balanced by its simultaneous relaxation – the trend back to equilibrium. Perhaps the most
important effect of this class is the dc current in conductors and semiconductors,1 which alone justifies
the inclusion of the basic notions of kinetics into any set of core physics courses.
The reader who has reached this point of the notes already has some taste of physical kinetics
because the subject of the last part of Chapter 5 was the kinetics of a “Brownian particle”, i.e. of a
“heavy” system interacting with an environment consisting of many “lighter” components. Indeed, the
equations discussed in that part – whether the Smoluchowski equation (5.122) or the Fokker-Planck
equation (5.149) – are valid if the environment is in thermodynamic equilibrium, but the system of our
interest is not necessarily so. As a result, we could use those equations to discuss such non-equilibrium
phenomena as the Kramers problem of the metastable state’s lifetime.
In contrast, this chapter is devoted to the more traditional subject of kinetics: systems of many
similar particles – generally, interacting with each other but not too strongly, so the energy of the system
still may be partitioned into a sum of single-particle components, with the interparticle interactions
considered as a perturbation. Actually, we have already started the job of describing such a system at the
beginning of Sec. 5.7. Indeed, in the absence of particle interactions (i.e. when it is unimportant whether
the particle of our interest is “light” or “heavy”), the probability current densities in the coordinate and
momentum spaces are given, respectively, by Eq. (5.142) and the first form of Eq. (5.143a), so the
continuity equation (5.140) takes the form
$$ \frac{\partial w}{\partial t} + \nabla_q\cdot\left(w\dot{\mathbf q}\right) + \nabla_p\cdot\left(w\dot{\mathbf p}\right) = 0. \tag{6.1} $$

1 This topic was briefly addressed in EM Chapter 4, avoiding its aspects related to thermal effects.


If similar particles do not interact, this equation for the single-particle probability density w(q, p, t) is
valid for each of them, and the result of its solution may be used to calculate any ensemble-average
characteristic of the system as a whole.
Let us rewrite Eq. (1) in the Cartesian component form,
$$ \frac{\partial w}{\partial t} + \sum_j\left[\frac{\partial}{\partial q_j}\left(w\dot q_j\right) + \frac{\partial}{\partial p_j}\left(w\dot p_j\right)\right] = 0, \tag{6.2} $$
where the index j numbers all degrees of freedom of the particle under consideration, and assume that its
motion (perhaps in an external, time-dependent field) may be described by a Hamiltonian function H (qj,
pj, t). Plugging into Eq. (2) the Hamiltonian equations of motion:2
H H , (6.3)
q j  , p j  
p j q j
we get
$$ \frac{\partial w}{\partial t} + \sum_j\left[\frac{\partial}{\partial q_j}\left(w\frac{\partial H}{\partial p_j}\right) - \frac{\partial}{\partial p_j}\left(w\frac{\partial H}{\partial q_j}\right)\right] = 0. \tag{6.4} $$
After differentiation of both parentheses by parts, the equal mixed terms w∂²H/∂q_j∂p_j and w∂²H/∂p_j∂q_j cancel, and using Eq. (3) again, we get the so-called Liouville theorem3
$$ \frac{\partial w}{\partial t} + \sum_j\left[\dot q_j\frac{\partial w}{\partial q_j} + \dot p_j\frac{\partial w}{\partial p_j}\right] = 0. \tag{6.5} $$ (the Liouville theorem)

Since the left-hand side of this equation is just the full derivative of the probability density w
considered as a function of the generalized coordinates qj(t) of a particle, its generalized momenta
components pj(t), and (possibly) time t,4 the Liouville theorem (5) may be represented in a surprisingly
simple form:
$$ \frac{dw(q,p,t)}{dt} = 0. \tag{6.6} $$
Physically, this means that the elementary probability dW = w d³q d³p of finding a Hamiltonian particle in a small volume of the coordinate-momentum space [q, p], with its center moving in accordance with the deterministic law (3), does not change with time – see Fig. 1.

Fig. 6.1. The Liouville theorem's interpretation: probability's conservation at the system's motion flow through the [q, p] space. [The sketch shows an elementary volume d³q d³p carried along a trajectory q(t), p(t) from time t to t' > t.]
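The theorem is easy to illustrate numerically. The following minimal Python sketch (assuming, for certainty, a pendulum-like Hamiltonian H = p²/2 + (1 – cos q) in scaled units, which is not implied by the text above) evolves the boundary of a small square in the [q, p] plane according to Eqs. (3), and checks that its area stays constant even though its shape gets deformed:

import numpy as np

def rhs(z):                      # Hamilton equations (3): dq/dt = p, dp/dt = -sin(q)
    return np.stack([z[:, 1], -np.sin(z[:, 0])], axis=1)

def rk4(z, h):                   # one 4th-order Runge-Kutta step for all boundary points
    k1 = rhs(z); k2 = rhs(z + h/2*k1); k3 = rhs(z + h/2*k2); k4 = rhs(z + h*k3)
    return z + h/6*(k1 + 2*k2 + 2*k3 + k4)

def area(poly):                  # shoelace formula for the sampled boundary polygon
    q, p = poly[:, 0], poly[:, 1]
    return 0.5*abs(np.dot(q, np.roll(p, -1)) - np.dot(p, np.roll(q, -1)))

a, s = 0.01, np.linspace(0.0, 1.0, 50, endpoint=False)   # small square of side a,
boundary = np.concatenate([                               # sampled along its perimeter
    np.stack([1.0 + a*s, 0.5 + 0*s], 1), np.stack([1.0 + a + 0*s, 0.5 + a*s], 1),
    np.stack([1.0 + a - a*s, 0.5 + a + 0*s], 1), np.stack([1.0 + 0*s, 0.5 + a - a*s], 1)])

h, t = 2e-3, 0.0
print(f"t = {t:4.1f}: area = {area(boundary):.6e}")       # initial area = 1e-4
for t_stop in (3.0, 6.0, 12.0):
    while t < t_stop - 1e-9:
        boundary = rk4(boundary, h); t += h
    print(f"t = {t:4.1f}: area = {area(boundary):.6e}")   # stays 1e-4 (RK4 error is tiny)

(RK4 is not exactly symplectic, but its area error per step is negligible on this time scale; for long runs, the boundary sampling has to stay dense enough to resolve the deformation.)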

2 See, e.g., CM Sec. 10.1.
3 Actually, this is just one of several theorems bearing the name of Joseph Liouville (1809-1882).
4 See, e.g., MA Eq. (4.2).


At first glance, this fact may not look surprising because, according to the fundamental
Einstein relation (5.78), one needs non-Hamiltonian forces (such as the kinematic friction) to have
diffusion. On the other hand, it is striking that the Liouville theorem is valid even for (Hamiltonian)
systems with deterministic chaos,5 in which the deterministic trajectories corresponding to slightly
different initial conditions become increasingly mixed with time.
For an ideal gas of 3D particles, we may use the ordinary Cartesian coordinates rj (with j = 1, 2,
3) as the generalized coordinates qj, so pj become the Cartesian components mvj of the usual (linear)
momentum, and the elementary volume is just d³r d³p – see Fig. 1. In this case, Eqs. (3) are just
$$ \dot r_j = \frac{p_j}{m} \equiv v_j, \qquad \dot p_j = F_j, \tag{6.7} $$
where F is the force exerted on the particle, so the Liouville theorem may be rewritten as
$$ \frac{\partial w}{\partial t} + \sum_{j=1}^{3}\left[v_j\frac{\partial w}{\partial r_j} + F_j\frac{\partial w}{\partial p_j}\right] = 0, \tag{6.8} $$
and conveniently represented in the vector form
$$ \frac{\partial w}{\partial t} + \mathbf v\cdot\nabla_r w + \mathbf F\cdot\nabla_p w = 0. \tag{6.9} $$
Of course, the situation becomes much more complex if the particles interact. Generally, a
system of N similar particles in 3D space has to be described by the probability density being a function
of (6N + 1) arguments: 3N Cartesian coordinates, plus 3N momentum components, plus time. An
analytical or numerical solution of any equation describing the time evolution of such a function for a typical system of N ~ 10²³ particles is evidently a hopeless task. Hence, any theory of realistic systems'
kinetics has to rely on making reasonable approximations that would simplify the situation.
One of the most useful approximations (sometimes called Stosszahlansatz – German for the
“collision-number assumption”) was suggested by Ludwig Boltzmann for a gas of particles that move
freely most of the time but interact during short time intervals, when a particle comes close to either an
immobile scattering center (say, an impurity in a conductor’s crystal lattice) or to another particle of the
gas. Such brief scattering events may change the particle’s momentum. Boltzmann argued that they may
be still approximately described by Eq. (9), with the addition of a special term (called the scattering integral) to its right-hand side:
$$ \frac{\partial w}{\partial t} + \mathbf v\cdot\nabla_r w + \mathbf F\cdot\nabla_p w = \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}}. \tag{6.10} $$ (the Boltzmann transport equation)
This is the Boltzmann transport equation, sometimes called just the “Boltzmann equation” for short. As
will be discussed below, it may give a very reasonable description of not only classical but also quantum
particles, though it evidently neglects the quantum-mechanical coherence/entanglement effects6 –
besides those that may be hidden inside the scattering integral.

5 See, e.g., CM Sec. 9.3.
6 Indeed, the quantum state coherence is described by off-diagonal elements of the density matrix, while the classical probability w represents only the diagonal elements of that matrix. However, at least for the ensembles close to thermal equilibrium, this is a reasonable approximation – see the discussion in Sec. 2.1.


The concrete form of the scattering integral depends on the type of particle scattering. If the
scattering centers do not belong to the ensemble under consideration (an example is given, again, by
impurity atoms in a conductor), then the scattering integral may be expressed as an evident
generalization of the master equation (4.100):
$$ \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p'\left[\Gamma_{\mathbf p'\mathbf p}\,w(\mathbf r,\mathbf p',t) - \Gamma_{\mathbf p\mathbf p'}\,w(\mathbf r,\mathbf p,t)\right], \tag{6.11} $$
where the physical sense of Γ_{pp'} is the rate (i.e. the probability per unit time) for the particle to be scattered from the state with the momentum p into the state with the momentum p' – see Fig. 2.

Fig. 6.2. A single-particle scattering event. [The sketch shows a particle with momentum p scattered by an immobile center into a state with momentum p'.]
Most elastic interactions are reciprocal, i.e. obey the following relation (closely related to the reversibility of time in Hamiltonian systems): Γ_{pp'} = Γ_{p'p}, so Eq. (11) may be rewritten as7
$$ \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p'\,\Gamma_{\mathbf p\mathbf p'}\left[w(\mathbf r,\mathbf p',t) - w(\mathbf r,\mathbf p,t)\right]. \tag{6.12} $$
With such scattering integral, Eq. (10) stays linear in w but becomes an integro-differential equation,
typically harder to solve analytically than differential equations.
The equation becomes even more complex if the scattering is due to the mutual interaction of the
particle members of the system – see Fig. 3.
Fig. 6.3. A particle-particle scattering event. [The sketch shows two particles with initial momenta p and p'', interacting within a small region and leaving it with momenta p' and p'''.]

In this case, the probability of a scattering event scales as a product of two single-particle
probabilities, and the simplest reasonable form of the scattering integral is8

7 One may wonder whether this approximation may work for Fermi particles, such as electrons, for whom the Pauli principle forbids scattering into the already occupied state, so for the scattering p → p', the term w(r, p, t) in Eq. (12) has to be multiplied by the probability [1 – w(r, p', t)] that the final state is available. This is a valid argument, but one should notice that if this modification has been done with both terms of Eq. (12), it becomes
$$ \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p'\,\Gamma_{\mathbf p\mathbf p'}\left\{w(\mathbf r,\mathbf p',t)\left[1 - w(\mathbf r,\mathbf p,t)\right] - w(\mathbf r,\mathbf p,t)\left[1 - w(\mathbf r,\mathbf p',t)\right]\right\}. $$
Opening both square brackets, we see that the probability density products cancel, bringing us back to Eq. (12).
8 This was the approximation used by L. Boltzmann to prove the famous H-theorem, stating that the entropy of the gas described by Eq. (13) may only grow (or stay constant) in time, dS/dt ≥ 0. Since the model is very
approximate, that result does not seem too fundamental nowadays, despite all its historic significance.


p' p, p ' p w(r, p' , t ) w(r, p '' , t ) 


w
  d p'  d p '  .
3 3 ' '
(6.13)
t
scatteering
 pp' , p p ' w(r, p, t ) w(r, p ' , t )
 ' ' 
The integration dimensionality in Eq. (13) takes into account the fact that due to the conservation of the total momentum at scattering,
$$ \mathbf p + \mathbf p'' = \mathbf p' + \mathbf p''', \tag{6.14} $$
one of the momenta is not an independent argument, so the integration in Eq. (13) may be restricted to a
6D p-space rather than the 9D one. For the reciprocal interaction, Eq. (13) may also be a bit simplified,
but it still keeps Eq. (10) a nonlinear integro-differential transport equation, excluding such powerful
solution methods as the Fourier expansion – which hinges on the linear superposition principle.
This is why most useful results based on the Boltzmann transport equation depend on its further
simplifications, most notably the relaxation-time approximation – RTA for short.9 This approximation is
based on the fact that in the absence of spatial gradients (∇w = 0) and external forces (F = 0), in the thermal equilibrium, Eq. (10) yields
$$ \frac{\partial w}{\partial t} = \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}}, \tag{6.15} $$
so the equilibrium probability distribution w₀(r, p, t) has to turn any scattering integral to zero. Hence at a small deviation from the equilibrium,
$$ \tilde w(\mathbf r,\mathbf p,t) \equiv w(\mathbf r,\mathbf p,t) - w_0(\mathbf r,\mathbf p,t) \to 0, \tag{6.16} $$

the scattering integral should be proportional to the deviation w̃, and its simplest reasonable model is
$$ \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = -\frac{\tilde w}{\tau}, \tag{6.17} $$ (the relaxation-time approximation, RTA)
where τ is a phenomenological constant (which, according to Eq. (15), has to be positive for the system's stability) called the relaxation time. Its physical meaning will become clearer in the next section.
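The simplest way to get a feeling of this meaning already now is to note that for a uniform (∇_r w = 0), force-free (F = 0) gas, Eqs. (10) and (17) yield ∂w̃/∂t = –w̃/τ, i.e. an exponential decay of any perturbation with the time constant τ. Here is a minimal Python sketch of this relaxation, with an assumed (drift-shifted Maxwellian) initial state in scaled units:

import numpy as np

v = np.linspace(-6.0, 6.0, 1201); dv = v[1] - v[0]   # velocity grid, units of (T/m)^(1/2)
w0 = np.exp(-v**2/2);          w0 /= w0.sum()*dv     # equilibrium (Maxwell) distribution
w  = np.exp(-(v - 0.8)**2/2);  w  /= w.sum()*dv      # initial state: drift velocity 0.8

tau, dt = 1.0, 1e-3
for step in range(1, 3001):
    w += -(w - w0)/tau*dt                            # explicit Euler step of the RTA model (17)
    if step % 1000 == 0:
        t = step*dt
        print(f"t/tau = {t:.0f}: <v> = {(v*w).sum()*dv:.4f},"
              f" expected {0.8*np.exp(-t/tau):.4f}")  # small mismatch = Euler error only

The mean velocity indeed decays as exp(–t/τ).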
The relaxation-time approximation is quite reasonable if the angular distribution of the scattering
rate is dominated by small angles between vectors p and p’ – as it is, for example, for the Rutherford
scattering by a Coulomb center.10 Indeed, in this case the two values of the function w participating in
Eq. (12) are close to each other for most scattering events, so the loss of the second momentum
argument (p’) is not too essential. However, using the Boltzmann-RTA equation that results from
combining Eqs. (10) and (17),
$$ \frac{\partial w}{\partial t} + \mathbf v\cdot\nabla_r w + \mathbf F\cdot\nabla_p w = -\frac{\tilde w}{\tau}, \tag{6.18} $$ (the Boltzmann-RTA equation)
we should always remember that this is just a phenomenological model, sometimes giving completely
wrong results. For example, it prescribes the same time scale (τ) to the relaxation of the net momentum
of the system, and to its energy relaxation, while in many real systems, the latter process (that results

9 Sometimes this approximation is called the "BGK model", after P. Bhatnagar, E. Gross, and M. Krook, who suggested it in 1954. (The same year, a similar model was considered by P. Welander.)
10 See, e.g., CM Sec. 3.7.


from inelastic collisions) may be substantially longer. Naturally, in the following sections, I will
describe only those applications of the Boltzmann-RTA equation that give a reasonable description of
physical reality.

6.2. The Ohm law and the Drude formula


Despite its shortcomings, Eq. (18) is adequate for quite a few applications. Perhaps the most
important of them is deriving the Ohm law for dc current in a “nearly-ideal” gas of charged particles,
whose only important deviation from ideality is the rare scattering effects described by Eq. (17). As a
result, in equilibrium it is described by the stationary probability w0 of an ideal gas (see Sec. 3.1):
$$ w_0(\mathbf r,\mathbf p,t) = \frac{g}{(2\pi\hbar)^3}\,\langle N(\varepsilon)\rangle, \tag{6.19} $$
where g is the internal degeneracy factor (say, g = 2 for electrons due to their spin), and ⟨N(ε)⟩ is the average occupancy of a quantum state with momentum p, that obeys either the Fermi-Dirac or the Bose-Einstein distribution:
$$ \langle N(\varepsilon)\rangle = \frac{1}{\exp\{(\varepsilon-\mu)/T\}\mp 1}, \qquad \varepsilon = \varepsilon(\mathbf p). \tag{6.20} $$

(The following calculations will be valid, up to a point, for both statistics and hence, in the limit μ/T → –∞, for a classical gas as well.)
Now let a uniform dc electric field E be applied to a uniform gas of similar particles with
electric charge q, exerting the force F = qE on each of them. Then the stationary solution of Eq. (18),
with ∂w/∂t = 0, should also be stationary and spatially uniform (∇_r w = 0), so this equation is reduced to
$$ q\,\mathbf E\cdot\nabla_p w = -\frac{\tilde w}{\tau}. \tag{6.21} $$
Let us require the electric field to be relatively low, so the perturbation w̃ it produces is relatively small, as required by our basic assumption (16).11 Then on the left-hand side of Eq. (21), we can neglect that perturbation, by replacing w with w₀, because that side already has a small factor (E). As a result, this equation yields
$$ \tilde w = -\tau q\,\mathbf E\cdot\nabla_p w_0 = -\tau q\,\mathbf E\cdot\nabla_p\varepsilon\,\frac{\partial w_0}{\partial\varepsilon}, \tag{6.22} $$

where the second step implies isotropy of the parameters μ and T, i.e. their independence of the direction of the particle's momentum p. But the gradient ∇_pε is nothing else than the particle's velocity

11 Since the scale of the fastest change of w₀ in the momentum space is of the order of ∂w₀/∂p = (∂w₀/∂ε)(dε/dp) ~ (v/T)w₀, where v is the particle speed scale, the necessary condition of the linear approximation (22) is eEτ << T/v, i.e. eEl << T, where l ≡ vτ has the meaning of the effective mean free path. Since the left-hand side of the last inequality is just the average energy given to the particle by the electric field between two scattering events, the condition may be interpreted as the smallness of the gas' "overheating" by the applied field. However, another condition is also necessary – see the last paragraph of this section.


v – for a quantum particle, its group velocity.12 (This fact is easy to verify for the isotropic and parabolic dispersion law, pertinent to classical particles moving in free space,
$$ \varepsilon(\mathbf p) = \frac{p^2}{2m} = \frac{p_1^2+p_2^2+p_3^2}{2m}. \tag{6.23} $$
Indeed, in this case, the jth Cartesian component of the vector ∇_pε is
$$ \left(\nabla_p\varepsilon\right)_j = \frac{\partial\varepsilon}{\partial p_j} = \frac{p_j}{m} = v_j, \tag{6.24} $$
so ∇_pε = v.) Hence, Eq. (22) may be rewritten as
$$ \tilde w = -\tau q\,\mathbf E\cdot\mathbf v\,\frac{\partial w_0}{\partial\varepsilon}. \tag{6.25} $$
Let us use this result to calculate the electric current density j. The contribution of each particle
to the current density is qv, so the total density is

j   qvwd 3 p  q  vw0  w
~d 3 p . (6.26)

Since in the equilibrium state (with w = w0), the current has to be zero, the integral of the first term in
the parentheses has to vanish. For the integral of the second term, plugging in Eq. (25), and then using
Eq. (19), we get
$$ \mathbf j = q^2\tau\int \mathbf v\,(\mathbf E\cdot\mathbf v)\left(-\frac{\partial w_0}{\partial\varepsilon}\right)d^3p = \frac{gq^2\tau}{(2\pi\hbar)^3}\iint \mathbf v\,(\mathbf E\cdot\mathbf v)\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d^2p\,dp_\perp, \tag{6.27} $$ (the Sommerfeld theory's result)

where d2p is the elementary area of the constant energy surface in the momentum space, while dpis the
momentum differential’s component normal to that surface. The real power of this result13 is that it is
valid even for particles with an arbitrary dispersion law (p) (which may be rather complicated, for
example, for particles moving in space-periodic potentials14), and gives, in particular, a fair description
of conductivity’s anisotropy in crystals.
For free particles whose dispersion law is isotropic and parabolic, as in Eq. (23), the constant-energy surface is a sphere of radius p, so d²p = p²dΩ = p² sinθ dθ dφ, while dp_⊥ = dp. In the spherical coordinates, with the polar axis directed along the electric field vector E, we get (E·v) = Ev cosθ. Now separating the vector v outside the parentheses into the component v cosθ directed along the vector E, and two perpendicular components, v sinθ cosφ and v sinθ sinφ, we see that the integrals of the last two components over the angle φ give zero. Hence, as we could expect, in the isotropic case the net current is directed along the electric field and obeys the linear Ohm law,
$$ \mathbf j = \sigma\mathbf E, \tag{6.28} $$ (the Ohm law)

12 See, e.g., QM Sec. 2.1.
13 It was obtained by Arnold Sommerfeld in 1927.
14 See, e.g., QM Secs. 2.7, 2.8, and 3.4. (In this case, p should be understood as the quasimomentum rather than the genuine momentum.)


with a field-independent, scalar15 electric conductivity

$$ \sigma = \frac{gq^2\tau}{(2\pi\hbar)^3}\int_0^\infty p^2dp\,v^2\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)\int_0^{2\pi}d\varphi\int_0^{\pi}\sin\theta\,d\theta\,\cos^2\theta. \tag{6.29} $$
(Note that σ is proportional to q² and hence does not depend on the particle charge sign.16)
Since sinθ dθ is just –d(cosθ), the integral over θ equals (2/3). The integral over dφ is of course just 2π, while that over p may be readily transformed to one over the particle's energy ε(p) = p²/2m: p² = 2mε, v² = 2ε/m, p = (2mε)^{1/2}, so dp = (m/2ε)^{1/2}dε, and p²dp v² = (2mε)(m/2ε)^{1/2}dε (2ε/m) ≡ (8mε³)^{1/2}dε. As a result, the conductivity equals
$$ \sigma = \frac{gq^2\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\int_0^\infty\left(8m\varepsilon^3\right)^{1/2}\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right)d\varepsilon. \tag{6.30} $$
Now we may work out the integral in Eq. (30) by parts, first rewriting [–∂⟨N(ε)⟩/∂ε]dε as –d[⟨N(ε)⟩]. Due to the fast (exponential) decay of the factor ⟨N(ε)⟩ at ε → ∞, its product by the factor (8mε³)^{1/2} vanishes at both integration limits, and we get
$$ \sigma = \frac{gq^2\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\int_0^\infty \langle N(\varepsilon)\rangle\,d\!\left[\left(8m\varepsilon^3\right)^{1/2}\right] = \frac{gq^2\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\left(8m\right)^{1/2}\frac{3}{2}\int_0^\infty \langle N(\varepsilon)\rangle\,\varepsilon^{1/2}d\varepsilon \equiv \frac{q^2\tau}{m}\times\frac{gm^{3/2}}{2^{1/2}\pi^2\hbar^3}\int_0^\infty \langle N(\varepsilon)\rangle\,\varepsilon^{1/2}d\varepsilon. \tag{6.31} $$

But according to Eq. (3.40), the last factor in this expression (after the × sign) is just the particle density n ≡ N/V, so Sommerfeld's result is reduced, for an arbitrary temperature and any particle statistics, to the very simple Drude formula,17
$$ \sigma = \frac{q^2\tau}{m}\,n, \tag{6.32} $$ (the Drude formula)
which should be well familiar to the reader from an undergraduate physics course.
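As a numerical sanity check of Eq. (32), here is a trivial Python estimate with rough, representative parameters for copper (note that such values of τ are themselves commonly inferred from the measured conductivity, so this is a consistency illustration of the magnitudes involved rather than an independent test):

q = 1.602e-19        # electron charge, C
m = 9.109e-31        # free electron mass, kg
n = 8.5e28           # conduction electron density of copper, m^-3 (one electron per atom)
tau = 2.5e-14        # assumed room-temperature relaxation time, s
sigma = q**2 * tau * n / m
print(f"sigma ~ {sigma:.2e} S/m")   # ~6e7 S/m, close to the measured ~5.96e7 S/m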
As a reminder, here is its simple classical derivation.18 Let τ be the average time after the last
scattering event that has caused particles to lose the deterministic component of their velocity, vdrift,
provided by the electric field E on the top of the particle’s random thermal motion – which does not

15 As Eq. (27) shows, if the dispersion law ε(p) is anisotropic, the current density direction may be different from that of the electric field. In this case, conductivity should be described by a tensor σ_{jj'}, rather than a scalar. However, in most important conducting materials, the anisotropy is rather small – see, e.g., EM Table 4.1.
16 This is why to determine the dominating type of charge carriers in semiconductors (electrons or holes, see Sec. 4 below), the Hall effect, which lacks such ambivalence (see, e.g., QM 3.2), is frequently used.
17 It was derived in 1900 by Paul Drude. Note that Drude also used the same arguments to derive a very simple (and very reasonable) approximation for the complex electric conductivity in the ac field of frequency ω: σ(ω) = σ(0)/(1 – iωτ), with σ(0) given by Eq. (32); sometimes the name "Drude formula" is used for this expression. Let me leave its derivation, from the Boltzmann-RTA equation, for the reader's exercise.
18 See also EM Sec. 4.2. Note that the frequently met definition of τ as "the average time interval between two sequential scattering events" would lead to an extra factor of ½ in the expressions for v_drift and σ.


contribute to the net current. Using the 2nd Newton law to describe the particle's acceleration by the field, dv/dt = qE/m, we get v_drift = qEτ/m. Multiplying this result by the particle's charge q and density n ≡ N/V, we get the Ohm law j = σE, with σ given by Eq. (32).
Sommerfeld's derivation of the Drude formula poses an important conceptual question. The structure of Eq. (30) implies that the only quantum states contributing to the electric conductivity are those whose derivative [–∂⟨N(ε)⟩/∂ε] is significant. For the Fermi particles such as electrons, in the limit T << ε_F, these are the states at the very Fermi surface. On the other hand, Eq. (32) and the whole Drude reasoning involve the density n of all electrons. So, which electrons exactly are responsible for the conductivity: all of them, or only those at the Fermi surface? For the resolution of this paradox, let us return to Eq. (22) and analyze the physical meaning of that result. Let us compare it with the following model distribution:
$$ w_{\text{model}} = w_0(\mathbf r,\mathbf p - \tilde{\mathbf p},t), \tag{6.33} $$
where p̃ is some time-independent, small vector that describes a small shift of the unperturbed distribution w₀ as a whole, in the momentum space. Performing the Taylor expansion of Eq. (33) in this
small parameter, and keeping only two leading terms, we get
$$ w_{\text{model}} \approx w_0(\mathbf r,\mathbf p,t) + \tilde w_{\text{model}}, \qquad\text{with}\quad \tilde w_{\text{model}} = -\tilde{\mathbf p}\cdot\nabla_p w_0(\mathbf r,\mathbf p,t). \tag{6.34} $$
Comparing the last expression with the first form of Eq. (22), we see that they coincide if
$$ \tilde{\mathbf p} = q\mathbf E\tau \equiv \mathbf F\tau. \tag{6.35} $$

This means that Eq. (22) describes a small shift of the equilibrium distribution of all particles (in the momentum space) by qEτ along the electric field's direction, justifying the cartoon shown in Fig. 4.

Fig. 6.4. Filling of momentum states by a degenerate electron gas: (a) in the absence and (b) in the presence of an external electric field E. Arrows show representative scattering events.
At E = 0, the system is in equilibrium, so the quantum states inside the Fermi sphere (p < p_F) are occupied, while those outside of it are empty – see Fig. 4a. Electron scattering events may happen only between states within a very thin layer (|p²/2m – ε_F| ~ T) at the Fermi surface because only in this layer the states are partially occupied, so both components of the product w(r, p, t)[1 – w(r, p', t)], mentioned in Sec. 1, do not vanish.
Now let the electric field be turned on instantly. Immediately it starts accelerating all electrons in
its direction, i.e. the whole Fermi sphere starts moving in the momentum space, along the field’s
direction in the real space. For elastic scattering events (with |p'| = |p|), this creates an addition of
occupied states at the leading edge of the accelerating sphere and an addition of free states on its trailing
edge (Fig. 4b). As a result, now there are more scattering events bringing electrons from the leading
edge to the trailing edge of the sphere than in the opposite direction. This creates the average backflow
of the state occupancy in the momentum space. These two trends eventually cancel each other, and the
Fermi sphere approaches a stationary (though not a thermal-equilibrium!) state, with the shift (35) relative to its thermal-equilibrium position.
Now Fig. 4b may be used to answer which of the two different interpretations of the Drude
formula is correct, and the answer is: either. On one hand, we can look at the electric current as a result
of the shift (35) of all electrons in the momentum space. On the other hand, each filled quantum state
deep inside the sphere gives exactly the same contribution to the net current density as it did without the
field. All these internal contributions to the net current cancel each other so the applied field changes the
situation only at the Fermi surface. Thus it is equally legitimate to say that only the surface states are
responsible for the non-zero net current.19
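This duality may also be verified numerically: the "surface" integral of Eq. (30) and the "all-states" integral of Eq. (31) coincide at any temperature, since they are related by integration by parts. A minimal Python sketch (in arbitrary units, with assumed values μ = 1 and T = 0.02, and with all common factors dropped):

import numpy as np

mu, T = 1.0, 0.02                      # strongly degenerate case, T << mu
eps = np.linspace(1e-6, 3.0, 600001); de = eps[1] - eps[0]
x = (eps - mu) / T
N = 1.0 / (np.exp(x) + 1.0)            # Fermi-Dirac occupancy <N(eps)>
dN = np.exp(x) / (T * (np.exp(x) + 1.0)**2)     # -d<N>/d(eps): peaked at the Fermi surface
I_surface = np.sum(eps**1.5 * dN) * de          # ~ Eq. (30): only eps ~ mu contributes
I_bulk = 1.5 * np.sum(eps**0.5 * N) * de        # ~ Eq. (31): all filled states contribute
print(f"surface form: {I_surface:.5f}, bulk form: {I_bulk:.5f}")  # equal within grid error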
Let me also mention another paradox related to the Drude formula, which is often misunderstood
(not only by students :-). As was emphasized above, τ is finite even at elastic scattering – that by itself does not change the total energy of the gas. The question is how can such scattering be responsible for the Ohmic resistivity ρ ≡ 1/σ, and hence for the Joule heat production, with the power density p = j·E = ρj²?20 The answer is that the Drude/Sommerfeld formulas describe just the "bottleneck" of the Joule
heat formation. In the scattering picture (Fig. 4b) the states filled by elastically scattered electrons are
located above the (shifted) Fermi surface, and these electrons eventually need to relax onto it via some
inelastic process, which releases their excessive energy in the form of heat (in a solid, described by
phonons – see Sec. 2.6). The rate and other features of these inelastic phenomena do not participate in
the Drude formula directly, but for keeping the theory valid (in particular, keeping the probability
distribution w close to its equilibrium value w0), their intensity has to be sufficient to avoid gas
overheating by the applied field.21
One final comment is that the Sommerfeld theory of Ohmic conductivity, based on the
Boltzmann-RTA equation (18), works very well for the electron gas in most conductors. The scheme
shown in Fig. 4 helps to understand why: for degenerate Fermi gases, the energies of all particles whose scattering contributes to transport properties are close (ε ≈ ε_F), and prescribing them all the same relaxation time τ is very reasonable. In contrast, in classical gases, with their relatively broad distribution of ε, some results given by Eq. (18) are valid only by the order of magnitude.

6.3. Electrochemical potential and the drift-diffusion equation


Now let us generalize our calculation to the case when the particle transport takes place in the
presence of a time-independent spatial gradient of the probability distribution, ∇_r w ≠ 0, caused for

19 So here, as it frequently happens in physics, formulas (or graphical sketches, such as Fig. 4b) give a more clear
description of reality than words – the privilege lacked by many “scientific” disciplines that are rich with
unending, shallow verbal debates. Note also that, as frequently happens in physics, the dual interpretation of σ is
expressed by two different but equal integrals (30) and (31), related by the integration-by-parts rule.
20 This formula is probably self-evident, but if you need you may revisit EM Sec. 4.4.
21 In some poorly conducting materials, charge carrier overheating effects resulting in deviations from the Ohm
law, i.e. from the linear relation (28) between j and E, may be observed already at practicable electric fields.


example by that of the particle concentration n = N/V (and hence, according to Eq. (3.40), of the
chemical potential μ), while still assuming that the temperature T is constant. For this generalization, we
should keep the second term on the left-hand side of Eq. (18). If the gradient of w is sufficiently small,
we can repeat the arguments of the last section and replace w with w0 in this term as well. With the
applied electric field E represented as (–∇φ),22 where φ is the electrostatic potential, Eq. (25) becomes
$$ \tilde w = -\tau\,\mathbf v\cdot\left(\nabla w_0 - q\nabla\phi\,\frac{\partial w_0}{\partial\varepsilon}\right). \tag{6.36} $$

Since in any of the equilibrium distributions (20), N() is a function of  and  only in the combination
( – ), it obeys the following relation:
 N    N  
 . (6.37)
 
Using it, the gradient of w0  N() may be represented as23
w0
w0    , for T  const , (6.38)

so Eq. (36) becomes
$$ \tilde w = \tau\,\frac{\partial w_0}{\partial\varepsilon}\;\mathbf v\cdot\nabla\left(\mu + q\phi\right) \equiv \tau\,\frac{\partial w_0}{\partial\varepsilon}\;\mathbf v\cdot\nabla\mu', \tag{6.39} $$
where the following sum,
$$ \mu' \equiv \mu + q\phi, \tag{6.40} $$ (the electrochemical potential)
is called the electrochemical potential. Now replicating the calculation of the electric current, carried out in the last section, we get the following generalization of the Ohm law (28):
$$ \mathbf j = \sigma\left(-\nabla\mu'/q\right) \equiv \sigma\boldsymbol{\mathcal E}, \tag{6.41} $$
where the effective electric field ℰ is proportional to the gradient of the electrochemical potential, rather than that of the electrostatic potential:
$$ \boldsymbol{\mathcal E} \equiv -\frac{\nabla\mu'}{q} = \mathbf E - \frac{\nabla\mu}{q}. \tag{6.42} $$ (the effective electric field)
The physics of this extremely important and general result24 may be explained in two ways.
First, let us have a look at the energy spectrum of a uniform degenerate Fermi gas confined in a volume
of finite size. To ensure such confinement we need a piecewise-constant potential U(r) – a “hard-wall,

22 Since we will not encounter ∇_p in the balance of this chapter, from this point on, the subscript of the operator ∇_r is dropped for the notation brevity.
23 Since we consider w₀ as a function of two independent arguments r and p, taking its gradient, i.e. the differentiation of this function over r, does not involve its differentiation over the kinetic energy ε – which is a function of p only.
24 Note that Eq. (42) does not include the phenomenological parameter τ of the relaxation-time approximation, signaling that it is much more general than the RTA. Indeed, this equality is based entirely on the relation between the second and third terms on the left-hand side of the general Boltzmann equation (10), rather than on any details of the scattering integral on its right-hand side.


flat-bottom potential well" – see Fig. 5a. (For conduction electrons in a metal, such profile is provided by the positively charged ions of the crystal lattice, augmented by its screening by the conduction electrons.) The well should be of a sufficient depth U₀ > ε_F ≡ μ|_{T=0} to provide the confinement of the overwhelming majority of the particles, with energies below and somewhat above the Fermi level ε_F. This means that there should be a substantial energy gap,
$$ \psi \equiv U_0 - \mu >> T, \tag{6.43} $$
between the Fermi energy of a particle inside the well, and its potential energy U₀ outside the well. (The latter value of energy is usually called the vacuum level.) The difference ψ defined by Eq. (43) is called the workfunction;25 for most metals, it is between 4 and 5 eV, so the relation ψ >> T is well fulfilled for room temperatures (T ~ 0.025 eV) – and actually for all temperatures below the metal's evaporation point.
Fig. 6.5. Potential profiles of (a) a single conductor and (b, c) a system of two closely located conductors, for two different biasing situations: (b) zero electrostatic field (the "flat-band condition"), and (c) zero voltage Δμ'/q.

Now let us consider two conductors with different values of ψ, separated by a small spatial gap d – see Figs. 5b,c. Panel (b) shows the case when the electric field E = –∇φ in the free-space gap between the conductors equals zero, i.e. their electrostatic potentials φ are equal.26 If there is an opportunity for particles to cross the gap (e.g., by either the thermally-activated hopping over the potential barrier, discussed in Secs. 5.6-5.7, or the quantum-mechanical tunneling through it), there will be an average flow of particles from the conductor with the higher Fermi level to that with the lower Fermi level,27 because the chemical equilibrium requires their equality – see Secs. 1.5 and 2.7. If the particles have an electric charge (as electrons do), the equilibrium will be automatically achieved by them recharging the effective capacitor formed by the conductors, until the electrostatic energy difference qΔφ reaches the value reproducing that of the workfunctions (Fig. 5c). So for the equilibrium potential difference28 we may write
$$ q\,\Delta\phi = \Delta\psi. \tag{6.44} $$

At this equilibrium, the electric field in the gap between the conductors is

25 Sometimes it is also called “electron affinity”, though this term is mostly used for atoms and molecules.
26 In semiconductor physics and engineering, the situation shown in Fig. 5b is called the flat-band condition,
because any electric field applied normally to a surface of a semiconductor leads to the so-called energy band
bending – see the next section.
27 As measured from a common reference value, for example from the vacuum level – rather than from the bottom
of an individual potential well as in Fig. 5a.
28 In physics literature, it is usually called the contact potential difference, while in electrochemistry (for which it
is one of the key notions), the term Volta potential is more common.


Δ Δ 
E  n n ; (6.45)
d qd q
in Fig. 5c this field is clearly visible as the tilt of the electric potential profile. Comparing Eq. (45) with the definition (42) of the effective electric field ℰ, we see that the equilibrium, i.e. the absence of current through the potential barrier, is achieved exactly when ℰ = 0, in accordance with Eq. (41).
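For a feeling of the scales involved, here is a trivial Python estimate of Eqs. (44)-(45), with representative (assumed) workfunction values of two common metals; only the magnitudes are meant here, since for electrons q = –e, and the signs depend on the charge convention:

psi_1, psi_2 = 4.1, 5.1        # assumed workfunctions, eV (sample- and surface-dependent)
d = 1e-6                       # assumed gap between the two conductors, m
dphi = psi_2 - psi_1           # contact potential difference magnitude, V, per Eq. (44)
E = dphi / d                   # field magnitude in the gap at equilibrium, V/m, per Eq. (45)
print(f"delta-phi ~ {dphi:.1f} V, E ~ {E:.1e} V/m")   # ~1 V, and ~1e6 V/m for a 1-um gap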
The electric field dichotomy, ℰ ≠ E, raises a natural question: which of these fields are we speaking about in everyday and laboratory practice? Upon some contemplation, the reader should agree that most of our electric field measurements are done indirectly, by measuring corresponding voltages – with voltmeters. A vast majority of these instruments belong to the so-called electrodynamic variety, which is based on the measurement of a small current flowing through the voltmeter.29 As Eq. (41) shows, such electrodynamic voltmeters measure the electrochemical potential difference Δμ'/q. However, there exists a rare breed of electrostatic voltmeters (also called "electrometers") that measure the electrostatic potential difference Δφ between two conductors. One way to implement such an instrument is to use an ordinary, electrodynamic voltmeter, but with the reference point set at the flat-band condition (Fig. 5b) between the conductors. (This condition may be detected by vanishing electric charge on the adjacent surfaces of the conductors, and hence by the absence of its modulation in time if the distance between the surfaces is periodically modulated.)
Now let me return to Eq. (41) and make two very important remarks. First, it says that in the presence of an electric field, the current vanishes only if ∇μ' = 0, i.e. that the electrochemical potential μ', rather than the chemical potential μ, has to be position-independent in a conducting system in thermodynamic (thermal, chemical, and electric) equilibrium. This result by no means contradicts the fundamental thermodynamic relations for μ discussed in Sec. 1.5, or the statistical relations involving μ, which were discussed in Sec. 2.7 and beyond. Indeed, according to Eq. (40), μ'(r) is "merely" the chemical potential referred to the local value of the electrostatic energy qφ(r), and in all previous parts of the course, this energy was assumed to be constant throughout the system.
Second, note another interpretation of Eq. (41), which may be achieved by modifying Eq. (38) for the particular case of the classical gas. Indeed, the local density n ≡ N/V of the gas obeys Eq. (3.32), which may be rewritten as
$$ n(\mathbf r) = \text{const}\times\exp\left\{\frac{\mu(\mathbf r)}{T}\right\}. \tag{6.46} $$
Taking the spatial gradient of both sides of this relation (still at constant T), we get
$$ \nabla n = \text{const}\times\exp\left\{\frac{\mu}{T}\right\}\frac{\nabla\mu}{T} = \frac{n}{T}\,\nabla\mu, \tag{6.47} $$
so ∇μ = (T/n)∇n, and Eq. (41), with σ given by Eq. (32), may be recast as
$$ \mathbf j = \sigma\left(-\nabla\frac{\mu'}{q}\right) = \frac{q^2\tau}{m}\,n\left(-\frac{1}{q}\right)\nabla\left(\mu + q\phi\right) = \frac{\tau q}{m}\left(qn\mathbf E - T\nabla n\right). \tag{6.48} $$

29 The devices for such measurement may be based on the interaction between the measured current and a
permanent magnet, as pioneered by A.-M. Ampère in the 1820s – see, e.g., EM Chapter 5. Such devices are
sometimes called galvanometers, honoring another pioneer of electricity, Luigi Galvani.

Chapter 6 Page 13 of 38
Essential Graduate Physics SM: Statistical Mechanics

The second term in the parentheses is a specific manifestation of the general Fick’s law of diffusion jw =
Dn, already mentioned in Sec. 5.6. Hence the current density may be viewed as consisting of two
independent parts: one due to particle drift induced by the “usual” electric field E = –, and another
due to their diffusion – see Eq. (5.118) and its discussion. This is exactly the physics of the “mysterious”
term  in Eq. (42), though its simple form (48) is valid only in the classical limit.
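This drift-diffusion balance is easy to check numerically: for the Boltzmann density profile n(x) ∝ exp{–qφ(x)/T} (i.e. for μ' = const), the two terms in the last parentheses of Eq. (48) cancel exactly. A minimal 1D Python sketch, with an arbitrarily assumed potential profile:

import numpy as np

T, q = 1.0, 1.0                        # scaled units
x = np.linspace(0.0, 1.0, 2001)
phi = 0.3*np.sin(2*np.pi*x)            # an arbitrary smooth potential profile (assumed)
n = np.exp(-q*phi/T)                   # local-equilibrium density, Eq. (46) with mu' = const
E = -np.gradient(phi, x)               # electrostatic field
bracket = q*n*E - T*np.gradient(n, x)  # the parentheses of Eq. (48), proportional to j
print(np.max(np.abs(bracket)))         # small (~1e-5), vanishing with grid refinement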
Besides being very useful for applications, Eq. (48) also gives us a pleasant surprise. Namely, plugging it into the continuity equation for electric charge,30
$$ \frac{\partial(qn)}{\partial t} + \nabla\cdot\mathbf j = 0, \tag{6.49} $$
we get (after the division of all terms by qτ/m) the so-called drift-diffusion equation:31
$$ \frac{m}{\tau}\,\frac{\partial n}{\partial t} = \nabla\cdot\left(n\nabla U\right) + T\nabla^2 n, \qquad\text{with } U \equiv q\phi. \tag{6.50} $$ (the drift-diffusion equation)

Comparing it with Eq. (5.122), we see that the drift-diffusion equation is identical to the Smoluchowski equation,32 provided that we parallel the ratio τ/m with the mobility μ_m = 1/η of the Brownian particle. Now using Einstein's relation (5.78), we see that the effective diffusion constant D of the classical gas of similar particles is
$$ D = \frac{T\tau}{m}. \tag{6.51a} $$
This important relation is more frequently represented in either of two other forms. First, since the rare scattering events we are considering do not change the statistics of the gas in thermal equilibrium, we may still use the Maxwell-distribution result (3.9) for the average-square velocity ⟨v²⟩, to recast Eq. (51a) as
$$ D = \frac{1}{3}\,\langle v^2\rangle\,\tau. \tag{6.51b} $$
One more popular form of the same relation uses the notion of the mean free path l, which may be defined as the average distance to be passed by a particle before its next scattering:
$$ D = \frac{1}{3}\,l\,\langle v^2\rangle^{1/2}, \qquad\text{with } l \equiv \langle v^2\rangle^{1/2}\tau. \tag{6.51c} $$
In the forms (51b)-(51c), the result for D makes more physical sense, because it may be readily derived
(admittedly, with some uncertainty of the numerical coefficient) from simple kinematic arguments – the
task left for the reader’s exercise.
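Still, Eq. (51c) is very convenient for quick estimates. For example, here is a minimal Python evaluation for molecular nitrogen at ambient conditions; the mean free path is an assumed, typical textbook value, and (in contrast to the rest of the course) the temperature is in kelvins, so the Boltzmann constant appears explicitly:

k_B = 1.38e-23                    # Boltzmann constant, J/K
T, m = 300.0, 4.65e-26            # room temperature, K; N2 molecule mass, kg
l = 7e-8                          # assumed mean free path at ~1 atm, m
v_rms = (3*k_B*T/m)**0.5          # <v^2>^(1/2) from the Maxwell distribution
D = l*v_rms/3                     # Eq. (51c)
print(f"v_rms ~ {v_rms:.0f} m/s, D ~ {D:.1e} m^2/s")   # ~5e2 m/s, and ~1e-5 m^2/s

The result is within a factor of ~2 of the measured self-diffusion coefficient of N₂ – as good as such an order-of-magnitude estimate can be expected to be.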
Note also that using Eq. (51a), Eq. (48) may be rewritten as an expression for the particle flow density j_n ≡ n j_w = j/q:
$$ \mathbf j_n = n\mu_m q\mathbf E - D\nabla n, \tag{6.52} $$

30 If this relation is not obvious, please revisit EM Sec. 4.1.
31 Sometimes this term is associated with Eq. (52). One may also run into the term "convection-diffusion equation" for Eq. (50) with the replacement (51a).
32 And hence, at negligible U, identical to the diffusion equation (5.116).


with the first term on the right-hand side describing particles’ drift, and the second one, their diffusion. I
will discuss the application of this equation to the most important case of non-degenerate (“quasi-
classical”) gases of electrons and holes in semiconductors, in the next section.
To complete this section, let me emphasize again that the mathematically similar drift-diffusion
equation (50) and the Smoluchowski equation (5.122) describe different physical situations. Indeed, our
(or rather Einstein and Smoluchowski’s :-) treatment of the Brownian motion in Chapter 5 was based on
a strong hierarchy of the system, consisting of a large “Brownian particle” in an environment of many
smaller particles – “molecules”. On the other hand, in this chapter we are considering a gas of similar
particles. Nevertheless, the equations describing the dynamics of their probability distribution, are the
same – at least within the framework of the Boltzmann transport equation with the relaxation-time
approximation (17) of the scattering integral. The origin of this similarity is the fact that Eq. (12) is
clearly applicable to a Brownian particle as well, with each “scattering” event being the particle’s hit by
a random molecule of its environment. Since, due to the mass hierarchy, the particle momentum change
at each such event is very small, the scattering integral has to be local, i.e. depend only on w at the same
momentum p as the left-hand side of the Boltzmann equation, so the relaxation time approximation (17)
is absolutely natural – indeed, more natural than for our current case of similar particles.

6.4. Charge carriers in semiconductors


Now let me demonstrate the application of the concepts discussed in the last section, first of all
of the electrochemical potential, to understanding the basic kinetic properties of semiconductors and a
few key semiconductor structures – which are the basis of most modern electronic and optoelectronic
devices, and hence of all our IT civilization. For that, I will need to take a detour to discuss their
equilibrium properties first.
I will use an approximate but reasonable picture in which the energy of the electron subsystem in a solid may be partitioned into the sum of the effective energies ε of independent electrons. Quantum mechanics says33 that in such periodic structures as crystals, the stationary state energy ε of a particle interacting with the atomic lattice follows one of the periodic functions ε_n(q) of the quasimomentum q, oscillating between two extreme values ε_n^min and ε_n^max. These allowed energy bands are separated by bandgaps, of widths Δ_n ≡ ε_n^min – ε_{n–1}^max, with no allowed states inside them. Semiconductors and insulators (dielectrics) are defined as such crystals that in equilibrium at T = 0, all electron states in several energy bands (with the highest of them called the valence band) are completely filled, ⟨N(ε_v)⟩ = 1, while those in the upper bands, starting from the lowest, conduction band, are completely empty, ⟨N(ε_c)⟩ = 0.34,35 Since the electrons follow the Fermi-Dirac statistics (2.115), this means that at T → 0,

33 See, e.g., QM Secs. 2.7 and 3.4, but a thorough knowledge of this material is not necessary for following the discussions in this section. If the reader is not familiar with the notion of quasimomentum (alternatively called the "crystal momentum"), the following interpretation may be useful: q is the result of quantum averaging of the genuine electron momentum p over the crystal lattice period. In contrast to p, which is not conserved because of the electron's interaction with the lattice, q is an integral of motion – in the absence of other forces.
34 This mapping of electrical properties of crystals on their band structure was pioneered in 1931-32 by Alan H. Wilson.
35 In insulators, the bandgap Δ is so large (e.g., Δ ~ 9 eV in SiO₂) that the conduction band remains unpopulated in all practical situations, so the following discussion is only relevant for semiconductors, with their moderate bandgaps – such as 1.14 eV in the most important case of silicon at room temperature.

Chapter 6 Page 15 of 38
Essential Graduate Physics SM: Statistical Mechanics

the Fermi energy F  (0) is located somewhere between the valence band’s maximum vmax (usually
called simply V), and the conduction band’s minimum cmin (called C) – see Fig. 6.

 c q 

C
 
V
Fig. 6.6. Calculating  in an
 v q  intrinsic semiconductor.
0 q
Let us calculate the population of both branches n(q), and the chemical potential  in
equilibrium at T > 0. Since the functions n(q) are typically smooth, near the bandgap edges the
dispersion laws c(q) and v(q) may be well approximated with quadratic parabolas. For our analysis, let
us take the parabolas in the simplest, isotropic form, with origins at the same quasimomentum, taking it
for the reference point:36
  C  q 2 / 2mC , for    C ,
  with  C   V   . (6.53)
 V  q / 2mV , for    V ,
2

The positive constants mC and mV are usually called the effective masses of, respectively, electrons and
holes. (In a typical semiconductor, mC is a few times smaller than the free electron mass me, while mV is
closer to me.)
Due to the similarity between the top line of Eq. (53) and the dispersion law (3.3) of free
particles, we may reuse Eq. (3.40), with the appropriate particle mass m, the degeneracy factor g, and
the energy origin, to calculate the full spatial density of the populated states (in semiconductor physics,
called electrons in the narrow sense of the word):
 
Ne g m3/ 2
n   N   g 3  d  C 2C 3  N     
~ ~
C
1/ 2
d~ , (6.54)
V C 2  0

where ~   – C  0. Similarly, the density p of “no-electron” excitations (called holes) in the valence
band is the number of unfilled states in the band, and hence may be calculated as
V 
g V mV3 / 2
 1  N    g  d   1  N   ~  ~ 1 / 2 d~ ,
N
p h  3 V (6.55)
V  2  2 3
0

where in this case, ~  0 is defined as (V – ). If the electrons and holes37 are in the thermal and
chemical equilibrium, the functions N() in these two relations should follow the Fermi-Dirac

36 It is easy (and hence is left for the reader’s exercise) to verify that all equilibrium properties of charge carriers
remain the same (with some effective values of mC and mV) if c(q) and v(q) are arbitrary quadratic forms of the
Cartesian components of the quasimomentum. A mutual displacement of the branches c(q) and v(q) in the
quasimomentum space is also unimportant for statistical and most transport properties of the semiconductors,
though it is very important for their optical properties – which I will not have time to discuss in any detail.
37 The collective name for them in semiconductor physics is charge carriers – or just “carriers”.

Chapter 6 Page 16 of 38
Essential Graduate Physics SM: Statistical Mechanics

distribution (2.115) with the same temperature T and the same chemical potential . Moreover, in our
current case of an undoped (intrinsic) semiconductor, these densities have to be equal,
n  p  ni , (6.56)
because if this electroneutrality condition was violated, the volume would acquire a non-zero electric
charge density  = e(p – n), which would result, in a bulk sample, in an extremely high electric field
energy. From this condition and Eqs. (54)-(55), we get a system of two equations,

g C mC3 / 2 
~ 1 / 2 d~ g V mV3 / 2  ~ 1 / 2 d~
ni  
2 2  3 0 exp   C    / T   1
~  
2 2  3 0 exp   V    / T   1
~ , (6.57)

whose solution gives both the requested charge carrier density ni and the Fermi level .
For an arbitrary ratio /T, this solution may be found only numerically, but in most practical
cases, this ratio is very large. (Again, for Si at room temperature,   1.14 eV, while T  0.025 eV.) In
this case, we may use the same classical approximation as in Eq. (3.45), to reduce Eqs. (54) and (55) to
simple expressions
  C     
n  nC exp , p  nV exp V , for T   , (6.58)
 T   T 
where the temperature-dependent parameters,
3/ 2 3/ 2
g m T  g m T 
nC  C3  C  and nV  V3  V  , (6.59)
  2    2 
may be interpreted as the effective numbers of states (per unit volume) available for occupation in,
respectively, the conduction and valence bands, in thermal equilibrium. For usual semiconductors (with
gC ~ gV ~ 1, and mC ~ mV ~ me), at room temperature, these numbers are of the order of 31025m-3 
31019cm-3. (Note that all results based on Eqs. (58) are only valid if both n and p are much lower than,
respectively, nC and nV.)
With the substitution of Eqs. (58), the system of equations (56) allows a straightforward solution:

V  C T  g V 3 mV   
  ln  , ni  nC n V 
1/ 2
  ln exp . (6.60)
2 2  g C 2 mC   2T 
Since in all practical materials the logarithms in the first of these expressions are never much larger than
1,38 it shows that the Fermi level in intrinsic semiconductors never deviates much from the so-called
midgap value (V +C)/2 – see the (schematic) Fig. 6. In the result for ni, the last (exponential) factor is
very small, so the equilibrium number of charge carriers is much lower than that of the atoms – for the
most important case of silicon at room temperature, ni ~ 1010cm-3. The exponential temperature
dependence of ni (and hence of the electric conductivity   ni) of intrinsic semiconductors is the basis
of several applications, for example, simple germanium resistance thermometers efficient in the whole
range from ~0.5K to ~100K. Another useful application of the same fact is the extraction of the bandgap

38Note that in the case of simple electron spin degeneracy (gV = gC = 2), the first logarithm vanishes altogether.
However, in many semiconductors, the degeneracy is factored by the number of similar energy bands (e.g., six
similar conduction bands in silicon), and the factor ln(gV/gC) may slightly affect quantitative results.

Chapter 6 Page 17 of 38
Essential Graduate Physics SM: Statistical Mechanics

of a semiconductor from the experimental measurement of the temperature dependence of   ni –


frequently, in just two well-separated temperature points.
However, most applications require a much higher concentration of carriers. It may be increased quite
dramatically by planting into a semiconductor a relatively small number of slightly different atoms – either
donors (e.g., phosphorus atoms for Si) or acceptors (e.g., boron atoms for Si). Let us analyze the first
opportunity, called n-doping, using the same simple energy band model (53). If the donor atom is only
slightly different from those in the crystal lattice, it may be easily ionized – giving an additional electron
to the conduction band and hence becoming a positive ion. This means that the effective ground state
energy D of the additional electrons is just slightly below the conduction band edge C – see Fig. 7a.39

(a) (b)
D C Fig. 6.7. The Fermi levels  in
 (a) n-doped and (b) p-doped
  semiconductors. Hatching shows
 the ranges of unlocalized state
V A
energies.

Reviewing the arguments that have led us to Eqs. (58), we see that at relatively low doping,
when the strong inequalities n << nC and p << nV still hold, these relations are not affected by the
doping, so the concentrations of electrons and holes given by these equalities still obey a universal
(doping-independent) relation following from Eqs. (58) and (60):40

np  ni2 . (6.61)
However, for a doped semiconductor, the electroneutrality condition looks differently from Eq. (56)
because the total density of positive charges in a unit volume is not p, but rather (p + n+), where n+ is the
density of positively-ionized (“activated”) donor atoms, so the condition becomes
n  p  n . (6.62)
If virtually all dopants are activated, as it is in most practical cases,41 then we may take n+ = nD, where
nD is the total concentration of donor atoms, i.e. their number per unit volume, and Eq. (62) becomes
n  p  nD . (6.63)
Plugging in Eq. (61) in the form p = ni2/n, we get a simple quadratic equation for n, with the following
physically acceptable (positive) solution:

39 Note that in comparison with Fig. 6, here the (for most purposes, redundant) information on the q-dependence
of the energies is collapsed, leaving the horizontal axis of such a band-edge diagram free for showing their
possible spatial dependences – see Figs. 8, 10, and 11 below.
40 Very similar relations may be met in the theory of chemical reactions (where it is called the law of mass
action), and other disciplines – including such exotic examples as theoretical ecology.
41 Let me leave it for the reader’s exercise to prove that this assumption is always valid unless the doping density

nD becomes comparable to nC, and as a result, the Fermi level  moves into a ~T-wide vicinity of D.

Chapter 6 Page 18 of 38
Essential Graduate Physics SM: Statistical Mechanics

1/ 2
nD  n D2 
n     ni2  . (6.64)
2  4 
This result shows that the doping affects n (and hence  = C – Tln(nC/n) and p = ni2/n) only if the
dopant concentration nD is comparable with, or higher than the intrinsic carrier density ni given by Eq.
(60). For most applications, nD is made much higher than ni; in this case Eq. (64) yields
ni2 ni2 nC
n  nD  ni , p   n,    p   C  T ln . (6.65)
n nD nD
Because of the reasons to be discussed very soon, modern electron devices require doping densities
above 1018cm-3, so the logarithm in Eq. (65) is not much larger than 1. This means that the Fermi level
rises from the midgap to a position only slightly below the conduction band edge C – see Fig. 7a.
The opposite case of purely p-doping, with nA acceptor atoms per unit volume, and a small
activation (negative ionization) energy A – V << ,42 may be considered absolutely similarly, using
the electroneutrality condition in the form
n  n  p , (6.66)
where n– is the number of activated (and hence negatively charged) acceptors. For the relatively high
concentration (ni << nA << nV), virtually all acceptors are activated, so n–  nA, Eq. (66) may be
approximated as n + nA = p, and the analysis gives the results dual to Eq. (65):
ni2 ni2 nV
p  nA  ni , n   p,    n   V  T ln , (6.67)
p nA nA
so in this case, the Fermi level is just slightly above the valence band edge (Fig. 7b), and the number of
holes far exceeds that of electrons – again, in the narrow sense of the word. Let me leave the analysis of
the simultaneous n- and p-doping (which enables, in particular, so-called compensated semiconductors
with the sign-variable difference n – p  nD – nA) for the reader’s exercise.
Now let us consider how a sample of a doped semiconductor (say, a p-doped one) responds to a
static external electrostatic field E applied normally to its surface.43 (In semiconductor integrated
circuits, such a field is usually created by the voltage applied to a special highly-conducting gate
electrode separated from the semiconductor surface by a thin insulating layer.) Assuming that the field
penetrates into the sample by a distance  much larger than the crystal lattice period a (the assumption
to be verified a posteriori), we may calculate the distribution of the electrostatic potential  using the
macroscopic version of the Poisson equation.44 Assuming that the semiconductor occupies the semi-
space x > 0 and that E = nxE, the equation reduces to the following 1D form45

42 For the typical donors (P) and acceptors (B) in silicon, both ionization energies, (C – D) and (A – V), are
close to 45 meV, i.e. are indeed much smaller than   1.14 eV.
43 A simplified version of this analysis was carried out in EM Sec. 2.1.
44 See, e.g., EM Sec. 3.4.
45 I am sorry for using, for the SI electric constant  , the same Greek letter as for single-particle energies, but
0
both notations are traditional, and the difference between these uses will be clear from the context.

Chapter 6 Page 19 of 38
Essential Graduate Physics SM: Statistical Mechanics

d 2  x 
 . (6.68)
dx 2
 0
Here  is the dielectric constant of the semiconductor matrix – excluding the dopants and charge
carriers, which in this approach are treated as explicit (“stand-alone”) charges, with the volumic density
  e  p  n  n  . (6.69)

(As a sanity check, Eqs. (68)-(69) show that if E  –d/dx = 0, then  = 0, bringing us back to the
electroneutrality condition (66), and hence the “flat” band-edge diagrams shown in Figs. 7b and 8a.)

(a) (b) (c)


C n  0, p  0
n0 p  nA   enA  0
 0 p  nA
p  nA  0   0,  e  0
 0  0  0  0
  e  0 '  const w  0
 0
'  
A '  const
V
x  x0  w
x0 x0 x ~ D x0 x  x0

Fig. 6.8. The band-edge diagrams of the electric field penetration into a uniform p-doped semiconductor:
(a) E = 0, (b) E < 0, and (c) 0 < Ec < E. Solid red points depict positive charges; solid blue points, negative
charges; and hatched blue points, possible electrons in the inversion layer – all very schematically.

In order to get a closed system of equations for the case E  0, we should take into account that
the electrostatic potential   0, penetrating into the sample with the field,46 adds the potential
component q(x) = –e(x) to the energy of each electron, and hence shifts the whole local system of
single-electron energy levels “vertically” by this amount – down for  > 0, and up for  < 0. As a result,
the field penetration leads to what is called band bending – see the band-edge diagrams schematically
shown in Figs. 8b,c for two possible polarities of the applied field, which affects the distribution (x) via
the boundary condition47
d
0  E . (6.70)
dx
Note that the electrochemical potential ’ (which, in accordance with the discussion in Sec. 3, replaces
the chemical potential in presence of the electric field),48 has to stay constant through the system in
equilibrium, keeping the electric current equal to zero – see Eq. (41). For arbitrary doping parameters,

46 It is common (though not necessary) to select the energy reference so deep inside the semiconductor,  = 0; in
what follows I will use this convention.
47 Here E is the field just inside the semiconductor. The free-space field necessary to create it is  times larger –

see, e.g., the same EM Sec. 3.4, in particular, Eq. (3.56).


48 In semiconductor physics literature, the value of ’ is usually called the Fermi level, even in the absence of the
degenerate Fermi sea typical for metals – cf. Sec. 3.3. In this section, I will follow this common terminology.

Chapter 6 Page 20 of 38
Essential Graduate Physics SM: Statistical Mechanics

the system of equations (58) (with the replacements V  V – e, and   ’) and (68)-(70), plus the
relation between n– and nA (describing the acceptor activation), does not allow an analytical solution.
However, as was discussed above, in the most practical cases nA >> ni, we may use the approximate
relations n–  nA and n  0 at virtually any values of ’ within the locally shifted bandgap [V – e(x), C
– e(x)], so the substitution of these relations, and the second of Eqs. (58), with the mentioned
replacements, into Eq. (69) yields

  V  e  '   nV   V  '    e  
  enV exp   enA  enA  exp   exp   1 . (6.71)
 T   nA  T   T  
The x-independent electrochemical potential (a.k.a. the Fermi level) ’ in this relation should be equal to
the value of the chemical potential  (x  ) in the semiconductor’s bulk, given by the last of Eqs. (67),
which turns the expression in the parentheses into 1. With these substitutions, Eq. (68) becomes
d 2 en   e  
 A exp T   1, for  V  e  x   '   C  e  x  . (6.72)
dx 2
 0    
This nonlinear differential equation may be solved analytically, but in order to avoid a
distraction by this (rather bulky) solution, let me first consider the case when the electrostatic potential
is sufficiently small – either because the external field is small, or because we focus on the distances
sufficiently far from the surface – see Fig. 8 again. In this case, in the Taylor expansion of the exponent
in Eq. (72), with respect to small , we may keep only two leading terms, turning it into a linear
equation:
1/ 2
d 2 e 2 n A d 2    0T 
 , i.e.  , where  D   2  , (6.73)
dx 2  0T dx 2 2D  e nA 
with the well-known exponential solution, satisfying also the boundary condition   0 at x  :
 x 
  C exp , at e   T . (6.74)
 D 
The constant D given by the last of Eqs. (73) is called the Debye screening length. It may be
rather substantial; for example, at TK = 300K, even for the relatively high doping nA  1018cm-3 typical
for modern silicon (  12) integrated circuits, it is close to 4 nm – still much larger than the crystal
lattice constant a ~ 0.3 nm, so the above analysis is indeed quantitatively valid. Note also that D does
not depend on the charge’s sign; hence it should be no large surprise that repeating our analysis for an n-
doped semiconductor, we may find out that Eqs. (73)-(74) are valid for that case as well, with the only
replacement nA  nD.
If the applied field E is weak, Eq. (74) is valid in the whole sample, and the constant C in it may
be readily calculated using the boundary condition (70), giving
1/ 2
  T 
 x 0  C   DE   2 0  E. (6.75)
 e nA 
This formula allows us to express the condition of validity of the linear approximation leading to Eq.
(74), e  << T, in terms of the applied field:

Chapter 6 Page 21 of 38
Essential Graduate Physics SM: Statistical Mechanics

1/ 2
T  Tn 
E  Emax , with Emax    A  ; (6.76)
e D   0 
in the above example, Emax ~ 60 kV/cm. On the lab scale, such field is not low at all (it is twice higher
than the threshold of electric breakdown in the air at ambient conditions), but it may be sustained by
many solid-state materials that are much less prone to breakdown.49 This is why we should be interested
in what happens if the applied field is higher than this value.
The semi-quantitative answer is relatively simple if the field is directed out of the p-doped
semiconductor (in our nomenclature, E < 0 – see Fig. 8b). As the valence band bends up by a few T, the
local hole concentration p(x), and hence the charge density (x), grow exponentially – see Eq. (71).
Hence the effective local length of the nonlinear field’s penetration, ef(x)  -1/2(x), shrinks
exponentially. A detailed analysis of this effect using Eq. (72) does not make much sense, because as
soon as ef(0) decreases to ~a, the macroscopic Poisson equation (68) is no longer valid quantitatively.
For typical semiconductors, this happens at the field that raises the edge V – e(0) of the bent valence
band at the sample’s surface above the Fermi level ’. In this case, the valence-band electrons near the
surface form a degenerate Fermi gas, with an “open” Fermi surface – essentially a metal, which a very
small (atomic-size) Thomas-Fermi screening length:50
1/ 2
  
ef 0  ~ TF  2 0  . (6.77)
 e g 3  F  
The effects taking place at the opposite polarity of the field, E > 0, are much more interesting –
and more useful for applications. Indeed, in this case, the band bending down leads to an exponential
decrease of (x) as soon as the valence band edge V – e(x) drops down by just a few T below its
unperturbed value V. If the applied field is large enough, E > Ec (as it is in the situation shown in Fig.
8c), it forms, on the left of such point x0 the so-called depletion layer, of a certain width w. Within this
layer, not only the electron density n but the hole density p as well are negligible, so the only substantial
contribution to the charge density  is given by the fully ionized acceptors:   –en–  –enA, and Eq.
(72) becomes very simple:
d 2 enA
  const, for x0  w  x  x0 . (6.78)
dx 2  0
Let us use this equation to calculate the largest possible width w of the depletion layer, and the
critical value, Ec, of the applied field necessary for this. (By definition, at E = Ec, the left boundary of the
layer, where V – e(x) = C, i.e. e(x) = V – A  , just touches the semiconductor surface: x0 – w = 0,
i.e. x0 = w. (Figure 8c shows the case when E is slightly larger than Ec.) For this, Eq. (78) has to be
solved with the following boundary conditions:
 d d
 0   , 0  Ec ,  w  0, w   0 . (6.79)
e dx dx

49 Even some amorphous thin-film insulators, such as properly grown silicon and aluminum oxides, can withstand
fields up to ~10 MV/cm.
50 As a reminder, the derivation of this formula was the task of Problem 3.14.

Chapter 6 Page 22 of 38
Essential Graduate Physics SM: Statistical Mechanics

Note that the first of these conditions is strictly valid only if T << , i.e. at the assumption we have made
from the very beginning, while the last two conditions are asymptotically correct only if D << w – the
assumption we should not forget to check after the solution.
After all the undergraduate experience with projective motion problems, the reader certainly
knows by heart that the solution of Eq. (78) is a quadratic parabola, so let me immediately write its final
form satisfying the boundary conditions (79):

enA w  x 2
1/ 2
 2   2
 x   , with w   2 0  , at Ec  . (6.80)
 0 2  e nA  eε 0 w

Comparing the result for w with Eq. (73), we see that if our basic condition T <<  is fulfilled, then D
<< w, confirming the qualitative validity of the whole solution (80). For the same particular parameters
as in the example before (nA  1018cm-3,   10), and   1 eV, Eqs. (80) give w  40 nm and Ec  600
kV/cm – still a practicable field. (As Fig. 8c shows, to create it, we need a gate voltage only slightly
larger than /e, i.e. close to 1 V for typical semiconductors.)
Figure 8c also shows that if the applied field exceeds this critical value, near the surface of the
semiconductor the conduction band edge drops below the Fermi level. This is the so-called inversion
layer, in which electrons with energies below ’ form a highly conductive degenerate Fermi gas.
However, typical rates of electron tunneling from the bulk through the depletion layer are very low, so
after the inversion layer has been created (say, by the gate voltage application), it may be only populated
from another source – hence the hatched blue points in Fig. 8c. This is exactly the fact used in the
workhorse device of semiconductor integrated circuits – the field-effect transistor (FET) – see Fig. 9.51
(a) gate (b)
gate
insulator
source drain insulators " fin"
n n p
p

Fig. 6.9. Two main species of the n-FET: (a) the bulk FET, and (b) the FinFET. While
on panel (a), the current flow from the source to the drain is parallel to the plane of the
drawing, on panel (b) it is normal to the plane, with the n-doped source and drain
contacting the thin “fin” from two sides off this plane.

In the “bulk” variety of this structure (Fig. 9a), a gate electrode overlaps a gap between two
similar highly-n-doped regions near the surface, called source and drain, formed by n-doping inside a p-
doped semiconductor. It should be more or less obvious (and will be shown in a moment) that in the
absence of gate voltage, the electrons cannot pass through the p-doped region, so virtually no current
flows between the source and the drain, even if a modest voltage is applied between these electrodes.
However, if the gate voltage is positive and large enough to induce the electric field E > Ec at the surface
of the p-doped semiconductor, it creates the inversion layer as shown in Fig. 8c, and the electron current

51This device was invented (by Julius E. Lilienfeld) in 1930 but demonstrated experimentally only in the mid-
1950s.

Chapter 6 Page 23 of 38
Essential Graduate Physics SM: Statistical Mechanics

between the source and drain electrodes may readily flow through this surface channel. (Very
unfortunately, in this course I would not have time/space for a detailed analysis of transport properties
of this keystone electron device and have to refer the reader to special literature.52)
Fig. 9a shows that another major (and virtually unavoidable) structure of semiconductor
integrated circuits is the famous p-n junction – an interface between p- and n-doped regions. Let us
analyze its simple model, in which the interface is in the plane x = 0, and the doping profiles nD(x) and
nA(x) are step-like, making an abrupt jump at the interface:

n  const, at x  0, 0, at x  0,
nA x    A nD x    (6.81)
0, at x  0, n D  const, at x  0 .
(This model is very reasonable for modern integrated circuits where the doping is performed by
implantation using high-energy ion beams.)
To start with, let us assume that no voltage is applied between the p- and n-regions, so the
system may be in thermodynamic equilibrium. In the equilibrium, the Fermi level ’ should be flat
through the structure, and at x  – and x  +, where   0, the level structure has to approach the
positions shown, respectively, on panels (a) and (b) of Fig. 7. In addition, the distribution of the electric
potential (x), shifting the level structure vertically by –e(x), has to be continuous to avoid unphysical
infinite electric fields. With that, we inevitably arrive at the band-edge diagram that is (schematically)
shown in Fig. 10.

n - doping
 C  e x  wn e

'  const Fig. 6.10. The band-edge diagram of a


p-n junction in thermodynamic
wp  V  e  x  equilibrium (T = const, ’ = const). The
p - doping notation is the same as in Figs. 7 and 8.
x0

The diagram shows that the contact of differently doped semiconductors gives rise to a built-in
electric potential difference , equal to the difference of their values of  in the absence of the contact
– see Eqs. (65) and (67): 53
n n
e  e     e      n   p    T ln C V , (6.82)
nD nA

52 The classical monograph in this field is S. Sze, Physics of Semiconductor Devices, 2nd ed., Wiley 1981. (The 3rd
edition, circa 2006, co-authored with K. Ng, is more tilted toward technical details.) I can also recommend a
detailed textbook by R. Pierret, Semiconductor Device Fundamentals, 2nd ed., Addison Wesley, 1996.
53 Frequently, Eq. (82) is also rewritten in the form e = T ln(n n /n 2). In view of the second of Eqs. (60), this
D A i
equality is formally correct but may be misleading because the intrinsic carrier density ni is an exponential
function of temperature and is physically irrelevant to this particular problem.

Chapter 6 Page 24 of 38
Essential Graduate Physics SM: Statistical Mechanics

which is usually just slightly smaller than the bandgap. (Qualitatively, this is the same contact potential
difference that was discussed, for the case of metals, in Sec. 3 – see Fig. 5.) The arising internal
electrostatic field E = –d/dx induces, in both semiconductors, depletion layers similar to that induced
by an external field (Fig. 8c). Their widths wp and wn may also be calculated similarly, by solving the
following boundary problem of electrostatics, mostly similar to that given by Eqs. (78)-(79):

d 2 e nA , for  w p  x  0,
  (6.83)
dx 2
 0  nD , for 0  x   wn ,
d
 wn     w p    , wn   d  wp   0,   0    0, d
 0  d  0 , (6.84)
dx dx dx dx
also exact only in the limit  << , ni << nD, nA. Its (easy) solution gives a result similar to Eq. (80):

 enA w p  x 2 / 2 0 , for  w p  x  0,


  const   (6.85)
  enD wn  x  / 2 0 , for 0  x   wn ,
2

with expressions for wp and wn giving the following formula for the full depletion layer width:
1/ 2
 2 0   nA nD 1 1 1
w  w p  wn    , with nef  , i.e.   . (6.86)
 en ef  nA  nD nef nA nD
This expression is similar to that given by Eq. (80), so for typical highly doped semiconductors
(nef ~1018cm-3) it gives for w a similar estimate of a few tens nm.54 Returning to Fig. 9a, we see that this
scale imposes an essential limit on the reduction of bulk FETs (whose scaling down is at the heart of the
well-known Moore’s law),55 explaining why such high doping is necessary. In the early 2010s, the
problems with implementing even higher doping, plus issues with dissipated power management, have
motivated the transition of advanced silicon integrated circuit technology from the bulk FETs to the
FinFET (also called “double-gate”, or “tri-gate”, or “wrap-around-gate”) variety of these devices,
schematically shown in Fig. 9b, despite their essentially 3D structure and hence a more complex
fabrication technology. In the FinFETs, the role of p-n junctions is reduced, but these structures remain
an important feature of semiconductor integrated circuits.
Now let us have a look at the p-n junction in equilibrium from the point of view of Eq. (52). In
the simple model we are considering now (in particular, at T << ), this equation is applicable separately
to the electron and hole subsystems, because in this model the gases of these charge carriers are classical
in all parts of the system, and the generation-recombination processes56 coupling these subsystems have
relatively small rates – see below. Hence, for the electron subsystem, we may rewrite Eq. (52) as
n
j n  n m qE  Dn , (6.87)
x

54 Note that such w is again much larger than D – the fact that justifies the first two boundary conditions (84).
55 Another important limit is quantum-mechanical tunneling through the gate insulator, whose thickness has to be
scaled down in parallel with lateral dimensions of a FET, including its channel length.
56 In the semiconductor physics lingo, the “carrier generation” event is the thermal excitation of an electron from
the valence band to the conduction band, leaving a hole behind, while the reciprocal event of filling such a hole
by a conduction-band electron is called the “carrier recombination”.

Chapter 6 Page 25 of 38
Essential Graduate Physics SM: Statistical Mechanics

where q = –e. Let us discuss how each term of the right-hand of this equality depends on the system’s
parameters. Because of the n-doping at x > 0, there are many more electrons in this part of the system.
According to the Boltzmann distribution (58), some number of them,
 e 
n  exp , (6.88)
 T 
have energies above the conduction band edge in the p-doped part (see Fig. 11a) and try to diffuse into
this part through the depletion layer; this diffusion flow of electrons from the n-side to the p-side of the
structure (in Fig. 11, from the right to the left) is described by the second term on the right-hand side of
Eq. (87). On the other hand, the intrinsic electric field E = –/x inside the depletion layer, directed as
Fig. 11a shows, exerts on the electrons the force F = qE  –eE pushing them in the opposite direction
(from p to n), is described by the first, “drift” term on the right-hand side of Eq. (87).57

(a) (b)
p n p n

F  eE jn e  V 
e je
E '  x 
eV
'  const 
 wp wn
wp wn

Fig. 6.11. Electrons in the conduction band of a p-n junction at: (a) V = 0, and (b) V > 0.
For clarity, other charges (of the holes and all ionized dopant atoms) are not shown.

The explicit calculation of these two flows58 shows, unsurprisingly, that in the equilibrium, they
are exactly equal and opposite, so jn = 0, and such analysis does not give us any new information.
However, the picture of two electron counter-flows, given by Eq. (87), enables a prediction of the
functional dependence of jn on a modest external voltage V, with V  < , applied to the junction.
Indeed, since the doped semiconductor regions outside the depletion layer are much more conductive

57 Note that if an external photon with energy  >  generates an electron-hole pair somewhere inside the
depletion layer, this electric field immediately drives its electron component to the right, and the hole component
to the left, thus generating a pulse of electric current through the junction. This is the physical basis of the whole
vast technological field of photovoltaics, currently strongly driven by the demand for renewable electric power.
Due to the progress of this technology, the cost of solar power systems has dropped from ~$300 per watt in the
mid-1950s to ~$1 per watt in 2020, and its global generation is now approaching 1015 watt-hours per year –
though it is still below 2% of the electric power generated by all methods.
58 I will not try to reproduce this calculation (which may be found in any of the semiconductor physics books
mentioned above), because getting all its scaling factors right requires using some model of the recombination
process, and in this course, there is just no time for its quantitative discussion. (However, see Eq. (93) below.)

Chapter 6 Page 26 of 38
Essential Graduate Physics SM: Statistical Mechanics

than it, virtually all applied voltage (i.e. the difference of values of the electrochemical potential ’)
drops across this layer, changing the total band edge shift – see Fig. 11b:59

e  e  μ'  e  qV  e  V  . (6.89)

This change results in an exponential change of the number of electrons able to diffuse into the p-side of
the junction – cf. Eq. (88):
 eV 
n V   n 0 exp , (6.90)
 T 
and hence in a proportional change of the diffusion flow jn of electrons from the n-side to the p-side of
the system, i.e. of the oppositely directed density of the electron current je = –ejn – see Fig. 11b.
On the other hand, the drift counter-flow of electrons is not altered too much by the applied
voltage: though it does change the electrostatic field E = – inside the depletion layer, and also the
depletion layer width,60 these changes are incremental, not exponential. As the result, the net density of
the current carried by electrons may be approximately expressed as
 eV 
j e V   j diffusion  j drift  j e 0 exp   const. (6.91a)
T 
As was discussed above, at V = 0, the net current has to vanish, so the constant in Eq. (91a) has to equal
je(0), and we may rewrite this equality as
  eV  
j e V   j e 0 exp   1. (6.91b)
 T  
Now repeating this analysis for the current jh of the holes (the exercise highly recommended to
the reader), we get a similar expression, with the same sign before eV,61 though with a different scaling
factor, jh(0) instead of je(0). As a result, the total electric current density obeys the famous Shockley law

  eV  
j V   j e V   j h V   j 0 exp   1 , with j 0  j e 0  j h 0 , (6.92)
 T  
describing the main p-n junction’s property as an electric diode – a two-terminal device passing the
current more “readily” in one direction (from the p- to the n-terminal) than in the opposite one.62

59 In our model, the positive sign of V  ’/q  –’/e corresponds to the additional electric field, –’/q 
’/e, directed in the positive direction of the x-axis (in Fig. 11, from the left to the right), i.e. to the positive
terminal of the voltage source connected to the p-doped semiconductor – which is the common convention.
60 This change, schematically shown in Fig. 11b, may be readily calculated by making the replacement (89) in the
first of Eqs. (86).
61 This sign invariance may look strange, due to the opposite (positive) electric charge of the holes. However, this
difference in the charge sign is compensated by the opposite direction of the hole diffusion – see Fig. 10. (Note
also that the actual charge carriers in the valence band are still electrons, and the effective positive charge of holes
is just a convenient representation of the specific dispersion law in this energy band, with a negative effective
mass – see Fig. 6, the second line of Eq. (53), and a more detailed discussion of this issue in QM Sec. 2.8.)

Chapter 6 Page 27 of 38
Essential Graduate Physics SM: Statistical Mechanics

Besides numerous practical applications in electrical and electronic engineering, diodes have very
interesting statistical properties, in particular performing very non-trivial transformations of the spectra
of deterministic and random signals. Very unfortunately, I would not have time for their discussion and
have to refer the interested reader to the special literature.63
Still, before proceeding to our next (and last!) topic, let me give for the reader reference, without
proof, the expression for the scaling factor j(0) in Eq. (92), which follows from a simple but broadly
used model of the recombination process:
 D D 
j 0   eni2  e  h  . (6.93)
 le nA l h nD 
Here le and lh are the characteristic lengths of diffusion of electrons and holes before their
recombination, which may be expressed by Eq. (5.113), le = (2Dee)1/2 and lh = (2Dhh)1/2, with e and h
being the characteristic times of recombination of the so-called minority carriers – of electrons in the p-
doped part, and of holes in the n-doped part of the structure. Since the recombination is an inelastic
process, its times are typically rather long – of the order of 10-7s, i.e. much longer than the typical times
of elastic scattering of the same carriers, which define their diffusion coefficients – see Eq. (51).

6.5. Heat transfer and thermoelectric effects


Now let us return to our analysis of kinetic effects using the Boltzmann-RTA equation, and
extend it even further, to the effects of a non-zero (albeit small) temperature gradient. Again, since for
any of the statistics (20), the average occupancy N() is a function of just one combination of all its
arguments,   ( – )/T, its partial derivatives obey not only Eq. (37) but also the following relation:

 N       N       N  
  . (6.94)
T T2  T 
As a result, Eq. (38) is generalized as
w0    
w0      T  , (6.95)
  T 
giving the following generalization of Eq. (39):
~   w0 v   '     T  .
w (6.96)
  T 
Now, calculating the current density as in Sec. 3, we get the result that is traditionally represented as
 ' 
j        S  T  , (6.97)
 q 

62 Some metal-semiconductor junctions, called Schottky diodes, have similar rectifying properties (and may be
better fitted for high-power applications than silicon p-n junctions), but their properties are more complex because
of the rather involved chemistry and physics of interfaces between different materials.
63 See, e.g., the monograph by R. Stratonovich cited in Sec. 4.2.

Chapter 6 Page 28 of 38
Essential Graduate Physics SM: Statistical Mechanics

where the constant S, called the Seebeck coefficient64 (or the “thermoelectric power”, or just
“thermopower”) is given by the following relation:

(   )   N   

gq 4
 8m 
Seebeck 3 1/ 2
coefficient S    d . (6.98)
2 3 3 0
T   
Working out this integral for the most important case of a degenerate Fermi gas, with T << F,
we have to be careful because the center of the sharp peak of the last factor under the integral coincides
with the zero point of the previous factor, ( – )/T. This uncertainty may be resolved using the
Sommerfeld expansion formula (3.59). Indeed, for a smooth function f() obeying Eq. (3.60), so f(0) = 0,
we may use Eq. (3.61) to rewrite Eq. (3.59) as

  N     2T 2 d 2 f  
 f ( ) 

 d  f (  ) 
6 d  2  
. (6.99)
0  
In particular, for working out the integral (98), we may take f()  (8m3)1/2( – )/T. (For this function,
the condition f(0) = 0 is evidently satisfied.) Then f() = 0, d2f/d2= = 3(8m)1/2/T  3(8mF)1/2/T, and
Eq. (98) yields
gq 4  2T 2 38m F 
1/ 2

S  . (6.100)
2 3 3 6 T
Comparing the result with Eqs. (3.54) and (32), for the constant S we get a simple expression
independent of :65
 2 T cV
S   , for T   F , (6.101)
2q  F q
where cV  CV/N is the heat capacity of the gas per unit particle, in this case given by Eq. (3.70).
In order to understand the physical meaning of the Seebeck coefficient, it is sufficient to consider
a conductor carrying no current. For this case, Eq. (97) yields

 ' / q  ST   0 .
Seebeck
effect (6.102)

So, at these conditions, a temperature gradient creates a proportional gradient of the electrochemical
potential ’, and hence the effective electric field E defined by Eq. (42). This is the Seebeck effect.
Figure 12 shows the standard way of its measurement, using an ordinary (electrodynamic) voltmeter that
measures the difference of ’/e at its terminals, and a pair of junctions (in this context, called the
thermocouple) of two materials with different coefficients S.

64 Named after Thomas Johann Seebeck who experimentally discovered, in 1822, the effect described by the
second term in Eq. (97) – and hence by Eq. (103).
65 Again, such independence hints that Eq. (101) has a broader validity than in our simple model of an isotropic
gas. This is indeed the case: this result turns out to be valid for any form of the Fermi surface, and for any
dispersion law (p). Note, however, that all calculations of this section are valid for the simplest RTA model in
that  is an energy-independent parameter; for real metals, a more accurate description of experimental results
may be obtained by tweaking this model to take this dependence into account – see, e.g., Chapter 13 in the
monograph by N. Ashcroft and N. D. Mermin, cited in Sec. 3.5.

Chapter 6 Page 29 of 38
Essential Graduate Physics SM: Statistical Mechanics

T"
S1 S2
A" V  (T'  T" )(S 1  S 2 )
B
A

A'
Fig. 6.12. The Seebeck effect in a thermocouple.
T'
Integrating Eq. (102) around the loop from point A to point B, and neglecting the temperature
drop across the voltmeter, we get the following simple expression for the thermally-induced difference
of the electrochemical potential, usually called either the thermoelectric power or the “thermo e.m.f.”:

μ' B μ' A 1 B B A"


 A' B

V     '  dr    ST  dr  S 1  T  dr  S 2   T  dr   T  dr 
q q qA A A' A A"  (6.103)

 S 1 T"  T'   S 2 T'  T"   (S 1  S 2 ) (T'  T" ) .

(Note that according to Eq. (103), any attempt to measure such voltage across any two points of a
uniform conductor would give results depending on the voltmeter wire materials, due to an unintentional
gradient of temperature in them.)
Using thermocouples is a very popular, inexpensive method of temperature measurement –
especially in the a-few-hundred-C range, where gas- and fluid-based thermometers are not too
practicable, if a 1C-scale accuracy is sufficient. The temperature responsivity (S1 – S2) of a popular
thermocouple, chromel-constantan,66 is about 70 V/C. To understand why the typical values of S are
so small, let us discuss the Seebeck effect’s physics. Superficially, it is very simple: the particles heated
by an external source, diffuse from it toward the colder parts of the conductor, creating an electric
current if they are electrically charged. However, this naïve argument neglects the fact that at j = 0, there
is no total flow of particles. For a more accurate interpretation, note that inside the integral (98), the
Seebeck effect is described by the factor ( – )/T, which changes its sign at the Fermi surface, i.e. at the
same energy where the term [-N()/], describing the availability of quantum states for transport
(due to their intermediate occupancy 0 < N() < 1), reaches its peak. The only reason why that integral
does not vanish completely, and hence S  0, is the growth of the first factor under the integral (which
describes the density of available quantum states on the energy scale) with , so the hotter particles
(with  > ) are more numerous and hence carry more heat than the colder ones carry in the opposite
direction.
The Seebeck effect is not the only result of a temperature gradient; the same diffusion of
particles also causes the less subtle effect of heat flow from the region of higher T to that with lower T,
i.e. the effect of thermal conductivity, well-known from our everyday practice. The density of this flow

66 Both these materials are alloys, i.e. solid solutions: chromel is 10% chromium in 90% nickel, while constantan
is 45% nickel and 55% copper.

Chapter 6 Page 30 of 38
Essential Graduate Physics SM: Statistical Mechanics

(i.e. that of thermal energy) may be calculated similarly to that of the electric current – see Eq. (26),
with the natural replacement of the electric charge q of each particle with its thermal energy ( – ):

j h      vwd 3 p . (6.104)

(Indeed, we may look at this expression is as at the difference between the total energy flow density, j =
vwd3p, and the product of the average energy needed to add a particle to the system () by the particle
flow density, jn = vwd3p  j/q.)67 Again, at equilibrium (w = w0) the heat flow vanishes, so w in Eq.
(104) may be replaced with its perturbation w ~ that already has been calculated – see Eq. (96). The
substitution of that expression into Eq. (104), and its transformation exactly similar to the one performed
above for the electric current j, yields
 ' 
j h  Π       T  , (6.105)
 q 
with the coefficients  and  given, in our approximation, by the following formulas:

gq 4

  N   
 8m      
Peltier 3 1/ 2
coefficient    d , (6.106)
2 3 3 0
 

(   )   N   

g 4
2

 8m 
Thermal 3 1/ 2
   d . (6.107)
2 3 3
conductivity
0
T   

Besides the missing factor T in the denominator, the integral in Eq. (106) is the same as the one
in Eq. (98), so the constant  (called the Peltier coefficient68), is simply and fundamentally related to the
Seebeck coefficient: 69
 vs. S   ST . (6.108)

67 An alternative explanation of the factor ( – ) in Eq. (104) is that according to Eqs. (1.37) and (1.56), for a
uniform system of N particles this factor is just (E – G)/N  (TS – PV)/N. The full differential of the numerator is
TdS + SdT –PdV – VdP, so in the absence of the mechanical work dW = –PdV, and changes of temperature and
pressure, it is just TdS  dQ – see Eq. (1.19).
68 Named after Jean Charles Athanase Peltier who experimentally discovered, in 1834, the effect expressed by the
first term in Eq. (105) – and hence by Eq. (112).
69 This extremely simple relation (first discovered experimentally in 1854 by W. Thompson, a.k.a. Lord Kelvin) is
frequently considered as the most prominent example of the so-called Onsager’s reciprocal relations between
kinetic coefficients, first suggested by L. Onsager in 1931. Unfortunately, the common derivation of these
relations, reproduced in even very popular textbooks, assumes without proof that the mutual correlation function
of statistical averages of thermodynamic variables have the same time-reversal symmetry as that of the underlying
microscopic variables. As was argued, among others, by R. Zwanzig, J. Chem. Phys. 40, 2527 (1964), this
assumption may be plausibly justified for the processes that, by their physical nature, lack very fast fluctuations,
such as the volume fluctuations discussed in Sec. 5.3, but not for those that feature them – see the discussion of
pressure fluctuations in the same section, and the solution of Problem 5.15. Unfortunately, I would have no
time/space for a sufficiently rigorous discussion of this interesting topic, and have to refer the reader to the
corresponding literature including B. Coleman and C. Truesdell, J. Chem. Phys. 33, 28 (1960), R. Zwanzig, Annu.
Rev. Phys. Chem. 16, 67 (1965), and U. Geigenmüller et al., Physica A 119, 53 (1983).

Chapter 6 Page 31 of 38
Essential Graduate Physics SM: Statistical Mechanics

On the other hand, the integral in Eq. (107) is different, but may be readily calculated\, for the
most important case of a degenerate Fermi gas, using the Sommerfeld expansion in the form (99), with
f()  (8m3)1/2( – )2/T, for which f() = 0 and d2f/d2= = 2(8m3)1/2/T  2(8mF3)1/2/T, so

g 4  2 2 2(8m F3 )1 / 2  2 nT
 T  . (6.109)
2 3 3 6 T 3 m
Comparing the result with Eq. (32), we get the so-called Wiedemann-Franz law70
 2 T
 . (6.110) Wiedemann-
 3 q2 Franz law

This relation between the electric conductivity  and the thermal conductivity  is more general
than our formal derivation might imply. Indeed, it may be shown that the Wiedemann-Franz law is also
valid for an arbitrary anisotropy (i.e. an arbitrary Fermi surface shape) and, moreover, well beyond the
relaxation-time approximation. (For example, it is also valid for the scattering integral (12) with an
arbitrary angular dependence of rate , provided that the scattering is elastic.) Experiments show that
the law is well obeyed by most metals, but only at relatively low temperatures when the thermal
conductance due to electrons is well above the one due to lattice vibrations, i.e. phonons – see Sec. 2.6.
Moreover, for a non-degenerate gas, Eq. (107) should be treated with the utmost care, in the context of
the definition (105) of the coefficient . (Let me leave this issue for the reader’s analysis.)
Now let us discuss the effects described by Eq. (105), starting from the less obvious, first term
on its right-hand side. It describes the so-called Peltier effect, which may be measured in the loop
geometry similar to that shown in Fig. 12, but now driven by an external voltage source – see Fig. 13.

( 1   2 ) I

2I

I  jA

1 I

2I
Fig. 6.13. The Peltier effect at T = const.
( 1   2 ) I

70 It was named after Gustav Wiedemann and Rudolph Franz who noticed the constancy of ratio / for various
materials, at the same temperature, as early as 1853. The direct proportionality of the ratio to the absolute
temperature was noticed by Ludwig Lorenz in 1872. Due to his contribution, the Wiedemann-Franz law is
frequently represented, in the SI temperature units, as / = LTK, where the constant L  (2/3)kB/e2, called the
Lorenz number, is close to 2.4510-8WK-2. Theoretically, Eq. (110) was derived in 1928 by A. Sommerfeld.

Chapter 6 Page 32 of 38
Essential Graduate Physics SM: Statistical Mechanics

The voltage drives a certain dc current I = jA (where A is the area of the conductor’s cross-
section), necessarily the same in the whole loop. However, according to Eq. (105), if materials 1 and 2
are different, the power P = jhA of the associated heat flow is different in the two parts of the loop.71
Indeed, if the whole system is kept at the same temperature (T = 0), the integration of that relation over
the cross-sections of each part yields
 ' 
P1, 2   1, 2 A1, 2 1, 2      1, 2 A1, 2 j1, 2   1, 2 I 1, 2   1, 2 I , (6.111)
 q 1, 2
where, at the second step, Eq. (41) for the electric current density has been used. This equality means
that to sustain the constant temperature, the following power difference,
Peltier
effect P   1   2 I , (6.112)

has to be extracted from one junction of the two materials (in Fig. 13, shown on the top), and inserted
into the counterpart junction.
If a constant temperature is not maintained, the former junction is heated (in excess of the bulk,
Joule heating), while the latter one is cooled, thus implementing a thermoelectric heat pump/refrigerator.
Such Peltier refrigerators (also called “thermoelectric coolers”) which require neither moving parts nor
fluids, are very convenient for modest (by a few tens C) cooling of relatively small components of
various systems – from sensitive radiation detectors on mobile platforms (including spacecraft), all the
way to cold drinks in vending machines. It is straightforward (and hence is left for the reader) to use the
above formulas to show that the practical efficiency of active materials used in such thermoelectric
refrigerators may be characterized by the following dimensionless figure-of-merit,
S 2
ZT  T. (6.113)

For the best thermoelectric materials found so far, the values of ZT at room temperature are close to 2,
providing the COPcooling, defined by Eq. (1.69), of the order of 20% of the Carnot limit (1.70), i.e. a few
times lower than that of traditional refrigerators using mechanical compressors. The search for
composite materials (including those with nanoparticles) with higher values of ZT is an active field of
applied solid-state physics.72 Another currently explored idea in this field is to reduce  (and hence to
increase ZT) radically by replacing the electron diffusion with their transfer through vacuum gaps.
Finally, let us discuss the second term of Eq. (105), in the absence of ’ (and hence of the
electric current) giving
Fourier
law j h  T , (6.114)

This equality should be familiar to the reader because it describes the very common effect of thermal
conductivity. Indeed, this linear relation73 is much more general than the particular expression (107) for

71 Let me emphasize that here we are discussing the heat transferred through a conductor, not the Joule heat
generated in it by the current. (The latter effect is quadratic, rather than linear, in current, and hence is much
smaller at I  0.)
72 See, e.g., D. Rowe (ed.), Thermoelectrics Handbook: Macro to Nano, CRC Press, 2005.
73 It was suggested (in 1822) by the same universal scientific genius J.-B. J. Fourier who has not only developed
such a key mathematical tool as the Fourier series but also discovered what is now called the greenhouse effect!

Chapter 6 Page 33 of 38
Essential Graduate Physics SM: Statistical Mechanics

: for sufficiently small temperature gradients it is valid for virtually any medium – for example, for
insulators. (Table 6.1 gives typical values of  for most common and/or representative materials.) Due
to its universality and importance, Eq. (114) has deserved its own name – the Fourier law.
Acting absolutely similarly to the derivation of other continuity equations, such as Eqs. (5.117)
for the classical probability, and Eq. (49) for the electric charge,74 let us consider the conservation of the
aggregate variable corresponding to jh – the internal energy E within a time-independent volume V.
According to the basic Eq. (1.18), in the absence of media’s expansion (dV = 0 and hence dW = 0), the
energy change75 has only the thermal component, so its only cause may be the heat flow through its
boundary surface S:
dE
  jh  d 2 r . (6.115)
dt S

In the simplest case of thermally-independent heat capacity CV, we may integrate Eq. (1.22) over
temperature to write76
E  CV T   cV T d 3 r , (6.116)
V

where cV is the volumic specific heat, i.e. the heat capacity per unit volume – see the rightmost column
in Table 6.1.

Table 6.1. Approximate values of two major thermal coefficients of some materials at 20C.

Material  (Wm-1K-1) cV (JK-1m-3)


Air(a),(b) 0.026 1.2103
Teflon ([C2F4]n) 0.25 0.6106
Water(b) 0.60 4.2106
Amorphous silicon dioxide 1.1-1.4 1.5106
Undoped silicon 150 1.6106
Aluminum(c) 235 2.4106
Copper(c) 400 3.4106
Diamond 2,200 1.8106
(a)
At ambient pressure.
(b)
In fluids (gases and liquids), heat flow may be much enhanced by temperature-gradient-induced
turbulent circulation – convection, which is highly dependent on the system’s geometry. The given values
correspond to conditions preventing convection.
(c)
In the context of the Wiedemann-Franz law (valid for metals only!), the values of  for Al and Cu
correspond to the Lorenz numbers, respectively, 2.2210-8 WK-2 and 2.3910-8 WK-2, in a pretty
impressive comparison with the universal theoretical value of 2.4510-8WK-2 given by Eq. (110).

74 They are all similar to continuity equations for other quantities – e.g., the mass (see CM Sec. 8.3) and the
quantum-mechanical probability (see QM Secs. 1.4 and 9.6).
75 According to Eq. (1.25), in the case of negligible thermal expansion, it does not matter whether we speak about
the internal energy E or the enthalpy H.
76 If the dependence of c on temperature may be ignored only within a limited temperature interval, Eqs. (116)
V
and (118) may be still used within that interval, for temperature deviations from some reference value.

Chapter 6 Page 34 of 38
Essential Graduate Physics SM: Statistical Mechanics

Now applying to the right-hand side of Eq. (115) the divergence theorem,77 and taking into
account that for a time-independent volume the full and partial derivatives over time are equivalent, we
get
 T  3
V  cV t    jh d r  0 , (6.117)

This equality should hold for any time-independent volume V, which is possible only if the function
under the integral equals zero at any point. Using Eq. (114), we get the following partial differential
equation, called the heat conduction equation (or, rather inappropriately, the “heat equation”):
Heat T
conduction cV r      r Τ   0 , (6.118)
equation t
where the spatial arguments of the coefficients cV and  are spelled out to emphasize that this equation is
valid even for nonuniform media. (Note, however, that Eq. (114) and hence Eq. (118) are valid only if
the medium is isotropic.)
In a uniform medium, the thermal conductivity κ may be taken out from the external spatial differentiation, and the heat conduction equation becomes mathematically similar to the diffusion equation (5.116), and also to the drift-diffusion equation (50) in the absence of drift (∇U = 0):

$$\frac{\partial T}{\partial t} = D_T \nabla^2 T, \qquad \text{with } D_T \equiv \frac{\kappa}{c_V}. \qquad (6.119)$$
This means, in particular, that the solutions of these equations discussed earlier in this course (such as Eqs. (5.112)-(5.113) for the evolution of the delta-functional initial perturbation) are valid for Eq. (119) as well, with the only replacement D → DT. This is why I will leave a few other examples of the solution of this equation for the reader's exercise.
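As a numerical aside (not needed for the rest of the chapter), here is a minimal Python sketch of an explicit finite-difference solution of the 1D version of Eq. (119); the copper parameters are taken from Table 6.1, while the grid and the pulse width are arbitrary choices obeying the usual stability condition DTΔt/Δx² ≤ 1/2.

```python
import numpy as np

# Thermal diffusivity of copper, from Table 6.1: D_T = kappa / c_V
kappa, c_V = 400.0, 3.4e6          # W/(m*K) and J/(K*m^3)
D_T = kappa / c_V                  # ~1.2e-4 m^2/s

# 1D grid; the explicit (FTCS) scheme is stable for D_T*dt/dx^2 <= 1/2
x = np.linspace(-0.05, 0.05, 201)  # a 10-cm-long rod
dx = x[1] - x[0]
dt = 0.25 * dx**2 / D_T

# Narrow initial temperature perturbation, mimicking a delta function
T = np.exp(-(x / (2 * dx))**2)

t = 0.0
for _ in range(1000):              # march dT/dt = D_T * d2T/dx2 forward in time
    T[1:-1] += (D_T * dt / dx**2) * (T[2:] - 2 * T[1:-1] + T[:-2])
    t += dt

# A delta-functional pulse should spread into a Gaussian of variance 2*D_T*t
# (cf. Eqs. (5.112)-(5.113) with D -> D_T)
var_numeric = np.sum(x**2 * T) / np.sum(T)
print(f"t = {t:.2f} s: numeric variance = {var_numeric:.3e} m^2, "
      f"analytic 2*D_T*t = {2 * D_T * t:.3e} m^2")
```

The two printed variances agree to better than one percent (the tiny residual being due to the finite width of the initial pulse), illustrating how fast heat spreads in copper: over a centimeter in roughly half a second.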
Another topic I have to leave for the reader's exercise is making estimates of the kinetic coefficients (such as the σ, D, and κ discussed above, and also the shear viscosity η) of a nearly-ideal classical gas78 from simple kinematic arguments, and comparing the results with those following from the Boltzmann-RTA equation.
More generally, let me emphasize again that due to time/space restrictions, in this chapter I was
able to barely scratch the surface of physical kinetics.79

6.6. Exercise problems

6.1. Use the Boltzmann equation in the relaxation-time approximation to derive the Drude formula for the complex ac conductivity σ(ω), and give a physical interpretation of the result's trend at high frequencies.

77 I hope the reader knows it by heart by now, but if not – see, e.g., MA Eq. (12.2).
78 Here the term "nearly-ideal gas" means that its mean free path l is so large that particle collisions do not significantly affect the basic statistical properties of the gas.
79 A much more detailed coverage of this important part of physics may be found, for example, in the textbook by L. Pitaevskii and E. Lifshitz, Physical Kinetics, Butterworth-Heinemann, 1981. For a discussion of applied aspects of kinetics see, e.g., T. Bergman et al., Fundamentals of Heat and Mass Transfer, 7th ed., Wiley, 2011.


6.2. At t = 0, similar particles were uniformly distributed in a plane layer of thickness 2a:
$$n(x,0) = \begin{cases} n_0, & \text{for } -a \le x \le +a, \\ 0, & \text{otherwise.} \end{cases}$$
At t > 0, the particles are allowed to propagate by diffusion through an unlimited uniform medium. Use
the variable separation method80 to calculate the time evolution of the particle density distribution.

6.3. Solve the previous problem using an appropriate Green’s function for the 1D version of the
diffusion equation, and discuss the relative convenience of the results.

6.4. Particles with the same initial spatial distribution as in the two previous problems are now
freed at t = 0 to propagate ballistically, without any scattering. Calculate the time evolution of their
density distribution at t > 0, provided that initially, the particles were in thermal equilibrium at
temperature T. Compare the solution with that of the previous problem.

6.5.* Calculate the electric conductance of a narrow, uniform conducting link between two bulk
conductors, in the low-voltage and low-temperature limit, neglecting the electron interaction and
scattering inside the link.

6.6. Calculate the effective capacitance (per unit area) of a broad plane sheet of a degenerate 2D
electron gas, separated by distance d from a metallic ground plane.

6.7. Give a quantitative description of the dopant atom ionization, which would be consistent
with the conduction and valence band occupation statistics, using the same simple model of an n-doped
semiconductor as in Sec. 4 (see Fig. 7a), and taking into account that the ground state of the dopant atom
is typically doubly degenerate, due to two possible spin orientations of the bound electron. Use the
results to verify Eq. (65), within the displayed limits of its validity.

6.8. Generalize the solution of the previous problem to the case when the n-doping of a semiconductor by nD donor atoms per unit volume is complemented with its simultaneous p-doping by nA acceptor atoms per unit volume, whose energy εA – εV of activation, i.e. of accepting an additional electron and hence becoming a negative ion, is much lower than the bandgap Δ – see the figure below.
[Figure: band diagram showing the conduction-band edge εC, the donor level εD, the acceptor level εA, and the valence-band edge εV.]

6.9. A nearly-ideal classical gas of N particles with mass m was in thermal equilibrium at temperature T, in a closed container of volume V. At some moment, an orifice of a very small area A is opened in one of the container's walls, allowing the particles to escape into the surrounding vacuum.81 In the limit of very low density n ≡ N/V, use simple kinetic arguments to calculate the r.m.s. velocity of the escaped particles during the time period when the total number of such particles is still much smaller than N. Formulate the limits of validity of your results in terms of V, A, and the mean free path l.

80 A detailed introduction to this method may be found, for example, in EM Sec. 2.5.
81 In chemistry-related fields, this process is frequently called effusion.


6.10. For the system analyzed in the previous problem, calculate the rate of particle flow through
the orifice – the so-called effusion rate. Discuss the limits of validity of your result.

6.11. Use simple kinematic arguments to estimate:
(i) the diffusion coefficient D,
(ii) the thermal conductivity κ, and
(iii) the shear viscosity η,
of a nearly-ideal classical gas with mean free path l. Compare the result for D with that calculated in Sec. 3 from the Boltzmann-RTA equation.
Hint: In fluid dynamics, the shear viscosity (frequently called simply "viscosity") may be defined as the coefficient η in the following relation:82

$$\frac{dF_{j'}}{dA_j} = \eta \, \frac{\partial v_{j'}}{\partial r_j},$$

where dFj' is the j'-th Cartesian component of the tangential force between two parts of a fluid, separated by an imaginary interface normal to some direction nj (with j ≠ j', and hence nj ⊥ nj'), exerted over an elementary area dAj of this surface, and v(r) is the fluid velocity distribution at the interface.

6.12. Use simple kinematic arguments to relate the mean free path l in a nearly-ideal classical gas, with the full cross-section σ of mutual scattering of its particles.83 Use the result to express the thermal conductivity and the viscosity coefficient estimates made in the previous problem, in terms of σ.

6.13. Use the Boltzmann-RTA equation to calculate the thermal conductivity of a nearly-ideal
classical gas, measured in conditions when the applied thermal gradient does not create a net particle
flow. Compare the result with that following from the simple kinetic arguments (Problem 11), and
discuss their relationship.

6.14. Use the Boltzmann-RTA equation to calculate the shear viscosity of a nearly-ideal gas.
Spell out the result in the classical limit, and compare it with the estimate made in the solution of
Problem 11.

6.15. Use a simple model of a thermoelectric refrigerator ("cooler") based on the Peltier effect to analyze its efficiency. In particular, explain why the ratio ZT given by Eq. (113) may be used as the figure-of-merit of materials for such devices.

82 See, e.g., CM Eq. (8.56). Please note the difference between the shear viscosity coefficient η considered in this problem and the drag coefficient η whose calculation was the task of Problem 3.2. Despite the similar (traditional) notation, and belonging to the same realm (kinematic friction), these coefficients have different definitions and even different dimensionalities.
83 I am sorry for using the same letter for the cross-section as for the electric Ohmic conductivity. (Both notations
are very traditional.) Let me hope this would not lead to confusion, because conductivity is not discussed in this
problem.


6.16. Use the heat conduction equation (119) to calculate the amplitude of day-periodic temperature variations at depth z under the surface of a soil with negligible thermal expansion, and temperature-independent specific heat cV and thermal conductivity κ. Assume that the incident heat flux per unit surface is a sinusoidal function of time, with amplitude j0. Estimate the temperature variation amplitudes, at depth z = 1 m, for a typical dry soil, taking necessary parameters from a reliable source.

6.17. Use Eq. (119) to calculate the time evolution of temperature in the center of a uniform solid
sphere of radius R, initially heated to a uniformly distributed temperature Tini, and at t = 0 placed into a
heat bath that keeps its surface at temperature T0.

6.18. Suggest a reasonable definition of the entropy production rate (per unit volume), and
calculate this rate for stationary thermal conduction, assuming that it obeys the Fourier law, in a material
with negligible thermal expansion. Give a physical interpretation of the result. Does the stationary
temperature distribution in a sample correspond to the minimum of the total entropy production in it?
