Proof of Heisenberg’s error-disturbance relation
Paul Busch,1,∗ Pekka Lahti,2,† and Reinhard F. Werner3,‡
1 Department of Mathematics, University of York, York, United Kingdom
2 Turku Centre for Quantum Physics, Department of Physics and Astronomy, University of Turku, FI-20014 Turku, Finland
3 Institut für Theoretische Physik, Leibniz Universität, Hannover, Germany
arXiv:1306.1565v2 [quant-ph] 24 Sep 2013
(Dated: September 25, 2013)
While the slogan “no measurement without disturbance” has established itself under the name
Heisenberg effect in the consciousness of the scientifically interested public, a precise statement of this
fundamental feature of the quantum world has remained elusive, and serious attempts at rigorous
formulations of it as a consequence of quantum theory have led to seemingly conflicting preliminary
results. Here we show that despite recent claims to the contrary [Rozema et al., Phys. Rev. Lett.
109, 100404 (2012)], Heisenberg-type inequalities can be proven that describe a trade-off between the
precision of a position measurement and the necessary resulting disturbance of momentum (and vice
versa). More generally, these inequalities are instances of an uncertainty relation for the imprecisions
of any joint measurement of position and momentum. Measures of error and disturbance are here
defined as figures of merit characteristic of measuring devices. As such they are state independent,
each giving worst-case estimates across all states, in contrast to previous work that is concerned
with the relationship between error and disturbance in an individual state.
PACS numbers: 03.65.Ta, 03.65.Db, 03.67.-a
In spite of their important role since the very beginning of quantum mechanics, uncertainty relations
have recently become the subject of active scientific debates. On one hand, entropic versions of the information-disturbance trade-off [1] have become an important tool
in security proofs [2] for continuous variable cryptography. On the other hand there were widely publicized
[3] claims of a refutation [4–6] of the error-disturbance
uncertainty relations heuristically claimed by Heisenberg
[7]. A review of the literature on uncertainty relations is
given in [8].
Heisenberg’s 1927 paper [7] introducing the uncertainty relations is one of the key contributions to early
quantum mechanics. It is part of virtually every quantum
mechanics course, almost always in the version forwarded
by Kennard [9], Weyl [10] and Robertson [11]. What is
often overlooked, however, is that this popular version is
only one way of making the idea of uncertainty precise.
The original paper begins with a famous discussion of the
resolution of microscopes, in which the accuracy (resolution) of an approximate position measurement is related
to the disturbance of the particle’s momentum.
This situation is in no way covered by the standard relations, since in an experiment concerning the Kennard-Weyl-Robertson inequality no particle meets with both a
position and a momentum measurement. Heisenberg’s
semiclassical discussion has no immediate translation
into the modern quantum formalism, particularly since
the momentum disturbance prima facie involves the comparison of two (generally) non-commuting quantities, the
momentum before and after the measurement. Such a
translation does require some careful conceptual work,
and one can arrive at different results. This is shown by
the example of Ozawa [4], who defines a relation he claims
to be a rigorous version of Heisenberg’s ideas, and shows
that it fails to hold in general. A suggested modification
of the false relation has recently been verified experimentally [5, 6]. This has been widely publicized as a refutation of Heisenberg’s ideas, in apparent contradiction
to our main result. However, there is no contradiction,
and the disagreement only shows that there is a grain of
rigorously explicable truth in Heisenberg, provided one
looks in the right place for it. While Ozawa aims to describe the interplay between error and disturbance for an
individual state, our approach gives a state-independent
characterization of the overall performance of measuring
devices. In [12] we show that Ozawa’s notions, though
mathematically well-defined, have only limited validity
as measures of error and disturbance.[13]
We will describe and prove an inequality of the classic
form
\[ (\Delta Q)(\Delta P) \geq \frac{\hbar}{2}, \tag{1} \]
in which the quantities ∆Q and ∆P are not given by the
variances of the position and momentum distributions in
the same state, as in the textbook inequality. Instead,
following closely the suggestion of Heisenberg, they are
explicitly defined figures of merit for a microscope-like
measurement scenario: the accuracy ∆Q of a position
measurement and the momentum disturbance ∆P incurred by it. Moreover, the inequality is sharp, and we
will describe explicitly the cases of equality. We believe
that the definitions and results are simple enough to use
in a basic quantum mechanics course, although the full
proof uses some tools beyond such a course.
The main progress over earlier work [14] is a simpler
definition of the ∆ quantities, using the idea of calibration [15]. This definition does not require the Monge
transportation metric, which led in [14] to quantities akin
to absolute deviations rather than root mean square deviations, and hence to a constant different from ħ/2 in (1).
A changed constant (even if optimal for the particular
definitions of ∆) puts an undue burden on the memory
of undergraduates. Using variances also for calibration
solves this problem. The basic ideas of the proof in [14]
can be taken over.
To keep matters simple, we stick to the classic situation of two canonically conjugate variables of a single
quantum degree of freedom. For the sake of comparison,
let us recall the scenario of the Kennard-Weyl-Robertson
inequality, which we call preparation uncertainty (see
Fig. 1).

FIG. 1. Scenario of preparation uncertainty. ∆ρ is the root of
the variance of the distribution obtained for the indicated observable in the state ρ. In this pair of experiments no particle
is subject to both a position and a momentum measurement.

The spreads $\Delta_\rho(A) = \bigl(\operatorname{tr}\rho A^2 - (\operatorname{tr}\rho A)^2\bigr)^{1/2}$
of position Q and momentum P are determined in separate experiments on the same source, given by a density
operator ρ. The uncertainty relation ∆ρ(Q)∆ρ(P) ≥ ħ/2
is a quantitative version of the observation that there are
no dispersion-free quantum states [16], as applied to a
canonical pair of observables. It is not to be found in
Heisenberg’s paper [7], except in a rough discussion of
post-measurement states, which he assumes to be Gaussian with a spread related to the accuracy of a position
measurement.
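The preparation inequality can be checked numerically for simple wave functions. The following is a minimal sketch, not part of the original argument: it assumes units with ħ = 1, real-valued wave functions (so that the mean momentum vanishes), and a plain grid discretization; the helper names `spreads`, `gaussian` and `sech` are ours.

```python
import math

HBAR = 1.0  # assumption: units with hbar = 1


def spreads(psi, x_min=-20.0, x_max=20.0, n=4001):
    """Return (Delta Q, Delta P) for a real-valued wave function psi(x)."""
    dx = (x_max - x_min) / (n - 1)
    xs = [x_min + i * dx for i in range(n)]
    vals = [psi(x) for x in xs]
    norm = sum(v * v for v in vals) * dx
    mean_q = sum(x * v * v for x, v in zip(xs, vals)) * dx / norm
    var_q = sum((x - mean_q) ** 2 * v * v for x, v in zip(xs, vals)) * dx / norm
    # For real psi, <P> = 0 and <P^2> = hbar^2 * integral of psi'(x)^2.
    derivs = [(vals[i + 1] - vals[i - 1]) / (2 * dx) for i in range(1, n - 1)]
    var_p = HBAR ** 2 * sum(d * d for d in derivs) * dx / norm
    return math.sqrt(var_q), math.sqrt(var_p)


def gaussian(x, s=1.0):
    """Unnormalized Gaussian wave function with position spread s."""
    return math.exp(-x * x / (4 * s * s))


def sech(x):
    """Unnormalized non-Gaussian test wave function."""
    return 1.0 / math.cosh(x)


dq_g, dp_g = spreads(gaussian)
dq_s, dp_s = spreads(sech)
print("Gaussian product:", dq_g * dp_g)
print("sech product:    ", dq_s * dp_s)
```

Analytically the Gaussian saturates the bound with product ħ/2, while the sech profile gives (π/6)ħ, slightly above it; the numerical values should agree with these up to discretization error.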
In contrast, Fig. 2 shows the scenario discussed by
Heisenberg. The middle row shows an approximate position measurement Q′ followed by a momentum measurement. How should we define the momentum disturbance
and position error in this setup? The error of the approximate position measurement Q′ clearly refers to the
comparison with an ideal measurement Q as shown in the
first row. For the momentum disturbance we can say the
same: We have remarked that the momenta before and
after the microscope interaction do not commute, so the
difference makes no sense in the individual case. However, we can compare the distributions of the momenta
measured after the position measurement (call this effective measurement P ′ ) with the distribution an ideal momentum measurement P would have given on the same
input state. Come to think of it, this is precisely how
we detect disturbance in other typical quantum settings.

FIG. 2. Scenario of measurement uncertainty for successive
measurements, as discussed by Heisenberg (middle row). An
approximate position measurement Q′ is followed by an ideal
momentum measurement, effectively giving a measurement P′
on the initial state. The accuracy ∆(Q, Q′) quantifies the difference between the output distributions of Q′ and an ideal
position measurement Q (first row). Similarly, the momentum disturbance ∆(P, P′) quantifies the difference between
the distributions obtained by P′ and by an ideal momentum
measurement P (last row). The definitions for these ∆ quantities (see text) can be applied, more generally, to an arbitrary
joint measurement M (dashed box). This can be any device
producing, in every shot, a q value and a p value. Q′ and P′
are then defined as the marginals of M, obtained by ignoring
the other output.

Consider, for example, the double slit experiment. It is
well-known that illuminating the slits enough to detect
the passage of a particle through one or the other hole
makes the interference fringes go away. Clearly, the light
used for observation disturbs the particles, and the evidence for this is once again the change of the distribution
on the screen. Note that this way of looking at error
and disturbance restores the symmetry between the position and momentum aspects of this scenario. The uncertainty relations we will prove therefore apply just as
well to the position disturbance caused by an approximate momentum measurement and, more generally, to
any measurement scheme M , which produces in every
run a value p and a value q (see the dashed outline in
Fig. 2). This generalization also covers any successive
measurement scenario, in which one tries to correct for
some of the momentum disturbance, perhaps using the
detailed knowledge of how the position measuring device
works. In principle, this could allow a reduction of uncertainties. However, the inequality holds without change,
which gives a precise meaning and a proof to Heisenberg’s phrase “uncontrollable momentum disturbance”,
which he himself uses without further justification.
Let us now discuss the definition of ∆(Q, Q′ ) in more
detail (the momentum case will be completely analogous). We think of this “microscope resolution” as a figure of merit for the device, a promise which might be
advertised by the manufacturer, and which could be verified by a testing lab. ∆(Q, Q′ ) = 0 will mean that the
“approximate” device Q′ is completely equivalent to the
ideal Q, i.e., for every input state ρ the output distributions will be the same. Similarly, a small value might
indicate that the difference in the distributions will be
small for every input state. This requires a definition
for the distance of two general probability distributions,
which we will give below (Section on “Uncertainty metrics”). However, we can also take a simpler approach,
which avoids verifying a statement for all input states.
Instead the testing lab might concentrate on those states,
which at least classically would seem to be the most demanding ones, namely states for which Q has a known
and sharp value. We call this process “calibration”. It still
requires testing many states, but no longer very
mixed states or states that contain coherent superpositions of widely separated wave functions.
An advantage of the calibrated error is that we no
longer need a quantitative evaluation of the distance between arbitrary probability distributions, but just between an arbitrary distribution and a known sharp value
ξ. For this we naturally take the root mean square deviation from ξ,
\[ D(\rho, Q'; \xi) = \bigl\langle (q' - \xi)^2 \bigr\rangle_{\rho, Q'}^{1/2}, \tag{2} \]
where the angle brackets denote the expectation of the
indicated function of the output q ′ , in the distribution
obtained on the preparation ρ with the device Q′ . This
statement allows for Q′ to be a general positive operator
valued measurement. For projection valued observables
like Q we could simplify this to $D(\rho, Q; \xi)^2 = \operatorname{tr}\rho\,(Q - \xi\mathbb{1})^2$. The latter quantity is to be small, say ≤ ε, for
the input states ρ used for calibration. Hence we set
∆c(Q, Q′) to be
\[ \lim_{\varepsilon \to 0} \, \sup \bigl\{ D(\rho, Q'; \xi) \bigm| \rho, \xi;\ D(\rho, Q; \xi) \leq \varepsilon \bigr\}. \tag{3} \]
Here the set is non-empty since for any ξ and ε > 0
there is a ρ such that tr(ρ Q) = ξ and D(ρ, Q; ξ) < ε;
moreover, the limit exists, because with decreasing ε the
supremum is taken over fewer and fewer states, so it cannot increase as ε → 0. In the case of a bad approximation, the supremum can be infinite, in which case we put
∆c (Q, Q′ ) = ∞.
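The calibration procedure in (3) can be illustrated on a toy device. The sketch below is only an illustration under stated assumptions: the device is a hypothetical one whose output is the ideal position plus independent noise with mean `bias` and standard deviation `noise_std`, and the test inputs are described only by their position mean and width (which is all that enters a second-moment deviation). The function names are ours.

```python
import math


def rms_output_deviation(offset, width, bias, noise_std):
    """D(rho, Q'; xi) for the hypothetical additive-noise device.

    offset: mean of the input position distribution minus xi
    width:  standard deviation of the input position distribution
    bias, noise_std: mean and standard deviation of the device noise
    """
    return math.sqrt(width ** 2 + (offset + bias) ** 2 + noise_std ** 2)


def calibration_sup(eps, bias, noise_std, steps=200):
    """Grid-search the supremum in (3) over inputs with D(rho, Q; xi) <= eps."""
    best = 0.0
    for i in range(steps + 1):
        # offset = r cos(angle), width = r sin(angle) >= 0, so that
        # D(rho, Q; xi)^2 = offset^2 + width^2 = r^2 <= eps^2.
        angle = math.pi * i / steps
        for r in (0.5 * eps, eps):
            offset = r * math.cos(angle)
            width = r * math.sin(angle)
            best = max(best, rms_output_deviation(offset, width, bias, noise_std))
    return best


def calibration_error(bias, noise_std, eps_values=(1.0, 0.1, 0.01, 1e-4, 1e-6)):
    """Evaluate the supremum for decreasing eps; the last entry approximates Delta_c."""
    return [calibration_sup(eps, bias, noise_std) for eps in eps_values]


sups = calibration_error(bias=0.3, noise_std=0.4)
print(sups)
```

For this model the suprema decrease with ε and approach the root mean square size of the noise, here the square root of 0.3² + 0.4², i.e. 0.5.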
With this definition, and the corresponding one for P ,
we can state our main result. We just assume that the
Q′ and P ′ are the marginal observables of some joint
measurement device M whose calibration errors are both
finite. As discussed above this covers also the case of a
sequential measurement (Fig. 2). Then
\[ \Delta_c(Q, Q')\, \Delta_c(P, P') \geq \frac{\hbar}{2}. \tag{4} \]
This inequality is sharp, and equality holds for an M for
which the joint distribution of (q, p)-outputs is the so-called Husimi distribution [17] of the input state, which
can be obtained by a Gaussian smearing of the Wigner
function. In the extreme case of one of the marginals
being error free, the error for the other marginal is necessarily infinite.
Proof. The proof has two parts: The first is elementary and concerns the special case that M is a covariant
phase space observable. These observables [17–20] can be
described explicitly, including a very simple form of their
marginals Q′ and P ′ , by which (4) can be reduced to
the preparation uncertainty. The second, more technical
part of the proof reduces the general case to the covariant
case by an averaging method, and is taken from [14]. We
only sketch it [21].
By a covariant measurement we mean one which has
a natural symmetry property for both position and momentum translations. That is, if we apply it to an input
state shifted in position by δq and in momentum by δp,
the output distribution will be the same as before, transformed by (q, p) ↦ (q + δq, p + δp). These symmetries
are implemented by the Weyl operators (a.k.a. Glauber
translations) W(q, p) = exp((iqP − ipQ)/ħ). Then the
whole observable can be reconstructed from its density
at the origin, which must be [19, 20] a positive operator
σ of trace 1, i.e., a density operator as for a quantum
state. The probability for outcomes in a set S ⊆ R2 is
then given by the positive operator
\[ M(S) = \int_S \frac{dq\, dp}{2\pi\hbar}\, W(q, p)^* \sigma W(q, p). \tag{5} \]
A remarkable property of these joint measurements of
position and momentum is that their marginals take a
particularly simple form: The probability density of the
outputs q ′ obtained on a state ρ is a convolution of the
position distributions of ρ and σ. That is, we can model
the output distribution by taking q distributed like the
outputs of an ideal measurement Q on ρ, and adding a
noise term q ′′ , which is independent of q and distributed
according to the position distribution of σ. The same
description applies to the marginal P ′ .
Therefore, for a covariant measurement we can immediately identify ∆c (Q, Q′ ) without further computation:
The density σ is a fixed characteristic property of the
measurement. Therefore, as the position distribution of
ρ becomes sharply concentrated around some ξ, the outputs converge in distribution to q ′ = ξ + q ′′ , so
\[ \Delta_c(Q, Q') = D(\sigma, Q; 0), \tag{6} \]
which is the “size” (the root mean square deviation) of
the “noise”. For example, if σ has sharp position distribution at some value a, this is equal to |a|, since the
outputs will be off by a shift a (i.e., q ′ ≈ q + a). Hence
one will choose σ with zero mean. The uncertainty product then becomes ∆c (Q, Q′ )∆c (P, P ′ ) = ∆σ (Q)∆σ (P ),
which is ≥ ħ/2 by the preparation uncertainty relation
applied to σ. This proves (4) for the case of covariant
measurements, and at the same time provides examples
of minimum uncertainty measurements: all we have to
do is to choose σ as a centered minimum uncertainty
state, i.e., as σ = |Ψ⟩⟨Ψ| with Ψ a real valued centered
Gaussian wave function. The phase space distribution
associated with an input state ρ by this measurement M
is then the Husimi distribution [17].
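The step from the convolution form of the marginal to (6) can be spelled out as a short worked estimate. It uses only the model stated above, q′ = q + q″ with q″ independent of q and distributed according to the position distribution of σ; the shorthand m = ⟨q″⟩, the assumption that D(σ, Q; 0) is finite, and the Cauchy-Schwarz bound in the second line are our additions.

```latex
\begin{align*}
D(\rho, Q'; \xi)^2
  &= \bigl\langle (q - \xi + q'')^2 \bigr\rangle
   = D(\rho, Q; \xi)^2 + 2\,\langle q - \xi \rangle_\rho\, m + D(\sigma, Q; 0)^2, \\
\bigl| \langle q - \xi \rangle_\rho \bigr|
  &\leq D(\rho, Q; \xi) \leq \varepsilon, \\
\bigl| D(\rho, Q'; \xi)^2 - D(\sigma, Q; 0)^2 \bigr|
  &\leq \varepsilon^2 + 2\,\varepsilon\,|m|
   \;\xrightarrow[\varepsilon \to 0]{}\; 0 .
\end{align*}
```

The bound is uniform in ρ and ξ, so the supremum in (3) converges to D(σ, Q; 0) as ε → 0. For centered σ this equals ∆σ(Q), and the same reasoning applies to P′.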
The more technical part of the proof of (4) is to show
that for any measurement M there is a covariant one,
say M̄, with at most the same ∆’s. Basically, M̄ is obtained from M by averaging, the technical problem being that the parameter range of (q, p) over which one
has to “average” is infinite (see [14]). Let us introduce
Mε (∆Q, ∆P ) as the set of measurements M such that,
for A = Q, P , D(ρ, A′ ; ξ) ≤ ∆A whenever D(ρ, A; ξ) ≤ ε
for given ∆A and ε. This is a convex set, and compact
in a suitable weak topology. We can write the covariance
condition as a fixed point equation for some transformations on the set of all observables, namely a unitary
transformation by a Weyl operator combined with a shift
in the argument. These transformations commute, and
leave Mε (∆Q, ∆P ) invariant. Therefore, by the Markov-Kakutani fixed point theorem this set, if non-empty, must
also contain a covariant element, which by construction
has at most the same uncertainties. This concludes our
sketch of the proof of (4).
Uncertainty metrics. The calibration criterion only involves highly concentrated states so that, in principle,
on general input states the optimal joint measurement
might produce output distributions quite different from
the ideal ones. One can easily give examples of a projection valued observable A and an “approximation” A′
for which the calibrated distance is a rather optimistic
estimate. That is, if we denote by ∆(Q, Q′) a figure of
merit based on comparison of all states, we might have
∆(Q, Q′ ) ≫ ∆c (Q, Q′ ). Note first that in the covariant
case this cannot happen: The statement that Q′ can be
simulated by adding fixed independent noise to Q is valid
for arbitrary input states, and any reasonable definition
of ∆(Q, Q′ ) should give the size of the noise. However,
in the general case we would need a definition which is
independent of that special form. Here we will introduce
such a quantity and show that an uncertainty relation holds for
it.
The idea is to define a metric D on probability distributions which extends (2) in the sense that D(ρ, Q′ ; ξ)
becomes the metric distance between the output distribution of Q′ and a point measure at ξ. Then we set
\[ \Delta(Q, Q') = \sup_{\rho} D(\rho, Q; \rho, Q'), \tag{7} \]
where the expression on the right is the metric distance of
the two output distributions. Since ∆c takes the supremum over the smaller set of highly concentrated states,
we have ∆(Q, Q′ ) ≥ ∆c (Q, Q′ ). The metric D on probability distributions is basically fixed by our requirements
as what is technically known as the Wasserstein-2 distance, which is a variant of the Monge-Kantorovich
transport or “earth mover’s” distance (see [22] for a study
of such metrics). The problem addressed by Monge was
the cost of transforming a hill (earth distribution µ) into
some fortifications (earth distribution η), when the workers had to be paid by the bucket and the distance covered. A transport plan, also known as a coupling between
the measures µ and η, would be a measure γ on R × R
describing how much earth was to be moved from x to
y. This entails that the marginals of γ must be µ and
η. The cost in the Monge problem is ∫γ(dx dy)|x − y|,
which is then minimized by choosing an optimal γ. In
the Wasserstein-2 distance the cost function is chosen to
be quadratic in the distance and an overall root is taken
to bring the units back to a length:
\[ D(\mu; \eta) = \inf_{\gamma} \left( \int \gamma(dx\, dy)\, |x - y|^2 \right)^{1/2}, \tag{8} \]
where the infimum is over all couplings γ. Consider now
the case that η arises from µ by adding independent noise
with distribution ν, which amounts to the convolution
η = µ ∗ ν. This immediately suggests a transport plan,
namely shifting each individual element of the µ distribution by the amount suggested by the noise (formally:
γ(dx dy) = µ(dx)ν(d(y − x))). This may not be optimal,
but gives the estimate D(µ; µ ∗ ν) ≤ D(ν; 0), the size of
the noise, where once again the second argument stands
for the point measure at zero. This says that the largest
distance is attained for a point measure µ, and therefore
\[ \Delta(Q, Q') = \Delta_c(Q, Q') \tag{9} \]
whenever Q′ is the marginal of a covariant measurement.
To summarize this section: if we define the deviation between Q and Q′ by a worst case figure of merit over all
states, the uncertainty relation once again holds. Moreover, the two notions coincide on all covariant measurements, and in particular for the cases of equality.
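The transport estimate used in this section can be tried out on small discrete examples. The following sketch rests on assumptions of ours: measures are uniform empirical measures on finitely many points of the real line, for which pairing the sorted samples is an optimal coupling for the quadratic cost, and the convolution µ ∗ ν is built exactly by listing all sums of atoms. The function names `wasserstein2`, `convolve` and `rms` are hypothetical helpers.

```python
import math


def wasserstein2(xs, ys):
    """Wasserstein-2 distance between uniform empirical measures on xs and ys."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal length")
    # On the real line, pairing sorted samples is an optimal coupling.
    total = sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys)))
    return math.sqrt(total / len(xs))


def convolve(xs, noise):
    """Return mu (atoms replicated) and mu * nu as lists of equal length."""
    mu_rep = [x for x in xs for _ in noise]
    eta = [x + n for x in xs for n in noise]
    return mu_rep, eta


def rms(values, center=0.0):
    """Root mean square deviation from center, i.e. D(nu; center)."""
    return math.sqrt(sum((v - center) ** 2 for v in values) / len(values))


noise = [-1.0, 1.0]
spread_mu, spread_eta = convolve([0.0, 1.0, 2.0, 3.0], noise)
point_mu, point_eta = convolve([7.0], noise)
print(wasserstein2(spread_mu, spread_eta), rms(noise))
print(wasserstein2(point_mu, point_eta), rms(noise))
```

In this example the spread-out µ gives a distance strictly below the noise size D(ν; 0), because the optimal plan can reshuffle mass, while the point measure attains D(ν; 0) exactly, in line with the estimate D(µ; µ ∗ ν) ≤ D(ν; 0).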
Conclusion and Outlook. With the inequality (4) we
have provided a general, quantitative quantum version
of Heisenberg’s original semiclassical uncertainty discussion. This is a remarkable vindication of Heisenberg’s
intuitions, far beyond the usual view, which takes the
quantitative content of the paper to be summarized entirely by the preparation inequality, and sees the discussion of the microscope as no more than a heuristic order
of magnitude argument.
Our conceptual framework applies to any pair of observables which are not jointly measurable. However,
evaluating the respective uncertainty bounds, which will
typically not be expressed in terms of the product of uncertainties, is another matter requiring further studies.
Acknowledgements. Part of this work (P.L.) is supported
by the Academy of Finland, project no 138135. R.F.W.
acknowledges support from the European network SIQS.
P.B. has been supported by COST Action MP1006. We
wish to thank two anonymous referees for their valuable
criticism and recommendations.
∗ paul.busch@york.ac.uk
† pekka.lahti@utu.fi
‡ reinhard.werner@itp.uni-hannover.de
[1] M. Tomamichel and R. Renner, Phys. Rev. Lett. 106, 110506 (2011).
[2] F. Furrer, T. Franz, M. Berta, A. Leverrier, V. B. Scholz, M. Tomamichel, and R. F. Werner, Phys. Rev. Lett. 109, 100502 (2012).
[3] A. Furuta, Scient. Am. March 8 (2012), see also http://phys.org/news/2012-09-scientists-renowneduncertainty-principle.html.
[4] M. Ozawa, Ann. Phys. 311, 350 (2004).
[5] J. Erhart, S. Sponar, G. Sulyok, G. Badurek, M. Ozawa, and Y. Hasegawa, Nature Phys. 8, 185 (2012).
[6] L. Rozema, A. Darabi, D. Mahler, A. Hayat, Y. Soudagar, and A. Steinberg, Phys. Rev. Lett. 109, 100404 (2012).
[7] W. Heisenberg, Zeitschr. Phys. 43, 172 (1927).
[8] P. Busch, T. Heinonen, and P. Lahti, Phys. Rep. 452, 155 (2007).
[9] E. Kennard, Zeitschr. Phys. 44, 326 (1927).
[10] H. Weyl, Gruppentheorie und Quantenmechanik (Hirzel, Leipzig, 1928).
[11] H. Robertson, Phys. Rev. 34, 163 (1929).
[12] P. Busch, P. Lahti, and R. Werner, “Challenging Heisenberg’s Uncertainty Principle: What it takes,” in preparation (2013).
[13] This work is a detailed elaboration of criticisms that were
raised by the present authors in earlier publications, for
instance, in [23] and [14].
[14] R. F. Werner, Quant. Inform. Comput. 4, 546 (2004),
quant-ph/0405184.
[15] P. Busch and D. B. Pearson, J. Math. Phys. 48, 082103
(2007).
[16] J. von Neumann, Mathematische Grundlagen der Quantenmechanik (Springer, Berlin, 1932).
[17] K. Husimi, Proc. Physico-Mathem. Soc. Japan 22, 264
(1940).
[18] E. Davies, Quantum Theory of Open Systems (Academic
Press, 1976).
[19] A. Holevo, Probabilistic and Statistical Aspects of Quantum Theory (North Holland, Amsterdam, 1982).
[20] R. F. Werner, J. Math. Phys. 25, 1404 (1984).
[21] A full proof is given in [24]. This paper also generalizes
the preparation and measurement uncertainty relations
from root-mean-square deviations to power-p deviations.
[22] C. Villani, Optimal transport: old and new (Springer,
2009).
[23] P. Busch, T. Heinonen, and P. Lahti, Physics Letters A
320, 261 (2004).
[24] P. Busch, P. Lahti, and R. Werner, “Measurement Uncertainty Relations,” in preparation (2013).