Fundamentals of Quantum Information Theory: Michael Keyl
Fundamentals of Quantum Information Theory: Michael Keyl
Fundamentals of Quantum Information Theory: Michael Keyl
Michael Keyl1
TU-Braunschweig, Inst. Math. Phys, Mendelssohnstraße 3, D-38106 Braunschweig
February 1, 2008
1
electronic mail: M.Keyl@TU-BS.DE
Abstract
In this paper we give a self contained introduction to the conceptional and mathe-
matical foundations of quantum information theory. In the first part we introduce
the basic notions like entanglement, channels, teleportation etc. and their mathe-
matical description. The second part is focused on a presentation of the quantitative
aspects of the theory. Topics discussed in this context include: entanglement mea-
sures, channel capacities, relations between both, additivity and continuity proper-
ties and asymptotic rates of quantum operations. Finally we give an overview on
some recent developments and open questions.
Contents
1 Introduction 5
1.1 What is quantum information? . . . . . . . . . . . . . . . . . . . . . 5
1.2 Tasks of quantum information . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Experimental realizations . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Basic concepts 11
2.1 Systems, States and Effects . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Operator algebras . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.2 Quantum mechanics . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.3 Classical probability . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.4 Observables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Composite systems and entangled states . . . . . . . . . . . . . . . . 15
2.2.1 Tensor products . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.2 Compound and hybrid systems . . . . . . . . . . . . . . . . . 16
2.2.3 Correlations and entanglement . . . . . . . . . . . . . . . . . 17
2.2.4 Bell inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.1 Completely positive maps . . . . . . . . . . . . . . . . . . . . 19
2.3.2 The Stinespring theorem . . . . . . . . . . . . . . . . . . . . . 20
2.3.3 The duality lemma . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Separability criteria and positive maps . . . . . . . . . . . . . . . . . 21
2.4.1 Positivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.2 The partial transpose . . . . . . . . . . . . . . . . . . . . . . 22
2.4.3 The reduction criterion . . . . . . . . . . . . . . . . . . . . . 23
3 Basic examples 24
3.1 Entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.1 Maximally entangled states . . . . . . . . . . . . . . . . . . . 24
3.1.2 Werner states . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.3 Isotropic states . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1.4 OO-invariant states . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.5 PPT states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.6 Multipartite states . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 Quantum channnels . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 Channels under symmetry . . . . . . . . . . . . . . . . . . . . 32
3.2.3 Classical channels . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.4 Observables and Preparations . . . . . . . . . . . . . . . . . . 34
3.2.5 Instruments and Parameter Dependent Operations . . . . . . 34
3.2.6 LOCC and separable channels . . . . . . . . . . . . . . . . . 36
3.3 Quantum mechanics in phase space . . . . . . . . . . . . . . . . . . . 37
3.3.1 Weyl operators and the CCR . . . . . . . . . . . . . . . . . . 38
3.3.2 Gaussian states . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.3 Entangled Gaussians . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.4 Gaussian channels . . . . . . . . . . . . . . . . . . . . . . . . 41
3 Contents
4 Basic tasks 43
4.1 Teleportation and dense coding . . . . . . . . . . . . . . . . . . . . . 43
4.1.1 Impossible machines revisited: Classical teleportation . . . . 43
4.1.2 Entanglement enhanced teleportation . . . . . . . . . . . . . 43
4.1.3 Dense coding . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Estimating and copying . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2.1 Quantum state estimation . . . . . . . . . . . . . . . . . . . . 46
4.2.2 Approximate cloning . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 Distillation of entanglement . . . . . . . . . . . . . . . . . . . . . . . 48
4.3.1 Distillation of pairs of qubits . . . . . . . . . . . . . . . . . . 49
4.3.2 Distillation of isotropic states . . . . . . . . . . . . . . . . . . 50
4.3.3 Bound entangled states . . . . . . . . . . . . . . . . . . . . . 50
4.4 Quantum error correction . . . . . . . . . . . . . . . . . . . . . . . . 51
4.5 Quantum computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.5.1 The network model of classical computing . . . . . . . . . . . 53
4.5.2 Computational complexity . . . . . . . . . . . . . . . . . . . . 54
4.5.3 Reversible computing . . . . . . . . . . . . . . . . . . . . . . 55
4.5.4 The network model of a quantum computer . . . . . . . . . . 56
4.5.5 Simons problem . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.6 Quantum cryptography . . . . . . . . . . . . . . . . . . . . . . . . . 59
5 Entanglement measures 62
5.1 General properties and definitions . . . . . . . . . . . . . . . . . . . 62
5.1.1 Axiomatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.2 Pure states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.1.3 Entanglement measures for mixed states . . . . . . . . . . . . 65
5.2 Two qubits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2.1 Pure states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2.2 EOF for Bell diagonal states . . . . . . . . . . . . . . . . . . 68
5.2.3 Wootters formula . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.2.4 Relative entropy for Bell diagonal states . . . . . . . . . . . . 70
5.3 Entanglement measures under symmetry . . . . . . . . . . . . . . . . 71
5.3.1 Entanglement of Formation . . . . . . . . . . . . . . . . . . . 71
5.3.2 Werner states . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.3.3 Isotropic states . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.3.4 OO-invariant states . . . . . . . . . . . . . . . . . . . . . . . 74
5.3.5 Relative Entropy of Entanglement . . . . . . . . . . . . . . . 74
6 Channel capacity 78
6.1 The general case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.1.1 The definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.1.2 Simple calculations . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 The classical capacity . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.2.1 Classical channels . . . . . . . . . . . . . . . . . . . . . . . . 81
6.2.2 Quantum channels . . . . . . . . . . . . . . . . . . . . . . . . 82
6.2.3 Entanglement assisted capacity . . . . . . . . . . . . . . . . . 83
6.2.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.3 The quantum capacity . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.3.1 Alternative definitions . . . . . . . . . . . . . . . . . . . . . . 87
6.3.2 Upper bounds and achievable rates . . . . . . . . . . . . . . . 88
6.3.3 Relations to entanglement measures . . . . . . . . . . . . . . 92
Contents 4
7 Multiple inputs 94
7.1 The general scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.1.1 Figures of merit . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.1.2 Covariant operations . . . . . . . . . . . . . . . . . . . . . . . 95
7.1.3 Group representations . . . . . . . . . . . . . . . . . . . . . . 97
7.1.4 Distillation of entanglement . . . . . . . . . . . . . . . . . . . 99
7.2 Optimal devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.2.1 Optimal cloning . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.2.2 Purification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.2.3 Estimating pure states . . . . . . . . . . . . . . . . . . . . . . 103
7.2.4 The UNOT gate . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.3 Asymptotic behaviour . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.3.1 Estimating mixed state . . . . . . . . . . . . . . . . . . . . . 105
7.3.2 Purification and cloning . . . . . . . . . . . . . . . . . . . . . 107
Chapter 1
Introduction
Measurement Preparation
M P
Figure 1.1: Schematic representation of classical teleportation. Here and in the fol-
lowing diagrams a curly arrow stands for quantum systems and a straight one for
the flow of classical information.
This concerns in particular the construction of Bell’s telephone from a joint measurement, which
we have omitted here.
7 1.1. What is quantum information?
P′ M P M′
∼
=
P′ M′
Figure 1.2: A teleportation process should not affect the results of a statistical
experiment with quantum systems. A more precise explanation of the diagram is
given in the text.
chine. This is a device C which takes one quantum system p as input and produces
two systems p1 , p2 of the same type as output. The limiting condition on C is that
p1 and p2 are indistinguishable from the input, where “indistinguishable” has to be
understood in the same way as above: Any statistical experiment performed with
one of the output particles (i.e. always with p1 or always with p2 ) yields the same
result as applied directly to the input p. To get such a device from teleportation
is easy: We just have to perform an M measurement on p, make two copies of the
classical data obtained, and run the preparation P on each of them; cf. Figure 1.3.
Hence if teleportation is possible copying is possible as well.
According to the “no-cloning theorem” of Wootters and Zurek [173], however, a
quantum copy machine does not exist and this basically concludes our proof. How-
ever we will give an easy argument for this theorem in terms of a third impossible
machine – a joint measuring device MAB for two arbitrary observables A and B.
This is a measuring apparatus which produces each time it is invoked a pair (a, b)
of classical outputs, where a is a possible output of A and b a possible output of
B. The crucial requirement for MAB again is of statistical nature: The statistics of
the a outcomes is the same as for device A, and similarly for B. It is known from
elementary quantum mechanics that many quantum observables are not jointly
measurable in this way. The most famous examples are position and momentum or
different components of angular momentum. Nevertheless a device MAB could be
Figure 1.4: Constructing a joint measurement for the observables A and B from a
quantum copying machine.
constructed for arbitrary A and B from a quantum copy machine C. We simply have
to operate with C on the input system p producing two outputs p1 and p2 and to
perform an A measurement on p1 and a B measurement on p2 ; cf. Figure 1.4. Since
the outputs p1 , p2 are, by assumption indistinguishable from the input p the overall
device constructed this way would give a joint measurement for A and B. Hence a
quantum copying machine cannot exist, as stated by the no-cloning theorem. This
in turn implies that classical teleportation is impossible, and therefore we can not
transform quantum information lossless into classical information and back. This
concludes our chain of arguments.
1.2 Tasks of quantum information
So we have seen that quantum information is something new, but what can we do
with it? There are three answers to this question which we want to present here.
First of all let us remark that in fact all information in a modern data processing
environment is carried by micro particles (e.g. electrons or photons). Hence quantum
information comes automatically into play. Currently it is safe to ignore this and
to use classical information theory to describe all relevant processes. If the size of
the structures on a typical circuit decreases below a certain limit, however, this is
no longer true and quantum information will become relevant.
This leads us to the second answer. Although it is far too early to say which
concrete technologies will emerge from quantum information in the future, several
interesting proposals show that devices based on quantum information can solve
certain practical tasks much better than classical ones. The most well known and
exciting one is, without a doubt, quantum computing. The basic idea is, roughly
speaking, that a quantum computer can operate not only on one number per reg-
ister but on superpositions of numbers. This possibility leads to an “exponential
speedup” for some computations which makes problems feasible which are consid-
ered intractable by any classical algorithm. This is most impressively demonstrated
by Shor’s factoring algorithm [139, 140]. A second example which is quite close
to a concrete practical realization (i.e. outside the laboratory; see next Section) is
quantum cryptography. The fact that it is impossible to perform a quantum me-
chanical measurement without disturbing the state of the measured system is used
here for the secure transmission of a cryptographic key (i.e. each eavesdropping
attempt can be detected with certainty). Together with a subsequent application
of a classical encryption method known as the “one-time” pad this leads to a cryp-
tographic scheme with provable security – in contrast to currently used public key
systems whose security relies on possibly doubtful assumptions about (pseudo) ran-
9 1.3. Experimental realizations
dom number generators and prime numbers. We will come back to both subjects –
quantum computing and quantum cryptography in Sections 4.5 and 4.6.
The third answer to the above question is of more fundamental nature. The dis-
cussion of questions from information theory in the context of quantum mechanics
leads to a deeper and in many cases more quantitative understanding of quantum
theory. Maybe the most relevant example for this statement is the study of en-
tanglement, i.e. non-classical correlations between quantum systems, which lead to
violations of Bell inequalities2 . Entanglement is a fundamental aspect of quantum
mechanics and demonstrates the differences between quantum and classical physics
in the most drastical way – this can be seen from Bell-type experiments, like the
one of Aspect et. al. [5], and the discussion about. Nevertheless, for a long time it
was only considered as an exotic feature of the foundations of quantum mechanics
which is not so relevant from a practical point of view. Since quantum information
attained broader interest, however, this has changed completely. It has turned out
that entanglement is an essential resource whenever classical information process-
ing is outperformed by quantum devices. One of the most remarkable examples is
the experimental realization of “entanglement enhanced” teleportation [24, 22]. We
have argued in Section 1.1 that classical teleportation, i.e. transmission of quantum
information through a classical information channel, is impossible. If sender and
receiver share, however, an entangled pair of particles (which can be used as an
additional resource) the impossible task becomes, most surprisingly, possible [11]!
(We will discuss this fact in detail in Section 4.1.) The study of entanglement and
in particular the question how it can be quantified is therefore a central topic within
quantum information theory (cf. Chapter 5). Further examples for fields where
quantum information has led to a deeper and in particular more quantitative in-
sight include “capacities” of quantum information channels and “quantum cloning”.
A detailed discussion of these topics will be given in Chapter 6 and 7. Finally let
us remark that classical information theory benefits in a similar way from the syn-
thesis with quantum mechanics. Beside the just mentioned channel capacities this
concerns for example the theory of computational complexity which analyzes the
scaling behavior of time and space consumed by an algorithm in dependence of the
size of the input data. Quantum information challenges here in particular the fun-
damental Church-Turing hypotheses [45, 152] which claims that each computation
can be simulated “efficiently” on a Turing machine; we come back to this topic in
Section 4.5.
motional state of the ions has to be used. A “program” on an ion trap quantum
computer starts now with a preparation of the register in an initial state – usually
the ground state of the ions. This is done by optical pumping and laser cooling
(which is in fact one of the most difficult parts of the whole procedure, in particular
if many ions are involved). Then the “network” of quantum gates is applied, in
terms of a (complicated) sequence of laser pulses. The readout finally is done by
laser beams which illuminate the ions subsequently. The beams are tuned to a fast
transition which affects only one of the qubit states and the fluorescent light is
detected. Concrete implementations (see e.g. [118, 102]) are currently restricted to
two qubits, however there is some hope that we will be able to control up to 10 or
12 qubits in the not too distant future.
A second quite successful technique is NMR quantum computing (see Section
5.4 of [23] and Section 7.7 of [122] together with the references therein for details).
NMR stands for “nuclear magnetic resonance” and it is the study of transitions
between Zeeman levels of an atomic nucleus in a magnetic field. The qubits are in
this case different spin states of the nuclei in an appropriate molecule and quantum
gates are realized by high frequency oscillating magnetic fields in pulses of controlled
duration. In contrast to ion traps however we do not use one molecule but a whole
cup of liquid containing some 1020 of them. This causes a number of problems,
concerning in particular the preparation of an initial state, fluctuations in the free
time evolution of the molecules and the readout. There are several ways to overcome
these difficulties and we refer the reader again to [23] and [122] for details. Concrete
implementations of NMR quantum computers are capable to use up to five qubits
[113]. Other realizations include the implementation of several known quantum
algorithms on two and three qubits; see e.g. [44, 96, 109].
The fundamental problem of the two methods for quantum computation dis-
cussed so far, is their lack of scalability. It is realistic to assume that NMR and
ion-trap quantum computer with up to tens of qubits will exist somewhen in the
future but not with thousands of qubits which are necessary for “real world” appli-
cations. There are, however, many other alternative proposals available and some
of them might be capable to avoid this problem. The following is a small (not at all
exhaustive) list: atoms in optical lattices [28], semiconductor nanostructures such as
quantum dots (there are many works in this area, some recent are [149, 30, 21, 29])
and arrays of Josephson junctions [112].
A second circle of experiments we want to mention here is grouped around
quantum communication and quantum cryptography (for a more detailed overview
let us refer to [163] and [69]). Realizations of quantum cryptography are fairly far
developed and it is currently possible to span up to 50km with optical fibers (e.g.
[93]). Potentially greater distances can be bridged by “free space cryptography”
where the quantum information is transmitted through the air (e.g [34]). With this
technology satellites can be used as some sort of “relays”, thus enabling quantum
key distribution over arbitrary distances. In the meantime there are quite a lot
of successful implementations. For a detailed discussion we will refer the reader
to the review of Gisin et. al. [69] and the references therein. Other experiments
concern the usage of entanglement in quantum communication. The creation and
detection of entangled photons is here a fundamental building block. Nowadays this
is no problem and the most famous experiment in this context is the one of Aspect
et. al. [5], where the maximal violation of Bell inequalities was demonstrated with
polarization correlated photons. Another spectacular experiment is the creation
of entangled photons over a distance of 10 km using standard telecommunication
optical fibers by the Geneva group [151]. Among the most exciting applications
of entanglement is the realization of entanglement based quantum key distribution
[95], the first successful “teleportation” of a photon [24, 22] and the implementation
of “dense coding” [115]; cf. Section 4.1.
Chapter 2
Basic concepts
After we have got a first, rough impression of the basic ideas and most rel-
evant subjects of quantum information theory, let us start with a more detailed
presentation. First we have to introduce the fundamental notions of the theory and
their mathematical description. Fortunately, much of the material we should have
to present here, like Hilbert spaces, tensor products and density matrices, is known
already from quantum mechanics and we can focus our discussion to those concepts
which are less familiar like POV measures, completely positive maps and entangled
states.
finite dimensional Hilbert spaces, as long as nothing else is explicitly stated. Since
most research in quantum information is done up to now for finite dimensional
systems (the only exception in this work is the discussion of Gaussian systems in
Section 3.3) this is not a too severe loss of generality. Hence we can choose H = Cd
and B(H) is just the algebra of complex d × d matrices. Since A is a subalgebra
of B(H) it operates naturally on H and it inherits from B(H) the operator norm
kAk = supkψk=1 kAψk and the operator ordering A ≥ B ⇔ hψ, Aψi ≥ hψ, Bψi
∀ψ ∈ H. Now we can define:
S(A) = {ρ ∈ A∗ | ρ ≥ 0, ρ(1I) = 1} (2.1)
where A∗ denotes the dual space of A, i.e. the set of all linear functionals on A, and
ρ ≥ 0 means ρ(A) ≥ 0 ∀A ≥ 0. Elements of S(A) describe the states of the system
in question while effects are given by
E(A) = {A ∈ A | A ≥ 0, A ≤ 1I}. (2.2)
The probability to measure the effect A in the state ρ is ρ(A). More generally we can
look at ρ(A) for an arbitrary A as the expectation value of A in the state ρ. Hence
the idea behind Equation (2.1) is to define states in terms of their expectation value
functionals.
Both spaces are convex, i.e. ρ, σ ∈ S(A) and 0 ≤ λ ≤ 1 implies λρ + (1 − λ)σ ∈
S(A) and similarly for E(A). The extremal points of S(A) respectively E(A), i.e. those
elements which do not admit a proper convex decomposition (x = λy + (1 − λ)z ⇒
λ = 1 or λ = 0 or y = z = x), play a distinguished role: the extremal points of S(A)
are pure states and those of E(A) are the propositions of the system in question. The
latter represent those effects which register a property with certainty in contrast
to non-extremal effects which admit some “fuzziness”. As a simple example for the
latter consider a detector which registers particles not with certainty but only with
a probability which is smaller than one.
Finally let us note that the complete discussion of this section can be generalized
easily to infinite dimensional systems, if we replace H = Cd by an infinite dimen-
sional Hilbert space (e.g. H = L2 (R)). This would require however more material
about C* algebras and measure theory than we want to use in this paper.
2.1.2. Quantum mechanics. — For quantum mechanics we have
A = B(H), (2.3)
where we have chosen again H = Cd . The corresponding systems are called d-level
systems or qubits if d = 2 holds.
To avoid
clumsy
notations we frequently write S(H)
and E(H) instead of S B(H) and E B(H) . From Equation (2.2) we immediately
see that an operator A ∈ B(H) is an effect iff it is positive and bounded from
above by 1I. An element P ∈ E(H) is a propositions iff P is a projection operator
(P 2 = P ).
States are described in quantum mechanics usually by density matrices, i.e.
positive and normalized trace class1 operators. To make contact to the general
definition in Equation (2.1) note first that B(H) is a Hilbert space with the Hilbert-
Schmidt scalar product hA, Bi = tr(A∗ B). Hence each linear functional ρ ∈ B(H)∗
can be expressed in terms of a (trace class) operator ρe by2 A 7→ ρ(A) = tr(e
ρA). It is
1 On a finite dimensional Hilbert space this attribute is of course redundant, since each operator
is of trace class in this case. Nevertheless we will frequently use this terminology, due to greater
consistency with the infinite dimensional case.
2 If we consider infinite dimensional systems this is not true. In this case the dual space of
the observable algebra is much larger and Equation (2.1) leads to states which are not necessarily
given by trace class operators. Such “singular states” play an important role in theories which
admit an infinite number of degrees of freedom like quantum statistics and quantum field theory;
cf. [25, 26]. For applications of singular states within quantum information see [97].
13 2.1. Systems, States and Effects
obvious that each ρe defines a unique functional ρ. If we start on the other hand with
ρ we can recover the matrix elements of ρe from ρ by ρekj = tr(e ρ|jihk|) = ρ(|jihk|),
where |jihk| denotes the canonical basis of B(H) (i.e. |jihk|ab = δja δkb ). More
generally we get for ψ, φ ∈ H the relation hφ, ρeψi = ρ(|ψihφ|), where |ψihφ| now
denotes the rank one operator which maps η ∈ H to hφ, ηiψ. In the following we
drop the ∼ and use the same symbol for the operator and the functional whenever
confusion can be avoided. Due to the same abuse of language we will interpret
elements of B(H)∗ frequently as (trace class) operators instead of linear functionals
(and write tr(ρA) instead of ρ(A)). However we do not identify B(H)∗ with B(H)
in general, because the two different notations help to keep track of the distinction
between spaces of states and spaces of observables. In addition we equip B∗ (H)
with the trace-norm kρk1 = tr |ρ| instead of the operator norm.
Positivity of the functional ρ implies positivity of the operator ρ due to
0 ≤ ρ(|ψihψ|) = hψ, ρψi and the same holds for normalization: 1 = ρ(1I) = tr(ρ).
Hence we can identify the state space from Equation (2.1) with the set of density
matrices, as expected for quantum mechanics. Pure states of a quantum system
are the one dimensional projectors. As usual we will frequently identify the density
matrix |ψihψ| with the wave function ψ and call the latter in abuse of language a
state.
To get a useful parameterization of the state space consider again the Hilbert-
Schmidt scalar product hρ, σi = tr(ρ∗ σ), but now on B∗ (H). The space of trace free
matrices in B∗ (H) (alternatively the functionals with ρ(1I) = 0) is the corresponding
orthocomplement 1I⊥ of the unit operator. If we choose a basis σ1 , . . . , σd2 −1 with
hσj , σk i = 2δjk in 1I⊥ we can write each selfajoint (trace class) operator ρ with
tr(ρ) = 1 as
2
d −1
1I 1 X 1I 1 2
ρ= + xj σj =: + ~x · ~σ , with ~x ∈ Rd −1 . (2.4)
d 2 j=1 d 2
A = C(X) = {f : X → C} (2.6)
2. Basic concepts 14
the discussion of quantum state estimation in Section 4.2 and Chapter 7). However, it helps us to
avoid measure theoretical subtleties; cf. Holevo’s book [79] for a more general discussion.
15 2.2. Composite systems and entangled states
eigenvalues and Pλ denotes the projection onto the corresponding eigenspace. Hence
there is a unique PV measure P = (Pλ )λ∈σ(A) associated to A which is called the
spectral measure
P of A. It is uniquely characterized by the property that the expecta-
tion value λ λρ(Pλ ) of P in the state ρ is given for any state ρ by ρ(A) = tr(ρA);
as it is well known from quantum mechanics. Hence the traditional way to define
observables within quantum mechanics perfectly fits into the scheme just outlined,
however it only covers the projection valued case and therefore admits no fuzziness.
For this reason POV measures are sometimes called generalized observables.
Finally note that the eigenprojections Pλ of A are elements of an observable
algebra A iff A ∈ A. This shows two things: First of all we can consider selfadjoint
elements of any *-subalgebra A of B(H) as observables of A-systems, and this is
precisely the reason why we have called A observable algebra. Secondly we see why
it is essential that A is really a subalgebra of B(H): if it is only a linear subspace
of B(H) the relation A ∈ A does not imply Pλ ∈ A.
where the trace on the left hand side is over H and on the right hand side over
H ⊗ K.
If two orthonormal bases φ1 , . . . , φn and ψ1 , . . . , ψm are given in H respectively
K we can consider the product basis P φ1 ⊗ ψ1 , . . . , φn ⊗ ψm in H ⊗ K, and we can
expand each Ψ ∈ H ⊗ K as Ψ = jk Ψjk φj ⊗ ψk with Ψjk = hφj ⊗ ψk , Ψi. This
procedure works for an arbitrary number of tensor factors. However, if we have
2. Basic concepts 16
exactly a twofold tensor product, there is a more economic way to expand Ψ, called
Schmidt decomposition in which only diagonal terms of the form φj ⊗ ψj appear.
Proposition 2.2.1 For each element Ψ of the twofold tensor product H ⊗ K there
are orthonormal systems φj , j = 1, . . . , n and ψk , k = 1, . . . , n (not necessarily
bases, i.e.P
n can
√ be smaller than dim H and dim K) of H and K respectively such
that Ψ = j λj φj ⊗ ψj holds. The φj and ψj are uniquely determined by Ψ. The
√
expansion is called Schmidt decomposition and the numbers λj are the Schmidt
coefficients.
Proof. Consider the partial trace ρ1 = trK (|ΨihΨ|) of the one dimensional projector
|ΨihΨ| associated to Ψ. ItPcan be decomposed in terms of its eigenvectors φn and we
get trK (|ΨihΨ|) = ρ1 = n λn |φn ihφn |. Now we can choose an orthonormal basis
ψk′ , k = 1, . . . , m in K and expand Ψ with respect
P to φj ⊗ ψk′ . Carrying out the k
summation
P we get a family of vectors ψj = k hΨ, φj ⊗ ψk′ iψk′ with the property
′′
Ψ = j φj ⊗ ψj′′ . Now we can calculate the partial trace and get for any A ∈ B(H1 ):
X X
λj hφj , Aφj i = tr(ρ1 A) = hΨ, (A ⊗ 1I)Ψi = hφj , Aφk ihψj′′ , ψk′′ i. (2.8)
j,k
j
Since A is arbitrary we can compare the left and right hand side of this equation
−1/2 ′′
term by term and we get hψj′′ , ψk′′ i = δjk λj . Hence ψj = λj ψj is the desired
orthonormal system. 2
As an immediate application of this result we can show that each mixed state
ρ ∈ B∗ (H) (of the quantum system B(H)) can be regarded as a pure state on a
larger Hilbert space H ⊗ H′ . We just have to consider the eigenvalue expansion ρ =
P
j λj |φj ihφj | of ρ and to choose an arbitrary orthonormal system ψj , j = 1, . . . n
in H′ . Using Proposition 2.2.1 we get
Corollary 2.2.2 Each state ρ ∈ B∗ (H) can be extended to a pure state Ψ on a
larger system with Hilbert space H ⊗ H′ such that trH′ |ΨihΨ| = ρ holds.
as expected. For two classical systems A = C(X) and B = C(Y ) recall that elements
of C(X) (respectively C(Y )) are complex valued functions on X (on Y ). Hence the
tensor product C(X) ⊗ C(Y ) consists of complex valued functions on X × Y , i.e.
C(X) ⊗ C(Y ) = C(X × Y ). In other words states and observables of the composite
system C(X) ⊗ C(Y ) are, in accordance with classical probability theory, given by
probability distributions and random variables on the Cartesian product X × Y .
17 2.2. Composite systems and entangled states
If only one subsystem is classical and the other is quantum; e.g. a micro particle
interacting with a classical measuring device we have a hybrid system. The elements
of its observable algebra C(X) ⊗ B(H) can be regarded as operator valued functions
on X, i.e. X ∋ x 7→ Ax ∈ B(H) and A is an effect iff 0 ≤ Ax ≤ 1I holds for all
x ∈ X. The elements of the dual C∗ (X) ⊗ B∗ (H) are in a similar way B∗ (X) valued
functions X ∋ x 7→ P ρx ∈ B∗ (H) and ρ is a state iff each ρx is a positive trace class
P on H and x ρx = 1I. The probability to measure the effect A in the state
operator
ρ is x ρx (Ax ).
2.2.3. Correlations and entanglement. — Let us now consider two effects
A ∈ A and B ∈ B then A ⊗ B is an effect of the composite system A ⊗ B. It
is interpreted as the joint measurement of A on the first and B on the second
subsystem, where the “yes” outcome means “both effects give yes”. In particular
A ⊗ 1I means to measure A on the first subsystem and to ignore the second one
completely. If ρ is a state of A⊗B we can define its restrictions by ρA (A) = ρ(A⊗1I)
and ρB (A) = ρ(1I ⊗ A). If both systems are quantum the restrictions of ρ are the
partial traces, while in the classical case we have to sum over the B, respectively
A variables. For two states ρ1 ∈ S(A) and ρ2 ∈ S(B) there is always a state ρ of
A ⊗ B such that ρ1 = ρA and ρ2 = ρB holds: We just have to choose the product
state ρ1 ⊗ ρ2 . However in general we have ρ 6= ρA ⊗ ρB which means nothing else
then ρ also contains correlations between the two subsystems systems.
Definition 2.2.3 A state ρ of a bipartite system A ⊗ B is called correlated if there
are some A ∈ A, B ∈ B such that ρ(A ⊗ B) 6= ρA (A)ρB (B) holds.
We immediately see that ρ = ρ1 ⊗ ρ2 implies ρ(A ⊗ B) = ρ1 (A)ρ2 (B) =
ρA (A)ρB (B) hence ρ is not correlated. If on the other hand ρ(A⊗B) = ρA (A)ρB (B)
holds we get ρ = ρA ⊗ ρB . Hence, the definition of correlations just given perfectly
fits into our intuitive considerations.
An important issue in quantum information theory is the comparison of correla-
tions between quantum systems on the one hand and classical systems on the other.
Hence let us have a closer look on the state space of a system consisting of at least
one classical subsystem.
Proposition 2.2.4 Each state ρ of a composite system A ⊗ B consisting of a clas-
sical (A = C(X)) and an arbitrary system (B) has the form
X
ρ= λj ρA B
j ⊗ ρj (2.11)
j∈X
If A and B are two quantum systems it is still possible for them to be correlated
in the way just described. We can simply prepare them with a classical random
generator which triggers two preparation devices to produce systems in the states
ρA B
j , ρj with probability λj . The overall state produced by this setup is obviously
the ρ from Equation (2.11). However, the crucial point is that not all correlations of
quantum systems are of this type! This is an immediate consequence of the definition
of pure states ρ = |ΨihΨ| ∈ S(H): Since there is no proper convex decomposition of
2. Basic concepts 18
(k)
with states ρj of B(Hk ) and weights λj > 0. Otherwise ρ is called entangled. The
set of all separable states is denoted by D(H1 ⊗ H2 ) or just D if H1 and H2 are
understood.
2.2.4. Bell inequalities. — We have just seen that it is quite easy for pure states
to check whether they are entangled or not. In the mixed case however this is a much
bigger, and in general unsolved, problem. In this subsection we will have a short
look at Bell inequalities, which are maybe the oldest criterion for entanglement (for
a more detailed review see [164]). Today more powerful methods, most of them
based on positivity properties, are available. We will postpone the corresponding
discussion to the end of the following section, after we have studied (completely)
positive maps (cf. Section 2.4).
Bell inequalities are traditionally discussed in the framework of “local hidden
variable theories”. More precisely we will say that a state ρ of a bipartite system
B(H ⊗ K) admits a hidden variable model, if there is a probability space (X, µ) and
(measurable) response functions X ∋ x 7→ FA (x, k), FB (x, l) ∈ R for all discrete PV
measures A = A1 , . . . , AN ∈ B(H) respectively B = B1 , . . . , BM ∈ B(K) such that
Z
FA (x, k)FB (x, l)µ(dx) = tr(ρAk ⊗ Bl ) (2.13)
X
holds for all, k, l and A, B. The value of the functions FA (x, k) is interpreted as
the probability to get the value k during an A measurement with known “hidden
parameter” x. The set of states admitting a hidden variable model is a convex set
and as such it can be described by an (infinite) hierarchy of correlation inequalities.
Any one of these inequalities is usually called (generalized) Bell inequality. The
most well known one is those given by Clauser, Horne, Shimony and Holt [47]: The
state ρ satisfies the CHSH-inequality if
ρ A ⊗ (B + B ′ ) + A′ ⊗ (B − B ′ ) ≤ 2 (2.14)
the references therein instead). Interesting for us is the fact that Bell inequalities,
in particular the CHSH case in Equation (2.14), provide a necessary condition for
a state ρ to be separable. However there exist entangled states admitting a hidden
variable model [166]. Hence, Bell inequalities are not sufficient for separability.
2.3 Channels
Assume now that we have a number of quantum systems, e.g. a string of ions in
a trap. To “process” the quantum information they carry we have to perform in
general many steps of a quite different nature. Typical examples are: free time
evolution, controlled time evolution (e.g. the application of a “quantum gate” in a
quantum computer), preparations and measurements. The purpose of this section is
to provide a unified framework for the description of all these different operations.
The basic idea is to represent each processing step by a “channel”, which converts
input systems, described by an observable algebra A into output systems described
by a possibly different algebra B. Henceforth we will call A the input and B the
output algebra. If we consider e.g. the free time evolution, we need quantum systems
of the same type on the input and the output side, hence in this case we have
A = B = B(H) with an appropriately chosen Hilbert space H. If on the other hand
we want to describe a measurement we have to map quantum systems (the measured
system) to classical information (the measuring result). Therefore we need in this
example A = B(H) for the input and B = C(X) for the output algebra, where X is
the set of possible outcomes of the measurement (cf. Subsection 2.1.4).
Our aim is now to get a mathematical object which can be used to describe a
channel. To this end consider an effect A ∈ B of the output system. If we invoke first
a channel which transforms A systems into B systems, and measure A afterwards
on the output systems, we end up with a measurement of an effect T (A) on the
input systems. Hence we get a map T : E(B) → E(A) which completely describes the
channel 4 . Alternatively we can look at the states and interpret a channel as a map
T ∗ : S(A) → S(B) which transforms A systems in the state ρ ∈ S(A) into B systems
in the state T ∗ (ρ). To distinguish between both maps we can say that T describes
the channel in the Heisenberg picture and T ∗ in the Schrödinger picture. On the level
of the statistical interpretation both points of view should coincide of course, i.e. the
probabilities5 (T ∗ ρ)(A) and ρ(T A) to get the result “yes” during an A measurement
on B systems in the state T ∗ ρ, respectively a T A measurement on A systems in
the state ρ, should be the same. Since (T ∗ ρ)(A) is linear in A we see immediately
that T must be an affine map, i.e. T (λ1 A1 + λ2 A2 ) = λ1 T (A1 ) + λ2 T (A2 ) for each
convex linear combination λ1 A1 + λ2 A2 of effects in B, and this in turn implies that
T can be extended naturally to a linear map, which we will identify in the following
with the channel itself, i.e. we say that T is the channel.
2.3.1. Completely positive maps. — Let us change now slightly our point of
view and start with a linear operator T : A → B. To be a channel, T must map
effects to effects, i.e. T has to be positive: T (A) ≥ 0 ∀A ≥ 0 and bounded from
above by 1I, i.e. T (1I) ≤ 1I. In addition it is natural to require that two channels
in parallel are again a channel. More precisely, if two channels T : A1 → B1 and
S : A2 → B2 are given we can consider the map T ⊗ S which associates to each
A ⊗ B ∈ A1 ⊗ A2 the tensor product T (A) ⊗ S(B) ∈ B1 ⊗ B2 . It is natural to
assume that T ⊗ S is a channel which converts composite systems of type A1 ⊗ A2
into B1 ⊗ B2 systems. Hence S ⊗ T should be positive as well [125].
4 Note that the direction of the mapping arrow is reversed compared to the natural ordering of
processing.
5 To keep notations more readable we will follow frequently the usual convention to drop the
parenthesis around arguments of linear operators. Hence we will write T A and T ∗ ρ instead of
T (A) and T ∗ (ρ). Similarly we will simply write T S instead of T ◦ S for compositions.
2. Basic concepts 20
algebras. It needs however some material from representation theory of C*-algebras which we want
to avoid here. See e.g. [125, 83].
21 2.4. Separability criteria and positive maps
ρ = (Id ⊗T ∗ ) σ, (2.17)
where Id denotes the identity map on B∗ (H). The pure state σ can be chosen such
that trH (σ) has no zero eigenvalue. In this case T and σ are uniquely
determined
e, T with ρ = Id ⊗Te∗ σ
(up to unitary equivalence) by Equation (2.17); i.e. if σ e e are
e = (1I ⊗ U )∗ σ(1I ⊗ U ) and Te( · ) = U ∗ T ( · )U with an appropriate
given, we have σ
unitary operator U .
Proof. The state σ is obviously the purification of trH1 (ρ). Hence if λj and
ψj arePeigenvalues
p and eigenvectors of trH1 (ρ) we can set σ = |ΨihΨ| with
Ψ = j λ ψ
j j ⊗ φ j where φj is an (arbitrary) orthonormal basis in K. It is
clear that σ is uniquely determined up to a unitary. Hence we only have to show
that a unique T exists if Ψ is given. To satisfy Equation (2.17) we must have
ρ |ψj ⊗ ηk ihψl ⊗ ηl | = Ψ, (Id ⊗T ) |ψj ⊗ ηk ihψl ⊗ ηl | Ψ (2.18)
= Ψ, |ψj ihψl | ⊗ T |ηk ihηp | Ψ (2.19)
p
= λj λl φj , T |ηk ihηp | φl , (2.20)
2.4.2. The partial transpose. — The most typical example for a positive non-
cp map is the transposition ΘA = AT of d × d matrices, which we have just used in
the proof of Theorem 2.4.1. Θ is obviously a positive map, but the partial transpose
is not. The latter can be easily checked with the maximally entangled state (cf.
Subsection 3.1.1).
1 X
Ψ= √ |ji ⊗ |ji (2.24)
d j
for any separable state ρ ∈ B∗ (H ⊗ K), These equations are another non-trivial
separability criterion, which is called the reduction criterion [85, 42]. It is closely
related to the ppt criterion, due to the following proposition (see [85]) for a proof).
Proposition 2.4.5 Each ppt-state ρ ∈ S(H ⊗ K) satisfies the reduction criterion.
If dim H = 2 and dim K = 2, 3 both criteria are equivalent.
Hence we see with Theorem 2.4.3 that a state ρ in 2 × 2 or 2 × 3 dimensions is
separable iff it satisfies the reduction criterion.
Chapter 3
Basic examples
After the somewhat abstract discussion in the last chapter we will become more
concrete now. In the following we will present a number of examples which help
on the one hand to understand the structures just introduced, and which are of
fundamental importance within quantum information on the other.
3.1 Entanglement
Although our definition of entanglement (Definition 2.2.5) is applicable in arbitrary
dimensions, detailed knowledge about entangled states is available only for low
dimensional systems or for states with very special properties. In this section we
will discuss some of the most basic examples.
3.1.1. Maximally entangled states. — Let us start with a look on pure states
of a composite systems A ⊗ B and their possible correlations. If one subsystem is
classical, i.e. A = C {1, . . . , d} , the state space is given according to Subsection
2.2.2 by S(B)d and ρ ∈ S(B)d is pure iff ρ = (δj1 τ, . . . , δjd τ ) with j = 1, . . . , d
and a pure state τ of the B system. Hence the restrictions of ρ to A respectively B
are the Dirac measure δj ∈ S(X) or τ ∈ S(B), in other words both restrictions are
pure. This is completely different if A and B are quantum, i.e. A ⊗ B = B(H ⊗ K):
Consider ρ = |ΨihΨ| with Ψ ∈ H ⊗ K and Schmidt decomposition (Proposition
P 1/2
2.2.1) Ψ = j λj φj ⊗ ψj . Calculating the A restriction, i.e. the partial trace over
K we get
X 1/2 1/2
tr[trK (ρ)A] = tr[|ΨihΨ|A ⊗ 1I] = λj λk hφj , Aφk iδjk , (3.1)
jk
P
hence trK (ρ) = j λj |φj ihφj | is mixed iff Ψ is entangled. The most extreme case
arises if H = K = Cd and trK (ρ) is maximally mixed, i.e. trK (ρ) = 1dI . We get for Ψ
d
1 X
Ψ= √ φj ⊗ ψj (3.2)
d j=1
Let us come back to the general case now and consider an arbitrary ρ ∈ S(H⊗H).
Using maximally entangled states, we can introduce another separability criterion
in terms of the maximally entangled fraction (cf. [16])
F(ρ) = sup hΨ, ρΨi. (3.4)
Ψ max. ent.
If ρ is separable the reduction criterion (2.25) implies hΨ, [tr1 (ρ) ⊗ 1I − ρ]Ψi ≥ 0 for
any maximally entangled state. Since the partial trace of |ΨihΨ| is d−1 1I we get
d−1 = hΨ, tr1 (ρ) ⊗ 1IΨi ≤ hΨ, ρΨi, (3.5)
hence F(ρ) ≤ 1/d. This condition is not very sharp however. Using the ppt criterion
it can be shown that ρ = λ|Φ1 ihΦ1 | + (1 − λ)|00ih00| (with the Bell state Φ1 ) is
entangled for all 0 < λ ≤ 1 but a straightforward calculation shows that F(ρ) ≤ 1/2
holds for λ ≤ 1/2.
Finally, we have to mention here a very useful parameterization of the set of
pure states on H ⊗ H in terms of maximally entangled states: If Ψ is an arbitrary
but fixed maximally entangled state, each φ ∈ H ⊗ H admits (uniquely determined)
operators X1 , X2 such that
φ = (X1 ⊗ 1I)Ψ = (1I ⊗ X2 )Ψ (3.6)
holds. This can be easily checked in a product basis.
3.1.2. Werner states. — If we consider entanglement of mixed states rather
than pure ones, the analysis becomes quite difficult, even if the dimensions of the
underlying Hilbert spaces are low. The reason is that the state space S(H1 ⊗H2 ) of a
two-partite system with dim Hi = di is a geometric object in a d21 d22 − 1 dimensional
space. Hence even in the simplest non-trivial case (two qubits) the dimension of the
state space becomes very high (15 dimensions) and naive geometric intuition can
be misleading. Therefore it is often useful to look at special classes of model states,
which can be characterized by only few parameters. A quite powerful tool is the
study of symmetry properties; i.e. to investigate the set of states which is invariant
under a group of local unitaries. A general discussion of this scheme can be found
in [159]. In this paper we will present only three of the most prominent examples.
Consider first a state ρ ∈ S(H ⊗ H) (with H = Cd ) which is invariant under the
group of all U ⊗ U with a unitary U on H; i.e. [U ⊗ U, ρ] = 0 for all U . Such a ρ
is usually called a Werner state [166, 128] and its structure can be analyzed quite
easily using a well known result of group theory which goes back to Weyl [171] (see
also Theorem IX.11.5 of [142]), and which we will state in detail for later reference:
Theorem 3.1.1 Each operator A on the N -fold tensor product H⊗N of the (finite
dimensional) Hilbert space H which commutes with all unitaries
P of the form U ⊗N
is a linear combination of permutation operators, i.e. A = π λπ Vπ , where the sum
is taken over all permutations π of N elements, λπ ∈ C and Vπ is defined by
Vπ φ1 ⊗ · · · ⊗ φN = φπ−1 (1) ⊗ · · · ⊗ φπ−1 (N ) . (3.7)
In our case (N = 2) there are only two permutations: the identity 1I and the flip
F (ψ ⊗ φ) = φ ⊗ ψ. Hence ρ = a1I + bF with appropriate coefficients a, b. Since ρ is
a density matrix, a and b are not independent. To get a transparent way to express
these constraints, it is reasonable to consider the eigenprojections P± of F rather
then 1I and F ; i.e. F P± ψ = ±P± ψ and P± = (1I ± F )/2. The P± are the projections
⊗2
on the subspaces H± ⊂ H ⊗ H of symmetric respectively antisymmetric tensor
products (Bose- respectively Fermi-subspace). If we write d± = d(d ± 1)/2 for the
⊗2
dimensions of H± we get for each Werner state ρ
λ (1 − λ)
ρ= P+ + P− , λ ∈ [0, 1]. (3.8)
d+ d−
3. Basic examples 26
On the other hand it is obvious that each state of this form is U ⊗ U invariant,
hence a Werner state.
If ρ is given, it is very easy to calculate the parameter λ from the expectation
value of ρ and the flip tr(ρF ) = 2λ − 1 ∈ [−1, 1]. Therefore we can write for an
arbitrary state σ ∈ S(H ⊗ H)
tr(σF ) + 1 (1 − tr σF )
PUU (σ) = P+ + P− , (3.9)
2d+ 2d−
and this defines a projection from the full state space to the set of Werner states
which is called the twirl operation. In many cases it is quite useful that it can be
written alternatively as a group average of the form
Z
PUU (σ) = (U ⊗ U )σ(U ∗ ⊗ U ∗ )dU, (3.10)
U(d)
where dU denotes the normalized, left invariant Haar measure on U(d). To check
this identity note first that its right hand side is indeed U ⊗ U invariant, due to the
invariance of the volume element dU . Hence we have to check only that the trace
of F times the integral coincides with tr(F σ):
" Z # Z
tr F (U ⊗ U )σ(U ∗ ⊗ U ∗ )dU = tr [F (U ⊗ U )σ(U ∗ ⊗ U ∗ )] dU (3.11)
U(d) U(d)
Z
= tr(F σ) dU = tr(F σ), (3.12)
U(d)
where we have used the fact that F commutes with U ⊗ U and the normalization of
dU . We can apply PUU obviously to arbitrary operators A ∈ B(H ⊗ H) and, as an
integral over unitarily implemented operations, we get a channel. Substituting U →
U ∗ in (3.10) and cycling the trace tr(APUU (σ)) we find tr(PUU (A)ρ) = tr(APUU (ρ)),
hence PUU has the same form in the Heisenberg and the Schrödinger picture (i.e.
∗
PUU = PUU ).
If σ ∈ S(H ⊗ H) is a separable state the integrand of PUU (σ) in Equation (3.10)
consists entirely of separable states, hence PUU (σ) is separable. Since each Werner
state ρ is the twirl of itself, we see that ρ is separable iff it is the twirl PUU (σ) of
a separable state σ ∈ S(H ⊗ H). To determine the set of separable Werner states
we therefore have to calculate only the set of all tr(F σ) ∈ [−1, 1] with separable
σ. Since each such σ admits a convex decomposition into pure product states it is
sufficient to look at
hψ ⊗ φ, F ψ ⊗ φi = |hψ, φi|2 (3.13)
which ranges from 0 to 1. Hence ρ from Equation (3.8) is separable iff 1/2 ≤ λ ≤ 1
and entangled otherwise (due to λ = (tr(F ρ) + 1)/2). If H = C2 holds, each Werner
state is Bell diagonal and we recover the result from Subsection 3.1.1 (separable if
highest eigenvalue less or equal than 1/2).
3.1.3. Isotropic states. — To derive a second class of states consider the partial
transpose (Id ⊗Θ)ρ (with respect to a distinguished base |ji ∈ H, j = 1, . . . , d) of
a Werner state ρ. Since ρ is, by definition, U ⊗ U invariant, it is easy to see that
(Id ⊗Θ)ρ is U ⊗ Ū invariant, where Ū denotes component wise complex conjugation
in the base |ji (we just have to use that U ∗ = Ū T holds). Each state τ with this kind
of symmetry is called an isotropic state [132], and our previous discussion shows
that τ is a linear combination of 1I and the partial transpose of the flip, which is
the rank one operator
d
X
Fe = (Id ⊗Θ)F = |ΨihΨ| = |jjihkk|, (3.14)
jk=1
27 3.1. Entanglement
P
where Ψ = j |jji is, up to normalization a maximally entangled state. Hence each
isotropic τ can be written as
1 1I e d2
τ= λ + (1 − λ)F , λ ∈ 0, 2 , (3.15)
d d d −1
where the bounds on λ follow from normalization and positivity. As above we can
determine the parameter λ from the expectation value
1 − d2
tr(Feτ ) = λ+d (3.16)
d
which ranges from 0 to d and this again leads to a twirl operation: For an arbitrary
state σ ∈ S(H ⊗ H) we can define
1
PUŪ (σ) = tr(Feσ) − d 1I + 1 − d tr(Fe σ) Fe , (3.17)
d(1 − d2 )
and as for Werner states PUŪ can be rewritten in terms of a group average
Z
PUŪ (σ) = (U ⊗ Ū)σ(U ∗ ⊗ Ū ∗ )dU. (3.18)
U(d)
Now we can proceed in the same way as above: PUŪ is a channel with PU∗ Ū = PUŪ ,
its fixed points PUŪ (τ ) = τ are exactly the isotropic states, and the image of the set
of separable states under PUŪ coincides with the set of separable isotropic states.
To determine the latter we have to consider the expectation values (cf. Equation
(3.13))
d
X
hψ ⊗ φ, Fe ψ ⊗ φi = ψj φj = |hψ, φ̄i|2 ∈ [0, 1]. (3.19)
j=1
d(d − 1) d2
2
≤λ≤ 2 (3.20)
d −1 d −1
holds and entangled otherwise. For λ = 0 we recover the maximally entangled state.
For d = 2, again we recover again the special case of Bell diagonal states encountered
already in the last subsection.
3.1.4. OO-invariant states. — Let us combine now Werner states with isotropic
states, i.e. we look for density matrices ρ which can be written as ρ = a1I + bF + cFe ,
or, if we introduce the three mutually orthogonal projection operators
1e 1 1 1
p0 = F, p1 = (1I − F ), (1I + F ) − Fe (3.21)
d 2 2 d
as a convex linear combination of tr(pj )−1 pj , j = 0, 1, 2:
p1 p2
ρ = (1 − λ1 − λ2 )p0 + λ1 + λ2 , λ1 , λ2 ≥ 0, λ1 + λ2 ≤ 1 (3.22)
tr(p1 ) tr(p2 )
Each such operator is invariant under all transformations of the form U ⊗ U if U
is a unitary with U = Ū , in other words: U should be a real orthogonal matrix.
A little bit representation theory of the orthogonal group shows that in fact all
operators with this invariance property have the form given in (3.22); cf. [159]. The
corresponding states are therefore called OO-invariant, and we can apply basically
3. Basic examples 28
f
tr(F ρ)
3
-1
-1 0 1 2 3
tr(F ρ)
Figure 3.1: State space of OO-invariant states (upper triangle) and its partial trans-
pose (lower triangle) for d = 3. The special cases of isotropic and Werner states are
drawn as thin lines.
the same machinery as in Subsection 3.1.2 if we replace the unitary group U(d)
by the orthogonal group O(d). This includes in particular the definition of a twirl
operation as an average over O(d) (for an arbitrary ρ ∈ S(H ⊗ H)):
Z
POO (ρ) = U ⊗ U ρU ⊗ U ∗ dU (3.23)
O(d)
which we can express alternatively in terms of the expectation values tr(F ρ), tr(Fe ρ)
by
!
tr(Fe ρ) 1 − tr(F ρ) 1 + tr(F ρ) tr(Feρ) p2
POO (ρ) = p0 + p1 + − . (3.24)
d 2 tr(p1 ) 2 d tr(p2 )
2 tr(Fe ρ)
−1 ≤ tr(F ρ) ≤ 1, 0 ≤ tr(Feρ) ≤ d, tr(F ρ) ≥ − 1. (3.25)
d
For d = 3 this is the upper triangle in Figure 3.1.
The values in the lower (dotted) triangle belong to partial transpositions of
OO-invariant states. The intersection of both, i.e. the gray shaded square Q =
[0, 1]×[0, 1], represents therefore the set of OO-invariant ppt states, and at the same
time the set of separable states, since each OO-invariant ppt state is separable. To
see the latter note that separable OO-invariant states form a convex subset of Q.
Hence, we only have to show that the corners of Q are separable. To do this note
29 3.1. Entanglement
that 1. POO (ρ) is separable whenever ρ is and 2. that tr F POO (ρ) = tr(F ρ) and
tr Fe POO (ρ) = tr(F ρ) holds (cf. Equation (3.12)). We can consider pure product
states |φ ⊗ ψihφ ⊗ ψ| for ρ and get |hφ, ψi|2 , hφ, ψ̄i|2 for the tuple tr(F ρ), tr(Fe ρ) .
Now the point 1, 1) in Q is obtained if ψ = φ is real, the point (0, 0) is obtained
for real and orthogonal φ, ψ and the point (1, 0) belongs to the case ψ = φ and
hφ, φ̄i = 0. Symmetrically we get (0, 1) with the same φ and ψ = φ̄.
3.1.5. PPT states. — We have seen in Theorem 2.4.3 that separable states and
ppt states coincide in 2 × 2 and 2 × 3 dimensions. Another class of examples with
this property are OO-invariant states just studied. Nevertheless, separability and a
positive partial transpose are not equivalent. An easy way to produce such examples
of states which are entangled and ppt is given in terms of unextendible product bases
[14]. An orthonormal family φj ∈ H1 ⊗H2 , j = 1, . . . , N < d1 d2 (with dk = dim Hk )
is called an unextendible product basis1 (UPB) iff 1. all φj are product vectors and
2. there is no product vector orthogonal to all φj . Let us denote the projector to
the span of all φj by E, its orthocomplement by E ⊥ , i.e. E ⊥ = 1I − E, and define
the state ρ = (d1 d2 − N )−1 E ⊥ . It is entangled because there is by construction no
product vector in the support of ρ, and it is ppt. The latter can be seen as follows:
The projector E is a sum of the one dimensional projectors |φj ihφj |, j = 1, . . . , N .
Since all φj are product vectors the partial transposes of the |φj ihφj | are of the form
|φej ihφej |, with another UPB φej , j = 1, . . . , N and the partial transpose (1I ⊗ Θ)E of
E is the sum of the |φej ihφej |. Hence (1I ⊗ Θ)E ⊥ = 1I − (1I ⊗ Θ)E is a projector and
therefore positive.
To construct entangled ppt states we have to find UPBs. The following two
examples are taken from [14]. Consider first the five vectors
Ψj = φj ⊗ φ2jmod5 , j = 0, . . . , 4 (3.27)
form a UPB in the Hilbert space H ⊗ H, dim H = 3 (cf. [14]). A second example,
again in 3×3 dimensional Hilbert space are the following five vectors (called “Tiles”
in [14]):
1 1 1
√ |0i ⊗ |0i − |1i , √ |2i ⊗ |1i − |2i , √ |0i − |1i ⊗ |2i,
2 2 2
1 1
√ |1i − |2i ⊗ |0i, |0i + |1i + |2i ⊗ |0i + |1i + |2i , (3.28)
2 3
tensor product H(1) ⊗ · · · ⊗ H(N ) (with N > 2) which can not be written as2
d
X (1) (N )
Ψ= λj φj ⊗ · · · ⊗ φj (3.29)
j=1
(k) (k)
with N orthonormal bases φ1 , . . . , φd of H(k) , k = 1, . . . , N . To get examples for
such states in the tri-partite case, note first that any partial trace of |ΨihΨ| with Ψ
from Equation (3.29) has separable eigenvectors. Hence, each purification (Corollary
2.2.2) of an entangled, two-partite, mixed state with inseparable eigenvectors (e.g.
a Bell diagonal state) does not admit a Schmidt decomposition. This implies on
the one hand that there are interesting new properties to be discovered, but on
the other we see that many techniques developed for bipartite pure states can be
generalized in a straightforward way only for states which are Schmidt decomposable
in the sense of Equation (3.29). The most well known representative of this class
for a tripartite qubit system is the GHZ state [73]
1
Ψ = √ |000i + |111i , (3.30)
2
which has the special property that contradictions between local hidden variable
theories and quantum mechanics occur even for non-statistical predictions (as op-
posed to maximally entangled states of bipartite systems; [73, 117, 116]).
A second new aspect arising in the discussion of multiparty entanglement is the
fact that several different notions of separability occur. A state ρ of an N -partite
system B(H1 ) ⊗ · · · ⊗ B(HN ) is called N -separable if
X
ρ= λJ ρj1 ⊗ · · · ⊗ ρjN , (3.31)
J
with states ρjk ∈ B∗ (Hk ) and multi indices J = (j1 , . . . , jk ). Alternatively, how-
ever, we can decompose B(H1 ) ⊗ · · · ⊗ B(HN ) in two subsystems (or even into M
subsystems if M < N ) and call ρ biseparable if it is separable with respect to this
decomposition. It is obvious that N -separability implies biseparability with respect
to all possible decompositions. The converse is – not very surprisingly – not true.
One way to construct a corresponding counterexample is to use an unextendable
product base (cf. Subsection 3.1.5). In [14] it is shown that the tripartite qubit state
complementary to the UPB
1
|0, 1, +i, |1, +, 0i, |+, 0, 1i, |−, −, −i with |±i = √ (|0i ± |1i) (3.32)
2
is entangled (i.e. tri-inseparable) but biseparable with respect to any decomposition
into two subsystems (cf. [14] for details).
Another, maybe more systematic, way to find examples for multipartite states
with interesting properties is the generalization of the methods used for Werner
states (Subsection 3.1.2), i.e. to look for density matrices ρ ∈ B∗ (H⊗N ) which
commute with all unitaries of the form U ⊗N . Applying again theorem 3.1.1 we
see that each such ρ is a linear combination of permutation unitaries. Hence the
structure of the set of all U ⊗N invariant states can be derived from representation
theory of the symmetric group (which can be tedious for large N !). For N = 3
this program is carried out in [61] and it turns out that the corresponding set of
invariant states is a five dimensional (real) manifold. We skip the details here and
refer to [61] instead.
2 There (k) (k)
is however the possibility to choose the bases φ1 , . . . , φd such that the number of
summands becomes minimal. For tri-partite systems this “minimal canonical form” is study in [1].
31 3.2. Channels
3.2 Channels
In Section 2.3 we have introduced channels as very general objects transforming
arbitrary types of information (i.e. classical, quantum and mixtures of them) into
one another. In the following we will consider some of the most important special
cases.
3.2.1. Quantum channnels. — Many tasks of quantum information theory re-
quire the transmission of quantum information over long distances, using devices
like optical fibers or storing quantum information in some sort of memory. Both
situations can be described by a channel or quantum operation T : B(H) → B(H),
where T ∗ (ρ) is the quantum information which will be received when ρ was sent,
or alternatively: which will be read off the quantum memory when ρ was written.
Ideally we would prefer those channels which do not affect the information at all,
i.e. T = 1I, or, as the next best choice, a T whose action can be undone by a physi-
cal device, i.e. T should be invertible and T −1 is again a channel. The Stinespring
Theorem (Theorem 2.3.2) immediately shows that this implies T ∗ ρ = U ρU ∗ with
a unitary U ; in other words the systems carrying the information do not interact
with the environment. We will call such a kind of channel an ideal channel. In
real situations however interaction with the environment, i.e. additional, unobserv-
able degrees of freedom, can not be avoided. The general structure of such a noisy
channel is given by
T ∗ (ρ) = trK U (ρ ⊗ ρ0 )U ∗ (3.33)
Note that there are in general many ways to express a channel this way, e.g. if
T is an ideal channel ρ 7→ T ∗ ρ = U ρU ∗ we can rewrite it with an arbitrary unitary
U0 : K → K by T ∗ ρ = tr2 (U ⊗ U0 ρ ⊗ ρ0 U ∗ ⊗ U0∗ ). This is the weakness of the ancilla
form compared to the Stinespring representation of Theorem 2.3.2. Nevertheless
3. Basic examples 32
ρ T ∗(ρ)
Unitary
A
Corollary 3.2.1 shows that each channel which is not an ideal channel is noisy in
the described way.
The most prominent example for a noisy channel is the depolarizing channel for
d-level systems (i.e. H = Cd )
1I
S(H) ∋ ρ 7→ ϑρ + (1 − ϑ) ∈ S(H), 0≤ϑ≤1 (3.37)
d
or in the Heisenberg picture
tr(A)
B(H) ∋ A 7→ ϑA + (1 − ϑ) 1I ∈ B(H). (3.38)
d
A Stinespring dilation of T (not the minimal one – this can be checked by counting
dimensions) is given by K = H ⊗ H ⊕ C and V : H → H ⊗ K = H⊗3 ⊕ H with
"r #
1−ϑ X
d h√ i
|ji 7→ V |ji = |ki ⊗ |ki ⊗ |ji ⊕ ϑ|ji , (3.39)
d
k=1
U (φ1 ⊗ φ2 ⊗ φ3 ⊕ χ) = φ2 ⊗ φ3 ⊗ φ1 ⊕ χ, (3.41)
useful for the study of the cloning problem in Chapter 7). Before we do this let
us have a short look on a particular class of examples which is closely related to
OO-invariant states.
Hence consider a channel T : B(H) → B(H) which is covariant with respect
to the orthogonal group, i.e. T (U AU ∗ ) = U T (A)U ∗ for all unitaries U on H with
Ū = UPin a distinguished basis |ji, j = 1, . . . , d. The maximally entangled state ψ =
d−1/2 j |jji is OO-invariant, i.e. U ⊗ U ψ = ψ for all these U . Therefore each state
ρ = (Id ⊗T ∗ )|ψihψ| is OO-invariant as well and by the duality lemma (Theorem
2.3.4) T and ψ are uniquely determined (up to unitary equivalence) by ρ. This
means we can use the structure of OO-invariant states derived in Subsection 3.1.4
to characterize all orthogonal covariant channels. As a first step consider the linear
maps X1 (A) = d tr(A)1I, X2 (A) = dAT and X3 (A) = dA. They are not channels
(they are not unital and X2 is not cp) but they have the correct covariance property
and it is easy to see that they correspond to the operators 1I, F, Fe ∈ B(H ⊗ H), i.e.
(Id ⊗X1 )|ψihψ| = 1I, (Id ⊗X2 )|ψihψ| = F, (Id ⊗X3 )|ψihψ| = Fe . (3.43)
Using Equation (3.21) we can determine therefore the channels which belong to the
three extremal OO-invariant states (the corners of the upper triangle in Figure 3.1):
tr(A)1I − AT
T0 (A) = A, T1 (A) = (3.44)
d−1
2 d T
T2 (A) = tr(A)1I + A − A (3.45)
d(d + 1) − 2 2
Each OO-invariant channel is a convex linear combination of these three. Special
cases are the channels corresponding to Werner and isotropic states. The latter leads
to depolarizing channels T (A) = ϑA + (1 − ϑ)d−1 tr(A)1I with ϑ ∈ [0, d2 /(d2 − 1)];
cf. Equation (3.15), while Werner states correspond to
ϑ 1 − ϑ
T (A) = tr(A)1I + AT + tr(A)1I − AT , ϑ ∈ [0, 1]; (3.46)
d+1 d−1
cf. Equation (3.8).
Let us come back now to the general case. We will state here the covariant
version of the Stinespring theorem (see [98] for a proof). The basic idea is that all
covariant channels are parameterized by representations on the dilation space.
Theorem 3.2.2 Let G be a group with finite dimensional unitary representations
πj : G → U(Hj ) and T : B(H1 ) → B(H2 ) a π1 , π2 - covariant channel. Then
there is a finite dimensional unitary representation πe : G → U(K) and an operator
V : H2 → H1 ⊗ K with V π2 (U ) = π1 (U ) ⊗ π e (U ) and T (A) = V ∗ A ⊗ 1IV .
To get an explicit example consider the dilation of a depolarizing channel given
in Equation (3.39). In this case we have π1 (U ) = π2 (U ) = U and π
e(U ) = (U ⊗ Ū)⊕1I.
The check that the map V has indeed the intertwining property V π2 (U ) = π1 (U ) ⊗
π
e(U ) stated in the theorem is left as an exercise to the reader.
3.2.3. Classical channels. — The classical analog to a quantum operation is
a channel T : C(X) → C(Y ) which describes the transmission or manipulation of
classical information. As we have mentioned already in Subsection 2.3.1 positivity
and complete positivity are equivalent in this case. Hence we have to assume only
that T is positive and unital. Obviously T is characterized by its matrix elements
Txy = δy (T |xihx|), where δy ∈ C∗ (X) denotes the Dirac measure at y ∈ Y and
|xihx| ∈ C(X) is the canonical basis in C(X) (cf. Subsection 2.1.3). Positivity and
normalization of T imply that 0 ≤ Txy ≤ 1 and
h X i X
1 = δy (1I) = δy T (1I) = δy T |xihx| = Txy (3.47)
x x
3. Basic examples 34
holds. Hence the family (Txy )x∈X is a probability distribution on X and Txy is
therefore the probability to get the information x ∈ X at the output side of the
channel if y ∈ Y was send. Each classical channel is uniquely determined by its
matrix of transition probabilities. For X = Y we see that the information is trans-
mitted without error iff Txy = δxy , i.e. T is an ideal channel if T = Id holds and
noisy otherwise.
3.2.4. Observables and Preparations. — Let us consider now a channel which
transforms quantum information B(H) into classical information C(X). Since posi-
tivity and complete positivity are again equivalent, we just have to look at a positive
and unital map E : C(X) → B(H). With the canonical basis |xihx|, x ∈ X of C(X)
we
P get a family Ex = E(|xihx|), x ∈ X of positive operators Ex ∈ B(H) with
x∈X Ex = 1I. Hence the Ex form a POV measure, i.e. an observable. If on the
other hand a POV measure Ex ∈ B(H), x ∈ X is given P we can define a quantum
to classical channel E : C(X) → B(H) by E(f ) = x f (x)Ex . This shows that
the observable Ex , x ∈ X and the channel E can be identified and we say E is the
observable.
With this interpretation in mind it is possible to have a short look at continuous
observables without the need of abstract measure theory: We only have to say how
the classical algebra C(X) is defined for a set X which is not finite or discrete. For
simplicity we assume that X = R holds, however the generalization to other locally
compact spaces is straightforward. We choose for C(R) the space of continuous,
complex valued functions vanishing at infinity, i.e. |f (x)| < ǫ for each ǫ > 0 provided
|x| is large enough. C(R) can be equipped with the sup-norm and becomes an
Abelian C*-algebra (cf. [25]). To interpret it as an operator algebra as assumed in
Subsection 2.1.1 we have to identify f ∈ C(R) with the corresponding multiplication
operator on L2 (R). An observable taking arbitrary real values can be defined now
as a positive map E : C(R) → B(H). The probability to get a result in the interval
[a, b] ⊂ R during an E measurement on systems in the state ρ is3
µ([a, b]) = sup {tr(E(f )ρ) | f ∈ C(R), 0 ≤ f ≤ 1I, supp f ⊂ [a, b]} (3.48)
where supp denotes the support of f . The most well known example for R valued
observables are of course position Q and momentum P of a free particle in one
dimension. In this case we have H = L2 (R) and the channels corresponding to Q
and P are (in position representation) given by C(R) ∋ f 7→ EQ (f ) ∈ B(H) with
EQ (f )ψ = f ψ respectively C(R) ∋ f 7→ EP (f ) ∈ B(H) with EP (f )ψ = (f ψ) b∨
where ∧ and ∨ denote the Fourier transform and its inverse.
Let us return now to a finite set X and exchange the role of C(X) and B(H); in
other words let us consider a channel R : B(H) → C(X) with a classical input and
a quantum output algebra. In the Schrödinger picture we get a family of density
matrices ρx := R∗ (δx ) ∈ B∗ (H), x ∈ X, where δx ∈ C∗ (X) denote again the Dirac
measures (cf. Subsection 2.1.3). Hence we get a parameter dependent preparation
which can be used to encode the classical information x ∈ X into the quantum
information ρx ∈ B∗ (H).
3.2.5. Instruments and Parameter Dependent Operations. — An observ-
able describes only the statistics of measuring results, but contains no information
about the state of the system after the measurement. To get a description which fills
this gap we have to consider channels which operates on quantum systems and pro-
duces hybrid systems as output, i.e. T : B(H) ⊗ M(X) → B(K). Following Davies
[50] we will call such an object an instrument. From T we can derive the subchannel
C(X) ∋ f 7→ T (1I ⊗ f ) ∈ B(K) (3.49)
3 Due to the Riesz-Markov theorem (cf. Theorem IV.18 of [134]) the set function µ extends in
which is the observable measured by T , i.e. tr T 1I ⊗ |xihx| ρ is the probability to
measure x ∈ X on systems in the state ρ. On the other hand we get for each x ∈ X
a quantum channel (which is not unital)
hence we can identify T with the family Tx , x ∈ X. Finally we can consider the
second marginal of T
X
B(H) ∋ A 7→ T (A ⊗ 1I) = Tx (A) ∈ B(K). (3.53)
x∈X
Hence we get the final state tr(Ex ρ)−1 Ex ρEx if we measure the value x ∈ X on
systems initially in the state ρ – this is well known from quantum mechanics.
Let us change now the role of B(H) ⊗ C(X) and B(K); in other words consider
a channel T : B(K) → B(H) ⊗ C(X) with hybrid input and quantum output. It
describes a device which changes the state of a system depending on additional
classical information. As for an instrument, T decomposes into a family
P of (uni-
tal!) channels Tx : B(K) → B(H) such that we get T ∗ (ρ ⊗ p) = x px Tx∗ (ρ) in
the Schrödinger picture. Physically T describes a parameter dependent operation:
depending on the classical information x ∈ X the quantum information ρ ∈ B(K)
is transformed by the operation Tx (cf. figure 3.4)
Finally we can consider a channel T : B(H) ⊗ C(X) → B(K) ⊗ C(Y ) with hybrid
input and output to get a parameter dependent instrument (cf. figure 3.5): Similarly
ρ ∈ B∗ (K)
T
x∈X
ρ ∈ B∗ (H)
∗
ρ ∈ B∗ (H) Ty,x (ρ) ∈ B∗ (K)
T
y∈Y x∈X
3.2.6. LOCC and separable channels. — Let us consider now channels acting
on finite dimensional bipartite systems: T : B(H1 ⊗ K2 ) → B(K1 ⊗ K2 ). In this case
we can ask the question whether a channel preserves separability. Simple examples
are local operations (LO), i.e. T = T A ⊗ T B with two channels T A,B : B(Hj ) →
B(Kj ). Physically we think of such a T in terms of two physicists Alice and Bob both
performing operations on their own particle but without information transmission
neither classical nor quantum. The next difficult step are local operations with
one way classical communications (one way LOCC). This means Alice operates
on her system with an instrument, communicates the classical measuring result
j ∈ X = {1, . . . , N } to Bob and he selects an operation depending on these data.
We can write such a channel as a composition T = (T A ⊗ Id)(Id ⊗T B ) of the
instrument T A : B(H1 ) ⊗ C(X1 ) → B(K1 ) and the parameter dependent operation
T B : B(H2 ) → C(X1 ) ⊗ B(K2 ) (cf. Figure 3.6)
Id ⊗T B T A ⊗Id (3.55)
B(H1 ⊗ H2 ) −−−−−→ B(H1 ) ⊗ C(X) ⊗ B(K2 ) −−−−→ B(K1 ⊗ K2 ).
It is of course possible to continue the chain in Equation (3.55), i.e. instead of
just operating on his system, Bob can invoke a parameter dependent instrument de-
pending on Alice’s data j1 ∈ X1 , send the corresponding measuring results j2 ∈ X2
to Alice and so on. To write down the corresponding chain of maps (as in Equation
(3.55)) is simple but not very illuminating and therefore omitted; cf. Figure 3.7 in-
stead. If we allow Alice and Bob to drop some of their particles, i.e. the operations
they perform need not to be unital, we get a LOCC channel (“local operations and
classical communications”). It represents the most general physical process which
can be performed on a two partite system if only classical communication (in both
directions) is available.
37 3.3. Quantum mechanics in phase space
Alice Bob
T ∗ ρ ∈ B∗ (K1 ⊗ K2 )
ρ ∈ B∗ (H1 ⊗ H2 )
TA
TB
Figure 3.6: One way LOCC operation; cf Figure 3.7 for an explanation.
Figure 3.7: LOCC operation. The upper and lower curly arrows represent Alice’s
respectively Bob’s quantum system, while the straight arrows in the middle stand
for the classical information Alice and Bob exchange. The boxes symbolize the
channels applied by Alice and Bob.
LOCC channels play a significant role in entanglement theory (we will see this
in Section 4.3), but they are difficult to handle. Fortunately it is often possible
to replace them by closely related operations with a more simple structure: A not
necessarily unital channel T : B(H1 ⊗ K2 ) → B(K1 ⊗ K2 ) is called separable, if it
is a sum of (in general non-unital) local operations, i.e.
N
X
T = TjA ⊗ TjB . (3.56)
j=1
It is easy to see that a separable T maps separable states to separable states (up
to normalization) and that each LOCC channel is separable (cf. [13]). The converse
however is (somewhat surprisingly) not true: there are separable channels which are
not LOCC, see [13] for a concrete example.
3.3 Quantum mechanics in phase space
Up to now we have considered only finite dimensional systems and even in this
extremely idealized situation it is not easy to get nontrivial results. At a first look
3. Basic examples 38
which plays a crucial role for the geometry of classical mechanics. We will call
the pair (V, σ) consisting of σ and the 2d-dimensional real vector space V = R2d
henceforth the classical phase space.
The relations in Equation (3.57) are, however, not sufficient to fix the opera-
tors Rj up to unitary equivalence. The best way to remove the remaining physical
ambiguities is the study of the unitaries
2d
X
W (x) = exp(ix · σ · R), x ∈ V, x · σ · R = xj σjk Rk (3.59)
jk=1
it is called an (irreducible) representation of the Weyl relations (on (V, σ)) and the
operators W (x) are called Weyl operators. By the well known Stone - von Neumann
uniqueness theorem all these representations are mutually unitarily equivalent, i.e. if
we have two of them W1 (x), W2 (x), there is a unitary operator U with U W1 (x)U ∗ =
W2 (x) ∀x ∈ V . This implies that it does not matter from a physical point of view
which representation we use. The most well known one is of course the Schrödinger
representation where H = L2 (Rd ) and Qj , Pk are the usual position and momentum
operators.
4 Note that the CCR (3.57) are implied by the Weyl relations (3.60) but the converse is, in
contrast to popular believe, not true: There are representations of the CCR which are unitarily
inequivalent to the Schrödinger representation; cf. [134] Section VIII.5 for particular examples.
Hence uniqueness can only be achieved on the level of Weyl operators – which is one major reason
to study them.
39 3.3. Quantum mechanics in phase space
3.3.2. Gaussian states. — A density operator ρ ∈ S(H) has finite second mo-
ments if the expectation values tr(ρQ2j ) and tr(ρPj2 ) are finite for all j = 1, . . . , d.
In this case we can define the mean m ∈ R2d and the correlation matrix α by
mj = tr(ρR), αjk + iσjk = 2 tr (Rj − mj )ρ(Rk − mk )]. (3.61)
The mean m can be arbitrary, but the correlation matrix α must be real and sym-
metric and the positivity condition
α + iσ ≥ 0 (3.62)
where |nihn| denotes the number basis and N is the mean photon number. The
characteristic function of ρN is
1 1 2
tr W (x)ρN = exp − N+ |x| , (3.65)
2 2
and its correlation matrix is simply α = 2(N + 1/2)1I
3.3.3. Entangled Gaussians. — Let us consider now bipartite systems. Hence
the phase space (V, σ) decomposes into a direct sum V = VA ⊕ VB (where A stands
for “Alice” and B for “Bob”) and the symplectic matrix σ = σA ⊕ σB is block
diagonal with respect to this decomposition. If WA (x) respectively WB (y) denote
Weyl operators, acting on the Hilbert spaces HA , HB , and corresponding to the
phase spaces VA and VB , it is easy to see that the tensor product WA (x) ⊗ WB (y)
satisfies the Weyl relations with respect to (V, σ). Hence by the Stone - von Neumann
uniqueness theorem we can identify W (x ⊕ y), x ⊕ y ∈ Va ⊕ VB = V with WA (x) ⊗
WA (y). This shows immediately that a state ρ on H = HA ⊗ HB is a product state
iff its characteristic function factorizes. Separability5 is characterized as follows (we
omit the proof, see [170] instead).
5 In infinite dimensions we have to define separable states (in slight generalization to Definition
Theorem 3.3.1 A Gaussian state with covariance matrix α is separable iff there
are covariance matrices αA , αB such that
αA 0
α≥ (3.66)
0 αB
holds.
This theorem is somewhat similar to Theorem 2.4.1: It provides a useful criterion
as long as abstract considerations are concerned, but not for explicit calculations.
In contrast to finite dimensional systems, however, separability of Gaussian states
can be decided by an operational criterion in terms of nonlinear maps between
matrices [65]. To state it we have to introduce some terminology first. The key tool
is a sequence of 2n + 2m × 2n + 2m matrices αN , N ∈ N, written in block matrix
notation as
AN CN
αN = T . (3.67)
CN BN
if αN −iσ ≥ 0 and αN +1 = 0 otherwise. Here we have set XN = CN (BN −iσB )−1 CNT
6
and the inverse denotes the pseudo inverse if BN − iσB is not invertible. Now we
can state the following theorem (see [65] for a proof):
Theorem 3.3.2 Consider a Gaussian state ρ of a bipartite system with correlation
matrix α0 and the sequence αN , N ∈ N just defined.
Proposition 3.3.3 A Gaussian state is ppt iff its correlation matrix α satisfies
−σA 0
α + ie
σ ≥ 0 with σe= . (3.69)
0 σB
The interesting question is now whether the ppt criterion is for a given number
of degrees of freedom equivalent to separability or not. The following theorem which
was proved in [144] for 1 × 1 systems and in [170] in 1 × d case gives a complete
answer.
Theorem 3.3.4 A Gaussian state of a quantum system with 1×d degrees of freedom
(i.e. dim XA = 2 and dim XB = 2d) is separable iff it is ppt; in other words iff the
condition of Proposition 3.3.3 holds.
For other kinds of systems the ppt criterion may fail which means that there
are entangled Gaussian states which are ppt. A systematic way to construct such
states can be found in [170]. Roughly speaking, it is based on the idea to go to the
boundary of the set of ppt covariance matrices, i.e. α has to satisfy Equation (3.62)
and (3.69) and it has to be a minimal matrix with this property. Using this method
explicit examples for ppt and entagled Gaussians are constructed for 2 × 2 degrees
of freedom (cf. [170] for details).
3.3.4. Gaussian channels. — Finally we want to give a short review on a special
class of channels for infinite dimensional quantum systems (cf. [84] for details). To
explain the basic idea note first that each finite set of Weyl operators (W (xj ),
j = 1, . . . , N , xj 6= xk for j 6= k)P is linear independent. This can be checked
easily using expectation values of j λj W (xj ) in Gaussian states. Hence linear
maps on the space of finite linear combinations of Weyl operators can be defined
by T [W (x)] = f (x)W (Ax) where f is a complex valued function on V and A is a
2d × 2d matrix. If we choose A and f carefully enough, such that some continuity
properties match T can be extended in a unique way to a linear map on B(H) –
which is, however, in general not completely positive.
This means we have to consider special choices for A and f . The most easy
case arises if f ≡ 1 and A is a symplectic isomorphism, i.e. AT σA = σ. If this
holds the map V ∋ x 7→ W (Ax) is a representation of the Weyl relations and
therefore unitarily equivalent to the representation we have started with. In other
words there is a unitary operator U with T [W (x)] = W (Ax) = U W (x)U ∗ , i.e.
T is unitarily implemented, hence completely positive and, in fact, well known as
Bogolubov transformation.
If A does not preserve the symplectic matrix, f ≡ 1 is no option. Instead we
have to choose f such that the matrices
i i
Mjk = f (xj − xk ) exp − xj · σxk + Axj · σAxk (3.70)
2 2
for k > 1. If the environment is initially in a thermal state ρNe (cf. Equation (3.64))
this leads to
2
1 |k − 1|
T W (x) = exp + Nc x2 W (kx), (3.74)
2 2
N ′ = k 2 N + max{0, k 2 − 1} + Nc . (3.75)
If Nc = 0 this means that T amplifies (k > 1) or damps (k < 1) the mean pho-
ton number, while Nc > 0 leads to additional classical, Gaussian noise. We will
reconsider this channel in greater detail in Chapter 6.
Chapter 4
Basic tasks
of the channels
X
C(X) ∋ f 7→ E(f ) = f (x)Ex ∈ B(H) (4.2)
x∈X
and
X
C∗ (X) ∋ p 7→ D∗ (p) = px ρx ∈ B∗ (H), (4.3)
x∈X
i.e. ρe = D∗ E ∗ (ρ) and this Equation makes sense even if X is not finite. The tele-
portation is successful if the output state ρe can not be distinguished from the input
state ρ by any statistical experiment, i.e. if D∗ E ∗ (ρ) = ρ. Hence the impossibility
of classical teleportation can be rephrased simply as ED 6= Id for all observables E
and all preparations D.
4.1.2. Entanglement enhanced teleportation. — Let us change our setup
now slightly. Assume that Alice wants to send a quantum state ρ ∈ B∗ (H) to
Bob and that she shares an entangled state σ ∈ B∗ (K ⊗ K) and an ideal classical
communication channel C(X) → C(X) with him. Alice can perform a measurement
E : C(X) → B(H ⊗ K) on the composite system B(H ⊗ K) consisting of the
particle to teleport (B(H)) and her part of the entangled system (B(K)). Then she
communicates the classical data x ∈ X to Bob and he operates with the parameter
dependent operation D : B(H) → B(K) ⊗ C(X) appropriately on his particle (cf.
4. Basic tasks 44
Figure 4.1). Hence the overall procedure can be described by the channel T =
(E ⊗ Id)D, or in analogy to (4.1)
E ∗ ⊗Id D∗
B∗ (H ⊗ K⊗2 ) −−−−→ C∗ (X) ⊗ B∗ (K) −−→ B∗ (H). (4.4)
The teleportation of ρ is successful if
T ∗ (ρ ⊗ σ) := D∗ (E ∗ ⊗ Id)(ρ ⊗ σ) = ρ (4.5)
holds, in other words if there is no statistical measurement which can distinguish
the final state T ∗ (ρ ⊗ σ) of Bob’s particle from the initial state ρ of Alice’s input
system. The two channels E and D and the entangled state σ form a teleportation
scheme if Equation (4.5) holds for all states ρ of the B(H) system, i.e. if each state
of a B(H) system can be teleported without loss of quantum information.
Assume now that H = K = Cd and X = {0, . . . , d2 − 1} holds. In this case
we can define a teleportation scheme as follows: The entangled state shared by
Alice and Bob is a maximally entangled state σ = |ΩihΩ| and Alice performs a
measurement which is given by the one dimensional projections Ej = |Φj ihΦj |,
where Φj ∈ H ⊗ H, j = 0, . . . , d2 − 1 is a basis of maximally entangled vectors.
If her result is j = 0, . . . , d2 − 1 Bob has to apply the operation τ 7→ Uj∗ τ Uj on
his partner of the entangled pair, where the Uj ∈ B(H), j = 0, . . . , d2 − 1 are an
orthonormal family of unitary operators, i.e. tr(Uj∗ Uk ) = dδjk . Hence the parameter
dependent operation D has the form (in the Schrödinger picture):
2
dX −1
∗ ∗ ∗
C (X) ⊗ B (H) ∋ (p, τ ) 7→ D (p, τ ) = pj Uj∗ τ Uj ∈ B∗ (H). (4.6)
j=0
here tr12 denotes the partial trace over the first two tensor factors (= Alice’s qubits).
If Ω, the Φj and the Uj are related by the equation
Φj = (Uj ⊗ 1I)Ω (4.10)
Alice Bob
ρ ρ
x∈X
E Dx
To get an ideal channel we just have to choose mutually orthogonal pure states
ρx = |ψx ihψx |, x = 1, . . . , d on Alice’s side and the corresponding one-dimensional
projections Ey = |ψy ihψy |, y = 1, . . . , d on Bob’s. If d = 2 and H = C2 it is
possible to send one bit classical information via one qubit quantum information.
The crucial point is now that the amount of classical information can be increased
(doubled in the qubit case) if Alice shares an entangled state σ ∈ S(H ⊗ H) with
Bob. To send the classical information x ∈ X = {1, . . . , n} to Bob, Alice operates
on her particle with an operation Dx : B(H) → B(H), sends it through an (ideal)
quantum channel to Bob and he performs a measurement E1 , . . . , En ∈ B(H ⊗ H)
on both particles. The probability for Bob to measure y ∈ X if Alice has send x ∈ X
is given by
tr (Dx ⊗ Id)∗ (σ)Ey , (4.12)
D∗ ⊗Id E∗
C∗ (X) ⊗ B∗ (H) ⊗ B∗ (H) −−−−→ B∗ (H) ⊗ B∗ (H) −−→ C∗ (X) (4.13)
Alice Bob
x∈X x∈X
Dx E
i.e. T ∗ (p) = E ∗ ◦ (D∗ ⊗ Id)(p ⊗ σ). The advantage of this point of view is that it
works as well for infinite dimensional Hilbert spaces and continuous observables.
Finally let us consider again the case where H = Cd and X = {1, . . . , d2 }. If
we choose as in the last paragraph a maximally entangled vector Ω ∈ H ⊗ H, an
orthonormal base Φx ∈ H ⊗ H, j = x, . . . , d2 of maximally entangled vectors and
an orthonormal family Ux ∈ B(H ⊗ H), x = 1, . . . , d2 of unitary operators, we can
construct a dense coding scheme as follows: Ex = |Φx ihΦx |, Dx (A) = Ux∗ AUx and
σ = |ΩihΩ|. If Ω, the Φx and the Ux are related by Equation (4.10) it is easy to see
that we really get a dense coding scheme [168]. If d = 2 holds, we have to set again
the Bell basis for the Φx , Ω = Φ0 and the identity and the Pauli matrices for the
Ux . We recover in this case the standard example of dense coding proposed in [19]
and we see that we can transfer two bits via one qubit, as stated above.
4.2 Estimating and copying
The impossibility of classical teleportation can be rephrased as follows: It is impos-
sible to get complete information about the state ρ of a quantum system by one
measurement on one system. However, if we have many systems, say N , all prepared
in the same state ρ it should be possible to get (with a clever measuring strategy)
as much information on ρ as possible, provided N is large enough. In this way we
can circumvent the impossibility of devices like classical teleportation or quantum
copying at least in an approximate way.
4.2.1. Quantum state estimation. — To discuss this idea in a more detailed
way consider a number N of d-level quantum systems, all of them prepared in the
same (unknown) state ρ ∈ B∗ (H). Our aim is to estimate the state ρ by measure-
ments on the compound system ρ⊗N . This is described in terms of an observable
E N : C(XN ) → B(H⊗N ) with values in a finite subset1 XN ⊂ S(H) of the quantum
state space S(H). According to Subsection
P 3.2.4 each such E N is given in terms of a
N N
tuple Eσ , σ ∈ XN , by E(f ) = σ f (σ)Eσ hence we get for the expectation value
of an EN measurement on systems in the state ρ⊗N the density matrix ρbN ∈ S(H)
with matrix elements
X
hφ, ρbN ψi = hφ, σψiEσN . (4.16)
x∈XN
We will call the channel E N an estimator and the criterion for a good estimator
E N is that for any one-particle density operator ρ, the value measured on a state
ρ⊗N is likely to be close to ρ, i.e. that the probability
X
K N (ω) := tr E N (ω)ρ⊗N with E N (ω) = EσN (4.17)
σ∈XN ∩ω
1 This is a severe restriction at this point and physically not very well motivated. There might
be more general (i.e. continuous) observables taking their values in the whole state space S(H)
which lead to much better estimates. However we do not discuss this possibility in order to keep
mathematics more elementary.
47 4.2. Estimating and copying
We see immediately that the probability to get wrong copies coincides exactly with
the error probability of the estimator given in Equation (4.17). This shows first
that we get exact copies in the limit N → ∞ and second that the quality of the
copies does not depend on the number M of output systems, i.e. the asymptotic
rate limN,M→∞ M/N of output systems per input system can be arbitrary large.
The fact that we get classical data at an intermediate step allows a further
generalization of this scheme. Instead of just preparing M systems in the state
σ detected by the estimator, we can apply first an arbitrary transformation F :
S(H) → S(H) on the density matrix σ and prepare F (σ)⊗M instead of σ ⊗M . In this
way we get the channel (cf. Figure 4.3)
X
B∗ (H⊗N ) ∋ τ 7→ tr(EσN τ )F (σ)⊗M ∈ B∗ (H⊗M ), (4.19)
σ∈XN
i.e. a physically realizable device which approximates the impossible machine F . The
probability to get a bad approximation of the state F (ρ)⊗M (if the input state was
ρ⊗N ) is again given by the error probability of the estimator and we get a perfect
realization of F at arbitrary rate as M, N → ∞.
There are in particular two interesting tasks which become possible this way:
The first is the “universal not gate” which associates to each pure state of a qubit the
unique pure state orthogonal to it [36]. This is a special example of a antiunitarily
implemented symmetry operation and therefore not completely positive. The second
example is the purification of states [46, 100]. Here it is assumed that the input
4. Basic tasks 48
F (σ)⊗M ∈ B∗ (H⊗M )
ρ⊗N ∈ B∗(H⊗N )
Preparation
Estimation
F
classical data
σ∈X⊂S F (σ) ∈ S
states were once pure but have passed later on a depolarizing channel |φihφ| 7→
ϑ|φihφ| + (1 − ϑ)1I/d. If ϑ > 0 this map is invertible but its inverse does not describe
an allowed quantum operation because it maps some density operators to operators
with negative eigenvalues. Hence the reversal of noise is not possible with a one shot
operation but can be done with high accuracy if enough input systems are available.
We rediscuss this topic in Chapter 7.
4.3 Distillation of entanglement
Let us return now to entanglement. We have seen in Section 4.1 that maximally
entangled states play a crucial role for processes like teleportation and dense coding.
In practice however entanglement is a rather fragile property: If Alice produces a
pair of particles in a maximally entangled state |ΩihΩ| ∈ S(HA ⊗HB ) and distributes
one of them over a great distance to Bob, both end up with a mixed state ρ which
contains much less entanglement then the original and which can not be used any
longer for teleportation. The latter can be seen quite easily if we try to apply
the qubit teleportation scheme (Subsection 4.1.2) with a non-maximally entangled
isotropic state (Equation (3.15) with λ > 0) instead of Ω.
Hence the question arises, whether it is possible to recover |ΩihΩ| from ρ, or,
following the reasoning from the last section, at least a small number of (almost)
maximally entangled states from a large number N of copies of ρ. However since
the distance between Alice and Bob is big (and quantum communication therefore
impossible) only LOCC operations (Section 3.2.6) are available for this task (Alice
and Bob can only operate on their respective particles, drop some of them and
communicate classically with one another). This excludes procedures like the pu-
rification scheme just sketched, because we would need “entangled” measurements
to get an asymptotically exact estimate for the state ρ. Hence we need a sequence
of LOCC channels
⊗N ⊗N
TN : B(CdN ⊗ CdN ) → B(HA ⊗ HB ) (4.20)
such that
kTN∗ (ρ⊗N ) − |ΩN ihΩN |k1 → 0, for N → ∞ (4.21)
holds, with a sequence of maximally entangled vectors ΩN ∈ CdN ⊗ CdN . Note that
⊗N ⊗N ∼
we have to use here the natural isomorphism HA ⊗ HB = (HA ⊗ HB )⊗N , i.e. we
49 4.3. Distillation of entanglement
have to reshuffle ρ⊗N such that the first N tensor factors belong to Alice (HA ) and
the last N to Bob (HB ). If confusion can be avoided we will use this isomorphism
in the following without a further note. We will call a sequence of LOCC channels,
TN satisfying (4.21) with a state ρ ∈ S(HA ⊗ HB ) a distillation scheme for ρ and
ρ is called distillable if it admits a distillation scheme. The asymptotic rate with
which maximally entangled states can be distilled with a given protocol is
1. First they take two pairs (let us call them pair 1 and pair 2), i.e. ρ ⊗ ρ and
apply to each of them the twirl operation PUŪ associated to isotropic states
(cf. Equation (3.18)). This can be done by LOCC operations in the following
way: Alice selects at random (respecting the Haar measure on U(2)) a unitary
operator U applies it to her qubits and sends to Bob which transformation she
has chosen; then he applies Ū to his particles. They end up with two isotropic
states ρe ⊗ ρe with the same maximally entangled fraction as ρ.
3. Finally Alice and Bob perform locally a measurement in the basis |0i, |1i on
pair 1 and discards it afterwards. If the measurements agree, pair 2 is kept
and has a higher F. Otherwise pair 2 is discarded as well.
If this procedure is repeated over and over again, it is possible to get states
with an arbitrarily high F, but we have to sacrifice more and more pairs and the
asymptotic rate is zero. To overcome this problem we can apply the scheme above
until F(ρ) is high enough such that 1 + tr(ρ ln ρ) ≥ 0 holds and then we continue
with another scheme called hashing [16] which leads to a nonvanishing rate.
If finally F(ρ) ≤ 1/2 but ρ is entangled, Alice and Bob can increase F for some
of their particles by filtering operations [9, 67]. The basic idea is that Alice applies
an instrument T : C(X) ⊗ B(H) → B(H) with two possible outcomes (X = {1, 2})
−1 ∗
to her particles. Hence
∗ the state becomes ρ 7→ px (Tx ⊗ Id) (ρ), x = 1, 2 with
probability px = tr Tx (ρ) (cf. Subsection 3.2.5 in particular Equation (3.50) for
the definition of Tx ). Alice communicates her measuring result x to Bob and if
x = 1 they keep the particle otherwise (x = 2) they discard it. If the instrument T
was correctly chosen Alice and Bob end up with a state ρe with higher maximally
entangled fraction. To find an appropriate T note first that there are ψ ∈ H ⊗ H
with hψ, (Id ⊗Θ)ρψi ≤ 0 (this follows from Theorem 2.4.3 since ρ is by assumption
entangled) and second that we can write each vector ψ ∈ H ⊗ H as (Xψ ⊗ 1I)Φ0 with
the Bell state Φ0 and an appropriately chosen operator Xψ (see Subsection 3.1.1).
4. Basic tasks 50
Now we can define T in terms of the two operations T1 , T2 (cf. Equation (3.52))
with
of particles in the state ρf . Nevertheless they can try to apply an appropriate filter
operation on ρ to get with a certain probability a new state which leads to a better
quality of the teleportation (or, if the filtering fails, to get nothing at all). It can
be shown however [88] that there are states ρf such that the error occuring in this
process (e.g. measured by the trace norm distance of σ and σ ′ ) is always above
a certain threshold. This is the point where the bound entangled states ρb come
into play: If Alice and Bob operate with an appropriate protocol on ρf and many
copies of ρb the distance between σ and σ ′ can be made arbitrarily small (although
the probability to be successful goes to zero). Another example for an activation
of bound entanglement is related to distillability of npt states: If Alice and Bob
share a certain ppt-entangled state as additional recource each npt state ρ becomes
distillable (evem if ρ is bound entangled) [60, 104]. For a more detailed survey of
the role of bound entanglement and further references see [91].
T
Encoding
Decoding
Id
ρ ρ
Id
Id
Id
Figure 4.4: Five bit quantum code: Encoding one qubit into five and correcting one
error.
4. Basic tasks 52
Theorem 4.4.1 The quantum code VΓ defined in Equation (4.29) detects the error
configuration Z ⊂ Y if the system of equations
X
Γkl gl = 0, k ∈ Y \ E, gl ∈ Zd (4.31)
l∈X∪Z
implies that
X
gl = 0, l ∈ X and Γkl gl = 0, k ∈ X (4.32)
l∈Z
holds.
We omit the proof, see [137] instead. Two particular examples (which are equiv-
alent!) are given in Figure 4.5. In both cases we have N = 1, M = 5 and K = 1
i.e. one input node, which can be chosen arbitrarily, five output nodes and the cor-
responding codes correct one error. For a more detailed survey on quantum error
correction, in particular for more examples we refer to [20].
4.5 Quantum computing
Quantum computing is without a doubt the most prominent and most far reaching
application of quantum information theory, since it promises on the one hand, “ex-
ponential speedup” for some problems which are “hard to solve” with a classical
computer, and gives completely new insights into classical computing and complex-
ity theory on the other. Unfortunately, an exhaustive discussion would require its
own review article. Hence we we are only able to give a short overview (see Part II
of [122] for a more complete presentation and for further references).
4.5.1. The network model of classical computing. — Let us start with a
brief (and very informal) introduction to classical computing (for a more complete
review and hints for further reading see Chapter 3 of [122]). What we need first is
a mathematical model for computation. There are in fact several different choices
and the Turing machine [152] is the most prominent one. More appropriate for our
purposes is, however, the so called network model, since it allows an easier general-
ization to the quantum case. The basic idea is to interpret a classical (deterministic)
computation as the evaluation of a map f : BN → BM (where B = {0, 1} denotes
the field with two elements) which maps N input bits to M output bits. If M = 1
holds f is called a boolean function and it is for many purposes sufficient to consider
this special case – each general f is in fact a Cartesian product of boolean functions.
a a
c c a b
b b
a b c a b c
0 0 0 0 0 0 a b
1 0 0 1 0 1 0 1
0 1 0 0 1 1 1 0
1 1 1 1 1 1
c = ab c = a + b − ab b =1−a
AND, ∧ OR, ∨ NOT, ¬
Figure 4.6: Symbols and definition for the three elementary gates AND, OR and
NOT.
4. Basic tasks 54
Particular examples are the three elementary gates AND, OR and NOT defined in
Figure 4.6 and arbitrary algebraic expressions constructed from them: e.g. the XOR
gate (x, y) 7→ x + y mod 2 which can be written as (x ∨ y) ∧ ¬(x ∧ y). It is now a
standard result of boolean algebra that each boolean function can be represented
in this way and there are in general many possibilities to do this. A special case
is the disjunctive normal form of f ; cf [161]. To write such an expression down in
form of equations is, however, somewhat confusing. f is therefore expressed most
conveniently in graphical form as a circuit or network, i.e. a graph C with nodes
representing elementary gates and edges (“wires”) which determine how the gates
should be composed; cf. Figure 4.7 for an example. A classical computation can now
be defined as a circuit applied to a specified string of input bits.
Variants of this model arise if we replace AND, OR and NOT by another (finite)
set G of elementary gates. We only have to guarantee that each function f can be
expressed as a composition of elements from G. A typical example for G is the set
which contains only the NAND gate (x, y) 7→ x ↑ y = ¬(x ∧ y). Since AND, OR
and NOT can be rewritten in terms of NAND (e.g. ¬x = x ↑ x) we can calculate
each boolean function by a circuit of NAND gates.
x
c
x + y mod 2
solve the problem, and in the network model we choose the number of steps needed
to execute the whole circuit, if gates which operate on different bits are allowed to
work simultaneously2 . Even with a fixed type of model the functional behavior of t
depends on the set of elementary operations we choose, e.g. the set of elementary
gates in the network model. It is therefore useful to divide computational problems
into complexity classes whose definitions do not suffer under model dependent as-
pects. The most fundamental one is the class P which contains all problems which
can be computed in “polynomial time”, i.e. t is, as a function of L, bounded from
above by a polynomial. The model independence of this class is basically the con-
tent of the strong Church Turing hypotheses which states, roughly speaking, that
each model of computation can be simulated in polynomial time on a probabilistic
Turing machine.
Problems of class P are considered “easy”, everything else is “hard”. However
even if a (decision) problem is hard the situation is not hopeless. E.g. consider
the factoring problem fac described above. It is generally believed (although not
proved) that this problem is is not in class P. But if somebody gives us a divisor
p < l of m it is easy to check whether p is really a factor, and if the answer is
true we have computed fac(m, l). This example motivates the following definition:
A decision problem f is in class NP (“nondeterministic polynomial time”) if there
is a boolean function f ′ in class P such that f ′ (x, y) = 1 for some y implies f (x). In
our example fac′ is obviously defined by fac′ (m, l, p) = 1 ⇔ p < l and p is a devisor
of m. It is obvious that P is a subset of NP the other inclusion however is rather
nontrivial. The conjecture is that P 6= NP holds and great parts of complexity
theory are based on it. Its proof (or disproof) however represents one of the biggest
open questions of theoretical informatics.
To introduce a third complexity class we have to generalize our point of view
slightly. Instead of a function f : BN → BM we can look at a noisy classical T
which sends the input value x ∈ BN to a probability distribution Txy , y ∈ BM on
BM (i.e. Txy is the transition matrix of the classical channel T ; cf. Subsection 3.2.3).
Roughly speaking, we can interpret such a channel as a probabilistic computation
which can be realized as a circuit consisting of “probabilistic gates”. This means
there are several different ways to proceed at each step and we use a classical random
number generator to decide which of them we have to choose. If we run our device
several times on the same input data x we get different results y with probability
Txy . The crucial point is now that we can allow some of the outcomes to be wrong
as long as there is an easy way (i.e. a class P algorithm) to check the validity of
the results. Hence we define BPP (“bounded error probabilistic polynomial time”)
as the class of all decision problems which admit a polynomial time probabilistic
algorithm with error probability less than 1/2 − ǫ (for fixed ǫ). It is obvious that
P ⊂ BPP holds but the relation between BPP and NP is not known.
4.5.3. Reversible computing. — In the last subsection we have discussed the
time needed to perform a certain computation. Other physical quantities which seem
to be important are space and energy. Space can be treated in a similar way as time
and there are in fact space-related complexity classes (e.g PSPACE which stands
for “polynomial space”). Energy, however, is different, because it turns surprisingly
out that it is possible to do any calculation without expending any energy! One
source of energy consumption in a usual computer is the intrinsic irreversibility
of the basic operations. E.g. a basic gate like AND maps two input bits to one
output bit, which implies obviously that the input can not be reconstructed from
2 Note
that we have glanced over a lot of technical problems at this point. The crucial difficulty
is that each circuit CN allows only the computation of a boolean function fN : BN → B which
acts on input data of length N . Since we are interested in answers for arbitrary finite length inputs
a sequence CN , N ∈ N of circuits with appropriate uniformity properties is needed; cf. [124] for
details.
4. Basic tasks 56
the output. In other words: one bit of information is erased during the operation
of the AND gate, hence a small amount of energy is dissipated to the environment.
A thermodynamic analysis, known as Landauer’s principle, shows that this energy
loss is at least kB T ln 2, where T is the temperature of the environment [106].
If we want to avoid this kind of energy dissipation we are restricted to reversible
processes, i.e. it should be possible to reconstruct the input data from the output
data. This is called reversible computation and it is performed in terms of reversible
gates, which in turn can be described by invertible functions f : BN → BN . This does
not restrict the class of problems which can be solved however: We can repackage
a non-invertible function f : BN → BM into an invertible one f ′ : BN +M → BN +M
simply by f ′ (x, 0) = (x, f (x)) and an appropriate extension to the rest of BN +M . It
can be even shown that a reversible computer performs as good as a usual one, i.e.
an “irreversible” network can be simulated in polynomial time by a reversible one.
This will be of particular importance for quantum computing, because a reversible
computer is, as we will see soon, a special case of a quantum computer.
4.5.4. The network model of a quantum computer. — Now we are ready
to introduce a mathematical model for quantum computation. To this end we will
generalize the network model discussed in Subsection 4.5.1 to the network model of
quantum computation.
somehow in the future. Particular examples are on the one hand the controlled U
operations and the set consisting of CNOT and all one-qubit gates on the other (cf.
Figure 4.8; for a proof of universality see Section 4.5 of [122]).
U1 H
U2 U1 H
U3 U2 U1 H
1 1 1 1 0
H=√ Uk = −k
2 1 −1 0 e2 π
Figure 4.9: Quantum circuit for the discrete Fourier transform on a 4-qubit register.
which represents actually the superposition of all numbers the registers can
represent – this is indeed the crucial point of quantum computing and we
come back to it below.
3. Now we can apply the quantum circuit C to the input state ψ and after the
calculation we get the output state U ψ, where U is the unitary represented
by C.
4. Basic tasks 58
4. To read out the data after the calculation we perform a von Neumann mea-
surement in the computational basis, i.e. we measure the observable given by
the one dimensional projectors |xihx|, x ∈ BN . Hence we get x ∈ BN with
probability PN = |hψ|xi|2 .
So, why is quantum computing potentially useful? First of all, a quantum com-
puter can perform at least as good as a classical computer. This follows immediately
from our discussion of reversible computing in Subsection 4.5.3 and the fact that
any invertible function f : BN → BN defines a unitary by Uf : |xi 7→ |f (x)i (the
quantum CNOT gate in Figure 4.8 arises exactly in this way from the classical
CNOT). But, there is on the other hand strong evidence which indicates that a
quantum computer can solve problems in polynomial time which a classical com-
puter can not. The most striking example for this fact is the Shor algorithm, which
provides a way to solve the factoring problem (which is most probably not in class
P) in polynomial time. If we introduce the new complexity class BQP of decision
problems which can be solved with high probability and in polynomial time with a
quantum computer, we can express this conjecture as BPP 6= BQP.
The mechanism which gives a quantum computer its potential power is the
ability to operate not just on one value x ∈ BN , but on whole superpositions
of values, as already mentioned in step 2 above. E.g. consider a, not necessarily
invertible, map f : BN → BM and the unitary operator Uf
H⊗N ⊗ H⊗M ∋ |xi ⊗ |0i 7→ Uf |xi ⊗ |0i = |xi ⊗ |f (x)i ∈ H⊗N ⊗ H⊗M . (4.34)
If we let act Uf on a register in the state Ψ ⊗ |0i from Equation (4.33) we get the
result
1 X
Uf (Ψ ⊗ |0i) = √ |xi ⊗ |f (x)i. (4.35)
2N x∈BN
Hence a quantum computer can evaluate the function f on all possible arguments
x ∈ BN at the same time! To benefit from this feature – usually called quantum
parallelism – is, however, not as easy as it looks like. If we perform a measurement
on Uf (Ψ ⊗ |0i) in the computational basis we get the value of f for exactly one
argument and the rest of the information originally contained in Uf (Ψ ⊗ |0i) is
destroyed. In other words it is not possible to read out all pairs (x, f (x)) from
Uf (Ψ ⊗ |0i) and to fill a (classical) lookup table with them. To take advantage
from quantum parallelism we have to use a clever algorithm within the quantum
computation step (step 3 above). In the next section we will consider a particular
example for this.
Before we come to this point, let us give some additional comments which link
this section to other parts of quantum information. The first point concerns entan-
glement. The state Uf (Ψ ⊗ |0i) is highly entangled (although Ψ is separable since
⊗N
Ψ = 2−1/2 (|0i + |1i) ), and this fact is essential for the “exponential speedup”
of computations we could gain in a quantum computer. In other words, to out-
perform a classical computer, entanglement is the most crucial resource – this will
become more transparent in the next section. The second remark concerns error
59 4.6. Quantum cryptography
1 1 X
√ H ⊗N |xi + |x + ai = √ (−1)x·y |yi (4.36)
2 2N −1 a·y=0
where the dot denotes the (B-valued) scalar product in the vector space BN . Now
we perform a measurement on the first register (in computational basis) and we get
a y ∈ BN with the property y · a = 0. If we repeat this procedure N times and if we
get N linear independent values yj we can determine a as a solution of the system
of equations y1 · a = 0, . . . , yN · a = 0. The probability to appear as an outcome of
the second measurement is for each y with y · a = 0 given by 21−N . Therefore the
success probability can be made arbitrarily big while the number of times we have
to access the box is linear in N .
4.6 Quantum cryptography
Finally we want to have a short look on quantum cryptography – another more
practical application of quantum information, which has the potential to emerge into
3 If an analog computer works reliably only with a certain accuracy, we can rewrite the algorithm
technology in the not so distant future (see e.g. [95, 93, 34] for some experimental
realizations and [69] for a more detailed overview). Hence let us assume that Alice
has a message x ∈ BN which she wants to send secretly to Bob over a public
communication channels. One way to do this is the so called “one-time pad”: Alice
generates randomly a second bit-string y ∈ BN of the same length as x sends x + y
instead of x. Without knowledge of the key y it is completely impossible to recover
the message x from x + y. Hence this is a perfectly secure method to transmit
secret data. Unfortunately it is completely useless without a secure way to transmit
the key y to Bob, because Bob needs y to decrypt the message x + y (simply by
adding y again). What makes the situation even worse is the fact that the key y can
be used only once (therefore the name one-time pad). If two messages x1 , x2 are
encrypted with the same key we can use x1 as a key to decrypt x2 and vice versa:
(x1 + y) + (x2 + y) = x1 + x2 , hence both messages are partly compromised.
Due to these problems completely different approaches, namely “public key sys-
tems” like DSA and RSA are used today for cryptography. The idea is to use two
keys instead of one: a private key which is used for decryption and only known to its
owner and a public key used for encryption, which is publicly available (we do not
discuss the algorithms needed for key generation, encryption and decryption here,
see [145] and the references therein instead). To use this method, Bob generates
a key pair (z, y), keeps his private key (y) at a secure place and sends the public
one (z) to Alice over a public channel. Alice encrypts her message with z sends
the result to Bob and he can decrypt it with y. The security of this scheme relies
on the assumption that the factoring problem is computationally hard, i.e. not in
class P, because to calculate y from z requires the factorization of large integers.
Since the latter is tractable on quantum computers via Shor’s algorithm, the secu-
rity of public key systems breaks down if quantum computers become available in
the future. Another problem of more fundamental nature is the unproven status of
the conjecture that factorization is not solvable in polynomial time. Consequently,
security of public key systems is not proven either.
The crucial point is now that quantum information provides a way to distribute
a cryptographic key y in a secure way, such that y can be used as a one-time
pad afterwards. The basic idea is to use the no cloning theorem to detect possible
eavesdropping attempts. To make this more transparent, let us consider a particular
example here, namely the probably most prominent protocol proposed by Benett
and Brassard in 1984 [10].
1. Assume that Alice wants to transmit bits from the (randomly generated) key
y ∈ BN through an ideal quantum channel to Bob. Before they start they
settle upon two orthonormal bases e0 , e1 ∈ H, respectively f0 , f1 ∈ H, which
are mutually nonorthogonal, i.e. |hej , fk i| ≥ ǫ > 0 with ǫ big enough for each
j, k = 0, 1. If photons are used as information carrier a typical choice are
linearly polarized photons with polarization direction rotated by 45◦ against
each other.
2. To send one bit j ∈ B Alice selects now at random one of the two bases, say
e0 , e1 and then she sends a qubit in the state |ej ihej | through the channel.
Note that neither Bob nor a potential eavesdropper knows which bases she
has chosen.
3. When Bob receives the qubit he selects, as Alice before, at random a base and
performs the corresponding von Neumann measurement to get one classical
bit k ∈ B, which he records together with the measurement method.
bit for bit which base he has used for the measurement (but not the result
of the measurement). If he has used the same base as Alice both keep the
corresponding bit otherwise they discard it. They end up with a bit-string
y ′ ∈ BM of a reduced length M . If this is not sufficient they have to continue
sending random bits until the key is long enough. For large N the rate of
successfully transmitted bits per bits sended is obviously 1/2. Hence Alice has
to send approximately twice as many bits as they need.
To see why this procedure is secure, assume now that the eavesdropper Eve can
listen and modify the information sent through the quantum channel and that she
can listen on the classical channel but can not modify it (we come back to this
restriction in a minute). Hence Eve can intercept the qubits sent by Alice and make
two copies of it. One she forwards to Bob and the other she keeps for later analysis.
Due to the no cloning theorem however she has produced errors in both copies and
the quality of her own decreases if she tries to make the error in Bob’s as small
as possible. Even if Eve knows about the two bases e0 , e1 and f0 , f1 she does not
know which one Alice uses to send a particular qubit4 . Hence Eve has to decide
randomly which base to choose (as Bob). If e0 , e1 and f0 , f1 are chosen optimal,
i.e. |hej , fk i|2 = 0.5 it is easy to see that the error rate Eve necessarily produces if
she randomly measures in one of the bases is 1/4 for large N . To detect this error
Alice and Bob simply have to sacrify portions of the generated key and to compare
randomly selected bits using their classical channel. If the error rate they detect is
too big they can decide to drop the whole key and restart from the beginning.
So let us discuss finally a situation where Eve is able to intercept the quantum
and the classical channel. This would imply that she can play Bob’s part for Alice
and Alice’s for Bob. As a result she shares a key with Alice and one with Bob.
Hence she can decode all secret data Alice sends to Bob, read it, and encode it
finally again to forward it to Bob. To secure against such a “woman in the middle
attack”, Alice and Bob can use classical authentication protocols which ensure that
the correct person is at the other end of the line. This implies that they need a
small amount of initial secret material which can be renewed however from the new
key they have generated through quantum communication.
4 If Alice and Bob uses only one basis to send the data and Eve knows about it she can produce
of course ideal copies of the qubits. This is actually the reason why two nonorthogonal bases are
necessary.
Chapter 5
Entanglement measures
We have seen in the last chapter that entanglement is an essential resource for
many tasks of quantum information theory, like teleportation or quantum compu-
tation. This means that entangled states are needed for the functioning of many
processes and that they are consumed during operation. It is therefore necessary
to have measures which tell us whether the entanglement contained in a number
of quantum systems is sufficient to perform a certain task. What makes this sub-
ject difficult, is the fact that we can not restrict the discussion to systems in a
maximally or at least highly entangled pure state. Due to unavoidable decoherence
effects realistic applications have to deal with imperfect systems in mixed states,
and exactly in this situation the question for the amount of available entanglement
is interesting.
5.1 General properties and definitions
The difficulties arising if we try to quantify entanglement can be divided, roughly
speaking, into two parts: First we have to find a reasonable quantity which describes
exactly those properties which we are interested in and second we have to calculate
it for a given state. In this section we will discuss the first problem and consider
several different possibilities to define entanglement measures.
5.1.1. Axiomatics. — First of all, we will collect some general properties which
a reasonable entanglement measure should have (cf. also [16, 154, 153, 155, 89]). To
quantify entanglement, means nothing else but to associate a positive real number
to each state of (finite dimensional) two-partite systems.
Axiom E0 An entanglement measure is a function E which assigns to each state
ρ of a finite dimensional bipartite system a positive real number E(ρ) ∈ R+ .
Note that we have glanced over some mathematical subtleties here, because E
is not just defined on the state space of B(H ⊗ K) systems for particularly chosen
Hilbert spaces H and K – E is defined on any state space for arbitrary finite
dimensional H and K. This is expressed mathematically most conveniently by a
family of functions which behaves naturally under restrictions (i.e. the restriction
to a subspace H′ ⊗ K′ coincides with the function belonging to H′ ⊗ K′ ). However
we will see soon that we can safely ignore this problem.
The next point concerns the range of E. If ρ is unentangled E(ρ) should be
zero of course and it should be maximal on maximally entangled states. But what
happens if we allow the dimensions of H and K to grow? To get an answer consider
first a pair of qubits in a maximally entangled state ρ. It should contain exactly
one bit entanglement i.e. E(ρ) = 1 and N pairs in the state ρ⊗N should contain
N bits. If we interpret ρ⊗N as a maximally entangled state of a H ⊗ H system
with H = CN we get E(ρ⊗N ) = log2 (dim(H)) = N , where we have to reshuffle in
ρ⊗N the tensor factors such that (C2 ⊗ C2 )⊗N becomes (C2 )⊗N ⊗ (C2 )⊗N (i.e. “all
Alice particles to the left and all Bob particles to the right”; cf. Section 4.3.) This
observation motivates the following.
Axiom E1 (Normalization) E vanishes on separable and takes its maximum on
maximally entangled states. This means more precisely that E(σ) ≤ E(ρ) = log2 (d)
for ρ, σ ∈ S(H ⊗ H) and ρ maximally entangled.
One thing an entanglement measure should tell us, is how much quantum infor-
mation can be maximally teleported with a certain amount of entanglement, where
63 5.1. General properties and definitions
this maximum is taken over all possible teleportation schemes and distillation pro-
tocols, hence it can not be increased further by additional LOCC operations on the
entangled systems in question. This consideration motivates the following Axiom.
Axiom E2 (LOCC monotonicity) E can not increase under LOCC operation,
i.e. E[T (ρ)] ≤ E(ρ) for all states ρ and all LOCC channels T .
A special case of LOCC operations are of course local unitary operations U ⊗ V .
Axiom E2 implies now that E(U ⊗ V ρU ∗ ⊗ V ∗ ) ≤ E(ρ) and on the other hand
E(U ∗ ⊗ V ∗ ρeU ⊗ V ) ≤ E(e
ρ) hence with ρe = U ⊗ V ρU ∗ ⊗ V we get E(ρ) ≤ E(U ⊗
V ρV ⊗U ) therefore E(ρ) = E(U ⊗V ρU ∗ ⊗V ∗ ). We fix this property as a weakened
∗ ∗
The last point we have to consider here are additivity properties: Since we are
looking at entanglement as a resource, it is natural to assume that we can do with
two pairs in the state ρ twice as much as with one ρ, or more precisely E(ρ ⊗ ρ) =
2E(ρ) (in ρ ⊗ ρ we have to reshuffle tensor factors again ;see above).
Axiom E5 (Additivity) For any pair of two-partite states ρ, σ ∈ S(H ⊗ K) we
have E(σ ⊗ ρ) = E(σ) + E(ρ).
Unfortunately this rather natural looking axiom seems to be too strong (it ex-
cludes reasonable candidates). It should be however always true that entanglement
can not increase if we put two pairs together.
Axiom E5a (Subadditivity) For any pair of states ρ, σ we have E(ρ ⊗ σ) ≤
E(ρ) + E(σ).
There are further modifications of additivity available in the literature. Most
frequently used is the following, which restricts Axiom E5 to the case ρ = σ:
Axiom E5b (Weak additivity) For any state ρ of a bipartite system we have
N −1 E(ρ⊗N ) = E(ρ).
5. Entanglement measures 64
Finally, the weakest version of additivity only deals with the behavior of E for
large tensor products, i.e. ρ⊗N for N → ∞.
Axiom E5c (Existence of a regularization) For each state ρ the limit
E(ρ⊗N )
E ∞ (ρ) = lim (5.2)
N →∞ N
exists.
5.1.2. Pure states. — Let us consider now a pure state ρ = |ψihψ| ∈ S(H ⊗
K). If it is entangled its partial trace σ = trH |ψihψ| = trK |ψihψ| is mixed and
for a maximally entangled state it is maximally mixed. This suggests to use the
von Neumann entropy1 of ρ, which measures how much a state is mixed, as an
entanglement measure for mixed states, i.e. we define [9, 16]
EvN (ρ) = − tr trH ρ ln(trH ρ) . (5.3)
It is easy to deduce from the properties of the von Neumann entropy that EvN
satisfies Axioms E0, E1, E3 and E5b. Somewhat more difficult is only Axiom E2
which follows however from a nice theorem of Nielsen [119] which relates LOCC
operations (on pure states) to the theory of majorization. To state it here we need
first some terminology. Consider two probability distributions λ = (λ1 , . . . , λM )
and µ = (µ1 , . . . , µN ) both given in decreasing order (i.e. λ1 ≥ . . . ≥ λM and
µ1 ≥ . . . ≥ µN ). We say that λ is majorized by µ, in symbols λ ≺ µ, if
k
X k
X
λj ≤ µj ∀k = 1, . . . , min M, N (5.4)
j=1 j=1
holds. Now we have the following result (see [119] for a proof).
P 1/2
Theorem 5.1.1 A pure state ψ = j λj ej ⊗ e′j ∈ H ⊗ K can be transformed
P 1/2
into another pure state φ = j µj fj ⊗ fj′ ∈ H ⊗ K via a LOCC operation, iff the
Schmidt coefficients of ψ are majorized by those of φ, i.e. λ ≺ µ.
The von Neumann entropy of the restriction trH |ψihψ| can be P immediately
calculated from the Schmidt coefficients λ of ψ by EvN (|ψihψ|) = − Pj λj ln(λj ).
Axiom E2 follows therefore from the fact that the entropy S(λ) = − j λj ln(λj )
of a probability distribution λ is a Shur concave function, i.e. λ ≺ µ implies S(λ) ≥
S(µ); see [121].
Hence we have seen so far that EvN is one possible candidate for an entanglement
measure on pure states. In the following we will see that it is in fact the only
candidate which is physically reasonable. There are basically two reasons for this.
The first one deals with distillation of entanglement. It was shown by Bennett et.
al. [9] that each state ψ ∈ H ⊗ K of a bipartite system can be prepared out of
(a possibly large number of) systems in an arbitrary entangled state φ by LOCC
operations. To be more precise, we can find a sequence of LOCC operations
TN : B (H ⊗ K)⊗M(N ) → B (H ⊗ K)⊗N (5.5)
such that
1 We assume here and in the following that the reader is sufficiently familiar with entropies. If
holds with a nonvanishing rate r = limN →∞ M (N )/N . This is done either by dis-
tillation (r < 1 if ψ is higher entangled then φ) or by “diluting” entanglement,
i.e. creating many less entangled states from few highly entangled ones (r > 1).
All this can be performed in a reversible way: We can start with some maximally
entangled qubits dilute them to get many less entangled states which can be dis-
tilled afterwards to get the original states back (again only in an asymptotic sense).
The crucial point is that the asymptotic rate r of these processes is given in terms
of EvN by r = EvN (|φihφ|)/EvN (|ψihψ|). Hence we can say, roughly speaking that
EvN (|ψihψ|) describes exactly the amount of maximally entangled qubits which is
contained in |ψihψ|.
A second somewhat more formal reason is that EvN is the only entanglement
measure on the set of pure states which satisfies the axioms formulated above. In
other words the following “uniqueness theorem for entanglement measures” holds
[129, 155, 57]
Theorem 5.1.2 The reduced von Neumann entropy EvN is the only entanglement
measure on pure states which satisfies Axioms E0 – E5.
such that
holds with a sequence of maximally entangled states ΩN ∈ CdN . Now we can define
log2 (dN )
ED (ρ) = sup lim sup , (5.9)
(TN )N ∈N N →∞ N
where the supremum is taken over all possible distillation protocols (TN )N ∈N . It
is not very difficult to see that ED satisfies E0, E1, E2 and E5b. It is not known
whether continuity (E4) and convexity (Axiom E3) holds. It can be shown however
that ED is not convex (and not additive; Axiom E5) if npt bound entangled states
exist (see [141], cf. also Subsection 4.3.3).
For pure states we have discussed beside distillation the “dilution” of entan-
glement and we can use, similar to ED , the asymptotic rate with which bipartite
systems in a given state ρ can be prepared out of maximally entangled singlets [78].
Hence consider again a sequence of LOCC channels
and a sequence of maximally entangled states ΩN ∈ CdN , N ∈ N, but now with the
property
log2 (dN )
EC (ρ) = inf lim inf , (5.12)
(SN )N ∈N N →∞ N
where the infimum is taken over all dilution protocols SN , N ∈ N. It is again easy to
see that EC satisfies E0, E1, E2 and E5b. In contrast to ED however it can be shown
that EC is convex (Axiom E3), while it is not known, whether EC is continuous
(Axiom E4); cf [78] for proofs.
ED and EC are based directly on operational concepts. The remaining two mea-
sures we want to discuss here are defined in a more abstract way. The first can be
characterized as the minimal convex extension of EvN to mixed states: We define
the entanglement of formation EF of ρ as [16]
X
EF (ρ) = P inf pj EvN (|ψj ihψj |), (5.13)
ρ= pj |ψj ihψj |
j
where the infimum is taken over all decompositions of ρ into a convex sum of pure
states. EF satisfies E0 - E4 and E5a (cf. [16] for E2 and [120] for E4 the rest follows
directly from the definition). Whether EF is (weakly) additive (Axiom E5b) is not
known. Furthermore it is conjectured that EF coincides with EC . However proven
is only the identity EF∞ = EC , where the existence of the regularization EF∞ of EF
follows directly from subadditivity.
Another idea to quantify entanglement is to measure the “distance” of the (en-
tangled) ρ from the set of separable states D. It hat turned out [154] that among
all possible distance functions the relative entropy is physically most reasonable.
Hence we define the relative entropy of entanglement as
ER (ρ) = inf S(ρ|σ), S(ρ|σ) = tr ρ log2 ρ − ρ log2 σ , (5.14)
σ∈D
where the infimum is taken over all separable states. It can be shown that ER
satisfies, as EF the Axioms E0 - E4 and E5a, where E1 and E2 are shown in [154]
and E4 in [56]; the rest follows directly from the definition. It is shown in [159] that
∞
ER does not satisfy E5b; cf. also Subsection 5.3. Hence the regularization ER of
ER differs from ER .
Finally let us give now some comments on the relation between the measures just
introduced. On pure states all measures just discussed, coincide with the reduced
von Neumann entropy – this follows from Theorem 5.1.2 and the properties stated in
the last Subsection. For mixed states the situation is more difficult. It can be shown
however that ED ≤ EC holds and that all “reasonable” entanglement measures lie
in between [89].
Theorem 5.1.3 For each entanglement measure E satisfying E0, E1, E2 and E5b
and each state ρ ∈ S(H ⊗ K) we have ED (ρ) ≤ E(ρ) ≤ EC (ρ).
holds, with
3
X 3
X
C(ψ) = α2j with ψ = αj Φj , (5.17)
j=0 j=0
where Φj , j = 0, . . . , 3 denotes the Bell basis (3.3). Since C becomes rather im-
portant in the following let us reexpress it as C(ψ) = |hψ, Ξψi|, where ψ 7→ Ξψ
denotes complex conjugation in Bell basis. Hence Ξ is an antiunitary operator and
it can be written as the tensor product Ξ = ξ ⊗ ξ of the map H ∋ φ 7→ σ2 φ̄, where
φ̄ denotes complex conjugation in the canonical basis and σ2 is the second Pauli
matrix. Hence local unitaries (i.e. those of the form U1 ⊗ U2 ) commute with Ξ and
it can be shown that this is not only a necessary but also a sufficient condition for
a unitary to be local [160].
We see from Equations (5.15) and (5.17) that C(ψ) ranges from 0 to 1 and that
EvN (ψ) is a monotone function in C(ψ). The latter can be considered therefore as
an entanglement quantity in its own right. For a Bell state we get in particular
C(Φj ) = 1 while a separable state φ1 ⊗ φ2 leads to C(φ1 ⊗ φ2 ) = 0; this can be seen
easily with the factorization Ξ = ξ ⊗ ξ.
Assume now that one of the αj say α0 satisfies |α0 |2 > 1/2. This implies that
C(ψ) can not be zero since
3
X
α2j ≤ 1 − |α0 |2 (5.18)
j=1
must hold. Hence C(ψ) is at least 1 − 2|α0 |2 and this implies for EvN and arbitrary
ψ
( h p i
2
H 21 + x(1 − x) x ≥ 21
EvN (ψ) ≥ h |hΦ0 , ψi| with h(x) = . (5.19)
0 x < 21
5. Entanglement measures 68
The
P reduced von Neumann entropy of all these states equals h(λ1 ), hence
j µj EvN (|Ψj ihΨj |) = h(λ1 ) and therefore EF (ρ) = h(λ1 ). Since the maximally
entangled fraction of ρ is obviously λ1 we see that (5.21) holds with equality.
Assume now that the highest eigenvalue is less than 1/2. Then we can find phase
P
factors exp(iφj ) such that 3j=0 exp(iφj )λj = 0 holds and ρ can be expressed as a
convex linear combination of the states
p X3
p
eiφ0 /2 λ0 Φ0 + i (±eiφj /2 λj )Φj . (5.23)
j=1
(5.21) since the maximally entangled fraction of ρ is less than 1/2. Summarizing
this discussion we have shown (cf. Figure 5.1)
Proposition 5.2.1 A Bell diagonal state ρ is entangled iff its highest eigenvalue
λ is greater than 1/2. In this case the Entanglement of Formation of ρ is given by
1 p
EF (ρ) = H + λ(1 − λ) . (5.24)
2
1
Entanglement of Formation
EF (ρ) Relative Entropy
ER (ρ)
0.8
0.6
0.4
0.2
0
0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1
Highest eigenvalue λ of ρ
Figure 5.1: Entanglement of Formation and Relative Entropy of Entanglement for
Bell diagonal states, plotted as a function of the highest eigenvalue λ of ρ
with
q
√ √
R= ρΞρΞ ρ. (5.26)
Here we have set ρ = |ψihψ|. The definition of the hermitian matrix R however
makes sense for arbitrary ρ as well. If we write λj , j = 1, . . . , 4 for the eigenvalues of
R and λ1 is without loss of generality the biggest one we can define the concurrence
of an arbitrary two qubit state ρ as [172]
C(ρ) = max 0, 2λ1 − tr(R) = max(0, λ1 − λ2 − λ3 − λ4 ). (5.27)
It is easy to see that C(|ψihψ|) coincides with C(ψ) from (5.17). The crucial point
is now that Equation (5.15) holds for EF (ρ) if we insert C(ρ) instead of C(ψ):
Theorem 5.2.2 (Wootters Formula) The Entanglement of Formation of a
two qubit system in a state ρ is given by
1 p
EF (ρ) = H 1 + 1 − C(ρ)2 (5.28)
2
5. Entanglement measures 70
where the concurrence of ρ is given in Equation (5.27) and H denotes the binary
entropy from (5.16).
Since log is a concave function we have − log2 hΦj , σΦj i ≤ hΦj , − log2 (σ)Φj i and
therefore
3
X
ER (ρ) ≥ tr(ρ log2 ρ) + inf − λj log2 hΦj , σΦj i . (5.32)
σ∈D
j=0
Hence only the diagonal elements of σ in the Bell basis enter the minimization
on the right hand side of this inequality and this implies that we can restrict the
infimum to the set of separable Bell diagonal state. Since a Bell diagonal state is
separable iff all its eigenvalues are less than 1/2 (Proposition 5.2.1) we get
X3 X3
ER (ρ) ≥ tr(ρ log2 ρ) + inf −
λj log2 pj , with pj = 1. (5.33)
pj ∈[0,1/2]
j=0 j=0
This is an optimization problem (with constraints) over only four real parameters
and easy to solve. If the highest eigenvalue of ρ is greater than 1/2 we get p1 = 1/2
and pj = λj /(2 − 2λ), where we have chosen without loss of generality λ = λ1 . We
get a lower bound on ER (ρ) which is achieved if we insert the corresponding σ in
Equation (5.31). Hence we have proven the statement for λ > 1/2. which completes
the proof, since we have seen already that λ ≤ 1/2 implies that ρ is separable
(Proposition 5.2.1). 2
71 5.3. Entanglement measures under symmetry
where we have denoted the set of G-invariant states with PG S. 3. Determine EF (ρ)
then in terms of the convex hull of ǫ, i.e.
P
EF (ρ) = inf{ j λj ǫ(σj ) |
P P
σj ∈ PG S, 0 ≤ λj ≤ 1, ρ = j λj σj , j λj = 1}. (5.35)
The equality in the last Equation is of course a non-trivial statement which has to
be proved. We skip this point, however, and refer the reader to [159]. The advantage
of this scheme relies on the fact that spaces of G invariant states are in general very
low dimensional (if G is not too small). Hence the optimization problem contained
in step 3 has a much bigger chance to be tractable than the one we have to solve for
the original definition of EF . There is of course no guarantee that any of this three
steps can be carried out in a concrete situation. For the three examples mentioned
above, however, there are results available, which we will present in the following.
5.3.2. Werner states. — Let us start with Werner states [159]. In this case ρ is
uniquely determined by its flip expectation value tr(ρF ) (cf. Subsection 3.1.2). To
determine Φ ∈ H ⊗ H such that PUU |ΦihΦ| = ρ holds, we have to solve therefore
the equation
X
hΦ, F Φi = Φjk Φkj = tr(F ρ), (5.36)
jk
where Φjk denote components of Φ in the canonical basis. On the otherPhand the
reduced density matrix ρ = tr1 |ΦihΦ| has the matrix elements ρjk = l Φjl Φkl .
By exploiting U ⊗ U invariance we can assume without loss of generality that ρ is
diagonal. Hence to get the function ǫUU we have to minimize
" #
X X
2
EvN |ΦihΦ| = S |Φjk | (5.37)
j k
under the constraint (5.36), where S(x) = −x log2 (x) denotes the von Neumann
entropy. We skip these calculations here (see [159] instead) and state the results
5. Entanglement measures 72
1
EF (ρ)
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-1 -0.8 -0.6 -0.4 -0.2 0
tr(ρF )
Figure 5.2: Entanglement of Formation for Werner states plotted as function of the
flip expectation.
only. For tr(F ρ) ≥ 0 we get ǫ(ρ) = 0 (as expected since ρ is separable in this case)
and with H from (5.16)
1 p
ǫUU (ρ) = H 1 − 1 − tr(F ρ)2 (5.38)
2
for tr(F ρ) < 0. The minima are taken for Φ where all Φjk except one diagonal
element are zero in the case tr(F ρ) ≥ 0 and for Φ with only two (non-diagonal)
coefficients Φjk , Φkj , j 6= k nonzero if tr(ρF ) < 0. The function ǫ is convex and
coincides therefore with its convex hull such that we get
( h p i
1
H 2 1− 1 − tr(F ρ)2 tr(F ρ) < 0
EF (ρ) = (5.39)
0 tr(F ρ) ≥ 0.
2
d=4
ǫUŪ (ρ) d=3
1.8 d=2
1.6
1.4
1.2
0.8
0.6
0.4
0.2
0
1 1.5 2 2.5 3 3.5 4
tr(ρFe )
Figure 5.3: ǫ-function for isotopic states plotted as a function of the flip expectation.
For d > 2 it is not convex near the right endpoint.
with V = U1T U2 and after inserting the definition of Fe . Following our general
scheme, we have to minimize EvN (|ΦihΦ|) under the constraint given in Equation
(5.42). This is explicitly done in [150]. We will only state the result here, which
leads to the function
(
H(γ) + (1 − γ) log2 (d − 1) tr(ρFe ) ≥ d1
ǫUŪ (ρ) = (5.43)
0 tr(ρFe ) < 0
with
q q 2
1 e e
γ= 2 tr(ρF ) + [d − 1][d − tr(ρF )] . (5.44)
d
For d ≥ 3 this function is not convex (cf. Figure 5.3), hence we get
Proposition 5.3.2 For any isotropic state the Entanglement of Formation is given
as the convex hull
P P
EF (ρ) = inf{ j λj ǫUŪ (σj ) | ρ = j λj σj , PUŪ σ = σ} (5.45)
5.3.4. OO-invariant states. — The results derived for isotropic and Werner
states can be extended now to a large part of the set of OO-invariant states with-
out solving new minimization problems. This is possible, because the definition of
EF in Equation (5.13) allows under some conditions
an easy extension to a suitable set of non-symmetric
3
states. If more precisely
P a nontrivial, minimizing de-
composition ρ = j p j |ψj ihψj | of ρ is known, all
states ρ′ which are a convex linear combination of the
2 same |ψj ihψj | but arbitrary p′j have the same EF as ρ
(see [159] for proof of the statement). For the general
A scheme we have presented in Subsection 5.3.1 this im-
plies the following: If we know the pure states σ ∈ Mρ
1
C which solve the minimization problem for ǫ(ρ) in Equa-
tion (5.34) we get a minimizing decomposition of ρ in
B terms of U ∈ G translated copies of σ. This follows
-1 0 1 from the fact that ρ is by definition of Mρ the twirl of
0 σ. Hence any convex linear combination of pure states
U σU ∗ with U ∈ G has the same EF as ρ.
Figure 5.4: State space of
OO-invariant states.
1
ER (ρ)
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-1 -0.8 -0.6 -0.4 -0.2 0
tr(ρF )
Figure 5.5: Relative Entropy of Entanglement for Werner states, plotted as a func-
tion of the flip expectation.
The sets of Werner and isotropic states are just intervals and the corresponding
separable states form subintervals over which we have to perform the optimization.
Due to the convexity of the relative entropy in both arguments, however, it is
clear that the minimum is attained exactly at the boundary between entangled and
separable states. For Werner states this is the state σ0 with tr(F σ0 ) = 0, i.e. it
gives equal weight to both minimal projections. To get ER (ρ) for a Werner state ρ
we have to calculate therefore only the relative entropy with respect to this state.
Since all Werner states can be simultaneously diagonalized this is easily done and
we get:
1 + tr(F ρ)
ER (ρ) = 1 − H (5.46)
2
Similarly, the boundary point σ1 for isotropic states is given by tr(Feσ1 ) = 1 which
leads to
! !
tr(Feρ) tr(Feρ) 1 − tr(Feρ)
ER (ρ) = log2 d − 1− log2 (d − 1) − S , (5.47)
d d d
for each entangled isotropic state ρ, and 0 if ρ is separable. (S(p1 , p2 ) denotes here
the entropy of the probability vector (p1 , p2 ).)
5. Entanglement measures 76
2
d=2
ER (ρ) d=3
d=4
1.5
0.5
0
1 1.5 2 2.5 3 3.5 4
tr(ρFe )
Let us consider now OO-invariant states. As for EOF we divide the state space
into the separable square and the three triangles A, B, C; cf. Figure 5.4. The state
at the coordinates (1, d) is a maximally entangled state and all separable states on
the line connecting (0, 1) with (1, 1) minimize the relative entropy for this state.
Hence consider a particular state σ on this line. The convexity property of the
relative entropy shows immediately that σ is a minimizer for all states on the line
connecting σ with the state at (1, d). In this way it is easy to calculate ER (ρ) for
all ρ in A. In a similar way we can treat the triangle B: We just have to draw a line
from ρ to the state at (−1, 0) and find the minimizer for ρ at the intersection with
the separable border between (0, 0) and (0, 1). For all states in the triangle C the
relative entropy is minimized by the separable state at (0, 1).
An application of the scheme just reviewed is a proof that ER is not additive, i.e.
it does not satisfy Axiom E5b. To see this consider the state ρ = tr(P− )−1 P− where
P− denotes the projector on the antisymmetric subspace. It is a Werner state with
flip expectation −1 (i.e. it corresponds to the point (−1, 0) in Figure 5.4). According
to our discussion above S(ρ| · ) is minimized in this case by the separable state σ0
and we get ER (ρ) = 1 independently of the dimension d. The tensor product ρ⊗2
can be regarded as a state in S(H⊗2 ⊗ H⊗2 ) with U ⊗ U ⊗ V ⊗ V symmetry, where
U, V are unitaries on H. Note that the corresponding state space of U U V V invariant
states can be parameterized by the expectation of the three operators F ⊗ 1I, 1I ⊗ F
and F ⊗ F (cf. [159]) and we can apply the machinery just described to get the
minimizer σ e of S(ρ| · ). If d > 2 holds it turns out that
d+1 d−1
σ
e= 2
P+ ⊗ P+ + P− ⊗ P− (5.48)
2d tr(P+ ) 2d tr(P− )2
holds (where P± denote the projections onto the symmetric and antisymmetric
subspaces of H ⊗ H) and not σ e = σ0 ⊗ σ0 as one would expect. As a consequence
we get the inequality
2d − 1
ER (ρ⊗2 ) = 2 − log2 < 2 = S(ρ⊗2 |σ0⊗2 ) = 2ER (ρ). (5.49)
d
77 5.3. Entanglement measures under symmetry
The cb-norm improves the sometimes annoying property of the usual operator norm
that quantities like kT ⊗ IdB(Cd ) k may increase with the dimension d. On infinite
dimensional observable algebras kT kcb can be infinite although each term in the
supremum is finite. A particular example for a map with such a behavior is the
transposition on an infinite dimensional Hilbert space. A map with finite cb-norm
is therefore called completely bounded. In a finite dimensional setup each linear
map is completely bounded. For the transposition Θ on Cd we have in particular
kΘkcb = d. The cb-norm has some nice features which we will use frequently; this
includes its multiplicativity kT1 ⊗T2 kcb = kT1 kcb kT2 kcb and the fact that kT kcb = 1
holds for each (unital) channel. Another useful relation is kT kcb = kT ⊗ IdB(H) k,
which holds if T is a map B(H) → B(H). For more properties of the cb-norm let
us refer to [125].
Now we can define the quantity
∆(T, B) = inf kET D − IdB kcb , (6.2)
E,D
79 6.1. The general case
where the infimum is taken over all channels E : A2 → B and D : B → A1 and IdB
is again the ideal B-channel. ∆ describes, as indicated above, the smallest possible
error we have to take into account if we try to transmit one B system through
one copy of the channel T using any encoding E and decoding D. In Section 4.4,
however, we have seen that we can reduce the error if we take M copies of the
channel instead of just one. More generally we are interested in the transmission
of “codewords of length” N , i.e. B⊗N systems using M copies of the channel T .
Encodings and decodings are in this case channels of the form E : A⊗M 2 → B⊗N
⊗N ⊗M
respectively D : B → A1 . If we increase the number M of channels the error
∆(T ⊗M , B⊗N (M) ) decreases provided the rate with which N grows as a function of
M is not too large. A more precise formulation of this idea leads to the following
definition.
Definition 6.1.1 Let T be a channel and B an observable algebra. A number c ≥ 0
is called achievable rate for T with respect to B, if for any pair of sequences Mj , Nj ,
j ∈ N with Mj → ∞ and lim supj→∞ Nj /Mj < c we have
The supremum of all achievable rates is called the capacity of T with respect to B
and denoted by C(T, B).
Note that by definition c = 0 is an achievable rate hence C(T, B) ≥ 0. If on
the other hand each c > 0 is achievable we write C(T, B) = ∞. At a first look
it seems cumbersome to check all pairs of sequences with given upper ratio when
testing c. Due to some monotonicity properties of ∆, however, it can be shown that
it is sufficient to check only one sequence provided the Mj satisfy the additional
condition Mj /(Mj+1 ) → 1.
6.1.2. Simple calculations. — We see that there are in fact many different
capacities of a given channel depending on the type of information we want to
transmit. However, there are only two different cases we are interested in: B can
be either classical or quantum. We will discuss both special cases in greater detail
in the next two sections. Before we do this, however, we will have a short look on
some simple calculations which can be done in the general case. To this end it is
convenient to introduce the notations
as shorthand notations for B(Cd ) and C({1, . . . , d}) since some notations become
otherwise a little bit clumsy. First of all let us have a look on capacities of ideal
channels. If IdMf and IdCf denote the identity channels on the quantum algebra
Mf respectively the classical algebra Cf we get
log2 f
C(IdCf , Md ) = 0, C(IdCf , Cd ) = C(IdMf , Md ) = C(IdMf , Cd ) = . (6.5)
log2 d
The first equation is the channel capacity version of the no-teleportation theorem:
It is impossible to transfer quantum information through a classical channel. The
other equations follow simply by counting dimensions.
For the next relation it is convenient to associate to a pair of channels T , S the
quantity C(T, S) which arises if we replace in Definition 6.1.1 and Equation (6.2)
the ideal channel IdB by an arbitrary channel S. Hence C(T, S) is a slight gener-
alization of the channel capacity which describes with which asymptotic rate the
channel S can be approximated by T (and appropriate encodings and decodings).
6. Channel capacity 80
These generalized capacities satisfy the two step coding inequality, i.e. for the three
channels T1 , T2 , T3 we have
where we have used for the last inequality the fact that the cb-norm of a
channel is one. If c1 is an achievable rate of T1 with respect to T2 such that
lim supj→∞ Mj /Nj < c1 and c2 is an achievable rate of T2 with respect to T3
such that lim supj→∞ Nj /Kj < c2 we see that
Mj M j Nj Mj Nk
lim sup = lim sup ≤ lim sup lim sup . (6.10)
j→∞ Kj j→∞ Nj Kj j→∞ Nj k→∞ Kk
If we choose the sequences Mj , Nj and Kj clever enough (cf. the remark following
Definition 6.1.1) this implies that c1 c2 is an achievable rate for T1 with respect to
T3 and this proves Equation (6.6).
As a first application of (6.6), we can relate all capacities C(T, Md ) (and
C(T, Cd )) for different d to one another. If we choose T3 = T , T1 = IdMd and
T2 = IdMf we get with (6.5) C(T, Md ) ≤ log 2f
log2 d C(T, Mf ), and exchanging d with f
shows that even equality holds. A similar relation can be shown for C(T, Cd ). Hence
the dimension of the observable algebra B describing the type of information to be
transmitted, enters only via a multiplicative constant, i.e. it is only a choice of units
and we define the classical capacity Cc (T ) and the quantum capacity Cq (T ) of a
channel T as
Cq (T ) ≤ Cc (T ). (6.12)
Note that it is now not possible to interchange the roles of C2 and M2 . Hence
equality does not hold here.
Another useful relation concerns concatenated channels: We transmit informa-
tion of type B first through a channel T1 and then through a second channel T2 .
It is reasonable to assume that the capacity of the composition T2 T1 can not be
bigger than capacity of the channel with the smallest bandwidth. This conjecture
is indeed true and known as the “Bottleneck inequality”:
k Id⊗N
B −E(T2 T1 )
⊗M
Dkcb = k Id⊗N ⊗M
B −(ET2 )T1⊗M Dkcb . (6.14)
81 6.2. The classical capacity
This implies that ET2⊗M and D are an encoding and a decoding channel for T1 .
Something similar holds for D and T1⊗M D with respect to T2 . Hence each achievable
rate for T2 T1 is also an achievable rate for T2 and T1 , and this proves Equation (6.13).
Finally we want to consider two channels T1 , T2 in parallel, i.e. we consider the
tensor product T1 ⊗ T2 . If Ej , Dj , j = 1, 2 are encoding, respectively decoding
⊗N
channels for T1⊗M and T2⊗M such that k IdB j −Ej Tj⊗M Dj kcb ≤ ǫ holds, we get
When all channels are ideal, or when all systems involved are classical even equality
holds, i.e. channel capacities are additive in this case. However, if quantum channels
are considered, it is one of the big open problems of the field, to decide under which
conditions additivity holds.
6.2 The classical capacity
In this section we will discuss the classical capacity Cc (T ) of a channel T . There
are in fact three different cases to consider: T can be either classical or quantum
and in the quantum case we can use either ordinary encodings and decodings or a
dense coding scheme (cf. Subsection 4.1.3).
6.2.1. Classical channels. — Let us consider first a classical to classical channel
T : C(Y ) → C(X). This is basically the situation of classical information theory
and we will only have a short look here – mainly to show how this (well known)
situation fits into the general scheme described in the last section1 .
First of all we have to calculate the error quantity ∆(T, C2 ) defined in Equation
(6.2). As stated in Subsection 3.2.3 T is completely determined by its transition
probabilities Txy , (x, y) ∈ X × Y describing the probability to receive x ∈ X when
y ∈ Y was sent. Since the cb-norm for a classical algebra coincides with the ordinary
norm we get (we have set X = Y for this calculation):
X
k Id −T kcb = k Id −T k = sup (δxy − Txy ) fy (6.19)
x,f y
where the supremum in the first equation is taken over all f ∈ C(X) with kf k =
supy |fy | ≤ 1. We see that the quantity in Equation (6.20) is exactly twice the
maximal error probability, i.e. the maximal probability of sending x and getting
anything different. Inserting this quantity for ∆ in Definition 6.1.1 applied to a
classical channel T and the “bit-algebra” B = C2 , we get exactly Shannons classical
definition of the capacity of a discrete memoryless channel [138].
Hence we can apply Shannons noisy channel coding theorem to calculate Cc (T )
for a classical channel. To state it we have to introduce first some terminology.
Consider therefore a state p ∈ C∗ (X) of the classical input algebra C(X) and its
1 Please note that this implies in particular that we do not give a complete review of the
where S(p), S(q) and S(P ) denote the entropies of p, q and P . The mutual infor-
mation describes, roughly speaking, the information that p and q contain about
each other. E.g. if p and q are completely uncorrelated (i.e. Pxy = px qy ) we get
I(p, T ) = 0. If T is on the other hand an ideal bit-channel and p equally distributed
we have I(p, T ) = 1. Now we can state Shannons Theorem which expresses the
classical capacity of T in terms of mutual informations [138]:
Theorem 6.2.1 (Shannon) The classical capacity of Cc (T ) of a classical commu-
nication channel T : C(Y ) → C(X) is given by
Cc (T ) = sup I(p, T ), (6.22)
p
where the supremum is taken over all encodings and decodings of classical bits. The
term “one-shot” in this definition arises from the fact that we need apparently only
one invocation of the channel T . However many uses of the channel are hidden in
the definition of the classical capacity on the right hand side. Hence Cc,1 (T ) can
be defined alternatively in the same way as Cc (T ) except that no entanglement
is allowed during encoding and decoding, or more precisely in Definition 6.1.1 we
consider only encodings E : B(K)⊗M → C⊗N 2 which prepare separable states and
only decodings D : C2⊗N → B(H)⊗M which lead to separable observables. It is not
yet known, whether entangled codings can help to increase the transmission rate.
Therefore we only know that
1
Cc,1 (T ) ≤ Cc (T ) = sup Cc,1 (T ⊗M ) (6.24)
M∈N M
holds. One reason why Cc,1 (T ) is an interesting quantity relies on the fact that we
have, due to the following theorem by Holevo [80] a computable expression for it.
Theorem 6.2.2 The one-shot classical capacity Cc,1 (T ) of a quantum channel T :
B(H) → B(H) is given by
X X
Cc,1 (T ) = sup S pj T ∗ [ρj ] − pj S T ∗ [ρj ] , (6.25)
pj ,ρj
j j
83 6.2. The classical capacity
where the supremum is taken over all probability distributions pj and collections of
density operators ρj .
which is the quantum analog of the mutual information given in Equation (6.21).
It has a number of nice properties, in particular positivity, concavity with respect
to the input state and additivity [2] and its maximum with respect to ρ coincides
actually with Ce (T ) [18].
Theorem 6.2.3 The entanglement assisted capacity Ce (T ) of a quantum channel
T : B(H) → B(H) is given by
This example is very unusal, because all capacities discussed up to now (including
the quantum capacity as we will see in Subsection 6.3.2) can be calculated explicitly:
We get Cc,1 (T ) = Cc (T ) = (1 − ϑ) log2 (d) for the classical and Ce (T ) = 2Cc (T ) for
the entanglement enhanced classical capacity [15, 17]. Hence the gain by entangle-
ment assistance is exactly a factor two; cf. Figure 6.1.
2
classical capacity
Ce (T ) ee. classical capacity
quantum capacity
Cc (T )
Cq (T ) 1.5
0.5
0
0 0.2 0.4 0.6 0.8 1
ϑ
Figure 6.1: Capacities of the quantum erasure channel plotted as a function of the
error probability.
and in Figure 6.3 we have plotted the quotient Ce (T )/Cc,1 (T ) which gives an upper
bound on the gain we get from entanglement assistance.
2
one-shot cl. capacity
Ce (T ) entanglement enhanced cl. capacity
1.8
Cc,1 (T )
1.6
1.4
1.2
0.8
0.6
0.4
0.2
0
0 0.2 0.4 0.6 0.8 1
θ
3
Ce (T )
Cc,1 (T ) 2.9
2.8
2.7
2.6
2.5
2.4
2.3
2.2
2.1
2
0 0.2 0.4 0.6 0.8 1
θ
Figure 6.3: Gain of using entanglement assisted versus unassisted classical capacity
for a depolarizing qubit channel.
encoding. In our case this means that we will consider the constraint tr(ρaa∗ ) ≤ N
for a positive real number N > 0 and with the usual creation and annihilation opera-
tors a∗ , a. This can be rewritten as an energy constraint for a quadratic Hamiltonian;
hence this is a physically realistic restriction.
10
Ent. enhanced classical capacity
Ce (T ) one-shot classical capacity
9
Cc,1 (T )
8
0
0 0.5 1 1.5 2
k
For the entanglement enhanced capacity it can be shown now that the maximum
in Equation (6.28) is taken on Gaussian states. To get Ce (T ) it is sufficient therefore
to calculate the quantum mutual information I(T, ρ) for the Gaussian state ρN from
Equation (3.64). The details can be found in [84] and [18], we will only state the
results here. With the abbreviation
with
p
D= (N + N ′ + 1)2 − 4k 2 N (N + 1) (6.35)
for the entropy exchange. The sum of all three terms gives Ce (T ) which we have
plotted in Figure 6.4 as a function of k.
To calculate the one-shot capacity Cc,1 (T ) the optimization in Equation (6.25)
has to be calculatedPover probability distributions pj and collections of density oper-
ators ρj such that j pj tr(aa∗ ρj ) ≤ N holds. It is conjectured but not yet proven
[84] that the maximum is achieved on coherent states with Gaussian probability
distribution p(x) = (πN )−1 exp(−|x|2 /N ). If this is true we get
The result is plotted as a function of k in Figure 6.4 and the ratio G = Ce /C1
in Figure 6.5. G gives an upper bound on the gain of using entanglement assisted
versus unassisted classical capacity.
3.5
Ce (T ) N=0.1
N=1
Cc1 (T ) N=10
2.5
1.5
1
0 0.5 1 1.5 2
k
Figure 6.5: Gain of using entanglement assisted versus unassisted classical capacity
for a Gaussian amplification/attenuation channel with Nc = 0 and input noise
N = 0.1, 1, 10
Here S(T ∗ ρ) is the entropy of the output state and S(ρ, T ) is the entropy exchange
defined in Equation (6.26). It is argued [7] that J(ρ, T ) plays a role in quantum
information theory which is analogous to that of the (classical) mutual information
(6.21) in classical information theory. J(ρ, T ) has some nasty properties, however:
it can be negative [41] and it is known to be not additive [54]. To relate it to Cq (T )
it is therefore not sufficient to consider a one-shot capacity as in Shannons Theorem
(Thm 6.2.1). Instead we have to define
1
Cs (T ) = sup Cs,1 (T ⊗N ) with Cs,1 (T ) = sup J(ρ, T ). (6.46)
N N ρ
holds for any channel. In contrast to many other calculations in this field it is
particular easy to derive this relation from properties of the cb-norm. Hence we are
able to give a proof here. We start with the fact that kΘkcb = d if d is the dimension
89 6.3. The quantum capacity
⊗N (M)
EM : Md⊗M → M2 , M, N (M ) ∈ N (6.55)
To find such a sequence note first that we can look at the depolarizing channel
as a device which produces an error with probability ϑ and leaves the quantum
information intact otherwise. If more and more copies of T are used in parallel, i.e.
if M goes to infinity, the number of errors approaches therefore ϑM . In other words
the probability to have more than ϑM errors vanishes asymptotically. To see this
consider
M
X
⊗M (M)
T ⊗M = (ϑ − 1) Id +ϑd−1 tr( · )1I = (1 − ϑ)K ϑN −K TK (6.57)
K=1
(M)
where TK denotes the sum of all M -fold tensor products with d−1 tr( · )1I on N
(M)
places and Id on the N − K remaining – i.e. TK is a channel which produces
1
one-shot coherent information
Cθ (T ) transposition bound
Hamming bound
Cs,1 (T )
0.8
0.6
0.4
0.2
0
0 0.2 0.4 0.6 0.8 1
ϑ
X (M)
= (1 − ϑ)K ϑN −K TK (6.59)
K>ϑM cb
M
X (M)
≤ (1 − ϑ)K ϑN −K kTK kcb (6.60)
K>ϑM
M
X
M
≤ (1 − ϑ)K ϑN −K = R. (6.61)
K
K>ϑM
The quantity R is the tail a of Binomial series and vanishes therefore in the limit
M → ∞ (cf. e.g. Appendix B of [131]). This shows that for M → ∞ only terms
(M)
TK with K ≤ ϑM are relevant in Equation (6.57) – in other words at most ϑM
errors occur asymptotically, as stated. This implies that we need a sequence of codes
EM which encode N (M ) qubits and correct ϑM errors on M places. One way to
get such a sequence is “random coding” – the classical version of this method is
well known from the proof of Shannons theorem. The idea is, basically, to generate
error correcting codes of a certain type randomly. E.g. we can generate a sequence
of random graphs with N (M ) input and M output vertices (cf. Section 4.4). If we
can show that the corresponding codes correct (asymptotically) ϑM errors, the cor-
responding rate r = limM→∞ N (M )/M is achievable. For the depolarizing channel2
such an analysis, using randomly generated stabilizer codes shows [16, 71]
where H is the binary entropy from Equation (5.16). This bound can be further
improved using a more clever coding strategy; cf. [54].
As a third example let us consider again the Gaussian channel studied already
in Subsection 6.2.4. For Cθ (T ) we have (the corresponding calculation is not trivial
and uses properties of Gaussian channels which we have not discussed; cf. [84].)
3.5
One-shot coherent information
Cθ (T ) Transposition bound
Cs,1 (T ) 3
2.5
1.5
0.5
0
0 0.5 1 1.5 2
k
8
One-shot coherent information
Cθ (T ) Transposition bound
Cs,1 (T )7
0
0 0.2 0.4 0.6 0.8 1
k
which measures the degree with which the partial transpose of ρ fails to be positive.
Eθ can be regarded as entanglement measure although it has some drawbacks: it is
not LOCC monotone (Axiom E2), it is not convex (Axiom E3) and most severe: It
does not coincides with the reduced von Neumann entropy on pure states, which we
have considered as “the” entanglement measure for pure states. On the other hand
it is easy to calculate and it gives bounds on distillation rates and teleportation
capacities [158]. In addition Eθ can be used together with the relation between
depolarizing channels and isotropic states to derive Equation (6.54) in a very simple
way.
Chapter 7
Multiple inputs
We have seen in Chapter 4 that many tasks of quantum information which are
impossible with one-shot operations can be approximated by channels which operate
on a large number of equally prepared inputs. Typical examples are approximate
cloning, undoing noise and distillation of entanglement. There are basically two
questions which are interesting for a quantitative analysis: First we can search for
the optimal solutions for a fixed number N of input systems and second we can ask
for the asymptotic behavior in the limit N → ∞. In the latter case the asymptotic
rate, i.e. the number of outputs (of a certain quality) per input system is of particular
interest.
7.1 The general scheme
Both types of questions just mentioned can be treated (up to certain degree) in-
dependently from the (impossible) task we are dealing with and we will study
in the following the corresponding general scheme. Hence consider a channel
T : B(H⊗M ) → B(H⊗N ) which operates on N input systems and produces M
outputs of the same type. Our aim is to optimize a “figure of merit ” F(T ) which
measures the deviation of T ∗ (ρ⊗N ) from the target functional we want to approxi-
mate. The particular type of device we are considering is mainly fixed by the choice
of F(T ) and we will discuss in the following the most relevant examples. (Note that
we have considered them already on a qualitative level in Chapter 4; cf. in particular
Section 4.2 and 4.3).
7.1.1. Figures of merit. — Let us start with pure state cloning [68, 31, 32,
35, 167, 98], i.e. for each (unknown) pure input state σ = |ψihψ|, ψ ∈ H the M
clones T ∗ (σ ⊗N ) produced by the channel T should approximate M copies of the
input in the common state σ ⊗M as good as possible. There are in fact two different
possibilities to measure the distance of T ∗ (σ ⊗N ) to σ ⊗M . We can either check the
quality of each clone separately or we can test in addition the correlations between
output systems. With the notation
σ (j) = 1I⊗(j−1) ⊗ σ ⊗ 1I⊗(M−j) ∈ B(H⊗M ) (7.1)
a figure of merit for the first case is given by
Fc,1 (T ) = inf inf tr σ (j) T ∗ (σ ⊗N ) . (7.2)
j=1,... ,N σ pure
It measures the worst one particle fidelity of the output state T ∗ (σ ⊗N ). If we are
interested in correlations too, we have to choose
Fc,all (T ) = inf tr σ ⊗M T ∗ (σ ⊗N ) (7.3)
σ pure
which is again a “worst case” fidelity, but now of the full output with respect to M
uncorrelated copies of the input σ.
Instead of fidelities we can consider other error quantities like trace-norm dis-
tances or relative entropies. In general, however, we do not get significantly different
results from such alternative choices; hence we can safely ignore them. Real vari-
ants arise if we consider instead of the infima over all pure states quantities which
prefer a (possibly discrete or even finite) class of states. Such a choice leads to
“state dependent cloning”, because the corresponding optimal devices perform bet-
ter as “universal” ones (i.e. those described by the figures of merit above) on some
95 7.1. The general scheme
states but much worse on the rest. We ignore state dependent cloning in this work,
because the universal case is physically more relevant and technically more chal-
lenging. Other cases which we do not discuss either include “asymmetric cloning”,
which arises if we trade in Equation (7.2) the quality of one particular output sys-
tem against the rest (see [40]), and cloning of mixed states. The latter is much more
difficult then the pure state case and even for classical systems, where it is related
to the so called “bootstrap” technique [59], nontrivial.
Closely related to cloning is purification, i.e. undoing noise. This means we are
considering N systems originally prepared in the same (unknown) pure state σ but
which have passed a depolarizing channel
R∗ σ = ϑσ + (1 − ϑ)1I/d (7.4)
afterwards. The task is now to find a device T acting on N of the decohered systems
such that T ∗ (R∗ σ) is as close as possible to the original pure state. We have the
same basic choices for a figure of merit as in the cloning problem. Hence we define
FR,1 (T ) = inf inf tr σ (j) T ∗ (R∗ σ)⊗N (7.5)
j=1,... ,N σ pure
and
FR,all (T ) = inf tr σ ⊗M T ∗ (R∗ σ)⊗N . (7.6)
σ pure
Since Θσ is a state if σ is, we can ask again for a channel T such that T ∗ (σ ⊗N )
approximates (Θσ)⊗M . As in the two previous examples we have the choice to allow
arbitrary correlations in the output or not and we get the following figures of merit:
Fθ,1 (T ) = inf inf tr (Θσ)(j) T ∗ (σ ⊗N ) (7.8)
j=1,... ,N σ pure
and
Fθ,all(T ) = inf tr (Θσ)⊗M T ∗ (σ ⊗N ) . (7.9)
σ pure
Note that we can plug in for Θ basically any functional which maps states to states.
In addition we can combine Equation (7.5) and (7.6) on the one hand with (7.8)
and (7.9) on the other. As result we would get a measure for devices which undo
an operation R and approximate an impossible machine Θ at the same time.
7.1.2. Covariant operations. — All the functionals just defined give rise to
optimization problems which we will study in greater detail in the next Sections.
This means we are interested in two things: First of all the maximal value of F#,♮
(with # = c, R, θ and ♮ = 1, all) given by
where the supremum is taken over all channels T : B(H⊗M ) → B(H⊗N ), and second
the particular channel Tb where the optimum is attained. At a first look a complete
solution of these problems seems to be impossible, due to the large dimension of
the space of all T , which scales exponentially in M and N . Fortunately all F#,♮ (T )
admit a large symmetry group which allows in many cases the explicit calculation of
the optimal values F#,♮ (N, M ) and the determination of optimizers Tb with a certain
covariance behavior. Note that this is an immediate consequence of our decision to
restrict the discussion to “universal” procedures, which do not prefer any particular
input state.
Let us consider permutations of the input systems first: If p ∈ SN is a permu-
tation on N places and Vp the corresponding unitary on H⊗N (cf. Equation (3.7))
we get obviously T ∗ (Vp ρ⊗N Vp∗ ) = T ∗ (ρ⊗N ), hence
F#,♮ αp (T ) = F#,♮ (T ) ∀p ∈ SN with αp (T ) (A) = Vp∗ T (A)Vp . (7.11)
In other words: F#,♮ (T ) is invariant under permutations of the input systems. Sim-
ilarly we can show that F#,♮ (T ) is invariant under permutations of the output
systems:
F#,♮ βp (T ) = F(T ) ∀p ∈ SM with βp (T ) (A) = T (Vp∗ AVp ). (7.12)
To see this consider e.g. for # = c and ♮ = all
tr σ ⊗M Vp T ∗ (ρ⊗N )Vp∗ = tr Vp σ ⊗M Vp∗ T ∗ (ρ⊗N ) = tr σ ⊗M T ∗ (ρ⊗N ) . (7.13)
For the other cases similar calculations apply.
Finally, none of the F#,♮ (T ) singles out a preferred direction in the one particle
Hilbert space H. This implies that we can rotate T by local unitaries of the form
U ⊗N respectively U ⊗M without changing F#,♮ (T ). More precisely we have
F#,♮ γU (T ) = F#,♮ (T ) ∀U ∈ U(d) (7.14)
with
γU (T ) (A) = U ∗⊗N T (U ⊗M AU ∗⊗M )U ⊗N . (7.15)
The validity of Equation (7.14) can be proven in the same way as (7.11) and (7.12).
The details are therefore left to the reader.
Now we can average over the groups SN , SM and U(d). Instead of the operation
T we consider
1 X X Z
T̄ = αp βq γU (T )dU, (7.16)
N !M ! G
p∈SN q∈SM
where dU denotes the normalized, left invariant Haar measure on U(d). We see
immediately that T̄ has the following symmetry properties
αp (T̄ ) = T̄ , βq (T̄ ) = T̄ , γU (T̄ ) = T̄ , ∀p ∈ SN , ∀q ∈ SM , ∀U ∈ U(d) (7.17)
and we will call each operation T fully symmetric, if it satisfies this equation. The
concavity of F#,♮ implies immediately that it can not decrease if we replace T by
T̄ :
1 X X Z
F#,♮ (T ) = F#,♮ αp βq γU (T )dU (7.18)
N !M ! G
p∈SN q∈SM
1 X X Z
≥ F#,♮ αp βq γU (T ) dU = F#,♮ (T ). (7.19)
N !M ! G
p∈SN q∈SM
97 7.1. The general scheme
the k-component of total angular momentum (i.e. σk is the k th Pauli matrix and
P 2
σ (j) ∈ B(H⊗N ) is defined according to Equation (7.1)) and L ~2 =
k Lk . The
~ 2 is well known to be
eigenvalue expansion of L
(
X 0, 1, . . . , N/2 N even
~ =
L s(s + 1)Ps , with s = , (7.21)
j
1/2, 3/2, . . . , N/2 N odd
Ps H⊗N ∼
= Hs ⊗ KN,s , U ⊗N ψ = (πs (U ) ⊗ 1I)ψ ∀ψ ∈ Ps H⊗N . (7.24)
Since Vp and U ⊗N commute the Hilbert space KN,s carries a representation π bN,s (p)
of SN which is irreducible as well. Note that KN,s depends in contrast to Hs on the
number N of tensor factors and its dimension is (see [100] or [142] for general d)
2s + 1 N
dim KN,s = . (7.25)
N/2 + s + 1 N/2 − s
7. Multiple inputs 98
holds if Aj ⊗ Bj ∈ B(Hj ⊗ KN,j ). The operations Tsj are unital and have, according
to γU (T ) = T the following covariance properties
πs (U )T (Aj )πs (U ∗ ) = T πj (U )Aj πj (U ∗ ) ∀U ∈ SU(2). (7.28)
The classification of all fully symmetric channels T is reduced therefore to the study
of all these Tsj .
We can apply now the covariant version of Stinespring’s theorem (Theorem 3.2.2)
to find that
e V πs (U ) = πj (U ) ⊗ π
Tsj (Aj ) = V ∗ (Aj ⊗ 1I)V, V : Hs → Hj ⊗ H, e(U )V, (7.29)
particular the case for Equation (7.31) which holds for general d if we replace s, j
and l by Young frames. However, the representation theory of U(d) becomes much
more difficult. The generalization of results available for qubits (d = 2) to d > 2 is
therefore by no means straightforward.
Finally let us give a short comment on Gaussian states here. Obviously the
methods just described do not apply in this case. However, we can consider instead
of U ⊗N -covariance, covariance with respect to phase-space translations. Following
this idea some results concerning optimal cloning of Gaussian states are obtained
(see [43] and the refences therein), but the corresponding general theory is not as
far developed as in the finite dimensional case.
7.1.4. Distillation of entanglement. — Finally let us have another look at dis-
tillation of entanglement. The basic idea is quite the same as for optimal cloning:
Use multiple inputs to approximate a task which is impossible with one-shot opera-
tions. From a more technical point of view, however, it does not fit into the general
scheme proposed up to now. Nevertheless, some of the arguments can be adopted
in an easy way. First of all we have to replace the “one-particle” Hilbert space H
with a two-fold tensor product HA ⊗ HB and the channels we have to look at are
LOCC operations
⊗M ⊗M ⊗N ⊗N
T : B(HA ⊗ HB ) → B(HA ⊗ HB ); (7.33)
cf. Section 4.3. Our aim is to determine T such that T ∗ (ρ⊗N ) is for each distillable
(mixed) state ρ ∈ B∗ (HA ⊗ HB ), close to the M -fold tensor product |ΨihΨ|⊗M
of a maximally entangled state Ψ ∈ HA ⊗ HB . A figure of merit with a similar
structure as the F#,all studied above can be derived directly from the definition of
the entanglement measure ED in Section 5.1.3: We define (replacing the trace-norm
distance with a fidelity)
where the infima are taken over all maximally entangled states Ψ and all distillable
states ρ. Alternatively we can look at state dependent measures, which seem to be
particularly important if we try to calculate ED (ρ) for some state ρ. In this case we
simply get
To translate the group theoretical analysis of the last two subsections is somewhat
more difficult. As in the case of F#,♮ we can restrict the search for optimizers to
permutation invariant operations, i.e. αp (T ) = T and βp (T ) = T in the terminology
of Subsection 7.1.2. Unitary covariance
however, can not be assumed for all unitaries U of HA ⊗ HB , but only for local
ones (U = UA ⊗ UB ) in the case of FD or only for local U which leave ρ invariant for
FD,ρ . This makes the analogon of the decomposition scheme from Subsection 7.1.3
more difficult and such a study is (up to my knowledge) not yet done. A related
subproblem arises if we consider FD,ρ from Equation (7.35) for a state ρ with special
symmetry properties; e.g. an OO-invariant state. The corresponding optimization
might be simpler and a solution would be relevant for the calculation of ED .
7.2 Optimal devices
Now we can consider the optimization problems associated to the figures of merit
discussed in the last section. This means that we are searching for those devices
7. Multiple inputs 100
which approximate the impossible tasks in question in the best possible way. As
pointed out at the beginning of this Chapter this can be done for finite N and in
the limit N → ∞. The latter is postponed to the next section.
7.2.1. Optimal cloning. — The quality of an optimal, pure state cloner is defined
by the figures of merit Fc,# in Equations (7.2) and (7.3) and the group theoretic
ideas sketched in Subsection 7.1.3 allow the complete solution of this problem. We
will demonstrate some of the basic ideas in the qubit case first and state the final
result afterwards in full generality.
The solvability of this problem relies in part on the special structure of the figures
of merit Fc,# , which allows further simplifications of the general scheme sketched
in Subsection 7.1.3. If we consider e.g. Fc,1 (T ) (the other case works similarly) we
get:
Fc,1 (T ) = inf inf tr σ (j) T ∗ (σ ⊗N ) (7.37)
j=1,... ,N σ pure
= inf inf tr T (σ (j) )σ ⊗N ) (7.38)
j=1,... ,N σ pure
⊗N ⊗N
Hence Fc,# only depends on the B(H+ ) component (where H+ denotes again
⊗N
the Bose-subspace of H ) of T and we can assume without loss of generality that
T is of the form
⊗N
T : B(H⊗M ) → B(H+ ). (7.40)
⊗N
The restriction of U ⊗N to H+ is an irreducible representation (for any d) and in
⊗N
the qubit case (d = 2) we have U ⊗N ψ = πs (U )ψ with s = N/2 for all ψ ∈ H+ . The
decomposition of T from Equation (7.27) contains therefore only those summands
with s = N/2. This simplifies the optimization problem significantly, since the
number of variables needed to parametrize all relevant cloning maps according to
Equation (7.31) is reduced from 3 to 2. A more detailed (and non-trivial) analysis
shows that the maximum for Fc,1 and Fc,all is attained if all terms in (7.31) except
the one with s = N/2, j = N/2 and l = (M − N )/2 vanish. The precise result is
stated in the following theorem ([68, 31, 32] for qubits and [167, 98] for general d).
Theorem 7.2.1 For each H = Cd both figures of merit Fc,1 and Fc,all are maxi-
mized by the cloner
d[N ]
Tb∗ (ρ) = SM (ρ ⊗ 1I)SM (7.41)
d[M ]
⊗N
where d[N ], d[M ] denote the dimensions of the symmetric tensor products H+
⊗M ⊗M ⊗M
respectively H+ and SM is the projection from H to H+ . This implies for
the optimal fidelities
d−1 N M +d
Fc,1 (N, M ) = (7.42)
d N +d M
and
d[N ]
Fc,all (N, M ) = . (7.43)
d[M ]
Tb is the unique solution for both optimization problems, i.e. there is no other oper-
ation T of the form (7.40) which maximizes Fc,1 or Fc,all .
101 7.2. Optimal devices
There are two aspects of this result which deserve special attention. One is the
relation to state estimation which is postponed to Subsection 7.2.3. The second
concerns the role of correlations: It does not matter whether we are looking for the
quality of each single clone (Fc,1 ) only, or whether correlations are taken into account
(Fc,all ). In both cases we get the same optimal solution. This is a special feature of
pure states, however. Although there are no concrete results for quantum systems,
it can be checked quite easily in the classical case that considering correlations
changes the optimal cloner for arbitrary mixed states drastically.
7.2.2. Purification. — To find an optimal purification device, i.e. maximizing
FR,# , is more difficult then the cloning problem, because the simplification from
Equation (7.40) does not apply. Hence we have to consider all the summands in the
direct sum decomposition of T from Equation (7.31) and solutions are available only
for qubits. Therefore we will assume for the rest of this subsection that H = C2
holds. The SU(2) symmetry of the problem allows us to assume without loss of
generality that the pure initial state ψ coincides with one of the basis vectors.
Hence we get for the (noisy) input states of the purifier
σ
1 3 1 eβ 0
ρ(β) = exp 2β = β (7.44)
2 cosh(β) 2 e + e−β 0 e−β
1
= tanh(β)|ψihψ| + (1 − tanh(β)) 1I, ψ = |0i (7.45)
2
1
FR,#
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
one-qubit fidelity
all qubit fidelity
0.1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
ϑ
Figure 7.1: One- and all-qubit fidelities of the optimal purifier for N = 100 and
M = 10. Plotted as a function of the noise parameter ϑ.
7. Multiple inputs 102
to get
M 1I
ρ(β)⊗N = wN (s)ρs (β) ⊗ , (7.46)
s
dim KN,s
with
sinh (2s + 1)β
wN (s) = dim KN,s , (7.47)
sinh(β)(2 cosh(β))N
and
sinh(β)
ρs (β) = exp(2βL(s)
3 );
sinh (2s + 1)β
(s)
where L3 is the 3-component of angular momentum in the spin-s representation
and the dimension of KN,s is given in Equation (7.25). By (7.23) the representation
2s
space of πs coincides with the symmetric tensor product H+ . Hence we can interpret
ρs (β) as a state of 2s (indistinguishable) particles. In other words the decomposition
of ρ(β)⊗N leads in a natural way to a family of operations
⊗2s
Qs : B(H+ ) → B(H⊗N ), with Q∗s ρ(β)⊗N = ρs (β). (7.48)
We can think of the family Qs , of operations as an instrument Q which measures
the number of output systems and transforms ρ(β)⊗N to the appropriate ρs (β).
The crucial point is now that the purity of ρs (β), measured in terms of fidelities
with respect to ψ increases provided s > 1/2 holds. Hence we can think of Q as
a purifier which arises naturally by reduction to irreducible spin components [46].
Unfortunately Q does not produce a fixed number of output systems. The most
obvious way to construct a device which produces always the same number M of
outputs is to run the optimal 2s → M cloner Tb2s→M if 2s < M or to drop 2s − M
particles if M ≤ 2s holds. More precisely we can define Q b : B(H⊗M ) → B(H⊗N )
by
X
Qb ∗ ρ(β)⊗N = wN (s)Tb2s→M
∗
ρs (β) , (7.49)
s
with
( d[2s]
d[M] SM (ρ ⊗ 1I)SM for M > 2s
Tb2s→M
∗
(ρ) = (7.50)
tr2s−M ρ for M ≤ 2s.
tr2s−M denotes here the partial trace over the 2s − M first tensor factors. Applying
the general scheme of Subsection 7.1.3 shows that this is the best way to get exactly
M purified qubits [100]:
Theorem 7.2.2 The operation Q b defined in Equation (7.49) maximizes FR,1 and
FR,all . It is called therefore the optimal purifier. The maximal values for FR,1 and
FR,all are given by
X X
FR,1 (N, M ) = wN (s)f1 (M, β, s), FR,all (N, M ) = wN (s)fall (M, β, s)
s s
(7.51)
with
2f1 (M, β, s) − 1 =
2s + 1 1
2s coth (2s + 1)β − 2s coth β for 2s > M
= (7.52)
1 M + 2
(2s + 1) coth (2s + 1)β − coth β for 2s ≤ M .
2s + 2 M
103 7.2. Optimal devices
1
FR,#
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
one-qubit fidelity
all-qubit fidelity
0.1
0 10 20 30 40 50 60 70 80 90 100
N
Figure 7.2: One- and all-qubit fidelities of the optimal purifier for ϑ = 0.5 and
M = 10. Plotted as a function of N .
and
2s + 1 1 − e−2β
M ≤ 2s
M + 1 1 − e−(4s+2)β
fall (M, β, s) = −1 X (7.53)
1 − e−2β 2s K 2β(K−s)
e M > 2s.
1 − e−(4s+2)β M M
K
The expression for the optimal fidelities given here look rather complicated and
are not very illuminating. We have plotted there both quantities as a function of ϑ
(Figure 7.1) of N (Figure 7.2) and M (Figure 7.3). While the first two plots looks
quite similar the functional behavior in dependence of M seems to be very different.
The study of the asymptotic behavior in the next Section will give a precise analysis
of this observation.
7.2.3. Estimating pure states. — We have already seen in Section 4.2 that the
cloning problem and state estimation are closely related, because we can construct
an approximate cloner T from an estimator E simply by running E on the N input
states, and preparing M systems according to the attained classical information. In
this section we want to go the other way round and show that the optimal cloner
derived in Theorem 7.2.1 leads immediately to an optimal pure state estimator; cf.
[33].
To this end let us assume that E has the form (cf. Section 4.2)
X
C(X) ∋ f 7→ E(f ) = f (σ)Eσ ∈ B(H⊗N ) (7.54)
σ∈X
where X ⊂ B∗ (H) is a finite set1 of pure states. The quality of E can be measured
1 The generalization of the following considerations to continuous sets and a measure theo-
retic setup is straightforward and does not lead to a different result; i.e. we can not improve the
estimation quality with continuous observables.
7. Multiple inputs 104
1
one-qubit fidelity
FR,# all-qubit fidelity
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0 10 20 30 40 50 60 70 80 90 100
M
Figure 7.3: One- and all-qubit fidelities of the optimal purifier for ϑ = 0.5 and
N = 10. Plotted as a function of M .
its one-particle fidelity Fc,1 (TE ) coincides obviously with Fs (E). Since we can pro-
duce in this way arbitrary many clones of the same quality we see that Fs (E) is
smaller than Fc,1 (N, M ) for all M and therefore
d−1 N
Fs (E) ≤ Fc,1 (N, ∞) = lim Fc,1 (N, M ) = (7.57)
M→∞ d N +d
where we can look at Fc,1 (N, ∞) as the optimal quality of a cloner which produces
arbitrary many outputs from N input systems.
To see that this bound can be saturated consider an asymptotically exact family
X
C(XM ) ∋ f 7→ E M (f ) = f (σ)EσM ∈ B(H⊗M ), XM ⊂ S(H) (7.58)
σ∈X
of estimators, i.e. the error probabilities (4.17) vanish in the limit N → ∞. If the
EσM ∈ B(H⊗M ) are pure tensor products (i.e. the E M are realized by a “quorum”
of observables as described in Subsection 4.2.1) they can not distinguish between
the output state Tb∗ (ρ⊗N ) (which is highly correlated) and the pure product state
ρe⊗M where ρe ∈ B∗ (H) denotes the partial trace over M − 1 tensor factors (due
to permutation invariance it does not matter which factors we trace away here).
105 7.3. Asymptotic behaviour
1
Fθ,1 (N, 1) = Fθ,all (N, 1) = 1 − . (7.59)
N +2
The dependence on the number M of outputs is not interesting here,. because
the optimal device produces arbitrary many copies of the same quality.
7.3 Asymptotic behaviour
If a device, such as the optimal cloner, is given which produces M output sys-
tem from N inputs it is interesting to ask for the maximal rate, i.e. the max-
imal ratio M (N )/N in the limt N → ∞ such that the asymptotic fidelity
limN →∞ F N, M (N ) is above a certain threshold (preferably equal to one). Note
that this type of question was very important as well for distillation of entanglement
and channel capacities, but almost not computable in there. In the current context
this type of question is somewhat easier to answer. This relies on the one hand on
the group theoretical structure presented in the last section and on the other on
the close relation to quantum state estimation. We start this section therefore with
a look on some aspects of the asymptotics of mixed state estimation.
7.3.1. Estimating mixed state. — If we do not know a priori that the input
systems are in a pure state much less is known about estimating and cloning. It is
in particular almost impossible to say anything about optimality for finitely many
input systems (only if N is very small e.g. [156]). Nevertheless some strong results
are available for the behavior in the limit N → ∞ and we will give here a short
review of some of them.
One quantity, interesting to be analyzed for a family of estimators E N in the
limit N → ∞ is the variance of the E N . To state some results in this context it is
convenient to parameterize the state space S(H) or parts of it in terms of n real
2 Basically convergence must be shown here. It follows however easily from the corresponding
property of the E M .
7. Multiple inputs 106
For a good estimation strategy we expect that Vjk (x) decreases as 1/N , i.e.
N Wjk (x)
Vjk (x) ≃ , (7.62)
N
where the scaled mean quadratic error matrix Wjk (x) does not depend on N . The
task is now to find bounds on this matrix. We will state here two results taken from
[66]. To this end we need the Hellström quantum information matrix
λj(x)λk (x) − λk (x)λj (x)
Hjk (x) = tr ρ(x) (7.63)
2
which is defined in terms of symmetric logarithmic derivatives λj , which in turn are
implicitly given by
∂ρ(x) λj (x)ρ(x) + ρ(x)λj (x)
= . (7.64)
∂xj 2
Now we have the following theorem [66]:
Theorem 7.3.1 Consider a family of estimators E N , N ∈ N as described above
such that the following conditions hold:
N
1. The scaled mean quadratic error matrix N Vjk (x) converges uniformly in x to
Wjk (x) as N → ∞.
2. Wjk (x) is continuous at a point x0 = x.
3. Hjk (x) and its derivatives are bounded in a neighborhood of x0 .
Then we have
tr H −1 (x0 )W −1 (x0 ) ≤ (d − 1) (7.65)
For qubits this bound can be attained by a particular estimation strategy which
measures on each qubit separately. We refer to [66] for details.
A second quantity interesting to study in the limit N → ∞ is the error prob-
ability defined in Section 4.2; cf. Equation (4.17). For a good estimation strategy
it should go to zero of course, an additional question, however, concerns the rate
with which this happens. We will review here a result from [99] which concerns the
subproblem of estimating the spectrum. Hence we are looking now at a family of
observables E N : C(XN ) → B(H⊗N ), N ∈ N taking their values in a finite subset
XN of the set
P
Σ = {(x1 , . . . , xd ) ∈ Rd | x1 ≥ · · · ≥ xd ≥ 0, j xj = 1} (7.66)
107 7.3. Asymptotic behaviour
1
lim ln KN (∆) = inf I(s), (7.69)
N →∞ N s∈∆
where the “rate function” I : Σ → R is just the relative entropy between the two
probability vectors s and r
X
I(s) = sj (ln sj − ln rj ) . (7.70)
j
To make this statement more transparent, note that we can rephrase (7.69) as
KN (∆) ≈ exp −N inf I(s) . (7.71)
s∈∆
Since the rate function I vanishes only for s = r we see that the probability measures
KN converge (weakly) to a point measure concentrated at r ∈ Σ. The rate of this
convergence is exponential and measured exactly by the function I.
7.3.2. Purification and cloning. — Let us come back now to the discussion of
purification started in Subsection 7.2.2 (consequently we have H = C2 again). Our
aim is now to calculate the fidelities FR,# N, M (N ) in the limit N → ∞ for a
sequence M (N ), N ∈ N such that M (N )/N converges to a value c ∈ R. The crucial
7. Multiple inputs 108
step to do this is the application of Theorem 7.3.2. The density matrices ρs (β) from
Equation (7.46) can be defined alternatively by
1I
ρs (β) ⊗ = wN (s)−1 Ps ρ(β)⊗N Ps , wN (s) = tr ρ(β)⊗N Ps (7.72)
dim KN,s
for the limit of the fidelities. A precise formulation of this idea leads to the following
theorem [100]
Theorem 7.3.3 The two purification fidelities FR,# have the following limits
and
2ϑ2
if µ ≤ ϑ
Φ(µ) = lim FR,all (N, M ) = 2ϑ2
+ µ(1 − ϑ) (7.78)
2
N →∞
2ϑ
M/N →µ if µ ≥ ϑ.
µ(1 + ϑ)
If we are only interested in the quality of each qubit separately we can produce
arbitrarily good purified qubits at any rate. If on the other hand the correlations
between the output systems should vanish in the limit the rate is always zero. This
can be seen from the function Φ, which is the asymptotic all-qubit fidelity which
can be reached by a given rate µ. We have plotted it in Figure 7.4. Note finally that
the results just stated contain the rates of optimal cloning machines as a special
case; we only have to set ϑ = 1.
109 7.3. Asymptotic behaviour
1
theta=0.25
Φ(µ) theta=0.50
0.9 theta=0.75
theta=1.00
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0 0.5 1 1.5 2
µ
Figure 7.4: Asymptotic all-qubit fidelity Φ(µ) plotted as function of the rate µ.
Bibliography
[2] C. Adami and N. J. Cerf. Von Neumann capacity of noisy quantum channels.
Phys. Rev. A 56, no. 5, 3470–3483 (1997).
[10] C. H. Bennett and G. Brassard. Quantum key distribution and coin tossing.
In Proc. of IEEE Int. Conf. on Computers, Systems, and Signal Processing
(Bangalore, India, 1984), pages 175–179. IEEE, New York (1984).
[34] W. T. Buttler, R.J. Hughes, S.K. Lamoreaux, G.L. Morgan, J.E. Nordholt
and C.G. Peterson. Daylight quantum key distribution over 1.6 km. Phys.
Rev. Lett 84, 5652–5655 (2000).
[35] V. Bužek and M. Hillery. Universal optimal cloning of qubits and quantum
registers. Phys. Rev. Lett. 81, no. 22, 5003–5006 (1998).
[43] N. J. Cerf, S. Iblisdir and G. van Assche. Cloning and cryptography with
quantum continuous variables. quant-ph/0107077 (2001).
[48] J. F. Cornwell. Group theory in physics. II. Academic Press, London et. al.
(1984).
[52] D. Deutsch. Quantum theory, the Church-Turing principle and the universal
quantum computer. Proc. R. Soc. Lond. A 400, 97–117 (1985).
[55] D.P. DiVincenzo, P.W. Shor, J.A. Smolin, B.M. Terhal and A.V. Thapliyal.
Evidence for bound entangled states with negative partial transpose. Phys.
Rev. A 61, no. 6, 062312 (2000).
[58] W. Dür, J.I. Cirac, M. Lewenstein and D. Bruss. Distillability and partial
transposition in bipartite systems. Phys. Rev. A 61, no. 6, 062313 (2000).
[64] G. Giedke, L.-M. Duan, J. I. Cirac and P. Zoller. Distillability criterion for
all bipartite gaussian states. Quant. Inf. Comp. 1, no. 3 (2001).
[66] R. D. Gill and S. Massar. State estimation for large ensembles. Phys. Rev.
A61, 2312–2327 (2000).
[67] N. Gisin. Hidden quantum nonlocality revealed by local filters. Phys. Lett. A
210, no. 3, 151–156 (1996).
[71] D. Gottesman. Stabilizer codes and quantum error correction. Ph.D. thesis,
California Institute of Technology (1997). quant-ph/9705052.
[72] M. Grassl, T. Beth and T. Pellizzari. Codes for the quantum erasure channel.
Phys. Rev. A 56, no. 1, 33–38 (1997).
[77] Jim Harrington and John Preskill. Achievable rates for the gaussian quantum
channel. Phys. Rev. A 64, no. 6, 062301 (2001).
[98] M. Keyl and R. F. Werner. Optimal cloning of pure states, testing single
clones. J. Math. Phys. 40, 3283–3299 (1999).
[100] M. Keyl and R. F. Werner. The rate of optimal purification procedures. Ann
H. Poincaré 2, 1–26 (2001).
[107] U. Leonhardt. Measuring the quantum state of light. Cambridge Univ. Press,
Cambridge (1997).
Bibliography 116
[110] S. Lloyd. Capacity of the noisy quantum channel. Phys. Rev. A 55, no. 3,
1613–1622 (1997).
[114] R. Matsumoto and T. Uyematsu. Lower bound for the quantum capacity of a
discrete memoryless quantum channel. quant-ph/0105151 (2001).
[116] N. D. Mermin. Quantum mysteries revisited. Am. J. Phys. 58, no. 8, 731–734
(1990).
[117] N. D. Mermin. What’s wrong with these elements of reality? Phys. Today 43,
no. 6, 9–11 (1990).
[120] M. A. Nielsen. Continuity bounds for entanglement. Phys. Rev. A 61, no. 6,
064301 (2000).
[123] M. Ohya and D. Petz. Quantum entropy and its use. Springer, Berlin (1993).
[125] V. I. Paulsen. Completely bounded maps and dilations. Longman Scientific &
Technical (1986).
[126] A. Peres. Higher order schmidt decompositions. Phys. Lett. A 202, no. 1,
16–17 (1995).
[127] A. Peres. Separability criterion for density matrices. Phys. Rev. Lett. 77,
no. 8, 1413–1415 (1996).
117 Bibliography
[130] J. Preskill. Lecture notes for the course ‘information for physics 219/computer
science 219, quantum computation’. Caltech, Pasadena, California (1999).
www.theory.caltech.edu/people/preskill/ph229.
[143] D. Simon. On the power of quantum computation. In Proc. 35th annual sym-
posium on foundations of computer science, pages 124–134. IEEE Computer
Society Press, Los Alamitos (1994).
[145] S. Singh. The code book: The Science of Secrecy from Ancient Egypt to Quan-
tum Cryptography. Fourh Estate, London (1999).
Bibliography 118
[148] E. Størmer. Positive linear maps of operator algebras. Acta Math. 110, 233–
278 (1693).
[155] G. Vidal. Entanglement monotones. J. Mod. Opt. 47, no. 2-3, 355–376 (2000).
[160] K. G. H. Vollbrecht and R. F. Werner. Why two qubits are special. J. Math.
Phys. 41, no. 10, 6772–6782 (2000).
[164] R. Werner and M. M. Wolf. Bell inequalities and entanglement. Quant. Inf.
Comp. 1, no. 3, 1–25 (2001).