1 Introduction

More than twenty years after its discovery, the AdS/CFT correspondence [49] remains the only known example of a theory of quantum gravity. A crucial feature of this correspondence is that the emergence of a (classical) spacetime is closely related to the entanglement structure of the boundary theory. Tensor networks appear to provide useful toy models for this aspect of AdS/CFT, mirroring many of its expected properties in a setting that can be made completely mathematically rigorous [40, 61, 70, 71]. A particularly powerful model is given by random tensor networks, which have the advantage of being highly analytically tractable, while exhibiting remarkably precise agreement with gravitational calculations (even including certain exponentially small corrections) [29, 40, 42, 60, 62,63,64, 80]. Random tensors and tensor networks also arise in a number of other fields of physics, including quantum information, where they have been used to explore generic entanglement properties of quantum states [2, 5, 9,10,11, 17, 18, 20,21,22, 37, 39, 46, 50, 57, 79], and condensed matter physics, e.g., in the study of random circuits and measurements [44, 47, 48, 53, 55, 77, 81, 82].

The most basic version of a random tensor network is characterized by a choice of bond dimension D and a graph \(G = (V,E)\), where the vertices \(V = V_b \sqcup V_\partial \) of G are partitioned into “bulk” vertices \(V_b\) and “boundary” vertices \(V_\partial \). To each edge \(e \in E\), we associate a maximally entangled state

$$\begin{aligned} \frac{1}{\sqrt{D}} \sum _{i=1}^D \vert ii\rangle \end{aligned}$$
(1.1)

on two D-dimensional Hilbert spaces, one of which is associated with each endpoint of e; each vertex \(v \in V\) is therefore associated with a Hilbert space \({\mathcal {H}}_v\) of dimension \(D^{\deg (v)}\). Finally, we project each bulk vertex \(v_b \in V_b\) onto a Haar random state \(\vert \psi _{v_b}\rangle \in {\mathcal {H}}_{v_b}\). The resulting “random tensor network state” lives in the Hilbert space \({\mathcal {H}}_\partial = \otimes _{v_\partial } {\mathcal {H}}_{v_\partial }\) associated with the boundary vertices \(v_\partial \in V_\partial \), as shown in Fig. 1. Such states, obtained by projecting maximally entangled edge states onto (not necessarily random) bulk vertex states, are also known as projected entangled pair states (PEPS) in the condensed matter literature [23, 75, 76].
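This construction is small enough to carry out numerically. The following sketch (our own illustration, not from the paper) uses the smallest interesting graph, a single bulk vertex joined to two boundary vertices, and builds the random tensor network state by projecting two maximally entangled link states onto a Gaussian bulk tensor:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # bond dimension

# Maximally entangled link state of Eq. (1.1) as a D x D tensor:
# link[i, j] = delta_{ij} / sqrt(D).
link = np.eye(D) / np.sqrt(D)

# Toy graph: boundary vertex a -- bulk vertex x -- boundary vertex b,
# so H_x = C^D (x) C^D (one tensor factor per half-edge at x).
# Haar-random bulk vector |psi_x> (normalized Gaussian tensor).
psi = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
psi /= np.linalg.norm(psi)

# Project the bulk factors of |link_ax> (x) |link_xb> onto <psi_x|:
# rho_vec[i, l] = sum_{j,k} conj(psi[j, k]) link[i, j] link[k, l].
rho_vec = np.einsum('jk,ij,kl->il', psi.conj(), link, link)

# The boundary state lives on H_a (x) H_b; reduce it to vertex a.
rho = np.outer(rho_vec.reshape(-1), rho_vec.conj().reshape(-1))
rho_A = np.einsum('iljl->ij', rho.reshape(D, D, D, D))
rho_A /= np.trace(rho_A).real
```

For maximally entangled links, the reduced state on one boundary vertex is proportional to a Wishart matrix \(\psi \psi ^\dagger \), which is the mechanism behind the spectral results discussed below.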

Fig. 1

The basic structure of a random tensor network

To characterize the typical entanglement structure of random tensor network states, we can compute the von Neumann entropy \(H(\rho _A)\) of the reduced density matrix \(\rho _A\) on a subset \(A \subset V_\partial \) of the boundary vertices. In the limit where the bond dimension D is very large, this entropy can be shown to converge with high probability to \(\log (D) \left|\gamma _A\right|\), where \(\gamma _A\) is the set of edges crossing the minimal cut (for the moment, assumed to be the unique such cut) in the graph separating A from its boundary complement \(V_\partial {\setminus } A\) (see Fig. 2a). This formula is closely analogous to the Ryu–Takayanagi (RT) formula and its generalizations in AdS/CFT, in which entropies are given by the area of minimal surfaces homologous to a subregion of the conformal boundary [32, 33, 41, 45, 65, 66]. Indeed, this connection is one of the primary reasons for studying tensor networks as a toy model of quantum gravity. However, if one goes beyond the von Neumann entropy and studies finer details of the entanglement spectrum of random tensor network states, significant divergences from holography begin to appear, as we will see shortly.

When studying entanglement in either random tensor networks or quantum gravity, or more generally in quantum field theory, it is often convenient to study kth Rényi entropies \(H_k(\rho _A) = (1-k)^{-1}\log {{\,\textrm{tr}\,}}[\rho _A^k]\). For integer \(k > 1\), these are more amenable to direct computation than the von Neumann entropy, and one can extract the von Neumann entropy by analytic continuation to \(k = 1\). The computation of Rényi entropies in random tensor network models is very similar to holographic computations. In both cases, the idea is to use the replica trick—essentially, this is the observation that \({{\,\textrm{tr}\,}}[\rho _A^k] = {{\,\textrm{tr}\,}}[\tau \rho _A^{\otimes k}]\) where \(\tau \) is an operator which permutes the k copies of A cyclically. In the holographic computation, this can be written as a path integral, on k copies of the theory, glued together in an appropriate way. By the holographic dictionary, this path integral can then be computed by the action of a bulk geometry with certain boundary conditions [45]. For random tensor networks, one finds that \({{\,\textrm{tr}\,}}[\rho _A^k]\) concentrates around its expectation and can be computed as the partition function of a classical spin model on the bulk vertices, with boundary conditions dictated by the choice of boundary subsystem [40]. This computation will be explained in detail in Sect. 2.1. From these computations, one finds that holographic CFT states and random tensor network states behave quite differently when \(k \ne 1\). For random tensor network states, the Rényi entropies are approximately independent of k in the large D limit, meaning their entanglement spectrum is close to “flat”; the boundary state \(\rho _A\) is approximately maximally mixed within a certain subspace. 
On the other hand, CFT states that are dual to semiclassical spacetime geometries have Rényi entropies that vary non-trivially with k, meaning their entanglement spectrum contains a wide range of eigenvalues that contribute significantly to the state. Recently, it has been argued that the class of “fixed-area states” in AdS/CFT do have flat spectra and more generally have an entanglement structure that closely matches random tensor network states [8, 14, 27, 29, 54]. Fixed-area states have a well-defined semiclassical geometry associated with a fixed spatial slice; however, thanks to the uncertainty principle, they cannot describe a single semiclassical spacetime geometry [14]. We discuss the connection between these states and random tensor networks in more detail in “Appendix A.”
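The replica identity \({{\,\textrm{tr}\,}}[\rho _A^k] = {{\,\textrm{tr}\,}}[\tau \rho _A^{\otimes k}]\) underlying these computations is easy to verify numerically. The following sketch (our own check, for \(k = 3\) and a random \(3 \times 3\) density matrix) builds the cyclic permutation operator explicitly:

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(1)
d, k = 3, 3

# Random normalized density matrix.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

# Cyclic permutation operator R(tau), tau = (1 2 3), on (C^d)^{(x)3}:
# matrix elements R[(a,b,c),(i,j,l)] = delta_{al} delta_{bi} delta_{cj}.
I = np.eye(d)
R = np.einsum('al,bi,cj->abcijl', I, I, I).reshape(d**k, d**k)

rho_k = reduce(np.kron, [rho] * k)  # rho^{(x)3}
lhs = np.trace(np.linalg.matrix_power(rho, k)).real  # tr[rho^3]
rhs = np.trace(R @ rho_k).real                       # tr[R(tau) rho^{(x)3}]
```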

In the random tensor network model, the flatness of the spectrum can be traced to the maximally entangled states used as “link states” (see Eq. (1.1) and Fig. 1b) on the edges of the graph, which themselves have flat entanglement spectra. To take results about random tensor networks beyond the fixed-area state regime, it is natural—see, e.g., discussion in [14, 40]—to replace the maximally entangled link states by general states

$$\begin{aligned} \vert \phi _e\rangle = \sum _{i = 1}^D \sqrt{\lambda _{e,i}} \vert ii\rangle . \end{aligned}$$
(1.2)

The variation in the entanglement spectrum \(\lambda _{e,i}\) represents the fluctuations in area in semiclassical gravitational states. We introduce this model in Sect. 2. The goal of this paper will be to understand the entanglement spectra of random tensor networks with such non-trivial link states and spectra in a number of different regimes. In fact, a number of our results apply in an even more general setting, where the product of link states \(\otimes _e \vert \phi _e\rangle \) is replaced by a completely general “background state” density matrix \(\phi _V\). From a quantum gravity perspective, random tensor networks with general background states are needed to model bulk quantum fields in AdS/CFT, which yield significant physical consequences when the bulk entropies are large [6, 7]. Beyond holography, they also play a central role in the quantum information processing tasks of multiparty state merging and split transfer [26], a connection we elaborate on in “Appendix B.”

For a random tensor network with non-trivial link states, if there is a single minimal cut \(\gamma _A\) for a subsystem A, then the resulting density matrix \(\rho _A\) will have an entanglement spectrum that converges to that of \(\left|\gamma _A\right|\) copies of the link state along the minimal cut as \(D \rightarrow \infty \); indeed, this was implicit in [40]. A more complex question, and the main focus of this work, is the case where there are two minimal cuts, as in Fig. 2b. This situation is motivated by questions in holography: It can be used to study the phase transition at the point where there are two competing minimal surfaces [6, 54], which has been relevant to recent advances on the black hole information paradox [1, 4, 51, 59, 60].

Fig. 2

Tensor networks with one and two minimal cuts

For our first main result, in Sect. 3, we consider a family of link states with increasing bond dimension D as in Eq. (1.2). For each D, the link state has an associated distribution

$$\begin{aligned} \mu _e^{(D)} = \frac{1}{D}\sum _{i=1}^D \delta _{D\lambda _{e,i}}, \end{aligned}$$

where \(\delta _{x}\) is a \(\delta \)-distribution centered at x, so this is the discrete probability distribution given by a uniform distribution over the spectrum of the link state. Note that \(\lambda _{e,i}=\lambda _{e,i}^{(D)}\) also depends on D. We then require that the moments

$$\begin{aligned} m_k^{(D)} = D^{k-1} \sum _{i=1}^D \lambda _{e,i}^k \end{aligned}$$

of the distributions \(\mu _e^{(D)}\) converge to a finite limit as \(D\rightarrow \infty \) for all positive integers k. We refer to this as the bounded spectral variation limit. This means, in particular, that we must have \(\lambda _{e,i} = {\mathcal {O}}(1/D)\) for all but a vanishing fraction of the eigenvalues \(\lambda _{e,i}\). If we let \(\gamma _A\) denote a minimal cut for a boundary domain A, then we may similarly define the associated distribution

$$\begin{aligned} \mu _{\gamma _A}^{(D)} =\frac{1}{D^{|\gamma _A|}} \sum _{\{i_e\}_{e \in \gamma _A}} \delta _{D^{|\gamma _A|} \prod _{e \in \gamma _A} \lambda _{e,i_e}}. \end{aligned}$$
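To make the bounded spectral variation limit concrete, consider a toy linear spectrum \(\lambda _{e,i} = 2i/(D(D+1))\) (our example, not from the paper): the rescaled eigenvalues \(D\lambda _{e,i}\) fill out the interval [0, 2], and the moments \(m_k^{(D)}\) converge to \(\int _0^1 (2t)^k \, \textrm{d}t = 2^k/(k+1)\):

```python
import numpy as np

def moments(D, kmax=4):
    # Toy linear spectrum lambda_i = 2i / (D(D+1)), i = 1..D (sums to 1);
    # every eigenvalue is O(1/D), as required in the text.
    lam = 2 * np.arange(1, D + 1) / (D * (D + 1))
    # m_k^{(D)} = D^{k-1} * sum_i lambda_i^k, the k-th moment of mu_e^{(D)}.
    return np.array([D**(k - 1) * np.sum(lam**k) for k in range(1, kmax + 1)])

# Limiting moments: m_k -> int_0^1 (2t)^k dt = 2^k / (k + 1).
limit = np.array([2**k / (k + 1) for k in range(1, 5)])
err_small = np.abs(moments(100) - limit).max()
err_large = np.abs(moments(10000) - limit).max()
```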

By assumption, the moments of the distribution \(\mu _{\gamma _A}^{(D)}\) converge to a finite limit (because those of each distribution \(\mu _e^{(D)}\) do), implying that \(\mu _{\gamma _A}^{(D)}\) converges weakly to some distribution \(\mu _{\gamma _A}\). Now, consider the empirical distribution of the spectrum of the reduced state \(\rho _A\), which is the (random) distribution

$$\begin{aligned} \mu _A^{(D)} = \frac{1}{D^{\left|\gamma _A\right|}} \sum _{\lambda \in {{\,\textrm{spec}\,}}(\rho _A)} \delta _{D^{\left|\gamma _A\right|} \lambda }. \end{aligned}$$

In the case where there are two non-intersecting minimal cuts \(\gamma _{A,1}\) and \(\gamma _{A,2}\), we find that \(\mu ^{(D)}_A\) converges weakly, in probability, to a limiting distribution \(\mu _A\) given by a free product \({{\,\textrm{MP}\,}}(1) \boxtimes \mu _{\gamma _{A,1}} \boxtimes \mu _{\gamma _{A,2}}\), a notion from the theory of free probability. Here, \({{\,\textrm{MP}\,}}(1)\) is the Marchenko–Pastur distribution of parameter 1. The situation is summarized by our first main result, which we state more precisely as Theorem 3.4:

Theorem

(Informal). Consider a family of link states in the bounded spectral variation limit. If there is a unique minimal cut \(\gamma _A\) for a boundary subsystem A, then \(\mu _A^{(D)}\) converges weakly, in probability, to \(\mu _{\gamma _A}\), while if there are exactly two non-intersecting minimal cuts \(\gamma _{A,1}\) and \(\gamma _{A,2}\), it converges to \({{\,\textrm{MP}\,}}(1) \boxtimes \mu _{\gamma _{A,1}} \boxtimes \mu _{\gamma _{A,2}}\).
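A minimal sanity check of the two-cut case (our toy setup, not from the paper): take the three-vertex chain with boundary vertex a, bulk vertex x, and boundary vertex b, with maximally entangled links of dimension D on both edges. Both cuts are minimal with flat spectra, \(\mu _{\gamma _{A,1}} = \mu _{\gamma _{A,2}} = \delta _1\), so the theorem predicts that \(D \cdot {{\,\textrm{spec}\,}}(\rho _A)\) approaches \({{\,\textrm{MP}\,}}(1)\), whose kth moments are the Catalan numbers 1, 2, 5, ...:

```python
import numpy as np

rng = np.random.default_rng(2)
D = 1000

# For the chain a -- x -- b with maximally entangled links, projecting the
# bulk vertex onto a Gaussian tensor gives rho_A proportional to
# psi psi^dagger, with psi the D x D matrix of bulk amplitudes.
psi = (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))) / np.sqrt(2)
W = psi @ psi.conj().T
rho_A = W / np.trace(W).real

# Moments of the rescaled empirical spectrum D * spec(rho_A) should
# approach the Catalan numbers 1, 2, 5 (moments of MP(1)) as D grows.
scaled = D * np.linalg.eigvalsh(rho_A)
m = [np.mean(scaled**k) for k in (1, 2, 3)]
```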

In Sect. 3.3, we briefly discuss the closely related problem of computing the entanglement negativity spectrum in the same regime.

For our second main result, in Sect. 4, we investigate a different regime, in which link states are allowed to have unbounded spectral variation in the large D limit. This is the more relevant regime for holography, where fluctuations in the area of a surface (in Planck units) grow sublinearly but without bound in the semiclassical limit. When the spectral variation is unbounded, there still exists a reasonable notion of a minimal cut that determines the entanglement spectrum of the boundary state, but the key difference is that minimality must now be defined entropically, rather than geometrically. In fact, the underlying graph essentially plays no role in this regime. A sensible way to formalize this would be to use one-shot entropies: We might say that a cut is “minimal” if the rank of the state along the cut is smaller than the inverse of the largest element in the entanglement spectrum along any other cut. This condition, while intuitive, is a little too restrictive, and one can use smooth conditional entropies to get a weaker, but still meaningful, condition. In Definition 4.6, following [6] we introduce the notion of a unique \((\varepsilon , K)\)-minimal cut \(\Gamma _A\), where \(\Gamma _A\subseteq V\) is a subset of vertices such that the \(\varepsilon \)-smooth conditional min-entropy of \(\Gamma _A\) compared to competing cuts either contained in \(\Gamma _A\) or containing \(\Gamma _A\) is lower bounded by K. Intuitively, we want K to be as large as possible, and indeed, we show in Theorem 4.7 that the spectrum of a reduced density matrix \(\rho _A\) will be close to the spectrum along a unique \((\varepsilon , K)\)-minimal cut with an error exponentially small in K.

The situation is more complicated if there are two non-intersecting \((\varepsilon , K)\)-minimal cuts, defined in Definition 4.10 by requiring that both cuts satisfy an \((\varepsilon , K)\)-minimality property for the same \(\varepsilon \) and K. Here, we need to impose a regularity condition on the link states, motivated by the example of a link state that is a (large) number of copies of a fixed state:

$$\begin{aligned} \vert \phi _e\rangle = \vert \phi _0\rangle ^{\otimes n} \end{aligned}$$
(1.3)

where \(\vert \phi _0\rangle \) is a bipartite state with local dimension d, so the total bond dimension is \(D = d^n\). In this case, if \(\phi _0\) is not maximally entangled, the measure \(\mu _e^{(D)}\) will not converge with increasing n, as the spectrum is not concentrated around \(\frac{1}{D}\). However, using a different measure, the entanglement spectrum of \(\vert \phi _e\rangle \) satisfies a central limit theorem. Namely, if \(\vert \phi _e\rangle \) has Schmidt coefficients \(\{\lambda _i\}\), and \(\vert \phi _0\rangle \) has entanglement entropy \(H_0\), then the distribution of the random variable \(X^{(n)}\), which takes the value \(\frac{1}{\sqrt{n}}(\log (\frac{1}{\lambda _i}) - nH_0)\) with probability \(\lambda _i\), converges weakly to a centered Gaussian distribution as \(n\rightarrow \infty \). Since we subtracted the entropy \(nH_0\), the random variable \(X^{(n)}\) has expectation zero. Its variance can be thought of as a measure of the fluctuation of \(\log (\frac{1}{\lambda _i})\) around the entropy and is relevant for second-order asymptotic rates in quantum information processing tasks [72]. We take this central limit theorem as motivation for a regularity condition on the spectra of general link and background states, and we allow the states to have varying bond dimensions D(n), e.g., \(D(n) \sim d^n\). To be more precise, we define the following measure along a cut \(\gamma _A\):

$$\begin{aligned} \nu _{\gamma _A}^{(n)} = \sum _{\{i_e\}_{e \in \gamma _A}} \prod _{e \in \gamma _A} \lambda _{e,i_e} \delta _{\frac{1}{\sqrt{n}}\left( \sum _{e \in \gamma _A} \log \frac{1}{\lambda _{e,i_e}} - H(n)\right) }, \end{aligned}$$

where H(n) is a function of n (which one can think of as being approximately equal to the entanglement entropy along the cut \(\gamma _A\)), and we assume that \(\nu _{\gamma _A}^{(n)}\) converges weakly to a continuous distribution. Note that the distribution described above reduces to the distribution of \(X^{(n)}\) if the link state is of the form in Eq. (1.3), and is very different from the distribution \(\mu _A^{(D)}\) that we study for link states with bounded spectral variation. Similarly, we let

$$\begin{aligned} \nu _{A}^{(n)} = \sum _{\lambda \in {{\,\textrm{spec}\,}}(\rho _A)} \lambda \, \delta _{\frac{1}{\sqrt{n}}(\log (\frac{1}{\lambda }) - H(n))} \end{aligned}$$

be the corresponding (random) distribution for the boundary spectrum. Knowledge of this distribution allows computation of the entropy of \(\rho _A\) (and fluctuations) as a correction to H(n). In the random tensor network setting, we find that in a situation with two competing minimal cuts, the random tensor network will ‘select’ the minimal parts of each cut, in the following sense:

Theorem

(Informal). Assume that we have a family of (states with) two non-intersecting \((\varepsilon (n),K(n))\)-minimal cuts \(\gamma _{A,1}\) and \(\gamma _{A,2}\), as defined in Definition 4.10. Suppose the entanglement spectra along the two minimal cuts are such that \(\nu _{\gamma _{A,1}}^{(n)}\) and \(\nu _{\gamma _{A,2}}^{(n)}\) converge weakly to continuous measures \(\nu _1\) and \(\nu _2\), respectively, as \(n \rightarrow \infty \). Then, \(\nu _{A}^{(n)}\) converges weakly, in probability, to \(\textrm{min}_*(\nu _1,\nu _2)\), the pushforward of the product measure \(\nu _1 \times \nu _2\) along the function \(\min :\mathbbm {R}\times \mathbbm {R}\rightarrow \mathbbm {R}\). In other words, for any bounded continuous function \(f \in C_b(\mathbbm {R})\),

$$\begin{aligned} \sum _{\lambda \in {{\,\textrm{spec}\,}}(\rho _A)} \lambda \, f\left(\frac{\log (\frac{1}{\lambda }) - H(n)}{\sqrt{n}}\right) \rightarrow \int \int f(\min (x_1,x_2)) \, \textrm{d}\nu _1(x_1) \textrm{d}\nu _2(x_2) \end{aligned}$$
(1.4)

in probability.
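The pushforward \(\min _*(\nu _1, \nu _2)\) is easy to explore by Monte Carlo. In the sketch below (an illustration of the measure-theoretic operation only; the Gaussian choices for \(\nu _1, \nu _2\) are ours and not derived from any tensor network), the sampled mean of \(\min (x_1, x_2)\) matches the closed form for two independent normals:

```python
import numpy as np
from math import erf, exp, pi, sqrt

rng = np.random.default_rng(4)
N = 200000

# Illustrative choice: nu_1 = N(0,1), nu_2 = N(1,1).
x1 = rng.normal(0.0, 1.0, N)
x2 = rng.normal(1.0, 1.0, N)
mc_mean = np.minimum(x1, x2).mean()  # E[min] under min_*(nu_1, nu_2)

# Closed form for independent normals: with theta = sqrt(s1^2 + s2^2),
# E[min] = mu1*Phi((mu2-mu1)/theta) + mu2*Phi((mu1-mu2)/theta)
#          - theta*phi((mu2-mu1)/theta).
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF
phi = lambda z: exp(-z * z / 2) / sqrt(2 * pi)  # standard normal density
theta = sqrt(2.0)
exact = 0.0 * Phi(1 / theta) + 1.0 * Phi(-1 / theta) - theta * phi(1 / theta)
```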

We also show in Corollary 4.14 that, up to an error of size \({\mathcal {O}}(\log (n))\), this allows us to compute the entropy of \(\rho _A\).
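The central limit theorem for the product link states of Eq. (1.3) can likewise be checked by direct sampling (our own sketch; the Schmidt probabilities (0.7, 0.3) and all parameters are example choices). Drawing an eigenvalue of \(\vert \phi _0\rangle ^{\otimes n}\) with probability \(\lambda _i\) amounts to n i.i.d. draws from the single-copy spectrum, and \(X^{(n)}\) should have mean zero and variance \(\textrm{Var}(\log (1/\lambda ))\):

```python
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.7, 0.3])    # Schmidt coefficients of |phi_0> (example)
logs = np.log2(1 / p)       # log(1/lambda), base 2 as in the paper
H0 = np.dot(p, logs)        # entanglement entropy of |phi_0>
var0 = np.dot(p, logs**2) - H0**2

n, N = 400, 20000
# Each sample of X^(n): sum n i.i.d. draws of log(1/lambda), center by
# n*H0, rescale by 1/sqrt(n).
draws = rng.choice(logs, size=(N, n), p=p)
X = (draws.sum(axis=1) - n * H0) / np.sqrt(n)
```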

An analogous statement to (1.4) was previously conjectured in [54] to be valid in quantum gravity and justified at a physics level of rigor in [6]. Our proof of (1.4) is closely related to the arguments in [6], but converting the physical arguments into a mathematical proof and carefully controlling the relevant sources of error requires significant technical work, and constitutes the majority of Sect. 4.

In “Appendix A,” we relate our results to the study of holographic gravity computations, particularly in situations with competing minimal surfaces. This is not needed to understand the results of this work, but provides additional motivation for the relevance of our results and clarifies the way in which random tensor networks provide a useful toy model for holographic quantum gravity. Then, in “Appendix B,” we discuss the relation of our results to split transfer and its relevance in holography. This is again not needed to understand the main text, but rather points to topics to which our work is connected. Next, in “Appendix C,” we prove two technical lemmas on joint smoothing that are necessary to analyze competing minimal cuts in the unbounded spectral variation regime. Finally, in “Appendix D,” we prove that a certain function on the symmetric group is a metric; this is not directly used in the current work, but may be of independent interest for the study of random tensor networks.

After the completion of this manuscript, we became aware of independent work by Jinzhao Wang [78] on the use of free products to describe entanglement in the toy model of quantum gravity introduced in [60] that has strong overlap with the ideas in Sect. 3.2 and “Appendix A.”

1.1 Notation and Conventions

For \(k \in \mathbbm {N}\), we let \([k] = \{1, \ldots , k\}\), and we denote by \(S_k\) the group of permutations of this set. We denote by \(\Vert a\Vert _p\) the \(\ell _p\)-norm of a vector a, defined by \(\Vert a\Vert _p^p = \sum _i \left|a_i\right|^p\). If a and b are vectors of different dimension, we extend the shorter vector by zeros and still write \(\Vert a-b\Vert _p\) for their distance. For example, if \(a \in \mathbbm {C}^{d_1}\) and \(b \in \mathbbm {C}^{d_2}\) with \(d_2 > d_1\), we write \(\Vert a - b\Vert _1 = \sum _{i=1}^{d_1} \left|a_i - b_i\right| + \sum _{i=d_1 + 1}^{d_2} \left|b_i\right|\). We also denote by \(\Vert A\Vert _p\) the Schatten p-norm of an operator A, defined as the \(\ell _p\)-norm of the singular values \(\{s_i\}\) of A; it can also be computed by \(\Vert A\Vert _p^p = {{\,\textrm{tr}\,}}((A^\dagger A)^{p/2})\). The operator norm is given by \(\Vert A\Vert _{\infty } = \max \{s_i\}\). If \({\mathcal {H}}\) is a Hilbert space, we introduce the notation \({\mathcal {P}}({\mathcal {H}})\) for the set of positive semidefinite operators on \({\mathcal {H}}\). We often refer to positive semidefinite operators as “density operators” or “states,” without requiring them to be normalized to unit trace. We write \({\mathcal {P}}_{\scriptscriptstyle {=}}({\mathcal {H}})\) for the set of \(\rho \in {\mathcal {P}}({\mathcal {H}})\) with unit trace, \({{\,\textrm{tr}\,}}[\rho ] = 1\), and we denote by \({\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}})\) the set of subnormalized states, that is, \(\rho \in {\mathcal {P}}({\mathcal {H}})\) with \({{\,\textrm{tr}\,}}[\rho ] \le 1\). We use the convention that for a vector \(\vert \phi \rangle \), we denote the corresponding pure state by \(\phi \), so \(\phi = \mathinner {|\phi \rangle \langle \phi |}\). 
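The zero-padding convention for \(\Vert a - b\Vert _p\) can be written as a small helper (our own code, simply implementing the definition above):

```python
import numpy as np

def lp_dist(a, b, p=1):
    """l_p distance between vectors of possibly different dimension,
    padding the shorter one with zeros (convention from the text)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n = max(a.size, b.size)
    a = np.pad(a, (0, n - a.size))
    b = np.pad(b, (0, n - b.size))
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))
```

For instance, the distance between (1, 0.5) and (1) is 0.5, matching the example in the text.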
Given a positive semidefinite operator \(\rho \), we denote by \({{\,\textrm{spec}\,}}(\rho )\) the vector containing its spectrum in non-increasing order, and we write \({{\,\textrm{spec}\,}}_+(\rho )\) for the nonzero part of the spectrum. It is a well-known fact that

$$\begin{aligned} \Vert {{\,\textrm{spec}\,}}(\rho ) - {{\,\textrm{spec}\,}}(\sigma )\Vert _1 = \Vert {{\,\textrm{spec}\,}}_+(\rho ) - {{\,\textrm{spec}\,}}_+(\sigma )\Vert _1 \le \Vert \rho - \sigma \Vert _1 \end{aligned}$$
(1.5)

(In the second expression, we use the convention for the distance of vectors of possibly different dimension introduced above.) If A is a quantum system with Hilbert space \({\mathcal {H}}_A\), we write \({\mathcal {P}}(A) = {\mathcal {P}}({\mathcal {H}}_A)\), \({\mathcal {P}}_{\scriptscriptstyle {=}}(A) = {\mathcal {P}}_{\scriptscriptstyle {=}}({\mathcal {H}}_A)\), and \({\mathcal {P}}_{\scriptscriptstyle {\le }}(A) = {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}}_A)\), and we use subscripts, e.g., \(\rho _A \in {\mathcal {P}}(A)\), to indicate which system and Hilbert space a quantum state is associated with. For a bipartite state \(\rho _{AB} \in {\mathcal {P}}(AB)\) defined on a tensor product of Hilbert spaces \({\mathcal {H}}_{AB} = {\mathcal {H}}_A \otimes {\mathcal {H}}_B\), we obtain the reduced state or reduced density operator \(\rho _A \in {\mathcal {P}}(A)\) by taking the partial trace over the complement: \(\rho _A = {{\,\textrm{tr}\,}}_B[\rho _{AB}]\), and similarly in multipartite situations. Finally, we adopt the standard notation that if \(\mu _n\) is some sequence of finite measures, we write \(\mu _n \Rightarrow \mu \) if \(\mu _n\) converges weakly (or in distribution) to a finite measure \(\mu \), meaning that for any bounded continuous function \(f \in C_b(\mathbbm {R})\),

$$\begin{aligned} \int f(x) \textrm{d}\mu _n(x) \rightarrow \int f(x) \textrm{d}\mu (x). \end{aligned}$$
(1.6)

If \(\mu _n\) is a sequence of random finite measures on \(\mathbbm {R}\), we say that the sequence \(\mu _n\) converges weakly, in probability, to a finite measure \(\mu \) if, for any bounded continuous function \(f \in C_b(\mathbbm {R})\), the left-hand side of Eq. (1.6) converges in probability to the right-hand side, i.e., if for every \(\varepsilon > 0\) we have that

$$\begin{aligned} \lim _{n \rightarrow \infty } \Pr \left( \left|\int f(x) \textrm{d}\mu _n(x) - \int f(x) \textrm{d}\mu (x) \right| \ge \varepsilon \right) = 0. \end{aligned}$$

In this situation, we will also write \(\mu _n \Rightarrow \mu \), in probability. All logarithms are to base 2.
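The "well-known fact" of Eq. (1.5) can be checked numerically for random density matrices (a sketch with our own test states; note that `eigvalsh` returns eigenvalues in ascending order, so we reverse them to get the non-increasing convention used here):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 6

def rand_state(d):
    # Random normalized density matrix.
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

rho, sigma = rand_state(d), rand_state(d)

# Left-hand side of Eq. (1.5): l1 distance of non-increasing spectra.
spec_dist = np.abs(np.sort(np.linalg.eigvalsh(rho))[::-1]
                   - np.sort(np.linalg.eigvalsh(sigma))[::-1]).sum()

# Right-hand side: trace norm ||rho - sigma||_1 (sum of |eigenvalues|,
# since the difference is Hermitian).
trace_dist = np.abs(np.linalg.eigvalsh(rho - sigma)).sum()
```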

2 Random Tensor Network States

We first review the random tensor network model, closely following [29, 40]. Let \(G = (V,E)\) be a connected undirected graph, and let \(V = V_\partial \sqcup V_b\) be a partition of the vertices into a set of boundary vertices \(V_\partial \) and bulk vertices \(V_b\). If \(A \subseteq V_\partial \), we write \({\bar{A}} = V_\partial {\setminus } A\). We assign a bond dimension \(D_e\) to each edge, and we will consider families of states with increasing bond dimensions; for example, we may take \(D_e = D\) for all edges and let D increase. For each vertex \(x \in V\), let \(\partial \{x\}\) denote the set of edges \(e = (xy) \in E\) connecting x to some \(y \in V\). We define Hilbert spaces \({\mathcal {H}}_{e,x} = \mathbbm {C}^{D_e}\) for \(e \in \partial \{x\}\), and \({\mathcal {H}}_x:= \bigotimes _{e \in \partial \{x\}} {\mathcal {H}}_{e,x}\). We call the pair (ex) a half-edge. Moreover, we write \({\mathcal {H}}_{e} = {\mathcal {H}}_{e,x} \otimes {\mathcal {H}}_{e,y}\) for an edge \(e = (xy) \in E\). Let \(D_x = \dim ({\mathcal {H}}_x)\). For a subset \(A \subseteq V\), we write \({\mathcal {H}}_A = \bigotimes _{x \in A} {\mathcal {H}}_x\), and similarly, for a subset \(S \subseteq E\) we write \({\mathcal {H}}_S = \bigotimes _{e \in S} {\mathcal {H}}_e\). Likewise, for a set T of half-edges we write \({\mathcal {H}}_T = \bigotimes _{(e,x) \in T} {\mathcal {H}}_{e,x}\). At each edge \(e = (xy) \in E\), we place a pure state with unit vector \(\vert \phi _e\rangle \in {\mathcal {H}}_e\)

$$\begin{aligned} \vert \phi _{e}\rangle = \sum _{i=1}^{D_e} \sqrt{\lambda _{e,i}} \vert ii\rangle \in {\mathcal {H}}_e = {\mathcal {H}}_{e,x} \otimes {\mathcal {H}}_{e,y}, \end{aligned}$$
(2.1)

that we call a link state. Then, \(\phi _{e,x} = \phi _{e,y} = \sum _{i=1}^{D_e} \lambda _{e,i} \mathinner {|i\rangle \langle i|}\) is the reduced density matrix of the link state on either of the two subsystems. We refer to the vector \({{\,\textrm{spec}\,}}(\phi _{e,x}) = {{\,\textrm{spec}\,}}(\phi _{e,y})\), which is ordered in non-increasing fashion, as the entanglement spectrum of \(\phi _e\). Let \(\phi \in {\mathcal {P}}_{\scriptscriptstyle {=}}(V)\) be the full state on edges given by the tensor product of link states

$$\begin{aligned} \vert \phi \rangle = \bigotimes _{e \in E} \vert \phi _{e}\rangle . \end{aligned}$$
(2.2)

At every bulk vertex \(x \in V_b\), we place a random vector \(\vert \psi _x\rangle \in {\mathcal {H}}_x\), where the entries of \(\vert \psi _x\rangle \) are independent standard (circularly symmetric) complex Gaussian random variables: Each entry of the tensor can be written as \(\frac{1}{\sqrt{2}}(x + iy)\) where x and y are independent real Gaussian random variables of mean 0 and unit variance. We note that, in the model of [40], the tensors \(\vert \psi _x\rangle \) were not chosen as random Gaussian vectors, but as uniformly random vectors on the unit sphere. However, for our choice of Gaussian \(\vert \psi _x\rangle \), the norm \(\Vert \vert \psi _x\rangle \Vert \) is independent of the normalized vector \(\vert \psi _x\rangle /\Vert \vert \psi _x\rangle \Vert \), and \(\vert \psi _x\rangle /\Vert \vert \psi _x\rangle \Vert \) is a uniformly random vector on the unit sphere. Therefore, these two models only differ by their normalization. We write \(\vert \psi \rangle = \bigotimes _{x \in V_b} \vert \psi _x\rangle \). The resulting random tensor network state \(\rho \in {\mathcal {P}}(V_\partial )\) is defined by

$$\begin{aligned} \vert \rho \rangle = (I_{V_\partial } \otimes \langle \psi \vert ) \vert \phi \rangle . \end{aligned}$$
(2.3)

The random tensor network state is obtained by projecting the link states onto random vectors, so that the final state lives in the boundary Hilbert space. We can make this manifest by using the cyclicity of the trace to write the density matrix:

$$\begin{aligned} \rho = \left( I_{V_{\partial }} \otimes \langle \psi \vert \right) \phi \left( I_{V_{\partial }} \otimes \vert \psi \rangle \right) = {{\,\textrm{tr}\,}}_{V_b}\left[\left( I_{V_{\partial }} \otimes \psi \right) \phi \right]. \end{aligned}$$
(2.4)

Note that this state need not be normalized, but we chose the standard deviation of the \(\vert \psi _x\rangle \) such that \(\rho \) is normalized on average, given that the link state \(\phi \) is normalized:

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}[\rho ] = {{\,\textrm{tr}\,}}[\phi ]. \end{aligned}$$
(2.5)

In Sect. 2.3.1, we prove the stronger statement that \(\rho \) is normalized with high probability for appropriately connected tensor networks and large bond dimension. Note also that in Eq. (2.1), we have chosen states which have a Schmidt decomposition in a fixed basis (the standard basis). Since we project onto uniformly random tensors, we may do so without loss of generality.
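Eq. (2.5) can be checked by Monte Carlo on a small example (our toy setup: the chain with boundary vertex a, bulk vertex x, boundary vertex b, and an arbitrary non-flat link spectrum). For this graph, \({{\,\textrm{tr}\,}}[\rho ] = \langle \psi _x\vert \phi _x \vert \psi _x\rangle \), where \(\phi _x\) is the reduction of the link states to the bulk vertex, so averaging over the Gaussian \(\vert \psi _x\rangle \) should give \({{\,\textrm{tr}\,}}[\phi ] = 1\):

```python
import numpy as np

rng = np.random.default_rng(6)
D, N = 3, 4000
lam = np.array([0.5, 0.3, 0.2])  # example link spectrum, used on both edges

# Reduction of the two link states to the bulk vertex x:
# phi_x = diag(lam) (x) diag(lam), stored here via its spectrum.
weights = np.outer(lam, lam).reshape(-1)

# Standard complex Gaussian bulk vectors, entries (x + iy)/sqrt(2).
psi = (rng.normal(size=(N, D * D)) + 1j * rng.normal(size=(N, D * D))) / np.sqrt(2)

# tr[rho] = <psi_x| phi_x |psi_x> for each sample; average should be 1.
traces = (weights * np.abs(psi) ** 2).sum(axis=1)
mean_trace = traces.mean()
```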

2.1 The Replica Trick for Random Tensor Networks

We now consider a boundary subset \(A \subseteq V_{\partial }\) and use the replica trick to study the Rényi entropies of the reduced density matrix \(\rho _A\). The replica trick for random tensor network models was first studied in [40], and it is the key tool we apply throughout this work. Let \({\mathcal {H}}\) be a Hilbert space. The Rényi entropies of a (normalized) density matrix \(\rho \in {\mathcal {P}}_{\scriptscriptstyle {=}}({\mathcal {H}})\) are defined by

$$\begin{aligned} H_k(\rho ) = \frac{1}{1-k}\log ({{\,\textrm{tr}\,}}[\rho ^k]) \end{aligned}$$

for \(k \in (0,1) \cup (1,\infty )\). For \(k = 0,1,\infty \), there are well-defined limits, given by

$$\begin{aligned} \begin{aligned} H_0(\rho )&:= \log ({{\,\textrm{rank}\,}}(\rho ))\\ H_1(\rho )&:= -{{\,\textrm{tr}\,}}[\rho \log (\rho )]\\ H_{\infty }(\rho )&:= -\log (\Vert \rho \Vert _{\infty }). \end{aligned} \end{aligned}$$
(2.6)

In particular, we see that \(H(\rho ) = H_1(\rho )\) is the von Neumann entropy. We will also write \(H(A)_\rho := H(\rho _A)\) and \(H_k(A)_{\rho }:= H_k(\rho _A)\) for reduced density matrices. If \(\rho \in {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}})\) is subnormalized, we let

$$\begin{aligned} H_k(\rho ) = \frac{1}{1-k}\log \frac{{{\,\textrm{tr}\,}}\left[\rho ^k\right]}{{{\,\textrm{tr}\,}}[\rho ]}. \end{aligned}$$
(2.7)
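The definitions in Eqs. (2.6) can be collected in one helper (our own sketch, restricted to normalized states). For the maximally mixed state, whose spectrum is flat, all Rényi entropies coincide, which is precisely the "flat spectrum" behavior discussed in the introduction:

```python
import numpy as np

def renyi(rho, k):
    """Renyi entropy H_k (logs base 2) of a normalized state, Eq. (2.6)."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]                          # nonzero spectrum
    if k == 0:
        return float(np.log2(ev.size))           # log rank
    if k == 1:
        return float(-(ev * np.log2(ev)).sum())  # von Neumann entropy
    if k == np.inf:
        return float(-np.log2(ev.max()))         # min-entropy
    return float(np.log2((ev**k).sum()) / (1 - k))

d = 8
mix = np.eye(d) / d  # maximally mixed state: H_k = log2(d) for every k
vals = [renyi(mix, k) for k in (0, 0.5, 1, 2, np.inf)]
```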

Denote by R the representation of \(S_k\) on \({\mathcal {H}}^{\otimes k}\) which permutes the k copies of \({\mathcal {H}}\) according to the action of \(S_k\). We will write \(R_x(\pi )\) when \({\mathcal {H}}= {\mathcal {H}}_x\) and \(R_A(\pi )\) if \({\mathcal {H}}= {\mathcal {H}}_A\) for \(A \subseteq V\). We let \(\tau \) denote the standard k-cycle in \(S_k\), i.e.,

$$\begin{aligned} \tau = (1 2 \ldots k). \end{aligned}$$

The key idea of the replica trick is the observation that the kth moment of \(\rho \in {\mathcal {P}}({\mathcal {H}})\) can be written as

$$\begin{aligned} {{\,\textrm{tr}\,}}\left[\rho ^k\right] = {{\,\textrm{tr}\,}}\left[R(\tau )\rho ^{\otimes k}\right]. \end{aligned}$$
(2.8)

Recall the notion of the cycle type of a permutation \(\pi \): If \(\pi \) can be written as a product of m disjoint cycles of lengths \(l_1, \ldots , l_m\), then \(\pi \) has cycle type \(C(\pi ) = \{l_1,\ldots ,l_m\}\). Then, for an arbitrary \(\pi \in S_k\),

$$\begin{aligned} {{\,\textrm{tr}\,}}\left[ R(\pi )\rho ^{\otimes k}\right] = \prod _{l \in C(\pi )} {{\,\textrm{tr}\,}}\left[ \rho ^l \right]. \end{aligned}$$
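This cycle-type formula can be verified numerically for a non-cyclic permutation, e.g., \(\pi = (1\,2)(3)\) with cycle type \(\{2, 1\}\) (our own check, in the same spirit as the replica identity above):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(7)
d = 3

# Random normalized density matrix.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

# R(pi) for pi = (1 2)(3): swap the first two tensor factors,
# R[(a,b,c),(i,j,l)] = delta_{aj} delta_{bi} delta_{cl}.
I = np.eye(d)
R = np.einsum('aj,bi,cl->abcijl', I, I, I).reshape(d**3, d**3)

lhs = np.trace(R @ reduce(np.kron, [rho] * 3)).real
rhs = (np.trace(rho @ rho) * np.trace(rho)).real  # product over cycle type {2,1}
```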

Note that this is the generalization of the well-known swap trick for two copies of a state \(\rho \). The other crucial ingredient is a property of the Gaussian random vectors:

$$\begin{aligned} \mathbbm {E}\left[\psi _x^{\otimes k}\right] = \sum _{\pi \in S_k} R_x(\pi ). \end{aligned}$$
(2.9)
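Eq. (2.9) is Wick's theorem for the Gaussian vectors \(\vert \psi _x\rangle \); for \(k = 2\) it states that \(\mathbbm {E}[\psi _x^{\otimes 2}]\) is the identity plus the swap operator. A Monte Carlo check (our own sketch, with d = 2):

```python
import numpy as np

rng = np.random.default_rng(8)
d, N = 2, 100000

# Entries (x + iy)/sqrt(2) with x, y standard normal, so E|psi_a|^2 = 1.
psi = (rng.normal(size=(N, d)) + 1j * rng.normal(size=(N, d))) / np.sqrt(2)

# Monte Carlo estimate of E[psi^{(x)2}], entrywise E[psi_a psi_b* psi_c psi_d*].
moment = np.einsum('na,nb,nc,nd->abcd', psi, psi.conj(), psi, psi.conj()) / N

# Eq. (2.9) for k = 2: sum over S_2 gives identity plus swap,
# delta_{ab} delta_{cd} + delta_{ad} delta_{cb}.
I2 = np.eye(d)
exact = np.einsum('ab,cd->abcd', I2, I2) + np.einsum('ad,cb->abcd', I2, I2)
dev = np.abs(moment - exact).max()
```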

Using Eq. (2.4), we may then compute

$$\begin{aligned} \begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}\left[\rho _A^k\right]&= \mathbbm {E}{{\,\textrm{tr}\,}}\left[R_A(\tau ) \rho _{A}^{\otimes k}\right] \\&= \mathbbm {E}{{\,\textrm{tr}\,}}\left[\left( R_A(\tau ) \otimes R_{V_\partial {\setminus } A}(\textrm{id})\right) \rho ^{\otimes k}\right]\\&= \mathbbm {E}{{\,\textrm{tr}\,}}\left[\left( R_A(\tau ) \otimes R_{V_\partial {\setminus } A}(\textrm{id})\right) \left( I_{V_{\partial }} \otimes \psi \right) ^{\otimes k} \phi ^{\otimes k}\right] \\&= {{\,\textrm{tr}\,}}\left[\left( R_A(\tau ) \otimes R_{V {\setminus } A}(\textrm{id})\right) \mathbbm {E}\left[(I_{V_\partial } \otimes \psi )^{\otimes k}\right] \phi ^{\otimes k}\right]. \end{aligned} \end{aligned}$$
(2.10)

To further simplify this expression, we define the following set:

$$\begin{aligned} {\mathcal {S}}_{A,\sigma } = \left\{ \{\pi _x\}_{x \in V} : \pi _x \in S_k, \text { where } \pi _x = \sigma \text { for } x \in A \text { and } \pi _x = \textrm{id}\text { for } x \in {\bar{A}} \right\} , \end{aligned}$$
(2.11)

for any \(\sigma \in S_k\) and \(A \subseteq V_\partial \). An element of \({\mathcal {S}}_{A,\sigma }\) assigns a permutation to each vertex in V subject to a “boundary condition.” Now, using Eq. (2.9), we find that

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}\left[\rho _A^k\right]&= \sum _{\{\pi _x\} \in {\mathcal {S}}_{A,\tau }} {{\,\textrm{tr}\,}}\left[\bigotimes _{x \in V} R_x(\pi _x) \phi ^{\otimes k}\right]. \end{aligned}$$

Finally, we observe that for \(e = (xy)\)

$$\begin{aligned} {{\,\textrm{tr}\,}}\left[R(\pi _x) \otimes R(\pi _y) \phi ^{\otimes k}\right] = \prod _{l \in C(\pi _x^{-1} \pi _y)} {{\,\textrm{tr}\,}}\left[\phi _{e,x}^l\right], \end{aligned}$$

where we recall that \(\phi _{e,x}\) is the reduced density matrix of the link state on edge \(e = (xy)\). Thus, we conclude that

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}\left[\rho _A^k\right]&= \sum _{\{\pi _x\} \in {\mathcal {S}}_{A,\tau }} \prod _{e = (xy) \in E} \prod _{l \in C(\pi _x^{-1} \pi _y)} {{\,\textrm{tr}\,}}\left[\phi _{e,x}^l\right]. \end{aligned}$$
(2.12)

We can interpret the expectation as the partition function of a classical spin model

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}\left[\rho _A^k\right]&= \sum _{\{\pi _x\} \in {\mathcal {S}}_{A,\tau }} 2^{-\sum _{e = (xy) \in E} J_{e}(\pi _x, \pi _y)}, \end{aligned}$$
(2.13)

where the site variables in the spin model are permutations \(\pi _x \in S_k\), and the interaction at the edges between sites is given by

$$\begin{aligned} J_{e}(\pi _x,\pi _y) = -\sum _{l \in C(\pi _x^{-1} \pi _y)} \log ({{\,\textrm{tr}\,}}\left[\phi _{e,x}^l\right]) = \sum _{l \in C(\pi _x^{-1} \pi _y)} (l-1) H_l(\phi _{e,x}), \end{aligned}$$

with \(H_l\) being the lth Rényi entropy, and the model has boundary conditions such that the permutation must be \(\tau \) on A and \(\textrm{id}\) on \({\bar{A}}\). It turns out that \(J_e\) is a metric on the symmetric group \(S_k\)—see “Appendix D.” Similarly, we may place an arbitrary permutation \(\pi \) on A instead of \(\tau \), which yields (by exactly the same reasoning)

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}\left[R(\pi )\rho ^{\otimes k}\right]&= \sum _{\{\pi _x\} \in {\mathcal {S}}_{A,\pi }} \prod _{e = (xy) \in E} \prod _{l \in C(\pi _x^{-1} \pi _y)} {{\,\textrm{tr}\,}}\left[\phi _{e,x}^l\right]. \end{aligned}$$
(2.14)

2.2 Maximally Entangled Link States and Minimal Cuts

We will now discuss the special case where all the link states are maximally entangled states of dimension D, which has been studied extensively in [40]. We will generalize the results we discuss here to a wider class of link states in Sect. 3. In this case, the entanglement spectra of the link states are flat: For \(e \in E\), we have \(\lambda _{e,i} = \frac{1}{D}\) for \(i = 1, \ldots , D\). In particular, for all \(l \in \mathbbm {N}\) we have \(H_l(\phi _{e}) = \log (D)\) and hence

$$\begin{aligned} J_{e}(\pi _x,\pi _y) = \log (D)\sum _{l \in C(\pi _x^{-1} \pi _y)} (l-1). \end{aligned}$$

This leads to the so-called Cayley distance on \(S_k\):

$$\begin{aligned} d(\pi _x,\pi _y) = \sum _{l \in C(\pi _x^{-1} \pi _y)} (l-1) = k - \left|C(\pi _x^{-1} \pi _y)\right|, \end{aligned}$$
(2.15)

where \(\left|C(\pi )\right|\) is the number of cycles in \(\pi \). Moreover, \(d(\pi _x,\pi _y)\) is a metric and equals the minimal number of transpositions needed to transform \(\pi _x\) into \(\pi _y\). We say that \(\pi \in S_k\) is on a geodesic between \(\pi _1\) and \(\pi _2\) if \(d(\pi _1, \pi ) + d(\pi , \pi _2) = d(\pi _1,\pi _2)\). We can rewrite the spin model in terms of this distance:

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}\left[\rho _A^k\right]&= \sum _{\{\pi _x\} \in {\mathcal {S}}_{A,\tau }} 2^{-\log (D) \sum _{e = (xy) \in E} d(\pi _x, \pi _y)}. \end{aligned}$$
(2.16)
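The properties of the Cayley distance are easy to verify by brute force for small k; the following sketch (our own illustration) checks symmetry, the triangle inequality, and \(d(\tau , \textrm{id}) = k - 1\) on \(S_4\):

```python
from itertools import permutations

def num_cycles(p):
    """Number of cycles of a permutation p, given as a tuple with p[i] the image of i."""
    seen, c = set(), 0
    for i in range(len(p)):
        if i not in seen:
            c += 1
            while i not in seen:
                seen.add(i)
                i = p[i]
    return c

def cayley(p, q):
    """d(p, q) = k - |C(p^{-1} q)|, cf. Eq. (2.15)."""
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return len(p) - num_cycles(tuple(inv[q[i]] for i in range(len(p))))

k = 4
S4 = list(permutations(range(k)))
tau = tuple((i + 1) % k for i in range(k))   # the full cycle (1 2 ... k), 0-indexed
ident = tuple(range(k))

assert cayley(tau, ident) == k - 1           # a k-cycle needs k - 1 transpositions
assert all(cayley(p, q) == cayley(q, p) for p in S4 for q in S4)          # symmetry
assert all(cayley(p, r) <= cayley(p, q) + cayley(q, r)
           for p in S4 for q in S4 for r in S4)                           # triangle inequality
```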

The physically inclined reader may observe that the logarithm of the bond dimension plays the role of an inverse temperature, and for large D, the dominant contribution to the partition function will be the ground state of the spin model, subject to the relevant boundary conditions.

To describe the dominant contribution to the sum in Eq. (2.16) for large D, we need the minimal cuts for A in G. A cut for A is a subset of the vertices \(\Gamma _A \subset V\) such that \(\Gamma _A \cap V_{\partial } = A\). Throughout this work, we will denote the set of all cuts for A by C(A). We will use the convention of denoting cuts (i.e., subsets of vertices) by capital Greek letters. Given a cut \(\Gamma _A \in C(A)\), we will denote the set of edges crossing the cut, that is, edges connecting a vertex in \(\Gamma _A\) with a vertex in \(V {\setminus } \Gamma _A\), by lowercase Greek letters \(\gamma _A\). (And by an abuse of language, also refer to this set as a “cut.”) A minimal cut for A is a cut \(\Gamma _A\) such that the number of crossing edges \(\left|\gamma _A\right|\) is minimal, and we write \(m(A) = \left|\gamma _A\right|\) for this minimal number of edges. If \(\Gamma _A \in C(A)\), we write \(\Gamma _A^c = V {\setminus } \Gamma _A\). Note that \(\Gamma _A^c\) is a cut for \({\bar{A}} = V_{\partial } {\setminus } A\).

In the simplest case, there is a unique minimal cut \(\gamma _A\). For this case, one can show that the dominant configuration is the one in which \(\pi _x = \tau \) for \(x \in \Gamma _A\) and \(\pi _x = \textrm{id}\) for \(x \in V {\setminus } \Gamma _A\), see [39], or Proposition 3.3. That is, there are two domains in the spin model corresponding to \(\tau \) and \(\textrm{id}\), and the minimization of the domain wall corresponds to the minimal cut in the graph.

We will also be interested in the case of exactly two non-intersecting minimal cuts \(\Gamma _{A,1}\) and \(\Gamma _{A,2}\). In this case, we have that \(\Gamma _{A,1} \subset \Gamma _{A,2}\), or \(\Gamma _{A,2} \subset \Gamma _{A,1}\). After relabeling, we may assume that the first is the case, and define three domains in the graph: \(V = V_1 \sqcup V_2 \sqcup V_3\) given by \(V_1 = \Gamma _{A,1}\), \(V_2 = \Gamma _{A,2} {\setminus } \Gamma _{A,1}\) and \(V_3 = V {\setminus } \Gamma _{A,2}\). If there are exactly two minimal cuts, then multiple dominant configurations contribute equally to the partition function Eq. (2.16). These dominant configurations can be constructed as follows: For each \(\pi \) on a geodesic between \(\tau \) and \(\textrm{id}\), set \(\pi _x = \tau \) for \(x \in V_1\), \(\pi _x = \pi \) for \(x \in V_2\) and \(\pi _x = \textrm{id}\) for \(x \in V_3\). That these are the dominant configurations follows immediately from the fact that \(d(\tau , \pi ) + d(\pi , \textrm{id}) \ge d(\tau , \textrm{id})\), with equality if and only if \(\pi \) is on a geodesic between \(\tau \) and \(\textrm{id}\).

To understand this degeneracy, we use the following fact [56]: the set of permutations \(\pi \) on a geodesic between \(\tau \) and \(\textrm{id}\) is in a one-to-one correspondence with the set of non-crossing partitions NC(k) of [k]. See Sect. 3.1 for a definition and properties of NC(k). Thus, the degeneracy for the kth moment is \(\left|NC(k)\right| = C_k\) where

$$\begin{aligned} C_k = \frac{1}{k+1} \left( {\begin{array}{c}2k\\ k\end{array}}\right) \end{aligned}$$

is the kth Catalan number. These are the moments of the Marchenko–Pastur distribution \({{\,\textrm{MP}\,}}(1)\); more generally, for a parameter \(t > 0\), the Marchenko–Pastur distribution \({{\,\textrm{MP}\,}}(t)\) is given by

$$\begin{aligned} \begin{aligned} {{\,\textrm{MP}\,}}(t)&= \max (1 - t,0) \delta _0 + \nu _t \\ \textrm{d}\nu _t(x)&= \frac{\sqrt{4t - (x - 1 - t)^2}}{2\pi x} \mathbbm {1}_{(x - 1 - t)^2 \le 4t} \textrm{d}x. \end{aligned} \end{aligned}$$
(2.17)

This allows one to show the folklore result (which we prove and extend to more general link states in Theorem 3.4) that upon an appropriate rescaling, the empirical distribution of the spectrum of \(\rho _A\) converges to a Marchenko–Pastur distribution. This is in line with the case of a single random tensor, which precisely yields a Wishart matrix (see Sect. 3.1.1 for a brief introduction to these objects). In the first case, where there is a unique minimal cut, the entanglement spectrum of \(\rho _A\) is asymptotically flat, while, as we have seen, in the second case, the degeneracy gives rise to a non-trivial spectrum in the right scaling limit.
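The degeneracy \(\left|NC(k)\right| = C_k\) can be confirmed by enumerating the geodesic permutations directly; the sketch below (our own illustration) does so for small k:

```python
from itertools import permutations
from math import comb

def num_cycles(p):
    """Number of cycles of a permutation tuple p."""
    seen, c = set(), 0
    for i in range(len(p)):
        if i not in seen:
            c += 1
            while i not in seen:
                seen.add(i)
                i = p[i]
    return c

def cayley(p, q):
    """Cayley distance d(p, q) = k - |C(p^{-1} q)| of Eq. (2.15)."""
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return len(p) - num_cycles(tuple(inv[q[i]] for i in range(len(p))))

for k in range(2, 6):
    tau = tuple((i + 1) % k for i in range(k))   # the full cycle, 0-indexed
    ident = tuple(range(k))
    geodesic = [p for p in permutations(range(k))
                if cayley(ident, p) + cayley(p, tau) == cayley(ident, tau)]
    catalan = comb(2 * k, k) // (k + 1)
    assert len(geodesic) == catalan              # e.g., 14 permutations for k = 4
```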

2.3 The Replica Trick for General Background States

In Eq. (2.13), we computed the result of the replica trick for the kth moment of a random tensor network state. We will also consider the more general setting where the link state is replaced by some arbitrary state \(\phi _V\). In this setting, there need not be a graph structure, and the Hilbert space at each vertex \(x \in V\) can be some arbitrary Hilbert space, rather than a tensor product of Hilbert spaces labeled by half-edges. We will refer to \(\phi _V\) as a “background state” instead of a “link state” (as the interpretation of links along the edges does not necessarily make sense in this situation). That is, where before we had a link state

$$\begin{aligned} \vert \phi \rangle = \bigotimes _{e \in E} \vert \phi _e\rangle , \end{aligned}$$

we will now consider some arbitrary possibly mixed and subnormalized \(\phi _V \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(V)\) in the tensor network construction. We can generalize Eq. (2.4) to also apply for general background states to obtain a state \(\rho \in {\mathcal {P}}(V_\partial )\) given by

$$\begin{aligned} \rho = {{\,\textrm{tr}\,}}_{V_b}\left[\left( I_{V_{\partial }} \otimes \psi \right) \phi _V\right] \end{aligned}$$
(2.18)

where \(\vert \psi \rangle \) is a tensor product of random states at the bulk vertices. If \(\phi \) is pure, then so is \(\rho \), since in that case

$$\begin{aligned} \vert \rho \rangle = \left( I_{V_{\partial }} \otimes \langle \psi \vert \right) \vert \phi \rangle . \end{aligned}$$

If \(\phi \) is not pure, we can consider a purification \(\phi _{VR} \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(VR)\) and consider R as an additional boundary system; this leads to a random tensor network state \(\rho _{V_{\partial }R}\) which is a purification of \(\rho _{V_\partial }\). This setup is illustrated in Fig. 3. While formally very similar, the resulting state is in general no longer a PEPS tensor network state.

Fig. 3

The structure of a (purified) random tensor network with a general background state

There are multiple reasons to also allow general background states. The first reason is of a technical nature: They are useful for estimates based on smooth entropies, which we discuss in Sect. 4. In this application, the full state on edges is still pure, but is no longer a tensor product of link states along the edges. A second motivation for considering general background states is that they can be used as a toy model for holographic systems where there is “bulk entropy” present. Finally, these states are closely related to protocols for the quantum information processing task of split transfer [26]. We comment on this connection in “Appendix B.”

Even for a general background state, a version of the replica trick still applies. Consider a boundary subsystem \(A \subseteq V_{\partial }\) with corresponding boundary state \(\rho _A\). Then, the computation in Eq. (2.10) is still valid, and we find

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}[\rho _A^k]&= \sum _{\{\pi _x\} \in {\mathcal {S}}_{A,\tau }} {{\,\textrm{tr}\,}}_V\left[\bigotimes _{x \in V} R_x(\pi _x) \phi _V^{\otimes k}\right] \end{aligned}$$
(2.19)

where \(\tau = (1 2 \ldots k)\). However, Eq. (2.19) no longer has the interpretation of a spin model with local interactions.

For general background states, we will only need the replica trick for \(k = 2\). Since \(S_2\) has only two elements, each configuration of permutations is completely characterized by the domain \(\Delta _A = \{x \in V \text { such that } \pi _x = \tau \}\). Because of the boundary conditions in \({\mathcal {S}}_{A,\tau }\), the collection of these sets coincides with C(A), and hence,

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}[\rho _A^2] = \sum _{\Delta _A \in C(A)} {{\,\textrm{tr}\,}}[\phi _{\Delta _A}^2] = \sum _{\Delta _A \in C(A)} {{\,\textrm{tr}\,}}[\phi ] 2^{-H_2(\Delta _A)_\phi } . \end{aligned}$$
(2.20)
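As a concrete check of Eq. (2.20) (an illustrative sketch of our own, not from the text): for a network with a single bulk vertex v joined to two boundary vertices a and b by maximally entangled links of dimension D, the two cuts for \(A = \{a\}\) are \(\{a\}\) and \(\{a, v\}\), each contributing \(1/D\), so \(\mathbbm {E}{{\,\textrm{tr}\,}}[\rho _A^2] = 2/D\). A Monte Carlo estimate agrees:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 4, 4000
vals = np.empty(N)
for s in range(N):
    # Bulk tensor psi on H_v = C^D (x) C^D, normalized so E[|psi><psi|] = identity.
    psi = (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))) / np.sqrt(2)
    M = np.conj(psi) / D          # amplitudes of |rho> on H_a (x) H_b
    rho_A = M @ M.conj().T        # reduced state on boundary vertex a
    vals[s] = np.trace(rho_A @ rho_A).real

print(vals.mean())   # concentrates around 2 / D = 0.5
```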

Another useful fact is that by Eq. (2.9),

$$\begin{aligned} \mathbbm {E}\rho _{V_\partial } = \phi _{V_\partial }. \end{aligned}$$
(2.21)

We remark that if one only uses the \(k = 2\) replica trick, one could also use tensors which are drawn from a projective 2-design, a distribution which produces tensors with the same first and second moments as uniformly random tensors of unit norm [35, 43]. An example of a projective 2-design is the set of uniformly random stabilizer states. For tensors \(\vert \psi _x\rangle \) drawn from a projective 2-design of dimension \(D_x\), it holds that

$$\begin{aligned} \mathbbm {E}\psi _x^{\otimes 2} = \frac{D_x}{D_x + 1}\left( I + R_x(\tau )\right) , \end{aligned}$$

and hence,

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}[\rho _A^2] = \left( \prod _{x \in V_b} \frac{D_x}{D_x + 1}\right) \sum _{\Delta _A \in C(A)} {{\,\textrm{tr}\,}}[\phi _{\Delta _A}^2], \end{aligned}$$

which is close to Eq. (2.20) for large \(D_x\). Thus, it is not hard to see that all random tensor network results which only use the \(k = 2\) replica trick are also valid for states with tensors drawn from projective 2-designs. This was already observed in [40], and random tensor networks with random stabilizer tensors were further studied in [57]. The results of Sect. 4 only use the \(k=2\) replica trick and thus will extend to states with tensors drawn from projective 2-designs. This will not be true for the results in Sect. 3, which require the replica trick for all \(k \in \mathbbm {N}\).

2.3.1 Normalization of Random Tensor Network States

One immediate consequence of the replica trick for \(k=2\) is that the random tensor network state \(\rho \) will be approximately normalized with high probability, so long as a mild condition on the background state is satisfied: The bulk needs to be connected, with sufficiently entangled edges. Let

$$\begin{aligned} \eta = \max _{\Delta \subseteq V_b, \Delta \ne \emptyset } {{\,\textrm{tr}\,}}[\phi _{\Delta }^2] = \max _{\Delta \subseteq V_b, \Delta \ne \emptyset } {{\,\textrm{tr}\,}}[\phi ] 2^{-H_2(\Delta )_\phi }. \end{aligned}$$
(2.22)

If the state has enough correlations along each cut (or more precisely, if \(H_2(\Delta )_\phi \) is large for each \(\Delta \)), then \(\eta \) is small. Concretely, if we consider a random tensor network state with maximally entangled link states of bond dimension D, we will have \(\eta \le \frac{1}{D}\). We then have

Lemma 1.1

For any background state \(\phi \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(V)\), with associated \(\rho \in {\mathcal {P}}(V_\partial )\) as in Eq. (2.18), it holds that for any \(\varepsilon > 0\)

$$\begin{aligned} \Pr \bigl (\left|{{\,\textrm{tr}\,}}[\rho ] - {{\,\textrm{tr}\,}}[\phi ]\right| \ge \varepsilon \bigr ) \le 2^{|V_b|}\frac{\eta }{\varepsilon ^2} \end{aligned}$$

where \(\eta \) is defined in Eq. (2.22).

Proof

This follows from the special case \(A = \emptyset \) of Eq. (2.20): the sum then runs over all \(\Delta \subseteq V_b\), and the empty cut \(\Delta = \emptyset \) contributes \({{\,\textrm{tr}\,}}[\phi ]^2,\) so we find

$$\begin{aligned} {{\,\textrm{Var}\,}}({{\,\textrm{tr}\,}}[\rho ]) = \mathbbm {E}\left|{{\,\textrm{tr}\,}}[\rho ] - {{\,\textrm{tr}\,}}[\phi ]\right|^2 = \mathbbm {E}\left[{{\,\textrm{tr}\,}}[\rho ]^2\right] - {{\,\textrm{tr}\,}}[\phi ]^2 \le 2^{\left|V_b\right|} \max _{\Delta \subseteq V_b, \Delta \ne \emptyset } {{\,\textrm{tr}\,}}[\phi _{\Delta }^2], \end{aligned}$$

where we have used the normalization of \(\rho \) in expectation \(\mathbbm {E}{{\,\textrm{tr}\,}}[\rho ] = {{\,\textrm{tr}\,}}[\phi ]\), as in Eq. (2.5). The result follows by an application of Chebyshev’s inequality. \(\square \)
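To illustrate the lemma (an illustrative sketch of our own): for a single bulk vertex v joined to two boundary vertices by maximally entangled links of dimension D, one finds \({{\,\textrm{tr}\,}}[\rho ] = \Vert \psi \Vert ^2/D^2\) for a standard complex Gaussian vector \(\psi \) of dimension \(D^2\), with mean 1 and variance \(1/D^2 = {{\,\textrm{tr}\,}}[\phi _v^2] = \eta \), consistent with the Chebyshev estimate up to the factor \(2^{|V_b|}\):

```python
import numpy as np

rng = np.random.default_rng(2)
D, N = 8, 20000
# N independent samples of the bulk tensor psi, a standard complex Gaussian
# vector of dimension D^2 (so E[|psi><psi|] is the identity).
psi = (rng.normal(size=(N, D * D)) + 1j * rng.normal(size=(N, D * D))) / np.sqrt(2)
tr_rho = (np.abs(psi) ** 2).sum(axis=1) / D**2   # tr[rho] = ||psi||^2 / D^2

print(tr_rho.mean())  # E tr[rho] = tr[phi] = 1
print(tr_rho.var())   # Var(tr[rho]) = tr[phi_v^2] = 1 / D^2
```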

We can improve this result by taking advantage of the fact that our random projectors are random Gaussian vectors, allowing us to use Gaussian concentration of measure rather than Chebyshev's inequality. For instance, using a concentration bound for Gaussian polynomials (see [9], Corollary 5.49) one can show that for any \(\varepsilon \ge (\sqrt{2}e)^{2\left|V_b\right|} \eta \):

$$\begin{aligned} \Pr \bigl (\left|{{\,\textrm{tr}\,}}[\rho ] - {{\,\textrm{tr}\,}}[\phi ]\right| \ge \varepsilon \bigr ) \le \exp \left( -\frac{\left|V_b\right|}{2e}\varepsilon ^{\frac{1}{\left|V_b\right|}}\eta ^{-\frac{1}{\left|V_b\right|}}\right) , \end{aligned}$$

where \(\eta \) is defined as in Eq. (2.22). We will not need this refinement.

3 Link States with Bounded Spectral Variation

In this section, we study random tensor network states with link states that have bounded spectral variation, meaning that there is an effective bond dimension D such that the Schmidt coefficients of the link state are of the order \(\tfrac{1}{D}\).

We start by providing background material on random matrix theory and free probability, which is a key tool in the study of products of random matrices. In Sect. 3.2, we will precisely define the notion of bounded spectral variation and generalize the results in Sect. 2.2 for random tensor network states with maximally entangled link states to this wider class of link states. This leads to the main result of this section, Theorem 3.4, which shows that the asymptotic entanglement spectrum can be expressed in terms of a free product of distributions. We will see that the results are similar to the quantum gravity setup described in “Appendix A.3.” Finally, in Sect. 3.3, we investigate the entanglement negativity for random tensor network states with link states of bounded spectral variation.

3.1 Random Matrices, Free Probability, and Non-crossing Partitions

3.1.1 Random Matrix Theory and Wishart Matrices

We start by reviewing concepts from probability and random matrix theory that are relevant for our analysis. This material can be found in any introduction to random matrix theory, e.g., [3, 15, 58].

A fundamental question in random matrix theory is as follows: Given a family of \(n \times n\) matrices with entries selected according to some distribution, what is the asymptotic distribution of the eigenvalues as \(n \rightarrow \infty \)? This question has been extensively studied and in many cases has an elegant and concise answer. We discuss a basic example which is directly relevant for our purposes: Wishart matrices. Consider an \(n \times m\) matrix X whose entries are drawn i.i.d. from a Gaussian distribution with mean zero and unit variance. The sample covariance matrix of X is the \(n\times n\) matrix defined as

$$\begin{aligned} Y_{n,m} = \frac{1}{m}XX^T. \end{aligned}$$
(3.1)

Such random matrices are called (real) Wishart matrices and can be thought of as a sample second moment matrix (where one has m realizations of an n-dimensional random variable). One can also consider complex Wishart matrices: In this case, the entries of the \(n \times m\) matrix X are complex i.i.d. standard (circularly symmetric) complex Gaussian random variables. We then let \(Y_{n,m} = \frac{1}{m}XX^\dagger \). We would like to understand the spectrum of \(Y_{n,m}\), and to that end, we consider the empirical distribution of the eigenvalues. This empirical distribution is itself random, depending on the particular realization of \(Y_{n,m}\). To characterize the convergence, we recall that if \(\{\mu _n\}_{n \in \mathbbm {N}}\) is a sequence of random finite measures on \(\mathbbm {R}\), we say that the sequence \(\mu _n\) converges weakly, in probability, to a finite measure \(\mu \), if, for any bounded continuous function \(f \in C_b(\mathbbm {R})\), it holds that for every \(\varepsilon > 0\)

$$\begin{aligned} \lim _{n \rightarrow \infty } \Pr \left( \left|\int f(x) \textrm{d}\mu _n(x) - \int f(x) \textrm{d}\mu (x) \right| \ge \varepsilon \right) = 0. \end{aligned}$$

The asymptotic distribution of the eigenvalues of Wishart matrices is known to obey the Marchenko–Pastur law (see, for instance, Theorem 3.6 and Theorem 3.7 in [15]):

Theorem 3.1

Consider (real or complex) Wishart matrices \(Y_{n,m}\), and let

$$\begin{aligned} \mu _{n,m} = \frac{1}{n}\sum _{\lambda \in {{\,\textrm{spec}\,}}(Y_{n,m})}\delta _{\lambda } \end{aligned}$$

be the empirical distribution of its eigenvalue spectrum. Suppose that the ratio of dimensions n/m converges to a constant \(t > 0\) as \(n \rightarrow \infty \). Then, \(\mu _{n,m}\) converges weakly, in probability, to the Marchenko–Pastur distribution \({{\,\textrm{MP}\,}}(t)\) with parameter \(t > 0\), as defined in Eq. (2.17).

Generalizations of this result are possible. For example, one still has convergence if the entries of X are chosen according to non-Gaussian distributions with mean zero and unit variance. Also, one can prove weak convergence, almost surely (rather than just in probability); see [15].

If \(Y_{n,m} = \frac{1}{m}XX^\dagger \) is a complex Wishart matrix, X can also be interpreted (after normalization) as a uniformly random pure quantum state on \(\mathbbm {C}^n \otimes \mathbbm {C}^m\), and \(Y_{n,m}\), up to normalization, as the reduced density matrix on \(\mathbbm {C}^n\) [39]. Note that \(\frac{1}{n}Y_{n,m}\) is normalized in expectation in the sense that \(\mathbbm {E}\,{{\,\textrm{tr}\,}}\bigl [\frac{1}{n} Y_{n,m}\bigr ] = 1\). So, complex Wishart matrices can be used as a model for the reduced state of a random bipartite quantum state, and this allows one to quantify the “typical entanglement” of a random state. Equivalently, in the tensor network setting, \(\frac{1}{\sqrt{n}}X\) can be thought of as a random tensor network state with a single bulk vertex, two boundary vertices, and maximally entangled link states. We can then interpret \(\frac{1}{n}Y_{n,m}\) as the reduced density matrix on one of the boundary vertices. We will provide a generalization of Theorem 3.1 for the entanglement spectrum of random tensor network states in Theorem 3.4.
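Theorem 3.1 is straightforward to observe numerically; in the sketch below (our own illustration, with arbitrarily chosen dimensions), the empirical moments of a complex Wishart matrix are compared against the standard formula for the moments of \({{\,\textrm{MP}\,}}(t)\), which reduces to the Catalan numbers at \(t = 1\):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)
n, m = 400, 800                      # aspect ratio t = n/m = 0.5
t = n / m
X = (rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))) / np.sqrt(2)
Y = (X @ X.conj().T) / m             # complex Wishart matrix
eigs = np.linalg.eigvalsh(Y)

def mp_moment(k, t):
    """Moments of MP(t) in the normalization of the theorem (so m_1 = 1)."""
    return sum(t**j / (j + 1) * comb(k, j) * comb(k - 1, j) for j in range(k))

for k in range(1, 5):
    print(k, (eigs**k).mean(), mp_moment(k, t))  # empirical vs limiting moment
```

For instance, the second moments \(1 + t\) and \(1 + 3t + t^2\) (third) are reproduced to within a few percent already at \(n = 400\).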

3.1.2 Free Probability

The topic of probability distributions in random matrix theory is closely related to free probability and, in particular, to the notion of the free product. We provide a brief introduction here; the material in this section is very standard, and we only review a few relevant aspects. For an extensive treatment, see, for instance, Chapter 5 in [3] or the books [52, 56, 58]. As we will see later, the free product will allow us to concisely formulate replica trick results involving multiple minimal cuts.

A non-commutative probability space is a pair \(({\mathcal {A}}, \omega )\), where \({\mathcal {A}}\) is a \(C^*\)-algebra and \(\omega \) is a state on \({\mathcal {A}}\). An element \(a \in {\mathcal {A}}\) is called a non-commutative random variable. The key example to have in mind is the space of \(n \times n\) random matrices, where the matrix entries are distributed according to some probability distribution, and \(\omega (a) = \mathbbm {E}\frac{1}{n}{{\,\textrm{tr}\,}}[a]\) defines a tracial state. If \(a \in {\mathcal {A}}\), the distribution (or law) \(\mu _a\) of a is defined as a map on polynomials, which evaluates on a polynomial p as \(\mu _a(p) = \omega (p(a))\). If a is self-adjoint, it has real spectrum and we can extend the domain of \(\mu _a\) to all bounded continuous functions \(f \in C_b(\mathbbm {R})\), using the functional calculus to define f(a) and letting \(\mu _a(f) = \omega (f(a))\). In this case, we can identify \(\mu _a\) with a distribution such that, for \(f \in C_b(\mathbbm {R})\), we have \(\mu _a(f) = \int f(x) \textrm{d}\mu _a(x)\). In particular, if a is an \(n \times n\) self-adjoint random matrix, then \(\mu _a(f) = \frac{1}{n}\mathbbm {E}\sum _{\lambda \in {{\,\textrm{spec}\,}}(a)} f(\lambda )\), and we may identify \(\mu _a\) with the empirical measure of the eigenvalues of a. If \({\mathcal {A}}\) is a commutative algebra, these notions reduce to the usual notions of probability theory, where \(\omega \) is the expectation.

We call a set of non-commutative random variables \(\{a_i\}\) on a non-commutative probability space \(({\mathcal {A}},\omega )\) freely independent or just free if, for any set of \(k \ge 2\) polynomials \(\{p_j\}\), the variables satisfy

$$\begin{aligned} \omega (p_1(a_{i_1})\ldots p_k(a_{i_k})) = 0 \end{aligned}$$

whenever \(\omega (p_m(a_{i_m})) = 0\) for all \(1 \le m \le k\) and no two adjacent indices \(i_m\) and \(i_{m+1}\) for \(1 \le m \le k-1\) are equal. One can see that two freely independent variables \(a_1,a_2\) satisfy:

$$\begin{aligned} 0 = \omega ((a_1-\omega (a_1))(a_2-\omega (a_2))) = \omega (a_1a_2) - \omega (a_1)\omega (a_2), \end{aligned}$$
(3.2)

which, in the commutative case with random variables \(x_1\), \(x_2\), is the classical bivariate independence condition \(\mathbbm {E}[x_1x_2] = \mathbbm {E}[x_1]\mathbbm {E}[x_2]\). The definition of free independence does not specialize to independence in the commutative case. (Commuting independent random variables are only free when they are constant.) However, the role of free independence is analogous to the role of classical independence for commuting random variables: It allows one to, in principle, compute the joint mixed moments of the variables.

We will be interested in the multiplicative free convolution or free product (there also exists an additive free convolution, or just free convolution) of distributions. Suppose a and b are non-commutative self-adjoint free random variables on \(({\mathcal {A}},\omega )\) with distributions \(\mu _a\) and \(\mu _b\). Then, we denote the distribution of ab by \(\mu _{ab} = \mu _a \boxtimes \mu _b\). Note that, in general, ab need not be self-adjoint. However, if \(\omega \) is tracial (as in the random matrix case) and a is positive, the distribution of ab coincides with that of \(\sqrt{a}b\sqrt{a}\), which is self-adjoint, and we can identify \(\mu _{ab}\) with a distribution on \(\mathbbm {R}\). If \(\mu _a\) and \(\mu _b\) are compactly supported distributions, then so is \(\mu _a \boxtimes \mu _b\).

As a concrete example of freeness and the free product, let \(X_n\) and \(Y_n\) be two families of random \(n \times n\) positive diagonal matrices with uniformly bounded norm, such that their spectrum converges weakly to probability distributions \(\mu \) and \(\nu \), respectively. Let \(U_n\) be a family of Haar random unitary \(n \times n\) matrices. Then, as n goes to infinity, \(X_n\) and \(Y_n' = U_nY_nU_n^\dagger \) will be freely independent (so they are asymptotically free), and we would like to study their product. The product of positive matrices need not be self-adjoint, so we consider \(Z_n = \sqrt{X_n} Y_n' \sqrt{X_n}\) which is a positive matrix. One may then show that the distribution of the spectrum of \(Z_n\) weakly converges in probability to \(\mu \boxtimes \nu \). See Corollary 5.4.11 in [3] for a precise statement and proof.
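This asymptotic freeness can be observed directly (an illustrative sketch of our own; the spectra of \(X_n\) and \(Y_n\) are arbitrary choices). For free a, b the mixed moments are fixed by the individual moments, e.g., \(\omega (abab) = \omega (a^2)\omega (b)^2 + \omega (a)^2\omega (b^2) - \omega (a)^2\omega (b)^2\), a standard consequence of the freeness definition:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
# Deterministic spectra (arbitrary illustrative choice), as diagonal matrices.
x = np.repeat([0.5, 1.5], n // 2)
y = np.repeat([1.0, 3.0], n // 2)
X, Y = np.diag(x), np.diag(y)

# Haar random unitary from the QR decomposition of a Ginibre matrix.
G = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
Q, R = np.linalg.qr(G)
U = Q * (np.diagonal(R) / np.abs(np.diagonal(R)))  # phase fix for Haar measure
Yp = U @ Y @ U.conj().T              # Y_n' = U_n Y_n U_n^dagger

def tr_n(A):
    return np.trace(A).real / n      # the normalized trace omega

m1x, m2x = x.mean(), (x**2).mean()
m1y, m2y = y.mean(), (y**2).mean()
P = X @ Yp
print(tr_n(P))        # approaches m1x * m1y = 2
print(tr_n(P @ P))    # approaches m2x*m1y**2 + m1x**2*m2y - (m1x*m1y)**2 = 6
```

By traciality, the moments of \(Z_n = \sqrt{X_n}Y_n'\sqrt{X_n}\) coincide with those of \(X_n Y_n'\), so the check applies verbatim to the self-adjoint version.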

The free product may be analyzed using generating functions: Given a (non-commutative) random variable a with distribution \(\mu _a\), let

$$\begin{aligned} m_{a,k} = \int x^k \textrm{d}\mu _a(x) \end{aligned}$$

be the kth moment of \(\mu _a\). Then, the moment-generating function is the formal power series

$$\begin{aligned} M_{\mu _a}(z) = \sum _{k=1}^\infty m_{a,k} z^k. \end{aligned}$$
(3.3)

We define the S-transform to be the formal power series

$$\begin{aligned} S_{\mu _a}(z) = \frac{1 + z}{z}M_{\mu _a}^{-1}(z), \end{aligned}$$

where \(M_{\mu _a}^{-1}(z)\) is the power series corresponding to the formal inverse of \(M_{\mu _a}(z)\) under composition, which is well defined as long as \(m_{a,1} \ne 0\). For compactly supported distributions, the moment-generating function, and hence the S-transform, uniquely determines the distribution.

If a and b are non-commutative self-adjoint free random variables, then

$$\begin{aligned} S_{\mu _{a} \boxtimes \mu _{b}}(z) = S_{\mu _{ab}}(z) = S_{\mu _a}(z)S_{\mu _b}(z). \end{aligned}$$
(3.4)

This also provides a completely combinatorial interpretation of the free product, without reference to the associated non-commutative probability spaces. That is, given compactly supported distributions \(\mu \) and \(\nu \), we can define \(\mu \boxtimes \nu \) via Eq. (3.4): It is the compactly supported distribution whose S-transform is the product of the S-transforms of \(\mu \) and \(\nu \), which in turn determines its moment-generating function and hence its moments. The free product is commutative and associative.

As an example, we compute the S-transform of the Marchenko–Pastur distribution \(\mu \sim {{\,\textrm{MP}\,}}(1)\). The distribution is given by

$$\begin{aligned} \textrm{d}\mu (x) = \frac{1}{2\pi }\sqrt{4x^{-1} - 1}\, \mathbbm {1}_{0 < x \le 4}\, \textrm{d}x. \end{aligned}$$

The moments can be computed directly:

$$\begin{aligned} m_k = \sum _{i=0}^{k-1}\frac{1}{i+1}\left( {\begin{array}{c}k\\ i\end{array}}\right) \left( {\begin{array}{c}k-1\\ i\end{array}}\right) = C_k. \end{aligned}$$
(3.5)

After some work, one can show that the moments above lead to a closed-form moment-generating function

$$\begin{aligned} M(z) = \frac{1 - 2z - \sqrt{1 - 4z}}{2z}. \end{aligned}$$

One may then invert the expression and obtain the S-transform

$$\begin{aligned} S(z) = \frac{1}{1+z}. \end{aligned}$$

Similarly, for the Marchenko–Pastur distribution MP(t) with parameter t, which has distribution as given in Eq. (2.17), we find that

$$\begin{aligned} S(z) = \frac{1}{t+z}. \end{aligned}$$

See, for instance, [12].
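This computation can be reproduced with formal power series arithmetic (an illustrative sketch of our own, truncating at a fixed order): inverting \(M(z) = \sum _k C_k z^k\) under composition and multiplying by \((1+z)/z\) recovers the series of \(1/(1+z)\).

```python
from math import comb

N = 8  # truncation order: series are lists of coefficients of z^0 .. z^{N-1}

def mul(a, b):
    c = [0.0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def compose(f, g):
    """f(g(z)), assuming g has zero constant term."""
    out, power = [0.0] * N, [1.0] + [0.0] * (N - 1)
    for k in range(N):
        out = [o + f[k] * p for o, p in zip(out, power)]
        power = mul(power, g)
    return out

# Moment-generating function of MP(1): M(z) = sum_{k>=1} C_k z^k.
M = [0.0] + [comb(2 * k, k) / (k + 1) for k in range(1, N)]

# Solve M(w(z)) = z iteratively; each pass fixes one more order.
w = [0.0, 1.0] + [0.0] * (N - 2)
for _ in range(N):
    err = compose(M, w)
    err[1] -= 1.0
    w = [wi - ei for wi, ei in zip(w, err)]

# S(z) = (1 + z)/z * w(z): the coefficient of z^k is w[k+1] + w[k].
S = [w[k + 1] + w[k] for k in range(N - 1)]
print(S)  # the alternating series 1, -1, 1, ... of 1/(1 + z)
```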

3.1.3 Non-crossing Partitions

Given \(k \in \mathbbm {N}\), let NC(k) denote the set of non-crossing partitions of [k]. A non-crossing partition of [k] is a partition \([k] = X_1 \sqcup \ldots \sqcup X_m\) such that if \(i < j\) are both in \(X_\alpha \), then there are no \(a, b \in X_{\beta }\) for \(\beta \ne \alpha \) with \(a< i< b < j\) or \(i< a< j < b\). To any non-crossing partition, we associate a permutation \(\pi \in S_k\) by mapping each subset \(\{i_1, \ldots , i_l\}\) with \(i_1< \cdots < i_l\) to the cycle \((i_1,\ldots , i_l)\). In a slight abuse of notation, we will write \(\pi \in NC(k)\). For any \(\pi \in S_k\), and for a sequence of numbers \(f_k\) for \(k = 1,2, \ldots \), we write

$$\begin{aligned} f_{\pi } = \prod _{l \in C(\pi )} f_l \end{aligned}$$
(3.6)

where \(C(\pi )\) is the cycle type of \(\pi \). We will need the following result, which is a straightforward consequence of the combinatorics of the S-transform.

Theorem 3.2

Consider compactly supported probability distributions \(\mu , \nu , \eta \). Suppose that the moments of \(\eta \) are given by

$$\begin{aligned} m_k^\eta = \sum _{\pi \in NC(k)} m^\mu _{\pi } m^\nu _{\pi ^{-1} \tau _k} \end{aligned}$$

where \(\tau _k = (1 2 \ldots k)\) is the full cycle. Then,

$$\begin{aligned} \eta = {{\,\textrm{MP}\,}}(1) \boxtimes \mu \boxtimes \nu . \end{aligned}$$

Proof

We let \({\mathcal {F}}\) be the transformation that sends a formal power series f(z) to the power series \(\frac{1}{z}f^{-1}(z)\), where \(f^{-1}\) denotes the formal inverse under composition. This is such that for any distribution \(\mu \), the S-transform is given by \(S_{\mu }(z) = (1 + z) {\mathcal {F}}(M_{\mu })(z)\). Moreover, given power series \(f(z) = \sum _k f_k z^k\) and \(g(z) = \sum _k g_k z^k\), define a convolution operation \(\circledast \) by

$$\begin{aligned} (f \circledast g) (z) = \sum _k \left( \sum _{\pi \in NC(k)} f_{\pi } g_{\pi ^{-1} \tau _k}\right) z^k \end{aligned}$$

where \(\tau _k\) is the full cycle in \(S_k\). Then, Theorem 18.14 in [56] states that for any two f and g with \(f_1 \ne 0\) and \(g_1\ne 0\), it holds that

$$\begin{aligned} {\mathcal {F}}(f \circledast g)(z) = {\mathcal {F}}(f)(z){\mathcal {F}}(g)(z). \end{aligned}$$

The assumption on the moments of \(\eta \) states precisely that \(M_\eta = M_\mu \circledast M_\nu \), so the S-transform of \(\eta \) can be written as

$$\begin{aligned} S_\eta (z) = (1 + z) {\mathcal {F}}(M_\eta )(z) = (1 + z) {\mathcal {F}}(M_\mu )(z) {\mathcal {F}}(M_\nu )(z) = \frac{1}{1 + z} S_\mu (z)S_\nu (z). \end{aligned}$$

This implies the desired result, as the S-transform of \({{\,\textrm{MP}\,}}(1)\) is given by \(\frac{1}{1+z}\), and the S-transform uniquely determines a compactly supported distribution. \(\square \)

We remark briefly that free independence can equivalently be formulated in terms of the vanishing of free cumulants, which are themselves defined in terms of sums over non-crossing partitions. We refer the interested reader to any of the previously cited references for a more in-depth discussion on the role of non-crossing partitions in free probability. For our purposes, the fact that non-crossing partitions are intimately related to free independence will allow us to later phrase random tensor network results in terms of free probability.
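As a quick sanity check of the theorem above, take \(\mu = \nu = \delta _1\): all the moments \(m^\mu _\pi \) and \(m^\nu _{\pi ^{-1}\tau _k}\) equal 1, so \(m_k^\eta = \left|NC(k)\right|\) should be the kth moment of \({{\,\textrm{MP}\,}}(1) \boxtimes \delta _1 \boxtimes \delta _1 = {{\,\textrm{MP}\,}}(1)\), i.e., the kth Catalan number. A brute-force check in Python (the helper functions are our own):

```python
from itertools import combinations
from math import comb

def set_partitions(elems):
    """Generate all partitions of a list, as lists of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        # put `first` into an existing block, or into a new singleton block
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def is_noncrossing(part):
    """Non-crossing iff no a < b < c < d has a, c in one block and b, d in another."""
    for X, Y in combinations(part, 2):
        for a, c in combinations(sorted(X), 2):
            for b, d in combinations(sorted(Y), 2):
                if a < b < c < d or b < a < d < c:
                    return False
    return True

nc_counts = [sum(is_noncrossing(p) for p in set_partitions(list(range(1, k + 1))))
             for k in range(1, 6)]
catalan = [comb(2 * k, k) // (k + 1) for k in range(1, 6)]
print(nc_counts, catalan)  # both [1, 2, 5, 14, 42]
```

The first crossing appears only at \(k = 4\) (the partition \(\{1,3\}\sqcup \{2,4\}\)), which is why the counts agree with the Bell numbers up to \(k = 3\).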

3.2 Entanglement Spectrum of Random Tensor Network States as a Free Product

We now return to studying random tensor network states. Consider a family of states in \({\mathcal {P}}_{\scriptscriptstyle {=}}(V)\) composed of the tensor product of link states \(\vert \phi _e\rangle \) along the edges \(e \in E\) as in Eq. (2.1), and assume that along each edge, the bond dimensions scale with a parameter D, so \(D_e = \Theta (D)\). Our key assumption is that the link states have bounded spectral variation—by this we mean that the empirical distribution of the rescaled entanglement spectrum of the link states

$$\begin{aligned} \mu ^{(D)}_{e} := \sum _{i=1}^{D_e} \frac{1}{D_e} \delta _{D_e\lambda _{e,i}} \end{aligned}$$
(3.7)

has all moments converging to the moments \(m_{e,k}\) of a compactly supported probability distribution \(\mu _{e}\), as D goes to infinity. We assume that the link states are normalized, so \(m_{e,1} = 1\). This condition implies that, up to a vanishing fraction as \(D\rightarrow \infty \), the elements of the entanglement spectrum of the link state are of order \(D^{-1}\).

For a minimal cut \(\gamma _A\), let \(\mu ^{(D)}_{\gamma _A}\) be the distribution for the spectrum of the tensor product of the link states in \(\gamma _A\):

$$\begin{aligned} \mu ^{(D)}_{\gamma _A} = \bigotimes _{e \in \gamma _A} \mu ^{(D)}_{e} = \frac{1}{D_{\gamma _A}}\sum _{\{i_{e}\}} \delta _{D_{\gamma _A} \prod _{e \in \gamma _A}\lambda _{e,i_e}} \end{aligned}$$

where \(i_e = 1,\ldots ,D_e\) and \(D_{\gamma _A} = \prod _{e \in \gamma _A} D_e\). We define the tensor product of distributions as follows: If \(X_1\) and \(X_2\) are independent real-valued random variables with distributions \(\mu _{X_1}\) and \(\mu _{X_2}\), then \(\mu _{X_1} \otimes \mu _{X_2}\) is defined as the distribution of the product \(X_1 X_2\). The distribution \(\mu ^{(D)}_{\gamma _A}\) has kth moment given by \(m^{(D)}_{\gamma _A,k} = \prod _{e \in \gamma _A} m^{(D)}_{e,k}\), and we can see that \(m^{(D)}_{\gamma _A,k}\) converges to \(m_{\gamma _A,k}\), the moments of the distribution

$$\begin{aligned} \mu _{\gamma _A} := \bigotimes _{e \in \gamma _A} \mu _{e}. \end{aligned}$$
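Since the moments multiply under this product, a small exact check is easy to run; the weights and atoms below are toy values of our own choosing, normalized so that the first moments equal 1:

```python
from fractions import Fraction as F
from itertools import product

# two toy rescaled link spectra, given as (weight, atom) pairs
mu = [(F(1, 2), F(3, 2)), (F(1, 2), F(1, 2))]
nu = [(F(1, 3), F(2)), (F(2, 3), F(1, 2))]

def moment(dist, k):
    return sum(w * x ** k for w, x in dist)

# distribution of the product of independent draws from mu and nu
prod_dist = [(w1 * w2, x1 * x2) for (w1, x1), (w2, x2) in product(mu, nu)]

for k in range(1, 8):
    assert moment(prod_dist, k) == moment(mu, k) * moment(nu, k)
```

Exact rational arithmetic via `fractions` avoids any floating-point tolerance in the identity \(m_k(\mu \otimes \nu ) = m_k(\mu )\, m_k(\nu )\).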

Let \({{\,\textrm{spec}\,}}(\rho _A) = \{\lambda _{A,i}\}\). (Recall that \({{\,\textrm{spec}\,}}(\rho _A)\) is ordered in non-increasing order.) Let \(\gamma _A\) be a cut for A. By a standard argument, the number of nonzero eigenvalues of \(\rho _A\) (that is, \({{\,\textrm{rank}\,}}(\rho _A)\)) is upper bounded by \(D_{\gamma _A}\). If \(\gamma _A\) is the unique minimal cut, then we define

$$\begin{aligned} \mu ^{(D)}_{A} := \frac{1}{D_{\gamma _A}} \sum _{i=1}^{D_{\gamma _A}} \delta _{D_{\gamma _A} \lambda _{A,i}}. \end{aligned}$$
(3.8)

If there are multiple minimal cuts, it is ambiguous which \(\gamma _A\), and hence, which \(D_{\gamma _A}\), we should pick; we choose the cut for which \(D_{\gamma _A}\) is minimal in Eq. (3.8), and we will denote this minimal cut by \(\gamma _{A,1}\). The moments of \(\mu ^{(D)}_{A}\) are given by

$$\begin{aligned} m^{(D)}_{A,k} := \int z^k \textrm{d}\mu ^{(D)}_{A}(z) = D_{\gamma _A}^{k-1} \sum _{i=1}^{D_{\gamma _A}} (\lambda _{A,i})^k. \end{aligned}$$

Note that the distribution \(\mu ^{(D)}_A\) is random, and correspondingly, the moments \(m^{(D)}_{A,k}\) are random variables. In contrast, the moments \(m^{(D)}_{e,k}\) and \(m^{(D)}_{\gamma _A,k}\) are numbers depending only on the bond dimension.

The theorem we want to prove will follow straightforwardly from a key intermediate result: As D goes to infinity, all the moments of the boundary distribution \(\mu _A^{(D)}\) converge to the moments of \(\mu _{\gamma _A}\). We use the notation in Eq. (3.6) to write expressions like

$$\begin{aligned} m_{\gamma _A,\pi } = \prod _{l \in C(\pi )} m_{\gamma _A,l} \end{aligned}$$

for a permutation \(\pi \in S_k\). We will then apply the method of moments to show that convergence of moments implies convergence in distribution. As a remark on notation, in the error bounds in both the current section and Sect. 4, when we use \({\mathcal {O}}\)-notation, the constants may depend on the graph underlying the tensor network. (Typically our bounds scale as \(2^{\left|V_b\right|}\), where \(V_b\) is the set of bulk vertices.)

Proposition 3.3

If there exists a unique minimal cut \(\gamma _A\) for A, then

$$\begin{aligned} \lim _{D \rightarrow \infty } \mathbbm {E}m^{(D)}_{A,k} = m_{\gamma _A,k}. \end{aligned}$$
(3.9)

If there exist exactly two minimal cuts \(\gamma _{A,1}\) and \(\gamma _{A,2}\), which do not intersect (so \(\gamma _{A,1} \cap \gamma _{A,2} = \emptyset \)) and for which \(\frac{D_{\gamma _{A,1}}}{D_{\gamma _{A,2}}}\) converges to a constant \(t \le 1\), then

$$\begin{aligned} \lim _{D \rightarrow \infty } \mathbbm {E}m^{(D)}_{A,k} = \sum _{\pi \in NC(k)} t^{d(\pi ,\textrm{id})} m_{\gamma _{A,1},\tau _k^{-1}\pi } m_{\gamma _{A,2},\pi }. \end{aligned}$$
(3.10)

Moreover, in both cases the variance goes to zero as D goes to infinity: for every k

$$\begin{aligned} \mathbbm {E}\left[\left( m^{(D)}_{A,k} - \mathbbm {E}\left[m^{(D)}_{A,k}\right]\right) ^2\right] = {\mathcal {O}}\left( \frac{1}{D}\right) . \end{aligned}$$
(3.11)

Proof

We first provide a sketch of the proof. It proceeds via the following steps:

  1. Write the expectation of the moments of \(\mu _A^{(D)}\) as the partition function for a classical spin model, as in Sect. 2.1.

  2. Show that the contributions from terms of the form given in the statement of the proposition dominate the partition function, by carefully tracking the powers of D and showing that all other contributions are suppressed polynomially in D.

  3. Show that the variance of the moments vanishes in the limit \(D \rightarrow \infty \) by direct computation.

We begin with Step 1. First, we observe that the kth moment of \(\mu _A^{(D)}\) is given by \(m^{(D)}_{A,k} = D_{\gamma _A}^{k-1} {{\,\textrm{tr}\,}}\left[\rho _A^k\right]\). Consider the expression in Eq. (2.14) for the replica trick with permutation \(\pi \) on A:

$$\begin{aligned} Z_{k, \pi } := \mathbbm {E}{{\,\textrm{tr}\,}}\left[R_A(\pi ) \rho _A^{\otimes k}\right] = \sum _{\{\pi _x\} \in {\mathcal {S}}_{A,\pi }} \prod _{e = (xy) \in E} \prod _{l \in C(\pi _x^{-1} \pi _y)} {{\,\textrm{tr}\,}}\left[\phi _{e,x}^l\right]. \end{aligned}$$
(3.12)

Recall that the set \({\mathcal {S}}_{A,\pi }\), as defined in Eq. (2.11), consists of assignments of permutations to each \(x \in V\), subject to \(\pi _x = \pi \) for \(x \in A\) and \(\pi _x = \textrm{id}\) for \(x \in {\bar{A}}\). As in Eq. (2.12), if \(\pi = \tau \), then we indeed have \(Z_{k, \tau } = \mathbbm {E}{{\,\textrm{tr}\,}}\left[\rho _A^k\right]\), so

$$\begin{aligned} \mathbbm {E}m^{(D)}_{A,k} = D_{\gamma _A}^{k-1}Z_{k, \tau }. \end{aligned}$$
(3.13)

On the other hand, if we let \(k = 2n\) and \(\pi = {\tilde{\tau }} = (1 2 \ldots n)(n+1 \, n+2 \ldots 2n)\), then \(Z_{k, \pi } = \mathbbm {E}\left[{{\,\textrm{tr}\,}}\left[\rho _A^n\right]^2\right]\), and hence,

$$\begin{aligned} \mathbbm {E}\left[\left( m^{(D)}_{A,n}\right) ^2\right] = D_{\gamma _A}^{2n-2}Z_{k, {{\tilde{\tau }}}}. \end{aligned}$$
(3.14)

Recall that \(m^{(D)}_{e,l} = D_e^{l-1}{{\,\textrm{tr}\,}}\left[\phi _{e,x}^l\right]\), and write

$$\begin{aligned} Z_{k, \pi } = \sum _{\{\pi _x\} \in {\mathcal {S}}_{A,\pi }} Z_k(\{\pi _x\}) \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} Z_{k}(\{\pi _x\})&:= \prod _{e = (xy) \in E} \prod _{l \in C(\pi _x^{-1} \pi _y)} {{\,\textrm{tr}\,}}\left[\phi _{e,x}^l\right]\\&= \prod _{e = (xy) \in E} D_e^{\left|C(\pi _x^{-1} \pi _y)\right| - k}\prod _{l \in C(\pi _x^{-1} \pi _y)} m^{(D)}_{e,l} \\&= \prod _{e = (xy) \in E} D_e^{-d(\pi _x,\pi _y)} m^{(D)}_{e,\pi _x^{-1} \pi _y}. \end{aligned} \end{aligned}$$
(3.15)

This accomplishes Step 1: We have recast the problem of computing moments into a question of computing a partition function for a classical spin model with fixed boundary conditions.

For Step 2, we want to show that the dominant contribution(s) to \(Z_{k, \pi }\) as D goes to infinity are those given in the statement of the proposition. This will simply be a matter of checking powers of D, and using the triangle inequality property of the Cayley distance. If \(\Gamma _{A}\) is the unique minimal cut, then we let \(\pi ^{\min }_x = \pi \) for \(x \in \Gamma _A\) and \(\pi ^{\min }_x = \textrm{id}\) for \(x \in V {\setminus } \Gamma _A\), and we have

$$\begin{aligned} Z_k(\{\pi ^{\min }_x\})&= D_{\gamma _A}^{-d(\pi ,\textrm{id})} \prod _{e \in \gamma _A} m^{(D)}_{e,\pi } = D_{\gamma _A}^{-d(\pi ,\textrm{id})} m^{(D)}_{\gamma _A,\pi }. \end{aligned}$$
(3.16)

If there are exactly two minimal cuts \(\Gamma _{A,1} \subset \Gamma _{A,2}\), we let \(V = V_1 \sqcup V_2 \sqcup V_3\), with \(V_1 = \Gamma _{A,1}\), \(V_2 = \Gamma _{A,2} \cap \Gamma _{A,1}^c\) and \(V_3 = \Gamma _{A,2}^c\). Now, consider the permutations \(\sigma \in S_k\) that are on a geodesic between \(\pi \) and \(\textrm{id}\) (recall this implies \(d(\pi ,\sigma ) + d(\sigma , \textrm{id}) = d(\pi ,\textrm{id})\)), and consider the configuration given by \(\pi ^{\sigma }_x = \pi \) for \(x \in V_1\), \(\pi ^{\sigma }_x = \sigma \) for \(x \in V_2\), and \(\pi ^{\sigma }_x = \textrm{id}\) for \(x \in V_3\). By hypothesis, \(\gamma _{A,1}\) and \(\gamma _{A,2}\) do not intersect, and hence, the edges in each cut are distinct. Then, this configuration has weight

$$\begin{aligned} Z_k(\{\pi ^{\sigma }_x\})&=\prod _{e_1 \in \gamma _{A,1}} D_{e_1}^{-d(\pi ,\sigma )} \prod _{l_1 \in C(\pi ^{-1}\sigma )} m^{(D)}_{e_1,l_1} \prod _{e_2 \in \gamma _{A,2}} D_{e_2}^{-d(\sigma ,\textrm{id})} \prod _{l_2 \in C(\sigma )} m^{(D)}_{e_2,l_2}\nonumber \\&= D_{\gamma _{A,1}}^{-d(\pi ,\textrm{id})} \left( \tfrac{D_{\gamma _{A,1}}}{D_{\gamma _{A,2}}}\right) ^{d(\sigma ,\textrm{id})} \prod _{e_1 \in \gamma _{A,1}} m^{(D)}_{e_1,\pi ^{-1}\sigma } \prod _{e_2 \in \gamma _{A,2}} m^{(D)}_{e_2,\sigma }\nonumber \\&= D_{\gamma _{A,1}}^{-d(\pi ,\textrm{id})} \left( \tfrac{D_{\gamma _{A,1}}}{D_{\gamma _{A,2}}}\right) ^{d(\sigma ,\textrm{id})} m^{(D)}_{\gamma _{A,1},\pi ^{-1}\sigma } m^{(D)}_{\gamma _{A,2},\sigma }, \end{aligned}$$
(3.17)

where \(D_{\gamma _{A,1}}/D_{\gamma _{A,2}}\) converges to t, by assumption. Now, to show that these configurations yield the dominant contributions, we will need to use that \(D_e = \Theta (D)\), so let us write \(\frac{D}{D_e} = C_e^{(D)} = \Theta (1)\). Then for general configurations labeled by \(\pi _x\), we may rewrite Eq. (3.15) as

$$\begin{aligned} Z_k(\{\pi _x\}) = \prod _{e = (xy) \in E} D^{-d(\pi _x,\pi _y)} (C^{(D)}_e)^{d(\pi _x,\pi _y)} m^{(D)}_{e,\pi _x^{-1} \pi _y}. \end{aligned}$$

The configurations we claimed to be dominant satisfy \(Z_k(\{\pi \}) = \Theta (D^{-m(A)d(\pi ,\textrm{id})})\), where we recall that m(A) is the size of a minimal cut for A. Now, we will show that all other configurations satisfy \(Z_k(\{\pi \}) = {\mathcal {O}}(D^{-m(A)d(\pi ,\textrm{id}) - 1})\). To this end, consider some arbitrary configuration \(\{\pi _x\} \in {\mathcal {S}}_{A,\pi }\). Let P be a maximal set of edge-disjoint paths in G from A to \({\bar{A}}\). It is a well-known fact that such a set has size m(A), by the max-flow min-cut theorem. Let

$$\begin{aligned} C_k^{(D)} := \left( \max _{e \in E, l = 1, \ldots , k} (C^{(D)}_e)^{l-1}\right) \left( \max _{e \in E, \pi \in S_k} m^{(D)}_{e,\pi }\right) . \end{aligned}$$

Then, we may bound

$$\begin{aligned} Z_k(\{\pi _x\})&\le (C_k^{(D)})^{\left|E\right|} \prod _{e = (xy) \in E} D^{-d(\pi _x, \pi _y)} \le (C_k^{(D)})^{\left|E\right|} \prod _{p \in P} D^{- \sum _{e = (xy) \in p}d(\pi _x, \pi _y)}. \end{aligned}$$
(3.18)

The first inequality is clear from the definition of \(C_k^{(D)}\), and in the second inequality, we simply restrict to a subset of the edges we multiply over. Note that \(C_k^{(D)} = {\mathcal {O}}(1)\). Then, by the triangle inequality for the Cayley distance d, it holds that

$$\begin{aligned} \sum _{e = (xy) \in p}d(\pi _x, \pi _y) \ge d(\pi ,\textrm{id}) \end{aligned}$$

with equality if and only if the only edges (xy) for which \(\pi _x \ne \pi _y\) are on a path in P, and each of the paths is a geodesic. Then, we conclude

$$\begin{aligned} Z_k(\{\pi _x\}) \le (C_k^{(D)})^{\left|E\right|} \prod _{p \in P} D^{- d(\pi ,\textrm{id})} = (C_k^{(D)})^{\left|E\right|} D^{- m(A)d(\pi ,\textrm{id})}, \end{aligned}$$

and we see that the weight of every configuration is bounded by an \({\mathcal {O}}(1)\) constant times the power \(D^{- m(A)d(\pi ,\textrm{id})}\).

Now, as promised, we show that if \(\{\pi _x\}\) is not one of the minimal configurations described above, we actually have

$$\begin{aligned} Z_k(\{\pi _x\}) = {\mathcal {O}}(D^{- m(A)d(\pi ,\textrm{id}) - 1}). \end{aligned}$$
(3.19)

To see this, we rewrite the triangle inequality for the Cayley distance as:

$$\begin{aligned} \prod _{e = (xy) \in E} D^{-d(\pi _x, \pi _y)} \le \prod _{p \in P} D^{- \sum _{e = (xy) \in p}d(\pi _x, \pi _y)} \le D^{- m(A)d(\pi ,\textrm{id})} \end{aligned}$$
(3.20)

with equality if and only if the \(\pi _x\) are on a geodesic path in P. We now show that this is satisfied only for the configurations we claimed to be minimal. Assume that \(\{\pi _x\} \in {\mathcal {S}}_{A,\pi }\) is such that the inequalities in Eq. (3.20) are equalities and let

$$\begin{aligned} \Delta _n = \{x \in V \text { such that } d(\pi _x, \pi ) \le n \}. \end{aligned}$$

Then, \(\Delta _n \in C(A)\) for \(0 \le n < d(\pi ,\textrm{id})\), and we denote by \(\delta _n\) the associated set of edges crossing the cut. Each edge \((xy) \in \delta _n\) must be such that \(\pi _x \ne \pi _y\), so it must lie on a path in P, and because the permutations along each path form a geodesic, distinct edges of \(\delta _n\) must lie on different paths. Hence \(\left|\delta _n\right| \le \left|P\right| = m(A)\), implying that each \(\Delta _n\) is a minimal cut. This immediately implies the claim if there is a unique minimal cut, since we must have \(\Delta _{d(\pi ,\textrm{id}) - 1} = \Delta _{0} = \Gamma _A\). If there are exactly two minimal cuts, then we must have \(\pi _x = \pi \) for \(x \in V_1\), \(\pi _x = \textrm{id}\) for \(x \in V_3\), and there must be some l such that for all \(x \in V_2\) we have \(d(\pi ,\pi _x) = l\) and \(d(\pi _x,\textrm{id}) = d(\pi ,\textrm{id}) - l\). Then, in order to have equality in Eq. (3.20), \(\pi _x\) must equal some fixed permutation \(\sigma \) for all \(x \in V_2\): the assumption of having exactly two cuts implies that \(V_2\) is connected, and we must have \(d(\pi _x, \pi _y) = 0\) for all \((xy) \in E\) with \(x, y\in V_2\). This proves Eq. (3.19).

In conclusion, if there is a unique minimal cut, then by Eqs. (3.16) and (3.19), we find

$$\begin{aligned} Z_{k,\pi } = D_{\gamma _A}^{-d(\pi ,\textrm{id})} m^{(D)}_{\gamma _A,\pi } + {\mathcal {O}}(D^{-m(A)d(\pi ,\textrm{id}) - 1}), \end{aligned}$$
(3.21)

and if there are exactly two (non-intersecting) cuts, then by Eqs. (3.17) and (3.19), we find

$$\begin{aligned} Z_{k,\pi }&= \sum _{\sigma , \, d(\pi ,\sigma ) + d(\sigma ,\textrm{id}) = d(\pi ,\textrm{id})} D_{\gamma _{A,1}}^{-d(\pi ,\textrm{id})} \left( \tfrac{D_{\gamma _{A,1}}}{D_{\gamma _{A,2}}}\right) ^{d(\sigma ,\textrm{id})} m^{(D)}_{\gamma _{A,1},\pi ^{-1}\sigma } m^{(D)}_{\gamma _{A,2},\sigma } \nonumber \\&\quad + {\mathcal {O}}(D^{-m(A)d(\pi ,\textrm{id}) - 1}). \end{aligned}$$
(3.22)

Finally, we set \(\pi = \tau \) for the full cycle \(\tau \) and we use Eq. (3.13). For a unique minimal cut \(\gamma _A\), by Eq. (3.21)

$$\begin{aligned} \begin{aligned} \mathbbm {E}m^{(D)}_{A,k}&= D_{\gamma _A}^{k-1} Z_{k, \tau }\\&= m^{(D)}_{\gamma _A,\tau } + {\mathcal {O}}( D_{\gamma _A}^{k-1} D^{-m(A)(k-1) - 1})\\&= m^{(D)}_{\gamma _A,k} + {\mathcal {O}}\left( \frac{1}{D}\right) , \end{aligned} \end{aligned}$$
(3.23)

using \(d(\tau ,\textrm{id}) = k-1\) and \(D_{\gamma _A} = \Theta (D^{m(A)})\). This proves Eq. (3.9) as \(m^{(D)}_{\gamma _A,k}\) converges to \(m_{\gamma _A,k}\).

For two non-intersecting minimal cuts, we saw that the dominant contribution is due to configurations \(\{\pi ^\sigma _x\}\) for \(\sigma \) on a geodesic between \(\tau \) and \(\textrm{id}\). Applying the observation that \(\sigma \) is on such a geodesic if and only if \(\sigma \) corresponds to a non-crossing partition, Eq. (3.22) similarly yields

$$\begin{aligned} \begin{aligned} \mathbbm {E}m^{(D)}_{A,k}&= D_{\gamma _{A,1}}^{k-1} Z_{k, \tau }\\&= D_{\gamma _{A,1}}^{k-1}\sum _{\sigma \in NC(k)} D_{\gamma _{A,1}}^{-(k-1)}\left( \tfrac{D_{\gamma _{A,1}}}{D_{\gamma _{A,2}}}\right) ^{d(\sigma ,\textrm{id})} m^{(D)}_{\gamma _{A,1},\tau ^{-1}\sigma } m^{(D)}_{\gamma _{A,2},\sigma } \\ {}&\quad + {\mathcal {O}}( D_{\gamma _{A,1}}^{k-1} D^{-m(A)(k-1) - 1})\\&= \sum _{\sigma \in NC(k)}\left( \tfrac{D_{\gamma _{A,1}}}{D_{\gamma _{A,2}}}\right) ^{d(\sigma ,\textrm{id})} m^{(D)}_{\gamma _{A,1},\tau ^{-1}\sigma } m^{(D)}_{\gamma _{A,2},\sigma } + {\mathcal {O}}\left( \frac{1}{D}\right) . \end{aligned} \end{aligned}$$
(3.24)

Since \(m^{(D)}_{\gamma _{A,1},\tau ^{-1}\sigma } \rightarrow m_{\gamma _{A,1},\tau ^{-1}\sigma }\), \(m^{(D)}_{\gamma _{A,2},\sigma } \rightarrow m_{\gamma _{A,2},\sigma }\) and \(D_{\gamma _{A,1}}/ D_{\gamma _{A,2}} \rightarrow t\), this proves Eq. (3.10).

This accomplishes Step 2: We have shown that the configurations \(\{\pi ^{\min }_x\}\) (in case of a unique minimal cut for A) and \(\{\pi ^{\sigma }_x\}\) (in case there are exactly two non-intersecting minimal cuts for A) dominate in the computation of the expectation of the kth moment of \(\mu _A\) in terms of powers of D.
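The geodesic counting used in Step 2 can be verified by brute force for small k; with the Cayley distance \(d(\pi ,\sigma ) = k - \left|C(\pi \sigma ^{-1})\right|\), the number of \(\sigma \in S_k\) on a geodesic between the full cycle \(\tau \) and \(\textrm{id}\) should equal \(\left|NC(k)\right| = C_k\). A short sketch in Python (the helper functions are our own):

```python
from itertools import permutations

def num_cycles(p):
    """Number of cycles of a permutation given as a tuple of images of 0..n-1."""
    seen, count = set(), 0
    for i in range(len(p)):
        if i not in seen:
            count += 1
            while i not in seen:
                seen.add(i)
                i = p[i]
    return count

def cayley(p, q):
    """Cayley distance d(p, q) = n - #cycles(p q^{-1})."""
    n = len(p)
    inv_q = [0] * n
    for i, qi in enumerate(q):
        inv_q[qi] = i
    return n - num_cycles(tuple(p[inv_q[i]] for i in range(n)))

counts = []
for k in range(2, 6):
    tau = tuple((i + 1) % k for i in range(k))  # the full cycle (1 2 ... k)
    ident = tuple(range(k))
    counts.append(sum(1 for s in permutations(range(k))
                      if cayley(tau, s) + cayley(s, ident) == cayley(tau, ident)))
print(counts)  # [2, 5, 14, 42], the Catalan numbers C_2..C_5
```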

We complete the proof by showing that the variance of \(m_{A,k}\) vanishes as \(D \rightarrow \infty \). We use the observation in Eq. (3.14), applying the analysis of \(Z_{2k,\pi }\) to the case where \(\pi = {\tilde{\tau }} = (1 2 \ldots k)(k+1 \, k+2 \ldots 2k)\). If there is a unique minimal cut, then using Eq. (3.21) and the fact that \(d({\tilde{\tau }}, \textrm{id}) = 2k - 2\), we find

$$\begin{aligned} \mathbbm {E}\left[\left( m^{(D)}_{A,k}\right) ^2\right]&= D_{\gamma _A}^{2k-2} Z_{2k, {\tilde{\tau }}} = \prod _{l \in C({\tilde{\tau }})} m^{(D)}_{\gamma _A,l} + {\mathcal {O}}(D_{\gamma _A}^{2k-2} D^{-m(A)(2k-2) - 1}) \\&= (m^{(D)}_{\gamma _A,k})^2 + {\mathcal {O}}\left( \frac{1}{D}\right) . \end{aligned}$$

By Eq. (3.23), we know that \((\mathbbm {E}m^{(D)}_{A,k})^2 = (m^{(D)}_{\gamma _A,k} + {\mathcal {O}}(\frac{1}{D}))^2\), and we conclude that the variance obeys

$$\begin{aligned} \mathbbm {E}\left[\left( m^{(D)}_{A,k} - \mathbbm {E}\left[m^{(D)}_{A,k}\right]\right) ^2\right] = \mathbbm {E}\left[(m^{(D)}_{A,k})^2\right] - \left( \mathbbm {E}m^{(D)}_{A,k}\right) ^2 = {\mathcal {O}}\left( \frac{1}{D}\right) . \end{aligned}$$

For the case with exactly two minimal cuts, a similar argument holds. Here, the key observation is that \(\sigma \) is on a geodesic between \(\tilde{\tau }\) and \(\textrm{id}\) if and only if \(\sigma = \sigma _1 \sigma _2\) where \(\sigma _1\) is on a geodesic between \((1 2 \ldots k)\) and \(\textrm{id}\) and \(\sigma _2\) is on a geodesic between \((k+1 \, k+2 \ldots 2k)\) and \(\textrm{id}\). Using Eq. (3.22), this implies

$$\begin{aligned} \mathbbm {E}(m^{(D)}_{A,k})^2&= D_{\gamma _A}^{2k-2} Z_{2k, {{\tilde{\tau }}}}\\&=\sum _{\sigma _1,\sigma _2 \in NC(k)}\left( \tfrac{D_{\gamma _{A,1}}}{D_{\gamma _{A,2}}}\right) ^{d(\sigma _1,\textrm{id}) +d(\sigma _2,\textrm{id}) } m^{(D)}_{\gamma _{A,1},\tau ^{-1}\sigma _1} m^{(D)}_{\gamma _{A,1},\tau ^{-1}\sigma _2} m^{(D)}_{\gamma _{A,2},\sigma _1} m^{(D)}_{\gamma _{A,2},\sigma _2} + {\mathcal {O}}\left( \frac{1}{D}\right) \\&= \left( \sum _{\sigma \in NC(k)}\left( \tfrac{D_{\gamma _{A,1}}}{D_{\gamma _{A,2}}}\right) ^{d(\sigma ,\textrm{id})} m^{(D)}_{\gamma _{A,1},\tau ^{-1}\sigma } m^{(D)}_{\gamma _{A,2},\sigma } \right) ^2 + {\mathcal {O}}\left( \frac{1}{D}\right) . \end{aligned}$$

By Eq. (3.24), we see that this coincides with \((\mathbbm {E}m^{(D)}_{A,k})^2\), up to \({\mathcal {O}}(\tfrac{1}{D})\), and hence, Eq. (3.11) holds. \(\square \)

We now have the ingredients to prove that the entanglement spectrum of random tensor networks with link states with bounded spectral variation can be written in a simple fashion. We will use the method of moments to translate the above result on convergence of moments to convergence in distribution. The basic statement is that, given certain conditions on the distributions in question, if the moments of a sequence of distributions \(\mu _n\) converge to those of \(\mu \), then \(\mu _n \Rightarrow \mu \); see, for instance, Theorem 30.8 in [13].

The method of moments is valid so long as the distribution \(\mu \) is completely determined by its moments. This occurs if, for all k, the kth moment \(m_{\mu ,k}\) is bounded as

$$\begin{aligned} m_{\mu ,k} \le AB^kk! \end{aligned}$$
(3.25)

for constants \(A, B\) independent of k. If the distributions have compact support, as in Proposition 3.3, then this condition is satisfied.Footnote 2

Now that we have established the convergence of moments in Proposition 3.3, we can state our main result on the (conditional) convergence in distribution. As in Proposition 3.3, we consider a family of random tensor network states whose link states have bounded spectral variation, with increasing D, as defined at the beginning of this section.

Theorem 3.4

If there exists a unique minimal cut \(\gamma _A\) for A, then \(\mu _A^{(D)} \Rightarrow \mu _{\gamma _A}\), in probability, as \(D \rightarrow \infty \). If there exist exactly two minimal cuts \(\gamma _{A,1}\) and \(\gamma _{A,2}\), which do not intersect and for which \(\lim _{D \rightarrow \infty } \frac{D_{\gamma _{A,1}}}{D_{\gamma _{A,2}}} = t \le 1\), then \(\mu ^{(D)}_A \Rightarrow {{\,\textrm{MP}\,}}(1) \boxtimes \mu _{\gamma _{A,1}} \boxtimes \mu _{\gamma _{A,2}}(t)\), in probability, where \(\mu _{\gamma _{A,2}}(t) = (1 - t)\delta _0 + t\mu _{\gamma _{A,2}}\).

Proof

It is straightforward to see that the kth moment of \(\mu _{\gamma _{A,2}}(t)\) is given by \(t^{k-1} m_{\gamma _{A,2},k}\), and then, the result follows immediately from Proposition 3.3, Theorem 3.2, and the method of moments. Because we assumed that for any minimal cut \(\gamma _A\) for A, the limiting distributions \(\mu _{\gamma _A}\) are compactly supported, they are uniquely determined by their moments. Hence, the method of moments is valid, and the convergence of moments implies convergence in distribution. \(\square \)
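As a minimal numerical illustration of the theorem (the toy network here is our own choice, not taken from the text): a single bulk vertex joined by maximally entangled links of bond dimension D to two boundary vertices has two non-intersecting minimal cuts with \(t = 1\) and \(\mu _{\gamma _{A,1}} = \mu _{\gamma _{A,2}} = \delta _1\), so the rescaled entanglement spectrum should converge to \({{\,\textrm{MP}\,}}(1)\), whose moments are the Catalan numbers:

```python
import numpy as np

D = 500
rng = np.random.default_rng(0)
# Haar-random bulk state on H_v = C^D (x) C^D, reshaped as a D x D matrix
G = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
M = G @ G.conj().T                    # unnormalized reduced state on A
t1 = np.trace(M).real                 # normalization tr(rho_A)
m2 = D * np.trace(M @ M).real / t1**2          # should approach C_2 = 2
m3 = D**2 * np.trace(M @ M @ M).real / t1**3   # should approach C_3 = 5
print(round(m2, 2), round(m3, 2))
```

For \(D = 500\) the first two nontrivial rescaled moments already match the Catalan numbers 2 and 5 to within about a percent.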

Remark 3.5

In Theorem 3.4, we assumed that the two cuts were non-intersecting. What happens if there are still only exactly two minimal cuts, but \(\gamma _{A,1} \cap \gamma _{A,2}\) is non-empty? This extension is straightforward. Let \(\gamma _A^{(a)}:= \gamma _{A,1} \cap \gamma _{A,2}\) and let \(\gamma _{A,i}^{(b)} = \gamma _{A,i} {\setminus } \gamma _A^{(a)}\) for \(i = 1,2\). In line with previous notation, let \(\mu _{\gamma _A^{(a)}}\) and \(\mu _{\gamma _{A,i}^{(b)}}\) denote the corresponding limiting distributions of the entanglement spectra along these sets, with moments \(m_{\gamma _A^{(a)},k}\) and \(m_{\gamma _{A,i}^{(b)},k}\). The only step in the proof of Proposition 3.3 where we used that the cuts were non-intersecting is when we computed the value of \(Z_k(\{\pi _x\})\) for the optimal configuration. If the cuts do intersect, and we consider the configuration with \(\pi _x = \tau \) for \(x \in V_1\) with \(\tau \) the complete cycle, \(\pi _x = \sigma \) for \(x \in V_2\) and \(\sigma \in NC(k)\), and \(\pi _x = \textrm{id}\) for \(x \in V_3\), then a quick calculation shows

$$\begin{aligned} Z_{k}(\{\pi _x\})&\rightarrow D_{\gamma _{A,1}}^{-d(\pi ,\textrm{id})} (D_{\gamma _A,1}/D_{\gamma _A,2})^{d(\sigma ,\textrm{id})} \prod _{e \in \gamma _A^{(a)}} m_{e,k}\\ {}&\quad \prod _{e_1 \in \gamma ^{(b)}_{A,1}} m_{e_1,\pi ^{-1}\sigma } \prod _{e_2 \in \gamma ^{(b)}_{A,2}} m_{e_2,\sigma }. \end{aligned}$$

Apart from this modification, the proof of Proposition 3.3 is still valid, leading to

$$\begin{aligned} Z_{k,\tau } = m_{\gamma _A^{(a)},k} \sum _{\sigma \in NC(k)} t^{d(\sigma ,\textrm{id})}m_{\gamma _{A,1}^{(b)},\tau ^{-1}\sigma } m_{\gamma _{A,2}^{(b)},\sigma }. \end{aligned}$$

If, in Theorem 3.4, we do not assume that the cuts are non-intersecting, then the partition function above leads to a limiting distribution given by

$$\begin{aligned} \mu _{\gamma _A^{(a)}} \otimes \left( MP(1) \boxtimes \mu _{\gamma _{A,1}^{(b)}} \boxtimes \mu _{\gamma _{A,2}^{(b)}}(t)\right) . \end{aligned}$$

3.3 Non-trivial Link States and Entanglement Negativity

As another application of the theory of free probability, we will compute the entanglement negativity spectrum for random tensor network states with link states of bounded spectral variation. In [29], it was shown how to compute the entanglement negativity spectrum for a random tensor network state with maximally entangled link states using a replica trick. Using the methods from the previous subsection, we can analyze the negativity for entangled link states with bounded spectral variation. We remark that similar computations have recently been performed in [28] in the context of replica wormholes, and our assumption on the link states is a generalization of the “pairwise connected regime” in [28]. Another work investigating non-trivial entanglement negativity spectra in random tensor networks is [42], which focuses on the effect of having multiple minimal cuts in the network. As our analysis will be a straightforward combination of the arguments in [29] and Sect. 3.2, we will be rather concise; the main message of this section is to show that the language of free probability applies to other random tensor network computations as well.

We first recall how negativity functions as an entanglement measure for mixed states. Let \({\mathcal {T}}\) be the superoperator which maps an operator X to its transpose \(X^{{\hspace{-0.83328pt}{\textsf{T}}}}\), and \({\mathcal {I}}\) be the identity superoperator. For \(\rho _{AB} \in {\mathcal {P}}(AB)\),

$$\begin{aligned} \rho _{AB}^{T_B} := ({\mathcal {I}}_A \otimes {\mathcal {T}}_B)(\rho _{AB}) \end{aligned}$$

is the partial transpose of \(\rho _{AB}\) on the B system. The logarithmic or entanglement negativity is given by

$$\begin{aligned} E_N(\rho _{AB}) = \log \frac{\Vert \rho _{AB}^{T_B}\Vert _1}{{{\,\textrm{tr}\,}}[\rho ]}. \end{aligned}$$

It is a measure of the entanglement of the mixed state \(\rho _{AB}\): if \(E_N(\rho _{AB}) > 0\), the state must be entangled. We call \({{\,\textrm{spec}\,}}(\left|\rho _{AB}^{T_B}\right|)\) the entanglement negativity spectrum. In analogy to the Rényi entropies, we can generalize the logarithmic negativity to a one-parameter family of negativities. The kth Rényi negativity is given by

$$\begin{aligned} N_k(\rho _{AB}) = {{\,\textrm{tr}\,}}\left[(\rho _{AB}^{T_B})^k\right]. \end{aligned}$$

If we let \(N^{(\textrm{even})}_m(\rho _{AB}) = N_{2m}(\rho _{AB})\), then the logarithmic negativity is obtained as an analytic continuation in the Rényi index \(m \rightarrow \frac{1}{2}\) of \(\log (N^{(\textrm{even})}_m(\rho _{AB}))\). More precisely, in the expression

$$\begin{aligned} \log \sum _{\lambda \in {{\,\textrm{spec}\,}}(\rho _{AB}^{T_B})} \left|\lambda \right|^{\alpha }, \end{aligned}$$

we may take \(\alpha \rightarrow \frac{1}{2}\) to obtain \(E_N(\rho _{AB}) + \log {{\,\textrm{tr}\,}}[\rho ]\).
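To make these definitions concrete, consider a small example of our own (not from the text): the two-qubit Bell state. Its partial transpose has spectrum \(\{\tfrac{1}{2},\tfrac{1}{2},\tfrac{1}{2},-\tfrac{1}{2}\}\), so \(\Vert \rho _{AB}^{T_B}\Vert _1 = 2\) and \(E_N = \log 2\):

```python
import numpy as np

# Bell state |Phi+> = (|00> + |11>)/sqrt(2); basis order |00>, |01>, |10>, |11>
psi = np.zeros(4)
psi[0] = psi[3] = 1 / np.sqrt(2)
rho = np.outer(psi, psi)  # tr(rho) = 1

# partial transpose on B: view rho as rho[(a,b),(a',b')] and swap b <-> b'
rho_TB = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

eigs = np.linalg.eigvalsh(rho_TB)    # spectrum {1/2, 1/2, 1/2, -1/2}
E_N = np.log(np.abs(eigs).sum())     # log ||rho^{T_B}||_1 = log 2
N3 = (eigs ** 3).sum()               # Renyi negativity N_3 = 1/4
```

The reshape-transpose-reshape pattern implements the partial transpose on the second tensor factor without writing out index loops.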

In the context of random tensor networks, we partition the boundary into three regions: \(V_{\partial } = A \sqcup B \sqcup C\), and we would like to compute the Rényi negativities of the reduced state \(\rho _{AB}\). We will then use this to determine the entanglement negativity spectrum and compute the entanglement negativity. The idea is that the kth Rényi negativity can be computed using a replica trick, by placing the full cycle \(\tau _k = (1 2 \ldots k) \in S_k\) on A and \(\tau _k^{-1} = (k \, k-1 \ldots 1)\) on B:

$$\begin{aligned} N_k(\rho _{AB})&= {{\,\textrm{tr}\,}}\left[\rho _{AB}^{\otimes k} \left( R_A(\tau ) \otimes R_B(\tau ^{-1})\right) \right] \\ {}&= {{\,\textrm{tr}\,}}\left[\rho _{ABC}^{\otimes k} \left( R_A(\tau ) \otimes R_B(\tau ^{-1}) \otimes R_C(\textrm{id})\right) \right]. \end{aligned}$$

Let us first discuss the case with maximally entangled link states, following [29]. The same arguments as in Sect. 2.2 show that one can compute the expectation of \(N_k(\rho _{AB})\) for a random tensor network state using a spin model, now with boundary conditions of \(\tau _k\) on A, \(\tau _k^{-1}\) on B and \(\textrm{id}\) on C. We will assume that the minimal cuts \(\Gamma _A\), \(\Gamma _B\) and \(\Gamma _C\) are unique. Note that the minimal cut for AB is given by \(\Gamma _{AB} = \Gamma _C^c\). From the theory of multi-commodity flows, it is known that there exist sets of edge-disjoint paths \(P = P_{AB} \cup P_{AC} \cup P_{BC}\), where \(P_{AB}\) consists of paths from A to B, and similarly for \(P_{AC}\) and \(P_{BC}\), and which are such that

$$\begin{aligned} \left|P_{AB}\right| + \left|P_{AC}\right| = \left|\gamma _A\right|, \qquad \left|P_{AB}\right| + \left|P_{BC}\right| = \left|\gamma _B\right|, \qquad \left|P_{AC}\right| + \left|P_{BC}\right| = \left|\gamma _C\right|. \end{aligned}$$

This can be used to show (in analogous fashion to the proof of Proposition 3.3) that if \(k = 2n\) is even, any spin model configuration contributing to \(\mathbbm {E}N_k(\rho _{AB})\) is of order \({\mathcal {O}}(D^{-(n-1)(\left|\gamma _A\right| + \left|\gamma _B\right|) - n\left|\gamma _C\right|})\). If \(k = 2n+1\) is odd, any spin model configuration contributing to \(\mathbbm {E}N_k(\rho _{AB})\) is of order \({\mathcal {O}}(D^{-n(\left|\gamma _A\right| + \left|\gamma _B\right| + \left|\gamma _C\right|)})\).

In order to determine what happens as \(D \rightarrow \infty \), we need to determine the dominant configurations. Let r be the number of connected components of \(V {\setminus } (\Gamma _A \cup \Gamma _B \cup \Gamma _C)\). There are two distinct cases. The first is when the minimal cut for AB (which is the complement of the minimal cut for C) is the union of the minimal cuts for A and B, so \(\Gamma _{AB} = \Gamma _A \cup \Gamma _B\) and hence \(\gamma _{AB} = \gamma _A \cup \gamma _B\). Then, the minimal cuts naturally partition the bulk vertices into three cuts \(\Gamma _A\), \(\Gamma _B\), and \(\Gamma _C\), and we have \(r=0\). In this case, the dominant configurations in the spin model are those where the vertices in \(\Gamma _A\) are assigned \(\tau _k\), those in \(\Gamma _B\) are assigned \(\tau _k^{-1}\) and those in \(\Gamma _C\) are assigned \(\textrm{id}\). This is illustrated in Fig. 4a.

Fig. 4

Tensor networks with one and two minimal cuts. The relevant ground state configuration domains are denoted by \(\Gamma _A\)

The second case is when \(\Gamma _A \cup \Gamma _B \subsetneq \Gamma _{AB}\) and hence \(\gamma _{AB} \ne \gamma _A \cup \gamma _B\). Now, we have again the domains \(\Gamma _A\), \(\Gamma _B\), and \(\Gamma _C\), but upon removing these vertices, there may also be connected components \(V_1, \ldots , V_r\) which are not connected to A, B or C. Here, the minimal configurations are those for which, again, the vertices in \(\Gamma _A\) are assigned \(\tau _k\), those in \(\Gamma _B\) are assigned \(\tau _k^{-1}\) and those in \(\Gamma _C\) are assigned \(\textrm{id}\), and where in each component \(V_i\) the vertices are assigned a permutation \(\pi _i\) satisfying three conditions: it must lie on a geodesic between \(\tau _k\) and \(\tau _k^{-1}\), on a geodesic between \(\tau _k\) and \(\textrm{id}\), and on a geodesic between \(\tau _k^{-1}\) and \(\textrm{id}\). If \(k = 2n\) is even, such permutations are given by non-crossing pairings: permutations corresponding to non-crossing partitions in which each cycle has length 2. The set of non-crossing pairings on 2n elements is in bijection with the set of non-crossing partitions on n elements, so the number of non-crossing pairings on 2n elements is given by \(\left|NC(n)\right| = C_n\). One way to obtain this correspondence is as follows. If \(\pi \) is a non-crossing pairing, \(\tau _{2n} \pi \) will map even numbers to even numbers, and restricting to the even numbers and relabeling \(2i \mapsto i\) yields a non-crossing partition \(\sigma \in NC(n)\). Moreover, restricting to the odd numbers and relabeling \(2i + 1 \mapsto i\) yields the non-crossing partition \(\sigma ^{-1}\tau \in NC(n)\). This leads to \(C_n^r\) dominant contributions to \(\mathbbm {E}N_{2n}(\rho _{AB})\) of size \(D^{-(n-1)(\left|\gamma _A\right| + \left|\gamma _B\right|) - n\left|\gamma _C\right|}\) since we can choose a non-crossing pairing \(\pi _i\) for each component. Such a configuration is illustrated in Fig. 4b.
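This counting can be verified by brute force for small n (a sketch with our own helper functions, using the Cayley distance \(d(\pi ,\sigma ) = k - \left|C(\pi \sigma ^{-1})\right|\)): the permutations in \(S_{2n}\) lying simultaneously on all three geodesics should be fixed-point-free involutions, \(C_n\) in number:

```python
from itertools import permutations

def num_cycles(p):
    seen, count = set(), 0
    for i in range(len(p)):
        if i not in seen:
            count += 1
            while i not in seen:
                seen.add(i)
                i = p[i]
    return count

def cayley(p, q):
    # Cayley distance d(p, q) = n - #cycles(p q^{-1})
    n = len(p)
    inv_q = [0] * n
    for i, qi in enumerate(q):
        inv_q[qi] = i
    return n - num_cycles(tuple(p[inv_q[i]] for i in range(n)))

def on_geodesic(a, s, b):
    return cayley(a, s) + cayley(s, b) == cayley(a, b)

counts = []
for n in (1, 2, 3):
    k = 2 * n
    tau = tuple((i + 1) % k for i in range(k))       # tau_k
    tau_inv = tuple((i - 1) % k for i in range(k))   # tau_k^{-1}
    ident = tuple(range(k))
    triple = [s for s in permutations(range(k))
              if on_geodesic(tau, s, ident)
              and on_geodesic(tau_inv, s, ident)
              and on_geodesic(tau, s, tau_inv)]
    # every such permutation should be a fixed-point-free involution (a pairing)
    for s in triple:
        assert all(s[s[i]] == i and s[i] != i for i in range(k))
    counts.append(len(triple))
print(counts)  # [1, 2, 5], the Catalan numbers C_1, C_2, C_3
```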

For odd \(k = 2n + 1\), we similarly have permutations which correspond to a non-crossing partition with a single fixed point and all other cycles of length 2. This leads to \(((2n + 1)C_n)^r\) dominant contributions to \(\mathbbm {E}N_{2n+1}(\rho _{AB})\), of size \(D^{-n(\left|\gamma _A\right| + \left|\gamma _B\right| + \left|\gamma _C\right|)}\). We also note that \({{\,\textrm{rank}\,}}(\rho _{AB}^{T_B}) \le D^{\left|\gamma _A\right| + \left|\gamma _B\right|}\). If \({{\,\textrm{spec}\,}}(\rho _{AB}^{T_B}) = \{s_i\}\), then we define the measure

$$\begin{aligned} \mu ^{(D)}_{AB} = \frac{1}{D^{\left|\gamma _A\right| + \left|\gamma _B\right|}} \sum _{i=1}^{D^{\left|\gamma _A\right| + \left|\gamma _B\right|}} \delta _{D^{\frac{1}{2}(\left|\gamma _A\right| + \left|\gamma _B\right| + \left|\gamma _C\right|)}s_i}. \end{aligned}$$
(3.26)

This has moments given by

$$\begin{aligned} m^{(D)}_{AB,k} = \int x^k \textrm{d}\mu ^{(D)}_{AB}(x) = D^{(\tfrac{k}{2}-1)(\left|\gamma _A\right| + \left|\gamma _B\right|) + \tfrac{k}{2}\left|\gamma _C\right|} N_{k}(\rho _{AB}). \end{aligned}$$

If we take the expectation of the moments, we again need to distinguish the two cases. If \(\left|\gamma _A\right| + \left|\gamma _B\right| = \left|\gamma _C\right|\), the powers of D cancel for the dominant configurations, so \(\mathbbm {E}m^{(D)}_{AB,k} \rightarrow 1\) for all k. On the other hand, for \(\left|\gamma _A\right| + \left|\gamma _B\right| > \left|\gamma _C\right|\), we see that for \(D \rightarrow \infty \) with odd k, we have \(\mathbbm {E}m^{(D)}_{AB,k} \rightarrow 0\). For even k, we recover the degeneracy of the dominant configurations, leading to

$$\begin{aligned} \lim _{D \rightarrow \infty } \mathbbm {E}\,m^{(D)}_{AB,k} = {\left\{ \begin{array}{ll} 1 &{} \text {if }\left|\gamma _A\right| + \left|\gamma _B\right| = \left|\gamma _C\right|,\\ 0 &{} \text {if }k\text { odd and }\left|\gamma _A\right| + \left|\gamma _B\right|> \left|\gamma _C\right|,\\ C_{k/2}^r &{} \text {if }k\text { even and }\left|\gamma _A\right| + \left|\gamma _B\right| > \left|\gamma _C\right|, \end{array}\right. } \end{aligned}$$
(3.27)

where r is the number of connected components of \(V {\setminus } (\Gamma _A \cup \Gamma _B \cup \Gamma _C)\). In fact, one can show that, as in Proposition 3.3, the variance goes to zero as well, and hence, the method of moments allows one to conclude that \(\mu ^{(D)}_{AB} \Rightarrow \mu _{AB}\), in probability, where

$$\begin{aligned} \mu _{AB} = {\left\{ \begin{array}{ll} \sigma ^{\otimes r} &{} \text {if }r> 0,\\ \frac{1}{2}\delta _1 + \frac{1}{2}\delta _{-1} &{} \text {if }r=0\text { and }\left|\gamma _A\right| + \left|\gamma _B\right| > \left|\gamma _C\right|,\\ \delta _1 &{} \text {if }r=0\text { and }\left|\gamma _A\right| + \left|\gamma _B\right| = \left|\gamma _C\right|, \end{array}\right. } \end{aligned}$$
(3.28)

where \(\sigma \) is the semicircle distribution with density

$$\begin{aligned} \textrm{d}\sigma (x) = \frac{1}{2\pi }{\sqrt{4-x^2}}\, \mathbbm {1}_{\left|x\right|\le 2}\,\textrm{d}x. \end{aligned}$$
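As a numerical aside (our own, not in the original text): the even moments of \(\sigma \) are the Catalan numbers \(C_n\), matching the moments \(C_{k/2}^r\) above for \(r = 1\), while the odd moments vanish by symmetry. A direct quadrature confirms this:

```python
import numpy as np
from math import comb, pi

def trapezoid(f, x):
    # simple trapezoidal rule on samples f over grid x
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2)

x = np.linspace(-2, 2, 200001)
density = np.sqrt(4 - x**2) / (2 * pi)       # semicircle density
for n in range(5):
    m2n = trapezoid(x**(2 * n) * density, x)
    catalan = comb(2 * n, n) / (n + 1)
    assert abs(m2n - catalan) < 1e-3         # even moments: 1, 1, 2, 5, 14
    assert abs(trapezoid(x**(2 * n + 1) * density, x)) < 1e-9  # odd moments vanish
```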

Alternatively, one may study the empirical distribution of the squared entanglement negativity spectrum

$$\begin{aligned} \nu _{AB}^{(D)} = \frac{1}{D^{\left|\gamma _A\right| + \left|\gamma _B\right|}} \sum _i \delta _{D^{\left|\gamma _A\right| + \left|\gamma _B\right| + \left|\gamma _C\right|}s_i^2}. \end{aligned}$$
(3.29)

This distribution has kth moment given by \(m^{(D)}_{AB,2k}\); comparing with the limiting moments in Eq. (3.27), one can conclude that \(\nu _{AB}^{(D)} \Rightarrow \nu _{AB}\), in probability, where

$$\begin{aligned} \nu _{AB} = MP(1)^{\otimes r}. \end{aligned}$$
(3.30)

The logarithmic negativity can be computed using the distribution \(\mu ^{(D)}_{AB}\) or \(\nu ^{(D)}_{AB}\) as

$$\begin{aligned} E_N(\rho _{AB})&= \log \int \left|\lambda \right| \textrm{d}\mu ^{(D)}_{AB}(\lambda ) + \frac{\log D}{2}(\left|\gamma _A\right| + \left|\gamma _B\right| - \left|\gamma _C\right|) - \log {{\,\textrm{tr}\,}}\left[\rho \right] \end{aligned}$$
(3.31)
$$\begin{aligned}&= \log \int \sqrt{\lambda } \textrm{d}\nu ^{(D)}_{AB}(\lambda ) + \frac{\log D}{2}(\left|\gamma _A\right| + \left|\gamma _B\right| - \left|\gamma _C\right|) - \log {{\,\textrm{tr}\,}}\left[\rho \right]. \end{aligned}$$
(3.32)

The convergence of \(\nu ^{(D)}_{AB}\) to \(\nu _{AB}\) impliesFootnote 3 that \(E_N(\rho _{AB}) - \frac{\log D}{2}(\left|\gamma _A\right| + \left|\gamma _B\right| - \left|\gamma _C\right|)\) converges in probability to

$$\begin{aligned} \log \int \sqrt{\lambda } \textrm{d}\nu _{AB}(\lambda ) = r\log \frac{8}{3\pi }. \end{aligned}$$

See Appendix D of [29] for details and proofs.
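The constant \(\int \sqrt{\lambda }\, \textrm{d}{{\,\textrm{MP}\,}}(1)(\lambda ) = \frac{8}{3\pi } \approx 0.8488\) can be spot-checked numerically (our own illustration) by sampling the Marchenko–Pastur distribution as the limiting spectrum of a large Wishart matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.standard_normal((n, n)) / np.sqrt(n)
lam = np.linalg.eigvalsh(X @ X.T)            # spectrum ~ MP(1) for large n
est = np.mean(np.sqrt(np.clip(lam, 0.0, None)))
assert abs(est - 8 / (3 * np.pi)) < 0.01     # 8/(3*pi) ~ 0.8488
```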

A straightforward combination of the arguments in Sect. 3.2 and [29] shows that the same configurations are the dominant contributions for link states with bounded spectral variation as in Sect. 3.2. To determine the limiting distribution in this case, we can generalize Eq. (3.28) in the same fashion as in Sect. 3.2. We assume the minimal cuts \(\Gamma _A\), \(\Gamma _B\) and \(\Gamma _C\) are unique. We also assume that \(\gamma _A \cap \gamma _B = \emptyset \), and in the case where \(\gamma _C = \gamma _{AB} \ne \gamma _A \cup \gamma _B\) (so \(\left|\gamma _A\right| + \left|\gamma _B\right| > \left|\gamma _C\right|\)), all pairwise intersections between \(\gamma _A\), \(\gamma _B\) and \(\gamma _C\) are empty. This excludes the case where \(\left|\gamma _A\right| + \left|\gamma _B\right| > \left|\gamma _C\right|\), but \(r = 0\). We let \(\gamma _{A,i}\) and \(\gamma _{B,i}\) denote the components of \(\gamma _A\) and \(\gamma _B\) which are connected to \(V_i\), and we let \(\mu ^{(D)}_{\gamma _{A,i}}\) and \(\mu ^{(D)}_{\gamma _{B,i}}\) denote the distribution of the spectrum along these sets, with associated kth moments \(m^{(D)}_{\gamma _{A,i},k}\), \(m^{(D)}_{\gamma _{B,i},k}\), which we assume to converge to the moments \(m_{\gamma _{A,i},k}\), \(m_{\gamma _{B,i},k}\) of compactly supported distributions \(\mu _{\gamma _{A,i}}\) and \(\mu _{\gamma _{B,i}}\). For convenience, we assume \(D_e = D\) for all edges \(e \in E\).

We can now compute the dominant contributions to \(\mathbbm {E}N_{k}(\rho _{AB})\). If \(\gamma _C = \gamma _A \cup \gamma _B\), then there is a unique dominant configuration, which contributes \(D^{-(k-1)\left|\gamma _C\right|} m_{\gamma _C,k}\). If \(\left|\gamma _A\right| + \left|\gamma _B\right| > \left|\gamma _C\right|\) and \(k =2n\) is even, consider the configuration which assigns \(\pi _i\) to \(V_i\), where each \(\pi _i\) is a non-crossing pairing. For each edge \(e \in \gamma _C\), we have \(m^{(D)}_{e,\pi _i} = (m^{(D)}_{e,2})^{n}\), so this configuration contributes

$$\begin{aligned} D^{-(n-1)(\left|\gamma _A\right| + \left|\gamma _B\right|) - n\left|\gamma _C\right|}\left( m^{(D)}_{\gamma _C,2}\right) ^{n} \prod _{i=1}^r\bigl (m^{(D)}_{\gamma _{A,i},\tau _{2n}^{-1} \pi _i} m^{(D)}_{\gamma _{B,i},\tau _{2n} \pi _i} \bigr ) . \end{aligned}$$

Recalling the construction of the equivalence between NC(n) and non-crossing pairings on 2n elements, we see that

$$\begin{aligned} m^{(D)}_{\gamma _{B,i},\tau _{2n} \pi _i}&= m^{(D)}_{\gamma _{B,i},\sigma _{i}} m^{(D)}_{\gamma _{B,i},\sigma _i^{-1}\tau _{n}} \end{aligned}$$

for some unique \(\sigma _i \in NC(n)\). Similarly, one may verify

$$\begin{aligned} m^{(D)}_{\gamma _{A,i},\tau _{2n}^{-1} \pi _i}&= m^{(D)}_{\gamma _{A,i},\sigma _{i}} m^{(D)}_{\gamma _{A,i},\sigma _i^{-1}\tau _{n}}. \end{aligned}$$

This implies that the contribution of all dominant configurations is given by

$$\begin{aligned} \left( m^{(D)}_{\gamma _C,2}\right) ^{n}\prod _{i=1}^r\Bigg (\sum _{\sigma \in NC(n)} m^{(D)}_{\gamma _{A,i},\sigma } m^{(D)}_{\gamma _{B,i},\sigma } m^{(D)}_{\gamma _{A,i},\sigma ^{-1}\tau _n} m^{(D)}_{\gamma _{B,i},\sigma ^{-1}\tau _n}\Bigg ) \end{aligned}$$

As in the maximally entangled case, upon rescaling, the odd moments vanish as \(D \rightarrow \infty \). In conclusion, the resulting asymptotic moments are given by

$$\begin{aligned} \lim _{D \rightarrow \infty } \mathbbm {E}m^{(D)}_{AB,k} = {\left\{ \begin{array}{ll} m_{\gamma _C,k} &{} \text {if }\left|\gamma _A\right| + \left|\gamma _B\right| = \left|\gamma _C\right|,\\ 0 &{} \text {if }k\text { odd and }\left|\gamma _A\right| + \left|\gamma _B\right|> \left|\gamma _C\right|,\\ m_k &{} \text {if }k\text { even and }\left|\gamma _A\right| + \left|\gamma _B\right| > \left|\gamma _C\right| \end{array}\right. } \end{aligned}$$
(3.33)

with

$$\begin{aligned} m_{2n} = m_{\gamma _C,2}^{n}\prod _{i=1}^r\Bigg (\sum _{\sigma \in NC(n)} m_{\gamma _{A,i},\sigma } m_{\gamma _{B,i},\sigma } m_{\gamma _{A,i},\sigma ^{-1}\tau _n} m_{\gamma _{B,i},\sigma ^{-1}\tau _n}\Bigg ). \end{aligned}$$
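For small n, the sums over NC(n) appearing here can be evaluated by brute-force enumeration. The sketch below (our own; all helper names are ours) enumerates non-crossing partitions, verifies \(\left|NC(n)\right| = C_n\), and checks the geodesic identity \(\#\mathrm {cycles}(\sigma ) + \#\mathrm {cycles}(\sigma ^{-1}\tau _n) = n + 1\) for \(\sigma \in NC(n)\), which underlies the factorizations used above:

```python
from itertools import combinations
from math import comb

def set_partitions(n):
    # all partitions of {0, ..., n-1} into blocks
    def rec(i):
        if i == n:
            yield []
            return
        for p in rec(i + 1):
            yield [[i]] + p
            for j in range(len(p)):
                yield p[:j] + [[i] + p[j]] + p[j + 1:]
    return rec(0)

def is_noncrossing(p):
    block = {e: bi for bi, b in enumerate(p) for e in b}
    # crossing pattern: a < b < c < d with a,c in one block and b,d in another
    return not any(block[a] == block[c] != block[b] == block[d]
                   for a, b, c, d in combinations(sorted(block), 4))

def perm(p, n):
    # permutation whose cycles are the blocks, traversed in increasing order
    pi = list(range(n))
    for b in p:
        b = sorted(b)
        for i, e in enumerate(b):
            pi[e] = b[(i + 1) % len(b)]
    return pi

def num_cycles(pi):
    seen, c = set(), 0
    for i in range(len(pi)):
        if i not in seen:
            c += 1
            while i not in seen:
                seen.add(i)
                i = pi[i]
    return c

for n in range(1, 6):
    nc = [p for p in set_partitions(n) if is_noncrossing(p)]
    assert len(nc) == comb(2 * n, n) // (n + 1)   # |NC(n)| = C_n
    tau = [(i + 1) % n for i in range(n)]         # full cycle tau_n
    for p in nc:
        sigma = perm(p, n)
        inv = [0] * n
        for i, s in enumerate(sigma):
            inv[s] = i
        sigma_inv_tau = [inv[tau[i]] for i in range(n)]   # sigma^{-1} tau_n
        # geodesic condition: |sigma| + |sigma^{-1} tau_n| = |tau_n| = n - 1
        assert num_cycles(sigma) + num_cycles(sigma_inv_tau) == n + 1
```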

As before, one can also show, similarly to the proof of Proposition 3.3, that the variance of the moments goes to zero as \(D \rightarrow \infty \). For the case \(\left|\gamma _A\right| + \left|\gamma _B\right| > \left|\gamma _C\right|\), we consider \(\nu ^{(D)}_{AB}\) defined similarly to Eq. (3.29), but with an additional rescaling by \(m_{\gamma _C,2}\):

$$\begin{aligned} \nu _{AB}^{(D)} = \frac{1}{D^{\left|\gamma _A\right| + \left|\gamma _B\right|}} \sum _i \delta _{D^{\left|\gamma _A\right| + \left|\gamma _B\right| + \left|\gamma _C\right|}m_{\gamma _C,2}^{-1} s_i^2}. \end{aligned}$$

Its moments, which compute \(N^{(\textrm{even})}_k(\rho _{AB})\), converge to

$$\begin{aligned} \lim _{D \rightarrow \infty } \mathbbm {E}\int x^k \textrm{d}\nu _{AB}^{(D)}(x) = \prod _{i=1}^r\Biggl (\sum _{\sigma \in NC(k)} m_{\gamma _{A,i},\sigma } m_{\gamma _{B,i},\sigma } m_{\gamma _{A,i},\sigma ^{-1}\tau _k} m_{\gamma _{B,i},\sigma ^{-1}\tau _k}\Biggr ). \end{aligned}$$

Thus, by the method of moments and Theorem 3.2, it holds that \(\nu ^{(D)}_{AB} \Rightarrow \nu _{AB}\), in probability, where

$$\begin{aligned} \nu _{AB} = {\left\{ \begin{array}{ll} \bigotimes _{i = 1}^r \nu _i &{} \text {if }r > 0,\\ \mu _{\gamma _C} &{} \text {if }r=0, \end{array}\right. } \end{aligned}$$
(3.34)

and where \(\nu _i\) is given by

$$\begin{aligned} \nu _i = (\mu _{\gamma _{A,i}} \otimes \mu _{\gamma _{B,i}})^{\boxtimes 2} \boxtimes {{\,\textrm{MP}\,}}(1). \end{aligned}$$

This reduces to Eq. (3.30) if the link states are maximally entangled. We can use this to compute the logarithmic negativity, as we did previously. For \(r > 0\),

$$\begin{aligned} E_N(\rho _{AB})&= \log \int \sqrt{\lambda } \textrm{d}\nu _{AB}^{(D)}(\lambda ) + \frac{\log D}{2}(\left|\gamma _A\right| + \left|\gamma _B\right| - \left|\gamma _C\right|) \\ {}&\quad + \frac{1}{2}\log m_{\gamma _C,2} - \log {{\,\textrm{tr}\,}}\left[\rho \right], \end{aligned}$$

from which we find that \(E_N(\rho _{AB}) - \frac{\log D}{2}(\left|\gamma _A\right| + \left|\gamma _B\right| - \left|\gamma _C\right|)\) converges in probability to

$$\begin{aligned} \log \int \sqrt{\lambda } \textrm{d}\nu _{AB}(\lambda ) + \frac{1}{2}\log m_{\gamma _C,2}. \end{aligned}$$

For the case \(\left|\gamma _A\right| + \left|\gamma _B\right| = \left|\gamma _C\right|\), it is more elegant to use the limiting distribution of \(\mu ^{(D)}_{AB}\), as defined in Eq. (3.26). By the method of moments and Eq. (3.33), \(\mu ^{(D)}_{AB} \Rightarrow \mu _{\gamma _C}\), in probability. We may then compute the logarithmic negativity as

$$\begin{aligned} E_N(\rho _{AB}) = \log \int \left|\lambda \right| \textrm{d}\mu _{AB}^{(D)}(\lambda ) + \frac{\log D}{2}(\left|\gamma _A\right| + \left|\gamma _B\right| - \left|\gamma _C\right|) - \log {{\,\textrm{tr}\,}}\left[\rho \right], \end{aligned}$$

and hence \(E_N(\rho _{AB}) - \frac{\log D}{2}(\left|\gamma _A\right| + \left|\gamma _B\right| - \left|\gamma _C\right|)\) converges in probability to \(\log \int \left|\lambda \right| \textrm{d}\mu _{AB}(\lambda )\).

4 Link States with Unbounded Spectral Variation

We will now consider a different regime, in which the link states have unbounded spectral variation. Our methods in this section are distinct from those of the previous one, and the two sections can be read independently.

4.1 One-Shot Entropies

We begin by introducing one of our main tools for studying entanglement spectra in random tensor network states: one-shot entropies. In quantum information theory, the rates of certain important protocols, such as compression or state merging, can be expressed in terms of entropic quantities. One-shot entropies are the appropriate analogs in settings where one would like to analyze a task for a single copy (or a finite number of copies) of the relevant state. Asymptotic rates in terms of ordinary von Neumann entropies are then recovered in the limit of infinitely many independent copies. For an extensive introduction to this point of view, see [74]; here, we provide the basic definitions and introduce the relevant concepts.

A random tensor network built from link states that are maximally entangled (or more generally have bounded spectral variation) can be analyzed using asymptotic tools. Indeed, if we have a maximally entangled state of large dimension \(D=2^n\), then this is equal to the nth tensor power of a qubit maximally entangled state, so we are effectively in an asymptotic situation. However, if we allow for link states with unbounded spectral variation or even completely general background states, as in Sect. 2.3, then it is more natural to use tools from one-shot quantum information theory.

As a starting point, we take the Rényi entropies, which we defined in Eq. (2.7) for subnormalized states. Let \({\mathcal {H}}\) be a Hilbert space. For \(\rho \in {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}})\), we define the (unconditional) min-entropy and max-entropy by

$$\begin{aligned} H_{\min }(\rho )&= -\log \, \Vert \rho \Vert _{\infty }\\ H_{\max }(\rho )&= \log \left( {{\,\textrm{tr}\,}}[\sqrt{\rho }]^2\right) \end{aligned}$$

which coincide with the Rényi entropies \(H_{\infty }(\rho )\) and \(H_{\frac{1}{2}}(\rho )\) for \(\rho \in {\mathcal {P}}_{\scriptscriptstyle {=}}({\mathcal {H}})\). As usual, if \(\rho _A\) is the reduced density matrix on a system A, we write \(H_{\min }(A)_{\rho } = H_{\min }(\rho _A)\) and \(H_{\max }(A)_{\rho } = H_{\max }(\rho _A)\).
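In code (a minimal illustration of our own, with entropies in bits): on the spectrum of a normalized state, these are simple functionals satisfying \(H_{\min } \le H \le H_{\max }\), with equality for flat spectra. Both are also additive under tensor products, which is why the smoothing introduced below is needed to recover the von Neumann entropy asymptotically:

```python
import numpy as np

def h_min(p):
    return -np.log2(np.max(p))          # -log of the largest eigenvalue

def h_max(p):
    return 2 * np.log2(np.sum(np.sqrt(p)))

def h(p):
    return float(-np.sum(p * np.log2(p)))  # von Neumann entropy

flat = np.ones(8) / 8                   # flat spectrum: all entropies agree
assert np.isclose(h_min(flat), 3.0) and np.isclose(h_max(flat), 3.0)

p = np.array([0.5, 0.25, 0.125, 0.125])
assert h_min(p) <= h(p) <= h_max(p)     # H_min <= H <= H_max

# both H_min and H_max are additive under tensor products
pp = np.outer(p, p).ravel()             # spectrum of rho tensor rho
assert np.isclose(h_min(pp), 2 * h_min(p))
assert np.isclose(h_max(pp), 2 * h_max(p))
```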

In the study of quantum information processing tasks, it is often useful to allow for a small error. This leads to the introduction of smooth entropies. To define these, we use a distance measure known as the purified distance, which is given for \(\rho ,\sigma \in {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}})\) by

$$\begin{aligned} P(\rho , \sigma ) = \sqrt{1 - F_*(\rho ,\sigma )^2} \end{aligned}$$

where \(F_*(\rho ,\sigma )\) is the generalized fidelity between \(\rho \) and \(\sigma \), which is defined by

$$\begin{aligned} F_*(\rho ,\sigma ) = F(\rho ,\sigma ) + \sqrt{(1 - {{\,\textrm{tr}\,}}\left[\rho \right])(1 - {{\,\textrm{tr}\,}}\left[\sigma \right])} \end{aligned}$$

in terms of the ordinary fidelity \(F(\rho ,\sigma ) = \Vert \sqrt{\rho }\sqrt{\sigma }\Vert _1\). We define the smooth min- and max-entropies of \(\rho \in {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}})\) as

$$\begin{aligned} H^\varepsilon _{\min }(\rho )&= \sup _{\rho ^{\varepsilon } \in {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}}), P(\rho ^\varepsilon ,\rho ) \le \varepsilon } H_{\min }(\rho ^\varepsilon )\\ H^\varepsilon _{\max }(\rho )&= \inf _{\rho ^{\varepsilon }\in {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}}), P(\rho ^\varepsilon ,\rho ) \le \varepsilon } H_{\max }(\rho ^\varepsilon ). \end{aligned}$$

The smooth entropies are such that one recovers the usual von Neumann entropies in the limit of many independent copies. Indeed, the following asymptotic equipartition property holds:

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n} H^{\varepsilon }_{\min }(\rho ^{\otimes n}) = H(\rho ) = \lim _{n \rightarrow \infty } \frac{1}{n} H^{\varepsilon }_{\max }(\rho ^{\otimes n}) \end{aligned}$$

for any \(0< \varepsilon < 1\). Variations on this definition are possible. For instance, one can choose a different distance measure, which will yield different entropies. However, for the usual choices, the differences go to zero as \(\varepsilon \) goes to zero, so the particular choice is often immaterial. For instance, consider the trace distance between \(\rho , \sigma \in {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}})\), which is defined by

$$\begin{aligned} T(\rho ,\sigma ) = \frac{1}{2}\Vert \rho - \sigma \Vert _1 + \frac{1}{2}\left|{{\,\textrm{tr}\,}}\left[\rho - \sigma \right]\right|, \end{aligned}$$

where the last term, which is absent in usual definitions of the trace distance, accounts for subnormalized states. It is easy to see that \(T(\rho ,\sigma ) \le \Vert \rho - \sigma \Vert _1 \le 2T(\rho ,\sigma )\). The Fuchs–van de Graaff inequalities (see Lemma 3.17 in [74]) relate the trace distance and purified distance:

$$\begin{aligned} T(\rho ,\sigma ) \le P(\rho ,\sigma ) \le \sqrt{2T(\rho ,\sigma )} \end{aligned}$$
(4.1)

for \(\rho , \sigma \in {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}})\).
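For commuting (here: diagonal, subnormalized) states, both distances reduce to classical expressions, and Eq. (4.1) can be spot-checked numerically (our own illustration, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(200):
    # random subnormalized classical states on 5 outcomes
    p = rng.dirichlet(np.ones(5)) * rng.uniform(0.5, 1.0)
    q = rng.dirichlet(np.ones(5)) * rng.uniform(0.5, 1.0)
    # generalized trace distance and generalized fidelity for diagonal states
    T = 0.5 * np.sum(np.abs(p - q)) + 0.5 * abs(p.sum() - q.sum())
    F = np.sum(np.sqrt(p * q)) + np.sqrt((1 - p.sum()) * (1 - q.sum()))
    P = np.sqrt(max(0.0, 1 - F**2))      # purified distance
    # Fuchs–van de Graaff: T <= P <= sqrt(2T)
    assert T <= P + 1e-12 <= np.sqrt(2 * T) + 1e-12
```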

There are also conditional versions of the Rényi entropies. Consider a bipartite quantum state \(\rho _{AB} \in {\mathcal {P}}_{\scriptscriptstyle {=}}(AB)\). For the von Neumann entropy, the conditional entropy can simply be defined as an entropy difference, namely \(H(A\vert B)_\rho = H(AB)_\rho - H(B)_\rho \). However, it turns out that this is not a good definition in the Rényi case. There are various ways to define a Rényi conditional entropy \(H_k(A \vert B)\); we use a version based on the so-called sandwiched Rényi relative entropy. For \(k=2\), this gives a quantum conditional collision entropy, which will be useful for defining minimal cuts and which is defined as follows. For \(\rho _{AB} \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(AB)\), let

$$\begin{aligned} H_2(A\vert B)_{\rho | \rho } := -\log {{\,\textrm{tr}\,}}\left[\left( (I \otimes \rho _B)^{-\frac{1}{4}}\rho _{AB} (I \otimes \rho _B)^{-\frac{1}{4}}\right) ^2\right] + \log {{\,\textrm{tr}\,}}\left[\rho \right]. \end{aligned}$$
(4.2)
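A direct implementation of Eq. (4.2) (our own sketch; `h2_cond` is a name we introduce, entropies are in bits, and \(\rho _B\) is assumed full rank) recovers the expected extreme values: \(-\log d_A\) for a maximally entangled state, and \(+\log d_A\) for a maximally mixed A decoupled from B:

```python
import numpy as np

def h2_cond(rho_ab, da, db):
    # H_2(A|B)_{rho|rho} as in Eq. (4.2), base-2 logs; rho_B full rank
    rho_b = np.trace(rho_ab.reshape(da, db, da, db), axis1=0, axis2=2)
    w, v = np.linalg.eigh(rho_b)
    b_m14 = (v * w ** -0.25) @ v.conj().T          # rho_B^{-1/4}
    op = np.kron(np.eye(da), b_m14)
    tilde = op @ rho_ab @ op
    return (-np.log2(np.real(np.trace(tilde @ tilde)))
            + np.log2(np.real(np.trace(rho_ab))))

d = 4
phi = np.zeros(d * d)
phi[[i * d + i for i in range(d)]] = 1 / np.sqrt(d)
rho = np.outer(phi, phi)                            # maximally entangled on AB
assert np.isclose(h2_cond(rho, d, d), -2.0)         # H_2(A|B) = -log d

prod = np.kron(np.eye(d) / d, np.diag([0.4, 0.3, 0.2, 0.1]))
assert np.isclose(h2_cond(prod, d, d), 2.0)         # H_2(A|B) = +log d
```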

Finally, there are also conditional versions of the min- and max-entropy. For \(\rho _{AB} \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(AB)\) and \(\sigma _B \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(B)\), we define

$$\begin{aligned} H_{\min }(A\vert B)_{\rho |\sigma }&= - \inf \{ \lambda : \rho _{AB} \le 2^\lambda I_A \otimes \sigma _B\} \\ H_{\max }(A\vert B)_{\rho |\sigma }&= \log \, \Vert \sqrt{\rho _{AB}}\sqrt{I \otimes \sigma _B}\Vert _1^2 \end{aligned}$$

and we let

$$\begin{aligned} H_{\min }(A \vert B)_{\rho }&= \sup _{\sigma \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(B)} H_{\min }(A\vert B)_{\rho |\sigma }\\ H_{\max }(A \vert B)_{\rho }&= \sup _{\sigma \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(B)} H_{\max }(A\vert B)_{\rho |\sigma }. \end{aligned}$$

We can also define their smoothed versions

$$\begin{aligned} H^\varepsilon _{\min }(A\vert B)_\rho&= \sup _{\rho ^{\varepsilon } \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(AB), P(\rho ^\varepsilon ,\rho ) \le \varepsilon } H_{\min }(A\vert B)_{\rho ^\varepsilon }\\ H^\varepsilon _{\max }(A\vert B)_\rho&= \underset{\rho ^{\varepsilon } \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(AB), P(\rho ^\varepsilon ,\rho ) \le \varepsilon }{\inf } H_{\max }(A\vert B)_{\rho ^\varepsilon }. \end{aligned}$$

There is a duality between (smooth) max- and min-entropies. If \(\rho \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(ABC)\) is a pure state, it holds that

$$\begin{aligned} H^{\varepsilon }_{\min }(A\vert B)_{\rho } = - H^{\varepsilon }_{\max }(A\vert C)_{\rho }. \end{aligned}$$
(4.3)

We will use the fact that for a normalized state \(\rho _{AB} \in {\mathcal {P}}_{\scriptscriptstyle {=}}(AB)\) (Corollary 5.10 in [74])

$$\begin{aligned} H_{\min }(A\vert B)_{\rho } \le H_2(A\vert B)_{\rho |\rho }. \end{aligned}$$
(4.4)

A final important property of conditional smooth entropies is given by the data processing inequalities. Let \(\Phi \) and \(\Psi \) be completely positive and trace-preserving (CPTP) maps, mapping systems A to \(A'\) and B to \(B'\), respectively, and let \(\sigma = (\Phi \otimes \Psi )(\rho )\). If \(\Phi \) is also subunital, and \(0 \le \varepsilon \le \sqrt{{{\,\textrm{tr}\,}}\rho }\), then Theorem 6.2 of [74] states

$$\begin{aligned} H_{\min }^{\varepsilon }(A' \vert B')_{\sigma } \ge H_{\min }^{\varepsilon }(A \vert B)_{\rho }\quad \text {and}\quad H_{\max }^{\varepsilon }(A' \vert B')_{\sigma } \ge H_{\max }^{\varepsilon }(A \vert B)_{\rho }. \end{aligned}$$

In fact, for the smooth min-entropy the data processing inequality is also valid if \(\Phi \) is only trace non-increasing rather than trace-preserving, see [73].

4.2 Recovery Isometries

Recall that we study random tensor network states built from link states: pure states placed on each edge, whose tensor product forms the full state \(\phi = \bigotimes _{e \in E} \phi _e \in {\mathcal {P}}_{\scriptscriptstyle {=}}(V)\) for some graph \(G = (V,E)\). In Sect. 2.3, we considered more general background states \(\phi _V \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(V)\), where we no longer have a tensor product structure along the edges of some graph, and applying the replica trick does not yield a local spin model for the moments of the tensor network state. This situation is of independent interest, but will also be useful as an intermediate step when applying bounds based on one-shot entropies to link states. In Sect. 3, we studied link states for which the entanglement spectrum of the edge states \(\phi _e\) had bounded variation, and we used the replica trick to compute the moments of the spectrum of \(\rho _A\) for a boundary subsystem A. For general background states, we saw that the replica trick for \(k=2\) extends as in Eq. (2.20). What are the minimal cuts in this setting? Based on Eq. (2.20), a first guess would be that \(\Gamma _A \in C(A)\) is a minimal cut (i.e., corresponds to the dominant term in the replica trick) if for all other cuts \(\Delta _A \in C(A)\) we have \(H_2(\Gamma _A)_{\phi } \ll H_2(\Delta _A)_\phi \). If the state is a link state, this corresponds to weighting the edges of the graph by the Rényi-2 entropies along the edges and computing a weighted minimal cut. Indeed, this would yield an accurate approximation of \({{\,\textrm{tr}\,}}[\rho _A^2]\) and hence of \(H_2(\rho _A)\). However, if the spectrum of \(\rho _A\) is not close to a flat spectrum, this does not imply that \({{\,\textrm{spec}\,}}_+(\rho _A)\) is close to \({{\,\textrm{spec}\,}}_+(\phi _{\Gamma _A})\).
We would like to show that for link states with unbounded spectral variation, and an appropriate minimal cut condition for \(\Gamma _A \in C(A)\), it is still true that \({{\,\textrm{spec}\,}}_+(\rho _A)\) is close to \({{\,\textrm{spec}\,}}_+(\phi _{\Gamma _A})\).

We will adapt the \(k=2\) replica trick for general background states to get a bound on the difference in trace norm between \({{\,\textrm{spec}\,}}_+(\rho _A)\) and \({{\,\textrm{spec}\,}}_+(\phi _{\Gamma _A})\) in terms of conditional Rényi-2 entropies,Footnote 4 as defined in Eq. (4.2). In Sect. 4.3, we will use this to formulate a condition for cut minimality in terms of smooth entropies for link states.

The main result of this subsection is a tensor network version of one-shot decoupling. Let \(\phi _V \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(V)\). We allow \(\phi _V\) to be a general state, which need not be pure and also need not be a product state along the edges of some graph. Let R be a purifying system and \(\phi _{VR} \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(VR)\) be a purification of \(\phi _V\). Then, we can construct a random tensor network state \(\rho _{V_{\partial }R}\), whose boundary systems are given by \(V_\partial \cup R\) and which is a purification of the random tensor network state \(\rho _{V_\partial }\) of Eq. (2.18), by

$$\begin{aligned} \rho _{V_\partial R} = {{\,\textrm{tr}\,}}_{V_b}[(I_{V_\partial R} \otimes \psi ) \phi ] \end{aligned}$$
(4.5)

where \(\psi \) is a tensor product of random tensors. We briefly recall our notation for boundary subsystems and cuts: For a boundary subsystem \(A \subseteq V_\partial \), we denote its boundary complement by \({\bar{A}} = V_\partial {\setminus } A\), and for a cut \(\Gamma _A \in C(A)\), we let \(\Gamma _A^c = V {\setminus } \Gamma _A\), which is a cut for \({\bar{A}}\). The purifying system R can be thought of as an additional boundary system in the tensor network construction.

In Theorem 4.4, we will assume that we have a cut \(\Gamma _A \in C(A)\) which is such that for all cuts \(\Delta _A \in C(A)\) for which \(\Delta _A \subsetneq \Gamma _A\) we have \(H_2(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c R)_{\phi \vert \phi } \gg 1\), and similarly for all cuts \(\Delta _A \in C(A)\) for which \(\Gamma _A \subsetneq \Delta _A\) we have \(H_2(\Delta _A {\setminus } \Gamma _A \vert \Gamma _A R)_{\phi \vert \phi } \gg 1\). We show that this condition implies that with high probability there exist isometries \(V_{A}: {\mathcal {H}}_A \rightarrow {\mathcal {H}}_{\Gamma _A}\) and \(V_{{\bar{A}}}: {\mathcal {H}}_{{\bar{A}}} \rightarrow {\mathcal {H}}_{\Gamma _A^c}\) such that

$$\begin{aligned} (V_A \otimes V_{{\bar{A}}} \otimes I_R)\vert \rho \rangle \approx \vert \phi \rangle . \end{aligned}$$
(4.6)

The approximation accuracy will be measured in trace norm. In particular, this implies that \({{\,\textrm{spec}\,}}_+(\rho _A) \approx {{\,\textrm{spec}\,}}_+(\phi _{\Gamma _A})\). If the state \(\phi \) is a tensor product of link states, \({{\,\textrm{spec}\,}}_+(\phi _{\Gamma _A})\) is precisely the entanglement spectrum along the cut \(\gamma _A\). The isometries \(V_A\) and \(V_{{\bar{A}}}\) are recovery isometries, which allow us to “recover” \(\Gamma _A\) from the A system, and similarly we can recover \(\Gamma _A^c\) from \({\bar{A}}\).

The result is closely related to quantum error correction. One way to interpret this is as follows: Consider a subspace \({\mathcal {H}}_S\) of \({\mathcal {H}}_{V}\) and let R be a reference system of dimension \(\dim ({\mathcal {H}}_S)\), and \(\phi _{VR}\) a maximally entangled state between S and R. Then, Eq. (4.6) can be interpreted as saying that if we encode the subspace S by projecting onto random tensors, the information in \(\Gamma _A\) is protected, after encoding, against an erasure error on \({\bar{A}}\). This idea is also discussed in [61] for perfect tensor network models, and in [40] for random tensor networks with maximally entangled link states. In holography, the notion of local recovery isometries and their error correction interpretation goes under the name of entanglement wedge reconstruction or subregion-subregion duality. See [6, 7] for a detailed discussion of entanglement wedge reconstruction in holographic systems with bulk entropy, relating to one-shot entropies. We provide more details in “Appendix B.”

Our approach to showing Eq. (4.6) is to start by projecting only onto the random tensors in \(\Gamma _A\), and not onto the random tensors in \(\Gamma _A^c\). This yields a random tensor network state \(\sigma \) on \(A\Gamma _A^c R\).

We then show that, by a version of one-shot decoupling, the reduced state \(\sigma _{\Gamma _A^c R}\) is close to \(\phi _{\Gamma _A^c R}\). By Uhlmann's theorem, this implies that there exists an isometry \(V_A\) such that \((V_A \otimes I_{\Gamma _A^c R})\vert \sigma \rangle \approx \vert \phi \rangle \). Combining this with a similar result for \(\Gamma _A^c\), we obtain Eq. (4.6), as will be made precise in Theorem 4.4.

In our construction of \(\sigma \), we can relabel the vertices in the graph, and think of the vertices in \(\Gamma _A {\setminus } A\) as the bulk vertices \(V_b\), the boundary subsystem A as the complete boundary \(V_{\partial }\), and relabel all other subsystems as the reference system R. Then, we prove the following result, which is closely related to the one-shot decoupling results in [24].

Proposition 4.1

Consider a random tensor network state \(\rho _{V_\partial R}\) as in Eq. (4.5) with a (purified) background state \(\phi _{VR} \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(VR)\). Let \(A = V_\partial \), let \(\Gamma _A = V\), and suppose that for any cut \(\Delta _A \in C(A)\) other than \(\Gamma _A\)

$$\begin{aligned} H_2(\Gamma _A {\setminus } \Delta _A \vert R)_{\phi \vert \phi } \ge K,\end{aligned}$$

then

$$\begin{aligned} \mathbbm {E}\Vert \rho _{R} - \phi _{R}\Vert _1 \le 2^{\tfrac{\left|V_b\right|}{2}}\sqrt{{{\,\textrm{tr}\,}}[\phi ]}2^{-\frac{1}{2} K}. \end{aligned}$$

Note that since \(\Gamma _A = V_b \cup A\), the sets \(\Gamma _A {\setminus } \Delta _A\) for \(\Delta _A \in C(A) {\setminus } \{\Gamma _A\}\) are exactly the non-empty subsets of \(V_b\). The formulation in terms of \(\Delta _A \in C(A)\) will be natural when we apply this result in Theorem 4.4.

Proof

We closely follow the strategy in [24, 26]. We first note a basic fact (Lemma 3.7 in [24]): For any operator X and \(\omega \) a subnormalized density matrix, it holds that

$$\begin{aligned} \Vert X\Vert _1 \le \Vert \omega ^{-\frac{1}{4}} X \omega ^{-\frac{1}{4}}\Vert _2. \end{aligned}$$
(4.7)

The proof is an application of the Cauchy–Schwarz inequality. We use Eq. (4.7) with \(\omega = \phi _{R}\) and Jensen’s inequality to see that

$$\begin{aligned} \mathbbm {E}\Vert \rho _{R} - \phi _{R}\Vert _1 \le \sqrt{\mathbbm {E}{{\,\textrm{tr}\,}}[({\tilde{\rho }}_{R} - {\tilde{\phi }}_{R})^2]} \end{aligned}$$

where \({\tilde{\rho }}_{V_{\partial } R} = (I \otimes \phi _R)^{-\frac{1}{4}} \rho _{V_{\partial } R} (I \otimes \phi _R)^{-\frac{1}{4}}\) and \({\tilde{\phi }}_{VR} = (I \otimes \phi _R)^{-\frac{1}{4}} \phi _{VR} (I \otimes \phi _R)^{-\frac{1}{4}}\). Now, \(\mathbbm {E}{\tilde{\rho }}_{R} = {\tilde{\phi }}_{R}\) by Eq. (2.21), and the replica trick in Eq. (2.20) yields

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}[({\tilde{\rho }}_{R} - {\tilde{\phi }}_{R})^2]&= \mathbbm {E}{{\,\textrm{tr}\,}}[{\tilde{\rho }}_{R}^2] - {{\,\textrm{tr}\,}}[{\tilde{\phi }}_{R}^2]\\&= \sum _{\Delta _A \in C(A), \Delta _A \subsetneq \Gamma _A} {{\,\textrm{tr}\,}}[{\tilde{\phi }}_{(\Gamma _A {\setminus } \Delta _A) R}^2]\\&= \sum _{\Delta _A \in C(A), \Delta _A \subsetneq \Gamma _A} {{\,\textrm{tr}\,}}[\phi ] 2^{-H_2(\Gamma _A {\setminus } \Delta _A \vert R)_{\phi |\phi }} \end{aligned}$$

using the definition of \({\tilde{\phi }}\) and Eq. (4.2) and hence

$$\begin{aligned} \left( \mathbbm {E}\Vert \rho _{R} - \phi _{R}\Vert _1\right) ^2 \le 2^{\left|V_b\right|}{{\,\textrm{tr}\,}}[\phi ] 2^{-K}. \end{aligned}$$

\(\square \)
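The mechanism behind Proposition 4.1 can be seen in a minimal example (our own construction, not from the text): a single bulk vertex is connected to the boundary by an edge of dimension D and additionally carries a d-dimensional system maximally entangled with the reference R. Projecting onto a Haar-random bulk tensor and tracing out the boundary then leaves a reduced state on R close to the maximally mixed \(\phi _R = I/d\) whenever \(D \gg d\):

```python
import numpy as np

rng = np.random.default_rng(2)
D, d, trials = 256, 2, 20      # edge dimension D, reference dimension d
errs = []
for _ in range(trials):
    # Haar-random bulk tensor, reshaped as (edge index, reference-partner index)
    psi = rng.standard_normal((D, d)) + 1j * rng.standard_normal((D, d))
    psi /= np.linalg.norm(psi)
    rho_R = psi.conj().T @ psi              # normalized reduced state on R
    diff = rho_R - np.eye(d) / d            # compare with phi_R = I/d
    errs.append(np.sum(np.abs(np.linalg.eigvalsh(diff))))   # trace norm
# decoupling: the average trace-norm error is O(sqrt(d/D)), small for D >> d
assert np.mean(errs) < 0.2
```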

Suppose that, in the setup of Proposition 4.1, we had equality \(\rho _{R} = \phi _{R}\). Then, by Uhlmann's theorem, their purifications \(\rho _{AR}\) and \(\phi _{\Gamma _A R}\) would be related by an isometry \(V_A\) from A to \(\Gamma _A\). The following lemma extends this to the case where the reduced states are merely close in trace distance.

Lemma 4.2

Suppose \(\rho _{AB} \in {\mathcal {P}}(AB)\) and \(\sigma _{AC} \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(AC)\) are pure states on Hilbert spaces \({\mathcal {H}}_A \otimes {\mathcal {H}}_B\) and \({\mathcal {H}}_A \otimes {\mathcal {H}}_C\), respectively. Then,

$$\begin{aligned} \min _{V} \Vert (I_A \otimes V) \rho _{AB} (I_A \otimes V^\dagger ) - \sigma _{AC}\Vert _1 \le 2\sqrt{2\Vert \rho _{A} - \sigma _{A}\Vert _1 + 2\Vert \rho _{A} - \sigma _{A}\Vert _1^2}. \end{aligned}$$

where the minimum is over all isometries \(V: {\mathcal {H}}_B \rightarrow {\mathcal {H}}_C\).

Proof

Uhlmann’s theorem states that if \(\rho _{AB} \in {\mathcal {P}}(AB)\) and \(\sigma _{AC} \in {\mathcal {P}}(AC)\) are pure quantum states with \(\dim ({\mathcal {H}}_B) \le \dim ({\mathcal {H}}_C)\), then there exists an isometry \(V: {\mathcal {H}}_B \rightarrow {\mathcal {H}}_C\) such that

$$\begin{aligned} P(\rho _A,\sigma _A) = P((I_A \otimes V) \rho _{AB} (I_A \otimes V^\dagger ), \sigma _{AC}) \end{aligned}$$

and, in particular, the isometry is the solution to an optimization problem:

$$\begin{aligned} P(\rho _A, \sigma _A) = \min _V P((I_A \otimes V) \rho _{AB} (I_A \otimes V^\dagger ), \sigma _{AC}). \end{aligned}$$

Moreover, if both \(\rho \) and \(\sigma \) are subnormalized, by Eq. (4.1), we can bound

$$\begin{aligned} \min _{V} \Vert (I_A \otimes V) \rho _{AB} (I_A \otimes V^\dagger ) - \sigma _{AC}\Vert _1&\le \min _{V} 2P((I_A \otimes V) \rho _{AB} (I_A \otimes V^\dagger ), \sigma _{AC})\\&= 2P(\rho _A,\sigma _A) \\&\le 2\sqrt{2\Vert \rho _{A} - \sigma _{A}\Vert _1}. \end{aligned}$$

From this, it follows that if \(\sigma \) is subnormalized and \(\rho \) has \({{\,\textrm{tr}\,}}[\rho ] > 1\),

$$\begin{aligned} \min _{V} \Vert (I_A \otimes V) \rho _{AB} (I_A \otimes V^\dagger ) - \sigma _{AC}\Vert _1 \le 2\sqrt{2{{\,\textrm{tr}\,}}[\rho ]\Vert \rho _{A} - \sigma _{A}\Vert _1}. \end{aligned}$$

Since \({{\,\textrm{tr}\,}}[\rho ] \le {{\,\textrm{tr}\,}}[\sigma ] + \Vert \rho - \sigma \Vert _1\) and \({{\,\textrm{tr}\,}}[\sigma ] \le 1\), we conclude that

$$\begin{aligned} \min _{V} \Vert (I_A \otimes V) \rho _{AB} (I_A \otimes V^\dagger ) - \sigma _{AC}\Vert _1 \le 2\sqrt{2\Vert \rho _{A} - \sigma _{A}\Vert _1 + 2\Vert \rho _{A} - \sigma _{A}\Vert _1^2} \end{aligned}$$

for arbitrary \(\rho \) and subnormalized \(\sigma \). \(\square \)
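Uhlmann’s theorem is constructive: for pure states represented by their coefficient matrices, the optimal isometry can be read off from a singular value decomposition. The following numpy sketch is our own illustration (the dimensions, helper names, and the SVD-based construction of \(V\) are ours, not the paper’s); it checks that the achieved overlap equals the fidelity of the marginals, and that the resulting trace distance obeys the bound of the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_pure(d_a, d_x):
    """Random normalized pure state on A (x) X, as a d_a x d_x coefficient matrix."""
    m = rng.normal(size=(d_a, d_x)) + 1j * rng.normal(size=(d_a, d_x))
    return m / np.linalg.norm(m)

def sqrtm_psd(m):
    """Matrix square root of a PSD matrix via its eigendecomposition."""
    w, u = np.linalg.eigh(m)
    return (u * np.sqrt(np.clip(w, 0, None))) @ u.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = tr sqrt( sqrt(rho) sigma sqrt(rho) )."""
    s = sqrtm_psd(rho)
    return np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(s @ sigma @ s), 0, None)))

d_a, d_b, d_c = 3, 2, 4
R = random_pure(d_a, d_b)   # |rho> on AB
S = random_pure(d_a, d_c)   # |sigma> on AC
rho_a, sigma_a = R @ R.conj().T, S @ S.conj().T

# Optimal Uhlmann isometry V: B -> C from an SVD of N = (S^dag R)^T
N = (S.conj().T @ R).T                              # d_b x d_c
u, sv, wh = np.linalg.svd(N, full_matrices=False)
V = wh.conj().T @ u.conj().T                        # d_c x d_b isometry

assert np.allclose(V.conj().T @ V, np.eye(d_b))     # V is an isometry
# the achieved overlap tr[N V] = sum of singular values = F(rho_A, sigma_A)
overlap = np.trace(N @ V).real
assert np.isclose(overlap, fidelity(rho_a, sigma_a))

# trace distance of the pure states after applying V, against the lemma's bound
t = np.sum(np.abs(np.linalg.eigvalsh(rho_a - sigma_a)))   # ||rho_A - sigma_A||_1
dist = 2 * np.sqrt(max(0.0, 1 - overlap**2))
assert dist <= 2 * np.sqrt(2 * t + 2 * t**2) + 1e-9
```

The design choice here is the standard one: maximizing \(|{{\,\textrm{tr}\,}}[NV]|\) over isometries is achieved by the unitary part of the polar decomposition of \(N\), which the SVD provides directly.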

Finally, we will need a basic lemma relating tensor network states with differing background states:

Lemma 1.9

Suppose we consider random tensor network states \(\rho _{V_\partial R}\) and \({\tilde{\rho }}_{V_\partial R}\) constructed with (purified) background states \(\phi _{V R}, {\tilde{\phi }}_{VR} \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(VR)\), projecting onto the same random tensors. Then,

$$\begin{aligned} \mathbbm {E}\Vert \rho _{V_\partial R} - {\tilde{\rho }}_{V_\partial R} \Vert _1 \le \Vert \phi _{VR} - \tilde{\phi }_{VR}\Vert _1. \end{aligned}$$

Proof

Let \(\phi - {\tilde{\phi }} = \Delta _+ - \Delta _-\) where both \(\Delta _+\) and \(\Delta _-\) are positive semidefinite and are such that \(\Vert \phi - {\tilde{\phi }}\Vert _1 = {{\,\textrm{tr}\,}}[\Delta _+ + \Delta _-]\). Then, we can also consider the random tensor network states \(\sigma _+\) and \(\sigma _-\) which take \(\Delta _+\) and \(\Delta _-\) as background states, and by the linearity of Eq. (4.5) in the background state we have \(\rho - {{\tilde{\rho }}} = \sigma _+ - \sigma _-\). By Eq. (2.21), \(\mathbbm {E}\sigma _{\pm } = \Delta _{\pm }\). We then estimate

$$\begin{aligned} \mathbbm {E}\Vert \rho - {{\tilde{\rho }}}\Vert _1&= \mathbbm {E}\Vert \sigma _+ - \sigma _-\Vert _1 \le \mathbbm {E}(\Vert \sigma _+\Vert _1 + \Vert \sigma _-\Vert _1) = \mathbbm {E}{{\,\textrm{tr}\,}}[\sigma _+ + \sigma _-] \\&= {{\,\textrm{tr}\,}}[\Delta _+ + \Delta _-] = \Vert \phi - {\tilde{\phi }}\Vert _1, \end{aligned}$$

where we have used that \(\sigma _+\) and \(\sigma _-\) are positive semidefinite and hence \(\Vert \sigma _{\pm }\Vert _1 = {{\,\textrm{tr}\,}}[\sigma _{\pm }]\). \(\square \)
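The Jordan decomposition used in this proof is concrete: diagonalize the Hermitian difference and collect the positive and negative eigenvalues separately. A minimal numpy sketch (our own illustration; the states and the dimension are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_pure_dm(d):
    """A random pure density matrix on C^d (the dimension d is arbitrary)."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())

phi, phi_t = random_pure_dm(4), random_pure_dm(4)
diff = phi - phi_t

# Jordan decomposition: split the Hermitian difference by eigenvalue sign
w, U = np.linalg.eigh(diff)
delta_plus = (U * np.clip(w, 0, None)) @ U.conj().T
delta_minus = (U * np.clip(-w, 0, None)) @ U.conj().T

assert np.allclose(diff, delta_plus - delta_minus)
# the trace norm of the difference equals tr[Delta_+ + Delta_-]
assert np.isclose(np.sum(np.abs(w)), np.trace(delta_plus + delta_minus).real)
```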

With all our tools assembled, we are ready to prove the main result of this subsection. We again let \(\phi _{VR} \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(VR)\) be a background state with R a purifying system, and we let \(\rho _{V_{\partial }R}\) be the associated random tensor network state as constructed in Eq. (4.5). Let \(\Gamma _A\) be an arbitrary cut for the boundary region A. In Theorem 4.4, we provide a criterion to determine whether \(\Gamma _A\) is a minimal cut in terms of conditional entropies. Informally speaking, the following result shows that if \(\Gamma _A\) is a minimal cut in this sense, we can recover the system \(\Gamma _A\) from the boundary subsystem A, while we can recover \(\Gamma _A^c\) from the boundary subsystem \({\bar{A}}\). For general \(\phi \), Theorem 4.4 is closely related to the task of split transfer, see “Appendix B” for a discussion. The following result closely follows Proposition 18 of [26].

Theorem 1.10

(Recovery isometries). Let \(\phi _{VR} \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(VR)\), and let \(\rho _{V_{\partial }R}\) be the associated random tensor network state as in Eq. (4.5). Let \(\Gamma _A \in C(A)\) and suppose that

$$\begin{aligned} H_2(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c R)_{\phi |\phi } \ge K_1 \end{aligned}$$
(4.8)

for all cuts \(\Delta _A \in C(A)\) such that \(\Delta _A \subsetneq \Gamma _A\) and

$$\begin{aligned} H_2(\Delta _A{\setminus } \Gamma _A \vert \Gamma _A R)_{\phi |\phi } \ge K_2 \end{aligned}$$
(4.9)

for all cuts \(\Delta _A \in C(A)\) such that \(\Gamma _A \subsetneq \Delta _A\). Then,

$$\begin{aligned}&\mathbbm {E}\min _{V_A, V_{{\bar{A}}}} \Vert (V_A \otimes V_{{\bar{A}}} \otimes I_R)\rho _{V_{\partial }R}(V_A^\dagger \otimes V_{{\bar{A}}}^\dagger \otimes I_R) - \phi _{VR}\Vert _1 \nonumber \\ {}&\qquad \qquad = {\mathcal {O}}({{\,\textrm{tr}\,}}[\phi ]^{\frac{1}{4}}(2^{-\frac{1}{4}K_1} + 2^{-\frac{1}{4}K_2})), \end{aligned}$$
(4.10)

where the minimum is over isometries \(V_{A}: {\mathcal {H}}_A \rightarrow {\mathcal {H}}_{\Gamma _A}\) and \(V_{{\bar{A}}}: {\mathcal {H}}_{{\bar{A}}} \rightarrow {\mathcal {H}}_{\Gamma _A^c}\).

Proof

Let \(\sigma _{A\Gamma _A^c R}\) be the state where we have contracted along the tensors in \(\Gamma _A\) but not along those in \(\Gamma _A^c\), and similarly let \(\tau _{{\bar{A}} \Gamma _A R}\) be the state where we have contracted along the tensors in \(\Gamma _A^c\) but not along those in \(\Gamma _A\). We first use Proposition 4.1 to show that \(\sigma _{\Gamma _A^c R} \approx \phi _{\Gamma _A^c R}\) and \(\tau _{\Gamma _A R} \approx \phi _{\Gamma _A R}\). Indeed, for \(\sigma \) we simply apply Proposition 4.1 with \(V_b \cap \Gamma _A\) as the set of bulk vertices, A as the set of boundary vertices and \(\Gamma _A^c R\) as the reference system. This gives

$$\begin{aligned} \mathbbm {E}\Vert \sigma _{\Gamma _A^c R} - \phi _{\Gamma _A^c R}\Vert _1 ={\mathcal {O}}(\sqrt{{{\,\textrm{tr}\,}}[\phi ]}2^{-\frac{1}{2} K_1}). \end{aligned}$$

A similar application of Proposition 4.1, with \(V_b \cap \Gamma _A^c\) as the set of bulk vertices, \({\bar{A}}\) as the set of boundary vertices and \(\Gamma _A R\) as the reference system, shows

$$\begin{aligned} \mathbbm {E}\Vert \tau _{\Gamma _A R} - \phi _{\Gamma _A R}\Vert _1 = {\mathcal {O}}\bigg (\sqrt{{{\,\textrm{tr}\,}}[\phi ]}2^{-\frac{1}{2} K_2}\bigg ). \end{aligned}$$

We note that for any isometries \(V_{A}: {\mathcal {H}}_A \rightarrow {\mathcal {H}}_{\Gamma _A}\) and \(V_{{\bar{A}}}: {\mathcal {H}}_{{\bar{A}}} \rightarrow {\mathcal {H}}_{\Gamma _A^c}\)

$$\begin{aligned}&\Vert (V_A \otimes V_{{\bar{A}}} \otimes I_R)\rho (V_A^\dagger \otimes V_{{\bar{A}}}^\dagger \otimes I_R) - \phi _{VR}\Vert _1 \\&\quad \le \Vert (I_{\Gamma _A} \otimes V_{{\bar{A}}} \otimes I_R)\tau (I_{\Gamma _A} \otimes V_{{\bar{A}}}^\dagger \otimes I_R) - \phi _{VR}\Vert _1 \\&\qquad + \Vert \tau - (V_A \otimes I_{{\bar{A}} R})\rho (V_A^\dagger \otimes I_{{\bar{A}} R})\Vert _1, \end{aligned}$$

where we have applied the triangle inequality after adding and subtracting \((I_{\Gamma _A} \otimes V_{{\bar{A}}} \otimes I_R)\tau (I_{\Gamma _A} \otimes V_{{\bar{A}}}^\dagger \otimes I_R)\), and then using the invariance of the trace norm under isometries in the second term. We use this to estimate

$$\begin{aligned}&\mathbbm {E}\min _{V_A, V_{{\bar{A}}}} \Vert (V_A \otimes V_{{\bar{A}}} \otimes I_R)\rho (V_A^\dagger \otimes V_{{\bar{A}}}^\dagger \otimes I_R) - \phi _{VR}\Vert _1 \\&\quad \le \mathbbm {E}\bigl ( \min _{V_{{\bar{A}}}} \Vert (I_{\Gamma _A} \otimes V_{{\bar{A}}} \otimes I_R)\tau (I_{\Gamma _A} \otimes V_{{\bar{A}}}^\dagger \otimes I_R) - \phi _{VR}\Vert _1 \\&\qquad + \min _{V_A} \Vert \tau - (V_A \otimes I_{{\bar{A}} R})\rho (V_A^\dagger \otimes I_{{\bar{A}} R})\Vert _1 \bigr ), \end{aligned}$$
(4.11)

where the minimum is over isometries \(V_{A}: {\mathcal {H}}_A \rightarrow {\mathcal {H}}_{\Gamma _A}\) and \(V_{{\bar{A}}}: {\mathcal {H}}_{{\bar{A}}} \rightarrow {\mathcal {H}}_{\Gamma _A^c}\). For the first term of Eq. (4.11), we apply Lemma 4.2 to get

$$\begin{aligned}&\min _{V_{{\bar{A}}}} \Vert (I_{\Gamma _A} \otimes V_{{\bar{A}}} \otimes I_R)\tau (I_{\Gamma _A} \otimes V_{{\bar{A}}}^\dagger \otimes I_R) - \phi _{VR}\Vert _1\\&\quad \le 2\sqrt{2 \Vert \tau _{\Gamma _A R} - \phi _{\Gamma _A R}\Vert _1 + 2\Vert \tau _{\Gamma _A R} - \phi _{\Gamma _A R}\Vert _1^2}\\&\quad \le 2\sqrt{2} \left( \sqrt{\Vert \tau _{\Gamma _A R} - \phi _{\Gamma _A R}\Vert _1} + \Vert \tau _{\Gamma _A R} - \phi _{\Gamma _A R}\Vert _1\right) \end{aligned}$$

and by Jensen’s inequality

$$\begin{aligned}&\mathbbm {E}\min _{V_{{\bar{A}}}} \Vert (I_{\Gamma _A} \otimes V_{{\bar{A}}} \otimes I_R)\tau (I_{\Gamma _A} \otimes V_{{\bar{A}}}^\dagger \otimes I_R) - \phi _{VR}\Vert _1 \nonumber \\&\quad \le 2\sqrt{2} \left( \sqrt{\mathbbm {E}\Vert \tau _{\Gamma _A R} - \phi _{\Gamma _A R}\Vert _1} + \mathbbm {E}\Vert \tau _{\Gamma _A R} - \phi _{\Gamma _A R}\Vert _1\right) \nonumber \\&\quad = {\mathcal {O}}({{\,\textrm{tr}\,}}[\phi ]^{\frac{1}{4}}2^{-\frac{1}{4} K_2}). \end{aligned}$$
(4.12)

For the second term of Eq. (4.11), we can think of \(\tau \) and \((V_A \otimes I_{{\bar{A}} R})\rho (V_A^\dagger \otimes I_{{\bar{A}} R})\) as random tensor network states with \(\phi \) and \((V_A \otimes I_{\Gamma _A^c R})\sigma (V_A^\dagger \otimes I_{\Gamma _A^c R})\), respectively, as the full states on the edges, obtained by applying the random tensors in \(\Gamma _A^c\). Then, denoting by \(\mathbbm {E}_{\Gamma _A^c}\) the expectation over all random tensors in \(\Gamma _A^c\), Lemma 4.3 gives

$$\begin{aligned} \mathbbm {E}_{\Gamma _A^c} \Vert \tau - (V_A \otimes I_{{\bar{A}} R})\rho (V_A^\dagger \otimes I_{{\bar{A}} R})\Vert _1 = {\mathcal {O}}(\Vert \phi _{VR} - (V_A \otimes I_{\Gamma _A^c R})\sigma (V_A^\dagger \otimes I_{\Gamma _A^c R})\Vert _1). \end{aligned}$$

We thus estimate

$$\begin{aligned} \mathbbm {E}\min _{V_A}&\Vert \tau - (V_A \otimes I_{{\bar{A}} R})\rho (V_A^\dagger \otimes I_{{\bar{A}} R})\Vert _1 \\ {}&= {\mathcal {O}}(\mathbbm {E}_{\Gamma _A} \min _{V_A} \Vert \phi _{VR} - (V_A \otimes I_{\Gamma _A^c R})\sigma (V_A^\dagger \otimes I_{\Gamma _A^c R})\Vert _1) \end{aligned}$$

for which we may argue exactly as in Eq. (4.12), using Lemma 4.2, to obtain

$$\begin{aligned} \mathbbm {E}_{\Gamma _A} \min _{V_A} \Vert \phi _{VR} - (V_A \otimes I_{\Gamma _A^c R})\sigma (V_A^\dagger \otimes I_{\Gamma _A^c R})\Vert _1 = {\mathcal {O}}({{\,\textrm{tr}\,}}[\phi ]^{\frac{1}{4}}2^{-\frac{1}{4} K_1}). \end{aligned}$$

We conclude that

$$\begin{aligned}&\mathbbm {E}\min _{V_A, V_{{\bar{A}}}} \Vert (V_A \otimes V_{{\bar{A}}} \otimes I_R)\rho (V_A^\dagger \otimes V_{{\bar{A}}}^\dagger \otimes I_R) - \phi _{VR}\Vert _1 \\ {}&\qquad \qquad = {\mathcal {O}}({{\,\textrm{tr}\,}}[\phi ]^{\frac{1}{4}}(2^{-\frac{1}{4} K_1} + 2^{-\frac{1}{4} K_2})). \end{aligned}$$

\(\square \)

We hence find that the closeness of the boundary state to the background state can be bounded via conditional Rényi-2 entropies of cuts. In particular, for large \(K_1,K_2\), the recovery isometries recover the states to good accuracy, and we find that \(\mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\rho _A) - {{\,\textrm{spec}\,}}_+(\phi _{\Gamma _A})\Vert _1\) is small. However, this result is not yet completely satisfactory. The conditional Rényi-2 entropy is not a “robust” quantity, in the sense that a small deformation of \(\phi \) can drastically change the values of the conditional Rényi-2 entropies in Eqs. (4.8) and (4.9). For this reason, we would like a condition in terms of smoothed entropies. We first note that for the condition in Eq. (4.8), one can in fact bound

$$\begin{aligned} H_2(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c R)_{\phi \vert \phi } \ge H_{\min }(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c R)_{\phi } \end{aligned}$$
(4.13)

and similarly for Eq. (4.9),

$$\begin{aligned} H_2(\Delta _A {\setminus } \Gamma _A \vert \Gamma _A R)_{\phi \vert \phi } \ge H_{\min }(\Delta _A {\setminus } \Gamma _A \vert \Gamma _A R)_{\phi }. \end{aligned}$$
(4.14)

To make the condition “robust,” we would like to replace these by smoothed entropies and express a condition in terms of \(H^{\varepsilon }_{\min }(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c R)_{\phi }\) and \(H^{\varepsilon }_{\min }(\Delta _A {\setminus } \Gamma _A \vert \Gamma _A R)_{\phi }\). This will require simultaneous smoothing: finding a state \(\phi ^\varepsilon _{VR} \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(VR)\) which is close to \(\phi \), such that \(H_{\min }(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c R)_{\phi ^{\varepsilon }} \ge H^{\varepsilon }_{\min }(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c R)_{\phi }\) and \(H_{\min }(\Delta _A {\setminus } \Gamma _A \vert \Gamma _A R)_{\phi ^{\varepsilon }} \ge H^{\varepsilon }_{\min }(\Delta _A {\setminus } \Gamma _A \vert \Gamma _A R)_{\phi }\) for all relevant cuts \(\Delta _A\). For a general background state, it is not known how this can be done [25, 31]. However, if the background state is a tensor product of link states, we can perform the simultaneous smoothing.

4.3 One Minimal Cut

The primary result of this subsection is Theorem 4.7, which states that if the background state is a tensor product of link states as in Eq. (2.2), then the spectrum of the boundary state is well approximated, in expectation, by the spectrum of the minimal cut link state, with approximation accuracy controlled by smooth one-shot entropies. It is a straightforward application of Theorem 4.4.

For a cut \(\Gamma _A \in C(A)\), we define

$$\begin{aligned} {\mathcal {C}}_1(\Gamma _A)&= \{\Delta _A \in C(A) : \Delta _A \subsetneq \Gamma _A \} \\ {\mathcal {C}}_2(\Gamma _A)&= \{\Delta _A \in C(A) : \Gamma _A \subsetneq \Delta _A \}. \end{aligned}$$

The key result we need is the following lemma, which we prove in “Appendix C”; it shows that for a link state, we can perform the desired joint smoothing.

Lemma 1.11

Let \(\phi \in {\mathcal {P}}_{\scriptscriptstyle {=}}(V)\) be a link state, \(A \subseteq V_{\partial }\) a boundary subsystem and \(\Gamma _A \in C(A)\) a cut for A. Then, there exists a pure state \(\phi ^{\varepsilon } \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(V)\) such that

$$\begin{aligned} P(\phi ,\phi ^\varepsilon ) \le 2\left( \sqrt{\left|{\mathcal {C}}_1(\Gamma _A)\right|} + \sqrt{\left|{\mathcal {C}}_2(\Gamma _A)\right|}\right) \sqrt{\varepsilon } \end{aligned}$$

and it holds that for any \(\Delta _A \in {\mathcal {C}}_1(\Gamma _A)\)

$$\begin{aligned} H_{\min }(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c)_{\phi ^\varepsilon } \ge H_{\min }^{\varepsilon }(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c)_{\phi } \end{aligned}$$

and for any \(\Delta _A \in {\mathcal {C}}_2(\Gamma _A)\)

$$\begin{aligned} H_{\min }(\Delta _A {\setminus } \Gamma _A \vert \Gamma _A)_{\phi ^\varepsilon } \ge H_{\min }^{\varepsilon }(\Delta _A {\setminus } \Gamma _A \vert \Gamma _A)_{\phi }. \end{aligned}$$

We now define the notion of a minimal cut for an arbitrary link state.

Definition 1.12

(Generalized minimal cut) A cut \(\Gamma _{A}\) is an \((\varepsilon ,K)\)-minimal cut if for all \(\Delta _A \in {\mathcal {C}}_1(\Gamma _A)\)

$$\begin{aligned} H_{\min }^{\varepsilon }(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c)_{\phi } \ge K \end{aligned}$$

and for all \(\Delta _A \in {\mathcal {C}}_2(\Gamma _A)\)

$$\begin{aligned} H_{\min }^{\varepsilon }(\Delta _A {\setminus } \Gamma _A \vert \Gamma _A)_{\phi } \ge K. \end{aligned}$$

This definition is consistent with the smooth entropy conditions for minimal surfaces in holography from [6]. The following is now a straightforward consequence of Theorem 4.4 and Lemma 4.5. It justifies our notion of a generalized minimal cut, as it controls the degree to which the spectrum of the corresponding cut link state \(\phi _{\Gamma _A}\) is close to the boundary state \(\rho _A\).

Theorem 1.13

Consider a random tensor network state \(\rho \) constructed with \(\phi \in {\mathcal {P}}_{\scriptscriptstyle {=}}(V)\) a tensor product of link states as in Eq. (2.2). Let A be a boundary region of the network, \(\rho _A\) the corresponding boundary state, and \(\Gamma _A\) an \((\varepsilon ,K)\)-minimal cut. Then, the spectrum of \(\rho _A\) and that of the state \(\phi _{\Gamma _A}\) on \(\Gamma _A\) are related as:

$$\begin{aligned} \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\rho _A) - {{\,\textrm{spec}\,}}_+(\phi _{\Gamma _A})\Vert _1 = {\mathcal {O}}(2^{-\frac{1}{4} K} + \sqrt{\varepsilon }). \end{aligned}$$

Proof

Let \(\phi ^{\varepsilon }\) be a state as constructed in Lemma 4.5, and let \(\rho ^{\varepsilon }\) be the random tensor network state using this background state. Then, by Theorem 4.4

$$\begin{aligned}&\mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\rho _A^{\varepsilon }) - {{\,\textrm{spec}\,}}_+(\phi ^{\varepsilon }_{\Gamma _A})\Vert _1 \\ {}&\quad \le \mathbbm {E}\min _{V_A, V_{{\bar{A}}}} \Vert (V_A \otimes V_{{\bar{A}}} \otimes I_R)\rho ^{\varepsilon }(V_A^\dagger \otimes V_{{\bar{A}}}^\dagger \otimes I_R) - \phi ^{\varepsilon }\Vert _1\\&\quad = {\mathcal {O}}({{\,\textrm{tr}\,}}[ \phi ]^{\frac{1}{4}}(2^{-\frac{1}{4}K_1} + 2^{-\frac{1}{4}K_2})) \end{aligned}$$

where \(K_1\) is the minimal value over \(\Delta _A \in {\mathcal {C}}_1(\Gamma _A)\) of

$$\begin{aligned} H_2(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c)_{\phi ^{\varepsilon }|\phi ^{\varepsilon }}&\ge H_{\min }(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c)_{\phi ^{\varepsilon }} - \log {{\,\textrm{tr}\,}}[\phi ^\varepsilon ] \\&\ge H_{\min }^{\varepsilon }(\Gamma _A {\setminus } \Delta _A \vert \Gamma _A^c)_{\phi } - \log {{\,\textrm{tr}\,}}[\phi ^\varepsilon ] \end{aligned}$$

using Eq. (4.4), the defining property of \(\phi ^{\varepsilon }\) from Lemma 4.5 and accounting for the normalization. Similarly \(K_2\) is the minimal value over \(\Delta _A \in {\mathcal {C}}_2(\Gamma _A)\) of

$$\begin{aligned} H_2(\Delta _A{\setminus } \Gamma _A \vert \Gamma _A)_{\phi ^{\varepsilon }|\phi ^{\varepsilon }}&\ge H_{\min }(\Delta _A{\setminus } \Gamma _A \vert \Gamma _A)_{\phi ^{\varepsilon }} - \log {{\,\textrm{tr}\,}}[\phi ^\varepsilon ] \\&\ge H_{\min }^{\varepsilon }(\Delta _A{\setminus } \Gamma _A \vert \Gamma _A)_{\phi } - \log {{\,\textrm{tr}\,}}[\phi ^\varepsilon ] \end{aligned}$$

and hence \(K_1 \ge K\) and \(K_2 \ge K\), so

$$\begin{aligned} \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\rho _A^{\varepsilon }) - {{\,\textrm{spec}\,}}_+(\phi ^{\varepsilon }_{\Gamma _A})\Vert _1 = {\mathcal {O}}(2^{-\frac{1}{4}K}). \end{aligned}$$

Moreover, \(\Vert \phi - \phi ^{\varepsilon }\Vert _1 \le 2T(\phi ,\phi ^{\varepsilon }) \le 2P(\phi ,\phi ^{\varepsilon }) = {\mathcal {O}}(\sqrt{\varepsilon })\) by Eq. (4.1) and hence by Lemma 4.3

$$\begin{aligned} \mathbbm {E}\Vert \rho - \rho ^{\varepsilon }\Vert _1 = {\mathcal {O}}(\sqrt{\varepsilon }). \end{aligned}$$

We conclude that

$$\begin{aligned} \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\rho _A) - {{\,\textrm{spec}\,}}_+(\phi _{\Gamma _A})\Vert _1&\le \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\rho _A^{\varepsilon }) - {{\,\textrm{spec}\,}}_+(\phi ^{\varepsilon }_{\Gamma _A})\Vert _1\\&\quad + \mathbbm {E}\Vert \rho _A - \rho ^{\varepsilon }_A\Vert _1 + \Vert \phi _{\Gamma _A} - \phi ^{\varepsilon }_{\Gamma _A}\Vert _1\\&= {\mathcal {O}}(2^{-\frac{1}{4}K} + \sqrt{\varepsilon }). \end{aligned}$$

\(\square \)

4.4 Two Competing Minimal Cuts

We now consider the case of two minimal cuts for A, where the link states do not have bounded spectral variation as in Sect. 3. We cannot directly apply Theorem 4.7 if there are two competing minimal cuts (we will define below what this means exactly), nor can we assume that the empirical measure, or even a rescaling of it, converges. Instead, we will see that the spectrum can be approximated in trace distance in a way that allows one to compute entropies, and to show convergence of certain measures depending on the spectrum of the reduced state. In particular, under appropriate regularity conditions, we will prove the main result of this subsection: the distribution defined by the boundary state converges to the pushforward along the \(\min \)-function of the distributions of the two competing minimal cuts. The intuition behind this is that the state on the edges can be approximated by a superposition of two states, each of which does have a unique minimal cut. As a corollary of this result, we will then show how to approximate the von Neumann entropy of the boundary state in this situation.

We begin by introducing the relevant family of probability distributions. Given a probability distribution vector \(p \in \mathbbm {R}^d\) on d outcomes and functions \(f(d) > 0\) and \(g(d) \in \mathbbm {R}\), consider the random variable which takes the value \(f(d)[-\log (p_i) - g(d)]\) with probability \(p_i\). It has distribution

$$\begin{aligned} \nu _p = \sum _{i = 1}^d p_i \delta _{f(d)[-\log (p_i) - g(d)]}. \end{aligned}$$
(4.15)

Typically, we will have families of probability distributions with increasing d; we let \(f(d) = (\log d)^{-1}\) and choose g(d) to be the entropy of p. We will also write \(\nu _\rho = \nu _{{{\,\textrm{spec}\,}}(\rho )}\) for a quantum state \(\rho \). We may also consider Eq. (4.15) in the case where \(p \in \mathbbm {R}^d\) is positive but not normalized, in which case \(\nu _p\) is a finite (rather than probability) measure. To motivate the study of the distribution in Eq. (4.15), we note that it is closely related to the central limit theorem. Let \(d = d_0^n\) and \(\rho = \rho _0^{\otimes n}\) for some state \(\rho _0\) on \(\mathbbm {C}^{d_0}\). Then,

$$\begin{aligned} \nu ^{(n)}_\rho = \sum _{\lambda \in {{\,\textrm{spec}\,}}(\rho _0^{\otimes n})} \lambda \delta _{\frac{1}{\sqrt{n}}[-\log (\lambda ) - nH(\rho _0)]} \end{aligned}$$

converges in distribution to a normal distribution by the central limit theorem. We study this particular measure because the empirical measures we investigated in Sect. 3 may not have good convergence properties if, for example, the full state on the edges consists of many copies of a single non-maximally entangled link state. Moreover, the measure in Eq. (4.15) clearly captures information about the entropy of the probability distribution, and it turns out to also capture information about the second-order asymptotics of information processing tasks [38, 72]. In fact, the example of many independent copies of a single link state is not the only relevant situation: in [14, 19], it was argued that the entanglement spectrum of conformal field theories with large central charge (which are a motivating example for the study of random tensor networks) has similar behavior.
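This central limit behavior is easy to check numerically. In the following sketch (our own illustration; the single-site spectrum and sample sizes are arbitrary, and we use base-2 logarithms as elsewhere in the paper), we use that an eigenvalue of \(\rho _0^{\otimes n}\) drawn with probability equal to itself is a product of n i.i.d. draws from the single-site spectrum.

```python
import numpy as np

rng = np.random.default_rng(2)

# single-site entanglement spectrum (arbitrary, non-maximally entangled)
lam = np.array([0.7, 0.2, 0.1])
H = -np.sum(lam * np.log2(lam))               # entropy of rho_0
var = np.sum(lam * (-np.log2(lam) - H) ** 2)  # single-site variance

n, samples = 400, 4000
# an eigenvalue of rho_0^{(x)n} drawn with probability equal to itself is a
# product of n i.i.d. draws from lam; rescale as in nu^(n)
idx = rng.choice(len(lam), size=(samples, n), p=lam)
x = (np.sum(-np.log2(lam[idx]), axis=1) - n * H) / np.sqrt(n)

# the rescaled variable has mean 0 and variance `var` exactly for every n;
# the CLT additionally says its law is approximately Gaussian for large n
assert abs(np.mean(x)) < 0.1
assert abs(np.var(x) - var) < 0.15
```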

Before diving into more technical details, we can build some intuition for how distributions of the form Eq. (4.15) will behave in the two-cut setting. Consider the situation with a single random tensor. Let us take a single bulk vertex x and two boundary vertices a and b, with edges (xa) and (xb). There is a single random tensor at x, and we take link states \(\vert \phi _1\rangle ^{\otimes n}\) along (xa) and \(\vert \phi _2\rangle ^{\otimes n}\) along (xb). We denote

$$\begin{aligned} \vert \phi _j\rangle = \sum _i \sqrt{\lambda _{i,j}} \vert ii\rangle \end{aligned}$$

for \(j = 1,2\) and we let \(h_j:= H(\{\lambda _{i,j}\})\) be the entanglement entropies of the two link states. There are two separating cuts for the boundary subsystem \(\{a\}\): if \(h_1 < h_2\), then (xa) is the minimal cut, and if \(h_1 > h_2\), the minimal cut is given by (xb). In these cases, for large n, the entanglement spectrum of \(\rho _{\{a\}}\) can be approximated by the entanglement spectrum along the minimal cut. What happens at the “phase transition” where \(h_1 = h_2 = h\)? The intuition is that we can split \(\vert \phi _1\rangle ^{\otimes n} \otimes \vert \phi _2\rangle ^{\otimes n}\) into a superposition of two states, one for which the minimal surface is at (xa) and one for which it is at (xb). We find that in this case, if we let \(\sigma _j^2 = \sum _i \lambda _{i,j}(\log (\frac{1}{\lambda _{i,j}}) - h)^2\), then for any bounded continuous function f, the quantity

$$\begin{aligned} \sum _{\lambda \in {{\,\textrm{spec}\,}}(\rho _{\{a\}})} \lambda f\left( \frac{-\log (\lambda ) - nh}{\sqrt{n}}\right) \end{aligned}$$

converges to

$$\begin{aligned} \frac{1}{2\pi \sigma _1\sigma _2} \int \int e^{-\frac{(x_1 - h)^2}{2\sigma _1^2} -\frac{(x_2 - h)^2}{2\sigma _2^2} } f(\min (x_1,x_2))\textrm{d}x_1 \textrm{d}x_2 \end{aligned}$$

in probability as n goes to infinity. Our goal in this section will be to prove a general version of this result for full random tensor networks.
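The limiting measure is easy to sample directly. As a sanity check (a Monte Carlo sketch of ours, with illustrative parameters), for \(\sigma _1 = \sigma _2 = \sigma \) the mean of the pushforward along \(\min \) is \(h - \sigma /\sqrt{\pi }\), the classical formula for the minimum of two i.i.d. Gaussians:

```python
import numpy as np

rng = np.random.default_rng(3)

h, sigma1, sigma2 = 1.0, 0.5, 0.5   # illustrative parameters
N = 200_000

# sample the pushforward of two independent Gaussians along min(x1, x2)
x1 = rng.normal(h, sigma1, size=N)
x2 = rng.normal(h, sigma2, size=N)
m = np.minimum(x1, x2)

# for sigma1 == sigma2 == sigma: E[min(X1, X2)] = h - sigma / sqrt(pi)
expected = h - sigma1 / np.sqrt(np.pi)
assert abs(np.mean(m) - expected) < 0.01
```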

In Sect. 3, we used the method of moments: if all moments of a distribution converge, and the limiting distribution is uniquely determined by its moments, then we have weak convergence. However, convergence of moments is a stronger condition than weak convergence, and it requires the computation of all moments. In this section, we will instead use the fact that for distributions of the form of Eq. (4.15), convergence in distribution follows from convergence in trace norm:

Lemma 1.14

Consider a sequence of increasing integers \(\{d_n\}_{n \in \mathbbm {N}}\) and for each n, positive vectors \(p^{(n)}, q^{(n)} \in \mathbbm {R}^{d_n}\). Let \(\sigma , h: \mathbbm {N}\rightarrow \mathbbm {R}\) be such that \(\sigma (d) > 0\) for \(d \in \mathbbm {N}\) and \(\lim _{d \rightarrow \infty } \sigma (d)^{-1} = 0\). Let

$$\begin{aligned} \nu ^{(n)}_p := \sum _{i=1}^{d_n} p^{(n)}_i \delta _{\frac{1}{\sigma (d_n)}[-\log (p^{(n)}_i)-h(d_n)]} \end{aligned}$$

and similarly let

$$\begin{aligned} \nu ^{(n)}_{q} = \sum _{i=1}^{d_n} q^{(n)}_i \delta _{\frac{1}{\sigma (d_n)}[-\log (q^{(n)}_i)-h(d_n)]}. \end{aligned}$$
  1. Suppose \(\Vert p^{(n)} - q^{(n)}\Vert _1\) goes to zero as n goes to infinity, and \(\nu ^{(n)}_p \Rightarrow \nu \) for some probability distribution \(\nu \). Then, \(\nu ^{(n)}_{q} \Rightarrow \nu \).

  2. Suppose the vectors \(q^{(n)}\) and \(p^{(n)}\) are random, \(\mathbbm {E}\Vert p^{(n)} - q^{(n)}\Vert _1\) goes to zero as n goes to infinity, and \(\nu ^{(n)}_p \Rightarrow \nu \) in probability. Then, \(\nu ^{(n)}_{q} \Rightarrow \nu \) in probability.

Proof

We need to show that

$$\begin{aligned} \left|\int f(x) \textrm{d}\nu _q^{(n)}(x) - \int f(x) \textrm{d}\nu (x)\right| \rightarrow 0 \end{aligned}$$

for any \(f \in C_b(\mathbbm {R})\). In fact, it suffices to show this for f uniformly continuous (see, for instance, Theorem C.10 in [3]). Let \(\varepsilon > 0\). Assuming f to be uniformly continuous, there exists \(\delta > 0\) such that \(\left|f(x) - f(y)\right| \le \varepsilon \) for any x, y with \(\left|x - y\right| \le \delta \). We use the triangle inequality to bound

$$\begin{aligned} \left|\int f(x) \textrm{d}\nu _q^{(n)}(x) - \int f(x) \textrm{d}\nu (x)\right|&\le \left|\int f(x) \textrm{d}\nu _q^{(n)}(x) - \int f(x) \textrm{d}\nu _p^{(n)}(x)\right|\nonumber \\&\quad + \left|\int f(x) \textrm{d}\nu _p^{(n)}(x) - \int f(x) \textrm{d}\nu (x)\right|. \end{aligned}$$
(4.16)

Since \(\nu _p^{(n)} \Rightarrow \nu \), the second term on the right-hand side vanishes as n goes to infinity. If we write \(g_n(x) = \frac{1}{\sigma (d_n)}(\log \frac{1}{x} - h(d_n))\), then the first term on the right-hand side of Eq. (4.16) is given by

$$\begin{aligned} \left|\sum _{i = 1}^{d_n} q_i^{(n)} f(g_n(q_i^{(n)})) - p_i^{(n)} f(g_n(p_i^{(n)}))\right|. \end{aligned}$$

This can be bounded by

$$\begin{aligned} \sum _{i = 1}^{d_n} \left|q_i^{(n)} f(g_n(q_i^{(n)})) - p_i^{(n)} f(g_n(p_i^{(n)}))\right|&\le \sum _{i = 1}^{d_n} \left|q_i^{(n)} - p_i^{(n)}\right| \left|f(g_n(q_i^{(n)}))\right| \\&\quad + \sum _{i = 1}^{d_n} p_i^{(n)} \left|f(g_n(q_i^{(n)})) - f(g_n(p_i^{(n)}))\right|. \end{aligned}$$

In this expression, the first term is bounded by

$$\begin{aligned} \sum _{i = 1}^{d_n} \left|q_i^{(n)} - p_i^{(n)}\right| \left|f(g_n(q_i^{(n)}))\right| \le C\Vert q^{(n)} - p^{(n)}\Vert _1 \end{aligned}$$
(4.17)

where C is a uniform upper bound for f (using \(f \in C_b(\mathbbm {R})\)). For the second term, we partition the sum over \([d_n]\) into three sets

$$\begin{aligned} I_1&= \left\{ i \in [d_n] \text { such that } 2^{-\delta \sigma (d_n)} \le \frac{p_i^{(n)}}{q_i^{(n)}} \le 2^{\delta \sigma (d_n)} \right\} \\ I_2&= \left\{ i \in [d_n] \text { such that } \frac{p_i^{(n)}}{q_i^{(n)}} > 2^{\delta \sigma (d_n)} \right\} \\ I_3&= \left\{ i \in [d_n] \text { such that } \frac{p_i^{(n)}}{q_i^{(n)}} < 2^{-\delta \sigma (d_n)} \right\} . \end{aligned}$$

The idea is that for \(i \in I_1\), \(p_i\) and \(q_i\) are sufficiently close that we may use the continuity of f, whereas \(I_2\) and \(I_3\) cannot have too much weight (using that \(p^{(n)}\) and \(q^{(n)}\) are close in trace distance). Let us now make this precise. For the sum over the elements in \(I_1\), we have

$$\begin{aligned} \sum _{i \in I_1} p_i^{(n)} \left|f(g_n(q_i^{(n)})) - f(g_n(p_i^{(n)}))\right| \le \sum _{i \in I_1} p_i^{(n)} \varepsilon \le \Vert p^{(n)}\Vert _1\varepsilon , \end{aligned}$$
(4.18)

using the uniform continuity of f and the fact that for \(i \in I_1\), we have \(g_n(q_i^{(n)}) - g_n(p_i^{(n)}) = \frac{1}{\sigma (d_n)}\log \left( \frac{p_i^{(n)}}{q_i^{(n)}}\right) \), implying \(\left|g_n(q_i^{(n)}) - g_n(p_i^{(n)})\right| \le \delta \) by the definition of \(I_1\). Next, we observe that

$$\begin{aligned} \Vert p^{(n)} - q^{(n)}\Vert _1&\ge \sum _{i \in I_2} \left|p_i^{(n)} - q_i^{(n)}\right| = \sum _{i \in I_2} p_i^{(n)}\left( 1 - \frac{q_i^{(n)}}{p_i^{(n)}}\right) \\ {}&\ge \sum _{i \in I_2} p_i^{(n)}\left( 1 - 2^{-\delta \sigma (d_n)}\right) , \end{aligned}$$

and hence for \(\sigma (d_n) \ge \delta ^{-1}\), we have

$$\begin{aligned} \sum _{i \in I_2} p_i^{(n)} \le 2\Vert p^{(n)} - q^{(n)}\Vert _1. \end{aligned}$$
(4.19)

In analogous fashion, we see that

$$\begin{aligned} \Vert p^{(n)} - q^{(n)}\Vert _1&\ge \sum _{i \in I_3, p_i^{(n)}> 0} \left|p_i^{(n)} - q_i^{(n)}\right| = \sum _{i \in I_3, p_i^{(n)} > 0} p_i^{(n)}\left( \frac{q_i^{(n)}}{p_i^{(n)}} - 1\right) \\&\ge \sum _{i \in I_3} p_i^{(n)}\left( 2^{\delta \sigma (d_n)} - 1\right) , \end{aligned}$$

so for \(\sigma (d_n) \ge \delta ^{-1}\) we have

$$\begin{aligned} \sum _{i \in I_3} p_i^{(n)} \le \Vert p^{(n)} - q^{(n)}\Vert _1. \end{aligned}$$
(4.20)

In conclusion, for \(\sigma (d_n) \ge \delta ^{-1}\), collecting Eqs. (4.18), (4.19), (4.20) and using that f is uniformly bounded by C, we can bound

$$\begin{aligned} \sum _{i = 1}^{d_n} p_i^{(n)} \left|f(g_n(q_i^{(n)})) - f(g_n(p_i^{(n)}))\right| \le \varepsilon \Vert p^{(n)}\Vert _1 + 3C\Vert p^{(n)} - q^{(n)}\Vert _1. \end{aligned}$$

Together with Eq. (4.17), this implies that for \(\sigma (d_n) \ge \delta ^{-1}\), we obtain the bound:

$$\begin{aligned} \left|\int f(x) \textrm{d}\nu _q^{(n)}(x) - \int f(x) \textrm{d}\nu _p^{(n)}(x)\right| \le 4C\Vert q^{(n)} - p^{(n)}\Vert _1 + \varepsilon \Vert p^{(n)}\Vert _1. \end{aligned}$$
(4.21)

Since \(\nu _{p}^{(n)} \Rightarrow \nu \), \(\Vert p^{(n)}\Vert _1 \rightarrow 1\), and since \(\varepsilon > 0\) was arbitrary, we conclude that as n goes to infinity

$$\begin{aligned} \left|\int f(x) \textrm{d}\nu _q^{(n)}(x) - \int f(x) \textrm{d}\nu (x)\right| \rightarrow 0 \end{aligned}$$

proving 1.

To prove 2, we note that by Markov’s inequality, it suffices to show that

$$\begin{aligned} \mathbbm {E}\left|\int f(x) \textrm{d}\nu _q^{(n)}(x) - \int f(x) \textrm{d}\nu (x)\right| \rightarrow 0, \end{aligned}$$
(4.22)

where \(f \in C_b(\mathbbm {R})\) is a uniformly continuous function. By Eq. (4.16), it suffices to show

$$\begin{aligned} \mathbbm {E}\left|\int f(x) \textrm{d}\nu _q^{(n)}(x) - \int f(x) \textrm{d}\nu _p^{(n)}(x)\right| \rightarrow 0. \end{aligned}$$

Let \(\varepsilon > 0\), and let \(\delta > 0\) be such that \(\left|f(x) - f(y)\right| \le \varepsilon \) whenever \(\left|x - y\right| \le \delta \). Then by Eq. (4.21), for \(\sigma (d_n) \ge \delta ^{-1}\),

$$\begin{aligned} \mathbbm {E}\left|\int f(x) \textrm{d}\nu _q^{(n)}(x) - \int f(x) \textrm{d}\nu _p^{(n)}(x)\right| \le 4C \mathbbm {E}\Vert q^{(n)} - p^{(n)}\Vert _1 + \varepsilon \mathbbm {E}\Vert p^{(n)}\Vert _1. \end{aligned}$$

Since \(\varepsilon > 0\) was arbitrary, \(\mathbbm {E}\Vert q^{(n)} - p^{(n)}\Vert _1\) goes to zero, and \(\mathbbm {E}\Vert p^{(n)}\Vert _1\) goes to one, we conclude that Eq. (4.22) holds, proving 2. \(\square \)

The value of this result is clear: so long as we restrict ourselves to measures of the form Eq. (4.15), we can prove convergence results by proving convergence in trace norm.
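As a concrete illustration (ours, not part of the text): for the spectrum of an n-fold product state, a measure of the form Eq. (4.15) can be tabulated exactly, and its moments match the Gaussian central limit measure. A minimal Python sketch, with our own function names:

```python
import math

def clt_atoms(a, n):
    """Atoms (weight, position) of the spectral measure for
    rho = diag(a, 1-a)^{tensor n}: each eigenvalue lam contributes its
    total weight at position (log2(1/lam) - n*H(a)) / sqrt(n)."""
    H = -a * math.log2(a) - (1 - a) * math.log2(1 - a)  # entropy per copy
    atoms = []
    for k in range(n + 1):
        lam = a ** k * (1 - a) ** (n - k)               # eigenvalue
        w = math.comb(n, k) * lam                       # multiplicity times eigenvalue
        atoms.append((w, (math.log2(1 / lam) - n * H) / math.sqrt(n)))
    return atoms

atoms = clt_atoms(0.3, 400)
mean = sum(w * x for w, x in atoms)
var = sum(w * x * x for w, x in atoms)
# mean is 0 and var equals the per-copy variance a(1-a)*log2(a/(1-a))^2,
# as for the Gaussian limit measure
```

In this situation, trace-norm convergence of the underlying eigenvalue distributions transfers to weak convergence of these measures, exactly as in the result above.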

We will also need a basic lemma to help estimate overlaps of states.

Lemma 1.15

Suppose we have two bipartite pure states \(\psi _{AB}, \phi _{AB} \in {\mathcal {P}}(AB)\). Then,

$$\begin{aligned} \Vert {{\,\textrm{tr}\,}}_{B}[\vert \psi \rangle \langle \phi \vert ]\Vert _1 = F(\psi _B, \phi _B). \end{aligned}$$

Proof

For any operator \(M_A\) on \({\mathcal {H}}_A\) it holds that

$$\begin{aligned} \Vert M_A\Vert _1 = \sup _{U_A} \, \left|{{\,\textrm{tr}\,}}[U_A M_A]\right| \end{aligned}$$

where the supremum is over unitary operators \(U_A\) on \({\mathcal {H}}_A\), so

$$\begin{aligned} \Vert {{\,\textrm{tr}\,}}_{B}[\vert \psi \rangle \langle \phi \vert ]\Vert _1&= \sup _{U_A} \, \left|{{\,\textrm{tr}\,}}[U_A{{\,\textrm{tr}\,}}_{B}[\vert \psi \rangle \langle \phi \vert ]]\right|\\&= \sup _{U_A} \, \left|{{\,\textrm{tr}\,}}[U_A \otimes I_B \vert \psi \rangle \langle \phi \vert ]\right| = \sup _{U_A} \, \left|\langle \phi \vert U_A \otimes I_B \vert \psi \rangle \right| \end{aligned}$$

which equals \(F(\psi _B, \phi _B)\) by Uhlmann’s theorem. \(\square \)
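The identity in this lemma is easy to check numerically; the following illustration is ours (using numpy), with the fidelity convention \(F(\rho ,\sigma ) = \Vert \sqrt{\rho }\sqrt{\sigma }\Vert _1\):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_pure(dA, dB):
    """Random bipartite pure state |psi> as a dA x dB coefficient matrix."""
    v = rng.normal(size=(dA, dB)) + 1j * rng.normal(size=(dA, dB))
    return v / np.linalg.norm(v)

def sqrtm_psd(m):
    """Square root of a positive semidefinite matrix via eigendecomposition."""
    w, u = np.linalg.eigh(m)
    return (u * np.sqrt(np.clip(w, 0, None))) @ u.conj().T

def trace_norm(m):
    """Trace norm = sum of singular values."""
    return np.linalg.svd(m, compute_uv=False).sum()

psi, phi = rand_pure(3, 5), rand_pure(3, 5)
lhs = trace_norm(psi @ phi.conj().T)        # || tr_B |psi><phi| ||_1, an operator on A
psi_B = psi.T @ psi.conj()                  # reduced state of psi on B
phi_B = phi.T @ phi.conj()                  # reduced state of phi on B
fid = trace_norm(sqrtm_psd(psi_B) @ sqrtm_psd(phi_B))   # F(psi_B, phi_B)
# lhs and fid agree up to numerical precision
```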

With these tools in hand, let us now return to the random tensor network setting. Let \(\Gamma _{A,1}\) and \(\Gamma _{A,2}\) be two cuts with associated sets of edges \(\gamma _{A,1}\) and \(\gamma _{A,2}\). We will assume these two cuts are non-intersecting, \(\gamma _{A,1} \cap \gamma _{A,2} = \emptyset \), and, without loss of generality, \(\Gamma _{A,1} \subset \Gamma _{A,2}\). We will assume that the full state on edges is a tensor product of link states along the edges, and that the associated central limit measure for the spectrum along the cuts \(\gamma _{A,1}\) and \(\gamma _{A,2}\) converges to a continuous probability distribution. The most obvious application is the one where the link state \(\phi _e\) on each edge consists of many copies of a single state \(\phi _{e,0}\), so \(\phi _e = (\phi _{e,0})^{\otimes n}\), in which case the spectrum is subject to a central limit theorem and the assumptions of Theorem 4.12 are satisfied.

We will now formalize what it means for these two cuts to be competing minimal cuts. We have fixed \(\Gamma _{A,1}\) and \(\Gamma _{A,2}\) with \(\Gamma _{A,1} \subsetneq \Gamma _{A,2}\). Then, we consider three sets of cuts: \({\mathcal {C}}_1(\Gamma _{A,1}, \Gamma _{A,2})\) is the collection of cuts strictly contained in \(\Gamma _{A,1}\), \({\mathcal {C}}_2(\Gamma _{A,1}, \Gamma _{A,2})\) is the collection of cuts which strictly contain \(\Gamma _{A,2}\), and \({\mathcal {C}}_3(\Gamma _{A,1}, \Gamma _{A,2})\) is the collection of cuts which are “in between” \(\Gamma _{A,1}\) and \(\Gamma _{A,2}\). That is,

$$\begin{aligned} {\mathcal {C}}_1(\Gamma _{A,1}, \Gamma _{A,2})&= \{ \Delta _A \in C(A) : \Delta _A \subsetneq \Gamma _{A,1}\}\\ {\mathcal {C}}_2(\Gamma _{A,1}, \Gamma _{A,2})&= \{ \Delta _A \in C(A) : \Gamma _{A,2} \subsetneq \Delta _A\}\\ {\mathcal {C}}_3(\Gamma _{A,1}, \Gamma _{A,2})&= \{ \Delta _A \in C(A) : \Gamma _{A,1} \subsetneq \Delta _A \subsetneq \Gamma _{A,2}\} \end{aligned}$$

and we let \({\mathcal {C}} = {\mathcal {C}}_1(\Gamma _{A,1}, \Gamma _{A,2}) \cup {\mathcal {C}}_2(\Gamma _{A,1}, \Gamma _{A,2}) \cup {\mathcal {C}}_3(\Gamma _{A,1}, \Gamma _{A,2})\). As in the single cut case, we would like to say that the surfaces \(\Gamma _{A,1}\) and \(\Gamma _{A,2}\) are minimal cuts if an appropriate set of conditional entropies are sufficiently large for all \(\Delta _A \in {\mathcal {C}}\). We formalize this as follows:

Definition 1.16

A pair of cuts \(\Gamma _{A,1}\) and \(\Gamma _{A,2}\) for A is a pair of \((\varepsilon ,K)\)-minimal cuts if

$$\begin{aligned} H_{\min }^\varepsilon (\Gamma _{A,1} {\setminus } \Delta _A \vert \Gamma _{A,1}^c) \ge K \end{aligned}$$

for all cuts \(\Delta _A \in {\mathcal {C}}_1(\Gamma _{A,1}, \Gamma _{A,2})\)

$$\begin{aligned} H_{\min }^\varepsilon (\Delta _A {\setminus } \Gamma _{A,2}\vert \Gamma _{A,2}) \ge K \end{aligned}$$

for all cuts \(\Delta _A \in {\mathcal {C}}_2(\Gamma _{A,1}, \Gamma _{A,2})\) and

$$\begin{aligned} H_{\min }^\varepsilon (\Delta _A {\setminus } \Gamma _{A,1}\vert \Gamma _{A,1})&\ge K \\ H_{\min }^\varepsilon (\Gamma _{A,2} {\setminus } \Delta _A \vert \Gamma _{A,2}^c)&\ge K \end{aligned}$$

for all cuts \(\Delta _A \in {\mathcal {C}}_3(\Gamma _{A,1}, \Gamma _{A,2})\).
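The three-way classification of cuts underlying this definition can be sketched directly; in the following illustration (ours, with toy data), cuts are represented simply as vertex sets, and cuts equal to, or incomparable with, the two fixed cuts fall outside \({\mathcal {C}}\):

```python
def classify_cut(delta, gamma1, gamma2):
    """Classify a cut into C1/C2/C3 relative to nested cuts gamma1 ⊊ gamma2.
    Cuts are frozensets of vertices; '<' is strict set inclusion."""
    if delta < gamma1:
        return "C1"          # strictly inside the inner cut
    if gamma2 < delta:
        return "C2"          # strictly outside the outer cut
    if gamma1 < delta < gamma2:
        return "C3"          # strictly in between
    return None

# toy example: inner cut {a, b}, outer cut {a, b, c, d}
g1 = frozenset({"a", "b"})
g2 = frozenset({"a", "b", "c", "d"})
```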

We will need a joint smoothing result similar to Lemma 4.5 for the particular case where \(\Gamma _{A,1} = A\) and \(\Gamma _{A,2}^c = {{\bar{A}}}\). To this end we consider a graph \(G = (V,E)\) with a set of boundary vertices \(V_\partial \), and a boundary subsystem \(A \subseteq V_\partial \). We denote by \(\gamma _{A,1}\) the edges incident to A, and \(\gamma _{A,2}\) the edges incident to \({{\bar{A}}}\), and assume these sets do not intersect. We let \(E_b = E {\setminus } (\gamma _{A,1} \cup \gamma _{A,2})\). For a cut \(\Delta _A \in C(A)\) we let \(Y^{\Delta _A}\) be the set of half-edges

$$\begin{aligned} Y^{\Delta _A} = \{(e,x) : e = (xy), x \in \Delta _A^c, y \in A \}. \end{aligned}$$

Lemma 1.17

Let \(\phi \in {\mathcal {P}}_{\scriptscriptstyle {\le }}(V)\) be a pure background state which is of the form \(\phi = \phi _{\gamma _{A,1}} \otimes \phi _{\gamma _{A,2}} \otimes \phi _{E_b}\), where

$$\begin{aligned} \phi _{E_b}&= \bigotimes _{e \in E_b} \phi _e \end{aligned}$$

is a product state and the \(\phi _{\gamma _{A,i}}\) have a Schmidt decomposition in the standard basis along the half-edges. Then, there exists a state \(\phi ^{\varepsilon }\) which is such that

$$\begin{aligned} P(\phi ,\phi ^{\varepsilon }) \le 2\sqrt{2^{\left|V_b\right|}\varepsilon } \end{aligned}$$

and for all cuts \(\Delta _A \in C(A)\) it holds that

$$\begin{aligned} H_{\min }(\Delta _A {\setminus } A \vert A)_{\phi ^\varepsilon } \ge H_{\min }^{\varepsilon }(\Delta _A {\setminus } A \vert A Y^{\Delta _A})_{\phi }. \end{aligned}$$

To state the main result of this section, recall that given a measurable function \(f: {\mathcal {X}} \rightarrow {\mathcal {Y}}\) between measure spaces, the pushforward under f of a measure \(\mu \) on \({\mathcal {X}}\) is the measure \(f_*(\mu )\) on \({\mathcal {Y}}\) defined by \((f_*(\mu ))(A) = \mu (f^{-1}(A))\) for any measurable set \(A \subseteq {\mathcal {Y}}\). We apply this to the function \(\min : \mathbbm {R}\times \mathbbm {R}\rightarrow \mathbbm {R}\) and the product measure \(\nu _1 \times \nu _2\), writing \(\textrm{min}_*(\nu _1,\nu _2)\) for the resulting pushforward in the following result:

Theorem 1.18

Consider a family of random tensor network states on a graph G with pure state on edges \(\phi \), indexed by an increasing sequence of positive integers n. We assume that \(\Gamma _{A,1}, \Gamma _{A,2} \in C(A)\) are a pair of non-intersecting \((\varepsilon (n),K(n))\)-minimal cuts for all n where \(\varepsilon (n) = {\mathcal {O}}(n^{-\gamma })\) for \(\gamma > 4\) and \(K(n) = \Omega (\log (n)^2)\) as n goes to infinity. We let \(H: \mathbbm {N}\rightarrow \mathbbm {R}\), and we assume that

$$\begin{aligned} \nu _{\phi _{\Gamma _{A,i}}} = \sum _{\lambda \in {{\,\textrm{spec}\,}}(\phi _{\Gamma _{A,i}})} \lambda \delta _{\frac{1}{\sqrt{n}}[\log (\frac{1}{\lambda }) - H(n)]} \end{aligned}$$

for \(i = 1,2,\) is such that \(\nu _{\phi _{\Gamma _{A,i}}} \Rightarrow \nu _i\), where \(\nu _i\) is a probability distribution with a continuous cumulative distribution function. Then, \(\nu _{\rho _A} \Rightarrow \textrm{min}_*(\nu _1,\nu _2)\) in probability.
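For discrete measures, the pushforward of the product measure under \(\min \) can be computed directly; a small sketch of ours:

```python
from collections import defaultdict

def min_pushforward(nu1, nu2):
    """Pushforward of nu1 x nu2 under (x, y) -> min(x, y), for discrete
    measures given as lists of (position, weight) atoms."""
    out = defaultdict(float)
    for x1, w1 in nu1:
        for x2, w2 in nu2:
            out[min(x1, x2)] += w1 * w2
    return sorted(out.items())

# nu1 = (1/2) delta_0 + (1/2) delta_1, nu2 = delta_{1/2}
res = min_pushforward([(0.0, 0.5), (1.0, 0.5)], [(0.5, 1.0)])
```

Equivalently, the cumulative distribution function of the pushforward satisfies \(F_{\min }(x) = 1 - (1 - F_1(x))(1 - F_2(x))\).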

Fig. 5

Illustration of the proof of Theorem 4.12

Proof

Proving this result will require several intermediate results. We provide a very high-level, enumerated sketch of our proof here, involving the following steps:

  1. Study a reduced problem on a subnetwork; this subnetwork is such that the minimal cuts \(\gamma _{A,i}\) are incident to the boundary.

  2. Approximate the background state with superpositions of maximally entangled states by binning eigenvalues, and construct approximate tensor network states with the approximate background states.

  3. Prove that the spectrum of the boundary state converges to the spectrum of the approximate tensor network state, which in turn converges to the spectrum of the approximate background states. We write the background state as a superposition of two states, one of which has \(\gamma _{A,1}\) as its minimal cut, whereas the other state has \(\gamma _{A,2}\) as its minimal cut. We show that the resulting states are approximately orthogonal.

  4. Show that the distribution of the approximate background states converges to the min-distribution, and hence conclude that the spectrum of the boundary state converges to the min-distribution.

Figure 5 provides a more detailed visual sketch of the intuition behind our proof strategy. Lemma 4.8 will be a key tool, as it implies that it will suffice to show convergence in trace norm.

We assume without loss of generality that \(\Gamma _{A,1} \subset \Gamma _{A,2}\). We now define the subnetwork that we will analyze in our proof. Let \(G' = (V',E')\) be the induced graph on \(V' = V_b' \cup V_\partial '\), where \(V_b' = \Gamma _{A,2} {\setminus } \Gamma _{A,1}\), and \(V_\partial ' = B \cup {\bar{B}}\), with B the set of vertices in \(\Gamma _{A,1}\) which are incident to \(V_b'\) and \({\bar{B}}\) the set of vertices in \(V {\setminus } \Gamma _{A,2}\) which are incident to \(V_b'\). The subgraph \(G'\) is depicted in Fig. 5b.

We also define the random tensor network state \(\tau \) as the state obtained by applying random tensors only on bulk vertices in the complement of \(V_b'\). Then for this state, by a slight variation on Theorem 4.7, it holds that

$$\begin{aligned} \mathbbm {E}\min _{V_A, V_{{\bar{A}}}} \Vert (V_A \otimes V_{{\bar{A}}} \otimes I)\tau (V_A^\dagger \otimes V_{{\bar{A}}}^\dagger \otimes I) - \phi \Vert _1 = {\mathcal {O}}(2^{-K(n)} + \sqrt{\varepsilon (n)}), \end{aligned}$$

where the minimum is over isometries \(V_A: {\mathcal {H}}_A \rightarrow {\mathcal {H}}_{\Gamma _{A,1}}\) and \(V_{{\bar{A}}}: {\mathcal {H}}_{{\bar{A}}} \rightarrow {\mathcal {H}}_{\Gamma _{A,2}^c}\). Let \({{\tilde{\tau }}} = (V_A \otimes V_{{\bar{A}}} \otimes I)\tau (V_A^\dagger \otimes V_{{\bar{A}}}^\dagger \otimes I)\) where the \(V_A\) and \(V_{{\bar{A}}}\) are the isometries that realize the minimum.

Now, consider two random tensor network states on \(G'\) obtained by applying the same random tensors on \(V_b'\) with background states \({{\tilde{\tau }}}\) and \(\phi \), respectively. The state where we take \({{\tilde{\tau }}}\) as the background state yields \((V_A \otimes V_{{\bar{A}}} \otimes I)\rho (V_A^\dagger \otimes V_{{\bar{A}}}^\dagger \otimes I)\). Denote the state where we take \(\phi \) as the background state by \(\sigma \). Then by Lemma 4.3:

$$\begin{aligned} \mathbbm {E}\Vert (V_A \otimes V_{{\bar{A}}})\rho (V_A^\dagger \otimes V_{{\bar{A}}}^\dagger ) - \sigma \Vert _1 = {\mathcal {O}}(2^{-K(n)} + \sqrt{\varepsilon (n)}). \end{aligned}$$

If we let \(E_1\) denote the set of edges (xy) for which both \(x,y \in \Gamma _{A,1}\), and \(E_2\) the set of edges (xy) for which both \(x,y \in \Gamma _{A,2}^c\), then \(\sigma _{\Gamma _{A,1}} = \phi _{E_1} \otimes \sigma _B\) and \(\sigma _{\Gamma _{A,2}} = \phi _{E_2} \otimes \sigma _{{\bar{B}}}\). This shows that

$$\begin{aligned} \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\rho _A) - {{\,\textrm{spec}\,}}_+(\sigma _{B})\Vert _1 = {\mathcal {O}}(2^{-K(n)} + \sqrt{\varepsilon (n)}) \end{aligned}$$
(4.23)

and \({{\,\textrm{spec}\,}}_+(\sigma _B) = {{\,\textrm{spec}\,}}_+(\sigma _{{\bar{B}}})\), since \(\sigma \) is pure (which in turn follows from the purity of its background state \(\phi \)). We will continue to study \(\sigma \) on the reduced graph \(G'\), as in Fig. 5b, and at the end of the proof, we will see that Eq. (4.23) suffices to prove the desired properties of \({{\,\textrm{spec}\,}}(\rho _A)\).
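The equality of the nonzero spectra of the two marginals of a pure state is just the Schmidt decomposition; a quick numerical check (ours, with numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.normal(size=(3, 7)) + 1j * rng.normal(size=(3, 7))
psi = v / np.linalg.norm(v)         # pure state on B (dim 3) and Bbar (dim 7)

spec_B = np.linalg.eigvalsh(psi @ psi.conj().T)      # spectrum of the B marginal
spec_Bbar = np.linalg.eigvalsh(psi.T @ psi.conj())   # spectrum of the Bbar marginal
# the nonzero eigenvalues agree (they are the squared Schmidt coefficients);
# the remaining 7 - 3 = 4 eigenvalues of the larger marginal vanish
```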

We have accomplished Step 1 by reducing the full network to a subnetwork, and we now try to construct an approximation to \(\sigma \). We will do so by coarse-graining the spectrum of the background states along the two minimal cuts: slicing the tails of the distribution, binning the remaining eigenvalues, throwing away the smallest bins of the binned distribution, and then approximating the states as a superposition of maximally entangled states defined on the bins. Consider the background states along the two minimal cuts in the Schmidt basis:

$$\begin{aligned} \vert \phi _{\gamma _{A,i}}\rangle = \sum _{J} \sqrt{\lambda _{i,J}}\vert JJ\rangle . \end{aligned}$$
(4.24)

Here, the \(\vert J\rangle \) are a tensor product basis along the half-edges (which we again may take to be the standard basis), so that

$$\begin{aligned} \vert JJ\rangle = \bigotimes _{e = (xy) \in \gamma _{A,i}} \vert i_{e,x}\rangle \otimes \vert i_{e,y}\rangle , \end{aligned}$$

where the \(\vert i_{e,x}\rangle \) and \(\vert i_{e,y}\rangle \) are a basis for \({\mathcal {H}}_{e,x}\) and \({\mathcal {H}}_{e,y}\). First, we truncate the allowed range of eigenvalues by slicing off the tails of the distribution, as in the left side of Fig. 5c. Let

$$\begin{aligned} I_n = [2^{-H(n)-C\sqrt{n}\log (n)}, 2^{-H(n)+C\sqrt{n}\log (n)}] \end{aligned}$$

for some constant C, and let

$$\begin{aligned} \vert \phi _{\gamma _{A,i}}^{(1)}\rangle = \sum _{\lambda _{i,J} \in I_n} \sqrt{\lambda _{i,J}}\vert JJ\rangle . \end{aligned}$$

Note that the entanglement spectrum of \(\phi _{\gamma _{A,i}}\) is given by the spectrum of \(\phi _{\Gamma _{A,i}}\). This implies that

$$\begin{aligned} \sum _{\lambda _{i,J} \notin I_n} \lambda _{i,J} = \nu _{\phi _{\Gamma _{A,i}}}((-\infty ,-C\log (n))\cup (C\log (n),\infty )) \rightarrow 0 \end{aligned}$$

and hence

$$\begin{aligned} \Vert \phi _{\gamma _{A,i}} - \phi _{\gamma _{A,i}}^{(1)}\Vert _1 \rightarrow 0. \end{aligned}$$

Next, for \(\lambda _{i,J} \in I_n\), we define new \({\tilde{\lambda }}_{i,J}\) according to \(\log ({\tilde{\lambda }}_{i,J}) = \frac{1}{n^{\alpha }}\lfloor n^{\alpha }\log (\lambda _{i,J})\rfloor \), effectively binning the values of \(\log (\lambda _{i,J})\) into intervals of size \(n^{-\alpha }\), as in the right side of Fig. 5c. We will choose \(\alpha > 0\) small (but other choices will be useful for Corollary 4.14); to be precise, we choose \(\alpha < \frac{1}{8}(\gamma - 4)\). We can now define the background state

$$\begin{aligned} \vert \phi _{\gamma _{A,i}}^{(2)}\rangle = \sum _{\lambda _{i,J} \in I_n} \sqrt{{{\tilde{\lambda }}}_{i,J}}\vert JJ\rangle . \end{aligned}$$

Then by Lemma C.4

$$\begin{aligned} \Vert \phi _{\gamma _{A,i}}^{(1)}- \phi _{\gamma _{A,i}}^{(2)}\Vert _1&\le 2\sqrt{\sum _{J} \left|\lambda _{i,J} - {{\tilde{\lambda }}}_{i,J}\right|}, \end{aligned}$$

which we may bound using the fact that \(2^{-\frac{1}{n^{\alpha }}} \le \frac{{{\tilde{\lambda }}}_{i,J}}{\lambda _{i,J}} \le 1\), so

$$\begin{aligned} \sum _{J} \left|\lambda _{i,J} - {{\tilde{\lambda }}}_{i,J}\right|&= \sum _J \lambda _{i,J}\left( 1 - \frac{{{\tilde{\lambda }}}_{i,J}}{\lambda _{i,J}}\right) \le \max _{J} \left( 1 - \frac{{{\tilde{\lambda }}}_{i,J}}{\lambda _{i,J}}\right) \\ {}&\le 1 - 2^{-\frac{1}{n^{\alpha }}} = {\mathcal {O}}\left( \frac{1}{n^{\alpha }}\right) . \end{aligned}$$

Thus, \(\Vert \phi _{\gamma _{A,i}}^{(1)}- \phi _{\gamma _{A,i}}^{(2)}\Vert _1 = {\mathcal {O}}(n^{-\frac{1}{2} \alpha })\). Since the interval \(I_n\) has length \({\mathcal {O}}(\sqrt{n}\log (n))\) and the distinct values \({{\tilde{\lambda }}}_{i,J}\) are \(n^{-{\alpha }}\) apart, the \({\tilde{\lambda }}_{i,J}\) take \({\mathcal {O}}(n^{{\alpha } + \frac{1}{2}}\log (n))\) different values. Denote these values by \(p_{i,j}\), and let \(d_{i,j}\) be their multiplicities. Setting

$$\begin{aligned} q_{i,j} = d_{i,j}p_{i,j}, \end{aligned}$$

we can rewrite the state \(\vert \phi _{\gamma _{A,i}}^{(2)}\rangle \) using the collected eigenvalues:

$$\begin{aligned} \vert \phi _{\gamma _{A,i}}^{(2)}\rangle = \sum _{j} \sqrt{q_{i,j}}\vert \Phi ^+_{i,j}\rangle , \end{aligned}$$

where the \(\vert \Phi ^+_{i,j}\rangle \) are maximally entangled states of dimension \(d_{i,j}\) which are orthogonal, i.e., \(\langle \Phi ^+_{i,j}|\Phi ^+_{i,k}\rangle = \delta _{j,k}\). Note that \(0 \le q_{i,j} \le 1\) and \(\sum _j q_{i,j} \le 1\). Now, we discard any bins that are too small: Let \(\beta > \alpha + \frac{1}{2}\) and consider the state

$$\begin{aligned} \vert \phi _{\gamma _{A,i}}^{(3)}\rangle = \sum _{q_{i,j} > n^{-\beta }} \sqrt{q_{i,j}}\vert \Phi ^+_{i,j}\rangle \end{aligned}$$
(4.25)

where we have restricted the sum to terms for which \(q_{i,j}\) is sufficiently large (and hence \(p_{i,j}\) will be sufficiently close to \(d_{i,j}^{-1}\)). Then, since the number of terms is \({\mathcal {O}}(n^{\alpha + \frac{1}{2}}\log (n))\) and using Lemma C.4, we have

$$\begin{aligned} \Vert \phi _{\gamma _{A,i}}^{(2)} - \phi _{\gamma _{A,i}}^{(3)}\Vert _1 = {\mathcal {O}}\left( n^{\frac{1}{2}\alpha + \frac{1}{4} - \frac{1}{2}\beta }\sqrt{\log (n)}\right) . \end{aligned}$$

To summarize, \(\phi _{\gamma _{A,i}}^{(3)}\) is the state obtained from the original background state \(\phi _{\gamma _{A,i}}\) by 1) removing the tails of the spectrum, 2) binning the eigenvalues, and finally 3) dropping any of the bins that are too small. The first approximation incurs an error \(\Vert \phi _{\gamma _{A,i}} - \phi _{\gamma _{A,i}}^{(1)}\Vert _1\), which converges to zero, and the second and third approximations incur errors of order \({\mathcal {O}}(n^{-\frac{1}{2}\alpha })\) and \({\mathcal {O}}\left( n^{\frac{1}{2}\alpha + \frac{1}{4} - \frac{1}{2}\beta }\sqrt{\log (n)}\right) \), respectively.
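The three approximation steps just summarized can be sketched in a few lines; the following is our illustration, and the parameters in the example call are purely illustrative (they need not satisfy the constraints on \(\alpha \) and \(\beta \) used in the proof):

```python
import math
from collections import Counter

def coarse_grain(lams, n, H, C, alpha, beta):
    """Sketch of the three steps: (1) cut the spectral tails outside
    [2^(-H - C sqrt(n) log n), 2^(-H + C sqrt(n) log n)], (2) bin the
    log-eigenvalues to resolution n^-alpha, and (3) drop bins whose total
    weight q = d * p is too small. Returns {binned value p: multiplicity d}."""
    lo = 2.0 ** (-H - C * math.sqrt(n) * math.log2(n))
    hi = 2.0 ** (-H + C * math.sqrt(n) * math.log2(n))
    kept = [l for l in lams if lo <= l <= hi]                     # step 1
    s = n ** alpha
    bins = Counter(2.0 ** (math.floor(s * math.log2(l)) / s)      # step 2
                   for l in kept)
    return {p: d for p, d in bins.items() if d * p > n ** -beta}  # step 3

res = coarse_grain([0.3] * 3 + [0.2, 0.001], n=16, H=2, C=1, alpha=0.25, beta=0.75)
# the tiny eigenvalue 0.001 survives the tail cut, but its bin is dropped in step 3
```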

Now, let \({{\tilde{\phi }}}\) be the background state in which we have replaced \(\phi _{\gamma _{A,i}}\) by \(\phi _{\gamma _{A,i}}^{(3)}\) for \(i=1,2\). Recall that \(\sigma \) is the random tensor network state constructed with background state \(\phi \) on the subgraph \(G'\), as in Fig. 5b. We then define the approximation \({{\tilde{\sigma }}}\) to be the tensor network state on \(G'\) which uses \({{\tilde{\phi }}}\) as its background state instead of \(\phi \). With these background states, we see

$$\begin{aligned} \Vert \phi - {{\tilde{\phi }}}\Vert _1 = {\mathcal {O}}\left( \Vert \phi _{\gamma _{A,i}} - \phi _{\gamma _{A,i}}^{(1)}\Vert _1 + n^{-\frac{1}{2}\alpha } + n^{\frac{1}{2}\alpha + \frac{1}{4} - \frac{1}{2}\beta }\sqrt{\log (n)} \right) \rightarrow 0. \end{aligned}$$
(4.26)

We then apply Lemma 4.3 to find:

$$\begin{aligned} \mathbbm {E}\Vert \sigma - {{\tilde{\sigma }}}\Vert _1 = {\mathcal {O}}\left( \Vert \phi - {{\tilde{\phi }}}\Vert _1\right) \rightarrow 0. \end{aligned}$$
(4.27)

We now make one final approximation to \(\sigma \), in which we discard the parts of the background state where the maximally entangled states along each cut are close in dimension, as in Fig. 5d. Consider the state

$$\begin{aligned} \vert {{\bar{\phi }}}_{\gamma _{A,1},\gamma _{A,2}}\rangle = \sum _{2^{n^{\nicefrac {1}{4}}}p_{1,k} \le p_{2,j} \text { or } 2^{-n^{\nicefrac {1}{4}}}p_{1,k} \ge p_{2,j}}\sqrt{q_{1,k}q_{2,j}} \vert \Phi _{1,k}\rangle \otimes \vert \Phi _{2,j}\rangle , \end{aligned}$$

where the sum is still only over those j and k for which \(q_{1,k} > n^{-\beta }\) and \(q_{2,j} > n^{-\beta }\). If we let

$$\begin{aligned} D_n = \{(x,y) \in \mathbbm {R}^2 : x - n^{-\frac{1}{4}} - n^{-\frac{1}{2}} \le y \le x + n^{-\frac{1}{4}} + n^{-\frac{1}{2}}, \left|x\right| \le \log (n) + 1\}, \end{aligned}$$

then the \((\nu _1 \times \nu _2)\)-measure of \(D_n\) converges to zero as n increases. Since \(\nu _{\phi _{\Gamma _{A,i}}} \Rightarrow \nu _i\) and each \(\nu _i\) has a continuous cumulative distribution function, it follows that

$$\begin{aligned} \sum _{2^{-n^{\nicefrac {1}{4}}}p_{1,j} \le p_{2,k} \le 2^{n^{\nicefrac {1}{4}}}p_{1,j}} q_{1,j}q_{2,k} \le (\nu _{\phi _{\Gamma _{A,1}}} \times \nu _{\phi _{\Gamma _{A,2}}}) (D_n)\rightarrow 0. \end{aligned}$$
(4.28)

Hence, if we let \({{\bar{\phi }}}\) denote the background state with \({\tilde{\phi }}_{\gamma _{A,1}}\otimes {\tilde{\phi }}_{\gamma _{A,2}}\) replaced by \({{\bar{\phi }}}_{\gamma _{A,1},\gamma _{A,2}}\), we get that \(\Vert {{\bar{\phi }}} - {{\tilde{\phi }}}\Vert _1 \rightarrow 0\). By Lemma 4.3, if we denote by \({{\bar{\sigma }}}\) the state we obtain on \(G'\) by using \({{\bar{\phi }}}\) rather than \({{\tilde{\phi }}}\) as the background state, we get

$$\begin{aligned} \mathbbm {E}\Vert {{\bar{\sigma }}} - \sigma \Vert _1 \rightarrow 0. \end{aligned}$$
(4.29)

We pause here to note that we have accomplished Step 2: We have constructed an approximation \({{\bar{\sigma }}}\) to \(\sigma \) by approximating the background state along the cuts as superpositions of maximally entangled states. The utility in doing so is that working with this approximated tensor network state allows us to reduce to calculations where we restrict to a maximally entangled state along one of the two surfaces. Our next major step is to show that the spectrum of \({{\bar{\sigma }}}\) can be used to approximate the spectrum of \(\rho \), which will then allow us to analyze the convergence of the corresponding distribution. As a first intermediate step, we will show that \({\bar{\sigma }}\) can be approximated as a superposition of a state with minimal cut \(\gamma _{A,1}\) and a state with minimal cut \(\gamma _{A,2}\). This will allow us to more easily reason about how the background states are related to the spectrum of \({\bar{\sigma }}\), and in turn to those of \(\sigma \) and \(\rho _A\).

Let us write

$$\begin{aligned} \vert \phi _{\gamma _{A,1},\gamma _{A,2}}^{(j,a)}\rangle&= \sum _{k \text { s.t. } p_{1,k} \ge p_{2,j}2^{n^{\nicefrac {1}{4}}}} \sqrt{q_{1,k} q_{2,j}} \vert \Phi ^+_{1,k}\rangle \otimes \vert \Phi ^+_{2,j}\rangle \\ \vert \phi _{\gamma _{A,1},\gamma _{A,2}}^{(j,b)}\rangle&= \sum _{k \text { s.t. } p_{2,k} \ge p_{1,j}2^{n^{\nicefrac {1}{4}}}} \sqrt{q_{1,j} q_{2,k}} \vert \Phi ^+_{1,j}\rangle \otimes \vert \Phi ^+_{2,k}\rangle , \end{aligned}$$

allowing us to write the background state \({{\bar{\phi }}}_{\gamma _{A,1},\gamma _{A,2}}\) as a different superposition, as depicted in Fig. 5e:

$$\begin{aligned} \vert {{\bar{\phi }}}_{\gamma _{A,1},\gamma _{A,2}}\rangle&= \sum _{j} \vert \phi _{\gamma _{A,1},\gamma _{A,2}}^{(j,a)}\rangle + \sum _j \vert \phi _{\gamma _{A,1},\gamma _{A,2}}^{(j,b)}\rangle . \end{aligned}$$

Let \(\vert \phi ^{(j,a)}\rangle \) and \(\vert \phi ^{(j,b)}\rangle \) denote the background states on \(G'\) where we have replaced \(\vert {{\bar{\phi }}}_{\gamma _{A,1},\gamma _{A,2}}\rangle \) by \(\vert \phi _{\gamma _{A,1},\gamma _{A,2}}^{(j,a)}\rangle \) and \(\vert \phi _{\gamma _{A,1},\gamma _{A,2}}^{(j,b)}\rangle \), respectively, and let

$$\begin{aligned} \vert \phi ^{a}\rangle = \sum _j \vert \phi ^{(j,a)}\rangle , \qquad \vert \phi ^{b}\rangle = \sum _j \vert \phi ^{(j,b)}\rangle . \end{aligned}$$

Denote by \(\vert \sigma ^{(j,a)}\rangle \), \(\vert \sigma ^{(j,b)}\rangle \) the random tensor network states on \(G'\) with background states \(\vert \phi ^{(j,a)}\rangle \) and \(\vert \phi ^{(j,b)}\rangle \), respectively. Similarly, denote by \(\vert \sigma ^{(a)}\rangle \) and \(\vert \sigma ^{(b)}\rangle \) the random tensor network states on \(G'\) with background states \(\vert \phi ^{(a)}\rangle \) and \(\vert \phi ^{(b)}\rangle \), respectively.

We start with the following bound on the rank of \(\sigma ^{(j,a)}_B\):

$$\begin{aligned} \begin{aligned} {{\,\textrm{rank}\,}}(\sigma ^{(j,a)}_B)&\le {{\,\textrm{rank}\,}}(\phi ^{(j,a)}_B) \\&\le \sum _{p_{1,k} \ge p_{2,j}2^{n^{\nicefrac {1}{4}}}} d_{1,k} \\&= \sum _{p_{1,k} \ge p_{2,j}2^{n^{\nicefrac {1}{4}}}} \frac{q_{1,k}}{p_{1,k}}\\&\le \sum _{k} \frac{q_{1,k}2^{-n^{1/4}}}{p_{2,j}} \\&\le \frac{2^{-n^{1/4}}}{p_{2,j}}. \end{aligned} \end{aligned}$$
(4.30)

By the same reasoning we may bound the rank of \(\sigma ^{(j,b)}_{{\bar{B}}}\) as

$$\begin{aligned} {{\,\textrm{rank}\,}}(\sigma ^{(j,b)}_{{\bar{B}}}) \le \frac{2^{-n^{1/4}}}{p_{1,j}}. \end{aligned}$$
(4.31)

Now, we will argue that the state \(\sigma ^{(j,a)}\) has minimal cut \(\gamma _{A,1}\), and \(\sigma ^{(j,b)}\) has minimal cut \(\gamma _{A,2}\), as in Fig. 5f. Intuitively, it is sensible that for \(\sigma ^{(j,a)}\) the unique minimal cut is along \(\gamma _{A,1}\), as for \(\phi ^{(j,a)}\), we have a fixed maximally entangled state along \(\gamma _{A,2}\), and a superposition of maximally entangled states of lower dimension along \(\gamma _{A,1}\). Similarly, for \(\sigma ^{(j,b)}\) the minimal cut is along \(\gamma _{A,2}\). To confirm this intuition, we will now show that \(\sigma ^{(j,a)}_B \approx \phi ^{(j,a)}_B\) and \(\sigma ^{(j,b)}_{{\bar{B}}} \approx \phi ^{(j,b)}_{{\bar{B}}}\).

We show this for \(\sigma ^{(j,a)}_B\). In this case, the “minimal” cut is simply B. Let \(\Delta _B\) be a cut for B unequal to B or to \(V_b' \cup B\). We denote by \(\Delta _A\) the associated cut for A on the original graph G, given by \(\Delta _A = \Delta _B \cup \Gamma _{A,1}\). We let

$$\begin{aligned} Y^{\Delta _B} = \{(e,x) : e = (xy), x \in \Delta _B^c, y \in B \}. \end{aligned}$$

Note that as \(\phi \) is a product state we have \(H_{\min }^{\varepsilon }(\Delta _B {\setminus } B \vert B)_{\phi } = H_{\min }^{\varepsilon }(\Delta _B {\setminus } B \vert B Y^{\Delta _B})_{\phi }\). We can obtain \(\phi ^{(j,a)}\) from \(\phi \) by acting with subunital CP maps on B and \({\bar{B}}\) and therefore (by two applications of data processing)

$$\begin{aligned} H_{\min }^{\varepsilon }(\Delta _B {\setminus } B \vert B Y^{\Delta _B})_{\phi ^{(j,a)}}&\ge H_{\min }^{\varepsilon }(\Delta _B {\setminus } B \vert B Y^{\Delta _B})_{\phi } = H_{\min }^{\varepsilon }(\Delta _B {\setminus } B \vert B)_{\phi } \\&\ge H_{\min }^{\varepsilon }(\Delta _A {\setminus } \Gamma _{A,1} \vert \Gamma _{A,1})_{\phi }. \end{aligned}$$

We now consider \(\Delta _B = V_b' \cup B\) (in which case \(Y^{\Delta _B}\) is empty, since we assume \(\gamma _{A,1} \cap \gamma _{A,2} = \emptyset \)). Then,

$$\begin{aligned} H_{\min }^{\varepsilon }(\Delta _B {\setminus } B \vert B)_{\phi ^{(j,a)}} \ge H_{\min }(\Delta _B {\setminus } B \vert B)_{\phi ^{(j,a)}} = H_{\min }({{\bar{B}}})_{\phi ^{(j,a)}} - H_{\max }(B)_{\phi ^{(j,a)}} \end{aligned}$$

using the product structure of \(\phi ^{(j,a)}\). Now, we compute

$$\begin{aligned} H_{\min }({{\bar{B}}})_{\phi ^{(j,a)}} = - \log (\Vert \phi ^{(j,a)}_{{\bar{B}}}\Vert _{\infty }) \ge -\log (p_{2,j}), \end{aligned}$$

where we recall that \(p_{2,j}\) is a binned eigenvalue. By Eq. (4.30), we can bound

$$\begin{aligned} H_{\max }(B)_{\phi ^{(j,a)}}&\le \log ({{\,\textrm{rank}\,}}(\phi ^{(j,a)}_B)) \le \log \left( \frac{2^{-n^{1/4}}}{p_{2,j}}\right) = -\log (p_{2,j}) - n^{\frac{1}{4}}. \end{aligned}$$

In conclusion,

$$\begin{aligned} H_{\min }^{\varepsilon }(\Delta _B {\setminus } B \vert B)_{\phi ^{(j,a)}} \ge n^{\frac{1}{4}}. \end{aligned}$$
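For states without conditioning, the entropic quantities used in this computation reduce to simple spectral data: \(H_{\min }\) is minus the log of the largest eigenvalue, and \(H_{\max }\) is bounded above by the log of the rank. A hedged numerical illustration of ours:

```python
import numpy as np

def h_min(rho):
    """H_min(rho) = -log2 of the largest eigenvalue (= -log2 ||rho||_inf)."""
    return -np.log2(np.linalg.eigvalsh(rho)[-1])

def h_max_rank_bound(rho, tol=1e-12):
    """log2(rank rho), which upper-bounds H_max(rho) as used above."""
    return np.log2((np.linalg.eigvalsh(rho) > tol).sum())

rho = np.diag([0.5, 0.25, 0.25, 0.0])   # toy state: H_min = 1, rank bound = log2(3)
```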

By Lemma 4.11, we can find \(\phi ^{(\varepsilon ,j,a)}\) such that \(P(\phi ^{(j,a)}, \phi ^{(\varepsilon ,j,a)}) = {\mathcal {O}}(\sqrt{\varepsilon })\) and for all cuts \(\Delta _B \ne B\) we have

$$\begin{aligned} H_{\min }(\Delta _B {\setminus } B \vert B)_{\phi ^{(\varepsilon ,j,a)}} \ge H_{\min }^{\varepsilon }(\Delta _B {\setminus } B \vert B Y^{\Delta _B})_{\phi ^{(j,a)}} \end{aligned}$$

and hence for \(\Delta _B \ne V_b' \cup B\)

$$\begin{aligned} H_{\min }(\Delta _B {\setminus } B \vert B)_{\phi ^{(\varepsilon ,j,a)}} \ge H_{\min }^{\varepsilon }(\Delta _A {\setminus } \Gamma _{A,1} \vert \Gamma _{A,1})_{\phi } \end{aligned}$$

and for \(\Delta _B = V_b' \cup B\)

$$\begin{aligned} H_{\min }(\Delta _B {\setminus } B \vert B)_{\phi ^{(\varepsilon ,j,a)}} \ge n^{\frac{1}{4}}. \end{aligned}$$

Therefore, Proposition 4.1 allows us to conclude that if we let \(\sigma ^{(\varepsilon ,j,a)}\) denote the random tensor network state with background state \(\phi ^{(\varepsilon ,j,a)}\), then

$$\begin{aligned} \mathbbm {E}\Vert \sigma ^{(\varepsilon ,j,a)}_B - \phi ^{(\varepsilon ,j,a)}_B\Vert _1 = {\mathcal {O}}\left( 2^{-\frac{1}{2} K(n)} + 2^{-\frac{1}{2} n^{\frac{1}{4}}}\right) , \end{aligned}$$

and by Lemma 4.3 and the fact that \(T(\phi ^{(j,a)}, \phi ^{(\varepsilon ,j,a)}) \le P(\phi ^{(j,a)}, \phi ^{(\varepsilon ,j,a)})\) this implies

$$\begin{aligned} \begin{aligned} \mathbbm {E}\Vert \sigma ^{(j,a)}_B - \phi ^{(j,a)}_B\Vert _1&\le \mathbbm {E}\Vert \sigma ^{(\varepsilon ,j,a)}_B - \phi ^{(\varepsilon ,j,a)}_B\Vert _1 \\&\quad + \mathbbm {E}\Vert \sigma ^{(\varepsilon ,j,a)}_B - \sigma ^{(j,a)}_B\Vert _1 +\Vert \phi ^{(\varepsilon ,j,a)}_B - \phi ^{(j,a)}_B\Vert _1 \\&= {\mathcal {O}}\left( 2^{-\frac{1}{2} K(n)} + 2^{-\frac{1}{2} n^{\frac{1}{4}}} + \sqrt{\varepsilon (n)}\right) . \end{aligned} \end{aligned}$$

It follows that

$$\begin{aligned} \mathbbm {E}\Vert \sigma ^{(a)}_B - \phi ^{(a)}_B\Vert _1&\le \sum _j \mathbbm {E}\Vert \sigma ^{(j,a)}_B - \phi ^{(j,a)}_B\Vert _1 \nonumber \\&= {\mathcal {O}}\left( n^{\alpha + \frac{1}{2}}\log (n)(2^{-\frac{1}{2} K(n)} + 2^{-\frac{1}{2} n^{\frac{1}{4}}} + \sqrt{\varepsilon (n)})\right) \rightarrow 0 \end{aligned}$$
(4.32)

using that the number of terms is \({\mathcal {O}}(n^{\alpha + \frac{1}{2}}\log (n))\). This expression goes to zero since we assume \(K(n) = \Omega (\log (n)^2)\) and \(\varepsilon (n) = {\mathcal {O}}(n^{-\gamma })\). A completely symmetric argument shows that

$$\begin{aligned} \mathbbm {E}\Vert \sigma ^{(j,b)}_{{{\bar{B}}}} - \phi ^{(j,b)}_{{{\bar{B}}}}\Vert _1 = {\mathcal {O}}\left( 2^{-\frac{1}{2} K(n)} + 2^{-\frac{1}{2} n^{\frac{1}{4}}} + \sqrt{\varepsilon (n)}\right) \end{aligned}$$
(4.33)

and hence

$$\begin{aligned} \begin{aligned} \mathbbm {E}\Vert \sigma ^{(b)}_{{\bar{B}}} - \phi ^{(b)}_{{\bar{B}}}\Vert _1 = {\mathcal {O}}(n^{\alpha + \frac{1}{2}}\log (n)(2^{-\frac{1}{2} K(n)} + 2^{-\frac{1}{2} n^{\frac{1}{4}}} + \sqrt{\varepsilon (n)})) \rightarrow 0. \end{aligned} \end{aligned}$$
(4.34)

We have hence shown that \(\sigma ^{(j,a)}_B \approx \phi _B^{(j,a)}\) and \(\sigma ^{(j,b)}_{{{\bar{B}}}} \approx \phi _{{{\bar{B}}}}^{(j,b)}\) in expectation.

We now show that we can approximate the state \({\bar{\sigma }}_B\) on \(G'\) by the sum of these states. We first claim that we can sum over the j index to obtain \(\sigma ^{(a)}_B\) and \(\sigma ^{(b)}_{{\bar{B}}}\). To see this, we introduce some notation: Given a self-adjoint matrix X, let us write \({{\,\textrm{supp}\,}}(X)\) for the image, or support, of X. In other words, \({{\,\textrm{supp}\,}}(X)\) is the space spanned by the eigenvectors of X with nonzero eigenvalue. We let \({\mathcal {H}}_{B,j}\) be the support of the reduced state of \(\vert \Phi ^+_{1,j}\rangle \) on the B system, and we similarly let \({\mathcal {H}}_{{\bar{B}},j}\) be the support of the reduced state of \(\vert \Phi ^+_{2,j}\rangle \) on the \({\bar{B}}\) system. Then for \(j \ne k\), the subspace \({\mathcal {H}}_{B,j}\) is orthogonal to \({\mathcal {H}}_{B,k}\) and similarly \({\mathcal {H}}_{{\bar{B}},j}\) is orthogonal to \({\mathcal {H}}_{{\bar{B}},k}\). By construction, it is clear that \({{\,\textrm{supp}\,}}(\sigma ^{(j,a)}_{{\bar{B}}}) \subseteq {\mathcal {H}}_{{\bar{B}},j}\) and \({{\,\textrm{supp}\,}}(\sigma ^{(j,b)}_{B}) \subseteq {\mathcal {H}}_{B,j}\). This orthogonality for indices \(j\ne k\) then makes it clear that we can sum the reduced states, as in Fig. 5g:

$$\begin{aligned} \sigma ^{(a)}_B = \sum _{j} \sigma ^{(j,a)}_B, \qquad \sigma ^{(b)}_{{\bar{B}}} = \sum _{j} \sigma ^{(j,b)}_{{\bar{B}}}, \end{aligned}$$

and by the purity of \(\sigma ^{(b)}\), we have that \(\sigma _B^{(b)}\) has the same spectrum as \(\sigma _{{\bar{B}}}^{(b)}\). We remind the reader that the indices (a) and (b) indicate that the minimal cut for the state is given by \(\gamma _{A,1}\) and \(\gamma _{A,2}\), respectively. Naturally, this also holds for the summed states, by orthogonality of the summands.
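As an illustrative numerical check (not part of the proof), the sketch below verifies that for positive semidefinite operators with mutually orthogonal supports, the nonzero spectrum of the sum is the union of the individual nonzero spectra. The matrices are generic stand-ins, not the states of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def psd_on(coords, dim):
    """A positive semidefinite matrix supported on the given coordinates."""
    m = rng.normal(size=(len(coords), len(coords)))
    block = m @ m.T  # PSD, generically full rank on the block
    out = np.zeros((dim, dim))
    out[np.ix_(coords, coords)] = block
    return out

dim = 5
s1 = psd_on([0, 1], dim)      # supported on a 2-dimensional subspace
s2 = psd_on([2, 3, 4], dim)   # supported on the orthogonal complement

# Nonzero spectrum of the sum vs. union of the nonzero spectra.
spec_sum = np.sort(np.linalg.eigvalsh(s1 + s2))
nonzero_1 = np.sort(np.linalg.eigvalsh(s1))[-2:]   # rank(s1) = 2
nonzero_2 = np.sort(np.linalg.eigvalsh(s2))[-3:]   # rank(s2) = 3
spec_union = np.sort(np.concatenate([nonzero_1, nonzero_2]))
```

Orthogonality of supports makes the sum block diagonal, so the spectra simply concatenate.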

We now claim that \({{\bar{\sigma }}}_B \approx \sigma ^{(a)}_B + \sigma ^{(b)}_B\), i.e., the spectrum of the approximate state on \(G'\), as in Eq. (4.29), is well approximated by the sum of two states with differing minimal cuts. We estimate their difference by

$$\begin{aligned} \mathbbm {E}\Vert \sigma ^{(a)}_B + \sigma ^{(b)}_B - {{\bar{\sigma }}}_B\Vert _1&= \mathbbm {E}\Vert \sum _{j,k} {{\,\textrm{tr}\,}}_{{\bar{B}}}[\vert \sigma ^{(j,a)}\rangle \langle \sigma ^{(k,b)}\vert + \vert \sigma ^{(k,b)}\rangle \langle \sigma ^{(j,a)}\vert ]\Vert _1 \\&\le 2 \sum _{j,k}\mathbbm {E}\Vert {{\,\textrm{tr}\,}}_{{\bar{B}}}[\vert \sigma ^{(j,a)}\rangle \langle \sigma ^{(k,b)}\vert ]\Vert _1\\&= 2\sum _{j,k} \mathbbm {E}F(\sigma ^{(j,a)}_{{\bar{B}}}, \sigma ^{(k,b)}_{{\bar{B}}}) \end{aligned}$$

by Lemma 4.9. Now, if \(\rho \), \(\sigma _1\) and \(\sigma _2\) are positive operators, then by Lemma B.9 in [34], we have

$$\begin{aligned} \left|F(\rho ,\sigma _1) - F(\rho ,\sigma _2)\right| \le \sqrt{\Vert \sigma _1 - \sigma _2\Vert _1 {{\,\textrm{tr}\,}}[\rho ]}. \end{aligned}$$
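As a quick numerical sanity check of this continuity bound (an illustrative sketch with random positive operators, using \(F(\rho ,\sigma ) = \Vert \sqrt{\rho }\sqrt{\sigma }\Vert _1\); all helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def psd_sqrt(x):
    w, v = np.linalg.eigh(x)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def fidelity(a, b):
    # F(a, b) = || sqrt(a) sqrt(b) ||_1 for positive operators
    return np.linalg.svd(psd_sqrt(a) @ psd_sqrt(b), compute_uv=False).sum()

def trace_norm(x):
    return np.linalg.svd(x, compute_uv=False).sum()

def rand_psd(d):
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return m @ m.conj().T

d = 4
rho, sig1, sig2 = rand_psd(d), rand_psd(d), rand_psd(d)
lhs = abs(fidelity(rho, sig1) - fidelity(rho, sig2))
rhs = np.sqrt(trace_norm(sig1 - sig2) * np.trace(rho).real)
```

The inequality follows from Hölder together with the Powers–Størmer inequality, and holds for unnormalized positive operators by homogeneity.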

So, we find that

$$\begin{aligned} \mathbbm {E}F(\sigma ^{(j,a)}_{{\bar{B}}}, \sigma ^{(k,b)}_{{\bar{B}}})&\le \mathbbm {E}\left|F(\sigma ^{(j,a)}_{{\bar{B}}}, \sigma ^{(k,b)}_{{\bar{B}}}) - F(\sigma ^{(j,a)}_{{\bar{B}}}, \phi ^{(k,b)}_{{\bar{B}}})\right| + \mathbbm {E}F(\sigma ^{(j,a)}_{{\bar{B}}}, \phi ^{(k,b)}_{{\bar{B}}})\\&\le \mathbbm {E}\sqrt{{{\,\textrm{tr}\,}}[\sigma ^{(j,a)}_{{\bar{B}}}] \Vert \sigma ^{(k,b)}_{{\bar{B}}} - \phi ^{(k,b)}_{{\bar{B}}}\Vert _1} + \mathbbm {E}F(\sigma ^{(j,a)}_{{\bar{B}}}, \phi ^{(k,b)}_{{\bar{B}}})\\&\le \sqrt{\mathbbm {E}{{\,\textrm{tr}\,}}[\sigma ^{(j,a)}_{{\bar{B}}}]} \sqrt{\mathbbm {E}\Vert \sigma ^{(k,b)}_{{\bar{B}}} - \phi ^{(k,b)}_{{\bar{B}}}\Vert _1} + \mathbbm {E}F(\sigma ^{(j,a)}_{{\bar{B}}}, \phi ^{(k,b)}_{{\bar{B}}}), \end{aligned}$$

where we used Cauchy–Schwarz in the last step. For the first term, we may use Eq. (4.33) and \(\mathbbm {E}{{\,\textrm{tr}\,}}[\sigma ^{(j,a)}_{{\bar{B}}}] \le 1\) to see

$$\begin{aligned} \sqrt{\mathbbm {E}{{\,\textrm{tr}\,}}[\sigma ^{(j,a)}_{{\bar{B}}}]} \sqrt{\mathbbm {E}\Vert \sigma ^{(k,b)}_{{\bar{B}}} - \phi ^{(k,b)}_{{\bar{B}}}\Vert _1} = {\mathcal {O}}(2^{-\frac{1}{4} K(n)} + 2^{-\frac{1}{4} n^{\frac{1}{4}}} + (\varepsilon (n))^{\frac{1}{4}}). \end{aligned}$$

For the second term we use a basic estimate on the fidelity: If \(\rho \) and \(\sigma \) are positive operators, then

$$\begin{aligned} F(\rho ,\sigma )&= \Vert \sqrt{\rho }\sqrt{\sigma }\Vert _1 \le \Vert \sqrt{\rho }\Vert _1 \Vert \sqrt{\sigma }\Vert _{\infty } \\&\le \sqrt{{{\,\textrm{rank}\,}}(\rho )} \Vert \sqrt{\rho }\Vert _2 \Vert \sqrt{\sigma }\Vert _{\infty } = \sqrt{{{\,\textrm{rank}\,}}(\rho ){{\,\textrm{tr}\,}}[\rho ] \Vert \sigma \Vert _{\infty }}, \end{aligned}$$

which follows by Hölder’s inequality and the standard relation between Schatten 1 and 2-norms. Then, the second term can be estimated as follows: Write \(P_{{\bar{B}},j}\) for the projection onto \({\mathcal {H}}_{{\bar{B}}, j}\), then

$$\begin{aligned} F(\sigma ^{(j,a)}_{{\bar{B}}}, \phi ^{(k,b)}_{{\bar{B}}})&= F(\sigma ^{(j,a)}_{{\bar{B}}}, P_{{\bar{B}},j}\phi ^{(k,b)}_{{\bar{B}}}P_{{\bar{B}},j}) \\&\le \sqrt{{{\,\textrm{tr}\,}}[\sigma ^{(j,a)}_{{\bar{B}}}] {{\,\textrm{rank}\,}}(\sigma ^{(j,a)}_{{\bar{B}}}) \Vert P_{{\bar{B}},j}\phi ^{(k,b)}_{{\bar{B}}}P_{{\bar{B}},j}\Vert _{\infty }} \\&\le \sqrt{{{\,\textrm{tr}\,}}[\sigma ^{(j,a)}_{{\bar{B}}}] \frac{2^{-n^{1/4}}}{p_{2,j}} p_{2,j}} \\ {}&\le \sqrt{{{\,\textrm{tr}\,}}[\sigma ^{(j,a)}_{{\bar{B}}}] 2^{-n^{1/4}}} \end{aligned}$$

using Eq. (4.30) and \(\Vert P_{{\bar{B}},j}\phi ^{(k,b)}_{{\bar{B}}}P_{{\bar{B}},j}\Vert _{\infty } \le p_{2,j}\). Thus

$$\begin{aligned} \mathbbm {E}F(\sigma ^{(j,a)}_{{\bar{B}}}, \phi ^{(k,b)}_{{\bar{B}}}) \le \sqrt{ \mathbbm {E}{{\,\textrm{tr}\,}}[\sigma ^{(j,a)}_{{\bar{B}}}] 2^{-n^{1/4}}} \le 2^{-\frac{1}{2} n^{1/4}} \end{aligned}$$

and we may estimate

$$\begin{aligned} \mathbbm {E}F(\sigma ^{(j,a)}_{{\bar{B}}}, \sigma ^{(k,b)}_{{\bar{B}}}) = {\mathcal {O}}(2^{-\frac{1}{4} K(n)} + 2^{-\frac{1}{4} n^{\frac{1}{4}}} + (\varepsilon (n))^{\frac{1}{4}}), \end{aligned}$$

and therefore

$$\begin{aligned} \begin{aligned} \mathbbm {E}\Vert \sigma ^{(a)}_B + \sigma ^{(b)}_B - {{\bar{\sigma }}}_B\Vert _1&\le 2\sum _{j,k} \mathbbm {E}F(\sigma ^{(j,a)}_{{\bar{B}}}, \sigma ^{(k,b)}_{{\bar{B}}})\\&= {\mathcal {O}}(n^{2\alpha + 1} \log (n)^2(2^{-\frac{1}{4} K(n)} + 2^{-\frac{1}{4} n^{\frac{1}{4}}} + (\varepsilon (n))^{\frac{1}{4}})) \rightarrow 0 \end{aligned} \end{aligned}$$
(4.35)

using that the number of terms is \({\mathcal {O}}(n^{2\alpha + 1}\log (n)^2)\), \(K(n) = \Omega (\log (n)^2)\) and \(\varepsilon (n) = {\mathcal {O}}(n^{-\gamma })\), and our choice of \(\alpha \) is such that \({\mathcal {O}}(n^{2\alpha + 1} \log (n)^2 n^{-\frac{1}{4} \gamma })\) goes to zero.
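The elementary fidelity bound \(F(\rho ,\sigma ) \le \sqrt{{{\,\textrm{rank}\,}}(\rho ){{\,\textrm{tr}\,}}[\rho ]\Vert \sigma \Vert _{\infty }}\) used above is easy to test numerically; the sketch below (random low-rank \(\rho \), illustrative operators only) does so:

```python
import numpy as np

rng = np.random.default_rng(2)

def psd_sqrt(x):
    w, v = np.linalg.eigh(x)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def fidelity(a, b):
    # F(a, b) = || sqrt(a) sqrt(b) ||_1 for positive operators
    return np.linalg.svd(psd_sqrt(a) @ psd_sqrt(b), compute_uv=False).sum()

d, r = 6, 2
m = rng.normal(size=(d, r))
rho = m @ m.T                      # PSD with rank(rho) = 2
a = rng.normal(size=(d, d))
sigma = a @ a.T                    # generic PSD

lhs = fidelity(rho, sigma)
rank = np.linalg.matrix_rank(rho)
rhs = np.sqrt(rank * np.trace(rho) * np.linalg.eigvalsh(sigma)[-1])
```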

To summarize, we have shown that we can approximate the random tensor network state \({{\bar{\sigma }}}_B\) on \(G'\) by \(\sigma ^{(a)}_B + \sigma ^{(b)}_B\), and we can approximate \(\sigma ^{(a)}_B\) by \(\phi ^{(a)}_B\). Moreover, we can approximate \(\sigma ^{(b)}_{{\bar{B}}}\) by \(\phi ^{(b)}_{{\bar{B}}}\), so the spectrum of \(\sigma ^{(b)}_B\) can be approximated by the spectrum of \(\phi ^{(b)}_{{\bar{B}}}\). Recall that \(\sigma ^{(a)}\) and \(\sigma ^{(b)}\) are the random tensor network states on \(G'\) which take \(\phi ^{(a)}\) and \(\phi ^{(b)}\) as background states, respectively. Furthermore, recall that our larger goal is to show that the spectra of the reduced background states \(\phi _B^{(a)}\) and \(\phi _B^{(b)}\) are, in some sense, close to the spectrum of the approximate random tensor network state \(\sigma _B\) on the subgraph \(G'\). This is a two-step process:

  1.

    Show that \(\phi ^{(a)}_B\) and \(\sigma _B^{(a)}\) can be slightly deformed to \(\phi ^{(a,\perp )}_B\) and \(\sigma _B^{(a,\perp )}\), states with support orthogonal to \({{\,\textrm{supp}\,}}(\sigma _B^{(j,b)})\) for all j.

  2.

    Use the slightly deformed states to show that \({{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B\oplus \phi ^{(b)}_{{\bar{B}}})\) is close to \({{\,\textrm{spec}\,}}_+(\sigma ^{(a)}_B+\sigma ^{(b)}_B)\), and follow the chain of approximations \(\sigma ^{(a)}_B+\sigma ^{(b)}_B \approx {{\bar{\sigma }}}_B \approx \sigma _B\) along with \({{\,\textrm{spec}\,}}_+(\sigma _B) \approx {{\,\textrm{spec}\,}}_+(\rho _A)\) to conclude that \({{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B\oplus \phi ^{(b)}_{{\bar{B}}}) \approx {{\,\textrm{spec}\,}}_+(\rho _A)\).

Now, consider the state \(\sigma ^{(j,b)}_B\). It has support contained in \({\mathcal {H}}_{B,j}\) and by Eq. (4.31), it has \({{\,\textrm{rank}\,}}(\sigma ^{(j,b)}_B) \le 2^{-n^{1/4}} p_{1,j}^{-1}\). Let \(Q_j\) be the projection onto \({{\,\textrm{supp}\,}}(\sigma ^{(j,b)}_B)\), so \({{\,\textrm{supp}\,}}(Q_j) \subseteq {\mathcal {H}}_{B,j}\) and \({{\,\textrm{rank}\,}}(Q_j) \le 2^{-n^{1/4}} p_{1,j}^{-1}\). We find that

$$\begin{aligned} \Vert Q_j \phi ^{(a)}_B\Vert _1 \le {{\,\textrm{rank}\,}}(Q_j) \Vert P_{B,j} \phi ^{(a)}_B P_{B,j}\Vert _{\infty } \le \frac{2^{-n^{1/4}}}{p_{1,j}} p_{1,j} = 2^{-n^{1/4}}. \end{aligned}$$

If we let \(Q = \sum _j Q_j\), then we see that \((I - Q)\phi ^{(a)}_B(I - Q)\) is a small deformation of \(\phi _B^{(a)}\):

$$\begin{aligned} \Vert \phi ^{(a)}_B - (I - Q)\phi ^{(a)}_B(I - Q)\Vert _1&\le 2\Vert Q \phi ^{(a)}_B\Vert _1 \le 2\sum _j \Vert Q_j \phi ^{(a)}_B\Vert _1\nonumber \\&= {\mathcal {O}}(n^{\nicefrac 92}\log (n)2^{-n^{1/4}}) \rightarrow 0. \end{aligned}$$
(4.36)
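The deformation estimate relies only on the general operator inequality \(\Vert \phi - (I-Q)\phi (I-Q)\Vert _1 \le 2\Vert Q\phi \Vert _1\) for an orthogonal projection \(Q\). A minimal numerical check (illustrative operators, not the states of the proof):

```python
import numpy as np

rng = np.random.default_rng(3)

def trace_norm(x):
    return np.linalg.svd(x, compute_uv=False).sum()

d = 6
a = rng.normal(size=(d, d))
phi = a @ a.T                       # PSD stand-in for the reduced state

# Orthogonal projection Q onto the span of two random vectors.
q_basis, _ = np.linalg.qr(rng.normal(size=(d, 2)))
Q = q_basis @ q_basis.T
I = np.eye(d)

lhs = trace_norm(phi - (I - Q) @ phi @ (I - Q))
rhs = 2 * trace_norm(Q @ phi)
```

The bound follows from \(\phi - (I-Q)\phi (I-Q) = Q\phi + (I-Q)\phi Q\) and the triangle inequality.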

With this property in hand, we can show that \(\sigma ^{(a)}_B\) can similarly be deformed. Let \(\sigma ^{(a,\perp )}_B = (I - Q)\sigma ^{(a)}_B(I - Q)\). By construction, \(\sigma ^{(a,\perp )}_B\) has support orthogonal to that of \(\sigma ^{(b)}_B\), and hence, \({{\,\textrm{spec}\,}}_+(\sigma ^{(a,\perp )}_B + \sigma ^{(b)}_B) = {{\,\textrm{spec}\,}}_+(\sigma ^{(a,\perp )}_B \oplus \sigma ^{(b)}_{{\bar{B}}})\). We observe that \(\sigma ^{(a,\perp )}_B\) is a small deformation of \(\sigma ^{(a)}_B\):

$$\begin{aligned} \begin{aligned} \mathbbm {E}\Vert \sigma _{B}^{(a)} - \sigma ^{(a,\perp )}_B\Vert _1&\le \mathbbm {E}\Vert \sigma _{B}^{(a)} - \phi _{B}^{(a)}\Vert _1\\&\quad + \mathbbm {E}\Vert (I- Q)\sigma _{B}^{(a)}(I- Q) - (I- Q)\phi _{B}^{(a)} (I - Q)\Vert _1\\&\quad + \Vert \phi ^{(a)}_B - (I - Q)\phi ^{(a)}_B(I - Q)\Vert _1\\&\le 2\mathbbm {E}\Vert \sigma _{B}^{(a)} - \phi _{B}^{(a)}\Vert _1 + \Vert \phi ^{(a)}_B - (I - Q)\phi ^{(a)}_B(I - Q)\Vert _1 \end{aligned} \end{aligned}$$
(4.37)

which goes to zero, using Eq. (4.32) for the first term and Eq. (4.36) for the second term. By construction, \((I - Q)\phi ^{(a)}_B(I - Q)\) has support orthogonal to \({{\,\textrm{supp}\,}}(\sigma ^{(b)}_{B})\). This allows us to bound

$$\begin{aligned}&\mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B \oplus \phi ^{(b)}_{{\bar{B}}}) - {{\,\textrm{spec}\,}}_+(\sigma ^{(a)}_B + \sigma ^{(b)}_B)\Vert _1\\&\quad \le \mathbbm {E}\Vert \sigma _{B}^{(a)} - \sigma ^{(a,\perp )}_B\Vert _1 + \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B \oplus \phi ^{(b)}_{{\bar{B}}}) - {{\,\textrm{spec}\,}}_+(\sigma ^{(a,\perp )}_B \oplus \sigma ^{(b)}_B)\Vert _1 \\&\quad = \mathbbm {E}\Vert \sigma _{B}^{(a)} - \sigma ^{(a,\perp )}_B\Vert _1 + \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B)\\ {}&\qquad - {{\,\textrm{spec}\,}}_+(\sigma ^{(a,\perp )}_B) + {{\,\textrm{spec}\,}}_+(\phi ^{(b)}_{{\bar{B}}}) - {{\,\textrm{spec}\,}}_+(\sigma ^{(b)}_B)\Vert _1 \\&\quad \le \mathbbm {E}\Vert \sigma _{B}^{(a)} - \sigma ^{(a,\perp )}_B\Vert _1 + \mathbbm {E}\Vert \phi ^{(a)}_B - \sigma ^{(a,\perp )}_B\Vert _1 + \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\phi ^{(b)}_{{\bar{B}}}) - {{\,\textrm{spec}\,}}_+(\sigma ^{(b)}_B)\Vert _1, \end{aligned}$$

where we have used the bound on the difference of spectra in terms of the trace distance of the corresponding states, Eq. (1.5). Each term on the RHS can be shown to converge to zero. The first term goes to zero by Eq. (4.37). Similarly, the second term can be bounded by

$$\begin{aligned} \mathbbm {E}\Vert \phi ^{(a)}_B - \sigma ^{(a,\perp )}_B\Vert _1 \le \mathbbm {E}\Vert \phi ^{(a)}_B - \sigma ^{(a)}_B\Vert _1 + \mathbbm {E}\Vert \sigma ^{(a)}_B - \sigma ^{(a,\perp )}_B\Vert _1 \end{aligned}$$

which goes to zero by Eq. (4.37) and Eq. (4.32). The third term can be estimated by observing that \({{\,\textrm{spec}\,}}_+(\sigma ^{(b)}_B) = {{\,\textrm{spec}\,}}_+(\sigma ^{(b)}_{{\bar{B}}})\) and

$$\begin{aligned} \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\phi ^{(b)}_{{\bar{B}}}) - {{\,\textrm{spec}\,}}_+(\sigma ^{(b)}_{B})\Vert _1 \le \mathbbm {E}\Vert \phi ^{(b)}_{{\bar{B}}} - \sigma ^{(b)}_{{\bar{B}}}\Vert _1 \end{aligned}$$

which goes to zero by Eq. (4.34). We conclude that \(\mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B \oplus \phi ^{(b)}_{{\bar{B}}}) - {{\,\textrm{spec}\,}}_+(\sigma ^{(a)}_B + \sigma ^{(b)}_B)\Vert _1 \rightarrow 0\).

We now follow a chain of approximations to get the desired closeness between \({{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B \oplus \phi ^{(b)}_{{\bar{B}}})\) and \({{\,\textrm{spec}\,}}_+(\rho _A)\). Using Eqs. (4.29) and (4.35), we can see that

$$\begin{aligned} \mathbbm {E}\Vert \sigma _B - (\sigma ^{(a)}_B + \sigma ^{(b)}_B)\Vert _1 \le \mathbbm {E}\Vert \sigma _B - {{\bar{\sigma }}}_B\Vert _1 + \mathbbm {E}\Vert \sigma ^{(a)}_B + \sigma ^{(b)}_B - {{\bar{\sigma }}}_B\Vert _1 \rightarrow 0, \end{aligned}$$

This then implies

$$\begin{aligned}&\mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\sigma _B) - {{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B \oplus \phi ^{(b)}_{{\bar{B}}})\Vert _1 \\&\quad \le \mathbbm {E}\Vert \sigma _B - (\sigma ^{(a)}_B + \sigma ^{(b)}_B)\Vert _1 \\&\qquad + \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B \oplus \phi ^{(b)}_{{\bar{B}}}) - {{\,\textrm{spec}\,}}_+(\sigma ^{(a)}_B + \sigma ^{(b)}_B)\Vert _1, \end{aligned}$$

and as we have just shown, both terms on the RHS converge to zero. Together with Eq. (4.23), this yields the desired relationship to the spectrum of \(\rho \):

$$\begin{aligned} \begin{aligned}&\mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B \oplus \phi ^{(b)}_{{\bar{B}}}) - {{\,\textrm{spec}\,}}_+(\rho _A)\Vert _1 \\&\quad \le \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\phi ^{(a)}_B \oplus \phi ^{(b)}_{{\bar{B}}}) - {{\,\textrm{spec}\,}}_+(\sigma _{B})\Vert _1\\&\qquad + \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\rho _A) - {{\,\textrm{spec}\,}}_+(\sigma _{B})\Vert _1 \rightarrow 0. \end{aligned} \end{aligned}$$
(4.38)

We pause here again to note that we have accomplished Step 3: approximating the spectrum of \(\rho _A\) by the spectra of the (approximate) background states. The final step will then be to consider the convergence properties of the distributions on the background states, which will then translate to convergence for the distribution on \(\rho _A\).

Explicitly, we want to relate the above result to \(\textrm{min}_*(\nu _1,\nu _2)\). First, we observe that by Lemma 4.8 and Eq. (4.26), \(\nu _{{{\tilde{\phi }}}_B} \Rightarrow \nu _1\) and \(\nu _{{{\tilde{\phi }}}_{{\bar{B}}}} \Rightarrow \nu _2\), and since \(\min \) is a continuous function, convergence holds for the pushforward measure:

$$\begin{aligned} \textrm{min}_*(\nu _{{{\tilde{\phi }}}_{B}},\nu _{{{\tilde{\phi }}}_{{\bar{B}}}}) \Rightarrow \textrm{min}_*(\nu _1,\nu _2). \end{aligned}$$
(4.39)

We compute

$$\begin{aligned} \begin{aligned} \textrm{min}_*(\nu _{{{\tilde{\phi }}}_{B}},\nu _{{{\tilde{\phi }}}_{{\bar{B}}}})&= \sum _{j,k} q_{1,k}q_{2,j}\delta _{\frac{1}{\sqrt{n}}[\min (\log (\frac{1}{p_{1,k}}),\log (\frac{1}{p_{2,j}})) - H(n)]}\\&= \sum _{j}\left( \sum _{p_{1,k} \ge 2^{n^{1/4}}p_{2,j}} q_{1,k}q_{2,j}\delta _{\frac{1}{\sqrt{n}}[\log (\frac{1}{p_{1,k}}) - H(n)]}\right. \\&\quad \left. + \sum _{p_{1,k} \le 2^{-n^{\nicefrac {1}{4}}}p_{2,j}} q_{1,k}q_{2,j}\delta _{\frac{1}{\sqrt{n}}[\log (\frac{1}{p_{2,j}}) - H(n)]} \right) \\&\quad + \sum _{2^{-n^{\nicefrac {1}{4}}}p_{2,j} \le p_{1,k} \le 2^{n^{\nicefrac {1}{4}}}p_{2,j}} q_{1,k}q_{2,j}\delta _{\frac{1}{\sqrt{n}}[\min (\log (\frac{1}{p_{1,k}}),\log (\frac{1}{p_{2,j}})) - H(n)]}\\&= \sum _{k} q_{1,k}q_{2,>k}\delta _{\frac{1}{\sqrt{n}}[\log (\frac{1}{p_{1,k}}) - H(n)]}\\&\quad + \sum _{j} q_{1,> j}q_{2,j}\delta _{\frac{1}{\sqrt{n}}[\log (\frac{1}{p_{2,j}}) - H(n)]} + \nu _0 \end{aligned} \end{aligned}$$
(4.40)

where we have written

$$\begin{aligned} q_{1,>j}&= \sum _{p_{1,k} \le 2^{-n^{\nicefrac {1}{4}}}p_{2,j}} q_{1,k}\\ q_{2,>k}&= \sum _{p_{2,j} \le 2^{-n^{\nicefrac {1}{4}}}p_{1,k}} q_{2,j} \end{aligned}$$

and

$$\begin{aligned} \nu _0 = \sum _{2^{-n^{\nicefrac {1}{4}}}p_{2,j} \le p_{1,k} \le 2^{n^{\nicefrac {1}{4}}}p_{2,j}} q_{1,k}q_{2,j}\delta _{\frac{1}{\sqrt{n}}[\min (\log (\frac{1}{p_{1,k}}),\log (\frac{1}{p_{2,j}})) - H(n)]}. \end{aligned}$$

Then by Eq. (4.28), we see that \(\nu _0 \Rightarrow 0\). Using Eq. (4.39), we conclude

$$\begin{aligned} \textrm{min}_*(\nu _{{{\tilde{\phi }}}_{B}},\nu _{{{\tilde{\phi }}}_{{\bar{B}}}}) - \nu _0 \Rightarrow \textrm{min}_*(\nu _1,\nu _2). \end{aligned}$$
(4.41)

On the other hand, by construction, we have

$$\begin{aligned} \nu _{{{\,\textrm{spec}\,}}(\phi ^{(a)}_B \oplus \phi ^{(b)}_{{\bar{B}}})} = \nu _{\phi ^{(a)}_B} + \nu _{\phi ^{(b)}_{{\bar{B}}}}. \end{aligned}$$

We can explicitly write down

$$\begin{aligned} \nu _{\phi ^{(a)}_B} + \nu _{\phi ^{(b)}_{{\bar{B}}}}&= \sum _k q_{1,k}q_{2,>k}\delta _{\frac{1}{\sqrt{n}}[\log (\frac{1}{p_{1,k}q_{2,>k}}) - H(n)]} \\&\quad + \sum _{j} q_{1,> j}q_{2,j}\delta _{\frac{1}{\sqrt{n}}[\log (\frac{1}{p_{2,j}q_{1,> j}}) - H(n)]}. \end{aligned}$$

Since the \(q_{1,k}\) and \(q_{2,j}\) are at least \(n^{-\beta }\) (if the corresponding term is nonzero), \(\left|\log (q_{1,>j})\right| \le \beta \log (n)\) and \(\left|\log (q_{2,>k})\right| \le \beta \log (n)\). With these bounds in mind, we can compare to the last line of Eq. (4.40). More precisely, if we let \(\nu '_{{{\bar{\phi }}}} = \textrm{min}_*(\nu _{{{\tilde{\phi }}}_{B}},\nu _{{{\tilde{\phi }}}_{{\bar{B}}}}) - \nu _0\), then for uniformly continuous \(f \in C_b(\mathbbm {R})\), we have

$$\begin{aligned} \lim _{n \rightarrow \infty } \left|\int f(x) \textrm{d}\nu '_{{{\bar{\phi }}}}(x) - \int f(x) \textrm{d}\nu _{\phi ^{(a)}_B}(x) - \int f(x) \textrm{d}\nu _{\phi ^{(b)}_{{\bar{B}}}}(x)\right| = 0. \end{aligned}$$

We conclude that

$$\begin{aligned} \nu _{\phi ^{(a)}_B} + \nu _{\phi ^{(b)}_{{\bar{B}}}} \Rightarrow \textrm{min}_*(\nu _1,\nu _2). \end{aligned}$$
(4.42)
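The bookkeeping in Eqs. (4.40)–(4.42) amounts to splitting the pushforward \(\textrm{min}_*\) into a part dominated by the first cut, a part dominated by the second cut, and a diagonal remainder \(\nu _0\). The sketch below (illustrative spectra; the threshold t plays the role of \(2^{n^{1/4}}\)) confirms that the three regions recombine to the full pushforward:

```python
import numpy as np
from collections import defaultdict

p = np.array([0.5, 0.3, 0.2])    # spectrum along the first cut (illustrative)
q = np.array([0.6, 0.25, 0.15])  # spectrum along the second cut (illustrative)
t = 4.0                          # separation threshold, standing in for 2^{n^{1/4}}

# Full pushforward min_*: atom at min(log 1/p_j, log 1/q_k) with weight p_j q_k.
full = defaultdict(float)
for pj in p:
    for qk in q:
        full[min(np.log(1 / pj), np.log(1 / qk))] += pj * qk

# Three-region decomposition, mirroring Eq. (4.40).
dec = defaultdict(float)
nu0_mass = 0.0
for pj in p:
    for qk in q:
        if pj >= t * qk:                 # p-dominated: min is log(1/p_j)
            dec[np.log(1 / pj)] += pj * qk
        elif pj <= qk / t:               # q-dominated: min is log(1/q_k)
            dec[np.log(1 / qk)] += pj * qk
        else:                            # diagonal remainder nu_0
            dec[min(np.log(1 / pj), np.log(1 / qk))] += pj * qk
            nu0_mass += pj * qk
```

In the proof, the diagonal mass \(\nu _0\) vanishes as \(n \rightarrow \infty \); here it is simply carried along explicitly.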

Now, we can finally put all of our ingredients together. Recall from the second part of Lemma 4.8 that convergence of the spectra in expectation implies the corresponding convergence in probability. Using the spectra in Eq. (4.38) as the vectors \(p^{(n)}\) and \(q^{(n)}\) in the statement of Lemma 4.8, together with the convergence of distributions in Eq. (4.42), we conclude

$$\begin{aligned} \nu _{\rho _A} \Rightarrow \textrm{min}_*(\nu _1,\nu _2) \end{aligned}$$
(4.43)

\(\square \)

4.5 Computing Entropies with Two Minimal Cuts

Ideally, we would like to use Theorem 4.12 to compute von Neumann entropies of random tensor network states. However, Theorem 4.12 alone is too weak to allow us to directly compute entropies up to \(o(\sqrt{n})\) corrections, as weak convergence of the spectrum does not directly imply convergence of the mean. Fortunately, we kept track of the various approximation errors in the proof of Theorem 4.12, and under slightly stronger assumptions these error bounds allow us to compute the entropy up to \({\mathcal {O}}(\log (n))\) corrections.

To begin, we first bound the difference in the entropy of a sum of density matrices and the sum of the entropies of the individual density matrices:

Lemma 4.13

Suppose \(\{p_i\}_{i=1}^m\) is a subnormalized probability distribution, let \(\{\rho _i\in {\mathcal {P}}({\mathcal {H}}) \}_{i=1}^m\) be a collection of (unnormalized) density matrices, and let \(\rho = \sum _i p_i \rho _i \in {\mathcal {P}}({\mathcal {H}})\). Suppose that \(C^{-1} \le {{\,\textrm{tr}\,}}[\rho _i] \le C\) for all \(i\), and that \(C^{-1} \le {{\,\textrm{tr}\,}}[\rho ] \le C\). Then

$$\begin{aligned} \left|H(\rho ) - \sum _{i=1}^m p_i H(\rho _i)\right| \le C(\log (m) + 2\log (C)). \end{aligned}$$

Proof

If the \(\rho _i\) are normalized and \(\sum _i p_i = 1\), then by the Holevo bound we have

$$\begin{aligned} \sum _i p_i H(\rho _i) \le H\bigg (\sum _i p_i \rho _i\bigg ) \le \sum _i p_i H(\rho _i) + H(\{p_i\}) \le \sum _i p_i H(\rho _i) + \log (m). \end{aligned}$$

Now let \(q_i = {{\,\textrm{tr}\,}}[\rho _i]\), \(P = \sum _i {q_i p_i}\) and let \(\sigma _i = \frac{\rho _i}{q_i}\), \(r_i = \frac{p_i q_i}{P}\). Then, \(H(\sum _i p_i \rho _i) = H(P\sum _i r_i \sigma _i) = P H(\sum _i r_i \sigma _i) - P\log (P)\). On the other hand

$$\begin{aligned} P\sum _i r_i H(\sigma _i) = \sum _i p_i H(\rho _i) + \sum _i p_i q_i \log (q_i) \end{aligned}$$

so it follows that

$$\begin{aligned} \left|H\bigg (\sum _i p_i \rho _i\bigg ) - \sum _i p_iH(\rho _i)\right|&\le P\log (m) + \left|P\log (P)\right| + P\max _i\left|\log (q_i)\right|\\&\le C(\log (m) + 2\log (C)). \end{aligned}$$

\(\square \)
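The bound in the lemma above can be checked numerically; the sketch below uses illustrative unnormalized density matrices and takes \(H(\rho ) = -{{\,\textrm{tr}\,}}[\rho \log \rho ]\) in natural units:

```python
import numpy as np

rng = np.random.default_rng(4)

def entropy(r):
    """Von Neumann entropy -tr[r log r] of an (unnormalized) PSD matrix."""
    w = np.linalg.eigvalsh(r)
    w = w[w > 1e-15]
    return float(-(w * np.log(w)).sum())

def rand_state(d, trace):
    a = rng.normal(size=(d, d))
    r = a @ a.T
    return r * (trace / np.trace(r))

d, m = 4, 3
traces = [0.8, 1.0, 1.25]                 # tr[rho_i], all within [1/C, C]
rhos = [rand_state(d, t) for t in traces]
p = np.array([0.2, 0.3, 0.5])             # subnormalized weights
rho = sum(pi * ri for pi, ri in zip(p, rhos))

C = max(max(traces), 1 / min(traces), np.trace(rho), 1 / np.trace(rho))
lhs = abs(entropy(rho) - sum(pi * entropy(ri) for pi, ri in zip(p, rhos)))
bound = C * (np.log(m) + 2 * np.log(C))
```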

The idea is that for a random tensor network state, we will split up the background state as a superposition of states which are maximally entangled along the two minimal cuts and then use Lemma 4.13. This approach formalizes an argument sketched in [6].

To state our result, we introduce a new function: For vectors of positive numbers \(p \in \mathbbm {R}^{d_1}\), \(q \in \mathbbm {R}^{d_2}\), we let

$$\begin{aligned} H^*(p,q) := \sum _{i,j} p_i q_j \min \left(\log \left( \frac{1}{p_i}\right) ,\log \left( \frac{1}{q_j}\right) \right). \end{aligned}$$
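In code, \(H^*\) is a direct double sum. The sketch below (illustrative spectra) also checks two limiting cases: a trivial second cut gives zero, and a much larger flat second cut reproduces the Shannon entropy of the first spectrum:

```python
import numpy as np

def h_star(p, q):
    """H*(p, q) = sum_{i,j} p_i q_j min(log(1/p_i), log(1/q_j))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mins = np.minimum.outer(np.log(1 / p), np.log(1 / q))
    return float((np.outer(p, q) * mins).sum())

p = np.array([0.5, 0.25, 0.25])
shannon = float(-(p * np.log(p)).sum())

trivial = h_star(p, [1.0])                  # zero-entropy cut wins the min: 0
flat = h_star(p, np.full(1024, 1 / 1024))   # much larger flat cut: H(p)
```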

A key tool we will need is the continuity of the entropy: If \(\rho , \sigma \in {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}})\) are quantum states on a Hilbert space \({\mathcal {H}}\) of dimension d with \(T(\rho ,\sigma ) \le \frac{1}{e}\), then the Fannes–Audenaert inequality states that

$$\begin{aligned} \left|H(\rho ) - H(\sigma )\right| \le T(\rho ,\sigma )\log d - T(\rho ,\sigma )\log (T(\rho ,\sigma )). \end{aligned}$$
(4.44)
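As a numerical illustration of entropy continuity, the sketch below checks the sharp Audenaert form \(|H(\rho )-H(\sigma )| \le T\log (d-1) + h(T)\), with \(T = \frac{1}{2}\Vert \rho -\sigma \Vert _1\) and \(h\) the binary entropy, which underlies bounds of the shape of Eq. (4.44); the states and perturbation below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def entropy(r):
    w = np.linalg.eigvalsh(r)
    w = w[w > 1e-15]
    return float(-(w * np.log(w)).sum())

def rand_density(d):
    a = rng.normal(size=(d, d))
    r = a @ a.T
    return r / np.trace(r)

d = 4
rho = rand_density(d)
tau = rand_density(d)
sigma = 0.9 * rho + 0.1 * tau          # small perturbation of rho

T = 0.5 * np.linalg.svd(rho - sigma, compute_uv=False).sum()
h = -T * np.log(T) - (1 - T) * np.log(1 - T)   # binary entropy (nats)
bound = T * np.log(d - 1) + h
```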

Let \(\eta \) be the function defined by \(\eta (x) = x\log x\). Writing \({{\,\textrm{spec}\,}}(\rho ) = \{p_j\}_{j=1}^d\) and \({{\,\textrm{spec}\,}}(\sigma ) = \{q_j\}_{j=1}^d\), one can in fact prove the following strengthening of Eq. (4.44):

$$\begin{aligned} \left|H(\rho ) - H(\sigma )\right| \le \sum _{j=1}^d \, \left|\eta (p_j) - \eta (q_j)\right| \le T(\rho ,\sigma )\log d - T(\rho ,\sigma )\log (T(\rho ,\sigma )). \end{aligned}$$

We will use this to show continuity of \(H^*\) as well. Consider \(\rho _i, \sigma _i \in {\mathcal {P}}_{\scriptscriptstyle {\le }}({\mathcal {H}}_i)\) with \(\dim ({\mathcal {H}}_i) \le d\) for \(i = 1,2\). We let \({{\,\textrm{spec}\,}}(\rho _i) = \{p_{i,j}\}_{j=1}^{d_i}\) and \({{\,\textrm{spec}\,}}(\sigma _i) = \{q_{i,j}\}_{j=1}^{d_i}\). For real numbers \(x_i, y_i\) we have

$$\begin{aligned} \left|\min (x_1,y_1) - \min (x_2,y_2)\right| \le \left|x_1 - x_2\right| + \left|y_1 - y_2\right|. \end{aligned}$$

Then, we see that

$$\begin{aligned}&\left|H^*({{\,\textrm{spec}\,}}(\rho _1),{{\,\textrm{spec}\,}}(\rho _2)) - H^*({{\,\textrm{spec}\,}}(\sigma _1),{{\,\textrm{spec}\,}}(\sigma _2))\right| \\&\quad \le \sum _{j,k} \, \left| p_{1,j} p_{2,k} \min \left( \log \left(\frac{1}{p_{1,j}}\right) ,\log \left(\frac{1}{p_{2,k}}\right)\right) \right. \\&\qquad \left. - q_{1,j} q_{2,k} \min \left( \log \left(\frac{1}{q_{1,j}}\right), \log \left(\frac{1}{q_{2,k}}\right)\right) \right| \\&\quad \le \sum _{j,k} \, \left| p_{1,j} p_{2,k} \log p_{1,j} - q_{1,j} q_{2,k} \log q_{1,j}\right| + \left| p_{1,j} p_{2,k} \log p_{2,k} - q_{1,j} q_{2,k} \log q_{2,k}\right|. \end{aligned}$$

We may estimate the first term by

$$\begin{aligned}&\sum _{j,k} \, \left| p_{1,j} p_{2,k} \log p_{1,j} - q_{1,j} q_{2,k} \log q_{1,j}\right| \\&\quad \le \sum _{j,k} p_{2,k}\left| p_{1,j} \log p_{1,j} - q_{1,j} \log q_{1,j}\right| + \sum _{j,k} \, \left| p_{2,k} - q_{2,k}\right| \left| q_{1,j} \log q_{1,j}\right|\\&\quad \le \sum _j \, \left|\eta (p_{1,j}) - \eta (q_{1,j})\right| + \Vert \rho _2 - \sigma _2\Vert _1 H(\sigma _1) \\&\quad \le T(\rho _1,\sigma _1)\log d - T(\rho _1,\sigma _1)\log (T(\rho _1,\sigma _1)) + 2T(\rho _2,\sigma _2)\log d. \end{aligned}$$

The second term may be estimated in a similar fashion. We conclude that if \(T(\rho _1, \sigma _1)\le \varepsilon \le e^{-1}\) and \(T(\rho _2, \sigma _2)\le \varepsilon \le e^{-1}\), then

$$\begin{aligned} \left|H^*({{\,\textrm{spec}\,}}(\rho _1),{{\,\textrm{spec}\,}}(\rho _2)) - H^*({{\,\textrm{spec}\,}}(\sigma _1),{{\,\textrm{spec}\,}}(\sigma _2))\right| \le 6\varepsilon \log d - 2\varepsilon \log (\varepsilon ). \end{aligned}$$
(4.45)

We will now show that we can approximate the entropy as would be expected from Theorem 4.12, if we make some additional assumptions (which are in particular satisfied if the state along each edge consists of n copies of a fixed state, \(\phi _e = \phi _{e,0}^{\otimes n}\)).

Corollary 4.14

Let \(\rho \) be a random tensor network state satisfying the same assumptions as in Theorem 4.12, and assume additionally that the \(\nu _{\phi _{\gamma _{A,i}}}\) have uniformly exponentially decaying tail probabilities and \(\varepsilon (n) = {\mathcal {O}}(n^{-\gamma })\) for \(\gamma > 10\). Assume that for each edge \(e = (xy)\), the bond dimension is \(D_e = 2^{{\mathcal {O}}(n)}\) and \({{\,\textrm{tr}\,}}[\phi _{e,x}^2] \le 2^{-\Omega (n)}\). Then with high probability

$$\begin{aligned} \left|H(\rho _A) - H^*({{\,\textrm{spec}\,}}(\phi _{\gamma _{A,1}}),{{\,\textrm{spec}\,}}(\phi _{\gamma _{A,2}}))\right| = {\mathcal {O}}(\log (n)). \end{aligned}$$

Proof

The basic proof strategy will be that of Theorem 4.12: We study a slightly reduced problem on the approximated tensor network states \({\tilde{\sigma }}\) with approximate background states \({\tilde{\phi }}\), work out the entropies for \({\tilde{\sigma }}\) and \({\tilde{\phi }}\), and then argue that the closeness of the resulting entropies will continue to hold for the original tensor network state and background state, up to errors that we carefully keep track of.

First of all, we note that we can reduce to the tensor network state \(\sigma \) on the reduced graph \(G'\), with error as in Eq. (4.23); in particular

$$\begin{aligned} \mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\rho _A) - {{\,\textrm{spec}\,}}_+(\sigma _{B})\Vert _1 = {\mathcal {O}}(n^{-\frac{\gamma }{2}}). \end{aligned}$$
(4.46)

Next we adapt the part of the proof of Theorem 4.12 where we modify the state along the minimal cuts. In the proof of Theorem 4.12, we first observed that if the (regularized) spectrum along each cut \(\nu _{\phi _{\Gamma _{A,i}}}\) has uniformly exponentially decaying tail probabilities, then for sufficiently large C, we can slice off the tails with vanishing probability mass:

$$\begin{aligned} \sum _{\lambda _{i,j} \notin I_n} \lambda _{i,j} = \nu _{\phi _{\Gamma _{A,i}}}((-\infty ,-C\log (n))\cup (C\log (n),\infty )) = O\left( \frac{1}{n^4}\right) . \end{aligned}$$

We also binned the eigenvalues of \(\phi \) in the reduced spectrum:

$$\begin{aligned} q_{i,j} = d_{i,j}p_{i,j}, \end{aligned}$$

where \(p_{i,j}\) is the representative eigenvalue of the jth bin and \(d_{i,j}\) the number of eigenvalues in that bin. We then removed bins that were too small, leading to a state \({{\tilde{\phi }}}\). In the notation of the proof of Theorem 4.12, we choose \(\alpha \) and \(\beta \) such that \(\alpha > 2\) and \(\frac{5}{2} + \alpha < \beta \le \frac{1}{2}(\gamma - 1)\). By Eq. (4.26) this yields an error

$$\begin{aligned} \Vert {{\tilde{\phi }}} - \phi \Vert _1 = o(n^{-1}). \end{aligned}$$
(4.47)

Note that in the proof of Theorem 4.12, we performed one more approximation of removing the “middle” or “diagonal” part of the spectrum and obtained a state \({\bar{\phi }}\); we will not need to do this here. Now, we recall that the binning of eigenvalues allows us to write \({\tilde{\phi }}\) as a superposition over maximally entangled states along each cut:

$$\begin{aligned} \vert {{\tilde{\phi }}}_{\gamma _{A,1}, \gamma _{A,2}}\rangle&= \sum _{j,k} \sqrt{q_{1,j} q_{2,k}} \vert \Phi ^+_{1,j}\rangle \otimes \vert \Phi ^+_{2,k}\rangle , \end{aligned}$$

and use this to decompose \({{\tilde{\phi }}}\) as

$$\begin{aligned} \vert {{\tilde{\phi }}}\rangle&= \sum _{j,k} \sqrt{q_{1,j} q_{2,k}} \vert \psi ^{(j,k)}\rangle , \end{aligned}$$

where in \(\vert \psi ^{(j,k)}\rangle \), we have replaced \(\vert {{\tilde{\phi }}}_{\gamma _{A,1}, \gamma _{A,2}}\rangle \) with the maximally entangled state \(\vert \Phi ^+_{1,j}\rangle \otimes \vert \Phi ^+_{2,k}\rangle \). The states \(\vert \psi ^{(j,k)}\rangle \) are normalized background states on the graph \(G'\). We let

$$\begin{aligned} \vert \phi ^{(j,k)}\rangle = \sqrt{q_{1,j} q_{2,k}}\vert \psi ^{(j,k)}\rangle \end{aligned}$$

which are subnormalized states. At this point, the key idea of the argument is straightforward. We will consider the random tensor network states which have (a smoothed version) of the background states \(\vert \psi ^{(j,k)}\rangle \). These will have entropy close to \(\min (\log (d_{1,j}), \log (d_{2,k}))\). Then, we will use Lemma 4.13 to argue that the entropy of \(\rho _A\) is approximated up to \({\mathcal {O}}(\log (n))\) terms by the convex combination \(\sum _{j,k} q_{1,j} q_{2,k} \min (\log (d_{1,j}), \log (d_{2,k}))\), which we can relate to the desired result. To make this easy argument precise, we will need to take care of smoothing the background state appropriately and ensure that the relevant states are close to normalized.

We will now argue that we can smooth the states \(\phi ^{(j,k)}\). We may assume without loss of generality that the states \(\phi ^{(j,k)}\) have nonnegative coefficients in the standard basis \(\vert I\rangle = \prod _{e \in E'} \vert i_e i_e\rangle \), where \(I = \{i_e\}_{e \in E'}\) runs over all possible basis elements over each edge, so we can write

$$\begin{aligned} \vert \phi ^{(j,k)}\rangle = \sum _I \sqrt{\lambda ^{(j,k)}_I}\, \vert I\rangle . \end{aligned}$$

By the same argument as in Theorem 4.12 we find a state

$$\begin{aligned} \vert \phi ^{(j,k,\varepsilon )}\rangle = \sum _I \sqrt{\lambda ^{(j,k,\varepsilon )}_I}\, \vert I\rangle \end{aligned}$$

which is such that for any \(\Delta _B \in C(B)\) not equal to B or \(V_b' \cup B\) and \(\Delta _A = \Gamma _{A,1} \cup \Delta _B\)

$$\begin{aligned} H_{\min }(\Delta _B {\setminus } B \vert B)_{\phi ^{(j,k,\varepsilon )}} \ge H^\varepsilon _{\min }(\Delta _A {\setminus } \Gamma _A \vert \Gamma _A)_{\phi }, \end{aligned}$$

while for \(\Delta _B = B\) and \(\Delta _B = B \cup V_b'\) we have

$$\begin{aligned} H_{\min }(B)_{\phi ^{(j,k,\varepsilon )}}&\ge H_{\min }(B)_{\phi ^{(j,k)}} = \log \frac{1}{q_{2,k} p_{1,j}}\\ H_{\min }({{\bar{B}}})_{\phi ^{(j,k,\varepsilon )}}&\ge H_{\min }({{\bar{B}}})_{\phi ^{(j,k)}} = \log \frac{1}{q_{1,j}p_{2,k}}. \end{aligned}$$

Moreover, \(\phi ^{(j,k,\varepsilon )}\) is close to \(\phi ^{(j,k)}\) in the sense that

$$\begin{aligned} \Vert \phi ^{(j,k,\varepsilon )} - \phi ^{(j,k)}\Vert _1&= {\mathcal {O}}\left(\sqrt{\sum _I \left|\lambda ^{(j,k,\varepsilon )}_I - \lambda ^{(j,k)}_I\right|}\right) = {\mathcal {O}}(\sqrt{\varepsilon }). \end{aligned}$$

By the remark after Lemma 4.11, we may assume that \(\lambda ^{(j,k,\varepsilon )}_I \le \lambda ^{(j,k)}_I\) for all I.

By a chain rule (e.g., Theorem 5.13 in [74], proven in [30]) for any \(\Delta _B \in C(B)\) it holds that

$$\begin{aligned} H_2(\Delta _B)_{\phi ^{(j,k,\varepsilon )}} \ge H_{\min }(\Delta _B {\setminus } B \vert B)_{\phi ^{(j,k,\varepsilon )}} + H_{\min }(B)_{\phi ^{(j,k,\varepsilon )}}. \end{aligned}$$

We now let

$$\begin{aligned} \vert \phi ^{\varepsilon }\rangle = \sum _{j,k} \vert \phi ^{(j,k,\varepsilon )}\rangle . \end{aligned}$$

Then, by Lemma C.4

$$\begin{aligned} \Vert {{\tilde{\phi }}} - \phi ^{\varepsilon }\Vert _1&\le \sqrt{2 \sum _{j,k} \sum _I \left|\lambda _I^{(j,k)} - \lambda _I^{(j,k,\varepsilon )}\right|} = {\mathcal {O}}(\sqrt{n^{2\alpha }\log (n)^2\varepsilon (n)}) \nonumber \\&= {\mathcal {O}}(n^{\alpha - \frac{1}{2}\gamma }\log (n)) = {\mathcal {O}}(n^{-3}). \end{aligned}$$
(4.48)

Therefore, if we let \(\sigma ^{\varepsilon }_B\) be the random tensor network state with background state \(\phi ^{\varepsilon }\)

$$\begin{aligned} \mathbbm {E}\Vert \sigma - \sigma ^{\varepsilon }\Vert _1 \le \Vert \phi - {{\tilde{\phi }}}\Vert _1 + \Vert {{\tilde{\phi }}} - \phi ^{\varepsilon }\Vert _1 = o\bigg (\frac{1}{n}\bigg ). \end{aligned}$$
(4.49)

Finally, let

$$\begin{aligned} \vert \phi ^{(j,k,\varepsilon )}\rangle = \sqrt{q_{1,j} q_{2,k}}\vert \psi ^{(j,k,\varepsilon )}\rangle . \end{aligned}$$

By Eq. (C.2) in the remark after the proof of Lemma 4.11 we have

$$\begin{aligned} \left|{{\,\textrm{tr}\,}}[\psi ^{(j,k,\varepsilon )}] - 1\right|&= \frac{1}{q_{1,j}q_{2,k}}\left|{{\,\textrm{tr}\,}}[\phi ^{(j,k,\varepsilon )}] - {{\,\textrm{tr}\,}}[\phi ^{(j,k)}]\right| \\&= {\mathcal {O}}\left( \frac{\varepsilon }{q_{1,j}q_{2,k}}\right) = {\mathcal {O}}(n^{2\beta - \gamma }) = {\mathcal {O}}(n^{-1}). \end{aligned}$$

using that \(q_{1,j}, q_{2,k} > n^{-\beta }\) and \(\beta \le \frac{1}{2}(\gamma - 1)\). We denote the random tensor network states with background states \(\psi ^{(j,k,\varepsilon )}\) by \(\sigma ^{(j,k)}\), and the random tensor network states with background states \(\sum _{j} \sqrt{q_{1,j}}\vert \psi ^{(j,k,\varepsilon )}\rangle \) by \(\sigma ^{(k)}\).

We introduce the event N that \(\rho \), \(\sigma \) and all the \(\sigma ^{(j,k)}\) are close to normalized, that is,

$$\begin{aligned} \left|{{\,\textrm{tr}\,}}[\rho ] - 1\right| \le \frac{1}{n^2} \ \text { and } \ \left|{{\,\textrm{tr}\,}}[\sigma ] - 1\right| \le \frac{1}{n^2} \ \text { and } \ \left|{{\,\textrm{tr}\,}}[\sigma ^{(j,k)}] - {{\,\textrm{tr}\,}}[\psi ^{(j,k,\varepsilon )}]\right| \le \frac{1}{n^2}. \end{aligned}$$

By assumption, for each edge \(e = (xy)\) we have \({{\,\textrm{tr}\,}}[\phi _{e,x}^2] \le 2^{-\Omega (n)}\). Moreover, using that \(q_{1,j}, q_{2,k} \ge n^{-\beta }\)

$$\begin{aligned} {{\,\textrm{tr}\,}}[(\psi ^{(j,k,\varepsilon )}_{\Delta })^2] \le q_{1,j}^{-2} q_{2,k}^{-2} {{\,\textrm{tr}\,}}[\phi ^2_{\Delta }] = {\mathcal {O}}({{\,\textrm{poly}\,}}(n)2^{-\Omega (n)}) \end{aligned}$$

and therefore the quantity \(\eta \) in Lemma 2.1 is \({\mathcal {O}}(2^{-\Omega (n)})\) for \(\rho \), \(\sigma \) and \(\sigma ^{(j,k)}\). So, by Lemma 2.1 and the union bound the event N has probability

$$\begin{aligned} p_N&\ge 1 - \left( \Pr \left( \left|{{\,\textrm{tr}\,}}[\rho ] - 1\right| \ge \frac{1}{n^2}\right) + \Pr \left( \left|{{\,\textrm{tr}\,}}[\sigma ] - 1\right| \ge \frac{1}{n^2}\right) \right. \\&\left. \quad + \sum _{j,k} \Pr \left( \left|{{\,\textrm{tr}\,}}[\sigma ^{(j,k)}] - {{\,\textrm{tr}\,}}[\psi ^{(j,k,\varepsilon )}]\right| \ge \frac{1}{n^2}\right) \right) \\&= 1 - {\mathcal {O}}({{\,\textrm{poly}\,}}(n)2^{-\Omega (n)}) = 1 - {\mathcal {O}}(2^{-\Omega (n)}). \end{aligned}$$

We denote by \(\mathbbm {E}_N\) the expectation value over the random tensors conditioned on this event.

We now use Lemma 4.13 to approximate the entropy of \(\sigma ^{\varepsilon }_B\) conditioned on N:

$$\begin{aligned} \left|H(\sigma ^{\varepsilon }_B) - \sum _k q_{2,k} H(\sigma _B^{(k)})\right| = {\mathcal {O}}(\log (n)). \end{aligned}$$

Because \(\sigma ^{(k)}\) is pure, we have \(H(\sigma _B^{(k)}) = H(\sigma _{{{\bar{B}}}}^{(k)})\). We apply Lemma 4.13 again, this time to the decomposition of \(\sigma _{{{\bar{B}}}}^{(k)}\), to find

$$\begin{aligned} \left|H(\sigma _{{{\bar{B}}}}^{(k)}) - \sum _j q_{1,j} H(\sigma _{{{\bar{B}}}}^{(j,k)})\right| = {\mathcal {O}}(\log (n)). \end{aligned}$$

Thus,

$$\begin{aligned} \left|H(\sigma ^{\varepsilon }_B) - \sum _{j,k} q_{1,j} q_{2,k} H(\sigma _{B}^{(j,k)})\right| = {\mathcal {O}}(\log (n)). \end{aligned}$$

Since \({{\,\textrm{rank}\,}}(\sigma _{B}^{(j,k)}) \le \min \{d_{1,j}, d_{2,k} \}\) (and taking into account the normalization of \(\sigma ^{(j,k)}\)), we have

$$\begin{aligned} H(\sigma _{B}^{(j,k)}) \le {{\,\textrm{tr}\,}}[\sigma ^{(j,k)}]\min \left( \log d_{1,j}, \log d_{2,k} \right) - {{\,\textrm{tr}\,}}[\sigma ^{(j,k)}] \log {{\,\textrm{tr}\,}}[\sigma ^{(j,k)}]. \end{aligned}$$

Conditioned on N and using \(\left|{{\,\textrm{tr}\,}}[\psi ^{(j,k,\varepsilon )}] - 1\right| = {\mathcal {O}}(n^{-1})\), this implies

$$\begin{aligned} H(\sigma _{B}^{(j,k)}) \le \min \left( \log d_{1,j}, \log d_{2,k} \right) + {\mathcal {O}}(1). \end{aligned}$$
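The rank bound above is an instance of the elementary fact that a subnormalized positive operator \(\tau \) with \(t = {{\,\textrm{tr}\,}}[\tau ]\) satisfies \(H(\tau ) = t\,H(\tau /t) - t\log t \le t\log {{\,\textrm{rank}\,}}(\tau ) - t\log t\). A minimal numerical check (the dimension, trace value and Wishart-style construction are illustrative choices, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Subnormalized density operator: random PSD matrix rescaled to trace 0.9
d = 8
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
tau = X @ X.conj().T
tau *= 0.9 / np.trace(tau).real

p = np.linalg.eigvalsh(tau)
p = p[p > 1e-12]
t = p.sum()                      # trace of tau (here 0.9)
H = -(p * np.log2(p)).sum()      # entropy of the subnormalized operator, in bits

# H(tau) <= tr[tau] log rank(tau) - tr[tau] log tr[tau]
bound = t * np.log2(len(p)) - t * np.log2(t)
assert H <= bound + 1e-9
```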

For a lower bound, we use that

$$\begin{aligned} \mathbbm {E}_N H(\sigma _{B}^{(j,k)})&\ge \mathbbm {E}_N {{\,\textrm{tr}\,}}[\sigma ^{(j,k)}]\left( -\log {{\,\textrm{tr}\,}}\left[(\sigma _{B}^{(j,k)})^2\right] + \log {{\,\textrm{tr}\,}}[\sigma _{B}^{(j,k)}]\right) \\&\ge \bigg ({{\,\textrm{tr}\,}}[\psi ^{(j,k,\varepsilon )}] - \frac{1}{n^2}\bigg )\left( -\log \mathbbm {E}_N {{\,\textrm{tr}\,}}\left[ (\sigma _{B}^{(j,k)})^2\right] + \log \bigg ({{\,\textrm{tr}\,}}[\psi ^{(j,k,\varepsilon )}] - \frac{1}{n^2}\bigg ) \right) \\&\ge -\log \mathbbm {E}_N {{\,\textrm{tr}\,}}\left[ (\sigma _{B}^{(j,k)})^2\right] - {\mathcal {O}}(1) \end{aligned}$$

where in the first and second inequality we have used Jensen’s inequality, and in the second and third inequality we have used \({{\,\textrm{tr}\,}}[\sigma _{B}^{(j,k)}] \ge {{\,\textrm{tr}\,}}[\psi ^{(j,k,\varepsilon )}] - \frac{1}{n^2} \ge 1 - {\mathcal {O}}(\frac{1}{n})\). We use the replica trick to estimate

$$\begin{aligned} \mathbbm {E}{{\,\textrm{tr}\,}}\left[ (\sigma ^{(j,k)}_B)^2 \right] = \sum _{\Delta _B \in C(B)} {{\,\textrm{tr}\,}}[(\psi ^{(j,k,\varepsilon )}_{\Delta _B})^2]. \end{aligned}$$

In this expression, the cuts \(\Delta _B = B\) and \(\Delta _B = B \cup V_b'\) yield contributions

$$\begin{aligned} {{\,\textrm{tr}\,}}[(\psi ^{(j,k,\varepsilon )}_{B})^2] \le d_{1,j}^{-1} \\ {{\,\textrm{tr}\,}}[(\psi ^{(j,k,\varepsilon )}_{{{\bar{B}}}})^2] \le d_{2,k}^{-1}. \end{aligned}$$

For any other cut \(\Delta _B\), we have a contribution at most

$$\begin{aligned} {{\,\textrm{tr}\,}}[(\psi ^{(j,k,\varepsilon )}_{\Delta _B})^2]&\le q_{1,j}^{-2}q_{2,k}^{-2} {{\,\textrm{tr}\,}}[\phi ^{(j,k,\varepsilon )}] 2^{-H_{\min }(B)_{\phi ^{(j,k,\varepsilon )}} - H^\varepsilon _{\min }(\Delta _B {\setminus } B \vert B)_{\phi }}\\&\le q_{1,j}^{-1}q_{2,k}^{-1}2^{-H_{\min }(B)_{\phi ^{(j,k,\varepsilon )}}} \le d_{1,j}^{-1} \end{aligned}$$

using that \({{\,\textrm{tr}\,}}[\phi ^{(j,k,\varepsilon )}] \le q_{1,j}q_{2,k}\) and \(2^{-H_{\min }(B)_{\phi ^{(j,k,\varepsilon )}}} \le q_{2,k}p_{1,j} = q_{1,j}q_{2,k} d_{1,j}^{-1}\). Therefore

$$\begin{aligned} \mathbbm {E}_N {{\,\textrm{tr}\,}}\left[ (\sigma ^{(j,k)}_B)^2 \right]&\le p_N^{-1} \mathbbm {E}{{\,\textrm{tr}\,}}\left[ (\sigma ^{(j,k)}_B)^2 \right] \\&\le d_{1,j}^{-1} + d_{2,k}^{-1} + {\mathcal {O}}(d_{1,j}^{-1}), \end{aligned}$$

so

$$\begin{aligned} -\log \mathbbm {E}_N {{\,\textrm{tr}\,}}\left[ (\sigma ^{(j,k)}_B)^2 \right]&\ge -\log \left( \max \left(d_{1,j}^{-1}, d_{2,k}^{-1}\right)(2 + {\mathcal {O}}(1)) \right) \\&\ge \min ( \log d_{1,j}, \log d_{2,k} ) - {\mathcal {O}}(1). \end{aligned}$$

We find that

$$\begin{aligned} \mathbbm {E}_N \left|H(\sigma _{B}^{(j,k)}) - \min ( \log d_{1,j}, \log d_{2,k} )\right| = {\mathcal {O}}(1) \end{aligned}$$

and hence

$$\begin{aligned} \mathbbm {E}_N \left|\sum _{j,k} q_{1,j} q_{2,k} H(\sigma _{B}^{(j,k)}) - \sum _{j,k} q_{1,j} q_{2,k} \min ( \log d_{1,j}, \log d_{2,k} )\right| = {\mathcal {O}}(1). \end{aligned}$$
(4.50)
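The mechanism behind Eq. (4.50) is already visible in the smallest random tensor network, a single Haar-random pure state on \({\mathbb {C}}^{d_1}\otimes {\mathbb {C}}^{d_2}\): the replica trick gives \(\mathbbm {E}{{\,\textrm{tr}\,}}[\rho _1^2] \approx d_1^{-1} + d_2^{-1}\), one term per cut, and the entanglement entropy is \(\min (\log d_1, \log d_2) - {\mathcal {O}}(1)\). The following Monte Carlo sketch (dimensions, sample size and tolerances are arbitrary choices) illustrates both estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
dA, dB = 8, 64

# Haar-random pure states on C^dA (x) C^dB via normalized complex Gaussians
n_samples = 500
purities, entropies = [], []
for _ in range(n_samples):
    psi = rng.standard_normal((dA, dB)) + 1j * rng.standard_normal((dA, dB))
    psi /= np.linalg.norm(psi)
    p = np.linalg.svd(psi, compute_uv=False) ** 2   # spectrum of rho_A
    purities.append((p ** 2).sum())
    entropies.append(-(p * np.log2(p)).sum())

# Replica-trick prediction: E tr[rho_A^2] = (dA + dB)/(dA dB + 1),
# roughly 1/dA + 1/dB -- one term per cut, as in the sum over Delta_B above.
assert abs(np.mean(purities) - (1/dA + 1/dB)) < 0.01

# Page-type behaviour: H(rho_A) = min(log dA, log dB) - O(1), in bits
assert abs(np.mean(entropies) - np.log2(min(dA, dB))) < 1.0
```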

We collect the various estimates we have found:

$$\begin{aligned}&\mathbbm {E}_N \Vert {{\,\textrm{spec}\,}}_+(\rho _A) - {{\,\textrm{spec}\,}}_+(\sigma _{B})\Vert _1 \le p_N^{-1}\mathbbm {E}\Vert {{\,\textrm{spec}\,}}_+(\rho _A) - {{\,\textrm{spec}\,}}_+(\sigma _{B})\Vert _1 = o\left( \frac{1}{n}\right) \\&\mathbbm {E}_N \Vert \sigma _B - \sigma ^{\varepsilon }_B\Vert _1 \le p_N^{-1}\mathbbm {E}\Vert \sigma _B - \sigma ^{\varepsilon }_B\Vert _1 = o\left( \frac{1}{n}\right) \\&\mathbbm {E}_N \left|\sum _{j,k} q_{1,j} q_{2,k} H(\sigma _{B}^{(j,k)}) - \sum _{j,k} q_{1,j} q_{2,k} \min ( \log d_{1,j}, \log d_{2,k} )\right| = {\mathcal {O}}(1). \end{aligned}$$

Let M be the event that N holds and moreover

$$\begin{aligned}&\Vert {{\,\textrm{spec}\,}}_+(\rho _A) - {{\,\textrm{spec}\,}}_+(\sigma _{B})\Vert _1 \le \frac{1}{n}\\&\Vert \sigma _B - \sigma ^{\varepsilon }_B\Vert _1 \le \frac{1}{n} \\&\left|\sum _{j,k} q_{1,j} q_{2,k} H(\sigma _{B}^{(j,k)}) - \sum _{j,k} q_{1,j} q_{2,k} \min ( \log d_{1,j}, \log d_{2,k} )\right| \le \log (n). \end{aligned}$$

By Markov’s inequality and the union bound, the probability that M holds goes to one as n goes to infinity. Moreover, if M holds, then the Fannes–Audenaert inequality Eq. (4.44), together with the fact that \(\rho \), \(\sigma \) and \(\sigma ^{\varepsilon }\) are close to normalized, yields

$$\begin{aligned} \left|H\bigg (\frac{\rho _A}{{{\,\textrm{tr}\,}}[\rho ]}\bigg ) - H(\sigma ^{\varepsilon }_B)\right|&= {\mathcal {O}}(1). \end{aligned}$$
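The Fannes–Audenaert inequality bounds an entropy difference by the trace distance: in Audenaert's sharp form, \(|H(\rho ) - H(\sigma )| \le T\log (d-1) + h(T)\) with \(T = \frac{1}{2}\Vert \rho - \sigma \Vert _1\) and h the binary entropy (the exact form used as Eq. (4.44) in the text may be stated slightly differently). A quick numerical sanity check on random density matrices (the dimension and ranks are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6

def entropy(rho):
    # von Neumann entropy in bits, discarding numerically zero eigenvalues
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return -(p * np.log2(p)).sum()

def random_state(k):
    # random rank-k density matrix via a Wishart-style construction
    X = rng.standard_normal((d, k)) + 1j * rng.standard_normal((d, k))
    rho = X @ X.conj().T
    return rho / np.trace(rho).real

def h(x):
    # binary entropy in bits
    return 0.0 if x in (0.0, 1.0) else -x*np.log2(x) - (1-x)*np.log2(1-x)

rho, sigma = random_state(6), random_state(3)
T = 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()   # trace distance

# Fannes-Audenaert: |H(rho) - H(sigma)| <= T log(d-1) + h(T)
assert abs(entropy(rho) - entropy(sigma)) <= T * np.log2(d - 1) + h(T) + 1e-9
```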

Also, if M holds, by Eq. (4.50)

$$\begin{aligned}&\left|H(\sigma ^{\varepsilon }_B) - \sum _{j,k} q_{1,j} q_{2,k} \min ( \log d_{1,j}, \log d_{2,k} )\right|\\&\quad \le \left|\sum _{j,k} q_{1,j} q_{2,k} H(\sigma _{B}^{(j,k)}) - \sum _{j,k} q_{1,j} q_{2,k} \min ( \log d_{1,j}, \log d_{2,k} )\right| \\&\qquad +{\mathcal {O}}(\log (n))= {\mathcal {O}}(\log (n)) \end{aligned}$$

so we conclude that

$$\begin{aligned} \left|H\bigg (\frac{\rho _A}{{{\,\textrm{tr}\,}}[\rho ]}\bigg ) - \sum _{j,k} q_{1,j} q_{2,k} \min ( \log d_{1,j}, \log d_{2,k} )\right|&= {\mathcal {O}}(\log (n)). \end{aligned}$$

Since \(q_{i,j} = d_{i,j}p_{i,j} \ge n^{-\beta }\) from the binning procedure, we have the simple observation

$$\begin{aligned} \frac{1}{n^{\beta } p_{i,j}} \le \frac{q_{i,j}}{p_{i,j}} = d_{i,j} \le \frac{1}{p_{i,j}}, \end{aligned}$$

and hence \(\left|\min (\log d_{1,j},\log d_{2,k}) - \log (\min (\frac{1}{p_{1,j}},\frac{1}{p_{2,k}}))\right| = {\mathcal {O}}(\log (n))\). We conclude that

$$\begin{aligned} \left|H\left( \frac{\rho _A}{{{\,\textrm{tr}\,}}[\rho ]}\right) - \sum _{j,k} q_{1,j}q_{2,k}\log \left( \min \left(\frac{1}{p_{1,j}},\frac{1}{p_{2,k}}\right)\right) \right| = {\mathcal {O}}(\log (n)). \end{aligned}$$
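The substitution \(\log d_{i,j} \approx \log (1/p_{i,j})\) used in the last step is pure arithmetic: \(n^{-\beta } \le q_{i,j} = d_{i,j}p_{i,j} \le 1\) forces \(|\log d_{i,j} - \log (1/p_{i,j})| = |\log q_{i,j}| \le \beta \log n\). An illustrative check (the numerical values of n, \(\beta \), p and q are arbitrary):

```python
import numpy as np

n, beta = 1000, 0.3

# A bin whose elements each carry probability p and whose total weight q
# lies in [n^-beta, 1] contains d = q/p elements (values are illustrative).
p = 1e-4
for q in [n**(-beta), 0.2, 1.0]:
    d = q / p
    # |log d - log(1/p)| = |log q| <= beta log n
    gap = abs(np.log2(d) - np.log2(1.0 / p))
    assert gap <= beta * np.log2(n) + 1e-9
```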

Finally, we need to relate the result back to the original background state. For the sum in the above approximation, we see that

$$\begin{aligned} \sum _{j,k} q_{1,j}q_{2,k}\log \left( \min \left(\frac{1}{p_{1,j}},\frac{1}{p_{2,k}}\right)\right) = H^*({{\,\textrm{spec}\,}}({{\tilde{\phi }}}_{\gamma _{A,1}}),{{\,\textrm{spec}\,}}({{\tilde{\phi }}}_{\gamma _{A,2}})). \end{aligned}$$

Then by Eqs. (4.47) and (4.45), this will converge to the appropriate quantity on the non-approximated background state:

$$\begin{aligned} \left|H^*({{\,\textrm{spec}\,}}({{\tilde{\phi }}}_{\gamma _{A,1}}),{{\,\textrm{spec}\,}}({{\tilde{\phi }}}_{\gamma _{A,2}})) - H^*({{\,\textrm{spec}\,}}(\phi _{\gamma _{A,1}}),{{\,\textrm{spec}\,}}(\phi _{\gamma _{A,2}}))\right| \rightarrow 0, \end{aligned}$$

proving the desired result. \(\square \)