1 Introduction

Quantum estimation theory is one of the pillars of quantum information science, with a wide range of applications from evaluating the performance of quantum devices [1, 2] to exploring the foundations of physics [3, 4]. In the typical scenario, the problem is specified by a parametric family of quantum states, called the model, and the objective is to design measurement strategies that estimate the parameters of interest with the highest possible precision. The precision measure is often chosen to be the mean square error (MSE), and is lower bounded through generalizations of the Cramér–Rao bound of classical statistics [5, 6]. Given n copies of a quantum state, such generalizations imply that the product \(\mathrm{MSE}\cdot n\) converges to a positive constant in the large n limit.

Despite many efforts made over the years (see, e.g., [5,6,7,8,9,10,11,12] and [13] for a review), the attainability of the precision bounds of quantum state estimation has only been proven in a few special cases. Consider, as an example, the most widely used bound, namely the symmetric logarithmic derivative Fisher information bound (SLD bound, for short). The SLD bound is tight in the one-parameter case [5, 6], but is generally non-tight in multiparameter estimation. Intuitively, measuring one parameter may affect the precision in the measurement of another parameter, and thus constructing the optimal measurement is highly nontrivial. Another bound for multiparameter estimation is the right logarithmic derivative Fisher information bound (RLD bound, for short) [5]. Its achievability was shown in the Gaussian states case [5], the qubit case [14, 15], and the qudit case [16, 17]. In this sense, the RLD bound is superior to the SLD bound. However, the RLD bound holds only when the family of states to be estimated satisfies an ad hoc mathematical condition. The most general quantum extension of the classical Cramér–Rao bound to date is the Holevo bound [5], which gives the maximum among all existing lower bounds for the error of unbiased measurements for the estimation of any family of states. The attainability of the Holevo bound was studied in the pure states case [10] and the qubit case [14, 15], and was conjectured to be generic by one of us [18]. Yamagata et al. [19] addressed the attainability question in a local scenario, showing that the Holevo bound can be attained under certain regularity conditions. However, the attaining estimator constructed therein depends on the true parameter, and therefore has limited practical interest. Meanwhile, the need for a general, attainable bound on multiparameter quantum estimation is increasing, as more and more applications are being investigated [20,21,22,23,24].

In this work we explore a new route to the study of precision limits in quantum estimation. This new route allows us to prove the asymptotic attainability of the Holevo bound in generic scenarios, to extend its validity to a broader class of estimators, and to derive a new set of attainable precision bounds. We adopt the condition of local asymptotic covariance [18] which is less restrictive than the unbiasedness condition [5] assumed in the derivation of the Holevo bound. Under local asymptotic covariance, we characterize the MSE of the limiting distribution, namely the distribution of the estimator’s rescaled deviation from the true value of the parameter in the asymptotic limit of \(n\rightarrow \infty \).

Our contribution can be divided into two parts, the attainability of the Holevo bound and the proof that the Holevo bound still holds under the weaker condition of local asymptotic covariance. To show the achievability part, we employ quantum local asymptotic normality (Q-LAN), a useful characterization of n-copy d-dimensional (qudit) states in terms of multimode Gaussian states. The qubit case was derived in [14, 15], and the case of full parametric models with non-degenerate spectrum was derived by Kahn and Guta [16, 17]. Here we extend this characterization to a larger class of models, called D-invariant models, using a technique of symplectic diagonalization. For models that are not D-invariant, we derive an achievable bound, expressed in terms of a quantum Fisher information-like quantity that can be straightforwardly evaluated. Whenever the model consists of qudit states with non-degenerate spectrum, this quantity turns out to be equal to the quantity in the Holevo bound [5]. Our evaluation of the convergence is uniform on compact sets and includes an estimate of the order of convergence, which will allow us to prove the achievability of the bound even in the global setting.

We stress that, until now, the most general proof of the Holevo bound required the condition of local unbiasedness. In particular, no previous study showed the validity of the Holevo bound under the weaker condition of local asymptotic covariance in the multiparameter scenario. To avoid employing the (local) unbiasedness condition, we focus on the discretized version of the RLD Fisher information matrix, introduced by Tsuda and Matsumoto [25]. Using this version of the RLD Fisher information matrix, we manage to handle the local asymptotic covariance condition and to show the validity of the Holevo bound in this broader scenario. Remarkably, the validity of the bound does not require finite-dimensionality of the system or non-degeneracy of the states in the model. Our result also provides a simpler way of evaluating the Holevo bound, whose original expression involved a difficult optimization over a set of operators.

The advantage of local asymptotic covariance over local unbiasedness is the following. For practical applications, the estimator needs to attain the lower bound globally, i.e., at all points in the parameter set. However, it is quite difficult to meet this desideratum under the condition of local unbiasedness, even if we employ a two-step method based on a first rough estimate of the state, followed by the measurement that is optimal in the neighbourhood of the estimate. In this paper, we construct a locally asymptotically covariant estimator that achieves the Holevo bound at every point, for any qudit submodel except those with degenerate states. Our construction proceeds in two steps. In the first step, we perform a full tomography of the state, using the protocol proposed in [26]. In the second step, we implement a locally optimal estimator based on Q-LAN [16, 17]. The two-step estimator works even when the estimated parameter is not assumed to be in a local neighbourhood of the true value. The key tool to prove this property is our precise evaluation of the optimal local estimator, which is uniform on compact sets and includes an estimate of the order of convergence. Our method can be extended from the MSE to arbitrary cost functions. A comparison between the approach adopted in this work (in green) and conventional approaches to quantum state estimation (in blue) can be found in Fig. 1.

Besides the attainability of the Holevo bound, the method can be used to derive a broad class of bounds for quantum state estimation. Under suitable assumptions, we characterize the tail of the limiting distribution, providing a bound on the probability that the estimate falls outside a confidence region. The limiting distribution is a good approximation of the (actual) probability distribution of the estimator, up to a term vanishing with n. Then, we derive a bound for quantum estimation with nuisance parameters, i.e. parameters that are not of interest to the experimenter but may affect the estimation of the other parameters. For instance, the strength of noise in a phase estimation scenario can be regarded as a nuisance parameter. Our bound applies also to arbitrary estimation models, thus extending nuisance parameter bounds derived for specific cases (see, e.g., [27,28,29]). In the final part of the paper, the above bounds are illustrated in concrete examples, including the joint measurement of two qubit observables, the estimation of qubit states in the presence of amplitude damping noise, and noisy multiphase estimation.

Fig. 1

Comparison between the approach of this work (in green) and the traditional approach of quantum state estimation (in blue). In the traditional approach, one derives precision bounds based on the probability distribution function (PDF) for measurements on the original set of quantum states. The bounds are evaluated in the large n limit and the task is to find a sequence of measurements that attains the bound in the asymptotic limit. In this work, we first characterize the limiting distribution and then work out a bound in terms of the limiting distribution. This construction also provides the optimal measurement in the limiting scenario, which can be used to prove the asymptotic attainability of the bound. The analysis of the limiting distribution also provides tail bounds, which approximate the tail bounds for finite n up to a small correction, under the assumption that the cost function and the model satisfy a certain relation (see Theorem 9)

The remainder of the paper is structured as follows. In Sect. 2 we introduce the main ideas in the one-parameter case. Our discussion of the one-parameter case requires no regularity condition for the parametric model. Then we devote several sections to introducing and deriving tools for multiparameter estimation. In Sect. 3, we briefly review the Holevo bound and Gaussian states, and derive some relations that will be useful in the rest of the paper. In Sect. 4, we introduce Q-LAN. In Sect. 5 we introduce the \(\epsilon \)-difference RLD Fisher information matrix, which will be a key tool for deriving our bounds in the multiparameter case. In Sect. 6, we derive the general bound on the precision of multiparameter estimation. In Sect. 7, we address state estimation in the presence of nuisance parameters and derive a precision bound for this scenario. Section 8 provides bounds on the tail probability. In Sect. 9, we extend our results to global estimation and to generic cost functions. In Sect. 10, the general method is illustrated through examples. The conclusions are drawn in Sect. 11.

Remark on the notation In this paper, we use \(z^*\) for the complex conjugate of \(z\in \mathbb {C}\) and \(A^\dag \) for the Hermitian conjugate of an operator A. For convenience of the reader, we list other frequently appearing notations and their definitions in Table 1.

Table 1 Table of terms and notations

2 Precision Bound Under Local Asymptotic Covariance: One-Parameter Case

In this section, we discuss estimation of a single parameter under the local asymptotic covariance condition, without any assumption on the parametric model.

2.1 Cramér–Rao inequality without regularity assumptions

Consider a one-parameter model \(\mathcal {M}\), of the form

$$\begin{aligned} \mathcal {M} = \left\{ \rho _t \right\} _{ t \in \mathsf {\Theta }} \end{aligned}$$
(1)

where \(\mathsf {\Theta }\) is a subset of \(\mathbb {R}\). In the literature it is typically assumed that the parametrization is differentiable. When this is the case, one can define the symmetric logarithmic derivative operator (SLD, for short) at \(t_0\) via the equation

$$\begin{aligned} \frac{d \rho _{{t_0}}}{d t_0}= \frac{1}{2}\left( \rho _{{t}_0}{L}_{t_0}+{L}_{t_0}\rho _{{t}_0}\right) . \end{aligned}$$
(2)

Then, the SLD Fisher information is defined as

$$\begin{aligned} J_{{t_0}}:=\mathrm{Tr}\,\left[ \rho _{{t_0}} L_{t_0}^2\right] \, . \end{aligned}$$
(3)

The SLD \(L_{t_0}\) is not unique in general, but the SLD Fisher information \(J_{{t_0}}\) is uniquely defined because it does not depend on the choice of the SLD \(L_{t_0}\) among the operators satisfying (2). When the parametrization is \(C^1\)-continuous and \(\epsilon >0\) is a small number, one has

$$\begin{aligned} F\left( \rho _{t_0} ||\rho _{t_0+\epsilon }\right) = 1-\frac{J_{t_0}}{8}\epsilon ^2+ o(\epsilon ^2) \, , \end{aligned}$$
(4)

where

$$\begin{aligned} F(\rho \Vert \rho '):= \mathrm{Tr}\,|\sqrt{\rho }\sqrt{\rho '}| \end{aligned}$$
(5)

is the fidelity between two density matrices \(\rho \) and \(\rho '\). It is called the Bhattacharyya or Hellinger coefficient in the classical case [30, 31].

Here we do not assume that the parametrization (1) is differentiable. Hence, the SLD Fisher information cannot be defined by (3). Instead, following the intuition of (4), we define the SLD Fisher information \(J_{t_0}\) as the limit

$$\begin{aligned} J_{t_0}:=\liminf _{\epsilon \rightarrow 0}\frac{8 \big [ \,1-F\left( \rho _{t_0} ||\rho _{t_0+\epsilon }\right) \big ]}{\epsilon ^2 } \, . \end{aligned}$$
(6)
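
The definition (6) can be checked numerically. The following is a minimal sketch, not from the paper: it assumes NumPy and SciPy and an illustrative qubit rotation model, estimates \(J_{t_0}\) by a finite difference of the fidelity as in (6), and compares the result with the standard expression (3), with the SLD obtained by solving the Lyapunov-type equation (2).

```python
# Minimal numerical sketch (illustrative, not from the paper): compare the
# fidelity-based definition (6) of the SLD Fisher information with the
# standard expression (3), for the qubit model rho_t = e^{-itZ/2} rho_0 e^{itZ/2}.
import numpy as np
from scipy.linalg import sqrtm, solve_sylvester

rho0 = np.array([[0.8, 0.3], [0.3, 0.2]], dtype=complex)  # full-rank qubit state

def rho(t):
    U = np.diag([np.exp(-0.5j * t), np.exp(0.5j * t)])    # e^{-itZ/2}
    return U @ rho0 @ U.conj().T

def fidelity(r, s):
    # F(rho||sigma) = Tr |sqrt(rho) sqrt(sigma)|, Eq. (5); trace norm via SVD
    return np.sum(np.linalg.svd(sqrtm(r) @ sqrtm(s), compute_uv=False)).real

t0, eps = 0.3, 1e-4
J_fidelity = 8 * (1 - fidelity(rho(t0), rho(t0 + eps))) / eps**2

# SLD from Eq. (2): (rho L + L rho)/2 = d rho/dt, a Sylvester/Lyapunov equation
drho = (rho(t0 + eps) - rho(t0 - eps)) / (2 * eps)
L = solve_sylvester(rho(t0) / 2, rho(t0) / 2, drho)
J_sld = np.trace(rho(t0) @ L @ L).real                     # Eq. (3)

print(J_fidelity, J_sld)  # the two values agree up to small discretization error
```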

In the n-copy case, we have the following lemma:

Lemma 1

$$\begin{aligned} \liminf _{n \rightarrow \infty } \frac{8\left( 1-F\left( \rho _{t_0}^{\otimes n}||\rho _{t_0+\frac{\epsilon }{\sqrt{n}}}^{\otimes n}\right) \right) }{\epsilon ^2} {=} \frac{8\left( 1- e^{-\frac{J_{t_0} \epsilon ^2}{8}} \right) }{\epsilon ^2}. \end{aligned}$$
(7)

Proof

Using the multiplicativity of the fidelity under tensor products, \(F(\rho ^{\otimes n}\Vert \sigma ^{\otimes n})=F(\rho \Vert \sigma )^n\), together with the definition (6), we have

$$\begin{aligned}&\liminf _{n \rightarrow \infty } \frac{8\left( 1-F\left( \rho _{t_0}^{\otimes n}||\rho _{t_0+\frac{\epsilon }{\sqrt{n}}}^{\otimes n}\right) \right) }{\epsilon ^2} \nonumber \\&\quad = \liminf _{n \rightarrow \infty } \frac{8\left( 1- \left( 1-\frac{J_{t_0} \epsilon ^2}{8n} +o(\frac{ \epsilon ^2}{n}) \right) ^n\right) }{\epsilon ^2} {=} \frac{8\left( 1- e^{-\frac{J_{t_0} \epsilon ^2}{8}} \right) }{\epsilon ^2}. \end{aligned}$$
(8)

In other words, the SLD Fisher information of the n-copy model remains constant in n if we replace \(\epsilon \) by \(\epsilon /\sqrt{n}\).
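
Lemma 1 can also be checked numerically. The sketch below (with the same illustrative assumptions as above: NumPy/SciPy and a qubit rotation model) uses the multiplicativity of the fidelity to evaluate the left-hand side of (7) for growing n and compares it with the right-hand side.

```python
# Numerical check of Lemma 1 (illustrative): the rescaled n-copy infidelity
# approaches 8(1 - e^{-J eps^2/8})/eps^2 as n grows, using
# F(rho^{(x)n} || sigma^{(x)n}) = F(rho||sigma)^n.
import numpy as np
from scipy.linalg import sqrtm

rho0 = np.array([[0.8, 0.3], [0.3, 0.2]], dtype=complex)

def rho(t):
    U = np.diag([np.exp(-0.5j * t), np.exp(0.5j * t)])
    return U @ rho0 @ U.conj().T

def fidelity(r, s):
    return np.sum(np.linalg.svd(sqrtm(r) @ sqrtm(s), compute_uv=False)).real

t0, eps, h = 0.3, 2.0, 1e-4
J = 8 * (1 - fidelity(rho(t0), rho(t0 + h))) / h**2        # Eq. (6)
limit = 8 * (1 - np.exp(-J * eps**2 / 8)) / eps**2          # RHS of Eq. (7)
for n in (10, 100, 10_000):
    Fn = fidelity(rho(t0), rho(t0 + eps / np.sqrt(n))) ** n
    print(n, 8 * (1 - Fn) / eps**2, limit)                  # LHS -> RHS of (7)
```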

To estimate the parameter \(t\in \mathsf {\Theta }\), we perform on the input state a quantum measurement, which is mathematically described by a positive operator valued measure (POVM) with outcomes in \(\mathcal {X}\subset \mathbb {R}\). An outcome x is then mapped to an estimate of t by an estimator \(\hat{t}(x)\). It is often assumed that the measurement is unbiased, in the following sense: a POVM M on a single input copy is called unbiased when

$$\begin{aligned} \int _\mathcal{{X}} \hat{t}(x) \, \mathrm{Tr}\,\left[ \rho _{t} \,M(d {x})\right] =t, \qquad \forall t\in \mathsf {\Theta }. \end{aligned}$$
(9)

For a POVM M, we define the mean square error (MSE) \(V_{t}(M)\) as

$$\begin{aligned} V_{{t}}(M) :=\int _\mathcal{{X}} \left( \hat{t}(x)-t\right) ^2 \mathrm{Tr}\,\left[ \, \rho _{{t}} M(d{x}) \,\right] \, . \end{aligned}$$
(10)

Then, we have the fidelity version of the Cramér–Rao inequality:

Theorem 1

For an unbiased measurement M satisfying

$$\begin{aligned} \int _{\mathcal{X}} \hat{t}(x)P_t(dx)=t \end{aligned}$$
(11)

for any t, where \(P_t(\mathsf {B}):=\mathrm{Tr}\,\left[ \rho _t M(\mathsf {B})\right] \), we have

$$\begin{aligned} \frac{1}{2} \big [ \, V_{t_0}(M)+V_{t_0+\epsilon }(M)+\epsilon ^2 \, \big ] \ge \frac{\epsilon ^2}{8 \big [ \, 1-F(\rho _{t_0} ||\rho _{t_0+\epsilon }) \,\big ]} \, . \end{aligned}$$
(12)

When \(\lim _{\epsilon \rightarrow 0}V_{t_0+\epsilon }(M)=V_{t_0}(M)\), taking the limit \(\epsilon \rightarrow 0\), we have

$$\begin{aligned} V_{t_0}(M) \ge J_{t_0}^{-1} . \end{aligned}$$
(13)

The proof uses the notion of fidelity between two classical probability distributions: for two given distributions P and Q on a probability space \(\mathcal{X}\), we define the fidelity \(F(P\Vert Q)\) as follows. Let \(f_P\) and \(f_Q\) be the Radon–Nikodým derivatives of P and Q with respect to \(P+Q\), respectively. Then, the fidelity \(F(P\Vert Q)\) can be defined as

$$\begin{aligned} F(P\Vert Q) : = \int _{\mathcal{X}} \sqrt{f_P(x)}\sqrt{f_Q(x)} (P+Q)(dx)\, . \end{aligned}$$
(14)

With the above definition, the fidelity satisfies an information processing inequality: for every classical channel \(\mathcal {G}\), one has \(F(\mathcal {G} (P) \Vert \mathcal {G}(Q)) \ge F( P\Vert Q)\). For a family of probability distributions \(\{P_t\}_{t\in \mathsf {\Theta }}\), we define the Fisher information as

$$\begin{aligned} J_{t_0}:=\liminf _{\epsilon \rightarrow 0}\frac{8 \big [ \,1-F\left( P_{t_0} ||P_{t_0+\epsilon }\right) \big ]}{\epsilon ^2 } \, . \end{aligned}$$
(15)

When the probability distributions are over a discrete set, their Fisher information coincides with the SLD Fisher information of the corresponding diagonal density matrices.
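
As a sanity check (an illustration we add here, not from the paper; it assumes NumPy), the fidelity-based definition (15) can be verified against the textbook Fisher information for a Bernoulli family:

```python
# Illustrative check of Eq. (15): for the Bernoulli family P_t = (t, 1-t),
# the fidelity-based Fisher information reproduces J_t = 1/(t(1-t)).
import numpy as np

def F(p, q):
    # classical fidelity (Bhattacharyya coefficient), Eq. (14)
    return np.sum(np.sqrt(p * q))

P = lambda t: np.array([t, 1.0 - t])
t, eps = 0.3, 1e-5
J_fidelity = 8 * (1 - F(P(t), P(t + eps))) / eps**2
print(J_fidelity, 1 / (t * (1 - t)))   # both close to 4.7619
```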

Proof of Theorem 1

Without loss of generality, we assume \(t_0=0\). We define the probability distribution \(P_t\) by \(P_t(B):= \mathrm{Tr}\,\left[ \,\rho _t M(B) \,\right] \). Then, the information processing inequality of the fidelity [32] yields the bound \(F(\rho _{t_0} ||\rho _{t_0+\epsilon }) \le F(P_0\Vert P_\epsilon )\). Hence, it is sufficient to show (12) for the probability distribution family \(\{P_t\}\).

Let \(f_0\) and \(f_\epsilon \) be the Radon–Nikodým derivatives of \(P_0\) and \(P_{\epsilon }\) with respect to \(P_{0}+P_{\epsilon }\). Denoting the estimate by \(\hat{t}\), we have

$$\begin{aligned} V_{\epsilon } (\hat{t}) =&\int _{\mathcal{X}} (\hat{t}(x)-\epsilon )^2 P_{\epsilon }(d x) = \int _{\mathcal{X}} \hat{t}(x)^2 P_{\epsilon }(d x) -2\epsilon \int _{\mathcal{X}} \hat{t}(x) P_{\epsilon }(d x) +\epsilon ^2 \\ =&\int _{\mathcal{X}} \hat{t}(x)^2 P_{\epsilon }(d x) -2\epsilon ^2 +\epsilon ^2 = \int _{\mathcal{X}} \hat{t}(x)^2 P_{\epsilon }(d x) -\epsilon ^2 , \end{aligned}$$

and therefore

$$\begin{aligned} 2 V_{0} (\hat{t}) +2 V_{\epsilon } ( \hat{t}) +2\epsilon ^2&= \int _{\mathcal{X}}2 \hat{t}(x)^2 (P_{0}(d x) +P_{\epsilon }(d x) )\nonumber \\&= \int _{\mathcal{X}}2 \hat{t}(x)^2 (f_0(x)+f_\epsilon (x)) (P_{0}+P_{\epsilon })(d x) \nonumber \\&\ge \int _{\mathcal{X}} \hat{t}(x)^2 (\sqrt{f_0(x)}+\sqrt{f_\epsilon (x)})^2 (P_{0}+P_{\epsilon })(d x) . \end{aligned}$$
(16)

Also, (14) implies the relation

$$\begin{aligned} \int _{\mathcal{X}} (\sqrt{f_0(x)}-\sqrt{f_\epsilon (x)})^2 (P_{0}+P_{\epsilon })(d x) =2-2F(P_0\Vert P_\epsilon ). \end{aligned}$$
(17)

Hence, the Cauchy–Schwarz inequality implies

$$\begin{aligned}&\int _{\mathcal{X}}\hat{t}(x)^2 (\sqrt{f_0(x)}+\sqrt{f_\epsilon (x)})^2 (P_{0}+P_{\epsilon })(d x) \cdot \int _{\mathcal{X}} (\sqrt{f_0(x)}-\sqrt{f_\epsilon (x)})^2 (P_{0}+P_{\epsilon })(d x)\nonumber \\&\quad \ge \Big ( \int _{\mathcal{X}}\hat{t}(x) (\sqrt{f_0(x)}+\sqrt{f_\epsilon (x)}) (\sqrt{f_0(x)}-\sqrt{f_\epsilon (x)}) (P_{0}+P_{\epsilon })(d x) \Big )^2\nonumber \\&\quad = \Big ( \int _{\mathcal{X}}\hat{t}(x) (f_0(x)-f_\epsilon (x)) (P_{0}+P_{\epsilon })(d x) \Big )^2\nonumber \\&\quad = \Big ( \int _{\mathcal{X}} \hat{t}(x) (P_{0}-P_{\epsilon })(d x) \Big )^2 =\epsilon ^2. \end{aligned}$$
(18)

Combining (16), (17), and (18) we have (12). \(\quad \square \)

2.2 Local asymptotic covariance

When many copies of the state \(\rho _t\) are available, the estimation of t can be reduced to a local neighbourhood of a fixed point \(t_0 \in \mathsf {\Theta }\). Motivated by Lemma 1, we adopt the following parametrization of the n-copy state

$$\begin{aligned} \rho _{t_0,t}^{n} :=\rho _{t_0+\frac{t}{\sqrt{n}}}^{\otimes n} \, , \qquad t\in \mathsf {\Delta }_n := \sqrt{n} \, ( \mathsf {\Theta }-t_0) \, , \end{aligned}$$
(19)

having used the notation \(a \, \mathsf {\Theta }+ b : = \{ a x+ b \, | x\in \mathsf {\Theta }\}\), for two arbitrary constants \(a, b\in \mathbb {R}\). With this parametrization, the local n-copy model is \( \left\{ \rho _{t_0,t}^{n} \right\} _{t \in \mathsf {\Delta }_n} \). Note that, for every \(t\in \mathbb {R}\), there exists a sufficiently large n such that \( \mathsf {\Delta }_n\) contains t. As a consequence, one has \( \bigcup _{n\in \mathbb {N}}\, \mathsf {\Delta }_n = \mathbb {R}\).

Assuming \(t_0\) to be known, the task is to estimate the local parameter \(t\in \mathbb {R}\), by performing a measurement on the n-copy state \(\rho _{t_0,t}^n\) and then mapping the obtained data to an estimate \(\hat{t}_n\). The whole estimation strategy can be described by a sequence of POVMs \(\mathfrak {m}:=\{M_n\}\). For every Borel set \(\mathsf {B}\subset \mathbb {R}\), we adopt the standard notation

$$\begin{aligned} M_n(\mathsf {B}):=\int _{\hat{t}_n\in \mathsf {B}}~M_n(d\hat{t}_n) \, . \end{aligned}$$

In the existing works on quantum state estimation, the error criterion is defined in terms of the difference between the global estimate \(t_0+\frac{\hat{t}_n}{\sqrt{n}}\) and the global true value \(t_0+\frac{t}{\sqrt{n}}\). Instead, here we focus on the difference between the local estimate \(\hat{t}_n\) and the true value of the local parameter t. With this aim in mind, we consider the probability distribution

$$\begin{aligned} \wp ^{n}_{t_0,t|M_n}(\mathsf {B}) :=\mathrm{Tr}\,\left[ \rho _{t_0,t}^{n} M_n \left( \frac{\mathsf {B}}{\sqrt{n} }+{t}_0\right) \right] \, . \end{aligned}$$
(20)

We focus on the behavior of \(\wp ^{n}_{t_0,t|M_n}\) in the large n limit, assuming the following condition:

Condition 1

(Local asymptotic covariance for a single parameter). A sequence of measurements \(\mathfrak {m}=\{M_n\}\) satisfies local asymptotic covariance when

1.

    The distribution \(\wp ^{n}_{t_0,t|M_n}\) (20) converges to a distribution \(\wp _{t_0,t| \mathfrak {m}}\), called the limiting distribution, namely

    $$\begin{aligned} \wp _{t_0,t| \mathfrak {m}}\left( \mathsf {B}\right) := \lim _{n \rightarrow \infty }\wp ^{n}_{t_0,t|M_n}\left( \mathsf {B}\right) \end{aligned}$$
    (21)

    for any Borel set \(\mathsf {B}\).

2.

    The limiting distribution satisfies the relation

    $$\begin{aligned} \wp _{t_0,t| \mathfrak {m}}(\mathsf {B}+t)=\wp _{t_0,0| \mathfrak {m}}(\mathsf {B}) \end{aligned}$$
    (22)

    for any \(t\in \mathbb {R}\), which is equivalent to the condition

    $$\begin{aligned} \lim _{n \rightarrow \infty } \mathrm{Tr}\,\left[ \rho _{t_0,t}^{n} M_n \left( \frac{\mathsf {B}}{\sqrt{n} }+{t}_0+\frac{t}{\sqrt{n} }\right) \right] = \lim _{n \rightarrow \infty } \mathrm{Tr}\,\left[ \rho _{t_0,0}^{n} M_n \left( \frac{\mathsf {B}}{\sqrt{n} }+{t}_0\right) \right] . \end{aligned}$$
    (23)

Using the limiting distribution, we can faithfully approximate the tail probability as

$$\begin{aligned} \mathbf {Prob}\left\{ \left| \hat{t}_n-t\right| > \epsilon \right\}&= \wp _{t_0,t|\mathfrak {m}}((-\infty ,-\epsilon )\cup (\epsilon ,\infty ))+\epsilon _n \end{aligned}$$
(24)

where the \(\epsilon _n\) term vanishes with n for every fixed \(\epsilon \).
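
To make Condition 1 concrete, here is a purely classical illustration (our addition, assuming NumPy): for the Bernoulli model \(p_t = p_0 + t/\sqrt{n}\), the rescaled maximum-likelihood estimator satisfies local asymptotic covariance, with deviation statistics that do not depend on t, as required by Eq. (22).

```python
# Illustration of local asymptotic covariance (classical Bernoulli model):
# the distribution of the deviation t_hat - t is (nearly) independent of t,
# cf. Eq. (22), with variance ~ p0(1-p0) = 1/J_{p0}.
import numpy as np

rng = np.random.default_rng(0)
p0, n, trials = 0.4, 100_000, 20_000

def deviations(t):
    # rescaled ML estimate: t_hat = sqrt(n) * (x_bar - p0); return t_hat - t
    x = rng.binomial(n, p0 + t / np.sqrt(n), size=trials)
    return np.sqrt(n) * (x / n - p0) - t

for t in (0.0, 1.0, 3.0):
    d = deviations(t)
    print(t, round(d.mean(), 3), round(d.var(), 3))  # mean ~ 0, var ~ 0.24
```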

For convenience, one may be tempted to require the existence of a probability density function (PDF) of the limiting distribution \(\wp _{t_0,t| \mathfrak {m}}\). However, the existence of a PDF is already guaranteed by the following lemma.

Lemma 2

When a sequence \(\mathfrak {m}:=\{M_n\}\) of POVMs satisfies local asymptotic covariance, the limiting distribution \(\wp _{t_0,t| \mathfrak {m}} \) admits a PDF, denoted by \(\wp _{t_0,t|\mathfrak {m},d}\); by the covariance property (22), \(\wp _{t_0,t|\mathfrak {m},d}(x)=\wp _{t_0,0|\mathfrak {m},d}(x-t)\).

The proof is provided in “Appendix A”.

2.3 MSE bound for the limiting distribution

As a figure of merit, we focus on the mean square error (MSE) \(V[\wp _{t_0,t| \mathfrak {m}}]\) of the limiting distribution \(\wp _{t_0,t| \mathfrak {m}}\), namely

$$\begin{aligned} V[\wp _{t_0,t| \mathfrak {m}}]:=\int _{-\infty }^{\infty } (\hat{t}-t)^2\, \wp _{t_0,t| \mathfrak {m}}\left( d\hat{t}\right) \, . \end{aligned}$$

Note that local asymptotic covariance implies that the MSE is independent of t.

The main result of the section is the following theorem:

Theorem 2

(MSE bound for single-parameter estimation). When a sequence \(\mathfrak {m}:=\{M_n\}\) of POVMs satisfies local asymptotic covariance, the MSE of its limiting distribution is lower bounded as

$$\begin{aligned} V[\wp _{t_0,t| \mathfrak {m}}] \ge J^{-1}_{t_0} \, , \end{aligned}$$
(25)

where \(J_{t_0}\) is the SLD Fisher information of the model \(\{\rho _t\}_{t\in \mathsf {\Theta }}\). The PDF of \(\wp _{t_0,t| \mathfrak {m}}\) is upper bounded by \(\sqrt{J_{t_0}}\). When the PDF of \(\wp _{t_0,t| \mathfrak {m}}\) is differentiable with respect to t, equality in (25) holds if and only if \(\wp _{t_0,t| \mathfrak {m}}\) is the normal distribution with average t and variance \(J_{t_0}^{-1}\).

Proof of Theorem 2

When the integral \(\int _{\mathbb {R}} \hat{t} \wp _{t_0,0|\mathfrak {m}}(d \hat{t})\) does not converge, \(V[\wp _{t_0,t| \mathfrak {m}}]\) is infinite and satisfies (25). Hence, we can assume that the above integral converges. Further, we can assume that the outcome \(\hat{t}\) satisfies the unbiasedness condition \(\int _{\mathbb {R}} \hat{t} \wp _{t_0,t|\mathfrak {m}}(d \hat{t})=t\). Otherwise, we can replace \(\hat{t}\) by \(\hat{t}_0:=\hat{t}-\int _{\mathbb {R}} \hat{t}' \wp _{t_0,0|\mathfrak {m}}(d \hat{t}')\), because the estimator \(\hat{t}_0\) has an MSE no larger than that of \(\hat{t}\) and satisfies the unbiasedness condition due to the covariance condition. Hence, Theorem 1 guarantees

$$\begin{aligned} V[\wp _{t_0,t| \mathfrak {m}}] =V[\wp _{t_0,0| \mathfrak {m}}]&\ge \Big (\liminf _{\epsilon \rightarrow 0} \frac{8\left( 1- F(\wp _{t_0,0|\mathfrak {m}}||\wp _{t_0,\epsilon |\mathfrak {m}})\right) }{\epsilon ^2} \Big )^{-1}. \end{aligned}$$
(26)

Applying Lemma 20 to \(\{\wp _{t_0,t|\mathfrak {m}}\}\), we have

$$\begin{aligned}&\liminf _{\epsilon \rightarrow 0} \frac{8\left( 1- F(\wp _{t_0,0|\mathfrak {m}}|| \wp _{t_0,\epsilon |\mathfrak {m}})\right) }{\epsilon ^2}\nonumber \\&\quad {\mathop {\le }\limits ^{(a)}} \liminf _{\epsilon \rightarrow 0}\liminf _{n \rightarrow \infty } \frac{8\left( 1-F\left( \wp ^n_{t_0,0|M_n}||\wp ^n_{t_0,\epsilon |M_n} \right) \right) }{\epsilon ^2}\nonumber \\&\quad {\mathop {\le }\limits ^{(b)}} \liminf _{\epsilon \rightarrow 0}\liminf _{n \rightarrow \infty } \frac{8\left( 1-F\left( \rho _{t_0}^{\otimes n}||\rho _{t_0+\frac{\epsilon }{\sqrt{n}}}^{\otimes n}\right) \right) }{\epsilon ^2}\nonumber \\&\quad {\mathop {=}\limits ^{(c)}} \liminf _{\epsilon \rightarrow 0} \frac{8\left( 1- e^{-\frac{J_{t_0} \epsilon ^2}{8}} \right) }{\epsilon ^2} = J_{t_0} \, . \end{aligned}$$
(27)

The inequality (a) holds by Lemma 20 from “Appendix B”, and the inequality (b) comes from the data-processing inequality of the fidelity. The equation (c) follows from Lemma 1. Finally, substituting Eq. (27) into Eq. (26), we have the desired bound (25).

Now, we denote the PDF of \(\wp _{t_0,0|\mathfrak {m}}\) by \(\wp _{t_0,0|\mathfrak {m},d}\). The proof of Lemma 2 in “Appendix A” shows that we can apply Lemma 19 to \(\{\wp _{t_0,t| \mathfrak {m}}\}_t\). Since the Fisher information of \(\{\wp _{t_0,t| \mathfrak {m}}\}_t\) is upper bounded by \( J_{t_0}\), this application guarantees that \(\wp _{t_0,t| \mathfrak {m},d}(x)\le \sqrt{J_{t_0}}\) for \(x \in \mathbb {R}\).

When the PDF \(\wp _{t_0,t| \mathfrak {m},d}\) is differentiable, to derive the equality condition in Eq. (25), we show (26) in a different way. Let \(l_{t_0,t}(x)\) be the logarithmic derivative of \(\wp _{t_0,t|\mathfrak {m},d}(x)\), defined as \(l_{t_0,t}(x):=\frac{\partial \log \wp _{t_0,t|\mathfrak {m},d}(x)}{\partial t} =\frac{\partial \log \wp _{t_0,0|\mathfrak {m},d}(x-t)}{\partial t}\), so that \(l_{t_0,t}(x)\,\wp _{t_0,t|\mathfrak {m},d}(x) =\frac{\partial \wp _{t_0,t|\mathfrak {m},d}(x)}{\partial t}\). By the Cauchy–Schwarz inequality, we have

$$\begin{aligned} V[\wp _{t_0,0| \mathfrak {m}}]&\ge \frac{\left| \int _{-\infty }^{\infty } \hat{x}\,l_{t_0,0}(\hat{x}) \,\wp _{t_0,0|\mathfrak {m}}(\hat{x})d\hat{x}\right| ^2}{\int _{-\infty }^{\infty } l_{t_0,0}^2(\hat{x}) \wp _{t_0,0|\mathfrak {m}}(\hat{x})d\hat{x}}. \end{aligned}$$
(28)

The numerator on the right hand side of Eq. (28) can be evaluated by noticing that

$$\begin{aligned} \int _{-\infty }^{\infty } \hat{x}\,l_{t_0,0}(\hat{x})\,\wp _{t_0,0|\mathfrak {m}}(\hat{x}) d\hat{x}&=\left. \frac{\partial }{\partial x}\int _{-\infty }^{\infty } \hat{x}\,\wp _{t_0,x|\mathfrak {m}}(\hat{x}) d\hat{x}\right| _{x=0}. \end{aligned}$$

By local asymptotic covariance, this quantity can be evaluated as

$$\begin{aligned} \int _{-\infty }^{\infty } \hat{x}\,l_{t_0,0}(\hat{x})\,\wp _{t_0,0|\mathfrak {m}}(\hat{x})d\hat{x} =\left. \frac{\partial }{\partial x}\left[ \int _{-\infty }^{\infty }( \hat{x}+x)\,\wp _{t_0,0|\mathfrak {m}}(\hat{x}) d\hat{x}\right] \right| _{x=0}=1. \end{aligned}$$
(29)

Since the numerator on the right-hand side of (28) equals 1 by (29), and the denominator \(\int _{-\infty }^{\infty } l_{t_0,0}^2(\hat{x}) \wp _{t_0,0|\mathfrak {m}}(\hat{x})d\hat{x}\) equals the Fisher information of the family \(\{\wp _{t_0,t|\mathfrak {m}}\}_t\), the bound (28) coincides with (26). The equality in Eq. (28) holds if and only if \(\int _{-\infty }^{\infty } \hat{x}^2 \wp _{t_0,0|\mathfrak {m},d}( \hat{x})d \hat{x} =J^{-1}_{t_0}\) and \(\frac{d}{dx} \log \wp _{t_0,0|\mathfrak {m},d}(\hat{x})\) is proportional to \(\hat{x}\), which implies that \(\wp _{t_0,0|\mathfrak {m}}\) is the normal distribution with average zero and variance \(J^{-1}_{t_0}\). \(\quad \square \)

The RHS of (25) can be regarded as the limiting distribution version of the SLD quantum Cramér–Rao bound. Note that, when the limiting PDF is differentiable and the bound is attained, the probability distribution \(\wp ^{n}_{t_0,t|M_n}\) is approximated (in the pointwise sense) by a normal distribution with average t and variance \(J_{t_0}^{-1}\); equivalently, the deviation of the global estimate from the true value is approximately normal with average zero and variance \(\frac{1}{n J_{t_0}}\). Using this fact, we will show that there exists a sequence of POVMs that attains the equality in (25) at all points uniformly. The optimal sequence of POVMs is constructed explicitly in Sect. 6.

2.4 Comparison between local asymptotic covariance and other conditions

We conclude the section by discussing the relation between asymptotic covariance and other conditions that are often imposed on measurements. This subsection is not necessary for understanding the technical results in the next sections and can be skipped at a first reading.

Let us start with the unbiasedness condition. Assuming unbiasedness, one can derive the quantum Cramér–Rao bound on the MSE [5]. Holevo showed the attainability of the quantum Cramér–Rao bound when estimating displacements in Gaussian systems [5].

The disadvantage of unbiasedness is that it is too restrictive, as it is satisfied only by a small class of measurements. Indeed, the unbiasedness condition for the estimator M requires the condition \( \mathrm{Tr}\,\left[ E \frac{d^i \rho _t}{d t^i}\right] \, |_{t=t_0}=0\) for \(i \ge 2\) with \( E:= \int \hat{t} M(d\hat{t})\), as well as the condition \( \mathrm{Tr}\,\left[ E \frac{d \rho _t}{d t}\right] \,|_{t=t_0}=1\). In certain situations, the above conditions might be incompatible. For example, consider a family of qubit states \(\rho _t:=\frac{1}{2}\left( I+\varvec{n}_t\cdot \varvec{\sigma }\right) \). When the Bloch vector \(\varvec{n}_t\) has a non-linear dependence on t and the set of higher order derivatives \(\frac{d^i \rho _t}{d t^i}|_{t=t_0}\) with \(i \ge 2\) spans the space of traceless Hermitian matrices, no unbiased estimator can exist. In contrast, local asymptotic covariance involves only the first derivative \(\frac{d \rho _t}{d t}|_{t=t_0}\), because the contribution of higher order derivatives to the variable \(\hat{t}_n\) has order \(o\left( \frac{1}{\sqrt{n}}\right) \) and vanishes under the condition of local asymptotic covariance.

One can see that the unbiasedness condition implies local asymptotic covariance with the parametrization \(\rho _{t_0+\frac{t}{\sqrt{n}}}\) in the following sense. When we have n (more than one) input copies, we can construct an unbiased estimator by applying a single-copy unbiased estimator M satisfying Eq. (9) to each copy as follows. Given the outcomes \(x_1,\ldots ,x_n\), we take the average \( \sum _{i=1}^n \frac{x_i}{n}\), which satisfies the unbiasedness condition (9) for the parameter t as well. When the single-copy estimator M has variance v at \(t_0\), which is lower bounded by the Cramér–Rao inequality, this estimator has variance v / n at \(t_0\). In addition, the average of the obtained data [cf. Eq. (30)] satisfies local asymptotic covariance, because the rescaled estimator \(\sqrt{n} ( (\sum _{i=1}^n \frac{x_i}{n})-t_0)\) follows a Gaussian distribution with variance v in the large n limit by the central limit theorem; the center of the Gaussian distribution is pinned at the true value of the parameter by unbiasedness; the shape of the Gaussian is independent of the value t and depends only on \(t_0\); thus local asymptotic covariance holds.

The above discussion can be extended to the multiple-copy case as follows. Suppose that \(M_{\ell }\) is an unbiased measurement for the \(\ell \)-copy state \(\rho _{t_0+\frac{t}{\sqrt{n}}}^{\otimes \ell }\) with respect to the parameter \(t_0+\frac{t}{\sqrt{n}}\), where \(\ell \) is an arbitrary finite integer. From the measurement \(M_{\ell }\) we can construct a measurement for the n-copy state with \(n= k\ell +i\) and \(i <\ell \) by applying the measurement \(M_\ell \) a total of k times and discarding the remaining i copies. In the following, we consider the limit where the total number n tends to infinity, while \(\ell \) is kept fixed. When the variance of \(M_\ell \) at \(t_0\) is \(v/\ell \), the average

$$\begin{aligned} \sum _{i=1}^k \frac{x_i}{k} \end{aligned}$$
(30)

of the k obtained data \(x_1, \ldots , x_k\) satisfies local asymptotic covariance, i.e., the rescaled estimator \(\sqrt{n} ((\sum _{i=1}^k \frac{x_i}{k})-t_0)\) follows a Gaussian distribution with variance v in the large n limit. Therefore, for any unbiased estimator, there exists an estimator satisfying local asymptotic covariance that has the same variance.

Another common condition, less restrictive than unbiasedness, is local unbiasedness. This condition depends on the true parameter \(t_0\) and consists of the following two requirements

$$\begin{aligned} \int \hat{t}\mathrm{Tr}\,\left[ \, \rho _{t}^{\otimes \ell }\, M_\ell (d\hat{t})\, \right] \, \Big |_{t=t_0}&=t_0 \end{aligned}$$
(31)
$$\begin{aligned} \frac{d}{d t}\int \hat{t}\mathrm{Tr}\,\left[ \, \rho _{t}^{\otimes \ell } \, M_\ell (d\hat{t})\right] \,\Big |_{t=t_0}&=1\, , \end{aligned}$$
(32)

where \(\ell \) is a fixed, but otherwise arbitrary, integer. The derivation of the quantum Cramér–Rao bound still holds, because it uses only the condition (32). When the parametrization \(\rho _t\) is \(C^1\) continuous, the first derivative \(\frac{d}{d t}\int \hat{t}\mathrm{Tr}\,\left[ \rho _{t}^{\otimes \ell }M_\ell (d\hat{t})\right] \) is continuous at \(t=t_0\), and the local unbiasedness condition at \(t_0\) yields local asymptotic covariance at \(t_0\) in the same way as Eq. (30).

Another relaxation of the unbiasedness condition is asymptotic unbiasedness [11]

$$\begin{aligned}&\lim _{n\rightarrow \infty }\int \hat{t}\mathrm{Tr}\,\left[ \, \rho _{t}^{\otimes n} \, M_n(d\hat{t})\right] =t \qquad&\forall t\in \mathsf {\Theta }\end{aligned}$$
(33)
$$\begin{aligned}&\lim _{n\rightarrow \infty }\frac{d}{d t}\int \hat{t} \, \mathrm{Tr}\,\left[ \, \rho _{t}^{\otimes n}\, M_n(d\hat{t})\right] =1 \qquad&\forall t\in \mathsf {\Theta }\, . \end{aligned}$$
(34)

The condition of asymptotic unbiasedness leads to a precision bound on MSE [34, Chapter 6]. The bound is given by the SLD Fisher information, and therefore it is attainable for Gaussian states. However, no attainable bound for qudit systems has been derived so far under the condition of asymptotic unbiasedness. Interestingly, one cannot directly use the attainability for Gaussian systems to derive an attainability result for qudit systems, despite the asymptotic equivalence between Gaussian systems and qudit systems stated by quantum local asymptotic normality (Q-LAN) (see [16, 17] and Sect. 4.1). The problem is that the error of Q-LAN goes to 0 for large n, but the error in the derivative may not go to zero, and therefore the condition (34) is not guaranteed to hold.

In order to guarantee attainability of the quantum Cramér–Rao bound, one could think of further loosening the condition of asymptotic unbiasedness. An attempt to avoid the problem of the Q-LAN error could be to remove condition (34) and keep only condition (33). This leads to an enlarged class of estimators, called weakly asymptotically unbiased. The problem with these estimators is that no general MSE bound is known to hold at every point. For example, one can find superefficient estimators [35, 36], which violate the Cramér–Rao bound on a set of points. Such a set must be of zero measure in the limit \(n\rightarrow \infty \), but the violation of the bound may occur in a considerably large set when n is finite. In contrast, local asymptotic covariance guarantees the MSE bound (25) at every point t where the local asymptotic covariance condition is satisfied. All these alternative conditions for deriving MSE bounds, discussed here in this subsection, are summarized in Table 2.

Table 2 Alternative conditions for deriving MSE bounds

3 Holevo Bound and Gaussian States Families

3.1 Holevo bound

When studying multiparameter estimation in quantum systems, we need to address the tradeoff between the precision of estimation of different parameters. This is done using two quantum extensions of the Fisher information matrix: the SLD and the right logarithmic derivative (RLD).

Consider a multiparameter family of density operators \(\{\rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\), where \(\mathsf {\Theta }\) is an open set in \(\mathbb {R}^k\), k being the number of parameters. Throughout this section, we assume that \(\rho _{\varvec{t}_0}\) is invertible and that the parametrization is \(C^1\) in all parameters. Then, the SLD \(L_j\) and the RLD \(\tilde{L}_j\) for the parameter \(t_j\) are defined through the following equations

$$\begin{aligned} \frac{\partial \rho _{\varvec{t}}}{\partial t_j}=\frac{1}{2}\left( \rho _{\varvec{t}}{L}_j+{L}_j\rho _{\varvec{t}}\right) , \quad \frac{\partial \rho _{\varvec{t}}}{\partial t_j}=\rho _{\varvec{t}}\tilde{L}_j \, , \end{aligned}$$

see e.g. [5, 6] and [15, Sect. II]. It can be seen from the definitions that the SLD \(L_j\) can always be chosen to be Hermitian, while the RLD \(\tilde{L}_j\) is in general not Hermitian.

The SLD quantum Fisher information matrix \(J_{\varvec{t}}\) and the RLD quantum Fisher information matrix \(\tilde{J}_{\varvec{t}}\) are the \(k\times k\) matrices defined as

$$\begin{aligned} \left( J_{\varvec{t}}\right) _{ij}:=\mathrm{Tr}\,\left[ \frac{\rho _{\varvec{t}}(L_iL_j+L_jL_i)}{2}\right] ,\quad \left( \tilde{J}_{\varvec{t}}\right) _{ij}:=\mathrm{Tr}\,\left[ \tilde{L}_j^\dagger \rho _{\varvec{t}} \tilde{L}_i \right] \, . \end{aligned}$$
(35)

Notice that the SLD quantum Fisher information matrix \(J_{\varvec{t}}\) is a real symmetric matrix, but the RLD quantum Fisher information matrix \(\tilde{J}_{\varvec{t}}\) is not a real matrix in general.

A POVM M is called an unbiased estimator for the family \(\mathcal {S}=\{\rho _{\varvec{t}}\}\) when the relation

$$\begin{aligned} \varvec{t}=E_{\varvec{t}}(M):=\int \varvec{x} \mathrm{Tr}\,\big [ \rho _{\varvec{t}} M(d \varvec{x})\big ] \end{aligned}$$

holds for any parameter \(\varvec{t}\). For a POVM M, we define the mean square error (MSE) matrix \(V_{\varvec{t}}(M)\) as

$$\begin{aligned} \Big ( \, V_{\varvec{t}}(M) \, \Big )_{i,j}:=\int (x_i- t_i) (x_j- t_j) \mathrm{Tr}\,\big [ \, \rho _{\varvec{t}} M(d \varvec{x})\, \big ]. \end{aligned}$$
(36)

It is known that an unbiased estimator M satisfies the SLD type and RLD type of Cramér–Rao inequalities

$$\begin{aligned} V_{\varvec{t}}(M)\ge J_{\varvec{t}}^{-1}, \qquad \mathrm{and} \qquad V_{\varvec{t}}(M)\ge \tilde{J}_{\varvec{t}}^{-1} \, , \end{aligned}$$
(37)

respectively [5]. Since it is not always possible to minimize the MSE matrix under the unbiasedness condition, we minimize the weighted MSE \(\mathrm{tr}\,\big [ W V_{\varvec{t}}(M) \big ]\) for a given weight matrix \(W \ge 0\), where \(\mathrm{tr}\,\) denotes the trace of \(k\times k\) matrices. When a POVM M is unbiased, one has the RLD bound [5]

$$\begin{aligned} \mathrm{tr}\,\big [ W V_{\varvec{t}}(M) \big ] \ge \mathcal {C}_{\mathrm{R},\mathcal{M}}(W,\varvec{t}) \end{aligned}$$
(38)

with

$$\begin{aligned} \mathcal {C}_{\mathrm{R},\mathcal{M}}(W,\varvec{t}) :=&\min \Big \{\mathrm{tr}\,[V W] ~\Big |~ V \ge \tilde{J}^{-1}_{\varvec{t}}\Big \} \nonumber \\ =&\mathrm{tr}\,\Big [ \, W \mathsf{Re} (\tilde{J}_{\varvec{t}})^{-1}\, \Big ] + \mathrm{tr}\,\Big | \sqrt{W} \mathsf{Im} (\tilde{J}_{\varvec{t}})^{-1} \sqrt{W}\Big | \, . \end{aligned}$$
(39)

In particular, when \(W>0\), the lower bound (39) is attained by the matrix \(V= \mathsf{Re} (\tilde{J}_{\varvec{t}})^{-1} + \sqrt{W}^{-1} | \sqrt{W} \mathsf{Im} (\tilde{J}_{\varvec{t}})^{-1} \sqrt{W}|\sqrt{W}^{-1} \).

The RLD bound has a particularly tractable form when the model is D-invariant:

Definition 1

The model \(\{\rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\) is D-invariant at \(\varvec{t}\) when the space spanned by the SLD operators is invariant under the linear map \(\mathcal {D}_{\varvec{t}}\). For any operator X, \(\mathcal {D}_{\varvec{t}}(X)\) is defined via the following equation

$$\begin{aligned} \frac{ \rho _{\varvec{t}}\mathcal {D}_{\varvec{t}}(X)+\mathcal {D}_{\varvec{t}}(X)\rho _{\varvec{t}} }{2}=i[X,\rho _{\varvec{t}}] \end{aligned}$$
(40)

where \([A,B] = AB-BA\) denotes the commutator. When the model is D-invariant at any point, it is simply called D-invariant.

For a D-invariant model, the RLD quantum Fisher information can be computed in terms of the D-matrix, namely the skew-symmetric matrix defined as

$$\begin{aligned} (D_{\varvec{t}})_{j,k}:=i\mathrm{Tr}\,\Big [ \rho _{\varvec{t}}[L_j, L_k] \Big ] \,. \end{aligned}$$
(41)

Precisely, the RLD quantum Fisher information has the expression [5]

$$\begin{aligned} \left( \widetilde{J}_{\varvec{t}}\right) ^{-1}=\left( J_{\varvec{t}}\right) ^{-1} +\frac{i}{2}\left( J_{\varvec{t}}\right) ^{-1}D_{\varvec{t}}\left( J_{\varvec{t}}\right) ^{-1}\, . \end{aligned}$$
(42)

Hence, (39) becomes

$$\begin{aligned} \mathcal {C}_{\mathrm{R},\mathcal{M}}(W,\varvec{t}) = \mathrm{tr}\,\left[ \, W \left( J_{\varvec{t}}\right) ^{-1}\right] + \frac{1}{2} \mathrm{tr}\,\left| \sqrt{W}\left( J_{\varvec{t}}\right) ^{-1}D_{\varvec{t}}\left( J_{\varvec{t}}\right) ^{-1} \sqrt{W}\right| . \end{aligned}$$
(43)
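
As an illustration (our addition, assuming NumPy/SciPy; the point on the Bloch ball is arbitrary), the right-hand side of (43) can be evaluated directly from the SLDs for the full three-parameter qubit model \(\rho _{\varvec{t}}=(I+\varvec{t}\cdot \varvec{\sigma })/2\), which is D-invariant at full-rank points; the second term of (43) quantifies the penalty coming from the incompatibility of the parameters.

```python
# Sketch (illustrative): evaluate the RLD bound (43) for the full qubit model
# rho = (I + t . sigma)/2 at a full-rank point, with weight matrix W = I.
import numpy as np
from scipy.linalg import solve_sylvester, sqrtm

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
t = np.array([0.3, 0.2, 0.5])                       # Bloch vector, |t| < 1
rho = 0.5 * (np.eye(2) + sum(ti * s for ti, s in zip(t, sigma)))

# SLDs from (rho L_j + L_j rho)/2 = d rho/d t_j = sigma_j / 2
L = [solve_sylvester(rho / 2, rho / 2, 0.5 * s) for s in sigma]

J = np.array([[np.trace(rho @ (L[i] @ L[j] + L[j] @ L[i])).real / 2
               for j in range(3)] for i in range(3)])        # Eq. (35)
D = np.array([[(1j * np.trace(rho @ (L[i] @ L[j] - L[j] @ L[i]))).real
               for j in range(3)] for i in range(3)])        # Eq. (41)

W = np.eye(3)
Jinv = np.linalg.inv(J)
M = sqrtm(W) @ Jinv @ D @ Jinv @ sqrtm(W)
C_SLD = np.trace(W @ Jinv).real                              # SLD bound
C_RLD = C_SLD + 0.5 * np.linalg.svd(M, compute_uv=False).sum()  # Eq. (43)
print(C_SLD, C_RLD)
```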

For D-invariant models, the RLD bound is larger, and thus better, than the bound derived by using the SLD Fisher information matrix (the SLD bound). However, when the model is not D-invariant, the RLD bound is not tight; for instance, in the one-parameter case it is common to use the SLD bound instead. Hence, both quantum extensions of the Cramér–Rao bound have advantages and disadvantages.

To unify both extensions, Holevo [5] derived the following bound, which improves the RLD bound when the model is not D-invariant. For a k-component vector \(\varvec{X}\) of operators, define the \(k\times k\) matrix \(Z_{\varvec{t}}(\varvec{X})\) as

$$\begin{aligned} \Big (Z_{\varvec{t}}(\varvec{X})\Big )_{ij}:= \mathrm{Tr}\,\Big [ \rho _{\varvec{t}}X_iX_j \Big ] \, . \end{aligned}$$
(44)

Then, Holevo’s bound is as follows: for any weight matrix W, one has

$$\begin{aligned} \inf _{M \in \mathsf {UB}_{\mathcal{M}}} \mathrm{tr}\,\Big [ W V_{\varvec{t}}(M) \Big ]&\ge \mathcal {C}_{\mathrm{H},\mathcal{M}}(W,\varvec{t}) \nonumber \\&:= \min _{\varvec{X}} \min _V \Big \{\mathrm{tr}\,\big [ W V\big ] ~ \Big |~ V \ge Z_{\varvec{t}}(\varvec{X})\Big \} \end{aligned}$$
(45)
$$\begin{aligned}&= \min _{\varvec{X}} \mathrm{tr}\,\Big [ W \mathsf{Re} (Z_{\varvec{t}}(\varvec{X})) \Big ] + \mathrm{tr}\,\Big |\sqrt{W} \mathsf{Im} (Z_{\varvec{t}}(\varvec{X}))\sqrt{W}\Big | , \end{aligned}$$
(46)

where \(\mathsf {UB}_{\mathcal{M}}\) denotes the set of all unbiased measurements under the model \(\mathcal{M}\), V is a real symmetric matrix, and \(\varvec{X}=(X_i)\) is a k-component vector of Hermitian operators satisfying

$$\begin{aligned} \mathrm{Tr}\,\left. \left[ X_i \frac{\partial \rho _{\varvec{t}}}{\partial t_j} \right] \right| _{\varvec{t}=\varvec{t}_0 } =\delta _{ij}, \ \forall i, j \le k. \end{aligned}$$
(47)

\(\mathcal {C}_{\mathrm{H},\mathcal{M}}(W,\varvec{t})\) is called the Holevo bound. When \(W>0\), there exists a vector \(\varvec{X}\) achieving the minimum in (45). Hence, similar to the RLD case, the equality in (45) holds for \(W>0\) only when

$$\begin{aligned} V_{\varvec{t}}(M) = \mathsf{Re}\, (Z_{\varvec{t}}(\varvec{X}))+ \sqrt{W}^{-1} |\sqrt{W} \mathsf{Im } \, (Z_{\varvec{t}}(\varvec{X}))\sqrt{W}| \sqrt{W}^{-1}. \end{aligned}$$
(48)

Moreover, we have the following proposition.

Proposition 1

([15, Theorem 4]). Let \(\mathcal {S}=\{\rho _{\varvec{t}}\}_{\varvec{t}\in \mathsf {\Theta }}\) be a generic k-parameter qudit model and let \(\mathcal {S}'=\{\rho _{\varvec{t},\varvec{p}}\}_{\varvec{t},\varvec{p}}\) be a \(k'\)-parameter model containing \(\mathcal {S}\) as \(\rho _{\varvec{t}}=\rho _{(\varvec{t},\varvec{0})}\). When \(\mathcal {S}'\) is D-invariant and the inverse \(\tilde{J}_{\varvec{t}'}^{-1} \) of the RLD Fisher information matrix of the model \(\mathcal {S}'\) exists, we have

$$\begin{aligned} \mathcal {C}_{\mathrm{H},\mathcal{M}}(W,\varvec{t})&= \min _{\varvec{X}:\mathcal{M}'} \min _V \Big \{\mathrm{tr}\,\big [W V \big ] ~ \Big |~ V \ge Z_{\varvec{t}}(\varvec{X})\Big \} \end{aligned}$$
(49)
$$\begin{aligned}&=\min _{P}\left\{ \mathrm{tr}\,\left[ P^TWP(J_{\varvec{t}'}^{-1})\right] +\frac{1}{2}\mathrm{tr}\,\left| \sqrt{P^TWP}J_{\varvec{t}'}^{-1} D_{\varvec{t}'}J_{\varvec{t}'}^{-1}\sqrt{P^TWP}\right| \right\} . \end{aligned}$$
(50)

In (49), \(\min _{\varvec{X}:\mathcal{M}'}\) denotes the minimum over vectors \( \varvec{X}\) whose components \(X_i\) are linear combinations of the SLD operators of the model \(\mathcal{M}'\). In (50), the minimization is taken over all \(k\times k'\) matrices P satisfying the constraint \((P)_{ij}=\delta _{ij}\) for \(i,j\le k\); \(J_{\varvec{t}'}\) and \(D_{\varvec{t}'}\) are the SLD Fisher information matrix and the D-matrix [cf. Eqs. (35) and (41)] for the extended model \(\mathcal {S}'\) at \(\varvec{t}':=(\varvec{t},\varvec{0})\).

The Holevo bound is always tighter than the RLD bound:

$$\begin{aligned} \mathcal {C}_{\mathrm{R},\mathcal{M}}(W,\varvec{t}) \le \mathcal {C}_{\mathrm{H},\mathcal{M}}(W,\varvec{t}) \,. \end{aligned}$$
(51)

The equality holds if and only if the model \(\mathcal{M}\) is D-invariant [37].

In the above proposition, it is not immediately clear whether the Holevo bound depends on the choice of the extended model \(\mathcal {S}'\). In the following, we show that there is a minimum D-invariant extension of \(\mathcal {S}\), and thus the Holevo bound is independent of the choice of \(\mathcal {S}'\). The minimum D-invariant subspace in the space of Hermitian matrices is given as follows. Let \(\mathcal{V}\) be the subspace spanned by the SLDs \(\{L_i\}\) of the original model \(\mathcal{M}\) at \(\rho _{\varvec{t}}\). Let \(\mathcal{V}'\) be the subspace spanned by \(\cup _{j=0}^{\infty }\mathcal{D}_{\varvec{t}}^j(\mathcal{V})\). Then, the subspace \(\mathcal{V}'\) is D-invariant and contains \(\mathcal{V}\). What remains is to show that \(\mathcal{V}'\) is the minimum D-invariant subspace. Let \(\mathcal{V}''\) be the orthogonal complement of \(\mathcal{V}'\) with respect to the inner product defined by \(\mathrm{Tr}\,\rho X^\dagger Y\). We denote by \(P'\) and \(P''\) the projections onto \(\mathcal V'\) and \(\mathcal V''\), respectively. Each component \(X_i\) of a vector of operators \(\varvec{X}\) can be expressed as \(X_i=P'X_i + P''X_i\). Then, the two vectors \(\varvec{X}':=(P' X_i)\) and \(\varvec{X}'':=(P'' X_i)\) satisfy the inequality \(Z_{\varvec{t}}(\varvec{X})=Z_{\varvec{t}}(\varvec{X}')+Z_{\varvec{t}}(\varvec{X}'') \ge Z_{\varvec{t}}(\varvec{X}')\). Substituting the definition of the SLDs into Eq. (47) and noticing that \(P''X_i\) has no support in \(\mathcal V\), we see that only the part \(P'X_i \) contributes to the condition (47), and the minimum in (46) is attained when \(\varvec{X}''=0\). Hence, the minimum is achieved when each component of the vector \(\varvec{X}\) is included in the minimum D-invariant subspace \(\mathcal{V}'\). Therefore, since the minimum D-invariant subspace is uniquely defined, the Holevo bound does not depend on the choice of the D-invariant model \(\mathcal {S}'\) that extends \(\mathcal {S}\).

3.2 Classical and quantum Gaussian states

For a classical system of dimension \(d^\mathrm{C}\), a Gaussian state is a \(d^\mathrm{C}\)-dimensional normal distribution \(N[\varvec{\alpha }^\mathrm{C},\varGamma ^\mathrm{C}]\) with mean \(\varvec{\alpha }^\mathrm{C}\) and covariance matrix \(\varGamma ^\mathrm{C}\). The corresponding random variable will be denoted as \(\varvec{Z}=(Z_1, \ldots , Z_{d^\mathrm{C}})\) and will take values \(\varvec{z} = (z_1, \dots , z_{d^\mathrm{C}})\in \mathbb {R}^{d^\mathrm{C}}\).

For quantum systems we will restrict our attention to a subfamily of Gaussian states, known as displaced thermal states. For a quantum system made of a single mode, the displaced thermal states are defined as

$$\begin{aligned} \rho _{\alpha ,\beta }:=T^\mathrm{Q}_{\alpha }\,\rho ^{\mathrm{thm}}_{\beta }(T^\mathrm{Q}_{\alpha })^\dag , \qquad T^\mathrm{Q}_\alpha =\exp (\alpha \hat{a}^\dag -\bar{\alpha } \hat{a}) \, , \end{aligned}$$
(52)

where \(\alpha \in \mathbb {C}\) is the displacement, \(T^\mathrm{Q}_\alpha \) is the displacement operator, \(\hat{a}\) is the annihilation operator satisfying the relation \([\hat{a}, \hat{a}^\dag ] =1\), and \(\rho ^{\mathrm{thm}}_{\beta }\) is a thermal state, defined as

$$\begin{aligned} \rho ^{\mathrm{thm}}_{\beta }&:=(1-e^{-\beta })\sum _{j=0}^{\infty }e^{-j\beta }\, |j\rangle \langle j| \, , \end{aligned}$$
(53)

where the basis \(\{ |j\rangle \}_{j \in \mathbb {N}}\) consists of the eigenvectors of \(\hat{a}^\dag \hat{a}\) and \(\beta \in (0,\infty ) \) is a real parameter, hereafter called the thermal parameter.
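
For concreteness, displaced thermal states are easy to realize numerically in a truncated Fock space. The sketch below (our addition, assuming NumPy/SciPy; the dimension and parameter values are arbitrary) builds \(\rho _{\alpha ,\beta }\) from Eqs. (52)–(53) and checks that \(\langle \hat{a}\rangle =\alpha \) and \(\langle \hat{a}^\dag \hat{a}\rangle -|\alpha |^2 = e^{-\beta }/(1-e^{-\beta })\).

```python
# Sketch (truncated Fock space, illustrative): build the displaced thermal
# state of Eqs. (52)-(53) and check its first moments.
import numpy as np
from scipy.linalg import expm

dim, alpha, beta = 60, 0.7 + 0.3j, 1.2
a = np.diag(np.sqrt(np.arange(1, dim, dtype=float)), 1)   # annihilation operator
rho_thm = np.diag((1 - np.exp(-beta)) * np.exp(-beta * np.arange(dim)))
T = expm(alpha * a.conj().T - np.conj(alpha) * a)         # displacement, Eq. (52)
rho = T @ rho_thm @ T.conj().T

print(np.trace(rho @ a))                                  # ~ alpha
print(np.trace(rho @ a.conj().T @ a).real - abs(alpha)**2)
print(np.exp(-beta) / (1 - np.exp(-beta)))                # thermal N, ~ 0.4310
```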

For a quantum system of \(d^\mathrm{Q}\) modes, the products of single-mode displaced thermal states will be denoted as

$$\begin{aligned} \varPhi [\varvec{\alpha }^\mathrm{Q} , \varvec{\beta }^\mathrm{Q}]:= \bigotimes _{j=1}^{d^Q} \rho _{\alpha _j,\beta _j} \end{aligned}$$
(54)

where \(\varvec{\alpha }^\mathrm{Q} = (\alpha _j)_{j=1}^{d^\mathrm{Q}}\) is the vector of displacements and \(\varvec{\beta }^\mathrm{Q} = (\beta _j)_{j=1}^{d^\mathrm{Q}} \) is the vector of thermal parameters. In the following we will regard \(\varvec{\alpha }\) as a vector in \(\mathbb {R}^{2 d^\mathrm{Q}}\), using the notation \(\varvec{\alpha }= (\alpha ^\mathrm{R}_1, \alpha ^\mathrm{I}_1 , \ldots , \alpha ^\mathrm{R}_{d^\mathrm{Q}}, \, \alpha ^\mathrm{I}_{d^\mathrm{Q}})\), \(\alpha _j^\mathrm{R} : =\mathsf{Re} (\alpha _j)\), \(\alpha _j^\mathrm{I} : = \mathsf{Im} ( \alpha _j)\).

For a hybrid system of \(d^\mathrm{C}\) classical variables and \(d^\mathrm{Q}\) quantum modes, we define the canonical classical–quantum (c–q) Gaussian states\(G[\varvec{\alpha },\varGamma ]\) as

$$\begin{aligned} G[\varvec{\alpha },\varGamma ]:=N[\varvec{\alpha }^\mathrm{C},\varGamma ^\mathrm{C}]\otimes \varPhi [\varvec{\alpha }^\mathrm{Q},\varvec{\beta }^{\mathrm{Q}}] \, , \end{aligned}$$
(55)

with \(\varvec{\alpha }= \varvec{\alpha }^\mathrm{C} \oplus \varvec{\alpha }^\mathrm{Q} \in \mathbb {R}^{d^\mathrm{C} + 2 d^\mathrm{Q}}\), \(\varvec{\alpha }^\mathrm{C} \in \mathbb {R}^{d^\mathrm{C}}\), \(\varvec{\alpha }^\mathrm{Q} \in \mathbb {R}^{2d^\mathrm{Q}}\), and \({\varGamma }= \varGamma ^\mathrm{C} \oplus \varGamma ^\mathrm{Q}\), where

$$\begin{aligned} \varGamma ^\mathrm{Q}&:= E_{d^\mathrm{Q}}\left( \varvec{N}\right) +\frac{i}{2}\varOmega _{d^\mathrm{Q}}, \nonumber \\ E_{d^\mathrm{Q}}(\varvec{N})&:= \left( \begin{array}{ccccc} N_{1} & 0 & & & \\ 0 & N_{1} & & & \\ & & \ddots & & \\ & & & N_{d^\mathrm{Q}} & 0 \\ & & & 0 & N_{d^\mathrm{Q}} \end{array} \right) \qquad N_{j}:=\frac{e^{-\beta _{j}}}{1-e^{-\beta _{j}}} \end{aligned}$$
(56)
$$\begin{aligned} \varOmega _{d^\mathrm{Q}}&:= \left( \begin{array}{ccccc} 0 & 1 & & & \\ -1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & 1 \\ & & & -1 & 0 \end{array} \right) \, . \end{aligned}$$
(57)

Equivalently, the canonical Gaussian states can be expressed as

$$\begin{aligned} G[\varvec{\alpha },{\varGamma }]=T_{\varvec{\alpha }} \, G[\varvec{0},{\varGamma }] \, T^\dag _{\varvec{\alpha }} \, , \end{aligned}$$
(58)

where \(T_{\varvec{\alpha }}\) is the Gaussian shift operator

$$\begin{aligned} T_{\varvec{\alpha }}:=\left( \bigotimes _{k=1}^{d^C} T^\mathrm{C}_{\alpha ^C_k}\right) \otimes \left( \bigotimes _{j=1}^{d^\mathrm{Q}}T^\mathrm{Q}_{\alpha ^\mathrm{Q}_j }\right) \,, \end{aligned}$$
(59)

\(T^\mathrm{Q}_{\alpha _j^\mathrm{Q}}\) is given by Eq. (52), and \(T^\mathrm{C}_{\alpha _k^\mathrm{C}}\) is the map \(z_k\rightarrow z_k + \alpha _k^\mathrm{C}\). For the classical part, we adopt the notation

$$\begin{aligned} \mathrm{Tr}\,\left[ N[\varvec{\alpha }^\mathrm{C},\varGamma ^\mathrm{C}] \, \exp \left( \sum _{j=1}^{d^\mathrm{C}} i \xi _j Z_j\right) \right] : = \int d z_1 \dots d z_{d^\mathrm{C}} \, N[\varvec{\alpha }^\mathrm{C},\varGamma ^\mathrm{C}] (\varvec{z}) \, \exp \left( \sum _{j=1}^{d^\mathrm{C}} i \xi _j z_j\right) \, . \end{aligned}$$

With this notation, the canonical Gaussian state \( G[\varvec{\alpha },{\varGamma }]\) is uniquely identified by the characteristic equation [5]

$$\begin{aligned} \mathrm{Tr}\,\left[ G[\varvec{\alpha },{\varGamma }] \, \exp \left( \sum _{j=1}^d i \xi _j R_j\right) \right] = \exp \left[ i \sum _{j} \xi _j \alpha _j - \frac{1}{2} \sum _{j,k} \xi _j\xi _k \, \mathsf{Re} [ \varGamma _{j,k}] \right] \, , \end{aligned}$$
(60)

with

$$\begin{aligned} d&: = d^\mathrm{C} + 2 d^\mathrm{Q} \nonumber \\ R_j&:= Z_j \, , \qquad \forall j \in \{1,\dots , d^\mathrm{C}\} \nonumber \\ R_{d^\mathrm{C}+2j-1}&:=Q_j : = \frac{a_j+ a_j^\dag }{\sqrt{2}} \, ,\quad R_{d^\mathrm{C}+2j}: =P_j : = \frac{a_j - a_j^\dag }{\sqrt{2} i} \, ,\qquad \forall j \in \left\{ 1 , \dots , d^\mathrm{Q}\right\} \, . \end{aligned}$$
(61)

The formulation in terms of the characteristic equation (60) can be used to generalize the notion of canonical Gaussian state [38]. Given a d-dimensional Hermitian matrix (correlation matrix) \(\varGamma =\mathsf{Re} ( \varGamma )+i\mathsf{Im} ( \varGamma )\) whose real part \(\mathsf{Re} ( \varGamma )\) is positive semi-definite, we define the operators \(\varvec{R}:=(R_1, \ldots , R_d)\) via the commutation relation

$$\begin{aligned} [R_k, R_j]= 2i \, \mathsf{Im} ( \varGamma _{k,j}). \end{aligned}$$
(62)

We define the general Gaussian state\( G[\varvec{\alpha },{\varGamma }]\) on the operators \(\varvec{R}\) as the linear functional on the operator algebra generated by \(R_1, \ldots , R_d\) satisfying the characteristic equation (60) [38]. Note that, although \(\varGamma \) is not necessarily positive semi-definite, its real part \(\mathsf{Re} (\varGamma )\) is positive semi-definite. Hence, the right-hand side of Eq. (60) contains a negative semi-definite quadratic form, in the same way as for the standard Gaussian states.

For general Gaussian states, we have the following lemma.

Lemma 3

Given a Hermitian matrix \(\varGamma \), there exists an invertible real matrix T such that the Hermitian matrix \(T \varGamma T^{T} \) is the correlation matrix of a canonical Gaussian state. In particular, when \(\mathrm{Im} (\varGamma )\) is invertible, \(T \varGamma T^{T} = E_{d^Q}\left( \varvec{N}\right) +\frac{i}{2}\varOmega _{d^Q}\) and the vector \(\varvec{\beta }\) is unique up to permutation. Further, when two matrices T and \(\tilde{T}\) satisfy the above condition, the canonical Gaussian states \(G[T^{-1}\varvec{\alpha }, T \varGamma T^{T}]\) and \(G[\tilde{T}^{-1}\varvec{\alpha }, \tilde{T} \varGamma \tilde{T}^{T}]\) are unitarily equivalent.

The proof is provided in “Appendix C”.

In the above lemma, we can transform \(\varGamma \) into the block form \( \varGamma ^C \oplus \varGamma ^Q\), where \(\varGamma ^C\) is real, by applying an orthogonal transformation. The unitary operation on the classical part is given as a scale conversion. Hence, an invertible real matrix T can be realized by the combination of a scale conversion and a linear conversion, which can be implemented as a unitary on the Hilbert space. Thus, a general Gaussian state can be given as the resultant linear functional on the operator algebra after the application of the linear conversion to a canonical Gaussian state. This kind of construction is unique up to unitary equivalence. Indeed, Petz [38] showed a similar statement by using the Gelfand–Naimark–Segal (GNS) construction. Our derivation directly shows the uniqueness without using the GNS construction.
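
The transformation T in Lemma 3 can be constructed explicitly by a Williamson-type procedure. The following is a minimal sketch (our addition, assuming NumPy/SciPy, no classical part, and invertible \(\mathsf{Im}(\varGamma )\)): writing \(\varGamma =A+iB\) with \(A>0\) symmetric and B antisymmetric, it builds a real invertible T with \(T\varGamma T^T = E_{d^\mathrm{Q}}(\varvec{N}) + \frac{i}{2}\varOmega _{d^\mathrm{Q}}\).

```python
# Sketch of the diagonalization in Lemma 3 (illustrative; assumes no classical
# part and invertible Im(Gamma)). A Williamson-type construction yields T.
import numpy as np
from scipy.linalg import schur, sqrtm

def to_canonical(Gamma):
    A, B = Gamma.real, Gamma.imag
    Ci = np.linalg.inv(sqrtm(A).real)            # A^{-1/2}
    S, O = schur(Ci @ B @ Ci, output='real')     # C B C = O S O^T, 2x2 blocks
    D = np.zeros(len(S))
    for j in range(0, len(S), 2):
        kappa = S[j, j + 1]
        if kappa < 0:                            # reorder the block so kappa > 0
            O[:, [j, j + 1]] = O[:, [j + 1, j]]
            kappa = -kappa
        D[j] = D[j + 1] = 1.0 / np.sqrt(2 * kappa)
    # then T A T^T = diag(N_j pairs) and T B T^T = Omega/2
    return np.diag(D) @ O.T @ Ci

rng = np.random.default_rng(1)
R = rng.normal(size=(4, 4))
A = R @ R.T + 4 * np.eye(4)                      # positive-definite real part
Bu = rng.normal(size=(4, 4)); B = (Bu - Bu.T) / 2
T = to_canonical(A + 1j * B)
print(np.round(T @ (A + 1j * B) @ T.T, 6))       # E(N) + (i/2) Omega
```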

Lemma 4

The Gaussian states family \(\{ G[\varvec{\alpha },{\varGamma }] \}_{\varvec{\alpha }}\) is D-invariant. The SLDs are calculated as \(L_{\varvec{\alpha },j}= \sum _{k=1}^d (( \mathsf{Re} (\varGamma ))^{-1})_{j,k} R_k\). The D-operator at any point \(\varvec{\alpha }\) is given as \( \mathcal {D}(R_j)= \sum _{k} 2 \mathsf{Im} (\varGamma )_{j,k}R_k\). The inverse of the RLD Fisher information matrix \(\tilde{J}_{\varvec{\alpha }}\) is calculated as

$$\begin{aligned} (\tilde{J}_{\varvec{\alpha }})^{-1}= \varGamma . \end{aligned}$$
(63)

This lemma shows that the inverse of the RLD Fisher information matrix is given by the correlation matrix.

Proof

Due to the coordinate conversion given in Lemma 3, it is sufficient to show the relation (63) for the canonical Gaussian states family. In that case, the desired statement has already been shown by Holevo in [5]. \(\quad \square \)

Therefore, as shown in “Appendix D”, a D-invariant Gaussian model can be characterized as follows:

Lemma 5

Given a \(d \times d \) strictly positive-definite Hermitian matrix \(\varGamma =A+ iB\) (A and B real matrices) and a \(d\times k\) real matrix T with \(k \le d\), the following conditions are equivalent.

  1. (1)

    The linear submodel \(\mathcal{M}:=\{G[ T\,\varvec{t}, {\varGamma }]\}_{\varvec{t}\in \mathbb {R}^{k}}\) (with displacement \(T\,\varvec{t}\)) is D-invariant.

  2. (2)

The image of the linear map \(A^{-1}T\) is invariant under the application of B.

  3. (3)

    There exist a unitary operator U and a Hermitian matrix \(\varGamma _0\) such that

    $$\begin{aligned} U G[ T\,\varvec{t}, {\varGamma }] U^\dagger = G[ \varvec{t}, \varGamma _T]\otimes G[ 0, {\varGamma }_0], \end{aligned}$$
    (64)

    where \(\varGamma _T:= (T^T A^{-1} T)^{-1}+ i (T^T A^{-1} T)^{-1}(T^T B T)(T^T A^{-1} T)^{-1}\).

3.3 Measurements on Gaussian states family

We discuss the stochastic behavior of the outcome of a measurement on the c–q system generated by \(\varvec{R}=(R_j)_{j=1}^d\) when the state is a general Gaussian state \(G[\varvec{\alpha }, {\varGamma }]\). To this end, we introduce the notation \(\wp _{\varvec{\alpha }| M}(\mathsf {B}):= \mathrm{Tr}\,\Big [ G [\varvec{\alpha }, {\varGamma }] M(\mathsf {B})\Big ] \) for a POVM M. Then, we have the following lemma.

Lemma 6

Let \(\varvec{X} = (X_i)_{i=1}^k\) be the vector defined by \(X_i : = \sum _{j=1}^d P_{i,j} R_j\), where P is a real \(k\times d\) matrix. For a weight matrix \(W>0\), there exists a POVM \(M_{P|W}^\varGamma \) with outcomes in \(\mathbb {R}^k\) such that \(\int x_i M_{P|W}^\varGamma (d \varvec{x})= X_i\) and \(\wp _{\varvec{\alpha }| M_{P|W}^\varGamma }\) is the normal distribution with average \( (\sum _{j=1}^d P_{i,j}\alpha _j)_{i=1}^k\) and covariance matrix

$$\begin{aligned} \mathsf{Re} (Z_{\varvec{\alpha }}(\varvec{X}))+\sqrt{W}^{-1} |\sqrt{W} \mathsf{Im} (Z_{\varvec{\alpha }}(\varvec{X}))\sqrt{W}| \sqrt{W}^{-1}. \end{aligned}$$

In this case, the weighted covariance is

$$\begin{aligned} \mathrm{tr}\,\Big [ W \mathsf{Re} ( Z_{\varvec{\alpha }}(\varvec{X})) \Big ] +\mathrm{tr}\,\Big |\sqrt{W} \mathsf{Im} ( Z_{\varvec{\alpha }}(\varvec{X}))\sqrt{W} \Big |. \end{aligned}$$

The proof is provided in “Appendix E”.

In the above lemma, when \(\varvec{X}=\varvec{R}\), we abbreviate \(M_{P|W}^{\varGamma }\) to \(M_{W}^\varGamma \). This lemma is useful for estimation in the Gaussian states family \(\mathcal{M}':=\{G[\varvec{t}, {\varGamma }]\}_{\varvec{t}\in \mathbb {R}^{d}}\). In this family, we consider the following covariance condition.

Definition 2

A POVM M is a covariant estimator for the family \(\{G[\varvec{t}, {\varGamma }]\}_{\varvec{t}\in \mathbb {R}^{k}}\) when the distribution \(\wp _{\varvec{t}|M}(\mathsf {B}):=\mathrm{Tr}\,G[\varvec{t}, {\varGamma }] M(\mathsf {B})\) satisfies the condition \(\wp _{\varvec{0}|M}(\mathsf {B})=\wp _{\varvec{t}|M}(\mathsf {B}+\varvec{t})\) for any \(\varvec{t}\). This condition is equivalent to

$$\begin{aligned} M(\mathsf {B}+\varvec{t})=T_{\varvec{t}} M(\mathsf {B}) T_{\varvec{t}}^\dagger \qquad \forall \varvec{t} \in \mathbb {R}^k \, . \end{aligned}$$

Then, we have the following corollary for this Gaussian states family.

Corollary 1

([5]). For any weight matrix \(W\ge 0\) and the above Gaussian states family \(\mathcal{M}'\), we have

$$\begin{aligned}&\inf _{M \in \mathsf {UB}_{\mathcal{M}'}}\mathrm{tr}\,\Big [ W V_{\varvec{t}}(M) \Big ] =\inf _{M \in \mathsf {CUB}_{\mathcal{M}'}}\mathrm{tr}\,\Big [ W V_{\varvec{t}}(M)\Big ] \nonumber \\&\quad = \mathcal {C}_{\mathrm{R},\mathcal{M}'}(W,\varvec{t}) = \mathrm{tr}\,\Big [ \mathsf{Re} ( \varGamma ) W\Big ] + \mathrm{tr}\,\Big |\sqrt{W} \mathsf{Im} ( \varGamma )\sqrt{W}\Big |, \end{aligned}$$
(65)

where \(\mathsf {UB}_{\mathcal{M}'}\) and \(\mathsf {CUB}_{\mathcal{M}'}\) are the sets of unbiased and covariant unbiased estimators for the model \(\mathcal{M}'\), respectively. Further, when \(W>0\), the above infimum is attained by the covariant unbiased estimator \(M_W^\varGamma \), whose output distribution is the normal distribution with average \(\varvec{t}\) and covariance matrix \(\mathsf{Re} ( \varGamma ) +\sqrt{W}^{-1} |\sqrt{W} \mathsf{Im} ( \varGamma ) \sqrt{W}|\sqrt{W}^{-1}\).

This corollary can be shown as follows. Due to Lemma 4, the lower bound (43) on the weighted MSE \(\mathrm{tr}\,W V_{\varvec{t}}(M) \) of an unbiased estimator M is calculated as the RHS of (65). Lemma 6 guarantees the required performance of \(M_{W}^\varGamma \). To discuss the case when W is not strictly positive definite, we consider \(W_\epsilon :=W+\epsilon I\). Using the above method, we can construct an unbiased covariant estimator whose output distribution is the \(2d^Q\)-dimensional normal distribution with average \(\varvec{t}\) and covariance \(\mathsf{Re} ( \varGamma ) + \sqrt{W_\epsilon }^{-1} |\sqrt{W_\epsilon } \mathsf{Im} ( \varGamma ) \sqrt{W_\epsilon }|\sqrt{W_\epsilon }^{-1}\). The weighted MSE is \(\mathrm{tr}\,\Big [ W \mathsf{Re} ( \varGamma ) \Big ]+ \mathrm{tr}\, \Big [ \sqrt{W_\epsilon }^{-1} W \sqrt{W_\epsilon }^{-1} |\sqrt{W_\epsilon } \mathsf{Im} ( \varGamma ) \sqrt{W_\epsilon }| \Big ]\), which converges to the bound (65).
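
As a numerical companion (our own sketch, assuming \(W>0\) and \(\mathsf{Re}(\varGamma )>0\)), the right-hand side of (65) and the attaining covariance matrix can be evaluated directly; the matrix absolute value is obtained by diagonalizing the Hermitian matrix \(i\sqrt{W}\,\mathsf{Im}(\varGamma )\sqrt{W}\).

```python
import numpy as np

def optimal_gaussian_mse(Gamma, W):
    """RHS of Eq. (65) and the attaining covariance of Corollary 1 (W > 0).
    A sketch, assuming Gamma is Hermitian with positive-definite real part."""
    A, B = Gamma.real, Gamma.imag
    w, U = np.linalg.eigh(W)
    sqW = U @ np.diag(np.sqrt(w)) @ U.T
    sqWinv = U @ np.diag(1/np.sqrt(w)) @ U.T
    lam, V = np.linalg.eigh(1j * (sqW @ B @ sqW))   # i*(antisymmetric) is Hermitian
    absM = (V * np.abs(lam)) @ V.conj().T           # |sqW Im(Gamma) sqW|
    mse = np.trace(W @ A) + np.abs(lam).sum()       # weighted MSE, Eq. (65)
    cov = A + sqWinv @ absM.real @ sqWinv           # covariance of M_W^Gamma
    return mse.real, cov

# one thermal mode, W = identity: MSE = 2(nbar + 1/2) + 1
nbar = 0.7
Omega = np.array([[0., 1.], [-1., 0.]])
Gamma = (nbar + 0.5)*np.eye(2) + 0.5j*Omega
print(optimal_gaussian_mse(Gamma, np.eye(2))[0])    # 3.4
```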

By combining this corollary with Proposition 1, we can extend it to a linear subfamily of the \(k'\)-dimensional Gaussian family \(\{G[\varvec{t}', {\varGamma }]\}_{\varvec{t}'\in \mathbb {R}^{k'}}\). Consider a linear map T from \(\mathbb {R}^k\) to \(\mathbb {R}^{k'}\). We have the following corollary for the subfamily \(\mathcal{M}:=\{G[ T(\varvec{t}), {\varGamma }]\}_{\varvec{t}\in \mathbb {R}^{k}}\).

Corollary 2

For any weight matrix \(W\ge 0\), we have

$$\begin{aligned}&\inf _{M \in \mathsf {UB}_{\mathcal{M}}}\mathrm{tr}\,\Big [W V_{\varvec{t}}(M) \Big ] =\inf _{M \in \mathsf {CUB}_{\mathcal{M}}}\mathrm{tr}\,\Big [ W V_{\varvec{t}}(M) \Big ] = \mathcal {C}_{\mathrm{H},\mathcal{M}}(W,\varvec{t}). \end{aligned}$$
(66)

Further, when \(W>0\), we choose a vector \(\varvec{X}\) realizing the minimum in (49). The above infimum is attained by the covariant unbiased estimator \(M_W\), whose output distribution is the normal distribution with average \(\varvec{t}\) and covariance matrix \(\mathsf{Re} ( Z_{\varvec{t}}(\varvec{X}))+ \sqrt{W}^{-1} |\sqrt{W} \mathsf{Im} ( Z_{\varvec{t}}(\varvec{X}))\sqrt{W}| \sqrt{W}^{-1}\).

Proposition 1 guarantees that the minimum in (49) defining \(\mathcal {C}_{\mathrm{H},\mathcal{M}}(W,\varvec{t})\) is realized when the components of \(\varvec{X}\) are linear combinations of \(R_1, \ldots , R_{k'}\). Hence, the latter part of the corollary with \(W>0\) follows from (45) and Lemma 6. The case of a W that is not strictly positive can be shown by considering \(W_\epsilon \) in the same way as in Corollary 1.

4 Local Asymptotic Normality

The extension from one-parameter estimation to multiparameter estimation is quite nontrivial. Hence, we first develop the concept of local asymptotic normality, which is the key tool for constructing the optimal measurement in multiparameter estimation. Since we have derived a tight MSE bound for the Gaussian states family, it is natural to approximate the general case by a Gaussian states family; local asymptotic normality will serve as the bridge between general qudit families and Gaussian states families.

4.1 Quantum local asymptotic normality with specific parametrization

For a quantum system of dimension \(d<\infty \), also known as a qudit, we consider generic states, described by density matrices with full rank and non-degenerate spectrum. To discuss quantum local asymptotic normality, we need to define a specific coordinate system. To this end, we consider the neighborhood of a fixed density matrix \(\rho _{\varvec{\theta }_0}\), assumed to be diagonal in the canonical basis of \(\mathbb {C}^d\), and parametrized as

$$\begin{aligned} \rho _{\varvec{\theta }_0} = \sum _{j=1}^{d} \theta _{0,j}\, |j\rangle \langle j|\, \end{aligned}$$

with spectrum ordered as \({\theta }_{0,1}>\cdots>{\theta }_{0,d-1}>{\theta }_{0,d}>0\). In the neighborhood of \(\rho _{\varvec{\theta }_0} \), we parametrize the states of the system as

$$\begin{aligned} \rho _{\varvec{\theta }_0+\frac{\varvec{\theta }}{\sqrt{n}}}= U_{\varvec{\theta }^R,\varvec{\theta }^I} \, \rho _0(\varvec{\theta }^C) \, U^\dag _{\varvec{\theta }^R,\varvec{\theta }^I} \end{aligned}$$
(67)

for \(\varvec{\theta } := ( \varvec{\theta }^C,\varvec{\theta }^R,\varvec{\theta }^I)\) with \((\varvec{\theta }^R,\varvec{\theta }^I)\in \mathbb {R}^{d(d-1)}\) and \(\varvec{\theta }^C \in \mathbb {R}^{d-1}\), where \(\rho _0(\varvec{\theta }^C)\) is the diagonal density matrix

$$\begin{aligned} \rho _0 (\varvec{\theta }^C) = \sum _{j=1}^{d} \, \left( \theta _{0,j}+\frac{{\theta }^C_j}{\sqrt{n}}\right) \, |j\rangle \langle j|, \,\qquad \theta ^C_d:=-\sum _{k=1}^{d-1}\theta ^C_k, \end{aligned}$$
(68)

and \(U_{\varvec{\theta }^R,\varvec{\theta }^I}\) is the unitary matrix defined by

$$\begin{aligned} U_{\varvec{\theta }^R,\varvec{\theta }^I}&= \exp \left[ \sum _{1\le j<k\le d}\frac{i\left( \theta ^\mathrm{I}_{j,k}F^\mathrm{I}_{j,k}+\theta ^\mathrm{R}_{j,k}F^\mathrm{R}_{k,j}\right) }{\sqrt{n(\theta _{0,j}-\theta _{0,k})}}\right] . \end{aligned}$$
(69)

Here \(\varvec{\theta }^R\) and \(\varvec{\theta }^I\) are vectors of real parameters \( ({\theta }^R_{j,k})_{1\le j<k\le d}\) and \(( {\theta }^I_{j,k})_{1\le j<k\le d}\), and \(F^\mathrm{I}_{j,k}\) (\(F^\mathrm{R}_{j,k}\)) is the \(d\times d\) matrix with entries \((F^\mathrm{I}_{j,k})_{l,m}:=i\delta _{j,l}\delta _{k,m}-i\delta _{k,l}\delta _{j,m}\) (\((F^\mathrm{R}_{j,k})_{l,m}:=\delta _{j,l}\delta _{k,m}+\delta _{k,l}\delta _{j,m}\)), where \(\delta _{j,k}\) is the Kronecker delta; note that \(F^\mathrm{R}_{j,k}=F^\mathrm{R}_{k,j}\). With this definition, the components of \(\varvec{\theta }^R\) and \(\varvec{\theta }^I\) are in one-to-one correspondence with the pairs (j, k), \(1\le j<k\le d\). The parameter \(\varvec{\theta } = ( \varvec{\theta }^C,\varvec{\theta }^R,\varvec{\theta }^I)\) will be referred to as the Q-LAN coordinate, and the state with this parametrization, which was used by Kahn and Guta in [16, 17, 39], will be denoted by \(\rho ^\mathrm{KG}_{\varvec{\theta }}\).
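
To make the parametrization (67)–(69) concrete, the following sketch (our own; it adopts the convention for the F matrices reconstructed above) assembles \(\rho ^{\mathrm{KG}}_{\varvec{\theta }}\) numerically.

```python
import numpy as np
from scipy.linalg import expm

def rho_kg(theta0, thC, thR, thI, n):
    """Sketch of Eqs. (67)-(69): theta0 is the decreasing spectrum of the
    reference state; thC has d-1 entries; thR, thI have one entry per pair
    j < k (row-major order)."""
    d = len(theta0)
    diag = np.asarray(theta0, float) + np.append(thC, -np.sum(thC))/np.sqrt(n)
    rho0 = np.diag(diag)                                  # Eq. (68)
    X = np.zeros((d, d), complex)                         # exponent in Eq. (69)
    idx = 0
    for j in range(d):
        for k in range(j + 1, d):
            FI = np.zeros((d, d), complex); FI[j, k], FI[k, j] = 1j, -1j
            FR = np.zeros((d, d), complex); FR[j, k], FR[k, j] = 1., 1.
            X += 1j*(thI[idx]*FI + thR[idx]*FR)/np.sqrt(n*(theta0[j]-theta0[k]))
            idx += 1
    U = expm(X)                                           # unitary, X is anti-Hermitian
    return U @ rho0 @ U.conj().T

rho = rho_kg([0.5, 0.3, 0.2], [0.1, -0.2], [0.3, 0.1, 0.2], [0.1, 0.0, -0.1], n=100)
print(np.trace(rho).real, np.allclose(rho, rho.conj().T))   # 1.0 True
```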

Q-LAN establishes an asymptotic correspondence between multicopy qudit states and Gaussian shift models. Using the parametrization \(\varvec{\theta }=(\varvec{\theta }^C,\varvec{\theta }^R,\varvec{\theta }^I)\), the multicopy qudit model and the corresponding Gaussian shift model are equivalent in terms of the RLD quantum Fisher information matrix:

Lemma 7

The RLD quantum Fisher information matrices of the qudit model and the corresponding Gaussian model in Eq. (75) are both equal to

$$\begin{aligned} \left( \widetilde{J}^Q_{\varvec{\theta }}\right) ^{-1}=E_{d(d-1)/2}\left( \varvec{\beta }'\right) +\frac{i}{2}\varOmega _{d(d-1)/2}\qquad e^{-\beta '_i}=\frac{1}{4}\coth {\frac{\beta _i}{2}}. \end{aligned}$$
(70)

The calculations can be found in “Appendix F”. The quantum version of local asymptotic normality has been derived in several different forms [16, 17, 39] with applications in quantum statistics [12, 40], benchmarks [41] and data compression [42]. Here we use the version of [17], which states that n identical copies of a qudit state can be locally approximated by a c–q Gaussian state in the large n limit. The approximation is in the following sense:

Definition 3

(Compact uniformly asymptotic equivalence of models). For every \(n\in \mathbb {N}^*\), let \(\{\rho _{\varvec{t},n}\}_{\varvec{t}\in \mathsf {\Theta }_n}\) and \(\{\widetilde{\rho }_{\varvec{t},n}\}_{\varvec{t}\in \mathsf {\Theta }_n}\) be two models of density matrices acting on Hilbert spaces \(\mathcal {H}\) and \(\mathcal {K}\) respectively where the set of parameters \(\mathsf {\Theta }_n\) may depend on n. We say that the two families are asymptotically equivalent for \(\varvec{t} \in \mathsf {\Theta }_n\), denoted as \(\rho _{\varvec{t},n} \cong \widetilde{\rho }_{\varvec{t},n}~(\varvec{t} \in \mathsf {\Theta }_n)\), if there exists a quantum channel \(\mathcal {T}_n\) (i.e. a completely positive trace preserving map) mapping trace-class operators on \(\mathcal {H}\) to trace-class operators on \(\mathcal {K}\) and a quantum channel \(\mathcal {S}_n\) mapping trace-class operators on \(\mathcal {K}\) to trace-class operators on \(\mathcal {H}\), which are independent of \(\varvec{t}\) and satisfy the conditions

$$\begin{aligned}&\sup _{\varvec{t}\in \mathsf {\Theta }_n } \left\| \mathcal {T}_n \left( \rho _{\varvec{t},n}\right) -\widetilde{\rho }_{\varvec{t},n} \right\| _1 \xrightarrow {n\rightarrow \infty } 0 \end{aligned}$$
(71)
$$\begin{aligned}&\sup _{\varvec{t}\in \mathsf {\Theta }_n} \left\| \rho _{\varvec{t}, n} -\mathcal {S}_n \left( \widetilde{\rho }_{\varvec{t}, n}\right) \right\| _1 \xrightarrow {n\rightarrow \infty } 0 \, . \end{aligned}$$
(72)

Next, we extend asymptotic equivalence to compact uniformly asymptotic equivalence. In this extension, we also describe the order of the convergence.

Given a sequence \(\{a_n\}\) converging to zero, for every \(\varvec{t}'\) in a compact set \(\mathsf {K}\) consider two models \(\{\rho _{\varvec{t},\varvec{t}',n}\}_{\varvec{t}\in \mathsf {\Theta }_n, }\) and \(\{\widetilde{\rho }_{\varvec{t},\varvec{t}',n}\}_{\varvec{t}\in \mathsf {\Theta }_n}\). We say that they are asymptotically equivalent for \(\varvec{t} \in \mathsf {\Theta }_n\) compact uniformly with respect to \(\varvec{t}'\) with order \(a_n\), denoted as \(\rho _{\varvec{t},\varvec{t}',n} {\mathop {\cong }\limits ^{\varvec{t}'}} \widetilde{\rho }_{\varvec{t},\varvec{t}',n}~ (\varvec{t} \in \mathsf {\Theta }_n, a_n)\), if for every \(\varvec{t}'\in \mathsf {K}\) there exists a quantum channel \(\mathcal {T}_{n,\varvec{t}'}\) mapping trace-class operators on \(\mathcal {H}\) to trace-class operators on \(\mathcal {K}\) and a quantum channel \(\mathcal {S}_{n,\varvec{t}'}\) mapping trace-class operators on \(\mathcal {K}\) to trace-class operators on \(\mathcal {H}\) such that

$$\begin{aligned}&\sup _{\varvec{t}' \in K} \sup _{\varvec{t}\in \mathsf {\Theta }_n } \Vert \mathcal {T}_{n,\varvec{t}'} (\rho _{\varvec{t},\varvec{t}',n}) -\widetilde{\rho }_{\varvec{t},\varvec{t}',n} \Vert _1 = O(a_n) \end{aligned}$$
(73)
$$\begin{aligned}&\sup _{\varvec{t}' \in K} \sup _{\varvec{t}\in \mathsf {\Theta }_n} \Vert \rho _{\varvec{t},\varvec{t}',n} -\mathcal {S}_{n,\varvec{t}'} (\widetilde{\rho }_{\varvec{t},\varvec{t}',n}) \Vert _1 = O(a_n). \end{aligned}$$
(74)

Notice that the channels \(\mathcal {T}_{n,\varvec{t}'}\) and \(\mathcal {S}_{n,\varvec{t}'} \) depend on \(\varvec{t}'\) and are independent of \(\varvec{t}\).

In the above terminology, Q-LAN establishes an asymptotic equivalence between families of n-copy qudit states and Gaussian shift models. Precisely, one has the following.

Proposition 2

(Q-LAN for a fixed parameterization; Kahn and Guta [16, 17]). For any \(x <1/9\), we define the set \(\mathsf {\Theta }_{n,x}\) of \(\varvec{\theta }\) as

$$\begin{aligned} \mathsf {\Theta }_{n,x}:=\left\{ \varvec{\theta }~|~\Vert \varvec{\theta }\Vert \le n^{x}\right\} \end{aligned}$$

(\(\Vert \cdot \Vert \) denotes the vector norm). Then, we have the following compact uniformly asymptotic equivalence:

$$\begin{aligned} (\rho ^\mathrm{KG}_{\varvec{\theta }_0+\varvec{\theta }/\sqrt{n}})^{\otimes n} {\mathop {\cong }\limits ^{\varvec{\theta }_0}} G[\varvec{\theta },\varGamma _{\varvec{\theta }_0}]:= N[\varvec{\theta }^C,\varGamma _{\varvec{\theta }_0}^C] \otimes \varPhi [(\varvec{\theta }^R,\varvec{\theta }^I),\varvec{\beta }_{\varvec{\theta }_0}] ~(\varvec{\theta } \in \mathsf {\Theta }_{n,x},n^{-\kappa } ), \end{aligned}$$
(75)

where \(\kappa \) is a parameter satisfying \(\kappa \ge 0.027\), \(N[\varvec{\theta }^C,\varGamma ^C_{\varvec{\theta }_0}] \) is the multivariate normal distribution with mean \(\varvec{\theta }^C\) and covariance matrix \((\varGamma ^C_{\varvec{\theta }_0})_{k,l}:= (J_{\varvec{\theta }_0}^{-1})_{k,l}\) for \(k,l=1, \ldots , d-1\), and \((\varvec{\beta }_{\varvec{\theta }_0})_{j,k}:=\frac{(\rho _{\varvec{\theta }_0})_{k,k}}{ (\rho _{\varvec{\theta }_0})_{j,j}}\).
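
For a concrete feel of the limiting model, the sketch below (our own illustration) computes \(\varGamma ^C_{\varvec{\theta }_0}\) under the assumption that the classical block of \(J_{\varvec{\theta }_0}\) is the multinomial Fisher information of the spectrum, together with the ratios \((\varvec{\beta }_{\varvec{\theta }_0})_{j,k}\).

```python
import numpy as np

def gaussian_limit_params(theta0):
    """Parameters of the limiting Gaussian model in Proposition 2, under the
    assumption (ours) that the classical block of J_{theta0} is the
    multinomial Fisher information of the spectrum theta0."""
    theta0 = np.asarray(theta0, float)
    d = len(theta0)
    # Fisher information of p_j = theta0_j + theta^C_j/sqrt(n), theta^C_d = -sum
    J = np.diag(1/theta0[:d-1]) + 1/theta0[d-1]
    GammaC = np.linalg.inv(J)
    beta = {(j, k): theta0[k]/theta0[j] for j in range(d) for k in range(j+1, d)}
    return GammaC, beta

GammaC, beta = gaussian_limit_params([0.5, 0.3, 0.2])
print(np.round(GammaC, 3))   # covariance of the classical part N[theta^C, Gamma^C]
print(beta)                  # thermal ratios (rho_kk / rho_jj) for j < k
```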

The conditions (73) and (74) are not enough to translate precision limits for one family into precision limits for the other. This is because such limits are often expressed in terms of the derivatives of the density matrix, whose asymptotic behaviour is not fixed by (73) and (74). In the following we will establish an asymptotic equivalence in terms of the RLD quantum Fisher information.

4.2 Quantum local asymptotic normality with generic parametrization

In the following, we explore to what extent we can extend the Q-LAN of Proposition 2. Precisely, we derive a Q-LAN equivalence as in Eq. (75) which is not restricted to the parametrization of Eqs. (68) and (69).

In the previous subsection, we discussed the specific parametrization given in (67). In the following, we discuss a generic parametrization. Given an arbitrary D-invariant model \(\rho ^{\otimes n}_{\varvec{t}_0+\frac{\varvec{t}}{\sqrt{n}}}\) with vector parameter \(\varvec{t}\), we have the following theorem.

Theorem 3

(Q-LAN for an arbitrary parameterization). Let \(\{\rho _{\varvec{t}}\}_{\varvec{t}\in \mathsf {\Theta }}\) be a k-parameter D-invariant qudit model. Assume that \(\rho _{\varvec{t}_0}\) is a non-degenerate state, the parametrization is \(C^2\) continuous, and \(\tilde{J}_{\varvec{t}_0}^{-1} \) exists. Then, there exists a constant \(c({\varvec{t}_0})\) such that the set

$$\begin{aligned} \mathsf {\Theta }_{n,x,c({\varvec{t}_0})}:= \left\{ \varvec{t}~|~\Vert \varvec{t}\Vert \le c({\varvec{t}_0}) n^{x}\right\} \end{aligned}$$
(76)

with \(x <1/9\) satisfies

$$\begin{aligned} \rho ^{\otimes n}_{\varvec{t}_0+\frac{\varvec{t}}{\sqrt{n}}} {\mathop {\cong }\limits ^{\varvec{t}_0}} G[ \varvec{t},\tilde{J}_{\varvec{t}_0}^{-1}] ~(\varvec{t} \in \mathsf {\Theta }_{n,x,c({\varvec{t}_0})}\cap \mathbb {R}^k ,n^{-\kappa } ), \end{aligned}$$
(77)

where \(\tilde{J}_{\varvec{t}_0}\) is the RLD Fisher information at \(\varvec{t}_0\) and \(\kappa \) is a parameter satisfying \(\kappa \ge 0.027\).

Proof

We choose the basis \(\{|i\rangle \}_{i=1}^d\) that diagonalizes the state \(\rho _{\varvec{t}_0}\). We denote the Q-LAN parametrization based on this basis by \(\rho ^{\mathrm{KG}|\varvec{t}_0}_{\varvec{\theta } } \); this parametrization depends on \(\varvec{t}_0\). It is enough to consider a neighborhood \(U(\varvec{t}_0)\) of \(\varvec{t}_0\). There exists a map \(f_{\varvec{t}_0}\) on \(U(\varvec{t}_0)\) such that \( \rho _{\varvec{t}_0+\varvec{t}}= \rho ^{\mathrm{KG}|\varvec{t}_0}_{\varvec{\theta }_0(\varvec{t}_0)+f_{\varvec{t}_0}(\varvec{t})}\), where \(\varvec{\theta }_0(\varvec{t}_0) \) is the parameter describing the diagonal elements of \(\rho _{\varvec{t}_0}\). Since the parametrization \( \rho _{\varvec{t}}\) is \(C^2\)-continuous, the function \(f_{\varvec{t}_0}\) is also \(C^2\)-continuous. Proposition 2 guarantees that

$$\begin{aligned} \rho _{\varvec{t}_0+ \varvec{t}/\sqrt{n}}^{\otimes n}&= \left( \rho ^{\mathrm{KG}|\varvec{t}_0}_{\varvec{\theta }_0(\varvec{t}_0)+f_{\varvec{t}_0}(\varvec{t}/\sqrt{n})}\right) ^{\otimes n} \nonumber \\&{\mathop {\cong }\limits ^{\varvec{\theta }_0}} G[\sqrt{n} f_{\varvec{t}_0}(\varvec{t}/\sqrt{n}),\varGamma _{\varvec{\theta }_0(\varvec{t}_0)}] ~(\varvec{t} \in \mathsf {\Theta }_{n,x,c({\varvec{t}_0})}\cap \mathbb {R}^k ,n^{-\kappa } ) \end{aligned}$$
(78)

with a suitable choice of the constant \(c({\varvec{t}_0})\). Denoting by \(f_{\varvec{t}_0}' \) the Jacobian matrix of \(f_{\varvec{t}_0}\), since \(f_{\varvec{t}_0}\) is \(C^2\)-continuous and \(f_{\varvec{t}_0}(0)=0\), the norm \(\Vert \sqrt{n} f_{\varvec{t}_0}(\varvec{t}/\sqrt{n})- f_{\varvec{t}_0}'(0)\varvec{t} \Vert _1\) is evaluated as \(O(\frac{\Vert \varvec{t}\Vert ^2}{\sqrt{n}}) \). Hence, the trace norm \( \Vert G[\sqrt{n} f_{\varvec{t}_0}(\varvec{t}/\sqrt{n}),\varGamma _{\varvec{\theta }_0(\varvec{t}_0)}] - G[f_{\varvec{t}_0}'(0)\varvec{t},\varGamma _{\varvec{\theta }_0(\varvec{t}_0)}]\Vert _1 \) is also \(O(\frac{\Vert \varvec{t}\Vert ^2}{\sqrt{n}}) \), which is at most \(O(n^{-5/18}) \) because \(\varvec{t} \in \mathsf {\Theta }_{n,x,c({\varvec{t}_0})}\) with \(x<1/9\) (indeed \(\Vert \varvec{t}\Vert ^2/\sqrt{n} \le c({\varvec{t}_0})^2 n^{2x-1/2} \le c({\varvec{t}_0})^2 n^{-5/18}\)). Since \(O(n^{-5/18})\) is smaller than \(n^{-\kappa }\), the combination of this evaluation and (78) yields

$$\begin{aligned} \rho _{\varvec{t}_0+ \varvec{t}/\sqrt{n}}^{\otimes n} {\mathop {\cong }\limits ^{\varvec{\theta }_0}} G[f_{\varvec{t}_0}'(0)\varvec{t},\varGamma _{\varvec{\theta }_0(\varvec{t}_0)}] ~(\varvec{t} \in \mathsf {\Theta }_{n,x,c({\varvec{t}_0})}\cap \mathbb {R}^k ,n^{-\kappa } ). \end{aligned}$$
(79)

The combination of Lemma 5 and (79) implies (77). \(\quad \square \)

5 The \(\epsilon \)-Difference RLD Fisher Information Matrix

In Sect. 2.1 we evaluated the limiting distribution in the one-parameter case, using the fidelity as a discretized version of the SLD Fisher information. In order to tackle the multiparameter case, we need to develop a similar discretization for the RLD Fisher information matrix, which is the relevant quantity for the multiparameter setting (cf. Sect. 3). In this section we define a discretized version of the RLD Fisher information matrix, extending to the multiparameter case the single-parameter definition introduced by Tsuda and Matsumoto [25], who in turn extended the corresponding classical notion [43, 44].

5.1 Definition

Let \(\mathcal {M} = \{ \rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\) be a k-parameter model, with the property that \(\rho _{\varvec{t}_0}\) is invertible. If the parametrization \(\rho _{\varvec{t}}\) is differentiable, the RLD quantum Fisher information matrix \(\tilde{J}_{\varvec{t}}\) can be rewritten as the following \(k\times k\) matrix

$$\begin{aligned} \left( \tilde{J}_{\varvec{t}_0}\right) _{ij} =\mathrm{Tr}\,\left[ \tilde{L}_j^\dagger \rho _{\varvec{t}_0} \tilde{L}_i \right] = \mathrm{Tr}\,\left[ \left. \frac{\partial \rho _{\varvec{t}}}{\partial t_j}\right| _{\varvec{t}=\varvec{t}_0} \rho _{\varvec{t}_0}^{-1} \left. \frac{\partial \rho _{\varvec{t}}}{\partial t_i}\right| _{\varvec{t}=\varvec{t}_0} \right] \, . \end{aligned}$$
(80)

The \(\epsilon \)-difference RLD quantum Fisher information matrix \(\tilde{J}_{\varvec{t}_0,\epsilon }\) is defined by replacing the partial derivatives with finite increments:

$$\begin{aligned} \left( \tilde{J}_{\varvec{t}_0,\epsilon }\right) _{i,j} :=&\mathrm{Tr}\,\left[ \left( \frac{ \rho _{\varvec{t}_0+ \epsilon \varvec{e}_i}- \rho _{\varvec{t}_0} }{\epsilon }\right) \, \rho _{\varvec{t}_0}^{-1} \, \left( \frac{ \rho _{\varvec{t}_0+ \epsilon \varvec{e}_j}- \rho _{\varvec{t}_0} }{\epsilon } \right) \right] \nonumber \\ =&\frac{\mathrm{Tr}\,\Big [ \rho _{\varvec{t}_0+ \epsilon \varvec{e}_i} \, \rho _{\varvec{t}_0}^{-1} \, \rho _{\varvec{t}_0+ \epsilon \varvec{e}_j} \Big ] -1}{\epsilon ^2} \, , \end{aligned}$$
(81)

where \(\varvec{e}_j\) is the unit vector with 1 in the j-th entry and zero in the other entries. Notice that one has

$$\begin{aligned} \left( \tilde{J}_{\varvec{t}_0,\epsilon }\right) _{i,i} = \frac{\exp \left[ D_2(\rho _{\varvec{t}_0+ \epsilon \varvec{e}_i}||\rho _{\varvec{t}_0}) \right] -1}{\epsilon ^2} \, , \end{aligned}$$
(82)

where \(D_2(\rho ||\sigma ):=\log \mathrm{Tr}\,\big [\rho ^2\sigma ^{-1}\big ]\) is the Petz \(\alpha \)-Rényi relative entropy for \(\alpha =2\).

When the parametrization \(\rho _{\varvec{t}} \) is differentiable, one has

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}\tilde{J}_{\varvec{t}_0,\epsilon }= \tilde{J}_{\varvec{t}_0} \, , \end{aligned}$$
(83)

where \(\tilde{J}_{\varvec{t}_0}\) is the RLD quantum Fisher information matrix (80).

When the parametrization is not differentiable, we define the RLD Fisher information matrix \(\tilde{J}_{\varvec{t}_0}\) to be the limit (83), provided that the limit exists. Throughout this section, we impose no condition on the parametrization \(\rho _{\varvec{t}} \), except for the requirement that \(\rho _{\varvec{t}_0}\) be invertible.
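
The definition (81) is directly computable. The following sketch (a toy two-parameter qubit model of our own choosing) evaluates \(\tilde{J}_{\varvec{t}_0,\epsilon }\) for decreasing \(\epsilon \) and illustrates the convergence (83).

```python
import numpy as np

# Evaluation of the epsilon-difference RLD matrix (81) for a toy two-parameter
# qubit model (our own choice), illustrating the convergence (83).
sx = np.array([[0, 1], [1, 0]], complex)
sy = np.array([[0, -1j], [1j, 0]], complex)
sz = np.array([[1, 0], [0, -1]], complex)

def rho(t):  # full-rank qubit family, invertible around t = 0
    return 0.5*(np.eye(2) + t[0]*sx + t[1]*sy + 0.5*sz)

def eps_rld(t0, eps, k=2):
    r0inv = np.linalg.inv(rho(t0))
    r = [rho(t0 + eps*np.eye(k)[i]) for i in range(k)]
    return np.array([[(np.trace(r[i] @ r0inv @ r[j]) - 1)/eps**2
                      for j in range(k)] for i in range(k)])

t0 = np.array([0.1, 0.2])
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, np.round(eps_rld(t0, eps), 4))
# The matrices are Hermitian with complex off-diagonal entries and converge
# to the RLD quantum Fisher information matrix as eps -> 0.
```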

5.2 The \(\epsilon \)-difference RLD Cramér–Rao inequality

A discrete version of the RLD quantum Cramér–Rao inequality can be derived under the assumption of \(\epsilon \)-local unbiasedness, defined as follows:

Definition 4

A POVM M with outcomes in \(\mathbb {R}^k\) is \(\epsilon \)-locally unbiased at \(\varvec{t}_0\) if the expectation value \(E_{\varvec{t}_0}(M)\) satisfies the conditions

$$\begin{aligned} E_{\varvec{t}_0}(M) = \varvec{t}_{0} ,\qquad \mathrm{and} \qquad E_{\varvec{t}_0+ \epsilon \varvec{e}_j}(M) = \varvec{t}_{0} +\epsilon \varvec{e}_j \qquad \forall j \in \{1,\dots , k\} \, . \end{aligned}$$

Under the \(\epsilon \)-local unbiasedness condition, Tsuda and Matsumoto [25] derived a lower bound on the MSE for the one-parameter case. In the following theorem, we extend the bound to the multiparameter case.

Theorem 4

(\(\epsilon \)-difference RLD Cramér–Rao inequality). The MSE matrix for an \(\epsilon \)-locally unbiased POVM M at \(\varvec{t}_0\) satisfies the bound

$$\begin{aligned} V_{\varvec{t}_0}(M)\ge (\tilde{J}_{\varvec{t}_0,\epsilon })^{-1}. \end{aligned}$$
(84)

Proof

For simplicity, we assume that \(\varvec{t}_0=\varvec{0}\). For two vectors \(\varvec{a} \in \mathbb {C}^k\) and \(\varvec{b}\in \mathbb {C}^{k}\), we define the two operators \(X:= \int \sum _i a_i x_i \,M(d \varvec{x}) \) and \(Y:= \sum _j b_j \frac{ \rho _{\varvec{t}_0+ \epsilon \varvec{e}_j}-\rho _{\varvec{t}_0}}{\epsilon }\rho _{\varvec{t}_0}^{-1}\). Then, the Cauchy–Schwarz inequality implies

$$\begin{aligned} \mathrm{Tr}\,\Big [ X^\dag X \, \rho _{\varvec{t}_0}\Big ] \, \mathrm{Tr}\,\Big [ Y^\dag Y \, \rho _{\varvec{t}_0}\Big ]&\ge \left| \mathrm{Tr}\,\Big [ X^\dagger Y \rho _{\varvec{t}_0} \Big ] \right| ^2 \nonumber \\&= \left| \sum _{i,j} \overline{a}_i b_j ~\int x_i \mathrm{Tr}\,\left[ M(d \varvec{x}) \frac{ \rho _{\varvec{t}_0+ \epsilon \varvec{e}_j}-\rho _{\varvec{t}_0}}{\epsilon }\right] \right| ^2\nonumber \\&= |\langle \varvec{a}| \varvec{b}\rangle |^2 \, , \end{aligned}$$
(85)

the second equality following from \(\epsilon \)-local unbiasedness at \(\varvec{t}_0\). Note that one has \(\mathrm{Tr}\,[Y^\dag Y \rho _{\varvec{t}_0}] = \langle \varvec{b} | \, \tilde{J}_{\varvec{t}_0, \epsilon } \, |\varvec{b} \rangle \) and

$$\begin{aligned} \langle \varvec{a} |V_{\varvec{t}_0}(M) |\varvec{a} \rangle - \mathrm{Tr}\,\Big [ X^\dagger X \rho _{\varvec{t}_0}\Big ]&= \int \mathrm{Tr}\,\left[ \Big (\sum _i a_i x_i- X\Big )^\dagger \Big (\sum _j a_j x_j- X\Big ) M(d \varvec{x}) \rho _{\varvec{t}_0}\right] \nonumber \\&\ge 0 \,. \end{aligned}$$
(86)

Choosing \(\varvec{b}:= (\tilde{J}_{\varvec{t}_0,\epsilon })^{-1} \varvec{a}\), we have

$$\begin{aligned} \langle \varvec{a} |V_{\varvec{t}_0}(M) |\varvec{a} \rangle ~ \langle \varvec{a} | \tilde{J}_{\varvec{t}_0,\epsilon }^{-1} |\varvec{a} \rangle&\ge \mathrm{Tr}\,\Big [ X^\dag X \, \rho _{\varvec{t}_0}\Big ] \, \mathrm{Tr}\,\Big [ Y^\dag Y \, \rho _{\varvec{t}_0}\Big ] \nonumber \\&\ge \left| \mathrm{Tr}\,\Big [ X^\dagger Y \rho _{\varvec{t}_0} \Big ] \right| ^2 \nonumber \\&=\left| \langle \varvec{a} | (\tilde{J}_{\varvec{t}_0,\epsilon })^{-1} | \varvec{a}\rangle \right| ^2 \, , \end{aligned}$$
(87)

which implies \(\langle \varvec{a} |V_{\varvec{t}_0}(M) |\varvec{a} \rangle \ge \langle \varvec{a} | (\tilde{J}_{\varvec{t}_0,\epsilon })^{-1} | \varvec{a}\rangle \). Since \(\varvec{a}\) is arbitrary, the last inequality implies (84). \(\quad \square \)

We will call (84) the \(\epsilon \)-difference RLD Cramér–Rao inequality.

The \(\epsilon \)-difference RLD Cramér–Rao inequality can be used to derive an information processing inequality, which states that the \(\epsilon \)-difference RLD Fisher information matrix is non-increasing under the application of measurements. For a family of probability distributions \(\{P_{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\), we assume that \(P_{\varvec{t}+\epsilon \varvec{e}_j }\) is absolutely continuous with respect to \(P_{\varvec{t}}\) for every j. Then, the \(\epsilon \)-difference Fisher information matrix is defined as

$$\begin{aligned} \left( J_{\varvec{t},\epsilon }\right) _{ij} : = \int \left( \frac{p_{\varvec{t}+\epsilon \varvec{e}_i }(x)-1}{\epsilon }\right) \, \left( \frac{p_{\varvec{t}+\epsilon \varvec{e}_j }(x)-1}{\epsilon }\right) ~ P_{\varvec{t}}(dx) \end{aligned}$$
(88)

where \(p_{\varvec{t}+\epsilon \varvec{e}_i }\) and \(p_{\varvec{t}+\epsilon \varvec{e}_j }\) are the Radon–Nikodým derivatives of \(P_{\varvec{t}+\epsilon \varvec{e}_i }\) and \(P_{\varvec{t}+\epsilon \varvec{e}_j }\) with respect to \(P_{\varvec{t}}\), respectively. We note that the papers [43, 44] defined its one-parameter version when the distributions are absolutely continuous with respect to the Lebesgue measure. When an estimator \(\hat{\varvec{t}}\) for the distribution family \(\{P_{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\) is \(\epsilon \)-locally unbiased at \(\varvec{t}_0\), in the same way as (84), we can show the \(\epsilon \)-difference Cramér–Rao inequality

$$\begin{aligned} V_{\varvec{t}_0}[\hat{\varvec{t}}]\ge ({J}_{\varvec{t}_0,\epsilon })^{-1}. \end{aligned}$$
(89)

For a family of quantum states \(\{\rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\) and a POVM M, we denote by \(J_{\varvec{t},\epsilon }^M\) the \(\epsilon \)-difference Fisher information matrix of the probability distribution family \(\{ P^M_{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\) defined by \(P^M_{\varvec{t}}(\mathsf {B}) : =\mathrm{Tr}\,\big [ M(\mathsf {B})\rho _{\varvec{t}}\big ]\). With this notation, we have the following lemma:

Lemma 8

For every family of quantum states \(\{\rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\) and every POVM M, one has the information processing inequality

$$\begin{aligned} \tilde{J}_{\varvec{t}_0,\epsilon } \ge J_{\varvec{t}_0,\epsilon }^M \, , \end{aligned}$$
(90)

where \(\tilde{J}_{\varvec{t}_0,\epsilon }\) is the \(\epsilon \)-difference RLD Fisher information of the model \(\{\rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\).

Proof

Consider the estimation of \(\varvec{t}\) from the probability distribution family \(\{ P^M_{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\). Following the same arguments used for the achievability of the Cramér–Rao bound with locally unbiased estimators (see, for instance, Chapter 2 of Ref. [34]), it is possible to show that there exists an \(\epsilon \)-locally unbiased estimator \(\hat{\varvec{t}}\) at \(\varvec{t}_0\) such that

$$\begin{aligned} V_{\varvec{t}_0}\left( \hat{\varvec{t}} \right) = (J_{\varvec{t}_0,\epsilon }^M)^{-1} \, . \end{aligned}$$
(91)

Combining the POVM M with the \(\epsilon \)-locally unbiased estimator \(\hat{\varvec{t}}\) we obtain a new POVM \(M'\), which is \(\epsilon \)-locally unbiased. Applying Theorem 4 to the POVM \(M'\) we obtain

$$\begin{aligned} ( \tilde{J}_{\varvec{t}_0,\epsilon } )^{-1}\le V_{\varvec{t}_0}\left( M' \right) = V_{\varvec{t}_0}\left( \hat{\varvec{t}} \right) = (J_{\varvec{t}_0,\epsilon }^M)^{-1} \, , \end{aligned}$$
(92)

which implies (90). \(\quad \square \)

We stress that (90) is a matrix inequality for Hermitian matrices: in general, \(\tilde{J}_{\varvec{t}_0,\epsilon }\) has complex entries. Also note that any classical process can be regarded as a POVM. Hence, in the same way as (90), using the \(\epsilon \)-difference Cramér–Rao inequality (89), we can show the inequality

$$\begin{aligned} J_\epsilon \ge J_\epsilon ' \end{aligned}$$
(93)

for a classical process \(\mathcal{E}\), where \(J_\epsilon \) is the \(\epsilon \)-difference Fisher information matrix of the distribution family \(\{P_{\varvec{t}}\}_{\varvec{t}\in \mathsf {\Theta }}\) and \(J_\epsilon '\) is the \(\epsilon \)-difference Fisher information matrix of the distribution family \(\{\mathcal{E}(P_{\varvec{t}})\}_{\varvec{t}\in \mathsf {\Theta }}\).
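
The information processing inequality (90) can also be checked numerically. In the sketch below (toy qubit model and four-outcome POVM, both our own choices), the difference \(\tilde{J}_{\varvec{t}_0,\epsilon }-J^M_{\varvec{t}_0,\epsilon }\) should be positive semi-definite up to rounding.

```python
import numpy as np

# Numerical check of Eq. (90) for a toy qubit model and POVM (our choices).
sx = np.array([[0, 1], [1, 0]], complex)
sy = np.array([[0, -1j], [1j, 0]], complex)

def rho(t):  # two-parameter qubit family, invertible near t = 0
    return 0.5*(np.eye(2) + t[0]*sx + t[1]*sy)

POVM = [(np.eye(2) + s*p)/4 for p in (sx, sy) for s in (+1, -1)]  # sums to I

def quantum_J(t0, eps, k=2):        # Eq. (81)
    r0inv = np.linalg.inv(rho(t0))
    r = [rho(t0 + eps*np.eye(k)[i]) for i in range(k)]
    return np.array([[(np.trace(r[i] @ r0inv @ r[j]) - 1)/eps**2
                      for j in range(k)] for i in range(k)])

def classical_J(t0, eps, k=2):      # Eq. (88) for the outcome distribution
    p0 = np.array([np.trace(M @ rho(t0)).real for M in POVM])
    ratios = [np.array([np.trace(M @ rho(t0 + eps*np.eye(k)[i])).real
                        for M in POVM])/p0 for i in range(k)]
    return np.array([[np.sum(p0*(ratios[i]-1)*(ratios[j]-1))/eps**2
                      for j in range(k)] for i in range(k)])

t0, eps = np.array([0.1, 0.05]), 1e-3
gap = quantum_J(t0, eps) - classical_J(t0, eps)
print(np.linalg.eigvalsh((gap + gap.conj().T)/2))   # all >= 0, up to rounding
```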

5.3 Extended models

The lemmas in the previous subsection can be generalized to the case where an extended model \(\mathcal{M}':= \{\rho _{\varvec{t}'}\}_{\varvec{t}'=(\varvec{t},\varvec{p})}\) contains the original model \(\mathcal{M} \) as \(\rho _{\varvec{t}}=\rho _{(\varvec{t},\varvec{0})}\). Choosing \(\varvec{t}_0'=(\varvec{t}_0,\varvec{0})\), we denote the \(\epsilon \)-difference RLD Fisher information matrix at \(\varvec{t}_0'\) for the family \(\mathcal{M}'\) by \(\tilde{J}_{\varvec{t}_0',\epsilon }\).

Lemma 9

For an \(\epsilon \)-locally unbiased estimator M at \(\varvec{t}_0'\), there exists a \(k \times k'\) matrix P such that \(P_{ij} =\delta _{ij}\) for \( i,j\le k\) and

$$\begin{aligned} V_{\varvec{t}_0}(M)&\ge P \tilde{J}_{\varvec{t}_0',\epsilon }^{-1}P^T. \end{aligned}$$

Proof of Lemma 9

For an \(\epsilon \)-locally unbiased estimator M at \(\varvec{t}_0'\), there exists a \(k \times k'\) matrix P such that

$$\begin{aligned} P_{ij}&=\delta _{ij} \hbox { for }i,j\le k \end{aligned}$$
(94)
$$\begin{aligned} \epsilon P_{ij}&= \int x_i \mathrm{Tr}\,M(d \varvec{x}) ( \rho _{\varvec{t}_0'+ \epsilon \varvec{e}_j}-\rho _{\varvec{t}_0'}) \hbox { for }i\le k, k+1 \le j \le k'. \end{aligned}$$
(95)

Now, we introduce a new parametrization \(\tilde{\rho }_{\eta }:= \rho _{\varvec{t}_0'+ \sum _{i,j}\eta _i P^{-1}_{j,i}\varvec{e}_j}\), where P is extended to an invertible \(k' \times k'\) matrix whose first k rows are given by (94) and (95). Since \(\frac{\partial t'_j}{\partial \eta _i} = P^{-1}_{j,i}\), the \(\epsilon \)-difference RLD quantum Fisher information under the parameter \(\eta \) is \( (P^{-1})^{T} \tilde{J}_{\varvec{t}_0',\epsilon } P^{-1}\). Applying Theorem 4 to the parameter \(\eta \), we obtain

$$\begin{aligned} V_{\varvec{t}_0}(M) \ge P \tilde{J}_{\varvec{t}_0',\epsilon }^{-1}P^T. \end{aligned}$$
(96)

Combining (94) and (96), we obtain the desired statement. \(\quad \square \)

In the same way as Lemma 8, Lemma 9 yields the following lemma.

Lemma 10

For any POVM M, there exists a \(k \times k'\) matrix P such that \(P_{ij} =\delta _{ij}\) for \( i,j\le k\) and

$$\begin{aligned} (J_{\varvec{t}_0,\epsilon }^M)^{-1}&\ge P \tilde{J}_{\varvec{t}_0',\epsilon }^{-1}P^T . \end{aligned}$$
(97)

5.4 Asymptotic case

We denote by \(\tilde{J}_{\varvec{t}_0,\epsilon }^{n}\) the \(\epsilon \)-difference RLD Fisher information matrix of the n-copy states \(\{\rho _{\varvec{t}}^{\otimes n}\}_{\varvec{t} \in \mathsf {\Theta }}\).

In the following we provide the analogue of Lemma 1 for the RLD Fisher information matrix.

Lemma 11

When the parametrization is \(C^1\) continuous, the following relations hold

$$\begin{aligned} \lim _{n \rightarrow \infty }\frac{1}{n}\tilde{J}_{\varvec{t}_0, \frac{\epsilon }{\sqrt{n}}}^{n}&= \tilde{J}^{[\epsilon ]}_{\varvec{t}_0} \end{aligned}$$
(98)
$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \tilde{J}^{[\epsilon ]}_{\varvec{t}_0}&= \tilde{J}_{\varvec{t}_0} , \end{aligned}$$
(99)

where \(\tilde{J}^{[\epsilon ]}_{\varvec{t}_0}\) is the matrix defined by \(\left( \tilde{J}^{[\epsilon ]}_{\varvec{t}_0} \right) _{i,j}:= \frac{1}{\epsilon ^2} \left[ e^{\epsilon ^2 \left( \tilde{J}_{\varvec{t}_0}\right) _{i,j}}-1\right] \).

Proof of Lemma 11

Eq. (99) holds trivially. Using (83), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n} \left( \tilde{J}_{\varvec{t}_0, \frac{\epsilon }{\sqrt{n}}}^n \right) _{i,j}&=\lim _{n\rightarrow \infty } \frac{1}{\epsilon ^2} \left( \mathrm{Tr}\,\left[ \rho _{\varvec{t}_0+ \frac{\epsilon }{\sqrt{n}} \varvec{e}_i}^{\otimes n} (\rho _{\varvec{t}_0}^{\otimes n})^{-1} \rho _{\varvec{t}_0+ \frac{\epsilon }{\sqrt{n}} \varvec{e}_j}^{\otimes n}\right] -1\right) \\&= \lim _{n\rightarrow \infty } \frac{1}{\epsilon ^2} \left[ \left( 1+ \frac{\epsilon ^2}{n} \left( \tilde{J}_{\varvec{t}_0}\right) _{i,j} +O\left( \frac{1}{n^2}\right) \right) ^n -1\right] \\&= \frac{1}{\epsilon ^2} \left[ e^{\epsilon ^2 \left( \tilde{J}_{\varvec{t}_0}\right) _{i,j}}-1 \right] \, , \end{aligned}$$

which implies (98). \(\quad \square \)
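
Since \(\mathrm{Tr}\,[X^{\otimes n}]=(\mathrm{Tr}\,X)^n\), the n-copy quantity in (98) reduces to a scalar power, so the relation can be checked numerically without forming tensor products. A toy sketch (our own qubit model):

```python
import numpy as np

# Check of Eq. (98): the n-copy (eps/sqrt(n))-difference RLD matrix is a
# scalar power of single-copy traces. Toy two-parameter qubit model (ours).
sx = np.array([[0, 1], [1, 0]], complex)
sy = np.array([[0, -1j], [1j, 0]], complex)

def rho(t):
    return 0.5*(np.eye(2) + t[0]*sx + t[1]*sy)

def J_n(t0, eps, n, k=2):
    """(1/n) times the (eps/sqrt(n))-difference RLD matrix of the n-copy model."""
    d = eps/np.sqrt(n)
    r0inv = np.linalg.inv(rho(t0))
    r = [rho(t0 + d*np.eye(k)[i]) for i in range(k)]
    return np.array([[(np.trace(r[i] @ r0inv @ r[j])**n - 1)/eps**2
                      for j in range(k)] for i in range(k)])

t0, eps = np.array([0.1, 0.05]), 0.5
Jt = J_n(t0, 1e-4, 1)                       # ~ RLD Fisher information, Eq. (83)
target = (np.exp(eps**2*Jt) - 1)/eps**2     # entrywise J^[eps] of Lemma 11
for n in (10, 1000, 100000):
    print(n, np.abs(J_n(t0, eps, n) - target).max())   # decreases towards 0
```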

6 Precision Bounds for Multiparameter Estimation

6.1 Covariance conditions

First, we introduce the condition imposed on our estimators. The correspondence between qudit states and Gaussian states also extends to the level of estimators. We consider a generic state family \(\mathcal{M}=\{\rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\), with the parameter space \(\mathsf {\Theta }\) being an open subset of \(\mathbb {R}^k\). Similarly to the single-parameter case, given a point \(\varvec{t}_{0} \in \mathsf {\Theta }\), we consider the local model \(\rho ^n_{\varvec{t}_0,\varvec{t}}:=\rho ^{\otimes n}_{\varvec{t}_0+\varvec{t}/\sqrt{n}}\). Throughout this section, we assume that \(\rho _{\varvec{t}_0}\) is invertible. For a sequence of POVMs \(\mathfrak {m}:=\{M_n\}\), we introduce the condition of local asymptotic covariance as follows:

Condition 2

(Local asymptotic covariance). We say that a sequence of measurements \(\mathfrak {m}:=\{M_n\}\) satisfies local asymptotic covariance at \(\varvec{t}_{0} \in \mathsf {\Theta }\) under the state family \(\mathcal{M}\), if the probability distribution

$$\begin{aligned} \wp ^{n}_{\varvec{t}_{0},\varvec{t}|M_n}(\mathsf {B}) :=\mathrm{Tr}\,\rho _{\varvec{t}_0+\frac{\varvec{t}}{\sqrt{n}}} ^{\otimes n}M_n \left( \frac{\mathsf {B}}{\sqrt{n}}+\varvec{t}_0\right) \end{aligned}$$
(100)

converges to a limiting distribution

$$\begin{aligned} \wp _{\varvec{t}_{0},\varvec{t} |\mathfrak {m}}(\mathsf {B}):= \lim _{n\rightarrow \infty } \wp ^n_{\varvec{t}_{0},\varvec{t}|M_n}(\mathsf {B}), \end{aligned}$$
(101)

and the relation

$$\begin{aligned} \wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}(\mathsf {B}+\varvec{t}) = \wp _{\varvec{t}_{0},\varvec{0}| \mathfrak {m}}(\mathsf {B}) \end{aligned}$$
(102)

holds for any \(\varvec{t}\in \mathbb {R}^k\). When we need to express the outcome of \(\wp ^n_{\varvec{t}_{0},\varvec{t}|M_n} \) or \(\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}\), we denote it by \(\hat{\varvec{t}}\).

Further, we say that a sequence of measurements \(\mathfrak {m}:=\{M_n\}\) satisfies local asymptotic covariance under the state family \(\mathcal{M}\) when it satisfies local asymptotic covariance at every element \(\varvec{t}_{0} \in \mathsf {\Theta }\) under the state family \(\mathcal{M}\).

With these preparations, we obtain the following theorem by using Theorem 3.

Theorem 5

Let \(\{\rho ^{\otimes n}_{\varvec{t}}\}_{\varvec{t}\in \mathsf {\Theta }}\) be a k-parameter D-invariant qudit model with \(C^2\) continuous parametrization. Assume that \(\tilde{J}^{-1}_{\varvec{t}_0}\) exists, \(\rho _{\varvec{t}_0}\) is a non-degenerate state, and a sequence of measurements \(\mathfrak {m}:=\{M_n\}\) satisfies local asymptotic covariance at \(\varvec{t}_{0} \in \mathsf {\Theta }\). Then there exists a covariant POVM \(\widetilde{M}^G\) such that

$$\begin{aligned} \mathrm{Tr}\,\widetilde{M}^G(\mathsf {B})G[\varvec{t},\tilde{J}^{-1}_{\varvec{t}_0}]= \wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}(\mathsf {B}) \end{aligned}$$
(103)

for any vector \(\varvec{t}\) and any measurable subset \(\mathsf {B}\). Here \(\tilde{J}_{\varvec{t}_0}\) is the RLD Fisher information of the qudit model at \(\varvec{t}_0\).

To show Theorem 5, we will use the following lemma.

Lemma 12

For a function f, an operator F, and a c–q Gaussian state in the canonical form \(G[\varvec{\alpha },\varGamma ]\), the relation

$$\begin{aligned} \mathrm{Tr}\,F G[\varvec{\alpha },\varGamma ]= f(\varvec{\alpha }) \end{aligned}$$
(104)

holds for any vector \(\varvec{\alpha }\) if and only if

$$\begin{aligned} F=\int d\varvec{y}\,\mathcal {F}^{-1}_{\varvec{\xi }\rightarrow \varvec{y}}\left( \sqrt{\pi ^k} e^{\frac{1}{4}\sum _j\frac{\xi _j^2}{1-\gamma _j}}\mathcal {F}_{\varvec{\alpha }\rightarrow \varvec{\xi }}\left( f (\varvec{\alpha }) \right) \right) |\varvec{y}\rangle \langle \varvec{y}|. \end{aligned}$$
(105)

Here \(\varvec{\xi }\) and \(\varvec{y}\) are k-dimensional vectors, \(|\varvec{y}\rangle \) is a (multimode) coherent state, \(\gamma _j\) are thermal parameters of the Gaussian state, and \(\mathcal {F}^{-1}_{\varvec{\xi }\rightarrow \varvec{y}}\) denotes the inverse of the Fourier transform \(\mathcal {F}_{\varvec{\xi }\rightarrow \varvec{y}}(g):=\int d\varvec{\xi }\ e^{i\varvec{\xi }\cdot \varvec{y}} g\). Therefore, for a given function \(f(\varvec{\alpha })\), there exists a unique operator F satisfying (104).

The proof can be found in “Appendix G”. Now, we are ready to prove Theorem 5.

Proof of Theorem 5

We consider without loss of generality \(G[\varvec{t},\tilde{J}_{\varvec{t}_0}^{-1}]\) to be in the canonical form, noticing that any Gaussian state is unitarily equivalent to a Gaussian state in the canonical form as shown by Lemma 3. For any measurable set \(\mathsf {B}\), we define the operator \(\widetilde{M}^G(\mathsf {B})\) as

$$\begin{aligned} \widetilde{M}^G(\mathsf {B}):=\int d\varvec{y}\,\mathcal {F}^{-1}_{\varvec{\xi }\rightarrow \varvec{y}}\left( \sqrt{\pi ^k} e^{\frac{1}{4}\sum _j\frac{\xi _j^2}{1-\gamma _j}}\mathcal {F}_{\varvec{t}\rightarrow \varvec{\xi }}\left( \wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}(\mathsf {B}) \right) \right) |\varvec{y}\rangle \langle \varvec{y}|. \end{aligned}$$
(106)

From the above definition, it can be verified that \(\widetilde{M}^G\) satisfies the definition of a POVM. First, it is immediate to see that the term \(\mathcal {F}^{-1}_{\varvec{\xi }\rightarrow \varvec{y}}\left( \sqrt{\pi ^k}e^{\frac{1}{4}\sum _j\frac{\xi _j^2}{1-\gamma _j}}\mathcal {F}_{\varvec{t}\rightarrow \varvec{\xi }}\left( \wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}(\mathsf {B}) \right) \right) \) equals the convolution of \(\wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}(\mathsf {B})\) and a Gaussian function, by the convolution theorem. Since both functions are non-negative, the result of the convolution is still non-negative, which implies that \(\widetilde{M}^G(\mathsf {B})\ge 0\). Second, by linearity we have \(\widetilde{M}^G(\mathsf {B}_1\cup \mathsf {B}_2)=\widetilde{M}^G(\mathsf {B}_1)+\widetilde{M}^G(\mathsf {B}_2)\) for any disjoint sets \(\mathsf {B}_1\) and \(\mathsf {B}_2\). Finally, the normalization \(\widetilde{M}^G(\mathbb {R}^k)=I\) can be shown by combining the linearity with the fact that \(\wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}\) is a probability distribution.

Moreover, Lemma 12 guarantees that

$$\begin{aligned} \mathrm{Tr}\,\widetilde{M}^G(\mathsf {B}) G[\varvec{t},\tilde{J}_{\varvec{t}_0}^{-1}]= \wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}(\mathsf {B}). \end{aligned}$$
(107)

What remains to be shown is that the POVM \(\{\widetilde{M}^G(\mathsf {B})\}\) satisfies the covariance condition. Eq. (107) guarantees that

$$\begin{aligned} \mathrm{Tr}\,T_{\varvec{t}'} \widetilde{M}^G(\mathsf {B}) T_{\varvec{t}'}^\dagger G[\varvec{t},\tilde{J}_{\varvec{t}_0}^{-1}]= \mathrm{Tr}\,\widetilde{M}^G(\mathsf {B}) G[\varvec{t}-\varvec{t}',\tilde{J}_{\varvec{t}_0}^{-1}]= \wp _{\varvec{t}_0,\varvec{t}-\varvec{t}'|\mathfrak {m}}(\mathsf {B}), \end{aligned}$$

and

$$\begin{aligned} \mathrm{Tr}\,\widetilde{M}^G(\mathsf {B}+\varvec{t}') G[\varvec{t},\tilde{J}_{\varvec{t}_0}^{-1}]= \wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}(\mathsf {B}+\varvec{t}') = \wp _{\varvec{t}_0,\varvec{t}-\varvec{t}'|\mathfrak {m}}(\mathsf {B}). \end{aligned}$$

The uniqueness of the operator satisfying condition (104) implies the covariance condition

$$\begin{aligned} \widetilde{M}^G(\mathsf {B}+\varvec{t}') = T_{\varvec{t}'} \widetilde{M}^G(\mathsf {B}) T_{\varvec{t}'}^\dagger . \end{aligned}$$

\(\square \)

6.2 MSE bound for the D-invariant case

Next, we derive a lower bound on the MSE of the limiting distribution for any D-invariant model. As an extension of the mean square error, we introduce the mean square error matrix (MSE matrix), defined as

$$\begin{aligned} V_{i,j}[\wp ]:= \int \,x_i x_j \wp (dx) \end{aligned}$$
(108)

for a generic probability distribution \(\wp \). Since the set of symmetric matrices is not totally ordered, we will consider the minimization of the expectation value \(\mathrm{tr}\,W V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}]\) for a certain weight matrix \(W\ge 0\). For short, we will refer to the quantity \(\mathrm{tr}\,W V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}]\) as the weighted MSE.

Under local asymptotic covariance, one can derive lower bounds on the covariance matrix of the limiting distribution and construct optimal measurements to achieve them. In general, the attainability of the conventional quantum Cramér–Rao bounds is a challenging issue. For instance, a well-known bound is the symmetric logarithmic derivative (SLD) Fisher information bound

$$\begin{aligned} \mathrm{tr}\,W V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}]\ge \mathrm{tr}\,W J_{\varvec{t}_0}^{-1}, \end{aligned}$$
(109)

where \(J_{\varvec{t}_0}\) is the SLD Fisher information. The SLD bound is attainable in the single-parameter case, i.e. when \(k=1\), yet it is in general not attainable in multiparameter estimation (see Sect. 10.1 for a concrete example).

In the following, we derive an attainable lower bound on the weighted MSE. To this purpose, we define the set \(\mathsf {LAC}(\varvec{t}_{0})\) of sequences of measurements satisfying local asymptotic covariance at the point \(\varvec{t}_0 \in \mathsf {\Theta }\). For a model \(\mathcal{M}\), we focus on the minimum value

$$\begin{aligned} \mathcal {C}_{\mathcal {M}}(W,\varvec{t}_{0}):=\min _{\mathfrak {m}' \in \mathsf {LAC}(\varvec{t}_{0}) } \mathrm{tr}\,W V [\wp _{\varvec{t}_{0},\varvec{t} |\mathfrak {m}'}]. \end{aligned}$$
(110)

When \(k\ge 2\), a tighter benchmark is provided by the RLD quantum Fisher information bound. The main result of this section is an attainable bound on the weighted MSE, relying on the RLD quantum Fisher information.

Theorem 6

(Weighted MSE bound for D-invariant models). Assume that \(\tilde{J}_{\varvec{t}_0}^{-1} \) exists. Consider any sequence of measurements \(\mathfrak {m}:=\{M_n\}\) satisfying local asymptotic covariance. The MSE matrix of the limiting distribution is bounded as

$$\begin{aligned} V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}] \ge (\tilde{J}_{\varvec{t}_0})^{-1}, \end{aligned}$$
(111)

where \(\tilde{J}_{\varvec{t}_0}\) is the RLD quantum Fisher information. When the model is \(C^1\) continuous and D-invariant, the weighted MSE of the limiting distribution with weight matrix \(W\ge 0\) satisfies

$$\begin{aligned} \mathrm{tr}\,W V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}] \ge \mathrm{tr}\,WJ_{\varvec{t}_0}^{-1}+\frac{1}{2}\mathrm{tr}\,\left| \sqrt{W}J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1}\sqrt{W}\right| , \end{aligned}$$
(112)

where \(J_{\varvec{t}_0}\) is the SLD quantum Fisher information (35) and \(D_{\varvec{t}_0}\) is the D-matrix (41). When \(\mathcal {M}\) is a D-invariant qudit model and the state \(\rho _{\varvec{t}_0}\) is not degenerate, we have

$$\begin{aligned} \mathcal {C}_{\mathcal {M}}(W,\varvec{t}_{0}) = \mathrm{tr}\,WJ_{\varvec{t}_0}^{-1}+\frac{1}{2}\mathrm{tr}\,\left| \sqrt{W}J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1}\sqrt{W}\right| . \end{aligned}$$
(113)

Moreover, if \(W>0\) and \(\wp _{\varvec{t}_0,\varvec{0}| \mathfrak {m}}\) has a differentiable PDF, the equality in (112) holds if and only if \(\wp _{\varvec{t}_0,\varvec{t}| \mathfrak {m}}\) is the normal distribution with average zero and covariance

$$\begin{aligned} V_{\varvec{t_0}|W}:=J_{\varvec{t}_0}^{-1}+\frac{1}{2} \sqrt{W}^{-1} \left| \sqrt{W}J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1}\sqrt{W}\right| \sqrt{W}^{-1}. \end{aligned}$$
(114)

Further, when \(\{\rho _{\varvec{t}}\}_{\varvec{t}\in \mathsf {\Theta }}\) is a qudit model with \(C^2\) continuous parametrization, the equality in (112) is attainable, i.e., there exist a sequence of POVMs \(M_W^{\varvec{t}_0,n}\), a compact set K, and a constant \(c(\varvec{t}_0)\) such that

$$\begin{aligned} \limsup _{n \rightarrow \infty }\sup _{\varvec{t}_0 \in K} \sup _{\varvec{t} \in \mathsf {\Theta }_{n,x,c(\varvec{t}_0)} } n^{\kappa }\Vert \wp ^n_{\varvec{t}_0,\varvec{t}| M_W^{\varvec{t}_0,n}} - N[\varvec{t}, V_{\varvec{t_0}|W}] \Vert _1 <\infty , \end{aligned}$$
(115)

where \(\kappa \) is a parameter satisfying \(\kappa \ge 0.027\).

In the following, we prove Theorem 6 in three steps. The first step is to derive the bound (112). The second step is to show that, to achieve the equality, the limiting distribution needs to be a Gaussian with a certain covariance. The last step is to find a measurement attaining the equality. In this way, when the state is not degenerate, we can construct the measurement using Q-LAN.
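
Before the proof, we note that the right-hand side of (112) is straightforward to evaluate numerically; the sketch below does so for invented \(J_{\varvec{t}_0}\), \(D_{\varvec{t}_0}\) and W (purely illustrative, not taken from any model in the paper).

```python
import numpy as np

# RHS of Eq. (112) for given SLD information J, D-matrix D, and weight W.
# The matrices below are invented, purely to illustrate the evaluation.
def d_invariant_bound(J, D, W):
    Jinv = np.linalg.inv(J)
    w, U = np.linalg.eigh(W)
    sqW = U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.T
    X = sqW @ Jinv @ D @ Jinv @ sqW        # real antisymmetric matrix
    trace_norm = np.abs(np.linalg.eigvals(X)).sum()   # sum of |eigenvalues|
    return (np.trace(W @ Jinv) + 0.5*trace_norm).real

J = np.diag([4.0, 4.0])                    # illustrative SLD Fisher information
D = np.array([[0.0, 1.0], [-1.0, 0.0]])    # illustrative D-matrix
print(d_invariant_bound(J, D, np.eye(2)))  # 0.5 (SLD term) + 0.0625 (Im term)
```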

Proof of Theorem 6

Impossibility part (proofs of (111) and (112)):

To give a proof, we focus on the \(\epsilon \)-difference RLD Fisher information matrix \(\tilde{J}_{\varvec{t}_0,\epsilon }\) at \(\varvec{t}_0\) for a quantum states family \(\{\rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\). We denote the \(\epsilon \)-difference Fisher information matrices for the distribution families \(\{\wp ^n_{\varvec{t}_{0},\varvec{t}|M_n}\}_{\varvec{t}}\) and \(\{\wp _{\varvec{t}_{0},\varvec{t} |\mathfrak {m}}\}_{\varvec{t}}\) by \(J_{\varvec{t},\epsilon }^n\) and \(J_{\varvec{t},\epsilon }^{\mathfrak {m}}\), respectively. Also, we employ the notations given in Sect. 5.4.

Applying (90) to the POVM \(M_n\), we have

$$\begin{aligned} \frac{1}{n}\tilde{J}_{\varvec{t}_0, \epsilon /\sqrt{n}}^n \ge J^n_{\varvec{0},\epsilon }, \end{aligned}$$
(116)

where \(\tilde{J}_{\varvec{t}_0,\epsilon /\sqrt{n}}^{n}\) denotes the \((\epsilon /\sqrt{n})\)-difference RLD Fisher information matrix of the n-copy states \(\{\rho _{\varvec{t}}^{\otimes n}\}_{\varvec{t} \in \mathsf {\Theta }}\). Combining the above with (98) of Lemma 11, we find that \(J^n_{\varvec{0},\epsilon } < (\tilde{J}^{[\epsilon ]}_{\varvec{t}_0} +I)\) for sufficiently large n. Hence, we can apply Lemma 21 (see “Appendix B”) to the distribution family \(\{\wp ^n_{\varvec{t}_{0},\varvec{t}|M_n}\}_{\varvec{t}}\). Then, for any complex vector \(\varvec{a}\), we have

$$\begin{aligned} \liminf _{n \rightarrow \infty } \langle \varvec{a}| J_{\varvec{0},\epsilon }^n -J_{\varvec{0},\epsilon }^{\mathfrak {m}}|\varvec{a}\rangle \ge 0 . \end{aligned}$$
(117)

By taking the limit \(n\rightarrow \infty \), the combination of (116), (98) of Lemma 11, and (117) implies

$$\begin{aligned} \tilde{J}^{[\epsilon ]}_{\varvec{t}_0} \ge J_{\varvec{0},\epsilon }^{\mathfrak {m}}. \end{aligned}$$
(118)

Here, in the same way as in the proof of Theorem 2, we can assume that the outcome \(\hat{\varvec{t}}\) satisfies the unbiasedness condition. Hence, the \(\epsilon \)-difference Cramér–Rao inequality (89) implies that

$$\begin{aligned} V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}] \ge \left( J_{\varvec{0},\epsilon }^{\mathfrak {m}}\right) ^{-1} \ge \left( \tilde{J}^{[\epsilon ]}_{\varvec{t}_0} \right) ^{-1}. \end{aligned}$$
(119)

By taking the limit \(\epsilon \rightarrow 0\), (99) of Lemma 11 implies

$$\begin{aligned} V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}] \ge \left( \tilde{J}_{\varvec{t}_0} \right) ^{-1}. \end{aligned}$$
(120)

When the model is \(C^1\) continuous and D-invariant, adding the conventional discussion for MSE bounds (see, e.g., Chapter 6 of [5]) to (120), we obtain (112).

Achievability part (Proof of (113)):

Next, we discuss the attainability of the bound when \(W>0\) and \(\wp _{\varvec{t}_0,\varvec{0}| \mathfrak {m}}\) has a differentiable PDF. In this case, we have the Fisher information matrix \(J_{\varvec{0}}^{\mathfrak {m}}\) of the location shift family \(\{\wp _{\varvec{t}_0,\varvec{t}| \mathfrak {m}}\}_{\varvec{t}}\). Taking the limit \(\epsilon \rightarrow 0\) in (119), we have

$$\begin{aligned} V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}] \ge \left( J_{\varvec{0}}^{\mathfrak {m}}\right) ^{-1} \ge \left( \tilde{J}_{\varvec{t}_0} \right) ^{-1}. \end{aligned}$$
(121)

The equality in (112) holds if and only if \(V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}] = V_{\varvec{t_0}|W}\) and the equality in the first inequality of (121) holds. By the same discussion as in the proof of Theorem 2, the equality in the first inequality of (121) holds only when each component of the logarithmic derivative of the distribution family \(\{\wp _{\varvec{t}_{0},\varvec{t} |\mathfrak {m}}\}_{\varvec{t}}\) is a linear combination of the components of the estimate. This condition is equivalent to the condition that \(\{\wp _{\varvec{t}_{0},\varvec{t} |\mathfrak {m}}\}_{\varvec{t}}\) is a family of shifted normal distributions. Therefore, when \(W>0\), the equality condition of Eq. (112) is that \(\wp _{\varvec{t}_0,\varvec{t}| \mathfrak {m}}\) is the normal distribution with average zero and covariance matrix \(V_{\varvec{t_0}|W}\).

Now, we assume that the state \(\rho _{\varvec{t}_0}\) is not degenerate. Then, we use Q-LAN to show that there always exists a sequence of POVMs \(\mathfrak {m}=\{M_n\}\) satisfying the above property. We rewrite Eq. (77) of Theorem 3 as follows.

$$\begin{aligned} \limsup _{n \rightarrow \infty }\sup _{\varvec{t}_0 \in K} \sup _{\varvec{t} \in \mathsf {\Theta }_{n,x,c(\varvec{t}_0)} } n^{\kappa } \left\| \mathcal {T}^\mathrm{QLAN}_n \left( \rho _{\varvec{t}_0+\varvec{t}/\sqrt{n}}^{\otimes n}\right) -G[{\varvec{t}},\tilde{J}_{\varvec{t}_0}^{-1}] \right\| _1 <\infty , \end{aligned}$$
(122)

where the notation is the same as in Theorem 3. Then, we choose the covariant POVM \(M_{W}^{\tilde{J}_{\varvec{t}_0}^{-1}}\) on the Gaussian system given in Lemma 6. When the POVM \(M_{W}^{\tilde{J}_{\varvec{t}_0}^{-1}}\) is applied to the system in the state \(G[{\varvec{t}},\tilde{J}_{\varvec{t}_0}^{-1}] \), the outcome obeys the distribution \(N[\varvec{t}, V_{\varvec{t_0}|W}] \). Therefore, due to (122), the sequence \(\mathfrak {m} \) of POVMs \(M_n(\mathsf {B}):= (\mathcal {T}^\mathrm{QLAN}_n)^\dag (M_{W}^{\tilde{J}_{\varvec{t}_0}^{-1}} (\mathsf {B}))\) satisfies the condition (115).

Notice that when W has null eigenvalues, \(\sqrt{W}^{-1}\) is not properly defined. In this case, we consider \(W_{\epsilon }:=W+\epsilon \cdot P_{W}^{\perp }\) for \(\epsilon >0\), where \(P_W^\perp \) is the projection on the kernel of W, i.e. \(P_W^\perp \varvec{x}=\varvec{x}\) if and only if \(W\varvec{x}=0\). Denoting \(J_{\epsilon }^{-1}:=J_{\varvec{t}_0}^{-1}+\frac{1}{2} \sqrt{W_\epsilon }^{-1} \left| \sqrt{W_\epsilon }J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1}\sqrt{W_\epsilon }\right| \sqrt{W_\epsilon }^{-1}\), we have

$$\begin{aligned} \mathrm{tr}\,WJ_{\epsilon }^{-1}\le \mathrm{tr}\,W_{\epsilon }J_{\epsilon }^{-1}-\epsilon \mathrm{tr}\,J_{\epsilon }^{-1}. \end{aligned}$$

Meanwhile, since \(W_\epsilon >0\), we can repeat the above argument to find a qudit measurement that attains \(\mathrm{tr}\,W_{\epsilon }J_{\epsilon }^{-1}\). Taking the limit \(\epsilon \rightarrow 0\), the quantity \(\mathrm{tr}\,WJ_{\epsilon }^{-1}\) converges to the right-hand side of Eq. (113). Therefore, we can still find a sequence of measurements with Fisher information \(\{J_\epsilon \}\) that approaches the bound. \(\quad \square \)

6.3 Precision bound for the estimation of generic models

In the previous subsection, we established the precision bound for D-invariant models, where the bound is attainable and has a closed form. Here we extend the bound to generic n-copy qudit models. The main idea is to extend the model to a larger D-invariant model by introducing additional parameters.

When estimating parameters in a generic model \(\mathcal {M}\) (consisting of states generated by noisy evolutions, for instance), the bound (112) may not hold. It is then convenient to extend the model to a D-invariant model \(\mathcal {M}'\) which contains \(\mathcal {M}\). Since the bound (112) holds for the new model \(\mathcal {M}'\), a corresponding bound can be derived for the original model \(\mathcal {M}\). The new model \(\mathcal {M}'\) has some additional parameters other than those of \(\mathcal {M}\), which are fixed in the original model \(\mathcal {M}\). Therefore, a generic quantum state estimation problem can be regarded as an estimation problem in a D-invariant model with fixed parameters. The task is to estimate parameters in a model \(\mathcal {M}'\) (globally) parameterized as \(\varvec{t}'_0=(\varvec{t}_0,\varvec{p}_0)\in \mathsf {\Theta }'\), where \(\varvec{p}_0\) is a fixed vector and \(\mathsf {\Theta }'\) is an open subset of \(\mathbb {R}^{k'}\) that equals \(\mathsf {\Theta }\) when restricted to \(\mathbb {R}^k\). In the neighborhood of \(\varvec{t}'_0\), since the vector \(\varvec{p}_0\) is fixed, we have \(\varvec{t}'=(\varvec{t},\varvec{0})\) with \(\varvec{0}\) being the null vector of \(\mathbb {R}^{k'-k}\) and \(\varvec{t}\in \mathbb {R}^{k}\) being a vector of free parameters. In this scenario, only the parameters in \(\varvec{t}\) need to be estimated, while the parameters \(\varvec{p}_0\) are known. Hence, the MSE matrix of \(\varvec{t}'\) is of the form

$$\begin{aligned} V[\wp _{\varvec{t}'_0,\varvec{t}'| \mathfrak {m}}]= \left( \begin{array}{cc} V[\wp _{\varvec{t}_0,\varvec{t}| \mathfrak {m}}]&{}0 \\ 0 &{} 0 \end{array}\right) \end{aligned}$$

for any locally asymptotically covariant measurement sequence \(\mathfrak {m}\). Due to the block-diagonal form of the MSE matrix, a weight matrix W in the original model \(\mathcal {S}\) corresponds to the weight matrix \(W'=P^TWP\) in the D-invariant model \(\mathcal {S}'\), where P is any \(k\times k'\) matrix satisfying the constraint \((P)_{ij}=\delta _{ij}\) for \(i,j\le k\). This leads to the following theorem.

Theorem 7

(MSE bound for generic models). Let the models \(\mathcal {S}\) and \(\mathcal {S}'\) be \(C^1\) continuous and given in the same way as in Proposition 1, with the same notation as in Proposition 1. Also, we assume that \(\tilde{J}_{\varvec{t}_0}^{-1} \) exists. Consider any sequence of locally asymptotically covariant measurements \(\mathfrak {m}:=\{M_n\}\). Then, the MSE matrix of the limiting distribution \(\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}\) is evaluated as follows. There exists a \(k \times k'\) matrix P such that

$$\begin{aligned} P_{ij}&=\delta _{ij} \hbox { for }i,j\le k \end{aligned}$$
(123)
$$\begin{aligned} V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}]&\ge P (\tilde{J}_{\varvec{t}_0'})^{-1} P^T, \end{aligned}$$
(124)

where \(\tilde{J}_{\varvec{t}_0'}\) is the RLD quantum Fisher information of \(\mathcal {S}'\). When the model \(\mathcal {S}'\) is a D-invariant model, we have the following bound for the weighted MSE of the limiting distribution with weight matrix \(W\ge 0\):

$$\begin{aligned} \mathrm{tr}\,W V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}] \ge \mathcal {C}_{\mathrm{H},\mathcal{M}}(W,\varvec{t}_{0}) \end{aligned}$$
(125)

with \(\mathcal {C}_{\mathrm{H},\mathcal{M}}(W,\varvec{t}_{0})\) defined in Eq. (46). When the model \(\mathcal {S}'\) is a D-invariant qudit model and the state \(\rho _{\varvec{t}_0}\) is not degenerate, we have

$$\begin{aligned} \mathcal {C}_{\mathcal{M}}(W,\varvec{t}_{0}) = \mathcal {C}_{\mathrm{H},\mathcal{M}}(W,\varvec{t}_{0}). \end{aligned}$$
(126)

Moreover, if \(W>0\) and \(\wp _{\varvec{t}_0,\varvec{0}| \mathfrak {m}}\) has a differentiable PDF, the equality in (125) holds if and only if \(\wp _{\varvec{t}_0,\varvec{t}| \mathfrak {m}}\) is the normal distribution with average zero and covariance matrix \(V_{\varvec{t_0}|W}:=\mathsf{Re}(Z_{\varvec{t}_0}(\varvec{X}))+ \sqrt{W}^{-1} |\sqrt{W} \mathsf{Im} ( Z_{\varvec{t}_0}(\varvec{X}))\sqrt{W}| \sqrt{W}^{-1}\), where \(\varvec{X}\) is the vector realizing the minimum (49). Further, when the models \(\mathcal {S}\) and \(\mathcal {S}'\) are qudit models with \(C^2\) continuous parametrization, the equality in (125) can be attained, i.e., there exist a sequence of POVMs \(M_W^{\varvec{t}_0,n}\), a compact set K, and a constant \(c(\varvec{t}_0)\) such that

$$\begin{aligned} \limsup _{n \rightarrow \infty }\sup _{\varvec{t}_0 \in K} \sup _{\varvec{t} \in \Theta _{n,x,c(\varvec{t}_0)} } n^{\kappa }\left\| \wp ^n_{\varvec{t}_0,\varvec{t}| M_W^{\varvec{t}_0,n}} - N[\varvec{t}, V_{\varvec{t_0}|W}] \right\| _1 <\infty . \end{aligned}$$
(127)
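For concreteness, once a vector \(\varvec{X}\) attaining the minimum is known, the covariance matrix \(V_{\varvec{t_0}|W}\) of Theorem 7 can be computed numerically; the sketch below uses a toy Hermitian matrix in place of \(Z_{\varvec{t}_0}(\varvec{X})\) and a toy weight matrix \(W>0\), not values from any concrete model.

```python
# Sketch of the optimal limiting covariance
# V = Re Z + W^{-1/2} |W^{1/2} Im Z W^{1/2}| W^{-1/2} for W > 0.
# Z is an illustrative Hermitian placeholder for Z_{t0}(X).
import numpy as np

def psd_sqrt(A):
    """Square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def V_opt(W, Z):
    sW = psd_sqrt(W)
    sW_inv = np.linalg.inv(sW)
    M = sW @ Z.imag @ sW          # real antisymmetric since Z is Hermitian
    return Z.real + sW_inv @ psd_sqrt(M.T @ M) @ sW_inv   # |M| = (M^T M)^{1/2}

W = np.diag([1.0, 2.0])
Z = np.array([[1.0, 0.2 + 0.5j], [0.2 - 0.5j, 1.2]])
print(V_opt(W, Z))
```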

Theorem 7 determines the ultimate precision limit for generic qudit models. Now, we compare it with the most general existing bound on quantum state estimation, namely Holevo’s bound [5]. Let us define the ultimate precision of unbiased measurements as

$$\begin{aligned} \mathcal {C}_{\mathsf {UB}_{\mathcal{M}}}(W,\varvec{t}_{0}):= \lim _{n\rightarrow \infty }\min _{M_n \in \mathsf {UB}_{\mathcal{M}} } \mathrm{tr}\,W V \left[ \wp ^n_{\varvec{t}_{0},\varvec{t}|M_n}\right] . \end{aligned}$$

Since the Holevo bound still holds in the n-copy case (see [15, Lemma 4]), we have

$$\begin{aligned} \mathcal {C}_{\mathsf {UB}_{\mathcal{M}}}(W,\varvec{t}_{0})\ge \mathcal {C}_{\mathrm{H},\mathcal{M}}(W,\varvec{t}_{0}). \end{aligned}$$
(128)

There are a couple of differences between our results and existing ones: the Holevo bound is derived under the unbiasedness assumption, which, as mentioned earlier, is more restrictive than local asymptotic covariance. Our bound (125) thus applies to a wider class of measurements than the Holevo bound.

Furthermore, Yamagata et al. [19] showed a statement similar to (127) of Theorem 7 in a local model scenario. However, they showed neither the compact uniformity of the convergence nor an order estimate of the convergence, whereas our evaluation (127) guarantees the compact uniformity together with the order estimate. Moreover, they did not discuss an estimator attaining the bound globally. Later, we will construct an estimator attaining our bound globally, based on the estimator given in Theorem 7; our detailed evaluation with compact uniformity and order estimation enables us to assess the performance of such an estimator globally.

Proof of Theorem 7

Impossibility part (Proofs of (124) and (125)):

We denote the \(\epsilon \)-difference Fisher information matrices for the distribution family \(\{\wp ^n_{\varvec{t}_{0},\varvec{t}|M_n}\}_{\varvec{t}}\) and \(\{\wp _{\varvec{t}_{0},\varvec{t} |\mathfrak {m}}\}_{\varvec{t}}\) by \(J_{\varvec{t},\epsilon }^n\) and \(J_{\varvec{t},\epsilon }^{\mathfrak {m}}\), respectively. Also, we denote the \(\epsilon \)-difference type RLD Fisher information matrix at \(\varvec{t}_0'= (\varvec{t}_0,\varvec{0})\) of the family \(\{\rho _{\varvec{t}'}^{\otimes n}\}_{\varvec{t}'}\) by \(\tilde{J}_{\varvec{t}_0',\epsilon }^{n}\). Then, we have (117) in the same way.

Applying (97) of Lemma 10 with \(\epsilon \rightarrow \epsilon /\sqrt{n}\), we find that there exist \(k \times k'\) matrices \(P_n\) such that

$$\begin{aligned}&(P_{n})_{ij} =\delta _{ij} \qquad \hbox { for }i,j\le k \end{aligned}$$
(129)
$$\begin{aligned}&\frac{1}{n}\left( P_n (\tilde{J}_{\varvec{t}_0', \epsilon /\sqrt{n}}^{n} )^{-1} P_n^T\right) ^{-1} \ge J_{\varvec{t}_0,\epsilon }^n. \end{aligned}$$
(130)

Hence, the combination of (98) of Lemma 11, (130), and (117) implies that there exists a \(k \times k'\) matrix P such that

$$\begin{aligned}&P_{ij} =\delta _{ij} \qquad \hbox { for }i,j\le k \end{aligned}$$
(131)
$$\begin{aligned}&\left( P(\tilde{J}_{\varvec{t}_0'}^{[\epsilon ]})^{-1}P^T\right) ^{-1} \ge J_{\varvec{t}_0,\epsilon }^{\mathfrak {m}}. \end{aligned}$$
(132)

For the same reason as in (119), we have

$$\begin{aligned} V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}] \ge (J_{\varvec{t}_0,\epsilon }^{\mathfrak {m}})^{-1} \ge P(\tilde{J}_{\varvec{t}_0'}^{[\epsilon ]})^{-1}P^T. \end{aligned}$$
(133)

By taking the limit \(\epsilon \rightarrow 0\), the combination of (99) of Lemma 11 and (133) implies (124).

When the model \(\mathcal {S}'\) is D-invariant, since

$$\begin{aligned} \mathrm{tr}\,W V[\wp _{\varvec{t}_{0},\varvec{t}| \mathfrak {m}}] \ge \mathrm{tr}\,P^T W P\tilde{J}_{\varvec{t}_0'}^{-1}, \end{aligned}$$

we obtain (125) by using the expression (50) in the same way as (112).

Achievability part (Proof of (126)):

Since \(\rho _{\varvec{t}_0'}\) is not degenerate, we can show the achievability in the same way as in Theorem 6, because Q-LAN (Theorem 3) applies to the model \(\mathcal {S}'\). The difference is the following: choosing the matrix P to achieve the minimum (50), we employ the covariant POVM \(M_{P|W}^{\tilde{J}_{\varvec{t}_0'}^{-1}}\) instead of the covariant POVM \(M_W^{\tilde{J}_{\varvec{t}_0}^{-1}}\). Then, we obtain the desired statement. \(\quad \square \)

7 Nuisance Parameters

For state estimation in a noisy environment, the strength of noise is not a parameter of interest, yet it affects the precision of estimating other parameters. In this scenario, the strength of noise is a nuisance parameter [46, 47]. To illustrate the difference between nuisance parameters and the fixed parameters discussed in the previous section, let us consider the case of a qubit clock state undergoing a noisy time evolution. To estimate the duration of the evolution, we introduce the strength of the noise as an additional parameter and consider the estimation problem in the extended model parameterized by the duration and the noise strength. The strength of the noise is usually unknown. Although it is not a parameter of interest, its value will affect the precision of our estimation, and thus it should be treated as a nuisance parameter.

7.1 Precision bound for estimation with nuisance parameters

In this subsection, we consider state estimation of an arbitrary \((k+s)\)-parameter model \(\{ \rho _{\varvec{t},\varvec{p}}\}_{(\varvec{t},\varvec{p})\in \tilde{\mathsf {\Theta }}}\), where \(\varvec{t}\) and \(\varvec{p}\) are k-dimensional and s-dimensional parameters, respectively. Our task is to estimate only the parameters \(\varvec{t}\); the other parameters \(\varvec{p}\), which are not required to be estimated, are called nuisance parameters. Hence, our estimate is k-dimensional. We call a parametric family with this structure a nuisance parameter model, and denote it by \(\tilde{\mathcal {S}}=\{ \rho _{\varvec{t},\varvec{p}}\}_{(\varvec{t},\varvec{p})\in \tilde{\mathsf {\Theta }}}\). We abbreviate \((\varvec{t},\varvec{p})\) as \(\tilde{\varvec{t}}\).

The concept of local asymptotic covariance can be extended to a model with nuisance parameters by considering a local model \(\rho ^n_{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}} :=\rho ^{\otimes n}_{\tilde{\varvec{t}}_0+\tilde{\varvec{t}}/\sqrt{n}}\). Throughout this section, we assume that \(\rho _{\tilde{\varvec{t}}_0}\) is invertible and all the parametrizations are at least \(C^1\) continuous.

Condition 3

(Local asymptotic covariance with nuisance parameters). We say that a sequence of measurements \(\mathfrak {m}:=\{M_n\}\) to estimate the k-dimensional parameter \(\varvec{t}\) satisfies local asymptotic covariance at \(\tilde{\varvec{t}}_0 =(\varvec{t}_0,\varvec{p}_0) \in \tilde{\mathsf {\Theta }}\) under the nuisance parameter model \(\tilde{\mathcal{M}}\) when the probability distribution

$$\begin{aligned} \wp ^{n}_{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}|M_n}(\mathsf {B}) :=\mathrm{Tr}\,\rho _{\tilde{\varvec{t}}_0+\frac{\tilde{\varvec{t}}}{\sqrt{n}}} ^{\otimes n}M_n \left( \frac{\mathsf {B}}{\sqrt{n}}+\varvec{t}_0\right) \end{aligned}$$
(134)

converges to a limiting distribution

$$\begin{aligned} \wp _{\tilde{\varvec{t}}_0,\tilde{\varvec{t}} |\mathfrak {m}}(\mathsf {B}):= \lim _{n\rightarrow \infty } \wp ^n_{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}|M_n}(\mathsf {B}), \end{aligned}$$
(135)

and the relation

$$\begin{aligned} \wp _{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}| \mathfrak {m}}(\mathsf {B}+\varvec{t}) = \wp _{\tilde{\varvec{t}}_0,(\varvec{0},\varvec{0})| \mathfrak {m}}(\mathsf {B}) \end{aligned}$$
(136)

holds for any \(\tilde{\varvec{t}}=(\varvec{t},\varvec{p})\in \mathbb {R}^{k+s}\).

Further, we say that a sequence of measurements \(\mathfrak {m}:=\{M_n\}\) satisfies local asymptotic covariance under the nuisance parameter model \(\tilde{\mathcal{M}}\) when it satisfies local asymptotic covariance at any element \(\tilde{\varvec{t}}_0 \in \tilde{\mathsf {\Theta }}\) under the nuisance parameter model \(\tilde{\mathcal{M}}\).

The quantity we want to bound is \(\mathrm{tr}\,V[\wp _{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}|\mathfrak {m}}] W\), where \(\wp _{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}|\mathfrak {m}}\) is the limiting distribution of a sequence \(\mathfrak {m}\) of locally asymptotically covariant measurements and W is a weight matrix. Since nuisance parameters are not of interest, the weight matrix of the model \(\tilde{\mathcal {S}}\) is a \((k+s)\times (k+s)\) matrix of the form

$$\begin{aligned} \tilde{W}=\left( \begin{array}{cc} W &{} 0\\ 0&{} 0\end{array}\right) . \end{aligned}$$
(137)

Lemma 13

Let \(\tilde{\mathcal {S}}=\{\rho _{\tilde{\varvec{t}}}\}_{\tilde{\varvec{t}}\in \tilde{\mathsf {\Theta }}}\) be a \((k+s)\)-parameter nuisance parameter model and let \(\mathcal {S}'=\{\rho _{\varvec{t}'}\}_{\varvec{t}'=(\tilde{\varvec{t}},\varvec{q})}\) be a \(k'\)-parameter model containing \(\tilde{\mathcal {S}}\) as \(\rho _{\tilde{\varvec{t}}}=\rho _{(\tilde{\varvec{t}},\varvec{0})}\). When \(\mathcal {S}'\) is D-invariant and the inverse \(\tilde{J}_{\tilde{\varvec{t}}_0}^{-1} \) exists, we have

$$\begin{aligned} \mathcal {C}_{\mathrm{NH},\mathcal{M}}(W,\tilde{\varvec{t}}_{0})&:= \min _{\varvec{X}} \min _V \{\mathrm{tr}\,W V| V \ge Z_{\tilde{\varvec{t}}_0}(\varvec{X})\} \end{aligned}$$
(138)
$$\begin{aligned}&=\min _{P}\left\{ \mathrm{tr}\,P^TWP(J_{\varvec{t}_0'}^{-1})+\frac{1}{2}\mathrm{tr}\,\left| \sqrt{P^TWP} J_{\varvec{t}_0'}^{-1}D_{\varvec{t}_0'}J_{\varvec{t}_0'}^{-1}\sqrt{P^TWP}\right| \right\} . \end{aligned}$$
(139)

In (138), V is a real symmetric matrix and \(\varvec{X}=(X_i)\) is a k-component vector of operators satisfying

$$\begin{aligned} \mathrm{Tr}\,X_i \frac{\partial \rho _{\tilde{\varvec{t}}}}{\partial \tilde{t}_j} \Big |_{\tilde{\varvec{t}}=\tilde{\varvec{t}}_0 }=\delta _{ij}, \ \forall \,i \le k,\quad \forall \,j \le k+s. \end{aligned}$$
(140)

In (139), the minimization is taken over all \(k\times k'\) matrices P satisfying the constraint \((P)_{ij}=\delta _{ij}\) for \(i \le k,\ j\le k+s\), and \(J_{\varvec{t}_0'}\) and \(D_{\varvec{t}_0'}\) are the SLD Fisher information matrix and the D-matrix [cf. Eqs. (35) and (41)] for the extended model \(\mathcal {S}'\) at \(\varvec{t}_0':=(\tilde{\varvec{t}}_0,\varvec{0})\).

This lemma is a statement different from [15, Theorem 4]; however, it can be shown using the method of [15, Theorem 4].
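As an illustration of how the right-hand side of (139) can be evaluated in practice, the following sketch numerically minimizes over the free block of P. The matrices J and D are random toy stand-ins for the extended model's SLD matrix and D-matrix, and the weight matrix is set to \(I_k\) purely for illustration; none of these values comes from a physical model.

```python
# Illustrative numerical evaluation of the right-hand side of (139).
# The pinned entries P_{ij} = delta_{ij} for j <= k+s are fixed; the remaining
# k x (k'-(k+s)) block is optimized numerically.
import numpy as np
from scipy.optimize import minimize

def psd_sqrt(A):
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T

k, ks, kp = 1, 2, 4                      # k, k+s, k' (toy dimensions)
rng = np.random.default_rng(0)
A = rng.normal(size=(kp, kp)); J = A @ A.T + np.eye(kp)   # toy SLD matrix
B = rng.normal(size=(kp, kp)); D = B - B.T                # toy D-matrix
J_inv = np.linalg.inv(J)

def cost(free):
    P = np.zeros((k, kp))
    P[:, :ks] = np.eye(k, ks)            # pinned block: delta_{ij}
    P[:, ks:] = free.reshape(k, kp - ks) # free block
    Wp = P.T @ P                         # P^T W P with W = I_k
    sWp = psd_sqrt(Wp)
    M = sWp @ J_inv @ D @ J_inv @ sWp    # real antisymmetric
    return np.trace(Wp @ J_inv) + 0.5 * np.trace(psd_sqrt(M.T @ M))

res = minimize(cost, np.zeros(k * (kp - ks)), method="Nelder-Mead")
print("C_NH estimate:", res.fun)
```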

In the following, we derive an attainable lower bound on the weighted MSE. To this purpose, we define the set \(\mathsf {LAC}(\tilde{\varvec{t}}_{0})\) of local asymptotic covariant sequences of measurements at the point \(\tilde{\varvec{t}}_0 \in \tilde{\mathsf {\Theta }}\) for the nuisance parameter model \(\tilde{\mathcal{M}}\), and focus on the minimum value

$$\begin{aligned} \mathcal {C}_{\mathrm{N},\tilde{\mathcal{M}}}(W,\tilde{\varvec{t}}_{0}) :=\min _{\mathfrak {m}\in \mathsf {LAC}(\tilde{\varvec{t}}_{0}) } \mathrm{tr}\,W V [\wp _{\tilde{\varvec{t}}_{0},\tilde{\varvec{t}} | \mathfrak {m}}]. \end{aligned}$$
(141)

Theorem 8

(Weighted MSE bound with nuisance parameters). The models \(\tilde{\mathcal {S}}\) and \(\mathcal {S}'\) are given in the same way as in Lemma 13, and the notation is the same as in Lemma 13. Also, we assume that \((\tilde{J}_{\varvec{t}'_0})^{-1}\) exists. Consider any sequence of locally asymptotically covariant measurements \(\mathfrak {m}:=\{M_n\}\) for the nuisance parameter model \(\tilde{\mathcal {S}}\). Then, the MSE matrix of the limiting distribution is evaluated as follows. There exists a \(k \times k'\) matrix P such that

$$\begin{aligned}&P_{ij} =\delta _{ij} \qquad \hbox { for }1\le i\le k,\ 1\le j \le k+s \end{aligned}$$
(142)
$$\begin{aligned}&V[\wp _{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}| \mathfrak {m}}] \ge P (\tilde{J}_{\varvec{t}'_0})^{-1} P^T. \end{aligned}$$
(143)

When the model \(\mathcal {S}'\) is D-invariant, we have the following bound for the weighted MSE of the limiting distribution with weight matrix \(W\ge 0\):

$$\begin{aligned} \mathrm{tr}\,W V[\wp _{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}| \mathfrak {m}}] \ge \mathcal {C}_{\mathrm{NH},\mathcal{M}}(W,\tilde{\varvec{t}}_{0}). \end{aligned}$$
(144)

When the model \(\mathcal {S}'\) is a D-invariant qudit model and the state \(\rho _{\varvec{t}_0}\) is not degenerate, we have

$$\begin{aligned} \mathcal {C}_{\mathrm{N},\tilde{\mathcal{M}}}(W,\tilde{\varvec{t}}_{0}) = \mathcal {C}_{\mathrm{NH},\tilde{\mathcal{M}}}(W,\tilde{\varvec{t}}_0). \end{aligned}$$
(145)

Moreover, if \(W>0\) and \(\wp _{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}| \mathfrak {m}}\) has a differentiable PDF, the equality in (144) holds if and only if \(\wp _{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}| \mathfrak {m}}\) is the normal distribution with average zero and covariance

$$\begin{aligned} V_{\varvec{t}_0|W}:=\mathsf{Re}(Z_{{\varvec{t}}_0}(\varvec{X}))+ \sqrt{W}^{-1} |\sqrt{W} \mathsf{Im} ( Z_{\varvec{t}_0}(\varvec{X}))\sqrt{W}| \sqrt{W}^{-1}, \end{aligned}$$
(146)

where \(\varvec{X}\) is the vector realizing the minimum (138). Further, when the models \(\tilde{\mathcal {S}}\) and \(\mathcal {S}'\) are qudit models with \(C^2\) continuous parametrization, the equality in (144) can be attained, i.e., there exist a sequence of POVMs \(M_W^{\varvec{t}_0,n}\), a compact set K, and a constant \(c(\varvec{t}_0)\) such that

$$\begin{aligned} \limsup _{n \rightarrow \infty }\sup _{\varvec{t}_0 \in K} \sup _{\varvec{t} \in \mathsf {\Theta }_{n,x,c(\varvec{t}_0)} } n^{\kappa }\Vert \wp ^n_{\varvec{t}_0,\varvec{t}| M_W^{\varvec{t}_0,n}} - N[\varvec{t}, V_{\varvec{t_0}|W}] \Vert _1 <\infty . \end{aligned}$$
(147)

Here \(\kappa \) is a parameter satisfying \(\kappa \ge 0.027\).

Before proving Theorem 8, we discuss a linear subfamily of the \(k'\)-dimensional Gaussian family \(\{G[\varvec{t}', \varvec{\gamma }]\}_{\varvec{t}'\in \mathbb {R}^{k'}}\). Consider a linear map T from \(\mathbb {R}^{k+s}\) to \(\mathbb {R}^{k'}\). We have the subfamily \(\tilde{\mathcal{M}}:=\{G[ T(\varvec{t},\varvec{p}), \varvec{\gamma }]\}_{(\varvec{t},\varvec{p})\in \mathbb {R}^{k+s}}\) as a nuisance parameter model. Then, the covariance condition is extended as follows.

Definition 5

A POVM M is unbiased for the nuisance parameter model \(\{\rho _{(\varvec{t},\varvec{p})}\}\) when

$$\begin{aligned} \varvec{t}=E_{\varvec{t},\varvec{p}}(M):=\int \varvec{x} \mathrm{Tr}\,\rho _{(\varvec{t},\varvec{p})} M(d \varvec{x}) \end{aligned}$$

holds for any parameter \((\varvec{t},\varvec{p})\). A POVM M is a covariant estimator for the nuisance parameter model \(\{G[ T(\varvec{t},\varvec{p}), \varvec{\gamma }]\}\) when the distribution \(\wp _{(\varvec{t},\varvec{p})|M}\) satisfies the condition \(\wp _{\varvec{0},\varvec{0}|M}(\mathsf {B})=\wp _{\varvec{t},\varvec{p}|M}(\mathsf {B}+\varvec{t})\).

Then, we have the following corollary of Lemma 6.

Corollary 3

For any weight matrix \(W\ge 0\), the nuisance parameter model \(\tilde{\mathcal{M}}=\{G[ T(\varvec{t},\varvec{p}), \varvec{\gamma }]\}\) with \(C^1\) continuous parametrization satisfies

$$\begin{aligned}&\inf _{M \in \mathsf {UB}_{\tilde{\mathcal{M}}}}\mathrm{tr}\,W V_{\varvec{t}}(M) =\inf _{M \in \mathsf {CUB}_{\tilde{\mathcal{M}}}}\mathrm{tr}\,W V_{\varvec{t}}(M) = \mathcal {C}_{\mathrm{NH},\tilde{\mathcal{M}}}(W,\varvec{t}), \end{aligned}$$
(148)

where \(\mathsf {UB}_{\tilde{\mathcal{M}}}\) and \(\mathsf {CUB}_{\tilde{\mathcal{M}}}\) are the sets of unbiased estimators and covariant unbiased estimators of the nuisance parameter model \(\tilde{\mathcal{M}}\), respectively. Further, when \(W>0\), we choose a vector \(\varvec{X}\) realizing the minimum in (138). The above infimum is attained by the covariant unbiased estimator \(M_W\) whose output distribution is the normal distribution with average \(\varvec{t}\) and covariance matrix \(\mathsf{Re}(Z_{\varvec{t}}(\varvec{X}))+ \sqrt{W}^{-1} |\sqrt{W}\, \mathsf{Im} ( Z_{\varvec{t}}(\varvec{X}))\sqrt{W}| \sqrt{W}^{-1}\).

This corollary can be shown as follows. The inequality \( \inf _{M \in \mathsf {UB}_{\tilde{\mathcal{M}}}}\mathrm{tr}\,W V_{\varvec{t}}(M) \ge \mathcal {C}_{\mathrm{NH},\tilde{\mathcal{M}}}(W,\varvec{t})\) follows from the condition (140). Similarly to Corollary 2, Proposition 1 guarantees that the latter part of the corollary with \(W>0\) follows from (138) and Lemma 6. Hence, we obtain this corollary for \(W>0\). The case where W is not strictly positive can be shown by considering \(W_\epsilon \) in the same way as in Corollary 1.

Proof of Theorem 8

Impossibility part (Proofs of (143) and (144)):

We denote the \(\epsilon \)-difference Fisher information matrix of \(\{\wp _{\tilde{\varvec{t}}_0,\tilde{\varvec{t}} |\mathfrak {m}}\}_{\tilde{\varvec{t}}}\) by \(J_{\tilde{\varvec{t}}_0,\epsilon }^{\mathfrak {m}}\). Due to (132), there exists a \((k+s) \times k'\) matrix \(\tilde{P}\) satisfying the following conditions.

$$\begin{aligned} \tilde{P}_{ij}&=\delta _{ij} \hbox { for } 1\le i, j\le k+s , \quad \left( \tilde{P}(\tilde{J}_{\varvec{t}_0'}^{[\epsilon ]})^{-1}\tilde{P}^T\right) ^{-1} \ge J_{\tilde{\varvec{t}}_0,\epsilon }^{\mathfrak {m}}. \end{aligned}$$
(149)

We define the \(k \times (k+s)\) matrix \(\bar{P}\) by

$$\begin{aligned} \bar{P}_{ij}&=\delta _{ij} \hbox { for } 1\le i \le k, 1\le j\le k+s. \end{aligned}$$

Now, we extend Theorem 4. Let \(E_{(\varvec{0},\varvec{0})}[\hat{\varvec{t}}]\) be the expectation of \(\hat{\varvec{t}}\) under the distribution \(\wp _{\tilde{\varvec{t}}_{0},(\varvec{0},\varvec{0}) |\mathfrak {m}}\). We denote the Radon–Nikodým derivative of \(\wp _{\tilde{\varvec{t}}_0,\epsilon e_i |\mathfrak {m}}\) with respect to \(\wp _{\tilde{\varvec{t}}_0,0|\mathfrak {m}}\) by \(p_i\). Then, for two vectors \(\varvec{a} \in \mathbb {R}^{k+s}\) and \(\varvec{b}\in \mathbb {R}^{k}\), we apply the Schwarz inequality to the two variables \(X:= \sum _j b_j(\hat{\varvec{t}}-E_{(\varvec{0},\varvec{0})}[\hat{\varvec{t}}])_j \) and \(Y:= \sum _i \frac{a_i}{\epsilon } (p_i(\varvec{x})-1)\). Using \(\tilde{X}(\varvec{x}):= \sum _j b_j x_j \), we obtain

$$\begin{aligned}&\langle \varvec{b} |V[\wp _{\tilde{\varvec{t}}_{0},(\varvec{0},\varvec{0})| \mathfrak {m}}] |\varvec{b} \rangle \langle \varvec{a} |J_{\tilde{\varvec{t}}_0,\epsilon }^{\mathfrak {m}}|\varvec{a} \rangle = \int \tilde{X}(\varvec{x})^2 \wp _{\tilde{\varvec{t}}_0,(\varvec{0},\varvec{0})|\mathfrak {m}}(d\varvec{x}) \int Y(\varvec{x})^2 \wp _{\tilde{\varvec{t}}_0,(\varvec{0},\varvec{0})|\mathfrak {m}}(d\varvec{x}) \nonumber \\&\quad \ge \int X(\varvec{x})^2 \wp _{\tilde{\varvec{t}}_0,(\varvec{0},\varvec{0})|\mathfrak {m}}(d\varvec{x}) \int Y(\varvec{x})^2 \wp _{\tilde{\varvec{t}}_0,(\varvec{0},\varvec{0})|\mathfrak {m}}(d\varvec{x}) \nonumber \\&\quad \ge \Big |\int X(\varvec{x})Y(\varvec{x}) \wp _{\tilde{\varvec{t}}_0,(\varvec{0},\varvec{0})|\mathfrak {m}}(d\varvec{x})\Big |^2 =\Big |\langle \varvec{b} | \bar{P} |\varvec{a}\rangle \Big |^2, \end{aligned}$$
(150)

where the final equation follows from the fact that the expectation of the variable \(\hat{\varvec{t}}-E_{(\varvec{0},\varvec{0})}[\hat{\varvec{t}}]\) equals \(\varvec{t}= \bar{P} \tilde{\varvec{t}}\) under the distribution \(\wp _{\tilde{\varvec{t}}_{0},\tilde{\varvec{t}} |\mathfrak {m}}\), which can be shown by the covariance condition for the distribution family \(\{\wp _{\tilde{\varvec{t}}_{0},\tilde{\varvec{t}} |\mathfrak {m}}\}_{\tilde{\varvec{t}}}\).

Choosing \(\varvec{a}:= (J_{\tilde{\varvec{t}}_0,\epsilon }^{\mathfrak {m}})^{-1} \bar{P}^T \varvec{b}\), we have

$$\begin{aligned} \langle \varvec{b} | V[\wp _{\tilde{\varvec{t}}_{0},(\varvec{0},\varvec{0})| \mathfrak {m}}] | \varvec{b} \rangle \ge \langle \varvec{b} | \bar{P} (J_{\tilde{\varvec{t}}_0,\epsilon }^{\mathfrak {m}})^{-1}\bar{P}^T| \varvec{b} \rangle , \end{aligned}$$
(151)

which implies

$$\begin{aligned} V[\wp _{\tilde{\varvec{t}}_{0},\tilde{\varvec{t}}| \mathfrak {m}}] =V[\wp _{\tilde{\varvec{t}}_{0},(\varvec{0},\varvec{0})| \mathfrak {m}}] \ge \bar{P} (J_{\tilde{\varvec{t}}_0,\epsilon }^{\mathfrak {m}})^{-1}\bar{P}^T . \end{aligned}$$
(152)

Combining the above with Eq. (149), since \(P:=\bar{P} \tilde{P}\) satisfies the condition (142), we obtain (143).

When the model \(\mathcal {S}'\) is D-invariant, since

$$\begin{aligned} \mathrm{tr}\,W V[\wp _{\tilde{\varvec{t}}_0,\tilde{\varvec{t}}| \mathfrak {m}}] \ge \mathrm{tr}\,P^T W P\tilde{J}_{\varvec{t}_0'}^{-1}, \end{aligned}$$

we obtain (144) by using the expression (139) in the same way as (125).

Achievability part (Proof of (145)):

Since \(\rho _{\varvec{t}_0}\) is not degenerate, we can show the achievability part in the same way as in Theorem 6, because Q-LAN (Theorem 3) applies to the model \(\mathcal {S}'\). The difference is the following: instead of Corollary 1, we employ Corollary 3 to choose the covariant POVM \(M_{P|W}^{\tilde{J}_{\varvec{t}'_0}^{-1}}\). Then, we obtain the desired statement. \(\quad \square \)

7.2 Nuisance parameter with D-invariant model

Next, we discuss the nuisance parameters when the model is D-invariant.

Lemma 14

When \(\tilde{\mathcal {S}}=\{\rho _{(\varvec{t},\varvec{p})}\}_{(\varvec{t},\varvec{p})\in \tilde{\mathsf {\Theta }}}\) is a D-invariant \((k+s)\)-parameter nuisance parameter model and \({J}_{\tilde{\varvec{t}}_0}^{-1} \) exists, we have

$$\begin{aligned} \mathcal {C}_{\mathrm{NH},\tilde{\mathcal{M}}}(W,\tilde{\varvec{t}}_{0}) =\mathrm{tr}\,\tilde{W}(J_{\tilde{\varvec{t}}_0}^{-1})+\frac{1}{2}\mathrm{tr}\,\left| \sqrt{\tilde{W}}J_{\tilde{\varvec{t}}_0}^{-1}D_{\tilde{\varvec{t}}_0}J_{\tilde{\varvec{t}}_0}^{-1}\sqrt{\tilde{W}}\right| . \end{aligned}$$
(153)

A few comments are in order. First, the nuisance parameter bound (144) reduces to the bound (112) when the parameters of interest are orthogonal to the nuisance parameters, in the sense that the RLD Fisher information matrix \(\tilde{J}_{\tilde{\varvec{t}}_0}\) is block-diagonal. This orthogonality is equivalent to the condition that the SLD Fisher information matrix \(J_{\tilde{\varvec{t}}_0}\) and the D-matrix take the block-diagonal forms

$$\begin{aligned} J_{\tilde{\varvec{t}}_0}=\left( \begin{array}{cc} J_{\varvec{t}_0} &{} 0\\ 0 &{} J_\mathrm{N}\end{array}\right) \quad D_{\tilde{\varvec{t}}_0}=\left( \begin{array}{cc} D_{\varvec{t}_0} &{} 0\\ 0 &{} D_\mathrm{N}\end{array}\right) . \end{aligned}$$
(154)

This is the case, for instance, for the simultaneous estimation of the spectrum and of the Hamiltonian-generated phase of a two-level system. Under such circumstances, the inversion of the Fisher information matrix can be carried out blockwise, by inverting \(J_{\varvec{t}_0}\) and \(J_\mathrm{N}\) independently. The same precision bound is thus obtained with or without introducing nuisance parameters, and we have the following lemma.

Lemma 15

When all nuisance parameters are orthogonal to the parameters of interest, the bound with nuisance parameters (144) coincides with the D-invariant MSE bound (112).
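As a minimal numerical sanity check of the blockwise inversion behind Lemma 15, with toy matrices mimicking the block-diagonal structure (154) (not taken from any physical model):

```python
# The inverse of a block-diagonal Fisher information matrix is obtained by
# inverting the block of interest and the nuisance block separately, so the
# nuisance block never mixes into the block of interest.
import numpy as np

J_t0 = np.array([[4.0, 1.0], [1.0, 3.0]])   # toy block of the parameters of interest
J_N = np.array([[2.0]])                     # toy nuisance block
Z = np.zeros((2, 1))
J = np.block([[J_t0, Z], [Z.T, J_N]])
J_inv_blockwise = np.block([[np.linalg.inv(J_t0), Z], [Z.T, np.linalg.inv(J_N)]])
print(np.allclose(np.linalg.inv(J), J_inv_blockwise))   # True
```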

In the case of orthogonal nuisance parameters, the estimation of the nuisance parameters does not affect the precision of estimating the parameters of interest, which does not hold in the generic case of non-orthogonal nuisance parameters. Thanks to this fact, one can achieve the bound (144) by first measuring the nuisance parameters and then constructing the optimal measurement based on their estimated value. On the other hand, an RLD bound [cf. Eq. (39)] can be attained if and only if its model is D-invariant. Combining these arguments with Lemma 15, we obtain the following characterization of the attainability of RLD bounds.

Corollary 4

An RLD bound can be achieved if and only if it has an orthogonal nuisance extension, i.e. Eq. (154) holds for some choice of nuisance parameters.

The above corollary offers a simple criterion for the important problem of the attainability of RLD bounds. In Sect. 10.3, we will illustrate the application of this criterion with a concrete example.

The bound (144) can be straightforwardly computed even for complex models; for D-invariant models, the SLD operators have a uniform entry-wise expression, and one only needs to plug it into a program to evaluate the bound (144). Moreover, the bound does not rely on the explicit choice of nuisance parameters. To see this, one can consider another parameterization \(\varvec{x}'\) of the D-invariant model. The bound (144) comes from the RLD bound for the D-invariant model, and the RLD quantum Fisher information matrices \(\widetilde{J}_{\varvec{t}_0'}\) and \(\widetilde{J}_{\varvec{x}'_0}\) for the two parameterizations are connected by the equation \(\widetilde{J}_{\varvec{t}_0'}=A\widetilde{J}_{\varvec{x}'_0}A^T\), where A is the Jacobian \(\left( \partial \varvec{x}'/\partial \varvec{t}'\right) \) at \(\varvec{t}'_0\). Since both parameterizations are extensions of the same model \(\mathcal {S}\) satisfying \(P_0\varvec{t}_0'=P'_0\varvec{x}'_0=\varvec{t}_0\), the Jacobian takes the form

$$\begin{aligned} A=\left( \begin{array}{cc} I_{k}&{}A'\\ 0 &{}A''\end{array}\right) . \end{aligned}$$

Then we have \(\widetilde{J}_{\varvec{x}'_0}^{-1}=A^T \widetilde{J}^{-1}_{\varvec{t}_0'}A\), which implies that the upper-left \(k\times k\) blocks of \(\widetilde{J}_{\varvec{x}'_0}^{-1}\) and \(\widetilde{J}_{\varvec{t}_0'}^{-1}\) are equal. The bound (144) thus remains unchanged.
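This block structure can also be checked numerically; in the following sketch, the matrix standing in for \(\widetilde{J}^{-1}_{\varvec{t}_0'}\) is a random toy positive matrix, and the Jacobian has the block-triangular form displayed above.

```python
# Quick numerical check that the upper-left k x k block of the inverse RLD
# Fisher information is unchanged under a reparameterization with a Jacobian
# of the block-triangular form A = [[I_k, A'], [0, A'']].
import numpy as np

k, kp = 2, 5
rng = np.random.default_rng(1)
M = rng.normal(size=(kp, kp))
Jt_inv = M @ M.T + np.eye(kp)                  # toy inverse RLD matrix
A = np.eye(kp)
A[:k, k:] = rng.normal(size=(k, kp - k))       # block A'
A[k:, k:] = rng.normal(size=(kp - k, kp - k))  # block A''
Jx_inv = A.T @ Jt_inv @ A                      # transformed inverse matrix
print(np.allclose(Jx_inv[:k, :k], Jt_inv[:k, :k]))  # True
```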

7.3 Precision bound for joint measurements

A useful implication of Theorem 8 is a bound on MSEs of several jointly measured observables. Consider a set \(\{O_i\}\) of k bounded observables. The goal is to jointly measure their expectations

$$\begin{aligned} o_i:=\mathrm{Tr}\,\rho O_i\qquad i=1,\dots ,k. \end{aligned}$$
(155)

The main result of this subsection is the following corollary:

Corollary 5

Define the SLD gap of \(o_i\) as

$$\begin{aligned} \varDelta _{o_i}:=\mathbf {MSE}_{o_i}-\left( J^{-1}\right) _{ii}, \end{aligned}$$
(156)

where \(\mathbf {MSE}_{o_i}\) denotes the MSE of \(o_i\) under joint measurement and J is the SLD quantum Fisher information. The sum of the SLD gaps for all observables satisfies the attainable bound:

$$\begin{aligned} \sum _{i=1}^k \varDelta _{o_i}\ge \frac{1}{2}\mathrm{tr}\,\left| \left( J^{-1}DJ^{-1}\right) _{k\times k}\right| , \end{aligned}$$
(157)

where D is the D-matrix.

The right hand side of Eq. (157) is exactly the gap between the SLD bound and the ultimate precision limit. It shows a typical example where the SLD bound is not attainable.

Proof

Substituting \(W'\) in Eq. (144) by the projection onto the subspace \(\mathbb {R}^k\), we obtain a bound for the MSEs \(\{\mathbf {MSE}_{o_i}\}\) of the limiting distributions:

$$\begin{aligned} \sum _{i=1}^k \mathbf {MSE}_{o_i}\ge \sum _{i=1}^k \left( J^{-1}\right) _{ii}+\frac{1}{2}\mathrm{tr}\,\left| \left( J^{-1}DJ^{-1}\right) _{k\times k}\right| . \end{aligned}$$
(158)

Here J and D are the SLD Fisher information and the D-matrix for the extended model, and \(\left( A\right) _{k\times k}\) denotes the upper-left \(k\times k\) block of a matrix A. Substituting the definition (156) into Eq. (158), we obtain Corollary 5. \(\quad \square \)

Specifically, for the case of two parameters, the bound (157) reduces to

$$\begin{aligned} \varDelta _{o_1}+\varDelta _{o_2}\ge \left| \mathrm{Tr}\,\rho _{\varvec{\theta }}[\hat{L}_1,\hat{L}_2]\right| , \end{aligned}$$
(159)

where \(\hat{L}_j:=\sum _{i=1}^{k'}\left( J_{\varvec{\theta }'}^{-1}\right) _{ji}L_i\) are the SLD operators in the dual space. Next, taking the partial derivative with respect to \(o_j\) on both sides of Eq. (155) and substituting in the definition of the SLD operators, we find that the observables satisfy the orthogonality relation with the SLD operators:

$$\begin{aligned} \frac{1}{2}\mathrm{Tr}\,\left( \rho L_j+L_j\rho \right) O_i=\delta _{ij}. \end{aligned}$$

By uniqueness of the dual space, we have

$$\begin{aligned} \hat{L}_i=O_i-o_i I\qquad i=1,\dots ,k' \end{aligned}$$

and the bound becomes

$$\begin{aligned} \varDelta _{o_1}+\varDelta _{o_2}&\ge |\langle [O_1,O_2]\rangle |. \end{aligned}$$
(160)

Another bound expressing the tradeoff between \(\varDelta _{o_1}\) and \(\varDelta _{o_2}\) was obtained by Watanabe et al. [48] as

$$\begin{aligned} \varDelta _{o_1}\varDelta _{o_2}\ge |\langle [O_1,O_2]\rangle |^2/4. \end{aligned}$$
(161)

Now, substituting \(O_2\) by \(\alpha O_2\) for a variable \(\alpha \in \mathbb {R}\) in Eq. (160), we have

$$\begin{aligned} \varDelta _{o_1}+\varDelta _{\alpha o_2}=\varDelta _{o_1}+\alpha ^2\varDelta _{o_2}\ge \alpha |\langle [O_1,O_2]\rangle |. \end{aligned}$$

For the above quadratic inequality to hold for any \(\alpha \in \mathbb {R}\), its discriminant \(|\langle [O_1,O_2]\rangle |^2-4\varDelta _{o_1}\varDelta _{o_2}\) must be non-positive, which immediately implies the bound (161). Notice that the bound (161) was derived under asymptotic unbiasedness [48], and thus it was not guaranteed to be attainable. Here, instead, since our bound (160) is always attainable, the bound (161) can also be achieved in any qudit model under the asymptotically covariant condition.

7.4 Nuisance parameters versus fixed parameters

It is natural to ask what the relationship is between the nuisance parameter bound (144) and the general bound (125). To see this, let \(\mathcal {S}=\{\rho _{\varvec{t}}\}_{\varvec{t}\in \mathsf {\Theta }}\) be a generic k-parameter qudit model and let \(\tilde{\mathcal {S}}\) be a \((k+s)\)-parameter D-invariant model containing \(\mathcal {S}\). When \(\rho _{\varvec{t}_0}\) is non-degenerate, we notice that the QCR bound with nuisance parameters (144) can be rewritten as

$$\begin{aligned} \mathcal {C}_{\mathcal{M}}(W,\varvec{t}_{0})= \mathrm{tr}\,P_0^TWP_0(J_{\tilde{\varvec{t}}_0}^{-1})+\frac{1}{2}\mathrm{tr}\,\left| \sqrt{P_0^TWP_0}J_{\tilde{\varvec{t}}_0}^{-1}D_{\tilde{\varvec{t}}_0}J_{\tilde{\varvec{t}}_0}^{-1}\sqrt{P_0^TWP_0}\right| , \end{aligned}$$
(162)

where \(P_0\) is the \(k\times (k+s)\) matrix satisfying \((P_0)_{ij}=\delta _{ij}\) for \(i\le k,\ j\le k+s\). By definition, \(P_0\) is a special case of P, and it follows straightforwardly from comparing Eq. (162) with Eq. (125) that the general MSE bound is upper bounded by the MSE bound for the nuisance parameter case. This observation agrees with the intuition that having additional information on the system is helpful for (or at least, not detrimental to) estimation. Finally, since \(J_{\tilde{\varvec{t}}_0}\) and \(D_{\tilde{\varvec{t}}_0}\) are block-diagonal in the case of orthogonal nuisance parameters, we have

$$\begin{aligned} P(J_{\tilde{\varvec{t}}_0})^{-1}P^T=J^{-1}_{\varvec{t}_0}\quad PD_{\tilde{\varvec{t}}_0}P^T=D_{\varvec{t}_0} \end{aligned}$$

for the matrix \(P=P_0\), which attains the minimization over the \(k\times (k+s)\) matrices P satisfying the constraint \((P)_{ij}=\delta _{ij}\) for \(i,j\le k\). This implies that the general bound (125) coincides with the nuisance parameter bound (144) when the nuisance parameters are orthogonal.

8 Tail Property of the Limiting Distribution

In previous discussions, we focused on the MSE of the limiting distribution. Here, instead, we consider the behavior of the limiting distribution itself. The characteristic property is the tail property: Given a weight matrix \(W\ge 0\) and a constant c, we define the tail region \(\mathsf {T}_{W,c}(\varvec{t})\) as

$$\begin{aligned} \mathsf {T}_{W,c}(\varvec{t}):=\left\{ \varvec{x}~|~\,(\varvec{x}^T-\varvec{t}^T)W(\varvec{x}-\varvec{t})\ge c\right\} . \end{aligned}$$

For a measurement \(\mathfrak {m}=\{M_n(\hat{\varvec{t}}_n)\}\), the probability that the estimate \(\hat{\varvec{t}}_n\) is in the tail region can be approximated by the tail probability of the limiting distribution, i.e.

$$\begin{aligned} \mathbf {Prob}\left( \hat{\varvec{t}}_n\in \mathsf {T}_{W,c}(\varvec{t})\right) =\wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}\left( \mathsf {T}_{W,c}(\varvec{t})\right) +\epsilon _n, \end{aligned}$$

where \(\epsilon _n\) is a term vanishing in the limit \(n\rightarrow \infty \). The tail property is usually harder to characterize than the MSE. Nevertheless, here we show that, under certain conditions, there exists a good bound on the tail property of the limiting distribution.

8.1 Tail property of Gaussian shift models

Just like in the previous sections, the tail property of n-copy qudit models can be analyzed by studying the tail property of Gaussian shift models. In this subsection, we first derive a bound on the tail probability of Gaussian shift models. The result is of interest in its own right and can be used for further analysis of qudit models via Q-LAN.

Consider a Gaussian shift model \(\{G[\varvec{\alpha },\varvec{\gamma }]\}\) with \(G[\varvec{\alpha },\varvec{\gamma }]=N[\varvec{\alpha }^C,\varGamma ]\otimes \varPhi [\varvec{\alpha }^Q,\varvec{\beta }]\) and a measurement \(M^\mathrm{G} (\hat{\alpha })\). We study the tail probability \(\wp _{\varvec{\alpha }|M^G}\left( \mathsf {T}_{W,c}(\varvec{\alpha })\right) \), where \(\mathsf {T}_{W,c}(\varvec{\alpha })\) is the tail region around \(\varvec{\alpha }\) defined as

$$\begin{aligned} \mathsf {T}_{W,c}(\varvec{\alpha }):=\left\{ \varvec{x}~|~\,(\varvec{x}^T-\varvec{\alpha }^T) W(\varvec{x}-\varvec{\alpha })\ge c\right\} . \end{aligned}$$

Then, for covariant POVMs, the tail probability is independent of \(\varvec{\alpha }\) and is given by:

$$\begin{aligned} \wp _{\varvec{\alpha }|M^G}\left( \mathsf {T}_{W,c}(\varvec{\alpha })\right) =\mathrm{Tr}\,N[\varvec{0},\varGamma ]\otimes \varPhi [\varvec{0},\varvec{\beta }] M\left( \mathsf {T}_{W,c}(\varvec{0})\right) . \end{aligned}$$

When the measurement is covariant, we have the following bound on the tail probability, which can be attained by a certain covariant POVM:

Lemma 16

Consider a Gaussian model \(G[\varvec{\alpha },\varvec{\gamma }]=N[\varvec{\alpha }^C,\varGamma ]\otimes \varPhi [\varvec{\alpha }^Q,\varvec{\beta }]\) with \(s'\) classical parameters and s pairs of quantum parameters. Assume that a POVM \(\{M^G(\mathsf {B}) \}_{\mathsf {B} \subset \mathbb {R}^{s'} \times \mathbb {R}^{2s}}\) is covariant and that the weight matrix W has the following form:

$$\begin{aligned} W=\left( \begin{array}{cccc} W^C &{} &{} &{} \\ &{} w_{s'+1}I_2 &{} &{} \\ &{} &{} \ddots &{}\\ &{} &{} &{} w_{s'+s}I_2\end{array}\right) \end{aligned}$$
(163)

with \(W^C\ge 0\). Then, the tail probability of the limiting distribution is bounded as

$$\begin{aligned} \wp _{\varvec{\alpha }|M^G}\left( \mathsf {T}_{W,c}(\varvec{\alpha })\right) \ge N\left[ \varvec{0},\varGamma \otimes E_s\left( e^{-\varvec{\beta }}+\varvec{e}/2 \right) \right] \left( \mathsf {T}_{W,c}(\varvec{0})\right) , \end{aligned}$$
(164)

where \(\varvec{e}\) is the 2s-dimensional vector with all entries equal to 1. For the definition of \(E_s\left( e^{-\varvec{\beta }}+\varvec{e}/2 \right) \), see (56). When the POVM \(M^G\) is given as \(M^G(B)=\int _{B}| \alpha _1, \ldots , \alpha _s\rangle \langle \alpha _1, \ldots , \alpha _s| d \varvec{\alpha }\), the equality in (164) holds.

The proof can be found in “Appendix H”. When the model has a group covariance, a similar evaluation might be possible; for example, such evaluations were carried out for n copies of the full pure-state family [49] and for n copies of the squeezed-state family [50, Sect. 4.1.3].
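As an illustration, the right-hand side of (164) can be estimated by Monte Carlo sampling. The sketch below assumes that \(E_s(\varvec{v})\) denotes the diagonal matrix with the entries of \(\varvec{v}\) on the diagonal [cf. (56)], and uses toy values for \(\varGamma \), \(\varvec{\beta }\), W, and c.

```python
# Monte Carlo sketch of the right-hand side of (164): the tail probability of
# N[0, Gamma ⊕ E_s(e^{-beta}+e/2)] over the region {x : x^T W x >= c}.
# All parameter values are illustrative toys.
import numpy as np

rng = np.random.default_rng(2)
Gamma = np.array([[1.0]])                  # one classical mode (s' = 1)
beta = np.array([0.5, 0.5])                # one quantum pair (s = 1)
V = np.block([[Gamma, np.zeros((1, 2))],
              [np.zeros((2, 1)), np.diag(np.exp(-beta) + 0.5)]])
W = np.eye(3); c = 2.0

x = rng.multivariate_normal(np.zeros(3), V, size=200_000)
tail = np.mean(np.einsum('ni,ij,nj->n', x, W, x) >= c)
print("tail probability of the bound:", tail)
```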

8.2 Tail property of D-invariant qudit models

For a k-parameter D-invariant model \(\{\rho _{\varvec{t}}\}\), using Lemma 16 and Q-LAN, we have the following theorem.

Theorem 9

Let \(\{\rho _{\varvec{t}}\}_{\varvec{t}\in \mathsf {\Theta }}\) be a k-parameter D-invariant model. Assume that \((\tilde{J}_{\varvec{t}'_0})^{-1}\) exists, \(\rho _{\varvec{t}_0}\) is a non-degenerate state, and a sequence of measurements \(\mathfrak {m}:=\{M_n\}\) satisfies local asymptotic covariance at \(\varvec{t}_{0} \in \mathsf {\Theta }\). When \(J_{\varvec{t}_0}^{-1/2}W J_{\varvec{t}_0}^{-1/2}\) commutes with \(J_{\varvec{t}_0}^{-1/2}D_{\varvec{t}_0} J_{\varvec{t}_0}^{-1/2}\), we have

$$\begin{aligned} \wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}\left( \mathsf {T}_{W,c}(\varvec{t})\right) \ge N\Big [0, W^{1/2}J_{\varvec{t}_0}^{-1}W^{1/2} +\frac{1}{2}\left| W^{1/2}J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1} W^{1/2}\right| \Big ]\left( \mathsf {T}_c\right) \end{aligned}$$
(165)

for \(\mathsf {T}_c:=\{\varvec{x}\in \mathbb {R}^k~|~\Vert \varvec{x}\Vert \ge c\}\). The equality holds if and only if \(\wp _{\varvec{t}_0,\varvec{t}| \mathfrak {m}}\) is the normal distribution with average zero and covariance \(V_{\varvec{t_0}|W}\) as defined in Eq. (114).

We note that bounds on the probability distributions are usually more difficult to obtain and more informative than the MSE bounds, as the MSE can be determined by the probability distribution. Theorem 9 provides an attainable bound of the tail probability, which can be used to determine the maximal probability that the estimate falls into a confidence region \(\mathsf {T}_{W,c}\) as well as the optimal measurement.

Our proof of Theorem 9 needs some preparation. First, we introduce the concept of simultaneous diagonalization in the sense of symplectic transformations. Two \(2k \times 2k\) real symmetric matrices \(A_1\) and \(A_2\) are called simultaneously symplectically diagonalizable when there exist a symplectic matrix S and two real vectors \(\varvec{\beta }_1\) and \(\varvec{\beta }_2\) such that

$$\begin{aligned} S^T A_1 S= E_k(e^{-\varvec{\beta }_1}),\quad S^T A_2 S= E_k(e^{-\varvec{\beta }_2}) \end{aligned}$$
(166)

with \(E_k\) defined in Eq. (56). Regarding the simultaneous diagonalization, we have the following property, whose proof can be found in “Appendix I”:

Lemma 17

The following conditions are equivalent for two \(2k \times 2k\) real symmetric matrices \(A_1\) and \(A_2\).

  1. (i)

    \(A_1\) and \(A_2^{-1}\) are simultaneously symplectically diagonalizable.

  2. (ii)

    \(\varOmega _k A_2 A_1=A_1 A_2 \varOmega _k\), where \(\varOmega _k\) is defined in Eq. (57).

Using Lemma 17, we obtain the following lemma.

Lemma 18

Let \(A_1\) be \( \left| J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1}\right| ^{1/2}\) and assume that \(A_1^{-1}J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1} A_1^{-1}=\varOmega _k\). When \(J_{\varvec{t}_0}^{-1/2}W J_{\varvec{t}_0}^{-1/2}\) commutes with \(J_{\varvec{t}_0}^{-1/2}D_{\varvec{t}_0} J_{\varvec{t}_0}^{-1/2}\), the matrices \(( A_1 W A_1)^{-1}\) and \( A_1^{-1} J_{\varvec{t}_0}^{-1} A_1^{-1}\) are simultaneously symplectically diagonalizable.

Proof

From \([J_{\varvec{t}_0}^{-1/2}W J_{\varvec{t}_0}^{-1/2},J_{\varvec{t}_0}^{-1/2}D_{\varvec{t}_0} J_{\varvec{t}_0}^{-1/2}]=0\), we get \([J_{\varvec{t}_0}^{-1}W, (J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1})]=0\). Next, noticing that \((J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1})=A_1\varOmega _k A_1\), we have

$$\begin{aligned}&J_{\varvec{t}_0}^{-1}W A_1 \varOmega _k A_1 =J_{\varvec{t}_0}^{-1}W (J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1}) =(J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1})W J_{\varvec{t}_0}^{-1} =A_1 \varOmega _k A_1W J_{\varvec{t}_0}^{-1}. \end{aligned}$$

The above equalities show that \(J_{\varvec{t}_0}^{-1}W\) commutes with \(A_1 \varOmega _k A_1\), which further implies that

$$\begin{aligned}&A_1^{-1}J_{\varvec{t}_0}^{-1} A_1^{-1} A_1WA_1 \varOmega _k =\varOmega _k A_1 W A_1 A_1^{-1}J_{\varvec{t}_0}^{-1} A_1^{-1}. \end{aligned}$$

Using Lemma 17, we conclude that \((A_1 W A_1 )^{-1}\) and \(A_1^{-1}J_{\varvec{t}_0}^{-1} A_1^{-1}\) are simultaneously symplectically diagonalizable. \(\quad \square \)
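Lemma 17 can also be checked numerically in one direction: construct \(A_1\) and \(A_2\) such that \(A_1\) and \(A_2^{-1}\) are simultaneously symplectically diagonalizable by design, and verify condition (ii). The sketch below assumes \(\varOmega _k\) is the direct sum of k copies of the standard \(2\times 2\) symplectic form and that \(E_k(\varvec{v})\) is diagonal with paired entries, matching (56)–(57) up to convention.

```python
# Numerical check of Lemma 17, direction (i) => (ii).
import numpy as np
from scipy.linalg import expm

k = 2
Omega = np.kron(np.eye(k), np.array([[0.0, 1.0], [-1.0, 0.0]]))
rng = np.random.default_rng(3)
H = rng.normal(size=(2 * k, 2 * k)); H = 0.2 * (H + H.T)   # symmetric generator
S = expm(Omega @ H)                        # symplectic: S^T Omega S = Omega
E1 = np.diag(np.repeat([0.7, 1.3], 2))     # E_k(e^{-beta_1}), paired entries
E2 = np.diag(np.repeat([0.4, 2.0], 2))     # E_k(e^{-beta_2}), paired entries
S_inv = np.linalg.inv(S)
A1 = S_inv.T @ E1 @ S_inv                  # so that S^T A1 S = E1
A2 = np.linalg.inv(S_inv.T @ E2 @ S_inv)   # so that S^T A2^{-1} S = E2
print(np.allclose(Omega @ A2 @ A1, A1 @ A2 @ Omega))  # True: condition (ii)
```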

Proof of Theorem 9

Define \(A_1:= \left| J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1}\right| ^{1/2}\) and \(A_2:= A_1^{-1}J_{\varvec{t}_0}^{-1}D_{\varvec{t}_0}J_{\varvec{t}_0}^{-1} A_1^{-1}\). Applying a suitable orthogonal matrix, we assume that \(A_2=\varOmega _k\) without loss of generality.

Step 1 For simplicity, we assume that there is no classical part. First, we choose an orthogonal matrix \(S'\) such that \({S'}^TA_2S'=D\). Lemma 18 guarantees that the condition \([J_{\varvec{t}_0}^{-1/2}W J_{\varvec{t}_0}^{-1/2},J_{\varvec{t}_0}^{-1/2}D_{\varvec{t}_0} J_{\varvec{t}_0}^{-1/2}]=0\) allows us to simultaneously diagonalize W and \(D_{\varvec{t}_0}\). That is, we can choose a symplectic matrix S such that \(S^{T} {S'}^T ( A_1 W A_1)^{-1}{S'} S\) and \(S^{T} {S'}^T A_1^{-1} J_{\varvec{t}_0}^{-1} A_1^{-1}{S'} S\) are the diagonal matrices \( E_{k}(\varvec{w})^{-1}\) and \( E_{k}(\varvec{\beta })\) for \(k\in \mathbb {N}^*\). We introduce the local parameter \({\varvec{t}'}:= S^{T} {S'}^T A_1^{-1} \varvec{t}\). Then, \( \varvec{t}\cdot W \varvec{t}= \varvec{t}'\cdot E_{k}(\varvec{w}) \varvec{t}'\).

For a sequence of measurements \(\mathfrak {m}:=\{M_n\}\) satisfying local asymptotic covariance at \(\varvec{t}_{0} \in \mathsf {\Theta }\), according to Theorem 5, we choose a covariant POVM \(\widetilde{M}^G\) satisfying (103). Applying Lemma 16 to the POVM \(\widetilde{M}^G\), we obtain the desired statement.

Step 2 We consider the general case. Now, we choose the local parameter \(\varvec{t}':= J_{\varvec{t}_0}^{-1/2} \varvec{t}\). In these coordinates, the inverse of the RLD quantum Fisher information is \(I+ J_{\varvec{t}_0}^{-1/2}D_{\varvec{t}_0} J_{\varvec{t}_0}^{-1/2}\). Since \(J_{\varvec{t}_0}^{-1/2}D_{\varvec{t}_0} J_{\varvec{t}_0}^{-1/2}\) commutes with \(J_{\varvec{t}_0}^{-1/2}W J_{\varvec{t}_0}^{-1/2}\), the weight matrix has no cross term between the classical and quantum parts. Using the above discussion and Lemma 16, we obtain the desired statement. \(\quad \square \)

9 Extension to Global Estimation and Generic Cost Functions

In the previous sections, we focused on local models and cost functions of the form \(\mathrm{tr}\,WV[\wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}]\). In this section, our treatment will be extended to global models \(\{\rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\) (where the parameter to be estimated is not restricted to a local neighborhood) and to generic cost functions.

9.1 Optimal global estimation via local estimation

Our optimal global estimation is obtained by combining the two-step method with the locally optimal estimation. That is, the first step applies the full tomography proposed in [26] to \(n^{1-x/2}\) copies, producing the outcome \(\hat{\varvec{t}}_0\), for a constant \(x\in (0,2/9)\); the second step applies the locally optimal estimation at \(\hat{\varvec{t}}_0\), given in Sect. 6.3, to the remaining

$$\begin{aligned} a_{n,x}:=n-n^{1-x/2} \end{aligned}$$

copies. Before its full description, we define the neighborhood \(\mathsf {\Theta }_{n,x}(\varvec{t})\) of \(\varvec{t}\in \mathsf {\Theta }\) as

$$\begin{aligned} \mathsf {\Theta }_{n,x}(\varvec{t})&:=\left\{ \varvec{y}~|~\Vert \varvec{y}-\varvec{t}\Vert \le n^{-\frac{1-x}{2}}\right\} . \end{aligned}$$
(167)

Given a generic model \(\mathcal{M}=\{\rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\) that does not contain any degenerate state and a weight matrix \(W> 0\), we describe the full protocol as follows.

  1. (A1)

    Localization: Perform the full tomography proposed in [26] on \(n^{1-x/2}\) copies, described by a POVM \(\{M^\mathrm{tomo}_{n^{1-x/2}}\}\), for a constant \(x\in (0,2/9)\). The tomography outputs the first estimate \(\hat{\varvec{t}}_0\) such that

    $$\begin{aligned}&\mathrm{Tr}\,\rho _{\varvec{t}}^{\otimes n^{1-x/2}} M^\mathrm{tomo}_{n^{1-x/2}} \left( \mathsf {\Theta }_{n,x}(\varvec{t})\right) =1-O\left( e^{-n^{x/2}}\right) \end{aligned}$$
    (168)

    for any true parameter \(\varvec{t}\).

  2. (A2)

    Local estimation: Based on the first estimate \(\hat{\varvec{t}}_0\), apply the optimal local measurement \(M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}\) given in Theorem 7 with the weight matrix W. If the measurement outcome \(\hat{\varvec{t}}_1\) of \(M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}\) is in \(\mathsf {\Theta }_{n,x}(\hat{\varvec{t}}_0)\), output the outcome \(\hat{\varvec{t}}_1\) as the final estimate; otherwise output \(\hat{\varvec{t}}_0\) as the final estimate.

Denoting the POVM of the whole process by \(\mathfrak {m}_{W}=\{M^n_{W}\}\), we obtain the following theorem.
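Before stating the theorem, here is a schematic sketch of the two-step protocol (A1)–(A2); the functions tomography_estimate and local_optimal_measurement are hypothetical placeholders for sampling from the tomographic POVM of [26] and from the locally optimal POVM \(M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}\) of Theorem 7, respectively, and are not part of the original construction.

```python
# Schematic sketch of the two-step protocol (A1)-(A2). The two measurement
# subroutines are hypothetical placeholders, passed in as arguments.
import numpy as np

def two_step_estimate(states, n, x, W,
                      tomography_estimate, local_optimal_measurement):
    n1 = int(round(n ** (1 - x / 2)))    # copies consumed by localization (A1)
    t0_hat = tomography_estimate(states[:n1])
    # Step (A2): locally optimal measurement on the remaining a_{n,x} copies.
    t1_hat = local_optimal_measurement(states[n1:], t0_hat, W)
    radius = n ** (-(1 - x) / 2)         # radius of the neighborhood Theta_{n,x}(t0_hat)
    # Keep the locally optimal outcome only if it stayed in the neighborhood.
    return t1_hat if np.linalg.norm(t1_hat - t0_hat) <= radius else t0_hat
```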

Theorem 10

Assume that the qudit model \(\mathcal{M}=\{\rho _{\varvec{t}}\}_{\varvec{t} \in \mathsf {\Theta }}\) does not contain any degenerate state, that the parametrization is \(C^2\) continuous, that \(\mathsf {\Theta }\) is an open set, and that \({J}_{\varvec{t}_0}^{-1} \) exists. (i) The sequence \(\mathfrak {m}_W\) satisfies local asymptotic covariance at any point \(\varvec{t}_0\) in the parameter space. (ii) The equation

$$\begin{aligned} \mathrm{tr}\,W V [\wp _{\varvec{t}_{0},\varvec{t} |\mathfrak {m}_{W}}]= \mathcal {C}(W,\varvec{t}_{0}) \end{aligned}$$
(169)

holds for any point \(\varvec{t}_0 \in \mathsf {\Theta }\) and any \(\varvec{t}\in \Theta _{n,x,c(\varvec{t}_0)}\) corresponding to a non-degenerate state, where \(\mathcal {C}(W,\varvec{t}_{0})\) is the minimum weighted MSE defined in Eq. (110). More precisely, we have

$$\begin{aligned} \limsup _{n \rightarrow \infty }\sup _{\varvec{t}_0 \in K} \sup _{\varvec{t} \in \Theta _{n,x,c(\varvec{t}_0)} } n^{\kappa } \left\| \wp ^n_{\varvec{t}_0,\varvec{t}| M^n_{W}} - N[\varvec{t}, V_{\varvec{t_0}|W}] \right\| _1 <\infty \end{aligned}$$
(170)

for a compact set \(K\subset \mathsf {\Theta }\), where \(V_{\varvec{t_0}|W}\) is defined in Eq. (146) and \(\Theta _{n,x,c(\varvec{t}_0)}\) is defined in Eq. (76). Further, when the parameter set \(\mathsf {\Theta }\) is bounded and \(x<\kappa \), we have the following relation.

$$\begin{aligned} \lim _{n \rightarrow \infty } \sup _{\varvec{t}_0 \in K} \sup _{\varvec{t} \in \mathsf {\Theta }_{n,x,c(\varvec{t}_0)} } \left\| V [\wp ^n_{\varvec{t}_0,\varvec{t}| M^n_{W}}] - V_{\varvec{t_0}|W} \right\| _1=0. \end{aligned}$$
(171)

Here, we remark on the key point of the derivation. The existing papers [8, 11] addressed the achievability of \(\min _{M}\mathrm{tr}\,W J_{\varvec{t}|M}^{-1}\) with the two-step method, where \(J_{\varvec{t}|M}\) is the Fisher information matrix of the distribution family \(\{\wp _{\varvec{t}|M}\}_{\varvec{t}}\); this quantity expresses the bound for separable measurements [34, Exercise 6.42], and hence can be called the separable bound. In the one-parameter case, the separable bound equals the Holevo bound. To achieve the separable bound, one does not need to consider a sequence of measurements, and hence no complicated convergence has to be handled; the global achievability of the separable bound can thus be easily shown by the two-step method [8, 11]. In our setting, however, we need to handle a sequence of measurements to achieve local optimality. Hence, we need to carefully consider the compact uniformity and the order estimate of the convergence in Theorem 7. In the following proof, we employ our evaluation with such a detailed analysis, as in Eq. (127).

Proof

Step 1 Denote by \(\varvec{t}_\mathrm{g}:=\varvec{t}_0+\frac{\varvec{t}}{\sqrt{n}}\) the true value of the parameters. We have \(\Vert \varvec{t}_\mathrm{g}- \hat{\varvec{t}}_0\Vert \le n ^{-\frac{1-x}{2}} \) with probability \(1-O(e^{-n^{x/2}})\), and \(\Vert \varvec{t}_\mathrm{g}- \varvec{t}_0\Vert \le c(\varvec{t}_0) n ^{-\frac{1}{2}+x} \) by definition. Since the error probability vanishes exponentially, it does not affect the scaling of the MSE. In this step, we will show

$$\begin{aligned}&\left\| \wp ^{n}_{\varvec{t}_0,\varvec{t}| M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}} -N[\varvec{t}, V_{\varvec{t}_0|W}]\right\| _1 =O(n^{-\kappa }). \end{aligned}$$
(172)

Since \(\Vert \varvec{t}_\mathrm{g}- \hat{\varvec{t}}_0\Vert \le n ^{-\frac{1-x}{2}} \) and \(\Vert \varvec{t}_\mathrm{g}- \varvec{t}_0\Vert \le c(\varvec{t}_0) n ^{-\frac{1}{2}+x} \) imply \(\Vert \varvec{t}_0- \hat{\varvec{t}}_0\Vert \le 2 c(\varvec{t}_0) n ^{-\frac{1}{2}+x} \), we have

$$\begin{aligned} \Vert N[\varvec{0}, V_{\varvec{t}_0|W}] - N[\varvec{0}, V_{\hat{\varvec{t}}_0|W}]\Vert _1&= O( n ^{-\frac{1}{2}+x} ) . \end{aligned}$$
(173)

Eq. (127) of Theorem 7 implies

$$\begin{aligned} \Big \Vert \wp ^{a_{n,x}}_{\varvec{t}_0,\varvec{t}| M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}} -N[\varvec{0}, V_{\hat{\varvec{t}}_0|W}]\Big \Vert _1&=O(n^{-\kappa }). \end{aligned}$$
(174)

Since \(\wp ^{n}_{\varvec{t}_0,\varvec{t}| M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}}(\mathsf {B}) =\wp ^{a_{n,x}}_{\varvec{t}_0,\varvec{t}| M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}}\left( \frac{\sqrt{a_{n,x}}\mathsf {B}}{\sqrt{n}}\right) \),

$$\begin{aligned}&\Big \Vert \wp ^{n}_{\varvec{t}_0,\varvec{t}| M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}} -N[\varvec{t}, \frac{n}{a_{n,x}}V_{\varvec{t}_0|W}]\Big \Vert _1 = \Big \Vert \wp ^{a_{n,x}}_{\varvec{t}_0,\varvec{t}| M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}} -N[\varvec{t}, V_{\varvec{t}_0|W}]\Big \Vert _1 \nonumber \\&\quad \le \Big \Vert \wp ^{a_{n,x}}_{\varvec{t}_0,\varvec{t}| M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}} -N[\varvec{0}, V_{\hat{\varvec{t}}_0|W}]\Big \Vert _1 +\Vert N[\varvec{t}, V_{\varvec{t}_0|W}] - N[\varvec{t}, V_{\hat{\varvec{t}}_0|W}]\Vert _1 \nonumber \\&\quad = O(n^{-\kappa })+O( n ^{-\frac{1}{2}+x} ) =O(n^{-\kappa }). \end{aligned}$$
(175)

As we have

$$\begin{aligned} \Vert N[\varvec{0}, V_{\varvec{t}_0|W}]-N[\varvec{0}, \frac{n}{a_{n,x}}V_{\varvec{t}_0|W}]\Vert _1&=O(\frac{n}{a_{n,x}}-1) \nonumber \\&=O((1-n^{x/2})^{-1}-1)=O(n^{x/2}), \end{aligned}$$
(176)

we obtain

$$\begin{aligned}&\Vert \wp ^{n}_{\varvec{t}_0,\varvec{t}| M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}} -N[\varvec{t}, V_{\varvec{t}_0|W}]\Vert _1\nonumber \\&\quad \le \Vert N[\varvec{t}, V_{\varvec{t}_0|W}]-N[\varvec{t}, \frac{n}{a_{n,x}}V_{\varvec{t}_0|W}]\Vert _1 +\Vert \wp ^{n}_{\varvec{t}_0,\varvec{t}| M^{\hat{\varvec{t}}_0,a_{n,x}}_{W}} -N[\varvec{t}, \frac{n}{a_{n,x}}V_{\varvec{t}_0|W}]\Vert _1\nonumber \\&\quad =O(n^{x/2})+O(n^{-\kappa }). \end{aligned}$$
(177)

Step 2 We will show (170). First, we discuss two exceptional cases \(\Vert \varvec{t}_\mathrm{g}- \hat{\varvec{t}}_0\Vert > n ^{-\frac{1-x}{2}} \) and \(\Vert \hat{\varvec{t}}_1- \hat{\varvec{t}}_0\Vert > n ^{-\frac{1-x}{2}} \). Eq. (168) guarantees that

$$\begin{aligned} \mathrm{Tr}\,\rho _{\varvec{t}_\mathrm{g}}^{\otimes n^{1-x/2}} M^\mathrm{tomo}_{n^{1-x/2}} \left( \Big \{\hat{\varvec{t}}_0 \Big | \Vert \varvec{t}_\mathrm{g}- \hat{\varvec{t}}_0\Vert > n ^{-\frac{1-x}{2}} \Big \} \right) =O\left( e^{-n^{x/2}}\right) . \end{aligned}$$
(178)

Eq. (175) and the properties of the normal distribution imply

$$\begin{aligned}&\mathrm{Tr}\,\rho _{\varvec{t}_\mathrm{g}}^{\otimes a_{n,x}} M^{\hat{\varvec{t}}_0,a_{n,x}}_{W} \left( \Big \{\hat{\varvec{t}}_0 \Big | \Vert \hat{\varvec{t}}_1- \hat{\varvec{t}}_0\Vert> n ^{-\frac{1-x}{2}} \Big \} \right) \nonumber \\&\quad = O(n^{-\kappa })+ N[\varvec{t}, \frac{n}{a_{n,x}}V_{\varvec{t}_0,|W}] ( \{\varvec{t}| ~ \Vert \varvec{t} \Vert > n^{x/2} \}) \nonumber \\&\quad = O(n^{-\kappa })+ O(e^{- O(n^{x/2}) }) =O(n^{-\kappa }). \end{aligned}$$
(179)

When \(\Vert \varvec{t}_\mathrm{g}- \hat{\varvec{t}}_0\Vert \le n ^{-\frac{1-x}{2}} \) and \(\Vert \hat{\varvec{t}}_1- \hat{\varvec{t}}_0\Vert \le n ^{-\frac{1-x}{2}} \), Eq. (172) holds under the condition \(\Vert \varvec{t}_\mathrm{g}- \varvec{t}_0\Vert \le c(\varvec{t}_0) n ^{-\frac{1}{2}+x} \), which implies that

$$\begin{aligned} \sup _{\varvec{t} \in \mathsf {\Theta }_{n,x,c(\varvec{t}_0)} } \Vert \wp ^n_{\varvec{t}_0,\varvec{t}| M^n_{W}} - N[\varvec{t}, V_{\varvec{t_0}|W}] \Vert _1 =O(n^{-\kappa }). \end{aligned}$$
(180)

Since the above evaluation is compactly uniform with respect to \(\varvec{t}_0\), we have (170).

Step 3 We will show

$$\begin{aligned} \lim _{n \rightarrow \infty } \sup _{\varvec{t}_0 \in K} \sup _{\varvec{t} \in \mathsf {\Theta }_{n,x,c(\varvec{t}_0)} } \Big \Vert V [\wp ^n_{\varvec{t}_0,\varvec{t}| M^n_{W}}] - \frac{n}{a_{n,x}} V_{\varvec{t_0}|W} \Big \Vert _1=0 \end{aligned}$$
(181)

because Eq. (181) implies (171) due to the convergence \(\frac{n}{a_{n,x}} \rightarrow 1\). There are three cases. (1) \(\Vert \varvec{t}_\mathrm{g}- \hat{\varvec{t}}_0\Vert > n ^{-\frac{1-x}{2}} \), (2) \(\Vert \hat{\varvec{t}}_1- \hat{\varvec{t}}_0\Vert > n ^{-\frac{1-x}{2}} \) and \(\Vert \varvec{t}_\mathrm{g}- \hat{\varvec{t}}_0\Vert \le n ^{-\frac{1-x}{2}} \), and (3) the remaining case.

The compactness of \(\mathsf {\Theta }\) guarantees that the error \( n (\hat{\varvec{t}} - \varvec{t}_\mathrm{g})(\hat{\varvec{t}} - \varvec{t}_\mathrm{g})^T \) is bounded by nC with a constant C. Due to (178), the contribution of the first case is bounded by \( nC \cdot O(e^{-n^{x/2}})\), which goes to zero.

In the second case, since \(\hat{\varvec{t}}_0=\hat{\varvec{t}}\), the error \( n (\hat{\varvec{t}} - \varvec{t}_\mathrm{g})(\hat{\varvec{t}} - \varvec{t}_\mathrm{g})^T \) is bounded by \(n (n ^{-\frac{1-x}{2}})^2=n^x \). Due to (179), the contribution of the second case is bounded by \(n^x \cdot O(n^{-\kappa })=O(n^{x-\kappa })\), which goes to zero.

In the third case, since \(\Vert {\varvec{t}}_\mathrm{g}-\hat{\varvec{t}}_1\Vert \le \Vert {\varvec{t}}_\mathrm{g}-\hat{\varvec{t}}_0\Vert +\Vert \hat{\varvec{t}}_0-\hat{\varvec{t}}_1\Vert \le 2 n ^{-\frac{1-x}{2}}\), the error \( n (\hat{\varvec{t}} - \varvec{t}_\mathrm{g})(\hat{\varvec{t}} - \varvec{t}_\mathrm{g})^T \) is bounded by \( 2n (n^{-\frac{1-x}{2}})^2=2 n^x \). Due to (175), the contribution of the third case is bounded by \(2 n^x \cdot O(n^{-\kappa })=O(n^{x-\kappa })\), which goes to zero. Therefore, we obtain (181). \(\quad \square \)

9.2 Generic cost functions

Finally, we show that the results in this work also hold for any cost function \(c(\hat{\varvec{t}},\varvec{t})\) that is bounded and has a symmetric expansion, in the sense of satisfying the following two conditions:

  1. (B1)

    \(c(\hat{\varvec{t}},\varvec{t})\) has a continuous third derivative, so that it can be expanded as \(c(\hat{\varvec{t}},\varvec{t})= (\hat{\varvec{t}}^T-\varvec{t}^T)W_{\varvec{t}}(\hat{\varvec{t}}-\varvec{t})+O(\Vert \hat{\varvec{t}}-\varvec{t}\Vert ^3)\) as \(\hat{\varvec{t}}\) approaches \(\varvec{t} \), where the matrix \(W_{\varvec{t}}\ge 0\) is a continuous function of \(\varvec{t}\).

  2. (B2)

    \(c(\hat{\varvec{t}},\varvec{t})\le C\) for a constant \(C>0\) and for any \(\hat{\varvec{t}},\varvec{t}\in \mathbb {R}^k\).
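
For instance, the following cost function (our own illustration, not used elsewhere in the paper) satisfies both conditions:

$$\begin{aligned} c(\hat{\varvec{t}},\varvec{t}) := 1-\exp \left( -(\hat{\varvec{t}}^T-\varvec{t}^T)W(\hat{\varvec{t}}-\varvec{t})\right) \end{aligned}$$

with a fixed matrix \(W\ge 0\): since \(1-e^{-q}=q+O(q^{2})\) and \(q:=(\hat{\varvec{t}}^T-\varvec{t}^T)W(\hat{\varvec{t}}-\varvec{t})=O(\Vert \hat{\varvec{t}}-\varvec{t}\Vert ^{2})\), condition (B1) holds with \(W_{\varvec{t}}\equiv W\), and condition (B2) holds with \(C=1\).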

To accommodate this situation, we replace step (A2) by the following step (A2)’:

  1. (A2)’

    Based on the first estimate \(\hat{\varvec{t}}_0\), apply the optimal local measurement \(M^{\hat{\varvec{t}}_0,a_{n,x}}_{W_{\hat{\varvec{t}}_0}}\) given in Theorem 7 with the weight matrix \(W_{\hat{\varvec{t}}_0}\). If the measurement outcome \(\hat{\varvec{t}}_1\) of \(M^{\hat{\varvec{t}}_0,a_{n,x}}_{W_{\hat{\varvec{t}}_0}}\) is in \(\mathsf {\Theta }_{n,x}(\hat{\varvec{t}}_0)\), output the outcome \(\hat{\varvec{t}}_1\) as the final estimate; otherwise output \(\hat{\varvec{t}}_0\) as the final estimate \(\hat{\varvec{t}}\).

Denoting the POVM of the whole process by \(\mathfrak {m}_{c}=\{M^n_c\}\), we have the following result:

Theorem 11

Assume for the model \(\mathcal{M}\) the same assumptions as in Theorem 10. (i) When a sequence of measurements \(\mathfrak {m}:=\{M_{n}(d\hat{\varvec{t}})\}\) satisfies local asymptotic covariance at \(\varvec{t}_{0} \in \mathsf {\Theta }\) and the cost function c satisfies condition (B1), the inequality

$$\begin{aligned} \lim _{n\rightarrow \infty } n\cdot c_{\varvec{t}_{0}+\frac{\varvec{t}}{\sqrt{n}}}(\mathfrak {m}) \ge \mathcal {C}(W_{\varvec{t}_{0}},\varvec{t}_{0}) \end{aligned}$$
(182)

holds, where \(\mathcal {C}(W_{\varvec{t}_{0}},\varvec{t}_{0})\) is defined in Eq. (110) and

$$\begin{aligned} c_{\varvec{t}_{0}+\frac{\varvec{t}}{\sqrt{n}}}(\mathfrak {m}) :=\int c\left( \hat{\varvec{t}}_n,\varvec{t}_{0}+\frac{\varvec{t}}{\sqrt{n}}\right) \,\mathrm{Tr}\,\rho _{\varvec{t}_{0}+\frac{\varvec{t}}{\sqrt{n}}}^{\otimes n}M_{n}\left( d\hat{\varvec{t}}_{n}\right) . \end{aligned}$$

(ii) In addition, if c also satisfies Condition (B2), \(\mathfrak {m}_{c}=\{M^n_c\}\) is locally asymptotically covariant and attains the equality in (182) at any point \(\varvec{t}_{0} \in \mathsf {\Theta }\) corresponding to a non-degenerate state.

Theorem 11 reduces to a bound for the (actual) MSE when \(c(\hat{\varvec{t}},\varvec{t})=(\hat{\varvec{t}}^T-\varvec{t}^T)W(\hat{\varvec{t}}-\varvec{t})\) for \(W\ge 0\). Therefore, the bounds in this work, e.g., Eqs. (125) and (144), are also attainable bounds for the MSE of any locally asymptotically unbiased measurement.

Proof

Step 1 We prove (i). Consider any sequence of asymptotically covariant measurements \(\mathfrak {m}_{\varvec{t}_0}:=\{M_{n,\varvec{t}_0}\}\) at \(\varvec{t}_0\). Denote by \(\varvec{t}_\mathrm{g}:=\varvec{t}_0+\frac{\varvec{t}}{\sqrt{n}}\) the true value of the parameters. For a cost function c satisfying (B1), we have

$$\begin{aligned}&\lim _{n\rightarrow \infty } n\cdot c_{\varvec{t}_{0}+\frac{\varvec{t}}{\sqrt{n}}}(\mathfrak {m})\\&\quad =\lim _{n\rightarrow \infty }n\cdot \int c(\hat{\varvec{t}}_n,\varvec{t}_\mathrm{g})\,\mathrm{Tr}\,\rho _{\varvec{t}_\mathrm{g}}^{\otimes n}M_{n,\varvec{t}_{0}}\left( d\hat{\varvec{t}}_{n}\right) \\&\quad =\lim _{n\rightarrow \infty }\int \left( \varvec{t}'^TW_{\varvec{t}_\mathrm{g}}\varvec{t'}+\frac{1}{\sqrt{n}}O(\Vert \varvec{t'}\Vert ^3)\right) \ \wp ^n_{\varvec{t}_0,\varvec{t}|M_{n,\varvec{t}_{0}}}\left( d\varvec{t}'\right) \quad \big (\varvec{t}':=\sqrt{n}(\hat{\varvec{t}}_n-\varvec{t}_\mathrm{g}) \big )\\&\quad =\int \varvec{t}'^TW_{\varvec{t}_{0}}\varvec{t'}\ \wp _{\varvec{t}_{0},\varvec{t}|\mathfrak {m}_{\varvec{t}_0}}(d\varvec{t}') \ge \ \mathcal {C}(W_{\varvec{t}_{0}},\varvec{t}_{0}). \end{aligned}$$

Step 2 We prove (ii). We replace W by \(W_{\varvec{t}}\) in the proof of Theorem 10. In this replacement, (173) is replaced by

$$\begin{aligned} \Vert N[\varvec{0}, V_{\varvec{t}_0|W_{{\varvec{t}_0}}}] - N[\varvec{0}, V_{\hat{\varvec{t}}_0|W_{\hat{\varvec{t}}_0}}]\Vert _1&= O( n ^{-\frac{1}{2}+x} ) \end{aligned}$$
(183)

where \(x\in (0,2/9)\). Hence, the contributions of the first and second cases of Step 3 of the proof of Theorem 10 go to zero.

In the third case of Step 3 of the proof, we have \(\Vert {\varvec{t}}_\mathrm{g}-\hat{\varvec{t}}_1\Vert \le 2 n ^{-\frac{1-x}{2}}\). Hence,

$$\begin{aligned} n c(\hat{\varvec{t}}_1,\varvec{t}_\mathrm{g})- n (\hat{\varvec{t}}_1^T-\varvec{t}_\mathrm{g}^T)W_{\varvec{t}_\mathrm{g}}(\hat{\varvec{t}}_1-\varvec{t}_\mathrm{g}) =n O(\Vert \hat{\varvec{t}}_1-\varvec{t}_\mathrm{g}\Vert ^3) =O (n ^{-\frac{1-3x}{2}}) \rightarrow 0. \end{aligned}$$
(184)

Hence, in the contribution of the third case, we can replace the expectation of \(n c(\hat{\varvec{t}}_1,\varvec{t}_\mathrm{g})\) by the weighted MSE with weight \(W_{\varvec{t}_\mathrm{g}}\), which yields part (ii). \(\quad \square \)

10 Applications

In this section, we show how to evaluate the MSE bounds in several concrete examples.

10.1 Joint measurement of observables

Here we consider the fundamental problem of the joint measurement of two observables. For simplicity we choose to analyze qubit systems, although the approach can be readily generalized to arbitrary dimensions. The task is to simultaneously estimate the expectation values of two observables A and B of a qubit system. The observables can be expressed as \(A=\varvec{a}\cdot \varvec{\sigma }\) and \(B=\varvec{b}\cdot \varvec{\sigma }\), with \(\varvec{\sigma }=(\sigma _x,\sigma _y,\sigma _z)\) being the vector of Pauli matrices. We assume without loss of generality that \(|\varvec{a}|=|\varvec{b}|=1\) and \(\varvec{a}\cdot \varvec{b}\in [0,1)\). The state of an arbitrary qubit system can be expressed as

$$\begin{aligned} \rho :=\frac{1}{2}\left( I+\varvec{n}\cdot \varvec{\sigma }\right) , \end{aligned}$$

where \(\varvec{n}\) is the Bloch vector.

With this notation, the task reduces to estimating the parameters

$$\begin{aligned} x:=\varvec{a}\cdot \varvec{n},\qquad y:=\varvec{b}\cdot \varvec{n}. \end{aligned}$$

It is also convenient to introduce a third unit vector \(\varvec{c}\) orthogonal to \(\varvec{a}\) and \(\varvec{b}\), so that \(\{\varvec{a},\varvec{b},\varvec{c}\}\) forms a (non-orthogonal) normalized basis of \(\mathbb {R}^3\). In terms of this vector, we can define the parameter \(z:=\varvec{c}\cdot \varvec{n}\). In this way, we extend the problem to the full model containing all qubit states, where x and y are the parameters of interest and z is a nuisance parameter. Under this parameterization, we can evaluate the SLD operators for x, y, and z, as well as the SLD Fisher information matrix and the D matrix (see “Appendix J” for details); substituting these into the bound (144) yields:

$$\begin{aligned} \mathrm{tr}\,V W\ge&\mathrm{tr}\,W\left( \begin{array}{cc} \frac{1-|\varvec{n}|^2+y'^2(1-s^2)+z^2}{1-|\varvec{n}|^2+x'^2+2x'y's+y'^2+z^2} &{}-\frac{x'y'(1-s^2)+(1-|\varvec{n}|^2+z^2)s}{1-|\varvec{n}|^2+x'^2+2x'y's+y'^2+z^2} \\ \\ -\frac{x'y'(1-s^2)+(1-|\varvec{n}|^2+z^2)s}{1-|\varvec{n}|^2+x'^2+2x'y's+y'^2+z^2} &{}\frac{1-|\varvec{n}|^2+x'^2(1-s^2)+z^2}{1-|\varvec{n}|^2+x'^2+2x'y's+y'^2+z^2} \end{array}\right) \nonumber \\&+\frac{1}{2}\mathrm{tr}\,\left| \sqrt{W}\left( \begin{array}{cc}0&{}-2z\sqrt{1-s^2}\\ 2z\sqrt{1-s^2}&{}0\end{array}\right) \sqrt{W}\right| , \end{aligned}$$
(185)

where \(s:=\varvec{a}\cdot \varvec{b}\), \(x'=\frac{x-ys}{1-s^2}\), and \(y'=\frac{y-xs}{1-s^2}\).
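
The right-hand side of (185) is straightforward to evaluate numerically. Below is a minimal sketch (our own illustration; the function names and test values are hypothetical), using the fact that \(\mathrm{tr}\,|X|\) equals the sum of the singular values of X:

```python
import numpy as np

def sqrtm_psd(W):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(W)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def joint_bound(n_vec, a_vec, b_vec, W):
    """Right-hand side of the bound (185) on tr(V W)."""
    s = float(a_vec @ b_vec)
    n2 = float(n_vec @ n_vec)
    x, y = float(a_vec @ n_vec), float(b_vec @ n_vec)
    c_vec = np.cross(a_vec, b_vec)
    c_vec /= np.linalg.norm(c_vec)            # unit vector orthogonal to a and b
    z = float(c_vec @ n_vec)
    xp = (x - y * s) / (1 - s ** 2)
    yp = (y - x * s) / (1 - s ** 2)
    den = 1 - n2 + xp ** 2 + 2 * xp * yp * s + yp ** 2 + z ** 2
    off = -(xp * yp * (1 - s ** 2) + (1 - n2 + z ** 2) * s)
    M = np.array([[1 - n2 + yp ** 2 * (1 - s ** 2) + z ** 2, off],
                  [off, 1 - n2 + xp ** 2 * (1 - s ** 2) + z ** 2]]) / den
    D = 2 * z * np.sqrt(1 - s ** 2) * np.array([[0.0, -1.0], [1.0, 0.0]])
    sw = sqrtm_psd(W)
    # tr|X| = sum of the singular values of X
    return np.trace(W @ M) + 0.5 * np.linalg.svd(sw @ D @ sw, compute_uv=False).sum()

a = np.array([1.0, 0.0, 0.0])                 # observable A = a . sigma
b = np.array([0.5, np.sqrt(0.75), 0.0])       # observable B, with s = a . b = 0.5
n = np.array([0.3, 0.2, 0.4])                 # Bloch vector of the state
print(joint_bound(n, a, b, np.eye(2)))
```

For \(W=I\), the second term equals \(2|z|\sqrt{1-s^2}\), which matches the tradeoff relation derived next.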

The tradeoff between the measurement precisions for the two observables is of fundamental interest. Substituting the expressions of the D matrix and the SLD Fisher information matrix (see “Appendix J”) into Eq. (159), we obtain

$$\begin{aligned} \varDelta _A+\varDelta _B&\ge 2|z|\sqrt{1-s^2}, \end{aligned}$$

which characterizes the precision tradeoff in joint measurements of qubit observables.

10.2 Direction estimation in the presence of noise

Consider the task of estimating a pure qubit state \(|\psi \rangle =\cos \frac{\theta }{2}|0\rangle +e^{i\varphi }\sin \frac{\theta }{2}|1\rangle \), which can also be regarded as determining a direction in space, as qubits are often realized in spin-1/2 systems. In a practical setup, it is necessary to take into account the effect of noise, under which the qubit becomes mixed. For noise with strong symmetry, like depolarization, the usual MSE bound produces a good estimate of the error. For other kinds of noise, it is essential to introduce nuisance parameters and to use the techniques introduced in this paper.

Fig. 2 Amplitude damping model. Plot of \(\mathbf {MSE}_{\theta }+\mathbf {MSE}_\varphi \) as a function of \(\theta \) (the value is the same for all \(\varphi \)) for \(\eta =0.9\) (red), \(\eta =0.5\) (green), and \(\eta =0.1\) (blue). Here \(\mathbf {MSE}_{x}\) denotes the (xx)-th element of the MSE matrix

As an illustration, we consider amplitude damping noise, which can be formulated as the channel

$$\begin{aligned} \mathcal {A}_\eta (\rho ):=A_0\rho A_0^\dag +A_1\rho A_1^\dag \end{aligned}$$

where \(A_0=|0\rangle \langle 0|+\sqrt{\eta }|1\rangle \langle 1|\) and \(A_1=\sqrt{1-\eta }|0\rangle \langle 1|\) are the Kraus operators. After the noisy evolution, the qubit state can be expressed as

$$\begin{aligned} \rho :=\frac{1}{2}\left( I+\varvec{n}\cdot \varvec{\sigma }\right) \end{aligned}$$

with \( \varvec{n}:=(\sqrt{\eta }\sin \theta \sin \varphi ,\sqrt{\eta }\sin \theta \cos \varphi ,1-\eta +\eta \cos \theta ). \) Now we can regard \(\eta \) as a nuisance parameter, while \(\theta \) and \(\varphi \) are the parameters of interest.

Defining the derivative vectors \(\varvec{p}_x:=\partial \varvec{n}/\partial x\), i.e., \(\varvec{p}_x\cdot \varvec{\sigma }=2\,\partial \rho /\partial x\), we can calculate the vectors

$$\begin{aligned} \varvec{p}_{\theta }&=(\sqrt{\eta }\cos \theta \sin \varphi ,\sqrt{\eta }\cos \theta \cos \varphi ,-\eta \sin \theta )\\ \varvec{p}_\varphi&=(\sqrt{\eta }\sin \theta \cos \varphi ,-\sqrt{\eta }\sin \theta \sin \varphi ,0)\\ \varvec{p}_\eta&=(\sin \theta \sin \varphi /(2\sqrt{\eta }),\sin \theta \cos \varphi /(2\sqrt{\eta }),-1+\cos \theta ) \end{aligned}$$

In terms of the derivative vector, the SLD for the parameter \(x\in \{\theta ,\varphi , \eta \}\) takes the form

$$\begin{aligned} L_x=-\frac{2\varvec{p}_x\cdot \varvec{n}}{1-|\varvec{n}|^2}I+\left( 2\varvec{p}_x+\frac{2\varvec{p}_x\cdot \varvec{n}}{1-|\varvec{n}|^2}\varvec{n}\right) \cdot \varvec{\sigma } \, . \end{aligned}$$

After some straightforward calculations, we get

$$\begin{aligned} J=\left( \begin{array}{ccc}4\eta &{}0 &{} 2\sin \theta \\ \\ 0 &{}4\eta \sin ^2\theta &{}0\\ \\ 2\sin \theta &{}0&{} \frac{(1-\eta )[\sin ^2\theta +4\eta (1-\cos \theta )^2]+(2\eta -1)^2}{\eta (1-\eta )}\end{array}\right) \end{aligned}$$

and

$$\begin{aligned}&D_{\theta ,\varphi }=-D_{\varphi ,\theta }=8\eta \sin \theta (\eta -\eta \cos \theta +\cos \theta )\\&D_{\eta ,\varphi }=-D_{\varphi ,\eta }=4\sin ^2\theta \left( 1+\eta -\eta \cos \theta \right) \\&D_{\theta ,\eta }=-D_{\eta ,\theta }=0. \end{aligned}$$
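
The closed forms above can be cross-checked numerically. The following sketch (ours, not part of the original derivation) builds the SLDs from the displayed formula and evaluates \(J_{ij}=\mathrm{Re}\,\mathrm{Tr}\,\rho L_iL_j\) and, assuming the convention \(D_{ij}=i\,\mathrm{Tr}\,\rho [L_i,L_j]\) (which reproduces the entries above), the D matrix:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sig = np.array([[[0, 1], [1, 0]],
                [[0, -1j], [1j, 0]],
                [[1, 0], [0, -1]]], dtype=complex)     # Pauli matrices

def dot_sigma(v):
    return np.einsum('i,ijk->jk', v.astype(complex), sig)

def sld(p, n):
    """SLD built from the derivative vector p and Bloch vector n (formula above)."""
    a = 2 * (p @ n) / (1 - n @ n)
    return -a * I2 + dot_sigma(2 * p + a * n)

theta, phi, eta = 0.7, 1.2, 0.8                        # hypothetical test point
st, ct, sp, cp, se = np.sin(theta), np.cos(theta), np.sin(phi), np.cos(phi), np.sqrt(eta)
n = np.array([se * st * sp, se * st * cp, 1 - eta + eta * ct])
p = {'theta': np.array([se * ct * sp, se * ct * cp, -eta * st]),
     'phi':   np.array([se * st * cp, -se * st * sp, 0.0]),
     'eta':   np.array([st * sp / (2 * se), st * cp / (2 * se), ct - 1])}
rho = 0.5 * (I2 + dot_sigma(n))
L = {k: sld(v, n) for k, v in p.items()}
keys = ['theta', 'phi', 'eta']
J = np.array([[np.trace(rho @ L[i] @ L[j]).real for j in keys] for i in keys])
D = np.array([[(1j * np.trace(rho @ (L[i] @ L[j] - L[j] @ L[i]))).real
               for j in keys] for i in keys])
print(np.round(J, 6))    # compare with the closed form of J above
print(np.round(D, 6))    # compare with the closed-form D entries above
```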

Then we have the MSE bound with the nuisance parameter \(\eta \). An illustration can be found in Fig. 2, with \(W=I\) in Eq. (144). The minimum of the sum of the (xx)-th elements of the MSE matrix for \(x=\theta ,\varphi \) is independent of \(\varphi \), which is a result of the symmetry of the problem: the D matrix does not depend on \(\varphi \), and thus an estimate of \(\varphi \) can be obtained without affecting the precision of the other parameters. Notice that when the state is close to \(|0\rangle \) or \(|1\rangle \), it is insensitive to changes of \(\theta \), resulting in the cup-shaped curves in Fig. 2.

Fig. 3 The nuisance parameter bound versus the fixed parameter bound. \(\mathbf {MSE}_{\theta }+\mathbf {MSE}_\varphi \) as a function of \(\theta \) (the value is the same for all \(\varphi \)) is plotted for \(\eta =0.5\) (green) and \(\eta =0.1\) (blue). The solid curves correspond to the case when \(\eta \) is a nuisance parameter, while the dashed curves correspond to the case when \(\eta \) is a fixed parameter. Here \(\mathbf {MSE}_{x}\) denotes the (xx)-th element of the MSE matrix

Next, we evaluate the sum of the MSEs of \(\varphi \) and \(\theta \) when \(\eta \) is a known, fixed parameter, using Eq. (125), and compare it to the nuisance-parameter case. The result of the numerical evaluation is plotted in Fig. 3. It is clear from the plot that the variance sum is strictly lower when \(\eta \) is treated as a fixed parameter than when it is a nuisance parameter. This is a good example of how knowledge of one parameter (\(\eta \)) can assist the estimation of other parameters (\(\varphi \) and \(\theta \)). It is also observed that, when the noise is stronger (i.e., when \(\eta \) is smaller), the precision gain from knowing \(\eta \) is larger.

10.3 Multiphase estimation with noise

Here we consider a noisy version of the multiphase estimation setting [20, 51]. This problem was first studied in [20], where the authors derived a lower bound from the quantum Fisher information and conjectured that it was tight. Under local asymptotic covariance, we can now derive an attainable bound and show its equivalence to the SLD bound using the orthogonality of the nuisance parameters, which proves the conjecture.

Our techniques also allow us to resolve an open issue concerning the result of Ref. [20], where it was unclear whether or not the best precision depends on the knowledge of the noise. Using Corollary 4, we will also see that knowing the strength of the noise a priori does not help to decrease the estimation error.

Fig. 4 Setup of noisy multiphase estimation. An input state \(|\varPsi \rangle \) passes through a noisy channel \(\mathcal {N}_{\varvec{t}}\) consisting of \(d+1\) modes and carrying d parameters. The resultant state is then measured to obtain an estimate of \(\varvec{t}\)

The setting is illustrated in Fig. 4. Due to photon loss, the phase-shift operation is no longer unitary. Instead, it corresponds to a noisy channel with the following Kraus form:

$$\begin{aligned} \hat{\varPi }_{l,t}&=\sqrt{\frac{(1-\eta )^{l}}{l!}}e^{i\hat{n}t}\eta ^{\frac{\hat{n}}{2}}\hat{a}^l\\ \mathcal {N}_{\varvec{t}}(\rho )&:=\sum _{\varvec{l}}\hat{\varPi }_{\varvec{l},\varvec{t}}\rho \hat{\varPi }_{\varvec{l},\varvec{t}}^\dag , \qquad \hat{\varPi }_{\varvec{l},\varvec{t}}:=\hat{\varPi }_{l_0,0}\otimes \left( \bigotimes _{j=1}^d\hat{\varPi }_{l_j,\varvec{t}_j}\right) . \end{aligned}$$
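
These Kraus operators define a trace-preserving channel, which can be sanity-checked on any photon-number-truncated space; the check is exact there because \(\hat{a}^l\) only lowers the photon number. A minimal sketch (ours, with hypothetical test values):

```python
import numpy as np
from math import factorial

nmax = 10             # Fock cutoff; the check is exact for at most nmax photons
eta, t = 0.7, 0.4     # test values for the transmissivity and the phase

a_op = np.diag(np.sqrt(np.arange(1.0, nmax + 1)), 1)   # annihilation operator

def kraus(l):
    # Pi_{l,t} = sqrt((1-eta)^l / l!) e^{i n t} eta^{n/2} a^l
    phase = np.diag(np.exp(1j * t * np.arange(nmax + 1)))
    damp = np.diag(eta ** (np.arange(nmax + 1) / 2.0))
    return (np.sqrt((1 - eta) ** l / factorial(l))
            * phase @ damp @ np.linalg.matrix_power(a_op, l))

S = sum(kraus(l).conj().T @ kraus(l) for l in range(nmax + 1))
print(np.allclose(S, np.eye(nmax + 1)))   # True: trace preservation holds
```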

Note that \(\eta =1\) corresponds to the noiseless scenario, since then only the \(l=0\) Kraus operator survives. We consider a pure input state with N photons in the “generalized NOON form”

$$\begin{aligned} |\varPsi \rangle :=a|N\rangle _0+\frac{b}{\sqrt{d}}\sum _{j=1}^d |N\rangle _j\qquad |N\rangle _j:=|0\rangle ^{\otimes j}\otimes |N\rangle \otimes |0\rangle ^{\otimes (d-j)}. \end{aligned}$$

The output state of the noisy multiphase evolution is

$$\begin{aligned} \mathcal {I}\otimes \mathcal {N}_{\varvec{t}}(|\varPsi \rangle \langle \varPsi |)=p_\eta \left| \psi _{\eta ,\varvec{t}} \rangle \langle \psi _{\eta ,\varvec{t}}\right| +(1-p_\eta )\rho _\eta \end{aligned}$$

where

$$\begin{aligned} \left| \psi _{\eta ,\varvec{t}}\right\rangle =\cos \alpha _\eta |N\rangle _0+\sin \alpha _\eta \sum _j\frac{e^{iN\varvec{t}_j}|N\rangle _j}{\sqrt{d}},\qquad \cos \alpha _\eta =a/\sqrt{1-b^2(1-\eta ^N)}, \end{aligned}$$

\(p_\eta =1-b^2(1-\eta ^N)\), and \(\rho _\eta \) is independent of \(\varvec{t}\). Notice that the output state is supported on the finite set of orthonormal states \(\{|n\rangle _j:\, j=0,\dots ,d,\ n=0,\dots , N\}\), and thus it is within the scope of this work.

In this case, \(\{\varvec{t}_j\}\) are the parameters of interest, while \(\alpha _\eta \) and \(p_\eta \) can be regarded as nuisance parameters. The SLD operators for these parameters can be calculated as

$$\begin{aligned} L_{\varvec{t}_j}&=2i\left[ N|N\rangle _j\langle N|_j,|\psi _{\eta ,\varvec{t}}\rangle \langle \psi _{\eta ,\varvec{t}}|\right] \\ L_{\alpha _\eta }&=2\left( |\psi _{\eta ,\varvec{t}}\rangle \langle \psi ^\perp _{\eta ,\varvec{t}}|+|\psi ^\perp _{\eta ,\varvec{t}}\rangle \langle \psi _{\eta ,\varvec{t}}|\right) \\ L_{p_\eta }&=\frac{1}{p_\eta }|\psi _{\eta ,\varvec{t}}\rangle \langle \psi _{\eta ,\varvec{t}}|-\frac{1}{1-p_\eta }\wp _{\mathcal {H}_\perp }, \end{aligned}$$

where \(\wp _{\mathcal {H}_\perp }\) denotes the projection onto the subspace orthogonal to \(|\psi _{\eta ,\varvec{t}}\rangle \). Notice that \(p_\eta \) and \(\alpha _\eta \) are orthogonal to the other parameters, in the sense that

$$\begin{aligned} \mathrm{Tr}\,\rho L_{\varvec{t}_j}L_{p_\eta }=\mathrm{Tr}\,\rho L_{\alpha _\eta }L_{p_\eta }=0 \end{aligned}$$

and

$$\begin{aligned} \mathrm{Tr}\,\rho L_{\varvec{t}_j}L_{\alpha _\eta }=\frac{2ip_\eta \sin 2\alpha _\eta }{d} \end{aligned}$$

is purely imaginary. We also have

$$\begin{aligned} \mathrm{Tr}\,\rho L_{\varvec{t}_j}L_{\varvec{t}_k}&=4p_\eta N^2|\langle \psi |N\rangle _j|^2\left( \delta _{jk}-|\langle \psi |N\rangle _k|^2\right) \\&=\frac{4p_\eta N^2\sin ^2\alpha _\eta }{d}\left( \delta _{jk}-\frac{\sin ^2\alpha _\eta }{d}\right) . \end{aligned}$$

Therefore, the SLD Fisher information matrix and the D matrix are of the forms

$$\begin{aligned} J=\left( \begin{array}{ccc}J_{\varvec{t}}&{} 0 &{} 0 \\ \\ 0&{}J_{\alpha _\eta } &{} 0\\ \\ 0&{} 0 &{} J_{p_\eta }\end{array}\right) \qquad D=\left( \begin{array}{ccc}0&{} D_{\varvec{t},\alpha _\eta } &{} 0 \\ \\ D_{\varvec{t},\alpha _\eta }^T&{}0 &{} 0\\ \\ 0&{} 0 &{} 0\end{array}\right) . \end{aligned}$$

Substituting the above into the bound (144), we immediately get an attainable bound

$$\begin{aligned} \mathrm{tr}\,WV\left[ \wp _{\varvec{t}_0,\varvec{t}|\mathfrak {m}}\right] \ge \mathrm{tr}\,WJ_{\varvec{t}}^{-1},\quad \left( J_{\varvec{t}}\right) _{ij}=\frac{4p_\eta N^2\sin ^2\alpha _\eta }{d}\left( \delta _{ij}-\frac{\sin ^2\alpha _\eta }{d}\right) \end{aligned}$$
(186)

for any locally asymptotically covariant measurement \(\mathfrak {m}\). Taking W to be the identity, one sees that for \(\eta \) close to 1 the sum of the variances scales as \(d^2/N^2\), retaining the boost in scaling over separate measurement of the phases, while for small \(\eta \) the Fisher information is suppressed by the factor \(\sin ^2\alpha _\eta \propto \eta ^N\) and this boost is lost. The bound (186) coincides with the SLD bound and the RLD bound. By Corollary 4, we conclude that the SLD (RLD) bound can be attained in the case of joint estimation of multiple phases. In addition, we stress that the ultimate precision does not depend on whether or not the noise parameter \(\eta \) is known a priori: if \(\eta \) is unknown, one can obtain the same precision as when \(\eta \) is known by estimating \(\eta \) without disturbing the parameters of interest.
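
As a numerical cross-check of Eq. (186), the following sketch (ours, with hypothetical parameter values) builds \(|\psi _{\eta ,\varvec{t}}\rangle \) in the basis \(\{|N\rangle _0,\dots ,|N\rangle _d\}\), constructs the SLDs \(L_{\varvec{t}_j}\), and compares \(\mathrm{Re}\,\mathrm{Tr}\,\rho L_{\varvec{t}_i}L_{\varvec{t}_j}\) with the closed form of \(J_{\varvec{t}}\); the \(\varvec{t}\)-independent part \(\rho _\eta \), being supported on lower photon numbers, does not contribute:

```python
import numpy as np

d, N, eta = 3, 4, 0.8                      # test values
a = b = 1 / np.sqrt(2)                     # input amplitudes, a^2 + b^2 = 1
t = np.array([0.3, 0.5, 0.1])              # true phases

p_eta = 1 - b ** 2 * (1 - eta ** N)        # weight of the pure part
cos_a = a / np.sqrt(1 - b ** 2 * (1 - eta ** N))
sin_a = np.sqrt(1 - cos_a ** 2)

psi = np.zeros(d + 1, dtype=complex)       # |psi_{eta,t}> in the basis {|N>_j}
psi[0] = cos_a
psi[1:] = sin_a * np.exp(1j * N * t) / np.sqrt(d)
proj = np.outer(psi, psi.conj())

Ls = []
for j in range(1, d + 1):
    P = np.zeros((d + 1, d + 1)); P[j, j] = N      # N |N>_j <N|_j
    Ls.append(2j * (P @ proj - proj @ P))          # L_{t_j} = 2i [N|N>_j<N|_j, |psi><psi|]

J = np.array([[p_eta * np.trace(proj @ Li @ Lj).real for Lj in Ls] for Li in Ls])
Jt = 4 * p_eta * N ** 2 * sin_a ** 2 / d * (np.eye(d) - sin_a ** 2 / d)
print(np.allclose(J, Jt))                  # True: matches the closed form in (186)
```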

11 Conclusion

Table 3 Comparison between our results and existing results

In this work, we completely solved the attainability problem of precision bounds for quantum state estimation under the local asymptotic covariance condition. We provided an explicit construction of the optimal measurement, which attains the bounds globally. The key building block of the optimal measurement is quantum local asymptotic normality, derived in [16, 17] for a particular type of parametrization and generalized here to arbitrary parametrizations. Besides the MSE bound, we also derived a bound on the tail probability of the estimate. Our work provides a general tool for constructing benchmarks and optimal measurements in multiparameter state estimation. In Table 3, we compare our results with existing results.

Here, we should remark on the relation with the results of Yamagata et al. [19], who proved a similar achievability statement in a local model scenario using a version of local quantum asymptotic normality. In Theorem 7, we established compact uniformity and an order estimate for our convergence, whereas their result does not provide such properties. These properties of the convergence are essential for the evaluation of global estimators. The difference between the two evaluations comes from the key tools. The key tool of our derivation is Q-LAN (Proposition 2) of [16, 17], which provides a state conversion, i.e., TP-CP maps converting the family of states with a precise evaluation in trace norm. Their method, by contrast, is based on the algebraic central limit theorem [38, 52], which yields only the behavior of expectations of functions of the operators \(R_i\). The idea of applying this method to the achievability of the Holevo bound was first mentioned in [18], and Yamagata et al. [19] developed the detailed discussion in this direction.

Indeed, the algebraic version of Q-LAN of [38, 52] can be applied directly to the vector \(\varvec{X}\) of Hermitian matrices to achieve the Holevo bound, while the use of the state conversion of Q-LAN requires a rather involved procedure to handle the vector \(\varvec{X}\) of Hermitian matrices; this is the disadvantage of our approach. On the other hand, since the algebraic version of Q-LAN does not directly give a state conversion, it is quite difficult to obtain the compact uniformity and the order estimate of the convergence with that method. In this paper, to overcome the disadvantage of our approach, we derived several advanced properties of Gaussian states in Sects. 3.2 and 3.3 using the symplectic structure. These properties allowed us to handle the involved procedure smoothly and to fill the gap between the full qudit model and an arbitrary submodel.