Abstract
Excited states of molecules lie at the heart of photochemistry and chemical reactions. Recent developments in quantum computational chemistry have produced a variety of algorithms that calculate the excited states of molecules on near-term quantum computers, but these algorithms require larger computational burdens than those for calculating the ground states. In this study, we propose a scheme of supervised quantum machine learning which predicts the excited-state properties of molecules only from their ground state wavefunctions, thereby reducing the computational cost of calculating the excited states. Our model is comprised of a quantum reservoir and a classical machine learning unit which processes the measurement results of single-qubit Pauli operators on the output state of the reservoir. The quantum reservoir effectively transforms the single-qubit operators into complicated multi-qubit ones which contain essential information of the system, so that the classical machine learning unit may decode them appropriately. The number of runs of quantum computers is kept small because only the classical machine learning unit is trained, and the whole model requires modest quantum hardware resources that may be implemented in current experiments. We illustrate the predictive ability of our model by numerical simulations for small molecules, with and without the noise inevitable in near-term quantum computers. The results show that our scheme reproduces well the first and second excitation energies, as well as the transition dipole moment between the ground and excited states, using only the ground states as inputs. We expect our contribution will enhance the applications of quantum computers in the study of quantum chemistry and quantum materials.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
The rapid growth of machine learning technology in the last decade has revealed its potential in various engineering fields such as image recognition, natural language processing, and outlier detection [1, 2]. Its applications to scientific fields have also attracted considerable attention recently. One of the most active research areas is physical science [3], especially studies of quantum many-body systems including condensed matter physics and quantum chemistry. For example, one can classify a phase of matter from its wavefunction [4, 5] or predict the atomization energy of molecules [6–8] from their molecular structures with sophisticated machine learning techniques.
Most of those studies employ classical machine learning, in which classical data are processed by classical algorithms on classical computers. On the other hand, machine learning algorithms on quantum processors have been developed since the invention of the Harrow-Hassidim-Lloyd (HHL) algorithm, and they are dubbed 'quantum machine learning' [9–16]. In the last few years, there has been surging interest in quantum machine learning leveraging the variational method [17–22], in which a shallow quantum circuit parameterized with classical parameters, such as the angles of rotation gates, is optimized with a classical optimization algorithm to perform a given objective. This is because primitive quantum computers are expected to be realized in the near future, and such machines may have the potential to outperform classical computers [23, 24]. These near-term quantum computers are called noisy intermediate-scale quantum (NISQ) devices [25] and consist of hundreds to thousands of physical, non-fault-tolerant qubits.
So far, quantum machine learning has mostly been applied to classical computing tasks with classical data such as pattern recognition of images [18, 19, 22]. In those studies, the classical data must be encoded in quantum states to be processed by quantum computers, but the encoding is generally inefficient; it requires an exponentially large number of gates to encode classical data into a quantum state unless the data have a tensor-product structure compatible with that of the qubits [26–28].
Therefore, it is natural to think of performing tasks of a quantum nature. In this study, we consider the following task: predicting excited-state properties of a given molecular system from its ground state wavefunction. Specifically, we are interested in the Hamiltonian for the electronic states of molecules. The question we raise and want to solve by leveraging quantum machine learning is whether it is possible to predict properties of the excited states from the ground state wavefunction. According to the celebrated Hohenberg-Kohn theorem [29], one can determine the external potential for electrons, and thereby the whole original electron Hamiltonian up to a constant, from its ground state electron density $n_0(\boldsymbol{r})$, where $\boldsymbol{r}$ is the position. Hence, it should also be possible, in principle, to predict the excited states from the ground state.
The task we propose here has various practical and conceptual attractions from the viewpoint of quantum machine learning and the studies of quantum many-body systems. First, practically, computing the excited states of a given Hamiltonian requires a significantly larger computational cost and is more difficult than computing the ground state [30, 31]. Since the excited-state properties are essential for the thermodynamics of a system and for non-equilibrium dynamics such as chemical reactions, a large benefit to the studies of quantum chemistry and quantum materials is expected if one can predict the excited states only from the ground state. We note that applying classical machine learning to predict excited states of molecules from classical data (molecular structure, Coulomb matrix, etc) has been widely explored in the literature [32–36]. Second, the problem of encoding data into quantum computers mentioned above can be circumvented in this setup; as we will see later, it is possible to input wavefunctions into quantum registers directly from the outputs of another quantum algorithm which yields a ground state wavefunction, such as the variational quantum eigensolver (VQE), one of the most promising applications of NISQ devices [37]. Third, from a conceptual point of view, the original 'data' of quantum systems are wavefunctions, which are quantum in nature, so quantum machine learning that handles quantum data as they are can take advantage of the whole information contained in the wavefunctions and potentially has stronger predictive power than classical counterparts, which process only classical features of quantum data in a purely classical way [38, 39].
In this study, we propose a simple quantum machine learning scheme to predict the excited-state properties of the Hamiltonian of a given molecule from its ground state wavefunction. Our simulations suggest that our model can be implemented on real NISQ devices and is robust to the noise inevitable in the outputs of such devices. In particular, we employ and generalize the quantum reservoir computing [40] and quantum reservoir processing [41] techniques.
Both techniques feed the initial quantum information to a random quantum system called a 'quantum reservoir' which evolves the initial state to another state, and the measurement results of the output state are learned by linear regression to predict some properties associated with the initial information.
Similarly, we first process an input wavefunction, namely the ground state of the target molecular Hamiltonian, with a random quantum circuit or the time evolution under another, fixed Hamiltonian, and then measure the expectation values of one-qubit operators.
The measurement results are post-processed by a classical machine learning unit, and we train only the classical unit to predict the target properties of the system by supervised learning so that the overall number of runs of quantum computers is small. In the Heisenberg picture, the quantum reservoir effectively transforms the one-qubit operators into complicated multi-qubit ones which contain essential information of the system, and the classical machine learning will decode them appropriately. We numerically demonstrate the predictive power of our scheme by taking three small molecules as examples. Our model can predict the excitation energies and the transition dipole moment between the ground state and the excited state properly only from the ground state wavefunction.
The rest of the paper is organized as follows. In section 2, we explain our setup in detail, propose a model for quantum machine learning of excited states, and present the way to train the model. In section 3, we show the results of numerical simulations of our scheme predicting the excited-state properties of small molecules as examples. Section 4 is dedicated to the discussion of our results. We conclude the study in section 5. The appendices review the VQE and SSVQE algorithms (appendix A) and the Jordan-Wigner transformation (appendix B), and provide supplementary analyses of our model (appendices C–E).
2. Method
In this section, we propose a model for quantum machine learning and explain its training process. The schematic diagram of our model is described in figure 1.
2.1. Model description
Let us consider an N-qubit system and a wavefunction on it. Our learning model proceeds as follows. First, an input N-qubit state, which is assumed here to be the ground state of a given Hamiltonian, is prepared on a quantum computer and fed into a fixed quantum circuit, shown in figure 1. We call this circuit a quantum entangler or a quantum reservoir for its role of mixing the local quantum information of the input state and encoding it into the output state. The entangler is chosen to create enough entanglement in the wavefunction and is fixed for each learning task (or experiment). The details of the entangler are not so important for the quality of learning, as illustrated by an exactly solvable model in section 4, so one can use a quantum circuit that is easy to realize on real quantum devices. After applying the entangler, we measure the expectation values of the local Pauli operators $\{X_i, Y_i, Z_i\}_{i=1}^{N}$, where $X_i$, $Y_i$, and $Z_i$ represent the Pauli X, Y, and Z operators acting on qubit i, respectively. Although the total number of operators is 3N, we can measure all of the $X_i$ operators simultaneously since they commute with each other, and the same holds for the $Y_i$ and the $Z_i$ operators.
Hence, one can measure all 3N operators with only three different circuits; i.e. the number of experiments to obtain the measurement data does not scale with the number of qubits N, but only with the desired precision $\epsilon$, as $O(1/\epsilon^2)$, due to the statistical uncertainty.
After the measurements, we obtain the 3N-dimensional real-valued classical vector
$$\boldsymbol{h} = \bigl(\langle X_1\rangle, \ldots, \langle X_N\rangle,\; \langle Y_1\rangle, \ldots, \langle Y_N\rangle,\; \langle Z_1\rangle, \ldots, \langle Z_N\rangle\bigr), \qquad (1)$$
where $\langle\,\cdot\,\rangle$ denotes the expectation value in the output state of the reservoir. Finally, the classical data $\boldsymbol{h}$ is fed into a classical machine learning unit $f_W$ with learnable parameters W, such as a linear regression model or a neural network, and the prediction is obtained.
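As a concrete illustration of this measurement step, here is a minimal statevector sketch in Python/NumPy; the function names and the toy two-qubit input are our own assumptions, not code from the paper.

```python
import numpy as np

# Single-qubit Pauli matrices
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def single_qubit_op(op, i, n):
    """Embed a single-qubit operator `op` on qubit i of an n-qubit register."""
    mats = [op if j == i else I2 for j in range(n)]
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def classical_vector(psi_in, u_ent, n):
    """Return the 3N-dimensional vector h of equation (1): (<X_i>, <Y_i>, <Z_i>)."""
    psi_out = u_ent @ psi_in                      # output state of the reservoir
    h = []
    for pauli in (X, Y, Z):
        for i in range(n):
            op = single_qubit_op(pauli, i, n)
            h.append(np.real(np.vdot(psi_out, op @ psi_out)))
    return np.array(h)                            # shape (3 * n,)

# Toy usage: a random 2-qubit input state and a random unitary as the entangler
n = 2
rng = np.random.default_rng(0)
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)
mat = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
u_ent, _ = np.linalg.qr(mat)                      # unitary from a QR decomposition
print(classical_vector(psi, u_ent, n))            # 6 real numbers
```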
We have several comments in order. First, the process to obtain the classical vector $\boldsymbol{h}$ from the input state can be viewed as compressing a $2^N$-dimensional complex-valued vector into 3N-dimensional real-valued data. Although the way of compression is quite complicated due to the entangler, the classical machine learning unit can decode the information in $\boldsymbol{h}$ and use it to predict the properties of the excited states of the Hamiltonian. More concretely, the effect of the entangler is to make the classical vector $\boldsymbol{h}$ contain the expectation values of complicated (generally long-ranged, many-body) observables for the original ground state; that is, the components of $\boldsymbol{h}$ can be viewed as the expectation values, in the ground state, of the single-qubit Pauli operators conjugated by the entangler. When we expand such a conjugated operator in the basis of N-qubit Pauli operators, some of the terms are long-ranged, many-body ones if the entangler creates entanglement over the whole system. This means that the classical vector $\boldsymbol{h}$ contains a lot of detailed information about the ground state in the form of multi-point, long-ranged correlation functions, even though we measure only single-qubit Pauli operators in practice. Although how such information is encoded in $\boldsymbol{h}$ is not explicitly known, since we do not know the actual values of the expansion coefficients, the classical machine learning unit can be trained to utilize the information to predict the excited states. An explicit example of this point is described in section 4 and the appendices.
2.2. Supervised learning of the model
Next, we explain the procedure for supervised learning of our model. First, we define the training set whose elements r are sets of characteristics of a molecule (e.g. the name of a molecule and its atomic configuration), and we prepare the data for training. In the case of predicting excited states of a given Hamiltonian from its ground state, the input is the ground state of the molecular Hamiltonian H(r), and the target $\boldsymbol{y}(r)$ contains the properties of the excited states of H(r), such as excitation energies. Next, by using the training set, the classical machine learning unit $f_W$ is trained to predict $\boldsymbol{y}(r)$ from the classical vectors $\boldsymbol{h}(r)$, which are calculated in the way described in the previous subsection. A typical training algorithm for the supervised learning is to minimize, by tuning W, a cost function defined to measure the deviation of the predictions from the training data.
We note that our model is easier to train and less costly in terms of the number of runs of quantum computers than the so-called 'quantum circuit learning', where parameters of the quantum circuit are optimized [18–21]: once the classical representation of the quantum state is obtained, there is no need to run the quantum device again to train the model.
3. Numerical demonstration for small molecules
In this section, we numerically demonstrate the ability of our model to reproduce excited-state properties from the ground state wavefunctions by taking small molecules as examples. We consider three types of molecules: the LiH molecule, the H4 molecule whose hydrogen atoms are aligned linearly with equal spacing, and the H4 molecule whose hydrogen atoms are placed in a rectangular shape. We call them LiH, H4 (line), and H4 (rectangle), respectively. We evaluate our model in two situations: in one, ideal outputs of the quantum circuits are available (noiseless); in the other, the noise inevitable in real NISQ devices is taken into account (noisy). The electronic ground states of those molecules with various atomic geometries are prepared by diagonalizing the Hamiltonian for the noiseless simulation and by numerically simulating the VQE for the noisy simulation. Then, we train our model, with linear regression as its classical machine learning unit, to predict the first and second excitation energies and the transition dipole moment between the ground and excited states, whose reference values are obtained by exactly solving the Hamiltonian. The numerical results show that our model can properly reproduce the excited states and illustrate its predictive power.
3.1. Dataset
To prepare a dataset for the simulations, we consider the electronic Hamiltonians of the following configurations. For the LiH molecule and H4 (line), the atomic distances are varied within a fixed range. For H4 (rectangle), we vary the two spacings of the atoms (the lengths of the two edges). We perform the standard Hartree–Fock calculation by employing the STO-3G minimal basis and construct the fermionic second-quantized Hamiltonian for all of the molecules and configurations [51, 52] with the open-source libraries PySCF [53] and OpenFermion [54]. The two Hartree–Fock orbitals with the highest and the second-highest energies among the six orbitals of the LiH molecule are removed by assuming they are vacant, because they are composed almost completely of the 2p_x and 2p_y atomic orbitals and do not significantly contribute to the binding energy of LiH. Then the Hamiltonian is mapped to a sum of Pauli operators by the Jordan-Wigner transformation [55], which we denote H(r) (a review of the Jordan-Wigner transformation is given in appendix B).
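For concreteness, here is a minimal sketch of this Hamiltonian construction for LiH using OpenFermion and OpenFermion-PySCF; the bond length and the choice of active-orbital indices are illustrative assumptions rather than the exact settings used in the paper.

```python
from openfermion import MolecularData, get_fermion_operator, jordan_wigner
from openfermionpyscf import run_pyscf

# Hartree-Fock calculation for LiH in the STO-3G minimal basis (bond length is illustrative)
geometry = [("Li", (0.0, 0.0, 0.0)), ("H", (0.0, 0.0, 1.5))]
molecule = MolecularData(geometry, basis="sto-3g", multiplicity=1, charge=0)
molecule = run_pyscf(molecule, run_scf=True)

# Keep four of the six spatial orbitals (dropping two high-energy virtuals; indices assumed)
active_indices = [0, 1, 2, 3]
molecular_ham = molecule.get_molecular_hamiltonian(
    occupied_indices=[], active_indices=active_indices)

# Map the second-quantized Hamiltonian to a sum of Pauli operators H(r)
qubit_ham = jordan_wigner(get_fermion_operator(molecular_ham))
print(qubit_ham)
```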
The training and test datasets for the simulations are prepared for each Hamiltonian H(r) in the following way. First, in the case of the noiseless simulation, the ground state of H(r) is prepared by exact diagonalization. In the case of the noisy simulation, the VQE algorithm is applied to H(r), and the approximate ground state is obtained as $U(\theta^*)|\mathrm{HF}\rangle$, where $U(\theta)$ is a variational quantum circuit (ansatz) with classical parameters $\theta$, $\theta^*$ is the optimized value of the parameters, and $|\mathrm{HF}\rangle$ is the reference Hartree-Fock state. We adopt the unitary coupled-cluster singles and doubles ansatz [37, 56] as $U(\theta)$. Next, we compute the quantities of the excited-state properties to be predicted,
$$\Delta E_1(r) = E_1(r) - E_0(r), \qquad \Delta E_2(r) = E_2(r) - E_0(r),$$
where $\Delta E_1(r)$ ($\Delta E_2(r)$) is the first (second) excitation energy of H(r) in the sector of neutral charge, and $E_{0,1,2}(r)$ are the three lowest eigenenergies of H(r) in the same sector, ignoring degeneracy. For our choice of molecules and configurations, $E_0(r)$ is the energy of the spin-singlet ground state S0, and $E_1(r)$ is the energy of the spin-triplet excited state T1. $E_2(r)$ is the energy of the spin-singlet excited state S1 or the spin-triplet excited state T2, depending on the configuration of the molecule. The transition dipole moment between the ground state and the excited state is defined as
$$\boldsymbol{\mu}(r) = \langle \psi_{\rm ex}(r) |\, \hat{\boldsymbol{\mu}}\, | \psi_{\rm gs}(r) \rangle,$$
where $|\psi_{\rm gs}(r)\rangle$ is the exact ground state of H(r) (the singlet state S0), $|\psi_{\rm ex}(r)\rangle$ is the exact excited state of H(r) which has the lowest energy among those having a non-zero transition dipole moment from the ground state (typically the S1 state), and $\hat{\boldsymbol{\mu}}$ is the dipole moment operator with electronic charge e. In this study, we use its L2-norm $\mu(r) = \|\boldsymbol{\mu}(r)\|_2$ for the learning tasks. The calculation of each value of $\boldsymbol{y}(r) = (\Delta E_1(r), \Delta E_2(r), \mu(r))$ is performed by exact diagonalization of H(r) for both the noiseless and the noisy simulations. To stabilize the learning process, we scale those calculated values to fit into the range [−1, 1], so that the maximum value $y^{(k)}_{\max}$ and the minimum value $y^{(k)}_{\min}$ in the training dataset are scaled to 1 and −1, respectively, where $y^{(k)}(r)$ denotes the kth element of $\boldsymbol{y}(r)$ for each k = 1, 2, 3, and other values, including those in the test dataset, are mapped as $y^{(k)}(r) \mapsto \bigl(2\,y^{(k)}(r) - y^{(k)}_{\max} - y^{(k)}_{\min}\bigr)/\bigl(y^{(k)}_{\max} - y^{(k)}_{\min}\bigr)$.
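A minimal sketch of this target scaling (NumPy; the function names are ours):

```python
import numpy as np

def fit_scaler(y_train):
    """Per-property min and max of the training targets (y_train has shape n_samples x 3)."""
    return y_train.min(axis=0), y_train.max(axis=0)

def to_unit_range(y, y_min, y_max):
    """Linearly map targets so that the training min/max are sent to -1/+1."""
    return (2.0 * y - y_max - y_min) / (y_max - y_min)

# Usage: scale training and test targets with statistics from the training set only.
# y_min, y_max = fit_scaler(y_train)
# y_train_scaled = to_unit_range(y_train, y_min, y_max)
# y_test_scaled = to_unit_range(y_test, y_min, y_max)
```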
For the numerical experiments, we randomly split the obtained data into a training set and a test set for the evaluation of the model. We used 30 training data points and 50 test data points for the tasks of the LiH and H4 (line) molecules, and 250 training data points and 1250 test data points for the H4 (rectangle) molecule.
3.2. Model for the simulations
The entangler in the model is chosen to be the time-evolution operator under a random transverse-field Ising model (TFIM),
$$U_{\rm ent} = \exp\bigl(-i \mathcal{H}_{\rm TFIM} T\bigr), \qquad \mathcal{H}_{\rm TFIM} = \sum_{i} h_i X_i + \sum_{i<j} J_{ij} Z_i Z_j,$$
where $X_i$ and $Z_j$ are Pauli operators acting on the ith and jth qubits, the coefficients $h_i$ and $J_{ij}$ are sampled from the Gaussian distributions N(1, 0.1) and N(0.75, 0.1), respectively, and we set T = 10. These coefficients are fixed during each of the numerical simulations. This type of entangler can be implemented on various types of NISQ devices; for example, in the case of superconducting qubits, it can be realized by a sequence of cross-resonance gates [57, 58] or simply by tuning the resonance frequency of the qubits [18]. We note that a similar kind of quantum reservoir has recently been implemented on a real NISQ device [59].
In our numerical simulations, this time evolution is exactly simulated as the unitary operation acting on the input state.
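A minimal sketch (NumPy/SciPy) of one way to generate such an entangler, under the assumptions written above (all-to-all ZZ couplings; the second argument of N(·,·) taken as the standard deviation); the helper names and the seed are ours.

```python
import numpy as np
from scipy.linalg import expm

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def op_on(op, idx, n):
    """Tensor product placing the single-qubit operator `op` on the qubits listed in `idx`."""
    mats = [op if j in idx else I2 for j in range(n)]
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def random_tfim_entangler(n, T=10.0, seed=0):
    """U_ent = exp(-i H T) with fields h_i ~ N(1, 0.1) and couplings J_ij ~ N(0.75, 0.1)."""
    rng = np.random.default_rng(seed)
    H = np.zeros((2**n, 2**n), dtype=complex)
    for i in range(n):
        H += rng.normal(1.0, 0.1) * op_on(X, [i], n)
        for j in range(i + 1, n):
            H += rng.normal(0.75, 0.1) * op_on(Z, [i, j], n)
    return expm(-1j * T * H)

u_ent = random_tfim_entangler(4)    # 16 x 16 unitary for a 4-qubit reservoir
```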
For the classical machine learning unit in the numerical demonstration, we employ linear regression (LR) [60]. Although LR lacks the non-linearity that is, in principle, necessary to compute the excited-state properties (see section 4), it performs well enough for the molecular Hamiltonians considered in the simulations, as shown in section 3.4, so it serves as a good demonstrative model for evaluating the concept of our scheme.
The output function of the LR is
$$y^{(k)}_{W}(\boldsymbol{h}) = \boldsymbol{w}^{(k)} \cdot \boldsymbol{h} + b^{(k)},$$
where $W^{(k)} = (\boldsymbol{w}^{(k)}, b^{(k)})$ is a (3N+1)-dimensional vector of model parameters to be optimized, and k = 0, 1, 2 corresponds to the component of the prediction for $\Delta E_1$, $\Delta E_2$, and $\mu$, respectively. The model is trained to minimize the mean squared error (MSE) cost function
$$L^{(k)}(W^{(k)}) = \sum_{r \in D_{\rm train}} \bigl( y^{(k)}_{W}(\boldsymbol{h}(r)) - y^{(k)}(r) \bigr)^2,$$
where $D_{\rm train}$ represents the training dataset; the cost is minimized separately for each target property. The exact optimum of the cost function can be obtained as
$$W^{(k)} = \bigl(V^{\mathsf T} V\bigr)^{-1} V^{\mathsf T} \boldsymbol{y}^{(k)},$$
where V is an $N_{\rm data} \times (3N+1)$ matrix whose ith row is $(\boldsymbol{h}(r_i)^{\mathsf T}, 1)$, $\boldsymbol{y}^{(k)}$ is an $N_{\rm data}$-dimensional column vector whose ith component is $y^{(k)}(r_i)$, and $r_i$ is the ith element of the training dataset. The whole classical process requires a computational complexity of only $O\bigl(N_{\rm data}(3N)^2 + (3N)^3\bigr)$.
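A minimal sketch of this closed-form fit (NumPy; the appended bias column reflects the intercept convention assumed above):

```python
import numpy as np

def fit_linear_regression(H, y):
    """Closed-form least squares: W = (V^T V)^{-1} V^T y.

    H : (n_data, 3N) array whose rows are the classical vectors h(r_i)
    y : (n_data,) array of scaled targets y^(k)(r_i) for one property
    """
    V = np.hstack([H, np.ones((H.shape[0], 1))])      # append the bias column
    return np.linalg.solve(V.T @ V, V.T @ y)          # avoids forming an explicit inverse

def predict(H, W):
    V = np.hstack([H, np.ones((H.shape[0], 1))])
    return V @ W
```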
3.3. Simulation of quantum circuits
To assess the practical performance of our model on NISQ devices, we numerically simulate the quantum circuits of the model in both noiseless and noisy situations (including the preparation of the ground state wavefunctions by the VQE in the noisy case). The latter reflects a more realistic situation of experiments on a real NISQ device, but we stress that the former still serves as a reference point for judging whether the model is capable of performing the learning task at all.
In the noiseless simulation, the expectation value $\langle\phi|P_i|\phi\rangle$ of a Pauli operator, where $|\phi\rangle$ is a quantum state and $P_i$ is a Pauli operator acting on the ith qubit, is computed exactly from the statevector. In the noisy simulation, we consider two error sources which make the estimates of those expectation values deviate from the exact ones. One of them is a sequence of depolarizing noise channels [61] that transforms the quantum state $\rho$ from the reservoir into $\mathcal{E}_N \circ \cdots \circ \mathcal{E}_1(\rho)$, where $\mathcal{E}_i$ is the depolarizing channel acting on the ith qubit as $\mathcal{E}_i(\rho) = (1-p)\rho + \frac{p}{3}\bigl(X_i \rho X_i + Y_i \rho Y_i + Z_i \rho Z_i\bigr)$. We take p = 0.01 in the simulations. The other source is the so-called shot noise that stems from the finite number of shots for the projective measurements of the Pauli operator $P_i$. Each measurement returns ±1 according to the probability distribution determined by the exact value of $\langle P_i \rangle$. We sample 10^4 shots of measurements to compute the ground state with the VQE, and 10^6 shots for each Pauli operator to construct the vector $\boldsymbol{h}$ in equation (1). These are feasible numbers in experiments [57].
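A minimal sketch (NumPy) of how these two noise sources act on a single-qubit Pauli expectation value: the local depolarizing channel with probability p damps ⟨P_i⟩ by a factor (1 − 4p/3), and shot noise is emulated by binomial sampling of the ±1 outcomes. This simplified picture and the function name are our own, not code from the paper.

```python
import numpy as np

def noisy_pauli_expectation(exact_value, p=0.01, shots=10**6, rng=None):
    """Estimate <P_i> under local depolarizing noise and a finite number of shots."""
    rng = rng or np.random.default_rng()
    damped = (1.0 - 4.0 * p / 3.0) * exact_value    # effect of the depolarizing channel on <P_i>
    prob_plus = (1.0 + damped) / 2.0                # probability of measuring the +1 outcome
    n_plus = rng.binomial(shots, prob_plus)
    return (2.0 * n_plus - shots) / shots           # empirical mean of the +/-1 outcomes

# Example: with 10^6 shots the statistical error on each component of h is about 1e-3.
print(noisy_pauli_expectation(0.42))
```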
3.4. Results
The model described in the previous subsections is trained on the training dataset and evaluated on the test set. The evaluation is based on the mean absolute error (MAE) over the test set $D_{\rm test}$,
$$\mathrm{MAE} = \frac{1}{|D_{\rm test}|} \sum_{r \in D_{\rm test}} \bigl| y_{W}(\boldsymbol{h}(r)) - y(r) \bigr|, \qquad (8)$$
where $y_{W}(\boldsymbol{h}(r))$ is the output of the model, evaluated for each component of the prediction considered as a vector. We train and evaluate the model for each molecule separately.
3.4.1. Noiseless simulation
The prediction results of the trained model in the noiseless numerical simulation are shown in figures 2 and 3. In these figures, the excited-state properties are scaled back to the original scale. Our model clearly reproduces the exact values of the excited-state properties for all three molecule types.
To quantify this, in the upper rows of table 1, we summarize the MAEs of ΔE1 and ΔE2 for the test data, equation (8), between the predictions and the exact values, scaled back to the original scale. For the LiH and H4 (line) molecules, the MAEs are below, or comparable to, the chemical accuracy of 1.6 mHa. The error is larger than the chemical accuracy for H4 (rectangle), probably because the molecular structure of H4 (rectangle) has more degrees of freedom and the LR model may not have sufficient expressive power. Utilizing other machine learning methods (e.g. neural networks) is one possible way to achieve more accurate results.
Table 1. The individual MAEs of the predicted ΔE1 and ΔE2, in units of milli-Hartree (mHa), compared with the chemical accuracy, for both the noiseless and the noisy cases.

| | | LiH | H4 (line) | H4 (rectangle) | Chemical accuracy |
|---|---|---|---|---|---|
| MAE (noiseless) (mHa) | ΔE1 | 0.1 | 1.4 | 21.3 | 1.6 |
| | ΔE2 | 0.1 | 3.2 | 28.0 | |
| MAE (noisy) (mHa) | ΔE1 | 15.8 | 39.2 | 109.1 | |
| | ΔE2 | 15.4 | 50.3 | 103.8 | |
We also investigate the necessity of the entangler by comparing the values of the MAE (equation (8)) for the test set after training. The evaluations of the learners with and without the entangler are summarized in table 2, indicating that the entangler significantly enhances the predictive power of the model. To treat all the excited-state properties on an equal footing, here we use the scaled values of the targets. In section 4 and the appendices, we discuss in more detail why the entangler is necessary.
Table 2. The MAEs evaluated with the test set for the trained models in the noiseless situation with and without the entangler. The mean values are taken over all the three properties to be estimated. The output values and the excited-state properties are in the same standardized scale as explained in section 3.1. The MAEs for the random guess are also presented as a reference.
| | LiH | H4 (line) | H4 (rectangle) |
|---|---|---|---|
| Test MAE with entangler | 0.0181 | 0.0203 | 0.0836 |
| Test MAE without entangler | 0.172 | 0.324 | 0.300 |
| Random guess | 0.673 | 0.544 | 0.444 |
These results from the noiseless simulations illustrate the predictive power of our model for the difficult task to predict the excited-state properties only from the ground state.
3.4.2. Noisy simulation
In order to evaluate our scheme in a realistic situation with a quantum device, we add the two noise sources described in section 3.3 to the simulation. In this case, to enhance the noise robustness, we make two modifications relative to the noiseless case. First, after obtaining the classical vectors by processing the training dataset with the noisy quantum circuit, we make 100 copies of every vector on a classical computer and add Gaussian noise N(0, 2 × 10^{-3}) to each component. Stacking these vectors gives a new $(100 N_{\rm data}) \times (3N+1)$ matrix $\tilde V$. The target vector is also duplicated 100 times to match $\tilde V$ (we call this new vector $\tilde{\boldsymbol{y}}^{(k)}$ for later use). Notice that this modification does not affect the required number of measurements of the quantum circuit. Second, we add an L2 regularization term to the cost function of the LR; namely, the cost function becomes
$$\tilde L^{(k)}(W^{(k)}) = \bigl\| \tilde V W^{(k)} - \tilde{\boldsymbol{y}}^{(k)} \bigr\|^2 + \alpha \bigl\| W^{(k)} \bigr\|^2, \qquad (9)$$
where we used α = 10^{-3} in the simulations. We may obtain the exact optimum by computing
$$W^{(k)} = \bigl( \tilde V^{\mathsf T} \tilde V + \alpha I \bigr)^{-1} \tilde V^{\mathsf T} \tilde{\boldsymbol{y}}^{(k)},$$
where I is the identity matrix of appropriate dimensions. Both of these modifications act as regularization, preventing the model from overfitting to outliers with large noise.
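A minimal sketch of the augmentation and the ridge fit (NumPy; the interpretation of 2 × 10^{-3} as the standard deviation of the added noise, the bias column, and the function names are our own assumptions):

```python
import numpy as np

def augment(H, y, copies=100, sigma=2e-3, rng=None):
    """Duplicate each classical vector `copies` times and add Gaussian noise to the copies."""
    rng = rng or np.random.default_rng()
    H_aug = np.repeat(H, copies, axis=0)
    H_aug = H_aug + rng.normal(0.0, sigma, size=H_aug.shape)
    y_aug = np.repeat(y, copies)
    return H_aug, y_aug

def fit_ridge(H, y, alpha=1e-3):
    """Closed-form ridge regression: W = (V^T V + alpha I)^{-1} V^T y."""
    V = np.hstack([H, np.ones((H.shape[0], 1))])
    A = V.T @ V + alpha * np.eye(V.shape[1])
    return np.linalg.solve(A, V.T @ y)

# Usage:
# H_aug, y_aug = augment(H_train, y_train_scaled)
# W = fit_ridge(H_aug, y_aug)
```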
The prediction results are presented in figures 4 and 5. We see that the model still predicts well even in this noisy case.
We attribute this noise robustness to the regularization technique of the LR in equation (9). The MAEs of the predictions of ΔE1 and ΔE2 are summarized in the lower rows of table 1 in the same way as for the noiseless cases. The noise makes the accuracy of the predictions worse than in the noiseless cases, and all of the errors become larger than the chemical accuracy. Part of the reason is that the noise hinders obtaining sufficiently precise ground states of the molecular Hamiltonians. Indeed, in the case of LiH for example, we find that the ground-state energies computed by the VQE in the noisy situation already have a larger error (0.0074 Ha) than the chemical accuracy, and the MAEs of the predictions (∼0.015 Ha) are of a similar order. Applying error mitigation techniques [62, 63] to the VQE can reduce the effect of the noise and will yield more accurate results even in the noisy situation. In appendix E, we examine how the prediction accuracy depends on the number of measurement shots used in the VQE.
4. Discussion
4.1. Necessity of the entangler and the non-linearity of fW
Here we discuss the necessity of the entangler and the non-linearity in the classical machine learning unit fW by considering an exactly solvable model of fermions, namely, the 2-site fermion Hubbard model at half-filling [64].
The 2-site Hubbard model is defined as
where $c_{i\sigma}$ and $c^{\dagger}_{i\sigma}$ are the fermionic annihilation and creation operators acting on an electron with spin $\sigma$ located at the ith site (i = 0, 1), and $n_{i\sigma} = c^{\dagger}_{i\sigma} c_{i\sigma}$ is the number operator of an electron with spin σ at the ith site. The parameter U > 0 determines the strength of the electron repulsion. This system can be considered as a simplified model of a hydrogen molecule, while it also serves as a prototype of strongly correlated materials. When we restrict ourselves to the sector where the number of electrons is two, i.e. the neutral hydrogen states, the first and second excitation energies and the transition dipole moment are
respectively. We note that the dipole moment operator is defined as .
Applying the Jordan-Wigner transformation [55] to the system (equation (11)), we obtain a 4-qubit Hamiltonian. We denote its ground state by $|g(U)\rangle$. When there is no entangler, the classical vector is trivial because
$$\langle X_j \rangle = \langle Y_j \rangle = \langle Z_j \rangle = 0$$
holds for all qubit sites j = 0, 1, 2, 3, where we define $\langle O \rangle \equiv \langle g(U)| O |g(U)\rangle$. In contrast, when there is an entangler in our model, it converts the single-qubit Pauli operators into sums of more complicated Pauli strings in the Heisenberg picture. Then, the classical vector contains contributions from the expectation values of such multi-qubit Pauli strings. It follows that
and
These equations indicate that the excitation energies can be predicted by utilizing the values of such correlation functions appropriately. Therefore, one can see that it is possible to predict the excitation energies from the classical vector if the ground state is processed by an entangler and the classical machine learning unit $f_W$ has enough non-linearity. These equations also imply that the details of the entangler, which determine the coefficients of such terms in the classical vector, are not so important for the predictions; the classical machine learning unit can compensate for the differences in those coefficients. In appendix C, we provide further analysis of the 2-site Hubbard model, including the necessity of the non-linearity.
4.2. Generalizability
In the numerical simulations in section 3, the models are trained and evaluated for each molecule separately. Extending our model so that a single trained model predicts the properties of various molecules simultaneously is one possible direction for future work.
To make our model more powerful and capable of taking various molecules as inputs, several modifications can be considered. First, it will be necessary to include the ground state energy, which can also be calculated by the VQE, in the input of the classical machine learning unit $f_W$ in addition to the ground state, since otherwise one may not determine the energy scale of an input molecule. Second, replacing the entangler with a parametrized quantum circuit V(θ) and optimizing the circuit parameters θ along with the classical machine learning unit increases the degrees of freedom of the model and may result in better predictive power, with the possible drawback that the number of required experiments on the NISQ devices would increase in the training step. Exploring these ideas is an interesting future direction of this work.
5. Conclusion
In this study, we introduce a new quantum machine learning framework for predicting the excited-state properties of a molecule from its ground state wavefunction. By employing a quantum reservoir and choosing simple one-qubit observables for the measurements, accompanied by post-processing with classical machine learning, our framework can be run easily on NISQ devices with a realistic number of runs. Numerical simulations with and without noise in the outputs of the quantum circuits demonstrate that our model accurately predicts the excited states. Although our framework is tested only with small molecules to illustrate its potential, we expect that it will benefit the calculation of excited states of larger molecules by reducing the computational cost compared with computing them directly. Our results open up further possibilities for utilizing NISQ devices in the study of quantum chemistry and quantum materials.
Acknowledgment
HK was supported by QunaSys Inc. HK and YON acknowledge Suguru Endo, Kosuke Mitarai, Nobuyuki Yoshioka, Wataru Mizukami, and Keisuke Fujii for valuable discussions. This work was also supported by MEXT Q-LEAP JPMXS0118068682.
Appendix A.: Variational Quantum Eigensolver (VQE) and its extension to the excited states
In this appendix, we first review the VQE algorithm [37], which finds the ground state of a given Hamiltonian by using near-term quantum computers. We use it to prepare the ground states of the molecular Hamiltonians in the realistic noisy situation in section 3. Next, to give the reader an insight into how costly it is to find the excited states of a given Hamiltonian on near-term quantum computers compared with computing the ground states, we review the subspace-search VQE (SSVQE) algorithm [44] as one example of such algorithms.
A.1. VQE algorithm
The VQE computes the minimum eigenvalue and the corresponding eigenstate of a given observable H by minimizing the expectation value $E(\theta) = \langle\psi(\theta)| H |\psi(\theta)\rangle$ of H with the ansatz state $|\psi(\theta)\rangle = U(\theta)|\psi_{\rm ref}\rangle$, where $U(\theta)$ is a parameterized unitary circuit on a quantum computer with classical parameters $\theta$ and $|\psi_{\rm ref}\rangle$ is some reference state. When the expectation value reaches its minimum at $\theta = \theta^*$ by optimizing the parameters $\theta$, $E(\theta^*)$ and $|\psi(\theta^*)\rangle$ are the closest approximations of the lowest eigenvalue and the corresponding eigenstate, respectively. Evaluation of $E(\theta)$ for a given $\theta$ is performed by near-term quantum computers, and one uses a classical optimization algorithm to iteratively update the values of $\theta$ to find the minimum. This classical-quantum hybrid architecture of the VQE algorithm requires less computational/experimental capability of quantum computers than long-term, purely quantum algorithms such as phase estimation, so that one may run it on near-term quantum computers.
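A minimal statevector sketch of this optimization loop (NumPy/SciPy) for a toy single-qubit observable; the Hamiltonian, the one-parameter Ry ansatz, and the choice of the COBYLA optimizer are illustrative assumptions, not the setup used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = Z + 0.5 * X                                   # toy observable

def ansatz(theta):
    """Ry(theta)|0>: a one-parameter variational state."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

def energy(params):
    psi = ansatz(params[0])
    return np.real(np.vdot(psi, H @ psi))         # <psi(theta)|H|psi(theta)>

result = minimize(energy, x0=[0.1], method="COBYLA")   # classical outer loop
print(result.fun, np.linalg.eigvalsh(H)[0])            # VQE energy vs exact minimum
```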
When applying the VQE to a molecular Hamiltonian, we first prepare the observable H as the second-quantized Hamiltonian of a given molecule by using a finite number of orbitals. Typically, the Hartree–Fock molecular orbitals are used for the second quantization, and each spin orbital corresponds to one qubit [51, 52]. Since the second-quantized Hamiltonian is written in fermionic operators while quantum computers can handle only qubit operators, it is then mapped into a linear combination of qubit operators, $H = \sum_P h_P P$, where $h_P$ is the coefficient corresponding to the Pauli operator P. One example of such a fermion-to-spin mapping is the Jordan-Wigner transformation, which is reviewed in appendix B.
A.2. Subspace-search variational quantum eigensolver (SSVQE) for excited states
Here, we also review the SSVQE algorithm [44], which is one of the algorithms to find the eigenstates corresponding to the higher eigenvalues of an observable H on near-term quantum computers. Suppose we would like to find the k lowest eigenvalues and eigenstates of H. The SSVQE employs k mutually orthogonal reference states $\{|\varphi_i\rangle\}_{i=1}^{k}$ and prepares the k ansatz states with a parameterized unitary circuit $U(\theta)$ as $\{U(\theta)|\varphi_i\rangle\}_{i=1}^{k}$. It was shown in [44] that when the following cost function
$$L_{\rm SSVQE}(\theta) = \sum_{i=1}^{k} w_i \,\langle \varphi_i | U^{\dagger}(\theta)\, H\, U(\theta) | \varphi_i \rangle$$
takes its minimum at $\theta = \theta^*$ for appropriate weights satisfying $w_1 > w_2 > \cdots > w_k > 0$, the ith eigenvalue and eigenstate are approximated by $\langle \varphi_i | U^{\dagger}(\theta^*) H U(\theta^*) | \varphi_i \rangle$ and $U(\theta^*)|\varphi_i\rangle$, respectively. Compared with the VQE for the ground state, evaluating the cost function of the SSVQE takes roughly k times more computational cost (runs of quantum circuits and measurements), because one needs to evaluate $\langle \varphi_i | U^{\dagger}(\theta) H U(\theta) | \varphi_i \rangle$ for each i = 1, ..., k separately and combine them. Moreover, the parameterized unitary circuit must be deeper to express the excited states because they are generally more entangled than the ground state. To implement the deeper unitary circuit, the fidelity required of the near-term quantum computers is more demanding than for the VQE, and more parameters need to be optimized, so it will take longer for the cost function to converge.
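Continuing the toy example above, a minimal sketch of the SSVQE cost for k = 2, using the orthogonal reference states |0⟩ and |1⟩ and the same one-parameter ansatz (again an illustrative assumption, not the paper's implementation):

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = Z + 0.5 * X

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

refs = [np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)]   # |0>, |1>

def ssvqe_cost(params, weights=(1.0, 0.5)):
    """Weighted sum of the energies of the two orthogonal ansatz states U(theta)|phi_i>."""
    U = ry(params[0])
    energies = [np.real(np.vdot(U @ r, H @ (U @ r))) for r in refs]
    return weights[0] * energies[0] + weights[1] * energies[1]

res = minimize(ssvqe_cost, x0=[0.1], method="COBYLA")
U_opt = ry(res.x[0])
print([np.real(np.vdot(U_opt @ r, H @ (U_opt @ r))) for r in refs])   # approx. E0 and E1
print(np.linalg.eigvalsh(H))                                          # exact eigenvalues
```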
Appendix B.: Jordan-Wigner transformation
The Jordan-Wigner transformation [55] converts the fermionic creation and annihilation operators to spin (qubit) operators while faithfully preserving the algebra. It regards the vacuum state as the down spin $|\downarrow\rangle$ and the occupied state as the up spin $|\uparrow\rangle$. The algebra of the fermionic operators follows the anti-commutation relations
$$\{c_m, c_n^{\dagger}\} = \delta_{mn}, \qquad \{c_m, c_n\} = \{c_m^{\dagger}, c_n^{\dagger}\} = 0,$$
where $c_m^{\dagger}$ and $c_m$ are the fermionic creation and annihilation operators acting on the mth lattice site, respectively, and $\delta_{mn}$ is the Kronecker delta. These relations can be represented in terms of the spin operators if one replaces the fermionic operators as
$$c_m^{\dagger} \to \Bigl(\prod_{j<m} Z_j\Bigr)\,\sigma_m^{+}, \qquad c_m \to \Bigl(\prod_{j<m} Z_j\Bigr)\,\sigma_m^{-},$$
where $\sigma_m^{\pm} = (X_m \pm i Y_m)/2$ and $Z_j$ is the Pauli Z operator on the jth site.
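A small numerical check of this mapping (NumPy), using the conventions written above, which are themselves an assumption about the paper's sign choices:

```python
import numpy as np
from itertools import product

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
SM = (X - 1j * Y) / 2     # sigma^-: maps the occupied (up) state to the vacuum (down) state

def kron_list(mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def annihilation(m, n):
    """Jordan-Wigner transformed c_m on an n-site register: (prod_{j<m} Z_j) sigma^-_m."""
    return kron_list([Z if j < m else (SM if j == m else I2) for j in range(n)])

n = 3
c = [annihilation(m, n) for m in range(n)]
for m, k in product(range(n), repeat=2):
    anti = c[m] @ c[k].conj().T + c[k].conj().T @ c[m]        # {c_m, c_k^dagger}
    assert np.allclose(anti, (m == k) * np.eye(2**n))
    assert np.allclose(c[m] @ c[k] + c[k] @ c[m], 0)          # {c_m, c_k} = 0
print("anti-commutation relations verified")
```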
Appendix C.: Nonlinearity of excited-state properties
In this appendix, we present further analysis of the 2-site Hubbard model discussed in section 4. The exact expressions of the excited-state properties of the Hubbard model in terms of the elements of the classical vector provide specific examples of when they can, and cannot, be approximated with a linear model given the classical vector.
There are $4^4 = 256$ Pauli operators (from the identity to $Z \otimes Z \otimes Z \otimes Z$) which may act on the Hilbert space of the 2-site Hubbard model (11). An exhaustive search over all of these Pauli operators reveals that only two non-trivial functions of U appear as the ground state expectation values:
$$f_1(U) = \frac{U}{\sqrt{U^2 + 16}}, \qquad f_2(U) = \frac{4}{\sqrt{U^2 + 16}}.$$
In other words, for any choice of the entangler, all components of the classical vector can be written as linear combinations of f1(U) and f2(U).
Now, we can see that the non-linearity in the classical unit is not necessary for small values of U, but it is necessary for large U. For sufficiently small U, f1(U) ≈ U/4 and f2(U) ≈ 1, and, ignoring higher-order terms, the excited-state properties can be expressed easily as linear combinations of f1(U) and f2(U). On the other hand, when U is sufficiently large, f1(U) ≈ 1 and f2(U) ≈ 4/U; in this regime, even ignoring O(1/U^2) terms, ΔE2 may not be expressed as a linear combination of f1 and f2.
To support this observation, we also perform a numerical simulation for the 2-site Hubbard model. We randomly sample 30 distinct values of U for the training data and 50 distinct values for the test data in the range U ∈ [0.1, 6] (Case 1) and U ∈ [0.1, 20] (Case 2). The ground state wavefunction is prepared by exact diagonalization. Instead of using an entangler, here we define the classical vector as $\boldsymbol{h} = (f_1(U), f_2(U))$. The linear regression to learn the excited-state properties from $\boldsymbol{h}$ is performed for both Case 1 and Case 2. All values of $\boldsymbol{h}$ and the targets are standardized using the mean and the standard deviation of the training dataset during the training process of the LR. The results are shown in figure C1. The LR predicts the excited-state properties almost perfectly for small values of U, as seen in the results for Case 1, whereas it fails once one tries to learn and predict from data with large U values, as shown in the results for Case 2 and as is especially evident in the prediction of ΔE2. These results support our expectation that a linear classical unit can sufficiently approximate the excited-state properties for small U, but may not for large U. We consider that a similar mechanism applies to the numerical simulations of the small molecules in section 3, where the atomic spacings are not very small, so that the effective Coulomb repulsion U is not large.
Appendix D.: Time evolution operator as the entangler: its action on the single-qubit operators
In the numerical simulations in section 3, we adopt the time evolution operator $e^{-i\mathcal{H}T}$, where $\mathcal{H}$ is a random Hamiltonian and T is a fixed evolution time, as the entangler. In this case, we can intuitively understand the effect of the entangler by considering the time evolution of the single-qubit Pauli operators $\{X_i, Y_i, Z_i\}_{i=1}^{N}$ in the Heisenberg picture, where N is the number of qubits in the system.
As discussed in section 2.1, the information we obtain as the outputs of the quantum circuit in our model consists of the expectation values, in the ground state, of the complicated operators $\tilde X_i = e^{i\mathcal{H}T} X_i e^{-i\mathcal{H}T}$ (and similarly for $\tilde Y_i$ and $\tilde Z_i$). If we expand the random Hamiltonian as $\mathcal{H} = \sum_P h_P P$ in the basis of N-qubit Pauli operators P with coefficients $h_P$, it follows that
$$\tilde X_i = X_i + iT\,[\mathcal{H}, X_i] + \frac{(iT)^2}{2!}\,[\mathcal{H}, [\mathcal{H}, X_i]] + \cdots = X_i + T \sum_P h_P \,\bigl(i[P, X_i]\bigr) + O(T^2),$$
where [A, B] = AB − BA; the same holds for the $Y_i$ and $Z_i$ operators. Here $i[P, X_i]$ is another N-qubit Pauli operator with larger support (i.e. the number of qubits on which $i[P, X_i]$ acts nontrivially is larger) than $X_i$, if P acts nontrivially on the ith qubit and on one or more other qubits. As seen in section 4.1 and appendix C, to estimate the excited-state properties we generally need the expectation values of certain Pauli operators acting nontrivially on multiple qubits. Hence, if the set of Pauli operators appearing in this expansion includes such required operators, the machine learning unit may automatically find them and construct the excited-state properties as a function of their expectation values. We note that the $O(T^2)$ terms contain nested commutators like $[P', [P, X_i]]$, the $O(T^3)$ terms contain doubly nested commutators, and so on; so even when the first-order terms do not contain the required operators, those operators may be contained in the higher-order terms, and the machine learning unit may find them if T is large enough that the higher-order terms in T contribute sufficiently to $\tilde X_i$. Hence, the operators $\tilde X_i$ are constituted from more long-ranged, multi-qubit Pauli operators if (1) the random Hamiltonian contains stronger and longer-ranged interactions and/or (2) the evolution time becomes larger. This means that the expectation values of $\tilde X_i$ in the ground state carry more information about the ground state and the original molecular Hamiltonian than those of the bare $X_i$, and there is a better chance for the machine learning unit to successfully predict the excited-state properties from that information. We note that the physical picture of the spreading of single-qubit Pauli operators over the whole system under chaotic Hamiltonians was discussed in reference [65].
Appendix E.: Dependence of the excited-state prediction on the number of shots for the VQE
In this appendix, we examine how the accuracy of the predictions from our model varies with the number of shots used to perform the VQE for preparing the dataset of ground states. We carried out the same numerical simulation for the LiH molecules in the noisy situation as described in sections 3.3 and 3.4.2, but with various numbers of shots ranging from 100 to 10^6 for the VQE computation, instead of fixing it to 10^4 shots. The left panel of figure E2 displays the MAE of the predictions of ΔE1 versus the number of shots for the VQE, showing that the MAE decreases almost monotonically with the number of shots. The right panel of figure E2 shows the MAE between the ground state energy computed by the VQE and the exact one obtained by diagonalization of the molecular Hamiltonians of LiH, as a function of the number of shots. Interestingly, the accuracy of the VQE stops improving beyond a certain number of shots and gradually saturates to the infinite-shot limit, which is non-zero because of the presence of the noise. The two panels of figure E2 suggest that the accuracy of the predictions of the excited-state properties is almost independent of the precision of the VQE result itself; rather, it depends on the number of shots, and one may simply increase the number of shots to obtain more accurate estimates. We leave a deeper analysis of this curious dependence for future work.