Experimental quantum kernel trick with nuclear spins in a solid

The kernel trick allows us to employ a high-dimensional feature space for a machine learning task without explicitly storing features. Recently, the idea of utilizing quantum systems for computing kernel functions using interference has been demonstrated experimentally. However, the dimension of the feature spaces in those experiments has been smaller than the number of data, which makes them lose their computational advantage over explicit methods. Here we show the first experimental demonstration of a quantum kernel machine that achieves a regime where the dimension of the feature space greatly exceeds the number of data, using $^1$H nuclear spins in a solid. The use of NMR allows us to obtain the kernel values with a single-shot experiment. We employ engineered dynamics correlating 25 spins, which is equivalent to using a feature space with a dimension over $10^{15}$. This work presents a quantum machine learning experiment using one of the largest quantum systems to date.


I. INTRODUCTION
Quantum machine learning is an emerging field that has attracted much attention recently. The major algorithmic breakthrough was an algorithm invented by Harrow, Hassidim, and Lloyd [1]. This algorithm has been further developed into more sophisticated machine learning algorithms [2,3]. However, a quantum computer capable of executing those algorithms is yet to be realized. At present, noisy intermediate-scale quantum (NISQ) devices [4], which consist of several tens or hundreds of noisy qubits, are the most advanced technology. Although their performance is limited compared to a fault-tolerant quantum computer, simulating a NISQ device with 100 qubits and sufficiently high gate fidelity is beyond the reach of existing supercomputers and classical simulation algorithms [5-7]. This fact motivates us to explore the power of such devices for solving practical problems.
Many NISQ algorithms for machine learning have been proposed in recent works [8-17]. Almost all of these algorithms require us to evaluate an expectation value of an observable, which is sometimes troublesome to measure by sampling, for example with superconducting or trapped-ion qubits. NMR, on the other hand, can evaluate the expectation value with a one-shot experiment owing to its use of a vast number of duplicate quantum systems. It is therefore a great testbed for those algorithms. A major weak point of NMR is that its initialization fidelity is quite low; at thermal equilibrium at room temperature, the proton spins can effectively be described by a density matrix $\rho_{\rm eq} = \frac{1}{2}(I + \epsilon I_z)$ with $\epsilon \approx 10^{-5}$. Nevertheless, ensemble spin systems can exhibit complex quantum dynamics that are classically intractable. For example, the dynamical phase transition between localization and delocalization has been observed in polycrystalline adamantane along with tens of correlated proton spins [18]. Discrete time-crystalline order has been observed in disordered [19] and ordered [20,21] spin systems.
In this work, we employ such an ensemble spin system for machine learning. Specifically, we implement a kernel-based algorithm which utilizes the quantum state as a feature vector and is a variant of theoretical proposals [8,9,22]. Experimental verification has been provided in Refs. [15,23] using either superconducting qubits or a photonic system. Our strategy of using NMR is advantageous in that we can estimate the value of the kernel, which is the inner product of two quantum states, by single-shot experiments. We perform simple regression and binary classification tasks using the dynamics of nuclear spins in a polycrystalline adamantane sample. In addition, to analyze the performance of our approach without the inevitable effect of noise in experiments, we present numerical simulations of 20-spin dynamics. For certain tasks, we observed that the performance of the trained model becomes better as more spins are involved in the dynamics. We employ one of the largest quantum systems to date for a quantum machine learning experiment in this work.

A. Kernel methods in machine learning
In machine learning, one is asked to extract some patterns, or features, from a given dataset [3,24]. It is sometimes useful to pre-process the data beforehand to achieve the objective. For example, a speech recognition task might become easier when we work in the frequency domain; in this case, the useful pre-processing would be the Fourier transform. The space in which such pre-processed data live is called the feature space. For a given set of data $\{x_i\}_{i=1}^{N_d} \subset \mathbb{R}^{D}$, a feature space mapping $\phi(x)$ constructs the data in the feature space, $\{\phi(x_i)\}_{i=1}^{N_d} \subset \mathbb{R}^{D_f}$. The feature map has to be chosen carefully so as to maximize the performance in, e.g., classification tasks.
The kernel methods are a powerful tool in machine learning. They use a similarity measure between two inputs, defined as a kernel $k: \mathbb{R}^{D} \times \mathbb{R}^{D} \to \mathbb{R}$. For example, a kernel can be defined as an inner product of two feature vectors:
$$k(x_i, x_j) = \phi(x_i)^T \phi(x_j).$$
Many machine learning models, such as the support vector machine or linear regression, can be constructed using the kernel only; that is, we do not have to explicitly hold $\phi(x)$. For example, for a given teacher dataset $\{y_i\}_{i=1}^{N_d} \subset \mathbb{R}$, each corresponding to the input $x_i$, a linear regression model can be constructed by
$$y(x) = w^T \phi(x) = \mathbf{k}(x)^T K^{-1} \mathbf{y},$$
where we have defined the $N_d$-dimensional vector $\mathbf{k}(x) = (k(x_1, x), \ldots, k(x_{N_d}, x))^T$, the teacher vector $\mathbf{y} = (y_1, \ldots, y_{N_d})^T$, and a matrix $K$ with elements $K_{ij} = k(x_i, x_j)$, and where $w \in \mathbb{R}^{D_f}$ is chosen to minimize the mean squared error between the model prediction and the teacher; $w = \operatorname{argmin}_w \sum_i |w^T \phi(x_i) - y_i|^2$.
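The equivalence between the explicit model $w^T\phi(x)$ and its kernel form can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a toy polynomial feature map `phi` with kernel $(1 + x x')^2$, two hypothetical training points, and takes $w$ to be the minimum-norm least-squares solution (an assumption needed because $N_d < D_f$ leaves the minimizer non-unique).

```python
import numpy as np

def phi(x):
    """Explicit feature map whose inner product equals (1 + x x')^2."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), np.sqrt(2.0) * x, x**2], axis=-1)

def kernel(x, xp):
    """The same inner product, evaluated without building phi."""
    return (1.0 + np.multiply.outer(x, xp)) ** 2

# Hypothetical data: two training points, three-dimensional feature space.
x_train = np.array([-1.0, 0.5])
y_train = np.array([0.3, -0.7])
x_test = np.linspace(-2.0, 2.0, 5)

# Explicit route: minimum-norm least-squares weights w, then w^T phi(x).
Phi = phi(x_train)                       # shape (N_d, D_f)
w = np.linalg.pinv(Phi) @ y_train
y_explicit = phi(x_test) @ w

# Kernel route: y(x) = k(x)^T K^{-1} y, using kernel values only.
K = kernel(x_train, x_train)             # Gram matrix, shape (N_d, N_d)
k_test = kernel(x_test, x_train)         # each row is k(x)^T
y_kernel = k_test @ np.linalg.solve(K, y_train)

print(np.allclose(y_explicit, y_kernel))
```

The kernel route never forms a feature vector, which is the property exploited when the feature space is too large to store.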

B. Implementing kernel by NMR
In NMR, we can prepare a data-dependent operator $A(x_i)$ by applying a data-dependent unitary transformation $U(x_i)$ to the initial z-magnetization:
$$A(x_i) = U(x_i)\left(\sum_{\mu=1}^{n} I_{z,\mu}\right)U^\dagger(x_i).$$
Here, $I_{\alpha,\mu}$ ($\alpha = x, y, z$) is the $\alpha$ component of the spin operator of the $\mu$-th spin and $n$ represents the number of spins. $A(x_i)$ with a sufficiently large $n$ is generally intractable by classical computers [25,26]. We employ this operator $A(x_i)$ as a feature map $\phi_{\rm NMR}(x_i)$. $A(x_i)$ can be regarded as a vector, for example, by expanding $A(x_i)$ as a sum of Pauli operators. For an $n$-spin-1/2 system, $A(x_i)$ is a vector in $\mathbb{R}^{4^n}$. The dynamics of NMR can involve tens of spins while maintaining coherence [18,27-29], which means we can employ an approximately $4^{O(10)}$-dimensional feature vector for machine learning. Although a high-dimensional feature space does not always imply superiority in machine learning tasks, the fact that we can work with a feature space that has been intractable with a classical computer motivates us to explore its power.
The kernel method opens up a way to exploit $A(x_i)$ directly for machine learning purposes. While we cannot evaluate each element of $A(x_i)$ because it takes an exponential amount of time, we can evaluate the inner product of two feature vectors $A(x_i)$ and $A(x_j)$ efficiently. To see this, let us define the inner product of two feature vectors $\phi_{\rm NMR}(x_i)$ and $\phi_{\rm NMR}(x_j)$ as
$$k_{\rm NMR}(x_i, x_j) = \phi_{\rm NMR}(x_i)^T \phi_{\rm NMR}(x_j) \equiv \mathrm{Tr}\left[A(x_i)A(x_j)\right].$$
Then,
$$k_{\rm NMR}(x_i, x_j) = \mathrm{Tr}\left[U(x_i) I_z U^\dagger(x_i) U(x_j) I_z U^\dagger(x_j)\right], \tag{7}$$
where $I_z = \sum_\mu I_{z,\mu}$. This can easily be evaluated by NMR. Note that at thermal equilibrium, the density matrix of the spin system is $\rho_{\rm eq} = \frac{1}{2^n}\left(I + \epsilon I_z\right) + o(\epsilon^2)$. Assuming $\epsilon \ll 1$, the above inner product can be evaluated from the following quantity,
$$\mathrm{Tr}\left[U^\dagger(x_j) U(x_i)\, \rho_{\rm eq}\, U^\dagger(x_i) U(x_j)\, I_z\right] \approx \frac{\epsilon}{2^n}\, k_{\rm NMR}(x_i, x_j),$$
that is, we first evolve the system with $U(x_i)$ and then with $U^\dagger(x_j)$, and finally measure $I_z$. Note that when $x_i - x_j \approx 0$, the protocol resembles the famous Loschmidt echo [30,31]. A similar protocol is also used for measuring the out-of-time-ordered correlator (OTOC) [32,33], which is regarded as a measure of the complexity of quantum many-body systems.
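For a sense of scale, the feature-space dimension $4^n$ can be evaluated directly. The values of $n$ below are illustrative, with $n = 25$ being the number of correlated spins quoted in the abstract:
$$4^{10} = 2^{20} = 1\,048\,576 \approx 1.0 \times 10^{6}, \qquad 4^{25} = 2^{50} = 1\,125\,899\,906\,842\,624 \approx 1.1 \times 10^{15}.$$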
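The claim that the echo signal reproduces the inner product $\mathrm{Tr}[A(x_i)A(x_j)]$ follows from the cyclic property of the trace together with $\mathrm{Tr}\,I_z = 0$, and it can be checked on a small system. The sketch below is illustrative only: it assumes three spins, uses random unitaries as hypothetical stand-ins for $U(x_i)$ and $U(x_j)$, and models the thermal state to first order in $\epsilon$.

```python
import numpy as np

def total_iz(n):
    """Total z-magnetization I_z = sum_mu I_{z,mu} for n spin-1/2 particles."""
    sz = np.diag([0.5, -0.5])
    total = np.zeros((2**n, 2**n))
    for site in range(n):
        op = np.eye(1)
        for k in range(n):
            op = np.kron(op, sz if k == site else np.eye(2))
        total += op
    return total

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q

n, eps = 3, 1e-5                       # toy size; eps as quoted in the text
dim = 2**n
rng = np.random.default_rng(0)
Iz = total_iz(n)
Ui = random_unitary(dim, rng)          # stand-in for U(x_i)
Uj = random_unitary(dim, rng)          # stand-in for U(x_j)

# Direct inner product of the two feature operators.
Ai = Ui @ Iz @ Ui.conj().T
Aj = Uj @ Iz @ Uj.conj().T
k_direct = np.trace(Ai @ Aj).real

# Echo protocol: evolve rho_eq with U(x_i), then U^dagger(x_j), measure I_z.
rho_eq = (np.eye(dim) + eps * Iz) / dim
W = Uj.conj().T @ Ui
signal = np.trace(W @ rho_eq @ W.conj().T @ Iz).real
k_echo = signal * dim / eps            # undo the eps / 2^n prefactor

print(np.isclose(k_direct, k_echo, atol=1e-6))
```

The identity part of the thermal state contributes nothing to the signal, so only the small polarized component carries the kernel value.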

III. EXPERIMENT
We propose to use $U(x)$ for an input $x = \{x_j\}_{j=1}^{D}$ that takes the form
$$U(x) = \prod_{j=1}^{D} e^{-iH(x_j)\tau}, \tag{9}$$
where $\tau$ is a constant and $H(x_j)$ is an input-dependent Hamiltonian (Fig. 1(b)). In this work, we choose $H(x_j)$ to be
$$H(x_j) = e^{-ix_j I_z}\left[\sum_{\mu<\nu} d_{\mu\nu}\left(I_{y,\mu}I_{y,\nu} - I_{x,\mu}I_{x,\nu}\right)\right]e^{ix_j I_z},$$
where $I_\alpha = \sum_\mu I_{\alpha,\mu}$ and $d_{\mu\nu}$ is the interaction strength between the $\mu$-th and $\nu$-th spins. The Hamiltonian $H(0)$ can approximately be constructed from the dipole interaction among nuclear spins in solids with a certain pulse sequence [18,34,35], described in the Supplementary Material together with the details of the experiment. Shifting the phase of the pulse by $x$ provides us with $H(x)$ for general $x$. This Hamiltonian with $x_j = 0$ created in adamantane has been shown to have a delocalizing feature in Refs. [18,27-29], which makes it appealing as we wish to involve as many spins as possible in the dynamics.
To illustrate the character of the kernel function, we show in Fig. 1(c) the shape of the kernel for a one-dimensional input $x$ obtained with this sequence, setting $\tau = N\tau_1$ where $\tau_1 = 60$ µs and $N = 1, 2, \ldots, 6$. Since $H(x)$ is defined through the varying phase of the pulse, the value of the kernel $k(x_i, x_j)$ for two one-dimensional inputs $x_i$ and $x_j$ depends only on their difference, $x_i - x_j$. We therefore show the value of the kernel as a function of $x_i - x_j$ in Fig. 1(c). The decay of the signal intensity with increasing $N$ is due to decoherence. We show the Fourier transform of the measured NMR kernel (Fig. 1(c)) in Fig. 1(d). The frequency component at an integer $m$ in this experiment, which is called the coherence order, results from the existence of $m$-body spin operators in $A(x)$, and its intensity is called the multiple-quantum spectrum [18,29,32]. Fig. 1(d) indicates that dynamics involving 10 spins is present for $N = 6$ [36].
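The link between the phase-difference dependence of the kernel and the coherence-order spectrum can be explored on a toy model. The following is a minimal sketch under stated assumptions: four spins, couplings drawn uniformly from $[-1, 1]$, a dimensionless evolution time `tau`, and dense NumPy matrices, all of which are hypothetical stand-ins for the adamantane experiment. It evaluates $k(\Delta) = \mathrm{Tr}[e^{-i\Delta I_z} B e^{i\Delta I_z} B]$ with $B = e^{-iH(0)\tau} I_z e^{iH(0)\tau}$ and takes its discrete Fourier transform over one period of $\Delta$.

```python
import numpy as np
from itertools import combinations

def spin_ops(n):
    """Return per-spin lists (I_x, I_y, I_z) embedded in the n-spin space."""
    single = [
        np.array([[0, 1], [1, 0]], dtype=complex) / 2,
        np.array([[0, -1j], [1j, 0]], dtype=complex) / 2,
        np.array([[1, 0], [0, -1]], dtype=complex) / 2,
    ]
    embedded = []
    for s in single:
        ops = []
        for site in range(n):
            full = np.eye(1, dtype=complex)
            for k in range(n):
                full = np.kron(full, s if k == site else np.eye(2))
            ops.append(full)
        embedded.append(ops)
    return embedded

n, tau = 4, 2.0                        # hypothetical toy values
rng = np.random.default_rng(1)
Ix, Iy, Iz = spin_ops(n)
Iz_tot = sum(Iz)

# H(0) = sum_{mu<nu} d_{mu nu} (I_y I_y - I_x I_x) with random couplings.
H0 = np.zeros((2**n, 2**n), dtype=complex)
for mu, nu in combinations(range(n), 2):
    H0 += rng.uniform(-1, 1) * (Iy[mu] @ Iy[nu] - Ix[mu] @ Ix[nu])

# V = exp(-i H0 tau) from the eigendecomposition of the Hermitian H0.
evals, evecs = np.linalg.eigh(H0)
V = (evecs * np.exp(-1j * evals * tau)) @ evecs.conj().T
B = V @ Iz_tot @ V.conj().T

z = np.diag(Iz_tot).real               # I_z is diagonal in this basis

def kernel(delta):
    """k(delta) = Tr[e^{-i delta I_z} B e^{+i delta I_z} B]."""
    phase = np.exp(-1j * delta * z)
    B_rot = phase[:, None] * B * phase.conj()[None, :]
    return np.trace(B_rot @ B).real

M = 16                                 # number of phase steps, chosen > 2n
deltas = 2 * np.pi * np.arange(M) / M
k = np.array([kernel(d) for d in deltas])
spectrum = np.fft.fft(k).real / M      # intensity at coherence order m (mod M)

print(np.round(spectrum[: n + 1], 6))
```

In this toy model the odd-order intensities vanish, because each term of $H(0)$ changes the total magnetization quantum number by $\pm 2$.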

A. One-dimensional regression task
As the first demonstration, we perform a one-dimensional kernel regression task using the kernel shown in Fig. 1(c). To evaluate the nonlinear regression ability of the kernel, we use $y = \sin(2\pi x/50)$ and $y = \frac{\sin(2\pi x/50)}{2\pi x/50}$, which will be referred to as the sin and sinc functions, respectively. We randomly drew 40 samples of $x$ from $[-45, 45]$ (in degrees) to construct the training data set, which consists of the inputs $\{x_j\}_{j=1}^{40}$ and the teacher $\{y_j\}_{j=1}^{40}$ calculated at each $x_j$. The NMR kernel $k_{\rm NMR}(x_i, x_j)$ is measured for each pair of data to construct the model by kernel ridge regression [24]. We let the model predict $y$ for 64 values of $x$, including the training data. The regularization strength was chosen to minimize the mean squared error of the result at the 64 evaluation points.
The result for the sin function is shown in Fig. 2(a)-(f). That for the sinc function is shown in the Supplementary Material. Fig. 3(a) shows the accuracy of learning evaluated by the mean squared error between the output of the trained model and the true function. We see that the regression accuracy tends to increase with larger $N$. However, because of the deteriorating signal-to-noise ratio, the result also becomes noisier with increasing $N$.
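The regression procedure described above can be sketched as follows. This is an illustration rather than the authors' analysis code: the measured NMR kernel is replaced by a hypothetical shift-invariant stand-in (`stand_in_kernel`, a Gaussian in $x_i - x_j$ with an arbitrary width), the 24 extra evaluation points are drawn at random, and the grid of regularization strengths is an assumption.

```python
import numpy as np

def stand_in_kernel(xa, xb, width=15.0):
    """Hypothetical placeholder for the measured kernel k_NMR(x_i, x_j)."""
    diff = np.subtract.outer(xa, xb)
    return np.exp(-0.5 * (diff / width) ** 2)

def target(x):
    """The sin target used in the text, with x in degrees."""
    return np.sin(2 * np.pi * x / 50)

def fit_predict(K_train, y_train, K_eval, lam):
    """Kernel ridge regression: y(x) = k(x)^T (K + lam I)^{-1} y."""
    alpha = np.linalg.solve(K_train + lam * np.eye(len(y_train)), y_train)
    return K_eval @ alpha

rng = np.random.default_rng(0)
x_train = rng.uniform(-45, 45, size=40)            # 40 training inputs
y_train = target(x_train)
# 64 evaluation points: the training inputs plus 24 additional ones.
x_eval = np.concatenate([x_train, rng.uniform(-45, 45, size=24)])
y_eval = target(x_eval)

K_train = stand_in_kernel(x_train, x_train)        # shape (40, 40)
K_eval = stand_in_kernel(x_eval, x_train)          # shape (64, 40)

# Pick the regularization strength minimizing the evaluation error.
lams = 10.0 ** np.arange(-8, 1)
errors = [
    np.mean((fit_predict(K_train, y_train, K_eval, lam) - y_eval) ** 2)
    for lam in lams
]
best = int(np.argmin(errors))
print(lams[best], errors[best])
```

With an experimental kernel, `K_train` and `K_eval` would be filled with measured values instead of being computed from a formula; the rest of the procedure is unchanged.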

Numerical simulation
To confirm the trend without the effect of noise, we conducted numerical simulations of 20-qubit dynamics. We drew the interaction strength, $d_{\mu\nu}$, from the uniform distribution on $[-1, 1]$ for all $\mu, \nu$. The evolution according to the Hamiltonian $H(x)$ is approximated by the first-order Trotter formula, that is,
$$e^{-iH(x)\tau} \approx e^{-ixI_z}\left[\prod_{\mu<\nu} e^{-i\tau d_{\mu\nu}\left(I_{y,\mu}I_{y,\nu} - I_{x,\mu}I_{x,\nu}\right)/M}\right]^{M} e^{ixI_z}. \tag{11}$$
We set $\tau = 0.01, 0.02, \ldots, 0.06$ and $\tau/M = 0.001$ in the simulation. In order to reduce the computational cost, we set the kernel to $k(x_i, x_j) = |\langle 0|U^\dagger(x_j)U(x_i)|0\rangle|^2$, where $|0\rangle$ is the ground state of $I_z$. Since we can compute this quantity by simulating the dynamics of a $2^{20}$-dimensional state vector, it is significantly easier than computing $A(x) = U(x)I_zU^\dagger(x)$, where we would need to simulate the dynamics of $2^{20} \times 2^{20}$ matrices. All simulations are performed with the quantum circuit simulator Qulacs [37]. The result for the sin function is shown in Figs. 2(g)-(l). For the sinc function, we place the result in the Supplementary Material. The mean squared error of the prediction, evaluated in the same manner, is shown in Fig. 3(b). We can see that the performance gets better with increasing $\tau$, which corresponds to increasing $N$ in the experiment. This confirms the trend observed in the NMR experiment.
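The Trotterized state-vector kernel can be illustrated on a much smaller system. The sketch below is not the Qulacs simulation used in the paper: it assumes four spins, dense NumPy matrices, an illustrative `tau = 1.0` with `tau / M = 0.001`, and takes $|0\rangle$ to be the first computational-basis state, which is an eigenstate of $I_z$ (the labeling convention is an assumption). It compares the first-order Trotter evolution with the exact exponential.

```python
import numpy as np
from itertools import combinations

def embed(op, site, n):
    """Embed a single-spin operator at position `site` of an n-spin system."""
    full = np.eye(1, dtype=complex)
    for k in range(n):
        full = np.kron(full, op if k == site else np.eye(2))
    return full

def expm_hermitian(H, t):
    """exp(-i H t) for a Hermitian matrix H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * evals * t)) @ evecs.conj().T

n, tau, M = 4, 1.0, 1000               # toy values; tau / M = 0.001
rng = np.random.default_rng(0)
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
Ix = [embed(sx, m, n) for m in range(n)]
Iy = [embed(sy, m, n) for m in range(n)]
iz_diag = np.diag(sum(embed(sz, m, n) for m in range(n))).real

# Build H(0) and one first-order Trotter step from the pairwise terms.
H0 = np.zeros((2**n, 2**n), dtype=complex)
step = np.eye(2**n, dtype=complex)
for mu, nu in combinations(range(n), 2):
    term = rng.uniform(-1, 1) * (Iy[mu] @ Iy[nu] - Ix[mu] @ Ix[nu])
    H0 += term
    step = expm_hermitian(term, tau / M) @ step
V_trotter = np.linalg.matrix_power(step, M)
V_exact = expm_hermitian(H0, tau)

def evolved_state(x, V):
    """Apply e^{-i x I_z} V e^{+i x I_z} to |0>, with V ~ exp(-i H(0) tau)."""
    psi = np.zeros(2**n, dtype=complex)
    psi[0] = 1.0
    psi = np.exp(1j * x * iz_diag) * psi
    psi = V @ psi
    return np.exp(-1j * x * iz_diag) * psi

def kernel(xi, xj, V):
    """k(x_i, x_j) = |<0| U^dagger(x_j) U(x_i) |0>|^2."""
    return abs(np.vdot(evolved_state(xj, V), evolved_state(xi, V))) ** 2

k_trotter = kernel(0.4, 0.1, V_trotter)
k_exact = kernel(0.4, 0.1, V_exact)
print(k_trotter, k_exact)
```

Only state vectors are propagated in `kernel`, which mirrors the cost argument in the text; the dense matrices `V_trotter` and `V_exact` are affordable here solely because the toy system is small.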

B. Two-dimensional classification task
As the second demonstration, we implement two-dimensional classification tasks. We employ the hard-margin kernel support vector machine [24] and its implementation in scikit-learn [38] for this task. The training data sets are generated from the circle and moon datasets of scikit-learn [38]. We used the NMR kernel with $N = 1, 2, 3$. Along with the experiment, we again conducted numerical simulations with the same setting as in the previous section, with $\tau = 0.03, 0.06, 0.09$. The value of this numerical kernel is shown in the Supplementary Material.
The results are shown in Fig. 4. We note that, for the moon dataset with the $N = 1$ experimental NMR kernel, the kernel matrix was singular, and we did not obtain a reliable result. We attribute this to the broadness of the kernel at $N = 1$. For these classification tasks, we do not observe particular changes with increased evolution time. This is also clear from the hinge loss, which is a measure of the accuracy in classification tasks, given for each result in the Supplementary Material.
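The hinge loss mentioned here is defined in the Supplementary Material as $\frac{1}{N}\sum_i \max\{1 - \lambda_i y_i, 0\}$, with $\lambda_i$ the model output and $y_i$ the teacher. A minimal sketch of that formula follows; the example outputs are made up for illustration, and the labels are assumed to take the values $\pm 1$.

```python
def hinge_loss(outputs, labels):
    """Mean of max(1 - lambda_i * y_i, 0) over the data set."""
    if len(outputs) != len(labels):
        raise ValueError("outputs and labels must have the same length")
    total = sum(max(1.0 - lam * y, 0.0) for lam, y in zip(outputs, labels))
    return total / len(labels)

# Hypothetical decision-function values and +/-1 labels.
outputs = [2.0, 0.5, -0.3, -1.5]
labels = [1, 1, -1, -1]
loss = hinge_loss(outputs, labels)   # (0 + 0.5 + 0.7 + 0) / 4, about 0.3
print(loss)
```

Points classified correctly with a margin of at least one contribute nothing, so a loss of zero indicates that every training point lies outside the margin on the correct side.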

IV. DISCUSSION
In the one-dimensional regression task, we observed a trend of better performance with longer evolution time. This can be explained by the shape of the kernel generated by the NMR dynamics, which is shown in Fig. 1(c). As mentioned earlier, this experiment is essentially the Loschmidt echo, and the shape of the signal sharpens as the evolution time increases. The sharpness of the kernel translates directly into the representability of the model, as it does for the popular Gaussian kernel, because this property allows the machine to distinguish different data more clearly. However, it also causes overfitting problems if the data points are sparse. The most extreme case is when we use a delta function as a kernel, where every training point is learned with perfect accuracy while the trained model fails to predict for unknown inputs. In our experimental case, we did not observe any overfitting problem, which means that our training samples were dense enough for the sharpness of the kernel utilized in the model, and thus we observed an increasing performance from the improved representability of the kernel with longer evolution time. On the other hand, for the classification task, we did not observe any significant trend depending on the evolution time. We suspect that there is an optimal evolution time for this kind of task, which should be explored in future works.
We note that the shape of the kernel resembles the Gaussian kernel, which is widely employed in many machine learning tasks. One might think that we could have obtained similar results using the Gaussian kernel. While this is true, it is also true that NMR dynamics under a general Hamiltonian cannot be simulated classically under certain complexity conjectures [25,26]. This leaves open the possibility that the NMR kernel performs better than classical kernels in some specific cases. More experiments using different Hamiltonians are required to test whether the "quantum" kernel has any advantage in machine learning tasks over widely used conventional kernels.

V. CONCLUSION
We proposed and experimentally tested a quantum kernel expressed by Eqs. (7) and (9). Experimentally, we used $^1$H spins in adamantane with $O(10)$ coherence order to compute the kernel. Machine learning models for one-dimensional regression tasks and two-dimensional classification tasks were constructed with the proposed kernel.
The experimental and numerical results were similar. Experiments along with numerical simulations also showed that, for certain tasks, the performance of the model tended to increase with longer evolution time, or equivalently, with a larger number of spins involved in the dynamics. It would be interesting to extend this method to more quantum-oriented machine learning tasks. For example, one may be able to distinguish two dynamical phases of spin systems, such as the localized and delocalized phases demonstrated in Ref. [18], with the kernel support vector machine employed in this work. More experiments are needed to verify the power of this "quantum kernel" approach, but our results can be thought of as one of the baselines of this emerging field.

I. EXPERIMENTAL DETAILS
A pulse sequence to realize $H(0)$ is given in Fig. S1. In the experiment, we set the length of the $\pi/2$ pulse, $\tau_p$, to 1.5 µs. For the waiting periods, we used $\Delta' = 2\Delta + \tau_p$ with $\Delta = 3.5$ µs, which makes the evolution time for a cycle, $\tau_1$, 60 µs. By repeating the sequence $N$ times, we can effectively evolve the spins with $e^{-iH(x_D)\tau}$ for $\tau = N\tau_1$. NMR spectroscopy with a polycrystalline adamantane sample was performed at room temperature with OPENCORE NMR [39], operating at a resonant frequency of 400.281 MHz for $^1$H nucleus observation.
FIG. S1. Pulse sequence to generate $H(0)$. $X$ and $\bar{X}$ respectively stand for $\pi/2$ and $-\pi/2$ rotations around the x-axis.

II. ONE-DIMENSIONAL REGRESSION TASK
We show the results for the regression task of the sinc function performed with the experimental NMR kernel and with the numerically simulated kernel in Fig. S2.

III. KERNEL FROM THE NUMERICAL SIMULATIONS
The kernel computed from numerical simulations for one-dimensional input is shown in Fig. S3. It is shown as a function of $x - x' \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$. We can observe features similar to those of the experimental kernel, such as the sharpening of the kernel with increasing evolution time.

IV. CLASSIFICATION TASKS
A. Hinge loss of trained classification model
We define the hinge loss by $\frac{1}{N}\sum_i \max\{1 - \lambda_i y_i, 0\}$, where $N$, $\lambda_i$ and $y_i$ are the number of data, the model output and the teacher for the $i$-th data, respectively. The hinge loss for each result is computed with the training dataset and shown in Tabs. I and II. As mentioned in the main text, for the moon dataset, the $N = 1$ NMR kernel produced a singular matrix and the result is unreliable.
Let $x = (x_1, x_2)$ and $x' = (x'_1, x'_2)$ be two data points with which we wish to evaluate the kernel $k(x, x') = k(\{x_1, x_2\}, \{x'_1, x'_2\})$. Since our encoding of the data is performed by the unitary defined by Eq. (9) of the main text, the kernel satisfies the equality: With this in mind, we define P

FIG. 1. (a) Adamantane molecule. (b) Quantum circuit employed in this work to realize the data-dependent quantum state $\rho(x)$. (c) NMR kernel employed in this work. (d) Fourier transform of (c), which corresponds to the obtained $^1$H multiple-quantum spectra for $N = 1$ to $N = 6$.

FIG. 2. Demonstration of the one-dimensional regression task of $y = \sin(2\pi x/50)$ performed with (a)-(f) the NMR kernel and (g)-(l) the numerically simulated kernel. The blue dots are the training data. The green line is the prediction of the trained model.

FIG. 3. Mean squared error of the trained models for the regression task of the sin and sinc functions using (a) the experimental NMR kernel and (b) the numerically simulated kernel.

FIG. 4. (a) Classification with the NMR kernel. Left and right panels respectively show the results for the "circle" and "moon" datasets. The dots in the figures represent training data. The background color indicates the decision function of the trained model. Top, middle, and bottom panels are the results with the $N = 1, 2, 3$ NMR kernels, respectively. (b) Results from the numerically simulated quantum kernel. Top, middle, and bottom panels are the results with the simulated kernel with $\tau = 0.03, 0.06, 0.09$. All other notation follows that of (a).

TABLE I. Hinge loss of the trained model with the experimental NMR kernel.