Assessment of image generation by quantum annealer

Quantum annealing was originally proposed as an approach for solving combinatorial optimization problems using quantum effects. D-Wave Systems has released a production model of quantum annealing hardware. However, the inherent noise and various environmental factors in the hardware hamper the determination of optimal solutions. In addition, the freezing effect in regions with weak quantum fluctuations generates outputs approximately following a Gibbs–Boltzmann distribution at an extremely low temperature. Thus, a quantum annealer may also serve as a fast sampler for the Ising spin-glass problem, and several studies have investigated Boltzmann machine learning using a quantum annealer. Previous studies have focused on comparing sampling by a quantum annealer with conventional methods on classical computers in terms of a standard distance between the resulting distributions. In this study, we focused on the performance of a quantum annealer as a generative model from a different aspect. To evaluate its performance, we prepared a discriminator given by a neural network trained on an a priori dataset. The evaluation results show that the quantum annealer outperforms the classical approach to Boltzmann machine learning in training the generative model. However, the generation of data suffers from the residual quantum fluctuation in the quantum annealer: the quality of the images generated by the quantum annealer is worse than that obtained with ideal quantum annealing and with classical Monte Carlo sampling.


Introduction
Quantum annealing (QA) is an approach for solving combinatorial optimisation problems using quantum effects [1]. A hardware implementation of QA has been developed by D-Wave Systems and has reached production level, attracting much interest from academia and industry [2-5]. Quantum annealers have been used in numerous applications, such as portfolio optimisation [6], protein folding [7], molecular similarity problems [8], computational biology [9], job-shop scheduling [10], traffic optimisation [11], election forecasting [12], machine learning [13-18], web recommendation [19], and automated guided vehicles in factories [20]. By feeding a Hamiltonian-based quadratic unconstrained binary optimisation (QUBO) matrix into a quantum annealer, several solutions can be quickly obtained. However, the solutions may not always be optimal owing to unavoidable hardware limitations. For instance, connectivity in the hardware graph may be insufficient to represent the optimisation matrix. In practice, the original optimisation problem is embedded in the hardware graph, which is a Chimera graph (previous hardware version) or a Pegasus graph. Finding an embedding is generally difficult, and many variants have been proposed [21]. When a problem is embedded in the hardware graph, chains of interacting redundant spins represent the original variables. Nevertheless, when the strength of the chain interaction is sufficiently high, this limitation can be avoided. Another hardware limitation is the nonuniform distribution over degenerate ground states owing to quantum effects. As an alternative to QA, simulated annealing samples degenerate ground states uniformly [22,23]. Still, this limitation is not critical, as a degenerate ground state can be found in any case. A more crucial limitation is related to the freezing effect and to the effect of the environment acting as a heat bath. A real quantum annealer is not isolated from its environment. Thus, it does not reflect the ideal QA case assumed in theory. Hence, if the system is thermalised, the QA outputs follow a Gibbs-Boltzmann distribution. Consequently, several protocols based on QA do not maintain the system in the ground state as in adiabatic quantum computation. Instead, they employ a nonadiabatic counterpart [24-27] and consider thermal effects [28]. In addition, environmental effects cannot be avoided.
Instead of finding optimal solutions, the quantum annealer may be used to generate outputs following a Gibbs-Boltzmann distribution. In this case, another problem of quantum annealers arises. Experimentally, the energy of the solutions attained by a quantum annealer follows a Gibbs-Boltzmann distribution with finite-strength quantum fluctuations [29]. Although many QA applications have been reported, most of them focus on optimisation, and only a few studies have addressed Boltzmann machine learning using quantum annealers. Previous studies have shown that the quantum annealer achieves higher performance in Boltzmann machine learning as measured by the standard distance between the target and attained distributions, the Kullback-Leibler (KL) divergence.

arXiv:2103.08373v1 [cond-mat.dis-nn] 15 Mar 2021
In this study, we investigated the performance of Boltzmann machine learning using a quantum annealer from a different perspective. We used the quantum annealer during both training of the Boltzmann machine and generation from it. Previously, the performance of the quantum annealer has been investigated with respect to the final value of the KL divergence. However, the trained model has not been considered as a generative model, as its performance is difficult to assess. Thus, we prepared another neural network to measure the quality of the data generated by the Boltzmann machine. Although several measures quantify the similarity between two images, we used another neural network because the generated outputs do not necessarily have a one-to-one correspondence with images in the training dataset. We evaluated the generated outputs by classifying them with the discriminator neural network, assuming that these outputs are similar to the training samples. The discriminator was trained using the same training dataset as that fed to the Boltzmann machine so that it learned the various features of the dataset. The discriminator then indicated the similarity between the generated and training samples.
The remainder of this paper is organised as follows. The next section describes the Boltzmann machine and sampling using a quantum annealer. Then, we explain the training and generation method of the Boltzmann machine used in the experiments and the discriminator used to assess the generated data. Subsequently, we report the evaluation results of the images generated by the Boltzmann machine and compare different combinations of sampling methods in Boltzmann machine learning. In the last section, we summarise the study.

Problem Setting
The Boltzmann machine is a neural network model with fully connected and undirected edges. We set a binary variable that can be either 0 or 1 at each node. In Boltzmann machine learning, the binary variables are assumed to follow the Gibbs-Boltzmann distribution:

$$P(\mathbf{x} \mid \theta) = \frac{1}{Z(\theta)} \exp\left(-\Phi(\mathbf{x} \mid \theta)\right), \tag{1}$$

$$\Phi(\mathbf{x} \mid \theta) = -\sum_{i} b_i x_i - \sum_{i<j} w_{ij} x_i x_j, \tag{2}$$

where $x_i$ is the binary variable for node $i$ and $\Phi$ is the energy function of the Boltzmann machine, indicating that states of the nodes that reduce this energy are more likely to appear. In the energy function, $b_i$ is the bias of node $i$ and $w_{ij}$ is the weight between nodes $i$ and $j$ on each edge. These are summarised as parameters and are denoted as $\theta$. $Z(\theta)$ is a partition function used for normalisation. In Boltzmann machine learning, the output data are assumed to follow the Gibbs-Boltzmann distribution defined above. During learning, the maximum likelihood estimation for probability distribution (1) is performed:

$$\theta^{*} = \arg\max_{\theta} L(\theta), \tag{3}$$

where

$$L(\theta) = \frac{1}{N} \sum_{n=1}^{N} \log P\left(\mathbf{x}^{(n)} \mid \theta\right) \tag{4}$$

and $N$ is the number of samples. We use the log-likelihood function instead of the likelihood itself for simplicity. To find the maximiser $\theta^{*}$ of the log-likelihood function, we take its derivative with respect to $\theta$ to obtain

$$\frac{\partial L}{\partial b_i} = \langle x_i \rangle_{\mathrm{data}} - \langle x_i \rangle_{\mathrm{model}}, \tag{5}$$

$$\frac{\partial L}{\partial w_{ij}} = \langle x_i x_j \rangle_{\mathrm{data}} - \langle x_i x_j \rangle_{\mathrm{model}}, \tag{6}$$

where $\langle \cdots \rangle_{\mathrm{data}}$ denotes the empirical mean over the training data and $\langle \cdots \rangle_{\mathrm{model}}$ denotes the model expectation. The bottleneck of Boltzmann machine learning is estimating the model expectation in the equations above, as it requires a sum over $2^{n}$ configurations in principle, where $n$ is the number of nodes. Therefore, the estimation should be approximated or a sampling method following the Gibbs-Boltzmann distribution should be adopted. For sampling, we prepared several synthetic data samples according to the probability distribution of the model and approximated the expected value using their empirical mean. Sampling methods include Gibbs sampling using the Markov chain Monte Carlo (MCMC) method. Instead of directly performing Gibbs sampling, we used a quantum annealer to efficiently obtain output samples following the Gibbs-Boltzmann distribution. We also compared the performance of Boltzmann machine learning for the two sampling methods.
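The classical baseline described above can be illustrated with a minimal sketch. It assumes the standard energy $\Phi(\mathbf{x}) = -\sum_i b_i x_i - \sum_{i<j} w_{ij} x_i x_j$ with a symmetric, zero-diagonal weight matrix. The function names (`energy`, `gibbs_samples`, `log_likelihood_gradient`) are hypothetical and are not taken from the original implementation.

```python
import numpy as np

def energy(x, b, W):
    """Phi(x) = -sum_i b_i x_i - sum_{i<j} w_ij x_i x_j (W symmetric, zero diagonal)."""
    return -(b @ x) - 0.5 * (x @ W @ x)

def gibbs_samples(b, W, n_samples, burn_in, interval, rng):
    """Draw samples by single-site Gibbs updates on binary variables in {0, 1}."""
    n = len(b)
    x = rng.integers(0, 2, size=n).astype(float)
    samples = []
    total_sweeps = burn_in + n_samples * interval
    for sweep in range(1, total_sweeps + 1):
        for i in range(n):
            # Local field on node i; W[i, i] = 0, so x[i] itself does not contribute.
            field = b[i] + W[i] @ x
            p_on = 1.0 / (1.0 + np.exp(-field))
            x[i] = 1.0 if rng.random() < p_on else 0.0
        # Keep one configuration every `interval` sweeps after the burn-in.
        if sweep > burn_in and (sweep - burn_in) % interval == 0:
            samples.append(x.copy())
    return np.array(samples)

def log_likelihood_gradient(data, samples):
    """Data mean minus model mean, with the model mean estimated from samples."""
    grad_b = data.mean(axis=0) - samples.mean(axis=0)
    grad_W = data.T @ data / len(data) - samples.T @ samples / len(samples)
    np.fill_diagonal(grad_W, 0.0)  # no self-couplings
    return grad_b, grad_W
```

Replacing `gibbs_samples` by configurations returned from an annealer leaves `log_likelihood_gradient` unchanged, which is the substitution studied in this work.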
We analysed the quality of generation by the Boltzmann machine trained using the outputs from a quantum annealer. To control data generation, we separated the nodes of the Boltzmann machine into sectors A and B. One sector of nodes ($i \in A$) represents the data, and the other sector ($i \in B$) selects the kind of data. We denote the binary vectors in sector A as $\mathbf{x}_A$ and those in sector B as $\mathbf{x}_B$. We used the MNIST dataset, which consists of various images of handwritten digits from zero to nine, as the dataset for assessment. Thus, sector B for selecting the digit label requires at most 10 nodes. Sector A, representing the generated images, has the remaining nodes of the Boltzmann machine. Although machine learning studies are generally performed on higher-dimensional data even for proofs of concept, we employed the MNIST dataset because of the limited capacity of the quantum annealer. For the experiments, we used the D-Wave 2000Q quantum annealer, whose size is restricted to 64 binary variables on the fully connected graph by embedding on a Chimera graph. Consequently, the number of nodes in the sector representing the image was severely restricted on this hardware. We therefore restricted the number of digits to five and resized the images to 8 × 6 = 48 pixels from the scaled-down MNIST dataset. In addition, we binarised the original images as shown in Fig. 1 to make them suitable for the quantum annealer. To control the generated images, we fixed one of the nodes in sector B during training. For instance, let us consider the generation of images for the five digits from 5 to 9. We represented each image $\mathbf{x}$ with label $k$ from 5 to 9 as image data $\mathbf{x}_A = \mathbf{x}$ together with a one-hot encoding of the label, $\mathbf{x}_B = (1, 0, 0, 0, 0)$ for $k = 5$, $\mathbf{x}_B = (0, 1, 0, 0, 0)$ for $k = 6$, and so on. The Boltzmann machine was trained using the labelled encoded images and provided the controlled images depending on sector B.
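As an illustration of the data layout, the sketch below packs one resized image and its label into a single 53-bit configuration, with 48 bits for sector A and 5 bits for sector B. The threshold of 0.5 and the helper name `encode` are assumptions made for this example. The binarisation rule actually used is the one illustrated in Fig. 1.

```python
import numpy as np

DIGITS = (5, 6, 7, 8, 9)  # the five labels considered in the example

def encode(image, label, threshold=0.5):
    """Return the concatenation (x_A, x_B) for one 8 x 6 image and its label."""
    image = np.asarray(image, dtype=float)
    if image.shape != (8, 6):
        raise ValueError("expected an 8 x 6 image")
    # Sector A: binarised pixels (the threshold is a hypothetical choice).
    x_a = (image > threshold).astype(np.int8).ravel()
    # Sector B: one-hot label; DIGITS.index raises ValueError for other labels.
    x_b = np.zeros(len(DIGITS), dtype=np.int8)
    x_b[DIGITS.index(label)] = 1
    return np.concatenate([x_a, x_b])
```

With this layout the whole configuration fits within the 64 fully connected variables available on the hardware.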
In the implementation, we applied a strong magnetic field to the node corresponding to the desired label. Then, the generated output from the Boltzmann machine was conditioned on sector B. In the conventional MCMC approach, we can directly control each bit in sector B. In contrast, the quantum annealer cannot hold individual bits fixed. Thus, we applied a strong magnetic field to generate the data. Note that the quantum annealer sometimes fails to correctly generate outputs because the bits in sector B are controlled only indirectly. Below, we detail the experiments for assessing the performance of Boltzmann machine learning using a quantum annealer.
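The indirect control of sector B can be sketched as follows. The trained biases and weights are converted into QUBO coefficients, and a large bias is added to the label node that should be switched on. The field strength of 10 and the function names are hypothetical. The code only builds the coefficient dictionary and does not call any annealer API.

```python
import numpy as np

def clamped_qubo(b, W, target_node, field=10.0):
    """QUBO coefficients for Phi(x) with a strong field favouring x[target_node] = 1."""
    b = np.array(b, dtype=float)  # copy, so the trained biases stay untouched
    W = np.asarray(W, dtype=float)
    b[target_node] += field       # strong field on the desired label node
    n = len(b)
    Q = {}
    for i in range(n):
        Q[(i, i)] = -b[i]                 # linear terms on the diagonal
        for j in range(i + 1, n):
            if W[i, j] != 0.0:
                Q[(i, j)] = -W[i, j]      # one entry per edge (i < j)
    return Q

def qubo_energy(Q, x):
    """Energy sum_{(i, j)} Q_ij x_i x_j of a binary configuration x."""
    return sum(coeff * x[i] * x[j] for (i, j), coeff in Q.items())
```

Because the field only lowers the energy of the desired label instead of fixing it, a sample can still come back with a different bit pattern in sector B, which is the failure mode noted above.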

Experiments
The currently available quantum annealer performs QA mainly to solve optimisation problems. However, it can also generate different binary variable configurations. The distribution of the configurations approximately follows a Gibbs-Boltzmann distribution with the form of the energy function in Eq. (2). Therefore, we replaced the estimation of the model expectation with the computation of the empirical mean of the QA outputs by setting the same energy function during Boltzmann machine learning. The outputs are classical but follow the Gibbs-Boltzmann distribution with a finite transverse magnetic field controlling the strength of the quantum fluctuation [29]. For optimisation, the remaining quantum fluctuation may be an obstacle to obtaining good solutions. However, for machine learning, it can help to find parameters providing high performance, as reported in previous studies. Indeed, the possibility of improving the generalisation performance by finite-strength quantum fluctuations during neural network training has been explored [20,30].
A previous study used the KL divergence to assess the performance of a Boltzmann machine [31]. In that study, a lower KL divergence was attained by the Boltzmann machine trained with sampling from the quantum annealer, suggesting the superiority of the quantum version of the Boltzmann machine. In this study, we focused on the practical performance of a quantum annealer by directly assessing the quality of the data generated by the Boltzmann machine. Besides investigating the data quality, we compared different combinations of sampling methods for Boltzmann machine learning. Specifically, we evaluated Gibbs sampling and direct sampling in the quantum annealer during both training and generation. During training, sampling was used to compute the expectations needed for the derivatives with respect to the parameters. Finite-strength quantum fluctuations affect the precision of the computation of the derivatives. In other words, quantum fluctuations act as regularisation, as investigated in a previous study. During generation, sampling is used again to generate a new sample. Here, the finite-strength quantum fluctuation was assumed not to contribute directly to the quality of the generated data but rather to cause degradation. However, sampling in a quantum annealer provides independent output samples because the annealer quickly repeats the generation of binary configurations following the Gibbs-Boltzmann distribution by leveraging quantum superposition. In contrast, classical sampling tends to retain correlations between samples when the chain is not restarted from different initial conditions and the sampling interval is short. Thus, Boltzmann machine learning using quantum annealers is expected to be superior to that using classical computers during generation.
We evaluated the performance of Boltzmann machine learning using a quantum annealer by preparing a discriminator neural network to measure the quality of the generated data. The discriminator was trained using the MNIST dataset, as for Boltzmann machine learning. The input layer of the discriminator had 8 × 6 + 5 = 53 nodes receiving the output from the Boltzmann machine, representing the resized images and the bits in sector B for selecting the label. The output layer had 5 nodes to express the image label using one-hot encoding. The discriminator architecture is shown in Fig. 2. In addition, we set the cross-entropy as the loss function and used Adam optimisation to train the discriminator [32]. The dataset comprised $D = 896$ images representing digits 5-9. We used all these images for training the discriminator and the Boltzmann machine. Thus, both networks suitably characterised the MNIST dataset. However, the discriminator was not trained using the images generated by the Boltzmann machine. We defined the quality of the generated images as the agreement rate between the label intended at generation and the label obtained from discrimination:

$$R_a = \frac{D_m}{D_t}, \tag{7}$$

where $D_m$ is the number of matching labels between generation and discrimination and $D_t$ is the total number of generated images.
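The agreement rate is simple to compute once the discriminator has labelled the generated images. The sketch below assumes the intended and predicted labels are available as two equal-length arrays. The name `agreement_rate` is hypothetical.

```python
import numpy as np

def agreement_rate(intended, predicted):
    """R_a = D_m / D_t: fraction of generated images whose discriminated label
    matches the label requested through sector B."""
    intended = np.asarray(intended)
    predicted = np.asarray(predicted)
    if intended.shape != predicted.shape:
        raise ValueError("label arrays must have the same shape")
    d_t = intended.size                          # total number of generated images
    d_m = int(np.sum(intended == predicted))     # number of matching labels
    return d_m / d_t
```

For example, three matches out of four generated images give `agreement_rate([5, 6, 7, 8], [5, 6, 7, 9]) == 0.75`.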
We used two sampling methods for training, namely, Gibbs sampling and direct sampling in the quantum annealer (D-Wave 2000Q). Gibbs sampling was performed with burn-in time $T_{\mathrm{in}}$ and sampling interval $\Delta t$ of $(T_{\mathrm{in}}, \Delta t) = (200, 10)$ and $(1000, 50)$. The number of samples was set to 200. On the other hand, direct sampling in the quantum annealer has no burn-in time or sampling interval but uses an annealing time, which we set to $\delta t = 5$ µs per sample. The number of samples was 200, as for Gibbs sampling. During training, while computing the expectation by the empirical mean through sampling, we updated the parameters using minibatch learning and the momentum method with $L_2$ norm regularisation, following the approach in [33]. The parameters used during training of the Boltzmann machine are listed in Table 1. After training the Boltzmann machine, we used it to synthesise images. For image generation, we also used Gibbs sampling and direct sampling in the quantum annealer. We sampled 100 images per digit (i.e., 500 samples for the five digits) by controlling sector B. In addition to the two sampling methods, we performed low-energy sampling using the quantum annealer. Specifically, we first obtained 1000 samples from the quantum annealer and kept only the 100 images with the lowest energy values as the sampled images. We varied the combination of sampling methods for training and generation. For example, we considered training with Gibbs sampling and generation with direct sampling. The sampling combinations are listed in Table 2.
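Two steps of this procedure can be sketched compactly: the low-energy selection and a momentum update with $L_2$ regularisation. The sketch assumes the energy $\Phi(\mathbf{x}) = -\sum_i b_i x_i - \sum_{i<j} w_{ij} x_i x_j$ with a symmetric, zero-diagonal weight matrix. The default hyperparameters are placeholders and not the values of Table 1, and the function names are hypothetical.

```python
import numpy as np

def lowest_energy_subset(samples, b, W, keep=100):
    """Keep the `keep` configurations with the lowest energy Phi."""
    samples = np.asarray(samples, dtype=float)
    # Energy of every sample at once; the factor 0.5 counts each pair i < j once.
    energies = -(samples @ b) - 0.5 * np.einsum("ni,ij,nj->n", samples, W, samples)
    order = np.argsort(energies, kind="stable")[:keep]
    return samples[order], energies[order]

def momentum_step(param, velocity, grad, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One gradient-ascent step on the log-likelihood with momentum and an L2 penalty."""
    velocity = momentum * velocity + lr * (grad - weight_decay * param)
    return param + velocity, velocity
```

In the setting described above, `lowest_energy_subset` would be called with 1000 annealer samples and `keep=100`, and `momentum_step` would be applied to the biases and the weights with gradients estimated from a minibatch.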

Results
The training results of the discriminator are shown in Fig. 4. The recognition rate of the discriminator exceeded 99%. We used the discriminator to assess the performance of the Boltzmann machine in terms of the agreement rate $R_a$ as a function of the number of epochs, obtaining the results shown in Fig. 5. For a given sampling method for generation, training using Gibbs sampling provided an $R_a$ approximately 10% lower than training using direct sampling with QA. When training using Gibbs sampling, $R_a$ tended to decrease with increasing number of epochs owing to overfitting. This phenomenon did not occur for direct sampling with QA. Thus, training using direct sampling with QA may prevent overfitting in training of the Boltzmann machine. This observation is consistent with previous studies that considered the KL divergence [31]. We confirmed the observation in terms of the agreement rate. The agreement rate reflects the quality of the generated images by comparing the generated and original images. On the other hand, the KL divergence measures the similarity between probability distributions. Therefore, the agreement rate is homogeneous between different generative models and measures the performance of data generation.
By considering these two characteristics of the agreement rate, we compared all the experimental results. Comparing the different sampling methods for training, we found that a higher agreement rate is given by direct sampling with QA in all cases, regardless of the generation method. Thus, regularisation is ensured during training of the Boltzmann machine by direct sampling with QA. Figs. 6 and 7 show several images to illustrate the differences between the training methods. The images correctly discriminated are coloured, and those incorrectly classified are shown in black and white. Comparing the different sampling methods during generation, we found that a higher agreement rate is achieved by MCMC regardless of the burn-in time and sampling interval. Thus, the quantum annealer provides lower generation quality than the classical approach. This lower quality is due to errors in sector B, which selects the label, because we cannot always fix the binary variables during QA. Instead, we applied a strong magnetic field to fix the binary variables in sector B. Therefore, several samples could not be correctly generated following sector B. This demonstrates a practical difficulty in generating data while controlling outputs using the quantum annealer.
From the experiments, it is clear that training using the quantum annealer improves the performance of the Boltzmann machine, consistent with the findings of another study [31]. Regarding generation, QA retains a finite-strength transverse magnetic field at the end, influencing the experimental results. To investigate the effect of the residual transverse magnetic field, we applied the quantum Monte Carlo method with a finite-strength transverse magnetic field for data generation. We used the same Boltzmann machines trained by QA and MCMC(1000, 50) as in the previous experiment. We set the number of Trotter slices to 32 and the inverse temperature to ?. The transverse magnetic field was changed from 0.1 to 1.0 in increments of 0.1 for image generation. $R_a$ was calculated as a function of the number of epochs during training, obtaining the results shown in Fig. 8.

... a remarkable discovery in the context of quantum machine learning. Previous studies showed that finite-strength quantum fluctuations during training improved the generalisation performance, indicating the robustness of the trained model and its ability to handle unknown data. On the other hand, we showed that finite-strength quantum fluctuations increase the quality of the generated images. We could not identify any mechanism that explains the higher performance of sampling with finite-strength quantum fluctuations. However, we believe that our findings provide a new direction for studying quantum machine learning. The latest version of the quantum annealer, the D-Wave Advantage, might show different (possibly higher) performance compared with that achieved in our study. As shown in the results of the quantum Monte Carlo method, sampling affected by the residual quantum fluctuation at the end of the QA protocol showed a slightly different performance during generation compared with Gibbs sampling. Following this study, we expect that future developments will emphasise practical aspects of quantum machine learning, such as generated images and their quality.
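For readers unfamiliar with the quantum Monte Carlo generation experiment reported above, the sketch below shows how a transverse field enters such a simulation under the standard Suzuki-Trotter mapping. Neighbouring Trotter slices of the same spin are coupled ferromagnetically with strength $K = \frac{1}{2} \ln \coth(\beta \Gamma / M)$. The inverse temperature `beta = 1.0` is a placeholder assumption. Only the number of slices (32) and the field range (0.1 to 1.0) are taken from the text, and the function name is hypothetical.

```python
import math

def trotter_coupling(gamma, beta, n_slices):
    """Dimensionless coupling K between adjacent Trotter replicas of one spin,
    K = 0.5 * ln coth(beta * gamma / n_slices), for transverse field gamma > 0."""
    a = beta * gamma / n_slices
    return 0.5 * math.log(1.0 / math.tanh(a))

# Transverse fields scanned in the experiment: 0.1, 0.2, ..., 1.0.
fields = [round(0.1 * k, 1) for k in range(1, 11)]
couplings = [trotter_coupling(g, beta=1.0, n_slices=32) for g in fields]
```

A weaker field gives a larger $K$, which locks the replicas together and recovers classical sampling in the limit of vanishing field.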

Figure 3. Summary of experiments performed in this study.

Figure 4. Results of discriminator training. The solid and dotted lines correspond to the results of training and validation, respectively.

Figure 6. Images generated using QA with low energy at epoch 4000.

Table 1. Parameters of Boltzmann machine.

Table 2. Combinations of sampling methods for training and generation. Gibbs sampling is denoted by MCMC, and lowQA denotes QA with the lowest energy states. All the experiments performed in this study are summarised in Fig. 3.