Quantum generalisation of feedforward neural networks

We propose a quantum generalisation of a classical neural network. The classical neurons are first rendered reversible by adding ancillary bits. Then they are generalised to being quantum reversible, i.e., unitary (the classical networks we generalise are called feedforward, and have step-function activation functions). The quantum network can be trained efficiently using gradient descent on a cost function to perform quantum generalisations of classical tasks. We demonstrate numerically that it can: (i) compress quantum states onto a minimal number of qubits, creating a quantum autoencoder, and (ii) discover quantum communication protocols such as teleportation. Our general recipe is theoretical and implementation-independent. The quantum neuron module can naturally be implemented photonically.

We often want computers to tell us something about the input data, e.g. if a given image corresponds to a cat or a dog. It seems the human brain learns this by looking at examples whilst getting feedback from a teacher, rather than being given an algorithm. Such an approach to programming is now revolutionising the ability of machines to learn. The approach uses simplified models of the brain: neural nets. Quantum information processing devices are now emerging as the next generation of information processors. One may hope that the neural net approach will be similarly powerful there. We therefore designed quantum neural nets, processing quantum superpositions. The nets work well in two example tasks: compressing data stored in superpositions, and rediscovering a protocol known as quantum teleportation.


INTRODUCTION
Artificial neural networks mimic biological neural networks to perform information processing tasks. They are highly versatile, applying to vehicle control, trajectory prediction, game-playing, decision making, pattern recognition (such as facial recognition, spam filters), financial time series prediction, automated trading systems, mimicking unpredictable processes, and data mining [1,2]. The networks can be trained to perform tasks without the programmer necessarily detailing how to do it. Novel techniques for training networks of many layers (deep networks) are credited with giving impetus to the neural networks approach [3].
The field of quantum machine learning is rapidly developing, though the focus has arguably not been on the connection to neural networks. Quantum machine learning, see e.g. [4-17], employs quantum information processing (QIP) [18]. QIP uses quantum superpositions of states with the aim of faster processing of classical data as well as tractable simulation of quantum systems. In a superposition each bit string is associated with two numbers: the probability of the string and the phase [19]. The phase impacts the future probabilities via a time evolution law. There are certain promising results that concern quantum versions of recurrent neural networks, wherein neurons talk to each other in all directions rather than feeding signals forward to the next layer, e.g. with the purpose of implementing quantum simulated annealing [8,14,20,21]. In [22] several papers proposing quantum neural network designs are discussed and critically reviewed. A key challenge to overcome is the clash between the nonlinear, dissipative dynamics of neural network computing and the linear, reversible dynamics of quantum computing [22]. A key reason for wanting well-functioning quantum neural networks is that these could do for quantum inputs what classical networks can do for classical inputs, e.g. compressing data encoded in quantum superpositions to a minimal number of qubits.
We here accordingly focus on creating quantum generalisations of classical neural networks, which can take quantum inputs and process them coherently. Our networks contribute to a research direction known as quantum learning [23-27], which concerns learning and optimising with truly quantum objects. The networks provide a route to harnessing the powerful neural network paradigm for this purpose. Moreover they are strict generalisations of the classical networks, providing a clear framework for comparing the power of quantum and classical neural networks.
The networks generalise classical neural networks to the quantum case in a similar sense to how quantum computing generalises classical computing. We start with a common classical neural network family: feedforward perceptron networks. We make the individual neurons reversible and then naturally generalise them to being quantum reversible (unitary). This resolves the classical-quantum clash mentioned above from [22]. An efficient training method is identified: global gradient descent for a quantum generalisation of the cost function, a function evaluating how close the outputs are to the desired outputs. To illustrate the ability of the quantum network we apply it to (i) compressing information encoded in superpositions onto fewer qubits (an autoencoder) and (ii) re-discovering the quantum teleportation protocol; this illustrates that the network can work out QIP protocols given only the task. To make the connection to physics clear we describe how to simulate and train the network with quantum photonics.
We proceed as follows. Firstly, we describe the recipe for generalising the classical neural network. Then it is demonstrated how the network can be applied to the tasks mentioned above, followed by a design of a quantum photonic realisation of a neural module. We discuss the results, followed finally by a summary and outlook. We begin by describing classical neurons before detailing how to generalise them to quantum neurons.

The classical neuron
A classical neuron is depicted in FIG. 1. In this case, it has two inputs (though there could be more). There is one output, which depends on the inputs (bits in our case) and a set of weights (real numbers): if the weighted sum of inputs is above a set threshold, the output is 1, else it is 0. We will use the following standard general notation. The $j$th neuron in the $l$th layer of a network takes a number of inputs, $a^{(l-1)}_k$, where $k$ labels the input. The inputs are each multiplied by a corresponding weight, $w^{(l)}_{jk}$, and an output, $a^{(l)}_j$, is fired as a function of the weighted input $z^{(l)}_j = \sum_{k=1}^{n} w^{(l)}_{jk} a^{(l-1)}_k$, where $n$ is the number of inputs to the neuron (FIG. 1). The function relating the output to the weighted input is called the activation function, which has most commonly been a Heaviside step function or a sigmoid [1]. For example, the neuron in FIG. 1 with a Heaviside activation function gives an output of the form:

$$a^{(l)}_j = \begin{cases} 1 & \text{if } z^{(l)}_j \text{ is above the threshold,} \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$

This paper aims to generalise the classical neuron to a quantum mechanical one. In the absence of measurement, quantum mechanical processes are required to be reversible, and more specifically, unitary, in a closed quantum system [18,28]. This suggests the following procedure for generalising the neuron first to a reversible gate and finally to a unitary gate:

Irreversible → reversible: For an $n$-input classical neuron having $(\mathrm{in}_1, \mathrm{in}_2, \dots, \mathrm{in}_n) \to \mathrm{out}$, create a classical reversible gate taking $(\mathrm{in}_1, \mathrm{in}_2, \dots, \mathrm{in}_n, 0) \to (\mathrm{in}_1, \mathrm{in}_2, \dots, \mathrm{in}_n, \mathrm{out})$. Such an operation can always be represented by a permutation matrix [29]. This is a clean way of rendering the classical neuron reversible. The extra 'dummy' input bit is used to make it reversible [28]; in particular, some of the '2 bits in, 1 bit out' functions the neuron can implement require 3 bits to be made reversible in this manner.

Reversible → unitary: Generalise the classical reversible gate to a quantum unitary taking input $|\psi_{\mathrm{in}}\rangle_{1,2,\dots,n}|0\rangle \to |\psi_{\mathrm{out}}\rangle_{1,2,\dots,n,\mathrm{out}}$, such that the final output qubit is the output of interest. This is the natural way of making a permutation matrix unitary.

If the input is a mixture of states in the computational basis and the unitary is a permutation matrix [30], the output qubit will be a mixture of $|0\rangle$ and $|1\rangle$: this we call the classical special case. This way the quantum neuron can simulate any classical neuron as defined above. The generalisation recipe summarised in FIG. 2 also illustrates how any irreversible classical computation can be recovered as a special case from reversible classical computation (by ignoring the dummy and copied bits), which in turn can be recovered as a special case from quantum computation.
FIG. 2. Diagram summarising our method of generalising the classical irreversible neuron with Heaviside activation function, first to a reversible neuron represented by a permutation matrix (P), and finally to a quantum reversible computation, represented by a unitary operator (U).
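
The irreversible → reversible step can be illustrated with a short sketch. It builds the permutation matrix of a two-input Heaviside neuron acting on (in_1, in_2, dummy). Two choices here are assumptions rather than part of the recipe above: the action on a dummy bit equal to 1 is fixed by XOR-ing the output into it (the text only specifies the dummy = 0 case), and the weights and threshold are hypothetical values chosen so that the neuron acts as an AND gate.

```python
import itertools
import numpy as np

def heaviside_neuron(inputs, weights, threshold):
    """Classical neuron: 1 if the weighted sum exceeds the threshold, else 0."""
    return int(np.dot(weights, inputs) > threshold)

def reversible_neuron_matrix(weights, threshold):
    """Permutation matrix for (in_1..in_n, d) -> (in_1..in_n, d XOR out).

    The XOR extension to d = 1 is an assumption; for d = 0 it reduces to
    (in_1..in_n, 0) -> (in_1..in_n, out).
    """
    n = len(weights)
    dim = 2 ** (n + 1)
    P = np.zeros((dim, dim), dtype=int)
    for bits in itertools.product([0, 1], repeat=n + 1):
        *ins, dummy = bits
        out = heaviside_neuron(ins, weights, threshold)
        new_bits = (*ins, dummy ^ out)
        col = int("".join(map(str, bits)), 2)      # index of the input bit string
        row = int("".join(map(str, new_bits)), 2)  # index of the output bit string
        P[row, col] = 1
    return P

# Hypothetical weights and threshold that make the neuron an AND gate.
P_and = reversible_neuron_matrix(weights=[1.0, 1.0], threshold=1.5)

# A permutation matrix has exactly one 1 per row and column, and is unitary.
is_permutation = bool((P_and.sum(axis=0) == 1).all() and (P_and.sum(axis=1) == 1).all())
is_unitary = bool(np.array_equal(P_and.T @ P_and, np.eye(8, dtype=int)))
```

With these values the matrix only swaps the bit strings 110 and 111, i.e. it is the Toffoli gate, which is one concrete instance of a '2 bits in, 1 bit out' function that needs 3 bits to be made reversible.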

The network
In order to form a neural network, classical neurons are connected together in various configurations. Here, we consider feedforward classical networks, where neurons are arranged in layers and each neuron in the $l$th layer is connected to every neuron in the $(l-1)$th and $(l+1)$th layers, but with no connections within the same layer. For an example of such a classical network, see FIG. 3. Note that in this case the same output of a single neuron is sent to all the neurons in the next layer [1,2].
To make the copying reversible, in line with our approach of firstly making the classical neural network reversible, we propose the recipe: Irreversible → reversible: For a classical irreversible copying operation of a bit b → (b, b), create a classical reversible gate, which can be represented by a permutation matrix [28], taking (b, 0) → (b, b).
In the quantum case the no-cloning theorem shows one cannot do this in the most naive way [18]. For a 2-qubit case, one can use a CNOT for example to copy in the classical computational basis [28]:

$$\mathrm{CNOT}\,|b\rangle|0\rangle = |b\rangle|b\rangle, \qquad b \in \{0, 1\}.$$

Thus one may consider replacing the copying with a CNOT. However, when investigating applications of the network we realised that there are scenarios (the autoencoder in particular) where entanglement between different neurons is needed to perform the task. We have therefore chosen the following definition:

Reversible → unitary: The classical CNOT is generalised to a general 2-qubit 'fan-out' unitary $U_F$, with one dummy input set to $|0\rangle$, such that $|b\rangle|0\rangle \to U_F|b\rangle|0\rangle$. As this unitary does not in general copy quantum states that are non-orthogonal, we call it a 'fan-out' operation rather than a copying operation, as it distributes information about the input state into several output qubits. Note that a quantum network would be trained to choose the unitary in question.
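
The difference between copying and fanning out can be checked numerically. The following sketch, which uses the CNOT as one possible fan-out gate (the trained network would pick its own), shows that computational basis states are copied, whereas the superposition (|0> + |1>)/sqrt(2) is mapped to an entangled Bell state rather than to two copies of itself.

```python
import numpy as np

# CNOT with the first qubit as control and the second as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)

def fan_out(state, gate=CNOT):
    """Apply a 2-qubit fan-out gate to |state>|0>."""
    return gate @ np.kron(state, ket0)

copied_0 = fan_out(ket0)      # equals |0>|0>
copied_1 = fan_out(ket1)      # equals |1>|1>
fanned_plus = fan_out(plus)   # equals the Bell state, not |+>|+>

bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
# Overlap of the fan-out result with a perfect clone |+>|+>.
clone_overlap = abs(np.vdot(np.kron(plus, plus), fanned_plus)) ** 2
```

The overlap with a perfect clone is 1/2 here, consistent with the no-cloning theorem; the output instead carries entanglement between the two qubits.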

Efficient training with gradient descent
A classical neural network is trained to perform particular tasks. This is done by randomly initialising the weights and then propagating inputs through the network many times, altering the weights after each propagation in such a way as to make the network output closer to the desired output. A cost function, $C$, relating the network output to the desired output is defined by

$$C = \frac{1}{2}\left\| \vec{y}^{(L)} - \vec{a}^{(L)} \right\|^2, \qquad (2)$$

where $\vec{y}^{(L)}$ is a vector of the desired outputs from each of the final layer $l = L$ neurons, $\vec{a}^{(L)}$ is the vector of actual outputs, which depends on the network weights, and $\|\cdot\|$ is the $l_2$-norm. The cost function is minimised to zero when the weights propagate the input in such a way that the network output vector equals the desired output vector. Since the weights are continuous variables, the numerical partial derivatives of the cost function w.r.t. each weight can be found by approximating $\frac{\partial C}{\partial w} \approx \frac{C(w+\epsilon) - C(w)}{\epsilon}$ for a small $\epsilon$. After each propagation, these partial derivatives are computed and the weights are altered in the direction of greatest decrease of the cost function. Specifically, each weight $w^{(l)}_{jk}$ is updated as

$$w^{(l)}_{jk} \to w^{(l)}_{jk} - \eta \frac{\partial C}{\partial w^{(l)}_{jk}}, \qquad (3)$$

where $\eta$ is an adjustable non-negative parameter. This training procedure is known as gradient descent [1]. Note that gradient descent normally also requires a continuous and differentiable activation function, to allow small changes in the weights to relate to small changes in the cost. For this reason, the Heaviside activation function has traditionally been replaced by a sigmoid function [1,2]. Nevertheless, gradient descent has also been achieved using Heaviside activation functions, by taking the weights as Gaussian variables and taking partial derivatives w.r.t. the means and standard deviations of the appropriate Gaussian distributions [31,32].
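
As a minimal sketch of this training loop, the code below trains a single sigmoid neuron by gradient descent with forward-difference derivatives. Everything specific in it is an illustrative assumption: the quadratic form of the cost, the toy OR-gate data, the constant third input acting as a bias, and the values of `eta` and `eps`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(weights, inputs, targets):
    """Quadratic cost, 1/2 * ||y - a||^2, summed over the training set (assumed form)."""
    outputs = sigmoid(inputs @ weights)
    return 0.5 * np.sum((targets - outputs) ** 2)

def numerical_gradient(weights, inputs, targets, eps=1e-6):
    """Forward-difference estimate (C(w + eps) - C(w)) / eps for every weight."""
    base = cost(weights, inputs, targets)
    grad = np.zeros_like(weights)
    for k in range(weights.size):
        shifted = weights.copy()
        shifted[k] += eps
        grad[k] = (cost(shifted, inputs, targets) - base) / eps
    return grad

# Toy data: an OR gate; the constant third input plays the role of a bias.
inputs = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
targets = np.array([0, 1, 1, 1], dtype=float)

rng = np.random.default_rng(0)
weights = rng.uniform(-1, 1, size=3)   # random initialisation
eta = 0.5                              # learning rate (hypothetical value)

initial_cost = cost(weights, inputs, targets)
for _ in range(2000):
    # Move each weight against its partial derivative of the cost.
    weights = weights - eta * numerical_gradient(weights, inputs, targets)
final_cost = cost(weights, inputs, targets)
```

The sigmoid is used here because, as noted above, plain gradient descent needs small weight changes to produce small cost changes, which the Heaviside function does not provide.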
In the reversible generalisation, where each neuron is replaced by a permutation matrix, we find that the output is no longer a function of the inputs and continuous weights, but rather of the inputs and a discrete set of permutation matrices. However, in the generalisation to unitaries, for a gate with $n$ inputs and outputs, there exists an infinite number of unitaries, in contrast with the discrete set of permutation matrices. This means that the unitaries can be parametrised by continuous variables, which once again allows the application of gradient descent.
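Given that any unitary matrix $U$ can be expressed as $U = e^{iH}$, where $H$ is a Hermitian matrix [18], and that such matrices can be written as linear combinations of tensor products of the Pauli matrices and the identity, it follows that a general $N$-qubit unitary can be expressed as

$$U_N = \exp\left[ i \sum_{j_1,\dots,j_N=0}^{3} \alpha_{j_1,\dots,j_N}\, \sigma_{j_1} \otimes \cdots \otimes \sigma_{j_N} \right], \qquad (4)$$

where $\sigma_i$ are the Pauli matrices for $i \in \{1, 2, 3\}$ and $\sigma_0$ is the 2×2 identity matrix. This parametrisation allows the use of the training rule of Eq. 3, but replacing the weight $w^{(l)}_{jk}$ with a general parameter $\alpha_{j_1,\dots,j_N}$ of the unitary $U_N$:

$$\alpha_{j_1,\dots,j_N} \to \alpha_{j_1,\dots,j_N} - \eta \frac{\partial C}{\partial \alpha_{j_1,\dots,j_N}}. \qquad (5)$$

A simpler and less general form of $U_N$ (Eq. 6) has been sufficient for the tasks discussed in this paper; in it, $V$ is a general 2-qubit unitary of the form of Eq. 4. Each $T_j$ is similarly a general 1-qubit unitary and one can see, using the methods of [33] on Eq. 4, that this can be expressed as a linear combination of the Pauli matrices, $\sigma_j$:

$$T_j = e^{i\alpha_0}\left( \cos\Omega\, \sigma_0 + i\,\frac{\sin\Omega}{\Omega} \sum_{k=1}^{3} \alpha_k \sigma_k \right),$$

where $\Omega = \sqrt{\alpha_1^2 + \alpha_2^2 + \alpha_3^2}$ [33]. To extend this to higher dimensional unitaries, see e.g. [34].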
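
The parametrisation of a unitary by real coefficients of Pauli tensor products can be sketched numerically as follows. The helper names are hypothetical, and the exponential is evaluated through an eigendecomposition of the Hermitian generator, which is one of several possible numerical routes. The last lines compare the one-qubit case with the closed form, assuming Omega is the Euclidean norm of (alpha_1, alpha_2, alpha_3).

```python
import itertools
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),                      # sigma_0 (identity)
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_1
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_2
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_3
]

def hermitian_from_alphas(alphas):
    """H = sum over (j1..jN) of alpha[j1,...,jN] * sigma_j1 x ... x sigma_jN."""
    n_qubits = alphas.ndim
    dim = 2 ** n_qubits
    H = np.zeros((dim, dim), dtype=complex)
    for idx in itertools.product(range(4), repeat=n_qubits):
        term = np.array([[1.0 + 0j]])
        for j in idx:
            term = np.kron(term, PAULIS[j])
        H += alphas[idx] * term
    return H

def unitary_from_alphas(alphas):
    """U = exp(iH), computed from the eigendecomposition of the Hermitian H."""
    evals, evecs = np.linalg.eigh(hermitian_from_alphas(alphas))
    return (evecs * np.exp(1j * evals)) @ evecs.conj().T

rng = np.random.default_rng(1)
alphas_2q = rng.uniform(-1, 1, size=(4, 4))   # 16 real parameters for N = 2
U2 = unitary_from_alphas(alphas_2q)

# One-qubit closed form, assuming Omega = sqrt(a1^2 + a2^2 + a3^2).
a = rng.uniform(-1, 1, size=4)
omega = np.sqrt(np.sum(a[1:] ** 2))
T_closed = np.exp(1j * a[0]) * (
    np.cos(omega) * PAULIS[0]
    + 1j * np.sin(omega) / omega * sum(a[k] * PAULIS[k] for k in (1, 2, 3))
)
T_exp = unitary_from_alphas(a)
```

Because every real array `alphas` yields a valid unitary, the parameters can be shifted freely during gradient descent without any constraint handling.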
The cost function we use for the quantum neural networks is, with experimental feasibility in mind, determined by the expectation values of local Pauli matrices ($\sigma_1, \sigma_2, \sigma_3$) on individual output qubits, $j$. It has the form

$$C = \sum_{i,j} f_{ij} \left( \langle \sigma_i^{(j)} \rangle_{\text{actual}} - \langle \sigma_i^{(j)} \rangle_{\text{desired}} \right)^2,$$

where $f_{ij}$ is a real non-negative number (in the examples to follow $f_{ij} \in \{0, 1\}$). We note that in the classical mode of operation, where the total density matrix state is diagonal in the computational basis, only $\sigma_3$ will have non-zero expectation, and the cost function becomes the same as in the classical case (Eq. 2) up to a simple transformation.
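
A cost built from local Pauli expectation values can be sketched as below. The specific functional form, a weighted sum of squared differences between actual and desired expectation values, is an assumption made for illustration, as are the function names; the states are restricted to pure state vectors for brevity.

```python
import numpy as np

SIGMAS = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_1 (X)
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_2 (Y)
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_3 (Z)
]

def local_pauli_expectations(state, n_qubits):
    """Return a (3, n_qubits) array of <sigma_i> on each qubit j of a pure state."""
    psi = np.asarray(state, dtype=complex).reshape([2] * n_qubits)
    result = np.zeros((3, n_qubits))
    for j in range(n_qubits):
        # Bring qubit j to the front, flatten the rest, and trace the rest out.
        m = np.moveaxis(psi, j, 0).reshape(2, -1)
        rho_j = m @ m.conj().T
        for i, sigma in enumerate(SIGMAS):
            result[i, j] = np.real(np.trace(rho_j @ sigma))
    return result

def pauli_cost(actual, desired, f):
    """Assumed form: sum_ij f_ij * (<sigma_i^(j)>_actual - <sigma_i^(j)>_desired)^2."""
    n_qubits = f.shape[1]
    diff = (local_pauli_expectations(actual, n_qubits)
            - local_pauli_expectations(desired, n_qubits))
    return float(np.sum(f * diff ** 2))

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
actual = np.kron(ket0, plus)     # |0>|+>
desired = np.kron(ket0, ket0)    # |0>|0>

cost_all = pauli_cost(actual, desired, np.ones((3, 2)))   # every f_ij = 1
f_first = np.zeros((3, 2))
f_first[:, 0] = 1                                         # only qubit 0 counts
cost_first = pauli_cost(actual, desired, f_first)
```

Setting entries of `f` to zero simply drops the corresponding qubit or Pauli component from the cost, which is how a task can be made to care about only some of the output qubits.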
It is important to note that the number of weights grows polynomially in the number of neurons. Each weight shift is determined by evaluating the cost function twice to get the RHS of Eq. 5. Thus the number of evaluations of the cost function for a given iteration of the gradient descent grows polynomially in the number of neurons. The training procedure is efficient in this sense. We do not here attempt to provide a proof that the convergence to zero cost function, where possible, will always take a number of iterations that grows polynomially in the number of neurons. Note also that the statements about the efficiency of the training procedure refer to the physical implementation with quantum technology: the simulation of quantum systems with a classical computer is, with the best known methods, in general inefficient.
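
The counting argument can be made concrete with a few lines of arithmetic. The sketch assumes that every neuron is a general unitary on a fixed number of qubits, parametrised by one real coefficient per Pauli tensor product (so 4^k parameters for k qubits), and that each parameter costs two cost-function evaluations per iteration, as stated above. The function name is hypothetical.

```python
def cost_evaluations_per_iteration(n_neurons, qubits_per_neuron):
    """Return (number of parameters, cost evaluations per gradient-descent step)."""
    params_per_neuron = 4 ** qubits_per_neuron   # one alpha per Pauli tensor product
    n_params = n_neurons * params_per_neuron
    return n_params, 2 * n_params                # two evaluations per parameter

# Example: three neurons, each a general 2-qubit unitary.
n_params, n_evaluations = cost_evaluations_per_iteration(3, 2)
```

For a fixed neuron size the count is linear in the number of neurons; the exponential factor 4^k only enters through the size of an individual neuron.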

Example: Autoencoder for data compression
We now demonstrate applications of our quantum generalisation of neural networks described in the previous section. We begin with autoencoders. These compress an input signal from a given set of possible inputs onto a smaller number of bits, and are 'work-horses' of classical machine learning [2].

Classical autoencoder
Autoencoders are commonly achieved by a feedforward neural network with a bottleneck in the form of a layer with fewer neurons than the input layer. The network is trained to recreate the signal at a later layer, which necessitates reversibly compressing it (as well as possible) to a bit size equal to the number of neurons in the bottleneck layer [2]. The bottleneck layer size can be varied as part of the training to find the smallest compression size possible, which depends on the data set in question. After the training is complete, the post-bottleneck part of the network can be discarded and the compressed output taken directly from after the bottleneck.
In FIG. 3 a basic autoencoder designed to compress two bits into a single bit is shown. (Here the number of input bits is $j_{\max} = 2$.) The basic training procedure consists of creating a cost function that compares each output with the corresponding input, as in Eq. 2 with the inputs taken as the desired outputs:

$$C = \frac{1}{2}\sum_{j=1}^{j_{\max}} \left( \mathrm{in}_j - \mathrm{out}_j \right)^2,$$

with which the network is trained using the learning rule of Eq. 3. If the outputs are identical to the inputs (to within numerical precision), the network is fully trained. The final layer is then removed, revealing the second last layer, which should encode the compressed data. The number of neurons in a given hidden layer of a classical network will not exceed $j_{\max}$. Once the network is trained, the removal of the post-bottleneck layer(s) will yield a second last layer of fewer neurons, achieving dimensional reduction [2].

Quantum autoencoder
We now generalise the classical autoencoder as shown in FIG. 3 to the quantum case. We generalise the neurons labelled 1, 2 and 3 in FIG. 3 into unitary matrices $U_1$, $U_2$ and $U_3$, respectively, with the addition of a 'fan-out' gate, $U_F$, as motivated in the previous sections. The result is shown in FIG. 4 as a quantum circuit model. (We follow the classical convention that this neural network is drawn with the input neurons as well, but they are identity operators which let the inputs through regardless, and can be ignored in the simulation of the network.) The input state of interest $|\mathrm{in}\rangle_{12}$ is on 2 qubits, each fed into a different neuron, generalising the classical autoencoder in FIG. 3. From each of these neurons, one output qubit each is led into the bottleneck neuron $U_1$, followed by a fan-out of its output. We add as an extra desideratum that the compressed bit, the output of $U_1$, is diagonal. A natural and simple cost function is then of the local Pauli form given above, with the desired expectation values set by the input state. Training is then conducted via global gradient descent of the cost w.r.t. the $\alpha_{j_1,\dots,j_N}$ parameters, as defined in Eq. 5. During the training the network was fed states from the given input set, picked independently and identically distributed (i.i.d.) for each step. Standard speed-up techniques for learning were used, e.g. a momentum term [1,2].
In training with a variety of pairs of orthogonal input states, including superposition states, the cost function of the quantum autoencoder converged towards zero through global gradient descent in every case, starting with uniformly randomised weights, $\alpha_{j_1,\dots,j_N} \in [-1, 1]$. For 2 non-orthogonal inputs and a 1-qubit bottleneck the cost function will not converge to zero, as is to be expected, but the training rather results in an approximately compressing unitary. FIG. 4 shows the network learning to compress in the case of two possible inputs. One can force the compressed output to be diagonal in a particular basis by adding an extra term to the cost function (e.g. desiring the expectation value of Pauli X and Y to be zero in the case of a single qubit will push the network to give an output diagonal in the Z-basis).
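
To see why two orthogonal two-qubit inputs can be compressed onto one qubit at all, the sketch below constructs a compressing unitary directly instead of training it. This is an illustration of the target of the training, not the network or procedure described above. The construction (completing the two inputs to an orthonormal basis with a QR decomposition) and the choice of two Bell states as inputs are assumptions made for the example.

```python
import numpy as np

def compressing_unitary(state_a, state_b, seed=0):
    """Unitary sending two orthonormal 2-qubit states to |0>|0> and |0>|1> (up to phase)."""
    rng = np.random.default_rng(seed)
    # Random filler columns are orthonormalised against the two inputs by QR.
    filler = rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2))
    columns = np.column_stack([state_a, state_b, filler])
    basis, _ = np.linalg.qr(columns)
    # The conjugate transpose maps the k-th basis vector to the k-th basis state.
    return basis.conj().T

# Two orthogonal, entangled inputs (hypothetical choice of input set).
bell_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
bell_minus = np.array([1, 0, 0, -1], dtype=complex) / np.sqrt(2)

U = compressing_unitary(bell_plus, bell_minus)
out_plus = U @ bell_plus     # all weight on |0>|0>
out_minus = U @ bell_minus   # all weight on |0>|1>
```

After this unitary the first qubit is always |0> and can be discarded, while the second qubit holds |0> or |1>, i.e. a compressed output that is diagonal in the computational basis.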

Example: Neural network discovers teleportation protocol
With quantum neural networks already shown to be able to perform generalisations of classical tasks, we now consider the possibility of quantum networks discovering solutions to existing and potentially undiscovered quantum protocols. We propose a quantum neural network structure that can, on its own, work out the standard protocol for quantum teleportation [18].
The design and training of this network is analogous to the autoencoder and the quantum circuit diagram is shown in FIG. 5. The cost function used was of the local Pauli form given above, comparing the output qubit 6 with the input state $|\psi\rangle$:

$$C = \sum_{i=1}^{3} \left( \langle \sigma_i^{(6)} \rangle_{\text{actual}} - \langle \sigma_i \rangle_{|\psi\rangle} \right)^2.$$

A fully trained network can teleport the state $|\psi\rangle$ (from Alice) to the output port of qubit 6 (to Bob). Once trained properly, $\rho_{\mathrm{out}1}$ will no longer be $|\psi\rangle\langle\psi|$, as the teleportation has 'messed up' Alice's state [35].
In order to train the teleportation for any arbitrary state $|\psi\rangle$ (and to avoid the network simply learning to copy $|\psi\rangle$ from Alice to Bob), the training inputs are randomly picked from the axis intersection states on the surface of the Bloch sphere [18]. FIG. 6 shows the convergence of the cost function during training, simulated on a classical computer. As can be seen, the training was found to be successful, i.e. the cost function converged towards zero. This held for all tests with randomly initialised weights.
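
For reference, the target behaviour can be checked against the textbook three-qubit teleportation circuit, evaluated on the six axis intersection states of the Bloch sphere. This is not the trained network of FIG. 5: the circuit is written by hand, the measurements and classical corrections are replaced by controlled gates (the deferred-measurement form), and the dephasing of the classical channel is omitted since dephasing a control qubit in the Z-basis does not change the reduced state of the target.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
P0 = np.array([[1, 0], [0, 0]], dtype=complex)   # projector onto |0>
P1 = np.array([[0, 0], [0, 1]], dtype=complex)   # projector onto |1>

def kron(*ops):
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Qubit order: 0 = Alice's input, 1 = Alice's half of the pair, 2 = Bob.
CNOT_01 = kron(P0, I2, I2) + kron(P1, X, I2)
H_0 = kron(H, I2, I2)
CX_12 = kron(I2, P0, I2) + kron(I2, P1, X)   # X correction controlled by qubit 1
CZ_02 = kron(P0, I2, I2) + kron(P1, I2, Z)   # Z correction controlled by qubit 0
CIRCUIT = CZ_02 @ CX_12 @ H_0 @ CNOT_01

def teleport_fidelity(psi):
    """Fidelity <psi|rho_Bob|psi> after running the circuit on |psi>|Bell>."""
    bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
    final = CIRCUIT @ np.kron(psi, bell)
    m = final.reshape(4, 2)        # rows: Alice's two qubits, columns: Bob's qubit
    rho_bob = m.T @ m.conj()       # partial trace over Alice's qubits
    return float(np.real(np.vdot(psi, rho_bob @ psi)))

s = 1 / np.sqrt(2)
axis_states = [np.array(v, dtype=complex) for v in
               ([1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s])]
fidelities = [teleport_fidelity(psi) for psi in axis_states]
```

A trained network that reaches zero cost on these six inputs reproduces the same input-output behaviour on Bob's qubit, even if its internal unitaries differ from the textbook gates.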

Quantum vs. classical
Can these neural networks show some form of quantum supremacy? The comparison of classical and quantum neural networks is well-defined within our set-up, as the classical networks correspond to a particular parameter regime for the quantum networks. A key type of quantum supremacy is that the quantum network can take and process quantum inputs: it can for example process $|+\rangle$ and $|-\rangle$ differently. Thus, there are numerous quantum tasks it can do that the classical network cannot, including the two examples above. We anticipate that they will moreover, in some cases, be able to process classical inputs faster, by turning them into superpositions; investigating this is a natural follow-on from this work. We also mention that we term our above design a quantum neural network with classical learning parameters, as the parameters in the unitaries are classical. It seems plausible that allowing these parameters to be in superpositions, whilst experimentally more challenging, could give further advantages.
Whilst adding the ancillary qubits ensures that the network is a strict generalisation of the classical network, it can of course be experimentally and numerically simpler to omit these. Then one would sacrifice performance in the classical mode of operation, and the network may not be as good as a classical network with the same number of neurons for all tasks.

Scaling to bigger networks
The same scheme can be used to make quantum generalisations of networks whose generalised neurons have more inputs/outputs and connections. FIG. 8 illustrates an $M$-qubit input quantum neuron with a subsequent $N$-qubit fan-out gate. If one wishes the number of free parameters of a neuron to grow no more than polynomially in the number of inputs, one needs to restrict the unitary. It is natural to demand it to be a polynomial length circuit of some elementary universal gates. In particular, if the input states are known to be generated by a polynomial length circuit of a given set of gates, it is natural to let the unitary be restricted to that set of gates.
The evaluation of the cost function can be kept to a sensible scaling if we restrict it to be a function of local observables on each qubit, in particular a function of the local Pauli expectation values, as was used in this paper. In that case a vector of $3n$ expectation values suffices for $n$ qubits.

QUANTUM PHOTONICS NEURON MODULE
To investigate the physical viability of these quantum neural networks we consider quantum photonics. This is an attractive platform for quantum information processing: it offers room temperature operation and the possibility of robust miniaturisation through photonic integrated circuits, and in general it harnesses the highly developed optical fibre-related technology for QIP purposes [36]. Moreover optical implementations have been viewed as optimal for neural networks, in the classical case, due to the low design cost of adding multiple connections (as light passes through light without interacting) [37]. A final motivation for choosing this platform is that the tuning can be naturally implemented, as detailed below.
We design a neuron as a module that can then be connected to other neurons. This makes it concrete how experimentally complex the network would be to build and operate, including how it could be trained.
The design employs the Cerf-Adami-Kwiat (C-A-K) protocol [38], where a single photon with polarisation and multiple possible spatial modes encodes the quantum state; the scheme falls into the category of hyperentangling schemes, which entangle different degrees of freedom. One qubit is the polarisation; digital encodings of the spatial mode labels give rise to the others. With four spatial modes this implements 3 qubits, with basis $|0/1\rangle|H/V\rangle|0/1\rangle$, where H/V are two different polarisation states, and the other bits label the four spatial modes. The first bit says whether the photon is in the top or bottom pair of modes and the last bit whether it is in the upper or lower mode of that pair. This scheme and related ones such as [39,40] are experimentally viable, theoretically clean and can implement any unitary on a single photon spread out over spatial modes. In such a single photon scenario they do not scale well, however: the number of spatial modes grows exponentially in the number of qubits. Thus for larger networks our design below would need to be modified to something less simple, e.g. accepting probabilistic gates in the spirit of the KLM scheme [41], or using measurement-based cluster state quantum computation approaches [36].
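
The bookkeeping of this encoding can be written out in a few lines. The sketch assumes a particular labelling that the text does not fix: spatial modes are numbered 0 to 3 from top to bottom, and bit value 0 stands for 'top pair' and 'upper mode'. The function names are hypothetical.

```python
def cak_basis_label(spatial_mode, polarisation):
    """Map (spatial mode 0..3, 'H' or 'V') to the label (pair bit, polarisation, position bit)."""
    if spatial_mode not in range(4) or polarisation not in ("H", "V"):
        raise ValueError("expected spatial_mode in 0..3 and polarisation 'H' or 'V'")
    pair_bit = spatial_mode // 2      # 0: top pair of modes, 1: bottom pair
    position_bit = spatial_mode % 2   # 0: upper mode of the pair, 1: lower mode
    return (pair_bit, polarisation, position_bit)

def spatial_modes_needed(n_qubits):
    """One qubit is carried by polarisation; the other n - 1 need 2^(n-1) spatial modes."""
    return 2 ** (n_qubits - 1)

label = cak_basis_label(2, "V")          # third mode from the top, vertical polarisation
modes_for_three = spatial_modes_needed(3)
modes_for_ten = spatial_modes_needed(10)
```

The second function makes the scaling problem explicit: each additional qubit doubles the number of spatial modes a single photon must be spread over.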
Before describing the module we make the simplifying restriction that there is one input qubit to the neuron and one dummy input. We will ensure that the designated output qubit can be fed into another neuron, as in FIG. 9 and FIG. 10.
We propose to update the neural network by adjusting both variable polarisation rotators and spatial phase shifters in a set of Mach-Zehnder interferometers, as shown in FIG. 11. In this way we are able to change the outputs from each layer of the network. The spatial shift could be induced by varying the strain or temperature on the waveguides at given locations, to change their refractive indices and hence the relative phase; this may have additional difficulties in that silicon waveguides are birefringent [42]. Alternatively we can tune both polarisation and spatial qubits via the electro-optic effect. This circuit can be made more robust and miniaturised using silicon or silica optical waveguides [36]. They have been extensively used to control spatial modes and recently also polarisation [43]. Several labs can implement the phase shifting via heaters or the electro-optic effect. Conventionally, phase shifters built upon the electro-optic effect are known to work in the megahertz region and have extremely low loss [36]. For many applications this would be considered slow, but our tuning only requires (in the region of) a few thousand steps of tuning, meaning learning tasks for neural networks this small could be completed in milliseconds. It appears that this effect will be the limiting factor in terms of speed: photodetectors are able to reach reset times in the tens of nanoseconds, while the production of single photons through parametric down conversion has megahertz repetition rates [44].

FIG. 9. The first neuron takes one input and one dummy input and its designated output is fed into the next neuron.

FIG. 10. A circuit diagram of our neural module. Following C-A-K there are three qubits, with basis $|0/1\rangle|H/V\rangle|0/1\rangle$, where H/V label different polarisation states, and the other bits label the four spatial modes. We define the input to the module to be carried by the middle (polarisation) qubit. The neuron $U_1$ has the form of Eq. 6, modifying the output conditional on the input state. The swaps ensure that the next neuron module $U_2$ also gets the input via the polarisation.

SUMMARY AND OUTLOOK
We have given a protocol for generalising classical feedforward step-function neural networks to networks that take and process quantum inputs. We have shown that these networks can perform the natural quantum generalisation of the classical network in the case of an autoencoder, being able for example to compress entangled inputs. We have shown that they can be used to work out a quantum information processing protocol, teleportation, without being told how to do it, only the task. Based on these results we think that these networks will be highly versatile tools for quantum information scientists, similar to the classical networks' role in classical information processing.

FIG. 1. A classical neuron taking two inputs $\mathrm{in}_1$ and $\mathrm{in}_2$ and giving a corresponding output out [1]. $a^{(l)}_j$ labels the output of the $j$th neuron in the $l$th layer of the network. The inputs $a^{(l-1)}_k$, where $k$ labels the input, are each multiplied by a corresponding weight, $w^{(l)}_{jk}$, and an output, $a^{(l)}_j$, is fired as a function of the weighted input $z^{(l)}_j$.

FIG. 3. A classical autoencoder taking two inputs $\mathrm{in}_1 = a^{(0)}_1$

FIG. 4. Neural network implementing a quantum autoencoder that can accommodate two input qubits that are entangled. The blue box represents the quantum compression device after training.

FIG. 5. A circuit diagram of a quantum neural network that can learn and carry out teleportation of the state $|\psi\rangle$ from Alice to Bob using quantum entanglement. The standard teleportation protocol allows only classical communication of 2 bits [18]; this is enforced by only allowing two connections, which are dephased in the Z-basis (D). $U_1$, $U_2$ and $U_3$ are unitaries. The blue line is the boundary between Alice and Bob.

FIG. 6. A plot of the teleportation cost function w.r.t. the number of steps used in the training procedure. The cost function can be seen to converge to zero. The non-monotonic decrease is to be expected as we are varying the input states. The network now teleports any qubit state: picking 1000 states at random from the Haar measure (uniform distribution over the Bloch sphere) gives a cost function distribution with mean $5.0371 \times 10^{-4}$ and standard deviation $1.7802 \times 10^{-4}$, which is effectively zero.

FIG. 7. A 3-D plot of the cost function (vertical axis) of a 2-qubit unitary as a function of $\theta$ and $\phi$ (horizontal axes). The red line represents the path taken when carrying out gradient descent from a particular starting point.

FIG. 8. Diagram of the quantum generalisation of a classical neuron with $M$ inputs and $N$ outputs. The superscripts inside the square brackets of the unitaries represent the number of qubits the respective unitaries act on. $U^{[M+1]}$ is the unitary that represents the quantum neuron with an $M$-qubit input and $U^{[N]}$ is the fan-out gate that fans out the output in the final port of $U^{[M+1]}$ in a particular basis.