Quantum hyperparallel algorithm for matrix multiplication

Hyperentangled states, i.e., states entangled in more than one degree of freedom, are considered a promising resource for quantum computation. Here we present a hyperparallel quantum algorithm for matrix multiplication with time complexity $O(N^2)$, which is better than the best known classical algorithm. In our scheme, an N-dimensional vector is mapped to the state of a single source, which is separated into N paths. With the assistance of hyperentangled states, the inner product of two vectors can be calculated with a time complexity independent of the dimension N. Our algorithm shows that hyperparallel quantum computation may provide a useful tool in quantum machine learning and “big data” analysis.

Quantum algorithms [1] are believed to offer dramatic speedups over classical algorithms for some problems. Groundbreaking work includes large-integer factoring with Shor's algorithm [2], Grover's search algorithm [3–5], and the linear-system algorithm [6,7]. Recently, quantum algorithms for matrix problems have attracted increasing attention because of their promise in dealing with "big data". For example, quantum speedups for solving linear systems of equations [6,7], the max-min matrix product [8], and the Boolean matrix product [8–10] have been studied.
Matrix multiplication is one of the most important matrix operations, as many other problems, e.g., computing determinants and matrix inverses, can be reduced to it [11]. Up to now, only a quantum algorithm for matrix product verification has been proposed [12], i.e., for verifying whether AB = C. That algorithm is based on quantum random walks and has time complexity $O(N^{5/3})$, which is better than the best known classical algorithm, which runs in time $O(N^2)$. However, a quantum algorithm for matrix multiplication without prior knowledge of the result has not yet been presented.
The swap test is a procedure that determines the overlap of two quantum states; it was first introduced for quantum fingerprinting [13,14]. By entangling the tested system with an ancillary qubit, one can estimate the inner product of two different states by measuring the ancillary qubit repeatedly. Recently, in ref. 15, the authors proposed a quantum machine learning algorithm based on the swap test that is exponentially faster than its classical counterpart. The key step of that algorithm is mapping an N-dimensional vector to a $\log_2 N$-qubit state and then estimating the distance between two normalized vectors with the swap test. An experiment in a photonic system has also been performed to demonstrate its validity [16]. These works show that the swap test can play an important role in quantum algorithms.
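The textbook swap test [13,14] can be checked with a small statevector simulation. The sketch below uses plain Python (no quantum library): it applies a Hadamard to the ancilla, a controlled-SWAP of two d-dimensional registers, and a second Hadamard, and returns the probability of finding the ancilla in |0⟩, which should equal (1 + |⟨a|b⟩|²)/2. The function names are illustrative only; the readout used later in this paper, a projection of the ancilla onto (|H⟩ − |V⟩)/√2, is a variant of the same idea rather than this exact circuit.

```python
import math

def kron(u, v):
    """Tensor product |u>|v> of two state vectors given as Python lists."""
    return [x * y for x in u for y in v]

def swap_registers(state, d):
    """SWAP two d-dimensional registers: the amplitude at (i, j) moves to (j, i)."""
    return [state[j * d + i] for i in range(d) for j in range(d)]

def swap_test_p0(a, b):
    """Probability of measuring the ancilla in |0> after a swap test on |a>|b>.

    Statevector simulation of: H on ancilla, controlled-SWAP, H on ancilla.
    The two ancilla branches are stored as separate lists; a and b must be normalized.
    """
    d = len(a)
    psi = kron(a, b)
    s = 1 / math.sqrt(2)
    # First Hadamard: |0>|a,b> -> (|0> + |1>)|a,b> / sqrt(2)
    branch0 = [s * x for x in psi]
    branch1 = [s * x for x in psi]
    # Controlled-SWAP acts only on the ancilla-|1> branch
    branch1 = swap_registers(branch1, d)
    # Second Hadamard: the ancilla-|0> amplitude is (branch0 + branch1) / sqrt(2)
    out0 = [s * (x + y) for x, y in zip(branch0, branch1)]
    return sum(abs(x) ** 2 for x in out0)

def overlap_sq(a, b):
    """|<a|b>|^2 for real or complex vectors."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b))) ** 2

a = [1.0, 0.0]
b = [1 / math.sqrt(2), 1 / math.sqrt(2)]
print(round(swap_test_p0(a, b), 12), round((1 + overlap_sq(a, b)) / 2, 12))  # 0.75 0.75
```

Repeating the measurement T times and counting the |0⟩ outcomes estimates this probability, which is how the overlap is read out in practice.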
However, for photonic systems, preparing many-photon entangled states becomes increasingly difficult for very large N: both the generation rate and the state fidelity drop dramatically [17–20]. An alternative is hyperparallel quantum computation (HPQC) [21–23]. In HPQC, two or more qubits are encoded in different degrees of freedom of a single source, and entanglement in multiple degrees of freedom is called 'hyperentanglement'. HPQC requires far fewer resources and avoids the infidelity problem of the multipartite entangled-state generation process. Several important results on hyperentanglement have been reported. For example, the generation of hyperentangled states (HES) of very large dimension has been demonstrated experimentally [24,25]. The first complete hyperentangled Bell-state analysis protocol was proposed in 2010 [26]. Recently, the first experiment on quantum teleportation of multiple degrees of freedom of a single photon, in polarization and orbital angular momentum, was performed with linear optics [27]. It has also been shown that robust entanglement encoded in other degrees of freedom, such as spatial modes or time bins, can be used to perform deterministic entanglement distillation or purification [28,29].
Here, we present a hyperparallel algorithm for matrix multiplication based on the swap test. We show that, besides reducing the required resources, HPQC also leads to a significant speedup. In our algorithm, both the polarization and the spatial degrees of freedom of a single photon are used. By introducing an extra degree of freedom, we do not need to prepare multipartite entangled cluster states. Instead, an N-dimensional vector is represented by a single source, and the information of each element is mapped to the spatial degree of freedom. Therefore, for square matrix multiplication, our algorithm takes time $O(N^2\varepsilon^{-2}\log_2\eta^{-1})$, where $\varepsilon$ and $\eta$ will be defined below. In comparison, the best known classical algorithm, given by Williams, takes time $O(N^{2.372})$ [30], while the non-hyperparallel quantum algorithm takes time $O(N^2\log N)$. The speedup of HPQC comes from the state preparation procedure: with HES, preparing an N-dimensional state takes time $O(1)$, whereas a conventional quantum algorithm takes time $O(\log N)$. Our algorithm includes the output of all elements of the result. Since printing out $N^2$ numbers takes time $O(N^2)$, our algorithm reaches the lower bound of time complexity for matrix multiplication [30].

Result
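The algorithm. Suppose we have two real $N\times N$ matrices $A=(a_{k,i})$ and $B=(b_{i,k'})$, and we want to compute $W = AB$. Each element of $W$ is the inner product of a row vector $\vec u_k$ of $A$ and a column vector $\vec v_{k'}$ of $B$:

$$W_{k,k'} = \vec u_k\cdot\vec v_{k'} = \sum_{i=1}^{N} a_{k,i}\,b_{i,k'}.$$

Here, we label the two-level degree of freedom by the letters H and V, and the path degree of freedom by the numbers 0, 1, 2, …. We begin by preparing an entangled state of the ancilla and the tested source with respect to the two-level degree of freedom,

$$|\Psi\rangle = \frac{1}{\sqrt{2}}\left(|H\rangle_{\rm anc}|H,0,0\rangle + |V\rangle_{\rm anc}|V,0,0\rangle\right),$$

where $|H(V),0,0\rangle$ represents a tested source in path $|0,0\rangle$. The first number denotes the stem path, and the second number denotes the branch path. A stem path $k$ ($k'$) $> 0$ represents the vector $\vec u_k$ ($\vec v_{k'}$), where $k$ and $k'$ label two different paths, and the branch path $i$ represents the $i$-th element of the vector.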
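As a classical reference point, not part of the quantum scheme, the element-wise definition of $W$ can be written in a few lines of Python. The $N^2$ inner products it computes are exactly the quantities $W_{k,k'}$ that the procedure below estimates one pair $(k,k')$ at a time; the function name is our own.

```python
def matmul_by_inner_products(A, B):
    """Classical reference for W = AB.

    W[k][kp] is the inner product of u_k (row k of A) with v_kp (column kp of B),
    i.e. the quantity estimated for one pair (k, k') by the swap-test procedure.
    """
    n_inner = len(B)
    columns = [[B[i][kp] for i in range(n_inner)] for kp in range(len(B[0]))]
    return [
        [sum(a * b for a, b in zip(row, col)) for col in columns]
        for row in A
    ]

W = matmul_by_inner_products([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(W)  # [[19, 22], [43, 50]]
```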
First, to calculate $W_{k,k'}$ to a given accuracy, we apply a series of operations on the space spanned by the path and two-level degrees of freedom (see the Method section for details), which transform the state $|\Psi\rangle$ into the state $|\Phi_{k,k'}\rangle$ of Eq. (4). In Eq. (4), $C$ is a constant ensuring that the probability amplitude of each path does not exceed 1, and $\eta(a_{k,i})$ and $\eta(b_{i,k'})$ denote errors introduced by the oracles, which map the information of the vectors onto the states. These errors are bounded by a quantity that depends only on $m$, the number of steps required by the oracle. Given the required accuracy, the number of steps needed to obtain $|\Phi_{k,k'}\rangle$ is therefore independent of N. The second terms of Eq. (4) are orthogonal to each other because they lie in different stem paths $k$ and $k'$. The inner product of the two states therefore simplifies to an expression in $\tilde W_{k,k'}$, the approximation of $W_{k,k'}$ that includes the oracle errors; its deviation from $W_{k,k'}$ depends on these errors and on $C$. To reach accuracy $\eta$, the oracles require $O(m)$, i.e., $O(\log_2\eta^{-1})$, operations.
Next, to read out the inner product, we project the ancillary qubit onto the state $|\phi\rangle_{\rm anc} = \frac{1}{\sqrt{2}}\left(|H\rangle_{\rm anc} - |V\rangle_{\rm anc}\right)$. If the success probability of this projection is $p_{k,k'}$, the element $\tilde W_{k,k'}$ can be expressed in terms of $p_{k,k'}$ [15]. Experimentally, $p_{k,k'}$ can be estimated from repeated measurements. For each element $W_{k,k'}$, we repeat the initialization, the transformation of the tested source, and the measurement $T$ times, of which $T'_{k,k'}$ projections succeed. The probability is then approximated as $p_{k,k'} \approx T'_{k,k'}/T$, and we define the estimate $\omega_{k,k'}$ of $W_{k,k'}$ accordingly. The error of this estimate is governed by $\varepsilon''$, the statistical error of $p_{k,k'}$. To reach accuracy $\varepsilon''$ for $p_{k,k'}$, or $\varepsilon$ for $\tilde W_{k,k'}$, the number of repetitions scales as $T = O(\varepsilon^{-2})$. Combining this statistical error with the oracle error $\eta$ gives the total accuracy of $\omega_{k,k'}$ with respect to $W_{k,k'}$, and the time complexity of calculating $W_{k,k'}$ is $O(\varepsilon^{-2}\log_2\eta^{-1})$. In summary, the procedure for calculating $W_{k,k'}$ consists of the following steps:
• Set the number of measurements $T$; initially $T' = 0$ and $l = 1$;
• Initialize the state in $|\Psi\rangle$;
• Transform the state to $|\Phi_{k,k'}\rangle$;
• Perform the projection measurement on the ancillary qubit; if it succeeds, set $T' = T' + 1$;
• If $l < T$, set $l = l + 1$ and return to the initialization step; if $l = T$, output $1 - 2T'/T$.

Scientific Reports | 6:24910 | DOI: 10.1038/srep24910
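The counting loop above can be simulated classically. The sketch assumes the joint state $(|H\rangle_{\rm anc}|\phi_k\rangle + |V\rangle_{\rm anc}|\phi_{k'}\rangle)/\sqrt{2}$ with real, normalized $|\phi_k\rangle$ and $|\phi_{k'}\rangle$. Projecting the ancilla onto $(|H\rangle - |V\rangle)/\sqrt{2}$ leaves $(|\phi_k\rangle - |\phi_{k'}\rangle)/2$, so the success probability is $p = (1 - \langle\phi_k|\phi_{k'}\rangle)/2$, and the output $1 - 2T'/T$ estimates the overlap. The conversion from this overlap to $\tilde W_{k,k'}$ through the constant $C$ is not reproduced here. The repetition count uses a Hoeffding bound, which is our own choice of concentration inequality and is consistent with $T = O(\varepsilon^{-2})$ but is not the paper's formula. All names are hypothetical.

```python
import math
import random

def projection_probability(phi_k, phi_kp):
    """Success probability of projecting the ancilla onto (|H> - |V>)/sqrt(2).

    Assumes the joint state (|H>_anc|phi_k> + |V>_anc|phi_kp>)/sqrt(2), with
    phi_k and phi_kp real, normalized amplitude lists over the same basis.
    The projected (unnormalized) source state is (|phi_k> - |phi_kp>)/2,
    so p = (1 - <phi_k|phi_kp>)/2.
    """
    return sum(((x - y) / 2) ** 2 for x, y in zip(phi_k, phi_kp))

def estimate_overlap(p, T, rng):
    """Counting loop from the step list: T runs, T' successes, output 1 - 2T'/T."""
    t_prime = 0
    for _ in range(T):          # l = 1, ..., T; each run re-initializes |Psi>
        if rng.random() < p:    # projection on the ancilla succeeds
            t_prime += 1
    return 1 - 2 * t_prime / T

def repetitions(eps, delta):
    """Hoeffding bound (our assumption, not the paper's formula): each run
    contributes +1 or -1, so T >= 2 ln(2/delta) / eps**2 keeps
    |estimate - overlap| < eps with probability at least 1 - delta."""
    return math.ceil(2 * math.log(2 / delta) / eps ** 2)

phi_k, phi_kp = [1.0, 0.0], [0.6, 0.8]   # toy states with overlap 0.6
p = projection_probability(phi_k, phi_kp)
T = repetitions(eps=0.05, delta=0.01)
estimate = estimate_overlap(p, T, random.Random(7))
print(round(p, 12), T)   # 0.2 4239
```

Halving $\varepsilon$ quadruples $T$, which is the $\varepsilon^{-2}$ factor in the time complexity; the $\log_2\eta^{-1}$ factor comes from the oracle steps inside each run.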

Further speedup.
At this stage, we present a further speedup of the algorithm, which allows one to calculate N values of $T'_{k,k'}$ simultaneously. Previously, only two registers work at a time, while the other registers wait to be queried. This waste can be avoided. We show that by sacrificing spatial resources, that is, by increasing the number of entanglement sources, quantum gates and measurement devices, the quantum part of the algorithm can be sped up by a factor of N by activating all registers at the same time.
The k-th entanglement pair is now directed to registers $k$ and $(k+c)'$ to calculate the element $W_{k,k+c}$. In this way, all 2N registers are activated and process at the same time. We apply the operation $U_0^{\otimes N}$ to the state and then direct the $i$-th branch path of the k-th qubit to the oracles $\hat o_{k,i}$ and $\hat o'_{k+c,i}$. Finally, each pair $k$ ends up in the state $|\Phi_{k,k+c}\rangle$. However, speeding up the calculation of $T'_{k,(k+c)}$ alone cannot reduce the overall time complexity, because computing $W_{k,k'}$ from $T'_{k,k'}$ still takes $O(N^2)$ steps on a classical computer, and printing out the $N^2$ elements still takes $O(N^2)$ steps. On the other hand, although the number of required sources increases, it remains much smaller than the $O(N^2)$ oracles already required. Therefore, the spatial complexity does not increase.
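The register assignment can be sketched as a schedule of rounds. In round $c$, pair $k$ uses row register $k$ and column register $k+c$; to keep indices in range we assume the shift wraps around modulo N, which the text does not state explicitly, and we index registers from 0 rather than 1. Under that assumption every register is busy in every round, and N rounds cover all $N^2$ elements exactly once. The function name is hypothetical.

```python
def parallel_schedule(n):
    """Rounds of simultaneous jobs for the further-speedup scheme.

    In round c, pair k is sent to row register k and column register
    (k + c) mod n; the wrap-around modulo n is our assumption.
    Returns a list of n rounds, each a list of n (k, k') pairs.
    """
    return [[(k, (k + c) % n) for k in range(n)] for c in range(n)]

for c, jobs in enumerate(parallel_schedule(3)):
    print(c, jobs)
```

Each round is a permutation of the column registers, so no register is queried twice in the same round; the classical post-processing and output of the $N^2$ results remain $O(N^2)$, as noted above.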
In conclusion, we have presented a hyperparallel algorithm for matrix multiplication with arbitrarily high accuracy. Our work shows that, besides reducing the spatial resources, HPQC can also speed up the algorithm. We show that the time complexity of the swap test is independent of the vectors' dimension. Therefore, related problems that can be solved with the swap test, such as quantum machine learning [15,16] and norm computation, can also be sped up with HPQC.
In our algorithm, the time complexity with respect to accuracy is limited by the oracle. With a more optimized oracle design, it may be possible to speed up the algorithm further. HPQC avoids the difficulty of preparing multipartite entangled cluster states, so it may have advantages in both required resources and time complexity. Our work provides a new and powerful tool for manipulating "big data" with quantum computers.

Method
The transform to |Φ k,k′ 〉. The transform Eq. (4) is the most critical part of our algorithm. As shown in Fig. 1, we now elaborate how to realize this procedure. To begin with, we define a separation operation to the tested source. The state turns to anc a nc 1 Then, we guide the tested sources to two different paths. Both paths connect to a register, which stores the information of vectors → u k or  → ′ v k . In registers we perform an operation U 0 on the branch path degree of freedom In this case the source is prepared as an equally distributed superposition state of branch paths where |k, i〉 (|k′ , i〉 ) corresponds to the elements a k,i (b i,k′ ) of the vectors → u k (  → ′ v k ). Next, paths |k, i〉 and |k′ , i〉 are directed into oracles ô k i , and ′ o k i , , which control the rotation of the input state around axis-y in the space spanned by {|H〉 , |V〉 }. Generally, the oracles give transforms Finally, |H〉 and |V〉 are seperated to different paths by a separation operation, which lead the total output state to Eq. (4). For preparing state Eq. (13), we query all oracles synchronously. Therefore, the query complexity of the state preparation is independent of the dimension N.
The oracle. We now describe how to realize the oracles $\hat o_{k,i}$ and $\hat o'_{k',i}$ with arbitrarily small errors $\eta(a_{k,i})$ and $\eta(b_{i,k'})$. The quantum circuit of the oracle $\hat o_{k,i}$ is given in Fig. 2. One control qubit controls the sign of the element $a_{k,i}$ ($b_{i,k'}$), while the other control qubits control its magnitude. The larger $m$ is, the more accurate the oracles are. Each work qubit can be in state $|0\rangle$ or $|1\rangle$.
First, we rewrite $a_{k,i}$ and $b_{i,k'}$ in terms of the angles $\theta_{k,i}$ and $\theta_{k',i}$, each of which can be expressed as an infinite series. Then, we define the angles $\tilde\theta_{k,i}$ and $\tilde\theta_{k',i}$, which are represented by the control qubits. These approximate $\theta_{k,i}$ and $\theta_{k',i}$, and the error between them becomes arbitrarily small as $m$ grows. Reading in an oracle takes $O(m)$ steps. Assuming that we are given the oracle for the element $a_{k,i}$, the operation procedure is shown in Fig. 2. For $j = 0$ to $j = m$, rotations about the y axis are applied sequentially to the two-level degree of freedom, each controlled by the control qubit $|A_j^{(k,i)}\rangle$. The total operation $U^{(k,i)}$ thus takes $m$ steps. Next, we separate $|H\rangle$ and $|V\rangle$ into different paths and combine the paths carrying $|V\rangle$ into $|0,i\rangle$, while leaving $|H\rangle$ in $|k,i\rangle$ ($|k',i\rangle$). This procedure removes the stem-path information of the $|V\rangle$ component. For photonic systems, it can be realized simply with a non-polarizing beam splitter (NBS) and post-selection [31]. This completes the oracle operation.
For large $m$, the amplitude error decreases with $m$, and it vanishes as $m$ becomes infinitely large. Both the time and the spatial complexity of the oracles are $O(m)$.
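The sequence of controlled rotations can be sketched numerically. The encoding here is our assumption, since the series in the text is not reproduced: $a = \pm\sin\theta$ with $\theta \in [0, \pi/2]$, one sign bit, and $m$ magnitude bits $A_j$ giving the truncated binary expansion $\tilde\theta = (\pi/2)\sum_{j=1}^{m} A_j 2^{-j}$. Rotations about a common axis compose by adding angles, so applying one controlled rotation per bit realizes a rotation by $\tilde\theta$ in $m$ steps. Since $|\sin\tilde\theta - \sin\theta| \le |\tilde\theta - \theta| \le (\pi/2)2^{-m}$, choosing $m = \lceil\log_2(\pi/(2\eta))\rceil$ reaches accuracy $\eta$, matching $m = O(\log_2\eta^{-1})$. The rotation sign convention and all function names are hypothetical.

```python
import math

def angle_register(a, m):
    """Hypothetical oracle register for |a| <= 1: a sign bit and m magnitude bits.

    Assumed encoding: a = sign * sin(theta), theta in [0, pi/2], and
    theta ~= (pi/2) * sum_{j=1..m} A_j * 2**-j  (truncated binary expansion).
    """
    if abs(a) > 1:
        raise ValueError("|a| must not exceed 1")
    sign_bit = 1 if a < 0 else 0
    frac = math.asin(abs(a)) / (math.pi / 2)   # theta as a fraction of pi/2
    bits = []
    for _ in range(m):
        frac *= 2
        bit = 1 if frac >= 1 else 0
        bits.append(bit)
        frac -= bit
    return sign_bit, bits

def rotation(t):
    """Real rotation in the {|H>, |V>} plane (y-axis rotation; convention assumed)."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def matmul2(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[r][k] * Y[k][col] for k in range(2)) for col in range(2)]
            for r in range(2)]

def oracle_unitary(sign_bit, bits):
    """Apply one controlled rotation per magnitude bit, sequentially (m steps)."""
    U = [[1.0, 0.0], [0.0, 1.0]]
    s = -1.0 if sign_bit else 1.0
    for j, bit in enumerate(bits, start=1):
        if bit:  # fires only when control qubit A_j is |1>
            U = matmul2(rotation(s * (math.pi / 2) / 2 ** j), U)
    return U

def bits_for_accuracy(eta):
    """Smallest m with (pi/2) * 2**-m <= eta, i.e. m = O(log2(1/eta))."""
    return max(1, math.ceil(math.log2(math.pi / (2 * eta))))

a, eta = -0.37, 1e-3
m = bits_for_accuracy(eta)
U = oracle_unitary(*angle_register(a, m))
v_amplitude = U[1][0]                    # amplitude of |V> after rotating |H>
print(m, abs(v_amplitude - a) <= eta)    # 11 True
```

The $|V\rangle$ amplitude produced here is the part that the separation step routes into the shared path $|0,i\rangle$.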