Quantum algorithms [1] are believed to offer dramatic speedups over classical algorithms for certain problems. Groundbreaking work includes Shor's algorithm for factoring large integers [2], Grover's search algorithm [3,4,5] and the quantum linear-system algorithm [6,7]. Recently, quantum algorithms for matrix problems have attracted increasing attention because of their promise for handling “big data”. For example, quantum speedups for solving linear systems of equations [6,7], the max-min matrix product [8] and the Boolean matrix product [8,9,10] have been studied.

Matrix multiplication is one of the most important matrix operations, since many other problems, e.g., computing the determinant or the matrix inverse, can be reduced to it [11]. Up to now, only a quantum algorithm for matrix product verification, i.e., checking whether AB = C, has been proposed [12]. It is based on a quantum random walk and runs in time $O(N^{5/3})$, which is faster than the best known classical algorithm, whose running time is $O(N^2)$. However, a quantum algorithm that computes the matrix product without prior knowledge of the result has not yet been presented.

The swap test is a procedure for estimating the overlap of two quantum states; it was first introduced for quantum fingerprinting [13,14]. By entangling the tested system with an ancillary qubit, one can estimate the inner product of two different states by measuring the ancillary qubit repeatedly. Recently, the authors of ref. 15 proposed a quantum machine learning algorithm based on the swap test that is exponentially faster than the corresponding classical algorithm. The key step of the algorithm is to map N-dimensional vectors to states of $\log_2 N$ qubits and then estimate the distance between two normalized vectors with the swap test. An experiment in a photonic system has also been performed to demonstrate its validity [16]. These works show that the swap test can play an important role in quantum algorithms.
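To make the swap test concrete, here is a small NumPy state-vector simulation. It is a sketch, and the function names are ours, not taken from refs 13, 14 or 15. It applies a Hadamard gate to the ancilla, a controlled-SWAP of the two registers and a second Hadamard, then returns the probability of finding the ancilla in |0〉. For this circuit, that probability is the standard value $(1+|\langle a|b\rangle|^2)/2$. Repeating the measurement and inverting this relation gives an estimate of the overlap.

```python
import numpy as np


def swap_test_p0(a, b):
    """Probability that the ancilla is found in |0> after a swap test on |a>|b>.

    Explicit state-vector simulation: Hadamard on the ancilla,
    controlled-SWAP of the two registers, Hadamard on the ancilla again.
    """
    a = np.asarray(a, dtype=complex)
    b = np.asarray(b, dtype=complex)
    a = a / np.linalg.norm(a)          # the swap test acts on normalized states
    b = b / np.linalg.norm(b)
    d = a.size
    dim = d * d

    # SWAP on C^d (x) C^d: basis state |i>|j> -> |j>|i>
    swap = np.zeros((dim, dim))
    for i in range(d):
        for j in range(d):
            swap[j * d + i, i * d + j] = 1.0

    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    h_anc = np.kron(h, np.eye(dim))                  # H on the ancilla only
    proj0, proj1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
    cswap = np.kron(proj0, np.eye(dim)) + np.kron(proj1, swap)

    state = np.kron([1.0, 0.0], np.kron(a, b))       # |0>|a>|b>
    out = h_anc @ cswap @ h_anc @ state
    return float(np.sum(np.abs(out[:dim]) ** 2))     # ancilla-|0> block


def estimate_overlap_sq(a, b, shots, seed=0):
    """Estimate |<a|b>|^2 = 2 P(0) - 1 from `shots` simulated ancilla measurements."""
    rng = np.random.default_rng(seed)
    p0 = min(max(swap_test_p0(a, b), 0.0), 1.0)      # guard against rounding
    n0 = rng.binomial(shots, p0)
    return 2.0 * n0 / shots - 1.0


# |<a|b>|^2 = 1/2 here, so P(0) should be (1 + 1/2) / 2 = 0.75 up to rounding
print(swap_test_p0([1, 0], [1, 1]))
print(estimate_overlap_sq([1, 0], [1, 1], shots=10_000))
```

The statistical error of the sampled estimate shrinks as the inverse square root of the number of shots. This is the origin of the repetition cost discussed later in the text.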

However, in photonic systems, preparing many-photon entangled states becomes increasingly difficult for very large N: both the generation rate and the state fidelity drop dramatically [17,18,19,20]. An alternative is hyperparallel quantum computation (HPQC) [21,22,23]. In HPQC, two or more qubits are encoded in different degrees of freedom of a single source, which is called ‘hyperentanglement’. This requires far fewer resources and avoids the loss of fidelity that occurs when generating multipartite entangled states. Several important results on hyperentanglement have been reported. For example, the generation of hyperentangled states (HES) with very large dimension has been demonstrated experimentally [24,25]. The first complete hyperentangled Bell-state analysis protocol was proposed in 2010 [26]. Recently, quantum teleportation of multiple degrees of freedom of a single photon, namely polarization and orbital angular momentum, was demonstrated experimentally with linear optics [27]. It has also been shown that robust entanglement encoded in other degrees of freedom, such as spatial modes or time bins, can be used to perform deterministic entanglement distillation or purification [28,29].

Here, we present a hyperparallel algorithm for matrix multiplication based on the swap test. We show that, besides reducing the required resources, HPQC also yields a significant speedup. Our algorithm uses both the polarization and the spatial degrees of freedom of a single photon. Because of this extra degree of freedom, we do not need to prepare multipartite entangled cluster states. Instead, an N-dimensional vector is represented by a single source, and the information of each element is mapped to the spatial degree of freedom. As a result, for square matrix multiplication our algorithm takes time $O(N^2\varepsilon^{-2}\log_2\eta^{-1})$, where the accuracies η and ε are defined below. In comparison, the best known classical algorithm, given by Williams, takes time $O(N^{2.372})$ [30], while the non-hyperparallel quantum algorithm takes time $O(N^2\log N)$. The speedup of HPQC comes from the state preparation: with HES, preparing an N-dimensional state takes time O(1), whereas a conventional quantum algorithm takes time O(log N). Our algorithm includes outputting all elements of the result. Since printing out N² numbers alone takes time O(N²), our algorithm reaches the lower bound on the time complexity of matrix multiplication as a function of N [30].


The algorithm

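Suppose we have two real N × N matrices

$$U=\begin{pmatrix}a_{1,1}&\cdots&a_{1,N}\\ \vdots&\ddots&\vdots\\ a_{N,1}&\cdots&a_{N,N}\end{pmatrix},\qquad Q=\begin{pmatrix}b_{1,1}&\cdots&b_{1,N}\\ \vdots&\ddots&\vdots\\ b_{N,1}&\cdots&b_{N,N}\end{pmatrix}.$$

We define the row vectors $\vec a_k=(a_{k,1},\dots,a_{k,N})$ of U and the column vectors $\vec b_{k'}=(b_{1,k'},\dots,b_{N,k'})^{\mathsf T}$ of Q. Matrix multiplication is to calculate W = UQ, where

$$W_{k,k'}=\vec a_k\cdot\vec b_{k'}=\sum_{i=1}^{N}a_{k,i}\,b_{i,k'}.$$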
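As a classical reference point, the sketch below computes W = UQ exactly as defined above, with one inner product of a row of U and a column of Q per entry. It uses Python with NumPy, and the function name is ours. These N² inner products are the quantities the quantum procedure estimates one at a time. The code uses 0-based indices, whereas the text counts from 1.

```python
import numpy as np


def product_by_inner_products(U, Q):
    """Compute W = UQ entry by entry as W[k, k'] = a_k . b_k'.

    a_k is the k-th row of U and b_k' is the k'-th column of Q (0-based here).
    """
    U = np.asarray(U, dtype=float)
    Q = np.asarray(Q, dtype=float)
    n = U.shape[0]
    W = np.empty((n, n))
    for k in range(n):
        a_k = U[k, :]                   # row vector a_k
        for kp in range(n):
            b_kp = Q[:, kp]             # column vector b_k'
            W[k, kp] = np.dot(a_k, b_kp)
    return W


print(product_by_inner_products([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```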

Here, we label the two-level (polarization) degree of freedom by the letters H and V and the path degree of freedom by a pair of numbers. We begin by preparing an entangled state with respect to the two-level degree of freedom

where |H(V), 0, 0〉 denotes the tested source in state |H〉 (|V〉) and in path |0, 0〉. The first number labels the stem path and the second the branch path. A stem path k > 0 (k′ > 0) carries the vector $\vec a_k$ ($\vec b_{k'}$), where k and k′ label two different paths, and the branch path i carries the ith element of that vector.

First, to calculate $W_{k,k'}$ to a given accuracy, we apply a series of operations on the space spanned by the path and two-level degrees of freedom (see the Methods section for details), which transform the state |Ψ〉 into


where C is a constant ensuring that the probability amplitude of each path does not exceed 1, and $\tilde a_{k,i}$ and $\tilde b_{i,k'}$ are approximations of $a_{k,i}$ and $b_{i,k'}$. The errors $\eta(a_{k,i})$ and $\eta(b_{i,k'})$ are introduced by the oracles, which map the information of the vectors onto the states; they are bounded in terms of m, the number of steps required by the oracle. For a given accuracy, the number of oracle steps needed to obtain these approximations is independent of N.

The second terms in Eq. (4) are orthogonal to each other because they lie in different stem paths, k and k′. The inner product of the two states therefore simplifies to

Here, the approximation of $W_{k,k'}$ built from the approximated elements satisfies

with and D is the ratio of the mean of to C. To reach accuracy η, the oracles require O(m), i.e., $O(\log_2\eta^{-1})$, operations.

Next, to read out the inner product, we project the ancillary qubit onto a fixed reference state. If the success probability of this projection is $p_{k,k'}$, the element can be expressed as [15]

Experimentally, $p_{k,k'}$ can be estimated from repeated measurements. For each element $W_{k,k'}$, we repeat the initialization, the transformation of the tested source and the measurement T times, of which T′ projections succeed. The probability is then approximated by T′/T, and we define

In this way, the error satisfies

where the bound includes the statistical error of $p_{k,k'}$.

To reach the required accuracy for $p_{k,k'}$, i.e., accuracy ε for the estimated element, the number of repetitions is $T=O(\varepsilon^{-2})$. The total accuracy of $\omega_{k,k'}$ with respect to $W_{k,k'}$ is given by

The time complexity of calculating $W_{k,k'}$ is $O(\varepsilon^{-2}\log_2\eta^{-1})$. Since there are N² elements to be calculated, the total time complexity of matrix multiplication is $O(N^2\varepsilon^{-2}\log_2\eta^{-1})$. We use 2N² oracles in total, each containing O(m) quantum gates. Therefore, the spatial complexity is $O(N^2 m)$, i.e., $O(N^2\log_2\eta^{-1})$.
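The complexity bookkeeping above can be made explicit with a small cost model. This is a sketch under stated assumptions, not the paper's analysis. We assume, hypothetically, that the number of repetitions follows Hoeffding's inequality for a target accuracy ε on the output 1 − 2T′/T with failure probability δ. We also assume that the oracle error scales as $2^{-m}$, so that $m=\lceil\log_2\eta^{-1}\rceil$. Constant factors are arbitrary; only the scaling is meaningful. The function names are ours.

```python
import math


def shots_for_accuracy(eps, delta):
    """Repetitions T so that 1 - 2T'/T is within eps of 1 - 2p with
    probability at least 1 - delta (Hoeffding bound; an assumption)."""
    t = eps / 2.0                        # accuracy needed on T'/T itself
    return math.ceil(math.log(2.0 / delta) / (2.0 * t * t))


def oracle_steps(eta):
    """Hypothetical oracle depth m, assuming the oracle error scales as 2^-m."""
    return math.ceil(math.log2(1.0 / eta))


def cost_model(n, eps, eta, delta=0.05):
    """Scaling-level cost counts for an n x n product; constants are arbitrary."""
    T = shots_for_accuracy(eps, delta)
    m = oracle_steps(eta)
    return {
        "shots_per_element": T,
        "oracle_steps": m,
        "time_per_element": T * m,       # O(eps^-2 log2(1/eta)), no n dependence
        "total_time": n * n * T * m,     # O(n^2 eps^-2 log2(1/eta))
        "oracles": 2 * n * n,            # one oracle per element of U and Q
        "gates": 2 * n * n * m,          # spatial cost O(n^2 log2(1/eta))
    }


print(cost_model(10, eps=0.1, eta=2 ** -10))
```

Doubling n quadruples `total_time` while `time_per_element` stays fixed. This mirrors the dimension-independent cost per element and the quadratic growth of the total.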

In summary, the procedure for calculating $W_{k,k'}$ consists of the following steps:

  • Choose the number of measurements T; initially set T′ = 0 and l = 1;

  • Initialize the state |Ψ〉;

  • Transform the state into |Φk,k′〉;

  • Perform the projective measurement on the ancillary qubit; if it succeeds, set T′ = T′ + 1;

  • If l < T, set l = l + 1 and return to step 2; if l = T, output 1 − 2T′/T.
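The steps above can be simulated classically to see how the estimator behaves. In this sketch, the function and its parameters are hypothetical. Each run of preparation, transformation and projection is replaced by a biased coin flip whose success probability stands in for $p_{k,k'}$, so the returned value 1 − 2T′/T estimates $1-2p_{k,k'}$. Converting it into $W_{k,k'}$ requires the relation between $p_{k,k'}$ and $W_{k,k'}$ stated in the text, which is not modeled here.

```python
import random


def estimate_element(p_success, T, seed=None):
    """Simulate the summarized loop for one element and return 1 - 2T'/T.

    p_success is a stand-in for p_{k,k'}; each run of preparation,
    transformation and projection is modeled as one biased coin flip.
    """
    rng = random.Random(seed)
    t_prime = 0                          # step 1: T' = 0
    for _ in range(T):                   # l = 1, ..., T
        # steps 2-3 (prepare |Psi>, transform to |Phi_{k,k'}>) are not modeled
        if rng.random() < p_success:     # step 4: projection succeeds
            t_prime += 1
    return 1 - 2 * t_prime / T           # step 5: output after the T-th run


print(estimate_element(0.25, 10_000, seed=1))   # should be near 1 - 2 * 0.25 = 0.5
```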

Further speedup

We now present a further speedup of the algorithm, which allows N values of $T_{k,k'}$ to be calculated simultaneously. So far, only two registers work at a time, while the other registers wait to be queried. This waste can be avoided. We show that by spending more spatial resources, i.e., increasing the number of entanglement sources, quantum gates and measurement devices, the algorithm can be sped up dramatically by activating all registers at the same time.

We begin by preparing N pairs of entangled states, labeled by k. As in the previous section, we apply the separation operation to each qubit

where the integer 0 ≤ c < N is fixed for a given round of operations and (k + c)′ denotes the register storing the vector $\vec b_{k+c}$ of matrix Q, with the index k + c taken cyclically (modulo N). The state becomes

The kth entangled pair is now directed to registers k and (k + c)′ to calculate the element $W_{k,k+c}$. Thus all 2N registers are activated and operate simultaneously. We then apply the branch-path operation to the state and direct the ith branch path of the kth qubit to the oracles for $a_{k,i}$ and $b_{i,k+c}$. Finally, we obtain the state

Next, we perform the projective measurement of all N ancillary qubits. All N values $T_{k,(k+c)'}$ for a particular c are obtained at the same time, so obtaining all N² values takes time O(N) instead of O(N²).
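The round structure of the parallel scheme can be checked with a short script. It assumes, as our reading of the text, that the column index k + c wraps around modulo N. Under that assumption, each round c pairs every row register with exactly one column register, and the N rounds together cover all N² elements $W_{k,k'}$ exactly once. Indices are 0-based in the code, and the helper names are hypothetical.

```python
def round_schedule(n, c):
    """Register pairs active in round c: row register k with column register
    (k + c) mod n. Indices are 0-based; the wrap-around is an assumption."""
    return [(k, (k + c) % n) for k in range(n)]


def full_schedule(n):
    """All n rounds c = 0, ..., n - 1 of the parallel scheme."""
    return [round_schedule(n, c) for c in range(n)]


for c, pairs in enumerate(full_schedule(3)):
    print(c, pairs)
```

Because every register appears once per round, all 2N registers are busy in each of the N rounds, which is where the reduction from O(N²) to O(N) sequential rounds comes from.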

However, speeding up only the calculation of $T_{k,(k+c)'}$ does not reduce the overall time complexity, because computing $W_{k,k'}$ from $T_{k,k'}$ still takes O(N²) steps on a classical computer, and printing out N² numbers still takes O(N²) steps. On the other hand, although the required number of sources increases, it remains much smaller than the O(N²) oracles already required. Therefore, the spatial complexity does not increase.

In conclusion, we have presented a hyperparallel algorithm for matrix multiplication with arbitrarily high accuracy. Our work shows that, besides reducing the required spatial resources, HPQC can also speed up the algorithm. We have shown that the time complexity of the swap test is independent of the dimension of the vectors. Therefore, related problems that can be solved with the swap test, such as quantum machine learning [15,16] and norm estimation, can also be sped up with HPQC. In our algorithm, the dependence of the time complexity on the accuracy is limited by the oracle; a better-optimized oracle design may speed up the algorithm further. HPQC avoids the difficulty of preparing multipartite entangled cluster states, so it may offer advantages in both required resources and time complexity. Our work provides a powerful new tool for handling “big data” with quantum computers.


The transform to $|\Phi_{k,k'}\rangle$

The transform of Eq. (4) is the most critical part of our algorithm. We now describe how to realize it; the quantum circuit is shown in Fig. 1. To begin with, we define a separation operation

Figure 1: Quantum circuit realizing the transform of Eq. (4). ‘/’ denotes a bundle of paths; a dashed line indicates that the amplitude of that path is 0.

which separates the states |H〉 and |V〉 of the tested source into different paths. In photonic quantum computation, such a separation operation is easily realized with a polarizing beam splitter (PBS), which transmits horizontally polarized light and reflects vertically polarized light.
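An idealized model of the separation operation can be written as a 4 × 4 matrix on polarization ⊗ path. The PBS leaves |H〉 in its input path and moves |V〉 to the other path. The matrix below is a sketch that ignores the phases a real beam splitter may add, and the names are ours.

```python
import numpy as np

H, V = np.array([1.0, 0.0]), np.array([0.0, 1.0])        # polarization basis
PATH0, PATH1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Idealized PBS on (polarization) x (path): |H> is transmitted (path
# unchanged), |V> is reflected (path 0 <-> path 1). Phases are ignored.
X = np.array([[0.0, 1.0], [1.0, 0.0]])
PBS = np.kron(np.outer(H, H), np.eye(2)) + np.kron(np.outer(V, V), X)


def separate(alpha, beta):
    """Apply the PBS to (alpha|H> + beta|V>)|path 0>."""
    return PBS @ np.kron(alpha * H + beta * V, PATH0)


print(separate(0.6, 0.8))   # amplitudes now sit on |H, path 0> and |V, path 1>
```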

The first step is to apply the separation operation to the tested source. The state becomes

Then, we guide the tested source into two different paths, each connected to a register that stores the information of the vector $\vec a_k$ or $\vec b_{k'}$. In the registers, we perform an operation U0 on the branch-path degree of freedom

In this case, the source is prepared in an equal superposition of the branch paths

where |k, i〉 (|k′, i〉) corresponds to the element $a_{k,i}$ ($b_{i,k'}$) of the vector $\vec a_k$ ($\vec b_{k'}$). Next, the paths |k, i〉 and |k′, i〉 are directed into the oracles for $a_{k,i}$ and $b_{i,k'}$, which control the rotation of the input state about the y axis in the space spanned by {|H〉, |V〉}. In general, the oracles perform the transformations


Finally, |H〉 and |V〉 are separated into different paths by a separation operation, which yields the total output state of Eq. (4). To prepare the state of Eq. (13), we query all oracles simultaneously; therefore, the query complexity of the state preparation is independent of the dimension N.
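The construction only requires U0 to spread a single input path into an equal-amplitude superposition of N branch paths. One standard unitary with this property is the N-point discrete Fourier transform, which can be realized as an N-port balanced multiport splitter. The sketch below uses it purely as an assumed example, not as the paper's construction, and checks that it is unitary and produces amplitudes of magnitude $1/\sqrt{N}$. The function names are ours.

```python
import numpy as np


def dft_unitary(n):
    """n x n discrete Fourier transform matrix: one possible (assumed) choice
    of a balanced splitter playing the role of U0."""
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(2j * np.pi * j * k / n) / np.sqrt(n)


def spread_from_single_input(n):
    """Branch-path amplitudes after the splitter acts on input port 0."""
    e0 = np.zeros(n, dtype=complex)
    e0[0] = 1.0
    return dft_unitary(n) @ e0


print(np.abs(spread_from_single_input(4)))   # every entry has magnitude 1/2
```

Because all N branch amplitudes are produced by one passive optical element, this step does not need a number of operations that grows with N. This is consistent with the dimension-independent preparation cost stated above.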

The oracle

We now describe how to realize the oracles for $a_{k,i}$ and $b_{i,k'}$ with arbitrarily small errors $\eta(a_{k,i})$ and $\eta(b_{i,k'})$. The quantum circuit of the oracle is shown in Fig. 2. Each oracle contains m + 1 control qubits, indexed by j (0 ≤ j ≤ m). One control qubit controls the sign of the element $a_{k,i}$ ($b_{i,k'}$), while the other qubits control its magnitude. The larger m is, the more accurate the oracles are. Each work qubit can be in state |0〉 or |1〉.

Figure 2: The realization of the oracle (work qubits are marked in the figure). A dashed line indicates that the amplitude of that path is zero.

First, we rewrite $a_{k,i}$ and $b_{i,k'}$ as

where $\theta_{k,i}$ and $\theta_{k',i}$ can be expressed as infinite series

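where each binary digit in these series is either 0 or 1. For the first m digits, we prepare the control qubits in the states that encode the corresponding digit values. We then define the truncated angles, which are represented by the control qubits: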

These are approximations of $\theta_{k,i}$ and $\theta_{k',i}$, and the error between them satisfies

which becomes arbitrarily small as m grows. Reading in an oracle takes O(m) steps. Given the oracle for the element $a_{k,i}$, the operation procedure is shown in Fig. 2: from j = 0 to j = m, rotations about the y axis are applied sequentially to the two-level degree of freedom, each controlled by the corresponding control qubit. The total operation U(k,i) takes m steps and can be expressed as:

where and . For input states |V, k, i〉, we obtain the output states

where .

For the oracle that represents $b_{i,k'}$, we first perform a rotation on the input states. Then, following a similar procedure, we obtain the output state

where .

Next, we separate |H〉 and |V〉 into different paths and combine the |V〉 paths into |0, i〉, while leaving |H〉 in |k, i〉 (|k′, i〉). This procedure removes the path information of |H〉. In photonic systems, it can be realized simply with a non-polarizing beam splitter (NBS) and post-selection [31]. This completes the oracle operation.

For large m, the error of the amplitude satisfies

As m grows, the error becomes arbitrarily small. Both the time and the spatial complexity of the oracles are O(m).
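The oracle's accuracy argument rests on two facts that can be checked numerically. First, truncating the binary expansion of an angle after m digits leaves an error below $\pi\,2^{-m}$ under the scaling assumed here. Second, rotations about the same (y) axis commute and add, so one controlled rotation per digit yields a single rotation by the truncated angle. The corresponding equations are not reproduced in this text, so several choices in the sketch below are assumptions: the scaling $\theta=\pi\sum_j x_j 2^{-j}$ with $0\le\theta<\pi$, the real rotation convention, and all function names. The sign qubit is not modeled.

```python
import math

import numpy as np


def ry(phi):
    """Real rotation by phi in the {|H>, |V>} plane (assumed convention)."""
    c, s = math.cos(phi), math.sin(phi)
    return np.array([[c, -s], [s, c]])


def binary_digits(theta, m):
    """First m binary digits x_1..x_m of theta/pi, assuming 0 <= theta < pi."""
    t = theta / math.pi
    digits = []
    for _ in range(m):
        t *= 2.0
        bit = int(t)                    # 0 or 1
        digits.append(bit)
        t -= bit
    return digits


def truncated_angle(digits):
    """pi * sum_j x_j 2^-j: the angle represented by the control qubits."""
    return math.pi * sum(bit * 2.0 ** -j for j, bit in enumerate(digits, start=1))


def controlled_rotations(digits):
    """Apply ry(pi 2^-j) whenever control digit x_j = 1, for j = 1..m in order."""
    u = np.eye(2)
    for j, bit in enumerate(digits, start=1):
        if bit:
            u = ry(math.pi * 2.0 ** -j) @ u
    return u


theta = 1.0
for m in (4, 8, 16):
    digits = binary_digits(theta, m)
    amp_v = (controlled_rotations(digits) @ np.array([1.0, 0.0]))[1]  # start in |H>
    print(m, abs(amp_v - math.sin(theta)), math.pi * 2.0 ** -m)
```

Since $|\sin\theta-\sin\tilde\theta|\le|\theta-\tilde\theta|$, the amplitude error inherits the same $O(2^{-m})$ bound. This is consistent with reaching accuracy η in $O(m)=O(\log_2\eta^{-1})$ steps.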

Additional Information

How to cite this article: Zhang, X.-D. et al. Quantum hyperparallel algorithm for matrix multiplication. Sci. Rep. 6, 24910; doi: 10.1038/srep24910 (2016).