Abstract
Variational quantum algorithms are the leading candidate for advantage on near-term quantum hardware. When training a parametrized quantum circuit in this setting to solve a specific problem, the choice of ansatz is one of the most important factors that determines the trainability and performance of the algorithm. In quantum machine learning (QML), however, the literature on ansatzes that are motivated by the training data structure is scarce. In this work, we introduce an ansatz for learning tasks on weighted graphs that respects an important graph symmetry, namely equivariance under node permutations. We evaluate the performance of this ansatz on a complex learning task, namely neural combinatorial optimization, where a machine learning model is used to learn a heuristic for a combinatorial optimization problem. We analytically and numerically study the performance of our model, and our results strengthen the notion that symmetry-preserving ansatzes are a key to success in QML.
Introduction
Hybrid quantum-classical algorithms in which a parametrized quantum circuit (PQC) is optimized by a classical algorithm to solve a specific problem, also known as variational quantum algorithms^{1}, are expected to be the leading candidate for near-term quantum advantage due to their flexibility and the hope that their hybrid nature can make them robust to noise to some degree. These types of algorithms can be applied in a variety of contexts, and it is known that the right choice of circuit structure, also known as the ansatz, is of key importance for the performance of these models. Much work has been dedicated to understanding how circuits have to be structured to address problems in optimization^{2,3} or chemistry^{4,5}. For quantum machine learning, however, it is largely unknown which type of ansatz should be used for a given type of data. In the absence of an informed choice, general architectures such as the hardware-efficient ansatz^{6} are often used^{7}. It is known that ansatzes with randomly selected structure scale badly as the width and depth of the circuit grow, most prominently because of the barren plateau phenomenon^{8,9,10}, where the gradients of a PQC vanish exponentially as the system size grows and thus render training impossible. This situation can be compared to the early days of neural networks (NNs), where fully connected feedforward NNs were the standard architecture. These types of NNs suffer from similar trainability issues as random quantum circuits^{11}. Recent breakthroughs in deep learning were in part possible because more efficient architectures that are directly motivated by the training data structure have been developed^{12,13,14}. In fact, a whole field that studies the mathematical properties of successful NN architectures has emerged in the past decade, known as geometric deep learning.
This field studies the properties of common NN architectures, like convolutional NNs or graph NNs, through the lens of group theory and geometry and provides an understanding of why these structured types of models are the main drivers of recent advances in deep learning. The success of these models can largely be attributed to the fact that they preserve certain symmetries that are present in the training data. Graph NNs, for example, take graph-structured data as input and their layers are designed such that they respect one of two important graph symmetries: invariance or equivariance under permutation of vertices^{15}, as depicted in Fig. 1. Graph-structured data is ubiquitous in real-world problems, for example to predict properties of molecules^{13} or to solve combinatorial optimization problems^{16}. Even images can be viewed as special types of graphs, namely those defined on a lattice with nearest-neighbor connections. This makes graph NNs applicable in a multitude of contexts, and motivated a number of works that study quantum versions of these models^{17,18,19,20}. However, the key questions of how to design symmetry-preserving ansatzes motivated by a concrete input data structure and how these ansatzes perform compared to those that are structurally unrelated to the given learning problem remain open.
In this work, we address these open questions by introducing a symmetry-preserving ansatz for learning problems where the training data is given in the form of weighted graphs, and study its performance both numerically and analytically. To do this, we extend the family of ansatzes from ref. ^{20} to incorporate weighted edges of the input graphs and prove that the resulting ansatz is equivariant under node permutations. To evaluate this ansatz on a complex learning task where preserving a given symmetry can yield a significant performance advantage, we apply it in a domain where classical graph NNs have been used extensively: neural combinatorial optimization^{16}. In this setting, a model is trained to solve instances of a combinatorial optimization problem. Namely, we train our proposed ansatz to find approximate solutions to the Traveling Salesperson Problem (TSP). We numerically compare our ansatz to three non-equivariant ansatzes on instances with up to 20 cities (20 qubits), and show that the more the equivariance property of the ansatz is broken, the worse performance becomes, and that a simple hardware-efficient ansatz completely fails on this learning task. In addition, we analytically study the expressivity of our model at depth one, and show under which conditions there exists a parameter setting for any given TSP instance of arbitrary size for our ansatz that produces the optimal tour with the learning scheme that is applied in this work.
The neural combinatorial optimization approach presented in this work also provides an alternative method to employ near-term quantum computers to tackle combinatorial optimization problems. As problem instances are directly encoded into the circuit in the form of graphs without the need to specify a cost Hamiltonian, this approach is even more frugal than that of the quantum approximate optimization algorithm (QAOA)^{2} in terms of the requirement on the number of qubits and connectivity in cases where the problem encoding is non-trivial. For the TSP specifically, standard Hamiltonian encodings require n^{2} variables where n is the number of cities (or \(n\log (n)\) variables at the cost of increased circuit depth)^{21}, whereas our approach requires only n qubits and two-body interactions. We do note that the theoretical underpinnings and expected guarantees of performance of our method are very different and less rigorous than those of the QAOA, so the two are hard to compare directly. However, we establish a theoretical connection to the QAOA based on the structure of our ansatz, and in addition numerically compare QAOA performance on TSP instances with 5 cities to the performance of the proposed neural combinatorial optimization approach. We find that our ansatz at depth one outperforms the QAOA even at depth up to three. From a pragmatic point of view, linear scaling in qubit numbers with respect to the number of problem variables, as opposed to e.g., quadratic scaling as in the case of the TSP, dramatically changes the applicability of quantum algorithms in the near to mid-term.
Our work illustrates the merit of using symmetry-preserving ansatzes for QML on the example of graph-based learning, and underlines the notion that in order to successfully apply variational quantum algorithms for ML tasks in the future, the usage of ansatzes unrelated to the problem structure, which are popular in current QML research, is limited as problem sizes grow. This work motivates further study of “geometric quantum learning” in the vein of the classical field of geometric deep learning, to establish more effective ansatzes for QML, as these are a prerequisite to efficiently apply quantum models on any practically relevant learning task in the near-term.
Results
In this section, we formally introduce the structure of our equivariant quantum circuit (EQC) for learning tasks on weighted graphs that we use in this work. Examples of graph-structured data that can be used as input in this type of learning task are images^{14}, social networks^{22} or molecules^{13}. In general, when learning based on graph data, there are two sets of features: node features and edge features. Depending on the specific learning task, it might be enough to use only one set of these features as input data, and the specific implementation of the circuit will change accordingly. As mentioned above, an example of an ansatz for cases where encoding node features suffices is the family of ansatzes introduced in ref. ^{20}. In our case, we use both node and edge features to solve TSP instances. In case of the nodes, we encode whether a node (city) is already present in the partial tour at time step t to inform the node selection process described later in Definition 2. For the edges, we simply encode the edge weights of the graph as these correspond to the distances between nodes in the TSP instance’s graph. In this work, we use one qubit per node in the graph, but in general multiple qubits per node are also possible. We discuss the details of this in the supplementary material. We now proceed to define the ansatz in terms of encoding node information in the form of α (see Definition 1) and edge information in terms of the weighted graph edges \({\varepsilon }_{ij}\in {{{\mathcal{E}}}}\). For didactic reasons we relate the node and edge features to the concrete learning task that we seek to solve in this work; however, we note that this encoding scheme is applicable in the context of other learning tasks on weighted graphs as well.
Equivariant quantum circuit
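Given a graph \({{{\mathcal{G}}}}({{{\mathcal{V}}}},{{{\mathcal{E}}}})\) with node features α and weighted edges \({{{\mathcal{E}}}}\), and trainable parameters \({{{\boldsymbol{\beta }}}},{{{\boldsymbol{\gamma }}}}\in {{\mathbb{R}}}^{p}\), our ansatz at depth p is of the following form
$${\left\vert {{{\mathcal{E}}}},{{{\boldsymbol{\alpha }}}},{{{\boldsymbol{\beta }}}},{{{\boldsymbol{\gamma }}}}\right\rangle }_{p}={U}_{N}({{{\boldsymbol{\alpha }}}},{\beta }_{p}){U}_{G}({{{\mathcal{E}}}},{\gamma }_{p})\ldots {U}_{N}({{{\boldsymbol{\alpha }}}},{\beta }_{1}){U}_{G}({{{\mathcal{E}}}},{\gamma }_{1})\left\vert s\right\rangle ,\qquad (1)$$
where \(\left\vert s\right\rangle\) is the uniform superposition of bitstrings of length n,
$$\left\vert s\right\rangle =\frac{1}{\sqrt{{2}^{n}}}\sum _{x\in {\{0,1\}}^{n}}\left\vert x\right\rangle ,\qquad (2)$$
U_{N}(α, β_{j}) with \({{{\rm{Rx}}}}(\theta )={e}^{-i\frac{\theta }{2}X}\), is defined as
$${U}_{N}({{{\boldsymbol{\alpha }}}},{\beta }_{j})=\mathop{\bigotimes }\limits_{l=1}^{n}{{{\rm{Rx}}}}({\alpha }_{l}\cdot {\beta }_{j}),\qquad (3)$$
and \({U}_{G}({{{\mathcal{E}}}},{\gamma }_{j})\) is
$${U}_{G}({{{\mathcal{E}}}},{\gamma }_{j})=\exp (-i{\gamma }_{j}{H}_{{{{\mathcal{G}}}}}),\qquad (4)$$
with \({H}_{{{{\mathcal{G}}}}}={\sum }_{(i,j)\in {{{\mathcal{E}}}}}{\varepsilon }_{ij}{\sigma }_{z}^{(i)}{\sigma }_{z}^{(j)}\), where \({{{\mathcal{E}}}}\) are the edges of graph \({{{\mathcal{G}}}}\) weighted by ε_{ij}. A 5-qubit example of this ansatz can be seen in Fig. 2.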
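The layer structure described above can be sketched as a small statevector simulation. This is a minimal illustration, not the authors' implementation: it assumes that each layer applies \({U}_{G}\) first and \({U}_{N}\) second, that the node rotation angle is the product α_{l}·β_{j}, and that the sign conventions are Rx(θ) = exp(−iθX/2) and U_G = exp(−iγ_{j}H_G). The function names (`eqc_state`, `zz_diagonal`, `apply_single_qubit`) and the dictionary format for edge weights are hypothetical.

```python
import numpy as np

def rx(theta):
    """Single-qubit rotation Rx(theta) = exp(-i * theta / 2 * X) (assumed convention)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def zz_diagonal(n, weights):
    """Diagonal of H_G = sum_{(i,j)} eps_ij Z_i Z_j in the computational basis."""
    # bits[x, q] is the value of qubit q in basis state x (qubit 0 = most significant bit).
    bits = (np.arange(2 ** n)[:, None] >> np.arange(n - 1, -1, -1)) & 1
    z = 1 - 2 * bits  # Z eigenvalue: +1 for bit 0, -1 for bit 1
    diag = np.zeros(2 ** n)
    for (i, j), eps in weights.items():
        diag += eps * z[:, i] * z[:, j]
    return diag

def apply_single_qubit(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit statevector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    psi = np.moveaxis(psi, 0, qubit)  # put the contracted axis back in place
    return psi.reshape(-1)

def eqc_state(weights, alpha, beta, gamma):
    """Statevector after p = len(beta) layers acting on the uniform superposition."""
    n = len(alpha)
    state = np.full(2 ** n, 1 / np.sqrt(2 ** n), dtype=complex)  # |s>
    h_diag = zz_diagonal(n, weights)
    for b, g in zip(beta, gamma):
        state = np.exp(-1j * g * h_diag) * state  # U_G is diagonal
        for l in range(n):                        # U_N: one Rx per node
            state = apply_single_qubit(state, rx(alpha[l] * b), l, n)
    return state

# Three-node example: node 0 is already in the tour (alpha = 0), nodes 1 and 2 are available.
weights = {(0, 1): 0.5, (0, 2): 1.2, (1, 2): 0.8}
alpha = np.array([0.0, np.pi, np.pi])
state = eqc_state(weights, alpha, beta=[0.3, -0.4], gamma=[0.7, 0.1])
print(np.linalg.norm(state))
```

Because \({H}_{{{{\mathcal{G}}}}}\) is diagonal, the edge layer reduces to an element-wise phase, which keeps the sketch short; a dense simulation like this is only practical for small n.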
For p = 1, we have
where \({{{\rm{diag}}}}{({Z}_{i}{Z}_{j})}_{\left\vert x\right\rangle }=\pm 1\) is the entry in the matrix corresponding to each Z_{i}Z_{j} term, e.g., I_{1} ⊗ … ⊗ Z_{i} ⊗ I_{k} ⊗ … ⊗ Z_{j} ⊗ … ⊗ I_{n}, corresponding to the basis state \(\left\vert x\right\rangle\). (E.g., the first term on the diagonal corresponds to the all-zero state, and so on.) We see that the first group of terms, denoted weighted bitflip terms, is a sum over products of terms that encode the node features. In other words, in the one-qubit case we get a sum over sine and cosine terms, in the two-qubit case we get a sum over products of pairs of sine and cosine terms, and so on. The second part of the equation, denoted edge weights, is the exponential of a sum over edge weight terms. As we start in the uniform superposition, each basis state’s amplitude depends on all node and edge features, but with different signs and therefore different terms interfering constructively and destructively for every basis state. This can be regarded as a quantum version of the aggregation functions used in classical graph NNs, where the k-th layer of a NN aggregates information over the k-local neighborhood of the graph in a permutation equivariant way^{23}. In a similar fashion, the terms in Eq. (5) aggregate node and edge information and become more complex with each additional layer in the PQC.
The reader may already have observed that this ansatz is closely related to an ansatz that is well-known in quantum optimization: that of the quantum approximate optimization algorithm^{2}. Indeed, our ansatz can be seen as a special case of the QAOA, where instead of using a cost Hamiltonian to encode the problem, we directly encode instances of graphs and apply the “mixer terms” in Eq. (3) only to nodes not yet in the partial tour. This correspondence will later let us use known results for QAOA-type ansatzes at depth one^{24} to derive exact analytical forms of the expectation values of our ansatz, and use these to study its expressivity.
As our focus is on implementing an ansatz that respects a symmetry that is useful in graph learning tasks, namely an equivariance under permutation of vertices of the input graph, we now show that each part of our ansatz respects this symmetry.
Theorem 1
(Permutation equivariance of the ansatz). Let the ansatz of depth p be of the type as defined in Eq. (1) with initial state \({\left\vert +\right\rangle }^{\otimes n}\) and parameters \({{{\boldsymbol{\beta }}}},{{{\boldsymbol{\gamma }}}}\in {{\mathbb{R}}}^{p}\), that represents an instance of a graph \({{{\mathcal{G}}}}\) with nodes \({{{\mathcal{V}}}}\) and the list of edges \({{{\mathcal{E}}}}\) with corresponding edge weights ε_{ij}, and node features \({{{\boldsymbol{\alpha }}}}\in {{\mathbb{R}}}^{n}\) with \(n=| {{{\mathcal{V}}}}| \). Let σ be a permutation of the vertices in \({{{\mathcal{V}}}}\), \({P}_{\sigma }\in {{\mathbb{B}}}^{n\times n}\) the corresponding permutation matrix that acts on the weighted adjacency matrix A of \({{{\mathcal{G}}}}\), and \({\widetilde{P}}_{\sigma }\in {{\mathbb{B}}}^{{2}^{n}\times {2}^{n}}\) a matrix that maps the tensor product \(\left\vert {v}_{1}\right\rangle \otimes \left\vert {v}_{2}\right\rangle \otimes \ldots \otimes \left\vert {v}_{n}\right\rangle\) with \(\left\vert {v}_{i}\right\rangle \in {{\mathbb{C}}}^{2}\) to \(\left\vert {v}_{{\widetilde{p}}_{\sigma }(1)}\right\rangle \otimes \left\vert {v}_{{\widetilde{p}}_{\sigma }(2)}\right\rangle \otimes \ldots \otimes \left\vert {v}_{{\widetilde{p}}_{\sigma }(n)}\right\rangle\). Then, the following relation holds,
where \({{{{\mathcal{E}}}}}_{(\cdot )}\) denotes a specific permutation of the adjacency matrix A of the given graph. We call an ansatz that satisfies this property permutation equivariant.
As mentioned before, our ansatz is closely related to those in ref. ^{20}, whose authors prove permutation equivariance of unitaries that are defined in terms of unweighted adjacency matrices of graphs. In order to prove equivariance of our circuit, we have to generalize their result to the case where a weighted graph is encoded in the form of a Hamiltonian, and parametrized by a set of free parameters as described in Eq. (1). In the non-parametrized case this is trivial, as edge weights and node features are directly permuted as a consequence of the permutation of the graph. When introducing parameters to the node and edge features, however, we have to make sure that the parameters themselves preserve equivariance, as the parameters are not tied to the adjacency matrix but to the circuit itself. To guarantee this, we make the parametrization itself permutation invariant by assigning one node and one edge parameter per layer, which leads to the QAOA-type parametrization shown in Eq. (1). Another difference between our proof and that in ref. ^{20} is that we consider a complete circuit including its initial state, instead of only guaranteeing that the unitaries that act on the initial state are permutation equivariant. We provide the detailed proof of equivariance of our ansatz in the supplementary material.
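The equivariance property can also be checked numerically on a small instance. The sketch below is an illustration under stated assumptions rather than the proof from the supplementary material: it assumes the layer order U_G then U_N, rotation angles α_{l}·β_{j}, the conventions Rx(θ) = exp(−iθX/2) and U_G = exp(−iγ_{j}H_G), and the relabelling convention "new node k is old node perm[k]". The names `eqc_state` and `permute_state` are hypothetical.

```python
import numpy as np

def eqc_state(adj, alpha, beta, gamma):
    """Statevector of the layered circuit for a weighted adjacency matrix `adj`
    (symmetric, zero diagonal) and node features `alpha`."""
    n = len(alpha)
    bits = (np.arange(2 ** n)[:, None] >> np.arange(n - 1, -1, -1)) & 1
    z = 1 - 2 * bits
    # Diagonal of H_G; the factor 1/2 undoes double counting of undirected edges.
    h_diag = 0.5 * np.einsum("xi,ij,xj->x", z, adj, z)
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # uniform superposition
    for b, g in zip(beta, gamma):
        state = np.exp(-1j * g * h_diag) * state            # edge layer
        psi = state.reshape([2] * n)
        for l in range(n):                                  # node layer
            c, s = np.cos(alpha[l] * b / 2), np.sin(alpha[l] * b / 2)
            rx = np.array([[c, -1j * s], [-1j * s, c]])
            psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [l])), 0, l)
        state = psi.reshape(-1)
    return state

def permute_state(state, perm):
    """Reorder qubits so that new qubit k carries what was on old qubit perm[k]."""
    n = len(perm)
    return np.transpose(state.reshape([2] * n), axes=tuple(perm)).reshape(-1)

rng = np.random.default_rng(0)
n = 4
coords = rng.random((n, 2))
adj = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)  # Euclidean distances
alpha = np.array([0.0, np.pi, np.pi, 0.0])
beta, gamma = rng.normal(size=2), rng.normal(size=2)              # depth p = 2
perm = rng.permutation(n)

# Circuit built from the permuted graph versus permuted output of the original circuit.
lhs = eqc_state(adj[np.ix_(perm, perm)], alpha[perm], beta, gamma)
rhs = permute_state(eqc_state(adj, alpha, beta, gamma), perm)
equivariant = np.allclose(lhs, rhs)
print(equivariant)
```

The check relies on the two ingredients named in the text: a permutation-invariant initial state and a single shared β_{j} and γ_{j} per layer. Giving each qubit its own trainable angle would generally break the equality.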
The above definition and proof are given in terms of a learning problem where we map one vertex to one qubit directly. However, settings where we require more than one qubit to encode node information are easily possible with this type of architecture as well. In order to preserve equivariance of our ansatz construction, three conditions have to hold: (i) the initial state of the circuit has to be permutation invariant or equivariant, (ii) the two-qubit gates used to encode edge weights have to commute, (iii) the parametrization of the gates has to be permutation invariant. In the case where each vertex or edge is represented with more than just one gate per layer, one has freedom on how to do this as long as the above (i)–(iii) still hold. A simple example is when each vertex is represented by m qubits: (i) the initial state remains the uniform superposition, (ii) the topology of the two-qubit gates that represent edges has to be changed according to the addition of the new qubits, but ZZ-gates can still be used to encode the information, (iii) the parametrization is the same as in the one-qubit-per-vertex case.
Trainability of ansatz
Our goal in this work is to introduce a problem-tailored ansatz for a specific data type that provides trainability advantages compared to unstructured ansatzes. One important question that arises in this context is that of barren plateaus, where the variance of derivatives for random circuits vanishes exponentially with the system size^{8}. This effect poses challenges for scaling up circuit architectures like the hardware-efficient ansatz^{6}, as even at a modest number of qubits and layers a quantum model like this can become untrainable^{9,10,25}. Therefore it is important to address the presence of barren plateaus when introducing an ansatz. In a recent work^{26}, it has been proven that barren plateaus are not present in circuits that are equivariant under the symmetric group S_{n}, namely the group of permutations on n elements, in this case all permutations over the qubits. While our circuit is also permutation equivariant, we define permutations based on the input graphs and not the qubits themselves, so our approach differs from the equivariant quantum neural networks in ref. ^{26} as (a) the incorporation of edge weights into the unitaries prevents the unitaries from commuting with all possible permutations of qubits, and (b) multiple qubits can potentially correspond to one vertex. While permutation equivariance poses some restrictions on the expressibility of the ansatz and one would expect a better scaling of gradients than in, e.g., hardware-efficient types of circuits, the results of ref. ^{26} do not directly translate to our work for the above reasons.
To get additional insight, one can also turn to results on barren plateaus related to QAOA-type circuits, due to the structural similarity that our ansatz has to them. The authors of ref. ^{27} investigate the scaling of the variance of gradients of two related types of ansatzes. They characterize ansatzes given by the following two Hamiltonians: the transverse field Ising model (TFIM),
where n_{f} = n − 1 (n_{f} = n) for open (periodic) boundary conditions, and a spin glass (SG),
with h_{i}, J_{ij} drawn from a Gaussian distribution. Based on the generators of those two ansatzes, the authors of ref. ^{27} show that an ansatz that consists of layers given by the TFIM Hamiltonian has a favorable scaling of gradients. An ansatz that consists of layers given by H_{SG}, on the other hand, does not. Considering the results for the two above Hamiltonians, one can expect that whether our ansatz exhibits barren plateaus will strongly depend on the encoded graphs, i.e., the connectivity, edge weights and node features. Which types of graphs lead to a favorable scaling of gradients, and for what learning tasks our ansatz exhibits good performance at a number of layers polynomial in the input size, is an interesting question that we leave for future work.
In addition to barren plateaus that are a result of the randomness of the circuit, there is a type of barren plateau that is caused by hardware noise, called noise-induced barren plateaus (NIBPs)^{28}. This problem cannot be directly mitigated by the choice of circuit architecture, as eventually all circuit architectures are affected by hardware noise, especially when they become deeper. We do not expect that our circuit is resilient to NIBPs. However, the numerical results in Section II E show that the EQC already performs well with only one layer for the environment we study in this work as we scale up the problem size. This provides hope that, at least in terms of circuit depth, the EQC will scale favorably in the number of layers as the number of qubits in the circuit is increased, and therefore the effect of NIBPs will be less severe than for other circuit architectures with the same number of qubits.
Another important question for the training of ML models is that of data efficiency, i.e., how many training data points are required to achieve a low generalization error. Indeed, one of the key motivating factors behind the design of geometric models that preserve symmetries in the training data is to reduce the size of the training data set. In the classical literature, it was shown that geometric models require fewer training data and as a result often fewer parameters than models that do not preserve said symmetries^{29}. Recent work showed that this is also true for S_{n}-equivariant quantum models^{26}, where the authors give an improved bound on the generalization error compared to the bounds that were previously shown to exist for general classes of PQCs^{30}. However, the results from ref. ^{26} again do not directly translate to our approach, as stated in the context of barren plateaus above.
Quantum neural combinatorial optimization with the EQC
Combinatorial optimization problems are ubiquitous, be it in transportation and logistics, electronics, or scheduling tasks. These types of problems have also been studied in computer science and mathematics for decades. Many interesting combinatorial optimization problems that are relevant in industry today are NP-hard, so that no general efficient solution is expected to exist. For this reason, heuristics have gained much popularity, as they often provide high-quality solutions to real-world instances of many NP-hard problems. However, good heuristics require domain expertise in their design and they have to be defined on a per-problem basis. To circumvent handcrafting heuristic algorithms, machine learning approaches for solving combinatorial optimization problems have been studied. One line of research in this area investigates using NNs to learn algorithms for solving combinatorial optimization problems^{16,31}, which is known as neural combinatorial optimization (NCO). Here, NNs learn to solve combinatorial optimization problems based on data, and can then be used to find approximate solutions to arbitrary instances of the same problem. First approaches in this direction used supervised learning to find approximate solutions based on NN techniques from natural language processing^{32}. A downside of the supervised approach is that it requires access to a large amount of training data in the form of solved instances of the given problem, which requires solving many NP-hard instances of the problem to completion. At large problem sizes, this is a serious impediment for the practicability of this method. For this reason, reinforcement learning (RL) was introduced as a technique to train these heuristics. In RL, an agent does not learn based on a given data set, but by interacting with an environment and gathering information in the form of rewards in a trial-and-error fashion.
These RL-based approaches have been shown to successfully solve even instances of significant size in problems with a geometric structure like the convex hull problem^{33}, chip placement^{34} or the vehicle routing problem^{35}. To implement NCO in this work we use an RL method called Q-learning following ref. ^{33}, details of which are presented in Section METHODS.
Our goal is to use the ansatz described in Section Equivariant quantum circuit to train a model that, once trained, implements a heuristic to produce tours for previously unseen instances of the TSP. The TSP consists of finding a permutation of a set of cities such that the resulting length of a tour visiting each city in this sequence is minimal. The heuristic takes as input an instance of the TSP in the form of a weighted 2D Euclidean graph \({{{\mathcal{G}}}}({{{\mathcal{V}}}},{{{\mathcal{E}}}})\) with \(n=| {{{\mathcal{V}}}}| \) vertices representing the cities and edge weights ε_{ij} = d(v_{i}, v_{j}), where d(v_{i}, v_{j}) is the Euclidean distance between nodes v_{i} and v_{j}. Specifically, we are dealing with the symmetric TSP, where the edges in the graph are undirected. Given \({{{\mathcal{G}}}}\), the algorithm constructs a tour in n − 2 steps. Starting from a given (fixed) node in the proposed tour T_{t=1}, in each step t of the tour selection process the algorithm proposes the next node (city) in the tour. Once the second-to-last node has been added to the tour, the last one is also directly added, hence the tour selection process requires n − 2 steps. This can also be viewed as the process of successively marking nodes in a graph as they are added to a tour. In order to refer to versions of the input graph at different time steps where the nodes that are already present in the tour are marked, we now define the annotated graph.
Definition 1
(Annotated graph). For a graph \({{{\mathcal{G}}}}({{{\mathcal{V}}}},{{{\mathcal{E}}}})\), we call \({{{\mathcal{G}}}}({{{\mathcal{V}}}},{{{\mathcal{E}}}},{{{{\boldsymbol{\alpha }}}}}^{(t)})\) the annotated graph at time step t. The vector α^{(t)} ∈ {0, π}^{n} specifies which nodes are already in the tour T_{t} (\({\alpha }_{i}^{(t)}=0\)) and which nodes are still available for selection (\({\alpha }_{i}^{(t)}=\pi\)).
In each time step of an episode in the algorithm, the model is given an annotated graph as input. Based on the annotated graph, the model should select the next node to add to the partial tour T_{t} at step t. The annotation can be used to partition the nodes \({{{\mathcal{V}}}}\) into the set of available nodes \({{{{\mathcal{V}}}}}_{a}=\{{v}_{i}\mid {\alpha }_{i}^{(t)}=\pi \}\) and the set of unavailable nodes \({{{{\mathcal{V}}}}}_{u}=\{{v}_{i}\mid {\alpha }_{i}^{(t)}=0\}\). The node selection process can now be defined as follows.
Definition 2
(Node selection). Given an annotated graph \({{{\mathcal{G}}}}({{{\mathcal{V}}}},{{{\mathcal{E}}}},{{{{\boldsymbol{\alpha }}}}}^{(t)})\), the node selection process consists of selecting nodes in a tour in a stepwise fashion. To add a node to the partial tour T_{t}, the next node is selected from the set of available nodes \({{{{\mathcal{V}}}}}_{a}\). The unavailable nodes \({{{{\mathcal{V}}}}}_{u}\) are ignored in this process.
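The stepwise construction from Definitions 1 and 2 can be sketched as a short loop. In this illustration the learned model is replaced by a hypothetical scoring function `nearest_neighbor_scores`, which simply prefers short edges; it stands in for the model's output and is not the method used in this work. The names `construct_tour` and `score_fn` are likewise assumptions made for the example.

```python
import numpy as np

def construct_tour(dist, score_fn, start=0):
    """Build a tour node by node, tracking the annotation vector alpha."""
    n = len(dist)
    alpha = np.full(n, np.pi)  # pi: node is still available
    alpha[start] = 0.0         # 0: node is already in the tour
    tour = [start]
    for _ in range(n - 2):     # n - 2 selection steps
        available = np.flatnonzero(alpha == np.pi)
        scores = score_fn(dist, alpha, tour[-1])
        nxt = int(available[np.argmax(scores[available])])  # unavailable nodes are ignored
        tour.append(nxt)
        alpha[nxt] = 0.0
    # Exactly one node is left; it is appended without a further selection step.
    tour.append(int(np.flatnonzero(alpha == np.pi)[0]))
    return tour

def nearest_neighbor_scores(dist, alpha, last):
    """Stand-in for learned scores: a shorter edge from the last node scores higher."""
    return -dist[last]

# Four cities on a line at x = 0, 3, 1, 2.
x = np.array([0.0, 3.0, 1.0, 2.0])
dist = np.abs(x[:, None] - x[None, :])
tour = construct_tour(dist, nearest_neighbor_scores)
print(tour)
```

Any function with the same signature can be passed as `score_fn`, so the loop itself does not depend on how the scores are produced.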
After n − 2 steps, the model has produced a tour T_{n}. A depiction of this process can be found in Fig. 3. To assess the quality of the generated tour, we compare the tour length c(T_{n}) to the length of the optimal tour c(T^{*}), where
$$c(T)=\sum _{\{i,j\}\in {{{{\mathcal{E}}}}}_{T}}{\varepsilon }_{ij}\qquad (9)$$
is the sum of edge weights (distances) for all edges between the nodes in the tour, with \({{{{\mathcal{E}}}}}_{T}\subset {{{\mathcal{E}}}}\). We measure the quality of the generated tour in the form of the approximation ratio
$$\frac{c({T}_{n})}{c({T}^{* })}.\qquad (10)$$
In order to perform Q-learning we need to define a reward function that provides feedback to the RL agent on the quality of its proposed tour. The rewards in this environment are defined by the difference in overall length of the partial tour T_{t} at time step t, and upon addition of a given node v_{l} at time step t + 1:
Note that we use the negative of the cost as a reward, as a Q-learning agent will always select the action that leads to the maximum expected reward.
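The tour cost, approximation ratio, and step reward described above can be sketched as follows. Two points are assumptions made for this illustration: the length of a partial tour is taken to be the length of the open path visited so far (so the reward for adding a node is the negative length of the newly added edge), and the optimal tour is found by brute force, which is only feasible for very small n. The function names are hypothetical.

```python
import itertools
import numpy as np

def path_length(dist, nodes, closed=False):
    """Sum of edge weights along `nodes`; with closed=True the return edge is included."""
    total = sum(dist[i, j] for i, j in zip(nodes[:-1], nodes[1:]))
    if closed and len(nodes) > 1:
        total += dist[nodes[-1], nodes[0]]
    return total

def step_reward(dist, partial_tour, new_node):
    """Negative increase in partial-tour length when `new_node` is appended (assumed form)."""
    before = path_length(dist, partial_tour)
    after = path_length(dist, list(partial_tour) + [new_node])
    return before - after

def approximation_ratio(dist, tour):
    """Length of `tour` divided by the brute-force optimal tour length."""
    n = len(dist)
    best = min(path_length(dist, (0,) + p, closed=True)
               for p in itertools.permutations(range(1, n)))
    return path_length(dist, tour, closed=True) / best

# Four cities on the corners of the unit square.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

print(step_reward(dist, [0], 1))                # reward for adding node 1 after node 0
print(approximation_ratio(dist, [0, 1, 2, 3]))  # tour around the perimeter
print(approximation_ratio(dist, [0, 2, 1, 3]))  # tour that crosses both diagonals
```

Fixing node 0 as the first city in the brute-force search loses no generality, since a closed tour has the same length from any starting point.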
The learning process is defined in terms of a DQN algorithm, where the Q-function approximator is implemented in the form of a PQC (which is described in detail in Section II A). Here, we define the TSP in terms of an RL environment, where the set of states \({{{\mathcal{S}}}}=\{{{{{\mathcal{G}}}}}_{i}({{{\mathcal{V}}}},{{{\mathcal{E}}}},{{{{\boldsymbol{\alpha }}}}}^{(t)})\,{{{\rm{for}}}}\,i=1,\ldots ,| {{{\mathcal{X}}}}| \,{{{\rm{and}}}}\,t=1,\ldots ,n-1\}\) consists of all possible annotated graphs (i.e., all possible configurations of values of α^{(t)}) for each instance i in the training set \({{{\mathcal{X}}}}\). This means that the number of states in this environment is \(| {{{\mathcal{S}}}}| ={2}^{n-1}| {{{\mathcal{X}}}}| \). The action that the agent is required to perform is selecting the next node in each step of the node selection process described in Definition 2, so the action space \({{{\mathcal{A}}}}\) consists of a set of indices for all but the first node in each instance (as we always start from the first node in terms of the list of nodes we are presented with for each graph, so \({\alpha }_{1}^{(t)}=0,\,\forall \,t\)), and \(| {{{\mathcal{A}}}}| =n-1\).
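The state count follows directly from the annotation: the first entry of α is fixed to 0, and each of the remaining n − 1 entries is either 0 or π. A minimal sketch that enumerates these configurations for one instance (the helper name `annotations` is hypothetical):

```python
import itertools
import math

def annotations(n):
    """All annotation vectors for n nodes when the first node is always in the tour."""
    return [(0.0,) + rest for rest in itertools.product((0.0, math.pi), repeat=n - 1)]

n = 4
configs = annotations(n)
num_training_instances = 10  # |X|, chosen arbitrarily for the example
num_states = len(configs) * num_training_instances
num_actions = n - 1          # every node except the fixed start node
print(len(configs), num_states, num_actions)
```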
The Q-function approximator gets as input an annotated graph, and returns as output the index of the node that should next be added to the tour. Which index this is, is decided in terms of measuring an observable corresponding to each of the available nodes \({{{{\mathcal{V}}}}}_{a}\). Depending on the last node added to the partial tour, denoted as v_{t−1}, the observable for each available node v_{l} is defined as
weighted by the edge weight \({\varepsilon }_{{v}_{t-1},{v}_{l}}\), and the Q-value corresponding to each action is
where the exact form of \({\left\vert {{{\mathcal{E}}}},{{{{\boldsymbol{\alpha }}}}}^{(t)},{{{\boldsymbol{\beta }}}},{{{\boldsymbol{\gamma }}}}\right\rangle }_{p}\) is described in Section Equivariant quantum circuit. The node that is added to the tour next is the one with the highest Q-value,
All unavailable nodes \({v}_{l}\in {{{{\mathcal{V}}}}}_{u}\) are not included in the node selection process, so we manually set their Q-values to a large negative number to exclude them, e.g.,
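The masking step can be sketched in a few lines. The Q-values here are placeholder numbers rather than measured expectation values, and the mask value of −10^4 is an arbitrary choice for the example; the function name `select_next_node` is hypothetical.

```python
import numpy as np

def select_next_node(q_values, alpha, mask_value=-1e4):
    """Return the index of the available node with the highest Q-value."""
    q = np.array(q_values, dtype=float)   # copy, so the caller's values stay untouched
    q[np.isclose(alpha, 0.0)] = mask_value  # alpha = 0 marks nodes already in the tour
    return int(np.argmax(q))

alpha = np.array([0.0, np.pi, np.pi, 0.0])  # nodes 0 and 3 are already in the tour
q_values = [5.0, 0.2, -0.3, 0.9]            # placeholder values for illustration
next_node = select_next_node(q_values, alpha)
print(next_node)
```

The mask only needs to be lower than any Q-value the model can produce, so that an unavailable node is never the argmax.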
We also define a stopping criterion for our algorithm, which corresponds to the agent solving the TSP environment for a given instance size. As we aim at comparing the results of our algorithm to optimal solutions in this work, we have access to a labeled set of instances and define our stopping criterion based on these. However, note that the optimal solutions are not required for training, as a stopping criterion can also be defined in terms of number of episodes or other figures of merit that are not related to the optimal solution. In this work, the environment is considered as solved and training is stopped when the average approximation ratio of the past 100 iterations is <1.05, where an approximation ratio of 1 means that the agent returns the optimal solution for the instances it was presented with in the past 100 episodes. We do not set the stopping criterion at optimality for two reasons: (i) it is unlikely that the algorithm finds a parameter setting that universally produces the optimal tour for all training instances, and (ii) we want to avoid overfitting on the training data set. If the agent does not fulfill the stopping criterion, the algorithm will run until a predefined number of episodes is reached. In our numerical results shown in Section Numerical results, however, most agents do not reach the stopping criterion of having an average approximation ratio below 1.05, and run for the predefined number of episodes instead. Our goal is to generate a model that is, once fully trained, capable of solving previously unseen instances of the TSP.
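The stopping rule can be sketched with a fixed-size window over the most recent approximation ratios. The callable `run_episode`, which is assumed to run one training episode and return its approximation ratio, is hypothetical, as are the function and parameter names.

```python
from collections import deque

def train_until_solved(run_episode, max_episodes, window=100, threshold=1.05):
    """Stop once the mean approximation ratio over the last `window` episodes
    drops below `threshold`; otherwise run for `max_episodes` episodes."""
    recent = deque(maxlen=window)  # keeps only the most recent ratios
    for episode in range(1, max_episodes + 1):
        recent.append(run_episode(episode))
        if len(recent) == window and sum(recent) / window < threshold:
            return episode, True
    return max_episodes, False

# Toy runs: an agent that is always optimal, and one that never improves.
print(train_until_solved(lambda episode: 1.0, max_episodes=500))
print(train_until_solved(lambda episode: 1.2, max_episodes=300))
```

Computing the ratio requires the optimal tour length, so a criterion based on the episode count alone is the one that remains available when no labeled instances exist.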
We showed in Section Equivariant quantum circuit that our ansatz of arbitrary depth is permutation equivariant. Now we proceed to show that the Q-values that are generated from measurements of this PQC, and the tour generation process as described in Section Quantum neural combinatorial optimization with the EQC are equivariant as well. While the equivariance of all components of an algorithm is not a prerequisite to harness the advantage gained by an equivariant model, knowing which parts of our learning strategy fulfill this property provides additional insight for studying the performance of our model later. As we show that the whole node selection process is equivariant, we know that the algorithm will always generate the same tour for every possible permutation of the input graph for a fixed setting of parameters, given that the model underlying the tour generation process is equivariant. This is not necessarily true for a non-equivariant model, and simply by virtue of giving a permuted graph as input, the algorithm can potentially return a different tour.
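The consequence for tours can be illustrated with a classical stand-in. In the sketch below the scores are simply negative distances from the last node, which is an equivariant scoring rule chosen for the example and not the model studied in this work. The relabelling convention "new node k is old node perm[k]" and all function names are assumptions.

```python
import numpy as np

def greedy_tour(dist, start):
    """Build a tour by repeatedly taking the highest-scoring available node."""
    n = len(dist)
    available = np.ones(n, dtype=bool)
    available[start] = False
    tour = [start]
    while available.any():
        # Equivariant stand-in scores; unavailable nodes are masked out.
        scores = np.where(available, -dist[tour[-1]], -np.inf)
        nxt = int(np.argmax(scores))
        tour.append(nxt)
        available[nxt] = False
    return tour

rng = np.random.default_rng(1)
n = 6
coords = rng.random((n, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

perm = rng.permutation(n)   # new node k is old node perm[k]
inverse = np.argsort(perm)  # old node v is new node inverse[v]
dist_perm = dist[np.ix_(perm, perm)]

tour = greedy_tour(dist, start=0)
tour_perm = greedy_tour(dist_perm, start=int(inverse[0]))
relabelled = [int(perm[k]) for k in tour_perm]  # translate back to the old labels
print(tour, relabelled)
```

A scoring rule that depends on the node index itself, for instance by adding a bias per position in the input list, would in general produce different tours for the two inputs.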
Theorem 2
(Equivariance of Q-values). Let \(Q({{{\mathcal{G}}}}({{{\mathcal{V}}}},{{{\mathcal{E}}}},{{{\boldsymbol{\alpha }}}}),{v}_{l})=Q({{{\mathcal{G}}}},{v}_{l})\) be a Q-value as defined in Eq. (13), where we drop instance-specific sub- and superscripts for brevity. Let σ be a permutation of \(n=| {{{\mathcal{V}}}}|\) elements, where the lth element corresponds to the lth vertex v_{l}, and let σ_{Q} be a permutation that reorders the set of Q-values \(Q({{{\mathcal{G}}}})=\{Q({{{\mathcal{G}}}},{v}_{1}),\ldots ,Q({{{\mathcal{G}}}},{v}_{n})\}\) in correspondence to the reordering of the vertices by σ. Then the Q-values \(Q({{{\mathcal{G}}}})\) are permutation equivariant,
\[Q({{{{\mathcal{G}}}}}_{\sigma })={\sigma }_{Q}\left(Q({{{\mathcal{G}}}})\right),\]
where \({{{{\mathcal{G}}}}}_{\sigma }\) is the permuted graph.
Proof
We know from Theorem 1 that the ansatz we use, and therefore the expectation values \(\langle {O}_{{v}_{l}}\rangle\), are permutation equivariant. The Q-values are defined as \(Q({{{\mathcal{G}}}},{v}_{l})={\varepsilon }_{ij}\langle {O}_{{v}_{l}}\rangle\) (see Eq. (13)) and therefore additionally depend on the edge weights of the graph \({{{\mathcal{G}}}}\). The edge weights are computed according to the graph’s adjacency matrix, so under a permutation of the vertices they are reordered in the same way as the expectation values and remain assigned to their corresponding permuted expectation values. Each product of edge weight and expectation value, and hence each Q-value, is therefore reordered according to σ_{Q}. □
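The statement of Theorem 2 can be checked numerically on a small example. The sketch below is not the implementation used in this work: it assumes a depth-one circuit consisting of a layer of Hadamards, ZZ-rotations weighted by the edge weights, and X-rotations weighted by node features, and it assumes Q-values of the form edge weight times ⟨Z_last Z_v⟩. The function names, the random graph, and the parameter values are all hypothetical choices made for illustration.

```python
import numpy as np


def z_eigenvalues(n):
    """z[k, q] is the Z eigenvalue (+1 or -1) of qubit q in basis state k."""
    k = np.arange(2 ** n)
    return 1 - 2 * ((k[:, None] >> np.arange(n)[None, :]) & 1)


def q_values(w, alpha, beta, gamma, last):
    """Q-values of an assumed depth-one circuit; `last` is the last node of the partial tour."""
    n = len(alpha)
    z = z_eigenvalues(n)
    # Uniform superposition, i.e. a layer of Hadamards applied to |0...0>.
    psi = np.full(2 ** n, 2.0 ** (-n / 2), dtype=complex)
    # Edge layer: exp(-i * gamma * w_ij * Z_i Z_j) is diagonal in this basis.
    phase = np.zeros(2 ** n)
    for i in range(n):
        for j in range(i + 1, n):
            phase += w[i, j] * z[:, i] * z[:, j]
    psi = psi * np.exp(-1j * gamma * phase)
    # Node layer: Rx(beta * alpha_q) applied to every qubit q.
    psi = psi.reshape([2] * n)
    for q in range(n):
        half = beta * alpha[q] / 2
        rx = np.array([[np.cos(half), -1j * np.sin(half)],
                       [-1j * np.sin(half), np.cos(half)]])
        axis = n - 1 - q  # bit q of the basis index lives on this tensor axis
        psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [axis])), 0, axis)
    probs = np.abs(psi.reshape(-1)) ** 2
    # Assumed Q-value: edge weight times the ZZ-correlation with the last node.
    return np.array([w[last, v] * np.sum(probs * z[:, last] * z[:, v])
                     for v in range(n)])


rng = np.random.default_rng(0)
n = 4
w = np.triu(rng.uniform(0.1, 1.0, size=(n, n)), 1)
w = w + w.T                               # symmetric weights, zero diagonal
alpha = rng.uniform(0.0, np.pi, size=n)   # hypothetical node features
beta, gamma, last = 0.7, 0.3, 0

perm = rng.permutation(n)                 # old vertex l becomes vertex perm[l]
w_perm = np.zeros_like(w)
w_perm[np.ix_(perm, perm)] = w
alpha_perm = np.zeros_like(alpha)
alpha_perm[perm] = alpha

q = q_values(w, alpha, beta, gamma, last)
q_perm = q_values(w_perm, alpha_perm, beta, gamma, perm[last])
equivariant = np.allclose(q_perm[perm], q)  # Q(G_sigma) is Q(G) reordered
print(equivariant)
```

Because the check only relabels qubits, it should hold for any choice of `beta`, `gamma`, and graph under the stated assumptions; it is a sanity check and not a substitute for the proof.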
As a second step, to show that all components of our algorithm are permutation equivariant, it remains to show that the tours that our model produces as described in Section Quantum neural combinatorial optimization with the EQC are also permutation equivariant.
Corollary 1
(Equivariance of tours). Let \(T({{{\mathcal{G}}}},{{{\boldsymbol{\beta }}}},{{{\boldsymbol{\gamma }}}},{v}_{0})\) be a tour generated by a permutation equivariant agent implemented with a PQC as defined in Eq. (1) and Q-values as defined in Eq. (13), for a fixed set of parameters β, γ and a given start node v_{0}, where a tour is a cycle over all vertices \({v}_{l}\in {{{\mathcal{V}}}}\) that contains each vertex exactly once. Let σ be a permutation of the vertices \({{{\mathcal{V}}}}\), and σ_{T} a permutation that reorders the vertices in the tour accordingly. Then the output tour is permutation equivariant,
\[T({{{{\mathcal{G}}}}}_{\sigma },{{{\boldsymbol{\beta }}}},{{{\boldsymbol{\gamma }}}},{v}_{\sigma (0)})={\sigma }_{T}\left(T({{{\mathcal{G}}}},{{{\boldsymbol{\beta }}}},{{{\boldsymbol{\gamma }}}},{v}_{0})\right).\]
Proof
We have shown in Theorem 2 that the Q-values of our model are permutation equivariant, meaning that a permutation of vertices results in a reordering of Q-values to different indices. Action selection is done by \({v}_{t+1}={{{{\rm{argmax}}}}}_{v}Q({{{{\mathcal{G}}}}}_{i}^{(t)},v)\), and the node at the index corresponding to the largest Q-value is chosen. To generate a tour, the agent starts at a given node v_{0} and sequentially selects the following n − 1 vertices. Upon a permutation of the input graph, the tour now starts at another node index v_{σ(0)}. Each step in the selection process can now be seen w.r.t. the original graph \({{{\mathcal{G}}}}\) and the permuted graph \({{{{\mathcal{G}}}}}_{\sigma }\). As we have shown in Theorem 1, equivariance of the model holds for arbitrary input graphs, so in particular it holds for each \({{{\mathcal{G}}}}\) and \({{{{\mathcal{G}}}}}_{\sigma }\) in the action selection process, and the output tour under the permuted graph is equal to the output tour under the original graph up to a renaming of the vertices. □
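The argument of Corollary 1 does not depend on how the equivariant scores are produced, so it can be illustrated with a purely classical stand-in. In the sketch below, `equivariant_scores` is a hypothetical scoring function chosen only because it depends on the graph structure and not on node indices; it is not the Q-function of this work. The greedy rollout mirrors the argmax node selection described in the proof.

```python
import numpy as np


def greedy_tour(w, start, score_fn):
    """Build a tour by repeatedly taking the argmax of score_fn over unvisited nodes."""
    n = w.shape[0]
    tour = [start]
    available = np.ones(n, dtype=bool)
    available[start] = False
    while available.any():
        scores = np.where(available, score_fn(w, tour[-1], available), -np.inf)
        nxt = int(np.argmax(scores))
        tour.append(nxt)
        available[nxt] = False
    return tour


def equivariant_scores(w, last, available):
    """Hypothetical stand-in: prefer short edges from `last`, with a small
    bonus for candidates that are close to the remaining unvisited nodes."""
    return -w[last] - 0.1 * (w @ available) / max(available.sum(), 1)


rng = np.random.default_rng(1)
n = 6
w = np.triu(rng.uniform(0.1, 1.0, size=(n, n)), 1)
w = w + w.T

perm = rng.permutation(n)          # old vertex l becomes vertex perm[l]
w_perm = np.zeros_like(w)
w_perm[np.ix_(perm, perm)] = w

start = 0
tour = greedy_tour(w, start, equivariant_scores)
tour_perm = greedy_tour(w_perm, int(perm[start]), equivariant_scores)

# Equivariance: the tour on the permuted graph is the original tour, relabeled.
same_up_to_relabeling = tour_perm == [int(perm[v]) for v in tour]
print(same_up_to_relabeling)
```

A scoring function that adds an index-dependent term (for instance a bias proportional to the node index) would break this property, which is the situation described for non-equivariant models.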
Analysis of expressivity
In this section, we analyze under which conditions there exists a setting of β, γ for a given graph instance \({{{{\mathcal{G}}}}}_{i}\) for our ansatz at depth one that can produce the optimal tour for this instance. Note that this does not show anything about constructing the optimal tour for a number of instances simultaneously with this set of parameters, or how easy it is to find any of these sets of parameters. Those questions are beyond the scope of this work. The capability to produce optimal tours at any depth for individual instances is of interest because first, we do not expect that the model can find a set of parameters that is close to optimal for a large number of instances if it is not expressive enough to contain a parameter setting that is optimal for individual instances. Second, the goal of an ML model is always to find similarities within the training data that can be used to generalize well on the given learning task, so the ability to find optimal solutions on individual instances is beneficial for the goal of generalizing on a larger set of instances. In addition, how well the model generalizes also depends on the specific instances and the parameter optimization routine, and therefore it is hard to make formal statements about the general case where we find one universal set of parameters that produces the optimal solution for arbitrary sets of instances.
For our model at p = 1, we can compute the analytic form of the expectation values of our circuit as defined in Eq. (12) and Eq. (13) as the following, by a similar derivation as in ref. ^{24},
where v_{t−1} is the last node in the partial tour and v_{l} is the candidate node. Note that due to the specific setup of node features used in our work, where the contributions of nodes already present in the tour are turned off, these expectation values are simpler than those given for Ising-type Hamiltonians in ref. ^{24}. For a learning task where contributions of all nodes are present in every step, the expectation values of the EQC will be the same as those for Ising Hamiltonians without local fields given in ref. ^{24}, with the additional node features α. Due to this structural similarity to the ansatz used in the QAOA, results on the hardness of giving an analytic form of these expectation values at p > 1 also transfer to our model. Even at depth p = 2 analytic expressions can only be given for certain types of graphs^{36,37}, and everything beyond this quickly becomes too complex. For this reason, we can only make statements for p = 1 in this work.
In order to generate an arbitrary tour of our choice, in particular also the optimal tour, it suffices to guarantee that for a suitable choice of (fixed) γ, at each step in the node selection process the edge we want to add next to the partial tour has highest expectation. One way we can do this is by controlling the signs of each sine and cosine term in Eq. (17) such that only the expectation values corresponding to edges that we want to select are positive, and all others are negative.
To understand whether this is possible, we can leverage known results about the expressivity of the sine function. For any rationally independent set of points {x_{1}, . . . , x_{n}} with labels y_{i} ∈ (−1, 1), the sine function can approximate these labels to arbitrary precision ϵ as shown in ref. ^{38}, i.e., there exists an ω s.t.
\[| \sin (\omega {x}_{i})-{y}_{i}| < \epsilon \quad \forall \,i\in \{1,\ldots ,n\}.\]
In general, the edge weights of graphs that represent TSP instances are not rationally independent. (The real numbers x_{1}, …, x_{n} are said to be rationally independent if no integers k_{1}, …, k_{n} exist such that x_{1}k_{1} + … + x_{n}k_{n} = 0, besides the trivial solution \({k}_{i}=0\,\forall \,i\). Rational independence also implies the points are not rational numbers, so they are also not numbers normally represented by a computer.) However, in principle they can easily be made rationally independent by adding a finite perturbation \({\epsilon }_{i}^{{\prime} }\) to each edge weight. The results in ref. ^{38} imply that almost any set of points x_{1}, …, x_{n} with 0 < x_{i} < 1 is rationally independent, so we can choose \({\epsilon }_{i}^{{\prime} }\) to be drawn uniformly at random from \((0,{\epsilon }_{\max }]\). As long as these perturbations are applied to the edge weights in a way that does not change the optimal tour, as could be done by ensuring that \({\epsilon }_{\max }\) is small enough so that the proportions between edge weights are preserved, we can use this perturbed version of the graph to infer the optimal tour. (Such an \({\epsilon }_{\max }\) can be computed efficiently.) In this way we can guarantee that the ansatz at depth one can produce arbitrary labelings of our edges, which in turn lets us produce expectation values such that only the ones that correspond to edges in the tour of our choice will have positive values. We note that in the analysis we assume real-valued (irrational) perturbations, which of course cannot be represented in the computer. However, by using the results of ref. ^{38} and approximating ±1 within a small epsilon, we can get a robust statement where finite precision suffices. For completeness, we provide a proof for this case in the supplementary material. However, we point out that the parameter ω that leads to the construction of the optimal tour can in principle be arbitrarily large and hard to find. 
We do not go deeper into this discussion since in fact we do not want to rely on this proof of optimality as a guiding explanation of how the algorithm works.
The reason for this is that in some way, this proof of optimality works despite the presence of the TSP graph and not because of it. This is similar in vein to universality results for QAOA-type circuits, where it can be shown that for very specific types of Hamiltonians, alternating applications of the cost and mixer Hamiltonian lead to quantum computationally universal dynamics, i.e., they can reach all unitaries to arbitrary precision^{39,40}, but these Hamiltonians are not related to any of the combinatorial optimization problems that were studied in the context of the QAOA. While these results provide valuable insight into the expressivity of the models, in our case they do not inform us about the possibility of a quantum advantage on the learning problem that we study in this work. In particular, we do not know from these results whether the EQC utilizes the information provided by the graph features in a way in which the algorithm benefits from the quantumness of the model, at depth one or otherwise. As it is known that the QAOA applied to ground state finding benefits from interference effects, investigating whether similar results hold for our algorithm is an interesting question that we leave for future work.
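The sign-control argument based on the sine function, and the remark that a suitable ω may be large and has to be searched for, can be illustrated with a small brute-force search. This is only a sketch under stated assumptions: the four points and the target sign pattern are arbitrary illustrative choices, the points are floating-point approximations of rationally independent numbers, and `find_omega` is a hypothetical helper rather than a procedure from this work.

```python
import math


def find_omega(xs, signs, omega_max=5000.0, step=0.01, margin=0.1):
    """Scan omega on a grid and return the first value for which
    sign(sin(omega * x_i)) matches signs[i] with a safety margin, or None."""
    for k in range(1, int(omega_max / step)):
        omega = k * step
        if all(s * math.sin(omega * x) > margin for x, s in zip(xs, signs)):
            return omega
    return None


# Scaled square roots of distinct primes are rationally independent as real
# numbers; in floating point they are of course only approximations.
xs = [math.sqrt(2) / 2, math.sqrt(3) / 3, math.sqrt(5) / 5, math.sqrt(7) / 7]
signs = [+1, -1, -1, +1]   # arbitrary target labeling, e.g. "edge in tour" = +1

omega = find_omega(xs, signs)
achieved = [math.copysign(1, math.sin(omega * x)) for x in xs]
print(omega, achieved)
```

The number of grid points that must be scanned grows quickly with the number of points to be labeled, which reflects why this construction is not meant as a practical way of finding tours.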
In addition, we note that high expressivity alone does not necessarily lead to a good model, and may even lead to issues in training such as the well-studied phenomenon of barren plateaus^{8}, or a susceptibility to overfitting on the training data. In practice, the best models are those that strike a balance between being expressive enough, and also restricting the search space of the model in a way that suits the given training data. Studying and designing models that have this balance is exactly the goal of geometric learning, and the equivariance we have proven for our model is a helpful geometric prior for learning tasks on graphs.
Numerical results
After proving that our model is equivariant under node permutations and analytically studying the expressivity of our ansatz, we now numerically study the training and validation performance of this model on TSP instances of varying size in an NCO context. The training data set that we use is taken from ref. ^{32}, where the authors propose a novel classical attention approach and evaluate it on a number of geometric learning tasks. We note that we have recomputed the optimal tours for all instances that we use, as the data set uploaded by the authors of ref. ^{32} erroneously contains suboptimal solutions. (This was confirmed with the authors, but at the time of writing of this work their repository has not been updated with the correct solutions.) To compute optimal solutions for the TSP instances with 10 and 20 cities we used the library Python-TSP^{41}.
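For readers who want to reproduce optimal tour lengths on small instances without an external library, a dynamic-programming (Held-Karp) solver can serve as a stand-in. The sketch below is not the routine of the library mentioned above; `held_karp` is a hypothetical name, and the pure-Python implementation is practical for around 10 cities but becomes slow and memory-hungry towards 20.

```python
import math
from itertools import combinations


def held_karp(dist):
    """Length of the optimal TSP tour for a symmetric or asymmetric distance matrix."""
    n = len(dist)
    # best[(mask, j)]: shortest path that starts at node 0, visits exactly the
    # nodes in `mask` (a bitmask over nodes 1..n-1), and ends at node j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask ^ (1 << j)
                best[(mask, j)] = min(best[(prev, k)] + dist[k][j]
                                      for k in subset if k != j)
    full = (1 << n) - 2  # all nodes except node 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


# Four corners of the unit square: the optimal tour walks around the square.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
optimal_length = held_karp(dist)
print(optimal_length)
```

The returned value is only the tour length; recovering the tour itself would require storing the argmin at each step, which is omitted here to keep the sketch short.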
We evaluate the performance of the EQC on TSP instances with 5, 10, and 20 cities (corresponding to 5, 10, and 20 qubits, respectively). As described in Section Quantum neural combinatorial optimization with the EQC, the environment is considered as solved by an agent when the running average of the approximation ratio over the past 100 episodes is <1.05. Otherwise, each agent will run until it reaches the maximum number of episodes, that we set to be 5000 for all agents. Note that this is merely a convenience to shorten the overall training times, as we have access to the optimal solutions of our training instances. In a realistic scenario where one does not have access to optimal solutions, the algorithm would simply run for a fixed number of episodes or until another convergence criterion is met. When evaluating the final average approximation ratios, we always use the parameter setting that was stored in the final episode, regardless of the final training error. When variations in training lead to a slightly worse performance than what was achieved before, we still use the final parameter setting. We do this because as noted above, in a realistic scenario one does not have knowledge about the ratio to the optimal solutions during training. Unless otherwise stated, all models are trained on 100 training instances and evaluated on 100 validation instances.
As we are interested in the performance benefits that we gain by using an ansatz that respects an important graph symmetry, we compare our model to versions of the same ansatz where we gradually break the equivariance property. We start with the simplest case, where the circuit structure is still the same as for the EQC, but instead of having one β_{l}, γ_{l} in each layer, every X- and ZZ-gate is individually parametrized. As these parameters are now tied directly to certain one- and two-qubit gates, e.g., an edge between qubits one and two, they will not change location upon a graph permutation and therefore break equivariance. We call this the non-equivariant quantum circuit (NEQC). To go one step further, we take the NEQC and add a variational part to each layer that is completely unrelated to the graph structure: namely a hardware-efficient layer that consists of parametrized Y-rotations and a ladder of CZ-gates. In this ansatz, we have a division between a data encoding part and a variational part, as is often done in QML. To be closer to standard types of ansatzes often used in QML, we also omit the initial layer of H-gates here and start from the all-zero state, which requires us to switch the order of X- and ZZ-gates (however, in practice it did not make a difference whether we started from the all-zero or uniform superposition state in the learning task that we study). We denote this the hardware-efficient with trainable embedding (HWETE) ansatz. Finally, we study a third ansatz, where we take the HWETE and now only train the Y-rotation gates, and the graph-embedding part of the circuit only serves as a data encoding step. We call this simply the hardware-efficient (HWE) ansatz. A depiction of all ansatzes can be seen in Fig. 4.
We start by comparing the EQC to the NEQC on TSP instances with 5, 10, and 20 cities. We show the training and validation results in Fig. 5. To evaluate the performance of the models that we study, we compute the ratio to the optimal tour length as shown in Eq. (10), as the instances that we can simulate the circuits for are small enough that optimal tours can still be computed. For reference, the authors of ref. ^{32}, who generated the training instances that we use, stop comparing to optimal solutions at n = 20 as it becomes extremely costly to find optimal tours from there on out. To provide an additional classical baseline, we also show results for the nearest-neighbor heuristic. This heuristic starts at a random node and selects the closest neighboring node in each step to generate the final tour. The nearest-neighbor algorithm finds a solution quickly also for instances with increasing size, but there is no guarantee that this tour is close to the optimal one. However, as we know the optimal tours for all instances, the nearest-neighbor heuristic provides an easy-to-understand classical baseline that we can use. In addition, we add the upper bound given by one of the most widely used approximation algorithms for the TSP (as implemented e.g., in Google OR-Tools): the Christofides algorithm. This algorithm is guaranteed to find a tour that is at most 1.5 times as long as the optimal tour^{42}. In the case where any of our models produces validation results that are on average above this upper bound of the Christofides algorithm, we consider it failed, as it is more efficient to use a polynomial approximation algorithm for these instances. However, we stress that this upper bound can only serve to inform us about the failure of our algorithms and not their success, as in practice the Christofides algorithm often achieves much better results than those given by the upper bound. 
We also note that both the Christofides and nearestneighbor algorithms are provided here to assure that our algorithm produces reasonable results, and not to show that our algorithm outperforms classical methods as this is not the topic of the present manuscript. The bound is shown as a dotted black line in Fig. 5 and Fig. 6.
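The nearest-neighbor baseline and the comparison against the Christofides bound of 1.5 can be sketched as follows. This is a minimal illustration and not the evaluation code of this work: the seven random cities are synthetic, the start node is fixed to 0 instead of being drawn at random so that the example is reproducible, and the optimum is found by brute force, which is only feasible for very small instances.

```python
import math
import random
from itertools import permutations


def tour_length(dist, tour):
    """Length of the closed tour that returns to its first node."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def nearest_neighbor_tour(dist, start=0):
    """Greedy baseline: always move to the closest unvisited node."""
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda v: dist[last][v])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def brute_force_optimum(dist):
    """Optimal tour length by enumerating all tours that start at node 0."""
    n = len(dist)
    return min(tour_length(dist, (0,) + p) for p in permutations(range(1, n)))


rnd = random.Random(0)
cities = [(rnd.random(), rnd.random()) for _ in range(7)]  # synthetic instance
dist = [[math.dist(p, q) for q in cities] for p in cities]

nn_tour = nearest_neighbor_tour(dist)
ratio = tour_length(dist, nn_tour) / brute_force_optimum(dist)
CHRISTOFIDES_BOUND = 1.5
considered_failed = ratio > CHRISTOFIDES_BOUND  # failure criterion used in the text
print(round(ratio, 3), considered_failed)
```

An approximation ratio of exactly 1 would mean that the heuristic found an optimal tour for this instance; averaging the ratio over a validation set gives the quantity that is compared against the bound.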
Geometric learning models are expected to be more data-efficient than their unstructured counterparts, as they respect certain symmetries in the training data. This means that when a number of symmetric instances are present in the training or validation data, the effective size of these data sets is decreased. This usually translates into models that are more resource-efficient in training, e.g., by requiring fewer parameters or fewer training samples. In our comparison of the EQC and the NEQC, we fix the number of training samples and compare the different models in terms of circuit depth and number of parameters to achieve a certain validation error and expect that the EQC will need fewer layers to achieve the same validation performance as the NEQC. This comparison can be seen in Fig. 5. In Fig. 5 a) and b), we show the training and validation performance of both ansatzes at depth one. For instances with five cities, both ansatzes perform almost identically on the validation set, where the NEQC performs worse on a few validation instances. As the instance size increases, the gap between EQC and NEQC becomes bigger. We see that even though the two ansatzes are structurally identical, the specific type of parametrizations we choose and the properties of both ansatzes that result from this make a noticeable difference in performance. While the EQC at depth one has only two parameters per layer regardless of instance size, the NEQC’s number of parameters per layer depends on the number of nodes and edges in the graph. Despite having far fewer parameters, the EQC still outperforms the NEQC on instances of all sizes. Increasing the depth of the circuits also does not change this. In Fig. 5 c) and d) we see that at a depth of four, the EQC still beats the NEQC. 
The latter’s validation performance even slightly decreases with more layers, which is likely due to the increased complexity of the optimization task, as the number of trainable parameters per layer is \(\frac{(n-1)n}{2}+n\), which for the 20-city instances means 840 trainable parameters at depth four (compared to 8 parameters in case of the EQC). This shows that at a fraction of the number of trainable parameters, the EQC is competitive with its non-equivariant counterpart even though the underlying structure of both circuits is identical. Compared to the classical nearest-neighbor heuristic, both ansatzes perform well and beat it at all instance sizes, and both ansatzes are also below the approximation ratio upper bound given by the Christofides algorithm on all validation instances. The box plots in Fig. 5 show a comparison of the EQC and NEQC in terms of the quartiles of the approximation ratios on the validation set. As it is hard to infer statistical significance of results directly from the box plots, especially when the distributions of data points are not very far apart, we additionally plot the means of the distributions and their standard error, and compute p values based on a t-test to give more insight on the comparison of these two models in the supplementary material. To show statistical significance of the comparison of the EQC and NEQC, we perform a two-sample t-test with the null hypothesis that the averages of the two distributions are the same, as is common in statistical analysis, and compute p values based on this. The p values confirm that there is indeed a statistical significance in the comparison between models for the 10- and 20-city instances, and that we can be more certain about the significance as we scale up the instance size. The average approximation ratios in case of the 5-city instances are roughly the same, as we can expect due to the fact that there exist only 12 permutations of the TSP graphs of this size. 
However, even for these small instances the EQC achieves the same result with fewer parameters, namely 2 per layer instead of the 15 per layer required in the NEQC.
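The parameter counts quoted above follow directly from the two parametrizations and can be reproduced with a few lines of arithmetic. The function names below are hypothetical. The last helper reads the "12 permutations" for five cities as the number of distinct undirected tours, (n − 1)!/2, which is an interpretation on our part rather than a formula stated in the text.

```python
from math import factorial


def eqc_parameters(depth):
    """EQC: one beta and one gamma per layer, independent of the graph size."""
    return 2 * depth


def neqc_parameters(n_nodes, depth):
    """NEQC: one parameter per edge of the complete graph plus one per node, per layer."""
    return (n_nodes * (n_nodes - 1) // 2 + n_nodes) * depth


def distinct_tours(n_nodes):
    """Number of distinct undirected tours on n nodes (assumed reading of '12 permutations')."""
    return factorial(n_nodes - 1) // 2


for n in (5, 10, 20):
    print(n, eqc_parameters(4), neqc_parameters(n, 4), distinct_tours(n))
```

The quadratic growth of the NEQC count with the number of nodes, against a constant count for the EQC, is what makes the gap in trainable parameters widen with instance size.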
Next, we compare the EQC to ansatzes in which we introduce additional variational components that are completely unrelated to the training data structure, as described above. We show results for the HWETE and the HWE ansatz in Fig. 6. To our own surprise, we did not manage to get satisfactory results with either of those two ansatzes, especially at larger instances, despite an intensive hyperparameter search. Even the HWETE, which is basically identical to the NEQC with additional trainable parameters in each layer, failed to show any significant performance. To gauge how badly those two ansatzes perform, we also show results for an algorithm that selects a random tour for each validation instance in Fig. 6. In this figure, we show results for TSP instances with five and ten cities for both ansatzes with one and four layers, respectively. In addition, we show how the validation performance changes when the models are trained with a training data set consisting of either 10 or 100 instances, in the hopes of seeing improved performance as the size of the training set increases. We see that in neither configuration do the HWETE or HWE outperform the Christofides upper bound on all validation instances. In addition, in almost all cases those two ansatzes do not even outperform the random algorithm. This example shows that in a complex learning scenario, where the number of permutations of each input instance grows combinatorially with instance size and the number of states in the RL environment grows exponentially with the number of instances, a simple hardware-efficient ansatz will fail even when the data encoding part of the PQC is motivated by the problem data structure. While increasing the size of the training set and/or the number of layers in the circuit seems to provide small advantages in some cases, it also leads to a decrease in performance in others. On the other hand, the EQC is mostly agnostic to changes in the number of layers or the training data size. 
Overall, we see that the closer the ansatz is to an equivariant configuration, the better it performs, and picking ansatzes that respect symmetries inherent to the problem at hand is the key to success in this graph-based learning task.
In Section Equivariant quantum circuit we have also pointed out that the EQC is structurally related to the ansatz used in the QAOA. The main difference in solving instances of the TSP with the NCO approach used in our work and solving it with the QAOA lies in the way in which the problem is encoded in the ansatz, and in the quantity that is used to compute the objective function value for parameter optimization. We give a detailed description of how the TSP is formulated in terms of a problem Hamiltonian suitable for the QAOA and how parameters are optimized in Section Solving the TSP with the QAOA. As the QAOA is arguably the most explored variational quantum optimization algorithm at the time of writing, and due to the structural similarity between the EQC and the QAOA’s ansatz, we also compare these two approaches on TSP instances with five cities.
We see in Fig. 7 that already on these small instances, the QAOA requires considerably deeper circuits to achieve good results, which may be out of reach in a noisy near-term setting. The EQC on the other hand (i) uses a number of qubits that scales linearly with the number of nodes of the input graph as opposed to the n^{2} number of variables required for QAOA, and (ii) already shows good performance at depth one for instances with up to 20 cities. In addition to optimizing QAOA parameters for each instance individually, we also show results of applying one set of parameters that performed well on one instance at depth three to other instances of the same problem, following the parameter concentration argument given in ref. ^{43} and described in more detail in Section Solving the TSP with the QAOA. While we find that parameters seem to transfer well to other instances of the same problem in case of the TSP, the overall performance of the QAOA is still much worse than that of the EQC.
Discussion
After providing analytic insight on the expressivity of our ansatz, we have numerically investigated the performance of our EQC model on TSP instances with 5, 10, and 20 cities (corresponding to 5, 10, and 20 qubits, respectively), and compared them to other types of ansatzes that do not respect any graph symmetries. To get a fair comparison, we designed PQCs that gradually break the equivariance property of the EQC and assessed their performance. We find that ansatzes that contain structures that are completely unrelated to the input data structure are extremely hard to train for this learning task where the size of the state space scales exponentially in the number of input nodes of the graph. Despite much effort and hyperparameter optimization, we did not manage to get satisfactory results with these ansatzes. The EQC on the other hand works almost out of the box, and achieves good generalization performance with minimal hyperparameter tuning and relatively few trainable parameters. We have also compared using the EQC in a neural combinatorial optimization scheme with the QAOA, and find that even on TSP instances with only five cities the NCO approach significantly outperforms the QAOA. In addition to training the QAOA parameters for every instance individually, we have also investigated the performance in light of known parameter concentration results that state that in some cases, parameters found on one instance perform well on average for other instances of the same problem. While this is true in the case of the TSP instances we investigate here as well, the overall performance is still worse than that of the EQC.
Comparing our algorithm to the QAOA is also interesting from a different perspective. In Section Equivariant quantum circuit we have seen that our ansatz can be regarded as a special case of a QAOA-type ansatz, where instead of encoding a problem Hamiltonian we encode a graph instance directly, and in case of the specific formulation of the TSP used in this work, include mixing terms only for a problem-dependent subset of qubits. This lets us derive an exact formulation of the expectation values of our model at depth one from those of the QAOA given in ref. ^{24}. For the QAOA, it is known that in the limit of infinite depth, it can find the ground state of the problem Hamiltonian and therefore the optimal solution to a given combinatorial optimization problem^{2}. In addition, it has been shown that even at low depth, the probability distributions generated by QAOA-type circuits are hard to sample from classically^{44}. These results give a clear motivation of why using a quantum model in these settings can provide a potential advantage. While our model is structurally almost identical to that of the QAOA, in our case the potential for advantage is less clear. We saw in Eq. (17) that at depth one, in each step the expectation value of each edge that we consider to be selected consists of (i) a term corresponding to the edge between the last added node and the candidate node, and (ii) all outgoing edges from the candidate node. So our model considers the one-step neighborhood of each candidate node at depth one. In the case of the TSP it is not clear whether this can provide a quantum advantage for the learning task as specified in Section Quantum neural combinatorial optimization with the EQC. In terms of QAOA, it was shown that in order to find optimal solutions, the algorithm has to “see the whole graph”^{45}, meaning that all edges in the graph contribute to the expectation values used to minimize the energy. 
To alleviate this strong requirement on depth, a recursive version of the QAOA (RQAOA) was introduced in ref. ^{46}. It works by iteratively eliminating variables in the problem graph based on their correlation, and thereby gradually reducing the problem to a smaller instance that can be solved efficiently by a classical algorithm, e.g., by brute-force search. The authors of ref. ^{46} show that the depth-one RQAOA outperforms QAOA with constant depth p, and that RQAOA achieves an approximation ratio of one for a family of Ising Hamiltonians.
The node selection process performed by our algorithm with the EQC used as the ansatz is similar to the variable elimination process in the RQAOA, where instead of merging edges, the mixer terms for nodes that have already been selected are turned off, therefore effectively turning expectation values of edges corresponding to unavailable nodes to zero. Furthermore, the specific setup of weighted Z_{i}Z_{j}-correlations (see Eq. (12)) that we measure to compute Q-values in our RL scheme to solve TSP instances is of the same form as those in the Hamiltonian for the weighted MaxCut problem,
The MaxCut problem and its weighted variant have been studied in depth in the context of the QAOA, and it has been shown that it performs well on certain instances of graphs for this task^{2,43,47,48}. While the TSP and weighted MaxCut are clearly very different problems, the similarity between our algorithm and the RQAOA raises the interesting question whether the mechanisms underlying the successful performance of both models in those two learning tasks are related. Based on this, one may ask the broader question of whether QAOA-type ansatzes implement a favorable bias for hybrid quantum-classical optimization algorithms on weighted graphs, like the RQAOA or the quantum NCO scheme in this work. Specifically, by relating the mechanism underlying the variable elimination procedure in RQAOA, which eliminates variables based on their largest (anti-)correlation in terms of Z_{i}Z_{j} operators, to the node selection process in our algorithm that solves TSP instances, we can establish a connection between the EQC and known results for the (R)QAOA on weighted MaxCut. It is an interesting question whether results that establish a quantum advantage of the QAOA can be related to the EQC in an NCO context as we present here, and we leave this question for future work.
Methods
Geometric learning—quantum and classical
Learning approaches that utilize geometric properties of a given problem have led to major successes in the field of ML, such as AlphaFold for the complex task of protein folding^{13,49}, and have become an increasingly popular research field over the past few years. Arguably the prime example of a successful geometric model is the convolutional NN (CNN), which was developed at the end of the 20th century in an effort to enable efficient training of image recognition models^{50}. Since then, it has been shown that one of the main reasons that CNNs are so effective is that they are translation invariant: if an object in a given input image is shifted by some amount, the model will still “recognize” it as the same object and thus effectively requires fewer training data^{23}. While CNNs are the standard architecture used for images, symmetry-preserving architectures have also been developed for time-series data in the form of recurrent NNs^{51}, and for graph data with GNNs^{52}. GNNs have seen a surge of interest in the classical machine learning community in the past decade^{15,52}. They are designed to process data that is presented in graph form, like social networks^{52}, molecules^{53}, images^{54} or instances of combinatorial optimization problems^{16}.
The first attempt to implement a geometric learning model in the quantum realm was made with the quantum convolutional NN in^{55}, where the authors introduce a translation-invariant architecture motivated by classical convolutional NNs. Approaches to translate the GNN formalism to QNNs were taken in^{17}, where input graphs are represented in terms of a parametrized Hamiltonian, which is then used to prepare the ansatz of a quantum model called a quantum graph neural network (QGNN). While the approach in^{17} yields promising results, this work does not take symmetries of the input graph into account. (However, in an independent work prepared at the time of writing this manuscript, one of the authors of^{17} shows that one of their proposed ansatzes is permutation invariant^{56}.) The authors of^{18} introduce the so-called quantum evolution kernel, where they devise a graph-based kernel for a quantum kernel method for graph classification. Again, their ansatz is based on alternating layers of Hamiltonians, where one Hamiltonian in each layer encodes the problem graph, while a second parametrized Hamiltonian is trained to solve a given problem. A proposal for a quantum graph convolutional NN was made in^{19}, and the authors of^{57} propose directly encoding the adjacency matrix of a graph into a unitary to build a quantum circuit for graph convolutions. While all of the above works introduce forms of structured QML models, none of them study their properties explicitly from a geometric learning perspective or relate their performance to unstructured ansatzes.
The authors of^{20} take the step to introduce an equivariant model family for graph data and generalize the QGNN picture to so-called equivariant quantum graph circuits (EQGCs). EQGCs are a very broad class of ansatzes that respect the connectivity of a given input graph. The authors of^{20} also introduce a subclass of EQGCs called equivariant quantum Hamiltonian graph circuits (EHQGCs), which includes the QGNNs by^{17} as a special case. EHQGCs are implemented in terms of a Hamiltonian that is constructed based on the input graph structure, and they are explicitly equivariant under permutation of vertices in the input graph. The framework that the authors of^{20} propose can be seen as a generalization of the above proposals. Different from the above proposals, EQGCs use a post-measurement classical layer that performs the functionality of an aggregation function like those found in classical GNNs. In classical GNNs, the aggregation function in each layer is responsible for aggregating node and edge information in an equivariant or invariant manner. Popular aggregation functions are sums or products, as they trivially fulfill the equivariance property. In the case of EQGCs, there is no aggregation in the quantum circuit, and this step is offloaded to a classical layer that takes as input the measurements of the PQC. In addition, the EQGC family is defined over unweighted graphs and only considers the adjacency matrix of the underlying input graph to determine the connectivity of the qubits. The authors of^{20} also show that their EQGC outperforms a standard message passing neural network on a graph classification task, and thereby demonstrate a first separation of quantum and classical models on a graph-based learning task.
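The statement that sum aggregation trivially fulfills the equivariance property can be checked numerically. The following is a minimal classical sketch, not part of the EQC or of the models in^{20}: the function `aggregate`, the random weighted graph, and the node features are all illustrative assumptions. It verifies that relabelling the nodes before aggregating gives the same result as aggregating first and relabelling afterwards, and that a global sum readout is invariant.

```python
import numpy as np

def aggregate(weights, features):
    """Weighted sum aggregation: node i receives sum_j weights[i, j] * features[j]."""
    return weights @ features

rng = np.random.default_rng(0)
n = 5

# Hypothetical symmetric edge-weight matrix with zero diagonal, and node features.
weights = rng.random((n, n))
weights = (weights + weights.T) / 2.0
np.fill_diagonal(weights, 0.0)
features = rng.random(n)

# Permutation matrix P with (P @ v)[i] = v[perm[i]].
perm = rng.permutation(n)
P = np.eye(n)[perm]

# Equivariance: aggregating the relabelled graph equals relabelling the aggregate.
permuted_then_aggregated = aggregate(P @ weights @ P.T, P @ features)
aggregated_then_permuted = P @ aggregate(weights, features)
is_equivariant = np.allclose(permuted_then_aggregated, aggregated_then_permuted)

# Invariance: a global sum readout does not depend on the node labelling.
is_invariant = np.isclose(permuted_then_aggregated.sum(),
                          aggregate(weights, features).sum())
```

Both flags evaluate to true for any permutation, which is the classical analogue of the symmetry that an equivariant ansatz is designed to respect.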
During preparation of our final manuscript, a work on invariant quantum machine learning models was released by the authors of^{56}. They prove for a number of selected learning tasks whether an invariant quantum machine learning model for specific types of symmetries exists. Their work focuses on group invariance, and leaves proposals for NISQ-friendly equivariant quantum models as an open question.
Our proposal is most closely related to EHQGCs, but with a number of deviations. First, our model is defined on weighted graphs and can therefore be used for learning tasks that contain node as well as edge features. Second, the initial state of our model is always the uniform superposition, which allows each layer in the ansatz to perform graph feature aggregation via sums and products of node and edge features, as discussed in Section Equivariant quantum circuit. Third, we do not require a classical post-processing layer, so our EQC model is purely quantum. In addition, in its simplest form as used in this work, the number of qubits in our model scales linearly with the number of nodes in the input graph, while the depth of each layer depends on the graph’s connectivity, and therefore it provides one answer to the question of a NISQ-friendly equivariant quantum model posed by^{56}.
Neural combinatorial optimization with reinforcement learning
The idea behind NCO is to use an ML model to learn a heuristic for a given optimization problem based on data. When combined with RL, this data takes the form of states of an environment, while the objective is defined in terms of a reward function, as we will now explain in more detail. In the RL paradigm, the model, referred to as an agent, interacts with a so-called environment. The environment is defined in terms of its state space \({{{\mathcal{S}}}}\) and action space \({{{\mathcal{A}}}}\), both of which can be either discrete or continuous. The agent alters the state of the environment by performing an action \(a\in {{{\mathcal{A}}}}\), whereafter it receives feedback from the environment in the form of the following state \({s}^{{\prime} }\in {{{\mathcal{S}}}}\) and a reward r that depends on the quality of the chosen action, given the initial state s. Actions are chosen based on a policy π(a∣s), which is a probability distribution over actions a given states s. The definition of the state and action spaces and the reward function depends on the given environment. In general, the goal of the agent is to learn a policy that maximizes the expected return G,
\[G={{\mathbb{E}}}_{\pi }\left[\sum _{t=0}^{\infty }{\gamma }^{t}{r}_{t}\right],\]
where γ ∈ [0, 1] is the discount factor that determines the importance of future rewards in the agent’s decision. The above definition of the expected return is for the so-called infinite horizon, where the interaction with the environment can theoretically go on to infinity. In practice, we usually work in environments with a finite horizon, where the above sum runs only over a predefined number of indices. There are many different approaches to maximize the expected return, and we refer the interested reader to^{58} for an in-depth introduction.
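For the finite-horizon case, the discounted return of a single episode is a plain weighted sum of the collected rewards. The following is a minimal sketch of that arithmetic; the function name `discounted_return` and the example reward sequence are illustrative assumptions, not part of the original implementation.

```python
def discounted_return(rewards, gamma):
    """Finite-horizon return sum_t gamma**t * r_t for one episode."""
    g = 0.0
    # Accumulate from the last reward backwards: g_t = r_t + gamma * g_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical episode with three unit rewards.
episode_rewards = [1.0, 1.0, 1.0]
undiscounted = discounted_return(episode_rewards, gamma=1.0)   # 1 + 1 + 1
discounted = discounted_return(episode_rewards, gamma=0.5)     # 1 + 0.5 + 0.25
```

With γ = 1 all rewards count equally, while smaller values of γ make the agent favor immediate rewards.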
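In this work, we focus on so-called Q-learning, where the expected return is maximized in terms of Q-values. The values Q(s, a) for each (s, a) pair also represent the expected return following a policy π, but now conditioned on an initial state s_{t} and action a_{t},
\[{Q}_{\pi }({s}_{t},{a}_{t})={{\mathbb{E}}}_{\pi }\left[\sum _{k=0}^{\infty }{\gamma }^{k}{r}_{t+k}\,\Big| \,{s}_{t},{a}_{t}\right].\]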
When the agent is implemented in the form of an NN, the goal of the NN is to approximate the optimal Q-function Q^{*}. One popular way to use an NN as a function approximator for Q-learning is the deep Q-network (DQN) and the resulting DQN algorithm^{59}. In this algorithm, the NN is trained similarly to the supervised case, but without a given set of labelled examples. Instead, the agent collects samples at training time by interacting with the environment. These samples are stored in a memory, out of which a batch of random samples is drawn for each parameter update step. Based on the agent’s output, the label for a given (s, a) pair from the memory is computed as follows,
\[y=r+\gamma \mathop{\max }\limits_{{a}^{{\prime} }}\hat{Q}({s}^{{\prime} },{a}^{{\prime} }),\]
and this label is then used to compute parameter updates. The update is not computed with the output of the function approximator Q, but by a copy \(\hat{Q}\) called the target network, which is updated with the current parameters of the function approximator at fixed intervals. The purpose of this target network is to stabilize training, and how often it is updated is a hyperparameter that depends on the environment. In our case, the function approximator and target network are implemented as PQCs, while the parameter optimization is performed via the classical DQN algorithm. For more detail on implementing the DQN algorithm with a PQC as the function approximator, we refer the reader to^{60,61,62}.
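The role of the target network in computing labels can be sketched with a toy example. The snippet below is an assumption-laden illustration rather than the implementation used in this work: the Q-function is replaced by a small lookup table `q_target` standing in for the frozen copy \(\hat{Q}\), the label follows the standard DQN form of reward plus discounted maximal target value, and the masking of terminal transitions via `dones` is a common convention that is assumed here.

```python
import numpy as np

def dqn_targets(rewards, next_states, dones, q_target, gamma):
    """Labels r + gamma * max_a' Q_hat(s', a'); terminal steps use the reward only."""
    # Largest target-network value over actions for every sampled next state.
    max_next = q_target[next_states].max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next

# Hypothetical frozen target table: 3 states x 2 actions.
q_target = np.array([[0.0, 1.0],
                     [2.0, 0.5],
                     [0.0, 0.0]])

# A batch of two transitions drawn from a (hypothetical) replay memory.
rewards = np.array([1.0, -1.0])
next_states = np.array([1, 2])
dones = np.array([0.0, 1.0])   # the second transition ends the episode

labels = dqn_targets(rewards, next_states, dones, q_target, gamma=0.9)
```

In a full training loop, `q_target` would be overwritten with the current parameters of the function approximator at fixed intervals, and the labels would enter a regression loss against the approximator's own output for the sampled (s, a) pairs.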
To evaluate the performance gains of an ansatz that respects certain symmetries relevant to the problem at hand, we apply our model to a practically motivated learning task on graphs. The TSP is a low-level abstraction of a problem commonly encountered in mobility and logistics: given a list of locations, find the shortest route that connects all of these locations without visiting any of them twice. Formally, given a graph \({{{\mathcal{G}}}}({{{\mathcal{V}}}},{{{\mathcal{E}}}})\) with vertices \({{{\mathcal{V}}}}\) and weighted edges \({{{\mathcal{E}}}}\), the goal is to find a permutation of the vertices such that the resulting tour length is minimal, where a tour is a cycle that visits each vertex exactly once. A special case of the TSP is the 2D Euclidean TSP, where each node is defined in terms of its x and y coordinates in Euclidean space, and the edge weights are given by the Euclidean distance between these points. In this work, we deal with the symmetric Euclidean TSP on a complete graph, where the edges in the graph are undirected. This reduces the number of possible tours from n! to \(\frac{(n-1)!}{2}\). However, even in this reduced case the number of possible tours already exceeds 100,000 (9!/2 = 181,440) for instances with a modest number of ten cities, and the TSP is a well-known NP-hard problem.
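The tour count and the tour length of a Euclidean instance are straightforward to compute for small n. The following sketch is illustrative only: the helper names and the four-city instance on the unit square are assumptions, and the brute-force search is shown to make the definition of an optimal tour concrete, not as a practical solver.

```python
import math
from itertools import permutations

def num_tours(n):
    """Number of distinct tours of the symmetric TSP on n cities: (n - 1)! / 2."""
    return math.factorial(n - 1) // 2

def tour_length(coords, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def brute_force_tsp(coords):
    """Optimal tour by exhaustive search with the first city fixed (tiny n only)."""
    candidates = ((0,) + p for p in permutations(range(1, len(coords))))
    best = min(candidates, key=lambda t: tour_length(coords, t))
    return best, tour_length(coords, best)

# Hypothetical instance: four cities on the corners of the unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best_tour, best_length = brute_force_tsp(square)
tours_for_ten_cities = num_tours(10)
```

For the unit square the optimal tour follows the perimeter, and `num_tours(10)` reproduces the count quoted above for ten cities.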
To solve this problem with an RL approach, we follow the strategy introduced in ref. ^{33}. In that work, a classical GNN is used to solve a number of combinatorial optimization problems on graphs. The authors show that this approach can outperform dedicated approximation algorithms defined for the TSP, like the Christofides algorithm, on instances of up to 300 cities. One episode of this learning algorithm for the TSP can be seen in Fig. 3, and a detailed description of the learning task as implemented in our work is given in Section Quantum neural combinatorial optimization with the EQC.
Solving the TSP with the QAOA
The QAOA is implemented as a PQC by a Trotterization of Adiabatic Quantum Computation (AQC)^{2}. In general, for AQC, we consider a starting Hamiltonian H_{0}, for which both the formulation and the ground state are well-known, and a final Hamiltonian H_{P} that encodes the combinatorial optimization problem to be solved. The system is prepared in the ground state of the Hamiltonian H_{0} and is then evolved according to the time-dependent Hamiltonian:
\[H(t)=(1-s(t)){H}_{0}+s(t){H}_{P},\]
where s(t) is a real function called the annealing schedule that satisfies the boundary conditions s(0) = 0 and s(T) = 1, with T the duration of the evolution. To implement this as a quantum circuit we use the following approximation:
\[{e}^{A+B}\simeq {\left({e}^{A/r}{e}^{B/r}\right)}^{r},\qquad (22)\]
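which is known as the Trotter-Suzuki formula. By using this formula to approximate the evolution according to H(t) and by parameterizing time we obtain:
\[U(\beta ,\gamma )=\prod _{k=1}^{p}{e}^{-i{\beta }_{k}{H}_{0}}{e}^{-i{\gamma }_{k}{H}_{P}}.\qquad (23)\]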
All of these matrices are unitary since the Hamiltonians in the argument of the exponential are all Hermitian. We define a parameter p (an integer known as the depth, or level) of QAOA, which has the same role as r in Eq. (22). Increasing the depth p adds additional layers to the QAOA circuit, and thus more closely approximates the evolution under H(t)^{2}.
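How the quality of the Trotter-Suzuki approximation improves with the number of slices r can be illustrated numerically. The sketch below is an assumption-based toy example and not the circuit used in this work: it uses the single-qubit Pauli operators X and Z as two non-commuting Hermitian terms, a fixed evolution time t = 1, and a helper `expm_hermitian` that exponentiates Hermitian matrices via their eigendecomposition.

```python
import numpy as np

def expm_hermitian(h, t):
    """exp(-i * t * h) for a Hermitian matrix h, via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(h)
    # Scale column j of eigvecs by exp(-i t eigvals[j]), then rotate back.
    return (eigvecs * np.exp(-1j * t * eigvals)) @ eigvecs.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def trotter_error(r, t=1.0):
    """Spectral-norm distance between exp(-it(X+Z)) and its r-slice product formula."""
    exact = expm_hermitian(X + Z, t)
    one_slice = expm_hermitian(X, t / r) @ expm_hermitian(Z, t / r)
    approx = np.linalg.matrix_power(one_slice, r)
    return np.linalg.norm(exact - approx, ord=2)

# Error of the product formula for an increasing number of slices r.
errors = [trotter_error(r) for r in (1, 4, 16, 64)]
```

The error shrinks as r grows, which mirrors the statement that a larger depth p lets the QAOA circuit follow the continuous evolution more closely. Every factor in the product is unitary because X and Z are Hermitian.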
In QAOA, all qubits are initialized to \({\left\vert +\right\rangle }^{\otimes n}\), which is the ground state of \({H}_{0}=-{\sum }_{i}{\sigma }_{x}^{(i)}\). Alternating layers of H_{P} and H_{0} are added to the circuit (p times), parameterized by γ and β as defined in Eq. (23). The values of γ and β are found by minimizing the expectation value of H_{P}, and thus approximate the optimal solution to the original combinatorial optimization problem. When using QAOA, we do not solve the TSP directly, but a QUBO representation of this problem. This representation is well-known, and can be found in ref. ^{21}:
Here, ε_{i,j} are the distances between two nodes \(i,j\in {{{\mathcal{V}}}}\) and \(W:= \mathop{\max }\limits_{(i,j)\in {{{\mathcal{E}}}}}{\varepsilon }_{i,j}\). The variables x_{v,t} are binary decision variables denoting whether node v is visited at step t. We optimize the β and γ parameters for p = 1 by performing a uniform random search over the space [0, 2π]^{2}, and selecting the best configuration found.
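The uniform random search over [0, 2π]^{2} for p = 1 can be sketched with a small state-vector simulation. Everything below is an illustrative assumption rather than the setup used for the results in this work: the cost is a toy two-qubit diagonal Hamiltonian (not the TSP QUBO), the mixer is applied as \({e}^{-i\beta {\sigma }_{x}}\) on every qubit (the sign convention is immaterial for a search over the full interval), and the function names and sample count are hypothetical.

```python
import numpy as np

def qaoa_p1_expectation(cost, gamma, beta):
    """<H_P> after one QAOA layer on |+>^n, where `cost` is the diagonal of H_P."""
    n = cost.size.bit_length() - 1
    state = np.full(cost.size, 1.0 / np.sqrt(cost.size), dtype=complex)
    state = np.exp(-1j * gamma * cost) * state              # phase layer exp(-i gamma H_P)
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])     # exp(-i beta sigma_x)
    psi = state.reshape([2] * n)
    for q in range(n):
        # Apply the single-qubit mixer to qubit q and restore the axis order.
        psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [q])), 0, q)
    probs = np.abs(psi.reshape(-1)) ** 2
    return float(probs @ cost)

def random_search(cost, num_samples, seed=0):
    """Uniform random search over (gamma, beta) in [0, 2*pi]^2, keeping the best."""
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(num_samples):
        gamma, beta = rng.uniform(0.0, 2.0 * np.pi, size=2)
        value = qaoa_p1_expectation(cost, gamma, beta)
        if value < best[0]:
            best = (value, gamma, beta)
    return best

# Toy cost (x0 + x1 - 1)^2 on two bits: zero exactly when one bit is set.
toy_cost = np.array([1.0, 0.0, 0.0, 1.0])
best_value, best_gamma, best_beta = random_search(toy_cost, num_samples=500)
```

For this toy cost the uniform superposition has expectation value 0.5, and the search finds parameters that lower it substantially. For the TSP QUBO the landscape is far less benign, which is consistent with the difficulty of finding good parameters reported for the five-city instances.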
For p = 2 and 3, we optimized the circuit parameters using Constrained Optimization BY Linear Approximation (COBYLA). In addition, similar to^{63}, we employed a p-dependent initialization technique for the circuit parameters. Specifically, (p + 1)-depth QAOA circuit parameters were initialized with the optimal parameters from the p-depth circuit, as follows:
This way, we allow the parameter training procedure to start in a known acceptable state based on the results of the previous step. In Fig. 7 we show our results for five-city instances of the TSP. The approximation ratio shown is derived by dividing the tour length of the best feasible solution, measured as the output of the trained QAOA circuit, by the optimal tour length of the respective instance. In addition, we compute results for two different p = 3 QAOA circuits: the first is trained in the procedure described above (where the parameters are trained for each instance). The second uses the parameters of the best QAOA circuit out of those for all instances evaluated at p = 3, following a concentration of parameters argument as presented in^{43}. The second method is closer to what is done in an ML context, where one set of parameters is used to evaluate the performance on all validation samples.
Due to the number of qubits required to formulate a QUBO for the TSP, we were not able to run QAOA for all TSP instances. For example, an instance with six cities already requires 25 qubits (we can fix the choice of the first city to be visited without loss of generality, requiring only (n−1)^{2} variables to formulate the QUBO). A different formulation of the QUBO problem presented in^{64}, which needs \({{{\mathcal{O}}}}(n\log (n))\) qubits, avoids this issue by modifying the circuit design. However, this proposal increases the circuit depth considerably and is therefore ill-suited for the NISQ era.
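The qubit count and the evaluation of a measured bitstring can be made concrete with a short sketch. The layout of the (n−1)^{2} binary variables (rows for cities 1 to n−1, columns for time steps, city 0 fixed as the start), the function names, and the example instance are assumptions for illustration; the original code may organize the variables differently.

```python
import math
import numpy as np

def num_qubo_variables(n):
    """Binary variables x_{v,t} needed once the first city is fixed."""
    return (n - 1) ** 2

def decode_tour(bits, n):
    """Map (n-1)^2 bits to a tour starting at city 0, or None if infeasible."""
    x = np.asarray(bits).reshape(n - 1, n - 1)   # assumed layout: x[v-1, t]
    # Feasible only if every city is visited once and every step holds one city.
    if not (np.all(x.sum(axis=0) == 1) and np.all(x.sum(axis=1) == 1)):
        return None
    return [0] + [int(np.argmax(x[:, t])) + 1 for t in range(n - 1)]

def tour_length(coords, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def approximation_ratio(coords, tour, optimal_length):
    """Tour length of a feasible solution divided by the optimal tour length."""
    return tour_length(coords, tour) / optimal_length

# Hypothetical four-city instance on the unit square; the optimal length is 4.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
perimeter_bits = np.eye(3, dtype=int).flatten()            # tour 0 -> 1 -> 2 -> 3
crossing_bits = np.array([0, 1, 0, 1, 0, 0, 0, 0, 1])      # tour 0 -> 2 -> 1 -> 3
ratio_perimeter = approximation_ratio(square, decode_tour(perimeter_bits, 4), 4.0)
ratio_crossing = approximation_ratio(square, decode_tour(crossing_bits, 4), 4.0)
```

A ratio of 1 corresponds to an optimal tour, while bitstrings that violate the one-city-per-step or one-step-per-city constraints decode to `None` and are excluded from the approximation ratio.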
In Fig. 7, we can see that finding a good set of parameters for QAOA to solve the TSP is hard even for five-city instances. We note that the performance of QAOA improves with higher p; however, it is still far from matching the approximation ratios obtained by the EQC even for p = 3, which are shown in Fig. 7 as a black dashed line. Furthermore, we note that significant computational effort is required to obtain these results: iterative optimizers like COBYLA, which is derivative-free and works with linear approximations of the objective, require us to evaluate the circuit many times until either convergence or the maximum number of iterations is reached. We also note that due to the heuristic optimization of the QAOA parameters themselves, we are not guaranteed that the configuration of parameters found is optimal, as the optimizer may stop before it has converged or converge prematurely to suboptimal parameter values. In an attempt to mitigate this, we tested several optimizers (Adam, SPSA, BFGS and COBYLA) and used the best results, which were those found by COBYLA.
Data availability
The data sets containing the TSP instances studied in this work and their optimal solutions can be found on GitHub (https://github.com/askolik/eqc_for_nco).
Code availability
The full code that was used to generate the numerical results and figures in this work can be found on GitHub (https://github.com/askolik/eqc_for_nco).
References
Cerezo, M. et al. Variational quantum algorithms. Nat. Rev. Phys. 3, 625–644 (2021).
Farhi, E., Goldstone, J. & Gutmann, S. A quantum approximate optimization algorithm. arXiv preprint arXiv:1411.4028 (2014).
Bennett, T., Matwiejew, E., Marsh, S. & Wang, J. B. Quantum walk-based vehicle routing optimisation. Front. Phys. 9, 692 (2021).
Grimsley, H. R., Economou, S. E., Barnes, E. & Mayhall, N. J. ADAPT-VQE: An exact variational algorithm for fermionic simulations on a quantum computer. arXiv preprint arXiv:1812.11173 (2018).
Peruzzo, A. et al. A variational eigenvalue solver on a photonic quantum processor. Nat. Commun. 5, 1–7 (2014).
Kandala, A. et al. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549, 242–246 (2017).
Benedetti, M., Lloyd, E., Sack, S. & Fiorentini, M. Parameterized quantum circuits as machine learning models. Quant. Sci. Technol. 4, 043001 (2019).
McClean, J. R., Boixo, S., Smelyanskiy, V. N., Babbush, R. & Neven, H. Barren plateaus in quantum neural network training landscapes. Nat. Commun. 9, 1–6 (2018).
Arrasmith, A., Cerezo, M., Czarnik, P., Cincio, L. & Coles, P. J. Effect of barren plateaus on gradient-free optimization. Quantum 5, 558 (2021).
Marrero, C. O., Kieferová, M. & Wiebe, N. Entanglement-induced barren plateaus. PRX Quant. 2, 040316 (2021).
Smagt, P. v. d. & Hirzinger, G. Why feedforward networks are in a bad shape. In International Conference on Artificial Neural Networks, 159–164 (Springer, 1998).
Silver, D. et al. Mastering the game of go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Jumper, J. et al. Highly accurate protein structure prediction with alphafold. Nature 596, 583–589 (2021).
Ramesh, A. et al. Zero-shot text-to-image generation. In International Conference on Machine Learning, 8821–8831 (PMLR, 2021).
Zhou, J. et al. Graph neural networks: A review of methods and applications. AI Open 1, 57–81 (2020).
Cappart, Q. et al. Combinatorial optimization and reasoning with graph neural networks. arXiv preprint arXiv:2102.09544 (2021).
Verdon, G. et al. Quantum graph neural networks. arXiv preprint arXiv:1909.12264 (2019).
Henry, L.P., Thabet, S., Dalyac, C. & Henriet, L. Quantum evolution kernel: Machine learning on graphs with programmable arrays of qubits. Phys. Rev. A 104, 032416 (2021).
Zheng, J., Gao, Q. & Lü, Y. Quantum graph convolutional neural networks. In 2021 40th Chinese Control Conference (CCC), 6335–6340 (IEEE, 2021).
Mernyei, P., Meichanetzidis, K. & Ceylan, İ. İ. Equivariant quantum graph circuits. In International Conference on Machine Learning, (PMLR, 2022).
Lucas, A. Ising formulations of many np problems. Front. Phys. 2, 5 (2014).
Fan, W. et al. Graph neural networks for social recommendation. In The world wide web conference, 417–426 (2019).
Bronstein, M. M., Bruna, J., Cohen, T. & Veličković, P. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478 (2021).
Ozaeta, A., van Dam, W. & McMahon, P. L. Expectation values from the single-layer quantum approximate optimization algorithm on Ising problems. Quant. Sci. Technol. 7, 045036 (2022).
Cerezo, M., Sone, A., Volkoff, T., Cincio, L. & Coles, P. J. Cost function dependent barren plateaus in shallow parametrized quantum circuits. Nat. Commun. 12, 1–12 (2021).
Schatzki, L., Larocca, M., Sauvage, F. & Cerezo, M. Theoretical guarantees for permutation-equivariant quantum neural networks. arXiv preprint arXiv:2210.09974 (2022).
Larocca, M. et al. Diagnosing barren plateaus with tools from quantum optimal control. Quantum 6, 824 (2022).
Wang, S. et al. Noise-induced barren plateaus in variational quantum algorithms. Nat. Commun. 12, 1–11 (2021).
Mei, S., Misiakiewicz, T. & Montanari, A. Learning with invariances in random features and kernel models. In Conference on Learning Theory, 3351–3418 (PMLR, 2021).
Caro, M. C. et al. Generalization in quantum machine learning from few training data. Nat. Commun. 13, 1–11 (2022).
Bengio, Y., Lodi, A. & Prouvost, A. Machine learning for combinatorial optimization: a methodological tour d’horizon. Eur. J. Operational Res. 290, 405–421 (2021).
Vinyals, O., Fortunato, M. & Jaitly, N. Pointer networks. arXiv preprint arXiv:1506.03134 (2015).
Dai, H., Khalil, E.B., Zhang, Y., Dilkina, B. & Song, L. Learning combinatorial optimization algorithms over graphs. arXiv preprint arXiv:1704.01665 (2017).
Mirhoseini, A. et al. Chip placement with deep reinforcement learning. arXiv preprint arXiv:2004.10746 (2020).
Nazari, M., Oroojlooy, A., Snyder, L. V. & Takáč, M. Reinforcement learning for solving the vehicle routing problem. arXiv preprint arXiv:1802.04240 (2018).
Marwaha, K. Local classical MAX-CUT algorithm outperforms p = 2 QAOA on high-girth regular graphs. Quantum 5, 437 (2021).
Szegedy, M. What do QAOA energies reveal about graphs? arXiv preprint arXiv:1912.12277 (2019).
Harman, G. H., Kulkarni, S. R. & Narayanan, H. \(\sin (\omega x)\) can approximate almost every finite set of samples. Constr. Approximation 42, 303–311 (2015).
Lloyd, S. Quantum approximate optimization is computationally universal. arXiv preprint arXiv:1812.11075 (2018).
Morales, M. E., Biamonte, J. D. & Zimborás, Z. On the universality of the quantum approximate optimization algorithm. Quant. Inform. Processing 19, 1–26 (2020).
Pythontsp https://github.com/fillipegsm/pythontsp.
Christofides, N. Worst-case analysis of a new heuristic for the travelling salesman problem. Tech. Rep. (Carnegie-Mellon Univ Pittsburgh Pa Management Sciences Research Group, 1976).
Brandao, F. G., Broughton, M., Farhi, E., Gutmann, S. & Neven, H. For fixed control parameters the quantum approximate optimization algorithm’s objective function value concentrates for typical instances. arXiv preprint arXiv:1812.04170 (2018).
Farhi, E. & Harrow, A. W. Quantum supremacy through the quantum approximate optimization algorithm. arXiv preprint arXiv:1602.07674 (2016).
Farhi, E., Gamarnik, D. & Gutmann, S. The quantum approximate optimization algorithm needs to see the whole graph: A typical case. arXiv preprint arXiv:2004.09002 (2020).
Bravyi, S., Kliesch, A., Koenig, R. & Tang, E. Obstacles to variational quantum optimization from symmetry protection. Phys. Rev. Lett. 125, 260505 (2020).
Zhou, L., Wang, S.-T., Choi, S., Pichler, H. & Lukin, M. D. Quantum approximate optimization algorithm: Performance, mechanism, and implementation on near-term devices. Phys. Rev. X 10, 021067 (2020).
Shaydulin, R., Lotshaw, P. C., Larson, J., Ostrowski, J. & Humble, T. S. Parameter transfer for quantum approximate optimization of weighted maxcut. arXiv preprint arXiv:2201.11785 (2022).
Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
LeCun, Y. et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks. 3361, 1995 (1995).
Schmidt, R. M. Recurrent neural networks (rnns): A gentle introduction and overview. arXiv preprint arXiv:1912.05911 (2019).
Wu, Z. et al. A comprehensive survey on graph neural networks. IEEE Transact. Neural Netw. Learning Syst. 32, 4–24 (2020).
Mansimov, E., Mahmood, O., Kang, S. & Cho, K. Molecular geometry prediction using a deep generative graph neural network. Sci. Rep. 9, 1–13 (2019).
Long, J. et al. A graph neural network for superpixel image classification. In Journal of Physics: Conference Series, vol. 1871, 012071 (IOP Publishing, 2021).
Cong, I., Choi, S. & Lukin, M. D. Quantum convolutional neural networks. Nat. Phys. 15, 1273–1278 (2019).
Larocca, M. et al. Group-invariant quantum machine learning. arXiv preprint arXiv:2205.02261 (2022).
Chen, Y. et al. Novel architecture of parameterized quantum circuit for graph convolutional network. arXiv preprint arXiv:2203.03251 (2022).
Sutton, R. S. & Barto, A. G. Reinforcement learning: An introduction (MIT press, 2018).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Chen, S. Y.-C. et al. Variational quantum circuits for deep reinforcement learning. IEEE Access 8, 141007–141024 (2020).
Lockwood, O. & Si, M. Reinforcement learning with quantum variational circuit. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, vol. 16, 245–251 (2020).
Skolik, A., Jerbi, S. & Dunjko, V. Quantum agents in the gym: a variational quantum algorithm for deep Q-learning. Quantum 6, 720 (2022).
Zhou, L., Wang, S.-T., Choi, S., Pichler, H. & Lukin, M. D. Quantum approximate optimization algorithm: Performance, mechanism, and implementation on near-term devices. Phys. Rev. X 10, 021067 (2020).
Glos, A., Krawiec, A. & Zimborás, Z. Space-efficient binary optimization for variational computing. npj Quant. Inform. 8, 1–39 (2022).
Acknowledgements
A.S. thanks Elies GilFuster and Radoica Draskic for valuable discussions about geometric deep learning. A.S., M.C., and S.Y. are funded by the German Ministry for Education and Research (BMB+F) in the project QAI2QKIS under grant 13N15587. V.D. is supported by the Dutch Research Council (NWO/OCW), as part of the Quantum Software Consortium programme (project number 024.003.037). V.D. also acknowledges the support by the project NEASQC funded from the European Union’s Horizon 2020 research and innovation programme (grant agreement No 951821). V.D. also acknowledges the funding by the European Union under Grant Agreement 101080142 and the project EQUALITY.
Author information
Contributions
A.S. conceived the idea for this work. A.S. and V.D. conducted theoretical analysis of the proposed ansatz. A.S. and M.C. performed numerical simulations and the analysis of their results. S.Y. created the data set used in this work. V.D. and T.B. supervised the project. A.S., M.C., and S.Y. wrote the first draft of the paper; all authors contributed to editing the final paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Skolik, A., Cattelan, M., Yarkoni, S. et al. Equivariant quantum circuits for learning on weighted graphs. npj Quantum Inf 9, 47 (2023). https://doi.org/10.1038/s41534-023-00710-y