Hierarchical quantum circuit representations for neural architecture search

Machine learning with hierarchical quantum circuits, usually referred to as Quantum Convolutional Neural Networks (QCNNs), is a promising prospect for near-term quantum computing. The QCNN is a circuit model inspired by the architecture of Convolutional Neural Networks (CNNs). CNNs are successful because they do not need manual feature design and can learn high-level features from raw data. Neural Architecture Search (NAS) builds on this success by learning network architecture and achieves state-of-the-art performance. However, applying NAS to QCNNs presents unique challenges due to the lack of a well-defined search space. In this work, we propose a novel framework for representing QCNN architectures using techniques from NAS, which enables search space design and architecture search. Using this framework, we generate a family of popular QCNNs, those resembling reverse binary trees. We then evaluate this family of models on a music genre classification dataset, GTZAN, to justify the importance of circuit architecture. Furthermore, we employ a genetic algorithm to perform Quantum Phase Recognition (QPR) as an example of architecture search with our representation. This work provides a way to improve model performance without increasing complexity and to jump around the cost landscape to avoid barren plateaus. Finally, we implement the framework as an open-source Python package to enable dynamic QCNN creation and facilitate QCNN search space design for NAS.


INTRODUCTION
Machine learning using trainable quantum circuits provides promising applications for quantum computing [1][2][3][4]. Among various parameterized quantum circuit (PQC) models, the Quantum Convolutional Neural Network (QCNN) introduced in Ref. [5] stands out for its shallow circuit depth, absence of barren plateaus [6], and good generalisation capabilities [7]. It has been implemented experimentally [8] and combines techniques from Quantum Error Correction (QEC), Tensor Networks (TNs) and deep learning. Research at this intersection has been fruitful, yielding deep learning solutions for quantum many-body problems [9][10][11][12], quantum-inspired insights for deep learning [13][14][15] and equivalences between them [16][17][18]. Deep learning has been widely successful in recent years, with applications spanning from content filtering and product recommendations to aided medical diagnosis and scientific research. Its main characteristic, learning features from raw data, eliminates the need for manual feature design by experts [19]. AlexNet [20] demonstrated this and marked the shift in focus from feature design to architecture design [21]. Naturally, the next step is learning network architecture, which Neural Architecture Search (NAS) aims to achieve [22]. NAS has already produced state-of-the-art deep learning models with automatically designed architectures [21,23,24,25]. NAS consists of three main categories: search space, search strategy and performance estimation strategy [22]. The search space defines the set of possible architectures that a search algorithm can consider; carefully designed search spaces improve search efficiency and reduce computational complexity [26]. Search space design often involves encoding architectures using a cell-based representation. Typically, a set of primitive operations, such as convolutions or pooling, is combined into a cell that captures some design motif (compute graph). Different cells are then stacked to
form a complete architecture. Cell-based representations are popular because they capture repeated motifs and modular design patterns, which are often seen in successful hand-crafted architectures. Similar patterns also appear in quantum circuit designs [5,27,28,29,30,31]. For example, Grant et al. [27] use hierarchical architectures based on tensor networks to classify classical and quantum data. Similarly, Cong et al. [5] use the multiscale entanglement renormalisation ansatz (MERA) as an instance of their proposed QCNN and discuss generalisations for quantum analogues of convolution and pooling operations. In this work, we formalise these design patterns by providing a hierarchical representation for QCNNs, capturing their architecture in a way that facilitates search space design for NAS with PQCs.
The QCNN belongs to the class of hybrid quantum-classical algorithms, in which a quantum computer executes the circuit and a classical computer optimises its parameters. Two key factors must be considered when using PQCs for machine learning: the method of data encoding (feature map) [32,33] and the choice of a quantum circuit [34][35][36]. Both the challenge and objective are to find a suitable quantum circuit for a given feature map that is expressive and trainable [33].

FIG. 1: The machine learning pipeline we implemented for music genre classification. Given an audio signal of a song (a), we generate two forms of data: tabular (b) and image (c). Each form has data preprocessing applied before being encoded into a quantum state (d). The QCNN circuit shown in (d) favours Principal Component Analysis (PCA) because qubits are pooled from bottom to top, and principal components are encoded from top to bottom. This architecture is an instance of the reverse binary tree family that we generated with our framework.
The typical approach to finding a circuit is to keep the architecture (gate layout) fixed and to optimise continuous parameters such as rotation angles. Optimising architecture is referred to as a variable structure ansatz in the literature and is generally not the focus because of its computational complexity [2]. However, the architecture of a circuit can improve its expressive power and the effectiveness of initialisation techniques [28]. Also, the QCNN's defining characteristic is its architecture, which we found to impact model performance significantly. Therefore, we look towards NAS to optimise architecture in a quantum circuit setting. This approach, sometimes referred to as quantum architecture search (QAS) [37,38], has shown promising results for the variational quantum eigensolver (VQE) [39][40][41][42], the quantum approximate optimisation algorithm (QAOA) [43,44] and general architecture search [37,38,45,46]. However, these approaches are often task-specific or impose additional constraints, such as circuit topology or allowable gates, to make them computationally feasible. To the best of our knowledge, there is currently no framework that can generate hierarchical architectures such as the QCNN without imposing such constraints.
One problem with the cell-based representation for NAS is that the macro architecture, the sequence of cells, is fixed and must be chosen [22]. Recently, Liu et al. [26] proposed a hierarchical representation as a solution, where a cell sequence acts as the third level of a multi-level hierarchy. In this representation, lower-level motifs act as building blocks for higher-level ones, allowing both macro and micro architecture to be learned. In this work, we follow a similar approach and represent a QCNN architecture as a hierarchy of directed graphs. On the lowest level are primitive operations such as convolutions and pooling. The second level consists of sequences of these primitives, such as convolution-pooling or convolution-convolution units. Higher-level motifs then contain sequences of these lower-level motifs. For example, the third level could contain a sequence of three convolution-pooling units, as seen in Figure 1d.
For the primitives, we define hyperparameters such as strides and pooling filters that control their architectural effect. This way, the representation can capture design motifs on multiple levels, from the distribution of gates in a single layer to overall hierarchical patterns such as tensor tree networks. We demonstrate this by generating a family of QCNN architectures based on popular motifs in the literature.

FIG. 2: An overview of our architectural representation for QCNNs. From a given set of gates, we build two-qubit unitary ansatzes. The representation then captures design motifs M^l_k on different levels l of the hierarchy. On the lowest level l = 1, we define primitives which act as building blocks for the architecture. For example, a convolution operation with stride one is encoded as the directed graph M^1_1. The directed graph M^1_3 is a pooling operation that measures the bottom half of the circuit. Combined, they form the level-two motif (e): a convolution-pooling unit M^2_1. Higher-level motifs consist of combinations of lower-level motifs up until the final level l = L, which contains only one motif M^L_1, the complete QCNN architecture. M^L_1 is a hierarchy of directed graphs fully specifying how to spread the unitary ansatzes across the circuit. The two lines of code, (e) and (f), show the power of this representation: they are all that is required to create the entire QCNN circuit from Figure 1 (d). The code comes from the Python package we implemented based on the work of this paper. It facilitates dynamic QCNN creation and search space design.

We then benchmark this family of models and show that altering the architecture has a greater impact on model performance than other modelling components. By altering architecture we mean the following: given a quantum circuit that consists of n unitary gates, an altered architecture consists of the same n gates rearranged in a different way on the circuit. The rearrangements may change which qubits the gates act upon, alter the order of gate occurrences, or adjust larger architectural motifs, such as pooling specific qubits (no longer using them) while leaving others available for subsequent gates. We create architectural families to show the impact of altering architecture: any two instances of a family have exactly the same unitaries, just applied in a different order on different qubits. Consider the machine learning pipeline for classifying musical genres from audio signals, seen in Figure 1. We start with a 30-second recording of a song (Figure 1a) and transform it in two ways. The first is tabular form (Figure 1b), derived from standard digital signal processing statistics of the audio signal. The second is image form (Figure 1c), constructed using a Mel frequency spectrogram. Both datasets are benchmarked separately, with their own data preprocessing and encoding techniques applied. For the tabular data, we test Principal Component Analysis (PCA) and tree-based feature selection before encoding it in a quantum state using either qubit, IQP, or amplitude encoding. Once encoded, we choose two-qubit unitary ansatzes U_m and V_m for the convolution and pooling primitives m = 1, 2, ..., 6, as shown in Figure 1d. We show example ansatzes in Appendix A and test them across different instances of an architecture family. Of all the components in this pipeline, altering the architecture, that is, changing how each U_m and each V_m are spread across the circuit, had the greatest impact on model performance. In addition to our theoretical framework, we implement it as an open-source Python package to enable dynamic QCNN creation and facilitate search space design for NAS. It allows users to experimentally determine suitable architectures for specific modelling setups, such as finding circuits that perform well under a specific noise or hardware configuration, which is particularly relevant in the Noisy Intermediate-Scale Quantum (NISQ) [47] era. Additionally, as more qubits become available, the hierarchical nature of our framework provides a natural way to scale up the same model. In summary, our contributions are the architectural representation for QCNNs, a Python package for dynamic QCNN creation, and experimental results on the potential advantage of architecture search in a quantum setting.
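Amplitude encoding, one of the three encodings tested above, fits a 2^n-dimensional feature vector into the amplitudes of an n-qubit state, so the 8 × 32 = 256-pixel spectrogram maps onto 8 qubits. A minimal numpy sketch (the image below is random stand-in data, not the GTZAN spectrogram):

```python
import numpy as np

def amplitude_encode(x):
    """L2-normalise a feature vector so it is a valid quantum state.

    A vector of length 2**n is encoded into the amplitudes of an
    n-qubit state; here 8x32 = 256 pixels fit into 8 qubits.
    """
    x = np.asarray(x, dtype=float).ravel()
    n_qubits = int(np.log2(x.size))
    assert 2 ** n_qubits == x.size, "length must be a power of two"
    return x / np.linalg.norm(x), n_qubits

# A toy 8x32 "spectrogram" standing in for Figure 1 (c).
image = np.random.default_rng(0).random((8, 32))
state, n = amplitude_encode(image)
print(n)                                # 8 qubits
print(np.isclose(state @ state, 1.0))   # state has unit norm
```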
The remainder of this paper is structured as follows: we begin with our main results by summarising the architectural representation for QCNNs and then show the effect of altering architecture, justifying its importance. We then provide an example of architecture search with our representation by employing an evolutionary algorithm to perform QPR. Following this, we give details of our framework by providing a mathematical formalism for the representation and describing its use. Next, with the formalism at hand, we show how it facilitates search space design by describing the space we created for the benchmark experiments. We then discuss generalisations of the formalism and the applicability of our representation with search algorithms. After this, we elaborate on our experimental setup in the Methods section. Finally, we discuss applications and future steps.

Architectural Representation
Figure 2 shows our architectural representation for QCNNs. We define two-qubit unitary ansatzes from a given set of gates and capture design motifs M^l_k on different levels l of the hierarchy. On the lowest level l = 1, we define primitives which act as building blocks for the architecture. For example, a convolution operation with stride one is encoded as the directed graph M^1_1, and with stride three as M^1_2. The directed graph M^1_3 is a pooling operation that measures the bottom half of the circuit, and M^1_4 measures from the inside outwards. Combined, they can form higher-level motifs such as convolution-pooling units M^2_1 (e), convolution-convolution units M^2_2, or convolution-pooling-convolution units M^2_3. The highest level l = L contains only one motif M^L_1, the complete QCNN architecture. M^L_1 is a hierarchy of directed graphs fully specifying how to spread the unitary ansatzes across the circuit. This hierarchical representation is based on the one from Liu et al. [26] for deep neural networks (DNNs) and allows for the capture of modularised design patterns and repeated motifs. The two lines of code, (e) and (f), show the power of this representation: they are all that is required to create the entire QCNN circuit from Figure 1 (d). The code comes from the Python package we implemented based on the work of this paper. It facilitates dynamic QCNN creation and search space design.
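The motif composition behind code lines (e) and (f) can be made concrete with a stripped-down sketch. The `Motif` class and the `Qfree`/`Qconv`/`Qpool` helpers below are toy stand-ins, not the actual package API; they exist only to show the "+" (append) and "×" (repeat) semantics of the hierarchy:

```python
from dataclasses import dataclass, field

@dataclass
class Motif:
    """A sequence of primitives; '+' appends motifs, '*' repeats one.
    Toy sketch only, not the package's real classes."""
    ops: list = field(default_factory=list)

    def __add__(self, other):
        return Motif(self.ops + other.ops)

    def __mul__(self, n):
        return Motif(self.ops * n)

def Qfree(n):                   return Motif([("free", n)])
def Qconv(stride):              return Motif([("conv", stride)])
def Qpool(stride, filt="even"): return Motif([("pool", stride, filt)])

# The reverse-binary-tree architecture of Figure 1 (d):
qcnn = Qfree(8) + (Qconv(1) + Qpool(0, "even")) * 3
print(qcnn.ops)
# [('free', 8), ('conv', 1), ('pool', 0, 'even'), ('conv', 1), ...]
```

Repeating the convolution-pooling unit three times halves the qubit count each round, 8 → 4 → 2 → 1, which is exactly the reverse binary tree pattern.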

Architectural impact
The details regarding the specific notation and representation of the framework are given after this section; first, we justify it with the following experimental results. In Appendix C we also give background on QCNNs and quantum machine learning for more context. To illustrate the impact of architecture on model performance, we compare the fixed architecture from the experiments of Hur et al. [29] to other architectures in the same family while keeping all other components the same. The only difference in each comparison is architecture (how the unitaries are spread across the circuit). The architecture in [29] is represented within our framework as (s_c, F_*, s_p) = (1, even, 0) → Qfree(8) + (Qconv(1) + Qpool(0, F_even)) × 3; see Algorithm 1. To evaluate performance, we use the country vs rock genre pair, which proved to be one of the most difficult classification tasks of the 45 possible combinations. We compare eight unitary ansatzes with different levels of complexity, as shown in Figure A.1. Table I shows the results of the comparisons; the reference architecture is as described above and the discovered alteration was found via random search. We note the first important result: we improved the performance of every ansatz, in one case by 18.05%, through random search of the architecture space. Ansatz refers to the two-qubit unitary used for the convolution operation of a model. For example, the model in Figure 1 (d) is described by (1, right, 0), and ansatz A.1a corresponds to U_1, U_2 and U_3 being circuit A.1a from Appendix A.
Each value represents the average model accuracy and standard deviation over 30 separately trained instances on the same held-out test set. The second important result is that altering architecture can improve model performance without increasing complexity. For instance, the best-performing model for the reference architecture uses ansatz A.1g, with an average accuracy of 73.24%. However, this ansatz gives the model 10 × 3 = 30 parameters. In contrast, by altering the architecture with the simplest ansatz A.1a, the model outperformed the best reference model with an average accuracy of 75.14% while only having 3 × 2 = 6 parameters. (In Table I, the "reference" architecture comes from [29] and the "alteration" was found through random search within the same family.) The parameter counts come from each model having N = 8 qubits and the same number of unitaries, 3N − 2 → 3(8) − 2 = 22, of which 13 are for convolutions. See the search space design section and Algorithm 1 for more details. A model has three convolutions, and each convolution shares weights between its two-qubit unitaries. This means that the two-qubit unitary ansatz primarily determines the number of parameters to optimise for a model. For example, a model with ansatz A.1a has 2 × 3 = 6 parameters to optimise because ansatz A.1a has two parameters. Another interesting result concerns ansatz A.1c: the reference architecture could only obtain an average accuracy of 52.69%, indicating its inability to find any kind of local minimum during training, suggesting a barren plateau. The altered architecture, however, was able to find a local minimum and improved the average accuracy by 18.05%.
We would like to note that our primary objective in these experiments is to demonstrate the potential for performance improvement. As such, we only conducted random search for approximately 2 hours on an i7-1165G7 processor for each ansatz. Consequently, for higher-parameter ansatzes, which correspond to longer training times, the search space was less explored. This is likely the reason behind the observed decrease in performance improvement for larger-parameter ansatzes. Therefore, the observed improvements are all lower bounds for the potential performance increase from altering architecture. We anticipate that significantly better architectures may still exist within the space.

TABLE II: Country vs rock average accuracy within the reverse binary tree search space, all with A.1a as ansatz. The convolution stride s_c is shown on the horizontal axis and the combinations of pooling filter F_* and stride s_p on the vertical. The best pooling filter and convolution stride combinations are presented in bold, along with the overall best architecture (s_c, F_*, s_p) = (6, left, 2).
Table II presents the performance of the family of reverse binary trees (as described in Algorithm 1) for ansatz A.1a. Due to its quick training time, ansatz A.1a was the only case for which we managed to exhaust the search space (168 architectures). In the search space design section, we discuss how the size of the family can easily be increased or decreased. Each value represents the average accuracy of five trained instances on the country vs rock genre pair. The overall accuracy of the whole space is 63.11%, indicating that the reference architecture from Table I was close to the mean performance. The best-performing architecture in this space is (s_c, F_*, s_p) = (6, left, 2), with an average accuracy of 75.93%. This is the alteration from Table I discovered through random search within the family of reverse binary trees. The combination of F_left and s_c = 6 performs particularly well for this task, with an average accuracy of 72.52%. In general, the convolution stride s_c and pooling filter F_* have the most significant impact on performance. It is also worth noting that convolution strides of s_c = 3, 4, 5 performed poorly compared to the other values. The range of performance in this space goes from a minimum of 43.75% to a maximum of 75.93%, demonstrating the potential impact of architectural choices on model performance.

FIG. 3: QCNN with the F^right_m pooling filter using low-resolution image data. The accuracies for all genre pairs are provided.
Finally, we compared the performance of two different architectures on the image data across all genres. This time, we used ansatz A.1g to compare the F^right_m and F^even_m pooling filters, shown in Figures 3 and 4. The image data is a low-resolution (8 × 32 = 256 = 2^8 pixels) spectrogram of the audio signal. We did not expect high accuracy from this data but were interested in the variation of performance across architectures. Figures 3 and 4 show the difficulty of some genre pairs. Interestingly, the F^right_m pooling filter outperformed the F^even_m filter on almost all genres. If we focus on the genre pairs that the models were able to classify, F^right_m had 14 models that achieved an accuracy above 75%, compared to the 5 of F^even_m. We also note that the image data had no PCA or tree-based feature selection applied to it, and the F^right_m filter was still favoured. A similar result was obtained with ansatz A.1a. This shows that architecture impacts performance even on low-resolution data.
FIG. 5: Expectation values for the circuit found via evolutionary search for a system of N = 15 spins. Points represent a test set of 64 × 64 ground states for various h_1 and h_2 values of the Hamiltonian, J = 1. The inside, middle and outside points were used to evaluate an architecture's fitness during search. The same colour scale as in [5] is used to facilitate comparison.

Architectural Search
In this section, we present an example of applying our architectural representation in conjunction with evolutionary search to perform Quantum Phase Recognition (QPR). The specifics of the search algorithm can be found in the Generalisation and Search section; we utilise an algorithm similar to the one employed by Liu et al. [26]. Mutations involve replacing a primitive within a motif with a randomly generated one, while crossover consists of combining two motifs end-to-end, if possible, or interweaving them otherwise. To facilitate comparison, we consider the same task and setup as the original QCNN paper [5]. The objective is to recognise a Z_2 × Z_2 symmetry-protected topological (SPT) phase for a ground state that belongs to a family of cluster-Ising Hamiltonians [50]:

H = −J Σ_i Z_i X_{i+1} Z_{i+2} − h_1 Σ_i X_i − h_2 Σ_i X_i X_{i+1}.    (1)

Here, X_i, Z_i are Pauli operators acting on the spin at site i, and the SPT phase contains an S = 1 Haldane chain [51]. The ground state can belong to an SPT, paramagnetic or antiferromagnetic phase depending on the values of h_1, h_2, and J.
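A minimal sketch of the mutation and crossover operations described above; the primitive encodings are placeholders, and interweaving is omitted for brevity, so this is an illustration rather than the search algorithm's actual implementation:

```python
import random

# A toy pool of primitives to draw random replacements from.
PRIMITIVES = [("conv", 1), ("conv", 2), ("pool", 0, "left"), ("pool", 0, "even")]

def mutate(motif, rng):
    """Replace one primitive in the motif with a randomly generated one."""
    child = list(motif)
    i = rng.randrange(len(child))
    child[i] = rng.choice(PRIMITIVES)
    return child

def crossover(a, b):
    """Combine two motifs end-to-end (the interweaving case is omitted)."""
    return list(a) + list(b)

rng = random.Random(42)
parent = [("conv", 1), ("pool", 0, "even")]
print(mutate(parent, rng))
print(crossover(parent, [("conv", 3)]))
```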
Our goal is to identify a QCNN capable of distinguishing between SPT and other phases by measuring a single qubit. Following the approach in [5], we consider a system of N = 15 spins and train a circuit on 40 equally spaced points along h_2 = 0, where the ground state is known to be in the SPT phase when h_1 ≤ 1. We also evaluate the circuit with the same sample complexity as [5]:

M_min = (1.96)^2 p(1 − p) / (p − p_0)^2,    (2)

where p represents the probability of measuring a non-zero expectation value and p_0 = 0.5. Equation (2) calculates the minimum number of measurements required to be 95% confident that p ≠ p_0, with p being the expectation value of the circuit U encoded with the ground state |ψ_g⟩ transformed into a probability: p = (⟨ψ_g|U|ψ_g⟩ + 1)/2. Therefore, a well-performing QCNN will yield low values of M_min near the phase boundary for points within the SPT phase. We define the fitness of an architecture as a linear combination of the sample complexity values M_in, M_middle for points in the SPT phase and the mean squared error MSE_out for points outside the boundary. Figure 5 illustrates the points considered for M_in, M_middle and MSE_out. During search we assigned the majority of the weight to M_in, as the goal is to develop a model that confidently identifies SPT phases near the boundary. To prevent a model from classifying all points as SPT, MSE_out is included, while M_middle ensures overall good performance. Finally, during search we added a regularisation term for the number of parameters, to find well-performing architectures with low computational complexity.

TABLE III: Sample complexity for the reference architecture [5] and the architecture found via evolutionary search. Sample complexity represents the expected number of measurements required to be 95% confident that the ground state is in the SPT phase (non-zero expectation value). Metrics are calculated on a set of points in the test set, where inside refers to SPT points near the phase boundary, outside to non-SPT points near the phase boundary and middle to points in between, as shown in Figure 5.
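The fitness function can be sketched as follows. The weights `w` and regularisation strength `reg` are illustrative placeholders, since the text only states that M_in carries the majority of the weight:

```python
def fitness(m_in, m_middle, mse_out, n_params,
            w=(0.6, 0.2, 0.2), reg=0.01):
    """Linear combination of the three metrics plus a parameter-count
    penalty. Lower is better: fewer measurements needed, lower error,
    fewer parameters. Weights here are hypothetical, not the paper's."""
    w_in, w_mid, w_out = w
    return w_in * m_in + w_mid * m_middle + w_out * mse_out + reg * n_params

# Scoring the discovered architecture's metrics from Table III:
print(fitness(m_in=36.079, m_middle=13.253, mse_out=0.167, n_params=11))
```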
Table III and Figure 5 show the performance of the best architecture found during search. The search algorithm identified a QCNN with only 11 parameters, in contrast to the 1308 parameters of the original reference architecture. For points in the SPT phase near the boundary, the sample complexity of the discovered architecture (M_in = 36.079) is lower than that of the reference (61.523), resulting in 25 fewer measurements required on average. Although the reference architecture exhibits slightly better sample complexity for points in the middle of the phase boundary (M_middle = 10.992 compared to 13.253) and a marginally lower MSE for points outside the phase boundary (MSE_out = 0.164 compared to 0.167), the improvements in M_in and the number of parameters are substantial and more advantageous. The discovered architecture can be found in Appendix B.2, and the phase diagram it generates is shown in Figure 5. The search was conducted on a system equipped with two Intel Xeon E5-2640 processors (2.0 GHz) and 128 GB of RAM, and it took approximately 2 hours to discover the final architecture (over 831 generations). Although we anticipate that extending the search may yield even better architectures, the primary goal of this experiment was to demonstrate a representative example of the search process and showcase the ease of obtaining promising results. This emphasises the potential advantages of architecture search in quantum computing tasks, where the computational cost of a circuit can be reduced while maintaining or even improving performance. We attribute this success to a well-defined search space, with our representation aiming to simplify the process of creating such spaces. Moreover, our representation allows for the incorporation of hardware constraints, facilitating the search for architectures that perform well on specific quantum devices. We believe this to be a necessary step towards the development of efficient quantum algorithms for real-world applications. By employing a well-structured representation and search space, we can streamline the process of discovering optimised quantum circuit architectures that are better suited to specific tasks and hardware.

Digraph Formalism
We represent the QCNN architecture as a sequence of directed graphs, each acting as a primitive operation such as a convolution (Qconv) or pooling (Qpool). A primitive is a directed graph G = (Q, E); its nodes Q represent available qubits, and its oriented edges E the connectivity of the unitary applied between a pair of them. The direction of an edge indicates the order of interaction for the unitary. For example, a CNOT gate with qubit i as control and j as target is represented by the edge from qubit i to qubit j. We also introduce other primitives, such as Qfree, that free up pooled qubits for future operations. The effect of a primitive is based on its hyperparameters and the effect of its predecessor. This way, their individual and combined architectural effects are captured, enabling them to be dynamically stacked one after another to form the second-level (l = 2) motifs. Stacking these stacks in different ways constitutes higher-level motifs until a final level l = L, where one motif constitutes the entire QCNN architecture. In the case of pooling, controlled unitaries are used in place of measurement due to the deferred measurement principle [52]. We define a QCNN architecture in Definition 1.
Motifs on the lowest level, M^1_k, are primitive operations, which together form the set of primitives. At the highest level l = L there is only one motif, M^L_1, which is a hierarchy of tuples. M^L_1 is flattened through an assemble operation, M = assemble(M^L_1), which encodes each primitive into a directed graph G_m = (Q_m, E_m); the nodes Q_m are available qubits and the edges E_m the connectivity of unitaries applied between them. M describes the entire QCNN architecture: M = (G_1, G_2, ..., G_|M|).
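The assemble operation can be sketched as a depth-first flattening of the tuple hierarchy into the primitive sequence M; the string primitives below are stand-ins for the directed-graph encoding:

```python
def assemble(motif):
    """Flatten a hierarchy of tuples into the primitive sequence
    M = (G_1, G_2, ..., G_|M|), depth-first. Sketch only: primitives
    are represented by strings rather than directed graphs."""
    if isinstance(motif, tuple):
        flat = []
        for sub in motif:
            flat.extend(assemble(sub))
        return flat
    return [motif]  # a lowest-level motif, i.e. a primitive

# Level-2 motif: a convolution-pooling unit; level 3 repeats it three times.
m2 = ("Qconv(1)", "Qpool(0)")
m3 = (m2, m2, m2)
M = assemble(("Qfree(8)", m3))
print(M)
# ['Qfree(8)', 'Qconv(1)', 'Qpool(0)', 'Qconv(1)', 'Qpool(0)', 'Qconv(1)', 'Qpool(0)']
```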
G_1 is always a Qfree(N_q) primitive, specifying the number of available qubits with N_q. For m > 1, G_m is defined according to its primitive type (convolution, pooling or Qfree), with d^−(i) and d^+(i) referring to the indegree and outdegree of node i, respectively, and \ to set difference.
We show this digraph perspective in Figure 6; it is the data structure of the circuit in Figure 1d. If the m-th graph in M is a convolution, we denote its two-qubit unitary acting on qubits i and j as U^ij_m(θ). Similarly, for pooling, we notate the unitary as V^ij_m(θ). The action of V^ij_m(θ) is measuring qubit i (the control), which causes a unitary rotation V on qubit j (the target). With this figure and notational scheme in mind, Definition 2 reads as follows:

FIG. 6: Graph view of the circuit architecture in Figure 1 (d). The same two-qubit unitary is used in all layers for the convolution operation, i.e. U^ij_m = U_m. Similarly, in this example, we use the same two-qubit pooling unitaries, with all eight qubits Q^c_1 available for the convolution operations. Below G_1 is G_2, with half the qubits of Q^p_2 measured, indicated by the i-th indices of V^ij_m, (i, j) ∈ E^p_2. For example, qubit 8 ∈ Q^p_2 is measured and V_2 applied to qubit 1 ∈ Q^p_2, as indicated by V^81_2, (8, 1) ∈ E^p_2. This pattern repeats until one qubit remains in G_6, which is measured and used to classify the music genre.

Q^x_m is the set of available qubits for the m-th primitive in M, where x ∈ {c, p, f} for convolution, pooling or Qfree respectively. The first primitive G_1 is Qfree(N_q), which specifies the number of available qubits N_q for future operations. Any subsequent primitive G_m, m > 1, only has access to qubits not measured up to that point. This is the previous primitive's available qubits Q^x_{m−1} if its type x ∈ {c, f} is a convolution or Qfree. Otherwise, for pooling, x = p, it is the set difference between the available and the measured qubits. Measured qubits are visualised as small red circles in Figure 6. The only way to make those qubits available again is through Qfree(N_f), which can be used to free up N_f qubits. For the convolution primitive, E^c_m is the set of all pairs of qubits that have U^ij_m(θ) applied to them. Finally, for the pooling primitive, E^p_m is the set of pairs of qubits that have pooling unitaries V^ij_m(θ) applied to them. The restriction is that if qubit i is measured, it cannot have any other rotational unitary V applied to it within the same primitive G_m; that is, the indegree d^−(i) of node i is zero. Similarly, if qubit i is measured, it may only have one corresponding target, meaning that the outdegree d^+(i) of node i is one. In the same vein, no target qubit j can be the control for another: d^+(j) = 0.
Every target qubit j has at least one corresponding control qubit i: d^−(j) ≥ 1. It is possible for multiple measured qubits to have the same target qubit, giving E^p_m a surjective property. Following this definition, we can express a convolution or pooling operation for the m-th graph in M. Let W_m = U_m or V_m be the m-th primitive in M, based on whether it is a convolution or pooling, and the identity I if it is a Qfree primitive. Then the state of the QCNN after one training run is obtained by applying W_1, W_2, ..., W_|M| in sequence to the encoded input state. We note that the choice of V is unrestricted, which means that within one layer each V can be a different rotation. Figure 1d shows a special case where the same V is used per layer, which is computationally favourable compared to using different ones. To enable weight sharing, the QCNN requires convolution unitaries to be the same, i.e. U^ij_m = U^kh_m where (i, j), (k, h) ∈ E^c_m. This formulation only regards one- and two-qubit unitaries for convolutions; in the Generalisation and Search section we extend it to multi-qubit unitaries.
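These edge-set definitions can be made concrete with a short sketch: `conv_edges` implements the circular stride pairing from the Controlling the Primitives section, and `valid_pool` checks the degree constraints above (d^−(j) ≥ 1 holds by construction, since every edge gives its target an incoming edge):

```python
def conv_edges(qubits, stride):
    """Pair each available qubit with the one `stride` positions away,
    modulo the number of available qubits (circular ordering)."""
    n = len(qubits)
    return [(qubits[i], qubits[(i + stride) % n]) for i in range(n)]

def valid_pool(edges):
    """Check the degree constraints on pooling edges E^p_m:
    a measured (control) qubit i has indegree 0 and outdegree 1,
    and no target qubit j is itself measured (d+(j) = 0)."""
    controls = [i for i, _ in edges]
    targets = [j for _, j in edges]
    no_reuse = len(set(controls)) == len(controls)   # d+(i) = 1
    disjoint = not set(controls) & set(targets)      # d-(i) = 0 and d+(j) = 0
    return no_reuse and disjoint

print(conv_edges(list(range(1, 9)), stride=1))
# Pool the bottom half onto the top half, as in Figure 6:
print(valid_pool([(8, 1), (7, 2), (6, 3), (5, 4)]))  # True
print(valid_pool([(8, 1), (8, 2)]))                  # False: qubit 8 measured twice
```

Multiple controls may share one target, so `valid_pool([(8, 1), (7, 1)])` is also True, reflecting the surjective property of E^p_m.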
After training, the state |ψ⟩ in Equation 5 is measured based on the type of classification task. In this work we focus on binary classification, allowing us to estimate ŷ by measuring the remaining (or a specified) qubit in the computational basis: We note that multi-class classification is also possible by measuring the other qubits and associating each with a different class outcome. Following this, we calculate the cost of a training run with C(y, ŷ); then, using numerical optimization, the cost is reduced by updating the parameters from Equations 3 and 4 and repeating the whole process until some local minimum is reached. This results in a model alongside a set of parameters to be used for classifying unseen data.
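As a toy illustration of this readout step, the sketch below estimates the binary label from a final statevector by computing the probability that a designated qubit is measured as |1⟩ in the computational basis. The big-endian bit convention (qubit 0 as the most significant bit) is an assumption for illustration, not the paper's convention.

```python
import numpy as np

# Sketch: binary prediction from a statevector by marginalising the
# measurement probability of one qubit (assumed big-endian ordering).
def predict(state, qubit, n_qubits):
    probs = np.abs(state) ** 2
    p_one = sum(p for idx, p in enumerate(probs)
                if (idx >> (n_qubits - 1 - qubit)) & 1)
    return 1 if p_one > 0.5 else 0
```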

Controlling the primitives
We define basic hyperparameters that control the individual architectural effect of a primitive. There are two broad classes of primitives: special and operational. A special primitive has no operational effect on the circuit, such as Qfree. Its purpose is to make qubits available for future operational primitives, and it therefore has one hyperparameter N_f for this specification. N_f is typically an integer or set of integers corresponding to qubit numberings: Each operational primitive has its own stride parameter, analogous to classical CNNs. For a given stride s, each qubit gets paired with the one s qubits away, modulo the number of available qubits. For example, a stride of 1 pairs each qubit with its neighbour. This depends on the qubit numbering used, which is based on the circuit topology. For illustration purposes, we use a circular topological ordering, but any layout is possible as long as some ordering is provided for Q^f_1. For the convolution primitive we define its stride s_c ∈ {1, 2, 3, ...} as: Equation (10) captures the case where only two qubits are available for a convolution, and Equation (11) the case where there is only one, which implies the convolution unitaries consist only of single-qubit gates. A stride of s_c = 1 is a typical design motif for PQCs, and the graph formalism allows for a simple way to capture and generalise it. To achieve translational invariance for all strides, two constraints on E^c_m are added. Another option for translational invariance is a Qdense primitive, which only differs from Qconv in that E^c_m generates all possible pairwise combinations of Q^c_m. This primitive is available in the Python package but left out of the definition because of its similarity. Figure 7 shows the different configurations of E^c_m that the convolution stride generates. The pooling primitive has two hyperparameters, a stride s_p and a filter F*_m. The filter indicates which qubits to measure and the stride how to pair them with the remaining qubits. We define the filter as a binary string: For N = 8 qubits, the binary string F*_m = 00001111 translates to measuring the rightmost qubits, i.e. {i | i ∈ Q^p_m, i ≥ 5}. Figure 6 is an example where the pattern F*_2 = 00001111 → F*_4 = 0011 → F*_6 = 01 is used; visually, the qubits are removed from bottom to top. Encoding filters as binary strings is useful since generating them becomes generating languages, enabling the use of computer-scientific tools such as context-free grammars and regular expressions to describe families of filters. Pooling primitives enable hierarchical architectures for QCNNs, and in the search space design section we illustrate how they can be implemented to create a family resembling reverse binary trees. The action of the filter is expressed as: where F*_m slices Q^p_m according to the 0 indices of F*_m, i.e. w_i = 0 (not measured). For example, 010 {4, 7, 2} = {4, 2}. This example illustrates the case where an ordering was given to the set of available qubits to represent some specific circuit topology. Let Q^x_{m+1} = F*_m Q^p_m; then the pooling primitive stride s_p ∈ {1, 2, . . .} is defined as:
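The two hyperparameters discussed above can be sketched in a few lines of Python (illustrative helper names, not the package's API): a convolution stride that pairs each available qubit with the one s qubits away modulo the number of available qubits (assuming a circular ordering), and a pooling filter given as a binary string whose 1s mark the qubits to measure.

```python
# Sketch: convolution-stride pairing on a circular qubit ordering.
def conv_edges(qubits, stride):
    n = len(qubits)
    return [(qubits[i], qubits[(i + stride) % n]) for i in range(n)]

# Sketch: apply a binary-string pooling filter, keeping the qubits at the
# 0 positions (the unmeasured ones), as in 010 {4, 7, 2} = {4, 2}.
def apply_filter(filter_string, qubits):
    return [q for bit, q in zip(filter_string, qubits) if bit == "0"]
```

With eight qubits, `conv_edges(list(range(8)), 1)` pairs each qubit with its neighbour, and `apply_filter("00001111", list(range(8)))` keeps qubits 0 through 3, matching the F*_m = 00001111 example above.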

Search Space Design
We now show how the digraph formalism facilitates QCNN generation and search space design. Grant et al. [27] exhibited the success of hierarchical designs that resemble reverse binary trees. To create a space of these architectures, we only need three levels of motifs. The idea is to halve the system size until one qubit remains, while alternating between convolution and pooling operations. Given N qubits, a convolution stride s_c, a pooling stride s_p and a pooling filter F* that halves the system size, a reverse binary tree QCNN is generated by Algorithm 1.
Motif: alternate convolution and pooling. Motif: repeat until one qubit remains. Algorithm 1 shows how to create instances of this architecture family. First, two primitives are created on the first level of the hierarchy: a convolution operation M^1_1 and a pooling operation M^1_2. They are then sequentially combined on level two as M^2_1 = (M^1_1, M^1_2) to form a convolution-pooling unit. The third-level motif M^3_1 repeats this second-level motif M^2_1 until the system only contains one qubit. This is log_2(N) repetitions for N qubits, because we chose F* to halve the system size during each pooling operation. The addition and multiplication symbols act as append and extend for tuples, which allows for an intuitive way to build motifs. It is easy to expand the algorithm for more intricate architectures, for instance by increasing the number of motifs per level and the number of levels. A valid level-four motif for Algorithm 1 would apply the reverse binary tree architecture M^3_1, then two convolutions (M^2_2 = M^1_1 × 2) and one convolution-pooling unit on four qubits, all repeated three times. Motifs can also be randomly selected on each level to generate novel architectures. The Python package we provide acts as a tool to facilitate architecture generation this way.
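The hierarchy above can be sketched directly with Python tuples, where + appends and * extends, mirroring the motif algebra. This is a rough illustration of Algorithm 1, not the package's implementation; Qconv and Qpool are plain placeholder tuples here.

```python
import math

# Sketch of Algorithm 1: a reverse binary tree QCNN as nested motifs.
def reverse_binary_tree(n_qubits, s_c=1, s_p=0):
    m1_conv = ("Qconv", s_c)            # level-1 motifs (primitives)
    m1_pool = ("Qpool", s_p, "F*")
    m2 = (m1_conv,) + (m1_pool,)        # level 2: convolution-pooling unit
    reps = int(math.log2(n_qubits))     # halve the system until one qubit remains
    m3 = m2 * reps                      # level 3: repeat the unit log2(N) times
    return ("Qfree", n_qubits), m3
```

For N = 8, the level-3 motif contains log_2(8) = 3 convolution-pooling units, i.e. six primitives in sequence after the initial Qfree.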
In more detail, we now analyse the family of architectures generated by Algorithm 1. First, we consider the possible pooling filters F* that halve the system size. This is equivalent to generating strings of the language A = {w | w has an equal number of 0s and 1s}, where the string length indicates the number of available qubits for the filter F*_m. Then, based on the (4 choose 2) = 6 possible balanced binary strings [53] of length four, we construct the following pooling filters: where the exponent a^3 ≡ {a} • {a} • {a} = aaa refers to the regular operation concatenation: A • B = {xy | x ∈ A, y ∈ B}. The pooling filter F_inside yields 0110; visually, this pattern pools qubits from the inside (the middle of the circuit), see Figure 8 (c). Figure 8 (a) shows the repeated usage of F_right for pooling. This particular pattern is useful for data preprocessing techniques such as principal component analysis (PCA), since PCA introduces an order of importance to the features used in the model. Typically, the first principal component (which explains the most variance) is encoded on the first qubit, the second principal component on the second qubit, and so on. Therefore, it makes sense to pool the last qubits and leave the first qubits in the model for as long as possible.
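Enumerating the language A for a given even length is straightforward; the sketch below (an illustration, not the paper's tooling) generates all binary strings with an equal number of 0s and 1s by choosing the positions of the 1s.

```python
from itertools import combinations

# Sketch: generate A = {w | w has an equal number of 0s and 1s} for a
# given even length, the candidate pooling filters that halve the system.
def balanced_strings(length):
    half = length // 2
    out = []
    for ones in combinations(range(length), half):
        out.append("".join("1" if i in ones else "0" for i in range(length)))
    return out
```

For length four this yields the (4 choose 2) = 6 balanced strings referenced above, including 0011 (pool right), 1100 (pool left) and 0110 (pool inside).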
If N = 8, s_c = 1, s_p = 0 and F* = F_right, then Algorithm 1 generates the circuit in Figure 1 (d). The possible combinations of N, s_c, s_p, F* represent the search space/family size. Since F* halves the system size, the number of available qubits N is required to be a power of two. Using integer strides satisfies the constraints (see the controlling primitives section) that enable translational invariance. The complexity of the model (in terms of the number of unitaries used) then scales linearly with the number of qubits N available. Specifically, N qubits result in 3N − 2 unitaries [54].

Generalisation and Search
The digraph formalism extends naturally to multi-qubit unitaries, enabling the representation of more intricate and larger-scale architectures. In general, a primitive with n-qubit unitaries is represented as a hypergraph G = (Q, E), where the edges E consist of n-tuples. We introduce two additional hyperparameters, step and offset, which control the construction of E. For instance, Figure 9 shows three primitives, each with 3-qubit unitaries. The first two have a stride of one, meaning that each 3-qubit unitary connects to its neighbours. In contrast, the last primitive has a stride of three, connecting every third qubit within the unitary. The offset parameter determines the starting point for counting; Figure 9a begins with the first qubit, while Figure 9b starts with the third. The step parameter controls the position of the next unitary; for example, Figure 9a and b have a step of three, skipping two qubits before creating another edge starting on the third qubit. Consequently, the primitives with 2-qubit unitaries we have been considering thus far are all special cases with a step of one and an offset of zero. Another aspect to consider is the execution order of the unitaries, which by default is the sequence in which the edges were created for a primitive. Our package introduces an additional hyperparameter to control this order. For example, to execute the third edge of Figure 9a first, followed by edges five, four, one and two, a value of (3, 5, 4, 1, 2) can be passed to the edge order hyperparameter. Lastly, a boundary condition hyperparameter can also be specified, allowing for the definition of open or periodic boundaries for the qubits. This essentially determines whether edge creation is calculated in modulo with respect to the number of qubits or not, which in turn influences whether edge creation ceases when no further connections can be made based on the stride parameter.
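A minimal sketch of this hyperedge construction follows, under our reading of the hyperparameters: with stride s, a unitary's qubits sit s apart; offset shifts the first starting qubit; step moves the starting qubit of the next unitary; and the boundary flag decides whether indices wrap modulo the register size or edge creation stops at the register's end. The function and parameter names are illustrative, not the package's API.

```python
# Sketch: build n-tuple hyperedges from stride, step, offset and boundary.
def hyperedges(n_qubits, n, stride=1, step=1, offset=0, periodic=False):
    edges, start = [], offset
    while start < n_qubits:
        edge = tuple((start + k * stride) % n_qubits if periodic
                     else start + k * stride for k in range(n))
        if not periodic and max(edge) >= n_qubits:
            break  # open boundary: no further complete edges fit
        edges.append(edge)
        start += step
    return edges
```

For 3-qubit unitaries on 15 qubits with stride 1, step 3 and offset 0 (as in Figure 9a under these assumptions), this yields five disjoint triples covering the register.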
Motif: Apply all primitives to N qubits. The hyperparameters provided are sufficient to generate a diverse array of hierarchical architectures. For example, we demonstrate how to represent the original QCNN from [5] within our framework in Algorithm 2. The arguments for each convolution and pooling primitive are stride, step, and offset. The Qdense primitive generates 2-qubit unitaries between all pairwise combinations of n + 1 qubits. Subsequently, the second primitive M^1_2 takes M^1_1 as its mapping, which means it treats M^1_1 as a single (n + 1)-qubit unitary and distributes it across the circuit with a stride of 1, a step of n, and an offset of n − 1. This is followed by n convolutions of n-qubit unitaries, each having an offset incremented by one from the previous. For n = 3 and N = 15, the first and last convolutions are illustrated in Figure 9a,b. Next, a pooling layer with n-qubit unitaries is applied, measuring the outer n − 1 qubits from each n-th qubit; this corresponds to the filter. Finally, a convolution is performed on all remaining qubits. In practice, each of these primitives is given a mapping for its corresponding unitary. The mappings of the original QCNN are based on 2^v × 2^v Gell-Mann matrices, where v indicates the number of qubits the unitary acts upon. For instance, the first unitary of the primitive M^1_2 operates on v = n + 1 qubits, M^1_3 on v = n qubits, and the pooling primitive on n qubits, where v = n − 1 to leave a qubit for the control. For M^1_5, v equals the number of remaining qubits. It is easy to generate a family of architectures related to the original by providing the algorithm with different values of stride, step, offset, pooling filters and mappings, and by relaxing the dependence on n, based on how large we want the search space to be. Next, we discuss the applicability of search algorithms with our representation.
FIG. 8: An example of how the hyperparameters of the primitives affect the circuit architecture of the family generated by Algorithm 1. Three are shown: the convolution stride s_c, the pooling stride s_p and the pooling filter F*. These are specified in the controlling primitives section. Controlled-R_z(θ) gates are used for convolutions and CNOTs for pooling as an example. The convolution stride s_c determines how convolution unitaries are distributed across the circuit. Each convolution primitive typically consists of multiple unitaries, and the QCNN requires them to be identical for weight sharing. The pooling stride s_p determines how pooling unitaries are distributed; for a given pooling primitive, a portion of the available qubits gets pooled via controlled unitary operations, and s_p dictates which controls match to which targets. The pooling filter F* dictates which qubits to pool according to some recursive pattern/mask. For example, circuit d) always pools the outside qubits during pooling primitives, resulting in the middle qubit making it to the end of the circuit.
The framework's expressiveness is demonstrated in Figure 1 (e,f), where only two lines of code are needed to specify a complete architecture, and in Figure 10, which illustrates how to capture circuits from [5, 27]. This expressiveness allows search algorithms to explore an extensive range of architectures and numerous design choices. Moreover, the modularity of the framework enables search algorithms to identify robust building blocks to combine into motifs, serving as the foundation for architectural designs. This is especially advantageous in the context of genetic algorithms, as it facilitates the definition of crossover and mutation operations in various ways. For example, mutations can involve adjusting a single hyperparameter of a primitive or replacing an entire primitive within a motif. Crossovers may include combining motifs at the same or different levels, or interweaving two motifs by alternating their final sequence of primitives. In the case of reinforcement learning, the modularity allows an agent to make decisions at multiple levels of granularity,

FIG. 9: Examples of how 3-qubit unitaries are represented with the framework (panels labelled by their stride, step and offset values, e.g. Stride=3, Step=1, Offset=0). For general n-qubit unitaries, the graphs become hypergraphs with n-tuples as edges.
enabling it to explore and exploit different combinations of primitives and motifs. Hill-climbing algorithms can also leverage this modularity in various ways. For instance, we can generate a random fixed high-level motif, such as a MERA circuit, and then iteratively optimize the hyperparameters of each primitive within the motif. In each step, we adjust a hyperparameter to neighbouring values, evaluate the resulting objective values, and select the best configuration. Once we have updated all the hyperparameters of all primitives, we obtain a final motif, which can be used as a starting point for the next iteration. This approach of adjusting individual hyperparameters within a multilevel motif allows for incremental changes to the architecture. Such fine-grained modifications can be beneficial in approaches like Bayesian optimization, where smoothness in objective values is advantageous. Additionally, the hierarchical nature of the representation promotes scalability, enabling search algorithms to investigate smaller subsystems before scaling up to the full problem. This can reduce computational costs and allow for the exploration of more architectures. Lastly, the intuitive nature of the representation facilitates understanding the performance of discovered architectures, which enhances interpretability. For instance, in one experiment we observed a spike in performance for a convolution stride of five. Upon further investigation, we discovered a strong correlation between features one and six, which was previously unknown. This insight informed future experiments and design choices for the problem at hand.
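The hill-climbing loop described above can be sketched as follows. This is a toy illustration, not the paper's implementation: `evaluate` stands in for the (expensive) objective, hyperparameters are assumed to be integers of at least 1, and a higher score is assumed to be better.

```python
# Sketch: coordinate-wise hill climbing over a primitive's hyperparameters.
def hill_climb(hypers, evaluate, steps=10):
    best, best_score = dict(hypers), evaluate(hypers)
    for _ in range(steps):
        improved = False
        for key in best:
            for delta in (-1, 1):  # nudge each hyperparameter to neighbours
                cand = dict(best)
                cand[key] = max(1, cand[key] + delta)
                score = evaluate(cand)
                if score > best_score:
                    best, best_score, improved = cand, score, True
        if not improved:
            break  # local optimum reached
    return best, best_score
```

With a toy objective peaked at stride 3, starting from stride 1 the loop climbs to 3 and stops.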
Finally, we present the evolutionary algorithm used in our experiments, which is based on the approach described in [26] and detailed in Algorithm 3. We refer to an architecture as a genotype, and its fitness is determined by the sample complexity for both inside and middle points, as well as the mean squared error (MSE) for outside points in the test set (see Figure 5). Specifically,

fitness = c_1 (M_in / M_cap) + c_2 (M_middle / M_cap) + c_3 MSE_out + λ n_p,

where we cap M_in and M_middle at some large value M_cap, and n_p is the number of parameters required for the architecture. The weights c_1, c_2 and c_3 sum to one, assigning importance to each term. Our experiments showed that setting c_1 = 0.7, c_2 = 0.05 and c_3 = 0.25 led to generally well-performing architectures; we also chose M_cap = 500, since fit genotypes exhibit sample complexity below 100. We initialise the population with a pool of 100 random primitives (Qconv, Qpool, Qdense), each having random hyperparameters. Upon initialization, we perform mutation and crossover operations based on tournament selection with a 5% selection pressure. After the selection, we mutate the fittest genotype by choosing one of its primitives and replacing it with a randomly generated one. The crossover operator acts on the two fittest individuals, attempting to combine them tail-to-head. If this is not possible, they are interleaved up to the point where they can be combined. Just like the approach in [26], we do not remove any genotypes from the pool, leading to a more diverse population.
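The fitness above translates directly to a few lines of Python. This is a sketch under the reconstructed formula, with the reported weights c_1 = 0.7, c_2 = 0.05, c_3 = 0.25 and M_cap = 500 as defaults; the function signature is our own.

```python
M_CAP = 500  # cap on the sample-complexity terms, as in the experiments

# Sketch: genotype fitness combining capped sample complexities, test MSE
# and a parameter-count penalty weighted by lam.
def fitness(m_in, m_mid, mse_out, n_params, c=(0.7, 0.05, 0.25), lam=0.0):
    m_in, m_mid = min(m_in, M_CAP), min(m_mid, M_CAP)
    return c[0] * m_in / M_CAP + c[1] * m_mid / M_CAP + c[2] * mse_out + lam * n_params
```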

DISCUSSION
The main contribution of this paper is a framework that enables the dynamic generation of QCNNs and the creation of QCNN search spaces. The framework is presented theoretically in this paper and implemented as a Python package that is ready for use. Our numerical experiments demonstrate the importance of alternating architectures for PQCs and illustrate a way to increase model performance without increasing its complexity. Our next step is to explore search strategies using this architectural representation to find high-performing QCNNs for different classification tasks automatically. We have already shown how the representation is useful for evolutionary algorithms, as in the classical case [26], but we would like to explore other search algorithms such as reinforcement learning or Bayesian optimization.
Another interesting consideration is the theoretical analysis of QCNN architectures that generalise well across multiple datasets. Recently, it has been shown how symmetry can be used to inform the inductive biases of a model [55, 56], and we suspect that our numerical results stem from the search finding architectures that respect symmetries of the data. Symmetry is a natural starting point for creating primitives: the convolution primitive is already constrained by translational symmetry, and additional primitives can be developed by considering other symmetries. This approach effectively narrows the search space, enabling a system to automatically discover general equivariant architectures that align well with the data. The framework also allows for the specification of qubit orderings that correspond to physical hardware setups. Therefore, benchmarking the effect of noise on different architectures on NISQ devices would be a useful exploration.

METHODS
Figure 1 gives a broad view of the machine learning pipeline we implement for the benchmarks. Various factors influence model performance in such a pipeline. Each step, from a raw audio signal to a classified musical genre, contains various possible configurations, the influence of which propagates throughout the pipeline. For this reason, it is difficult to isolate any configuration and evaluate its effect on the model. With our goal being to analyse QCNN architectures (Figure 1 d) on the audio data, we perform random search in the family created by Algorithm 1 with different choices of circuit ansatz and quantum data encoding. These are evaluated on two different datasets: Mel spectrogram data (Figure 1 b) and 2D statistical data (Figure 1 c), both derived from the same audio signal (Figure 1 a). We preprocess the data based on requirements imposed by the model implementation before encoding it into a quantum state. These configurations are expanded on below:

Data
We aimed to use a practical and widely applicable dataset for the data component and chose the well-known [57] music genre dataset GTZAN. It consists of 1000 audio tracks, each a 30-second recording of a song. These recordings were obtained from radio, compact disks and compressed MP3 audio files [58]. Each is labelled with one of ten musical genres: blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae or rock. Binary classification is used for the analysis of model performance across different architectures, meaning there are (10 choose 2) = 45 possible genre pairs to build models from. Each pair is equally balanced since there are 100 songs per genre. The dataset enables the comparison of 45 models per configuration within the audio domain.
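The pairing setup above is a one-liner: every unordered pair of the ten genres defines one balanced binary classification task.

```python
from itertools import combinations

# The ten GTZAN genres; each unordered pair is one binary classification task.
GENRES = ["blues", "classical", "country", "disco", "hip-hop",
          "jazz", "metal", "pop", "reggae", "rock"]
PAIRS = list(combinations(GENRES, 2))  # (10 choose 2) = 45 genre pairs
```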

Model Implementation
For all experiments, we evaluate instances of Algorithm 1 with N = 8 qubits, resulting in 3(8) − 2 = 22 two-qubit unitaries. We test each model based on different combinations of model architecture, two-qubit unitary ansatz and quantum data encoding. The specific unitaries for U_m are chosen from a set of eight ansatzes used by [29]. They are based on previous studies that explore the expressibility and entangling capability of parameterised circuits [48], hierarchical quantum classifiers [27] and extensions to the VQE [49]. These are shown in Figure A.1; the ansatz for pooling also comes from [29] and is shown in Figure 11. For quantum data encoding, we compare qubit encoding [36] with IQP encoding [59] on the tabular dataset. Amplitude encoding [60] is used for the image data. The cost C(y, ŷ) is computed where ŷ_i is obtained from Equation 6, i represents one observation, and y, ŷ collect all observations in vector form.

Data Creation
We benchmark the model against two different forms of data, namely tabular and image. To construct the dataset in tabular form, we extract specific features from each audio signal using librosa [61], as shown in Figure 1 (b). Each row represents a single audio track with its features as columns. The specific features extracted are those typically used by music information retrieval systems, namely: chroma frequencies, harmonic and percussive elements, Mel-frequency cepstral coefficients, root-mean-square, spectral bandwidth, spectral centroid, spectral roll-off, tempo and the zero-crossing rate. See Appendix D for a short description of these features. To construct the dataset in image form, we extract a Mel-frequency spectrogram (Figure 1 c) from each audio signal. The Mel scale is a non-linear transformation based on how humans perceive sound and is frequently used in speech recognition applications [62]. The spectrogram size depends on the number of qubits available for the QCNN: with amplitude encoding we can encode 2^N values into a quantum state, where N is the number of available qubits. Using N = 8 qubits, we scale the image to 8 × 32 = 256 = 2^8 pixels, normalising each pixel between 0 and 1. The downscaling is done by binning the Mel frequencies into eight groups and taking the first three seconds of each audio signal.
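The downscaling step above can be sketched with NumPy: a precomputed Mel spectrogram is binned into 8 frequency groups, truncated to 32 time frames (8 × 32 = 2^8 values for amplitude encoding on 8 qubits), and normalised to [0, 1]. The input shape (128 Mel bands by 130 frames) and the mean-pooling choice for binning are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

# Sketch: bin a Mel spectrogram to 8 frequency groups, keep 32 frames,
# and normalise pixels to [0, 1] for amplitude encoding.
def downscale(spec, n_bins=8, n_frames=32):
    bands = np.array_split(spec, n_bins, axis=0)        # 8 frequency groups
    small = np.stack([b.mean(axis=0) for b in bands])[:, :n_frames]
    lo, hi = small.min(), small.max()
    return (small - lo) / (hi - lo) if hi > lo else np.zeros_like(small)
```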

FIG. 4: QCNN with the F_even pooling filter using low-resolution image data. The accuracies for all genre pairs are provided.

Figure 2 shows example motifs on different levels for a QCNN. Higher-level motifs are tuples and the lowest-level ones directed graphs. The dependence between successive motifs is specified in Definition 2.
Definition 2: Let x ∈ {c, p, f} indicate the primitive type for {Qconv, Qpool, Qfree} and let M^L_1 be the highest-level motif for a QCNN. Then assemble(M^L_1) flattens depth-wise into M = (G_1, G_2, . . ., G_|M|), where G_m = (Q^x_m, E^x_m). G_1 is always a Qfree(N_q) primitive specifying the number of available qubits with N_q. For m > 1, G_m is defined as:

FIG. 7: Diagram showing how changing the convolution stride s_c generates different configurations for E^c_m.
Algorithm 1 with these settings generates the circuit shown in Figure 1 (d), Figure 2, Figure 6 (f) and Figure 8 (a). Specifically, Figure 8 shows how different values for s_c, s_p and F* generate different instances of the family using Algorithm 1.

FIG. 10: Example architectures from Cong et al. [5] (c) and Grant et al. [27] (d), generated using our Python package to demonstrate its expressibility, interpretability, and scalability. In (a), the 15-qubit original QCNN is created, with the first three parameters of each primitive being stride, step, and offset, respectively. The unitary U mappings employ generalised Gell-Mann matrices parameterised by the number of qubits qpu they act upon. Line 5, Qfree(15) + m_1 + m_2 + m_3, controls the system size; applying the same architecture to N qubits only requires changing it to Qfree(N). To introduce a depth parameter d to the circuit, the last line should be modified to m_4 * d + m_5. In (b), a 16-qubit MERA circuit is generated. For an 8-qubit MERA circuit, the last line would be changed to Qfree(8) + m^1_1 + m^1_2 * 2, and in general, Qfree(N) + m^1_1 + m^1_2 * (log_2 N − 1) produces an N-qubit MERA circuit. These examples highlight the representation's strengths: the essence of an architecture is captured with a few lines of code in a modular and understandable manner, and scaling up to larger systems is accomplished with minimal adjustments.
FIG. 11: Pooling ansatz from the experiments of [29]. A rotation is applied to the second qubit based on whether the control is one (filled circle) or zero (open circle).

TABLE III: Different performance metrics (lower is better) for the 15-qubit QCNN from