Main

The recognition that the world is quantum mechanical has allowed researchers to embed well established, but classical, theories into the framework of quantum Hilbert spaces. Shannon’s information theory, which is the basis of communication technology, has been generalized to quantum Shannon theory (or quantum information theory), opening up the possibility that quantum effects could make information transmission more efficient1. The field of biology has been extended to quantum biology to allow for a deeper understanding of biological processes such as photosynthesis, smell and enzyme catalysis2. Turing’s theory of universal computation has been extended to universal quantum computation3, potentially leading to exponentially faster simulations of physical systems.

One of the most successful technologies of this century is machine learning (ML), which aims to classify, cluster and recognize patterns for large datasets. Learning theory has been simultaneously developed alongside of ML technology to understand and improve upon its success. Concepts such as support vector machines, neural networks and generative adversarial networks have impacted science and technology in profound ways. ML is now ingrained into society to such a degree that any fundamental improvement to ML leads to tremendous economic benefit.

Similarly to other classical theories, ML and learning theory can in fact be embedded into the quantum-mechanical formalism. Formally speaking, this embedding leads to the field known as quantum machine learning (QML)4,5,6, which aims to understand the ultimate limits of data analysis allowed by the laws of physics. Practically speaking, the advent of quantum computers, with the hope of achieving a so-called quantum advantage (as defined below) for data analysis, is what has made QML so exciting. Quantum computing exploits entanglement, superposition and interference to perform certain tasks with substantial speedups over classical computing, sometimes even exponentially faster. Indeed, while such speedup has already been observed for a contrived problem7, reaching it for data science is still uncertain even at the theoretical level, but this is one of the main goals for QML.

In practice, QML is a broad term that encompasses all of the tasks shown in Fig. 1. For example, ML can be applied to quantum applications such as discovering quantum algorithms8 or optimizing quantum experiments9,10, or a quantum neural network (QNN) can be used to process either classical or quantum information11. Even classical tasks can be viewed as QML when they are quantum inspired12. We note that the focus of this Perspective will be on QNNs, quantum deep learning and quantum kernels, even though the field of QML is quite broad and goes beyond these topics.

Fig. 1: QML tasks.
figure 1

QML is usually considered for four main tasks. These include tasks where the data are either classical or quantum, and where the algorithm is either classical or quantum. Top left: tensor networks are quantum-inspired classical methods that can analyze classical data. Top right: unitary time-evolution data U from a quantum system can be classically compiled into a quantum circuit. Bottom left: handwritten digits can be mapped to quantum states for classification on a quantum computer. Bottom right: molecular ground-state data can be classified directly on a quantum computer. The figure shows the dependence of ground-state energy E on the distance d between the atoms.

After the invention of the laser, it was called a solution in search of a problem. To some degree, the situation with QML is similar. The complete list of applications of QML is not fully known. Nevertheless, it is possible to speculate that all the areas shown in Fig. 2 will be impacted by QML. For example, QML will likely benefit chemistry, materials science, sensing and metrology, classical data analysis, quantum error correction and quantum algorithm design. Some of these applications produce data that are inherently quantum mechanical, and hence it is natural to apply QML (rather than classical ML) to them.

Fig. 2: Key applications for QML.
figure 2

QML has been envisioned to bring a computational advantage in many applications. QML can enhance quantum simulation for chemistry (for example, molecular ground states110, equilibrium states47 and time evolution112) and materials science (for example, quantum phase recognition11 and generative design with a target property in mind130). QML can enhance quantum computing by learning quantum error correction codes11,109 and syndrome decoders, performing quantum control, learning to mitigate errors and compiling and optimizing quantum circuits. QML can enhance sensing and metrology46,104,105,106,107 and extract hidden parameters from quantum systems. Finally, QML may speed up classical data analysis, including clustering and classification.

While there are similarities between classical and quantum ML, there are also some differences. Because QML employs quantum computers, noise from these computers can be a major issue. This includes hardware noise such as decoherence as well as statistical noise (that is, shot noise) that arises from measurements on quantum states. Both of these noise sources can complicate the QML training process. Moreover, nonlinear operations (for example, neural activation functions) that are natural in classical ML require more careful design of QML models due to the linearity of quantum transformations.

For the field of QML, the immediate goal for the near future is demonstrating quantum advantage, that is, outperforming classical methods, in a data science application. Achieving this goal will require keeping an open mind about which applications will benefit most from QML (for example, it may be an application that is inherently quantum mechanical). Understanding how QML methods scale to large problem sizes will also be required, including analysis of trainability (gradient scaling) and prediction error. The availability of high-quality quantum hardware13,14 will also be crucial.

Finally, we note that QML provides a new way of thinking about established fields, such as quantum information theory, quantum error correction and quantum foundations. Viewing such applications from a data science perspective will likely lead to new breakthroughs.

Framework

Data

As shown in Fig. 3, QML can be used to learn from either classical or quantum data, and thus we begin by contrasting these two types of data. Classical data are ultimately encoded in bits, each of which can be in a 0 or 1 state. This includes images, texts, graphs, medical records, stock prices, properties of molecules, outcomes from biological experiments and collision traces from high-energy physics experiments. Quantum data are encoded in quantum bits, called qubits, or higher-dimensional analogs. A qubit can be represented by the states |0〉, |1〉 or any normalized complex linear superposition of these two. Here, the states contain information obtained from some physical process such as quantum sensing15, quantum metrology16, quantum networks17, quantum control18 or even quantum analog–digital transduction19. Moreover, quantum data can also be the solution to problems obtained on a quantum computer: for example, the preparation of various Hamiltonians’ ground states.

Fig. 3: Classification with QML.
figure 3

a, The classical data x, that is, images of cats and images of dogs, is encoded into a Hilbert space via some map x → |ψ(x)〉. Ideally, data from different classes (here represented by dots and stars) are mapped to different regions of the Hilbert space. b, Quantum data |ψ〉 can be directly analyzed on a quantum device. Here the dataset is composed of states representing metallic or superconducting systems. c, The dataset is used to train a QML model. Two common paradigms in QML are QNNs and quantum kernels, both of which allow for classification of either classical or quantum data. In kernel methods we fit a decision hyperplane that separates the classes. d, Once the model is trained, it can be used to make predictions.

In principle, all classical data can be efficiently encoded in systems of qubits: a classical bitstring of length n can be easily encoded onto n qubits. However, the same cannot be said for the converse, since one cannot efficiently encode quantum data in bit systems; that is, the state of a general n-qubit system requires (2n − 1) complex numbers to be specified. Hence, systems of qubits (and more generally the quantum Hilbert space) constitute the ultimate data representation medium, as they can encode not only classical information but also quantum information obtained from physical processes.

In a QML setting, the term quantum data refers to data that are naturally already embedded in a Hilbert space \({{{\mathcal{H}}}}\). When the data are quantum, they are already in the form of a set of quantum states {|ψj〉} or a set of unitaries {Uj} that could prepare these states on a quantum device (via the relation |ψj〉 = Uj|0〉). On the other hand, when the data x are classical, they first need to be encoded in a quantum system through some embedding mapping xj → |ψ(xj)〉, with |ψ(xj)〉 in \({{{\mathcal{H}}}}\). In this case, the hope is that the QML model can solve the learning task by accessing the exponentially large dimension of the Hilbert space20,21,22,23.

One of the most important and reasonable conjectures to make is that the availability of quantum data will substantially increase in the near future. The mere fact that people will use the quantum computers that are available will logically lead to more quantum problems being solved and quantum simulations being performed. These computations will produce quantum datasets, and hence it is reasonable to expect the rapid rise of quantum data. Note that, in the near term, these quantum data will be stored on classical devices in the form of efficient descriptions of quantum circuits that prepare the datasets.

Finally, as our level of control over quantum technologies progresses, coherent transduction of quantum information from the physical world to digital quantum computing platforms may be achieved19. This would quantum mechanically mimic the main information acquisition mechanism for classical data from the physical world, this being analog–digital conversion. Moreover, we can expect that the eventual advent of practical quantum error correction24 and quantum memories25 will allow us to store quantum data on quantum computers themselves.

Models

Analyzing and learning from data requires a parameterized model, and many different models have been proposed for QML applications. Classical models such as neural networks and tensor networks (as shown in Fig. 1) are often useful for analyzing data from quantum experiments. However, due to their novelty, we will focus our discussion on quantum models using quantum algorithms, where one applies the learning methodology directly at the quantum level.

Similarly to classical ML, there exist several different QML paradigms: supervised learning (task based)26,27,28, unsupervised learning (data based)29,30 and reinforced learning (reward based)31,32. While each of these fields is exciting and thriving in itself, supervised learning has recently received considerable attention for its potential to achieve quantum advantage26,33, resilience to noise34 and good generalization properties35,36,37, which makes it a strong candidate for near-term applications. In what follows we discuss two popular QML models: QNNs and quantum kernels, shown in Fig. 3, with a particular emphasis on QNNs as these are the primary ingredient of several supervised, unsupervised and reinforced learning schemes.

Quantum neural networks

The most basic and key ingredient in QML models is parameterized quantum circuits (PQCs). These involve a sequence of unitary gates acting on the quantum data states |ψj〉, some of which have free parameters θ that will be trained to solve the problem at hand38. PQCs are conceptually analogous to neural networks, and indeed this analogy can be made precise: that is, classical neural networks can be formally embedded into PQCs39.

This has led researchers to refer to certain kinds of PQC as QNNs. In practice, the term QNN is used whenever a PQC is employed for a data science application, and hence we will use the term QNN in what follows. QNNs are employed in all three QML paradigms mentioned above. For instance, in a supervised classification task, the goal of the QNN is to map the states in different classes to distinguishable regions of the Hilbert space26. Moreover, in the unsupervised learning scenario of ref. 29, a clustering task is mapped onto a MAXCUT problem and solved by training a QNN to maximize distance between classes. Finally, in the reinforced learning task of ref. 32, a QNN can be used as the Q-function approximator, which can be used to determine the best action for a learning agent given its current state.

Figure 4 gives examples of three distinct QNN architectures where in each layer the number of qubits in the model is increased, preserved or decreased. In Fig. 4a we show a dissipative QNN40 which generalizes the classical feedforward network. Here, each node corresponds to a qubit, while lines connecting qubits are unitary operations. The term dissipative arises from the fact that qubits in a layer are discarded after the information forward-propagates to the (new) qubits in the next layer. Figure 4b shows a standard QNN where quantum data states are sent through a quantum circuit, at the end of which some or all of the qubits are measured. Here, no qubits are discarded or added as we go deeper into the QNN. Finally, Fig. 4c depicts a convolutional QNN11, where in each layer qubits are measured to reduce the dimension of the data while preserving its relevant features. Many other QNNs have been proposed41,42,43,44,45, and constructing QNN architectures is currently an active area of research.

Fig. 4: Examples of QNN architectures.
figure 4

a, A classical feedforward neural network has input, hidden and output layers. This can be generalized to the quantum setting with a dissipative QNN, where some qubits are discarded and replaced by new qubits during the algorithm. Here we show a quantum circuit representation for the dissipative QNN. In a circuit diagram each horizontal line represents a qubit, and the logical operations, or quantum gates, are represented by boxes connecting the qubit lines. Circuits are read from left to right. For instance, here the circuit is initialized in a product state \(\left|{\psi }_{j}\right\rangle \otimes {\left|0\right\rangle }^{\otimes ({N}_{\mathrm{h}}+{N}_{\mathrm{o}})}\), where |ψj〉 encodes the input data and Nh (No) is the number of ancilla qubits in the hidden (output) layer, which are initialized to the fiduciary state |0〉. As logical operations are performed, the information forward-propagates through the circuit into the ancillary qubits. b, Another possible QNN strategy is to keep the qubits fixed, without discarding or replacing them. The circuit represents consecutive application of two-qubit gates Uj and controlled-NOT (denoted by CNOT) gates. c, QCNNs measure and discard qubits during the algorithm. The QCNN circuit considered here is built with two-qubit quantum gates Uj and is initialized in |ψj〉.

To further accommodate the limitation of near-term quantum computers, we can also employ a hybrid approach with models that have both classical and quantum neural networks46. Here, QNNs act coherently on quantum states while deep classical neural networks alleviate the need for higher-complexity quantum processing. Such hybridization distributes the representational capacity and computational complexity across both quantum and classical computers. Moreover, since quantum states generally have a mixture of classical correlations and quantum correlations, hybrid quantum–classical models allow for the use of quantum computers as an additive resource to increase the ability of classical models to represent quantum-correlated distributions. Applications of hybrid models include generating47 or learning and distilling information46 from multipartite-entangled distributions.

Quantum kernels

As an alternative to QNNs, researchers have proposed quantum versions of kernel methods26,28. A kernel method maps each input to a vector in a high-dimensional vector space, known as the reproducing kernel Hilbert space. Then, a kernel method learns a linear function in the reproducing kernel Hilbert space. The dimension of the reproducing kernel Hilbert space could be infinite, which enables the kernel method to be very powerful in terms of expressiveness. To learn a linear function in a potentially infinite-dimensional space, the kernel trick48 is employed, which only requires efficient computation of the inner product between these high-dimensional vectors. The inner product is also known as the kernel48. Quantum kernel methods consider the computation of kernel functions using quantum computers. There are many possible implementations. For example, refs. 26,28 considered a reproducing kernel Hilbert space equal to the quantum state space, which is finite dimensional. Another approach13 is to study an infinite-dimensional reproducing kernel Hilbert space that is equivalent to transforming a classical vector using a quantum computer. It then maps the transformed classical vectors to infinite-dimensional vectors.

Inductive bias

For both QNNs and quantum kernels, an important design criterion is their inductive bias. This bias refers to the fact that any model represents only a subset of functions and is naturally biased towards certain types of function (that is, functions relating the input features to the output prediction). One aspect of achieving quantum advantage with QML is to aim for QML models with an inductive bias that is inefficient to simulate with a classical model. Indeed, it was recently shown49 that quantum kernels with this property can be constructed, albeit with some subtleties regarding their trainability.

Generally speaking, inductive bias encompasses any assumptions made in the design of the model or the optimization method that bias the search of the potential models to a subset in the set of all possible models. In the language of Bayesian probabilistic theory, we usually call these assumptions our prior. Having a certain parameterization of potential models, such as QNNs, or choosing a particular embedding for quantum kernel methods13,14,26 is itself a restriction of the search space, and hence a prior. Adding a regularization term to the optimizer or modulating the learning rate to keep searches geometrically local also adds inherently a prior and focuses the search, and thus provides inductive bias.

Ultimately, inductive biases from the design of the ML model, combined with a choice of training process, are what make or break an ML model. The main advantage of QML will then be to have the ability to sample from and learn models that are (at least partially) natively quantum mechanical. As such, they have inductive biases that classical models do not have. This discussion assumes that the dataset to be represented is quantum mechanical in nature, and is one of the reasons why researchers typically believe that QML has greater promise for quantum rather than classical data.

Training and generalization

The ultimate goal of ML (classical or quantum) is to train a model to solve a given task. Thus, understanding the training process of QML models is fundamental for their success.

Consider the training process, whereby we aim to find the set of parameters θ that lead to the best performance. The latter can be accomplished, for instance, by minimizing a loss function \({{{\mathcal{L}}}}({{{\mathbf{\uptheta }}}})\) encoding the task at hand. Some methods for training QML models are leveraged from classical ML, such as stochastic gradient descent. However, shot noise, hardware noise and unique landscape features often make off-the-shelf classical optimization methods perform poorly for QML training. (This is due to the fact that extracting information from a quantum state requires computing the expectation values of some observable, which in practice need to be estimated via measurements on a noisy quantum computer. Hence, given a finite number of shots (measurement repetitions), these can only be resolved up to some additive errors. Moreover, such expectation values will be subject to corruption due to hardware noise.) This realization led to development of quantum-aware optimizers, which account for the quantum idiosyncrasies of the QML training process. For example, shot-frugal optimizers50,51,52,53 can employ stochastic gradient descent while adapting the number of shots (or measurements) needed at each iteration, so as not to waste too many shots during the optimization. Quantum natural gradient54,55 adjusts the step size according to the local geometry of the landscape (on the basis of the quantum Fisher information metric). These and other quantum-aware optimizers often outperform standard classical optimization methods in QML training tasks.

For the case of supervised learning, we are interested not only in learning from a training dataset but also in making accurate predictions on (generalizing to) previously unseen data. This translates into achieving small training and prediction errors, with the second usually hinging on the first. Thus, let us now consider prediction error, also known as generalization error, which has been studied only very recently for QML13,14,35,37,56,57. Formally speaking, this error measures the extent to which a trained QML model performs well on unseen data. Prediction error depends on the training error as well as the complexity of the trained model. If the training error is large, the prediction error is also typically large. If the training error is small but the complexity of the trained model is large, then the prediction error is likely still large. The prediction error is small only if training error is itself small and the complexity of the trained model is moderate (that is, sufficiently smaller than training data size)14,35. The notion of complexity depends on the QML model. We have a good understanding of the complexity of quantum kernel methods13,14, while more research is needed on QNN complexity. Recent theoretical analysis of QNNs shows that their prediction performance is closely linked to the number of independent parameters in the QNN, with good generalization obtained when the number of training data is roughly equal to the number of parameters35. This gives the exciting prospect of using only a small number of training data to obtain good generalization.

Challenges in QML

Heuristic fields can face periods of stagnation (or ‘winters’) due to unforeseen technical challenges. Indeed in classical ML, there was a gap between introducing a single perceptron58 and the multilayer perceptron59 (that is, neural network), and there was also a gap between attempts to train multiple layers and the introduction of the backpropagation method60.

Naturally we would like to avoid these stagnations or winters for QML. The obvious strategy is to try to determine all of the challenges as quickly as possible, and focus research effort on addressing them. Fortunately, QML researchers have adopted this strategy. Figure 5 showcases some of the different elements of QML models, as well as the challenges associated with them. In this section we detail various QML challenges and how they could potentially be avoided.

Fig. 5: Challenges for QML.
figure 5

a, There are several ingredients and priors needed to build a QML model: a dataset (and an encoding scheme for classical data), the choice of parameterized model, loss function and classical optimizer. In this diagram, we show some of the challenges of the different components of the model. bd, The success of the QML model hinges on an accurate and efficient training of the parameters. However, there are certain phenomena that can hinder the QML trainability. These include the abundance of low-quality local minimum solutions (b), as well as the barren plateau phenomenon (c). When a QML architecture exhibits a barren plateau, the landscape becomes exponentially flat (on average) as the number of qubits increases (seen as a transition from dashed to solid line). The presence of hardware noise has been shown to erase the features in the landscape as well as potentially shifting the position of the minima. Here, the dashed (solid) line corresponds to the noiseless (noisy) landscape shown in d.

Embedding schemes and quantum datasets

The access to high-quality, standardized datasets has played a key role in advancing classical ML. Hence, one could conjecture that such datasets will be crucial for QML as well.

Currently, most QML architectures are benchmarked using classical datasets (such as MNIST, Dogs vs Cats and Iris). While using classical datasets is natural due to their accessibility, it is still unclear how to best encode classical information onto quantum states. Several embedding schemes have been proposed22,26,61, and there are some properties they must possess. One such property is that the inner product between output states of the embedding is classically hard to simulate (otherwise the quantum kernel would be classically simulable). In addition, the embedding should be practically useful: that is, in a classification task, the states should be in distinguishable regions of the Hilbert space. Unfortunately, embeddings that satisfy one of these properties do not necessarily satisfy the other62. Thus, developing encoding schemes is an active area of research, especially those that are equipped with an inductive bias containing information about the dataset49.

Furthermore, some recent results suggest that achieving a quantum advantage with classical data might not be straightforward49. On the other hand, QML models with quantum data have a more promising route towards a quantum advantage63,64,65,66. Despite this fact, there is still a dearth of truly quantum datasets for QML, with just a few recently proposed67,68. Hence, the field needs standardized quantum datasets with easily preparable quantum states, as these can be used to benchmark QML models on true quantum data.

Quantum landscapes

Training the parameters of the QML model corresponds in a wide array of cases to minimizing a loss function and navigating through a (usually non-convex) loss function landscape in search of its global minimum. Technically speaking, the loss function defines a map from the model’s parameter space to the real values. The loss function value can quantify, for instance, the model’s error in solving a given task, so our goal is to find the set of parameters that minimizes such error. Quantum landscape theory69 aims to understand QML landscape properties and how to engineer them. Local minima and barren plateaus have received substantial attention in quantum landscape theory.

Local minima in quantum landscapes

As schematically shown in Fig. 5b, similarly to classical ML, the quantum loss landscape can have many local minima. Ultimately, this can lead to the overall non-convex optimization being NP-hard70, which is again similar to the classical case. There have been some methods proposed to address local minima. For example, variable structure QNNs71,72, which grow and contract throughout the optimization, adaptively change the model’s prior and allow some local minima to be turned into saddle points. Moreover, evidence of the overparametrization phenomenon has been seen for QML73,74. Here, the optimization undergoes a computational phase transition, due to spurious local minima disappearing, whenever the number of parameters exceeds a critical value.

Overview of barren plateaus

Local minima are not the only issue facing QML, as it has been shown that quantum landscapes can exhibit a fascinating property known as a barren plateau57,75,76,77,78,79,80,81,82,83,84,85,86,87. As depicted in Fig. 5c, in a barren plateau the loss landscape becomes, on average, exponentially flat with the problem size. When this occurs, the valley containing the global minimum also shrinks exponentially with problem size, leading to a so-called narrow gorge69. As a consequence, exponential resources (for example, numbers of shots) are required to navigate through the landscape. The latter impacts the complexity of the QML algorithm and can even destroy quantum speedup, since quantum algorithms typically aim to avoid the exponential complexity normally associated with classical algorithms.

Barren plateaus from ignorance or insufficient inductive bias

The barren plateau phenomenon has been studied in deep hardware-efficient QNNs75, where they arise due to the high expressivity of the model79. By making no assumptions about the underlying data, deep hardware-efficient architectures aim to solve a problem by being able to prepare a wide range of unitary evolutions. In other words, the prior over hypothesis space is relatively uninformed. Barren plateaus in this unsharp prior are caused by ignorance or the lack of sufficient inductive bias, and therefore a means to avoid them is to input knowledge into the construction of the QNN—making the design of QNNs with good inductive biases for the problem at hand a key solution.

Fortunately various strategies have been developed to address these barren plateaus, such as clever initialization88, pretraining and parameter correlation80,81. These are all examples of adding a sharper prior to the search over the overexpressive parameterizations of hardware-efficient QNNs. Below we further discuss how QNN architectures can be designed to further introduce inductive bias.

Barren plateaus from global observables

Other mechanisms have been linked to barren plateaus. Simply defining a loss function based on a global observable (that is, observables measuring all qubits) leads to barren plateaus even for shallow circuits with sharp priors76, while local observables (those comparing quantum states at the single-qubit level) avoid this issue76,85. The latter is due not to bad inductive biases but rather to the fact that comparing objects in exponentially large Hilbert spaces requires an exponential precision, as their overlap is usually exponentially small.

Barren plateaus from entanglement

While entanglement is one of the most important quantum resources for information processing tasks in quantum computers, it can also be detrimental for QML models. QNNs (or embedding schemes) that generate too much entanglement also lead to barren plateaus82,84,86. Here, the issue arises when the visible qubits of the QNN (those that are measured at the QNN’s output) are entangled with a large number of qubits in the hidden layers. Due to entanglement, the information of the state is stored in non-local correlations across all qubits, and hence the reduced state of the visible qubits concentrates around the maximally mixed state. This type of barren plateau can be solved by taming the entanglement generated across the QNN.

QNN architecture design

One of the most active areas is developing QNN architectures that have sharp priors. Since QNNs are a fundamental ingredient in supervised learning (deep learning, kernel methods), but also in unsupervised learning and reinforced learning, developing good QNN architectures is crucial for the field.

For instance, it has been shown that QNNs with sharp priors can avoid issues such as barren plateaus altogether. One such example is quantum convolutional neural networks (QCNNs)11. QCNNs possess an inductive bias from having a prior over the space of architectures that is much sharper than that of deep hardware-efficient architectures, as QCNNs are restricted to be hierarchically structured and translationally invariant. The notable reduction in the expressivity and parameter space dimension from this translational invariance assumption yields the greater trainability80.

The idea of embedding knowledge about the problem and dataset into our models (to achieve helpful inductive bias) will be key to improve the trainability of QML models. Recent proposals use quantum graph neural networks89 for scenarios where quantum subsystems live on a graph, and potentially have further symmetries. For instance, the underlying graph-permutation symmetries of a quantum communication dataset were taken into account by a quantum graph convolutional network. Similarly, a quantum recurrent neural network has been used in scenarios where temporal recurrence of parameters occurs—for example, in the quantum dynamics of a stationary (time-dependent) quantum dynamical process.

To better understand how to go beyond the aforementioned inductive biases from temporal and/or translational invariance in grids and graphs, we can take inspiration from recent advances in the theory of classical deep learning. In classical ML, the study of the group theory behind graph neural networks, namely the concepts of invariance and equivariance to various group actions on the input space, has led to a unifying theory of deep learning architectures based on group theory, called geometric deep learning theory90.

To have a prescription to create arbitrary architectures and inductive biases suitable for a given set of quantum physical data, a theory of quantum geometric deep learning could be key to design architectures with the right prior over the transformation space and inductive biases to ensure trainability and generalization. As the study of physics is often about the identification of inherent or emergent symmetries in particular systems, there is great potential for a future unifying theory of quantum geometric deep learning to provide consistent methods to create QML model architectures with inductive biases encoding knowledge of the basic symmetries and principles of the quantum physical system underlying given quantum datasets. This approach has been recently explored in refs. 91,92,93. Moreover, the works 74,94 have also shown that the Lie algebra obtained from the generators of the QNN can be linked to properties of the QML landscape such as the presence of barren plateaus or the overparametrization phenomenon.

Effect of quantum noise

The presence of hardware noise during quantum computations is one of the defining characteristics of noisy intermediate-scale quantum (NISQ) computing. Despite this fact, most QML research neglects noise in the analytical calculations and numerical simulations while still promising that the methods are near-term compatible. Accounting for the effects of hardware noise should be a crucial aspect of QML analysis if we wish to pursue a quantum advantage with currently available hardware.

Noise corrupts the information as it forward-propagates in a quantum circuit, meaning that deeper circuits with longer run-times will be particularly affected. As such, noise affects all aspects of the model that make use of quantum computers. This includes the dataset preparation scheme as well as circuits used to compute quantum kernels. Moreover, when using QNNs, noise can hinder their trainability as it leads to noise-induced barren plateaus87,95. Here, the relevant features of the landscape become exponentially suppressed by noise as the depth of the circuit increases (Fig. 5d). Ultimately, the effects of noise translate into a deformation of the inductive bias of the model from its original one, and an effective reduction of the dimension of the quantum feature space. Despite the critical impact of quantum noise, its effects are still largely unexplored, particularly its impact on the classical simulability of the QML model96,97.

Addressing noise-induced issues will likely require either (1) reduction in hardware error rates, (2) partial quantum error correction98 or (3) employing QNNs that are relatively shallow (that is, whose depth grows sublinearly in the problem size)87, such as QCNNs. Error mitigation techniques99,100,101 can also improve performance of QML models in the presence of noise, although they may not solve noise-induced trainability issues95. A different approach to dealing with noise is to engineer QML models with noise-resilient properties34,102,103 (such as the position of the minima not changing due to noise).

Outlook

Potential for quantum advantage

The first quantum advantages in QML will likely arise from hidden parameter extraction from quantum data. This can be for quantum sensing or quantum state classification/regression. Fundamentally, we know from the theory of optimal measurement that non-local quantum measurements can extract hidden parameters using fewer samples. Using QML, one can form and search over a parameterization of hypotheses for such measurements.

This is particularly useful when such optimal measurements are not known a priori—for example, identifying the measurement that extracts an order parameter or identifies a particular phase of matter. As the information about this classical parameter is embedded in the structure of quantum correlations between subsystems, it is natural that a trained QML model with good inductive biases can exhibit an advantage over local measurements and classical representations.

Another area of application where classical parameter extraction may yield an advantage is in quantum machine perception46,63,104,105,106,107, that is, quantum sensing, metrology and beyond. Here, leveraging the variational search over multipartite-entangled states for input to exposure to a quantum signal along with the optimization for optimal control and/or over post-processing schemes can find optimal measurements for the estimation of hidden parameters in the incoming signal. In particular, the variational approach may be able to find the optimal entanglement, exposure and measurement scheme that filters signal from noise108, akin to variationally learning the quantum error correcting code that filters signal from noise, but instead applied to quantum metrology.

Beyond classical parameter extraction embedded in quantum data, there may be an advantage for the discovery of quantum error correcting codes109. Quantum error correcting codes fundamentally encode data (typically) non-locally into a subsystem or subspace of the Hilbert space. As deep learning is fundamentally about the discovery of submanifolds of data space, identifying and decoding subspaces/subsystems from a Hilbert space that correspond to a quantum error correction subspace/subsystem is a natural area where differentiable quantum computing may yield an advantage. This is barely explored, mainly due to the difficulty of gaining insights with small-scale numerical simulations. Fundamentally, it is akin to a quantum data version of classical parameter embedding/extraction advantage.

Finally, a quantum advantage for generative modeling may be achieved when ground states110, equilibrium states47,111 or quantum dynamics112 can be generated using models incorporating QNNs, in a situation where the distribution cannot be sampled classically, and yields more accurate predictions or more extensive generalization compared with classical ML approaches. The nearest-term possibility for demonstrating such an advantage would likely be from variational optimization at the continuous time optimal control level on analog quantum simulators.

What will quantum advantage look like?

When the data originate from quantum-mechanical processes, such as from experiments in chemistry, material science, biology and physics, it is more likely to see exponential quantum advantage in ML. The quantum advantage could be in sample complexity or time complexity. An exponential advantage in sample complexity always implies an exponential advantage in time complexity, but the reverse is not generally true. It was recently shown63,65,113,114 that there is an exponential quantum advantage in sample complexity when we can use a quantum sensor, quantum memory and quantum computer to retrieve, store and process quantum information from experiments. Such a sample complexity advantage can be proven rigorously without the possibility of being dequantized12,64,115 in the future, that is, it is impossible to find improved classical algorithms such that there is no exponential advantage. This substantial quantum advantage has recently been demonstrated on the Sycamore processor63 raising the hope for achieving quantum advantage using NISQ devices116.

The situation for advantage in time complexity is more subtle. Classical simulation of quantum process is intractable in many cases, hence exponential advantage in time complexity would be expected to be prevalent. However, we should be cautious about the availability of data in ML tasks, which makes classical ML algorithms computationally more powerful13,117. For instance, ref. 117 shows that in the worst case there is no exponential quantum advantage in predicting ground-state properties in geometrically local gapped Hamiltonians. Furthermore, the emergence of effective classical theory in quantum-mechanical processes could enable classical machines to provide accurate predictions. For example, density functional theory118,119 allows accurate prediction of molecular properties when we have an accurate approximation to the exchange–correlation functionals by conducting real-world experiments. It is still likely that an exponential advantage is possible in physical systems of practical interest, but there are no rigorous proofs yet.

When the data are of a purely classical origin, such as in applications for recommending products to customers12, performing portfolio optimization120,121 and processing human languages122 and everyday images123, there is no known exponential advantage115. However, it is still reasonable to expect polynomial advantage. Furthermore, a quadratic advantage can be rigorously proven124,125 for purely classical problems. Therefore, we likely have a potential impact in the long term when we have fault-tolerant quantum computers, albeit with the speedup notably dampened by the overheads of quantum error correction126 for currently known fault-tolerant quantum computing schemes.

Transition to the fault-tolerant era and beyond

While QML has been proposed as a candidate to achieve a quantum advantage in the near term using NISQ devices, we can still pose a question about its usability in the future. Here, researchers envision two different chronological eras post-NISQ. In the first, which we can refer to as ‘partial error corrected’, quantum computers will have enough physical qubits (a couple of hundred), and sufficiently small error rates, to allow for a small number of fully error-corrected logical qubits. Since one logical qubit is comprised of multiple physical qubits, in this era we will have the freedom to trade off and split the qubits in the device into a subset of error-corrected qubits along with a subset of non-error-corrected qubits. The next era, that is, the ‘fault-tolerant’ era, will arise when the quantum hardware has a large number of error-corrected qubits.

Indeed, we can easily envision QML being useful in both of these post-NISQ eras. First, in the partial error-corrected era, QML models will be able to execute high-fidelity circuits and thus have an improved performance. This will naturally enhance the trainability of the models by mitigating noise-induced barren plateaus, and also reduce noise-induced classification errors in QML models. Most importantly, QML will likely see its most widespread and critical use during the fault-tolerant era. Here, quantum algorithms such as those for quantum simulation127,128 will be able to accurately prepare quantum data, and to faithfully store it in quantum memories129. Therefore QML will be the natural model to learn, infer and make predictions from quantum data, as here the quantum computer will learn from the data themselves directly.

On the further-term horizon, we anticipate it will be possible to capture quantum data from nature directly via transduction from their natural analog form to one that is quantum digital (for example, via quantum analog–digital interconversion19). These data will then be able to be shuttled around quantum networks for distributed and/or centralized processing with QML models, using fault-tolerant quantum computation and error-corrected quantum communication. At this point, QML will have reached a stage similar to that where ML is today, where edge sensors capture data, the data are relayed to a central cloud and ML models are trained on the aggregated data. As the modern advent of widespread classical ML arose at this point of abundant data, one could anticipate that ubiquitous access to quantum data in the fault-tolerant era could similarly propel QML to even greater widespread use.