Quantum-inspired machine learning on high-energy physics data

Abstract

Tensor Networks, a numerical tool originally designed for simulating quantum many-body systems, have recently been applied to solve Machine Learning problems. Exploiting a tree tensor network, we apply a quantum-inspired machine learning technique to a very important and challenging big data problem in high-energy physics: the analysis and classification of data produced by the Large Hadron Collider at CERN. In particular, we present how to effectively classify so-called b-jets, jets originating from b-quarks from proton–proton collisions in the LHCb experiment, and how to interpret the classification results. We exploit the Tensor Network approach to select important features and adapt the network geometry based on information acquired in the learning process. Finally, we show how to adapt the tree tensor network to achieve optimal precision or fast response in time without the need of repeating the learning process. These results pave the way to the implementation of high-frequency real-time applications, a key ingredient needed among others for current and future LHCb event classification able to trigger events at the tens of MHz scale.

Introduction

Artificial Neural Networks (NN) are a well-established tool for applications in Machine Learning and they are of increasing interest in both research and industry1,2,3,4,5,6. Inspired by biological NNs, they are able to recognise patterns while processing a huge amount of data. In a nutshell, an NN describes a functional mapping containing many variational parameters, which are optimised during the training procedure. Recently, deep connections between Machine Learning and quantum physics have been identified and continue to be uncovered7. On the one hand, NNs have been applied to describe the behaviour of complex quantum many-body systems8,9,10 while, on the other hand, quantum-inspired technologies and algorithms are taken into account to solve Machine Learning tasks11,12,13.

One particular numerical method originating from quantum physics which has been increasingly compared to NNs is Tensor Networks (TNs)14,15,16. TNs have been developed to investigate quantum many-body systems on classical computers by efficiently representing the exponentially large quantum wavefunction \(\left|\psi \right\rangle\) in a compact form and they have proven to be an essential tool for a broad range of applications17,18,19,20,21,22,23,24,25,26. The accuracy of the TN approximation can be controlled with the so-called bond-dimension χ, an auxiliary dimension for the indices of the connected local tensors. Recently, it has been shown that TN methods can also be applied to solve Machine Learning (ML) tasks very effectively13,16,27,28,29,30,31. Indeed, even though NNs have been highly developed in recent decades by industry and research, the first approaches of ML with TNs already yield comparable results when applied to standard datasets13,27,32. Due to their original development focusing on quantum systems, TNs make it easy to compute quantities such as quantum correlations or entanglement entropy and thereby they grant access to insights on the learned data from a distinct point of view for the application in ML16,30. Hereafter, we demonstrate the effectiveness of the approach and, more importantly, that it allows introducing algorithms to simplify and explain the learning process, unveiling a pathway to an explainable Artificial Intelligence. As a potential application of this approach, we present a TN supervised learning approach to identifying the charge of b-quarks (i.e. b or \(\bar{b}\)) produced in high-energy proton–proton collisions at the Large Hadron Collider (LHC) accelerator at CERN.

In what follows, we first describe the quantum-inspired Tree Tensor Network (TTN) and introduce different quantities that can be extracted from the TTN classifier which are not easily accessible for the biological-inspired Deep NN (DNN), such as correlation functions and entanglement entropy which can be used to explain the learning process and subsequent classifications, paving the way to an efficient and transparent ML tool. In this regard, we introduce the Quantum-Information Post-learning feature Selection (QuIPS), a protocol that reduces the complexity of the ML model based on the information the single features provide for the classification problem. We then briefly describe the LHCb experiment and its simulation framework, the main observables related to b-jets physics, and the relevant quantities for this analysis together with the underlying LHCb data33,34. We further compare the performance obtained by the DNN and the TTN, before presenting the analytical insights into the TTN which, among others, can be exploited to improve future data analysis of high-energy problems for a deeper physical understanding of the LHCb data. Moreover, we introduce the Quantum-Information Adaptive Network Optimisation (QIANO), which adapts the TN representation by reducing the number of free parameters based on the captured information within the TN while aiming to maintain the highest accuracy possible. Therewith, we can optimise the trained TN classifier for a targeted prediction speed without the necessity to relearn a new model from scratch.

TNs are not only a well-established way to represent a quantum wavefunction \(\left|\psi \right\rangle\), but more generally an efficient representation of information as such. In the mathematical context, a TN approximates a high-order tensor by a set of low-order tensors that are contracted in a particular underlying geometry, and TNs have common roots with other decompositions, such as the Singular Value Decomposition (SVD) or Tucker decomposition35. Among others, some of the most successful TN representations are the Matrix Product State (or Tensor Trains)18,27,36,37, the TTN (or Hierarchical Tucker decomposition)30,38,39, and the Projected Entangled Pair States40,41.

For a supervised learning problem, a TN can be used as the weight tensor W13,27,30, a high-order tensor which acts as classifier for the input data {x}: each sample x is encoded by a feature map Φ(x) and subsequently classified by the weight tensor W. The final confidence of the classifier for a certain class labelled by l is given by the probability

$${{\mathcal{P}}}_{l}({\bf{x}})\,=\,{W}_{l}\,\cdot\, {{\Phi }}({\bf{x}})\,.$$
(1)

In the following, we use a TTN Ψ to represent W (see Fig. 1, bottom right) which can be described as a contraction of its N hierarchically connected local tensors T{χ}

$${{\Psi }}\,=\,\mathop{\sum}\limits_{\chi }{T}_{l,{\chi }_{1},{\chi }_{2}}^{[1]}\mathop{\prod }\limits_{\eta =2}^{N}{T}_{{\chi }_{n},{\chi }_{2n},{\chi }_{2n\,+\,1}}^{[\eta ]}$$
(2)

where n ∈ [1, N]. Therefore, we can also interpret the TTN classifier Ψ as a set of quantum many-body wavefunctions \(\left|{\psi }_{l}\right\rangle\), one for each of the class labels l (see Supplementary Methods). For the classification, we represent each sample x by a product state Φ(x). To this end, we map each feature xi ∈ x into a quantum spin by choosing the feature map Φ(x) as a Kronecker product of N + 1 local feature maps

$${{{\Phi }}}^{[i]}({x}_{i})\,=\,\left[\cos \left(\frac{\pi {x}_{i}^{\prime}}{2}\right),\sin \left(\frac{\pi {x}_{i}^{\prime}}{2}\right)\right]$$
(3)

where \({x}_{i}^{\prime} \,\equiv\, {x}_{i}/{x}_{i,\text{max}}\,\in\, [0,1]\) is the re-scaled value with respect to the maximum xi,max within all samples of the training set.
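As an illustration of Eq. (3), the following minimal sketch builds the local feature maps and their Kronecker product for a toy sample. The function names and the toy values are hypothetical, the features are assumed to be non-negative so that the re-scaled value lies in [0, 1], and the dense product state is only feasible for a handful of features.

```python
import numpy as np

def local_feature_map(x_i, x_i_max):
    """Eq. (3): map one feature to a normalised two-component vector."""
    x_scaled = x_i / x_i_max  # re-scaled value, assumed to lie in [0, 1]
    return np.array([np.cos(np.pi * x_scaled / 2), np.sin(np.pi * x_scaled / 2)])

def product_state(x, x_max):
    """Kronecker product of the local maps (dense, 2**len(x) entries)."""
    state = np.array([1.0])
    for x_i, m in zip(x, x_max):
        state = np.kron(state, local_feature_map(x_i, m))
    return state

# Toy sample with three features and hypothetical training-set maxima.
x = np.array([0.0, 2.0, 1.0])
x_max = np.array([4.0, 2.0, 2.0])
phi = product_state(x, x_max)
```

Because every local map is a unit vector, the resulting product state is normalised by construction.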

Fig. 1: Data flow of the Machine Learning analysis for the b-jet classification of the LHCb experiment at CERN.

After proton–proton collisions, b- and \(\bar{b}\)-quarks are created, which subsequently fragment into particle jets (left). The different particles within the jets are tracked by the LHCb detector. Selected features of the detected particle data are used as input for the Machine Learning analysis by NNs and TNs in order to determine the charge of the initial quark (right).

Accordingly, we classify a sample x by computing the overlap 〈Φ(x)|ψl〉 for all labels l with the product state Φ(x) resulting in the weighted probabilities

$${{\mathcal{P}}}_{l}\,=\,\frac{| \langle {{\Phi }}(x)| {\psi }_{l}\rangle {| }^{2}}{{\sum }_{l}| \langle {{\Phi }}(x)| {\psi }_{l}\rangle {| }^{2}}$$
(4)

for each class. We point out that we can encode the input data in different non-linear feature maps as well (see Supplementary Notes).
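The classification rule of Eq. (4) can be sketched for a very small tree. The example below assumes four features, one layer of two bottom tensors and one top tensor carrying the label index. The tensors are random placeholders standing in for a trained TTN, and all names are hypothetical. The overlaps are obtained by contracting the local feature vectors into the tree from the bottom up, so the exponentially large product state is never built explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
chi, n_labels = 3, 2

# Random placeholders for a trained TTN on 4 features (assumption).
bottom_left = rng.normal(size=(chi, 2, 2))    # (parent bond, feature 0, feature 1)
bottom_right = rng.normal(size=(chi, 2, 2))   # (parent bond, feature 2, feature 3)
top = rng.normal(size=(n_labels, chi, chi))   # (label, left bond, right bond)

def local_map(x_scaled):
    """Local feature map of Eq. (3) for an already re-scaled feature."""
    return np.array([np.cos(np.pi * x_scaled / 2), np.sin(np.pi * x_scaled / 2)])

def class_probabilities(x_scaled):
    """Eq. (4): normalised squared overlaps <Phi(x)|psi_l> for every label l."""
    v = [local_map(x) for x in x_scaled]
    left = np.einsum("aij,i,j->a", bottom_left, v[0], v[1])
    right = np.einsum("aij,i,j->a", bottom_right, v[2], v[3])
    overlaps = np.einsum("lab,a,b->l", top, left, right)
    weights = np.abs(overlaps) ** 2
    return weights / weights.sum()

probs = class_probabilities([0.1, 0.8, 0.5, 0.3])
```

The cost of this contraction grows with the bond-dimension χ rather than with the size of the full weight tensor, which is why χ directly controls the prediction time.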

One of the major benefits of TNs in quantum mechanics is the accessibility of information within the network. They allow to efficiently measure information quantities such as entanglement entropy and correlations. Based on these quantum-inspired measurements, we here introduce the QuIPS protocol for the TN application in ML, which exploits the information encoded and accessible in the TN in order to rank the input features according to their importance for the classification.

In information theory, entropy as such is a measure of the information content inherent in the possible outcomes of variables, such as e.g. a classification42,43,44. In TNs such information content can be assessed by means of the entanglement entropy S which describes the shared information between TN bipartitions. The entanglement S is measured via the Schmidt decomposition, that is, decomposing \(\left|\psi \right\rangle\) into two bipartitions \(\left|{\psi }_{\alpha }^{A}\right\rangle\) and \(\left|{\psi }_{\alpha }^{B}\right\rangle\)44 such that

$$\left|\psi \right\rangle \,=\,\mathop{\sum }\limits_{\alpha }^{\chi }{\lambda }_{\alpha }\left|{\psi }_{\alpha }^{A}\right\rangle \,\otimes\, \left|{\psi }_{\alpha }^{B}\right\rangle ,$$
(5)

where λα are the Schmidt-coefficients (non-zero, normalised singular values of the decomposition). The entanglement entropy is then defined as \(S\,=\,-{\sum }_{\alpha }{\lambda }_{\alpha }^{2}{\mathrm{ln}}\,{\lambda }_{\alpha }^{2}\). Consequently, the minimal entropy S = 0 is obtained only if we have one single non-zero singular value λ1 = 1. In this case, we can completely separate the two bipartitions as they share no information. On the contrary, higher S means that information is shared among the bipartitions.
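For a small dense state, the Schmidt coefficients and the entropy S can be obtained directly from an SVD, as in the following sketch. The function name and the two test states are illustrative assumptions. In a TTN the same quantity is read off from the singular values on a bond rather than from the full state vector.

```python
import numpy as np

def entanglement_entropy(psi, dim_a, dim_b):
    """S = -sum_a lambda_a^2 ln(lambda_a^2) for the bipartition A|B of a dense state."""
    matrix = np.asarray(psi, dtype=float).reshape(dim_a, dim_b)
    s = np.linalg.svd(matrix, compute_uv=False)
    s = s / np.linalg.norm(s)          # normalised Schmidt coefficients
    p = s[s > 1e-12] ** 2              # keep only the non-zero ones
    return float(-np.sum(p * np.log(p)))

# A product state shares no information between A and B.
product = np.kron([1.0, 0.0], [0.6, 0.8])
# A maximally entangled two-site state has two equal Schmidt coefficients.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

s_product = entanglement_entropy(product, 2, 2)
s_bell = entanglement_entropy(bell, 2, 2)
```

The product state gives S = 0, while the maximally entangled state gives S = ln 2, the largest value possible for a two-dimensional bond.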

In the ML context, the entropy can be interpreted as follows: If the features in one bipartition provide no valuable information for the classification task, the entropy is zero. On the contrary, S increases the more information between the two bipartitions is exploited. This analysis can be used to optimise the learning procedure: whenever S = 0, the feature can be discarded with no loss of information for the classification. Thereby, a second model with fewer features and fewer tensors can be introduced. This second, more efficient model results in the same predictions in less time. On the contrary, a high bipartition entropy highlights which features, or combinations of features, are important for the correct predictions.

The second set of measurements we take into account are the correlation functions

$${C}_{i,j}^{l}\,=\,\frac{\langle {\psi }_{l}| {\sigma }_{i}^{z}{\sigma }_{j}^{z}| {\psi }_{l}\rangle }{\langle {\psi }_{l}| {\psi }_{l}\rangle }$$
(6)

for each pair of features (located at site i and j) and for each class l. The correlations offer an insight into the possible relation among the information that the two features provide. In case of maximum correlation or anti-correlation among them for all classes l, the information of one of the features can be obtained by the other one (and vice versa), thus one can be neglected. In case of no correlation among them, the two features may provide fundamentally different information for the classification. The correlation analysis allows pinpointing if two features give independent information. However, the correlation itself—in contrast to the entropy—does not tell if this information is important for the classification.
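Eq. (6) can be evaluated directly for a small dense state, since σz is diagonal in the local basis. The sketch below is only an illustration with hypothetical names and a three-site toy state. For a TTN the same expectation value would be computed by contracting the network instead of storing all amplitudes.

```python
import numpy as np

def zz_correlation(psi, n_sites, i, j):
    """Eq. (6): <psi|sz_i sz_j|psi> / <psi|psi> for a dense state of two-level sites."""
    amplitudes = np.asarray(psi).reshape((2,) * n_sites)
    weights = np.abs(amplitudes) ** 2
    z = np.array([1.0, -1.0])          # eigenvalues of sigma^z
    shape_i = [1] * n_sites
    shape_i[i] = 2
    shape_j = [1] * n_sites
    shape_j[j] = 2
    signs = z.reshape(shape_i) * z.reshape(shape_j)  # broadcast to all sites
    return float(np.sum(weights * signs) / np.sum(weights))

# Sites 0 and 1 are perfectly correlated, site 2 is independent of both.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
plus = np.array([1.0, 1.0]) / np.sqrt(2)
psi = np.kron(bell, plus)

c_01 = zz_correlation(psi, 3, 0, 1)
c_02 = zz_correlation(psi, 3, 0, 2)
```

In this toy state the first pair is maximally correlated, so one of the two sites is redundant, while the second pair is uncorrelated.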

In conclusion, QuIPS makes it possible to filter out the most valuable features for the classification based on the previous insights, namely: (i) a low entropy of a feature bipartition signals that one of the two bipartitions can be discarded with negligible loss of information, and (ii) if two features are completely (anti-)correlated, we can neglect at least one of them.

Nowadays, in particle physics, ML is widely used for the classification of jets, i.e. streams of particles produced by the fragmentation of quarks and gluons. The jet substructure can be exploited to solve such classification problems45. ML techniques have been proposed to identify boosted, hadronically decaying top quarks46, or to identify the jet charge47. The ATLAS and CMS collaborations developed ML algorithms in order to identify jets generated by the fragmentation of b-quarks48,49,50: a comprehensive review on ML techniques at the LHC can be found in51.

The LHCb experiment in particular is, among others, dedicated to the study of the physics of b- and c-quarks produced in proton–proton collisions. Here, ML methods have been introduced recently for the discrimination between b- and c-jets by using Boosted Decision Tree classifiers52. However, a crucial topic for the LHCb experiment, which is yet unexploited by ML, is the identification of the charge of a b-quark, i.e. discriminating between a b or \(\bar{b}\). Such identification can be used in many physics measurements, and it is the core of the determination of the charge asymmetry in b-pair production, a quantity sensitive to physics beyond the Standard Model53. Whenever produced in a scattering event, b-quarks have a short lifetime as free particles; indeed, they manifest themselves as bound states (hadrons) or as narrow cones of particles produced by the hadronization (jets). In the case of the LHCb experiment, the b-jets are detected by the apparatus located in the forward region of proton–proton collisions (see Fig. 1, left)54. The LHCb detector includes a particle identification system that distinguishes different types of charged particles within the jet, and a high-precision tracking system able to measure the momentum of each particle55. Still, the separation between b- and \(\bar{b}\)-jets is a highly difficult task because the b-quark fragmentation produces dozens of particles via non-perturbative Quantum Chromodynamics processes, resulting in non-trivial correlations between the jet particles and the original quark.

The algorithms used to identify the charge of the b-quarks based on information on the jets are called tagging methods. The tagging algorithm performance is typically quantified with the tagging power ϵtag, representing the effective fraction of jets that contribute to the statistical uncertainty in an asymmetry measurement56,57. In particular, the tagging power ϵtag takes into account the efficiency ϵeff (the fraction of jets for which the classifier takes a decision) and the prediction accuracy a (the fraction of correctly classified jets among them) as follows:

$${\epsilon }_{tag}\,=\,{\epsilon }_{eff}\,\cdot\, {(2a\,-\,1)}^{2}\,.$$
(7)
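Eq. (7) is a one-line computation. The sketch below uses a hypothetical function name and illustrative numbers rather than values from the analysis, and shows that a randomly guessing classifier has zero tagging power regardless of its efficiency.

```python
def tagging_power(efficiency, accuracy):
    """Eq. (7): eps_tag = eps_eff * (2a - 1)^2."""
    return efficiency * (2.0 * accuracy - 1.0) ** 2

# Illustrative values: a tagger deciding on half of the jets with 75% accuracy.
eps_example = tagging_power(0.5, 0.75)
# A random classifier (a = 0.5) contributes nothing, even at full efficiency.
eps_random = tagging_power(1.0, 0.5)
```

The quadratic dependence on (2a − 1) means that gains in accuracy weigh more heavily than equal relative gains in efficiency.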

To date, the muon tagging method gives the best performance on the b- vs. \(\bar{b}\)-jet discrimination using the dataset collected in the LHC Run I58: here, the muon with the highest momentum in the jet is selected, and its electric charge is used to decide on the b-quark charge.

For the ML application, we now formulate the identification of the b-quark charge in terms of a supervised learning problem. As described above, we implemented a TTN as a classifier and applied it to the LHCb problem analysing its performance. Alongside, a DNN analysis is performed to the best of our capabilities, and both algorithms are compared with the muon tagging approach. Both the TTN and the DNN use as input for the supervised learning 16 features of the jet substructure from the official simulation data released by the LHCb collaboration33,34. The 16 features are determined as follows: the muon with the highest pT among all detected muons in the jet is selected and the same is done for the highest pT kaon, pion, electron, and proton, resulting in 5 different selected particles. For each particle, three observables are considered: (i) The momentum relative to the jet axis (\({p}_{{\rm{T}}}^{rel}\)), (ii) the particle charge (q), and (iii) the distance between the particle and the jet axis (ΔR), for a total of 5 × 3 observables. If a particle type is not found in a jet, the related features are set to 0. The 16th feature is the total jet charge Q, defined as the weighted average of the particle charges qi inside the jet, using the particle \({p}_{{\rm{T}}}^{rel}\) as weights:

$$Q\,=\,\frac{{\sum }_{i}{({p}_{{\rm{T}}}^{rel})}_{i}{q}_{i}}{{\sum }_{i}{({p}_{{\rm{T}}}^{rel})}_{i}}\,.$$
(8)
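The weighted jet charge of Eq. (8) can be sketched as follows. The function name and the toy particle list are hypothetical, and returning 0 for a jet without any weight is an assumption made here only to avoid a division by zero.

```python
def jet_charge(pt_rel, charges):
    """Eq. (8): average of the particle charges weighted by their pT^rel."""
    total_weight = sum(pt_rel)
    if total_weight == 0:
        return 0.0  # assumption: empty or zero-weight jets get Q = 0
    return sum(p * q for p, q in zip(pt_rel, charges)) / total_weight

# Toy jet: one hard negative particle and two softer positive ones.
q_jet = jet_charge([3.0, 1.0, 1.0], [-1, +1, +1])
```

Here the single negative particle dominates the sum because it carries the largest weight, so Q is negative although two of the three particles are positive.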

Results

Analysis framework

In the following, we present the jet classification performance for the TTN and the DNN applied to the LHCb dataset, also comparing both ML techniques with the muon tagging approach. For the DNN we use an optimised network with three hidden layers of 96 nodes (see Supplementary Methods for details). Hereafter, we aim to compare the best possible performance of both approaches; therefore, we optimised the hyperparameters of both methods, TTN and DNN, in order to obtain the best possible results from each of them. To this end, we split the dataset of about 700k events (samples) into two sub-sets: 60% of the samples are used in the training process while the remaining 40% are used as test set to evaluate and compare the different methods. For each event prediction after the training procedure, both ML models output the probability \({{\mathcal{P}}}_{b}\) to classify the event as a jet generated by a b- or a \(\bar{b}\)-quark. A threshold Δ around \({{\mathcal{P}}}_{b}\,=\,0.5\) is then defined in which we classify the quark as unknown in order to optimise the overall tagging power ϵtag.
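The role of the threshold Δ can be sketched as a simple scan. The example below assumes that Δ is the half-width of the undecided band, i.e. a jet is tagged only if |P_b − 0.5| > Δ. This convention, the function names and the six toy predictions are assumptions for illustration and do not come from the LHCb dataset.

```python
import numpy as np

def tag_with_threshold(p_b, is_b, delta):
    """Return (efficiency, accuracy, tagging power) for a given undecided band."""
    p_b = np.asarray(p_b, dtype=float)
    is_b = np.asarray(is_b, dtype=bool)
    decided = np.abs(p_b - 0.5) > delta            # jets outside the band
    if not decided.any():
        return 0.0, 0.0, 0.0
    efficiency = float(decided.mean())
    accuracy = float(((p_b[decided] > 0.5) == is_b[decided]).mean())
    return efficiency, accuracy, efficiency * (2.0 * accuracy - 1.0) ** 2

def best_threshold(p_b, is_b, deltas):
    """Pick the candidate delta that maximises the tagging power."""
    return max(deltas, key=lambda d: tag_with_threshold(p_b, is_b, d)[2])

# Toy predictions: the two jets closest to 0.5 are the misclassified ones.
p_b = [0.9, 0.8, 0.55, 0.45, 0.2, 0.1]
is_b = [True, True, False, True, False, False]

best = best_threshold(p_b, is_b, [0.0, 0.1, 0.35])
eff, acc, eps_tag = tag_with_threshold(p_b, is_b, best)
```

In this toy case, excluding the two low-confidence jets costs efficiency but raises the accuracy enough to increase the tagging power, whereas an overly wide band discards too many correctly tagged jets.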

Jet classification performance

We obtain similar performances in terms of the raw prediction accuracy applying both ML approaches after the training procedure on the test data: the TTN takes a decision on the charge of the quark in \({\epsilon }_{eff}^{\,\text{TTN}\,}\,=\,54.5 \%\) of the cases with an overall accuracy of aTTN = 70.56%, while the DNN decides in \({\epsilon }_{eff}^{\,\text{DNN}\,}\,=\,55.3 \%\) of the samples with aDNN = 70.49%. We checked both approaches for biases in physical quantities to ensure that both methods are able to properly capture the physical process behind the problem and thus that they can be used as valid tagging methods for LHCb events (see Supplementary Methods).

In Fig. 2a we present the tagging power of the different approaches as a function of the jet transverse momentum pT. Evidently, both ML methods perform significantly better than the muon tagging approach for the complete range of jet transverse momentum pT, while the TTN and DNN display comparable performances within the statistical uncertainties.

Fig. 2: Comparison of the DNN and TTN analysis.

a Tagging power for the DNN (green), TTN (blue) and the muon tagging (red), (b) ROC curves for the DNN (green) and the TTN (blue, but completely covered by DNN), compared with the line of no-discrimination (dotted navy-blue line), (c) probability distribution for the DNN and (d) for the TTN. In the two distributions (c, d), the correctly classified events (green) are shown in the total distribution (light blue). Below, in black all samples where a muon was detected in the jet.

In Fig. 2c, d we present the histograms of the confidences for predicting a b-flavoured jet for all samples in the test dataset for the DNN and the TTN respectively. Interestingly, even though both approaches give similar performances in terms of overall precision and tagging power, the prediction confidences are fundamentally different. For the DNN, we see a Gaussian-like distribution with, in general, not very high confidence for each prediction. Thus, we obtain fewer correct predictions with high confidences, but at the same time, fewer wrong predictions with high confidences compared to the TTN predictions. On the other hand, the TTN displays a flatter distribution including more predictions (correct and incorrect) with higher confidence. Remarkably though, we can see peaks for extremely confident predictions (around 0 and around 1) for the TTN. These peaks can be traced back to the presence of the muon, whose charge is a well-defined predictor for a jet generated by a b-quark. The DNN lacks these confident predictions exploiting the muon charge. Further, we mention that using different cost functions for the DNN, i.e. the cross-entropy loss function and the Mean Squared Error, leads to similar results (see Supplementary Methods).

Finally, in Fig. 2b we present the Receiver Operating Characteristic (ROC) curves for the TTN and the DNN together with the line of no-discrimination, which represents a randomly guessing classifier: the two ROC curves for TTN and DNN are perfectly coincident, and the Area Under the Curve (AUC) for the two classifiers is almost the same (AUCTTN = 0.689 and AUCDNN = 0.690). The graph illustrates the similarity in the outputs between TTN and DNN despite the different confidence distributions. This is further confirmed by a Pearson correlation factor of r = 0.97 between the outputs of the two classifiers.

In conclusion, the two different approaches result in similar outcomes in terms of prediction performances. However, the underlying information used by the two discriminators is inherently different. For instance, the DNN predicts more conservatively, in the sense that the confidences for each prediction tend to be lower compared with the TTN. Additionally, the DNN does not exploit the presence of the muon as strongly as the TTN, even though the muon is a good predictor for the classification.

Exploiting insights into the data with TTN

As previously mentioned, the TTN analysis allows to efficiently measure the captured correlations and the entanglement within the classifier. These measurements give insight into the learned data and can be exploited via QuIPS to identify the most important features typically used for the classifications.

In Fig. 3a we present the correlation analysis allowing us to pinpoint if two features give independent information. For both labels (\(l\,=\,b,\bar{b}\)) the results are very similar, thus in Fig. 3a we present only l = b. We see among others that the momenta \({p}_{T}^{rel}\) and distance ΔR of all particles are correlated except for the kaon. Thus this particle provides information to the classification which seems to be independent of the information gained by the other particles. However, the correlation itself does not tell if this information is important for the classification. Thus, we compute the entanglement entropy S of each feature, as reported in Fig. 3b. Here, we conclude that the features with the highest information content are the total charge and \({p}_{T}^{rel}\) and distance ΔR of the kaon. Driven by these insights, we employ the QuIPS to discard half of the features by selecting the eight most important ones: i.–iii. charge, momenta and distance of the muon, iv.–vi. charge, momenta and distance of the kaon, vii. charge of the pion and viii. total detected charge. To test the QuIPS performance, we compared it with an independent but more time-expensive analysis on the importance of the different particle types: the two approaches perfectly matched. Further, we studied two new models, one composed of the eight most important features proposed by the QuIPS, and, for comparison, another with the eight discarded features. In Fig. 3c we show the tagging power for the different analyses with the complete 16-sites (model M16), the best 8 (B8), the worst 8 (W8) and the muon tagging. Remarkably, we see that the models M16 and B8 give comparable results, while model W8 results are even worse than the classical approach. These performances are confirmed by the prediction accuracy of the different models: while less than 1% of accuracy is lost from M16 to B8, the accuracy of the model W8 drastically drops to around 52%, that is, almost random predictions. Finally, in this particular run, the model B8 has been trained 4.7 times faster with respect to model M16 and predicts 5.5 times faster as well (the actual speed-up depends on the bond-dimension and other hyperparameters).

Fig. 3: Exploiting the information provided by the learned TTN classifier.

a Correlations between the 16 input features (blue for anti-correlated, white for uncorrelated, red for correlated). The numbers indicate q, \({p}_{T}^{rel}\), ΔR of the muon (1–3), kaon (4–6), pion (7–9), electron (10–12), proton (13–15) and the jet charge Q (16). b Entropy of each feature as the measure for the information provided for the classification. c Tagging power for learning on all features (blue), the best eight proposed by QuIPS exploiting insights from (a, b) (magenta), the worst eight (yellow) and the muon tagging (red). d Tagging power for decreasing bond-dimension truncated after training: The complete model (blue shades for χ = 100, χ = 50, χ = 5), for using the QuIPS best 8 features only (violet shades for χ = 16, χ = 5), and the muon tagging (red).

A critical point of interest in real-time ML applications is the prediction time. For example, in the LHCb Run 2 data-taking, the high-level software trigger takes a decision approximately every 1 μs55 and shorter latencies are expected in future Runs. Consequently, with the aid of the QuIPS protocol, we can efficiently reduce the prediction computational time while maintaining a comparable high prediction power. However, with TTNs, we can undertake an even further step to reduce the prediction time by reducing the bond-dimension χ after the training procedure. Here, we introduce the QIANO performing this truncation by means of the well-established SVD for TN18,23,25 in a way ensuring to introduce the least infidelity possible. In other words, QIANO can adjust the bond-dimension χ to achieve a targeted prediction time while keeping the prediction accuracy reasonably high. We stress that this can be done without relearning a new model, as would be the case with NN.
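The core step behind such a post-training compression is a truncated SVD across a bond. The sketch below shows it for a single matrix only. It is not the QIANO procedure itself, which acts on the full tree and requires the appropriate gauge of the network. The function name and the random test matrix are assumptions for illustration.

```python
import numpy as np

def truncate_bond(matrix, chi_new):
    """Keep the chi_new largest singular values; return both factors and the lost weight."""
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    discarded_weight = float(np.sum(s[chi_new:] ** 2) / np.sum(s ** 2))
    left = u[:, :chi_new] * s[:chi_new]   # absorb singular values into the left factor
    right = vh[:chi_new, :]
    return left, right, discarded_weight

rng = np.random.default_rng(1)
# Hypothetical bond matrix: effective rank 6 plus a small amount of noise.
m = rng.normal(size=(20, 6)) @ rng.normal(size=(6, 20))
m += 1e-3 * rng.normal(size=(20, 20))

left, right, lost = truncate_bond(m, 6)
rel_error = np.linalg.norm(m - left @ right) / np.linalg.norm(m)
```

Keeping the largest singular values gives the best approximation of the matrix at the chosen bond-dimension in the Frobenius norm, and the squared relative error equals the discarded weight, so the singular values quantify the loss before the truncation is applied.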

Finally, we apply QuIPS and QIANO to reduce the information in the TTN in an optimal way for a targeted balance between prediction time and accuracy. In Fig. 3d we show the tagging power obtained by taking the original TTN and truncating it to different bond dimensions χ. We can see that even though we compress quite heavily, the overall tagging power does not change significantly. In fact, we only drop about 0.03% in the overall prediction accuracy, while at the same time improving the average prediction time from 345 to 37 μs (see Table 1). Applying the same idea to the model B8 we can reduce the average prediction time effectively down to 19 μs on our machines, a performance compatible with current real-time classification rates.

Table 1 TTN prediction time.

Discussion

We analysed an LHCb dataset for the classification of b- and \(\bar{b}\)-jets with two different ML approaches, a DNN and a TTN. We showed that we obtained with both techniques a tagging power about one order of magnitude higher than the classical muon tagging approach, which to date is the best published result for this classification problem. We pointed out that, even though both approaches result in similar tagging power, they treat the data very differently. In particular, the TTN effectively recognises the importance of the presence of the muon as a strong predictor for the jet classification. Here, we point out that we only used a conjugate gradient descent for the optimisation of our TTN classifier. Deploying more sophisticated optimisation procedures which have already been proven to work for Tensor Trains, such as stochastic gradient descent59 or Riemannian optimisation28, may further improve the performance (in both time and accuracy) in future applications.

We further explained the crucial benefits of the TTN approach over the DNNs, namely (i) the ability to efficiently measure correlations and the entanglement entropy, and (ii) the power of compressing the network while keeping a high amount of information (to some extent even lossless compression). We showed how the former quantum-inspired measurements help to set up a more efficient ML model: in particular, by introducing an information-based heuristic technique, we can establish the importance of single features based on the information captured within the trained TTN classifier only. Using this insight, we introduced the QuIPS, which can significantly reduce the model complexity by discarding the least-important features while maintaining high prediction accuracy. This selection of features based on their informational importance for the trained classifier is one major advantage of TNs, aimed at effectively decreasing training and prediction time. Regarding the latter benefit of the TTN, we introduced the QIANO, which decreases the TTN prediction time by optimally decreasing its representative power based on information from the quantum entropy, introducing the least possible infidelity. In contrast to DNNs, with the QIANO we do not need to set up a new model and train it from scratch, but we can optimise the network post-learning adaptively to the specific conditions, e.g. the used CPU or the required prediction time of the final application.

Finally, we showed that using QuIPS and QIANO we can effectively compress the trained TTN to target a given prediction time. In particular, we decreased our prediction times from 345 to 19 μs. We stress that, while we only used one CPU for the predictions, in future applications we might obtain a speed-up from 10 to 100 times by parallelising the tensor contractions on GPUs60. Thus, we are confident that it is possible to reach a MHz prediction rate while still obtaining results significantly better than the classical muon tagging approach. Here, we also point out that, for using this algorithm on the LHCb real-time data acquisition system, it would be necessary to develop custom electronic cards like FPGAs, or GPUs with an optimised architecture. Such solutions should be explored in the future.

Given the competitive performance of the presented TTN method at its application in high-energy physics, we envisage a multitude of possible future applications in high-energy experiments at CERN and in other fields of science. Future applications of our approach in the LHCb experiment may include the discrimination between b-jets, c-jets and light flavour jets52. A fast and efficient real-time identification of b- and c-jets can be the key point for several studies in high-energy physics, ranging from the search for the rare Higgs boson decay in two c-quarks, up to the search for new particles decaying in a pair of heavy-flavour quarks (\(b\bar{b}\) or \(c\bar{c}\)).

Methods

LHCb particle detection

LHCb is fully instrumented in the phase space region of proton–proton collisions defined by the pseudo-rapidity (η) range [2, 5], with η defined as

$$\eta \,=\,-{\rm{log}}\left[\tan \left(\frac{\theta }{2}\right)\right]\,,$$
(9)

where θ is the angle between the particle momentum and the beam axis (see Fig. 4). The direction of particles momenta can be fully identified by η and by the azimuthal angle ϕ, defined as the angle in the plane transverse to the beam axis. The projection of the momentum in this plane is called transverse momentum (pT). The energy of charged and neutral particles is measured by electromagnetic and hadronic calorimeters. In the following, we work with physics natural units.
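The kinematic variables defined above can be sketched from Cartesian momentum components, assuming the beam axis is the z axis and using the natural logarithm in Eq. (9). The function names and the toy momentum are hypothetical.

```python
import math

def pseudorapidity(px, py, pz):
    """Eq. (9): eta = -ln(tan(theta / 2)), theta measured from the beam (z) axis."""
    p = math.sqrt(px * px + py * py + pz * pz)
    theta = math.acos(pz / p)
    return -math.log(math.tan(theta / 2))

def transverse_momentum(px, py):
    """Magnitude of the momentum projected onto the plane transverse to the beam."""
    return math.hypot(px, py)

def azimuthal_angle(px, py):
    """Angle phi in the plane transverse to the beam, in (-pi, pi]."""
    return math.atan2(py, px)

# Toy forward-going particle: pz is ten times larger than pT.
eta = pseudorapidity(1.0, 0.0, 10.0)
pt = transverse_momentum(1.0, 0.0)
```

A particle emitted perpendicular to the beam has η = 0, while this toy forward particle has η ≈ 3 and therefore falls inside the instrumented range [2, 5].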

Fig. 4: Illustrative sketch showing the LHCb experiment and the two possible tagging algorithms.

The single-particle tagging algorithm exploits information coming from one single particle (the muon), while the inclusive tagging algorithm exploits the information on all the jet constituents.

At LHCb, jets are reconstructed using a Particle Flow algorithm61 to select charged and neutral particles and the anti-kt algorithm62 to cluster them. The jet momentum is defined as the sum of the momenta of the particles that form the jet, while the jet axis is defined as the direction of the jet momentum. Most of the particles that form the jet are contained in a cone of radius \({{\Delta }}R\,=\,\sqrt{{({{\Delta }}\eta )}^{2}\,+\,{({{\Delta }}\phi )}^{2}}\,=\,0.5\), where Δη and Δϕ are respectively the pseudo-rapidity difference and the azimuthal angle difference between the particle momentum and the jet axis. For each particle inside the jet cone, the momentum relative to the jet axis (\({p}_{{\rm{T}}}^{rel}\)) is defined as the projection of the particle momentum onto the plane transverse to the jet axis.
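The cone distance ΔR and the relative transverse momentum defined above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the tuple representation of three-momenta are assumptions, and wrapping Δϕ into [-π, π) is a common convention that the text does not state explicitly.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi) (assumed convention)."""
    d = phi1 - phi2
    return (d + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt(Delta eta^2 + Delta phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def in_jet_cone(eta, phi, jet_eta, jet_phi, radius=0.5):
    """True if the particle lies within the cone of radius 0.5 around the jet axis."""
    return delta_r(eta, phi, jet_eta, jet_phi) <= radius

def pt_rel(p, jet_p):
    """Magnitude of the component of p transverse to the jet axis.

    p and jet_p are (px, py, pz) tuples; the result is |p x j| / |j|.
    """
    cx = p[1] * jet_p[2] - p[2] * jet_p[1]
    cy = p[2] * jet_p[0] - p[0] * jet_p[2]
    cz = p[0] * jet_p[1] - p[1] * jet_p[0]
    cross_norm = math.sqrt(cx * cx + cy * cy + cz * cz)
    jet_norm = math.sqrt(sum(c * c for c in jet_p))
    return cross_norm / jet_norm
```

A particle collinear with the jet axis has a relative transverse momentum of zero, while a particle perpendicular to it contributes its full momentum.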

LHCb dataset

Unlike in other ML performance analyses, the dataset used in this paper has been prepared specifically for this LHCb classification problem, so baseline ML models and benchmarks on it do not exist. In particle physics, features depend strongly on the detector considered (i.e. different experiments may respond differently to the same physical object). For this reason, the training has been performed on a dataset that reproduces the LHCb experimental conditions, in order to obtain the optimal performance with this experiment.

The LHCb simulation datasets used for our analysis are produced with a Monte Carlo technique using the framework GAUSS63, which makes use of PYTHIA 8 (ref. 64) to generate proton–proton interactions and jet fragmentation and uses EvtGen65 to simulate b-hadron decays. The GEANT4 software66,67 is used to simulate the detector response, and the signals are digitised and reconstructed using the LHCb analysis framework.

The dataset contains b- and \(\bar{b}\)-jets produced in proton–proton collisions at a centre-of-mass energy of 13 TeV (refs. 33,34). Pairs of b-jets and \(\bar{b}\)-jets are selected by requiring a jet pT greater than 20 GeV and η in the range [2.2, 4.2] for both jets.

Muon tagging

LHCb measured the \(b\bar{b}\) forward-central asymmetry with the dataset collected in the LHC Run I (ref. 58) using the muon tagging approach. In this method, the muon with the highest momentum in the jet cone is selected, and its electric charge is used to decide on the b-quark charge. In fact, if this muon is produced in the original semi-leptonic decay of the b-hadron, its charge is fully correlated with the b-quark charge. To date, the muon tagging method gives the best performance on the b- vs. \(\bar{b}\)-jet discrimination. Although this method can distinguish between b- and \(\bar{b}\)-quarks with good accuracy, its efficiency is low: it is only applicable to jets in which a muon is found, and it is intrinsically limited by the branching ratio of b-hadrons into semi-leptonic decays. Additionally, the muon tagging may fail when the selected muon is produced not in the decay of the b-hadron but in other decay processes. In these cases, the muon charge may not be fully correlated with the b-quark charge.
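The selection logic of the muon tagging approach can be summarised in a short Python sketch. It is a simplified illustration, not the LHCb implementation: the `Particle` class and its fields are hypothetical, the input is assumed to contain only particles already inside the jet cone, and the sign convention (a negative muon tags a b-quark) follows from the Standard Model decay b → c μ⁻ ν̄.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Particle:
    """Hypothetical jet constituent used only for this sketch."""
    momentum: float  # momentum magnitude
    charge: int      # electric charge in units of e
    is_muon: bool    # result of the muon identification

def muon_tag(cone_particles: Sequence[Particle]) -> Optional[str]:
    """Return "b", "bbar" or None for one jet.

    cone_particles is assumed to hold the particles inside the jet cone.
    None means that no muon was found, so the jet cannot be tagged.
    """
    muons = [p for p in cone_particles if p.is_muon]
    if not muons:
        return None
    leading = max(muons, key=lambda p: p.momentum)
    # Assumed convention: b -> c mu- nu, so a negative muon indicates a b-quark.
    return "b" if leading.charge < 0 else "bbar"
```

The `None` branch makes the low efficiency explicit: every jet without an identified muon stays untagged, regardless of how well the tagged jets are classified.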

Machine learning approaches

We train the TTN and analyse the data with different bond dimensions χ. The auxiliary dimension χ controls the number of free parameters within the variational TTN ansatz. While the TTN is able to capture more information from the training data with increasing bond dimension χ, choosing χ too large may lead to overfitting and can thus worsen the results on the test set. For the DNN we use an optimised network with three hidden layers of 96 nodes (see Supplementary Methods for details).

For each event prediction, both methods give as output the probability \({{\mathcal{P}}}_{b}\) to classify a jet as generated by a b- or a \(\bar{b}\)-quark. This probability (i.e. the confidence of the classifier) is normalised in the following way: for values of probability \({{\mathcal{P}}}_{b}\,> \,0.5\) (\({{\mathcal{P}}}_{b}\,<\,0.5\)) a jet is classified as generated by a b-quark (\(\bar{b}\)-quark), with an increasing confidence going to \({{\mathcal{P}}}_{b}\,=\,1\) (\({{\mathcal{P}}}_{b}\,=\,0\)). Therefore a completely confident classifier returns a probability distribution peaked at \({{\mathcal{P}}}_{b}\,=\,1\) and \({{\mathcal{P}}}_{b}\,=\,0\) for jets classified as generated by b- and \(\bar{b}\)-quark respectively.

We introduce a window of width Δ placed symmetrically around the prediction confidence \({{\mathcal{P}}}_{b}\,=\,0.5\), within which we classify the event as unknown. We optimise this cut on the predictions of the classifiers (i.e. their confidences) to maximise the tagging power for each method based on the training samples. In the following analysis we find ΔTTN = 0.40 (ΔDNN = 0.20) for the TTN (DNN). Thereby, we predict for the TTN (DNN) a b-quark with confidences \({{\mathcal{P}}}_{b}\,> \,{C}^{\text{TTN}}\,=\,0.70\) (\({{\mathcal{P}}}_{b}\,> \,{C}^{\text{DNN}}\,=\,0.60\)), a \(\bar{b}\)-quark with confidences \({{\mathcal{P}}}_{b}\,<\,0.30\) (\({{\mathcal{P}}}_{b}\,<\,0.40\)) and no prediction for the range in between (see Fig. 2c, d).
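The cut described above can be made concrete with a short Python sketch. It reads Δ as the full width of the excluded window, which reproduces the quoted values (Δ = 0.40 gives cuts at 0.70 and 0.30). The function names and the grid scan are assumptions for illustration, and the tagging power is computed with the usual flavour-tagging definition ε(1 − 2ω)², with ε the fraction of tagged jets and ω the fraction of wrongly tagged ones, which is assumed here.

```python
def classify(p_b, delta):
    """Map a classifier output P_b to "b", "bbar" or None (unknown).

    delta is the full width of the window centred on P_b = 0.5.
    """
    if p_b > 0.5 + delta / 2.0:
        return "b"
    if p_b < 0.5 - delta / 2.0:
        return "bbar"
    return None

def tagging_power(probs, labels, delta):
    """Tagging power eps * (1 - 2 * omega)**2 (assumed standard definition)."""
    tagged = 0
    wrong = 0
    for p_b, truth in zip(probs, labels):
        prediction = classify(p_b, delta)
        if prediction is None:
            continue
        tagged += 1
        if prediction != truth:
            wrong += 1
    if tagged == 0:
        return 0.0
    efficiency = tagged / len(probs)
    mistag = wrong / tagged
    return efficiency * (1.0 - 2.0 * mistag) ** 2

def best_delta(probs, labels, grid):
    """Pick the window width from a candidate grid that maximises the tagging power."""
    return max(grid, key=lambda d: tagging_power(probs, labels, d))
```

Widening the window lowers the efficiency but usually also the mistag fraction, so the optimum depends on how sharply the classifier output is peaked, which is consistent with the different values found for the TTN and the DNN.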

Data availability

This paper is based on data obtained by the LHCb experiment, but is analyzed independently, and has not been reviewed by the LHCb collaboration. The data are available in the official LHCb open data repository33,34.

Code availability

The software code used for the analysis of the Deep Neural Network is freely available on request from gianelle@pd.infn.it. It may be used for any kind of private or commercial purpose, including modification and distribution, without any liabilities or warranties. The software code for the TTN analysis is currently not available for public use. For more information, please contact timo.felser@physik.uni-saarland.de.

References

1. Bishop, C. M. Neural Networks for Pattern Recognition (Oxford University Press, 1996).
2. Haykin, S. S. et al. Neural networks and learning machines, vol. 3 (Pearson, 2009).
3. Nielsen, M. A. Neural networks and deep learning (Determination press, 2015).
4. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT press, 2016).
5. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
6. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484 (2016).
7. Carleo, G. et al. Machine learning and the physical sciences. Rev. Mod. Phys. 91, 045002 (2019).
8. Deng, D.-L., Li, X. & Das Sarma, S. Machine learning topological states. Phys. Rev. B 96. https://doi.org/10.1103/physrevb.96.195145 (2017).
9. Nomura, Y., Darmawan, A. S., Yamaji, Y. & Imada, M. Restricted Boltzmann machine learning for solving strongly correlated quantum systems. Phys. Rev. B 96. https://doi.org/10.1103/physrevb.96.205152 (2017).
10. Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602–606 (2017).
11. Schuld, M. & Petruccione, F. Supervised Learning with Quantum Computers (Springer, 2018).
12. Das Sarma, S., Deng, D.-L. & Duan, L.-M. Machine learning meets quantum physics. Phys. Today 72, 48–54 (2019).
13. Stoudenmire, E. M. Learning relevant features of data with multi-scale tensor networks. Quantum Sci. Technol. 3, 034003 (2018).
14. Collura, M., Dell’Anna, L., Felser, T. & Montangero, S. On the descriptive power of Neural-Networks as constrained Tensor Networks with exponentially large bond dimension. SciPost Phys. Core 4, 1 (2021).
15. Chen, J., Cheng, S., Xie, H., Wang, L. & Xiang, T. Equivalence of restricted Boltzmann machines and tensor network states. Phys. Rev. B 97. https://doi.org/10.1103/physrevb.97.085104 (2018).
16. Levine, Y., Yakira, D., Cohen, N. & Shashua, A. Deep learning and quantum entanglement: Fundamental connections with implications to network design. Preprint at https://arxiv.org/abs/1704.01552 (2017).
17. McCulloch, I. P. From density-matrix renormalization group to matrix product states. J. Stat. Mech. Theory Exp. 2007, P10014–P10014 (2007).
18. Schollwöck, U. The density-matrix renormalization group in the age of matrix product states. Ann. Phys. 326, 96–192 (2011).
19. Singh, S. & Vidal, G. Global symmetries in tensor network states: Symmetric tensors versus minimal bond dimension. Phys. Rev. B 88, 115147 (2013).
20. Dalmonte, M. & Montangero, S. Lattice gauge theory simulations in the quantum information era. Contemp. Phys. 57, 388–412 (2016).
21. Gerster, M., Rizzi, M., Silvi, P., Dalmonte, M. & Montangero, S. Fractional quantum Hall effect in the interacting Hofstadter model via tensor networks. Phys. Rev. B 96, 195123 (2017).
22. Bañuls, M. C. et al. Simulating lattice gauge theories within quantum technologies. Eur. Phys. J. D 74, https://doi.org/10.1140/epjd/e2020-100571-8 (2020).
23. Silvi, P. et al. The tensor networks anthology: simulation techniques for many-body quantum lattice systems. SciPost Phys. Lect. Notes 8 (2019).
24. Felser, T., Silvi, P., Collura, M. & Montangero, S. Two-dimensional quantum-link lattice quantum electrodynamics at finite density. Phys. Rev. X 10, 041040. https://doi.org/10.1103/PhysRevX.10.041040 (2020).
25. Montangero, S. Introduction to Tensor Network Methods (Springer International Publishing, 2018).
26. Bañuls, M. C. & Cichy, K. Review on novel methods for lattice gauge theories. Rep. Prog. Phys. 83, 024401 (2020).
27. Stoudenmire, E. & Schwab, D. J. Supervised learning with tensor networks. In (eds Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R.) Advances in Neural Information Processing Systems 29, 4799–4807. http://papers.nips.cc/paper/6211-supervised-learning-with-tensor-networks.pdf (Curran Associates, Inc., 2016).
28. Novikov, A., Trofimov, M. & Oseledets, I. Exponential machines. Preprint at https://arxiv.org/abs/1605.03795 (2016).
29. Khrulkov, V., Novikov, A. & Oseledets, I. Expressive power of recurrent neural networks. Preprint at https://arxiv.org/abs/1711.00811 (2017).
30. Liu, D. et al. Machine learning by unitary tensor network of hierarchical tree structure. N. J. Phys. 21, 073059 (2019).
31. Roberts, C. et al. Tensornetwork: A library for physics and machine learning. Preprint at https://arxiv.org/abs/1905.01330 (2019).
32. Glasser, I., Pancotti, N. & Cirac, J. I. From probabilistic graphical models to generalized tensor networks for supervised learning. Preprint at https://arxiv.org/abs/1806.05964 (2018).
33. Aaij, R. et al. LHCb open data website. http://opendata.cern.ch/docs/about-lhcb (2020).
34. Aaij, R. et al. Simulated jet samples for quark flavour identification studies. https://doi.org/10.7483/OPENDATA.LHCB.N75T.TJPE (2020).
35. Tucker, L. R. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279–311 (1966).
36. Östlund, S. & Rommer, S. Thermodynamic limit of density matrix renormalization. Phys. Rev. Lett. 75, 3537–3540 (1995).
37. Oseledets, I. V. Tensor-train decomposition. SIAM J. Sci. Comput. 33, 2295–2317 (2011).
38. Gerster, M. et al. Unconstrained tree tensor network: an adaptive gauge picture for enhanced performance. Phys. Rev. B 90, 125154 (2014).
39. Hackbusch, W. & Kühn, S. A new scheme for the tensor representation. J. Fourier Anal. Appl. 15, 706–722 (2009).
40. Verstraete, F. & Cirac, J. I. Renormalization algorithms for quantum-many body systems in two and higher dimensions. Preprint at https://arxiv.org/abs/cond-mat/0407066 (2004).
41. Orús, R. A practical introduction to tensor networks: matrix product states and projected entangled pair states. Ann. Phys. 349, 117–158 (2014).
42. Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
43. Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 623–656 (1948).
44. Nielsen, M. & Chuang, I. Quantum Computation and Quantum Information (Cambridge University Press, 2000).
45. Larkoski, A. J., Moult, I. & Nachman, B. Jet substructure at the Large Hadron Collider: a review of recent advances in theory and machine learning. Phys. Rept. 841, 1–63 (2020).
46. Butter, A. et al. The machine learning landscape of top taggers. SciPost Phys. 7, 014 (2019).
47. Fraser, K. & Schwartz, M. D. Jet charge and machine learning. JHEP 10, 093 (2018).
48. ATLAS Collaboration. Deep Sets based Neural Networks for Impact Parameter Flavour Tagging in ATLAS. Tech. Rep. ATL-PHYS-PUB-2020-014 (CERN, 2020).
49. ATLAS Collaboration. Identification of Jets Containing b-Hadrons with Recurrent Neural Networks at the ATLAS Experiment. Tech. Rep. ATL-PHYS-PUB-2017-003 (CERN, 2017).
50. CMS Collaboration. Performance of b tagging algorithms in proton-proton collisions at 13 TeV with Phase 1 CMS detector. Tech. Rep. CMS-DP-2018-033 (CERN, 2018).
51. Kogler, R. et al. Jet substructure at the Large Hadron Collider: experimental review. Rev. Mod. Phys. 91, 045003 (2019).
52. Aaij, R. et al. Identification of beauty and charm quark jets at LHCb. JINST 10, P06013 (2015).
53. Murphy, C. W. Bottom-Quark Forward-Backward and Charge Asymmetries at Hadron Colliders. Phys. Rev. D92, 054003 (2015).
54. Alves Jr., A. A. et al. The LHCb detector at the LHC. JINST 3, S08005 (2008).
55. Aaij, R. et al. LHCb detector performance. Int. J. Mod. Phys. A30, 1530022 (2015).
56. D0 collaboration. Measurements of Bd mixing using opposite-side flavor tagging. Phys. Rev. D74, 112002 (2006).
57. Giurgiu, G. A. B flavor tagging calibration and search for \({\rm{B}}_s^0\) oscillations in semileptonic decays with the CDF detector at Fermilab. https://doi.org/10.2172/879144 (2005).
58. Aaij, R. et al. First measurement of the charge asymmetry in beauty-quark pair production. Phys. Rev. Lett. 113, 082003 (2014).
59. Miller, J. Torchmps. https://github.com/jemisjoky/torchmps (2019).
60. Milsted, A., Ganahl, M., Leichenauer, S., Hidary, J. & Vidal, G. Tensornetwork on tensorflow: a spin chain application using tree tensor networks. Preprint at https://arxiv.org/abs/1905.01331 (2019).
61. ALEPH collaboration. ALEPH detector performance. Nucl. Instrum. Meth. A 360, 481 (1994).
62. Cacciari, M., Salam, G. P. & Soyez, G. The anti-kt jet clustering algorithm. JHEP 04, 063 (2008).
63. Clemencic, M. et al. The LHCb simulation application, Gauss: design, evolution and experience. J. Phys. Conf. Ser. 331, 032023 (2011).
64. Sjöstrand, T., Mrenna, S. & Skands, P. A brief introduction to PYTHIA 8.1. Comput. Phys. Commun. 178, 852–867 (2008).
65. Lange, D. J. The EvtGen particle decay simulation package. Nucl. Instrum. Meth. A462, 152–155 (2001).
66. Agostinelli, S. et al. Geant4: A simulation toolkit. Nucl. Instrum. Meth. A506, 250 (2003).
67. Allison, J. et al. Geant4 developments and applications. IEEE Trans. Nucl. Sci. 53, 270 (2006).

Acknowledgements

We are very grateful to Konstantin Schmitz for valuable comments and discussions on the ML comparison. We thank Miles Stoudenmire for fruitful discussions on the application of the TNs ML code. This work is partially supported by the Italian PRIN 2017 and Fondazione CARIPARO, the Horizon 2020 research and innovation programme under grant agreement No. 817482 (Quantum Flagship—PASQuanS) and the QuantERA projects QTFLAG and QuantHEP. We acknowledge computational resources by CINECA and the Cloud Veneto. The work is partially supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) and the European Social Fund (ESF) as part of the EXIST programme under the project Tensor Solutions. We acknowledge the LHCb Collaboration for the valuable help and the Istituto Nazionale di Fisica Nucleare and the Department of Physics and Astronomy of the University of Padova for the support.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Contributions

Conceptualisation (T.F., D.L. and S.M.); data analysis (D.Z., L.S., T.F., M.T. and A.G.); funding acquisition (D.L., S.M.); investigation (D.Z., L.S., T.F. and S.M.); methodology (T.F., S.M.); tensor network software development (T.F. using private resources); validation (D.Z., L.S., T.F. and M.T.); writing—original draft (T.F., S.M.); writing—review and editing (all authors).

Corresponding author

Correspondence to Timo Felser.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Felser, T., Trenti, M., Sestini, L. et al. Quantum-inspired machine learning on high-energy physics data. npj Quantum Inf 7, 111 (2021). https://doi.org/10.1038/s41534-021-00443-w
