Abstract
Tensor Networks, a numerical tool originally designed for simulating quantum many-body systems, have recently been applied to solve Machine Learning problems. Exploiting a tree tensor network, we apply a quantum-inspired machine learning technique to a very important and challenging big data problem in high-energy physics: the analysis and classification of data produced by the Large Hadron Collider at CERN. In particular, we present how to effectively classify so-called b-jets, jets originating from b-quarks in proton–proton collisions in the LHCb experiment, and how to interpret the classification results. We exploit the Tensor Network approach to select important features and adapt the network geometry based on information acquired in the learning process. Finally, we show how to adapt the tree tensor network to achieve optimal precision or fast response in time without the need to repeat the learning process. These results pave the way to the implementation of high-frequency real-time applications, a key ingredient needed, among others, for current and future LHCb event classification able to trigger events at the tens of MHz scale.
Introduction
Artificial Neural Networks (NNs) are a well-established tool for applications in Machine Learning and they are of increasing interest in both research and industry^{1,2,3,4,5,6}. Inspired by biological NNs, they are able to recognise patterns while processing a huge amount of data. In a nutshell, an NN describes a functional mapping containing many variational parameters, which are optimised during the training procedure. Recently, deep connections between Machine Learning and quantum physics have been identified and continue to be uncovered^{7}. On the one hand, NNs have been applied to describe the behaviour of complex quantum many-body systems^{8,9,10} while, on the other hand, quantum-inspired technologies and algorithms are taken into account to solve Machine Learning tasks^{11,12,13}.
One particular class of numerical methods originating from quantum physics which has been increasingly compared to NNs is that of Tensor Networks (TNs)^{14,15,16}. TNs have been developed to investigate quantum many-body systems on classical computers by efficiently representing the exponentially large quantum wavefunction \(\left|\psi \right\rangle\) in a compact form, and they have proven to be an essential tool for a broad range of applications^{17,18,19,20,21,22,23,24,25,26}. The accuracy of the TN approximation can be controlled with the so-called bond dimension χ, an auxiliary dimension for the indices of the connected local tensors. Recently, it has been shown that TN methods can also be applied to solve Machine Learning (ML) tasks very effectively^{13,16,27,28,29,30,31}. Indeed, even though NNs have been highly developed in recent decades by industry and research, the first approaches of ML with TNs already yield comparable results when applied to standard datasets^{13,27,32}. Due to their original development focusing on quantum systems, TNs allow one to easily compute quantities such as quantum correlations or entanglement entropy, and thereby they grant access to insights on the learned data from a distinct point of view for the application in ML^{16,30}. Hereafter, we demonstrate the effectiveness of the approach and, more importantly, that it allows introducing algorithms to simplify and explain the learning process, unveiling a pathway to an explainable Artificial Intelligence. As a potential application of this approach, we present TN supervised learning applied to identifying the charge of b-quarks (i.e. b or \(\bar{b}\)) produced in high-energy proton–proton collisions at the Large Hadron Collider (LHC) accelerator at CERN.
In what follows, we first describe the quantum-inspired Tree Tensor Network (TTN) and introduce different quantities that can be extracted from the TTN classifier which are not easily accessible for the biologically inspired Deep NN (DNN), such as correlation functions and entanglement entropy. These quantities can be used to explain the learning process and subsequent classifications, paving the way to an efficient and transparent ML tool. In this regard, we introduce the Quantum-Information Post-learning feature Selection (QuIPS), a protocol that reduces the complexity of the ML model based on the information the single features provide for the classification problem. We then briefly describe the LHCb experiment and its simulation framework, the main observables related to b-jet physics, and the relevant quantities for this analysis together with the underlying LHCb data^{33,34}. We further compare the performance obtained by the DNN and the TTN, before presenting the analytical insights into the TTN which, among others, can be exploited to improve future data analysis of high-energy problems for a deeper physical understanding of the LHCb data. Moreover, we introduce the Quantum-Information Adaptive Network Optimisation (QIANO), which adapts the TN representation by reducing the number of free parameters based on the captured information within the TN while aiming to maintain the highest accuracy possible. In this way, we can optimise the trained TN classifier for a targeted prediction speed without the necessity to relearn a new model from scratch.
TNs are not only a well-established way to represent a quantum wavefunction \(\left|\psi \right\rangle\), but more generally an efficient representation of information as such. In the mathematical context, a TN approximates a high-order tensor by a set of low-order tensors that are contracted in a particular underlying geometry, and TNs have common roots with other decompositions, such as the Singular Value Decomposition (SVD) or the Tucker decomposition^{35}. Among others, some of the most successful TN representations are the Matrix Product State (or Tensor Train)^{18,27,36,37}, the TTN (or Hierarchical Tucker decomposition)^{30,38,39}, and the Projected Entangled Pair States^{40,41}.
For a supervised learning problem, a TN can be used as the weight tensor W^{13,27,30}, a high-order tensor which acts as classifier for the input data {x}: each sample x is encoded by a feature map Φ(x) and subsequently classified by the weight tensor W. The final confidence of the classifier for a certain class labelled by l is given by the probability
\[ {\mathcal{P}}_{l}(x)\,=\,{W}_{l}\,\cdot\, {{\Phi }}(x). \]
In the following, we use a TTN Ψ to represent W (see Fig. 1, bottom right), which can be described as a contraction of its N hierarchically connected local tensors T_{{χ}}
\[ {{\Psi }}\,=\,\sum _{\{\chi \}}\,\prod _{n\,=\,1}^{N}{T}_{\{\chi \}}^{[n]}, \]
where n ∈ [1, N]. Therefore, we can also interpret the TTN classifier Ψ as a set of quantum many-body wavefunctions \(\left|{\psi }_{l}\right\rangle\), one for each of the class labels l (see Supplementary Methods). For the classification, we represent each sample x by a product state Φ(x). To this end, we map each feature x_{i} ∈ x into a quantum spin by choosing the feature map Φ(x) as a Kronecker product of N + 1 local feature maps
where \(x^{\prime} \,\equiv\, {x}_{i}/{x}_{i,\text{max}}\,\in\, [0,1]\) is the rescaled value with respect to the maximum x_{i,max} within all samples of the training set.
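Accordingly, we classify a sample x by computing the overlap 〈Φ(x)∣ψ_{l}〉 for all labels l with the product state Φ(x), resulting in the weighted probabilities
\[ {{\mathcal{P}}}_{l}\,=\,\frac{{\left|\left\langle {{\Phi }}(x)| {\psi }_{l}\right\rangle \right|}^{2}}{{\sum }_{l^{\prime} }{\left|\left\langle {{\Phi }}(x)| {\psi }_{l^{\prime} }\right\rangle \right|}^{2}} \]
for each class. We point out that we can encode the input data in different non-linear feature maps as well (see Supplementary Notes).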
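As a minimal sketch of this classification step, the following NumPy example builds a product state from rescaled features and evaluates normalised squared overlaps with one state per label. Several ingredients are assumptions made for illustration only: the local map \([\cos (\pi x^{\prime} /2),\sin (\pi x^{\prime} /2)]\) is a common choice in the TN literature and is not taken from the text above, the states \(\left|{\psi }_{l}\right\rangle\) are stored as dense random vectors rather than as a trained TTN, and all names and numbers are hypothetical.

```python
import numpy as np

def local_feature_map(x_scaled):
    """Map a rescaled feature x' in [0, 1] to a normalised two-component vector.
    The cos/sin form is an assumed, commonly used choice."""
    return np.array([np.cos(np.pi * x_scaled / 2), np.sin(np.pi * x_scaled / 2)])

def product_state(x, x_max):
    """Kronecker product of the local feature maps of all features in x."""
    phi = np.ones(1)
    for xi, xmax in zip(x, x_max):
        phi = np.kron(phi, local_feature_map(xi / xmax))
    return phi

def class_probabilities(phi, psi):
    """Normalised squared overlaps |<Phi(x)|psi_l>|^2, one entry per label l."""
    weights = np.abs(psi.conj() @ phi) ** 2
    return weights / weights.sum()

rng = np.random.default_rng(0)
n_features, n_labels = 4, 2
# Dense stand-in for the label states; a real TTN never stores these explicitly.
psi = rng.normal(size=(n_labels, 2 ** n_features))
x_max = np.array([10.0, 1.0, 0.5, 2.0])   # hypothetical per-feature maxima
x = np.array([2.5, 1.0, 0.1, 0.0])        # hypothetical sample
phi = product_state(x, x_max)
probs = class_probabilities(phi, psi)
print(probs)
```

The dense vector grows as 2^N with the number of features, which is exactly the cost a TTN avoids by contracting the local feature vectors through the tree.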
One of the major benefits of TNs in quantum mechanics is the accessibility of information within the network. They allow one to efficiently measure information quantities such as entanglement entropy and correlations. Based on these quantum-inspired measurements, we here introduce the QuIPS protocol for the TN application in ML, which exploits the information encoded and accessible in the TN in order to rank the input features according to their importance for the classification.
In information theory, entropy is a measure of the information content inherent in the possible outcomes of variables, such as a classification^{42,43,44}. In TNs such information content can be assessed by means of the entanglement entropy S, which describes the shared information between TN bipartitions. The entanglement S is measured via the Schmidt decomposition, that is, decomposing \(\left|\psi \right\rangle\) into two bipartitions \(\left|{\psi }_{\alpha }^{A}\right\rangle\) and \(\left|{\psi }_{\alpha }^{B}\right\rangle\)^{44} such that
\[ \left|\psi \right\rangle \,=\,\sum _{\alpha }{\lambda }_{\alpha }\left|{\psi }_{\alpha }^{A}\right\rangle \,\otimes\, \left|{\psi }_{\alpha }^{B}\right\rangle , \]
where λ_{α} are the Schmidt coefficients (non-zero, normalised singular values of the decomposition). The entanglement entropy is then defined as \(S\,=\,-{\sum }_{\alpha }{\lambda }_{\alpha }^{2}{\mathrm{ln}}\,{\lambda }_{\alpha }^{2}\). Consequently, the minimal entropy S = 0 is obtained only if we have one single non-zero singular value λ_{1} = 1. In this case, we can completely separate the two bipartitions as they share no information. On the contrary, higher S means that information is shared among the bipartitions.
In the ML context, the entropy can be interpreted as follows: if the features in one bipartition provide no valuable information for the classification task, the entropy is zero. On the contrary, S increases the more the information shared between the two bipartitions is exploited. This analysis can be used to optimise the learning procedure: whenever S = 0, the feature can be discarded with no loss of information for the classification. Thereby, a second model with fewer features and fewer tensors can be introduced. This second, more efficient model results in the same predictions in less time. On the contrary, a high bipartition entropy highlights which features, or combinations of features, are important for correct predictions.
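The entropy of a bipartition can be illustrated with a small dense example. The sketch below assumes the definition \(S\,=\,-{\sum }_{\alpha }{\lambda }_{\alpha }^{2}{\mathrm{ln}}\,{\lambda }_{\alpha }^{2}\) with normalised Schmidt coefficients and obtains them from an SVD of the reshaped state vector; the function name and the two test states are illustrative, and a TTN would instead read the singular values directly from its tensors.

```python
import numpy as np

def entanglement_entropy(psi, dim_a, dim_b):
    """Entropy S = -sum_a l_a^2 ln l_a^2 of the bipartition A|B of a dense state."""
    matrix = np.reshape(psi, (dim_a, dim_b))
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    p = singular_values ** 2 / np.sum(singular_values ** 2)  # normalised l_a^2
    p = p[p > 1e-12]                                         # drop zero coefficients
    return float(-np.sum(p * np.log(p)))

# Product state: the two parts share no information, so S = 0.
product = np.kron([1.0, 0.0], [0.6, 0.8])
# Maximally entangled two-spin state: S = ln 2.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

s_product = entanglement_entropy(product, 2, 2)
s_bell = entanglement_entropy(bell, 2, 2)
print(s_product, s_bell)
```

In the feature-selection reading given above, a site whose bipartition behaves like the first state could be dropped, whereas one behaving like the second carries information the classifier relies on.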
The second set of measurements we take into account are the correlation functions
for each pair of features (located at sites i and j) and for each class l. The correlations offer an insight into the possible relation among the information that the two features provide. In case of maximum correlation or anti-correlation among them for all classes l, the information of one of the features can be obtained from the other one (and vice versa), thus one can be neglected. In case of no correlation among them, the two features may provide fundamentally different information for the classification. The correlation analysis allows pinpointing whether two features give independent information. However, the correlation itself, in contrast to the entropy, does not tell whether this information is important for the classification.
In conclusion, QuIPS makes it possible to filter out the most valuable features for the classification based on the previous insights, namely: (i) a low entropy of a feature bipartition signals that one of the two bipartitions can be discarded with negligible loss of information, and (ii) if two features are completely (anti-)correlated, we can neglect at least one of them.
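A two-site correlation of this kind can be sketched on a small dense state. The example assumes, for illustration only, that the measured operator is the Pauli matrix σ^z on each of the two sites and that the expectation value is normalised by 〈ψ∣ψ〉; the helper names are hypothetical and the states are toy examples rather than trained classifiers.

```python
import numpy as np

SIGMA_Z = np.diag([1.0, -1.0])
IDENTITY = np.eye(2)

def two_site_operator(op, i, j, n_sites):
    """Kronecker product with `op` on sites i and j and the identity elsewhere."""
    full = np.ones((1, 1))
    for site in range(n_sites):
        full = np.kron(full, op if site in (i, j) else IDENTITY)
    return full

def correlation(psi, i, j):
    """Normalised expectation value <psi| sz_i sz_j |psi> / <psi|psi>."""
    n_sites = psi.size.bit_length() - 1          # psi has 2**n_sites entries
    op = two_site_operator(SIGMA_Z, i, j, n_sites)
    return float(np.real(psi.conj() @ op @ psi) / np.real(psi.conj() @ psi))

# (|000> + |111>)/sqrt(2): sites 0 and 1 are fully correlated.
ghz = np.zeros(8)
ghz[[0, 7]] = 1 / np.sqrt(2)
# (|010> + |100>)/sqrt(2): sites 0 and 1 are fully anti-correlated,
# while sites 0 and 2 are uncorrelated.
anti = np.zeros(8)
anti[[2, 4]] = 1 / np.sqrt(2)

c_ghz = correlation(ghz, 0, 1)
c_anti = correlation(anti, 0, 1)
c_none = correlation(anti, 0, 2)
print(c_ghz, c_anti, c_none)
```

Values of +1 or −1 for every class would flag a redundant pair of features, while a value near 0 only says the two features are independent, not that either of them matters.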
Nowadays, in particle physics, ML is widely used for the classification of jets, i.e. streams of particles produced by the fragmentation of quarks and gluons. The jet substructure can be exploited to solve such classification problems^{45}. ML techniques have been proposed to identify boosted, hadronically decaying top quarks^{46}, or to identify the jet charge^{47}. The ATLAS and CMS collaborations developed ML algorithms in order to identify jets generated by the fragmentation of b-quarks^{48,49,50}; a comprehensive review of ML techniques at the LHC can be found in ref. ^{51}.
The LHCb experiment in particular is, among others, dedicated to the study of the physics of b- and c-quarks produced in proton–proton collisions. Here, ML methods have been introduced recently for the discrimination between b- and c-jets by using Boosted Decision Tree classifiers^{52}. However, a crucial topic for the LHCb experiment, which is yet unexploited by ML, is the identification of the charge of a b-quark, i.e. discriminating between a b and a \(\bar{b}\). Such identification can be used in many physics measurements, and it is the core of the determination of the charge asymmetry in b-pair production, a quantity sensitive to physics beyond the Standard Model^{53}. Whenever produced in a scattering event, b-quarks have a short lifetime as free particles; indeed, they manifest themselves as bound states (hadrons) or as narrow cones of particles produced by the hadronization (jets). In the case of the LHCb experiment, the b-jets are detected by the apparatus located in the forward region of proton–proton collisions (see Fig. 1, left)^{54}. The LHCb detector includes a particle identification system that distinguishes different types of charged particles within the jet, and a high-precision tracking system able to measure the momentum of each particle^{55}. Still, the separation between b- and \(\bar{b}\)-jets is a highly difficult task because the b-quark fragmentation produces dozens of particles via non-perturbative Quantum Chromodynamics processes, resulting in non-trivial correlations between the jet particles and the original quark.
The algorithms used to identify the charge of the b-quarks based on information on the jets are called tagging methods. The tagging algorithm performance is typically quantified with the tagging power ϵ_{tag}, representing the effective fraction of jets that contribute to the statistical uncertainty in an asymmetry measurement^{56,57}. In particular, the tagging power ϵ_{tag} takes into account the efficiency ϵ_{eff} (the fraction of jets for which the classifier takes a decision) and the prediction accuracy a (the fraction of correctly classified jets among them) as follows:
\[ {\epsilon }_{tag}\,=\,{\epsilon }_{eff}\,\cdot\, {(2a\,-\,1)}^{2}. \]
To date, the muon tagging method gives the best performance on the b vs. \(\bar{b}\)-jet discrimination using the dataset collected in the LHC Run I^{58}: here, the muon with the highest momentum in the jet is selected, and its electric charge is used to decide on the b-quark charge.
For the ML application, we now formulate the identification of the b-quark charge in terms of a supervised learning problem. As described above, we implemented a TTN as a classifier and applied it to the LHCb problem, analysing its performance. Alongside, a DNN analysis is performed to the best of our capabilities, and both algorithms are compared with the muon tagging approach. Both the TTN and the DNN use as input for the supervised learning 16 features of the jet substructure from the official simulation data released by the LHCb collaboration^{33,34}. The 16 features are determined as follows: the muon with the highest p_{T} among all detected muons in the jet is selected, and the same is done for the highest-p_{T} kaon, pion, electron, and proton, resulting in 5 different selected particles. For each particle, three observables are considered: (i) the momentum relative to the jet axis (\({p}_{{\rm{T}}}^{rel}\)), (ii) the particle charge (q), and (iii) the distance between the particle and the jet axis (ΔR), for a total of 5 × 3 observables. If a particle type is not found in a jet, the related features are set to 0. The 16th feature is the total jet charge Q, defined as the weighted average of the particle charges q_{i} inside the jet, using the particles' \({p}_{{\rm{T}}}^{rel}\) as weights:
\[ Q\,=\,\frac{{\sum }_{i}{({p}_{{\rm{T}}}^{rel})}_{i}\,{q}_{i}}{{\sum }_{i}{({p}_{{\rm{T}}}^{rel})}_{i}}. \]
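The construction of the 16 input features can be sketched as follows. This is an illustration under stated assumptions rather than the LHCb or authors' code: the dictionary keys (`type`, `pt`, `pt_rel`, `charge`, `delta_r`), the ordering of the features, and the toy jet are all hypothetical, and the jet charge is computed as the \({p}_{{\rm{T}}}^{rel}\)-weighted average of the particle charges.

```python
import numpy as np

PARTICLE_TYPES = ("muon", "kaon", "pion", "electron", "proton")

def jet_charge(particles):
    """pT_rel-weighted average of the particle charges (assumes a non-empty jet)."""
    weights = np.array([p["pt_rel"] for p in particles], dtype=float)
    charges = np.array([p["charge"] for p in particles], dtype=float)
    return float(np.sum(weights * charges) / np.sum(weights))

def jet_features(particles):
    """Return 5 x 3 per-particle observables plus the total jet charge."""
    features = []
    for kind in PARTICLE_TYPES:
        candidates = [p for p in particles if p["type"] == kind]
        if candidates:
            best = max(candidates, key=lambda p: p["pt"])  # highest-pT particle of this type
            features += [best["pt_rel"], best["charge"], best["delta_r"]]
        else:
            features += [0.0, 0.0, 0.0]                    # particle type absent from the jet
    features.append(jet_charge(particles))
    return np.array(features, dtype=float)

# Hypothetical toy jet with one muon, two kaons and one pion.
jet = [
    {"type": "muon", "pt": 8.0, "pt_rel": 1.5, "charge": -1, "delta_r": 0.10},
    {"type": "kaon", "pt": 5.0, "pt_rel": 0.5, "charge": +1, "delta_r": 0.20},
    {"type": "kaon", "pt": 3.0, "pt_rel": 1.0, "charge": -1, "delta_r": 0.30},
    {"type": "pion", "pt": 2.0, "pt_rel": 1.0, "charge": +1, "delta_r": 0.25},
]
features = jet_features(jet)
print(features)
```

For this toy jet the total charge is (−1.5 + 0.5 − 1.0 + 1.0)/4 = −0.25, and the six electron and proton entries are zero because neither particle type is present.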
Results
Analysis framework
In the following, we present the jet classification performance of the TTN and the DNN applied to the LHCb dataset, and compare both ML techniques with the muon tagging approach. For the DNN we use an optimised network with three hidden layers of 96 nodes (see Supplementary Methods for details). Since we aim to compare the best possible performance of both approaches, we optimised the hyperparameters of both the TTN and the DNN. We split the dataset of about 700k events (samples) into two subsets: 60% of the samples are used in the training process, while the remaining 40% are used as a test set to evaluate and compare the different methods. For each event prediction after the training procedure, both ML models output the probability \({{\mathcal{P}}}_{b}\) to classify the event as a jet generated by a b- or a \(\bar{b}\)-quark. A threshold Δ around \({{\mathcal{P}}}_{b}\,=\,0.5\) is then defined, within which we classify the quark as unknown, in order to optimise the overall tagging power ϵ_{tag}.
Jet classification performance
We obtain similar performances in terms of the raw prediction accuracy applying both ML approaches after the training procedure on the test data: the TTN takes a decision on the charge of the quark in \({\epsilon }_{eff}^{\,\text{TTN}\,}\,=\,54.5 \%\) of the cases with an overall accuracy of a^{TTN} = 70.56%, while the DNN decides in \({\epsilon }_{eff}^{\,\text{DNN}\,}\,=\,55.3 \%\) of the samples with a^{DNN} = 70.49%. We checked both approaches for biases in physical quantities to ensure that both methods are able to properly capture the physical process behind the problem and thus that they can be used as valid tagging methods for LHCb events (see Supplementary Methods).
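To make the link between these numbers and the tagging power explicit, the short sketch below evaluates the usual form ϵ_{tag} = ϵ_{eff}(2a − 1)^2 for the efficiencies and accuracies quoted above. It assumes that the quoted accuracies refer to the jets for which a decision is taken; the resulting values are simple arithmetic on the quoted figures, not results reported in the text.

```python
def tagging_power(efficiency, accuracy):
    """Tagging power eps_tag = eps_eff * (2a - 1)^2."""
    return efficiency * (2.0 * accuracy - 1.0) ** 2

eps_ttn = tagging_power(0.545, 0.7056)   # TTN: eps_eff = 54.5 %, a = 70.56 %
eps_dnn = tagging_power(0.553, 0.7049)   # DNN: eps_eff = 55.3 %, a = 70.49 %
print(f"TTN: {eps_ttn:.4f}, DNN: {eps_dnn:.4f}")
```

Because of the quadratic factor, an accuracy of about 70% translates into roughly 17% of the decided jets contributing effectively, which is why efficiency and accuracy have to be optimised jointly.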
In Fig. 2a we present the tagging power of the different approaches as a function of the jet transverse momentum p_{T}. Evidently, both ML methods perform significantly better than the muon tagging approach for the complete range of jet transverse momentum p_{T}, while the TTN and DNN display comparable performances within the statistical uncertainties.
In Fig. 2c, d we present the histograms of the confidences for predicting a b-flavoured jet for all samples in the test dataset for the DNN and the TTN, respectively. Interestingly, even though both approaches give similar performances in terms of overall precision and tagging power, the prediction confidences are fundamentally different. For the DNN, we see a Gaussian-like distribution with, in general, not very high confidence for each prediction. Thus, we obtain fewer correct predictions with high confidence but, at the same time, fewer wrong predictions with high confidence compared to the TTN predictions. On the other hand, the TTN displays a flatter distribution including more predictions, both correct and incorrect, with higher confidence. Remarkably though, we can see peaks for extremely confident predictions (around 0 and around 1) for the TTN. These peaks can be traced back to the presence of the muon, whose charge is a well-defined predictor for a jet generated by a b-quark. The DNN lacks these confident predictions exploiting the muon charge. Further, we mention that using different cost functions for the DNN, i.e. the cross-entropy loss function and the Mean Squared Error, leads to similar results (see Supplementary Methods).
Finally, in Fig. 2b we present the Receiver Operating Characteristic (ROC) curves for the TTN and the DNN together with the line of no-discrimination, which represents a randomly guessing classifier: the two ROC curves for TTN and DNN are practically coincident, and the Area Under the Curve (AUC) for the two classifiers is almost the same (AUC^{TTN} = 0.689 and AUC^{DNN} = 0.690). The graph illustrates the similarity in the outputs between TTN and DNN despite the different confidence distributions. This is further confirmed by a Pearson correlation factor of r = 0.97 between the outputs of the two classifiers.
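For readers who want to reproduce this kind of comparison on their own classifier outputs, the sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one) and the Pearson factor with `numpy.corrcoef`. The scores and labels are made-up toy values, not the LHCb outputs, and the pairwise comparison used here is only practical for small samples.

```python
import numpy as np

def roc_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = scores[is_positive]
    neg = scores[~is_positive]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Toy outputs of two hypothetical classifiers on the same four samples.
is_b = np.array([True, False, True, False])
scores_a = np.array([0.9, 0.8, 0.4, 0.3])
scores_b = 0.5 + 0.5 * (scores_a - 0.5)   # same ranking, less extreme confidences

auc_a = roc_auc(scores_a, is_b)
auc_b = roc_auc(scores_b, is_b)
pearson_r = float(np.corrcoef(scores_a, scores_b)[0, 1])
print(auc_a, auc_b, pearson_r)
```

The toy example mirrors the observation above: two classifiers with different confidence distributions have the same AUC whenever they rank the samples identically.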
In conclusion, the two different approaches result in similar outcomes in terms of prediction performances. However, the underlying information used by the two discriminators is inherently different. For instance, the DNN predicts more conservatively, in the sense that the confidences for each prediction tend to be lower compared with the TTN. Additionally, the DNN does not exploit the presence of the muon as strongly as the TTN, even though the muon is a good predictor for the classification.
Exploiting insights into the data with TTN
As previously mentioned, the TTN analysis allows one to efficiently measure the captured correlations and the entanglement within the classifier. These measurements give insight into the learned data and can be exploited via QuIPS to identify the most important features typically used for the classifications.
In Fig. 3a we present the correlation analysis, allowing us to pinpoint whether two features give independent information. For both labels (\(l\,=\,b,\bar{b}\)) the results are very similar, thus in Fig. 3a we present only l = b. We see, among others, that the momenta \({p}_{{\rm{T}}}^{rel}\) and distance ΔR of all particles are correlated except for the kaon. Thus, this particle provides information to the classification which seems to be independent of the information gained from the other particles. However, the correlation itself does not tell whether this information is important for the classification. Thus, we compute the entanglement entropy S of each feature, as reported in Fig. 3b. Here, we conclude that the features with the highest information content are the total charge and the \({p}_{{\rm{T}}}^{rel}\) and distance ΔR of the kaon. Driven by these insights, we employ QuIPS to discard half of the features by selecting the eight most important ones: i.–iii. charge, momentum and distance of the muon, iv.–vi. charge, momentum and distance of the kaon, vii. charge of the pion and viii. total detected charge. To test the QuIPS performance, we compared it with an independent but more time-expensive analysis of the importance of the different particle types: the two approaches matched perfectly. Further, we studied two new models, one composed of the eight most important features proposed by QuIPS and, for comparison, another with the eight discarded features. In Fig. 3c we show the tagging power for the different analyses with the complete 16 sites (model M_{16}), the best 8 (B_{8}), the worst 8 (W_{8}) and the muon tagging. Remarkably, we see that the models M_{16} and B_{8} give comparable results, while the results of model W_{8} are even worse than those of the classical approach.
These performances are confirmed by the prediction accuracy of the different models: while less than 1% of accuracy is lost from M_{16} to B_{8}, the accuracy of the model W_{8} drastically drops to around 52%, that is, almost random predictions. Finally, in this particular run, the model B_{8} has been trained 4.7 times faster than model M_{16} and predicts 5.5 times faster as well (the actual speed-up depends on the bond dimension and other hyperparameters).
A critical point of interest in real-time ML applications is the prediction time. For example, in the LHCb Run 2 data-taking, the high-level software trigger takes a decision approximately every 1 μs^{55}, and shorter latencies are expected in future Runs. With the aid of the QuIPS protocol, we can efficiently reduce the prediction computational time while maintaining a comparably high prediction power. However, with TTNs, we can undertake an even further step to reduce the prediction time by reducing the bond dimension χ after the training procedure. Here, we introduce QIANO, which performs this truncation by means of the well-established SVD for TNs^{18,23,25} in a way that introduces the least infidelity possible. In other words, QIANO can adjust the bond dimension χ to achieve a targeted prediction time while keeping the prediction accuracy reasonably high. We stress that this can be done without relearning a new model, as would be the case with NNs.
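The elementary operation behind such a post-learning compression is the truncated SVD of a single bond. The sketch below is not the QIANO protocol itself: it only shows, for one randomly generated matrix standing in for a contracted pair of tensors, how keeping the χ largest singular values yields the best rank-χ approximation and how the discarded weight quantifies the error introduced. All names and sizes are illustrative.

```python
import numpy as np

def truncate_bond(matrix, chi):
    """Keep the chi largest singular values of `matrix`.
    Returns the truncated factors and the relative discarded weight."""
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    discarded_weight = float(np.sum(s[chi:] ** 2) / np.sum(s ** 2))
    return u[:, :chi], s[:chi], vh[:chi, :], discarded_weight

rng = np.random.default_rng(1)
# Stand-in for a trained bond: a rank-4 signal plus a small amount of noise.
signal = rng.normal(size=(32, 4)) @ rng.normal(size=(4, 32))
theta = signal + 1e-3 * rng.normal(size=(32, 32))

u, s, vh, lost = truncate_bond(theta, chi=4)
approx = (u * s) @ vh                      # rank-4 reconstruction of theta
rel_error = np.linalg.norm(theta - approx) ** 2 / np.linalg.norm(theta) ** 2
print(f"bond 32 -> 4, discarded weight {lost:.2e}, relative squared error {rel_error:.2e}")
```

When the singular values decay quickly, as in this toy case, the bond can be reduced strongly at almost no cost, which is the regime in which compression after training pays off.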
Finally, we apply QuIPS and QIANO to reduce the information in the TTN in an optimal way for a targeted balance between prediction time and accuracy. In Fig. 3d we show the tagging power obtained by taking the original TTN and truncating it to different bond dimensions χ. We can see that, even though we compress quite heavily, the overall tagging power does not change significantly. In fact, we only lose about 0.03% in the overall prediction accuracy, while at the same time improving the average prediction time from 345 to 37 μs (see Table 1). Applying the same idea to the model B_{8}, we can reduce the average prediction time effectively down to 19 μs on our machines, a performance compatible with the current real-time classification rate.
Discussion
We analysed an LHCb dataset for the classification of b- and \(\bar{b}\)-jets with two different ML approaches, a DNN and a TTN. We showed that we obtained with both techniques a tagging power about one order of magnitude higher than the classical muon tagging approach, which to date is the best published result for this classification problem. We pointed out that, even though both approaches result in similar tagging power, they treat the data very differently. In particular, the TTN effectively recognises the importance of the presence of the muon as a strong predictor for the jet classification. Here, we point out that we only used a conjugate gradient descent for the optimisation of our TTN classifier. Deploying more sophisticated optimisation procedures which have already been proven to work for Tensor Trains, such as stochastic gradient descent^{59} or Riemannian optimisation^{28}, may further improve the performance (in both time and accuracy) in future applications.
We further explained the crucial benefits of the TTN approach over DNNs, namely (i) the ability to efficiently measure correlations and the entanglement entropy, and (ii) the power of compressing the network while keeping a high amount of information (to some extent even lossless compression). We showed how the former quantum-inspired measurements help to set up a more efficient ML model: in particular, by introducing an information-based heuristic technique, we can establish the importance of single features based only on the information captured within the trained TTN classifier. Using this insight, we introduced QuIPS, which can significantly reduce the model complexity by discarding the least important features while maintaining high prediction accuracy. This selection of features based on their informational importance for the trained classifier is one major advantage of TNs, aimed at effectively decreasing training and prediction time. Regarding the latter benefit of the TTN, we introduced QIANO, which allows one to decrease the TTN prediction time by optimally decreasing its representative power based on information from the quantum entropy, introducing the least possible infidelity. In contrast to DNNs, with QIANO we do not need to set up a new model and train it from scratch, but we can optimise the network post-learning, adapting it to the specific conditions, e.g. the CPU used or the required prediction time of the final application.
Finally, we showed that using QuIPS and QIANO we can effectively compress the trained TTN to target a given prediction time. In particular, we decreased our prediction times from 345 to 19 μs. We stress that, while we only used one CPU for the predictions, in future applications we might obtain a speed-up of 10 to 100 times by parallelising the tensor contractions on GPUs^{60}. Thus, we are confident that it is possible to reach a MHz prediction rate while still obtaining results significantly better than the classical muon tagging approach. Here, we also point out that, for using this algorithm on the LHCb real-time data acquisition system, it would be necessary to develop custom electronic cards like FPGAs, or GPUs with an optimised architecture. Such solutions should be explored in the future.
Given the competitive performance of the presented TTN method in its application to high-energy physics, we envisage a multitude of possible future applications in high-energy experiments at CERN and in other fields of science. Future applications of our approach in the LHCb experiment may include the discrimination between b-jets, c-jets and light-flavour jets^{52}. A fast and efficient real-time identification of b- and c-jets can be the key point for several studies in high-energy physics, ranging from the search for the rare Higgs boson decay into two c-quarks, up to the search for new particles decaying into a pair of heavy-flavour quarks (\(b\bar{b}\) or \(c\bar{c}\)).
Methods
LHCb particle detection
LHCb is fully instrumented in the phase space region of proton–proton collisions defined by the pseudorapidity (η) range [2, 5], with η defined as
\[ \eta \,=\,-{\mathrm{ln}}\,\left[\tan \left(\frac{\theta }{2}\right)\right], \]
where θ is the angle between the particle momentum and the beam axis (see Fig. 4). The direction of a particle's momentum can be fully identified by η and by the azimuthal angle ϕ, defined as the angle in the plane transverse to the beam axis. The projection of the momentum onto this plane is called transverse momentum (p_{T}). The energy of charged and neutral particles is measured by electromagnetic and hadronic calorimeters. In the following, we work in natural units.
At LHCb, jets are reconstructed using a Particle Flow algorithm^{61} for the selection of charged and neutral particles and the anti-k_{t} algorithm^{62} for clusterization. The jet momentum is defined as the sum of the momenta of the particles that form the jet, while the jet axis is defined as the direction of the jet momentum. Most of the particles that form the jet are contained in a cone of radius \({{\Delta }}R\,=\,\sqrt{{({{\Delta }}\eta )}^{2}\,+\,{({{\Delta }}\phi )}^{2}}\,=\,0.5\), where Δη and Δϕ are respectively the pseudorapidity difference and the azimuthal angle difference between the particle momentum and the jet axis. For each particle inside the jet cone, the momentum relative to the jet axis (\({p}_{{\rm{T}}}^{rel}\)) is defined as the projection of the particle momentum onto the plane transverse to the jet axis.
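These kinematic definitions translate directly into a few lines of code. The sketch below assumes Cartesian momenta (p_x, p_y, p_z) with the beam along the z axis and uses the standard pseudorapidity η = −ln tan(θ/2); the function names and the example momenta are illustrative and not part of the LHCb software.

```python
import numpy as np

def pseudorapidity(p):
    """eta = -ln tan(theta / 2), with theta the angle to the beam (z) axis."""
    px, py, pz = p
    theta = np.arctan2(np.hypot(px, py), pz)
    return float(-np.log(np.tan(theta / 2.0)))

def azimuth(p):
    """Azimuthal angle phi in the plane transverse to the beam axis."""
    return float(np.arctan2(p[1], p[0]))

def delta_r(p, jet):
    """Distance sqrt(d_eta^2 + d_phi^2) between a particle and the jet axis."""
    d_eta = pseudorapidity(p) - pseudorapidity(jet)
    d_phi = azimuth(p) - azimuth(jet)
    d_phi = (d_phi + np.pi) % (2.0 * np.pi) - np.pi   # wrap into [-pi, pi)
    return float(np.hypot(d_eta, d_phi))

def pt_rel(p, jet):
    """Magnitude of the particle momentum component transverse to the jet axis."""
    p, jet = np.asarray(p, dtype=float), np.asarray(jet, dtype=float)
    return float(np.linalg.norm(np.cross(p, jet)) / np.linalg.norm(jet))

jet_momentum = (1.0, 0.0, 10.0)    # hypothetical forward jet
particle = (1.0, 0.5, 10.0)        # hypothetical particle close to the jet axis
eta_jet = pseudorapidity(jet_momentum)
dr = delta_r(particle, jet_momentum)
ptr = pt_rel(particle, jet_momentum)
print(eta_jet, dr, ptr)
```

The example jet has η close to 3, inside the instrumented range [2, 5], and the particle lies within the cone of radius 0.5 around the jet axis.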
LHCb dataset
Unlike in other ML performance analyses, the dataset used in this paper has been prepared specifically for this LHCb classification problem; therefore, baseline ML models and benchmarks on it do not exist. In particle physics, features are strongly dependent on the detector considered (i.e. different experiments may have a different response to the same physical object), and for this reason the training has been performed on a dataset that reproduces the LHCb experimental conditions, in order to obtain the optimal performance with this experiment.
The LHCb simulation datasets used for our analysis are produced with a Monte Carlo technique using the framework GAUSS^{63}, which makes use of PYTHIA 8^{64} to generate proton–proton interactions and jet fragmentation and uses EvtGen^{65} to simulate b-hadron decays. The GEANT4 software^{66,67} is used to simulate the detector response, and the signals are digitised and reconstructed using the LHCb analysis framework.
The dataset used contains b- and \(\bar{b}\)-jets produced in proton–proton collisions at a centre-of-mass energy of 13 TeV^{33,34}. Pairs of b-jets and \(\bar{b}\)-jets are selected by requiring a jet p_{T} greater than 20 GeV and η in the range [2.2, 4.2] for both jets.
Muon tagging
LHCb measured the \(b\bar{b}\) forward-central asymmetry using the dataset collected in the LHC Run I^{58} with the muon tagging approach: in this method, the muon with the highest momentum in the jet cone is selected, and its electric charge is used to decide on the b-quark charge. In fact, if this muon is produced in the original semileptonic decay of the b-hadron, its charge is totally correlated with the b-quark charge. To date, the muon tagging method gives the best performance on the b vs. \(\bar{b}\)-jet discrimination. Although this method can distinguish between b- and \(\bar{b}\)-quarks with good accuracy, its efficiency is low, as it is only applicable to jets where a muon is found and it is intrinsically limited by the b-hadron branching ratio in semileptonic decays. Additionally, the muon tagging may fail in some scenarios, where the selected muon is produced not by the decay of the b-hadron but in other decay processes. In these cases, the muon charge may not be completely correlated with the b-quark charge.
Machine learning approaches
We train the TTN and analyse the data with different bond dimensions χ. The auxiliary dimension χ controls the number of free parameters within the variational TTN ansatz. While the TTN is able to capture more information from the training data with increasing bonddimension χ, choosing χ too large may lead to overfitting and thus can worsen the results in the test set. For the DNN we use an optimised network with three hidden layers of 96 nodes (see Supplementary Methods for details).
For each event prediction, both methods output the probability \({{\mathcal{P}}}_{b}\) that a jet was generated by a b-quark rather than a \(\bar{b}\)-quark. This probability (i.e. the confidence of the classifier) is normalised in the following way: for values of probability \({{\mathcal{P}}}_{b}\,> \,0.5\) (\({{\mathcal{P}}}_{b}\,<\,0.5\)) a jet is classified as generated by a b-quark (\(\bar{b}\)-quark), with the confidence increasing towards \({{\mathcal{P}}}_{b}\,=\,1\) (\({{\mathcal{P}}}_{b}\,=\,0\)). A completely confident classifier therefore returns a probability distribution peaked at \({{\mathcal{P}}}_{b}\,=\,1\) and \({{\mathcal{P}}}_{b}\,=\,0\) for jets classified as generated by a b- and a \(\bar{b}\)-quark, respectively.
We introduce a window of total width Δ, centred symmetrically on the prediction confidence \({{\mathcal{P}}}_{b}\,=\,0.5\), within which we classify the event as unknown; the cuts therefore lie at 0.5 ± Δ/2. We optimise this cut on the predictions of the classifiers (i.e. their confidences) to maximise the tagging power of each method on the training samples. In the following analysis we find Δ^{TTN} = 0.40 (Δ^{DNN} = 0.20) for the TTN (DNN). Thereby, we predict for the TTN (DNN) a b-quark for confidences \({{\mathcal{P}}}_{b}\,> \,{C}^{\text{TTN}}\,=\,0.70\) (\({{\mathcal{P}}}_{b}\,> \,{C}^{\text{DNN}}\,=\,0.60\)), a \(\bar{b}\)-quark for confidences \({{\mathcal{P}}}_{b}\,<\,0.30\) (\({{\mathcal{P}}}_{b}\,<\,0.40\)), and make no prediction in the range in between (see Fig. 2c, d).
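The cut on the classifier confidence and its optimisation can be sketched as follows. The example assumes that Δ is the full width of the excluded window, which is consistent with the quoted values (Δ = 0.40 gives cuts at 0.30 and 0.70), and that the tagging power takes its usual form, efficiency × (2 × accuracy − 1)², where efficiency is the fraction of jets that receive a prediction and accuracy is the fraction of those predictions that are correct. Function names, the scan grid and the example numbers are hypothetical.

```python
# Minimal sketch of the confidence window and its optimisation.
def classify(p_b, delta):
    """Return 'b', 'bbar', or None (unknown) for a window of full width delta."""
    half = delta / 2.0
    if p_b > 0.5 + half:
        return "b"
    if p_b < 0.5 - half:
        return "bbar"
    return None


def tagging_power(probs, labels, delta):
    """Efficiency * (2 * accuracy - 1)**2 (assumed standard definition)."""
    decisions = [classify(p, delta) for p in probs]
    tagged = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    if not tagged:
        return 0.0
    efficiency = len(tagged) / len(probs)
    accuracy = sum(d == y for d, y in tagged) / len(tagged)
    return efficiency * (2.0 * accuracy - 1.0) ** 2


def best_delta(probs, labels, grid):
    """Pick the window width on the grid that maximises the tagging power."""
    return max(grid, key=lambda d: tagging_power(probs, labels, d))


# Invented classifier outputs and true labels, for illustration only.
probs = [0.9, 0.8, 0.55, 0.45, 0.2, 0.1]
labels = ["b", "b", "bbar", "b", "bbar", "bbar"]
chosen = best_delta(probs, labels, [0.0, 0.2, 0.7])
```

In this toy sample the two low-confidence predictions are wrong, so excluding them with Δ = 0.2 raises the tagging power even though the efficiency drops, while a much wider window discards correct predictions as well.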
Data availability
This paper is based on data obtained by the LHCb experiment, but the data are analysed independently and the analysis has not been reviewed by the LHCb collaboration. The data are available in the official LHCb open data repository^{33,34}.
Code availability
The software code used for the Deep Neural Network analysis is freely available on request from gianelle@pd.infn.it; it may be used for any private or commercial purpose, including modification and distribution, without any liabilities or warranties. The software code for the TTN analysis is currently not available for public use. For more information, please contact timo.felser@physik.uni-saarland.de.
References
Bishop, C. M. Neural Networks for Pattern Recognition (Oxford University Press, 1996).
Haykin, S. S. et al. Neural networks and learning machines, vol. 3 (Pearson, 2009).
Nielsen, M. A. Neural networks and deep learning (Determination press, 2015).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT press, 2016).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–44 (2015).
Silver, D. et al. Mastering the game of go with deep neural networks and tree search. Nature 529, 484 (2016).
Carleo, G. et al. Machine learning and the physical sciences. Rev. Mod. Phys. 91, 045002 (2019).
Deng, D.-L., Li, X. & Das Sarma, S. Machine learning topological states. Phys. Rev. B 96. https://doi.org/10.1103/physrevb.96.195145 (2017).
Nomura, Y., Darmawan, A. S., Yamaji, Y. & Imada, M. Restricted Boltzmann machine learning for solving strongly correlated quantum systems. Phys. Rev. B 96. https://doi.org/10.1103/physrevb.96.205152 (2017).
Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602–606 (2017).
Schuld, M. & Petruccione, F. Supervised Learning with Quantum Computers (Springer, 2018).
Das Sarma, S., Deng, D.-L. & Duan, L.-M. Machine learning meets quantum physics. Phys. Today 72, 48–54 (2019).
Stoudenmire, E. M. Learning relevant features of data with multiscale tensor networks. Quantum Sci. Technol. 3, 034003 (2018).
Collura, M., Dell’Anna, L., Felser, T. & Montangero, S. On the descriptive power of Neural-Networks as constrained Tensor Networks with exponentially large bond dimension. SciPost Phys. Core 4, 1 (2021).
Chen, J., Cheng, S., Xie, H., Wang, L. & Xiang, T. Equivalence of restricted Boltzmann machines and tensor network states. Phys. Rev. B 97. https://doi.org/10.1103/physrevb.97.085104 (2018).
Levine, Y., Yakira, D., Cohen, N. & Shashua, A. Deep learning and quantum entanglement: Fundamental connections with implications to network design. Preprint at https://arxiv.org/abs/1704.01552 (2017).
McCulloch, I. P. From density-matrix renormalization group to matrix product states. J. Stat. Mech. Theory Exp. 2007, P10014–P10014 (2007).
Schollwöck, U. The density-matrix renormalization group in the age of matrix product states. Ann. Phys. 326, 96–192 (2011).
Singh, S. & Vidal, G. Global symmetries in tensor network states: Symmetric tensors versus minimal bond dimension. Phys. Rev. B 88, 115147 (2013).
Dalmonte, M. & Montangero, S. Lattice gauge theory simulations in the quantum information era. Contemp. Phys. 57, 388–412 (2016).
Gerster, M., Rizzi, M., Silvi, P., Dalmonte, M. & Montangero, S. Fractional quantum Hall effect in the interacting Hofstadter model via tensor networks. Phys. Rev. B 96, 195123 (2017).
Bañuls, M. C. et al. Simulating lattice gauge theories within quantum technologies. Eur. Phys. J. D 74, https://doi.org/10.1140/epjd/e2020-100571-8 (2020).
Silvi, P. et al. The tensor networks anthology: simulation techniques for many-body quantum lattice systems. SciPost Phys. Lect. Notes 8 (2019).
Felser, T., Silvi, P., Collura, M. & Montangero, S. Two-dimensional quantum-link lattice quantum electrodynamics at finite density. Phys. Rev. X 10, 041040. https://doi.org/10.1103/PhysRevX.10.041040 (2020).
Montangero, S. Introduction to Tensor Network Methods (Springer International Publishing, 2018).
Bañuls, M. C. & Cichy, K. Review on novel methods for lattice gauge theories. Rep. Prog. Phys. 83, 024401 (2020).
Stoudenmire, E. & Schwab, D. J. Supervised learning with tensor networks. In (eds Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R.) Advances in Neural Information Processing Systems 29, 4799–4807. http://papers.nips.cc/paper/6211-supervised-learning-with-tensor-networks.pdf (Curran Associates, Inc., 2016).
Novikov, A., Trofimov, M. & Oseledets, I. Exponential machines. Preprint at https://arxiv.org/abs/1605.03795 (2016).
Khrulkov, V., Novikov, A. & Oseledets, I. Expressive power of recurrent neural networks. Preprint at https://arxiv.org/abs/1711.00811 (2017).
Liu, D. et al. Machine learning by unitary tensor network of hierarchical tree structure. New J. Phys. 21, 073059 (2019).
Roberts, C. et al. Tensornetwork: A library for physics and machine learning. Preprint at https://arxiv.org/abs/1905.01330 (2019).
Glasser, I., Pancotti, N. & Cirac, J. I. From probabilistic graphical models to generalized tensor networks for supervised learning. Preprint at https://arxiv.org/abs/1806.05964 (2018).
Aaij, R. et al. LHCb open data website. http://opendata.cern.ch/docs/about-lhcb (2020).
Aaij, R. et al. Simulated jet samples for quark flavour identification studies. https://doi.org/10.7483/OPENDATA.LHCB.N75T.TJPE (2020).
Tucker, L. R. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279–311 (1966).
Östlund, S. & Rommer, S. Thermodynamic limit of density matrix renormalization. Phys. Rev. Lett. 75, 3537–3540 (1995).
Oseledets, I. V. Tensor-train decomposition. SIAM J. Sci. Comput. 33, 2295–2317 (2011).
Gerster, M. et al. Unconstrained tree tensor network: an adaptive gauge picture for enhanced performance. Phys. Rev. B 90, 125154 (2014).
Hackbusch, W. & Kühn, S. A new scheme for the tensor representation. J. Fourier Anal. Appl. 15, 706–722 (2009).
Verstraete, F. & Cirac, J. I. Renormalization algorithms for quantum-many body systems in two and higher dimensions. Preprint at https://arxiv.org/abs/cond-mat/0407066 (2004).
Orús, R. A practical introduction to tensor networks: matrix product states and projected entangled pair states. Ann. Phys. 349, 117–158 (2014).
Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 623–656 (1948).
Nielsen, M. & Chuang, I. Quantum Computation and Quantum Information (Cambridge University Press, 2000).
Larkoski, A. J., Moult, I. & Nachman, B. Jet Substructure at the large hadron collider: a review of recent advances in theory and machine learning. Phys. Rept. 841, 1–63 (2020).
Butter, A. et al. The machine learning landscape of top taggers. SciPost Phys. 7, 014 (2019).
Fraser, K. & Schwartz, M. D. Jet charge and machine learning. JHEP 10, 093 (2018).
ATLAS Collaboration. Deep Sets based Neural Networks for Impact Parameter Flavour Tagging in ATLAS. Tech. Rep. ATL-PHYS-PUB-2020-014 (CERN, 2020).
ATLAS Collaboration. Identification of Jets Containing b-Hadrons with Recurrent Neural Networks at the ATLAS Experiment. Tech. Rep. ATL-PHYS-PUB-2017-003 (CERN, 2017).
CMS Collaboration. Performance of b tagging algorithms in proton-proton collisions at 13 TeV with Phase 1 CMS detector. Tech. Rep. CMS-DP-2018-033 (CERN, 2018).
Kogler, R. et al. Jet substructure at the large hadron collider: experimental review. Rev. Mod. Phys. 91, 045003 (2019).
Aaij, R. et al. Identification of beauty and charm quark jets at LHCb. JINST 10, P06013 (2015).
Murphy, C. W. Bottom-Quark Forward-Backward and Charge Asymmetries at Hadron Colliders. Phys. Rev. D92, 054003 (2015).
Alves Jr., A. A. et al. The LHCb detector at the LHC. JINST 3, S08005 (2008).
Aaij, R. et al. LHCb detector performance. Int. J. Mod. Phys. A30, 1530022 (2015).
D0 collaboration. Measurements of B_{d} mixing using opposite-side flavor tagging. Phys. Rev. D74, 112002 (2006).
Giurgiu, Gavril A. B Flavor tagging calibration and search for \({\rm{B}}_s^0\) oscillations in semileptonic decays with the cdf detector at fermilab. United States: N. p., 2005. https://doi.org/10.2172/879144.
Aaij, R. et al. First measurement of the charge asymmetry in beauty-quark pair production. Phys. Rev. Lett. 113, 082003 (2014).
Miller, J. Torchmps. https://github.com/jemisjoky/torchmps (2019).
Milsted, A., Ganahl, M., Leichenauer, S., Hidary, J. & Vidal, G. Tensornetwork on tensorflow: a spin chain application using tree tensor networks. Preprint at https://arxiv.org/abs/1905.01331 (2019).
ALEPH collaboration. ALEPH detector performance. Nucl. Instrum. Meth. A 360, 481 (1994).
Cacciari, M., Salam, G. P. & Soyez, G. The anti-k_{t} jet clustering algorithm. JHEP 04, 063 (2008).
Clemencic, M. et al. The LHCb simulation application, Gauss: design, evolution and experience. J. Phys. Conf. Ser. 331, 032023 (2011).
Sjöstrand, T., Mrenna, S. & Skands, P. A brief introduction to PYTHIA 8.1. Comput. Phys. Commun. 178, 852–867 (2008).
Lange, D. J. The EvtGen particle decay simulation package. Nucl. Instrum. Meth. A462, 152–155 (2001).
Agostinelli, S. et al. Geant4: A simulation toolkit. Nucl. Instrum. Meth. A506, 250 (2003).
Allison, J. et al. Geant4 developments and applications. IEEE Trans. Nucl. Sci. 53, 270 (2006).
Acknowledgements
We are very grateful to Konstantin Schmitz for valuable comments and discussions on the ML comparison. We thank Miles Stoudenmire for fruitful discussions on the application of the TNs ML code. This work is partially supported by the Italian PRIN 2017 and Fondazione CARIPARO, the Horizon 2020 research and innovation programme under grant agreement No. 817482 (Quantum Flagship—PASQuanS) and the QuantERA projects QTFLAG and QuantHEP. We acknowledge computational resources by CINECA and the Cloud Veneto. The work is partially supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) and the European Social Fund (ESF) as part of the EXIST programme under the project Tensor Solutions. We acknowledge the LHCb Collaboration for the valuable help and the Istituto Nazionale di Fisica Nucleare and the Department of Physics and Astronomy of the University of Padova for the support.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
Conceptualisation (T.F., D.L. and S.M.); data analysis (D.Z., L.S., T.F., M.T. and A.G.); funding acquisition (D.L., S.M.); investigation (D.Z., L.S., T.F. and S.M.); methodology (T.F., S.M.); tensor network software development (T.F. using private resources); validation (D.Z., L.S., T.F. and M.T.); writing—original draft (T.F., S.M.); writing—review and editing (all authors).
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Felser, T., Trenti, M., Sestini, L. et al. Quantum-inspired machine learning on high-energy physics data. npj Quantum Inf 7, 111 (2021). https://doi.org/10.1038/s41534-021-00443-w