
# Search for charged-lepton-flavour violation in Z-boson decays with the ATLAS detector

## Abstract

Leptons with essentially the same properties apart from their mass are grouped into three families (or flavours). The number of leptons of each flavour is conserved in interactions, but this is not imposed by fundamental principles. Since the formulation of the standard model of particle physics, the observation of flavour oscillations among neutrinos has shown that lepton flavour is not conserved in neutrino weak interactions. So far, there has been no experimental evidence that this also occurs in interactions between charged leptons. Such an observation would be a sign of undiscovered particles or a yet unknown type of interaction. Here the ATLAS experiment at the Large Hadron Collider at CERN reports a constraint on lepton-flavour-violating effects in weak interactions, searching for Z-boson decays into a τ lepton and another lepton of different flavour with opposite electric charge. The branching fractions for these decays are measured to be less than 8.1 × 10⁻⁶ (eτ) and 9.5 × 10⁻⁶ (μτ) at the 95% confidence level using 139 fb⁻¹ of proton–proton collision data at a centre-of-mass energy of $$\sqrt{s}=13\,{\rm{TeV}}$$ and 20.3 fb⁻¹ at $$\sqrt{s}=8\,{\rm{TeV}}.$$ These results supersede the limits from the Large Electron–Positron Collider experiments conducted more than two decades ago.

## Main

In the standard model of particle physics (SM)1,2,3,4, three lepton families (flavours) exist. The number of leptons of each family is conserved in weak interactions, and violation of this assumption is known as lepton flavour violation (LFV). No fundamental principles forbid LFV processes in the SM. The phenomenon of neutrino oscillations, where neutrinos (the neutral leptons) of one flavour transform into those of another5,6, indicates that neutrinos have mass and LFV processes do occur in nature. The mechanisms responsible for neutrinos acquiring mass and weak interactions violating lepton flavour conservation remain unknown. More experimental data are needed to constrain and guide possible generalizations of the SM explaining these phenomena.

An observation of LFV in charged-lepton interactions would be an unambiguous sign of new physics. In particular, decays of the Z boson into a light lepton (electron or muon) and a τ lepton at colliders are of experimental interest. The abundance of Z bosons produced at the Large Hadron Collider (LHC) offers the opportunity to strongly constrain potential LFV Z → eτ or Z → μτ interactions, in particular those proportional to the centre-of-mass energy of the decay7. Moreover, Z → eτ, μτ decays are less constrained by low-energy experiments than Z → eμ decays. According to current knowledge, these decays can occur via neutrino mixing but are too rare to be detected. Only 1 in approximately 10⁵⁴ Z bosons would decay into a muon and a τ lepton8. An observation of such decays would therefore require new theoretical explanations. For example, theories predicting the existence of heavy neutrinos9 provide a fundamental understanding of the observed tiny masses and large mixing of SM neutrinos. In such theories, up to 1 in 10⁵ Z bosons would be expected to undergo an LFV decay involving τ leptons. The ATLAS experiment can test the predictions of such theories by observing or setting ever more stringent constraints on LFV Z-boson decays.

Constraints on the branching fractions ($${\mathcal{B}}$$) of the LFV decays of the Z boson involving a τ lepton have been set by the experiments at the Large Electron–Positron Collider (LEP): $${\mathcal{B}}(Z\to e\tau )<9.8\times 1{0}^{-6}$$ (ref. 10) and $${\mathcal{B}}(Z\to \mu \tau )<1.2\times 1{0}^{-5}$$ (ref. 11) at the 95% confidence level (CL). The ATLAS experiment12 at the LHC has set constraints $${\mathcal{B}}(Z\to e\tau )<5.8\times 1{0}^{-5}$$ at 95% CL using part of the Run 2 data and $${\mathcal{B}}(Z\to \mu \tau )<1.3\times 1{0}^{-5}$$ using the Run 1 data and a subset of the Run 2 data13.

This work uses proton–proton (pp) collision data collected by the ATLAS experiment during Run 2 of the LHC, containing about eight billion Z-boson decays. Only events with a τ lepton that decays hadronically are considered. Neural network (NN) classifiers are used in a novel way for optimal discrimination of signal from background, and to achieve improved sensitivity in the search for LFV effects in the data using a binned maximum-likelihood fit. The result for the μτ channel is combined with a previous LHC Run 1 result to further improve the sensitivity. These results set constraints on LFV Z-boson decays involving τ leptons that supersede the most stringent ones set by the LEP experiments more than two decades ago.

## The ATLAS experiment and data sample

To record and analyse the LHC pp collisions, the ATLAS experiment uses a multipurpose particle detector with a forward–backward symmetric cylindrical geometry and a near 4π coverage in solid angle12,14,15. It consists of an inner tracking detector surrounded by a superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer.

The search uses the complete dataset of pp collision events at a centre-of-mass energy of $$\sqrt{s}=\,\text{13}\,\text{TeV}\,$$ collected by the ATLAS experiment during LHC Run 2. This dataset was recorded using single-electron or single-muon triggers16 and corresponds to an integrated luminosity of 139 fb⁻¹. For the search in the μτ channel, the results are combined with those of a previous similar search using pp collisions at $$\sqrt{s}=\,\text{8}\,\text{TeV}\,$$ during LHC Run 1, corresponding to an integrated luminosity of 20.3 fb⁻¹ (ref. 17).

Candidates for electrons18, muons19, jets20,21,22, and visible decay products of hadronic τ-lepton decays (τhad-vis)23,24 are reconstructed from energy deposits in the calorimeters and charged-particle tracks measured in the inner detector and the muon spectrometer.

Electron candidates are required to pass the Medium likelihood-based identification requirement18 and have pseudorapidity $$\left|\eta \right| <1.37$$ or $$1.52<\left|\eta \right| <2.47.$$ Muon candidates are required to pass the Medium identification requirement19 and have $$\left|\eta \right| <2.5.$$ Both the electron and muon candidates must have transverse momentum pT > 30 GeV and satisfy the Tight isolation requirement18,19. The lower bounds on the electron and muon transverse momenta are driven by the acceptance of the trigger selection.

Quark- or gluon-initiated particle showers (jets) are reconstructed using the anti-kt algorithm20,21 with the radius parameter R = 0.4. Jets fulfilling pT > 20 GeV and $$\left|\eta \right| <2.5$$ are identified as containing b hadrons if tagged by a dedicated multivariate algorithm25.

The τhad-vis candidates are reconstructed from jets with pT > 10 GeV, $$\left|\eta \right| <1.37$$ or $$1.52<\left|\eta \right| <2.5$$, and one or three associated tracks, referred to as ‘1-prong’ (1P) and ‘3-prong’ (3P), respectively. The τhad-vis identification is performed by a recurrent NN algorithm23, which uses calorimetric shower shapes and tracking information to discriminate true τhad-vis candidates from fake candidates from quark- or gluon-initiated jets. The τhad-vis candidates are required to pass the Tight identification selection, which has an efficiency of 60% (45%) for true 1P (3P) τhad-vis candidates, constant in the τhad-vis candidates’ transverse momentum, and a misidentification rate of 1 in 70 (700) for fake 1P (3P) candidates in dijet events. Dedicated multivariate algorithms are used to further discriminate between τhad-vis and electrons, and to calibrate the τhad-vis energy24. The τhad-vis candidate with the largest pT in each event is the selected candidate and is required to have pT > 25 GeV. Based on simulation, in Z → ℓτ decays, the τhad-vis candidate is expected to be correctly selected 98% of the time.

The missing transverse momentum $$({E}_{\,\text{T}}^{\text{miss}})$$ is calculated as the negative vectorial sum of the pT of all fully reconstructed and calibrated physics objects26,27. The calculation also includes inner detector tracks that originate from the vertex associated with the hard-scattering process but are not associated with any of the reconstructed objects. The missing transverse momentum is the best proxy for the total transverse momentum of undetected particles (in particular neutrinos) in an event.

## Search strategy

The Z → ℓτ → ℓτhad-vis + ν (ℓ = light lepton, e or μ) signal events have a number of key features that can be exploited to separate them from the SM background events. The signal events are characterized by their unique final state, which has exactly one ℓ and one τ lepton, with the invariant mass of the pair being compatible with the Z-boson mass. The ℓ and τ leptons carry opposite electric charges and are emitted approximately back to back in the plane transverse to the proton beam direction. Since the τ lepton is typically boosted due to the large difference between its mass and the mass of its parent Z boson, the neutrino from its decay is usually almost collinear with the visible τ-decay products. The neutrino escapes detection and is reconstructed as part of the $${E}_{\,\text{T}}^{\text{miss}\,}$$ of the event. In a signal event, this is the only major source of $${E}_{\text{T}}^{\text{miss}}.$$

The major background contributions for this search are as follows: lepton-flavour-conserving Z → ττ → ℓτhad-vis + 3ν decays, where one of the τ leptons decays leptonically and the other hadronically; Z → ℓℓ decays, where one of the light leptons is misidentified as the τhad-vis candidate; and events with a quark- or gluon-initiated jet that is misidentified as the τhad-vis candidate. The last of these are hereafter referred to as events with ‘fakes’ and are mostly W(→ ℓν) + jets events and purely hadronic multijet events. Other SM processes with a real ℓτhad-vis final state, such as decays of a top–antitop-quark pair, two gauge bosons or a Higgs boson, and those with a real τhad-vis and a jet misidentified as a light lepton, such as W(→ τν) + jets, are considered, although their contribution to the overall background is minor.

The signal and background events are separated by using a set of event selection criteria that help to define a signal-enhanced sample, referred to as the signal region (SR). The main selection criteria are listed in Table 1 and will be explained in the following. They are primarily based on the multiplicity of reconstructed particle candidates and the event topology, in particular the transverse masses (mT), which are defined as

$${m}_{\text{T}}(X,\,{E}_{\text{T}}^{\text{miss}})\equiv \sqrt{2 {p}_{\text{T}}(X) {E}_{\text{T}}^{\text{miss}} \left(1-\cos ({\phi }_{X}-{\phi }_{{E}_{\text{T}}^{\text{miss}}})\right)}$$
(1)

where X is either a light lepton or a τhad-vis candidate and ϕ denotes the azimuthal angle. A schematic of the expected signal and background topologies is shown in Extended Data Figs. 1 and 2.
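As a minimal sketch, Equation (1) can be evaluated directly from the transverse momentum and azimuthal angle of the object X and of the missing transverse momentum. The function and argument names are illustrative and not taken from the analysis software.

```python
import math

def transverse_mass(pt_x, phi_x, met, phi_met):
    """Transverse mass m_T(X, E_T^miss) of Equation (1).

    pt_x, met : transverse momenta in GeV
    phi_x, phi_met : azimuthal angles in radians
    """
    return math.sqrt(2.0 * pt_x * met * (1.0 - math.cos(phi_x - phi_met)))

# Back-to-back configuration: m_T = 2 * sqrt(pT * MET)
m_back_to_back = transverse_mass(40.0, 0.0, 40.0, math.pi)
# Collinear configuration: m_T = 0
m_collinear = transverse_mass(40.0, 1.2, 25.0, 1.2)
```

In a signal-like event the neutrino is nearly collinear with the τhad-vis candidate, so the transverse mass built from the τhad-vis candidate tends to be small, while the one built from the light lepton tends to be larger.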

NN binary classifiers are used to distinguish signal events from W + jets, Z → ττ and Z → ℓℓ background events. The NNs are trained on simulated events (see ‘Signal and background predictions’). Each individual NN is optimized to discriminate against a particular background process in a given decay channel. The input to these NNs is a mixture of low-level and high-level kinematic variables, as detailed in Methods. The low-level variables are the momentum components of the reconstructed ℓ, τhad-vis candidate and $${E}_{\,\text{T}}^{\text{miss}\,}$$. The high-level variables are kinematic properties of the ℓ–τhad-vis–$${E}_{\,\text{T}}^{\text{miss}\,}$$ system, such as the collinear mass mcoll(ℓ, τ), defined as the invariant mass of the ℓ–τhad-vis–ν system, where the ν is assumed to have a momentum that is equal in pT and ϕ to the measured $${E}_{\,\text{T}}^{\text{miss}\,}$$ and equal in η to the τhad-vis momentum. Given the finite training-sample size, the high-level variables help the NNs to converge faster, while the NNs exploit any residual correlations between the low-level variables.
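The collinear-mass definition in the paragraph above can be sketched as follows: a massless neutrino is built with the pT and ϕ of the missing transverse momentum and the η of the τhad-vis candidate, and the invariant mass of the three-body system is computed. The helper names and the (pT, η, ϕ, mass) tuple layout are assumptions made for illustration only.

```python
import math

def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) in GeV from collider coordinates."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)

def invariant_mass(*vectors):
    """Invariant mass of the sum of several (E, px, py, pz) four-vectors."""
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding

def collinear_mass(lepton, tau_vis, met, met_phi):
    """Collinear mass of the lepton, tau_had-vis and neutrino system.

    lepton, tau_vis : (pt, eta, phi, mass) tuples
    met, met_phi    : magnitude and azimuth of the missing transverse momentum
    """
    tau_eta = tau_vis[1]
    neutrino = four_vector(met, tau_eta, met_phi, 0.0)  # collinear assumption
    return invariant_mass(four_vector(*lepton), four_vector(*tau_vis), neutrino)

# Idealised central event: the neutrino completes the tau momentum.
m_coll = collinear_mass((45.0, 0.0, 0.0, 0.0), (30.0, 0.0, math.pi, 0.0),
                        15.0, math.pi)
```

For this idealised configuration the visible mass alone would underestimate the parent mass, whereas adding the collinear neutrino recovers the full energy of the system.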

The outputs from the individual NNs are numbers between 0 and 1 that reflect the probability for an event to be a signal event; they are combined into a final discriminant, hereafter referred to as the ‘combined NN output’. The combination is parameterized by weights associated with each individual NN and optimized for discrimination among various background processes distributed differently along the range of combined NN output values, as detailed in Methods. This allows the maximum-likelihood fit to determine the background contributions more precisely, which ultimately improves the sensitivity.

Events classified by the NNs as being background-like are excluded from the SR, as indicated in Table 1. The signal acceptance times selection efficiency in the SR is 2.7% for the eτ channel and 3.0% for the μτ channel, as determined from simulated signal samples.

## Signal and background predictions

Predictions for signal and background contributions to the event yield and kinematic distributions in the SR are based partly on Monte Carlo (MC) simulations and partly on the use of data in regions that are enriched in background events and do not overlap with the SR.

The signal events were simulated using PYTHIA 8 (ref. 28) with matrix elements calculated at leading order (LO) in the strong coupling constant (αs). Parameter values for initial-state radiation, multiparton interactions and beam remnants were set according to the A14 set of tuned parameters (tune)29 with the NNPDF 2.3 LO parton distribution function (PDF) set30. Nominal signal samples were generated with a parity-conserving Zℓτ vertex and unpolarized τ leptons. Scenarios where the decays are maximally parity-violating were considered by reweighting the simulated events using TAUSPINNER31. The event weight was computed as the probability of occurrence of each generated signal event, based on its kinematics, when assuming a specific τ-polarization state (left-handed or right-handed).

Background Z → ττ events were simulated with the SHERPA 2.2.1 (ref. 32) generator using the NNPDF 3.0 NNLO PDF set33 and next-to-leading-order (NLO) matrix elements for up to two partons, and LO matrix elements for up to four partons, calculated with the COMIX34 and OPENLOOPS35,36,37 libraries. They were matched with the SHERPA parton shower38 using the MEPS@NLO prescription39,40,41,42 with the default SHERPA tune. This set-up follows the recommendations of the SHERPA authors. Background Z → ℓℓ events were simulated using the POWHEG-BOX43 generator with NLO matrix elements and interfaced to PYTHIA 8 to model the parton showers, hadronization and underlying events. All MC samples include a detailed simulation of the ATLAS detector with GEANT 4 (ref. 44), to produce predictions that can be compared with the data. Furthermore, simulated inelastic pp collisions, generated with PYTHIA 8 using the NNPDF 2.3 LO PDF set and the A3 tune45, were overlaid on the hard-scattering events to model the additional pp collisions occurring in the same proton bunch crossing. All simulated events were processed using the same reconstruction algorithms as used for data.

The simulation of Z-boson production is improved with a correction derived from measurements in data. The simulated pT spectra of the Z boson are reweighted to match the unfolded distribution measured by ATLAS in ref. 46. This improves the predictions of signal, Z → ττ and Z → ℓℓ events, which are simulated at different orders in αs using different generators. It also reduces the uncertainties related to missing higher orders in αs.

The predicted overall yields of signal and Z → ττ events are determined by a binned maximum-likelihood fit to data (see next section) in the SR and in a control region enhanced in Z → ττ → ℓτhad-vis + 3ν events (CRZττ), using an unconstrained fit parameter, which accounts for theoretical uncertainties in the total Z-boson production cross-section (σZ), as well as the experimental uncertainties related to the acceptance of the common ℓτhad-vis final state. The selection criteria for events in the CRZττ are the same as those for events in the SR, except that events are required to have $${m}_{\text{T}}({\tau }_{{\rm{had}}{\rm{-}}{\rm{vis}}},{E}_{\text{T}}^{\text{miss}})>\text{35}\,\text{GeV}$$, $${m}_{\text{T}}(\ell ,{E}_{\text{T}}^{\text{miss}})<\text{40}\,\text{GeV}$$ and 70 GeV < mcoll(ℓ, τ) < 110 GeV.

A much smaller contribution to the total background originates from Z → ℓℓ events. Their predicted overall yield is based on the measured value of σZ (ref. 47) times the measured integrated luminosity. The uncertainty in the measurement is taken into account. The predicted rates of misidentifying electrons and muons in Z → ℓℓ events as 1P τhad-vis candidates are corrected using data in a region enriched in Z → ℓℓ events and orthogonal to the SR, where the last selection criterion in Table 1 is inverted and the outputs of the NN classifiers optimized to reject Z → ττ and W + jets events are required to be greater than 0.8. The corrections are derived as functions of pT and $$\left|\eta \right|$$ of the τhad-vis candidate. Statistical uncertainties in the correction are considered.

Events with fakes are one of the dominant contributions to the background, and are estimated from data using the ‘fake-factor method’, which is described in ref. 13. A fake factor is defined as the ratio of the number of events with a fake τhad-vis candidate passing the Tight τhad-vis identification requirement to those failing it. Four fake factors, one for each of the most important backgrounds with fakes (W(→ ℓν) + jets, multijet, Z(→ ℓℓ) + jets and $$t\bar{t}$$ events), are measured in data in four corresponding fakes-enriched regions. Each of these regions has a dominant contribution from one of the four targeted backgrounds with fakes. These regions do not overlap with any of the regions used in the final maximum-likelihood fit. The purity of the multijet-enriched region is improved by introducing two additional selection criteria: events must have a same-sign charged ℓτhad-vis pair and $${m}_{\text{T}}(\ell ,{E}_{\text{T}}^{\text{miss}})<\text{40}\,\text{GeV}$$. The fake factors are measured as functions of the transverse momentum of the τhad-vis candidate, separately for eτ and μτ events and for events with 1P or 3P τhad-vis candidates.

The number of events with a fake 1P or 3P τhad-vis candidate in a given pT range in the SR or CRZττ is estimated by the number of events with a τhad-vis candidate failing the Tight identification requirement, but otherwise satisfying all other selection criteria for that region, multiplied by an average of the fake factors. To calculate this average, the fake factors are summed with weights equal to the expected relative contribution of the corresponding background to the total yield of events in the region with the inverted identification requirement. This approach is used to model the kinematic properties of the events with fakes. The total predicted yields of these events in the SR and CRZττ are instead determined by a maximum-likelihood fit to data (see next section), separately for events with 1P and 3P τhad-vis candidates. This approach avoids the uncertainties associated with the simulation of events with fakes, and makes full use of the large amount of data collected.
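The weighted average described above can be sketched in a few lines. All numerical values and dictionary keys below are hypothetical placeholders chosen for illustration; they are not measured fake factors or yields from the analysis, and the sketch covers a single pT bin and prong multiplicity.

```python
def average_fake_factor(fake_factors, fractions):
    """Weighted average of per-process fake factors.

    fake_factors : {process: fake factor measured in its enriched region}
    fractions    : {process: expected relative contribution in the region
                    with the inverted tau identification requirement}
    """
    total = sum(fractions[p] for p in fake_factors)
    return sum(fake_factors[p] * fractions[p] for p in fake_factors) / total

def estimate_fake_yield(n_fail_id, fake_factors, fractions):
    """Events with a fake tau candidate passing Tight identification."""
    return n_fail_id * average_fake_factor(fake_factors, fractions)

# Hypothetical inputs for one pT bin (illustration only)
ff = {"Wjets": 0.10, "multijet": 0.20, "Zjets": 0.10, "ttbar": 0.10}
frac = {"Wjets": 0.70, "multijet": 0.20, "Zjets": 0.05, "ttbar": 0.05}
n_fakes = estimate_fake_yield(1000, ff, frac)
```

As stated in the text, this estimate provides the kinematic shape of the fakes contribution; the overall normalization is left to the maximum-likelihood fit.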

The remaining background processes (summarized as ‘Others’ in the following) have relatively small contributions in the SR and are estimated using simulations. They include events from the production and decays of top quarks, pairs of gauge bosons, the Higgs boson and W(→ τν) + jets. The yields of these events are normalized to their theoretical cross-sections.

The modelling of the estimated background is validated using events in regions where a possible contamination from signal is negligible. Especially important to the search is the modelling of the combined NN output distribution of Z → ττ events and events with fakes. This is validated by comparing the predicted distributions with data in the CRZττ and in a region similar to the SR, but with events that have same-sign charged ℓτhad-vis pairs, as shown in Fig. 1.

## Constraints on $${\mathcal{B}}(Z\to \ell \tau )$$

A statistical analysis of the selected events is performed to assess the presence of LFV signal events. The statistical analysis method is detailed in Methods. A simultaneous binned maximum-likelihood fit to the combined NN output in the SR and mcoll(ℓ, τ) in the CRZττ is used to constrain uncertainties in the models and extract evidence of a possible signal. The fit is performed independently for the eτ and μτ channels. Events with 1P and 3P τhad-vis candidates are considered separately. Hypothesis tests, in which a log-likelihood ratio is used as the test statistic, are used to assess the compatibility between the background and signal models and the data.

There are four unconstrained parameters in the fits: two of them determine the overall yields of events with fake 1P τhad-vis or 3P τhad-vis candidates, one determines σZ times the overall acceptance and reconstruction efficiency of the τhad-vis final state in Z → ττ and signal events, and the last one, the parameter of interest, determines the LFV branching fraction $${\mathcal{B}}(Z\to \ell \tau )$$ by modifying an arbitrary pre-fit signal yield.

Constrained parameters are also introduced to account for systematic uncertainties in the signal and background predictions. In the case of no significant deviations from the SM background, exclusion limits are set using the CLS method48.

Systematic uncertainties in this search include uncertainties in simulated events in the modelling of trigger, reconstruction, identification and isolation efficiencies, as well as energy calibrations and resolutions of reconstructed objects. Conservative theory uncertainties ranging between 4% and 20% are also assigned to the predicted cross-sections used for the estimation of minor background processes, which constitute only a small fraction of the background events in the SR. These uncertainties are not assigned to events with fakes or Z-boson decays, whose yields are determined from data. The dominant uncertainties in this search are those in the overall yields of events with fakes, which are predominantly of statistical nature, and those in the τhad-vis energy calibration, which are independent between 1P and 3P τhad-vis candidates and constrained by the fit of the collinear mass spectrum to the data in the CRZττ. A summary of the uncertainties and their impact on the best-fit LFV branching fraction is provided in Table 2, which shows that the sensitivity of the search is primarily limited by the available amount of data.

The best-fit expected and observed distributions of the combined NN output in the SR are shown in Fig. 2. The best-fit yields of Z → ττ and events with fakes are close to the pre-fit predicted values and are determined with a relative precision of 2–4%. Table 3 shows the best-fit expected background and signal yields and the observed number of events in the SR of the eτ and μτ channels with an additional requirement of a combined NN output > 0.7 to consider the most signal-like events.

The best-fit amount of Z → ℓτ signal corresponds to the branching fractions $${\mathcal{B}}(Z\to e\tau )=(-0.1\pm 3.5(\text{stat})\pm 2.3(\text{syst}))\times 1{0}^{-6}$$ and $${\mathcal{B}}(Z\to \mu \tau )=(4.3\pm 2.8(\text{stat})\pm 1.6(\text{syst}))\times 1{0}^{-6}$$. The positive best-fit value of $${\mathcal{B}}(Z\to \mu \tau )$$ is related to a small excess of observed events relative to the background-only hypothesis. This excess has a significance of 0.9 standard deviations when the events with 1P and 3P τhad-vis candidates are fitted simultaneously.

No statistically significant deviation from the SM prediction is observed, and upper limits on the LFV branching fractions are set. For the μτ channel, a more stringent upper limit is set by combining the likelihood function of the presented measurement and a similar measurement done with ATLAS Run 1 data17. Systematic uncertainties from the two measurements are considered uncorrelated in the combined likelihood function. The upper limits are shown in Table 4 for LFV decays with different assumptions about the τ-polarization state. In the scenario where the τ leptons are unpolarized, the observed upper limits at 95% CL on $${\mathcal{B}}(Z\to e\tau )$$ and $${\mathcal{B}}(Z\to \mu \tau )$$ are 8.1 × 10⁻⁶ and 9.5 × 10⁻⁶, respectively.

In conclusion, these results from the ATLAS experiment at the LHC set stringent constraints on LFV Z-boson decays involving τ leptons (using only their hadronic decays), superseding the most stringent ones set by the LEP experiments more than two decades ago. The precision of these results is mainly limited by statistical uncertainties.

## Methods

### Neural network classifiers

Several binary NN classifiers are trained for both the eτ and μτ channels to discriminate signal from the three major backgrounds: W + jets, Z → ττ and Z → ℓℓ. They are referred to as NNWjets, NNZττ and NNZℓℓ, respectively.

The NNs are trained using simulated events selected with the same criteria as those used in the SR, except that the cuts on mvis(ℓ, τ) and the NN output are omitted, and real τhad-vis candidates from Z → ℓτ and Z → ττ are required to pass less stringent identification criteria so as to increase the training sample size. For the Z → ℓℓ process, only events where the τhad-vis candidate is a misidentified light lepton are used. For the W + jets process, jets misidentified as τhad-vis are modelled by simulations. Different NNs are trained separately for eτ and μτ events as well as for events with 1P or 3P τhad-vis candidates. To increase the signal sample size, the Z → eτ and Z → μτ samples are combined and used for training in both channels, assuming equivalent event topology when exchanging e and μ. Owing to the low expected yield of Z → ℓℓ events with 3P τhad-vis candidates, no classifier is trained to discriminate them from background.

A mixture of low-level and high-level kinematic variables is used as input to the NNs. The low-level variables include the four-momenta of the reconstructed ℓ (refs. 18,19), τhad-vis candidate23,24 and $${E}_{\,\text{T}}^{\text{miss}\,}$$ (refs. 26,27). To remove known spatial symmetries for optimal training, the low-level variables are transformed in a way that preserves the Lorentz invariance before they are fed into the NNs. The transformation consists of the following steps: first, the ℓ–τhad-vis–$${E}_{\,\text{T}}^{\text{miss}\,}$$ system is boosted in a direction in the plane transverse to the beam line such that the total transverse momentum of the system is zero; the system is then rotated about the z axis such that the direction of $${E}_{\,\text{T}}^{\text{miss}\,}$$ is aligned with the x axis; if the τhad-vis candidate’s momentum has a negative z component, the entire system is rotated about the new x axis by 180°. After the transformation, only six independent non-vanishing components are left (the τhad-vis candidate is assumed to have zero rest mass), which are the inputs to the NNs.
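The three transformation steps above (transverse boost, rotation about z, conditional 180° rotation about x) can be sketched with plain four-vector arithmetic. This is an illustrative reading of the description, not the analysis code: the function names are hypothetical, four-vectors are (E, px, py, pz) tuples in GeV, and the missing transverse momentum is treated as a massless four-vector with pz = 0, which is an assumption.

```python
import math

def boost(vec, beta):
    """Boost (E, px, py, pz) into a frame moving with velocity beta."""
    e, px, py, pz = vec
    bx, by, bz = beta
    b2 = bx * bx + by * by + bz * bz
    if b2 == 0.0:
        return vec
    gamma = 1.0 / math.sqrt(1.0 - b2)
    bp = bx * px + by * py + bz * pz
    k = (gamma - 1.0) * bp / b2 - gamma * e
    return (gamma * (e - bp), px + k * bx, py + k * by, pz + k * bz)

def rotate_z(vec, angle):
    """Rotate the spatial part about the z axis by the given angle."""
    e, px, py, pz = vec
    c, s = math.cos(angle), math.sin(angle)
    return (e, c * px - s * py, s * px + c * py, pz)

def rotate_x_180(vec):
    """Rotate the spatial part about the x axis by 180 degrees."""
    e, px, py, pz = vec
    return (e, px, -py, -pz)

def standardise(lepton, tau_vis, met):
    """Apply the three steps to the lepton, tau_had-vis and MET vectors."""
    vecs = [lepton, tau_vis, met]
    e_tot = sum(v[0] for v in vecs)
    # Step 1: transverse boost that cancels the total transverse momentum
    beta = (sum(v[1] for v in vecs) / e_tot, sum(v[2] for v in vecs) / e_tot, 0.0)
    vecs = [boost(v, beta) for v in vecs]
    # Step 2: rotate about z so that the MET points along +x
    phi_met = math.atan2(vecs[2][2], vecs[2][1])
    vecs = [rotate_z(v, -phi_met) for v in vecs]
    # Step 3: flip the system if the tau candidate points to negative z
    if vecs[1][3] < 0.0:
        vecs = [rotate_x_180(v) for v in vecs]
    return vecs

# Hypothetical event (massless objects), for illustration only
lep_in = (50.0, 30.0, 40.0, 0.0)
tau_in = (math.sqrt(20.0**2 + 10.0**2 + 25.0**2), -20.0, -10.0, -25.0)
met_in = (15.0, -9.0, -12.0, 0.0)
lep_out, tau_out, met_out = standardise(lep_in, tau_in, met_in)
```

After these steps the total transverse momentum vanishes and the missing transverse momentum has no y or z component, which removes redundant degrees of freedom from the inputs.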

The high-level variables include Δα, which is a kinematic discriminant defined7 as

$${{\Delta }}\alpha =\frac{{m}_{Z}^{2}-{m}_{\tau }^{2}}{2p(\ell )\times p({\tau }_{{\rm{had}}{\rm{-}}{\rm{vis}}})}-\frac{{p}_{\text{T}}(\ell )}{{p}_{\text{T}}({\tau }_{{\rm{had}}{\rm{-}}{\rm{vis}}})}$$
(2)

where mZ and mτ are the nominal masses of the Z boson and τ lepton, respectively, and p denotes four-momentum. It is specifically defined to test the assumptions that the missing momentum of the event is collinear with the τhad-vis candidate, and that the τ and light leptons in the event are decay products of an on-shell Z boson. For a signal event, where these assumptions are approximately true, it is expected that Δα ≈ 0. Meanwhile, for an SM background event, the value is expected to deviate from zero in general. The other high-level variables are the invariant mass of the ℓ–τhad-vis system, the collinear mass mcoll(ℓ, τ) and the invariant mass of the light lepton and the track associated with the τhad-vis candidate (only used by the Z → ℓℓ classifier).
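A short numerical check of Equation (2) illustrates why Δα vanishes for an ideal signal event. The sketch reads the product p(ℓ) × p(τhad-vis) as the Minkowski product of the two four-momenta, which is an interpretation of the text; the mass values are the commonly quoted world averages and are not taken from this article.

```python
import math

M_Z = 91.1876    # GeV, world-average Z-boson mass (assumed value)
M_TAU = 1.77686  # GeV, world-average tau-lepton mass (assumed value)

def minkowski_dot(a, b):
    """Four-vector product with metric (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def transverse_momentum(v):
    return math.hypot(v[1], v[2])

def delta_alpha(lepton, tau_vis):
    """Delta-alpha of Equation (2) for (E, px, py, pz) four-vectors in GeV."""
    first = (M_Z**2 - M_TAU**2) / (2.0 * minkowski_dot(lepton, tau_vis))
    second = transverse_momentum(lepton) / transverse_momentum(tau_vis)
    return first - second

# Ideal signal: on-shell Z at rest decaying to a massless lepton and a tau,
# with the visible tau products carrying a fraction x of the tau four-momentum.
p = (M_Z**2 - M_TAU**2) / (2.0 * M_Z)
lepton = (p, p, 0.0, 0.0)
tau = (math.sqrt(p * p + M_TAU**2), -p, 0.0, 0.0)
x = 0.6
tau_vis = tuple(x * c for c in tau)
da_signal = delta_alpha(lepton, tau_vis)
```

In this idealised case both terms equal 1/x, so they cancel; any departure from the on-shell or collinear assumptions moves Δα away from zero.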

The training and optimization of the NN classifiers are performed using the open-source software package KERAS49. All of the NNs used in the analysis share the same architecture. Each NN consists of an input layer, two hidden layers of 20 nodes each, and an output layer with a single node. Each layer is fully connected to the neighbouring layers. Low-level and high-level variables are treated in the same way in the input layer. The hidden-layer nodes use rectified linear activation functions, while the output node uses a sigmoid activation function. The NNs are trained using the Adam algorithm50 to optimize the binary cross entropy. All the NNs are trained with a batch size of 256 and 200 epochs. The number of hidden layers, the number of nodes per layer, the training batch size and the learning rate parameter of the optimizer are simultaneously chosen by maximizing the area under the expected receiver operating characteristic curve. The optimization is done with a grid scan. No regularization or dropout is added, and no sign of overtraining is observed. For other configurations and hyperparameters that have not been mentioned, the default settings in KERAS 1.1.0 are used.
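The architecture described above (two fully connected hidden layers of 20 rectified-linear nodes and a single sigmoid output) can be sketched as a dependency-free forward pass. This is not the KERAS model used in the analysis: the weights below are random and untrained, the number of input variables is a hypothetical placeholder, and training with Adam on the binary cross entropy is not shown.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def build_network(n_inputs, rng, hidden_nodes=20):
    """Input -> 20 ReLU -> 20 ReLU -> 1 sigmoid, as described in the text."""
    return [init_layer(n_inputs, hidden_nodes, rng),
            init_layer(hidden_nodes, hidden_nodes, rng),
            init_layer(hidden_nodes, 1, rng)]

def predict(network, features):
    """Forward pass returning a score between 0 and 1."""
    values = list(features)
    for index, (weights, biases) in enumerate(network):
        activation = sigmoid if index == len(network) - 1 else relu
        values = [activation(sum(w * v for w, v in zip(row, values)) + b)
                  for row, b in zip(weights, biases)]
    return values[0]

def count_parameters(network):
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)

rng = random.Random(0)
N_INPUTS = 10  # hypothetical number of low- and high-level input variables
network = build_network(N_INPUTS, rng)
score = predict(network, [rng.uniform(-1.0, 1.0) for _ in range(N_INPUTS)])
```

With ten inputs such a network has only 661 trainable parameters, which is consistent with the statement that no regularization or dropout was needed.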

Each NN classifier outputs a score between 0 and 1 for each event, where a higher score indicates that the event is more signal-like. The output scores from the different classifiers are combined into the final discriminant (combined NN output) using the formula

$$\text{Combined NN output}=1-\sqrt{\frac{{\sum }_{b}{w}_{b}\times {\left(1-{\text{NN}}_{b}\,\text{output}\right)}^{2}}{{\sum }_{b}{w}_{b}}}$$
(3)

where b = Wjets, Zττ, Zℓℓ and wb are constant parameters. Output scores for events with 1P τhad-vis candidates and those with 3P τhad-vis candidates are combined separately. The summation is over Wjets, Zττ and Zℓℓ for events with 1P τhad-vis candidates, and only over Wjets and Zττ for events with 3P τhad-vis candidates.

By construction, the combined NN output ranges between 0 and 1, where 0 represents the most background-like (and 1 the most signal-like) event possible. The choice of values of wb affects the expected sensitivity of the analysis because they change how events from the different background processes are distributed along the range of combined NN output values, and thus impact the ability of the binned maximum-likelihood fit to determine the background contributions. The values of wb are chosen with a grid scan to minimize the expected upper limit on the branching fraction in the absence of a signal. The chosen values have the ratio wZττ:wWjets:wZℓℓ = 1.0:1.5:0.33. As could be expected, the optimized weights loosely reflect the impact of the uncertainties in the corresponding backgrounds on the determination of the signal branching fraction.
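Equation (3) with the quoted weight ratio 1.0:1.5:0.33 can be written as a small function. The dictionary keys are illustrative labels for the classifiers trained against Z → ττ, W + jets and Z → dilepton backgrounds; only the overall ratio of the weights matters because the formula normalizes by their sum.

```python
import math

# Weight ratio quoted in the text; keys are illustrative labels
WEIGHTS = {"Ztautau": 1.0, "Wjets": 1.5, "Zll": 0.33}

def combined_nn_output(nn_outputs, weights=WEIGHTS):
    """Combine individual NN scores according to Equation (3).

    nn_outputs : {classifier label: score in [0, 1]}; pass all three labels
                 for 1-prong events and only "Wjets" and "Ztautau" for
                 3-prong events.
    """
    numerator = sum(weights[b] * (1.0 - s) ** 2 for b, s in nn_outputs.items())
    denominator = sum(weights[b] for b in nn_outputs)
    return 1.0 - math.sqrt(numerator / denominator)

one_prong = combined_nn_output({"Ztautau": 0.9, "Wjets": 0.8, "Zll": 0.95})
three_prong = combined_nn_output({"Ztautau": 0.5, "Wjets": 0.5})
```

Because the deviation from 1 enters quadratically, a single low score from any classifier pulls the combined value down strongly, so an event must look signal-like to all classifiers to land near 1.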

### Maximum-likelihood fit

Binned maximum-likelihood fits are implemented using the statistical analysis packages ROOFIT51, ROOSTATS52 and HISTFITTER53. The expected binned distributions of the combined NN output in the SR and the collinear mass in the CRZττ are fit to data to extract evidence of signal events. Fitting the data in the CRZττ and in part of the SR with low combined NN output values (where no signal is expected) benefits the overall sensitivity to the signal, because it reduces the uncertainties of the background model in the high combined NN output value region, where most of the signal is expected. Owing to the differences in background composition, acceptance and efficiencies, regions with 1P and 3P τhad-vis candidates are fit separately but simultaneously. The probabilities of compatibility between the data and the background-only or background-plus-signal hypotheses are assessed using the modified frequentist CLS method48, and exclusion upper limits on $${\mathcal{B}}(Z\to \ell \tau )$$ are set by the inversion of these hypothesis tests.
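The idea behind the CLS method can be illustrated with a deliberately simplified single-bin counting experiment without systematic uncertainties. This is a swapped-in toy, not the binned profile-likelihood procedure implemented with ROOFIT, ROOSTATS and HISTFITTER in the analysis; the function names and inputs are hypothetical.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

def cls(n_obs, signal, background):
    """CLs = CL_{s+b} / CL_b for a single-bin counting experiment."""
    cl_sb = poisson_cdf(n_obs, signal + background)
    cl_b = poisson_cdf(n_obs, background)
    return cl_sb / cl_b

def upper_limit(n_obs, background, alpha=0.05, tol=1e-6):
    """Signal yield excluded at confidence level 1 - alpha (95% by default)."""
    lo, hi = 0.0, 1.0
    while cls(n_obs, hi, background) > alpha:  # bracket the crossing point
        hi *= 2.0
    while hi - lo > tol:                       # bisection: CLs falls with signal
        mid = 0.5 * (lo + hi)
        if cls(n_obs, mid, background) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

limit_no_background = upper_limit(0, 0.0)
limit_with_background = upper_limit(0, 3.0)
```

Dividing by CL_b protects against excluding signals to which the experiment has no sensitivity: with zero observed events the limit stays near three signal events regardless of the expected background.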

The background-plus-signal model has four unconstrained parameters before the fit. Two of the parameters determine the overall yields of events with 1P and 3P fakes separately. A third parameter determines σZ times the overall acceptance and reconstruction efficiency of events with a true τhad-vis final state. It is applied to the normalizations of both the signal and Z → ττ events to ensure that the same σZ times acceptance is estimated for both processes. The last unconstrained parameter is the parameter of interest μsig, which controls the normalization of signal events. Given the similarity between the signal and Z → ττ → τhad-vis + 3ν final states and that both processes are estimated with the same σZ and acceptance and efficiency corrections, this choice of parameterization reduces the impact on the determined $${\mathcal{B}}(Z\to \ell \tau )$$ from detector effects and uncertainties in predicting σZ. The parameter of interest represents

$${\mu }_{\text{sig}} =\frac{{\mathcal{B}}(Z\to \ell \tau )}{{{\mathcal{B}}}_{\text{pre-fit}}(Z\to \ell \tau )}$$
(4)

where $${{\mathcal{B}}}_{\text{pre-fit}}(Z\to \ell \tau )$$ is an arbitrary branching fraction to which the signal prediction is normalized. Although the physical branching fraction must be positive, the parameter of interest in the fit is not constrained to be positive.
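A minimal sketch of how the parameter of interest in equation (4) scales the signal template is given below. It assumes a toy three-bin model with a fixed background and a simple grid scan in place of a proper minimizer. The templates, the value of `B_PREFIT` and the function names are hypothetical and are not taken from the analysis.

```python
import math

# Hypothetical pre-fit branching fraction to which the signal template is normalized.
B_PREFIT = 1.0e-5


def nll(mu_sig, data, signal, background):
    """Binned Poisson negative log-likelihood (constant terms dropped)."""
    total = 0.0
    for n, s, b in zip(data, signal, background):
        nu = mu_sig * s + b  # expected yield in this bin
        if nu <= 0.0:
            return math.inf  # unphysical expectation, reject this mu_sig
        total += nu - n * math.log(nu)
    return total


def fit_mu(data, signal, background, lo=-1.0, hi=2.0, steps=3000):
    """Grid scan for the best-fit mu_sig; negative values are allowed."""
    best_mu, best_nll = lo, math.inf
    for i in range(steps + 1):
        mu = lo + (hi - lo) * i / steps
        value = nll(mu, data, signal, background)
        if value < best_nll:
            best_mu, best_nll = mu, value
    return best_mu


def to_branching_fraction(mu_sig):
    """Equation (4) rearranged: B = mu_sig * B_pre-fit."""
    return mu_sig * B_PREFIT


signal = [1.0, 4.0, 10.0]        # toy signal template at B = B_PREFIT
background = [100.0, 50.0, 20.0]  # toy background template
injected = [b + 0.5 * s for s, b in zip(signal, background)]
mu_hat = fit_mu(injected, signal, background)      # recovers 0.5
deficit = fit_mu([100.0, 50.0, 15.0], signal, background)  # negative
```

The second fit shows why the parameter is left unconstrained: a downward fluctuation in the signal-like bins is described by a negative μsig rather than being forced to zero.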

Systematic uncertainties are represented by nuisance parameters (NPs) with Gaussian constraints in the likelihood function. The impact of uncertainties on both the shape and normalization of the predicted distributions is taken into account. Uncertainties in the energy calibration and resolution as well as in the trigger, reconstruction, identification and isolation efficiencies of jets, electrons, muons, τhad-vis and $${E}_{\,\text{T}}^{\text{miss}\,}$$ are considered. Theoretical uncertainties in the production cross-sections affect only the predictions of the minor backgrounds, because the Z → ττ and signal yields are determined in the maximum-likelihood fit to data and the Z → ℓℓ yield is determined by the measured value of σZ. Statistical uncertainties in the determination of the fake factors are also considered. They are modelled by one NP per pT bin in which the fake factors are measured. As noted in the section ‘Constraints on $${\mathcal{B}}(Z\to \ell \tau )$$’, the dominant uncertainties in the analysis are the statistical uncertainties in determining how many events have fakes and the systematic uncertainties in the reconstructed τhad-vis energy.
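The role of a Gaussian-constrained NP can be sketched with a toy model. The example below assumes a single NP θ that shifts the background prediction in each bin by a fixed relative amount per unit of θ, with a unit-Gaussian constraint term added to the negative log-likelihood. The templates, the linear response and the function names are hypothetical simplifications, not the implementation used in the analysis.

```python
import math


def constrained_nll(mu_sig, theta, data, signal, background, rel_unc):
    """Binned Poisson NLL plus a unit-Gaussian constraint on one NP theta.

    theta shifts the background in bin i by rel_unc[i] * theta, so a
    bin-dependent rel_unc changes both shape and normalization.
    """
    total = 0.5 * theta ** 2  # -ln of the Gaussian constraint exp(-theta^2 / 2)
    for n, s, b, d in zip(data, signal, background, rel_unc):
        nu = mu_sig * s + b * (1.0 + d * theta)
        if nu <= 0.0:
            return math.inf
        total += nu - n * math.log(nu)
    return total


def profile_theta(mu_sig, data, signal, background, rel_unc,
                  lo=-5.0, hi=5.0, steps=2000):
    """Grid scan for the theta that minimizes the NLL at fixed mu_sig."""
    best_theta, best_nll = lo, math.inf
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        value = constrained_nll(mu_sig, theta, data, signal, background, rel_unc)
        if value < best_nll:
            best_theta, best_nll = theta, value
    return best_theta


signal = [1.0, 4.0, 10.0]
background = [100.0, 50.0, 20.0]
rel_unc = [0.10, 0.10, 0.10]  # hypothetical 10% background uncertainty per bin

# Data equal to the nominal background: theta stays at its nominal value 0.
theta_nominal = profile_theta(0.0, background, signal, background, rel_unc)
# Data 10% above the background: theta is pulled up, but the constraint
# keeps it below the value (1.0) that the data alone would prefer.
excess = [1.1 * b for b in background]
theta_pulled = profile_theta(0.0, excess, signal, background, rel_unc)
```

In this toy model the constraint term is what encodes the prior knowledge of the uncertainty: without it, θ would absorb any background mismodelling at no cost.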

For the μτ channel, the likelihood functions of the presented measurement and of the measurement in ref. 17 are combined. As the two measurements are statistically uncorrelated and the predictions are based on different methods, NPs in the individual likelihood functions are considered uncorrelated in the combination. The method of combination is the same as in ref. 13.

## Data availability

The experimental data that support the findings of this study are available in HEPData with the identifier https://www.hepdata.net/record/96390. The ATLAS software is available at the following link: https://gitlab.cern.ch/atlas/athena.

## References

1. Glashow, S. Partial-symmetries of weak interactions. Nucl. Phys. 22, 579–588 (1961).

2. Weinberg, S. A model of leptons. Phys. Rev. Lett. 19, 1264–1266 (1967).

3. Salam, A. Weak and electromagnetic interactions. Conf. Proc. C 680519, 367–377 (1968).

4. ’t Hooft, G. & Veltman, M. Regularization and renormalization of gauge fields. Nucl. Phys. B 44, 189–213 (1972).

5. Super-Kamiokande Collaboration Evidence for oscillation of atmospheric neutrinos. Phys. Rev. Lett. 81, 1562–1567 (1998).

6. SNO Collaboration Direct evidence for neutrino flavor transformation from neutral-current interactions in the Sudbury Neutrino Observatory. Phys. Rev. Lett. 89, 011301 (2002).

7. Davidson, S., Lacroix, S. & Verdier, P. LHC sensitivity to lepton flavour violating Z boson decays. J. High Energy Phys. 9, 92–103 (2012).

8. Illana, J. I., Jack, M. & Riemann, T. Predictions for Z → τμ and related reactions. In Proc. 2nd Joint ECFA/DESY Workshop on Physics and Detectors for a Linear Electron Positron Collider 490–524 (DESY, 1999); https://arxiv.org/pdf/hep-ph/0001273.pdf

9. Illana, J. I. & Riemann, T. Charged lepton flavor violation from massive neutrinos in Z decays. Phys. Rev. D 63, 053004 (2001).

10. OPAL Collaboration A search for lepton flavor violating Z0 decays. Z. Phys. C 67, 555–564 (1995).

11. DELPHI Collaboration Search for lepton flavor number violating Z0 decays. Z. Phys. C 73, 243–251 (1997).

12. ATLAS Collaboration The ATLAS experiment at the CERN Large Hadron Collider. J. Instrum. 3, S08003 (2008).

13. ATLAS Collaboration Search for lepton-flavor-violating decays of the Z boson into a τ lepton and a light lepton with the ATLAS detector. Phys. Rev. D 98, 092010 (2018).

14. ATLAS Collaboration ATLAS Insertable B-Layer Technical Design Report ATLAS-TDR-19; CERN-LHCC-2010-013 (CERN, 2010); https://cds.cern.ch/record/1291633

15. Abbott, B. et al. Production and integration of the ATLAS insertable B-layer. J. Instrum. 13, T05008 (2018).

16. ATLAS Collaboration Performance of the ATLAS trigger system in 2015. Eur. Phys. J. C 77, 317–393 (2017).

17. ATLAS Collaboration Search for lepton-flavour-violating decays of the Higgs and Z bosons with the ATLAS detector. Eur. Phys. J. C 77, 70–116 (2017).

18. ATLAS Collaboration Electron and photon performance measurements with the ATLAS detector using the 2015–2017 LHC proton–proton collision data. J. Instrum. 14, P12006 (2019).

19. ATLAS Collaboration Muon reconstruction performance of the ATLAS detector in proton–proton collision data at $$\sqrt{s}=13$$ TeV. Eur. Phys. J. C 76, 292–337 (2016).

20. Cacciari, M., Salam, G. P. & Soyez, G. The anti-kt jet clustering algorithm. J. High Energy Phys. 4, 63–75 (2008).

21. Cacciari, M., Salam, G. P. & Soyez, G. FastJet user manual. Eur. Phys. J. C 72, 1896–1965 (2012).

22. ATLAS Collaboration Jet energy scale and resolution measured in proton–proton collisions at $$\sqrt{s}=13$$ TeV with the ATLAS detector. Preprint at https://arxiv.org/pdf/2007.02645.pdf (2020).

23. ATLAS Collaboration Identification of Hadronic Tau Lepton Decays using Neural Networks in the ATLAS Experiment ATL-PHYS-PUB-2019-033 (CERN, 2019); https://cds.cern.ch/record/2688062

24. ATLAS Collaboration Measurement of the Tau Lepton Reconstruction and Identification Performance in the ATLAS Experiment using pp Collisions at $$\sqrt{s}=13$$ TeV ATLAS-CONF-2017-029 (CERN, 2017); https://cds.cern.ch/record/2261772

25. ATLAS Collaboration ATLAS b-jet identification performance and efficiency measurement with $$t\bar{t}$$ events in pp collisions at $$\sqrt{s}=13$$ TeV. Eur. Phys. J. C 79, 970–1006 (2019).

26. ATLAS Collaboration Performance of missing transverse momentum reconstruction with the ATLAS detector using proton–proton collisions at $$\sqrt{s}=13$$ TeV. Eur. Phys. J. C 78, 903–969 (2018).

27. ATLAS Collaboration $${E}_{{\rm{T}}}^{{\rm{miss}}}$$ Performance in the ATLAS Detector using 2015–2016 LHC pp Collisions ATLAS-CONF-2018-023 (CERN, 2018); https://cds.cern.ch/record/2625233

28. Sjöstrand, T. et al. An introduction to PYTHIA 8.2. Comput. Phys. Commun. 191, 159–177 (2015).

29. ATLAS Collaboration ATLAS Pythia 8 Tunes to 7-TeV Data ATL-PHYS-PUB-2014-021 (CERN, 2014); https://cds.cern.ch/record/1966419

30. Ball, R. D. et al. Parton distributions with LHC data. Nucl. Phys. B 867, 244–289 (2013).

31. Przedzinski, T., Richter-Was, E. & Was, Z. Documentation of TauSpinner algorithms: program for simulating spin effects in τ-lepton production at LHC. Eur. Phys. J. C 79, 91–113 (2019).

32. Bothmann, E. et al. Event generation with Sherpa 2.2. SciPost Phys. 7, 34–73 (2019).

33. Ball, R. D. et al. Parton distributions for the LHC run II. J. High Energy Phys. 4, 40–191 (2015).

34. Gleisberg, T. & Höche, S. Comix, a new matrix element generator. J. High Energy Phys. 12, 39–64 (2008).

35. Buccioni, F. et al. OpenLoops 2. Eur. Phys. J. C 79, 866–946 (2019).

36. Cascioli, F., Maierhöfer, P. & Pozzorini, S. Scattering amplitudes with open loops. Phys. Rev. Lett. 108, 111601 (2012).

37. Denner, A., Dittmaier, S. & Hofer, L. Collier: a Fortran-based complex one-loop library in extended regularizations. Comput. Phys. Commun. 212, 220–238 (2017).

38. Schumann, S. & Krauss, F. A parton shower algorithm based on Catani–Seymour dipole factorisation. J. High Energy Phys. 3, 38–97 (2008).

39. Höche, S., Krauss, F., Schönherr, M. & Siegert, F. A critical appraisal of NLO + PS matching methods. J. High Energy Phys. 9, 49–83 (2012).

40. Höche, S., Krauss, F., Schönherr, M. & Siegert, F. QCD matrix elements + parton showers. The NLO case. J. High Energy Phys. 4, 27–41 (2013).

41. Catani, S., Krauss, F., Kuhn, R. & Webber, B. R. QCD matrix elements + parton showers. J. High Energy Phys. 11, 63–84 (2001).

42. Höche, S., Krauss, F., Schumann, S. & Siegert, F. QCD matrix elements and truncated showers. J. High Energy Phys. 5, 53–94 (2009).

43. Alioli, S., Nason, P., Oleari, C. & Re, E. A general framework for implementing NLO calculations in shower Monte Carlo programs: the POWHEG BOX. J. High Energy Phys. 6, 43–99 (2010).

44. Agostinelli, S. et al. Geant4—a simulation toolkit. Nucl. Instrum. Methods Phys. Res. A 506, 250–303 (2003).

45. ATLAS Collaboration The Pythia 8 A3 Tune Description of ATLAS Minimum Bias and Inelastic Measurements Incorporating the Donnachie–Landshoff Diffractive Model ATL-PHYS-PUB-2016-017 (CERN, 2016); https://cds.cern.ch/record/2206965

46. ATLAS Collaboration Measurement of the transverse momentum distribution of Drell–Yan lepton pairs in proton–proton collisions at $$\sqrt{s}=13$$ TeV with the ATLAS detector. Eur. Phys. J. C 80, 616–644 (2020).

47. ATLAS Collaboration Measurement of W± and Z-boson production cross-sections in pp collisions at $$\sqrt{s}=13$$ TeV with the ATLAS detector. Phys. Lett. B 759, 601–621 (2016).

48. Read, A. L. Presentation of search results: the CLs technique. J. Phys. G 28, 2693–2704 (2002).

49. Chollet, F. et al. Keras (Keras, 2015); https://keras.io

50. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/pdf/1412.6980.pdf (2014).

51. Verkerke, W. & Kirkby, D. P. The RooFit toolkit for data modeling. Preprint at https://arxiv.org/pdf/physics/0306116.pdf (2003).

52. Moneta, L. et al. The RooStats Project. Preprint at https://arxiv.org/pdf/1009.1003.pdf (2010).

53. Baak, M. et al. HistFitter software framework for statistical data analysis. Eur. Phys. J. C 75, 153–173 (2015).

54. ATLAS Collaboration ATLAS Computing Acknowledgments ATL-SOFT-PUB-2020-001 (CERN, 2020); https://cds.cern.ch/record/2717821

## Acknowledgements

We acknowledge our late colleague, Olga Igonkina (1973–2019), for inspiring and driving this and other searches for lepton flavour violation within the ATLAS experiment. Her curiosity and intelligence remain an inspiration to the ATLAS Collaboration. We thank CERN for the very successful operation of the LHC, as well as the support staff from our institutions, without whom ATLAS could not be operated efficiently. We acknowledge the support of ANPCyT, Argentina; YerPhI, Armenia; ARC, Australia; BMWFW and FWF, Austria; ANAS, Azerbaijan; SSTC, Belarus; CNPq and FAPESP, Brazil; NSERC, NRC and CFI, Canada; CERN; ANID, Chile; CAS, MOST and NSFC, China; COLCIENCIAS, Colombia; MSMT CR, MPO CR and VSC CR, Czech Republic; DNRF and DNSRC, Denmark; IN2P3-CNRS and CEA-DRF/IRFU, France; SRNSFG, Georgia; BMBF, HGF and MPG, Germany; GSRT, Greece; RGC and Hong Kong SAR, China; ISF and Benoziyo Center, Israel; INFN, Italy; MEXT and JSPS, Japan; CNRST, Morocco; NWO, Netherlands; RCN, Norway; MNiSW and NCN, Poland; FCT, Portugal; MNE/IFA, Romania; MES of Russia and NRC KI, Russian Federation; JINR; MESTD, Serbia; MSSR, Slovakia; ARRS and MIZŠ, Slovenia; DST/NRF, South Africa; MICINN, Spain; SRC and Wallenberg Foundation, Sweden; SERI, SNSF and Cantons of Bern and Geneva, Switzerland; MOST, Taiwan; TAEK, Turkey; STFC, UK; DOE and NSF, United States. In addition, individual groups and members have received support from BCKDF, CANARIE, Compute Canada and CRC, Canada; ERC, ERDF, Horizon 2020, Marie Skłodowska-Curie Actions and COST, EU; Investissements d’Avenir Labex, Investissements d’Avenir Idex and ANR, France; DFG and AvH Foundation, Germany; Herakleitos, Thales and Aristeia programmes co-financed by EU-ESF and the Greek NSRF, Greece; BSF-NSF and GIF, Israel; La Caixa Banking Foundation, CERCA Programme Generalitat de Catalunya and PROMETEO and GenT Programmes Generalitat Valenciana, Spain; Göran Gustafssons Stiftelse, Sweden; The Royal Society and Leverhulme Trust, UK.
The crucial computing support from all WLCG partners is acknowledged gratefully, in particular from CERN, the ATLAS Tier-1 facilities at TRIUMF (Canada), NDGF (Denmark, Norway, Sweden), CC-IN2P3 (France), KIT/GridKA (Germany), INFN-CNAF (Italy), NL-T1 (Netherlands), PIC (Spain), ASGC (Taiwan), RAL (UK) and BNL (USA), the Tier-2 facilities worldwide and large non-WLCG resource providers. Major contributors of computing resources are listed in ref. 54.

## Author information

### Contributions

All authors contributed to the publication, being variously involved in the design and the construction of the detectors, in writing software, calibrating subsystems, operating the detectors and acquiring data, and finally analysing the processed data. The ATLAS Collaboration members discussed and approved the scientific results. The manuscript was prepared by a subgroup of authors appointed by the collaboration and subject to an internal collaboration-wide review process. All authors reviewed and approved the final version of the manuscript.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Peer review information Nature Physics thanks Michael Schmidt, Roger Wolf and Scott Yost for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Extended data

### Extended Data Fig. 1 Schematic representation of a typical event selected in the SR.

The topology as seen in the plane transverse to the beam line is shown. (a) A signal Z → ℓτ event. (b) A Z → ττ event. (c) A W + jets event. The green arrows represent reconstructed light leptons (ℓ). The blue triangles represent the τhad-vis candidates. The light blue dashed lines represent neutrinos that escape detection and are reconstructed as (part of) the missing transverse momentum of the event.

### Extended Data Fig. 2 Distributions of $${m}_{T}({\tau }_{{\rm{had-vis}}},{E}_{{\rm{T}}}^{{\rm{miss}}})$$ versus $${m}_{T}(\mu ,{E}_{{\rm{T}}}^{{\rm{miss}}})$$ of events selected in the SR.

(a) Simulated Z → μτ events. (b) Simulated Z → ττ events. (c) Events measured in data in regions where quark- or gluon-initiated jets are misidentified as τhad-vis candidates (events with jet → τhad-vis fakes, see ‘Signal and background predictions’ section) in the μτ final state. The colour map represents the fraction of events in each bin.

## Rights and permissions

ATLAS Collaboration. Search for charged-lepton-flavour violation in Z-boson decays with the ATLAS detector. Nat. Phys. 17, 819–825 (2021). https://doi.org/10.1038/s41567-021-01225-z
