Test of lepton universality in beauty-quark decays

The Standard Model of particle physics currently provides our best description of fundamental particles and their interactions. The theory predicts that the different charged leptons, the electron, muon and tau, have identical electroweak interaction strengths. Previous measurements have shown a wide range of particle decays are consistent with this principle of lepton universality. This article presents evidence for the breaking of lepton universality in beauty-quark decays, with a significance of 3.1 standard deviations, based on proton-proton collision data collected with the LHCb detector at CERN's Large Hadron Collider. The measurements are of processes in which a beauty meson transforms into a strange meson with the emission of either an electron and a positron, or a muon and an antimuon. If confirmed by future measurements, this violation of lepton universality would imply physics beyond the Standard Model, such as a new fundamental interaction between quarks and leptons.

The Standard Model (SM) of particle physics provides precise predictions for the properties and interactions of fundamental particles, which have been confirmed by numerous experiments since the inception of the model in the 1960's. However, it is clear that the model is incomplete. The SM is unable to explain cosmological observations of the dominance of matter over antimatter, the apparent dark-matter content of the Universe, or explain the patterns seen in the interaction strengths of the particles. Particle physicists have therefore been searching for 'new physics' -the new particles and interactions that can explain the SM's shortcomings.
One method to search for new physics is to compare measurements of the properties of hadron decays, where hadrons are bound states of quarks, with their SM predictions. Measurable quantities can be predicted precisely in the decays of a charged beauty hadron, B + , into a charged kaon, K + , and two charged leptons, + − . The B + hadron contains a beauty antiquark, b, and the K + a strange antiquark, s, such that at the quark level the decay involves a b → s transition. Quantum field theory allows such a process to be mediated by virtual particles that can have a physical mass larger than the energy available in the interaction. In the SM description of such processes, these virtual particles include the electroweak-force carriers, the γ, W ± and Z 0 bosons, and the top quark (see Fig. 1, left). Such decays are highly suppressed [1] and the fraction of B + hadrons that decay into this final state (the branching fraction, B) is of the order of 10 −6 [2].
A distinctive feature of the SM is that the different leptons, electron (e − ), muon (µ − ) and tau (τ − ), have the same interaction strengths. This is known as 'lepton universality'. The only exception to this is due to the Higgs field, since the lepton-Higgs interaction strength gives rise to the differing lepton masses m τ > m µ > m e . The suppression of b → s transitions is understood in terms of the fundamental symmetries on which the SM is built. Conversely, lepton universality is an accidental symmetry of the SM, which is not a consequence of any axiom of the theory. Extensions to the SM that aim to address many of its shortfalls predict new virtual particles that could contribute to b → s transitions (see Fig. 1, right) and could have nonuniversal interactions, hence giving branching fractions of B + → K + + − decays with different leptons that differ from the SM predictions. Whenever a process is specified in this article, the inclusion of the charge-conjugate mode is implied. Calculation of the SM predictions for the branching fractions of B + → K + µ + µ − and B + → K + e + e − decays is complicated by the strong nuclear force that binds together the quarks into hadrons, as described by quantum chromodynamics (QCD). The large interaction strengths preclude predictions of QCD effects with the perturbation techniques used to compute the electroweak force amplitudes and only approximate calculations are presently possible. However, the strong force does not couple directly to leptons and hence its effect on the B + → K + µ + µ − and B + → K + e + e − decays is identical. The ratio between the branching fractions of these decays is therefore predicted with O(1%) precision [3][4][5][6][7][8]. Due to the small masses of both electrons and muons compared to that of b quarks, this ratio is predicted to be close to unity, except where the value of the dilepton invariant mass-squared (q 2 ) significantly restricts the phase space available to form the two leptons. Similar considerations apply to decays with other B hadrons, B → Hµ + µ − and B → He + e − , where B = B + , B 0 , B 0 s or Λ 0 b ; and H can be e.g. an excited kaon, K * 0 , or a combination of particles such as a proton and charged kaon, pK − . The ratio of branching fractions, R H [9,10], is defined in the dilepton mass-squared range q 2 min < q 2 < q 2 max as For decays with H = K + and H = K * 0 such ratios, denoted R K and R K * 0 , respectively, have previously been measured by the LHCb [11,12], Belle [13,14] and BaBar [15] collaborations. For R K the LHCb measurements are in the region 1.1 < q 2 < 6.0 GeV 2 /c 4 , whereas for R K * 0 the regions are 0.045 < q 2 < 1.1 GeV 2 /c 4 and 1.1 < q 2 < 6.0 GeV 2 /c 4 . These ratios have been determined to be 2.1-2.5 standard deviations below their respective SM expectations [3][4][5][6][7][16][17][18][19][20][21][22]. The analogous ratio has also been measured for Λ 0 b decays with H = pK − and is compatible with unity at the level of one standard deviation [23].
In this article, a measurement of the R K ratio is presented based on proton-proton collision data collected with the LHCb detector at CERN's Large Hadron Collider (see Methods). The data were recorded during the years 2011, 2012 and 2015-2018, in which the centre-of-mass energy of the collisions was 7, 8 and 13 TeV, and correspond to an integrated luminosity of 9 fb −1 . Compared to the previous LHCb R K result [11], the experimental method is essentially identical but the analysis uses an additional 4 fb −1 of data collected in 2017 and 2018. The results supersede those of the previous LHCb analysis.
The analysis strategy aims to reduce systematic uncertainties induced in modelling the markedly different reconstruction of decays with muons in the final state, compared to decays with electrons. These differences arise due to the significant bremsstrahlung radiation emitted by the electrons and the different detector subsystems that are used to identify electron and muon candidates (see Methods). The major challenge of the measurement is then correcting for the efficiency of the selection requirements used to isolate signal candidates and reduce background. In order to avoid unconscious bias, the analysis procedure was developed and the cross-checks described below performed before the result for R K was examined.
In addition to the process discussed above, the K + + − final state is produced via a B + → X qq K + decay, where X qq is a bound state (meson) such as the J/ψ . The J/ψ meson consists of a charm quark and antiquark, cc, and is produced resonantly at q 2 = 9.59 GeV 2 /c 4 . This 'charmonium' resonance subsequently decays into two leptons, J/ψ → + − . The B + → J/ψ (→ + − )K + decays are not suppressed and hence have a branching fraction orders of magnitude larger than that of B + → K + + − decays. These two processes are separated by applying a requirement on q 2 . The 1.1 < q 2 < 6.0 GeV 2 /c 4 region used to select B + → K + + − decays is chosen to reduce the pollution from the J/ψ resonance and the high-q 2 region that contains contributions from further excited charmonium resonances, such as the ψ(2S) and ψ(3770) states, and from lighter ss resonances, such as the φ(1020) meson. In the remainder of this article, the notation B + → K + + − is used to denote only decays with 1.1 < q 2 < 6.0 GeV 2 /c 4 , which are referred to as nonresonant, whereas B + → J/ψ (→ + − )K + decays are denoted resonant.
To help overcome the challenge of modelling precisely the different electron and muon reconstruction efficiencies, the branching fractions of B + → K + + − decays are measured relative to those of B + → J/ψ K + decays [130]. Since the J/ψ → + − branching fractions are known to respect lepton universality to within 0.4% [2,131], the R K ratio is determined via the double ratio of branching fractions In this equation, each branching fraction can be replaced by the corresponding event yield divided by the appropriate overall detection efficiency (see Methods), as all other factors needed to determine each branching fraction individually cancel out. The efficiency of the nonresonant B + → K + e + e − decay therefore needs to be known only relative to that of the resonant B + → J/ψ (→ e + e − )K + decay, rather than relative to the B + → K + µ + µ − decay. As the detector signature of each resonant decay is similar to that of its corresponding nonresonant decay, systematic uncertainties that would otherwise dominate the calculation of these efficiencies are suppressed. The yields observed in these four decay modes and the ratios of efficiencies determined from simulated events then enable an R K measurement with statistically dominated uncertainties. As detailed below, percent-level control of the efficiencies is verified with a direct comparison of the B + → J/ψ (→ e + e − )K + and B + → J/ψ (→ µ + µ − )K + branching fractions in the ratio r J/ψ = B(B + → J/ψ (→ µ + µ − )K + )/B(B + → J/ψ (→ e + e − )K + ), which does not benefit from the same cancellation of systematic effects. Candidate B + → K + + − decays are found by combining the reconstructed trajectory (track) of a particle identified as a charged kaon, together with the tracks from a pair of well-reconstructed oppositely charged particles identified as either electrons or muons. The particles are required to originate from a common vertex, displaced from the proton-proton interaction point, with good vertex-fit quality. The techniques used to identify the different particles and to form B + candidates are described in Methods.
The invariant mass of the final state particles, m(K + + − ), is used to discriminate between signal and background contributions, with the signal expected to accumulate around the known mass of the B + meson. Background originates from particles selected from multiple hadron decays, referred to as combinatorial background, and from specific decays of B hadrons. The latter also tend to accumulate around specific values of m(K + + − ). For the muon modes, the residual background is combinatorial and, for the resonant mode, there is an additional contribution from B + → J/ψ π + decays with a pion misidentified as a kaon. For the electron modes, in addition to combinatorial background, other specific background decays contribute significantly in the signal region. The dominant such background for the nonresonant and resonant modes comes from partially reconstructed B (0,+) → K + π (−,0) e + e − and B (0,+) → J/ψ (→ e + e − )K + π (−,0) decays, respectively, where the pion is not included in the B + candidate. Decays of the form B + → D 0 (→ K + e − ν e )e + ν e also contribute at the level of O(1%) of the B + → K + e + e − signal; and there is also a contribution from B + → J/ψ (→ e + e − )K + decays, where a photon is emitted but not reconstructed. The kinematic correlation between m(K + e + e − ) and q 2 means that, irrespective of misreconstruction effects, the latter background can only populate the m(K + e + e − ) region well below the signal peak.
After the application of the selection requirements, the resonant and nonresonant decays are clearly visible in the mass distributions (see Fig. 2). The yields in the two B + → K + + − and two B + → J/ψ (→ + − )K + decay modes are determined by performing unbinned extended maximum-likelihood fits to these distributions (see Methods). For the nonresonant candidates, the m(K + e + e − ) and m(K + µ + µ − ) distributions are fitted with a likelihood function that has the B + → K + µ + µ − yield and R K as fit parameters and the resonant decay-mode yields incorporated as Gaussian-constraint terms. The resonant yields are determined from separate fits to the mass, m J/ψ (K + + − ), formed by kinematically constraining the dilepton system to the known J/ψ mass [2] and thereby improving the mass resolution.
Simulated events are used to derive the two ratios of efficiencies needed to form R K using Eq. (2). Control channels are used to calibrate the simulation in order to correct for the imperfect modelling of the B + production kinematics and various aspects of the detector response. The overall effect of these corrections on the measured value of R K is a relative shift of (+3 ± 1)%. When compared with the 20% shift that these corrections induce in the measurement of r J/ψ , this demonstrates the robustness of the double-ratio method in suppressing systematic biases that affect the resonant and nonresonant decay modes similarly.
The systematic uncertainty (see Methods) from the choice of signal and background mass-shape models in the fits is estimated by fitting pseudoexperiments with alternative models that still describe the data well. The effect on R K is at the 1% level. A comparable uncertainty arises from the limited size of the calibration samples, with negligible contributions from the calibration of the B + production kinematics and modelling of the selection and particle-identification efficiencies. Systematic uncertainties that affect the ratios of efficiencies influence the measured value of R K and are taken into account using constraints on the efficiency values. Correlations between different categories of selected events and data-taking periods are taken into account in these constraints. The combined   statistical and systematic uncertainty is then determined by scanning the profile-likelihood and the statistical contribution to the uncertainty is isolated by repeating the scan with the efficiencies fixed to their fitted values.
The determination of the r J/ψ ratio requires control of the relative selection efficiencies for the resonant electron and muon modes, and does not therefore benefit from the cancellation of systematic effects in the double ratio used to measure R K . Given the scale of the corrections required, comparison of r J/ψ with unity is a stringent cross check of the experimental procedure. In addition, if the simulation is correctly calibrated, the measured r J/ψ value will not depend on any variable. The r J/ψ ratio is therefore also computed as a function of different kinematic variables. Even though the nonresonant and resonant samples are mutually exclusive as a function of q 2 , there is significant overlap between them in the quantities on which the efficiency depends, such as the laboratory-frame momenta of the final-state particles, or the opening angle between the two leptons. This is because a given set of values for the final-state particles' momenta and angles in the B + rest frame will result in a distribution of such values when transformed to the laboratory frame.
The value of r J/ψ is measured to be 0.981 ± 0.020. This uncertainty includes both statistical and systematic effects, where the latter dominate. The consistency of this ratio with unity demonstrates control of the efficiencies well in excess of that needed for the determination of R K . In the measurement of the r J/ψ ratio, the systematic uncertainty is dominated by the imperfect modelling of the B + production kinematics and the modelling of selection requirements, which have a negligible impact on the R K measurement. No significant trend is observed in the differential determination of r J/ψ as a function of any considered variable. An example distribution, with r J/ψ determined as a function of B + momentum component transverse to the beam direction, p T , is shown in Fig. 3. Assuming the observed r J/ψ variation in such distributions reflects genuine mismodelling of the efficiencies, rather than statistical fluctuations, and taking into account the spectrum of the relevant variables in the nonresonant decay modes, a total shift on R K is computed for each of the variables examined. The resulting variations are typically at the permille level and hence well within the estimated systematic uncertainty on R K . Similarly, computations of the r J/ψ ratio in bins of two kinematic variables also do not show any trend and are consistent with the systematic uncertainties assigned on the R K measurement.
In addition to B + → J/ψ K + decays, clear signals are observed from B + → ψ(2S)K + decays. The double ratio of branching fractions, R ψ(2S) , defined by provides an independent validation of the double-ratio analysis procedure and further tests the control of the efficiencies. This double ratio is expected to be close to unity [2] and is determined to be 0.997 ± 0.011, where the uncertainty includes both statistical and systematic effects, the former of which dominates. This can be interpreted as a world-leading test of lepton flavour universality in ψ(2S) → + − decays. LHCb Figure 3: Differential r J/ψ measurement. The distributions of (left) the B + transverse momentum, p T , and (right) the ratio r J/ψ relative to its average value r J/ψ as a function of p T . The p T spectrum of the B + → J/ψ K + decays is similar to that of the corresponding B + → K + + − decays such that the measurement of r J/ψ tests the kinematic region relevant for the R K measurement. The lack of any dependence of the value of r J/ψ / r J/ψ as a function of B + p T demonstrates control of the efficiencies. Uncertainties on the data points are statistical only. Fig. 2. The fit is of good quality and the value of R K is measured to be where the first uncertainty is statistical and the second systematic. Combining the uncertainties gives R K = 0.846 + 0.044 − 0.041 . This is the most precise measurement to date and is consistent with the SM expectation, 1.00 ± 0.01 [3][4][5][6][7], at the level of 0.10% (3.1 standard deviations), giving evidence for the violation of lepton universality in these decays. The value of R K is found to be consistent in subsets of the data divided on the basis of data-taking period, different selection categories and magnet polarity (see Methods). The profile-likelihood is given in Methods. A comparison with previous measurements is shown in Fig. 4.
The 3850 ± 70 B + → K + µ + µ − decay candidates that are observed are used to compute the B + → K + µ + µ − branching fraction as a function of q 2 . The results are consistent between the different data-taking periods and with previous LHCb measurements [37]. The B + → K + e + e − branching fraction is determined by combining the value of R K with the value of dB(B + → K + µ + µ − )/dq 2 in the region (1.1 < q 2 < 6.0 GeV 2 /c 4 ) [37], taking into account correlated systematic uncertainties. This gives The 1.9% uncertainty on the B + → J/ψ K + branching fraction [2] gives rise to the dominant systematic uncertainty. This is the most precise measurement of this quantity to date and, given the large (O(10%)) theoretical uncertainty on the predictions [7,132], is consistent with the SM. A breaking of lepton universality would require an extension of the gauge structure of the SM that gives rise to the known fundamental forces. It would therefore constitute a significant evolution in our understanding and would challenge an inference based on a wealth of experimental data in other processes. Confirmation of any beyond the SM effect will clearly require independent evidence from a wide range of sources.
Measurements of other R H observables with the full LHCb data set will provide further information on the quark-level processes measured. In addition to affecting the decay rates, new physics can also alter how the decay products are distributed in phase space. An angular analysis of the electron mode, where SM-like behaviour might be expected in the light of the present results and those from b → sµ + µ − decays, would allow the formation of ratios between observable quantities other than branching fractions, enabling further precise tests of lepton universality [16,18,31,133,134]. The hierarchical effect needed to explain the existing b → s + − and b → c + ν data, with the largest effects observed in tau modes, then muon modes, and little or no effects in electron modes, suggests that studies of b → sτ + τ − transitions are also of great interest [135,136]. There are excellent prospects for all of the above and further measurements with the much larger samples that will be collected with the upgraded LHCb detector from 2022 and, in the longer term, with the LHCb Upgrade II [137]. Other experiments should also be able to determine R H ratios, with the Belle II experiment in particular expected to have competitive sensitivity [138]. The ATLAS and CMS experiments may also be able to contribute [139,140].
In summary, in the dilepton mass-squared region 1.1 < q 2 < 6.0 GeV 2 /c 4 , the ratio of branching fractions for B + → K + µ + µ − and B + → K + e + e − decays is measured to be R K = 0.846 + 0.044 − 0.041 . This is the most precise measurement of this ratio to date and is compatible with the SM prediction with a p-value of 0.10%. The significance of this discrepancy is 3.1 standard deviations, giving evidence for the violation of lepton universality in these decays.

Experimental setup
The Large Hadron Collider (LHC) is the world's highest-energy particle accelerator and is situated approximately 100 m underground, close to Geneva, Switzerland. The collider accelerates two counter-rotating beams of protons, guided by superconducting magnets located around a 27 km circular tunnel, and brings them into collision at four interaction points that house large detectors. The LHCb experiment [141,142] is instrumented in the region covering the polar angles between 10 and 250 mrad around the proton beam axis, in which the products from B-hadron decays can be efficiently captured and identified. The detector includes a high-precision tracking system with a dipole magnet, providing measurements of momentum and impact parameter (IP), defined for charged particles as the minimum distance of a track to a primary proton-proton interaction vertex (PV). Different types of charged particles are distinguished using information from two ring-imaging Cherenkov (RICH) detectors, a calorimeter and a muon system [142][143][144][145][146][147].
Since the associated data storage and analysis costs would be prohibitive, the experiment does not record all collisions. Only potentially interesting events, selected using real-time event filters referred to as triggers, are recorded. The LHCb trigger system has a hardware stage, based on information from the calorimeter and muon systems; followed by a software stage that uses all the information from the detector, including the tracking, to make the final selection of events to be recorded for subsequent analysis. The trigger selection algorithms are based on identifying key characteristics of B hadrons and their decay products, such as high p T final state particles, and a decay vertex that is significantly displaced from any of the PVs in the event.
For the R K measurement, candidate events are required to have passed a hardware trigger algorithm that selects either a high p T muon; or an electron, hadron or photon with high transverse energy deposited in the calorimeters. The B + → K + µ + µ − and B + → J/ψ (→ µ + µ − )K + candidates must be triggered by one of the muons, whereas B + → K + e + e − and B + → J/ψ (→ e + e − )K + candidates must be triggered in one of three ways: by either one of the electrons; by the kaon from the B + decay; or by particles in the event that are not decay products of the B + candidate. In the software trigger, the tracks of the final-state particles are required to form a displaced vertex with good fit quality. A multivariate algorithm is used for the identification of displaced vertices consistent with the decay of a B hadron [148,149].

Analysis description
The analysis technique used to obtain the results presented in this article is essentially identical to that used to obtain the previous LHCb R K measurement, described in Ref. [11] and only the main analysis steps are reviewed here.

Event selection
Kaon and muon candidates are identified using the output of multivariate classifiers that exploit information from the tracking system, the RICH detectors, the calorimeters and the muon chambers. Electrons are identified by matching tracks to particle showers in the electromagnetic calorimeter (ECAL) and using the ratio of the energy detected in Table 1: Nonresonant and resonant mode q 2 and m(K + + − ) ranges. The variables m(K + + − ) and m J/ψ (K + + − ) are used for nonresonant and resonant decays, respectively. the ECAL to the momentum measured by the tracking system. An electron that emits a bremsstrahlung photon due to interactions with the material of the detector downstream of the dipole magnet results in the photon and electron depositing their energy in the same ECAL cells, and therefore in a correct measurement of the original energy of the electron in the ECAL. However, a bremsstrahlung photon emitted upstream of the magnet will deposit energy in a different part of the ECAL than the electron, which is deflected in the magnetic field. For each electron track, a search is therefore made in the ECAL for energy deposits around the extrapolated track direction before the magnet that are not associated with any other charged tracks. The energy of any such deposit is added to the electron energy that is derived from the measurements made in the tracker. Bremsstrahlung photons can be added to none, either, or both of the final-state e + and e − candidates.

Decay mode
In order to suppress background, each final-state particle is required to have sizeable p T and to be inconsistent with coming from a PV. The particles are required to originate from a common vertex, with good vertex-fit quality, that is displaced significantly from all of the PVs in the event. The PVs are reconstructed by searching for space points where an accumulation of track trajectories is observed. A weighted least-squares method is then employed to find the precise vertex position. The B + momentum vector is required to be aligned with the vector connecting one of the PVs in the event (below referred to as the associated PV) and the B + decay vertex. The value of q 2 is calculated using only the lepton momenta, without imposing any constraint on the m(K + + − ) mass.
The m(K + + − ) mass ranges and the q 2 regions used to select the different decay modes are shown in Table 1. The selection requirements applied to the nonresonant and resonant decays are otherwise identical. For the muon modes, the superior mass resolution allows a fit in a reduced m(K + + − ) mass range compared to the electron modes. For the electron modes, a wider mass region is needed to perform an accurate fit, but the range chosen suppresses any significant contribution from decays with two or more additional pions that are not reconstructed. The residual contribution from such decays is considered as a source of systematic uncertainty. Resolution effects similarly motivate the choice of nonresonant q 2 regions, with a lower limit that excludes contributions from φ-meson decays and an upper limit that reduces the tail from B + → J/ψ (→ e + e − )K + decays. The proportion of signal candidates that migrate in and out of the q 2 region of interest is of the order of 10%. This effect is accounted for using simulation.
Cascade background of the form B → H c (→ K + − ν X) + ν Y , where H c is a hadron containing a c quark (D 0 , D + , D + s , Λ + c ), and X, Y are particles that are not included in the B + candidate, are suppressed by requiring that the kaon-lepton invariant mass is in the region m(K + − ) > m D 0 , where m D 0 is the known D 0 mass [2]. For the electron mode, this requirement is illustrated in Fig. 5 (left). Analogous background sources with a misidentified particle are reduced by applying a similar veto, but with the leptonmass hypothesis changed to that of a pion (denoted [→π] ). In the muon case, Kµ [→π] combinations with a mass smaller than m D 0 are rejected. In the electron case, a ±40 MeV/c 2 window around the D 0 mass is used to reject candidates where the veto is applied without the bremsstrahlung recovery, i.e. based on only the measured track momenta. The mass distributions are shown in Fig. 5. The electron and muon veto cuts differ given the relative helicity suppression of π + → + ν decays. This causes misidentification backgrounds to populate a range of Kµ masses but only a peak in the Ke mass. The veto requirements retain 97% of B + → K + µ + µ − and 95% of B + → K + e + e − decays passing all other selection requirements.
Background from other exclusive B-hadron decays requires at least two particles to be misidentified. These include the decays B + → K + π + π − , and misreconstructed B + → J/ψ (→ + − )K + and B + → ψ(2S)(→ + − )K + decays. In the latter two decays the kaon is misidentified as a lepton and the lepton (of the same electric charge) as a kaon. Such background is reduced to a negligible level by particle-identification criteria. Background from decays with a photon converted into an e + e − pair are also negligible due to the q 2 selection.

Multivariate selection
A Boosted Decision Tree (BDT) algorithm [150] with gradient boosting [151] is used to reduce combinatorial background. For the nonresonant muon mode and for each of the three different trigger categories of the nonresonant electron mode, a single BDT classifier is trained for the 7 and 8 TeV data, and an additional classifier is trained for the 13 TeV data. The BDT output is not strongly correlated with q 2 and the same classifiers are used to select the respective resonant decays. In order to train the classifier, simulated nonresonant B + → K + + − decays are used as a proxy for the signal and nonresonant K + + − candidates selected from the data with m(K + + − ) > 5.4 GeV/c 2 are used as a background sample. The k-folding technique is used in the training and testing [152]. The classifier includes the following variables: the p T of the B + , K + and dilepton candidates, and the minimum and maximum p T of the leptons; the B + , dilepton and K + χ 2 IP with respect to the associated PV, where χ 2 IP is defined as the difference in the vertex-fit χ 2 of the PV reconstructed with and without the considered particle; the minimum and maximum χ 2 IP of the leptons; the B + vertex-fit quality; the statistical significance of the B + flight distance; and the angle between the B + candidate momentum vector and the direction between the associated PV and the B + decay vertex. The p T of the final state particles, the vertex-fit χ 2 and the significance of the flight distance have the most discriminating power. For each of the classifiers, a requirement is placed on the output variable in order to maximise the predicted significance of the nonresonant signal yield. For the electron modes that dictate the R K precision, this requirement reduces the combinatorial background by approximately 99%, while retaining 85% of the signal mode. The muon BDT classifier has similar performance. In both cases, for both signal and background, the efficiency of the BDT selection has negligible dependence on m(K + + − ) and q 2 in the regions used to determine the event yields.

Calibration of simulation
The simulated data used in this analysis are produced using the software described in Refs. [153][154][155][156][157]. Bremsstrahlung emission in the decay of particles is simulated using the Photos software in the default configuration [158], which is observed to agree with an independent quantum electrodynamics calculation at the level of 1% [5].
Simulated events are weighted to correct for the imperfect modelling using control channels. The B + production kinematics are corrected using B + → J/ψ (→ + − )K + events. The particle-identification performance is calibrated using data, where the species of particles in the final state can be unambiguously determined purely on the basis of the kinematics. The calibration samples consist of D * + → D 0 (→ K − π + )π + , J/ψ → µ + µ − , and B + → J/ψ (→ e + e − )K + decays, from which kaons, muons, and electrons, respectively, can be selected without applying particle-identification requirements. The performance of the particle-identification requirements is then evaluated from the proportion of events in these samples which fulfil the particle-identification selection criteria. The trigger response is corrected using weights applied to simulation as a function of variables relevant to the trigger algorithms. The weights are calculated by requiring that simulated B + → J/ψ (→ + − )K + events exhibit the same trigger performance as the control data. The B + → J/ψ (→ + − )K + events selected from the data have also been used to demonstrate control of the electron track-reconstruction efficiency at the percent level [159]. Whenever B + → J/ψ (→ + − )K + events are used to correct the simulation, the correlations between calibration and measurement samples are taken into account in the results and cross-checks presented in the article. The correlation is evaluated using a bootstrapping method to recompute the yields and efficiencies many times with different subsets of the data.

Likelihood fit
An unbinned extended maximum-likelihood fit is made to the m(K + e + e − ) and m(K + µ + µ − ) distributions of nonresonant candidates. The value of R K is a fit parameter, which is related to the signal yields and efficiencies according to where N (X) indicates the yield of decay mode X and ε(X) is the efficiency for selecting decay mode X. The resonant yields are determined from separate fits to m J/ψ (K + + − ).
In the fit for R K these yields, together with the efficiencies, are incorporated as Gaussianconstraint terms.
In order to take into account the correlation between the selection efficiencies, the m(K + e + e − ) and m(K + µ + µ − ) distributions of nonresonant candidates in each of the different trigger categories and data-taking periods are fitted simultaneously, with a common value of R K . The relative fraction of partially reconstructed background in each trigger category is also shared across the different data-taking periods.
The mass-shape parameters are derived from the calibrated simulation. The four signal modes are modelled by multiple Gaussian functions with power-law tails on both sides of the peak [160,161], although the differing detector response gives different shapes for the electron and muon modes. The signal mass shapes of the electron modes are described with the sum of three distributions, which model whether the ECAL energy deposit from a bremsstrahlung photon was added to both, either, or neither of the e ± candidates. The expected values from simulated events are used to constrain the fraction of signal decays in each of these categories. These fractions are observed to agree well with those observed in resonant events selected from the data. In order to take into account residual differences in the signal shape between data and simulation, an offset in the peak position and a scaling of the resolution are allowed to vary in the fits to the resonant modes. The corresponding parameters are then fixed in the fits to the relevant nonresonant modes. This resolution scaling changes the migration of candidates into the q 2 region of interest by less than 1%.
For the modelling of nonresonant and resonant partially reconstructed backgrounds, data are used to correct the simulated Kπ mass spectrum for B (0,+) → K + π (−,0) e + e − and B (0,+) → J/ψ (→ e + e − )K + π (−,0) decays [162]. The calibrated simulation is used subsequently to obtain the m(K + + − ) mass shape and relative fractions of these background components. In order to accommodate possible lepton-universality violation in these partially reconstructed processes, which are underpinned by the same b → s quark-level transitions as those of interest, the overall yield of such decays is left to vary freely in the fit. The shape of the B + → J/ψ π + background contribution is taken from simulation but the size with respect to the B + → J/ψ K + mode is constrained using the known ratio of the relevant branching fractions [2,163] and efficiencies.
In the fits to nonresonant B + → K + e + e − candidates, the mass shape of the background from B + → J/ψ (→ e + e − )K + decays with an emitted photon that is not reconstructed is also taken from simulation and, adjusting for the relevant selection efficiency, its yield is constrained to the value from the fit to the resonant mode within its uncertainty. In all fits, the combinatorial background is modelled with an exponential function with a freely varying yield and shape.
The fits to the nonresonant (resonant) decay modes in different data-taking periods and trigger categories are shown in Fig. 6 (Fig. 7). For the resonant modes the results from independent fits to each period/category are shown. Conversely, the nonresonant distributions show the projections from the simultaneous fit across data taking periods and trigger categories that is used to obtain R K . The fitted yields for the resonant and nonresonant decays are given in Table 2.
The profile likelihood for the fit to the nonresonant decays is shown in Fig. 8. The likelihood is non-Gaussian in the region R K > 0.95 due to the comparatively low yield of B + → K + e + e − events. Following the procedure described in Refs. [11,12], the p-value is computed by integrating the posterior probability density function for R K , having folded in the theory uncertainty on the SM prediction, for R K values larger than the SM expectation. The corresponding significance in terms of standard deviations is computed using the inverse Gaussian cumulative distribution function for a one-sided conversion.
A test statistic is constructed that is based on the likelihood ratio between two hypotheses with common (null) or different (test) R K values for the part of the sample analysed previously (7, 8 and part of the 13 TeV data) and for the new portion of the 13 TeV data. Using pseudoexperiments based on the null hypothesis, the data suggest that the R K value from the new portion of the data is compatible with that from the previous sample with a p-value of 95%. Further tests give good compatibility for subsamples of the data corresponding to different trigger categories and magnet polarities.
The departure of the profile likelihood shown in Fig. 8 from a normal distribution stems from the definition of R K . In particular, in the R K ratio the denominator is affected by larger statistical uncertainties than the numerator, owing to the larger number of nonresonant muonic signal candidates. However, the intervals of the likelihood distribution are found to be the same when estimated with 1/R K as the fit parameter.

Additional cross-checks
The r J/ψ single ratio is used to perform a number of additional cross-checks. The distribution of this ratio as a function of the angle between the leptons and the minimum p T of the leptons is shown in Fig. 9, together with the spectra expected for the resonant and nonresonant decays. No significant trend is observed in either r J/ψ distribution. Assuming the deviations observed are genuine mismodelling of the efficiencies, rather than statistical fluctuations, a total shift of R K at a level less than 0.001 would be expected due to these effects. This estimate takes into account the spectrum of the relevant Table 2: Yields of the nonresonant and resonant decay modes obtained from the fits to the data. The quoted uncertainty is the combination of statistical and systematic effects.

Decay mode Yield
B + → K + e + e − 1 640 ± 70 B + → K + µ + µ − 3 850 ± 70 B + → J/ψ (→ e + e − )K + 743 300 ± 900 B + → J/ψ (→ µ + µ − )K + 2 288 500 ± 1 500   variables in the nonresonant decay modes of interest and is compatible with the estimated systematic uncertainties on R K . Similarly, the variations seen in r J/ψ as a function of all other reconstructed quantities examined are compatible with the systematic uncertainties assigned. In addition, r J/ψ is computed in two-dimensional intervals of reconstructed quantities, as shown in Fig. 10. Again, no significant trend is seen.

Systematic uncertainties
The majority of the sources of systematic uncertainty affect the relative efficiencies between nonresonant and resonant decays. These are included in the fit to R K by allowing the relative efficiency to vary within Gaussian constraints. The width of the constraint is determined by adding the contributions from the different sources in quadrature. Correlations in the systematic uncertainties between different trigger categories and run periods are taken into account. Systematic uncertainties affecting the determination of the signal yield are assessed using pseudoexperiments generated with variations of the fit model. Pseudoexperiments are also used to assess the degree of bias originating from the fitting procedure. The bias is found to be 1% of the statistical precision, i.e. negligible with respect to other sources of systematic uncertainty. For the nonresonant B + → K + e + e − decays, the systematic uncertainties are dominated by the modelling of the signal and background components used in the fit. The effect on R K is at the 1% level. A significant proportion (0.7%) of this uncertainty comes from the limited knowledge of the Kπ spectrum in B (0,+) → K + π (−,0) e + e − decays. In addition, a 0.2% systematic uncertainty is assigned for the potential contribution from partially reconstructed decays with two additional pions. A comparable uncertainty to that from the modelling of the signal and background components is induced by the limited sizes of calibration samples. Other sources of systematic uncertainty, such as the calibration of B + production kinematics, the trigger calibration and the determination of the particle identification efficiencies, contribute at the few-permille or permille level,  Figure 9: Differential r J/ψ measurement. (Top) distributions of the reconstructed spectra of (left) the angle between the leptons, α( + , − ), and (right) the minimum p T of the leptons for B + → K + + − and B + → J/ψ (→ + − )K + decays. (Bottom) the single ratio r J/ψ relative to its average value r J/ψ as a function of these variables. In the electron minimum p T spectra, the structure at 2800 MeV/c is related to the trigger threshold. Uncertainties on the data points are statistical only.  depending strongly on the data-taking period and the trigger category.
The uncertainties on parameters used in the simulation model of the signal decays affect the q 2 distribution and hence the selection efficiency. These uncertainties are propagated to an uncertainty on R K using predictions from the flavio software package [7] but give rise to a negligible effect. Similarly, the differing q 2 resolution between data and simulation, which alters estimates of the q 2 migration, has negligible impact on the result.