## Main

Precision medicine aims to tailor treatment to an individual based on their clinical characteristics1. The most successful examples of precision medicine to date have been in cancer and monogenic disease, where genetic sequencing has indicated molecularly distinct subtypes that could benefit from specific treatment strategies2,3. This approach, however, is not suitable for common polygenic complex diseases, so other strategies are needed.

Type 2 diabetes is an attractive candidate for a precision medicine approach as it is a heterogeneous disease with varying underlying pathophysiology, and there are many different options for glucose-lowering treatment available that have differing mechanisms of action4. Identifying clinical characteristics or biomarkers robustly associated with differential treatment responses could allow the targeting of specific glucose-lowering agents to those most likely to benefit.

In the 2022 American Diabetes Association (ADA)/European Association for the Study of Diabetes (EASD) international guidelines, the targeting of therapy based on a person’s clinical features is limited5. In patients with established atherosclerotic cardiovascular disease, glucagon-like peptide 1 receptor agonists (GLP-1RA) or sodium–glucose co-transporter 2 inhibitors (SGLT2i) are recommended5. Patients with either heart failure or chronic kidney disease are recommended to receive SGLT2i. However, these recommendations apply to only 15–20% of individuals6. For most individuals with type 2 diabetes, current guidelines include a broad choice of potential therapies with differentiation between treatment classes based predominantly on costs and side effect profiles, rather than efficacy.

Simple clinical features, such as a person’s sex; surrogate markers of insulin resistance, such as body mass index (BMI) and triglycerides; or markers of renal function, such as estimated glomerular filtration rate (eGFR), can be used to stratify people with type 2 diabetes into subgroups showing differential responses to glucose-lowering therapies7. Individuals with obesity have been shown to have a greater glycemic reduction on thiazolidinediones compared to individuals without obesity, whereas a higher BMI is associated with a smaller glycemic reduction on DPP4 inhibitors (DPP4i)8,9. For SGLT2i, which acts through inhibiting the active reabsorption of glucose in the proximal tubule, impaired renal function (lower eGFR) is associated with reduced glucose-lowering efficacy10,11. In contrast, with some DPP4i, impaired renal function is associated with increased glucose-lowering efficacy, likely due to the drug pharmacokinetics where reduced renal clearance can lead to increased plasma DPP4i concentrations12. These associations, to date, have been observed in independent treatment groups in electronic healthcare records and in post hoc analyses of individual participant data in parallel group randomized controlled trials (RCTs)8,9,10,11.

The precision medicine approach to using these data-derived strata needs to be tested in a clinical trial. To date, no trials have directly examined a precision medicine approach to prescribing in type 2 diabetes. The effectiveness of any stratified approach for choosing between therapies will depend upon the extent to which differential responses can be predicted, and, therefore, the true test of a precision medicine approach would be to assess the within-person differential responses to therapy.

We carried out a three-drug, three-period, randomized crossover trial to assess two specific hypotheses (Figs. 1 and 2) in people with type 2 diabetes treated with metformin alone or with metformin plus sulfonylurea:

1. Individuals with BMI > 30 kg/m2, compared to those with BMI ≤ 30 kg/m2, will have a greater glycemic reduction with a thiazolidinedione (pioglitazone) than with a DPP4i (sitagliptin).

2. Individuals with eGFR 60–90 ml/min/1.73 m2, compared to those with eGFR > 90 ml/min/1.73 m2, will have a greater glycemic reduction with a DPP4i (sitagliptin) than with an SGLT2i (canagliflozin).

## Results

### Participant retention and baseline characteristics

Figure 3 shows participant flow throughout the study and the numbers on each drug at each stage.

In total, 742 patients were screened for eligibility between 22 November 2016 and 24 January 2020. Of these, 210 patients did not meet eligibility criteria, and seven withdrew before being randomized. Overall, 525 participants were randomized to one of the six sequences of drug allocations (see Table 1 for participant characteristics). Of these, 20 patients withdrew before the baseline visit (four for health reasons, ten changed their mind, two were ineligible, one moved out of the area and three were unable to be contacted), and two withdrew at the baseline visit due to difficulties taking blood, leaving 503 receiving their first study drug. Overall, 45 participants subsequently withdrew (Fig. 3), leading to 458 participants (87% of those randomized) who completed all three study periods. In total, there were 1,417 instances of people taking drugs: 469 pioglitazone, 474 sitagliptin and 474 canagliflozin.
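The participant flow described above can be checked with simple arithmetic. The sketch below only re-derives the counts from the figures quoted in the text; the variable names are illustrative and are not taken from the trial database.

```python
# Counts quoted in the participant flow paragraph.
screened = 742
ineligible = 210
withdrew_before_randomization = 7
randomized = screened - ineligible - withdrew_before_randomization

# Withdrawals before the baseline visit, by stated reason:
# health, changed mind, ineligible, moved away, uncontactable.
withdrew_before_baseline = 4 + 10 + 2 + 1 + 3
withdrew_at_baseline = 2
started_first_drug = randomized - withdrew_before_baseline - withdrew_at_baseline

withdrew_later = 45
completed_all_periods = started_first_drug - withdrew_later
pct_completed = round(100 * completed_all_periods / randomized)

# Instances of a participant taking a drug: pioglitazone, sitagliptin, canagliflozin.
drug_instances = 469 + 474 + 474

print(randomized, started_first_drug, completed_all_periods, pct_completed, drug_instances)
```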

For hypothesis 1, 356 participants (68%) had HbA1c results that could be included in primary analysis (that is, took therapy for at least 12 weeks with >80% adherence based on pill count). For hypothesis 2, 342 participants (65%) had HbA1c results that could be included in primary analysis. No participants were missing eGFR or BMI results.

There was no evidence of any HbA1c carryover effect but some evidence of a period effect with participants having a mean (95% CI) 1.38 (0.23, 2.54) mmol/mol lower HbA1c in period 2 compared to period 1. There was no difference in period 3 compared to period 1, suggesting that this was not a sustained reduction over the year (Supplementary Table 1). Period effect was adjusted for in subsequent analysis.

Before stratification, there was no difference in achieved HbA1c among the three therapies: pioglitazone 59.6 mmol/mol (95% CI: 58.5, 60.7), sitagliptin 60.0 mmol/mol (95% CI: 59.0, 61.1) and canagliflozin 60.6 mmol/mol (95% CI: 59.7, 61.6) (P = 0.2) (Supplementary Table 2). Pioglitazone was associated with the lowest rates of discontinuation; sitagliptin was associated with the lowest mean number of side effects; and canagliflozin was associated with the lowest weight on therapy (Supplementary Table 2). The distribution of side effects on the three therapies is shown in Extended Data Fig. 1.

### Primary analysis

The five components of the estimand for both hypotheses are shown in Extended Data Table 1.

For hypothesis 1 (BMI-dependent differential glycemic responses to pioglitazone and sitagliptin), 356 (68% of randomized participants) had valid HbA1c values for both pioglitazone and sitagliptin and so were eligible for hypothesis 1 primary analysis (BMI strata). Eligible participants were slightly older and had a slightly lower HbA1c at baseline compared to those without valid HbA1c values but were similar with respect to other characteristics (Supplementary Table 3). Characteristics of patients in the two BMI strata are shown in Supplementary Table 4.

Participants with BMI ≤ 30 kg/m2 had a lower mean 1.48 (95% CI: 0.04, 2.91) mmol/mol achieved HbA1c on sitagliptin compared to pioglitazone. Participants with BMI > 30 kg/m2 had a lower mean 1.44 (95% CI: 0.19, 2.70) mmol/mol achieved HbA1c on pioglitazone compared to sitagliptin (Fig. 4a and Extended Data Table 2). This led to a 2.92 (95% CI: 0.99, 4.85) mmol/mol overall difference between BMI strata. Results were similar in a full mixed effects model, adjusting for period (2.88 (95% CI: 0.98, 4.79) mmol/mol; P = 0.003) (Supplementary Table 5).
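The between-strata figure follows from the two within-stratum differences once they are put on a common sign convention. The sketch below assumes the convention "pioglitazone minus sitagliptin" (a choice made here for illustration, not stated in the text) and reproduces only the unadjusted point estimate, not the confidence interval or the mixed effects model.

```python
# Within-stratum difference in achieved HbA1c (mmol/mol),
# expressed as pioglitazone minus sitagliptin (assumed sign convention).
# Positive values mean sitagliptin achieved the lower HbA1c.
diff_bmi_le_30 = 1.48   # BMI <= 30 kg/m2: sitagliptin lower by 1.48
diff_bmi_gt_30 = -1.44  # BMI > 30 kg/m2: pioglitazone lower by 1.44

# The drug-by-stratum contrast is the difference of the two differences.
between_strata = round(diff_bmi_le_30 - diff_bmi_gt_30, 2)
print(between_strata)
```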

A tipping point analysis suggested that the missing data would need to show a 3.1 mmol/mol difference in HbA1c in the opposite direction to the trial results to change the statistical significance of the findings.

The association between BMI and difference in response between pioglitazone and sitagliptin was linear on a continuous scale, indicating that there would be an even greater benefit for pioglitazone at higher BMIs and greater benefit for sitagliptin at lower BMIs (Extended Data Fig. 2a).

For hypothesis 2 (eGFR and differential responses to sitagliptin and canagliflozin), 342 (65% of randomized participants) had valid HbA1c values for both sitagliptin and canagliflozin and so were eligible for primary analysis for hypothesis 2 (eGFR strata). There were no differences in characteristics between those eligible and ineligible for hypothesis 2 analysis (Supplementary Table 6). Characteristics of patients in the two eGFR strata are shown in Supplementary Table 7.

Participants with eGFR 60–90 ml/min/1.73 m2 had a lower mean (95% CI) 1.74 (0.65, 2.85) mmol/mol achieved HbA1c on sitagliptin compared to canagliflozin. Participants with eGFR >90 ml/min/1.73 m2 had a lower mean 1.08 (−0.24, 2.41) mmol/mol achieved HbA1c on canagliflozin compared to sitagliptin (Fig. 4b and Extended Data Table 3). In a full mixed effects model, adjusting for period, this translated into a difference of 2.90 (1.19, 4.61) mmol/mol between eGFR strata (P = 0.001) (Supplementary Table 8).

A tipping point analysis suggested that the missing data would need to show a 3.2 mmol/mol difference in HbA1c in the opposite direction to the trial results to change the statistical significance of the findings.

The association between eGFR and difference in response between sitagliptin and canagliflozin was linear on a continuous scale, indicating that there would be an even greater benefit for canagliflozin at higher eGFR values and greater benefit for sitagliptin at lower eGFR values (Extended Data Fig. 2b).

Sensitivity analyses show that results did not differ for either of the tested hypotheses when adjusting for study period, when restricted to only those with HbA1c values when on therapy for at least 15 weeks, when adjusting for differences in time intervals between measurements or when adjusting for those who had >18 weeks of therapy (Supplementary Tables 9–11).

### Secondary outcomes

There was no difference in tolerability between BMI strata for pioglitazone compared to sitagliptin (odds ratio (OR) (95% CI) 2.11 (0.66, 6.76)) for drug × BMI strata interaction in a mixed effects logistic regression analysis (P = 0.2; Supplementary Tables 12 and 13 and Extended Data Table 4) or between eGFR strata for canagliflozin compared to sitagliptin (OR (95% CI) 0.424 (0.158, 1.135)) for drug × eGFR strata interaction in a mixed effects logistic regression analysis (P = 0.09; Supplementary Tables 13 and 14 and Extended Data Table 4).

There was no difference in the odds of experiencing at least one side effect for either of the drug/strata combinations of interest (OR (95% CI) 0.68 (0.31, 1.45), P = 0.3 for drug × BMI strata interaction; OR (95% CI) 1.46 (0.70, 3.04), P = 0.3 for drug × eGFR strata interaction) (Extended Data Table 5 and Supplementary Tables 15 and 16).

There was evidence of period and carryover effects for weight, with participants being heavier on average as the trial progressed and with a carryover effect (P < 0.001), with either canagliflozin or sitagliptin treatment in the previous period associated with lower weight compared to pioglitazone treatment in the previous period (Supplementary Table 17). This means that absolute weight differences observed between drugs need to be treated with caution. When analyzing by strata, pioglitazone was associated with a higher weight compared to sitagliptin in both BMI categories, and this was more pronounced in those with BMI > 30 kg/m2 (Extended Data Table 6). There was no difference in weight between eGFR strata for canagliflozin and sitagliptin (Extended Data Table 6).

There was no evidence of any difference in the odds of experiencing hypoglycemia by BMI strata for pioglitazone and sitagliptin or by eGFR strata for sitagliptin and canagliflozin (Extended Data Table 7 and Supplementary Tables 18 and 19).

Participant drug preference was a pre-specified secondary analysis and is reported in a separate publication13. There was no difference in drug preference by strata. Pioglitazone was ranked higher than sitagliptin in 131/265 (49%) participants in the BMI > 30 kg/m2 stratum compared to 78/183 (43%) in the BMI ≤ 30 kg/m2 stratum (P = 0.2; ten participants expressed no preference). Sitagliptin was ranked higher than canagliflozin in 112/214 (52%) in the eGFR 60–90 ml/min/1.73 m2 stratum compared to 105/235 (45%) in the eGFR > 90 ml/min/1.73 m2 stratum (P = 0.1; nine participants expressed no preference).

There were 2,201 adverse events reported throughout the study: 56 pre-trial, one post-trial and 2,144 while on therapy in the trial. Table 2 summarizes the adverse events on therapy reported throughout the trial. In total, 447/525 (85%) randomized participants experienced adverse events on at least one of the study drugs. Forty-five events were classed as serious (three participants died), but none of these was related to the study drugs.

## Discussion

This randomized crossover study provides prospective trial evidence to directly support a stratified approach for therapy to manage glycemia in type 2 diabetes. Our results demonstrate that, for second-line and third-line therapy in type 2 diabetes, simple predefined stratification using BMI and renal function can determine the choice of the drug most likely to be effective for glucose lowering.

We show here that, among patients with type 2 diabetes on background metformin or combination metformin and sulfonylurea therapy, stratification based on BMI and eGFR is associated with differential glucose-lowering responses to canagliflozin, sitagliptin and pioglitazone. For a population of people with type 2 diabetes, treating patients with the drug proposed best for their stratum rather than the alternative drug could potentially lead to an overall mean improvement of ~3 mmol/mol in those who are able to tolerate the therapy. This stratification could be used to help select glucose-lowering therapies for individuals in clinical practice. For participants with BMI > 30 kg/m2, a lower HbA1c was achieved on pioglitazone compared to sitagliptin, whereas, for those with BMI ≤ 30 kg/m2, a lower HbA1c was achieved with sitagliptin. For participants with impaired renal function (eGFR between 60 ml/min/1.73 m2 and 90 ml/min/1.73 m2), a lower HbA1c was achieved on sitagliptin compared to canagliflozin, whereas, for those with normal renal function (eGFR > 90 ml/min/1.73 m2), a lower HbA1c was achieved on canagliflozin. These findings are concordant with our original study hypotheses. There was no evidence of a difference by strata in reported drug tolerability or overall rates of side effects.

We found that using different strata led to clinically meaningful differences (~3 mmol/mol) in achieved HbA1c among glucose-lowering therapies. This equates to approximately 3 years without requiring additional therapy, given that the median deterioration in HbA1c in people with type 2 diabetes is 1 mmol/mol per year14. In contrast, without stratification, all three therapies were, on average, equivalent in achieved HbA1c. Although these differences are of a smaller magnitude than the benefits seen with targeted therapy in monogenic diabetes (for example, a ~30 mmol/mol difference in response between metformin and gliclazide treatment for patients with HNF1A-MODY15), the overall improvement through stratification would likely have a pronounced effect at the population level, as type 2 diabetes is far more common (90% of all diabetes for type 2 compared to <1% for MODY)16,17. A lack of difference in tolerability or overall incidence of side effects between strata suggests that, if choice of therapy were to be based solely on the optimal stratum for glycemic response, this would be unlikely to lead to any overall increase in these detrimental effects. However, the weight gain associated with pioglitazone, which was greater in individuals with obesity, would need to be considered and balanced against the greater HbA1c improvement. Further work is needed to determine the effect of this stratification on other non-glycemic outcomes, such as blood pressure.

These findings, based on binary, free-to-implement strata, establish the principle of stratification helping to target type 2 diabetes treatment to those most likely to benefit, and they represent a step forward in the translation of type 2 diabetes precision medicine into clinical practice. However, ultimately, a more sophisticated ‘precision’ approach using models that integrate multiple individual-level clinical features (for example, BMI, HbA1c and eGFR) on a continuous scale will have the greatest utility for clinical practice7,18. Using individual-level features will likely enable the identification of more ‘extreme’ patient phenotypes with larger differences in HbA1c reduction than we demonstrated with binary strata based on clinically defined cutoffs. For example, when we look at the impact of BMI or renal function on a continuous scale rather than as two dichotomous groups, it is clear that those with more extreme values have a greater differential response to treatment. Such models could potentially be optimized to incorporate more advanced biomarkers and genetics and to evaluate additional outcomes beyond HbA1c19.

Our findings are consistent with previous research from trial subgroup and observational data that has suggested that higher BMI may be associated with increased glucose lowering with thiazolidinediones9 and modestly reduced glucose lowering with DPP4 inhibitors8 and with research suggesting that lower eGFR is associated with reduced response to SGLT2 inhibitors10,11. Pioglitazone acts through altering the transcription of genes influencing carbohydrate and lipid metabolism in adipocytes20, which could lead to a greater glycemic effect in those with higher BMI. For sitagliptin, which reduces degradation of incretin hormones, including GLP-1, thereby potentiating insulin secretion, the reason for the greater HbA1c reduction in those with lower BMI is less clear. Potential mechanisms include the impact of high insulin resistance on the action of a drug that acts predominantly through potentiating insulin secretion, impaired GLP-1 secretion in obesity or direct effects of lipotoxicity on GLP-1 receptor expression, which have been demonstrated in animal models21,22,23. For SGLT2i, the drug mechanism of action to lower glucose levels per se (as opposed to its other effects) is through inhibition of renal tubular glucose reabsorption, and a low eGFR might, therefore, be expected to lead to reduced filtration of glucose and, subsequently, reduced glycosuria with SGLT2i therapy24.

A key strength of this RCT is that we have shown that these differences are observed in the crossover setting, allowing robust assessment of differential response to therapy within individuals and, therefore, direct assessment of stratified treatment that cannot be undertaken from existing trials with a parallel group design. The crossover design also requires a much smaller sample size compared to parallel group trials. Our RCT had several limitations. The crossover design, although more powerful for assessing within-person differences, does require careful design to avoid period and carryover effects. We did see a period effect with a reduction in HbA1c in the second period, but, in line with our Statistical Analysis Plan, we adjusted for this in our analysis, and this was not a sustained reduction over the year, which would indicate a more general decline in glycemic control. We did not see a carryover effect for HbA1c, our primary outcome, but there was carryover with weight limiting the interpretability of the effect sizes for the associations seen with weight. In addition, the crossover design enabled an assessment of only short-term outcomes, meaning that we did not evaluate durability of HbA1c reduction, cardiovascular outcomes or development of diabetes complications. However, our previous work using parallel group trial data and observational data suggests that differences in response associated with strata are maintained over time, with early HbA1c response representative of long-term effects8,25. Most of our study population was male (73%) and self-reported White ethnicity (94%), which limits conclusions about the relative benefits and risks of these therapies in females and in other ethnic groups. We assessed only specific glucose-lowering agents, and findings cannot be assumed to reflect class effects of SGLT2 inhibitors, DPP4 inhibitors and thiazolidinediones. 
We chose a per-protocol analysis rather than an intention-to-treat approach for our primary analysis, as we could not obtain a valid HbA1c value when participants had not taken the therapy for at least 12 weeks, and imputation with baseline measures was deemed inappropriate due to the pre-study baseline not being representative for later study periods. This means that the inferences from this study apply only to those who can tolerate the therapies of interest. There were some minor differences between individuals included and excluded from the BMI-defined strata (hypothesis 1), but there was no difference in tolerability between the study drug and/or strata combinations. In addition, tipping point analysis indicated that the missing data would have to show large differences in the opposite direction to change the statistical significance of our findings. Therefore, we are confident that our findings are not artifacts of our analytical approach and are reflective of the effects seen in those who are able to tolerate the respective therapies.

It should be recognized that we studied only patients treated with metformin (with or without a sulfonylurea) at baseline and that the glycemia-related and tolerability-related outcomes that we studied are not the only factors considered by clinicians and patients when choosing a glucose-lowering therapy for a patient with type 2 diabetes. In patients with established atherosclerotic cardiovascular disease (or those at elevated risk), or with chronic kidney disease or heart failure, SGLT2 inhibitors are the recommended drugs in international guidelines, and GLP-1RAs are recommended for those with atherosclerotic cardiovascular disease26. In addition, despite still being a low-cost treatment option proposed in guidelines, prescribing of thiazolidinediones is declining27,28. Any precision medicine approach based on short-term outcomes, such as glycemia, will need to be embedded in existing treatment pathways based on the longer-term cardiorenal risk benefits of specific therapies. In patients without specific cardiorenal indications (~80% of patients6), the 2022 ADA/EASD updated guidelines offer many treatment options, so considering likely glycemic response (based on participant characteristics), alongside other factors considered in current practice (such as cost and side effect profile), may offer a low-cost approach to improving treatment response and patient outcomes.

We show here, in a randomized crossover study, that clinically relevant differences in glycemic responses to therapy in type 2 diabetes can be seen when stratifying a patient population based on BMI and eGFR, leading to benefits in those who tolerate these therapies that would not be observed if considering overall glycemic response to the three drugs in the population as a whole. This study represents a prospective demonstration of a potential stratified approach to type 2 diabetes treatment.

## Methods

### Ethics

This study was approved by the UK Health Research Authority Research Ethics Committee South Central—Oxford A (16/SC/0147).

This trial was conducted and analyzed in line with the previously published protocol29 and the Statistical Analysis Plan (the full TriMaster Statistical Analysis Plan is freely available and can be downloaded from https://ore.exeter.ac.uk/repository/handle/10871/125162). The trial was registered at ClinicalTrials.gov (NCT02653209) and the ISRCTN registry (12039221). Major protocol amendments were approved by the Royal Devon University Healthcare NHS Foundation Trust as sponsor, the UK Health Research Authority (HRA) Research Ethics Committee South Central—Oxford A and the UK Medicines and Healthcare products Regulatory Agency (where relevant). Details of all 12 major amendments are included in Extended Data Table 8: Protocol Amendments in the TriMaster randomized three-way crossover trial.

### Study design

We conducted a double-blind, randomized crossover trial of three glucose-lowering therapies (pioglitazone 30 mg once daily, sitagliptin 100 mg once daily and canagliflozin 100 mg once daily) in 24 UK centers (Supplementary Table 20). The three-way crossover trial was undertaken as an efficient, faster and more cost-effective approach to address both hypotheses, requiring fewer participants than performing two two-way crossover studies. In addition, this study design allows a unique opportunity to compare the effects of these three medications within a single person, including participant tolerance and therapy preference13.

### Study participants

Participants were adults aged ≥30 years and ≤80 years, with a clinical diagnosis of type 2 diabetes for at least 12 months and treated with either metformin alone or two classes of oral glucose-lowering therapy (given either as separate or combined medications) that did not include a DPP4 inhibitor, an SGLT2 inhibitor or a thiazolidinedione. This was likely to be metformin and sulfonylurea but included the prandial glucose regulators nateglinide or repaglinide. No change of diabetes treatment (new therapy or dose change) was permitted in the previous 3 months. Participants had HbA1c > 58 mmol/mol (7.5%) and ≤110 mmol/mol (12.2%) and eGFR ≥ 60 ml/min/1.73 m2, both results confirmed at a screening visit, and were able and willing to give informed consent.

Patients were excluded if screening blood tests identified alanine transaminase (ALT) > 2.5× upper limit of the assay normal range (ULN) or known liver disease, specifically bilirubin >30 μmol/L associated with other evidence of liver failure; HbA1c ≤ 58 mmol/mol (7.5%) or >110 mmol/mol (12.2%); or eGFR <60 ml/min/1.73 m2. Treatment with insulin in the previous 12 months or with any of the study drugs within the previous 3 months were exclusion criteria, as was current treatment with corticosteroids, rifampicin, gemfibrozil, phenytoin and carbamazepine, loop diuretics (furosemide or bumetanide) or antibiotics for active infection. Presence of limb ischemia shown by absence of both pulses in one or both feet at screening, a foot ulcer requiring antibiotics in the previous 3 months or any active infection requiring antibiotics were also exclusions.

Patients could not be recruited if they were undergoing current/ongoing investigation for macroscopic hematuria, had recent (within 3 months) or planned major surgery or had experienced an acute cardiovascular episode (angina, myocardial infarction, stroke, or transient ischemic episode) within the previous 3 months. Also excluded were patients with any history of heart failure, bladder carcinoma, diabetic ketoacidosis or pancreatitis. Patients were not recruited if they were pregnant, breastfeeding or planning a pregnancy over the study period. Concurrent participation in another clinical trial of investigational medicinal product (CTIMP) where the investigational medicinal product (IMP) was currently being taken, without a sufficient washout period (5× half-life of the IMP/potential IMP), was also not permitted.

Participants were identified in primary care and from existing research cohorts. People with type 2 diabetes were eligible if aged 30–80 years on stable doses of metformin alone or metformin plus a sulfonylurea, with HbA1c > 58 mmol/mol (>7.5%) and ≤110 mmol/mol (≤12.2%). Figure 1 shows the design of the trial. Participants provided written informed consent. Ethnicity was self-reported by participants against standard 2011 UK Office for National Statistics coding. Those meeting screening criteria and consenting to take part were randomized to one of the six possible therapy sequences and asked to take each allocated therapy in turn for 16 weeks, with both participants and investigators blinded to therapy allocation. There was no washout between therapies. The 16-week treatment period was designed to minimize any carryover (the effects of the previous treatment on HbA1c in the subsequent period): all three drugs have half-lives between 7 hours and 14 hours, and HbA1c measurement reflects the previous 8–12 weeks of glycemia. Therefore, the end-of-treatment-period HbA1c represented the initial glycemic response to the drug for that individual.

### Randomization and blinding

Randomization was carried out at the baseline visit as described in the study protocol and Statistical Analysis Plan. The three therapies were allocated in random order according to six possible treatment orders: ABC, ACB, BAC, BCA, CAB and CBA. Drugs were blinded by over-encapsulation (Tayside Pharmaceuticals) with allocations blinded to the participants, study team, study researchers and study statistician.
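The six treatment orders are simply the permutations of three drug labels. The sketch below enumerates them and draws one order per participant; the `allocate` helper, its seed and the use of simple (unrestricted) randomization are assumptions made for illustration only. The actual allocation procedure is the one described in the study protocol and Statistical Analysis Plan.

```python
from itertools import permutations
import random

# The six possible treatment orders of drugs labelled A, B and C.
SEQUENCES = ["".join(order) for order in permutations("ABC")]


def allocate(n_participants, seed=0):
    """Hypothetical simple randomization: one treatment order per participant."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    return [rng.choice(SEQUENCES) for _ in range(n_participants)]


allocations = allocate(12)
print(SEQUENCES)
print(allocations)
```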

### Study procedure

Within 2 weeks of screening, participants attended a baseline fasting visit. Subsequent research visits were scheduled to take place after 16–18 weeks of study treatment, but participants were offered the opportunity to stop a treatment early and move on to the next treatment period if they were unable to tolerate the therapy. At the baseline and end-of-therapy visits, blood samples were collected for measurement of HbA1c, weight and blood pressure were measured, and the participant’s experiences of the therapy and potential side effects were recorded (once daily). Participants were compensated for travel expenses only.

### Biochemistry measures

Recruiting centers used local results to confirm eligibility, but all biochemical tests used in analysis, except HbA1c, were centrally analyzed at Exeter Clinical Laboratory. These included albumin, aspartate aminotransferase (AST), bilirubin, NT-pro-BNP, cholesterol, C-peptide, creatinine, fructosamine, glucose, HDL cholesterol, islet autoantibodies (GAD, IA2 and ZnT8), insulin, LDL cholesterol and triglycerides. To ensure standardization across centers, eGFR was calculated using the CKD-EPI equation by the central database, based on serum creatinine, sex, ethnicity and age as collected at baseline. All HbA1c assessment was performed by recruiting center NHS laboratories to ensure that results were available for screening and to inform final patient preference. HbA1c assays were CE marked, fully validated and accredited by the UK Accreditation Service.
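As a rough illustration of the eGFR calculation, the sketch below implements the 2009 CKD-EPI creatinine equation. The text names only "the CKD-EPI equation" with creatinine, sex, ethnicity and age as inputs, so the choice of the 2009 version, the µmol/L input unit and the function names are assumptions rather than details confirmed by the source. The stratum helper uses the eGFR cutoffs from the trial hypotheses.

```python
def ckd_epi_2009(creatinine_umol_l, age_years, female, black=False):
    """eGFR (ml/min/1.73 m2) from the 2009 CKD-EPI creatinine equation (assumed version)."""
    scr = creatinine_umol_l / 88.4          # convert µmol/L to mg/dL
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr / kappa
    egfr = 141 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def egfr_stratum(egfr):
    """Map an eGFR value onto the strata used for hypothesis 2."""
    if egfr < 60:
        return "below inclusion threshold"
    return "60-90" if egfr <= 90 else ">90"


example = ckd_epi_2009(creatinine_umol_l=79.56, age_years=50, female=False)
print(round(example, 1), egfr_stratum(example))
```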

Participants were asked to return their medication bottle and all unused capsules at the end of the study, with adherence in each treatment period expressed as a percentage calculated as number of tablets taken divided by the expected number of tablets to be taken (number of days between study visits). Where pill count was not available, adherence was based on four questions around self-reported compliance (if the patient ever forgot to take their medicine, if they were careless about taking their medicine, if they stopped taking their medicine if they felt unwell and if they stopped taking their medicine if they felt better). Participants were considered to be non-adherent if they answered yes to at least three of the four questions.
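The adherence rules above can be sketched as two small functions. This is a minimal illustration that assumes one capsule per day (consistent with the once-daily dosing described in the study design); the function names and the boolean encoding of the questionnaire answers are hypothetical.

```python
def pill_count_adherence(capsules_taken, days_between_visits, capsules_per_day=1):
    """Adherence (%) = capsules taken / capsules expected between study visits."""
    expected = days_between_visits * capsules_per_day
    if expected <= 0:
        raise ValueError("days_between_visits must be positive")
    return 100.0 * capsules_taken / expected


def self_report_non_adherent(answers):
    """Fallback when no pill count exists: 'yes' to at least three of four questions."""
    if len(answers) != 4:
        raise ValueError("expected answers to exactly four questions")
    return sum(bool(a) for a in answers) >= 3


# Example: 100 capsules taken over a 112-day (16-week) treatment period.
print(round(pill_count_adherence(100, 112), 1))
print(self_report_non_adherent([True, True, True, False]))
```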

### Primary outcome

The primary outcome was the HbA1c value achieved after each treatment period as long as the participant had taken the study drug for at least 12 weeks and had at least 80% adherence on therapy.
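The validity rule for the primary outcome, and the requirement reported in the Results that a participant needs valid values on both drugs being compared, can be sketched as follows. The function names and the dictionary layout are hypothetical. The threshold is coded as "at least 80%" as stated here, although the Results section describes it as ">80%".

```python
def has_valid_primary_hba1c(weeks_on_drug, adherence_pct):
    """An end-of-period HbA1c counts if the drug was taken >= 12 weeks with >= 80% adherence."""
    return weeks_on_drug >= 12 and adherence_pct >= 80


def eligible_for_hypothesis(valid_by_drug, drug_a, drug_b):
    """A participant enters a hypothesis analysis only with valid HbA1c on both drugs."""
    return valid_by_drug.get(drug_a, False) and valid_by_drug.get(drug_b, False)


# Example participant: stopped canagliflozin early, completed the other two periods.
valid = {
    "pioglitazone": has_valid_primary_hba1c(16, 95),
    "sitagliptin": has_valid_primary_hba1c(17, 88),
    "canagliflozin": has_valid_primary_hba1c(6, 90),
}
print(eligible_for_hypothesis(valid, "pioglitazone", "sitagliptin"))   # hypothesis 1
print(eligible_for_hypothesis(valid, "sitagliptin", "canagliflozin"))  # hypothesis 2
```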

### Secondary outcomes

The following secondary outcomes were assessed:

1. Tolerability, defined as taking the drug for at least 12 weeks.

2. Participant-reported side effects, assessed at the end of each treatment period (see Supplementary Table 21 for the full list). For analysis by strata, these were summarized into a binary variable ‘any’ or ‘none’ for each drug for each participant. We defined side effects as any experienced in the treatment periods, including those where they were also reported at baseline.

3. Weight on each therapy, measured at the end of each treatment period.

4. Participant-reported experience of hypoglycemia at the end of each treatment period (binary variable: experienced at least one episode of hypoglycemia versus none). Low blood glucose was defined as either ‘episodes of hypoglycemia where you felt confused, disorientated or lethargic and were unable to treat yourself’ or ‘hypoglycemic episodes where you were unconscious or had a seizure and needed glucagon or intravenous glucose’. At both baseline and subsequent timepoints, number of episodes or experience of hypoglycemia was self-reported and collected on data collection forms.

5. Patient preference of therapy. Participants ranked the three drugs in overall preference: 1 for most preferred, 3 for least preferred.

In line with a change to the Statistical Analysis Plan that we specified before data lock, analysis and unblinding, we did not analyze gender differences as a secondary outcome as our study was powered for a 60:40 split in strata, whereas 73% of our cohort were male.

Adverse events or reactions were recorded as they presented or at research visits and reported to the sponsor and Data Monitoring Committee at regular intervals. Adverse events were rated in terms of severity, seriousness and causality and coded according to MedDRA dictionary terms.

### Changes to protocol

All protocol amendments are detailed in Extended Data Table 8.

### Statistical analysis

All analyses were carried out in line with the TriMaster Statistical Analysis Plan, which was approved before data lock and before drug allocations were provided. Investigation of participant preference, including additional exploratory analysis, is reported separately13.

All analysis was carried out using a validated version of Stata 16.1. In line with the Statistical Analysis Plan, statistical significance was defined as P < 0.05, based on two-sided tests of significance.

#### Effect of stratification by clinical features

Figure 4 shows the overall approach for the primary analysis for the two hypotheses. For each hypothesis and corresponding drug comparison, the aim was to assess whether the difference in achieved HbA1c for the two drugs differed for the two strata (either BMI above or below 30 kg/m² or eGFR above or below 90 ml/min/1.73 m²), the null hypothesis being that the difference in HbA1c between drugs will be the same between strata.

### Rationale for a per-protocol approach

Analysis was carried out using a per-protocol approach. For a participant to be included in the primary analysis, it was necessary to have a valid HbA1c value. An intention-to-treat analysis would require some form of imputation wherever a valid HbA1c value was missing. This is more challenging in a crossover setting30 because parallel group approaches, such as imputing with the baseline, are not valid: the pre-treatment baseline is an appropriate baseline only for the first period. However, we recognize that the missing data could be informative. Therefore, to address this issue, we proposed two further analyses to explore the extent to which the missing HbA1c values could affect the final results: a tipping point analysis (see ‘sensitivity analysis’) and a secondary analysis of tolerability.

### Carryover and period effects

Carryover and period effects were checked before the main analysis. In line with the Statistical Analysis Plan, we examined first-order carryover effects (that is, carryover from the preceding period only) using mixed effects models with drug, period and a carryover variable (that is, drug in previous period) as fixed effects, participant as a random effect and HbA1c as the outcome. The carryover variable used the same coding as the drug variable or a 0 if in the first period (adjustment for period removes this part of the carryover term in analysis). We adjusted for period in the primary analysis by adding it as a fixed effect in the mixed effects models.
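The coding of the first-order carryover variable described above can be illustrated with a short sketch. The function name and the integer drug codes (1, 2, 3) are hypothetical; the text specifies only that the variable takes the previous period's drug code, or 0 in the first period.

```python
from typing import List, Sequence


def carryover_codes(drug_sequence: Sequence[int]) -> List[int]:
    """Return the first-order carryover code for each period.

    Period 1 has no preceding drug and is coded 0; every later period is
    coded with the drug taken in the period immediately before it.
    """
    if not drug_sequence:
        return []
    return [0] + list(drug_sequence[:-1])


# Hypothetical participant randomized to the drug order 2, 3, 1:
# the carryover codes for periods 1 to 3 are 0, 2 and 3.
example = carryover_codes([2, 3, 1])
```

Because the 0 code occurs only in the first period, it coincides with the period-1 indicator, which is why adjusting for period absorbs that part of the carryover term.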

### Primary analysis

For each hypothesis, we calculated the mean (95% CI) difference in HbA1c between the two drugs of interest and the mean (95% CI) difference of these differences (treatment contrasts) between the two strata of interest (Fig. 4). The distribution of the HbA1c difference was checked and confirmed to be normally distributed. For the main analysis, a mixed effects model was used for each hypothesis to allow adjustment for study period, with HbA1c as the outcome, participant as the random effect and drug, period, stratum and drug × stratum interaction as fixed effects. The drug × stratum interaction represented the effect size of interest.
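One way to write the model described above is given below. The notation is ours and is offered as a sketch only: it assumes that drug and stratum are each coded as binary indicators for the two drugs and two strata being compared, that period enters as a categorical fixed effect and that the participant effect is a random intercept.

$$\mathrm{HbA1c}_{ij} = \beta_0 + \beta_1\,\mathrm{drug}_{ij} + \beta_2\,\mathrm{stratum}_{i} + \beta_3\,(\mathrm{drug}_{ij} \times \mathrm{stratum}_{i}) + \gamma_{p(i,j)} + u_i + \varepsilon_{ij}$$

Here $i$ indexes participants and $j$ indexes treatment periods, $\gamma_{p(i,j)}$ is the fixed effect of the period in which the measurement was taken, $u_i \sim N(0, \sigma_u^2)$ is the participant random effect and $\varepsilon_{ij} \sim N(0, \sigma^2)$ is the residual error. Under this coding, $\beta_1$ is the between-drug difference in the reference stratum and $\beta_3$ is the difference in that treatment contrast between strata, which corresponds to the effect size of interest.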

### Pre-specified sensitivity analyses

1. We examined whether substantial amendment to protocol SA6 (expanding the inclusion criteria to include participants treated with metformin alone as well as metformin and sulfonylureas) affected the main findings by adding an ‘epoch’ term to the model, where ‘epoch’ was a binary variable representing before or after the change in inclusion criteria.

2. We repeated the main analysis but included only participants who completed the full treatment period (at least 15 weeks to allow for flexibility in arranging study visits).

3. We examined whether receiving the study drug for >18 weeks (substantial amendment to protocol SA12 in relation to the COVID-19 pandemic) affected the main findings, by adding a binary variable to the model for those with treatment periods greater or less than 18 weeks.

### Tipping point analysis

Because we used a per-protocol approach, a tipping point analysis was used to explore what change in treatment contrast would be required as a result of the missing data to change the statistical significance of the outcome31. The tipping point, Δ, was defined as the value at which the outcome of the study would change at the 5% significance level, calculated by:

$$\Delta = \frac{(1.96 \times \mathrm{SE}) - \tau}{f}$$

where τ is the main effect size from analysis (difference in treatment contrasts between strata), SE is the standard error of this effect size and f is the fraction of the cohort with missing data.
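The formula above translates directly into a small helper. This is a sketch under stated assumptions: the function and argument names are hypothetical, and the input values in the example are invented for illustration and are not trial results.

```python
def tipping_point(tau: float, se: float, missing_fraction: float,
                  z: float = 1.96) -> float:
    """Return Delta = (z * SE - tau) / f, following the formula in the text.

    tau: main effect size (difference in treatment contrasts between strata)
    se: standard error of tau
    missing_fraction: fraction f of the cohort with missing data (0 < f <= 1)
    z: critical value for the 5% two-sided significance level
    """
    if not 0.0 < missing_fraction <= 1.0:
        raise ValueError("missing_fraction must be in the interval (0, 1]")
    return (z * se - tau) / missing_fraction


# Hypothetical inputs: tau = 0.96, SE = 1.0, 10% of the cohort missing.
delta = tipping_point(tau=0.96, se=1.0, missing_fraction=0.1)
```

Because f appears in the denominator, a smaller fraction of missing data gives a larger |Δ|: the participants with missing data would need a more extreme treatment contrast to change the conclusion.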

### Secondary analyses relating to stratification hypotheses

Tolerability: We tested whether the odds of each of tolerability, side effects (any versus none) and hypoglycemia (any versus none) differed by the two hypothesized drug/stratum combinations. Each of these secondary outcomes was a binary variable, so analysis followed the same approach as the primary analysis, with the same predictors but using mixed effects logistic regression models. As before, the drug × stratum interaction represented the effect size of interest, but here the output was an OR because the data were binary.

Weight: We assessed differences in weight by drug/stratum as in the primary analysis hypotheses using similar mixed effects models to those used in primary analysis but with weight as the outcome.

Patient preference: For each hypothesis, we examined whether a patient’s preferred drug differed by strata, comparing the proportions preferring each of the two drugs of interest between strata using the chi-squared test. All other analysis relating to patient preference is reported in an additional paper submitted separately13.

### Secondary analyses of overall differences in outcomes

Overall weight and HbA1c: Mean and s.d. for weight and HbA1c for each of the three drugs were examined, with statistical differences across all three determined using mixed effects models with drug (three-level factor) as the fixed effect and participant ID as the random effect.

Overall side effects: This analysis was descriptive, examining the proportion of participants reporting each of the 16 side effects that they were asked about for each of the three drugs.

Overall tolerability: We report proportions not tolerating therapy (that is, not completing at least 12 weeks of therapy) for each drug. As specified in the Statistical Analysis Plan, we compared tolerability using both a Mantel–Haenszel approach and a mixed effects model. Results were similar using both approaches, but, for clarity, we present only the P values based on the mixed effects model with tolerability as the outcome, drug and period as fixed effects and participant ID as the random effect.

### Sample size

For each hypothesis, to detect a difference of 0.35 s.d. (equivalent to a 3.0-mmol/mol difference between the two strata on the two different therapies) with 90% power and α = 0.05, we required 172 participants in each stratum. To allow for the possibility of uneven numbers in each stratum (up to a 60%:40% split), the sample size was increased to 358. To allow for a withdrawal rate of 15% and exclusion from primary analysis due to discontinuing at least one study drug (estimated at 19%), the sample size was increased to 520.
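The arithmetic behind these figures can be reproduced with the sketch below. It assumes the standard normal-approximation formula for comparing two group means, an inflation factor of 1/(4p(1 − p)) for an uneven split and particular rounding choices. These assumptions reproduce the reported numbers but are not confirmed by the text as the authors' exact calculation, and all variable names are illustrative.

```python
import math
from statistics import NormalDist


def per_stratum_n(effect_sd: float, power: float = 0.90, alpha: float = 0.05) -> int:
    """Participants per stratum to detect a standardized difference (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 1.28 for 90% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_sd ** 2)


n_stratum = per_stratum_n(0.35)                     # 172 per stratum
n_equal = 2 * n_stratum                             # 344 with a 50:50 split

# Inflate for an uneven split of up to 60:40 (rounded to the nearest integer).
p = 0.6
n_uneven = round(n_equal / (4 * p * (1 - p)))       # 358

# Inflate for 15% withdrawal and 19% exclusion from the primary analysis.
n_recruit = math.ceil(n_uneven / ((1 - 0.15) * (1 - 0.19)))  # 520
```

Applying the two loss rates multiplicatively gives a retained fraction of about 0.69, which is what takes the target from 358 to 520 under these assumptions.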

### Differences to Statistical Analysis Plan

For side effects, in the Statistical Analysis Plan we stated that we would examine only new side effects (that is, not previously experienced), but we changed to any side effects after discussion of the presentation of findings. Examining only new side effects would not have allowed us to show change in side effects from baseline. It was apparent that the proportion experiencing some side effects went down on treatment compared to baseline, whereas the proportion experiencing others went up. By analyzing all side effects, we were able to demonstrate this. It also meant that participants could record the same side effect on two different drugs, which would not be possible otherwise. The full distribution of participants reporting side effects for baseline and on each of the therapies is presented for completeness.

### Additional analyses to original Statistical Analysis Plan

There were no major changes to the analysis proposed compared to the original Statistical Analysis Plan, but some minor additional analysis was carried out to explore differences in side effects (any versus none) between drugs and strata, as a way of capturing the overall burden of side effects. A strata-specific analysis for each side effect would entail multiple testing without prior hypotheses and increased likelihood of type 1 errors, so this was deemed inappropriate.

We also report numbers of adverse events for each drug, split by severity and relatedness and whether they were associated with withdrawal or non-tolerability.

A further sensitivity analysis was conducted to explore the impact of residual autocorrelation arising from time trends or treatment carryover on the main effect sizes (drug × strata interactions). The mixed effects models for the primary analysis were extended by defining an exponential autocorrelation structure for the residual errors. This allowed for the pairwise correlation between HbA1c measurements to decrease systematically as the time gap increased and could account for irregularly spaced intervals.
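For illustration, one common parameterization of an exponential residual correlation structure is shown below. The text does not give the exact form used, so this is an assumption, and the symbols are ours.

$$\mathrm{Corr}(\varepsilon_{ij}, \varepsilon_{ik}) = \rho^{\,|t_{ij} - t_{ik}|}, \quad 0 < \rho < 1$$

Here $t_{ij}$ and $t_{ik}$ are the times of two HbA1c measurements for participant $i$. The correlation depends only on the time gap between them and decays toward zero as that gap grows. The gap need not be an integer, so unequally spaced visits are handled naturally.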

Finally, we added scatter plots to show the association on a continuous scale between each of BMI and eGFR and the difference in HbA1c for the two drugs of interest for each hypothesis. We present Pearson correlation coefficients alongside these to show the strength of the associations.

### Missing data

No imputation was carried out, and missing data were minimal. Participants required eGFR and BMI values to be included in the primary analysis, and these were available for all randomized participants. We report n for each analysis and in the tables of results throughout.

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.