Introduction

Two of the most recently introduced anti-hyperglycaemic drug classes, SGLT2-inhibitors (SGLT2i) and GLP1-receptor agonists (GLP1-RA), have been shown in randomized clinical trials not only to reduce glycaemia1 but also to lower the risk of renal and cardiovascular disease (CVD) outcomes among high-risk individuals with type 2 diabetes (T2D)2,3,4,5. Based on average treatment effects reported in placebo-controlled trials, current T2D clinical consensus guidelines recommend a stratified approach to treatment selection, preferentially recommending these drug classes independent of their glucose lowering effect for individuals with cardiovascular or renal comorbidity. Specifically, people with heart failure and/or chronic kidney disease are recommended to initiate SGLT2i and people with prior CVD or high risk for CVD are recommended to initiate either an SGLT2i or a GLP1-RA. In addition, these drugs are recommended as second-line glucose lowering medications to be added after metformin6.

A limitation of the current stratified approach to SGLT2i and GLP1-RA treatment in clinical guidelines is that it is informed by selective trial recruitment strategies, and the consequent accumulation of evidence of treatment benefits only for specific subgroups with or at high risk of cardiorenal disease, rather than by an understanding of how the benefits and risks of each drug class vary across the whole spectrum of T2D. A more comprehensive approach to treatment selection would require recognition of the extreme heterogeneity in the demographic, clinical, and biological features of people with T2D, and the impact of this heterogeneity on drug-specific clinical outcomes. Identification of robust and reproducible patterns of heterogeneous treatment effects is plausible as, at the individual patient level, responses to the same drug treatment appear to vary greatly7. A greater understanding of population-wide heterogeneous treatment effects and enhanced capacity to predict individual treatment responses is needed to advance towards the central goal of precision type 2 diabetes medicine: using demographic, clinical, biological, or other patient-level features to match individuals to their optimal anti-hyperglycaemic regimen as part of routine T2D clinical care.

To assess the evidence base for treatment effect heterogeneity for SGLT2i and GLP1-RA, we undertook a systematic literature review to summarize key findings from studies that specifically examined interactions between individual-level biomarkers and the effects of these drug classes on clinical outcomes. Although biomarkers may connote laboratory-based measurements in traditional contexts, herein we broadly conceptualized biomarkers as individual-level demographic, clinical, and biological features, including both laboratory measures as well as genetic and genomic markers. We focused on three categories of outcomes relevant to T2D care: (1) glycaemic response (as measured by hemoglobin A1c; HbA1c); (2) CVD outcomes; and (3) renal outcomes. Our review was guided by the following research question: In a population with T2D, treated with SGLT2i or GLP1-RA, what are the biomarkers associated with heterogeneous treatment effects in glycaemic, CVD, and renal outcomes? Each of the three outcomes was evaluated separately for each of the two drug classes, for a total of 6 sub-studies. Given the heterogeneity of the T2D population, we anticipated that we would find one or more biomarkers modifying the effects of SGLT2i and GLP1-RA.

The Precision Medicine in Diabetes Initiative (PMDI) was established in 2018 by the American Diabetes Association (ADA) in partnership with the European Association for the Study of Diabetes (EASD). The ADA/EASD PMDI includes global thought leaders in precision diabetes medicine who are working to address the burgeoning need for higher quality, individualized diabetes prevention and care through precision medicine8. This systematic review is written on behalf of the ADA/EASD PMDI as part of a comprehensive evidence evaluation in support of the 2nd International Consensus Report on Precision Diabetes Medicine9.

We find that a majority of the papers identified by our review have methodological limitations precluding robust assessment of treatment effect heterogeneity. For SGLT2-inhibitors, multiple observational studies suggest lower renal function as a predictor of lesser glycaemic response, while markers of reduced insulin secretion predict lesser glycaemic response with GLP1-receptor agonists. For both therapies, multiple post-hoc analyses of randomized controlled trials (including trial meta-analyses) identify minimal clinically relevant treatment effect heterogeneity for cardiovascular and renal outcomes.

Methods

We conducted a systematic review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines10. The protocol was pre-registered (PROSPERO registration number: CRD42022303236). As above, our review was guided by the following research question: In a population with T2D, treated with SGLT2i or GLP1-RA, what are the biomarkers associated with heterogeneous treatment effects in glycaemic, CVD, and renal outcomes?

Search strategy

The search strategy for this review was developed for each drug class (SGLT2i and GLP1-RA) and outcome (glycaemic, cardiovascular, and renal) to capture studies specifically evaluating treatment effect heterogeneity associated with demographic, clinical, and biological features in people with type 2 diabetes. Terms for drug class (SGLT2i or GLP1-RA) and individual generic names of licensed drugs within each class (e.g. ‘empagliflozin’) were included. Potential effect modifiers of interest comprised age, sex, ethnicity, clinical features, routine blood tests, metabolic markers, and genetics; all search terms were based on Medical Subject Headings (MeSH) and are reported in Supplementary Note 1. SGLT2i and GLP1-RA were evaluated at drug class level, and we did not aim to identify within-class heterogeneity in treatment effects. Electronic searches were performed in PubMed and Embase by two independent academic librarians in February 2022. Forwards and backwards citation searching was conducted, but grey literature and white papers were not searched.

Inclusion criteria

To be included, studies were required to meet the following criteria: full-text English-language publications of randomized controlled trials (RCTs), meta-analyses, post-hoc analyses of RCTs, pooled cohort analyses, or prospective and retrospective observational analyses published in peer-reviewed journals; adult populations with type 2 diabetes taking at least one of either SGLT2i or GLP1-RA with sample size >100 for the active drug of interest; at least a 4-month potential follow-up period (chosen pragmatically as a suitable length of time over which changes in glycaemic response could be assessed) after initiation of the drug class of interest; RCTs required a comparison against placebo or an active comparator anti-hyperglycaemic drug (observational studies did not require a comparator group); a pre-specified aim of the study must be to examine heterogeneity in treatment outcome, such as biomarker-treatment interactions, stratified analyses, or heterogeneity-focused machine learning approaches; and the study must report differential effects of the drug class on an outcome of interest (see Outcomes section below) with respect to a biomarker. All individual trial or observational cohorts included in a meta-analysis or pooled cohort analysis must have met the inclusion criteria stated above.

We further excluded studies based on the following criteria: studied type 1 or other forms of non-type 2 diabetes; included children/minors; inpatient studies; conference proceeding abstracts, editorials, opinion papers, book chapters, clinical trial registries, case reports, commentaries, narrative reviews, or non-peer reviewed studies; did not adequately adjust for confounders (individual RCTs and observational studies only; this criterion was not applied to meta-analyses and pooled cohort analyses); did not address the question of treatment response heterogeneity for biomarkers of interest.

Titles and abstracts were independently screened by pairs of research team members to identify potentially eligible studies; these were then independently evaluated for inclusion in the full-text review. Any discrepancies were discussed with a third author until consensus was reached, and were also reviewed in larger group meetings to ensure consistency in decisions across reviewer pairs.

Data extraction and quality assessment

Pairs of authors independently reviewed the main reports and supplementary materials and extracted the following data for each of the included papers: publication (PMID, journal, publication year, first author, title, study type); study (setting and region, study time period, follow up period); population (overall characteristics, ethnicity); intervention (drug class, specific therapies, treatment/comparator arm sizes); statistical analysis (outcome, outcome measurement, subgroups/predictors analysed with respect to biomarkers, statistical model, covariate set); and results (relevant figures and tables, main findings, methodology, quality). Covidence systematic review software11 was used for data extraction.

After data were extracted, information was synthesized by drug class and outcome and further examined by biomarkers or subgroups analyzed within each study. Results were extracted within these subsections and summarized for each paper, where general trends in results for each subsection were outlined.

Risk of bias evaluations were conducted alongside the data extraction by each pair of authors, using the Joanna Briggs Institute (JBI) Critical Appraisal Tool for Cohort Studies12 for all included research papers. This was used to determine the extent of bias within the study’s design, execution, and analysis, specifically within the population, outcome measurements, and statistical modelling. The Cohort studies tool was applied for all studies as we did not identify any individual RCTs designed to specifically examine treatment effect heterogeneity, and all included RCT meta-analyses represent post-hoc rather than pre-specified analyses. Further detail on the risk of bias can be seen in Supplementary Figs. 1 and 2. Additionally, the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) framework13,14 was applied at the outcome level for each drug class to determine the quality of evidence and certainty of effects for these subsections; an overall GRADE evaluation for all evidence was also provided.

Outcomes

Three outcome categories were assessed in the included studies: (1) changes in HbA1c from baseline associated with treatment; (2) CVD outcomes limited to cardiovascular (CV)-related death, non-fatal myocardial infarction, non-fatal stroke, hospitalization for angina, coronary artery bypass graft, percutaneous coronary intervention, hospitalization for heart failure, carotid endarterectomy, and peripheral vascular disease; and (3) renal outcomes including development of chronic kidney disease (including end-stage renal disease, ESRD), and longitudinal changes in markers of renal function including eGFR/creatinine and albuminuria. Specific measurements and procedures for each category of outcome varied across the included studies. Summaries of the included papers assessing each outcome for each drug class are reported in Supplementary Tables 1-8.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Literature search and screening results

Figures 1 and 2 depict the outcomes of the study screening processes for SGLT2i (Fig. 1) and GLP1-RA (Fig. 2).

Fig. 1: Study screening and attrition flow diagram (PRISMA) for SGLT2-inhibitor studies.

Fig. 2: Study screening and attrition flow diagram (PRISMA) for GLP1-receptor agonist studies.

For SGLT2i, a total of 3415 unique citations underwent title and abstract screening. A total of 3076 were determined to not meet the pre-defined eligibility criteria. The remaining 339 full-text articles were screened, through which process 238 articles were excluded. The most common reasons for exclusion were: studies did not report on the heterogeneity of treatment response (126 studies), studies reported only univariate or unadjusted associations (41 studies), and studies did not meet inclusion criteria (64 studies). In total, 101 studies were identified for inclusion based on the systematic search.

For GLP1-RA, a total of 2270 unique citations underwent title and abstract screening. 2109 were determined to not meet the pre-defined eligibility criteria. The remaining 161 full-text articles were screened, through which process 86 articles were excluded. The most common reasons for exclusion were: studies did not meet inclusion criteria (39 studies), studies reported only univariate or unadjusted associations (26 studies), and studies did not report on the heterogeneity of treatment response (17 studies). In total, 75 studies were identified for inclusion.
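The screening counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, not part of the review's methods; the class and field names are hypothetical and only the counts are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningFlow:
    """Counts from one PRISMA-style screening flow (field names are illustrative)."""
    screened: int               # unique citations at title/abstract screening
    excluded_at_abstract: int   # did not meet eligibility criteria at screening
    excluded_at_full_text: int  # excluded after full-text review

    @property
    def full_text_reviewed(self) -> int:
        # Articles that progressed to full-text review
        return self.screened - self.excluded_at_abstract

    @property
    def included(self) -> int:
        # Articles remaining after full-text exclusions
        return self.full_text_reviewed - self.excluded_at_full_text


# Counts as reported in the text for each drug class
sglt2i = ScreeningFlow(screened=3415, excluded_at_abstract=3076, excluded_at_full_text=238)
glp1ra = ScreeningFlow(screened=2270, excluded_at_abstract=2109, excluded_at_full_text=86)

for name, flow in (("SGLT2i", sglt2i), ("GLP1-RA", glp1ra)):
    print(f"{name}: full text reviewed = {flow.full_text_reviewed}, included = {flow.included}")
```

Under these counts the flows reconcile with the reported totals of 339 and 101 (SGLT2i) and 161 and 75 (GLP1-RA). The listed exclusion reasons are only the most common ones, so they are not expected to sum to the total number of full-text exclusions.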

Description of included studies

Included studies for CVD and renal outcomes were predominantly secondary analyses of industry-funded placebo-controlled trials (RCT), or meta-analyses of these trials, with a smaller number of observational studies. For glycaemic outcomes, most studies were observational. Supplementary Tables 1-8 show all included studies for GLP1-RA and SGLT2i, split by glycaemic, CVD, and renal outcomes, and including information on study population size, examined biomarkers, and notable findings. Summaries of the major individual RCTs that were included in meta-analyses are detailed in Supplementary Tables 9 and 10.

SGLT2i, GLP1-RA, and glycaemic outcomes

Study quality for assessment of heterogeneous treatment effects of both drug classes was variable, and strong methodological limitations were common in studies of predictors of glycaemic treatment response. A core weakness of many studies was a lack of head-to-head comparisons between therapies, which are required to separate broader prognostic factors (that predict response to any glucose-lowering therapy) from drug-specific factors that are associated with differential treatment response. Put otherwise, even when data suggested that a biomarker was associated with glycaemic response, the lack of an active comparator meant it was not clear whether this factor was helpful for choosing between therapies.

Other common methodological weaknesses included the use of arbitrary subgroups (rather than the assessment of continuous predictors), small numbers in comparator subgroups that limited statistical power, dichotomized outcomes (responder analysis), multiple testing, and lack of adjustment for key potential confounders.

SGLT2i

Of 27 studies that met our inclusion/exclusion criteria, 9 observational studies (usually retrospective analyses of healthcare records), 5 post-hoc analyses of individual RCTs, 10 pooled analyses of individual data from multiple RCTs, and 3 RCT meta-analyses were included (Supplementary Table 7). All included studies assessed routine clinical characteristics and routinely measured clinical biomarkers (Table 1). No pharmacogenetic studies were identified and, with the exception of one study of HOMA-B15, no studies of non-routine biomarkers were identified.

Table 1 Summary of evidence for treatment effect heterogeneity for SGLT2-inhibitor and GLP1-receptor agonist therapies for glycaemic outcomes.

A key finding across multiple studies, including appropriately adjusted analyses of RCT and observational data, was that HbA1c reduction with SGLT2i is substantially smaller at lower eGFR16,17,18,19,20,21,22. In pooled RCT data22 for canagliflozin 300 mg, 6-month HbA1c reduction was estimated to be 11.0 mmol/mol for participants with eGFR ≥90 mL/min/1.73 m2, compared to 6.7 mmol/mol for those with eGFR 45-60 mL/min/1.73 m2. For empagliflozin 25 mg, a further study19 reported a 6-month HbA1c reduction of 9.6 mmol/mol at eGFR ≥90 mL/min/1.73 m2, and 4.3 mmol/mol at eGFR 30-60 mL/min/1.73 m2.
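Because this review reports HbA1c changes both in mmol/mol (IFCC units) and in % (NGSP units), it can help to put the eGFR-stratified reductions above on a common scale. The sketch below is illustrative only and is not an analysis from the included studies. It uses the slope of the IFCC-NGSP master equation (NGSP % = 0.09148 × IFCC mmol/mol + 2.152); the intercept cancels when converting a change rather than an absolute value. The function names are hypothetical.

```python
# Slope of the IFCC-NGSP master equation: % HbA1c per mmol/mol.
# The intercept (2.152) cancels for differences, so only the slope is needed.
IFCC_TO_NGSP_SLOPE = 0.09148


def hba1c_change_to_percent(delta_mmol_mol: float) -> float:
    """Convert an HbA1c change from mmol/mol to NGSP %-units."""
    return delta_mmol_mol * IFCC_TO_NGSP_SLOPE


def relative_attenuation(reference: float, reduced: float) -> float:
    """Fraction by which a response is smaller than the reference response."""
    return 1.0 - reduced / reference


# 6-month HbA1c reductions (mmol/mol) as quoted in the text:
# (higher eGFR stratum, lower eGFR stratum)
reported = {
    "canagliflozin 300 mg": (11.0, 6.7),
    "empagliflozin 25 mg": (9.6, 4.3),
}

for drug, (high_egfr, low_egfr) in reported.items():
    print(
        f"{drug}: {hba1c_change_to_percent(high_egfr):.2f}% vs "
        f"{hba1c_change_to_percent(low_egfr):.2f}% "
        f"({relative_attenuation(high_egfr, low_egfr):.0%} smaller at lower eGFR)"
    )
```

The attenuation figures are simple ratios of the quoted point estimates and carry no uncertainty estimate. The lower eGFR strata also differ between the two drugs, so the two percentages are not directly comparable.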

A further finding confirmed by multiple robust studies is that, in keeping with other glucose-lowering agents, higher baseline HbA1c is associated with greater HbA1c lowering with SGLT2 inhibitors, including versus placebo15,21,23,24,25,26,27. Active comparator studies suggested that higher baseline HbA1c may predict greater relative HbA1c response to SGLT2i therapy in comparison to DPP4i and sulfonylurea therapy15,25,26. Notably, an individual participant data meta-analysis of two RCTs showed greater improvement with empagliflozin (6-month HbA1c decline per unit higher baseline HbA1c [HbA1c slope] −0.49% [95%CI −0.62, −0.37]) compared to sitagliptin (6-month HbA1c slope −0.29% [95%CI −0.42, −0.15]) and glimepiride (12-month HbA1c slope: empagliflozin −0.52% [95%CI −0.59, −0.44]; glimepiride −0.32% [95%CI −0.39, −0.25])25.
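To make the HbA1c slopes above concrete, the following minimal sketch shows how a difference in slopes translates into a widening gap between two drugs as baseline HbA1c rises. It is a worked illustration using the quoted point estimates only; it ignores the confidence intervals and the model intercepts, so it describes how the between-drug difference changes with baseline HbA1c, not the absolute response. The function name is hypothetical.

```python
def extra_decline(slope_a: float, slope_b: float, baseline_increase: float) -> float:
    """Additional HbA1c change (%) with drug A relative to drug B for a given
    increase in baseline HbA1c (%), based on the difference in slopes."""
    return (slope_a - slope_b) * baseline_increase


# 6-month slopes quoted in the text (% HbA1c change per 1% higher baseline HbA1c)
EMPAGLIFLOZIN_SLOPE = -0.49
SITAGLIPTIN_SLOPE = -0.29

for increase in (1.0, 2.0):
    gap = extra_decline(EMPAGLIFLOZIN_SLOPE, SITAGLIPTIN_SLOPE, increase)
    print(f"Baseline HbA1c +{increase:.0f}%: empagliflozin vs sitagliptin gap changes by {gap:+.2f}%")
```

On these point estimates, each 1% higher baseline HbA1c corresponds to roughly 0.2% more HbA1c lowering with empagliflozin relative to sitagliptin.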

A number of studies assessing differences in glycaemic response to SGLT2i by ethnicity suggest that initial glycaemic response to this medication class does not vary by ethnicity28,29,30,31,32. Similarly, many studies also showed that response did not vary meaningfully by sex. Some studies suggested older age may be associated with reduced glycaemic response; however, analyses usually did not adjust for eGFR, which may confound this association as eGFR declines with age17,23,32,33,34,35.

GLP1-RA

Of 49 studies that met our inclusion/exclusion criteria, 24 observational studies, 6 post-hoc analyses of individual RCTs, and 19 meta-analyses were included (Supplementary Table 8). The majority of included studies assessed routine clinical characteristics and routinely measured clinical biomarkers, although 3 studies evaluated genetic variants, and 15 studies evaluated non-routine biomarkers (Table 1).

Studies consistently identified baseline HbA1c as a predictor of greater HbA1c response. For other clinical features, the strongest evidence was that, in many observational studies, markers of lower insulin secretion (including longer diabetes duration [or proxies such as insulin treatment], lower fasting C-peptide, lower urine C-peptide-to-creatinine ratio, and positive glutamic acid decarboxylase (GAD) or islet antigen 2 (IA2) islet autoantibodies) were associated with lesser glycaemic response to GLP1-RA36,37,38,39,40,41,42,43,44,45,46,47,48,49. One large prospective study (n=620) observed clinically relevant reductions in HbA1c response with GLP1-RA in individuals with GAD or IA2 autoantibodies (mean HbA1c reduction −5.2 vs. −15.2 mmol/mol without autoantibodies) or C-peptide <0.25 nmol/L (mean HbA1c reduction −2.1 vs. −15.3 mmol/mol with C-peptide >0.25 nmol/L). In contrast, post-hoc RCT analyses have found T2D duration50 and beta-cell function51,52 do not modify glycaemic outcomes. This may reflect trial inclusion criteria as included participants had relatively higher beta-cell function, and were less-commonly insulin-treated, compared with the observational cohorts51.

Few studies contrasted HbA1c outcome for GLP1-RA versus a comparator drug. One meta-analysis showed a greater HbA1c reduction with the GLP1-RA liraglutide compared to other antidiabetic drugs (sitagliptin, glimepiride, rosiglitazone, exenatide, and insulin glargine) across all baseline HbA1c categories (n = 1804)53, a finding supported for the GLP1-RA dulaglutide compared to glimepiride and insulin glargine54.

Overall, there was no consistent evidence for effect modification by body mass index (BMI), sex, age or kidney function, with studies reporting contrasting, or null, associations for these clinical features39,40,44,45,46,50,54,55,56,57,58,59,60,61,62,63,64. In comparative analysis, one large observational study found that markers of insulin resistance (including higher HOMA-IR, BMI, fasting triglycerides, and HDL) do not alter GLP1-RA response, but are associated with lesser DPP4-inhibitor response57.

There was limited evidence for differences by ethnicity. One large pooled RCT analysis (N = 2355) suggested greater HbA1c response in Asian participants compared to those of other ethnicities, but other studies have not identified differences in response across ethnic groups65,66,67,68. Similarly, limited studies evaluated pharmacogenetics, although two small studies suggest variants rs163184 and rs10305420, but not rs3765467, may be associated with lesser response in Chinese patients43,69.

SGLT2i, GLP1-RA and cardiovascular outcomes

SGLT2i: Evidence from clinical trials

Of 65 studies, 58 were post-hoc analyses of RCTs or meta-analyses of multiple RCTs. Heart failure was common as a secondary outcome. The majority of studies were derived from EMPA-REG70 and the CANVAS program71, although more recent meta-analyses included up to 12 cardiovascular outcome trials (CVOTs) with different inclusion criteria, treatments, primary outcomes, and follow-up duration (Supplementary Table 9). Most studies included only participants with established CVD or elevated cardiovascular risk, although some studies were restricted to patients with pre-existing heart failure or chronic kidney disease. While most CVOTs and meta-analyses included only patients with type 2 diabetes, some meta-analyses also included data from patients without diabetes in the EMPEROR-P72, EMPEROR-R73, DAPA-HF74 and DAPA-CKD75 RCTs. Studies primarily focused on relative rather than absolute treatment effects and on one of two primary outcomes: 3-point MACE, a composite of cardiovascular death, non-fatal MI, and non-fatal stroke; or composite heart failure outcomes including hospitalized heart failure and cardiovascular death. The longest duration of follow-up was in the CANVAS CVOT with a median follow-up of 5.7 years, while most other included CVOTs had durations of 1 to 4 years.

On average, in relative terms, SGLT2i reduce the risk of cardiovascular disease (MACE) by 10% (HR 0.90 [95%CI 0.85, 0.95]), and heart failure hospitalization by 32% (HR 0.68 [95%CI 0.61, 0.76]) in individuals with or at high-risk of CVD2. The majority of meta-analyses of CVOTs found no significant interactions for MACE or heart failure outcomes across a variety of biomarkers (Table 2; Supplementary Table 1). Several meta-analyses found no interactions by age, sex, and adiposity for MACE or heart failure outcomes. Four meta-analyses examined interactions by race for MACE outcomes and found no interactions. Three meta-analyses consistently identified a greater relative heart failure benefit of SGLT2i in people of Black and Asian ethnicity76,77,78 (HR SGLT2i versus placebo 0.60 [95% CI 0.47, 0.74]) compared to White individuals (HR 0.82 [95% CI 0.73, 0.92])76; however, one meta-analysis reported no difference between White and non-White individuals79.
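The relationship between a hazard ratio and the percentage risk reductions quoted above, and the idea of comparing two subgroup estimates, can be sketched as follows. This is an illustrative calculation only and is not the method used by the cited meta-analyses: it recovers the standard error of each log hazard ratio from its 95% confidence interval and applies a normal-approximation test to the difference between two subgroup estimates, assuming the subgroups are independent and the intervals are symmetric on the log scale. All function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def relative_risk_reduction(hr: float) -> float:
    """Percentage relative risk reduction implied by a hazard ratio."""
    return (1.0 - hr) * 100.0


def log_hr_se(lower: float, upper: float) -> float:
    """Standard error of log(HR), recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)


def interaction_test(hr_a, ci_a, hr_b, ci_b):
    """Compare two independent subgroup hazard ratios.

    Returns (ratio of hazard ratios, z statistic, two-sided p-value).
    """
    diff = math.log(hr_a) - math.log(hr_b)
    se_diff = math.hypot(log_hr_se(*ci_a), log_hr_se(*ci_b))
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return math.exp(diff), z, p


# Average effects quoted in the text
print(f"MACE: {relative_risk_reduction(0.90):.0f}% relative reduction")
print(f"Heart failure hospitalization: {relative_risk_reduction(0.68):.0f}% relative reduction")

# Heart failure subgroup estimates quoted in the text (SGLT2i versus placebo)
ratio, z, p = interaction_test(0.60, (0.47, 0.74), 0.82, (0.73, 0.92))
print(f"Black/Asian vs White: ratio of HRs = {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```

A calculation of this kind is only a rough check on published subgroup estimates. It does not account for multiple testing or between-trial heterogeneity, and a formal assessment would use the interaction tests reported by the meta-analyses themselves.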

Table 2 Summary of evidence for treatment effect heterogeneity for SGLT2-inhibitor and GLP1-receptor agonist therapies for cardiovascular outcomes (including heart failure).

Contemporary meta-analysis incorporating the CREDENCE and VERTIS-CV trials alongside EMPA-REG, CANVAS, and DECLARE suggests history of CVD does not modify the efficacy of SGLT2i for MACE2,80. One meta-analysis suggests heart failure severity modifies the efficacy of SGLT2i for the heart failure outcome (composite outcome of cardiovascular death or hospitalization for heart failure), with greater efficacy in patients with NYHA heart failure class II (HR SGLT2i versus placebo 0.66 [95%CI 0.59, 0.74]) than class III or IV (HR 0.86 [95%CI 0.75, 0.99])77. Other meta-analyses that examined treatment effect heterogeneity using heart failure history as a binary predictor did not find significant interactions2,81.

A recent meta-analysis82 that included 6 CVOTs of patients with diabetes and 4 CVOTs of patients with and without diabetes found that eGFR did not alter the relative benefit of SGLT2 inhibitors for MACE and heart failure outcomes;2,77,81,83,84,85 however, a greater relative benefit was reported for MACE in those with higher baseline albuminuria (ACR >300 mg/g HR 0.74 [95%CI 0.66, 0.84]; ACR 30-300 mg/g HR 0.95 [95%CI 0.82, 1.10]; ACR <30 mg/g HR 0.87 [95%CI 0.77, 0.98]).

We identified many secondary analyses of single CVOTs, which largely found no interactions by biomarkers (Supplementary Table 1). Single studies identified potential effect modification for MACE by history of CVD86 and obesity87, and for heart failure outcomes by history of heart failure88, but these associations were not replicated across the other studies or in multi-RCT meta-analyses. In a secondary analysis of CANVAS, participants with higher levels of biomarkers of cardiovascular stress (high-sensitivity cardiac troponin T [hs-cTnT], soluble suppression of tumorigenesis-2 [sST2], and insulin-like growth factor binding protein 7 [IGFBP7]) had greater relative benefit for MACE; for a multimarker score summing high levels of these 3 biomarkers, the relative effect of SGLT2i was HR 0.99 [95%CI 0.66, 1.49] with no abnormal biomarkers, HR 1.34 [95%CI 0.94, 1.89] with 1 abnormal biomarker, HR 0.61 [95%CI 0.45, 0.82] with 2 abnormal biomarkers, and HR 0.46 [95%CI 0.18, 1.17] with 3 abnormal biomarkers (P for interaction trend = 0.005)89. Unlike meta-analyses, studies based on single RCTs typically performed multivariable adjustment for potential confounders.

GLP1-RA: Evidence from clinical trials

Of the 35 studies that investigated heterogeneity in the effect of GLP1-RAs on cardiovascular health and met our inclusion criteria, 15 were meta-analyses of RCTs or pooled analyses of multiple RCTs, 15 were post-hoc analyses of RCTs, and 5 were observational studies (Supplementary Table 2). Most studies used data collected from the LEADER, SUSTAIN 6, and EXSCEL trials; in total, however, data from 7 CVOTs were used (Supplementary Table 10). The majority of these studies investigated the effect of previous CVD on the cardiovascular efficacy of GLP1-RAs using 3-point MACE as a primary outcome, with heart failure being a common secondary outcome, and focused on relative rather than absolute benefit. The population of 6 of the 7 CVOTs had established CVD or high CVD risk. The CVOT with the longest median follow-up was REWIND with a median follow-up of 5.4 years, and the median follow-up of the other CVOTs ranged from 1 to 4 years.

Contemporary meta-analysis data suggest GLP1-RA reduces the relative risk of cardiovascular disease (MACE) by 14% (HR 0.86 [95%CI 0.80, 0.93]), and heart failure hospitalization by 11% (HR 0.89 [95%CI 0.82, 0.98]) compared to placebo3. Several large meta-analyses examining heterogeneous treatment effects in placebo-controlled CVOTs have been conducted for GLP1-RA76,83,84,90,91,92,93,94,95,96,97, with the majority of studies focusing on whether prior established CVD modifies the relative effect of GLP1-RA on MACE and/or heart failure. Two meta-analyses reported the relative MACE benefit of GLP1-RA may be restricted to those with established CVD83,90, the largest of which included 7 RCTs and reported a 14% relative risk reduction with GLP1-RA specific to individuals with established CVD (with CVD: HR 0.86 [95%CI 0.80, 0.93]; at high-risk of CVD: HR 0.94 [95% CI 0.82, 1.07])83. However, this risk difference is not conclusive and has not been replicated in other meta-analyses and pooled RCT analyses91,92,93,98,99, including an individual participant level re-analysis of the SUSTAIN and PIONEER RCTs which evaluated baseline CVD risk as a continuous rather than subgroup-level variable100.

Differential relative effects of GLP1-RAs on MACE have been reported by ethnicity in two out of three meta-analyses:76,83,90 one showed a benefit of GLP1-RA treatment compared to placebo in Asian (HR 0.76 [95%CI 0.61, 0.96]) and Black (HR 0.77 [95%CI 0.59, 0.99]) individuals, but not in White individuals (HR 0.95 [95%CI 0.88, 1.02]);90 the second showed a significantly greater benefit of GLP1-RA for MACE in Asian compared to White individuals (HR Asian 0.68 [95%CI 0.53, 0.84]; White 0.87 [95%CI 0.81, 0.94])76. For other clinical features including sex, BMI/obesity, baseline kidney disease, duration of diabetes, baseline HbA1c, background glucose lowering medications, and prior history of microvascular disease, the overall body of evidence from meta-analyses does not provide robust evidence to support differential effects of GLP1-RA on CVD outcomes (Table 2).

SGLT2i and GLP1-RA: Evidence from observational studies

Ten observational studies met our inclusion criteria, with studies primarily reporting relative rather than absolute risk differences101,102,103,104,105,106,107,108,109,110. These studies, which compared SGLT2i and GLP1-RA individually with other oral therapies (predominantly DPP4i), generally reported average relative benefits for CVD and heart failure outcomes in line with placebo-controlled trials, with no consistent pattern of subgroup level differences across studies (Supplementary Tables 1 and 2).

A few observational studies compared SGLT2i and GLP1-RA CVD outcomes. In a US claims-based study with follow-up to two years (n = 47,343), Htoo et al. 106 reported a higher relative risk of MACE with SGLT2i compared to GLP1-RA specific to individuals without CVD and heart failure (Relative risk [RR] 1.31 [95% CI 1.09, 1.56]), and a higher risk of stroke with SGLT2i versus GLP1-RA specific to individuals without CVD (No CVD without heart failure: RR 1.62 [95%CI 1.10, 2.38]; No CVD with heart failure: RR 3.30 [95%CI 1.22, 8.97]). In contrast, over a median follow-up of 7 months, Patorno et al. 105 reported a lower relative risk of myocardial infarction with SGLT2i compared to GLP1-RA in US claims data specific to individuals with a history of CVD (n=156,825; HR 0.83 [95%CI 0.74, 0.93] with history of CVD; HR 1.13 [95%CI 1.00, 1.28] without history of CVD), with no differences in stroke outcomes irrespective of CVD status. Both studies reported a consistent benefit of SGLT2i over GLP1-RA for heart failure. Raparelli et al. 102 identified potential differences by sex in the Truven Health MarketScan database (n=167,341): compared to sulfonylureas and over a median follow-up of 4.5 years, there was a greater relative reduction with GLP1-RA for females (HR 0.57 [95%CI 0.48, 0.68]) compared to males (HR 0.82 [95%CI 0.71, 0.95]), but a similar benefit for both sexes with SGLT2i (females HR 0.58 [95%CI 0.57, 0.83]; males HR 0.69 [95%CI 0.57, 0.83]).

SGLT2i, GLP1-RA, and renal outcomes

SGLT2i: Evidence from clinical trials

A total of 29 studies met our inclusion criteria. These included 20 post-hoc analyses of individual RCTs, 7 trial meta-analyses (Supplementary Table 4), and 2 analyses of observational data. All of the post-hoc RCT analyses and all but 1 of the meta-analyses used only data from the 12 SGLT2i cardiovascular/renal RCTs shown in Supplementary Table 9, which therefore provided most of the evidence in our review. These trials included people with type 2 diabetes with and without pre-existing cardiovascular disease, and had composite renal endpoints incorporating two or more of the following (which differed between trials): changes in eGFR/serum creatinine, end-stage renal disease, changes in urine albumin:creatinine ratio (ACR), and/or death from renal causes. Most studies assessed routine clinical characteristics, especially renal function as measured by eGFR or urine ACR or a combination of both. In addition, 4 post-hoc RCT analyses examined non-routine plasma biomarkers. We found no genetic studies (Table 3).

Table 3 Summary of evidence for treatment effect heterogeneity for SGLT2-inhibitor and GLP1-receptor agonist therapies for renal outcomes.

On average, SGLT2i have a relative benefit for a number of renal outcomes including kidney disease progression (HR 0.63 [95%CI 0.58, 0.69]) and acute kidney injury (HR 0.77 [95%CI 0.70, 0.84])4. Placebo-controlled trial meta-analyses of subgroups found no evidence for heterogeneity of SGLT2i treatment effects on relative renal outcomes by age79, use of other glucose-lowering drugs79, use of blood pressure/cardiovascular medications79,111, blood pressure79, BMI79, diabetes duration79, White race79, history of cardiovascular disease or heart failure2,80 or sex79.

For baseline eGFR, an early meta-analysis that included EMPA-REG, CANVAS, and DECLARE reported a greater effect of SGLT2i on renal outcomes in those with higher eGFR112, but both a later meta-analysis that added CREDENCE111 and a recent meta-analysis that added two further studies (SCORED and DAPA-CKD, including some participants without diabetes)82 showed no effect of baseline eGFR on renal outcomes with SGLT2i. For urine ACR, meta-analyses of subgroups found no evidence for greater SGLT2i effect with higher UACR2,82,111,113. Single RCTs found no heterogeneity of treatment effect by eGFR or UACR, or by subgroups defined by the combination of these two114,115,116,117,118, with the exception of Neuen et al. 119, which showed a greater SGLT2i effect in preventing eGFR decline relative to placebo for those with higher UACR, and heterogeneity in a composite renal outcome by UACR. Overall, there was limited or no evidence to support modifying effects of baseline eGFR or UACR on the effect of SGLT2i on renal function outcomes.

A few post-hoc analyses of the CANVAS RCT considered non-routine biomarkers, with most showing no interaction with SGLT2i treatment and renal outcomes. Two RCTs studied the effect of SGLT2i on renal outcomes at differing plasma IGFBP7 levels. One study reported an interaction of IGFBP7 with SGLT2i treatment for progression of albuminuria (>96.5 ng/ml HR 0.64; <=96.5 ng/ml HR 0.95, Pinteraction = 0.003)120 but no effect was seen for the composite renal endpoint in two studies89,120. The biomarker panel (sST2, IGFBP7, hs-cTnT) that showed a strong interaction with SGLT2i for MACE outcomes (see above) did not show any interaction for renal outcomes89.

GLP1-RA: Evidence from clinical trials

Seven studies met our inclusion criteria: all were post-hoc RCT analyses, 6 of individual trials (or multiple trials analysed separately) and 1 a pooled analysis of two RCTs (Supplementary Table 5). These studies used data from 5 of the 7 GLP1-RA cardiovascular outcome trials shown in Supplementary Table 10, with renal outcomes only a secondary endpoint. Most of these trials had composite renal endpoints as per the SGLT2i cardiovascular/renal trials, while some examined changes in either eGFR or urine ACR only. All studies assessed routine clinical characteristics, especially renal function as measured by eGFR or urine ACR. No studies of genetics or non-routine biomarkers were identified (Table 3). The overall sample sizes were small and the subgroup analyses were underpowered to detect a subgroup by treatment interaction for renal outcomes.

Overall, GLP1-RA reduce the relative risk of albuminuria over 2 years by 24% versus placebo (HR 0.76 [95% CI 0.73-0.80]; P < 0.001), and similarly reduce the relative risk of a 40% reduction in eGFR (HR 0.86 [95% CI 0.75-0.99]; P = 0.039)5. Studies found no heterogeneity of GLP1-RA relative treatment effect by age121, blood pressure122,123, diabetes duration124, history of cardiovascular disease/heart failure122,125 or use of RAS inhibitors122. For BMI, a post-hoc analysis of EXSCEL (Exenatide) found a greater GLP1-RA effect on reducing rate of eGFR decline in those with lower BMI (BMI ≤ 30 kg/m2 treatment difference 0.26 mL/min/1.73 m2/year [95% CI 0.04, 0.48] vs BMI > 30 kg/m2 −0.12 [95% CI −0.26, 0.03], Pinteraction = 0.005)122. However, Verma et al.126 found no significant interaction by BMI subgroup with GLP1-RA treatment for a composite renal outcome in LEADER (Liraglutide) or SUSTAIN 6 (Semaglutide).

For baseline eGFR, a pooled analysis of LEADER and SUSTAIN-6 reported a significant interaction, with lower eGFR associated with greater GLP1-RA effect in reducing eGFR decline (Semaglutide 1.0 mg vs placebo, eGFR < 60 difference in decline 1.62 ml/min/1.73 m2/year vs eGFR ≥ 60 difference in decline 0.64 ml/min/1.73 m2/year, Pinteraction = 0.057; Liraglutide 1.8 mg vs placebo, eGFR < 60 difference in decline 0.67 ml/min/1.73 m2/year vs eGFR ≥ 60 difference in decline 0.15 ml/min/1.73 m2/year, Pinteraction = 0.008)5. However, a study of Exenatide LAR found no treatment heterogeneity for this same outcome by eGFR category122, and a further analysis of LEADER that used the renal composite endpoint reported no interaction by baseline eGFR category127. The overall evidence does not support an effect of baseline eGFR on the relative renal benefit for GLP1-RA as an overall drug class.

For baseline UACR, a pooled analysis of LEADER and SUSTAIN-65 and EXSCEL122 showed a greater benefit of GLP1-RA on eGFR reduction or eGFR slope with higher UACR; however, there was either no significant interaction5 or no formal interaction test was reported122. For ELIXA, Muskiet et al. 128 did not find a significant interaction of UACR category on eGFR decline. A further study found no association between UACR and GLP1-RA effect on reducing a composite renal outcome127.

Two studies found that GLP1-RAs more effectively reduced UACR in those with higher UACR. In a pooled analysis of LEADER and SUSTAIN-6, those with normal albuminuria had a 20% (95%CI 15–25%) reduction in UACR compared to placebo; those with microalbuminuria had a 31% (95%CI 25–37%) reduction; those with macroalbuminuria had a 19% (95%CI 7–30%) reduction; Pinteraction = 0.0215. In ELIXA, least-squares mean percentage change in UACR was –1.69% (SE 5.10; 95% CI –11.69 to 8.30; p = 0.7398) in participants with normoalbuminuria, –21.10% (SE 10.79; 95% CI –42.25 to 0.04; p = 0.0502) in participants with microalbuminuria, and –39.18% (SE 14.97; 95% CI –68.53 to –9.84; p = 0.0070) in participants with macroalbuminuria in favour of lixisenatide; a formal test for interaction was not reported128. A third study found no treatment heterogeneity for this same outcome122.

In summary, the included studies showed conflicting results for renal outcomes of GLP1-RA, though the majority were underpowered to detect heterogenous treatment effects. The most consistent finding was that a higher UACR is associated with greater GLP1-RA reduction in UACR relative to placebo, but this does not translate to benefits in eGFR-defined measures of renal function. There were no other biomarkers that robustly predicted benefit from GLP1-RA for the renal outcomes examined.

SGLT2i and GLP1-RA: Evidence from observational studies

There were no observational studies for GLP1-RA and renal outcomes included, and no comparison studies between people treated with GLP1-RA and SGLT2i. Observational studies comparing SGLT2i to other glucose-lowering drugs confirmed the lack of treatment effect heterogeneity associated with age129,130, use of blood pressure/cardiovascular medications127, blood pressure (Koh 2021), history of cardiovascular disease129 and sex129, but one study in a Korean population found greater SGLT2i benefit on progression to end stage renal impairment with higher BMI (BMI < 25 kg/m2, HR 0.80 (95%CI 0.51, 1.25); BMI ≥ 25 kg/m2 HR 0.27 (0.16, 0.44), Pinteraction = 0.002) and with abdominal obesity compared to without129. This is not consistent with results from meta-analysis of RCTs.

Summary of quality assessment

To evaluate risk of bias, we used the JBI critical appraisal tool for cohort studies as the best flexible tool for the range of studies included. Due to our screening criteria, no manuscripts that passed full text screening were excluded due to risk of bias. The checklist results for the 11 points in the appraisal checklist are shown as a heatmap in Supplementary Figures 1 (SGLT2i) and 2 (GLP1-RA).

Additionally, the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) framework was applied at the outcome level for each drug class to determine the quality of evidence and certainty of effects (Table 4)13. Overall certainty of evidence was rated as moderate for all outcomes except glycaemia with GLP1-RA which was rated low certainty. This reflects that a larger proportion of the studies included for evaluation of GLP1-RA glycaemia outcomes were observational (24/49). By contrast, for SGLT2i glycaemia outcomes there were 18 RCT/meta-analyses and 9 observational studies. For CVD and renal outcomes, observational studies were limited and the majority of evidence came from industry-funded CVOTs (RCT designs), including post-hoc analyses of individual trials as well as meta-analyses.

Table 4 Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) framework summary of findings.

Discussion

This systematic review provides a comprehensive assessment of observational and RCT-based studies of people with type 2 diabetes, specifically examining heterogenous treatment effects for SGLT2i and GLP1-RA therapies on glycaemic, cardiovascular, and renal outcomes. We assessed evidence for treatment effect modification for a wide range of demographic, clinical and biological features, including pharmacogenetic markers. Each of the three clinical outcomes was evaluated separately for each drug class, for a total of 6 sub-studies. Overall, our review identified limited evidence for treatment effect heterogeneity for glycaemia, cardiovascular, and renal outcomes for the two drug classes. We summarize the key findings below.

For glycaemic response, there was high certainty that reduced renal function is associated with lower efficacy of SGLT2i. For GLP1-RA there was moderate certainty that markers of reduced insulin secretion, either directly measured (e.g. c-peptide or HOMA-B) or proxy measures, such as diabetes duration, were associated with reduced glycaemic response to GLP1-RA, although the majority of evidence was from observational studies. As with other glucose-lowering drug classes, a greater glycaemic response with both SGLT2i and GLP1-RA was seen at higher baseline HbA1c. We did not identify any studies examining whether the relative efficacy of SGLT2i compared to GLP1-RA is altered by baseline HbA1c levels. Of note, many of the included studies for HbA1c outcome were observational, meaning findings could potentially reflect biases from differential prescribing behaviour, or regression to the mean, although we did attempt to account for the latter by including adjustment for baseline HbA1c as one of our study inclusion criteria.

For both CVD and heart failure outcomes, RCT meta-analyses do not support differences in the relative efficacy of either GLP1-RA or SGLT2i based on an individual's prior CVD status. However, this finding should be interpreted cautiously as all RCTs to date have predominantly included participants with, or at high risk of, CVD, thereby excluding the majority of the wider T2D population at lower risk. Meta-analyses do suggest (with moderate certainty) that the relative effects of both drug classes may be greater in people of non-White ethnicity. In particular, those of Asian and African ethnicity (compared to people of White ethnicity) have been shown to have a greater relative benefit for hospitalization for heart failure/CV death (but not MACE) with SGLT2i, and for MACE with GLP1-RA.

When evaluating renal outcomes, there was no consistent evidence of treatment heterogeneity for SGLT2i, but for GLP1-RA, there was greater reduction in proteinuria in those with higher baseline proteinuria.

This limited evidence could reflect a true lack of heterogenous treatment effects, but it more likely reflects an absence of clinical studies that were well designed or sufficiently powered to robustly identify and characterise treatment effect heterogeneity. Although five of the six sub-studies we evaluated were rated GRADE B, there were methodological concerns with many of the included studies. As individual RCTs are by design powered only for the main effect of treatment131, our primary focus when reporting was meta-analyses of post-hoc subgroup analyses of RCTs. However, we found the subgroup analyses in these studies primarily focused on stratification by baseline risk for the outcome in question, e.g. baseline HbA1c for glycaemic response, CKD stage or albuminuria for renal outcomes, and CVD risk or established CVD for CVD outcomes. Other common subgroups included those defined by BMI, age, sex or other routinely collected clinical characteristics, with very few studies evaluating non-routine biomarkers or pharmacogenetic markers (as highlighted in Tables 1–3). A major limitation was that studies predominantly focused on conventional approaches to subgroup analysis, with very few studies assessing continuous features (such as BMI) on a continuous scale, which is required to maximize power to detect treatment effect heterogeneity131,132.

It is also important to recognize that almost all the studies evaluating cardiovascular and renal endpoints included in our systematic review focused on the relative effect of a biomarker/stratifier on the outcome, as most studies reported a hazard ratio compared with a placebo arm for the outcome of interest (e.g. MACE, incident renal disease). This does not recognize that baseline absolute risk of these endpoints is likely to differ substantially across these strata; so although, for example, there was no difference in relative benefit of an SGLT2i by age, this means that on the absolute scale, benefit will increase with age (as underlying absolute risk increases), and it is this absolute benefit that should be considered when deciding on whether to initiate SGLT2i treatment.

An important finding of our review is the lack of robust comparative effectiveness studies directly examining treatment effect heterogeneity for these two major drug classes, either head-to-head or compared with other major anti-hyperglycaemic therapies. Insight into effect modification for a single drug class is not sufficient to support the clinical translation of a precision medicine approach. The lack of direct comparisons between therapies obscures the interpretation of biomarkers with regards to whether they function as broad prognostic factors, which may be relevant to any (or at least multiple) drug class, or as markers of heterogenous treatment effects specific to a particular drug class. An evidence base that includes more high-quality studies on heterogeneity in the comparative effectiveness of SGLT2i, GLP1-RA, and other drug classes is needed to advance the field towards clinically useful precision diabetes medicine. For cardiovascular and renal outcomes, these studies need to incorporate both absolute outcome risk and relative estimates of treatment effects in order to usefully inform clinical decision-making. Only when this evidence is available can precision medicine support more individualised treatment decisions, allowing providers to select an optimal therapy from a set of multiple options informed by each medication’s risk/benefit profile specific to the characteristics of an individual patient.

We identified the following additional, high-level evidence gaps in our review: (1) Limited head-to-head comparative effectiveness studies examining treatment effect heterogeneity; (2) A lack of robust studies integrating multiple clinical features and biomarkers. The majority of studies only tested single biomarkers one at a time in subgroup analysis; (3) Few studies focused on pharmacogenetics or non-routine biomarkers; (4) Few studies conducted in low- and middle-income countries, which are required for an equitable global approach to precision type 2 diabetes medicine; (5) Few RCT meta-analyses based on individual-level participant data, precluding robust evaluation of between-trial heterogeneity and individual-level confounders; (6) An absence of confirmatory studies. We identified no prospective studies testing a priori hypotheses of potential treatment effect modifiers, or studies conducting independent validation of previously described heterogenous treatment effects; (7) A lack of population-based data representing individuals treated in routine care. As cardiovascular and renal trials have focused on high-risk participants, the benefit of SGLT2i and GLP1-RA for primary prevention is a major unanswered question; (8) Few cardiovascular and renal outcome studies considering treatment effect modification on the absolute as well as relative risk scale; (9) A focus on short-term glycaemic outcomes, with limited studies investigating durability of glycaemic response or time to glycaemic failure.

Of note, several studies have been published since our data extraction was completed in February 2022 that fill some of the evidence gaps identified in our review and highlight the clear potential for a precision medicine approach to T2D treatment: the TriMaster study, a precision medicine RCT of SGLT2i, DPP4i and thiazolidinediones (TZD), which established that individuals with higher renal function (eGFR >90 ml/min/1.73 m2) have a greater HbA1c response with SGLT2i vs DPP4i relative to those with eGFR 60–90 ml/min/1.73 m2 133, a result concordant with our finding that reduced renal function is associated with lower efficacy of SGLT2i; a similarly designed two-way crossover trial in New Zealand which identified a greater relative benefit of TZD therapy compared to DPP4i in people with obesity and/or hypertriglyceridemia134; a study using large-scale observational data and post-hoc analysis of individual participant-level data from 14 RCTs that specifically investigated differential treatment effects with SGLT2i and DPP4i, and developed a treatment selection model to predict HbA1c response on the two therapies based on an individual's routine clinical characteristics135; and a robust study across observational data and multiple RCTs identifying pharmacogenetic markers of differential glycaemic response to GLP1-RA136. In addition, three large trials (AMPLITUDE-O investigating cardiovascular and renal outcomes in 4076 participants with T2D for the GLP1-RA efpeglenatide137, DELIVER investigating worsening heart failure or cardiovascular death in 3131 participants [45% with T2D] for the SGLT2i Dapagliflozin138, and EMPA-KIDNEY investigating progression of kidney disease or cardiovascular death in 6609 participants [44% with T2D]139) have recently been published. Although all three are primary RCTs examining average treatment effects rather than treatment effect heterogeneity, and thus would have been ineligible for our review, future meta-analysis studies integrating the results of these and other ongoing SGLT2i and GLP1-RA trials may add to the evidence we have presented.

As our aim was to provide a comprehensive review of these treatments, we did not conduct quantitative analysis of specific biomarkers due to the range of different biomarkers, methodologies, and outcomes evaluated in the included studies. However, this review provides guidance for where future targeted quantitative meta-analysis could be most insightful. In addition, different methods for synthesising the current available evidence, such as conducting an umbrella review, may offer further insights into the current state-of-play of precision type 2 diabetes treatment.

This review highlights the need for several research priorities to advance our limited understanding of heterogenous treatment effects among individuals with type 2 diabetes. We outline priorities for research to advance the field towards a translational model of evidence-based, empirical precision diabetes medicine (Fig. 3), and highlight the recent Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement to guide this research132. In the future, with a greater understanding of heterogenous treatment effects and enhanced capacity to predict individual treatment responses, precision treatment in type 2 diabetes may be able to integrate demographic, clinical, biological, or other patient-level features to match individuals to their optimal anti-hyperglycaemic regimen.

Fig. 3: Priorities for future research to advance the field towards a translational model of evidence-based, empirical precision diabetes medicine.

Priorities for future research in treatment heterogeneity of diabetes medications as identified by this systematic review.

Conclusions

There is limited evidence of treatment effect heterogeneity with SGLT2i and GLP1-RA for glycaemic, cardiovascular, and renal outcomes in people with type 2 diabetes. This lack of evidence likely reflects the methodological limitations of the current evidence base. Robust future studies to fill the research gaps identified in this review are required for precision medicine in type 2 diabetes to translate to clinical care.