Introduction

Fatty liver disease (FLD), which is characterized by the accumulation of fat in the liver, is a prevalent liver disorder with significant global impact1,2. This condition is a hepatic manifestation of metabolic syndrome and is closely linked to insulin resistance and type 2 diabetes mellitus (T2DM)2,3,4. Individuals with diabetes are at a higher risk of developing FLD, and diabetes increases the risk of progression to more severe liver diseases, such as liver cirrhosis or hepatocellular carcinoma (HCC)5,6. Various studies have indicated that patients with concurrent FLD and T2DM are significantly more likely to develop HCC6,7,8. With the incidence of FLD and T2DM increasing worldwide, managing the risk of progression to HCC in these patient populations is becoming a critical concern.

Sodium-glucose cotransporter-2 inhibitors (SGLT2i) are a class of drugs primarily used to treat T2DM9. Beyond their role in lowering blood glucose levels, emerging research suggests SGLT2i may offer additional benefits in liver diseases, including non-alcoholic fatty liver disease (NAFLD) and HCC10,11,12. Pre-clinical studies have shown that SGLT2i can decrease hepatic steatosis, enhance insulin sensitivity, and reduce liver inflammation and fibrosis13,14. Several clinical studies have shown that the use of SGLT2i in patients with NAFLD improves liver function and serum markers of liver injury11,15. Additionally, the use of SGLT2i in T2DM patients has been observed to reduce the risk of HCC development16. However, the relationship between SGLT2i and other cancer types has yielded mixed outcomes; while some studies report a reduced risk of cancer, such as lung and gastrointestinal cancers, others have raised concerns over increased risks of bladder cancer17. Given the relatively recent introduction of SGLT2i in the market, there is a critical need for further research involving long-term follow-up and the use of clinical big data to more thoroughly investigate the cancer incidence associated with SGLT2i use.

In healthcare research, the use of big data has become increasingly vital, particularly for identifying trends, patterns, and correlations within vast datasets18. Our study used data from the Health Insurance Review and Assessment Service (HIRA) of Korea, which is an extensive database encompassing a wide array of patient information19. The use of such big datasets offers a unique opportunity to conduct comprehensive and detailed analyses of large populations. This approach enabled us to observe real-world outcomes, overcome the limitations of small sample sizes, and enhance the generalizability of our findings.

Our study aimed to evaluate the impact of SGLT2i on cancer development, with a specific focus on HCC, in patients with co-existing FLD and T2DM, using a nationwide Korean cohort from the HIRA. We conducted our analysis on two distinct subpopulations of patients with co-existing FLD and T2DM. The first cohort included patients with T2DM and NAFLD, identified by excluding from the patients with FLD and T2DM those with other chronic liver diseases, such as chronic viral hepatitis (CVH), alcoholic liver disease, and autoimmune liver disease. The second cohort comprised individuals at high risk of HCC, namely patients with FLD and T2DM who were also diagnosed with CVH. In addition to assessing the impact of SGLT2i on HCC incidence, we also examined various demographic and clinical factors to identify independent risk factors for HCC in these patient groups, leveraging an extensive dataset to provide insights into effective HCC risk management strategies.

Results

Baseline characteristics and incidence rate of cancers in the NAFLD-T2DM cohort

We identified 201,542 patients with co-existing NAFLD and T2DM. Of these patients, 55,770 (27.7%) were in the SGLT2i group and 145,772 (72.3%) were in the non-SGLT2i group. The median [interquartile range (IQR)] follow-up time was 3.56 (2.17–5.10) years for all patients, 3.01 (1.94–4.52) years for the SGLT2i group, and 3.77 (2.30–5.34) years for the non-SGLT2i group. This selection was conducted after excluding patients diagnosed with chronic liver diseases including CVH, alcoholic liver disease, and autoimmune liver disease. After 1:1 PS matching, a balanced cohort of 107,972 patients was established for analysis and evenly divided into 53,986 patients (50.0%) in the SGLT2i group and 53,986 patients (50.0%) in the non-SGLT2i group (Fig. 1). In the PS-matched cohort, the median (IQR) follow-up time was 3.04 (1.94–4.55) years for all patients, 3.05 (1.95–4.56) years for the SGLT2i group, and 3.03 (1.93–4.54) years for the non-SGLT2i group. There was no significant difference in follow-up period between the two groups (P = 0.445).

Figure 1

Schematic representation of cohort derivation for this study. T2DM type 2 diabetes mellitus, NAFLD non-alcoholic fatty liver disease, FLD fatty liver disease, CVH chronic viral hepatitis, SGLT2i sodium-glucose cotransporter-2 inhibitor, PSM propensity score matching.

Supplementary Table 1 and Figure S1A (Love plot) confirm the successful adjustment of covariate differences between groups following PS matching. In this cohort, PS matching effectively standardized the mean differences, with all variables achieving an aSMD of less than 0.1, demonstrating excellent balance across covariates. These results underscore the robustness of the matching process and comparability of the groups for subsequent analyses. Table 1 illustrates comprehensive patient characteristics before and after PS matching.

Table 1 Baseline characteristics of pre- and post-PS matched NAFLD-T2DM cohort.

Table 2 displays the number of cancer cases, person-years, and IR per 10,000 person-years [95% CI] for each type of cancer according to SGLT2i exposure status in the NAFLD-T2DM cohort both before and after PS matching. Figure 3A shows a forest plot of the HRs for each cancer type. In the pre-matching analysis, non-SGLT2i users had significantly elevated HRs for the occurrence of “total cancer”, HCC, CCC, stomach cancer, colorectal cancer, pancreatic cancer, lung cancer, prostate cancer, and “other cancers”. However, after PS matching, the statistically significant differences in cancer risk between the two groups disappeared.

Table 2 Incidence rate per 10,000 person-years of the malignancies according to SGLT2i usage in the pre- and post-PS matched NAFLD-T2DM cohort.

Survival analysis of HCC and other cancers in the PS-matched NAFLD-T2DM cohort according to SGLT2i usage

Figure 2A shows the Kaplan–Meier curves comparing the probability of HCC incidence between SGLT2i users and non-SGLT2i users within the PS-matched NAFLD-T2DM cohort. No significant differences were observed in the development of HCC or of other types of cancer, indicating that SGLT2i usage was not significantly associated with cancer incidence in this cohort (Fig. S2).

Figure 2

Comparison of Kaplan–Meier curves of HCC occurrence according to SGLT2i exposure in PS-matched NAFLD-T2DM cohort and PS-matched FLD-T2DM-CVH cohort. HCC hepatocellular carcinoma, SGLT2i sodium-glucose cotransporter-2 inhibitor, PS propensity score, NAFLD non-alcoholic fatty liver disease, T2DM type 2 diabetes mellitus, FLD fatty liver disease, CVH chronic viral hepatitis.

Subsequently, Cox proportional hazards analysis was conducted to identify the independent variables affecting HCC occurrence in this cohort (Table 3). In the univariate Cox regression analysis, older age; male sex; comorbidities such as hypertension, hypothyroidism, and liver cirrhosis; and the use of aspirin, beta-blockers, calcium channel blockers, and fibrates were significantly associated with HCC occurrence. In the multivariate Cox regression analysis, which included SGLT2i use and variables with a P value < 0.1 in the univariate analysis, older age (HR = 1.08, 95% CI = 1.06–1.10, P < 0.001), male sex (HR = 2.79, 95% CI = 1.87–4.14, P < 0.001), hypothyroidism (HR = 2.43, 95% CI = 1.21–4.87, P = 0.013), liver cirrhosis (HR = 17.88, 95% CI = 8.19–39.03, P < 0.001), statin use (HR = 0.59, 95% CI = 0.36–0.96, P = 0.035), and fibrate use (HR = 0.14, 95% CI = 0.02–0.99, P = 0.049) were identified as independent factors associated with HCC occurrence, with statin and fibrate use associated with a lower risk. The concordance index of this model was 0.805, with a standard error (SE) of 0.029.

Table 3 Univariate and multivariate Cox regression analysis to identify risk factors associated with HCC occurrence in the PS-matched NAFLD-T2DM cohort.

Baseline characteristics and incidence rate of cancers in the FLD-T2DM-CVH cohort

In this subset, 4,936 patients with CVH along with co-existing FLD and T2DM were identified. Among them, 1,440 (29.2%) were categorized into the SGLT2i group and 3,496 (70.8%) into the non-SGLT2i group. The median (IQR) follow-up period was 3.50 (2.18–4.95) years for all patients, 3.06 (2.04–4.46) years for the SGLT2i group, and 3.67 (2.25–5.17) years for the non-SGLT2i group. Following 1:1 PS matching, a cohort of 2,798 patients was formed for analysis, with 1,399 patients (50.0%) in each of the SGLT2i and non-SGLT2i groups (Fig. 1). In the PS-matched FLD-T2DM-CVH cohort, the median (IQR) follow-up time was 3.13 (2.07–4.48) years for all patients, 3.07 (2.03–4.48) years for the SGLT2i group, and 3.19 (2.10–4.48) years for the non-SGLT2i group. There was no significant difference in follow-up period between the two groups (P = 0.529).

Supplementary Table 2 and Fig. S1B (Love plot) confirm the successful adjustment of covariate differences between groups following PS matching. Significant discrepancies between groups were noted before PS matching; however, PS matching effectively standardized the mean differences in the PS-matched cohort, with all variables achieving an aSMD of less than 0.1, demonstrating excellent balance across covariates. These results underscore the robustness of the matching process and comparability of the groups for subsequent analyses. Table 4 illustrates comprehensive patient characteristics before and after PS matching.

Table 4 Baseline characteristics of pre- and post-PS matched FLD-T2DM-CVH cohort.

Table 5 shows the number of cancer cases, person-years, and IR per 10,000 person-years for each cancer type according to SGLT2i exposure in the FLD-T2DM-CVH cohort before and after PS matching. Notably, the crude IR per 10,000 person-years of HCC was significantly higher in the FLD-T2DM-CVH cohort (IR per 10,000 person-years: 57.3, 95% CI 18.4–71.6) compared to the NAFLD-T2DM cohort (IR per 10,000 person-years: 5.2, 95% CI 3.6–5.7). Interestingly, both before and after PS matching, the IR per 10,000 person-years of HCC was markedly higher in the non-SGLT2i group (pre-PS matching: 18.4 vs 71.6, and post-PS matching: 18.8 vs 41.7, for SGLT2i users and non-SGLT2i users, respectively).

Table 5 Incidence rate per 10,000 person-years of the malignancies according to SGLT2i use in the pre- and post-PS matched FLD-T2DM-CVH cohort.

While the IR per 10,000 person-years for HCC increased by more than tenfold in the FLD-T2DM-CVH cohort, the number of cases of other cancer types decreased as the cohort size diminished. As shown in Table 5, for several cancer types, the number of cases was less than 10. Due to concerns such as lack of statistical power, risk of overestimation, adherence to the Events Per Variable rule, and model fit issues, HRs could not be calculated for these types of cancer. We could analyze the HRs of HCC, “total cancer”, and “other cancer” in both the pre- and post-PS-matched cohorts. The risk of HCC occurrence in non-SGLT2i users was significantly higher in both cohorts: before matching [crude HR = 3.58 (1.80–7.09)] and in the PS-matched cohort [adjusted HR = 2.32 (1.06–5.06)]. The HR for “total cancer” was significant in the pre-matched cohort; however, this significance disappeared in the PS-matched cohort (Fig. 3B).

Figure 3

Forest plots of the hazard ratio of each cancer according to SGLT2i usage in the NAFLD-T2DM cohort and FLD-T2DM-CVH cohort. (A) NAFLD-T2DM cohort. (B) FLD-T2DM-CVH cohort. SGLT2i sodium-glucose cotransporter-2 inhibitor, PSM propensity score matching, NAFLD non-alcoholic fatty liver disease, T2DM type 2 diabetes mellitus, FLD fatty liver disease, CVH chronic viral hepatitis.

Survival analysis of HCC and other cancers in the PS-matched FLD-T2DM-CVH cohort according to SGLT2i usage

Figure 2B displays the Kaplan–Meier curves comparing HCC occurrence between SGLT2i and non-SGLT2i users within the PS-matched FLD-T2DM-CVH cohort. SGLT2i users had a significantly lower risk of developing HCC (P = 0.03). There were no significant differences in the occurrence of “total cancers” and “other cancers” between the two groups (Fig. S3).

Subsequently, Cox proportional hazards analysis was conducted in the PS-matched FLD-T2DM-CVH cohort (Table 6). In the univariate Cox regression analysis, older age and comorbidities such as dyslipidemia, heart failure, and liver cirrhosis, as well as the use of SGLT2i, statins, and antiviral treatment, were significantly associated with the occurrence of HCC. To adjust for covariates, we performed a multivariate Cox regression analysis by entering variables with a P value < 0.1 from the univariate analysis. Sex was also included in the multivariate analysis, although it was not a significant factor in the univariate analysis, because it is considered a basic variable for adjustment. In the multivariate analysis, non-use of SGLT2i [HR = 2.22 (1.01–4.87), P = 0.047] was identified as an independent risk factor for HCC occurrence along with older age [HR = 1.07 (1.03–1.10), P < 0.001], male sex [HR = 2.23 (1.00–5.26), P = 0.049], and liver cirrhosis [HR = 7.33 (3.31–16.21), P < 0.001]. The C-index of this model was 0.882 with an SE of 0.056.

Table 6 Univariate and multivariate Cox regression analysis to identify risk factors associated with HCC occurrence in the PS-matched FLD-T2DM-CVH cohort (N = 2798).

Discussion

This study undertook a comprehensive analysis using large-scale healthcare data to investigate the influence of SGLT2i on cancer development, with emphasis on HCC, in a cohort with co-existing FLD and T2DM. By leveraging high-quality data from the HIRA Service of Korea, this study enriches the field with valuable insights into practical implications and outcomes in the clinical setting. Our findings in the NAFLD-T2DM cohort indicated no significant differences in the incidence of HCC and other types of cancers based on SGLT2i use. However, in the HCC high-risk group of patients, the FLD-T2DM-CVH cohort, the use of SGLT2i was significantly associated with a lower incidence of HCC, even after PS matching and multivariate Cox analysis, highlighting its potential protective effect in this particular subgroup.

A previous systematic review and meta-analysis investigating the association between SGLT2i and cancer risk in T2DM patients found no significant increase in the overall cancer risk, consistent with our findings in the NAFLD-T2DM cohort17. This prior research, encompassing 46 randomized controlled trials, indicated an increased risk of bladder cancer with SGLT2 inhibitor use but suggested a potential protective effect against gastrointestinal cancers. However, the authors recommended further long-term studies owing to the short-term nature of the trials included in their analysis. In our study on patients with FLD and T2DM, the use of SGLT2i was not associated with an increased risk of bladder cancer, and the potential protective effect against gastrointestinal cancer was not statistically significant. Chou et al.20 reported a protective effect of SGLT2i against HCC compared to dipeptidyl peptidase-4 inhibitors in T2DM patients using data from Hong Kong's National Health Care System. In our study, using data from the Korean HIRA Service, we initially observed a trend towards lower crude IRs of HCC and other cancer types among SGLT2i users within the NAFLD-T2DM cohort. In addition, a significant increase in the HRs of various types of cancers, including HCC, was observed in non-SGLT2i users before matching. However, this trend did not reach statistical significance after PS matching, which adjusted for discrepancies in person-years attributable to the relatively recent introduction of SGLT2i compared to other oral hypoglycemic agents (OHA). The pre-matching difference may therefore reflect differences in the observed person-years between SGLT2i users and non-users. Specifically, SGLT2i users had relatively fewer person-years of observation than non-users, which may have produced an apparent difference in the IR of various cancers between the two groups before matching.

While SGLT2i did not demonstrate a statistically significant association with HCC incidence in patients with NAFLD and T2DM, multivariate Cox analysis identified several factors associated with increased HCC risk in this population. These included older age, male sex, presence of hypothyroidism, and liver cirrhosis. Furthermore, the use of statins and fibrates was associated with a lower incidence of HCC. This observation aligns with existing research, underscoring the potential protective effects of statins and fibrates against HCC. Previous research has demonstrated that statins may confer a protective benefit in the chemoprevention and treatment of several cancers, including HCC21,22,23,24. Recently, Zou et al.25 suggested an association between statin use and a reduced risk of HCC development in NAFLD patients using the Optum de-identified Clinformatics database. Additionally, a large-scale case–control study in Taiwan revealed a significant inverse association between fibrate use and the incidence of liver cancer26. The study demonstrated that fibrate use was associated with significantly lower odds of liver cancer in a dose-dependent manner, indicating a protective effect of fibrates against liver cancer. While our study contributes to the understanding of the role of SGLT2i in the risk of various types of cancer, particularly in a specific cohort of patients with NAFLD and T2DM, it also highlights the importance of considering the protective effects of other medications, such as statins and fibrates, in managing HCC risk in this cohort.

In our FLD-T2DM-CVH cohort, we observed a notably higher crude incidence rate of HCC compared to the NAFLD-T2DM cohort. This difference is likely attributable not only to viral infection but also to differences in HCC screening strategies between the two cohorts. CVH is a well-known risk factor for HCC, and it is recommended by various expert groups that patients with CVH should undergo biannual HCC screening27,28,29. On the other hand, in patients without CVH or liver cirrhosis, regular HCC screening is not recommended. Considering the significant differences in HCC risk and HCC screening strategies based on CVH status, we conducted separate analyses for patients with CVH and for NAFLD patients without CVH to minimize potential biases. Interestingly, within the CVH cohort with higher HCC risk, we noted a pronounced protective effect of SGLT2i against development of HCC. This finding is in line with the concepts of risk difference effect and relative risk reduction, suggesting that therapeutic interventions might offer greater absolute benefits in populations at a higher baseline risk30,31. The underlying theory suggests that individuals at an elevated risk of a condition may gain more from interventions due to their higher initial risk, potentially preventing a greater number of adverse outcomes31. Although the limitations of our study design and dataset prevent a detailed statistical analysis to fully quantify this effect, the observed trend highlights the importance of considering baseline risk when evaluating treatment outcomes. This insight is particularly pertinent for clinicians seeking to optimize therapeutic strategies for patients with diverse risk profiles, emphasizing the need for tailored approaches based on individual patient risk factors. Further research is needed to explore this differential effect more comprehensively, possibly by incorporating more detailed data on baseline risk and utilizing statistical methods to assess the interaction effects between treatment efficacy and specific risk factors for HCC in patients. Our finding in the CVH cohort aligns with a territory-wide cohort study conducted in Hong Kong, which reported that SGLT2i use was associated with a lower risk of HCC development in patients with co-existing T2DM and chronic hepatitis B infection32. These results suggest the potential protective effects of SGLT2 inhibitors against HCC development in high-risk patients, reinforcing the importance of targeted therapeutic strategies for managing HCC risk in patients with diabetes and chronic viral hepatitis.
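
The distinction between relative and absolute effects can be illustrated with the post-matching HCC incidence rates reported in the Results for the FLD-T2DM-CVH cohort (18.8 vs 41.7 per 10,000 person-years). The following is a minimal arithmetic sketch, not part of the study's analysis: the ratio computed here is a crude rate ratio rather than the Cox-model HR, and the lower-risk baseline of 5.0 per 10,000 person-years, together with the assumption that the same ratio would apply there, is purely hypothetical.

```python
# Post-matching HCC incidence rates per 10,000 person-years, as reported
# in the Results for the FLD-T2DM-CVH cohort.
ir_sglt2i = 18.8
ir_non_sglt2i = 41.7


def rate_ratio(ir_exposed: float, ir_unexposed: float) -> float:
    """Relative effect: exposed rate divided by unexposed rate."""
    return ir_exposed / ir_unexposed


def rate_difference(ir_exposed: float, ir_unexposed: float) -> float:
    """Absolute effect: difference in cases per 10,000 person-years."""
    return ir_unexposed - ir_exposed


rr = rate_ratio(ir_sglt2i, ir_non_sglt2i)
rd = rate_difference(ir_sglt2i, ir_non_sglt2i)

# HYPOTHETICAL: a lower-risk population with a baseline rate of 5.0 per
# 10,000 person-years and, for illustration only, the same rate ratio.
hypothetical_baseline = 5.0
hypothetical_rd = hypothetical_baseline * (1 - rr)

print(f"crude rate ratio: {rr:.2f}")
print(f"rate difference at the reported rates: {rd:.1f} per 10,000 person-years")
print(f"rate difference at the hypothetical baseline: {hypothetical_rd:.1f}")
```

Under these assumptions, an identical relative effect corresponds to a much smaller absolute difference when the baseline rate is low, which is the intuition behind the baseline-risk argument above.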

The strength of our study lies in its large sample size and utilization of a national database, enabling a robust statistical approach and enhancing the generalizability of our findings. Nevertheless, we acknowledge the presence of inherent limitations, notably the study's retrospective and observational nature, which could introduce biases and the potential for residual confounding factors that might not be fully eliminated through statistical adjustments. In addition, critical individual patient variables, such as height, weight, and blood glucose levels, which can significantly influence the outcomes, were not directly measured in our study. To mitigate these constraints, we incorporated several variables capable of indirectly representing the baseline health status of patients, including diagnoses related to obesity and the intensity of glycemic control treatments. Notably, in the Korean healthcare system, the prescription of OHAs and insulin is determined by initial HbA1c levels, offering a surrogate marker for assessing patients' baseline glycemic control. This methodology, while not directly measuring each variable, provides a practical and indirect assessment of patients' health conditions that could address, at least partially, some of the limitations mentioned. Furthermore, the potential underdiagnosis of early-stage HCC among non-cirrhotic patients without CVH presents an additional limitation. Our reliance on ICD-10 codes for identifying FLD, T2DM, and any cancers might not capture all instances of early-stage HCC, especially given the lack of established recommendations for HCC screening in non-cirrhotic patients. To address this concern, we employed a wash-out period strategy; however, we recognize that this measure cannot fully overcome the challenges associated with underdiagnosis of early-stage HCC. This indicates the need for future studies to develop more precise diagnostic criteria and screening protocols for this patient population.

In conclusion, within the NAFLD-T2DM cohort, SGLT2i did not demonstrate a statistically significant effect in reducing the risk of developing HCC. In contrast, our analysis within the FLD-T2DM-CVH cohort indicates a significant association between SGLT2i use and a decreased risk of HCC, highlighting their potential as a preventive strategy in patients with a higher risk profile of HCC. Nevertheless, it is important to recognize that our study is based on retrospective cohort data, underscoring the need for future research through prospective cohort studies to further validate these findings.

Methods

Data source

We used a dataset from the HIRA database of the Republic of Korea between January 1, 2014, and December 31, 2021. The dataset contained comprehensive information from both inpatient and outpatient medical claims, including details such as prescription drug utilization, diagnostic and treatment codes, and primary and secondary diagnosis codes.

Study design

This study was designed as a comparative cohort study to evaluate the implications of SGLT2 inhibitor prescription on HCC incidence in patients diagnosed with FLD and T2DM. Figure 1 shows the flowchart of this study. Data were extracted from eligible patients. The eligibility criteria for the study were as follows: (1) patients diagnosed with co-existing FLD and T2DM, and (2) patients receiving treatment with one to three types of OHA. Patients diagnosed with FLD or T2DM were identified based on medical diagnoses according to the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10). Individuals who met the following criteria were excluded: those diagnosed with any malignancy or those who underwent liver transplantation before cohort entry or within the year after cohort entry, with this one-year lag period intended to eliminate the possibility of detecting already existing cancers. Patients with a cohort entry day in 2014 were excluded because they did not meet the criteria for assessing baseline characteristics in the year prior to cohort entry. Patients with a cohort entry day in 2021 were also excluded due to the lack of a minimum one-year follow-up period to evaluate cancer development. Patients who had a history of any OHA or insulin prescription within 1 year before cohort entry were also excluded.

Patients treated with SGLT2i for more than 90 days since cohort entry were categorized as SGLT2i users, while those who never used SGLT2i during 2014–2021 were categorized as the comparison group of non-SGLT2i users. The index date was defined as the cohort entry day, which was set as the first date of SGLT2i or other OHA prescriptions. In South Korea, OHA is prescribed according to the insurance coverage criteria of the National Health Insurance Service. The insurance coverage criteria were based on the patient's glycemic control status, as represented by hemoglobin A1c (HbA1c)33. Thus, the number of prescribed OHA and the use of insulin were closely related to the glycemic control status of each patient. Therefore, the use of multiple OHA or insulin suggests that patients with diabetes require more intensive treatment to achieve adequate glycemic control. Furthermore, we classified the patients according to the number of prescribed OHA and insulin usage during the 90 days after cohort entry to reflect the glycemic control level at the time of cohort entry: level 1, one or two OHA had been taken; level 2, three classes of OHA had been taken without insulin; and level 3, administration of insulin in combination with other OHA. The index year, age at cohort entry, sex, level of antidiabetic treatment 90 days after cohort entry, comorbidities, Charlson Comorbidity Index (CCI), and prescribed drugs during the year prior to cohort entry were analyzed as baseline characteristics.
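
The exposure grouping and treatment-level rules described above can be sketched as two small functions. This is an illustrative sketch only: the function names and inputs are hypothetical, the handling of patients with 1 to 90 days of SGLT2i use is not stated in the text (they are simply left unassigned here), and treating any insulin use as level 3 is an assumption.

```python
from typing import Optional


def classify_exposure(days_on_sglt2i: int) -> Optional[str]:
    """Assign the exposure group from total days of SGLT2i treatment.

    More than 90 days -> SGLT2i user; never used -> non-SGLT2i user.
    ASSUMPTION: patients with 1-90 days are returned as None (unassigned),
    because the text does not say how they were handled.
    """
    if days_on_sglt2i > 90:
        return "SGLT2i"
    if days_on_sglt2i == 0:
        return "non-SGLT2i"
    return None


def treatment_level(n_oha_classes: int, uses_insulin: bool) -> int:
    """Antidiabetic treatment level during the 90 days after cohort entry.

    Level 1: one or two OHA; level 2: three OHA classes without insulin;
    level 3: insulin in combination with OHA.
    ASSUMPTION: any insulin use maps to level 3.
    """
    if not 1 <= n_oha_classes <= 3:
        # Eligibility required treatment with one to three types of OHA.
        raise ValueError("expected one to three OHA classes")
    if uses_insulin:
        return 3
    return 2 if n_oha_classes == 3 else 1


print(classify_exposure(120), treatment_level(2, False))
```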

Cohort definition

We analyzed two distinct patient cohorts with concurrent FLD and T2DM. Patients with both FLD and T2DM who also have CVH are categorized into a higher-risk group for HCC, for whom biannual HCC screening is necessary. Conversely, NAFLD-T2DM patients without CVH or liver cirrhosis are not classified as being at high risk for HCC, and thus, regular HCC screenings using ultrasound are not routinely recommended for them. To address the disparities in risk and screening frequencies between patients with CVH and those with only NAFLD, we conducted separate analyses for these groups to mitigate any biases arising from these differences. The first, termed the NAFLD-T2DM cohort, was identified by excluding patients with other causes of chronic liver diseases at baseline, such as CVH, alcoholic liver disease, and autoimmune liver disease including primary biliary cholangitis and autoimmune hepatitis, aligning with the definition of NAFLD. The second cohort, the FLD-T2DM-CVH cohort, included patients diagnosed with CVH in addition to concurrent FLD and T2DM. CVH, alcoholic liver disease, primary biliary cholangitis, and autoimmune hepatitis were identified based on the presence of these diagnoses in medical records during the year prior to cohort entry. Additionally, patients were considered to have received antiviral treatment if they had been prescribed antiviral agents for hepatitis B or C within the year prior to cohort entry.

Outcome

The primary outcome of the present study was a diagnosis of any malignancy, which was indicated by the C code in the ICD-10, and registration of catastrophic illness coverage in the national health insurance system for the corresponding malignancies. All eligible patients were followed up from the index date until the occurrence of the primary outcome or the study end date (December 31, 2021), whichever occurred first. In this study, we evaluated the occurrence of a spectrum of cancer types: HCC, cholangiocarcinoma (CCC), and various gastrointestinal cancers (stomach, colorectal, esophageal, and pancreatic), along with lung, bladder, prostate, breast, and cervical cancers. We also included a category termed “other cancers” to encompass less common or unspecified cancer sites. Furthermore, we assessed the combined incidence rate of these malignancies, referred to as “total cancer” incidence, to provide an aggregate measure of cancer diagnoses in our study.
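
The follow-up rule above (from the index date to the first of the outcome or the study end date) can be sketched as follows. The function name is hypothetical, and the conversion of days to years using 365.25 days per year is an assumption, since the text does not state how follow-up time was converted.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2021, 12, 31)  # study end date stated in the text


def follow_up(index_date: date,
              outcome_date: Optional[date] = None,
              study_end: date = STUDY_END) -> Tuple[float, int]:
    """Return (follow-up in years, event indicator) for one patient.

    The event indicator is 1 if the outcome occurred on or before the
    study end date, and 0 if the patient was censored at study end.
    ASSUMPTION: years are computed as days / 365.25.
    """
    if outcome_date is not None and outcome_date <= study_end:
        end, event = outcome_date, 1
    else:
        end, event = study_end, 0
    years = (end - index_date).days / 365.25
    return years, event


# Hypothetical patients: one censored at study end, one with an outcome.
print(follow_up(date(2018, 1, 1)))
print(follow_up(date(2018, 1, 1), date(2020, 1, 1)))
```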

Statistical analyses

To evaluate the baseline characteristics across the groups in our study, we applied descriptive statistical techniques. These techniques were used to analyze a wide array of baseline covariates, including age, sex, the intensity of antidiabetic treatment, an array of comorbid conditions, the CCI, and any co-medication regimens. Differences between the study groups were assessed using the absolute standardized mean difference (aSMD), with an aSMD of 0.1 or higher taken to indicate a notable imbalance.
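
As a brief illustration of the balance metric, the sketch below computes the aSMD using the commonly used pooled-standard-deviation formulas for continuous and binary covariates. The text does not specify which variant was used, so these formulas are an assumption, and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def asmd_continuous(x_treated, x_control):
    """aSMD for a continuous covariate: |mean difference| / pooled SD."""
    pooled_sd = sqrt((variance(x_treated) + variance(x_control)) / 2)
    return abs(mean(x_treated) - mean(x_control)) / pooled_sd


def asmd_binary(p_treated, p_control):
    """aSMD for a binary covariate, given the proportion in each group."""
    pooled_sd = sqrt((p_treated * (1 - p_treated)
                      + p_control * (1 - p_control)) / 2)
    return abs(p_treated - p_control) / pooled_sd


# Hypothetical proportions of a comorbidity in two groups.
value = asmd_binary(0.30, 0.25)
print(f"aSMD = {value:.3f}; imbalanced at the 0.1 threshold: {value >= 0.1}")
```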

To adjust for potential confounding factors and balance the comparison groups, we calculated propensity scores. This was achieved using logistic regression, factoring in variables such as age, sex, the index year of study entry, the CCI score, medical histories of hypertension and liver cirrhosis, and the level of antidiabetic treatment within the NAFLD-T2DM cohort. Similarly, for the FLD-T2DM-CVH cohort, additional variables including medical histories of hypertension, dyslipidemia, heart failure, coronary artery disease, alcoholic liver disease, chronic hepatitis B, chronic hepatitis C, and the administration history of ACE inhibitors, ARBs, statins, and ezetimibe were considered, alongside the level of antidiabetic treatment. Following this, 1:1 propensity score (PS) matching was performed without replacement using the nearest-neighbor matching algorithm, applying a caliper width of 0.02 to ensure close matches.
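
The matching step can be sketched as a greedy 1:1 nearest-neighbor procedure without replacement. This is a minimal illustration rather than the software routine used in the study: the propensity scores are assumed to have been estimated already, the caliper of 0.02 is assumed to apply on the propensity score scale itself, treated patients are processed in input order, and the example scores are hypothetical.

```python
def match_nearest_neighbor(ps_treated, ps_control, caliper=0.02):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns a list of (treated_index, control_index) pairs. A treated
    patient stays unmatched if no remaining control lies within the caliper.
    """
    available = list(range(len(ps_control)))  # controls not yet used
    pairs = []
    for i, score in enumerate(ps_treated):
        if not available:
            break
        # Closest remaining control on the propensity score scale.
        j = min(available, key=lambda k: abs(ps_control[k] - score))
        if abs(ps_control[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # without replacement
    return pairs


# Hypothetical propensity scores.
treated = [0.30, 0.62, 0.90]
control = [0.31, 0.27, 0.61, 0.10]
print(match_nearest_neighbor(treated, control))
```

In this toy example the third treated patient has no control within the caliper and is dropped, which mirrors how matching reduces the SGLT2i group from its pre-matching size.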

Subsequently, we determined the incidence rate (IR) of each cancer type within the study groups, presenting these rates as cases per 10,000 person-years to provide a clear understanding of cancer development risk.
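
The incidence rate calculation can be written out as a short sketch. The confidence interval method is not described in the text, so the log-scale normal approximation used here is an assumption, and the event count and person-years in the example are hypothetical.

```python
from math import exp, sqrt


def incidence_rate(events: int, person_years: float, per: int = 10_000) -> float:
    """Incidence rate expressed as cases per `per` person-years."""
    return events / person_years * per


def incidence_rate_ci(events: int, person_years: float,
                      per: int = 10_000, z: float = 1.96):
    """Approximate 95% CI for the incidence rate.

    ASSUMPTION: normal approximation on the log scale, with
    SE(log rate) = 1 / sqrt(events); not necessarily the study's method.
    """
    if events <= 0:
        raise ValueError("the log-scale interval requires at least one event")
    rate = incidence_rate(events, person_years, per)
    factor = exp(z / sqrt(events))
    return rate / factor, rate * factor


# Hypothetical example: 25 cases over 50,000 person-years.
rate = incidence_rate(25, 50_000)
low, high = incidence_rate_ci(25, 50_000)
print(f"IR = {rate:.1f} per 10,000 person-years (95% CI {low:.1f}-{high:.1f})")
```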

For a comparative analysis of the effect of SGLT2 inhibitors on the development of HCC and other cancer types, Kaplan–Meier curves were plotted and log-rank tests were performed, offering a visual and statistical representation of the time-to-event data. In addition, both univariate and multivariate Cox proportional hazard regression analyses were conducted. These analyses aimed to estimate hazard ratios (HRs) and their 95% confidence intervals (CIs) based on baseline variables such as sex, age at cohort entry, detailed comorbidities, and the use of SGLT2i, along with antiplatelet, antihypertensive, and antidyslipidemic agents. The multivariate Cox regression analysis included variables that exhibited a P value of < 0.1 in the univariate analysis, so that potential predictors showing a trend towards association were considered even if they did not meet the conventional significance threshold.
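
For readers unfamiliar with the Kaplan–Meier method, the product-limit estimator can be sketched in a few lines. This is an illustrative implementation with hypothetical data, not the SAS or R routine used in the study; the cumulative probability of an event at time t is one minus the returned survival estimate.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of event-free survival.

    times  : follow-up time for each patient
    events : 1 if the outcome occurred at that time, 0 if censored
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # outcomes observed exactly at time t
        n_leaving = 0  # patients leaving the risk set at t (events + censored)
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up times in years and event indicators.
for t, s in kaplan_meier([1.0, 2.0, 2.0, 3.0, 4.0], [1, 0, 1, 1, 0]):
    print(f"t = {t:.1f}  S(t) = {s:.2f}  cumulative incidence = {1 - s:.2f}")
```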

These statistical analyses were performed using SAS version 9.4 (SAS Institute, Inc., Cary, NC, USA) and R version 4.3.2 (Boston, MA, USA).

Ethics approval statement

This study was performed according to the Declaration of Helsinki. This retrospective study utilized data from the Health Insurance Review and Assessment Service (HIRA) in South Korea. The institutional review board (IRB) of Ajou University Hospital granted an informed consent waiver due to the study's nature and use of de-identified data. Ethical approval was given by the Ajou University IRB, recognizing that patient confidentiality and privacy were upheld, in line with ethical guidelines for retrospective research (AJOUIRB-EX-2023-179).

Patient consent statement

Patient consent was waived for this study as it exclusively utilized anonymized data, ensuring the privacy and confidentiality of individual participants.