Introduction

Previously, the term non-alcoholic fatty liver disease (NAFLD) has been used to encompass a spectrum of liver pathology with macrovesicular steatosis in at least 5% of hepatocytes in individuals with low to no alcohol consumption. Non-alcoholic fatty liver (NAFL) or simple steatosis is the non-progressive subtype that does not usually have serious implications, although it is estimated that 25% of individuals with NAFLD develop non-alcoholic steatohepatitis (NASH)1—a progressive subtype that eventually advances to fibrosis, cirrhosis (ca. 25% of those with NASH)2, and hepatocellular carcinoma (HCC). Studies have shown that the presence and severity of NAFLD are associated with increased incidence and prevalence of cardiovascular disease (CVD) and chronic kidney disease (CKD)3,4,5,6,7,8,9,10,11,12,13. Notwithstanding the morbidity, mortality, and limited therapeutics of NAFLD-related cirrhosis and HCC, disease mortality is often seen as a result of type 2 diabetes mellitus (T2DM) and CVD complications14,15. While the aetiology remains to be fully understood, NAFLD is recognised as the hepatic manifestation of the metabolic syndrome16. Hence, the causal link of NAFLD to chronic morbidities (i.e., obesity, hypertension, T2DM, CVD, and CKD) is hypothesised—underscoring the concept of NAFLD as a multisystem disease with potential involvement in the pathology of extra-hepatic diseases17.

With NAFLD being closely associated with obesity and metabolic syndrome, its incidence and prevalence are increasing to epidemic proportions and becoming the most common cause of abnormal serum aminotransferase levels, chronic liver disease, and liver transplantation in the United States (US)18,19,20. Data in Asia also shows that NAFLD is as common and important as in the West, albeit it manifests at a lower body mass index (BMI) with many patients not displaying insulin resistance21,22,23. This ethnic variability including the differences in severity and rate of progression as a function of environmental risk exposures demonstrates that NAFLD is a complex disease trait24.

Lifestyle modification, as with other chronic diseases, is the cornerstone of NAFLD management regardless of disease stage; while end-stage liver disease has a poor prognosis, NAFLD is clinically manageable when detected early. Classifying NAFLD into grades is imperative, especially in patients with advanced fibrosis, who are at greater risk of developing complications of end-stage liver disease. Although invasive and costly, liver biopsy is still the gold standard in NASH diagnosis and NAFLD staging. While surrogate serum biomarkers exist for NASH, no non-invasive test can yet reliably differentiate it from NAFL18,25. Ultrasonography, while lacking sensitivity, is used as the first-line screening tool for steatosis. Other imaging techniques such as controlled attenuation parameter (CAP) and computed tomography (CT) are promising, whilst magnetic resonance imaging with the proton density fat fraction technique (MRI-PDFF) is considered by many as the imaging gold standard. Considerations of sensitivity, efficiency, operator dependence, ease of operation, access, availability, and cost, among others, remain limiting factors for these modalities, particularly for their use in longitudinal and epidemiological studies25. The current understanding of NAFLD pathogenesis, its epidemiology, and the available diagnostic strategies underscores the importance of thorough surveillance, early detection, and timely intervention (e.g., lifestyle modification), not only for epidemiological surveillance but also to address the risk of comorbidity in NAFLD patients.

Dual-energy X-ray absorptiometry (DXA) is an imaging technique commonly used for assessing bone density. It also allows body composition assessment, particularly of muscle and fat deposition. It is based on X-ray imaging with a low radiation dose and has been validated extensively for both bone and body composition analyses26,27,28. Besides the commonly used bone density measurements, body composition parameters that can be derived from DXA include visceral adipose tissue mass, total body fat percentage, fat-free mass, and muscle-related mass. In total, a DXA scan can give up to 48 different parameters pertaining to body composition. The major limitation of DXA, however, is that it does not represent the body as a true 3D structure; volumetric parameters are therefore estimated from 2D projection measurements. Nevertheless, accuracy validations have shown that DXA-estimated mass agrees with scale weight to within 1%28,29,30. Furthermore, DXA has been shown to correlate well with CT and MRI, the cross-sectional imaging techniques that serve as gold standards in body composition assessment31. Despite this limitation, there is a reported consensus that DXA can be considered a reference technique, or at least a surrogate for CT/MRI, for the assessment of body composition in clinical practice32,33.

Given the known relationships between NAFLD and body composition parameters such as visceral fat, we reasoned that a model based on DXA-derived body composition parameters could be built to identify people at risk of hepatic steatosis. To this end, we first performed association analyses of various DXA-derived parameters and traditional body composition indices. The reference standard for hepatic steatosis in this study was MRI measurement using the PDFF technique, which has been extensively validated as comparable to histopathology34,35,36. We demonstrated that several DXA-derived parameters were significantly associated with hepatic steatosis. We then used machine learning (ML) to detect hepatic steatosis and to classify it into grades based on DXA scan parameters and body composition indices. Our hypothesis was that an accurate model can be built to predict the risk of NAFLD from DXA parameters.

Results

Cohort characteristics

A total of 2959 participants remained after exclusion (see Table 1, Fig. 1). These were 1271 males and 1688 females. In this cohort, 582 participants (19.67%) were deemed as having NAFLD based on the liver MRI-PDFF37. In total 303 were classified as grade 1, 225 as grade 2, and 54 as grade 3, respectively. The characteristics of the cohorts are summarized in Table 1. When stratified by gender, there were significant differences between all the DXA-derived body composition indexes and BSA-normalized DXA parameters between the NAFLD +ve and NAFLD -ve groups (see Supplementary Tables 1 and 2).

Table 1 Descriptive statistics of body composition indices stratified by gender.
Fig. 1: Overview of included data cohorts from the UK Biobank population and patient selection study workflow.

DXA dual-energy X-ray absorptiometry, NaN null values, WHR waist-to-hip ratio, ASMMI appendicular skeletal muscle mass index, AGR android gynoid ratio, FMI fat mass index, BMI body mass index, HI hip index, ABSI A Body Shape Index, MRI-PDFF magnetic resonance imaging proton density fat fraction, NAFLD non-alcoholic fatty liver disease, ICD International Classification of Diseases, ROC receiver operating characteristic, AUC area under the curve, SHAP SHapley Additive exPlanations.

Association analysis

The multivariable logistic regression analysis of the body composition indices revealed several parameters to be significantly associated with hepatic steatosis (see Table 2). Of note, obesity defined as BMI over 25 yielded an odds ratio (OR) of 1.9 for males and 2.62 for females. The measures of abdominal obesity, namely WHR (OR = 2.50 (male), 3.35 (female)), AGR (OR = 3.35 (male), 6.39 (female)) and WC (OR = 1.79 (male), 3.80 (female)), were all associated with hepatic steatosis. Similarly, when ABSI was divided into quartiles, the highest quartile yielded the highest OR (quartile 4 OR = 1.89 (male), 5.81 (female)), and for FMI, both the overweight (OR = 6.93 (male), 2.83 (female)) and the obese (OR = 14.12 (male), 5.32 (female)) categories were significantly associated with hepatic steatosis. Several DXA parameters were also significantly associated with hepatic steatosis. A summary of the top 10 features is shown in Table 3 (with full results in Supplementary Table 3). Of note, the strongest associations were observed for VAT mass (OR = 8.37 (male), 19.03 (female)), VAT volume (OR = 8.37 (male), 19.03 (female)), trunk fat mass (OR = 8.64 (male), 25.69 (female)), android fat mass (OR = 7.93 (male), 21.77 (female)) and total fat mass (OR = 3.60 (male), 3.90 (female)).

Table 2 NAFLD-associated body composition indices based on multivariable logistic regression analysis stratified by gender and adjusted by age, weight, and height.
Table 3 Top 5 positively NAFLD-associated DXA parameters with multivariable linear regression analysis stratified by gender and adjusted by age, weight, and height.

Machine learning models and prediction

We compared three machine learning classifiers. In binary classification, all three achieved reasonable performance, with ROC AUC = 0.83-0.87 (Fig. 2). Supplementary Tables 4-7 show the full results, with separate evaluations on the cross-validation folds and the hold-out test set. In this main section, we discuss the results on the hold-out test set; a graphical comparison of the three models on this set is shown in Fig. 2. We discuss the results of HGBC binary classification in more detail. Using the body composition indices, HGBC achieved an AUC of 0.8519, sensitivity of 0.7601, and specificity of 0.7500. Using DXA parameters, HGBC achieved an AUC of 0.8617, sensitivity of 0.7736, and specificity of 0.7605. Using combined parameters, HGBC achieved an AUC of 0.8656, sensitivity of 0.7686, and specificity of 0.7542. Combining traditional body composition indices with DXA parameters therefore did not appreciably improve performance. Multiclass classification models performed reasonably well in NAFLD grading (Supplementary Fig. 2). For example, using HGBC on DXA parameters, a weighted average ROC AUC (wROCUC) of 0.8377 was achieved, with AUCs of 0.86 for class 0, 0.72 for class 1, 0.79 for class 2, and 0.70 for class 3. In addition, gender-specific binary classification models had similar or better performance for females (Supplementary Figs. 3 and 4). For example, HGBC achieved an AUC of 0.86 with body composition indices, 0.88 with DXA parameters, and 0.89 with combined parameters.
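
For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity and ROC AUC can be computed from predicted probabilities. It is illustrative only: the labels, the scores and the 0.5 decision threshold are made-up assumptions rather than values from the study, and the AUC uses the rank-based (Mann-Whitney) formulation rather than any particular library routine.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) at a given probability threshold."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, y_score):
    """Probability that a random positive scores higher than a random negative."""
    positives = [s for l, s in zip(y_true, y_score) if l == 1]
    negatives = [s for l, s in zip(y_true, y_score) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical hold-out labels (1 = steatosis) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.10, 0.35, 0.62, 0.20, 0.80, 0.45, 0.91, 0.05, 0.70, 0.55]

sens, spec = sensitivity_specificity(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} AUC={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises ranking quality across all thresholds, which is why both kinds of metric are reported side by side.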

Fig. 2: ROC AUC curves for the three different machine learning classifications.

ROC receiver operating characteristic, AUC area under the curve, LR logistic regression, HGBC HistGradient Boosting Classifier, XGBC Extreme Gradient Boosting, DXA dual-energy X-ray absorptiometry.

We then proceeded to examine the contribution of each of the features using SHAP analysis. All SHAP analyses for the 3 classifiers are demonstrated in Supplementary Figs. 5-10. The SHAP features for HGBC and XGBC were almost identical. For the main result section, we shall focus on HGBC. As expected, the top contributions from the machine learning models were from the features that were highly associated with hepatic steatosis based on the odds ratio (Fig. 3). For example, the top 3 contributions from body composition analyses were from AGR, FMI, and WC. Whereas for the BSA-normalised DXA parameters, the top 3 contributions were from VAT mass, trunk fat mass, and trunk total mass.
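
SHAP attributes a prediction to individual features using Shapley values from cooperative game theory. As a rough illustration of the underlying idea (not the SHAP library itself, and not the study's models), the sketch below computes exact Shapley values by brute force for a hypothetical three-feature risk score. The function `toy_risk_score`, its coefficients, the input values and the all-zero baseline are assumptions made for illustration; features outside a coalition are simply set to the baseline, which is a simplification of how SHAP averages over background data.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in the coalition take the instance value, others the baseline.
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


def toy_risk_score(features):
    """Hypothetical score with one interaction term (illustration only)."""
    vat_mass, trunk_fat, lean_mass = features
    return 0.8 * vat_mass + 0.5 * trunk_fat - 0.3 * lean_mass + 0.2 * vat_mass * trunk_fat


x = [1.5, 1.0, -0.5]          # standardised feature values for one participant
baseline = [0.0, 0.0, 0.0]    # reference point (cohort mean after standardising)
phis = shapley_values(toy_risk_score, x, baseline)
for name, phi in zip(["VAT mass", "trunk fat mass", "lean mass"], phis):
    print(f"{name}: {phi:+.3f}")
```

The contributions always sum to the difference between the prediction and the baseline prediction, which is the property that makes per-feature rankings such as those in Fig. 3 interpretable.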

Fig. 3: SHAP feature importance on body composition indices (left) and BSA-normalised DXA parameters (right).

SHAP SHapley Additive exPlanations, BSA body surface area, AGR android gynoid ratio, FMI fat mass index, BMI body mass index, WC waist circumference, WHR waist-to-hip ratio, HC hip circumference, HI hip index, ASMMI appendicular skeletal muscle mass index, ABSI A Body Shape Index.

The SHAP dependency plots are shown in Fig. 4 for the top 3 contributions. There are clear positive correlations between increasing SHAP values and increasing risks of disease with more distinct separations between the low and high-risk groups.

Fig. 4: SHAP dependence plots of the top 3 predictors of HGBC models trained on body composition indices and DXA parameter.

HGBC HistGradient Boosting Classifier, AGR android gynoid ratio, SHAP SHapley Additive exPlanations, BSA body surface area, DXA dual-energy X-ray absorptiometry, NAFLD non-alcoholic fatty liver disease.

Discussion

We have shown that DXA-derived parameters were highly associated with hepatic steatosis as measured on MRI-PDFF. Among the traditional body composition indices, FMI (which utilises fat mass information from the DXA scan) had the strongest association. Previously, traditional metrics such as WC were shown to be predictive of hepatic steatosis and fibrosis38, but in our study FMI was more predictive. Other studies have also highlighted the importance of DXA parameters such as AGR and VAT mass39, but we believe our study is the first to compare all the DXA parameters with traditional parameters such as WC. A recent study also demonstrated that FMI can identify hepatic steatosis, as determined by ultrasonography, with a high degree of accuracy40. With regard to DXA, we have shown that many DXA parameters (normalised to BSA) were highly associated with hepatic steatosis: not only the fat-related parameters, as would be expected, but also others such as those relating to lean mass. For instance, total lean mass had an odds ratio of less than 1 for both genders, indicating a negative association with NAFLD. Lee et al. (2021) observed that participants in their study had less skeletal muscle mass over several years of follow-up, and their findings suggest that maintaining muscle mass is important in NAFLD management41. Meanwhile, Cho et al. (2022) have shown that the ratio of skeletal muscle mass to visceral fat area could serve as a complementary index to conventional adiposity indices in detecting NAFLD among lean yet overweight men and women42. This underscores the potential and practical application of non-conventional indices or measurements, beyond adiposity indices alone, to NAFLD diagnosis.
Several studies have examined the role of muscle mass (particularly fat infiltration of muscle), and we also wanted to examine other muscle-related parameters that can be derived from DXA scans. Whilst some associations were seen for the lean mass parameters on DXA, by far the strongest associations were observed for parameters pertaining to fat: VAT mass and volume, trunk fat mass, android fat mass, and total mass showed extremely strong associations, far higher than those seen with traditional parameters. With several DXA parameters being associated with hepatic steatosis, we set out to build a machine learning model to predict hepatic steatosis, and we showed that a reasonably accurate model can be built using these parameters.

In this study, we utilised logistic regression and two boosting classifiers. As expected, the performance and feature importance of the classifiers varied slightly. On the one hand, LR performed marginally better than HGBC with DXA parameters in the gender-unstratified dataset (Fig. 2). On the other hand, the gender-stratified models show that the histogram-based boosting classifiers outperformed LR with body composition indices but not with DXA parameters (Supplementary Figs. 3 and 4). Theoretically, LR is less robust than boosting classifiers in high-dimensional datasets, where it tends to overfit. While LR could be trained with DXA parameters, its assumption of linearity between the dependent and independent variables is a major limitation. Furthermore, multicollinearity between DXA parameters is expected, which makes a boosting ensemble classifier, able to capture a wide range of relationships between dependent and independent variables, a more suitable algorithm. In cases where LR performed better, we hypothesise that this is because of the default regularisation in LR. Regularisation improves the generalisation performance of the LR model by reducing overfitting, and it also mitigates the effect of multicollinearity. In general, DXA parameters outperformed traditional body composition indices in any ML algorithm. Meanwhile, combining body composition indices and DXA parameters did not result in a significant improvement in performance. We hypothesise that this could be due to the more encompassing nature of DXA parameters compared with traditional body composition indices. While a minimum set of DXA parameters could be inferred from association and feature importance, the small yet cumulative importance of the other parameters cannot be discounted.

Early detection of NAFLD is important in order that timely intervention can be prescribed to patients (e.g., lifestyle and diet modification) by healthcare practitioners. In this study, DXA-based ML models demonstrate a potential alternative means to perform early diagnosis of NAFLD, although it is important to take note that the results presented are preliminary and are subject to follow-up validations. Moreover, accessibility to DXA scanning needs to be borne in mind. Nevertheless, the performance of the models based on ROC AUC and sensitivity makes them a promising surrogate compared to conventional imaging techniques. Ultrasonography, for instance, has a sensitivity greater than 90% if the fat content is higher than 30%. Similarly, CT achieves 82% sensitivity on moderate to severe degrees of steatosis43,44. Meanwhile, MRI has a sensitivity of 80-95.8% making it the gold standard in the detection of liver steatosis35,45,46. While these imaging techniques can all be considered suitable for early detection of NAFLD, concerns on detection limit, radiation exposure (in case of CT), access, and ease of operation among others have resulted in divided preferences on their adoption in the clinical practice to quantify liver steatosis. To this end, liver biopsy has remained the gold standard in confirming NASH. However, due to its invasiveness, the frequency of patient/participant hesitating and subsequent refusal to undergo the procedure may exceed 50% in some centres—ostensibly precluding its potential utility as a practical option in early NAFLD screening or detection47.

There are some limitations worth noting. First, we recognise that the recently proposed metabolic-associated fatty liver disease (MAFLD) is now recommended for usage with the aim to cover the more heterogeneous nature of the disease, and not excluding the impact of alcohol on the disease48. For the purpose of this study, we set out to examine and isolate the metabolic associated factors and hence have excluded patients with excess alcoholic intake. Second, the data used for this study was from the UK Biobank cohort, and whilst this is useful for the predominantly Western population, applicability to other regions and ethnicity may need to be further examined. Third, we did not have an independent validation set to test the generalisability of our model beyond the UK Biobank cohort. We are currently in the process of recruiting participants to pursue this objective, so we can test the generalisability of our findings.

As NAFLD cases rise to epidemic proportions, new tools that can be used for opportunistic screening may be helpful, particularly as early detection is important. In this study, we showed not only the association of traditional body composition indices with hepatic steatosis but also the strong association of DXA parameters with hepatic steatosis. As expected, visceral adipose tissue mass, trunk fat mass, and adipose tissue mass showed a strong positive association with hepatic steatosis, while total lean mass demonstrated a negative association. The ML models trained on the two types of predictors illustrate how body composition indices and DXA can potentially be leveraged to opportunistically screen for NAFLD, although more prospective studies with validation across different populations, as well as cost-effectiveness analyses, need to be performed before this can be adopted more widely.

Methods

The data used were from the UK Biobank which received ethical approval from the North West Multicentre Research Ethics Committee (REC reference: 11/NW/03820). All participants gave written informed consent before enrolment in the study. This research has been conducted using the UK Biobank Resource under Application Number 78730. Additionally, this study was approved by the authors’ own local ethics board (UW-20814) at the University of Hong Kong.

Study population

The UK Biobank cohort consists of over half a million participants from the general population in the United Kingdom (UK). Participants were aged between 40 and 70 years at enrolment and were recruited between 2006 and 2010, with follow-up data. Imaging assessments of this cohort began in 2014, with the aim of scanning 100,000 participants using brain, cardiac and abdominal magnetic resonance imaging, DXA, and carotid ultrasound. At the time of writing, the UK Biobank imaging project has collected imaging scans from over 60,000 participants (https://www.ukbiobank.ac.uk/explore-your-participation/contribute-further/imaging-study). For this study, we focused on the imaging data, particularly participants with abdominal MRI and DXA imaging, and retrieved all other relevant associated information. Only participants with MRI-PDFF37,49 (UK Biobank Category 126) and DXA-derived parameters including visceral fat were included. The UK Biobank provides documentation of the DXA imaging modality (https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/DXA_explan_doc.pdf) as a reference.

Data pre-processing

Data processing, statistics, machine learning classification and visualization were performed with custom-made Python scripts based on Statsmodels and Scikit-Learn unless stated otherwise50,51. Electronic health records were retrieved from the UK Biobank, limiting the search to participants with both “10 P Liver PDFF (proton density fat fraction) | Instance 2” and “VAT (visceral adipose tissue) mass”. The downloaded dataset includes DXA-related attributes together with attributes on gender, age, alcohol consumption, and comorbidities. In summary, a total of 4663 participants with matching records were retrieved from the UK Biobank. DXA-related attributes with more than 50% missing values were excluded (n = 7), while missing values in participants with less than 50% missing DXA attributes were imputed with multiple imputation by chained equations (MICE)52,53. Participants with missing height and/or weight attributes in Instance 2 were excluded (n = 18), leaving 4645 participants. DXA attributes were normalized by body surface area (BSA) calculated with the Mosteller formula54. The Mosteller formula was chosen for its accuracy in various clinical use cases and its applicability among normal, overweight, and obese adults55,56,57,58,59,60. Body composition indices including waist-to-hip ratio (WHR), appendicular skeletal muscle mass index (ASMMI), android gynoid ratio (AGR), fat mass index (FMI), BMI, hip index (HI), and A Body Shape Index (ABSI) were calculated. National Health and Nutrition Examination Survey (NHANES) population average values of height = 166 cm and weight = 73 kg were used for calculating ABSI38,61,62.
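
The normalisation and index calculations described above can be sketched as follows. This is a minimal illustration under stated assumptions: the Mosteller formula is the standard one (BSA in m² from height in cm and weight in kg), while the BMI, FMI, WHR and ABSI definitions are the commonly published ones and are assumed here rather than taken from the study's scripts. The function names and the example values are hypothetical, and ASMMI, AGR and HI are omitted.

```python
import math


def mosteller_bsa(height_cm, weight_kg):
    """Body surface area (m^2) by the Mosteller formula."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def bsa_normalise(dxa_value, height_cm, weight_kg):
    """Divide a DXA-derived mass or volume by the participant's BSA."""
    return dxa_value / mosteller_bsa(height_cm, weight_kg)


def body_indices(height_cm, weight_kg, waist_cm, hip_cm, fat_mass_kg):
    """Commonly used definitions (assumed, not confirmed by the source)."""
    height_m = height_cm / 100.0
    bmi = weight_kg / height_m ** 2                 # kg/m^2
    fmi = fat_mass_kg / height_m ** 2               # kg/m^2
    whr = waist_cm / hip_cm                         # dimensionless
    # ABSI with waist and height in metres.
    absi = (waist_cm / 100.0) / (bmi ** (2.0 / 3.0) * math.sqrt(height_m))
    return {"BMI": bmi, "FMI": fmi, "WHR": whr, "ABSI": absi}


# Hypothetical participant: 180 cm, 80 kg, VAT mass 1.2 kg.
bsa = mosteller_bsa(180, 80)
vat_per_bsa = bsa_normalise(1.2, 180, 80)
indices = body_indices(180, 80, waist_cm=90, hip_cm=100, fat_mass_kg=20)
print(f"BSA={bsa:.2f} m^2, VAT/BSA={vat_per_bsa:.2f} kg/m^2")
print({name: round(value, 4) for name, value in indices.items()})
```

Dividing each DXA attribute by BSA puts participants of different body size on a comparable scale before the association analysis and model training.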

Reference standard, predictor variables and inclusion criteria

While liver biopsy remains the gold standard in NAFLD diagnosis and grading, its inherent invasiveness precludes routine use. The proton density fat fraction on MRI (MRI-PDFF) has been demonstrated to correlate well with total lipid accumulation in the liver, making it a suitable surrogate for liver biopsy and a suitable reference standard34,35,36. In this study, UK Biobank participants were categorized into NAFLD grades based on their MRI-PDFF values, following Szczepaniak et al.’s NAFLD grading scheme (cut-off values)63. Class labels were either binary (0 or 1, for absence or presence) or multiclass (0 to 3, for normal, mild, moderate, and severe). In brief, grades 0, 1, 2 and 3 correspond to fat content (steatosis) of 0 to ≤5.56%, >5.56% to ≤10%, >10% to ≤20%, and >20%, respectively34,64. Participants were excluded if they had excess alcohol intake, defined as consuming more than 21 (male) or 14 (female) alcohol units per week (n = 1654), chronic liver disease (International Classification of Diseases, Tenth Revision (ICD-10) codes K73, K74 and K75) (n = 18), or both (n = 14)65,66,67,68. After applying these exclusions, the final cohort comprised 2959 participants.
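
The grading cut-offs and exclusion rules above translate directly into code. The following is a minimal sketch: the thresholds, unit limits, ICD-10 codes and participant counts come from the text, whereas the function names, the argument format and the prefix matching of ICD-10 codes are assumptions made for illustration.

```python
def pdff_grade(pdff_percent):
    """Map an MRI-PDFF value (%) to steatosis grade 0-3."""
    if pdff_percent <= 5.56:
        return 0  # normal
    if pdff_percent <= 10.0:
        return 1  # mild
    if pdff_percent <= 20.0:
        return 2  # moderate
    return 3      # severe


def has_nafld(pdff_percent):
    """Binary label: any grade above 0 counts as steatosis present."""
    return int(pdff_grade(pdff_percent) > 0)


CHRONIC_LIVER_CODES = {"K73", "K74", "K75"}


def is_excluded(sex, alcohol_units_per_week, icd10_codes):
    """Apply the alcohol and chronic liver disease exclusion criteria."""
    limit = 21 if sex == "male" else 14
    excess_alcohol = alcohol_units_per_week > limit
    # Assumption: subcodes such as "K74.6" are matched on their first 3 characters.
    chronic_liver = any(code[:3] in CHRONIC_LIVER_CODES for code in icd10_codes)
    return excess_alcohol or chronic_liver


# Consistency check of the cohort sizes reported in the text.
final_cohort = 4645 - (1654 + 18 + 14)
print(final_cohort)
print(pdff_grade(7.3), has_nafld(7.3), is_excluded("female", 15, []))
```

Keeping the binary label as a function of the grade ensures that the binary and multiclass tasks are derived from the same cut-offs.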

Statistical analysis

Two sets of predictor variables were adopted for the analysis: (1) 9 body composition indices and (2) 36 (mass- and volume-based) BSA-normalized DXA parameters. We set out to determine the association between these variables and hepatic steatosis. Independent-sample t-tests with unequal variances were performed to determine whether the two groups (NAFLD -ve and NAFLD +ve) differed significantly in the predictor variables. Multivariable logistic regression, adjusted for age, weight, and height, was performed to obtain odds ratios for the categories or quantiles of each body composition index relative to the reference (“normal”) category or to the first quantile of the sample69. Similarly, odds ratios for DXA parameters were calculated from the standardized (beta, β) coefficients of linear regression analysis.
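
As a rough guide to the two calculations named above, the sketch below computes a Welch (unequal-variance) t statistic with its approximate degrees of freedom, and converts a regression coefficient into an odds ratio with a Wald confidence interval. It uses only the Python standard library rather than the Statsmodels routines used in the study. The sample values, the coefficient and its standard error are invented for illustration, and the p-value lookup is omitted.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = variance(sample_a) / n_a  # squared standard error of each mean
    var_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t_stat, dof


def odds_ratio(beta, std_err, z=1.96):
    """Odds ratio exp(beta) with an approximate 95% Wald interval."""
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


# Hypothetical BSA-normalised VAT mass in the two groups.
nafld_negative = [0.31, 0.42, 0.38, 0.29, 0.45, 0.36]
nafld_positive = [0.61, 0.74, 0.58, 0.83, 0.69]
t_stat, dof = welch_t(nafld_positive, nafld_negative)
print(f"t={t_stat:.2f}, df={dof:.1f}")

# Hypothetical logistic-regression coefficient and standard error.
or_point, or_low, or_high = odds_ratio(beta=0.963, std_err=0.15)
print(f"OR={or_point:.2f} (95% CI {or_low:.2f}-{or_high:.2f})")
```

An odds ratio above 1 indicates a positive association with steatosis and one below 1 a negative association, which is how the lean mass results in the Discussion are read.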

Machine learning model training and evaluation

We then set out to develop ML models for the prediction of hepatic steatosis. Three machine learning classifiers were compared: logistic regression (LR) and two histogram-based gradient boosting ensembles, HistGradientBoostingClassifier (HGBC, Scikit-Learn) and the Extreme Gradient Boosting (XGBoost) classifier (XGBC). These algorithms were used to train binary and multiclass classifiers taking as inputs body composition indices, BSA-normalised DXA values, or the combined variables (body composition indices and BSA-normalised DXA values)70,71. Data were randomly partitioned into 80:20 train-test sets with stratification such that the proportions of NAFLD +ve and NAFLD -ve were consistent in both sets. Owing to the imbalanced datasets, resampling techniques to boost the minority class were used72. The minority classes were oversampled with the support vector machine variant of the synthetic minority oversampling technique (SMOTE-SVM) (k, m = 10, 5). Meanwhile, the majority class was re-sampled and under-sampled in the process with the edited nearest neighbour variant of the synthetic minority oversampling technique (SMOTE-ENN) and RandomUnderSampler, respectively72. Hyperparameters were optimised for specificity based on k-1 validation sets while simultaneously testing for performance with repeated (n = 3) and stratified k-fold (k = 10) cross-validation. For LR, the solver and tolerance parameters were optimised for specificity (together with the L2 regularisation parameters in all other algorithms). For HGBC, the optimisation parameters included the maximum number of leaves per tree, the maximum depth of each tree, and the minimum number of samples per leaf. For XGBC, the optimisation parameters included the learning rate, number of estimators, maximum tree depth, lambda regularisation, and subsample ratio of the training instances. Models were built using the optimised hyperparameters with SMOTE-oversampled minority class/es on the hold-out training sets.
Supplementary Table 8 lists the optimised hyperparameters for the various models trained in this study, while Supplementary Tables 4 and 5 show the performance metrics of the ML algorithms trained with different types of predictors on gender-(un)stratified sets. Model performance was evaluated on a separate hold-out test dataset using various performance metrics, including the area under the receiver operating characteristic curve (ROC AUC). Finally, feature importance was identified and ranked based on SHAP values73.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.