Introduction

Non-alcoholic fatty liver disease (NAFLD), the hepatic manifestation of metabolic syndrome, is the most common chronic liver disease1,2. Worldwide prevalence of metabolic syndrome and NAFLD has increased in parallel with increased obesity prevalence3,4,5, which is about 20–30% in developed countries and one-third among American adults6,7,8.

Obesity is a common metabolic risk factor associated with NAFLD9,10,11. The prevalence of NAFLD is directly related to increased body mass index (BMI) and central obesity12,13,14. Most studies have shown that visceral fat contributes to hepatic steatosis independently of BMI15,16. The amount of adipose tissue and its distribution differ between men and women17. Women have more overall fat tissue, with relatively more subcutaneous adipose tissue in the hips and thighs. Men, by contrast, accumulate visceral and subcutaneous fat mainly in the trunk and abdomen, with continuous changes before and after puberty17,18,19. Increased fat distribution around the waist (i.e. an apple-shaped body) is linked to NAFLD in both genders20. In a pear-shaped body, subcutaneous fat accumulates mainly in the thighs and buttocks21,22. This pattern is typical among females but can increase the risk of metabolic syndrome in males, which is a risk factor for NAFLD independent of central obesity23. In support of the role of fat distribution and anthropometric measures in NAFLD, studies have found several contributing factors, including abdominal, waist and neck circumferences and fat accrual in the trunk and arms24,25,26,27,28,29.

Most people with NAFLD, both children and adults, do not have distinctive symptoms at the early stages of the disease30. Notably, after the development of cirrhosis, symptoms such as caput medusae, spider angioma, palmar erythema, ascites, and jaundice appear31. Therefore, early diagnosis is critical to prevent severe complications.

Ultrasonography and laboratory tests are the typical diagnostic methods for detecting fatty liver disease. Ultrasound has relatively high accuracy in detecting moderate-to-severe steatosis and lower accuracy in the earlier stages of NAFLD32. Notably, hepatic fibrosis cannot be diagnosed by ultrasonography14,33. Although typically used to detect fatty liver disease, laboratory tests are not useful for all age and gender groups due to low accuracy34,35. Therefore, a precise, cost-effective, and non-invasive method for diagnosing NAFLD across the various stages of fatty liver is desirable. Such an approach could support early diagnosis of NAFLD, which could help prevent progression of hepatic steatosis to fibrosis, advanced cirrhosis, and hepatocellular carcinoma.

In recent years, machine learning (ML) models have been used as a novel approach to predicting NAFLD36,37,38,39. However, these studies have focused mainly on laboratory outcomes and have not considered body composition and anthropometric factors. Therefore, the primary aim of this study is to identify the best-performing ML classifiers of NAFLD using body composition and anthropometric indices. The secondary aim is to identify feature contributions to the prediction of NAFLD.

Materials and methods

Study design and participants

This cross-sectional study was conducted to explore NAFLD phenotypes based on body composition and anthropometric indices. Participants were recruited from the eastern (Khorasan Razavi) and southern (Hormozgan) provinces of Iran through advertisements on the notice boards of the university clinics, as well as via phone or email contact with potential participants. A total of 593 individuals aged above 13 years were initially recruited. Eighty individuals were excluded from the study and 513 participants remained. Exclusion criteria were as follows: the presence of underlying liver disease; taking medications (anti-hypertensive and anti-arrhythmic, anti-glycaemic, corticosteroids, nervous system agents, chemotherapy, Methotrexate and Tamoxifen); alcohol consumption more than twice a week; a history of any type of cancer during the last year; a history of surgery during the last 6 months; and pregnancy. The study was conducted following the Declaration of Helsinki, and ethical approval was granted by Mashhad University of Medical Sciences (Code: IR.MUMS.fm.REC.1395.64). All participants provided written informed consent. For participants aged under 18 years, written informed consent was also obtained from their guardians.

Data collection

At each medical clinic, eligibility, demographic questionnaires, and anthropometric and body composition measurements were assessed by two trained nutritionists. Medical examination and disease diagnosis were performed by a general physician and an internal specialist, respectively. Demographic information, including sex, age, education, disease history and medications, was assessed by a researcher using a questionnaire. Weight was measured using a digital weighing scale (Seca 704; Hamburg, Germany), height was measured using a wall height chart, and body composition was assessed using an InBody 270 (Inbody Co. Ltd, South Korea) body analyzer to measure percentage body fat, total fat mass, muscle mass, as well as fat mass in the right/left leg, right/left arm and trunk, with participants in light clothing and without shoes. The circumferences of the neck, chest, arm, wrist, waist, hips, abdomen and thighs, and the lengths of the ulna and leg, were measured using a flexible tape measure with an accuracy of 0.1 cm. BMI was calculated by dividing weight (kilograms) by height in meters squared26. Subcutaneous fat in the area below the scapula, over the biceps and triceps, and at the upper iliac crest was measured using a Saehan calliper (Saehan SH5020, Korea). Participants were also examined for acanthosis at the back of the neck and in the armpits, and for the presence of subcutaneous fat under the chin and at the back of the neck. A Fibroscan equipped with the M and XL probes (Echosens 504, Paris, France) was used to assess both controlled attenuation parameter (CAP) (dB/m) and liver stiffness measurement (LSM) (kPa) values simultaneously. A reliable LSM was defined as the median liver stiffness of 10 measurements with a success rate greater than 60% and an IQR < 30% of the median LSM value40.
CAP values range from 100 to 400 dB/m and the following cut-off values were used for the diagnosis of steatosis stages: Stage 0, < 238 dB/m, Stage 1, ≥ 238 to 260 dB/m, Stage 2, ≥ 260 to 292 dB/m, and Stage 3, ≥ 292 dB/m41. LSM values range from 1.5 to 75 kPa, and the following cut-off values were used for the diagnosis of liver fibrosis stages: no significant fibrosis or F0 < 6.2 kPa, mild fibrosis or F1 ≥ 6.2 to 7.6 kPa, moderate fibrosis or F2 ≥ 7.6 to 8.8 kPa, severe fibrosis or F3 ≥ 8.8 to 11.8 kPa and cirrhosis or F4 ≥ 11.8 kPa42.
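The BMI formula and the staging cut-offs above can be expressed as a small lookup. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes that the lower bound of each stage is inclusive, as the "≥" signs in the text suggest.

```python
from bisect import bisect_right

# Cut-offs quoted in the text; the lower bound of each stage is inclusive.
CAP_CUTOFFS = [238, 260, 292]        # dB/m, boundaries between steatosis stages 0-3
LSM_CUTOFFS = [6.2, 7.6, 8.8, 11.8]  # kPa, boundaries between fibrosis stages F0-F4


def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def steatosis_stage(cap_db_m):
    """Map a CAP value (dB/m) to a steatosis stage 0-3 (hypothetical helper)."""
    return bisect_right(CAP_CUTOFFS, cap_db_m)


def fibrosis_stage(lsm_kpa):
    """Map an LSM value (kPa) to a fibrosis stage label F0-F4 (hypothetical helper)."""
    return f"F{bisect_right(LSM_CUTOFFS, lsm_kpa)}"
```

Under these assumptions, `steatosis_stage(238)` falls in Stage 1 and `fibrosis_stage(11.8)` in F4, because each boundary value belongs to the higher stage.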

Statistical analysis

Descriptive and non-predictive data analysis was performed using SPSS version 21 software (SPSS, Inc., Chicago, IL). Data were expressed as mean ± standard deviation or frequencies. Between-group comparisons were performed using an independent sample t test and analysis of variance (ANOVA), followed by Tukey’s post hoc test. A P-value of less than 0.05 was considered statistically significant.

Machine learning models

Three label variables were considered: fatty liver (stage I, II and III vs. no steatosis), steatosis stage, and fibrosis stage. Eight ML techniques were applied to the dataset to identify the best modelling approach: k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Radial Basis Function (RBF) SVM, Gaussian Process (GP), Random Forest (RF), Neural Network (NN), AdaBoost and Naïve Bayes. A detailed explanation of these classifiers can be found elsewhere43. These models were tested using the Scikit-learn library in the Python programming language44.
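For orientation, the eight classifier families listed above all have counterparts in Scikit-learn. The sketch below only shows how such a set could be instantiated. The hyperparameters are library defaults (except `max_iter` for the neural network), the mapping of "SVM" to a linear kernel is an assumption, and `make_classifiers` is a hypothetical helper; the study's actual settings are not reported in the text.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def make_classifiers(seed=0):
    """Return one untrained instance of each classifier family (assumed settings)."""
    return {
        "kNN": KNeighborsClassifier(),
        "SVM (linear kernel, assumed)": SVC(kernel="linear", random_state=seed),
        "RBF SVM": SVC(kernel="rbf", random_state=seed),
        "Gaussian Process": GaussianProcessClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Neural Network": MLPClassifier(max_iter=1000, random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "Naive Bayes": GaussianNB(),
    }


classifiers = make_classifiers()
```

Keeping the models in a dictionary makes it straightforward to loop over them with the same training and evaluation code.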

To compare the classifiers comprehensively, we trained and evaluated each classifier on the dataset 50 times. This is because different classifiers sometimes predict slightly different outputs, and the initial points of a given classifier differ in each run. Thus, a reliable estimate can be obtained by averaging each classifier's results over several runs. Model accuracy and area under the curve (AUC) are reported for each ML technique. Importance values are reported for individual feature variables.
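The repeated-run procedure can be sketched as follows. This is an illustration under stated assumptions rather than the study's pipeline: the data are synthetic (generated with `make_classification`), the 70/30 stratified split is an assumed choice, only the binary outcome is handled, and `repeated_evaluation` is a hypothetical helper. The example uses 10 runs to keep it fast; the study used 50.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def repeated_evaluation(make_model, X, y, n_runs=50, test_size=0.3):
    """Average accuracy and AUC over n_runs random train/test splits (binary labels)."""
    accuracies, aucs = [], []
    for run in range(n_runs):
        # A different seed per run changes both the split and the model initialisation.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=run
        )
        model = make_model(run).fit(X_tr, y_tr)
        accuracies.append(accuracy_score(y_te, model.predict(X_te)))
        aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
    return float(np.mean(accuracies)), float(np.mean(aucs))


# Synthetic stand-in data; NOT the study dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
mean_acc, mean_auc = repeated_evaluation(
    lambda seed: RandomForestClassifier(n_estimators=50, random_state=seed),
    X, y, n_runs=10,
)
```

Collecting the per-run values in lists, rather than only the means, would also allow box plots of the run-to-run spread to be drawn.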

Pre-processing involved data normalization and segmentation. The few missing values in the numerical measurements were replaced using linear interpolation45. Principal component analysis (PCA) was used to extract features from the data46,47. Data were divided into two parts, training and test sets. Processing involved feature selection and classification with the best features. A variety of models were fitted, and the model with the highest performance was selected.
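One way to chain these pre-processing steps is shown below. It is a sketch under several assumptions that the text does not specify: the column names and data are invented, normalization is taken to mean standardisation (`StandardScaler`), PCA keeps the components explaining 95% of the variance, and the split is 70/30.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for the anthropometric table; NOT the study data.
columns = ["waist_cm", "abdomen_cm", "chest_cm", "trunk_fat_kg", "bmi", "weight_kg"]
df = pd.DataFrame(rng.normal(size=(200, len(columns))), columns=columns)
y = (df["waist_cm"] + df["bmi"] > 0).astype(int)  # synthetic binary label

# Introduce a few missing values, then fill them by linear interpolation.
df.iloc[0, 0] = np.nan
df.iloc[5, 2] = np.nan
df = df.interpolate(method="linear", limit_direction="both")

X_tr, X_te, y_tr, y_te = train_test_split(
    df, y, test_size=0.3, stratify=y, random_state=0
)

# Scaling and PCA are fitted on the training part only, inside the pipeline.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    RandomForestClassifier(random_state=0),
)
model.fit(X_tr, y_tr)
test_accuracy = model.score(X_te, y_te)
```

Placing the scaler and PCA inside the pipeline avoids leaking information from the test set into the fitted transformations.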

Patient consent

All patients provided written consent for participation in this study. For participants aged under 18 years, written informed consent was obtained from their guardians.

Ethics approval

Ethical approval was received from the research ethics committee at Mashhad University of Medical Sciences (Code: IR.MUMS.fm.REC.1395.64).

Results

In total, 513 participants (240 males and 273 females) took part in the study, of whom 169 (74.1%) male and 220 (80.6%) female cases had a degree of hepatic steatosis. The mean age, weight, and BMI were 37.04 ± 15.44 years, 77.26 ± 17.31 kg, and 28.15 ± 4.89 kg/m², respectively. Overall demographic characteristics and biochemical measures are presented in Table 1. Significant differences were found in most anthropometric variables between male and female participants (see Tables 2 and 3).

Table 1 Demographic information of study participants.
Table 2 A comparison of the anthropometric variables across different stages of hepatic steatosis in male and female participants.
Table 3 A comparison of the anthropometric variables across different stages of hepatic fibrosis in male and female participants.

Machine learning results

Figures 1, 2 and 3 present box plots for each classification method applied to the three outcomes. The Random Forest (RF) method generated the most accurate ML model for fatty liver (presence of any stage), steatosis stage and fibrosis stage. The average accuracy and AUC values obtained with RF were 0.82 and 0.84 for fatty liver, 0.52 and 0.69 for steatosis stages, and 0.57 and 0.58 for fibrosis stages, respectively. Average accuracy and AUC are presented in the Supplemental file (Model Iterations) for all conditions. Moreover, sensitivity, specificity, true positive and true negative measures are presented for fatty liver disease.
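For readers less familiar with these measures, sensitivity, specificity and accuracy follow directly from the counts of true and false positives and negatives. The sketch below uses invented labels purely for illustration; `binary_metrics` is a hypothetical helper, and 1 denotes the presence of fatty liver.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = fatty liver)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Invented example labels; not taken from the study.
metrics = binary_metrics(
    y_true=[1, 1, 1, 0, 0, 1, 0, 1],
    y_pred=[1, 1, 0, 0, 1, 1, 0, 1],
)
```

In this toy example there are 4 true positives, 1 false negative, 2 true negatives and 1 false positive, giving a sensitivity of 0.8 and an accuracy of 0.75.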

Figure 1

Box plots showing the performance of the different classification methods applied to the dataset for the presence of fatty liver. Each box plot summarises 50 individual runs of a classifier, so that the reported results reflect run-to-run variability.

Figure 2

Box plots showing the performance of the different classification methods applied to the dataset for stages of steatosis. Each box plot summarises 50 individual runs of a classifier, so that the reported results reflect run-to-run variability.

Figure 3

Box plots showing the performance of the different classification methods applied to the dataset for stages of fibrosis. Each box plot summarises 50 individual runs of a classifier, so that the reported results reflect run-to-run variability.

Feature variables with the highest predictive value for fatty liver were abdomen circumference (average importance value, IV = 0.061), waist circumference (IV = 0.061), chest circumference (IV = 0.054), trunk fat (IV = 0.056) and BMI (IV = 0.053); for steatosis stage, they were abdominal circumference (IV = 0.053), waist circumference (IV = 0.052), chest circumference (IV = 0.052), trunk fat (IV = 0.051) and BMI (IV = 0.050); and for fibrosis, they were abdominal circumference (IV = 0.049), waist circumference (IV = 0.049), chest circumference (IV = 0.043), BMI (IV = 0.045) and weight (IV = 0.045). See Figs. 4, 5, 6 and Tables 4, 5, 6.
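Average importance values of this kind can be obtained by refitting a random forest several times and averaging its per-feature importances. The sketch below assumes Scikit-learn's impurity-based `feature_importances_`, which may differ from the importance measure used in the study. The data are synthetic, the feature names are invented, and `mean_importances` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def mean_importances(X, y, feature_names, n_runs=50):
    """Average impurity-based RF importances over n_runs seeds, sorted descending."""
    runs = np.array([
        RandomForestClassifier(n_estimators=100, random_state=seed)
        .fit(X, y)
        .feature_importances_
        for seed in range(n_runs)
    ])
    mean_iv = runs.mean(axis=0)
    order = np.argsort(mean_iv)[::-1]  # indices from most to least important
    return [(feature_names[i], float(mean_iv[i])) for i in order]


# Synthetic stand-in data with invented feature names; NOT the study dataset.
names = ["abdomen_cm", "waist_cm", "chest_cm", "trunk_fat_kg", "bmi", "weight_kg"]
X, y = make_classification(
    n_samples=300, n_features=len(names), n_informative=3, random_state=0
)
ranking = mean_importances(X, y, names, n_runs=10)
```

Impurity-based importances sum to 1 across features within each forest, so individual values depend on how many features are included in the model.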

Figure 4

Box plots showing relative feature importance for presence of fatty liver. hx history, cm centimeter, kg kilograms, BMI body mass index, MUAC mid-upper arm circumference.

Figure 5

Box plots showing relative feature importance for stages of steatosis. hx history, cm centimeter, kg kilograms, BMI body mass index, MUAC mid-upper arm circumference.

Figure 6

Box plots showing relative feature importance for stages of fibrosis. hx history, cm centimeter, kg kilograms, BMI body mass index, MUAC mid-upper arm circumference.

Table 4 Variable importance from the random forest method for fatty liver (presence of any stage).
Table 5 Variable importance from the random forest method for steatosis stage.
Table 6 Variable importance from the random forest method for fibrosis.

Further assessment identified gender-specific features (see Supplemental Figs. 1–6; Supplemental Tables 1–6). Among females, the most important predictors of fatty liver disease were waist circumference (IV = 0.057), abdomen circumference (IV = 0.056), trunk fat (IV = 0.055), fat mass (IV = 0.052), chest circumference (IV = 0.048), and BMI (IV = 0.048). Among males, waist circumference (IV = 0.053), chest circumference (IV = 0.052), trunk fat (IV = 0.051), BMI (IV = 0.052), abdomen circumference (IV = 0.049) and fat mass (IV = 0.048) had the highest predictive value for fatty liver. Among females, the most important predictors of steatosis were abdomen circumference (IV = 0.048), waist circumference (IV = 0.047), weight (IV = 0.046), trunk fat (IV = 0.045), fat mass (IV = 0.044), and BMI (IV = 0.043). Among males, waist circumference (IV = 0.051), chest circumference (IV = 0.050), abdomen circumference (IV = 0.049), trunk fat (IV = 0.048), BMI (IV = 0.048), and fat mass (IV = 0.046) had the highest predictive value for steatosis. Among females, the most important predictors of fibrosis were abdomen circumference (IV = 0.048), waist circumference (IV = 0.047), BMI (IV = 0.046), trunk fat (IV = 0.045), chest circumference (IV = 0.043), and muscle mass (IV = 0.043). Among males, abdomen circumference (IV = 0.045), waist circumference (IV = 0.043), weight (IV = 0.043), BMI (IV = 0.043), right arm fat (IV = 0.042) and fat mass (IV = 0.042) had the highest predictive value for fibrosis.

Discussion

This study applied ML techniques to determine the optimal body composition and anthropometric classifier of NAFLD and to identify feature contributions to the prediction of the disease. RF generated the most accurate ML model for predicting fatty liver presence, steatosis stage and fibrosis stage. To our knowledge, this is the first study to apply ML to body composition and anthropometric data to predict NAFLD. The high accuracy (82%) for the presence of fatty liver highlights the potential of ML techniques for the primary prevention and screening of NAFLD using anthropometric measurements.

Previous studies using ML techniques to predict fatty liver disease have mainly focused on biochemical measurements, with similar levels of accuracy (83.0%) using Bayesian Network38, (76.3%) Logistic Regression37, (86.4%) RF39 and (80%) Classification Tree techniques36. However, we tested the predictive value of body composition and anthropometric measurements rather than biochemical variables. Anthropometry as a lower-cost and more feasible approach can be considered a primary screening method for fatty liver disease.

Abdominal obesity is a significant risk factor for NAFLD27. Waist circumference and trunk fat have been shown to significantly predict the risk of NAFLD24. Although BMI is one of the risk factors for NAFLD31, it has been argued that BMI is limited compared to other anthropometric measures (e.g., waist circumference) in identifying lean individuals with NAFLD25. In a similar vein, the findings of the present study show the importance of these body composition and anthropometric measures and their relative contribution to the prediction of NAFLD.

Neck circumference reflects the amount of subcutaneous fat in the upper body and is a reliable factor in determining central obesity48. A positive correlation has been shown between neck circumference and hepatic steatosis26,28. Neck circumference has also shown a positive association with other anthropometric components, such as BMI, waist circumference and waist-to-hip ratio. In the present study, neck circumference contributed almost equally to the prediction of hepatic steatosis and fibrosis.

A study by Subramanian revealed that the arm fat index in both males and females had a negative association with the degree and severity of NAFLD29. In our study, a strong and positive relationship between arm circumference and the severity of steatosis and fibrosis was detected and was supported by the ML model. Rafiee et al. showed that the amount of fat in the hips and legs and hip circumference were negatively associated with fatty liver and the severity of the disease. In contrast, the waist-to-hip ratio was closely associated with fatty liver. They also showed that the accuracy of this ratio in predicting NAFLD was greater than that of BMI and waist-to-height ratio49.

Most ML studies for the prediction of NAFLD have used the ultrasonography technique to diagnose fatty liver disease36,37,38,39. Ultrasound is a commonly used method for the diagnosis of hepatic steatosis50. Ultrasonography is a safe, well-tolerated, non-invasive and low-cost technique50; however, there are limitations associated with ultrasound use, including limited capability in detecting fatty infiltration (less than 20% steatosis), operator dependency and subjective assessment51,52, and ML is expected to minimise some of these. Application of ML techniques on body composition and anthropometric measures as a less time-consuming and easy to undertake method can help physicians in their clinical decision making.

The presence of liver fibrosis in patients with NAFLD is considered the strongest predictor of long-term outcome53. The NAFLD Fibrosis Score (NFS) and Fibrosis-4 (FIB-4) have been recommended as appropriate methods for the initial assessment of fibrosis in NAFLD patients54. Both methods use a combination of variables including age, BMI and biochemical measures (i.e. aspartate aminotransferase (AST), alanine aminotransferase (ALT), platelets, etc.). Graupera et al. concluded that NFS and FIB-4 are not optimal for screening as they correlate poorly with liver stiffness55. In their study, waist circumference was found to be the ideal measure for fibrosis screening among high-risk people in the general population55. However, other studies found that NFS and FIB-4 have the potential to detect advanced fibrosis and the progression of fibrosis among people with NAFLD56. It seems that NFS and FIB-4 are more useful for diagnosing fibrosis in people with NAFLD than for fibrosis screening in the general population. The present study showed suboptimal accuracy (57%) in detecting fibrosis using less expensive and non-invasive factors, i.e. anthropometric and body composition measures. Further studies might explore a combination of anthropometric, body composition and biochemical variables.

The algorithm identified in this research can be used by health systems for several reasons. Screening for the presence or absence of NAFLD with non-invasive anthropometric measurements can be achieved with simple and cheap equipment57. Moreover, performing the measurements requires less specialist knowledge and can therefore be implemented in a range of health centres (e.g., primary practice) and in remote areas. Once validated, the resulting assistive technology can serve clinicians in the prevention of liver diseases.

There are limitations of the present research that need to be addressed. The small sample size might have limited the results of ML prediction. However, the small sample size was accounted for by multiple cross-validations, which reduced potential errors. Future studies with larger sample sizes can allocate separate validation sets and evaluate the model. Moreover, although ultrasound is the most common method for fatty liver diagnosis, it is not the gold standard. Using liver biopsy outcomes would generate more valid results. Also, to increase the predictive accuracy of the proposed model for NAFLD prediction, future studies should include other body composition and anthropometric measures such as sagittal abdominal diameter (SAD) and peri-renal fat58.

Conclusion

The present findings show that applying an ML classification model to anthropometric and body composition variables predicted the presence of fatty liver disease. ML-based decision support systems offer potential to assist physicians with screening, diagnosis and prevention of NAFLD. Such systems could be of particular value for providing services at a population level and in remote health care where there is a lack of trained specialists.