Introduction

Uric acid (UA) is the final product of purine metabolism and is mainly derived from nucleic acids and other purine compounds broken down during cell metabolism and from purines in food1. The UA level in the human body is normally balanced; however, hyperuricemia (HUA) occurs as a result of increased accumulation or reduced excretion of UA2. HUA is the root cause of gout and UA nephropathy and an independent risk factor for hypertension, diabetes mellitus, and coronary heart disease3. Overall, the prevalence of HUA tends to be higher in developed countries than in developing countries. However, a persistent increase in the prevalence of HUA has been noted in many developing countries4. With the rapid development of China's economy, urbanization, population aging, and changes in traditional eating habits and lifestyles, the prevalence of HUA in China increased from 11.1% (2015–2016) to 14.0% (2018–2019)5. According to data from the National Health and Nutrition Examination Survey (NHANES) 2007–2016, a nationally representative survey, the prevalence rates of HUA in the United States were 20.2% among men and 20.0% among women between 2015 and 2016. The prevalence of HUA in Ireland increased in both men (19.7–25.0%) and women (20.5–24.1%) from 2006 to 20147. HUA has therefore gradually developed into an important public health problem that imposes a heavy health and economic burden.

Geographic characteristics and genetic factors that predispose individuals to HUA differ between regions. The pooled prevalence of HUA was significantly higher in the southern (25.5%) and southwestern (21.2%) regions of China than in the remaining regions8. The prevalence of HUA also varies significantly across ethnicities in China9. Among the Han population, the prevalence was found to be 17.9%10, while it was as high as 24.5% among the Zhuang community11. In contrast, the Mongolian and Hui ethnicities had significantly lower HUA prevalence rates of 10.0% and 4.0%, respectively12. Interestingly, a recent survey conducted in Xinjiang highlighted that the Uighur community had one of the lowest HUA prevalence rates (4.6%), which was attributed to their low alcohol consumption5. Guangxi, located in southern China, has the largest population of ethnic minorities and a unique geographical environment, lifestyle, and national cultural characteristics. Gongcheng is a Yao autonomous county located in the southeast of Guilin, Guangxi, with a resident population of 360,000 that includes Yao, Han, and Zhuang people. Most Yao people live deep in the mountains and are relatively isolated, which makes them different from the general population in terms of social culture and living habits. Therefore, this study conducted a cross-sectional survey to investigate the prevalence and influencing factors of HUA in adults aged ≥ 30 years among the Yao people of Guangxi, to enrich the epidemiological evidence on HUA among residents of rural minority areas of China.

Materials and methods

Study population

The study was based on the Healthy Gongcheng Cohort Study. We aimed to understand the health status of people over 30 years old in Gongcheng to support its development as a national health promotion county. From December 2018 to December 2019, a cross-sectional survey was conducted in Lianhua Town and Limu Town. All survey respondents participated voluntarily, and 4357 residents aged ≥ 30 years underwent routine health screening. A few months later, we administered baseline and dietary assessment questionnaires and collected 3719 baseline and 3397 dietary assessment questionnaires. Residents aged < 30 years, those with self-reported disability, mental disease, malignant tumor, clinotherapy, or egresses, and those who had worked or lived away from home for more than half a year were excluded from further participation. After eliminating the questionnaires with missing information, incorrect information, or obvious errors, a total of 2128 participants were included in this study. The process is shown in Fig. 1. Participants' personal information was de-identified and stored in protected files and locked cabinets. This cross-sectional study protocol was approved by the Ethics Committee of the Guilin Medical University (No. 20180702-3) and was in accordance with the Declaration of Helsinki (1964) and subsequent amendments. All participants signed an informed consent form.

Figure 1

Flow chart of sample selection criteria: cross-sectional study. DSQ dietary assessment questionnaires, BQ baseline questionnaire.

Data collection

Volunteers, all of whom were medical students, were recruited and trained for a week to develop the skills needed for the survey13. Physical examination items included height, weight, chest circumference, waist circumference, blood pressure, oxygen saturation, electrocardiogram, abdominal B-ultrasound, chest anteroposterior film, bone mineral density, routine blood tests, 13 blood biochemistry items, routine urinalysis, and other examinations. The biochemical indexes were measured at the People's Hospital of Gongcheng Yao Autonomous County. Parallel samples were set for all routine physical examination indexes at testing time. The baseline questionnaire collected the demographic information of the participants, including gender, age, educational level, occupation, smoking, alcohol consumption, medical history, physical activity, sleep status, etc. The semi-quantitative food frequency questionnaire was used to collect information on nutrient intake, Camellia oleifera consumption, frequency of green tea drinking, daily oil intake, daily salt intake, etc.

Definition

HUA was defined as a serum UA level > 420 μmol/L in men and > 360 μmol/L in women14. Diabetes was defined as fasting plasma glucose (FPG) ≥ 126 mg/dL (7.0 mmol/L), glycosylated hemoglobin (HbA1c) ≥ 48 mmol/mol (6.5%), or a history of diabetes15. Fatty liver disease (FLD) was defined according to the participant's self-reported medical history or abdominal B-ultrasound findings. The diagnostic criteria for dyslipidemia included serum total cholesterol (TC) ≥ 6.2 mmol/L, low-density lipoprotein (LDL) ≥ 4.1 mmol/L, high-density lipoprotein (HDL) < 1.0 mmol/L, and triglyceride (TG) ≥ 2.3 mmol/L16. Anemia was defined as hemoglobin less than 130 g/L for men and 120 g/L for women17. Hypoxia was defined as ambient air blood oxygen saturation < 90% by pulse oximetry at diagnosis18. Bone mass was defined according to World Health Organization criteria: normal if the T-score was > − 1.0; osteopenia if it was between − 1.0 and − 2.5; and osteoporosis if it was < − 2.519. Body mass index (BMI) was calculated as weight (kg) divided by the square of height (m²). Participants were classified as emaciation (BMI < 18.5 kg/m²), normal weight (BMI 18.5–24.9 kg/m²), overweight (BMI 25–29.9 kg/m²), and obese (BMI ≥ 30 kg/m²)20. Abdominal obesity was defined by waist circumference (WC) > 102 cm in men and WC > 88 cm in women21. Hypertension was defined as blood pressure ≥ 140/90 mm Hg or a history of treatment with antihypertensive medication22.
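As an illustration only, the cut-offs above can be written as small classification functions. The following Python sketch is not the study's own code; the function names and the `"male"`/`"female"` encoding are assumptions, and BMI values falling between the stated bands (for example 24.95 kg/m²) are assigned to the lower band by assumption.

```python
def has_hua(serum_ua_umol_l: float, sex: str) -> bool:
    """HUA: serum UA > 420 umol/L in men, > 360 umol/L in women."""
    threshold = 420.0 if sex == "male" else 360.0
    return serum_ua_umol_l > threshold


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def somatotype(bmi_value: float) -> str:
    """Map a BMI value to the four somatotype groups used in the text."""
    if bmi_value < 18.5:
        return "emaciation"
    if bmi_value < 25.0:  # assumption: gap between 24.9 and 25 counts as normal
        return "normal weight"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


def has_abdominal_obesity(waist_cm: float, sex: str) -> bool:
    """Abdominal obesity: WC > 102 cm in men, > 88 cm in women."""
    threshold = 102.0 if sex == "male" else 88.0
    return waist_cm > threshold
```

For example, a hypothetical participant weighing 70 kg at 1.75 m has a BMI of about 22.9 kg/m² and falls in the normal weight group.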

Bayesian network model

The Bayesian network model is a probabilistic graphical model that combines probability theory and graph theory to reveal the probabilistic dependence relationships between variables (nodes), represented by a directed acyclic graph (DAG)23. In the DAG, the nodes represent the random variables \(U = \left\{ {X_{1} ,...,X_{n} } \right\}\), and the directed edges represent the direct probabilistic dependencies between the corresponding variables \(X_{i}\) and \(X_{j}\). If there is an arc from \(X_{i}\) to \(X_{j}\), then we can informally say that \(X_{i}\) causes \(X_{j}\), so \(X_{i}\) and \(X_{j}\) are often referred to as parent and child, respectively. A conditional probability table is used to quantitatively describe the strength of the relationship between a random variable and its parents. A Bayesian network is simply a representation of the joint probability distribution of a set of random variables \(X = \left\{ {X_{1} ,...,X_{n} } \right\}\), so the probability expression can be obtained:

$$P(X_{1} ,\ldots ,X_{n} ) = \prod\limits_{i = 1}^{n} {P(X_{i} \mid X_{1} ,X_{2} ,\ldots ,X_{i - 1} )} = \prod\limits_{i = 1}^{n} {P(X_{i} \mid \pi (X_{i} ))}$$

where \(\pi (X_{i} )\) represents the set of parent nodes of node \(X_{i}\), \(\pi (X_{i} ) \subseteq \left\{ {X_{1} ,...,X_{i - 1} } \right\}\)24. A conditional probability table (CPT) can describe the association strength between variables by constructing a DAG to reveal the potential relationship between influencing factors24. This can intuitively clarify the complex internal regulation relationship between diseases and their related factors to make up for some shortcomings of the logistic regression analysis model25.
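The factorization can be made concrete with a toy network. The Python sketch below is a minimal illustration, assuming three binary nodes (drinking and dyslipidemia as parents of HUA) and entirely hypothetical probabilities; none of the numbers are taken from the study's tables.

```python
from itertools import product

# Parent sets pi(X_i) for a toy DAG: drinking -> hua <- dyslipidemia.
PARENTS = {
    "drinking": (),
    "dyslipidemia": (),
    "hua": ("drinking", "dyslipidemia"),
}

# Hypothetical CPTs: P(node = 1 | parent states), keyed by parent values.
CPT = {
    "drinking": {(): 0.30},
    "dyslipidemia": {(): 0.40},
    "hua": {(0, 0): 0.05, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.30},
}


def joint_probability(assignment: dict) -> float:
    """P(X_1, ..., X_n) as the product of P(X_i | pi(X_i)) over all nodes."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_one = CPT[node][tuple(assignment[p] for p in parents)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


# Summing the joint over every configuration must give 1.
nodes = list(PARENTS)
total = sum(
    joint_probability(dict(zip(nodes, values)))
    for values in product((0, 1), repeat=len(nodes))
)
```

Under these assumed numbers, the probability that all three nodes are in state 1 is 0.30 × 0.40 × 0.30 = 0.036, which is exactly one term of the product formula.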

Bayesian network learning algorithms

Learning a Bayesian network includes structure learning and parameter learning26. Structure learning is the process of constructing and determining the most suitable topological structure of the Bayesian network from data, and its emphasis is on revealing the complex network relations among variables. Parameter learning determines the parameters of the model, that is, the conditional probabilities and transition probabilities of the variables in the network, when the structure of the model is known. In this paper, the PC algorithm in GeNIe 4.0 software is used for structure learning, and the EM (expectation–maximization) algorithm is used for parameter learning. The PC structure learning algorithm is one of the earliest and most popular algorithms; it uses independences observed in data (established by means of classical independence tests) to infer the structure that has generated them.
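To give a sense of the "classical independence tests" that the PC algorithm relies on, the sketch below implements a Pearson chi-square test of independence for a 2 × 2 contingency table in plain Python. It is an illustration of the general idea only, not GeNIe's implementation; the function name and the example counts are hypothetical. In a PC-style procedure, an edge between two variables would be removed when such a test fails to reject independence.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). With 1 degree of freedom the p-value
    equals erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = drinker / non-drinker, columns = HUA / non-HUA.
stat, p = chi_square_2x2([[90, 10], [10, 90]])
keep_edge = p < 0.05  # dependence detected, so the edge would be kept
```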

Statistical analysis

All data were collated using Microsoft Excel 2021. The normality of continuous variables was tested using the Kolmogorov–Smirnov test. Continuous variables that were not normally distributed are presented as median and interquartile range (IQR). Categorical variables are described as percentages. In the univariate analysis, categorical variables were compared between the HUA and non-HUA groups using the Chi-square test, and creatinine (CREA) was compared using the Mann–Whitney U test. Variables that were statistically significant in the univariate analysis were entered into binary multivariate logistic regression. These analyses were performed using SPSS 28.0 software (IBM, Chicago, IL, USA), and 2-tailed P values < 0.05 were considered significant. Fig. 1 was drawn using QuickDraw software, and the forest plot shown in Fig. 2 was produced using GraphPad Prism 9.3.0 software. The PC algorithm and EM algorithm of GeNIe 4.0 Academic software were used to learn the structure and parameters of the Bayesian network model, respectively. Bayesian network model graphs and CPTs were constructed using GeNIe 4.0 Academic software, as shown in Fig. 3.

Figure 2

Bayesian network model of hyperuricemia based on EM algorithm and its prior probability. FLD fatty liver disease, CREA creatinine.

Figure 3

Prediction differentiation and accuracy of Bayesian network model in HUA. (A) The ROC curve for HUA. (B) The Calibration curve for HUA. TPR true positive rate, FPR false positive rate, H–L Hosmer–Lemeshow.

Institutional review board statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Guilin Medical University (No. 20180702-3).

Informed consent statement

Informed consent was obtained from all participants involved in the study.

Results

Basic demographic characteristics

A total of 2128 participants were included in this cross-sectional study, of whom 826 (38.8%) were men and 1302 (61.2%) were women. Participants ranged in age from 30 to 93 years, with a mean age of 57.7 ± 12.0 years. The overall prevalence of HUA in this population was 15.6%, with 23.2% in men and 10.7% in women (Table 1).

Table 1 Univariate analysis of the prevalence of hyperuricemia in the cross-sectional study.

Univariate analysis

The univariate analysis of the questionnaire data and physical examination data on the prevalence of HUA is shown in Table 1. The prevalence of HUA in men (826, 23.2%) was significantly higher than that in women (1302, 10.7%). Similarly, educational level (≥ middle school), smoking, drinking, hypoxia, abdominal obesity, and weekly working days (≤ 5 days) were risk factors for HUA. In addition, both oil tea and bone mass (osteopenia, osteoporosis) were protective factors against HUA. The non-working group and the obese group showed the highest prevalence among the four groups of physical activity level at work and among the somatotype groups, respectively (P < 0.05). There were no significant differences between the HUA group and the non-HUA group in age, ethnicity, occupation, household income, operation, medical insurance, daily oil/salt intake, green tea, walking time, or working time. Moreover, there were significant differences between the HUA group and the non-HUA group in other metabolic diseases (diabetes, FLD, dyslipidemia) and CREA level (P < 0.05). The occurrence of HUA was not related to hypertension or anemia (Table 1).

Multivariate logistic regression analysis

To simplify the structure of the later Bayesian network model, 15 variables with P < 0.05 in univariate analysis were included in the binary logistic multivariate regression analysis model. The relevant factors affecting the occurrence of HUA were screened with αin = 0.05 and αout = 0.10, as shown in Table 2. The logistic regression analysis results showed that alcohol consumption, physical activity level at work, FLD, dyslipidemia, bone mass, abdominal obesity, somatotype, and CREA finally entered the model. Drinking, FLD, dyslipidemia, abdominal obesity, somatotype (overweight, obesity), and CREA were risk factors for HUA. In addition, physical activity level at work and bone mass (osteopenia) were protective factors against HUA.

Table 2 Multivariate logistic analysis of the prevalence of hyperuricemia in the cross-sectional study.

Drinking was associated with 57.9% higher odds of HUA (OR = 1.579, 95% CI 1.177–2.118, P = 0.002). In contrast, the odds of HUA were 28.4% (OR = 0.716, 95% CI 0.517–0.991, P = 0.044) and 74.4% (OR = 0.256, 95% CI 0.088–0.745, P = 0.012) lower in participants with general and heavy physical activity levels at work, respectively. Moreover, the odds of HUA in participants with FLD and dyslipidemia were 1.212 times (OR = 2.212, 95% CI 1.416–3.456, P < 0.001) and 1.536 times (OR = 2.536, 95% CI 1.942–3.312, P < 0.001) higher than in those without FLD and with normal lipid levels, respectively. Interestingly, the odds of HUA were 36.0% (OR = 0.640, 95% CI 0.412–0.994, P = 0.047) lower in the osteopenia group than in the control group. In addition, compared with the normal weight group, the odds of HUA increased by 60.7% (OR = 1.607, 95% CI 1.165–2.215, P = 0.004) and 1.409 times (OR = 2.409, 95% CI 1.314–3.543, P = 0.022) in the overweight and obese groups, respectively. Participants were divided into groups Q1–Q4 according to the quartiles of CREA. Compared with the lowest quartile of CREA, groups Q2, Q3, and Q4 were significantly associated with a higher rate of HUA (OR = 2.157, 95% CI 1.314–3.543; OR = 3.615, 95% CI 2.202–5.937; and OR = 12.259, 95% CI 7.325–20.516, respectively; all P values < 0.05) (Table 2).
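The percentages quoted alongside each odds ratio follow from the relation (OR − 1) × 100%. The short Python sketch below reproduces that arithmetic for some of the reported values; the helper name is an assumption made for illustration.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds relative to the reference group.

    Positive values mean higher odds, negative values mean lower odds.
    """
    return (odds_ratio - 1.0) * 100.0


# Odds ratios as reported in the text above.
reported = {
    "drinking": 1.579,
    "general physical activity at work": 0.716,
    "heavy physical activity at work": 0.256,
    "FLD": 2.212,
    "osteopenia": 0.640,
}
changes = {name: round(percent_change_in_odds(value), 1)
           for name, value in reported.items()}
```

An OR of 2.212 therefore corresponds to odds that are 121.2% higher, which is the same statement as "1.212 times higher".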

Bayesian network model of HUA

Based on the eight variables screened by the logistic regression analysis model, the Bayesian network model of factors related to HUA was further constructed using the EM algorithm in GeNIe 4.0 software. As shown in Fig. 2, a HUA Bayesian network model containing nine nodes and 14 directed edges was constructed. The results showed that drinking, dyslipidemia, somatotype, and CREA were directly related to HUA: drinking, dyslipidemia, and somatotype were the parent nodes of HUA, and CREA was the child node of HUA. Bone mass and FLD were indirectly related to HUA through somatotype, suggesting that somatotype was the intermediate variable through which bone mass and FLD affected the occurrence of HUA, as shown in Fig. 2.

The calibration curve of predicted incidence against observed proportions closely followed the line y = x. Additionally, the Hosmer–Lemeshow goodness-of-fit test yielded a P value of 0.633, greater than the significance level of 0.05. These results indicated that the Bayesian network model was well calibrated, as depicted in Fig. 3B. Moreover, the area under the curve (AUC) of the receiver operating characteristic (ROC) was 0.812, which was greater than the cutoff value of 0.750. This suggested that the variables in the Bayesian network model had good discriminatory ability, as illustrated in Fig. 3A.
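For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen HUA case receives a higher predicted risk than a randomly chosen non-case. The Python sketch below computes it by direct pairwise comparison; it is an illustration with made-up labels and scores, not the procedure or data used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, non-case) pairs in which the case has
    the higher score; ties count as half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks for four participants (1 = HUA, 0 = non-HUA).
example_auc = roc_auc([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80])
```

The pairwise form is quadratic in the sample size, which is acceptable for a sketch; rank-based formulas give the same value more efficiently.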

Risk reasoning of HUA

The Bayesian network model can infer the probability of unknown nodes from the state of known nodes to determine the risk of HUA. The risk of HUA was the lowest (0.036) in participants with dyslipidemia and emaciation who did not drink. The risk of HUA was the highest (0.773) when dyslipidemia, obesity, and drinking were all present. The risk of developing HUA increased significantly as somatotype changed from emaciation to obesity (Table 3).
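The kind of reasoning described here, fixing the states of known nodes and reading off the probability of HUA, can be sketched by brute-force enumeration over a toy network. The code below is a minimal illustration under assumed numbers: two binary parents (drinking, dyslipidemia) and hypothetical probabilities that are not the values of Table 3.

```python
from itertools import product

# Hypothetical priors and CPT for a toy network: drink -> hua <- dyslip.
P_DRINK = 0.30
P_DYSLIP = 0.40
P_HUA_GIVEN = {(0, 0): 0.05, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.30}


def joint(drink: int, dyslip: int, hua: int) -> float:
    """Joint probability of one full configuration of the toy network."""
    prob = P_DRINK if drink else 1.0 - P_DRINK
    prob *= P_DYSLIP if dyslip else 1.0 - P_DYSLIP
    p_hua = P_HUA_GIVEN[(drink, dyslip)]
    return prob * (p_hua if hua else 1.0 - p_hua)


def hua_risk(evidence: dict) -> float:
    """P(HUA = 1 | evidence), where evidence maps 'drink'/'dyslip' to 0 or 1.

    Unobserved parents are marginalized out by summing over their states.
    """
    numerator = denominator = 0.0
    for drink, dyslip, hua in product((0, 1), repeat=3):
        state = {"drink": drink, "dyslip": dyslip}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # configuration contradicts the evidence
        prob = joint(drink, dyslip, hua)
        denominator += prob
        if hua:
            numerator += prob
    return numerator / denominator
```

With these assumed numbers, `hua_risk({"drink": 1, "dyslip": 1})` simply returns the corresponding CPT entry, while `hua_risk({"drink": 1})` averages over the unobserved dyslipidemia state.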

Table 3 Conditional probability table of hyperuricemia nodes in Gongcheng, Guangxi.

Discussion

HUA has become a common metabolic disease, which is affected by economic development, environment, diet, race, heredity, and other factors27. The prevalence of HUA in China increased from 8.48% to 14.0%5 during 2001–2019. In this cross-sectional study of adults from Gongcheng Yao Autonomous County, the prevalence rate of HUA was 15.6% (23.2% for men, 10.7% for women), corresponding to an estimated 38.3 thousand adults with HUA, which was higher than that reported in other neighboring Asian countries such as Japan (13.4%)28 and Korea (11.4%)29. Moreover, the HUA prevalence in Gongcheng men (23.2%) was significantly higher than that found in some developed countries such as the USA6 (20.2%) and Australia30 (16.6%). The HUA prevalence in China is similar to that in developed countries. We hypothesized that the increased prevalence rates of HUA in Gongcheng may be related to China's rapid economic development and westernization of dietary habits in recent years31.

HUA has been observed to be more prevalent in individuals of Zhuang ethnicity than in those of Han descent, with a reported prevalence of 24.5% in Guangxi in 2018–201911. Furthermore, recent research has identified Zhuang descent as a risk factor for HUA5. However, in our study, the prevalence of HUA in Yao individuals was found to be lower (11.1%) than that in Han and Zhuang individuals. Interestingly, even though the diet of Tibetan people is usually rich in meat, fat, and alcohol, their prevalence of HUA (2.05%) was still lower than that of Han people (17.9%)32. Additionally, although the diet of Mongolian residents of Inner Mongolia primarily comprises meat and dairy products, the prevalence of HUA (10.0%) was also found to be lower in Mongolian than in Han people12. Major et al.33 reported that genetic variants may play a greater role in hyperuricaemia in the general population than dietary exposure, which could explain the varying prevalence found in these studies. However, further research is needed to confirm these findings.

Previous studies reported that the prevalence of HUA in men was higher than that in women34,35. In the current study, our results demonstrated sex differences in the prevalence of HUA, which was markedly higher in men (23.2%) than in women (10.7%), and this difference was significant in the univariate analysis. Such sex differences may be related to the higher estrogen level in women, which benefits UA excretion. In comparison, the higher androgen level in men promotes renal reabsorption of UA and inhibits UA excretion30,36; the lifestyle of men, which more often includes drinking and a high-fat, high-purine diet, may also contribute. However, the effect of sex was not significant after multivariate analysis, which may reflect the small contribution of sex relative to the other factors affecting HUA. This study further demonstrated that FLD, dyslipidemia, drinking, abdominal obesity, somatotype (overweight, obese), and CREA levels were all risk factors for HUA, which was consistent with studies of other regions in China and of other ethnic groups37,38,39,40. Physical activity level at work and osteopenia were protective factors against HUA5,41. We speculated that the plasma volume increases with the increase in glomerular filtration rate and extracellular fluid volume during long-term moderate exercise, and that the improvement in renal plasma flow would promote the secretion and excretion of UA42. We recognize that the potential influencing factors analyzed in this study were limited; the investigation of more factors is warranted in future studies.

In addition, we found that the Bayesian network model diagram further demonstrated the complex network connections among the various influencing factors of HUA, among which dyslipidemia, somatotype, and drinking were directly related to HUA. Risk inference by the Bayesian network model showed that the risk of HUA was the highest in people with dyslipidemia and obesity who drank. This might be because the lipotoxicity in dyslipidemia affects the function of islet β cells, increases the levels of free fatty acids, and promotes β-oxidation of free fatty acids; this enhances the activity of NADPH, promotes the synthesis of UA, and causes HUA39. In obesity, the possible causes are visceral fat accumulation36, endocrine system disorder, decreased androgen and ACTH levels, and inhibition of UA excretion, which might lead to HUA complications43. In those who consume alcohol, the synthesis and metabolism of lactic acid are accelerated during alcohol metabolism, and lactic acid competitively inhibits UA secretion from renal tubules, activates the ion exchange function of the human urate anion exchanger, inhibits the UA excretory function of the kidney, and stimulates UA reabsorption in proximal tubules44,45. In addition, people often consume purine-rich foods while drinking, which would further increase UA content and cause HUA42. At the same time, the CREA level in the HUA group was significantly higher than that in the non-HUA group. Relevant studies46,47 showed that the serum CREA level was the most commonly used renal function indicator, which might lead to chronic kidney disease (CKD). HUA can indirectly affect renal function by affecting the serum CREA level. It is worth noting that emaciated people with normal blood lipid levels had a lower risk of developing HUA. This may be because being emaciated and dyslipidemic may jointly contribute to a higher risk of developing HUA; however, further studies are needed in this regard.

The results also suggested that bone mass and FLD were associated with somatotype and that somatotype was directly associated with HUA. We assume this might be because patients with osteopenia and FLD are mostly overweight or obese, which promotes insulin resistance, reduces renal UA excretion, and leads to HUA48.

Our study had several strengths. First, to our knowledge, this study is the first to investigate the prevalence of HUA in a large sample of Yao individuals, thereby providing valuable insights into the prevalence of this condition across different ethnic groups in China. Second, compared with previous studies on HUA, which used logistic and Cox regression models to describe several independent factors of HUA, the Bayesian network model could reveal how the factors were related to each other and affected the occurrence of HUA through the form of a probabilistic graphical model. This helped discover the potential influencing factors of the disease and provided new clues for further research. Third, this study was based on a survey of the natural population in ethnic minority areas. In addition to the physical examination data, a large amount of detailed questionnaire data was combined. The survey results had important practical significance for determining the prevalence of HUA in ethnic minority areas and specifying corresponding prevention and control strategies. However, some limitations of this study should be noted. First, most of the participants in this study were located in Gongcheng Yao Autonomous County, which might not be sufficient to represent the overall prevalence of HUA in Guangxi ethnic minorities. Second, as a cross-sectional study, the causal relationship between HUA and risk factors could not be determined, and further prospective studies are needed to demonstrate this.

Conclusions

In conclusion, this study has shown that the prevalence rates of HUA among adults in Gongcheng Yao Autonomous County during 2018–2019 were much higher than those reported in previous studies of the Chinese population and even higher than those found in some developed countries. The Bayesian network model can reveal the complex network relationships among variables that logistic regression cannot display and can more intuitively show the network relationships between diseases and related factors. This further suggests that the prevalence of HUA is influenced by several factors, including somatotype, drinking, and other complicating metabolic diseases (such as FLD, dyslipidemia, and CKD). Interestingly, bone mass and physical activity level at work were independent protective factors against HUA. Thus, we suggest carrying out health education for the population, guiding the formation of a healthy lifestyle, and improving blood UA levels through a good diet, alcohol restriction, adherence to moderate exercise, and maintenance of a healthy somatotype to reduce the prevalence of HUA in the future.