Unraveling atherosclerotic cardiovascular disease risk factors through conditional probability analysis with Bayesian networks: insights from the AZAR cohort study

This study aimed to model the underlying predictors of ASCVD using Bayesian networks (BNs). Data for the AZAR Cohort Study, which evaluated 500 healthcare providers in Iran, were collected through examinations and blood samples. Two BNs were used to explore a suitable causal model for analysing the underlying predictors of ASCVD: a Bayesian search BN learned through an algorithmic approach, and a knowledge-based BN. Results showed significant differences in ASCVD risk factors across the levels of the background variables. The diagnostic indices showed better performance for the knowledge-based BN (area under the ROC curve (AUC) = 0.78, accuracy = 76.6, sensitivity = 62.5, negative predictive value (NPV) = 96.0, negative likelihood ratio (LR−) = 0.48) compared to the Bayesian search BN (AUC = 0.76, accuracy = 72.4, sensitivity = 17.5, NPV = 93.2, LR− = 0.83). In addition, we chose the knowledge-based BN because of the interpretability of its relationships. Based on this BN, being male (conditional probability = 63.7), age over 45 (36.3), overweight (51.5), Mets (23.8), diabetes (8.3), smoking (10.6), hypertension (12.1), high T-C (28.5), high LDL-C (23.9), FBS (12.1), and TG (25.9) levels were associated with higher ASCVD risk. Low and normal HDL-C levels also had higher ASCVD risk (35.3 and 37.4), while high HDL-C levels had lower risk (27.3). In conclusion, the BN demonstrated that ASCVD was significantly associated with certain risk factors, including being an older, overweight male; being a smoker; having a history of Mets, diabetes, or hypertension; having high levels of T-C, LDL-C, FBS, and TG; and having low or normal HDL-C. The study may provide valuable insights for developing effective prevention strategies for ASCVD in Iran.


Bayesian networks
The BN models were developed and evaluated using a two-stage process: (1) structural learning to determine the topology of the BN, or DAG, and (2) parametric learning, or estimation of the CPs among the nodes once the network topology was established. In our study, the BN provided insight into how a group of ASCVD risk factors can influence the probability of ASCVD occurring, independent of sample size 53. Our BNs are graphs with arcs linking nodes and no directed cycles, where the ASCVD risk factors and outcome variables are represented as nodes, and conditional dependencies between them are represented by directed edges or arrows 52. Each node is associated with a CP table, which specifies the CP of each of its values for each combination of its parents' values 53,54. Our procedure in BN modelling was to learn a BN structure by combining algorithmic search and expert insight with empirical evidence obtained through a systematic literature review. This approach is intended to ensure that the chosen variables are relevant and capture the relationships and dependencies within the modelled system. It aligns with established research, as exemplified by Ordovas et al. 32 and Huang et al. 55, who advocate incorporating prior expert knowledge and comprehensive literature reviews in BN variable selection 32,55. In other words, we decided to train two BNs: a Bayesian search BN learned through an algorithmic approach, and a knowledge-based BN.
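The parametric learning stage described above can be illustrated with a minimal sketch. Assuming fully observed categorical records, a node's CP table can be estimated by counting how often each value of the node occurs under each combination of its parents' values. The function name `estimate_cpt`, the variable names, and the toy records are hypothetical and are not taken from the AZAR data; GeNIe and Netica may use different estimators (for example, with priors or smoothing).

```python
from collections import Counter, defaultdict


def estimate_cpt(records, child, parents):
    """Maximum-likelihood CP table P(child | parents) from complete records."""
    counts = defaultdict(Counter)
    for row in records:
        parent_values = tuple(row[p] for p in parents)
        counts[parent_values][row[child]] += 1
    cpt = {}
    for parent_values, child_counts in counts.items():
        total = sum(child_counts.values())
        cpt[parent_values] = {v: n / total for v, n in child_counts.items()}
    return cpt


# Toy records, made up for illustration only (not study data).
records = [
    {"smoking": "yes", "hypertension": "yes", "ascvd": "yes"},
    {"smoking": "yes", "hypertension": "yes", "ascvd": "no"},
    {"smoking": "yes", "hypertension": "no", "ascvd": "no"},
    {"smoking": "no", "hypertension": "yes", "ascvd": "no"},
    {"smoking": "no", "hypertension": "no", "ascvd": "no"},
    {"smoking": "no", "hypertension": "no", "ascvd": "no"},
]

cpt = estimate_cpt(records, child="ascvd", parents=["smoking", "hypertension"])
for parent_values, distribution in sorted(cpt.items()):
    print(parent_values, distribution)
```

Each key of `cpt` is one combination of parent values, and each entry is the estimated distribution of the child node for that combination, which is the content of one row of a CP table.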

Structures from the literature
We aimed to identify the most suitable causal model for analyzing variables related to ASCVD, one that models the underlying risk factors of ASCVD, including age, sex, diabetes, smoking status, hypertension, BMI, FBS, T-C, LDL-C, HDL-C, and TG. To achieve this, we used BNs and selected the probabilistic models for our purposes by combining algorithmic search and knowledge-based models 32.
The BN structure in Fig. 1 illustrates the interconnections between these variables and their impact on ASCVD risk. The structure obtained from the algorithmic search network, depicted in Fig. 1A, reveals the factors that influence ASCVD. Notably, age, smoking, and diabetes have a significant impact on ASCVD probability: Fig. 1A includes direct links between these predictors and ASCVD. Additionally, in the chain Age → Hypertension → FBS, hypertension is not conditioned on, so FBS and age are d-connected.
The knowledge-based network structure depicted in Fig. 1B provides a concise overview of the factors influencing ASCVD, obtained via a systematic search of the literature. The network highlights the connections between modifiable and non-modifiable risk factors in predicting various ASCVD conditions. Notably, T-C, TG, HDL-C, and FBS are indirectly associated with ASCVD through other risk factors 14,29,56. Sex is also indirectly associated with ASCVD, through diabetes (Sex → Diabetes → ASCVD). In addition, the path between TG and ASCVD (TG → HDL-C → ASCVD) is blocked after conditioning on HDL-C; that is, TG is conditionally independent of ASCVD given HDL-C, an example of d-separation in this model. On the other hand, ASCVD is directly associated with smoking, hypertension, diabetes, BMI, HDL-C, LDL-C, cholesterol, and FBS 57,58. Ultimately, our goal is to provide a precise and easily understandable prediction of ASCVD risk by analyzing the relationships between variables 29,62.
In our models we assumed d-separation, and each model consists of a set of independent predictors that lead to the outcome variable. D-separation is a criterion used in BNs to determine whether two sets of variables are independent of each other given a third set of variables; conditional independence between variables can be inferred directly from the graph using the d-separation criterion 59–61.
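The d-separation criterion can be checked mechanically. The following is a minimal sketch using one standard test (restrict the graph to the ancestors of the variables involved, moralize it, delete the conditioning set, and look for a remaining path). The edge list is a hypothetical fragment that mirrors only the chains and direct links named in the text; it is not the full structure of Fig. 1, and the function name `d_separated` is our own.

```python
def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated given `given` in the DAG `edges`."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())

    # 1. Keep only x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralize: link each node to its parents and link co-parents.
    neighbours = {node: set() for node in keep}
    for child in keep:
        pars = sorted(parents.get(child, ()))
        for i, p in enumerate(pars):
            neighbours[child].add(p)
            neighbours[p].add(child)
            for q in pars[i + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)

    # 3. Remove conditioned nodes and search for any remaining path.
    blocked = set(given)
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in blocked:
            continue
        seen.add(node)
        stack.extend(neighbours[node] - blocked)
    return True


# Hypothetical fragment: two chains and one extra direct link to ASCVD.
edges = [
    ("TG", "HDL-C"), ("HDL-C", "ASCVD"),
    ("Age", "Hypertension"), ("Hypertension", "FBS"),
    ("Smoking", "ASCVD"),
]

print(d_separated(edges, "TG", "ASCVD"))                    # chain, unconditioned
print(d_separated(edges, "TG", "ASCVD", given=["HDL-C"]))   # chain, conditioned
print(d_separated(edges, "Age", "FBS"))                     # chain, unconditioned
print(d_separated(edges, "HDL-C", "Smoking", given=["ASCVD"]))  # collider
```

In this fragment, a chain is open until its middle node is conditioned on, whereas two parents of a common child (here HDL-C and Smoking) are separated until that child is conditioned on.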

Statistical analysis
The Bayesian search BN was built using GeNIe Academic Version 4.1.3402.0 (Built on 2023-10-03; License ID: 6c8hwje30dfnjbukdej30zg76), the knowledge-based BN was built using Netica 6.05 (Norsys software corp, USA) 63, and the BNs were drawn using Netica. Categorical data were presented as count (percentage), and P-values were computed using Fisher's exact test. We compared our two structures using Akaike Information Criteria (AIC) and Bayesian Information Criteria (BIC) values. Smaller AIC and BIC values indicate a better structure. Furthermore, diagnostic indices including sensitivity, accuracy, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR−), and particularly area under the ROC curve (AUC) were calculated for comparing the BNs. We used the leave-one-out cross-validation procedure, as a standard method, to compute the AUC, accuracy, and diagnostic indices. Finally, based on the best-suited BN structure, the CPs of ASCVD and non-ASCVD were calculated in the datasets.
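The diagnostic indices listed above follow from the four cells of a 2 × 2 confusion matrix using their standard definitions. The sketch below shows those definitions; the function name `diagnostic_indices` and the counts are hypothetical and chosen only to make the arithmetic easy to follow, not taken from the study. In the study these indices were computed from leave-one-out cross-validated predictions, which this sketch does not reproduce.

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Standard diagnostic indices from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Made-up counts for illustration only.
indices = diagnostic_indices(tp=30, fp=40, fn=10, tn=120)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Because LR+ and LR− depend only on sensitivity and specificity, they are not affected by disease prevalence, whereas PPV and NPV are.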

Consent to participate
All participants filled out and signed the informed consent and assent. The participants' privacy was preserved. All methods were carried out according to relevant guidelines and regulations.

Results
Out  3), and LR+ (2.8 vs 16.1) (Fig. 3). Overall, we chose the knowledge-based BN as the better-performing model because of its better diagnostic indices and lower AIC and BIC values compared to the Bayesian search BN. Additionally, as the arrows of the knowledge-based BN were obtained from a systematic search of the literature, clinical interpretability of the relationships in the knowledge-based BN is guaranteed (Table 3).

Conditional probabilities of the Bayesian network
CPs of ASCVD based on the knowledge-based BN structure are shown in Table 4. The CP of developing ASCVD for smokers, given hypertension, was 18.4%. The CP of ASCVD occurrence varied across BMI levels and, conditioned on diabetes, was 10.3%. The CP of ASCVD occurrence for age, conditional on hypertension, was 11.7%. Notably, the CP of ASCVD for FBS, given hypertension, was 13.1%. CPs of the other variables based on the knowledge-based BN structure are shown in Table S1.
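A CP of ASCVD given evidence on some variables is obtained by summing the joint distribution over the unobserved variables and normalizing. The sketch below does this by brute-force enumeration for a hypothetical three-node network (Diabetes → ASCVD ← Hypertension). All probabilities and names are invented for illustration and do not correspond to Table 4; tools such as Netica use more efficient inference algorithms than enumeration.

```python
from itertools import product

# Hypothetical network: Diabetes -> ASCVD <- Hypertension (numbers made up).
p_diabetes = {"yes": 0.1, "no": 0.9}
p_hypertension = {"yes": 0.2, "no": 0.8}
p_ascvd_yes = {  # P(ASCVD = yes | diabetes, hypertension)
    ("yes", "yes"): 0.30,
    ("yes", "no"): 0.20,
    ("no", "yes"): 0.10,
    ("no", "no"): 0.05,
}


def joint(diabetes, hypertension, ascvd):
    """Joint probability as the product of each node's CP given its parents."""
    p_yes = p_ascvd_yes[(diabetes, hypertension)]
    p_a = p_yes if ascvd == "yes" else 1 - p_yes
    return p_diabetes[diabetes] * p_hypertension[hypertension] * p_a


def prob_ascvd(evidence):
    """P(ASCVD = yes | evidence), summing the joint over unobserved nodes."""
    numerator = denominator = 0.0
    for d, h, a in product(("yes", "no"), repeat=3):
        state = {"diabetes": d, "hypertension": h, "ascvd": a}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # state contradicts the evidence
        p = joint(d, h, a)
        denominator += p
        if a == "yes":
            numerator += p
    return numerator / denominator


print(prob_ascvd({}))
print(prob_ascvd({"hypertension": "yes"}))
print(prob_ascvd({"diabetes": "yes", "hypertension": "yes"}))
```

With both parents instantiated, the result is simply the matching CP table entry; with only hypertension observed, diabetes is averaged out according to its prior.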

Strength of influence of the relationship
The strength of the relationship between variables in Table 5 indicates that in the knowledge-based BN, variables such as diabetes (0.017), hypertension (0.016), FBS (0.016), and LDL-C (0.016) have the greatest influence on the ASCVD variable. For more details on the strength of the relationships among BNs, refer to Table S2.

Discussion
In our study of 491 participants, ASCVD was exclusively observed in males and patients aged over 45 years, with a prevalence of 7.7%. Our BN models showed a good fit, and their predictive performance for ASCVD risk factors was accurate. The CPs revealed that being male, aged over 45, having diabetes, Mets, and other risk factors increased ASCVD risk, while high HDL-C reduced it. These results provide valuable insights into ASCVD risk factors and can aid in developing effective prevention, predicting various conditions, supporting health research, and determining relevant findings.
BN models have been increasingly used in the field of cardiovascular risk prediction due to their ability to model complex relationships among risk factors and incorporate prior knowledge into the model. Several studies have demonstrated the effectiveness of BN models in predicting cardiovascular risk, such as predicting coronary heart disease risk in Korean adults 64, identifying important risk factors for stroke in the Chinese population 65, and predicting major cardiovascular events in patients with hypertension 66. These studies highlight the potential of BN models in improving CVD risk prediction and helping clinicians make more informed decisions.
In one study, two BN models were developed to predict ASCVD risk using data from a large population-based cohort. The model included demographic factors, ASCVD risk factors, and their inter-relationships. The performance of the models was evaluated using various measures, including sensitivity, specificity, accuracy, PPV, NPV, LR+, LR−, and AUC. The results showed that the knowledge-based BN model had good predictive performance, and identified several risk factors associated with ASCVD, such as age, sex, smoking status, hypertension, and lipid levels. Overall, the BN approach provides a promising tool for predicting cardiovascular risk and can aid in the development of personalized prevention strategies 67.
The finding that ASCVD was exclusively observed in males and patients aged over 45 years is consistent with previous research on cardiovascular risk factors. Multiple studies have identified being male and higher age as independent risk factors for ASCVD 68–70. This may be due to hormonal differences between men and women, as well as changes in the cardiovascular system that occur with aging, such as endothelial dysfunction and arterial stiffening 71,72. It is important to note, however, that the present study did not identify age or sex as significant predictors of ASCVD in the BN models. This may be due to the complex interplay between multiple risk factors and the non-linear relationships between them. Further research is needed to fully understand the relative contributions of age, sex, and other risk factors to the development of ASCVD.
The results of the study suggest that several traditional risk factors, including diabetes and Mets, are associated with an increased risk of ASCVD. This finding is consistent with previous research that has identified diabetes as a strong predictor of CVD 73. Mets, which is characterized by a cluster of metabolic abnormalities including abdominal obesity, dyslipidemia, and insulin resistance, has also been shown to be a strong predictor of CVD 74. In addition to these risk factors, the study found that high HDL-C was associated with a reduced risk of ASCVD. This is consistent with previous research that has identified HDL-C as a protective factor against CVD 75. The findings of this study underscore the importance of identifying and managing traditional risk factors for ASCVD, as well as the potential benefit of interventions to increase HDL-C levels. www.nature.com/scientificreports/

Strengths and limitations
Strengths of this study include employing BN models, which allow for the modelling of complex relationships among various risk factors and the incorporation of prior knowledge into the model. Additionally, the study identified several traditional risk factors associated with an increased risk of ASCVD, such as diabetes and Mets, as well as a protective factor, high HDL-C. The study provides valuable insights into ASCVD risk factors and can aid in developing effective prevention and management strategies, and facilitate treatment.
One potential limitation of the study is the small sample size of 491 participants, which may limit the generalizability of the findings to larger populations. Future studies with larger sample sizes could help confirm the results and identify additional risk factors associated with ASCVD. The absence of ASCVD among female participants can be attributed to several factors, including the low number of women in this study, which also limits the generalizability of the findings. Additionally, this study exclusively focused on healthcare providers,

Conclusion
In conclusion, the study provides valuable insights into ASCVD risk factors and demonstrates the potential of BN models in predicting cardiovascular risk in a large population-based cohort. The BN models showed good fit and accurate predictive performance for ASCVD risk factors including age, sex, smoking status, hypertension, lipid levels, diabetes, and Mets. The study found that high HDL-C was associated with a reduced risk of ASCVD.
The findings underscore the importance of identifying, preventing, and managing traditional risk factors for ASCVD, as well as the potential benefit of interventions to increase HDL-C levels. Overall, the BN approach provides a promising tool for predicting cardiovascular risk and can aid in the development of personalized prevention strategies and health policymaking. However, the study's limitations, including a small sample size and the complexity of the interplay between risk factors, should be taken into account when interpreting the results. Further research is needed to fully understand the complex interplay between multiple risk factors and non-linear relationships between them, as well as to validate the study findings and improve our understanding of ASCVD risk factors.

Figure 2. Receiver operating characteristic curves of the Bayesian networks.
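The area under an ROC curve such as those in Figure 2 equals the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen non-case. The following minimal sketch computes the AUC from that pairwise definition. The function name `auc` and the scores and labels are hypothetical; this is not the procedure implemented in GeNIe or Netica.

```python
def auc(scores, labels):
    """AUC as the share of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))


# Made-up predicted probabilities and true labels (1 = ASCVD, 0 = non-ASCVD).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(round(auc(scores, labels), 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases.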

(all p < 0.05). Individuals with diabetes had 28.4% (11.3-50.2) higher ASCVD rates compared to those without diabetes. In individuals with Mets, the ASCVD rate was found to be 10.5% (4.3-18.5). Additionally, individuals with normal levels of TG, T-C, HDL-C, and LDL-C had 8.5%, 7.3%, 9%, and 11.3% (2.6-16.1, 1.8-14.1, 0.4-4.1, and CI: 5.1-19.3) higher ASCVD rates, respectively, compared to those with high levels of TG, T-C, HDL-C, and LDL-C (all p < 0.05). Refer to Table 1 for more detailed participant information. We evaluated the quality of the BN models using AIC and BIC measures, and the results are summarized in Table 2. Lower values of AIC and BIC are indicative of a better model fit. The results suggest that the knowledge-based BN, with lower AIC and BIC, could be considered an appropriate representation of the data. Figure 2 shows ROC curves of the BN models under different methods: (1) BN constructed by the algorithmic search network structure; (2) BN constructed by the knowledge-based network structure.

Table 1. Sociodemographic and clinical characteristics of ASCVD and non-ASCVD cases in the study population. ASCVD, atherosclerotic cardiovascular disease; Mets, metabolic syndrome; BMI, body mass index; FBS, fasting blood sugar; TG, triglyceride; T-C, total cholesterol; HDL, high-density lipoprotein; LDL, low-density lipoprotein; CI, confidence interval. * Numbers are expressed as frequency (percent). ‡ χ² test for the difference between ASCVD and non-ASCVD cases.

Table 2. AIC and BIC values for comparing the different BN structures. AIC, Akaike information criteria; BIC, Bayesian information criteria; BN, Bayesian network.
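The comparison reported in Table 2 rests on the usual definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of free parameters, n the sample size, and ln L the maximized log-likelihood. The sketch below applies them to two hypothetical structures; the log-likelihoods and parameter counts are invented, and only n = 491 is taken from the text. Some BN software reports these scores with the opposite sign (as quantities to maximize), so the convention should be checked before comparing outputs.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


n = 491  # sample size reported in the Discussion

# Hypothetical fits: (log-likelihood, number of free parameters).
structures = {
    "structure_a": (-1200.0, 40),
    "structure_b": (-1190.0, 60),
}

for name, (log_lik, k) in structures.items():
    print(name, round(aic(log_lik, k), 1), round(bic(log_lik, k, n), 1))
```

In this made-up example the second structure fits slightly better but uses more parameters, so both criteria favour the first; BIC penalizes the extra parameters more heavily than AIC whenever ln n exceeds 2.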

Table 4. Conditional probabilities of BN for ASCVD and non-ASCVD once variables are instantiated to different values. The CPs are obtained at the presence or higher risk levels of the particular variables; for example, the conditional probability for non-ASCVD = 87% is obtained at smoking = yes and hypertension = yes.

which may have influenced the lack of ASCVD cases among female participants. Another limitation is the lack of inclusion of certain risk factors, such as family history, which may be important predictors of ASCVD. Future studies could incorporate additional risk factors into the model to improve its accuracy. Another shortcoming of the study is having to discretise continuous variables into categorical variables for the BN models. Although clinical guidelines are more readily written in terms of categorical variables, discretising continuous variables causes some loss of information in the model. Future work should assess how this process affects the statistical information provided by the BN models, and should explore ways to incorporate continuous variables into BN models.