Network of biomarkers and their mediation effects on the associations between regular exercise and the incidence of cardiovascular & metabolic diseases

This study aimed to understand the biological process related to the prevention of cardiovascular & metabolic diseases (CMD), including diabetes, hypertension, and dyslipidemia, via regular exercise. This study included 17,053 subjects aged 40–69 years in the Health Examinees Study from 2004 to 2012. Participation in regular exercise was investigated by questionnaires. Data on 42 biomarkers were collected from anthropometric measures and laboratory tests. We examined the associations between regular exercise and biomarkers using general linear models, between biomarkers and the risk of CMD using Cox proportional hazard models, and the mediation effect of biomarkers using mediation analyses. Biomarker networks were constructed separately for men and women based on the significant differential correlations (p < 0.05) between the exercise and non-exercise groups. We observed significant mediators in 14 and 16 of the biomarkers in men and women, respectively. Triglyceride level was a noteworthy mediator in decreasing the risk of CMD with exercise, explaining 23.79% in men and 58.20% in women. The biomarker network showed comprehensive relationships and associations among exercise, biomarkers, and CMD. Body composition-related biomarkers were likely to play major roles in men, while obesity-related biomarkers seemed to be key factors in women.


Results
Characteristics of the study population. The basic characteristics of the included dataset (n = 17,053) differed from those of the excluded dataset (n = 45,098); however, the standardized differences showed no noteworthy imbalance except for income (Supplementary Table S1). Age at baseline, which ranged from 40 to 69 years, was correlated with most of the biomarkers in both men and women (Supplementary Table S2). All analyses were performed in men and women separately because they differed in basic characteristics (Supplementary Table S3) and in the distribution of biomarkers at baseline (Supplementary Table S4; all p values < 0.0001, not shown). However, men and women had similar patterns of regular exercise: those who were older, were more educated, had a higher income, were unemployed or housewives, had never smoked, or currently drank alcohol were more likely to participate in regular exercise (Table 1).

Associations among regular exercise, biomarkers and risk of CMD. The associations between regular exercise at baseline and biomarkers at baseline (Supplementary Table S5), between regular exercise at baseline and the risk of CMD (Supplementary Table S6), and between biomarkers at baseline and CMD (Supplementary Table S7) were mostly consistent between men and women. Among the 42 biomarkers, regular exercise was associated with 20 biomarkers in men and 27 biomarkers in women, and 26 biomarkers in men and 29 biomarkers in women were associated with CMD. Although the associations were not significant, regular exercise seemed to protect against CMD during the follow-up.
Mediation effects of biomarkers. We performed mediation analysis to examine the effects of biomarkers on the relationships between exercise and CMD regarding causal links (Fig. 1). Fourteen biomarkers in men and 16 biomarkers in women were significant mediators. In particular, triglyceride showed the largest proportion explained by the indirect effect between regular exercise and the risk of CMD in both men and women, at 23.79% and 58.20%, respectively (Table 2). Waist-hip ratio, γ-glutamyl transpeptidase (γ-GTP), C-reactive protein (CRP), and white blood cell count were significant mediators in both men and women. Indirect effects of regular exercise on the risk of any CMD were observed via body composition-related markers (lean body mass, muscle mass, cell mass, protein mass, and mineral mass), hemoglobin A1c (HbA1c), albumin, alkaline phosphatase (ALP), and red blood cell count in men and via obesity-related markers (waist circumference), pulse, high-density lipoprotein (HDL), direct bilirubin, indirect bilirubin, hematocrit, and platelet count in women (Table 2).
Figure 2 shows the differential correlation networks constructed for men and women separately. Differential correlations are presented as edges and biomarkers as nodes, and the node color shows the association between each biomarker and the risk of CMD. Overall, biomarkers were clustered similarly in both men and women. Lipid markers (triglyceride, HDL, LDL, and total cholesterol) were connected by solid lines, which indicate stronger correlations in the exercise group than in the non-exercise group. Body composition-related markers and bilirubins were clustered separately.
Muscle mass had the most edges that were more strongly correlated in the exercise group and was linked to body composition-related markers (cell mass, mineral mass, and protein mass) for men, whereas visceral fat mass had the most edges that were more strongly correlated in the exercise group and was linked to obesity-related markers (body fat percentage and body fat mass) for women. CRP and white blood cell count, as significant mediators between exercise and the risk of CMD in both men and women, were observed only in the network for men, and their correlation was stronger in the non-exercise group. Waist-hip ratio and waist circumference were shown only in the network for women, and their correlation was also stronger in the non-exercise group. On the network, the nodes representing a risk for each disease suggested that obesity-related markers (waist circumference, visceral fat mass, body fat percentage, and body fat mass) were more likely to contribute to the risk of diabetes (Supplementary Fig. S1) and that markers of body composition (muscle mass, protein mass, cell mass, and lean body mass) were more likely to influence the risk of dyslipidemia (Supplementary Fig. S2). Notable markers for hypertension were not observed in the network for women, while the markers of body composition (muscle mass, protein mass, cell mass, and lean body mass) showed a protective role and diastolic blood pressure (DBP) showed a risk effect on hypertension (Supplementary Fig. S3).

Discussion
This study examined the health benefits of regular exercise for preventing CMD by assessing the associations between exercise and biomarkers, the associations between biomarkers and CMD, and the mediation effects of biomarkers on the relationship between exercise and CMD. Among 42 biomarkers, we observed significant mediators for 14 of the biomarkers in men and 16 of the biomarkers in women. In particular, triglyceride showed a noteworthy mediation effect on decreasing the risk of CMD with regular exercise. The associations and correlations among regular exercise, biomarkers, and CMD were visualized by constructing networks. There were some differences in the networks between men and women.
Waist-hip ratio, triglyceride, γ-GTP, CRP, and white blood cell count were significant mediators in both men and women. The associations between exercise and these markers were consistent with those reported previously [15][16][17][18][19][20], and waist-hip ratio, triglyceride, γ-GTP, CRP, and white blood cell count are well-known risk factors for CMD [21][22][23][24][25]. Among these biomarkers, only triglyceride was observed in the networks of both men and women. As shown in our results, exercise decreased triglyceride, and triglyceride was negatively correlated with HDL, which was enhanced by exercise. HDL was also correlated with total cholesterol and LDL. All these correlations were stronger in the exercise group than in the non-exercise group, implying that the relations between lipid markers are more highly influenced by exercise. A potential biological process is that exercise reduces triglyceride by increasing post-heparin plasma lipoprotein lipase activity, which promotes lipoprotein-mediated hydrolysis [26]. Reducing triglyceride could in turn increase HDL [27]. Exercise can also raise HDL by inducing liver X receptor and ATP-binding cassette transporter A-1, which improve the reverse cholesterol transport pathway and increase plasma HDL formation. Consequently, increased HDL leads to reduced LDL by transporting it to the liver [28][29][30][31][32][33].
Two of the mediators (albumin and ALP) in men and four of the mediators (hip circumference, indirect bilirubin, hemoglobin, and hematocrit) in women showed risk-mediated effects (HR > 1), although the magnitudes were slight. All these biomarkers were risk factors for CMD (Supplementary Table S7), and they showed a positive association with participating in regular exercise, although the associations for albumin and ALP were not significant (Supplementary Table S5).
Elevation of these biomarkers after exercise has been observed in previous studies and is probably due to adverse effects of exercise, such as damage to muscle cells or hepatic or renal stress [15][34][35][36][37][38][39]. However, the magnitude of the indirect effects via these mediators and their proportions were much smaller than those of the other mediators, which showed beneficial mediated effects. Further studies are needed to understand whether the acute adverse effects are neutralized by other beneficial effects during exercise.
Differences in the networks between men and women were observed in the cluster of body composition markers. Muscle mass had the most edges in the network of men, while visceral fat mass had the most edges in women. Muscle mass was linked with protein mass, mineral mass, cell mass and lean body mass. All these biomarkers were significant mediators between exercise and the risk of CMD in men. These results suggest that muscle mass possibly plays a key role among body composition markers in terms of preventing CMD when individuals participate in regular exercise. Meanwhile, visceral fat mass seems to be a major marker among the body composition markers in women, although visceral fat mass was not a significant mediator in women. In general, there is a body compositional difference between men and women: there is more muscle mass in men and more fat mass in women [40][41][42]. In previous studies, women showed higher fat oxidation and lower use of muscle glycogen than men during exercise [43][44][45][46][47][48]. Sex-based differences in motivation and patterns of participation in exercise or physical activities might be related to the different biological processes underlying the health benefits of exercise. Women were motivated by improvements in appearance or weight loss, while men performed exercise for enjoyment or as a challenge [49][50][51]. Therefore, women were likely to engage in regular walking or recreational activities, whereas men preferred strengthening exercise or competitive sports [52]. Because of the differences in preference and motivation for exercising between men and women, not only the significant mediators but also the network structure based on differential correlations between the exercise and non-exercise groups might have differed.
This study has several limitations. First, information on regular exercise and diagnosis of CMD was collected via questionnaires that might be subject to recall bias. However, the questionnaire for the diagnosis of disease was validated by the Korean Centers for Disease Control and Prevention (KCDC) [53], and we observed that the questionnaire for regular exercise also showed acceptable validity in ongoing work (manuscript in preparation). Second, we used a binary variable, namely participation in regular exercise or not. When we used the total exercise time per week (frequency × duration) and a categorical variable (no exercise, < 150 min/week, 150-300 min/week, ≥ 300 min/week), consistent results were observed. Nevertheless, we used a binary variable (yes or no) not only for better statistical power but also for ease of interpretation, consistent with the analysis process comparing the networks of the exercise group and the non-exercise group. Third, the associations between regular exercise at baseline and the risk of each CMD were not significant. The incidence of each CMD was not sufficient because the follow-up period was relatively short. Other diseases, such as cerebrovascular accidents, myocardial infarction, and cancer, had incidences of less than 2%; thus, we could not include them in the study. However, we observed a tendency toward a decreased risk of CMD and lower HRs for the risk of two or more CMD. In addition, significant indirect effects of exercise on the risk of CMD were shown through a few biomarkers. Previous studies have demonstrated that significant indirect effects can be found even though the total effect is not statistically significant, and it has been suggested that this may be due to the difference in statistical power for detecting those effects [54][55][56][57]. Finally, we examined the associations between regular exercise at baseline and biomarkers at baseline, which could be seen as a cross-sectional setting. However, information on regular exercise reflected the lifestyle before enrollment in this study, while biomarkers were measured after registering in this study. Therefore, we assumed that exercise habits before enrollment influenced the biomarkers at baseline and ultimately exerted effects on CMD risk.
Nevertheless, this study still has strengths. We comprehensively examined prospective associations between regular exercise, biomarkers, and risk of CMD, and significant mediators were found by mediation analysis. Their relationships were shown via networks, and the networks were based on differential correlation; therefore, the networks also reflect how the relationships between the biomarkers differ according to participation in exercise.

Conclusions
The current study examined the effects of exercise on CMD by evaluating the associations between regular exercise and biomarkers, the associations between biomarkers and the risks of CMD, and, finally, the mediation effect of biomarkers on the relationships between regular exercise and CMD. Visualization of these associations in the network showed comprehensive relationships and suggested the potential biological process by which participation in regular exercise could prevent the incidence of CMD via comprehensive beneficial effects on the biomarkers. Forty-two biomarkers from anthropometric measures and laboratory tests may not be a sufficient number to show comprehensive relationships or to suggest biological processes. Further studies using metabolomics or microbiome data are needed to show more comprehensive relationships and to identify notable markers that may be key factors for explaining the health benefits of exercise for preventing chronic disease and for healthy aging.

Methods
Study population. This study used data from the HEXA study, a large-scale genomic cohort study in Korea. The HEXA study recruited 169,722 participants aged 40 to 69 years between 2002 and 2013 from 38 general hospitals and health examination centers. Baseline data were obtained when the subjects were enrolled in the study. Follow-up data were collected between 2012 and 2017. The study design, data collection methods, and other details have been described previously [58][59]. Informed consent was obtained from all participants, and this study was approved by the Institutional Review Board of Seoul National University Hospital, Seoul, Korea (No. 0608-018-179). This study was performed in accordance with the Declaration of Helsinki.
The HEXA-G (Health Examinees-Gem) study was updated with additional eligibility criteria and included 139,348 participants at baseline [60]. After excluding subjects with missing information regarding regular exercise at baseline and those lost to follow-up (n = 77,197), this study included 62,151 subjects. To conduct analyses on subjects with complete data, we further excluded subjects (n = 45,098) who had at least one chronic disease among cancer, cerebrovascular accident, myocardial infarction, diabetes, hypertension, and dyslipidemia at baseline, who had missing information on chronic diseases at baseline, or who had missing biomarker data. Thus, this study included a total of 17,053 subjects.
Regular exercise and biomarkers at baseline. Participation in regular exercise at baseline was investigated using an interviewer-administered questionnaire. The subjects answered yes or no to the question "Do you exercise regularly enough to sweat?". Subjects who participated in regular exercise were further asked about the average frequency per week and duration. This study used a binary variable (participation in regular exercise or not) for ease of interpretation, consistent with the analysis process comparing the networks of the exercise group and the non-exercise group.
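The frequency and duration answers can also be combined into the total weekly exercise time (frequency × duration) and the four-level categorical variable that the Discussion mentions as a sensitivity analysis. The following is a minimal sketch of that derivation, not the authors' code. The function and argument names are hypothetical, and the handling of the boundary at exactly 300 min/week (assigned here to the top category) is an assumption, since the text lists both "150-300" and "≥ 300".

```python
def weekly_exercise_minutes(sessions_per_week: float, minutes_per_session: float) -> float:
    """Total exercise time per week = frequency x duration."""
    return sessions_per_week * minutes_per_session


def exercise_category(exercises_regularly: bool,
                      sessions_per_week: float = 0.0,
                      minutes_per_session: float = 0.0) -> str:
    """Map questionnaire answers to the four categories named in the text.

    Assumption: exactly 300 min/week falls into the '>= 300 min/week' group.
    """
    if not exercises_regularly:
        return "no exercise"
    total = weekly_exercise_minutes(sessions_per_week, minutes_per_session)
    if total < 150:
        return "< 150 min/week"
    if total < 300:
        return "150-300 min/week"
    return ">= 300 min/week"
```

For example, a subject reporting five sessions of 30 minutes per week has 150 min/week and would fall into the "150-300 min/week" category under these assumptions.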
All available biomarkers at baseline were selected from among variables measured by clinical tests and physical examinations. Pulse (beats/min) was measured for 30 s or 1 min following the standard procedure. Systolic blood pressure (SBP) (mmHg) and diastolic blood pressure (DBP) (mmHg) were measured twice using a standardized mercury sphygmomanometer, with the mean of the two measurements used in the analyses. Waist circumference (cm) and hip circumference (cm) were measured to the nearest 0.1 cm. Waist-hip ratio was calculated from the measured waist circumference and hip circumference. Grip strength (kg) was measured for both hands, and the average value was used. Body fat mass (kg), percent of body fat (%), visceral fat mass (kg), lean body mass (kg), muscle mass (kg), body cell mass (kg), protein mass (kg) and mineral mass (kg) were measured by multifrequency bioelectrical impedance analysis (MF-BIA; InBody 3.0, Biospace, Seoul, Korea). Biomarkers related to renal function (blood urea nitrogen [BUN] (mg/dL), creatinine (mg/dL), and uric acid (mg/dL)), total cholesterol (mg/dL), high-density lipoprotein cholesterol (HDL) (mg/dL), low-density lipoprotein cholesterol (LDL) (mg/dL), triglyceride (mg/dL), glucose levels (fasting blood sugar (mg/dL) and hemoglobin A1c [HbA1c] (%)), liver function (albumin (g/dL), aspartate aminotransferase
Incidence of CMD. Information on diabetes, hypertension, or dyslipidemia diagnosed by a doctor during the follow-up period was self-reported by questionnaire. Subjects who reported having been diagnosed with any of these diseases were further asked when they had been diagnosed. The median follow-up period was four years from baseline. The questionnaire for the diagnosis of diseases was validated and reported by the Korean Centers for Disease Control and Prevention (KCDC) [53].
The agreement of disease history between questionnaire data from HEXA and national health insurance records showed kappa indexes of 0.93 for diabetes, 0.95 for hypertension, and 0.75 for hyperlipidemia. Each disease was used as an outcome variable. The number of diseases was summed, and the "any CMD" variable was defined as the presence of any one of the diseases. Further analyses were performed among subjects with two or more CMD.
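The outcome definitions above can be illustrated with a short sketch. This is only an illustration of the stated rules (sum the three diseases, define "any CMD" as at least one, and flag two or more), not the authors' code; the `FollowUp` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """Hypothetical per-subject record of self-reported diagnoses at follow-up."""
    diabetes: bool
    hypertension: bool
    dyslipidemia: bool


def cmd_count(record: FollowUp) -> int:
    """Number of incident CMD (0 to 3)."""
    return int(record.diabetes) + int(record.hypertension) + int(record.dyslipidemia)


def any_cmd(record: FollowUp) -> bool:
    """'Any CMD': at least one of the three diseases is present."""
    return cmd_count(record) >= 1


def two_or_more_cmd(record: FollowUp) -> bool:
    """Outcome for the further analyses among subjects with two or more CMD."""
    return cmd_count(record) >= 2
```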
Covariates. Education level, income, marital status, current job, smoking and drinking habits, and menopause status were investigated using a questionnaire. Education level was categorized as < middle school, high school, and ≥ college. Income was classified as less than 2,000 thousand, between 2,000 thousand and 4,000 thousand, and ≥ 4,000 thousand in Korean currency (Won). Marital status was categorized as living with a spouse or living alone. The current job was categorized into office work, manual work, unemployed or housewife, and soldier or others. Information on smoking and drinking habits was collected in terms of never, former, and current use. Body mass index (BMI) was calculated using measured weight and height (kg/m²).
Statistical analysis. All analyses were performed in SAS 9.4 and R software (ver. 4.0.0). Biomarkers were normal-score transformed using the "gstat" package in R to obtain normal distributions and unify scales [61]. The standardized differences between the included (n = 17,053) and excluded (n = 45,098) datasets were calculated using the "stddiff" package in R. Standardized differences greater than 0.2 were considered indicative of an imbalance between datasets [62]. Correlation coefficients with age were estimated for all potential biomarkers. Wilcoxon rank-sum and chi-square tests were performed to evaluate the differences in basic characteristics and biomarkers between men and women. These summary statistics and odds ratios (ORs) with 95% confidence intervals (95% CIs) from logistic regression were estimated in SAS 9.4. Age, education level, income, marital status, current job, smoking and drinking habits, and BMI at baseline were included as covariates in the statistical models. Menopause status at baseline was additionally included in the models for women.
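Two of the preprocessing steps described above, the standardized difference and the normal-score transformation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the "stddiff" or "gstat" implementation: the standardized difference uses the common formula for a continuous variable (absolute mean difference divided by the square root of the average of the two variances), and the normal-score transform maps ranks to standard normal quantiles with the plotting position (rank − 0.5)/n and no special handling of ties. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import List, Sequence


def standardized_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute standardized mean difference for a continuous variable."""
    pooled_sd = sqrt((variance(a) + variance(b)) / 2.0)
    return abs(mean(a) - mean(b)) / pooled_sd


def is_imbalanced(a: Sequence[float], b: Sequence[float], threshold: float = 0.2) -> bool:
    """Flag imbalance using the 0.2 cut-off stated in the text."""
    return standardized_difference(a, b) > threshold


def normal_scores(values: Sequence[float]) -> List[float]:
    """Rank-based normal-score transform (assumed plotting position, ties not averaged)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    std_normal = NormalDist()
    for rank, index in enumerate(order, start=1):
        scores[index] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores
```

The transform keeps the ordering of the observations while placing every biomarker on the same standard normal scale, which is what makes coefficients comparable across biomarkers measured in different units.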
General linear models were used in R software to examine the associations between regular exercise at baseline and biomarkers at baseline after adjusting for covariates, with correction for multiple testing. Cox proportional hazard regression models, adjusted for covariates, were used to examine the associations (1) between regular exercise at baseline and risks of CMD at follow-up and (2) between biomarkers at baseline and risks of CMD at follow-up. Hazard ratios (HRs) with 95% CIs were estimated using the "survival" package in R software.
Mediation analysis based on Cox proportional hazard regression models with the same covariates as above was performed to examine whether regular exercise influenced the risk of CMD directly without any mediator effect or indirectly through biomarkers as the mediators (Fig. 2). When the 95% CI of the estimated indirect effect did not include 0, the indirect effects were considered statistically significant. The proportions explained by the indirect effect of regular exercise through each biomarker on the risk of CMD were calculated as the indirect effect divided by the total effect (direct effect + indirect effect). The R code used for mediation analysis has been described previously [63].
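The two decision rules in this paragraph, the significance criterion for the indirect effect and the proportion explained, can be written out as a small sketch. It only restates the arithmetic given in the text and is not the published R code; the function names are hypothetical and the numbers in the usage note are invented for illustration.

```python
def proportion_mediated(direct_effect: float, indirect_effect: float) -> float:
    """Indirect effect divided by the total effect (direct + indirect)."""
    total_effect = direct_effect + indirect_effect
    if total_effect == 0:
        raise ValueError("total effect is zero; the proportion is undefined")
    return indirect_effect / total_effect


def indirect_effect_is_significant(ci_lower: float, ci_upper: float) -> bool:
    """Significant when the 95% CI of the indirect effect does not include 0."""
    return not (ci_lower <= 0.0 <= ci_upper)
```

With hypothetical effects on the same scale, a direct effect of -0.03 and an indirect effect of -0.01 give a proportion of 0.25, that is, 25% of the total effect would be attributed to the mediator.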
Biomarker networks were constructed for men and women separately to comprehensively visualize the associations between regular exercise and biomarkers, relationships among biomarkers, and the effects of biomarkers on the risks of CMD. Partial correlation matrices adjusted for age were calculated in the exercise and non-exercise groups using the "pcor" package in R. The "DiffCorr" package was used to identify significant differential correlations between the exercise and non-exercise groups. Among significant differential correlations (p < 0.05), partial correlation coefficients with absolute values greater than 0.1 were selected and visualized as networks in Cytoscape software (ver. 3.7.2). Correlations among biomarkers were presented as the edges. Solid edges indicated correlations with significantly larger partial correlation coefficients in the exercise group than those in the non-exercise group. Dotted edges indicated correlations with significantly larger partial correlation coefficients in the non-exercise group than those in the exercise group. The associations between regular exercise and biomarkers were indicated by triangle direction (Δ: positive associations, ∇: negative associations). The associations between biomarkers and risks of CMD were indicated by node color (red: positive associations, blue: negative associations).
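The edge-selection logic described above can be sketched with a Fisher z-test for the difference between two correlation coefficients, a standard way to compare correlations between independent groups. This is an illustrative sketch, not the DiffCorr implementation, and several points are assumptions: the standard error subtracts the number of adjustment covariates (one, for age) in addition to the usual 3, the 0.1 cut-off is applied to the larger absolute coefficient of the two groups, and "larger" is judged on absolute values. All names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist
from typing import Optional, Tuple


def fisher_z_test(r1: float, n1: int, r2: float, n2: int,
                  n_covariates: int = 0) -> Tuple[float, float]:
    """Two-sided test for a difference between two (partial) correlations."""
    se = sqrt(1.0 / (n1 - 3 - n_covariates) + 1.0 / (n2 - 3 - n_covariates))
    z = (atanh(r1) - atanh(r2)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


def classify_edge(r_exercise: float, n_exercise: int,
                  r_non_exercise: float, n_non_exercise: int,
                  n_covariates: int = 1, alpha: float = 0.05,
                  min_abs_r: float = 0.1) -> Optional[str]:
    """Return 'solid', 'dotted', or None (edge not drawn)."""
    _, p = fisher_z_test(r_exercise, n_exercise, r_non_exercise,
                         n_non_exercise, n_covariates)
    strongest = max(abs(r_exercise), abs(r_non_exercise))
    if p >= alpha or strongest <= min_abs_r:
        return None
    # Solid: stronger in the exercise group; dotted: stronger in the non-exercise group.
    return "solid" if abs(r_exercise) > abs(r_non_exercise) else "dotted"
```

Applying `classify_edge` to every biomarker pair yields an edge list with a line-style attribute that could then be imported into a network viewer such as Cytoscape.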
Ethics approval and consent to participate. All participants signed consent forms, and this study was approved by the Institutional Review Board of Seoul National University Hospital, Seoul, Korea (No. 0608-018-179).

Data availability
The data underlying this study are from the Health Examinee cohort, a part of the Korean Genome and Epidemiology Study (KoGES). Researchers who want to conduct studies using these data can apply for data access by submitting an application form with documents such as a research plan and the Institutional Review Board approval form. Details of the data request process and contact information can be found at the following link: http://www.nih.go.kr/contents.es?mid=a50401010400#menu4_1_2.