Kernel machine SNP set analysis finds the association of BUD13, ZPR1, and APOA5 variants with metabolic syndrome in Tehran Cardio-metabolic Genetics Study

Metabolic syndrome (MetS) is one of the most important risk factors for cardiovascular disease. The 11p23.3 chromosomal region plays a potential role in the pathogenesis of MetS. The present study aimed to assess the association between 18 single nucleotide polymorphisms (SNPs) located at the BUD13, ZPR1, and APOA5 genes with MetS in the Tehran Cardio-metabolic Genetics Study (TCGS). In 5421 MetS affected and non-affected participants, we analyzed the data using two models. The first model (MetS model) examined SNPs' association with MetS. The second model (HTg-MetS Model) examined the association of SNPs with MetS affection participants who had a high plasma triglyceride (TG). The four-gamete rules were used to make SNP sets from correlated nearby SNPs. The kernel machine regression models and single SNP regression evaluated the association between SNP sets and MetS. The kernel machine results showed two sets over three sets of correlated SNPs have a significant joint effect on both models (p < 0.0001). Also, single SNP regression results showed that the odds ratios (ORs) for both models are almost similar; however, the p-values had slightly higher significance levels in the HTg-MetS model. The strongest ORs in the HTg-MetS model belonged to the G allele in rs2266788 (MetS: OR = 1.3, p = 3.6 × 10–7; HTg-MetS: OR = 1.4, p = 2.3 × 10–11) and the T allele in rs651821 (MetS: OR = 1.3, p = 2.8 × 10–7; HTg-MetS: OR = 1.4, p = 3.6 × 10–11). In the present study, the kernel machine regression models could help assess the association between the BUD13, ZPR1, and APOA5 gene variants (11p23.3 region) with lipid-related traits in MetS and MetS affected with high TG.


Results
In the first model of analysis (MetS versus non-Mets) 2996 individuals were affected with MetS, and 2425 individuals were non-affected. The MetS group was older with higher mean values for MetS components except for HDL concentration. About 0.7% and 35% were Lipid-lowering drug users, 3.5% and 35% were hypertensionlowering drug users, and 1.7% and 25% were diabetes drug users among MetS and non-MetS participants, respectively (Table 1).
In the HTg-MetS model, comparing the 2543 cases and 2170 controls showed a significant difference within all metabolic syndrome risk factors. In this case, all considered risk factors for MetS had higher mean values in the case group. Between lipid drug users in the case and control groups, a significant difference is observed. None of the case group subjects used lipid-lowering drugs, but 41.2% of controls that consumed drugs indicated that the lipid-lowering drug consumption is directly related to the increased TG levels in HTg-MetS individuals. Table 1 summarizes the demographic and clinical characteristics of the studied individuals for both analytical models.
The selected SNPs were clustered into three variant sets according to the four gamete rules procedure with the sizes of 5, 1, and 12, shown in Table 2. Figure 1 presents the LD and haplotype plots related to the four gamete rules procedure. Moreover, Fig. 2 compares 18 selected SNPs' allele frequencies among populations from five continents with our population.
The kernel machine regression between SNP sets and case-control groups in the MetS model showed that the first and the last sets had a significant association with MetS (p-value < 0.0001). This finding were confirmed in the HTg-MetS model with the same significance level. In both models, most SNPs showed the association, especially two variations of the APOA5 gene: rs2266788, MetS model: [ (Tables 3 and 4).
Four SNPs (rs2266788, rs651821, rs12292921, and rs180326) with the highest significance levels were selected to indicate the relation between MetS components level variation and the number of risk alleles. The result of association analysis showed the significant difference between studied markers and the triglyceride and HDL-C level (p < 0.001), and three of them were in association with non-HDL-C levels (p < 0.001) ( Table 5). Figure 3 tries to present that the TG has the highest level in the HomoAlt groups for all four SNPs, and there is an association between raised TG levels in MetS-affected and the presence of the risk allele. The SNPs were tested for linkage disequilibrium (LD) to indicate the same signal between rs2266788 and rs651821 in the S3 and showed a strong LD (D′ > 0.9; r 2 > 0.8).

Discussion
This is the first kernel machine study to evaluate association genetic variations within the BUD13, ZPR1, and APOA5 genes (11q23.3 region) with MetS and HTg-MetS. This kernel machine is a logistic sequence kernel association test that is denoted as SKAT method. SKAT is a supervised, flexible, computationally efficient regression method that handles tests for evaluating the association between common and rare genetic variants within a region and a continuous or dichotomous trait while easily adjusting for covariates. This method is also useful for assessing the association between joint effects of all SNPs within a set and the considered phenotype. Moreover, SKAT can quickly calculate calibrated p-values for rare variant association analysis in case-control studies. This regression-based method overcomes the limited power of classical single-marker association analysis for rare variants poses a central challenge in association studies 23,24 .
The results of both the kernel machine and MAX (T) tests in our study show that rs2266788, which belongs to both ZPR1 and APOA5 genes, is associated with MetS and MetS affected subjects that had a high TG.  25,26 . For the studied markers, most of the Iranian allele frequencies were common in the population, and the association result confirms the result of previous studies 7, 13,25,26 . According to the association models, the S1 and S3 sets demonstrated a significant correlation with MetS and the high TG model. The two lowest p-values belong to rs2266788 (ZPR1/APOA5 genes) and rs651821 (APOA5 gene), to increase the risk of High TG 1.4 times in the case group of the HTg-MetS model.
The BUD13, ZPR1, and APOA5 genes encode three proteins in the APOA5 protein pathway in the 11q23.3 chromosomal region. BUD13 is a subunit of the splicing factor that plays a role in preserving nuclear pre-mRNA, and ZPR1 is a regulatory protein related to signal transduction and cell proliferation 27 . APOA5 protein plays an important role in several mechanisms, including insulin sensitivity, glucose, fatty acid, and cholesterol metabolism 15,16 . The APOA5 protein is an LPL activator, which inhibits hepatic VLDL production, and accelerates the hepatic uptake of lipoprotein remnants and insulin secretion. Furthermore, it is related to free fatty acid decrease, triglyceride synthesis, and stimulates adipose tissue 25,28 . It is claimed that interaction between BUD13/ ZPR1 variants and APOA5 may change its function and increase the TG level 7,29,30 (Fig. 4).
Many studies in different populations, including European and Asian populations, confirmed that BUD13, ZNF259, and APOA5 variants correlate with changes in plasma TG levels 11,31,32 . Studies on SNPs in noncoding, exonic, and intergenic regions showed transcriptional binding sites of adjacent genes lead to transcriptional processes affected by the SNPs without being linked explicitly to protein regulation 27 .
The GWAS study conducted on the Chinese population indicated that rs651821 on APOA5 and rs180326 on BUD13 are associated with MetS and serum TG levels 7 . Another GWAS study showed the homozygotes carriers of rs2266788-G APOA5 has pleiotropic effects on (increasing) TG and (lowering) HDL-C levels 13 . Also, as a functional point mutation, rs2266788 is highly correlated with rs651821 7 .
Our results are consistent with Korean and Kuwaiti population studies 25,26 , which reported rs2266788 as related variant hypertriglyceridemia (TG > 150 mg/dL) with similar OR. Our results are also consistent with a study performed by Karja et al. to confirm the rs2266788 risk allele concerning increased TG 13 . The results of a linkage disequilibrium replicated by Zhu et al. indicated a strong LD between rs2266788 and rs651821. Furthermore, the association of rs2266788, rs651821, rs180326, and rs12292921 with MetS and HTg-MetS were reported www.nature.com/scientificreports/ in our outcome. Although the first three SNPs' association in our study and the global studies were deep-rooted, rs12292921 is not well known 7,33-36 . Finally, the present study was done in well-characterized and extended TCGS participants who were monitored for more than twenty years. The kernel machine SNP set analysis by two-step models was used for the first time to evaluate this genetic relationship; this would be the most important strength of these findings. The limitation of this research includes the failure to check the APO cluster associated with the lipid profile.
In conclusion, the kernel machine and Single SNP Regression models found the association of genetic variations on the BUD13, ZPR1, and APOA5 genes (11q23.3 region) with MetS and High TG levels in MetS affected individuals in the TCGS population. We suggested using these methods in the same studies. Moreover, rs2266788 and rs651821 in, APOA5 gene was in strong LD and showed a significant association with MetS and HTg-MetS and can be used for prediction. These two SNPs also showed the most significant association with HTg-MetS, which indicates that this region is related to changing lipid profiles in metabolic syndrome. Therefore, we recommend a more in-depth study in the APO genes cluster to confirm these findings.

Materials and methods
Subjects and population. Subjects selected from Tehran Cardio-metabolic Genetic Study (TCGS) 37 . This study is an ongoing genetic cohort study designed to determine the risk factors of major non-communicable disorders in the Tehran Lipid and Glucose Study (TLGS) 38 , whose subjects were followed-up cardio-metabolic risk factors over twenty years. The attendance enrolled in the study from 1999 until now, and over twenty thousand participants were monitored for seven phases (1999-until now; periodic follow-up interval: 3 years). An expert obtained informed consent from each subject and then interviewed for collecting demographic data and referred each one to trained physicians and laboratories for clinical examinations and blood sampling at each visit; details are published elsewhere 37,39,40 . All procedures performed in this study approved by the ethics committee on human subject research at Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences (code of "IR.SBMU.ENDOCRINE.REC.1395.366"), which were following the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. www.nature.com/scientificreports/ In the present study, 5421 unrelated individuals more than 18 years old were recruited. According to the Iranian modified metabolic syndrome joint interim statement (JIS) definition, the metabolic syndrome is defined for all participants 41,42 who attended the sixth phase of TCGS (2015-2017).
We defined two models to assign individuals to the case and control groups. In the first model (Mets versus non-MetS), the cases were selected as MetS affected if they had at least three of the following criteria 41 : (1) abdominal obesity (increased WC ≥ 91 cm in females and males) based on national cut-offs 42 , (2) TG ≥ 150 mg/ dl or receiving treatment for hypertriglyceridemia, (3) HDL < 50/40 mg/dl in F/M, (4) SBP ≥ 130 mmHg or DBP ≥ 85 mmHg or receiving treatment for hypertension, (5) FPG ≥ 100 mg/dl or previously diagnosed with type 2 diabetes and diabetic drug users. The rest of the population were selected as controls 42 .
For the second model to measure precise triglyceride effect on association analysis (hTG-MetS), subjects were divided into case and control if they affected by MetS at the same time have TG higher than 150 mg/dl and Non-MetS with TG less than 150 mg/dl, respectively, and other individuals were excluded (Fig. 5).  37 . After quality control for markers and individuals in our desired region, 20 SNPs located on the BUD13, ZPR1, and APOA5 genes were selected. According to our defined inclusion criteria and genotyping After omitting 2 SNPs with departure from Hardy-Weinberg equilibrium, 18 SNP remained in our analysis 44 . (Exact test p < 0.01) ( Table 2). The plink software was used to create a list of minor    www.nature.com/scientificreports/ significance at the level of 0.01. Rare variants, SNPs with minor allele frequency (MAF) less than 0.01, were not considered for further analysis.
Haploview. In Haploview software version 4.2, four gamete reules procedure was used to make SNP sets from correlated adjacent SNPs in our region 47 . For each marker pair, the four possible two-marker haplotypes' population frequencies are computed; if all four were observed with at least a frequency of 0.01, it is deemed to recombination have occurred. According to this analysis, SNPs were clustered into three sets. According to linkage disequilibrium (LD), the LD plot and haplotypes were drawn under the four gamete rule, with a con-   www.nature.com/scientificreports/