A study of associations between CUBN, HNF1A, and LIPC gene polymorphisms and coronary artery disease

The aim of this study was to identify novel genetic markers related to coronary artery disease (CAD) using a whole-exome sequencing (WES) approach and determine any associations between the selected gene polymorphisms and CAD prevalence. CUBN, HNF1A and LIPC gene polymorphisms related to CAD susceptibility were identified using WES screening. Possible associations between the five gene polymorphisms and CAD susceptibility were examined in 452 CAD patients and 421 control subjects. Multivariate logistic regression analyses indicated that the CUBN rs2291521GA and HNF1A rs55783344CT genotypes were associated with CAD (GG vs. GA; adjusted odds ratio [AOR] = 1.530; 95% confidence interval [CI] 1.113–2.103; P = 0.002 and CC vs. CT; AOR = 1.512; 95% CI 1.119–2.045; P = 0.007, respectively). The CUBN rs2291521GA and HNF1A rs55783344CT genotype combinations exhibited a stronger association with CAD risk (AOR = 2.622; 95% CI 1.518–4.526; P = 0.001). Gene-environment combinatorial analyses indicated that the CUBN rs2291521GA, HNF1A rs55783344CT, and LIPC rs17269397AA genotype combination and several clinical factors (fasting blood sugar (FBS), high-density lipoprotein (HDL), and low-density lipoprotein (LDL) levels) were associated with increased CAD risk. The CUBN rs2291521GA, HNF1A rs55783344CT, and LIPC rs17269397AA genotypes in conjunction with abnormally elevated cholesterol levels increase the risk of developing CAD. This exploratory study suggests that polymorphisms in the CUBN, HNF1A, and LIPC genes can be useful biomarkers for CAD diagnosis and treatment.

Although current genetic tests may not lead to medical management of clinically diagnosed patients, these tests could be used to detect risk-associated mutations in asymptomatic family members.
Therefore, based on whole-exome sequencing (WES) of samples from 20 CAD patients and 20 control subjects, we identified 15 SNPs in CAD-associated genes whose allele frequencies differed significantly between the two groups (see Supplementary Table S1). We then selected five SNPs in genes expected to affect CAD development: cubilin (CUBN), HNF1 homeobox A (HNF1A), and hepatic lipase (LIPC). The CUBN gene is located on 10p13 and encodes cubilin, an extracellular binding protein with various ligands, including intrinsic factor-vitamin B12, vitamin D-binding protein, apolipoprotein, and HDL-cholesterol 9. Ligands such as vitamin B12 and HDL-cholesterol are recognized by cubilin on the surface of absorptive cells; cubilin facilitates ligand absorption into the cell by endocytosis 10. The HNF1A gene is located on 12q24.31 and encodes hepatocyte nuclear factor 1-alpha (HNF-1α), which is associated with maturity-onset diabetes of the young (MODY). This gene is highly expressed in the liver and pancreas, and its product is a transcription factor involved in regulating lipid and glucose metabolism 11. The LIPC gene encodes hepatic lipase and is located on 15q21.3. Hepatic lipase (LIPC), also called hepatic triglyceride lipase, is mainly expressed in the liver and catalyzes the hydrolysis of triacylglycerides 12.
The aim of the present study, therefore, was to investigate genetic polymorphisms associated with CAD susceptibility and risk factors such as obesity 13, diabetes mellitus (DM) 14, and hypertension (HTN) 15 using a next-generation sequencing (NGS) approach in order to identify novel genetic biomarkers.

Results
Clinical profiles of study subjects. Clinical data for the 452 CAD patients and 421 control subjects are summarized in Table 1 (Supplementary Table S2 shows the data from WES participants). There were no statistically significant differences between cases and controls in terms of age or gender (P = 0.699 and 0.778, respectively). HTN and DM, the major risk factors for development of CAD, were significantly more frequent in the patient group (P < 0.05). No significant difference was observed between groups in terms of hyperlipidemia or smoking status. Significant differences between groups were observed in the plasma lipid and lipoprotein markers total cholesterol (TC), triglyceride (TG), and HDL-cholesterol (P < 0.05), but no significant difference was observed in LDL-cholesterol level (P = 0.375).
Whole-exome sequencing and identification of five SNPs for a larger cohort case-control study. Twenty CAD patients and 20 controls were randomly selected from the total participant pool for WES analysis. Supplementary Fig. S1 shows the workflow for variant sorting. A total of 293,250 variants, including 248,156 known variants, 45,094 novel variants, and 142,201 intron variants, were detected by WES analysis. The known variants were divided into common variants (minor allele frequency [MAF] ≥ 0.05; n = 121,234) and rare variants (MAF < 0.05; n = 45,094). By comparing allele frequencies between cases and controls using Fisher's exact test, we identified 5,187 variants with significant P-values among the common variants. These significant variants included 985 exon variants and 3,127 intron variants. Fifteen variants in CAD-associated genes were selected for further investigation of their association with CAD in a larger cohort composed of 100 cases and 100 controls. Finally, the five variants that maintained significant P-values in this cohort were selected for a case-control study with 452 CAD patients and 421 controls (see Supplementary Fig. S2).
Genotype combination analysis. Genotype combination analysis was performed to confirm the combined genotype effect of the five SNPs. Prior to genotype combination analysis, multifactor dimensionality reduction (MDR) was performed on the five SNPs to identify interactions affecting CAD risk. Three SNPs (CUBN rs2291521G > A, HNF1A rs55783344C > T, and LIPC rs17269397A > G) were selected as the best MDR model. In the analysis of the CUBN rs2291521G > A/HNF1A rs55783344C > T/LIPC rs17269397A > G genotype combination, an association with CAD risk was identified for the combined genotype of CUBN rs2291521GA/HNF1A rs55783344CT/LIPC rs17269397AA (AOR = 2.501; 95% CI 1.382–4.527; P = 0.003).
A reduction in CAD risk was found with respect to the CUBN rs2291521GA/HNF1A rs55783344CC/LIPC rs17269397AG genotype combination (AOR = 0.329; 95% CI 0.141–0.764; P = 0.010) (see Supplementary Table S3). The CUBN rs2291521GA/LIPC rs139204AA and HNF1A rs55783344CT/LIPC rs17269397AA genotype combinations were associated with increased risk of CAD (AOR = 1.874; 95% CI 1.299–2.703; P = 0.001 and AOR = 1.474; 95% CI 1.058–2.054; P = 0.022, respectively), and the CUBN rs2291521GA/HNF1A rs55783344CT genotype combination in particular exhibited a synergistic effect (AOR = 2.622; 95% CI 1.518–4.526; P = 0.001). In contrast, the CUBN rs2291521GG/HNF1A rs55783344CC and HNF1A rs55783344CC/LIPC rs17269397AG genotype combinations were associated with a protective effect in terms of CAD risk (AOR = 0.683; 95% CI 0.518–0.899; P = 0.007 and AOR = 0.692; 95% CI 0.408–0.888; P = 0.011, respectively) (see Table 3).

Discussion
This study evaluated the association of five SNPs identified from WES data (CUBN rs1801232C > A, rs2291521G > A, HNF1A rs11065390G > A, rs55783344C > T and LIPC rs17269397A > G) with the risk of developing CAD. In comparisons of the genotype frequencies of CAD patients and control subjects, CUBN rs2291521G > A and HNF1A rs55783344C > T were the variants most strongly associated with the prevalence of CAD. In particular, genotype combinations involving these two SNPs synergistically elevated CAD risk. LIPC rs17269397A > G was also found to have a significant impact on CAD susceptibility in combination with CUBN rs2291521G > A and HNF1A rs55783344C > T, although this effect was not independently significant. Although the HNF1A rs55783344C > T polymorphism has been reported to be associated with type 2 diabetes in a Japanese population 16, this is the first report, to our knowledge, of the association of CUBN rs2291521G > A and HNF1A rs55783344C > T with CAD susceptibility. Cubilin (CUBN), also known as the intestinal intrinsic factor (IF)-cobalamin (vitamin B12) complex receptor, is expressed on renal and intestinal epithelial cells and is hypothesized to function together with megalin (LRP2) 17. Vitamin B12 is an important regulator of homocysteine metabolism 18, and abnormally high homocysteine levels are associated with CAD 19. Hepatocyte nuclear factor 1-alpha (HNF1A), also known as transcription factor 1, is a homeodomain-containing transcription factor that is important for a diverse array of metabolic processes in the liver, pancreatic islet cells, kidneys, and intestines 20. HNF1A plays a key role in maintaining glucose homeostasis 21, and mutations in the gene encoding HNF1A were identified as the cause of MODY type 3 22,23.
Lipase C (LIPC), a triglyceride lipase generally expressed in the liver, maintains lipid homeostasis in both the liver and white adipose tissue 24 and also contributes to the remodeling of lipoproteins, such as HDL-cholesterol 25, LDL-cholesterol, and very low-density lipoprotein (VLDL) 26. Other studies have shown that LIPC mutations can induce atherosclerosis 27,28.
The most notable gene-environment factors in the combined effect analysis were lipoproteins. In particular, the combination of the CUBN GA + AA and HNF1A CT + TT genotypes with a high LDL-cholesterol level was associated with a dramatically increased risk of CAD, even though LDL-cholesterol showed no independent effect. Low HDL-cholesterol combined with the CUBN GA + AA, HNF1A CT + TT, and LIPC AA genotypes was associated with increased CAD risk. Previous studies demonstrated that CUBN, HNF1A, and LIPC are involved in lipoprotein homeostasis and metabolism 29–31. In addition, the CUBN GA + AA, HNF1A CT + TT, and LIPC AA genotypes exhibited combined effects in conjunction with FBS. LIPC also plays roles in regulating plasma glucose and lipoprotein levels 32,33. In conjunction with metabolic syndrome (MetS), these three genotypes exhibited a combined effect on glucose and lipid metabolism. MetS is a disorder that, in addition to being considered a cardiovascular disease risk factor, can also exacerbate cardiovascular disease 34. MetS develops in response to a constellation of factors that can be difficult to ascertain. Once cardiovascular disease or diabetes develops, MetS often develops as well, and the components of MetS in turn contribute to disease progression and risk 35. Therefore, overall metabolic regulation is often considered to have broken down in CAD patients, and more research is needed to explain this phenomenon.
Vitamin B12 (cobalamin) is absorbed by CUBN in the intestinal epithelium in association with IF, a carrier protein produced in the stomach 30. As mentioned above, vitamin B12 plays a key role in homocysteine and folate metabolism, and impairments in these metabolic processes are associated with increased risk of MetS in patients with vascular disease 36. Despite the strong association between CUBN and vascular disease, an association between CUBN and CAD risk has only been reported in GWASs 37,38.
Moreover, searches for CAD and coronary heart disease (CHD) in the GWAS Catalog (https://www.ebi.ac.uk/gwas/home) returned 940 and 1,220 associations in 40 and 90 studies, respectively. Two variants (rs1169288 and rs2244608) in HNF1A were reported to be associated with CAD in five studies, and another variant (rs261332) in LIPC was reported to be associated with CHD in one study. Associations with other traits and diseases were reported for the two intronic variants that were associated with CAD in our study. Specifically, the CUBN rs2291521G > A variant was associated with total grey matter volume (OR and 95% CI unknown; P = 3 × 10⁻⁶) 39, and the HNF1A rs55783344C > T variant was associated with type 2 diabetes (OR = 1.07; 95% CI 1.04–1.11; P = 5 × 10⁻⁶) 16. However, none of the five SNPs identified in this study were found in the GWAS Catalog when searching for CAD and CHD. Although the effect of these two variants on CAD is not clear, they may affect mRNA splicing by altering donor and acceptor sites, the polypyrimidine tract, the branch point, enhancers, or silencers 40.
The WES analysis included various intronic and intergenic variants. The WES library was prepared using a capture kit (SureSelect V5-post, Agilent Technologies, Santa Clara, CA, USA). The kit captures not only exons but also proximal flanking intronic sequences to increase the accuracy of exon sequencing. Previous reports using this capture kit showed that it also captures intronic and intergenic variants 41,42. We did not exclude the intronic variants, which may act as splicing variants. Therefore, some intron variants were included among the five candidates for the case-control study.
In this study, we used an NGS approach to identify novel diagnostic markers of CAD that could contribute to the establishment of a CAD diagnosis system. Candidate SNPs were selected from NGS data, and subsequent case-control studies demonstrated that the CUBN rs2291521G > A, HNF1A rs55783344C > T, and LIPC rs17269397A > G SNPs are related to the prevalence of CAD. These SNPs may also negatively impact patients in conjunction with vascular disease-associated factors such as DM, FBS, HDL-cholesterol, LDL-cholesterol, folate, and vitamin B12. This study has several limitations. First, it is unclear whether the intronic variants of CUBN, HNF1A, and LIPC affect gene expression or mRNA splicing. Second, the population of this study was restricted to patients of Korean ethnicity. Although the results of our study provide the first evidence that SNPs in the CUBN, HNF1A, and LIPC genes could be prognostic biomarkers useful for CAD prevention, a prospective study involving a larger cohort of patients and functional studies are required to validate these findings.

Methods
Study population. Subjects were recruited from the South Korean provinces of Seoul and Kyeonggi-do between 2014 and 2016. All participants gave written informed consent to this study approved by the Institutional Review Board (IRB) of CHA Bundang Medical Center (IRB number: 2013-10-114) in January 2014, and all study protocols followed the recommendations of the Declaration of Helsinki. The study included 452 consecutive patients with CAD, referred from the Department of Cardiology at CHA Bundang Medical Center, CHA University. All patients had stenosis of more than 50% in at least one of the main coronary arteries or their major branches, which was confirmed by coronary angiography. To avoid issues in blood testing caused by various medical treatments, exclusion criteria included cardiac arrest and life expectancy of < 1 year. Diagnoses were based on the results of coronary angiography, and required the agreement of at least one independent experienced cardiologist.
We selected 421 gender- and age-matched control subjects without CAD symptoms from among patients presenting at our hospitals during the same period for health examinations, including electrocardiogram. Control subjects had no recent history of angina symptoms or myocardial infarction, and showed no T wave inversion on electrocardiography. Exclusion criteria have been described in our previous study 43. The criteria for metabolic syndrome (MetS) in this study were as follows: body mass index ≥ 25.0 kg/m²; TG level ≥ 150 mg/dL; HDL-cholesterol < 40 mg/dL for men and < 50 mg/dL for women; blood pressure ≥ 130/85 mmHg or taking anti-hypertensive medication; and fasting blood sugar (FBS) ≥ 110 mg/dL or taking insulin or hypoglycemic medication. Individuals with three or more of these five risk factors were considered to have MetS 35.
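The MetS definition above amounts to counting how many of five criteria a subject meets. A minimal Python sketch of that rule follows. The `Subject` record, its field names, and the reading of "blood pressure ≥ 130/85 mmHg" as systolic ≥ 130 or diastolic ≥ 85 are assumptions made for illustration; they are not taken from the study's own analysis.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    """Hypothetical per-subject record; field names are illustrative only."""
    sex: str                      # "M" or "F"
    bmi: float                    # body mass index, kg/m^2
    tg: float                     # triglycerides, mg/dL
    hdl: float                    # HDL-cholesterol, mg/dL
    systolic: float               # mmHg
    diastolic: float              # mmHg
    fbs: float                    # fasting blood sugar, mg/dL
    on_antihypertensive: bool = False
    on_glucose_lowering_drug: bool = False  # insulin or oral agent


def mets_criteria_met(s: Subject) -> list:
    """Return one boolean per MetS criterion, in the order listed in the text."""
    hdl_cutoff = 40.0 if s.sex == "M" else 50.0
    return [
        s.bmi >= 25.0,
        s.tg >= 150.0,
        s.hdl < hdl_cutoff,
        # Assumption: either pressure component at or above its cutoff counts.
        s.systolic >= 130 or s.diastolic >= 85 or s.on_antihypertensive,
        s.fbs >= 110.0 or s.on_glucose_lowering_drug,
    ]


def has_mets(s: Subject) -> bool:
    """MetS is assigned when three or more of the five criteria are met."""
    return sum(mets_criteria_met(s)) >= 3
```

Returning the individual criteria rather than only the final flag makes it easy to tabulate which components drive MetS status in a given group.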

Blood biochemical analyses. Blood was collected in tubes containing an anticoagulant after 12 h of fasting. Samples were centrifuged for 15 min at 1,000×g to separate plasma from whole blood. The plasma homocysteine concentration was determined using an IMx fluorescent polarizing immunoassay (Abbott Laboratories, Abbott Park, IL, USA), and plasma folate concentration was determined using a radioimmunoassay kit (ACS:180; Bayer, Tarrytown, NY, USA).
TC, TG, HDL-cholesterol, and LDL-cholesterol levels were determined by enzymatic colorimetric methods using commercial reagent sets (TBA 200FR NEO, Toshiba Medical Systems, Japan).
Whole-exome sequencing (WES) work flow. WES was performed for 20 CAD patients and 20 control subjects, who were selected randomly, with the only considerations being age- and sex-matching. The captured genomic DNA library was prepared for WES using the SureSelect V5-post capture kit (Agilent Technologies, Santa Clara, CA, USA). Paired-end sequences produced using an Illumina HiSeq instrument were first mapped to the human genome using the Burrows-Wheeler Alignment Tool mapping program (version 0.7.12). Variant genotyping for each sample was performed using the HaplotypeCaller of the Genome Analysis Toolkit (GATK), based on the BAM file previously generated. An in-house program and SnpEff were used to filter variants against additional databases, including ESP6500, ClinVar, and dbNSFP2.9. For advanced analyses, all per-sample genomic variant call format (GVCF) files were gathered and submitted collectively to the joint genotyping tool, GenotypeGVCFs. The genotype frequency of each polymorphism was calculated, and data quality and genotype error were confirmed based on the Hardy-Weinberg equilibrium 44.
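The Hardy-Weinberg check mentioned above can be illustrated with a chi-square goodness-of-fit test (one degree of freedom) on the three genotype counts of a biallelic SNP. The sketch below shows one common way to run such a check and is not necessarily the procedure used in this study; the function name and any counts passed to it are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    n_aa, n_ab, n_bb are observed counts of the two homozygotes and the
    heterozygote. Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A small P-value from such a test would flag a SNP whose genotype distribution departs from equilibrium, which can indicate genotyping error. For SNPs with very low minor-allele counts, an exact test is usually preferred over this chi-square approximation.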

Identification of candidate biomarkers. The association between CAD and individual polymorphisms was assessed using Fisher's exact test under the assumption that a rare allele would have an effect for each polymorphism. The list of SNPs was sorted based on those meeting the significance criterion of P < 0.05 for Fisher's exact test. The Gene Ontology (https://geneontology.org) database was used to perform gene-enrichment and functional annotation analysis of significant SNPs. The statistical significance of putative associations was assessed using PLINK 1.07 (https://pngu.mgh.harvard.edu/~purcell/plink/). Among the SNPs found significant using Fisher's exact test, those in genes classified as 'cardiovascular disease' in the Genetic Association Database (https://geneticassociationdb.nih.gov/) and 'coronary artery disease' in the GWAS Catalog (https://www.ebi.ac.uk/gwas/) were selected. Among these, genes associated with CAD risk factors were sorted and identified using the 'Functional Annotation Clustering' tool of DAVID (https://david.ncifcrf.gov/home.jsp). In total, 15 SNP candidates were identified by the above process. These 15 SNPs were subsequently genotyped in a randomly selected group of 100 cases and 100 controls. Based on the results, we selected five SNPs (CUBN rs1801232, rs2291521, HNF1A rs11065390, rs55783344, and LIPC rs17269397) that exhibited distinct frequency differences between the 100 cases and 100 controls. Although the sample size for WES was small, some identified SNPs remained significantly associated with CAD susceptibility in a case-control study.
Genotyping. DNA ... polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) analysis under the conditions shown in Supplementary Table S7. The digested PCR products for genotyping of CUBN rs1801232C > A, HNF1A rs55783344C > T, and LIPC rs17269397A > G were subjected to agarose gel electrophoresis, stained with 3.0% ethidium bromide, and visualized under ultraviolet illumination (see Supplementary Fig. S3). CUBN rs2291521G > A and HNF1A rs11065390G > A were genotyped using real-time PCR (RG-6000, Corbett Research, Australia) for allelic discrimination. The sequences of the primers and probes used for the HNF1A rs11065390G > A genotyping analyses are shown in Supplementary Table S7. We randomly repeated 10–15% of the PCR assays for each polymorphism and confirmed the results by DNA sequencing using an automated sequencer (ABI3730x DNA Analyzer, Applied Biosystems, Foster City, CA, USA). The concordance of the quality control samples was 100%.
Statistical analyses. The chi-square test for categorical data and Mann-Whitney test for continuous data were used to compare clinical characteristics between the study groups. Associations between CUBN, HNF1A, and LIPC polymorphisms and CAD incidence were analyzed using adjusted odds ratios (AORs) and 95% confidence intervals (CIs) from multivariate logistic regressions adjusted for age, gender, HTN, DM, hyperlipidemia, and smoking status. Analyses were performed using GraphPad ...
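To make the odds ratio and confidence interval notation concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% CI from a 2 × 2 table of genotype carriers among cases and controls. This is a deliberate simplification: the AORs reported in this study come from multivariate logistic regression with covariate adjustment, which this code does not perform. The function and argument names are hypothetical.

```python
import math


def odds_ratio_wald(carrier_cases: int, noncarrier_cases: int,
                    carrier_controls: int, noncarrier_controls: int,
                    z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Returns (odds_ratio, ci_lower, ci_upper, p_value). The default z = 1.96
    gives an approximate 95% interval. All four cells must be positive.
    """
    a, b = carrier_cases, noncarrier_cases
    c, d = carrier_controls, noncarrier_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    # Standard error of log(OR) (Woolf's method).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_lower = math.exp(log_or - z * se)
    ci_upper = math.exp(log_or + z * se)
    # Two-sided P-value from the normal approximation to log(OR) / SE.
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return odds_ratio, ci_lower, ci_upper, p_value
```

In a logistic regression the same quantities are read from the fitted model: the AOR for a genotype is the exponential of its coefficient, and the CI is the exponential of the coefficient plus or minus 1.96 standard errors, with the other covariates held fixed.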