Although over 30 common genetic susceptibility loci have been identified as independently associated with coronary artery disease (CAD) risk through genome-wide association studies (GWAS), the genetic risk variants reported to date explain only a small fraction of its heritability. To identify novel susceptibility variants for CAD and to confirm those previously identified in European populations, we performed a GWAS and a replication study in Korean and Japanese individuals. In the discovery stage, we genotyped 2123 Korean cases and 3591 Korean controls for 521 786 SNPs using the Affymetrix SNP Array 6.0. In the replication stage, 14 selected SNPs were directly genotyped in 3052 cases and 4976 controls from the KItaNagoya Genome study of Japan. To maximize genome coverage, imputation was performed on the basis of the 1000 Genomes JPT+CHB panel, and 5.1 million SNPs were retained. CAD associations were replicated in Koreans for three GWAS-identified loci: 1p13.3/SORT1 (rs599839), 9p21.3/CDKN2A/2B (rs4977574) and 11q22.3/PDGFD (rs974819). In the GWAS and replication, SNP rs3782889 showed a strong association (combined P=3.95 × 10−14), although this association did not remain statistically significant after adjustment for SNP rs11066015, a proxy SNP for BRAP (r2=1). A possible new CAD-associated variant was observed at rs9508025 (FLT1), although its association fell short of genome-wide significance (combined P=6.07 × 10−7). This study shows that three CAD susceptibility loci previously identified in Europeans can be directly replicated in Koreans, and it provides additional evidence implicating suggestive loci as risk variants for CAD in East Asians.
Coronary artery disease (CAD), one of the leading causes of disability and mortality worldwide,1 is a complex polygenic disease in which genetic factors play a significant role in etiology.2 It has been estimated that heritable factors account for 30–60% of the inter-individual variation in CAD risk.3 To date, GWAS have identified over 30 common variants associated with CAD risk, as reported in the NHGRI catalog (http://www.genome.gov/gwasstudies) and in a literature review.4 Most of the studies that identified CAD susceptibility loci were conducted in populations of European descent.5, 6, 7, 8, 9, 10, 11 GWASs in Asians have led to the discovery of CAD-associated variants in BRAP at 12q24.12 (Ozaki et al.12) and C6orf105 at 6p24.1.13 However, the associations of loci first identified in populations of European descent have been much weaker in populations of Asian descent or could not be confirmed in Asians, possibly because of differences in linkage disequilibrium (LD) patterns between populations; only some loci overlap between the two populations.14 In addition, reproducible evidence of disease association has been obtained for a few candidate loci previously identified by GWAS.15, 16
In this study, we conducted a GWAS and a replication study to identify common CAD susceptibility loci and to validate previously reported loci. In Stage I, 2293 CAD patients from the Genomics Research in Cardiovascular Disease (GenRIC) study and 4302 healthy controls from a large urban cohort, the Korea Genome Epidemiology Study (KoGES), were genotyped using the Affymetrix Genome-Wide Human SNP array 6.0. On the basis of the GWA scan results, replication was performed using 3052 cases and 4976 controls from the KItaNagoya Genome (KING) study of Japan. From the GWAS and replication analyses, we found evidence for genetic variants that may be associated with CAD risk.
Materials and methods
The study protocol was approved by the institutional review boards of the Korea National Institute of Health and of each collaborating institute. Informed consent was obtained from all participants. To identify common susceptibility loci, a GWA scan (GWAS) was conducted with 2293 CAD patients from the GenRIC working groups, which consist of three teaching hospitals in Korea (Samsung Medical Center, Seoul National University Hospital and Yonsei University College of Medicine). CAD was confirmed by standard coronary angiography. Subjects with myocardial infarction (31%), stable angina (41%) and unstable angina (28%) were classified as CAD subjects. Myocardial infarction was diagnosed on the basis of typical chest pain lasting >30 min, characteristic electrocardiographic patterns of acute myocardial infarction, and elevated creatine kinase-MB and troponin I levels. Subjects with familial hypercholesterolemia, known vasculitides, end-stage renal disease or congenital heart disease were excluded.
KoGES is an ongoing cohort study in Korea that started in 2001 and aims to understand the causes of disease and to identify disease risk factors. Participants in the large urban cohort visited one of several centers located in four regions of north-central (Seoul) and south-eastern (Busan) South Korea to undergo a health examination that included a clinical test and physical measurements. All procedures were standardized across the centers through a standard protocol and the training of research coordinators and research assistants. For replication, samples were obtained from the KItaNagoya Genome (KING) study, a community-based prospective observational study of individuals who underwent annual health checkups in Kitanagoya City, Japan, between May 2005 and December 2007; the study aims to identify factors contributing to the genetic basis of cardiovascular disease and its risk factors (Supplementary Table S3). The replication cohort has been described previously.17
Genotyping and quality control of the GWAS and replication
Samples from 2293 cases and 4302 controls were genotyped using the Affymetrix Genome-Wide Human SNP array 6.0, and genotypes were called with the BirdSeed algorithm (http://www.broadinstitute.org/mpg/birdsuite/birdseed.html). Quality control was performed with PLINK (ver. 1.06) and R (http://www.r-project.org/; ver. 2.10.1). Samples were removed for any of the following: (i) gender inconsistency (n=23 cases, n=12 controls), (ii) call rate <95% (n=94 cases, n=443 controls), (iii) outlying heterozygosity (n=41 cases, n=25 controls), (iv) cryptic first-degree relatedness (>0.8; n=9 cases, n=33 controls) or (v) outlying position in a multidimensional scaling plot (n=3 cases, n=26 controls). Controls with a history of cancer or CAD were also excluded (n=172).
Samples whose genotype-inferred gender differed from the clinical record were excluded. Heterozygosity and identity-by-state (IBS) were calculated following the method reported by the WTCCC.7 To eliminate the effects of sample contamination, duplication and cryptic first-degree relatives such as sibling pairs, genome-wide average IBS values were calculated for each pair of individuals in the GWA case–control data set using LD-pruned SNPs (51 195 SNPs in 2123 cases and 74 965 SNPs in 3703 controls), which are in weak LD with one another. Individuals sharing an excessively high degree of IBS were excluded. To evaluate differences in population structure, multidimensional scaling analysis was performed using the pruned SNPs.
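The pairwise IBS screen described above can be sketched as follows. This is a minimal illustration, not the WTCCC or PLINK implementation: it assumes genotypes are coded as allele counts (0, 1, 2, or None for a missing call) over the pruned SNP set, and the function names and the threshold argument are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Genotype = Optional[int]  # allele count 0, 1 or 2; None marks a missing call


def ibs_proportion(g1: List[Genotype], g2: List[Genotype]) -> float:
    """Genome-wide average IBS between two individuals, scaled to [0, 1].

    At a biallelic SNP the number of alleles shared identical-by-state is
    2 - |g1 - g2|; SNPs missing in either individual are skipped.
    """
    shared = 0
    called = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)
        called += 1
    if called == 0:
        raise ValueError("no SNPs called in both individuals")
    return shared / (2 * called)


def flag_related_pairs(genotypes: Dict[str, List[Genotype]],
                       threshold: float) -> List[Tuple[str, str, float]]:
    """Return sample pairs whose average IBS exceeds a (hypothetical) threshold."""
    flagged = []
    for (id1, g1), (id2, g2) in combinations(sorted(genotypes.items()), 2):
        ibs = ibs_proportion(g1, g2)
        if ibs > threshold:
            flagged.append((id1, id2, ibs))
    return flagged
```

Duplicates score 1.0. Because unrelated individuals already share many alleles identical by state at common SNPs, the cut-off has to be chosen from the observed distribution of pairwise values rather than fixed in advance.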
SNPs were filtered out if (1) the call rate was <95% (n=92 682 in cases, n=170 631 in controls), (2) the minor allele frequency (MAF) was <1% (n=83 902 in cases, n=112 231 in controls), (3) the missing rate differed significantly between cases and controls (P<5 × 10−5; 41 142 SNPs), (4) genotype calling rates differed between cases and controls (case missing rate >1% or control missing rate >1%, and missingness P<5 × 10−6) or (5) genotypes deviated significantly from Hardy–Weinberg equilibrium (HWE P<1 × 10−6). After quality control, 521 786 autosomal SNPs in 2123 cases and 3591 controls remained for association analysis.
For replication, 14 candidate loci from the GWA scan were examined by genotyping 3052 cases and 4976 controls with TaqMan SNP genotyping assays (Supplementary Figure S1). The minimum genotyping success rate in the replication study was 0.993, and Hardy–Weinberg equilibrium test P-values were >0.05 in the control group, suggesting no systematic errors in the genotyping procedure.
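The per-SNP call-rate, MAF and HWE filters above could be applied roughly as follows. This sketch makes several assumptions not stated in the text: genotypes are allele counts with None for missing calls, the HWE check uses a 1-df chi-square goodness-of-fit test (exact tests are often preferred, particularly for rare variants) and is applied in controls only, and the differential-missingness filters (3) and (4) are omitted. All names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Genotype = Optional[int]  # allele count 0, 1 or 2; None marks a missing call


def summarize(genotypes: Sequence[Genotype]) -> Tuple[float, float, Tuple[int, int, int]]:
    """Return (call rate, minor allele frequency, genotype counts) for one SNP."""
    if not genotypes:
        raise ValueError("no samples")
    called = [g for g in genotypes if g is not None]
    counts = (called.count(0), called.count(1), called.count(2))
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0, counts
    freq = (counts[1] + 2 * counts[2]) / (2 * len(called))
    return call_rate, min(freq, 1.0 - freq), counts


def hwe_chi2_pvalue(n0: int, n1: int, n2: int) -> float:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg proportions."""
    n = n0 + n1 + n2
    if n == 0:
        return 1.0
    q = (n1 + 2 * n2) / (2 * n)
    if q in (0.0, 1.0):  # monomorphic: no test possible
        return 1.0
    expected = (n * (1 - q) ** 2, 2 * n * q * (1 - q), n * q ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n0, n1, n2), expected))
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)


def snp_passes_qc(cases: Sequence[Genotype], controls: Sequence[Genotype],
                  min_call: float = 0.95, min_maf: float = 0.01,
                  min_hwe_p: float = 1e-6) -> bool:
    """Call-rate and MAF filters in each group; HWE filter in controls (assumed)."""
    for group in (cases, controls):
        call_rate, maf, _ = summarize(group)
        if call_rate < min_call or maf < min_maf:
            return False
    _, _, control_counts = summarize(controls)
    return hwe_chi2_pvalue(*control_counts) >= min_hwe_p
```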
SNP selection for validation study using GWAS
We selected SNPs that had been implicated in previous GWASs and reported in the National Human Genome Research Institute catalog (http://www.genome.gov/gwastudies). SNPs were included only if they had an assigned reference allele, a defined MAF, and an estimated OR or β-coefficient with a 95% confidence interval (CI). All successfully imputed SNPs were used (SNPs with imputation quality R2<0.3 were excluded).
Association analysis of the GWAS
Association analysis was performed for the 521 786 common SNPs that passed QC in 2123 cases and 3591 controls, using logistic regression under an additive model (one degree of freedom) with adjustment for age and gender. The genomic inflation factor λ was 1.08 (Supplementary Figure S2). To examine population stratification, population substructure was assessed with multidimensional scaling and principal component analyses based on pairwise IBS, which showed that, apart from some outliers, all subjects in the present study clustered closely with HapMap Asians (Supplementary Figure S3). The detailed method has been described previously.18 The statistical power to detect the ORs reported in previous GWASs was calculated using Quanto version 1.2.4 (http://hydra.usc.edu/gxe) (Supplementary Table S6).
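The genomic inflation factor λ is conventionally the median of the observed 1-df association chi-square statistics divided by the median of the 1-df chi-square distribution (about 0.455). A minimal sketch using only the Python standard library, assuming two-sided P-values as input (function names are hypothetical):

```python
from statistics import NormalDist, median
from typing import Iterable

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square distribution: (Phi^-1(0.75))^2, about 0.455.
_CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def pvalue_to_chi2(p: float) -> float:
    """Convert a two-sided P-value to the equivalent 1-df chi-square statistic."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P-value must be in (0, 1]")
    # Using the lower tail p/2 keeps precision for tiny P-values,
    # where 1 - p/2 would round to exactly 1.0.
    return _STD_NORMAL.inv_cdf(p / 2.0) ** 2


def genomic_inflation(pvalues: Iterable[float]) -> float:
    """Genomic inflation factor lambda from genome-wide association P-values."""
    chi2 = [pvalue_to_chi2(p) for p in pvalues]
    return median(chi2) / _CHI2_1DF_MEDIAN
```

Values close to 1, such as the λ=1.08 reported here, indicate little inflation; under genomic control, each test statistic can be divided by λ before P-values are recomputed.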
Association of loci with established cardiovascular risk factors was examined in control samples from the GWAS. For quantitative traits (the levels of high-density lipoprotein, low-density lipoprotein, total cholesterol, triglycerides, blood pressure and body mass index), linear regressions were used, whereas, for the binary trait (hypertension), a logistic regression model was applied (Supplementary Table S5).
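For the quantitative traits, the per-SNP linear regression can be sketched as a simple regression of the trait on the additive genotype code. This is an illustration only: the text does not say which covariates, if any, were included, so this sketch fits genotype alone, and it uses a normal approximation to the t distribution for the P-value, which is reasonable only at sample sizes like those used here. The function name is hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple


def additive_trait_association(genotypes: Sequence[int],
                               trait: Sequence[float]) -> Tuple[float, float, float]:
    """Regress a quantitative trait on allele count (0/1/2); return (beta, se, p)."""
    n = len(genotypes)
    if n != len(trait) or n < 3:
        raise ValueError("need at least 3 paired observations")
    g_bar, y_bar = fmean(genotypes), fmean(trait)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic SNP")
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, trait))
    beta = sxy / sxx  # per-allele change in the trait
    intercept = y_bar - beta * g_bar
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, trait))
    if rss == 0:
        raise ValueError("perfect fit: standard error is zero")
    se = math.sqrt(rss / (n - 2) / sxx)
    # Two-sided P-value from a normal approximation to the t distribution.
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p
```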
Single-nucleotide polymorphism imputation
To infer the genotypes of SNPs not present on the Affymetrix Genome-Wide Human SNP array 6.0, SNP imputation was performed using a Markov chain Monte Carlo (MCMC) algorithm, as implemented in IMPUTE version 2 (http://mathgen.stats.ox.ac.uk/impute/impute_v2.html). Phased JPT+CHB data from the 1000 Genomes Phase 1 (interim) release, based on NCBI build 37 and comprising over 7.5 million SNPs, were used as the reference panel. After excluding imputed SNPs with an imputation quality score R2<0.3, a call rate <0.9 in either cases or controls, an MAF <0.01 in either cases or controls, or a Hardy–Weinberg equilibrium P<1 × 10−3 in controls, we retained a total of 5 125 961 genotyped and imputed autosomal SNPs. The regional plots in Figure 1 were drawn for all SNPs using the LocusZoom standalone version (http://genome.sph.umich.edu/wiki/LocusZoom) on the basis of 1000 Genomes JPT+CHB.
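The post-imputation filters listed above amount to a conjunction of thresholds. In the sketch below the record type and its field names are hypothetical, and the per-SNP statistics (imputation R2, call rates, MAFs and control HWE P-value) are assumed to have been computed beforehand; only the threshold values come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ImputedSnp:
    """Hypothetical per-SNP summary produced after imputation."""
    snp_id: str
    info_r2: float  # imputation quality score
    call_rate_cases: float
    call_rate_controls: float
    maf_cases: float
    maf_controls: float
    hwe_p_controls: float


def passes_imputation_qc(s: ImputedSnp, min_r2: float = 0.3, min_call: float = 0.9,
                         min_maf: float = 0.01, min_hwe_p: float = 1e-3) -> bool:
    """A SNP failing a threshold in either cases or controls is excluded."""
    return (s.info_r2 >= min_r2
            and min(s.call_rate_cases, s.call_rate_controls) >= min_call
            and min(s.maf_cases, s.maf_controls) >= min_maf
            and s.hwe_p_controls >= min_hwe_p)


def filter_imputed(snps: Iterable[ImputedSnp]) -> List[ImputedSnp]:
    """Keep only imputed SNPs that pass every threshold."""
    return [s for s in snps if passes_imputation_qc(s)]
```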
SNP selection for replication and combined analysis
Following the GWA scan, SNPs for replication were selected from the 521 786 directly genotyped SNPs that passed QC according to the following criteria: (a) MAF >5%, (b) very clear genotype clusters, (c) P<5 × 10−4 and (d) no strong LD (R2<0.5) with any of the GWAS-identified risk variants. In addition, when multiple SNPs within 100 kb were in LD (R2>0.2), the SNP with the lowest P-value was selected. Approximately 50 000 SNPs had an MAF between 1% and 5%; among them, 50 (0.1%) had P<5 × 10−4, and most of these did not meet the other replication criteria. In the replication study, logistic regression was performed with adjustment for age and gender. The combined analysis was performed using a fixed-effects model weighted by sample size, and combined P-values and ORs were calculated with the Mantel–Haenszel method in the rmeta package for R. In addition, because a CAD-associated variant at 12q24 had previously been identified in Japanese, we tested the independence of the signal at 12q24.11 using SNP rs11066011 as a proxy for BRAP.
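The LD-based pruning step (keeping the lowest-P SNP among correlated SNPs within 100 kb) is a greedy clumping procedure. A minimal sketch, assuming a caller-supplied r2 function that returns pairwise LD between two SNP identifiers; the cluster-quality criterion (b) and the LD check against known risk variants (d) are not modelled, and all names are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    snp: str
    chrom: str
    pos: int    # base-pair position
    maf: float
    p: float    # discovery-stage P-value


def select_for_replication(candidates: Iterable[Candidate],
                           r2: Callable[[str, str], float],
                           max_p: float = 5e-4, min_maf: float = 0.05,
                           window_bp: int = 100_000,
                           r2_clump: float = 0.2) -> List[Candidate]:
    """Greedy LD clumping: keep the lowest-P SNP among correlated SNPs in a window."""
    eligible = sorted((c for c in candidates if c.p < max_p and c.maf > min_maf),
                      key=lambda c: c.p)
    selected: List[Candidate] = []
    for c in eligible:
        # Skip c if it is within the window of, and correlated with, a kept SNP.
        linked = any(s.chrom == c.chrom and abs(s.pos - c.pos) <= window_bp
                     and r2(s.snp, c.snp) > r2_clump
                     for s in selected)
        if not linked:
            selected.append(c)
    return selected
```

Because candidates are visited in order of increasing P-value, the SNP retained from each correlated group is the one with the lowest P-value.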
Validation of previously identified coronary artery disease susceptibility loci in Koreans
Table 1 presents tests of association between CAD risk and variants previously reported in European populations, in 2123 cases and 3591 controls. Among the CAD-associated loci previously reported in Europeans, the strongest replicated association in Koreans was at the CDKN2A/B locus. Of the 33 SNPs reported in published GWASs, four SNPs located in three loci were associated with CAD risk after correction for multiple testing (P<0.05/28=0.0018). Suggestive evidence of replication (P<0.05) was observed at six loci (PPAP2B, CXCL12, LIPA, APOA1-C3-A4-A5, COL4A1-A2 and SMG6-SRR). The most significant finding was at 9p21.3 (rs1333049 at CDKN2A/2B; OR=1.26, 95% CI 1.17–1.34, P=6.05 × 10−9), followed by another SNP at the same locus (rs4977574 at CDKN2A/2B; OR=1.26, 95% CI 1.16–1.36, P=1.36 × 10−8); both effects were similar in magnitude and direction to previous reports. SORT1 (rs599839) and PDGFD (rs974819) were also confirmed to be associated with CAD risk, although the effect sizes of these two SNPs were smaller than those previously reported in Europeans. The remaining 11 SNPs were not replicated in our study (P>0.05), and their estimated ORs were <1.11, lower than those estimated in populations of European descent. Regional plots showing association by chromosomal position for the three loci are presented in Figure 1.
GWA analysis of CAD
Multiple genomic locations showed potential association with CAD risk (Figure 2). We observed strong association signals in two chromosomal regions, 9p21.3 and 12q24: six SNPs in two genomic loci (ACAD10, C12orf51, CDKN2A/B and RPL6-PTPN11) reached the genome-wide significance threshold of P=5.0 × 10−8. The two genomic loci and their SNPs are summarized in Supplementary Table S1. The top SNP (rs11066015 at 12q24; P=4.51 × 10−11) is located within the ACAD10 gene, which is adjacent to ALDH2. Two other SNPs, rs2074356 and rs11066280 (located −477 kb and −650 kb from rs11066015, respectively), were also significantly associated with CAD risk (OR=1.42, 95% CI 1.28–1.58, P=6.73 × 10−11; and OR=1.33, 95% CI 1.21–1.47, P=1.44 × 10−8). Because these three SNPs lie in the same region, we performed conditional association analyses using the case–control sample (Supplementary Table S2). After adjustment for one another's effects, the associations for rs11066015, rs2074356 and rs11066280 were no longer significant at the genome-wide level, suggesting that they are unlikely to be independent signals. The signals reaching genome-wide significance also included five SNPs mapping to chromosome 9p21.3, a region previously associated with CAD.
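The conditional analyses amount to refitting the logistic model with the other lead SNP added as a covariate and checking whether the original SNP's coefficient remains significant. The sketch below fits a logistic regression by Newton–Raphson in pure Python; it is an illustration under assumptions, not the software used in the study, and all helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(eta: float) -> float:
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular information matrix (collinear predictors?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def _score_and_information(rows, y, beta):
    """Gradient of the log-likelihood and Fisher information at beta."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(rows, y):
        mu = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def fit_logistic(x: Sequence[Sequence[float]], y: Sequence[int],
                 max_iter: int = 50, tol: float = 1e-10
                 ) -> Tuple[List[float], List[float], List[float]]:
    """Fit logit P(y=1) = b0 + sum_j b_j * x_j; return (coefs, ses, Wald P-values).

    Index 0 is the intercept; index j >= 1 corresponds to column j-1 of x.
    """
    rows = [[1.0] + [float(v) for v in xi] for xi in x]
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        grad, info = _score_and_information(rows, y, beta)
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    else:
        raise RuntimeError("Newton-Raphson did not converge")
    _, info = _score_and_information(rows, y, beta)
    p = len(beta)
    # Variances are the diagonal of the inverse information matrix.
    ses = [math.sqrt(_solve(info, [1.0 if i == j else 0.0 for i in range(p)])[j])
           for j in range(p)]
    pvals = [math.erfc(abs(b / se) / math.sqrt(2.0)) for b, se in zip(beta, ses)]
    return beta, ses, pvals
```

For a conditional test of one SNP given another, each row of x would hold [dosage of the tested SNP, dosage of the conditioning SNP, age, gender], and the Wald P-value at index 1 is the conditional P-value. SNPs in perfect LD (r2=1) make the information matrix singular, which the sketch reports as an error.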
Replication and meta-analysis
To search for additional independent genetic risk variants, we selected the 14 most suggestive SNPs from the GWA scan and genotyped them in an independent set of 3052 cases and 4976 controls from the KING study of Japanese individuals. Three loci showed statistically significant associations with CAD risk (P<0.05), whereas the other 10 SNPs showed no significant association in the replication set (P>0.05). The most significant finding was at 12q24.11 (rs3782889 at MYL2; OR=1.26, 95% CI 1.18–1.35, P=1.65 × 10−10) (Table 2), and the combined analysis also reached genome-wide significance (P=3.95 × 10−14). A test for heterogeneity indicated no difference in genetic effects between the GWAS and the replication study for rs3782889 (P for heterogeneity=0.8577). A suggestive replication signal was observed for rs9508025 in FLT1 (OR=1.11, 95% CI 1.04–1.18, P=0.0023). Although MYL2 lies in an LD block completely different from that of BRAP, the association of rs3782889 with CAD did not remain statistically significant after adjustment for the effect of rs11066015 (Supplementary Table S4).
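The combined estimate and the heterogeneity test can be illustrated with a fixed-effect meta-analysis. Note that this sketch swaps in inverse-variance weighting of log ORs, with standard errors recovered from 95% CIs, in place of the sample-size-weighted Mantel–Haenszel approach used in the study, so its results can differ somewhat. Cochran's Q is computed for the heterogeneity test, and its P-value is given only for the two-study case (1 degree of freedom), as here. Function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Z_975 = 1.959963984540054  # upper 97.5% point of the standard normal


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float) -> Tuple[float, float]:
    """Recover log(OR) and its SE from an OR and a 95% CI symmetric on the log scale."""
    return math.log(odds_ratio), (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)


def fixed_effect_meta(estimates: Sequence[Tuple[float, float]]) -> Dict[str, Optional[float]]:
    """Inverse-variance fixed-effect pooling of (log OR, SE) pairs with Cochran's Q."""
    if len(estimates) < 2:
        raise ValueError("need at least two studies")
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - Z_975 * pooled_se),
        "ci_high": math.exp(pooled + Z_975 * pooled_se),
        "p": math.erfc(abs(z) / math.sqrt(2.0)),
        "q": q,
        # Closed-form chi-square upper tail for 1 df only (two studies); None otherwise.
        "p_het": math.erfc(math.sqrt(q / 2.0)) if df == 1 else None,
    }
```

For example, passing log_or_and_se(1.26, 1.18, 1.35) for the replication estimate of rs3782889 together with the corresponding GWAS-stage estimate from Table 2 (not reproduced here) would give a pooled OR and a heterogeneity P-value.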
Imputation analysis of CAD
After SNP quality control and mapping of genomic positions to build 37, a total of 5 125 961 genotyped and imputed SNPs in 5714 samples were retained for analysis. Among previously identified CAD loci, imputed SNPs at the 9p21.3 and 12q24 loci reached genome-wide significance (P<5 × 10−8) (Supplementary Table S7).
We conducted a GWAS in a Korean population with a larger discovery sample than a previous study.14 We evaluated whether 33 SNPs at loci identified in European GWASs are also relevant in Koreans, and four SNPs at three loci were successfully replicated. The main locus is on chromosome 9 (9p21.3) and contains the CDKN2A/B genes, which were previously associated with CAD and myocardial infarction.7, 11, 14, 15, 19 rs1333049 and rs4977574 at 9p21.3 were also associated with CAD in Koreans, with effect sizes similar to those in European populations. Other important loci included SORT1 on chromosome 1 and PDGFD on chromosome 11. Suggestive associations (P<0.05) were observed at six loci (PPAP2B, CXCL12, LIPA, APOA1-C3-A4-A5, COL4A1-A2 and SMG6-SRR). The effect sizes of the confirmed variants were similar to or smaller than those initially reported, a phenomenon frequently observed when validation studies use an ethnic population different from that of the initial discovery.5 The associations of the other SNPs were not significant, possibly owing to the limited power of the current study. For the established risk loci TCF21 (rs12190287), LPA (rs3798220) and LIPA (rs1412444), we could not evaluate the reported SNPs because they were neither genotyped nor successfully imputed (imputation quality R2<0.3). We therefore selected proxy SNPs on the basis of 1000 Genomes pilot 1 data: rs12193973 for TCF21 rs12190287 (R2=0.24 in CHB+JPT; R2=0.35 in CEU), rs9457925 for LPA rs3798220 (R2=0.31 in CHB+JPT; no R2 available in CEU) and rs2246833 for LIPA rs1412444 (R2=0.86 in CHB+JPT; R2=1.00 in CEU).
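The proxy SNPs above were chosen on the basis of r² from 1000 Genomes data. For two biallelic SNPs, r² = D² / [pA(1 − pA) pB(1 − pB)], where D = pAB − pA·pB is computed from phased haplotype frequencies. A minimal sketch, assuming haplotypes are supplied as (allele at SNP 1, allele at SNP 2) pairs coded 0/1; the function name is hypothetical:

```python
from typing import Sequence, Tuple


def ld_r2(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Squared correlation r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP 1, allele at SNP 2), coded 0/1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when either SNP is monomorphic")
    return d * d / denom
```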
The 12q24 locus appears to overlap with loci reported in Europeans and Asians,9, 20, 21 although this locus may have pleiotropic effects on several phenotypes, including those examined in our previous study.18 In particular, the 12q24 haplotype carrying CAD risk alleles is associated with CAD much more significantly in Japanese than in Europeans.9, 20 A previous Japanese study reported significant associations with CAD risk for SNPs rs3782886 and rs11066001 in BRAP (BRCA1-associated protein) and SNP rs671 in ALDH2,14 all located on 12q24. The BRAP SNP rs3782886 identified in the Japanese GWA study is polymorphic in East Asians; conversely, the lead SNP in the European GWA study (rs3184504 in SH2B3) is polymorphic only in Europeans. In our study, strong evidence of CAD association was identified in the GWA scan (P=4.51 × 10−11 at rs11066015 in ACAD10, P=6.73 × 10−11 at rs2074356 in C12orf51 and P=1.44 × 10−8 at rs11066280 in RPL6-PTPN11). rs11066015 in ACAD10 was in complete LD with rs11066001 in BRAP (R2=1), and rs2074356 in C12orf51 and rs11066280 in RPL6-PTPN11 were in strong LD with rs671 in ALDH2 (r2=0.8). Previous studies have also shown significant evidence of natural selection and pleiotropic effects in this region (for example, on CAD,9, 14, 20 blood pressure21, 22, 23 and the regulation of plasma lipid levels21, 24, 25). None of the 12q24 variants associated with CAD in Europeans is polymorphic in Koreans, whereas all CAD-associated variants at 12q24 in Koreans are monomorphic in Europeans. In the present study, rs11066280 was confirmed to be associated with CAD and blood pressure, supporting a previous report.26
Although the association of rs3782889 (in MYL2) was identified in the Korean GWAS and directly replicated in Japanese, we cannot exclude the possibility that the LD block around MYL2 is not independent of the one around BRAP previously identified in Japanese.
To examine the clinical relevance of the replicated SNPs, we assessed whether they are also associated with established epidemiological risk factors. SNP rs599839, which was replicated in our GWAS, showed significant evidence of association with low-density lipoprotein cholesterol and total cholesterol levels, consistent with a previous report.27 For rs974819, we identified a possible new CAD-associated variant, although its association fell short of genome-wide significance (P=6.07 × 10−7); this variant was also associated with diastolic blood pressure. These lipid and blood pressure risk loci may also influence multiple phenotypes that track with CAD, which complicates the interpretation of these findings. Qualitatively, they suggest that some, but not all, biological mechanisms involved in low-density lipoprotein cholesterol regulation and blood pressure may be implicated in the etiology of CAD, consistent with previous CAD association studies.14, 21, 28
The present study has several potential limitations that could explain the failure to replicate some loci previously identified in populations of European descent. First, because of differences in underlying genomic (LD) structure between ethnic groups, the reported SNPs may tag causal variants effectively in populations of European descent but poorly in Koreans. Second, previously examined variants may not be strongly associated with CAD risk in Asians, as illustrated by the null association of rs17114036 (PPAP2B), for which our study had 90% power to detect an OR as small as 1.5; the same null association was also found in a Japanese study.14 In addition, risk estimates for genetic variants may differ between ethnic populations, suggesting that the relative contribution of risk variants to CAD pathogenesis varies between ethnic groups. Sampling bias (population stratification) between cases and controls might also have influenced the observed results. Finally, we cannot exclude the possibility that the effect sizes in the original reports were exaggerated by the 'winner's curse'.29 These limitations should be considered when interpreting this genetic association study. Multi-ethnic studies will be required to better understand the genetic architecture and pathogenesis of CAD.
In summary, we confirmed four SNPs at three CAD loci that were initially identified in Europeans and provided additional evidence for suggestive CAD susceptibility variants. Future studies, including fine mapping, functional assays and replication in large samples from diverse ethnic populations, are needed to validate our results.
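The power statement above (90% power for an OR of 1.5) was obtained with Quanto. As a rough cross-check, power for a per-allele test can be approximated with a normal approximation on the allelic log OR. This sketch is a swapped-in approximation, not Quanto's method, so its numbers can differ; it assumes an allele-based test, a known risk-allele frequency in controls and a two-sided significance level, and all names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def allelic_power(n_cases: int, n_controls: int, p_control: float,
                  odds_ratio: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided per-allele test of a given allelic OR."""
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    p_case = odds_ratio * p_control / (1 - p_control + odds_ratio * p_control)
    # Woolf-type variance of the allelic log OR (two alleles per person).
    var = (1 / (2 * n_cases * p_case) + 1 / (2 * n_cases * (1 - p_case))
           + 1 / (2 * n_controls * p_control) + 1 / (2 * n_controls * (1 - p_control)))
    shift = abs(math.log(odds_ratio)) / math.sqrt(var)
    z_crit = _N.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection region.
    return _N.cdf(shift - z_crit) + _N.cdf(-shift - z_crit)
```

Setting alpha to 5 × 10−8 gives the corresponding approximation at the genome-wide significance level.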
We thank all participants and investigators of the GenRIC study and the Korea Genome Epidemiology Study (KoGES). This work was supported by grants from the Korea Centers for Disease Control and Prevention (4845-301), an intramural grant from the Korea National Institute of Health (2011-N73007-00, 2012-N73003-00).
Supplementary Information accompanies the paper on Journal of Human Genetics website (http://www.nature.com/jhg)