The SNP rs7865618 of 9p21.3 locus emerges as the most promising marker of coronary artery disease in the southern Indian population

Coronary artery disease (CAD) develops primarily through atherosclerosis; however, its prognosis depends on the pleiotropic effects of the genes located in the 9p21.3 region. Genome-wide association studies have revealed an association of variants in this region with CAD pathology. However, a specific marker that predicts CAD development or progression has not yet been identified. In the present study, 35 SNPs at the 9p21.3 region, located in the cyclin-dependent kinase inhibitor (CDKN2A/CDKN2B) genes, were genotyped among 350 CAD cases and 480 controls from the southern Indian population of Hyderabad using the Fluidigm nanofluidic SNP genotyping system, and the data were analyzed using PLINK and R software. Of the 35 SNPs analysed, only one, rs7865618, was found to be highly significantly associated with CAD, even after correction for multiple testing (p = 0.008). The AG and GG genotypes of this SNP conferred 3.08- and 1.93-fold increased risk for CAD, respectively. In particular, this SNP was significantly associated with the severe anatomic (triple vessel disease, p = 0.023) and phenotypic (acute coronary syndrome, p = 0.007) categories of CAD. Pairwise SNP interaction analysis between the SNPs of the 9p21.3 and 11q23.3 regions revealed significantly increased risk for three SNPs of the 11q23.3 region, which were not associated individually, in conjunction with rs7865618 of 9p21.3.


Scientific Reports | (2020) 10:21511 | https://doi.org/10.1038/s41598-020-77080-4 | www.nature.com/scientificreports/

of these variants with different clinical and phenotypic categories of CAD, suggesting the complex nature of interactions between the variants in contributing to the clinical and phenotypic heterogeneity of CAD. However, studies conducted to explain the genetic etiology of the progression of atherosclerosis involving inflammation, cell cycle regulation and apoptosis are inadequate in the Indian context. Harbouring the cell-cycle-regulating cyclin-dependent kinase inhibitor 2A and cyclin-dependent kinase inhibitor 2B (CDKN2A and CDKN2B) genes, the 9p21.3 region was found to play a central role in cell-cycle arrest, affecting cell renewal, senescence, apoptosis and several other cellular processes 13,14. Of the 32 CAD-associated loci established from independent GWAS and meta-analysis studies, the 9p21.3 region was found to be the most replicated locus across the globe, although a lack of consistency in the association pattern was evident across different populations. The CAD-associated variants of this locus were specifically found to be clustered within the 60 kb intergenic region of the CDKN2A and CDKN2B genes, which is referred to as the CAD interval 15,16,17. Various population-based studies revealed associations of different SNPs: rs10757274 and rs2383206 among Caucasians, and rs2383207 and rs10757278 among Germans. Recently, Kalpana et al. 18 observed rs2383206 (GG) to be associated with premature CAD among south Indians. Although that study discussed the association of several SNPs under various genotypic models and ascertained sex-specific patterns, none of these SNPs was significant after Bonferroni correction for multiple testing. Further, two more variants of this locus, rs7865618 (GG) and rs496892 (AA), were found to be associated among coronary artery disease patients with periodontitis from south India 19.
However, these studies were not adequate to draw conclusions on the susceptibility profile of the concerned populations. Given the complex nature of the disease and the enormous ethnic heterogeneity implicit in the Indian population, it is imperative to screen a large number of populations and identify the risk-associated SNPs within 9p21.3 and other CAD-specific genomic regions that can predict the development as well as the prognosis of CAD. The present study examines a comprehensive set of 35 GWAS-identified SNPs at the 9p21.3 chromosomal region for their possible association with risk for CAD and its anatomical and phenotypic sub-categories, either individually or through their interactions with other SNPs in the same region as well as with those of the 11q23.3 region.

Results
Allelic association of 9p21.3 SNPs with CAD. After data pruning, seven of the 35 SNPs were excluded because of either a minor allele frequency < 1% or departure from Hardy-Weinberg equilibrium (p < 0.001), and the remaining 28 SNPs were subjected to further analysis. The results of the logistic regression analysis of the allelic data (Table 1) identified only one of the 28 SNPs (rs7865618) as significantly associated with CAD (p = 0.0003). The minor allele (G) frequency of this SNP was higher in cases (0.50) than in controls (0.41), suggesting that it confers risk. The association remained highly significant even after correction for multiple testing (p corrected = 0.008). Three more SNPs (rs1333048, rs10757274 and rs10757278) that were not significant earlier became significant (p ≤ 0.05) after adjusting for age and sex as covariates. However, the minor allele frequency of these SNPs was higher among the controls, suggesting a protective nature. The genotype-phenotype association analysis of rs7865618, in turn, suggested a highly significant association with CAD (p = 3.86 × 10⁻¹⁰) under the codominant model (Table 2). The AG and GG genotypes were both significantly associated and conferred 3.08- and 1.93-fold increased risk for CAD, respectively. Further, in order to test the possible association of this SNP with the common risk factors of CAD, we performed logistic regression analysis of this SNP with hypertension, diabetes and dyslipidemia, in the control cohort of this study,

Table 1. Allelic association of SNP variants at 9p21.3 chromosomal region with CAD. *SNP significant after multiple correction (p_corrected = 0.008). Bold indicates the significant p value.

Allelic association of 9p21.3 SNPs with anatomical and phenotypic categories of CAD. The results of the allelic association analysis with reference to the four anatomical categories of CAD, viz. insignificant, single vessel disease (SVD), double vessel disease (DVD) and triple vessel disease (TVD), are furnished in Table 3. Except for DVD, rs7865618 was associated with increased risk for all the anatomical categories of CAD. Two other SNPs, rs1333040 and rs10116277, conferred increased risk towards DVD. Two more SNPs, rs17694493 and rs1333048, were associated with increased and decreased risk, respectively, towards SVD. The logistic regression analyses of the SNPs with respect to phenotypic categories (Table 4) revealed a highly significant association of rs7865618 with acute coronary syndrome (ACS) (p = 0.007), which represents the severe phenotypic form of CAD.
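The data pruning and multiple-testing steps reported in the Results can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: genotype counts are given as a hypothetical (AA, AB, BB) tuple, Hardy-Weinberg equilibrium is checked with a 1-df chi-square approximation (PLINK offers an exact test, which may be what was actually used), and the correction for multiple testing is taken to be Bonferroni over the 28 retained SNPs. All function names are illustrative.

```python
import math


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1 - freq_a)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: chi-square approximation, not the exact test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(counts, maf_min=0.01, hwe_p_min=0.001):
    """Keep a SNP only if MAF >= 1% and the HWE p value is >= 0.001."""
    return maf(*counts) >= maf_min and hwe_chi2_pvalue(*counts) >= hwe_p_min


def bonferroni(p, n_tests):
    """Bonferroni-corrected p value (assumed correction method)."""
    return min(1.0, p * n_tests)
```

Under these assumptions, `bonferroni(0.0003, 28)` gives 0.0084, which agrees with the reported corrected value of 0.008 after rounding; this agreement does not by itself confirm the correction method used in the study.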

Pairwise epistatic interactions of 9p21.3 SNPs and their association with CAD.
The analysis of epistatic effects using pairwise logistic regression revealed significant SNP-SNP interactions for three SNP pairs that were associated with CAD (Table 5). Interestingly, while rs7865618 is common to all three pairs, the three SNPs pairing with it become significant only as an outcome of epistasis with this highly significant variant, indicating that this SNP might also contribute to the pathogenesis of CAD through its interaction with other SNPs. However, the outcomes of these epistatic interactions were protective in nature. The haplotype and GMDR analyses for interaction among multiple SNP combinations did not yield any significant results and hence are not presented.
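As an illustration of what a pairwise logistic-regression test for epistasis involves, the following is a minimal standard-library sketch. It assumes the model logit P(CAD) = b0 + b1·A + b2·B + b3·A·B with numeric genotype codes A and B, and a Wald test on the interaction coefficient b3; the genotype coding, covariates and software options actually used in the study are not given in the text, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(X, y, iters=25):
    """Fit logistic regression by Newton-Raphson; return (beta, information matrix)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            mu = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1 - mu)
            for r in range(k):
                grad[r] += (yi - mu) * xi[r]
                for c in range(k):
                    info[r][c] += w * xi[r] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta, info


def interaction_test(g_a, g_b, status):
    """Return the interaction coefficient b3 and its two-sided Wald p value."""
    X = [[1.0, a, b, a * b] for a, b in zip(g_a, g_b)]
    beta, info = fit_logistic(X, status)
    var_b3 = solve(info, [0.0, 0.0, 0.0, 1.0])[3]  # last diagonal entry of the inverse
    z = beta[3] / math.sqrt(var_b3)
    return beta[3], math.erfc(abs(z) / math.sqrt(2))
```

A negative b3 (interaction odds ratio below 1) would correspond to the protective direction described for the three 9p21.3 SNP pairs. The sketch omits covariates such as age and sex, and a production analysis would rely on an established package rather than a hand-written solver.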
Pairwise SNP-SNP interactions of 9p21.3 and 11q23.3 regions. The process of atherosclerosis is a result of disruption in the lipid metabolism and cell proliferation pathways, which is evident from the most significant association of variants at the 11q23.3 and 9p21.3 loci, respectively. There is no consensus information about the interaction between the variants of these loci towards disease progression. As mentioned earlier in this article, rs7865618 of the 9p21.3 locus is a standalone variant significantly associated with CAD and its phenotypic or anatomical categories at the allelic level and in the pairwise interaction analysis. However, the protective effect of the SNP in interaction with other SNPs of the same 9p21.3 region indicated that it may contribute to CAD risk through its interactions with variants at other loci. Given the significant effect of variants in the 11q23.3 region on CAD pathology, we explored the nature of pairwise interactions between the 35 SNPs of 9p21.3 and the 95 SNPs of 11q23.3 that we analysed in the previous study 11,12. The results revealed three significant SNP-SNP combinations between these two chromosomal loci (Table 6), each pair including one SNP from the 11q23.3 locus (rs2187126 in the intron of BUD13, and rs1263163 and rs2849165, both in the intergenic region of the APOA5-APOA4 genes). Interestingly, the SNP from the 9p21.3 region in all three combinations was rs7865618. None of these three SNPs at the 11q23.3 locus showed significant allelic association with CAD in the earlier study, except for the protective effect of rs2849165 11. Nevertheless, in the pairwise SNP interaction analysis, all three were shown to contribute significantly to CAD risk in conjunction with rs7865618 from the 9p21.3 region. Because of the absence of LD among these SNPs, the observed interactions may assume greater biological significance to the pathogenesis of CAD.
These results underlined the significance of the lone SNP rs7865618 from 9p21.3 region either individually or in interaction with 11q23.3 SNPs for contributing to the risk of CAD.

Discussion
The complex nature of the CAD phenotype might be the outcome of interactions between different genomic loci. Since the SNPs at the 11q23.3 region, which harbours the apolipoprotein-coding genes APOA1, APOC3, APOA4 and APOA5 and the regulatory genes BUD13, ZPR1 and SIK3, were found to be associated with defective lipid metabolism, this region was thought to play an important role in atherosclerosis and CAD risk. However, the distinct patterns of association of 11q23.3 SNPs observed with CAD and dyslipidemia in the previous study of this cohort made it imperative to screen other CAD loci to derive the genetic susceptibility profile of the population of Hyderabad. The next prominent locus identified was 9p21.3, which was found to have pleiotropic effects, including a role in atherosclerosis 20. Recently, a male-specific association of rs7865618 from this locus was reported among premature CAD cases of south Indians, albeit not significant after correction for multiple testing 18. The present study, however, confirms its association in the cohort of south Indians from Hyderabad. The association pattern of this SNP is also consistent with earlier reports in which the AG and GG genotypes were shown to be significantly associated with increased risk of CAD 19,21,22. Intriguingly, rs7865618, which emerged as the only SNP significantly associated in the pooled CAD cohort (p = 0.0003), also turned out to be the only SNP significantly associated with the insignificant (p = 0.037), SVD (p = 0.022) and TVD (p = 0.023) anatomical categories and the ACS (p = 0.007) phenotypic category. Hence, it could serve as an independent prognostic marker of CAD in general. This SNP is located in the CDKN2B-AS1 gene, which encodes a long non-coding antisense RNA transcript known as cyclin-dependent kinase inhibitor 2B antisense RNA. It was presumed that the disease risk caused by the SNPs at 9p21.3 acts through this long non-coding RNA, which is commonly referred to as the antisense non-coding RNA in the INK4 locus (ANRIL).
ANRIL was shown to control the expression of three major tumour suppressor loci within the INK4b-ARF-INK4a gene cluster 23. Moreover, expression of ANRIL was observed to be specific to atherosclerotic tissues, where it upregulates cell proliferation 20. A recent study in the south Indian population revealed a significant association of rs7865618 with periodontitis (a key risk factor for CAD) and thereby suggested a possible predisposition to CAD 19. Another study from Tehran observed a similar pattern of association of rs7865618 with coronary heart disease 22. It is plausible to surmise that the minor allele of rs7865618 may encode a defective CDKN2B antisense RNA that affects the expression of the corresponding CDKN2B gene, influencing cell division in vascular tissue and thereby causing susceptibility to CAD development.
Consistent with the association pattern of SNPs with the DVD category, these SNPs were also observed to show significant association with CAD pathology in a north Indian population 24. Therefore, given that the frequency of CAD cases diagnosed in this phase was comparatively lower than that of the insignificant, SVD and TVD categories, DVD may be regarded as a short phase that precedes the development of TVD, which might serve as an indicator of CAD progression. The SNP rs17694493 is intronic, was relatively more significantly associated (p = 0.006) with SVD in the present study and was thought to be implicated in CAD pathology. The significant association of rs7865618 from this region with the severe anatomic and phenotypic categories of CAD indicated the involvement of the 9p21.3 region in CAD progression, although a relatively larger sample size for the subcategories of CAD would have provided sufficient statistical power and a greater degree of confidence in the inference. Further, the interaction analysis between rs7865618 of 9p21.3 and the SNPs rs2187126, rs1263163 and rs2849165 of 11q23.3 indicated that the intronic variant (rs2187126) of the BUD13 gene and the intergenic variants (rs1263163, rs2849165) of the APOA4-APOA5 genes confer a profound risk towards CAD. Either individually or in interaction with other SNPs of the 11q23.3 region, the variants of the BUD13 gene are reported to elevate lipid traits as well as to confer risk towards coronary disease. With its evolutionarily conserved function of forming the pre-mRNA Retention and Splicing (RES) complex, BUD13 appears to be one of the key regulating genes of the 11q23.3 region.

Table 6. Significant SNP-SNP interactions between 9p21.3 and 11q23.3 regions with effects on CAD. *SNP significant for allelic and genotypic association with CAD. Bold indicates the significant risk conferring lone SNP and the p values of epistasis.
Further, the APOA4-APOA5 genes are well known for their role in coding activating enzymes of cholesterol metabolism, thereby regulating cholesterol homeostasis 11. It may be hypothesized from the findings of the present study that the variants in the 11q23.3 chromosomal region, which harbours apolipoprotein-encoding or regulating genes, might be involved in defective cholesterol homeostasis, resulting in increased levels of oxidised low density lipoproteins (LDLs). This is the first event in the process of atherosclerosis. Subsequently, variants of other pathways, such as immune and inflammatory or cell cycle regulation pathways, might lead to an imbalance in the further events of atherosclerosis, such as the invasion and functioning of cells of the immune system, thereby leading to formation of a fibrous cap in the inner lining of blood vessels and to thrombosis or clot formation. The CDKN2B-AS1 gene harboured at the 9p21.3 region might play an important role in the later process of atherosclerosis. However, in vitro studies on the expression levels of these genes might help in validating this hypothesis.
In conclusion, the SNP rs7865618 of the CDKN2B-AS1 gene from the 9p21.3 region could be a strong candidate to serve as a predictive marker for CAD risk. This SNP may be screened in nearly all the anatomic and/or phenotypic forms of CAD in order to determine the prognosis or severity of the disease. It might also be prudent to screen for rs7865618 in other complex diseases, such as hypertension and type 2 diabetes, as a precautionary/preventive measure against the possible future development of CAD. However, since this SNP showed interactions with other SNPs at the 11q23.3 locus in causing risk for CAD, 9p21.3 remains the most significant region, with rs7865618 as the predictive marker that needs to be considered for functional evaluation. In addition, analysis of interactions between risk variants of the various GWAS-identified CAD loci would help in identifying causal variants for CAD complexity and phenotypic heterogeneity.

Materials and methods
All methods were carried out in accordance with relevant guidelines and regulations. The study protocol was approved by the Indian Statistical Institute Review Committee for Protection of Research Risks to Humans. Written informed consent was obtained from all the participants as per the guidelines.
Study design, sample and data collection. The present study was part of a major project on CAD carried out by the corresponding author at the Indian Statistical Institute, Hyderabad during 2011-2016. For this case-control study, a total of 1024 subjects comprising 508 CAD cases and 516 controls broadly representing the populations of the undivided Andhra Pradesh were included. The CAD cases were recruited from the CARE hospitals, Hyderabad, after their evaluation by interventional cardiologists. Patients with characteristic symptoms of stable/unstable angina pectoris along with variable degrees (generally > 40%) of stenosis in at least one of the major coronary arteries as determined through angiogram were included in the study. Cases with monogenic diseases, valvular heart disease, cardiomyopathy, renal disease, acute and chronic viral or bacterial infections, asthma, tumours or connective tissue diseases and other vascular diseases were excluded from the study. All the cases were evaluated by interventional cardiologists at the CARE Hospitals, Hyderabad, for the above mentioned criteria. The baseline characteristics of CAD patients recruited for the present study were already furnished in the previous paper 11 .
The CAD cases were categorized into the following four anatomical sub types 12: (1) cases with 40-70% stenosis and symptomatic for CAD with characteristic atherosclerotic lesions as 'insignificant' disease, (2) cases with > 70% stenosis in any one of the major coronary blood vessels as 'SVD', (3) cases with > 70% stenosis in two major coronary blood vessels as 'DVD' and (4) cases with > 70% stenosis in three major coronary blood vessels as 'TVD'. We also categorized the cases by phenotypic severity into three broad classes: (1) those with characteristic symptoms of stable or unstable angina, (2) those with symptoms of ACS, and (3) those with reported myocardial infarction (MI). For some of the case samples, we could not retrieve the information needed to assign these subcategories; hence, the total number of CAD cases used for the anatomical and phenotypic severity categories differs from the sample of pooled CAD cases.
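The anatomical categorization above can be expressed as a small rule, shown here as a sketch only. The input format (a list of percent stenosis values, one per major coronary vessel) and the function name are assumptions for illustration; the study assigned categories from angiograms evaluated by interventional cardiologists, not from a script.

```python
def anatomical_category(stenosis_pct):
    """Classify a CAD case from per-vessel percent stenosis values.

    Hypothetical input: one value per major coronary vessel.
    Returns 'TVD', 'DVD', 'SVD', 'insignificant', or None when no
    vessel reaches the 40% threshold used in the categorization.
    """
    severe = sum(1 for s in stenosis_pct if s > 70)
    if severe >= 3:  # assumption: three or more vessels are treated as TVD
        return "TVD"
    if severe == 2:
        return "DVD"
    if severe == 1:
        return "SVD"
    if any(40 <= s <= 70 for s in stenosis_pct):
        return "insignificant"
    return None
```

For example, `anatomical_category([80, 75, 30])` returns `"DVD"`, while a case whose worst lesion is 55% is labelled `"insignificant"`.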
The control samples were collected from Hyderabad and its vicinity, broadly representing an ethnic composition and socioeconomic background similar to those of the cases, and were aged above 45 years. The population of Hyderabad is a conglomeration of people from different parts of the undivided state of Andhra Pradesh, and the mother tongue of most of its population is Telugu, one of the four major Dravidian languages. It is also pertinent to note that, despite the subdivision of the Telugu population into a number of traditionally endogamous castes and sub castes, Reddy et al. 25 observed genetic differentiation among the populations of Andhra Pradesh to be very low and insignificant; the Markov chain Monte Carlo analysis of population structure, which implements a model-based clustering method for grouping individuals into populations 26,27, did not reveal any unique population clusters, suggesting a high degree of genetic homogeneity.
The epidemiological and clinical data pertaining to the individuals who participated in the study were obtained through personal interviews using a detailed questionnaire, and from the hospital records. About 5 ml of peripheral fasting blood was collected from each subject by certified medical lab technicians. All blood samples were used for isolation of DNA using the phenol-chloroform method 28. The quality and quantity of the isolated DNA were determined with a Thermo Scientific Varioskan Flash Multimode Reader using the Quant-iT PicoGreen dsDNA Assay Kit. Quantification of the samples was done at Sandor Lifesciences, a medical laboratory in Hyderabad.