Integrative functional genomics identifies regulatory mechanisms at coronary artery disease loci

Coronary artery disease (CAD) is the leading cause of mortality and morbidity, driven by both genetic and environmental risk factors. Meta-analyses of genome-wide association studies have identified >150 loci associated with CAD and myocardial infarction susceptibility in humans. A majority of these variants reside in non-coding regions and are co-inherited with hundreds of candidate regulatory variants, presenting a challenge to elucidate their functions. Herein, we use integrative genomic, epigenomic and transcriptomic profiling of perturbed human coronary artery smooth muscle cells and tissues to begin to identify causal regulatory variation and mechanisms responsible for CAD associations. Using these genome-wide maps, we prioritize 64 candidate variants and perform allele-specific binding and expression analyses at seven top candidate loci: 9p21.3, SMAD3, PDGFD, IL6R, BMP1, CCDC97/TGFB1 and LMOD1. We validate our findings in expression quantitative trait loci cohorts, which together reveal new links between CAD associations and regulatory function in the appropriate disease context.


Reviewer #2 (Remarks to the Author)
The revised version of this manuscript is improved over the previous version. Just as I had noted in my previous review, the paper offers a tremendous amount of new data that should be of considerable interest to the field. However, I still have concerns about the authors conclusions regarding the enrichment of inflammatory SNPs (and other disease-associated SNPs) in the ATACseq peaks from HCASMC cells (Fig 2e). The authors present the results of similar analyses from other cell lines (K562, Glioblastoma cells, and GM12878 cells), and the interpretation of these results is VERY subjective. This needs to be better fleshed out. Specifically, are the open chromatin peaks which harbor the 7 functional SNPs specific to HCASMC cells, or not? Are these sites the same sites that show enrichment of the immune-disorder and schizophrenia SNPS? Or are the immune/schizophrenia SNPs mapping to the same locus, but different sites of open chromatin from the CAD SNPs? Furthermore, it would be nice to know if these regions differ with respect to K27ac levels. This is important, as it has implications for assessing whether or not the CAD SNPs exert cell type specific effects and how they ultimately confer susceptibility to CAD.

Reviewer #3 (Remarks to the Author)
In response to Nature Genetics reviewers comments the authors have responded point by point. Responses: 1.1 The authors response to explain the use of TGF-b and PDGF to stimulate the cells prior to genomics analysis is justified with the additions in the main text and figures and the use of appropriate references. 1.2 The authors response to explain the use of ATAC-seq and H3K27Ac is satisfactory. 1.3 Error is corrected. 1.4 Response satisfactory 1.5 Response satisfactory (added comments on main text) 1.6 Even if selective omission of a rogue sample improved somewhat the separation of PCs it is obvious that there is no clear clustering. Since this is obvious there is not much else that the authors can do other than state the challenging nature of the samples used for readers that are not familiar with these kind of samples. 1.7 Response satisfactory (added text and Fisher test p values. 1.8 The response of the authors to comments on ASE of SMAD3 isnot satisfactory. I found frustrating to navigate the manuscript main text and supplement to identify which SNP was used as a transcript surrogate for rs17293632 and what is the r2 or D' value representing the haplotype relationship between the genetic marker and the transcribed SNP. If the authors want to uphold the claim that there is indeed rs17293632 SMAD3 ASE these need to be presented clearly. The use of more inclusive premRNA species does not solve the problem that rs17293632 and the SNP used to assess SMAD3 allelic levels are weakly linked. Since this is an important part of the downstream experimental validation of the genomics analysis I think a more detailed description of the experimental details is needed in the methods section. 1.9 & 1.10 Response satisfactory. From the data presented on fig 6b,c,d it is clear that this is a SNP that disrupts TF binding and is an eSNP for SMAD3 in HACSMC. This might not be statistically significant in cell heterogeneous tissues but the use of single cell types in culture is proving here to be an advantage. 1.11 Response satisfactory. However, as above include a table with the relationship between marker and transcribed surrogate so weak relationships such as(r2 0.4)are obvious. This is included in the response to comments but has not been incorporated in the manuscript.  a table as Supplementary Table 16 in the text (also see below), which includes the candidate SNP, transcribed SNP used to detect ASE, as well as the LD information from 1000 Genomes Phase 1 data in Europeans (both r 2 and D' values). As noted in the text, as well as this table, due to the weak linkage between rs17293632 and the proxy transcribed SNP, rs1065080, we measured the allelic imbalance in SMAD3 pre-mRNA levels using the candidate SNP, rs17293632 itself. As we stated in the previous rebuttal, this approach has been successful in deducing ASE for intronic SNPs (Remmers et al. Nat Genet 2010) given that intronic reads are an accurate measure of nascent transcription genome-wide (Gaidatzis et al. Nat Biotech 2015). Just to clarify, these data shown in Fig 6e. have replaced the ASE data using the proxy SNP, rs1065080, as noted in the previous rebuttal. We have now added this justification and experimental details to the results and methods sections.  Results, page 9, line 23: Given the weak linkage of a proxy transcript SNP to detect ASE, we measured SMAD3 pre-mRNA levels using individuals heterozygous at rs17293632 itself. This approach is supported by genome-wide methods correlating intronic reads to changes in transcription{Gaidatzis, 2015 #137}.

Methods, page 21, line 7:
In the case of SNP rs17293632, given the weak linkage between the proxy transcript SNP rs1065080, we used the intronic SNP rs17293632 itself to measure imbalance in the nascent SMAD3 pre-mRNA as a transcription surrogate{Gaidatzis, 2015 #137}. Another criteria for detecting ASE with these markers was MAF >0.15 in Europeans (as determined from 1000 Genomes data), since this is a limiting factor to obtain sufficient heterozygous individuals from the cohort of HCASMC. Refer to Supplementary  Table 16 for further details on these proxy transcribed SNPs. perturbed cultured human coronary artery smooth muscle cells, and in normal and atherosclerotic human coronary artery tissues. This highlighted a number of variants, which were subsequently studied/validated to some extent. Although the data does not contribute groundbreaking insight into the mechanisms of CAD, the approach for prioritising variants is highly valid and is likely to be emulated in other disciplines. The data appears robust and the manuscript is written clearly. Response: We appreciate the reviewer's positive comments.

Comment 3.2.
An important aspect of the work are the epigenetic studies under perturbed conditions. It will be very informative if the authors provide data as to how this has informed the selection of variants as compared to when such perturbation was left out. In other words, how useful / necessary was this perturbation? Response: We agree that this is important to clarify. In order to fully assess the impact of these perturbations on prioritizing CAD variants, we performed a differential accessibility analysis of ATAC-seq reads under each condition using the edgeR R/bioconductor package. While this software is typically used for gene expression analysis, we adapted it to define differential open chromatin by generating a read count table around the 5,240 candidate CAD SNPs. Using the serum-free condition as the Control and either TGFB, PDGF-BB, or PDGF-DD as the Experimental condition, we implemented a generalized linear model to test for differential accessibility between conditions using a likelihood ratio test to calculate significance at P < 0.05. From this analysis we defined 95 variants that resided in differentially accessible chromatin. These results are now included as Supplementary Table 17. Two of these variants were among our top candidates (rs2019090; P =0.0010 and rs73551707; P = 0.0229). However, as we noted previously, due to the variance between the different donors ( Fig. 1f) and the intrinsic activation of HCASMC when in culture, it may be difficult to identify a large number of differentially accessible regions overall. Nonetheless, we did perform overlaps of the variants using open chromatin regions in HCASMC cultured under these various perturbations. For instance we identified 104, 109, and 105 variants in each of the conditions (TGFB, PDGFBB, and PDGFDD, respectively) compared to 87 variants in the serum-free control condition. After applying functional annotations to these variants using ENCODE and Roadmap data this resulted in 72, 73, and 73 candidate variants, versus 64 in the control condition. In terms of variant selection, these conditions did not significantly influence the final candidate variant list. The more useful "perturbation" turned out to be the ex vivo coronary artery ATAC-seq data, which helped reduce the final candidate variants to 30, and ultimately 26 after applying the annotations.
We have now updated Supplementary Fig. 1 to include this information. In the case of the open chromatin region covering SNP rs7549250, besides HCASMC this region is accessible in a range of endothelial cells, as well as pancreatic and glial cells, while for rs1537373 this region is accessible in fibroblasts, and also in cervical and T-cells. This indicates that even for the most cell-specific of these 7 SNP/open chromatin regions, cell specificity is not limited to HCASMC or physiologically and phenotypically SMC-related vascular cell types, such as endothelial and fibroblast cells, but instead represents a range of diverse cell types (e.g. astrocytes, pancreatic cells etc), ultimately which may have non-overlapping effects on gene expression.

Reviewer #4
We also investigated the overlap of our top 7 candidate SNPs with Roadmap Epigenomics open chromatin and histone modification ChIP-seq data from 127 different cells and tissues (many of which are primary tissues not represented in the ENCODE dataset). These tissues are ranked by binomial P-value of enrichment for the SNP overlap versus all GWAS catalog variants as a background. From this analysis we observed that 3 of our top candidate variants (rs1537373, rs17293632, and rs2019090) were enriched in Aorta (in bold), with rs2019090 (PDGFD) enriched in a more limited set of tissues including Aorta. Interestingly all 7 showed significant enrichment for smooth muscle tissues (also in bold) in the GI system (including small intestine, colon, duodenum, esophagus, stomach and rectum). Again, while these regions are not HCASMC specific, they likely represent enhancers that are shared between tissues, related to the developmental origin or physiological function of the tissue composition (e.g. smooth muscle contraction in GI system (peristalsis) versus coronary vasculature (blood flow to heart). We have summarized this Roadmap data below: H3K27ac does not show clear peaks as in transcription factor ChIP-Seq analysis, rather it shows a broad trace, thus here we focused on regions +/-1000bp surrounding the SNPs and counted the reads with bedtools multicov. When we counted the H3K27ac coverage in a window +/-1000bp around the 7 candidate CAD SNPs and compared it to the IgG control we obtained the following results. This indicates that these SNPs are surrounded with the H3K27ac modification, however the signal varies among them, with rs17293632 and rs2241718 having the strongest H3K27ac signal represented as fold change normalized to number of reads. Next we selected SNPs from the GWAS catalog for schizophrenia, bipolar disorder, rheumatoid arthritis, inflammatory bowel disease, Crohns disease, coronary artery and coronary heart disease, and tested the signal of H3K27ac in +/-1000bp regions. We performed the same analysis as above and obtained normalized fold changes for 4 biological replicate H3K27ac experiments (from different HCASMC donors) and calculated an average value for each SNP. The lead SNPs from different GWAS categories do not differ in their HCASMC H3K27ac signals overall, which is also very low, and that is somewhat expected as lead GWAS SNPs are often located outside histone modification rich regions. However, H3K27ac signals appear to be much higher for the 7 tested SNPs in our study (see Figure below).  Supplementary Fig. 7. Consistently, we observed that the signal for immuno-and brain-related categories remained high in all three experiments, further supporting our hypothesis of shared pathways between these complex phenotypes.