Genome‐wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease

Identifying genetic variants associated with circulating protein concentrations (protein quantitative trait loci; pQTLs) and integrating them with variants from genome-wide association studies (GWAS) may illuminate the proteome’s causal role in disease and bridge a knowledge gap regarding SNP-disease associations. We provide the results of GWAS of 71 high-value cardiovascular disease proteins in 6861 Framingham Heart Study participants and independent external replication. We report the mapping of over 16,000 pQTL variants and their functional relevance. We provide an integrated plasma protein-QTL database. Thirteen proteins harbor pQTL variants that match coronary disease-risk variants from GWAS or test causal for coronary disease by Mendelian randomization. Eight of these proteins predict new-onset cardiovascular disease events in Framingham participants. We demonstrate that identifying pQTLs, integrating them with GWAS results, employing Mendelian randomization, and prospectively testing protein-trait associations holds potential for elucidating causal genes, proteins, and pathways for cardiovascular disease and may identify targets for its prevention and treatment.

A very helpful addition is Figure 1. While my own taste would have some of those numbers changed (e.g., rather than total number of associations which most don't care about, the number of physical loci tagged by LD-independent sentinel SNPs), this is helpful to focus on what's potentially the reported outputs from the paper.
The challenge I have on this read is that each of the presented analyses don't feel "tight" or "conclusive" (save the initial discovery, minus the relatively large number of trans-pQTLs) around any given story. Each individual unit has some challenges that aren't closed up: -"enrichment with eQTLs" -this represents enrichment, not formal co-localization. Chance overlap is very likely to happen, so the numbers there aren't surprising nor to they tell us about specific pathways or biology -"functional, reg, clinical enrichments -enrichment of CAD pathways for a preselected set of proteins screened for CHD relevance is circular. The chance overlap with coding variants isn't explored deeply -the work on page 5 is really high level or confirmatory reporting.
-"MR" -done with trans-QTLs which is a difficult set to explore. The genes highlighted and connection to epidemiological correlation is not crisp. The genes that are implicated here don't have an obvious, biological story.
-"pQTL overlap with CHD SNP" -again, we care about co-localized results, not chance overlaps. You'd expect some of this. No "tight" inference directed to a biological inference.
The above issues are placed in a context of a /large/ number of trans-pQTL discoveries (more numerous than cis-pQTLs), which even after a more stringent testing correction still makes me concerned. This is not to say that there aren't potential good things in the paper -the discovery and efforts to replicate are laudable; multiple studies with different platforms for assessment of variation. It is definitely believable that there exists pQTLs in there.

Reply:
We thank the reviewer for the complimentary comments and useful suggestions for further improvement. As detailed below, we have made numerous revisions in response to the reviewer's suggestions: 1. Based on reviewers' comments, we have raised the SNP imputation quality r 2 from 0.3 to 0. 5 Table S6) consisting of 40 sentinel cis-pQTLs ( Figure 2a) and 63 sentinel trans-pQTLs (Figure 2b). Among the 16,602 pQTL variants, 341 were coding variants associated with 19 proteins (Table S7) and 33 were rare variants (minor allele frequency <1% genotyped on Exome Chip) associated with 17 proteins (Table S8). In addition, 1,689 insertion/deletion polymorphisms were identified for 55 proteins ( "Trans-pQTL variants and CHD: Of the nine proteins with pQTL variants that coincided with CHDassociated SNPs, eight had pQTLs with trans effects (54 non-redundant trans-pQTL variants in total) and 69% of these trans-pQTL variants were also associated with the expression of nearby genes (cis-eGenes), i.e. these trans-pQTL variants were also cis-eQTL variants associated with the expression of nearby cis-eGenes. Based on these findings, we hypothesized that trans-pQTL variants may regulate circulating protein levels through cis-effects on the expression of nearby cis-eGenes ( Figure S3). To test this hypothesis, we employed Mendelian randomization (MR) 16 Figure 5) " 2. Figure 1: We have clarified our flow diagram by stating the total number of physical loci identified: "GWAS of 71 proteins yielded 16,602 pQTL variants representing 103 loci for 57 proteins." Please see revised Figure 1 below: 3. Enrichment with eQTLs: We have replaced the "Enrichment of pQTLs with eQTLs" section with "Colocalization of pQTLs and eQTLs". Please refer to our new analyses and our reply to Comment 2. 4. Functional, regulatory, and clinical enrichments: We conducted pathway enrichment for each protein using the annotated genes corresponding to its pQTL variants. We did not conduct pathways enrichment using the preselected proteins because they represent a biased set of proteins with a high prior probability of association with cardiovascular disease. 5. MR: We did not use trans-QTL variants as instrumental variables in any MR analyses. We only used cis-eQTL variants or cis-pQTL variants. Please refer to our detailed reply to Comment 5. 6. pQTL overlap with CHD SNPs: We have conducted a new colocalization analysis of pQTLs and CHD GWAS SNPs. Please refer to our reply to Comment 3.

Reviewer 1 Comments.
Comment 1. Power calcs. Given the size of the replication study, and the given effect sizes (which are likely winner's cursed, which is central to the authors' contention), it should be knowable the extent to which replication was expected. Does the given number match these expectations, exceed, or are less than? The author's response here -that lower replication is due to loss of power from either (i) effect sizes, or (ii) sample size are essentially the 'same coin' by and large. But that is knowable, to some extent, given the replication attempt they did employ. Does the % yield make sense?If predicted power was high for all attempts, then it begs the question if the criteria for significance is not stringent enough, or that there's something else that is not able to be understood.

Reply:
We thank the reviewer for this suggestion. We have performed 1000 re-samplings of 3300 FHS participants to evaluate our power for replication. We predicted that there would be 80 sentinel pQTLprotein associations with 80% power to replicate. We found that 54 (67%) of them replicated in the external INTERVAL cohort. The failure of proteins to replicate may be due proteomic platform differences between discovery and replication studies.
We have modified the Methods section as follows:

Reply:
We thank the reviewer for this suggestion. We have conducted de novo colocalization analyses using the COLOC package as the reviewer suggested. Among the 33 proteins that have both cis-pQTLs and cis-eQTLs, 19 proteins were found to have >75% probability of sharing a causal SNP.
We have modified the Statistical Methods section as follows: "Colocalization analysis: For the colocalization analyses, we first pruned our pQTL variants, retaining only those with LD r 2 <0.1. Using FHS eQTL results 20 , we then identified transcripts that were also associated with the same pQTL variant (i.e. pQTL variant = eQTL variant). To estimate the probability that cis-eQTLs and cis-pQTLs residing in the same genomic location shared the same causal variant, we conducted a Bayesian test for colocalization of all eQTL-pQTL pairs using the coloc package in R 21 . This method requires specifying a prior probability for a SNP being associated with gene expression only (p1), protein level only (p2), and with both traits (p12). We applied the default P values, with p1 and p2 set to 1E-4, assuming that 1 in 10,000 SNPs are causal for either trait and p12 was set to 1E-5. We also used p1 and p2 values based on the number of eQTL and pQTL variants observed in our data. For the eQTL analysis, we detected 19,613 nonredundant eSNPs among 4,285,456 total cis-SNPs, indicating that the probability a SNP is a causal eSNP is 4.6E-3. This probability corresponds to the sum p1 + p12. For the pQTL analysis, we detected 254 non-redundant pQTL variants among 440,409 total cis-SNPs, indicating that the probability that a SNP is a causal pQTL variant is 5.7E-4. This probability corresponds to the sum p2 + p12. p12 was set to 0.003 corresponding to a probability of 75% that a causal eSNP is also a causal pQTL variant, an approach that has been shown to represent the best choice 77 ." We modified the Results section as follows: 2b. In addition, here you are also potentially implicating a specific gene. So the question is not just of statistical co-localization, but also that the same gene is implicated. At face, the fact that only 11/114 of the sentinel pQTL variants ping the same gene probably begs some further scrutiny. If the tissues are approximately the same assayed, shouldn't this be higher? What other ways in which a pQTL could be biologically bona-fide, yet not generate an eQTL signature in the appropriate tissue?
Reply: Although pQTLs and eQTLs were both identified in FHS participants, gene expression and protein expression were measured at different time points, on average 6-7 years apart, as described in the Methods section. To avoid misinterpretation, we have replaced the "Enrichment of pQTLs with eQTLs" section with "Colocalization of pQTL variants and eQTL variants". Please see the reply to Comment 2a for details.
2c. For the 11/14, there seems like a majority are directionally consistency (I read: 8/11), but shouldn't this be substantially higher/near perfect? That is, unless the pQTLs and eQTLs aren't co-localized. Let's say all of these 11 co-localized. Can the authors describe a model for which this inconsistency of directionality makes sense?
Reply: Because gene expression and protein expression were measured at different time point, we are reluctant to compare directional consistency between gene expression and plasma protein levels. As the reviewer suggested, we have replaced the "Enrichment of pQTLs with eQTLs" section with "Colocalization of pQTL variants and eQTL variants". Please see our reply to Comment 2a for details. Comment 3. CHD overlaps (comments 6a).
3a. The authors state that the "localization" was performed "by direct matching" which isn't exactly what one wants here. As above, how many of the pQTLs actually co-localize with CHD associations? COLOC and other statistical tools here will tell you if the distribution of associations signals for both scans "line up", implicating the same causal variant. The implications is that two associations partially overlap, but don't implicate the same causal variant and/or gene, which could be coincidental.

Reply:
We thank the reviewer for this suggestion. Colocalization analysis can only be applied to cis-pQTL variants that overlap with GWAS SNPs for CHD (at P<5e-8). The only protein eligible for colocalization analysis using cis-pQTL variants was LPA. The co-localization result for LPA was highly significant, as shown below. Note that for all other proteins with pQTL variants that coincided with GWAS SNPs for CHD, the pQTL variants were exclusively trans-pQTLs and colocalization could not be performed. Given the scarcity of proteins eligible for co-localization analysis, we decided to not present the results for LPA (for which we instead share the more informative MR results 4. trans-QTL effects. The number of effects here -measured as a function of the 'sentinel' loci -is 2:1 that of the cis-eQTLs. this doesn't feel right. Empirically, at least in gene expression space, we find far more cis-eQTL effects than trans-eQTLs. I have to believe that a similar intuition applies here. Put another way: not all proteins measured have a detectable cis-pQTLs (at most 50%, 32 of / 71), which to me is surprising.

Reply:
The reviewer is comparing cis-eQTLs with cis-pQTLs, but SNP effects on gene expression may be quite different from those on protein expression. We found that trans-pQTL variants can regulate remote proteins through the expression of nearby cis-eGenes (Please refer to the "Trans-pQTLs and CHD" section in manuscript for details). Therefore, it may not be surprising that we observed fewer cis effects than trans effect. To address this further, we checked on the cis vs. trans associations reported in the INTERVERAL study (https://www.biorxiv.org/content/early/2017/05/05/134551), which assessed pQTL variants for ~3000 proteins. We discovered that they had very similar results to ours. They found 1478 proteins to have pQTLs. Among them, 374 (25%) proteins had cis-pQTLs only, 925 (63%) proteins had trans-pQTLs only, and 179 (12%) proteins had both cisand trans-pQTLs. Therefore, we concluded that our cis-to-trans ratio, while different from that for eQTLs, is consistent with that from the INTERVAL study.
5. The Causal inference / MR (+ epidemiological correlation) experiments, linking pQTLs to gene loci 5a. These results (based on trans-QTLs which are undoubtedly amongst the most likely to have additional, pleiotropic effects) are perhaps the least optimal to include in an MR experiment for an outcome. intrinsically, these are the hardest to believe on face.

Reply:
We agree with the reviewer that trans-QTLs are more likely to violate the assumptions of MR testing, due to potential pleiotropic effects. The same concern had been raised in the last round of review. Therefore, in the submitted manuscript, we updated all MR analyses using only cis-pQTLs as instrumental variables for each protein.
In the current manuscript, we further modified the Methods and Results sections to emphasize this information more clearly.
We have modified the Statistical Methods section as follows: 5b. An example as stated earlier is the CELSR2, APOB relationship. surely this must reflect Reply: As noted above in our reply to Comment 5a, we only used cis-pQTLs to explore the causal associations between protein levels and CHD. As we did not identify any cis-pQTL variants for APOB in the current study, we did not further explore the causal effect of circulating APOB levels on CHD. However, we considered it interesting to understand the underlying regulatory mechanism between trans-pQTLs and protein levels. Therefore, we used cis-eQTL variants for genes residing within the CELSR2/SORT1/PSRC1 locus as instrumental variables to test if circulating levels of APOB are causally affected by cis-eGene expression. We found that PSRC1 in the CELSR2/SORT1/PSRC1 locus causally affected APOB protein levels. Please refer to Figure S3 and the "Trans-pQTLs and CHD" section in the manuscript for details.
5b. But then Figure 5 attempts to put together the MR effect sizes with the associated hazard in FHS. The story to take back from this plot is challenging. You have a ton of things that are null (mostly the Epi HRs) but then opposite direction (PON1, BCHE, sRAGE). The positive controls are certainly working (LPA), and maybe there a signal there -but it looks very noisy.

Reply:
We thank the reviewer for this comment. When interpreting the results from the MR test, we considered it worthwhile to compare the putative causal estimates with the observed protein-trait associations, taking advantage of the longitudinal cardiovascular disease outcome data available in the FHS. When there was an overlap between the confidence intervals of the MR odds ratio and the longitudinal protein-trait hazards ratio, we considered the putative causal estimate and the observed protein-trait estimate to be not significantly different. We therefore included both estimates in Figure 5 to allow such comparisons. When the 95% CI of the MR odds ratio and the longitudinal protein-trait hazards ratio did not overlap, the putative causal estimate and the observed protein-trait association estimate were considered to be significantly different. This was the case only for PON1, where the putative causal estimate and observed protein-trait estimate differed significantly. We further addressed a potential explanation for this unexpected difference in the "Novel Proteins and Pathways Implicated in CHD" section.
# minor comments -how were the statistical threshold determined? 1.2E-7 for cis pQTLs? the 7E-10 I savvy, given the backof-envelope I suggested to the authors. But in the methods, it should be clear exactly how they arrived at the threshold they did.

Reply:
We thank the reviewer for this suggestion. We have revised the Methods section as follows: "We estimated that there were 440,409 potential cis SNP-protein pairs in total. Therefore, the Bonferroni-corrected P value for cis-pQTL variants was calculated as 0.05/440,409 = 1.25E-7. The number of potential trans SNP-protein pairs was 8.5 million (SNPs) x 71 (proteins) -440,409, yielding a Bonferroni corrected P value for trans-pQTLs of 7.04E-10." Reviewer #1 (Remarks to the Author): The authors provide a details response to my concerns. I have additional comments here.
a. In my previous comments, I suggested that because the enrichment analysis based on the proteins pre-selected to related to CHD, any enrichment analysis performed here is to be expected.
In looking more closely at this, it doesn't appear that there's any statistical quantification of enrichment at all here (w.r.t. gene enrichment). The use of the specific tool is simply as one for annotation. The annotations w.r.t. gene location are fine.
As such, I'm not sure at all what the the last 3 sentences add, or Figure 2. If the genes selected for protein quantification are already CVD related, then all of this is known, enrichments are not tested, so I don't know wha any of this actually adds.
b. It looks to me like the KEGG Pathway analysis involves taking all the genes in the interval around a pQTL -not sure this realistically adds anything at all. c. It also seems like most of the supplement ( Figure S2) is a long list of these outputs -again, I'm not sure 100+ pages of these figures really add much and certainly aren't particularly useful.
2. Co-localization analysis. The authors respond by performing a colocalization analysis with coloc. The authors state that the first thing they did was "pruned the pQTL variants, retaining only those with LD r2 < 0.1". I'm not entirely sure I understand what this means; the specifics for priors I understand (could be left in methods), but the overall approach as worded is unclear enough to me that I want to make sure this analysis is performed in the way that I expect it to.
Can the author confirm the following general approach: a. Identify a locus that harbors a pQTL -e.g., at least one sentinel SNP that is the most strongly associated, surpassing a multiple-test correction. Note the targeted associated gene of interest.
b. Determine if, for the given target gene, if there exists an associated eQTL for that gene in FHS (blood).
c. If there exists at least one eQTL associated variant in the interval, perform co-localization, i.e.
d. Take the regional association data for all variants (say, that spans the territory), for pQTL and eQTL association. This step should involve all SNP associations data, not pruned data (looking at Table S12, not sure what n(SNP) means -this can't be the number of SNPs going into the coloc analysis?) e. Report PP1, PP2, PP3, PP4 data. Ideally, PP4/(PP3+PP4) suggests some localization, very high PP4 values denote very strong association f. Visual (heuristic) conformation of localization could be achieved using locus zoom plots. I suggest manually checking to eyeball the statistical values reported. If there are truly strong PP4 scores, the patterns of eQTL and pQTL associations OVERALL in the region will appear strongly correlated.
g. To close the loop, one could then also perform formal conditional analysis on the lead SNP. If the sentinel SNP is not the same, but the signals statistically localize, conditional analysis on one (or the other) SNP in the pQTL or eQTL can would ablate each other's signals, respectively. I would perhaps perform this only if one had a really interesting result and want to make sure the signals truly to map to the same signal (and, potentially, the same causal variant). This doesn't need to be done on everything (or anything, if specific pQTL:eQTL pairs aren't interesting enough to warrant it).
3a. Section trans-pQTL and CHD: They authors present a model whereby trans-pQTLs act as they do, because they are cis-eQTLs. I would believe this to the extent that the cis-eQTL and trans pQTL data localize to the same variant.
3b. Moreover, the interpretation of this is challenged somewhat, as the "cis-eQTL" gene that is emphasize for many of the results is PSRC1. For the biology here, it's clear that there is far more compelling data supporting SORT1 as the relevant gene. Because the eQTL for PSRC1 and SORT1 (and another gene in the region) are the same, I think it must be because PSRC1 is an eQTL in blood and Liver, whereas SORT1 is specific to the liver. I understand why the authors are presenting the data "just the facts" as they are, but they must agree that the biology is more compelling, and that there must be ways to avoid confusion about it -unless they wish to make a case that PSRC1 is another causal gene. (though that would require far more evidence that is presented here).
It could be as simply as presenting the gene tag as "CELSR2/SORT1/PSRC1" in the places where relevant instead of PSRC1. But the tables, main text, etc. do not at all mention any of what's listed in the rebuttal.
## Minor comments -Abstract: a. "comprehensively mapping over 16,000 pQTL variants" -this is overstated. The total number of LD-independent sentinel associations is 2-2.5 orders of magnitude smaller than this.
b. "coincided with CAD risk" is not a particularly helpful statement, and again overstates what was actually observed -that is, 1 of the 13 pQTLs statistically co-localized with an established CHD association, and it occurs at a known locus (LPA). "coincided" i.e., 'coincidental' association means a lack of mechanistic link, which you formally evaluate c. the last sentence also seems like a stretch. I appreciate the need from the authors to drum up their work; but given the score of what is in this report, this claim does not feel particularly well justified.
- Figure S3. I felt like I had to stare at this figure for 3 minutes to understand it. This could be visualized in a much better way (with a causal graph, for example).
-I felt like the discussion of genes at the end felt a little much --some discrepancies certainly are worth talking about, but this all felt a little bit speculative since I wasn't fully convinced by the MR analyses at the end. If presented more as a resource than putting a ton of weight on causal discovery (i.e., calibrated a bit more conservatively), suggesting follow-up efforts that require more work and/or data to proven convincingly, I think this would read much more amicably (at least for my taste, for what that's worth).

Detailed Response
Reviewer #1 (Remarks to the Author): The authors provide a details response to my concerns. I have additional comments here.

Reply:
We thank the reviewer for the useful suggestions for improving our manuscript and have made further revisions in response to each of the reviewer's suggestions as detailed below.
1. Functional, reg, clinical enrichments. a. In my previous comments, I suggested that because the enrichment analysis based on the proteins pre-selected to related to CHD, any enrichment analysis performed here is to be expected. In looking more closely at this, it doesn't appear that there's any statistical quantification of enrichment at all here c. It also seems like most of the supplement ( Figure S2) is a long list of these outputs -again, I'm not sure 100+ pages of these figures really add much and certainly aren't particularly useful.

Reply:
We thank the reviewer for these suggestions. We agree that the KEGG pathway analysis did not add much high-value information. Therefore, we removed Figure S2 (100+ pages!). We have retained Figure 2 and the last 3 sentences in the "pQTL Functional and Regulatory Annotations" section. We believe that functional annotation clarifies previously known biological pathways and identifies several novel proteins as candidate CHD drug targets by linking their pQTLs to CHD risk pathways. We have revised the manuscript as follows: 2. Co-localization analysis. The authors respond by performing a colocalization analysis with coloc. The authors state that the first thing they did was "pruned the pQTL variants, retaining only those with LD r2 < 0.1". I'm not entirely sure I understand what this means; the specifics for priors I understand (could be left in methods), but the overall approach as worded is unclear enough to me that I want to make sure this analysis is performed in the way that I expect it to.
Can the author confirm the following general approach: a. Identify a locus that harbors a pQTL -e.g., at least one sentinel SNP that is the most strongly associated, surpassing a multiple-test correction. Note the targeted associated gene of interest. b. Determine if, for the given target gene, if there exists an associated eQTL for that gene in FHS (blood). c. If there exists at least one eQTL associated variant in the interval, perform co-localization, i.e. d. Take the regional association data for all variants (say, that spans the territory), for pQTL and eQTL association. This step should involve all SNP associations data, not pruned data (looking at Table S12, not sure what n(SNP) means -this can't be the number of SNPs going into the coloc analysis?) e. Report PP1, PP2, PP3, PP4 data. Ideally, PP4/(PP3+PP4) suggests some localization, very high PP4 values denote very strong association f. Visual (heuristic) conformation of localization could be achieved using locus zoom plots. I suggest manually checking to eyeball the statistical values reported. If there are truly strong PP4 scores, the patterns of eQTL and pQTL associations OVERALL in the region will appear strongly correlated. g. To close the loop, one could then also perform formal conditional analysis on the lead SNP. If the sentinel SNP is not the same, but the signals statistically localize, conditional analysis on one (or the other) SNP in the pQTL or eQTL can would ablate each other's signals, respectively. I would perhaps perform this only if one had a really interesting result and want to make sure the signals truly to map to the same signal (and, potentially, the same causal variant). This doesn't need to be done on everything (or anything, if specific pQTL:eQTL pairs aren't interesting enough to warrant it).

Reply:
We thank the reviewer for this excellent suggestion. We have revised the methods used in our colocalization analysis and now use all variants instead of only LD-pruned SNPs.

Methods:
Colocalization of cis-pQTLs and eQTLs: Colocalization analysis involved a two-step procedure.  (Table  S12). We observed similar colocalization results by applying default P values (p1=1E-4, p2=1E-4, p12=1E-5) in the coloc package assuming that 1 in 10,000 SNPs is causal for either trait. 3a. Section trans-pQTL and CHD: They authors present a model whereby trans-pQTLs act as they do, because they are cis-eQTLs. I would believe this to the extent that the cis-eQTL and trans pQTL data localize to the same variant. 3b. Moreover, the interpretation of this is challenged somewhat, as the "cis-eQTL" gene that is emphasize for many of the results is PSRC1. For the biology here, it's clear that there is far more compelling data supporting SORT1 as the relevant gene. Because the eQTL for PSRC1 and SORT1 (and another gene in the region) are the same, I think it must be because PSRC1 is an eQTL in blood and Liver, whereas SORT1 is specific to the liver. I understand why the authors are presenting the data "just the facts" as they are, but they must agree that the biology is more compelling, and that there must be ways to avoid confusion about it -unless they wish to make a case that PSRC1 is another causal gene. (though that would require far more evidence that is presented here). It could be as simply as presenting the gene tag as "CELSR2/SORT1/PSRC1" in the places where relevant instead of PSRC1. But the tables, main text, etc. do not at all mention any of what's listed in the rebuttal.

Reply:
We thank the reviewer for this suggestion. We have replaced PSRC1 with CELSR2/SORT1/PSRC1 in the manuscript and supplementary tables as follows:

Results:
Moreover, we found significant causal effects of CELSR2/SORT1/PSRC1 on APOB levels (in liver), CELSR2/SORT1/PSRC1 on GRN levels (in artery), and ABO on GMP140 levels (in heart atrial appendage). a. "comprehensively mapping over 16,000 pQTL variants" -this is overstated. The total number of LDindependent sentinel associations is 2-2.5 orders of magnitude smaller than this. b. "coincided with CAD risk" is not a particularly helpful statement, and again overstates what was actually observed -that is, 1 of the 13 pQTLs statistically co-localized with an established CHD association, and it occurs at a known locus (LPA). "coincided" i.e., 'coincidental' association means a lack of mechanistic link, which you formally evaluate c. the last sentence also seems like a stretch. I appreciate the need from the authors to drum up their work; but given the score of what is in this report, this claim does not feel particularly well justified.

Reply:
We thank the reviewer for these suggestions. We have revised the abstract as follows: - Figure S3. I felt like I had to stare at this figure for 3 minutes to understand it. This could be visualized in a much better way (with a causal graph, for example).

Reply:
We agree with the reviewer. We have revised the Results and figure S3 as follows: Results: Trans-pQTLs and CHD: Of the nine proteins with pQTL variants that precisely matched CHDassociated SNPs, eight had pQTLs with trans effects (54 non-redundant trans-pQTL variants in total) and 69% of these trans-pQTL variants were also associated with the expression of nearby genes (cis-eGenes), i.e. these trans-pQTL variants were also cis-eQTL variants associated with the expression of nearby cis-eGenes. Based on these findings, we hypothesized that trans-pQTL variants may regulate circulating protein levels through cis-effects on the expression of nearby cis-eGenes ( Figure S2a). To test this hypothesis, we employed Mendelian randomization (MR) 5 using the expression of all genes within 1 Mb of the trans-pQTL locus as the exposure, cis-eQTLs associated with these genes (from the FHS whole blood gene expression database 2 ) as instrumental variables, and circulating protein levels as the outcome. We found that for eight proteins the effects of trans-pQTLs on circulating protein levels were causally regulated by the expression of cis-eGenes (Table S14) Figure S2b). To extend these findings to other CHD-related tissues, we applied the same analyses to GTEx 8 whole blood, liver, and heart eQTLs. For two of the proteins (APOB and GRN), there was additional experimental evidence in support of our results through interrogation of GTEx 9 whole blood eQTLs (Table S14). Moreover, we found significant causal effects of CELSR2/SORT1/PSRC1 on APOB levels (in liver), CELSR2/SORT1/PSRC1 on GRN levels (in artery), and ABO on GMP140 levels (in heart atrial appendage).    -I felt like the discussion of genes at the end felt a little much --some discrepancies certainly are worth talking about, but this all felt a little bit speculative since I wasn't fully convinced by the MR analyses at the end. If presented more as a resource than putting a ton of weight on causal discovery (i.e., calibrated a bit more conservatively), suggesting follow-up efforts that require more work and/or data to proven convincingly, I think this would read much more amicably (at least for my taste, for what that's worth).

Reply:
We thank the reviewer for this suggestion, and have condensed the section on "Novel Proteins and Pathways Implicated in CHD" as follows: Novel Proteins and Pathways Implicated in CHD: Six proteins were implicated by MR analyses as nominally causal for CHD (Table S15). Lipoprotein (a) (LPA), which interferes with the fibrinolytic cascade 12 , has been previously demonstrated to be causal for CHD 13 , and served as a positive control. Butyrylcholinesterase (BCHE) has previously been reported to be inversely associated with long-term CVD mortality 14,15 , and several polymorphisms within BCHE have been reported 16 to be associated with CHD risk factors. rs1803274, the sentinel cis-pQTL variant for BCHE (A539T) (Table S6), is associated with decreased BCHE circulating levels and enzymatic activity 17 , and has been shown to predict early-onset CHD 16 . Our protein-trait analyses similarly demonstrated inverse associations between plasma BCHE and long-term cardiovascular outcomes, and MR analyses revealed lower BCHE to be causal for CHD.
Four additional proteins were nominally causal for CHD by MR. PON1 exhibits cardioprotective effects through prevention of LDL oxidation 18 , and overexpression of PON1 in mice inhibits the development of atherosclerosis 19 . Our protein-trait analyses similarly demonstrated an inverse association between PON1 levels and long-term cardiovascular outcomes, but MR revealed that higher PON1 levels are causal for CHD ( Table 2). We hypothesize that this directional discordance may reflect down-regulation of PON1 expression in the setting of CHD. Myeloperoxidase (MPO), which promotes formation of atherosclerotic lesions by enhancing apolipoprotein B (APOB) oxidation within circulating LDL particles 20 , was positively associated with incident cardiovascular outcomes in our protein-trait and MR analyses. Cystatin C, a pro-atherosclerotic 21 cysteine proteinase cathepsin inhibitor and well-characterized biomarker of CHD risk 22 , was also positively associated with CVD events in our protein-trait and MR analyses. MCAM, or CD146, was causally associated with CHD risk in an inverse manner by MR. This is directionally consistent with prior animal studies of limb ischemia, which have shown that injection of sCD146 into the circulation decreased fibrosis and inflammation and increased local perfusion 23 .
Several proteins lacked cis-pQTLs and therefore were unavailable for MR analysis. Six of these proteins, which had pQTL variants that perfectly matched CHD SNPs from GWAS, were associated with CHD/CVD outcomes in FHS participants with long-term follow-up: GRN, sGP130, sICAM1, APOB, B2M, and CRP. GRN has previously been implicated in atherosclerosis progression and incident MI 24,25 . Its precursor, progranulin, has been shown to bind to SORT1, which contains a trans-pQTL variant for GRN that is also associated with CHD 26 . sGP130 levels have been shown to positively correlate with long-term CVD mortality 27 , perhaps via pathways related to hypertension and vascular remodeling 28 . B2M, an essential component of the major histocompatibility complex I 29 , is associated with hypertension 30 , atherosclerosis, and CVD 31 . Finally, circulating APOB, an LDL particle ligand, is a well-characterized biomarker of CVD risk. 32,33