Common variants in Alzheimer’s disease and risk stratification by polygenic risk scores

Genetic discoveries in Alzheimer’s disease drive our understanding of the disorder and, together with polygenic risk stratification, can contribute to the planning of feasible and efficient preventive and curative clinical trials. We first perform a large genetic association study by merging all available case-control datasets and by-proxy study results (discovery n = 409,435 and validation n = 58,190). Here, we add six variants associated with Alzheimer’s disease risk (near APP, CHRNE, PRKD3/NDUFAF7, PLCG2 and two exonic variants in the SHARPIN gene). Assessment of the polygenic risk score and stratification by APOE reveal a 4- to 5.5-year difference in median age at onset of Alzheimer’s disease patients in APOE ɛ4 carriers. As a result of this study, the underlying mechanisms of APP can be studied to refine the amyloid cascade, and the polygenic risk score provides a tool to select individuals at high risk of Alzheimer’s disease.

1. LDSC analysis: this result seems fine and suggests low population stratification. But please give a bit more information in the Methods and cite the LDSC paper.
2. Figure 1 and workflow. I thank the authors for clarifying but Figure 1 has not been modified at all from the previous submission and I still find it confusing; I suspect many readers will as well. For example, it's very difficult to follow why there are different subsets of GR@ACE in Figure 1 (with different sample sizes) and how they relate to each other.
3. PRS construction: the authors state that the objectives have been rewritten, but it is not clear exactly what they are referring to, and I am left guessing. (I could not find the details in reply 2 to reviewer 1).
4. Survival analysis: my earlier comments still stand and have not been fully addressed.
(i) The Cox regression can be performed with case-only (using age of onset and no censoring), and will likely provide substantially increased power over the current binning strategy, while allowing adjustment for covariates like APOE status and sex (as well as testing for interactions). There are other regression methods that can be used as well. The binning approach is very underpowered and limits the conclusions of the study.
(ii) The clarification that the GR@ACE controls are much younger than cases: this means that some of the controls might still develop AD later in life, and will eventually be cases. This fact does not completely invalidate the analysis assuming that the proportion of 'eventual AD' in the controls is low enough. But this point needs to be acknowledged somewhere in the Discussion.
5. PRS independent of APOE status: I assume that the table shows odds ratios for the PRS within each APOE group; this is not specified in the reply. Please add this table to the manuscript (beyond the one sentence in Figure 4).
6. GTEx analysis: As with my previous comment, there is no information in the manuscript on this analysis. Whether the results are based on public databases or not, it needs describing: what the analysis was, which data was used, etc.

Referees' comments:
Referee #1 (Remarks to the Author): I have previously reviewed the submission for Nature Genetics, hence I will not repeat my previous review but address the author reply.
Overall, the authors have addressed some of my concerns and clarified some issues, mainly around the PRS analysis. At the same time, the authors seem resistant to meaningfully changing any of their analyses to increase the statistical power of their study and seem content to simply reword their aims instead. Even just changing figures to increase the clarity of the study design was not considered.
Many of the modifications or clarifications are quite cursory and brief, and some are only presented here in the reply and not in the main text, or vice-versa. Often I am left trying to guess at the authors' intentions.

Main comments:
1. LDSC analysis: this result seems fine and suggests low population stratification. But please give a bit more information in the Methods and cite the LDSC paper.
Answer: In accordance with the reviewer's request, we have now added information about LDSC to the Methods section: "Polygenicity and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in GWAS. To distinguish between inflation from a true polygenic signal and bias we quantified the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD) using the LD Score regression intercept (LDSC software 33)."
2. Figure 1 and workflow. I thank the authors for clarifying but Figure 1 has not been modified at all from the previous submission and I still find it confusing; I suspect many readers will as well. For example, it's very difficult to follow why there are different subsets of GR@ACE in Figure 1 (with different sample sizes) and how they relate to each other.

Answer:
We acknowledge that this is confusing. GR@ACE includes on average 1,500-2,000 new samples per year, so additional samples were genotyped in the time after the discovery stage was completed. We added these samples as a replication set, which is now labelled GR@ACE-replication.
The workflow in Figure 1 has been slightly modified for clarity, and we have added some comments to the figure legend: "Fig. 1. (...)"
3. PRS construction: the authors state that the objectives have been rewritten, but it is not clear exactly what they are referring to, and I am left guessing. (I could not find the details in reply 2 to reviewer 1).

Answer:
Discrimination and optimization of the PRS are important. However, the objective of the PRS analysis in this work was to add to the existing evidence that the PRS can identify subjects at the highest risk of AD. First, we performed a validation study to show that the PRS (including the new SNP list) is a strong predictor that is independent of important determinants of AD (age at onset, gender, PCs, APOE, and diagnostic accuracy (pathological versus clinical series)). Then we performed an extensive stratification of risk by the PRS in the GR@ACE dataset, which has not been shown previously. Lastly, we would like to emphasize that this validation dataset and the large GR@ACE dataset are completely independent of other published cohorts, and therefore our results strongly reinforce the evidence for the effect of the PRS in AD.
The PRS study should thus be considered only as a proof of principle for the prospective studies that will follow. Furthermore, we know that our PRS cannot be considered definitive, because it will be refreshed and improved periodically as newly discovered hits are added to the model.
We hope that the rewriting of the PRS goals makes it clear that there are many ways to construct a PRS and that we aimed neither to identify the 'best' PRS nor to compare ours with other PRS or polygenic hazard scores (Qian Zhang et al., Nature Communications 2020).
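As an illustration of the kind of score discussed in this reply (not the authors' pipeline), a PRS is commonly computed as a weighted sum of risk-allele dosages, with per-variant log odds ratios as weights, after which individuals are grouped by score rank. The sketch below makes several assumptions: the odds ratios and dosages are invented, and the function names `polygenic_risk_score` and `percentile_group` are hypothetical.

```python
import math

def polygenic_risk_score(dosages, log_odds_ratios):
    """Weighted sum of risk-allele dosages (0, 1 or 2) for one individual."""
    if len(dosages) != len(log_odds_ratios):
        raise ValueError("one weight is needed per variant")
    return sum(d * w for d, w in zip(dosages, log_odds_ratios))

def percentile_group(scores, n_groups=4):
    """Rank-based grouping: 0 is the lowest-risk group, n_groups - 1 the highest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(scores)
    return groups

# Hypothetical per-variant odds ratios (illustrative values, not from the study).
weights = [math.log(1.10), math.log(1.25), math.log(0.90)]
# Hypothetical dosages for four individuals across the three variants.
cohort = [[0, 0, 2], [1, 0, 1], [2, 1, 0], [2, 2, 0]]

scores = [polygenic_risk_score(d, weights) for d in cohort]
groups = percentile_group(scores, n_groups=2)  # lower half vs upper half
```

In practice the grouping would be applied within strata such as APOE genotype, so that the score's contribution can be read separately from that of APOE.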
4. Survival analysis: my earlier comments still stand and have not been fully addressed. (i) The Cox regression can be performed with case-only (using age of onset and no censoring), and will likely provide substantially increased power over the current binning strategy, while allowing adjustment for covariates like APOE status and sex (as well as testing for interactions). There are other regression methods that can be used as well. The binning approach is very underpowered and limits the conclusions of the study.

(ii) The clarification that the GR@ACE controls are much younger than cases: this means that some of the controls might still develop AD later in life, and will eventually be cases. This fact does not completely invalidate the analysis assuming that the proportion of 'eventual AD' in the controls is low enough. But this point needs to be acknowledged somewhere in the Discussion.
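For illustration, the case-only Cox model suggested in point (i) can be sketched as follows. This is a minimal NumPy sketch on simulated data, not the authors' implementation: the function name `fit_cox_case_only`, the simulated effect sizes and the carrier frequency are all assumptions, and sex and ancestry covariates are omitted for brevity. Because every case has an observed age at onset, no censoring indicator is needed and every subject contributes an event term to the partial likelihood.

```python
import numpy as np

def fit_cox_case_only(age_at_onset, covariates, n_iter=20):
    """Newton-Raphson fit of Cox coefficients when every subject is a case,
    so each age at onset is an observed event (no censoring).
    Ties are handled with the Breslow approximation."""
    t = np.asarray(age_at_onset, dtype=float)
    X = np.asarray(covariates, dtype=float)
    X = X - X.mean(axis=0)  # centring leaves the coefficients unchanged
    # at_risk[i, j] = 1 if subject j is still unaffected at the onset age of subject i
    at_risk = (t[None, :] >= t[:, None]).astype(float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = np.exp(X @ beta)                 # relative hazards
        denom = at_risk @ w                  # total hazard in each risk set
        mean_x = (at_risk @ (w[:, None] * X)) / denom[:, None]
        outer = w[:, None, None] * X[:, :, None] * X[:, None, :]
        mean_xx = np.tensordot(at_risk, outer, axes=1) / denom[:, None, None]
        grad = (X - mean_x).sum(axis=0)      # score of the log partial likelihood
        hess = -(mean_xx - mean_x[:, :, None] * mean_x[:, None, :]).sum(axis=0)
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

# Simulated cases (assumed values): PRS, APOE e4 carrier status, and their interaction.
rng = np.random.default_rng(0)
n = 2000
prs = rng.standard_normal(n)
e4 = (rng.random(n) < 0.4).astype(float)
X = np.column_stack([prs, e4, prs * e4])
true_beta = np.array([0.3, 0.6, 0.2])
# Proportional-hazards event times, mapped monotonically to a plausible age scale.
event_time = rng.exponential(1.0, n) / np.exp(X @ true_beta)
age_at_onset = 55.0 + 10.0 * np.sqrt(event_time)

beta_hat = fit_cox_case_only(age_at_onset, X)
```

A positive coefficient corresponds to a higher hazard, that is, an earlier age at onset; the third coefficient is the PRS-by-APOE interaction. Unlike binning by PRS percentile, the model uses the continuous score and every case.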
Answer: (i) We implemented the case-only Cox regression model as requested by the reviewer, and we report it in the Methods and Results sections.
Methods: "We implemented a case-only Cox regression model on the GR@ACE/DEGESCO dataset, with APOE group, the interaction between the PRS and APOE, and four population ancestry components as covariates. All analyses were done in R (v3.4.2)."
Results: "The Cox regression also showed an impact of APOE on AAO, mainly on APOE ε4ε4 (significant APOE-PRS interaction (p = 0.021), Fig. 5d)."
(ii) In accordance with the reviewer's suggestions, the limitations related to the use of younger controls have been reinforced in the Discussion section.
5. PRS independent of APOE status: I assume that the table shows odds ratios for the PRS within each APOE group; this is not specified in the reply. Please add this table to the manuscript (beyond the one sentence in Figure 4).

Answer:
The table has been added to the supplementary tables and is referenced alongside the results of Figure 4 in the Results section.
6. GTEx analysis: As with my previous comment, there is no information in the manuscript on this analysis. Whether the results are based on public databases or not, it needs describing: what the analysis was, which data was used, etc.

Answer:
BrainSeq is based on dorsolateral prefrontal cortex (DLPFC) polyA+ RNA-seq of 738 subjects spanning the lifespan and three main psychiatric diagnostic groups (Schizophrenia, Major Depressive Disorder, and Bipolar Disorder). BrainSeq identified eQTLs in the DLPFC using RNA sequencing (RNA-seq) and genotype data. The eQTL modeling tested for additive genetic effects on expression while adjusting for sex, ancestry (multidimensional scaling components), and expression heterogeneity (principal components). Significant eQTLs were those SNP-feature pairs with a false discovery rate (FDR) of less than 1%. The "DLPFC -All" database was used. For more details, see http://eqtl.brainseq.org/phase1/eqtl/."
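The additive eQTL model and FDR threshold described above can be illustrated with a minimal sketch. It is not the BrainSeq pipeline: the function names `additive_eqtl_test` and `benjamini_hochberg` are hypothetical, the data are simulated, the p-value uses a normal approximation to the t statistic (reasonable only for large samples), and the Benjamini-Hochberg procedure is an assumed choice of FDR method, since the text does not name one.

```python
import math
import numpy as np

def additive_eqtl_test(expression, dosage, covariates):
    """Regress expression on allele dosage (0/1/2) plus covariates by least squares.
    Returns the per-allele effect and a two-sided p-value (normal approximation)."""
    n = len(expression)
    design = np.column_stack([np.ones(n), dosage, covariates])
    coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
    resid = expression - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    z = coef[1] / math.sqrt(cov[1, 1])
    return coef[1], math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

# Simulated SNP-feature pair (assumed values): true per-allele effect of 0.5.
rng = np.random.default_rng(1)
n = 500
dosage = rng.integers(0, 3, n).astype(float)
sex = rng.integers(0, 2, n).astype(float)
ancestry = rng.standard_normal((n, 2))       # stand-in for MDS components
expression = 0.5 * dosage + 0.3 * sex + rng.standard_normal(n)

effect, p_value = additive_eqtl_test(expression, dosage, np.column_stack([sex, ancestry]))
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = q_values < 0.01                # FDR below 1%
```

In a real analysis the q-values would be computed over all tested SNP-feature pairs, and expression principal components would be added to the covariate matrix alongside sex and ancestry.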