Introduction

Pharmacogenomics (PGx), an important tool for precision medicine, studies how pharmacokinetics, pharmacodynamics, efficacy, and safety responses to drugs are associated with the genetic information of treated subjects at the molecular level1,2,3. Efficacy PGx studies have great potential to guide treatment options by integrating routine pharmacogenomic screening into clinical development and proposing novel strategies for identifying genetic markers that impact efficacy for new compounds (and for marketed drugs, if applicable)4. In precision medicine, many associations between genetic variation and inter-individual differences in drug response have been discovered, allowing treatments to be tailored to the genetic makeup of the patient4. However, conventional single-variant PGx biomarkers or drug response predictors usually rely on large effect sizes. Genetic variants with small but genuine effects may not reach the significance threshold in a typical PGx analysis of hundreds to a few thousand subjects. Recent developments in disease genetics reveal that polygenicity, i.e., the presence of many small genetic effects, is a feature of many complex traits5. This observation supports modeling and predicting disease status by combining the effects of many weak signals.

Polygenic risk score (PRS), defined as the sum of an individual's genotypes at many polymorphisms weighted by their effect sizes, is a rapidly emerging tool in the disease genetics field. PRS reflects the overall genetic risk of a phenotype of interest and can be used as a stratification mechanism for downstream analyses and decision making. In disease genetics, PRSs have been successfully developed for multiple complex diseases including coronary artery disease6,7, cancer8,9, etc. A large variety of methods have been developed for constructing PRS. To name a few, they include (1) the unadjusted method, which builds PRS using the unadjusted effect estimates of SNPs across the whole genome; (2) the Clumping and Thresholding (C+T) method10, which builds several PRSs from independent SNPs that pass different significance thresholds and selects the optimal threshold according to their performance in an independent study; (3) the Lassosum method11, which uses penalized regression to select informative SNPs by incorporating linkage disequilibrium (LD) information; and (4) the Bayesian regression methods, e.g., LDpred12, LDpred213, and PRS-CS14, which shrink the marginal effect sizes with respect to LD. Among them, the more sophisticated penalized regression and Bayesian regression methods have been shown to outperform the C+T method13.

Like many complex traits, most drug responses in PGx are extremely polygenic3,15. Despite recent developments in PRS methods and their exciting applications in disease genetics, similar analytic methods have largely not yet been adapted to drug responses in PGx16. The emerging examples published so far build a PRS from disease GWAS using SNPs with treatment-unrelated prognostic effects only and then test whether the PRS is predictive of drug responses in one or several PGx studies17,18,19,20,21. However, this practice of building a disease PRS and applying it to PGx data (called the PRS-Dis approach) has not been fully justified in theory. In fact, we show (Results section) that the PRS-Dis approach relies on a very stringent assumption: every variant selected for constructing the PRS must have a constant ratio between its genotype main effect and its genotype-by-treatment interaction effect, which may not hold in real PGx data. On the other hand, to the best of our knowledge, only a few published studies directly build PRS from drug-related data for safety or efficacy PGx prediction. For example, Lanfear et al.16 build an efficacy PGx PRS for β-blockers using observational data. Koido et al.22 build a PGx PRS for drug-induced liver injury using a method similar to C+T. Lewis et al.23 build an efficacy PGx PRS for clopidogrel response in terms of cardiovascular outcomes using single-arm clinical data. However, there has been limited methodological development on adapting PRS methods from disease genetics to PGx settings where data from both treatment and placebo (or control) arms are available.

To tackle the challenges of the complex drug response prediction and the lack of state-of-the-art PGx PRS methods, we propose to shift from the disease PRS approach to the PGx PRS approach by jointly modeling the genetic main effect and the genotype-by-treatment interaction effect (called PRS-PGx approach). We systematically extend the current PRS-Dis methods to construct both prognostic and predictive PRSs for drug response prediction in PGx studies. These methods include PRS-PGx-Unadj (Unadjusted), PRS-PGx-CT (Clumping + Thresholding), PRS-PGx-L, -GL, -SGL (-Lasso, -Group Lasso, -Sparse Group Lasso), and PRS-PGx-Bayes (Bayesian regression) methods. Our proposed methods use only PGx genome-wide association summary statistics and an external LD reference panel except for the penalized regression-based methods, which require access to individual level genetic and phenotypic data. Moreover, by extending the idea of global-local scaling parameters from disease GWAS14 to PGx GWAS, the PRS-PGx-Bayes method is able to infer the posterior prognostic and predictive effects simultaneously.

Our simulation studies demonstrate that PRS-PGx methods generally outperform the PRS-Dis methods across a wide range of genetic architectures and that PRS-PGx-Bayes is superior to all other PRS-PGx methods. These methods are further applied to the IMPROVE-IT (IMProved Reduction of Outcomes: Vytorin Efficacy International Trial)24 PGx GWAS summary statistics data25 to predict treatment-related LDL cholesterol reduction. The drug response prediction results demonstrate a substantial improvement of PRS-PGx-Bayes over alternative methods in both prediction accuracy and the ability to capture the predictive effect.

Results

Conceptual framework of the PRS-PGx methods

We consider a high-dimensional regression model of n patients and m SNPs for a drug response:

$${{{{{{{\bf{Y}}}}}}}}={{{{{{{\bf{X}}}}}}}}\gamma+{\upbeta }_{{{{{{{{\rm{T}}}}}}}}}{{{{{{{\bf{T}}}}}}}}+{{{{{{{\bf{G}}}}}}}}\upbeta+({{{{{{{\bf{G}}}}}}}}\times {{{{{{{\bf{T}}}}}}}})\upalpha+\epsilon,$$
(1)

where Y denotes a quantitative trait (drug response), T the binary treatment assignment, X the n × p matrix of covariates, and G the n × m genotype matrix; β is a m × 1 vector of prognostic effects (i.e., main effects), α is a m × 1 vector of predictive effects (i.e., interaction effects), and ϵ is the random error. In practice, the phenotype Y can first be adjusted by the covariates X and the treatment T, before application to any PRS-PGx algorithms. For simplicity, we will use Y as the phenotype after such adjustment in the later discussion.

The regression coefficient b = (β, α) is assumed to be fixed in the PRS-PGx-Unadj, PRS-PGx-CT and PRS-PGx-L, -GL, -SGL methods and random in the PRS-PGx-Bayes method. Specifically, for each j = 1, …, m, we consider the following prior distribution of bj = (βj, αj) in the Bayesian approach:

$$\left(\begin{array}{l} {\beta }_{j}\\ {\alpha }_{j}\end{array}\right) \Big|{\sigma }^{2},\phi,{\psi }_{j},{\xi }_{j},{\rho }_{j} \sim {{{{{{{\rm{MVN}}}}}}}} \Big ({{{{{{{\bf{0}}}}}}}},\,\frac{{\sigma }^{2}}{n}\phi {M}_{j} \Big ),{{{{{{{\rm{where}}}}}}}}\,{M}_{j}=\left[\begin{array}{cc} {\psi }_{j}&{\rho }_{j}\sqrt{{\psi }_{j}{\xi }_{j}}\\ {\rho }_{j}\sqrt{{\psi }_{j}{\xi }_{j}}&{\xi }_{j}\end{array}\right] \sim g,$$
(2)

where ϕ is a global scaling parameter that is shared across multiple SNPs and controls the degree of the model sparseness; ψj and ξj are local and marker-specific scaling parameters; ρj is the marker-specific correlation between the two effect sizes βj and αj; and g is a probability density function of a random matrix.
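
As a minimal illustration of the prior in Eq. (2), the following Python sketch builds the 2 × 2 covariance matrix (σ²/n)ϕMj and draws one (βj, αj) pair from it. It is not the authors' implementation: the function names `prior_cov` and `draw_effects` and all numeric inputs are hypothetical, ψj, ξj, and ρj are treated as given constants rather than sampled from g, and ψj > 0 is assumed.

```python
import math
import random


def prior_cov(sigma2, n, phi, psi, xi, rho):
    """Return the 2 x 2 covariance (sigma2 / n) * phi * M_j from Eq. (2)."""
    scale = sigma2 / n * phi
    off_diag = rho * math.sqrt(psi * xi)
    return [[scale * psi, scale * off_diag],
            [scale * off_diag, scale * xi]]


def draw_effects(cov, rng):
    """Draw (beta_j, alpha_j) ~ MVN(0, cov) via a 2 x 2 Cholesky factor.

    Assumes cov[0][0] > 0 (i.e., psi_j > 0).
    """
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    # Clamp at zero to guard against rounding when rho is close to +/-1.
    l22 = math.sqrt(max(cov[1][1] - l21 ** 2, 0.0))
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * z1, l21 * z1 + l22 * z2


# Hypothetical hyperparameter values, chosen only for illustration.
rng = random.Random(1)
cov = prior_cov(sigma2=1.0, n=1000, phi=0.01, psi=1.0, xi=4.0, rho=0.5)
beta_j, alpha_j = draw_effects(cov, rng)
```

With ρj = 1 and ψj = ξj the two draws coincide up to rounding, which corresponds to the constant-ratio case βj = cαj with c = 1.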

In PRS-PGx methods, SNPs are used for the construction of prognostic PRS and predictive PRS based on their estimated \({b}^{{{{\rm{est}}}}}=({\beta }^{{{{\rm{est}}}}},{\alpha }^{{{{\rm{est}}}}})\). The prognostic and predictive PRSs are defined as the weighted sum of the selected SNPs’ genotypes, where the weights are the estimated prognostic and predictive effect sizes, respectively,

$${S}_{{{{{{\rm{prog}}}}}}}=\mathop{\sum}\limits_{j=1}^{m}{\beta }_{j}^{{{{{{{\rm{est}}}}}}}}\,{{{{{{{{\bf{G}}}}}}}}}_{j},\quad {S}_{{{{{{\rm{pred}}}}}}}=\mathop{\sum }\limits_{j=1}^{m}{\alpha }_{j}^{{{{{{{\rm{est}}}}}}}}{{{{{{{{\bf{G}}}}}}}}}_{j}.$$
(3)

The predictive PRS is useful for patient stratification by aggregating the differential treatment effects. We can define the PGx PRS as

$${S}_{{{{\rm{PGx}}}}}=\left\{\begin{array}{lr}{S}_{{{{\rm{prog}}}}}+{S}_{{{{\rm{pred}}}}},&T=1,\\ {S}_{{{{\rm{prog}}}}},\hfill &T=0,\end{array}\right.$$
(4)

for overall drug response prediction. More technical details are provided in the “Methods” section.
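
The score definitions in Eqs. (3) and (4) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `prs_scores` and the toy genotypes and weights are hypothetical, and the genotype matrix is held as a plain list of rows rather than in any particular genetics file format.

```python
def prs_scores(G, beta_est, alpha_est, T):
    """Compute S_prog, S_pred (Eq. 3) and S_PGx (Eq. 4) per subject.

    G: list of n rows, each with m genotype values.
    beta_est, alpha_est: length-m estimated prognostic / predictive weights.
    T: length-n list of 0/1 treatment indicators.
    """
    s_prog = [sum(b * g for b, g in zip(beta_est, row)) for row in G]
    s_pred = [sum(a * g for a, g in zip(alpha_est, row)) for row in G]
    # The predictive score contributes only in the treatment arm (T = 1).
    s_pgx = [p + t * q for p, q, t in zip(s_prog, s_pred, T)]
    return s_prog, s_pred, s_pgx


# Toy data: 3 subjects, 2 SNPs coded as allele counts (hypothetical values).
G = [[0, 1], [2, 1], [1, 0]]
beta_est = [0.5, -0.25]
alpha_est = [0.1, 0.2]
T = [1, 0, 1]
s_prog, s_pred, s_pgx = prs_scores(G, beta_est, alpha_est, T)
```

For the second subject (T = 0) the PGx score equals the prognostic score, while for the two treated subjects the predictive score is added on top.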

Assumption of PRS-Dis approach for drug response prediction

Consider the linear model defined in Eq. (1). Assume (i) SNPs Gi, i = 1, …, m, are standardized: E(Gi) = 0, var(Gi) = 1; (ii) E(ϵ) = 0, var(ϵ) = σ²; (iii) b follows the prior defined in Eq. (2); (iv) Gi, i = 1, …, m, b, and ϵ are mutually independent. We consider the following three quantities:

$$Y|(T=1)=\mathop{\sum }\limits_{i=1}^{m}({\beta }_{i}+{\alpha }_{i}){{{{{{{{\bf{G}}}}}}}}}_{i}+\epsilon,$$
(5)
$${S}_{{{{{{\rm{PGx}}}}}}}|(T=1)=\mathop{\sum }\limits_{i=1}^{m}({\beta }_{i}+{\alpha }_{i}){{{{{{{{\bf{G}}}}}}}}}_{i},$$
(6)
$${S}_{{{{{{\rm{Dis}}}}}}}=\mathop{\sum }\limits_{i=1}^{m}{\beta }_{i}{{{{{{{{\bf{G}}}}}}}}}_{i}.$$
(7)

Y|(T = 1) is the observed response of a subject in the treatment arm with SNPs Gi, i = 1, …, m; SDis and SPGx|(T = 1) are the perfect polygenic scores for this treated subject from disease GWAS and PGx, respectively. We will drop the conditioning notation ‘|(T = 1)’ hereafter when there is no ambiguity. We prove (in Supplementary Method A) that the heritability of a drug response can be calculated as

$${h}^{2}=\frac{{{{{{{{\rm{var}}}}}}}}\left(\mathop{\sum }\nolimits_{i=1}^{m}({\beta }_{i}+{\alpha }_{i}){{{{{{{{\bf{G}}}}}}}}}_{i}\right)}{{{{{{{{\rm{var}}}}}}}}\left(\mathop{\sum }\nolimits_{i=1}^{m}({\beta }_{i}+{\alpha }_{i}){{{{{{{{\bf{G}}}}}}}}}_{i}\right)+{\sigma }^{2}}={{{\mbox{cor}}}}^{2}\left({S}_{{{{{{\rm{PGx}}}}}}},Y\right).$$
(8)

On the other hand, it can be shown (in Supplementary Method B) that the squared correlation coefficient between SDis and Y for the treated subjects is:

$${{{\mbox{cor}}}}^{2}\left({S}_{{{{{{\rm{Dis}}}}}}},Y\right)={h}^{2}\left(1-\frac{\mathop{\sum }\nolimits_{i=1}^{m}{\psi }_{i}\mathop{\sum }\nolimits_{i=1}^{m}{\xi }_{i}-{(\mathop{\sum }\nolimits_{i=1}^{m}{\rho }_{i}\sqrt{{\psi }_{i}{\xi }_{i}})}^{2}}{{(\mathop{\sum }\nolimits_{i=1}^{m}{\psi }_{i})}^{2}+2\mathop{\sum }\nolimits_{i=1}^{m}{\psi }_{i}\mathop{\sum }\nolimits_{i=1}^{m}{\rho }_{i}\sqrt{{\psi }_{i}{\xi }_{i}}+\mathop{\sum }\nolimits_{i=1}^{m}{\psi }_{i}\mathop{\sum }\nolimits_{i=1}^{m}{\xi }_{i}}\right).$$
(9)

In the scenario where all interaction effects are independent of the main effects, i.e., ρi ≡ 0, i = 1, …, m, Eq. (9) reduces to \({h}^{2}\left(1-\frac{1}{1+R}\right)\), where \(R=\mathop{\sum }\nolimits_{i=1}^{m}{\psi }_{i}/\mathop{\sum }\nolimits_{i=1}^{m}{\xi }_{i}\) is the ratio between the total main effect and the total interaction effect. If R → ∞, that is, there is no interaction effect at all, then cor²(SDis, Y) → h². If, however, there is a strong interaction effect and no main effect, R → 0, then cor²(SDis, Y) → 0. This observation is consistent with the intuition that the disease PRS approach, which ignores the treatment-by-genotype effects, is less suitable when such effects are strongly present.

In fact, by the Cauchy-Schwarz inequality, cor²(SDis, Y) ≤ h², and equality holds if and only if

$${\rho }_{i}\equiv 1\,{{\mbox{and}}}\,{\psi }_{i}\propto {\xi }_{i},\quad {{\mbox{for all}}}\,i=1,\cdots \,,\quad m,$$
(10)

which is equivalent to

$${\beta }_{i}=c{\alpha }_{i},\quad i=1,\cdots \,,\quad m,\,\,{{\mbox{for some constant number}}}\,\,c.$$
(11)

This explicitly shows that the disease PRS approach SDis works only under an extremely stringent assumption that every causal variant must have the same interaction effect proportionate to its main effect. We also consider the situations when the regression coefficients βi, αi, i = 1, . . . , m, are fixed constants (in Supplementary Method B). The proof also shows that disease PRS SDis cannot recover all heritability as long as the interaction effect is not proportionate to its main effect for all causal variants.
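
The behavior of Eq. (9) in these special cases can be checked numerically. The sketch below evaluates the bracketed factor cor²(SDis, Y)/h² for given ψi, ξi, and ρi; the function name `dis_to_h2_ratio` and the toy parameter values are hypothetical, and the function assumes that the sum of the ψi is positive.

```python
import math


def dis_to_h2_ratio(psi, xi, rho):
    """Return cor^2(S_Dis, Y) / h^2, i.e. the bracketed factor in Eq. (9)."""
    s_psi, s_xi = sum(psi), sum(xi)
    s_cross = sum(r * math.sqrt(p * x) for p, x, r in zip(psi, xi, rho))
    numerator = s_psi * s_xi - s_cross ** 2
    denominator = s_psi ** 2 + 2 * s_psi * s_cross + s_psi * s_xi
    return 1.0 - numerator / denominator


m = 4
xi = [1.0, 2.0, 3.0, 4.0]

# Independent effects (rho = 0): the factor equals 1 - 1 / (1 + R).
psi = [2.0, 1.0, 4.0, 3.0]
R = sum(psi) / sum(xi)
independent = dis_to_h2_ratio(psi, xi, [0.0] * m)

# Condition (10): rho = 1 and psi proportional to xi, so nothing is lost.
proportional = dis_to_h2_ratio([9.0 * x for x in xi], xi, [1.0] * m)

# rho = 1 but psi not proportional to xi: the factor drops below 1.
violated = dis_to_h2_ratio(psi, xi, [1.0] * m)
```

In this toy setting R = 1, so the independent case recovers half of h², the proportional case recovers all of it, and the non-proportional case recovers slightly less than all of it.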

Using the IMPROVE-IT PGx GWAS summary statistics data and the 1000 Genomes (1KG) Phase 3 data (http://csg.sph.umich.edu/abecasis/mach/download/1000G.Phase3.v5.html) as an external reference panel, we estimate cor²(SDis, Y) = h²(1 − 0.54) = 0.46h², which means that a PRS developed from any disease GWAS can explain at most 46% of the genetic variability of the drug response. In addition, we calculate the ratio of genetic main effect to interaction effect, c, for the SNPs (after clumping with a 250 kb window size and LD r² > 0.8) across the whole genome (m = 8,551,930) and for the top SNPs, defined by p-values of the 2df (joint G and G × T) two-sided test26 below three thresholds: 1e−06 (m = 16), 1e−05 (m = 81), and 1e−04 (m = 472). Figure 1 shows that the constant ratio assumption (11) is clearly not satisfied. Therefore, the performance of PRS-Dis methods is expected to be lower when they are applied to real PGx data (i.e., the IMPROVE-IT PGx GWAS data). Our real data analysis results indeed show that the PRS-Dis methods have substantially lower predictive power than the PRS-PGx methods.

Fig. 1: Distributions of the prognostic to predictive effect size ratios calculated from the IMPROVE-IT PGx GWAS summary statistics data with n = 5661 unrelated European samples.

The left boxplot shows the distribution of whole genome SNPs (after clumping, m = 8,551,930). The right three boxplots show the distribution of top SNPs (after clumping) with their 2df (G + G × T) two-sided test p-values less than the three p-value thresholds 1e−06 (m = 16), 1e−05 (m = 81), and 1e−04 (m = 472), respectively. In each boxplot, the band indicates the median, the box indicates the first and third quartiles, and the whiskers indicate ± 1.5 × interquartile range. Effect size ratios of m SNPs are overlaid on the corresponding boxplot as dot points.

Simulation studies

In this section, we further illustrate the limitations of PRS-Dis methods and compare their empirical performance with the proposed PRS-PGx methods. We considered the scenario where all causal variants were both prognostic and predictive and their effect sizes were positively correlated. The constant ratio assumption (11) was only partially satisfied since the correlation coefficients were assumed to follow uniform distribution. As a sensitivity analysis, we also considered the scenario where all causal variants were either prognostic or predictive but could not be both, i.e., the constant ratio assumption (11) was strongly violated.

The constant ratio assumption of PRS-Dis is partially satisfied

We simulated SNPs’ prognostic and predictive effects from a bivariate normal distribution, as described in Eq. (15). Specifically, we set the heritability H² = 0.3, the treatment effect βT = 0, and the prognostic and predictive effect sizes at the same scale with ψ/ξ = 1. Note that although ψi = ξi for all \(i\in {{{{{{{\mathcal{I}}}}}}}}\), where \({{{{{{{\mathcal{I}}}}}}}}\) is the set of causal variants, the correlation coefficient between the two effects, ρi ~ Uniform(0, 1), may vary across LD blocks. The full details of the data generation process are provided in the “Methods” section.

Before we compared PRS-PGx and PRS-Dis methods, we first assessed the performance among the three disease PRS methods (PRS-Dis-Unadj, PRS-Dis-CT, and PRS-Dis-LDpred2) and the three machine learning-based PRS-PGx methods (PRS-PGx-L, PRS-PGx-GL, and PRS-PGx-SGL), respectively. As shown in Supplementary Fig. 1, among the three disease PRS methods, PRS-Dis-LDpred2 outperformed the others in terms of both R2 and the statistical significance of its predictive effect. Similarly, among three penalized regression approaches, PRS-PGx-GL was consistently favored in the current simulation setting. Therefore, in the remaining simulations and real data analyses, we focused on only PRS-Dis-LDpred2 among all PRS-Dis methods; and only PRS-PGx-GL among all the PGx penalized regression methods.

Five polygenic prediction methods, PRS-Dis-LDpred2, PRS-PGx-Unadj, PRS-PGx-CT, PRS-PGx-GL, and PRS-PGx-Bayes, were compared across different settings of sample sizes, numbers of causal variants, heritabilities, and effect sizes. The tuning parameters, such as the p-value threshold in PRS-PGx-CT, the penalty parameter in PRS-PGx-GL, and some prior distribution parameters in PRS-PGx-Bayes, were selected via 5-fold cross-validation (CV). The 1000 Genomes Project European population data was used as an external reference panel for LD. The performance was evaluated in an independent testing set (sample size = 1000) in terms of (i) the prediction accuracy of SPGx, quantified by R² between the observed and predicted phenotypes in two arms; (ii) the predictive effect, measured by the −log10(p-value) from the two-sided likelihood ratio test (LRT) of the Spred × T interaction; and (iii) R² of SPGx under the treatment and control arms, respectively. The statistical details of the above analyses are provided in the “Methods” section.
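
The two evaluation metrics can be illustrated with a small, self-contained Python sketch. It is written under stated assumptions rather than taken from the paper's Methods: R² comes from a simple regression of the phenotype on one score, and the interaction test compares Y ~ T + Sprog with Y ~ T + Sprog + T × Spred using the Gaussian likelihood ratio statistic n·log(RSS0/RSS1) on 1 degree of freedom. All function names and the simulated toy data are hypothetical.

```python
import math
import random


def ols_rss(columns, y):
    """Least-squares fit of y on an intercept plus `columns`; return the RSS."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    coef = [A[k][p] / A[k][k] for k in range(p)]
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def r_squared(y, predictor):
    """R^2 from the simple regression of y on one predictor (e.g. S_PGx)."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ols_rss([predictor], y) / tss


def interaction_lrt_pvalue(y, t, s_prog, s_pred):
    """Two-sided LRT p-value for the S_pred x T term (1 degree of freedom)."""
    inter = [ti * si for ti, si in zip(t, s_pred)]
    rss0 = ols_rss([t, s_prog], y)
    rss1 = ols_rss([t, s_prog, inter], y)
    stat = len(y) * math.log(rss0 / rss1)
    # Survival function of a chi-square(1) variable: P(Z^2 > stat).
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Toy testing set with a genuine S_pred x T interaction (hypothetical values).
rng = random.Random(7)
n = 200
t = [i % 2 for i in range(n)]
s_prog = [rng.gauss(0, 1) for _ in range(n)]
s_pred = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * p + ti * q + rng.gauss(0, 1) for p, q, ti in zip(s_prog, s_pred, t)]
s_pgx = [0.5 * p + ti * q for p, q, ti in zip(s_prog, s_pred, t)]
r2 = r_squared(y, s_pgx)
p_value = interaction_lrt_pvalue(y, t, s_prog, s_pred)
```

Reporting −log10(p_value) then gives the predictive-effect metric used in the figures; a statistics library would normally replace the hand-written solver, which is used here only to keep the sketch dependency-free.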

The predictive performance of the five polygenic prediction methods from the simulation studies is summarized in Fig. 2a. The PRS-PGx methods generally outperformed the PRS-Dis method (i.e., PRS-Dis-LDpred2). Among the PRS-PGx methods, our proposed Bayesian approach PRS-PGx-Bayes was consistently better than the others. Overall, the PRS-PGx-Unadj approach, which aggregated all SNPs, performed poorly when the number of causal variants was small, but became more comparable to the other methods when the genetic architecture was highly polygenic. Although it is reasonable to expect PRS-PGx-GL (which accounts for local LD patterns) to outperform PRS-PGx-CT (which does not consider the impact of LD information), we observed the opposite pattern in our simulations. This is likely because Lasso-based methods are sensitive to noise and suffer most when the signal-to-noise ratio is small, which was the case in our simulated data. Finally, for all methods, the prediction accuracy decreased as the number of causal variants increased for a fixed heritability. This is because, as more causal SNPs were in LD (as a result of more causal SNPs being randomly sampled across the genome) and their effect sizes declined, it became increasingly difficult to distinguish real signals from noise. Furthermore, we compared the R² of different methods in the treatment and control arms separately (Fig. 2c, d). The performance in the treatment arm followed a pattern similar to that of R² in both arms. However, the results from the control arm showed a different pattern: PRS-Dis-LDpred2 appeared to be superior to the PRS-PGx methods. Note that under the control arm, the underlying true model becomes EY = Gβ. Therefore, a large-scale disease GWAS is able to perfectly recover β’s with \(\hat{\beta }\)’s, which implies that the disease PRS (=\(\mathop{\sum }\nolimits_{i=1}^{m}{\hat{\beta }}_{i}{{{{{{{{\bf{G}}}}}}}}}_{i}\)) is able to capture the prognostic effect under the control arm. Under this condition, the disease PRS may have an advantage over the PGx PRS since it is constructed from disease GWAS with a much larger sample size. Nevertheless, our proposed PRS-PGx-Bayes was still comparable to PRS-Dis-LDpred2 (Fig. 2d).

Fig. 2: Predictive performance of five polygenic prediction methods in the simulation studies, where heritability was fixed at 0.3 and ψ/ξ = 1.

The numbers of causal variants for P(causal) = 0.001, 0.01, and 0.1 were 5, 50, and 500, respectively. The training sample size was either 1000 or 3000 for the PRS-PGx approaches and 20,000 for the PRS-Dis-LDpred2 approach. The tuning parameters were selected via cross-validation in the training data. The performance was assessed in terms of (a) prediction accuracy R² of SPGx in two arms, (b) predictive p-value for the two-sided Spred × T interaction test, (c) R² of SPGx under the treatment arm, and (d) R² of SPGx under the control arm. Data are presented as mean values +/− standard deviations (error bars) with 10,000 replications, where results were calculated from the testing sets.

In addition to prediction accuracy, we summarized the predictive p-values (i.e., the significance of the Spred × T interaction) across different methods in Fig. 2b. As expected, the PRS-PGx methods showed a clear advantage over the PRS-Dis method PRS-Dis-LDpred2. This is not surprising since, as discussed above, the disease PRS can fully capture the predictive effect only when the strong constant ratio assumption (11) is satisfied. Furthermore, our proposed Bayesian approach PRS-PGx-Bayes generally outperformed the other methods, which was consistent with our previous observations in terms of R². P-values obtained by the two-sided LRT of SPGx from Y ~ SPGx under each of the two arms are provided in Supplementary Fig. 2.

As shown in Supplementary Fig. 3, we further compared distributions of (\(\hat{\beta },\; \hat{\alpha }\)) estimated from different PRS-PGx methods versus the true value of (β, α) under four different genetic architectures, ranging from no signal (P(causal) = 0) and sparse signals (P(causal) = 0.001) to dense signals (P(causal) = 0.01 or 0.1). In the scenario where no causal variants were simulated (i.e., the null hypothesis scenario), PRS-PGx-CT misidentified a few SNPs. However, the type I error rate was still well controlled. More specifically, PRS-PGx-CT misidentified 7 SNPs, which is comparable to the expected number of false positives (i.e., 5000 × 0.001 = 5). Both PRS-PGx-GL and PRS-PGx-Bayes shrank prognostic and predictive effect sizes to zero. In the scenarios where 5, 50, or 500 causal variants were simulated, PRS-PGx-Bayes estimated the genetic effects more accurately than the other methods.

To further assess the impact of different implementation strategies, we performed additional simulations in which PRS-PGx-Bayes was applied to all LD blocks jointly (i.e., the full LD matrix across LD blocks was used). The simulation settings remained the same as described in Fig. 2. As a sensitivity analysis, we also applied the PRS-PGx-Bayes method to uniform blocks with 200, 500, and 2500 variants per block. Supplementary Fig. 4 shows a slightly decreasing trend in R² and −log10(p-value) from using the full genotype matrix to using uniform blocks of size 200. However, such differences across block types are limited, especially between LD blocks and the full genotype matrix. For example, when P(causal) = 0.001, the LD block approach decreases R² by only 0.4% compared with the full genotype matrix. The relative decreases of the LD block approach compared with the full genotype approach are summarized in Supplementary Table 1, which shows that they are very small in the simulation studies (i.e., all ≤1.1%).

To assess the performance of the proposed methods under different heritabilities, we conducted sensitivity analyses by setting H² = 0.1 and 0.5, and the results showed a very similar pattern (Supplementary Fig. 5). Supplementary Fig. 6 shows the sensitivity analysis results when the treatment effect βT was set to 1; the methods performed very similarly to the setting with βT = 0. Furthermore, we assessed the impact of different scales of the prognostic and predictive effect sizes on the methods’ performance. When the two effect sizes were set at different scales (ψ/ξ = 16 or 1/16), the sparse group Lasso-based method (PRS-PGx-SGL) performed the best among the three penalized regression methods, while PRS-PGx-Bayes still outperformed all the other methods (Supplementary Fig. 7). In addition, when ψ/ξ = 16 (i.e., the heritability is mostly explained by the prognostic effect), it was not surprising that PRS-Dis-LDpred2 was at least comparable to most PRS-PGx methods in terms of the prediction accuracy R². Even so, the performance of the PRS-Dis methods was much worse than that of the PRS-PGx methods in terms of capturing the predictive effect.

The constant ratio assumption of PRS-Dis is strongly violated

We simulated completely separate sets of prognostic and predictive SNPs so that no SNPs were both prognostic and predictive. Under this condition, the ratio of main to interaction effects (i.e., β/α) for each causal variant was either 0 or ∞. The details of data generation are provided in the “Methods” section.
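
A rough sketch of this kind of effect-size layout is given below. It is not the data generation procedure from the “Methods” section: the function name `simulate_separate_effects`, the even split between prognostic and predictive variants, the normal effect-size distribution, and the value of `effect_sd` are all assumptions made for illustration.

```python
import random


def simulate_separate_effects(m, n_causal, effect_sd, seed=0):
    """Return (beta, alpha) where no SNP has both effects non-zero."""
    rng = random.Random(seed)
    causal = rng.sample(range(m), n_causal)
    beta, alpha = [0.0] * m, [0.0] * m
    for k, j in enumerate(causal):
        effect = rng.gauss(0.0, effect_sd)
        if k % 2 == 0:
            beta[j] = effect   # prognostic only: beta/alpha is infinite
        else:
            alpha[j] = effect  # predictive only: beta/alpha is 0
    return beta, alpha


# m = 5000 SNPs with P(causal) = 0.01, i.e. 50 causal variants.
beta, alpha = simulate_separate_effects(m=5000, n_causal=50, effect_sd=0.1)
```

Because every causal SNP carries exactly one of the two effects, no constant c can satisfy Eq. (11) across the causal set.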

The simulation results are summarized in Fig. 3, which shows that, when assumption (11) is not satisfied, the PRS-PGx methods uniformly outperformed the PRS-Dis methods in terms of prediction accuracy across different settings of causal variants. Figure 3a shows that the average R² values of the PRS-Dis methods were all below 0.1, while those of the PRS-PGx methods were larger than 0.13. This advantage was even more pronounced for capturing the predictive effect, which was measured by the predictive p-value from the two-sided likelihood ratio test of the Spred × T interaction. Figure 3b shows that the PRS-Dis-LDpred2 method yielded geometric mean p-values > 0.01, whereas the geometric mean p-values of the PRS-PGx methods were <1e−8. The detailed results from the three disease PRS methods and the three penalized regression methods are summarized in Supplementary Fig. 8.

Fig. 3: The drug response prediction performance comparison among five methods based on the simulated data with completely separate prognostic and predictive SNPs and heritability fixed at 0.3.

The training sample size for the PGx PRS approaches was fixed at 3000. The numbers of causal variants for P(causal) = 0.001, 0.01, and 0.1 were 5, 50, and 500, respectively. The performance was assessed in terms of (a) prediction accuracy R² of SPGx in two arms and (b) predictive p-value for the two-sided Spred × T interaction test. Data are presented as mean values +/− standard deviations (error bars) with 10,000 replications, where results were calculated from the testing sets.

Computational time

To assess the computational burden of the proposed method, we applied the PRS-PGx-Bayes function with 1000 MCMC iterations to chromosome 6, LD block 33 (the largest LD block, with 11,769 SNPs). As a sensitivity analysis, we also explored scenarios in which 1000, 3000, 5000, 7000, or 9000 SNPs were randomly chosen from that block. The real genetic data was obtained from the IMPROVE-IT trial with a sample size of 5661. The effect sizes and phenotype data were simulated with heritability fixed at 0.3, ψ/ξ = 1, and P(causal) = 0.01. The tuning parameters were selected via cross-validation. The computation was completed on a single core of a 2.4 GHz Intel Core i5. We summarized the result in Supplementary Fig. 9, which shows that the computational time increased at a rate between m² and m³, where m denotes the number of variants. The result also shows that it took roughly 5.9 h for the largest LD block and 1 h for the median-size LD block (Supplementary Fig. 10) to complete the computation. In practice, since the computation in each LD block is independent, the computational time can be further shortened by processing the 1725 LD blocks across the whole genome in parallel. In the authors’ High Performance Computing working environment, the IMPROVE-IT whole genome analysis took about 35 h, where typically 50 jobs were run simultaneously.

Polygenic prediction of drug responses in the IMPROVE-IT PGx GWAS study

We applied the four proposed PRS-PGx methods (PRS-PGx-Unadj, PRS-PGx-CT, PRS-PGx-GL, PRS-PGx-Bayes) and two PRS-Dis methods (PRS-Dis-CT and PRS-Dis-LDpred2) to the IMPROVE-IT PGx GWAS summary statistics data to predict the low-density lipoprotein cholesterol (LDL-C) log-fold change at 1 month from the two treatment arms. The two treatment arms are the treatment arm with the combined therapy (Ezetimibe + Simvastatin: 10 mg + 40 mg) and the active control arm with monotherapy (Simvastatin: 40 mg). We adjusted for age, gender, prior lipid-lowering (PLL) therapy, early glycoprotein IIb/IIIa inhibition in non-ST-segment elevation acute coronary syndrome (EARLY ACS) trial, high-risk ACS diagnosis, baseline LDL-C level, and the top five principal components when generating the IMPROVE-IT summary statistics data for the LDL-C drug response phenotype.

To apply PRS-PGx methods to the IMPROVE-IT data, we used nested cross-validation. More specifically, the IMPROVE-IT data was split into five folds in the outer layer of cross-validation with four for training and one for testing. The training set was used to obtain the PGx GWAS summary statistics. In the inner layer of cross-validation, the training set was further split into four folds, three for training and one for validation, to select the optimal tuning parameters (i.e., p-value cutoff for PRS-PGx-CT, penalty λ for PRS-PGx-GL and (v, ϕ) for PRS-PGx-Bayes). We compared performance across different methods with the results summarized from the testing set. The prediction accuracy was measured by R2 and summarized in Table 1. The capabilities of the PRS methods in capturing the prognostic and predictive effects were measured by their effect sizes, as well as association p-values, and shown in this table as well.
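
The nested splitting scheme described above (five outer folds, four inner folds within each outer training set) can be sketched as index bookkeeping. This is a hypothetical helper, not the authors' pipeline: the names `k_fold_indices` and `nested_cv_splits`, the random shuffling, and the seed are assumptions, and the model fitting steps are omitted.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n, outer_k=5, inner_k=4, seed=0):
    """Yield (test, inner_splits) for each outer fold.

    inner_splits is a list of (train, validation) index lists drawn from
    the outer training set, used only for tuning-parameter selection.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n), outer_k, rng)
    for i, test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, rng)
        inner_splits = []
        for v, validation in enumerate(inner_folds):
            train = [idx for j, fold in enumerate(inner_folds)
                     if j != v for idx in fold]
            inner_splits.append((train, validation))
        yield test, inner_splits


# n = 5661 matches the IMPROVE-IT sample size quoted in the text.
splits = list(nested_cv_splits(n=5661))
```

Within each outer fold, summary statistics would be computed on the training indices, tuning parameters chosen on the validation indices, and performance reported only on the held-out test indices.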

Table 1: IMPROVE-IT PGx GWAS data analysis results: R², p-values of two-sided tests, and effect sizes

Consistent with the simulation results, the two PRS-Dis methods performed poorly in terms of R² and predictive p-value. In contrast, the PRS-PGx approaches demonstrated an overall improvement in both metrics. For example, PRS-PGx-Bayes increased the prediction accuracy R² in both arms to 0.214, compared with 0.174 from the best disease PRS method, PRS-Dis-LDpred2. In the treatment arm, PRS-PGx-Bayes improved R² by 0.112 compared with PRS-Dis-LDpred2 (0.277 vs. 0.165). On the other hand, PRS-Dis-LDpred2 was superior to the PRS-PGx methods in terms of R² under the control arm, which might be partially because we used disease GWAS summary statistics with a much larger sample size (n > 300,000) to construct the disease PRS in the PRS-Dis analyses. Nevertheless, our proposed method PRS-PGx-Bayes still provided a comparable R² (i.e., 0.194 from PRS-PGx-Bayes vs. 0.201 from PRS-Dis-LDpred2). In terms of the predictive p-value, PRS-PGx-Bayes yielded a much more significant predictive (or interaction) p-value of 5.4e−05, compared with 0.033 from PRS-Dis-LDpred2. In addition, p-values obtained by the LRT of SPGx showed a pattern very similar to that of R² in both the treatment and control arms. Table 1 also shows that the marginal effect sizes \({\widehat{\beta }}_{G}\) of SPGx from the model Y ~ SPGx under the treatment arm were all negative across the PRS-Dis and PRS-PGx methods, indicating that a larger PRS would result in more LDL reduction after 1 month of treatment with Ezetimibe + Simvastatin. PRS-PGx-Bayes also outperformed the others, with the largest absolute effect size \({\widehat{\beta }}_{G}\). Similarly, the interaction effect sizes \({\widehat{\beta }}_{G\times T}\) of Spred from the model Y ~ T + Sprog + T × Spred were all negative across all methods, implying that a larger predictive score would result in a larger treatment effect (i.e., Ezetimibe + Simvastatin combination vs. Simvastatin monotherapy). PRS-PGx-Bayes was again superior to the others, with the largest absolute effect size \({\widehat{\beta }}_{G\times T}\).

We further compared the patient stratification performance across different methods, with the results summarized in Fig. 4. In Fig. 4a, we used four fixed quantiles (0–25%, 25–50%, 50–75%, and 75–100%). The results indicated that although the overall population had a positive treatment effect (i.e., Simva+EZ is better), the treatment effects varied across patient subgroups when stratified by the predictive score. Furthermore, the predictive score determined by PRS-PGx-Bayes was generally superior to those from other methods for patient stratification. Specifically, the ratios of the treatment effect in the top 75–100% subgroup to that in the bottom 0–25% subgroup were 1.27, 1.48, 1.65, 3.27, 1.94, and 10.28 for PRS-Dis-CT, PRS-Dis-LDpred2, PRS-PGx-Unadj, PRS-PGx-CT, PRS-PGx-GL, and PRS-PGx-Bayes, respectively. In Fig. 4b, patients were stratified into the top 10%, 20%, …, 90% percentile of the predictive score vs. the rest, respectively, and the corresponding between-group differential treatment effect was calculated. Among the six methods, PRS-PGx-Bayes had the largest differential treatment effect across different cutoff points, followed by PRS-PGx-CT and PRS-PGx-GL, while the remaining three methods had the lowest differential treatment effects. The optimal cutoff point for PRS-PGx-Bayes occurred between 50% and 60%, with a differential treatment effect of around 0.52. Instead of using fixed quantiles, we also determined the optimal quantile cutoffs with the largest differential treatment effect estimated from the 5-fold cross-validation (training and testing) procedures. The corresponding ability of PRS-PGx-Bayes to stratify patients with greater clinical benefits was assessed in different validation sets, with the results summarized in Supplementary Fig. 11. The differences in treatment effects between high and low predictive score subgroups were very clear in the overall population as well as in four out of five CVs.
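
The stratification summaries above can be sketched as follows. The sketch assumes that the treatment effect within a subgroup is the difference in mean response between the two arms (mean of Y for T = 1 minus mean of Y for T = 0); the sign convention and estimator used in the paper may differ. The function names and the eight-subject toy data set are hypothetical, and each subgroup must contain subjects from both arms.

```python
def treatment_effect(y, t):
    """Difference in mean response between arms: mean(Y|T=1) - mean(Y|T=0)."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def differential_effect(y, t, score, top_fraction):
    """Treatment effect in the top `top_fraction` of scores minus the rest."""
    order = sorted(range(len(score)), key=lambda i: score[i], reverse=True)
    n_top = round(top_fraction * len(score))
    top, rest = order[:n_top], order[n_top:]
    te_top = treatment_effect([y[i] for i in top], [t[i] for i in top])
    te_rest = treatment_effect([y[i] for i in rest], [t[i] for i in rest])
    return te_top - te_rest


# Toy data: the treatment only helps subjects with a high predictive score.
score = [1, 2, 3, 4, 5, 6, 7, 8]
t = [0, 1, 0, 1, 0, 1, 0, 1]
y = [1, 1, 1, 1, 0, 2, 0, 2]
overall = treatment_effect(y, t)
diff_half = differential_effect(y, t, score, top_fraction=0.5)
```

Scanning `top_fraction` over 0.1, 0.2, …, 0.9 and plotting the result would give a curve of the kind described for Fig. 4b.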

Fig. 4: Patient stratification performance of six polygenic prediction methods in the IMPROVE-IT PGx real data analysis with n = 5661 unrelated European samples.

a Quantile plot of treatment effect using four fixed quantiles (0–25%, 25–50%, 50–75%, and 75–100%). Each dot stands for the observed Treatment Effect (TE), and each bar denotes the 95% Confidence Interval (CI). b Differential treatment effect when patients were stratified into the top 10%, 20%, …, 90% of the predictive score vs. the rest.

In terms of variant selection, PRS-Dis-CT identified 280 SNPs to construct the disease PRS, with the optimal p-value cutoff determined as 5e−08, and PRS-PGx-CT selected 91 SNPs into the PGx PRS, with the optimal p-value cutoff selected as 1e−07. The number of SNPs identified by PRS-PGx-CT is much smaller because the sample size of the IMPROVE-IT PGx GWAS data is much smaller than that underlying the disease GWAS summary statistics. PRS-PGx-Bayes tends to shrink the effect sizes of most SNPs close to zero (but not exactly to zero), which is consistent with our simulation results. Distributions of the predictive effect sizes of the whole genome SNPs estimated by PRS-PGx-CT, -GL, and -Bayes are shown in Supplementary Fig. 12. The top 20 SNPs with the largest predictive effect sizes (>0.1) reported by PRS-PGx-Bayes are summarized in Supplementary Table 2; most of them have prior association evidence in the literature and the Open Targets database (https://genetics.opentargets.org/).

Discussion

In this article, we develop a series of PRS-PGx methods to construct PGx-based PRS and predict the polygenic component of drug responses in PGx studies. To the best of our knowledge, no existing method can be directly applied to jointly model both prognostic and predictive effects for drug response prediction. The necessity of using PGx PRS approaches instead of disease PRS approaches is supported by our proof that extremely stringent assumptions are needed for the disease PRS approach to predict drug response. Our proposed PRS-PGx methods include a simple method using whole genome variants (PRS-PGx-Unadj), a clumping and p-value thresholding method (PRS-PGx-CT), three penalized regression methods (PRS-PGx-L, PRS-PGx-GL, PRS-PGx-SGL), and a novel Bayesian regression method (PRS-PGx-Bayes). Except for the penalized regression-based algorithms, all the other methods can take PGx summary statistics as input without relying on individual-level genotype and phenotype data. Compared with the PRS-Dis methods, the PRS-PGx approaches can shrink variants’ main and interaction effect sizes simultaneously, and construct PGx scores including a prognostic PRS and a predictive PRS. Thus, in our PRS-PGx-Bayes method, we propose to accommodate both effects and their correlation by modeling a variance-covariance matrix. Although the inverse Wishart (IW) prior is widely used, in this paper we choose the hierarchical half-t prior27 instead because of a limitation of the IW prior (i.e., it imposes a dependency between the correlations and the variances). Moreover, by introducing global-local continuous shrinkage priors on SNP effect sizes, our proposed PRS-PGx-Bayes method is more robust to varying relationships between main and interaction effects than the other PRS-PGx methods.
Our extensive simulation studies and the PRS analyses based on the IMPROVE-IT PGx GWAS data show that the proposed PRS-PGx methods generally outperform the PRS-Dis methods in terms of the prediction accuracy R2 and the significance of their predictive effects. Furthermore, the PGx PRS Bayesian approach (PRS-PGx-Bayes) is superior to all the other methods, under different genetic architectures. Interestingly, we find that the C+T method (PRS-PGx-CT) can outperform the penalized regression-based methods (PRS-PGx-L, PRS-PGx-GL, and PRS-PGx-SGL) in some of the scenarios. This pattern was also observed in Mak et al.11. Our results suggest that Lasso-based methods are sensitive to the noise, and perform poorly when the signal-to-noise ratio is small (Supplementary Figs. 5 and 7). Further study is needed to examine the difference more comprehensively.

Despite the successful application of polygenic risk scores in disease genetic studies for disease prediction and stratification, PRS analyses in PGx studies from randomized clinical trials are usually more complex. On one hand, although sample sizes from PGx studies are usually smaller than those in disease GWAS, the significantly larger effect sizes from PGx studies28,29 can result in decent power to detect variants associated with drug response phenotypes and further enable good drug response prediction performance of PGx PRS. On the other hand, the availability of summary statistics from PGx GWAS is expected to increase quickly (i.e., similar to the case for summary statistics from disease GWAS). Therefore, statistical methods customized for PGx PRS analysis are urgently needed for drug response prediction and patient stratification in PGx GWAS studies, with the ultimate goal of achieving precision medicine. Unfortunately, there have so far been very limited efforts to construct PGx PRS and successfully apply them to efficacy-based PGx studies16,22,23. The most common PRS analysis strategy in the literature is to construct the PRS from disease GWAS and apply it to PGx data. We used rigorous statistical modeling to prove that PRSs built from disease GWAS cannot capture the full heritability of drug responses. Qualitatively speaking, the disease PRS approach is only able to fully predict drug responses under the extremely strong assumption that all variants used for constructing the PRS have the same relationship between their main effect and interaction effect (i.e., the ratio between the prognostic G and predictive G × T effect sizes is a constant). Violation of this assumption may result in very limited predictive power. For example, we showed in our theoretical analysis that any PRS developed from disease GWAS can explain at most 46% of the variability of the LDL-C drug response in the IMPROVE-IT PGx GWAS data.
In addition, the disease PRS likely lacks the molecular specificity to be directly informative for clinical interventions30 since the PRS is built from variants with only prognostic effects (i.e., without considering the variants with predictive effects related to treatment). This may partially explain why there has been little investigation into PGx PRS and why very few successful PGx PRS applications have been published for predicting drug responses in the PGx space so far. One successful example, whose analysis strategy most closely resembles our PGx PRS analysis strategy, is from Lanfear et al.16. A direct PGx polygenic response predictor (PRP) was constructed from a genome-wide analysis of β-Blocker (BB) × SNP interaction and successfully predicted all-cause mortality (BB survival benefit) in European ancestry patients with reduced ejection fraction heart failure16. The authors built the PRS from selected variants (using the simple p-value thresholding method) with only predictive effects (i.e., from 44 SNPs with strong BB × SNP interaction) and then applied it to the validation PGx GWAS data for PRS effect evaluation and patient stratification. Compared with the PGx PRS construction in this example, the PGx PRS analysis strategy embedded in our PRS-PGx methods constructs both the prognostic and predictive PRSs by jointly modeling both effects in the base GWAS data and then tests them on validation data. This provides a systematic way to understand the full picture of the PRS’s association with drug response. Furthermore, in addition to the p-value thresholding method, the more advanced (Bayesian regression based) method PRS-PGx-Bayes, which outperformed the simple p-value thresholding method in our simulations and real data analyses, can also be applied in such settings.
In summary, our drug response prediction analysis strategies based on proposed PRS-PGx methods, especially PRS-PGx-Bayes, are highly recommended over the various strategies and existing methods from current literature.

The genetic architectures of responses to a single drug or a drug class determine the proportion of variants across the human genome contributing to their heritability and the proportion of variants with small, moderate, or large effects in capturing the heritability. Our research tackles the challenges of PRS methodology in PGx so that PRS can be directly and accurately used for drug response prediction and patient stratification. However, it is usually challenging to find two independent PGx GWAS datasets, or summary statistics from a relatively large PGx GWAS whose trait is the same as or similar to that in an independent testing PGx GWAS. In addition to discovering the top variants with large effects associated with drug responses (as most current PGx GWAS do), we call for carefully examining the genetic architectures of drug responses in PGx GWAS and sharing summary statistics from those studies in the public domain. Furthermore, recording both the prognostic and predictive (the genotype by treatment interaction) effects (β, α) as GWAS summary statistics in future PGx studies from randomized clinical trials and sharing them will enable the wide application of the proposed PRS-PGx methods and accelerate the development and deployment of PRS based on PGx data for precision medicine.

Recent studies31 have shown that Genotype-by-Environment Interaction (GEI or G × E) may explain a significant proportion of phenotypic variance compared with the main genotype (G) test in disease genetics studies. Our proposed PRS-PGx methods are directly applicable to the scenario of a genome-wide GEI test or a G + G × E joint test if environmental variables are considered in PRS development and deployment in disease genetics. In addition, although we mainly demonstrate the PRS-PGx methods in the analysis of continuous drug response phenotypes, all of them except PRS-PGx-Bayes can be directly applied to binary (e.g., drug-induced adverse reactions or other drug safety responses) and survival (i.e., time-to-event drug responses) phenotypes. It is conceptually straightforward to extend PRS-PGx-Bayes to binary and survival endpoints by adopting Bayesian logistic regression32 and the Bayesian Cox proportional hazards model33,34,35 instead of Bayesian linear regression; however, how to derive their posterior distributions remains a question for future research. Moreover, our methods are developed based on a single-ancestry population (e.g., the European population). A direct application of the resulting score to other ethnic groups may result in considerable loss in prediction accuracy. With the rapid growth of non-European genomic resources in recent years, it is therefore of great research and public interest to extend the proposed PRS-PGx approaches to the trans-ethnic scenario in the future. In addition, for the purpose of effect size shrinkage, existing Bayesian methods lack a systematic way to determine the optimal prior. For example, the LDpred method uses the spike-and-slab prior12; the SBayesR method uses the spike-and-slab prior with the normal replaced by a mixture of normals36; the DPR method uses the Dirichlet process prior37; and Griffin and Brown use the Normal-Gamma shrinkage prior38.
The PRS-CS method14 and our proposed PRS-PGx-Bayes method use one of the most popular continuous shrinkage priors (global-local scale mixtures of normals, i.e., the Horseshoe prior). One potential drawback of the Horseshoe prior is that there is no consensus on how to place a prior on ϕ based on information about the sparsity39, whether by grid searching for the optimal ϕ (as in PRS-PGx-Bayes) or by applying full Bayesian inference (as in PRS-CS). Thus, another possible future research direction is to systematically study the impact of different priors on PRS-PGx-Bayes performance and then determine the optimal one. Furthermore, if the Horseshoe prior is used, we may further improve the PRS-PGx-Bayes algorithm by automatically updating (ν, ϕ) based on the sparsity information instead of grid searching for the tuning parameters. Last, we suggest applying our proposed methods by LD blocks. Expanding the size of blocks (e.g., using the whole chromosome) may slightly improve prediction accuracy but also significantly increases computational costs. On the other hand, further reducing the size of blocks (e.g., using uniform blocks of smaller size) can reduce run-time but may also increase the bias by missing long-range LD. We believe that the by-LD-block strategy is a reasonable trade-off between decent modeling accuracy and feasible computational time. We also acknowledge that future work is needed to further improve the computational efficiency so that more SNPs can be analyzed simultaneously.

As next-generation sequencing and other genetic platforms become much cheaper and more routinely embedded within PGx research studies, more rare variants may be evaluated and found to be associated with drug responses. There will be an increasing need to consider pharmacogenomic variants, both common and rare, with large, moderate, or small effects, to predict patients’ drug responses. The PRS-PGx methods we develop are promising for advancing precision medicine by improving drug response (efficacy and/or safety) prediction in PGx studies. Our effort to develop the PRS-PGx methods, which identify optimal ways for PRS construction by jointly modeling the prognostic and predictive effects, is an important step toward accelerating the translation of PRS to clinical practice.

Methods

The four PRS-PGx methods (PRS-PGx-Bayes, PRS-PGx-Unadj, PRS-PGx-CT, and PRS-PGx-L, -GL, -SGL) we propose are described in this section. We leave the brief description of the PRS-Dis-LDpred2 method to Supplementary Method C and Supplementary Fig. 13 since it is an existing method. The overview of PRS-Dis and PRS-PGx methods is summarized in Supplementary Table 3. The workflow and details of simulation studies and real data analyses are also discussed.

PRS-PGx-Bayes

Recall that the Bayesian regression model has been specified in Eq. (1), and we assume Y and G have been standardized. Furthermore, we assume the residual variance σ2 follows a non-informative scale-invariant Jeffreys prior, that is, p(σ2) ∝ σ−2, as suggested by Ge et al.14. Regarding the priors of effect sizes, we extend the idea of global-local scale mixtures of normals14 to the two-dimensional scenario as indicated in Eq. (2):

$$\left(\begin{array}{l}{\beta }_{j}\\ {\alpha }_{j}\end{array}\right) \Big|{\sigma }^{2},\,\phi,\,{\psi }_{j},\,{\xi }_{j},\,{\rho }_{j} \sim \,{{\mbox{MVN}}}\, \Big ({{{{{{{\bf{0}}}}}}}},\,\frac{{\sigma }^{2}}{n}\phi {M}_{j} \Big ),\,{{\mbox{where}}}\,\,{M}_{j}=\left[\begin{array}{cc}{\psi }_{j}&{\rho }_{j}\sqrt{{\psi }_{j}{\xi }_{j}}\\ {\rho }_{j}\sqrt{{\psi }_{j}{\xi }_{j}}&{\xi }_{j}\end{array}\right] \sim g,$$

where ϕ is a global scaling parameter that is shared across all effect sizes; ψj and ξj are local and marker-specific scaling parameters; ρj is the marker-specific correlation between the two effect sizes βj and αj; and g(⋅, ⋅) is an absolutely continuous and two-dimensional mixing density function.

We first note that, given the prior information σ2, ϕ, and Mj, j = 1, 2, …, m, and the summary statistics (i.e., the effect size estimates) \(\widehat{{\bf{b}}}={\bf{X}}^{\prime} {\bf{Y}}/n\) from PGx GWAS, the posterior mean of b is

$$\,{{\mbox{E}}}\,[{{{{{{{\bf{b}}}}}}}}\,|\,\widehat{{{{{{{{\bf{b}}}}}}}}}]={({{{{{{{\bf{D}}}}}}}}+{{{\Omega }}}^{-1})}^{-1}\widehat{{{{{{{{\bf{b}}}}}}}}},$$
(12)

where

$${{\Omega }}=\left[\begin{array}{cc}{{\Psi }}&P\\ P&{{\Xi }}\end{array}\right],\,{{\Psi }}=\,{{\mbox{diag}}}\,(\phi {\psi }_{j}),\,{{\Xi }}=\,{{\mbox{diag}}}\,(\phi {\xi }_{j}),\,P=\,{{\mbox{diag}}}\,\Big(\phi {\rho }_{j}\sqrt{{\psi }_{j}{\xi }_{j}}\Big),$$

and

$${{{{{{{\bf{D}}}}}}}}={{{{{{{\bf{X}}}}}}}}^{\prime} {{{{{{{\bf{X}}}}}}}}/n=\,{{\mbox{cor}}}\,([{{{{{{{\bf{G}}}}}}}}\quad {{{{{{{\bf{G}}}}}}}}\times {{{{{{{\bf{T}}}}}}}}])$$

can be obtained from LD reference panel as illustrated in Supplementary Method D and Supplementary Fig. 14.
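As a minimal numerical illustration of Eq. (12), the sketch below evaluates the posterior mean for a single unlinked SNP, so that D and Ω reduce to 2 × 2 matrices. The function names, the use of a unit-diagonal D with a hypothetical off-diagonal value `c`, and the example numbers are assumptions for illustration; this is not the authors’ implementation, which works on whole LD blocks.

```python
def inv2(a):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def posterior_mean(b_hat, c, phi, psi, xi, rho):
    """E[b | b_hat] = (D + Omega^{-1})^{-1} b_hat for one SNP.

    b_hat : [beta_hat, alpha_hat], marginal effect size estimates
    c     : assumed correlation between G and G x T (off-diagonal of D)
    phi   : global scale; psi, xi: local scales; rho: effect correlation
    """
    off = phi * rho * (psi * xi) ** 0.5
    omega = [[phi * psi, off], [off, phi * xi]]   # prior scale matrix
    omega_inv = inv2(omega)
    d = [[1.0, c], [c, 1.0]]                      # LD-type matrix for one SNP
    a = [[d[i][j] + omega_inv[i][j] for j in range(2)] for i in range(2)]
    a_inv = inv2(a)
    return [a_inv[i][0] * b_hat[0] + a_inv[i][1] * b_hat[1] for i in range(2)]


# With psi = xi = 1, rho = 0 and c = 0 this is ridge-type shrinkage:
# each estimate is divided by 1 + 1/phi.
ridge_like = posterior_mean([0.2, -0.1], c=0.0, phi=1.0, psi=1.0, xi=1.0, rho=0.0)
# A very small phi shrinks both effects almost completely.
strong_shrink = posterior_mean([0.2, -0.1], c=0.0, phi=1e-6, psi=1.0, xi=1.0, rho=0.0)
```

In this toy setting, letting `phi` grow large returns the unshrunk solution of D b = b̂, while a small `phi` pulls both effects toward zero.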

To provide further insights on Eq. (12), we consider several scenarios. First, assume ψj ≡ 1, ξj ≡ 1, and ρj ≡ 0, Eq. (12) is equivalent to the Ridge regression and all effect sizes are shrunk towards zero at the same constant rate controlled by the global shrinkage parameter ϕ (i.e., the penalty). Second, assume a one-to-one treatment-placebo allocation ratio (μT = 0.5), unlinked genetic markers (σij ≡ 0 for i ≠ j) and ρj ≡ 0 (i.e., within each SNP, the two effect sizes are independent). We can derive the formulas explicitly for \(\,{{\mbox{E}}}\,[{\beta }_{j}|{\widehat{\beta }}_{j}]\) and \(\,{{\mbox{E}}}\,[{\alpha }_{j}|{\widehat{\alpha }}_{j}]\) (Supplementary Method E). Under the simplified scenario where all markers’ MAFs are small, fj ≡ f → 0, we can show that:

$${{{{{{{\rm{E}}}}}}}}[{\beta }_{j}|{\hat{\beta }}_{j}] \;\approx \; \frac{{\hat{\beta }}_{j}-\frac{c}{{s}_{j}}{\hat{\alpha }}_{j}}{{t}_{j}-{c}^{2}/{s}_{j}},\\ {{{{{{{\rm{E}}}}}}}}[{\alpha }_{j}|{\hat{\alpha }}_{j}] \;\approx \; \frac{{\hat{\alpha }}_{j}-\frac{c}{{t}_{j}}{\hat{\beta }}_{j}}{{s}_{j}-{c}^{2}/{t}_{j}},$$

where \({t}_{j}=1+{\phi }^{-1}{\psi }_{j}^{-1}\), \({s}_{j}=1-f+{\phi }^{-1}{\xi }_{j}^{-1}\), and \(c=\sqrt{(1-f)/2}\). To understand the above equations, we can interpret \(1/{t}_{j}=\frac{1}{1+{\phi }^{-1}{\psi }_{j}^{-1}}\) and \(1/{s}_{j}=\frac{1}{1-f+{\phi }^{-1}{\xi }_{j}^{-1}}\) as the shrinkage factors. Therefore, \({\hat{\beta }}_{j}/{t}_{j}\) and \({\hat{\alpha }}_{j}/{s}_{j}\) are the ‘shrunk’ effects: tj = sj = 1 indicates no shrinkage, while tj = sj → ∞ yields full shrinkage. The correlation c between Gj and Gj × T also contributes to the second part of the numerator because the bias induced by the positive correlation needs to be corrected.

In the PRS-PGx-Bayes method, Mj denotes the variance-covariance matrix. It is a common practice to use an inverse Wishart (IW) distribution as the conjugate prior for the covariance matrix of a multivariate normal distribution. However, the IW prior has its own limitations. The IW prior imposes a dependency between the correlations and the variances: larger variances automatically imply the absolute value of the correlation near one while small variances imply the correlation near zero40,41. Therefore, in this study, we propose to use the hierarchical half-t prior27, which is more flexible than the IW prior by adding the degrees of freedom parameter to the scale matrix. Specifically, we assume

$${M}_{j} \sim {{{{{{{{\rm{W}}}}}}}}}^{-1}({B}_{j},\,2v+1),\,\,{{\mbox{where}}}\,\,{B}_{j}=4v\left[\begin{array}{cc}{\delta }_{j}&0\\ 0&{\lambda }_{j}\end{array}\right],\,{\delta }_{j} \sim \,{{\mbox{G}}}\,({b}_{1},\,1),\,{\lambda }_{j} \sim \,{{\mbox{G}}}\,({b}_{2},\,1),$$
(13)

where W−1(Bj, 2v + 1) denotes the inverse Wishart distribution with scale matrix Bj and 2v + 1 degrees of freedom, and G is a Gamma distribution. Equation (13) implies marginal distributions of the variances and the correlation as:

$${\psi }_{j} \sim \,{{\mbox{iG}}}\,(v,\,2v{\delta }_{j}),\quad {\xi }_{j} \sim \,{{\mbox{iG}}}\,(v,\,2v{\lambda }_{j}),\quad p({\rho }_{j})\propto {\big(1-{\rho }_{j}^{2}\big)}^{v},$$

where iG denotes the inverse Gamma distribution. By using this definition, changing the correlation does not necessarily result in a change in the variances, which are instead determined through δj and λj.

In practice, by using LD information from an external reference panel (e.g., the 1000 Genomes data), the method can be applied to PGx GWAS summary statistics and does not require individual-level genotype and phenotype data. The PRS-PGx-Bayes approach updates b = (β, α), σ2, (ψ, ξ, ρ), (δ, λ) sequentially based on their posterior distributions (as described in Supplementary Method E), where we set b1 = b2 = 1/2 as suggested by Ge et al.14. Also as proposed by Ge et al.14, to avoid numerical issues caused by collinearity between SNPs, we set ϕψj ≤ ρ and ϕξj ≤ ρ, where ρ = 1 is a constant. In addition, we partition the genome into 1725 largely independent genomic regions42 (https://bitbucket.org/nygcresearch/ldetect-data/src/master/) estimated using data from the 1KG European sample, and conduct a multivariate update of the effect sizes within each LD block. The distributions of block sizes in the 1KG and IMPROVE-IT data are summarized in Supplementary Fig. 10a and b, respectively. The largest LD block (chr 6, block 33) after matching the IMPROVE-IT PGx data to 1KG contains 11,769 SNPs in total. The same strategy is also applied to the other methods. As shown in Supplementary Fig. 4 and Supplementary Table 1, this strategy is justified by the fact that, in simulation studies, only a small relative difference (≤1.1%) is observed when PRS-PGx-Bayes is run within individual LD blocks rather than across multiple LD blocks. The overall PRS-PGx-Bayes algorithm is described in Algorithm 1.

Algorithm 1

PRS-PGx-Bayes: Performed within each LD block

PRS-PGx-Unadj

The PGx PRS includes two parts: a prognostic PRS (Sprog) and a predictive PRS (Spred). The unadjusted PGx PRS is the sum of all genetic markers across the whole genome, weighted by their marginal prognostic and predictive effect size estimates (i.e., \(\widehat{\beta }\) and \(\widehat{\alpha }\) from PGx GWAS summary statistics), respectively.
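The weighted sums behind the two scores can be sketched as follows. The function name, the use of raw allele counts (0/1/2) rather than standardized genotypes, and the toy numbers are assumptions made for illustration.

```python
def pgx_scores(genotypes, beta_hat, alpha_hat):
    """Return (S_prog, S_pred), one value per subject.

    genotypes : list of per-subject genotype rows (one entry per SNP)
    beta_hat  : marginal prognostic effect size estimates, one per SNP
    alpha_hat : marginal predictive (G x T) effect size estimates, one per SNP
    """
    s_prog = [sum(g * b for g, b in zip(row, beta_hat)) for row in genotypes]
    s_pred = [sum(g * a for g, a in zip(row, alpha_hat)) for row in genotypes]
    return s_prog, s_pred


# Hypothetical toy input: two subjects, three SNPs.
toy_genotypes = [[0, 1, 2], [2, 0, 1]]
toy_beta = [0.1, -0.2, 0.05]
toy_alpha = [0.0, 0.3, -0.1]
s_prog, s_pred = pgx_scores(toy_genotypes, toy_beta, toy_alpha)
```

The same function applies to any of the PRS-PGx methods once their shrunk or selected effect sizes replace the marginal estimates.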

PRS-PGx-CT

The PRS-PGx-CT method constructs both the prognostic and predictive PRS using variant LD-clumping and p-value thresholding steps, in a manner similar to the disease PRS C+T method. Specifically, in the clumping step, for any pair of SNPs that have a physical distance smaller than 250 kb and an LD r2 > 0.01, the less significant SNP is removed. In the thresholding step, the prognostic and predictive effect size estimates of SNPs whose 2-df two-sided test (i.e., joint test of G + G × T, obtained from PGx GWAS summary statistics) p-values do not pass the threshold PT are set to zero, and the remaining SNPs are kept with both types of effects. We consider PT ∈ {5e−08, 1e−07, 1e−06, 1e−05, 1e−04, 0.001, 0.01, 0.1, 1} in this paper. The PT value that produces the highest prediction accuracy in a validation data set is selected, and the predictive performance is assessed in an independent testing set.
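A minimal sketch of the clumping and thresholding steps is given below. It assumes a greedy pass from the most to the least significant SNP on a single chromosome, a hypothetical dictionary-based input format, and a pairwise r2 lookup supplied by the user; these are illustrative choices and not the authors’ implementation.

```python
def clump_and_threshold(snps, r2, p_threshold, window_bp=250_000, r2_max=0.01):
    """Return {snp_id: (beta_hat, alpha_hat)} for SNPs surviving C+T.

    snps : list of dicts with keys "id", "pos", "p", "beta", "alpha",
           where "p" is the 2-df joint (G + G x T) test p-value
    r2   : dict mapping frozenset({id1, id2}) to the pairwise LD r^2
           (pairs that are absent are treated as r^2 = 0)
    """
    kept = []
    # Clumping: visit SNPs from most to least significant and drop any SNP
    # that is close to, and in LD with, an already retained SNP.
    for snp in sorted(snps, key=lambda s: s["p"]):
        clash = any(
            abs(snp["pos"] - k["pos"]) < window_bp
            and r2.get(frozenset((snp["id"], k["id"])), 0.0) > r2_max
            for k in kept
        )
        if not clash:
            kept.append(snp)
    # Thresholding: SNPs failing the p-value cutoff get zero weight,
    # which here means they are simply left out of the result.
    return {s["id"]: (s["beta"], s["alpha"]) for s in kept if s["p"] <= p_threshold}


# Hypothetical toy input on one chromosome.
toy_snps = [
    {"id": "rs1", "pos": 1_000, "p": 1e-9, "beta": 0.30, "alpha": -0.20},
    {"id": "rs2", "pos": 50_000, "p": 1e-6, "beta": 0.25, "alpha": -0.15},
    {"id": "rs3", "pos": 900_000, "p": 1e-3, "beta": 0.05, "alpha": 0.10},
    {"id": "rs4", "pos": 1_500_000, "p": 0.2, "beta": 0.01, "alpha": 0.02},
]
toy_r2 = {frozenset(("rs1", "rs2")): 0.8}
selected_strict = clump_and_threshold(toy_snps, toy_r2, p_threshold=0.01)
selected_all = clump_and_threshold(toy_snps, toy_r2, p_threshold=1.0)
```

Running the function once per candidate PT and scoring each result in a validation set mirrors the threshold selection described above.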

PRS-PGx-L, -GL and -SGL

In the PRS-PGx-L, -GL and -SGL methods, penalized regression is used to solve Eq. (1) with individual-level data. Assuming independence between prognostic and predictive effect sizes within each SNP, a direct solution is to minimize the following equation while using a Lasso framework (PRS-PGx-L based on glmnet R package v4.1.1 https://cran.r-project.org/web/packages/glmnet/index.html):

$$f(b)=\frac{1}{2} \Big|\Big|{{{{{{{\bf{Y}}}}}}}}-\mathop{\sum }\limits_{j=1}^{m}{{{{{{{{\bf{X}}}}}}}}}_{j}{{{{{{{{\bf{b}}}}}}}}}_{j} {\Big|\Big|}_{2}^{2}+\lambda|\vert {{{{{{{\bf{b}}}}}}}}|{|}_{1},$$

where Xj = [Gj, Gj × T], and bj = (βj, αj). ∥⋅∥1 and ∥⋅∥2 stand for the L1-norm and L2-norm, respectively. Assuming that, if a SNP is included in the model, both the prognostic and predictive effects of that SNP may be non-zero, Group Lasso43 (PRS-PGx-GL based on gglasso R package v1.5 https://cran.r-project.org/web/packages/gglasso/index.html) might be appealing by considering each genetic marker as a group:

$$f(b)=\frac{1}{2} \Big|\Big|{{{{{{{\bf{Y}}}}}}}}-\mathop{\sum }\limits_{j=1}^{m}{{{{{{{{\bf{X}}}}}}}}}_{j}{{{{{{{{\bf{b}}}}}}}}}_{j} {\Big|\Big|}_{2}^{2}+\lambda \mathop{\sum }\limits_{j=1}^{m}\sqrt{{p}_{j}}|\vert {{{{{{{{\bf{b}}}}}}}}}_{j}|{|}_{2},$$

where pj = 2 denotes the group size. Finally, if we assume sparsity at both the group and individual feature levels, we also consider the Sparse Group Lasso44 (PRS-PGx-SGL based on SGL R package v1.3 https://cran.r-project.org/web/packages/SGL/index.html) whose penalty is a linear combination of penalties from Lasso and Group Lasso:

$$f(b)=\frac{1}{2} \Big|\Big|{{{{{{{\bf{Y}}}}}}}}-\mathop{\sum }\limits_{j=1}^{m}{{{{{{{{\bf{X}}}}}}}}}_{j}{{{{{{{{\bf{b}}}}}}}}}_{j} {\Big|\Big|}_{2}^{2}+\alpha \lambda|\vert {{{{{{{\bf{b}}}}}}}}|{|}_{1}+(1-\alpha )\lambda \mathop{\sum }\limits_{j=1}^{m}\sqrt{{p}_{j}}|\vert {{{{{{{{\bf{b}}}}}}}}}_{j}|{|}_{2}.$$
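To make the difference between the three penalties concrete, the sketch below evaluates only the penalty terms of the objectives above for a given set of per-SNP effect pairs (βj, αj). The function names are hypothetical, and the mixing weight is called `mix` to avoid confusion with the predictive effect α; no model fitting is performed here.

```python
import math


def lasso_penalty(b, lam):
    """lam * ||b||_1 over all prognostic and predictive effects."""
    return lam * sum(abs(x) for pair in b for x in pair)


def group_lasso_penalty(b, lam):
    """lam * sum_j sqrt(p_j) * ||b_j||_2 with one group per SNP (p_j = 2)."""
    return lam * sum(
        math.sqrt(len(pair)) * math.sqrt(sum(x * x for x in pair)) for pair in b
    )


def sparse_group_lasso_penalty(b, lam, mix):
    """Convex combination of the Lasso and Group Lasso penalties."""
    return mix * lasso_penalty(b, lam) + (1.0 - mix) * group_lasso_penalty(b, lam)


# Hypothetical effect pairs (beta_j, alpha_j) for three SNPs.
toy_b = [(3.0, 4.0), (0.0, 0.0), (0.0, -2.0)]
pen_l = lasso_penalty(toy_b, lam=1.0)
pen_gl = group_lasso_penalty(toy_b, lam=1.0)
pen_sgl = sparse_group_lasso_penalty(toy_b, lam=1.0, mix=0.5)
```

The Group Lasso term depends on each SNP only through the joint length of its effect pair, which is why it tends to keep or drop both effects of a SNP together, whereas the Lasso term treats every coefficient separately.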

Simulations

We performed simulation studies using real genetic data from the IMPROVE-IT trial (n = 5661 in the PGx subset population) and the 1KG European sample (n = 503) as an external LD reference panel. A total of 5000 SNPs, matched between the IMPROVE-IT and the 1KG datasets, were randomly chosen from LD blocks 31 and 32 (ref. 42) on chromosome 19. To mimic disease GWAS data, the sample size was increased to n = 20,000 via random mating (Supplementary Method G). The SNP prognostic and predictive effect sizes were simulated jointly with the following distribution:

$$\left(\begin{array}{r}{\beta }_{j}^{(k)}\\ {\alpha }_{j}^{(k)}\end{array}\right) \sim \left\{\begin{array}{cl}{{\mbox{MVN}}}\,(0,\,{{{\Sigma }}}_{k})&\,{{\mbox{with probability}}}\,{\pi }_{k},\\ 0 \hfill &\, {{\mbox{with probability}}}\,1-{\pi }_{k},\end{array}\right.$$
(14)

where πk ~ Beta(P(causal), 1 − P(causal)), j denotes the j-th SNP, and k denotes the k-th LD block. Equation (14) implies that the proportion of causal variants varies across LD blocks, but the overall proportion of causal variants across the whole genome is controlled by P(causal). Furthermore,

$${{{\Sigma }}}_{k}=\left[\begin{array}{cc}\psi &{\rho }_{k}\sqrt{\psi \xi }\\ {\rho }_{k}\sqrt{\psi \xi }&\xi \end{array}\right],\,{{\mbox{where}}}\,\,{\rho }_{k}\, \sim \,\,{{\mbox{Uniform}}}\,(0,\; 1).$$
(15)

We explored different scenarios: ψ/ξ = 1 (i.e., the prognostic effect was on the same scale as the predictive effect), and ψ/ξ = 16 or 1/16 (i.e., one effect dominated the other). It is worth noting that Eq. (14) indicates that each causal variant would carry some degree of prognostic effect and some degree of predictive effect. In addition, to generate completely separated prognostic and predictive SNPs, we randomly chose half of the causal variants and kept only their prognostic effects (i.e., artificially set αj = 0); for the remaining half of the causal variants, only the predictive effects were kept (i.e., βj = 0).
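The effect-size model in Eqs. (14) and (15) can be sketched for a single LD block as follows. The function name and the example parameter values are assumptions for illustration; the correlated pair is drawn with a hand-written 2 × 2 Cholesky step so that only the Python standard library is needed.

```python
import math
import random


def simulate_block_effects(m, p_causal, psi, xi, rng):
    """Draw (beta_j, alpha_j) for the m SNPs of one LD block.

    pi_k  ~ Beta(p_causal, 1 - p_causal) is the block-specific causal fraction
    rho_k ~ Uniform(0, 1) is the block-specific effect correlation
    Causal SNPs get a bivariate normal draw with variances psi and xi and
    correlation rho_k; all other SNPs get (0, 0).
    """
    pi_k = rng.betavariate(p_causal, 1.0 - p_causal)
    rho_k = rng.uniform(0.0, 1.0)
    effects = []
    for _ in range(m):
        if rng.random() < pi_k:
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            beta = math.sqrt(psi) * z1
            alpha = math.sqrt(xi) * (rho_k * z1 + math.sqrt(1.0 - rho_k ** 2) * z2)
            effects.append((beta, alpha))
        else:
            effects.append((0.0, 0.0))
    return effects


# Hypothetical settings: 200 SNPs, P(causal) = 0.1, psi / xi = 1.
block_effects = simulate_block_effects(200, 0.1, 1.0, 1.0, random.Random(1))
```

The "completely separated" scenario described above would follow by zeroing `alpha` for a random half of the causal SNPs and `beta` for the other half.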

Five clinical factors (age, gender, prior lipid-lowering (PLL) therapy, EARLY acute coronary syndrome (ACS) trial, and high-risk ACS diagnosis) were considered as covariates. To mimic the disease GWAS data, the phenotype was generated as

$${{{{{{{{\bf{Y}}}}}}}}}_{n\times 1}={{{{{{{{\bf{X}}}}}}}}}_{n\times 5}{{{{{{{{\mathbf{1}}}}}}}}}_{5\times 1}+{{{{{{{{\bf{G}}}}}}}}}_{n\times m}{\beta }_{m\times 1}+{\epsilon }_{n\times 1},$$

where n = 20,000 and m = 5000. To mimic the PGx GWAS data, the simulated trait was generated as

$${{{{{{{{\bf{Y}}}}}}}}}_{n\times 1}={{{{{{{{\bf{X}}}}}}}}}_{n\times 5}{{{{{{{{\mathbf{1}}}}}}}}}_{5\times 1}+{\beta }_{{{{{{{{\rm{T}}}}}}}}}{{{{{{{{\bf{T}}}}}}}}}_{n\times 1}+{{{{{{{{\bf{G}}}}}}}}}_{n\times m}{\beta }_{m\times 1}+{({{{{{{{\bf{G}}}}}}}}\times {{{{{{{\bf{T}}}}}}}})}_{n\times m}{\alpha }_{m\times 1}+{\epsilon }_{n\times 1},$$

where n = 5661 and m = 5000. In the above two equations, ϵn×1 ~ N(0, σ2In), where σ2 was determined by the heritability, which was set to 0.1, 0.3, and 0.5.
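The text does not spell out how σ2 is derived from the heritability. One common convention, assumed here purely for illustration, is to define heritability as the share of the genetic variance in the sum of genetic and residual variance, h2 = Var(g) / (Var(g) + σ2), which gives σ2 = Var(g)(1 − h2)/h2. The function name and the toy numbers below are hypothetical.

```python
from statistics import pvariance


def residual_variance(genetic_values, h2):
    """Noise variance giving heritability h2 under h2 = Var(g) / (Var(g) + sigma^2).

    genetic_values : per-subject genetic component, e.g. G beta or
                     G beta + (G x T) alpha (an assumed definition)
    h2             : target heritability, strictly between 0 and 1
    """
    return pvariance(genetic_values) * (1.0 - h2) / h2


# Hypothetical genetic values with population variance 1.25.
toy_genetic = [1.0, 2.0, 3.0, 4.0]
sigma2_by_h2 = {h2: residual_variance(toy_genetic, h2) for h2 in (0.1, 0.3, 0.5)}
```

Under this convention, a lower heritability simply means a proportionally larger noise variance for the same genetic component.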

In each replicate, we randomly chose either 1000 or 3000 patients from PGx GWAS data as the PGx training dataset (to build the PGx PRS); and the other 1000 patients as the independent testing dataset (to evaluate the predictive performance of all the PRS-Dis and PRS-PGx methods). Specifically, the observed phenotype Y in the testing set was first adjusted by the clinical factors X and the treatment T, and we obtained the adjusted phenotype Yadj. Then the predictive effect was measured by the p-value from the two-sided likelihood ratio test of Spred × T interaction in the model Yadj ~ Sprog + Spred × T. The prediction accuracies were quantified by R2 between the adjusted phenotype and the predicted ones from Yadj ~ SPGx under two arms, the treatment arm, and the control arm, respectively. P-values were also obtained by the two-sided LRT of SPGx under treatment and control arms, respectively. The entire workflow of the simulation studies is shown in Supplementary Fig. 15.

IMPROVE-IT PGx GWAS data analysis

We applied two PRS-Dis methods (PRS-Dis-CT and PRS-Dis-LDpred2) and four PRS-PGx methods (PRS-PGx-Unadj, PRS-PGx-CT, PRS-PGx-GL, and PRS-PGx-Bayes) to the prediction of the drug response (low-density lipoprotein cholesterol (LDL-C) log-fold change at 1 month) from the IMPROVE-IT PGx GWAS, although LDL-C was measured longitudinally at multiple time points. IMPROVE-IT is a phase 3b, multicenter, double-blind, randomized study to establish the clinical benefit and safety of Vytorin (Ezetimibe/Simvastatin tablet) versus Simvastatin monotherapy in high-risk subjects24 (clinical trial registry number: NCT00202878). The ethics committee at each participating center approved the protocol and amendments. All IMPROVE-IT trials were carried out in accordance with the Declaration of Helsinki, current guidelines on Good Clinical Practices, and local ethical and legal requirements. All participants provided voluntary written informed consent before trial entry. The details of the endpoint, genotyping, genotype QC, and imputation for these GWAS analyses are described elsewhere25. After GWAS QC and SNP imputation, 9,407,967 variants and 6502 subjects were available for analyses. The subjects were further filtered down to 5661 for the GWAS analyses by excluding subjects who had a cardiovascular event prior to month 1, since cardiovascular events prior to this time point may affect LDL-C in a manner unrelated to treatment. A total of 5661 European subjects were included for analysis of the LDL-C endpoint.

For the PRS-Dis approaches (PRS-Dis-CT and PRS-Dis-LDpred2), the LDL-C disease GWAS summary statistics, obtained from the Global Lipids Genetics Consortium Results45 (http://csg.sph.umich.edu/willer/public/lipids2017/), were used to construct disease PRS in the IMPROVE-IT data (as the independent testing set). For PRS-PGx methods (PRS-PGx-Unadj, PRS-PGx-CT, PRS-PGx-GL, and PRS-PGx-Bayes), due to the lack of independent PGx data for training, we alternatively proposed to use a 5-fold cross-validation to evaluate their performance. More specifically, we split the IMPROVE-IT data into five folds; in each CV step, we chose four of them as the training set, and used the remaining one as the testing set. The PGx GWAS summary statistics, obtained from the training set, were used to construct PGx PRS in the testing set, and only the prognostic and predictive scores of patients in the testing set were recorded. The above procedures were repeated and the PGx PRS for all the patients in the IMPROVE-IT PGx GWAS data (when they served as the testing set) were obtained. Finally, the predictive performance was evaluated in the same criteria as described in the “Simulations” section, as well as the quantile plot for patient stratification.
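The out-of-fold scoring scheme described above can be sketched as follows. The `fit` and `score` callbacks are hypothetical placeholders for "run the PGx GWAS and build the PRS weights on the training folds" and "compute the PGx PRS for the held-out subjects"; the random fold assignment and all names are assumptions for illustration.

```python
import random


def make_folds(n, k=5, seed=0):
    """Randomly partition subject indices 0..n-1 into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_scores(n, fit, score, k=5, seed=0):
    """Return one out-of-fold score per subject.

    fit(train_idx)          -> model built from the training subjects only
    score(model, test_idx)  -> list of scores, one per held-out subject
    """
    out = [None] * n
    for test_idx in make_folds(n, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit(train_idx)
        for i, s in zip(test_idx, score(model, test_idx)):
            out[i] = s  # each subject is scored exactly once, as test data
    return out


# Toy callbacks: the "model" is just the training-set size.
toy_folds = make_folds(23)
toy_scores = cross_validated_scores(
    23,
    fit=lambda train_idx: len(train_idx),
    score=lambda model, test_idx: [model] * len(test_idx),
)
```

Because every subject’s score comes from a model that never saw that subject, the pooled scores can be evaluated as if they came from an independent testing set.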

It is worth noting that in each cross-validation step, the GWAS summary statistics were generated from the training set by running a GWAS analysis with the model: \(\log {\bf{Y}}_{1}-\log {\bf{Y}}_{0}={\beta }_{0}+{\beta }_{{\rm{Y}}_{0}}\log {\bf{Y}}_{0}+{\beta }_{{\rm{T}}}{\bf{T}}+{\bf{G}}\beta+({\bf{G}}\times {\bf{T}})\alpha+{\bf{X}}\gamma\), where Y1 is the on-treatment LDL-C response, Y0 is the baseline LDL-C response, β is the prognostic effect, α is the predictive effect, and the covariate matrix X included age, gender, prior lipid-lowering therapy, early Acute Coronary Syndrome (ACS) trial, high-risk ACS diagnosis, and the top five principal components. As recommended by Zhang et al.25, we adjusted for the baseline LDL-C level Y0 (on the log scale) in the model to appropriately control the type I error rate (or genomic inflation). The PGx GWAS summary statistics, including the prognostic and predictive effects (\(\widehat{\beta }\) and \(\widehat{\alpha }\)), the minor allele frequencies (MAF), the two-sided 2-df (G + G × T) test p-values, the SNP positions, the standard deviation of the response, and the mean of treatment, were further used for the PGx PRS based drug response analyses. Detailed information about the summary statistics is provided in Supplementary Method F.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.