Introduction

Tens of thousands of single nucleotide polymorphisms (SNPs) have been mapped to human complex traits and diseases through genome-wide association studies (GWAS)1,2. Though each SNP explains only a small fraction of the variation of the underlying phenotype, polygenic risk scores (PRS), which aggregate the genetic effects of many loci, can have a substantial ability to predict traits and stratify populations by underlying disease risk3,4,5,6,7,8,9,10,11,12. However, as GWAS to date have been conducted primarily in European ancestry (EUR) populations13,14,15,16, recent studies have consistently shown that the transferability of EUR-derived PRS to non-EUR populations is often suboptimal, and is particularly poor for African-ancestry populations17,18,19,20,21.

Despite growing efforts to conduct genetic research in minority populations22,23,24,25, the gap in sample sizes between EUR and non-EUR populations is likely to persist for the foreseeable future. As the performance of PRS largely depends on the sample size of the training GWAS3,26, using single-ancestry methods27,28,29,30,31 to generate PRS for a minority population from that population's data alone may not achieve ideal results. To address this issue, researchers have developed methods for generating powerful PRS by borrowing information across diverse ancestry populations32. For example, Weighted PRS33 combines single-ancestry PRS generated from each population using weights that optimize performance for a target population. Bayesian methods have also been proposed that generate improved PRS for each population by jointly modeling the effect-size distribution across populations34,35. Recently, our group proposed a method named CT-SLEB21, which extends the clumping and thresholding (CT)36 method to multi-ancestry settings. The method uses an empirical-Bayes (EB) approach to estimate effect sizes by borrowing information across populations and a super learning model to combine PRSs generated under different tuning parameters. However, the performance of these methods depends on many factors, including the ability to account for heterogeneous linkage disequilibrium (LD) structure across populations and the adequacy of the models for the underlying effect-size distribution3,26. In general, our extensive simulation studies and data analyses suggest that no method is uniformly the most powerful, and exploration of complementary methods will often be needed to derive the optimal PRS in any given setting21.
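To make the weighted PRS idea above concrete, the combining weights can be obtained by regressing the tuning-set phenotype on the single-ancestry PRSs. A minimal NumPy sketch, with all variable names hypothetical:

```python
import numpy as np

def weighted_prs_weights(prs_tune, y_tune):
    """Least-squares weights for combining single-ancestry PRSs.

    prs_tune: (n, M) matrix, one column per ancestry-specific PRS,
              evaluated on target-population tuning samples.
    y_tune:   (n,) phenotype vector in the tuning set.
    Returns the intercept followed by one weight per PRS.
    """
    # Include an intercept so the weights are not biased by the phenotype mean
    X = np.column_stack([np.ones(len(y_tune)), prs_tune])
    w, _, _, _ = np.linalg.lstsq(X, y_tune, rcond=None)
    return w
```

The fitted weights are then applied to the same PRS columns computed in independent validation data.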

In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on an enSemble PEnalized Regression (PROSPER) using GWAS summary statistics and validation datasets across diverse populations. The method incorporates \({\mathscr{L}}_{1}\) penalty functions for regularizing SNP effect sizes within each population, an \({\mathscr{L}}_{2}\) penalty function for borrowing information across populations, and a flexible but parsimonious specification of the underlying penalty parameters to reduce computational time. Further, instead of selecting a single optimal set of tuning parameters, the method combines PRS generated across different populations and tuning parameters using a final ensemble regression step. We compare the predictive performance of PROSPER with a wide variety of single- and multi-ancestry methods using simulation datasets from our recent study21 across five populations (EUR, African (AFR), Admixed American (AMR), East Asian (EAS), and South Asian (SAS)). Furthermore, we evaluate these methods using a variety of real datasets from 23andMe Inc. (23andMe), the Global Lipids Genetics Consortium (GLGC)37, All of Us (AoU)38, and the UK Biobank study (UKBB)39. Results from these analyses indicate that PROSPER is a highly promising method for generating the most powerful multi-ancestry PRS across diverse types of complex traits. Computationally, PROSPER is also exceptionally scalable compared to other advanced methods.

Results

Method overview

PROSPER is a method designed to improve the prediction performance of PRS across distinct ancestral populations by borrowing information across ancestries (Fig. 1). It can integrate large EUR GWAS with smaller GWAS from non-EUR populations. Ideally, individual-level tuning data are needed for all populations, because the method requires optimal parameters from single-ancestry analyses as input; however, even when tuning data are available only for a target population, PROSPER can still be run, and the PRS will be optimized and validated toward the target population. The method can account for population-specific genetic variants, allele frequencies, and LD patterns, and uses computational techniques for penalized regression for fast implementation.

Fig. 1: Detailed flowchart of PROSPER.
figure 1

The analysis of \(M\) populations in PROSPER involves three key steps: (1) separate single-ancestry analysis for each population \(i=1,\ldots,M\); (2) joint analysis across populations using penalized regression; (3) ensemble regression. In step 1, the training GWAS data are used to train lassosum2 models, and the tuning data are used to obtain the optimal tuning parameters in a single-ancestry analysis. In step 2, the training GWAS and the optimal tuning-parameter values from step 1 are used to train the joint cross-population penalized regression model and obtain the solution \({\boldsymbol{\beta }}_{\lambda,c,i}\) for each \(\lambda\) and \(c\). In step 3, the tuning data are used to train the super learning model for the ensemble of PRSs computed from the solutions in step 2, \({{\bf{PRS}}}_{\lambda,c,i}={\bf{X}}{\boldsymbol{\beta }}_{\lambda,c,i}\). The final PRS is computed as \({\bf{PRS}}={\bf{X}}\left(\sum {w}_{\lambda,c,i}{\boldsymbol{\beta }}_{\lambda,c,i}\right)\), where \({w}_{\lambda,c,i}\) are the weights from the super learning model. Refer to the “Method overview” section in the main text for a full explanation of all notations in the flowchart.

PROSPER

Assuming a continuous trait, we first consider a standard linear regression model for underlying individual-level data to describe the relationship between trait values and genome-wide genetic variants across \(M\) distinct populations. Let \({{\bf{Y}}}_{i}\) denote the \({n}_{i}\times 1\) vector of trait values, \({{\bf{X}}}_{i}\) denote the \({n}_{i}\times {p}_{i}\) genotype matrix, \({{\boldsymbol{\beta }}}_{i}\) denote the \({p}_{i}\times 1\) vector of SNP effects, and \({{\boldsymbol{\epsilon }}}_{i}\) denote the \({n}_{i}\times 1\) vector of random errors for the \(i\)th population. We assume underlying linear regression models of the form \({{\bf{Y}}}_{i}={{\bf{X}}}_{i}{{\boldsymbol{\beta }}}_{i}+{{\boldsymbol{\epsilon }}}_{i},\,i=1,\ldots,M\); and solve the linear regression system by least squares with a combination of \({\mathscr{L}}_{1}\) (lasso)40 and \({\mathscr{L}}_{2}\) (ridge)41 penalties in the form

$$\mathop{\sum}\limits_{1\le i\le M}\frac{1}{{n}_{i}}{({{\bf{Y}}}_{i}-{{\bf{X}}}_{i}{{\boldsymbol{\beta }}}_{i})}^{T}({{\bf{Y}}}_{i}-{{\bf{X}}}_{i}{{\boldsymbol{\beta }}}_{i})+\mathop{\sum}\limits_{1\le i\le M}2{\lambda }_{i}{\Vert {{\boldsymbol{\beta }}}_{i}\Vert }_{1}+\mathop{\sum}\limits_{1\le {i}_{1} < {i}_{2}\le M}{c}_{{i}_{1}{i}_{2}}{\Vert {{\boldsymbol{\beta }}}_{{i}_{1}}^{{s}_{{i}_{1}{i}_{2}}}-{{\boldsymbol{\beta }}}_{{i}_{2}}^{{s}_{{i}_{1}{i}_{2}}}\Vert }_{2}^{2}$$

where \({\lambda }_{i},\,i=1,\ldots,M\) are the population-specific tuning parameters associated with the lasso penalty; \({{\boldsymbol{\beta }}}_{{i}_{1}}^{{s}_{{i}_{1}{i}_{2}}}\) and \({{\boldsymbol{\beta }}}_{{i}_{2}}^{{s}_{{i}_{1}{i}_{2}}}\) denote the vectors of effect sizes for SNPs in the \({i}_{1}\)-th and \({i}_{2}\)-th populations, respectively, restricted to the set of shared SNPs (\({s}_{{i}_{1}{i}_{2}}\)) across the pair of populations; and \({c}_{{i}_{1}{i}_{2}},\,1\le {i}_{1} < {i}_{2}\le M\) are the tuning parameters associated with the ridge penalty imposing effect-size similarity across pairs of populations.

In the above, the first penalty term, \({\sum }_{1\le i\le M}2{\lambda }_{i}{\Vert {{\boldsymbol{\beta }}}_{i}\Vert }_{1}\), is a lasso penalty. The lasso produces sparse solutions40, and recent PRS methods that have implemented the lasso penalty in the single-ancestry setting have shown promising performance28,29. The second penalty term, \({\sum }_{1\le {i}_{1} < {i}_{2}\le M}{c}_{{i}_{1}{i}_{2}}{\Vert {{\boldsymbol{\beta }}}_{{i}_{1}}^{{s}_{{i}_{1}{i}_{2}}}-{{\boldsymbol{\beta }}}_{{i}_{2}}^{{s}_{{i}_{1}{i}_{2}}}\Vert }_{2}^{2}\), is a ridge penalty. As it has been widely shown that the causal effect sizes of SNPs tend to be correlated across populations42,43, we use the ridge penalty to induce genetic similarity across populations. Unlike the fused lasso44, which applies a lasso penalty to the differences, the ridge penalty allows small differences in SNP effects across populations rather than truncating them to zero. The solutions for population-specific effect sizes under the combined lasso and ridge penalties can still be sparse.
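To make the objective concrete, the individual-level criterion above can be evaluated directly. A minimal NumPy sketch, assuming for simplicity that all populations share the same SNP set (so the shared-SNP restriction \({s}_{{i}_{1}{i}_{2}}\) is the identity); function and variable names are ours, not the package's:

```python
import numpy as np

def prosper_objective(Y, X, beta, lam, c):
    """PROSPER-style individual-level objective for M populations.

    Y, X, beta: length-M lists of phenotype vectors, genotype matrices,
                and effect-size vectors (all populations share SNPs here).
    lam:        length-M list of lasso penalties lambda_i.
    c:          single ridge penalty shared by all population pairs.
    """
    M = len(Y)
    obj = 0.0
    for i in range(M):
        resid = Y[i] - X[i] @ beta[i]
        obj += resid @ resid / len(Y[i])           # least-squares loss, scaled by 1/n_i
        obj += 2 * lam[i] * np.abs(beta[i]).sum()  # L1 (lasso) penalty
    for i1 in range(M):
        for i2 in range(i1 + 1, M):
            d = beta[i1] - beta[i2]
            obj += c * (d @ d)                     # L2 similarity penalty across pairs
    return obj
```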

The estimates of \({{\boldsymbol{\beta }}}_{i},\,i=1,\ldots,M\) in the above individual-level linear regression system can be obtained by minimizing the above least-squares objective function. Following the derivation of lassosum28, a single-ancestry method for fitting the lasso model to GWAS summary-statistics data, we show that the objective function for individual-level data can be approximated using GWAS summary statistics and LD reference matrices by substituting \(\frac{1}{{n}_{i}}{{\bf{X}}}_{i}^{T}{{\bf{X}}}_{i}\) with \({{\bf{R}}}_{i}\), where \({{\bf{R}}}_{i}\) is the estimated LD matrix based on a reference sample from the \(i\)-th population, and \(\frac{1}{{n}_{i}}{{\bf{X}}}_{i}^{T}{{\bf{y}}}_{i}\) with \({{\bf{r}}}_{i}\), where \({{\bf{r}}}_{i}\) is the vector of GWAS summary statistics for the \(i\)-th population. Therefore, the objective function of the summary-level model can be written as

$$\mathop{\sum}\limits_{1\le i\le M}\left({{\boldsymbol{\beta }}}_{i}^{T}({{\bf{R}}}_{i}+{\delta }_{i}{\bf{I}}){{\boldsymbol{\beta }}}_{i}-2{{\boldsymbol{\beta }}}_{i}^{T}{{\bf{r}}}_{i}+2{\lambda }_{i}{\Vert {{\boldsymbol{\beta }}}_{i}\Vert }_{1}\right)+\mathop{\sum}\limits_{1\le {i}_{1} < {i}_{2}\le M}{c}_{{i}_{1}{i}_{2}}{\Vert {{\boldsymbol{\beta }}}_{{i}_{1}}^{{s}_{{i}_{1}{i}_{2}}}-{{\boldsymbol{\beta }}}_{{i}_{2}}^{{s}_{{i}_{1}{i}_{2}}}\Vert }_{2}^{2}$$

where the additional tuning parameters \({\delta }_{i}\), \(i=1,\ldots,M\), are introduced to regularize the LD matrices of the different populations29. For a fixed set of tuning parameters, the above objective function can be minimized using fast coordinate descent algorithms45 by iteratively updating each element of \({{\boldsymbol{\beta }}}_{i}\), \(i=1,\ldots,M\) (see “Obtain PROSPER solution” under “Methods”).
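The coordinate-wise update has a closed form: setting the subgradient of the summary-level objective with respect to a single element \({\beta }_{ij}\) to zero yields a soft-thresholding step. The sketch below illustrates this, again assuming all populations share the same SNP set; it is a simplified illustration under our own naming, not the package implementation:

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator induced by the L1 penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def prosper_cd(R, r, lam, delta, c, n_iter=100):
    """Coordinate descent sketch for the summary-statistic objective.

    R:     length-M list of (p, p) LD reference matrices (diagonal ~ 1).
    r:     length-M list of (p,) GWAS summary-statistic vectors.
    lam:   per-population lasso penalties; delta: LD regularizers.
    c:     shared cross-population ridge penalty.
    """
    M, p = len(R), len(r[0])
    beta = [np.zeros(p) for _ in range(M)]
    for _ in range(n_iter):
        for i in range(M):
            for j in range(p):
                # partial residual excluding SNP j's own contribution
                z = r[i][j] - R[i][j] @ beta[i] + R[i][j, j] * beta[i][j]
                # pull toward the other populations' effects at SNP j
                z += c * sum(beta[k][j] for k in range(M) if k != i)
                beta[i][j] = soft(z, lam[i]) / (R[i][j, j] + delta[i] + c * (M - 1))
    return beta
```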

Reducing tuning parameters

For the selection of tuning parameters, we assume we have access to individual-level data across the different populations that are independent of the underlying GWAS from which summary statistics are generated. The above setting involves three sets of tuning parameters, \({\{{\delta }_{i}\}}_{i=1}^{M}\), \({\{{\lambda }_{i}\}}_{i=1}^{M}\), and \({\{{c}_{{i}_{1}{i}_{2}}\}}_{1\le {i}_{1} < {i}_{2}\le M}\), totaling \(M+M+\frac{M\left(M-1\right)}{2}\) parameters. As a grid search across many combinations of tuning-parameter values can be computationally intensive, we propose to reduce the search range through a series of steps. First, we use lassosum229 to analyze GWAS summary statistics and tuning data from each ancestry population by itself and obtain the values of the optimal tuning parameters (\({\delta }_{i}^{0}\), \({\lambda }_{i}^{0}\)) for \(i=1,\ldots,M\); if tuning data are available only for the target population, the (\({\delta }_{i}^{0}\), \({\lambda }_{i}^{0}\)) for the other populations can be optimized toward the target population. For fitting PROSPER, we fix \({\delta }_{i}={\delta }_{i}^{0}\) for \(i=1,\ldots,M\), as these parameters are essentially used to regularize estimates of population-specific LD matrices. We note that the optimal \({\{{\lambda }_{i}\}}_{i=1}^{M}\) depend on the sample sizes of the underlying training GWAS (Supplementary Fig. 1), and thus should not be arbitrarily assumed to be equal across all populations.
Considering that the optimal tuning parameters associated with the \({\mathscr{L}}_{1}\) penalty function from the single-ancestry analyses should reflect the characteristics of the GWAS data, including the underlying sparsity of effect sizes and the sample sizes, we propose to specify the \({\mathscr{L}}_{1}\) tuning parameters in PROSPER as \({\lambda }_{i}=\lambda {\lambda }_{i}^{0}\); i.e., they are determined by the corresponding tuning parameters from the ancestry-specific analyses up to a shared multiplicative factor \(\lambda\). Finally, for computational feasibility, we further assume that effect sizes across all pairs of populations have a similar degree of homogeneity and thus set all \({\{{c}_{{i}_{1}{i}_{2}}\}}_{1\le {i}_{1} < {i}_{2}\le M}\) equal to a common value \(c\). We discuss this assumption and perform a sensitivity analysis later (see Discussion). Under these assumptions, the objective function to minimize with respect to \({{\boldsymbol{\beta }}}_{i},\,i=1,\ldots,M\), becomes

$$\mathop{\sum}\limits_{1\le i\le M}\left({{\boldsymbol{\beta }}}_{i}^{T}({{\bf{R}}}_{i}+{\delta }_{i}^{0}{\bf{I}}){{\boldsymbol{\beta }}}_{i}-2{{\boldsymbol{\beta }}}_{i}^{T}{{\bf{r}}}_{i}+2\lambda {\lambda }_{i}^{0}{\Vert {{\boldsymbol{\beta }}}_{i}\Vert }_{1}\right)+\mathop{\sum}\limits_{1\le {i}_{1} < {i}_{2}\le M}c\,{\Vert {{\boldsymbol{\beta }}}_{{i}_{1}}^{{s}_{{i}_{1}{i}_{2}}}-{{\boldsymbol{\beta }}}_{{i}_{2}}^{{s}_{{i}_{1}{i}_{2}}}\Vert }_{2}^{2}$$

where \(\lambda\) and \(c\) are the only two remaining tuning parameters, for the lasso penalty and the genetic-similarity penalty, respectively.
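Under this parameterization, the tuning grid collapses to combinations of the two scalars. A small sketch, with hypothetical grid values chosen purely for illustration:

```python
import numpy as np
from itertools import product

# Optimal single-ancestry parameters from step 1 (hypothetical values, M = 3)
delta0 = [2.0, 1.5, 3.0]       # delta_i^0: fixed as-is when fitting PROSPER
lambda0 = [0.01, 0.05, 0.02]   # lambda_i^0: rescaled by one shared factor

# Only two free tuning parameters remain: the multiplier lambda and the shared c
lambda_grid = np.logspace(-1, 1, 5)   # candidate multiplicative factors
c_grid = np.logspace(-2, 1, 4)        # candidate similarity penalties

# 5 x 4 = 20 settings instead of a grid over M + M + M(M-1)/2 parameters
settings = [([lam * l0 for l0 in lambda0], c)
            for lam, c in product(lambda_grid, c_grid)]
```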

Ensemble

Using an ensemble method to combine PRS has been shown to be promising in CT-type methods, as opposed to picking a single optimal threshold21,36. In general, a specific form of the penalty function, or equivalently a model for the prior distribution in the Bayesian framework, may not adequately capture the complex nature of the underlying effect-size distribution of SNPs across diverse populations. We conjecture that when the effect-size distribution is likely to be mis-specified, an ensemble method, which combines PRS across different values of the tuning parameters instead of choosing one optimal set, is likely to improve prediction. Therefore, as a last step, we obtain the final PROSPER model using an ensemble method, super learning46,47,48, implemented in the SuperLearner R package, to combine PRS generated under various tuning-parameter settings, optimized using tuning data from the target population. The super learner used here is based on three supervised learning algorithms: lasso40, ridge41, and linear regression (see “Methods”).
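A bare-bones version of this ensemble step can be sketched in NumPy: base learners are fit on the candidate-PRS matrix, their out-of-fold predictions are combined by a meta-regression, and the result collapses to one combined weight per candidate PRS (the \({w}_{\lambda,c,i}\) of Fig. 1). For simplicity the sketch uses ordinary least squares plus two ridge fits as base learners in place of the lasso/ridge/linear trio of the SuperLearner package; all names are hypothetical:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution; alpha = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

def super_learn(prs, y, alphas=(0.0, 1.0, 10.0), n_folds=5):
    """Stacking sketch: combine candidate PRSs via cross-validated base learners.

    prs: (n, K) matrix of candidate PRSs, one column per (lambda, c, i) setting.
    y:   (n,) tuning phenotype. Returns one combined weight per PRS column.
    """
    n = len(y)
    folds = np.arange(n) % n_folds
    # Out-of-fold predictions from each base learner
    Z = np.zeros((n, len(alphas)))
    for m, a in enumerate(alphas):
        for f in range(n_folds):
            tr, te = folds != f, folds == f
            Z[te, m] = prs[te] @ ridge_fit(prs[tr], y[tr], a)
    # Meta-regression of y on out-of-fold predictions gives learner weights
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    # Refit base learners on all tuning data; collapse to per-PRS weights
    coefs = np.column_stack([ridge_fit(prs, y, a) for a in alphas])
    return coefs @ w
```

The returned vector plays the role of the ensemble weights: the final score in new data is simply the PRS matrix times this vector.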

Methods comparison on simulated data

We conducted simulation analyses on continuous traits under various genetic architectures21 to evaluate the performance of different methods, which can be categorized into five groups: single-ancestry methods trained on target GWAS data (single-ancestry methods), single-ancestry methods trained on EUR GWAS data (EUR PRS-based methods), simple multi-ancestry methods that weight single-ancestry PRS (weighted PRS), recently published multi-ancestry methods (existing multi-ancestry methods), and our proposed method, PROSPER. Single-ancestry methods include CT36, LDpred230, and lassosum229. Existing multi-ancestry methods include PRS-CSx34 and CT-SLEB21. All methods were implemented using the latest available version of the underlying software. The performance of the methods is evaluated by R2 measured on validation samples independent of the training and tuning datasets. Analyses in this and the following sections are restricted to a total of 2,586,434 SNPs, which are included in either HapMap 3 (HM3)49 or the Multi-Ethnic Genotyping Array (MEGA)50. LD reference samples for all five ancestries (EUR, AFR, AMR, EAS, and SAS) in this and the following sections are from the 1000 Genomes Project (Phase 3)51 (1000G).
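The evaluation metric can be computed as the R2 of the PRS in the validation sample; in the real-data analyses reported later, an adjusted (incremental) R2 on top of covariates is used instead. A minimal sketch covering both cases, with hypothetical names:

```python
import numpy as np

def prs_r2(prs, y, covar=None):
    """R^2 of a PRS in validation data.

    If covariates are supplied, returns the incremental R^2 of adding
    the PRS on top of the covariate-only linear model.
    """
    def r2(X):
        X = np.column_stack([np.ones(len(y)), X])  # add intercept
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    if covar is None:
        return r2(prs)
    return r2(np.column_stack([covar, prs])) - r2(covar)
```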

The results (Fig. 2, Supplementary Figs. 2–5, and Supplementary Data 1–5) show that multi-ancestry methods generally exhibit superior performance compared to single-ancestry methods. Weighted PRS generated from methods modeling LD (LDpred2 and lassosum2) can lead to a noticeable improvement in performance (green bars in Fig. 2). Notably, PROSPER shows robust performance uniformly across different scenarios. When the sample size of the target non-EUR population is small (\({N}_{{target}}=15{{\rm{K}}}\)) (Fig. 2a), PROSPER has performance comparable to that of other multi-ancestry methods, such as weighted LDpred2 and PRS-CSx, under a high degree of polygenicity (\({p}_{{causal}}=0.01\)). However, under the same sample-size setting and lower polygenicity (\({p}_{{causal}}=0.001\) and \(5\times {10}^{-4}\)), PRS-CSx and CT-SLEB outperform PROSPER, with the margin of improvement increasing as the strength of negative selection decreases (strong negative selection in Fig. 2a, mild negative selection in Supplementary Fig. 2a, and no negative selection in Supplementary Fig. 3a). When the sample size of the target population is large (\({N}_{{target}}=80{{\rm{K}}}\)) (Fig. 2b and Supplementary Figs. 2–5b), PROSPER almost uniformly outperforms all other methods, particularly for the AFR population, with weighted LDpred2 remaining a close competitor.

Fig. 2: Performance comparison of alternative methods on simulated data generated with different sample sizes and genetic architectures under strong negative selection and fixed common-SNP heritability.
figure 2

Data are simulated for a continuous phenotype under a strong negative selection model and three different degrees of polygenicity (top panel: \({p}_{{causal}}=0.01\), middle panel: \({p}_{{causal}}=0.001\), and bottom panel: \({p}_{{causal}}=5\times {10}^{-4}\)). Common-SNP heritability is fixed at 0.4 across all populations, and the correlation in effect sizes for shared SNPs between all pairs of populations is fixed at 0.8. The sample sizes for the GWAS training data are a n = 15,000 and b n = 80,000 for the four non-EUR target populations, and are fixed at n = 100,000 for the EUR population. PRS generated from all methods are tuned in n = 10,000 samples and then tested in n = 10,000 independent samples in each target population. The PRS-CSx package is restricted to SNPs from HM3, whereas the other alternative methods use SNPs from either HM3 or MEGA. Bars in the figure show the R2 performance of each method in each dataset. Colors are described on the right side of the figure. Source data are provided in Supplementary Data 1.

We further compare the computational efficiency of PROSPER with that of PRS-CSx, the state-of-the-art Bayesian method for generating multi-ancestry PRS. We train PRS models for the two methods using simulated data for chromosome 22 on a single core of an AMD EPYC 7702 64-core processor running at 2.0 GHz. We observe (Supplementary Data 6) that PROSPER is 37 times faster than PRS-CSx (3.0 vs. 111.1 minutes) in a two-ancestry analysis including AFR and EUR, and 88 times faster (6.8 vs. 595.8 minutes) in the analysis of all five ancestries. PRS-CSx uses about one-third as much memory as PROSPER (0.78 vs. 2.24 GB in the two-ancestry analysis, and 0.84 vs. 2.35 GB in the five-ancestry analysis).

23andMe data analysis

We applied the various methods to GWAS summary statistics available from 23andMe, Inc. to predict two continuous traits, heart metabolic disease burden and height, as well as five binary traits: any cardiovascular disease (any CVD), depression, migraine diagnosis, morning person, and sing back musical note (SBMN). The datasets are available for all five ancestries: African American (AA), Latino, EAS, EUR, and SAS. The methods are tuned and validated on a set of independent individuals of the corresponding ancestry from the 23andMe participant cohort (see the “Real data analysis” section under “Methods” for data description, and Supplementary Data 7 and 8 for the sample sizes used in training, tuning, and validation). In an earlier version of the analysis, we had analyzed the data using an older version of LDpred2 from the bigsnpr package (version 1.8) that was available when the project was initiated. Quality-control analysis following comments from one of the reviewers indicated convergence problems with those results. As we were not able to update the analysis using the most recent version of LDpred2 in the bigsnpr package (version 1.12) due to time constraints of the 23andMe team, we do not report results from LDpred2 and its corresponding EUR and weighted methods in this section.

From the analysis of the two continuous traits (Fig. 3 and Supplementary Data 9), we observe that lassosum2 and its related methods (EUR lassosum2 and weighted lassosum2) generally perform better than CT and its related methods. Building on the advantage of lassosum2, PROSPER further improves performance and, in most settings, outperforms all alternative methods, including PRS-CSx and CT-SLEB. PROSPER demonstrates particularly remarkable improvement for both traits in AA and Latino (26.9% relative improvement in R2 over the second-best method on average; yellow cells in Supplementary Data 10) (first two panels in Fig. 3a, b). For EAS and SAS, PROSPER is slightly better than the other methods, except for heart metabolic disease burden in SAS (the last panel in Fig. 3a), which has the smallest sample size (~20 K).

Fig. 3: Performance comparison of alternative methods for prediction of two continuous traits in 23andMe.
figure 3

We analyzed two continuous traits, a heart metabolic disease burden and b height. PRS are trained using 23andMe data that are available for five populations: African American, Latino, EAS, EUR, and SAS, and then tuned in an independent set of individuals from 23andMe of the corresponding ancestry. Performance is reported based on adjusted R2 accounting for sex, age, and PC1–5 in a held-out validation sample of individuals from 23andMe of the corresponding ancestry. The ratio of sample sizes for training, tuning, and validation is roughly 7:2:1, and detailed numbers are in Supplementary Data 7 and 8. The PRS-CSx package is restricted to SNPs from HM3, whereas the other alternative methods use SNPs from either HM3 or MEGA. LDpred2 and its corresponding EUR and weighted methods are excluded to avoid misinterpretation, as collaboration restrictions with 23andMe, Inc. prevented us from updating these methods to the latest version of the bigsnpr package. Bars in the figure show the adjusted R2 performance of each method in each dataset. Colors are described on the right side of the figure. Source data are provided in Supplementary Data 9.

The results from the analysis of the binary traits (Fig. 4 and Supplementary Data 9) show that PROSPER generally exhibits better performance (7.8% and 12.3% relative improvement in logit-scale variance (see “Methods”) over CT-SLEB and PRS-CSx, respectively, averaged across populations and traits; blue and red cells, respectively, in Supplementary Data 10). A similar trend is observed for the analyses of AA and Latino, where PROSPER usually has the best performance (first two panels in Fig. 4a–e). In general, no single method uniformly outperforms the others. Weighted lassosum2 has outstanding performance for depression (Fig. 4b), while PROSPER is superior for morning person (Fig. 4d). PRS-CSx shows a slight improvement in the analysis of migraine diagnosis for the EAS population (second-to-last panel in Fig. 4c), and CT-SLEB performs best in the analysis of any CVD for the SAS population (last panel in Fig. 4a).

Fig. 4: Performance comparison of alternative methods for prediction of five binary traits in 23andMe.
figure 4

We analyzed five binary traits, a any CVD, b depression, c migraine diagnosis, d morning person, and e SBMN. PRS are trained using 23andMe data that are available for five populations: African American, Latino, EAS, EUR, and SAS, and then tuned in an independent set of individuals from 23andMe of the corresponding ancestry. Performance is reported based on adjusted AUC accounting for sex, age, and PC1–5 in a held-out validation sample of individuals from 23andMe of the corresponding ancestry. The ratio of sample sizes for training, tuning, and validation is roughly 7:2:1, and detailed numbers are in Supplementary Data 7 and 8. The PRS-CSx package is restricted to SNPs from HM3, whereas the other alternative methods use SNPs from either HM3 or MEGA. LDpred2 and its corresponding EUR and weighted methods are excluded to avoid misinterpretation, as collaboration restrictions with 23andMe, Inc. prevented us from updating these methods to the latest version of the bigsnpr package. Bars in the figure show the adjusted AUC performance of each method in each dataset. Colors are described on the right side of the figure. Source data are provided in Supplementary Data 9.

GLGC and AoU data analysis

Given the exceptionally large sample sizes from 23andMe, we further applied the alternative methods to two other real datasets, GLGC and AoU. The GWAS summary statistics from GLGC for four blood lipid traits, high-density lipoprotein (HDL), low-density lipoprotein (LDL), log-transformed triglycerides (logTG), and total cholesterol (TC), are publicly available for all five ancestries: African/Admixed African, Hispanic, EAS, EUR, and SAS (see “Methods” for data description, and Supplementary Data 7 for sample sizes). Further, we generated GWAS summary-statistics data from the AoU study for two anthropometric traits, body mass index (BMI) and height, for individuals from three ancestries: AFR, EUR, and Latino/Admixed American (see “Methods” for data description, and Supplementary Data 7 for sample sizes). Both the blood lipid traits and the anthropometric traits have corresponding phenotype data available in the UKBB, which we use to perform tuning and validation (see “Real data analysis” under “Methods” for the ancestry composition, and Supplementary Data 8 for sample sizes). Given the limited sample size of genetically inferred AMR-ancestry individuals in UKBB, we do not report the performance of PRS on AMR individuals in UKBB. In these analyses, we implemented the LDpred2 method using the latest version of the software (version 1.12).

Results from analysis of four blood lipid traits (Fig. 5 and Supplementary Data 11) from GLGC and UKBB show that weighted PRS methods substantially outperform alternative methods. In particular, we observe that the weighted lassosum2 outperforms the other two weighted methods. Furthermore, our proposed method, PROSPER, shows improvement over weighted lassosum2 in both AFR and SAS (13.5% and 12.3% relative improvement in R2, respectively, averaged across traits) (green and orange cells, respectively, in Supplementary Data 12), but not in EAS. Notably, PROSPER outperforms PRS-CSx and CT-SLEB in most scenarios (34.2% and 37.7% relative improvement in R2, respectively, averaged across traits and ancestries) (blue and red cells, respectively, in Supplementary Data 12), with the improvement being particularly remarkable for the AFR population (Fig. 5) in which PRS development tends to be the most challenging.

Fig. 5: Performance comparison of alternative methods for prediction of four blood lipid traits (GLGC-training and UKBB-tuning/validation).
figure 5

We analyzed four blood lipid traits, a HDL, b LDL, c logTG, and d TC. PRS are trained using GLGC data that are available for five populations: admixed African or African, East Asian, European, Hispanic, and South Asian, and then tuned in individuals from UKBB of the corresponding ancestry: AFR, EAS, EUR, AMR, and SAS (see “Real data analysis” under “Methods” for ancestry composition). Performance is reported based on adjusted R2 accounting for sex, age, and PC1–10 in a held-out validation sample of individuals from UKBB of the corresponding ancestry. Sample sizes for the training, tuning, and validation data are in Supplementary Data 7 and 8. Results for AMR are not included due to the small sample size of genetically inferred AMR-ancestry individuals in UKBB. The PRS-CSx package is restricted to SNPs from HM3, whereas the other alternative methods use SNPs from either HM3 or MEGA. Bars in the figure show the adjusted R2 performance of each method in each dataset. Colors are described on the right side of the figure. Source data are provided in Supplementary Data 11.

The results from AoU and UKBB (Fig. 6 and Supplementary Data 13) show that PROSPER generates the most predictive PRS for the two analyzed anthropometric traits in the AFR population. Bayesian and penalized regression methods that explicitly model LD tend to outperform the corresponding CT-type methods (CT, EUR CT, and weighted CT), which exclude correlated SNPs. Among the weighted methods, both weighted LDpred2 and weighted lassosum2 show major improvement over the corresponding weighted CT method. Further, for both traits, PROSPER shows remarkable improvement over the best of the weighted methods and the two other advanced methods, PRS-CSx and CT-SLEB (91.3% and 76.5% relative improvement in R2, respectively, averaged across the two traits; blue and red cells, respectively, in Supplementary Data 14).

Fig. 6: Performance comparison of alternative methods for prediction of two anthropometric traits (AoU-training and UKBB-tuning/validation).
figure 6

We analyzed two anthropometric traits, a BMI and b height. PRS are trained using AoU data that are available for three populations: African, Latino/Admixed American, and European, and then tuned in individuals from UKBB of the corresponding ancestry: AFR, AMR, and EUR (see “Real data analysis” under “Methods” for ancestry composition). Performance is reported based on adjusted R2 accounting for sex, age, and PC1–10 in a held-out validation sample of individuals from UKBB of the corresponding ancestry. Sample sizes for the training, tuning, and validation data are in Supplementary Data 7 and 8. Results for AMR are not included due to the small sample size of genetically inferred AMR-ancestry individuals in UKBB. The number of SNPs analyzed in the AoU analyses is much smaller than in the other analyses because the GWAS from AoU is based on array data only (see Supplementary Data 7 for the number of SNPs). The PRS-CSx package is restricted to SNPs from HM3, whereas the other alternative methods use SNPs from either HM3 or MEGA. Bars in the figure show the adjusted R2 performance of each method in each dataset. Colors are described on the right side of the figure. Source data are provided in Supplementary Data 13.

Gain from PROSPER over lassosum2

To investigate whether the additional gain from PROSPER arises from modeling shared effects across populations or from combining PRS with super learning, we further apply a super learning step to lassosum2 (termed advanced weighted lassosum2) as a point of comparison. The simulation results (Supplementary Figs. 6–10 and Supplementary Data 15) indicate that PROSPER consistently outperforms advanced weighted lassosum2 across all scenarios. The real-data results (Supplementary Figs. 11 and 12 and Supplementary Data 16) show that the relative performance of the two methods depends on traits and ancestries. PROSPER has performance comparable to advanced weighted lassosum2 in AFR, while PROSPER outperforms advanced weighted lassosum2 in almost all scenarios in SAS and EAS. In summary, PROSPER has a 41.1% relative improvement in R2 over advanced weighted lassosum2 on average across all ancestries and traits in GLGC and AoU. We were not able to perform this analysis in 23andMe due to time constraints of the 23andMe team.

Discussion

In this article, we propose PROSPER, a powerful method that jointly models GWAS summary statistics from multiple ancestries through an ensemble of penalized regression models to improve the performance of PRS across diverse populations. We show that PROSPER is a uniquely promising method for generating powerful PRS in multi-ancestry settings through extensive simulation studies, analyses of real datasets across diverse types of complex traits, and comparisons against the most recent alternative methods. Computationally, the method is an order of magnitude faster than PRS-CSx34, an advanced Bayesian method, and comparable to CT-SLEB21, which derives the underlying PRS in closed form. We have packaged the algorithm into a command line tool based on the R programming language (https://github.com/Jingning-Zhang/PROSPER).

We compare PROSPER with a number of alternative simple and advanced methods using both simulated and real datasets. The simulation results show that PROSPER generally outperforms other existing multi-ancestry methods when the target sample size is large (Fig. 2b). However, when the sample size of the target population is small (Fig. 2a), no method performs uniformly best. In this setting, when the degree of polygenicity is lowest (\({p}_{{causal}}=5\times {10}^{-4}\)), CT-SLEB outperforms other methods by a noticeable margin, and PROSPER performs slightly worse than PRS-CSx. Simulations also show that for a highly polygenic trait (\({p}_{{causal}}=0.01\)), irrespective of sample size, both weighted lassosum2 and PROSPER tend to outperform all other methods. In terms of computational time, PROSPER is an order of magnitude faster than PRS-CSx in a five-ancestry analysis. The memory usage of PRS-CSx is smaller than that of PROSPER, but both are acceptable (Supplementary Data 6).

We observe that for the analysis of both continuous and binary traits using 23andMe Inc. data, PROSPER demonstrates a substantial advantage over all other methods for the AA and Latino populations, which have the largest sample sizes among the minority groups. This result is consistent with the superior performance of PROSPER observed in simulation settings when the sample size of the target population is large. Notably, even for the two other populations, EAS and SAS, which have much smaller sample sizes, PROSPER still performs best in half of the settings (the last two panels in Figs. 3a, b and 4a–e). For the prediction of blood lipid traits, PROSPER and the weighted PRS methods perform noticeably better than the other alternatives. For the analysis of two anthropometric traits using training data from AoU, we observe that methods that explicitly model and account for LD differences (e.g., lassosum2, LDpred2, and their corresponding weighted methods) generally achieve higher predictive accuracy than CT-based methods, which discard correlated SNPs. This result is consistent with our simulation findings under the extremely polygenic architectures expected for complex traits like height and BMI. In addition, we observe a significant improvement in PRS performance using PROSPER over the advanced weighted lassosum2 method, which incorporates a super learning step into lassosum2. This suggests that the additional gain of PROSPER arises from modeling shared effects across populations through the \({{{{{{\mathscr{L}}}}}}}_{2}\)-penalty function.

PROSPER, while showing promising results in our simulations and real-data analyses, has several limitations. First, when the training sample for a target population is small, particularly for traits with low polygenicity, the method may not perform as well as some other existing methods (Fig. 2a). In this specific scenario, where the number of true causal variants is small, a potential reason for the suboptimal performance of PROSPER is the bias induced by the lasso. This motivates future work extending PROSPER to the adaptive lasso52 for unbiased estimation and to other forms of penalty functions for sparser solutions. Second, the super learning step in PROSPER can lead to poorer performance than weighted lassosum2 when the tuning dataset is not adequately large. In the analysis of lipid traits for EAS, for example, we observe lower predictive accuracy for PROSPER than for weighted lassosum2 (the middle panel in Fig. 5b, d). This can be attributed to overfitting in the tuning sample, as the number of tuning samples of EAS origin in the UKBB is only ~1000, while the number of PRSs combined in the super learning step is close to 500. In this scenario, we suggest comparing the performance of the ensemble PRS with that of the PRS without the ensemble step, as the latter will be more resilient to overfitting. We conducted simulation analyses to further explore the ideal sample size for tuning (Supplementary Fig. 13). Generally, a tuning sample size in the range of 1000–3000 is adequate for continuous traits. Third, we used a constant tuning parameter for the genetic similarity penalty, disregarding varying genetic distances among populations53. However, introducing additional tuning parameters could result in both computational challenges and numerical instability.
We investigated this by analyzing GLGC data (see Supplementary Data 17 and “Methods”), adding an extra tuning parameter to accommodate adaptable distances between the AFR population and the others. The results indicate a disproportionate increase in computational load (the last column in Supplementary Data 17) relative to the marginal gain in predictive accuracy, along with potential instability and overfitting (gray cells in Supplementary Data 17). Lastly, the framework is modeled on a standardized genotype scale, which corresponds to strong negative selection; in reality, genetic architectures may be more diverse. To address this limitation, the models could be extended to accommodate varying degrees of negative selection by scaling effects by powers of allele frequencies, as discussed in ref. 21.

PROSPER and a number of other recent methods have been developed for modeling summary-statistics data across discrete populations, typically defined by self-reported ancestry information. Increasing the sample sizes of LD reference panels for various populations, well matched to those providing the training datasets, can further enhance the performance of PROSPER and other methods that explicitly incorporate LD information into modeling. Further, there is an emerging need to consider the underlying continuum of genetic diversity across populations in both the development and implementation of PRS in diverse populations54. Toward this goal, a recent method called GAUDI55, based on the fused lasso penalty, has been proposed for developing PRS in admixed populations using individual-level data. While GAUDI shares similarities with PROSPER in the use of the lasso penalty, the two methods differ in the specification of tuning parameters and the use of the ensemble step. The model specification of PROSPER makes it readily amenable to handling continuous genetic ancestry data, but further research is needed for scalable implementation of the method with individual-level data and extensive empirical evaluation.

To conclude, we have proposed PROSPER, a statistically powerful and computationally scalable method for generating multi-ancestry PRS using GWAS summary statistics and additional tuning and validation datasets across diverse populations. While no method is uniformly powerful in all settings, we show that PROSPER is the most robust among a large variety of recent methods across a wide range of settings. As individual-level data from GWAS of diverse populations become increasingly available, PROSPER and other methods will require additional considerations for incorporating continuous genetic ancestry information, both global and local, into the underlying modeling framework.

Methods

We confirm that our research complies with all relevant ethical regulations. All individuals from 23andMe included in the study provided informed consent and answered surveys online according to our human subject protocol reviewed and approved by Ethical & Independent Review Services, a private institutional review board (http://www.eandireview.com). All participants from UK Biobank provided written informed consent (more information is available at https://www.ukbiobank.ac.uk/2018/02/gdpr/). The information of individuals from All of Us included in our analyses was collected according to the All of Us Research Program Operational Protocol (https://allofus.nih.gov/sites/default/files/aou_operational_protocol_v1.7_mar_2018.pdf). The detailed consent process of All of Us is described at https://allofus.nih.gov/about/protocol/all-us-consent-process.

Data preparation and formatting in PROSPER

We match SNPs and their alleles in GWAS summary statistics and in the genotypes of individuals used for tuning and validation to those in the 1000G reference data (phase 3)51. To simplify the computation of the high-dimensional LD matrix, we use existing LD block information from EUR28 to divide the whole genome and assume the blocks to be independent. We use PLINK1.956 with the flag --r bin4 to compute the LD matrix within each block in each ancestry for common SNPs (MAF > 0.01) in either HM349 or MEGA50. SNPs that are not common in all populations are modeled only in the populations where they are common; if a SNP is population-specific, i.e., common in only one population, we model it using only the lasso penalty without the genetic similarity penalty. The parameter path of the tuning parameter \(\lambda\) for the scale factor in the lasso penalty is set to a sequence evenly spaced on a logarithmic scale from \({\lambda }^{\max }=\mathop{\min }\limits_{1\le i\le M}\left(\frac{\mathop{\max }\limits_{1\le k\le {p}_{i}}\left(\left|{r}_{{ik}}\right|\right)}{{\lambda }_{i}^{0}}\right)\) to \({\lambda }^{\min }=0.001\times {\lambda }^{\max }\), which guarantees non-zero solutions, where \({r}_{{ik}}\) is the GWAS summary statistic for the \(k\)-th SNP in the \(i\)-th population, and \({\lambda }_{i}^{0}\) is the underlying value of the optimal tuning parameter \(\lambda\) for the \(i\)-th population. The parameter path for the tuning parameter \(c\) for the genetic similarity penalty is set to a sequence evenly spaced on a quad-root scale from \({c}^{\min }=2\) to \({c}^{\max }=100\), i.e., seq(\({c}^{\min }\)^(1/4), \({c}^{\max }\)^(1/4), length.out = 10)^4 in R. For all analyses excluding 23andMe, the length of each parameter sequence is set to 10; for the 23andMe analysis, it is set to 5 to reduce the computational workload arising from the confidentiality requirements of the 23andMe dataset.
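As a concrete illustration, the two parameter paths described above can be sketched in a few lines of Python (a hypothetical helper, not part of the PROSPER package; `r_list` holds the per-population summary-statistic vectors and `lambda0_list` the per-population optimal single-ancestry tuning parameters):

```python
import numpy as np

def lambda_path(r_list, lambda0_list, n_lambda=10):
    """Log-spaced path for the lasso tuning parameter, from
    lambda_max = min_i( max_k |r_ik| / lambda_i^0 ) down to
    lambda_min = 0.001 * lambda_max."""
    lam_max = min(np.max(np.abs(r)) / l0 for r, l0 in zip(r_list, lambda0_list))
    lam_min = 0.001 * lam_max
    return np.exp(np.linspace(np.log(lam_max), np.log(lam_min), n_lambda))

def c_path(c_min=2.0, c_max=100.0, n_c=10):
    """Quad-root-spaced path for the genetic-similarity penalty, mirroring
    seq(c_min^(1/4), c_max^(1/4), length.out = 10)^4 in R."""
    return np.linspace(c_min ** 0.25, c_max ** 0.25, n_c) ** 4
```

The quad-root spacing concentrates candidate values near the lower end of the range, where the similarity penalty changes the solution most.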

Obtain PROSPER solution

For \(M\) populations, the objective function to minimize over the \({p}_{i}\)-dimensional vectors of SNP effects, \({{{{{{\boldsymbol{\beta }}}}}}}_{i}\), \(i=1,\ldots,M\), is

$$L({\boldsymbol{\beta}}_{1},\ldots,{\boldsymbol{\beta}}_{M})=\mathop{\sum}\limits_{1\le i\le M}\left({\boldsymbol{\beta}}_{i}^{T}({\mathbf{R}}_{i}+{\delta}_{i}\mathbf{I}){\boldsymbol{\beta}}_{i}-2{\boldsymbol{\beta}}_{i}^{T}{\mathbf{r}}_{i}+2{\lambda}_{i}{\Vert {\boldsymbol{\beta}}_{i}\Vert}_{1}\right)+\mathop{\sum}\limits_{1\le {i}_{1} < {i}_{2}\le M}{c}_{{i}_{1}{i}_{2}}{\left\Vert {\boldsymbol{\beta}}_{{i}_{1}}^{{s}_{{i}_{1}{i}_{2}}}-{\boldsymbol{\beta}}_{{i}_{2}}^{{s}_{{i}_{1}{i}_{2}}}\right\Vert}_{2}^{2}$$

where \({\mathbf{R}}_{i}\) is an estimate of the \({p}_{i}\)-by-\({p}_{i}\) LD matrix based on a reference sample from the \(i\)-th population, \({\mathbf{r}}_{i}\) is the \({p}_{i}\)-dimensional vector of GWAS summary statistics in the \(i\)-th population, and \({\boldsymbol{\beta}}_{{i}_{1}}^{{s}_{{i}_{1}{i}_{2}}}\) and \({\boldsymbol{\beta}}_{{i}_{2}}^{{s}_{{i}_{1}{i}_{2}}}\) denote the effect vectors for the SNPs shared between the \({i}_{1}\)-th and \({i}_{2}\)-th populations (the set of shared SNPs is denoted by \({s}_{{i}_{1}{i}_{2}}\)); \({\delta}_{i}\), \({\lambda}_{i}\) and \({c}_{{i}_{1}{i}_{2}}\) are tuning parameters as defined in the sections above.

This optimization can be solved using a coordinate descent algorithm that iteratively updates each element of the vectors. Taking the derivative with respect to the effect of SNP \(k\) in the \(i\)-th population, \(k=1,\ldots,{p}_{i}\), \(i=1,\ldots,M\),

$$\frac{\partial L\left({\boldsymbol{\beta}}_{1},\ldots,{\boldsymbol{\beta}}_{M}\right)}{\partial {\beta}_{ik}}=2\left(1+{\delta}_{i}+\mathop{\sum}\limits_{{i}^{\prime}\ne i,\,1\le {i}^{\prime}\le M}{c}_{i{i}^{\prime}}\right){\beta}_{ik}+2{\lambda}_{i}\frac{\partial \left|{\beta}_{ik}\right|}{\partial {\beta}_{ik}}-2\left({r}_{ik}-\mathop{\sum}\limits_{{k}^{\prime}\ne k,\,1\le {k}^{\prime}\le {p}_{i}}{R}_{i,{k}^{\prime}k}{\beta}_{i{k}^{\prime}}+\mathop{\sum}\limits_{1\le {i}^{\prime}\le M\ \mathrm{s.t.}\ k\in {s}_{i{i}^{\prime}}}{c}_{i{i}^{\prime}}{\beta}_{{i}^{\prime}k}\right)$$

where \({\beta}_{ik}\) denotes the effect for SNP \(k\) in \({\boldsymbol{\beta}}_{i}\), \({r}_{ik}\) denotes the summary statistic for SNP \(k\) in \({\mathbf{r}}_{i}\), and \({R}_{i,{k}^{\prime}k}\) denotes the LD between SNP \(k\) and SNP \({k}^{\prime}\) in \({\mathbf{R}}_{i}\).

Setting \(\frac{\partial L\left({\boldsymbol{\beta}}_{1},\ldots,{\boldsymbol{\beta}}_{M}\right)}{\partial {\beta}_{ik}}=0\) after the \(t\)-th iteration yields the updating rule for the \((t+1)\)-th iteration

$${\beta}_{ik}^{(t+1)}=\frac{\mathrm{sign}\left({u}_{ik}\right)\cdot \max \{0,\left|{u}_{ik}\right|-{\lambda}_{i}\}}{1+{\delta}_{i}+\mathop{\sum}\limits_{1\le {i}^{\prime}\le M\ \mathrm{s.t.}\ k\in {s}_{i{i}^{\prime}}}{c}_{i{i}^{\prime}}}$$

where

$${u}_{ik}={r}_{ik}-\mathop{\sum}\limits_{{k}^{\prime}\ne k,\,1\le {k}^{\prime}\le {p}_{i}}{R}_{i,{k}^{\prime}k}{\beta}_{i{k}^{\prime}}^{(t)}+\mathop{\sum}\limits_{1\le {i}^{\prime}\le M\ \mathrm{s.t.}\ k\in {s}_{i{i}^{\prime}}}{c}_{i{i}^{\prime}}{\beta}_{{i}^{\prime}k}^{(t)}$$
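For intuition, the coordinate-descent update above can be sketched as a toy numpy implementation (a hypothetical simplification assuming all populations share the same SNP set; the actual package works block-wise on LD blocks and handles population-specific SNPs):

```python
import numpy as np

def prosper_coordinate_descent(R, r, delta, lam, c, n_iter=100):
    """Toy coordinate descent for the PROSPER objective.
    R: list of (p, p) LD matrices, one per population;
    r: list of length-p summary-statistic vectors;
    delta, lam: per-population tuning parameters;
    c: (M, M) symmetric similarity penalties (diagonal ignored)."""
    M, p = len(R), len(r[0])
    beta = [np.zeros(p) for _ in range(M)]
    for _ in range(n_iter):
        for i in range(M):
            c_sum = sum(c[i][j] for j in range(M) if j != i)
            for k in range(p):
                # u_ik = r_ik - sum_{k'!=k} R_{i,k'k} beta_{ik'}
                #        + sum_{i'!=i} c_{ii'} beta_{i'k}
                u = (r[i][k]
                     - R[i][:, k] @ beta[i] + R[i][k, k] * beta[i][k]
                     + sum(c[i][j] * beta[j][k] for j in range(M) if j != i))
                # soft-threshold update divided by 1 + delta_i + sum of c_ii'
                beta[i][k] = (np.sign(u) * max(0.0, abs(u) - lam[i])
                              / (1.0 + delta[i] + c_sum))
    return beta
```

With a single population, zero similarity penalty, and an identity LD matrix, this reduces to the familiar lasso soft-thresholding of the marginal summary statistics.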

Super learning

After obtaining PRSs for all populations under all tuning parameter settings, we apply super learning, trained on the tuning samples, to combine them into the final PROSPER model, which is then tested on the validation samples. We use the function “SuperLearner” implemented in the R package of the same name, and include three linear prediction algorithms for continuous outcomes: lasso, ridge, and linear regression; and two prediction algorithms for binary outcomes: lasso and linear regression. We did not include ridge for binary outcomes because it is unavailable in the function for binary outcomes. For the included algorithms that have parameters: (1) for lasso, we use 100 values on the lambda path computed under the default settings of the glmnet package; (2) for ridge, we use a lambda path from 1 to 20 in increments of 0.1. We use the area under the ROC curve (AUC) as the objective function for binary outcomes, with the flag “method = method.AUC” in the function.
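The stacking idea behind this step can be sketched with a numpy-only analog (a hypothetical simplification of R's SuperLearner: ridge regressions plus OLS as base learners over the candidate PRS columns, combined by a meta-level OLS fitted to out-of-fold predictions):

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution; alpha = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

def super_learn(prs, y, alphas=(1.0, 5.0, 10.0), k=5, seed=0):
    """prs: (n, q) matrix of candidate PRS values on the tuning sample;
    returns a length-q weight vector for the combined PRS."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % k
    base = list(alphas) + [0.0]            # base learners: ridge grid + OLS
    oof = np.zeros((n, len(base)))         # out-of-fold predictions
    for j, a in enumerate(base):
        for f in range(k):
            tr, te = folds != f, folds == f
            w = ridge_fit(prs[tr], y[tr], a)
            oof[te, j] = prs[te] @ w
    # meta-learner: OLS weights on the out-of-fold predictions
    meta_w, *_ = np.linalg.lstsq(oof, y, rcond=None)
    # collapse base-learner weights into one weight per candidate PRS
    base_w = np.column_stack([ridge_fit(prs, y, a) for a in base])
    return base_w @ meta_w
```

Fitting the meta-learner on out-of-fold rather than in-sample predictions is what protects the ensemble against overfitting when the number of candidate PRSs is large relative to the tuning sample.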

Existing PRS methods

We compare five groups of PRS methods. The first group, single-ancestry methods, contains commonly used single-ancestry methods, including CT, LDpred2, and lassosum2, trained on GWAS data from the target population. The second group, EUR PRS-based methods, comprises the same three single-ancestry methods trained on EUR GWAS data. The third group, weighted PRS, uses weights estimated from a linear regression to combine the PRSs estimated by the corresponding single-ancestry method in all populations. The fourth group, existing multi-ancestry methods, includes two recently published and well-performing multi-ancestry methods, PRS-CSx and CT-SLEB. The last group is our proposed PROSPER. For all algorithms with tuning parameters or weights, the optimal values are determined based on predictive R2 or AUC in the tuning samples, and performance is finally evaluated in the validation samples.

Below are detailed descriptions of the existing PRS methods used as comparisons in this manuscript. In short, CT and CT-SLEB model the less-dependent genetic variants retained after a clumping step. LDpred2 and PRS-CSx are Bayesian methods that account for LD among genetic variants. Lassosum2 and our proposed PROSPER are penalized regression methods capable of modeling genome-wide genetic variants and fitting the model quickly. Among the three multi-ancestry methods, CT-SLEB and PRS-CSx model the cross-ancestry genetic correlation using a multivariate Bayesian prior, while our proposed PROSPER uses a ridge penalty to impose effect-size similarity across pairs of populations.

CT is implemented in our analysis using an r2 cutoff of \(0.1\) in the clumping step, followed by thresholding with the P value cutoff treated as a tuning parameter chosen from 5 × 10−8, 1 × 10−7, 5 × 10−7, 1 × 10−6, …, 5 × 10−1, and \(1.0\); P values are obtained from the GWAS summary statistics using chi-squared tests.
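The thresholding step, with the P value cutoff swept over the grid above, can be sketched as follows (a hypothetical helper; the clumping mask itself would be produced beforehand, e.g., in PLINK):

```python
import numpy as np

# Candidate P value cutoffs: 5e-8, 1e-7, 5e-7, ..., 5e-1, 1.0
P_GRID = sorted({c * 10.0 ** -e for c in (1, 5) for e in range(1, 8)} | {5e-8, 1.0})

def ct_weights(beta_gwas, pvals, clump_keep, p_cutoff):
    """Keep GWAS effect sizes for SNPs that survive clumping (boolean
    mask clump_keep, from r2 < 0.1 clumping) and pass the P value
    cutoff; zero out the rest."""
    keep = clump_keep & (pvals < p_cutoff)
    return np.where(keep, beta_gwas, 0.0)
```

Each cutoff in `P_GRID` yields one candidate PRS, and the best-performing cutoff is selected in the tuning sample.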

LDpred2 is a PRS method that places a spike-and-slab prior on GWAS summary statistics and models LD across SNPs. We implement LDpred2 via the function “snp_ldpred2_grid” in the R package “bigsnpr” version 1.12. The two tuning parameters in the algorithm are: the proportion of causal SNPs, chosen from a sequence of length 21 evenly spaced on a logarithmic scale from \({10}^{-5}\) to \(1\); and per-SNP heritability, chosen from 0.3, 0.7, 1, or 1.4 times the total heritability estimated by LD score regression divided by the number of causal SNPs. We fix the additional “sparse” option (for truncating small effects to zero) to FALSE.

lassosum2 is a PRS method that applies lasso regression to GWAS summary statistics for a single ancestry. We implement lassosum2 via the function “snp_lassosum2” in the R package “bigsnpr” version 1.12. The two tuning parameters in the algorithm are: the tuning parameter for the lasso penalty, chosen from a sequence of length 30 evenly spaced on a logarithmic scale from \(0.01\times \mathop{\max }\limits_{1\le k\le p}\left(\left|{r}_{k}\right|\right)\) to \(\mathop{\max }\limits_{1\le k\le p}\left(\left|{r}_{k}\right|\right)\); and the regularization parameter for the LD matrix, chosen from 0.001, 0.01, 0.1, and 1.

EUR PRS are the PRSs trained on EUR GWAS using the above single-ancestry methods, CT, LDpred2, and lassosum2, and then applied to individuals of the target population. No additional tuning is needed because the models have already been tuned in EUR tuning samples. When computing scores for the EUR PRS-based methods, we exclude SNPs that are not present in the validation samples from the target population.

Weighted PRS linearly combines the PRSs from the corresponding single-ancestry method trained in all populations. The weights in the linear combination are estimated by a simple linear regression in the tuning samples from the target population.

PRS-CSx is a Bayesian multi-ancestry PRS method that jointly models GWAS summary statistics and LD structures across multiple populations using a continuous shrinkage prior. In a further step, it linearly combines the posterior effect-size estimates for EUR and the target population, with weights estimated by a simple linear regression in the tuning samples from the target population. We implement PRS-CSx using its Python-based command line tool “PRS-CSx”, available at https://github.com/getian107/PRScsx. The parameter phi is chosen from the default candidate values \(1,{10}^{-2},{10}^{-4}\), and \({10}^{-6}\). Due to the package restriction, the models are fitted with HM3 SNPs only.

CT-SLEB is a multi-ancestry PRS method that starts with clumping and thresholding, then uses an empirical-Bayes (EB) method to estimate the PRS coefficients, and finally combines PRSs through a super learning model. We implement CT-SLEB using code available at https://github.com/andrewhaoyu/CTSLEB. The three tuning parameters in the algorithm are: the r2 cutoff and the base clumping window size used in the clumping step, chosen from (0.01, 0.05, 0.1, 0.2, 0.5) and (50 kb, 100 kb), respectively; and the P value cutoffs for EUR and the target population, chosen from \(5\times {10}^{-8},5\times {10}^{-7},5\times {10}^{-6},\ldots,5\times {10}^{-1}\) and \(1.0\); P values are obtained from the GWAS summary statistics using chi-squared tests.

Simulation analysis

The simulated data were generated in ref. 21. In brief, the data were simulated under five assumed genetic architectures (as described in the legends of Fig. 2 and Supplementary Figs. 2–5) and three degrees of polygenicity, pcausal = 0.01, 0.001, and 5 × 10−4. The sample sizes for the GWAS training data are n = 15,000 and n = 80,000 for the four non-EUR target populations, and fixed at n = 100,000 for the EUR population. The PRS generated by all methods are tuned in n = 10,000 samples and then tested in n = 10,000 independent samples in each target population. We repeated the simulation three times and report the average R2 for all candidate methods.

Computational time and memory usage

The computational time and memory usage of PROSPER and PRS-CSx are compared based on analyses of simulated data on chromosome 22. The analysis starts from inputting all required data into the algorithms, such as summary statistics and LD reference data, and ends with outputting the final PRS coefficients. PROSPER requires the optimal parameters from a single-ancestry analysis as input, so we also include the step of running the single-ancestry analysis, lassosum2. The analyses are performed using a single core of an AMD EPYC 7702 64-Core Processor running at 2.0 GHz. The reported results are averaged over 10 replicates. The sample size for training GWAS summary statistics is n = 15,000 for non-EUR populations and n = 100,000 for the EUR population. The sample size for the tuning dataset is n = 10,000 for each population.

Real-data analysis

Training GWAS summary statistics are from 23andMe, GLGC, and AoU. Individual-level data for tuning and validation are from 23andMe and UKBB. LD reference data are from 1000G. Detailed descriptions of these datasets are given below.

1000G data. We used samples in five populations, AFR, AMR, EAS, EUR, and SAS from 1000 Genomes Project (Phase 3)51. The components of the five populations are described in https://useast.ensembl.org/Help/Faq?id=532.

23andMe data. We analyzed two continuous traits, heart metabolic disease burden and height; and five binary traits, any CVD, depression, migraine diagnosis, morning person, and SBMN, using GWAS summary statistics obtained from 23andMe Inc. Data on these seven traits are available for all five populations: AA, EAS, EUR, Latino, and SAS. The LD reference panels used for the five populations, respectively, are unrelated individuals from 1000G of AFR, EAS, EUR, AMR, and SAS origins. Tuning and validation are performed on a set of independent individuals of the corresponding ancestry from the 23andMe participant cohort. Please see Supplementary Data 7 for training sample sizes and Supplementary Data 8 for tuning and validation sample sizes. The data we used are preprocessed in ref. 21, accessible from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3NBNCV. The details of the data, including genotyping, quality control, imputation, removal of related individuals, ancestry determination, and the preprocessing of GWAS, are described in pages 54–61 of its Supplementary Notes, and Manhattan plots and QQ plots are shown in its Supplementary Figs. 9–15. For continuous traits, we evaluate PRS performance by the predictive R2 of the PRS for residualized trait values obtained from regressing the traits on covariates. For binary traits, we evaluated PRS performance by the AUC, computed using the roc.binary function in the R package RISCA version 1.057. To compare the PRS performance of two different methods, we used the relative increase of the logit-scale variance. The logit-scale variance for binary traits is converted from the AUC by the formula \({\sigma }^{2}=2{\left\{{\Phi }^{-1}\left({AUC}\right)\right\}}^{2}\), where \({\Phi }^{-1}\) is the inverse of the cumulative distribution function of the standard normal distribution.
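The AUC-to-variance conversion can be written down directly (a sketch assuming the standard binormal relation AUC = Φ(σ/√2), which gives σ² = 2[Φ⁻¹(AUC)]²; Φ⁻¹ is the standard normal quantile function):

```python
from statistics import NormalDist

def auc_to_logit_variance(auc):
    """sigma^2 = 2 * (Phi^{-1}(AUC))^2 under the binormal model."""
    return 2.0 * NormalDist().inv_cdf(auc) ** 2

def relative_increase(auc_new, auc_ref):
    """Relative increase in the converted variance between two PRSs."""
    return auc_to_logit_variance(auc_new) / auc_to_logit_variance(auc_ref) - 1.0
```

An AUC of 0.5 (no discrimination) maps to zero variance, and the conversion grows steeply as the AUC approaches 1, which is why comparisons on the variance scale can look larger than the raw AUC differences.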

GLGC data. We analyzed four blood lipid traits, LDL, HDL, logTG, and TC, using GWAS summary statistics computed without UKBB samples that are publicly available from GLGC. Detailed information about the design of the study, genotyping, quality control, and GWAS is described in ref. 37. The data we used are preprocessed in ref. 21, as described in pages 61–62 of its Supplementary Notes, and Manhattan plots and QQ plots are shown in its Supplementary Figs. 16–19. Data on the four traits are available for all five populations: admixed African or African, EAS, EUR, Hispanic, and SAS. The LD reference panels used for the five populations, respectively, are unrelated individuals from 1000G of AFR, EAS, EUR, AMR, and SAS origins. Tuning and validation are performed on UKBB individuals (as described below) with the same reference ancestry label as the LD reference panel. Please see Supplementary Data 7 for sample sizes and the number of SNPs included in the analysis.

AoU data. We analyzed two anthropometric traits, BMI and height, using GWAS summary statistics trained from AoU. Data for the two traits are available for three ancestries: AFR, Latino/Admixed American, and EUR. The data we used are preprocessed in ref. 21, accessible from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FAWEQK. The details of the data are described in pages 61–62 of its Supplementary Notes, and Manhattan plots and QQ plots are shown in its Supplementary Figs. 20 and 21. The LD reference panels used for the three populations, respectively, are 1000G unrelated individuals of AFR, AMR, and EUR origins. Tuning and validation are performed using UKBB individuals (as described below) with the same reference ancestry label as the LD reference panel. Please see Supplementary Data 7 for sample sizes and the number of SNPs included in the analysis.

UKBB data. We used UKBB data only for tuning and validation. The four blood lipid traits and two anthropometric traits mentioned above have direct measurements in UKBB. The ancestry labels of UKBB individuals are determined by genetically predicted ancestry, as described in pages 62–63 of the Supplementary Notes of ref. 21. Tuning and validation are based on the R2 from regressing the residuals of the phenotypes, adjusted for sex, age, and PC1-10, on the PRS. Please see Supplementary Data 8 for sample sizes. We note that for PRS tested in UKBB validation samples, we use the ancestry labels in UKBB (AFR, AMR, EAS, EUR, or SAS), instead of the ancestry labels in the GWAS training data, to report the R2 in the Figures, “Results”, and “Discussion” of this paper.

Extra tuning parameter for varying genetic distances

In the discussion, we investigated adding an extra tuning parameter to accommodate adaptable distances between the AFR population and the others. Specifically, the pairwise \({c}_{ij}\) follows the formula

$${c}_{ij}=\left\{\begin{array}{ll} r\times c & \mathrm{if}\ i\ \mathrm{or}\ j=\mathrm{AFR} \\ c & \mathrm{if}\ i\ \mathrm{and}\ j\ne \mathrm{AFR}\end{array}\right.$$

where \(r\) and \(c\) are tuning parameters; \(r\) takes values 0.5, 1, and 1.5; and \(c\) takes the same sequence of candidate values as described in the first paragraph of “Methods”.
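The piecewise rule above can be sketched as a small helper (hypothetical, mirroring the formula; population labels and parameter values are passed in):

```python
def pairwise_penalties(pops, c, r, special="AFR"):
    """Build pairwise similarity penalties: c_ij = r * c when either
    population is AFR, and c otherwise."""
    return {(p1, p2): (r * c if special in (p1, p2) else c)
            for a, p1 in enumerate(pops) for p2 in pops[a + 1:]}
```

Each distinct population pair thus receives one penalty value, with all AFR-involving pairs scaled by the extra parameter r.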

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.