Improving genetic prediction by leveraging genetic correlations among human diseases and traits

Genomic prediction has the potential to contribute to precision medicine. However, to date, the utility of such predictors is limited due to low accuracy for most traits. Here theory and simulation study are used to demonstrate that widespread pleiotropy among phenotypes can be utilised to improve genomic risk prediction. We show how a genetic predictor can be created as a weighted index that combines published genome-wide association study (GWAS) summary statistics across many different traits. We apply this framework to predict risk of schizophrenia and bipolar disorder in the Psychiatric Genomics consortium data, finding substantial heterogeneity in prediction accuracy increases across cohorts. For six additional phenotypes in the UK Biobank data, we find increases in prediction accuracy ranging from 0.7% for height to 47% for type 2 diabetes, when using a multi-trait predictor that combines published summary statistics from multiple traits, as compared to a predictor based only on one trait. Genetic prediction of complex traits so far has limited accuracy because of insufficient understanding of the genetic risk. Here, Maier et al. develop an improved method for trait prediction that makes use of genetic correlations between traits and apply it to summary statistics of psychiatric diseases.

P ersonalised medicine, in which genetic testing is the basis for informing future health status and determining intervention, is effectively applied for a number of monogenic disorders 1 . For common complex disorders, which are those that are underlain by multiple genetic and environmental factors 2 , predictive genetic testing that can discriminate individuals who are most at risk is currently limited, mainly because much of the genetic variation remains poorly understood 3,4 . The potential of genetic risk prediction to (i) inform early interventions and (ii) aid diagnosis by identifying individuals with an increased genetic risk of disease could be improved substantially by increasing the accuracy of genetic risk predictors 5 . While genome-wide association studies (GWASs) of increased sample size will continue to unravel the role of genetic factors for complex diseases 6 , improved prediction models are also required to maximise the accuracy of a risk predictor.
GWASs use linear regression to independently estimate the effects of single-nucleotide polymorphisms (SNPs) across the genome, and commonly, these estimated SNP effects are then used to create a genetic risk predictor in independent samples [7][8][9] . However, this approach is not optimal because it either ignores linkage disequilibrium (LD) between markers or accounts for LD by discarding potentially informative SNPs 10 . Prediction accuracy of complex phenotypes can be improved by methods that jointly estimate the SNP associations to obtain SNP effect estimates with best linear unbiased predictor (BLUP) properties within a linear mixed model (LMM) approach, a model termed genomic BLUP (GBLUP) 7,11,12 . A multi-trait extension of the LMM approach, yielding multivariate BLUP (MT-BLUP) predictors of the SNP effects, can further improve prediction accuracy when phenotypes are genetically correlated, because measurements on each trait provide information on the genetic values of the other correlated traits [13][14][15][16] . MT-BLUP has been shown to improve prediction accuracy for genetically correlated common psychiatric disorders when combining individual-level data across independent data sets 16,17 . However, the application of MT-BLUP to complex common disorders is limited as combining individual-level genotype-phenotype data across case-control studies of all complex diseases is generally not feasible due to data protection concerns and restrictions on data sharing.
Here we overcome this limitation by developing a framework that combines publically available GWAS summary statistics across multiple studies of different traits together in a weighted index to generate approximate multi-trait summary statistic BLUP (wMT-SBLUP) predictors (Supplementary Table 1). We show through theory and simulation study that MT-BLUP predictors, which traditionally require individuallevel phenotype-genotype data for all traits, can be approximated accurately by wMT-SBLUP predictors in a computationally efficient manner using only summary statistic data and an independent genomic reference sample. We also show how multi-trait summary statistic predictors can be created directly from GWAS summary statistics (wMT-GWAS) or from predictors obtained using the software LDPred 18 that extends a single-trait summary statistic BLUP model (SBLUP) by assuming that marker effects come from a mixture of distributions. We apply our approach to multiple phenotypes in the Psychiatric Genomics Consortium (PGC) to compare summary statistic approaches to direct estimation on individual-level data. We further apply our approach to summary statistics of several other phenotypes to create predictors that we evaluate using the UK Biobank data. We show that, for most traits, our multi-trait predictors improve prediction accuracy as compared to a single-trait predictors.

Results
Overview of the approach. Standard GWAS summary statistics are ordinary least squares (OLS) estimates of the SNP effects and do not have optimal properties for prediction 11 . Even when LMM association analysis is used, the estimated SNP effects still represent marginal effects and not effects conditional on other SNPs, which is what is desirable for prediction 19 . Previous studies have shown how OLS summary statistics can be reanalysed in a mixed model framework to produce approximate BLUP predictors (summary statistic BLUP: SBLUP, implemented in the most recent release of GCTA) 18,20,21 or approximate mixture model predictors (LDPred). We first extend the SBLUP approach to a multi-trait framework (MT-SBLUP) and find a computational limitation associated with the inversion of a SNP-by-SNPby-trait matrix. To overcome this, we then derive theory to show how single-trait predictors with BLUP properties can be combined together in a weighted index to generate predictors with equivalent properties to those gained from a MT-BLUP analysis (Fig. 1).
Consider two genetically correlated traits for which we have individual-level genetic predictors with BLUP properties. For each individual, i, and focal trait of interest, f, we have a genetic prediction b g BLUP i;k for each trait, k, that we can combine together using the index weights, w i,k , for each b g BLUP i;k effect to produce a weighted multi-trait BLUP genetic predictor: In the Methods section, we show that the optimal index weights can be calculated as: where h 2 k is the SNP heritability of trait k (proportion of phenotypic variance explained by genome-wide SNPs), r G is the genetic correlation between trait k and the focal trait and R 2 k is the expected squared correlation between a phenotype and a BLUP predictor, calculated as: where M eff is the effective number of chromosome segments and N k is the sample size of trait k. These weights will ensure that the contribution of each added trait is approximately proportional to the square root of its sample size, its SNP heritability and its genetic correlation with the focal trait (trait 1), while accounting for different variances of single-trait BLUP predictors. Both h 2 k and r G can be estimated from GWAS summary statistics using LD score regression 22,23 . Following 20 , individuallevel genetic predictors with BLUP properties can also be obtained from GWAS summary statistics (b g SBLUP k , where SBLUP represents summary statistic approximate BLUP). Therefore, for any given trait, genetic predictors with BLUP properties b g SBLUP k phenotype-genotype data for all traits, enabling prediction accuracy to be improved by fully utilising all of the publically available GWAS summary statistic data. We also show how weighted indices can be calculated for GWAS summary statistics (wMT-GWAS) or from predictors obtained using the software LDPred 18 (wMT-LDPred), therefore depending upon the genetic architecture of the trait approximate multi-trait summary statistics can be created to maximise genomic prediction accuracy.
Simulation study. We first conducted a simulation study using observed SNP genotype data to confirm the expectations from our theory. We show through theory (see Methods section) that a wMT-SBLUP genetic predictor has the same expected prediction accuracy as one created from a multivariate mixed-effects model (multi-trait BLUP: MT-BLUP) if the linkage disequilibrium among SNP markers in the individual-level analysis is well approximated by a reference genotype panel (see Methods section). We demonstrate that a wMT-SBLUP predictor increases prediction accuracy over a single-trait predictor, with the magnitude of increase being proportional to the ratio of the SNP heritability of the added traits relative to that of the predicted trait, the sample size of the added traits relative to that of the predicted trait and the genetic correlation between the added traits and the predicted trait (Fig. 2, Supplementary Figs. 1 and 2). We also demonstrate how genetic predictors generated by LDPred 18 can be combined in an approximate multi-trait weighting ( Supplementary Fig. 3). We also provide a theoretical expectation for the loss in prediction accuracy that occurs when using an independent reference sample to compute SBLUP effects compared to a predictor based on BLUP effects (see Methods section), and we detail the loss of prediction accuracy in our simulation study (Fig. 2b, Supplementary Figs. 1 and 4).
Application to psychiatric disorders. We then applied our approach to the PGC schizophrenia 24,25 and bipolar data, two psychiatric disorders known to have a high genetic correlation 26 . The availability of combined individual-level data for both disorders enabled a direct comparison of the MT-BLUP 16   wMT-SBLUP approaches. We calculated all predictors for the previously used 16 PGC wave 1 (PGC1) data sets 24 and compared the prediction accuracy (correlation between predicted values and phenotypes adjusted for sex, cohort and the first 20 principal components) across diseases and approaches. We find comparable but slightly lower accuracies in the wMT-SBLUP predictors as compared to the MT-BLUP predictors (0.151 vs 0.156 in bipolar disorder and 0.217 vs 0.219 in schizophrenia) and an increase in prediction accuracy as compared to the single-trait (BLUP) predictors (0.128 in bipolar disorder, 0.198 in schizophrenia) (Fig. 3). Our results demonstrate that creating SBLUP genetic predictors using an independent LD reference sample and combining these in a weighted sum results in prediction accuracy comparable to a full MT-BLUP prediction for common complex disease traits, at a much lower computational burden. We then applied our approach to the larger PGC wave 2 (PGC2) data sets for schizophrenia 25 and bipolar disorder (see Methods section), which included the PGC1 data. To test whether the addition of more cohorts improved prediction accuracy, we estimate wMT-SBLUP predictors in the PGC2 data. Having shown the resemblance of wMT-SBLUP and MT-BLUP by theory, simulation and in the PGC1 data, we refrained from running a MT-BLUP model in the PGC2 data to avoid the computational burden of analysing the combined schizophrenia bipolar data set. For schizophrenia, there were 36 cohorts (26,412 cases and 32,440 controls in total) and for bipolar disorder there were 23 cohorts (18,865 cases and 30,460 controls in total). We conducted a cohort-wise leave-one-out crossvalidation approach to examine variation in prediction accuracy across cohorts.
For schizophrenia, we find that prediction accuracy increases in 20 of the 36 cohorts of the PGC2 data when using a wMT-SBLUP predictor as compared to a SBLUP predictor (Supplementary Fig. 5). However, the median correlation (0.300 with an SBLUP predictor, and 0.304 with a wMT-SBLUP predictor) and mean correlation (0.295 with a SBLUP predictor and 0.294 with a wMT-SBLUP predictor) across the 36 PGC2 cohorts did not improve with a wMT-BLUP predictor. For bipolar disorder, we find an improvement of the wMT-SBLUP predictor over the SBLUP predictor in 17 out of the 23 cohorts ( Supplementary  Fig. 6), with a mean correlation increase from 0.212 to 0.229 and a median correlation increase from 0.210 to 0.225. To evaluate whether this is because the weights we used for schizophrenia and bipolar disorder do not represent the mixing proportions that lead to the highest accuracy in this data set or whether other factors explain the variable results across cohorts, we created multi-trait predictors using not only weights calculated from Eq. (17) but also weights corresponding to any other mixing proportion of the two disorders (Supplementary Figs. 5, 6 and 7). This demonstrates (i) that our calculated weights are very close to the empirically optimal weights when averaged across cohorts ( Supplementary Fig. 7), (ii) that there is substantial heterogeneity across cohorts as shown by the variable prediction accuracies of single-trait and cross-trait predictors across cohorts, which is supported by previous studies 25 , and (iii) that, for some test set cohorts, there is no mixing proportion that will lead to a multi-trait predictor which outperforms a single-trait predictor. The larger gain in accuracy that results from supplementing a bipolar disorder predictor with schizophrenia data compared to supplementing a schizophrenia predictor with bipolar disorder Correlation between predicted and true genetic value Two traits are considered. The first trait has a sample size of 20,000 and a SNP heritability of 0.5. The sample size and SNP heritability of the second trait vary between panels. The blue line shows the expected prediction accuracy of a single-trait predictor. The black line shows the expected prediction accuracy of a multi-trait predictor. The purple line shows the expected prediction accuracy of a cross-trait predictor (using only trait 2 to predict trait 1). The advantage of a multi-trait predictor over a cross-trait predictor decreases with increasing r G , h 2 , and sample size of the second trait. b Simulation results. Prediction accuracy is shown as correlation between simulated genetic value and predicted phenotype of individuals. Genotypes from European individuals in the GERA cohort were used for simulation. Boxplots show results across six replicates. In the left panels, the LD structure was removed by permuting dosage values for each SNP across all individuals. In the right panels, the original genotypes were used for simulation. Expected prediction accuracies were derived for the case of unlinked genotypes and are shown as red horizontal bars. In each section, the prediction accuracy of three predictors is shown: data is consistent with greater power of the schizophrenia discovery sample. We find that for both single-trait and multitrait predictors the SBLUP predictors outperform the OLS predictors in almost all cohorts (Supplementary Figs. 5 and 6).
Application to traits recorded in a large population study. In principle, any number of traits can be combined into a multi-trait predictor at almost no computational cost. We therefore extended our approach to create wMT-SBLUP predictors from 34 phenotypes for which we could access summary statistics. In order to calculate wMT-SBLUP weights, we used LD score regression to estimate SNP heritability and genetic correlations of the 34 summary statistics traits. The results are mostly in line with previous reports 23 ( Supplementary Fig. 8, Supplementary Data 1). As test set, we used 112,338 individuals in the UK Biobank data. We matched 6 of the 34 discovery traits to traits in the UK Biobank (Supplementary Table 1) and created wMT-SBLUP predictors. For the wMT-SBLUP predictor of each focal trait, we included predictor traits with genetic correlation p-value < 0.05. For all traits, wMT-SBLUP genetic predictors were more accurate than any single-trait (SBLUP) predictor (Fig. 4). wMT-SBLUP predictors generally improved prediction accuracy over single-trait GWAS OLS predictors (Supplementary Fig. 9) and were similar to wMT-LDPred predictors (Supplementary Figs. 10 and 11.) We observe the largest increases in accuracy for Type 2 diabetes (47.8%) and depression (34.8%). Accuracy for height (0.7%) and body mass index (BMI) (1.4%) increase only marginally. As shown in our theory and simulation study, the magnitude of increase in prediction accuracy of a wMT-SBLUP predictor over a single-trait SBLUP predictor depends upon the prediction accuracies of all the traits included in the index and the genetic correlation among phenotypes. As GWAS sample sizes increase and genomic predictors increase in accuracy, a wMT-SBLUP approach will likely become increasingly beneficial.

Discussion
In summary, we demonstrate that multivariate predictors derived from GWAS summary statistics can increase prediction accuracy in a wide range of traits. This approach has particular utility in risk prediction of traits for which it is hard to generate large sample sizes for GWAS, as SNP heritability and sample size are the two factors that determine prediction accuracy of a polygenic trait, when using a single-trait predictor. The increase in prediction accuracy of a multi-trait over a standard single-trait genetic predictor is therefore greatest when the additional traits included in the predictor have higher SNP heritability and sample size than the trait to be predicted, as well as a high genetic correlation with the trait to be predicted. We show how genetic predictors from GWAS OLS effects, LDPred effects or SBLUP effects can be combined, yielding an approach that is general across different phenotypes. Special consideration should be given to the risk of sample overlap between the summary statistics data used to create the predictor and the prediction target. Sample overlap will lead to inflated estimates of accuracy, and while here we were able to take steps to avoid individuals being recorded across multiple data sets, further work is required to negate these effects within this framework. In principle, assuming perfect homogeneity between training and test set and perfect estimates of SNP heritability and genetic correlation, there is no limit to the number of traits that can be combined using our approach. In practice, however, there will be little benefit of combining traits with low genetic correlation, as they will not influence the predictor much. Some added traits might even reduce accuracy, if the genetic correlation is not estimated accurately. The focus of our analysis was the prediction of genetic risk and we aimed to provide a fast, computationally efficient, general framework for genomic prediction. This sets it apart from other multi-trait approaches like phenome-wide association studies, which focus on the effects of individual SNPs on multiple phenotypes. We note, however, that a multitrait testing approach can in principle also be used to increase the power to identify loci associated with specific traits as demonstrated in the recently developed MTAG method 27 . Another potential caveat of our analysis is that prediction accuracy increases for a focal trait may come from the addition of traits that are standardly measured on patients, and improved frameworks are required to identify marker effects conditionally on known health risk factors. Despite these limitations, current evidence suggests that genetic correlations among phenotypes are pervasive 23 , sample sizes of GWAS are increasing 6 and public availability of genome-wide summary statistics is becoming the norm 28 , meaning that genomic prediction of complex common disease will continually improve especially when predictors of multiple phenotypes are integrated across studies within this framework.

Methods
General model. We consider a general linear mixed model: where y is the phenotype, W a matrix of SNP genotypes, where values are standardised to give the ij th element as: , with x ij the number of minor alleles (0, 1 or 2) for the i th individual at the j th SNP and p j the minor allele frequency. b are the genetic effects for each SNP, and the residual error. The dimensions of y, W, b and are dependent upon the number of phenotypes, k, the number of SNP markers, M, and the number of individuals, N, and are described in the sections below. We denote the distributional properties var(b) = B, var() = R and var(y) = WBW′ + R.
For human complex diseases and quantitative phenotypes, GWASs have typically estimated the solutions for b of Eq. (1) one SNP at a time using OLS regression 29 as: where diag[W′W] has diagonal elements w j ′w j and off-diagonal elements of zero. However, by analysing one SNP at a time, GWAS effect size estimates do not account for the covariance structure among SNPs and they are not unbiased in the sense that E bjb h i ¼b 12 . BLUP of the SNP effects have the property E bjb h i ¼b, are used in genomic prediction in animal and plant breeding 30 and more recently in human medical genetics, yielding improved prediction accuracy for a number of traits over genetic predictors created from OLS SNP estimates 16,17 . In a general form, BLUP solutions for b of Eq. (1) can be written using Henderson's mixed model equations 31 as: and if R is diagonal, then Eq. (6) can be reduced to: Below, we describe how Eqs. (6) and (7) can be used to estimate BLUP SNP effects for a single trait and for multiple traits jointly from individual-level phenotype-genotype data. We then show how Eqs. (6) and (7) can be approximated to obtain BLUP SNP effects for single and multiple traits in the absence of individual-level data from publically available GWAS summary statistics and an independent reference sample. , giving R ¼ I N σ 2 ϵk , with I N an identity matrix of dimension N. Substituting these expressions into Eq. (6) means that Eq. (7) can then be written as: Joint estimation of BLUP SNP effects for multiple traits. Phenotypic measurements of a trait can be informative for the genetic values of other traits, if the traits are genetically correlated with one another 14,15,32 . Recent studies have shown that prediction accuracy of common complex disease can be improved by estimating SNP effects for multiple traits jointly within a multivariate mixed-effects model 16,17 .
If k traits are measured on different individuals, with N k observations for trait k, the elements of Eq. (4) become: y ′ ¼ ½y′ 1 :::y ′ k ; W ¼ where Σ b is a k × k matrix, with diagonal elements σ 2 bk and off-diagonal elements the covariances of SNP effects between traits k and l, σ bk;l . For Kronecker products, B À1 ¼ Σ À1 b I M and substituting these expressions directly into Eq. (6) means that multi-trait BLUP solutions for b can be obtained in Eq. (7) as: For a two-trait example, Eq. (9) Multi-trait BLUP SNP effects from summary statistics. Estimating SNP effects for multiple traits jointly in Eq. (9) requires individual-level genotype and phenotype data across a range of common complex diseases and quantitative phenotypes, which are not readily available in human medical genetics due to privacy concerns and data sharing restrictions. Additionally, Eq. (9) requires a series of computationally intensive M × k equations to be solved. However, these issues can be overcome by approximating Eq. (9) using publically available GWAS summary statistic data and an independent genomic reference sample. Single-trait approximate BLUP SNP effects can be obtained from GWAS summary statistics (SBLUP: summary statistic approximate BLUP) by replacing W k ′W k and W k ′y k of Eq. (8) by their expectation, which are . Equation (8) can then be written as: The shrinkage parameter is λ k ¼ σ 2 ϵk =σ 2 bk = Mσ 2 ϵk =h 2 SNPk = M 1 À h 2 SNPk =h 2 SNPk , under the assumption of phenotypic variance of 1 that makes the proportion of phenotypic variance of trait k attributable to the SNPs h 2 SNPk ¼ Mσ 2 bk . This approach was implemented in ref. 21 and is similar to the LDpred model presented by Vilhjálmsson et al. 18 but with a few differences. The first is that it only considers the infinitesimal case, where all SNPs are considered to be causal and their effect sizes follow a normal distribution. This corresponds to the LDpred-Inf model. The second difference is that the shrinkage parameter of Vilhjálmsson et al. 18 is λ k ¼ M=h 2 SNPk as they assume that the error variance is 1 rather than 1 À h 2 SNPk in our implementation. The third difference is that in the LDpred-Inf model, Vilhjálmsson et al. 18 calculate BLUP effects for blocks of a certain number of SNPs following a tiling window approach giving a block diagonal structure to L, whereas our implementation within the software GCTA (see URLs) follows a sliding window approach giving a banded diagonal to L. Assuming an error variance of 1 À h 2 SNPk is more appropriate because cumulatively the SNP markers explain h 2 SNPk of the phenotypic variance. In both implementations, a window is used to capture the LD around SNP markers in order to avoid the large computational costs of inverting a dense M dimensional SNP LD matrix, with only little loss of information (see below).
For multiple phenotypes, the elements of Eq. (11) become: , meaning that Eq. (11) can be extended as: Equation (12) approximates Eq. (9) using only publically available GWAS summary statistic data and an independent genomic reference sample. However, there remains the large computational cost associated with the inversion of the Index weighted multi-trait BLUP SNP effects. An alternative to Eq. (12), is to obtain k b b MTÀSBLUP effects by combining together k single-trait b b SBLUP estimates of Eq. (11), using an optimal index weighting for each trait. The index weighting to For SNP j and focal trait f, we have b b SBLUP values for k traits, and we wish to obtain the index weights, w j,k , for each b b SBLUPj;k effect as: In animal and plant breeding, selection indices have been developed, which combine many single-trait BLUP predictors of an individual's genetic value together in an index weighting to optimise the selection of individuals with the most favourable multi-trait phenotype for breeding programs [33][34][35][36] . Utilising a selection index approach, the solution for w SBLUP of Eq. (13) can be obtained as: where C SBLUP a k × 1 column vector of the covariance of the b b SBLUPk values of the k traits, with the true genetic effects of the SNPs for the focal trait, and V SBLUP a k × k variance-covariance matrix of the b b SBLUP effects: Therefore, if V SBLUP and C SBLUP can be approximated then b b MTÀSBLUP of Eq. (12) can be obtained from k single-trait b b SBLUP estimates from Eq. (11).
To derive the approximations, we first consider the diagonal elements of V SBLUP , which comprise the variance of the SBLUP SNP solutions, var b b SBLUPk . These can be approximated from theory under the assumption that b b SBLUPk have BLUP properties E bjb h i ¼b, which in turn implies that Following Daetwyler et al. 37 and Wray et al. 38 , the squared correlation between a phenotype, y k , in an independent sample and a single-trait BLUP predictor of the phenotype, b g BLUPk , is approximately: attributable to additive genetic effects for trait k. Note that M eff is the effective number of chromosome segments or the number of independent SNPs, which is a function of effective population size (N e ) and can be empirically obtained as an inverse of the variance of genomic relationships 39,40 . Here we use an estimate of M eff of 60,000, which is in line both with our estimates from the genomic relationships in our simulation data and with previously reported estimates 41 . In Eq. (16), R 2 k occurs on both the left-and righ-hand side. Solving for R 2 k gives R 2 k ¼ With a phenotypic variance of 1 and individual-level genetic effects g k = Wb k , then h 2 k ¼ σ 2 gk ¼ Mσ 2 bk and the squared correlation between the true, g k , and estimated BLUP effects, b g BLUPk , is: Rearranging Eq. (17) gives Therefore: Second, we consider the off-diagonal elements of V SBLUP , which are comprised of the covariance of BLUP SNP solutions among the k traits. These can again be approximated from theory given the covariance of genetic effects among traits k and l is cov(b k , b l ) = r G h k h l /M, with r G the genetic correlation, and given the squared correlation between the true genetic effects of the SNPs, b k , and b b BLUPk which is given by Eq. (17) The covariance of BLUP SNP predictors is then: Finally, we can consider the column vector C SBLUP , which is composed of the covariance between the true genetic effects of the SNPs for the focal trait, b f , and b b SBLUPk for all of the k traits. The first element of C SBLUP is covariance between the true genetic effects of the SNPs for the focal trait b f and b b SBLUPf for the focal trait , which can be approximated from theory by considering a The covariance of b f and b b BLUPk can then be written as: If we consider a two-trait example where the focal trait that we want to predict is matched to the first of the two traits, Eqs. (18), (19) and (20) combine as: giving the index for the focal trait as: with solutions for the index weights of: For traits with low power, R 2 k is usually very small. In that case, V SBLUP can be well approximated by a diagonal matrix with entries R 2 k M . w f will become 1 and w k for all other traits will be r Gf ;k hf hk . It may appear surprising that traits with higher SNP heritability have smaller weights than traits with lower SNP heritability. This can be explained by the fact that the variance of each BLUP predictor R 2 k À Á is approximately proportional to h 4 k N if M eff is large, and thus a trait with higher SNP heritability will still have a larger contribution to the multi-trait predictor than a trait with lower SNP heritability.
Equation (17) k and thus the index weights of Eq. (15) can be applied equally to BLUP solutions for the SNP effects or BLUP predictors for individuals of each trait as described in the main text in Eq. (1) through (3). Both r Gk;l and h 2 k of Eq. (15) can be obtained from summary statistic data using LD score regression 22 and therefore b b MTÀBLUP effects of Eq. (10), which would traditionally require individual-level phenotype-genotype data for all traits, can be approximated accurately in a computationally efficient manner using only publically available GWAS summary statistic data and an independent genomic reference sample.
Index weighted multi-trait OLS SNP effects. In the previous section, we have shown how b b SBLUP estimates for multiple traits can be combined to yield more accurate b b wMTÀSBLUP SNP effects, which can be turned into b g wMTÀSBLUP individual predictors that approach b g MTÀBLUP accuracy. However, using a similar weighting we can also combine b b OLS estimates for multiple traits into b b wMTÀOLS .
For SNP j and focal trait f, we have b b OLS values for k traits, and we wish to obtain the index weights, w j,k , for each b b OLSj;k effect as: Just like before, the optimal weights can be derived as:w OLS ¼ V À1 OLS C OLS , where C OLS is now a k × 1 column vector of the covariances of the b b OLSk values of the k traits with the true genetic effects of the SNPs for the focal trait, and V OLS is a k × k variance-covariance matrix of the b b OLS effects: . .
The diagonal elements of V OLS are: The off-diagonal elements for trait k and l are C OLS now has elements If we again consider a two-trait example, Eqs. (25), (26) and (27) combine as: These weights are considerably different from the BLUP weights, which reflects the different variances of BLUP effects and OLS effects. Here we include this section for completeness but focus our analyses on multi-trait BLUP effects, because they are more accurate in expectation than multi-trait OLS effects.
Index weighted multi-trait SNP effects using LDPred. For phenotypes with a genetic architecture characterised by a few loci of very large effect sizes, this approach may not be ideal. Models that assume a mixture distribution for SNP effects, such as LDpred or BayesR, can yield higher prediction accuracies in traits of non-infinitesimal genetic architecture 18,42 . As outlined above, Eq. (17) implies k and thus the index weights of Eq. (15) can be applied equally to BLUP solutions for the SNP effects or BLUP predictors for individuals of each trait as described in the main text in Eq. (1) through (3). LDpred aims to estimate the posterior mean phenotype that provides best unbiased prediction. Therefore, single-trait individual-level predictors obtained from LDPred can also be weighted together to create an approximate multi-trait predictor.
Prediction accuracy of weighted multi-trait BLUP predictors. The prediction accuracy of b b wMTÀBLUP effects obtained from Eq. (15) can be derived by considering the correlation of b f and b b wMTÀBLUPk as: Equation (13) gives b b wMTÀBLUPf ¼ w′ b b BLUP and thus the covariance of b f and b b wMTÀBLUPf is: The variance of the b b wMTÀBLUP effects obtained from Eq. (15) is: Additionally, w = V −1 C and Vw = C, and thus w′C = w′Vw or written another way into Eq. (19), the correlation of b f and b b wMTÀBLUPk can then be written as: which gives the squared correlation as Therefore, the squared correlation between a phenotype and a multiple trait index weighted BLUP predictor of the phenotype is approximately: If we consider a two-trait example then prediction accuracy for a focal trait R 2 can be written as: where V 1,2 is the off-diagonal element of the matrix V of Eqs. (15) and (21). The value of R 2 can then be compared to the prediction accuracy of the singletrait BLUP predictor of Eq. (16) and to the prediction accuracy of a cross-trait predictor 43 , where a BLUP predictor of the second trait is used to predict the focal trait phenotype, which is given by: R 2 . This comparison is of interest, because we expect the multi-trait predictor to be more accurate than any available single-trait predictor, even if the most accurate singletrait predictor is across two different traits. Cross-trait prediction is equivalent to the proxy-phenotype method, which has been used to predict cognitive performance from educational attainment GWAS data 44 .
Loss of prediction accuracy from BLUP approximation. Equations (16), (17), other words that SBLUP SNP solutions have BLUP properties. The use of an independent LD reference sample to create an approximate single-trait BLUP predictor in Eq. (11) does not affect the covariance between the true SNP effect sizes and the approximate BLUP SNP solution, meaning that the approximate single-trait BLUP predictors have BLUP properties. However, the variance of b b SBLUP is likely affected, which may potentially result in a loss of prediction accuracy of a weighted multi-trait BLUP predictor. The variance of b b SBLUP is: The loss of information from using an independent data set as an LD reference to obtain L, rather than directly using the individual-level data to calculate W′W, can be approximated by considering the scenario where SNP makers are unlinked, resulting in diag [L]. The diagonal elements of σ 2 b bSBLUP jj for SNP j are then: The diagonal elements of diag[(W′W)(W′W)] can be approximated as where the expectation of the LD correlation of the SNPs, E r 2 ½ , is 1/N as the SNP markers are unlinked. Equation (36) can then be written as: From Eq. (37), the squared correlation between true SNP effects and SBLUP SNP effects can be written as: This can be contrasted to Eq. (17), which gives the squared correlation between the true genetic effects of the SNPs, b k , and b b BLUPk as: Equation (39) is similar to Eq. (38) apart from the factor 1 À R 2 k . Therefore, the relative loss of prediction accuracy from using an SBLUP predictor is given as a ratio of Eqs. (39) and (38) as: For a phenotype of SNP heritability 0.5, with effective number of independent markers (independent genomic segments), M eff , of~60,000 and sample size, N, of 500,000, R 2 b; b bSBLUP from summary statistics in an independent reference sample will be 91% of the value of R 2 bk; b bBLUP k if individual-level data were available. Likewise, for a two-trait example where both traits have h 2 = 0.5 and N = 500,000, the accuracy of the multi-trait SBLUP predictor will also be 91% of the accuracy of the multi-trait BLUP predictor. It should be noted that here we assume L to be a a diagonal matrix, which will lead to a conservative estimate of the accuracy of SBLUP relative to the accuracy of BLUP, and that this estimate is in fact equivalent to the expected accuracy of a polygenic risk predictor based on marginal OLS effects 28 . In practice, approximating L through an external reference data set leads to SBLUP predictors, which are more accurate than predictors based on marginal OLS effects but less accurate than predictors based on BLUP effects.
Computation time. Computing weights and combing up to 930,000 SNP effects of 34 traits takes <10 min on an Intel Xeon E7-8837 processor with 2.76 GHz. Memory requirements do not extend much beyond the amount necessary to read in the summary statistics. Calculation of single-trait SBLUP effects is more computationally demanding, so we split the data by chromosome. Runtime for chromosome one was <40 min, with memory usage just under 10 GB.
Simulation study. To compare the accuracy of single-trait and multi-trait genetic predictors created from SNP effects obtained from both individual-level and summary statistic data, we conducted a simulation study based on real genotypes from the Kaiser Permanente study (Genetic Epidemiology Research on Adult Health and Aging: GERA cohort) and simulated phenotypes.
From the GERA cohort, we selected 50,000 individuals of European ancestry (for definitions of European individuals and quality control (QC) of the genotypic data, see ref. 45 ). SNP QC procedures on the initial data sets consisted of per-SNP missing data rate of <0.01, minor allele frequency >0.01, per-person missing data rate <0.01 and Hardy-Weinberg disequilibrium p-value < 1 × 10 −6 . For the subsequent imputation, the data were first haplotyped using HAPI-UR. After that, Impute2 was used to impute the haplotypes to the 1000 genomes reference panel (release 1, version 3). Best-guess genotypes at common SNPs included in the HapMap 3 European sample were then extracted and filtered for imputation info score >0.5, missing data rate of <0.01, minor allele frequency >0.01, per-person missing data rate <0.01 and Hardy-Weinberg disequilibrium p-value < 1 × 10 −6 . Next, we performed principal component analysis and removed individuals with principal eigenvector values that were >7 SD from the mean of the European cluster. Lastly, we identified pairs of individuals with genetic relatedness matrix >0.05 and removed one individual from each of these pairs. The Atherosclerosis Risk in Communities study (ARIC data) was used as an independent LD reference when estimating SBLUP SNP effects of Eq. (11). Eight thousand seven hundred and forty-four European individuals were selected and the data were imputed and QC conducted in the same way as described above for the GERA cohort. We then reduced the SNPs used in both the GERA and ARIC cohorts to overlapping HapMap3 SNPs, which gave 557,034 SNPs that were used in the simulation study.
We then randomly assigned 20,000, 20,000 and 10,000 individuals from the GERA cohort to create three data sets: training set one, training set two, and a testing set. We simulated two genetically correlated traits by randomly selecting 2000 causal SNPs. Effect sizes for the causal markers were simulated from a bivariate normal distribution with mean 0, variances of p . These effect sizes were then multiplied with the standardised genotype dosages (mean 0 and variance 1) to create a genetic value for each individual. Normally distributed environmental effects e~N(0, 1 − h 2 ) were added to this genetic value for each individual to create phenotypes with mean 0 and variance of 1. To remove any effects of population stratification, the simulated phenotypes were then regressed against the first 20 genetic principal components, and the residuals from this regression were used in all subsequent analyses.
In training set 1, we simulated trait 1 and we then estimated: (i) OLS SNP effects using Eq. In the testing set, we then used the estimated SNP effects from the training sets to produce genetic predictors for both traits. Single-trait genetic predictors were created for both simulated traits from (i) the OLS SNP effects b g OLS À Á , (ii) the BLUP SNP effects b g BLUP À Á and (iii) the SBLUP SNP effects b g SBLUP À Á . We then created multi-trait predictors where trait 1 was the focal trait from: (i) individual-level multi-trait BLUP predictor b g MTÀBLUP À Á , (ii) weighted multi-trait SBLUP predictor b g wMTÀSBLUP À Á , (iii) a weighted multi-trait BLUP predictor-based individual-level single-trait BLUP estimates b g wMTÀBLUP À Á , and (iv) a weighted multi-trait GWAS predictor based on GWAS OLS estimates b g wMTÀOLS À Á . We simulated phenotypic values for both traits using the same effect sizes as those used to generate the phenotypes in the training sets and normally distributed environmental effects sampled independently for each trait as e~N(0, 1 − h 2 ). We also compared estimates obtained using LDPred with the proportion of SNPs in the model of 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1 or using the LDPred-Inf option.
We created two simulation scenarios. Heritability of the first and second trait and genetic correlations were h 2 1 = 0.2, h 2 2 = 0.8 and r G = 0.8, respectively, in the first scenario and were h 2 1 = 0.5, h 2 2 = 0.5, and r G = 0.5, respectively, in the second scenario. In each set-up, six replicates were conducted, each with a different set of randomly selected causal markers. We then repeated all analyses on a permuted data set, where the values of the genotype matrix were permuted across all individuals, for each SNP. This creates a genotype matrix where the allele frequency distribution remains the same, but all LD structure is removed, allowing us to determine the degree to which differences between the simulations results are driven by the LD structure in the real genotype data. Finally, because prediction accuracy is expected to be reduced by the error introduced by using an external LD reference data set and a restricted LD window when implementing Eq. (8) (see above), we examined how changing the LD reference and restricting the LD window size influences to optimal value of shrinkage parameter λ when implementing Eq. (8) (see Supplementary Fig. 3).
Application to PGC schizophrenia and bipolar disorder. We then applied our approach to the schizophrenia (SCZ) and bipolar disorder (BIP) samples from both wave 1 and wave 2 data of the PGC (PGC1 and PGC2). A description of the data collection and imputation of the SNP genotype data can be found elsewhere 25,26,46 .
We selected these two disorders because there is a high genetic correlation between them (estimate for r G between schizophrenia and bipolar disorder using ldsc: 0.72, SE: 0.03; estimated using meta-analysis of all PGC2 schizophrenia and bipolar cohorts, excluding cohorts which were used as test set in the initial PGC1 analysis), and it enabled us to draw a direct comparison between the approach described here and a previous study, which estimated multi-trait BLUP SNP effects b b MTÀBLUP from individual-level data in an approach equivalent to Eq. (9). , and (iii) weighted multi-trait GWAS predictor based on GWAS OLS estimates b g wMTÀOLS À Á . We then compared the prediction accuracy we obtained using the weighted multitrait SBLUP predictors to the individual-level multi-trait BLUP predictor b g MTÀBLUP À Á used in the previous study 16 . We then extended our analysis to the PGC2 data set. There were 36 cohorts for schizophrenia (26,412 cases and 32,440 controls in total) and 23 cohorts for bipolar disorder (18,865 cases and 30,460 controls in total) available to us. The number of SNPs used in the PGC2 analyses varied between cohorts. Summary statistics for each of the PGC2 cohorts was available to an imputed SNP set of >10,000,000 SNPs. After intersecting this set of SNPs with the HapMap3 SNPs and the ARIC SNPs, 932,344 SNPs remained that were used to create predictors.
We applied a cross-validation approach as we observed that prediction accuracy as well as accuracy differences between predictors can be highly dependent on the choice of the test set in the extended PGC2 data set ( Supplementary Figs. 5 and 6), which is supported by previous results showing highly variable prediction accuracy across cohorts in the PGC2 data set 25 . A cross-validation approach allowed us to get a more robust estimate of the increase of prediction accuracy achieved by our multi-trait prediction method compared to a single-trait predictor. We employed a leave-one-out cross-validation approach, where, for each test set cohort, all cohorts of the same disease without any highly related individuals were chosen to be in the training set for the single-trait predictor and all cohorts of both diseases without any highly related individuals were chosen to be in the training set for the multitrait predictor. To identify pairs of cohorts with highly related individuals, genetic relatedness for all pairs of individuals (across all pairs of cohorts) was calculated based on chromosome 22, and whenever at least one pair of individuals had relatedness >0.8, that pair of cohorts was not simultaneously used in the training set and the test set.
The full genotypes from the PGC2 cohorts that were used as test sets underwent stringent QC and only comprised 458,744-860,576 SNPs for schizophrenia and 556,278-859,034 SNPs for bipolar disorder. We refrained from using the intersection between all these cohorts to not reduce the number of SNPs used in prediction by too much. This meant that different iterations in the crossvalidations were based on predictions using a different number of SNPs. However, each comparison between a single-trait predictor and a multi-trait predictor is based on the same number of SNPs.
In each iteration of the cross-validation, a different cohort acts as the test set and a different set of cohorts comprises the training set. To create a predictor from a particular set of cohorts, we first had to obtain effect size estimates from this particular set of cohorts. This is achieved by performing a meta-analysis of the summary statistics of the cohorts that comprise the training set. The meta-analysed beta values b META are calculated as: where b s is the effect size in cohort s and SE s is the standard error in cohort s. Conversion between beta values and odds ratios (OR) simply follows the equality b = log(OR). The weights derived for each trait make assumptions about the variance of SNP effects. We found that, in the summary statistics we used, the observed variance across SNP effects often departed from the expected value. To correct for that, we scaled the SNP effect estimates for each trait to have a variance of one and multiplied the weights for the unscaled SNP effects by the expected standard deviation across all SNPs.
We created approximate SBLUP effects b b SBLUP using the OLS SNP effects from Eq. (5) and the ARIC data as an LD reference using Eq. (11) and set the shrinkage parameter, λ, to 1,300,000 for schizophrenia and to 2,000,000 for bipolar disorder, corresponding to observed scale SNP heritability estimates of 0.43 and 0.33 for schizophrenia and bipolar disorder, respectively. We then used the PLINK "--score" function to turn SNP effects for each meta-analysed schizophrenia or bipolar disorder crossvalidation set. For the multi-trait weighting, we estimated the heritability of schizophrenia and bipolar disorder and their genetic correlation using LD score regression from publicly available PGC2 schizophrenia summary statistics and the PGC1 bipolar disorder summary statistics. These estimates were then used to calculate the index weights of Eq. (15) for the weighted multi-trait SBLUP predictors b g wMTÀSBLUP ; b g wMTÀGWAS À Á of SCZ and BIP, and these were not altered between different cross-validation sets.
To test the degree to which the choice of weights affects the accuracy of the multi-trait predictor, we compared the accuracy of multi-trait predictors based on a spectrum of other weights ( Supplementary Figs. 5 and 6). For this, we took advantage of two things: First, when individual predictors b g SBLUP ; b g GWAS À Á are weighted rather than SNP effects b b SBLUP ; b b GWAS , the conversion from SNP effects to individual effects does not have to be repeated for different weights. Second, the scaling of a predictor does not influence its accuracy in terms of correlation between prediction and outcome. Therefore, rather than testing each combination of weights of schizophrenia and bipolar disorder, it is sufficient to vary the relative weight of schizophrenia to bipolar disorder to explore the whole range of possible multi-trait predictors for these two traits. For each test cohort, this enabled us to test whether the weights of our multi-trait predictor derived from theory deviate from the weights that would result in the highest prediction accuracy for that data set.
Application to phenotypes in the UK Biobank study. We applied our approach to a large range of phenotypes for which GWAS summary statistics are publicly available. We started with GWAS summary statistics for 46 phenotypes. However, in some circumstance the same studies (i.e., based on the same individuals) had generated summary statistics for multiple similar phenotypes, so we chose only one phenotype per study, which left us with 34 phenotypes. For example, out of "Cigarettes per day" and "Smoking Ever" we only selected the latter to have only one trait for smoking. We used 112,338 unrelated individuals of European descent in the UK biobank data as the testing set. We paired 6 phenotypes out of the 34 summary statistic phenotypes to phenotypes in the UK Biobank: Height, BMI, fluid intelligence score, depression, angina, and diabetes. The first three are quantitative traits and the latter three are disease traits for which we could identify at least 1000 cases in the UK Biobank data. For details, see Supplementary Table 1.
For the disease traits, we used the self-reported diagnoses rather than ICD10 diagnoses, as they tend to have larger sample sizes. For depression, we used a more refined definition of cases and controls, where individuals were not counted as cases if they had any history of psychiatric symptoms or diagnoses other than depression or if they were prescribed drugs that are indicative of such diagnoses. Individuals were selected as controls only when there was an absence of any psychiatric symptoms or diagnoses and only when they were not prescribed any drugs that could be indicative of such diagnoses. All 6 traits in the UK Biobank were corrected for age, sex and the first 10 principal components by regressing the phenotype on these covariates and using the residuals from that regression for further analysis. For each trait, the SNPs that went into the analysis were based on the overlap between the GWAS summary statistics, the HapMap3 SNPs, the GERA data set, which was used as an LD reference in the SBLUP analysis, and the imputed SNPs from the UK Biobank. (For details on the QC process and imputation, see URLs.) Depending on the trait, the total number of SNPs ranged from around 660,000 to around 930,000.
We created single-trait b g SBLUP À Á as well as multi-trait b g wMTÀSBLUP À Á predictors for the six paired phenotypes. To create SBLUP SNP effects b b SBLUP from summary statistic trait, we used a λ value of M 1 À h 2 SNPk =h 2 SNPk for each trait k, where M is assumed to be 1,000,000. As LD reference set, we used a random subset of 10,000 people of European descent from the GERA data set, and we set the LD window size to 2000 kb. We then used the PLINK "-score" function to turn SNP effects b b SBLUP into individual predictors b g SBLUP À Á for each trait. For the multitrait weighting, we used LD score regression to calculate SNP heritability and genetic correlation between all pairs of cohorts. For dichotomous disease traits, SNP heritability was calculated on the observed scale. For each phenotype for which a multi-trait predictor was created, we selected all phenotypes that had a genetic correlation estimate significantly different from 0 at p = 0.05 with the focal trait, as well as the focal trait itself. The summary statistics based single-trait SBLUP predictors of the selected phenotypes were then combined into multi-trait SBLUP b g wMTÀSBLUP À Á predictors. The weights for each phenotype were calculated according to Eq. (15). These weights require the single-trait predictors to have exactly the right variance. Since the summary statistics data slightly diverged from this expectation, we scaled each single-trait SBLUP predictor to have mean 0 and variance 1 and then multiplied it with its expected standard deviation, to ensure everything is on exactly the correct scale. We followed the same approach when using single-trait LDPred predictors.
We compared the performance of the multi-trait predictors b g wMTÀSBLUP À Á not only to the performance of the single-trait predictor b g SBLUP À Á for the same trait but also to the performance of all other (cross-trait) single-trait predictors for the traits that exhibited significant r G with the focal trait (Fig. 4). This is appropriate because in some traits the single-trait predictor from the same trait is not the most accurate single-trait predictor.