Widespread signatures of natural selection across human complex traits and functional genomic categories

Understanding how natural selection has shaped genetic architecture of complex traits is of importance in medical and evolutionary genetics. Bayesian methods have been developed using individual-level GWAS data to estimate multiple genetic architecture parameters including selection signature. Here, we present a method (SBayesS) that only requires GWAS summary statistics. We analyse data for 155 complex traits (n = 27k–547k) and project the estimates onto those obtained from evolutionary simulations. We estimate that, on average across traits, about 1% of human genome sequence are mutational targets with a mean selection coefficient of ~0.001. Common diseases, on average, show a smaller number of mutational targets and have been under stronger selection, compared to other traits. SBayesS analyses incorporating functional annotations reveal that selection signatures vary across genomic regions, among which coding regions have the strongest selection signature and are enriched for both the number of associated variants and the magnitude of effect sizes.

T he joint distribution of SNP effect size and minor allele frequency (MAF) is an essential component of the genetic architecture of human complex traits and is influenced by natural selection 1 . A negative relationship between effect size and MAF is a signature of negative (or purifying) selection 2,3 , which prevents mutations with large deleterious effects becoming frequent in the population. Understanding how natural selection has shaped genetic variation helps researchers to improve experimental designs of genetic association studies 4 and the estimation of SNPbased heritability (the proportion of phenotypic variance explained by the SNPs) [5][6][7][8][9] . Inference on natural selection is also a critical step towards the understanding of the genetic architecture of complex traits. For instance, the theory of negative selection 10 explains why the effects of common variants identified by genome-wide association studies (GWAS) are unlikely to be large 11,12 .
We have recently developed a Bayesian method (BayesS) to estimate the effect size-MAF relationship, which was considered as a free parameter (S) in the model 13 . We detected negativeŜ for a number of complex traits in humans, highlighting an important role of negative selection in shaping the genetic architecture, consistent with the findings from other studies based on genomewide variance estimation approaches 7,11,14,15 . The BayesS model also allows us to estimate the SNP-based heritability and polygenicity (the proportion of SNPs with nonzero effects) to better describe the genetic architecture for a trait. The application of BayesS has been restricted to GWAS samples with individuallevel genotypes but for most common complex diseases, only summary-level data are publicly available. Moreover, despite the implementation of a parallel computing strategy 13 , it remains computationally challenging to run BayesS for biobank-scale data, as the computing resource required increases linearly with the number of individuals or SNPs.
In this study, we enhance the BayesS model such that the analysis only requires GWAS summary statistics and a sparse linkage disequilibrium (LD) correlation matrix from a reference sample. This method (referred to as Summary-data-based BayesS or SBayesS) opens an opportunity to disentangle the genetic architecture of complex traits (including diseases) using publicly available data sets of the largest sample sizes to date, with merely a small fraction of the computational resource required for BayesS. We perform extensive analyses to benchmark between SBayesS and BayesS, and apply the SBayesS methods to GWAS summary statistics from the full release of the UK Biobank 16 (UKB) data and other published studies [17][18][19][20][21][22][23][24][25] , followed by time-forward simulations 26 for evolutionary inference and SBayesS analyses that incorporated functional genomic annotation data. We detect widespread signatures of negative selection in the genetic architecture across 155 complex traits with a predicted mean selection coefficient of~0.001 and a predicted mean proportion of human genome sequence being mutational targets of~1%, among which common diseases show a relatively higher mean selection coefficient and a relatively smaller number of mutational targets. Metaanalysis across traits reveals differential signatures of negative selection across functional genomic regions, among which coding regions have the strongest selection signature and are enriched for both trait-associated variants and those with large effect sizes.

Results
Method overview. BayesS is a method that can estimate three key parameters to describe the genetic architecture of complex traits by a Bayesian mixed linear model 13 , namely SNP-based heritability (h 2 SNP ), polygenicity (π) and the relationship between MAF and effect size (S), all of which are defined with respect to a certain set of SNPs (see the definition of h 2 SNP as an example 5 ). BayesS requires individual-level data. SBayesS is an extension of BayesS, but only requires GWAS summary statistics of the SNPs and LD information from a reference sample (see the "Methods" section, Supplementary Note and Supplementary Fig. 1). We compute pairwise LD correlations between SNPs located on the same chromosome from a reference sample and remove correlations that can be attributed to sampling variation by a chisquared test, resulting in a sparse LD matrix (see the "Methods" section). In addition, we model analytically the sampling variance of LD estimates as part of the residual variance and allow the estimate of residual variance to vary across SNPs (Supplementary Note). Compared to BayesS, SBayesS not only addresses the barrier of data sharing as it does not require individual-level data, but also substantially increases the computational efficiency because of the use of sparse LD matrix and a different updating strategy in the MCMC sampling (Supplementary Note). These features allow SBayesS to be scalable to data with millions of SNPs regardless of the discovery GWAS sample sizes. To examine the convergence of MCMC, we provide a GCTB-SBayesS implementation of the Gelman-Rubin statistic 27 which compares the variation between and within multiple chains with different starting values of the model parameters (see the "Methods" section). Convergence is only concluded if all the three key parameters converge, which may not occur if the LD matrix from a reference sample is too divergent from that of the GWAS sample, or if the summary statistics are generated from a GWAS with low power or contain uncorrected population stratification, poor imputation or other errors such as misreported per-SNP sample size and allele frequency.
In light of recent studies 11,28 , which point out a possible lack of fit of a point-normal mixture model to some traits, we further extended SBayesS to a multi-component mixture model (referred to as SBayesRS), following the framework of SBayesR 29 . In SBayesRS, each SNP effect is assumed to have a mixture of a point mass at zero and three normal distributions with mean zero and variances that differ by a factor of 10 (see the "Methods" section). This flexible prior accounts for a more complex genetic architecture with a spectrum of very small to very large effect sizes. The S parameter and overall polygenicity are estimated based on the SNPs across all nonnull mixture components.
To better understand the variability of regional genetic architecture in different parts of the genome, we incorporate functional genomic annotations into SBayesS to allow the three key parameters to vary in different annotation categories, e.g., coding, regulatory and repressed regions. We performed the functional partitioning SBayesS analysis (denoted SBayesS-strat) based on a two-component model that fitted SNPs in one annotation as the first component and the rest of the SNPs as the second component (see the "Methods" section). During MCMC sampling, the enrichment of a parameter in an annotation category is computed as the ratio of the sampled value of the parameter in the category to that for the whole genome (see the "Methods" section).
Benchmarking SBayesS with BayesS. We ran both SBayesS and BayesS with~1.1 million HapMap3 SNPs with MAF ≥ 0.01 for 18 quantitative traits (n > 100k) as analysed in Zeng et al. 13 . We used the HapMap3 SNPs as they were optimised to tag common genetic variants 30 and are widely used in the literature which improves the comparability of our results with those generated using published GWAS summary statistics. Hence, the reported parameters are specific to this SNP set. For ease of computation, we used unrelated individuals of European ancestry from the interim release of the UKB data for the BayesS analysis (maximum n = 120k across traits) and the same data to generate GWAS summary statistics for the SBayesS analysis. We show in Fig. 1 that the correlation between the SBayesS and BayesS estimates for all of the three genetic architecture parameters was close to one across traits (Pearson correlation r = 0.998 for h 2 SNP , 0.985 for π and 0.965 for S).
We performed additional sensitivity analyses to investigate the impact of the sparsity of LD matrix, the SNP panel, the choice of reference sample and the reference sample size on the performance of SBayesS. We found that SBayesS was robust to different chi-squared thresholds used for LD filtering (Supplementary Fig. 2) and gave consistent results with BayesS regardless of whether using HapMap3 (Fig. 1) or UKB Axiom array panel ( Supplementary Fig. 3a). The analysis using HapMap3 SNPs tended to give slightly lowerĥ 2 SNP andπ but stronger signals of S for both SBayesS and BayesS ( Supplementary Fig. 3b, c), possibly due to the under-representation of low-frequency SNPs in HapMap3 panel in comparison with the UKB Axiom array panel ( Supplementary Fig. 4). There was negligible difference in parameter estimates when the LD reference sample size (n ref ) decreased from 50k to 20k but notable inflation inĥ 2 SNP andŜ when n ref further decreased to 4k ( Supplementary Fig. 5), suggesting that the LD reference sample size cannot be too small relative to the GWAS sample size. Given a constant reference sample size (n ref = 50k), we ran GWAS with sample sizes n gwas = 120k and 350k and found highly consistent parameter estimates between SBayesS and BayesS regardless of n gwas ( Supplementary  Fig. 6). As expected, the π estimate from either SBayesS or BayesS increased with larger n gwas because of the increased power to detect small effects, consistent with the observation from an independent prior study 28 . Since h 2 SNP and S were estimated based on the SNPs with nonzero effects, the estimators of these parameters were also to some extent sample size dependent (see Supplementary Note for more discussion). Furthermore, with both n ref = 50k and n gwas = 300k held constant, we benchmarked BayesS and SBayesS in a few different scenarios where the LD reference was a subset of the GWAS sample, an independent sample from the same or slightly different population, or a sample of different ancestry. The performance of SBayesS was almost independent of the overlap between the GWAS and LD reference samples as long as they are from the same population but started to deteriorate when the genetic discrepancy between GWAS and reference samples increased ( Supplementary Fig. 8). This observation demonstrates the importance of choosing a reference sample that is genetically as close to the GWAS sample as possible in the analysis of summary data 31 .
The parameter estimates were largely consistent between SBayesS and SBayesRS except for polygenicity, of which the estimate from SBayesRS was higher than that from SBayesS ( Supplementary Fig. 9a). This is because, on one hand, SBayesS has a relatively low power to detect SNPs with very small effect sizes due to its assumption of a single normal distribution; on the other hand, SBayesRS tends to overestimate the number of SNPs with very small effect sizes due to the insufficient power to distinguish very small effect sizes from zero, as suggested by simulation ( Supplementary Fig. 9c). Nevertheless, the number of SNPs with relatively large effects estimated from SBayesS was mostly consistent with that from SBayesRS ( Supplementary  Fig. 9b).
Finally, we tested the method in application to ascertained case-control data by simulation. The parameter estimates were nearly unbiased regardless of whether cases were oversampled, although the sampling variances of the estimates of polygenicity and S were relatively large in some simulation scenarios where the number of cases was relatively small ( Supplementary Fig. 10).
Analyses of GWAS summary data from the UKB and other published studies. We applied SBayesS to analyse the full release of the UKB data, including 26 complex traits and 9 common diseases (Supplementary Table 1). Although individual-level data are available in the UKB, application of the standard BayesS tõ 350k unrelated individuals with~1.1 million HapMap3 SNPs is computationally prohibitive. Prior to running SBayesS, we carried out standard quality control (QC) of the data (see the "Methods" section) and used linear regression to perform a GWAS analysis in unrelated individuals to generate summary statistics for each trait. We also applied SBayesS to data for 9 other complex common diseases from published GWAS of very large sample size where only summary statistics are available (Supplementary Table 2). In the analysis of the UKB data, we used the sparse LD matrix computed from a random sample of 50k unrelated individuals. For the analysis of data from published GWAS of which nearly all the samples are of European ancestry, the GERA 32 sample was used as the LD reference. To mitigate the problem due to inconsistent LD between the GWAS and reference samples, we excluded SNPs in the major histocompatibility complex  (MHC) region although the SBayesS results with and without the MHC region were very similar ( Supplementary Fig. 11). The SNP-based heritability estimates for the diseases were converted to those on the liability scale 33 . On average across the 44 complex traits (including diseases), 1.8% of the 1.1 million common HapMap3 SNPs explained 18% of the phenotypic variance ( Fig. 2 and Supplementary Tables 1, 2). The estimate of h 2 SNP for height was 0.545 (posterior standard error or p.s.e. = 0.003), consistent with those in previous studies using different approaches and data sets 6,14,[34][35][36] . The most polygenic traits (i.e., body fat percentage, educational attainment and schizophrenia) had about 5% (~55,000) SNPs with nonzero effects. The least polygenic traits were prostate cancer, age at menopause and male pattern baldness, which were affected by about 0.1-0.3% (1000-3000) common SNPs. The estimate of S was significantly negative (P < 0.001) in all the traits analysed (medianŜ = −0.578, SD = 0.096), suggesting a pervasive action of negative selection on the trait-associated variants. We also re-ran the analysis for the 9 public GWAS data sets with the UKB subsample as the LD reference and found that the results were highly consistent with those using LD from the GERA ( Supplementary Fig. 12).
We used the UKB classification code to classify the 44 traits into four categories related to disease, reproduction, physical measures, and cognition (Supplementary Table 3). The estimates of the genetic architecture parameters varied across traits and appeared to have distinct patterns in different categories (Fig. 2). Physical measures had the highest median SNP-based heritability (0.225), followed by reproductive traits (0.197). The median polygenicity estimate was the lowest for diseases (0.007) and reproductive traits (0.008) and the highest for cognitive traits (0.037). The estimates of polygenicity for psychiatric disorders such as schizophrenia (π= 0.046, p.s.e. = 0.003) and bipolar disorder (π = 0.034, p.s.e. = 0.009) were substantially higher than that for other types of disease and comparable to those for the cognitive traits, consistent with the high polygenicity for brainrelated traits reported in previous studies 11,28 . The absolute median value ofŜ was the highest for diseases, especially cardiovascular diseases, and the lowest for cognitive traits, with a relatively large variability inŜ for diseases. We observed similar results from SBayesRS but with higherπ ( Supplementary Fig. 13), in line with the observations from the benchmark analysis above.
To investigate the diversity of genetic architecture in more traits, we applied SBayesS to GWAS summary data from the Neale Lab (http://www.nealelab.is/uk-biobank) for 274 UKB traits, among which 130 passed the convergence test and 110 of these were not included in the analyses above (Supplementary Table 4). The traits that failed to converge tended to have much smaller sample size orĥ 2 SNP compared to the ones that converged (mean n = 231k andĥ 2 SNP ¼ 0:223 for the converged vs. mean n = 73k andĥ 2 SNP ¼ 0:044 for the non-converged), but we did not filter traits byĥ 2 SNP to avoid direct ascertainment bias. Figure  for the 44 traits analysed above, the distribution ofŜ appeared symmetric at about −0.6, with 79% of the estimates within the range of −0.7 to −0.5 (median = −0.576) across traits.

Prediction of evolutionary parameters for complex traits.
Although a negative estimate of S is a signature of negative selection, the numeric interpretation ofŜ is still not clear. For example, our results showed that most traits hadŜ at about −0.6; does it mean that negative selection acted on the associated variants with similar selection strength among traits? To answer this question, we performed forward simulations 26 to study the variational patterns of the genetic architecture parameters in a variety of evolutionary scenarios. We focused on three main evolutionary parameters: average selection coefficient ( s), proportion of mutational targets (π m , i.e., the proportion of DNA sequence at which mutations can affect the trait) and mutational heritability (h 2 m ¼ σ 2 m =σ 2 e with σ 2 m being the amount of additional additive genetic variance arising from new mutations in each generation and σ 2 e being the environmental variance σ 2 e ), and set a large range of values for each parameter (see the "Methods" section). The simulations were carried out based on a 100-Mb genome with a variable effective population size inferred from a demographic model 37 . The selection coefficients were sampled from either a normal distribution or a mixture distribution of many small and some very large values. In the last generation of selection, we used two pleiotropic models (the Simon et al. 12 and Eyre-Walker 3 model) to generate causal effects of the mutations on the focal trait (see the "Methods" section). Based on the LD correlation matrix computed from the unrelated individuals in the final generation of the simulated data, we directly simulated GWAS summary statistics with an equal sample size as in the UKB data analysis 11,38 (see the "Methods" section). The genetic architecture parameters h 2 SNP , π and S were estimated by SBayesS and SBayesRS using 36k SNPs randomly sampled from the common sequence variants, a comparable SNP density to that of the 1.1 million HapMap3 common SNPs used in the real trait analysis (1.1 × 10 6 × 1 × 10 8 /3 × 10 9 = 36,000).
Repeating the simulation with different values of s, π m and h 2 m produced a landscape of the genetic architecture under different scenarios (Fig. 4). Our results showed thatŜ ,π andĥ 2 SNP increased with the increasing levels of the corresponding evolutionary parameters s, π m and h 2 m , respectively. The results were generally consistent regardless of the use of SNPs (36k common SNPs or the actual common causal variants), estimation method (SBayesS or SBayesRS), simulation model (the Simons et al. or Eyre-Walker model), or the underlying distribution of selection coefficients (mixture or normal distribution) ( Fig. 4 and Supplementary Figs. [14][15][16][17]. In addition to the direct impact of the evolutionary parameters on the corresponding genetic architecture parameters, s had negative effects onĥ 2 SNP andπ because of causal variants with large effect sizes were purged by negative selection. Using a linear regression analysis of the true causal effects, we obtained the ordinary least-squares (OLS) estimate of S and used it as a proxy of the true value (see the "Methods" section). Regarding to the different distributions of selection coefficients, the true values of S were not highly correlated (Supplementary Fig. 18; r = 0.628), suggesting the distribution of selection coefficient plays a role in determining S. Compared to the true value of S, the SBayesS estimate based on the causal variant genotypes tended to be more negative when selection strength was weak and selection coefficients followed a mixture distribution (Fig. 4). This is because SBayesS assumes normality of the SNP effects whereas the true distribution is a multi-component mixture, supported by our results of maximum-likelihood estimation assuming normality (Supplementary Note and Supplementary Fig. 19). Compared to SBayesS, SBayesRS is more robust to the distribution of causal effects in the estimation of the S parameter ( Fig. 4 and Supplementary Fig. 17).
Next, we used a polynomial regression model to associate the evolutionary parameters ( s, π m and h 2 m ) withĥ 2 SNP ,π andŜ in the entire simulation dataset, and leveraged this association to predict the evolutionary parameters in real data (see the "Methods" section). We demonstrated by a cross-validation analysis in the simulated data that the three evolutionary parameters can be predicted with reasonably high accuracy ( Supplementary Fig. 20). We then applied this prediction model to the 44 traits analysed above. Overall, the real trait prediction results were robust to the evolutionary model used for training data simulation and the statistical method for genetic architecture parameter estimation (Fig. 5). Here, we focus on the results from the mixture model for selection coefficients, as similar results were observed from the model assuming normality (Supplementary Fig. 21). The predicted s were mostly between 10 −4 and 10 −3 with a mean of 0.0007 across traits (Fig. 5), in line with the estimates from prior work 12 . Cancer, stroke, and coronary artery disease showed the highest predicted s ( Supplementary Fig. 22), albeit the per-trait s had a wide confidence interval which covered a range of possible values by a factor of 10. The predicted π m was~1% on average across traits, meaning that~30 million base pairs of the human genome were mutational targets for a complex trait. The mean predicted h 2 m was 0.001 with relatively small variability across traits comparing to the other two parameters, which can be regarded as a justification of our projection approach because h 2 m is a rather conservative parameter with estimates of all~0.001 across traits even in different species 39 .
While the predicted h 2 m was similar across traits, the predicted s and π m were significantly different between some trait categories ( Fig. 5 and Supplementary Fig. 22). Common diseases had a mean s of 0.0010, which was significantly higher than that of 0.0005 for physical measures (median P value = 0.015 among the four groups of estimation methods and pleiotropic models). Compared to physical measures (mean π m = 0.011), the mean π m was significantly lower for disease (0.005, median P = 0.003) and Analyses incorporating functional genomic annotations. The functional annotation categories used in our analysis were from the LDSC baseline model 15 . We excluded continuous annotations and annotations with flanking windows, resulting in 21 annotation categories such as the coding, regulatory, repressed and conserved regions (Supplementary Table 5). We applied SBayesS-strat to the 35 UKB traits (including 9 diseases), and combined the parameter estimates across traits for each functional category (see the "Methods" section). Considering the extensive overlaps between annotation categories ( Supplementary Fig. 23), we ran SbayesS-strat analysis with a two-component model (SNPs in an annotation category versus the other SNPs) and computed the enrichment of each of the genetic architecture parameters using the SNPs in the focal annotation category in comparison to the genome-wide estimate using all SNPs. The fold enrichment of per-SNP heritability was correlated with that of polygenicity across annotation categories (r = 0.762; Fig. 6a). The per-SNP heritability and polygenicity were enriched in functionally important categories, such as transcription start sites (TSS), 3′-and 5′-UTRs, and conserved, enhancer and coding regions, but depleted in repressed regions. This result suggests that a functional category that explains a greater fraction of heritability tends to have a larger number of nonnull variants, consistent with the findings from a recent study 11 . However, for some categories, such as coding and conserved regions, the fold enrichment of per-SNP heritability was greater than that of polygenicity, suggesting an enrichment of larger effect sizes in these regions. To distinguish between the contributions of the number and the magnitude of the nonzero effects to h 2 SNP , we estimated per-NZE heritability (per-nonzero effect heritability h 2 NZE ðcÞ ¼ where m NZ (c) is the number of SNPs with nonzero effects in category c). While the fold enrichment of h 2 NZE ðcÞ was close to or smaller than one in most categories, the enrichment was the largest in the coding and conserved regions (Fig. 6b), suggesting that the enrichment of per-SNP heritability in these categories was not only because of the larger number of nonnull variants but also the larger effect sizes, confirmed by simulations (Supplementary Fig. 24). The median value ofŜ was −0.540, ranging from −0.739 (s.e.m. =  (Fig. 6c).
The negativeŜ in all functional categories suggests a widespread negative selection across the whole genome, and the largestŜ in the coding regions among all the functional categories highlighted the action of negative selection in the biologically most important regions.
Our estimates of per-SNP heritability enrichment were consistent with those from S-LDSC 15,40,41 for most annotation categories ( Supplementary Fig. 25). However, S-LDSC reported a much larger enrichment for the conserved region category, followed by the coding region category. This may be due to the different assumptions made in the two methods, i.e., SBayesSstrat assumes a sparse genetic architecture whereas S-LDSC does not explicitly assume a mixture model, as both the coding and conserved regions categories were enriched for the number of nonzero effects and the magnitude of effect sizes (Fig. 6b). Another explanation could be that the SBayesS-strat estimate is from a separate analysis of a focal category at a time conditioning on all the other SNPs with no overlap among categories whereas the S-LDSC estimates are from a joint analysis of all the categories with overlaps.

Discussion
We have developed an efficient summary-data-based method to estimate the joint distribution of effect sizes and MAF as well as SNP-based heritability, polygenicity and joint SNP effects. By analysing GWAS summary statistics from the public domain, we detected pervasive signatures of negative selection in the genetic architecture for a wide range of complex traits including common diseases (Figs. 2 and 3). Our results support a model of negative selection, that is, most new nonneural mutations are deleterious to fitness such that mutations with larger effects on fitness are more likely to be eliminated or kept at lower frequencies in the population by selection.
Most traits hadŜ at about −0.6 with diverse estimates of h 2 SNP and polygenicity, implying that the model with S = −1 originally used in the GREML method 42 is more appropriate than the model with S = 0 for most complex traits. Schoech et al. 7 linked the S parameter (denoted by α in their model) to the τ parameter in Eyre-Walker's model 3 and further drew inference on the average genome-wide selection coefficient. However, our forward simulations have shown that inference regarding the strength of selection cannot be made based solely on S but should take into account other genetic architecture parameters as well as the distribution of effect sizes. Despite the narrowly distributedŜ across traits, the predicted strength of selection per trait can vary by orders of magnitude ( Fig. 5 and Supplementary Fig. 22). By extrapolating our results based on HapMap3 common SNPs, we estimate that, on average,~1% of human genome sequence are mutational targets for a complex trait with an average selection coefficient of 0.0007, giving rise to additional additive genetic variance of 0.001 (in the unit of environmental variance) in each generation. The large estimates of mutational target size per trait implicate widespread pleiotropy across the genome, consistent with the result of a recent study that 90% of GWAS loci affect multiple traits 43 . Our results suggest a relatively small mutational target size but relatively strong selection on variants for common diseases and relatively large mutational target size for cognitive traits, in line with the previous finding that brain-related traits are highly polygenic and the associated genetic variants are likely under strong selection 11 . Our polygenicity parameter π represents the proportion of SNPs with nonzero effects; this definition has also been used previously 13,28,35,[44][45][46][47] . Our forward simulations showed that π is driven by both the mutational target size and selection strength, with increased average selection coefficient resulting in decreasedπ. This is because negative selection removed causal variants of large effects as well as SNPs in LD with them (a phenomenon known as background selection). O'Connor et al. 11 proposed an alternative definition of polygenicity, the effective number of independently associated variants or M e , which accounts for the distribution of per-SNP heritability across the genome. Despite the difference in definition, both π and M e estimates varied with the number of causal variants under different scenarios (Supplementary Fig. 26). In addition, the estimates of π and M e were highly correlated (r = 0.876) with a regression slope ofπ on M e estimates = 3.4. This is highly consistent with the result reported in O'Connor et al. that their M e estimates were highly correlated with the estimates from our previous study 13 for a number of traits (r = 0.9) but 4× smaller (Fig. S4 in O'Connor et al.).
Since we only detected signatures of negative selection in real traits, our evolutionary simulations focused on the models of negative selection. To investigate the impact of both negative and positive selections, we extended our simulation scenarios by considering two additional positive selection-related parameters: average positive selection coefficient and proportion of beneficial mutational targets (see the "Methods" section). When considering both negative and positive selections in the simulations, we observed more complicated relationships between the genetic architecture and evolutionary parameters ( Supplementary  Fig. 27), which, however, could still be used for prediction. Our results showed that the predicted s, π m and h 2 m were consistent with those predicted above considering negative selection only ( Supplementary Fig. 28), except that the estimated strength of The biologically important categories, such as the TSS, conserved, UTR and coding regions, had the highest enrichment in per-SNP heritability, most of which also had the highest enrichment in polygenicity, whereas the repressed regions were depleted in both parameters (Fig. 6). The concordance in functional enrichment between the two parameters reflects an uneven distribution of the number of causal variants across functional categories, consistent with the finding from prior work 11 . We further observed enrichment of per-NZE heritability in conserved and coding regions, suggesting larger effect sizes of nonnull SNPs in these regions compared to genome average. It is of note that coding regions showed the largestŜ among all the functional categories (in line with Speed et al. 9 ) with significant enrichment of per-NZE heritability, likely because of the coding mutations by nature having larger effect sizes and/or a mixture of negative selection on coding mutations with detrimental effects and positive selection on those with favourable effects. It is surprising to observe a relatively largeŜ in repressed regions, which may be a consequence of overlaps between functional annotations. Another possible explanation could be that positive selection has suppressed the signature of negative selection in non-repressed (biologically important) regions, consistent with our observation that S was closer to zero in the simulations considering both positive and negative selections than in the simulations only considering negative selection ( Supplementary Fig. 27a versus Fig. 4).
There are several limitations in this study. First, our inference on negative selection is based on HapMap3 common SNPs and therefore may not hold for the unobserved rare variants. In fact, we found by forward simulations a weaker magnitude of S in rare variants because the very rare variants were mostly new mutations whose relationship between effect size and MAF had not yet been shaped by selection, which diluted the selection signals from the variants under selection (Supplementary Fig. 29). This suggests that the true S parameter is allelic age dependent and subject to the combined effect of mutation, selection and genetic drift. An apparent change in the effect size-MAF relationship when moving toward low MAF was also reported by Schoech et al. 7 . Second, independence of chromosomes is assumed in our model. This may not hold if there was non-random mating in the ancestral population. For example, assortative mating would introduce positive correlations between trait-increasing alleles located on different chromosomes, and therefore increase heritability in the equilibrium population, e.g., for height 48 . Third, our definition of polygenicity is based on the number of SNPs with nonzero effects (m NZ ), which may not be an unbiased estimator of the number of causal variants (m C ) especially when the causal variants are not observed. For example, m NZ will tend to be smaller than m C if some causal variants are not well tagged by any SNP markers but tend to be larger than m C if they are in high multi-locus LD with a number of SNPs. Thus, our polygenicity estimate should be best used to compare traits using the same set of SNPs, rather than an unbiased estimate of the number of causal variants. Fourth, we did not attempt to predict the evolutionary parameters for functional genomic categories because it would require simulating a genome with functional partitioning. Despite these limitations, our study highlights the impact of negative selection on the genetic architecture across complex traits and in different functional genomic regions. In addition to a better understanding of the genetic architecture, our methods can also be applied to genetic mapping and polygenic risk prediction through the use of the joint SNP effect estimates or the characterised underlying distributions of effect sizes as prior knowledge for other methods 49 .

Methods
SBayesS. Let us consider an individual-level data-based multiple regression model for a GWAS cohort: where y is the vector of phenotypes adjusted for all fixed effects, X is the columncentred genotype matrix, β is the vector of SNP effects, and e is the vector of residuals with Var e ð Þ ¼ Iσ 2 e for a cohort of unrelated individuals. Assuming Hardy-Weinberg equilibrium (HWE), the variance of genotype dosage (0, 1, 2) of SNP j is h j = 2p j q j , where p j is MAF and q j = 1 − p j . Let D be a diagonal matrix with D jj ¼ X 0 j X j ¼ h j n j , where n j is per-SNP sample size. Multiplying both sides of the equation by D −1 X′ gives Note that D −1 X′ y = b, the vector of least-squares estimates of SNP marginal effects from GWAS, and D À1 X 0 X ¼ D À 1 2 BD 1 2 , where B ¼ D À 1 2 X 0 XD À 1 2 is the LD correlation matrix among all SNPs 50 . Let W ¼ D À 1 2 BD 1 2 and ε = D −1 X′e. Then, the above equation can be written as In contrast to the identity structure of residual variance in model (1), the residuals in model (2) are dependent of LD, as This is a generic form of summary-data-based Bayesian regressions, which is similar to Zhu and Stephens's RSS model 35 . As in BayesS, we assume the effect size is related to MAF through a parameter S: where ϕ is a point mass at zero, and S (the relationship between MAF and effect size), σ 2 β (the effect variance factor common to all SNPs) and π (the proportion of SNPs with nonzero effects, i.e., the polygenicity) are considered as unknown, with prior distributions of a standard normal, a scaled inverse chi-squared distribution (Supplementary Note), and a uniform distribution between zero and one, respectively. Specifying a different prior distribution to β j gives a form of other summarydata-based Bayesian alphabet models 29 .
When the LD correlations are computed using all SNPs in the GWAS sample, models (1) and (2) are equivalent in terms of posterior inference because the GWAS estimates of SNP effects (b) and LD correlation matrix (B) are sufficient statistics for the joint posterior distribution of β (Supplementary Note). Compared to model (1), model (2) allows us to incorporate LD information from a different reference sample from the GWAS sample for which the individual-level data are often not accessible. Further, it is often not practical to compute and store the entire LD matrix in the memory. Therefore, we used a sparse LD matrix that ignores the small LD correlation estimates due to sampling variation, but still accounted for the sampling variance of LD correlation in the model (Supplementary Note). In our GCTB software 13 where SBayesS is implemented, we have developed a parallel computing strategy to facilitate the computation of the LD matrix. Once the LD matrix is computed, it can be used repeatedly in the GWAS summary-data analysis for different traits.
MCMC and convergence. We used MCMC algorithm to generate 50,000 posterior samples (the first 20,000 discarded as burn-in) from the joint posterior distribution of model parameters, based on which statistical inference was made. Details of the MCMC sampling scheme are shown in the Supplementary Note. The posterior mean was used as the point estimator, with the statistical uncertainty quantified by the posterior variance or its square root (posterior standard error), as shown below. We ran four parallel chains with different starting values of the parameters randomly sampled from their prior distributions. Following the method proposed by Gelman and Rubin 27 , we estimated the posterior variance by where T is the chain length, W is the within-chain variance, and B is the betweenchain variance.
To assess convergence in MCMC, we computed the potential scale reduction statisticR for each of the model parameters. As suggested by Gelman and Rubin,R < 1:2 generally indicates good convergence. Thus, we concluded convergence for a trait analysis when all the three genetic architecture parameters hadR < 1:2.
Sparse LD matrix. For computational efficiency, we used a sparse LD matrix in the analysis where LD due to sampling variation were set to be zero. To this end, we tested whether LD between each pair of SNPs on the same chromosome is zero in the population when computing the LD correlation matrix using a reference sample. Under the null hypothesis that the true LD in the population is zero, we assume 51 (tilde denotes quantities computed from the reference sample) and reject the null if the chi-squared statistic > 10 (corresponding to P < 0.0016). This is equivalent to a r 2 threshold of 2 × 10 −4 given a sample size of 50,000, resulting in each SNP, on average, being in LD with~1000 SNPs on the same chromosome. We setB jk to be zero if the null hypothesis is not rejected or if the two SNPs are on different chromosomes, leading to a sparse LD matrix. The chi-squared threshold of 10 is chosen in order to balance the type I and II error rates. If a type I error occurs, i.e., the true correlation ρ jk = 0 butB jk is not set to be zero, then as shown in the Supplementary Information, s 2 jk ¼ñ jk þn jk n jk n jk 1 ÀB 2 jk 2 , which is very likely to be larger than the true sampling variance 1/n jk . This would inflate the residual variance and therefore deflate the heritability estimate. If a type II error occurs, i.e., ρ jk ≠ 0 butB jk is set to be zero, then s 2 jk ¼ 1 n jk , which is very likely to be larger than the true sampling varianceñ jk þn jk n jk n jk 1 À ρ 2 jk 2 . This would deflate the residual variance and therefore inflate the heritability estimate. Since the consequence of type II errors is worse, we use a not-too-stringent threshold to eliminate the LD due to sampling. This also suggests that LD reference sample size cannot be too small, otherwise, type II error rate would increase due to the loss of power. Since we only include non-zero elements in the LD matrix, it is faster by folds to run the summary-level data analysis with substantially less amount of memory needed.
SNP-based heritability estimation. In BayesS 13 , we computed the genetic variance σ 2 g as the variance of genetic values across individuals given the sampled values of β in each MCMC iteration. As described in Zhu and Stephens 35 , this is equivalent to the following quadratic term of β given the LD correlation matrix: Given the right-hand-side updating strategy in MCMC (Supplementary Note), this quadratic term can be computed efficiently as the difference of two vector-byvector products: where r = Db and r adj is the adjusted r for β. The residual variance (σ 2 e ) is sampled from a scaled inverse chi-squared distribution with the mean mainly driven by SBayesRS. Following the recently published SBayesR 29 model which assumes a mixture of a point mass at zero and multiple normal distributions with different variances, we extended SBayesS to this flexible multi-component mixture model to account for a more complex genetic architecture with a spectrum of very small to very large effect sizes. For each SNP effect, we assume where γ k = 0, 0.01, 0.1, 1 for k = 1, 2, 3, 4, representing four mixture components of zero, small, medium and large effect size, respectively, with P 4 k¼1 π k ¼ 1. The priors for S and σ 2 β are as defined in SBayesS. The mixing probabilities π are assumed to have a Dirichlet distribution with hyperparameters set to one. The polygenicity is defined as the sum of fraction of SNPs in each nonnull component, i.e., π = π 2 + π 3 + π 4 . The sparse LD matrix above or the shrunk LD matrix 29,35 used in the SBayesR study 29 can be used for SBayesRS analysis.
Compared to other methods utilising functional annotations, such as S-LDSC 52 , BayesRC 53 and RSS-E 54 , a unique feature of the annotation-stratified SBayesS (referred to as SBayesS-strat) is that it allows the estimation of S in a specific functional annotation category. Compared to a recently published method, BLD-LDAK-Alpha 9 , that estimates the S parameter (denoted by α in their model) based on an infinitesimal model, our method accounts for a sparse genetic architecture. In addition to the estimation of per-SNP heritability, polygenicity and S for each category, we also defined per-nonzero-effect (per-NZE) heritability (h 2 NZE ) as the total heritability explained in a category divided by the number of nonzero effects in the category, which is helpful to understand whether the heritability enrichment is due to the larger number of associated variants or the larger magnitude of effect size compared to genome average. In addition to the category-specific parameters, we estimated the global parameters S, π, h 2 SNP and h 2 NZE empirically conditional on the sampled value of β in each iteration of MCMC. The fold of enrichment of each parameter for each trait was then computed as E t θ t γ =θ t h i over T MCMC iterations.
The estimation variation of the enrichment fold was quantified by the posterior variance as described above.
Meta-analysis across traits. We combined the SBayesS-strat estimates across traits by calculating the median fold enrichment for each functional category. We reported the median instead of the mean in order to minimise the impact of outliers, especially for the per-NZE heritability estimate for which the denominator (i.e., the number of nonzero effects in an annotation category) is often estimated with large sampling variance. To account for the phenotypic correlation among the traits, we estimated the effective number of traits (n e ) by performing an eigen decomposition on the phenotypic correlation matrix 55 : where λ i is the ith eigenvalue of the phenotypic correlation matrix. Then, the posterior standard error of the mean was computed as s:e:m: ¼ c SD θjy À Á ffiffiffiffi ffi n e p with c SD θjy À Á being the standard deviation of the estimate across traits.
GWAS summary statistics. We performed GWAS analyses for 26 quantitative traits and 9 common diseases in the full release of the UKB data using PLINK 1.90 beta 56 . We used 348,501 unrelated individuals of European ancestry (estimated genetic relatedness from GCTA < 0.05) 57 and the imputed data provided by the UKB team 16 . We filtered HapMap3 SNPs 30 with MAF < 0.01, HWE test P value < 1 × 10 −6 , missing genotype rate > 0.05, or imputation info score < 0.3. We further excluded SNPs in the Human Major Histocompatibility Complex (MHC) region, resulting in a total of 1,124,198 common SNPs for the analysis. The LD correlations in the reference samples were computed based on the effect alleles in the GWAS summary data. For quantitative traits, we standardised phenotypes to mean zero and variance one after removing the outliers (phenotype > 7 SD) and performed rank-based inverse normal transformation (RINT) within each sex group. Prior to GWAS, we pre-adjusted phenotypes by age, sex and first 10 principal components (PCs) provided by the UKB team after RINT if applied. For the publicly available summary statistics, we downloaded the data and matched the SNPs with those in the UKB data after excluding the strand ambiguous SNPs (i.e., A/T or C/G SNPs) in addition to the QC procedures above. For the GWAS summary data from the Neale Lab, we extracted 274 quantitative traits for which the GWAS was performed based on RINT phenotypes in their analysis pipeline.
Evolutionary forward simulations. We used SLiM3 26 to run evolutionary forward simulations. A large sequence of 100 Mb was simulated, where a proportion of new mutations (π m ) that had pleiotropic effects on fitness and trait emerged at random with an average selection coefficient of s and a mutational heritability of h 2 m in each generation. The values of the three input parameters were sampled from uniform distributions at log10 scale: log 10 (s)~U(10 −5 , 10 −2 ), log 10 (π m )~U(10 −3 , 10 −1 ) and log 10 ðh 2 m Þ $ U 10 À4 ; 10 À2 À Á . The mutation rate (μ m ) was set to 1.65 × 10 −8 per base pair per individual per generation 58 , and the recombination rate was set to 1 × 10 −8 . We assumed a model of negative selection (see below for a scenario with positive selection), where all trait variants were deleterious to fitness and the other mutations were neutral. The causal effects were assumed to have a mixture of three normal distributions with small, medium and large variances: Given the input parameters h 2 m and π m , the mixture distribution parameter σ 2 β can be computed because, according to population genetics theory 10 , h 2 m ¼ 2Lπ m μ m V β in the unit of environmental variance σ 2 e , with L being the genome length (100 Mb). For each trait mutation, the selection coefficient was modelled by s j ¼ kβ 2 j , where k ¼ s=V β given the input parameter s. Because β j followed a mixture distribution, s j also followed a mixture distribution with small and large selection coefficients. To break up the perfect proportionality between selection coefficients and squared effect sizes, two pleiotropic effect models were used to remodel the trait effect of causal variant conditional on its selection coefficient toward the end of the selection process. The first model is the Simons et al. 12 model which assumes the causal effect on the focal trait (denoted as 1) following the distribution β j1~N (0, k −1 s j /n t ) with n t being the number of traits on which the variant has an effect. The second model is the Eyre-Walker's model 3 : β j1 ¼ δ j S τ j 1 þ ε j with δ j = 1 or −1 determined at random, S j = 4N e s j , N e = 10,000 being the effective population size and ε j~N (0, σ 2 ). In the Eyre-Walker's model, τ is the key parameter and σ 2 is a nuisance parameter. For each set of causal variants, we simulated causal effects based on either the Simons et al. model with n t = 1, 2, 4 or 10 or the Eyre-Walker model with τ = 0.2, 0.5, 0.8, 1.0 and σ 2 = 0.1. A demographic model inferred by Gravel et al. 37 with population bottleneck and expansion was used to simulate a population that had undergone selection for 58,000 generations. The simulation started with an ancestral base population of N e = 7310, which was expanded to 14,474 after 52,080 generations, a long period of neutral burn-in to allow the population reach mutation-drift equilibrium (~1.3 million years assuming 25 years per generation). In generation 55,960, 1861 individuals were split from the base population into a descendant population to mimic the out-of-African dispersal. In generation 57,080, the population size was further reduced to 1032 and then increased with an exponential rate of 0.0038 until generation 58,000, reaching to a final population size of 34,039. In the last generation of selection, we obtained the genotypes of~2000 unrelated individuals (genomic relationship < 0.05) and computed the LD correlation matrix for all common causal variants and a random sample of 36k common SNPs, a comparable density as the SNPs used in the real trait analysis (1.1 × 10 6 × 1 × 10 8 /3 × 10 9 = 36,000). Given the LD matrix and causal effects, we directly simulated the GWAS summary statistics for all variants 11,38 :α $ NBβ; 1 NB À Á , whereB is the LD matrix, β is the simulated causal effects based on different evolutionary models (the Simon et al. or Eyre-Walker model), and N = 350k is the sample size in the UKB dataset (see the Supplementary Note for more details). Using this approach, we were able to simulate a GWAS data set with comparable statistical power as in the real data analysis. We then used the same methods as those used in real trait analysis (i.e., SBayesS and SBayesRS) to estimate the SNP-based heritability, polygenicity and S at either common causal variants or the random set of 36,000 common SNPs (excluding the causal variants). The true value of SNP-based heritability was computed as σ 2 g =ðσ 2 g þ σ 2 e Þ, where σ 2 g is the genetic variance yielded from the simulation and σ 2 e ¼ 1. The true value for polygenicity was represented by the number of common causal variants in the last generation of selection. The true S parameter was estimated using a linear regression log β 2 j ¼ α 0 þ α 1 log 2p j q j þ ϵ j where the slope α 1 is an OLS estimate of S according to the BayesS model, and the residuals ϵ are independent. When the distribution of causal effects is a mixture of multiple normal distributions, fitting a single intercept or multiple intercepts with respect to the mixture components has negligible impact on the OLS estimate of S ( Supplementary Fig. 30). We performed the whole simulation process 100 times with a mixture or normal distribution for selection coefficients, respectively. To incorporate positive selection, we specified two more input parameters, i.e., proportion of trait mutations being beneficial (π m,b ) and average selection coefficient for the beneficial alleles ( s b ). Similar as above, we sampled their values from uniform distributions at log10 scale: log 10 ð s b Þ $ U 10 À5 ; 10 À2 ð Þ and log 10 (π m,b )~U(10 −3 , 10 −1 ). In this scenario, a new mutation can either be beneficial or deleterious with a selection coefficient as modelled above given the average positive or negative selection coefficient. Note that only the Simons et al. model was used to simulate GWAS data in this scenario as the Eyre-Walker model only fits in the context of negative selection.
Inference on evolutionary parameters. We used a polynomial regression model to predict an evolutionary parameter from the estimates of the genetic architecture parameters for complex traits. The forward simulation data under either the Simons et al. 12 or Eyre-Walker 3 model with various settings were used as a reference to estimate the associations between an evolutionary parameter ( s, π m and h 2 m at log10 scale for a positive parameter space) and the genetic architecture parameters in their original units. We tested the performance of this method by a cross-validation analysis in the simulated data, with 80% of the sample as training and the rest as validation. As shown in Supplementary Fig. 20, the three evolutionary parameters, s, π m and h 2 m , can be predicted with reasonably high accuracy, and the prediction accuracy reached the maximum when the degree of polynomial was about 3. Given this result, we chose the degree of polynomial of 3 in the real trait prediction analysis. We used the entire simulation data set to build a polynomial regression model to predict each of the evolutionary parameters by jointly modelling the SBayesS estimates of h 2 SNP ; S and π (or the SBayesRS estimates of h 2 SNP , S, π 1 , π 2 , π 3 and π 4 ). We then applied these polynomial equations to predict the evolutionary parameters in real data for 44 complex traits.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.