Global genetic differentiation of complex traits shaped by natural selection in humans

Guo, Jing; Wu, Yang; Zhu, Zhihong; Zheng, Zhili; Trzaskowski, Maciej; Zeng, Jian; Robinson, Matthew R.; Visscher, Peter M.; Yang, Jian

doi:10.1038/s41467-018-04191-y

Download PDF

Article
Open access
Published: 14 May 2018

Global genetic differentiation of complex traits shaped by natural selection in humans

Nature Communications volume 9, Article number: 1865 (2018) Cite this article

22k Accesses
47 Citations
349 Altmetric
Metrics details

Subjects

Abstract

There are mean differences in complex traits among global human populations. We hypothesize that part of the phenotypic differentiation is due to natural selection. To address this hypothesis, we assess the differentiation in allele frequencies of trait-associated SNPs among African, Eastern Asian, and European populations for ten complex traits using data of large sample size (up to ~405,000). We show that SNPs associated with height ($P = 2.46 \times 10^{ - 5}$), waist-to-hip ratio ($P = 2.77 \times 10^{ - 4}$), and schizophrenia ($P = 3.96 \times 10^{ - 5}$) are significantly more differentiated among populations than matched “control” SNPs, suggesting that these trait-associated SNPs have undergone natural selection. We further find that SNPs associated with height ($P = 2.01 \times 10^{ - 6}$) and schizophrenia ($P = 5.16 \times 10^{ - 18}$) show significantly higher variance in linkage disequilibrium (LD) scores across populations than control SNPs. Our results support the hypothesis that natural selection has shaped the genetic differentiation of complex traits, such as height and schizophrenia, among worldwide populations.

Quantifying genetic heterogeneity between continental populations for human height and body mass index

Article Open access 04 March 2021

Jing Guo, Andrew Bakshi, … Jian Yang

Heritability of complex traits in sub-populations experiencing bottlenecks and growth

Article Open access 08 April 2024

Cameron S. Taylor & Daniel J. Lawson

Genetic drift from the out-of-Africa bottleneck leads to biased estimation of genetic architecture and selection

Article Open access 13 April 2021

Bilal Ashraf & Daniel John Lawson

Introduction

Many human complex traits, including quantitative traits (e.g., height¹) and complex disorders (e.g., cardiovascular diseases^2,3), are substantially differentiated among worldwide populations. For example, the mean height in Northern Hemisphere populations generally increases with latitude^1,4. European Americans have a lower body mass index (BMI) (~1.3 kg/m²) than African Americans but a higher BMI (1.9–3.2 kg/m²) than Asians, such as Chinese, Indonesians, and Thais for the same body fat percentage^5,6. For the mortality rates associated with ischemic heart disease in the UK, African Caribbeans are at a lower risk while South Asians are at a higher risk than Europeans⁷. While environmental factors certainly play a role, since most complex traits have a genetic component, the question is whether or not the phenotypic differentiation is partly due to genetic differentiation and, if so, whether the genetic differentiation is a consequence of genetic drift or natural selection. There has been evidence suggesting that natural selection has caused genetic differentiation among worldwide populations^8,9,10 and the signals of selection are enriched in specific parts of the genome^8,11. However, it is not straightforward to investigate whether the signals of natural selection are enriched at genetic variants associated with a complex trait for two main reasons. First, because of the polygenic nature of most complex traits¹², the signals of genetic differentiation are usually diluted among many trait-associated loci, each having a small effect too weak to be detected using methods that target a complete selective sweep^13,14,15. Second, genetic differentiation can be masked by environmental factors, thereby increasing the difficulty of directly detecting and quantifying the polygenic selection signals^4,16.

Over the past decade, genome-wide association studies (GWAS) have identified thousands of single nucleotide polymorphisms (SNPs) associated with a number of traits in humans¹⁷. These findings have provided critical knowledge for understanding the polygenic architecture of human complex traits¹⁸ or detecting the signature of natural selection^4,16,19,20. The large amount of GWAS data available in the public domain also allows researchers to address the question whether genetic variants associated with a complex trait have been under natural selection. For example, utilizing data from GWAS of large sample size, recent studies have shown that genetic variants associated with human height have been under directional selection in European populations^4,19,21. In this study, we seek to address the question whether the differentiation in allele frequency and linkage disequilibrium (LD) of variants associated with a complex trait among global populations is shaped by natural selection. We first test if allele frequencies of the trait-associated SNPs are more differentiated across African, East Asian, and European populations than expected under genetic drift for 10 complex traits (Supplementary Table 1) utilizing summary-level data from published GWAS with large sample sizes (n = up to ~405,000). These traits include three quantitative traits (adult height, BMI, and waist-to-hip ratio adjusted by BMI (WHRadjBMI)), two common diseases and related biochemical traits (coronary artery disease (CAD) and type II diabetes (T2D) and high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol), two neurological/psychiatric disorders (Alzheimer’s disease (AD) and schizophrenia (SCZ)), and one behavioral trait (educational attainment years (EAY)). For the traits for which the associated SNPs are significantly more differentiated among the global populations than expected under drift, we then measure the direction of genetic differentiation using polygenic scoring^4,16 or the mean frequency of the trait-increasing alleles, and compare it with the observed direction of phenotypic differentiation in the three populations. Moreover, previous studies have shown that both strong and soft selective sweeps can alter linkage disequilibrium (LD) between a genetic variant and its surrounding variants^22,23,24,25. If a trait-associated SNP has been under selection, one would also expect to see a population differentiation of the LD score of this SNP with its surrounding SNPs, more than expected under a drift model, where the LD score is defined as the sum of the LD r² between the focal SNP and all nearby SNPs in a 10 Mb window^26,27. In this context, we further test whether the trait-associated SNPs show greater differences in LD across populations than the control SNPs.

Results

Enrichment of F _ST in trait-associated SNPs

If the genetic loci associated with a complex trait have been under natural selection, an excess of among-population differentiation will be observed in the frequencies of the trait-associated alleles compared with what is expected under a drift model²⁸. We used Wright’s fixation index (F_ST) to measure the extent to which a particular SNP varied in allele frequency among three worldwide populations. We focused only on common variants (minor allele frequency, MAF > 0.01) because rare variants were not available in most GWAS summary data used in this study. The F_ST values of the SNPs were calculated from unrelated individuals (SNP-derived genetic relatedness < 0.05) of European (EUR, n = 1099), African (AFR, n = 1099), and East Asian (EAS, n = 1099) ancestry from the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort after quality controls (QC) (Methods; Supplementary Table 2). We selected a list of nearly independent SNPs (LD r² < 0.01) associated with each trait using PLINK²⁹ clumping analysis (Methods) of the summary statistics from the latest meta-analysis of GWAS (Supplementary Table 1) and a set of “control” SNPs randomly sampled from the genome with MAF and LD scores matched with those of the associated SNPs (Methods). All GWAS summary statistics were from studies on individuals of European ancestry. We tested whether the mean F_ST value of the trait-associated SNPs was significantly higher than that of the control SNPs, a method we call the F_ST enrichment test (Methods), similar to the approach used in Zhang et al.³⁰. We demonstrate by simulation (Methods; Supplementary Table 3) that there was no inflation in the test-statistics of the method under the null model of genetic drift (Supplementary Fig. 1a) and that selecting associated SNPs at a clumping P-value threshold of 5 × 10⁻⁶ provided higher detection power than at P < 5 × 10⁻⁸ under the alternative model (Supplementary Fig. 1b). We performed the F_ST enrichment analysis for each of the 10 traits using the trait-associated SNPs clumped at P < 5 × 10⁻⁶ with F_ST values computed from the three populations in GERA. We found that the mean F_ST values for the trait-associated loci for height ($P = 2.46 \times 10^{ - 5}$), WHRadjBMI ($P = 2.77 \times 10^{ - 4}$), and SCZ ($P = 3.96 \times 10^{ - 5}$) were significantly higher than those of the control SNPs after Bonferroni correction for multiple tests (Table 1), indicating that the genetic loci associated with these traits have been under natural selection. We further confirmed the results using F_ST values calculated from the 1000 Genome Project (1000G; unrelated EUR n = 494, AFR n = 591, and EAS n = 491; Supplementary Table 2). The results remained significant for height ($P = 4.93 \times 10^{ - 6}$), WHRadjBMI ($P = 2.62 \times 10^{ - 3}$), and SCZ (P = 6.83 × 10⁻⁵) correcting for multiple tests (Table 1 and Fig. 1).

Table 1 Enrichment test of the differentiation of trait-associated SNPs (clumped at P < 5 × 10⁻⁶) in allele frequency against the control SNPs among the three populations in GERA and 1000G

Full size table

Direction of genetic differentiation

The analyses above showed that the genetic variants associated with height, WHRadjBMI, and SCZ are more differentiated than expected by random drift. However, these analyses did not show the direction of genetic differentiation. To demonstrate the direction of differentiation, we used the SNPs clumped at $P < 5 \times 10^{ - 6}$ (see above) to compute a polygenic risk score (PRS)^4,16,29 for each individual in 1000G for height, WHRadjBMI, and SCZ. We then estimated the deviation of the mean PRS of each population from the overall mean in standard deviation (s.d.) units (Methods). The deviation is expected to be zero under drift as demonstrated by the null distribution computed from the 10,000 control SNP sets (Fig. 2). The results showed that the mean PRS for height in the EUR subjects was higher than that in the AFR and EAS subjects (Fig. 2), consistent with the observed mean phenotypic differences between the populations¹. For SCZ, the mean PRS in AFR was higher than that in EUR (Fig. 2), in line with the results from recent studies suggesting that SCZ is more prevalent in people of AFR ancestry than EUR ancestry and Asians^31,32,33. For WHRadjBMI, EAS showed a higher mean PRS than both AFR and EUR (Fig. 2). WHRadjBMI measures the fat distribution at the abdomen region after adjusting for BMI to exclude the influence of overall adiposity. Previous studies have reported that after correcting for age, gender and BMI, Asian Americans tend to have a higher accumulation of excess visceral adipose tissue (VAT) than European Americans³⁴, whereas African Americans have a lower VAT than European Americans^35,36. We repeated the PRS analysis in GERA and found that the results were consistent with those in 1000G (Supplementary Fig. 2). It should be noted that we first used the F_ST enrichment analysis to seek for evidence that the SNPs associated with a trait have been under natural selection, and then used the PRS analysis to illustrate the direction of selection without repeating the significance test. An observed differentiation of PRS values among populations alone is not evidence for selection unless it is more than expected under drift.

The PRS analysis used the effect sizes of SNPs estimated from GWAS samples of EUR ancestry. Strong genetic heterogeneity between EUR and non-EUR (e.g., if the effect sizes in EUR are different from those in non-EUR populations) could possibly bias PRS prediction in non-EUR populations^37,38. To avoid this potential bias, we examined the direction of genetic differentiation by directly comparing the difference in mean frequency of the trait-increasing alleles (fTIAs) across SNPs (clumped at $P < 5 \times 10^{ - 6}$) between two populations because in comparison with the magnitude, the direction of SNP effect estimated from GWAS is less prone to bias due to population structure³⁷. We also calculated the difference in fTIA between the two populations for the control SNPs. Note that TIA of a control SNP simply means the allele for which the estimated SNP effect is positive in the GWAS summary data. Under a drift model, fTIA is expected to be 0.5 and the difference in fTIA values between two populations is expected to be zero. The results were consistent with those from the PRS analysis (Figs. 2, 3). On average, the height-increasing alleles were more frequent in EUR than in EAS, SCZ risk alleles were more frequent in ARFs than in EUR, and WHRadjBMI-increasing alleles were more frequent in EAS than in EUR (see Fig. 3 for the results from 1000G and Supplementary Fig. 3 for the results from GERA). Of note, the mean fTIA for the control SNPs was not zero, especially for height and SCZ, which was likely because both height and SCZ are highly polygenic^39,40 and some of the control SNP could be in LD with the causal variant(s) by chance. This result also implies that SNPs other than those selected at low-association P-values have also been under polygenic selection^10,14.

LD pattern of complex trait loci altered by selection

Because our findings indicated that natural selection has shaped the frequencies of the trait-associated alleles, we then sought to test whether selection has also differentiated the LD pattern at the trait-associated genomic loci among populations more than expected under drift^22,23,24,25. If this is the case, we should expect to see larger among-population difference in LD score at the trait-associated SNPs than at the control SNPs. We first computed the LD score of each SNP in the unrelated individuals in GERA and 1000G, respectively. We then calculated the coefficient of variation of the LD score (LDCV) across the AFR, EAS, and EUR populations for each SNP (Methods), and tested whether the mean LDCV of the trait-associated SNPs (clumped at $P < 5 \times 10^{ - 6}$) significantly deviates from that of the control SNPs, a procedure we call the LDCV enrichment analysis. Note that we used coefficient of variation rather than variance because SNPs with higher LD scores tend to have larger between-population variation, a mean-variance relationship. Using the LDCV from GERA, we found a significant excess of LDCV for the trait-associated SNPs over the control SNPs for height ($P = 2.01 \times 10^{ - 6}$), EAY ($P = 2.37 \times 10^{ - 8}$), and SCZ ($P = 5.16 \times 10^{ - 18}$) after Bonferroni correction for multiple tests (Table 2). The results for height ($P = 1.99 \times 10^{ - 4}$) and SCZ ($P = 2.74 \times 10^{ - 8}$) remained significant (after correcting for multiple tests) when using the LDCV from 1000G (Table 2 and Supplementary Fig. 4). There was no significant correlation between F_ST and LDCV for either height or SCZ, suggesting that the between-population differentiation in LD were not confounded by the differentiation in F_ST (Supplementary Fig. 5). Together, these results suggest that natural selection has altered both the frequency and LD properties of the SNPs associated with height and SCZ.

Table 2 Enrichment test of the differentiation of trait-associated SNPs (clumped at P < 5 × 10⁻⁶) in the LD pattern against the control SNPs among the three populations in GERA and 1000G

Full size table

Discussion

This study sought to address whether natural selection has differentiated allele frequencies and LD scores of complex traits associated SNPs among worldwide populations more than expected under drift. To this end, we established a strategy to test whether the among-population variation in allele frequency or LD score at the trait-associated SNPs is significantly higher than that at MAF-matched and LD-matched control SNPs. We detected significant signals in allele frequencies in the GERA populations for height ($P = 2.46 \times 10^{ - 5}$), WHRadjBMI ($P = 2.77 \times 10^{ - 4}$), and SCZ ($P = 3.96 \times 10^{ - 5}$) (Table 1) and significant signals in LD for height ($P = 2.01 \times 10^{ - 6}$) and SCZ ($P = 5.16 \times 10^{ - 18}$) (Table 2). There are two plausible models that are compatible with the observed results: 1) the trait itself has undergone natural selection; 2) variants affecting the trait have been under selection because of their pleiotropic effects on fitness, and variants with larger effects on the trait tend to be more differentiated via such pleiotropic effects⁴¹.

Height is a classic complex trait. Previous studies have shown that height-associated SNPs have been under natural selection and that this might contribute to the mean height differences among EUR populations^{4,16,19,20,21}. Our results suggest that the phenotypic differences in height among AFR, EAS, and EUR populations are also partially due to natural selection on height-associated SNPs (Fig. 2 and Supplementary Fig. 2). The mean PRS of AFR was higher than that of EAS but lower than that of EUR, consistent with the observed phenotypic differences in mean height¹. Waist-to-hip ratio is an anthropometric index of abdominal obesity. Previous studies have shown that after adjusting for BMI, European Americans tend to have a larger amount of visceral fat on average than African Americans^35,36 but a lower amount than those of Asian descent^6,34. This finding could be explained by the enrichment of the WHRadjBMI-increasing alleles in EAS as shown in our results; we did not have enough power to detect difference between EUR and AFR (Fig. 2 and Supplementary Fig. 2). It is worth noting that, compared with height, direct measurements of WHRadjBMI are not widely available for worldwide populations and are less consistent across studies, which may be related to insufficient sample sizes^42,43 or the influence of environmental factors⁴⁴. For SCZ, the enrichment of risk alleles in AFR (Fig. 2 and Supplementary Fig. 2) identified by our analyses appears to support the reported discrepancy in the prevalence of SCZ between African Americans and European Americans^31,32. Moreover, Fearon et al.³³ found that, among the ethnic minority groups in the UK, the incidence rate ratios (IRRs) for SCZ (adjusted by age and sex) in individuals of Chinese descent (IRR = 3.5) is higher than non-British Whites (IRR=2.5) and lower than individuals of African descent (IRR = 9.1 for African-Caribbean and 5.8 for Black African), consistent with our findings. However, the diagnosis of SCZ can be potentially biased by non-genetic factors, such as socioeconomic status and access to hospitalization³¹.

The trait-associated SNPs used to detect signatures of natural selection were ascertained from EUR-based meta-analyses of GWAS. Such ascertainment might be biased because F_ST is a function of MAF and the MAF or LD properties of the trait-associated SNPs could be different from that of the non-associated SNPs^26,45. For example, SNPs with higher MAF in EUR tend to have higher power to be detected at a certain significance level, resulting in a difference in mean F_ST even in the absence of natural selection. This potential bias can be mitigated by matching the control SNPs with the associated SNPs by MAF and LD score. Via simulations, we confirmed that inflation did not occur in the test statistics under the null model (Supplementary Fig. 1a). Our F_ST enrichment results are unlikely to be driven by the EUR bias because among the traits that showed significant signal in the F_ST enrichment test (Table 1) only height showed higher mean fTIA in EUR than non-EUR (mean fTIA was higher in EAS for WHRadjBMI and higher in AFR for SCZ compared to EUR) (Fig. 3). Moreover, we did not observe a correlation between the worldwide F_ST (using the AFR, EAS and EUR samples in 1000G) and EUR F_ST (using the CEPH, Finnish, British, Spanish, and Tuscan samples in 1000G-EUR) for the trait-associated SNPs that show evidence of selection (Supplementary Fig. 6). This result suggests that the differentiation of PRS in global populations is unlikely to be confounded by the biases in the estimated SNP effects due to population stratification in EUR. In addition, genetic drift that takes place during particular demographic events (e.g., population bottleneck or expansion in Europeans) would result in a difference in the frequency spectrum between the causal variants and neutral variants⁴⁶. The difference, however, was very unlikely lead to a bias in our results because there was no difference in MAF between the trait-associated SNPs and the matched control SNPs. It is confirmed by simulation that there was no inflation in the F_ST enrichment test-statistics when the causal variants were simulated to have lower MAF than null SNPs in the absence of selection (A and B in Supplementary Table 4 and Supplementary Fig. 7a). We further demonstrated by simulation that the F_ST enrichment analysis method was robust to different levels of heritability (C and D in Supplementary Table 4 and Supplementary Fig. 7b), different degrees of LD between causal variants (Supplementary Figs. 8 and 9), different strategies of sampling control SNPs (matching the control SNPs with the trait-associated SNPs by only MAF computed from EUR or by both MAF and LD scores computed from AFR; Supplementary Fig. 10), or whether the causal variants were included in the analysis (Supplementary Fig. 11). We also applied different strategies of matching control SNPs to the analyses of real data and observed little differences in results (Supplementary Table 5). In addition, we compared the variance of F_ST values of a set of associated SNPs with a random set of control SNPs (with MAF and LD matched) across 100 independently simulated traits and did not observe a significant difference in variance of F_ST values between the trait-associated SNPs and control SNPs, regardless of whether the traits were simulated based on a single variant or multiple variants at each of the 1,000 causal loci (Supplementary Fig. 12).

We observed from the LD score calculation in GERA that there were three regions (on chromosomes 6, 11 and 17) presenting extremely large LD scores (Supplementary Fig. 13). The chr6 region (Supplementary Fig. 14a) harbors genes that encode the major histocompatibility complex (MHC; hg19 chr6:28,477,797-33,448,354), a well-known protein complex that is essential for the immune response. It is not surprising that the MHC locus has been under selection⁴⁷. The chr17 region (Supplementary Fig. 14b) harbors an inversion (hg16 chr17:44.1-45.0 Mb) that almost exclusively occurs in EUR and has been shown to be under positive selection in Icelandic females⁴⁸. The chr11 region (Supplementary Fig. 14c) stretches over the centromere with a length >10 Mb, and it contains multiple polymorphic inversions⁴⁹. More than half of the genes contained in this region (48–60 Mb; Supplementary Fig. 14d) are olfactory receptor genes (ORs) (186 ORs/308 genes)⁵⁰. This gene family has been found to show the greater evolutionary acceleration between humans and chimpanzees than other functional gene classes, such as nuclear transport and reproduction⁵¹. Moreover, this region included a large number of nearly independent SNPs that show pleiotropic effects on HDL cholesterol and height (Supplementary Fig. 14d).

Nevertheless, several limitations were associated with this study. First, the F_ST enrichment test is underpowered for highly polygenic traits because some of the control SNPs might be in LD with the causal variants by chance under a polygenic model (as demonstrated by the deviation of fTIA differences in the control SNPs from the expected values for height and SCZ; Fig. 3). Fortunately, the loss of power was remedied by the use of data from studies with very large sample sizes (Supplementary Table 1). Second, the GWAS summary statistics used in this study were generated from EUR samples (see the discussion of EUR bias above). This is because non-EUR GWAS of large sample size (on a similar scale as the sample sizes of the EUR GWAS used in this study) are not available for most complex traits. If there is genetic heterogeneity between EUR and non-EUR populations, the PRS computed in AFR and EAS based on SNP effects estimated from EUR studies will be biased^37,38. We showed that this potential bias could be mitigated by the fTIA analysis, which ignores the magnitude of estimated SNP effects (Fig. 3 and Supplementary Fig. 3). In fact, the results from the fTIA analysis (Fig. 3 and Supplementary Fig. 3) were largely consistent with those from the PRS analysis (Fig. 2 and Supplementary Fig. 2), suggesting that the direction of genetic differentiation as indicated by the PRS for height, WHRadjBMI, or SCZ is unlikely to be substantially biased by the between-population differences in SNP effects⁵². In addition, our result that the EUR F_ST was almost independent of the global F_ST (Supplementary Fig. 6) implies that the mean values of PRS in non-EUR samples were not biased by possible confounding in the estimated SNP effects owing to population stratification in EUR. Third, because we tested the mean F_ST (or LDCV) across all the associated loci against that of the control SNPs, we could not distinguish whether the population differentiation of mean F_ST (or LDCV) at the trait-associated variants (more than expected by drift) is due to natural selection on different loci, the same loci but different alleles, or the same alleles but different levels of selection pressure in different populations. Fourth, regarding to the type of selection on the trait-associated variants, our results seem to indicate that the excess of genetic differentiation at the trait-associated SNPs is a consequence of local selection (different alleles are favored in different populations). However, we cannot rule out the possibility that the increase in F_ST (or LDCV) at the trait-associated SNPs is due to background selection⁵³ (i.e., those SNPs are in LD with causal variants under negative selection). This is confirmed by forward simulation using SLiM⁵⁴ (Methods). The simulation result shows that background selection reduces genetic diversity and increases between-population differentiation at genetic variants in LD with the variants under negative selection (Supplementary Fig. 15). Last, we used the F_ST and LDCV enrichment analyses to assess the excess of population genetic differentiation at the trait-associated loci as a means to detect signature of natural selection. These analyses, however, cannot determine when the selection occurred in the history of human evolution and whether there are other types of natural selection within a population. Studies in progress have developed methods to model polygenic selection in an admixture graph (representing of the historical divergences and admixture events in the human populations through time) to infer which branches are most likely to have experienced polygenic selection⁵⁵, and to model the relationship between variance in SNP effect and MAF to detect signatures of negative selection on variants associated with complex traits^56,57.

In summary, we proposed a robust statistical approach to test whether SNPs associated with a complex trait of interest are more differentiated across worldwide populations than MAF-matched and LD-matched control SNPs, and used the results to infer whether the trait-associated genetic variants have undergone natural selection. Our simulations indicated that the test statistics of the proposed approach were not inflated under the null model of random drift. Using this approach, we identified that for height, WHRadjBMI, and SCZ, the trait-associated alleles were differentiated significantly more than the matched control, in directions consistent with those of phenotypic differentiation. We showed that the results were robust to the potential biases in ascertaining the trait-associated SNPs (e.g., population stratification in EUR). These results support our hypothesis that the observed phenotypic differentiation among worldwide populations is (at least partly) genetic and a consequence of natural selection on the trait-associated variants because of selection on the trait or through their pleiotropic effects on fitness since the divergence of these populations¹⁶. Our findings further suggest that natural selection has also driven differentiation in LD among populations at genomic loci associated with height and SCZ. These findings expand our understanding of the role of natural selection in shaping the genetic architecture of complex traits in human populations. The methods developed in this study are general and applicable to other complex traits, including endophenotypes, such as gene expression and DNA methylation.

Methods

Data and quality control (QC)

The 1000G samples used in this study comprised individuals of EUR (n = 503), AFR (n = 661) and EAS origins (n = 504) (Supplementary Table 2). SNPs with MAF <0.01 and Hardy–Weinberg equilibrium (HWE) $P < 10^{ - 6}$ were removed from the genotype data for each population, respectively, resulting in 6,160,018 SNPs in common across the three populations. We used GCTA⁵⁸ to construct the genetic relationship matrix (GRM) using the SNPs present in HapMap phase 3 project (HapMap3; m = ~1.2 million SNPs) and generate a set of unrelated individuals at a relatedness threshold of 0.05 in each population, resulting in 494 unrelated EUR, 591 unrelated AFR and 491 unrelated EAS (Supplementary Table 2).

There were 60,586 EUR, 3826 AFR, and 5188 EAS genotyped on Affymetrix Axiom arrays in the GERA data. The SNP genotype data were cleaned according to the following QC criteria: sample/SNP call rate <98%, MAF<0.01, and HWE test $P < 10^{ - 6}$. After QC, the SNP genotypes were imputed to the 1000G (phase 1) reference panels using IMPUTE2⁵⁹ (Supplementary Table 6). The imputed SNPs with imputation INFO scores <0.3, MAF<0.01, or HWE $P < 10^{ - 6}$ in any of the populations (AFR, EAS, and EUR) were removed (Supplementary Table 6), resulting in ~5.8 million remaining SNPs in common across the three populations. We performed a principal component analysis⁶⁰ (PCA) in a combined GERA and 1000G sample using 820,460 HapMap3 SNPs. For a particular population (e.g., GERA-EUR), any GERA individuals who were more than 3 s.d. away from the mean of the corresponding 1000G population (e.g., 1000G-EUR) were removed (Supplementary Fig. 16). We further calculated the GRM for each GERA population using the HapMap3 SNPs ($m = 820,460$), and removed one of each pair of individuals with estimated genetic relatedness >0.05. This resulted in 53,629 unrelated EUR, 1099 unrelated AFR and 3365 unrelated EAS. To avoid heterogeneity in the subsequent analyses, we harmonized the sample sizes by randomly sampling 1,099 individuals from the EUR and EAS samples (Supplementary Table 2). In summary, ~6.1 million SNPs in 1000G and ~5.8 million SNPs in GERA after QC were used in the subsequent analyses.

We used the LD-based clumping approach in PLINK²⁹ to select trait-associated SNPs from GWAS summary data. The clumping approach filters out SNPs with P-values larger than a specific threshold, clusters the remaining SNPs by LD and physical distance between SNPs, and selects the top associated SNP from each clump. We used an LD r² threshold of 0.01 and a distance threshold of 1 Mb to ensure that all the selected trait-associated SNPs were nearly independent. The Atherosclerosis Risk in Communities (ARIC) data set (~8.8 million 1000G-imputed SNPs and 7703 unrelated European Americans) was used as the reference to compute LD r² between SNPs. The ARIC sample was genotyped on Affymetrix 6.0 arrays. Genotype QC and imputation have been detailed elsewhere⁶¹ (Supplementary Table 6).

F _ST enrichment test

Similar to the approach used in Zhang et al.³⁰, we compared the mean F_ST value of the trait-associated SNPs with that of the control SNPs with MAF and LD score matched. First, we divided all the SNPs (in either 1000G or HapMap2 depending on the SNPs used in GWAS for the trait) into 20 MAF bins from 0 to 0.5 with an increment of 0.025 (excluding the SNPs with MAF < 0.01). Each of the MAF bins was further grouped into 20 bins according to the 20 quantiles of LD score distribution. The MAF and LD values were computed from the EUR samples in GERA or 1000G described above. Second, we allocated the trait-associated SNPs to the MAF and LD stratified bins, randomly sampled a matched number of “control” SNPs from each bin, computed a mean F_ST value for the control SNPs sampled from all bins, and repeated this process 10,000 times to generate a distribution of mean F_ST under drift (approximately normally distributed; see Fig. 1). Third, a P-value was computed from a two-tailed test by comparing the observed mean F_ST value for the associated SNPs against the null distribution quantified by the control SNPs, assuming normality of the null distribution.

Simulation based on real GWAS genotype data

To verify whether the test statistics used in the F_ST enrichment analyses are well calibrated, we used GCTA⁵⁸ to perform GWAS simulations based on the 1000G-imputed genotypes of 53,629 unrelated European Americans in GERA after QC. Two SNP panels (1000G and HapMap2) were respectively used to mimic those used in the real data to sample “causal variants” (GWAS for EAY, AD, CAD, and SCZ were based on 1000G and those for the other traits such as height and BMI were based on HapMap2) (Supplementary Tables 1 and 3). First, we simulated 100 quantitative traits under the null hypothesis (i.e., genetic drift) with heritability (h²) of 0.5 based on 1000 causal variants randomly sampled from a SNP panel for each trait (1000G or HapMap2). The trait phenotype was simulated based on an additive genetic model, i.e.,

$$y = g + e = \mathop {\sum }\limits_i^m \,w_iu_i + e$$

(1)

where y is the phenotype, m is the number of causal variants, w_i is the standardized genotype of the i-th causal variant with its effect u_i drawn from N(0, 1), and e is the residual generated from $N[0,{\mathrm{var}}\left( g \right)\left( {\frac{1}{{h^2}} - 1} \right)]$. To demonstrate the statistical power under the alternative hypothesis (i.e., natural selection), we sampled 1000 causal variants for each trait from the top 50% of the F_ST distribution of the 1000G (median F_ST of 0.084) or HapMap2 (median F_ST of 0.093) SNPs, where the F_ST values were computed from the three populations in 1000G. Second, we performed a GWAS analysis for each trait (under null or alternative hypothesis) with the first 10 PCs fitted as covariates to control for population stratification. Independent trait-associated SNPs were selected by PLINK-clumping at two P-value thresholds ($5 \times 10^{ - 8}$ and $5 \times 10^{ - 6}$) alongside a LD r² threshold of 0.01 and window size of 1 Mb (Supplementary Table 3). We included a less stringent threshold (i.e., $5 \times 10^{ - 6}$) in the simulation to test whether the increased number of true positives passing this lower threshold could outweigh the increased number of false positives, which might result in an increase in power for the F_ST enrichment test. Finally, we performed the F_ST enrichment test for each trait to see if there is inflation under the null hypothesis and whether there is a difference in statistical power between the two P-value thresholds under the alternative hypothesis. Note that, for consistency, we used the same SNP panel to simulate the causal variants and to sample control SNPs. Four sets of results were generated from the combinations of two P-value thresholds ($5 \times 10^{ - 8}$ and $5 \times 10^{ - 6}$) and two SNP panels (1000G and HapMap2) (Supplementary Table 3).

We further performed simulations under the null hypothesis in three additional scenarios with 1) larger proportion of lower-MAF causal variant (1000 random causal variants + 500 causal variants with MAF < 0.1; A and B in Supplementary Table 4), 2) a lower level of heritability (i.e., h² = 0.2; C and D in Supplementary Table 4), and 3) two additional causal variants sampled from a 1-Mb flanking region of each primary causal variant (3000 causal variants in total). We also performed the F_ST enrichment analysis of the original simulation data (see above) and the real GWAS summary data in two additional scenarios, i.e., 1) matching the control SNPs with the trait-associated SNPs by MAF (computed from 1000G-EUR) only, and 2) matching the control SNPs by both MAF and LD score computed from 1000G-AFR. The inflation of test statistics under the null hypothesis and the statistical power under the alternative hypothesis were illustrated by quantile-quantile (QQ) plots.

Direction of genetic differentiation

The analysis below uses a similar method introduced in Robinson et al. to quantify the population genetic differentiation of a complex trait⁴. The PRS of an individual is computed as $\hat g = \mathop {\sum }\limits_l^m x_l\hat b_l$, where m is the number of SNPs used to create the PRS, $x$ represents the SNP genotype (coded as 0, 1, or 2) and $\hat b$ is the estimate of SNP effect from the GWAS summary data. To investigate the direction of genetic differentiation among populations, we fitted a linear model

$$\hat g_i = \mu + v_j + e_i$$

(2)

where $\hat g_i$ represents the standardized PRS calculated from the trait-associated SNPs (clumping $P < 5 \times 10^{ - 6}$) for each of the unrelated individuals $i$ in either the 1000G or GERA samples; μ is the mean term; $v_j$ is the deviation of the mean PRS of population j from μ; and $e_i$ represents the residual. This method provides the estimate of the deviation (in s.d. units) of the mean PRS of a population from the overall mean. In data analysis, we also applied this method to estimate $v_j$ for control SNPs (10,000 random sets for each trait in each population) to demonstrate the variability of $\hat v_j$ under drift.

LD variation among populations

Similar to the F_ST, we created LDCV for each SNP to measure LD variation among the AFR, EAS and EUR populations. We first calculated LD score of each SNP as the sum of the LD r² between the focal SNP and all the flanking SNPs (including the focal SNP itself) within a 10-Mb window. SNPs with LD r² values < 0.01 were excluded from the calculation to avoid chance correlations between SNPs. Each SNP obtained three LD scores estimated in the unrelated individuals from each of the three populations. We computed LDCV of a SNP as the ratio of the s.d. of the three LD scores to the mean in GERA and 1000G, respectively.

Forward simulation

We simulated two independent 10-Mb segments using SLiM⁵⁴ where new mutations occurred with 5% probability to be deleterious for fitness and 95% probability to be neutral on the first segment and 100% probability to be neutral on the second segment. The deleterious mutations were under negative selection with a selection coefficient of −0.01. The mutation rate was set to be 2.36 × 10⁻⁸ (ref. ⁶²). The population samples were generated based on a commonly used demographic model⁶³, mimicking the “Out-of-Africa” event allowing migration between populations and population expansion. We started the simulation with 7310 individuals. After 58,000 generations as suggested in Gravel et al.⁶³, we obtained 34,039 Europeans and 14,474 Africans along with ~260,000 variants segregating in Europeans and ~225,000 variants in Africans. We sampled 5000 individuals from each population, extracted common variants (MAF > 0.01) and calculated the $F_{{\mathrm{ST}}}$ (or LDCV) values of the variants in common between the two populations. We conducted the simulation with 30 independent replicates. The average number of common variants across the 30 replicates was 81,055 in Africans and 43,715 in Europeans with 29,710 variants in common. We then compared the mean $F_{{\mathrm{ST}}}$ (or LDCV) value of the neutral variants on segment #2 with that of the non-deleterious variants with matching MAF and LD on segment #1 (some of which were under background selection because of the LD with deleterious mutations).

URLs

GCTA: http://cnsgenomics.com/software/gcta

SLiM: https://messerlab.org/slim

PLINK: https://www.cog-genomics.org/plink2

ARIC data: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000090.v4.p1

GERA data: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000674.v2.p2

GWAS summary data for Height, BMI, and WHRadjBMI: https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files

HDL and LDL: http://csg.sph.umich.edu//abecasis/public/lipids2013/

EAY: http://www.thessgac.org/data

AD: http://web.pasteur-lille.fr/en/recherche/u744/igap/igap_download.php

CAD (coronary artery disease): http://www.cardiogramplusc4d.org/data-downloads/

SCZ: https://www.med.unc.edu/pgc/downloads

T2D: http://diagram-consortium.org/downloads.html

Data availability

All the data used in this study were obtained from the public domain (see the URLs above).

References

Stulp, G. & Barrett, L. Evolutionary perspectives on human height variation. Biol. Rev. 91, 206–234 (2016).
Article PubMed Google Scholar
Menotti, A. et al. Food intake patterns and 25-year mortality from coronary heart disease: cross-cultural correlations in the Seven Countries Study. Eur. J. Epidemiol. 15, 507–515 (1999).
Article CAS PubMed Google Scholar
Tunstall-Pedoe, H. et al. Contribution of trends in survival and coronary-event rates to changes in coronary heart disease mortality: 10-year results from 37 WHO MONICA Project populations. Lancet 15, 507–515 (1999).
Google Scholar
Robinson, M. R. et al. Population genetic differentiation of height and body mass index across Europe. Nat. Genet. 47, 1357–1362 (2015).
Article CAS PubMed PubMed Central Google Scholar
Deurenberg, P., Deurenberg-Yap, M. & Guricci, S. Asians are different from Caucasians and from each other in their body mass index/body fat per cent relationship. Obes. Rev. 3, 141–146 (2002).
Article CAS PubMed Google Scholar
Deurenberg, P., Yap, M. & van Staveren, W. A. Body mass index and percent body fat: a meta analysis among different ethnic groups. Int. J. Obes. Relat. Metab. Disord. 22, 1164–1171 (1998).
Article CAS PubMed Google Scholar
Chaturvedi, N. Ethnic differences in cardiovascular disease. Heart 89, 681–686 (2003).
Article PubMed PubMed Central Google Scholar
Barreiro, L. B., Laval, G., Quach, H., Patin, E. & Quintana-Murci, L. Natural selection has driven population differentiation in modern humans. Nat. Genet. 40, 340–345 (2008).
Article CAS PubMed Google Scholar
Novembre, J. & Di, R. A. Spatial patterns of variation due to natural selection in humans. Nat. Rev. Genet. 10, 745–755 (2009).
Article CAS PubMed PubMed Central Google Scholar
Fu, W. & Akey, J. M. Selection and adaptation in the human genome. Annu. Rev. Genom. Hum. Genet 14, 467–489 (2013).
Article CAS Google Scholar
Tennessen, J. A. & Akey, J. M. Parallel adaptive divergence among geographically diverse human populations. PLoS Genet. 7, e1002127 (2011).
Article CAS PubMed PubMed Central Google Scholar
Yang, J. et al. Ubiquitous polygenicity of human complex traits: genome-wide analysis of 49 traits in Koreans. PLoS Genet. 9, e1003355 (2013).
Article CAS PubMed PubMed Central Google Scholar
Hancock, A. M., Alkorta-Aranburu, G., Witonsky, D. B. & Di Rienzo, A. Adaptations to new environments in humans: the role of subtle allele frequency shifts. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 365, 2459–2468 (2010).
Article CAS PubMed PubMed Central Google Scholar
Pritchard, J. K., Pickrell, J. K. & Coop, G. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr. Biol. 20, R208–R215 (2010).
Article CAS PubMed PubMed Central Google Scholar
Pritchard, J. K. & Di Rienzo, A. Adaptation–not by sweeps alone. Nat. Rev. Genet. 11, 665–667 (2010).
Article CAS PubMed PubMed Central Google Scholar
Berg, J. J. & Coop, G. A population genetic signal of polygenic adaptation. PLoS Genet. 10, e1004412 (2014).
Article PubMed PubMed Central Google Scholar
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
Article CAS PubMed Google Scholar
Visscher, P. M. et al. 10 eears of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).
Article CAS PubMed PubMed Central Google Scholar
Turchin, M. C. et al. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 44, 1015–1019 (2012).
Article CAS PubMed PubMed Central Google Scholar
Field, Y. et al. Detection of human adaptation during the past 2,000 years. Science 354, 760–764 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Zoledziewska, M. et al. Height-reducing variants and selection for short stature in Sardinia. Nat. Genet. 47, 1352–1356 (2015).
Article CAS PubMed PubMed Central Google Scholar
Sabeti, P. C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007).
Article ADS CAS PubMed PubMed Central Google Scholar
Sabeti, P. C. et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 419, 832–837 (2002).
Article ADS CAS PubMed Google Scholar
Pennings, P. S. & Hermisson, J. Soft sweeps III: The signature of positive selection from recurrent mutation. PLoS Genet. 2, e186 (2006).
Article PubMed PubMed Central Google Scholar
Voight, B. F., Kudaravalli, S., Wen, X. & Pritchard, J. K. A map of recent positive selection in the human genome. PLoS Biol. 4, 0446–0458 (2006).
Article CAS Google Scholar
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
Article CAS PubMed PubMed Central Google Scholar
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet 47, 291–295 (2015).
Article CAS PubMed PubMed Central Google Scholar
Lewontin, R. C. & Krakauer, J. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74, 175–195 (1973).
CAS PubMed PubMed Central Google Scholar
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Article CAS PubMed PubMed Central Google Scholar
Zhang, G., Muglia, L. J., Chakraborty, R., Akey, J. M. & Williams, S. M. Signatures of natural selection on genetic variants affecting complex human traits. Appl. Transl. Genom. 2, 77–93 (2013).
CAS Google Scholar
Schwartz, R. C. & Blankenship, D. M. Racial disparities in psychotic disorder diagnosis: a review of empirical literature. World J. Psychiatry 4, 133–140 (2014).
Article PubMed PubMed Central Google Scholar
Bresnahan, M. et al. Race and risk of schizophrenia in a US birth cohort: another example of health disparity? Int. J. Epidemiol. 36, 751–758 (2007).
Article PubMed Google Scholar
Fearon, P. et al. Incidence of schizophrenia and other psychoses in ethnic minority groups: results from the MRC AESOP Study. Psychol. Med. 36, 1541–1550 (2006).
Article PubMed Google Scholar
Park, Y. W., Allison, D. B., Heymsfield, S. B. & Gallagher, D. Larger amounts of visceral adipose tissue in Asian Americans. Obes. Res. 9, 381–387 (2001).
Article CAS PubMed Google Scholar
Hill, J. O. et al. Racial differences in amounts of visceral adipose tissue in young adults: the CARDIA (Coronary Artery Risk Development in Young Adults) study. Am. J. Clin. Nutr. 69, 381–387 (1999).
Article CAS PubMed Google Scholar
Hoffman, D. J., Wang, Z., Gallagher, D. & Heymsfield, S. B. Comparison of visceral adipose tissue mass in adult African Americans and whites. Obes. Res. 13, 66–74 (2005).
Article PubMed Google Scholar
Carlson, C. S. et al. Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study. PLoS Biol. 11, e1001661 (2013).
Article CAS PubMed PubMed Central Google Scholar
Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).
Article CAS PubMed PubMed Central Google Scholar
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Gen. 42, 565–569 (2010).
Article CAS Google Scholar
Lee, S. H. et al. Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat. Genet. 44, 247–250 (2012).
Article CAS PubMed PubMed Central Google Scholar
Johnson, T. & Barton, N. Theoretical models of selection and mutation on quantitative traits. Philos. Trans. R. Soc. B Biol. Sci. 360, 1411–1425 (2005).
Article CAS Google Scholar
Goh, L. G. H., Dhaliwal, S. S., Welborn, T. a., Lee, A. H. & Della, P. R. Ethnicity and the association between anthropometric indices of obesity and cardiovascular risk in women: a cross-sectional study. BMJ Open 4, e004702 (2014).
Article PubMed PubMed Central Google Scholar
Lean, M. E. J. et al. Ethnic differences in anthropometric and lifestyle measures related to coronary heart disease risk between South Asian, Italian and general-population British women living in the west of Scotland. Int. J. Obes. 25, 1800–1805 (2001).
Article CAS Google Scholar
Cheng, C.-Y. et al. Admixture mapping of obesity-related traits in African Americans: the Atherosclerosis Risk in Communities (ARIC) study. Obesity 18, 563–572 (2010).
Article PubMed Google Scholar
Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552 (2014).
Article CAS PubMed PubMed Central Google Scholar
Marth, G. T., Czabarka, E., Murvai, J. & Sherry, S. T. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166, 351–372 (2004).
Article CAS PubMed PubMed Central Google Scholar
Meyer, D. & Thomson, G. How selection shapes variation of the human major histocompatibility complex: a review. Ann. Hum. Genet. 65, 1–26 (2001).
Article CAS PubMed Google Scholar
Stefansson, H. et al. A common inversion under selection in Europeans. Nat. Genet. 37, 129–137 (2005).
Article CAS PubMed Google Scholar
Martínez-Fundichely, A. et al. InvFEST, a database integrating information of polymorphic inversions in the human genome. Nucleic Acids Res. 42, D1027–D1032 (2014).
Article PubMed Google Scholar
Taylor, T. D. et al. Human chromosome 11 DNA sequence and analysis including novel gene identification. Nature 440, 497–500 (2006).
Article ADS CAS PubMed Google Scholar
Clark, A. G. et al. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302, 1960–1963 (2003).
Article ADS CAS PubMed Google Scholar
Marigorta, U. M. & Navarro, A. High trans-ethnic replicability of GWAS results implies common causal variants. PLoS Genet. 9, e1003566 (2013).
Article CAS PubMed PubMed Central Google Scholar
Charlesworth, B., Nordborg, M. & Charlesworth, D. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet. Res. 70, 155–174 (1997).
Article CAS PubMed Google Scholar
Messer, P. W. SLiM: simulating evolution with selection and linkage. Genetics 194, 1037–1039 (2013).
Article PubMed PubMed Central Google Scholar
Racimo, F., Berg, J. J. & Pickrell, J. K. Detecting polygenic adaptation in admixture graphs. Genetics 208, 1565–1584 (2018).
Article PubMed Google Scholar
Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. https://doi.org/10.1038/s41588-018-0101-4 (2018).
Schoech, A. et al. Quantification of frequency-dependent genetic architectures and action of negative selection in 25 UK Biobank traits. Preprint at https://www.biorxiv.org/content/early/2017/09/13/188086 (2017).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: A tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Article CAS PubMed PubMed Central Google Scholar
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
Article PubMed PubMed Central Google Scholar
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
Article PubMed PubMed Central Google Scholar
Zhu, Z. et al. Dominance genetic variation contributes little to the missing heritability for human complex traits. Am. J. Hum. Genet. 96, 377–385 (2015).
Article CAS PubMed PubMed Central Google Scholar
Palamara, P. F. et al. Leveraging distant relatedness to quantify human mutation and gene-conversion rates. Am. J. Hum. Genet. 97, 775–789 (2015).
Article CAS PubMed PubMed Central Google Scholar
Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl Acad. Sci. 108, 11983–11988 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This research was supported by the Australian National Health and Medical Research Council (1107258, 1078037, 1103418, and 1113400), the Australian Research Council (DP160101343), the US National Institutes of Health (MH100141 and MH077139), and the Sylvia and Charles Viertel Charitable Foundation (Senior Medical Research Fellowship). This study uses data from dbGaP (accessions: phs000090 and phs000674). A full list of acknowledgments to these data sets can be found in the Supplementary Note.

Author information

Authors and Affiliations

Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
Jing Guo, Yang Wu, Zhihong Zhu, Zhili Zheng, Maciej Trzaskowski, Jian Zeng, Matthew R. Robinson, Peter M. Visscher & Jian Yang
The Eye Hospital, School of Ophthalmology and Optometry, Wenzhou Medical University, 325027, Zhejiang, China
Zhili Zheng
Department of Computational Biology, University of Lausanne, 1011, Lausanne, Switzerland
Matthew R. Robinson
Queensland Brain Institute, The University of Queensland, Brisbane, QLD, 4072, Australia
Peter M. Visscher & Jian Yang

Authors

Jing Guo
View author publications
You can also search for this author in PubMed Google Scholar
Yang Wu
View author publications
You can also search for this author in PubMed Google Scholar
Zhihong Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Zhili Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Maciej Trzaskowski
View author publications
You can also search for this author in PubMed Google Scholar
Jian Zeng
View author publications
You can also search for this author in PubMed Google Scholar
Matthew R. Robinson
View author publications
You can also search for this author in PubMed Google Scholar
Peter M. Visscher
View author publications
You can also search for this author in PubMed Google Scholar
Jian Yang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

J.Y. conceived and designed the study. J.G. performed simulations and statistical analyses under the assistance and guidance from Y.W., Z.Z., Z.L.Z., M.T., J.Z., M.R.R., P.M.V., and J.Y. J.G. and J.Y. wrote the manuscript with the participation of all authors. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Jian Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Guo, J., Wu, Y., Zhu, Z. et al. Global genetic differentiation of complex traits shaped by natural selection in humans. Nat Commun 9, 1865 (2018). https://doi.org/10.1038/s41467-018-04191-y

Download citation

Received: 29 October 2017
Accepted: 12 April 2018
Published: 14 May 2018
DOI: https://doi.org/10.1038/s41467-018-04191-y

This article is cited by

Similarity and diversity of genetic architecture for complex traits between East Asian and European populations
- Jinhui Zhang
- Shuo Zhang
- Ping Zeng
BMC Genomics (2023)
Joint multi-ancestry and admixed GWAS reveals the complex genetics behind human cranial vault shape
- Seppe Goovaerts
- Hanne Hoskens
- Peter Claes
Nature Communications (2023)
Recent natural selection conferred protection against schizophrenia by non-antagonistic pleiotropy
- Javier González-Peñas
- Lucía de Hoyos
- Javier Costas
Scientific Reports (2023)
Genetic variants underlying differences in facial morphology in East Asian and European populations
- Manfei Zhang
- Sijie Wu
- Sijia Wang
Nature Genetics (2022)
Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores
- Omer Weissbrod
- Masahiro Kanai
- Alkes L. Price
Nature Genetics (2022)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

Enrichment of F ST in trait-associated SNPs

Direction of genetic differentiation

LD pattern of complex trait loci altered by selection

Discussion

Methods

Data and quality control (QC)

F ST enrichment test

Simulation based on real GWAS genotype data

Direction of genetic differentiation

LD variation among populations

Forward simulation

URLs

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Electronic supplementary material

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links

Enrichment of F _ST in trait-associated SNPs

F _ST enrichment test