High mutation rates explain low population genetic divergence at copy-number-variable loci in Homo sapiens

Hu, Xin-Sheng; Yeh, Francis C.; Hu, Yang; Deng, Li-Ting; Ennos, Richard A.; Chen, Xiaoyang

doi:10.1038/srep43178

Article
Open access
Published: 22 February 2017

Scientific Reports volume 7, Article number: 43178 (2017)

Abstract

Copy-number-variable (CNV) loci differ from single nucleotide polymorphic (SNP) sites in size, mutation rate, and mechanisms of maintenance in natural populations. It is therefore hypothesized that population genetic divergence at CNV loci will differ from that found at SNP sites. Here, we test this hypothesis by analysing 856 CNV loci from the genomes of 1184 healthy individuals from 11 HapMap populations with a wide range of ancestry. The results show that population genetic divergence at the CNV loci is generally more than three times lower than at genome-wide SNP sites. Populations generally exhibit very small genetic divergence (G_st = 0.05 ± 0.049). The smallest divergence is among African populations (G_st = 0.0081 ± 0.0025), with increased divergence among non-African populations (G_st = 0.0217 ± 0.0109) and then among African and non-African populations (G_st = 0.0324 ± 0.0064). Genetic diversity is high in African populations (~0.13), low in Asian populations (~0.11), and intermediate in the remaining populations. Few significant linkage disequilibria (LDs) occur between the genome-wide CNV loci. Patterns of gametic and zygotic LDs indicate the absence of epistasis among CNV loci. Mutation rate is about twice as large as the migration rate in the non-African populations, suggesting that the high mutation rates play dominant roles in producing the low population genetic divergence at CNV loci.

Introduction

Understanding human population genetic structure remains important for gaining insights into human history and demography, as well as for investigating genetic diseases in relation to geography and ancestry^1,2,3. Historically, human population divergence has been assessed using approaches from a variety of disciplines including archaeology, palaeontology, linguistics, climatology and genetics. Early studies of genetic divergence were conducted by investigating the genetic variability of mitochondrial DNA and Y chromosomes^4,5. Currently, genome-wide SNP sites are used to measure population differentiation (e.g., the HapMap genotype data)^6,7,8, and to search for outlier regions that are potentially associated with geographically restricted genetic diseases⁹. Different classes of genetic markers vary widely in many important characteristics, such as their mode of inheritance (paternal, maternal, or biparental), mutation rate, and degree of selective neutrality. As a result, population genetic divergence can vary depending on the class of genetic marker investigated. Copy-number-variable (CNV) loci are an important cause of genetic variation in human genomes, and give rise to differences of 4.8–9.5% in the overall length of human genomes^10,11. However, population genetic divergence at the genome-wide CNV loci has not been investigated in detail^12,13, nor has genome-wide divergence at the CNV loci been compared with that at SNP sites.

Genetic variation at CNV loci in Homo sapiens and other species has been extensively reviewed from a number of perspectives^12,14. Topics covered include the mechanisms for generating copy number variation, natural selection on duplication and deletion variants, the impacts of demographic changes on CNV loci, associations with SNP loci, and the role of CNV loci in causing diseases^10,11,12,15,16,17. At the population level, the evolutionary dynamics of CNV loci can be studied within the framework of population genetics^12,14. Emerson et al.¹⁸ used an infinite-site model to investigate purifying selection on copy number variation in specific gene regions in Drosophila melanogaster. Sjödin and Jakobsson¹² suggested the use of a K-allele model¹⁹ or a stepwise mutational model²⁰ to describe the mutation process at CNV loci. The neutrality of CNV loci has also been analyzed^21,22. We recently developed a three-allele model to test neutrality at CNV loci, and demonstrated selective neutrality at 856 CNV loci scored in 1184 healthy individuals from the HapMap genotype data set²³. The evolution of these CNV loci can be essentially explained by a mutation-drift process²³. Here, we proceed with the same dataset to investigate population genetic divergence at the genome-wide CNV loci.

In comparison with variation at SNP sites, variants at CNV loci have several distinct features. First, CNV variants often differ in length by 1 kbp or more^24,25, whereas SNP variants differ by a single base pair. Thus, although CNV loci (~4.8–9.5% of human genomes) are much less abundant than SNP sites in human genomes, they represent an important type of chromosomal structural variation¹¹. Second, more complex processes are involved in generating copy number variants, including non-allelic homologous recombination (NAHR)²⁶, non-homologous end joining (NHEJ), and insertion of transposable elements (TEs)^27,28. These differ dramatically from the mechanisms generating point mutations (transitions and transversions) at SNP sites. Third, the average mutation rate at CNV loci is expected to be much higher than the point mutation rates at SNP sites²⁹, resulting in a much younger average age of alleles for CNV than for SNP loci in natural populations³⁰. Given these differences in the properties of CNV and SNP markers, we anticipate that they will vary in their degree of population genetic divergence.

To test this hypothesis, we employ genotype data at CNV loci from the HapMap Phase III populations. This has two advantages. The first is that genetic divergence among these populations has been fully investigated at genome-wide SNP sites³¹, providing the opportunity for direct comparison with results for the genome-wide CNV loci. Analysis of CNV loci has so far only been conducted with partial HapMap Phase III populations³¹ or at a particular gene site³². Our results should differ from existing analyses because they include more populations with a wider range of ancestry. Increasing the number of individuals will affect both the genetic divergence and the number of common CNV loci. The second advantage of using the HapMap dataset is that exact discrete copy numbers are available for each diploid genotype at each CNV locus³¹. Although techniques for detecting CNV loci have recently been improved, discrete copy-number genotypes at each CNV locus, which are also essential for accurate case-control association testing with CNV loci³³, are rarely archived in publicly accessible data. Furthermore, the sample sizes in previous studies at CNV loci are often too small, and hence are inappropriate for population genetic structure analysis^18,34,35. The large sample sizes in HapMap Phase III populations mean that the probabilities of making either false-positive or false-negative CNV calls are negligible²³.

In this study, we analyze genetic divergence at the genome-wide CNV loci and compare it with that at the genome-wide SNP sites in exactly the same populations. To further address the population genetic properties of CNV loci and reinforce our explanations of evolution at CNV loci, we test LDs at both gametic and zygotic levels among all pairs of CNV loci. We compare the patterns of gametic and zygotic LDs at CNV loci with those previously reported at SNP sites^36,37. Recent theoretical studies indicate that zygotic LD is more informative than gametic LD for inferring the effects of different evolutionary forces (mating system, gene flow, selection, and genetic drift)^38,39. In the absence of functional epistatic selective effects among loci, gametic LD (lower order) is always greater than the maximum zygotic LD in value. Other processes, including mating system, gene flow and genetic drift, do not change this pattern although they can generate LD (statistical associations between loci)^38,39. The difference between the values of gametic LD and maximum zygotic LD can therefore be used to infer whether epistasis exists between loci. Such differences, tested previously at the genome-wide SNP sites with the HapMap Phase III populations³⁷, have shown the existence of epistasis among many SNP sites. Here, we also investigate this property at the genome-wide CNV loci by presuming that individual CNV loci are directly/indirectly or equally involved in fitness changes. Information from LD analyses among CNV loci helps us to view the difference in population genetic divergence between SNP and CNV loci from a different perspective. Overall, our objective is to infer the roles of mutation and migration in producing human population genetic divergence at the genome-wide CNV loci by comparing the single and multilocus population genetic structure of SNP and CNV loci.

Results

Population genetic divergence

Maximum likelihood estimates (MLEs) of allele frequencies are summarized in Table S1. Although all CNV loci are polymorphic in the pooled population, they exhibit various levels of polymorphisms among populations (Table 1). More than 80% of CNV loci are polymorphic in African populations (ASW, LWK, MKK, and YRI), but less than 60% in non-African populations except MEX (62.38%). Three Asian populations (CHB, CHD, and JPT) have about 45% polymorphic CNV loci.

Table 1 Sample sizes and polymorphisms at the genome-wide CNV loci in 11 human populations.

African populations have 1.84–1.90 alleles per CNV locus, while Asian populations have about 1.50 alleles per CNV locus. The remaining populations have intermediate numbers of alleles per locus (N_a = 1.6–1.66). Similarly, African populations have high gene diversity over all CNV loci (H_e = ~0.13) and small standard deviations (~0.15), while Asian populations have low gene diversity (~0.11) but large standard deviations (~0.16) over all CNV loci. The remaining populations have intermediate gene diversity and standard deviations (Table 1).

Genetic differentiation measured by G_st is 0.0498 ± 0.0491 among all CNV loci, and most individual G_st values are around 0.05, with a few CNV loci having relatively large G_st values (Fig. 1). Substantial variations exist among chromosomes, especially for the small G_st values that are outside the 95% CIs (Fig. 2). The proportions of CNV loci exhibiting a significantly low level of population genetic divergence are 72.72% on Chr 1, 51.35% on Chr 4, 76.6% on Chr 5, 84.48% on Chr 6, 76.67% on Chr 7, 56.26% on Chr 9, 62.8% on Chr 11, 52.63% on Chr 17, 60.87% on Chr 19, and 90.91% on Chr 22. The rest of the chromosomes have less than 50% of CNV loci with a significantly low level of population differentiation. None of the chromosomes has any CNV locus exhibiting a significantly high level of population differentiation (Fig. 2).
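For readers unfamiliar with G_st, the following minimal sketch illustrates Nei's standard definition, G_st = (H_T − H_S)/H_T, for a single locus. It is an illustration only: the exact estimator used for the values above is not given in this section, and the function names, the equal weighting of populations, and the absence of any sample-size correction are assumptions made here.

```python
def gene_diversity(freqs):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs):
    """Nei's G_st for one locus.

    pop_freqs: list of allele-frequency vectors, one per population,
    all of the same length. Populations are weighted equally
    (an assumption of this sketch, not taken from the article).
    """
    k = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # H_S: mean within-population gene diversity
    h_s = sum(gene_diversity(f) for f in pop_freqs) / k
    # H_T: gene diversity of the pooled (mean) allele frequencies
    mean_freqs = [sum(f[a] for f in pop_freqs) / k for a in range(n_alleles)]
    h_t = gene_diversity(mean_freqs)
    if h_t == 0.0:
        return float("nan")  # locus is monomorphic overall; G_st undefined
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical frequencies of the 0-, 1-, and 2-copy alleles in two populations
    print(nei_gst([[0.05, 0.90, 0.05], [0.10, 0.80, 0.10]]))
```

Identical populations give G_st = 0, and populations fixed for different alleles give G_st = 1, which brackets the small values reported here.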

**Figure 1: A histogram of G_st distribution at 856 CNV loci.**

**Figure 2: G_st values across chromosomes at CNV loci.**

The average pairwise multilocus G_st ranges from 0.0038 ± 0.00001 (CHB-CHD) to 0.0421 ± 0.0001 (JPT-LWK), with the mean of 0.0255 ± 0.0114 over all pairs (Table 2). The average pairwise multilocus G_st in African populations ranges from 0.0059 ± 0.00001 (LWK-YRI) to 0.0128 ± 0.00002 (MKK-YRI), with the mean of 0.0081 ± 0.0025 over population pairs. The average pairwise multilocus G_st in non-African populations ranges from 0.0038 ± 0.00001 (CHB-CHD) to 0.0352 ± 0.0001 (TSI-JPT), with the mean of 0.0212 ± 0.0109 over population pairs. The average pairwise multilocus G_st among African and non-African populations ranges from 0.0206 ± 0.00004 (MKK-TSI) to 0.0421 ± 0.0001 (JPT-LWK), with the mean of 0.0324 ± 0.0064 over population pairs.

Table 2 Comparison of the pairwise G_st(CNV) at the genome-wide CNV loci with the pairwise F_st(SNP) at the genome-wide SNP sites⁷.

Compared with the pairwise multilocus F_st previously reported at the genome-wide SNP sites⁷, the pairwise multilocus G_st at the genome-wide CNV loci is generally more than three times lower (average ratio of F_st(SNP)/G_st(CNV) = 3.3081 ± 1.1837; Table 2). The ratios of F_st(SNP)/G_st(CNV) range from 1.3481 ± 0.0171 (LWK-YRI) to 2.1023 ± 0.0087 (MKK-LWK) in African populations, with the mean of 1.6849 ± 0.3294 over population pairs; from 0.2649 ± 0.0265 (CHB-CHD) to 3.6545 ± 0.0253 (CEU-CHD) in non-African populations, with the mean of 2.5048 ± 0.9240 over population pairs; and from 3.5497 ± 0.0200 (ASW-GIH) to 4.8624 ± 0.0258 (CHD-MKK) among African and non-African populations, with the mean of 4.2584 ± 0.3548 over population pairs (Table 2).

Inter-chromosomal variations in pairwise G_st values are substantial among different population pairs (Figure S1a), indicating the presence of differential divergences among chromosomes during the formation of populations. The pairs among African and non-African populations have large variations among chromosomes, especially on Chrs 9, 10, 16, 20, and 22 (Figure S1a), while the pairs among African populations or among non-African populations exhibit relatively stable divergences among chromosomes (e.g., CHB-JPT and CEU-CHB; Figure S1b).

Pairwise Nei’s genetic distances at multiple CNV loci range from 0.001 ± 0.000004 (CHB-CHD) to 0.0241 ± 0.0001 (CHD-YRI), with a mean of 0.0124 ± 0.0067 over all pairs (Table S2). The average genetic distance is 0.0029 ± 0.0010 among African populations, 0.0085 ± 0.0049 among non-African populations, and 0.0174 ± 0.0040 among African and non-African populations. Cluster analysis with the unweighted pair group method with arithmetic mean (UPGMA) shows that the three subgroups (African, Asian, and the rest of the populations) are clearly distinguished (Fig. 3). Bootstrapping resample trees (1000) using PHYLIP⁴⁰ indicate that African and non-African populations can be separated with a probability of 100% (data not shown here).

**Figure 3: Cluster analysis of 11 human populations.**

Assume an average mutation rate of the order of 10⁻⁵ per CNV locus per generation²⁹, equal effective population sizes among the 11 populations, and 25 years per generation. The isolation time in generations is then the genetic distance divided by twice the mutation rate, that is, the distance multiplied by 5 × 10⁴. From the average distance and its approximate variance V(t), the population isolation time is about t = 0.0124 × 5 × 10⁴ × 25 ± 0.0067 × 5 × 10⁴ × 25 = 15500 ± 8375 years among all populations, about t = 3625 ± 1250 years among African populations, about t = 10625 ± 6125 years among non-African populations, and about t = 21750 ± 5000 years among African and non-African populations.
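The arithmetic behind these time estimates can be reproduced with a short sketch. It assumes, as in the text, a mutation rate of 10⁻⁵ per locus per generation and 25 years per generation; the function name and the simple linear scaling of the standard deviation are illustrative choices, not the authors' code.

```python
def divergence_time_years(distance, mutation_rate=1e-5, years_per_generation=25):
    """Convert Nei's genetic distance to years: t = D / (2 * mu) generations."""
    generations = distance / (2.0 * mutation_rate)
    return generations * years_per_generation


if __name__ == "__main__":
    # (mean distance, standard deviation) as reported in the text
    groups = {
        "all pairs": (0.0124, 0.0067),
        "African": (0.0029, 0.0010),
        "non-African": (0.0085, 0.0049),
        "African vs non-African": (0.0174, 0.0040),
    }
    for name, (mean_d, sd_d) in groups.items():
        t = divergence_time_years(mean_d)
        sd = divergence_time_years(sd_d)  # the SD is scaled by the same factor
        print(f"{name}: {t:.0f} ± {sd:.0f} years")
```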

Gametic and zygotic LDs

Statistical tests indicate that very few pairs of CNV loci, 0.027–0.073%, exhibit significant gametic LDs in the 11 populations (Table 3; Table S3 for details). Most pairs of CNV loci have insignificant gametic LDs in each population. African populations generally have a lower proportion of CNV locus pairs with significant gametic LDs than do most non-African populations (Table 3). The average significant r-squares are higher for CNV loci from the same chromosome (~0.76) than from different chromosomes (~0.16). Among the significant gametic LDs on the same chromosomes, more pairs come from partially overlapped CNV loci in each population (Table 3).

Table 3 Means and standard deviations of significant gametic LDs (r-squares) in 11 human populations*.

Patterns of gametic LDs differ among populations. African populations have more significant gametic LDs from different chromosomes than from the same chromosomes, while non-African populations except CEU and MEX have more significant gametic LDs from the same chromosomes than from different chromosomes. No pairs of CNV loci on different chromosomes have significant gametic LDs in all 11 populations, but twelve common pairs from overlapped CNV loci (except one on Chr 7) exist, with 1 on each of Chrs 1, 7, 9, 11, and 12, 3 on Chr 5, and 4 on Chr 6 (Table S4).

Tests of zygotic LDs also indicate that very few pairs of CNV loci have significant zygotic LDs, 0–0.0359% (Table 4), which is generally less than the proportion of significant gametic LDs (Table 3). Most CNV loci with significant zygotic LDs are partially overlapped on the same chromosomes (Table S5). African populations have fewer significant zygotic LDs than do most non-African populations in significant D_ij (i, j = 0, 1, 2) except D_3j (j = 0, 1, 2, 3). There are twenty-two common pairs of CNV loci (mostly overlapped) with significant zygotic LDs in 11 populations, with 1 pair on Chr 1, 3 on Chr 5, 7 on Chr 6, 3 on Chr 7, 1 on Chr 9, 2 on Chr 10, 2 on Chr 11, 1 on Chr 12, 1 on Chr 13, and 1 on Chr 20 (Table 4). These locus pairs also have significant gametic LDs, while some CNV loci with significant zygotic LDs have no significant gametic LDs in 11 populations (Table S4).

Table 4 Percentages of the pairs of CNV loci with significant zygotic LDs in 11 human populations*.

For all CNV loci the maximum zygotic LD is smaller than the gametic LD in value, indicating that no epistatic effects exist between CNV loci. Both gametic and zygotic LD analyses indicate that these CNV loci are essentially in linkage equilibrium except for a few overlapped loci in each population.

Joint migration and mutation rates

From the pairwise multilocus G_st(CNV) (Table 2) and the pairwise multilocus F_st(SNP)^7,31, the ratios of the joint migration and mutation rates at CNV loci (m_c + 3μ_c/2) to those at SNP sites (m_s + 2μ_s) are estimated according to equations (9) and (12) (Table 5). The ratios range from 0.2624 ± 0.0263 (CHB-CHD) to 5.7238 ± 0.0375 (CHD-MKK), with the mean of 3.6600 ± 1.4188 over all pairs. The ratios range from 1.4126 ± 0.0144 (ASW-LWK) to 2.1402 ± 0.0088 (MKK-YRI) in African populations, with the mean of 1.6988 ± 0.3411 over population pairs; from 0.2624 ± 0.0263 (CHB-CHD) to 3.9942 ± 0.0311 (CEU-CHD) in non-African populations, with the mean of 2.6796 ± 1.0224 over population pairs; and from 3.8132 ± 0.0266 (ASW-GIH) to 5.7238 ± 0.0375 (CHD-MKK) among African and non-African populations, with the mean of 4.8157 ± 0.4929 over population pairs.

Table 5 Ratios of the joint mutation and migration rates at CNV loci to those at SNP sites (above diagonal), and the ratios of the mutation rate to the migration rate at CNV loci (below diagonal).

Using the average pairwise F_st(SNP) = 0.0956 ± 0.0567^7,31 and the average pairwise G_st(CNV) = 0.0255 ± 0.0114 across all population pairs, we obtain (m_c + 3μ_c/2)/(m_s + 2μ_s) = 4.0396 ± 3.2341, where a large standard deviation arises from the variation among populations. The above estimates indicate that the joint migration and mutation rates are generally much greater at the genome-wide CNV loci than at the genome-wide SNP sites.

The ratio of the mutation rate to the migration rate at CNV loci can be approximately quantified. According to equations (13) and (14), estimates of μ_c/m are summarised in Table 5, which range from 0.0352 ± 0.0177 (TSI-CEU) to 3.1492 ± 0.0250 (CHB-MEX), with a mean of 1.8153 ± 0.9016 over population pairs (except for a negative value for the CHB-CHD pair). The mutation rate is generally smaller than the migration rate among African populations (0.2392 ± 0.0115~0.7601 ± 0.0055; Table 5), but is greater than the migration rate among non-African populations (1.2036 ± 0.5881) or among African and non-African populations (2.4655 ± 0.4384).

The estimate of μ_c/m is 2.0264 ± 2.1561 from the ratio (m_c + 3μ_c/2)/(m_s + 2μ_s) = 4.0396 ± 3.2341 in the 11 populations, and 2.0352 ± 2.0909 from (m_c + 3μ_c/2)/(m_s + 2μ_s) = 4.0529 ± 3.1364 in four populations (CEU, YRI, CHB, and JPT)^7,8 (average pairwise F_st(SNP) = 0.1265 ± 0.0675; average pairwise G_st(CNV) = 0.0345 ± 0.0158 in the present study). These estimates indicate that the mutation rate at CNV loci is generally about twice as large as the migration rate.
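The point estimates above can be reproduced with a short sketch. Equations (9) and (12)–(14) are not shown in this section, so the relations used here are assumptions: the joint-rate ratio is taken as (1/G_st − 1)/(1/F_st − 1), and μ_c/m is obtained by assuming equal migration rates at both marker types and a negligible SNP mutation rate. These forms happen to match the reported 11-population values, but they should not be read as the authors' exact equations; the function names are hypothetical.

```python
def joint_rate_ratio(fst_snp, gst_cnv):
    """Assumed form of (m_c + 3*mu_c/2) / (m_s + 2*mu_s)."""
    return (1.0 / gst_cnv - 1.0) / (1.0 / fst_snp - 1.0)


def mutation_to_migration_ratio(rate_ratio):
    """mu_c / m, assuming m_c = m_s = m and mu_s << m.

    Then (m + 1.5 * mu_c) / m = rate_ratio, so mu_c / m = (rate_ratio - 1) / 1.5.
    A rate_ratio below 1 yields a negative (uninterpretable) value.
    """
    return (rate_ratio - 1.0) / 1.5


if __name__ == "__main__":
    # Average pairwise values over the 11 populations, as reported in the text
    ratio = joint_rate_ratio(fst_snp=0.0956, gst_cnv=0.0255)
    print(f"joint-rate ratio: {ratio:.4f}")
    print(f"mu_c / m:         {mutation_to_migration_ratio(ratio):.4f}")
```

Under the same assumptions, a joint-rate ratio below 1 (as for the CHB-CHD pair) gives a negative μ_c/m, which is consistent with the negative value noted for that pair.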

Discussion

Our results indicate a closer population genetic relationship at CNV loci than at SNP sites among 11 HapMap Phase III populations. Previous reports indicate a similar pattern at specific loci among African, European and East Asian populations (HapMap Phase II data)⁴¹, or among HapMap Phase II populations (F_st = ~0.11 at the genome-wide SNP sites)⁴². A general similarity in relative population genetic structure at CNV loci and SNP sites is also reported with more populations (29) and fewer CNV loci (396) and individuals (405 in total), but the difference is not quantified¹³. LD analyses indicate that these CNV loci are essentially in linkage equilibrium except for a few overlapped loci. Epistasis does not exist for any pair of CNV loci, presuming that these CNV loci are not selectively neutral or equally additive in influencing fitness. This result is different from those at the genome-wide SNP sites where epistasis occurs among many intron SNPs³⁷. The results provide additional support for a recent report indicating that the 856 CNV loci are selectively neutral in each population²³. The evolutionary processes for the low level of population divergence are different from those at the nonsynonymous SNP sites with F_st < 5% where negative selection is thought to be involved³¹.

Note that our analyses are based on the three-allele system for describing the evolution at a CNV locus because the maximum number of allele copies is four in a diploid genotype. These 856 CNV loci are shown to exhibit neutrality among 1184 healthy individuals²³. A system of more than three alleles is needed when more than four allele copies occur in a genotype at any CNV locus. This could likely occur when fewer individuals are surveyed or when unhealthy individuals are included because the number of common CNV loci could become fewer with smaller sample sizes. Under this situation, a neutrality test at CNV loci is needed for small sample sizes, and the extent of population genetic divergence could be different from the results reported here. This needs further verification.

Nei’s genetic distance at the genome-wide CNV loci is generally comparable to those between human populations at the common protein or blood group loci⁴³. However, African populations have even smaller genetic divergence at CNV loci. In the process of mutation-drift at the 856 CNV loci²³, population differentiation is expected to occur more recently owing to the high mutation rates at CNV loci. The time estimates since divergence are much shorter than those for general population genetic divergence in humans estimated from common protein loci (~120 Kyrs between human populations⁴³), or than the postulated time (>100 Kyrs) for modern humans to leave Africa and colonize the rest of the world. Because the assumption for the relation D = 2μt (where D is Nei's genetic distance)^43,44 is violated due to the unequal effective population sizes among populations^45,46, the varying mutation rates among loci, and the finite number of alleles at a CNV locus (not the infinite-allele model)²³, the preceding estimates might provide a reference for the minimum divergence times.

Patterns of genetic divergence at CNV loci may reflect the historical divergence in forming modern human origins. The common pattern at both CNV and SNP loci is that the smallest genetic divergence is present among African populations, followed by that among non-African populations, and then that among African and non-African populations. Polymorphisms at CNV loci decrease from African to non-African populations. More alleles per CNV locus in African populations suggest a longer-term accumulation of mutants. These patterns are consistent with the Out-of-Africa model rather than with the multiregional model for modern human origins^47,48. Genetic drift effects reduce genetic diversity in non-African populations. Further inferences on the evolutionary processes occurring among non-African populations would require additional information besides the comparison of polymorphisms at CNV loci. Nevertheless, the genetic relationships among non-African populations show a clear separation of Asian populations from non-Asian populations. Evidence at genome-wide CNV loci supports the hypothesis that CHB and CHD have a very close genetic relationship. This is slightly different from the genetic relationships revealed by the patterns of zygotic and gametic LDs at the genome-wide SNP sites where JPT and CHD have a very close genetic relationship³⁷. Genetic drift effects could explain the relatively small differentiation in polymorphism at CNV loci in Asian and European populations. Both CHB and CHD have relatively smaller genetic drift effects than JPT⁴⁵, and hence have higher polymorphisms (1.50 vs 1.48 alleles per CNV locus). CEU probably has relatively smaller genetic drift effects than do CHB and JPT⁴⁵, and hence has more alleles per CNV locus (1.66 alleles per CNV locus). A relatively high level of polymorphism in MEX among non-African populations probably arises from an admixture of individuals with multiple distinct ancestries, which is consistent with previous explanations^37,49.

Because both mutation and migration reduce population genetic divergence⁵⁰, the combined patterns of genetic divergence at CNV and SNP loci provide us with an opportunity to address their relative roles. Previous reports⁵¹ indicate that the mutation rates at CNV loci are about 1.7 × 10⁻⁶ to 1.0 × 10⁻⁴, about 100 to 10000 times the point mutation rate at SNP sites (1.8–2.5 × 10⁻⁸). Fu et al.²⁹ indicate that the mutation rate for most CNV loci is of the order of 10⁻⁵ per CNV locus per generation. On average, a mutation rate of the order of 10⁻⁵ at the 856 CNV loci could be inferred from the estimate of the population-scaled mutation rate θ (= 4Nμ) = 0.1415 ± 0.0144²³, given N ~ 3000⁴⁵. Patterns of the μ_c/m estimates suggest that the mutation process plays a dominant role in shaping population genetic divergence at CNV loci, especially in the non-African populations (Table 5). The low μ_c/m in African populations could arise from their closer genetic relationships where the inter-population gene exchanges are historically more frequent or from natural evolutionary convergence where their genetic compositions become similar since ancestral populations. However, statistical tests indicate that the mutation-drift process can explain the variation at CNV loci in African populations, implying that the latter process could be the main reason for low genetic divergence²³.
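The order-of-magnitude inference from θ can be checked with a one-line calculation, sketched below. It uses only the relation θ = 4Nμ and the values quoted in the text (θ = 0.1415, N ≈ 3000); the function name is illustrative.

```python
def mutation_rate_from_theta(theta, effective_size):
    """Per-locus, per-generation mutation rate implied by theta = 4 * N * mu."""
    return theta / (4.0 * effective_size)


if __name__ == "__main__":
    mu = mutation_rate_from_theta(theta=0.1415, effective_size=3000)
    print(f"mu = {mu:.3e} per CNV locus per generation")  # of the order of 1e-5
```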

In comparison with the previous results (G_st ~ 0.11) at a few CNV loci¹⁰ (67 CNV loci and n = 270 in total) or at the locus of a specific gene CCL4L³² in four HapMap populations (YRI, CEU, and CHB + JPT), our investigation shows much lower population genetic divergence at the 856 CNV loci among these four populations (mean G_st = 0.0345 ± 0.0158; Table 2). This result indicates that the CNV loci shared among 1184 healthy individuals exhibit smaller population genetic divergence. Also, compared with the pairwise F_st across chromosomes at the genome-wide SNP sites (Fig. 2 in Baye⁸), a similarity in pattern at the genome-wide CNV loci exists (Figure S1). The difference is the presence of low population genetic divergence at CNV loci.

A caveat in the above inferences is that they are based on the assumption of equilibrium among the processes of mutation, drift, and migration at CNV and SNP loci in human populations. As with conventional population genetic analyses in different organisms, such an equilibrium might not be attained in reality, and a dynamic model of evolution is more realistic for further investigation. However, concerning the estimates of μ_c/m, the qualitative conclusion about the major effects of mutation on population genetic divergence cannot be rejected at the genome-wide CNV loci²⁹, especially in non-African populations.

Although small LDs are difficult to detect owing to the statistical power, very few CNV loci exhibit significant gametic and zygotic LDs from either the same or different chromosomes. This is different from the patterns at the genome-wide SNP sites (Hu and Hu³⁷ for zygotic LDs with the recombination rate <10%, Reich et al.³⁶ for gametic LDs with the recombination rate <16%, and Koch et al.⁵² for gametic LDs with the recombination rate >25%). The CNV loci on the same chromosomes (except a few overlapped loci) are distributed over a wide range of distances, with an average recombination rate of 3.3% (0~35%). The significant correlations among CNV loci do not exist across populations⁵³. The generally concordant pattern of no significant gametic and zygotic LDs provides no evidence for the presence of functionally epistatic CNV loci^26,27, different from the results at genome-wide SNP sites³⁷.

Patterns of LDs also suggest that the effects of mutation on reducing LDs are stronger than the effects of migration that increases LDs. The gametic LDs at CNV loci gradually decay with time in African populations, and the same is the case for the zygotic LDs at CNV loci⁵³, except for the overlapped CNV loci (but not for one pair of CNV loci on Chr 7 with a physical distance of 2658 bp that requires a longer time to decay). The gametic LDs at CNV loci initially formed by the founder effects in non-African populations also decay with time due to the mutation and recombination effects. The same is the case for the zygotic LDs³⁸. If recombination is the dominant process in eroding LDs, a certain proportion of CNV loci could maintain significant LD within very short distances except for overlapped loci. Such an expected pattern is not observed (Tables S4 and S5). High mutation rates causing low LDs between CNV and SNP loci are also discussed⁵⁴. Thus, the mutation effects could be greater than the recombination effects in eroding both gametic and zygotic LDs although recombination and mutation effects are both involved in reducing LDs⁵⁵.

Finally, our investigation suggests differential evolutionary processes at CNV and SNP loci along chromosomes. Although mosaic patterns occur in genome architecture in terms of different measures of genetic diversity or from different perspectives⁵³, the DNA segments with CNV loci themselves display individual blocks each with a small level of population genetic divergence. These blocks are different from the gametic or zygotic LD blocks at SNP sites since recombination within CNV loci should rarely occur. The LD blocks between CNV loci cannot be maintained due to the effects of the high mutation rates.

Methods

Genotype data at CNV loci

Genotype data in 11 HapMap Phase III populations, released by The International HapMap 3 Consortium, was downloaded from ftp://ftp.ncbi.nlm.nih.gov/hapmap/cnv_data/hm3_cnv_submission.txt. The data differs from most accessible data sets in that it provides the discrete copy numbers per CNV locus. The copy numbers at a CNV locus are derived through a two-step process according to Altshuler et al.³¹ The first step is to detect copy number variation on each chromosome by analyzing the probe-level intensity data from both the Affymetrix and Illumina arrays. QuantiSNP⁵⁶ and Birdseye⁵⁷ algorithms are used to identify CNV loci separately. Common CNV loci are further identified, and refined to ensure qualified copy number variant calls. The second step is to determine the discrete copy numbers for each CNV locus from the probe-level intensity data. CNVtools³³ and a two-dimensional model (Gaussian mixture)³¹, are used to infer the copy numbers from the maximum posterior likelihood function. A meta-approach combining the two algorithms and other criteria are used to further refine the discrete copy number classes to ensure reliable copy number estimates per diploid genomes. This second step for estimating the copy number per CNV locus is not conducted in most archived CNV data sets although later techniques for CNV locus detection are now more advanced.

Diploid genotypes were recorded in integers (0, 1, 2, 3, and 4): 0 for the genotype without any allele copy in both gametes, 1 for the genotype with one allele copy in one gamete but without any copy in the other gamete, 2 for the genotype with one allele copy in each gamete, 3 for the genotype with one allele copy in one gamete and two allele copies in the other gamete, and 4 for the genotype with two allele copies in each gamete. From the individual IDs in the HapMap project, eleven populations were extracted from the pooled data (hm3_cnv_submission.txt): ASW (African ancestry in Southwest USA), CEU (Utah residents with Northern and Western European ancestry from the CEPH collection), CHB (Han Chinese in Beijing, China), CHD (Chinese in Metropolitan Denver, Colorado), GIH (Gujarati Indians in Houston, Texas), JPT (Japanese in Tokyo, Japan), LWK (Luhya in Webuye, Kenya), MEX (Mexican ancestry in Los Angeles, California), MKK (Maasai in Kinyawa, Kenya), TSI (Toscans in Italy), and YRI (Yoruba in Ibadan, Nigeria). Sample size for each population is shown in Table 1. The number of CNV loci per Chr ranges from 11 on Chr 22 to 68 on Chr 2, with 856 common CNV loci in total. Mean size of CNV loci per Chr is ~0.02 Mb, ranging from 26 to 456897 bp. The physical distance between adjacent CNV loci per Chr is ~3.3 Mb on average, ranging from 0 (partially overlapped loci) to 34804235 bp. There are 29 CNV loci that are partially overlapped on chromosomes.

Allele frequency

Because the maximum number of allele copies is four at a CNV locus in the diploid genotype dataset of HapMap Phase III populations, a three-allele system is used to describe the genotype composition. Note that a system of more than three alleles is needed if the number of allele copies is more than 4 in a diploid genotype^23,58. Let A₀, A₁, and A₂ be the alleles with 0, 1, and 2 copies at a CNV locus, respectively. Allele A₁ may be the most abundant variant in a population (the segment on the reference genome), while alleles A₀ and A₂ are likely less abundant at a CNV locus. Owing to the lack of information needed to separate distinct genotypes with the same copy number in diploids, allele frequencies under Hardy-Weinberg equilibrium (HWE) were estimated using the expectation-maximization (EM) algorithm^23,29,59,60. Polymorphism was measured in terms of the number of observed alleles per CNV locus (N_a), the percentage of polymorphic loci, P(99%), and the genetic diversity in a population (1 − Σ_u p_u², where p_u is the frequency of the uth allele), which is equal to the expected heterozygosity (H_e) under HWE.
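The EM estimation of allele frequencies described above can be illustrated with a minimal sketch. It assumes HWE, the three-allele coding (A₀, A₁, A₂), and diploid copy numbers 0 to 4, of which only copy number 2 is ambiguous (A₁A₁ versus A₀A₂). The function names, the starting values, and the sample counts are hypothetical and are not taken from the article.

```python
from collections import Counter


def em_allele_freqs(copy_numbers, max_iter=1000, tol=1e-10):
    """EM estimates of the frequencies of A0, A1, A2 under HWE.

    copy_numbers: diploid copy numbers (0-4), one per individual.
    """
    counts = Counter(copy_numbers)
    n = len(copy_numbers)
    # Asymmetric start (assumption): A1 is taken as the most common allele.
    p = [0.25, 0.5, 0.25]
    for _ in range(max_iter):
        expected = [0.0, 0.0, 0.0]  # expected allele counts (E-step)
        for c, n_c in counts.items():
            # Ordered allele pairs (a, b) with a + b = c; only c = 2 is ambiguous.
            pairs = [(a, c - a) for a in range(3) if 0 <= c - a <= 2]
            weights = [p[a] * p[b] for a, b in pairs]
            total = sum(weights)
            if total == 0.0:
                continue
            for (a, b), w in zip(pairs, weights):
                expected[a] += n_c * w / total
                expected[b] += n_c * w / total
        new_p = [e / (2 * n) for e in expected]  # M-step
        delta = max(abs(x - y) for x, y in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return p


def expected_heterozygosity(p):
    """Genetic diversity, 1 - sum of squared allele frequencies."""
    return 1.0 - sum(x * x for x in p)


# Hypothetical sample of 100 individuals with copy numbers 0..4.
sample = [0] * 4 + [1] * 28 + [2] * 53 + [3] * 14 + [4] * 1
freqs = em_allele_freqs(sample)
diversity = expected_heterozygosity(freqs)
```

The hypothetical sample was constructed to match HWE proportions for allele frequencies (0.2, 0.7, 0.1), so the estimates should return close to those values.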

Genetic divergence

Population genetic differentiation was measured by G_st⁴⁴: G_st = 1 − H_s/H_t, where H_s is the mean of the expected heterozygosity (H_e) per locus over all populations and H_t is the expected heterozygosity per locus in the pooled population. The 95% confidence intervals (CIs) for G_st were derived using the bootstrapping approach. To relate the population genetic differentiation to the time since the populations diverged from a single ancestral population, genetic distance was measured⁴⁶. This distance develops under a specific evolutionary process. Nei’s genetic distance⁴⁴ was used to measure population genetic divergence: D = −ln(I), where I = Σ_l Σ_u p_lu1 p_lu2 / √(Σ_l Σ_u p_lu1² · Σ_l Σ_u p_lu2²), in which p_lu1 and p_lu2 are the frequencies of allele u at the lth locus in populations 1 and 2, respectively. Under the neutral process (mutation and genetic drift), Nei’s genetic distance is linearly related to the time since divergence (t), i.e. D = 2μt^44,46, and its approximate variance is V(t) = V(D)/(4μ²), given a mutation rate μ. Standard deviations for G_st and Nei’s genetic distance were calculated using the jackknife method⁴⁶.
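The two divergence measures can be sketched as follows. This is a minimal illustration under stated assumptions: populations are weighted equally when pooling, G_st is computed for a single locus, Nei's identity I is taken in its standard form (the sum of cross-products of allele frequencies over loci divided by the geometric mean of the within-population sums of squares), and the time conversion uses t = D/(2μ), which is consistent with the stated variance V(t) = V(D)/4μ². All function names and input values are hypothetical.

```python
import math


def expected_heterozygosity(p):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    return 1.0 - sum(x * x for x in p)


def g_st(freqs_by_pop):
    """G_st = 1 - H_s / H_t for one locus.

    freqs_by_pop: one allele-frequency list per population.
    Populations are weighted equally (an assumption of this sketch).
    """
    k = len(freqs_by_pop)
    n_alleles = len(freqs_by_pop[0])
    h_s = sum(expected_heterozygosity(p) for p in freqs_by_pop) / k
    pooled = [sum(p[u] for p in freqs_by_pop) / k for u in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    return 1.0 - h_s / h_t if h_t > 0 else 0.0


def nei_distance(pop1, pop2):
    """Nei's D = -ln(I); pop1, pop2 are lists (over loci) of frequency lists."""
    j12 = sum(a * b for p, q in zip(pop1, pop2) for a, b in zip(p, q))
    j11 = sum(a * a for p in pop1 for a in p)
    j22 = sum(b * b for q in pop2 for b in q)
    return -math.log(j12 / math.sqrt(j11 * j22))


def divergence_time(d, mu):
    """Generations since divergence, assuming D = 2 * mu * t."""
    return d / (2.0 * mu)


# Hypothetical frequencies of (A0, A1, A2) at one locus in two populations.
locus = [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]
gst_value = g_st(locus)
d_value = nei_distance([locus[0]], [locus[1]])
```

Averaging H_s and H_t over loci before taking the ratio, rather than averaging per-locus G_st values, is a common alternative that this sketch does not cover.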

LD tests

To assess the properties of CNV loci relevant for interpreting population genetic divergence, both the gametic and zygotic LDs were tested in each population. Assuming that CNV loci are involved in fitness, a comparison of the gametic LD with the maximum zygotic LD can be used to determine whether epistasis occurs among loci^37,38,39. If the maximum zygotic LD (high-order LD) is greater in value than the gametic LD (low-order LD), epistasis is inferred between loci; otherwise it is not (additive or neutral effects). This relationship has been applied to analyzing genome-wide SNP sites³⁷, providing evidence of epistasis among many intron SNP sites in each of the 11 populations. For a pair of CNV loci each with three alleles, there are nine types of two-locus (non-allelic) gametes. Let d_ij (i, j = 0, 1, 2) be the gametic LD between allele i at the first locus and allele j at the second locus, and p_ij be the gametic frequency in the population. The MLE of the frequency of each genotype pair (s, t), with s, t = 0, 1, 2, 3, 4, can be obtained using the direct counting method. An EM method is used to estimate the gametic frequency through an iterative calculation, which is described below:

where δ_ij, a Kronecker delta variable, is equal to 1 when i = j, and 0 when i≠j. Note that the E- and M-steps are combined into one formula in equation (1). Thus, given the initial gametic frequency p_ij (i, j = 0, 1, 2), the gametic frequency at the next step p′_uv can be calculated using equation (1). Then, replace p_ij in equation (1) with p′_uv and recalculate p′_uv at the next step. This iterative calculation is repeated until the convergence of gametic frequencies is attained.
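The iterative estimation of gametic frequencies can be illustrated with a generic EM sketch. This is not the authors' equation (1); it is a standard formulation that assumes random union of gametes, enumerates every ordered pair of gametes compatible with an observed copy-number pair (s, t), and redistributes each individual's two gametes in proportion to the current frequencies. The function name, starting table, and example data are hypothetical.

```python
from collections import Counter
from itertools import product


def em_gametic_freqs(genotype_pairs, max_iter=1000, tol=1e-10):
    """EM estimates of the 3 x 3 table of gametic frequencies p[i][j].

    genotype_pairs: (s, t) diploid copy numbers (0-4) at loci 1 and 2.
    """
    counts = Counter(genotype_pairs)
    n = len(genotype_pairs)
    # Starting table (assumption): linkage equilibrium with A1 most common.
    start = (0.25, 0.5, 0.25)
    p = [[start[i] * start[j] for j in range(3)] for i in range(3)]
    for _ in range(max_iter):
        expected = [[0.0] * 3 for _ in range(3)]
        for (s, t), n_st in counts.items():
            # Ordered gamete pairs ((i, j), (k, l)) with i + k = s, j + l = t.
            configs = [((i, j), (s - i, t - j))
                       for i, j in product(range(3), repeat=2)
                       if 0 <= s - i <= 2 and 0 <= t - j <= 2]
            weights = [p[g1[0]][g1[1]] * p[g2[0]][g2[1]] for g1, g2 in configs]
            total = sum(weights)
            if total == 0.0:
                continue
            for (g1, g2), w in zip(configs, weights):
                share = n_st * w / total
                expected[g1[0]][g1[1]] += share
                expected[g2[0]][g2[1]] += share
        new_p = [[expected[i][j] / (2 * n) for j in range(3)] for i in range(3)]
        delta = max(abs(new_p[i][j] - p[i][j])
                    for i in range(3) for j in range(3))
        p = new_p
        if delta < tol:
            break
    return p


# Hypothetical data with unambiguous phase: only gametes (0, 0) and (2, 2).
pairs = [(0, 0)] * 5 + [(4, 4)] * 5
table = em_gametic_freqs(pairs)
```

In this unambiguous example each gamete type takes frequency 0.5. With ambiguous pairs such as (2, 2), the result can depend on the starting table, so several starting points may be worth trying.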

The gametic LD, d_ij, is then estimated as d̂_ij = p̂_ij − p̂_i· p̂_·j, where p̂_i· and p̂_·j are the MLEs of the frequencies of allele i at the first locus and allele j at the second locus, respectively. A chi-square statistic with 1 degree of freedom (df) is used to test H₀: d_ij = 0⁴⁶, i.e.

R-square (r²) is used to measure gametic LD and ranges from 0 to 1. Appendix S1 gives the power calculation for the gametic LD test. The power follows a concave-upward curve as the allele frequency increases, because the variance under H₀ or under H₁ (d_ij ≠ 0) is greatest at intermediate allele frequencies. A large variance increases the uncertainty and hence reduces the power for a given sample size (n), significance level (α), and gametic LD. The power also increases as the sample size or the gametic LD increases.

Let D_ij be the zygotic LD between genotypes i at the first locus and j at the second locus (i, j = 0, 1, 2, 3, 4) in the population. The MLE of zygotic LD, D̂_ij, from a sample of size n can be obtained by D̂_ij = P̂_ij − P̂_i· P̂_·j, where P̂_ij is the MLE of the joint frequency of genotypes i at the first locus and j at the second locus, and P̂_i· (or P̂_·j) is the frequency of genotype i (or j). To test H₀: D_ij = 0, a chi-square statistic with 1 df is set as

The normalized r-square for zygotic LD ranges from 0 to 1^37,39,61. Appendix S2 derives the power calculation for the zygotic LD test. As with gametic LD, the power increases as the sample size or the zygotic LD increases. The power may be lower for testing zygotic LD than for testing gametic LD because the sample size is doubled (2n gametes from n individuals) in gametic LD tests.
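Both kinds of disequilibrium share the same arithmetic: a joint frequency minus the product of its two marginal frequencies. The sketch below applies that to a 5 × 5 table of joint genotype frequencies obtained by direct counting (zygotic LD) and to a 3 × 3 table of gametic frequencies (gametic LD). The normalization used for r², the squared disequilibrium divided by x(1 − x)y(1 − y) for the two marginal frequencies x and y, is an assumed standard form rather than a formula quoted from the article. Function names and data are hypothetical.

```python
import math


def joint_freqs(g1, g2, n_classes=5):
    """Joint genotype frequencies at two loci by direct counting."""
    n = len(g1)
    table = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(g1, g2):
        table[a][b] += 1
    return [[c / n for c in row] for row in table]


def disequilibrium(joint):
    """Return (D, r2) tables computed from a joint frequency table.

    D[i][j] = joint[i][j] - row_i * col_j.
    r2[i][j] = D[i][j]^2 / (row_i (1 - row_i) col_j (1 - col_j)),
    or NaN when a marginal frequency is 0 or 1 (assumed normalization).
    """
    rows = [sum(r) for r in joint]
    cols = [sum(r[j] for r in joint) for j in range(len(joint[0]))]
    d = [[joint[i][j] - rows[i] * cols[j] for j in range(len(cols))]
         for i in range(len(rows))]
    r2 = []
    for i, x in enumerate(rows):
        r2_row = []
        for j, y in enumerate(cols):
            denom = x * (1 - x) * y * (1 - y)
            r2_row.append(d[i][j] ** 2 / denom if denom > 0 else math.nan)
        r2.append(r2_row)
    return d, r2


# Zygotic LD from hypothetical copy-number genotypes at two loci.
zyg_d, zyg_r2 = disequilibrium(joint_freqs([0, 0, 4, 4], [0, 0, 4, 4]))

# Gametic LD from a hypothetical 3 x 3 table of gametic frequencies.
gametes = [[0.3, 0.1, 0.0], [0.1, 0.3, 0.0], [0.0, 0.0, 0.2]]
gam_d, gam_r2 = disequilibrium(gametes)
```

Each row and each column of the returned D table sums to zero, which is why a 3 × 3 gametic table has only four independent LDs and a 5 × 5 zygotic table only sixteen.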

The significance tests of gametic and zygotic LDs were conducted at the genome-wide level in each population, and hence a Bonferroni-adjusted p-value threshold was set as 0.05 divided by the number of all pairs of CNV loci across the 22 chromosomes. The threshold ranged from 1.88 × 10⁻⁷ to 6.91 × 10⁻⁷ owing to the different numbers of polymorphic loci in the 11 populations. To minimize the impact of minor allele frequency (MAF) on inflating the gametic LD test statistic or on increasing false-positive errors, alleles with frequencies outside the range [0.05, 0.95] in the samples were excluded from the gametic LD tests. For the same reason, genotypes with frequencies outside the range [0.05, 0.95] in the samples were excluded from the zygotic LD tests. Sample sizes ranging from 77 to 171 can provide appropriate statistical power for genotypic frequencies within the range [0.05, 0.95] (Appendix S2). Since the constraints Σ_i d_ij = Σ_j d_ij = 0 and Σ_i D_ij = Σ_j D_ij = 0 hold, only four gametic LDs and sixteen zygotic LDs were tested for each pair of CNV loci. Note that CNV loci were not filtered out by frequency except in this LD analysis.
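The threshold and the frequency filter can be written out as a short sketch. The function names are hypothetical; the locus count of 856 is the total number of common CNV loci reported for the data set, used here only to illustrate the arithmetic, since the per-population thresholds depend on the number of polymorphic loci in each population.

```python
def bonferroni_threshold(n_loci, alpha=0.05):
    """Significance threshold alpha / (number of unordered locus pairs)."""
    n_pairs = n_loci * (n_loci - 1) // 2
    return alpha / n_pairs


def within_range(freq, low=0.05, high=0.95):
    """True if an allele or genotype frequency is kept for LD testing."""
    return low <= freq <= high


# With all 856 loci there are 365,940 pairs of loci.
threshold_all_loci = bonferroni_threshold(856)
kept = [f for f in (0.01, 0.05, 0.40, 0.97) if within_range(f)]
```

Populations with fewer polymorphic loci have fewer pairs to test and therefore a less stringent threshold.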

Joint mutation and migration rates

Consider a neutral CNV locus with three alleles. Let μ_c be the mutation rate of one allele to either of the other two alleles at a CNV locus. The probability density function (pdf) for the allele frequency under an equilibrium among genetic drift, mutation, and migration effects can be approximated by synthesizing Kimura’s¹⁹ and Wright’s⁵⁰ work, i.e.

where N is the effective population size, m_c is the migration rate per generation for an allele at a CNV locus, Q is the migrant allele frequency, and θ_c (aka “population diversity”) is the population-scaled mutation rate (=4Nμ_c). F_st per locus is derived as

In practice, the population differentiation described by F_st⁶² is measured by G_st⁴⁴ for a three-allele locus.

Similarly, the pdf of allele frequency at a bi-allelic SNP locus under an equilibrium among genetic drift, mutation, and migration effects can be approximated by synthesizing Kimura’s¹⁹ and Wright’s⁵⁰ work,

where m_s is the migration rate per generation, Q is the migrant allele frequency, and θ_s is equal to 4Nμ_s in which μ_s is the mutation rate at an SNP locus. F_st per locus is derived as

The relative extent of genetic divergence at the genome-wide SNP sites versus at the genome-wide CNV loci is measured by the ratio of F_st(SNP)/G_st(CNV), and its standard deviation can be estimated from the variance approximation:

where F̄_st(SNP) and Ḡ_st(CNV) are the means of F_st(SNP) and G_st(CNV), respectively, and cov(F_st(SNP), G_st(CNV)) is the covariance between F_st(SNP) and G_st(CNV). The above expression is derived by the delta method⁶³. An estimate of the variance of the ratio can be approximated by assuming that the covariance, cov(F_st(SNP), G_st(CNV)), is negligible at the genome-wide scale. Correlations between CNV and SNP loci are weak, which could arise from the effects of transposition events, recurrent mutations/reversions, or the preference of CNV loci for regions with a low density of SNP sites on chromosomes^12,54.
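The variance approximation for a ratio can be sketched numerically. The code assumes the usual first-order delta-method form, V(X/Y) ≈ (x̄/ȳ)² [V(X)/x̄² + V(Y)/ȳ² − 2 cov(X, Y)/(x̄ȳ)], with the covariance defaulting to zero as in the genome-wide assumption above. The function name and the input values are hypothetical and are not estimates from the article.

```python
import math


def ratio_variance(mean_x, mean_y, var_x, var_y, cov_xy=0.0):
    """First-order delta-method approximation to the variance of X / Y."""
    ratio = mean_x / mean_y
    return ratio ** 2 * (var_x / mean_x ** 2
                         + var_y / mean_y ** 2
                         - 2.0 * cov_xy / (mean_x * mean_y))


# Hypothetical means and variances for a numerator X and a denominator Y.
var_ratio = ratio_variance(mean_x=2.0, mean_y=1.0, var_x=0.04, var_y=0.01)
sd_ratio = math.sqrt(var_ratio)
```

A positive covariance between numerator and denominator reduces the approximate variance, so setting it to zero is conservative when the two statistics are positively correlated.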

From equations (5) and (7), the ratio of the joint migration and mutation rates at CNV loci to those at SNP sites is estimated as

Similarly, the variance of this ratio can be estimated using the delta method⁶³. Let X = F_st(SNP)(1 − G_st(CNV)) and Y = G_st(CNV)(1 − F_st(SNP)). Again, assume that cov(F_st(SNP), G_st(CNV)) is negligible at the genome-wide scale. The variance V(X) is given by

V(Y) can be obtained by replacing F_st(SNP) and 1 − G_st(CNV) in equation (10) with G_st(CNV) and 1 − F_st(SNP), respectively. Similarly, V(XY) can be obtained by replacing 1 − G_st(CNV) in equation (10) with G_st(CNV). The covariance cov(X, Y) is given by

The variance of the ratio V(X/Y) can be estimated from the following expression,

The variance can be appropriately estimated by V(X/Y) in equation (12), especially when the sample sizes are large.

It is appropriate to assume that the migration rate is the same, on average, at the neutral CNV and SNP loci (m_c = m_s = m) although local variation might occur among loci (e.g., due to the genetic hitchhiking effects). Also, compared with the migration rate, the point mutation rate at the SNP sites can be neglected. Thus, the ratio of the mutation rate to the migration rate at CNV loci can be estimated:

The standard deviation of the μ_c/m estimate can be obtained according to equation (12), i.e.

Additional Information

How to cite this article: Hu, X.-S. et al. High mutation rates explain low population genetic divergence at copy-number-variable loci in Homo sapiens. Sci. Rep. 7, 43178; doi: 10.1038/srep43178 (2017).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Cavalli-Sforza, L. L. Population structure and human evolution. Proc. R. Soc. London Ser. B. 164, 362–79 (1966).
2. Cavalli-Sforza, L. L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes. Princeton, NJ: Princeton Univ. Press (1994).
3. Goldstein, D. B. & Chikhi, L. Human migrations and population structure: what we know and why it matters. Annu. Rev. Genom. Hum. Genet. 3, 129–152 (2002).
4. Underhill, P. A. & Kivisild, T. Use of y chromosome and mitochondrial DNA population structure in tracing human migrations. Annu. Rev. Genet. 41, 539–64 (2007).
5. Stewart, J. B. & Chinnery, P. F. The dynamics of mitochondrial DNA heteroplasmy: implications for human health and disease. Nat. Rev. Genet. 16, 530–542 (2015).
6. Duan, S., Zhang, W., Cox, N. J. & Dolan, M. R. FstSNP-HapMap3: a database of SNPs with high population differentiation for HapMap3. Bioinformation 3(3), 139–141 (2008).
7. The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
8. Baye, T. M. Inter-chromosomal variation in the pattern of human population genetic structure. Hum. Genomics 5(4), 220–240 (2011).
9. Auton, A. et al. Global distribution of genomics diversity underscores rich complex history of continental human populations. Genome Res. 19, 795–803 (2009).
10. Redon, R. et al. Global variation in copy number in human genome. Nature 444(7118), 444–54 (2006).
11. Zarrei, M., MacDonald, J. R., Merico, D. & Scherer, S. W. A copy number variation map of the human genome. Nat. Rev. Genet. 16, 172–183 (2015).
12. Sjödin, P. & Jakobsson, M. Population genetic nature of copy number variation. Methods Mol. Biol. 838, 209–223 (2012).
13. Jakobsson, M. et al. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451(7181), 998–1003 (2008).
14. Kato, M. et al. Population-genetic nature of copy number variations in the human genome. Hum. Mol. Genet. 19(5), 761–773 (2010).
15. Beckmann, J. S., Estivill, X. & Antonarakis, S. E. Copy number variants and genetic traits: closer to the resolution of phenotypic to genotypic variability. Nat. Rev. Genet. 8, 639–646 (2007).
16. Yang, T. L. et al. Genome-wide copy-number-variation study identified a susceptibility gene, UGT2B17, for osteoporosis. Am. J. Hum. Genet. 83(6), 663–74 (2008).
17. Stankiewicz, P. & Lupski, J. R. Structural variation in the human genome and its role in disease. Annu. Rev. Med. 61, 437–55 (2010).
18. Emerson, J. J., Cardoso-Moreira, M., Borevitz, J. O. & Long, M. Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science 320(5883), 1629–1631 (2008).
19. Kimura, M. Genetic variability maintained in a finite population due to mutational production of neutral and nearly neutral isoalleles. Genet. Res. 11, 247–269 (1968).
20. Ohta, T. & Kimura, M. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22, 201–204 (1973).
21. Gazave, E. et al. Copy number variation analysis in the great apes reveals species-specific patterns of structural variation. Genome Res. 21, 1626–1639 (2011).
22. Ezawa, K. & Innan, H. Theoretical framework of population genetics with somatic mutations taken into account: application to copy number variations in humans. Heredity 111(5), 364–374 (2013).
23. Hu, X. S., Hu, Y. & Chen, X. Y. Testing neutrality at copy-number-variable loci under the finite-allele and finite-site models. Theor. Popul. Biol. 112, 1–13 (2016).
24. Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science 305(5683), 525–528 (2004).
25. Crawford, D. C., Akey, D. T. & Nickerson, D. A. The patterns of natural variation in human genes. Annu. Rev. Genomics Hum. Genet. 6, 287–312 (2005).
26. Yim, S. H. et al. Copy number variations in East-Asian population and their evolutionary and functional implications. Hum. Mol. Genet. 19, 1001–1008 (2010).
27. Hastings, P. J., Lupski, J. R., Rosenberg, S. M. & Ira, G. Mechanisms of change in gene copy number. Nat. Rev. Genet. 10(8), 551–64 (2009).
28. Conrad, D. F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010).
29. Fu, W., Zhang, F., Wang, Y., Gu, X. & Jin, L. Identification of copy number variation hotspots in human populations. Am. J. Hum. Genet. 87, 494–504 (2010).
30. Kimura, M. & Ohta, T. The age of a neutral mutant persisting in a finite population. Genetics 75, 199–212 (1973).
31. Altshuler, D. M. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
32. Colobran, R. et al. Population structure in copy number variation and SNPs in the CCL4L chemokine gene. Genes Immun. 9, 279–288 (2008).
33. Barnes, C. et al. A robust statistical method for case-control association testing with copy number variation. Nature Genet. 40, 1245–1252 (2008).
34. Rogers, R. L. et al. Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans. Mol. Biol. Evol. 31, 1750–1766 (2014).
35. Rogers, R. L. et al. Tandem Duplications and the Limits of Natural Selection in Drosophila yakuba and Drosophila simulans. PLoS One 10(7), e0132184 (2015).
36. Reich, D. E. et al. Linkage disequilibrium in the human genome. Nature 411, 199–204 (2001).
37. Hu, X. S. & Hu, Y. Genomic scans of zygotic disequilibrium and epistatic SNPs in HapMap phase III populations. PLoS One 10(6), e0131039 (2015).
38. Hu, X. S. Evolution of zygotic linkage disequilibrium in a finite local population. PLoS One 8, e80538 (2013).
39. Hu, X. S. & Yeh, F. C. Assessing postzygotic isolation using zygotic disequilibrium in natural hybrid zones. PLoS One 9, e100568 (2014).
40. Felsenstein, J. PHYLIP - Phylogeny inference package (version 3.2). Cladistics 5, 164–166 (1989).
41. Wu, D. D. & Zhang, Y. P. Different level of population differentiation among human genes. BMC Evol. Biol. 11, 16 (2011).
42. Barreiro, L. B., Laval, G., Quach, H. L., Patin, E. & Quintana-Murci, L. Natural selection has driven population differentiation in modern humans. Nat. Genet. 40, 340–345 (2008).
43. Nei, M. The theory of genetic distance and evolution of human races. Jap. J. Hum. Genet. 23, 341–369 (1978).
44. Nei, M. Molecular population genetics and evolution. North-Holland Publishing Company, Amsterdam (1975).
45. Tenesa, A. et al. Recent human effective population size estimated from linkage disequilibrium. Genome Res. 17, 520–526 (2007).
46. Weir, B. S. Genetic Data Analysis II. Sinauer Associates: Sunderland, MA (1996).
47. Tattersall, I. Human origins: Out of Africa. Proc. Natl. Acad. Sci. USA 106, 16018–16021 (2009).
48. Wolpoff, M. H. Interpretations of multiregional evolution. Science 274, 704–707 (1996).
49. Schwartz-Marín, E. & Silva-Zolezzi, I. The Map of the Mexican’s Genome: overlapping national identity, and population genomics. IDIS 3, 489–514 (2010).
50. Wright, S. Evolution and the genetics of populations. Vol. 2, The Theory of Gene Frequencies. Chicago, IL: The University of Chicago Press (1969).
51. Zhang, F., Gu, W. L., Hurles, M. E. & Lupski, J. R. Copy number variation in human health, disease, and evolution. Annu. Rev. Genomics Hum. Genet. 10, 451–481 (2009).
52. Koch, E., Ristroph, M. & Kirkpatrick, M. Long range linkage disequilibrium across the human genome. PLoS One 8(12), e80754 (2013).
53. Hu, X. S., Yeh, F. C. & Wang, Z. Structural genomics: Correlation blocks, population structure, and genome architecture. Curr. Genomics 12, 55–70 (2011).
54. Sudmant, P. H. et al. Global diversity, population stratification, and selection of human copy-number variation. Science 349(6253), aab3761 (2015).
55. Ohta, T. & Kimura, M. Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation. Genetics 63, 229–238 (1969).
56. Colella, S. et al. QuantiSNP: an objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 35, 2013–2025 (2007).
57. Korn, J. M. et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genet. 40, 1253–1260 (2008).
58. Handsaker, R. E. et al. Large multiallelic copy number variations in humans. Nature Genet. 47(3), 296–303 (2015).
59. Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B 39, 1–38 (1977).
60. Nicholas, T. J., Baker, C., Eichler, E. E. & Akey, J. M. A high-resolution integrated map of copy number polymorphisms within and between breeds of modern domesticated dog. BMC Genomics 12, 414 (2011).
61. Yang, R. C. Gametic and zygotic associations. Genetics 165, 447–450 (2003).
62. Wright, S. The genetic structure of populations. Ann. Eugenics. 15, 323–354 (1951).
63. Lynch, M. & Walsh, B. Genetics and analysis of quantitative traits. Sinauer Associates, Inc. Publishers, Sunderland, Massachusetts, 01375, USA (1997).

Acknowledgements

We sincerely thank Inke König and three anonymous reviewers for very helpful comments on this article. The work was supported by startup funding from South China Agricultural University to XSH (4400-K16013), and by the Forest Sciences and Technology Innovation Project in Guangdong Province to XYC (2015KJCX009).

Author information

Authors and Affiliations

Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong, 510642, China
Xin-Sheng Hu, Li-Ting Deng & Xiaoyang Chen
College of Forestry and Landscape Architecture, South China Agricultural University, Guangdong, 510642, China
Xin-Sheng Hu, Li-Ting Deng & Xiaoyang Chen
Department of Renewable Resources, 751 General Service Building, University of Alberta, Edmonton, T6G 2H1, AB, Canada
Francis C. Yeh
Department of Computing Science, University of Alberta, Edmonton, T6G 2S4, AB, Canada
Yang Hu
Institute of Evolutionary Biology, Ashworth Laboratories, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JT, United Kingdom
Richard A. Ennos

Contributions

X.S.H. conceived and designed the study. X.S.H. analyzed data and wrote the manuscript. Y.H. analyzed the data. L.T.D. provided logistic assistance. F.C.Y., R.A.E. and X.Y.C. revised the manuscript. All authors approved the manuscript.

Corresponding authors

Correspondence to Xin-Sheng Hu or Xiaoyang Chen.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Dataset 1 (XLS 449 kb)

Supplementary Dataset 2 (XLS 378 kb)

Supplementary Dataset 3 (XLS 236 kb)

Supplementary Tables, Appendices and Figure S1 (PDF 698 kb)

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Received: 10 August 2016
Accepted: 19 January 2017
Published: 22 February 2017
DOI: https://doi.org/10.1038/srep43178
