Copy number variations and genetic admixtures in three Xinjiang ethnic minority groups

Lou, Haiyi; Li, Shilin; Jin, Wenfei; Fu, Ruiqing; Lu, Dongsheng; Pan, Xinwei; Zhou, Huaigu; Ping, Yuan; Jin, Li; Xu, Shuhua

doi:10.1038/ejhg.2014.134

Article
Published: 16 July 2014

Haiyi Lou^1,2^na1,
Shilin Li³^na1,
Wenfei Jin¹,
Ruiqing Fu¹,
Dongsheng Lu¹,
Xinwei Pan³,
Huaigu Zhou⁴,
Yuan Ping⁴,
Li Jin^2,3,4 &
…
Shuhua Xu^1,2

European Journal of Human Genetics volume 23, pages 536–542 (2015)

Subjects

Genetic markers

Abstract

Xinjiang is geographically located in central Asia, and it has played an important historical role in connecting eastern Eurasian (EEA) and western Eurasian (WEA) people. However, human population genomic studies in this region have been largely underrepresented, especially with respect to studies of copy number variations (CNVs). Here we constructed the first CNV map of the three major ethnic minority groups, the Uyghur, Kazakh and Kirgiz, using Affymetrix Genome-Wide Human SNP Array 6.0. We systematically compared the properties of CNVs we identified in the three groups with the data from representatives of EEA and WEA. The analyses indicated a typical genetic admixture pattern in all three groups with ancestries from both EEA and WEA. We also identified several CNV regions showing significant deviation of allele frequency from the expected genome-wide distribution, which might be associated with population-specific phenotypes. Our study provides the first genome-wide perspective on the CNVs of three major Xinjiang ethnic minority groups and has implications for both evolutionary and medical studies.

Introduction

Copy number variations (CNVs) are usually defined as DNA segments larger than 1 kb that present different numbers of copies among two or more genomes. By changing gene dosage, CNVs can affect gene expression and phenotype.¹ Associations have also been found between CNVs and several neurological diseases and complex traits, including autism, schizophrenia, HIV susceptibility, Crohn’s disease, and psoriasis.² Although it is still unknown how much of the human genome is covered with CNVs, it is evident that genomic regions with CNVs harbor a considerable proportion of genetic diversity both within and among populations.

Xinjiang has played an important role in connecting eastern Eurasia (EEA) and western Eurasia (WEA). It was crossed by the famous Silk Road, which linked trade among East Asia, Central Asia, and Europe. Many ethnic groups, including Uyghur (UIG), Kazakh (KZK), and Kirgiz (KGZ), have lived there for hundreds of years. Given the region's pivotal geographic role in history and certain phenotypic features of the people living there, it is reasonable to hypothesize that the ethnic groups in Xinjiang share genetic components with EEA and WEA. Previous studies using Y-chromosome data have indicated an admixture pattern among Eurasian populations in Central Asia, which was shaped by migration.³ Recently, studies using genomic SNP data have supported the conclusion that UIG is an admixed population incorporating both EEA and WEA ancestries.^{4, 5} Genetic analysis of Xinjiang ethnic groups may reveal more about human migration history and the mechanisms underlying admixture, which could be useful for gene mapping and disease-gene discovery. Indeed, admixture mapping has been shown to be of great value in studies of African Americans.^{4, 6}

Although CNV distribution and properties have been widely studied in several populations,⁷ those in Xinjiang admixed populations have been largely understudied. Here we constructed the first CNV map of the three major Xinjiang populations (KZK, KGZ, and UIG) using Affymetrix Genome-Wide Human SNP Array 6.0. Based on this map, we examined genomic admixture patterns of Xinjiang populations and identified several regions that deviated from genome-wide admixture patterns, as well as regions showing significant frequency differences from those of ancestral source populations. We performed further functional annotation analysis to explore the evolutionary and medical implications of these ‘outlier’ CNV regions (CNVRs).

Materials and methods

Samples and CNV detection

We collected 48 samples for each of the three Xinjiang populations. Each individual was the offspring of non-consanguineous marriages between members of the same ethnic group over three generations. Informed consent was obtained from all participants. The procedures followed were in accordance with the ethical standards of the Responsible Committee on Human Experimentation (approved by the Ethical Committee of Fudan University) and the Helsinki Declaration of 1975, as revised in 2000. All samples were assayed using Affymetrix Genome-Wide Human SNP Array 6.0. We used Birdsuite⁸ to genotype CNVs. Quality control was the same as in our previous study.⁹ Samples that did not pass quality control were removed from subsequent analysis. We also removed loci that showed a difference (F_ST>0) between the combined calling results (all samples) and the separate calling results (three ethnic groups). In addition, we included the Affymetrix SNP 6.0 data of 89 Han Chinese from Beijing (CHB) and 110 unrelated Utah residents with northern and western European ancestry (CEU) from the HapMap project in the analysis and used them as representative ancestral source populations of EEA and WEA, respectively.

The original results of Birdsuite were based on human genome assembly build 36 (hg18); here we converted all coordinates from hg18 to hg19. Variants that could not be successfully mapped to hg19 were removed. Because it is difficult for microarrays to determine the exact breakpoints of variants, the coordinates of variants in this study are approximate boundaries: the first and the last probe positions in the CNV. The individual variant data are available at dbVar under accession number nstd97.

Building the CNV map of Xinjiang population

A CNVR is defined as a union region of overlapping CNVs on the chromosome.¹⁰ CNVs from different samples with any amount of overlap were merged by extending the boundaries of the overlapping CNVs. The Birdsuite output file (birdseye_canary_calls) was used to generate a CNVR map. Each CNVR was assigned a genotype (ranging from the 0-copy to the 4-copy state), which was determined using the copy state of the variant with the largest length. Given a set of samples, biallelic CNVs were defined as loci with only two observable alleles (either 0-copy and 1-copy or 1-copy and 2-copy). We compared our Xinjiang CNV map to the Database of Genomic Variants (DGV v10)¹¹ using the criterion of any amount of overlap. Gene functional annotation clustering analysis was performed via DAVID.¹²
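The merging step described above can be sketched as a standard interval union. This is a minimal illustration, not the authors' pipeline: the tuple layout, the inclusive coordinates, and the function name `merge_cnvs` are assumptions made for the example.

```python
from typing import List, Tuple

# (chromosome, start, end) with inclusive coordinates; this layout is assumed.
Interval = Tuple[str, int, int]

def merge_cnvs(cnvs: List[Interval]) -> List[Interval]:
    """Merge CNV calls that overlap by any amount into union regions (CNVRs)."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(cnvs):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the current region: extend its right boundary.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # No overlap: start a new region.
            merged.append((chrom, start, end))
    return merged

# Hypothetical calls pooled from several samples.
calls = [("chr1", 100, 500), ("chr1", 400, 900),
         ("chr1", 1200, 1500), ("chr2", 100, 300)]
print(merge_cnvs(calls))
```

Sorting first means each call only has to be compared with the most recent region, so the union is built in a single pass.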

Calculation of allele frequency and F_ST

At genotype level, there are five copy states generated by Birdsuite: the 0-copy state (homozygous deletion), 1-copy state (heterozygous deletion), 2-copy state (normal state/copy-neutral with LOH), 3-copy state (single copy duplication), and 4-copy state (double copy duplication). At allele level, a three-allele system (0 copy-allele, loss-allele; 1 copy-allele, normal-allele; 2 copy-allele, gain-allele) is sufficient to explain these five genotypic states. We used an Expectation-Maximization algorithm to calculate the allele frequency by assuming that each CNV locus was in Hardy–Weinberg equilibrium, as in our previous study.⁹ For each deletion or duplication allele in the admixed populations, the expected allele frequency was calculated as the sum of allele frequency of each ancestral source population weighted by its admixture proportion. Population differentiation for each CNV locus between each pair of populations was measured using the widely used pairwise F_ST¹³ based on the inferred allele frequency.
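To make the allele-frequency step concrete, the following is a minimal sketch of an Expectation-Maximization estimate under Hardy-Weinberg equilibrium for the three-allele system. It is an illustration under stated assumptions rather than the authors' implementation: the function name `em_allele_freq`, the starting values, and the convergence settings are all hypothetical. The only ambiguous genotype is the 2-copy state, which can arise from two normal alleles or from one loss allele plus one gain allele.

```python
def em_allele_freq(counts, n_iter=200, tol=1e-10):
    """Estimate (p_loss, p_normal, p_gain) from copy-state counts.

    counts[c] is the number of individuals with total copy number c (c = 0..4).
    Assumes Hardy-Weinberg equilibrium; starting values are arbitrary choices.
    """
    n = sum(counts)
    p0, p1, p2 = 0.1, 0.8, 0.1
    for _ in range(n_iter):
        # E-step: split 2-copy individuals between genotypes 1/1 and 0/2.
        w11 = p1 * p1
        w02 = 2.0 * p0 * p2
        frac02 = w02 / (w11 + w02) if (w11 + w02) > 0 else 0.0
        n02 = counts[2] * frac02
        n11 = counts[2] - n02
        # M-step: count alleles implied by the (partly expected) genotypes.
        c0 = 2 * counts[0] + counts[1] + n02
        c1 = counts[1] + 2 * n11 + counts[3]
        c2 = n02 + counts[3] + 2 * counts[4]
        new = (c0 / (2 * n), c1 / (2 * n), c2 / (2 * n))
        converged = max(abs(a - b) for a, b in zip(new, (p0, p1, p2))) < tol
        p0, p1, p2 = new
        if converged:
            break
    return p0, p1, p2

# Hypothetical locus: 1 homozygous deletion, 18 heterozygous deletions, 81 two-copy.
print(em_allele_freq([1, 18, 81, 0, 0]))
```

Because the likelihood can have more than one local maximum when both loss and gain alleles are present, the result may depend on the starting values; that sensitivity is not explored here.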

Evaluation of the influence of sample size on the estimation of allele frequency

We used HapMap samples to evaluate the influence of sample size on the estimation of allele frequency. A total of 167 CEU, 89 CHB, and 175 Yoruba in Ibadan (Nigeria) from HapMap were included. The allele frequency of the whole data set in each population was used as a standard reference allele frequency. We randomly sampled 10, 20, 30, 40, 50, 60, 70, and 80 individuals from each population. The difference proportions were calculated as the difference between the sampling allele frequency and the standard frequency divided by the standard allele frequency. The sampling process was repeated 100 times, and we took the average difference proportion of these 100 samplings in each frequency bin for common deletions and duplications separately.
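The resampling procedure above can be sketched as follows for a single biallelic locus. This is a simplified illustration: genotypes are coded as the number of variant alleles per individual (0, 1, or 2), the absolute value of the difference is an assumption, and the function names and toy data are hypothetical.

```python
import random

def allele_freq(allele_counts):
    """Variant allele frequency from per-individual allele counts (0, 1, or 2)."""
    return sum(allele_counts) / (2 * len(allele_counts))

def mean_difference_proportion(allele_counts, sample_size, n_rep=100, seed=0):
    """Average |sampled - reference| / reference over n_rep random subsamples."""
    rng = random.Random(seed)
    ref = allele_freq(allele_counts)  # whole-data-set frequency as the standard
    diffs = []
    for _ in range(n_rep):
        subsample = rng.sample(allele_counts, sample_size)  # without replacement
        diffs.append(abs(allele_freq(subsample) - ref) / ref)
    return sum(diffs) / n_rep

# Hypothetical population of 90 individuals at one deletion locus.
population = [0] * 50 + [1] * 30 + [2] * 10
for size in (10, 20, 30, 40, 50, 60, 70, 80):
    print(size, round(mean_difference_proportion(population, size), 3))
```

In the study the averages were taken within frequency bins across many loci; the sketch shows only the per-locus calculation that such a binning would aggregate.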

Reconstruction of population relationship and population structure analysis

We used the average number of different genotypes between two individuals from different populations to characterize the genetic distance among populations, as described in our previous work.⁹ Based on the pairwise distances among populations, we reconstructed the genetic relationships among these populations using the neighbor-joining method. We also performed 100 bootstrap replications and used PHYLIP¹⁴ to generate a consensus clustering topology.
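One plausible reading of this distance is the mean number of loci at which two individuals, one from each population, carry different copy-number genotypes. The sketch below implements that reading; the exact definition is given in the authors' earlier work, so the function name `pairwise_distance` and the data layout are assumptions.

```python
def pairwise_distance(pop_a, pop_b):
    """Mean number of loci with differing genotypes over all cross-population pairs.

    pop_a and pop_b are lists of individuals; each individual is a list of
    copy-number genotypes (0..4), with loci in the same order for everyone.
    """
    total = 0
    for ind_a in pop_a:
        for ind_b in pop_b:
            total += sum(ga != gb for ga, gb in zip(ind_a, ind_b))
    return total / (len(pop_a) * len(pop_b))

# Hypothetical genotypes at three loci for two individuals per population.
pop_a = [[2, 2, 1], [2, 1, 1]]
pop_b = [[2, 2, 2], [1, 2, 2]]
print(pairwise_distance(pop_a, pop_b))  # 2.0
```

A matrix of such distances for all population pairs is the kind of input a neighbor-joining program expects.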

Population structure was inferred using two approaches, principal component analysis (PCA, software Eigensoft v2.0¹⁵) and a model-based clustering method STRUCTURE v2.2.¹⁶ Because microarrays only provide information regarding the total number of copies in each CNV, and the allelic copy number states cannot be determined for multiallelic CNVs at individual level, we used biallelic CNVs to analyze the structures of the populations. STRUCTURE was run with 10 000 iterations and 10 000 burn-ins under K=2.

Selection of ancestral informative markers (AIMs) and local ancestry inference

Given that the unknown allelic configuration of multiallelic CNVs can affect estimates of ancestral proportions, and that duplications are genotyped relatively less accurately on the microarray platform,¹⁷ we focused AIM selection on biallelic CNVs. Biallelic loci with F_ST values greater than 0.03 (top 9%) between ancestral source populations were selected as AIMs. We tried to identify the ancestral origins of the CNVs in the admixed populations by superimposing the CNVs onto the ancestry map generated from SNPs using the software HAPMIX v1.1.¹⁸ The ancestral state of each CNV was determined by that of the most adjacent SNP.
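The last step, assigning each CNV the ancestry of its most adjacent SNP, can be sketched with a binary search over sorted SNP positions. Measuring adjacency from the CNV midpoint is an assumption made here for simplicity, and the function name, argument layout, and ancestry labels are hypothetical; the ancestry calls themselves would come from a tool such as HAPMIX.

```python
import bisect

def nearest_snp_ancestry(cnv_start, cnv_end, snp_positions, snp_ancestry):
    """Return the ancestry label of the SNP closest to the CNV midpoint.

    snp_positions must be sorted in ascending order and non-empty;
    snp_ancestry[i] is the local-ancestry label inferred at snp_positions[i].
    """
    mid = (cnv_start + cnv_end) / 2
    i = bisect.bisect_left(snp_positions, mid)
    # Only the SNPs immediately left and right of the midpoint can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(snp_positions)]
    best = min(candidates, key=lambda j: abs(snp_positions[j] - mid))
    return snp_ancestry[best]

# Hypothetical SNP ancestry map on one chromosome of one individual.
positions = [100, 500, 900]
ancestry = ["EEA/EEA", "EEA/WEA", "WEA/WEA"]
print(nearest_snp_ancestry(420, 460, positions, ancestry))   # EEA/WEA
print(nearest_snp_ancestry(950, 1000, positions, ancestry))  # WEA/WEA
```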

Calculation of heterozygosity

We randomly sampled 40 individuals from each population and calculated the heterozygosity at polymorphic site k as follows:

$$H_k = 1 - \sum_{i} p_{ik}^{2}$$

where p_ik denotes the frequency of the ith allele at site k under a three-allele system. We calculated the average heterozygosity for each population. Statistical tests between each admixed population and each ancestral source population were performed using the Wilcoxon test.
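As a worked illustration, expected heterozygosity can be computed from allele frequencies using the standard definition of one minus the sum of squared allele frequencies, and then averaged over sites. Whether the authors applied any sample-size correction is not stated, so none is used here; the function names and example frequencies are hypothetical.

```python
def site_heterozygosity(freqs):
    """Expected heterozygosity at one site: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def mean_heterozygosity(sites):
    """Average expected heterozygosity over a list of per-site frequency tuples."""
    return sum(site_heterozygosity(f) for f in sites) / len(sites)

# Hypothetical (loss, normal, gain) allele frequencies at three polymorphic sites.
sites = [(0.10, 0.90, 0.00), (0.00, 0.95, 0.05), (0.02, 0.95, 0.03)]
print(round(mean_heterozygosity(sites), 4))
```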

Estimation of linkage disequilibrium (LD) between CNVs and SNPs

We focused on LD between common biallelic CNVs (CNV occurrence of more than 10% in each of the five populations: CEU, CHB, KZK, KGZ, and UIG) and their neighboring SNPs within 20 kb of the CNV boundaries. The CNV and SNP haplotypes were phased using polyHap.¹⁹ LD was calculated as r², and SNPs with maximum r²>0.8 were labeled as CNV tags.
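Once haplotypes are phased, r² between a biallelic CNV and a SNP follows from haplotype frequencies. The sketch below assumes each haplotype is coded 0/1 at both the CNV and the SNP, and uses hypothetical function names and toy data; phasing itself (done with polyHap in the study) is outside its scope.

```python
def r_squared(cnv_hap, snp_hap):
    """r^2 between two biallelic loci from phased haplotypes coded 0/1."""
    n = len(cnv_hap)
    p_a = sum(cnv_hap) / n                                  # CNV allele frequency
    p_b = sum(snp_hap) / n                                  # SNP allele frequency
    p_ab = sum(a * b for a, b in zip(cnv_hap, snp_hap)) / n  # joint haplotype freq
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom

def is_tagged(cnv_hap, snp_haps, threshold=0.8):
    """True if the best neighboring SNP exceeds the r^2 threshold."""
    return max(r_squared(cnv_hap, s) for s in snp_haps) > threshold

# Hypothetical phased haplotypes: one CNV and two nearby SNPs.
cnv = [0, 1, 0, 1, 0, 0, 1, 0]
snps = [[0, 1, 0, 1, 0, 0, 1, 0], [0, 0, 1, 1, 0, 1, 0, 0]]
print([round(r_squared(cnv, s), 3) for s in snps], is_tagged(cnv, snps))
```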

Identification of ancestry-biased CNVs and population-specific CNVs

Ancestry-biased F_ST was calculated between the observed and expected allele frequencies in admixed populations. Ancestry-biased F_ST was ranked according to its value, and ancestry-biased CNVs were defined as CNVRs in the top 0.5 percentile. We defined population-specific CNVs as those loci showing significant differences (P<0.01) in allele frequency in admixed populations compared with the reference populations at different levels. The sample size was controlled at 40 individuals for each population, and the differences in allele frequency were compared using the χ²-test.
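The comparison of observed and expected frequencies can be sketched in two steps: compute the expected frequency as an admixture-weighted sum of the ancestral frequencies, then measure its divergence from the observed frequency with an F_ST-like statistic. The estimator below is a simple heterozygosity-based form, (H_T - H_S) / H_T, chosen for brevity; it is not the Weir and Hill estimator cited in the Methods, and all names and example numbers are hypothetical.

```python
def expected_freqs(freqs_eea, freqs_wea, eea_proportion):
    """Admixture-weighted allele frequencies for a two-way admixed population."""
    w = 1.0 - eea_proportion
    return [eea_proportion * e + w * x for e, x in zip(freqs_eea, freqs_wea)]

def simple_fst(freqs_a, freqs_b):
    """Heterozygosity-based F_ST between two allele-frequency vectors."""
    def het(freqs):
        return 1.0 - sum(p * p for p in freqs)
    pooled = [(a + b) / 2 for a, b in zip(freqs_a, freqs_b)]
    h_t = het(pooled)
    h_s = (het(freqs_a) + het(freqs_b)) / 2
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Hypothetical (loss, normal, gain) frequencies at one locus.
eea = [0.00, 1.00, 0.00]
wea = [0.00, 0.56, 0.44]
observed = [0.00, 0.87, 0.13]
expected = expected_freqs(eea, wea, eea_proportion=0.45)
print([round(p, 3) for p in expected], round(simple_fst(observed, expected), 4))
```

Ranking this statistic across all CNVRs and keeping the top 0.5 percentile would then mirror the outlier definition given above.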

Results

Profiles of CNVs in the three Xinjiang populations

We performed analyses in three Xinjiang ethnic groups (KGZ, KZK, and UIG) with two reference populations (CHB and CEU, as representative populations of EEA and WEA, respectively) from HapMap. Altogether, 46 KGZ, 44 KZK, 41 UIG, 89 CHB, and 110 CEU were included in the analysis and a total of 19 842 CNV events were detected in the three Xinjiang groups. KGZ and UIG carried significantly more CNVs per individual than CHB and CEU (P<0.0002, t-test), whereas KZK showed slightly fewer CNVs (P<0.05, t-test) than the two reference populations (Table 1).

Table 1 CNV detection in Xinjiang populations

By merging overlapping CNVs into CNVRs (see Methods), we built a CNV map, which consisted of 1436 CNVRs from 131 Xinjiang samples (Supplementary Table 1) with the size ranging from 1.017 to 1906 kb covering 2.5% of the human genome. We determined the genotype and calculated the deletion and duplication allele frequency for each CNVR (Supplementary Table 1, Supplementary Figure 1, see Methods).

We further evaluated the influence of different sample sizes on the estimation of allele frequency. Using random sampling of HapMap individuals as references, which yielded about 90 samples for each of the sampled populations (Methods), we found the difference in allele frequency between 40 and 80 samples to be within approximately 10%, 5%, and 3% of the whole-data-set frequency at frequency ranges of 0.1–0.3, 0.3–0.5, and 0.5–1, respectively (Supplementary Figure 2). The allele frequency difference between 40 and 80 samples was never observed to be more than 0.03 on average for common CNV loci, which is acceptable because accuracy would not be substantially improved even with a larger sample size.

In addition, we compared the data in this Xinjiang CNV map with those from DGV v10. There were 25 non-singleton regions not previously reported in DGV, and the maximum frequency of these regions was 3.1%. These numbers might be underestimates because many previous studies used lower-resolution platforms. There were 581 CNVRs overlapping 986 genes in the Xinjiang CNV map, and nearly 40% of these genes were affected by duplications. Genes in the CNVRs were enriched in the clusters of peptidase activity, cell adhesion, membrane component, immunoglobulin, olfactory receptor, and defense response.

Genetic relationships between Xinjiang populations and reference populations

Population relationships can be characterized and measured using genetic distance (Methods). KZK showed the smallest average intra-population distance and UIG showed the largest. KZK and CHB had the smallest inter-population distance, whereas CHB and CEU showed the largest. Based on this distance, we performed a clustering analysis to study the relationships among these populations (Methods). The topology of the clustering graph, which was supported by 100 bootstrap replications, showed that all three Xinjiang populations were positioned between the two reference populations. Of these, KZK was closer to CHB and UIG was closer to CEU (Figure 1a). This pattern was confirmed by PCA based on biallelic CNVs (Figure 1b). The three Xinjiang groups could also be distinguished without the two reference populations: the first PC separated KZK from KGZ and UIG, and the second PC separated the latter two groups (Figure 1c). Furthermore, we used the model-based clustering algorithm STRUCTURE to infer population structure; the consistent result (Figure 1a) supported the view that Xinjiang ethnic groups are admixed populations with genetic components from both EEA and WEA.

Population differentiation can be characterized by F_ST. Although the majority of F_ST values of CNVRs were very small (<0.1), for each admixed population, we observed that the F_ST values between the admixed population and one ancestral population were different from the F_ST values between the admixed population and the other ancestral population (Supplementary Figure 3). For comparisons among Xinjiang populations, pairwise F_ST was much smaller than that between admixed and ancestral source populations (P<10⁻¹⁵, Wilcoxon rank sum test, Supplementary Figure 3).

Genetic architecture and admixture properties of Xinjiang populations

The average population admixture proportion inferred from biallelic CNVs by STRUCTURE was 65:35, 63:37, and 45:55 (EEA/WEA) for KZK, KGZ, and UIG, respectively. The estimated WEA ancestry of UIG was very similar to that in a previous study using random SNPs on chromosome 21 (56.2%).²⁰ The admixture proportion also varied to some extent between any two individuals from the same population: by 8.2%, 9.3%, and 13.7% for KZK, KGZ, and UIG, respectively.

Previous studies have identified that SNPs showing pronounced differences in allele frequency between CHB and CEU could serve as AIMs for UIG.²⁰ Here we selected 115 biallelic CNVs as AIMs for Xinjiang admixed populations (Figure 2a, Supplementary Table 2, Methods). These selected AIMs were able to retrieve the average admixture proportion of EEA versus WEA as 67:33, 66:34, and 46:54 for KGZ, KZK, and UIG, respectively, which showed no significant deviation from the results based on the whole data set (P>0.26, t-test), and would also facilitate the estimation of the CNV admixture proportion of samples from the three populations as well as from other admixed populations with genetic origins from EEA and WEA.

At the individual level, we tried to identify the ancestry of the AIMs in admixed populations (see Methods). Figures 2b–d show an example individual from each admixed group. In addition, we calculated the ancestral proportion in each individual based on the superimposed ancestry assignments of the AIMs. The results did not differ significantly from the ancestral proportions inferred directly from biallelic CNVs (P>0.84, paired t-test).

Increased heterozygosity and extended LD have been observed in admixed populations, such as the UIG population, using SNP data.²⁰ In terms of CNV data, we calculated the expected heterozygosity using allele frequency (see Methods). On average, all three admixed populations showed higher heterozygosity (0.055) than the two ancestral source populations (0.052) at 1872 polymorphic loci (Supplementary Table 3). Although the admixed groups showed slightly higher CNV taggability than the two ancestral source populations at 95 biallelic loci (only 26, 27, 30, 32, and 27 CNVRs were well tagged by SNPs (r²>0.8, see Methods) in CHB, KZK, KGZ, UIG, and CEU, respectively), there were no significant differences in the overall LD pattern between admixed groups and ancestral source populations when comparing all the LD values of 1241 SNPs located within 20 kb of the 95 CNVRs (P>0.27, paired Wilcoxon test), except for UIG and KGZ, which showed higher LD than CEU (P<0.024, paired Wilcoxon test).

We performed a CNV sharing analysis to evaluate the CNV distribution pattern between Xinjiang ethnic groups and ancestral source populations. More than 40% of the CNVRs in Xinjiang groups were also detected in both CHB and CEU, and about 16% of the CNVRs were observed only in CEU. CNVRs that were detected only in CHB varied from 9.2 to 11.5% (Figure 3a). For the comparisons excluding the CNVs found in either ancestral population, all three Xinjiang groups showed a large proportion of specific CNVRs (Figure 3b).

To investigate the relationships between admixed and ancestral source populations with respect to the CNV allele frequency spectrum, we compared the observed deletion and duplication allele frequencies in the admixed populations with their respective expected allele frequencies (see Methods). Excluding loci with rare variants (allele frequency <0.025), the observed allele frequency matched the expected allele frequency on the genome-wide scale (Pearson correlation coefficient r²>0.8; Figure 4). Furthermore, we determined whether the observed and expected data could be explained by the regression model y=x. Although the duplication alleles of KGZ deviated from the null hypothesis (H₀: β=1; t-test), neither the deletion alleles of any of the three admixed groups nor the duplication alleles of KZK and UIG showed any significant difference from the model y=x (P>0.1), indicating that the overall allele frequency spectrum of the admixed populations followed a two-way admixture scenario that could explain the majority of the genetic architecture.
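The test of the slope against the model y=x can be illustrated with ordinary least squares. The sketch assumes a regression with an intercept and reports the t statistic for H₀: β=1; whether the authors fitted an intercept is not stated, and the function name and example frequencies are hypothetical. A P-value would come from comparing the statistic with a t distribution on n−2 degrees of freedom, which is not done here.

```python
import math

def slope_test_against_one(x, y):
    """Fit y = a + b*x by least squares; return (b, se_b, t) for H0: b = 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    t = (b - 1.0) / se_b if se_b > 0 else float("nan")
    return b, se_b, t

# Hypothetical expected (x) and observed (y) allele frequencies at four loci.
expected = [0.05, 0.10, 0.20, 0.40]
observed = [0.06, 0.09, 0.21, 0.40]
slope, se, t_stat = slope_test_against_one(expected, observed)
print(round(slope, 3), round(se, 3), round(t_stat, 2))
```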

Population-specific CNVs identified in the Xinjiang admixed populations

Although the overall frequency spectra of Xinjiang populations showed a pattern typical of admixture, the allele frequencies of some regions under different evolutionary forces might deviate from the expected distribution. To identify these outliers, we applied a strategy analogous to one that was recently used to assess natural selection among African Americans.²¹ We calculated the ancestry-biased F_ST between the expected and observed allele frequencies for each admixed population (Supplementary Figure 4). CNVRs in the top 0.5 percentile were regarded as ancestry-biased CNVs (Table 2). For example, at chr22: 22315089-23258615, which contains several genes such as GGTLC2 (gamma-glutamyltransferase light chain 2) and PRAME (preferentially expressed antigen in melanoma), deletions and duplications were rarely observed in any of the three admixed populations but were observed at appreciable frequencies in the ancestral populations.

Table 2 Ancestry-biased CNVs in Xinjiang populations

On the other hand, admixture events might cause differences in allele frequency between ancestral source populations and admixed populations as well as among the admixed populations. We defined population-specific CNVs as regions with significant differences in frequency from reference populations. Here we identified the population-specific CNVs for each of the Xinjiang populations at two levels using different reference populations: (1) ancestral source populations and (2) the other two admixed populations. These population-specific CNVs are highlighted in Supplementary Table 4. At the first level, with ancestral source populations as reference, the three admixed populations had three population-specific CNVRs in common. At the second level, with admixed populations as reference, we found that the region on chromosome 1 overlapping the Rh blood system genes (RHCE, RHD, TMEM50A) had a higher deletion frequency in UIG (31.7%) than in KGZ (9.9%), KZK (12.5%), or CHB (7.9%), but a lower frequency than in CEU (43.2%). In addition, we identified the non-singleton CNVs that were observed only in admixed populations and not in the ancestral source populations. The functional annotation analysis of the genes overlapping all private CNVs showed enrichment in response to wounding (genes: TPST1, LPA, and NINJ2; P<0.04, Benjamini correction: P<0.1).

Discussion

In this study, we investigated CNVs in three major ethnic minority groups, UIG, KZK, and KGZ, using Affymetrix Genome-Wide Human SNP Array 6.0. On a genome-wide scale, the fit of the admixture model supported the previous conclusion that the UIG gene pool was probably a recipient of EEA and WEA ancestry rather than a donor.⁵ The admixture proportions inferred here were the most reliable estimates, as indicated by the fact that the correlation coefficients between observed and expected allele frequencies of admixed populations were larger at our inferred admixture proportions than at any other values of the proportion (Supplementary Figure 5).

One practical application of admixed populations is admixture mapping, which has proven successful in other admixed populations such as African Americans. Unlike African Americans, the Xinjiang admixed populations have a longer admixture history, and their ancestral source populations have a much shorter divergence time. Admixture mapping remains applicable in Xinjiang admixed populations because a large number of markers (AIMs) with highly differentiated frequencies between WEA and EEA populations are available, and a previous study has demonstrated the effectiveness of admixture mapping for localizing disease genes in UIG populations.⁴ However, the long admixture time in Xinjiang populations cuts both ways for admixture mapping: on one hand, it poses a challenge because the additional recombination events accumulated over a longer admixture history weaken the LD between markers and causal variants; on the other hand, it could help fine mapping of disease-associated genes, again because LD between AIMs and potential causal variants extends over shorter distances. CNVs usually span hundreds to thousands of base pairs or more; they are not only markers but also likely to be variants that have a direct impact. The AIMs we identified here are CNVs showing highly differentiated frequencies between ancestral source populations, which have potential utility for admixture mapping. For example, the region chr17: 44161867-44364214, which overlaps the gene KANSL1, has no duplication in EEA but has a duplication frequency of 0.44 in WEA. This duplication, which was found to be associated with a novel transcript and is likely to be under positive selection,²² could also be observed in Xinjiang admixed populations at a frequency of ∼0.13 with a clear background of complete WEA inheritance. In addition, although deletion of this gene was absent in the normal samples here, it has been associated with the 17q21.31 microdeletion syndrome.²³

In summary, we constructed the first CNV map of three admixed populations in Xinjiang and provided a first landscape of population genomics based on CNVs in this region. Our analysis showed that the genomic CNV distribution of the three representative populations, UIG, KZK, and KGZ, each followed an admixture pattern with an ancestral genetic pool from both EEA and WEA. Our study advanced our understanding of CNV distribution and admixture history of populations in Xinjiang and may serve as a useful resource for further medical and evolutionary studies.

References

1. Schlattl A, Anders S, Waszak SM, Huber W, Korbel JO : Relating CNVs to transcriptome data at fine resolution: assessment of the effect of variant size, type, and overlap with functional regions. Genome Res 2011; 21: 2004–2013.
2. Zhang F, Gu W, Hurles ME, Lupski JR : Copy number variation in human health, disease, and evolution. Annu Rev Genomics Hum Genet 2009; 10: 451–481.
3. Wells RS, Yuldasheva N, Ruzibakiev R et al: The Eurasian heartland: a continental perspective on Y-chromosome diversity. Proc Natl Acad Sci USA 2001; 98: 10244–10249.
4. Xu S, Jin L : A genome-wide analysis of admixture in Uyghurs and a high-density admixture map for disease-gene discovery. Am J Hum Genet 2008; 83: 322–336.
5. Xu S, Jin W, Jin L : Haplotype-sharing analysis showing Uyghurs are unlikely genetic donors. Mol Biol Evol 2009; 26: 2197–2206.
6. Reich D, Patterson N, De Jager PL et al: A whole-genome admixture scan finds a candidate locus for multiple sclerosis susceptibility. Nat Genet 2005; 37: 1113–1118.
7. Mills RE, Walter K, Stewart C et al: Mapping copy number variation by population-scale genome sequencing. Nature 2011; 470: 59–65.
8. Korn JM, Kuruvilla FG, McCarroll SA et al: Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet 2008; 40: 1253–1260.
9. Lou H, Li S, Yang Y et al: A map of copy number variations in Chinese populations. PLoS One 2011; 6: e27341.
10. Shaikh TH, Gai X, Perin JC et al: High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications. Genome Res 2009; 19: 1682–1690.
11. Iafrate AJ, Feuk L, Rivera MN et al: Detection of large-scale variation in the human genome. Nat Genet 2004; 36: 949–951.
12. Dennis G Jr, Sherman BT, Hosack DA et al: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 2003; 4: P3.
13. Weir BS, Hill WG : Estimating F-statistics. Annu Rev Genet 2002; 36: 721–750.
14. Felsenstein J : PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 1989; 5: 164–166.
15. Patterson N, Price AL, Reich D : Population structure and eigenanalysis. PLoS Genet 2006; 2: e190.
16. Falush D, Stephens M, Pritchard JK : Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 2003; 164: 1567–1587.
17. Cooper GM, Zerr T, Kidd JM, Eichler EE, Nickerson DA : Systematic assessment of copy number variant detection via genome-wide SNP genotyping. Nat Genet 2008; 40: 1199–1203.
18. Price AL, Tandon A, Patterson N et al: Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet 2009; 5: e1000519.
19. Su SY, Asher JE, Jarvelin MR et al: Inferring combined CNV/SNP haplotypes from genotype data. Bioinformatics 2010; 26: 1437–1445.
20. Xu S, Huang W, Qian J, Jin L : Analysis of genomic admixture in Uyghur and its implication in mapping strategy. Am J Hum Genet 2008; 82: 883–894.
21. Jin W, Xu S, Wang H et al: Genome-wide detection of natural selection in African Americans pre- and post-admixture. Genome Res 2011; 22: 519–527.
22. Boettger LM, Handsaker RE, Zody MC, McCarroll SA : Structural haplotypes and recent evolution of the human 17q21.31 region. Nat Genet 2012; 44: 881–885.
23. Cooper GM, Coe BP, Girirajan S et al: A copy number variation morbidity map of developmental delay. Nat Genet 2011; 43: 838–846.

Acknowledgements

These studies were supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13040100) and by the National Science Foundation of China (NSFC) grants (91331204 and 31171218). SX is a Max-Planck Independent Research Group Leader and a member of the CAS Youth Innovation Promotion Association. SX also gratefully acknowledges the support of the National Program for Top-notch Young Innovative Talents of The ‘Ten-Thousand-Talents’ Project and the support of the K.C.Wong Education Foundation, Hong Kong. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Haiyi Lou and Shilin Li: These authors contributed equally to this work.

Authors and Affiliations

Max Planck Independent Research Group on Population Genomics Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Haiyi Lou, Wenfei Jin, Ruiqing Fu, Dongsheng Lu & Shuhua Xu
Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Haiyi Lou, Li Jin & Shuhua Xu
Ministry of Education (MOE) Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
Shilin Li, Xinwei Pan & Li Jin
Key Laboratory of Forensic Evidence and Scene Technology, Ministry of Public Security and Shanghai Key Laboratory of Crime Scene Evidence, Shanghai, China
Huaigu Zhou, Yuan Ping & Li Jin

Corresponding author

Correspondence to Shuhua Xu.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Author contributions

SX conceived and designed the study. HL, WJ, RF, and DL performed data analysis. XP contributed to sample collection. SL, HZ, and YP contributed to the experimental work. SX and HL interpreted the data and wrote the paper. LJ contributed to reagents and materials. All authors have read and approved the final version of the manuscript.

Supplementary Information accompanies this paper on European Journal of Human Genetics website

About this article

Cite this article

Lou, H., Li, S., Jin, W. et al. Copy number variations and genetic admixtures in three Xinjiang ethnic minority groups. Eur J Hum Genet 23, 536–542 (2015). https://doi.org/10.1038/ejhg.2014.134

Received: 28 January 2014
Revised: 06 June 2014
Accepted: 12 June 2014
Published: 16 July 2014
Issue Date: April 2015
DOI: https://doi.org/10.1038/ejhg.2014.134
