Whole-genome sequencing of 175 Mongolians uncovers population-specific genetic architecture and gene flow throughout North and East Asia

Genetic variation in Northern Asian populations is currently undersampled. To address this, we generated a new genetic variation reference panel by whole-genome sequencing of 175 ethnic Mongolians representing six tribes. The variation cataloged in the panel shows strong population stratification among these tribes, which correlates with the diverse demographic histories of the region. Combining our data with the 1000 Genomes Project panel identifies derived alleles shared between Finns and Mongolians/Siberians, suggesting that substantial gene flow occurred between northern Eurasian populations in the past. Furthermore, we show that North, East, and Southeast Asian populations are more closely related to one another than to South Asian and Oceanian populations.
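The abstract infers gene flow from an excess of derived alleles shared between two populations. It does not name the statistic, but a standard way to quantify such excess sharing is Patterson's D (the ABBA-BABA statistic). The sketch below is a generic illustration under stated assumptions, not the authors' pipeline: it assumes per-SNP derived-allele frequencies for three test populations plus an outgroup, and the function name `d_statistic`, the population labels, and all frequencies are hypothetical toy values, not data from this study. A real analysis would also estimate a standard error, typically with a block jackknife, which is omitted here.

```python
from typing import Sequence


def d_statistic(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float], p_out: Sequence[float]) -> float:
    """Frequency-based D-statistic D(P1, P2; P3, Outgroup).

    Each argument holds one derived-allele frequency per SNP for one
    population. A positive value means P3 shares more derived alleles with
    P2 than with P1 (an ABBA excess); under a tree with no gene flow between
    P3 and either P1 or P2, the expected value is zero.
    """
    if not (len(p1) == len(p2) == len(p3) == len(p_out)):
        raise ValueError("all frequency vectors must have the same length")
    num = 0.0
    den = 0.0
    for a, b, c, o in zip(p1, p2, p3, p_out):
        abba = (1 - a) * b * c * (1 - o)  # P2 and P3 carry the derived allele
        baba = a * (1 - b) * c * (1 - o)  # P1 and P3 carry the derived allele
        num += abba - baba
        den += abba + baba
    if den == 0:
        raise ValueError("no informative sites")
    return num / den


if __name__ == "__main__":
    # Hypothetical toy frequencies at four SNPs (NOT from this study).
    other_europeans = [0.1, 0.2, 0.0, 0.3]  # P1
    finns = [0.3, 0.4, 0.2, 0.3]            # P2
    mongolians = [0.6, 0.5, 0.4, 0.2]       # P3
    outgroup = [0.0, 0.0, 0.0, 0.0]         # assumed fixed for the ancestral allele
    d = d_statistic(other_europeans, finns, mongolians, outgroup)
    print(f"{d:.3f}")  # prints 0.510
```

With these made-up numbers D is positive, which in this convention would be read as P3 sharing more derived alleles with P2 than with P1; swapping P1 and P2 flips the sign.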

Fig. 1: Sampling, variants, and imputation.
Fig. 2: Population genetic structure.
Fig. 3: Inference of population demographic history.
Fig. 4: Gene flow between Mongolians and global human populations of 1000G.
Fig. 5: Phylogenetic relatedness of East Asian groups with other people.

Data availability

Raw sequencing data and variant sets have been deposited in the China National Genebank (CNGB) Nucleotide Sequence Archive (CNSA) under accession CNP0000063 (https://db.cngb.org/cnsa/).


Acknowledgements

We sincerely thank the Mongolian volunteers who agreed to contribute blood samples and participate in this study. We thank D. Reich for sharing genotype data on populations from Siberia and South Asia, and J. Fekecs for graphical assistance. We acknowledge F.S. Collins and C.D. Bustamante for their helpful discussions and comments on the manuscript, as well as Shuangshan Shuangshan, Y. Bao, and S. Ba for contributing to the sample collection. This study was supported by the Shenzhen Municipal Government of China (CXB201108250094A), the Inner Mongolia University for Nationalities Scientific Research Project (MD2012038), the National Science Foundation of China (81560176, 81511130050), China National Genebank, the Foundation of the Inner Mongolia Department of Science and Technology (2015MS0875, 201502103), the Science and Technology Planning Project of Inner Mongolia, China (20120409), and the Guangdong Provincial Key Laboratory of Genome Read and Write (2017B030301010). C.R.G. is supported by the US National Institutes of Health (4U01HG007419-04) and the National Science Foundation (1201234). N.N., S.R.B., and L.C.B. are supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health.

Author information

Y.Y., H.Z., B.B., and H.B. initiated and supervised the project. H.B., Q.W., Y.X., Z.P., J.J., X.Y., M.M., B.G., D.W., Y.G., H.H., S.S., Y.C., YanruZ., L.Z., YiyiL., C.L., F.M., K.W., L.L., and YingchunL. surveyed and collected the samples. Y.X., YanruZ., DongZ., J.C., S.W., X.Li, and T.Li extracted the genomic DNA. H.B., X.G., Q.W., M.J., and B.W. performed the genome sequencing. YongZ., L.F., H.W., and T.Lan performed read mapping and variant calling. T.Lan, X.G., H.L., W.L., Z.W., and B.W. performed experimental validation. X.G., T.Lan, and B.D. constructed the haplotype reference panel. X.G., T.Lan, DandanZ., H.X., N.D., X.Luo, W.X., and L.Y. analyzed population diversity and genetic structure. T.Lan, X.G., N.N., B.D., and X.N. inferred population demographic history. N.N., S.R.B., K.L., and C.R.G. analyzed the phylogeny of East Asians. X.G., N.N., T.Lan, and S.R.B. wrote the manuscript. X.G., C.Y., X.Luo, and T.Li were in charge of data submission. N.N., X.G., T.Lan, S.R.B., N.D., C.R.G., X.X., X.Liu, H.Y., L.C.B., J.W., and K.K. revised the manuscript.

Correspondence to Burenbatu Burenbatu, Huanmin Zhou or Ye Yin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–17 and Supplementary Tables 1–8

Reporting Summary

About this article

Cite this article

Bai, H., Guo, X., Narisu, N. et al. Whole-genome sequencing of 175 Mongolians uncovers population-specific genetic architecture and gene flow throughout North and East Asia. Nat Genet 50, 1696–1704 (2018) doi:10.1038/s41588-018-0250-5
