The deep population history of East Asia remains poorly understood owing to a lack of ancient DNA data and sparse sampling of present-day people (refs. 1, 2). Here we report genome-wide data from 166 East Asian individuals dating to between 6000 BC and AD 1000 and 46 present-day groups. Hunter-gatherers from Japan, the Amur River Basin, and people of Neolithic and Iron Age Taiwan and the Tibetan Plateau are linked by a deeply splitting lineage that probably reflects a coastal migration during the Late Pleistocene epoch. We also follow expansions during the subsequent Holocene epoch from four regions. First, hunter-gatherers from Mongolia and the Amur River Basin have ancestry shared by individuals who speak Mongolic and Tungusic languages, but do not carry ancestry characteristic of farmers from the West Liao River region (around 3000 BC), which contradicts theories that the expansion of these farmers spread the Mongolic and Tungusic proto-languages. Second, farmers from the Yellow River Basin (around 3000 BC) probably spread Sino-Tibetan languages, as their ancestry dispersed both to Tibet (where it forms approximately 84% of the gene pool in some groups) and to the Central Plain, where it has contributed around 59–84% to modern Han Chinese groups. Third, people from Taiwan from around 1300 BC to AD 800 derived approximately 75% of their ancestry from a lineage that is widespread in modern individuals who speak Austronesian, Tai–Kadai and Austroasiatic languages, and that we hypothesize derives from farmers of the Yangtze River Valley. Ancient people from Taiwan also derived about 25% of their ancestry from a northern lineage that is related to, but different from, farmers of the Yellow River Basin, which suggests an additional north-to-south expansion. Fourth, ancestry from Yamnaya Steppe pastoralists arrived in western Mongolia after around 3000 BC but was displaced by previously established lineages even while it persisted in western China, as would be expected if this ancestry was associated with the spread of proto-Tocharian Indo-European languages. Two later gene flows affected western Mongolia: migrants after around 2000 BC with Yamnaya and European farmer ancestry, and episodic influences of later groups with ancestry from Turan.
The aligned sequences are available through the European Nucleotide Archive under accession number PRJEB42781. The newly generated genotype data of 383 modern East Asian individuals have been deposited in Zenodo (https://doi.org/10.5281/zenodo.4058532). The previously published data co-analysed with our newly reported data can be obtained as described in the original publications, which are all referenced in Supplementary Table 4; a compiled dataset that includes the merged genotypes used in this paper is available as the Allen Ancient DNA Resource at https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data. Any other relevant data are available from the corresponding authors upon reasonable request.
Cavalli-Sforza, L. L. The Chinese human genome diversity project. Proc. Natl Acad. Sci. USA 95, 11501–11503 (1998).
HUGO Pan-Asian SNP Consortium. Mapping human genetic diversity in Asia. Science 326, 1541–1545 (2009).
Haak, W. et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211 (2015).
Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).
Damgaard, P. B. et al. 137 ancient human genomes from across the Eurasian steppes. Nature 557, 369–374 (2018).
Narasimhan, V. M. et al. The formation of human populations in South and Central Asia. Science 365, eaat7487 (2019).
Fu, Q. et al. An early modern human from Romania with a recent Neanderthal ancestor. Nature 524, 216–219 (2015).
Fu, Q. et al. DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl Acad. Sci. USA 110, 2223–2227 (2013).
Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next generation sequencing data. BMC Bioinformatics 15, 356 (2014).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
Loh, P. R. et al. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193, 1233–1254 (2013).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Yang, M. A. et al. 40,000-year-old individual from Asia provides insight into early population structure in Eurasia. Curr. Biol. 27, 3202–3208 (2017).
Massilani, D. et al. Denisovan ancestry and population history of early East Asians. Science 370, 579–583 (2020).
Wang, C. C. & Li, H. Inferring human history in East Asia from Y chromosomes. Investig. Genet. 4, 11 (2013).
Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
Yang, M. A. et al. Ancient DNA indicates human population shifts and admixture in northern and southern China. Science 369, 282–288 (2020).
Nakashima, A., Ishida, H., Shigematsu, M., Goto, M. & Hanihara, T. Nonmetric cranial variation of Jomon Japan: implications for the evolution of eastern Asian diversity. Am. J. Hum. Biol. 22, 782–790 (2010).
Bellwood, P. & Renfrew, C. Examining the Farming/Language Dispersal Hypothesis (McDonald Institute for Archaeological Research, 2002).
Robbeets, M. & Savelyev, A. The Oxford Guide to the Transeurasian Languages (Oxford Univ. Press, 2020).
Siska, V. et al. Genome-wide data from two early Neolithic East Asian individuals dating to 7700 years ago. Sci. Adv. 3, e1601877 (2017).
Kamberov, Y. G. et al. Modeling recent human evolution in mice by expression of a selected EDAR variant. Cell 152, 691–702 (2013).
Zhang, X. L. et al. The earliest human occupation of the high-altitude Tibetan Plateau 40 thousand to 30 thousand years ago. Science 362, 1049–1051 (2018).
Chen, F. H. et al. Agriculture facilitated permanent human occupation of the Tibetan Plateau after 3600 B.P. Science 347, 248–250 (2015).
Zhang, M., Yan, S., Pan, W. & Jin, L. Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic. Nature 569, 112–115 (2019).
van Driem, G. in The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics (eds Sagart, L. et al.) 81–106 (Routledge, 2005).
Liu, S. et al. Genomic analyses from non-invasive prenatal testing reveal genetic associations, patterns of viral infections, and Chinese population history. Cell 175, 347–359 (2018).
Chiang, C. W. K., Mangul, S., Robles, C. & Sankararaman, S. A comprehensive map of genetic variation in the world’s largest ethnic group—Han Chinese. Mol. Biol. Evol. 35, 2736–2750 (2018).
Reich, D. et al. Reconstructing Native American population history. Nature 488, 370–374 (2012).
Lipson, M. et al. Ancient genomes document multiple waves of migration in Southeast Asian prehistory. Science 361, 92–95 (2018).
McColl, H. et al. The prehistoric peopling of Southeast Asia. Science 361, 88–92 (2018).
Wang, L. X. et al. Reconstruction of Y-chromosome phylogeny reveals two neolithic expansions of Tibeto-Burman populations. Mol. Genet. Genomics 293, 1293–1300 (2018).
Ge, J. X., Wu, S. D. & Chao, S. J. Zhongguo yimin shi (The Migration History of China) (Fujian People’s Publishing House, 1997).
Ning, C. et al. Ancient genomes from northern China suggest links between subsistence changes and human migration. Nat. Commun. 11, 2700 (2020).
Wei, L. H. et al. Phylogeography of Y-chromosome haplogroup O3a2b2-N6 reveals patrilineal traces of Austronesian populations on the eastern coastal regions of Asia. PLoS ONE 12, e0175080 (2017).
Ko, A. M. et al. Early Austronesians: into and out of Taiwan. Am. J. Hum. Genet. 94, 426–436 (2014).
Skoglund, P. et al. Genomic insights into the peopling of the Southwest Pacific. Nature 538, 510–513 (2016).
Lipson, M. et al. Reconstructing Austronesian population history in island Southeast Asia. Nat. Commun. 5, 4689 (2014).
Bellwood, P. The checkered prehistory of rice movement southwards as a domesticated cereal—from the Yangzi to the equator. Rice 4, 93–103 (2011).
Yang, X. et al. Early millet use in northern China. Proc. Natl Acad. Sci. USA 109, 3726–3730 (2012).
Wilkin, S. et al. Dairy pastoralism sustained eastern Eurasian steppe populations for 5,000 years. Nat. Ecol. Evol. 4, 346–355 (2020).
Kovalev, A. The great migration of the Chemurchek people from France to the Altai in the early 3rd millennium bce. Int. J. Eurasian Stud. 1, 1–58 (2011).
Jeong, C. et al. A dynamic 6,000-year genetic history of Eurasia’s Eastern Steppe. Cell 183, 890–904 (2020).
Ning, C. et al. Ancient genomes reveal Yamnaya-related ancestry and a potential source of Indo-European speakers in Iron Age Tianshan. Curr. Biol. 29, 2526–2532 (2019).
Bellwood, P. in The Encyclopedia of Global Human Migration (Wiley-Blackwell, 2013).
Mallory, J. P. In Search of the Indo-Europeans: Language, Archaeology and Myth (Thames & Hudson, 1991).
Anthony, D. The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World (Princeton Univ. Press, 2007).
Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014).
Wang, K., Mathieson, I., O’Connell, J. & Schiffels, S. Tracking human population structure through time from whole genome sequences. PLoS Genet. 16, e1008552 (2020).
Jeong, C. et al. Long-term genetic stability and a high-altitude East Asian origin for the peoples of the high valleys of the Himalayan arc. Proc. Natl Acad. Sci. USA 113, 7485–7490 (2016).
Pinhasi, R., Fernandes, D. M., Sirak, K. & Cheronet, O. Isolating the human cochlea to generate bone powder for ancient DNA analysis. Nat. Protocols 14, 1194–1205 (2019).
Sirak, K. A. et al. A minimally-invasive method for sampling human petrous bones from the cranial base for ancient DNA analysis. Biotechniques 62, 283–289 (2017).
Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, 15758–15763 (2013).
Korlević, P. et al. Reducing microbial and human contamination in DNA extractions from ancient bones and teeth. Biotechniques 59, 87–93 (2015).
Rohland, N., Glocke, I., Aximu-Petri, A. & Meyer, M. Extraction of highly degraded DNA from ancient bones, teeth and sediments for high-throughput sequencing. Nat. Protocols 13, 2447–2461 (2018).
Rohland, N., Harney, E., Mallick, S., Nordenfelt, S. & Reich, D. Partial uracil-DNA-glycosylase treatment for screening of ancient DNA. Phil. Trans. R. Soc. Lond. B 370, 20130624 (2015).
Gansauge, M. T. & Meyer, M. Selective enrichment of damaged DNA molecules for ancient genome sequencing. Genome Res. 24, 1543–1549 (2014).
Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).
Maricic, T., Whitten, M. & Pääbo, S. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS ONE 5, e14004 (2010).
Rohland, N. & Hofreiter, M. Ancient DNA extraction from bones and teeth. Nat. Protocols 2, 1756–1762 (2007).
John, J. S. SeqPrep. GitHub https://github.com/jstjohn/SeqPrep (2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Behar, D. M. et al. A “Copernican” reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. 90, 675–684 (2012).
Weissensteiner, H. et al. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 44, W58–W63 (2016).
Günther, T. & Nettelblad, C. The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet. 15, e1008302 (2019).
Kennett, D. J. et al. Archaeogenomic evidence reveals prehistoric matrilineal dynasty. Nat. Commun. 8, 14115 (2017).
Lohse, J. C., Madsen, D. B., Culleton, B. J. & Kennett, D. J. Isotope paleoecology of episodic mid-to-late Holocene bison population expansions in the southern Plains, U.S.A. Quat. Sci. Rev. 102, 14–26 (2014).
Reimer, P. J. et al. The IntCal20 Northern Hemisphere radiocarbon age calibration curve (0–55 cal kBP). Radiocarbon 62, 725–757 (2020).
Bronk Ramsey, C. Bayesian analysis of radiocarbon dates. Radiocarbon 51, 337–360 (2009).
Rasmussen, M. et al. An Aboriginal Australian genome reveals separate human dispersals into Asia. Science 334, 94–98 (2011).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Busing, F. T. A., Meijer, E. & van der Leeden, R. Delete-m jackknife for unequal m. Stat. Comput. 9, 3–8 (1999).
Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).
Monroy Kuhn, J. M., Jakobsson, M. & Günther, T. Estimating genetic kin relationships in prehistoric populations. PLoS ONE 13, e0195491 (2018).
Ringbauer, H., Novembre, J. & Steinruecken, M. Human parental relatedness through time — detecting runs of homozygosity in ancient DNA. Preprint at bioRxiv https://doi.org/10.1101/2020.05.31.126912 (2020).
We thank D. Anthony, O. Bar-Yosef, K. Brunson, R. Flad, P. Flegontov, Q. Fu, W. Haak, I. Lazaridis, M. Lipson, I. Mathieson, R. Meadow, I. Olalde, N. Patterson, P. Skoglund, D. Xu, P. Bellwood and C. Chiang for comments; N. Saitou and the Asian DNA Repository Consortium for sharing genotype data from present-day Japanese groups; T. Nishimoto and T. Fujisawa from the Rebun Town Board of Education for sharing the Funadomari Jomon samples; and H. Tanaka and W. Nagahara from the Archeological Center of Chiba City, who are excavators of the Rokutsu Jomon site. The excavations at the Boisman-2 site (Boisman culture), the Pospelovo-1 site (Yankovsky culture) and the Roshino-4 site (Heishui Mohe culture) were funded by the Far Eastern Federal University and the Institute of History, Archaeology and Ethnology Far Eastern Branch of the Russian Academy of Sciences; research on Pospelovo-1 is funded by RFBR project number 18-09-40101. C.-C.W. was funded by the Max Planck Society, the National Natural Science Foundation of China (NSFC 31801040), the Nanqiang Outstanding Young Talents Program of Xiamen University (X2123302), the Major project of National Social Science Foundation of China (20&ZD248), a European Research Council (ERC) grant to D. Xu (ERC-2019-ADG-883700-TRAM) and Fundamental Research Funds for the Central Universities (ZK1144). H.M. was supported by grant JSPS 16H02527. M.R. and C.-C.W. received funding from the ERC under the European Union’s Horizon 2020 research and innovation program (grant no. 646612) to M.R. H. Li was funded by the NSFC (91731303, 31671297) and the B&R International Joint Laboratory of Eurasian Anthropology (18490750300). J.K. was funded by DFG grant KR 4015/1-1, the Baden Württemberg Foundation and the Max Planck Institute. Accelerator Mass Spectrometry radiocarbon dating work was supported by the National Science Foundation (NSF) (BCS-1460369) to D.J.K. and B.J.C. D.R. was funded by NSF grant BCS-1032255, NIH (NIGMS) grant GM100233, the Paul M. Allen Frontiers Group, John Templeton Foundation grant 61220, a gift from J.-F. Clin and the Howard Hughes Medical Institute.
The authors declare no competing interests.
Peer review information: Nature thanks Peter Bellwood, Charleston Chiang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Projection of ancient samples onto PCA dimensions 1 and 2 defined by East Asian, European, Siberian and Native American populations.
a, PCA dimensions 1 and 2 defined by present-day East Asian, European, Siberian and Native American populations. b, PCA dimensions 1 and 2 defined by present-day East Asian groups with little West Eurasian mixture.
Extended Data Fig. 3 Neighbour-joining tree of present-day East Eurasian individuals using the Human Origins dataset.
a, Neighbour-joining tree of present-day East Eurasian individuals based on FST distances using the Human Origins dataset. Branch lengths are shown in units of FST distance. b, The same neighbour-joining tree with all internal branches drawn at equal length for better visualization.
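As an illustration of how a neighbour-joining tree is built from a pairwise distance matrix such as the FST distances used for this figure, here is a minimal Python sketch. The population labels and distance values are invented toy numbers, not values from the study, and the function name is hypothetical; the software actually used to build the tree is not stated in this caption.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Build an unrooted tree by neighbour joining.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance (e.g. FST).
    Returns a list of (node, neighbour, branch_length) edges.
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        d_ab = d[frozenset((a, b))]
        len_a = 0.5 * d_ab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d_ab - len_a
        u = f"node{next_id}"
        next_id += 1
        edges += [(a, u, len_a), (b, u, len_b)]
        # Distances from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((u, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d_ab
                )
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Toy additive distances for four hypothetical populations.
toy_names = ["pop_A", "pop_B", "pop_C", "pop_D"]
toy_dist = {
    frozenset(("pop_A", "pop_B")): 5.0,
    frozenset(("pop_A", "pop_C")): 7.0,
    frozenset(("pop_A", "pop_D")): 8.0,
    frozenset(("pop_B", "pop_C")): 8.0,
    frozenset(("pop_B", "pop_D")): 9.0,
    frozenset(("pop_C", "pop_D")): 9.0,
}
toy_edges = neighbour_joining(toy_names, toy_dist)
for node, neighbour, length in toy_edges:
    print(f"{node} -- {neighbour}: {length:.3f}")
```

Panel b of the figure corresponds to drawing the same topology while ignoring the branch lengths returned here.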
a–f, We grouped the populations roughly into six groups based on geographical and genetic affinity. a, Populations mainly from Africa (yellow), America (magenta), West Eurasia (dark green and light brown) and Oceania (light magenta). b, Populations mainly from Mongolia (blue) and Siberia (purple). c, Populations mainly from southern China and Southeast Asia (light blue). d, Populations mainly from the Tibetan Plateau (olive) and Neolithic Yellow River Basin (red). e, Mainly Han Chinese groups from China (light blue and red). f, Populations mainly from the Amur River Basin (blue and red) and northeast Asia.
a, Cross-coalescence rates for selected population pairs. We ran MSMC for four pairs of populations: Tibetan–Ami, Tibetan–Atayal, Tibetan–Ulchi and Tibetan–Mixe. We used one individual from each population in this analysis. The modern genomic data for those individuals are from the Simons Genome Diversity Project. The times are calculated based on the mutation rate and generation time specified on the x axis. b, Cross-coalescence rates for selected population pairs. The same analysis as shown in a but using MSMC2 instead of MSMC, and using two individuals per population except for the Tibetan–Atayal pair, for which we used only one.
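The two quantities behind this kind of plot can be sketched in a few lines of Python: the relative cross-coalescence rate, which compares the across-population coalescence rate with the mean within-population rate, and the conversion of mutation-scaled time into years. This is a hedged sketch, not the authors' pipeline. The function names, the toy rate values, the mutation rate (1.25e-8 per site per generation) and the generation time (30 years) are all assumptions chosen for illustration; the caption states only that the actual values are given on the x axis.

```python
def relative_cross_coalescence(rate_within_1, rate_across, rate_within_2):
    """Relative cross-coalescence rate for one time interval.

    Values near 1 suggest the two populations behave as one;
    values near 0 suggest they are fully separated.
    """
    return 2.0 * rate_across / (rate_within_1 + rate_within_2)


def scaled_time_to_years(scaled_time, mutation_rate, generation_years):
    """Convert mutation-scaled time to years."""
    return scaled_time / mutation_rate * generation_years


# Assumed scaling parameters (placeholders, not taken from the paper).
MUTATION_RATE = 1.25e-8
GENERATION_YEARS = 30

# Toy per-interval estimates: (scaled time, within pop 1, across, within pop 2).
toy_intervals = [
    (1.0e-5, 2000.0, 100.0, 2200.0),
    (5.0e-5, 1500.0, 700.0, 1600.0),
    (1.0e-4, 1200.0, 1150.0, 1250.0),
]

for scaled_time, within_1, across, within_2 in toy_intervals:
    years = scaled_time_to_years(scaled_time, MUTATION_RATE, GENERATION_YEARS)
    rccr = relative_cross_coalescence(within_1, across, within_2)
    print(f"{years:10.0f} years ago: relative cross-coalescence = {rccr:.2f}")
```

A common heuristic is to read the population split time as the point where the relative rate crosses 0.5, although gradual separations make any single date approximate.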
This figure is the same as Fig. 2 except that we show the fitted genetic drift on each lineage. We used all available sites in the dataset of 1,237,207 SNPs, and also restricted the analysis to transversions to confirm that the same model fits (Supplementary Information section 3). We started with a skeleton tree that fits the data for Denisovan, Mbuti, Onge, Tianyuan and Luxembourg Loschbour with one admixture event. We then grafted on Mongolia East Neolithic, Late Neolithic farmers from the Upper Yellow River, Liangdao 2, Japan Jomon, Nepal Chokhopani, Taiwan Hanben and Late Neolithic farmers from the West Liao River in turn, adding each consecutively to all possible edges in the tree and retaining only graph solutions in which no fitted statistic differed from its estimated value by |Z| ≥ 3 (maximum |Z| = 2.95 here). We used the MSMC and MSMC2 relative population split time estimates to constrain models. Deep splits are not well constrained because of the minimal availability of data on East Asian populations from the Upper Paleolithic. a, Locations and dates of the East Asian individuals used in model fitting, with colours indicating whether the majority ancestry is from the hypothesized coastal expansion (green), interior expansion south (red) or interior expansion north (blue). The map is based on the ‘Google Map Layer’ from ArcGIS Online Basemaps (map data ©2020 Google). The grey circles represent sampled populations and white circles represent unsampled hypothesized nodes. b, In the model visualization, we colour lineages modelled as deriving entirely from one of these expansions, and also colour populations according to ancestry proportions. Dashed lines represent admixture (proportions are marked), and we show the amount of genetic drift on each lineage in units of FST × 1,000.
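The acceptance rule described in this caption, keeping a candidate graph only if every fitted statistic lies within three standard errors of its estimated value, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the statistic values and standard errors are invented toy numbers, and the graph-fitting software itself is not reproduced.

```python
def z_scores(estimated, fitted, std_errors):
    """Z-score of each residual between estimated and fitted statistics."""
    return [(e - f) / se for e, f, se in zip(estimated, fitted, std_errors)]


def graph_is_acceptable(estimated, fitted, std_errors, threshold=3.0):
    """Accept a graph only if the worst residual has |Z| below the threshold."""
    worst = max(abs(z) for z in z_scores(estimated, fitted, std_errors))
    return worst < threshold


# Toy values for three statistics (not from the study).
toy_estimated = [0.0100, 0.0250, -0.0030]
toy_std_errors = [0.0020, 0.0015, 0.0010]

# Candidate graph 1: worst residual is about 2.95 standard errors.
fitted_graph_1 = [0.0041, 0.0240, -0.0025]
# Candidate graph 2: the second statistic is off by 4 standard errors.
fitted_graph_2 = [0.0095, 0.0190, -0.0028]

print(graph_is_acceptable(toy_estimated, fitted_graph_1, toy_std_errors))
print(graph_is_acceptable(toy_estimated, fitted_graph_2, toy_std_errors))
```

Applying the check after each population is grafted on means that only graphs passing the threshold are carried forward to the next addition.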
Lighter colours indicate more shared drift. Lahu groups with the Southeast Asian cluster probably due to substantial admixture. The Tibetan_Yajiang are geographically in the Tibeto-Burman Corridor but group with Core Tibetan individuals, presumably reflecting less genetic admixture from people of the Southeast Asian cluster.
This Supplementary Information file contains an Ethics Statement and Supplementary Information sections 1–4, including 15 Supplementary Figures, 5 Supplementary Tables and Supplementary References. The supplementary figures and tables provide information on the genetic structure and population history of East Asians.
This zipped file contains 26 Supplementary Tables and a table guide.
Genotypes of the newly reported 166 ancient individuals.
About this article
Cite this article
Wang, CC., Yeh, HY., Popov, A.N. et al. Genomic insights into the formation of human populations in East Asia. Nature 591, 413–419 (2021). https://doi.org/10.1038/s41586-021-03336-2