The genomic origins of the Bronze Age Tarim Basin mummies


The identity of the earliest inhabitants of Xinjiang, in the heart of Inner Asia, and the languages that they spoke have long been debated and remain contentious1. Here we present genomic data from 5 individuals dating to around 3000–2800 bc from the Dzungarian Basin and 13 individuals dating to around 2100–1700 bc from the Tarim Basin, representing the earliest yet discovered human remains from North and South Xinjiang, respectively. We find that the Early Bronze Age Dzungarian individuals exhibit a predominantly Afanasievo ancestry with an additional local contribution, and the Early–Middle Bronze Age Tarim individuals contain only a local ancestry. The Tarim individuals from the site of Xiaohe further exhibit strong evidence of milk proteins in their dental calculus, indicating a reliance on dairy pastoralism at the site since its founding. Our results do not support previous hypotheses for the origin of the Tarim mummies, who were argued to be Proto-Tocharian-speaking pastoralists descended from the Afanasievo1,2 or to have originated among the Bactria–Margiana Archaeological Complex3 or Inner Asian Mountain Corridor cultures4. Instead, although Tocharian may have been plausibly introduced to the Dzungarian Basin by Afanasievo migrants during the Early Bronze Age, we find that the earliest Tarim Basin cultures appear to have arisen from a genetically isolated local population that adopted neighbouring pastoralist and agriculturalist practices, which allowed them to settle and thrive along the shifting riverine oases of the Taklamakan Desert.


As part of the Silk Road and located at the geographic confluence of Eastern and Western cultures, the Xinjiang Uyghur Autonomous Region (henceforth Xinjiang) has long served as a major crossroads for trans-Eurasian exchanges of people, cultures, agriculture and languages1,5,6,7,8,9. Bisected by the Tianshan mountains, Xinjiang can be divided into two subregions referred to as North Xinjiang, which contains the Dzungarian Basin, and South Xinjiang, which contains the Tarim Basin (Fig. 1). The Dzungarian Basin in the north consists of the Gurbantünggüt Desert, which is surrounded by a vast expanse of grasslands traditionally inhabited by mobile pastoralists. The southern part of Xinjiang consists of the Tarim Basin, a dry inland sea that now forms the Taklamakan Desert. Although mostly uninhabitable, the Tarim Basin also contains small oases and riverine corridors, fed by runoff from thawing glacier ice and snow from the surrounding high mountains4,10,11.

Fig. 1: Overview of the Xinjiang Bronze Age archaeological sites analysed in this study.

a, Overview of key Eurasian geographic regions, features and archaeological sites discussed in the text; new sites analysed in this study are shown in grey. b, Enhanced view of Xinjiang and the six new sites analysed in this study. c, Timeline of the sites in a. The timeline is organized by region, and the median date for each studied group is shown. The base maps in a and b were obtained from the Natural Earth public domain map dataset. In the group labels, the suffixes represent the archaeological time periods of each group: N, Neolithic; EN, MN and LN, Early, Middle and Late Neolithic, respectively; EN, Eneolithic for Geoksyur, Parkhai and Sarazm; CA, Chalcolithic Age; BA, Bronze Age; MBA, Middle Bronze Age; EIA, Early Iron Age. MA-1, Mal'ta; EHG, Eastern European hunter-gatherers.

Within and around the Dzungarian Basin, pastoralist Early Bronze Age (EBA) Afanasievo (3000–2600 bc) and Chemurchek (or Qiemu’erqieke) (2500–1700 bc)12 sites have been plausibly linked to the Afanasievo herders of the Altai–Sayan region in southern Siberia (3150–2750 bc), who in turn have close genetic ties with the Yamnaya (3500–2500 bc) of the Pontic–Caspian steppe located 3,000 km to the west13,14,15. Linguists have hypothesized that the Afanasievo dispersal brought the now extinct Tocharian branch of the Indo-European language family eastwards, separating it from other Indo-European languages by the third or fourth millennium bc (ref. 14). However, although Afanasievo-related ancestry has been confirmed among Iron Age Dzungarian populations (around 200–400 bc)7, and Tocharian is recorded in Buddhist texts from the Tarim Basin dating to ad 500–1000 (ref. 13), little is known about earlier Xinjiang populations and their possible genetic relationships with the Afanasievo or other groups.

Since the late 1990s, the discovery of hundreds of naturally mummified human remains dating to around 2000 bc to ad 200 in the Tarim Basin has attracted international attention due to their so-called Western physical appearance, their felted and woven woollen clothing, and their agropastoral economy that included cattle, sheep/goats, wheat, barley, millet and even kefir cheese16,17,18,19. Such mummies have now been found throughout the Tarim Basin, among which the earliest are those found in the lowest layers of the cemeteries at Gumugou (2135–1939 bc), Xiaohe (1884–1736 bc) and Beifang (1785–1664 bc) (Fig. 1, Extended Data Fig. 1 and Extended Data Table 1). These and related Bronze Age sites are grouped within the Xiaohe archaeological horizon on the basis of their shared material culture13,16,20.

Multiple contrasting hypotheses have been suggested by scholars to explain the origins and Western elements of the Xiaohe horizon, including the Yamnaya/Afanasievo steppe hypothesis16, the Bactrian oasis hypothesis21 and the Inner Asian Mountain Corridor (IAMC) island biogeography hypothesis4. The Yamnaya/Afanasievo steppe hypothesis posits that the Afanasievo-related EBA populations in the Altai–Sayan mountains spread via the Dzungarian Basin into the Tarim Basin and subsequently founded the agropastoralist communities making up the Xiaohe horizon around 2000 bc (refs. 16,22,23). By contrast, the Bactrian oasis hypothesis posits that the Tarim Basin was initially colonized by migrating farmers of the Bactria–Margiana Archaeological Complex (BMAC) (around 2300–1800 bc) from the desert oases of Afghanistan, Turkmenistan and Uzbekistan via the mountains of Central Asia. Support for this hypothesis is largely based on similarities in the agricultural and irrigation systems between the two regions that reflect adaptations to a desert environment, as well as evidence for the ritual use of Ephedra at both locations3,21. The IAMC island biogeography hypothesis similarly posits a mountain Central Asian origin for the Xiaohe founder population, but one linked to the transhumance of agropastoralists in the IAMC to the west and north of the Tarim Basin4,24,25. In contrast to these three migration models, the greater IAMC, which spans the Hindu Kush to Altai mountains, may have alternatively functioned as a geographic arena through which cultural ideas, rather than populations, primarily moved25.

Recent archaeogenomic research has shown that Bronze Age Afanasievo of southern Siberia and IAMC/BMAC populations of Central Asia have distinguishable genetic profiles15,26, and that these profiles are likewise also distinct from those of pre-agropastoralist hunter-gatherer populations in Inner Asia2,5,7,27,28,29,30. As such, an archaeogenomic investigation of Bronze Age Xinjiang populations presents a powerful approach for reconstructing the population histories of the Dzungarian and Tarim basins and the origins of the Bronze Age Xiaohe horizon. Examining the skeletal material of 33 Bronze Age individuals from sites in the Dzungarian (Nileke, Ayituohan and Songshugou) and Tarim (Xiaohe, Gumugou and Beifang) basins, we successfully retrieved ancient genome sequences from 5 EBA Dzungarian individuals (3000–2800 bc) culturally assigned as Afanasievo, and genome-wide data from 13 Early–Middle Bronze Age (EMBA) Tarim individuals (2100–1700 bc) belonging to the Xiaohe horizon (Extended Data Table 1 and Supplementary Data 1A). We additionally report dental calculus proteomes of seven individuals from basal layers at the site of Xiaohe in the Tarim Basin (Extended Data Table 2). To the best of our knowledge, these individuals represent the earliest human remains excavated to date in the region.

Genetic diversity of the Bronze Age Xinjiang

We obtained genome-wide data for 18 of 33 attempted individuals by either whole-genome sequencing or DNA enrichment for a panel of about 1.2 million single-nucleotide polymorphisms (1,240k panel SNPs) (Supplementary Data 1A). Overall, endogenous DNA was well preserved with minimal levels of contamination (Extended Data Table 1 and Supplementary Data 1A). To explore the genetic profiles of ancient Xinjiang populations, we first calculated the principal components of present-day Eurasian and Native American populations onto which we projected those of ancient individuals. Ancient Xinjiang individuals form several distinct clusters distributed along principal component 1 (PC1) (Fig. 2), the main principal component that separates eastern and western Eurasian populations. EBA Dzungarian individuals from the sites of Ayituohan and Songshugou near the Altai Mountains (Dzungaria_EBA1) fall close to EBA Afanasievo steppe herders from the Altai–Sayan mountains to the north. Genetic clustering with ADMIXTURE further supports this observation (Extended Data Fig. 3). The contemporaneous individuals from the Nileke site near the Tianshan mountains (Dzungaria_EBA2) are slightly shifted along PC1 towards the later Tarim individuals. In contrast to the EBA Dzungarian individuals, the EMBA individuals from the eastern Tarim sites of Xiaohe and Gumugou (Tarim_EMBA1) form a tight cluster close to pre-Bronze Age central steppe and Siberian individuals who share a high level of ancient North Eurasian (ANE) ancestry (for example, Botai_CA). A contemporaneous individual from the Beifang site (Tarim_EMBA2) in the southern Tarim Basin is slightly displaced from the Tarim_EMBA1 towards EBA individuals from the Baikal region.

Fig. 2: Genetic structure of ancient and present-day populations included in this study.

Principal component analysis of ancient individuals projected onto Eurasian and Native American populations; the inset displays ancient individuals projected onto only Eurasian populations.

Afanasievo genetic legacy in Dzungaria

Outgroup f3 statistics support a tight genetic link between the Dzungarian and Tarim groups (Extended Data Fig. 2a). Nevertheless, both of the Dzungarian groups are significantly different from the Tarim groups, showing excess affinity with various western Eurasian populations and sharing fewer alleles with ANE-related groups (Extended Data Fig. 2b, c). To understand this mixed genetic profile, we used qpAdm to explore admixture models of the Dzungarian groups with Tarim_EMBA1 or a terminal Pleistocene individual (AG3) from the Siberian site of Afontova Gora31 as a source (Supplementary Data 1D). AG3 is a distal representative of the ANE ancestry and shows a high affinity with Tarim_EMBA1. Although the Tarim_EMBA1 individuals lived a millennium later than the Dzungarian groups, they are more genetically distant from the Afanasievo than the Dzungarian groups, suggesting that they have a higher proportion of local autochthonous ancestry. Here we define autochthonous to signify a genetic profile that has been present in a region for millennia, rather than being associated with more recently arrived groups.

We find that Dzungaria_EBA1 and Dzungaria_EBA2 are both best described by three-way admixture models (Fig. 3c, Extended Data Table 3 and Supplementary Data 1D) in which they derive a majority ancestry from Afanasievo (about 70% in Dzungaria_EBA1 and about 50% in Dzungaria_EBA2), with the remaining ancestry best modelled as a mixture of AG3/Tarim_EMBA1 (19–36%) and Baikal_EBA (9–21%). When we use Eneolithic and Bronze Age populations from the IAMC as a source, models fail when Afanasievo is not included as a source, and no contribution is allocated to the IAMC groups when Afanasievo is included (Supplementary Data 1D). Thus, Afanasievo ancestry, without IAMC contributions, is sufficient to explain the western Eurasian component of the Dzungarian individuals. We also find that the Chemurchek, an EBA pastoralist culture that succeeds the Afanasievo in both the Dzungarian Basin and Altai Mountains, derive approximately two-thirds of their ancestry from Dzungaria_EBA1 with the remainder from Tarim_EMBA1 and IAMC/BMAC-related sources (Fig. 3, Extended Data Table 3, Supplementary Data 1F and Supplementary Text 5). This helps to explain both the IAMC/BMAC-related ancestry previously noted in Chemurchek individuals30 and their reported cultural and genetic affiliations to Afanasievo groups32. Taken together, these results indicate that the early dispersal of the Afanasievo herders into Dzungaria was accompanied by a substantial level of genetic mixing with local autochthonous populations, a pattern distinct from that of the initial formation of the Afanasievo culture in southern Siberia.

Fig. 3: Genetic ancestry and admixture dating of ancient populations from Xinjiang and its vicinity.

a, qpAdm-based estimates of the ancestry proportion of Dzungaria_EBA and Tarim_EMBA from three ancestry sources (AG3, Afanasievo and Baikal_EBA) (Supplementary Data 1D, E). Unlike Dzungaria_EBA individuals, Tarim_EMBA individuals are adequately modelled without EBA Eurasian steppe pastoralist (for example, Afanasievo) ancestry. b, Genetic admixture dates for key Bronze Age populations in Inner Asia, including Dzungaria_EBA1 (n = 3), Chemurchek (n = 3), Kumsay_EBA (n = 4), Mereke_MBA (n = 2), Dali_EBA (n = 1) and Tarim_EMBA1 (n = 12). The blue shade represents the radiocarbon dating range of the Yamnaya and Afanasievo individuals. The orange circles and the associated vertical bars represent the averages and standard deviations of median radiocarbon dates, respectively. The circles above each orange circle represent the estimated admixture dates with a generation time of 29 years, and the vertical bars represent the sum of standard errors of the admixture date and the radiocarbon date estimate. c, Representative qpAdm-based admixture models of ancient Eurasian groups (Supplementary Data 1D–I). For Dzungaria_EBA1 and Geoksyur_EN, we show their three-way admixture models including Tarim_EMBA1 as a source. For later populations in Xinjiang, IAMC and nearby regions, we used them as sources, and allocated a colour to each of them (blue for Dzungaria_EBA1; magenta for Geoksyur_EN). The base map in c was obtained from the Natural Earth public domain map dataset.

Genetic isolation of the Tarim group

The Tarim_EMBA1 and Tarim_EMBA2 groups, although geographically separated by over 600 km of desert, form a homogeneous population that had undergone a substantial population bottleneck, as suggested by their high genetic affinity without close kinship, as well as by the limited diversity in their uniparental haplogroups (Figs. 1 and 2, Extended Data Fig. 4, Extended Data Table 1, Supplementary Data 1B and Supplementary Text 4). Using qpAdm, we modelled the Tarim Basin individuals as a mixture of two ancient autochthonous Asian genetic groups: the ANE, represented by an Upper Palaeolithic individual from the Afontova Gora site in the upper Yenisei River region of Siberia (AG3) (about 72%), and ancient Northeast Asians, represented by Baikal_EBA (about 28%) (Supplementary Data 1E and Fig. 3a). Tarim_EMBA2 from Beifang can also be modelled as a mixture of Tarim_EMBA1 (about 89%) and Baikal_EBA (about 11%). For both Tarim groups, admixture models unanimously fail when using the Afanasievo or IAMC/BMAC groups as a western Eurasian source (Supplementary Data 1E), thus rejecting a western Eurasian genetic contribution from nearby groups with herding and/or farming economies. We estimate a deep formation date for the Tarim_EMBA1 genetic profile, consistent with an absence of western Eurasian EBA admixture, placing the origin of this gene pool at 183 generations before the sampled Tarim Basin individuals, or 9,157 ± 986 years ago when assuming an average generation time of 29 years (Fig. 3b). Considering these findings together, the genetic profile of the Tarim Basin individuals indicates that the earliest individuals of the Xiaohe horizon belong to an ancient and isolated autochthonous Asian gene pool. This autochthonous ANE-related gene pool is likely to have formed the genetic substratum of the pre-pastoralist ANE-related populations of Central Asia and southern Siberia (Fig. 3c, Extended Data Fig. 2 and Supplementary Text 5).
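The conversion between generations and calendar years quoted above can be checked with a short calculation. The sketch below assumes, following the Fig. 3b legend, that the reported age is the admixture time in years added to the average age of the sampled individuals; the function names are illustrative, and the sample age is back-calculated here rather than taken from the paper.

```python
GENERATION_TIME_YEARS = 29  # generation time stated in the text


def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert an admixture time in generations to years."""
    return generations * generation_time


def implied_sample_age(reported_years_ago, generations,
                       generation_time=GENERATION_TIME_YEARS):
    """Back-calculate the average sample age implied by a reported date."""
    return reported_years_ago - generations_to_years(generations, generation_time)


# 183 generations before the sampled individuals
years_before_samples = generations_to_years(183)  # 5307

# Sample age implied by the reported 9,157 years ago (assumption: simple sum)
sample_age = implied_sample_age(9157, 183)  # 3850
```

Under this reading, the sampled individuals would sit roughly 3,850 years before the reference year, which is in line with the 2100–1700 bc range given for the Tarim individuals.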

Pastoralism in the Tarim Basin

Although the harsh environment of the Tarim Basin may have served as a strong barrier to gene flow into the region, it was not a barrier to the flow of ideas or technologies, as foreign innovations, such as dairy pastoralism and wheat and millet agriculture, came to form the basis of the Bronze Age Tarim economies. Woollen fabrics, horns and bones of cattle, sheep and goats, livestock manure, and milk and kefir-like dairy products have been recovered from the upper layers of the Xiaohe and Gumugou cemeteries33,34,35,36, as have wheat and millet seeds and bundles of Ephedra twigs34,37,38. Famously, many of the mummies dating to 1650–1450 bc were even buried with lumps of cheese35. However, until now it has not been clear whether this pastoralist lifestyle also characterized the earliest layers at Xiaohe.

To better understand the dietary economy of the earliest archaeological periods, we analysed the dental calculus proteomes of seven individuals at the site of Xiaohe dating to around 2000–1700 bc. All seven individuals were strongly positive for ruminant-milk-specific proteins (Extended Data Table 2), including β-lactoglobulin, α-S1-casein and α-lactalbumin (Extended Data Fig. 5), and peptide recovery was sufficient to provide taxonomically diagnostic matches to cattle (Bos), sheep (Ovis) and goat (Capra) milk (Extended Data Fig. 5, Extended Data Table 2 and Supplementary Data 3). These results confirm that dairy products were consumed by individuals of autochthonous ancestry (Tarim_EMBA1) buried in the lowest levels of the Xiaohe cemetery (Extended Data Table 2). Importantly, however, and in contrast to previous hypotheses36, none of the Tarim individuals was genetically lactase persistent (Supplementary Data 1J). Rather, the Tarim mummies contribute to a growing body of evidence that prehistoric dairy pastoralism in Inner and East Asia spread independently of lactase persistence genotypes28,30.


Although human activities in Xinjiang can be traced back to around 40,000 years ago24,39, the earliest evidence for sustained human habitation in the Tarim Basin dates only to the late third to early second millennium bc. There, at the sites of Xiaohe, Gumugou and Beifang, well-preserved mummified human remains buried within wooden coffins and associated with rich organic grave good assemblages represent the earliest known archaeological cultures of the region. Since their initial discovery in the early twentieth century and subsequent large-scale excavations beginning in the 1990s (ref. 16), the Tarim mummies have been at the centre of debates with regard to their origins, their relationship to other Bronze Age steppe (Afanasievo), oasis (BMAC) and mountain (IAMC and Chemurchek) groups, and their potential connection to the spread of Indo-European languages into this region3,4,40.

The palaeogenomic and proteomic data we present here suggest a very different and more complex population history than previously proposed. Although the IAMC may have been a vector for transmitting cultural and economic factors into the Tarim Basin, the known sites from the IAMC do not provide a direct source of ancestry for the Xiaohe populations. Instead, the Tarim mummies belong to an isolated gene pool whose Asian origins can be traced to the early Holocene epoch. This gene pool is likely to have once had a much wider geographic distribution, and it left a substantial genetic footprint in the EMBA populations of the Dzungarian Basin, IAMC and southern Siberia. The Tarim mummies’ so-called Western physical features are probably due to their connection to the Pleistocene ANE gene pool, and their extreme genetic isolation differs from the EBA Dzungarian, IAMC and Chemurchek populations, who experienced substantial genetic interactions with the nearby populations mirroring their cultural links, pointing towards a role of extreme environments as a barrier to human migration.

In contrast to their marked genetic isolation, however, the populations of the Xiaohe horizon were culturally cosmopolitan, incorporating diverse economic elements and technologies with far-flung origins. They made cheese from ruminant milk using a kefir-like fermentation37, perhaps learned from descendants of the Afanasievo, and they cultivated wheat, barley and millet37,41, crops that were originally domesticated in the Near East and northern China and which were introduced into Xinjiang no earlier than 3500 bc (refs. 8,42), probably via their IAMC neighbours24. They buried their dead with Ephedra twigs in a style reminiscent of the BMAC oasis cultures of Central Asia, and they also developed distinctive cultural elements not found among other cultures in Xinjiang or elsewhere, such as boat-shaped wooden coffins covered with cattle hides and marked by timber poles or oars, as well as an apparent preference for woven baskets over pottery43,44. Considering these findings together, it appears that the tightknit population that founded the Xiaohe horizon were well aware of different technologies and cultures outside the Tarim Basin and that they developed their unique culture in response to the extreme challenges of the Taklamakan Desert and its lush and fertile riverine oases4.

This study illuminates in detail the origins of the Bronze Age human populations in the Dzungarian and Tarim basins of Xinjiang. Notably, our results support no hypothesis involving substantial human migration from steppe or mountain agropastoralists for the origin of the Bronze Age Tarim mummies, but rather we find that the Tarim mummies represent a culturally cosmopolitan but genetically isolated autochthonous population. This finding is consistent with earlier arguments that the IAMC served as a geographic corridor and vector for regional cultural interaction that connected disparate populations from the fourth to the second millennium bc (refs. 24,25). While the arrival and admixture of Afanasievo populations in the Dzungarian Basin of northern Xinjiang around 3000 bc may have plausibly introduced Indo-European languages to the region, the material culture and genetic profile of the Tarim mummies from around 2100 bc onwards call into question simplistic assumptions about the link between genetics, culture and language and leave unanswered the question of whether the Bronze Age Tarim populations spoke a form of proto-Tocharian. Future archaeological and palaeogenomic research on subsequent Tarim Basin populations, and most importantly studies of the sites and periods where first millennium ad Tocharian texts have been recovered, will be needed to understand the later population history of the Tarim Basin. Finally, the palaeogenomic characterization of the Tarim mummies has unexpectedly revealed one of the few known Holocene-era genetic descendant populations of the once widespread Pleistocene ANE ancestry profile. The Tarim mummy genomes thus provide a critical reference point for genetically modelling Holocene-era populations and reconstructing the population history of Asia.


Sample provenance

The archaeological human remains studied in this manuscript were excavated by the Xinjiang Institute of Cultural Relics and Archaeology from 1979 to 2017. Scientific investigation of these remains was approved by the Xinjiang Cultural Relics and Archaeology Institute, which holds the custodianship of the studied remains, based on the written agreements.

Radiocarbon dating

Of the 18 individuals reported in this study, 10 were directly dated using accelerator mass spectrometry (AMS) at Beta Analytic, Miami, USA, and/or at Lanzhou University, China. To confirm the reliability of our AMS dating results, 4 out of the 10 individuals were AMS-dated at both Beta Analytic and Lanzhou University. Consistent dates were obtained in all cases (Supplementary Data 1C). The calibration of the dated samples was performed on the basis of the IntCal20 database45 and using the OxCal v.4.4 program46. All of the samples were dated to time periods consistent with those estimated from archaeological stratigraphic layers and excavated grave goods.

DNA laboratory procedures

Ancient DNA work was conducted in dedicated cleanroom laboratory facilities at the ancient DNA laboratories of Jilin University in Changchun and the Institute of Vertebrate Paleontology and Paleoanthropology in Beijing (Extended Data Table 1 and Supplementary Data 1A). For the 33 individuals initially screened in this study, approximately 50 mg of dentine or bone powder was obtained per individual from either teeth or bones. DNA was extracted following established protocols47 with slight modifications. A subset of DNA extracts (n = 16) was subjected to a partial uracil-specific excision reagent repair following the methods described in ref. 48 (Extended Data Table 1 and Supplementary Data 1A). All 33 DNA extracts were built into double-stranded dual-index Illumina libraries. Libraries that were prepared in Jilin (n = 26) were directly shotgun sequenced on an Illumina HiSeq X10 or HiSeq 4000 instrument using 2 × 150-base-pair (bp) chemistry, and those with endogenous human DNA higher than 10% (n = 12) were sent for deeper sequencing. One of the 12 individuals (XHBM1) was later excluded from this study owing to high modern human DNA contamination (Supplementary Data 1A). For libraries prepared at the Institute of Vertebrate Paleontology and Paleoanthropology, samples with 0.1% or more human DNA from the initial screening (n = 7) were further enriched for approximately 1.2 million nuclear SNPs and then deeper sequenced on an Illumina HiSeq 4000 instrument using 2 × 150-bp chemistry. Together, a total of 18 individuals yielded sufficient high-quality ancient genomic data for downstream analyses (Extended Data Table 1).

DNA sequence data processing

Raw read data were processed with EAGER v.1.92.55 (ref. 49), a pipeline specially designed for processing ancient DNA sequence data. Specifically, raw reads were trimmed for Illumina adaptor sequences, and overlapping pairs were collapsed into single reads using AdapterRemoval v.2.2.0 (ref. 50). Merged reads were mapped to the human reference genome (hs37d5; GRCh37 with decoy sequences) using the aln/samse programs in BWA v.0.7.12 (ref. 51). PCR duplicates were removed using DeDup v.0.12.2 (ref. 49). To minimize the effect of postmortem DNA damage on genotyping, we trimmed BAM files generated from samples treated (n = 11) or not (n = 7) with uracil DNA glycosylase (UDG) by soft-masking up to 10 bp on both ends of each read using the trimbam function in bamUtils v.1.0.13 (ref. 52) on the basis of the DNA misincorporation pattern per library tabulated using mapDamage v.2.0.9 (ref. 53). For each SNP in the 1,240k panel, a single base from a high-quality read (base and mapping quality score 30 or higher) was randomly sampled to represent a pseudo-diploid genotype using pileupCaller under the random haploid calling mode (--randomHaploid). For the transition SNPs (C/T and G/A), trimmed BAM files were used. For the transversion SNPs, BAM files without trimming were used.
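The random haploid calling step can be illustrated with a minimal sketch. It is not the pileupCaller implementation; the function name and the tuple layout of the pileup are assumptions made for illustration, and only the quality threshold of 30 comes from the text.

```python
import random

MIN_QUALITY = 30  # base and mapping quality threshold stated in the text


def call_pseudo_haploid(pileup, rng, min_quality=MIN_QUALITY):
    """Pick one base at random from the reads passing both quality filters.

    pileup: list of (base, base_quality, mapping_quality) tuples for one SNP
            (hypothetical layout, for illustration only).
    Returns the sampled base, or None when no read passes the filters.
    """
    passing = [base for base, base_q, map_q in pileup
               if base_q >= min_quality and map_q >= min_quality]
    if not passing:
        return None
    return rng.choice(passing)


rng = random.Random(0)  # seeded so the sketch is repeatable
pileup = [("A", 35, 37), ("G", 12, 37), ("A", 40, 20), ("G", 33, 60)]
call = call_pseudo_haploid(pileup, rng)  # one of the two passing bases
```

Because a single read is drawn per site, the resulting genotype carries one allele per SNP regardless of coverage, which keeps low-coverage and high-coverage individuals comparable.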

Ancient DNA authentication

We assessed the authenticity of our ancient DNA data as follows. First, we computed the proportion of C-to-T deamination errors at both the 5′ and 3′ ends of the sequencing reads, and found that all samples exhibited postmortem damage patterns characteristic of ancient DNA (Supplementary Data 1A). We then estimated mitochondrial DNA contamination for all individuals using the Schmutzi v.1.5.1 program54. To do this, we mapped adapter-trimmed reads to a 500-bp-extended revised Cambridge Reference Sequence (rCRS) of the human mitochondrial genome (NC_012920.1) to preserve reads passing through the origin, and then wrapped up the alignment to the regular rCRS with the circularmapper v.1.1 (ref. 49). We successively ran the contDeam and schmutzi modules in the schmutzi program against the worldwide allele frequency database of 197 individuals to estimate the mitochondrial DNA contamination rate. Last, we estimated the nuclear contamination rate in males using ANGSD v.0.910 (ref. 55), on the basis of the principle that males have only a single copy of the X chromosome, and thus contamination will introduce extra mismatches among reads in SNP sites but not in the flanking monomorphic sites.

DNA reference datasets

We compared the genome sequences of our ancient individuals to two sets of worldwide genotype panels, one based on the Affymetrix Axiom Genome-wide Human Origins 1 array (HumanOrigins; 593,124 autosomal SNPs)56,57,58 and the other on the 1,240k dataset (1,233,013 autosomal SNPs including all of the HumanOrigins SNPs)59. We augmented both datasets by adding the Simons Genome Diversity Panel60 and published ancient genomes (Supplementary Data 2A).

Genetic relatedness analysis

We used the pairwise mismatch rate (pmr)61 and lcMLkin v.0.5.0 (ref. 62) to determine the genetic relatedness between ancient individuals. We calculated pmr for all pairs of ancient individuals in this study using the autosomal SNPs in the 1,240k panel and kept individual pairs with at least 8,000 SNPs covered by both to remove noisy estimates from low-coverage samples. We used lcMLkin to validate our observation in pmr analysis and to distinguish between parent–offspring and full sibling pairs.
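As an illustration of the pmr calculation, the following sketch counts mismatching alleles over the SNPs covered in both individuals. It assumes pseudo-haploid genotypes stored as one allele per SNP with None for missing data; the function name and data layout are hypothetical, and only the 8,000-SNP minimum comes from the text.

```python
MIN_SHARED_SNPS = 8000  # minimum overlap stated in the text


def pairwise_mismatch_rate(genotypes_a, genotypes_b, min_shared=MIN_SHARED_SNPS):
    """Return (pmr, n_shared) for two pseudo-haploid genotype vectors.

    Alleles are compared only at SNPs covered in both individuals.
    pmr is None when fewer than min_shared SNPs overlap.
    """
    shared = 0
    mismatches = 0
    for allele_a, allele_b in zip(genotypes_a, genotypes_b):
        if allele_a is None or allele_b is None:
            continue  # skip SNPs missing in either individual
        shared += 1
        if allele_a != allele_b:
            mismatches += 1
    if shared < min_shared:
        return None, shared
    return mismatches / shared, shared


# Toy example with a lowered threshold so the tiny input is accepted
ind_a = ["A", "C", None, "G", "T"]
ind_b = ["A", "T", "G", "G", None]
pmr, n_shared = pairwise_mismatch_rate(ind_a, ind_b, min_shared=3)
```

Closely related pairs share more alleles and therefore show a lower pmr than unrelated pairs from the same population, which is why the unrelated baseline is usually estimated from the population itself.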

Uniparental haplogroup assignment

We aligned the adapter-trimmed reads to the rCRS NC_012920.1, and then generated the mitochondrial consensus sequence of each ancient individual using Geneious software v.11.1.3 (ref. 63). We assigned each consensus sequence into a specific haplogroup using HaploGrep2 (ref. 64). For the Y chromosome, we used lineage-informative SNPs from the International Society of Genetic Genealogy 2016 tree. For these SNPs, we called each individual’s genotype using bcftools v.1.7 (ref. 51) mpileup and call modules, after removing reads with mapping quality score < 30 (-q 30) and bases with quality score < 30 (-Q 30). We subsequently removed all heterozygous genotype calls. Then we assigned each individual to a specific Y haplogroup by manually comparing the genotype calls with the International Society of Genetic Genealogy SNPs. Before variant calling, we filtered alignment data using the pysam library v.0.15.2 to reduce false positive variants due to postmortem damage and modern human contamination. We kept an observed base only if it was from a read shorter than 100 bp and the base was more than 10 bp away from the read ends. For transition SNPs, we further removed aligned bases if they were from a read with no postmortem damage pattern (that is, no C-to-T or G-to-A substitution). We determined each individual’s Y haplogroup primarily on the basis of the transversion SNPs and additionally considered transitions if transversions were insufficient.
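The base-level filter described above can be sketched in plain Python. This is not the authors' pysam script; the function name and arguments are hypothetical, and the sketch reads "more than 10 bp away from the read ends" as skipping the first and last 10 bases of each read, which is one possible interpretation.

```python
MAX_READ_LENGTH = 100  # reads must be shorter than this (from the text)
END_MASK = 10          # bases ignored at each read end (assumed reading)


def keep_base(read_length, index, is_transition_snp=False, read_has_damage=True):
    """Decide whether an aligned base is kept for Y-chromosome genotyping.

    index is the 0-based position of the base within the read.
    read_has_damage marks reads carrying a C-to-T or G-to-A substitution.
    """
    if read_length >= MAX_READ_LENGTH:
        return False  # long reads are more likely to be modern contaminants
    if index < END_MASK or read_length - 1 - index < END_MASK:
        return False  # read ends are enriched for deamination damage
    if is_transition_snp and not read_has_damage:
        return False  # transitions need evidence that the read is ancient
    return True


inner_base_kept = keep_base(60, 30)           # True
end_base_kept = keep_base(60, 5)              # False
long_read_kept = keep_base(120, 50)           # False
undamaged_kept = keep_base(60, 30, True, False)  # False
```

The three conditions address different error sources: read length and the damage requirement target modern contamination, whereas the end mask targets damage-induced miscalls.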

Population genetic analysis

We performed principal component analysis as implemented in smartpca v.16000 (ref. 65) using a set of 2,077 present-day Eurasian individuals from the HumanOrigins dataset (Supplementary Data 2B) with the options ‘lsqproject: YES’ and ‘shrinkmode: YES’. The unsupervised admixture analysis was performed with ADMIXTURE v.1.3.0 (ref. 66). For ADMIXTURE, we removed genetic markers with minor allele frequency lower than 1% and pruned for linkage disequilibrium using the --indep-pairwise 200 25 0.2 option in PLINK v.1.90 (ref. 67). We used outgroup f3 statistics68 to obtain a measurement of genetic relationship of the target population to a set of the Eurasian populations since their divergence from an African outgroup. We calculated f4 statistics with the ‘f4mode: YES’ option in the ADMIXTOOLS package58. f3 and f4 statistics were calculated using qp3Pop v.435 and qpDstat v.755 in the ADMIXTOOLS package.
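To make the outgroup f3 statistic concrete, the sketch below computes its naive form, the average over SNPs of (o − a)(o − b), where o, a and b are allele frequencies in the outgroup and the two test populations. It is a simplification under stated assumptions: qp3Pop additionally applies a finite-sample correction and a block jackknife for standard errors, neither of which is reproduced here, and the function name is illustrative.

```python
def outgroup_f3(freq_outgroup, freq_a, freq_b):
    """Naive outgroup f3: mean of (o - a) * (o - b) across SNPs.

    Each argument is a sequence of allele frequencies, one per SNP,
    in the same SNP order. Larger values indicate more genetic drift
    shared by populations A and B since their split from the outgroup.
    """
    terms = [(o - a) * (o - b)
             for o, a, b in zip(freq_outgroup, freq_a, freq_b)]
    if not terms:
        raise ValueError("no SNPs supplied")
    return sum(terms) / len(terms)


# Toy allele frequencies for two SNPs (made-up values for illustration)
f3_value = outgroup_f3([0.0, 1.0], [0.5, 0.5], [0.5, 0.0])  # 0.375
```

In the toy input, the two per-SNP terms are 0.25 and 0.5, giving a mean of 0.375.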

Runs of homozygosity

We characterized whether the Bronze Age Xinjiang individuals descended from genetically related parents by estimating the runs of homozygosity (ROH). ROH refers to segments of the genome where the two chromosomes in an individual are identical to each other owing to recent common ancestry. Therefore, the presence of long ROH segments strongly suggests that an individual’s parents are related. We applied the hapROH method69 using the Python library hapROH v.0.3a4 with default parameters. The method was developed to identify ROH from low-coverage genotype data typical of ancient DNA and is still robust enough to identify ROH for individuals with a coverage down to 0.5× (ref. 69). We reported the total sum of ROH longer than 4, 8, 12 and 20 cM, and visualized the results using DataGraph v.4.5.1.
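The reporting step at the end of this paragraph, summing ROH above several length cut-offs, can be sketched as follows. The function name and the segment lengths are hypothetical; the input is assumed to be a list of ROH segment lengths in cM that has already been produced by an ROH caller, so no hapROH call is shown.

```python
def summarize_roh(segment_lengths_cm, thresholds=(4, 8, 12, 20)):
    """Return {threshold: total length (cM) of ROH segments longer than it}."""
    return {
        t: sum(length for length in segment_lengths_cm if length > t)
        for t in thresholds
    }


# Hypothetical ROH segments (cM) for one individual.
segments = [3.0, 5.0, 9.0, 13.0, 25.0]
print(summarize_roh(segments))
```

Because the cut-offs are nested, each total includes the segments counted under every higher cut-off.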

Genetic admixture modelling with qpAdm

We modelled our ancient Xinjiang populations using the qpWave/qpAdm programs (qpWave v.410 (ref. 70) and qpAdm v.810 (ref. 57)). We used the following eight populations in the 1,240k dataset as the base set of outgroups (base) unless explicitly stated otherwise: Mbuti (n = 5), Natufian (n = 6), Onge (n = 2), Iran_N (n = 5), Villabruna (n = 1), Mixe (n = 3), Ami (n = 2), Anatolia_N (n = 23). This set includes an African outgroup (Mbuti), early Holocene Levantine hunter-gatherers (Natufian), Andamanese islanders (Onge), early Neolithic Iranians from the Tepe Ganj Dareh site (Iran_N), late Pleistocene Western European hunter-gatherers (Villabruna), Central Native Americans (Mixe), an indigenous group native to Taiwan (Ami) and Neolithic farmers from Anatolia (Anatolia_N). To compare competing models, we also took a ‘rotating’ approach, where we reciprocally added a source from a model to outgroups for a competing model. We specified which outgroups are used for all qpAdm models.
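The ‘rotating’ comparison can be illustrated by building the outgroup list for each competing model: the candidate source goes into the model, and its rivals are appended to the base outgroups. This is a sketch of the bookkeeping only, assuming a hypothetical helper `rotating_models`. The target and source labels in the usage example are taken from population names in this paper but are chosen purely for illustration, and no qpAdm run is performed.

```python
BASE_OUTGROUPS = ["Mbuti", "Natufian", "Onge", "Iran_N",
                  "Villabruna", "Mixe", "Ami", "Anatolia_N"]


def rotating_models(target, shared_sources, competing_sources, base=BASE_OUTGROUPS):
    """Build one model per competing source, moving its rivals to the outgroups."""
    models = []
    for candidate in competing_sources:
        rivals = [s for s in competing_sources if s != candidate]
        models.append({
            "target": target,
            "sources": list(shared_sources) + [candidate],
            "outgroups": list(base) + rivals,
        })
    return models


# Illustrative only: two competing second sources for one target.
for model in rotating_models("Dzungaria_EBA1", ["Afanasievo"],
                             ["Tarim_EMBA1", "Baikal_EBA"]):
    print(model["sources"], "| extra outgroup:", model["outgroups"][-1])
```

A model that still fits when its rival sits among the outgroups is favoured over one that fails under the same test.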

Admixture dating with DATES

We used DATES v.753 (ref. 26) for the dating of admixture events of the ancient populations with the pseudo-haploid genotype data under the simplified assumption that gene flow occurred as a single event, and assuming a generation time of 29 years (ref. 58). The DATES software measures the decay of ancestry covariance to infer the admixture time and estimates jackknife standard errors. In the parameter file for running DATES, we used the options binsize: 0.001, maxdis: 0.5, runmode: 1, qbin: 10 and lovalfit: 0.45 in every run on the pseudo-haploid genotype data. For each target population, we chose a pair of reference populations that we identified as good sources in the qpAdm analysis. In cases in which the qpAdm source had limited sample size or SNP coverage, we chose an alternative that had a similar genetic profile to the qpAdm source but with better data quality to enhance the statistical power of the DATES analysis (Supplementary Data 1D–G). For Dzungaria_EBA1 and Chemurchek, we used the Afanasievo (n = 20) and Baikal_EBA (n = 9) as the references. For Kumsay_EBA and Mereke_MBA, we used the Afanasievo (n = 20) and Baikal_EN (n = 15). For Dali_EBA, we used Tarim_EMBA1 (n = 12) and Baikal_EBA (n = 9). For Tarim_EMBA1, we used West_Siberia_N (n = 3) and DevilsCave_N (n = 4).
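An admixture time inferred in generations is converted to years with the 29-year generation time stated above. The sketch below shows that arithmetic only. The function name, the input values and the choice to express the result as a BCE date relative to the age of the sampled individuals are assumptions made for illustration, and uncertainty in the sample age is ignored.

```python
def admixture_date_bce(generations, se_generations, sample_age_bce,
                       years_per_generation=29):
    """Convert an admixture time in generations before the sample to a BCE date.

    Returns (date in years BCE, standard error in years).
    """
    years_before_sample = generations * years_per_generation
    se_years = se_generations * years_per_generation
    return sample_age_bce + years_before_sample, se_years


# Hypothetical estimate: 10 +/- 2 generations before individuals dated to 2000 BCE.
print(admixture_date_bce(10, 2, 2000))
```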

Protein extraction, digestion and liquid chromatography with tandem mass spectrometry

Total protein extractions were performed on dental calculus obtained from seven Xiaohe individuals excavated from layers 4 and 5 (Extended Data Table 2). Only individuals with calculus deposits >5 mg were analysed, and 5–10 mg of dental calculus was processed for each sample. Samples were extracted and digested using a filter-aided sample preparation, following decalcification in 0.5 M EDTA (ref. 71). Extracted peptides were analysed by liquid chromatography with tandem mass spectrometry (MS/MS) using a Q-Exactive mass spectrometer (Thermo Scientific) coupled to an ACQUITY UPLC M-Class system (Waters AG) according to previously described protocols28. Potential contamination and sample carryover were monitored through the use of extraction blanks as well as injection blanks between each sample.

Protein database searching

Tandem mass spectra were converted to Mascot generic files by MSConvert version 3.0.11781 using the 100 most intense MS/MS peaks. All MS/MS samples were analysed using Mascot (Matrix Science; v.2.6.0). Mascot was set up to search the SwissProt Release 2019_08 database (560,823 entries) assuming the digestion enzyme trypsin. Mascot was searched with a fragment ion mass tolerance of 0.050 Da and a parent ion tolerance of 10.0 ppm. Carbamidomethylation of cysteine was specified in Mascot as a fixed modification. Deamidation of asparagine and glutamine and oxidation of methionine and proline were specified in Mascot as variable modifications. A subset of samples were analysed in duplicate (Supplementary Data 3), and the results were combined using multidimensional protein identification technology (MudPIT) before analysis.
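The two search tolerances differ in kind: the parent ion tolerance is relative (parts per million), whereas the fragment ion tolerance is absolute (Da). A minimal sketch of both checks is given below. The function names and masses are hypothetical and the code is not part of Mascot.

```python
def within_ppm(observed_mass, theoretical_mass, tolerance_ppm=10.0):
    """Relative tolerance check, as used for the parent ion (10.0 ppm)."""
    error_ppm = abs(observed_mass - theoretical_mass) / theoretical_mass * 1e6
    return error_ppm <= tolerance_ppm


def within_da(observed_mz, theoretical_mz, tolerance_da=0.050):
    """Absolute tolerance check, as used for fragment ions (0.050 Da)."""
    return abs(observed_mz - theoretical_mz) <= tolerance_da


# Hypothetical masses: a 0.005 Da error on a 1,000 Da precursor is about 5 ppm.
print(within_ppm(1000.005, 1000.0), within_da(500.03, 500.0))
```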

Criteria for protein identification

MS/MS-based protein and peptide identifications were validated using Scaffold (version Scaffold_4.9.0, Proteome Software). Peptide identifications were accepted if they could be established at greater than 86.0% probability to achieve a false discovery rate (FDR) less than 1.0% by the Peptide Prophet algorithm71 with Scaffold delta-mass correction. Protein identifications were accepted if they could be established at an FDR of less than 5.0% and contained at least two unique peptides. Final protein and peptide FDRs were 1.8% and 0.99%, respectively. Protein probabilities were assigned by the Protein Prophet algorithm72. After establishing the presence of the milk proteins β-lactoglobulin and α-S1-casein using these criteria, we expanded our analysis to accept further milk proteins identified on the basis of single peptides for high-scoring PSMs (>60), which resulted in the additional identification of α-lactalbumin. Proteins that contained similar peptides that could not be differentiated on the basis of MS/MS analysis alone were grouped to satisfy the principles of parsimony. All samples yielded proteomes typical of a dental calculus oral microbiome, and damage-associated modifications (N and Q deamidation) characteristic of ancient proteins were observed (Supplementary Data 3).
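The two-stage acceptance rule (peptide probability above 86.0%, then at least two unique peptides per protein) can be sketched as a simple filter. The function name, peptide placeholders and probabilities below are invented for illustration. The protein-level FDR threshold and the later single-peptide extension for high-scoring PSMs are not modelled.

```python
def accepted_proteins(peptide_hits, min_peptide_probability=0.86,
                      min_unique_peptides=2):
    """Return proteins supported by enough unique, accepted peptides.

    peptide_hits: iterable of (protein, peptide_sequence, probability).
    """
    unique = {}
    for protein, peptide, probability in peptide_hits:
        # Stage 1: accept the peptide only above the probability threshold.
        if probability > min_peptide_probability:
            unique.setdefault(protein, set()).add(peptide)
    # Stage 2: require at least two unique accepted peptides per protein.
    return sorted(p for p, peps in unique.items()
                  if len(peps) >= min_unique_peptides)


# Toy hits with placeholder peptide labels (not real sequences).
hits = [
    ("beta-lactoglobulin", "PEP_A", 0.99),
    ("beta-lactoglobulin", "PEP_B", 0.95),
    ("beta-lactoglobulin", "PEP_A", 0.90),   # repeat of PEP_A, counted once
    ("alpha-S1-casein", "PEP_C", 0.97),
    ("alpha-S1-casein", "PEP_D", 0.92),
    ("alpha-S1-casein", "PEP_E", 0.50),      # below the peptide threshold
    ("alpha-lactalbumin", "PEP_F", 0.99),    # single peptide only
]
print(accepted_proteins(hits))
```

Under this rule a protein seen through a single peptide is rejected, which is why single-peptide identifications need a separate, stricter score criterion.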

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

The DNA sequences reported in this paper have been deposited in the European Nucleotide Archive under the accession number PRJEB46875. Haploid genotype data of ancient individuals in this study on the 1,240k panel are available in the EIGENSTRAT format. The protein spectra have been deposited in the ProteomeXchange Consortium via the PRIDE partner repository under the accession number PXD027706. The publicly available database SwissProt release 2019_08 is accessible through the UniProt Knowledge Base. The basemaps used in Figs. 1 and 3 are in the public domain and accessible through the Natural Earth website.

Code availability

All of the analyses performed in this study are based on publicly available software programs. Specific version information and non-default arguments are described in the Methods.

References


  1. Peyrot, M. in Aspects of Globalisation: Mobility, Exchange and the Development of Multi-Cultural States 12–17 (2017).

  2. Damgaard, P. et al. 137 ancient human genomes from across the Eurasian steppes. Nature 557, 369–374 (2018).

  3. Hemphill, B. E. & Mallory, J. P. Horse-mounted invaders from the Russo-Kazakh steppe or agricultural colonists from western Central Asia? A craniometric investigation of the Bronze Age settlement of Xinjiang. Am. J. Phys. Anthropol. 124, 199–222 (2004).

  4. Betts, A., Jia, P. & Abuduresule, I. A new hypothesis for early Bronze Age cultural diversity in Xinjiang, China. Archaeol. Res. Asia 17, 204–213 (2019).

  5. Li, C. et al. Evidence that a West-East admixed population lived in the Tarim Basin as early as the early Bronze Age. BMC Biol. 8, 15 (2010).

  6. Li, C. et al. Analysis of ancient human mitochondrial DNA from the Xiaohe cemetery: insights into prehistoric population movements in the Tarim Basin, China. BMC Genet. 16, 78 (2015).

  7. Ning, C. et al. Ancient genomes reveal Yamnaya-related ancestry and a potential source of Indo-European speakers in Iron Age Tianshan. Curr. Biol. 29, 2526–2532 (2019).

  8. Zhou, X. et al. 5,200-year-old cereal grains from the eastern Altai Mountains redate the trans-Eurasian crop exchange. Nat. Plants 6, 78–87 (2020).

  9. Wang, T. et al. Tianshanbeilu and the isotopic millet road: reviewing the late Neolithic/Bronze Age radiation of human millet consumption from north China to Europe. Natl Sci. Rev. 6, 1024–1039 (2019).

  10. Zhang, Y. et al. Holocene environmental changes around Xiaohe Cemetery and its effects on human occupation, Xinjiang, China. J. Geogr. Sci. 27, 752–768 (2017).

  11. Hong, Z., Jian-Wei, W., Qiu-Hong, Z. & Yun-Jiang, Y. A preliminary study of oasis evolution in the Tarim Basin, Xinjiang, China. J. Arid Environ. 55, 545–553 (2003).

  12. Jia, P. & Betts, A. A re-analysis of the Qiemu’erqieke (Shamirshak) cemeteries, Xinjiang, China. J. Indo-Eur. Stud. 38, 275–317 (2010).

  13. Peyrot, M. The deviant typological profile of the Tocharian branch of Indo-European may be due to Uralic substrate influence. Indo-Eur. Linguist. 7, 72–121 (2019).

  14. Bouckaert, R. et al. Mapping the origins and expansion of the Indo-European language family. Science 337, 957–960 (2012).

  15. Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).

  16. Mallory, J. P. & Mair, V. H. The Tarim Mummies: Ancient China and the Mystery of the Earliest Peoples from the West (Thames & Hudson, 2000).

  17. Barber, E. W. Mummies of Urumchi (W. W. Norton & Co., 1999).

  18. Mair, V. H. Prehistoric Caucasoid corpses of the Tarim Basin. J. Indo-Eur. Stud. 23, 281–307 (1995).

  19. Mair, V. H. in The Bronze Age and Early Iron Age Peoples of Eastern Central Asia Vol. 2 835–855 (Institute for the Study of Man and the University of Pennsylvania Museum, 1998).

  20. Mallory, J. P. The Problem of Tocharian Origins: an Archaeological Perspective (Univ. Pennsylvania Press, 2015).

  21. Chen, K. & Hiebert, F. T. The late prehistory of Xinjiang in relation to its neighbors. J. World Prehist. 9, 243–300 (1995).

  22. Han, K. Craniometric study on the ancient individuals from the Gumugou site, Xinjiang (in Chinese). Kaogu Xuebao 361–384 (1986).

  23. Kuzmina, E. E. in Archeology, Migration and Nomadism, Linguistics Vol. 1 63–93 (Univ. Pennsylvania Museum Publications, 1998).

  24. Li, Y. Agriculture and palaeoeconomy in prehistoric Xinjiang, China (3000–200 BC). Veg. Hist. Archaeobot. 30, 287–303 (2021).

  25. Frachetti, M. D. Multiregional emergence of mobile pastoralism and nonuniform institutional complexity across Eurasia. Curr. Anthropol. 53, 2–38 (2012).

  26. Narasimhan, V. M. et al. The formation of human populations in South and Central Asia. Science 365, eaat7487 (2019).

  27. Feng, Q. et al. Genetic history of Xinjiang’s Uyghurs suggests Bronze Age multiple-way contacts in Eurasia. Mol. Biol. Evol. 34, 2572–2582 (2017).

  28. Jeong, C. et al. Bronze Age population dynamics and the rise of dairy pastoralism on the eastern Eurasian steppe. Proc. Natl Acad. Sci. USA 115, E11248–E11255 (2018).

  29. Yu, H. et al. Paleolithic to Bronze Age Siberians reveal connections with first Americans and across Eurasia. Cell 181, 1232–1245 (2020).

  30. Jeong, C. et al. A dynamic 6,000-year genetic history of Eurasia’s Eastern Steppe. Cell 183, 890–904 (2020).

  31. Fu, Q. et al. The genetic history of Ice Age Europe. Nature 534, 200–205 (2016).

  32. Wang, C.-C. et al. Genomic insights into the formation of human populations in East Asia. Nature 591, 413–419 (2021).

  33. Li, J.-F. et al. Buried in sands: environmental analysis at the archaeological site of Xiaohe Cemetery, Xinjiang, China. PLoS ONE 8, e68957 (2013).

  34. Qiu, Z. et al. Paleo-environment and paleo-diet inferred from Early Bronze Age cow dung at Xiaohe Cemetery, Xinjiang, NW China. Quat. Int. 349, 167–177 (2014).

  35. Yang, Y. et al. Proteomics evidence for kefir dairy in Early Bronze Age China. J. Archaeol. Sci. 45, 178–186 (2014).

  36. Xie, M. et al. Identification of a dairy product in the grass woven basket from Gumugou Cemetery (3800 BP, northwestern China). Quat. Int. 426, 158–165 (2016).

  37. Yang, R. et al. Investigation of cereal remains at the Xiaohe Cemetery in Xinjiang, China. J. Archaeol. Sci. 49, 42–47 (2014).

  38. Zhang, G. et al. Ancient plant use and palaeoenvironmental analysis at the Gumugou Cemetery, Xinjiang, China: implication from desiccated plant remains. Archaeol. Anthropol. Sci. 9, 145–152 (2017).

  39. Yu, J. & He, J. Significant discoveries from the excavation of Jimunai Tongtiandong site (in Chinese). Wenwubao 8 (2017).

  40. Hollard, C. et al. New genetic evidence of affinities and discontinuities between Bronze Age Siberian populations. Am. J. Phys. Anthropol. 167, 97–107 (2018).

  41. Li, C. et al. Ancient DNA analysis of desiccated wheat grains excavated from a Bronze Age cemetery in Xinjiang. J. Archaeol. Sci. 38, 115–119 (2011).

  42. Stevens, C. J. & Fuller, D. Q. The spread of agriculture in eastern Asia: archaeological bases for hypothetical farmer/language dispersals. Lang. Dyn. Change 7, 152–186 (2017).

  43. Abuduresule, I. Archaeological report of Xiaohe cemetery of 2003 (in Chinese). Wenwu 4–42 (2007).

  44. Abuduresule, Y., Li, W. & Hu, X. in The Cultures of Ancient Xinjiang, Western China: Crossroads of the Silk Roads 19–51 (Archaeopress, 2019).

  45. Reimer, P. J. et al. The IntCal20 Northern Hemisphere radiocarbon age calibration curve (0–55 cal kBP). Radiocarbon 62, 725–757 (2020).

  46. Ramsey, C. B. Methods for summarizing radiocarbon datasets. Radiocarbon 59, 1809–1833 (2017).

  47. Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, 15758–15763 (2013).

  48. Rohland, N., Harney, E., Mallick, S., Nordenfelt, S. & Reich, D. Partial uracil–DNA–glycosylase treatment for screening of ancient DNA. Philos. Trans. R. Soc. B 370, (2015).

  49. Peltzer, A. et al. EAGER: efficient ancient genome reconstruction. Genome Biol. 17, 60 (2016).

  50. Schubert, M., Lindgreen, S. & Orlando, L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res. Notes 9, (2016).

  51. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  52. Jun, G., Wing, M. K., Abecasis, G. R. & Kang, H. M. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Res. 25, 918–925 (2015).

  53. Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684 (2013).

  54. Renaud, G., Slon, V., Duggan, A. T. & Kelso, J. Schmutzi: estimation of contamination and endogenous mitochondrial consensus calling for ancient DNA. Genome Biol. 16, 224 (2015).

  55. Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics 15, 356 (2014).

  56. Jeong, C. et al. The genetic history of admixture across inner Eurasia. Nat. Ecol. Evol. 3, 966–976 (2019).

  57. Lazaridis, I. et al. Genomic insights into the origin of farming in the ancient Near East. Nature 536, 419–424 (2016).

  58. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).

  59. Fu, Q. et al. An early modern human from Romania with a recent Neanderthal ancestor. Nature 524, 216–219 (2015).

  60. Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).

  61. Kennett, D. J. et al. Archaeogenomic evidence reveals prehistoric matrilineal dynasty. Nat. Commun. 8, 14115 (2017).

  62. Lipatov, M., Sanjeev, K., Patro, R. & Veeramah, K. R. Maximum likelihood estimation of biological relatedness from low coverage sequencing data. Preprint at (2015).

  63. Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).

  64. Weissensteiner, H. et al. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 44, W58–W63 (2016).

  65. Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).

  66. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

  67. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, (2015).

  68. Raghavan, M. et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505, 87–91 (2014).

  69. Ringbauer, H., Novembre, J. & Steinrücken, M. Detecting runs of homozygosity from low-coverage ancient DNA. Preprint at (2020).

  70. Reich, D. et al. Reconstructing Native American population history. Nature 488, 370–374 (2012).

  71. Keller, A., Nesvizhskii, A. I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).

  72. Nesvizhskii, A. I., Keller, A., Kolker, E. & Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646–4658 (2003).

Acknowledgements

We thank Xinjiang Institute of Cultural Relics and Archaeology and Renmin University of China for providing such valuable samples for study; Lanzhou University for providing the AMS dating results; K. Wang, H. Yu and G. A. Gnecchi-Ruscone for helpful comments on the genetic landscapes of the Eurasian steppe; and T. Hermes and R. Flad for helpful comments on the broader archaeological context of the region. This work was supported by the National Key R&D Program of China (grant numbers 2016YFE0203700 and 2018YFA0606402), the National Natural Science Foundation of China (grant numbers 42072018 and 41925009), the Fundamental Research Funds for the Central Universities, the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement numbers 804884-DAIRYCULTURES and 646612-Eurasia3angle), the Humanities and Social Sciences Key Research Base of the Ministry of Education (16JJD780005), the National Research Foundation of Korea grant funded by the Korean Government (MSIT; 2020R1C1C1003879) and the Max Planck Society.


Open access funding provided by Max Planck Society.

Author information

Contributions

Y.C., C.J., C.W., C.N. and J.K. conceived and supervised the study. F.Z., A.S., L.F., P.C., R.Y., F.L. and Q.D. performed the laboratory work. Q.F., D.W., W.L., X.H., Q.R., I.A., C.L., S.G., Y.X., S. Wu, S. Wen, H. Zhu, H. Zhou and A.N. provided archaeological materials and associated information. R.B. and M.R. provided the linguistic background and G.D. and Z.T. assisted with the AMS dating. C.N., F.Z., A.S., C.W., C.J., Q.F., P.M., X.F., W.W. and V.K. analysed data. C.N., C.W., C.J., Y.C., F.Z. and A.S. wrote the manuscript with input from all coauthors.

Corresponding authors

Correspondence to Chao Ning, Johannes Krause, Christina Warinner, Choongwon Jeong or Yinqiu Cui.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Paula Dupuy, Michael Frachetti and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Burial goods excavated from the Xiaohe cemetery.

A, a wooden sculpture excavated from the upper layer of a double-layer mud coffin of XHM75. B, an oar-plank placed in front of a male burial. C, a wooden pole placed in front of a female burial. D, Burial XHM66 from layer 4 of the Xiaohe cemetery illustrating typical features of early burials, including boat-shaped coffins and mummified remains dressed in woollen garments. This burial style is common at Bronze Age cemeteries throughout the Tarim Basin, including Beifang and Gumugou. E, Side view of the Xiaohe cemetery showing wooden grave markers and fencing.

Extended Data Fig. 2 F-statistics for the ancient Xinjiang and the Eurasian steppe populations.

A, the top 5 outgroup f3-statistics of the form f3(Target, X; Mbuti), using 361 world-wide populations as contrast populations X and 8 populations from this study and the Eurasian Steppe as targets: Dzungaria_EBA1, Dzungaria_EBA2, Chemurchek, Dzungaria_EIA, Okunevo_EMBA, Kazakhstan_EMBA, Botai_CA and West_Siberia_N. Horizontal bars represent ± 1 standard error of the mean (s.e.m.) calculated by 5 cM block jackknifing. B, f4-statistics of the form f4(Mbuti, X; Dzungaria_EBA1, Tarim_EMBA1) and C, f4-statistics of the form f4(Mbuti, X; Dzungaria_EBA2, Tarim_EMBA1), where X is 361 world-wide populations. We show the top and the bottom 15 f4 statistics. Horizontal bars represent the point estimate ± 3 (thin) and ± 1 (thick) s.e.m., as estimated using 5 cM block jackknifing. f4 statistics deviating three s.e.m. or more from zero are marked in red.

Extended Data Fig. 3 Unsupervised ADMIXTURE plot for the Bronze Age Xinjiang individuals.

We plot ancestry component estimates for K = 8 using ‘AncestryPainter’. Dzungaria_EBA individuals show an ancestry pattern close to Afanasievo and Yamnaya, while Tarim_EMBA individuals show a pattern similar to AG3, West_Siberia_N and Botai_CA from the Eurasian steppe.

Extended Data Fig. 4 Reduced genetic diversity of the Tarim_EMBA individuals.

A, a comparison of individual outgroup f3-statistics for the ancient Xinjiang populations and their neighbouring populations from Inner Asia, including Tarim_EMBA1 (n = 12), Tarim_EMBA2 (n = 1), ANE (n = 3), Dzungaria_EBA1 (n = 3), Dzungaria_EBA2 (n = 2), West_Siberia_N (n = 3) and Botai_CA (n = 3), in which Tarim Basin individuals show the highest affinity to each other. In each boxplot, the box marks the 25th and 75th percentiles of the distribution, and the horizontal line within the box marks the median. The whiskers delineate the maximum and the minimum. B, the cumulative distribution of ROH tracts shows that Tarim_EMBA individuals did not descend from closely related parents. C, pairwise mismatch rate (pmr) between individuals in the ancient populations of Xinjiang and its neighbouring regions, including all pairs of individuals within the Afanasievo (n = 27), ANE (n = 3), Baikal_EBA (n = 9), Baikal_EN (n = 15), Botai_CA (n = 3), Dzungaria_EBA (n = 5), Dzungaria_EIA (n = 10), Sintashta_MLBA (n = 51), Tarim_EMBA (n = 13) and West_Siberia_N (n = 3), as well as present-day isolated populations such as Papuan and Karitiana. Tarim_EMBA individuals uniformly show a much reduced pmr value that is equivalent to that of first-degree relatives in Afanasievo or Sintashta_MLBA. The red dotted lines mark the expected pmr value for a given coefficient of relationship (r), from 0 (unrelated) through 1/4 (second-degree relatives) to 1/2 (first-degree relatives), based on the mean value of pmr among these populations. In each box plot, the box represents the interquartile range (IQR; the 25th to 75th percentiles), and the horizontal line within the box represents the median. Black-filled and open circles represent outliers (more than 1.5 times the IQR beyond the box) and extreme outliers (more than 3 times the IQR beyond the box), respectively. The whiskers delineate the smallest and the largest non-outlier observations. D, Y chromosome phylogeny of the Bronze Age Xinjiang male individuals. Xiaohe male individuals fall into a branch distinct from western Bronze Age steppe pastoralists, such as Afanasievo and Yamnaya. One individual from Beifang falls in a position that is more basal than Xiaohe, but its phylogenetic position cannot be fixed due to low coverage, and its proximate position(s) are instead indicated with an asterisk.
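The pmr comparison in panel C can be illustrated with a short sketch. The caption does not state the formula behind the dotted lines, so the scaling used here, expected pmr = baseline × (1 − r/2), is an assumption. It gives one half of the unrelated baseline for identical individuals (r = 1) and three quarters for first-degree relatives (r = 1/2). The function names, the 0/1 pseudo-haploid coding and the baseline value are also hypothetical.

```python
def pairwise_mismatch_rate(genotypes_a, genotypes_b):
    """Fraction of mismatching sites between two pseudo-haploid individuals.

    Genotypes are coded 0 or 1, with None for missing calls.
    """
    compared = [(a, b) for a, b in zip(genotypes_a, genotypes_b)
                if a is not None and b is not None]
    if not compared:
        raise ValueError("no overlapping sites to compare")
    return sum(a != b for a, b in compared) / len(compared)


def expected_pmr(baseline_pmr, r):
    """Expected pmr for coefficient of relationship r (assumed 1 - r/2 scaling)."""
    return baseline_pmr * (1 - r / 2)


# Hypothetical unrelated baseline of 0.24 within a population.
for label, r in [("unrelated", 0), ("second degree", 1 / 4), ("first degree", 1 / 2)]:
    print(label, expected_pmr(0.24, r))
```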

Extended Data Fig. 5 Proteomic evidence for dairy consumption in Xiaohe dental calculus, ca. 2000–1800 BCE.

A, B- and Y-ion series for the frequently observed β-lactoglobulin peptide TPEVD(D/N/K)EALEK, which contains a taxon-specific polymorphic residue: D, Bovinae; N, Ovis; K, Capra. See SI Appendix. B, Taxonomically assigned β-lactoglobulin (black), α-S1-casein (dark grey), and α-lactalbumin peptide spectral matches (PSMs) presented as scaled pie charts on a cladogram of dairy livestock. Bracketed numbers represent the number of PSMs (excluding duplicates) assigned to each node. †Included on the Bovidae node are: 13 PSMs assigned to Bovidae; 21 PSMs assigned to Bovidae but excluding Capra.
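The taxon-specific residue in panel A lends itself to a small lookup. The sketch below maps the polymorphic sixth residue of TPEVD(D/N/K)EALEK to the taxa named in the caption. The function name is hypothetical, and the code only inspects the peptide sequence string. It does not consider modifications or any other evidence used in the actual taxonomic assignment.

```python
TAXON_BY_RESIDUE = {"D": "Bovinae", "N": "Ovis", "K": "Capra"}
PREFIX, SUFFIX = "TPEVD", "EALEK"


def assign_blg_taxon(peptide):
    """Return the taxon implied by the polymorphic residue, or None."""
    expected_length = len(PREFIX) + 1 + len(SUFFIX)
    if (len(peptide) == expected_length
            and peptide.startswith(PREFIX)
            and peptide.endswith(SUFFIX)):
        # The residue immediately after the prefix is the polymorphic site.
        return TAXON_BY_RESIDUE.get(peptide[len(PREFIX)])
    return None


for sequence in ["TPEVDDEALEK", "TPEVDNEALEK", "TPEVDKEALEK"]:
    print(sequence, assign_blg_taxon(sequence))
```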

Extended Data Table 1 A summary of the Bronze Age Xinjiang individuals reported in this study
Extended Data Table 2 Dietary proteins identified in the dental calculus of individuals analyzed from the Tarim Basin Xiaohe cemetery
Extended Data Table 3 Robustness of key qpAdm admixture models

Supplementary information

Supplementary Information

This PDF file includes five sections of Supplementary text: (1) Environmental setting of Xinjiang; (2) Archaeological sites and context; (3) Linguistic background of the population history in Xinjiang; (4) Detailed description of genetic isolation of the Tarim group; and (5) Tarim mummies and the pre-pastoralist Central Asian genetic substratum.

Reporting Summary

Peer Review File

Supplementary Data 1

Sample information, qpAdm modelling results and phenotypic traits of the studied individuals.

Supplementary Data 2

Ancient and modern populations analysed in this study.

Supplementary Data 3

Dairy peptides identified within the Xiaohe dental calculus samples.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

About this article

Cite this article

Zhang, F., Ning, C., Scott, A. et al. The genomic origins of the Bronze Age Tarim Basin mummies. Nature 599, 256–261 (2021).
