Introduction

Consistent with their displaced ethno-history since the ancient Northern Kingdom of Israel was invaded and occupied by the Neo-Assyrian Empire1, contemporary Jews, including Ashkenazi Jews, Sephardic Jews, North African Jews and Middle Eastern Jews2, retain a genetic imprint of their Near Eastern ancestry, but have received a substantial contribution, to a variable extent, from their neighboring populations such as Europeans, Near Easterners and North Africans2,3,4,5,6,7,8. However, it has hitherto remained unclear whether Jews received any genetic contribution from populations outside western Eurasia. Intriguingly, frequent communication has been observed between Jews and Chinese since the early centuries of the Common Era, plausibly initiated by the Silk Road. For instance, Hebrew letters and prayers in the 8th century from ancient Jewish merchants were found in the northwestern region of China9. Some unearthed pottery figurines from the Tang Dynasty (618–907AD) have Semitic characteristics9 and synagogues were recorded in the epigraphy from the Ming (1368–1644AD) and Qing Dynasties (1644–1912AD)9,10. Nonetheless, such connections, as revealed by the archaeological discoveries and historical records, have been confined to economic and cultural exchanges; so far, no direct evidence of a genetic contribution from Chinese into Jews has been reported.

To address whether Jews received any genetic contribution from the Far East, and thus shed more light on their ethno-origins, we analyzed mtDNA variation (mainly from the control region of the molecule, plus some coding-region variants) of 1,930 Jews and 21,191 East Asians, retrieved from previous studies as well as our unpublished data, with special attention to pinpointing eastern Eurasian haplogroups in Jews (Supplementary Table S1). mtDNA control-region variants of an additional 32,474 Eurasian individuals were then analyzed to gain further insight into the phylogeographic distribution of M33c (Supplementary Table S1), bringing the total number of Eurasian mtDNAs considered here to 55,595. Our results reveal a direct genetic connection between Jews and Chinese, manifested by the sharing of some eastern Eurasian haplogroups (e.g. N9a, A and M33c). Further analyses, including phylogeny reconstruction with the aid of new mtDNA genomes, confirm that this connection was established by at least one founder lineage, M33c2. The differentiation time of this lineage is estimated at ~1.4 thousand years ago (kya), which fits well with the historical records and, most importantly, indicates that the exchange between Jews and the Far East was not confined to culture but also had a demic component.

Results and Discussion

Our analysis of the mtDNA variation in a total of 23,121 individuals from East Asian populations and Jews reveals that the mtDNAs of four Ashkenazi Jewish individuals can be allocated to eastern Eurasian haplogroups A and N9a, suggesting that Ashkenazi Jews received a genetic contribution from East Asia (Table 1). Intriguingly, our results also disclose that 14 eastern Ashkenazi Jews belong to haplogroup M33c (Table 1), whose sister clusters, M33a, M33b and M33d, are prevalent in the Indian Subcontinent and thus most plausibly trace their origins there11,12.

Table 1 The shared eastern Eurasian haplotypes between Ashkenazi Jews and Chinese

To achieve further insight into the phylogeographic distribution of M33c, mtDNA variants (mainly from the control region) of an additional 32,474 Eurasian individuals were analyzed, so that the total number of Eurasian mtDNAs considered here was 55,595. As shown in Table 1, besides the 14 Ashkenazi Jewish M33c lineages, an additional 38 M33c mtDNAs (with the specific control-region motif showing transitions at positions 16111, 16223, 16235 and 16362) were pinpointed, among which 34 are from China, 2 from Vietnam and 1 from Thailand, with the remaining individual most likely from Europe but with ambiguous ancestry. Thus, despite the restricted distribution of M33a, M33b and M33d in South Asia, it is most likely that M33c originated, or at least differentiated, in eastern Asia. This notion receives clear support from the median network, in which virtually all of the diversity of this haplogroup is observed in China (Figure 1).
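The motif-based screening described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes each sample is represented as a list of control-region positions that differ from the rCRS, and the sample names and variant lists are hypothetical. Only the four motif positions (16111, 16223, 16235 and 16362) come from the text.

```python
# Control-region motif reported in the text for M33c
# (transitions relative to the rCRS).
M33C_CONTROL_MOTIF = frozenset({16111, 16223, 16235, 16362})


def carries_m33c_motif(variant_positions):
    """Return True if every motif position is among the sample's variants."""
    return M33C_CONTROL_MOTIF <= set(variant_positions)


# Hypothetical samples, for illustration only (not real data).
samples = {
    "sample_A": [16111, 16223, 16235, 16362],
    "sample_B": [16111, 16223, 16235, 16362, 16519],
    "sample_C": [16223, 16362],
}

# Keep the names of samples that carry the complete motif.
candidates = sorted(
    name for name, variants in samples.items() if carries_m33c_motif(variants)
)
print(candidates)
```

A strict subset test like this one would miss a lineage that has lost a motif position through back mutation, so in practice such a screen would only flag candidates for closer inspection.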

Figure 1

Median-joining network of haplogroup M33c.

The median-joining network was reconstructed on the basis of mtDNA hypervariable segment I (HVS-I) variation. Sampling locations are shown by different colors on the map. Transversions are indicated by the suffixes “A”, “C”, “G” and “T”. The prefix @ designates a back mutation, and recurrent variants are underlined. An asterisk (*) denotes an individual whose whole-mtDNA genome information is shown on the phylogenetic tree. The size of each circle is proportional to the number of individuals. The geographic locations are abbreviated as follows: CHS (Hunan or Fujian), GD (Guangdong), GX (Guangxi), GZ (Guizhou), HN (Hunan), JL (Jilin), JS (Jiangsu), SC (Sichuan), SN (Shaanxi), Thai (Thailand), Viet (Vietnam) and YN (Yunnan). The map symbols distinguish M33c individuals in Europe, M33c individuals in Asia, M33a, M33b or M33d individuals, and the sampling locations of all the other samples considered in this study. The map was created by the Kriging algorithm of the Surfer 8.0 package. More details regarding the populations are displayed in Supplementary Table S1.

To shed light on the phylogeny within haplogroup M33c, 11 mtDNAs, covering the widest range of internal variation within the haplogroup, were chosen for whole-mtDNA genome sequencing. In good agreement with the previous result13, the resulting phylogenetic tree (Figure 2), incorporating five previously reported mtDNA genomes13,14,15 as well as one whose information was released online (A Genetic Genealogy Community; http://eng.molgen.org), confirms that M33c is defined by mutations at positions 3316, 4079, 5894, 8227, 8848, 16111 and 16235. Of note, five clades within M33c are each characterized by diagnostic coding-region variant(s); these are named M33c1 to M33c5 here. With the exception of M33c2, all the samples in these clades are from China. The likely origin of M33 in South Asia and the restriction of M33c to China, where it dates to 10 kya according to the estimate based on whole-mtDNA genomes, imply some dispersal from South to East Asia in the immediate postglacial period.

Intriguingly, sub-haplogroup M33c2 (defined by three additional coding-region variants at positions 4182, 4577 and 7364) consists of three different haplotypes (one seen in three Ashkenazi Jews, another in a single Chinese individual and the third in the likely European with unknown ethnicity). Although there is no control-region variant in the defining motif of M33c2, multiple lines of evidence suggest that the pinpointed 14 Ashkenazi Jewish M33c mtDNAs most likely all belong to this clade: (1) all of the 14 mtDNAs share an identical control-region motif (Table 1); (2) the three completely sequenced Ashkenazi Jewish mtDNAs with this motif (EU148486, Bel 1 and Forum 1) belong to M33c2 (Figure 2); (3) M33c shows a virtually exclusive distribution in Ashkenazi Jews in western Eurasia, even though 55,595 mtDNAs have been checked (Table 1 and Supplementary Table S1). Thus, it is plausible that the unknown European individual (JQ702003) was in fact from a Jewish population or had Ashkenazi Jewish ancestry.
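The reason indirect evidence is needed for the 14 control-region haplotypes can be made concrete with a small sketch. The defining positions below are those reported in the text. Everything else is an assumption made for illustration: the function name, the `coding_typed` flag, the returned labels and the example variant lists are all hypothetical. Because none of the M33c2-defining variants lies in the control region, a sample typed only for the control region can at best be flagged as an M33c candidate.

```python
# Defining positions reported in the text (variants relative to the rCRS).
M33C_DEFINING = frozenset({3316, 4079, 5894, 8227, 8848, 16111, 16235})
M33C_CONTROL_ONLY = frozenset({16111, 16235})  # control-region part of the motif
M33C2_EXTRA = frozenset({4182, 4577, 7364})    # all three are coding-region sites


def assign_clade(variant_positions, coding_typed):
    """Assign a sample to M33c / M33c2 as far as the typed region allows.

    coding_typed: True if the coding region was sequenced or typed,
    False if only control-region data are available.
    """
    variants = set(variant_positions)
    if not coding_typed:
        # Coding-region sites are unobserved, so M33c2 cannot be resolved.
        if M33C_CONTROL_ONLY <= variants:
            return "M33c candidate (sub-clade unresolved)"
        return "not M33c"
    if not M33C_DEFINING <= variants:
        return "not M33c"
    if M33C2_EXTRA <= variants:
        return "M33c2"
    return "M33c (not M33c2)"


# Hypothetical examples, for illustration only.
whole_genome = [3316, 4079, 4182, 4577, 5894, 7364, 8227, 8848, 16111, 16235]
control_only = [16111, 16223, 16235, 16362]

print(assign_clade(whole_genome, coding_typed=True))
print(assign_clade(control_only, coding_typed=False))
```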

Age estimates for M33c2 are similar whether based on the whole genome or on the control region alone (Table 2), and the age of ~1.4 kya fits well with the medieval operation of the Silk Road. We note that this is an upper bound for the gene flow event during which the lineage was assimilated into the Ashkenazim: it is the age of the subclade overall, which most likely arose within China, and there is no variation at all within the Jewish lineages, suggesting a very recent event. If we assume that the unidentified European lineage belongs within the Ashkenazi diversity, we can date the Ashkenazi subclade itself more specifically to about 640 years ago – around 1350AD. This in turn would provide a minimum point estimate for the age of the gene flow event (although the range taking account of errors in the estimates is of course much wider).

Table 2 Ages of the major clades of haplogroup M33c estimated from control-region and whole-mtDNA genome data with 95% confidence intervals

The ancient Silk Road was an important transportation hub connecting China and the Mediterranean region from the Han Dynasty (206BC–220AD) onwards and there are likely to have been Jewish merchants at the eastern end of the Silk Road from the early centuries AD. Moreover, Jewish merchants in Europe, referred to as Radhanites, were involved in trade between west and east as early as the ninth century16. It has been suggested, on the basis of contrasts between patterns of mtDNA and Y-chromosome variation17, that such merchants may have formed the nucleus for a number of extant Jewish communities.

Ashkenazi origins are controversial18. According to recent archaeological evidence, the Jewish community of Cologne, mentioned by Emperor Constantine in 321AD, existed in the city continuously until they had to leave in 1423–1424AD19. This suggests that Ashkenazi Jewry may date to Roman times, possibly originating in Italy, which is also suggested by analysis of mtDNA8 and autosomal data20. An early eastern European Ashkenazi origin from Italy (first millennium and earlier) would also agree with the finding that an origin mainly from Germany21 or another central or western European country18 during the late Middle Ages is demographically not possible. Recent work also suggests a sizable Jewish presence in eastern Germany (the Danube region, rather than the Rhineland) prior to the expansion in Poland between 1500 and 1650AD22. The M33c2 mtDNAs are confined to eastern European Ashkenazim in the present database (the single unknown example is of likely East European ancestry14), suggesting that these groups had contacts with the east to the extent that they mediated female gene flow.

Extensive genetic admixture has been observed in populations residing around the ancient Silk Road region23,24. Our currently observed genetic imprint echoes the previously observed ancient communications between Jews and Chinese and, most significantly, implies that such historical exchanges were not confined to the cultural realm but involved gene flow. This unexpected ancient genetic connection between Ashkenazi Jews and the Far East, as witnessed at least by mtDNA haplogroup M33c2, provides the first evidence for a significant genetic contribution from Chinese to eastern European Ashkenazi Jews that was most likely mediated by the Silk Road between around 640 and 1400 years ago. Although the involvement of male Jewish traders has been suggested before17, our results, focusing on the female line of descent, specifically point to the involvement also of women. Well-resolved evidence from the male-specific part of the Y chromosome and from the autosomes would help to further illustrate the rather complex, pan-Eurasian ethno-history of Jews.

Methods

mtDNA data collection and mining

mtDNA variation (mainly from the control region) of 23,121 East Asians and Jews, retrieved from previous studies as well as our unpublished data, was analyzed, with special attention to pinpointing the eastern Eurasian haplogroups in Jews. An additional 32,474 individuals were then analyzed to gain further insight into the phylogeographic distribution of M33c, bringing the total number of Eurasian mtDNAs considered here to 55,595. The study was approved by the Ethics Committee at Kunming Institute of Zoology, Chinese Academy of Sciences. Each participant was informed about the study and provided informed consent. All mtDNAs collected and considered in the present study were first allocated to haplogroups, based mainly on their control-region motifs; these allocations were then confirmed by typing specific coding-region variation according to PhyloTree25 (mtDNA tree Build 16; http://www.phylotree.org/).

DNA amplification and sequencing

For haplogroups of interest, special attention was paid to the intrinsic phylogeny reconstructed from entire mitogenome information. Accordingly, entire mitogenomes of 11 selected representatives of haplogroup M33c were amplified, sequenced and processed as described elsewhere13,26. The sequencing outputs were edited and aligned using Lasergene (DNAStar Inc., Madison, Wisconsin, USA) and compared with the revised Cambridge Reference Sequence (rCRS)27.

Data analysis

The median-joining network of M33c was constructed manually28 and then confirmed using Network 4.612 (http://www.fluxus-engineering.com/sharenet.htm). The most parsimonious phylogenetic tree (Figure 2) was reconstructed by hand as carried out previously13,26. The coalescence ages were estimated by the ρ ± σ method29,30 and maximum likelihood (ML) analysis. Recently corrected calibrated mutation rates31 were adopted in the ρ statistic and the ML analysis.
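To illustrate how a ρ ± σ age estimate is obtained, the following minimal Python sketch uses the formulation commonly applied in the mtDNA literature. That formulation is an assumption here, since the text cites the method without restating it: ρ is the mean number of mutations separating the sampled lineages from the clade root, and σ² is the sum of n_i² · l_i over branches divided by n², where n_i is the number of samples descending from branch i and l_i is the number of mutations on it. The example tree and the `YEARS_PER_MUTATION` value are hypothetical placeholders and do not represent the calibrated rates31 used in this study.

```python
import math


def rho_sigma(branches, n_samples):
    """Compute rho and its standard error sigma for a rooted clade.

    branches: iterable of (n_descendants, n_mutations) pairs, one per branch.
    n_samples: total number of sampled lineages in the clade.
    """
    rho = sum(n * l for n, l in branches) / n_samples
    sigma = math.sqrt(sum(n * n * l for n, l in branches)) / n_samples
    return rho, sigma


# Hypothetical placeholder rate (years per mutation), NOT the study's rate.
YEARS_PER_MUTATION = 3000.0

# Hypothetical clade of 5 samples: 3 carry the root haplotype, one is
# 1 mutation away and one is 2 mutations away, each on its own branch.
branches = [(1, 1), (1, 2)]
rho, sigma = rho_sigma(branches, n_samples=5)

age = rho * YEARS_PER_MUTATION
age_se = sigma * YEARS_PER_MUTATION
print(f"rho = {rho:.3f}, sigma = {sigma:.3f}")
print(f"age = {age:.0f} +/- {age_se:.0f} years (placeholder rate)")
```

A linear conversion like this one only applies when the mutation rate is treated as constant; a time-dependent calibration would replace the final multiplication with the corresponding correction.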